Aura's edits to the text bits of Chapter 3-4 (#627)

Co-authored-by: Tuomas Borman <[email protected]>
microbiome · Oct 7, 2024 · 8e01062 · 8e01062
1 parent 917f7c7
commit 8e01062
Showing 2 changed files with 61 additions and 56 deletions.
diff --git a/inst/pages/containers.qmd b/inst/pages/containers.qmd
@@ -9,7 +9,7 @@ chapterPreamble()
 This section provides an introduction to `TreeSummarizedExperiment (TreeSE)`
 and `MultiAssayExperiment (MAE)` data containers introduced in
 [@sec-microbiome-bioc]. In microbiome data science, these containers
-link taxonomic abundance tables with rich side information on the features and
+link taxonomic abundance tables with rich side information on the taxa and
 samples. Taxonomic abundance data can be obtained by 16S rRNA amplicon
 or metagenomic sequencing, phylogenetic microarrays, or by other
 means. Many microbiome experiments include multiple versions and types
@@ -76,12 +76,12 @@ tse
 ```
 
 The `TreeSE` object, similar to a standard `data.frame` or `matrix`, has rows
-and columns. Typically, samples are stored in columns, while features or taxa
+and columns. Typically, samples are stored in columns, while taxa (or features)
 are stored in rows. You can extract subsets of the data, such as the first
-five rows and certain three columns. The object manages the linkages
-between data, ensuring, for example, that when you subset the data, for
-instance, both the assay and sample metadata are subsetted simultaneously,
-ensuring they remain matched with each other.
+five rows (the first five taxa) and three selected columns (three samples).
+The object manages the linkages between its parts (e.g., the assay data and the
+sample metadata), ensuring that when you subset the object, all parts are
+subsetted simultaneously and remain matched with each other.
 
 ```{r}
 #| label: subset_intro
@@ -105,21 +105,24 @@ from [@sec-treese_subsetting].
 
 ## Assay data {#sec-assay-slot}
 
-The microbiome is the collection of all microbes (such as bacteria, viruses,
-fungi, etc.) in the body. When studying these microbes, abundance data is
-needed, and that’s where assays come in.
-
-An assay is a way of measuring the presence and abundance of different types
-of microbes in a sample. For example, if you want to know how many bacteria of
-a certain type are in your gut, you can use an assay to measure this. When
-storing assays, the original data is count-based. However, the original
-count-based taxonomic abundance tables may undergo different
-transformations, such as logarithmic, Centered Log-Ratio (CLR), or relative
-abundance. These are typically stored in _**assays**_. See
-[@sec-assay-transform] for more information on transformations.
-
-The `assays` slot contains the experimental data as multiple count matrices.
-The result of `assays` is a list of matrices.
+When studying microbiomes, the primary type of data is the abundance of given
+microbes in given samples. This taxa-by-sample table forms the core of the
+`TreeSE` object and is called the assay data.
+
+An assay is a measurement of the presence and abundance of different microbial
+taxa in a sample. The assay data records this in a table where rows are unique
+taxa, columns are unique samples, and each entry is a number describing how
+much of a given taxon is present in a given sample. Note that the original
+data stored in assays is count-based. However, due to the nature of how
+microbiome data is produced, these count-based abundances rarely reflect the
+true counts of microbial taxa in the sample, and thus the abundance tables often
+undergo different transformations, such as logarithmic, Centered Log-Ratio (CLR),
+or relative abundance, to make the abundance values comparable with each other.
+See [@sec-assay-transform] for more information on transformations.
+
+The microbial abundance tables are stored in _**assays**_. The `assays` slot
+contains the abundance data as one or more matrices. The result of `assays`
+is a list of matrices.
 
 ```{r}
 assays(tse)
@@ -132,9 +135,8 @@ assay(tse, "counts") |> head()
 ```
 
 So, in summary, in the world of microbiome analysis, an assay is essentially
-a way to quantify and understand the composition of microbes in a given sample,
-which is super important for all kinds of research, ranging from human health
-to environment studies.
+a way to describe the composition of microbes in a given sample. This way, we
+can summarise the microbiome profile of, e.g., a human gut or a soil sample.
 
 Furthermore, to illustrate the use of multiple assays, we can create an empty
 matrix and add it to the object.
@@ -158,7 +160,7 @@ a requirement for the assays.
 ## colData
 
 `colData` contains information about the samples used in the study. This
-information can include details such as the sample ID, the primers used in
+sample metadata can include details such as the sample ID, the primers used in
 the analysis, the barcodes associated with the sample (truncated or complete),
 the type of sample (e.g. soil, fecal, mock) and a description of the sample.
 
@@ -172,23 +174,27 @@ indicates the sample type (e.g. soil, fecal matter, control) and
 
 ## rowData {#sec-rowData}
 
-`rowData` contains data on the features of the analyzed samples. This is
-particularly important in the microbiome field for storing taxonomic
-information. This taxonomic information is extremely important for
-understanding the composition and diversity of the microbiome in each sample
-analyzed. It enables identification of the different types of microorganisms
-present in samples. It also allows you to explore the relationships between
-microbiome composition and various environmental or health factors.
+`rowData` contains data on the features, such as the microbial taxa, of the
+analyzed samples. This is particularly important in the microbiome field for
+storing taxonomic information, such as the Species, Genus, or Family of the
+different microorganisms present in the samples. This taxonomic information is
+essential for understanding the composition and diversity of the microbiome in
+each sample.
 
 ```{r rowdata}
 rowData(tse)
 ```
 
 ## rowTree
 
-Phylogenetic trees also play an important role in the microbiome field. The
-`TreeSE` class can keep track of features and node
-relations via two functions, `rowTree` and `rowLinks`.
+Phylogenetic trees play an important role in the microbiome field. It is often
+useful to know how closely related the microbial taxa present in the data are.
+For example, to calculate widely used phylogenetically weighted microbiome
+dissimilarity metrics, such as unweighted and weighted UniFrac, we need
+information not only on the presence and abundance of microbial taxa in each
+sample but also on the evolutionary relatedness among these taxa. The `TreeSE`
+class can keep track of relations among features (taxa) via two functions,
+`rowTree` and `rowLinks`.
 
 A tree can be accessed via `rowTree` as `phylo` object.
 
@@ -213,7 +219,7 @@ the links in an existing object, the `changeTree()` function is available.
 
 ## Alternative Experiments {#sec-alt-exp}
 
-_**Alternative experiments**_ complement _assays_. They can contain
+_**Alternative experiments**_ (`altExp`) complement _assays_. They can contain
 complementary data, which is no longer tied to the same dimensions as
 the assay data. However, the number of samples (columns) must be the
 same.
@@ -233,8 +239,8 @@ or an object from a derived class with independent feature data.
 
 The following shows how to store taxonomic abundance tables
 agglomerated at different taxonomic levels. However, the data could as
-well originate from entirely different measurement sources as long as
-the samples match.
+well originate from entirely different measurement sources (e.g., 16S
+amplicon and metagenomic sequencing data) as long as the samples match.
 
 Let us first subset the data so that it has only two rows.
 
@@ -257,8 +263,7 @@ altExp(tse, "subsetted") <- tse_sub
 altExpNames(tse)
 ```
 
-We can now subset the data by taking certain samples, for instance, and this
-acts on both `altExp` and assay data.
+Now, if we subset the data, this acts on both the `altExp` and the assay data.
 
 ```{r altexp_agglomerate3}
 tse_single_sample <- tse[, 1]
@@ -271,9 +276,9 @@ to the `SingleCellExperiment` package [@R_SingleCellExperiment].
 
 ## Multiple experiments {#sec-mae}
 
-_**Multiple experiments**_ relate to complementary measurement types,
-such as transcriptomic or metabolomic profiling of the microbiome or
-the host. Multiple experiments can be represented using the same
+_**Multiple experiments**_ relate to complementary measurement types from
+the same samples, such as transcriptomic or metabolomic profiling of the 
+microbiome. Multiple experiments can be represented using the same
 options as alternative experiments, or by using the
 `MAE` class [@Ramos2017]. Depending on how the
 datasets relate to each other the data can be stored as:
@@ -317,10 +322,10 @@ mae <- HintikkaXOData
 mae
 ```
 
-The `sampleMap` is a crucial component of the `MAE` object as it acts as the
+The `sampleMap` is a crucial component of the `MAE` object as it acts as an
 important bookkeeper, maintaining the information about which samples are
 associated with which experiments. This ensures that data linkages are
-correctly managed and preserved across different types of experiments.
+correctly managed and preserved across different types of measurements.
 
 ```{r}
 #| label: show_mae2
@@ -340,15 +345,14 @@ mae
 ::: {.callout-note}
 ## Note
 
-If you have multiple experiments (e.g., different omics data types like
-metagenomics, transcriptomics, proteomics, or metabolomics), the
-`MultiAssayExperiment` object allows you to organize and integrate these
-datasets, even if the samples across experiments don’t have a perfect 1:1 match.
+If you have multiple experiments containing multiple measurements from the same
+sources (e.g., patients, hosts, individuals, or sites), you can use the
+`MultiAssayExperiment` object to keep track of which samples belong to which source.
 
 :::
 
 The following dataset illustrates how to utilize the sample mapping system in
-`MAE`. It includes two omics: biogenic amines and fatty acids,
+`MAE`. It includes two omics datasets: biogenic amines and fatty acids,
 collected from 10 chickens.
 
 ```{r}
@@ -359,17 +363,17 @@ mae
 ```
 
 We can see that there are more than ten samples per omic dataset due to
-multiple time points collected for some animals. From the `colData` of `MAE`,
-we can observe the animal metadata shared between omics and time points,
+multiple samples from different time points collected for some animals. 
+From the `colData` of `MAE`, we can observe the individual animal metadata,
 including information that remains constant throughout the trial.
 
 ```{r}
 #| label: show_coldata_mae
 colData(mae)
 ```
 
-The `sampleMap` slot now contains mappings between each sample and the
-corresponding animal. There are as many rows as there are total samples.
+The `sampleMap` slot now contains mappings between each unique sample and the
+corresponding individual animal. There are as many rows as there are total samples.
 
 The "colname" column refers to the samples in the omic dataset identified in
 the "assay" column, while the "primary" column provides information about the

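The `sampleMap` layout described in the last hunk can be illustrated with a minimal base R sketch. The dataset, sample, and animal names and the `diet` variable below are hypothetical stand-ins, not values from the chicken dataset; only the three column names (`assay`, `primary`, `colname`) follow the `MAE` convention. A plain `data.frame` is used instead of a real `MAE` object so that the example runs without Bioconductor.

```r
# Hypothetical stand-in for a sampleMap: one row per sample.
sample_map <- data.frame(
    assay   = c("amines", "amines", "amines", "fatty_acids", "fatty_acids"),
    primary = c("chicken_1", "chicken_1", "chicken_2", "chicken_1", "chicken_2"),
    colname = c("a_s1", "a_s2", "a_s3", "f_s1", "f_s2"),
    stringsAsFactors = FALSE
)

# Hypothetical animal-level metadata, one row per animal
# (analogous to the colData of the MAE).
animal_meta <- data.frame(
    diet = c("control", "treatment"),
    row.names = c("chicken_1", "chicken_2"),
    stringsAsFactors = FALSE
)

# Count the samples per animal within each dataset.
samples_per_animal <- table(sample_map$assay, sample_map$primary)

# Attach animal-level metadata to each sample through the "primary" column.
sample_map$diet <- animal_meta[sample_map$primary, "diet"]
```

Here the map has one row per sample (five in total), while the metadata table has one row per animal; the `primary` column is what connects the two.
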
diff --git a/inst/pages/import.qmd b/inst/pages/import.qmd
@@ -58,7 +58,7 @@ tree_file_path <- system.file(
 ```
 
 Now we can read in the biom file and convert it into a `TreeSE` object. In
-addition, we retrieve the rank names from the prefixes of the feature names
+addition, we retrieve the rank names from the prefixes of the taxa names
 and then remove them with the `rank.from.prefix` and `prefix.rm` optional
 arguments.
 
@@ -108,7 +108,8 @@ sample_meta <- read.csv(
     sample_meta_file_path, sep = ",", row.names = 1)
 
 # Add this sample data to colData of the taxonomic data object
-# Note that the data must be given in a DataFrame format
+# Note that the samples in the sample data must be in the same order as
+# in the original biom file and that the data must be given in a DataFrame format
 colData(tse) <- DataFrame(sample_meta)
 ```
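
The ordering requirement mentioned in the code comment above can be checked, and if necessary enforced, before the assignment. The following is a minimal base R sketch under stated assumptions: `counts` and the sample names are hypothetical stand-ins for the imported abundance table, and the metadata rows are deliberately given in a different order.

```r
# Hypothetical stand-in for the abundance table: taxa in rows, samples in columns.
counts <- matrix(
    1:6, nrow = 2,
    dimnames = list(c("taxon_1", "taxon_2"), c("S1", "S2", "S3"))
)

# Hypothetical sample metadata whose rows are not in the same order.
sample_meta <- data.frame(
    type = c("soil", "fecal", "mock"),
    row.names = c("S3", "S1", "S2"),
    stringsAsFactors = FALSE
)

# Stop early if any sample lacks metadata.
stopifnot(all(colnames(counts) %in% rownames(sample_meta)))

# Reorder the metadata rows to follow the column order of the abundance table.
sample_meta <- sample_meta[colnames(counts), , drop = FALSE]

# Check that the two orders now agree.
identical(rownames(sample_meta), colnames(counts))
```

With a real object, `colnames(tse)` would presumably take the place of `colnames(counts)`, and the reordered table would then be passed to `DataFrame()` as in the chunk above.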