Aura's edit's to the text bits of Chapter 3-4 (#627)
Co-authored-by: Tuomas Borman <[email protected]>
nuorenarra and TuomasBorman authored Oct 7, 2024
1 parent 917f7c7 commit 8e01062
Showing 2 changed files with 61 additions and 56 deletions.
112 changes: 58 additions & 54 deletions inst/pages/containers.qmd
@@ -9,7 +9,7 @@ chapterPreamble()
This section provides an introduction to `TreeSummarizedExperiment (TreeSE)`
and `MultiAssayExperiment (MAE)` data containers introduced in
[@sec-microbiome-bioc]. In microbiome data science, these containers
link taxonomic abundance tables with rich side information on the features and
link taxonomic abundance tables with rich side information on the taxa and
samples. Taxonomic abundance data can be obtained by 16S rRNA amplicon
or metagenomic sequencing, phylogenetic microarrays, or by other
means. Many microbiome experiments include multiple versions and types
@@ -76,12 +76,12 @@ tse
```

The `TreeSE` object, similar to a standard `data.frame` or `matrix`, has rows
and columns. Typically, samples are stored in columns, while features or taxa
and columns. Typically, samples are stored in columns, while taxa (or features)
are stored in rows. You can extract subsets of the data, such as the first
five rows and certain three columns. The object manages the linkages
between data, ensuring, for example, that when you subset the data, for
instance, both the assay and sample metadata are subsetted simultaneously,
ensuring they remain matched with each other.
five rows (first five taxa) and three selected columns (three selected samples).
The object manages the linkages between its parts (e.g., the assay data and the sample
metadata), ensuring that when you subset the data object, all its parts
are subsetted simultaneously and remain matched with each other.
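
The linked subsetting described above can be sketched with a small toy object. This is a minimal illustration rather than the chapter's own example: the names `counts`, `sample_data`, `toy_tse` and `toy_sub` are hypothetical, and the sketch assumes that the `TreeSummarizedExperiment` package is installed.

```r
library(TreeSummarizedExperiment)

# Hypothetical toy data: 6 taxa (rows) x 4 samples (columns)
counts <- matrix(
    1:24, nrow = 6,
    dimnames = list(paste0("taxon", 1:6), paste0("sample", 1:4)))

# Hypothetical sample metadata, one row per sample
sample_data <- DataFrame(
    group = c("A", "A", "B", "B"),
    row.names = colnames(counts))

toy_tse <- TreeSummarizedExperiment(
    assays = SimpleList(counts = counts),
    colData = sample_data)

# Take the first five taxa and three selected samples
toy_sub <- toy_tse[1:5, c("sample1", "sample3", "sample4")]

# The assay and the sample metadata were subsetted together
dim(assay(toy_sub, "counts"))
rownames(colData(toy_sub))
```

Because the subsetting is applied to the whole object, the columns of the assay and the rows of the sample metadata refer to the same three samples afterwards.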

```{r}
#| label: subset_intro
@@ -105,21 +105,24 @@ from [@sec-treese_subsetting].

## Assay data {#sec-assay-slot}

The microbiome is the collection of all microbes (such as bacteria, viruses,
fungi, etc.) in the body. When studying these microbes, abundance data is
needed, and that’s where assays come in.

An assay is a way of measuring the presence and abundance of different types
of microbes in a sample. For example, if you want to know how many bacteria of
a certain type are in your gut, you can use an assay to measure this. When
storing assays, the original data is count-based. However, the original
count-based taxonomic abundance tables may undergo different
transformations, such as logarithmic, Centered Log-Ratio (CLR), or relative
abundance. These are typically stored in _**assays**_. See
[@sec-assay-transform] for more information on transformations.

The `assays` slot contains the experimental data as multiple count matrices.
The result of `assays` is a list of matrices.
When studying microbiomes, the primary data are the abundances of given
microbes in given samples. This taxa-by-sample table forms the core of the
`TreeSE` object and is called the assay data.

An assay is a measurement of the presence and abundance of different microbial
taxa in a sample. The assay data records this in a table where rows are unique
taxa, columns are unique samples, and each entry is a number describing how
abundant a given taxon is in a given sample. Note that the original data stored
in assays is count-based. However, due to the nature of how microbiome data is
produced, these count-based abundances rarely reflect the true counts of
microbial taxa in the sample. The abundance tables therefore often undergo
transformations, such as logarithmic, Centered Log-Ratio (CLR), or relative
abundance transformations, to make the abundance values comparable with each
other. See [@sec-assay-transform] for more information on transformations.
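
To make the transformations mentioned above concrete, the following base R sketch computes relative abundances and a CLR transformation by hand on a small, hypothetical count table. It is only an illustration of the arithmetic under the assumption that the table contains no zeros; it is not the workflow described in the referenced transformation chapter.

```r
# Hypothetical toy count table: 3 taxa (rows) x 2 samples (columns)
counts <- matrix(
    c(10, 30, 60,
       5,  5, 90),
    nrow = 3,
    dimnames = list(
        c("taxonA", "taxonB", "taxonC"),
        c("sample1", "sample2")))

# Relative abundance: divide each column by its total, so columns sum to 1
relabundance <- sweep(counts, 2, colSums(counts), "/")

# CLR: log of each value minus the mean log value of its sample.
# A pseudocount would be needed if the table contained zeros.
log_counts <- log(counts)
clr <- sweep(log_counts, 2, colMeans(log_counts), "-")

colSums(relabundance)
colSums(clr)
```

After the relative abundance transformation each sample sums to one, and after CLR the values of each sample are centered around zero, which is what makes samples with different sequencing depths comparable.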

The microbial abundance tables are stored in _**assays**_. The `assays` slot
contains the abundance data as one or more matrices. Calling `assays()`
returns them as a list of matrices.

```{r}
assays(tse)
@@ -132,9 +135,8 @@ assay(tse, "counts") |> head()
```

So, in summary, in the world of microbiome analysis, an assay is essentially
a way to quantify and understand the composition of microbes in a given sample,
which is super important for all kinds of research, ranging from human health
to environment studies.
a way to describe the composition of microbes in a given sample. This way we
can summarise the microbiome profile of a human gut or a sample of soil.

Furthermore, to illustrate the use of multiple assays, we can create an empty
matrix and add it to the object.
@@ -158,7 +160,7 @@ a requirement for the assays.
## colData

`colData` contains information about the samples used in the study. This
information can include details such as the sample ID, the primers used in
sample metadata can include details such as the sample ID, the primers used in
the analysis, the barcodes associated with the sample (truncated or complete),
the type of sample (e.g. soil, fecal, mock) and a description of the sample.

@@ -172,23 +174,27 @@ indicates the sample type (e.g. soil, fecal matter, control) and

## rowData {#sec-rowData}

`rowData` contains data on the features of the analyzed samples. This is
particularly important in the microbiome field for storing taxonomic
information. This taxonomic information is extremely important for
understanding the composition and diversity of the microbiome in each sample
analyzed. It enables identification of the different types of microorganisms
present in samples. It also allows you to explore the relationships between
microbiome composition and various environmental or health factors.
`rowData` contains data on the features, such as the microbial taxa, of the analyzed
samples. This is particularly important in the microbiome field for storing
taxonomic information, such as the Species, Genus, or Family of the different
microorganisms present in the samples. This taxonomic information is essential
for understanding the composition and diversity of the microbiome in each sample.


```{r rowdata}
rowData(tse)
```

## rowTree

Phylogenetic trees also play an important role in the microbiome field. The
`TreeSE` class can keep track of features and node
relations via two functions, `rowTree` and `rowLinks`.
Phylogenetic trees play an important role in the microbiome field. It is often
useful to know how closely related the microbial taxa present in the data are.
For example, to calculate widely used phylogenetically weighted microbiome
dissimilarity metrics such as UniFrac and weighted UniFrac, we need information
on not only the presence and abundance of microbial taxa in each sample but
also the evolutionary relatedness among these taxa. The `TreeSE` class can
keep track of relations among features (taxa) via two functions,
`rowTree` and `rowLinks`.

A tree can be accessed via `rowTree` as a `phylo` object.

@@ -213,7 +219,7 @@ the links in an existing object, the `changeTree()` function is available.

## Alternative Experiments {#sec-alt-exp}

_**Alternative experiments**_ complement _assays_. They can contain
_**Alternative experiments**_ (`altExp`) complement _assays_. They can contain
complementary data, which is no longer tied to the same dimensions as
the assay data. However, the number of samples (columns) must be the
same.
@@ -233,8 +239,8 @@ or an object from a derived class with independent feature data.

The following shows how to store taxonomic abundance tables
agglomerated at different taxonomic levels. However, the data could as
well originate from entirely different measurement sources as long as
the samples match.
well originate from entirely different measurement sources (e.g., 16S
amplicon and metagenomic sequence data) as long as the samples match.

Let us first subset the data so that it has only two rows.

@@ -257,8 +263,7 @@ altExp(tse, "subsetted") <- tse_sub
altExpNames(tse)
```

We can now subset the data by taking certain samples, for instance, and this
acts on both `altExp` and assay data.
Now, if we subset the data, this acts on both the `altExp` and the assay data.

```{r altexp_agglomerate3}
tse_single_sample <- tse[, 1]
@@ -271,9 +276,9 @@ to the `SingleCellExperiment` package [@R_SingleCellExperiment].

## Multiple experiments {#sec-mae}

_**Multiple experiments**_ relate to complementary measurement types,
such as transcriptomic or metabolomic profiling of the microbiome or
the host. Multiple experiments can be represented using the same
_**Multiple experiments**_ relate to complementary measurement types from
the same samples, such as transcriptomic or metabolomic profiling of the
microbiome. Multiple experiments can be represented using the same
options as alternative experiments, or by using the
`MAE` class [@Ramos2017]. Depending on how the
datasets relate to each other the data can be stored as:
@@ -317,10 +322,10 @@ mae <- HintikkaXOData
mae
```

The `sampleMap` is a crucial component of the `MAE` object as it acts as the
The `sampleMap` is a crucial component of the `MAE` object as it acts as an
important bookkeeper, maintaining the information about which samples are
associated with which experiments. This ensures that data linkages are
correctly managed and preserved across different types of experiments.
correctly managed and preserved across different types of measurements.

```{r}
#| label: show_mae2
@@ -340,15 +345,14 @@ mae
::: {.callout-note}
## Note

If you have multiple experiments (e.g., different omics data types like
metagenomics, transcriptomics, proteomics, or metabolomics), the
`MultiAssayExperiment` object allows you to organize and integrate these
datasets, even if the samples across experiments don’t have a perfect 1:1 match.
If you have multiple experiments containing multiple measurements from the same sources
(e.g., patients/hosts, individuals/sites), you can utilize the `MultiAssayExperiment`
object to keep track of which samples belong to which patient or site.

:::

The following dataset illustrates how to utilize the sample mapping system in
`MAE`. It includes two omics: biogenic amines and fatty acids,
`MAE`. It includes two omics datasets: biogenic amines and fatty acids,
collected from 10 chickens.

```{r}
@@ -359,17 +363,17 @@ mae
```

We can see that there are more than ten samples per omic dataset due to
multiple time points collected for some animals. From the `colData` of `MAE`,
we can observe the animal metadata shared between omics and time points,
multiple samples from different time points collected for some animals.
From the `colData` of `MAE`, we can observe the individual animal metadata,
including information that remains constant throughout the trial.

```{r}
#| label: show_coldata_mae
colData(mae)
```

The `sampleMap` slot now contains mappings between each sample and the
corresponding animal. There are as many rows as there are total samples.
The `sampleMap` slot now contains mappings between each unique sample and the
corresponding individual animal. There are as many rows as there are total samples.
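
The bookkeeping idea behind the sample map can be pictured with a plain table. The sketch below uses a base R `data.frame` with the three columns named in this section ("assay", "primary" and "colname"); the animals, omics names and sample names are hypothetical and are not taken from the dataset above.

```r
# Hypothetical sample map: 2 animals and 2 omics datasets,
# where animal1 was sampled at two time points for the amines
sample_map <- data.frame(
    assay = c("amines", "amines", "amines", "fatty_acids", "fatty_acids"),
    primary = c("animal1", "animal1", "animal2", "animal1", "animal2"),
    colname = c("a1_t1", "a1_t2", "a2_t1", "fa_a1_t1", "fa_a2_t1"))

# One row per sample in total, across all omics datasets
nrow(sample_map)

# Number of samples per animal within each omics dataset
table(sample_map$assay, sample_map$primary)

# All samples that belong to one animal, regardless of omics dataset
as.character(sample_map$colname[sample_map$primary == "animal1"])
```

Each row links one sample of one omics dataset to the animal it came from, so an animal can have a different number of samples in each dataset.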

The "colname" column refers to the samples in the omic dataset identified in
the "assay" column, while the "primary" column provides information about the
5 changes: 3 additions & 2 deletions inst/pages/import.qmd
@@ -58,7 +58,7 @@ tree_file_path <- system.file(
```

Now we can read in the biom file and convert it into a `TreeSE` object. In
addition, we retrieve the rank names from the prefixes of the feature names
addition, we retrieve the rank names from the prefixes of the taxa names
and then remove them with the `rank.from.prefix` and `prefix.rm` optional
arguments.

@@ -108,7 +108,8 @@ sample_meta <- read.csv(
sample_meta_file_path, sep = ",", row.names = 1)
# Add this sample data to colData of the taxonomic data object
# Note that the data must be given in a DataFrame format
# Note that the samples in the sample data must be in the same order as
# in the original biom file and that data must be given in a DataFrame format
colData(tse) <- DataFrame(sample_meta)
```
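
The ordering requirement noted in the comment above can be checked and enforced before the assignment. The following base R sketch uses hypothetical sample names and a hypothetical metadata table; in practice the sample order would presumably come from the column names of the data object.

```r
# Hypothetical sample names in the order they appear in the data object
data_samples <- c("s1", "s2", "s3")

# Hypothetical sample metadata that was read in a different order
sample_meta <- data.frame(
    type = c("soil", "fecal", "mock"),
    row.names = c("s3", "s1", "s2"))

# Stop early if any sample is missing from the metadata
stopifnot(all(data_samples %in% rownames(sample_meta)))

# Reorder the metadata rows to follow the sample order of the data
sample_meta <- sample_meta[data_samples, , drop = FALSE]
rownames(sample_meta)
```

Indexing by sample name rather than by position keeps each metadata row attached to the correct sample, and `drop = FALSE` keeps the result a data frame even when there is only one metadata column.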
