Fixes to container chapter (microbiome#625)
TuomasBorman authored and artur-sannikov committed Oct 10, 2024
1 parent a723f3d commit e7819a0
Showing 3 changed files with 75 additions and 8 deletions.
83 changes: 75 additions & 8 deletions inst/pages/containers.qmd
@@ -20,6 +20,50 @@ on how to represent different varieties of multi-table data within the

The options and recommendations are summarized in [@tbl-options].

## Structure of `TreeSE`

`TreeSE` contains several distinct slots, each holding a specific type of data.
The `assays` slot is the core of `TreeSE`, storing abundance tables that
contain the counts or concentrations of features in each sample.
Features can be taxa, metabolites, antimicrobial resistance genes, or other
measured entities, and are represented as rows. The columns correspond to
unique samples.

Building upon the `assays`, `TreeSE` accommodates various data types for both
features and samples. In `rowData`, the rows correspond to the same features
(rows) as in the abundance tables, while the columns represent variables such as
taxonomic ranks. Similarly, in `colData`, each row matches a sample (column)
of the abundance tables, and the columns of `colData` contain metadata such as
disease status, patient ID, or, in time series datasets, the sampling time
point.

The slots in `TreeSE` are outlined below:

- `assays`: Stores a list of abundance tables. All tables share the same rows and columns, where rows represent features (such as taxa) and columns represent samples.
- `rowData`: Contains metadata about the rows (features). For example, this slot can include a taxonomy table.
- `colData`: Holds metadata about the columns (samples), such as patient information or the time points when samples were collected.
- `rowTree`: Stores a hierarchical tree for the rows, such as a phylogenetic tree representing the relationships between taxa.
- `colTree`: Includes a hierarchical tree for the columns, which can represent relationships between samples, for example, indicating whether patients are relatives and the structure of those relationships.
- `rowLinks`: Contains information about the linkages between rows and the nodes in the `rowTree`.
- `colLinks`: Contains information about the linkages between columns and the nodes in the `colTree`.
- `referenceSeq`: Holds reference sequences, i.e., the sequences that correspond to each taxon identified in the rows.
- `metadata`: Contains metadata about the experiment, such as the date it was conducted and the researchers involved.

These slots are illustrated in the figure below:

![The structure of a TreeSummarizedExperiment (TreeSE) object [@Huang2021].](figures/treese.png){width="80%"}

Additionally, `TreeSE` includes:

- `reducedDim`: Contains reduced dimensionality representations of the samples, such as Principal Component Analysis (PCA) results (see [@sec-community-similarity]).
- `altExp`: Stores alternative experiments, which are `TreeSE` objects sharing the same samples but with different feature sets.

Among these, `assays`, `rowData`, `colData`, and `metadata` are shared with the
`SummarizedExperiment` (`SE`) data container. `reducedDim` and `altExp` come
from inheriting the `SingleCellExperiment` (`SCE`) class. The `rowTree`,
`colTree`, `rowLinks`, `colLinks`, and `referenceSeq` slots are unique to
`TreeSE`.
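
As a rough illustration of how the main slots fit together, the sketch below builds a tiny `TreeSE` by hand. The object name `tse_toy`, the feature and sample names, and all values are hypothetical and exist only for this example; only the constructor and accessor functions come from the `TreeSummarizedExperiment` and `SummarizedExperiment` packages.

```r
library(TreeSummarizedExperiment)
library(S4Vectors)

# Hypothetical toy abundance table: 3 features (rows) x 2 samples (columns)
counts <- matrix(
    c(10, 0, 5, 3, 8, 1),
    nrow = 3,
    dimnames = list(
        c("taxon1", "taxon2", "taxon3"),
        c("sample1", "sample2")
    )
)

# Feature metadata: one row per feature, in the same order as the assay rows
row_data <- DataFrame(
    Phylum = c("Firmicutes", "Bacteroidetes", "Firmicutes"),
    row.names = rownames(counts)
)

# Sample metadata: one row per sample, in the same order as the assay columns
col_data <- DataFrame(
    disease_status = c("healthy", "diseased"),
    row.names = colnames(counts)
)

tse_toy <- TreeSummarizedExperiment(
    assays = list(counts = counts),
    rowData = row_data,
    colData = col_data
)

# Each slot has an accessor function of the same name
assayNames(tse_toy)
rowData(tse_toy)
colData(tse_toy)
```

The tree-related slots are left empty here; they are optional and can be supplied through the `rowTree` and `colTree` arguments of the constructor.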

## Rows and columns {#sec-rows-and-cols}

Let us load example data and store it in variable `tse`.
@@ -152,7 +196,11 @@ A tree can be accessed via `rowTree` as `phylo` object.
rowTree(tse)
```

The links to the individual features are available through `rowLinks`.
Each row in `TreeSE` is linked to a specific node in a tree. This relationship
is stored in the `rowLinks` slot, which has the same rows as `TreeSE`.
The `rowLinks` slot contains information about which tree node corresponds to
each row and whether the node is a leaf (tip) or an internal node, among other
details.
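
To make the row-to-node relationship concrete, here is a minimal sketch on a toy object. It assumes the `ape` package is installed for generating a random tree; the object names and values are hypothetical. The row names are set to the tip labels of the tree, so each row should be matched to a leaf.

```r
library(TreeSummarizedExperiment)
library(ape)

# Hypothetical random tree with three tips
set.seed(1)
tree <- rtree(3)

# Toy abundance table whose row names match the tip labels of the tree
counts <- matrix(
    1:6,
    nrow = 3,
    dimnames = list(tree$tip.label, c("sample1", "sample2"))
)

tse_toy <- TreeSummarizedExperiment(
    assays = list(counts = counts),
    rowTree = tree
)

# One row of link information per row of the object
links <- rowLinks(tse_toy)
links$nodeNum  # number of the tree node that each row maps to
links$isLeaf   # whether that node is a tip
```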

```{r rowlinks}
rowLinks(tse)
@@ -202,10 +250,10 @@ original data.

```{r altexp_agglomerate2}
# Add the new data object to the original data object as an alternative
# experiment with the name "Phylum"
# experiment with the specified name
altExp(tse, "subsetted") <- tse_sub
# Check the alternative experiment names available in the data
# Retrieve and display the names of alternative experiments available
altExpNames(tse)
```
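
Storing an alternative experiment and reading it back can be sketched on a self-contained toy object as follows. The names `tse_toy` and `tse_toy_sub` and the values are hypothetical; `altExp()` and `altExpNames()` are inherited from `SingleCellExperiment`.

```r
library(TreeSummarizedExperiment)

# Hypothetical toy object: 4 features x 2 samples
counts <- matrix(
    1:8,
    nrow = 4,
    dimnames = list(paste0("taxon", 1:4), c("sample1", "sample2"))
)
tse_toy <- TreeSummarizedExperiment(assays = list(counts = counts))

# Keep the first two features only; the samples stay the same
tse_toy_sub <- tse_toy[1:2, ]

# Store the subset as an alternative experiment and read it back
altExp(tse_toy, "subsetted") <- tse_toy_sub
altExpNames(tse_toy)
dim(altExp(tse_toy, "subsetted"))
```

The alternative experiment has fewer rows than the main object but must keep the same columns, which is why only rows are subsetted here.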

@@ -237,6 +285,26 @@ samples are defined through a `sampleMap`. Each element on the
`matrix`-like objects, including `SE` objects, and
the number of samples can differ between the elements.

In an `MAE`, the "subjects" represent patients. The `MAE` has four main slots,
with `experiments` being the core. This slot holds a list of experiments, each
in (`Tree`)`SE` format. To handle complex mappings between samples
(observations) across different experiments, the `sampleMap` slot stores
information about how each
sample in the experiments is linked to a patient. Metadata for each patient is
stored in the `colData` slot. Unlike the `colData` in `TreeSE`, this `colData`
is meant to store only metadata that remains constant throughout the trial.

- `experiments`: Contains experiments, such as different omics data, in `TreeSE` format.
- `sampleMap`: Holds linkages between patients (subjects) and samples in the experiments (observations).
- `colData`: Includes patient metadata that remains unchanged throughout the trial.

These slots are illustrated in the figure below:

![The structure of a MultiAssayExperiment (MAE) object [@Ramos2017].](figures/mae.png){width="60%"}

Additionally, the object includes a `metadata` slot that contains information
about the dataset, such as the trial period and the creator of the `MAE` object.
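
The interplay of `experiments`, `sampleMap`, and `colData` can be sketched with a small hand-built object. All names and values below (`mae_toy`, the patients, the sample identifiers) are hypothetical. Note that one patient contributes two samples to the first experiment, so the mapping between patients and samples is not 1:1.

```r
library(MultiAssayExperiment)
library(TreeSummarizedExperiment)
library(S4Vectors)

# Two hypothetical experiments with different features and different samples
microbiome <- TreeSummarizedExperiment(
    assays = list(counts = matrix(
        1:6, nrow = 2,
        dimnames = list(c("taxon1", "taxon2"), c("m1", "m2", "m3"))
    ))
)
metabolites <- TreeSummarizedExperiment(
    assays = list(conc = matrix(
        c(0.1, 0.2, 0.3, 0.4), nrow = 2,
        dimnames = list(c("met1", "met2"), c("b1", "b2"))
    ))
)

# Patient-level metadata that stays constant over the trial
patient_data <- DataFrame(
    sex = c("F", "M"),
    row.names = c("patient1", "patient2")
)

# sampleMap: which sample (colname) in which experiment (assay)
# belongs to which patient (primary)
sample_map <- DataFrame(
    assay = factor(c(
        "microbiome", "microbiome", "microbiome",
        "metabolites", "metabolites"
    )),
    primary = c("patient1", "patient1", "patient2", "patient1", "patient2"),
    colname = c("m1", "m2", "m3", "b1", "b2")
)

mae_toy <- MultiAssayExperiment(
    experiments = ExperimentList(
        microbiome = microbiome,
        metabolites = metabolites
    ),
    colData = patient_data,
    sampleMap = sample_map
)

sampleMap(mae_toy)
```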

The `MAE` object can handle more complex relationships between experiments.
It manages the linkages between samples and experiments, ensuring that
the data remains consistent and well-organized.
@@ -254,8 +322,6 @@ important bookkeeper, maintaining the information about which samples are
associated with which experiments. This ensures that data linkages are
correctly managed and preserved across different types of experiments.

In fact, we can have

```{r}
#| label: show_mae2
@@ -274,9 +340,10 @@ mae
::: {.callout-note}
## Note

If you have multiple experiments containing multiple measures from same patients,
you can utilize the `MultiAssayExperiment` object to keep track of which
samples belong to which patient.
If you have multiple experiments (e.g., different omics data types like
metagenomics, transcriptomics, proteomics, or metabolomics), the
`MultiAssayExperiment` object allows you to organize and integrate these
datasets, even if the samples across experiments don’t have a perfect 1:1 match.

:::

Binary file added inst/pages/figures/mae.png
Binary file added inst/pages/figures/treese.png