Streamline datasets for documentation #700

Open
6 tasks
LucaMarconato opened this issue Sep 3, 2024 · 3 comments
Labels
docs 📜 Documentation-related issues

Comments

@LucaMarconato
Member

We should make the usage of datasets more homogeneous across the notebooks in the docs.

Practically:

  • select one, at most three, small datasets (<1 GB each, ideally ~100 MB) and use them in all the notebooks across the repos:
    • spatialdata
    • spatialdata-plot
    • napari-spatialdata
  • in particular, remove the non-bio datasets from the docs (e.g. remove the raccoon dataset from the transformation notebook, and the blobs dataset from the aggregation and rasterize notebooks)
  • implement a dataset class, like in squidpy, to automatically download the datasets
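A minimal sketch of what such a dataset class could look like, using only the standard library. The class name `DocsDataset`, its fields, and the idea of shipping each dataset as a zipped `.zarr` store are assumptions made for illustration; this is not the squidpy implementation or an existing spatialdata API.

```python
from dataclasses import dataclass
from pathlib import Path
from urllib.request import urlretrieve
import zipfile


@dataclass(frozen=True)
class DocsDataset:
    """Hypothetical descriptor for one docs dataset (name and URL are placeholders)."""

    name: str
    url: str  # assumed to point at a zipped .zarr store

    def download(self, cache_dir: Path) -> Path:
        """Download and unpack the dataset once; later calls reuse the cached copy."""
        cache_dir = Path(cache_dir)
        target = cache_dir / f"{self.name}.zarr"
        if target.exists():
            return target  # already cached, nothing to fetch
        cache_dir.mkdir(parents=True, exist_ok=True)
        archive = cache_dir / f"{self.name}.zip"
        urlretrieve(self.url, archive)  # fetch the archive into the cache
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # unpack into <cache_dir>/<name>.zarr
        archive.unlink()  # keep only the unpacked store
        return target
```

A notebook could then call `DocsDataset(name=..., url=...).download(cache_dir)` and pass the returned path to whatever reader it uses, so the download happens only on the first run.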

CC @timtreis @melonora

@LucaMarconato LucaMarconato added the docs 📜 Documentation-related issues label Sep 3, 2024
@LucaMarconato
Member Author

LucaMarconato commented Sep 3, 2024

We will use the following datasets:

For citations (used in the readme of spatialdata-notebooks/datasets and when we use the data in the tutorial notebooks), here are the references:

spatialdata notebooks

  • I. Use SpatialData with your data: the SpatialData object. No dataset used
  • II. Use SpatialData with your data: SpatialElements and tables: custom dataset -> no change
  • (needed for the workshops) Transformations and coordinate systems: raccoon -> Xenium
  • Spatial query: visium mouse brain -> Visium
  • Annotating regions of interest with napari: visium breast cancer -> Visium
  • Use landmark annotations to align multiple -omics layers: visium + xenium breast cancer -> Visium + Visium HD
  • Working with annotations in SpatialData: blobs -> Xenium
  • Integrate/aggregate signals across spatial layers: blobs and custom dataset -> Xenium
  • Interchangeability between raster and vector representations: blobs + custom binned dataset -> Xenium + Visium HD (for the bins)
  • Squidpy integration: xenium breast cancer -> Xenium
  • (needed for the workshops) Deep learning example on image tiles: xenium breast cancer (annotated with the xenium_visium_00 paper notebook) + visium breast cancer
    • move the notebook into the paper reproducibility notebooks
    • link the old notebook in the new one
    • the new notebook is going to be very lightweight and not training a model (just dataloader): using Visium

spatialdata-plot notebooks

  • Technology notebooks Visium: visium mouse brain -> Visium
  • Technology notebooks MIBI-TOF: no change
  • Technology notebooks MERFISH: no change, but:
    • rename the title to "MERFISH prototype pipeline"
    • explain in the notebook that this is not MERFISH from Vizgen, but indeed the prototype pipeline
  • Technology notebooks CosMx: no change
  • Technology notebooks Visium HD: visium hd mouse intestine -> Visium HD
  • Technology notebooks Xenium: XOA 2.0.x -> Xenium
  • Implicit performance improvements when plotting raster data: visium breast cancer -> Visium

napari-spatialdata notebooks

  • Analyse MibiTOF in Napari-SpatialData: no change
  • Analyse Nanostring data in Napari-SpatialData: nanostring cosmx -> CosMx (it's the same dataset, but here I mean to use the subsampled dataset and the download API)
  • Using Napari-SpatialData: nanostring cosmx -> CosMx
  • Use the Scatterwidget with AnnData from Notebook scatterwidget.ipynb: visium_hne_adata AnnData format -> choose a bigger object and SpatialData object: use Visium
  • Use the Scatterwidget with AnnData from Notebook scatterwidget_annotation.ipynb: same as above
  • annotation widget notebook: Visium

tasks

  • cosmx, mibitof and merfish are already available in spatialdata-sandbox, add the 3 missing datasets.
  • we add the missing datasets to the Readme in spatialdata-notebooks/datasets.
    • add a disclaimer that the data is a subset
  • and for the selected 6 datasets above, we say that we use them in the docs
  • add a job in the data CI
    • that converts the raw data to SpatialData
    • subsets the data
    • writes them to Zarr
    • uploads them to S3.
  • write a small frontend (like the squidpy one), so that we download the data via code, better for the user
  • add a disclaimer that the data is a subset also in each notebook
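The last two steps of the CI job (write to Zarr, upload to S3) can be sketched as follows, assuming the Zarr output is a directory store, so uploading it means walking the tree and sending each file under a matching key. The function names and the key layout are hypothetical; `upload_file` is injected so that something like a boto3 S3 client's `upload_file(Filename, Bucket, Key)` could be passed in. The conversion and subsetting steps are out of scope here.

```python
from pathlib import Path
from typing import Callable, Iterator, Tuple


def iter_zarr_uploads(store: Path, prefix: str) -> Iterator[Tuple[Path, str]]:
    """Yield (local_file, s3_key) pairs for every file inside a Zarr directory store."""
    store = Path(store)
    for path in sorted(store.rglob("*")):
        if path.is_file():
            relative = path.relative_to(store).as_posix()
            # Assumed key layout: <prefix>/<store name>/<path inside the store>
            parts = (prefix.strip("/"), store.name, relative)
            yield path, "/".join(p for p in parts if p)


def upload_zarr(
    store: Path,
    bucket: str,
    prefix: str,
    upload_file: Callable[[str, str, str], None],
) -> int:
    """Send each file through the injected uploader; return the number of files sent."""
    count = 0
    for local, key in iter_zarr_uploads(store, prefix):
        upload_file(str(local), bucket, key)
        count += 1
    return count
```

Injecting the uploader keeps the sketch free of cloud dependencies and lets the CI job be dry-run with a function that only records the keys.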

@Pancreas-Pratik

Pancreas-Pratik commented Nov 19, 2024

@LucaMarconato From your comment here (scverse/scanpy#2992 (comment)), I can confirm that spaceranger v3.1.1 output also was not working using visium_hd() from:

from spatialdata_io import visium_hd
import spatialdata as sd

In short, visium_hd() raised an error saying it required a dataset_id to find the prefix for _feature_slice.h5, even though the actual file name (feature_slice.h5) has no prefix or leading underscore.

So I thought providing an empty input could fix this, so: dataset_id='', but with this visium_hd() said it could not find _feature_slice.h5. I renamed the actual file feature_slice.h5 to _feature_slice.h5. This fixed the dataset_id= error, but I then received an error about something related to rgb. I could provide an output of these errors, if needed.
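To illustrate the behaviour described above, here is a small sketch of a file lookup that accepts both the prefixed (`<dataset_id>_feature_slice.h5`) and the unprefixed (`feature_slice.h5`) file name. The helper `resolve_feature_slice` is hypothetical; it is not how spatialdata-io resolves the file and not the fix tracked upstream.

```python
from pathlib import Path
from typing import Optional

FEATURE_SLICE = "feature_slice.h5"


def resolve_feature_slice(path: Path, dataset_id: Optional[str] = None) -> Path:
    """Locate feature_slice.h5 with or without a '<dataset_id>_' prefix."""
    path = Path(path)
    # 1. Prefer the prefixed name when a non-empty dataset_id is given.
    if dataset_id:
        candidate = path / f"{dataset_id}_{FEATURE_SLICE}"
        if candidate.exists():
            return candidate
    # 2. Fall back to the unprefixed name.
    unprefixed = path / FEATURE_SLICE
    if unprefixed.exists():
        return unprefixed
    # 3. Otherwise accept a single prefixed file, whatever its prefix.
    matches = sorted(path.glob(f"*_{FEATURE_SLICE}"))
    if len(matches) == 1:
        return matches[0]
    raise FileNotFoundError(
        f"Could not find a unique '{FEATURE_SLICE}' (prefixed or not) in {path}"
    )
```

With a lookup like this, neither passing dataset_id='' nor renaming the file on disk would be necessary.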


As a workaround, I read the VisiumHD spaceranger v3.1.1 outputs into an AnnData object using bin2cell's read_visium() and then converted it with from_legacy_anndata (from spatialdata_io.experimental import from_legacy_anndata), following this tutorial. This worked.

However, the SpatialData object converted from AnnData did not have all of the information (such as bin info) that the SpatialData object generated by spatialdata_io.visium_hd() has, based on the outputs from here.


10X also just released spaceranger v3.1.2 (Nov. 18th 2024) which fixed a misalignment issue from CytAssist Firmware 2.1.

@LucaMarconato
Member Author

@Pancreas-Pratik thanks for reporting. The first bug is tracked here: scverse/spatialdata-io#252, I will work on this soon. The second bug (rgb) is due to this: scverse/spatialdata-io#222; the bug is now fixed and the solution (minimum requirement for spatial-image) will appear in the upcoming release.
