Cannot read Visium HD data using spatialdata-io (Recurrent error). Data is non-zarr format. #252

Open
2 of 3 tasks
ankshe91 opened this issue Nov 6, 2024 · 15 comments

Comments

@ankshe91

ankshe91 commented Nov 6, 2024

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

I have Visium HD data in a non-Zarr format.
I tried reading it with sdata = visium_hd(path_read).

It keeps asking me for a dataset_id, which does not appear in the feature_slice file name or anywhere in my folder.
Nonetheless, I tried setting it to None, "", and other possible dataset_id values.

I cannot find any tech support on the error either.

(I also tried specifying the file path to the different binned folders)

Minimal code sample

from spatialdata_io import visium_hd

path_read = '/Users/DarthRNA/Downloads/1299_1_XS_VHD_v2_outs'
sdata = visium_hd(path_read)

Error output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[54], line 1
----> 1 sdata = visium_hd(path_read)

File /Volumes/Ankitha/Conda/miniconda3/envs/myenv/lib/python3.12/site-packages/spatialdata_io/readers/visium_hd.py:95, in visium_hd(path, dataset_id, filtered_counts_file, bin_size, bins_as_squares, fullres_image_file, load_all_images, imread_kwargs, image_models_kwargs, anndata_kwargs)
     92 images: dict[str, Any] = {}
     94 if dataset_id is None:
---> 95     dataset_id = _infer_dataset_id(path)
     96 filename_prefix = f"{dataset_id}_"
     98 def load_image(path: Path, suffix: str, scale_factors: list[int] | None = None) -> None:

File /Volumes/Ankitha/Conda/miniconda3/envs/myenv/lib/python3.12/site-packages/spatialdata_io/readers/visium_hd.py:361, in _infer_dataset_id(path)
    359 files = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f)) and f.endswith(suffix)]
    360 if len(files) == 0 or len(files) > 1:
--> 361     raise ValueError(
    362         f"Cannot infer `dataset_id` from the feature slice file in {path}, please pass `dataset_id` as an argument."
    363     )
    364 return files[0].replace(suffix, "")

ValueError: Cannot infer `dataset_id` from the feature slice file in /Users/DarthRNA/Downloads/1299_1_XS_VHD_v2_outs, please pass `dataset_id` as an argument.
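
For readers hitting the same error, the check shown in the traceback can be reproduced without spatialdata-io. The sketch below is not the library's code: `infer_dataset_id` and `touch` are hypothetical stand-ins, and the suffix value `"_feature_slice.h5"` is an assumption based on the file names in this thread. It suggests why an unprefixed `feature_slice.h5` is never matched, and why passing `dataset_id=""` may not help, since the traceback builds the prefix as `f"{dataset_id}_"`.

```python
import os
import tempfile

# Assumption: the reader searches for files ending in this suffix.
SUFFIX = "_feature_slice.h5"


def infer_dataset_id(path: str, suffix: str = SUFFIX) -> str:
    """Mimic the check in the traceback: exactly one matching file is required."""
    files = [
        f for f in os.listdir(path)
        if os.path.isfile(os.path.join(path, f)) and f.endswith(suffix)
    ]
    if len(files) != 1:
        raise ValueError(f"Cannot infer `dataset_id` from the feature slice file in {path}")
    return files[0].replace(suffix, "")


def touch(directory: str, name: str) -> None:
    """Create an empty file; only the file name matters for this check."""
    open(os.path.join(directory, name), "w").close()


with tempfile.TemporaryDirectory() as unprefixed, tempfile.TemporaryDirectory() as prefixed:
    touch(unprefixed, "feature_slice.h5")
    touch(prefixed, "Visium_HD_Mouse_Small_Intestine_feature_slice.h5")

    try:
        infer_dataset_id(unprefixed)
        unprefixed_result = "inferred"
    except ValueError:
        # "feature_slice.h5" does not end with "_feature_slice.h5"
        unprefixed_result = "ValueError"

    prefixed_result = infer_dataset_id(prefixed)

# With dataset_id="", the prefix from the traceback is "_" rather than "".
empty_id_prefix = f"{''}_"

print(unprefixed_result, prefixed_result, repr(empty_id_prefix))
```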

Versions


@Nina-Song

Same issue here. My HD data structure is similar to the default 10x Mouse Small Intestine structure, and contains ['feature_slice.h5', 'metrics_summary.csv', 'probe_set.csv', 'possorted_genome_bam.bam', 'spatial', 'binned_outputs', 'molecule_info.h5', 'possorted_genome_bam.bam.bai', 'web_summary.html', 'cloupe_008um.cloupe']

A tutorial on how to read it and then convert it to Zarr would be great :> Thank you again for developing this amazing package!

@Nina-Song

Nina-Song commented Nov 8, 2024

Hi @ankshe91, I downloaded the 10x Mouse Small Intestine data directly from their website and used it as input (remember to unzip some of the .tar.gz files).
Now the visium_hd function works. I guess our previous file naming was causing the error.

nsong@gemini-data1:/home/Visium_HD_Mouse_Small_Intestine
$ ls
binned_outputs
spatial
Visium_HD_Mouse_Small_Intestine_cloupe_008um.cloupe
Visium_HD_Mouse_Small_Intestine_feature_slice.h5
Visium_HD_Mouse_Small_Intestine_metrics_summary.csv
Visium_HD_Mouse_Small_Intestine_molecule_info.h5
Visium_HD_Mouse_Small_Intestine_spatial.tar.gz
Visium_HD_Mouse_Small_Intestine_web_summary.html

sdata = spatialdata_io.visium_hd(path_read)
[Image: screenshot attached by the commenter]
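
One way to mimic the prefixed layout above without renaming or copying the original Space Ranger output is to build a directory of symlinks. This is only a sketch: `make_prefixed_view` is a hypothetical helper, the paths and the `my_sample` ID in the usage comment are placeholders, and the rule "prefix top-level files, leave directories such as `binned_outputs` and `spatial` unchanged" is inferred from the listing above rather than confirmed by the reader's documentation.

```python
import os
from pathlib import Path


def make_prefixed_view(src: str, dst: str, dataset_id: str) -> Path:
    """Create `dst` with symlinks to every entry in `src`.

    Top-level files are linked as `<dataset_id>_<name>`; directories keep
    their original names. The source directory is not modified.
    """
    src_path = Path(src).resolve()
    dst_path = Path(dst)
    dst_path.mkdir(parents=True, exist_ok=False)
    for entry in sorted(src_path.iterdir()):
        name = entry.name if entry.is_dir() else f"{dataset_id}_{entry.name}"
        os.symlink(entry, dst_path / name, target_is_directory=entry.is_dir())
    return dst_path


# Hypothetical usage (placeholder paths and ID):
# view = make_prefixed_view("/path/to/outs", "/path/to/outs_prefixed", "my_sample")
# sdata = visium_hd(view, dataset_id="my_sample")
```

Symlinks keep file extensions and contents untouched, which avoids accidentally altering a file during a manual rename. The helper does not handle input that is already prefixed, and creating symlinks may require extra permissions on Windows.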

@ankshe91
Author

ankshe91 commented Nov 8, 2024 via email

@Nina-Song

Hi @ankshe91, does the Visium_HD_Mouse_Small_Intestine demo work in your script?

@ankshe

ankshe commented Nov 8, 2024

I haven't tried the sample dataset yet.

But I see that your own dataset also doesn't have the dataset_id prefix.
Does that work with the reader?

@Nina-Song

Nina-Song commented Nov 8, 2024

Starting with https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine (batch download) could be a good idea. I mimicked their folder structure and the reader now works on my own data as well. (The screenshot attached previously was from this demo data, not my own data, but both of them work now.)

@ankshe

ankshe commented Nov 8, 2024

Thank you!
I'll try doing that!

@ankshe91
Author

ankshe91 commented Nov 12, 2024 via email

@Nina-Song

Nina-Song commented Nov 12, 2024

@ankshe91
Author

ankshe91 commented Nov 12, 2024 via email

@ankshe91
Author

ankshe91 commented Nov 12, 2024 via email

@flying-sheep flying-sheep transferred this issue from scverse/scanpy Dec 16, 2024
@LucaMarconato
Member

Hi, thanks @Nina-Song for sharing your workaround.

@ankshe91, it is strange that the renaming workaround didn't fix the issue. The error OSError: Unable to synchronously open file (file signature not found) suggests that the reader is trying to open a file that is not an HDF5 file as if it were one. Could it be that the renaming changed a file extension?
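
A quick way to test that hypothesis is to look for the HDF5 format signature in every `.h5` file of the output folder. The sketch below uses only the standard library; `has_hdf5_signature` and `find_non_hdf5` are hypothetical helper names, and the check only confirms the 8-byte signature at the offsets the HDF5 format allows (0, 512, 1024, ...), not that the file is otherwise readable.

```python
from pathlib import Path

# 8-byte signature that starts the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path) -> bool:
    """Return True if the signature is found at offset 0, 512, 1024, 2048, ..."""
    with open(path, "rb") as fh:
        size = fh.seek(0, 2)  # seek to the end; returns the file size
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False


def find_non_hdf5(root) -> list[str]:
    """List `.h5` files under `root` (recursively) that lack the signature."""
    root = Path(root)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*.h5")
        if p.is_file() and not has_hdf5_signature(p)
    )


# Hypothetical usage (placeholder path):
# print(find_non_hdf5("/path/to/outs"))
```

Any file listed by `find_non_hdf5` would be a candidate for the "file signature not found" error, for example a renamed archive or a truncated download.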

On the other hand a fix for this bug could be done by extending the reader to enable parsing a dataset where no prefix is added. Would you like to try making a small PR for that?
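
As a starting point for such a PR, the inference could fall back to an empty prefix when an unprefixed feature slice file is present. This is a standalone sketch, not the reader's actual code: `infer_filename_prefix` is a hypothetical function, and the file name `feature_slice.h5` is taken from the directory listings in this thread.

```python
from pathlib import Path

# File name as it appears in the unprefixed listings in this thread.
FEATURE_SLICE = "feature_slice.h5"


def infer_filename_prefix(path) -> str:
    """Return the file name prefix used in a Visium HD output folder.

    Returns "" when an unprefixed `feature_slice.h5` exists, otherwise
    "<dataset_id>_" when exactly one prefixed feature slice file exists.
    """
    root = Path(path)
    if (root / FEATURE_SLICE).is_file():
        return ""
    matches = [
        p.name for p in root.iterdir()
        if p.is_file() and p.name.endswith(f"_{FEATURE_SLICE}")
    ]
    if len(matches) != 1:
        raise ValueError(
            f"Cannot infer `dataset_id` from the feature slice file in {root}, "
            "please pass `dataset_id` as an argument."
        )
    # Strip the bare file name, keeping the trailing underscore of the prefix.
    return matches[0][: -len(FEATURE_SLICE)]
```

Returning the full prefix (including the underscore) rather than the dataset ID lets the unprefixed case be represented by an empty string, so downstream file lookups of the form `prefix + name` would work for both layouts.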

No, there was another code page. Never mind, they took it down.

A previous release accidentally included some old commits from a submodule that we use to build the documentation, and as a result the documentation was missing some notebooks. We fixed this in the following release. The Visium HD notebook should have been available again for the past 3-4 weeks.

@XiaolongYang-HZAU

Thank you for developing such an excellent toolkit. As of now, this bug still exists. I have installed the latest version of the spatialdata package using pip. However, when I use sdata = visium_hd, I still encounter the error "Cannot infer dataset_id from the feature slice file in *** please pass dataset_id as an argument." My data comes from spaceranger and its structure is as follows:

  • binned_outputs
  • metrics_summary.csv
  • probe_set.csv
  • cloupe_008um.cloupe
  • molecule_info.h5
  • spatial
  • cloupe_custom.cloupe
  • possorted_genome_bam.bam
  • web_summary.html
  • feature_slice.h5
  • possorted_genome_bam.bam.bai

@ankshe

ankshe commented Jan 2, 2025

@LucaMarconato
Hi, yes please. That would be great!

@ankshe

ankshe commented Jan 6, 2025

Thank you!

@LucaMarconato Hi, yes please. That would be great!
