Errors when analyzing the gene expression data with totalSeq C #1

Open
zenghk opened this issue May 12, 2022 · 1 comment

zenghk commented May 12, 2022

I am using the script from "Processing and integrating 5k PBMCs CITE-seq data" to process my own TotalSeq-C data, but the pipeline fails at the following step. The command and the error are shown below. Analyzing the RNA alone with scanpy works and gives results. Could this error be caused by combining the protein and RNA data?

pt.pp.dsb(mdata, mdata_raw, empty_counts_range=(1.5, 2.8), isotype_controls=isotypes, random_state=1)

ValueError Traceback (most recent call last)
in
----> 1 pt.pp.dsb(mdata, mdata_raw,empty_counts_range=(3.5, 4), isotype_controls=isotypes, random_state=1)

~\anaconda3\lib\site-packages\muon_prot\preproc.py in dsb(data, data_raw, pseudocount, denoise_counts, isotype_controls, empty_counts_range, cell_counts_range, add_layer, random_state)
162 )
163 for c in range(cells_scaled.shape[0]):
--> 164 sharedvar.fit(cells_scaled[c, :, np.newaxis])
165 separatevar.fit(cells_scaled[c, :, np.newaxis])
166

~\anaconda3\lib\site-packages\sklearn\mixture\_base.py in fit(self, X, y)
191 self
192 """
--> 193 self.fit_predict(X, y)
194 return self
195

~\anaconda3\lib\site-packages\sklearn\mixture\_base.py in fit_predict(self, X, y)
218 Component labels.
219 """
--> 220 X = _check_X(X, self.n_components, ensure_min_samples=2)
221 self._check_n_features(X, reset=True)
222 self._check_initial_parameters(X)

~\anaconda3\lib\site-packages\sklearn\mixture\_base.py in _check_X(X, n_components, n_features, ensure_min_samples)
51 """
52 X = check_array(X, dtype=[np.float64, np.float32],
---> 53 ensure_min_samples=ensure_min_samples)
54 if n_components is not None and X.shape[0] < n_components:
55 raise ValueError('Expected n_samples >= n_components '

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
719 if force_all_finite:
720 _assert_all_finite(array,
--> 721 allow_nan=force_all_finite == 'allow-nan')
722
723 if ensure_min_samples > 0:

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
104 msg_err.format
105 (type_err,
--> 106 msg_dtype if msg_dtype is not None else X.dtype)
107 )
108 # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

gtca (Collaborator) commented May 13, 2022

Hey @zenghk, the error is raised during the fitting step. Are there NaN values in the protein counts matrix (dense or sparse), or features that are not expressed at all in the data?
We can also take a closer look if you could share some data with us or a way to reproduce this error.
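
One way to check for the conditions asked about above is sketched below. This is only an illustrative helper, not part of muon or of this package: the function name `summarize_protein_matrix` and the antibody names in the example are made up, and it assumes the protein counts can be converted to a dense NumPy array. It reports non-finite entries, features with no counts at all, and features with zero variance (which could plausibly lead to division by zero during scaling).

```python
import numpy as np


def summarize_protein_matrix(X, feature_names=None):
    """Report non-finite entries and features with no counts or no variance.

    Hypothetical diagnostic helper; X is a cells x features matrix.
    """
    # Accept dense arrays or sparse matrices exposing .toarray()
    dense = X.toarray() if hasattr(X, "toarray") else np.asarray(X)
    dense = dense.astype(np.float64)

    n_features = dense.shape[1]
    names = np.asarray(
        feature_names if feature_names is not None else np.arange(n_features)
    )

    # NaN or +/-inf anywhere in the matrix
    nonfinite = ~np.isfinite(dense)
    # Features whose every entry is exactly zero
    all_zero = (dense == 0).all(axis=0)
    # Features with zero standard deviation across cells
    # (columns containing NaN give NaN here and are not flagged)
    with np.errstate(invalid="ignore"):
        zero_var = np.std(dense, axis=0) == 0

    return {
        "n_nonfinite": int(nonfinite.sum()),
        "features_with_nonfinite": names[nonfinite.any(axis=0)].tolist(),
        "all_zero_features": names[all_zero].tolist(),
        "zero_variance_features": names[zero_var].tolist(),
    }


# Toy example with made-up antibody names (not from the reporter's data)
counts = np.array(
    [
        [5.0, 0.0, 3.0, 1.0],
        [7.0, 0.0, 3.0, np.nan],
        [2.0, 0.0, 3.0, 4.0],
    ]
)
report = summarize_protein_matrix(counts, ["CD3", "CD4", "IgG1_iso", "CD19"])
print(report)
```

On real data the same function could be applied to the protein modality of both the filtered and the raw object, for example `mdata.mod["prot"].X`, assuming the modality is named "prot" as in the tutorial. Any feature flagged here would be a candidate for removal before calling `pt.pp.dsb`.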
