Errors when analyzing the gene expression data with totalSeq C #1

zenghk · 2022-05-12T01:12:47Z

I am using the script from "Processing and integrating 5k PBMCs CITE-seq data" to process my own TotalSeq-C data, but the pipeline fails at the following step. The command and the error are shown below. The RNA-only analysis with scanpy runs fine. Could this error be caused by combining protein and RNA?

pt.pp.dsb(mdata, mdata_raw, empty_counts_range=(1.5, 2.8), isotype_controls=isotypes, random_state=1)
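The `empty_counts_range` argument selects the droplets that dsb treats as empty background, so it is worth confirming the chosen range actually contains droplets before calling `dsb`. Below is a minimal sketch, assuming the bounds are on the log10 scale, are compared against per-droplet total counts from the raw matrix, and are inclusive. Which modality's counts muon uses for this selection is also an assumption here; check the `dsb` docstring. `count_droplets_in_range` is a hypothetical helper, not part of muon.

```python
import numpy as np


def count_droplets_in_range(total_counts, counts_range):
    """Return how many droplets have log10(total counts) inside counts_range.

    Hypothetical helper: assumes log10-scale, inclusive bounds compared
    against per-droplet total counts taken from the raw matrix.
    """
    lo, hi = counts_range
    # np.asarray also turns the np.matrix returned by sparse .sum(axis=1) into an ndarray
    counts = np.asarray(total_counts, dtype=float).ravel()
    with np.errstate(divide="ignore"):  # log10(0) -> -inf, which is never inside the range
        log10_counts = np.log10(counts)
    in_range = (log10_counts >= lo) & (log10_counts <= hi)
    return int(in_range.sum())


# Toy per-droplet totals (log10 values: -inf, 1, 2, 3, 4, 5)
totals = np.array([0, 10, 100, 1_000, 10_000, 100_000])
print(count_droplets_in_range(totals, (1.5, 2.8)))  # 1 (only the droplet with 100 counts)
print(count_droplets_in_range(totals, (5.5, 6.0)))  # 0: no background droplets selected
```

A count of zero means the background statistics would be computed over an empty set. That leaves them undefined, which is one way NaN values could enter the scaled matrix.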

ValueError Traceback (most recent call last)
in
----> 1 pt.pp.dsb(mdata, mdata_raw,empty_counts_range=(3.5, 4), isotype_controls=isotypes, random_state=1)

~\anaconda3\lib\site-packages\muon\_prot\preproc.py in dsb(data, data_raw, pseudocount, denoise_counts, isotype_controls, empty_counts_range, cell_counts_range, add_layer, random_state)
162 )
163 for c in range(cells_scaled.shape[0]):
--> 164 sharedvar.fit(cells_scaled[c, :, np.newaxis])
165 separatevar.fit(cells_scaled[c, :, np.newaxis])
166

~\anaconda3\lib\site-packages\sklearn\mixture\_base.py in fit(self, X, y)
191 self
192 """
--> 193 self.fit_predict(X, y)
194 return self
195

~\anaconda3\lib\site-packages\sklearn\mixture\_base.py in fit_predict(self, X, y)
218 Component labels.
219 """
--> 220 X = _check_X(X, self.n_components, ensure_min_samples=2)
221 self._check_n_features(X, reset=True)
222 self._check_initial_parameters(X)

~\anaconda3\lib\site-packages\sklearn\mixture\_base.py in _check_X(X, n_components, n_features, ensure_min_samples)
51 """
52 X = check_array(X, dtype=[np.float64, np.float32],
---> 53 ensure_min_samples=ensure_min_samples)
54 if n_components is not None and X.shape[0] < n_components:
55 raise ValueError('Expected n_samples >= n_components '

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
719 if force_all_finite:
720 _assert_all_finite(array,
--> 721 allow_nan=force_all_finite == 'allow-nan')
722
723 if ensure_min_samples > 0:

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
104 msg_err.format
105 (type_err,
--> 106 msg_dtype if msg_dtype is not None else X.dtype)
107 )
108 # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
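The traceback shows `dsb` fitting a Gaussian mixture to one row of `cells_scaled` per cell (line 164). scikit-learn's input check therefore rejects the fit as soon as any value in that row is NaN or infinite. In dsb, each protein is first standardized by the mean and standard deviation of the empty droplets. The sketch below is a simplified illustration of that step, not muon's code. It assumes `log1p` in place of the configurable `pseudocount` and shows how a protein that is constant in the empty droplets (SD of 0) makes the value non-finite for every cell.

```python
import numpy as np


def standardize_by_empty(cells_log, empty_log):
    """Simplified dsb-style scaling: (cells - empty mean) / empty SD, per protein.

    Rows are droplets, columns are proteins; both inputs are already log-transformed.
    """
    mu = empty_log.mean(axis=0)
    sd = empty_log.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):  # silence 0/0 and x/0 warnings
        return (cells_log - mu) / sd


# Protein 0 varies across empty droplets; protein 1 is never detected there (SD = 0)
empty_log = np.log1p(np.array([[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]))
cells_log = np.log1p(np.array([[50.0, 0.0], [80.0, 4.0]]))

scaled = standardize_by_empty(cells_log, empty_log)
bad_cells = ~np.isfinite(scaled).all(axis=1)  # rows a per-cell mixture fit would reject
print(scaled)     # column 1 holds nan (0/0) and inf (x/0)
print(bad_cells)  # [ True  True]
```
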

gtca · 2022-05-13T13:53:46Z

Hey @zenghk, the error is raised during the fitting step. Are there NaN values in the protein counts matrix (dense or sparse), or features that are not expressed at all in the data?
We can also take a closer look if you could share some data with us or a way to reproduce this error.
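Both questions can be checked directly on the protein matrix. Below is a minimal sketch with a hypothetical helper, `protein_matrix_diagnostics`, which is not part of muon. It accepts a dense array or any sparse matrix exposing `.toarray()` (such as `scipy.sparse`) and reports non-finite entries and proteins with zero total counts.

```python
import numpy as np


def protein_matrix_diagnostics(X, var_names):
    """Report non-finite entries and never-detected features in a counts matrix.

    Rows are droplets/cells, columns are proteins. Sparse inputs are densified,
    which is usually affordable for the small number of protein features.
    """
    dense = X.toarray() if hasattr(X, "toarray") else np.asarray(X, dtype=float)
    nonfinite = ~np.isfinite(dense)
    # Ignore non-finite entries when summing, so a NaN does not hide a zero column
    col_totals = np.where(nonfinite, 0.0, dense).sum(axis=0)
    return {
        "n_nonfinite": int(nonfinite.sum()),
        "features_with_nonfinite": [v for v, bad in zip(var_names, nonfinite.any(axis=0)) if bad],
        "unexpressed_features": [v for v, total in zip(var_names, col_totals) if total == 0],
    }


# Toy matrix with hypothetical feature names
X = np.array([[5.0, 0.0, np.nan], [3.0, 0.0, 2.0]])
print(protein_matrix_diagnostics(X, ["prot_a", "prot_b", "prot_c"]))
# {'n_nonfinite': 1, 'features_with_nonfinite': ['prot_c'], 'unexpressed_features': ['prot_b']}
```

On the objects from the issue, this might be called as `protein_matrix_diagnostics(mdata["prot"].X, mdata["prot"].var_names)`, assuming the protein modality is stored under `"prot"` as in the muon CITE-seq tutorial. Running it on `mdata_raw` as well is worthwhile, since `dsb` uses both objects.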
