Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r] Restore SOMASparseNDArray$read_sparse_matrix() #1414

Merged
merged 2 commits into from
May 24, 2023

Conversation

mojaveazure
Copy link
Member

@mojaveazure mojaveazure commented May 24, 2023

Restore reading of one-based sparse matrices. Zero-based sparse matrices cause issues with the downstream ecosystems (eg. #1279). This PR brings back support for reading one-based matrices while retaining ability to read zero-based sparse matrices. Code is shared between both readers to reduce duplication.

Also makes uses of Matrix::sparseMatrix(index1 = FALSE) to reduce the amount of coercion from zero-based in SOMA → 1-based in call to Matrix::sparseMatrix() → 0-based internally within Matrix

Restore reading of one-based sparse matrices.
Zero-based sparse matrices cause issues with the downstream ecosystems.
This PR brings back support for reading one-based matrices while retaining ability to read zero-based sparse matrices.
Code is shared between both readers to reduce duplication.
coords = NULL,
result_order = "auto",
repr = c("C", "T", "R"),
iterated = FALSE,
log_level = "warn"
) {
private$check_open_for_read()

repr <- repr[1L]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need that given the next line. match.arg() will not let you pass a vector through.

Copy link
Member Author

@mojaveazure mojaveazure May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had match.arg() fail when passing default vectors, eg

f <- function(repr = c("C", "T", "R")) {
  match.arg(repr)
}

g <- function(repr = c("C", "T", "R")) {
  f(x)
}

g() # throws an error

but can't seem to reproduce it; I'll remove the line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g() is 'outside the spec' methinks. match.arg() does some black art with the signature of its enclosing function. You break that in the example, you get to keep the pieces.

At least that's how it reads to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Department of Couldn't be quicker and dirtier if we tried calling on line 2:

> f <- function(repr = c("C", "T", "R")) {
  match.arg(repr)
}
> f(c("T", "R", "R"))
Error in match.arg(repr) : 'arg' must be of length 1
> 

You had the right idea there but is already baked in. And by default more explicit (--> actual error).

Copy link
Contributor

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked in full detail but it looks generally ok. One worry is whether this will bite with eg #1411 ?

@mojaveazure mojaveazure requested a review from pablo-gar May 24, 2023 22:32
@codecov-commenter
Copy link

Codecov Report

Patch coverage has no change and project coverage change: -11.85 ⚠️

Comparison is base (99e5094) 63.95% compared to head (1faa15a) 52.10%.

❗ Current head 1faa15a differs from pull request most recent head 06e0105. Consider uploading reports for the commit 06e0105 to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1414       +/-   ##
===========================================
- Coverage   63.95%   52.10%   -11.85%     
===========================================
  Files          98       68       -30     
  Lines        8170     5711     -2459     
===========================================
- Hits         5225     2976     -2249     
+ Misses       2945     2735      -210     
Flag Coverage Δ
python ?
r 52.10% <ø> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 31 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@mojaveazure
Copy link
Member Author

Good question; haven't seen #1411 so I'm tagging @pablo-gar to ensure he sees this. From a cursory glance, it seems like the only things changing are the use of matrixZeroBasedView$new() and no-longer needing the complicated zero-to-one based conversions in the ecosystem methods

@@ -188,7 +188,6 @@ SOMASparseNDArray <- R6::R6Class(
log_level = "warn"
) {
private$check_open_for_read()
repr <- repr[1L]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pablo-gar
Copy link
Member

pablo-gar commented May 24, 2023

@mojaveazure What's the problem of reading a zero-based matrix and then converting to one-based?

Like here:

mat <- as.one.based(self$ms$obsp$get(obsp_layer)$read_sparse_matrix_zero_based(repr = 'C'))

I think that gives you effectively the same result as read_sparse_matrix()

which after #1411 it would look like this

one_based <- soma_sparse$read_sparse_matrix_zero_based(repr="C")$get_one_based_matrix()

@mojaveazure
Copy link
Member Author

mojaveazure commented May 24, 2023

@pablo-gar I believe it does give the same result, but in a far less convoluted way. What as.one.based(self$ms$obsp$get(obsp_layer)$read_sparse_matrix_zero_based(repr = 'C')) is doing is

  1. reading in coordinates as zero-based (SOMA)
  2. converting to one-based for creation with Matrix::sparseMatrix()
  3. converting to zero-based for internal usage within Matrix
  4. wrapping the matrix in the zero-based view
  5. extracting the matrix from the zero-based view

In perusing #1411, step 5 seems to get a lot more complicated

embed_mat <- if (inherits(embed, 'SOMASparseNDArray')) {
this_mat <- embed$read_sparse_matrix_zero_based()
this_mat <- this_mat$take(coords$cells, coords$dims)
this_mat <- this_mat$get_one_based_matrix()
as.matrix(this_mat)

This PR instead simplifies this process to

  1. reading in coordinates as zero-based
  2. creating with zero-based coordinates using Matrix::sparseMatrix(index1 = FALSE)
  3. letting Matrix handle 1-based for the ecosystems and preserving the matrixZeroBasedView wrapper

I've argued before (and again) that the vast majority of users, including Seurat/SingleCellExperiment, won't need zero-based matrices, so providing easy access for one-based matrices makes life easier. Moreover, because the zero-based view is a wrapper around one-based matrices, this simply exposes the first part of that process

@pablo-gar
Copy link
Member

pablo-gar commented May 24, 2023

@mojaveazure If this is a UX wrapper I'm not opposed nor in favor but I would suggest we don't duplicate code.

So I propose we either implement these function this way:

read_sparse_matrix = function(...) {
   mat <- self$read_sparse_matrix_zero_based(...)
   mat$get_one_based_matrix()
}

OR the converse:

read_sparse_matrix_zero_based = function(...) {
   mat <- self$read_sparse_matrix(...)
   matrixZeroBasedView$new(mat)
}

you did the latter! I'm ok with that

@mojaveazure
Copy link
Member Author

@pablo-gar This PR does just that: read_sparse_matrix_zero_based() wraps read_sparse_matrix() just as the matrixZeroBasedView() wraps sparse matrices

read_sparse_matrix_zero_based = function(
coords = NULL,
result_order = "auto",
repr = c("C", "T", "R"),
iterated = FALSE,
log_level = "warn"
) {
# If we're setting up an iterated reader, set the tracker for zero-based
# and use `self$read_sparse_matrix()` for the rest of the setup
# Use `!ifFALSE()` as the logic in `self$read_sparse_matrix()` says
# if iterated is FALSE, do non-iterated
# otherwise, do iterated
if (!isFALSE(iterated) && is.null(private$soma_reader_pointer)) {
private$zero_based <- TRUE
}
mat <- self$read_sparse_matrix(
coords = coords,
result_order = result_order,
repr = repr,
iterated = iterated,
log_level = log_level
)
# Wrap in zero-based view
if (isFALSE(iterated)) {
return(matrixZeroBasedView(mat))
}
return(invisible(NULL))
},

@mojaveazure
Copy link
Member Author

@pablo-gar Whoops, didn't see that your edit saying you already saw that, my bad

@pablo-gar
Copy link
Member

pablo-gar commented May 24, 2023

@mojaveazure Np! I'm processing as I type! I should process and then type after.

The other part that I'm trying to grasp is how this will collide with #1274 but I think that should be an "easy fix" on that other PR

@pablo-gar
Copy link
Member

@mojaveazure also see that we may get rid altogether of the ability to do "full" reads

#1274 (comment)

@mojaveazure
Copy link
Member Author

@pablo-gar Hmm, interesting. I'm not sure how well that will work in the R-world as R doesn't have a useful version of iterator and both Seurat and SingleCellExperiment need the full matrices (at least for now, there are ways around that but it'll involve a lot more work)

@pablo-gar
Copy link
Member

@mojaveazure yup it'll be a long call like in Python

full_mat <- soma_sparse$read()$sparse_matrix()$concat()

@mojaveazure mojaveazure merged commit e25c5b7 into main May 24, 2023
@mojaveazure mojaveazure deleted the ph/feat/read-sparse-matrix branch May 24, 2023 23:45
@mlin
Copy link
Member

mlin commented May 25, 2023

Belated- I'm also neutral on this, but having merged it, shouldn't we also update SOMAExperimentAxisQuery.R to use it instead of read_sparse_matrix_zero_based()?

@mojaveazure
Copy link
Member Author

@mlin Yes; that will be brought about in future PRs, namely #1279 for SingleCellExperiment and #1261 for Seurat

@johnkerl johnkerl changed the title Restore SOMASparseNDArray$read_sparse_matrix() [r] Restore SOMASparseNDArray$read_sparse_matrix() Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants