[r] Restore `SOMASparseNDArray$read_sparse_matrix()` #1414

mojaveazure · 2023-05-24T22:14:01Z

Restore reading of one-based sparse matrices. Zero-based sparse matrices cause issues with the downstream ecosystems (eg. #1279). This PR brings back support for reading one-based matrices while retaining ability to read zero-based sparse matrices. Code is shared between both readers to reduce duplication.

Also makes uses of Matrix::sparseMatrix(index1 = FALSE) to reduce the amount of coercion from zero-based in SOMA → 1-based in call to Matrix::sparseMatrix() → 0-based internally within Matrix

Restore reading of one-based sparse matrices. Zero-based sparse matrices cause issues with the downstream ecosystems. This PR brings back support for reading one-based matrices while retaining ability to read zero-based sparse matrices. Code is shared between both readers to reduce duplication.

eddelbuettel · 2023-05-24T22:26:08Z

apis/r/R/SOMASparseNDArray.R

      coords = NULL,
      result_order = "auto",
      repr = c("C", "T", "R"),
      iterated = FALSE,
      log_level = "warn"
    ) {
      private$check_open_for_read()
-
+      repr <- repr[1L]


I don't think you need that given the next line. match.arg() will not let you pass a vector through.

I've had match.arg() fail when passing default vectors, eg

f <- function(repr = c("C", "T", "R")) { match.arg(repr) } g <- function(repr = c("C", "T", "R")) { f(x) } g() # throws an error

but can't seem to reproduce it; I'll remove the line

g() is 'outside the spec' methinks. match.arg() does some black art with the signature of its enclosing function. You break that in the example, you get to keep the pieces.

At least that's how it reads to me.

Department of Couldn't be quicker and dirtier if we tried calling on line 2:

> f <- function(repr = c("C", "T", "R")) { match.arg(repr) } > f(c("T", "R", "R")) Error in match.arg(repr) : 'arg' must be of length 1 >

You had the right idea there but is already baked in. And by default more explicit (--> actual error).

eddelbuettel

I haven't looked in full detail but it looks generally ok. One worry is whether this will bite with eg #1411 ?

codecov-commenter · 2023-05-24T22:32:50Z

Codecov Report

Patch coverage has no change and project coverage change: -11.85 ⚠️

Comparison is base (99e5094) 63.95% compared to head (1faa15a) 52.10%.

❗ Current head 1faa15a differs from pull request most recent head 06e0105. Consider uploading reports for the commit 06e0105 to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1414       +/-   ##
===========================================
- Coverage   63.95%   52.10%   -11.85%     
===========================================
  Files          98       68       -30     
  Lines        8170     5711     -2459     
===========================================
- Hits         5225     2976     -2249     
+ Misses       2945     2735      -210

Flag	Coverage Δ
python	`?`
r	`52.10% <ø> (+0.17%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 31 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

mojaveazure · 2023-05-24T22:35:21Z

Good question; haven't seen #1411 so I'm tagging @pablo-gar to ensure he sees this. From a cursory glance, it seems like the only things changing are the use of matrixZeroBasedView$new() and no-longer needing the complicated zero-to-one based conversions in the ecosystem methods

eddelbuettel · 2023-05-24T22:35:24Z

apis/r/R/SOMASparseNDArray.R

@@ -188,7 +188,6 @@ SOMASparseNDArray <- R6::R6Class(
      log_level = "warn"
    ) {
      private$check_open_for_read()
-      repr <- repr[1L]


pablo-gar · 2023-05-24T22:36:24Z

@mojaveazure What's the problem of reading a zero-based matrix and then converting to one-based?

Like here:

TileDB-SOMA/apis/r/R/SOMAExperimentAxisQuery.R

Line 707 in 99e5094

    
           mat <- as.one.based(self$ms$obsp$get(obsp_layer)$read_sparse_matrix_zero_based(repr = 'C'))

I think that gives you effectively the same result as read_sparse_matrix()

which after #1411 it would look like this

one_based <- soma_sparse$read_sparse_matrix_zero_based(repr="C")$get_one_based_matrix()

mojaveazure · 2023-05-24T22:53:16Z

@pablo-gar I believe it does give the same result, but in a far less convoluted way. What as.one.based(self$ms$obsp$get(obsp_layer)$read_sparse_matrix_zero_based(repr = 'C')) is doing is

reading in coordinates as zero-based (SOMA)
converting to one-based for creation with Matrix::sparseMatrix()
converting to zero-based for internal usage within Matrix
wrapping the matrix in the zero-based view
extracting the matrix from the zero-based view

In perusing #1411, step 5 seems to get a lot more complicated

TileDB-SOMA/apis/r/R/SOMAExperimentAxisQuery.R

Lines 592 to 596 in f0b9d51

    
           embed_mat <- if (inherits(embed, 'SOMASparseNDArray')) { 
        
             this_mat <- embed$read_sparse_matrix_zero_based() 
        
             this_mat <- this_mat$take(coords$cells, coords$dims) 
        
             this_mat <- this_mat$get_one_based_matrix() 
        
             as.matrix(this_mat)

This PR instead simplifies this process to

reading in coordinates as zero-based
creating with zero-based coordinates using Matrix::sparseMatrix(index1 = FALSE)
letting Matrix handle 1-based for the ecosystems and preserving the matrixZeroBasedView wrapper

I've argued before (and again) that the vast majority of users, including Seurat/SingleCellExperiment, won't need zero-based matrices, so providing easy access for one-based matrices makes life easier. Moreover, because the zero-based view is a wrapper around one-based matrices, this simply exposes the first part of that process

pablo-gar · 2023-05-24T23:08:06Z

@mojaveazure If this is a UX wrapper I'm not opposed nor in favor but I would suggest we don't duplicate code.

So I propose we either implement these function this way:

read_sparse_matrix = function(...) {
   mat <- self$read_sparse_matrix_zero_based(...)
   mat$get_one_based_matrix()
}

OR the converse:

read_sparse_matrix_zero_based = function(...) {
   mat <- self$read_sparse_matrix(...)
   matrixZeroBasedView$new(mat)
}

you did the latter! I'm ok with that

mojaveazure · 2023-05-24T23:11:46Z

@pablo-gar This PR does just that: read_sparse_matrix_zero_based() wraps read_sparse_matrix() just as the matrixZeroBasedView() wraps sparse matrices

TileDB-SOMA/apis/r/R/SOMASparseNDArray.R

Lines 238 to 266 in 06e0105

    
           read_sparse_matrix_zero_based = function( 
        
             coords = NULL, 
        
             result_order = "auto", 
        
             repr = c("C", "T", "R"), 
        
             iterated = FALSE, 
        
             log_level = "warn" 
        
           ) { 
        
             # If we're setting up an iterated reader, set the tracker for zero-based 
        
             # and use `self$read_sparse_matrix()` for the rest of the setup 
        
             # Use `!ifFALSE()` as the logic in `self$read_sparse_matrix()` says 
        
             # if iterated is FALSE, do non-iterated 
        
             # otherwise, do iterated 
        
             if (!isFALSE(iterated) && is.null(private$soma_reader_pointer)) { 
        
               private$zero_based <- TRUE 
        
             } 
        
             mat <- self$read_sparse_matrix( 
        
               coords = coords, 
        
               result_order = result_order, 
        
               repr = repr, 
        
               iterated = iterated, 
        
               log_level = log_level 
        
             ) 
        
             # Wrap in zero-based view 
        
             if (isFALSE(iterated)) { 
        
               return(matrixZeroBasedView(mat)) 
        
             } 
        
             return(invisible(NULL)) 
        
           },

mojaveazure · 2023-05-24T23:12:42Z

@pablo-gar Whoops, didn't see that your edit saying you already saw that, my bad

pablo-gar · 2023-05-24T23:16:01Z

@mojaveazure Np! I'm processing as I type! I should process and then type after.

The other part that I'm trying to grasp is how this will collide with #1274 but I think that should be an "easy fix" on that other PR

pablo-gar · 2023-05-24T23:22:37Z

@mojaveazure also see that we may get rid altogether of the ability to do "full" reads

#1274 (comment)

mojaveazure · 2023-05-24T23:32:11Z

@pablo-gar Hmm, interesting. I'm not sure how well that will work in the R-world as R doesn't have a useful version of iterator and both Seurat and SingleCellExperiment need the full matrices (at least for now, there are ways around that but it'll involve a lot more work)

pablo-gar · 2023-05-24T23:35:02Z

@mojaveazure yup it'll be a long call like in Python

full_mat <- soma_sparse$read()$sparse_matrix()$concat()

mlin · 2023-05-25T06:22:51Z

Belated- I'm also neutral on this, but having merged it, shouldn't we also update SOMAExperimentAxisQuery.R to use it instead of read_sparse_matrix_zero_based()?

mojaveazure · 2023-05-25T14:16:04Z

@mlin Yes; that will be brought about in future PRs, namely #1279 for SingleCellExperiment and #1261 for Seurat

mojaveazure added the r-api label May 24, 2023

mojaveazure requested review from mlin, eddelbuettel and johnkerl May 24, 2023 22:14

eddelbuettel reviewed May 24, 2023

View reviewed changes

Remove unneeded repr <- repr[1L]

06e0105

mojaveazure requested a review from pablo-gar May 24, 2023 22:32

eddelbuettel reviewed May 24, 2023

View reviewed changes

pablo-gar approved these changes May 24, 2023

View reviewed changes

mojaveazure merged commit e25c5b7 into main May 24, 2023

mojaveazure deleted the ph/feat/read-sparse-matrix branch May 24, 2023 23:45

johnkerl changed the title ~~Restore SOMASparseNDArray$read_sparse_matrix()~~ [r] Restore SOMASparseNDArray$read_sparse_matrix() Jun 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[r] Restore `SOMASparseNDArray$read_sparse_matrix()` #1414

[r] Restore `SOMASparseNDArray$read_sparse_matrix()` #1414

mojaveazure commented May 24, 2023 •

edited

Loading

eddelbuettel May 24, 2023

mojaveazure May 24, 2023 •

edited

Loading

eddelbuettel May 24, 2023

eddelbuettel May 24, 2023

eddelbuettel left a comment

codecov-commenter commented May 24, 2023

mojaveazure commented May 24, 2023

eddelbuettel May 24, 2023

pablo-gar commented May 24, 2023 •

edited

Loading

mojaveazure commented May 24, 2023 •

edited

Loading

pablo-gar commented May 24, 2023 •

edited

Loading

mojaveazure commented May 24, 2023

mojaveazure commented May 24, 2023

pablo-gar commented May 24, 2023 •

edited

Loading

pablo-gar commented May 24, 2023

mojaveazure commented May 24, 2023

pablo-gar commented May 24, 2023

mlin commented May 25, 2023

mojaveazure commented May 25, 2023

[r] Restore SOMASparseNDArray$read_sparse_matrix() #1414

[r] Restore SOMASparseNDArray$read_sparse_matrix() #1414

Conversation

mojaveazure commented May 24, 2023 • edited Loading

eddelbuettel May 24, 2023

Choose a reason for hiding this comment

mojaveazure May 24, 2023 • edited Loading

Choose a reason for hiding this comment

eddelbuettel May 24, 2023

Choose a reason for hiding this comment

eddelbuettel May 24, 2023

Choose a reason for hiding this comment

eddelbuettel left a comment

Choose a reason for hiding this comment

codecov-commenter commented May 24, 2023

Codecov Report

mojaveazure commented May 24, 2023

eddelbuettel May 24, 2023

Choose a reason for hiding this comment

pablo-gar commented May 24, 2023 • edited Loading

mojaveazure commented May 24, 2023 • edited Loading

pablo-gar commented May 24, 2023 • edited Loading

mojaveazure commented May 24, 2023

mojaveazure commented May 24, 2023

pablo-gar commented May 24, 2023 • edited Loading

pablo-gar commented May 24, 2023

mojaveazure commented May 24, 2023

pablo-gar commented May 24, 2023

mlin commented May 25, 2023

mojaveazure commented May 25, 2023

[r] Restore `SOMASparseNDArray$read_sparse_matrix()` #1414

[r] Restore `SOMASparseNDArray$read_sparse_matrix()` #1414

mojaveazure commented May 24, 2023 •

edited

Loading

mojaveazure May 24, 2023 •

edited

Loading

pablo-gar commented May 24, 2023 •

edited

Loading

mojaveazure commented May 24, 2023 •

edited

Loading

pablo-gar commented May 24, 2023 •

edited

Loading

pablo-gar commented May 24, 2023 •

edited

Loading