[r] Write group-level string metadata as TILEDB_STRING_UTF8 (#3469)
* [r] Write group-level string metadata as `TILEDB_STRING_UTF8`
String group-level metadata was previously encoded using
`TILEDB_CHAR` or `TILEDB_STRING_ASCII`; however, this resulted in the
metadata being read in as `bytes` in the Python API instead of as `str`.
The Python API already [encodes all strings (`str`) as
`TILEDB_STRING_UTF8`](https://github.com/single-cell-data/TileDB-SOMA/blob/884342a1ceb994d677c52c74ba2d789fc4e208d4/apis/python/src/tiledbsoma/common.cc#L211-L223)
so this PR brings the R API in line with the Python API.

[SC-61001](https://app.shortcut.com/tiledb-inc/story/61001)

resolves #2698

* Add tests

* Update test

* Attempt to fix Python nonsense

* Fix more Python nonsense

* Use spaces instead of tabs

* Update changelog
Bump develop version
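
The mismatch described in the commit message can be illustrated with a toy model. This is a minimal sketch, not the tiledbsoma reader: `StringTag` and `read_string_metadata` are hypothetical names, and the only behavior assumed is the one stated above (values tagged `TILEDB_STRING_UTF8` come back as `str`, values tagged `TILEDB_CHAR` or `TILEDB_STRING_ASCII` come back as `bytes`).

```python
from enum import Enum
from typing import Union


class StringTag(Enum):
    """Hypothetical stand-in for the TileDB datatypes named in the commit message."""

    CHAR = "TILEDB_CHAR"
    STRING_ASCII = "TILEDB_STRING_ASCII"
    STRING_UTF8 = "TILEDB_STRING_UTF8"


def read_string_metadata(tag: StringTag, raw: bytes) -> Union[str, bytes]:
    """Toy reader (assumption): only UTF-8-tagged values are decoded to str."""
    if tag is StringTag.STRING_UTF8:
        return raw.decode("utf-8")
    # Any other tag is surfaced as raw bytes, which is the reported symptom.
    return raw


# Same payload, different tag: the tag alone decides the Python type.
before = read_string_metadata(StringTag.STRING_ASCII, b"SOMAExperiment")
after = read_string_metadata(StringTag.STRING_UTF8, b"SOMAExperiment")
```

In this model the stored bytes are identical in both cases; only the datatype tag written alongside them changes, which is why the fix in the R API is a one-line change to the tag.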
mojaveazure authored Dec 18, 2024
1 parent 4fbbebe commit bda7d97
Showing 5 changed files with 57 additions and 2 deletions.
2 changes: 1 addition & 1 deletion apis/r/DESCRIPTION
@@ -6,7 +6,7 @@ Description: Interface for working with 'TileDB'-based Stack of Matrices,
like those commonly used for single cell data analysis. It is documented at
<https://github.com/single-cell-data>; a formal specification available is at
<https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md>.
-Version: 1.16.99
+Version: 1.16.99.1
Authors@R: c(
person(given = "Aaron", family = "Wolen",
role = c("cre", "aut"), email = "[email protected]",
2 changes: 2 additions & 0 deletions apis/r/NEWS.md
@@ -1,5 +1,7 @@
# Unreleased

+* Encode string metadata as `TILEDB_STRING_UTF8` instead of `TILEDB_STRING_ASCII`

# tiledbsoma 1.15.0

## Changes
3 changes: 2 additions & 1 deletion apis/r/src/groups.cpp
@@ -254,8 +254,9 @@ void c_group_put_metadata(
             std::string s(v[0]);
             // We use TILEDB_CHAR interchangeably with TILEDB_STRING_ASCII is
             // this best string type?
+            // Use TILEDB_STRING_UTF8 for compatibility with Python API
             xp->grpptr->set_metadata(
-                key, TILEDB_STRING_ASCII, s.length(), s.c_str());
+                key, TILEDB_STRING_UTF8, s.length(), s.c_str());
             break;
         }
         case LGLSXP: {  // experimental: map R logical (ie TRUE, FALSE, NA) to
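
The call in this hunk passes `s.length()`, which for a `std::string` is a count of bytes rather than characters. A small Python sketch (not part of this commit; the helper `utf8_payload` is hypothetical and is not a tiledbsoma function) shows why the two counts only diverge once a value contains non-ASCII text:

```python
from typing import Tuple


def utf8_payload(value: str) -> Tuple[int, bytes]:
    """Return (byte_count, raw_bytes) for a string encoded as UTF-8.

    Hypothetical helper that mirrors the (length, pointer) pair handed to
    set_metadata in the C++ hunk; it is an illustration only.
    """
    raw = value.encode("utf-8")
    return len(raw), raw


# ASCII-only text: one byte per character, so byte and character counts agree.
n_ascii, _ = utf8_payload("soma")

# Non-ASCII text: the accented letter takes two bytes in UTF-8.
n_accent, raw_accent = utf8_payload("café")
```

For ASCII-only metadata the stored bytes are the same under either tag, so under that assumption switching the tag to `TILEDB_STRING_UTF8` changes how readers interpret the value, not what is written.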
28 changes: 28 additions & 0 deletions apis/system/tests/test_character_write_python_read_r.py
@@ -0,0 +1,28 @@
#!/usr/bin/env python3

import pytest

import tiledbsoma

from .common import TestWritePythonReadR


class TestCharacterMetadataWritePythonReadR(TestWritePythonReadR):

    @pytest.fixture(scope="class")
    def experiment(self):
        exp = tiledbsoma.Experiment.create(self.uri)
        # Call close(); a bare `exp.close` only looks up the method
        exp.close()

    def base_R_script(self):
        return f"""
        library(tiledbsoma)
        exp <- SOMAExperimentOpen("{self.uri}")
        md <- exp$get_metadata()
        """

    def test_r_character(self, experiment):
        # Raw string so the R lambda `\(x)` is not parsed as a Python escape
        self.r_assert(
            r"stopifnot(all(vapply(md, \(x) is.character(x$name), logical(1L))))"
        )
24 changes: 24 additions & 0 deletions apis/system/tests/test_character_write_r_read_python.py
@@ -0,0 +1,24 @@
#!/usr/bin/env python3

import pytest

import tiledbsoma

from .common import TestReadPythonWriteR


class TestCharacterMetadataWriteRReadPython(TestReadPythonWriteR):
    @pytest.fixture(scope="class")
    def R_character(self):
        base_script = f"""
        library(tiledbsoma)
        exp <- SOMAExperimentCreate("{self.uri}")
        exp$close()
        """
        self.execute_R_script(base_script)

    def test_py_character(self, R_character):
        with tiledbsoma.open(self.uri) as exp:
            for key in exp.metadata.keys():
                assert isinstance(exp.metadata.get(key), str)
