Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for writing HCS metadata #153

Merged
merged 7 commits into from
Jan 12, 2022
Merged

Conversation

sbesson
Copy link
Member

@sbesson sbesson commented Jan 5, 2022

Follow-up of #149, this adds two new top-level APIs to the ome_zarr.writer module: write_plate_metadata and write_well_metadata for creating plate and well specifications.

Unit tests are added to cover the combinations of parameters both in writing mode (test_writer) as well as in writing/reading mode (test_node/test_reader). This should significantly increase the coverage of the HCS codepath especially in reader.py.

The test_plate_2D5D unit test reproduces the issue raised in #145 and is currently marked as xfail. As the bug should be addressed in #148, I would propose to prioritize the review and integration of this PR first and remove the test marker in #148 together with the fix.

Thinking of the upcoming sepcification changes, the biggest potential implication for this new API are the changes proposed in ome/ngff#24 to the wells attribute of the plate specification with the addition of rowIndex and columnIndex attributes. A proposal would be to preemptively change type of the wells argument to List[dict] and add an internal API similar to _validate_plate_acquisitions to inspect the required fields conditionally to the format version.

Define two new methods write_plate_metadata and write_well_metadata including
internal validation methods
@codecov
Copy link

codecov bot commented Jan 5, 2022

Codecov Report

Merging #153 (ff197a0) into master (1c71531) will increase coverage by 9.20%.
The diff coverage is 90.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #153      +/-   ##
==========================================
+ Coverage   71.28%   80.49%   +9.20%     
==========================================
  Files          11       11              
  Lines        1132     1174      +42     
==========================================
+ Hits          807      945     +138     
+ Misses        325      229      -96     
Impacted Files Coverage Δ
ome_zarr/format.py 94.52% <ø> (ø)
ome_zarr/writer.py 93.44% <90.69%> (-1.56%) ⬇️
ome_zarr/reader.py 83.66% <0.00%> (+24.75%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c71531...ff197a0. Read the comment docs.

For writing HCS datasets where group names are case-sensitive (e.g. A), this
option alters the name and creates a mismatch with the plate metadata.
@sbesson
Copy link
Member Author

sbesson commented Jan 5, 2022

After some quick investigation of https://github.com/ome/ome-zarr-py/runs/4714215369?check_suite_focus=true, the failure is unrelated to the Zarr development branch but the tests were simply failing on Ubuntu.

Reproducing them in a Docker environment, I found the issue comes from the key normalization which leads to the creation of Zarr group a for row A causing an obvious mismatch between the hierarchy and the .zattrs metadata. f468a0a, which I also found to be necessary in #148, addresses this issue.
d1311aa also re-enables Ubuntu tests without the Zarr development branch.

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 but one "future warning" so to speak.

images[index] = {"path": str(image)}
elif isinstance(image, dict):
if not all(e in VALID_KEYS for e in image.keys()):
raise ValueError(f"{image} contains invalid keys")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec has been updated with a "MUST NOT" have other keys? Eventually, that could be problematic. Perhaps the prefix mechanism that @will-moore found in json-schema that leaves them unchecked could eventually be introduced here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking up on this. I have not found a clear statement regarding the validity of additional keys in the ngff spec. This strict implementation is probably derived from my reading of multiscales where I assume metadata is the single key that tried to capture extra arguments.

That being said, I totally see this MUST NOT interpretation is and will be limiting both for the extension of the specification itself as well as for supporting external metadata not covered by the spec.

The well metadata is a perfect example as ome/ngff#24 defines more keys. Assuming someone wanted to write a version 0.3 with these new keys populated, is it legit and desired to let the writer implementation write this metadata? If so, is it worth a logging statement at WARN, INFO or DEBUG level? Or should this writer only care about the keys defined in the spec and ignore anything extra?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was salad schema where you could use a prefix: to avoid validation, but I think for json-schema you don't need to do that and it won't fail for unrecognised attributes.

@sbesson
Copy link
Member Author

sbesson commented Jan 9, 2022

Pushed 12b341d in response to the concerns expressed in #153 (comment) about unspecified keys. With this commit, the writer handles leniently unspecified keys are present in lists of dictionaries (acquisitions, images). The unit tests are adjusted to ensure the group creation is successful with all keys been written.

If we agree with this approach, I think it makes sense to implement immediately the proposal made at the end of this PR description i.e. update the write_plate_metadata signature to support wells either as Union[List[str], List[dict]] or as List[dict] and handle unspecified keys similarly to acquisitions and images.

@will-moore
Copy link
Member

@sbesson That commit 12b341d replaces not all with not any in one case and any in the other. Is that right?

@sbesson
Copy link
Member Author

sbesson commented Jan 10, 2022

Good catch. ff197a0 should unify both behaviors. Since the behavior is a simple logging statement, that's not something that the unit tests picked up

@will-moore
Copy link
Member

@sbesson Thanks PR looks good. Updating write_plate_metadata to support List[Dict] makes sense. In this PR?

@sbesson
Copy link
Member Author

sbesson commented Jan 11, 2022

In this PR?

Originally, that's what I thought but with different fires coming up, I feel this will get pushed. If you and @joshmoore are happy with the state of the latest proposed API, I'd propose to get this merged and have the new tests as well as the re-activated Ubuntu tests merged into the mainline. I'll handle the new wells type handling in a follow-up PR which should be self-contained and easier to review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants