Some indexed fields aren't covered by field mapping #6773

Open
dsotirho-ucsc opened this issue Dec 18, 2024 · 5 comments
Assignees
Labels
orange [process] Done by the Azul team spike:1 [process] Spike estimate of one point

Comments

@dsotirho-ucsc
Contributor

dsotirho-ucsc commented Dec 18, 2024

[updated description in comment below]

Datasets

| _field_mapping | azul_v2_anvildev_anvil_datasets_aggregate |
| --- | --- |
| consent_group | consent_group |
| data_modality | data_modality |
| data_use_permission | data_use_permission |
| dataset_id | dataset_id |
| | description |
| document_id | document_id |
| owner | owner |
| principal_investigator | principal_investigator |
| registered_identifier | registered_identifier |
| source_datarepo_row_ids | source_datarepo_row_ids |
| title | title |

Files

| _field_mapping | azul_v2_anvildev_anvil_files_aggregate |
| --- | --- |
| crc32 | crc32 |
| data_modality | data_modality |
| document_id | document_id |
| drs_uri | drs_uri |
| file_format | file_format |
| file_id | file_id |
| file_md5sum | file_md5sum |
| file_name | file_name |
| file_size | file_size |
| | file_size_ |
| is_supplementary | is_supplementary |
| | name |
| reference_assembly | reference_assembly |
| sha256 | sha256 |
| | size |
| | size_ |
| source_datarepo_row_ids | source_datarepo_row_ids |
| | uuid |
| | version |

@dsotirho-ucsc dsotirho-ucsc added the orange [process] Done by the Azul team label Dec 18, 2024
@dsotirho-ucsc dsotirho-ucsc self-assigned this Dec 18, 2024
@hannes-ucsc
Member

What about entity types other than files and datasets? What about HCA?

Please add an English description of the problem. The tables alone aren't sufficient.

@hannes-ucsc hannes-ucsc removed their assignment Dec 20, 2024
@achave11-ucsc achave11-ucsc added the spike:1 [process] Spike estimate of one point label Dec 20, 2024
@dsotirho-ucsc
Contributor Author

Updated the ticket's description.

@hannes-ucsc
Member

If you keep editing the original description, the conversation below the description will be confusing.

Please undo the last edit, going back to version 2 of the description, and provide the most recent version as an additional comment.

@dsotirho-ucsc
Contributor Author

The Plugin._field_mapping properties for the anvil and hca metadata plugins act as an incomplete listing of the fields stored in the indices.

Indexed fields not in the HCA field_mapping:

contents.analysis_protocols.document_id
contents.cell_lines.biomaterial_id
contents.cell_lines.document_id
contents.cell_lines.model_organ
contents.cell_suspensions.biomaterial_id
contents.cell_suspensions.document_id
contents.cell_suspensions.organ
contents.cell_suspensions.organ_part
contents.cell_suspensions.selected_cell_type
contents.cell_suspensions.total_estimated_cells
contents.cell_suspensions.total_estimated_cells_
contents.cell_suspensions.total_estimated_cells_redundant
contents.cell_suspensions.total_estimated_cells_redundant_
contents.contributed_analyses.document_id
contents.contributed_analyses.file
contents.donors.biomaterial_id
contents.donors.document_id
contents.donors.donor_count_
contents.files._type
contents.files.content-type
contents.files.count
contents.files.crc32c
contents.files.document_id
contents.files.drs_uri
contents.files.file_type
contents.files.indexed
contents.files.lane_index
contents.files.lane_index_
contents.files.read_index
contents.files.related_files
contents.files.sha256
contents.files.size_
contents.imaging_protocols.document_id
contents.library_preparation_protocols.document_id
contents.organoids.biomaterial_id
contents.organoids.document_id
contents.organoids.model_organ
contents.organoids.model_organ_part
contents.projects._type
contents.projects.contributors
contents.projects.estimated_cell_count_
contents.projects.publications
contents.projects.supplementary_links
contents.sample_specimens._source
contents.sample_specimens._type
contents.sample_specimens.biomaterial_id
contents.sample_specimens.document_id
contents.sample_specimens.has_input_biomaterial
contents.sample_specimens.organ
contents.sample_specimens.organ_part
contents.sample_specimens.preservation_method
contents.sample_specimens.storage_method
contents.samples.document_id
contents.sequencing_inputs.biomaterial_id
contents.sequencing_inputs.document_id
contents.sequencing_inputs.sequencing_input_type
contents.sequencing_protocols.document_id
contents.specimens._source
contents.specimens._type
contents.specimens.biomaterial_id
contents.specimens.document_id
contents.specimens.has_input_biomaterial
contents.specimens.storage_method

Indexed fields not in the AnVIL field_mapping:

contents.datasets.description
contents.diagnoses.diagnosis_age
contents.diagnoses.onset_age
contents.files.count
contents.files.file_size_
contents.files.name
contents.files.size
contents.files.size_
contents.files.uuid
contents.files.version

@achave11-ucsc
Member

@hannes-ucsc: "The field mapping is not supposed to cover every field in the index, only fields in the response. We might have meant to cover every index field by the field types."
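
If the field types are indeed meant to cover every index field, that could be checked along these lines. This is only a sketch under assumptions: it treats the field types as a nested dict whose keys mirror the index document structure, with any non-dict value standing in for a type. The names `has_field_type` and `fields_without_type` and the sample data are hypothetical, not part of the plugins.

```python
from typing import Any, Iterable


def has_field_type(field_types: dict[str, Any], path: str) -> bool:
    """
    Return True if the dotted `path` leads to a leaf (a non-dict value)
    in the nested `field_types` dict.
    """
    node: Any = field_types
    for part in path.split('.'):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return not isinstance(node, dict)


def fields_without_type(field_types: dict[str, Any],
                        paths: Iterable[str]) -> list[str]:
    """
    Return the indexed field paths that have no entry in `field_types`,
    preserving the order of `paths`.
    """
    return [path for path in paths if not has_field_type(field_types, path)]


# Hypothetical sample data; the strings are placeholders for type objects
field_types = {
    'contents': {
        'files': {
            'name': 'str',
            'size': 'int',
        }
    }
}
indexed = [
    'contents.files.name',
    'contents.files.size',
    'contents.files.size_',
]

missing = fields_without_type(field_types, indexed)
print(missing)
```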


3 participants