Update download PDF for new schema #1259

Open
phraenquex opened this issue Jan 12, 2024 · 11 comments

Comments

@phraenquex
Collaborator

phraenquex commented Jan 12, 2024

Mostly check that the PDF actually documents the downloaded zip file. General sanity check.

Also, see #785 for additional spec.

Also, see #786 for frontend ticket.

@kaliif
Collaborator

kaliif commented Feb 16, 2024

I could use some input on this, I simply don't know enough of these file types. This is what the updated description says now:


### aligned_files directory

The aligned_files directory contains a subdirectory for each ligand that was selected for downloading.

#### Contents of aligned_files subdirectory

Depending on the options you selected when downloading the data, the following file suffixes may be present:

- [site observation code].pdb - protein model with ligand bound
- [site observation code]\_apo.pdb - protein model without ligand bound (apo)
- [site observation code]\_event.ccp4 - event electron density map, cut to around 12 Angstrom around the ligand. It has a higher signal-to-noise ratio, which amplifies the evidence of ligand occupancy.
- [site observation code]\_sigmaa.ccp4 - estimate of the true electron density from diffraction data and atomic model. Cut to around 12 Angstrom around the ligand.
- [site observation code]\_diff.ccp4 - difference electron density map. Negative density typically marks regions where the atomic model has atoms but no electron density is found. Positive density marks electron density that has no atomic model built into it. Cut to around 12 Angstrom around the ligand.

### crystallographic_files directory

The crystallographic_files directory contains the unprocessed versions of all data found in the aligned_files directory. As one crystal can have multiple ligands, we provide the input crystallographic files once to avoid redundancy and keep download sizes to a minimum.

#### Contents of crystallographic_files subdirectory

Depending on the options you selected when downloading the data, the following file suffixes may be present:

- [site observation code].cif
- [site observation code].mtz - reflection data corresponding to the pdb file.
- [site observation code].mtz - event background-corrected reflection data corresponding to the pdb file.
- [site observation code]\_[chain/ligand].ccp4 - estimate of the true electron density from diffraction data and atomic model.
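
As a rough illustration of how the aligned_files suffixes listed above could be used to sort the contents of a download, here is a minimal Python sketch. The function name, the kind labels and the example file names are made up for illustration and are not part of Fragalysis. It also assumes the event map suffix is `_event.ccp4`.

```python
# Suffixes from the aligned_files list above, mapped to illustrative labels.
# The more specific suffixes come first so that "_apo.pdb" is not
# mistaken for a plain ".pdb" file.
ALIGNED_SUFFIXES = [
    ("_apo.pdb", "apo_model"),
    ("_event.ccp4", "event_map"),    # assumed spelling of the event map suffix
    ("_sigmaa.ccp4", "sigmaa_map"),
    ("_diff.ccp4", "diff_map"),
    (".pdb", "model"),
]


def classify_aligned_file(filename):
    """Return (site_observation_code, kind), or None for an unknown suffix."""
    for suffix, kind in ALIGNED_SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)], kind
    return None


if __name__ == "__main__":
    # Hypothetical file names, not taken from a real download.
    for name in ["A0001a.pdb", "A0001a_apo.pdb", "A0001a_diff.ccp4", "notes.txt"]:
        print(name, "->", classify_aligned_file(name))
```

A check like this could be one way to do the "general sanity check" that the PDF documents what is actually in the zip: any file for which it returns `None` has a suffix the description does not cover.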

In the aligned_files section, there used to be a pdb with a _bound.pdb suffix. The field is still called bound_file in the database, but it is now populated with a pdb file without the _bound suffix.

I'm also confused about the handling of .sdf files. In v1 there were two options: if the Molecule model had a reference to an sdf file, the file was added under aligned_files; if not, and the sdf contents were stored as text in a database field, it went to the missing_sdfs directory. If you try a download now, you'll see that the sdfs of all site observations go to missing_sdfs, because a reference to the sdf is no longer stored and there is only a text field with the file contents. Since this is how it was set up in v1 and has not been changed, I can only conclude this is the desired behaviour?
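
The v1 rule described above can be sketched roughly as follows. This is not the actual download code: `LigandRecord` and its field names are hypothetical stand-ins for the database model, and only the two directory names come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LigandRecord:
    """Hypothetical stand-in for the database record; field names are assumptions."""
    code: str
    sdf_file: Optional[str] = None  # reference to an sdf file, if one is stored
    sdf_text: Optional[str] = None  # sdf contents stored as text, if any


def sdf_destination(record):
    """Return the download directory the sdf would land in under the v1 rule."""
    if record.sdf_file:
        # A file reference exists, so the sdf goes next to the aligned files.
        return "aligned_files"
    if record.sdf_text:
        # Only the text contents exist, so the sdf is written to missing_sdfs.
        return "missing_sdfs"
    # Neither is available, so there is nothing to write.
    return None
```

With records that only ever carry the text field, every call takes the second branch, which matches the observation that all sdfs now end up in missing_sdfs.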

@phraenquex
Collaborator Author

@kaliif how can we edit the text of the PDF?

@tdudgeon @ConorFWild need to pin down the precise content of _apo and _desolv etc.

@kaliif
Collaborator

kaliif commented Mar 5, 2024

@mwinokan mwinokan added 2024-03-15 indigo Data dissemination loose ends and removed 2023-08-23 violet V2 full release 2023-11-02 yellow Too big for V2 labels Mar 12, 2024
@mwinokan
Collaborator

Include snapshot link as per #1175

@phraenquex phraenquex added 2024-03-13 green Data dissemination and removed 2024-03-15 indigo Data dissemination loose ends labels Mar 14, 2024
@mwinokan
Collaborator

@mwinokan to take a look with Jenke

@mwinokan
Collaborator

mwinokan commented Apr 4, 2024

@mwinokan spoke to Jenke; he is still most concerned with the SEQRES headers (#1149) and linking the ASAP ID (possibly #1262)

@mwinokan
Collaborator

ASAP IDs will be implemented in #1262 and SEQRES in #1149

@mwinokan mwinokan removed the 2024-03-13 green Data dissemination label Apr 30, 2024
@mwinokan mwinokan added the 2024-03-15 indigo Data dissemination loose ends label Apr 30, 2024
@mwinokan mwinokan moved this to XChemAlign in Fragalysis May 29, 2024
@phraenquex phraenquex added 2024-06-14 mint Data dissemination 2 and removed 2024-03-15 indigo Data dissemination loose ends labels Jun 14, 2024
@phraenquex phraenquex added the 2024-03-13 green Data dissemination label Aug 20, 2024
@phraenquex
Collaborator Author

Add note about which map files to load into coot (the "crystallographic" ones)

@phraenquex phraenquex self-assigned this Aug 20, 2024
@phraenquex
Collaborator Author

@kaliif please point us to the relevant file & repo to edit.

@kaliif
Collaborator

kaliif commented Aug 20, 2024

@phraenquex
Collaborator Author

phraenquex commented Sep 9, 2024

Things to fix.

  • Datestamp, downloaderID (if available)
  • Include snapshot (supersedes "Download PDF must contain snapshot", #1175)
  • Describe all yaml files
  • Describe all site types
  • Describe standard extra_files (esp. metadata.csv)

@phraenquex phraenquex added 2024-09-17 olive data curation big items (too big for mint) and removed 2024-03-13 green Data dissemination labels Sep 26, 2024
@mwinokan mwinokan removed the 2024-09-17 olive data curation big items (too big for mint) label Sep 26, 2024
@mwinokan mwinokan added 2024-11-20 mint:docs v2 documentation and removed 2024-06-14 mint Data dissemination 2 labels Nov 20, 2024
Projects
Status: v2 Documentation
Development

No branches or pull requests

5 participants