Ability to Review Data Before Starting WF Automation #339

Open
Buwujiu opened this issue Mar 29, 2023 · 0 comments
Why do we need it?

The DCC can run sanity checks at this point, for example to make sure submitter IDs are consistent.
Future note: we need validation between clinical and SONG to make sure IDs are consistent.
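A rough sketch of what the ID consistency check could look like, assuming both sides can be reduced to plain lists of submitter IDs. The function name and the shape of the result are made up for illustration; nothing here reflects an existing clinical or SONG API.

```python
def find_id_mismatches(clinical_ids, song_ids):
    """Compare submitter IDs from the two sources (hypothetical helper).

    Returns the IDs present on one side but missing on the other,
    sorted so the output is stable for a human reviewer.
    """
    clinical = set(clinical_ids)
    song = set(song_ids)
    return {
        "missing_in_song": sorted(clinical - song),
        "missing_in_clinical": sorted(song - clinical),
    }


def ids_are_consistent(clinical_ids, song_ids):
    """True only when both sources contain exactly the same IDs."""
    mismatches = find_id_mismatches(clinical_ids, song_ids)
    return not any(mismatches.values())
```

A reviewer could run this before approving an analysis and hold back anything that shows up in either list.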

We need a function that enables data quality review before workflow automation kicks off.
The first step is simply to give admins a chance to review submitted data before it is sent to automation. We don't want submitted data to start the automated pipeline immediately; instead, there should be a step for human review and approval.

Detailed Description

  • We don't want Model-T to be an officially supported tool that all RDPCs rely on. Google Sheets should not be a required data manager for our RDPCs.
  • Instead, we would add a new page to the Workflow UI that lets us view submitted analyses and the workflows run on them. From this view, a data manager for the RDPC can approve each analysis to begin processing (via a button in the UI).
  • The data manager will sometimes need to change the params. To support this, we could give access to the resume/restart API through this UI.
  1. We need a list of analyses submitted to SONG, showing the workflows that have been run on them (option 1: update the analysis index to include wf data; option 2: query wf by analysis ID). If none have been run, we need buttons to start the wf.
  2. To start a wf on an analysis, we need to change the API to send a message to a new topic (the message carries the full analysis info), so the ingest node receives the same data as it would from the song_analysis topic.
  3. Do we want to include info such as whether the analysis is validated? For a simple solution, as long as the message is sent, we assume the analysis is validated.
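To make item 1 concrete, here is a minimal sketch of option 2 (look up workflows by analysis ID and join the result in the view layer). The field names `analysis_id` and `run_id`, and the `can_start` flag that would drive the start button, are assumptions for illustration and not the actual index or API schema.

```python
from collections import defaultdict


def build_review_rows(analyses, workflow_runs):
    """Join submitted analyses with the workflow runs recorded for them.

    `analyses` and `workflow_runs` are lists of dicts with hypothetical
    keys; each output row says which runs exist for an analysis and
    whether the UI should offer a "start" button.
    """
    runs_by_analysis = defaultdict(list)
    for run in workflow_runs:
        runs_by_analysis[run["analysis_id"]].append(run["run_id"])

    rows = []
    for analysis in analyses:
        analysis_id = analysis["analysis_id"]
        run_ids = runs_by_analysis.get(analysis_id, [])
        rows.append(
            {
                "analysis_id": analysis_id,
                "run_ids": run_ids,
                # No workflow has been run yet, so show the start button.
                "can_start": not run_ids,
            }
        )
    return rows
```

Option 1 (putting wf data into the analysis index) would move this join to index time; the rows the UI needs would look the same either way.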

Question:

  1. What is being reviewed?
  • Once users submit data, the flow is pre-alignment QC -> alignment. We are skipping pre-align QC because the previous 25K has high-quality data.
  • Alignment to VC can be fully automatic.
  2. What is pre-align QC?
  • It is almost done; we need to upgrade the Nextflow version and test.
  • The next step is to integrate pre-align QC into the pipeline.
  3. Once we have pre-align QC and data passes QC, will we need a manual step to kick off automation?
  • No.
  4. What we need is: if QC fails, we need to let submitters know. We also need:
  • Until we have the QC, we need to disconnect the ingest node from the song analysis topic.

    • step 1: config change only, no code change.
      • rdpc-infra
    • step 2: ability to start the wf pipeline after review
      • We need a new API as part of the wf API, so that an RDPC admin can request that an analysis ID be sent to the automation system.
      • The endpoint will get the analysis info from SONG and send it to a Kafka topic, and the ingest node will listen to this topic. The switch is also a config change.
      • Mutation to Inject Song Analysis into A kafka Topic song-search#83
    • step 3: ability to review submitted files, so we can do the pre-align check manually.
      • Needs research: Investigate the Ability to List Analysis and WF song-search#82
      • The end goal is a UI that shows submitted analyses (seq exp) and which wf, if any, have been run on them.
      • The question is whether our current gateway API lets us query this info:
        • list submitted analyses
        • wf on these analyses
      • If the API doesn't provide it, we need to implement it.
  5. What's the process right now?
  • We are skipping pre-align QC for POC-CA.
  • Check SONG for newly submitted data, look for seq exp, and check whether there is a T/N pair. If so, alignment can start.
    • step 4: future work after the API work is done. What do you need on the UI?
      • start automation button (will replace Model-T)
      • UI to list analysis info and show the wf on these analyses.
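For step 2, the approval path could be sketched roughly as below. This is only an illustration under assumptions: `fetch_analysis` and `publish` are injected callables standing in for a SONG lookup and a Kafka producer, and the topic name `song_analysis_reviewed` is made up. The real endpoint is tracked in song-search#83.

```python
import json

# Hypothetical topic name; the real one would come from config.
REVIEW_TOPIC = "song_analysis_reviewed"


def approve_analysis(analysis_id, fetch_analysis, publish, topic=REVIEW_TOPIC):
    """Send one reviewed analysis to the topic the ingest node listens on.

    fetch_analysis(analysis_id) -> dict or None  (stand-in for a SONG call)
    publish(topic, payload_bytes)                (stand-in for a Kafka producer)
    """
    analysis = fetch_analysis(analysis_id)
    if analysis is None:
        raise LookupError(f"analysis {analysis_id} not found")

    # Send the full analysis so the ingest node sees the same data it
    # would have received from the song_analysis topic.
    payload = json.dumps(analysis, sort_keys=True).encode("utf-8")
    publish(topic, payload)
    return payload
```

Keeping the SONG lookup and the producer as parameters means the switch between the direct topic and the reviewed topic stays a config change on the ingest side, as described in step 2.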

What we know:

  1. Pre-align QC adds a new WF before alignment. It doesn't exist yet.
  2. Pre-align QC will produce reports; if the WF passes a certain threshold, the pipeline can proceed.
  3. We need something simple to use until pre-align QC is ready.
  4. There will be more QC reports: post-alignment and post-VC.
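The threshold gate in item 2 might look something like the sketch below. The metric names and minimum values are placeholders; which metrics the pre-align QC reports will contain, and what the thresholds should be, has not been decided here.

```python
def qc_passes(report, thresholds):
    """Check a QC report against minimum thresholds (hypothetical shape).

    `report` maps metric name -> measured value.
    `thresholds` maps metric name -> minimum acceptable value.
    A metric missing from the report counts as a failure.
    Returns (passed, failed_metric_names).
    """
    failures = [
        name
        for name, minimum in thresholds.items()
        if report.get(name, float("-inf")) < minimum
    ]
    return (not failures, failures)
```

Returning the list of failed metrics alongside the pass/fail flag gives a starting point for reporting QC failures back to submitters.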

Future Step:
  1. How do we report QC failures?

Architecture: https://drive.google.com/file/d/1JI9dO-Hx098CvUFj-W4y6_46up2BRPF0/view?usp=sharing

@Buwujiu Buwujiu added the Epic label Mar 29, 2023