Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Preprocess DAG following recent architecture updates #216

Merged
merged 12 commits into from
Nov 5, 2023

Conversation

tskir
Copy link
Contributor

@tskir tskir commented Nov 2, 2023

Closes opentargets/issues#3106.

  • Added three new functions to the common_airflow module: submit_step, read_yaml_config, and generate_dag.
  • Updated the ETL DAG to use these functions, making it even more readable and focused on logic.
  • Implemented a new separate DAG for the Preprocess part, as we discussed.
  • Made some refactoring changes specific to FinnGen ingestion.

@codecov-commenter
Copy link

codecov-commenter commented Nov 2, 2023

Codecov Report

Merging #216 (cc3a1e5) into tskir-3140-better-dag-import-fix (5a63093) will increase coverage by 0.10%.
The diff coverage is 86.84%.

Impacted file tree graph

@@                         Coverage Diff                          @@
##           tskir-3140-better-dag-import-fix     #216      +/-   ##
====================================================================
+ Coverage                             85.53%   85.64%   +0.10%     
====================================================================
  Files                                    80       81       +1     
  Lines                                  1839     1853      +14     
====================================================================
+ Hits                                   1573     1587      +14     
  Misses                                  266      266              
Files Coverage Δ
src/airflow/dags/common_airflow.py 100.00% <100.00%> (ø)
src/airflow/dags/dag_genetics_etl.py 100.00% <100.00%> (ø)
src/airflow/dags/dag_preprocess.py 100.00% <100.00%> (ø)
src/otg/datasource/finngen/__init__.py 100.00% <ø> (ø)
src/otg/datasource/finngen/summary_stats.py 100.00% <100.00%> (ø)
src/otg/finngen.py 65.38% <0.00%> (ø)

@d0choa
Copy link
Collaborator

d0choa commented Nov 2, 2023

@ireneisdoomed and I failed to make the (commented) test test_no_import_errors in tests/airflow/test_dag.py run. The test is meant to validate that the DAG will be imported into Airflow without errors. It used to work before the split that created the common_airflow.py script, but I think it's now having issues with the fact that it needs to import the other script.

It feels like this is a critical test to have from a CI/CD point of view.

@tskir
Copy link
Contributor Author

tskir commented Nov 2, 2023

Thank you for the comment @d0choa! I wasn't aware of this commented out test. I'll look into it now.

@d0choa
Copy link
Collaborator

d0choa commented Nov 2, 2023

I know this is not part of this PR, but I felt you might know better what the issue might be. I reproduced the issue in a python console:

In [1]: import airflow
In [3]: from airflow.models import DagBag
In [4]: DagBag(dag_folder='src/airflow/dags', include_examples=False)
[2023-11-02T11:08:01.788+0000] {dagbag.py:536} INFO - Filling up the DagBag from src/airflow/dags
/Users/ochoa/Library/Caches/pypoetry/virtualenvs/otgenetics-_BWSZ0mB-py3.10/lib/python3.10/site-packages/airflow/providers/google/cloud/links/dataproc.py:79 AirflowProviderDeprecationWarning: This DataprocLink is deprecated.
/Users/ochoa/Library/Caches/pypoetry/virtualenvs/otgenetics-_BWSZ0mB-py3.10/lib/python3.10/site-packages/airflow/providers/google/cloud/links/dataproc.py:128 AirflowProviderDeprecationWarning: This DataprocListLink is deprecated.
[2023-11-02T11:08:04.379+0000] {dagbag.py:346} ERROR - Failed to import: src/airflow/dags/dag_preprocess.py
Traceback (most recent call last):
  File "/Users/ochoa/Library/Caches/pypoetry/virtualenvs/otgenetics-_BWSZ0mB-py3.10/lib/python3.10/site-packages/airflow/models/dagbag.py", line 342, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "src/airflow/dags/dag_preprocess.py", line 8, in <module>
    from . import common_airflow as common
ImportError: attempted relative import with no known parent package
[2023-11-02T11:08:04.384+0000] {dagbag.py:346} ERROR - Failed to import: src/airflow/dags/dag_genetics_etl.py
Traceback (most recent call last):
  File "/Users/ochoa/Library/Caches/pypoetry/virtualenvs/otgenetics-_BWSZ0mB-py3.10/lib/python3.10/site-packages/airflow/models/dagbag.py", line 342, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "src/airflow/dags/dag_genetics_etl.py", line 8, in <module>
    from . import common_airflow as common
ImportError: attempted relative import with no known parent package
Out[4]: <airflow.models.dagbag.DagBag at 0x108dbccd0>

@tskir
Copy link
Contributor Author

tskir commented Nov 2, 2023

@d0choa Thank you for the commands! I've created an issue to address this: opentargets/issues#3140

@tskir tskir force-pushed the tskir-3106-ingest-finngen branch from afb0f92 to 1427f5f Compare November 2, 2023 11:30
@tskir tskir changed the base branch from main to tskir-3140-fix-dag-test November 2, 2023 11:32
Base automatically changed from tskir-3140-fix-dag-test to main November 2, 2023 11:35
@tskir
Copy link
Contributor Author

tskir commented Nov 2, 2023

@d0choa I've now rebased this branch on top of the recently merged PR, so that we can take advantage of the re-enabled test to check these new changes (it passes here as well)

@tskir tskir force-pushed the tskir-3106-ingest-finngen branch from 1427f5f to 4490d5a Compare November 2, 2023 11:41
@ireneisdoomed
Copy link
Contributor

@tskir Thanks! This fix is relevant if you want to test this PR #219

@tskir tskir force-pushed the tskir-3106-ingest-finngen branch from 96b1051 to 029fd5d Compare November 2, 2023 15:11
@tskir tskir requested a review from ireneisdoomed November 2, 2023 15:28
@tskir
Copy link
Contributor Author

tskir commented Nov 2, 2023

@ireneisdoomed Thank you for the fix! I've applied it and rebased this PR on top of it. Autoscaling PR is coming up separately as it's out of scope for this one 🙃

@tskir tskir force-pushed the tskir-3106-ingest-finngen branch from 9faafc3 to 104f020 Compare November 2, 2023 17:09
@tskir tskir changed the base branch from main to tskir-3140-better-dag-import-fix November 2, 2023 17:12
@ireneisdoomed ireneisdoomed deleted the branch tskir-3140-better-dag-import-fix November 3, 2023 09:15
@ireneisdoomed ireneisdoomed reopened this Nov 3, 2023
Copy link
Contributor

@ireneisdoomed ireneisdoomed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good.
Just a comment outside the scope of this PR: I hadn't realised that Finngen's study table generation is coupled to SS processing. Right now Finngen's table is static, and we wouldn't need to produce one unless there's a new version out. However we haven't dealt with trait mappings yet, and this process could break the static nature of this step.

src/airflow/dags/dag_preprocess.py Show resolved Hide resolved
src/otg/finngen.py Show resolved Hide resolved
Copy link
Collaborator

@d0choa d0choa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @ireneisdoomed comments.
Let's try to merge this and we keep moving

@tskir
Copy link
Contributor Author

tskir commented Nov 5, 2023

Thank you @d0choa @ireneisdoomed for constructive and thought-provoking comments, as usual!

@tskir tskir merged commit 90e7d44 into tskir-3140-better-dag-import-fix Nov 5, 2023
1 check passed
@tskir tskir deleted the tskir-3106-ingest-finngen branch November 5, 2023 07:13
@tskir tskir restored the tskir-3106-ingest-finngen branch November 5, 2023 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement FinnGen Airflow DAG under the new architecture
4 participants