Merge branch 'dev' into il-pics-drop-nulls
Showing 26 changed files with 277 additions and 125 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,6 @@ | ||
codecov: | ||
  branch: dev | ||
|
||
comment: | ||
  layout: "reach, diff, flags, files" | ||
  behavior: default | ||
|
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
_target_: otg.finngen_studies.FinnGenStudiesStep | ||
finngen_study_index_out: ${datasets.finngen_study_index} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
_target_: otg.finngen_sumstat_preprocess.FinnGenSumstatPreprocessStep | ||
raw_sumstats_path: ??? | ||
out_sumstats_path: ??? |
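Both step configs rely on two conventions from Hydra and OmegaConf: `_target_` names the class to instantiate by its dotted path, and `???` marks a value that must be supplied before the step can run. The sketch below is a simplified, standard-library stand-in for that behaviour, not Hydra's implementation. `instantiate_from_config` is a hypothetical name, and `fractions.Fraction` is used only because it is an importable target that accepts keyword arguments.

```python
from __future__ import annotations

import importlib
from typing import Any

MISSING = "???"  # marker for a mandatory value that has not been set


def instantiate_from_config(config: dict[str, Any]) -> Any:
    """Import the `_target_` callable and call it with the remaining keys."""
    params = {key: value for key, value in config.items() if key != "_target_"}
    # Refuse to build the object while any mandatory value is still unset.
    unset = sorted(key for key, value in params.items() if value == MISSING)
    if unset:
        raise ValueError(f"Missing mandatory values: {', '.join(unset)}")
    # Split "package.module.ClassName" into module path and attribute name.
    module_path, _, attribute = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attribute)
    return target(**params)


# Stand-in target (assumption): any importable callable taking keyword arguments.
half = instantiate_from_config(
    {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2}
)
print(half)
```

In this commit the two `???` paths are supplied per job through `step.raw_sumstats_path=...` and `step.out_sumstats_path=...` overrides in the harmonisation DAG.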
11 changes: 11 additions & 0 deletions
docs/python_api/datasource/eqtl_catalogue/_eqtl_catalogue.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
title: eQTL Catalogue | ||
--- | ||
|
||
The [eQTL Catalogue](https://www.ebi.ac.uk/eqtl/) aims to provide uniformly processed gene expression and splicing Quantitative Trait Loci (QTLs) from all available public studies on humans. | ||
|
||
It is our primary source of eQTLs for colocalization and target prioritization. | ||
|
||
We use data from the following study within the eQTL Catalogue: | ||
|
||
1. **GTEx v8**, 49 tissues |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,25 @@ | ||
--- | ||
title: Chromatin intervals | ||
title: Interaction and Interval-based Studies | ||
--- | ||
|
||
# Chromatin intervals | ||
# List of Interaction and Interval-based Studies | ||
|
||
TBC | ||
This section lists interaction and interval-based studies that link regulatory elements to genes. | ||
|
||
1. **Promoter Capture Hi-C (Javierre et al., 2016):** | ||
_Title:_ "Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters". | ||
This study links genetic variation to genes by applying Promoter Capture Hi-C to 17 human primary hematopoietic cell types. The method captures interactions between promoters and distal regulatory elements, giving a view of three-dimensional chromatin architecture. DOI: 10.1016/j.cell.2016.09.037 | ||
|
||
2. **Enhancer-TSS Correlation (Andersson et al., 2014):** | ||
_Title:_ "An Atlas of Active Enhancers across Human Cell Types and Tissues". | ||
This study explores genetic variation's impact on genes by examining the correlation between the transcriptional activity of enhancers and transcription start sites. The findings are documented in the FANTOM5 CAGE expression atlas, offering a comprehensive view of the regulatory landscape. DOI: 10.1038/nature12787 | ||
|
||
3. **DHS-Promoter Correlation (Thurman et al., 2012):** | ||
_Title:_ "The accessible chromatin landscape of the human genome". | ||
Investigating genetic variation's connection to genes, this study employs the correlation of DNase I hypersensitive sites (DHS) and gene promoters. The analysis spans 125 cell and tissue types from the ENCODE project, providing a broad understanding of the regulatory interactions across diverse biological contexts. DOI: 10.1038/nature11232 | ||
|
||
4. **Promoter Capture Hi-C (Jung et al., 2019):** | ||
_Title:_ "A compendium of promoter-centered long-range chromatin interactions in the human genome". | ||
This study catalogues promoter-centered long-range chromatin interactions in the human genome. By focusing on the three-dimensional organization of chromatin, it clarifies how the spatial arrangement of genetic elements relates to gene regulation. DOI: 10.1038/s41588-019-0494-8 | ||
|
||
For in-depth details on each study, you may refer to the respective publications. |
This file was deleted.
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
title: FinnGen Studies | ||
--- | ||
|
||
::: otg.finngen_studies.FinnGenStudiesStep |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
--- | ||
title: FinnGen Preprocess Summary Stats | ||
--- | ||
|
||
::: otg.finngen_sumstat_preprocess.FinnGenSumstatPreprocessStep |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
"""Airflow DAG for the harmonisation part of the pipeline.""" | ||
from __future__ import annotations | ||
|
||
import re | ||
import time | ||
from pathlib import Path | ||
from typing import Any | ||
|
||
import common_airflow as common | ||
from airflow.decorators import task | ||
from airflow.models.dag import DAG | ||
from airflow.providers.google.cloud.operators.gcs import GCSListObjectsOperator | ||
|
||
CLUSTER_NAME = "otg-finngen-harmonisation" | ||
AUTOSCALING = "gwascatalog-harmonisation"  # same as GWAS Catalog harmonisation | ||
SUMMARY_STATS_BUCKET_NAME = "finngen-public-data-r10" | ||
RELEASEBUCKET = "gs://genetics_etl_python_playground/output/python_etl/parquet/XX.XX" | ||
SUMSTATS_PARQUET = f"{RELEASEBUCKET}/summary_statistics/finngen" | ||
|
||
with DAG( | ||
    dag_id=Path(__file__).stem, | ||
    description="Open Targets Genetics - FinnGen harmonisation", | ||
    default_args=common.shared_dag_args, | ||
    **common.shared_dag_kwargs, | ||
): | ||
    # List raw summary statistics files from the FinnGen public bucket | ||
    list_inputs = GCSListObjectsOperator( | ||
        task_id="list_raw_sumstats", | ||
        bucket=SUMMARY_STATS_BUCKET_NAME, | ||
        prefix="summary_stats", | ||
        match_glob="**/*.gz", | ||
    ) | ||
|
||
    # Submit jobs to dataproc | ||
    @task(task_id="submit_jobs") | ||
    def submit_jobs(**kwargs: Any) -> None: | ||
        """Submit jobs to dataproc. | ||
|
||
        Args: | ||
            **kwargs (Any): Keyword arguments. | ||
        """ | ||
        ti = kwargs["ti"] | ||
        todo = ti.xcom_pull(task_ids="list_raw_sumstats", key="return_value") | ||
        print("Number of jobs to submit: ", len(todo))  # noqa: T201 | ||
        for i, input_path in enumerate(todo): | ||
            # Pause every 399 submissions to stay under the default quota of 400 jobs per minute | ||
            if i > 0 and i % 399 == 0: | ||
                time.sleep(60) | ||
            # Escape the dot and anchor the pattern so only a trailing ".gz" is stripped | ||
            match_result = re.search(r"summary_stats/finngen_(.*)\.gz$", input_path) | ||
            if match_result: | ||
                study_id = match_result.group(1) | ||
                print("Submitting job for study: ", study_id)  # noqa: T201 | ||
                common.submit_pyspark_job_no_operator( | ||
                    cluster_name=CLUSTER_NAME, | ||
                    step_id="finngen_sumstat_preprocess", | ||
                    other_args=[ | ||
                        f"step.raw_sumstats_path=gs://{SUMMARY_STATS_BUCKET_NAME}/{input_path}", | ||
                        f"step.out_sumstats_path={SUMSTATS_PARQUET}/{study_id}.parquet", | ||
                    ], | ||
                ) | ||
|
||
    # list_inputs >> | ||
    ( | ||
        list_inputs | ||
        >> common.create_cluster( | ||
            CLUSTER_NAME, | ||
            autoscaling_policy=AUTOSCALING, | ||
            num_workers=8, | ||
            # num_preemptible_workers=8, | ||
            master_machine_type="n1-highmem-32", | ||
            worker_machine_type="n1-standard-2", | ||
        ) | ||
        >> common.install_dependencies(CLUSTER_NAME) | ||
        >> submit_jobs() | ||
        >> common.delete_cluster(CLUSTER_NAME) | ||
    ) |