Commit

fix(eqtl): correct pattern that extracts geneid in study index
ireneisdoomed committed Dec 21, 2023
1 parent 3e372a4 commit 14e9f65
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/otg/datasource/eqtl_catalogue/study_index.py
@@ -160,7 +160,7 @@ def add_gene_to_study_id(
             # Explode the list of full study IDs into separate rows
             .withColumn("studyId", f.explode("fullStudyIdList"))
             # Add geneId column
-            .withColumn("geneId", f.regexp_extract(f.col("studyId"), r"(.*)_[^_]+", 1))
+            .withColumn("geneId", f.regexp_extract(f.col("studyId"), r"([^_]+)$", 1))
             .drop("fullStudyIdList")
         )
         return StudyIndex(_df=study_index_df, _schema=StudyIndex.get_schema())

1 comment on commit 14e9f65

@ireneisdoomed
Contributor Author

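The previous pattern did not extract the geneId from the studyId: it captured everything before the last underscore instead of the final segment. In the example below, originalGeneId is what the old pattern returned and proposedGeneId is what my suggested pattern returns: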

+-----------------------+----------------+--------------+
|studyId                |originalGeneId  |proposedGeneId|
+-----------------------+----------------+--------------+
|PROJECT_QTLGROUP_GENEID|PROJECT_QTLGROUP|GENEID        |
+-----------------------+----------------+--------------+
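
The two patterns can also be compared outside Spark. The following is a minimal sketch using Python's built-in `re` module; it assumes Python's regex engine behaves like Spark's Java-based `regexp_extract` for these simple patterns. `extract_group` is a hypothetical helper, not part of the codebase, that mimics `regexp_extract` by returning an empty string when nothing matches.

```python
import re

# Patterns taken from the diff: the old one and the corrected one.
OLD_PATTERN = r"(.*)_[^_]+"
NEW_PATTERN = r"([^_]+)$"


def extract_group(pattern: str, study_id: str) -> str:
    """Hypothetical stand-in for regexp_extract(..., pattern, 1).

    Returns capture group 1, or an empty string if the pattern does not match.
    """
    match = re.search(pattern, study_id)
    return match.group(1) if match else ""


# Placeholder study ID with the same shape as the example in the table.
study_id = "PROJECT_QTLGROUP_GENEID"

# Old pattern: greedy (.*) captures everything before the last underscore.
original_gene_id = extract_group(OLD_PATTERN, study_id)

# New pattern: captures the run of non-underscore characters at the end.
proposed_gene_id = extract_group(NEW_PATTERN, study_id)

print(original_gene_id, proposed_gene_id)
```

The new pattern anchors on the end of the string, so it only depends on the gene ID being the last underscore-separated segment and on the gene ID itself containing no underscores.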
