Releases: opentargets/baseline-expression
v2.1: Fix for missing AdaTiSS scores
In some cases, AdaTiSS returns no results for a particular gene, even though the gene was part of the input file.
This probably has mundane causes (such as very low coverage, or coverage that is too uniform) and will be investigated later. For now, to avoid schema autodetection issues, the adatissScores section is removed from records for which AdaTiSS returned no data.
A total of 3105 out of 28619 genes are affected.
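The removal step described above could look roughly like the following. This is a minimal sketch, not the repository's actual code: the function name drop_empty_adatiss is hypothetical, and it assumes each record is a plain dict with adatissScores nested under expressionSpecificity, as in the v2 notes.

```python
from typing import Any


def drop_empty_adatiss(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` without an empty or missing-data adatissScores section.

    Hypothetical helper: assumes adatissScores lives under expressionSpecificity.
    """
    cleaned = dict(record)
    specificity = cleaned.get("expressionSpecificity")
    if isinstance(specificity, dict) and not specificity.get("adatissScores"):
        # Drop the key entirely so that schema autodetection never sees an empty section.
        cleaned["expressionSpecificity"] = {
            key: value for key, value in specificity.items() if key != "adatissScores"
        }
    return cleaned


record = {
    "ensemblGeneId": "ENSG00000001561",
    "expressionSpecificity": {"gini": 0.299, "adatissScores": []},
}
print(drop_empty_adatiss(record))
```

Records that do carry AdaTiSS scores pass through unchanged, and the input record is not modified in place.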
v2: AdaTiSS scores, UBERON codes, updated schema
Full code changes can be viewed here: v1...v2.
Expression section schema changes
Compared to v1, the data schema was changed to make it easier to extend in the future. Specifically, the old expressionFpkm section:
"expressionFpkm": {
"Brodmann (1909) area 24": 6.0,
"breast": 6.0,
...
}
was replaced with the more general expression section:
"expression": [
{"bodyPartLevel": "tissue", "bodyPartId": "UBERON:0006101", "bodyPartName": "Brodmann (1909) area 24", "fpkm": 6.0},
{"bodyPartLevel": "tissue", "bodyPartId": "UBERON:0000310", "bodyPartName": "breast", "fpkm": 6.0},
...
]
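To illustrate the schema change, the conversion from the v1 layout to the v2 layout can be sketched as below. The function convert_expression and the name_to_id lookup are hypothetical names introduced for illustration; the sketch assumes every entry is at the "tissue" level and that an identifier is available for every tissue name.

```python
from typing import Any


def convert_expression(
    expression_fpkm: dict[str, float], name_to_id: dict[str, str]
) -> list[dict[str, Any]]:
    """Convert a v1-style expressionFpkm mapping into a v2-style expression list."""
    return [
        {
            "bodyPartLevel": "tissue",  # assumption: v1 data only contains tissues
            "bodyPartId": name_to_id[name],
            "bodyPartName": name,
            "fpkm": fpkm,
        }
        for name, fpkm in expression_fpkm.items()
    ]


old_section = {"Brodmann (1909) area 24": 6.0, "breast": 6.0}
ids = {"Brodmann (1909) area 24": "UBERON:0006101", "breast": "UBERON:0000310"}
for entry in convert_expression(old_section, ids):
    print(entry)
```

Storing one object per body part means that further fields can be added to each entry later without changing the shape of the section.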
Added AdaTiSS scores
The new expressionSpecificity → adatissScores section contains the AdaTiSS Z-scores for the specificity of expression in each tissue. It follows the format:
"adatissScores": [
{"bodyPartLevel": "tissue", "bodyPartName": "Brodmann (1909) area 24", "bodyPartId": "UBERON:0006101", "adatissScore": 0.123},
{"bodyPartLevel": "tissue", "bodyPartName": "breast", "bodyPartId": "UBERON:0000310", "adatissScore": 0.123},
...
]
Added UBERON identifiers in addition to tissue names
Because GTEx metadata in the Expression Atlas release does not contain UBERON codes, these were mapped from tissue names using exact match in the UBERON OWL file.
Note, however, that two GTEx tissues are not contained in UBERON, because they are not normal healthy tissues but tissues specifically modified under laboratory conditions. Pending further discussion, these were mapped manually to the closest available normal tissue:
name_mapping['EBV-transformed lymphocyte'] = 'CL:0000945' # lymphocyte of B lineage
name_mapping['transformed skin fibroblast'] = 'CL:0002620' # skin fibroblast
Because in those two cases the mapped term does not exactly match the tissue in GTEx, both bodyPartName and bodyPartId were retained in the output schema for the time being.
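The mapping logic described in this section can be sketched as an exact-match lookup with a manual fallback. This is an illustration under stated assumptions: uberon_labels stands in for a label-to-identifier table extracted from the UBERON OWL file (the extraction itself is not shown), and map_tissue is a hypothetical name. Only the two manual entries come from the release notes.

```python
# Manual overrides for the two GTEx entries that have no exact UBERON match.
MANUAL_MAPPING = {
    "EBV-transformed lymphocyte": "CL:0000945",   # lymphocyte of B lineage
    "transformed skin fibroblast": "CL:0002620",  # skin fibroblast
}


def map_tissue(name: str, uberon_labels: dict[str, str]) -> str:
    """Return an ontology identifier for a GTEx tissue name.

    An exact label match is tried first; the manual overrides are used only
    when there is no exact match.
    """
    if name in uberon_labels:
        return uberon_labels[name]
    if name in MANUAL_MAPPING:
        return MANUAL_MAPPING[name]
    raise KeyError(f"No ontology term found for tissue: {name!r}")


# Hypothetical excerpt of labels taken from the UBERON OWL file.
uberon_labels = {
    "Brodmann (1909) area 24": "UBERON:0006101",
    "breast": "UBERON:0000310",
}
print(map_tissue("breast", uberon_labels))
print(map_tissue("EBV-transformed lymphocyte", uberon_labels))
```

Raising an error for unmapped names, rather than returning an empty value, makes any tissue that still lacks an identifier visible immediately.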
v1: Initial release
Initial release of the revamped baseline expression data.
- 53 tissues from old GTEx data
- 28619 genes with expression ≥ 1 TPM for at least one tissue
- Flat list of tissues (no ontology structure information)
- FPKM expression data for all tissues
- Two sets of expression specificity metrics
  - Overall: Gini coefficient
  - Categorical: HPA specificity and distribution metrics
Example record for this release:
{
"ensemblGeneId": "ENSG00000001561",
"expressionFpkm": {
"Brodmann (1909) area 24": 6.0,
"breast": 6.0,
"caudate nucleus": 4.0,
...
},
"expressionSpecificity": {
"gini": 0.299,
"hpaSpecificity": "Low tissue specificity",
"hpaDistribution": "Detected in many"
}
}
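For readers unfamiliar with the gini field, the sketch below shows one common way to compute a Gini coefficient over a gene's per-tissue expression values. The release notes do not state which variant or normalisation the pipeline uses, so this formula is an assumption and will not necessarily reproduce the published values. The function name gini is hypothetical.

```python
def gini(values: list[float]) -> float:
    """Gini coefficient of non-negative values.

    Returns 0.0 for perfectly uniform expression and approaches 1.0 when
    expression is concentrated in a single tissue.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    ordered = sorted(values)
    # Rank-weighted sum with 1-based ranks over the ascending values.
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([6.0, 6.0, 4.0]))
print(gini([0.0, 0.0, 0.0, 1.0]))
```

A value near 0 indicates broadly uniform expression across tissues, and a value near 1 indicates expression restricted to very few tissues.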