Releases: opentargets/baseline-expression

v2.1: Fix for missing AdaTiSS scores

07 Mar 18:26
41f4bad

In some cases, AdaTiSS doesn't return any results for a particular gene, even though it was part of the input file.

This is probably due to mundane reasons (such as very low coverage, or coverage that is too uniform) and will be investigated later. For now, to avoid schema autodetection issues, the adatissScores section is removed from records for which AdaTiSS returns no data.

A total of 3105 out of 28619 genes are affected.
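
The removal step described above could look roughly like the following. This is a minimal sketch, not the code used in the release: the function name `drop_empty_adatiss` is hypothetical, and it assumes each record is a plain dict with adatissScores nested under expressionSpecificity.

```python
def drop_empty_adatiss(record):
    """Return a copy of the record without an empty or missing adatissScores section.

    Hypothetical helper: assumes adatissScores lives under expressionSpecificity.
    """
    cleaned = dict(record)
    if "expressionSpecificity" not in record:
        return cleaned
    # Copy the nested section so the input record is left untouched.
    specificity = dict(record["expressionSpecificity"])
    if not specificity.get("adatissScores"):
        # Covers None and an empty list alike.
        specificity.pop("adatissScores", None)
    cleaned["expressionSpecificity"] = specificity
    return cleaned
```

Dropping the key entirely, rather than writing an empty list, keeps a schema autodetector from having to infer a type from an empty array.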

v2: AdaTiSS scores, UBERON codes, updated schema

07 Mar 17:17
36ea814

Full code changes can be viewed here: v1...v2.

Expression section schema changes

Compared to v1, the data schema was changed for easier extensibility in the future. Specifically, the old expressionFpkm section:

"expressionFpkm": { 
    "Brodmann (1909) area 24": 6.0,
    "breast": 6.0,
    ...
}

was replaced with the more general expression section:

"expression": [
    {"bodyPartLevel": "tissue", "bodyPartId": "UBERON:0006101", "bodyPartName": "Brodmann (1909) area 24", "fpkm": 6.0},
    {"bodyPartLevel": "tissue", "bodyPartId": "UBERON:0000310", "bodyPartName": "breast", "fpkm": 6.0},
    ...
]
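
For consumers migrating from v1, the change amounts to turning a name-to-value mapping into a list of records. A minimal sketch, assuming a hypothetical `name_to_id` dict that maps tissue names to ontology identifiers (the function name `convert_expression` is also illustrative):

```python
def convert_expression(expression_fpkm, name_to_id):
    """Convert a v1 expressionFpkm mapping into a v2-style expression list.

    name_to_id is an assumed dict from tissue name to ontology identifier.
    """
    return [
        {
            "bodyPartLevel": "tissue",
            "bodyPartId": name_to_id[name],
            "bodyPartName": name,
            "fpkm": fpkm,
        }
        for name, fpkm in expression_fpkm.items()
    ]
```

With a list of records, new per-tissue fields can be added later without changing the shape of the section.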

Added AdaTiSS scores

The new expressionSpecificity → adatissScores section contains the AdaTiSS Z-scores for the specificity of expression in each tissue. It follows the format:

"adatissScores": [
    {"bodyPartLevel": "tissue", "bodyPartName": "Brodmann (1909) area 24", "bodyPartId": "UBERON:0006101", "adatissScore": 0.123},
    {"bodyPartLevel": "tissue", "bodyPartName": "breast", "bodyPartId": "UBERON:0000310", "adatissScore": 0.123},
    ...
]
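
When reading these records, it can be convenient to index the scores by identifier. A minimal sketch with a hypothetical helper name, which also tolerates records where the adatissScores section is absent (as happens from v2.1 onwards):

```python
def adatiss_scores_by_id(record):
    """Map bodyPartId to adatissScore; returns an empty dict if the section is missing."""
    scores = record.get("expressionSpecificity", {}).get("adatissScores", [])
    return {entry["bodyPartId"]: entry["adatissScore"] for entry in scores}
```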

Added UBERON identifiers in addition to tissue names

Because GTEx metadata in the Expression Atlas release does not contain UBERON codes, these were mapped from tissue names using exact match in the UBERON OWL file.

Note, however, that two GTEx tissues are not contained in UBERON, because they represent tissues specifically modified under laboratory conditions rather than normal healthy tissues. Pending further discussion, these were mapped manually to the closest available normal tissue:

name_mapping['EBV-transformed lymphocyte'] = 'CL:0000945'  # lymphocyte of B lineage
name_mapping['transformed skin fibroblast'] = 'CL:0002620'  # skin fibroblast

Because in those two cases the UBERON term does not exactly match the GTEx tissue, both bodyPartName and bodyPartId are retained in the output schema for the time being.
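
The overall mapping logic can be sketched as follows. This is an illustration under stated assumptions rather than the actual pipeline code: `uberon_labels` is assumed to be a dict from term label to identifier that has already been extracted from the OWL file, and the function name `build_name_mapping` is hypothetical.

```python
def build_name_mapping(tissue_names, uberon_labels, manual_overrides):
    """Map tissue names to ontology IDs by exact label match, with manual overrides.

    uberon_labels: assumed dict of label -> ID, extracted from the OWL file beforehand.
    manual_overrides: dict of name -> ID for tissues with no exact match.
    Returns the mapping and the list of names that could not be mapped.
    """
    name_mapping = {}
    unmapped = []
    for name in tissue_names:
        if name in manual_overrides:
            # Manual choices take precedence over automatic matches.
            name_mapping[name] = manual_overrides[name]
        elif name in uberon_labels:
            name_mapping[name] = uberon_labels[name]
        else:
            unmapped.append(name)
    return name_mapping, unmapped
```

Returning the unmapped names separately makes it easy to spot tissues that need a manual decision.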

v1: Initial release

26 Feb 13:36
a0d737e

Initial release of the revamped baseline expression data.

  • 53 tissues from old GTEx data
  • 28619 genes with expression ≥ 1 TPM for at least one tissue
  • Flat list of tissues (no ontology structure information)
  • FPKM expression data for all tissues
  • Two sets of expression specificity metrics
    • Overall: Gini coefficient
    • Categorical: HPA specificity and distribution metrics
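
For reference, the Gini coefficient of a gene's expression across tissues can be computed as in the sketch below. It uses the standard textbook formula on sorted values; the release notes do not state which variant or normalisation the pipeline uses, so treat this as an assumption.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = uniform, near 1 = concentrated).

    Standard formula on values sorted in ascending order:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # No expression anywhere: treat as perfectly uniform.
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

A gene expressed equally in all tissues scores 0, while a gene expressed in a single tissue out of n scores (n - 1) / n.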

Example record for this release:

{
	"ensemblGeneId": "ENSG00000001561",
	"expressionFpkm": {
		"Brodmann (1909) area 24": 6.0,
		"breast": 6.0,
		"caudate nucleus": 4.0,
		...
	},
	"expressionSpecificity": {
		"gini": 0.299,
		"hpaSpecificity": "Low tissue specificity",
		"hpaDistribution": "Detected in many"
	}
}