Technology migration for better restartability
Pre-release
Version 2.0 of the EVA pipeline will move from Luigi to Spring Batch. Instead of tracking the progress of each step as a whole (done / not done), Spring Batch splits the work into chunks of configurable size. This way, if a step fails after processing millions of variants, it can resume from that point instead of restarting from the beginning.
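The idea behind chunk-based restartability can be sketched in a few lines. This is only a conceptual illustration in plain Python, not Spring Batch code and not the pipeline's implementation: the `run_step` function, the `checkpoint` dictionary and its `next_index` key are hypothetical names, and a real job would keep its progress in a persistent store rather than in memory.

```python
from typing import Callable, Dict, List


def run_step(items: List[str],
             process: Callable[[str], None],
             checkpoint: Dict[str, int],
             chunk_size: int = 3) -> None:
    """Process items chunk by chunk, recording progress after each chunk.

    `checkpoint` is a hypothetical stand-in for persistent job metadata.
    """
    # Resume from the position committed by a previous (possibly failed) run.
    start = checkpoint.get("next_index", 0)
    for chunk_start in range(start, len(items), chunk_size):
        chunk = items[chunk_start:chunk_start + chunk_size]
        for item in chunk:
            # If this raises, the current chunk is never committed.
            process(item)
        # The whole chunk succeeded, so record the new position.
        checkpoint["next_index"] = chunk_start + len(chunk)
```

If `process` fails partway through a chunk, a second call to `run_step` with the same `checkpoint` starts again at the beginning of that chunk and skips everything committed before it. Only the unfinished chunk is redone, so a smaller chunk size means less repeated work after a failure, at the cost of more frequent progress updates.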
The functionality implemented for this first beta includes:
- Normalization of variants reported in a VCF file
- Storage of variants in MongoDB
- Calculation of allele frequencies and other statistics for all the samples in a VCF file
- Annotation using Ensembl Variant Effect Predictor
Future beta releases will include support for population statistics via a PED file and improved usability.
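As a rough illustration of the allele frequency calculation listed above, the sketch below counts alleles across the genotype calls of one variant. It assumes genotypes are given as VCF-style `GT` strings (allele indices separated by `/` or `|`, with `.` for a missing call); the `allele_frequencies` function is a hypothetical helper and does not reflect how the pipeline itself computes statistics.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def allele_frequencies(genotypes: Iterable[str]) -> Dict[str, float]:
    """Return the frequency of each allele index among the called alleles."""
    counts: Counter = Counter()
    for genotype in genotypes:
        # Split on the unphased ("/") or phased ("|") separator.
        for allele in re.split(r"[/|]", genotype):
            if allele != ".":  # skip missing calls
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # no called alleles, so no frequencies to report
    return {allele: n / total for allele, n in counts.items()}


# Four samples, one of them with a missing genotype.
print(allele_frequencies(["0/0", "0/1", "1|1", "./."]))
# prints {'0': 0.5, '1': 0.5}
```

Missing calls are excluded from the denominator here, which is one common convention; other choices are possible.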