
Variant Representation

Eric T. Dawson edited this page Apr 1, 2017 · 2 revisions

A unified representation of variants

In vg, variants of any size are represented in almost exactly the same way. It helps to start with the reference, which is a Path in the graph: a string of nodes connected by edges in a linear fashion.

Any variants relative to the reference path create bubbles within the graph. A bubble represents alternative paths between two nodes in the graph.
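As a rough illustration of the bubble idea, here is a minimal Python sketch. It is a toy model, not vg's actual data structures: the node ids, sequences, and the helper names `paths_between` and `spell` are all made up for the example. A single-base substitution appears as two alternative paths between the same pair of flanking nodes.

```python
# Toy graph: node id -> sequence. This is NOT vg's data model, only an illustration.
nodes = {1: "GATT", 2: "A", 3: "C", 4: "ACA"}
# Directed edges between node ids.
edges = {(1, 2), (1, 3), (2, 4), (3, 4)}
# The reference is a linear walk through the graph.
reference_path = [1, 2, 4]


def paths_between(start, end, edges):
    """Enumerate all simple paths from start to end with a depth-first search."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            found.append(path)
            continue
        for nxt in sorted(successors.get(path[-1], [])):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append(path + [nxt])
    return sorted(found)


def spell(path, nodes):
    """Concatenate the node sequences along a path."""
    return "".join(nodes[n] for n in path)


# The bubble between nodes 1 and 4 holds the reference and the alternate allele.
bubble = paths_between(1, 4, edges)
alleles = [spell(p, nodes) for p in bubble]
```

In this sketch the reference path is just one of the paths through the bubble; the other path spells the variant allele.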

Paths, Snarls, and SnarlTraversals

Paths in vg are annotations on the Nodes of the graph. They may include mismatches, insertions, and deletions relative to the base pairs of each node. Paths are permitted to be empty (e.g. in the case of a flat genomic deletion); however, empty paths are not useful in practice because they provide no coordinate reference into the graph.

Snarls and SnarlTraversals solve the problem of empty paths. Snarls are defined by an entrance node, an exit node, and a collection of internal child Nodes or even nested internal child Snarls.

Snarls let us represent a deletion as its flanking reference nodes plus an empty set of internal nodes, which makes the deletion locatable in the graph. Snarls can represent most variants we expect to see in the graph, though not the complete universe of variants that Paths can.
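The following Python sketch shows, under simplifying assumptions, why a traversal with no interior nodes is still locatable. The `Snarl` and `SnarlTraversal` classes here are hypothetical stand-ins written for this example; they borrow the names from the text but do not reproduce vg's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Snarl:
    """Hypothetical stand-in: an entrance node, an exit node, and interior contents."""
    start: int
    end: int
    child_nodes: List[int] = field(default_factory=list)
    child_snarls: List["Snarl"] = field(default_factory=list)


@dataclass
class SnarlTraversal:
    """Hypothetical stand-in: one walk through a snarl, listing interior nodes only."""
    snarl: Snarl
    visits: List[int] = field(default_factory=list)

    def full_walk(self) -> List[int]:
        # The flanking nodes anchor the traversal even when visits is empty.
        return [self.snarl.start] + self.visits + [self.snarl.end]


def spell(walk: List[int], nodes: Dict[int, str]) -> str:
    """Concatenate node sequences along a walk."""
    return "".join(nodes[n] for n in walk)


# Toy graph: node 2 is the base that the deletion removes.
nodes = {1: "GATT", 2: "A", 4: "ACA"}
snarl = Snarl(start=1, end=4, child_nodes=[2])

ref_traversal = SnarlTraversal(snarl, visits=[2])
del_traversal = SnarlTraversal(snarl, visits=[])  # deletion: no interior nodes

ref_allele = spell(ref_traversal.full_walk(), nodes)
del_allele = spell(del_traversal.full_walk(), nodes)
```

An empty Path would carry no node ids at all, whereas the empty traversal above still names nodes 1 and 4, so the deletion has a position in the graph.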

Flat vs. Regular ("parsed" or "aligned") alleles

Flat alleles require that every allele in the graph be represented by at least one base. This allows us to label deletions rather than using empty paths.
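One way to picture the difference is the sketch below. It assumes alleles written VCF-style with a shared leading anchor base, and it uses prefix trimming as a simplified stand-in for aligning the alternate allele to the reference; neither assumption comes from this page, and the function names `parse_alleles` and `is_flat` are invented for the example.

```python
def parse_alleles(ref, alt):
    """Trim bases shared at the start of ref and alt.

    This is a simplified stand-in for alignment: it only removes a common prefix.
    """
    shared = 0
    while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
        shared += 1
    return ref[shared:], alt[shared:]


def is_flat(alleles):
    """Check the flat-allele requirement: every allele has at least one base."""
    return all(len(allele) >= 1 for allele in alleles)


# A one-base deletion written with an anchor base: "TA" -> "T".
flat = ("TA", "T")
# After trimming the shared "T", the alternate allele becomes empty.
parsed = parse_alleles(*flat)
```

Here the flat form keeps a base on both alleles, while the parsed form leaves the deletion allele empty, which is the situation the flat representation avoids.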
