Commit

add logo to readme
ekg committed Jul 18, 2022
1 parent 0690737 commit 37e9ded
Showing 1 changed file with 8 additions and 6 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -1,5 +1,7 @@
# pggb

![PanGenome Graph Builder](https://raw.githubusercontent.com/pangenome/pggb/master/data/images/pggb-logo-rounded.png)

![Publish container to github container registry](https://github.com/pangenome/pggb/workflows/Publish%20container%20to%20github%20container%20registry/badge.svg)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](https://anaconda.org/bioconda/pggb)

@@ -22,7 +24,7 @@ If you have many genomes, we recommend using the [PanSN prefix naming pattern](h
To build a graph from `input.fa`, which contains 9 haplotypes, in the directory `output`, scaffolding the graph using 10kb matches at >= 90% identity, and using 16 parallel threads for processing, execute:

```
pggb \
-i input.fa \
-o output \
-t 16 \
```
@@ -249,22 +251,22 @@ Although its design represents efforts to scale these approaches to collections
It's straightforward to generate a pangenome graph by the all-pairs alignment of a set of input sequences.
This scales poorly, since the number of pairwise alignments grows quadratically with the number of sequences, but it has ideal sensitivity.
The mashmap/wfa alignment algorithm in `wfmash` is a very fast way to generate alignments between the sequences.
Crucially, it is robust to repetitive sequences (the initial mash mapping step is linear in the space of the genome
irrespective of its sequence context), and it can be adjusted using probabilistic thresholds for segment alignment identity.
This allows us to define the base graph structure using a few free parameters: we consider the best-n candidate alignments
for each N-bp segment, where the alignments must have at least a given identity threshold.
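The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not `wfmash` code: the `Mapping` record, the `select_mappings` function, and the sample values are all hypothetical, and the identity is assumed to be a pre-computed estimate in the range 0 to 1.

```python
from collections import defaultdict
from typing import NamedTuple


class Mapping(NamedTuple):
    """A hypothetical candidate mapping of one query segment to a target."""
    segment_id: int   # index of the N-bp query segment
    target: str       # name of the target sequence
    identity: float   # estimated identity in [0, 1]


def select_mappings(candidates, n_best, min_identity):
    """Keep up to n_best mappings per segment that meet the identity threshold."""
    by_segment = defaultdict(list)
    for m in candidates:
        if m.identity >= min_identity:
            by_segment[m.segment_id].append(m)
    kept = []
    for segment_id in sorted(by_segment):
        # Rank the surviving candidates of this segment from best to worst.
        ranked = sorted(by_segment[segment_id], key=lambda m: m.identity, reverse=True)
        kept.extend(ranked[:n_best])
    return kept


# Made-up candidates for two segments against three haplotypes.
candidates = [
    Mapping(0, "hapA", 0.99),
    Mapping(0, "hapB", 0.93),
    Mapping(0, "hapC", 0.85),
    Mapping(1, "hapA", 0.91),
    Mapping(1, "hapB", 0.97),
    Mapping(1, "hapC", 0.95),
]
selected = select_mappings(candidates, n_best=2, min_identity=0.90)
```

With these inputs, segment 0 keeps its mappings to `hapA` and `hapB` (the `hapC` candidate falls below the threshold), and segment 1 keeps its two highest-identity mappings.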

The wfa-based alignments can break down in the case of large indels, yielding ambiguous and difficult-to-interpret alignments.
However, we should not use such regions of the alignments directly in graph construction, as this can increase graph complexity.
We ignore such regions by preventing `seqwish` from closing the graph through matches less than `-k, --min-match-len` bp.
In effect, this filter on the input to `seqwish` forces structural variations and regions of very low identity to be
represented as bubbles. This reduces the local topological complexity of the graph at the cost of increasing its redundancy.
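As a rough illustration of the minimum match length filter, the sketch below walks an alignment and keeps only exact-match runs of at least `min_match_len` bp. It is not how `seqwish` is implemented; the function name, the list-of-pairs CIGAR representation, and the example alignment are assumptions made for this example.

```python
def filter_short_matches(cigar, min_match_len):
    """Return (query_start, target_start, length) for exact-match runs >= min_match_len.

    cigar is a list of (op, length) pairs using '=' (match), 'X' (mismatch),
    'I' (insertion in the query) and 'D' (deletion from the query).
    """
    q = t = 0  # current offsets in the query and the target
    kept = []
    for op, length in cigar:
        if op == "=":
            if length >= min_match_len:
                kept.append((q, t, length))
            q += length
            t += length
        elif op == "X":
            q += length
            t += length
        elif op == "I":
            q += length
        elif op == "D":
            t += length
        else:
            raise ValueError(f"unsupported CIGAR operation: {op!r}")
    return kept


# A made-up alignment: two long matches separated by a noisy region and a 300 bp insertion.
cigar = [("=", 50), ("X", 1), ("=", 8), ("I", 300), ("=", 5), ("D", 2), ("=", 40)]
matches = filter_short_matches(cigar, min_match_len=19)
```

Only the 50 bp and 40 bp runs survive. The short matches around the insertion are dropped, so in a graph built from the surviving matches that region would stay open as a bubble rather than being closed through short, ambiguous matches.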

The manifold nature of typical variation graphs means that they are very likely to look linear locally.
By running a stochastic 1D layout algorithm that attempts to match graph distances (as given by paths) between nodes and
their distances in the layout, we execute a kind of multi-dimensional scaling (MDS). In the aggregate, we see that
regions that are linear (the chains of nodes and bubbles) in the graph tend to co-localize in the 1D sort.
Applying an MSA algorithm (in this case, `abPOA` or `spoa`) to each of these chunks enforces a local linearity and
homogenizes the alignment representation. This smoothing step thus yields a graph that is locally what we expect: partially
ordered and linear, as the base DNA molecules are, while globally it can still represent large structural variation. The homogenization
also rectifies issues with the initial wfa-based alignment.
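To make the 1D layout idea concrete, here is a minimal Python sketch of stochastic gradient descent that tries to match layout distances to path distances. It is a toy under stated assumptions and not the layout code used by pggb: the graph, the node names, the function names, and the step-size schedule (a capped step with weight 1/d²) are all chosen for illustration. The toy graph is a single bubble, where `n1` and `n2` are alternative alleles between `n0` and `n3`.

```python
import math
import random


def path_distance_terms(paths, node_length):
    """Collect (node_a, node_b, distance) for every pair of steps on each path."""
    terms = []
    for path in paths:
        offsets = []
        position = 0
        for node in path:
            offsets.append(position)
            position += node_length[node]
        for a in range(len(path)):
            for b in range(a + 1, len(path)):
                terms.append((path[a], path[b], offsets[b] - offsets[a]))
    return terms


def stress(layout, terms):
    """Sum of squared differences between layout distances and path distances."""
    return sum((abs(layout[a] - layout[b]) - d) ** 2 for a, b, d in terms)


def sgd_layout_1d(terms, layout, iterations=5000, eta_min=0.01, seed=0):
    """Refine a 1D layout in place by stochastic gradient descent on the stress."""
    rng = random.Random(seed)
    eta_max = max(d for _, _, d in terms) ** 2
    decay = math.log(eta_max / eta_min) / (iterations - 1)
    for step in range(iterations):
        eta = eta_max * math.exp(-decay * step)  # annealed learning rate
        a, b, d = rng.choice(terms)              # one random pair of path steps
        diff = layout[a] - layout[b]
        dist = abs(diff)
        direction = diff / dist if dist > 0 else 1.0
        mu = min(eta / (d * d), 1.0)             # capped step size, weight 1/d^2
        shift = mu * (dist - d) / 2.0 * direction
        layout[a] -= shift                       # move both nodes toward the
        layout[b] += shift                       # target distance d
    return layout


# Toy graph (hypothetical): two paths that differ only inside one bubble.
node_length = {"n0": 4, "n1": 1, "n2": 1, "n3": 5, "n4": 3}
paths = [["n0", "n1", "n3", "n4"], ["n0", "n2", "n3", "n4"]]
terms = path_distance_terms(paths, node_length)

init_rng = random.Random(1)
initial = {node: init_rng.random() * 1e-3 for node in node_length}  # small random jitter
layout = sgd_layout_1d(terms, dict(initial))
```

Because `n1` and `n2` never occur on the same path, no term constrains their distance to each other; both are pulled only toward the same offsets relative to `n0`, `n3`, and `n4`. Note that SGD in one dimension can stall in local minima, so a sketch like this is not guaranteed to recover the path order on every run or input.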