This subdirectory contains the Python and R notebooks that perform the bulk of the analysis for this project. Below is a description of what each notebook contains.
I numbered the notebooks based on their order in the analysis. Although order is important for some notebooks (e.g. an output is used in a downstream notebook), in others it is arbitrary.
-
process-variant-calls.Rmd
The goal of this notebook is to determine the quality of the variant calling data. In this notebook, I filter variants based on quality and combine variants across programs. I return a filtered and joined variant dataframe.
-
determine-main-genotypes.Rmd
The goal of this notebook is to phase mutations by clustering variants that have strongly correlated frequencies across multiple tissue samples. In this notebook, I identify the 'genotypes' Genome 1 and Genome 2 and the subclonal genome Genome 1-1. These are clusters of mutations that are present in every tissue.
-
phase-subclonal-mutations.Rmd
The goal of this notebook is to phase more clusters of variants using the same approach we used to define the 'genotypes' Genome-1 and Genome-2. In this notebook, I identify a rough set of subclonal haplotypes (or mutation clusters) on the background of G1 and G2.
-
assign-haplotype-backgrounds.Rmd
The goal of this notebook is to establish a method to genotype SNPs as either belonging to Genome 1 or to Genome 2 using reads that 'bridge' between haplotype SNPs and SNPs in Genome-1 and Genome-2. In this notebook, I determine the background of subclonal haplotypes.
-
validate-haplotype-assignments.Rmd
The goal of this notebook is to take the haplotypes that we identified and check them for problems such as homoplasy. In this notebook, I produce a final set of filtered and validated haplotypes.
-
prepare-spruce-input.Rmd
The goal of this notebook is to prepare the data for SPRUCE analysis. In this notebook, I calculate the mean haplotype frequency and variance across all samples for each haplotype. I format this data for SPRUCE.
-
filter-spruce-trees.Rmd
The goal of this notebook is to finalize the phylogenetic relationships among the haplotypes identified by SPRUCE/MACHINA. In this notebook, I filter the SPRUCE trees by removing trees with edges that are not supported by bridging reads.
-
investigate-driver-mutations.Rmd
The goal of this notebook is to determine how subclonal 'driver' mutations fit on the tree. I also look at the frequency of the two Fusion C-terminal tail mutations in the brain.
-
make-phylogenetic-tree.ipynb
The goal of this notebook is to use iqtree to make a phylogenetic tree of the haplotypes.
-
visualize-phylogenetic-tree.Rmd
The goal of this notebook is to visualize the iqtree with ggplot. In this notebook, I use ggtree to visualize the phylogenetic tree.
-
cluster-tissues-spatially.Rmd
The goal of this notebook is to take an unbiased approach to show how similar tissue compartments have similar viral populations. In this notebook, I use PCA on the frequency of SNVs in each tissue.
-
analyze-strand-origin.Rmd
The goal of this notebook is to see whether SNVs called on reads from positive-sense and negative-sense strands differ. This notebook also contains plots of the coverage of reads from positive- and negative-strand RNA.
-
clique-snv-analysis.Rmd
The goal of this notebook is to adopt an approach from CliqueSNV to see if G1/G1, G2/G2, and G1/G2 SNV pairs are linked or forbidden.
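
The frequency-correlation clustering described above for determine-main-genotypes.Rmd and phase-subclonal-mutations.Rmd can be illustrated with a small Python sketch. This is not the code from those notebooks: the function names (`pearson`, `cluster_variants`), the `min_r` threshold of 0.9, the use of connected components to form clusters, and the toy variant names and frequencies are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length frequency vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def cluster_variants(freqs, min_r=0.9):
    """Group variants whose frequency profiles correlate at or above min_r.

    freqs maps a variant name to its frequencies across tissues, with the
    tissues in the same order for every variant. Clusters are the connected
    components of the graph that links strongly correlated variant pairs
    (an assumed grouping rule, chosen for simplicity).
    """
    parent = {v: v for v in freqs}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in combinations(freqs, 2):
        if pearson(freqs[a], freqs[b]) >= min_r:
            parent[find(a)] = find(b)

    clusters = {}
    for v in freqs:
        clusters.setdefault(find(v), []).append(v)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: four variants measured in three tissues.
toy = {
    "A100G": [0.60, 0.40, 0.70],
    "C250T": [0.62, 0.41, 0.69],
    "G300A": [0.40, 0.60, 0.30],
    "T410C": [0.38, 0.58, 0.31],
}
clusters = cluster_variants(toy)
print(clusters)
```

In this toy input, the first two variants rise and fall together across tissues, as do the last two, so they fall into two separate clusters. Real data would likely need a threshold tuned to sequencing noise and a minimum number of tissues per variant.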