Issues with vg call on graph generated from cactus #2546
Hi Andrea,
Basic support for cyclic paths was indeed added in 1.20 but I haven't had a
chance to test it enough. Are you able to share the inputs that are
crashing? I will try to fix it. Thanks!!
This is unrelated to the crash, but augment/pack/call can be run on
chromosome-sized (or genome-sized, if you're patient) graphs now, reducing
the need for chunk. I will post examples of how to do this soon.
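Until those examples are posted, here is a minimal sketch of what a run without chunking might look like. It is not the maintainers' example: it reuses only the vg flags that appear in the per-chunk pipeline quoted below, the file names are placeholders, and `run` and `call_whole_graph` are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch only: file names are placeholders, and every vg flag is copied from
# the per-chunk pipeline in this thread rather than checked against vg's help.

# Hypothetical helper: with DRY_RUN=1 print the command, otherwise run it.
# The first argument is the output file, or "-" for no redirection.
run() {
    out=$1; shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        if [ "$out" = "-" ]; then echo "$*"; else echo "$* > $out"; fi
    elif [ "$out" = "-" ]; then
        "$@"
    else
        "$@" > "$out"
    fi
}

# Augment, index, pack and call on the whole graph, with no vg chunk step.
call_whole_graph() {
    graph=$1; gam=$2; sample=$3; threads=${4:-4}
    prefix=${graph%.vg}
    run "$prefix.aug.vg" vg augment "$graph" "$gam" -t "$threads" -A "$prefix.aug.gam" &&
    run - vg index "$prefix.aug.vg" -x "$prefix.aug.xg" &&
    run - vg pack -x "$prefix.aug.xg" -g "$prefix.aug.gam" -Q 15 -o "$prefix.aug.pack" &&
    run "$prefix.vcf" vg call "$prefix.aug.xg" -s "$sample" -k "$prefix.aug.pack" -t "$threads"
}
```

Setting `DRY_RUN=1` before calling `call_whole_graph all.vg sample1.gam sample1 8` prints the four commands without running them, which is a cheap way to check the file names first. The `vcf-sort | bgzip` post-processing from the original pipeline is left out for brevity.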
On Mon, Nov 25, 2019 at 8:17 AM RenzoTale88 wrote:
Hello,
I'm writing again to ask for support with the vg call stage for some alignments.
I've generated cactus alignments for 5 different mammalian genomes, and
converted them to vg as described in issue #2514. After that, I proceeded as
follows for each sample analysed:
sample=sample1
# Map the reads to the graph
vg map -R $sample -N $sample -S 0 -u 1 -t 4 -x all.xg -g all.gcsa my_lib1_R1.fastq.gz my_lib1_R2.fastq.gz
# combine different libraries into one gam file
while read p; do cat ${p} && rm ${p}; done < mappedlist.txt | vg gamsort --threads $NSLOTS -p -i ALIGN/${sample}/${sample}.gam.gai - > ALIGN/${sample}/${sample}.gam
# Filter the alignments
vg filter ALIGN/${sample}/${sample}.gam -r 0.90 -fu -m 1 -q 15 -D 999 -x all.xg > FILTER/$sample/$sample.filtered.gam
# Chunk the gam
vg chunk -x /exports/cmvm/eddie/eb/groups/prendergast_roslin/Andrea/GraphGenomes/NewGraph_03112019/GRAPH_CACTUS/all.xg -a FILTER/${sample}/${sample}.sorted.gam -P ./LISTS/PATHS.txt -c 50 -g -s 2500000 -o 100000 -b FILTER/${sample}/${sample}_call_chunk -t ${NSLOTS} -E FILTER/${sample}/${sample}.chunklist -f
# augment every chunk
while read p; do
reads=$( echo $p | awk '{print $4}' )
bname=$(basename -s ".gam" $reads)
vg augment ./FILTER/$sample/${bname}.vg $reads -p -C -t $NSLOTS -A ./FILTER/$sample/${bname}.aug.gam > ./FILTER/$sample/${bname}.aug.vg
vg index ./FILTER/$sample/${bname}.aug.vg -x ./FILTER/$sample/${bname}.aug.xg
vg snarls ./FILTER/$sample/${bname}.aug.xg > ./FILTER/$sample/${bname}.aug.snarls
vg pack -x ./FILTER/$sample/${bname}.aug.xg -g ./FILTER/$sample/${bname}.aug.gam -Q 15 -o ./FILTER/$sample/${bname}.aug.pack
done < FILTER/${sample}/${sample}.chunklist
# Do variant call
while read p; do
chroms=$( echo $p | awk '{print $1}' )
bpi=$( echo $p | awk '{print $2}' )
bpe=$( echo $p | awk '{print $3}' )
reads=$( echo $p | awk '{print $4}' )
clength=$( awk -v var=$chroms '$1==var {print $2}' ./LISTS/lengths.txt )
bname=$(basename -s ".gam" $reads)
vg call ./FILTER/$sample/${bname}.aug.xg -s $sample -k ./FILTER/$sample/${bname}.aug.pack -t $NSLOTS -p $chroms -l $clength -o $bpi | vcf-sort | bgzip -c > ./VCALL/$sample/${bname}.vcf.gz
done < FILTER/${sample}/${sample}.chunklist
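As an aside, the repeated `awk` calls in the two loops above could be collapsed into a single `read`. The sketch below assumes the chunklist layout implied by those `awk` calls (chromosome, start, end, GAM path, separated by whitespace); `parse_chunklist` is a hypothetical helper, not part of vg.

```shell
#!/bin/sh
# Assumed layout, inferred from the awk calls above:
#   column 1 = path/chromosome name, 2 = start, 3 = end, 4 = chunk GAM file.
# Prints one tab-separated record per chunk: name, start, end, chunk basename.
parse_chunklist() {
    while read -r chrom start end reads _; do
        # Skip blank lines.
        [ -n "$chrom" ] || continue
        # POSIX form of `basename -s ".gam"`: strip the directory and the suffix.
        bname=$(basename "$reads" .gam)
        printf '%s\t%s\t%s\t%s\n' "$chrom" "$start" "$end" "$bname"
    done < "$1"
}
```

Each output record carries the same values the loops store in `chroms`, `bpi`, `bpe` and `bname`, so the loop body could consume them directly instead of re-parsing `$p` four times.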
I first ran all the stages using vg 1.19.0. The code works
fine up to the augmentation, then fails with the following error during vg
call:
what(): cyclic reference path (angus.1) not supported by caller
I've also tried to perform the augmentation and calling using vg 1.20.0,
and got the error attached.
stacktrace.txt <https://github.com/vgteam/vg/files/3886723/stacktrace.txt>
I've also tried more stringent filtering on each chunk, as specified in
#2474, but it didn't work.
Is there anything I can try or that I did wrong? I've tested the
mapping/calling pipeline on another graph genome, generated from vcf
instead of cactus, and it worked fine.
Thank you in advance for your help,
Andrea
|
Hi Glenn, PS one more detail: I get the error in the attached file when I use the vg graph, not the xg. With the xg the error is different (see below). Not sure if that is of any help.
|
Sure. My email's my username here @gmail.com. Making a link at https://transfer.sh/ may also work for chunk-sized files. |
Ok I've just shared the data through onedrive. Let me know if you manage to get them. Andrea |
This should be corrected once #2548 is merged. |
Just compiled vg and tested it, and it seems to work fine also on another sample that I chunked. Thank you very much again for your help! Andrea |
After making some more tests, I've seen that vg call now fails only with some chunks. This happens when I run the following commands:
I've tried to calculate the snarls on the augmented vg, pg and xg, without success. Is it related to the fact that I'm computing the snarls on the wrong dataset? Or do I have to tweak some parameters? Thanks again for the help, Andrea. PS: @glennhickey, if you want to test them, I've uploaded the data to a subfolder of the OneDrive folder I shared with you yesterday. |
It also crashes on
and looking for the only path from
and In summary, if chunking on a graph with multiple overlapping paths with Broken subpaths are themselves a bug, and would be resolved by #2506. |
So, I've changed my code so that it runs an additional step when computing the snarls. The code is now:
When I use this code, the software works fine and also calls the variants in other paths (path1.1, path2.1, path3.1 etc.). But if I use the augmented XG graph, it fails with the error attached. stacktrace_vg_crash_M8Hhch.txt Is that right? Does removing the paths when computing the snarls affect downstream analysis? Also, I've just seen that #2506 has been merged. If I compile the newest version of vg, will it fix the problem when chunking the datasets? Thanks again for the support, Andrea |
That looks okay. Those paths you are removing are broken, so their removal can only help. It will also save you memory, especially while working with #2506 is not merged. But if it were, it should fix this problem. But again, if you're not using the other paths then there's no harm removing them. |
Perfect, I'll proceed as recommended! |