v0.17.0 never complete #262

Open
jiadong324 opened this issue Aug 5, 2024 · 5 comments

@jiadong324

Hi,

I am using both v0.13.0 and v0.17.0 to align the same set of query and reference sequences with the same parameters. The size of ref.fa is about 12G and query.fa is 46G.

wfmash -s 50k -l 150k -p 90 -n 1 -H 0.001 -t 30 ref.fa query.fa > wfmash.paf

v0.13.0 finished in about 1.5 hours. The *.paf from v0.17.0 is approximately the same size as the output generated by v0.13.0. It seems that v0.17.0 has stopped writing output, but the program is still running. It is stuck at 99.94% completion of the sequence alignment.

[mashmap] MashMap v3.1.1
[mashmap] Reference = [../g2_fa/g2_asm.fa]
[mashmap] Query = [../g3_fa/g3_asm.fa]
[mashmap] Kmer size = 19
[mashmap] Sketch size = 2998
[mashmap] Segment length = 50000 (read split allowed)
[mashmap] Block length min = 150000
[mashmap] Chaining gap max = 30000
[mashmap] Mappings per segment = 1
[mashmap] Percentage identity threshold = 90%
[mashmap] Do not skip self mappings
[mashmap] Hypergeometric filter w/ delta = 0 and confidence 0.999
[mashmap] Mapping output file = /net/eichler/vol28/projects/medical_reference/nobackups/human_SDR/DP_Platinum/wfmash_g2g3/wfmash-9VW7x7
[mashmap] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[mashmap] Execution threads  = 30
[mashmap::skch::Sketch::build] Unique minmer hashes before pruning = 138308619
[mashmap::skch::Sketch::build] Total minmer windows before pruning = 1390109507
[mashmap::skch::Sketch::computeFreqHist] Frequency histogram of minmer interval points = (2, 4297961) ... (3863160, 1)
[mashmap::skch::Sketch::computeFreqHist] With threshold 0.001%, ignore minmers with more than >= 30412 interval points during mapping.
[mashmap::skch::Sketch] Unique minmer hashes after pruning = 138307236
[mashmap::skch::Sketch] Total minmer windows after pruning = 1307505552
[wfmash::map] time spent computing the reference index: 427.234 sec
[mashmap::skch::Map::mapQuery] mapped  -nan% @ 0.00e+00 bp/s elapsed: 00:01:38:23 remain: 00:00:00:00
[mashmap::skch::Map::mapQuery] count of mapped reads = 1847, reads qualified for mapping = 2132, total input reads = 2132, total input bp = 0
[wfmash::map] time spent mapping the query: 5.91e+03 sec
[wfmash::map] mapping results saved in: /net/eichler/vol28/projects/medical_reference/nobackups/human_SDR/DP_Platinum/wfmash_g2g3/wfmash-9VW7x7
[wfmash::align] Reference = [../g2_fa/g2_asm.fa]
[wfmash::align] Query = [../g3_fa/g3_asm.fa]
[wfmash::align] Mapping file = /net/eichler/vol28/projects/medical_reference/nobackups/human_SDR/DP_Platinum/wfmash_g2g3/wfmash-9VW7x7
[wfmash::align] Alignment identity cutoff = 0.00%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 0.08 sec
[wfmash::align::computeAlignments] aligned 99.94% @ 2.58e+05 bp/s elapsed: 01:20:46:26 remain: 00:00:01:36
@miliasot

Hello,
Some of my jobs also get stuck with v0.21.0.

@ekg
Collaborator

ekg commented Oct 16, 2024 via email

@ASLeonard
Contributor

ASLeonard commented Oct 17, 2024

The issue is probably a bit deeper than that. I can provide a small-ish example that always triggers the freeze.

The first thing to note is that changing --erode-match-mismatch allowed wfmash to finish. It froze with the default setting (erode_k=127) and for erode_k down to around 45, but finished when erode_k < 40.

After chasing lambdas, I found that it gets stuck in an infinite loop in do_progressive_wfa_patch_alignment(), where the updated alignment seems to be identical to the original alignment, so the break statements are never triggered. For simplicity, only two iterations are shown here.

WFA fwd alignment: Alignment: Query(2663892-2665099/1207) Target(2685261-2686470/1209) Score=2147483647 Rev=No Status=NotOK Keep=No CIGAR= Indices(i,j)=(2685261,2663892)
WFA rev alignment: Alignment: Query(0-0/0) Target(0-0/0) Score=2147483647 Rev=Yes Status=NotOK Keep=No CIGAR= Indices(i,j)=(0,0)
bounds: 1207 1219 1209 1221
left_query_size: 1207 left_target_size: 1209
right_query_size: 0 right_target_size: 0
max_left_size: 1209 max_right_size: 0
WFA fwd alignment: Alignment: Query(2663892-2665099/1207) Target(2685261-2686470/1209) Score=2147483647 Rev=No Status=NotOK Keep=No CIGAR= Indices(i,j)=(2685261,2663892)
WFA rev alignment: Alignment: Query(0-0/0) Target(0-0/0) Score=2147483647 Rev=Yes Status=NotOK Keep=No CIGAR= Indices(i,j)=(0,0)
bounds: 1207 1219 1209 1221
left_query_size: 1207 left_target_size: 1209
right_query_size: 0 right_target_size: 0
max_left_size: 1209 max_right_size: 0

Since both alignments are not okay and the scores are identical, it never triggers either branch:

    if (rev_aln.ok && (!aln.ok || rev_aln.score < aln.score)) {
        alignments.push_back(rev_aln);
    } else if (aln.ok) {
        alignments.push_back(aln);
        if (alignments.size() == 1) {
            break;
        }
    }

Nothing seems to get updated later on, and so the while loop just continues through forever. I'm not sure if the alignment score is coincidentally the INT32 max, but when using a small erode_k that did finish, all the scores looked normal.
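To make the stall easier to reason about, here is a minimal, self-contained sketch of the branch logic quoted above, with one extra case added for when neither alignment is usable. The `Aln` struct, `select_alignment()` and `example()` are hypothetical names invented for illustration; this is not wfmash's code, and the added guard is only one possible way to leave the loop, not a confirmed fix.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for the alignment record printed in the debug log.
// Only the two fields used by the branch logic are modelled.
struct Aln {
    bool ok;
    int score;
};

// Mirrors the two branches quoted in the thread and returns true when the
// caller's while loop should stop. The final `return true` is the added,
// hypothetical guard: if both attempts failed, retrying with unchanged
// bounds cannot make progress.
bool select_alignment(const Aln& aln, const Aln& rev_aln, std::vector<Aln>& alignments) {
    if (rev_aln.ok && (!aln.ok || rev_aln.score < aln.score)) {
        alignments.push_back(rev_aln);
        return false;  // assumed: the real loop continues after this branch
    } else if (aln.ok) {
        alignments.push_back(aln);
        return alignments.size() == 1;
    }
    return true;  // neither alignment is OK: stop instead of spinning
}

// Usage example with the values seen in the log output.
void example() {
    std::vector<Aln> alignments;
    const int failed = std::numeric_limits<int>::max();  // 2147483647
    // Both alignments NotOK, as in the repeated iterations above.
    assert(select_alignment(Aln{false, failed}, Aln{false, failed}, alignments));
    assert(alignments.empty());
}
```

Without the final guard, the case where both `ok` flags are false pushes nothing and breaks nothing, which matches the repeating log output.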

The actual sequences being processed in do_wfa_patch_alignment() are

query NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTT 
target CCTTCTTTATGGTCAATTTCTCACATGGACAGGGAGGCCTCGCGTGCATGGGATCGCAAAGAGTCGGACACGACTGAGCGACTGATCTGATCTCACATGGATACATGATTACTGGAAAAAAACGCAGCTTTAACTATAAGGACCTTTGTCAGCAAAGTGATATGTCTGCTTTTTAACACATTGTGTCTAGGTTTGTCATAGCTCCTCTTCCAAGGAGCAAGTGTCTTTTAATTTCATGGCTGCAGTCACCATCCACAGTGCTTTTGAAGCCCAAGAAAATAAAATTTGTCAGTTTCCACTTTTTCCCCATCTATTTACCATGAAGTGATGGTACCGGATGCCATGATCTCAGTTTTTTTGTGAATGTTGAGTTTAAAGCCAGGTTTTTCACTCTCCTGTTGCACTCTCATTAGTGCCTCTTTAGTTCCTCTTCACTTTCTGGTATTAGAGTGGTGTCAGCCTGCATAGCCATGAAGTGAAAGTTGTTCAGTCATGTCCAACTCTTTGTGACACCATGAACTATACAGTCCATGGAATTTTCCAGGCCAGAATACTACAGTGGATAGCCTTTCCCTTCTCAAGGGAATCTTCCCAACCTAGCGATCGAACCCAGGTCTCCCATATTGCAGGTAGATCCTTTACCAGCTGAGCTGCAAGGGCAGCATATCTGTGGTTGTCCATATTTCTCCTGGCAATTTTAATTCCAGCTTGTGATTTATCTGGACATTTCACATGACTTATTCTGCATATAAGTTAAATAAGCAGGGTGACAATATACAGCTTTCATGTACTCCTTTTCGAAATTGAATCAGTCCATTGTTTCTTGTAAGGTTCTAACTGTTGCTTCTTGACCTGCACACAGGTTTTTTCAGGAGACAGGTAAAGTATTCTGGTATTCCCATCTTTTTTTAAATTCCAAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGAGAGAGAGAGAGAGAGAGATCCACACAGTCAAAGGATTTAGTGCAGTCATTGAAACAGAAATAGATTTTTTTTTTTTTTTTTTGGAATTCCCTTGATTTTTCTATGATCCAGTGGATGTTGGCAATTTGATCTCTGGTTTCTCTGCCTTTTCTAAATCCAGCTTGTACATCTGGCAATTCCTGGTTCACATAATGCTGAAGCCTAGCTTCAAGGATTTTGAGCATAATCTTAACAGCATCTGAAATGAGTGCAATTG

Interestingly, the reported forward score (aln.score) here is -2147483648, so maybe there is also some underflow error since the score is reported as 2147483647 later on.

I guess this is just some painful edge case where a patch has just enough non-N sequence to progress but too much N to give a real score?

@ASLeonard
Contributor

After some more digging, the very first iteration for that patch looks different. The forward alignment score seems to have underflowed, but the reverse alignment looks okay. However, because the reverse score is below the INT32-max forward score, the loop ends up in the infinite cycle shown above.

score is -2147483648
WFA fwd alignment: Alignment: Query(2663892-2665111/1219) Target(2685261-2686482/1221) Score=2147483647 Rev=No Status=NotOK Keep=No CIGAR= Indices(i,j)=(2685261,2663892)
WFA rev alignment: Alignment: Query(2663892-2665111/1219) Target(2685261-2686482/1221) Score=2480 Rev=Yes Status=OK Keep=No CIGAR=MXMMDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDMMMMMXMMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII Indices(i,j)=(2685261,2663892)

@ASLeonard
Contributor

It seems like there was a check against the forward score being -2147483648, but that was removed in b663abe during #255.
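For reference, a check of that kind could look roughly like the sketch below. The removed code from b663abe is not shown in this thread, so this is not a reconstruction of it; `kFailedScore`, `to_penalty()`, `is_failed_score()` and `example()` are hypothetical names, and the assumption that raw scores are negated into penalties is mine. The point is only that the INT32 minimum has to be handled before any negation, since `-INT_MIN` overflows a 32-bit int.

```cpp
#include <cassert>
#include <limits>

// Hypothetical sentinel for "no usable alignment" (2147483647 in the log).
constexpr int kFailedScore = std::numeric_limits<int>::max();

// Assumption: the aligner reports a raw score that is negated to get a
// penalty. INT_MIN is treated as a failure marker and never negated,
// because negating it is undefined behaviour for a 32-bit int.
int to_penalty(int raw_score) {
    if (raw_score == std::numeric_limits<int>::min()) {
        return kFailedScore;
    }
    return -raw_score;
}

// True when a penalty carries the failure sentinel rather than a real score.
bool is_failed_score(int penalty) {
    return penalty == kFailedScore;
}

// Usage example with the two scores seen in the first iteration.
void example() {
    assert(is_failed_score(to_penalty(-2147483647 - 1)));  // forward: -2147483648
    assert(to_penalty(-2480) == 2480);                      // assumed raw value for the reverse score
}
```

A caller could test `is_failed_score()` before comparing forward and reverse scores, so that a sentinel is never treated as an ordinary, very large score.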
