Mapping speed improvement #791
base: master
Conversation
Hi Sebastian, thanks a lot for this PR! This is definitely yet another part of the code that needs improvement. Cheers
As a note here, I reach the limit all the time when aligning Ribo-seq reads to tRNA databases from tRNA-scan. Using a small tRNA database of 80 kb:
I get thousands of these lines in the log. I know I could join the genome with the tRNA database, or align to the genome before the tRNA database, but I prefer it this way, as it is logically much simpler. Alignment speed on these small databases is terrible, since the SA index must be something like 4, and it tries hard to map reads to the small tRNA database. So if I get this right, this will make it even slower for me?
Hi @Roleren, thanks for the feedback. This could be a useful test case for some of the more unusual scenarios where the patch might be detrimental. Have you actually tried it, or is this speculation?
Note that my post talks about
The easiest and most reliable answer to this question is to run a benchmark. Do you want to give my patch a try? Or can you share the tRNA database and a test sample with me? Then I could benchmark this myself. Regards,
This is the run for current STAR: tRNA file (80 kB, 400 sequences of rat mature tRNAs):
Seeing these terrible stats, I should maybe just change the way I do this instead of testing your commit :D
Hi Alex,

This is a proposal for a mapping speed improvement. It is independent of the improvement suggested by alexey0308 and boosts throughput by 10-25%, depending on the operating system, compiler version, and CPU. I benchmarked it on two systems with human data:

I could not benchmark it on a Mac. That might be of interest, however, because `memset` is supposedly very efficient on macOS.

Here are the essentials: `stitchPieces` wastes a lot of time resetting the `winBin` array using `memset`. With every call, it rewrites ~200 kB of memory, although typically only a few hundred bytes of the array are accessed. I replaced `winBin` with a "smart" vector that only resets the elements that are actually accessed (`FastResetVector`). You pay a little overhead with every element access, because upon each access the vector needs to check whether that element is stale, but you avoid rewriting the whole vector. This pays off when the vector is sparse (as is the case with `winBin`) and when there are not too many accesses to it. Stated the other way around: this implementation would be detrimental to performance if many elements of `winBin` were written or read. According to my measurements, there are typically only a few hundred. I'm curious, though, whether there could be a scenario where `winBin` is heavily accessed. I noticed that `alignWindowsPerReadNmax` is set to `10000` by default. Under what circumstances would this limit be reached? In my tests it barely even reached 100, and only for a few reads. Let me know what you think. I lack the overview to judge whether my suggested patch would be infeasible in some settings.

There is a second (minor) code optimization: the `operator[]()` function of `PackedArray` seems to be a code hot spot. The bit shifting in this line turns out to be the most expensive part: `a1 = ((a1>>S)<<wordCompLength)>>wordCompLength;`. It can easily be replaced with a bit-mask operation. I did not really expect it to make a difference, but it does. The effect is a bit hard to measure, but taking the median of ten runs of STAR with and without the optimization confirmed a 1-2% performance improvement.

Regards,
Sebastian