Speed improvements for sampling #25
Comments
Hey!
Try to use scipy.sparse.lil_matrix instead of scipy.sparse.csr_matrix and I
believe you'll get another speedup.
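Something like this is what I mean; a minimal sketch with made-up sizes and spans, just to show the construction pattern (n_rows, n_cols and spans are placeholders, not from your code):

from scipy.sparse import lil_matrix

# Toy sizes and per-label (start, end) row spans, purely illustrative
n_rows, n_cols = 10, 3
spans = [(0, 4), (3, 7), (6, 9)]

# Fill incrementally in LIL format (cheap to change the sparsity structure)...
indM = lil_matrix((n_rows, n_cols), dtype=float)
for i, (start_int, end_int) in enumerate(spans):
    indM[start_int:end_int, i] = 1.

# ...then convert once to CSR for fast arithmetic and slicing
indM = indM.tocsr()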
Tue, 5 Mar 2019 at 10:45, Feras <[email protected]>:
The code in the book for estimating uniqueness and building the indicator matrix
is quite crude and assumes a relatively small number of signals (and/or bars).
The major slowdown most likely comes from large memory usage and the resulting
swapping. Switching to sparse matrices fixes the problem even for large numbers
of signals and bars:
import numpy as np
from scipy.sparse import csr_matrix
from tqdm import tqdm

def getIndMatrixSparse(barIx, t1):
    # Sparse indicator matrix: rows are bars, columns are labels
    rows = barIx[(barIx >= t1.index[0]) & (barIx <= t1.max())]
    cols = t1
    indM = csr_matrix((len(rows), len(cols)), dtype=float)
    with tqdm(total=len(cols)) as pbar:
        for i, (start, end) in enumerate(cols.items()):
            # mark the bars spanned by label i
            start_int = rows.searchsorted(start)
            end_int = rows.searchsorted(end)
            indM[start_int:end_int, i] = 1.
            pbar.update(1)
    return indM
def getAvgUniquenessSparse(indM):
    # Average uniqueness from the indicator matrix
    c = indM.sum(axis=1)                    # concurrency: labels active at each bar
    u = csr_matrix(indM.multiply(1 / c))    # uniqueness; multiply returns COO, so convert back to CSR
    avgU = u[u > 0].mean()                  # average uniqueness over the non-zero entries
    return avgU
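For reference, a tiny synthetic example of how it can be called (the data here is made up purely to exercise the functions, not taken from the notebook):

import pandas as pd

# Hypothetical toy inputs: barIx is the bar timestamp index, t1 maps each
# label's start time (index) to its end time (value)
barIx = pd.date_range('2019-01-01', periods=10, freq='min')
t1 = pd.Series(barIx[[4, 7, 9]], index=barIx[[0, 3, 6]])

indM = getIndMatrixSparse(barIx, t1)
# bars spanned by no label give a harmless divide-by-zero warning in 1/c
print(getAvgUniquenessSparse(indM))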
I can open a PR, although I can't rerun the notebook, so I'm not sure where and
how to add it. I couldn't find a way to divide a csr matrix directly; multiplying
by 1/c gives a coo_matrix, so it needs another conversion to csr to do the mean
calculation. If someone is better versed in scipy.sparse, I'm happy to improve on
it. Right now it does the avgU calculation on a roughly 800K x 10K indicator
matrix in about 30 ms on my laptop.
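One possible way to avoid the COO round trip mentioned above would be to fold 1/c into a sparse diagonal matrix, so the product stays in CSR; a rough sketch (the zero-concurrency handling here is a guess, not something from the book):

import numpy as np
from scipy.sparse import diags

c = np.asarray(indM.sum(axis=1)).ravel()                      # concurrency per bar
inv_c = np.divide(1.0, c, out=np.zeros_like(c), where=c > 0)  # 0 where no label is active
u = diags(inv_c) @ indM                                       # diagonal @ csr stays sparse
avgU = u.data[u.data > 0].mean()                              # mean over stored positive entries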
What's the intuition behind it? Because from the docs it reads …
and the vast majority of the code I shared is basically either a column-slicing op or an arithmetic op (in getAvgU).
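One way to check whether the fill pattern (column-slice assignments) actually favours LIL is to time it under both formats; a rough, self-contained sketch with arbitrary sizes:

import time
import warnings
from scipy.sparse import csr_matrix, lil_matrix

def fill(mat, n_cols=200, span=50):
    # assign a short run of ones into each column, mimicking the indicator-matrix fill
    for i in range(n_cols):
        mat[i:i + span, i] = 1.

for fmt in (csr_matrix, lil_matrix):
    m = fmt((1000, 200), dtype=float)
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')   # csr_matrix emits SparseEfficiencyWarning on such assignments
        t0 = time.time()
        fill(m)
    print(fmt.__name__, time.time() - t0)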