unicodecsv is kind of slow; but maybe unavoidable? #46

NelsonMinar · 2015-02-26T02:03:33Z

Thank you so much for unicodecsv, it's been a big help for me in Python2. Not to sound ungrateful, but...

unicodecsv seems fairly slow. Some benchmarking suggests it's about 5-6x slower than the plain Py2 csv module. Of course it's doing more work, decoding bytes to strings! But for comparison the Py3 csv module (which does decoding) is only 2-3x slower than Py2. Is there room for improvement in unicodecsv?

I did some profiling and code reading and didn't see any obvious way unicodecsv could be made faster. So maybe there's no real way to optimize it. But I wanted to file the issue both to document what I learned and to get a second opinion.

My benchmark code and results are at https://nelsonslog.wordpress.com/2015/02/26/python-csv-benchmarks/
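For readers who want to try a comparison of this kind themselves, here is a minimal timing sketch. It is not the benchmark from the post linked above: the helper names (`make_sample`, `count_rows`, `best_time`) and the sample data are made up for illustration, and it only exercises the Python 3 standard-library csv module on in-memory text.

```python
import csv
import io
import timeit


def make_sample(rows=1000, cols=5):
    """Build in-memory CSV text with some non-ASCII content (hypothetical data)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for i in range(rows):
        writer.writerow(["caf\u00e9 %d" % i] * cols)
    return buf.getvalue()


def count_rows(text):
    """Parse the CSV text with the stdlib reader and count the rows."""
    return sum(1 for _ in csv.reader(io.StringIO(text)))


def best_time(func, arg, repeat=3, number=5):
    """Return the best per-call time in seconds over several timing runs."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    sample = make_sample()
    print("stdlib csv: %.6f s per pass" % best_time(count_rows, sample))
```

Another reader implementation could be timed by passing a different function in place of `count_rows`; taking the minimum over several runs reduces noise from other processes.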

kengruven · 2015-02-28T04:40:31Z

I don't see the file you were using, but for the 1M-line CSV file I was playing with today, I found that the isinstance() calls in UnicodeReader#next were taking around 50% of the runtime. And unless the dialect requests QUOTE_NONNUMERIC, that check is never going to hit, since that is the only mode in which the underlying csv reader returns non-string (float) fields.

I've submitted a pull request, /pull/47, which avoids the isinstance() call when the dialect doesn't use QUOTE_NONNUMERIC. It's still about 3x slower than the built-in (ASCII) 'csv' module, but it's significantly faster than before.
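A rough sketch of the idea, not the actual code from the pull request: pick the row-decoding function once, based on the quoting mode, so the per-field isinstance() check only runs when floats can actually appear. The function names below are hypothetical, and the rows are plain lists of bytes standing in for what the Python 2 csv reader would return.

```python
import csv


def decode_row_checked(row, encoding="utf-8"):
    # Slow path: QUOTE_NONNUMERIC can yield floats, so test every field.
    return [field if isinstance(field, float) else field.decode(encoding)
            for field in row]


def decode_row_fast(row, encoding="utf-8"):
    # Fast path: every field is a byte string, so decode unconditionally.
    return [field.decode(encoding) for field in row]


def make_row_decoder(quoting):
    # Choose the decoder once per reader rather than checking per field.
    if quoting == csv.QUOTE_NONNUMERIC:
        return decode_row_checked
    return decode_row_fast


decode = make_row_decoder(csv.QUOTE_MINIMAL)
print(decode([b"caf\xc3\xa9", b"1"]))
```

The trade-off is a second code path to maintain in exchange for skipping one function call per field in the common case.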

NelsonMinar · 2015-02-28T04:47:40Z

I noticed a fair amount of time with isinstance too but assumed it was unavoidable. Sounds like your code is a good improvement if it works!

I spent some time looking at the speed of Python Unicode decoding and am more confused than ever as to exactly what's going on with the larger speed issue. https://nelsonslog.wordpress.com/2015/02/26/python-file-reading-benchmarks/

jdunck · 2015-03-03T22:53:28Z

@NelsonMinar thanks for the detailed benchmarking. I'll leave this open as a reminder to do other optimization work, but I've merged #47.

jdunck · 2015-03-11T17:06:05Z

I've just released 0.11.0, which includes changes in #47.

NelsonMinar · 2015-03-11T17:51:09Z

Nice, thanks for the update! I just tested it and it makes my benchmark run in 70-80% of the time it used to. Very nice improvement for a simple change. Detailed timings: https://nelsonslog.wordpress.com/2015/03/11/unicodecsv-0-11-0-speed-improvement/

jdunck closed this as completed Mar 11, 2015

jdunck reopened this Mar 11, 2015
