Failing to detect header row on unicode csv file #74

kaselis · 2016-02-03T11:10:45Z

Using the Sniffer class to detect whether a file contains a header row fails with:

has_header = unicodecsv.Sniffer().has_header(csvfile.read(4096))
Error: line contains NULL byte

The csv module from the standard library throws the same error.

ryanhiebert · 2016-02-03T14:24:12Z

Unfortunately, unicodecsv does not currently support encodings whose encoded bytes contain null bytes, the usual example being UTF-16. There are some ideas for fixing it, but it hasn't happened quite yet.

A possible option is to use https://github.com/ryanhiebert/backports.csv, a backport of the Python 3 version of csv that works exclusively with text, not bytes. If you're so inclined, I'd love for you to try it out and tell me if it works for you. Be aware, though, that I haven't put it on PyPI quite yet, so you'll need to install it from a git URL. If it works for you, I'll make it a priority to put it on PyPI.

kaselis · 2016-02-04T08:04:43Z

Thanks for the reply. I'm still fairly new to these encodings, but you pointed me in the right direction: it seems my problem was not with the csv module, but with how I was opening and reading the file. Once I opened the file with the correct encoding, the csv module had no problem reading it.
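For anyone landing here with the same error, this is a minimal sketch of what "opening the file with the correct encoding" can look like. It assumes a text-based csv module (Python 3's csv, or backports.csv on Python 2) and a file that is known to be UTF-16; the helper name `detect_header` and its parameters are made up for illustration and are not part of any library.

```python
import csv
import io


def detect_header(path, encoding, sample_size=4096):
    """Return True if csv.Sniffer guesses that the file starts with a header row.

    Hypothetical helper: `path` and `encoding` must be supplied by the caller;
    nothing here detects the encoding automatically.
    """
    # Decode while reading, so the Sniffer receives text rather than raw
    # UTF-16 bytes (which contain the NUL bytes behind the original error).
    # newline="" leaves line-ending handling to the csv module.
    with io.open(path, "r", encoding=encoding, newline="") as f:
        sample = f.read(sample_size)
    return csv.Sniffer().has_header(sample)
```

Calling `detect_header("data.csv", "utf-16")` (the file name is a placeholder) reads at most 4096 characters and passes them to the Sniffer. Note that `has_header` is a heuristic, so its answer is a guess rather than a guarantee.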

kaselis · 2016-02-04T10:43:40Z

However, after opening and reading the file successfully, I started getting a UnicodeEncodeError:

ipdb> unicodecsv.Sniffer().has_header(sample)
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

but it works fine with backports.csv:

ipdb> backports.csv.Sniffer().has_header(sample)
False
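The traceback shows that the sample starts with a byte order mark (u'\ufeff'). A possible workaround, sketched here under the assumption that the BOM is not wanted in the data, is to decode with a codec that consumes it (for example "utf-8-sig" for UTF-8 files) or to strip it from the sample before sniffing. The `strip_bom` helper is hypothetical and not part of unicodecsv or backports.csv.

```python
import codecs

BOM = u"\ufeff"


def strip_bom(text):
    """Drop a single leading byte order mark from already-decoded text."""
    return text[1:] if text.startswith(BOM) else text


# A UTF-8 file saved with a BOM: plain "utf-8" keeps the mark as the first
# character, while "utf-8-sig" removes it during decoding.
raw = codecs.BOM_UTF8 + u"id,name\n1,alice\n".encode("utf-8")
kept = raw.decode("utf-8")
removed = raw.decode("utf-8-sig")
```

Here `kept` still begins with u'\ufeff', while `removed` and `strip_bom(kept)` both begin with "id".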

ryanhiebert · 2016-02-11T05:32:20Z

I've released backports.csv version 1.0! https://pypi.python.org/pypi/backports.csv

kaselis · 2016-02-11T07:08:33Z

Awesome, thanks a lot!
