
Failing to detect header row on unicode csv file #74

Open
kaselis opened this issue Feb 3, 2016 · 5 comments

Comments

@kaselis

kaselis commented Feb 3, 2016

Using the Sniffer class to detect whether a file contains a header row fails with:

has_header = unicodecsv.Sniffer().has_header(csvfile.read(4096))
Error: line contains NULL byte

This is the same error that the csv module from the standard library raises.

@ryanhiebert
Collaborator

Unfortunately, unicodecsv does not currently support encodings that produce null bytes, the usual example being UTF-16. There are some ideas for fixing it, but nothing has been implemented yet.

One option is to use https://github.com/ryanhiebert/backports.csv, a backport of the Python 3 version of csv that works exclusively with text, not bytes. If you're so inclined, I'd love for you to try it out and tell me whether it works for you. Be aware, though, that I haven't put it on PyPI yet, so you'll need to install it from a git URL. If it works for you, I'll make publishing it to PyPI a priority.
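To illustrate the null-byte problem described above, here is a minimal Python 3 sketch (stdlib only; the sample data is made up) showing that UTF-16 output contains a 0x00 byte for every ASCII character, which is what a byte-oriented csv parser trips over, and that decoding the bytes first yields clean text:

```python
# Hypothetical two-row CSV sample.
sample = "name,age\nalice,30\n"
raw = sample.encode("utf-16")

# UTF-16 stores each ASCII character in two bytes, one of which is 0x00,
# so every character of the sample contributes one NUL byte to `raw`.
nul_count = raw.count(b"\x00")
print(nul_count == len(sample))

# Decoding with the right codec recovers the original text (the "utf-16"
# codec also consumes the leading byte order mark), and the text has no NULs.
text = raw.decode("utf-16")
print(text == sample, "\x00" in text)
```

A text-only csv module, such as Python 3's or the backport, only ever sees `text`, never `raw`.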

@kaselis
Author

kaselis commented Feb 4, 2016

Thanks for the reply. I'm still fairly new to these encodings, but you pointed me in the right direction: it seems my problem was not with the csv module but with how I opened and read the file. Once I opened the file with the correct encoding, the csv module had no problem reading it.
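A minimal Python 3 sketch of that fix, assuming the file is UTF-16 (the actual encoding of the reporter's file isn't stated, and the file contents here are made up): decode while opening, so the csv module receives text rather than raw bytes. On Python 2, io.open combined with backports.csv should play the same role.

```python
import csv
import os
import tempfile

# Hypothetical sample data standing in for the real file.
rows = "name,age\nalice,30\nbob,25\n"

# Write the sample to a temporary UTF-16 file to mimic the problem input.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as fh:
    fh.write(rows.encode("utf-16"))
    path = fh.name

try:
    # Decode while opening: Sniffer only ever sees str, never NUL-laden bytes.
    # newline="" is what the csv docs recommend for files passed to csv.
    with open(path, encoding="utf-16", newline="") as csvfile:
        sample = csvfile.read(4096)
        has_header = csv.Sniffer().has_header(sample)
finally:
    os.remove(path)

print(has_header)
```

Here the "age" column is numeric in every data row but not in the first row, so the sniffer votes that the first row is a header.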

@kaselis
Author

kaselis commented Feb 4, 2016

However, after opening and reading the file successfully, I started getting a UnicodeEncodeError:

ipdb> unicodecsv.Sniffer().has_header(sample)
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

but it works fine with backports.csv:

ipdb> backports.csv.Sniffer().has_header(sample)
False
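The u'\ufeff' in the traceback is a byte order mark (BOM) left at the start of the decoded sample; Python 2's byte-based csv then apparently failed to encode it as ASCII. A minimal Python 3 sketch (stdlib only, sample data made up) of which codecs keep or strip a BOM; choosing a BOM-aware codec when opening the file is one way to keep U+FEFF out of the text handed to the sniffer:

```python
import codecs
import csv

# Hypothetical sample: a UTF-8 CSV that starts with a byte order mark.
raw = codecs.BOM_UTF8 + "name,age\nalice,30\nbob,25\n".encode("utf-8")

# Plain "utf-8" keeps the BOM as U+FEFF at position 0 of the text.
with_bom = raw.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")

# Likewise, the generic "utf-16" codec consumes a BOM, while the
# fixed-endianness "utf-16-le" codec leaves it in the decoded text.
raw16 = codecs.BOM_UTF16_LE + "name,age\n".encode("utf-16-le")
generic16 = raw16.decode("utf-16")
explicit16 = raw16.decode("utf-16-le")

print(repr(with_bom[0]), repr(without_bom[0]))
print(repr(generic16[0]), repr(explicit16[0]))
print(csv.Sniffer().has_header(without_bom))
```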

@ryanhiebert
Collaborator

I've released backports.csv version 1.0: https://pypi.python.org/pypi/backports.csv

@kaselis
Author

kaselis commented Feb 11, 2016

Awesome, thanks a lot!
