
Python Open Csv File With Supposedly Mixed Encodings?

I'm trying to read a CSV text file (UTF-8 without BOM, according to Notepad++) using Python. However, there seems to be a problem with the encoding:

print(open(path, encoding='utf-8').read())

Solution 1:

You can use the errors parameter of the open function. Try one of the options below (descriptions taken from the Python documentation):

  • 'ignore' ignores errors. Note that ignoring encoding errors can lead to data loss.
  • 'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data.
  • 'surrogateescape' will represent any incorrect bytes as code points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These private code points will then be turned back into the same bytes when the surrogateescape error handler is used when writing data. This is useful for processing files in an unknown encoding.

So, you can use:

print(open(path, encoding="utf-8", errors="ignore").read())
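The difference between the three handlers is easiest to see on raw bytes. In this sketch, b'caf\xe9' is 'café' encoded in Latin-1, so the \xe9 byte is malformed UTF-8 (the byte string and variable names are illustrative, not from the question):

```python
# 'café' encoded in Latin-1; the trailing \xe9 byte is invalid UTF-8.
data = b'caf\xe9'

# 'ignore' silently drops the bad byte - note the data loss.
print(data.decode('utf-8', errors='ignore'))   # caf

# 'replace' substitutes the replacement character U+FFFD.
print(data.decode('utf-8', errors='replace'))  # caf<U+FFFD>

# 'surrogateescape' maps the bad byte to a private code point (U+DCE9)
# and can round-trip it back to the original bytes on write.
s = data.decode('utf-8', errors='surrogateescape')
assert s.encode('utf-8', errors='surrogateescape') == data
```

Only 'surrogateescape' preserves enough information to write the file back out byte-for-byte, which is why it is recommended for files of unknown encoding.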

Solution 2:

Which version of Python are you using? If it's 2.x, try adding this import at the beginning of your script:

from __future__ import unicode_literals

then try:

print(open(path).read().encode('utf-8'))

There is also a great tool for charset detection: chardet. I hope it helps.
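If you can't install a detection library, a simple stdlib-only fallback is to try a list of candidate encodings until one decodes cleanly. This is a sketch, not chardet's actual algorithm; the function name and candidate list are assumptions you should adapt to your data:

```python
def detect_and_decode(raw: bytes, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 accepts any byte, so this is only reached with an empty list.
    raise UnicodeDecodeError('unknown', raw, 0, len(raw), 'no candidate matched')

# b'caf\xe9' is not valid UTF-8, so the cp1252 fallback kicks in.
text, used = detect_and_decode('café'.encode('cp1252'))
print(text, used)  # café cp1252
```

Order matters: put the strictest encodings first, because permissive ones like latin-1 will decode anything and mask a wrong guess.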
