Python encoding issue while reading a file

By : Jimmy
Source: Stackoverflow.com
Question!

I am trying to read a file that contains the character "ë", but I cannot figure out how to read it no matter what I try with the encoding. When I open the file in TextEdit, it is listed as an unknown 8-bit file. If I try changing it to UTF-8, UTF-16, or anything else, it either does not work or messes up the entire file. I tried reading the file with standard Python commands as well as with codecs, and nothing reads it correctly. A code sample of the read is below. Does anyone have any clue what I am doing wrong? This is Python 2.7.10, by the way.

import codecs
readFile = codecs.open("FileName", encoding='utf-8')

The line I am trying to read is this, with nothing else in it:

Aeëtes

Here are some of the errors I get:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x91 in position 0: invalid start byte

UnicodeError: UTF-16 stream does not start with BOM (I know this one just means that it is not a UTF-16 file.)

UnicodeDecodeError: 'ascii' codec can't decode byte 0x91 in position 0: ordinal not in range(128)

If I don't use a codec, the word comes in as Ae?tes, which then crashes later in the program. Just to be clear, none of the suggested questions, or anything else I found on the net, pointed to an answer. One other detail that might help is that I am using OS X, not Windows.

By : Jimmy
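
All of the errors above complain about the same byte, 0x91. A minimal sketch of how one might probe it is shown here. The sample bytes, the helper name try_decodings, and the list of candidate encodings are illustrative assumptions, not taken from the original file. Note that a decode that succeeds is not necessarily the right one; only a result that matches the expected word settles it.

```python
# Illustrative bytes: the word from the question with 0x91 where "ë" should be.
raw = b"Ae\x91tes"

def try_decodings(data, candidates):
    """Return {encoding name: decoded text, or None if decoding fails}."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeError:
            # UnicodeDecodeError is a subclass of UnicodeError.
            results[name] = None
    return results

# Hypothetical shortlist of single-byte and common encodings to try.
candidates = ["ascii", "utf-8", "latin-1", "cp1252", "mac_roman"]
results = try_decodings(raw, candidates)

for name in candidates:
    print("%-10s %r" % (name, results[name]))
```

In this sketch ascii and utf-8 fail outright, latin-1 and cp1252 decode to the wrong character, and only mac_roman yields the expected "ë".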


Answers

Credit for this answer goes to RadLexus for figuring out the proper encoding, and also to Mad Physicist, who pointed me in the right direction even though I had not considered all possible encodings.

The issue, apparently, is that the Mac saved the .txt file in the mac_roman (Mac OS Roman) encoding, in which byte 0x91 is "ë". If you use that encoding, it will work perfectly.

This is the line of code that I used to read it.

readFile = codecs.open("FileName",encoding='mac_roman')
By : Jimmy
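
For a version that can be run end to end, here is a hedged sketch of the same fix. It assumes a temporary file standing in for "FileName" and uses io.open, which behaves the same on Python 2.7 and Python 3, in place of codecs.open; the variable names are illustrative.

```python
import io
import os
import tempfile

# Hypothetical sample file standing in for "FileName": the word from the
# question stored as Mac OS Roman bytes (0x91 is "ë" in that encoding).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"Ae\x91tes\n")

# io.open decodes while reading; the with-block closes the file afterwards.
with io.open(path, encoding="mac_roman") as read_file:
    lines = [line.rstrip("\n") for line in read_file]

os.remove(path)

# Re-encode as UTF-8 in case later steps of the program expect UTF-8 bytes.
utf8_bytes = lines[0].encode("utf-8")
print(repr(lines[0]), repr(utf8_bytes))
```

Decoding once at the file boundary and working with Unicode text from then on avoids the "Ae?tes" corruption described in the question.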

