How to handle Python 3.x UnicodeDecodeError in Email package?


I'm trying to read an email file, like this:

    import email

    with open("xxx.eml") as f:
        msg = email.message_from_file(f)

and I get this error:

    Traceback (most recent call last):
      File "i:\fakt\real\maildecode.py", line 53, in <module>
        main()
      File "i:\fakt\real\maildecode.py", line 50, in main
        decode_file(infile, outfile)
      File "i:\fakt\real\maildecode.py", line 30, in decode_file
        msg = email.message_from_file(f)  #, policy=mypol
      File "c:\python33\lib\email\__init__.py", line 56, in message_from_file
        return Parser(*args, **kws).parse(fp)
      File "c:\python33\lib\email\parser.py", line 55, in parse
        data = fp.read(8192)
      File "c:\python33\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1920: character maps to <undefined>
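The traceback shows where this comes from: because no encoding was passed to open(), Python 3 used the locale default, which is cp1252 here, and cp1252 has no character for byte 0x81. A minimal reproduction, assuming 0x81 is the offending byte as reported; try_decode is a hypothetical helper for illustration, not part of the email package:

```python
def try_decode(raw: bytes, encoding: str, errors: str = "strict"):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return raw.decode(encoding, errors)
    except UnicodeDecodeError:
        return None


sample = b"body \x81 here"

# cp1252 (the locale default open() fell back to) leaves 0x81 undefined.
print(try_decode(sample, "cp1252"))  # → None

# UTF-8 with errors="replace" substitutes U+FFFD for the bad byte instead of raising.
print(try_decode(sample, "utf-8", "replace"))
```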

The file contains a multipart email, part of which is encoded in UTF-8. The file's content or encoding might be broken, but I have to handle it anyway.

How can I read the file if it has Unicode errors? I cannot find anything for this in the policy object compat32, and there seems to be no way to handle the exception and let Python continue right where the exception occurred.

What can I do?

I can't test this on your message, so I don't know if it will work, but you can do the string decoding yourself:

    import email

    with open("xxx.eml", encoding='utf-8', errors='replace') as f:
        text = f.read()
        msg = email.message_from_string(text)

That's going to give you a lot of replacement characters if the message isn't in UTF-8. But if it's got \x81 in it, UTF-8 is a reasonable guess.
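A different route, not part of the original answer: since Python 3.2 the email package has email.message_from_binary_file (and email.message_from_bytes), which parse the raw bytes instead of pushing the whole file through one codec. Each text part can then be decoded with the charset it declares, with errors="replace" so broken bytes become U+FFFD rather than raising. This is a sketch; read_eml_bytes and part_texts are hypothetical helper names, and falling back to UTF-8 for parts with no (or an unknown) charset is an assumption:

```python
import email


def read_eml_bytes(path):
    # Open in binary mode so no locale codec (cp1252 here) is applied to the file.
    with open(path, "rb") as f:
        return email.message_from_binary_file(f)


def part_texts(msg):
    """Yield the text of every text/* part, decoded leniently."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        payload = part.get_payload(decode=True)  # bytes after base64/QP decoding
        if payload is None:
            continue
        # Assumption: treat parts without a declared charset as UTF-8.
        charset = part.get_content_charset() or "utf-8"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            # The header names a charset Python doesn't know; fall back to UTF-8.
            yield payload.decode("utf-8", errors="replace")
```

Usage would be something like `for text in part_texts(read_eml_bytes("xxx.eml")): print(text)`. Decoding per part keeps a bad byte in one part from affecting the others, which seems to fit a multipart message where only some parts are UTF-8.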

