How to handle Python 3.x UnicodeDecodeError in Email package?
I am trying to read an email from a file, like this:

    import email

    with open("xxx.eml") as f:
        msg = email.message_from_file(f)

and I get this error:
    Traceback (most recent call last):
      File "i:\fakt\real\maildecode.py", line 53, in <module>
        main()
      File "i:\fakt\real\maildecode.py", line 50, in main
        decode_file(infile, outfile)
      File "i:\fakt\real\maildecode.py", line 30, in decode_file
        msg = email.message_from_file(f) #, policy=mypol
      File "c:\python33\lib\email\__init__.py", line 56, in message_from_file
        return Parser(*args, **kws).parse(fp)
      File "c:\python33\lib\email\parser.py", line 55, in parse
        data = fp.read(8192)
      File "c:\python33\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1920: character maps to <undefined>
The file contains a multipart email, with one part encoded in UTF-8. The file's content or encoding might be broken, but I have to handle it anyway.

How can I read the file if it has Unicode errors? I cannot find the policy object compat32, and there seems to be no way to handle the exception and let Python continue right where the exception occurred.

What can I do?
I can't test this on your message, so I don't know if it will work, but you can do the string decoding yourself:
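    import email

    # Decode the file as UTF-8 and substitute U+FFFD for any undecodable bytes
    with open("xxx.eml", encoding='utf-8', errors='replace') as f:
        text = f.read()

    # Parse the decoded text, not the file object
    msg = email.message_from_string(text)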
That's going to give you a lot of replacement characters if the message isn't actually in UTF-8. But if it's got \x81 in it, UTF-8 is my guess.
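
Another option, which I also haven't tried on your message, is to avoid decoding the whole file at all: open it in binary mode and let the email package work from the raw bytes, then decode each text part with its own declared charset. This is only a sketch. It assumes your Python has email.message_from_binary_file (added in 3.2); read_message and text_parts are hypothetical helper names, and falling back to UTF-8 when a part declares no usable charset is my assumption, not something the library prescribes.

```python
import email


def read_message(path):
    """Parse an .eml file from raw bytes, so the platform codec (cp1252) is never used."""
    with open(path, "rb") as f:
        return email.message_from_binary_file(f)


def text_parts(msg):
    """Yield (content_type, text) for every text/* part of the message."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and non-text attachments
        # decode=True undoes base64 / quoted-printable and returns bytes
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # The header named a charset Python does not know
            text = payload.decode("utf-8", errors="replace")
        yield part.get_content_type(), text
```

You would then call something like `for ctype, text in text_parts(read_message("xxx.eml")): ...`. Broken bytes still turn into replacement characters, but only inside the part that contains them, and the parser itself should not raise UnicodeDecodeError.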