python - How to open an HTML page with windows-1252 encoding in BeautifulSoup
I am trying to parse an HTML document with BeautifulSoup and am running into trouble. What is the best way to open an HTML document that uses windows-1252 encoding?
I tried converting the file to UTF-8 with iconv, but that doesn't work.
doc = open("e.html").read()
soup = BeautifulSoup(doc)
soup.findAll('p')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 103: ordinal not in range(128)
When I open the file without running iconv first, I get the same error.
Full traceback:
>>> soup.findAll('p')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 103: ordinal not in range(128)
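Try decoding the file contents from cp1252 (Python's codec name for windows-1252) before passing them to BeautifulSoup: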
doc = open("e.html").read()
# Decode the raw windows-1252 bytes into a unicode string first
doc = doc.decode('cp1252')
soup = BeautifulSoup(doc)
soup.findAll('p')
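The decoding step on its own can be sketched with only the standard library. This is a minimal illustration, not the answerer's code: the helper name read_html_as_text and the temporary sample file are made up for the example, and io.open is used as one way to decode while reading that should behave the same on Python 2 and Python 3. The returned text could then be handed to BeautifulSoup as in the snippet above.

```python
import io
import os
import tempfile


def read_html_as_text(path, encoding="cp1252"):
    """Read a file and decode it to text (hypothetical helper for this example)."""
    # io.open decodes the bytes with the given codec while reading.
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


# Build a small sample file encoded as windows-1252.
# The byte 0xFC is the character u'\xfc' (u with umlaut) in cp1252.
sample_bytes = b"<p>f\xfcr</p>"
fd, sample_path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as handle:
    handle.write(sample_bytes)

# Decode the file into text, then clean up the temporary file.
text = read_html_as_text(sample_path)
os.remove(sample_path)
```

Passing the encoding at read time keeps the bytes-to-text conversion in one place, so the rest of the code only ever sees decoded text.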