python - How to open an HTML page with Windows-1252 encoding in BeautifulSoup


I am trying to parse an HTML document with BeautifulSoup and am running into trouble. What is the best way to open an HTML document that uses Windows-1252 encoding?

I tried converting the file to UTF-8 with iconv, but that doesn't work.

doc = open("e.html").read()
soup = BeautifulSoup(doc)
soup.findAll('p')

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 103: ordinal not in range(128)

When I open the file without running iconv first, I get the same error.

Full traceback:

>>> soup.findAll('p')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 103: ordinal not in range(128)

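Try decoding the file contents as cp1252 (Python's codec name for Windows-1252) before passing them to BeautifulSoup: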

doc = open("e.html").read()
doc = doc.decode('cp1252')
soup = BeautifulSoup(doc)
soup.findAll('p')
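
The traceback in the question is Python 2 style. On Python 3, reading a file in text mode returns a str, which has no decode method, so the same idea might look roughly like the sketch below. It is not part of the original answer: the helper name read_cp1252 and the sample file contents are made up for illustration, and only the standard library is used. The decoded text would then be passed to BeautifulSoup in place of doc.

```python
import os
import tempfile


def read_cp1252(path):
    """Read a file's raw bytes and decode them as Windows-1252 text."""
    # Open in binary mode so Python does not guess an encoding for us.
    with open(path, "rb") as handle:
        raw = handle.read()
    return raw.decode("cp1252")


# Hypothetical sample document; byte 0xFC is "ü" in Windows-1252,
# the same character (u'\xfc') that appears in the error message.
sample = b"<html><body><p>M\xfcnchen</p></body></html>"
with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

try:
    text = read_cp1252(sample_path)
finally:
    # Clean up the temporary sample file.
    os.remove(sample_path)

# The result is an ordinary str that an HTML parser can consume.
print("\u00fc" in text)
```

An alternative with the same effect would be open(path, encoding="cp1252"), which lets Python do the decoding while reading.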
