ruby - How do I correctly deal with non-breaking spaces using Nokogiri?


I am using Nokogiri to parse an HTML page, and I am having odd problems with non-breaking spaces. I have tried different encodings, replacing whitespace, and a few other headache-inducing attempts.

Here is the HTML snippet in question:

<td>amount 15,300&nbsp;at&nbsp;dollars</td> 

Note the change in the &nbsp; representation after I use Nokogiri:

<td>amount 15,300&#xa0;at&#xa0;dollars</td> 

And then outputting inner_text:

amount 15,300 at dollars 

This is my base Nokogiri grab. I did try a few alternatives to solve this, but failed miserably:

doc = Nokogiri::HTML(open(url))

And then I do a doc.search for the item in question.

Note that if I look at doc, the line shows &#xa0; on that line.

Clarification: I do not think I clearly stated the difficulty I am having. I can't get inner_text to show without the strange symbol where the non-breaking spaces are.

I know this is old, but it took me an hour to find out how to solve this problem, and it is easy once you know. Pass your string to this function and it will be "de-nbsp-fied".

def strip_html(str)
  # Let Nokogiri decode the entity into the actual non-breaking space character
  nbsp = Nokogiri::HTML("&nbsp;").text
  str.gsub(nbsp, '')
end

You could also replace it with a space if you wished. May many of you find this answer!
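
For the replace-with-a-space variant, here is a minimal sketch that does not need Nokogiri to build the character. It assumes the text you get back is UTF-8, in which case the non-breaking space is the single code point U+00A0. The names NBSP and normalize_nbsp are made up for illustration and are not part of Nokogiri.

```ruby
# U+00A0 is the Unicode code point of the non-breaking space.
# Assumption: the incoming string is UTF-8 encoded.
NBSP = "\u00A0"

# Hypothetical helper: swap every non-breaking space for a regular space.
def normalize_nbsp(str)
  str.gsub(NBSP, " ")
end

text = "amount 15,300#{NBSP}at#{NBSP}dollars"
puts normalize_nbsp(text)  # prints "amount 15,300 at dollars"
```

This may also explain why replacing whitespace did not help: in Ruby, \s in a regular expression does not match U+00A0, while the POSIX bracket [[:space:]] does on UTF-8 strings.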

