ruby - How do I correctly deal with non-breaking spaces using Nokogiri?
I am using Nokogiri to parse an HTML page, but I am having odd problems with non-breaking spaces. I have tried different encodings, replacing the whitespace, and a few other headache-inducing attempts.
Here is the HTML snippet in question:
<td>amount 15,300&nbsp;at dollars</td>
Note the change in the representation after I use Nokogiri:
<td>amount 15,300Â at dollars</td>
And this is what outputting the inner_text gives:
amount 15,300Â at dollars
This is my base Nokogiri grab. I did try a few alternatives to solve this, but failed miserably:
doc = Nokogiri::HTML(open(url))
and then I do a doc.search for the item in question.
Note that if I look at the doc, the line shows the &nbsp; on that line.
Clarification: I do not think I stated the difficulty I am having accurately. I can't get the inner_text to show up without the strange Â symbol.
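The stray Â is consistent with the UTF-8 bytes of a non-breaking space (0xC2 0xA0) being displayed as ISO-8859-1. A minimal sketch of that effect in plain Ruby, assuming that is what is happening here (the variable names are illustrative and not from the original post):

```ruby
# A non-breaking space is the single code point U+00A0.
nbsp = "\u00A0"

# Encoded as UTF-8 it becomes two bytes: 0xC2 0xA0.
utf8_bytes = nbsp.encode("UTF-8").bytes

# Reading those same two bytes as ISO-8859-1 gives two characters:
# 0xC2 is "Â" and 0xA0 is a non-breaking space again.
misread = nbsp.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts utf8_bytes.map { |b| format("0x%02X", b) }.join(" ")
puts misread.inspect
```

If that is the cause, the text itself is probably fine and only the encoding used to display it is wrong.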
I know this is old, but it took me an hour to find out how to solve this problem, and it is easy once you know. Just pass your string to this function and it will be "de-nbsp-fied".
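    require 'nokogiri'

    # Remove every non-breaking space (U+00A0) from str.
    def strip_html(str)
      # Let Nokogiri decode the &nbsp; entity into the actual character.
      nbsp = Nokogiri::HTML("&nbsp;").text
      str.gsub(nbsp, '')
    end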
You could also replace it with a space if you wished. May many of you find this answer!
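The replace-with-a-space variant could look something like the sketch below. It assumes the string is valid UTF-8 and matches U+00A0 directly instead of asking Nokogiri to decode the entity; nbsp_to_space is a hypothetical name, not something from Nokogiri.

```ruby
# Hypothetical helper: swap each non-breaking space (U+00A0) for a plain space.
# Assumes str is a valid UTF-8 string, such as text returned by inner_text.
def nbsp_to_space(str)
  str.gsub("\u00A0", " ")
end

puts nbsp_to_space("amount 15,300\u00A0at dollars")
```

Keeping a space avoids gluing "15,300" and "at" together, which is what removing the character outright would do.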