Yahoo Pipe: How to parse sub DIVs


For a page that has multiple divs, how can I fetch content only from the divs that contain useful text, and avoid the other divs used for ads, etc.?

For example, the page structure is like this:

...

<div id="articlecopy">
    <div class="advertising 1">Ads I do not want to fetch.</div>
    <p>Useful texts go here</p>
    <div class="advertising 2">Ads I do not want to fetch.</div>
    <div class="related_articles_list">Related articles I do not want to parse either.</div>
</div>

...

In this fictional example, I want to get rid of the 2 divs for advertising and the div for related articles. I just want to fetch the useful content in the <p> inside the parent div.

Can a Pipe do this?

Thank you.

Try the YQL module with an XPath. Something along these lines:

select * from html where url="http://mywebpagewithads.com" and xpath='//div/p'

The above query will retrieve the part of the HTML that is inside a <p> tag under a parent <div> tag. You can get fancy with the XPath if your divs have attributes.
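To see what an expression like '//div/p' selects, the same idea can be tried locally. This is only a minimal sketch in Python, not part of YQL or Pipes: it assumes the page fragment is well-formed enough to parse as XML, and it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The PAGE string and the useful_paragraphs helper are hypothetical names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the page structure in the question.
PAGE = """
<html><body>
<div id="articlecopy">
  <div class="advertising 1">ads</div>
  <p>useful texts go here</p>
  <div class="advertising 2">ads</div>
  <div class="related_articles_list">related articles</div>
</div>
</body></html>
"""


def useful_paragraphs(markup):
    """Return the text of every <p> that is a direct child of a <div>."""
    root = ET.fromstring(markup.strip())
    # ".//div/p" is how ElementTree spells the XPath '//div/p'
    # when searching below the root element.
    return [p.text for p in root.findall(".//div/p")]


print(useful_paragraphs(PAGE))
```

The advertising and related-articles divs are skipped because they are <div> elements, not <p> elements, so the path never matches them.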

Say, for example, you had a page with several divs, and the one you wanted looked like this:

<div>
    <div>stuff you don't want</div>
    <div class="main_content">stuff you want to add to your feed</div>
    <div>other stuff you don't want</div>
</div>

You would change the YQL string above to this:

select * from html where url="http://mywebpagewithads.com" and xpath='//div/div[contains(@class,"main_content")]'
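The attribute test can be checked locally in the same hedged way. ElementTree does not implement the XPath contains() function, so this sketch, which again assumes well-formed markup, walks the div-inside-div structure and does the substring test on the class attribute in Python instead. PAGE and divs_with_class are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the second example page.
PAGE = """
<div>
  <div>stuff you don't want</div>
  <div class="main_content">stuff you want to add to your feed</div>
  <div>other stuff you don't want</div>
</div>
"""


def divs_with_class(markup, name):
    """Approximate '//div/div[contains(@class, name)]' with the stdlib."""
    root = ET.fromstring(markup.strip())
    matches = []
    # root.iter("div") visits every <div>, including the root itself.
    for parent in root.iter("div"):
        # findall("div") returns only the direct <div> children.
        for child in parent.findall("div"):
            # Substring test on the class attribute, like contains().
            if name in child.get("class", ""):
                matches.append(child.text)
    return matches


print(divs_with_class(PAGE, "main_content"))
```

Like contains(), the substring test also matches class values that merely include the name, for example "main_content_wide", so a stricter check may be needed on real pages.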

I've only just discovered YQL myself, and I am new to using XPaths, but this has worked for me so far.

