Yahoo Pipes: How to parse sub DIVs
For a page that has multiple divs, how can I fetch content only from the divs that contain useful text, and avoid the other divs such as ads, etc.?
For example, a page structured like this:
...
<div id="articlecopy">
  <div class="advertising 1">ads I do not want to fetch.</div>
  <p>useful texts go here</p>
  <div class="advertising 2">ads I do not want to fetch.</div>
  <div class="related_articles_list">I do not want the related articles parsed either</div>
</div>
...
In this fictional example, I want to get rid of the 2 advertising divs and the related articles div. I only want to fetch the useful content in the <p> inside the parent div.
Can a pipe do this?
Thank you.
Try the YQL module with XPath. Something along these lines:
select * from html where url="http://mywebpagewithads.com" and xpath='//div/p'
The above query will retrieve the part of the HTML inside a <p> tag under a parent <div> tag. You can get fancy with the XPath if your divs have attributes.
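To see what the //div/p expression selects without going through YQL, here is a minimal sketch in Python using only the standard library. It is not how YQL runs the query; it just applies an equivalent path to the sample markup from the question. The names SAMPLE and select_paragraphs are made up for illustration, and the sketch assumes the markup is well-formed enough for an XML parser, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Sample markup modelled on the question (assumed well-formed for this sketch).
SAMPLE = """
<div id="articlecopy">
  <div class="advertising 1">ads</div>
  <p>useful texts go here</p>
  <div class="advertising 2">ads</div>
  <div class="related_articles_list">related articles</div>
</div>
"""

def select_paragraphs(markup):
    """Return the text of every <p> that is a direct child of a <div>."""
    root = ET.fromstring(markup)
    # Wrap the parsed tree so that ".//div/p" can also match when the
    # outermost element is itself the parent <div>.
    wrapper = ET.Element("root")
    wrapper.append(root)
    return [p.text for p in wrapper.findall(".//div/p")]

print(select_paragraphs(SAMPLE))
```

Only the <p> element is returned; the advertising and related-articles divs are skipped because they are not <p> children of a div.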
Say, for example, you had a page with several divs, and the one you wanted looked like this:
<div>
  <div>stuff you don't want</div>
  <div class="main_content">stuff you want to add to the feed</div>
  <div>other stuff you don't want</div>
</div>
You would change the YQL string above to this:
select * from html where url="http://mywebpagewithads.com" and xpath='//div/div[contains(@class,"main_content")]'
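The class filter can be tried out locally in the same way. This is a rough sketch, not the YQL implementation: Python's standard-library ElementTree does not support the XPath contains() function, so the substring test on the class attribute is done in plain Python instead. SAMPLE and select_by_class are hypothetical names, and the markup is again assumed to be well-formed.

```python
import xml.etree.ElementTree as ET

# Sample markup modelled on the answer's example (assumed well-formed).
SAMPLE = """
<div>
  <div>stuff you don't want</div>
  <div class="main_content">stuff you want to add to the feed</div>
  <div>other stuff you don't want</div>
</div>
"""

def select_by_class(markup, needle):
    """Return the text of each div-inside-a-div whose class contains `needle`."""
    root = ET.fromstring(markup)
    # Wrap the parsed tree so the outermost <div> can act as the parent.
    wrapper = ET.Element("root")
    wrapper.append(root)
    # ".//div/div" mirrors the //div/div step; the `in` check stands in
    # for contains(@class, "...") as a plain substring match.
    return [
        div.text
        for div in wrapper.findall(".//div/div")
        if needle in div.get("class", "")
    ]

print(select_by_class(SAMPLE, "main_content"))
```

Because this is a substring match, just like contains(), it would also pick up a class such as "main_content_wide"; tighten the test if that matters for your page.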
I've only just discovered YQL myself, and I'm new to using XPaths, but this has worked for me so far.