html - Retrieving just the title of a webpage in Python
I have more than 5000 webpages and want the titles of all of them. In my project I am using the BeautifulSoup HTML parser for this:
soup = BeautifulSoup(open(url).read())
soup('title')[0].string
But this is taking a lot of time. Just to get the title of a webpage, it reads the entire file and builds a parse tree (I think this is the reason for the delay; correct me if I am wrong).
Is there any other simple way to do this in Python?
It would be faster if you used a simple regular expression; BeautifulSoup is pretty slow. Something like:
import re
regex = re.compile('<title>(.*?)</title>', re.IGNORECASE | re.DOTALL)
regex.search(string_to_search).group(1)
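The regular-expression approach could be wrapped in a small helper that also copes with pages that have no title, and that reads only the start of each file instead of all of it. This is a rough sketch, not a tested solution for every page: the names `extract_title` and `title_from_file` are made up for illustration, the 64 KiB limit is an arbitrary guess at how far into a page the title usually appears, and the files are assumed to be local and roughly UTF-8. A regex is not a real HTML parser, so a `<title>` inside a comment or script would fool it.

```python
import re

# Hypothetical helper names; only the regex idea comes from the answer above.
# The pattern also tolerates attributes on the tag, e.g. <title id="t">.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse runs of whitespace, since titles often span several lines.
    return " ".join(match.group(1).split())


def title_from_file(path, max_bytes=65536):
    """Read only the first max_bytes of a local file and look for the title there.

    max_bytes is an assumed cutoff; a title placed later in the file is missed.
    """
    with open(path, "rb") as f:
        head = f.read(max_bytes)
    # Decode leniently, because the real encoding is not known at this point.
    return extract_title(head.decode("utf-8", errors="replace"))
```

For example, `extract_title('<title>Example</title>')` returns `'Example'`, and a string with no title element returns `None` instead of raising an `AttributeError` the way a bare `.group(1)` would.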