html - Retrieving just the title of a webpage in Python


I have more than 5000 webpages and want the titles of all of them. In my project I am using the BeautifulSoup HTML parser for this:

soup = BeautifulSoup(open(url).read())
soup('title')[0].string

But it is taking a lot of time. Just to get the title of a webpage, it reads the entire file and builds the parse tree (I think this is the reason for the delay; correct me if I am wrong).

Is there any other simple way to do this in Python?

It would be faster if you used a simple regular expression; BeautifulSoup is pretty slow. Something like:

import re
regex = re.compile('<title>(.*?)</title>', re.IGNORECASE | re.DOTALL)
regex.search(string_to_search).group(1)
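
A slightly more defensive variant of the same idea, as a rough sketch only. It assumes the pages are saved as local files, that the title appears within the first few kilobytes, and that UTF-8 with replacement is an acceptable decoding. The names extract_title, title_from_file and max_bytes are made up for illustration and are not part of any library.

```python
import re

# Tolerates attributes on the tag (e.g. <title id="x">) and any letter case.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text inside the first <title> element, or None if absent."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse runs of whitespace, since titles sometimes span several lines.
    return " ".join(match.group(1).split())


def title_from_file(path, max_bytes=8192):
    """Read only the first max_bytes of a saved page and look for its title.

    max_bytes is an assumed cut-off; a page whose title sits later in the
    file will yield None.
    """
    with open(path, "rb") as handle:
        head = handle.read(max_bytes)
    # Undecodable bytes are replaced; the page's real encoding is not detected.
    return extract_title(head.decode("utf-8", errors="replace"))
```

Reading only the head of each file avoids loading and parsing the whole document, which is the cost the question suspects. The trade-off is that a regular expression does not understand HTML: entities such as &amp;amp; are returned undecoded, and a title-like string inside a comment or script would also match.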
