xml - Python regex issue -


I'm trying to extract phone screen resolutions from the WURFL XML file with the Python script below. The problem is that I only get the first match, though. Why? How can I get all the matches?

The WURFL XML file can be found at http://sourceforge.net/projects/wurfl/files/wurfl/latest/wurfl-latest.zip/download?use_mirror=freefr

def read_file(file_name):
    f = open(file_name, 'rb')
    data = f.read()
    f.close()
    return data

text = read_file('wurfl.xml')

import re
pattern = '<device id="(.*?)".*actual_device_root="true">.*<capability name="resolution_width" value="(\d+)"/>.*<capability name="resolution_height" value="(\d+)"/>.*</device>'
for m in re.findall(pattern, text, re.DOTALL):
    print(m)

First, use an XML parser instead of regular expressions. You'll be happier in the long run.
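As a rough illustration of the parser route, here is a minimal sketch using the standard library's xml.etree.ElementTree. The inline SAMPLE document and the resolutions() helper are hypothetical; the layout is only assumed from the attributes and capability names that appear in the question's regex, so the real wurfl.xml may need adjustments.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; structure assumed from the question's regex.
SAMPLE = """<wurfl>
  <devices>
    <device id="phone_a" actual_device_root="true">
      <group id="display">
        <capability name="resolution_width" value="240"/>
        <capability name="resolution_height" value="320"/>
      </group>
    </device>
    <device id="phone_a_sub1">
      <group id="display">
        <capability name="resolution_width" value="128"/>
      </group>
    </device>
    <device id="phone_b" actual_device_root="true">
      <group id="display">
        <capability name="resolution_width" value="480"/>
        <capability name="resolution_height" value="800"/>
      </group>
    </device>
  </devices>
</wurfl>"""

def resolutions(root):
    """Yield (device_id, width, height) for root devices that declare both values."""
    for device in root.iter('device'):
        if device.get('actual_device_root') != 'true':
            continue
        # Collect this device's capabilities into a name -> value mapping.
        caps = {c.get('name'): c.get('value') for c in device.iter('capability')}
        if 'resolution_width' in caps and 'resolution_height' in caps:
            yield (device.get('id'),
                   int(caps['resolution_width']),
                   int(caps['resolution_height']))

# For the real file, ET.parse('wurfl.xml').getroot() would replace this line.
root = ET.fromstring(SAMPLE)
results = list(resolutions(root))
for entry in results:
    print(entry)
```

Because each device element is handled on its own, a device that lacks one of the two capabilities is simply skipped instead of bleeding into its neighbour.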

Second, if you insist on using regexes, use finditer() instead of findall(). It yields match objects one at a time instead of building the whole result list in memory.
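A small sketch of what finditer() gives you, using a made-up two-line sample and a simplified pattern (both are assumptions for illustration, not the question's data): each iteration hands back a match object, so groups and positions are available without a second pass.

```python
import re

# Hypothetical sample text, not taken from the real WURFL file.
sample = ('<capability name="resolution_width" value="240"/>\n'
          '<capability name="resolution_height" value="320"/>\n')

cap_re = re.compile(r'<capability name="(resolution_\w+)" value="(\d+)"/>')

found = []
for m in cap_re.finditer(sample):
    # m is a match object: groups and offsets are both accessible.
    found.append((m.group(1), int(m.group(2)), m.start()))
    print(m.group(1), m.group(2), 'at offset', m.start())
```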

Third, your regex matches from the first entry to the last one (the .* is greedy, and you have set DOTALL mode), so either see the first paragraph or at least change your regex to

pattern = r'<device id="(.*?)".*?actual_device_root="true">.*?<capability name="resolution_width" value="(\d+)"/>.*?<capability name="resolution_height" value="(\d+)"/>.*?</device>' 
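To see the difference, here is a small sketch that runs the greedy and the non-greedy pattern against a made-up two-device snippet (the sample text is an assumption, shaped only after what the patterns expect):

```python
import re

# Hypothetical two-device sample.
sample = (
    '<device id="a" actual_device_root="true">\n'
    '<capability name="resolution_width" value="240"/>\n'
    '<capability name="resolution_height" value="320"/>\n'
    '</device>\n'
    '<device id="b" actual_device_root="true">\n'
    '<capability name="resolution_width" value="480"/>\n'
    '<capability name="resolution_height" value="800"/>\n'
    '</device>\n'
)

greedy_pattern = (r'<device id="(.*?)".*actual_device_root="true">'
                  r'.*<capability name="resolution_width" value="(\d+)"/>'
                  r'.*<capability name="resolution_height" value="(\d+)"/>'
                  r'.*</device>')
lazy_pattern = (r'<device id="(.*?)".*?actual_device_root="true">'
                r'.*?<capability name="resolution_width" value="(\d+)"/>'
                r'.*?<capability name="resolution_height" value="(\d+)"/>'
                r'.*?</device>')

# Greedy: one match spanning both devices, pairing the first id
# with the last width and height.
greedy = re.findall(greedy_pattern, sample, re.DOTALL)
# Non-greedy: one match per device.
lazy = re.findall(lazy_pattern, sample, re.DOTALL)

print(greedy)
print(lazy)
```

Note that even the non-greedy version can run past a closing </device> when a device lacks one of the capabilities, pairing its id with values from a later device. That is one more reason to prefer a parser.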

Also, use raw strings for regexes. \d happens to work, but \b will behave unexpectedly in a "normal" string, because Python turns it into a backspace character before the regex engine ever sees it.
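A quick sketch of the \b pitfall, with a made-up test string:

```python
import re

text = 'a cat sat'

# In a normal string, '\b' is a single backspace character (0x08).
normal = '\bcat\b'
# In a raw string, the backslash survives, so the regex engine
# sees \b and treats it as a word boundary.
raw = r'\bcat\b'

normal_hits = re.findall(normal, text)
raw_hits = re.findall(raw, text)

print(len('\b'), len(r'\b'))
print(normal_hits)
print(raw_hits)
```

The normal-string pattern looks for literal backspace characters around "cat" and finds nothing, while the raw-string pattern matches the word.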

