xml - Python regex issue
I'm trying to extract phone screen resolutions from the WURFL XML file with the Python script below. The problem is that I only get the first match, though. Why? How can I get all matches?
The WURFL XML file can be found at http://sourceforge.net/projects/wurfl/files/wurfl/latest/wurfl-latest.zip/download?use_mirror=freefr
    def read_file(file_name):
        f = open(file_name, 'rb')
        data = f.read()
        f.close()
        return data

    text = read_file('wurfl.xml')

    import re
    pattern = '<device id="(.*?)".*actual_device_root="true">.*<capability name="resolution_width" value="(\d+)"/>.*<capability name="resolution_height" value="(\d+)"/>.*</device>'
    for m in re.findall(pattern, text, re.DOTALL):
        print(m)
First, use an XML parser instead of regular expressions. You'll be happier in the long run.
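As a rough sketch of what the parser route could look like, here is a version using the standard library's xml.etree.ElementTree. The sample document is a made-up stand-in for wurfl.xml (the nesting of capability elements inside group elements is an assumption), and extract_resolutions is a hypothetical helper name, not anything from WURFL itself.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for wurfl.xml; the element nesting here is an assumption.
SAMPLE = """<wurfl>
  <devices>
    <device id="phone_a" actual_device_root="true">
      <group id="display">
        <capability name="resolution_width" value="240"/>
        <capability name="resolution_height" value="320"/>
      </group>
    </device>
    <device id="phone_a_sub" fall_back="phone_a">
      <group id="display">
        <capability name="resolution_width" value="128"/>
      </group>
    </device>
    <device id="phone_b" actual_device_root="true">
      <group id="display">
        <capability name="resolution_width" value="480"/>
        <capability name="resolution_height" value="800"/>
      </group>
    </device>
  </devices>
</wurfl>"""


def extract_resolutions(root):
    """Return (device id, width, height) for every actual device root."""
    results = []
    for device in root.iter("device"):
        # Skip devices that are not flagged as an actual device root.
        if device.get("actual_device_root") != "true":
            continue
        # Collect this device's capabilities regardless of how deeply they nest.
        caps = {c.get("name"): c.get("value") for c in device.iter("capability")}
        width = caps.get("resolution_width")
        height = caps.get("resolution_height")
        if width is not None and height is not None:
            results.append((device.get("id"), int(width), int(height)))
    return results


resolutions = extract_resolutions(ET.fromstring(SAMPLE))
for entry in resolutions:
    print(entry)
```

For the real file you would presumably swap ET.fromstring(SAMPLE) for ET.parse('wurfl.xml').getroot(). Because each device element is handled on its own, a device that lacks one of the capabilities cannot bleed into the next one, which is the failure mode the regex approach has to fight.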
Second, if you insist on using regexes, use finditer() instead of findall().
Third, your regex matches from the first entry to the last one (the .* is greedy, and you have set DOTALL mode), so either see the first paragraph or at least change your regex to:
    pattern = r'<device id="(.*?)".*?actual_device_root="true">.*?<capability name="resolution_width" value="(\d+)"/>.*?<capability name="resolution_height" value="(\d+)"/>.*?</device>'
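To see the difference between the greedy and the non-greedy pattern, here is a small sketch that runs both over a two-device sample with finditer(). The sample text and the variable names are made up for illustration; only the two patterns come from the question and from this answer.

```python
import re

# Made-up two-device excerpt standing in for the contents of wurfl.xml.
SAMPLE = '''<device id="phone_a" actual_device_root="true">
<capability name="resolution_width" value="240"/>
<capability name="resolution_height" value="320"/>
</device>
<device id="phone_b" actual_device_root="true">
<capability name="resolution_width" value="480"/>
<capability name="resolution_height" value="800"/>
</device>'''

# Pattern from the question: every .* is greedy.
greedy = r'<device id="(.*?)".*actual_device_root="true">.*<capability name="resolution_width" value="(\d+)"/>.*<capability name="resolution_height" value="(\d+)"/>.*</device>'
# Pattern from the answer: every .*? is non-greedy.
lazy = r'<device id="(.*?)".*?actual_device_root="true">.*?<capability name="resolution_width" value="(\d+)"/>.*?<capability name="resolution_height" value="(\d+)"/>.*?</device>'

greedy_matches = [m.groups() for m in re.finditer(greedy, SAMPLE, re.DOTALL)]
lazy_matches = [m.groups() for m in re.finditer(lazy, SAMPLE, re.DOTALL)]

print(greedy_matches)  # one match: first device's id, last device's values
print(lazy_matches)    # one match per device
```

Note that the greedy version pairs the first id with the last device's width and height, because each .* runs to the end of the text and backtracks only as far as it must. The non-greedy version is still fragile: if a device lacks actual_device_root="true" or one of the capabilities, .*? will happily cross into the next device to find it.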
Also, always use raw strings with regexes. \d happens to work, but \b will behave unexpectedly in a "normal" string.
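A quick way to convince yourself of the \b problem (the sample text and variable names below are just for illustration): in a normal string Python turns \b into a backspace character before the regex engine ever sees it, while a raw string passes the backslash through so the engine reads it as a word boundary.

```python
import re

text = "a cat scattered"

normal = "\bcat\b"   # Python converts each \b to a backspace character (0x08)
raw = r"\bcat\b"     # backslashes are kept, so re sees word-boundary anchors

print(len(normal), len(raw))  # the normal string is shorter: 5 vs 7 characters

normal_hits = re.findall(normal, text)  # looks for backspace + "cat" + backspace
raw_hits = re.findall(raw, text)        # looks for "cat" as a whole word

print(normal_hits)
print(raw_hits)
```

\d only gets away with it because Python leaves escape sequences it does not recognize untouched, and recent Python versions emit a warning for those, so raw strings are the safer habit either way.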