Python Regex Takes So Long In Some Cases
I compiled the following pattern:
pattern = re.compile( r''' (?P.*?) \s* (?P\w+) \s*PACKET\s* (?P\w+) \s*
Solution 1:
Yep, you've got yourself a case of catastrophic backtracking (the kind of pattern that causes it is often called an "evil regex"), here:
\s*
(?P<q_r>.*?)
\s*
Here:
\s*
(?P<flag_char_code>.*?)
\s*
And here:
\s*
\.(?P<domain>.*)\.
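Replacing each .* (and .*?) with \S* should do the trick: \S* can never match whitespace, so it can't compete with the \s* on either side of it for the same characters.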
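To see the effect of the swap, here is a rough sketch. The two patterns are a hypothetical, simplified stand-in for the one in the question (the bracket delimiters, the field layout and the sample lines are all assumptions), and it also assumes that no field contains internal whitespace, which is the condition under which \S* is a safe replacement.

```python
import re
import timeit

# Hypothetical, simplified stand-in for the asker's pattern: the bracket
# delimiters and the field layout are assumptions made for this demo.
SLOW = re.compile(
    r'\s*(?P<q_r>.*?)\s*\[(?P<flag_char_code>.*?)\s*\]\s*\.(?P<domain>.*)\.\s*$'
)
# Same shape, with every .* / .*? swapped for \S*.
FAST = re.compile(
    r'\s*(?P<q_r>\S*)\s*\[(?P<flag_char_code>\S*)\s*\]\s*\.(?P<domain>\S*)\.\s*$'
)

good = 'R [8081] .example.com.'                  # made-up matching line
bad = 'R [8081] .example.com' + ' ' * 30 + 'x'   # made-up line that never matches


def time_pattern(pattern, line, number=200):
    """Return the seconds taken to run pattern.match(line) `number` times."""
    return timeit.timeit(lambda: pattern.match(line), number=number)


slow_match = SLOW.match(good)
fast_match = FAST.match(good)
print(fast_match.groupdict())

# Failing lines are where backtracking hurts, so time those.
print('slow:', time_pattern(SLOW, bad))
print('fast:', time_pattern(FAST, bad))
```

Both patterns capture the same groups on the matching line; the difference shows up on lines that fail to match, so it's worth timing against your own worst-case input rather than trusting the numbers from this toy example.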
For more information about what an evil regex is and why it's evil, check out this question:
How can I recognize an evil regex?
Solution 2:
You can improve your pattern with:
(?P<domain>\w+(?:[-.]\w+)*)
(?P<date>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2} [AP]M)
(?P<q_r>[^[]*)
You need a more explicit subpattern for flag_char_code too. The goal is to describe the content of each group as precisely as possible, which reduces the work the regex engine has to do and avoids backtracking.
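As a rough illustration of how these pieces fit together, the sketch below joins the three subpatterns into one expression. The line layout, the sample line and the [^\]]* stand-in for the flags section are assumptions made for the demo, not the asker's real log format.

```python
import re

# Subpatterns from the answer above.
DATE = r'(?P<date>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2} [AP]M)'
Q_R = r'(?P<q_r>[^\[]*)'   # same as [^[]*, with the bracket escaped for clarity
DOMAIN = r'(?P<domain>\w+(?:[-.]\w+)*)'

# Hypothetical flags section: anything up to the closing bracket.
FLAGS = r'\[(?P<flag_char_code>[^\]]*)\]'

# Assumed layout: "<date> <q_r>[<flags>] <domain>".
# re.VERBOSE is deliberately not used, because DATE contains literal spaces.
LINE = re.compile(DATE + r'\s+' + Q_R + FLAGS + r'\s+' + DOMAIN + r'$')

sample = '6/5/2013 10:00:32 AM R Q [8081 NOERROR] example.com'  # made-up line
m = LINE.match(sample)
fields = {name: value.strip() for name, value in m.groupdict().items()}
print(fields)
```

Each group here can only consume one class of characters and stops at a fixed delimiter, so when a line doesn't match, the engine gives up after a handful of steps instead of trying every possible split between neighbouring groups.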