
Python Regex Takes So Long In Some Cases

I compiled the following pattern:

pattern = re.compile( r''' (?P.*?) \s* (?P\w+) \s*PACKET\s* (?P\w+) \s*

Solution 1:

Yep, you've got yourself a case of catastrophic backtracking, the hallmark of an "evil regex", here:

\s*
(?P<q_r>.*?)
\s*

Here:

\s*
(?P<flag_char_code>.*?)
\s*

And here:

\s*
\.(?P<domain>.*)\.

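Replacing each .*? (and the greedy .*) with \S* should do the trick, as long as those fields never contain whitespace: \S* and the surrounding \s* can no longer match the same characters, so the engine has far fewer ways to split the input.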
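As a rough illustration of that replacement, here is a minimal sketch. The line layout ("<q_r> [<flag_char_code>]"), the sample lines, and the helper time_match are assumptions made up for this example; only the group names come from the snippets above. The timings are printed rather than promised, since they vary by machine.

```python
import re
import time

# Hypothetical, simplified layout: "<q_r> [<flag_char_code>]".
# Only the group names are taken from the answer; the rest is assumed.
AMBIGUOUS = re.compile(r'\s*(?P<q_r>.*?)\s*\[\s*(?P<flag_char_code>.*?)\s*\]')
EXPLICIT = re.compile(r'\s*(?P<q_r>\S*)\s*\[\s*(?P<flag_char_code>\S*)\s*\]')


def time_match(pattern, line):
    """Return (match object or None, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(line)
    return result, time.perf_counter() - start


good_line = " R [8081]"
# Malformed line: a long run of spaces and no closing bracket, so every
# attempt must fail after the engine has tried all possible splits.
bad_line = " R [" + " " * 100 + "x"

# On a well-formed line both patterns capture the same fields.
good_ambiguous = AMBIGUOUS.match(good_line).groupdict()
good_explicit = EXPLICIT.match(good_line).groupdict()

# Both reject the malformed line; only the amount of backtracking differs.
bad_ambiguous, ambiguous_seconds = time_match(AMBIGUOUS, bad_line)
bad_explicit, explicit_seconds = time_match(EXPLICIT, bad_line)

print(good_explicit)
print(f"ambiguous: {ambiguous_seconds:.6f}s  explicit: {explicit_seconds:.6f}s")
```

Note that neighbouring \s* groups can still trade whitespace with each other, so the explicit version reduces the backtracking rather than removing it entirely.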

For more information about what an evil regex is and why it's evil, check out this question:
How can I recognize an evil regex?


Solution 2:

You can improve your pattern with:

(?P<domain>\w+(?:[-.]\w+)*)
(?P<date>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2} [AP]M)
(?P<q_r>[^[]*)

You need a more explicit subpattern for flag_char_code too. The goal is to describe the content of each group precisely, which reduces the regex engine's work and avoids backtracking.
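To show how such explicit subpatterns might fit together, here is a small sketch. The three group definitions are the ones quoted in this answer; the overall line layout ("<date> <q_r>[<domain>]") and the sample lines are hypothetical, chosen only to make the example runnable.

```python
import re

# Subpatterns quoted from the answer above.
DOMAIN = r'(?P<domain>\w+(?:[-.]\w+)*)'
DATE = r'(?P<date>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2} [AP]M)'
Q_R = r'(?P<q_r>[^[]*)'

# Hypothetical line layout, assumed only for this sketch:
#   "<date> <q_r>[<domain>]"
LINE = re.compile(DATE + r'\s+' + Q_R + r'\[' + DOMAIN + r'\]')

sample = "6/5/2013 8:00:13 PM R Q [www.example-site.com]"
fields = LINE.match(sample).groupdict()
print(fields)

# A malformed line (no bracketed domain) is rejected without heavy
# backtracking, because each group can only match one kind of content.
malformed = "6/5/2013 8:00:13 PM " + "R Q " * 50
rejected = LINE.match(malformed)
print(rejected)
```

Because [^[]* stops at the first "[", the q_r group keeps any trailing space before the bracket; strip it afterwards if that matters.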

