\B+ Vs [\B]+ Vs [^\b]+ In Python Regex
Solution 1:
The \B+ pattern causes a "nothing to repeat" error, which is the usual error when you try to quantify a special regex operator such as a zero-width assertion. Any of these - (*, |*, \b+, \B+ - will cause this error. Repeating a zero-width assertion makes no sense because it does not consume any characters, so the regex index remains at the same position. Note that a{1,2}+ and f*+ (possessive quantifiers, which Python re did not support before version 3.11) cause another, similar error on those older versions: "multiple repeat".
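The errors above can be checked with a small sketch like the following. The helper name compile_error is made up for illustration; it only wraps re.compile and returns the error message, if any. The exact message text and position suffix may differ between Python versions, and the possessive patterns are expected to compile without error on Python 3.11 and later.

```python
import re

def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# Quantifying nothing, or a zero-width assertion, is rejected at compile time.
for pattern in (r"(*", r"|*", r"\b+", r"\B+"):
    print(repr(pattern), "->", compile_error(pattern))

# Possessive quantifiers: an error on older versions, valid on newer ones.
for pattern in (r"a{1,2}+", r"f*+"):
    print(repr(pattern), "->", compile_error(pattern))
```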
Now, \b and \B cannot be used as boundaries inside a character class. See the Python re reference:
Note that \b is used to represent word boundaries, and means “backspace” only inside character classes. ... Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
Also, FYI, about \number:
... Inside the [ and ] of a character class, all numeric escapes are treated as characters.
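A minimal sketch of what the reference describes, using made-up sample strings: outside a class \b matches at zero-width word boundaries, inside a class it matches the backspace character, and a numeric escape such as \1 inside a class is read as a character code rather than a backreference.

```python
import re

# Non-raw literal on purpose: "\b" here is a real backspace character (\x08).
text = "ab\bcd"

# Outside a class, \b is a word boundary, so every match is an empty string.
boundaries = re.findall(r"\b", text)

# Inside a class, \b is the backspace character itself.
backspaces = re.findall(r"[\b]", text)
print(boundaries, backspaces)

# Inside a class, \1 is the character with octal code 1, not a backreference.
numeric = re.findall(r"[\1]", "x\x01y1")
print(numeric)
```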
In the same way, you cannot use \B, \A, \Z and backreferences like \1 inside character classes. They just lose their special regex meaning there. On older Python versions (Python 2, and Python 3 before 3.6), an unknown escape sequence is parsed as \ + char, so [\B] matches only the B char: the \ is escaping a literal symbol and the symbol is matched as such. Since Python 3.6, an unknown escape made of \ and an ASCII letter is an error, so [\B] raises a "bad escape" error instead. Thus, on those older versions,
import re
print(re.findall(r'[\B]+', "BBB \\Bash"))
outputs ['BBB', 'B'].
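Because the behavior of [\B] depends on the Python version, here is a hedged sketch that reports whichever outcome applies. The helper name try_findall is hypothetical. It assumes that newer versions reject the pattern with a "bad escape" error, while older ones treat \B inside the class as a literal B.

```python
import re

def try_findall(pattern, text):
    """Return re.findall(pattern, text), or the error message if the pattern is rejected."""
    try:
        return re.findall(pattern, text)
    except re.error as exc:
        return "re.error: " + str(exc)

# Either a list of runs of B chars (older versions) or an error message (newer ones).
print(try_findall(r"[\B]+", "BBB \\Bash"))

# To match runs of B chars on any version, drop the backslash.
print(try_findall(r"[B]+", "BBB \\Bash"))
```

If the goal really is to match literal B chars, writing the class as [B] (or just B+) avoids the version dependence altogether.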
And r"[^\b]+" matches every run of chars that are not a backspace char:
print(re.findall(r'[^\b]+', "bbb \\bash\baaa"))
outputs ['bbb \\bash', 'aaa'].
Solution 2:
\B+ causes an error because there's no point in repeating a boundary assertion - one boundary is the same as two boundaries. It's more likely that you've done this by mistake, so the error makes sense. [\B]+ is something completely different. (Most) escape sequences do not keep their special meaning inside a character class, so \B is not an assertion there. On Python versions that accept the pattern, this is a character set that matches the character B (not \), so obviously repeating this is possible.