Making Regex More Specific To Exclude Certain Characters
Solution 1:
Instead of matching . between the sets of numbers (which matches any character), match only the delimiters you are looking for: . - and /
Fixed regex:
\d{1,2}[\.\/-]\d{1,2}[\.\/-](?:\d{4}|\d{2})\b
The word boundary \b added at the end prevents matching dates whose year has only 3 digits, such as 12.23.192.
regex101 example: https://regex101.com/r/0r6jru/2
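As a rough illustration of the difference, the sketch below compares the fixed regex with a looser variant that uses an unescaped . between the numbers. The loose pattern is an assumption made for comparison (the original pattern from the question is not shown here), and the sample string is a shortened, hypothetical version of the test input.

```python
import re

# Hypothetical "before" pattern (an assumption): the unescaped . matches any character.
loose = re.compile(r"\d{1,2}.\d{1,2}.(?:\d{4}|\d{2})\b")
# Fixed pattern from Solution 1: only . / and - are accepted as delimiters.
strict = re.compile(r"\d{1,2}[\.\/-]\d{1,2}[\.\/-](?:\d{4}|\d{2})\b")

sample = "12/12/1981 1*51*12 22|1|13 03-02-1919 not 12.23.192"

# Neither pattern has a capturing group, so findall returns the full matches.
loose_matches = loose.findall(sample)
strict_matches = strict.findall(sample)

print(loose_matches)   # also picks up 1*51*12 and 22|1|13
print(strict_matches)  # only the dates with . / or - as delimiters
```

Both patterns reject 12.23.192 because of the trailing \b, but only the character class keeps * and | from being treated as delimiters.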
Solution 2:
If you want to match the same delimiter for the whole "date like" pattern, you could make use of a capturing group and a backreference \1 so that a date like 12/12.1981 is not matched.
Note that the pattern does not validate a date itself.
\b\d{1,2}([./-])\d{1,2}\1(?:\d{4}|\d{2})\b
\b : Word boundary
\d{1,2} : Match 1-2 digits
([./-]) : Capture group 1, match . or / or -
\d{1,2}\1 : Match 1-2 digits and a backreference to the first captured delimiter
(?:\d{4}|\d{2}) : Match either 4 or 2 digits
\b : Word boundary
For example, using re.finditer (re.findall would return only the contents of the capturing group, which is the delimiter):
import re

# Group 1 captures the first delimiter; \1 requires the same delimiter again.
reg = r"\b\d{1,2}([./-])\d{1,2}\1(?:\d{4}|\d{2})\b"
s = "01.11.11 12/12/1981 1*51*12 . 22|1|13 03-02-1919 1-22-12 or 01-23-18 or 03-23-1984 01.11.18 or 2.2.17 or 02.02.18 or 12.1.16 12.23.1943 01-23-11 not 12.23.192 not 02.02.1"
matches = re.finditer(reg, s)
for matchNum, match in enumerate(matches, start=1):
    # match.group() is the full matched date, not just the delimiter.
    print(match.group())
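To make the re.findall behaviour concrete, here is a minimal sketch that runs both functions with the Solution 2 pattern. The short sample string and the variable names are made up for illustration.

```python
import re

reg = r"\b\d{1,2}([./-])\d{1,2}\1(?:\d{4}|\d{2})\b"
# Hypothetical sample: the middle entry mixes / and . and should not match.
sample = "12/12/1981 12/12.1981 03-02-1919"

# With exactly one capturing group, findall returns that group's text.
delimiters = re.findall(reg, sample)
# finditer yields match objects, so group(0) gives the whole match.
dates = [m.group(0) for m in re.finditer(reg, sample)]

print(delimiters)
print(dates)
```

Here findall yields only the captured delimiters, while the finditer loop yields the full date strings, and the mixed-delimiter entry is skipped by both.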
Solution 3:
\b((?:\d{1,2}(?:\.|\/|-)){2}(?:\d{4}|\d{2}))\b
This regex handles all of your test cases and filters out improper years, such as 12.23.192. Unlike Solution 2, it does not require both delimiters to be the same, so 12/12.1981 is also matched.
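A small side-by-side check, sketched under the assumption that each candidate is tested as a whole string with fullmatch, shows how this pattern differs from the backreference version in Solution 2. The names solution_2, solution_3 and results are illustrative only.

```python
import re

solution_2 = re.compile(r"\b\d{1,2}([./-])\d{1,2}\1(?:\d{4}|\d{2})\b")
solution_3 = re.compile(r"\b((?:\d{1,2}(?:\.|\/|-)){2}(?:\d{4}|\d{2}))\b")

# Map each candidate to (matched by Solution 2, matched by Solution 3).
results = {}
for text in ["12/12/1981", "12/12.1981", "12.23.192"]:
    results[text] = (
        solution_2.fullmatch(text) is not None,
        solution_3.fullmatch(text) is not None,
    )
    print(text, results[text])
```

Both patterns reject the 3-digit year, but only the backreference version rejects the mixed delimiters in 12/12.1981.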