
Making Regex More Specific To Exclude Certain Characters

import re
s = '01.11.11 12/12/1981 1*51*12 . 22|1|13 03-02-1919 1-22-12 or 01-23-18 or 03-23-1984 01.11.18 or 2.2.17 or 02.02.18 or 12.1.16 12.23.1943 01-23-11 not 12.23.192 not 02

Solution 1:

Instead of using . between the groups of digits (an unescaped . matches any character), match only the delimiters you are looking for: ., / and -.

Fixed regex:

\d{1,2}[\.\/-]\d{1,2}[\.\/-](?:\d{4}|\d{2})\b

It also adds a word boundary \b at the end, so dates with a three-digit year such as 12.23.192 are not matched.

regex101 example: https://regex101.com/r/0r6jru/2
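A minimal Python sketch of Solution 1, assuming the full test string from the Python code further down (the string in the question is cut off). The pattern has no capturing group, so re.findall returns the whole matched text. The extra sample 12/12.1981 is added only for illustration, to show that this pattern still accepts mixed delimiters.

```python
import re

# Solution 1 pattern: each delimiter must be ".", "/" or "-", and the
# trailing \b rejects a 3-digit year such as 192
pattern = r"\d{1,2}[\.\/-]\d{1,2}[\.\/-](?:\d{4}|\d{2})\b"

s = ("01.11.11 12/12/1981 1*51*12 . 22|1|13 03-02-1919 1-22-12 or 01-23-18 "
     "or 03-23-1984 01.11.18 or 2.2.17 or 02.02.18 or 12.1.16 12.23.1943 "
     "01-23-11 not 12.23.192 not 02.02.1")

# No capturing group, so findall returns the full matches
matches = re.findall(pattern, s)
print(matches)

# The two delimiters are checked independently, so mixing them is allowed
mixed = re.findall(pattern, "12/12.1981")
print(mixed)  # ['12/12.1981']
```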

Solution 2:

If you want the same delimiter to be used throughout the "date-like" pattern, you can use a capturing group and a backreference \1, so that a date like 12/12.1981 is not matched.

Note that the pattern does not check whether the date itself is valid.

\b\d{1,2}([./-])\d{1,2}\1(?:\d{4}|\d{2})\b
  • \b Word boundary
  • \d{1,2} Match 1-2 digits
  • ([./-]) Capture group 1, match ., / or -
  • \d{1,2}\1 Match 1-2 digits, then a backreference to the delimiter captured in group 1
  • (?:\d{4}|\d{2}) Match either 4 or 2 digits
  • \b Word boundary
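To see the backreference at work, here is a small Python check comparing the Solution 2 pattern with a variant whose second delimiter is matched independently. Apart from 12/12.1981, the sample strings are illustrative choices, not taken from the question.

```python
import re

# Solution 2: \1 forces the second delimiter to equal the first one
same_delim = re.compile(r"\b\d{1,2}([./-])\d{1,2}\1(?:\d{4}|\d{2})\b")
# For comparison: the second delimiter is matched on its own
any_delim = re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-](?:\d{4}|\d{2})\b")

samples = ["12/12/1981", "12/12.1981", "03-02-1919", "03-02/1919"]
results = {s: (bool(same_delim.search(s)), bool(any_delim.search(s))) for s in samples}

for sample, (same, independent) in results.items():
    print(f"{sample}: backreference={same}, independent={independent}")
```

Only the backreference version rejects 12/12.1981 and 03-02/1919.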


For example, using re.finditer (re.findall would return only the capturing group, i.e. the delimiter, instead of the whole match):

import re

# ([./-]) captures the first delimiter; \1 requires the same one again
reg = r"\b\d{1,2}([./-])\d{1,2}\1(?:\d{4}|\d{2})\b"
s = "01.11.11 12/12/1981 1*51*12 . 22|1|13 03-02-1919 1-22-12 or 01-23-18 or 03-23-1984 01.11.18 or 2.2.17 or 02.02.18 or 12.1.16 12.23.1943 01-23-11 not 12.23.192 not 02.02.1"

# finditer yields match objects; group() returns the whole match, not group 1
for match in re.finditer(reg, s):
    print(match.group())
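To illustrate the remark about re.findall: when the pattern contains exactly one capturing group, findall returns that group for each match, so here you get the delimiters rather than the dates. A short sketch, using a shortened sample string chosen for illustration:

```python
import re

reg = r"\b\d{1,2}([./-])\d{1,2}\1(?:\d{4}|\d{2})\b"
sample = "01.11.11 12/12/1981 03-02-1919 12.23.192"

# With one capturing group, findall returns group 1 (the delimiter)
delims = re.findall(reg, sample)
print(delims)  # ['.', '/', '-']

# finditer returns match objects; group() is the whole matched date
dates = [m.group() for m in re.finditer(reg, sample)]
print(dates)  # ['01.11.11', '12/12/1981', '03-02-1919']
```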

Solution 3:

\b((?:\d{1,2}(?:\.|\/|-)){2}(?:\d{4}|\d{2}))\b

This regex matches all of your test cases and rejects improper years such as 12.23.192.
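A quick Python check of Solution 3 on part of the test string. Because the whole date is wrapped in capturing group 1 and the inner groups are non-capturing, re.findall returns the full dates here. Unlike Solution 2, this pattern does not require both delimiters to be the same, so the illustrative extra sample 12/12.1981 is accepted.

```python
import re

# Outer group 1 captures the whole date; inner groups are non-capturing
pattern = r"\b((?:\d{1,2}(?:\.|\/|-)){2}(?:\d{4}|\d{2}))\b"

s = "01.11.11 12/12/1981 1*51*12 03-02-1919 12.23.1943 not 12.23.192 not 02.02.1"

found = re.findall(pattern, s)
print(found)  # ['01.11.11', '12/12/1981', '03-02-1919', '12.23.1943']

# Mixed delimiters are not rejected by this pattern
mixed = re.findall(pattern, "12/12.1981")
print(mixed)  # ['12/12.1981']
```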

