
Extract Salaries From A List Of Strings

I'm trying to extract salaries from a list of strings. I'm using the regex findall() function but it's returning many empty strings as well as the salaries and this is causing me

Solution 1:

When a pattern contains capturing groups, re.findall returns the contents of those groups rather than the whole match. Your pattern uses a group in which almost everything is optional, so it can match without consuming any characters, and that produces the empty strings in the result.

In your pattern you use [0-9]*, which matches a digit zero or more times. If there is no limit on the number of leading digits, you could use [0-9]+ instead so that at least one digit is required.
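To make both points concrete, here is a minimal sketch. The patterns are simplified, hypothetical ones chosen for illustration (the asker's original pattern is not shown), and the variable names are assumptions.

```python
import re

sal = '41 000€ à 63 000€ / an'

# Without a group, findall returns the whole match;
# with one capturing group, it returns only the group's text.
whole_match = re.findall(r'[0-9]+€', sal)
group_only = re.findall(r'([0-9]+)€', sal)
print(whole_match)  # ['000€', '000€']
print(group_only)   # ['000', '000']

# [0-9]* may match zero characters, so it also "matches" where there is no digit.
loose = re.findall(r'([0-9]*)', sal)
# [0-9]+ requires at least one digit, so no empty strings come back.
strict = re.findall(r'([0-9]+)', sal)
print('' in loose)  # True
print(strict)       # ['41', '000', '63', '000']
```

Note that neither simplified pattern captures the full amount yet; that is what the optional space-and-digits part in the pattern below is for.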

You might use this pattern with a capturing group:

(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)


Explanation

  • (?<!\S) Assert that what is on the left is not a non-whitespace character
  • ( Capture group
    • [0-9]+(?: [0-9]{1,3})? Match 1+ digits followed by an optional part that matches a space and 1-3 digits
  • ) Close capture group
  • € Match the euro sign literally
  • (?!\S) Assert that what is on the right is not a non-whitespace character
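The effect of the two whitespace assertions can be checked with a small sketch. The sample strings are made up for illustration and are not from the question.

```python
import re

pattern = re.compile(r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)')

# Hypothetical inputs: a normal amount, a digit run glued to a letter,
# a euro sign glued to the next token, and an amount with two space groups.
samples = ['41 000€ / an', 'x41€', '41€/an', '1 000 000€']
results = {s: pattern.findall(s) for s in samples}
for s, found in results.items():
    print(repr(s), found)
# '41 000€ / an' ['41 000']
# 'x41€' []
# '41€/an' []
# '1 000 000€' ['000 000']
```

The last case shows a limit of the pattern as written: it allows only one space-separated group of digits, so an amount in the millions is captured only partially. If such values can occur, the optional part would probably need a repeating quantifier.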

Your code might look like:

import re
sal = '41 000€ à 63 000€ / an'  # sample string that produced the errors
# Raw string, so \S reaches the regex engine instead of being read as a string escape
regex = r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)'
print(re.findall(regex, sal))  # ['41 000', '63 000']
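Since the question is about a list of strings, the same pattern could be applied to each element. The sketch below is one possible way to do it; the helper name extract_salaries and the sample list offers are hypothetical, and converting the matches to integers is an added convenience, not something the question asks for.

```python
import re

SALARY_RE = re.compile(r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)')

def extract_salaries(lines):
    """Return one list of integer salaries per input string."""
    return [
        # Drop the thousands-separator space before converting to int
        [int(match.replace(' ', '')) for match in SALARY_RE.findall(line)]
        for line in lines
    ]

# Hypothetical sample data
offers = ['41 000€ à 63 000€ / an', '2500€ / mois', 'salaire non précisé']
print(extract_salaries(offers))  # [[41000, 63000], [2500], []]
```

Compiling the pattern once avoids rebuilding it for every string, and strings without a salary simply yield an empty list.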
