
How To Extract A Specific Digit In Each Row Of A Pandas Series Containing Text

I have a pd.Series that looks as follows:

0    some texts...final exam marks:50 next level:10
1    some texts....final exam marks he has got:54 next level:15
2    some texts...f

Solution 1:

Try

s.str.extract(r'.*marks:\s?(\d+)', expand=False)


0    50
1    54
2    45

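With the update to the question:

s.str.extract(r'.*marks.*?(\d+)', expand=False)

This regex allows for any characters (or none) between "marks" and the number: the lazy .*? skips ahead only as far as the first digit after "marks", and (\d+) captures the run of digits that starts there.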

You get

0    50
1    54
2    45
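
As a rough, self-contained sketch of how the two patterns above differ, the snippet below runs both on a small Series. The sample strings are hypothetical, modelled on the rows in the question (the third row is an assumption, since the original is truncated), and the variable names strict, relaxed and marks are illustrative only.

```python
import pandas as pd

# Hypothetical sample data modelled on the rows shown in the question.
s = pd.Series([
    "some texts...final exam marks:50 next level:10",
    "some texts....final exam marks he has got:54 next level:15",
    "some texts...final marks: 45 next best level:20",
])

# Strict pattern: the digits must follow "marks:" directly,
# with at most one whitespace character in between.
strict = s.str.extract(r".*marks:\s?(\d+)", expand=False)

# Relaxed pattern: any characters may sit between "marks" and the digits.
relaxed = s.str.extract(r".*marks.*?(\d+)", expand=False)

# str.extract returns strings (NaN where nothing matched);
# convert explicitly when numeric values are needed.
marks = pd.to_numeric(relaxed)

print(strict)
print(relaxed)
print(marks)
```

With this sample data the strict pattern finds nothing in the second row, because "he has got" separates "marks" from the colon, while the relaxed pattern matches all three rows.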

Solution 2:

You need the lookbehind syntax (?<=...), which asserts that a pattern is preceded by another pattern without making that preceding text part of the match. (?<=marks:) *([0-9]+) extracts the digits that come after marks:, allowing optional spaces in between:

s
#0    some texts...final exam marks:50 next lev...
#1    some texts....final exam marks:54 next le...
#2    some texts...final marks: 45 next best le...
#Name: 1, dtype: object

s.str.extract("(?<=marks:) *([0-9]+)", expand=False)

#0    50
#1    54
#2    45
#Name: 1, dtype: object
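
The lookbehind approach can be tried on its own with a minimal sketch like the one below. The sample strings and the names after_marks and variable_width_ok are assumptions for illustration. The second part shows why the optional spaces sit outside the lookbehind: Python's re module, which pandas uses for str.extract, only accepts fixed-width lookbehind patterns.

```python
import re

import pandas as pd

# Hypothetical sample data; the last row has no "marks:" at all.
s = pd.Series([
    "some texts...final exam marks:50 next level:10",
    "some texts...final marks: 45 next best level:20",
    "no score recorded",
])

# Digits that come right after "marks:", with optional spaces in between.
after_marks = s.str.extract(r"(?<=marks:) *([0-9]+)", expand=False)
print(after_marks)

# A lookbehind in Python's re module must have a fixed width, so the
# variable-length " *" cannot be moved inside it.
try:
    re.compile(r"(?<=marks:\s*)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)
```

Rows without a match come back as NaN. Because the capture group already selects what is returned, the plain pattern marks: *([0-9]+) should give the same result here without a lookbehind.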
