
Extracting Dates In Any Format From A Pandas Column (the Date Is A Part Of A Longer String)

I'm trying to extract dates in any format from a pandas column (the date is a part of a longer string). I have found this answer which does it outside of pandas, but I'm not sure h

Solution 1:

Using the approach from the linked answer:

import dateutil.parser as dparser
s.apply(lambda x: dparser.parse(x,fuzzy=True).strftime('%Y-%m-%d'))

Of course, dparser cannot cope with every possibility: in the sample data you will have to change "footballer, born 1900s" to "footballer, born 1900's", otherwise parse reads the trailing "s" as a seconds unit and complains that second must be in 0..59.

If you need exception handling, you will have to define a regular function, because a lambda cannot contain try/except:
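The failure described above can be reproduced on its own. This is a minimal sketch using only the two sample strings from the answer; the names samples and results are illustrative, not part of the original code.

```python
import dateutil.parser as dparser

# The same text with and without the apostrophe before the trailing "s".
samples = ["footballer, born 1900s", "footballer, born 1900's"]

results = {}
for text in samples:
    try:
        # Only the year is kept: missing month/day come from today's date.
        results[text] = dparser.parse(text, fuzzy=True).year
    except ValueError as exc:
        # dateutil's parser errors are ValueError subclasses.
        results[text] = f"failed: {exc}"

for text, outcome in results.items():
    print(f"{text!r} -> {outcome}")
```

The first string should be reported as failed, while the second should yield the year 1900.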

def myparser(x):
    try:
        return dparser.parse(x, fuzzy=True)
    except (ValueError, OverflowError):
        # dateutil could not find a valid date in this string
        return None

s.apply(myparser)

This will insert NaT values for wrong dates (or you can provide a 'default date' if you like):

0   1989-10-12
1          NaT
2   1987-12-29
3   1983-07-12
4          NaT
5   2019-05-16
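The "default date" option mentioned above might look like the sketch below. The helper name parse_or_none and the fallback date 2000-01-01 are assumptions chosen for illustration; default is the dateutil parameter that supplies any fields missing from the string (without it, today's date is used, which is why the output above shows a day of 12 and a month of 07 for incomplete dates).

```python
from datetime import datetime

import dateutil.parser as dparser
import pandas as pd


def parse_or_none(text, default=None):
    """Return the date dateutil finds in text, or None if parsing fails."""
    try:
        return dparser.parse(text, fuzzy=True, default=default)
    except (ValueError, OverflowError):
        return None


s = pd.Series([
    "footballer, born 29 December 1987",
    "footballer, born 1900s",
    "Brazilian footballer, born 1983",
])

# Hypothetical fallback: missing month/day become January 1st.
fallback = datetime(2000, 1, 1)
dates = pd.to_datetime(s.apply(lambda text: parse_or_none(text, default=fallback)))
print(dates)
```

With a fixed default the result no longer depends on the day the code is run: the year-only row becomes 1983-01-01, and the unparsable row stays NaT.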

Solution 2:

Try this. If it can't recognize a row as containing a date, it returns 0001-01-01. If the date is incomplete (the month or day is missing), the missing parts are taken from the default, so it assumes January 1st. You can change this by adjusting the default.

import pandas as pd
import numpy as np
from datetime import datetime
from dateutil.parser import parse

l = ['footballer, born October 1989',
'footballer, born 1900s',
'footballer, born 29 December 1987',
'Brazilian footballer, born 1983',
'31/02/1901',
'16 May 2019']

df = pd.Series(l, name='strings')

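def get_dates(series):
    default = datetime(1, 1, 1)
    my_list = []
    for text in series:
        # Try every suffix of the string until one parses as a date.
        for j in range(len(text)):
            try:
                my_list.append(parse(text[j:], default=default).strftime('%Y-%m-%d'))
                break
            except (ValueError, OverflowError):
                pass
        else:
            # No suffix parsed: use the default so the rows stay aligned.
            my_list.append(default.strftime('%Y-%m-%d'))
    return pd.Series(my_list)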


get_dates(df)

0    1989-10-01
1    0001-01-01
2    1987-12-29
3    1983-01-01
4    1901-01-02
5    2019-05-16
dtype: object
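Rows 1 and 4 of this output are easier to understand when you see which part of the string was actually parsed. The sketch below applies the same suffix-scanning idea to a single string and returns the matched suffix alongside the date; the function name first_parsable_suffix is hypothetical.

```python
from datetime import datetime

from dateutil.parser import parse

DEFAULT = datetime(1, 1, 1)


def first_parsable_suffix(text, default=DEFAULT):
    """Return (suffix, datetime) for the first suffix that parses, else None."""
    for start in range(len(text)):
        suffix = text[start:]
        try:
            return suffix, parse(suffix, default=default)
        except (ValueError, OverflowError):
            continue  # drop one more leading character and retry
    return None


for sample in ["footballer, born 1900s", "31/02/1901"]:
    print(repr(sample), "->", first_parsable_suffix(sample))
```

For "31/02/1901" the full string is rejected (there is no 31 February), so the scan falls through to "1/02/1901", which dateutil reads month-first as 2 January 1901. For "footballer, born 1900s" a short trailing fragment eventually parses as a bare time value, leaving the date at the default. Both results are plausible-looking but wrong, so it may be worth validating the output before relying on it.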
