
Remove Specific Columns In Dataframe With Same Id On Date Condition

I have two datasets: One contains house energy certificates issued the last 10 years with an ID for the house and the date it was issued. One house could have more certificates is

Solution 1:

I extended your df with one more address_id / transaction_id for better testing. The DataFrame is read from Excel here; you can modify that part as needed.

input_df

transaction_id  address_id  official_date  certificate    issued_date
83866285        1157600091  5/25/2016      A2012-278940   17.12.2012 17:44:17
83866285        1157600091  5/25/2016      A2012-278941   17.12.2012 17:48:35
83866285        1157600091  5/25/2016      A2016-638538   22.02.2016 10:16:12
83866285        1157600091  5/25/2016      A2016-638577   22.02.2016 10:22:45
83866285        1157600091  5/25/2016      A2019-1065662  21.10.2019 15:39:30
83866286        1157600093  5/25/2019      A2012-278940   17.12.2012 17:44:17
83866286        1157600093  5/25/2019      A2012-278941   17.12.2012 17:48:35
83866286        1157600093  5/25/2019      A2016-638538   22.02.2016 10:16:12
83866286        1157600093  5/25/2019      A2016-638577   22.02.2016 10:22:45
83866286        1157600093  5/25/2019      A2019-1065662  21.11.2019 15:39:30


import pandas as pd

input_df = pd.read_excel('input.xlsx', sheet_name='Sheet1')

# Convert both date columns to datetime.
# issued_date is day-first (e.g. 17.12.2012), so state that explicitly.
input_df['issued_date'] = pd.to_datetime(input_df['issued_date'], dayfirst=True)
input_df['official_date'] = pd.to_datetime(input_df['official_date'])

# Temporary column: absolute gap between issue date and official date
input_df['diff_days'] = (input_df['issued_date'] - input_df['official_date']).abs()
print(input_df)

# For each transaction_id, keep the row with the smallest gap
input_df = input_df.loc[input_df.groupby('transaction_id')['diff_days'].idxmin()]

# Drop the temporary column
input_df = input_df.drop(columns=['diff_days'])
print(input_df)

Output:

   transaction_id  address_id official_date    certificate         issued_date
3        83866285  1157600091    2016-05-25   A2016-638577 2016-02-22 10:22:45
9        83866286  1157600093    2019-05-25  A2019-1065662 2019-11-21 15:39:30
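If you want to try this without an Excel file, here is a minimal self-contained sketch of the same idea. It builds the sample rows inline, and the explicit date formats are an assumption based on how the sample values look. The names `rows`, `gap` and `closest` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the input table above, built inline instead of read from Excel.
rows = [
    (83866285, 1157600091, "5/25/2016", "A2012-278940", "17.12.2012 17:44:17"),
    (83866285, 1157600091, "5/25/2016", "A2012-278941", "17.12.2012 17:48:35"),
    (83866285, 1157600091, "5/25/2016", "A2016-638538", "22.02.2016 10:16:12"),
    (83866285, 1157600091, "5/25/2016", "A2016-638577", "22.02.2016 10:22:45"),
    (83866285, 1157600091, "5/25/2016", "A2019-1065662", "21.10.2019 15:39:30"),
    (83866286, 1157600093, "5/25/2019", "A2012-278940", "17.12.2012 17:44:17"),
    (83866286, 1157600093, "5/25/2019", "A2012-278941", "17.12.2012 17:48:35"),
    (83866286, 1157600093, "5/25/2019", "A2016-638538", "22.02.2016 10:16:12"),
    (83866286, 1157600093, "5/25/2019", "A2016-638577", "22.02.2016 10:22:45"),
    (83866286, 1157600093, "5/25/2019", "A2019-1065662", "21.11.2019 15:39:30"),
]
columns = ["transaction_id", "address_id", "official_date", "certificate", "issued_date"]
df = pd.DataFrame(rows, columns=columns)

# Assumed formats: official_date is month/day/year, issued_date is day.month.year.
df["official_date"] = pd.to_datetime(df["official_date"], format="%m/%d/%Y")
df["issued_date"] = pd.to_datetime(df["issued_date"], format="%d.%m.%Y %H:%M:%S")

# Absolute gap between issue date and official date, kept as a separate Series
# so no temporary column has to be added and dropped again.
gap = (df["issued_date"] - df["official_date"]).abs()

# For each transaction_id, pick the index label of the smallest gap and select those rows.
closest = df.loc[gap.groupby(df["transaction_id"]).idxmin()]
print(closest)
```

Note that this keeps the certificate closest to official_date in either direction. If only certificates issued before the official date should count, the rows would need to be filtered first.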
