
How To Remove Duplicate Timestamp Overlaps And Add Column In The Original Dataframe From Zipped List Findings

I have two columns in my dataframe, 'START_TIME' and 'END_TIME', which I zipped into a list and brought into the form below. I used the following snippet to generate that. zippedList

Solution 1:

Since you only need to know whether an interval overlaps another one (not how many it overlaps), you can break out of the inner for loop as soon as you find an overlap. Build the boolean mask while checking for overlaps in the nested for loops.

import pandas as pd

zippedList = (
    [(pd.Timestamp('2020-06-09 06:00:00'), pd.Timestamp('2020-06-09 16:00:00')),
     (pd.Timestamp('2020-06-09 02:00:00'), pd.Timestamp('2020-06-09 06:00:00')),
     (pd.Timestamp('2020-06-10 02:00:00'), pd.Timestamp('2020-06-10 06:00:00')),
     (pd.Timestamp('2020-06-09 16:00:00'), pd.Timestamp('2020-06-10 02:00:00')),
     (pd.Timestamp('2020-06-10 06:00:00'), pd.Timestamp('2020-06-10 16:00:00')),
     (pd.Timestamp('2020-06-10 16:00:00'), pd.Timestamp('2020-06-11 02:00:00')),
     (pd.Timestamp('2020-06-11 02:00:00'), pd.Timestamp('2020-06-11 06:00:00')),
     (pd.Timestamp('2020-06-11 01:00:00'), pd.Timestamp('2020-06-11 05:00:00')),
     (pd.Timestamp('2020-06-11 06:00:00'), pd.Timestamp('2020-06-11 16:00:00')),
     (pd.Timestamp('2020-06-11 16:00:00'), pd.Timestamp('2020-06-12 02:00:00'))]
    )

# map to intervals before looping
intervals = list(map(lambda i: pd.Interval(i[0], i[1], closed='neither'), zippedList))

m = []
for i1 in intervals:
    for i2 in intervals:
        if (i1.overlaps(i2)) and i1 != i2:
            m.append(True)
            break
    else: # else clause will only be called if break wasn't executed
        m.append(False)

for b, t in zip(m, zippedList):
    if b:
        print(t)

# (Timestamp('2020-06-10 16:00:00'), Timestamp('2020-06-11 02:00:00'))
# (Timestamp('2020-06-11 02:00:00'), Timestamp('2020-06-11 06:00:00'))
# (Timestamp('2020-06-11 01:00:00'), Timestamp('2020-06-11 05:00:00'))

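m has one entry per tuple in zippedList, in the same order. As long as zippedList was built from the rows of your df in order, you can make m a column of it, i.e. new_df['OVERLAPS'] = m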
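To tie this back to the dataframe, here is a minimal sketch of the whole round trip. The column names START_TIME and END_TIME come from the question; the four sample rows, the helper name overlap_mask, and the variable no_overlaps are assumptions made up for illustration. It uses any() in place of the for/else, which also stops at the first overlap it finds.

```python
import pandas as pd

# Hypothetical stand-in for the original dataframe; only the column
# names START_TIME and END_TIME are taken from the question.
new_df = pd.DataFrame({
    "START_TIME": pd.to_datetime(["2020-06-10 16:00", "2020-06-11 02:00",
                                  "2020-06-11 01:00", "2020-06-11 06:00"]),
    "END_TIME": pd.to_datetime(["2020-06-11 02:00", "2020-06-11 06:00",
                                "2020-06-11 05:00", "2020-06-11 16:00"]),
})


def overlap_mask(df):
    """Return one boolean per row: True if the row overlaps a different interval."""
    # Open intervals, so rows that merely touch at an endpoint do not count.
    intervals = [pd.Interval(start, end, closed="neither")
                 for start, end in zip(df["START_TIME"], df["END_TIME"])]
    # any() short-circuits on the first overlap, like break in the loop version.
    # As in the loop version, identical intervals are not flagged (i1 != i2).
    return [any(i1.overlaps(i2) and i1 != i2 for i2 in intervals)
            for i1 in intervals]


# The mask is in row order, so it can be assigned directly as a column.
new_df["OVERLAPS"] = overlap_mask(new_df)

# Keep only the rows that do not overlap any other row.
no_overlaps = new_df[~new_df["OVERLAPS"]]
print(new_df)
print(no_overlaps)
```

Whether you drop every flagged row, as above, or keep one row from each overlapping group depends on what "remove" should mean for your data. Both versions compare every pair of rows, so they are quadratic in the number of rows.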
