How To Remove Duplicate Timestamp Overlaps And Add Column In The Original Dataframe From Zipped List Findings

February 01, 2024

I have two columns in my dataframe, 'START_TIME' and 'END_TIME', which I zipped into a list and brought into the form below. I used the following snippet to generate it: zippedList

Solution 1:

Since you're only interested in whether there is an overlap (and not in how many), you can break out of the inner for loop as soon as you find one. Build the boolean mask while checking for overlaps in the nested for loops.

import pandas as pd

zippedList = (
    [(pd.Timestamp('2020-06-09 06:00:00'), pd.Timestamp('2020-06-09 16:00:00')),
     (pd.Timestamp('2020-06-09 02:00:00'), pd.Timestamp('2020-06-09 06:00:00')),
     (pd.Timestamp('2020-06-10 02:00:00'), pd.Timestamp('2020-06-10 06:00:00')),
     (pd.Timestamp('2020-06-09 16:00:00'), pd.Timestamp('2020-06-10 02:00:00')),
     (pd.Timestamp('2020-06-10 06:00:00'), pd.Timestamp('2020-06-10 16:00:00')),
     (pd.Timestamp('2020-06-10 16:00:00'), pd.Timestamp('2020-06-11 02:00:00')),
     (pd.Timestamp('2020-06-11 02:00:00'), pd.Timestamp('2020-06-11 06:00:00')),
     (pd.Timestamp('2020-06-11 01:00:00'), pd.Timestamp('2020-06-11 05:00:00')),
     (pd.Timestamp('2020-06-11 06:00:00'), pd.Timestamp('2020-06-11 16:00:00')),
     (pd.Timestamp('2020-06-11 16:00:00'), pd.Timestamp('2020-06-12 02:00:00'))]
    )

# map to intervals before looping
intervals = list(map(lambda i: pd.Interval(i[0], i[1], closed='neither'), zippedList))

m = []
for i1 in intervals:
    for i2 in intervals:
        if (i1.overlaps(i2)) and i1 != i2:
            m.append(True)
            break
    else:  # the else clause runs only if the inner loop finished without a break
        m.append(False)

for b, t in zip(m, zippedList):
    if b:
        print(t)

# (Timestamp('2020-06-10 16:00:00'), Timestamp('2020-06-11 02:00:00'))
# (Timestamp('2020-06-11 02:00:00'), Timestamp('2020-06-11 06:00:00'))
# (Timestamp('2020-06-11 01:00:00'), Timestamp('2020-06-11 05:00:00'))
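As a possible alternative to the nested loop above, the same kind of mask can be sketched with pd.IntervalIndex, whose overlaps method checks one interval against the whole index at once. The names pairs, idx and mask are illustrative only. This sketch assumes every interval has a non-zero length, so each one overlaps itself and a count above 1 means at least one other interval overlaps it. Unlike the i1 != i2 check, it would also flag exact duplicates.

```python
import pandas as pd

# Three sample pairs taken from zippedList (illustrative subset).
pairs = [
    (pd.Timestamp('2020-06-10 16:00:00'), pd.Timestamp('2020-06-11 02:00:00')),
    (pd.Timestamp('2020-06-11 01:00:00'), pd.Timestamp('2020-06-11 05:00:00')),
    (pd.Timestamp('2020-06-11 06:00:00'), pd.Timestamp('2020-06-11 16:00:00')),
]

# Open intervals, so ranges that only touch at an endpoint do not overlap.
idx = pd.IntervalIndex.from_tuples(pairs, closed='neither')

# Each interval matches itself once; more than one match means a real overlap.
mask = [bool(idx.overlaps(iv).sum() > 1) for iv in idx]
print(mask)
# [True, True, False]
```

This is still quadratic in the number of intervals, but the inner comparison runs inside pandas rather than in a Python loop.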

Now you can make m a column of your df, i.e. new_df['OVERLAPS'] = m
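A minimal sketch of that last step, assuming a dataframe with the two columns from the question. The frame df, its sample rows, the helper overlap_mask and the name no_overlaps are hypothetical; only the column names START_TIME, END_TIME and OVERLAPS come from the text above. The mask is built in row order, so it lines up with the dataframe when assigned as a column.

```python
import pandas as pd

# Hypothetical stand-in for the original dataframe.
df = pd.DataFrame({
    'START_TIME': pd.to_datetime(['2020-06-10 16:00', '2020-06-11 02:00',
                                  '2020-06-11 01:00', '2020-06-11 06:00']),
    'END_TIME': pd.to_datetime(['2020-06-11 02:00', '2020-06-11 06:00',
                                '2020-06-11 05:00', '2020-06-11 16:00']),
})

def overlap_mask(starts, ends):
    """Return one boolean per row: True if the row overlaps a different interval."""
    intervals = [pd.Interval(s, e, closed='neither') for s, e in zip(starts, ends)]
    # any() stops at the first hit, like the break in the nested loop.
    return [any(i1.overlaps(i2) and i1 != i2 for i2 in intervals)
            for i1 in intervals]

df['OVERLAPS'] = overlap_mask(df['START_TIME'], df['END_TIME'])

# Keep only the rows that do not overlap any other row.
no_overlaps = df[~df['OVERLAPS']]
print(df['OVERLAPS'].tolist())
# [True, True, True, False]
```

Whether to drop every overlapping row, as here, or to keep one row from each overlapping group depends on what "remove" should mean for your data.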
