Get Ranges Of True Values (start And End) In A Boolean List (without Using A For Loop)

May 24, 2024

For example, I want to convert this list x = [False, True, True, True, True, False, True, True, False, True] to ranges (start and end locations) of True values [[1,4], [6,7], [9,

Solution 1:

A solution with Pandas only:

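import pandas as pd

x = [False, True, True, True, True, False, True, True, False, True]
s = pd.Series(x)
# Each False increments the counter, so every run of True values shares one label
grp = s.eq(False).cumsum()
# Keep the labels at True positions, then take the first and last index of each run
arr = grp.loc[s.eq(True)] \
         .groupby(grp) \
         .apply(lambda g: [g.index.min(), g.index.max()])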

Output:

>>> arr
1    [1, 4]
2    [6, 7]
3    [9, 9]
dtype: object

>>> arr.tolist()
[[1, 4], [6, 7], [9, 9]]
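To see why the cumulative sum works as a grouping key, it can help to print the intermediate values. The following is only a sketch of the same idea, not a replacement for the answer's code; the names `true_labels`, `positions`, `bounds` and `ranges` are introduced here for illustration, and `agg(["min", "max"])` is swapped in for the lambda.

```python
import pandas as pd

x = [False, True, True, True, True, False, True, True, False, True]
s = pd.Series(x)

# Each False bumps the counter, so consecutive True values share one label.
grp = s.eq(False).cumsum()
print(grp.tolist())                # [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]

# Keep only the labels at True positions; the index still holds the positions.
true_labels = grp[s]
print(true_labels.index.tolist())  # [1, 2, 3, 4, 6, 7, 9]

# The smallest and largest position per label are the range boundaries.
positions = true_labels.index.to_series()
bounds = positions.groupby(true_labels.values).agg(["min", "max"])
ranges = bounds.values.tolist()
print(ranges)                      # [[1, 4], [6, 7], [9, 9]]
```

The labels themselves (1, 2, 3) carry no meaning beyond telling the runs apart.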

Alternative:

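import numpy as np
# Starts: True with False (or nothing) before it; ends: True with False (or nothing) after it
np.vstack([s[s & (s.shift(1, fill_value=False) == False)].index.values,
           s[s & (s.shift(-1, fill_value=False) == False)].index.values]).T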

# Output:
array([[1, 4],
       [6, 7],
       [9, 9]])
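The shift-based alternative can be read as two boolean masks, one for run starts and one for run ends. Below is a minimal sketch that splits the one-liner into named steps; the variable names (`prev_val`, `next_val`, `is_start`, `is_end`) are assumptions made for readability and do not appear in the original answer.

```python
import pandas as pd

x = [False, True, True, True, True, False, True, True, False, True]
s = pd.Series(x)

# Neighbouring values; positions past either edge are treated as False.
prev_val = s.shift(1, fill_value=False)
next_val = s.shift(-1, fill_value=False)

# A run starts where the value is True and the previous one is not,
# and ends where the value is True and the next one is not.
is_start = s & ~prev_val
is_end = s & ~next_val

starts = s.index[is_start.to_numpy()].tolist()
ends = s.index[is_end.to_numpy()].tolist()
print(starts)  # [1, 6, 9]
print(ends)    # [4, 7, 9]
```

Passing `fill_value=False` keeps the shifted Series boolean, which is what allows `~` and `&` to be applied directly.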

Performance

# Solution 1
>>> %timeit s.eq(False).cumsum().loc[s.eq(True)].groupby(s.eq(False).cumsum()).apply(lambda x: [x.index.min(), x.index.max()])
1.22 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Solution 2
>>> %timeit np.vstack([s[s & (s.shift(1, fill_value=False) == False)].index.values, s[s & (s.shift(-1, fill_value=False) == False)].index.values]).T
477 µs ± 5.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Solution by @psidom (np_vec2ran, defined under Solution 2)
>>> %timeit np_vec2ran(x)
29.2 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

For 1,000,000 records:


x = np.random.choice([True, False], 1000000)
s = pd.Series(x)

>>> %timeit np.vstack([s[s & (s.shift(1, fill_value=False) == False)].index.values, s[s & (s.shift(-1, fill_value=False) == False)].index.values]).T
18.2 ms ± 247 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit np_vec2ran(x)
5.03 ms ± 266 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Solution 2:

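An option with NumPy. Pad the array with False on both sides so that runs touching either edge are detected. If the previous value is False and the current value is True, the current position is the start of a run of True values. Conversely, if the previous value is True and the current value is False, the run ended at the previous position, which is why 1 is subtracted from end below.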

z = np.concatenate(([False], x, [False]))

start = np.flatnonzero(~z[:-1] & z[1:])   
end = np.flatnonzero(z[:-1] & ~z[1:])

np.column_stack((start, end-1))
array([[1, 4],
       [6, 7],
       [9, 9]], dtype=int32)
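The padding is what makes runs at the edges work, which the example input does not exercise because it starts with False. Here is a small sketch of some edge cases, wrapping the same steps in a hypothetical helper named `true_ranges`. The `np.asarray(x, dtype=bool)` conversion is an addition not present in the original: it assumes the input may be an empty list, which NumPy would otherwise treat as a float array.

```python
import numpy as np

def true_ranges(x):
    """Hypothetical helper: inclusive [start, end] index pairs of True runs."""
    # Force a boolean dtype so that ~ also works for an empty input.
    z = np.concatenate(([False], np.asarray(x, dtype=bool), [False]))
    start = np.flatnonzero(~z[:-1] & z[1:])  # False -> True transitions
    end = np.flatnonzero(z[:-1] & ~z[1:])    # True -> False transitions
    return np.column_stack((start, end - 1))

print(true_ranges([True, True, False, True]).tolist())  # [[0, 1], [3, 3]]
print(true_ranges([True, True, True]).tolist())         # [[0, 2]]
print(true_ranges([False, False]).tolist())             # []
print(true_ranges([]).tolist())                         # []
```

Dropping the `- 1` would give half-open ranges instead, which can be used directly as slice bounds.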

A little benchmark against the faster pandas solution:

def np_vec2ran(x):
    z = np.concatenate(([False], x, [False]))

    start = np.flatnonzero(~z[:-1] & z[1:])
    end = np.flatnonzero(z[:-1] & ~z[1:])

    return np.column_stack((start, end-1))

np_vec2ran(x)
array([[1, 4],
       [6, 7],
       [9, 9]], dtype=int32)

def pd_vec2ran(x):
    s = pd.Series(x)
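    # fill_value=False on both shifts so that a run starting at index 0 is not missed
    return list(zip(s[s.eq(True) & s.shift(1, fill_value=False).eq(False)].index, s[s.eq(True) & s.shift(-1, fill_value=False).eq(False)].index))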

pd_vec2ran(x)
[(1, 4), (6, 7), (9, 9)]

from timeit import timeit
timeit('pd_vec2ran(x)', number=10, globals=globals())
0.040585001000181364

timeit('np_vec2ran(x)', number=10, globals=globals())
0.0011799999992945231
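Timings only matter if the results agree, so a quick correctness check on random input may be worth running next to the benchmark. The sketch below compares `np_vec2ran` with a plain-Python reference built on `itertools.groupby`. The reference function, the seed and the input size are assumptions chosen for illustration, and the loop is there only for verification, not as a proposed solution.

```python
from itertools import groupby

import numpy as np

def np_vec2ran(x):
    z = np.concatenate(([False], x, [False]))
    start = np.flatnonzero(~z[:-1] & z[1:])
    end = np.flatnonzero(z[:-1] & ~z[1:])
    return np.column_stack((start, end - 1))

def reference_ranges(values):
    """Hypothetical loop-based reference, used only to check the result."""
    ranges, pos = [], 0
    for value, run in groupby(values):
        length = sum(1 for _ in run)
        if value:
            ranges.append([pos, pos + length - 1])
        pos += length
    return ranges

rng = np.random.default_rng(0)   # fixed seed so the check is repeatable
x = rng.random(10_000) < 0.5     # random boolean array
matches = np_vec2ran(x).tolist() == reference_ranges(x.tolist())
print(matches)
```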
