Get Ranges of True Values (Start and End) in a Boolean List (Without Using a For Loop)
For example, I want to convert the list x = [False, True, True, True, True, False, True, True, False, True] into the ranges (start and end positions) of its True values: [[1, 4], [6, 7], [9, 9]].
Solution 1:
A solution using only pandas:
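import pandas as pd

x = [False, True, True, True, True, False, True, True, False, True]
s = pd.Series(x)
# Every False starts a new group id, so each run of True shares one id
grp = s.eq(False).cumsum()
arr = (grp.loc[s.eq(True)]
       .groupby(grp)
       .apply(lambda g: [g.index.min(), g.index.max()]))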
Output:
>>> arr
1    [1, 4]
2    [6, 7]
3    [9, 9]
dtype: object
>>> arr.tolist()
[[1, 4], [6, 7], [9, 9]]
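The grouping key is what makes Solution 1 work, so here is a small sketch, assuming the same x, that prints the intermediate labels. The last two lines are my own variant (not part of the answer above) that aggregates the index with agg(["min", "max"]) instead of a Python lambda.

```python
import pandas as pd

x = [False, True, True, True, True, False, True, True, False, True]
s = pd.Series(x)

# Each False increments the counter, so all True values in one run
# carry the label of the False that precedes them.
grp = s.eq(False).cumsum()
print(grp.tolist())     # [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]

# Keeping only the True positions leaves one label per run.
print(grp[s].tolist())  # [1, 1, 1, 1, 2, 2, 3]

# Variant (assumption, not from the answer): min/max of the positions per label.
pos = s.index.to_series()[s]
ranges = pos.groupby(grp[s]).agg(["min", "max"]).to_numpy().tolist()
print(ranges)           # [[1, 4], [6, 7], [9, 9]]
```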
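Alternative:
import numpy as np

# Start: a True whose previous value is False; end: a True whose next value is False
np.vstack([s[s & (s.shift(1, fill_value=False) == False)].index.values,
           s[s & (s.shift(-1, fill_value=False) == False)].index.values]).T
# Output:
array([[1, 4],
       [6, 7],
       [9, 9]])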
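The alternative above marks a run start as a True whose left neighbour is False and a run end as a True whose right neighbour is False. A minimal sketch that names those masks explicitly, assuming the input is a plain list of bools; shift_ranges is a hypothetical helper name, and ~ is used in place of == False (equivalent for a boolean Series).

```python
import numpy as np
import pandas as pd

def shift_ranges(x):
    """Hypothetical helper wrapping the shift-based alternative."""
    s = pd.Series(x, dtype=bool)
    prev = s.shift(1, fill_value=False)   # left neighbour; False before the first element
    nxt = s.shift(-1, fill_value=False)   # right neighbour; False after the last element
    is_start = s & ~prev                  # True whose left neighbour is False
    is_end = s & ~nxt                     # True whose right neighbour is False
    starts = s.index[is_start.to_numpy()]
    ends = s.index[is_end.to_numpy()]
    return np.column_stack((starts, ends))

x = [False, True, True, True, True, False, True, True, False, True]
print(shift_ranges(x).tolist())                         # [[1, 4], [6, 7], [9, 9]]
print(shift_ranges([True, True, False, True]).tolist())  # [[0, 1], [3, 3]]
```

The fill_value=False on both shifts is what lets a run touching either end of the list be reported.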
Performance
# Solution 1
>>> %timeit s.eq(False).cumsum().loc[s.eq(True)].groupby(s.eq(False).cumsum()).apply(lambda x: [x.index.min(), x.index.max()])
1.22 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Solution 2
>>> %timeit np.vstack([s[s & (s.shift(1, fill_value=False) == False)].index.values, s[s & (s.shift(-1, fill_value=False) == False)].index.values]).T
477 µs ± 5.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Solution 2 by @psidom (np_vec2ran, defined below)
>>> %timeit np_vec2ran(x)
29.2 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
For 1,000,000 records:
x = np.random.choice([True, False], 1000000)
s = pd.Series(x)
>>> %timeit np.vstack([s[s & (s.shift(1, fill_value=False) == False)].index.values, s[s & (s.shift(-1, fill_value=False) == False)].index.values]).T
18.2 ms ± 247 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit np_vec2ran(x)
5.03 ms ± 266 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
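Solution 2:
Option with numpy. Pad the list with a False at each end, then compare each element with its neighbour. Where the previous value is False and the current value is True, a run of True values starts. Where the previous value is True and the current value is False, a run has just ended, so its last True is one position earlier (hence end-1). The padding ensures that runs touching either end of the list are detected as well.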
import numpy as np

x = [False, True, True, True, True, False, True, True, False, True]
z = np.concatenate(([False], x, [False]))
start = np.flatnonzero(~z[:-1] & z[1:])
end = np.flatnonzero(z[:-1] & ~z[1:])
np.column_stack((start, end-1))
array([[1, 4],
       [6, 7],
       [9, 9]], dtype=int32)
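The same edge detection can be written with np.diff, which some readers may find easier to follow. This is a swapped-in technique, not the answer's code: diff_ranges is a hypothetical name, and the input is cast to integers because np.diff on a boolean array behaves like XOR rather than giving +1/-1.

```python
import numpy as np

def diff_ranges(x):
    """Hypothetical helper: inclusive [start, end] pairs of True runs via np.diff."""
    # Pad with 0 on both sides so runs at either end still produce an edge.
    z = np.concatenate(([0], np.asarray(x, dtype=np.int8), [0]))
    d = np.diff(z)                        # +1 where a run starts, -1 just after it ends
    starts = np.flatnonzero(d == 1)       # first index of each run
    ends = np.flatnonzero(d == -1) - 1    # last index of each run
    return np.column_stack((starts, ends))

x = [False, True, True, True, True, False, True, True, False, True]
print(diff_ranges(x).tolist())  # [[1, 4], [6, 7], [9, 9]]
```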
A little benchmark against the faster pandas solution:
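def np_vec2ran(x):
    # Pad with False so runs at either end of the list are detected
    z = np.concatenate(([False], x, [False]))
    start = np.flatnonzero(~z[:-1] & z[1:])  # False -> True: first index of a run
    end = np.flatnonzero(z[:-1] & ~z[1:])    # True -> False: one past the last index
    return np.column_stack((start, end-1))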
np_vec2ran(x)
array([[1, 4],
[6, 7],
[9, 9]], dtype=int32)
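One caveat worth checking: with an empty Python list, np.concatenate(([False], [], [False])) is expected to produce a float64 array, on which ~ is not defined. A minimal sketch of a variant that forces a boolean dtype first; np_vec2ran_safe is a hypothetical name and the edge cases below are my own choice of inputs.

```python
import numpy as np

def np_vec2ran_safe(x):
    """Hypothetical variant of np_vec2ran that also accepts an empty list."""
    # dtype=bool keeps z boolean even when x is empty.
    z = np.concatenate(([False], np.asarray(x, dtype=bool), [False]))
    start = np.flatnonzero(~z[:-1] & z[1:])  # False -> True: first index of a run
    end = np.flatnonzero(z[:-1] & ~z[1:])    # True -> False: one past the last index
    return np.column_stack((start, end - 1))

print(np_vec2ran_safe([True, True, False, True]).tolist())  # [[0, 1], [3, 3]]
print(np_vec2ran_safe([False, False]).tolist())             # []
print(np_vec2ran_safe([]).shape)                            # (0, 2)
```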
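def pd_vec2ran(x):
    s = pd.Series(x)
    # fill_value=False so a True at index 0 is still recognised as a start
    starts = s[s.eq(True) & s.shift(1, fill_value=False).eq(False)].index
    ends = s[s.eq(True) & s.shift(-1, fill_value=False).eq(False)].index
    return list(zip(starts, ends))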
pd_vec2ran(x)
[(1, 4), (6, 7), (9, 9)]
from timeit import timeit
timeit('pd_vec2ran(x)', number=10, globals=globals())
0.040585001000181364
timeit('np_vec2ran(x)', number=10, globals=globals())
0.0011799999992945231
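Even though the question asks to avoid a for loop, a plain loop is still useful as a reference when checking a vectorized version. A sketch, assuming randomly generated boolean arrays; loop_ranges is a hypothetical helper written only for this check, and np_vec2ran is repeated so the snippet runs on its own.

```python
import numpy as np

def np_vec2ran(x):
    z = np.concatenate(([False], x, [False]))
    start = np.flatnonzero(~z[:-1] & z[1:])
    end = np.flatnonzero(z[:-1] & ~z[1:])
    return np.column_stack((start, end - 1))

def loop_ranges(x):
    """Hypothetical plain-loop reference: inclusive [start, end] of each True run."""
    ranges, start = [], None
    for i, v in enumerate(x):
        if v and start is None:
            start = i                        # a run begins
        elif not v and start is not None:
            ranges.append([start, i - 1])    # the run ended at the previous index
            start = None
    if start is not None:
        ranges.append([start, len(x) - 1])   # a run reaches the end of the list
    return ranges

rng = np.random.default_rng(0)
for _ in range(200):
    x = rng.random(rng.integers(1, 50)) < 0.5   # random boolean array of length 1..49
    assert np_vec2ran(x).tolist() == loop_ranges(x)
print("all random checks passed")
```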