Apply A Function Pairwise On A Pandas Series
I have a pandas Series whose elements are frozensets: data = {0: frozenset({'apple', 'banana'}), 1: frozenset({'apple', 'orange'}), 2: frozenset({'banana'}), 3:
Solution 1:
There are a couple of ways to do this.
Option 1] Using a list comprehension
In [3631]: pd.Series([x[0].union(x[1])
                      for x in zip(tokens, tokens.shift(-1).fillna(''))],
                     index=tokens.index)
Out[3631]:
0 (orange, banana, apple)
1 (orange, apple, banana)
2 (orange, kumquat, banana)
3 (orange, kumquat)
4 (orange, pear)
5 (orange, pear)
6 (orange, pear, banana, apple)
7 (persimmon, pear, banana, apple)
8 (apple, persimmon, banana)
9 (apple, banana)
10 (banana, apple)
11 (apple)
dtype: object
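The session above assumes an existing Series named tokens. As a minimal self-contained sketch of the same idea, the snippet below builds a small made-up Series (the sample values are an assumption, not the original data) and pairs each element with its successor via shift(-1). The last element has no successor, so its NaN is replaced with an empty string; union() accepts any iterable, so that last union leaves the element unchanged.

```python
import pandas as pd

# Hypothetical sample data (not the original `tokens` Series).
tokens = pd.Series([
    frozenset({'apple', 'banana'}),
    frozenset({'apple', 'orange'}),
    frozenset({'banana'}),
])

# shift(-1) aligns each element with the one that follows it; the final
# slot becomes NaN, which fillna('') turns into an empty iterable.
successors = tokens.shift(-1).fillna('')

# Union each element with its successor, keeping the original index.
result = pd.Series(
    [current.union(nxt) for current, nxt in zip(tokens, successors)],
    index=tokens.index,
)
print(result)
```

Unpacking the zipped pair into two names (current, nxt) is equivalent to indexing x[0] and x[1] as in the session above; it is only a readability choice.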
Option 2] Using map
In [3632]: pd.Series(map(lambda x: x[0].union(x[1]),
                         zip(tokens, tokens.shift(-1).fillna(''))),
                     index=tokens.index)
Out[3632]:
0 (orange, banana, apple)
1 (orange, apple, banana)
2 (orange, kumquat, banana)
3 (orange, kumquat)
4 (orange, pear)
5 (orange, pear)
6 (orange, pear, banana, apple)
7 (persimmon, pear, banana, apple)
8 (apple, persimmon, banana)
9 (apple, banana)
10 (banana, apple)
11 (apple)
dtype: object
Option 3] Using concat and apply
In [3633]: pd.concat([tokens, tokens.shift(-1).fillna('')],
                     axis=1).apply(lambda x: x[0].union(x[1]), axis=1)
Out[3633]:
0 (orange, banana, apple)
1 (orange, apple, banana)
2 (orange, kumquat, banana)
3 (orange, kumquat)
4 (orange, pear)
5 (orange, pear)
6 (orange, pear, banana, apple)
7 (persimmon, pear, banana, apple)
8 (apple, persimmon, banana)
9 (apple, banana)
10 (banana, apple)
11 (apple)
dtype: object
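All three options share one pattern: pair each element with its successor, then call a two-argument function on the pair. A rough sketch of that pattern as a reusable helper follows. Note that pairwise_apply and last_fill are illustrative names, not part of pandas, and the helper uses plain list slicing instead of shift/fillna to build the pairs.

```python
import operator

import pandas as pd


def pairwise_apply(series, func, last_fill):
    """Apply func(current, next) to consecutive elements of `series`.

    Hypothetical helper, not a pandas API. The final element has no
    successor, so it is paired with `last_fill` instead.
    """
    values = list(series)
    successors = values[1:] + [last_fill]
    return pd.Series(
        [func(a, b) for a, b in zip(values, successors)],
        index=series.index,
    )


# Made-up sample data for illustration.
tokens = pd.Series([frozenset({'apple'}), frozenset({'pear'}), frozenset({'kiwi'})])
unions = pairwise_apply(tokens, frozenset.union, frozenset())

# The same helper works for other binary functions, e.g. addition.
sums = pairwise_apply(pd.Series([1, 2, 3]), operator.add, 0)
```

Passing the fill value explicitly keeps the choice of "what to do with the last element" visible at the call site rather than hidden in a fillna('') call.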
Timings
In [3647]: tokens.shape
Out[3647]: (60000L,)
In [3648]: %timeit pd.Series([x[0].union(x[1]) for x in zip(tokens, tokens.shift(-1).fillna(''))], index=tokens.index)
10 loops, best of 3: 35 ms per loop
In [3649]: %timeit pd.Series(map(lambda x: x[0].union(x[1]), zip(tokens, tokens.shift(-1).fillna(''))), index=tokens.index)
10 loops, best of 3: 40.9 ms per loop
In [3650]: %timeit pd.concat([tokens, tokens.shift(-1).fillna('')], axis=1).apply(lambda x: x[0].union(x[1]), axis=1)
1 loop, best of 3: 2.2 s per loop
Unrelated, but for a sense of scale, here is the timing of diff on the same Series:
In [3653]: %timeit tokens.diff()
10 loops, best of 3: 10.8 ms per loop
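%timeit is only available inside IPython. To compare the approaches from a plain script, something like the following sketch with the standard timeit module should work. The Series here is synthetic stand-in data of the same length (an assumption, since the original data is not shown), so the absolute numbers will differ from those quoted above.

```python
import timeit

import pandas as pd

# Synthetic stand-in for the 60,000-element Series timed above.
base = [frozenset({'apple', 'banana'}), frozenset({'orange'}), frozenset({'pear'})]
tokens = pd.Series(base * 20000)


def list_comprehension():
    # Option 1: zip the Series with its shifted copy.
    return pd.Series(
        [x[0].union(x[1]) for x in zip(tokens, tokens.shift(-1).fillna(''))],
        index=tokens.index,
    )


def concat_apply():
    # Option 3: build a two-column frame and apply row by row.
    paired = pd.concat([tokens, tokens.shift(-1).fillna('')], axis=1)
    return paired.apply(lambda row: row[0].union(row[1]), axis=1)


for fn in (list_comprehension, concat_apply):
    seconds = timeit.timeit(fn, number=1)
    print(f"{fn.__name__}: {seconds:.3f} s")
```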