
Pandas: Create A New Column In A Dataframe That Is A Function Of A Rolling Window

I have a data frame and can compute a new column of rolling 10-period means using pandas.stats.moments.rolling_mean(ExistingColumn, 10, min_periods=10). If there are fewer than 10

Solution 1:

You could use pandas.rolling_apply:

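import numpy as np
import pandas as pd

def hodgesLehmanMean(x):
    # Half the median of x[i] + x[j] over all pairs with i < j
    return 0.5 * np.median([x[i] + x[j]
                            for i in range(len(x))
                            for j in range(i+1, len(x))])

df = pd.DataFrame({'foo': np.arange(20, dtype='float')})
# pd.rolling_apply is the legacy top-level function; newer pandas
# releases use df['foo'].rolling(10).apply(...) instead.
df['bar'] = pd.rolling_apply(df['foo'], 10, hodgesLehmanMean)
print(df)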

yields

    foo   bar
0     0   NaN
1     1   NaN
2     2   NaN
3     3   NaN
4     4   NaN
5     5   NaN
6     6   NaN
7     7   NaN
8     8   NaN
9     9   4.5
10   10   5.5
11   11   6.5
12   12   7.5
13   13   8.5
14   14   9.5
15   15  10.5
16   16  11.5
17   17  12.5
18   18  13.5
19   19  14.5
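
The top-level pd.rolling_apply function (like pandas.stats.moments) was removed from later pandas releases. As a minimal sketch, assuming a recent pandas version, the same column can be computed with the Series.rolling(...).apply(...) method. Passing raw=True is a choice made here so that each window arrives as a NumPy array and the positional indexing x[i] inside the function keeps working:

```python
import numpy as np
import pandas as pd

def hodgesLehmanMean(x):
    # Half the median of x[i] + x[j] over all pairs with i < j
    return 0.5 * np.median([x[i] + x[j]
                            for i in range(len(x))
                            for j in range(i + 1, len(x))])

df = pd.DataFrame({'foo': np.arange(20, dtype='float')})

# Window of 10 rows; min_periods=10 leaves the first 9 rows as NaN.
# raw=True hands each window to the function as a NumPy array.
df['bar'] = df['foo'].rolling(10, min_periods=10).apply(hodgesLehmanMean, raw=True)
print(df)
```

The first nine rows of bar are NaN because those windows hold fewer than 10 values, matching the output above.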

A faster version of hodgesLehmanMean would be:

def hodgesLehmanMean_alt(x): 
    m = np.add.outer(x,x)
    ind = np.tril_indices(len(x), -1)
    return 0.5 * np.median(m[ind])
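
To see why this works, here is a small worked sketch on a three-element array (the values are made up for illustration). np.add.outer builds the full table of sums x[i] + x[j], and np.tril_indices(n, -1) selects the entries strictly below the diagonal, so each unordered pair appears exactly once:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])  # illustrative values only

# m[i, j] == x[i] + x[j] for every ordered pair (i, j)
m = np.add.outer(x, x)

# Positions strictly below the diagonal: each pair with i > j, once
rows, cols = np.tril_indices(len(x), -1)
pair_sums = m[rows, cols]

print(pair_sums)                    # [3. 5. 6.]
print(0.5 * np.median(pair_sums))   # 2.5
```

The trade-off is memory: the intermediate table m has n-by-n entries, so it grows quadratically with the window length.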

Here is a sanity-check showing hodgesLehmanMean_alt returns the same value as hodgesLehmanMean for 1000 random arrays of length 100:

In [68]: m = np.random.random((1000, 100))

In [69]: all(hodgesLehmanMean(x) == hodgesLehmanMean_alt(x) for x in m)
Out[69]: True

Here is a benchmark showing hodgesLehmanMean_alt is roughly 8.6x faster (3.99 s versus 463 ms) on an array of length 5000:

In [80]: x = np.random.random(5000)

In [81]: %timeit hodgesLehmanMean(x)
1 loops, best of 3: 3.99 s per loop

In [82]: %timeit hodgesLehmanMean_alt(x)
1 loops, best of 3: 463 ms per loop
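
%timeit is an IPython magic, so the session above will not run in a plain Python script. A rough sketch of the same comparison using the standard timeit module follows. The array length of 300 and the seed are arbitrary choices made here to keep the run short; absolute timings will differ by machine and will not match the figures above:

```python
import timeit
import numpy as np

def hodgesLehmanMean(x):
    # Pure-Python double loop over all pairs i < j
    return 0.5 * np.median([x[i] + x[j]
                            for i in range(len(x))
                            for j in range(i + 1, len(x))])

def hodgesLehmanMean_alt(x):
    # Vectorized: table of all pairwise sums, then the strict lower triangle
    m = np.add.outer(x, x)
    ind = np.tril_indices(len(x), -1)
    return 0.5 * np.median(m[ind])

rng = np.random.default_rng(0)  # arbitrary seed
x = rng.random(300)             # shorter than the 5000 used above

# Best of 3 single runs, mirroring "best of 3" in the %timeit output
t_loop = min(timeit.repeat(lambda: hodgesLehmanMean(x), number=1, repeat=3))
t_alt = min(timeit.repeat(lambda: hodgesLehmanMean_alt(x), number=1, repeat=3))
print(f"loop: {t_loop:.4f} s, vectorized: {t_alt:.4f} s")
```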
