
How Do Numpy Functions Operate On Pandas Objects Internally?

NumPy functions, e.g. np.mean(), np.var(), etc., accept an array-like argument such as an np.array or a list. But passing in a pandas DataFrame also works. This means that a pandas dat

Solution 1:

If you step through this:

--Call--
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2796)mean()
-> def mean(a, axis=None, dtype=None, out=None, keepdims=False):
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2877)mean()
-> if type(a) is not mu.ndarray:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2878)mean()
-> try:
(Pdb) s
> d:\winpython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py(2879)mean()
-> mean = a.mean

You can see that the type of a is not ndarray, so np.mean tries to call a.mean, which in this case is df.mean():

In [6]:

df.mean()
Out[6]:
0    0.572999
1    0.468268
dtype: float64

This is why the output is different: df.mean() returns one mean per column rather than a single mean over all elements.
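The dispatch seen in the trace can be sketched as follows. This is a simplified illustration, not NumPy's actual source: mean_sketch and ColumnMeans are hypothetical names, and ColumnMeans is only a stand-in for an object whose own mean() averages each column, as df.mean() does in the output above.

```python
import numpy as np


class ColumnMeans:
    """Hypothetical stand-in for a DataFrame-like object with its own mean()."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def mean(self, axis=None, **kwargs):
        # Assumption for illustration: with no axis given, average each column.
        return self.data.mean(axis=0 if axis is None else axis)


def mean_sketch(a, axis=None):
    """Simplified sketch of the branch shown in the pdb trace."""
    if type(a) is not np.ndarray:
        try:
            method = a.mean  # look for the object's own mean method
        except AttributeError:
            pass  # e.g. a plain list: fall through to the array path
        else:
            return method(axis=axis)  # delegate to the object
    return np.asarray(a).mean(axis=axis)


data = [[1.0, 2.0], [3.0, 4.0]]
print(mean_sketch(np.array(data)))     # ndarray: one mean over all elements
print(mean_sketch(data))               # list: no .mean, converted to an array
print(mean_sketch(ColumnMeans(data)))  # delegated: one mean per column
```

The point of the sketch is that the result depends on what the object's own mean method does, so the behaviour for a DataFrame is decided by pandas and may differ between versions.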

Code to reproduce above:

In [3]:
a = np.random.rand(4,2)
a

Out[3]:
array([[ 0.96750329,  0.67623187],
       [ 0.44025179,  0.97312747],
       [ 0.07330062,  0.18341157],
       [ 0.81094166,  0.04030253]])

In [4]:    
np.mean(a)

Out[4]:
0.52063384885403818

In [5]:    
df = pd.DataFrame(data=a, index=range(np.shape(a)[0]),
                  columns=range(np.shape(a)[1]))
df

Out[5]:
          0         1
0  0.967503  0.676232
1  0.440252  0.973127
2  0.073301  0.183412
3  0.810942  0.040303

numpy output:

In [7]:
np.mean(df)

Out[7]:
0    0.572999
1    0.468268
dtype: float64

If you call .values first to get a NumPy array, the output is the same as for np.mean(a):

In [8]:
np.mean(df.values)

Out[8]:
0.52063384885403818
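As a cross-check, the two results can be related using plain NumPy only. The sketch below re-enters the array values printed in Out[3] (rounded to 8 decimals, so the last digits may differ slightly from the outputs above) and compares the overall mean with the mean along axis 0.

```python
import numpy as np

# Values copied from Out[3] above.
a = np.array([[0.96750329, 0.67623187],
              [0.44025179, 0.97312747],
              [0.07330062, 0.18341157],
              [0.81094166, 0.04030253]])

overall = np.mean(a)             # single mean over all 8 elements
per_column = np.mean(a, axis=0)  # one mean per column

print(overall)
print(per_column)
```

The per-column values correspond to the two numbers returned by df.mean(), and because both columns have the same length, averaging them gives the overall mean again.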
