
Trying To Understand .apply() In Pandas

I'm trying to avoid looping through DataFrames, so I recently started using .apply(). However, I don't really understand its behaviour. I have a super easy toy example below. The user

Solution 1:

See the documentation for pd.DataFrame.apply:

Notes


In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.

Your function check_fruit does have side-effects, namely asking the user for some input, which happens once more than you would expect.
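One way to see this for yourself is to replace the user prompt with a function that only records when it is called. The sketch below is not your check_fruit; record_call and the calls list are hypothetical names made up for the illustration. The count you get depends on your pandas version: under the implementation described in the quoted note, the first row label shows up twice.

```python
import pandas as pd

calls = []  # hypothetical log of every invocation

def record_call(row):
    # Side effect: remember the index label of the row we were given.
    calls.append(row.name)
    return row

df = pd.DataFrame({'fruit': ['apple', 'banana', 'cherry']})
df.apply(record_call, axis=1)

# Compare the number of rows with the number of calls actually made.
print(len(df), len(calls))
print(calls)
```

If the two numbers differ, the extra call is the fast-path probe mentioned in the note, not a bug in your function.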

In general, apply and other data frame functions are meant to be used with functions that transform the data in some way, not with application logic. You do not gain anything from avoiding the explicit loop in this case, so the best you can do is probably just go through each row by hand:

import pandas as pd

def check_fruit(row):
    # ...

df = pd.DataFrame({'fruit': ['apple', 'apple', 'apple', 'apple', 'apple'],
                   'result': [''] * 5})
# iterrows() yields (index, row) pairs, so unpack them
for index, row in df.iterrows():
    check_fruit(row)
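If the loop should also store a result, keep in mind that the row handed out by iterrows() may be a copy, so changes made to it are not guaranteed to reach the data frame. A minimal sketch of writing back through df.at follows. The body of check_fruit here is a hypothetical stand-in (the original body is not shown), and the canned answers list replaces the interactive prompt so the example runs unattended.

```python
import pandas as pd

def check_fruit(row, answer):
    # Hypothetical stand-in for the elided original: grade one answer.
    return 'Correct' if answer == row['fruit'] else 'Wrong'

df = pd.DataFrame({'fruit': ['apple'] * 3, 'result': [''] * 3})

# Assumption: canned replies used in place of input() for the demo.
answers = iter(['apple', 'pear', 'apple'])

for index, row in df.iterrows():
    # Write through df.at so the change lands in the data frame itself,
    # not in the (possibly copied) row object.
    df.at[index, 'result'] = check_fruit(row, next(answers))

print(df['result'].tolist())  # ['Correct', 'Wrong', 'Correct']
```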

Solution 2:

@jdehesa explained why the first row was being repeated.

My second question was why the new data wasn't being returned. I found the problem, and it was a very noob mistake: I had row['result']=='Correct' instead of row['result']='Correct', i.e. == vs =.
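For anyone hitting the same thing, here is a small sketch of the difference. The function names with_comparison and with_assignment are made up for the illustration and are not my original code. The assignment version works on a copy of the row so the input frame is left alone.

```python
import pandas as pd

def with_comparison(row):
    # == only compares; the resulting bool is thrown away and row is unchanged.
    row['result'] == 'Correct'
    return row

def with_assignment(row):
    row = row.copy()           # avoid touching the caller's data
    row['result'] = 'Correct'  # = actually stores the value
    return row

df = pd.DataFrame({'fruit': ['apple', 'apple'], 'result': ['', '']})

unchanged = df.apply(with_comparison, axis=1)
updated = df.apply(with_assignment, axis=1)

print(unchanged['result'].tolist())  # ['', '']
print(updated['result'].tolist())    # ['Correct', 'Correct']
```

Note that apply returns a new data frame either way, so the result has to be assigned to something to be kept.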
