Pandas Sum Of Two Columns - Dealing With Nan-values Correctly

March 03, 2024

When summing two pandas columns, I want to ignore NaN values when one of the two columns is a float. However, when NaN appears in both columns, I want to keep NaN in the output (ins

Solution 1:

From the documentation for pandas.DataFrame.sum:

By default, the sum of an empty or all-NA Series is 0.
>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0
This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.
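
The quoted behaviour can be checked directly. This is a minimal sketch; the variable names are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# An empty Series and an all-NaN Series behave the same way under sum().
empty = pd.Series([], dtype="float64")
all_nan = pd.Series([np.nan, np.nan])

default_empty = empty.sum()                 # min_count=0 is the default
strict_empty = empty.sum(min_count=1)       # needs at least one valid value
default_all_nan = all_nan.sum()             # NaN is skipped, nothing is left to add
strict_all_nan = all_nan.sum(min_count=1)

print(default_empty, strict_empty)
print(default_all_nan, strict_all_nan)
```

With min_count=1 the sum is only computed when at least one non-NaN value is present; otherwise the result is NaN.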

Change your code to

data.loc[:,'Sum'] = data.loc[:,['Surf1','Surf2']].sum(axis=1, min_count=1)

output

   Surf1  Surf2
0   10.0   22.0
1    NaN    8.0
2    8.0   15.0
3    NaN    NaN
4   16.0   14.0
5   15.0    7.0

   Surf1  Surf2   Sum
0   10.0   22.0  32.0
1    NaN    8.0   8.0
2    8.0   15.0  23.0
3    NaN    NaN   NaN
4   16.0   14.0  30.0
5   15.0    7.0  22.0
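
For reference, here is a self-contained sketch of the min_count approach on the sample data from the question. It also shows the two behaviours the question wants to avoid: plain addition, which propagates NaN, and the default sum, which turns an all-NaN row into 0.0. The names plain and default_sum are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data as shown in the question.
data = pd.DataFrame({
    "Surf1": [10.0, np.nan, 8.0, np.nan, 16.0, 15.0],
    "Surf2": [22.0, 8.0, 15.0, np.nan, 14.0, 7.0],
})

# Plain addition: any NaN operand makes the result NaN (row 1 is lost).
plain = data["Surf1"] + data["Surf2"]

# Default row-wise sum: NaN is skipped, so the all-NaN row 3 becomes 0.0.
default_sum = data[["Surf1", "Surf2"]].sum(axis=1)

# min_count=1: at least one valid value is required, otherwise the result is NaN.
data["Sum"] = data[["Surf1", "Surf2"]].sum(axis=1, min_count=1)

print(data)
```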

Solution 2:

You could mask the result by doing:

df.sum(axis=1).mask(df.isna().all(axis=1))

0    32.0
1     8.0
2    23.0
3     NaN
4    30.0
5    22.0
dtype: float64
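
The masking approach can be broken into its steps as follows. This is a sketch on the question's sample data; the intermediate names row_sum, all_missing and masked are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Surf1": [10.0, np.nan, 8.0, np.nan, 16.0, 15.0],
    "Surf2": [22.0, 8.0, 15.0, np.nan, 14.0, 7.0],
})

# Step 1: ordinary row-wise sum; the all-NaN row comes out as 0.0.
row_sum = df.sum(axis=1)

# Step 2: boolean Series that is True only where every column is NaN.
all_missing = df.isna().all(axis=1)

# Step 3: mask() replaces values with NaN wherever the condition is True.
masked = row_sum.mask(all_missing)

print(masked)
```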

Solution 3:

You can do:

df['Sum'] = df.dropna(how='all').sum(axis=1)

Output:

   Surf1  Surf2   Sum
0   10.0   22.0  32.0
1    NaN    8.0   8.0
2    8.0   15.0  23.0
3    NaN    NaN   NaN
4   16.0   14.0  30.0
5   15.0    7.0  22.0
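
This works because of index alignment: the row that dropna(how='all') removes has no entry in the summed Series, so assigning it back to the DataFrame leaves NaN at that index. A small sketch on the question's sample data (the name partial is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Surf1": [10.0, np.nan, 8.0, np.nan, 16.0, 15.0],
    "Surf2": [22.0, 8.0, 15.0, np.nan, 14.0, 7.0],
})

# Rows where every column is NaN are dropped before summing,
# so the result has index labels 0, 1, 2, 4, 5 only.
partial = df.dropna(how="all").sum(axis=1)

# Assignment aligns on the index; the missing label 3 is filled with NaN.
df["Sum"] = partial

print(df)
```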

Solution 4:

You can use min_count: each row is summed when it has at least one non-null value, and the result is null when every value in the row is null.

df['SUM'] = df.sum(min_count=1, axis=1)
# df.sum(min_count=1, axis=1)
Out[199]: 
0    32.0
1     8.0
2    23.0
3     NaN
4    30.0
5    22.0
dtype: float64

Solution 5:

I think all the solutions listed above work only when it is the FIRST column value that is missing. If you have cases where the first column value is present but the second column value is missing, try using:

df['sum'] = df['Surf1']

df.loc[(df['Surf2'].notnull()), 'sum'] = df['Surf1'].fillna(0) + df['Surf2']
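
To see how this handles both cases, here is a sketch that runs it on the question's sample data plus one extra row, which is hypothetical and not in the original data, where Surf1 is present and Surf2 is missing. As far as I can tell, the result matches sum(axis=1, min_count=1) on every row, including that one.

```python
import numpy as np
import pandas as pd

# Question's sample data plus a hypothetical last row (5.0, NaN)
# to cover "first column present, second column missing".
df = pd.DataFrame({
    "Surf1": [10.0, np.nan, 8.0, np.nan, 16.0, 15.0, 5.0],
    "Surf2": [22.0, 8.0, 15.0, np.nan, 14.0, 7.0, np.nan],
})

# Start from Surf1, then overwrite rows where Surf2 is available.
df["sum"] = df["Surf1"]
df.loc[df["Surf2"].notnull(), "sum"] = df["Surf1"].fillna(0) + df["Surf2"]

# Reference result using min_count, for comparison.
expected = df[["Surf1", "Surf2"]].sum(axis=1, min_count=1)

print(df)
print(expected)
```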
