Pandas Sum Of Two Columns - Dealing With Nan-values Correctly
Solution 1:
From the documentation for pandas.DataFrame.sum:
By default, the sum of an empty or all-NA Series is 0.
>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0
This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.
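To make the quoted behaviour concrete, here is a minimal sketch comparing the default with min_count=1 on small Series. The variable names are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

empty = pd.Series([], dtype="float64")
all_nan = pd.Series([np.nan, np.nan])
partial = pd.Series([np.nan, 8.0])

# Default (min_count=0): NaN values are skipped, so an empty or all-NaN sum is 0.0
default_empty = empty.sum()
default_all_nan = all_nan.sum()

# min_count=1: at least one non-NaN value is required, otherwise the result is NaN
strict_empty = empty.sum(min_count=1)
strict_all_nan = all_nan.sum(min_count=1)
strict_partial = partial.sum(min_count=1)  # one valid value is enough

print(default_empty, default_all_nan, strict_empty, strict_all_nan, strict_partial)
```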
Change your code to:
data.loc[:,'Sum'] = data.loc[:,['Surf1','Surf2']].sum(axis=1, min_count=1)
Output:
   Surf1  Surf2
0   10.0   22.0
1    NaN    8.0
2    8.0   15.0
3    NaN    NaN
4   16.0   14.0
5   15.0    7.0

   Surf1  Surf2   Sum
0   10.0   22.0  32.0
1    NaN    8.0   8.0
2    8.0   15.0  23.0
3    NaN    NaN   NaN
4   16.0   14.0  30.0
5   15.0    7.0  22.0
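For anyone who wants to reproduce this, the following is a self-contained sketch that rebuilds the frame from the values shown above. The Sum_default column is an addition for comparison only and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example data shown in the output above
data = pd.DataFrame({
    "Surf1": [10.0, np.nan, 8.0, np.nan, 16.0, 15.0],
    "Surf2": [22.0, 8.0, 15.0, np.nan, 14.0, 7.0],
})

cols = ["Surf1", "Surf2"]

# Row-wise sum that stays NaN when both inputs are NaN
data["Sum"] = data[cols].sum(axis=1, min_count=1)

# Default behaviour for comparison: the all-NaN row becomes 0.0
data["Sum_default"] = data[cols].sum(axis=1)

print(data)
```

Only row 3, where both inputs are missing, differs between the two columns.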
Solution 2:
You could mask the result by doing:
df.sum(axis=1).mask(df.isna().all(axis=1))
0    32.0
1     8.0
2    23.0
3     NaN
4    30.0
5    22.0
dtype: float64
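A short sketch of the masking idea, split into steps and assuming df holds only the two Surf columns from the question. The intermediate names all_missing and masked are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Surf1": [10.0, np.nan, 8.0, np.nan, 16.0, 15.0],
    "Surf2": [22.0, 8.0, 15.0, np.nan, 14.0, 7.0],
})

# True only for rows where every column is NaN
all_missing = df.isna().all(axis=1)

# Plain row sum (all-NaN rows give 0.0), then replace those rows with NaN
masked = df.sum(axis=1).mask(all_missing)

# Reference result using min_count for comparison
strict = df.sum(axis=1, min_count=1)

print(masked)
```

Series.mask replaces values where the condition is True, so only the rows flagged in all_missing are turned into NaN.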
Solution 3:
You can do:
df['Sum'] = df.dropna(how='all').sum(axis=1)
Output:
   Surf1  Surf2   Sum
0   10.0   22.0  32.0
1    NaN    8.0   8.0
2    8.0   15.0  23.0
3    NaN    NaN   NaN
4   16.0   14.0  30.0
5   15.0    7.0  22.0
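This works because of index alignment on assignment. The sketch below, using the same example values and an illustrative intermediate name partial_sum, shows that the dropped row is simply absent from the summed Series and therefore ends up as NaN in the new column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Surf1": [10.0, np.nan, 8.0, np.nan, 16.0, 15.0],
    "Surf2": [22.0, 8.0, 15.0, np.nan, 14.0, 7.0],
})

# Rows where every value is NaN are dropped before summing,
# so the result has no entry for index 3
partial_sum = df.dropna(how="all").sum(axis=1)
print(partial_sum.index.tolist())

# Assignment aligns on the index; the missing label 3 is filled with NaN
df["Sum"] = partial_sum
print(df)
```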
Solution 4:
You can use min_count, which sums each row when there is at least one non-null value and returns NaN if all values in the row are null:
df['SUM'] = df.sum(min_count=1, axis=1)
# df.sum(min_count=1, axis=1)
Out[199]:
0    32.0
1     8.0
2    23.0
3     NaN
4    30.0
5    22.0
dtype: float64
Solution 5:
If the solutions above do not behave as expected when the first column value is present but the second column value is missing, you can also build the sum explicitly:
df['sum'] = df['Surf1']
df.loc[(df['Surf2'].notnull()), 'sum'] = df['Surf1'].fillna(0) + df['Surf2']
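As a quick check, the sketch below runs this explicit approach on a small hypothetical frame (not the question's data) that includes a row where Surf1 is present and Surf2 is missing, and compares it with the min_count=1 result.

```python
import numpy as np
import pandas as pd

# Hypothetical data covering all four present/missing combinations
df = pd.DataFrame({
    "Surf1": [10.0, np.nan, 5.0, np.nan],
    "Surf2": [22.0, 8.0, np.nan, np.nan],
})

# Start from Surf1, so rows with a missing Surf2 keep the Surf1 value (or NaN)
df["sum"] = df["Surf1"]

# Where Surf2 is present, add it to Surf1 with a missing Surf1 treated as 0
has_surf2 = df["Surf2"].notnull()
df.loc[has_surf2, "sum"] = df["Surf1"].fillna(0) + df["Surf2"]

# Reference: row-wise sum that requires at least one non-NaN value
expected = df[["Surf1", "Surf2"]].sum(axis=1, min_count=1)

print(df.assign(expected=expected))
```

On this data the two columns agree row by row, including the row where only Surf1 is present.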