Condensing Pandas Dataframe By Dropping Missing Elements

Problem

I have a dataframe that looks like this:

  Key   Var  ID_1 Var_1  ID_2 Var_2  ID_3 Var_3
    1  True   1.0  True   NaN   NaN   5.0  True
    2  True   NaN   NaN   4.0 False   7.

Solution 1:

One simple solution is to push the NaNs to the right of each row and then drop the columns that contain NaN (axis 1):

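# Sort each row so non-null values come first (the sort is stable, so their order is kept), then drop columns that still hold NaN
ndf = data.apply(lambda row: pd.Series(sorted(row, key=pd.isnull), index=row.index), axis=1).dropna(axis=1)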

Output:

  Key    Var ID_1  Var_1 ID_2 Var_2
0   1   True    1   True    5  True
1   2   True    4  False    7  True
2   3  False    2  False    5  True

Hope it helps.
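
To try the approach end to end, here is a minimal, self-contained sketch. The sample frame is hypothetical: it is made up to resemble the output shown above, since the original data is not fully visible, and the helper name push_nans_right is likewise only illustrative. It assumes every row is missing the same number of values, so that the trailing columns end up entirely NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each row is missing exactly one ID/Var pair.
data = pd.DataFrame({
    "Key":   [1, 2, 3],
    "Var":   [True, True, False],
    "ID_1":  [1.0, np.nan, 2.0],
    "Var_1": [True, np.nan, False],
    "ID_2":  [np.nan, 4.0, np.nan],
    "Var_2": [np.nan, False, np.nan],
    "ID_3":  [5.0, 7.0, 5.0],
    "Var_3": [True, True, True],
})

def push_nans_right(row):
    # sorted() is stable and False sorts before True, so non-null values
    # keep their relative order while nulls move to the end of the row.
    return pd.Series(sorted(row, key=pd.isnull), index=row.index)

# After shifting, the last two columns are all NaN and get dropped.
ndf = data.apply(push_nans_right, axis=1).dropna(axis=1)
print(ndf)
```

Note that dropna(axis=1) removes any column containing at least one NaN, so if rows were missing different numbers of values, some real data would be dropped as well.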

A NumPy solution from Divakar, for roughly a 10x speedup:

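import numpy as np
import pandas as pd

def mask_app(a):
    # Start from an all-NaN array with the same shape and dtype as the input
    out = np.full(a.shape, np.nan, dtype=a.dtype)
    # True where a value is present
    mask = ~np.isnan(a.astype(float))
    # Left-align the True flags in each row and fill those slots with the values, in order
    out[np.sort(mask, 1)[:, ::-1]] = a[mask]
    return out

ndf = pd.DataFrame(mask_app(data.values), columns=data.columns).dropna(axis=1)

Output:
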
  Key    Var ID_1  Var_1 ID_2 Var_2
0   1   True    1   True    5  True
1   2   True    4  False    7  True
2   3  False    2  False    5  True
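
The masking trick may be easier to follow on a tiny array. The sketch below uses a made-up 2x3 float array (not the original data) and the illustrative name left_mask to show each intermediate step of the same idea.

```python
import numpy as np

# Hypothetical input: one missing value per row.
a = np.array([[1.0, np.nan, 5.0],
              [np.nan, 4.0, 7.0]])

# True where a value is present.
mask = ~np.isnan(a)

# Sorting booleans puts False first; reversing each row moves the
# True flags to the left while keeping the per-row count unchanged.
left_mask = np.sort(mask, axis=1)[:, ::-1]

# Boolean indexing reads and writes in row-major order, so each row's
# values land in its left-aligned slots in their original order.
out = np.full(a.shape, np.nan)
out[left_mask] = a[mask]
print(out)
```

This works because mask and left_mask contain the same number of True entries in every row, so the values selected by a[mask] line up one to one with the slots selected by left_mask.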
