
Add Unique Groups To DF For Each Row Including Sum From Other Columns

I have a DataFrame that looks like this:

ID  field_1  area_1  field_2     area_2  field_3  area_3  field_4  area_4
1   scoccer  500     basketball  200

Solution 1:

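Use pd.wide_to_long to reshape the DataFrame into long format (one row per ID and field/area pair), which lets you group by ID and field and sum the areas. Then number the fields within each ID with groupby(...).cumcount() to build the column suffixes, and use pivot_table to go back to wide format.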

import pandas as pd

# Long format: one row per (ID, num), then sum the areas per ID and field
df = (pd.wide_to_long(df, i='ID', j='num', stubnames=['field', 'area'], sep='_')
        .groupby(['ID', 'field'])['area'].sum()
        .reset_index())
#   ID       field    area
#0   1  basketball   250.0
#1   1     scoccer   500.0
#2   1    swimming   100.0
#3   2  volleyball   100.0
#4   3  basketball  1000.0
#5   3    football    10.0
#6   4  basketball   320.0
#7   4    swimming   480.0
#8   5    football   160.0
#9   5  volleyball   140.0
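To see the reshape-and-sum step in isolation, here is a minimal, self-contained sketch on a small hypothetical frame (the values are made up, not the asker's data) with three field/area pairs. It shows that a field repeated within one row (basketball) is summed, and that pairs whose field is NaN drop out, because groupby excludes NaN keys by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's real frame): three field/area pairs per ID
wide = pd.DataFrame({
    'ID': [1, 2],
    'field_1': ['soccer', 'volleyball'], 'area_1': [500, 100],
    'field_2': ['basketball', np.nan], 'area_2': [200, np.nan],
    'field_3': ['basketball', np.nan], 'area_3': [50, np.nan],
})

# Long format: index (ID, num), columns 'field' and 'area'
long = pd.wide_to_long(wide, stubnames=['field', 'area'], i='ID', j='num', sep='_')

# Sum areas per (ID, field); 'ID' is an index level, 'field' is a column
summed = long.groupby(['ID', 'field'])['area'].sum().reset_index()
print(summed)
```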

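# Number the fields within each ID (1, 2, ...) to use as column suffixes
df['idx'] = df.groupby('ID').cumcount() + 1
# Back to wide format; 'first' just takes the single value in each cell
df = (pd.pivot_table(df, index='ID', columns='idx', values=['field', 'area'],
                     aggfunc='first')
        .sort_index(axis=1, level=1))
# Flatten the (name, idx) column MultiIndex into 'area_1', 'field_1', ...
df.columns = ['_'.join(map(str, tup)) for tup in df.columns]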

    area_1     field_1  area_2     field_2  area_3   field_3
ID                                                          
1    250.0  basketball   500.0     scoccer   100.0  swimming
2    100.0  volleyball     NaN         NaN     NaN       NaN
3   1000.0  basketball    10.0    football     NaN       NaN
4    320.0  basketball   480.0    swimming     NaN       NaN
5    160.0    football   140.0  volleyball     NaN       NaN
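The pivot step can also be written with set_index plus unstack instead of pivot_table; this is a swapped-in variant, not the answer's own code. The sketch starts from a hypothetical long frame shaped like the groupby output above and explicitly orders the columns as field_n, area_n to match the question's layout.

```python
import pandas as pd

# Hypothetical long frame, shaped like the result of the groupby/sum step
summed = pd.DataFrame({
    'ID': [1, 1, 2],
    'field': ['basketball', 'soccer', 'volleyball'],
    'area': [250.0, 500.0, 100.0],
})

# Number the fields within each ID: 1, 2, ... (becomes the column suffix)
summed['idx'] = summed.groupby('ID').cumcount() + 1

# Move 'idx' into the columns; missing (ID, idx) combinations become NaN
wide = summed.set_index(['ID', 'idx'])[['field', 'area']].unstack('idx')

# Put the columns in field_1, area_1, field_2, area_2, ... order
n = summed['idx'].max()
wide = wide[[(name, i) for i in range(1, n + 1) for name in ('field', 'area')]]

# Flatten the (name, idx) MultiIndex into 'field_1', 'area_1', ...
wide.columns = [f'{name}_{idx}' for name, idx in wide.columns]
print(wide)
```

Because each (ID, idx) pair occurs exactly once, unstack needs no aggregation. pivot_table always aggregates (its default is the mean, which does not suit strings), which is why the answer passes aggfunc='first'.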

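Just for fun, you could use the undocumented pd.lreshape instead of wide_to_long for the reshaping step. It returns a plain long frame (rows with missing values are dropped by default), to which the same groupby/sum and pivot apply.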

# Change range to (1,31) for your real data (30 field/area pairs).
# Apply this to the original wide df, not to the pivoted result above.
pd.lreshape(df, {'area': [f'area_{i}' for i in range(1,5)],
                 'field': [f'field_{i}' for i in range(1,5)]})

#    ID    area       field
#0    1   500.0     scoccer
#1    2   100.0  volleyball
#2    3  1000.0  basketball
#3    4   280.0    swimming
#4    5   110.0  volleyball
#5    1   200.0  basketball
#....
#10   4   320.0  basketball
#11   5    30.0  volleyball
#12   1    50.0  basketball
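On the lreshape route the long frame still needs the same groupby/sum before pivoting. A minimal sketch on a hypothetical frame (made-up values); since pd.lreshape is undocumented, treat its behavior and availability across pandas versions as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame; n_pairs plays the role of the range end in the answer
n_pairs = 3
wide = pd.DataFrame({
    'ID': [1, 2],
    'field_1': ['soccer', 'volleyball'], 'area_1': [500, 100],
    'field_2': ['basketball', np.nan], 'area_2': [200, np.nan],
    'field_3': ['basketball', np.nan], 'area_3': [50, np.nan],
})

# Stack each group of columns into one column; NaN rows are dropped by default
long = pd.lreshape(wide, {
    'area': [f'area_{i}' for i in range(1, n_pairs + 1)],
    'field': [f'field_{i}' for i in range(1, n_pairs + 1)],
})

# Same aggregation as in the wide_to_long version
summed = long.groupby(['ID', 'field'], as_index=False)['area'].sum()
print(summed)
```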
