
Combine Count And Percentage (normalization) In Pandas Crosstab

I know that I can have percentage values in a pandas.crosstab() when normalize=True. But I want to combine absolute and normalized values in one table. What I expect is a snipped l

Solution 1:

You can join the two resulting DataFrames (taba with the counts, tabb with the normalized values) and then rearrange the column index, as follows:

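# Join counts and percentages side by side; the suffixes tag each column's origin
tab2 = taba.join(tabb, lsuffix='_n', rsuffix='_%')

# Split e.g. '2000_n' into ('2000', 'n') to get a two-level column MultiIndex
tab2.columns = tab2.columns.map(lambda x: tuple(x.split('_')))

# Sort years ascending and 'n' before '%' within each year, then name the levels
tab2 = (tab2.sort_index(axis=1, ascending=[True, False])
            .rename_axis(columns=['YEAR', 'count_pct'])
       )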

Result:

YEAR      2000           2001           2002            All
count_pct    n         %    n         %    n         %    n         %
foo
A            1  0.166667    0  0.000000    1  0.166667    2  0.333333
B            1  0.166667    1  0.166667    0  0.000000    2  0.333333
C            0  0.000000    1  0.166667    1  0.166667    2  0.333333
All          2  0.333333    2  0.333333    2  0.333333    6  1.000000
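
For anyone who wants to run Solution 1 end to end, here is a minimal sketch. The sample df is an assumption, reconstructed so that its counts match the result shown above. The taba and tabb definitions are also assumed: the answer does not show them, but the 'All' row and column suggest margins=True was used.

```python
import pandas as pd

# Hypothetical sample data, chosen to reproduce the counts in the result table
df = pd.DataFrame({
    "foo": ["A", "A", "B", "B", "C", "C"],
    "YEAR": [2000, 2002, 2000, 2001, 2001, 2002],
})

# Assumed definitions: margins=True supplies the 'All' row and column
taba = pd.crosstab(df.foo, df.YEAR, margins=True)
tabb = pd.crosstab(df.foo, df.YEAR, margins=True, normalize=True)

# Overlapping column labels get the suffixes, e.g. 2000 -> '2000_n' and '2000_%'
tab2 = taba.join(tabb, lsuffix="_n", rsuffix="_%")

# '2000_n' -> ('2000', 'n'): tuples turn the columns into a MultiIndex
tab2.columns = tab2.columns.map(lambda x: tuple(x.split("_")))

tab2 = (tab2.sort_index(axis=1, ascending=[True, False])
            .rename_axis(columns=["YEAR", "count_pct"]))

print(tab2)
```

Note that after the split the year labels are strings ('2000', not 2000), so a single cell is addressed as tab2.loc['A', ('2000', 'n')].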

Edit:

Breaking down the steps, with more explanation of .sort_index() and .rename_axis():

The interim layout of tab2 before the last step is as follows:

YEAR 2000 2001 2002  All      2000      2001      2002       All
YEAR    n    n    n    n         %         %         %         %
foo
A       1    0    1    2  0.166667  0.000000  0.166667  0.333333
B       1    1    0    2  0.166667  0.166667  0.000000  0.333333
C       0    1    1    2  0.000000  0.166667  0.166667  0.333333
All     2    2    2    6  0.333333  0.333333  0.333333  1.000000

Two more adjustments are needed here:

  1. Group the columns by year, so that each year's n and % sit side by side. sort_index() does this. axis=1 sorts the column index rather than the row index, and ascending= sets the sort order for each of the two levels of the column MultiIndex. The first value, True, sorts the YEAR level in ascending order. The second, False, sorts 'n' and '%' in descending order, which is what puts 'n' before '%' (as a character, '%' sorts before 'n').

Result:

YEAR 2000           2001           2002            All
YEAR    n         %    n         %    n         %    n         %
foo
A       1  0.166667    0  0.000000    1  0.166667    2  0.333333
B       1  0.166667    1  0.166667    0  0.000000    2  0.333333
C       0  0.000000    1  0.166667    1  0.166667    2  0.333333
All     2  0.333333    2  0.333333    2  0.333333    6  1.000000

  2. Rename the lower level of the column MultiIndex from 'YEAR' to 'count_pct'. As you can see, 'YEAR' now appears twice, at the left of the first and second header lines. These are the axis names of the first and second levels of the column MultiIndex. We don't want both levels to share the same name, so we change the second one with .rename_axis():

YEAR      2000           2001           2002            All
count_pct    n         %    n         %    n         %    n         %
foo
A            1  0.166667    0  0.000000    1  0.166667    2  0.333333
B            1  0.166667    1  0.166667    0  0.000000    2  0.333333
C            0  0.000000    1  0.166667    1  0.166667    2  0.333333
All          2  0.333333    2  0.333333    2  0.333333    6  1.000000

The axis name of the second (lower) level of the column MultiIndex is now 'count_pct'.
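
The effect of the per-level ascending= list can be seen in isolation on a small frame. This is only an illustrative sketch: the demo frame and its values are hypothetical, a cut-down two-year version of the interim layout, and its column levels start out unnamed to keep the example simple.

```python
import pandas as pd

# Hypothetical cut-down version of the interim layout:
# all 'n' columns first, then all '%' columns
cols = pd.MultiIndex.from_tuples(
    [("2000", "n"), ("2001", "n"), ("2000", "%"), ("2001", "%")]
)
demo = pd.DataFrame(
    [[1, 0, 1 / 6, 0.0],
     [1, 1, 1 / 6, 1 / 6]],
    index=["A", "B"],
    columns=cols,
)

# Default: every level ascending, so '%' lands before 'n' within each year
sorted_default = demo.sort_index(axis=1)

# One flag per level: years ascending, second level descending ('n' before '%')
sorted_custom = demo.sort_index(axis=1, ascending=[True, False])

# Name both column levels in one call
renamed = sorted_custom.rename_axis(columns=["YEAR", "count_pct"])

print(sorted_default.columns.tolist())
print(sorted_custom.columns.tolist())
print(renamed.columns.names)
```

Comparing the two printed column lists shows why the second flag has to be False: with the default order, each year's percentage would come before its count.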

Solution 2:

Since you already calculated the two cross tabs, the simplest solution is to concatenate them into your final data frame:

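import pandas as pd

# df is the question's DataFrame, with columns 'foo' and 'YEAR'
taba = pd.crosstab(df.foo, df.YEAR, dropna=False)                  # counts
tabb = pd.crosstab(df.foo, df.YEAR, dropna=False, normalize=True)  # fractions of the grand total

tab = (
    pd.concat([taba, tabb], axis=1, keys=['n', '%'])  # outer column level: 'n' / '%'
      .swaplevel(axis=1)                              # make YEAR the outer level
      .sort_index(axis=1, ascending=[True, False])    # years ascending, 'n' before '%'
      .rename_axis(['YEAR', 'foo'], axis=1)
)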

Resulting output is:

YEAR 2000           2001           2002
foo     n         %    n         %    n         %
foo
A       1  0.166667    0  0.000000    1  0.166667
B       1  0.166667    1  0.166667    0  0.000000
C       0  0.000000    1  0.166667    1  0.166667
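
To try Solution 2 without the original data, a self-contained sketch follows. The sample df is an assumption, built so that its counts match the output above. Everything else follows the answer's code, including its choice of 'foo' as the name of the lower column level.

```python
import pandas as pd

# Hypothetical sample data, chosen to reproduce the counts in the output above
df = pd.DataFrame({
    "foo": ["A", "A", "B", "B", "C", "C"],
    "YEAR": [2000, 2002, 2000, 2001, 2001, 2002],
})

taba = pd.crosstab(df.foo, df.YEAR, dropna=False)
tabb = pd.crosstab(df.foo, df.YEAR, dropna=False, normalize=True)

tab = (
    pd.concat([taba, tabb], axis=1, keys=["n", "%"])  # columns: ('n', 2000), ..., ('%', 2002)
      .swaplevel(axis=1)                              # columns: (2000, 'n'), ..., (2002, '%')
      .sort_index(axis=1, ascending=[True, False])    # group by year, 'n' before '%'
      .rename_axis(["YEAR", "foo"], axis=1)
)

print(tab)

# Year labels stay integers here, unlike the string labels produced by
# the suffix-and-split approach in Solution 1
print(tab.loc["B", (2001, "%")])
```

One practical difference from Solution 1 is that no string splitting is involved, so the year labels keep their original type.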
