
Create Dataframes From Unique Value Pairs By Filtering Across Multiple Columns

I want to filter values across multiple columns creating dataframes for the unique value combinations. Any help would be appreciated. Here is my code that is failing (given datafra

Solution 1:

Use pandas' groupby to split the dataframe on each unique (col1, col2) combination; every group holds the rows that match that pair.

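import numpy as np
import pandas as pd
from collections import defaultdict

df = pd.DataFrame({'col1': ['A']*4 + ['B']*4,
                   'col2': [0,1]*4,
                   'col3': np.arange(8),
                   'col4': np.arange(10, 18)})

# Nested dict: dd[col1 value][col2 value] -> sub-dataframe
dd = defaultdict(dict)
grouped = df.groupby(['col1', 'col2'])
# Each iteration yields the (col1, col2) key and the matching rows
for (c1, c2), g in grouped:
    dd[c1][c2] = g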

This is the generated df:

  col1  col2  col3  col4
0    A     0     0    10
1    A     1     1    11
2    A     0     2    12
3    A     1     3    13
4    B     0     4    14
5    B     1     5    15
6    B     0     6    16
7    B     1     7    17

And this is the extracted dd (well, dict(dd) really)

{'B': {0:   col1  col2  col3  col4
          4    B     0     4    14
          6    B     0     6    16,
       1:   col1  col2  col3  col4
          5    B     1     5    15
          7    B     1     7    17},
 'A': {0:   col1  col2  col3  col4
          0    A     0     0    10
          2    A     0     2    12,
       1:   col1  col2  col3  col4
          1    A     1     1    11
          3    A     1     3    13}}

(I don't know what your use case for this is, but you may be better off not parsing the groupby object to a dictionary anyway).
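
As a rough sketch of what working with the groupby object directly might look like, the snippet below rebuilds the same example df and shows three options: pulling out one sub-frame with get_group, building a flat dict keyed by the (col1, col2) tuple, and aggregating per pair without creating separate dataframes. The names a0, frames and sums are only illustrative, and which option fits depends on a use case the question does not spell out.

```python
import numpy as np
import pandas as pd

# Same example frame as in the answer above.
df = pd.DataFrame({'col1': ['A'] * 4 + ['B'] * 4,
                   'col2': [0, 1] * 4,
                   'col3': np.arange(8),
                   'col4': np.arange(10, 18)})

grouped = df.groupby(['col1', 'col2'])

# Option 1: fetch a single sub-frame by its (col1, col2) key.
a0 = grouped.get_group(('A', 0))

# Option 2: a flat dict keyed by the (col1, col2) tuple,
# instead of the nested defaultdict.
frames = {key: g for key, g in grouped}

# Option 3: aggregate per pair without materialising sub-frames.
sums = grouped['col4'].sum()

print(a0)
print(sums)
```

With the flat dict, frames[('B', 1)] plays the same role as dd['B'][1] in the nested version.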
