
Basic Groupby Operations In Dask

I am trying to use Dask to handle a large file (50 GB). Typically, I would load it into memory and use Pandas. I want to group by two columns, 'A' and 'B', and whenever column 'C'

Solution 1:

It appears dask does not currently implement the fillna method for GroupBy objects. I tried to submit a PR for it some time ago and gave up quite quickly.

Also, dask doesn't support the method parameter (as it isn't always trivial to implement with delayed algorithms).

A workaround for this could be using fillna before grouping, like so:

# Fill missing values first, then group; 'grouped' is a lazy GroupBy object, not a column
grouped = df.fillna(0).groupby(['A', 'B'])['C']

Note that I haven't tested this.

You can find my (failed) attempt here: https://github.com/nirizr/dask/tree/groupy_fillna
