Basic Groupby Operations In Dask
I am attempting to use Dask to handle a large file (50 GB). Typically, I would load it into memory and use Pandas. I want to group by two columns, 'A' and 'B', and whenever column 'C'
Solution 1:
It appears Dask does not currently implement the fillna method for GroupBy objects. I tried to submit a PR for it some time ago and gave up quite quickly.
Also, Dask doesn't support the method parameter (as it isn't always trivial to implement with delayed algorithms).
A workaround for this could be calling fillna on the column before grouping, like so:
df['C'] = df['C'].fillna(0)
grouped = df.groupby(['A', 'B'])['C']
Note that this fills with a constant rather than a per-group value, and it wasn't tested.
You can find my (failed) attempt here: https://github.com/nirizr/dask/tree/groupy_fillna