Converting Index Into Multiindex (hierarchical Index) In Pandas
Solution 1:
Once we have a DataFrame:
import pandas as pd
df = pd.read_csv("input.csv", index_col=0) # or from another source
and a function that maps each index label to a tuple (the one below suits this question's labels, which have the form e-mail|date):
def process_index(k):
    return tuple(k.split("|"))
we can create a hierarchical index in the following way:
df.index = pd.MultiIndex.from_tuples([process_index(k) for k in df.index])
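To check Solution 1 end to end, here is a minimal sketch that builds a small DataFrame in memory instead of reading input.csv. The two sample labels, the value column and the level names e-mail/date are assumptions made for illustration; process_index is the function from above.

```python
import pandas as pd

# Hypothetical sample data whose index labels follow the "e-mail|date" pattern.
df = pd.DataFrame(
    {"value": [42, 7]},
    index=["alice@example.com|2013-05-07", "bob@example.com|2013-05-08"],
)


def process_index(k):
    # Split one "e-mail|date" label into an (e-mail, date) tuple.
    return tuple(k.split("|"))


# Map every label of the existing index. Iterating df.index directly avoids
# building a Series per row, which df.iterrows() would do.
df.index = pd.MultiIndex.from_tuples(
    [process_index(k) for k in df.index], names=["e-mail", "date"]
)

# Selecting on the first level now returns all rows for that e-mail.
print(df.loc["alice@example.com"])
```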
An alternative approach is to create two columns and then set them as the index (the original index is dropped):
df['e-mail'] = [x.split("|")[0] for x in df.index]
df['date'] = [x.split("|")[1] for x in df.index]
df = df.set_index(['e-mail', 'date'])
Or, even shorter:
df['e-mail'], df['date'] = zip(*map(process_index, df.index))
df = df.set_index(['e-mail', 'date'])
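The two column-based variants can be compared side by side. This sketch reuses a hypothetical two-row frame (the sample labels and the value column are made up) and checks that the long form and the zip(*...) form end up with the same MultiIndex after set_index.

```python
import pandas as pd

# Hypothetical sample frame; labels follow the "e-mail|date" pattern.
df = pd.DataFrame(
    {"value": [42, 7]},
    index=["alice@example.com|2013-05-07", "bob@example.com|2013-05-08"],
)


def process_index(k):
    return tuple(k.split("|"))


# Long form: one list comprehension per new column.
long_form = df.copy()
long_form["e-mail"] = [x.split("|")[0] for x in long_form.index]
long_form["date"] = [x.split("|")[1] for x in long_form.index]
long_form = long_form.set_index(["e-mail", "date"])

# Short form: zip(*...) transposes the per-label tuples into two column tuples.
short_form = df.copy()
short_form["e-mail"], short_form["date"] = zip(*map(process_index, short_form.index))
short_form = short_form.set_index(["e-mail", "date"])

# set_index moves both columns into the index and discards the old string index.
print(long_form.index.tolist() == short_form.index.tolist())
```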
Solution 2:
In pandas >= 0.16.0, we can use the .str accessor on indices. This makes the following possible:
df.index = pd.MultiIndex.from_tuples(df.index.str.split('|').tolist())
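(Note: I tried the more intuitive pd.MultiIndex.from_arrays(df.index.str.split('|')), but that gives me errors. The likely reason is that from_arrays expects one array per level, whereas str.split produces one list per row, so the split lists would need to be transposed first, e.g. with zip(*...).)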
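A sketch of the shape issue behind the note above, on a hypothetical two-row frame: from_tuples takes one sequence per row, from_arrays takes one sequence per level, so transposing the split rows with zip(*...) makes from_arrays work. The last variant swaps in str.split with expand=True, which on an Index returns a MultiIndex directly; it is an alternative, not part of the original answer.

```python
import pandas as pd

# Hypothetical frame with "e-mail|date" labels.
df = pd.DataFrame(
    {"value": [42, 7]},
    index=["alice@example.com|2013-05-07", "bob@example.com|2013-05-08"],
)

# Each element of this list is one row, e.g. ['alice@example.com', '2013-05-07'].
rows = df.index.str.split("|").tolist()

# from_tuples consumes the rows directly (Solution 2 as written).
mi_tuples = pd.MultiIndex.from_tuples(rows)

# from_arrays wants one sequence per level, so the rows are transposed first.
mi_arrays = pd.MultiIndex.from_arrays(list(zip(*rows)))

# With expand=True, Index.str.split returns a MultiIndex directly.
mi_expand = df.index.str.split("|", expand=True)

print(mi_tuples.tolist() == mi_arrays.tolist(), mi_tuples.tolist() == mi_expand.tolist())
```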
Solution 3:
My preference would be to read this in as a column initially (i.e. not as the index); then you can use the str.split method:
from io import StringIO
import pandas as pd

csv = '\n'.join(['name@domain.com|2013-05-07 05:52:51 +0200, 42'] * 3)
df = pd.read_csv(StringIO(csv), header=None)
In [13]: df[0].str.split('|')
Out[13]:
0    [name@domain.com, 2013-05-07 05:52:51 +0200]
1    [name@domain.com, 2013-05-07 05:52:51 +0200]
2    [name@domain.com, 2013-05-07 05:52:51 +0200]
Name: 0, dtype: object
And then feed this into a MultiIndex (perhaps this can be done more cleanly?):
m = pd.MultiIndex.from_arrays(list(zip(*df[0].str.split('|'))))
Delete the 0th column and set the index to the new MultiIndex:
del df[0]
df.index = m

In [17]: df
Out[17]:
                                            1
name@domain.com 2013-05-07 05:52:51 +0200  42
                2013-05-07 05:52:51 +0200  42
                2013-05-07 05:52:51 +0200  42
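On the "cleaner?" question: one possible variant (not the original answer's code) splits with expand=True, so the parts arrive as separate columns and no zip is needed. The level names e-mail and date are an assumption; the sample CSV is the same three-line string used above.

```python
from io import StringIO

import pandas as pd

# Same sample data as above: "e-mail|timestamp, value", repeated three times.
csv = "\n".join(["name@domain.com|2013-05-07 05:52:51 +0200, 42"] * 3)
df = pd.read_csv(StringIO(csv), header=None)

# expand=True returns a DataFrame with one column per split part (0 and 1)
# instead of a Series of lists.
parts = df[0].str.split("|", expand=True)

# Build the MultiIndex from the two part columns, then drop the raw column.
df.index = pd.MultiIndex.from_arrays([parts[0], parts[1]], names=["e-mail", "date"])
df = df.drop(columns=0)

print(df)
```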