
Python Slicing Does Not Give Key Error Even When The Column Is Missing

I have a pandas DataFrame with 10 keys. If I try to access a column that is not present, it still returns NaN for that column. I was expecting a KeyError. How is pandas not able to

Solution 1:

This is expected behaviour and is due to the "setting with enlargement" feature:

In [15]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df.ix[:,['a','d']]

Out[15]:
          a   d
0 -1.164349 NaN
1  0.400116 NaN
2 -0.599496 NaN
3  0.186837 NaN
4  0.385656 NaN

If you try df['d'] or df[['a','d']] instead, you will get a KeyError.

Effectively, what you're doing is reindexing. The fact that the column doesn't exist doesn't matter when using ix: you'll just get a column of NaNs.
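The reindexing described here can also be requested explicitly with DataFrame.reindex. The following is only a minimal sketch: the seeded random data and the variable names are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the fixed seed is an assumption added for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 3)), columns=list("abc"))

# reindex does not raise for unknown labels; it creates them filled with NaN.
reindexed = df.reindex(columns=["a", "d"])

print(reindexed.columns.tolist())
print(reindexed["d"].isna().all())
```

Column 'a' keeps its values, while the missing column 'd' comes back entirely as NaN.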

Same behaviour is observed using loc:

In [24]:
df.loc[:,['a','d']]

Out[24]:
          a   d
0 -1.164349 NaN
1  0.400116 NaN
2 -0.599496 NaN
3  0.186837 NaN
4  0.385656 NaN

When you don't use ix or loc and instead do df['d'], you're trying to index a specific column or list of columns. There is no expectation of enlargement here unless you are assigning to a new column, e.g. df['d'] = some_new_vals
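The difference between reading and assigning with plain brackets can be sketched as follows. The zero-filled frame and the read_failed flag are hypothetical names chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with columns a, b, c (values are placeholders).
df = pd.DataFrame(np.zeros((5, 3)), columns=list("abc"))

# Reading a missing column with [] raises KeyError.
try:
    df["d"]
    read_failed = False
except KeyError:
    read_failed = True

# Assigning to a missing column enlarges the frame instead of raising.
df["d"] = range(5)

print(read_failed)
print(df.columns.tolist())
```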

To guard against this you can validate your list using isin with the columns:

In [26]:
valid_cols = df.columns.isin(['a','d'])
df.ix[:, valid_cols]

Out[26]:
          a
0 -1.164349
1  0.400116
2 -0.599496
3  0.186837
4  0.385656

Now you will only see columns that exist, and this also guards against any misspelt column names.
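A variant of the same guard that also reports which requested names were dropped might look like the sketch below. It uses loc with a boolean mask rather than ix, and the names wanted and missing are assumptions introduced for the example.

```python
import numpy as np
import pandas as pd

# Illustrative frame with columns a, b, c (values are placeholders).
df = pd.DataFrame(np.zeros((5, 3)), columns=list("abc"))
wanted = ["a", "d"]  # hypothetical request; 'd' does not exist

# Boolean mask over the existing columns, as in the isin approach above.
valid_cols = df.columns.isin(wanted)
subset = df.loc[:, valid_cols]

# Requested labels that are absent, useful for spotting misspellings.
missing = [col for col in wanted if col not in df.columns]

print(subset.columns.tolist())
print(missing)
```

Checking missing before continuing makes a misspelt name visible instead of silently dropping it.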

Solution 2:

For me, selecting a subset with a list of column names raises the error as expected:

final_feature[['vendor_id','this column is absent']]

KeyError: "['this column is absent'] not in index"

Also, ix is deprecated in the latest version of pandas (0.20.1 at the time of writing).
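Since ix is deprecated, label-based selection is usually written with loc and position-based selection with iloc. The sketch below assumes pandas 1.0 or later, where ix has been removed and loc raises a KeyError when the list contains a missing label; the frame and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative frame with columns a, b, c (values are placeholders).
df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=list("abc"))

# Label-based and position-based replacements for ix.
by_label = df.loc[:, ["a", "b"]]
by_position = df.iloc[:, [0, 1]]

# Assumption: pandas 1.0 or later, where a missing label in the list raises.
try:
    df.loc[:, ["a", "d"]]
    loc_raised = False
except KeyError:
    loc_raised = True

print(by_label.equals(by_position))
print(loc_raised)
```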
