Insert Rows Into Pandas Dataframe While Maintaining Column Data Types
Solution 1:
As you found, since NaN is a float, adding NaN to a series may cause it to be either upcast to float or converted to object. You are right that this is not a desirable outcome.
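To make the upcasting concrete, here is a minimal sketch. It assumes a small frame shaped like the one in the question and uses reindex to introduce a row of missing values; the names small and grown are illustrative only.

```python
import pandas as pd

# A small frame with an integer column and a boolean column.
small = pd.DataFrame({
    'age': [45, 40, 10],
    'has_children': [True, True, False],
})

# Reindexing to include a new label (3) introduces a row of NaN values.
grown = small.reindex([0, 1, 2, 3])

# int64 cannot hold NaN, so 'age' is upcast to float64;
# bool cannot hold NaN either, so 'has_children' falls back to object.
print(small.dtypes)
print(grown.dtypes)
```

The same promotion happens whenever a row that lacks some columns is added to the frame.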
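There is no straightforward approach. My suggestion is to store your input row data in a dictionary and combine it with a dictionary of defaults before appending. Note that this works because pd.DataFrame.append accepts a dict argument. (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on newer versions, wrap the merged dict in a one-row DataFrame and use pd.concat.)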
In Python 3.5 and later, you can use the syntax {**d1, **d2} to combine two dictionaries, with values from the second taking precedence.
default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
row = {'name': 'Cindy', 'age': 42}
df = df.append({**default, **row}, ignore_index=True)
print(df)
   age  has_children   name  weight
0   45          True    Bob   143.2
1   40          True    Sue   130.2
2   10         False    Tom    34.9
3   42         False  Cindy     0.0
print(df.dtypes)
age int64
has_children bool
name object
weight float64
dtype: object
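For pandas versions where DataFrame.append is no longer available, the same defaults-then-override idea can be sketched with pd.concat. This is an adaptation rather than the original answer's code; the starting frame is assumed to match the one in the question, and new_row and result are illustrative names.

```python
import pandas as pd

# Assumed starting frame, shaped like the one in the question.
df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False],
})

default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
row = {'name': 'Cindy', 'age': 42}

# Merge defaults with the row (row wins), then wrap in a one-row frame.
new_row = pd.DataFrame([{**default, **row}])

# Every column is present with a value of the right type, so no NaN appears
# and the numeric and boolean dtypes are preserved.
result = pd.concat([df, new_row], ignore_index=True)
print(result)
print(result.dtypes)
```

Because the defaults fill every column, no missing values are introduced and nothing needs to be cast back afterwards.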
Solution 2:
This happens because NaN is a float, while True and False are bool. The column ends up with mixed dtypes, so pandas automatically converts it to object.
Another instance of this: if you have a column of all integer values and append a float, pandas changes the entire column to float, so the existing values gain a trailing '.0'.
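The integer-to-float case can be seen with a short sketch. The series names ages and mixed are illustrative, and pd.concat stands in for whatever appending method is used.

```python
import pandas as pd

ages = pd.Series([45, 40, 10])
print(ages.dtype)  # int64

# Adding a single float value promotes the whole series to float64.
mixed = pd.concat([ages, pd.Series([12.5])], ignore_index=True)
print(mixed.dtype)     # float64
print(mixed.tolist())  # [45.0, 40.0, 10.0, 12.5]
```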
Edit
Based on the comments, here is another hacky way to convert the object column back to bool dtype.
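import pandas

df = pandas.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False]
})
row = {'name': 'Cindy', 'age': 12}
# Note: DataFrame.append was removed in pandas 2.0; there, use
# pandas.concat([df, pandas.DataFrame([row])], ignore_index=True) instead.
df = df.append(row, ignore_index=True)
# The new row has no has_children value, so the column becomes object with NaN;
# fill the gap with False and cast back to bool.
df['has_children'] = df['has_children'].fillna(False).astype('bool')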
Now the new dataframe looks like this:
   age  has_children   name  weight
0   45          True    Bob   143.2
1   40          True    Sue   130.2
2   10         False    Tom    34.9
3   12         False  Cindy     NaN
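The fillna-then-astype trick can be generalized to every column at once. The following is a rough sketch, not part of the original answer: add_row_keep_dtypes is a hypothetical helper, fill_values is an assumed dictionary of per-column fill values, and pd.concat is used in place of append.

```python
import pandas as pd

def add_row_keep_dtypes(df, row, fill_values):
    """Hypothetical helper: add `row`, fill gaps, cast back to df's dtypes."""
    original_dtypes = df.dtypes.to_dict()
    combined = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
    # Columns missing from `row` now contain NaN; replace them per column.
    combined = combined.fillna(fill_values)
    # Restore the dtypes the frame had before the row was added.
    return combined.astype(original_dtypes)

people = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False],
})

updated = add_row_keep_dtypes(
    people,
    {'name': 'Cindy', 'age': 12},
    fill_values={'weight': 0.0, 'has_children': False},
)
print(updated.dtypes)
```

Unlike the snippet above, this sketch also fills weight, so the new row carries 0.0 there instead of NaN. Any column left out of fill_values keeps its NaN, and the final cast will fail for integer columns in that case, so the fill dictionary should cover every column the row may omit.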