Append Series To Empty Dataframe Column Always Results The Same After A Loop
Solution 1:
TL;DR: When a Series is assigned to a DataFrame column, the Series is conformed to the DataFrame's index. After the first iteration, the result of append() has more elements than the index of df, so the extra values are dropped and the column does not change.
There is no problem with the append() function; the problem is in the df["A"] assignment.
With df["A"] = xx, we are calling __setitem__():
def __setitem__(self, key, value):
    key = com.apply_if_callable(key, self)
    # see if we can slice the rows
    indexer = convert_to_index_sliceable(self, key)
    if indexer is not None:
        # either we have a slice or we have a string that can be converted
        # to a slice for partial-string date indexing
        return self._setitem_slice(indexer, value)
    if isinstance(key, DataFrame) or getattr(key, "ndim", None) == 2:
        self._setitem_frame(key, value)
    elif isinstance(key, (Series, np.ndarray, list, Index)):
        self._setitem_array(key, value)
    else:
        # set column
        self._set_item(key, value)
In this case we are not slicing the DataFrame (as in df[:]), so indexer is None. The key is "A", a plain string, so neither isinstance check matches and we end up calling:
    self._set_item(key, value)
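The difference between a slice key and a plain string key can be observed from the outside. The following is a minimal sketch with made-up data; which internal branch handles each assignment is inferred from the source quoted above, not checked against a specific pandas version.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Slice key: treated as a row slice, so rows 0 and 1 are overwritten
# in every column.
df[0:2] = 0

# Plain string key: not sliceable, not a DataFrame/Series/array/list,
# so this is the "set column" case.
df["C"] = [7, 8, 9]

print(df)
```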
Let's see how _set_item() is defined:
def _set_item(self, key, value):
    """
    Add series to DataFrame in specified column.
    If series is a numpy-array (not a Series/TimeSeries), it must be the
    same length as the DataFrames index or an error will be thrown.
    Series/TimeSeries will be conformed to the DataFrames index to
    ensure homogeneity.
    """
    self._ensure_valid_index(value)
    value = self._sanitize_column(key, value)
    NDFrame._set_item(self, key, value)
    # check if we are modifying a copy
    # try to set first as we want an invalid
    # value exception to occur first
    if len(self):
        self._check_setitem_copy()
The docstring says that "Series/TimeSeries will be conformed to the DataFrames index to ensure homogeneity." This explains why the DataFrame df doesn't change: after the first iteration, the result of append() has more elements than the index of df, and the values whose index labels are not in df's index are dropped.
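The conforming step can be reproduced without any loop. The sketch below uses made-up values: it assigns a six-element Series to a three-row DataFrame, then tries the same with a NumPy array to show the length error mentioned in the docstring.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})            # index labels 0, 1, 2
longer = pd.Series([10, 20, 30, 40, 50, 60])   # index labels 0..5

# Series assignment aligns on index labels; labels 3, 4 and 5 have no
# matching row in df, so those values are silently dropped.
df["A"] = longer
print(df["A"].tolist())  # [10, 20, 30]

# A NumPy array carries no index, so a length mismatch is an error instead.
raised = False
try:
    df["A"] = longer.to_numpy()
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```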
If so, why does the assignment to df succeed in the first iteration? The answer lies in self._ensure_valid_index(value):
def _ensure_valid_index(self, value):
    """
    Ensure that if we don't have an index, that we can create one from the
    passed value.
    """
If the DataFrame is empty, this method reindexes it to len(value) rows, filling every existing column with NaN. Then, with NDFrame._set_item(self, key, value), the column key is replaced with value.
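A small sketch of this special case, using hypothetical column names and values: assigning a Series to a column of an empty DataFrame gives the frame an index, and the other column is padded with NaN.

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B"])   # no rows, so no usable index yet
print(len(df))                          # 0

# The first assignment creates the index from the assigned Series.
df["A"] = pd.Series([1, 2, 3])

print(len(df))                          # 3
print(df["A"].tolist())
print(df["B"].isna().all())             # B exists but holds only NaN
```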
In the second example, we try to append to column B after appending to column A:
for i in range(5):
    df["A"] = df["A"].append(df2["C"], ignore_index=True)
    df["B"] = df["B"].append(df2["D"], ignore_index=True)
In the first iteration, once column A has been assigned, column B of df is filled with NaN. df["B"].append(df2["D"], ignore_index=True) therefore appends the new values after those existing NaN entries. When the result is assigned back to df["B"], it is conformed to the DataFrame's index, so only the leading NaN entries are kept. That's why df["B"] remains NaN.
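The behaviour of the second example can be reproduced as follows. This is a sketch under two assumptions: the contents of df2 are made up, and pd.concat stands in for Series.append(), which was removed in pandas 2.0. Recent pandas versions may emit a FutureWarning about concatenating empty entries; it does not affect the point being shown.

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B"])
df2 = pd.DataFrame({"C": [1, 2, 3], "D": [4, 5, 6]})  # hypothetical data

for i in range(2):
    # pd.concat([...], ignore_index=True) plays the role of
    # Series.append(..., ignore_index=True) here.
    df["A"] = pd.concat([df["A"], df2["C"]], ignore_index=True)
    df["B"] = pd.concat([df["B"], df2["D"]], ignore_index=True)

print(len(df))                # stays at 3 after the first iteration
print(df["A"].tolist())       # only the first three values survive
print(df["B"].isna().all())   # B never receives the values of D
```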
In the third example, we simply rebind df to the result of append(), so DataFrame.__setitem__() is not involved at all:
for i in range(5):
    df = df.append(df2, ignore_index=True)
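On pandas 2.0 and later, DataFrame.append() no longer exists, so the same idea has to be written with pd.concat. A minimal sketch, assuming a hypothetical df2 whose columns match the target frame: collect the pieces in a list and concatenate once, which also avoids re-copying the growing frame on every iteration.

```python
import pandas as pd

# Hypothetical data; the column names are assumed to match the target frame.
df2 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Gather the pieces first, then build the result in one call.
pieces = [df2 for _ in range(5)]
df = pd.concat(pieces, ignore_index=True)

print(df.shape)  # (15, 2)
```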