Array Inside List
Solution 1:
In the tutorial you linked to, the object series is actually a Pandas Series: a vector of information with a named index. Your dataset, however, contains two fields of information in addition to the time series index, which makes it a DataFrame. This is the reason why the tutorial code breaks with your data.
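As a rough illustration of that distinction, the sketch below builds a small Series and a small DataFrame and prints their dimensions. The values and the name field_1 are hypothetical, loosely modeled on the sample rows in this answer; they are not read from your file.

```python
import pandas as pd

# Hypothetical values, loosely modeled on the sample rows in this answer.
index = pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03"])

# One field plus an index: a Series (1-D).
series = pd.Series([10.9, 10.3, 9.0], index=index, name="field_1")

# Two fields plus an index: a DataFrame (2-D).
frame = pd.DataFrame({1: [10.9, 10.3, 9.0], 2: [3736, 3570, 3689]}, index=index)

print(type(series).__name__, series.ndim, series.shape)
print(type(frame).__name__, frame.ndim, frame.shape)

# Selecting a single column from the DataFrame gives back a Series.
column = frame[1]
print(type(column).__name__)
```

Code written for a Series (one value per time step) can therefore behave differently when it is handed a two-column DataFrame.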
Here's a sample from your data:
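from datetime import datetime
import pandas as pd

def parser(x):
    # Parse dates such as 2012-01-01
    return datetime.strptime(x, '%Y-%m-%d')

# Note: squeeze and date_parser are deprecated or removed in recent pandas releases.
df = pd.read_csv("datos2.csv", header=None, parse_dates=[0],
                 index_col=0, squeeze=True, date_parser=parser)
df.head()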
               1     2
0
2012-01-01  10.9  3736
2012-01-02  10.3  3570
2012-01-03   9.0  3689
2012-01-04   9.5  3680
2012-01-05  10.3  3697
And the equivalent section from the tutorial: "Running the example loads the dataset as a Pandas Series and prints the first 5 rows."
Month
1901-01-01    266.0
1901-02-01    145.9
1901-03-01    183.1
1901-04-01    119.3
1901-05-01    180.3
Name: Sales, dtype: float64
To verify this, select one of your fields and store it as series, and then try running the MinMaxScaler. You'll see that it runs without error:
series = df[1]
# ... compute difference and do scaling ...
print(scaled_values)
[[ 0.58653846]
[ 0.55288462]
[ 0.63942308]
...,
[ 0.75 ]
[ 0.6875 ]
[ 0.51923077]]
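For reference, the elided difference-and-scaling step might look roughly like the sketch below. The difference() function here is a stand-in written for this example, assumed to behave like the tutorial's helper (each value minus the value interval steps earlier); the five input values are hypothetical and only mirror the sample rows, so the printed numbers will not match the output above.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def difference(dataset, interval=1):
    """Stand-in for the tutorial's helper: value[i] - value[i - interval]."""
    values = pd.Series(dataset).to_numpy()
    diff = [values[i] - values[i - interval] for i in range(interval, len(values))]
    return pd.Series(diff)

# Hypothetical single field; in the answer this would be series = df[1].
series = pd.Series([10.9, 10.3, 9.0, 9.5, 10.3])

diff_series = difference(series, 1)

# MinMaxScaler expects a 2-D array, so reshape the 1-D values into one column.
diff_values = diff_series.to_numpy().reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_values)
print(scaled_values)
```

The reshape is the important part: a single Series has to be turned into a one-column 2-D array before MinMaxScaler will accept it.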
Note: One other minor difference in your dataset compared to the tutorial data is that there's no header in your data. Set header=None to avoid assigning your first row of data as column headers.
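A small sketch of what header=None changes, using an in-memory CSV rather than your file. The three rows in raw are hypothetical and only shaped like the sample above.

```python
import io
import pandas as pd

# Hypothetical header-less CSV text, shaped like the sample rows in this answer.
raw = "2012-01-01,10.9,3736\n2012-01-02,10.3,3570\n2012-01-03,9.0,3689\n"

# Default behaviour: the first row is consumed as column names.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every row is kept as data, and columns get integer labels.
with_none = pd.read_csv(io.StringIO(raw), header=None)

print(len(with_default), list(with_default.columns))
print(len(with_none), list(with_none.columns))
```

Without header=None the first observation silently disappears into the column names, which is easy to miss in a long time series.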
UPDATE
To pass your entire dataset to MinMaxScaler, just run difference() on both columns and pass in the transformed vectors for scaling. MinMaxScaler accepts a two-dimensional DataFrame with any number of columns and scales each column independently:
ncol = 2
diff_df = pd.concat([difference(df[i], 1) for i in range(1, ncol + 1)], axis=1)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_df)
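To try the two-column version without your CSV, a self-contained sketch could look like this. The DataFrame is built from hypothetical values that mirror the sample rows, and difference() is again a stand-in assumed to match the tutorial's helper, not the tutorial's exact code.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def difference(dataset, interval=1):
    """Stand-in for the tutorial's helper: value[i] - value[i - interval]."""
    values = pd.Series(dataset).to_numpy()
    return pd.Series([values[i] - values[i - interval]
                      for i in range(interval, len(values))])

# Hypothetical frame with integer column labels 1 and 2,
# as produced by header=None together with index_col=0.
index = pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03",
                        "2012-01-04", "2012-01-05"])
df = pd.DataFrame({1: [10.9, 10.3, 9.0, 9.5, 10.3],
                   2: [3736, 3570, 3689, 3680, 3697]}, index=index)

ncol = 2
diff_df = pd.concat([difference(df[i], 1) for i in range(1, ncol + 1)], axis=1)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_df)

# One row per difference, one column per field.
print(scaled_values.shape)
# Each column is scaled on its own, so both span the full feature range.
print(scaled_values.min(axis=0), scaled_values.max(axis=0))
```

Because the scaler is fitted per column, the very different magnitudes of the two fields do not affect each other.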