Array Inside List
Solution 1:
In the tutorial you linked to, the object named series is a Pandas Series: a one-dimensional vector of values with a named index. Your dataset, however, contains two fields of values in addition to the time series index, which makes it a DataFrame. This is why the tutorial code breaks with your data.
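The distinction can be checked directly in pandas. The sketch below uses made-up values rather than the actual file, and the variable names are only illustrative; it shows how a Series and a DataFrame differ in type and dimensionality.

```python
import pandas as pd

# Made-up values for illustration; the index mimics a daily time series.
index = pd.date_range("2012-01-01", periods=3, freq="D")

# One field of values plus an index: a Series (1-D).
series = pd.Series([266.0, 145.9, 183.1], index=index, name="Sales")

# Two fields of values plus the same index: a DataFrame (2-D).
df = pd.DataFrame({1: [10.9, 10.3, 9.0], 2: [3736, 3570, 3689]}, index=index)

print(type(series).__name__, series.ndim, series.shape)
print(type(df).__name__, df.ndim, df.shape)

# Selecting a single column from the DataFrame yields a Series again.
column = df[1]
print(type(column).__name__, column.shape)
```

Code written for a Series tends to assume one value per index entry, which is why it can fail when handed a two-column DataFrame.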
Here's a sample from your data:
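from datetime import datetime
import pandas as pd

def parser(x):
    return datetime.strptime(x, '%Y-%m-%d')

# Note: newer pandas releases remove squeeze and deprecate date_parser.
df = pd.read_csv("datos2.csv", header=None, parse_dates=[0],
                 index_col=0, squeeze=True, date_parser=parser)
df.head()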
               1     2
0
2012-01-01  10.9  3736
2012-01-02  10.3  3570
2012-01-03   9.0  3689
2012-01-04   9.5  3680
2012-01-05  10.3  3697

And the equivalent section from the tutorial: "Running the example loads the dataset as a Pandas Series and prints the first 5 rows."
Month
1901-01-01    266.0
1901-02-01    145.9
1901-03-01    183.1
1901-04-01    119.3
1901-05-01    180.3
Name: Sales, dtype: float64

To verify this, select one of your fields and store it as series, and then try running the MinMaxScaler. You'll see that it runs without error:
series = df[1]
# ... compute difference and do scaling ...
print(scaled_values)
[[ 0.58653846]
 [ 0.55288462]
 [ 0.63942308]
 ...,
 [ 0.75      ]
 [ 0.6875    ]
 [ 0.51923077]]

Note: One other minor difference in your dataset compared to the tutorial data is that there's no header in your data. Set header=None to avoid assigning your first row of data as column headers.
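As a rough sketch of the single-column check described above, the snippet below differences one column and scales it. The values are made up, and difference() here is a hypothetical stand-in modeled on the tutorial's helper, not a copy of it. One detail worth noting is that MinMaxScaler expects 2-D input, so the 1-D values are reshaped into a single column first.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def difference(values, interval=1):
    """Hypothetical stand-in for the tutorial's helper: lagged differences."""
    values = list(values)
    diffs = [values[i] - values[i - interval] for i in range(interval, len(values))]
    return pd.Series(diffs)


# Made-up values standing in for one column of the data.
series = pd.Series([10.9, 10.3, 9.0, 9.5, 10.3])

diff_values = difference(series.values, 1)

# MinMaxScaler expects 2-D input, so reshape the 1-D values to one column.
column = diff_values.values.reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(column)
print(scaled_values)
```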
UPDATE
To pass your entire dataset to MinMaxScaler, run difference() on both columns and pass the transformed vectors in together for scaling. MinMaxScaler accepts a DataFrame with any number of columns and scales each column independently:
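from sklearn.preprocessing import MinMaxScaler
# difference() is the helper function defined in the tutorial.
ncol = 2
diff_df = pd.concat([difference(df[i], 1) for i in range(1, ncol + 1)], axis=1)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_df)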
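For a version that can be run without the original file, here is a minimal sketch of the same two-column approach on made-up data. The difference() function is again a hypothetical stand-in for the tutorial's helper, and the frame uses integer column labels 1 and 2 to mimic what header=None produces.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def difference(values, interval=1):
    """Hypothetical stand-in for the tutorial's helper: lagged differences."""
    values = list(values)
    diffs = [values[i] - values[i - interval] for i in range(interval, len(values))]
    return pd.Series(diffs)


# Made-up frame with integer column labels 1 and 2.
df = pd.DataFrame({1: [10.9, 10.3, 9.0, 9.5, 10.3],
                   2: [3736, 3570, 3689, 3680, 3697]})

ncol = 2
diff_df = pd.concat([difference(df[i], 1) for i in range(1, ncol + 1)], axis=1)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_df)

# Each column is scaled on its own, so both span the full 0 to 1 range
# even though the raw columns are on very different scales.
print(scaled_values.min(axis=0), scaled_values.max(axis=0))
```

Because the scaler fits a separate minimum and maximum per column, the large values in the second column do not compress the first.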