
Array Inside List

I'm really confused about how to solve this problem. I'm trying to use scikit-learn's MinMaxScaler, but I'm getting an error; it seems that I'm setting an array ele

Solution 1:

In the tutorial you linked to, the object series is a pandas Series: a one-dimensional vector of values with a named index. Your dataset, however, contains two fields of information in addition to the time-series index, which makes it a DataFrame. That is why the tutorial code breaks with your data.
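To make the Series/DataFrame distinction concrete, here is a minimal sketch on a small synthetic frame (the dates and values are invented, not your data). It assumes the tutorial's differencing step subtracts consecutive rows one value at a time. On a two-column frame each "difference" is a length-2 array, so you end up with arrays inside a list instead of plain numbers, and NumPy cannot cast a column of such cells to floats.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's data (values invented for illustration):
# a date index plus two value columns labelled 1 and 2, as read with header=None.
idx = pd.date_range("2012-01-01", periods=4, freq="D")
df = pd.DataFrame({1: [10.9, 10.3, 9.0, 9.5], 2: [5.0, 5.5, 6.0, 4.5]}, index=idx)

series = df[1]                             # one column -> 1-D Series
print(type(series).__name__, series.ndim)  # Series 1
print(type(df).__name__, df.ndim)          # DataFrame 2

# Assumed tutorial-style loop: subtract each row from the next.
# Each row of the two-column frame is a length-2 array, so the "differences"
# come back as a list of arrays rather than a list of numbers.
rows = df.values
diff = [rows[i] - rows[i - 1] for i in range(1, len(rows))]
print(type(diff[0]).__name__, diff[0].shape)  # ndarray (2,)

# Packing them into one column of shape (n, 1) gives an object array whose
# cells are arrays; casting it to float (as a scaler must) fails.
packed = np.empty((len(diff), 1), dtype=object)
for i, d in enumerate(diff):
    packed[i, 0] = d

caught = None
try:
    packed.astype(float)
except ValueError as err:
    caught = err
    print("ValueError:", err)
```

Selecting a single column avoids the problem because every row then contributes exactly one float.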

Here's a sample from your data:

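import pandas as pd

# datos2.csv has no header row: column 0 holds dates (YYYY-MM-DD) and
# columns 1 and 2 hold the two value fields.
# The tutorial's squeeze=True and date_parser= arguments are deprecated or removed
# in recent pandas; ISO dates are parsed by parse_dates=[0] without a custom parser.
df = pd.read_csv("datos2.csv", header=None, parse_dates=[0], index_col=0)
df.head()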
                   1    2
0
2012-01-01  10.93736  ...
2012-01-02  10.33570  ...
2012-01-03   9.03689  ...
2012-01-04   9.53680  ...
2012-01-05  10.33697  ...

And the equivalent section from the tutorial: "Running the example loads the dataset as a Pandas Series and prints the first 5 rows."

Month
1901-01-01    266.0
1901-02-01    145.9
1901-03-01    183.1
1901-04-01    119.3
1901-05-01    180.3
Name: Sales, dtype: float64

To verify this, select one of your fields and store it as series, and then try running the MinMaxScaler. You'll see that it runs without error:

series = df[1]
# ... compute difference and do scaling ...
print(scaled_values)
[[ 0.58653846]
 [ 0.55288462]
 [ 0.63942308]
 ..., 
 [ 0.75      ]
 [ 0.6875    ]
 [ 0.51923077]]
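Filled out end to end, the single-column check might look like the sketch below. The difference() helper is a hypothetical stand-in modelled on the tutorial's function of the same name, and the five values are invented; only MinMaxScaler with feature_range and fit_transform comes from scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def difference(dataset, interval=1):
    """Lag-`interval` differences of a 1-D sequence.

    Hypothetical helper, modelled on the tutorial's difference().
    """
    return pd.Series([dataset[i] - dataset[i - interval]
                      for i in range(interval, len(dataset))])


# Synthetic single-column data standing in for df[1] (values invented).
series = pd.Series([10.9, 10.3, 9.0, 9.5, 10.3],
                   index=pd.date_range("2012-01-01", periods=5, freq="D"))

diff_values = difference(series.values, 1)       # 4 plain floats
diff_values = diff_values.values.reshape(-1, 1)  # shape (4, 1): one feature column
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_values)
print(scaled_values)
```

Because each differenced value is a plain float, reshape(-1, 1) produces a proper (n, 1) float array, which is the 2-D shape MinMaxScaler expects.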

Note: One other minor difference in your dataset compared to the tutorial data is that there's no header in your data. Set header=None to avoid assigning your first row of data as column headers.
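The effect of header=None is easy to see on a two-line CSV held in memory. The rows below are hypothetical, in the same date,value,value layout, with invented numbers.

```python
import io

import pandas as pd

# Two hypothetical rows in the same layout as datos2.csv (values invented).
csv_text = "2012-01-01,10.9,5.0\n2012-01-02,10.3,5.5\n"

# Default header inference: the first data row is consumed as column names.
with_header = pd.read_csv(io.StringIO(csv_text))
# header=None: every line is data, and columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(with_header.columns))         # ['2012-01-01', '10.9', '5.0']
print(len(with_header), len(no_header))  # 1 2
print(list(no_header.columns))           # [0, 1, 2]
```

Without header=None, the first observation silently disappears into the column labels.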

UPDATE: To pass your entire dataset to MinMaxScaler, run difference() on both columns and pass the transformed columns in together. MinMaxScaler accepts any 2-D input of shape (n_samples, n_features), including a multi-column DataFrame, and scales each column independently:

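from sklearn.preprocessing import MinMaxScaler
ncol = 2  # number of value columns (labelled 1 and 2)
# difference() is the helper defined in the tutorial
diff_df = pd.concat([difference(df[i], 1) for i in range(1, ncol + 1)], axis=1)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_df)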
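MinMaxScaler treats each column as a separate feature and maps it into feature_range using that column's own minimum and maximum; for the default (0, 1) range this is x' = (x - min) / (max - min). Below is a minimal sketch on synthetic two-column data (values invented). Note that pandas' DataFrame.diff() is swapped in here for the tutorial's difference() helper; it is not the code from the answer above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic two-column frame standing in for the asker's data (values invented).
df = pd.DataFrame({1: [10.9, 10.3, 9.0, 9.5, 10.3],
                   2: [5.0, 5.5, 6.0, 4.5, 5.0]},
                  index=pd.date_range("2012-01-01", periods=5, freq="D"))

# DataFrame.diff() replaces the tutorial's difference(); dropna() removes
# the first row, which has no predecessor to subtract.
diff_df = df.diff(1).dropna()

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_values = scaler.fit_transform(diff_df)  # NumPy array, shape (4, 2)

# Each column is scaled with its own min and max: (x - min) / (max - min).
manual = ((diff_df - diff_df.min()) / (diff_df.max() - diff_df.min())).to_numpy()
print(np.allclose(scaled_values, manual))
```

The manual computation is only a cross-check of the per-column formula; in practice, keep the fitted scaler so you can call scaler.inverse_transform() on predictions later.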
