
How To Use The Missing Parameter Of XGBRegressor (Scikit-Learn Wrapper For XGBoost)

I am working on a dataset that contains missing values in certain columns. I am trying to use XGBRegressor from the Scikit-Learn wrapper interface for XGBoost. There it provides a param

Solution 1:

The 'missing' parameter tells XGBoost which value in your data should be treated as a missing value. For example, if you pass 0.5 as 'missing', then every 0.5 it finds in your data is treated as missing. The default is NaN.

XGBoost does not impute missing values. Instead, at each split it learns a default path from the training data. A sample can go in one of two directions at a split, left or right, and one of them is made the default based on the data. Whenever the feature used by that split is missing for a sample (say you defined 0.5 as missing and the value is 0.5), the sample takes the default path. Initially I thought it imputed the missing value, but it does not; it only follows the default path.

This is described in the paper "XGBoost: A Scalable Tree Boosting System".
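To make the default-path idea concrete, here is a rough pure-Python sketch. It is not XGBoost's actual code: the real algorithm compares gradient-based gain, while this toy version uses a plain squared error, and the helper names (is_missing, choose_default_direction, route) are made up for illustration. It only shows the idea of trying both directions for the missing rows and keeping the better one.

```python
import math


def is_missing(x, missing=math.nan):
    """Return True if x equals the configured 'missing' marker."""
    if math.isnan(missing):
        return math.isnan(x)
    return x == missing


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def choose_default_direction(feature, target, threshold, missing=math.nan):
    """Pick the branch that rows with a missing feature should follow.

    Toy stand-in for the learned default direction: try sending the
    missing rows left, then right, and keep the lower total error.
    """
    left, right, absent = [], [], []
    for x, y in zip(feature, target):
        if is_missing(x, missing):
            absent.append(y)
        elif x < threshold:
            left.append(y)
        else:
            right.append(y)
    loss_if_left = sse(left + absent) + sse(right)
    loss_if_right = sse(left) + sse(right + absent)
    return "left" if loss_if_left <= loss_if_right else "right"


def route(x, threshold, default_direction, missing=math.nan):
    """Send one value down the split; missing values take the default."""
    if is_missing(x, missing):
        return default_direction
    return "left" if x < threshold else "right"


# Here 0.5 is declared as the missing marker, as in the example above.
feature = [1.0, 2.0, 0.5, 8.0, 9.0, 0.5]
target = [10.0, 11.0, 50.0, 52.0, 49.0, 51.0]
default = choose_default_direction(feature, target, threshold=5.0, missing=0.5)
```

In this toy data the rows marked as missing have targets close to the right-hand group, so the split should send them right by default. Note that the value 0.5 is never compared with the threshold once it is declared as missing.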

Solution 2:

As I understand it, you have it mixed up. The missing parameter only tells the model which value in your data stands for missing (i.e. should be handled like NaN). The default is "np.nan".

If you want to replace the actual missing values with some other value, say "X", you have to do it on your data before fitting the model.

If you have a DataFrame "df", you can do:

df = df.fillna(X)

(fillna returns a new DataFrame, so assign the result.)

If you have a np.array "array", you can do:

array = np.nan_to_num(array)

but note that this replaces np.nan with zeros.
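Here is a short sketch of both replacements, plus the opposite direction (turning a sentinel into NaN so that the default missing setting applies). The fill value -1.0 and the sentinel -999.0 are made-up values for illustration. The nan= keyword of np.nan_to_num needs NumPy 1.17 or newer.

```python
import numpy as np
import pandas as pd

FILL = -1.0        # hypothetical replacement value (the "X" above)
SENTINEL = -999.0  # hypothetical marker used for missing entries

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})
df_filled = df.fillna(FILL)  # returns a new DataFrame; df itself is unchanged

array = np.array([1.0, np.nan, 3.0])
array_zero = np.nan_to_num(array)            # NaN becomes 0.0
array_fill = np.nan_to_num(array, nan=FILL)  # NaN becomes FILL instead

# Opposite direction: convert a sentinel to NaN before fitting.
raw = np.array([1.0, SENTINEL, 3.0])
raw_nan = np.where(raw == SENTINEL, np.nan, raw)
```

Keep in mind that once the NaNs are filled, the model sees the fill value as an ordinary number rather than as a missing entry.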

Hope that helps.
