
Flask output not displayed

Tags: data-mining, xgboost, hyperparameter-tuning, one-hot-encoding

2022-02-17 09:45:07

I am trying to serve an XGBClassifier with Flask. After I enter values in the relevant fields on the web page, no output is displayed. Here is my code:

train_x, test_x, train_y, test_y = train_test_split(data1, y, test_size = 0.2, 
random_state=69)

# IMPUTING NAN VALUES
train_x['JobType'].fillna(train_x['JobType'].value_counts().index[0], inplace = True) 
train_x['occupation'].fillna(train_x['occupation'].value_counts().index[0], inplace = True)

test_x['JobType'].fillna(train_x['JobType'].value_counts().index[0], inplace = True)
test_x['occupation'].fillna(train_x['occupation'].value_counts().index[0], inplace = True)

# SEPARATING CATEGORICAL VARIABLES
train_x_cat = train_x.select_dtypes(include = 'object')
train_x_num = train_x.select_dtypes(include = 'number')

test_x_cat = test_x.select_dtypes(include = 'object')
test_x_num = test_x.select_dtypes(include = 'number')

#ONE HOT ENCODING THE CATEGORICAL VARIABLES AND THEN CONCAT THEM TO NUMERICAL VARIABLES
ohe = OneHotEncoder(handle_unknown='ignore', sparse = False)
train_x_encoded = pd.DataFrame(ohe.fit_transform(train_x_cat))
train_x_encoded.columns = ohe.get_feature_names(train_x_cat.columns)

train_x_encoded = train_x_encoded.reset_index(drop = True)
train_x_num = train_x_num.reset_index(drop = True)
train_x1 = pd.concat([train_x_num, train_x_encoded], axis = 1)


test_x_encoded = pd.DataFrame(ohe.transform(test_x_cat))
test_x_encoded.columns = ohe.get_feature_names(test_x_cat.columns)

test_x_encoded = test_x_encoded.reset_index(drop = True)
test_x_num = test_x_num.reset_index(drop = True)
test_x1 = pd.concat([test_x_num, test_x_encoded], axis = 1)

#XGBC MODEL
model = XGBClassifier(random_state = 69)

#Hyperparameter tuning
def objective(trial):
    learning_rate = trial.suggest_float('learning_rate', 0.001, 0.01)
    n_estimators = trial.suggest_int('n_estimators', 10, 500)
    sub_sample = trial.suggest_float('sub_sample', 0.0, 1.0)
    max_depth = trial.suggest_int('max_depth', 1, 20)

    params = {'max_depth' : max_depth,
           'n_estimators' : n_estimators,
           'sub_sample' : sub_sample,
           'learning_rate' : learning_rate}

    model.set_params(**params)

    return np.mean(-1 * cross_val_score(model, train_x1, train_y,
                                    cv = 5, n_jobs = -1, scoring = 'neg_mean_squared_error'))

xgbc_study = optuna.create_study(direction = 'minimize')
xgbc_study.optimize(objective, n_trials = 10)

xgbc_study.best_params
optuna_rfc_mse = xgbc_study.best_value

model.set_params(**xgbc_study.best_params)
model.fit(train_x1, train_y)

Here is my Flask (app.py) code:

@app.route('/', methods = ['GET', 'POST'])
def main():
    if request.method == 'GET':
       return render_template('index.html')

    if request.method == "POST":
       AGE= request.form['age']
       JOBTYPE= request.form['JobType']
       EDUCATIONTYPE= request.form['EdType']
       MARITALSTATUS= request.form['maritalstatus']
       OCCUPATION= request.form['occupation']
       RELATIONSHIP= request.form['relationship']
       GENDER= request.form['gender']
       CAPITALGAIN= request.form['capitalgain']
       CAPITALLOSS= request.form['capitalloss']
       HOURSPERWEEK= request.form['hoursperweek']
    
       data = [[AGE, JOBTYPE, EDUCATIONTYPE, MARITALSTATUS, OCCUPATION, RELATIONSHIP, 
             GENDER, CAPITALGAIN, CAPITALLOSS, HOURSPERWEEK]]
    
       input_variables = pd.DataFrame(data, columns = ['age', 'JobType', 'EdType', 
                                                       'maritalstatus', 'occupation', 
                                                       'relationship', 'gender', 
                                                       'capitalgain', 'capitalloss', 
                                                       'hrsperweek'], 
                                                       dtype = 'float', index = ['input'])
    
       predictions = model.predict(input_variables)[0]
       print(predictions)
    
       return render_template('index.html', original_input = {'age':AGE, 'JobType':JOBTYPE, 
                                                              'EdType':EDUCATIONTYPE,
                                                           'maritalstatus':MARITALSTATUS, 
                                                           'occupation':OCCUPATION, 
                                                           'relationship':RELATIONSHIP, 
                                                           'gender':GENDER, 
                                                           'capitalgain':CAPITALGAIN,
                                                           'capitalloss':CAPITALLOSS, 
                                                           'hrsperweek':HOURSPERWEEK},
                                                            result = predictions)

My index.html code:

<form action="{{ url_for('main') }}" method="POST">
    
    <div class="form_group">
    
        <legend>Input Variables</legend>
        
        <br>age<br>
        <input name="age" type="number" step="any" min="0" class="form 
        control" required>
        <br>
        <!-- AND SO ON: ALL THE OTHER INPUTS ARE ADDED -->

        <br>
        <input type="submit" value="Submit" class="btn btn-primary">
        
    </div>
    
</form>
<br>

<div class="result" align="center">
    {% if result %} {% for variable, value in original_input.items() %}
    <b>{{ variable }}</b> : {{ value }} {% endfor %}
    <br>
    <br>
    <h1>Predicted Salary:</h1>
    <p style="font-size:50px">${{ result }}</p>
    {% endif %}
</div>

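When I deploy this with Flask and enter a value for every field on the web page, it does not show the predicted output. Instead, the page just refreshes and the output area stays blank, as marked by the red circle. I had to add an image because there is no other way to describe it!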

Thanks in advance!

2 Answers

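You are passing the request data directly into the model.

The required pre-processing (one-hot encoding, scaling, and so on) has to be applied first.

If this worked in the past, it must be because every feature was purely float.

To do the pre-processing, you need the encoders and statistics from the training phase.

Below is a toy example showing the steps that should be there.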

# save the model/encoders to disk/DB after training
pickle.dump(model, open('model.sav', 'wb'))
pickle.dump(ohe, open('ohe.sav', 'wb'))

# Load model in the Flask runtime [ Only once ]
model = pickle.load(open('model.sav', 'rb')) # Loaded the Model
ohe_test = pickle.load(open('ohe.sav', 'rb')) # Loaded the OHE
# Get x_mean, x_median, x_std from Database/Files [Where it was saved ]

##Post method for Predict
@app.route('/predict',methods=['POST'])
def predict_():

    # Get request param
    jsonData = flask.request.get_json(force=True)
    data = pd.Series(jsonData)

    #Pre-processing
    data.fillna(x_median,inplace=True) # Fill NA
    data  = (data - x_mean)/x_std # Scale
    data  = ohe_test.transform(data) # OHE

    # Make prediction
    pred = model.predict([data])
  
    # Prepare output
    res = {'pred': pred.tolist()}  # NumPy arrays are not JSON serializable

    return flask.Response(response=json.dumps(res), status=200, mimetype='application/json')

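Notes:

Handling unknown categories is a separate task.
This is not meant as guidance on model serving. There are dedicated tools/frameworks for building pipelines.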
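
As a small illustration of the first note, the sketch below shows what scikit-learn's OneHotEncoder does with a category it never saw during fitting when handle_unknown='ignore' is set, as in the question's code. The category values are made up for illustration and are not taken from the asker's data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training values for a single categorical feature.
train = np.array([["Private"], ["Federal-gov"], ["Private"]], dtype=object)

# handle_unknown='ignore' makes transform() tolerate unseen categories.
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train)

# "Self-emp" was never seen during fit, so its row has no active column.
new = np.array([["Private"], ["Self-emp"]], dtype=object)
encoded = ohe.transform(new).toarray()  # transform() returns a sparse matrix by default

print(ohe.categories_[0])  # categories learned from the training data, sorted
print(encoded)             # second row is all zeros
```

An all-zero row is silently accepted by the model, so whether that behaviour is acceptable, or whether the request should be rejected instead, is a decision for the application.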

There are several ways to fix your code.

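One option is to write a custom function that contains the feature engineering code, and then call that function before both training (model.fit) and prediction (model.predict).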
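
A minimal sketch of that first option, under some assumptions: the function names fit_preprocessor and preprocess are hypothetical, only a subset of the question's columns is used, and the rows are invented. The point is that the fill values and the encoder are learned once from the training data and then reused unchanged for every later input.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

CAT_COLS = ["JobType", "occupation"]  # subset of the question's columns
NUM_COLS = ["age", "hoursperweek"]

def fit_preprocessor(train_df):
    """Learn fill values and the encoder from the training data only."""
    modes = {c: train_df[c].mode()[0] for c in CAT_COLS}
    ohe = OneHotEncoder(handle_unknown="ignore")
    ohe.fit(train_df[CAT_COLS].fillna(modes))
    return modes, ohe

def preprocess(df, modes, ohe):
    """Apply the same steps to any frame: training, test, or one web request."""
    cat = ohe.transform(df[CAT_COLS].fillna(modes)).toarray()
    num = df[NUM_COLS].astype(float).to_numpy()
    return np.hstack([num, cat])

# Hypothetical training rows, one with a missing JobType.
train_df = pd.DataFrame({
    "age": [39, 50, 28],
    "JobType": ["Private", None, "Private"],
    "occupation": ["Sales", "Tech", "Sales"],
    "hoursperweek": [40, 13, 45],
})
modes, ohe = fit_preprocessor(train_df)
X_train = preprocess(train_df, modes, ohe)

# Form values arrive as strings; the same function converts and encodes them.
request_df = pd.DataFrame([{"age": "33", "JobType": "Private",
                            "occupation": "Tech", "hoursperweek": "38"}])
X_request = preprocess(request_df, modes, ohe)
print(X_train.shape, X_request.shape)
```

The model would then be fitted on X_train and, inside the Flask view, asked to predict on X_request, so both see the same column layout.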

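Another option is to use a framework designed to apply the appropriate transformations during both training and prediction, such as scikit-learn's Pipeline.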
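
A rough sketch of the Pipeline option, with stated assumptions: LogisticRegression stands in for the XGBClassifier so the example depends only on scikit-learn, the column subset and training rows are invented, and the variable names are illustrative. Imputation, encoding, and the model are fitted together and pickled as one object.

```python
import pickle
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

CAT_COLS = ["JobType", "occupation"]  # subset of the question's columns
NUM_COLS = ["age", "hoursperweek"]

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("ohe", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("cat", categorical, CAT_COLS),
    ("num", "passthrough", NUM_COLS),
])
# LogisticRegression is a stand-in here; XGBClassifier could take its place.
pipe = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])

# Hypothetical training data.
train_df = pd.DataFrame({
    "age": [39, 50, 28, 45],
    "JobType": ["Private", "Federal-gov", "Private", "Federal-gov"],
    "occupation": ["Sales", "Tech", "Sales", "Tech"],
    "hoursperweek": [40, 13, 45, 50],
})
train_y = [0, 1, 0, 1]
pipe.fit(train_df, train_y)

# A single pickle now carries the preprocessing and the model together.
restored = pickle.loads(pickle.dumps(pipe))

# One raw row, as it might be built from a web form after numeric conversion.
row = pd.DataFrame([{"age": 33.0, "JobType": "Private",
                     "occupation": "Sales", "hoursperweek": 38.0}])
pred = restored.predict(row)
print(pred)
```

In the Flask view, the form values would go into a one-row DataFrame with the training column names and be passed to predict. Numeric fields would need converting with float() first, since form values arrive as strings.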
