烧瓶输出未显示

数据挖掘 xgboost 超参数调整 一热编码
2022-02-17 09:45:07

我正在尝试XGBClassifier使用flask. 将值提供给网页上的相关字段后,未显示输出。下面是我的代码:

train_x, test_x, train_y, test_y = train_test_split(data1, y, test_size = 0.2, 
random_state=69)

# IMPUTING NAN VALUES
train_x['JobType'].fillna(train_x['JobType'].value_counts().index[0], inplace = True) 
train_x['occupation'].fillna(train_x['occupation'].value_counts().index[0], inplace = True)

test_x['JobType'].fillna(train_x['JobType'].value_counts().index[0], inplace = True)
test_x['occupation'].fillna(train_x['occupation'].value_counts().index[0], inplace = True)

# SEPARATING CATEGORICAL VARIABLES
train_x_cat = train_x.select_dtypes(include = 'object')
train_x_num = train_x.select_dtypes(include = 'number')

test_x_cat = test_x.select_dtypes(include = 'object')
test_x_num = test_x.select_dtypes(include = 'number')

#ONE HOT ENCODING THE CATEGORICAL VARIABLES AND THEN CONCAT THEM TO NUMERICAL VARIABLES
ohe = OneHotEncoder(handle_unknown='ignore', sparse = False)
train_x_encoded = pd.DataFrame(ohe.fit_transform(train_x_cat))
train_x_encoded.columns = ohe.get_feature_names(train_x_cat.columns)

train_x_encoded = train_x_encoded.reset_index(drop = True)
train_x_num = train_x_num.reset_index(drop = True)
train_x1 = pd.concat([train_x_num, train_x_encoded], axis = 1)


test_x_encoded = pd.DataFrame(ohe.transform(test_x_cat))
test_x_encoded.columns = ohe.get_feature_names(test_x_cat.columns)

test_x_encoded = test_x_encoded.reset_index(drop = True)
test_x_num = test_x_num.reset_index(drop = True)
test_x1 = pd.concat([test_x_num, test_x_encoded], axis = 1)

#XGBC MODEL
model = XGBClassifier(random_state = 69)

#Hyperparameter tuning
def objective(trial):
    learning_rate = trial.suggest_float('learning_rate', 0.001, 0.01)
    n_estimators = trial.suggest_int('n_estimators', 10, 500)
    sub_sample = trial.suggest_float('sub_sample', 0.0, 1.0)
    max_depth = trial.suggest_int('max_depth', 1, 20)

    params = {'max_depth' : max_depth,
           'n_estimators' : n_estimators,
           'sub_sample' : sub_sample,
           'learning_rate' : learning_rate}

    model.set_params(**params)

    return np.mean(-1 * cross_val_score(model, train_x1, train_y,
                                    cv = 5, n_jobs = -1, scoring = 'neg_mean_squared_error'))

xgbc_study = optuna.create_study(direction = 'minimize')
xgbc_study.optimize(objective, n_trials = 10)

xgbc_study.best_params
optuna_rfc_mse = xgbc_study.best_value

model.set_params(**xgbc_study.best_params)
model.fit(train_x1, train_y)

这是我的 Flask (app.py) 代码:-

@app.route('/', methods = ['GET', 'POST'])
def main():
    if request.method == 'GET':
       return render_template('index.html')

    if request.method == "POST":
       AGE= request.form['age']
       JOBTYPE= request.form['JobType']
       EDUCATIONTYPE= request.form['EdType']
       MARITALSTATUS= request.form['maritalstatus']
       OCCUPATION= request.form['occupation']
       RELATIONSHIP= request.form['relationship']
       GENDER= request.form['gender']
       CAPITALGAIN= request.form['capitalgain']
       CAPITALLOSS= request.form['capitalloss']
       HOURSPERWEEK= request.form['hoursperweek']
    
       data = [[AGE, JOBTYPE, EDUCATIONTYPE, MARITALSTATUS, OCCUPATION, RELATIONSHIP, 
             GENDER, CAPITALGAIN, CAPITALLOSS, HOURSPERWEEK]]
    
       input_variables = pd.DataFrame(data, columns = ['age', 'JobType', 'EdType', 
                                                       'maritalstatus', 'occupation', 
                                                       'relationship', 'gender', 
                                                       'capitalgain', 'capitalloss', 
                                                       'hrsperweek'], 
                                                       dtype = 'float', index = ['input'])
    
       predictions = model.predict(input_variables)[0]
       print(predictions)
    
       return render_template('index.html', original_input = {'age':AGE, 'JobType':JOBTYPE, 
                                                              'EdType':EDUCATIONTYPE,
                                                           'maritalstatus':MARITALSTATUS, 
                                                           'occupation':OCCUPATION, 
                                                           'relationship':RELATIONSHIP, 
                                                           'gender':GENDER, 
                                                           'capitalgain':CAPITALGAIN,
                                                           'capitalloss':CAPITALLOSS, 
                                                           'hrsperweek':HOURSPERWEEK},
                                                            result = predictions)

我的 index.html 代码:-

<form action="{{ url_for('main') }}" method="POST">
    
    <div class="form_group">
    
        <legend>Input Variables</legend>
        
        <br>age<br>
        <input name="age" type="number" step="any" min="0" class="form 
        control" required>
        <br>
        <-- AND SO ON ALL THE INPUT ARE ADDED -->

        <br>
        <input type="submit" value="Submit" class="btn btn-primary">
        
    </div>
    
</form>
<br>

<div class="result" align="center">
    {% if result %} {% for variable, value in original_input.items() %}
    <b>{{ variable }}</b> : {{ value }} {% endfor %}
    <br>
    <br>
    <h1>Predicted Salary:</h1>
    <p style="font-size:50px">${{ result }}</p>
    {% endif %}
</div>

当我使用 Flask 部署它时,给出网页上每个字段的值,它没有给我预测的输出。相反,它只是刷新输出区域为空白,如红色圆圈所示。我必须添加图像,因为没有其他方式来描述!在此处输入图像描述

提前致谢!

2个回答

您已将请求数据直接传递到模型中。

我们必须做所需的预处理部分,即 OHE 和 Scaling 等

如果它在过去有效,那一定是因为所有特征的数据都是纯浮点数

为了进行预处理,我们必须有训练阶段的编码器和统计数据。

下面是一个展示应有步骤的玩具示例。

# save the model/encoders to disk/DB after training
pickle.dump(model, open(model.sav, 'wb'))
pickle.dump(ohe, open('ohe.sav', 'wb'))

# Load model in the Flask runtime [ Only once ]
model = pickle.load(open('model.sav', 'rb')) # Loaded the Model
ohe_test = pickle.load(open('ohe.sav', 'rb')) # Loaded the OHE
# Get x_mean, x_median, x_std from Database/Files [Where it was saved ]

##Post method for Predict
@app.route('/predict',methods=['POST'])
def predict_():

    # Get request param
    jsonData = flask.request.get_json(force=True)
    data = pd.Series(jsonData)

    #Pre-processing
    data.fillna(x_median,inplace=True) # Fill NA
    data  = (data - x_mean)/x_std # Scale
    data  = ohe_test.transform(data) # OHE

    # Make prediction
    pred = model.predict([data])
  
    # Prepare output
    res = {'pred':pred}

    return flask.Response(response=json.dumps(res), status=200, mimetype='application/json')

笔记 -

  • 处理unknown categories是一项单独的任务。
  • 意图不是指导模型服务。有专门的工具/框架来构建管道

有几种方法可以修复您的代码。

一种选择是编写包含特征工程代码的自定义函数。然后在训练 ( model.fit) 和预测 ( model.predict) 之前调用函数。

另一种选择是使用旨在在训练和预测期间应用适当转换的框架,例如 scikit-learn 的Pipeline