Loaded model predicts well in Colab but gives the same label and score for every input after downloading

data-mining keras tensorflow rnn sentiment-analysis
2022-03-10 09:36:27

I built a recurrent neural network for sentiment analysis of tweets, using the Kazanova/sentiment140 dataset from Kaggle.

The model is as follows:

import numpy as np
import tensorflow as tf

def scheduler(epoch):
  if epoch < 10:
    return 0.001
  else:
    return 0.001 * tf.math.exp(0.1 * (10 - epoch))
callback1 = tf.keras.callbacks.LearningRateScheduler(scheduler)
callback2 = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',patience=10, verbose=0, mode='auto',min_delta=0.0001, cooldown=0, min_lr=0)
callback3 = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, patience=3, verbose=0, mode='auto',baseline=None, restore_best_weights=True)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=max_length, weights=[embeddings_matrix], trainable=False),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv1D(64, 5, activation='relu'),
    tf.keras.layers.MaxPooling1D(pool_size=4),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()
num_epochs = 50

training_padded = np.array(training_sequences)
training_labels = np.array(training_labels)
testing_padded = np.array(test_sequences)
testing_labels = np.array(test_labels)

history = model.fit(training_padded, training_labels, epochs=num_epochs, validation_data=(testing_padded, testing_labels), verbose=2,callbacks=[callback1,callback2])

print("Training Complete")
model.save('sentiment_final.h5')
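For reference, the scheduler above keeps the learning rate at 0.001 for the first 10 epochs (Keras passes a 0-based epoch index) and then decays it exponentially, multiplying it by e^-0.1 (about 0.905) each further epoch. The sketch below reproduces the same rule with math.exp instead of tf.math.exp, purely so the values can be checked without TensorFlow; it is not part of the original training script.

```python
import math


def scheduler(epoch):
    """Learning rate for a 0-based epoch index, mirroring the Keras scheduler above."""
    if epoch < 10:
        return 0.001
    # From epoch 10 on, the rate shrinks by a factor of exp(-0.1) per epoch
    return 0.001 * math.exp(0.1 * (10 - epoch))


for epoch in (0, 9, 10, 11, 20, 30, 49):
    print(f"epoch {epoch:2d}: lr = {scheduler(epoch):.6f}")
```

With num_epochs = 50, the rate at the last epoch (index 49) is 0.001 * e^-3.9, roughly 2e-5.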

When I load the model in Colab itself, it works well and predicts the outputs correctly.

Loading code in Colab:

load_model= tf.keras.models.load_model('sentiment_final.h5')
#load_model.summary()

def decode_sentiment(score):

    if score < 0.5:
        return "NEGATIVE"
    else:
        return "POSITIVE"

def predict(text):
    
    x_test = pad_sequences(tokenizer.texts_to_sequences([text]), maxlen=16)
    
    score = load_model.predict([x_test])[0]

    return {"label": decode_sentiment(score), "score": float(score)}
predict("I love this day") #Outputs -> {'label': 'POSITIVE', 'score': 0.793081521987915}
predict("I hate this day") #Outputs -> {'label': 'NEGATIVE', 'score': 0.38644927740097046}
predict("I shouldn't be alive") #Outputs -> {'label': 'NEGATIVE', 'score': 0.12737956643104553}

But if I load the model in VSCode, every input produces the same output.

VSCode implementation:

import tensorflow
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import os


tokenizer=Tokenizer()
model = load_model('sentiment_final.h5')

def decode_sentiment(score):

    if score<0.5:
        return "Negative"
    else:
        return "Positive"

def predict_score(text):

    x_test=pad_sequences(tokenizer.texts_to_sequences([text]),maxlen=16)
    score=model.predict([x_test])[0]
    return {"label":decode_sentiment(score),"score": float(score)}

def call_predict_function(text):
    return predict_score(text)

        
print(call_predict_function("I love this day")) #Outputs -> {'label': 'Positive', 'score': 0.793081521987915}
print(call_predict_function("I hate this day")) #Outputs -> {'label': 'Positive', 'score': 0.793081521987915}
print(call_predict_function("I shouldn't be alive")) #Outputs -> {'label': 'Positive', 'score': 0.793081521987915}
 

Where am I going wrong? Can anyone help me fix this?

1 Answer

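As far as I can tell, you also need to save and load the tokenizer you used during training. In your VSCode script, tokenizer=Tokenizer() creates a brand-new tokenizer that has never been fitted, so its vocabulary is empty: texts_to_sequences drops every word, pad_sequences turns the empty result into a row of zeros, and the model receives the same all-zero input for every text. That is why the label and score never change. In Colab it presumably works because the tokenizer object fitted during training is still in memory.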
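A minimal sketch of that fix, assuming a TensorFlow 2.x version that still provides tf.keras.preprocessing.text (as the question's own imports do). Tokenizer.to_json() and tokenizer_from_json() are the Keras helpers for this round trip. The train_texts list and the file name tokenizer.json are hypothetical placeholders: in the real notebook you would call to_json() on the tokenizer fitted on the sentiment140 tweets, right after model.save('sentiment_final.h5'), and copy the JSON file to the VSCode machine together with the .h5 file. The sketch also prints the all-zero input that an unfitted Tokenizer() produces.

```python
# Assumes TF 2.x where tf.keras.preprocessing.text is still available.
from tensorflow.keras.preprocessing.text import Tokenizer, tokenizer_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 16                       # must match the maxlen used in training/prediction
TOKENIZER_PATH = "tokenizer.json"  # hypothetical file name

# --- Colab, after training: persist the fitted tokenizer next to the model ---
# train_texts is a hypothetical stand-in for the tweets the real tokenizer saw.
train_texts = ["I love this day", "I hate this day", "I shouldn't be alive"]
tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_texts)
with open(TOKENIZER_PATH, "w", encoding="utf-8") as f:
    f.write(tokenizer.to_json())

# --- VSCode, instead of tokenizer = Tokenizer(): restore the fitted vocabulary ---
with open(TOKENIZER_PATH, encoding="utf-8") as f:
    loaded_tokenizer = tokenizer_from_json(f.read())


def encode(tok, text):
    """Turn one text into the (1, MAX_LEN) integer array the model expects."""
    return pad_sequences(tok.texts_to_sequences([text]), maxlen=MAX_LEN)


# An unfitted Tokenizer has an empty word_index, so every word is dropped and
# each text becomes the same all-zero row, hence the identical predictions.
print(encode(Tokenizer(), "I love this day"))
print(encode(Tokenizer(), "I hate this day"))

# The restored tokenizer reproduces the training-time encoding.
print(encode(loaded_tokenizer, "I love this day"))
print(encode(loaded_tokenizer, "I hate this day"))
```

In the VSCode script, loaded_tokenizer then takes the place of tokenizer = Tokenizer() inside predict_score; everything else, including maxlen=16, stays as it was during training.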