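Why am I getting a tuple as output instead of a vector of 1s and 0s?
You get this because OneHotEncoder() uses a sparse matrix representation by default. As a result, each row of the encoded y is returned as an object of this type: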
<1x3 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
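If you want the output as a plain vector, pass sparse=False to OneHotEncoder(). (In scikit-learn 1.2 and later this parameter is named sparse_output; the old name sparse was removed in 1.4.)
Here is an example: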
from sklearn import datasets
from sklearn.preprocessing import OneHotEncoder
# Iris dataset
X, y = datasets.load_iris(return_X_y=True)
print("Shape of dataset - ",X.shape, y.shape)
# Your code
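def OneHot(y):
    # sparse_output=False returns a dense NumPy array
    # (on scikit-learn versions before 1.2, use sparse=False instead)
    ohe = OneHotEncoder(sparse_output=False)
    y = y.reshape(len(y), 1)  # you can also use y = y.reshape(-1, 1) instead
    y_hot = ohe.fit_transform(y)
    return y_hot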
y_oh = OneHot(y)
print("Shape of One Hot Encoded y - ",y_oh.shape)
print("Single element in y - ",y_oh[0])
The code produces the following output:
Shape of dataset - (150, 4) (150,)
Shape of One Hot Encoded y - (150, 3)
Single element in y - [1. 0. 0.]
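
If you would rather keep the encoder at its default settings, another option is to convert the sparse result to a dense array afterwards with .toarray(). The following is only a minimal sketch: the small label array is a made-up stand-in for y and is not part of the Iris example above.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical label vector standing in for y (three classes: 0, 1, 2)
y = np.array([0, 1, 2, 1, 0])

ohe = OneHotEncoder()                           # default settings: sparse output
y_sparse = ohe.fit_transform(y.reshape(-1, 1))  # SciPy sparse matrix of shape (5, 3)
y_dense = y_sparse.toarray()                    # dense NumPy array of 0s and 1s

print("Shape of dense y - ", y_dense.shape)
print("Single element in y - ", y_dense[0])
```

This avoids depending on the name of the sparse/sparse_output parameter, at the cost of one extra conversion step. For large label sets the sparse form uses less memory, so it can be worth converting only the rows you need to inspect.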