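As a quick answer, you can represent each student as a vector with $K$ elements (where $K$ is the number of topics) and values in $\{+1, 0, -1\}$, denoting a positive/absent/negative opinion on the topic.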
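Then a simple measure of agreement between two students is the sum of the element-wise products of their two vectors, i.e. the dot product:
$$\text{similarity} = \sum_{i=1}^{K} st_1[i] \cdot st_2[i],$$
where $st_1, st_2$ are the student vectors. Only topics on which the two students agree increase the sum [e.g. $1 \cdot 1 = 1$ and $(-1) \cdot (-1) = 1$], whereas a disagreement decreases it. If either of the two students has no opinion on a topic, that topic contributes nothing to the sum.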
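To make the arithmetic concrete, here is a minimal sketch of that measure in plain Python. The student names and their opinions are made up for illustration, and `similarity` is just a hypothetical helper name.

```python
def similarity(st1, st2):
    """Sum of the element-wise products (dot product) of two opinion vectors."""
    if len(st1) != len(st2):
        raise ValueError("both vectors must cover the same K topics")
    return sum(a * b for a, b in zip(st1, st2))

# Hypothetical opinions on K = 4 topics: +1 positive, 0 absent, -1 negative
alice = [1, -1, 0, 1]
bob = [1, -1, 1, -1]
carol = [-1, 1, 0, -1]

print(similarity(alice, bob))    # 1 + 1 + 0 - 1 = 1
print(similarity(alice, carol))  # -1 - 1 + 0 - 1 = -3
```

Note that the third topic, on which alice has no opinion, adds nothing in either comparison.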
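In this sense, you can find the student most similar to a specific one as the student with the highest similarity. If what you actually need is several agreeing students for each student, you can set a threshold on similarity, whose value can be determined empirically from your data.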
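The threshold variant could look roughly like the sketch below. The function name `agreeing_students`, the toy vectors and the threshold value of 2 are all assumptions for illustration; a sensible threshold depends on your data.

```python
import numpy as np

def agreeing_students(student_vectors, wanted_id, threshold):
    """Return row indices of students whose similarity to wanted_id is >= threshold."""
    vectors = np.asarray(student_vectors)
    # Dot product of the wanted student with every student (including itself)
    scores = vectors.dot(vectors[wanted_id])
    mask = scores >= threshold
    mask[wanted_id] = False  # a student should not be matched with him/herself
    return np.flatnonzero(mask)

# Hypothetical opinions of 4 students on 4 topics
vectors = [
    [1, -1, 1, -1],
    [1, -1, 1, 0],
    [-1, 1, -1, 1],
    [1, -1, 0, -1],
]
matches = agreeing_students(vectors, wanted_id=0, threshold=2)
print(matches)
```

With these toy vectors, students 1 and 3 each score 3 against student 0, while student 2 disagrees on every topic and scores -4.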
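This is fairly easy to implement; if you are comfortable with coding, I can post an example script in Python. One thing to consider, though, is the format your bipartite graph is stored in (.csv, some kind of graph file, etc.).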
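If the graph happens to be stored as an edge list, one possible way to get the student-by-topic matrix is a pivot in pandas. This is only a sketch: the column names `student`, `topic` and `opinion` and the rows below are hypothetical, and it assumes at most one edge per (student, topic) pair.

```python
import io
import pandas as pd

# Hypothetical edge list: one row per (student, topic) edge, opinion as edge weight
edge_csv = io.StringIO(
    "student,topic,opinion\n"
    "s1,Vaccination,1\n"
    "s1,Obamacare,-1\n"
    "s2,Vaccination,1\n"
    "s3,Obamacare,1\n"
)
edges = pd.read_csv(edge_csv)

# One row per student, one column per topic; a missing edge means no opinion (0)
student_matrix = (
    edges.pivot(index="student", columns="topic", values="opinion")
    .fillna(0)
    .astype(int)
)
print(student_matrix)
```

Each row of `student_matrix` is then a vector with values in {+1, 0, -1}, as described above.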
EDIT: A small example. Get the example .csv file used from here.
import pandas as pd
import numpy as np
# Change location of file according to your needs
df = pd.read_csv('students_example.csv')
# Print for visualization
print(df.head())
print("~"*25)
# Delete column containing the student_id
del df['Student_ID']
# Convert the pandas DataFrame to a NumPy array
# (as_matrix() was removed in pandas 1.0; to_numpy() replaces it)
student_vectors = df.to_numpy()
# The number of students at hand, let it be N.
N_students = student_vectors.shape[0]
# Initialize empty matrix of similarity between students
# Its size will be NxN (each student with each other)
similarity_scores = np.zeros((N_students, N_students))
# Iterate over each student vector and calculate the
# similarity with all students
for i, student in enumerate(student_vectors):
    # Reshaping and transposing to get the dot product between each student
    # and all the student vectors
    similarity_scores[i, :] = np.dot(student.reshape(1, -1), student_vectors.T)
# Fill the diagonal (that is the similarity of each student with him/herself)
# with low similarity scores so as not to confuse them with other possibly
# agreeing students
np.fill_diagonal(similarity_scores, -1000)
# Random wanted student for example purposes
wanted_id = 3
# Print Students Opinion
print("Wanted Students Opinion:")
print(df.loc[wanted_id].to_string())
print("~"*25)
# Rank all students from most to least similar to the wanted one
# (the IDs printed below are row positions in the .csv)
ranking = np.argsort(similarity_scores[wanted_id, :])[::-1]
print("Most similar:(Student ID = %d)" % ranking[0])
print(df.loc[ranking[0]].to_string())
print("~"*25)
print("Second most similar:(Student ID = %d)" % ranking[1])
print(df.loc[ranking[1]].to_string())
print("~"*25)
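If you follow the example, the output for the wanted student (with studentID=3), whose opinions are {Trump -1, Net Neutrality -1, Vaccination 1, Obamacare -1}, will give you the two other students with the same opinions, along with their IDs.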
You can modify the script accordingly to fit your needs.
PS: Sorry for the messy code, it was written in a hurry. Also, I tried it with Python 2.7.