Data mining - How can I count, for each person in a network, how many others agree with them? - 吾爱随笔录

How can I count, for each person in a network, how many others agree with their opinions?

Data mining, social network analysis

2022-02-18 16:34:11

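I have a signed bipartite graph in which the nodes are (1) students and (2) topics. An edge is drawn between a student node and a topic node if the student stated an opinion on that topic in a short answer (some students have an opinion on one topic but not on another). The sign (valence) of the edge indicates whether the opinion is positive or negative.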

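My question is: how can I find out how many other students agree with a given student? Not only on which topics they have opinions about, but also on what those opinions are (positive/negative).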

Edit: based on the comments below

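1) What exactly does "agree" mean? Should all existing opinions match, or only the opinion on a specific topic? What if one student has an opinion on an additional topic?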

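All existing opinions (both topic and valence) should match. If a student gives the same opinions on the same topics as another student but also happens to discuss an extra topic, that would not count as full agreement. Perhaps there is also a way to compute partial agreement?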

2) What exactly is your problem? Defining the feature is simple: count. Are you perhaps struggling with the algorithmic implementation?

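The algorithmic implementation is probably what I'm after. Since I have 100 students, counting by hand how many peers agree with each of them is difficult, so a method that computes one value per student would be helpful.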
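A minimal sketch of both notions, assuming each student's opinions are stored as a dict mapping topic to +1 or -1 (topics without an opinion are simply absent). Full agreement follows the definition above: identical topics and identical valences. The partial-agreement score is only one possible choice, a Jaccard-style ratio that is not defined in the question: topics on which both students hold the same opinion, divided by all topics either student mentioned. The student names and topics are made up for illustration.

```python
from typing import Dict

Opinions = Dict[str, int]  # topic -> +1 (positive) or -1 (negative)

def full_agreement_counts(students: Dict[str, Opinions]) -> Dict[str, int]:
    """For each student, count the OTHER students whose opinions are
    identical (same topics and same valences)."""
    # Group students by an order-independent, hashable form of their opinions
    groups: Dict[frozenset, int] = {}
    for opinions in students.values():
        key = frozenset(opinions.items())
        groups[key] = groups.get(key, 0) + 1
    # Subtract 1 so that a student is not counted as agreeing with themselves
    return {name: groups[frozenset(op.items())] - 1 for name, op in students.items()}

def partial_agreement(a: Opinions, b: Opinions) -> float:
    """Hypothetical Jaccard-style score: topics where both hold the same
    opinion, divided by all topics mentioned by either student."""
    mentioned = set(a) | set(b)
    if not mentioned:
        return 0.0
    same = sum(1 for t in set(a) & set(b) if a[t] == b[t])
    return same / len(mentioned)

students = {
    "s1": {"Trump": -1, "Vaccination": 1},
    "s2": {"Trump": -1, "Vaccination": 1},
    "s3": {"Trump": -1, "Vaccination": 1, "Obamacare": -1},  # one extra topic
    "s4": {"Trump": 1},
}
print(full_agreement_counts(students))  # {'s1': 1, 's2': 1, 's3': 0, 's4': 0}
print(partial_agreement(students["s1"], students["s3"]))  # 0.6666666666666666
```

Grouping by a frozenset keeps the full-agreement count linear in the number of students, so 100 students is trivial; the partial score has to be computed per pair.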

1 Answer

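As a quick answer, you can represent each student as a vector with $K$ elements (where $K$ is the number of topics) taking values in $\{+1, 0, -1\}$, representing a positive/absent/negative opinion on each topic.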
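A sketch of this encoding in plain Python, assuming a fixed ordering of the $K$ topics shared by all vectors (the topic list is only an example, borrowed from the sample output of the script below) and opinions stored as a dict from topic to +1/-1. The names `TOPICS` and `to_vector` are illustrative.

```python
from typing import Dict, List

# Hypothetical topic order; index i of every vector refers to TOPICS[i]
TOPICS: List[str] = ["Trump", "Net Neutrality", "Vaccination", "Obamacare"]

def to_vector(opinions: Dict[str, int], topics: List[str] = TOPICS) -> List[int]:
    """Encode a student's opinions as a K-element vector:
    +1 = positive, -1 = negative, 0 = no opinion on that topic."""
    return [opinions.get(topic, 0) for topic in topics]

print(to_vector({"Vaccination": 1, "Trump": -1}))  # [-1, 0, 1, 0]
```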

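A simple measure of agreement between two students is then the sum of the element-wise products of their vectors, i.e. their dot product: $similarity = \sum_{i=1}^{K} st_1[i] \cdot st_2[i]$, where $st_1, st_2$ are the student vectors. Only topics on which the two students agree increase the sum (e.g. $1 \cdot 1 = 1$ and $(-1) \cdot (-1) = 1$), while disagreements decrease it ($1 \cdot (-1) = -1$). If either student expressed no opinion on a topic, that topic contributes $0$ to the sum.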
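The formula written out directly, as a small sketch with a worked example (the vectors are made up; the topic order is assumed to be the same for both students):

```python
from typing import Sequence

def similarity(st_1: Sequence[int], st_2: Sequence[int]) -> int:
    """Dot product of two opinion vectors with entries in {+1, 0, -1}:
    +1 per topic where both agree, -1 per topic where they disagree,
    0 where at least one student has no opinion."""
    if len(st_1) != len(st_2):
        raise ValueError("vectors must cover the same K topics")
    return sum(a * b for a, b in zip(st_1, st_2))

# Topics: [Trump, Net Neutrality, Vaccination, Obamacare]
st_1 = [-1, -1, 1, -1]
st_2 = [-1, 0, 1, 1]
print(similarity(st_1, st_2))  # 1  (agree on 2 topics, disagree on 1, 1 topic ignored)
```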

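In this sense, the student most similar to a given student is the one with the highest $similarity$. If what you actually need is the number of agreeing students for each student, you can set a threshold on $similarity$; its value can be determined empirically from your data.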
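A sketch of the thresholding step for all students at once. The threshold used below is arbitrary and only for illustration; as said above, it should be chosen from your data. `count_agreeing` is a hypothetical helper name.

```python
from typing import List

def similarity(a: List[int], b: List[int]) -> int:
    # Dot product of two {+1, 0, -1} opinion vectors
    return sum(x * y for x, y in zip(a, b))

def count_agreeing(vectors: List[List[int]], threshold: int) -> List[int]:
    """For each student i, count the other students j != i whose
    similarity with i is at least `threshold`."""
    counts = []
    for i, v in enumerate(vectors):
        counts.append(sum(1 for j, w in enumerate(vectors)
                          if j != i and similarity(v, w) >= threshold))
    return counts

vectors = [
    [-1, -1, 1, -1],
    [-1, -1, 1, -1],
    [-1, 0, 1, -1],
    [1, 1, -1, 1],
]
print(count_agreeing(vectors, threshold=3))  # [2, 2, 2, 0]
```

This compares every pair, so it costs on the order of $N^2 K$ operations, which is negligible for 100 students.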

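This is easy to implement; if you are comfortable with coding, I can post an example script in Python. One thing to consider, though, is the format your bipartite graph is stored in (.csv, some graph file format, etc.).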
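Because the question starts from a signed bipartite graph, here is a sketch of turning its edge list into the student vectors. It assumes the edges are available as (student, topic, sign) triples, e.g. one row per edge in a .csv file with hypothetical columns `student,topic,sign`; the question does not say how the graph is actually stored.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, int]  # (student, topic, sign), sign in {+1, -1}

def edges_to_vectors(edges: Iterable[Edge]) -> Tuple[List[str], List[str], Dict[str, List[int]]]:
    """Build one K-element {+1, 0, -1} vector per student from signed edges."""
    edges = list(edges)
    students = sorted({s for s, _, _ in edges})
    topics = sorted({t for _, t, _ in edges})
    col = {t: k for k, t in enumerate(topics)}
    vectors = {s: [0] * len(topics) for s in students}
    for student, topic, sign in edges:
        vectors[student][col[topic]] = sign  # topics without an edge stay 0
    return students, topics, vectors

# Hypothetical CSV content with one signed edge per row
data = io.StringIO("student,topic,sign\nA,Trump,-1\nA,Vaccination,1\nB,Trump,1\n")
edges = [(row["student"], row["topic"], int(row["sign"])) for row in csv.DictReader(data)]
students, topics, vectors = edges_to_vectors(edges)
print(topics)   # ['Trump', 'Vaccination']
print(vectors)  # {'A': [-1, 1], 'B': [1, 0]}
```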

Edit: a small example. The example .csv file used can be obtained here.

import pandas as pd
import numpy as np

# Change the location of the file according to your needs
df = pd.read_csv('students_example.csv')
# Print for visualization
print(df.head())
print("~" * 25)

# Drop the column containing the student ID; the remaining columns
# hold one opinion (+1 / 0 / -1) per topic
opinions = df.drop(columns=['Student_ID'])
# Convert the DataFrame to an N x K matrix (N students, K topics).
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it.
student_vectors = opinions.to_numpy(dtype=float)
# N x N matrix of pairwise similarities: entry (i, j) is the dot
# product of student i's vector with student j's vector
similarity_scores = np.dot(student_vectors, student_vectors.T)
# Fill the diagonal (each student's similarity with him/herself) with -inf
# so that no student is reported as agreeing with themselves
np.fill_diagonal(similarity_scores, -np.inf)

# Row position of the student of interest (example value).
# Note: the "Student ID" printed below is the 0-based row position.
wanted_id = 3
# Row positions of all students, ordered from most to least similar
ranking = np.argsort(similarity_scores[wanted_id, :])[::-1]

# Print the student's opinions and those of the two most similar students
print("Wanted student's opinions:")
print(opinions.iloc[wanted_id].to_string())
print("~" * 25)
print("Most similar: (Student ID = %d)" % ranking[0])
print(opinions.iloc[ranking[0]].to_string())
print("~" * 25)
print("Second most similar: (Student ID = %d)" % ranking[1])
print(opinions.iloc[ranking[1]].to_string())
print("~" * 25)

If you follow the example, the student of interest ($student_{ID}=3$) has the opinions {Trump -1, Net Neutrality -1, Vaccination 1, Obamacare -1}.

The script then gives you the two other students with the same opinions, together with their IDs.
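To connect the dot-product score back to the stricter definition in the question (same topics and same valences): $similarity$ is at most the number of topics both students mentioned, which is at most each student's own number of opinions. Equality on both sides therefore holds exactly when the two vectors are identical, so full agreement can be tested as $similarity = n_1 = n_2$, where $n_1, n_2$ are the students' numbers of nonzero entries. A small sketch of that check (helper names are illustrative; vectors are assumed to have the same length $K$):

```python
from typing import List

def n_opinions(v: List[int]) -> int:
    # Number of topics the student expressed an opinion on (nonzero entries)
    return sum(1 for x in v if x != 0)

def fully_agree(a: List[int], b: List[int]) -> bool:
    """True iff a and b have the same topics and the same valences,
    tested via the dot-product similarity."""
    sim = sum(x * y for x, y in zip(a, b))
    return sim == n_opinions(a) == n_opinions(b)

wanted = [-1, -1, 1, -1]  # Trump, Net Neutrality, Vaccination, Obamacare
print(fully_agree(wanted, [-1, -1, 1, -1]))  # True
print(fully_agree(wanted, [-1, -1, 1, 0]))   # False: one opinion missing
```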

You can modify the script to suit your needs.

P.S.: Sorry for the messy code; I wrote it in a hurry. Also, I tried it with Python 2.7.
