How to generate variation in a dataset

artificial-intelligence deep-learning
2021-11-09 08:59:52

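I built a deep learning model to detect which drug a user has taken. I have a number of symptoms and the duration of effect for each drug. I created the X and y data, but the effect of LSD, for example, lasts 180 - 720 minutes. Do I really need to create 540 arrays? I could really use some help.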

My LSD array:

[28, 180],
[28, 720],
[29, 180],
[29, 720],
[30, 180],
[30, 720],
[31, 180],
[31, 720],
[32, 180],
[32, 720],
[33, 180],
[33, 720],
[34, 180],
[34, 720],
[35, 180],
[35, 720],
[36, 180],
[36, 720],
[37, 180],
[37, 720],
[1, 180],
[1, 720],
[38, 180],
[38, 720],
[12, 180],
[12, 720],
[9, 180],
[9, 720],
[24, 180],
[24, 720],
[17, 180],
[17, 720],
[7, 180],
[7, 720],
[4, 180],
[4, 720],

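The first position holds the symptom and the second position holds the duration. I simply duplicated each symptom and set it once to the minimum duration and once to the maximum duration. But this gives me a perfect model. I know I need to add every minute for each symptom, but how can I do that in Python?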

Symptom list

0 - relaxation
1 - euphoria
2 - impaired short-term memory
3 - dry mouth
4 - impaired motor skills
5 - red eyes
6 - mood
7 - increased heart rate
8 - increased appetite
9 - impaired concentration
10 - feeling of power
11 - absence of fear
12 - anxiety
13 - aggressiveness
14 - excitement
15 - loss of appetite
16 - tremors
17 - pupil dilation
18 - numb teeth
19 - insomnia
20 - uncontrolled movements
21 - jaw spasms
22 - headache
23 - blurred vision
24 - nausea
25 - dehydration
26 - periods of depression
27 - total memory loss
28 - illusions
29 - hallucinations
30 - heightened sensory sensitivity
31 - mystical experiences
32 - flashbacks
33 - paranoia
34 - loss of sense of time and space
35 - confusion
36 - loss of emotional control
37 - feeling of well-being
38 - panic
39 - drowsiness
40 - slowed heart rate
41 - respiratory failure
42 - dejection
43 - loss of interest in family/professional life
44 - feeling of being in paradise
45 - malaise
46 - inability to feel pleasure
47 - inability to feel pain

**Effect duration (in minutes)**

Cannabis. 120 - 240
Cocaine. 30 - 40
Ecstasy. 240 - 480
LSD. 180 - 720
Heroin. 45 - 60

My full code:

X = [
    #cannabis
    [0, 120],
    [0, 240],
    [1, 120],
    [1, 240],
    [2, 120],
    [2, 240],
    [3, 120],
    [3, 240],
    [4, 120],
    [4, 240],
    [5, 120],
    [5, 240],
    [6, 120],
    [6, 240],
    [7, 120],
    [7, 240],
    [8, 120],
    [8, 240],
    [9, 120],
    [9, 240],
    #cocaine
    [1, 30],
    [1, 40],
    [10, 30],
    [10, 40],
    [11, 30],
    [11, 40],
    [12, 30],
    [12, 40],
    [13, 30],
    [13, 40],
    [14, 30],
    [14, 40],
    [15, 30],
    [15, 40],
    [7, 30],
    [7, 40],
    [16, 30],
    [16, 40],
    [17, 30],
    [17, 40],
    [18, 30],
    [18, 40],
    #ecstasy
    [19, 240],
    [19, 480],
    [20, 240],
    [20, 480],
    [21, 240],
    [21, 480],
    [22, 240],
    [22, 480],
    [23, 240],
    [23, 480],
    [24, 240],
    [24, 480],
    [25, 240],
    [25, 480],
    [26, 240],
    [26, 480],
    [27, 240],
    [27, 480],
    [15, 240],
    [15, 480],
    #LSD
    [28, 180],
    [28, 720],
    [29, 180],
    [29, 720],
    [30, 180],
    [30, 720],
    [31, 180],
    [31, 720],
    [32, 180],
    [32, 720],
    [33, 180],
    [33, 720],
    [34, 180],
    [34, 720],
    [35, 180],
    [35, 720],
    [36, 180],
    [36, 720],
    [37, 180],
    [37, 720],
    [1, 180],
    [1, 720],
    [38, 180],
    [38, 720],
    [12, 180],
    [12, 720],
    [9, 180],
    [9, 720],
    [24, 180],
    [24, 720],
    [17, 180],
    [17, 720],
    [7, 180],
    [7, 720],
    [4, 180],
    [4, 720],
    # Heroin
    [39, 45],
    [39, 60],
    [29, 45],
    [29, 60],
    [40, 45],
    [40, 60],
    [41, 45],
    [41, 60],
    [42, 45],
    [42, 60],
    [43, 45],
    [43, 60],
    [44, 45],
    [44, 60],
    [12, 45],
    [12, 60],
    [45, 45],
    [45, 60],
    [46, 45],
    [46, 60],
    [1, 45],
    [1, 60],
    [13, 45],
    [13, 60],
    [24, 45],
    [24, 60],
]
"""
    # DRUGS

    0 - Cannabis
    1 - Cocaine
    2 - Ecstasy
    3 - LSD
    4 - Heroin
"""
y = [ 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5)

from sklearn import tree
my_classifier = tree.DecisionTreeClassifier()

my_classifier.fit(X_train, y_train)

predictions = my_classifier.predict(X_test)

print(predictions)

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, predictions))

Sorry for my bad English :( Thanks

1 Answer

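I have a number of symptoms and the duration of effect for each drug. I created the X and y data, but the effect of LSD, for example, lasts 180 - 720 minutes. Do I really need to create 540 arrays?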

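You can (in this particular case, generating about 800 rows in a CSV once is fairly easy), but you don't have to: you can apply data augmentation on the fly. This adds some randomness to your training, but it usually helps generalization.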
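Both options can be sketched in a few lines of Python. This is only a rough illustration, not a definitive implementation: the names `DURATIONS`, `SYMPTOMS`, `expand_full` and `sample_batch` are made up for this example, the symptom ids and duration ranges are copied from the question, and drawing durations uniformly within each range is an assumption (real effect durations are probably not uniformly distributed).

```python
import numpy as np

# Duration ranges in minutes, keyed by drug label (copied from the question).
DURATIONS = {
    0: (120, 240),  # Cannabis
    1: (30, 40),    # Cocaine
    2: (240, 480),  # Ecstasy
    3: (180, 720),  # LSD
    4: (45, 60),    # Heroin
}

# Symptom ids per drug label (copied from the X list in the question).
SYMPTOMS = {
    0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    1: [1, 10, 11, 12, 13, 14, 15, 7, 16, 17, 18],
    2: [19, 20, 21, 22, 23, 24, 25, 26, 27, 15],
    3: [28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 1, 38, 12, 9, 24, 17, 7, 4],
    4: [39, 29, 40, 41, 42, 43, 44, 12, 45, 46, 1, 13, 24],
}


def expand_full():
    """Option 1: one row per (symptom, minute) pair, every minute included."""
    X, y = [], []
    for drug, symptoms in SYMPTOMS.items():
        low, high = DURATIONS[drug]
        for symptom in symptoms:
            for minute in range(low, high + 1):  # +1 keeps the upper bound
                X.append([symptom, minute])
                y.append(drug)
    return np.array(X), np.array(y)


def sample_batch(n_per_drug, rng):
    """Option 2: draw random (symptom, duration) rows on the fly.

    Assumption: durations are sampled uniformly within each drug's range.
    """
    X, y = [], []
    for drug, symptoms in SYMPTOMS.items():
        low, high = DURATIONS[drug]
        sampled_symptoms = rng.choice(symptoms, size=n_per_drug)
        # Generator.integers excludes the upper bound by default, hence +1.
        sampled_minutes = rng.integers(low, high + 1, size=n_per_drug)
        X.append(np.column_stack([sampled_symptoms, sampled_minutes]))
        y.append(np.full(n_per_drug, drug))
    return np.vstack(X), np.concatenate(y)


rng = np.random.default_rng(0)
X_full, y_full = expand_full()
X_aug, y_aug = sample_batch(200, rng)
print(X_full.shape, X_aug.shape)
```

One thing to keep in mind with the full expansion: drugs with a wide range (LSD) end up with far more rows than drugs with a narrow one (cocaine), so the classes become unbalanced. Sampling a fixed number of rows per drug avoids that.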

By the way, it looks like you are not actually using deep learning but a DecisionTreeClassifier, which is a bit different.