I have a network with an input size of 100 and an output size of 2; those are the only layers. I applied dropout with a keep_prob of 0.8 and am trying to understand the results.
As expected, the dropout mask has roughly 17-23 zeros on each run; however, almost all of the weights still get updated. According to the paper:
"Forward and backpropagation for that training case are done only on this thinned network."
So I expected that at each training step only about 80 of my weights would change, but in practice all of them change (around 90-95 change at the start, and in the following iterations every single one does).
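To make that expectation concrete, here is a small NumPy sketch of the gradient I have in mind for a single dense layer (this is not the TensorFlow script below; the names and the hand-written gradient are mine, and it assumes a plain gradient step rather than Adam). The rows of dW that correspond to dropped inputs come out exactly zero, which is why I assumed those weights would stay put.

import numpy as np

# Toy single-layer case: output = dropped @ W + b, with some upstream gradient dL_dy.
np.random.seed(0)
input_size, output_size = 100, 2
x = np.random.random((1, input_size))
mask = (np.random.random((1, input_size)) < 0.8).astype(np.float64)  # keep_prob = 0.8
dropped = x * mask / 0.8            # inverted dropout: tf.nn.dropout scales kept values by 1/keep_prob
dL_dy = np.random.random((1, output_size))   # some upstream gradient
dW = dropped.T @ dL_dy              # gradient of the loss w.r.t. W
print("zeroed inputs:  ", int(np.sum(mask == 0)))
print("zero rows in dW:", int(np.sum(np.all(dW == 0, axis=1))))  # matches the zeroed inputs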
I don't know whether this has to do with how dropout is implemented in TensorFlow. Does anyone know why this happens?
Here is the code I am running to check it.
import numpy as np
import tensorflow as tf
# As input, 100 random numbers.
input_size = 100
output_size = 2
x = tf.placeholder(tf.float32,[None, input_size],name="input")
y = tf.placeholder(tf.float32,[None, output_size],name="labels")
with tf.variable_scope("dense1") as scope:
W = tf.get_variable("W",shape=[input_size,output_size],initializer=tf.keras.initializers.he_uniform())
b = tf.get_variable("b",initializer=tf.zeros([output_size]))
dropped = tf.nn.dropout(x,0.8)
dense = tf.matmul(dropped,W)+b
eval_pred = tf.nn.sigmoid(dense,name="prediction")
cost = tf.reduce_mean(tf.losses.absolute_difference(eval_pred,y))
train_step = tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)
# 20 epochs, batch size of 1
epochs = 20
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    allWeights = []
    for i in range(epochs):
        x_raw = np.random.random((1, input_size))
        y_raw = np.random.random((1, output_size))
        [_, c, d, w] = sess.run([train_step, cost, dropped, W],
                                feed_dict={x: x_raw, y: y_raw})
        #print("Epoch {0}/{1}. Loss: {2}".format(i+1, epochs, c))
        # Numbers will be around 20% of input_size (17-22)
        print(np.sum(d == 0))
        allWeights.append(w)
print("Calculate the difference between W_i and W_{i-1}")
for wi in range(1, len(allWeights)):
    difference = allWeights[wi] - allWeights[wi-1]
    # I expect that there will be around 20 weights that won't be updated,
    # so the difference between the current weight and the previous one
    # should be zero.
    print(np.sum(difference == 0))
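If it helps, this is a sketch of a check I could append to the script above to separate the raw gradient from whatever update Adam actually applies (grad_W is just a name I introduced; the snippet reuses cost, W, dropped, epochs and the placeholders defined above, so it is not self-contained on its own):

grad_W = tf.gradients(cost, W)[0]  # raw dCost/dW, before the optimizer touches it

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(epochs):
        x_raw = np.random.random((1, input_size))
        y_raw = np.random.random((1, output_size))
        # train_step, grad_W and dropped are fetched in one run, so they share the same dropout mask.
        _, g, d = sess.run([train_step, grad_W, dropped],
                           feed_dict={x: x_raw, y: y_raw})
        # Rows of the raw gradient that are all zero vs. zeros in the dropped input.
        print(np.sum(np.all(g == 0, axis=1)), np.sum(d == 0))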