First, some silly sanity-check questions: do you actually have a GPU in your local machine? (You didn't mention it explicitly.) I ask because this won't work on the integrated Intel graphics found in some laptops.
Second, you installed Keras and Tensorflow, but did you install the GPU-enabled version of Tensorflow? Using Anaconda, that would be done with the command:
conda install -c anaconda tensorflow-gpu
Other useful things to know:
- What operating system are you using? (I'm assuming Linux, e.g. Ubuntu)
- Which GPU do you expect to be available? (See the snippet right after this list for how to check what Tensorflow itself can see.)
- Can you run the command nvidia-smi in a terminal and update your question with its output?
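For a check from within Python: the sketch below uses the Tensorflow 1.x API (the same API the test script further down assumes) and simply asks Tensorflow which devices it can see.
from tensorflow.python.client import device_lib
import tensorflow as tf

# Print every device Tensorflow has registered (CPUs and, if detected, GPUs)
print(device_lib.list_local_devices())

# Convenience check: True only if a CUDA-capable GPU is detected and usable
print(tf.test.is_gpu_available())
If no GPU shows up here, the problem lies with the installation or drivers rather than with your Keras code.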
If you have installed the correct packages (the above is one of several possible ways of doing it), and if you have an Nvidia GPU available, Tensorflow will usually reserve all of the GPU's free memory by default when it starts up and builds its static graph.
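If that greedy allocation is a problem (for example because the GPU is shared), you can tell Tensorflow to grow its memory usage on demand instead; a minimal sketch using the 1.x Session API:
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory only as needed
# config.gpu_options.per_process_gpu_memory_fraction = 0.4  # or cap usage at e.g. 40%

session = tf.Session(config=config)
If you are using Keras on top of Tensorflow, you can hand this session to it via keras.backend.set_session(session) before building your model.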
If you haven't already, it is probably a good idea to use a conda environment, which keeps your model's requirements separate from whatever your system may already have installed. Have a look at this nice little walkthrough on getting started - it is also a good test of whether your system can run models on the GPU at all, because it removes all the other possible sources of problems that are unrelated to your script. In short, create and activate a new environment that includes the GPU version of Tensorflow, like this:
conda create --name gpu_test tensorflow-gpu # creates the env and installs tf
conda activate gpu_test # activate the env
python test_gpu_script.py # run the script given below
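Once the environment is active, a quick sanity check (again assuming the 1.x API) that the CUDA-enabled build really was installed:
python -c "import tensorflow as tf; print(tf.test.is_built_with_cuda())"  # should print True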
Update
I'd suggest running a small script that performs a few operations in Tensorflow, once on the CPU and once on the GPU. That rules out the possibility that the RNN you are trying to train simply doesn't have enough GPU memory.
I wrote a script for this, so it runs the same operations on the CPU and the GPU and prints a summary. You should be able to copy-paste the code and run it:
import tensorflow as tf
from datetime import datetime

# Choose which devices you want to test on: either 'cpu' or 'gpu'
devices = ['cpu', 'gpu']

# Choose the size of the matrix to be used.
# Make it bigger to see bigger benefits of parallel computation
shapes = [(50, 50), (100, 100), (500, 500), (1000, 1000)]


def compute_operations(device, shape):
    """Run a simple set of operations on a matrix of the given shape on the given device

    Parameters
    ----------
    device : the type of device to use, either 'cpu' or 'gpu'
    shape : a tuple for the shape of a 2d tensor, e.g. (10, 10)

    Returns
    -------
    result : the result of the computation (the sum of the matrix product)
    elapsed_time : the time the operations took, as a datetime.timedelta
    """

    # Define the operations to be computed on the selected device
    with tf.device(device):
        random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
        dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
        sum_operation = tf.reduce_sum(dot_operation)

    # Time the actual runtime of the operations
    start_time = datetime.now()
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
        result = session.run(sum_operation)
    elapsed_time = datetime.now() - start_time

    return result, elapsed_time


if __name__ == '__main__':

    # Run the computations and print a summary of each run
    for device in devices:
        print("--" * 20)

        for shape in shapes:
            _, time_taken = compute_operations(device, shape)

            # Print the time taken on the selected device
            print("Computation on shape:", shape, "using Device:", device,
                  "took: {:.2f}s".format(time_taken.seconds + time_taken.microseconds / 1e6))

    print("--" * 20)
Results from running on the CPU:
Computation on shape: (50, 50), using Device: 'cpu' took: 0.04s
Computation on shape: (500, 500), using Device: 'cpu' took: 0.05s
Computation on shape: (1000, 1000), using Device: 'cpu' took: 0.09s
Computation on shape: (10000, 10000), using Device: 'cpu' took: 32.81s
Results from running on the GPU:
Computation on shape: (50, 50), using Device: 'gpu' took: 0.03s
Computation on shape: (500, 500), using Device: 'gpu' took: 0.04s
Computation on shape: (1000, 1000), using Device: 'gpu' took: 0.04s
Computation on shape: (10000, 10000), using Device: 'gpu' took: 0.05s
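Finally, if you want to time your actual Keras/RNN training on the CPU for comparison, you can hide the GPU from Tensorflow altogether by setting CUDA_VISIBLE_DEVICES before Tensorflow initialises CUDA; a small sketch:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs; must be set before Tensorflow touches CUDA

import tensorflow as tf
# ... build and train your model as usual; Tensorflow will now only see the CPU
Running the same training once with and once without this gives a direct wall-clock comparison, just like the script above does for simple matrix operations.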