In general, how do I calculate the GPU memory needed to run a deep learning network?
I am asking because training runs out of memory for some network configurations.
If TensorFlow only stored the memory needed for the trainable parameters, and I have roughly 8 million of them, I would expect the required RAM to be:
RAM = 8,000,000 × 8 bytes (float64) / 1,000,000 (to convert to MB)
RAM = 64 MB, is that right?
Does TensorFlow need more memory to store the images (feature maps) of each layer?
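To make that estimate concrete, below is a rough sketch of how I am computing it (the estimate_memory_mb helper is my own; the 4 bytes per value corresponds to TensorFlow's default float32, and the estimate ignores gradients, optimizer state and workspace buffers, so actual usage will be higher):

import numpy as np

def estimate_memory_mb(model, batch_size=8, bytes_per_value=4):
    # Rough lower bound: parameters plus one copy of every layer's output
    # (activations). Assumes float32 (4 bytes per value) and ignores
    # gradients, optimizer slots and cuDNN workspace buffers.
    param_bytes = model.count_params() * bytes_per_value

    activation_bytes = 0
    for layer in model.layers:
        shapes = layer.output_shape
        if not isinstance(shapes, list):  # some layers (e.g. InputLayer) report a list of shapes
            shapes = [shapes]
        for shape in shapes:
            # Replace the unknown batch dimension (None) with batch_size.
            dims = [batch_size if d is None else d for d in shape]
            activation_bytes += int(np.prod(dims)) * bytes_per_value

    return param_bytes / 1e6, activation_bytes / 1e6

# e.g. param_mb, activation_mb = estimate_memory_mb(model, batch_size=8)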
By the way, these are my GPU specs:
- NVIDIA GeForce 1050, 4 GB
Network topology
- Input shape: (256, 256, 4)
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 256, 256, 4)] 0
__________________________________________________________________________________________________
conv2d (Conv2D) (None, 256, 256, 64) 2368 input_1[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 256, 256, 64) 0 conv2d[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 256, 256, 64) 36928 dropout[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 128, 128, 64) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 128, 128, 128) 73856 max_pooling2d[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 128, 128, 128) 0 conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 128, 128, 128) 147584 dropout_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 64, 64, 128) 0 conv2d_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 64, 64, 256) 295168 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 64, 64, 256) 0 conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 64, 64, 256) 590080 dropout_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 32, 32, 256) 0 conv2d_5[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 32, 32, 512) 1180160 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout) (None, 32, 32, 512) 0 conv2d_6[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 32, 32, 512) 2359808 dropout_3[0][0]
__________________________________________________________________________________________________
conv2d_transpose (Conv2DTranspose) (None, 64, 64, 256) 524544 conv2d_7[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 64, 64, 512) 0 conv2d_transpose[0][0]
conv2d_5[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 64, 64, 256) 1179904 concatenate[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout) (None, 64, 64, 256) 0 conv2d_8[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 64, 64, 256) 590080 dropout_4[0][0]
__________________________________________________________________________________________________
conv2d_transpose_1 (Conv2DTranspose) (None, 128, 128, 128) 131200 conv2d_9[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 128, 128, 256) 0 conv2d_transpose_1[0][0]
conv2d_3[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 128, 128, 128) 295040 concatenate_1[0][0]
__________________________________________________________________________________________________
dropout_5 (Dropout) (None, 128, 128, 128) 0 conv2d_10[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 128, 128, 128) 147584 dropout_5[0][0]
__________________________________________________________________________________________________
conv2d_transpose_2 (Conv2DTranspose) (None, 256, 256, 64) 32832 conv2d_11[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, 256, 256, 128) 0 conv2d_transpose_2[0][0]
conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, 256, 256, 64) 73792 concatenate_2[0][0]
__________________________________________________________________________________________________
dropout_6 (Dropout) (None, 256, 256, 64) 0 conv2d_12[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D) (None, 256, 256, 64) 36928 dropout_6[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D) (None, 256, 256, 1) 65 conv2d_13[0][0]
==================================================================================================
Total params: 7,697,921
Trainable params: 7,697,921
Non-trainable params: 0
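For reference, plugging the actual count into the formula above gives 7,697,921 × 4 bytes ≈ 31 MB with TensorFlow's default float32 (≈ 62 MB with float64), so the parameters alone should easily fit in 4 GB.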
This is the error that is raised:
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-17-d4852b86b8c1> in <module>
23 # Train the model, doing validation at the end of each epoch.
24 epochs = 30
---> 25 result_model = model.fit(train_gen, epochs=epochs, validation_data=val_gen, callbacks=callbacks)
~\Anaconda3\envs\tf23\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
106 def _method_wrapper(self, *args, **kwargs):
107 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
--> 108 return method(self, *args, **kwargs)
109
110 # Running inside `run_distribute_coordinator` already.
~\Anaconda3\envs\tf23\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1096 batch_size=batch_size):
1097 callbacks.on_train_batch_begin(step)
-> 1098 tmp_logs = train_function(iterator)
1099 if data_handler.should_sync:
1100 context.async_wait()
~\Anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
778 else:
779 compiler = "nonXla"
--> 780 result = self._call(*args, **kwds)
781
782 new_tracing_count = self._get_tracing_count()
~\Anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
838 # Lifting succeeded, so variables are initialized and we can run the
839 # stateless function.
--> 840 return self._stateless_fn(*args, **kwds)
841 else:
842 canon_args, canon_kwds = \
~\Anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
2827 with self._lock:
2828 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2829 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
2830
2831 @property
~\Anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\function.py in _filtered_call(self, args, kwargs, cancellation_manager)
1846 resource_variable_ops.BaseResourceVariable))],
1847 captured_inputs=self.captured_inputs,
-> 1848 cancellation_manager=cancellation_manager)
1849
1850 def _call_flat(self, args, captured_inputs, cancellation_manager=None):
~\Anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1922 # No tape is watching; skip to running the function.
1923 return self._build_call_outputs(self._inference_function.call(
-> 1924 ctx, args, cancellation_manager=cancellation_manager))
1925 forward_backward = self._select_forward_and_backward_functions(
1926 args,
~\Anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
548 inputs=args,
549 attrs=attrs,
--> 550 ctx=ctx)
551 else:
552 outputs = execute.execute_with_cancellation(
~\Anaconda3\envs\tf23\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
ResourceExhaustedError: OOM when allocating tensor with shape[8,64,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node gradient_tape/functional_1/conv2d_14/Conv2D/Conv2DBackpropInput (defined at <ipython-input-17-d4852b86b8c1>:25) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_train_function_17207]
Function call stack:
train_function
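For reference, the tensor that fails to allocate, shape [8, 64, 256, 256] in float32, already takes 8 × 64 × 256 × 256 × 4 bytes ≈ 134 MB by itself, and the backward pass needs many tensors of roughly this size, so it seems to be the activations and gradients, not the parameters, that fill the 4 GB.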
Is there any kind of error in the network definition? How could I improve the network to fix this problem?