Resampling an image takes a long time

Data mining, Python, image classification

2022-02-17 22:46:59

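I am working with medical images, specifically CT scans. I have one function that reads such images and another that resamples them; the code for both is shown below: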

import time

import numpy as np
import scipy.ndimage
import SimpleITK as sitk

def load_itk_image(filename):
    # Read the image, including its metadata, from the given path
    itkimage = sitk.ReadImage(filename)
    # Convert the voxel data to a NumPy array; the axes come out as (z, y, x)
    numpyImage = sitk.GetArrayFromImage(itkimage)
    # ITK reports origin and spacing as (x, y, z); reverse them to (z, y, x)
    # so that they line up with the array axes
    numpyOrigin = np.array(list(reversed(itkimage.GetOrigin())))
    numpySpacing = np.array(list(reversed(itkimage.GetSpacing())))

    return numpyImage, numpyOrigin, numpySpacing

def resample(image, oldspacing, newspacing):
    start_time = time.time()
    # Ratio of old to new voxel spacing along each axis
    resize_factor = np.array(oldspacing).astype(float) / np.array(newspacing).astype(float)
    # The target shape must be a whole number of voxels, so round it and
    # recompute the zoom factor that actually produces that shape
    new_real_shape = image.shape * resize_factor
    new_shape = np.round(new_real_shape)
    real_resize_factor = new_shape / image.shape
    image = scipy.ndimage.zoom(image, real_resize_factor, mode="nearest")
    print("%s time takes in seconds" % (time.time() - start_time))
    return np.array(image)

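My problem is that the resample function takes a long time: resampling a single 2D image takes about 30 seconds. Why does it take so long, and is there any way to reduce the resampling time?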

1 answer

My money is on the interpolation function being the bottleneck, but here are a few other ideas:

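1. Will resize_factor be different for every image? If not, you can precompute it once for all images of a given size and spacing and use it directly, instead of recomputing it for each image.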

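2. Another aspect (which may also answer some of your other related data science questions) is memory usage, which can cost time if the arrays are large enough. The array divisions you perform return a new array, which means a new block of memory. If it turns out (following the profiling idea below) that these division operations are expensive, you could try preallocating an array to hold all the results (assuming you know the relevant shapes) and then filling it with the output.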

3. scipy.ndimage.zoom already returns a NumPy ndarray, so you don't need to call np.array on it again in the return statement of the resample function.
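One way to check the hunch about interpolation is to time the zoom call on its own at several spline orders. The sketch below is only illustrative: the volume is synthetic random data, the spacings are made-up values, and time_zoom is a hypothetical helper, not part of the original question. It relies on the order parameter of scipy.ndimage.zoom, which defaults to 3 (cubic spline); order 0 is nearest-neighbour and order 1 is linear. The zoom factor is computed once, outside the timed call, as suggested in point 1.

```python
import time

import numpy as np
from scipy import ndimage


def time_zoom(image, factor, order):
    """Hypothetical helper: run ndimage.zoom once and return (result, seconds)."""
    start = time.perf_counter()
    out = ndimage.zoom(image, factor, order=order, mode="nearest")
    return out, time.perf_counter() - start


# Synthetic stand-in for a CT volume with axes (z, y, x); not real data.
rng = np.random.default_rng(0)
volume = rng.random((40, 128, 128)).astype(np.float32)

# Assumed spacings in mm, ordered (z, y, x); purely illustrative values.
old_spacing = np.array([2.5, 0.7, 0.7])
new_spacing = np.array([1.0, 1.0, 1.0])

# Compute the zoom factor once; it can be reused for every volume that
# shares this shape and spacing.
new_shape = np.round(volume.shape * old_spacing / new_spacing)
factor = new_shape / volume.shape

results = {}
for order in (0, 1, 3):
    out, elapsed = time_zoom(volume, factor, order)
    results[order] = (out, elapsed)
    print(f"order={order}: shape={out.shape}, {elapsed:.3f} s")
```

If the higher orders dominate the runtime on real data, a lower order may be an acceptable trade-off, but whether the loss in interpolation quality is tolerable depends on the downstream task. The returned value is already an ndarray, so no extra np.array wrapper is needed.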

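I would suggest profiling your script. That will show you which functions are called most often and which calls take the most time, so you can focus on the changes that will have the biggest impact on runtime. If you are new to profiling in Python, this is a good example tutorial.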

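You can use cProfile, a module built into Python. It is very easy to run from the terminal (provided your script is designed to be run that way) with the following command: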

python -m cProfile -o profiling_results.prof your_script.py

In words: python runs the module (-m) cProfile on your_script.py and writes the results to the output file (-o) profiling_results.prof.

The general form of this command:

python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py)

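The output file is a binary statistics dump rather than plain text, so read it with the standard-library pstats module instead of a text editor. The .prof suffix is a common convention, and you can pass the file to a tool such as SnakeViz, which lets you visualise the results and explore the call tree of your script, that is, which functions call which.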
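The same kind of .prof file can also be produced and inspected from within Python using cProfile and pstats. The following is a minimal sketch: busy_work is a hypothetical placeholder for the code you actually want to profile (for example, a call to resample), and the file is written to a temporary directory only to keep the example self-contained.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical placeholder for the code being profiled."""
    return sum(i * i for i in range(n))


# Collect profiling data for one call.
profiler = cProfile.Profile()
profiler.enable()
busy_work(100_000)
profiler.disable()

# Save the statistics in the same binary format that
# "python -m cProfile -o profiling_results.prof" produces.
prof_path = os.path.join(tempfile.mkdtemp(), "profiling_results.prof")
profiler.dump_stats(prof_path)

# Load the file and print the five entries with the highest cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(prof_path, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time tends to surface the high-level call that dominates the run; sorting by "tottime" instead highlights the functions where the time is actually spent.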

There is a notebook on Kaggle that you may find interesting, as it performs a similar task on the same LUNA16 dataset.
