How do I run a Python script on GCP Compute Engine?

Data mining, machine learning, Python

2022-02-24 04:14:40

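I want to run some machine learning algorithms, such as PCA and KNN, on a relatively large image dataset (>2000 RGB images) in order to classify these images.

My source code is as follows: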

import cv2
import numpy as np
import os
from glob import glob
from sklearn.decomposition import PCA
from sklearn import neighbors
from sklearn import preprocessing


data = []

# Read images from file
for filename in glob('Images/*.jpg'):

    img = cv2.imread(filename)

    # cv2.imread returns None when a file cannot be read; skip such files
    if img is None:
        continue
    height, width = img.shape[:2]

    # Check that all my images are of the same resolution
    if height == 529 and width == 940:

        # Flatten each image so that it is stored in one row
        data.append(img.reshape(-1))

# Normalise data
data = np.array(data)
Norm = preprocessing.Normalizer()
Norm.fit(data)
data = Norm.transform(data)

# PCA model
pca = PCA(0.95)
pca.fit(data)
data = pca.transform(data)

# K-Nearest neighbours
knn = neighbors.NearestNeighbors(n_neighbors=4, algorithm='ball_tree', metric='minkowski').fit(data)
distances, indices = knn.kneighbors(data)

print(indices)

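However, my laptop is not powerful enough for this task: it takes many hours to process just over 700 RGB images. So I need to use the computing resources of an online platform (such as those offered by GCP).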
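A rough size estimate helps explain why the laptop struggles and gives a starting point for choosing a VM with enough memory. The sketch below uses only the figures stated in the question (2000 images, 529 x 940 pixels, 3 colour channels). The helper name dataset_bytes is hypothetical, and the 8-byte element size is an assumption that the flattened matrix ends up as float64 after normalisation.

```python
def dataset_bytes(n_images, height, width, channels=3, itemsize=8):
    """Size in bytes of a dense (n_images, height*width*channels) matrix."""
    features_per_image = height * width * channels
    return n_images * features_per_image * itemsize


# Figures from the question: 2000 images at 529 x 940, 3 colour channels.
as_uint8 = dataset_bytes(2000, 529, 940, itemsize=1)    # raw pixel bytes
as_float64 = dataset_bytes(2000, 529, 940, itemsize=8)  # assumed dtype after scaling

print(f"uint8:   {as_uint8 / 1e9:.2f} GB")
print(f"float64: {as_float64 / 1e9:.2f} GB")
```

Under these assumptions the raw pixels take roughly 3 GB and a float64 copy roughly 24 GB, before PCA allocates any working memory of its own.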

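Can I simply call the Compute Engine API from PyCharm (after creating a virtual machine in it) to run my Python script?

Or is it possible to install PyCharm on the virtual machine and run the Python script there, or to write my source code in a Docker container?

In short, how can I simply run a Python script on GCP Compute Engine without wasting time on unnecessary things?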

2 Answers

First, you need to install the Cloud SDK: https://cloud.google.com/sdk/downloads#apt-get

Then the easiest way is to run your script from a terminal (on a Mac; I believe these instructions also work on Linux):

Configure your project: gcloud config set project insert_your_project_name
Set up SSH keys: gcloud compute config-ssh
Connect to the VM: gcloud beta compute ssh vm_name --internal-ip
Run the script: python your_script.py
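The steps above can also be scripted from the local machine. The following is a minimal sketch, not the answer's exact procedure: it adds a copy step with gcloud compute scp (the script has to be on the VM before it can run there) and uses gcloud compute ssh with --command instead of an interactive session. The function names build_commands and run_on_vm are hypothetical, the project, VM and script names are the placeholders from the list, and the optional zone argument is an assumption. It only prints the commands unless dry_run is set to False.

```python
import os
import shlex
import subprocess


def build_commands(project, vm_name, script, zone=None):
    """Return gcloud invocations as argument lists; nothing is executed here."""
    zone_args = ["--zone", zone] if zone else []
    remote_name = os.path.basename(script)
    return [
        # Step 1: select the project.
        ["gcloud", "config", "set", "project", project],
        # Extra step (assumption): copy the script to the VM's home directory.
        ["gcloud", "compute", "scp", script, f"{vm_name}:~/", *zone_args],
        # Connect and run in one go: execute the script over SSH.
        ["gcloud", "compute", "ssh", vm_name, *zone_args,
         "--command", f"python {shlex.quote(remote_name)}"],
    ]


def run_on_vm(project, vm_name, script, zone=None, dry_run=True):
    """Print the commands, and execute them only when dry_run is False."""
    for argv in build_commands(project, vm_name, script, zone):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises if gcloud exits non-zero


# Placeholder names; dry_run=True only prints the commands.
run_on_vm("insert_your_project_name", "vm_name", "your_script.py")
```

Building the commands as argument lists rather than one shell string avoids quoting problems when a path contains spaces.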

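You can also connect PyCharm directly to GCP and run everything on your VM, but you need PyCharm Pro; otherwise the deployment options are not available. Let me know if this works.

Also, if you would rather set up your project interactively, use this in step 1 instead: gcloud init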

Another option is to set up a Jupyter notebook on GCP. You can run Jupyter Notebook in the background with the following command.

nohup jupyter notebook --ip=0.0.0.0 &

Now you can SSH into the GCP VM and set up a tunnel:

ssh username@<public_ip> -L 8888:127.0.0.1:8888

You should now be able to access the Jupyter notebook from your local machine using the following URL in a browser:

127.0.0.1:8888
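If the page does not load, it can help to check whether anything is listening on the local end of the tunnel. The snippet below is a small sketch using only the Python standard library; the helper name port_is_open is hypothetical, and the host and port are the ones from the ssh -L command above.

```python
import socket


def port_is_open(host="127.0.0.1", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if port_is_open():
    print("Tunnel is up: open http://127.0.0.1:8888 in a browser")
else:
    print("Nothing is listening on 127.0.0.1:8888; check the ssh -L command")
```

A successful connection only shows that the local port is forwarded; it does not confirm that Jupyter itself is running on the VM.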
