I want to run some machine learning algorithms, such as PCA and KNN, on a relatively large dataset of images (>2000 RGB images) in order to classify them.
My source code is the following:
```python
import cv2
import numpy as np
from glob import glob
from sklearn.decomposition import PCA
from sklearn import neighbors
from sklearn import preprocessing

data = []

# Read images from file
for filename in glob('Images/*.jpg'):
    img = cv2.imread(filename)
    height, width = img.shape[:2]

    # Check that all my images are of the same resolution
    if height == 529 and width == 940:
        # Reshape each image so that it is stored in one line
        img = np.concatenate(img, axis=0)
        img = np.concatenate(img, axis=0)
        data.append(img)

# Normalise data
data = np.array(data)
Norm = preprocessing.Normalizer()
Norm.fit(data)
data = Norm.transform(data)

# PCA model
pca = PCA(0.95)
pca.fit(data)
data = pca.transform(data)

# K-Nearest neighbours
knn = neighbors.NearestNeighbors(n_neighbors=4, algorithm='ball_tree',
                                 metric='minkowski').fit(data)
distances, indices = knn.kneighbors(data)
print(indices)
```
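As an aside, I noticed that the two successive `np.concatenate` calls are equivalent to a single C-order reshape, which avoids copying each image twice. A minimal sketch of the equivalence, using a random array in place of a real image:

```python
import numpy as np

# Stand-in for a loaded image at the expected resolution
img = np.random.randint(0, 256, size=(529, 940, 3), dtype=np.uint8)

# Two concatenations along axis 0 flatten the 3-D array step by step...
flat_concat = np.concatenate(np.concatenate(img, axis=0), axis=0)

# ...which matches a single C-order reshape (no intermediate copies)
flat_reshape = img.reshape(-1)

print(np.array_equal(flat_concat, flat_reshape))  # True
```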
However, my laptop is not sufficient for this task: it needs many hours to process more than 700 RGB images. So I need to use the computational resources of a cloud platform (e.g. those provided by GCP).
Can I simply make a call from PyCharm to the Compute Engine API (after I have created a virtual machine) to run my Python script?
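To show what I have in mind: rather than calling the API programmatically, I suspect the usual workflow is to create a VM with the gcloud CLI, copy the script and data over, and run it via SSH. Something like the following sketch (the instance name, zone, machine type, and script name are all placeholders I made up):

```shell
# Create a VM (name, zone and machine type are placeholders)
gcloud compute instances create ml-worker \
    --zone=europe-west1-b \
    --machine-type=n1-standard-8

# Copy the script and the images to the VM
gcloud compute scp --recurse ./Images ml-worker:~/Images --zone=europe-west1-b
gcloud compute scp ./classify.py ml-worker:~/ --zone=europe-west1-b

# Install dependencies and run the script remotely over SSH
gcloud compute ssh ml-worker --zone=europe-west1-b \
    --command="pip3 install --user numpy opencv-python-headless scikit-learn && python3 classify.py"

# Delete the VM when finished to stop billing
gcloud compute instances delete ml-worker --zone=europe-west1-b --quiet
```

Is this roughly the intended approach, or is there a simpler path?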
Or is it possible either to install PyCharm on the virtual machine and run the Python script there, or to package my source code in a Docker container?
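If the Docker route is sensible, I imagine the container definition would look something like this sketch (the base image, package choices, and file names are my assumptions, not a tested setup):

```dockerfile
FROM python:3.9-slim

# opencv-python-headless avoids needing GUI system libraries in a slim image
RUN pip install --no-cache-dir numpy opencv-python-headless scikit-learn

WORKDIR /app
COPY classify.py .
COPY Images/ ./Images/

CMD ["python", "classify.py"]
```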
In short, how can I simply run a Python script on GCP Compute Engine without wasting time on needless steps?