The computation power needed to train machine learning and deep learning models on large data sets has always been a huge hindrance for machine learning enthusiasts. But with Jupyter notebooks that run on the cloud, anyone with the passion to learn can train models and come up with great results.
In this post, I will be covering the various services that give us the computation power to train models.
Google Colaboratory
Colaboratory is a Google research project created to help disseminate machine learning education and research. Colaboratory (Colab) provides a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. It comes pre-installed with most of the machine learning libraries, so it acts as a perfect place where you can plug and play and try out stuff where dependencies and compute are not an issue.
The notebooks are connected to your Google Drive, so you can access them any time you want, and you can also upload notebooks from or download them to GitHub.
Enabling the GPU and TPU
First, you'll need to enable GPU or TPU for the notebook.
Navigate to Edit -> Notebook Settings, and select TPU from the Hardware Accelerator drop-down.
Code to check whether the TPU is enabled:
import os
import pprint
import tensorflow as tf

if 'COLAB_TPU_ADDR' not in os.environ:
  print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
else:
  tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print('TPU address is', tpu_address)

  with tf.Session(tpu_address) as session:
    devices = session.list_devices()

  print('TPU devices:')
  pprint.pprint(devices)
Colab comes with most machine learning libraries installed, but you can also easily add libraries that are not pre-installed.
Colab supports both the pip and apt package managers.
!pip install torch
!apt-get install graphviz -y
Both commands work in Colab; don't forget the ! (exclamation mark) before the command.
There are many ways to upload datasets to the notebook:
Upload files from the local machine.
Upload files from Google Drive.
Directly upload datasets from Kaggle (a sketch follows the Google Drive example below).
Code to upload from the local machine:
from google.colab import files
uploaded = files.upload()
You can then browse and select the file.
Upload files from Google Drive
The PyDrive library is used to upload and download files from Google Drive.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
You can train and run fashion_mnist online without any dependency issues here.
Colab is a great tool for everyone who is interested in machine learning; all the educational resources and code snippets needed to use Colab are provided on the official website itself, along with notebook examples.
Kaggle Kernels
Kaggle Kernels is a cloud computational environment that enables reproducible and collaborative analysis.
One can run both Python and R code in a Kaggle kernel.
Kaggle Kernels run in a remote computational environment; Kaggle provides the hardware needed.
At the time of writing, each kernel editing session is provided with the following resources:
4 CPU cores
17 Gigabytes of RAM
6 hours execution time
5 Gigabytes of auto-saved disk space (/kaggle/working)
16 Gigabytes of temporary, scratchpad disk space (outside /kaggle/working)
When a GPU is enabled, the session instead gets:
2 CPU cores
14 Gigabytes of RAM
Kernels in action
Once we create an account at kaggle.com, we can choose a dataset to play with and spin up a new kernel with just a few clicks.
Click on Create New Kernel.
You will have a Jupyter notebook up and running. At the bottom, you will have a console you can use, and on the right side you will have various options, described below.
When you Commit & Run a kernel, you execute the kernel from top to bottom in a separate session from your interactive session. Once it finishes, you will have generated a new kernel version. A kernel version is a snapshot of your work including your compiled code, log files, output files, data sources, and more. The latest kernel version of your kernel is what is shown to users in the kernel viewer.
When you create a kernel for a dataset, the dataset will be preloaded into the notebook under the input directory.
You can also click on Add Data Source to add other datasets.
Sharing: you can keep your kernel private, or you can make it public so that others can learn from it.
Adding GPU: You can add a single NVIDIA Tesla K80 to your kernel. One of the major benefits of using Kernels, as opposed to a local machine or your own VM, is that the Kernels environment is already pre-configured with GPU-ready software and packages, which can be time-consuming and frustrating to set up yourself. To add a GPU, navigate to the "Settings" pane from the kernel editor and click the "Enable GPU" option.
Custom packages: the kernel comes with the default packages; if you need any other package, you can easily add it in one of the following ways:
Just enter the library name, and Kaggle will download it for you.
Enter the GitHub username/repo name.
Both methods work fine for adding custom packages.
Kaggle acts as a perfect platform, both for providing data and for providing the compute to work with that data. It also hosts various competitions you can experiment with to improve your skill set.
For more resources regarding Kaggle, follow the link here. If you are new to Kaggle, you should definitely try the Titanic dataset; it comes with awesome tutorials.
Since I was not able to cover all the services for training ML models online in this post, there will be a part 2 to this post.
All the resources needed to learn and practice machine learning are open sourced and available online. Compute, datasets, algorithms, and various high-quality tutorials are all there for free; all you need is an internet connection and the passion to learn.
Thank you for reading until the end. I hope this article is useful, as it addresses a major problem faced by people starting down the path of machine learning and data science.
Machine learning has the potential to transform the world, and so do you.