Cloud computing
Note
Learning Goals
- Know where to access free Cloud computing resources for ML research
- Understand pros and cons of various free Cloud computing cyberinfrastructure options
Machine learning workflows often require significant computational resources. Toy problems and demos can be constructed to run on typical workstations and laptops, but many workflows, such as model training, quickly hit bottlenecks in data management or GPU availability when trying to obtain results in a reasonable amount of time.
Here we provide an overview of several options for researchers to utilize cloud computing services for hackweek projects. We focus on pre-configured services that offer Jupyter servers for connecting to and running code on remote machines.
We limit discussion to 3 major commercial cloud providers: Microsoft Azure, Amazon Web Services (AWS), and Google Cloud. You can consider “cloud computing” simply as renting computers from these 3 companies!
Warning
This is a fast-evolving space, and services and tech specs change rapidly! To the best of our knowledge, this information is correct as of September 2023.
Data-proximate computing
ML workflows often require huge volumes of training data. Rather than requiring you to download and store that data yourself, Cloud providers often host large public archives that you can read directly.
Note
You will see better performance and reduced costs if you make sure that your computation runs in the same Cloud (and ideally the same data center region) where your data is stored.
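For example, here is a minimal sketch of opening a dataset directly from a public S3 bucket instead of downloading it first. The bucket name and file path are hypothetical placeholders, and the sketch assumes the s3fs and xarray packages (plus a netCDF backend such as h5netcdf) are installed:

```python
import s3fs
import xarray as xr

# Anonymous (no-credential) access to a public S3 bucket.
# NOTE: the bucket name and path below are hypothetical placeholders.
fs = s3fs.S3FileSystem(anon=True)

# Stream the file over the network instead of downloading a local copy.
# Running this on a machine in the same region (e.g. AWS us-west-2) is
# faster and avoids data egress charges.
with fs.open("s3://example-public-bucket/path/to/data.nc") as f:
    ds = xr.open_dataset(f)
    print(ds)
```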
Geoscience community-supported cyberinfrastructure
All participants of GeoSmart Hackweek have access to a computing environment provided by the CryoCloud project. CryoCloud operates a JupyterHub in the AWS us-west-2 data center (where NASA stores many public remote sensing datasets). We encourage you to use CryoCloud, but we also list other options below; a short snippet for checking the specs of whichever machine you end up on follows the table:
| Service | Max vCPU | Max RAM (GB) | Storage (GB) | Datacenter |
|---|---|---|---|---|
|  | 4 | 32 | 10 | AWS us-west-2 |
|  | 16 | 32 | 10 | GCP us-central-1b |
|  | 8 | 16 | 500 | AWS us-west-2 |
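Once you are logged in to any of these JupyterHubs, you can sanity-check the resources you actually have from Python. This is a minimal sketch assuming the psutil package is available in the environment (it is common on these hubs, but you may need to install it):

```python
import os
import psutil

# Number of virtual CPUs visible to this machine/container
print(f"vCPUs: {os.cpu_count()}")

# Total and currently available RAM, in GB
mem = psutil.virtual_memory()
print(f"RAM: {mem.total / 1e9:.1f} GB total, {mem.available / 1e9:.1f} GB available")

# Disk space available under your home directory, in GB
disk = psutil.disk_usage(os.path.expanduser("~"))
print(f"Storage: {disk.free / 1e9:.1f} GB free of {disk.total / 1e9:.1f} GB")
```

Keep in mind that on a shared JupyterHub these numbers may reflect the underlying node rather than your per-user limits.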
Free GPUs
Many leading machine learning libraries (e.g. tensorflow, pytorch) are designed to take advantage of Graphics Processing Units (GPUs). Typically, using a machine with a GPU on the cloud costs ~$1/hr, but there are some pre-configured services for trying things out for free (usually with a time cap). Keep in mind that free services come with no guarantee of current or future availability. Nevertheless, they are great for experimenting! A quick check that your code can actually see the GPU is shown after the table below.
| Service | vCPU | RAM (GB) | GPU | GPU RAM (GB) | Storage (GB) | Max Session (hr) | Datacenter |
|---|---|---|---|---|---|---|---|
|  | 2 | 12 | T4 | 16 | 40 | 12 | random! |
|  | 4 | 12 | T4 | 16 | 15 | 4 | us-east-2 |
|  | 4 | 32 | T4 | 16 | 150 | 12 | eu-west-2 |
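Before launching a long training run on any of these services, it is worth confirming that your code can actually see the GPU. A minimal sketch, assuming pytorch is installed (as it typically is on these pre-configured environments):

```python
import torch

# Check whether a CUDA-capable GPU is visible to PyTorch
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU RAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    device = torch.device("cpu")
    print("No GPU detected; falling back to CPU")

# Place tensors (and models) on the selected device before computing
x = torch.randn(1000, 1000, device=device)
print((x @ x).sum().item())
```

On the free tiers above you should see a Tesla T4 reported with roughly 16 GB of GPU RAM.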
Free CPUs
If you don’t need a GPU (maybe you are just visualizing results), you can access machines that allow longer sessions. As a rough rule of thumb, you can expect a machine with a single CPU to cost an order of magnitude less (~$0.1/hr). And once again, there are free options to get started:
| Service | Max vCPU | Max RAM (GB) | Storage (GB) | Usage limit (vCPU hr/mo) | Datacenter |
|---|---|---|---|---|---|
|  | 16 | 32 | 15 | 120 | Azure |
|  | 2 | 4 | 10 | n/a | Various |
Guaranteed Access
If your workflow requires resources or time limits exceeding what is offered by the free services listed above, you’ll need your own Cloud account. Configuring Cloud resources and keeping track of costs is non-trivial. Fortunately for researchers, Cloud providers offer generous credit programs:
- Azure: https://www.microsoft.com/en-us/azure-academic-research/
- GCP: https://edu.google.com/intl/ALL_us/programs/credits/research/
Also, the free Cloud platforms typically offer an “enhanced” service for a fee:
- Google Colab Pro: https://colab.research.google.com/signup
- AWS SageMaker (not Studio Lab): https://aws.amazon.com/sagemaker
- Azure ML: https://azure.microsoft.com/en-us/products/machine-learning
- Google Vertex AI: https://cloud.google.com/vertex-ai/docs/workbench/introduction