NVIDIA DIGITS 3 on EC2

So you have heard a lot about Deep Learning and Convolutional Neural Networks, and you want to quickly try them out. But before you dive into the theory you want to get your hands dirty. And you don’t want to write a line of code. You also want to monitor the progress of your training process from your smartphone. All I can say is that I respect your laziness! Let’s get started.

If you would rather use an Amazon Machine Image (AMI) that has DIGITS preinstalled than follow a step-by-step tutorial on installing DIGITS on Amazon EC2, you can read my follow-up article titled “Deep Learning Example using NVIDIA DIGITS 3 on EC2”.

In this post we will learn how to set up a Deep Learning framework ( NVIDIA DIGITS + Caffe / Torch ) on an Amazon EC2 instance. This setup will enable you to schedule training tasks, monitor progress, and visualize results using a web interface.

What is NVIDIA DIGITS ?

DIGITS stands for Deep Learning GPU Training System. It is a web / browser based graphical user interface that allows you to prepare data, set training parameters, choose from some popular neural net architectures (or use your own) and train a deep neural net. It is a perfect tool to get started if you know very little about Deep Learning. Under the hood DIGITS uses Caffe — the popular open source deep learning framework. Support for Torch — a deep learning framework backed by Facebook — is in beta, but you can try it out.

GPUs on EC2

One big obstacle in immediately starting with Deep Learning is access to a good GPU. You may not have an NVIDIA card on your laptop and even if you do it may not be very powerful. Sometimes training a deep neural net takes hours and it makes no sense to use your primary computer for the task.

Without a GPU, deep learning is painfully slow. In fact, one of the contributions of the 2012 paper that firmly established Deep Learning as the undisputed king of image classification algorithms was its clever use of two GPUs.

Fortunately we live in amazing times. We have access to near infinite compute power at our fingertips. All you need to do is register for Amazon Web Services ( AWS ).

https://aws.amazon.com/

This will give you access to Amazon’s Elastic Compute Cloud (EC2) and its virtually unlimited compute resources ( for a price of course ). The web interface allows you to start a virtual server called an “instance”. We are interested in the two GPU enabled instance types that have the following specifications.

Model	GPUs	vCPU	Mem (GiB)	SSD Storage (GB)
g2.2xlarge	1	8	15	1 x 60
g2.8xlarge	4	32	60	2 x 120

In this tutorial we are going to use g2.2xlarge because it is less expensive ( $0.6 / hour ) and is sufficient for this tutorial. g2.8xlarge comes with 4 GPUs and you can use them all in parallel if you are using DIGITS with Caffe.
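
Because the instance is billed by the hour, it helps to estimate the cost of a session before launching it. The snippet below is a minimal sketch: `estimate_cost` is a hypothetical helper, and its default rate is simply the $0.6 / hour figure quoted above, which may differ from current AWS pricing.

```shell
# Rough cost estimate for an hourly-billed instance.
# Usage: estimate_cost HOURS [RATE_PER_HOUR]
# The default rate is the g2.2xlarge price quoted in this article (an assumption
# about your region and the current price list).
estimate_cost() {
    hours="$1"
    rate="${2:-0.6}"
    awk -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", h * r }'
}

# Ten hours of training on g2.2xlarge at $0.6 / hour.
estimate_cost 10    # prints 6.00
```

Pass a second argument to try a different hourly rate, for example the price of g2.8xlarge in your region.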

Install NVIDIA DIGITS using Amazon Web Services

I am going to assume that you have created an account on Amazon AWS and are logged in. Follow the steps below to set up an EC2 GPU instance. If you are already familiar with the process skip to the next section.

Set up EC2 GPU Instance

Go to EC2 Management Console : On AWS Management Console click on EC2. This will bring you to EC2 Management Console.
Launch instance : On EC2 Management Console go to Instances and click on Launch Instance button
Choose Operating System : From the list of Operating Systems choose Ubuntu 14.04. Then click Next.
Choose instance type : From the list of instance types choose g2.2xlarge. Then click on the Configure Instance Details button at the bottom of the page.
Configure instance details : Make sure the number of instances is one. Pick a Subnet. It does not matter which one you pick, but if you later decide to attach a Volume ( storage space ) to your instance you will need to know the Subnet. Click the Next button.
Add storage : I recommend you add 50GB at least. Click Next.
Note : This storage is NOT permanent. You will lose all data when you terminate your EC2 instance. If you are doing serious work, you should add an EC2 Volume to your instance.
Tag Instance : Pick a name — any name is fine. Then click Next.
Configure security group : Pick the “Create a new security group” option and give your security group a descriptive name. We want two ways to access the server. First, we want to be able to log on to the machine via ssh. Second, we want to open port 80 to run DIGITS web server. Notice these two services are available from my IP address only. You may choose other custom IP. I do not recommend you make it accessible from any IP address.
Review & launch
Download Key : You need a key pair to ssh into this machine. Create a new key if you don’t have one. Choose a descriptive name. The downloaded file will have a .pem extension.
Verify instance : To verify your instance is running, go to the EC2 Management Console, and then click on “Instances”. Copy the public IP address to your clipboard.
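
The security group step above is done in the web console, but the same two rules (ssh on port 22 and the DIGITS web server on port 80, both limited to one IP address) can also be expressed with the AWS command line tool. The sketch below only prints the commands so you can review them first. It assumes the AWS CLI is installed and configured; `print_ingress_rules` is a hypothetical helper, and the group ID and IP address are placeholders.

```shell
# Print (do not run) the AWS CLI commands that open ports 22 and 80
# to a single IP address. Usage: print_ingress_rules GROUP_ID MY_IP
print_ingress_rules() {
    group_id="$1"
    my_ip="$2"
    for port in 22 80; do
        # /32 restricts the rule to exactly one IPv4 address.
        echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port $port --cidr $my_ip/32"
    done
}

# Placeholder values; substitute your own security group ID and public IP.
print_ingress_rules sg-0123456789abcdef0 203.0.113.5
```

Printing the commands instead of running them keeps the example safe to try, and makes it easy to check that the rules are not open to every IP address.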

Install NVIDIA DIGITS on EC2 GPU Instance

We are now ready to install NVIDIA DIGITS on the GPU instance we created in the last step.

SSH into EC2 Instance : Open a terminal ( on OSX or Linux ) or use an ssh client on Windows to log onto the machine. Type the following command with the full path to the .pem file you had downloaded and the public IP address of your machine.

# Change permission of your ssh key file.
chmod 600 your-pemfile.pem
# SSH into machine.
ssh -Y -i your-pemfile.pem ubuntu@your-public-ip.com
If you do not change the permission of your ssh key file you may receive the following warning.

WARNING: UNPROTECTED PRIVATE KEY FILE!
Permissions 0644 for 'yourpem.pem' are too open.
It is recommended that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: yourpem.pem
Permission denied (publickey).
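
To avoid running into this warning, you can check the key file's mode before connecting. The following is a small sketch, not part of the original steps: `secure_key` is a hypothetical helper, and it relies on GNU `stat` ( `-c %a` ) as found on Linux. On OSX the `stat` flags differ, so there you would simply run `chmod 600` directly.

```shell
# Make sure a private key file has mode 600 (read/write for the owner only).
# Usage: secure_key PATH_TO_PEM_FILE
secure_key() {
    key="$1"
    # GNU stat prints the octal permission bits, e.g. 644 or 600.
    mode=$(stat -c %a "$key") || return 1
    if [ "$mode" != "600" ]; then
        chmod 600 "$key"
        echo "fixed permissions on $key (was $mode)"
    else
        echo "$key already has mode 600"
    fi
}
```

You would call it as `secure_key your-pemfile.pem` just before the `ssh` command.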

Update and upgrade packages using apt-get : Assuming you were able to log in and are now on the server, run the following.

sudo apt-get update && sudo apt-get -y upgrade

Install linux-image-extra : The base linux kernel package that comes with Ubuntu 14.04 instance on Amazon has some drivers missing. This is done to slim down the size of the linux image. So we need to install the drivers left out of the base package.
```
sudo apt-get install -y linux-image-extra-`uname -r`
```

Install NVIDIA drivers

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-352 nvidia-settings
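
Once the driver packages are installed, a quick sanity check can confirm that the driver tools are on the PATH. This sketch is an addition to the original steps: `require_cmd` is a hypothetical helper, while `nvidia-smi` is the utility that ships with the NVIDIA driver.

```shell
# Return success if the named command is on PATH; otherwise print a hint.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found; the install step may not have completed" >&2
    return 1
}

# If the driver is installed, nvidia-smi lists the GPU and the driver version.
if require_cmd nvidia-smi; then
    nvidia-smi || echo "nvidia-smi is installed but could not talk to the driver" >&2
fi
```

If `nvidia-smi` is present but reports an error, rebooting the instance is a common first thing to try.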

Get CUDA and NVIDIA’s machine learning repos

CUDA_REPO_PKG=cuda-repo-ubuntu1404_7.5-18_amd64.deb &&
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/$CUDA_REPO_PKG &&
sudo dpkg -i $CUDA_REPO_PKG

ML_REPO_PKG=nvidia-machine-learning-repo_4.0-2_amd64.deb &&
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1404/x86_64/$ML_REPO_PKG &&
sudo dpkg -i $ML_REPO_PKG

The machine learning repo above gives access to digits, caffe-nv, torch, and libcudnn4.

Install DIGITS

sudo apt-get update
sudo apt-get install digits
If everything went well, open your public IP address in a browser and you will see the DIGITS home page.

Woohoo! We are all set up! By the way, if you relax your security requirements, you can actually view this page and therefore monitor the progress of your training process from your smartphone!

Getting Started with NVIDIA DIGITS

My follow up article titled “Deep Learning Example using NVIDIA DIGITS 3 on EC2” provides a detailed video tutorial on how to use DIGITS for Image Classification.

The GitHub page for DIGITS provides an example for creating a dataset and training a model. Click here to get started.

NVIDIA DIGITS Configuration FAQ

How can you configure DIGITS to run on a different port ?
You can configure DIGITS to run on a different port using the following command.
```
sudo dpkg-reconfigure digits
```

Where does DIGITS store the datasets and trained models ?
DIGITS stores all data inside /usr/share/digits/digits/jobs.

ls /usr/share/digits/digits/jobs

There are two kinds of jobs directories: 1) Dataset job, which contains information about a dataset created using DIGITS, and 2) Training job, which contains information about a model trained using DIGITS. You can tell that a jobs directory contains a dataset if it contains labels.txt, mean.binaryproto, train_db, train.txt, val_db, val.txt etc. E.g.

# Here 20160208-182427-0f82 is a Dataset job
$ ls -1 /usr/share/digits/digits/jobs/20160208-182427-0f82
create_train_db.log
create_val_db.log
labels.txt
mean.binaryproto
mean.jpg
status.pickle
train_db
train.txt
val_db
val.txt

On the other hand, if the directory contains a trained model, you will see files named deploy.prototxt, solver.prototxt, train_val.prototxt, snapshot_iter_*.caffemodel etc. E.g.

# Here 20160209-011941-7953 is a Training job
$ ls -1 /usr/share/digits/digits/jobs/20160209-011941-7953
caffe_output.log
deploy.prototxt
snapshot_iter_104.caffemodel
.
.
snapshot_iter_960.caffemodel
snapshot_iter_960.solverstate
solver.prototxt
status.pickle
train_val.prototxt
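
The rule of thumb above can be turned into a small script that labels each jobs directory. This is only a sketch: `job_kind` is a hypothetical helper, and it checks just a couple of the marker files listed above, assuming a Caffe-based training job.

```shell
# Guess whether a DIGITS jobs directory holds a dataset or a trained model,
# based on the marker files described above.
job_kind() {
    dir="$1"
    if [ -e "$dir/train_db" ] && [ -f "$dir/labels.txt" ]; then
        echo "dataset"
    elif [ -f "$dir/solver.prototxt" ] && [ -f "$dir/deploy.prototxt" ]; then
        echo "training"
    else
        echo "unknown"
    fi
}

# Label every job under the default jobs directory (path from this article).
for job in /usr/share/digits/digits/jobs/*/; do
    if [ -d "$job" ]; then
        echo "$job: $(job_kind "$job")"
    fi
done
```

On a machine without DIGITS the loop prints nothing, because the jobs directory does not exist.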

How to start / stop / restart DIGITS server ?

# stop the server
sudo stop nvidia-digits-server
# start the server ( a stop followed by a start restarts it )
sudo start nvidia-digits-server

How to change the default jobs directory in NVIDIA DIGITS ?

As mentioned above, by default DIGITS stores all data inside /usr/share/digits/digits/jobs/ . You probably want a different location for your data. For example, you may want all the DIGITS jobs to be stored on an attached volume. You can do so using the following commands.

cd /usr/share/digits
# set new config
sudo python -m digits.config.edit -v
# restart server
sudo stop nvidia-digits-server
sudo start nvidia-digits-server

NOTE: The new jobs directory you choose should be writable by www-data.

sudo chown -R www-data path_to_new_jobs_dir
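
After running chown, you may want to confirm that the ownership change took effect before restarting the server. The sketch below uses a hypothetical helper, `owned_by`, and relies on GNU `stat` ( `-c %U` ), which is available on the Ubuntu instance used in this tutorial. It checks the owner only, which is a simpler test than verifying write access.

```shell
# Succeed if DIR is owned by USER. Usage: owned_by DIR USER
owned_by() {
    dir="$1"
    user="$2"
    # GNU stat prints the owner's user name with -c %U.
    [ "$(stat -c %U "$dir")" = "$user" ]
}
```

For example, `owned_by path_to_new_jobs_dir www-data && echo "ok"` prints ok only when the directory belongs to www-data.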

How to change configurations in NVIDIA DIGITS ?
The following commands allow you to change all configurations in DIGITS. The configurations include the jobs directory, the GPUs to use, the log file location, the log level, the server name, the location of the Caffe installation and the location of the Torch installation.

cd /usr/share/digits
# set new config
sudo python -m digits.config.edit -v
# restart server
sudo stop nvidia-digits-server
sudo start nvidia-digits-server