NVIDIA DIGITS 3 on EC2

Satya Mallick
February 10, 2016

So you have heard a lot about Deep Learning and Convolutional Neural Networks, and you want to try them out quickly. Before you dive into the theory you want to get your hands dirty, yet you don’t want to write a line of code. You also want to monitor the progress of your training process from your smartphone. All I can say is that I respect your laziness! Let’s get started.

If you would rather use an Amazon Machine Image (AMI) that has DIGITS preinstalled than follow a step-by-step installation on Amazon EC2, you can read my follow-up article titled “Deep Learning Example using NVIDIA DIGITS 3 on EC2”.

In this post we will learn how to set up a Deep Learning framework ( NVIDIA DIGITS + Caffe / Torch ) on an Amazon EC2 instance. This setup will enable you to schedule training tasks, monitor progress, and visualize results using a web interface.

What is NVIDIA DIGITS ?

DIGITS stands for Deep Learning GPU Training System. It is a browser-based graphical user interface that allows you to prepare data, set training parameters, choose from some popular neural net architectures (or use your own) and train a deep neural net. It is a perfect tool to get started if you know very little about Deep Learning. Under the hood DIGITS uses Caffe, the popular open source deep learning framework. Support for Torch, a deep learning framework backed by Facebook, is in beta, but you can try it out.

GPUs on EC2

One big obstacle in immediately starting with Deep Learning is access to a good GPU. You may not have an NVIDIA card on your laptop and even if you do it may not be very powerful. Sometimes training a deep neural net takes hours and it makes no sense to use your primary computer for the task.

Without a GPU, deep learning is painfully slow. In fact, one of the contributions of the 2012 paper that firmly established Deep Learning as the undisputed king of image classification algorithms was its clever use of two GPUs.

Fortunately we live in amazing times. We have access to near infinite compute power at our fingertips. All you need to do is register for Amazon Web Services ( AWS ).

https://aws.amazon.com/

This will give you access to Amazon’s Elastic Compute Cloud (EC2) and its virtually unlimited compute resources ( for a price of course ). The web interface allows you to start a virtual server called an “instance”. We are interested in the two GPU-enabled instance types that have the following specifications.

Model        GPUs   vCPU   Mem (GiB)   SSD Storage (GB)
g2.2xlarge   1      8      15          1 x 60
g2.8xlarge   4      32     60          2 x 120

In this tutorial we are going to use g2.2xlarge because it is less expensive ( $0.6 / hour ) and is sufficient for our purposes. g2.8xlarge comes with 4 GPUs, and you can use them all in parallel if you are using DIGITS with Caffe.
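
To get a feel for what a training session will cost, you can multiply the hourly rate by the number of hours you expect to keep the instance running. The sketch below does that in shell. The function name estimate_cost is mine, the default rate is the $0.6 / hour quoted above, and the second rate is a made-up number used only for illustration, so check current AWS pricing before relying on either.

```shell
#!/bin/sh
# Estimate the cost of keeping an instance running for a number of hours.
# The default rate is the $0.6 / hour figure quoted in this post for
# g2.2xlarge; actual AWS pricing may differ.
estimate_cost() {
    hours="$1"
    rate="${2:-0.6}"
    # awk handles the floating point arithmetic that plain sh lacks.
    awk -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", h * r }'
}

estimate_cost 12        # 12 hours at the default rate
estimate_cost 12 2.5    # 12 hours at a hypothetical rate of $2.5 / hour
```

Remember that you pay for as long as the instance is running, not only while a training job is active.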

Install NVIDIA DIGITS using Amazon Web Services

I am going to assume that you have created an AWS account and are logged in. Follow the steps below to set up an EC2 GPU instance. If you are already familiar with the process, skip to the next section.

Set up EC2 GPU Instance

  1. Go to the EC2 Management Console : On the AWS Management Console click on EC2. This will bring you to the EC2 Management Console.
  2. Launch instance : On the EC2 Management Console go to Instances and click on the Launch Instance button.
  3. Choose Operating System : From the list of Operating Systems choose Ubuntu 14.04. Then click Next.
  4. Choose instance type : From the list of instance types choose g2.2xlarge. Then click on the Configure Instance Details button at the bottom of the page.
  5. Configure instance details : Make sure the number of instances is one. Pick a Subnet. It does not matter which one you pick, but make a note of it: if you later decide to attach a Volume ( storage space ) to your instance you will need to know the Subnet. Click the Next button.
  6. Add storage : I recommend you add at least 50 GB. Click Next.
    Note : This storage is NOT permanent. You will lose all data when you terminate your EC2 instance. If you are doing serious work, you should attach a separate EBS Volume to your instance.
  7. Tag Instance : Pick a name. Any name is fine. Then click Next.
  8. Configure security group : Pick the “Create a new security group” option and give your security group a descriptive name. We want two ways to access the server. First, we want to be able to log on to the machine via ssh ( port 22 ). Second, we want to open port 80 for the DIGITS web server. In my setup both services are reachable from my IP address only. You may choose another custom IP, but I do not recommend making the server accessible from any IP address.
  9. Review & launch
  10. Download Key : You need a key pair to ssh into this machine. Create a new key pair if you don’t have one, and choose a descriptive name. The downloaded file will have a .pem extension.
  11. Verify instance : To verify your instance is running, go to the EC2 Management Console, and then click on “Instances”. Copy the public IP address to your clipboard.
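
If you prefer the command line, the main choices in the steps above ( Ubuntu AMI, g2.2xlarge, subnet, 50 GB of storage, ssh and web access from a single IP ) roughly correspond to a couple of AWS CLI calls. The sketch below only prints those calls so that you can review them before running anything. It assumes the AWS CLI is installed and configured and that the security group already exists, neither of which this post covers. The AMI, key pair, subnet and security group values are placeholders, the IP address comes from a range reserved for documentation, and /dev/sda1 as the root device name is my assumption for the Ubuntu AMI.

```shell
#!/bin/sh
# Dry run: print AWS CLI commands that mirror the console steps above.
# Nothing is executed here. All four values below are placeholders.
AMI_ID="${AMI_ID:-ami-REPLACE_ME}"            # an Ubuntu 14.04 AMI in your region
KEY_NAME="${KEY_NAME:-my-digits-key}"         # hypothetical key pair name
SUBNET_ID="${SUBNET_ID:-subnet-REPLACE_ME}"
SG_ID="${SG_ID:-sg-REPLACE_ME}"

# Print one ingress rule per port (ssh and web), limited to a single IP.
print_ingress_rules() {
    my_ip="$1"
    for port in 22 80; do
        echo "aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol tcp --port $port --cidr $my_ip/32"
    done
}

# Print the launch command for one g2.2xlarge instance with 50 GB of storage.
# /dev/sda1 is an assumed root device name; check it against your AMI.
print_launch_command() {
    echo "aws ec2 run-instances --image-id $AMI_ID --count 1 --instance-type g2.2xlarge --key-name $KEY_NAME --subnet-id $SUBNET_ID --security-group-ids $SG_ID --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=50}'"
}

print_ingress_rules 203.0.113.7    # documentation-range IP; use your own
print_launch_command
```

Printing instead of executing keeps the sketch safe to run anywhere; once the placeholders are filled in, you can copy the printed lines into a terminal one at a time.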

Install NVIDIA DIGITS on EC2 GPU Instance

We are now ready to install NVIDIA DIGITS on the GPU instance we created in the last step.

  1. SSH into EC2 Instance : Open a terminal ( on OSX or Linux ) or use an ssh client on Windows to log on to the machine. Run the following commands with the full path to the .pem file you downloaded and the public IP address of your instance.
    # Restrict the permissions of your ssh key file.
    chmod 600 your-pemfile.pem
    # SSH into the machine as the default ubuntu user.
    ssh -Y -i your-pemfile.pem ubuntu@your-public-ip
    

    If you do not restrict the permissions of your ssh key file, ssh will ignore the key and you will see the following error.

    WARNING: UNPROTECTED PRIVATE KEY FILE!
    Permissions 0644 for 'your-pemfile.pem' are too open.
    It is recommended that your private key files are NOT accessible by others.
    This private key will be ignored.
    bad permissions: ignore key: your-pemfile.pem
    Permission denied (publickey).
    
  2. Update and upgrade packages with apt-get : Once you are logged in to the server, update the package lists and upgrade the installed packages.
    sudo apt-get update && sudo apt-get -y upgrade
    
  3. Install linux-image-extra : The base Linux kernel package that comes with the Ubuntu 14.04 instance on Amazon leaves out some drivers to slim down the size of the Linux image, so we need to install the drivers left out of the base package.
    sudo apt-get install -y linux-image-extra-$(uname -r)
    
  4. Install NVIDIA drivers
    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt-get update
    sudo apt-get install nvidia-352 nvidia-settings
    
  5. Get CUDA and NVIDIA’s machine learning repos
    CUDA_REPO_PKG=cuda-repo-ubuntu1404_7.5-18_amd64.deb && 
    wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/$CUDA_REPO_PKG && 
    sudo dpkg -i $CUDA_REPO_PKG
    
    ML_REPO_PKG=nvidia-machine-learning-repo_4.0-2_amd64.deb &&
    wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1404/x86_64/$ML_REPO_PKG &&
    sudo dpkg -i $ML_REPO_PKG
    

    The machine learning repo above gives access to digits, caffe-nv, torch and libcudnn4.

  6. Install DIGITS
    sudo apt-get update
    sudo apt-get install digits
    

    If everything went well, open the public IP address of your instance in a browser and you will see the DIGITS home page.

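
Two small checks can save some head scratching around the steps above: one before step 1, to confirm that ssh will accept your key file, and one after step 6, to confirm that the tools you expect are on the PATH. This is a minimal sketch and not part of DIGITS. key_is_private and check_tools are helper names I made up, and I am assuming that nvidia-smi is the tool you get from the driver package in step 4.

```shell
#!/bin/sh
# Succeeds only if group and others have no access to the key file
# (for example mode 600 or 400); ssh ignores keys that are more open.
key_is_private() {
    # GNU stat (Linux) takes -c, BSD stat (OSX) takes -f.
    mode="$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null)"
    case "$mode" in
        ?00) return 0 ;;
        *)   return 1 ;;
    esac
}

# Prints one line per tool and fails if any of them is not on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok      $tool"
        else
            echo "missing $tool"
            missing=1
        fi
    done
    return "$missing"
}

# On your own machine, before step 1: tighten the key file if needed.
KEY="${KEY:-your-pemfile.pem}"
if [ -f "$KEY" ] && ! key_is_private "$KEY"; then
    chmod 600 "$KEY"
fi

# On the EC2 instance, after step 6: confirm the driver tools are present.
check_tools nvidia-smi || echo "the NVIDIA driver tools are not installed yet"
```

The two halves are meant for different machines, so in practice you would keep only the part that applies to where you run it.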

Woohoo! We are all set up! By the way, if you relax your security group rules so that your phone can reach port 80, you can view this page, and therefore monitor the progress of your training process, from your smartphone!

Getting Started with NVIDIA DIGITS

My follow-up article titled “Deep Learning Example using NVIDIA DIGITS 3 on EC2” provides a detailed video tutorial on how to use DIGITS for Image Classification.

The GitHub page for DIGITS provides an example of creating a dataset and training a model, which is a good place to get started.

NVIDIA DIGITS Configuration FAQ

  1. How can you configure DIGITS to run on a different port ?
    You can configure DIGITS to run on a different port using the following command.

    sudo dpkg-reconfigure digits
    
  2. Where does DIGITS store the datasets and trained models ?
    DIGITS stores all data inside /usr/share/digits/digits/jobs.

    ls /usr/share/digits/digits/jobs
    

    There are two kinds of jobs directories.
      1) Dataset job : contains information about a dataset created using DIGITS.
      2) Training job : contains information about a model trained using DIGITS.
    You can tell that a jobs directory holds a dataset if it contains labels.txt, mean.binaryproto, train_db, train.txt, val_db, val.txt etc. E.g.

    # Here 20160208-182427-0f82 is a Dataset job
    $ ls -1 /usr/share/digits/digits/jobs/20160208-182427-0f82
    create_train_db.log
    create_val_db.log
    labels.txt
    mean.binaryproto
    mean.jpg
    status.pickle
    train_db
    train.txt
    val_db
    val.txt
    

    On the other hand if it contains a trained model, you will see files named deploy.prototxt, solver.prototxt, train_val.prototxt, snapshot_iter_*.caffemodel etc. E.g.

    # Here 20160209-011941-7953 is a Training job
    $ ls -1 /usr/share/digits/digits/jobs/20160209-011941-7953
    caffe_output.log
    deploy.prototxt
    snapshot_iter_104.caffemodel
    .
    .
    snapshot_iter_960.caffemodel
    snapshot_iter_960.solverstate
    solver.prototxt
    status.pickle
    train_val.prototxt
    
  3. How to start / stop / restart DIGITS server ?
    # stop the server
    sudo stop nvidia-digits-server
    # start the server
    sudo start nvidia-digits-server
    # to restart, stop and then start the server as above
    
  4. How to change the default jobs directory in NVIDIA DIGITS ?

    As mentioned above, by default DIGITS stores all data inside /usr/share/digits/digits/jobs/ . You probably want a different location for your data. For example, you may want all the DIGITS jobs to be stored on an attached volume. You can do so using the following commands.

    cd /usr/share/digits
    # set new config
    sudo python -m digits.config.edit -v
    # restart server
    sudo stop nvidia-digits-server
    sudo start nvidia-digits-server
    

    NOTE: The new jobs directory you choose should be writable by www-data.

    sudo chown -R www-data path_to_new_jobs_dir  
    
  5. How to change configurations in NVIDIA DIGITS ?
    The following commands will allow you to change all configurations in DIGITS. The configurations include the jobs directory, the GPUs to use, the log file location, the log level, server name, location of caffe installation and the location of Torch installation.

    cd /usr/share/digits
    # set new config
    sudo python -m digits.config.edit -v
    # restart server
    sudo stop nvidia-digits-server
    sudo start nvidia-digits-server
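
The two listings in question 2 above suggest a simple way to tell the job types apart from the shell: a dataset job has a train_db entry, while a Caffe training job has a solver.prototxt. The sketch below applies that rule of thumb to every directory under the jobs folder. job_kind is a name I made up, and the rule is inferred only from the example listings in this FAQ, so jobs trained with Torch, or jobs that did not finish, may show up as unknown.

```shell
#!/bin/sh
# Classify a DIGITS job directory using the marker files seen in the
# example listings. Prints "dataset", "training" or "unknown".
job_kind() {
    dir="$1"
    if [ -e "$dir/train_db" ]; then
        echo "dataset"
    elif [ -f "$dir/solver.prototxt" ]; then
        echo "training"
    else
        echo "unknown"
    fi
}

# List every job under the jobs directory together with its kind.
# Override JOBS_DIR if you moved the jobs directory (see question 4).
JOBS_DIR="${JOBS_DIR:-/usr/share/digits/digits/jobs}"
for job in "$JOBS_DIR"/*/; do
    [ -d "$job" ] || continue    # skip the unexpanded pattern when there are no jobs
    printf '%s\t%s\n' "$(job_kind "$job")" "$job"
done
```

Because the script only reads file names, it is safe to run while the DIGITS server is busy training.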
    
