In many of our previous posts, we used the OpenCV DNN module, which allows running pre-trained neural networks. One of the module’s main drawbacks was that inference was limited to the CPU, since that was the only supported mode. Starting from OpenCV version 4.2, the DNN module supports NVIDIA GPUs, which means networks can be accelerated with CUDA and cuDNN. In this post, we will learn how to compile the OpenCV library with DNN GPU support to speed up neural network inference.
- Installation Instructions for Ubuntu
- Test an example code
Installation Instructions for Ubuntu 18.04
To enable NVIDIA GPU support in OpenCV, we have to compile it from scratch with proper configurations.
Step 1. Prerequisites
To make sure you have everything we will need to start, run the following commands to install packages you may be missing:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential cmake unzip pkg-config
sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev
sudo apt-get install libv4l-dev libxvidcore-dev libx264-dev
sudo apt-get install libgtk-3-dev
sudo apt-get install libblas-dev liblapack-dev gfortran
sudo apt-get install python3-dev
Step 2. Getting OpenCV Sources
The first step is to download the OpenCV library sources. We will use version 4.5.1, the latest release at the time of writing.
To download it to the $HOME folder, simply run the following commands in the terminal:
cd ~
wget -O opencv-4.5.1.zip https://github.com/opencv/opencv/archive/4.5.1.zip
unzip -q opencv-4.5.1.zip
mv opencv-4.5.1 opencv
rm -f opencv-4.5.1.zip
You will also need to do the same for the opencv_contrib repository, which contains the extra modules:
wget -O opencv_contrib-4.5.1.zip https://github.com/opencv/opencv_contrib/archive/4.5.1.zip
unzip -q opencv_contrib-4.5.1.zip
mv opencv_contrib-4.5.1 opencv_contrib
rm -f opencv_contrib-4.5.1.zip
Step 3. CUDA Installation
This post has been tested with CUDA 10.2, so we recommend using the same version.
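To check which CUDA toolkit version is installed, one option is to parse the output of nvcc --version. The sketch below assumes that nvcc prints a line of the form "Cuda compilation tools, release 10.2, V10.2.89"; the helper names and the sample text are illustrative and not part of any library.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release, otherwise None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_cuda_release(result.stdout)


# Illustrative sample of the line that carries the version number.
sample = "Cuda compilation tools, release 10.2, V10.2.89"
print(parse_cuda_release(sample))  # (10, 2)
print(installed_cuda_release())
```

If the second call prints None even though CUDA is installed, nvcc is probably not on PATH; it usually lives in the bin folder of the CUDA installation directory.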
Step 4. cuDNN Installation
This post has been tested with cuDNN v8.0.3, so we recommend using the same version.
Step 5. Python Dependencies
If we also want to use OpenCV with DNN GPU support in our Python scripts, we will need to generate OpenCV-Python bindings. You can think of them as bridges that enable calling C++ OpenCV functions from inside a Python script.
Note that we are using Python 3.7.5.
First of all, let’s create a new virtual environment. Make sure you have the virtualenv package installed; if not, install it by running the following command:
pip install virtualenv
Now we can create a new virtual environment and call it env:
python3 -m venv ~/env
The last thing we have to do is to activate it:
source ~/env/bin/activate
Now, install the numpy package:
pip install numpy
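Before moving on, it can be useful to confirm that the interpreter in use really belongs to the virtual environment, since the bindings will be generated for it. The sketch below relies on the standard library only; the helper name in_virtualenv is ours, not part of any library.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # With `python3 -m venv`, sys.prefix points at the environment while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

Run with the environment activated, the first line should point at ~/env/bin and the second should report True.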
Step 6. Building OpenCV Library
Now that we have met all the dependencies, let’s go ahead and build the OpenCV library.
Let’s create a build directory and navigate to it:
cd ~/opencv
mkdir build
cd build
The next step is to configure the cmake build properly by passing the correct arguments:
cmake \
    -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D INSTALL_C_EXAMPLES=OFF \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
    -D PYTHON_EXECUTABLE=~/env/bin/python3 \
    -D BUILD_EXAMPLES=ON \
    -D WITH_CUDA=ON \
    -D WITH_CUDNN=ON \
    -D OPENCV_DNN_CUDA=ON \
    -D ENABLE_FAST_MATH=ON \
    -D CUDA_FAST_MATH=ON \
    -D WITH_CUBLAS=ON \
    -D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.2 \
    -D OpenCL_LIBRARY=/usr/local/cuda-10.2/lib64/libOpenCL.so \
    -D OpenCL_INCLUDE_DIR=/usr/local/cuda-10.2/include/ \
    ..
Note that we:
- set OPENCV_EXTRA_MODULES_PATH to the location of the opencv_contrib modules folder we downloaded earlier
- set PYTHON_EXECUTABLE to the Python interpreter of the virtual environment we created
- set CUDA_TOOLKIT_ROOT_DIR to the directory where CUDA is installed
- set OpenCL_LIBRARY to the shared OpenCL library
- set OpenCL_INCLUDE_DIR to the directory with the OpenCL header
- set WITH_CUDA=ON and WITH_CUDNN=ON to enable CUDA and cuDNN support
- set OPENCV_DNN_CUDA=ON to build the DNN module with CUDA support. This is the most important flag: without it, the DNN module is built without CUDA support.
- enabled ENABLE_FAST_MATH, CUDA_FAST_MATH and WITH_CUBLAS for optimization purposes
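Three of these options depend on the CUDA installation path, so changing the CUDA version means editing several places at once. As a rough sketch, the same invocation could be assembled in Python from a single CUDA root; the cmake_command helper and its parameter names are hypothetical, while the option names and values are the ones used in this post.

```python
import os


def cmake_command(cuda_root, contrib_modules, python_executable):
    """Assemble the cmake invocation for a CUDA-enabled OpenCV build."""
    options = {
        "CMAKE_BUILD_TYPE": "RELEASE",
        "CMAKE_INSTALL_PREFIX": "/usr/local",
        "INSTALL_PYTHON_EXAMPLES": "OFF",
        "INSTALL_C_EXAMPLES": "OFF",
        "OPENCV_ENABLE_NONFREE": "ON",
        "OPENCV_EXTRA_MODULES_PATH": contrib_modules,
        "PYTHON_EXECUTABLE": python_executable,
        "BUILD_EXAMPLES": "ON",
        "WITH_CUDA": "ON",
        "WITH_CUDNN": "ON",
        "OPENCV_DNN_CUDA": "ON",
        "ENABLE_FAST_MATH": "ON",
        "CUDA_FAST_MATH": "ON",
        "WITH_CUBLAS": "ON",
        # The three CUDA-related paths all derive from one root directory.
        "CUDA_TOOLKIT_ROOT_DIR": cuda_root,
        "OpenCL_LIBRARY": f"{cuda_root}/lib64/libOpenCL.so",
        "OpenCL_INCLUDE_DIR": f"{cuda_root}/include/",
    }
    command = ["cmake"]
    for name, value in options.items():
        command += ["-D", f"{name}={value}"]
    command.append("..")  # the source tree is the parent of the build directory
    return command


home = os.path.expanduser("~")
command = cmake_command(
    cuda_root="/usr/local/cuda-10.2",
    contrib_modules=f"{home}/opencv_contrib/modules",
    python_executable=f"{home}/env/bin/python3",
)
print(" ".join(command))
```

The printed line can be pasted into a terminal opened in the build directory, or the list can be handed to subprocess.run with cwd set to that directory.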
If everything is correct, cmake finishes by reporting that the configuration was successful.
Double-check in the configuration summary that NVIDIA CUDA support is enabled and that CUDA and cuDNN have been found.
Now, you can run the make command to start the build, for example make -j$(nproc) to use all available CPU cores.
Building will take some time. After it’s done, run:
sudo make install
sudo ldconfig
The only thing left is to add a symlink to the Python environment. To do so, navigate to your virtual environment site-packages directory and link the freshly-built OpenCV library:
cd ~/env/lib/python3.x/site-packages/
ln -s /usr/local/lib/python3.x/site-packages/cv2/python-3.x/cv2.cpython-3xm-x86_64-linux-gnu.so cv2.so
Don’t forget to replace “x” with the minor version of Python you have. Since we use Python 3.7, in our case, x is 7.
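Once the symlink is in place, a quick sanity check from Python can confirm that the bindings were built with CUDA. The sketch below scans the text returned by cv2.getBuildInformation() for the CUDA and cuDNN entries; the cuda_summary helper is hypothetical, and the sample text is an illustrative excerpt rather than real output from this build.

```python
import re


def cuda_summary(build_info):
    """Report whether CUDA and cuDNN are marked YES in OpenCV build information."""
    summary = {}
    for key in ("NVIDIA CUDA", "cuDNN"):
        # Entries look like "  NVIDIA CUDA:      YES (ver 10.2, ...)".
        pattern = rf"^\s*{re.escape(key)}:\s*(\S+)"
        match = re.search(pattern, build_info, re.MULTILINE)
        summary[key] = bool(match) and match.group(1) == "YES"
    return summary


# Illustrative excerpt; the real text comes from cv2.getBuildInformation().
sample = """
  NVIDIA CUDA:                   YES (ver 10.2, CUFFT CUBLAS FAST_MATH)
  cuDNN:                         YES (ver 8.0.3)
"""
print(cuda_summary(sample))  # {'NVIDIA CUDA': True, 'cuDNN': True}
```

With the freshly built library, the same helper can be applied to cv2.getBuildInformation(). In addition, cv2.cuda.getCudaEnabledDeviceCount() should return a non-zero value when a usable GPU is visible.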
That’s it! Now you can write code that uses the OpenCV library with DNN GPU support. We’ve updated all of our learnopencv.com blog posts that use the DNN module so that they can now utilize the GPU. You may check the details below.
Test an example code
We will be testing the OpenPose code, which is available on the blog https://learnopencv.com/deep-learning-based-human-pose-estimation-using-opencv-cpp-python/
We are using an AWS system with the following configuration:
- Processor: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
- Number of cores: 18
- GPU: Tesla K80 12GB
To run the code with CUDA, we will do a simple addition to the C++ and Python code:
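In Python, the addition could look like the following sketch. setPreferableBackend, setPreferableTarget and the DNN_BACKEND_CUDA / DNN_TARGET_CUDA constants are part of OpenCV’s DNN API; the select_device helper, its device argument and the file names in the comment are our own illustrative choices. The cv2.dnn module is passed in as an argument only so that the helper can be imported on machines without OpenCV.

```python
def select_device(net, dnn, device="gpu"):
    """Point an OpenCV DNN network at the CUDA or the CPU backend.

    `net` is a network loaded through cv2.dnn and `dnn` is the cv2.dnn module.
    """
    if device == "gpu":
        # Requires OpenCV built with OPENCV_DNN_CUDA=ON.
        net.setPreferableBackend(dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(dnn.DNN_TARGET_CUDA)
    else:
        # Default OpenCV implementation running on the CPU.
        net.setPreferableBackend(dnn.DNN_BACKEND_OPENCV)
        net.setPreferableTarget(dnn.DNN_TARGET_CPU)
    return net


# Typical use with a CUDA-enabled build (file names are placeholders):
#   import cv2
#   net = cv2.dnn.readNetFromCaffe("pose.prototxt", "pose.caffemodel")
#   select_device(net, cv2.dnn, device="gpu")
```

The C++ change is analogous: the same two methods are called on the cv::dnn::Net object with the corresponding cv::dnn constants.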
This is the only code change you need to run the code with CUDA acceleration. The results with the GPU and CPU backends are as follows.
In this example, GPU inference is roughly 10 times faster than CPU inference!
The GPU takes ~0.2 seconds to process a frame, whereas the CPU takes ~2.2 seconds. The CUDA backend has reduced the execution time by upwards of 90% for this code example. Try the CUDA optimisation with our other blog posts and let us know the time improvement you get in the comments.
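The two figures follow directly from the per-frame timings. A small sketch of the arithmetic, using the rounded values quoted above (the function names are ours):

```python
def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run is than the CPU run."""
    return cpu_seconds / gpu_seconds


def time_reduction(cpu_seconds, gpu_seconds):
    """Fraction of the CPU execution time saved by switching to the GPU."""
    return 1.0 - gpu_seconds / cpu_seconds


cpu, gpu = 2.2, 0.2  # approximate seconds per frame reported in this post
print(f"Speed-up: {speedup(cpu, gpu):.1f}x")               # Speed-up: 11.0x
print(f"Time saved: {time_reduction(cpu, gpu):.1%}")       # Time saved: 90.9%
```

The same two functions can be reused with your own timings when comparing backends on different hardware.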
In this blog post, we installed OpenCV with CUDA support. First, we prepared the system by installing the required OS libraries. Then, we installed CUDA and cuDNN. Finally, we built OpenCV from source and explained the cmake options we used. We also built the OpenCV bindings for a Python virtual environment.