In this tutorial I will explore a few ways to speed up Dlib’s Facial Landmark Detector.
Dlib’s Facial Landmark Detector
Dlib has a very good implementation of a very fast facial landmark detector. I reviewed it in my post titled Facial Landmark Detection.
Subsequently, I wrote a series of posts that utilize Dlib’s facial landmark detector.
There are two example files in Dlib that deal with facial landmark detection:
- For Images : dlib/examples/face_landmark_detection_ex.cpp
- For Videos : dlib/examples/webcam_face_pose_ex.cpp
The versions with the optimizations described in this post are:
- For Images : dlib/examples/face_landmark_detection_to_file.cpp
- For Videos : dlib/examples/webcam_face_pose_fast.cpp
This post fully explains all the tricks and provides snippets of code. To get access to the above files, and to the code and images used in all other posts, please subscribe to our newsletter.
About the only complaint I have heard from readers of this blog about Dlib’s facial landmark detector is that it is slow. Is it really slow? Yes and no. Out of the box it appears to be slow, but that is not because of a bad implementation of the Facial Landmark Detector. Let’s find the bottlenecks and see how to improve the speed.
How to make Dlib’s Facial Landmark Detector faster?
Dlib’s facial landmark detector implements a paper that can detect landmarks in just 1 millisecond! That is 1000 frames a second. You will never get 1000 fps because you first need to detect the face before doing landmark detection, and that takes a few tens of milliseconds. But you can easily do 30 fps with the optimizations listed below.
Compile Dlib in Release Mode with Optimizations turned on
As mentioned in Dlib’s documentation, it is critical to compile Dlib in release mode with the appropriate compiler instructions turned on.
Using CMake
cd dlib/examples
mkdir build
cd build
# Enable compiler instructions.
# In the example below I have enabled SSE4
# Use the one that is appropriate for you
# SSE2 works for most Intel or AMD chips.
# cmake .. -DUSE_SSE2_INSTRUCTIONS=ON
# SSE4 works for most current machines
cmake .. -DUSE_SSE4_INSTRUCTIONS=ON
# AVX works on processors released after 2011.
# cmake .. -DUSE_AVX_INSTRUCTIONS=ON
# Compile in release mode
cmake --build . --config Release
If you are using an Intel or AMD chip, enable at least SSE2 instructions. AVX is the fastest but requires a CPU from at least 2011. SSE4 is the next fastest and is supported by most current machines.
Using Visual Studio
People often make this mistake while using Visual Studio because by default they are working in debug mode. You can see a detailed explanation and how to fix it here.
Using Qt
Similarly, while using Qt you need to turn on Release mode as shown below.
Speed Up Face Detection
The following steps will help speed up face detection with a small (probably negligible) loss in accuracy.
Resize Frame
Facial landmark detection algorithms usually require the user to provide a bounding box containing a face. The algorithm takes this box as input and returns the landmarks. The time reported by these algorithms is only the time required to do landmark detection and not the face detection. Landmark detection algorithms can run in less than 5 milliseconds, but face detection can take a long time (30 milliseconds). The speed of face detection depends on the resolution of the image because with smaller resolution images, you look for a smaller range of face sizes. The downside is that you will miss smaller faces, but in most of the applications I have listed above we have one person looking at the webcam from arm’s length.
An easy way to speed up face detection is to resize the frame. My webcam records video at 720p (i.e. 1280×720) and I downsample the frame by a factor of 4 in each dimension for face detection. The bounding box found on the small frame must then be scaled back up by multiplying its coordinates by the same factor (equivalently, dividing by the resize scale of 1/4). This allows us to do facial landmark detection at full resolution.
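The coordinate mapping can be sketched on its own, without Dlib or OpenCV. In the sketch below, `Box` is a hypothetical stand-in for `dlib::rectangle` and `scale_to_full_resolution` is an illustrative helper name, not part of either library.

```cpp
#include <cassert>

// Hypothetical stand-in for dlib::rectangle so the sketch compiles on its own.
struct Box {
    long left, top, right, bottom;
};

// Map a box found on the downsampled frame back to full-resolution coordinates.
// `ratio` is the downsample factor: 4 means the small frame has 1/4 of the
// width and 1/4 of the height of the original frame.
inline Box scale_to_full_resolution(const Box& small, long ratio)
{
    assert(ratio >= 1);
    return Box{ small.left * ratio, small.top * ratio,
                small.right * ratio, small.bottom * ratio };
}
```

For example, with a ratio of 4, a face found at (40, 30)-(90, 80) on the 320×180 frame maps to (160, 120)-(360, 320) on the 1280×720 frame.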
Skip frame
Typically webcams record video at 30 fps. In a typical application you are sitting right in front of the webcam and not moving much, so there is no need to detect the face in every frame. We can simply do facial landmark detection based on the facial bounding box obtained a few frames earlier. If you run face detection only every third frame, the average time spent on face detection per frame drops to about a third, and since face detection dominates the cost, the whole pipeline gets close to three times faster.
Is it possible to do better than reusing the previous location of the face? Yes, we can use Kalman filtering to predict the location of the face in frames where detection is not done, but in a webcam application it is overkill.
The snippet of code for the above optimizations is shown below.
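To make the effect of skipping frames concrete, here is a small back-of-the-envelope sketch. It assumes face detection and landmark detection are the only costs (capture and display are ignored), and the function names are illustrative only.

```cpp
#include <cassert>

// Average per-frame processing time when face detection runs once every
// `detect_every` frames and landmark detection runs on every frame.
inline double average_frame_ms(double detect_ms, double landmark_ms, int detect_every)
{
    assert(detect_every >= 1);
    return detect_ms / detect_every + landmark_ms;
}

// Convert a per-frame time in milliseconds to frames per second.
inline double frames_per_second(double frame_ms)
{
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}
```

With a 30 ms face detector and a 1 ms landmark detector, detecting in every frame costs 31 ms per frame (roughly 32 fps), while detecting every third frame averages 11 ms per frame (roughly 90 fps) under these assumptions.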
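#include <vector>
#include <opencv2/opencv.hpp>
#include <dlib/opencv.h>
#include <dlib/image_processing.h>
#include <dlib/image_processing/frontal_face_detector.h>
// render_face() is the custom renderer shown later in this post; include its header here.

using namespace dlib;

#define FACE_DOWNSAMPLE_RATIO 4
#define SKIP_FRAMES 2

int main()
{
    cv::VideoCapture cap(0);
    if (!cap.isOpened()) return 1;

    cv::Mat im;
    cv::Mat im_small, im_display;

    // Load the face detector and the landmark model
    frontal_face_detector detector = get_frontal_face_detector();
    shape_predictor pose_model;
    deserialize("shape_predictor_68_face_landmarks.dat") >> pose_model;

    int count = 0;
    std::vector<rectangle> faces;

    // Grab frames until the stream ends
    while (cap.read(im))
    {
        // Resize image for face detection
        cv::resize(im, im_small, cv::Size(), 1.0/FACE_DOWNSAMPLE_RATIO, 1.0/FACE_DOWNSAMPLE_RATIO);

        // Change to dlib's image format. No memory is copied.
        cv_image<bgr_pixel> cimg_small(im_small);
        cv_image<bgr_pixel> cimg(im);

        // Detect faces on the resized image, but only every SKIP_FRAMES frames.
        // In between, the boxes from the last detection are reused.
        if ( count % SKIP_FRAMES == 0 )
        {
            faces = detector(cimg_small);
        }

        // Find the pose of each face.
        std::vector<full_object_detection> shapes;
        for (unsigned long i = 0; i < faces.size(); ++i)
        {
            // Resize obtained rectangle for full resolution image.
            rectangle r(
                (long)(faces[i].left() * FACE_DOWNSAMPLE_RATIO),
                (long)(faces[i].top() * FACE_DOWNSAMPLE_RATIO),
                (long)(faces[i].right() * FACE_DOWNSAMPLE_RATIO),
                (long)(faces[i].bottom() * FACE_DOWNSAMPLE_RATIO)
            );

            // Landmark detection on full sized image
            full_object_detection shape = pose_model(cimg, r);
            shapes.push_back(shape);

            // Custom Face Render
            render_face(im, shape);
        }

        // Display at half resolution (see Optimizing Display)
        cv::resize(im, im_display, cv::Size(), 0.5, 0.5);
        cv::imshow("Landmarks", im_display);
        if (cv::waitKey(1) == 27) break; // Esc quits

        ++count; // without this, detection would run on every frame
    }
    return 0;
}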
Optimizing Display
When I first tried speeding up the facial landmark detector, I was surprised to find that a third of the time was spent in drawing the landmarks and displaying the frame. I made two optimizations that helped speed things up:
Resize Frame
I resized the image to half resolution for display. This makes a huge difference because when the resolution is changed from 720p to 360p, the actual number of pixels that need to be displayed goes down by a factor of 4.
Custom Face Renderer
Dlib’s face renderer didn’t work very well for me; the frames did not render smoothly. So I wrote my own using OpenCV’s polylines. The code is shown below.
#ifndef BIGVISION_RENDER_FACE_H_
#define BIGVISION_RENDER_FACE_H_

#include <vector>
#include <dlib/image_processing/frontal_face_detector.h>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp> // cv::polylines

// Draw landmarks start..end (inclusive) as a single polyline.
// Set isClosed for loops such as the eyes and lips.
// The functions are inline so this header can be included from more than one source file.
inline void draw_polyline(cv::Mat &img, const dlib::full_object_detection& d, const int start, const int end, bool isClosed = false)
{
    std::vector <cv::Point> points;
    for (int i = start; i <= end; ++i)
    {
        points.push_back(cv::Point(d.part(i).x(), d.part(i).y()));
    }
    // Blue in BGR order, thickness 2, line type 16 (anti-aliased)
    cv::polylines(img, points, isClosed, cv::Scalar(255,0,0), 2, 16);
}

// Draw the 68-point face model onto img, one polyline per facial feature.
inline void render_face (cv::Mat &img, const dlib::full_object_detection& d)
{
    DLIB_CASSERT
    (
        d.num_parts() == 68,
        "\n\t Invalid inputs were given to this function. "
        << "\n\t d.num_parts(): " << d.num_parts()
    );

    draw_polyline(img, d, 0, 16);           // Jaw line
    draw_polyline(img, d, 17, 21);          // Left eyebrow
    draw_polyline(img, d, 22, 26);          // Right eyebrow
    draw_polyline(img, d, 27, 30);          // Nose bridge
    draw_polyline(img, d, 30, 35, true);    // Lower nose
    draw_polyline(img, d, 36, 41, true);    // Left eye
    draw_polyline(img, d, 42, 47, true);    // Right Eye
    draw_polyline(img, d, 48, 59, true);    // Outer lip
    draw_polyline(img, d, 60, 67, true);    // Inner lip
}

#endif // BIGVISION_RENDER_FACE_H_
I also tried rendering all the points using a single polyline hoping to see some improvement in speed, but there was no difference in speed at all.
Results
Using the above optimizations I am able to get a speed of 70 fps on videos recorded at 120 fps. On my webcam I get 27-30 fps because we are limited by the recording speed of the webcam. The reported numbers include the time needed to read the frame from camera or video file, face detection, facial landmark detection and display at half resolution.
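To find out where the time goes in your own setup, a per-stage timer is usually enough. The following is a minimal sketch using only the C++ standard library; `StageTimer` and the stage names are hypothetical and are not part of Dlib or OpenCV.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock time per named stage
// (for example "capture", "detect", "landmarks", "display").
class StageTimer {
public:
    // Run `work` and add its elapsed time to the total for `stage`.
    template <typename Work>
    void measure(const std::string& stage, Work&& work)
    {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        totals_ms_[stage] += std::chrono::duration<double, std::milli>(stop - start).count();
        ++calls_[stage];
    }

    // Total milliseconds spent in `stage` so far (0 if never measured).
    double total_ms(const std::string& stage) const
    {
        const auto it = totals_ms_.find(stage);
        return it == totals_ms_.end() ? 0.0 : it->second;
    }

    // Number of times `stage` was measured.
    long calls(const std::string& stage) const
    {
        const auto it = calls_.find(stage);
        return it == calls_.end() ? 0 : it->second;
    }

    // Mean milliseconds per call; the stage must have been measured at least once.
    double mean_ms(const std::string& stage) const
    {
        assert(calls(stage) > 0);
        return total_ms(stage) / calls(stage);
    }

private:
    std::map<std::string, double> totals_ms_;
    std::map<std::string, long> calls_;
};
```

Inside the frame loop, each step can be wrapped in a lambda, for example `timer.measure("detect", [&] { faces = detector(cimg_small); });`. Comparing `mean_ms` across stages after a few hundred frames shows whether capture, detection, landmarks, or display dominates.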
thank you Mr SATYA MALLICK :like:
Glad it is useful.
You can use the opencv face detector (LBP cascade) to boost the algorithm more
Another very effective way to speed things up is to decrease the pyramid_down value in “frontal_face_detector.h”:
typedef object_detector<scan_fhog_pyramid<pyramid_down<6> > > frontal_face_detector;
The number 6 in the code above is a little large, which results in slow face detection that takes most of the time, although it can detect smaller faces.
However, if you do not need to detect such small faces, you can set the number smaller; the minimum is 2, and the speed can increase 2-3 times.
Thanks. I had missed that one.
After changing the pyramid_down argument, I didn’t see any significant change in speed. Should I only change it in “frontal_face_detector.h”, or are there other places as well?
dlib is very perfect job!
Detaching webcam reading, face detection and display, using separate threads for each might also speed up the system even further. You can use one thread for continuously updating the landmarks and displaying them (main thread) while in the background, the other threads are capturing images and recalculating face detection.
How to landmark whole face?
Thank you.
Hello Mr Satya Mallick,
I’m trying to find efficient and correct way to capture mouth pose.
And your face landmark to capture facial detail is super and realtime!!
I have a question: is there any existing dataset for tracking only mouth movement in real time?
(If the camera is always set in front of the mouth)
Or do I have to train on my own dataset?
(If giving some document link that would be very helpful!!)
Thank you!!
Maybe processing a cropped version (ROI) of the camera frame would be good too. It does not need to go through the 10%/20% around the border of the frame. You just need to recalculate the ROI dynamically and translate the coordinates of the points.
I do not quite get your idea. Could you elaborate?
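One way to read the ROI suggestion above: run detection only on the central part of the frame, then shift the results back into full-frame coordinates. The sketch below is an interpretation, not the commenter's code; `Box`, `central_roi`, and `roi_to_frame` are hypothetical names, and coordinates are inclusive pixel indices.

```cpp
#include <cassert>

// Hypothetical stand-in for dlib::rectangle (inclusive pixel coordinates).
struct Box {
    long left, top, right, bottom;
};

// Central region of a width x height frame, leaving `border_fraction` of each
// dimension out on every side (0.1 drops 10% at the left, right, top and bottom).
inline Box central_roi(long width, long height, double border_fraction)
{
    assert(width > 0 && height > 0);
    assert(border_fraction >= 0.0 && border_fraction < 0.5);
    const long dx = static_cast<long>(width * border_fraction);
    const long dy = static_cast<long>(height * border_fraction);
    return Box{ dx, dy, width - 1 - dx, height - 1 - dy };
}

// Shift a box detected inside the ROI back into full-frame coordinates.
inline Box roi_to_frame(const Box& in_roi, const Box& roi)
{
    return Box{ in_roi.left + roi.left, in_roi.top + roi.top,
                in_roi.right + roi.left, in_roi.bottom + roi.top };
}
```

The trade-off is the same as with resizing: the detector sees fewer pixels, but a face that drifts into the excluded border will be missed until the ROI is recomputed.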
Hi, thanks for your great work!
How can I get coordinates of 68 points after using facial landmark detector? I need to know this for detecting mouth closing or opening, detecting smile… Is it possible if I have the coordinates?
Thanks in advance!
Did you find the solution ? I am also trying to achieve the same thing
I tried in 2 ways: make a classifier by SVM and use facial landmark detector. Classifier is quite good especially if you want to build application in mobile. I’m reading this paper for using facial landmark detector: http://vision.fe.uni-lj.si/cvww2016/proceedings/papers/05.pdf
Thanks for this demo, but I have a question: what about exporting the data to an .FBX file or …
for face animation in 3D Max?
When I try to compile these examples I get this error from the compiler:
[ 99%] Building CXX object CMakeFiles/webcam_face_pose_ex.dir/webcam_face_pose_ex.cpp.o
/home/infatum/Projects/dlib/examples/webcam_face_pose_ex.cpp: In function ‘int main()’:
/home/infatum/Projects/dlib/examples/webcam_face_pose_ex.cpp:75:13: error: ‘resize’ is not a member of ‘cv’
cv::resize(im, im_small, cv::Size(), 1.0/FACE_DOWNSAMPLE_RATIO, 1.0/FACE_DOWNSAMPLE_RATIO);
^
make[2]: *** [CMakeFiles/webcam_face_pose_ex.dir/webcam_face_pose_ex.cpp.o] Error 1
make[1]: *** [CMakeFiles/webcam_face_pose_ex.dir/all] Error 2
make: *** [all] Error 2
What am I doing wrong? Why doesn’t it recognize the resize function?
#include
I cannot download the code 🙁
https://uploads.disquscdn.com/images/07df99fefb54867e0c4f139777c53d5da85421ca7864b4638519702d670682e2.png https://uploads.disquscdn.com/images/7990050bb232c246f4ce5ca86eb969bf5ba712f6b28ad07e961db74a3822e81f.png
HI Mr. Mallick
It may be a very basic question for you. I have been working on different computer vision techniques since two years but using MATLAB only. Now I was trying to follow your blog for facial landmark detection using dlib in codeblock. But I am not able to compile the library. Here are steps I followed:
1. I downloaded and unzipped the dlib
2. I used an empty console project in codeblock and pasted the example code for facial landmark detection
3. In compiler settings, I added the library as shown in figures 1 and 2 attached.
4. When I build it, it shows an error telling me not to put the dlib folder in the include search path. I don’t understand how to fix it. What is the include search path?
Hello sir, I downloaded your code and ran the webcam face pose fast program, but I am still not getting 30 fps. It is still in the range of 15-17 fps only. Am I missing something? Please help.
Also, I want to use dlib only for eye corner detection. Is there any way I can modify the code to make it more efficient?
Are you sure your webcam is fast enough? Check its specification.
Can you do profiling of your code to see where exactly most of the time is being spent ?
Hello Satya Mallick. I have been following your tutorials for both the OpenCV and Dlib libraries. They are really helpful for freshers like me working on computer vision. My aim was to detect and extract face landmarks in the mobile device camera preview in real time on both Android and iOS. I am happy that it works on both platforms. Compared to Android, iOS provides much better performance due to a few compiler optimizations and the flexible support of the iOS platform. I have even tried real-time pose model estimation on iOS and it works well. I am trying to improve the performance on Android. Dlib detection takes the most time among all the processes (I have tried resizing and skipping frames). The frame resizing technique worked pretty well, but skipping frames causes a flickering problem. I tried reducing the resolution of the camera frame; it improves performance, but the detection distance becomes shorter. I want to work with at least 720p resolution. I need guidance from you. Please suggest a few more optimization techniques for mobile application development. Thank you so much.
have you tried using Android’s face detector instead of dlib ?
Thanks for the reply Satya Mallick.. No I haven’t tried that one.. I have tried opencv face detection and converted resulted rectangle to dlib::rectangle and used it for pose model implementation..
Most of the processing time is consumed by the face detector and the most effective way to speed up things on mobile is to use the face detector provided by Android / iOS. Face detection can take 30 ms or more and after that landmark detection takes just 1 ms.
That’s a really good idea.. I’ll try to implement that.. Thank you for the guidance..
I subscribed a long time ago, but now I can’t get the code?
Thanks a lot for the article but I have some problem:
If I use VS2012, OpenCV 2.4.11 and dlib with the facial landmarks C++ code, combining OpenCV face detection with dlib shape prediction, it works, but the speed is very slow! Is the execution time high because the VS or OpenCV version is too old? Thank you!
Hi I want to extract facial expression from image how can I do it
thank you
Hi. Thanks for your tips.
I downloaded your code and tried it, but the camera feed was not shown. Does that code still work?
Yes it does. What kind of camera are you using? Does it work with any other OpenCV / Dlib code?
I ran with webcam_face_pose_ex and webcam_face_pose_fast. webcam_face_pose_ex can show the camera, but webcam_face_pose_fast doesn’t show anything.
I’m using macbook’s camera
Hi Satya,
Nice work. The code runs perfect with my Mac.
Just one question. Could you give instructions on how I should implement a Kalman filter to boost the face detection?
I also tried head pose estimation with dlib and OpenCV. The accuracy of the result is acceptable with solvePnPRansac, but it jitters a lot. I tried solvePnP; it is stable, but the pose is somehow reversed: the head is always behind the camera. Would you mind giving some hints on how I can improve my work?
Thanks for your help in advance.
Hi!
How can I turn on optimization using cmake on an arm machine?
Thank you!
Can we use GPU for speeding up opencv’s face detection?
The other way is to use Yolo: Darknet for face detection.
Hey there! I was using the OpenCV for Unity3d package to implement face detection in my app. My users told me that there is a huge problem with tracking, or even worse with detecting, black-skinned people. Did you have some of these problems too? Is this package (Dlib Face Landmark Detector) much better and more accurate at detecting black-skinned people than the package I included from OpenCV for Unity3d? (It is, by the way, from the same publisher, but they wrote that they only ported the code from another tutorial into that package and told me I would have to try the dlib package.) I wanted to ask you guys before I buy the plugin. Please note: I am using Unity3d.
Thanks for an answer!