Face recognition models: This article presents a comprehensive examination of existing face recognition models, toolkits, datasets, and FR pipelines. From the early Eigenfaces and Fisherfaces methods to advanced deep learning techniques, these models have progressively refined the art of identifying individuals from digital imagery. We’ll dissect these fascinating models, shedding light on their unique attributes, strengths, and shortcomings. A careful analysis will illustrate how each successive model, toolkit, or dataset has built upon its predecessors, driving the technology to remarkable new heights. This exploration aims to enrich your understanding of the underlying mechanisms that shape modern face recognition systems.
Existing State-of-the-Art Face Recognition Models
What models are used for face recognition? At the moment, a few state-of-the-art models have already been trained on massive datasets. In this section, we will explore some of the top-performing pre-trained models.
DeepFace by Facebook (2014)
Highlights
- DeepFace model works with more than 120 million parameters
- It is a 9-layer deep neural network
- Uses locally connected layers (without weight sharing) instead of standard convolutional layers
- Trained on Social Face Classification Dataset, with 4 million facial images
- Prediction accuracy: 97.35% on the LFW dataset
It is a deep learning model designed for facial analysis tasks such as face verification, face recognition, facial attribute analysis, and real-time face analysis. The architecture is shown in Figure 1.
The images in the SFC dataset were collected from a massive collection of Facebook user profile photos. Beyond verification, the model can perform facial recognition, which involves finding a person’s face in a database of face images. Facial attribute analysis is another capability of DeepFace, describing the visual properties of face images. The model has also been tested in real-time scenarios, demonstrating its ability to perform face recognition and attribute analysis on live webcam feeds.
FaceNet by Google (2015)
Highlights
- The FaceNet model works with 140 million parameters
- It is a 22-layer deep convolutional neural network with L2 normalization
- Introduces triplet loss function
- Prediction accuracy: 99.63% on the LFW dataset and 95.12% on the YouTube Faces (YTF) dataset
Google’s answer to the face recognition problem was FaceNet. The model’s network architecture is shown in Figure 2:
In this approach, the model learns a mapping into a compact Euclidean space where distances directly correspond to a measure of face similarity. There are a few noteworthy features of this model. First, each face is represented by a 128-dimensional embedding, which assists with scalable clustering and recognition. Second, Google introduced the triplet loss along with FaceNet (shown in Figure 3). The function follows a less greedy approach: it forms useful triplets and takes advantage of the triplet loss function and the triplet selection mechanism during training.
It forms useful triplets using sample selection techniques. Data is arranged into ‘triplets’ of an anchor, a positive example, and a negative example. These triplets are then fed into a common neural network for training, with the aim of reducing the anchor-positive distance while increasing the anchor-negative distance. This technique can be expressed mathematically as:
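In the FaceNet paper's notation, the triplet loss is:

```latex
L = \sum_{i=1}^{N} \Big[ \big\| f(x_i^a) - f(x_i^p) \big\|_2^2
  - \big\| f(x_i^a) - f(x_i^n) \big\|_2^2 + \alpha \Big]_{+}
```

where f(x) is the embedding function, x_i^a, x_i^p, and x_i^n are the anchor, positive, and negative of the i-th triplet, α is the margin enforced between positive and negative pairs, and [·]_+ denotes max(·, 0).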
Google trained a few other models and compared the base FaceNet model with them. In Figure 5, the custom model specifications and their respective validation performance metrics have been shown:
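The core verification idea behind these embeddings, thresholding the L2 distance between 128-dimensional vectors, can be sketched in NumPy. Note that the embeddings and the threshold value below are random stand-ins for illustration, not outputs of the actual FaceNet model:

```python
import numpy as np

def l2_normalize(v):
    # FaceNet-style embeddings live on the unit hypersphere
    return v / np.linalg.norm(v)

def is_same_person(emb_a, emb_b, threshold=1.1):
    # Squared L2 distance between normalized embeddings;
    # distances below the threshold indicate the same identity
    dist = np.sum((l2_normalize(emb_a) - l2_normalize(emb_b)) ** 2)
    return dist < threshold

rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
close = anchor + rng.normal(scale=0.05, size=128)  # near-duplicate face
far = rng.normal(size=128)                         # unrelated face

print(is_same_person(anchor, close))  # True  (tiny distance)
print(is_same_person(anchor, far))    # False (distance near 2.0)
```

The same distance comparison underlies verification, recognition, and clustering in the embedding space.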
VGG-Face by University of Oxford (2015)
Highlights
- VGG-Face model works with 145 million parameters
- It is a 37-layer convolutional neural network with 11 building blocks
- Model trained on 2.6 million facial images
- Prediction accuracy: 98.95% on the LFW dataset and 97.3% on the YouTube Faces (YTF) dataset
The VGG-Face model was developed at the Department of Engineering Science, University of Oxford, by a research group known as the “Visual Geometry Group”, whose main goal is to advance visual understanding in machines and artificial intelligence. The model’s architecture is shown in Figure 6. The overall architecture is fairly simple: a stack of convolution + ReLU blocks, max pooling layers, and a final softmax activation function.
Following in the footsteps of FaceNet, the VGG-Face model uses the triplet loss function during training to learn the face embedding.
ArcFace (2018)
Highlights
- ArcFace model employs a convolutional neural network backbone
- Introduction of additive angular margin loss
- Uses cosine similarity for recognition
- Prediction accuracy: 99.40% on LFW dataset
The ArcFace architecture, shown in Figure 7, involves several key components. The backbone is used to extract high-level features from face images. These features capture important facial characteristics and are used to represent the input faces. On top of this, ArcFace introduces a fully connected layer, often referred to as the “ArcFace layer,” which computes the angular representation of the features.
This layer applies the arc-cosine function to the dot product of the feature vectors and the corresponding weight vectors. The resulting angles are then used to measure the similarity between different face identities. To enhance the discriminative power of the model, ArcFace incorporates a normalization technique known as additive angular margin. This margin enforces a desired separation between different identities in the angular space. By increasing the margin, the model can better distinguish between similar faces and improve the accuracy of the model. The mathematical implementation of the ArcFace approach has been shown below:
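With the feature and class-weight vectors normalized so that the logit for class j equals cos θ_j, the additive angular margin loss from the ArcFace paper is:

```latex
L = -\frac{1}{N} \sum_{i=1}^{N} \log
\frac{e^{\, s \cos(\theta_{y_i} + m)}}
     {e^{\, s \cos(\theta_{y_i} + m)} + \sum_{j \neq y_i} e^{\, s \cos \theta_j}}
```

where s is the feature scale, m is the additive angular margin, and y_i is the ground-truth class of sample i.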
During training, ArcFace optimizes the model parameters by minimizing this additive angular margin loss, a margin-based variant of the softmax loss. It encourages the correct class to have a higher probability than the other classes while simultaneously enforcing angular separation between classes, penalizing any deviation of the predicted angles from the target angles. In the illustration below, we can see a comparison between the softmax loss and the ArcFace implementation. It can be observed that softmax loss gives barely separable feature embeddings, while the ArcFace loss establishes a more evident gap between classes that are near each other.
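The effect of the margin on the target logit can be sketched numerically. The cosine similarities below are toy values, while s = 64 and m = 0.5 are the commonly used scale and margin:

```python
import numpy as np

def arcface_logits(cos_theta, target, s=64.0, m=0.5):
    # cos_theta: cosine similarity between the normalized feature and
    # each normalized class-weight vector, shape (num_classes,)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    logits = s * cos_theta
    # Add the angular margin m only to the ground-truth class angle
    logits[target] = s * np.cos(theta[target] + m)
    return logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

cos_theta = np.array([0.8, 0.5, 0.3])  # toy similarities; class 0 is correct
plain = softmax(64.0 * cos_theta)
margin = softmax(arcface_logits(cos_theta.copy(), target=0))

# The margin lowers the target logit, so during training the model
# must push cos(theta_target) even higher to regain the same probability
print(plain[0] > margin[0])  # True
```

This is why ArcFace embeddings end up with a wider angular gap between classes than plain softmax training produces.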
Popular Face Recognition Toolkits
In the previous section, we explored a few state-of-the-art models. What if you want to implement and test these pretrained models and experiment with them? For a beginner, doing this from scratch would be a complex and time-consuming process. To solve exactly this problem, multiple open-source toolkits have been developed.
Let’s take a quick look at a few face recognition toolkits that have the most traction on GitHub.
OpenCV Face Recognition (with Seventh Sense):
OpenCV Face Recognition represents the cutting-edge face recognition service resulting from the partnership between OpenCV, the leading computer vision library, and Seventh Sense, the creators of the world’s highest-rated face recognition technology.
With the announcement of this toolkit, developers can add face recognition capabilities to their applications with just a few lines of simple code. It is also worth noting that this implementation ranked in the top 10 in the 2022 NIST face recognition challenge. It does not require any prior ML experience or a GPU, as it is designed to be entirely API-based. The toolkit can be accessed using the built-in web interface. If you are interested in learning more, please check out this detailed article on OpenCV Face Recognition.
DeepFace:
It is a lightweight facial recognition and facial attribute analysis library built for Python. It features a robust pipeline that supports the detection, alignment, normalization, representation, and verification of faces. For detection, it supports popular backends such as OpenCV, MTCNN, RetinaFace, MediaPipe, Dlib, and SSD. The library also supports facial verification, checking whether a given pair of faces belong to the same person, along with NumPy array and base64-encoded image inputs.
DeepFace has a face embedding function with which the embeddings can be represented as multi-dimensional vectors. There is also a dedicated representation function that returns a list of embeddings from the input face image. The library can also perform facial attribute analysis, predicting attributes such as age, emotion, gender, and race. One of the biggest features of the DeepFace library is its broad model support, which includes VGG-Face, FaceNet, FaceNet512, OpenFace, DeepFace, DeepID, ArcFace, Dlib, and SFace.
For real-time analysis, the stream function analyzes a face once it has been detected in 5 consecutive frames. Lastly, DeepFace can be served to users as a REST API and can also be distributed as Docker containers or deployed on a Kubernetes cluster.
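The consecutive-frame trigger described above can be sketched in plain Python. This is a simplified stand-in where `detections` is a precomputed boolean per frame rather than an actual detector call, and the threshold of 5 mirrors the behavior described here:

```python
def frames_to_analyze(detections, frame_threshold=5):
    """Yield the frame indices at which analysis should fire: a face
    must be present for `frame_threshold` consecutive frames first."""
    streak = 0
    triggers = []
    for i, face_found in enumerate(detections):
        streak = streak + 1 if face_found else 0
        if streak == frame_threshold:
            triggers.append(i)
            streak = 0  # reset and wait for the next stable face
    return triggers

# A face appears in frames 2-8, disappears, then reappears in frames 12-18
detections = [False, False] + [True] * 7 + [False] * 3 + [True] * 7
print(frames_to_analyze(detections))  # [6, 16]
```

Gating the expensive recognition step on a stable detection like this avoids wasting compute on faces that flicker in and out of the frame.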
TFace:
TFace is an open-source face analysis research platform developed by the Tencent Youtu Lab. It has some helpful functionalities for dataset processing, such as single and multi-dataset support, with IndexParser, ImgSampleParser, and TFRecordSampleParser.
TFace has a backbone model zoo with ready-to-use implementations of ResNet (SE-ResNet), MobileFaceNet, EfficientNet, FBNet, and GhostNet. It also supports loss functions such as CurricularFace, DDL, CIFP, and SCF. In addition, it includes test protocols for benchmarking model latency on both ARM and x86 architectures.
InSightFace:
InSightFace is a 2D and 3D deep face analysis library. It efficiently implements a rich variety of state-of-the-art algorithms for face recognition, face detection, and face alignment, optimized for both training and deployment. It supports a collection of backbone architectures, including IResNet, RetinaNet, MobileFaceNet, InceptionResNet_v2, and DenseNet. Beyond models, it enables the use of facial datasets such as MS1M, VGG2, and CASIA-WebFace. InSightFace also provides evaluation pipelines such as IJB and MegaFace.
Face Detection and Recognition Datasets
The question is, “How many datasets are available for face detection and recognition?” In reality, there are way too many options to choose from, each with its own advantages and disadvantages. Let’s look into a few such open-source datasets.
Face Detection Datasets
- UMD Faces
- For still images: 367,888 facial annotations for 8,277 subjects
- For video frames: 3.7+ Million annotated video frames
- Link(Official): http://umdfaces.io/
- Wider Face
- Covers a variety of variations in facial data: scale, pose, occlusion, expression, illumination and makeup
- No. of images: 32,203 images, with 393,703 labeled faces within the dataset
- Link(Official): http://shuoyang1213.me/WIDERFACE/
Face Recognition Datasets
- Labeled Faces in the Wild – LFW
- No. of images: 13,233 images of 5,749 people, where 1,680 people have two or more images
- Link(Official): http://vis-www.cs.umass.edu/lfw/
- MS-Celeb-1M
- Diversity of face images has been preserved for each individual
- No. of images: 6,464,018 images
- No. of celebrities: 94,682 individuals
- Link(Unofficial): https://github.com/EB-Dodo/C-MS-Celeb
- VGG Face2
- Roughly balanced distribution between male and female faces
- No. of faces: 3.3+ Million
- No. of identities: 9,000+
- Variations in: pose, emotion, lighting and occlusion
- Link(Official): https://www.robots.ox.ac.uk/~vgg/data/vgg_face2/
- IMDB-Wiki:
- Combination of facial images from both IMDB and Wikipedia webpages
- No. of images (IMDB): 460,723 images
- No. of images (Wikipedia): 62,328 images
- Link(Official): https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/
- Digi-Face 1M
- Synthetic Face Data
- No. of images: 1+ Million
- 720K images with 10K identities (72 images/identity)
- 500K images with 100K identities (5 images/identity)
- Link(Official): https://microsoft.github.io/DigiFace1M/
Face Recognition Pipeline to Build a Consolidated System
Let’s assume that you are interested in building a system for surveillance. It might occur to you to ask, “What are the different steps involved in building a face recognition system from the ground up?”. Let’s take a step forward and get a high-level overview of the various processes involved in building this system. A block diagram has been shown in Figure 16.
Preprocessing of Input Face Image
There is a saying in AI, “A model is only as good as the data it is fed,” and it is very true indeed. Facial images are generally not usable directly in their raw form; in any image, a lot of detail lies hidden in the shadows or the highlights. Image preprocessing helps recover such detail and clean up raw facial image data. Operations such as resizing, grayscale conversion, histogram equalization, and the creation of training and validation splits are the most common preprocessing steps for any computer vision problem.
In the above illustration, we can see that two filters, grayscale and histogram equalization, have been applied on top of the original RGB image. It can be inferred that the facial features are more prominent after applying the histogram equalization filter. With such quality data to learn from, the system’s accuracy and recognition performance can be further improved.
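The histogram equalization step can be sketched in pure NumPy, mirroring what OpenCV's `equalizeHist` does. The input here is a synthetic low-contrast image standing in for a real face crop:

```python
import numpy as np

def equalize_histogram(gray):
    """Histogram equalization for an 8-bit grayscale image: remap
    intensities through the normalized cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Build a lookup table that stretches the occupied range to [0, 255]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]

# Synthetic low-contrast image: intensities squeezed into [100, 140]
rng = np.random.default_rng(0)
img = rng.integers(100, 141, size=(64, 64)).astype(np.uint8)
eq = equalize_histogram(img)

print(img.min(), img.max())  # 100 140
print(eq.min(), eq.max())    # 0 255
```

After equalization the narrow intensity band is stretched over the full 8-bit range, which is exactly why facial features become more prominent in the illustration above.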
Perform Face Detection
Given an input image, the model must first narrow down the location of the face in the frame. Multiple methods assist with this process. The best-known face detection technique is the Haar cascade-based detector, introduced in 2001. It uses Haar-like features to detect faces in the input image frame. By today’s standards, it is considered weak in both accuracy and speed, as it is known to mistake other patterns in the frame for human faces. It is also heavily dependent on lighting conditions and performs poorly in badly lit environments.
More recently, many deep learning-based models have been developed for face detection. MTCNN, introduced in 2016, takes advantage of a cascade structure made up of three stages of convolutional neural networks. There is also a built-in deep neural network-based face detector in OpenCV. It uses a Caffe model based on the SSD architecture with a ResNet-10 network as its backbone. Most deep learning-based models also support multi-face detection from a single input frame, where a bounding box is drawn around each detected face. An example of this process is shown in Figure 18.
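Detectors such as the SSD-based one produce overlapping candidate boxes that are pruned with non-maximum suppression before the final bounding boxes are drawn. A minimal sketch of that pruning step (the boxes and scores here are made-up values, not detector output):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection-over-union overlap
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Keep the highest-scoring box, drop overlapping lower-scoring ones
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two candidates on one face plus one candidate on a second face
boxes = [(10, 10, 60, 60), (12, 12, 62, 62), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.85]
print(nms(boxes, scores))  # [0, 2]
```

The duplicate candidate on the first face is suppressed, leaving one box per detected face.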
Transformation of Facial Data
If the system is meant to run in real time, the facial images from the live feed need to be rotated so that faces are presented to the model in a consistent orientation. When a person stands in front of the camera, their head will rarely be perfectly aligned for the system to recognize. This edge case needs to be handled, as it plays a big role in recognition. An example of this process is shown in Figure 19. It can be observed that Elon’s face is tilted in the input image, whereas, after the transformations were applied, his face appears straight in the image on the right.
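A common way to perform this alignment is to estimate the in-plane (roll) angle from the line joining the two eye centers and rotate the image back by that angle. A sketch of the angle computation, with illustrative eye coordinates:

```python
import math

def roll_angle(left_eye, right_eye):
    """Tilt of the face in degrees, from the line joining the eye
    centers (image coordinates, y grows downward). Rotating the
    image by -angle brings the eyes onto a horizontal line."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

# Tilted face: the right eye sits lower than the left eye
print(roll_angle((30, 40), (70, 60)))  # ~26.57 degrees
```

The actual rotation would then be applied with an affine warp (for example, OpenCV's rotation matrix utilities) before the face crop is passed to the model.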
Inference
This is the part where the transformed face frame is forward-passed through the model. Over the course of this article, we have looked at multiple models. Here, you can choose between using a pre-trained model and building a custom model from scratch. In short, the latter takes more time and is harder to develop. Inference can be performed on a real-time camera feed or on face images given as input to the trained model.
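At inference time, recognition typically reduces to matching the probe embedding against a gallery of enrolled embeddings. A sketch of that nearest-neighbor step, where the embeddings and the 0.6 cosine threshold are illustrative stand-ins rather than real model outputs:

```python
import numpy as np

def identify(probe, gallery, threshold=0.6):
    """Return the enrolled name whose embedding has the highest
    cosine similarity with the probe, or 'unknown' below threshold."""
    names = list(gallery)
    mat = np.stack([gallery[n] for n in names])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    p = probe / np.linalg.norm(probe)
    sims = mat @ p  # cosine similarity against every enrolled identity
    best = int(np.argmax(sims))
    return names[best] if sims[best] >= threshold else "unknown"

rng = np.random.default_rng(1)
gallery = {"elon": rng.normal(size=128), "ada": rng.normal(size=128)}
probe = gallery["elon"] + rng.normal(scale=0.1, size=128)  # new photo, same person

print(identify(probe, gallery))                  # elon
print(identify(rng.normal(size=128), gallery))   # unknown
```

The "unknown" branch is what lets a surveillance system reject faces that were never enrolled instead of forcing a match.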
In the above video illustration, it can be seen that the face in the frame is detected, and the model has successfully recognized the person as Elon Musk.
Deployment
To perform inference, the model must first be deployed to some platform. Let’s look into a few well-known options for deploying models.
Local Deployment
In this approach, the model is deployed on a local machine or a dedicated server. It is integrated into the application or system where it will be used. This method is suitable for applications with low latency requirements or when the model needs to access local resources such as GPUs.
Cloud-based Deployment
Cloud platforms provide infrastructure and services to deploy custom models. The model is uploaded to a cloud service provider (Amazon Web Services, GCP, or Microsoft Azure), which manages the deployment and scalability. Cloud-based deployment offers flexibility, scalability, and ease of integration with other cloud services.
Edge Deployment
Edge deployment involves deploying the model directly on edge devices, such as smartphones, IoT devices, or edge servers. This approach enables real-time processing and reduces the need for constant data transfer to the cloud. Edge deployment suits applications with low-latency requirements or where data privacy and bandwidth constraints are important. One downside is that models usually have to be compressed to fit on-device, which can reduce prediction accuracy, and inference on constrained hardware can be considerably slower.
Conclusion
Deep learning models have revolutionized the field of face recognition, achieving remarkable results in accuracy and robustness. Convolutional neural networks (CNNs) have emerged as the predominant architecture for computer vision tasks, enabling the learning of discriminative features directly from raw pixel data. Models like DeepFace, FaceNet, and ArcFace have demonstrated state-of-the-art performance and paved the way for further breakthroughs in this domain.
The development of ready-to-use toolkits such as DeepFace, TFace, and InSightFace has been instrumental in the widespread adoption of this technology. The success of deep learning-based face recognition can also be largely attributed to the availability of diverse and large-scale datasets.
Datasets like LFW, IMDB-Wiki, and MS-Celeb-1M have provided researchers and practitioners with valuable resources for training and evaluating face recognition models. These datasets encompass a wide range of variations in pose, lighting conditions, facial expressions, and identities, enabling models to generalize well across different scenarios.
It is important to note that this technology raises important ethical and privacy considerations. As face recognition becomes more pervasive in society, it is crucial to implement appropriate safeguards to protect individuals’ privacy and ensure responsible usage of the technology.
Your thoughts and insights matter to us. Please feel free to share any reflections, questions, or observations you may have in the comment section below. Whether you have a different perspective, additional information, or even a fresh doubt, your participation helps enrich this learning space. Let’s continue this conversation and further our collective understanding of models used for face recognition.
References
By diving into these resources, you’ll significantly expand your knowledge about face recognition. So make sure you don’t overlook them.
For easy access, consider bookmarking this page. Dive in and elevate your skills to the next level!
Face Recognition – Introduction for Beginners – Nearly Everything You Need to Know
Anti Spoofing Face Recognition System using OAK-D and DepthAI
What is Face Detection? Ultimate Guide 2023 + Model Comparison
Face Detection – Dlib, OpenCV, and Deep Learning ( C++ / Python )
FaceNet: A Unified Embedding for Face Recognition and Clustering
ArcFace: Additive Angular Margin Loss for Deep Face Recognition
A Novel Face Recognition and Temperature Detection System – FRTDS