
Create Snapchat/Instagram Filters Using Mediapipe

A tutorial for using Mediapipe’s Face-Mesh to create Augmented Reality Facial Filters. We are sharing the code in Python.
  1. Introduction
  2. Augmented Reality
  3. How AR filters work
  4. MediaPipe
    1. Why MediaPipe
    2. Face Mesh
  5. Implementation
    1. Prerequisites
    2. Installing MediaPipe
    3. Pipeline overview
    4. Landmark points from Face Mesh
    5. Selecting points we need
    6. Getting filters
    7. Annotating filters with points
    8. Code overview
  6. Final results
  7. Conclusion
    1. Limitations
    2. Possibilities

Introduction

Why invest in make-up or trendy clothes and eyewear when Snapchat and Instagram provide filters that can make you look as wild, exotic, or beautiful as you wish, all in a matter of seconds?

Maybe you just want to goof around and turn into a witch, Santa, or your favorite fictional character simply by adding a filter to your face. There are hundreds of such filters, all powered by Augmented Reality (AR).

In this post, we will learn how these Augmented Reality Filters work and how we can create our own filters using a framework called Mediapipe!

Augmented Reality

Augmented Reality (AR) is an interactive experience of a real-world environment where computer-generated digital objects can reside. It is a combination of the real-world and virtual world, whereas Virtual Reality (VR) completely replaces the real environment with a virtual one.

An example of an Augmented Reality face filter

Thanks to the advancements in AI, you can build these seemingly magical Augmented Reality filters and run them on the most modest of hardware. 

Augmented Reality has a lot of applications in different domains, such as

  • Gaming: We have all heard about PokemonGo. This game allows you to catch and collect virtual cartoon characters.
  • Retail: Imagine how seamless the shopping experience would be if you could try on clothes with the click of a button.
  • Marketing: Companies can create new experiences to attract the eyes of consumers.
  • Education: New interactive AR experiences can be designed to aid the learning process. 

If you have heard of the Metaverse, you know that Facebook is investing heavily in AR. Apple, too, came out with Animojis. But by far the most popular application of AR is the camera filters on social media apps like Instagram and Snapchat.

Let’s have a closer look at how they work.

How AR filters work

Although it doesn’t seem like it, there are several technologies working together to produce a simple camera filter on your face. Let’s look at the different parts of the puzzle and how they work together.

  1. Face detection:
    First and foremost, your face is detected in the camera frame. In the past, this was achieved using crude detection methods such as the Viola-Jones algorithm, Histogram of Oriented Gradients (HOG), etc. Today, with advances in machine learning and increasingly powerful mobile hardware, it is possible to get good face detection at a modest computational load.
  2. Feature key points detection:
    Once the face is detected, the next step is to identify the key points of your facial features. A Facial Landmark predictor is used for this, which aligns the predefined set of feature points to your face.
  3. Applying filter: Finally, the detected key points mesh can be morphed, or a filter can be applied on top of it to change the appearance of the face.

Now that we understand how it works, let’s implement a couple of face filters using MediaPipe.

MediaPipe

MediaPipe Logo

Source: https://mediapipe.dev/assets/img/brand.svg

MediaPipe is an open-source, cross-platform Machine Learning framework developed by Google researchers. It provides customizable ML solutions. 

Although the Mediapipe project is still in the Alpha stage, its solutions have already been deployed in many everyday applications that we use. Google’s ‘Motion Stills’ and YouTube’s ‘Privacy Blur’ feature are examples.

Why MediaPipe? 

On top of its lightweight and blazingly fast performance, MediaPipe offers cross-platform compatibility. The idea is to build an ML model once and deploy it on different platforms and devices with reproducible results.

MediaPipe Solutions

Source: https://google.github.io/mediapipe/

It supports Python, C++, JavaScript, Android, and iOS, and provides solutions such as Face Detection, Face Mesh, Facial Landmark Detection, Person Segmentation, Object Detection, Human Pose Estimation, etc. Can one ask for more?

Here, we shall focus on Face Mesh.

Overview of MediaPipe Face Mesh

MediaPipe Face Mesh demo

Source: https://google.github.io/mediapipe/solutions/face_mesh.html

MediaPipe Face Mesh provides a whopping 468 3D face landmarks in real-time, even on mobile devices. Both iOS and Android are supported, so you can build mobile apps with it and give Snapchat a run for its money. In this blog post, we will use MediaPipe with Python and OpenCV to implement AR filters.

Like any facial-landmark model, Mediapipe starts with face detection and then detects landmarks on the detected face. For face detection, it uses BlazeFace, which, as the name suggests, is extremely fast, lightweight, and optimized for mobile GPU inference. The face detector outputs a cropped region of the video frame, and we then run the 3D-landmark model on that cropped area.
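
MediaPipe also exposes this face-detection stage as a standalone solution. If you are curious, the minimal snippet below (reusing the face.jpg image used later in this post) prints the normalized bounding box of each detected face; it is not required for the filter pipeline itself.

import cv2
import mediapipe as mp

mp_face_detection = mp.solutions.face_detection

img = cv2.imread('filters/face.jpg')
with mp_face_detection.FaceDetection(min_detection_confidence=0.5) as face_detection:
    results = face_detection.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

if results.detections:
    for detection in results.detections:
        # The bounding box coordinates are normalized to [0, 1].
        box = detection.location_data.relative_bounding_box
        print(box.xmin, box.ymin, box.width, box.height)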

Implementation

Prerequisites

  1. Python environment
  2. Python modules:
    1. OpenCV
    2. numpy

Installing MediaPipe

Use the following pip command to install the MediaPipe Python package.

pip install mediapipe
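
You can quickly verify the installation from a terminal; the version printed will depend on your environment.

python -c "import mediapipe as mp; print(mp.__version__)"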

Pipeline Overview

Before getting into the nitty-gritty of things, let’s discuss our pipeline for implementing AR filters using Mediapipe and OpenCV:

  1. Detect 468 Facial Landmarks using Mediapipe Face Mesh.
  2. Select the relevant landmarks, since we do not need all 468 of them.
  3. Annotate the filters with the selected landmarks.
  4. Load annotations.
  5. Detect landmark points of the face.
  6. Stabilize the landmark points.
  7. Transform the Filter on the face using the landmarks.

Now that you have an overview of the pipeline, let’s check the details.

Landmark points from Face Mesh

We start by importing MediaPipe. Next, we create an instance of Face Mesh with two configurable parameters for detecting and tracking landmarks.

min_detection_confidence=0.5 

min_tracking_confidence=0.5

Finally, we pass in an input image and receive a list of face objects.  

import cv2
import numpy as np
import mediapipe as mp
 
# Configuration Face Mesh.
mp_face_mesh = mp.solutions.face_mesh
face_mesh =  mp_face_mesh.FaceMesh(min_detection_confidence=0.5, min_tracking_confidence=0.5)
 
img = cv2.imread('filters/face.jpg', cv2.IMREAD_UNCHANGED)
image = cv2.cvtColor(cv2.flip(img, 1), cv2.COLOR_BGR2RGB)
# To improve performance.
image.flags.writeable = False
results =  face_mesh.process(image)

Every face in the list contains 468 landmark points. 

Note: The coordinates for the landmark points are normalized to lie between 0 and 1. Before using them, we need to multiply the x and y coordinates with the image’s width and height, respectively. The following code can get you any landmark from the detected ones.

(results.multi_face_landmarks[0].landmark[205].x * image.shape[1], results.multi_face_landmarks[0].landmark[205].y * image.shape[0])

The above code snippet gives us the x and y coordinates of the 205th landmark point of the first face detected in the image. You can find the order of the landmark points here.
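
If you need this conversion often, it can be wrapped in a small helper. The function name landmark_to_pixel below is ours, purely for illustration; it is not part of MediaPipe.

def landmark_to_pixel(face_landmarks, idx, image):
    # Convert the normalized landmark at index idx to integer pixel coordinates.
    h, w = image.shape[:2]
    lm = face_landmarks.landmark[idx]
    return int(lm.x * w), int(lm.y * h)

# For example, the pixel location of landmark 205 on the first detected face:
# x, y = landmark_to_pixel(results.multi_face_landmarks[0], 205, image)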

The 468 landmarks plotted on a face using MediaPipe Face Mesh

Selecting the relevant landmark points

For our application, we have selected the points of the key facial features, i.e., the eyes, nose, lips, eyebrows, and jawline. These are 75 landmark points, similar to those provided by Dlib’s 68-point face landmark detector, with a few points for the forehead added in.

The 75 selected landmark points

Getting the Filters

Now that we have the coordinates of the landmarks, we will use them to overlay the face with filters. The filters are simply PNG images with transparency.

Here’s a selection of filters:

Full-face filters

The above filters will cover the entire face.
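
Before annotating, it is worth confirming that a filter image actually carries the alpha channel we rely on. A quick check, using one of the filter files referenced later, could look like this:

import cv2

filter_img = cv2.imread('filters/anonymous.png', cv2.IMREAD_UNCHANGED)

# A PNG with transparency is loaded with 4 channels: B, G, R and alpha.
print(filter_img.shape)                # e.g. (height, width, 4)
print(filter_img.shape[2] == 4)        # True if the alpha channel is present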

Annotating filters with points

For annotating the face filter with the selected points, we use a simple annotation tool called Makesense, which runs in the browser.

Face filter annotation using Makesense

We need to take care of the order of the points. Below are the labels for the points we will be using.

Landmarks Labels

Once all the required points are annotated, the annotations can be exported, and we will have a CSV file containing the coordinates of all the points.

Annotations file
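
The exact values depend on your annotations, but the load_landmarks function shown later only reads the first three columns of each row: a point label followed by its x and y pixel coordinates. The first few rows of such a file might look roughly like this (the numbers are made up for illustration, and any extra columns from the export are simply ignored by the loader):

0,570,504
1,575,610
2,585,716
3,601,819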

Code overview

Import the required modules

We will be using a faceBlendCommon.py file that contains some function definitions for ‘Delaunay triangulation’ and ‘triangle warping’. We’ll see what they mean when we call them in the code.

import mediapipe as mp
import cv2
import math
import numpy as np
import faceBlendCommon as fbc
import csv

Define paths for the filter images and annotation files

VISUALIZE_FACE_POINTS = False

filters_config = {
    'anonymous':
        [{'path': "filters/anonymous.png",
         'anno_path': "filters/anonymous_annotations.csv",
         'morph': True, 'animated': False, 'has_alpha': True}],
    'anime':
        [{'path': "filters/anime.png",
         'anno_path': "filters/anime_annotations.csv",
         'morph': True, 'animated': False, 'has_alpha': True}],
    'dog':
        [{'path': "filters/dog-ears.png",
         'anno_path': "filters/dog-ears_annotations.csv",
         'morph': False, 'animated': False, 'has_alpha': True},
         {'path': "filters/dog-nose.png",
          'anno_path': "filters/dog-nose_annotations.csv",
          'morph': False, 'animated': False, 'has_alpha': True}],
    'cat':
        [{'path': "filters/cat-ears.png",
         'anno_path': "filters/cat-ears_annotations.csv",
         'morph': False, 'animated': False, 'has_alpha': True},
         {'path': "filters/cat-nose.png",
          'anno_path': "filters/cat-nose_annotations.csv",
          'morph': False, 'animated': False, 'has_alpha': True}],
}

Define a function to get landmarks from MediaPipe

def getLandmarks(img):
    mp_face_mesh = mp.solutions.face_mesh
    selected_keypoint_indices = [127, 93, 58, 136, 150, 149, 176, 148, 152, 377, 400, 378, 379, 365, 288, 323, 356, 70, 63, 105, 66, 55,
                 285, 296, 334, 293, 300, 168, 6, 195, 4, 64, 60, 94, 290, 439, 33, 160, 158, 173, 153, 144, 398, 385,
                 387, 466, 373, 380, 61, 40, 39, 0, 269, 270, 291, 321, 405, 17, 181, 91, 78, 81, 13, 311, 306, 402, 14,
                 178, 162, 54, 67, 10, 297, 284, 389]

    height, width = img.shape[:-1]

    with mp_face_mesh.FaceMesh(max_num_faces=1, static_image_mode=True, min_detection_confidence=0.5) as face_mesh:

        results = face_mesh.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

        if not results.multi_face_landmarks:
            print('Face not detected!!!')
            return 0

        for face_landmarks in results.multi_face_landmarks:
            values = np.array(face_landmarks.landmark)
            face_keypnts = np.zeros((len(values), 2))

            for idx,value in enumerate(values):
                face_keypnts[idx][0] = value.x
                face_keypnts[idx][1] = value.y

            # Convert normalized points to image coordinates
            face_keypnts = face_keypnts * (width, height)
            face_keypnts = face_keypnts.astype('int')

            relevant_keypnts = []

            for i in selected_keypoint_indices:
                relevant_keypnts.append(face_keypnts[i])
            return relevant_keypnts
    return 0

Function to load filter image

Get alpha channel from filter image for later use and convert image to BGR

def load_filter_img(img_path, has_alpha):
    # Read the image
    img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)

    alpha = None
    if has_alpha:
        b, g, r, alpha = cv2.split(img)
        img = cv2.merge((b, g, r))

    return img, alpha

Load landmark points from the annotations file

def load_landmarks(annotation_file):
    with open(annotation_file) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=",")
        points = {}
        for i, row in enumerate(csv_reader):
            # Skip the header row or empty lines if present
            try:
                x, y = int(row[1]), int(row[2])
                points[row[0]] = (x, y)
            except ValueError:
                continue
        return points

Find a convex hull for Delaunay triangulation using the landmark points. 

Then, to the array of convex hull points, we add the points for the facial features lying inside the hull, i.e., the lips, eyes, nose, and eyebrows. We add these points so that the filter fits better and can be morphed as we change the expressions on our face.

def find_convex_hull(points):
    hull = []
    hullIndex = cv2.convexHull(np.array(list(points.values())), clockwise=False, returnPoints=False)
    addPoints = [
        [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59],  # Outer lips
        [60], [61], [62], [63], [64], [65], [66], [67],  # Inner lips
        [27], [28], [29], [30], [31], [32], [33], [34], [35],  # Nose
        [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47],  # Eyes
        [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]  # Eyebrows
    ]
    hullIndex = np.concatenate((hullIndex, addPoints))
    for i in range(0, len(hullIndex)):
        hull.append(points[str(hullIndex[i][0])])

    return hull, hullIndex

The function below loads a filter and applies Delaunay triangulation if the filter needs to be morphed (warped) onto the face.

def load_filter(filter_name="dog"):

    filters = filters_config[filter_name]

    multi_filter_runtime = []

    for filter in filters:
        temp_dict = {}

        img1, img1_alpha = load_filter_img(filter['path'], filter['has_alpha'])

        temp_dict['img'] = img1
        temp_dict['img_a'] = img1_alpha

        points = load_landmarks(filter['anno_path'])

        temp_dict['points'] = points

        if filter['morph']:
            # Find convex hull for delaunay triangulation using the landmark points
            hull, hullIndex = find_convex_hull(points)

            # Find Delaunay triangulation for convex hull points
            sizeImg1 = img1.shape
            rect = (0, 0, sizeImg1[1], sizeImg1[0])
            dt = fbc.calculateDelaunayTriangles(rect, hull)

            temp_dict['hull'] = hull
            temp_dict['hullIndex'] = hullIndex
            temp_dict['dt'] = dt

            if len(dt) == 0:
                continue

        if filter['animated']:
            filter_cap = cv2.VideoCapture(filter['path'])
            temp_dict['cap'] = filter_cap

        multi_filter_runtime.append(temp_dict)

    return filters, multi_filter_runtime

What is Delaunay Triangulation?

Given a set of points in a plane, triangulation refers to the subdivision of the plane into triangles, with the points as vertices. Delaunay triangulation stands out because it has some nice properties: the triangles are chosen such that no point lies inside the circumcircle of any triangle.

In the following Figure, we see a set of landmarks in the left image and the triangulation in the right image. To know more about how Delaunay triangulation works, refer to this blog.

Delaunay triangulation
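
The faceBlendCommon module itself is not listed in this post, but the calculateDelaunayTriangles helper used earlier can be built on OpenCV’s cv2.Subdiv2D. The function below is only a rough sketch of one possible implementation under that assumption (cv2 is assumed to be imported, as in the rest of the code), not the actual faceBlendCommon code. It returns each triangle as a tuple of indices into the input point list.

def calculateDelaunayTriangles(rect, points):
    # Insert all points into an OpenCV planar subdivision of the rectangle.
    subdiv = cv2.Subdiv2D(rect)
    for p in points:
        subdiv.insert((int(p[0]), int(p[1])))

    def inside(x, y):
        return rect[0] <= x <= rect[0] + rect[2] and rect[1] <= y <= rect[1] + rect[3]

    def nearest_index(x, y):
        # Map a triangle vertex back to its index in the original point list.
        return min(range(len(points)),
                   key=lambda i: (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2)

    triangles = []
    for t in subdiv.getTriangleList():
        vertices = [(t[0], t[1]), (t[2], t[3]), (t[4], t[5])]
        # Keep only triangles whose vertices all lie inside the rectangle.
        if all(inside(x, y) for x, y in vertices):
            triangles.append(tuple(nearest_index(x, y) for x, y in vertices))
    return triangles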

After we have the Delaunay triangulation for the filter, we can find the same triangulation for the face of the person.

Once we have triangulations of both images, we can transform the triangles from one image (the filter) to fit onto the other image (the person’s face). We achieve this using the Affine transform, which requires 3 points (i.e., a triangle) on each image to warp one onto the other. To know more about how the Affine transform works, refer to this blog.

Warp triangles
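
Likewise, the warpTriangle helper can be expressed with cv2.boundingRect, cv2.getAffineTransform, and cv2.warpAffine. Again, this is a simplified, illustrative sketch rather than the exact faceBlendCommon implementation (cv2 and numpy, imported as np, are assumed).

def warpTriangle(img1, img2, t1, t2):
    # Bounding boxes of the source and destination triangles.
    r1 = cv2.boundingRect(np.float32([t1]))
    r2 = cv2.boundingRect(np.float32([t2]))

    # Triangle coordinates relative to their bounding boxes.
    t1_rect = [(p[0] - r1[0], p[1] - r1[1]) for p in t1]
    t2_rect = [(p[0] - r2[0], p[1] - r2[1]) for p in t2]

    # Mask covering the destination triangle.
    mask = np.zeros((r2[3], r2[2], 3), dtype=np.float32)
    cv2.fillConvexPoly(mask, np.int32(t2_rect), (1.0, 1.0, 1.0), 16, 0)

    # Affine transform mapping the source triangle onto the destination triangle.
    img1_rect = img1[r1[1]:r1[1] + r1[3], r1[0]:r1[0] + r1[2]]
    warp_mat = cv2.getAffineTransform(np.float32(t1_rect), np.float32(t2_rect))
    warped = cv2.warpAffine(img1_rect, warp_mat, (r2[2], r2[3]),
                            flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT_101)

    # Blend the warped triangle into the destination image.
    roi = img2[r2[1]:r2[1] + r2[3], r2[0]:r2[0] + r2[2]]
    img2[r2[1]:r2[1] + r2[3], r2[0]:r2[0] + r2[2]] = roi * (1 - mask) + warped * mask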

Create a VideoCapture object to process input from a webcam or video file

# Process input from webcam or video file
cap = cv2.VideoCapture(0)

# Some variables we will need later
count = 0
isFirstFrame = True
sigma = 50

# Load an initial filter
iter_filter_keys = iter(filters_config.keys())
filters, multi_filter_runtime = load_filter(next(iter_filter_keys))

We also use Optical Flow to stabilize the points in each video frame.

What is Optical Flow?

The main idea of Optical Flow is to estimate the object’s displacement vector caused by its motion or camera movements. Using that, we can predict the location of landmarks in the next frame and stabilize them.

Optical Flow arrows
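
The stabilization code below also calls fbc.constrainPoint to keep every landmark inside the frame. Assuming it simply clamps the coordinates, a minimal sketch of such a helper could be:

def constrainPoint(p, w, h):
    # Clamp a point so that it stays within a w x h image.
    x = min(max(int(p[0]), 0), w - 1)
    y = min(max(int(p[1]), 0), h - 1)
    return (x, y)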

Main loop to apply the filter to frames of the video

# The main loop
while True:

    ret, frame = cap.read()
    if not ret:
        break
    else:

        points2 = getLandmarks(frame)

        # Skip the frame if the face is not detected or is only partially detected
        if not points2 or (len(points2) != 75):
            continue

        ################ Optical Flow and Stabilization Code #####################
        img2Gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        if isFirstFrame:
            points2Prev = np.array(points2, np.float32)
            img2GrayPrev = np.copy(img2Gray)
            isFirstFrame = False

        lk_params = dict(winSize=(101, 101), maxLevel=15,
                         criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 0.001))
        points2Next, st, err = cv2.calcOpticalFlowPyrLK(img2GrayPrev, img2Gray, points2Prev,
                                                        np.array(points2, np.float32),
                                                        **lk_params)

        # Final landmark points are a weighted average of detected landmarks and tracked landmarks

        for k in range(0, len(points2)):
            d = cv2.norm(np.array(points2[k]) - points2Next[k])
            alpha = math.exp(-d * d / sigma)
            points2[k] = (1 - alpha) * np.array(points2[k]) + alpha * points2Next[k]
            points2[k] = fbc.constrainPoint(points2[k], frame.shape[1], frame.shape[0])
            points2[k] = (int(points2[k][0]), int(points2[k][1]))

        # Update variables for next pass
        points2Prev = np.array(points2, np.float32)
        img2GrayPrev = img2Gray
        ################ End of Optical Flow and Stabilization Code ###############

        if VISUALIZE_FACE_POINTS:
            for idx, point in enumerate(points2):
                cv2.circle(frame, point, 2, (255, 0, 0), -1)
                cv2.putText(frame, str(idx), point, cv2.FONT_HERSHEY_SIMPLEX, .3, (255, 255, 255), 1)
            cv2.imshow("landmarks", frame)

        for idx, filter in enumerate(filters):

            filter_runtime = multi_filter_runtime[idx]
            img1 = filter_runtime['img']
            points1 = filter_runtime['points']
            img1_alpha = filter_runtime['img_a']

            if filter['morph']:

                hullIndex = filter_runtime['hullIndex']
                dt = filter_runtime['dt']
                hull1 = filter_runtime['hull']

                # create copy of frame
                warped_img = np.copy(frame)

                # Find convex hull
                hull2 = []
                for i in range(0, len(hullIndex)):
                    hull2.append(points2[hullIndex[i][0]])

                mask1 = np.zeros((warped_img.shape[0], warped_img.shape[1]), dtype=np.float32)
                mask1 = cv2.merge((mask1, mask1, mask1))
                img1_alpha_mask = cv2.merge((img1_alpha, img1_alpha, img1_alpha))

                # Warp the triangles
                for i in range(0, len(dt)):
                    t1 = []
                    t2 = []

                    for j in range(0, 3):
                        t1.append(hull1[dt[i][j]])
                        t2.append(hull2[dt[i][j]])

                    fbc.warpTriangle(img1, warped_img, t1, t2)
                    fbc.warpTriangle(img1_alpha_mask, mask1, t1, t2)

                # Blur the mask before blending
                mask1 = cv2.GaussianBlur(mask1, (3, 3), 10)

                mask2 = (255.0, 255.0, 255.0) - mask1

                # Perform alpha blending of the two images
                temp1 = np.multiply(warped_img, (mask1 * (1.0 / 255)))
                temp2 = np.multiply(frame, (mask2 * (1.0 / 255)))
                output = temp1 + temp2
            else:
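                # Non-morphing filters (e.g. the dog and cat ears/nose) are simply placed
                # with a similarity transform computed from the first two annotated points.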
                dst_points = [points2[int(list(points1.keys())[0])], points2[int(list(points1.keys())[1])]]
                tform = fbc.similarityTransform(list(points1.values()), dst_points)
                # Apply similarity transform to input image
                trans_img = cv2.warpAffine(img1, tform, (frame.shape[1], frame.shape[0]))
                trans_alpha = cv2.warpAffine(img1_alpha, tform, (frame.shape[1], frame.shape[0]))
                mask1 = cv2.merge((trans_alpha, trans_alpha, trans_alpha))

                # Blur the mask before blending
                mask1 = cv2.GaussianBlur(mask1, (3, 3), 10)

                mask2 = (255.0, 255.0, 255.0) - mask1

                # Perform alpha blending of the two images
                temp1 = np.multiply(trans_img, (mask1 * (1.0 / 255)))
                temp2 = np.multiply(frame, (mask2 * (1.0 / 255)))
                output = temp1 + temp2

            frame = output = np.uint8(output)

        cv2.putText(frame, "Press F to change filters", (10, 20), cv2.FONT_HERSHEY_SIMPLEX, .5, (255, 0, 0), 1)

        cv2.imshow("Face Filter", output)

        keypressed = cv2.waitKey(1) & 0xFF
        if keypressed == 27:
            break
        # Put next filter if 'f' is pressed
        elif keypressed == ord('f'):
            try:
                filters, multi_filter_runtime = load_filter(next(iter_filter_keys))
            except StopIteration:
                iter_filter_keys = iter(filters_config.keys())
                filters, multi_filter_runtime = load_filter(next(iter_filter_keys))

        count += 1

cap.release()
cv2.destroyAllWindows()

Results

Check out the results of our code:

Conclusion

Limitations

It is important to note that we are only working in 2 dimensions, i.e., the x-y plane. Although the result gives the illusion of being 3D, our approach falls short when 3D object filters, like sunglasses, come into play.

Facemesh 3D Filter

Source: https://google.github.io/mediapipe/solutions/face_mesh.html

Possibilities

Facemesh AR effects

Source: https://google.github.io/mediapipe/solutions/face_mesh.html

Don’t forget that the filters we have used so far have all been two-dimensional images. Mediapipe also provides a z-axis, which can be used to create 3D filters with a far more convincing effect.

Just switch to 3D filters and see for yourself how realistic and exciting the results are. The possibilities are only limited by your imagination and creativity. 
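
As a starting point, the z coordinate is already present on every Face Mesh landmark, so reading it requires only a small change. In the sketch below (reusing the image and results from the earlier example), z is a relative depth on roughly the same scale as x, with smaller values meaning the landmark is closer to the camera.

lm = results.multi_face_landmarks[0].landmark[205]
x = lm.x * image.shape[1]
y = lm.y * image.shape[0]
z = lm.z * image.shape[1]  # relative depth; smaller values are closer to the camera
print(x, y, z)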

More on Mediapipe

Hang on; the journey doesn’t end here. We have some more exciting blog posts for you to explore!!!

1. Building a Poor Body Posture Detection and Alert System using MediaPipe
2. Gesture Control in Zoom Call using Mediapipe
3. Center Stage for Zoom Calls using MediaPipe
4. Drowsy Driver Detection using Mediapipe
5. Comparing Yolov7 and Mediapipe Pose Estimation models

Never Stop Learning!!!

References

  1. https://google.github.io/mediapipe/solutions/face_mesh.html
  2. https://arxiv.org/abs/1907.05047
  3. https://arxiv.org/abs/1907.06724

