Correct posture is key to a person's overall well-being. It keeps us healthy, energetic, and more active. People who make a habit of maintaining good posture are less likely to suffer from back pain, migraines, and related issues. Good posture is also a sign of solid personal discipline and helps leave a good impression. However, maintaining correct posture can be difficult, as we often forget about it.
This blog post will walk you through the steps required to build a solution for that. You will learn how to use MediaPipe to create a human posture analysis system. We will use side-view samples to perform analysis and draw conclusions. A practical application would require a webcam pointed at the side view of the person. Using the results, you can quickly build a bad posture alert system. If you are new to MediaPipe, we recommend checking out the post on Introduction to MediaPipe.
MediaPipe Pose
MediaPipe Pose is a high-fidelity body pose tracking solution that infers 33 3D landmarks and a background segmentation mask for the whole body from RGB frames (note that it expects RGB, not BGR, image frames). It utilizes the BlazePose topology, a superset of the COCO, BlazeFace, and BlazePalm topologies.
Fig: BlazePose Topology
OBJECTIVE
Our goal is to detect a person from a proper side view and measure the neck and torso inclination with respect to a vertical reference axis. By monitoring these inclination angles, we can tell when the person bends beyond a certain threshold angle. Other features include measuring the time spent in a particular posture and checking the camera alignment. We need to make sure that the camera looks at the proper side view; hence the alignment feature is required.
APPLICATION WORKFLOW
PREREQUISITES
OpenCV and MediaPipe are the primary packages that we will need. Use the requirements.txt file provided within the code folder to install the dependencies.
pip install -r requirements.txt
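If you don't have the requirements file handy, the two main packages mentioned above can also be installed directly (a minimal equivalent; versions left unpinned):
pip install opencv-python mediapipe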
You will need a working knowledge of OpenCV Python to understand this code. New to OpenCV? Here is a tutorial for beginners.
FULL CODE DOWNLOAD
Ensure that you go through the explanation once before running the script to avoid errors.
CODE EXPLANATION
1. Import Libraries
import cv2
import time
import math as m
import mediapipe as mp
2. Function to calculate offset distance
The setup requires the person to be in a proper side view. The function findDistance helps us determine the offset distance between two points. These can be the hip points, the eyes, or the shoulders, as these pairs are more or less symmetric about the central axis of the human body. With this, we will incorporate the camera alignment feature into the script.
def findDistance(x1, y1, x2, y2):
    dist = m.sqrt((x2 - x1)**2 + (y2 - y1)**2)
    return dist
3. Function to calculate the inclination
The angle is the primary deterministic factor for the posture. We use the angles subtended by the neckline and the torso line with respect to the vertical (y) axis. The neckline connects the shoulder and the ear, with the shoulder taken as the pivotal point. Similarly, the torso line connects the hip and the shoulder, with the hip considered the pivotal point.
Fig. Neckline inclination measurement
Taking the neckline as an example, the points are P1(x1, y1) (shoulder), P2(x2, y2) (ear), and P3(x3, y3) (any point on the vertical axis passing through P1). So the x-coordinate of P3 is the same as that of P1. Since any y value works for P3, let's take y3 = 0 for simplicity.
We take the vector approach to find the inner angle formed at P1 by the three points. The angle between the vectors P1P2 and P1P3 is given by the standard dot-product formula.
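With P3 = (x1, 0), the vector P1P3 = (0, -y1) points straight up the image and has magnitude y1, while P1P2 = (x2 - x1, y2 - y1). Substituting into the dot-product formula gives the following relation (a reconstruction of what the code below implements):
theta = arccos( (y2 - y1) * (-y1) / ( y1 * sqrt((x2 - x1)^2 + (y2 - y1)^2) ) )
This is exactly the expression evaluated inside findAngle, converted from radians to degrees for readability.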
# Calculate angle.
def findAngle(x1, y1, x2, y2):
    # Inner angle at P1 between the line P1-P2 and the vertical axis, in degrees.
    theta = m.acos((y2 - y1) * (-y1) / (m.sqrt(
        (x2 - x1)**2 + (y2 - y1)**2) * y1))
    degree = m.degrees(theta)
    return degree
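A quick sanity check with illustrative coordinates (our own example values): an ear directly above the shoulder should give 0 degrees of neck inclination, while an ear shifted 100 px forward for a 100 px rise should give 45 degrees.
# Ear directly above the shoulder: upright neck.
print(findAngle(200, 400, 200, 300))  # ~0.0
# Ear 100 px forward and 100 px up from the shoulder.
print(findAngle(200, 400, 300, 300))  # ~45.0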
4. Function to send alerts
Use this function to send alerts when bad posture is detected. Feel free to get creative and customize it to your needs. For example, you can connect a Telegram bot for alerts, which is quite easy. Check out Telegram Bots here. Or you can take it up a notch by building an Android app.
def sendWarning():
    pass
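As one possible implementation (a sketch, not part of the original code): the Telegram Bot API exposes a sendMessage endpoint, so a minimal alert could be pushed with the requests package. Here BOT_TOKEN and CHAT_ID are placeholders you would obtain from BotFather and your own chat.
import requests  # extra dependency assumed for this sketch

BOT_TOKEN = '<your-bot-token>'  # placeholder
CHAT_ID = '<your-chat-id>'      # placeholder

def sendWarning():
    # Push a plain-text alert through the Telegram Bot API.
    url = 'https://api.telegram.org/bot{}/sendMessage'.format(BOT_TOKEN)
    payload = {'chat_id': CHAT_ID, 'text': 'Bad posture detected for a long time. Please sit up!'}
    try:
        requests.post(url, data=payload, timeout=5)
    except requests.RequestException:
        # Never let a network error crash the video loop.
        pass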
5. Initializations
# Initialize frame counters.
good_frames = 0
bad_frames = 0
# Font type.
font = cv2.FONT_HERSHEY_SIMPLEX
# Colors (B, G, R).
blue = (255, 127, 0)
red = (50, 50, 255)
green = (127, 255, 0)
dark_blue = (127, 20, 0)
light_green = (127, 233, 100)
yellow = (0, 255, 255)
pink = (255, 0, 255)
# Initialize mediapipe pose class.
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()
6. Main Function
6.1. Create Video Capture and Video Writer Objects
For demonstration, we are using pre-recorded video samples. In practice, you would position the webcam to capture your side view. In the following snippet, the video capture and video writer objects are created. As you can see, we acquire the video metadata from the capture object and use it to create the video writer object. Since we write the output in MP4 format, the codec is set to *'mp4v'.
# For webcam input replace file name with 0.
file_name = 'input.mp4'
cap = cv2.VideoCapture(file_name)
# Meta.
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_size = (width, height)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
# Video writer.
video_output = cv2.VideoWriter('output.mp4', fourcc, fps, frame_size)
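A small robustness check you may want to add before entering the main loop (our addition, not in the original snippet): verify that the video source actually opened.
if not cap.isOpened():
    print('Error: could not open video source:', file_name)
    raise SystemExit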
6.2. Main Loop
Get key points
The configurable APIs of the Pose() solution do not require much tweaking. The default values are enough to detect pose landmarks. However, if we want the utility to generate a segmentation mask, the ENABLE_SEGMENTATION flag must be set to True. Following are some of the configurable APIs in the MediaPipe Pose solution.
- STATIC_IMAGE_MODE: It’s a boolean. If set to True, person detection runs for every input image. This is not necessary for videos, where detection runs once followed by landmark tracking. The default value is False.
- MODEL_COMPLEXITY: Default value is 1. It can be 0, 1, or 2. If higher complexity is chosen, the inference time increases.
- ENABLE_SEGMENTATION: If set to True, the solution generates a segmentation mask along with the pose landmarks. The default value is False.
- MIN_DETECTION_CONFIDENCE: Ranges from [0.0 – 1.0]. As the name suggests, it is the least confidence value for the detection to be considered valid. The default value is 0.5.
- MIN_TRACKING_CONFIDENCE: Ranges from [0.0 – 1.0]. It is the minimum confidence value for a landmark to be considered tracked. The default value is 0.5.
Usually, the default values do a good job; hence we are not passing any arguments to mp_pose.Pose().
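For reference, explicitly passing these options with their default values would look like the following; in the Python API the option names are the lowercase keyword arguments shown here.
pose = mp_pose.Pose(
    static_image_mode=False,
    model_complexity=1,
    enable_segmentation=False,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5
)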
Check out this link for more information on the APIs. The following section deals with processing the RGB frames, from which we extract the pose landmarks. Finally, we convert the image back to the OpenCV-friendly BGR color space.
# Capture frames.
success, image = cap.read()
if not success:
    print("Null.Frames")
    break
# Get fps.
fps = cap.get(cv2.CAP_PROP_FPS)
# Get height and width of the frame.
h, w = image.shape[:2]
# Convert the BGR image to RGB.
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Process the image.
keypoints = pose.process(image)
# Convert the image back to BGR.
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
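An optional micro-optimization (our addition, borrowed from MediaPipe's own example code): marking the frame as non-writeable lets it be passed by reference to process(), after which it is made writeable again before drawing. The middle of the block above would then read:
# Convert to RGB and pass the frame by reference for a small speed-up.
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image.flags.writeable = False
keypoints = pose.process(image)
# Re-enable writing before drawing annotations on the frame.
image.flags.writeable = True
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)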
6.3. Acquire landmark coordinates
The pose_landmarks attribute of the solution's output object provides the landmarks' normalized x and y coordinates. Hence, to get pixel values, we multiply them by the image's width and height, respectively. Landmarks such as LEFT_SHOULDER, RIGHT_SHOULDER, etc., are attributes of the PoseLandmark class. To acquire a normalized coordinate, we use the following syntax.
norm_coordinate = pose.process(image).pose_landmarks.landmark[mp_pose.PoseLandmark.<SPECIFIC_LANDMARK>].x  # or .y
These long accessors are shortened using the representations shown below.
# Use lm and lmPose as representative of the following methods.
lm = keypoints.pose_landmarks
lmPose = mp_pose.PoseLandmark
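# Addition (not in the original): pose_landmarks is None when no person is
# detected in the frame, so guard before indexing the landmarks below.
if lm is None:
    continue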
# Left shoulder.
l_shldr_x = int(lm.landmark[lmPose.LEFT_SHOULDER].x * w)
l_shldr_y = int(lm.landmark[lmPose.LEFT_SHOULDER].y * h)
# Right shoulder.
r_shldr_x = int(lm.landmark[lmPose.RIGHT_SHOULDER].x * w)
r_shldr_y = int(lm.landmark[lmPose.RIGHT_SHOULDER].y * h)
# Left ear.
l_ear_x = int(lm.landmark[lmPose.LEFT_EAR].x * w)
l_ear_y = int(lm.landmark[lmPose.LEFT_EAR].y * h)
# Left hip.
l_hip_x = int(lm.landmark[lmPose.LEFT_HIP].x * w)
l_hip_y = int(lm.landmark[lmPose.LEFT_HIP].y * h)
6.4. Align camera
This is to make sure that the camera captures the person's proper side view. We measure the distance between the left and right shoulder points in the image. With correct alignment, the two points should almost coincide.
Note that the offset distance threshold is based on observations from a dataset with the same resolution as the video samples; if you try higher-resolution samples, this value will change. It does not have to be very precise, and you can set a threshold based on your own intuition. Strictly speaking, a raw pixel distance is not a robust way to determine alignment; an angle-based or scale-normalized check would be better. But let's use the distance method here for simplicity.
# Calculate distance between left shoulder and right shoulder points.
offset = findDistance(l_shldr_x, l_shldr_y, r_shldr_x, r_shldr_y)
# Assist to align the camera to point at the side view of the person.
# Offset threshold 100 is based on results obtained from analysis over 100 samples.
if offset < 100:
    cv2.putText(image, str(int(offset)) + ' Aligned', (w - 150, 30), font, 0.9, green, 2)
else:
    cv2.putText(image, str(int(offset)) + ' Not Aligned', (w - 150, 30), font, 0.9, red, 2)
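To make the check less dependent on image resolution (as noted above), one option is to normalize the shoulder offset by a body-scale measure such as the torso length. A sketch of that idea, with an illustrative ratio threshold of 0.25 (our assumption, not a tuned value):
# Normalize the shoulder offset by torso length for a resolution-independent check.
torso_length = findDistance(l_shldr_x, l_shldr_y, l_hip_x, l_hip_y)
if torso_length > 0 and (offset / torso_length) < 0.25:
    cv2.putText(image, 'Aligned', (w - 150, 30), font, 0.9, green, 2)
else:
    cv2.putText(image, 'Not Aligned', (w - 150, 30), font, 0.9, red, 2)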
6.5. Calculate inclination and draw landmarks
The inclination angles are obtained using the predefined function findAngle. Landmarks and their connections are drawn as shown below.
# Calculate angles.
neck_inclination = findAngle(l_shldr_x, l_shldr_y, l_ear_x, l_ear_y)
torso_inclination = findAngle(l_hip_x, l_hip_y, l_shldr_x, l_shldr_y)
# Draw landmarks.
cv2.circle(image, (l_shldr_x, l_shldr_y), 7, yellow, -1)
cv2.circle(image, (l_ear_x, l_ear_y), 7, yellow, -1)
# Let's take the y-coordinate of P3 100 px above the shoulder point P1, for display elegance.
# Although we take y = 0 while calculating the angle between P1, P2, and P3.
cv2.circle(image, (l_shldr_x, l_shldr_y - 100), 7, yellow, -1)
cv2.circle(image, (r_shldr_x, r_shldr_y), 7, pink, -1)
cv2.circle(image, (l_hip_x, l_hip_y), 7, yellow, -1)
# Similarly, here we take the y-coordinate 100 px above the hip point. Note that
# you can take any value for y, not necessarily 100 or 200 pixels.
cv2.circle(image, (l_hip_x, l_hip_y - 100), 7, yellow, -1)
# Put text, Posture and angle inclination.
# Text string for display.
angle_text_string = 'Neck : ' + str(int(neck_inclination)) + ' Torso : ' + str(int(torso_inclination))
6.6. Conditionals
Based on whether the posture is good or bad, the results are displayed. Again, the threshold angles here are based on intuition; you can set them as per your needs. With every detection, the frame counter for good or bad posture is incremented, respectively. The time spent in a particular posture is calculated by dividing the number of frames by the fps. Check out fps measuring methods in a previous blog post here.
# Determine whether good posture or bad posture.
# The threshold angles have been set based on intuition.
if neck_inclination < 40 and torso_inclination < 10:
    bad_frames = 0
    good_frames += 1
    cv2.putText(image, angle_text_string, (10, 30), font, 0.9, light_green, 2)
    cv2.putText(image, str(int(neck_inclination)), (l_shldr_x + 10, l_shldr_y), font, 0.9, light_green, 2)
    cv2.putText(image, str(int(torso_inclination)), (l_hip_x + 10, l_hip_y), font, 0.9, light_green, 2)
    # Join landmarks.
    cv2.line(image, (l_shldr_x, l_shldr_y), (l_ear_x, l_ear_y), green, 4)
    cv2.line(image, (l_shldr_x, l_shldr_y), (l_shldr_x, l_shldr_y - 100), green, 4)
    cv2.line(image, (l_hip_x, l_hip_y), (l_shldr_x, l_shldr_y), green, 4)
    cv2.line(image, (l_hip_x, l_hip_y), (l_hip_x, l_hip_y - 100), green, 4)
else:
    good_frames = 0
    bad_frames += 1
    cv2.putText(image, angle_text_string, (10, 30), font, 0.9, red, 2)
    cv2.putText(image, str(int(neck_inclination)), (l_shldr_x + 10, l_shldr_y), font, 0.9, red, 2)
    cv2.putText(image, str(int(torso_inclination)), (l_hip_x + 10, l_hip_y), font, 0.9, red, 2)
    # Join landmarks.
    cv2.line(image, (l_shldr_x, l_shldr_y), (l_ear_x, l_ear_y), red, 4)
    cv2.line(image, (l_shldr_x, l_shldr_y), (l_shldr_x, l_shldr_y - 100), red, 4)
    cv2.line(image, (l_hip_x, l_hip_y), (l_shldr_x, l_shldr_y), red, 4)
    cv2.line(image, (l_hip_x, l_hip_y), (l_hip_x, l_hip_y - 100), red, 4)
# Calculate the time of remaining in a particular posture.
good_time = (1 / fps) * good_frames
bad_time = (1 / fps) * bad_frames
# Pose time.
if good_time > 0:
    time_string_good = 'Good Posture Time : ' + str(round(good_time, 1)) + 's'
    cv2.putText(image, time_string_good, (10, h - 20), font, 0.9, green, 2)
else:
    time_string_bad = 'Bad Posture Time : ' + str(round(bad_time, 1)) + 's'
    cv2.putText(image, time_string_bad, (10, h - 20), font, 0.9, red, 2)
# If you stay in bad posture for more than 3 minutes (180 s), send an alert.
if bad_time > 180:
    sendWarning()
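The snippets above end the per-frame work. A complete script would typically write and/or display the annotated frame inside the loop and release resources afterwards. Here is a sketch consistent with the capture and writer objects created earlier (the 'q' quit key and window title are our own choices):
# Inside the loop: save and display the annotated frame.
video_output.write(image)
cv2.imshow('MediaPipe Pose', image)
if cv2.waitKey(5) & 0xFF == ord('q'):
    break

# After the loop: release resources.
cap.release()
video_output.release()
cv2.destroyAllWindows()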
CONCLUSION
So that's all about building a posture corrector application using MediaPipe Pose. In this post, we discussed using MediaPipe Pose to obtain pose landmarks, its configurable APIs, its outputs, and more. I hope this blog post helped you understand the basics of MediaPipe Pose and sparked some new ideas for your next project.