In many computer vision applications, the processing power at your disposal is low. In such cases, we have to use simple, yet effective techniques.
In this post, we will cover one such technique for estimating the background of a scene when the camera is static and there are some moving objects in the scene. This scenario is not uncommon. For example, many traffic and surveillance cameras are rigidly fixed.
Temporal Median Filtering
To understand the idea we are going to describe in this post, let us consider a simpler problem in 1D.
Suppose we are estimating a quantity (say the temperature of the room) every 10 milliseconds.
Let’s say, the temperature of the room is 70 degrees Fahrenheit.

In the above figure, we have shown measurements from two thermometers — a good thermometer and a bad thermometer.
The good thermometer shown on the left reports 70 degrees with some zero-mean Gaussian noise. To get a more accurate estimate of the temperature, we can simply average the values over a few seconds. Since the noise is zero-mean, with positive and negative deviations equally likely, averaging cancels it out. Indeed, the average value in this specific case is 70.01.
The bad thermometer, on the other hand, behaves like the good thermometer most of the time but occasionally, the numbers are completely wrong.
In fact, if we take an average of the numbers reported by the bad thermometer we get 71.07 degrees. This is clearly an overestimate.
Can we still get a good estimate of temperature?
The answer is YES. When the data contains outliers, the median is a more robust estimate of the underlying value.
The median is the middle value of the data when it is sorted in ascending or descending order.
The median of the bad thermometer's readings is 70.05 degrees, which is a much better estimate than the mean of 71.07 degrees.
The only downside is that the median is more expensive to compute than the mean.
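Before moving on, here is a minimal NumPy sketch of the thermometer example. The noise level, outlier fraction, and outlier range are made-up values for illustration, not the actual measurements discussed above:

import numpy as np

rng = np.random.default_rng(0)

# 1000 readings of a 70-degree room with zero-mean Gaussian noise
good = 70 + rng.normal(0, 0.5, size=1000)

# The same readings, but 20 of them (2%) are wildly wrong
bad = good.copy()
outlier_idx = rng.choice(1000, size=20, replace=False)
bad[outlier_idx] = rng.uniform(100, 120, size=20)

print(np.mean(good), np.median(good))  # both close to 70
print(np.mean(bad), np.median(bad))    # mean pulled up; median stays near 70

A handful of outliers shifts the mean by almost a degree, while the median barely moves. This is exactly the robustness we exploit below.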
Using the Median for Background Estimation
Now, let’s return to the problem of estimating the background when the camera is static.
Since the camera is not moving, we can assume that most of the time every pixel sees the same patch of the background. Occasionally, a car or some other moving object passes in front of the camera and obscures the background.
For a video sequence, we can randomly sample a few frames (say 25 frames).
In other words, for every pixel, we now have 25 estimates of the background. As long as a pixel is not covered by a car or other moving object more than 50% of the time, the median of the pixel over these 25 frames will give a good estimate of the background at that pixel.
We can repeat this for every pixel and recover the entire background.
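To make the per-pixel median concrete, here is a tiny toy example (the frame values are invented for illustration): a stack of three 2x2 "frames" where a moving object covers one pixel in a single frame.

import numpy as np

# Three 2x2 grayscale "frames": the background is a flat 100,
# and in the second frame a moving object (value 20) covers
# the top-left pixel.
frames = np.array([
    [[100, 100], [100, 100]],
    [[ 20, 100], [100, 100]],
    [[100, 100], [100, 100]],
], dtype=np.uint8)

# Median along the time axis (axis 0): since the object covers
# the pixel in only 1 of 3 frames (less than 50% of the time),
# the median recovers the background everywhere.
background = np.median(frames, axis=0).astype(np.uint8)
print(background)  # [[100 100]
                   #  [100 100]]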
Background Estimation Code (C++ / Python)
Now, let’s look at the actual code.
Python code
import numpy as np
import cv2

# Open Video
cap = cv2.VideoCapture('video.mp4')

# Randomly select 25 frames
frameIds = cap.get(cv2.CAP_PROP_FRAME_COUNT) * np.random.uniform(size=25)

# Store selected frames in an array
frames = []
for fid in frameIds:
    cap.set(cv2.CAP_PROP_POS_FRAMES, fid)
    ret, frame = cap.read()
    if not ret:
        continue
    frames.append(frame)

# Calculate the median along the time axis
medianFrame = np.median(frames, axis=0).astype(dtype=np.uint8)

# Display median frame
cv2.imshow('frame', medianFrame)
cv2.waitKey(0)
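One small design note: because the 25 frame indices above are drawn independently, the same frame can occasionally be sampled more than once. If that matters for your video, a variant (not in the original code) is to draw distinct integer indices instead:

frameIds = np.random.choice(int(cap.get(cv2.CAP_PROP_FRAME_COUNT)), size=25, replace=False)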
C++ code
#include <opencv2/opencv.hpp>
#include <iostream>
#include <random>

using namespace std;
using namespace cv;
We will create a couple of functions to calculate the median frame.
int computeMedian(vector<int> elements)
{
    // nth_element partially sorts so that the middle element lands in
    // its sorted position: O(n) on average vs O(n log n) for a full sort.
    nth_element(elements.begin(), elements.begin() + elements.size() / 2, elements.end());
    return elements[elements.size() / 2];
}

cv::Mat compute_median(std::vector<cv::Mat> vec)
{
    // Note: Expects the image to be CV_8UC3
    cv::Mat medianImg(vec[0].rows, vec[0].cols, CV_8UC3, cv::Scalar(0, 0, 0));

    for (int row = 0; row < vec[0].rows; row++)
    {
        for (int col = 0; col < vec[0].cols; col++)
        {
            std::vector<int> elements_B;
            std::vector<int> elements_G;
            std::vector<int> elements_R;

            for (size_t imgNumber = 0; imgNumber < vec.size(); imgNumber++)
            {
                int B = vec[imgNumber].at<cv::Vec3b>(row, col)[0];
                int G = vec[imgNumber].at<cv::Vec3b>(row, col)[1];
                int R = vec[imgNumber].at<cv::Vec3b>(row, col)[2];

                elements_B.push_back(B);
                elements_G.push_back(G);
                elements_R.push_back(R);
            }

            medianImg.at<cv::Vec3b>(row, col)[0] = computeMedian(elements_B);
            medianImg.at<cv::Vec3b>(row, col)[1] = computeMedian(elements_G);
            medianImg.at<cv::Vec3b>(row, col)[2] = computeMedian(elements_R);
        }
    }
    return medianImg;
}
int main(int argc, char const *argv[])
{
    std::string video_file;
    // Read video file
    if (argc > 1)
    {
        video_file = argv[1];
    }
    else
    {
        video_file = "video.mp4";
    }

    VideoCapture cap(video_file);
    if (!cap.isOpened())
    {
        cerr << "Error opening video file\n";
        return -1;
    }

    // Randomly select 25 frames; the distribution bounds are inclusive,
    // so the last valid frame index is frame count minus one
    default_random_engine generator;
    uniform_int_distribution<int> distribution(0, cap.get(CAP_PROP_FRAME_COUNT) - 1);

    vector<Mat> frames;
    Mat frame;

    for (int i = 0; i < 25; i++)
    {
        int fid = distribution(generator);
        cap.set(CAP_PROP_POS_FRAMES, fid);
        Mat frame;
        cap >> frame;
        if (frame.empty())
            continue;
        frames.push_back(frame);
    }

    // Calculate the median along the time axis
    Mat medianFrame = compute_median(frames);

    // Display median frame
    imshow("frame", medianFrame);
    waitKey(0);

    // main() continues in the frame differencing section below
As you can see, we randomly select 25 frames and calculate the median of every pixel over the 25 frames. This median frame is a good estimate of the background as long as every pixel sees the background at least 50% of the time.
The result is shown below.

Frame Differencing (C++ / Python)
The obvious next question is whether we can create, for every frame, a mask that shows the parts of the image that are in motion.
This is accomplished in the following steps:
- Convert the median frame to grayscale.
- Loop over all frames in the video. Extract the current frame and convert it to grayscale.
- Calculate the absolute difference between the current frame and the median frame.
- Threshold the above image to remove noise and binarize the output.
Let’s check out the code.
Python code
# Reset frame number to 0
cap.set(cv2.CAP_PROP_POS_FRAMES, 0)

# Convert background to grayscale
grayMedianFrame = cv2.cvtColor(medianFrame, cv2.COLOR_BGR2GRAY)

# Loop over all frames
while True:
    # Read frame; stop when the video is exhausted
    ret, frame = cap.read()
    if not ret:
        break

    # Convert current frame to grayscale
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Calculate absolute difference of current frame and the median frame
    dframe = cv2.absdiff(frame, grayMedianFrame)

    # Threshold to binarize
    th, dframe = cv2.threshold(dframe, 30, 255, cv2.THRESH_BINARY)

    # Display image
    cv2.imshow('frame', dframe)
    cv2.waitKey(20)

# Release video object
cap.release()

# Destroy all windows
cv2.destroyAllWindows()
C++ code
    // Reset frame number to 0
    cap.set(CAP_PROP_POS_FRAMES, 0);

    // Convert background to grayscale
    Mat grayMedianFrame;
    cvtColor(medianFrame, grayMedianFrame, COLOR_BGR2GRAY);

    // Loop over all frames
    while (1)
    {
        // Read frame
        cap >> frame;
        if (frame.empty())
            break;

        // Convert current frame to grayscale
        cvtColor(frame, frame, COLOR_BGR2GRAY);

        // Calculate absolute difference of current frame and the median frame
        Mat dframe;
        absdiff(frame, grayMedianFrame, dframe);

        // Threshold to binarize
        threshold(dframe, dframe, 30, 255, THRESH_BINARY);

        // Display image
        imshow("frame", dframe);
        waitKey(20);
    }

    cap.release();
    return 0;
}
Results
The video below shows the output of background estimation and frame differencing.