Background Subtraction with OpenCV and BGS Libraries

Anastasia Murzova
January 25, 2021

Marking foreground entities plays an important role in the video pre-processing pipeline as the initial phase of computer vision (CV) applications. Examples of such applications include monitoring, tracking and recognition of objects: traffic analysis, people detection, animal tracking and others.

Looking into how these CV systems are built, we can observe that in most cases the initial steps include background subtraction (BS), which provides a relatively rough but fast identification of the objects in the video stream for further, more refined handling. In this post we cover several BS methods that are noteworthy in terms of accuracy and processing time: SuBSENSE and the LSBP-based GSoC method.

The blog post is divided into the following sub-topics:

  1. Basic concepts and approaches of Background Subtraction
  2. Descriptors and Types
  3. The SubSENSE Algorithm
    1. Implementation using BGS library
  4. GSoC Algorithm
    1. Implementation using OpenCV library
  5. Evaluation
  6. Evaluation Pipeline
  7. Results

Background Subtraction: Basic Concepts and Approaches

Background subtraction methods solve the task of foreground extraction by creating a background model. The full BS pipeline may contain the following phases:

  • background generation – processing N frames to provide the background image
  • background modeling – defining the model for background representation
  • background model update – introducing the model update algorithm for handling the changes, which occur over time
  • foreground detection – dividing pixels into sets of background or foreground.
Figure 1: BS basic pipeline

Background subtraction output consists of a binary mask, which separates frame pixels into two sets: foreground and background pixels.

It should be mentioned that BS approaches frequently focus on implementing advanced background models and robust feature representations.
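
To make the binary-mask idea concrete, here is a minimal sketch that produces such a mask with a standard OpenCV background subtractor. MOG2 is used here only as a stand-in for the methods covered later, and "input.mp4" is a placeholder path.

# minimal sketch: binary foreground mask with a stock OpenCV subtractor
# (MOG2 is only a stand-in; "input.mp4" is a placeholder path)
import cv2

cap = cv2.VideoCapture("input.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # 0 marks background pixels, 255 marks foreground pixels
    fg_mask = subtractor.apply(frame)
    cv2.imshow("Foreground mask", fg_mask)
    if cv2.waitKey(10) == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()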


Descriptors

Here we touch upon another important concept: descriptors (features). Descriptors characterize a captured image region in the current video frame so that it can be matched against the known background model. The goal of this comparison is to classify the region as background or foreground. It can be done, for example, with color, texture and edge descriptors.

Obviously, the design of a BS algorithm, including the combination of features, should rely on an initial analysis of the object domain. Possible challenging factors need to be considered: specific illumination, oscillations, movement of objects and others.

For instance, suppose most of the background area is static. Then it can be assumed that the color of those regions stays fixed, and hence we can identify the background by color. However, foreground objects and illumination variations can distort the colors.

Types of Descriptors

Let’s examine the types of features and the specific challenges for each of them. The pixel values of the frames are available during video processing, so the computation of pixel-domain descriptors is widespread in BS algorithms. Popular pixel-domain descriptors are:

  • color: descriptive object features. The components of the RGB color space are tightly coupled and react to illumination changes; there is no separation of brightness and chroma (as in YCrCb – see the sketch after this list). Color features are sensitive to illumination, camouflage and shadows, which can affect the appearance of moving objects. That is why they are usually combined with other features for more robustness.
  • edge: edge features are robust to light variations and good for detecting moving objects. However, they are sensitive to both highly and weakly textured objects.
  • texture: texture features provide spatial information. They are robust to illumination and shadows. For example, texture features are used in the Local Binary Pattern (LBP).
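
As a small illustration of the brightness/chroma separation mentioned in the color bullet, the sketch below converts a frame to YCrCb and simulates an illumination change; the file name and the scaling factor are arbitrary assumptions.

# sketch: a global illumination change mostly affects the Y (luma) channel
# of YCrCb, while Cr/Cb stay comparatively stable ("frame.png" is a placeholder)
import cv2
import numpy as np

frame = cv2.imread("frame.png")
ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)

# simulate a global illumination change by brightening the frame
brighter = cv2.convertScaleAbs(frame, alpha=1.3, beta=0)
ycrcb_bright = cv2.cvtColor(brighter, cv2.COLOR_BGR2YCrCb)

# mean absolute change per channel: typically largest for Y
diff = np.abs(ycrcb.astype(int) - ycrcb_bright.astype(int)).mean(axis=(0, 1))
print("mean |dY|, |dCr|, |dCb|:", diff)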

Texture Features

In this subsection we briefly review texture descriptors and their evolution.

  1. Local Binary Pattern (LBP). LBP was introduced in 2005 as “a gray-scale invariant texture primitive statistic” for texture description and became the starting point for the further development of texture descriptors. The LBP operator produces a binary pattern (number) labeling the frame pixels of the specified area by thresholding each neighboring pixel value against the value of the center pixel (see the sketch after this list).
Figure 2. LBP scheme
    There are eight neighboring pixels in the basic LBP, but the number can be extended. The drawback of LBP is the absence of intensity tracking: after a scene change the center pixel intensity may remain greater (or smaller) than the neighboring values, so the change can fail to be detected.
  2. Local Binary Similarity Patterns (LBSP). The LBSP method was introduced in 2013 to address this issue by using absolute-difference thresholding when comparing the center and neighboring pixels. However, LBSP is not spatiotemporal: the feature and intensity information are not updated simultaneously.
Figure 3. LBP and LBSP schemes
  3. Self-balanced sensitivity segmenter (SuBSENSE). SuBSENSE was introduced in 2014. It uses an improved spatiotemporal LBSP in combination with color features.
  4. Background Subtraction using Local SVD Binary Pattern (LSBP). The Local SVD Binary Pattern feature descriptor is robust to illumination variations, shadows and noise. The singular value decomposition (SVD) coefficients used in LSBP capture illumination-invariant local structure.
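
Below is a minimal NumPy sketch of the basic 8-neighbour LBP operator from item 1: each neighbour is thresholded against the center pixel and the resulting bits are packed into an 8-bit code. It is only an illustration, not the implementation used by the libraries discussed later.

# basic 8-neighbour LBP sketch: threshold neighbours against the centre pixel
# and pack the bits into one 8-bit code per pixel (borders wrap around here,
# which is acceptable for illustration purposes)
import numpy as np

def lbp_8(gray):
    g = gray.astype(np.int32)
    codes = np.zeros_like(g)
    # offsets of the 8 neighbours, clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = np.roll(np.roll(g, -dy, axis=0), -dx, axis=1)
        codes |= (neighbour >= g).astype(np.int32) << bit
    return codes.astype(np.uint8)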

In the following chapters we will explore SuBSENSE and GSoC methods in more detail.

SuBSENSE Algorithm

Overview

The below scheme presents the SuBSENSE functioning mechanism:

Figure 4. SuBSENSE pipeline

Suppose there is a video sequence as input. Then I_{t}(x) is the result of the spatial analysis of the t-th frame (at time t), where x is a pixel index. The background model block is a non-parametric statistical model. It produces the background at pixel locations, denoted by B(x), on the basis of N past representations (samples), where N = 50. S_{t}(x) is the segmentation output. It has the following values:

  • S_{t}(x) = 0: the pixel is marked as background if at least 2 samples match the representation of x in the t-th frame (I_{t}(x))
  • S_{t}(x) = 1: the pixel is marked as foreground in the opposite case.

SuBSENSE treats background subtraction as a classification task, where a pixel value is analyzed with respect to its background samples in the feature space. Hence, B(x)={B_{1}(x), B_{2}(x), ..., B_{N}(x)} models pixel x with its N samples. These samples are chosen randomly at background model initialization time. The core of the SuBSENSE analysis is color comparison and LBSP descriptors computed on the color channels. Thus B_{n}(x), where n \in [1, N], includes the following: B_{n}(x)={R_{n}(x), G_{n}(x), B_{n}(x), LBSP_{n}^{R}(x), LBSP_{n}^{G}(x), LBSP_{n}^{B}(x)}.

B(x) and I_{t}(x) are matched through the color values and LBSP-descriptors.

Colors are compared using the L1 distance, whereas descriptors are compared using the Hamming distance. The resulting mask is binary and can be described as:

S_t(x) = \begin{cases} 1, & \mbox{if } \#\{n : dist(I_{t}(x), B_{n}(x)) < D_{max}\} < 2 \\ 0, & \mbox{otherwise } \end{cases}

D_{max} is the maximum-distance threshold. Its value is assigned dynamically in accordance with the model fidelity and the segmentation noise.
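
A simplified per-pixel sketch of this matching rule is given below: a pixel is classified as background if at least 2 of its N background samples match it both in color (L1 distance) and in LBSP descriptor (Hamming distance). The thresholds and the data layout are illustrative assumptions, not the actual SuBSENSE values, which are adapted dynamically.

# simplified sketch of the SuBSENSE-style matching rule (illustrative
# thresholds; the real algorithm adapts them per pixel over time)
import numpy as np

def is_background(pixel_color, pixel_lbsp, samples,
                  color_thr=30, desc_thr=4, min_matches=2):
    # pixel_color: (3,) colour values of I_t(x)
    # pixel_lbsp:  (3,) integer LBSP codes, one per colour channel
    # samples:     list of N tuples (sample_color, sample_lbsp) from B(x)
    matches = 0
    for s_color, s_lbsp in samples:
        # L1 distance on the colour values
        l1 = int(np.abs(np.asarray(pixel_color, int) - np.asarray(s_color, int)).sum())
        # Hamming distance on the binary LBSP descriptors
        hamming = sum(bin(int(a) ^ int(b)).count("1")
                      for a, b in zip(pixel_lbsp, s_lbsp))
        if l1 < color_thr and hamming < desc_thr:
            matches += 1
            if matches >= min_matches:
                return True   # S_t(x) = 0, background
    return False              # S_t(x) = 1, foreground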

Implementation Using BGSLibrary

In this subsection we will experiment with background subtraction using the BGS library API. It's worth noting that the BGS framework was developed as a specialized OpenCV-based C++ project for video foreground-background separation. The BGS library also has wrappers for Python, Java and MATLAB, and it contains a wide range of background subtraction methods, as can be seen, for example, from its Python demo script.


As default input we will use a video sequence with a static background and dynamic foreground objects:

default="space_traffic.mp4"

Specify the --input_video key to set another input video.
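
One possible way to wire up this key with argparse is sketched below; only the --input_video name and the default value come from the text above, the rest is an assumption about how the demo script is organized.

# hypothetical argument parsing for the demo script
import argparse

parser = argparse.ArgumentParser(description="Background subtraction demo")
parser.add_argument("--input_video", default="space_traffic.mp4",
                    help="path to the input video sequence")
args = parser.parse_args()
video_to_process = args.input_video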

1. Video Processing

Upload and process video data with OpenCV VideoCapture:

# OpenCV is needed for video I/O
import cv2

# create VideoCapture object for further video processing
captured_video = cv2.VideoCapture(video_to_process)
# check video capture status
if not captured_video.isOpened():
    print("Unable to open: " + video_to_process)
    exit(0)

2. Model Initialization

Instantiate the model:

# import the BGS library Python wrapper
import pybgs as bgs

# instantiate the background subtractor
background_subtr_method = bgs.SuBSENSE()

3. Obtaining results

Obtain the results (the initial size of frames was 1920×1080):

while True:
    # read video frames
    retval, frame = captured_video.read()

    # check whether the frames have been grabbed
    if not retval:
        break

    # resize video frames
    frame = cv2.resize(frame, (640, 360))

    # pass the frame to the background subtractor
    foreground_mask = background_subtr_method.apply(frame)
    # obtain the background without foreground mask
    img_bgmodel = background_subtr_method.getBackgroundModel()

4. Visualizing results

Visualize results with OpenCV imshow:

while True:
    # ...

    # show the current frame, foreground mask, subtracted result
    cv2.imshow("Initial Frames", frame)
    cv2.imshow("Foreground Masks", foreground_mask)
    cv2.imshow("Subtraction result", img_bgmodel)

    keyboard = cv2.waitKey(10)
    if keyboard == 27:
        break
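
For convenience, the four steps above can be combined into one runnable sketch; it assumes the BGS library is installed and importable as pybgs, and that video_to_process points to an existing video file.

# steps 1-4 combined into a single sketch (assumes pybgs is installed)
import cv2
import pybgs as bgs

video_to_process = "space_traffic.mp4"
captured_video = cv2.VideoCapture(video_to_process)
if not captured_video.isOpened():
    raise SystemExit("Unable to open: " + video_to_process)

background_subtr_method = bgs.SuBSENSE()

while True:
    retval, frame = captured_video.read()
    if not retval:
        break
    frame = cv2.resize(frame, (640, 360))
    foreground_mask = background_subtr_method.apply(frame)
    img_bgmodel = background_subtr_method.getBackgroundModel()

    cv2.imshow("Initial Frames", frame)
    cv2.imshow("Foreground Masks", foreground_mask)
    cv2.imshow("Subtraction result", img_bgmodel)
    if cv2.waitKey(10) == 27:  # Esc to quit
        break

captured_video.release()
cv2.destroyAllWindows()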

The outputs are:

  • initial frame:
Figure 5: Input video frame
  • obtained foreground mask:
Figure 6: Foreground mask obtained with BGS SuBSENSE BS method
  • subtraction result:
Figure 7: Background after subtracted foreground entities

In the above case the masks are fairly accurate and, in general, the foreground objects were captured correctly; however, there are the following defects:

  • the static group of people in the left part of the frame was not detected at all:
Figure 8: BGS SuBSENSE artifact 1
  • almost static objects with moving components were only partially detected:
Figure 9: BGS SuBSENSE artifact 2
  • regions where people are close to each other were merged into one shared mask:
Figure 10: BGS SuBSENSE artifact 3

GSoC Algorithm

Overview

During Google Summer of Code (GSoC) 2017 an advancement of LSBP was contributed: BackgroundSubtractorGSOC. The GSoC BS method was introduced in order to make LSBP faster and more robust. The method relies on RGB color values instead of LSBP descriptors and achieves high performance on the CDnet-2012 dataset.

The GSoC BS implementation doesn't refer to any article, so let's review its main points by exploring its source, bgfg_gsoc.cpp. First, we need to pay attention to the BackgroundSubtractorGSOC instantiation parameters:

Ptr< BackgroundSubtractorGSOC > createBackgroundSubtractorGSOC(
     int mc,
     int nSamples,
     float replaceRate,
     float propagationRate,
     int hitsThreshold,
     float alpha,
     float beta,
     float blinkingSupressionDecay,
     float blinkingSupressionMultiplier,
     float noiseRemovalThresholdFacBG,
     float noiseRemovalThresholdFacFG
)

The parameters have the following meanings:

  • mc: camera motion compensation flag
  • nSamples: number of samples to maintain at each point of the frame.
  • replaceRate: probability of replacing the old sample – how fast the model will be updated.
  • propagationRate: probability of propagating to neighbors.
  • hitsThreshold: how many positives the sample must get before it will be considered as a possible replacement.
  • alpha: scale coefficient for threshold.
  • beta: bias coefficient for threshold.
  • blinkingSupressionDecay: blinking suppression decay factor.
  • blinkingSupressionMultiplier: blinking suppression multiplier.
  • noiseRemovalThresholdFacBG: strength of the noise removal for background.
  • noiseRemovalThresholdFacFG: strength of the noise removal for foreground.
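
The Python binding mirrors this signature, so the subtractor can be instantiated with explicit parameters as sketched below; the values shown approximate the library defaults, so verify them against your opencv-contrib version.

# sketch: explicit GSoC parameters in Python (values approximate the
# documented defaults; check them against your OpenCV version)
import cv2

background_subtr_method = cv2.bgsegm.createBackgroundSubtractorGSOC(
    mc=0,                                 # no camera motion compensation
    nSamples=20,                          # samples kept per pixel
    replaceRate=0.003,                    # how fast old samples are replaced
    propagationRate=0.01,                 # propagation to neighbouring pixels
    hitsThreshold=32,                     # hits before a sample may be replaced
    alpha=0.01,                           # threshold scale coefficient
    beta=0.0022,                          # threshold bias coefficient
    blinkingSupressionDecay=0.1,
    blinkingSupressionMultiplier=0.1,
    noiseRemovalThresholdFacBG=0.0004,
    noiseRemovalThresholdFacFG=0.0008)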

Keeping the above in mind, let's examine the core BS logic implemented in the apply() method. The computation core is launched in apply():

parallel_for_(Range(0, sz.area()), ParallelGSOC(sz, this, frame, learningRate, fgMask));

ParallelGSOC contains the per-pixel comparison operations for neighboring pixels, relying on RGB color features.

Another important point concerns a specific type of frame pixel. Pixels that frequently switch between foreground and background are defined as blinking. The GSoC BS approach applies a special heuristic to detect blinking pixels:

cv::add(blinkingSupression, (fgMask != prevFgMask) / 255, blinkingSupression, cv::noArray(), CV_32F);
blinkingSupression *= blinkingSupressionDecay;
fgMask.copyTo(prevFgMask);
Mat prob = blinkingSupression * (blinkingSupressionMultiplier * (1 - blinkingSupressionDecay) / blinkingSupressionDecay);

for (int i = 0; i < sz.height; ++i)
    for (int j = 0; j < sz.width; ++j)
        if (rng.uniform(0.0f, 1.0f) < prob.at< float >(i, j))
            backgroundModel->replaceOldest(i, j, BackgroundSampleGSOC(frame.at< Point3f >(i, j), 0, currentTime));

Here blinkingSupression acts as a blinking-pixel map accumulated from the XOR of the current and previous masks. The values scaled by the blinking-suppression coefficients are then used as probabilities for randomly replacing the oldest background sample at the corresponding pixels.

The final step is post-processing of the produced mask, which consists of noise removal and a Gaussian blur:

void BackgroundSubtractorGSOCImpl::postprocessing(Mat& fgMask) {
    removeNoise(fgMask, fgMask, size_t(noiseRemovalThresholdFacBG * fgMask.size().area()), 0);
    Mat invFgMask = 255 - fgMask;
    removeNoise(fgMask, invFgMask, size_t(noiseRemovalThresholdFacFG * fgMask.size().area()), 255);

    GaussianBlur(fgMask, fgMask, Size(5, 5), 0);
    fgMask = fgMask > 127;
}

The threshold values for noise removal are produced by multiplying noiseRemovalThresholdFacBG and noiseRemovalThresholdFacFG by the mask area. The mask values are then updated in accordance with the obtained threshold:

for (int i = 0; i < sz.height; ++i)
    for (int j = 0; j < sz.width; ++j)
        if (compArea[labels.at< int >(i, j)] < threshold)
            fgMask.at< uchar >(i, j) = filler;

Implementation Using OpenCV

In this section we will experiment with background subtraction using the corresponding API from the OpenCV library, again on the default "space_traffic.mp4" video.

1. Video Processing

Upload and process video data with OpenCV VideoCapture:

# create VideoCapture object for further video processing
captured_video = cv2.VideoCapture(video_to_process)
# check video capture status
if not captured_video.isOpened():
    print("Unable to open: " + video_to_process)
    exit(0)

2. Model Initialization

Instantiate the model:

# instantiate background subtraction
background_subtr_method = cv2.bgsegm.createBackgroundSubtractorGSOC()

3. Obtaining results

Obtain the results (the initial size of frames was 1920×1080):

while True:
    # read video frames
    retval, frame = captured_video.read()

    # check whether the frames have been grabbed
    if not retval:
        break

    # resize video frames
    frame = cv2.resize(frame, (640, 360))

    # pass the frame to the background subtractor
    foreground_mask = background_subtr_method.apply(frame)
    # obtain the background without foreground mask
    background_img = background_subtr_method.getBackgroundImage()

4. Visualizing results

Visualize results with OpenCV imshow:

while True:
    # ...

    # show the current frame, foreground mask, subtracted result
    cv2.imshow("Initial Frames", frame)
    cv2.imshow("Foreground Masks", foreground_mask)
    cv2.imshow("Subtraction Result", background_img)

    keyboard = cv2.waitKey(10)
    if keyboard == 27:
        break

The outputs are:

  • initial frame:
Figure 11: Input video fragment
  • obtained foreground mask:
Figure 12: Foreground mask obtained with OpenCV BS-GSoC method
  • subtraction result:
Figure 13: Obtained background result

We can see that mostly foreground objects were correctly located. However, there are some artifacts:

  • the foreground masks cover some extra space at the feet of the objects, which corresponds to their shadows:
Figure 14: OpenCV BS-GSoC artifact 1
  • static objects were only partially detected: only their moving components were captured, for example, the moving man's hand in the picture below:
    Figure 15: OpenCV BS-GSoC artifact 2

    or some parts of the non-dynamic people group in the left part of the frame:

Figure 16: OpenCV BS-GSoC artifact 3

It can be noted that the most challenging areas for both algorithms contain static foreground objects or partly moving objects with some dynamic components.

Evaluation

Data Sets

In the current post we will use two datasets from ChangeDetection.NET (CDNET), CDNET-2012 and CDNET-2014, to evaluate the discussed BS methods. The CDNET datasets are frequently used video collections for algorithm evaluation due to the variety of their content: categories, input frames and corresponding ground-truth (GT) images. There are 6 categories in CDNET-2012 and 11 in CDNET-2014. Let's quickly look through them:

Common categories:

  1. baseline: 4 videos with a static background containing moving foreground objects
  2. cameraJitter: 4 videos with a slight camera-oscillation effect
  3. dynamicBackground: 6 videos with a partly moving background and dynamic foreground
  4. intermittentObjectMotion: 6 videos containing a static background with periodically moving foreground entities
  5. shadow: 6 video sequences which contain the shadows of the foreground objects
  6. thermal: 5 videos obtained from a thermal camera

Introduced in CDNET-2014:

  1. badWeather: 4 traffic videos with poor visibility, with images distorted by snowfall
  2. lowFramerate: 4 video sequences with a low frame rate
  3. nightVideos: 6 videos containing low-illumination views
  4. PTZ: 4 video sequences obtained with a pan-tilt-zoom camera (dynamic foreground: rotation, zoom; slight oscillation effect)
  5. turbulence: 4 videos distorted by a slight ripple.

Evaluation Pipeline

To evaluate the algorithms we will use evaluator.py, based on the opencv-contrib evaluation pipeline. To run the script we need to obtain the datasets, in our case CDnet-2012 and CDnet-2014. The path to the data should be specified in the required --dataset_path parameter. The line below initiates the evaluation:

python evaluator.py --dataset_path ./cdnet_12

In the lines below we define the list of algorithms to evaluate (method creator, its title and arguments to pass):

import cv2
import pybgs as bgs

ALGORITHMS_TO_EVALUATE = [
    (cv2.bgsegm.createBackgroundSubtractorGSOC, "GSoC", {}),
    (bgs.SuBSENSE, "SuBSENSE", {}),
]

Iterating over ALGORITHMS_TO_EVALUATE, the specified background subtraction models are instantiated. To compute a foreground mask, the apply(frame) method is called. The mask list accumulates the obtained foreground masks for the subsequent calculation of the algorithm quality metrics. Before we get their values, let's recall the following key concepts:

  • true positives (TP) – properly masked objects
  • true negatives (TN) – properly not masked objects
  • false positives (FP) – improperly masked objects
  • false negatives (FN) – improperly not masked objects

Knowing the TP, TN, FP and FN values we can calculate precision, recall and, finally, the F1-measure and accuracy:

  1. precision – the ratio of true positives in the obtained results: \frac{TP}{TP+FP}
  2. recall – the ratio of true positives found among all the ground truth: \frac{TP}{TP+FN}
  3. F1-measure (FM): 2\times\frac{precision\times recall}{precision+recall}=\frac{2TP}{2TP+FP+FN}
  4. Accuracy: \frac{TP+TN}{TP+TN+FP+FN}
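
A short sketch of computing these metrics from a predicted foreground mask and a ground-truth mask (both binary uint8 images, with 255 marking foreground) could look as follows; it is a stand-alone illustration rather than the exact code of evaluator.py.

# per-frame metrics from a predicted mask and a ground-truth mask
import numpy as np

def bs_metrics(pred_mask, gt_mask):
    pred = pred_mask > 0
    gt = gt_mask > 0
    tp = int(np.sum(pred & gt))    # properly masked foreground pixels
    tn = int(np.sum(~pred & ~gt))  # properly unmasked background pixels
    fp = int(np.sum(pred & ~gt))   # improperly masked pixels
    fn = int(np.sum(~pred & gt))   # improperly unmasked pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy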

Results

The minimal F1 value, 0.084 for LSBP, was obtained on the dynamicBackground video series:

Method      Precision   Recall   F1      Accuracy
LSBP        0.064       0.784    0.084   0.864
GSoC        0.269       0.913    0.289   0.990
SuBSENSE    0.610       0.740    0.528   0.996

The most challenging videos for GSoC were from the PTZ category:

Method      Precision   Recall   F1      Accuracy
LSBP        0.231       0.639    0.216   0.888
GSoC        0.246       0.933    0.265   0.811
SuBSENSE    0.527       0.730    0.485   0.964

SuBSENSE showed the lowest F1 in nightVideos:

Method      Precision   Recall   F1      Accuracy
LSBP        0.467       0.392    0.296   0.977
GSoC        0.294       0.780    0.342   0.947
SuBSENSE    0.462       0.624    0.448   0.975

The average values for all categories presented in the data sets are:

  • CDnet-2012:

Method      Precision   Recall   F1      Accuracy
LSBP        0.491       0.643    0.393   0.930
GSoC        0.705       0.714    0.562   0.972
SuBSENSE    0.824       0.742    0.688   0.982

  • CDnet-2014:

Method      Precision   Recall   F1      Accuracy
LSBP        0.455       0.624    0.362   0.945
GSoC        0.610       0.753    0.522   0.960
SuBSENSE    0.747       0.734    0.644   0.983

The above evaluations illustrate that, among the OpenCV methods, the GSoC BS method exceeds the LSBP scores, while SuBSENSE outperforms both.

References

The following links contain detailed information about the above methods and additional materials for further exploration:

  1. On the Role and the Importance of Features for Background Modeling and Foreground Detection: contains basic information about BS and its methods, detailed information about types of features, and helpful comparative tables
  2. Background Subtraction using Local SVD Binary Pattern: describes the LSBP method
  3. Flexible Background Subtraction With Self-Balanced Local Sensitivity: describes the SuBSENSE method
  4. A regularly updated compilation of BS materials
  5. CDnet datasets
