Parallel Pixel Access in OpenCV using forEach

In this tutorial, we will compare the performance of the forEach method of the Mat class to other ways of accessing and transforming pixel values in OpenCV. We will show how forEach is much faster than naively using the at method or even efficiently using pointer arithmetic.

There are hidden gems inside OpenCV that are sometimes not very well known. One of these hidden gems is the forEach method of the Mat class that utilizes all the cores on your machine to apply any function at every pixel.

Let us first define a function complicatedThreshold. It takes in a three-channel pixel value (imread loads color images in BGR order, so pixel.x is the blue channel) and applies a complicated threshold to it.

// Define a pixel
typedef Point3_<uint8_t> Pixel;

// A complicated threshold is defined so
// a non-trivial amount of computation
// is done at each pixel.
void complicatedThreshold(Pixel &pixel)
{
  if (pow(double(pixel.x)/10,2.5) > 100)
  {
    pixel.x = 255;
    pixel.y = 255;
    pixel.z = 255;
  }
  else
  {
    pixel.x = 0;
    pixel.y = 0;
    pixel.z = 0;
  }
}

This function is computationally much heavier than a simple threshold. This way we are not just testing pixel access time but also how well forEach uses all the cores when each pixel operation is computationally heavy.

Next, we will go over four different ways of applying this function to every pixel in an image and examine the relative performance.

Method 1 : Naive Pixel Access Using the at Method

The Mat class has a convenient method called at to access a pixel at location (row, column) in the image. The following code uses the at method to access every pixel and applies complicatedThreshold to it.

// Naive pixel access
// Loop over all rows
for (int r = 0; r < image.rows; r++)
{
  // Loop over all columns
  for ( int c = 0; c < image.cols; c++)
  {
    // Obtain pixel at (r, c)
    Pixel pixel = image.at<Pixel>(r, c);
    // Apply complicatedThreshold
    complicatedThreshold(pixel);
    // Put result back
    image.at<Pixel>(r, c) = pixel;
  }
}

The above method is considered inefficient because the location of a pixel in memory is being calculated every time we call the at method. This involves a multiplication operation. The fact that the pixels are located in a contiguous block of memory is not used.
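To make that cost concrete, here is a minimal sketch of the address computation for a row-major image buffer. It is plain C++ and does not use OpenCV: the names Pixel3, offsetOf, and pixelAt are hypothetical, and the real at implementation may differ in detail. The point is only that every random access repeats a multiplication and an addition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 3-byte pixel, standing in for Point3_<uint8_t>.
struct Pixel3 { std::uint8_t x, y, z; };

// Index of pixel (r, c) in a row-major buffer with `cols` pixels per row.
// Each call costs one multiplication and one addition.
inline std::size_t offsetOf(int r, int c, int cols)
{
  return static_cast<std::size_t>(r) * cols + c;
}

// Random access in the style of at<Pixel>(r, c):
// the offset is recomputed from scratch on every call.
inline Pixel3& pixelAt(std::vector<Pixel3>& data, int r, int c, int cols)
{
  assert(r >= 0 && c >= 0 && c < cols);
  return data[offsetOf(r, c, cols)];
}
```

A sequential walk over the same buffer would instead advance a pointer by one pixel per step, which is the idea behind the next method.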

Method 2 : Pixel Access Using Pointer Arithmetic

In OpenCV, all pixels in a row are stored in one contiguous block of memory. If the Mat object is created using the create method, all pixels of the image are stored in one contiguous block of memory. Since we are reading the image from disk and imread uses the create method, we can loop over all pixels using simple pointer arithmetic that does not require a multiplication.

The code is shown below.

// Using pointer arithmetic

// Get pointer to first pixel
Pixel* pixel = image1.ptr<Pixel>(0,0);

// Mat objects created using the create method are stored
// in one contiguous memory block.
const Pixel* endPixel = pixel + image1.cols * image1.rows;
// Loop over all pixels
for (; pixel != endPixel; pixel++)
{
  complicatedThreshold(*pixel);
}
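The flat loop above depends on the whole image being one contiguous block. A Mat that is a view into a region of a larger image has gaps between its rows, and OpenCV reports this through Mat::isContinuous(). The sketch below shows, on a plain buffer and without OpenCV, how a row-by-row fallback might look. ImageView, rowPtr, and applyToAll are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 3-byte pixel, standing in for Point3_<uint8_t>.
struct Pixel3 { std::uint8_t x, y, z; };

// Hypothetical image view: `stride` is the distance in pixels between
// the starts of consecutive rows, and may be larger than `cols`.
struct ImageView
{
  Pixel3* data;
  int rows;
  int cols;
  std::size_t stride;

  bool isContiguous() const { return stride == static_cast<std::size_t>(cols); }
  Pixel3* rowPtr(int r) const { return data + static_cast<std::size_t>(r) * stride; }
};

// Applies op to every pixel of the view, skipping any row padding.
template <typename Op>
void applyToAll(const ImageView& view, Op op)
{
  assert(view.stride >= static_cast<std::size_t>(view.cols));
  if (view.isContiguous())
  {
    // No padding: one flat walk over rows * cols pixels.
    Pixel3* p = view.data;
    const Pixel3* end = p + static_cast<std::size_t>(view.rows) * view.cols;
    for (; p != end; ++p) op(*p);
    return;
  }
  // Padding between rows: restart the pointer at the beginning of each row.
  for (int r = 0; r < view.rows; ++r)
  {
    Pixel3* p = view.rowPtr(r);
    const Pixel3* end = p + view.cols;
    for (; p != end; ++p) op(*p);
  }
}
```

The row-by-row path still needs only one multiplication per row rather than one per pixel.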

Method 3 : Using forEach

The forEach method of the Mat class takes in a function object. The usage is:

void cv::Mat::forEach (const Functor &operation)

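The easiest way to understand the above usage is by way of an example shown below. We define a function object (Operator) for use with forEach. Its call operator receives a reference to the pixel and a pointer to the pixel's position, which is not used here.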

// Parallel execution with function object.
struct Operator
{
  void operator ()(Pixel &pixel, const int * position) const
  {
    // Apply the complicated threshold (position is unused)
    complicatedThreshold(pixel);
  }
};

Calling forEach is straightforward and takes just one line of code:

// Call forEach
image2.forEach<Pixel>(Operator());

Method 4 : Using forEach with C++11 Lambda

Some of you are looking at Method 3, shaking your head in disgust and shouting, “lambda, Lambda, LAMBDA!”

Well, here you go, C++11 junkie!

image3.forEach<Pixel>
(
  [](Pixel &pixel, const int * position) -> void
  {
    complicatedThreshold(pixel);
  }
);
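For intuition about what a parallel per-pixel call does, here is a conceptual sketch in plain C++ that splits the rows of a buffer across std::thread workers and hands each pixel to the operation together with its position (position[0] is the row, position[1] is the column). This is not OpenCV's implementation, which is not shown in this post. The names Pixel3 and parallelForEachPixel are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical 3-byte pixel, standing in for Point3_<uint8_t>.
struct Pixel3 { std::uint8_t x, y, z; };

// Calls op(pixel, position) for every pixel of a rows x cols row-major buffer.
// Rows are divided into contiguous bands, one band per worker thread.
// op is copied into each worker, so it must be callable on a const copy.
template <typename Op>
void parallelForEachPixel(std::vector<Pixel3>& data, int rows, int cols, Op op)
{
  assert(rows >= 0 && cols >= 0);
  assert(data.size() >= static_cast<std::size_t>(rows) * cols);

  const unsigned hw = std::thread::hardware_concurrency();  // 0 when unknown
  const int maxThreads = hw == 0 ? 1 : static_cast<int>(hw);
  const int numThreads = std::max(1, std::min(maxThreads, rows));
  const int rowsPerThread = (rows + numThreads - 1) / numThreads;

  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t)
  {
    const int begin = t * rowsPerThread;
    const int end = std::min(rows, begin + rowsPerThread);
    if (begin >= end) break;  // no rows left for this worker
    workers.emplace_back([&data, cols, begin, end, op]() {
      for (int r = begin; r < end; ++r)
      {
        for (int c = 0; c < cols; ++c)
        {
          const int position[2] = {r, c};
          op(data[static_cast<std::size_t>(r) * cols + c], position);
        }
      }
    });
  }
  // Wait for every band to finish before returning.
  for (std::thread& w : workers) w.join();
}
```

Because each worker touches a disjoint set of pixels, no locking is needed as long as the operation only modifies the pixel it is given.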

Comparing Performance of forEach

The function complicatedThreshold was applied to all pixels of a large image of size 9000 x 6750, five times in a row. The 2.5 GHz Intel Core i7 processor used in the experiment has four cores. The following timings were obtained. Note that using forEach made the code more than five times faster than either naive pixel access or pointer arithmetic.

Method	Time (milliseconds)
Naive Pixel Access	6656
Pointer Arithmetic	6575
forEach	1221
forEach (C++11 Lambda)	1272
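The timing code itself is not shown in this post. One way such numbers could be collected is with a small helper built on std::chrono, sketched below. The name timeRepeated is hypothetical, and this is an assumption about how one might measure, not the code that produced the table.

```cpp
#include <cassert>
#include <chrono>

// Runs fn() `repeats` times back to back and returns the total
// elapsed wall-clock time in milliseconds.
template <typename Fn>
double timeRepeated(int repeats, Fn fn)
{
  assert(repeats >= 0);
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeats; ++i)
  {
    fn();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, wrapping the forEach call from Method 3 in a lambda and passing it with repeats set to 5 would time five consecutive passes over the image.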

I have been writing code in OpenCV for more than a decade, and whenever I had to write optimized code that accessed a pixel, I used pointer arithmetic instead of the naive at method. However, while writing this post, I was shocked to find that there does not seem to be much of a difference between the two methods, even for large images.