Non Maximum Suppression (NMS) is a technique used in numerous computer vision tasks. It is a class of algorithms to select one entity (e.g., bounding boxes) out of many overlapping entities. We can choose the selection criteria to arrive at the desired results. The criteria are most commonly some form of probability number and some form of overlap measure (e.g. Intersection over Union).

This post will go over how it works and implement it in Python using the PyTorch framework. So without further ado, let us get started 🙂

## Why do we need it?

Before we discuss how NMS works, we must try to answer why we need it first.

Most object detection algorithms use NMS to whittle down many detected bounding boxes to only a few. At the most basic level, most object detectors do some form of **windowing**. Thousands of windows (anchors) of various **sizes and shapes** are generated.

These windows supposedly contain **only one object**, and a classifier is used to obtain a **probability/score for each class**. Once the detector outputs a **large number of bounding boxes**, it is necessary to **filter out the best ones**. **NMS** is the most commonly used algorithm for this task.

Now that we know why we need it, let us dive into the explanation of how it works.

## Intersection Over Union (IoU)

Before we go ahead, I would like to discuss one concept that we will be using in the following sections: **Intersection Over Union**, or **IoU** for short.

The **Intersection over Union (IoU)** metric, also referred to as the **Jaccard index**, is commonly used to quantify the percent overlap between the ground truth BBox (Bounding Box) and the prediction BBox. In NMS, however, we find the IoU between two prediction BBoxes instead.

**IoU** in mathematical terms can be represented by the following expression,

**Intersection Over Union (IoU) = (Target ∩ Prediction) / (Target ∪ Prediction)**

In our case using BBoxes, it can be modified to,

**IOU(Box1, Box2) = Intersection_Size(Box1, Box2) / Union_Size(Box1, Box2)**

Consider the two BBoxes in the following figure,

Their union area is the orange area, and their intersection area is the purple area. So the IoU can be calculated by dividing the purple area by the orange area.

### Implementation of IoU in Python

Let us implement this in Python so that we can use it in the later sections. Consider two BBoxes: `Box1`, with lower left coordinates `(x1,y1)` and upper right coordinates `(a1,b1)`, and `Box2`, with coordinates `(x2,y2)` and `(a2,b2)`.

We find their intersection box, which has coordinates `(xx,yy)` and `(aa,bb)`, and then use the expression discussed above to find the IoU.

```
# find the area of box1 (x1,y1) (a1,b1)
area1 = (a1 - x1) * (b1 - y1)
# find the area of box2 (x2,y2) (a2,b2)
area2 = (a2 - x2) * (b2 - y2)

# Now we need to find the intersection box.
# To do that, find the largest (x, y) coordinates
# for the start of the intersection bounding box and
# the smallest (x, y) coordinates for the
# end of the intersection bounding box.
xx = max(x1, x2)
yy = max(y1, y2)
aa = min(a1, a2)
bb = min(b1, b2)

# So the intersection BBox has the coordinates (xx,yy) (aa,bb).
# Compute the width and height of the intersection bounding box,
# taking max with 0 to handle non-overlapping boxes.
w = max(0, aa - xx)
h = max(0, bb - yy)

# find the intersection area
intersection_area = w * h

# find the union area of both the boxes
union_area = area1 + area2 - intersection_area

# compute the ratio of overlap between the two boxes
IoU = intersection_area / union_area
```
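For reuse in later examples, the snippet above can be wrapped into a small function. Note that `iou` and its tuple-based box format are naming choices of ours, not part of the original code:

```python
def iou(box1, box2):
    """Compute IoU of two axis-aligned boxes given as (x, y, a, b),
    where (x, y) is the lower left and (a, b) the upper right corner."""
    x1, y1, a1, b1 = box1
    x2, y2, a2, b2 = box2
    area1 = (a1 - x1) * (b1 - y1)
    area2 = (a2 - x2) * (b2 - y2)
    # intersection corners: largest lower-left, smallest upper-right
    xx, yy = max(x1, x2), max(y1, y2)
    aa, bb = min(a1, a2), min(b1, b2)
    # clamp to 0 so non-overlapping boxes give zero intersection
    inter = max(0, aa - xx) * max(0, bb - yy)
    return inter / (area1 + area2 - inter)
```

For example, the 2×2 boxes `(1,1,3,3)` and `(2,2,4,4)` intersect in a 1×1 square, so their IoU is 1 / (4 + 4 − 1) = 1/7.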


## The NMS Algorithm

Let us get to the nitty-gritty of this post: the actual algorithm. I will divide this into three parts: what we need as input, what we get after applying the algorithm, and the actual algorithm itself.

### Input

We get a list `P` of prediction BBoxes of the form `(x1,y1,x2,y2,c)`, where `(x1,y1)` and `(x2,y2)` are the corners of the BBox and `c` is the model's predicted confidence score. We also get an IoU overlap threshold `thresh_iou`.

### Output

We return a list `keep` of filtered prediction BBoxes.

### Algorithm

**Step 1 :** Select the prediction `S` with the highest confidence score, remove it from `P` and add it to the final prediction list `keep`. (`keep` is empty initially.)

**Step 2 :** Now compare this prediction `S` with all the predictions present in `P`. Calculate the IoU of `S` with every prediction `T` in `P`. If the IoU is greater than the threshold `thresh_iou` for any prediction `T` present in `P`, remove prediction `T` from `P`.

**Step 3 :** If there are still predictions left in `P`, then go to **Step 1** again, else return the list `keep` containing the filtered predictions.

In layman terms, we select the predictions with the **maximum confidence** and **suppress all the other predictions** having overlap with the selected predictions greater than a threshold. In other words, we **take the maximum and suppress the non-maximum ones**, hence the name non-maximum suppression.

If you observe the algorithm above, the whole filtering process depends on a single threshold value `thresh_iou`. So the selection of the threshold value is vital for the performance of the model. Usually, we take its value as 0.5, but it depends on the experiment you are doing.
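The three steps above can be sketched in plain Python before we move to the PyTorch version. The helper `nms_plain` and its inner `iou` function are hypothetical names of ours, with boxes represented as `(x1, y1, x2, y2, score)` tuples:

```python
def nms_plain(boxes, thresh_iou=0.5):
    """Greedy NMS over (x1, y1, x2, y2, score) tuples."""
    def iou(a, b):
        # intersection rectangle, clamped to zero for disjoint boxes
        w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = w * h
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    P = sorted(boxes, key=lambda b: b[4])  # ascending by confidence
    keep = []
    while P:                               # Step 3: loop until P is empty
        S = P.pop()                        # Step 1: highest-score prediction
        keep.append(S)
        # Step 2: suppress everything overlapping S beyond the threshold
        P = [T for T in P if iou(S, T) < thresh_iou]
    return keep
```

With four heavily overlapping toy boxes and a threshold of 0.5, this sketch keeps only the single highest-confidence box, which is exactly the behaviour we will reproduce in PyTorch below.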

Now that we have a good grasp of how NMS works, let us implement it in PyTorch so that you can use it in your future Object Detection pipelines 🙂

## Implementation of NMS in PyTorch

Let us begin with the fun part.

We define the function that takes in a list of prediction boxes along with their confidence scores and a `thresh_iou`.

### Define the function

```
def nms_pytorch(P : torch.tensor, thresh_iou : float):
    """
    Apply non-maximum suppression to avoid detecting too many
    overlapping bounding boxes for a given object.
    Args:
        P: (tensor) The location preds for the image
            along with the class pred scores, Shape: [num_boxes, 5].
        thresh_iou: (float) The overlap thresh for suppressing
            unnecessary boxes.
    Returns:
        A list of filtered boxes, Shape: [ , 5]
    """
    # we extract coordinates for every
    # prediction box present in P
    x1 = P[:, 0]
    y1 = P[:, 1]
    x2 = P[:, 2]
    y2 = P[:, 3]
```

When I say coordinate `(x1,y1)`, I refer to the lower left corner of the BBox, and `(x2,y2)` refers to the upper right corner of the BBox.

We extract all the `x1` coordinates using simple slicing, `P[:, 0]`. Similarly, we extract the `x2`, `y1` and `y2` coordinates as well.

```
# we extract the confidence scores as well
scores = P[:, 4]
```

We also extract the confidence scores into `scores`.

```
# calculate area of every box in P
areas = (x2 - x1) * (y2 - y1)
# sort the prediction boxes in P
# according to their confidence scores
order = scores.argsort()
# initialise an empty list for
# filtered prediction boxes
keep = []
```

We find the area of every BBox in `P` and store it in `areas`. We also initialise an empty list `keep`, as discussed above, to store the filtered predictions.

Now we `argsort` (argument sort) the list `scores`, i.e. `order` will contain the indices of `scores` in sorted order.

For example, if `scores = [0.7, 0.3, 0.6]`, then `order = scores.argsort() = [1, 2, 0]`.

So now we have the indices of every BBox in `P` sorted according to their confidence scores.
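A quick check of the `argsort` behaviour using the toy scores from the example above:

```python
import torch

scores = torch.tensor([0.7, 0.3, 0.6])
# indices that would sort scores in ascending order:
# 0.3 (index 1) < 0.6 (index 2) < 0.7 (index 0)
order = scores.argsort()
print(order.tolist())  # [1, 2, 0]
```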

### Step 1 :

**Finding the BBox `S` with the highest score**

```
while len(order) > 0:
    # Step 1
    # extract the index of the
    # prediction with highest score
    # we call this prediction S
    idx = order[-1]
    # push S in filtered predictions list
    keep.append(P[idx])
    # remove S from P
    order = order[:-1]
```

As discussed in the NMS algorithm above, we extract the BBox with the highest confidence score and remove it from `P`. We select `S` using `idx = order[-1]` and then push this prediction into the filtered box list `keep`. (So here, `S` is actually `P[idx]`.)

```
# select coordinates of BBoxes according to
# the indices in order
xx1 = torch.index_select(x1,dim = 0, index = order)
xx2 = torch.index_select(x2,dim = 0, index = order)
yy1 = torch.index_select(y1,dim = 0, index = order)
yy2 = torch.index_select(y2,dim = 0, index = order)
```

Next, we extract all the `x1` coordinates according to the indices present in `order`.

For example, if `order = [2, 0, 1]` and `x1 = [2.0, 7.0, 4.0]`, then `xx1 = [4.0, 2.0, 7.0]`.

`torch.index_select` selects elements from an `input` tensor according to an `index` list. In our case, `index = order` and `input = x1`.

Similarly, we do it for `x2`, `y1` and `y2`.
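The same toy values, run through `torch.index_select`:

```python
import torch

x1 = torch.tensor([2.0, 7.0, 4.0])
order = torch.tensor([2, 0, 1])
# pick elements of x1 at positions 2, 0, 1
xx1 = torch.index_select(x1, dim=0, index=order)
print(xx1.tolist())  # [4.0, 2.0, 7.0]
```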

### Step 2 :

**Finding the intersection BBoxes between `S` and all predictions left in `P`**

```
# Step 2
# find the coordinates of the intersection boxes
xx1 = torch.max(xx1, x1[idx])
yy1 = torch.max(yy1, y1[idx])
xx2 = torch.min(xx2, x2[idx])
yy2 = torch.min(yy2, y2[idx])
# find height and width of the intersection boxes
w = xx2 - xx1
h = yy2 - yy1
# take max with 0.0 to avoid negative w and h
# due to non-overlapping boxes
w = torch.clamp(w, min=0.0)
h = torch.clamp(h, min=0.0)
```

Next, we find the intersection bounding box of the selected prediction `S = P[idx]` with all other predictions left in `P`.

A thing to note is that `(xx1,yy1)` is the lower left corner of the intersection BBox and `(xx2,yy2)` is the upper right corner of the intersection BBox.

To find the `xx1` coordinate of the intersection BBox of the selected prediction `S` with any prediction `T` in `P`, we do `xx1 = max(S.x1, T.x1)`. That is exactly what the line `xx1 = torch.max(xx1, x1[idx])` does: `x1[idx]` is actually `S.x1`, since `idx` is the index of the currently selected prediction `S` in `P`.

Similarly, we find `xx2`, `yy1` and `yy2` and obtain the intersection box coordinates of all the boxes present in `P` with the selected prediction `S`.

Next, we find the width `w` and height `h` of every intersection box and take a `max` with `0.0` to avoid negative `w` and `h` due to non-overlapping BBoxes.
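To see why the clamp matters, here is a toy case (values made up) where one pair of boxes does not overlap, so its raw intersection width comes out negative:

```python
import torch

# raw widths: the second pair of boxes does not overlap
w = torch.tensor([2.0, -1.5])
w = torch.clamp(w, min=0.0)
# after clamping, the non-overlapping pair contributes zero area
print(w.tolist())  # [2.0, 0.0]
```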

**Finding the IoU between `S` and all other BBoxes left in `P`**

```
# find the intersection area
inter = w*h
# find the areas of BBoxes according to the indices in order
rem_areas = torch.index_select(areas, dim = 0, index = order)
# find the union of every prediction T in P
# with the prediction S
# Note that areas[idx] represents area of S
union = (rem_areas - inter) + areas[idx]
# find the IoU of every prediction in P with S
IoU = inter / union
```

Now we find the intersection area `inter` of all the intersection boxes. We get the areas of the BBoxes according to the indices present in `order` using `torch.index_select`, and then calculate `union`, which contains the union area of every BBox present in `P` with `S`.

Look at the line `union = (rem_areas - inter) + areas[idx]`. First, we subtract the intersection area from the area of every box `T` in `P`. Then, we add the area of `S`, which adds back the intersection area plus the extra area of `S` not present in the intersection of `S` and `T`. Hence, we have calculated the union of `S` with every BBox `T` in `P`.

Next, we find the IoU by simply dividing `inter` by `union`.
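A worked numeric check of `union = (rem_areas - inter) + areas[idx]`, using two toy boxes of our own, `S = (0,0,2,2)` and `T = (1,1,3,3)`:

```python
# both boxes are 2x2, overlapping in the 1x1 square (1,1)-(2,2)
area_S = 4.0
area_T = 4.0
inter = 1.0
# subtract the intersection from T's area, then add back S's area
union = (area_T - inter) + area_S
iou = inter / union
print(union)  # 7.0
```

This matches the direct formula: union = 4 + 4 − 1 = 7, so IoU = 1/7.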

**Filtering out BBoxes having IoU overlap with `S` greater than the threshold `thresh_iou`**

```
# keep the boxes with IoU less than thresh_iou
mask = IoU < thresh_iou
order = order[mask]
# Step 3, check if there are still elements in P
return keep
```

Now we need to eliminate all BBoxes `T` in `P` having `IoU >= thresh_iou`. To achieve this, we create a mask using the criterion *which elements in `IoU` have a value less than `thresh_iou`*, as those are the values we want to keep.

For example, if `IoU = [0.4, 0.6, 0.3]` and `thresh_iou = 0.5`, then `mask = IoU < thresh_iou` results in `mask = [True, False, True]`. If we then apply `mask` over `order`, we get only the elements where the corresponding element in `mask` was `True`.

For example, if `order = [1, 0, 2]` and `mask = [True, False, True]`, then `order[mask] = [1, 2]`.

In this way, we keep only those predictions whose IoU with the current prediction `S` is less than `thresh_iou` and remove all those at or above the threshold.
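The masking step, with the toy values from the example above:

```python
import torch

IoU = torch.tensor([0.4, 0.6, 0.3])
order = torch.tensor([1, 0, 2])
mask = IoU < 0.5
print(mask.tolist())         # [True, False, True]
# boolean-mask indexing keeps only the True positions
print(order[mask].tolist())  # [1, 2]
```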

### Step 3 :

#### Putting it together

We repeat this loop until `P` (or, in our case, `order`) is empty. Once it is, we exit the loop and simply return `keep`, which stores our filtered predictions.

**So if we put everything together, our NMS function looks like below:**

```
def nms_pytorch(P : torch.tensor, thresh_iou : float):
    """
    Apply non-maximum suppression to avoid detecting too many
    overlapping bounding boxes for a given object.
    Args:
        P: (tensor) The location preds for the image
            along with the class pred scores, Shape: [num_boxes, 5].
        thresh_iou: (float) The overlap thresh for suppressing unnecessary boxes.
    Returns:
        A list of filtered boxes, Shape: [ , 5]
    """
    # we extract coordinates for every
    # prediction box present in P
    x1 = P[:, 0]
    y1 = P[:, 1]
    x2 = P[:, 2]
    y2 = P[:, 3]

    # we extract the confidence scores as well
    scores = P[:, 4]

    # calculate area of every box in P
    areas = (x2 - x1) * (y2 - y1)

    # sort the prediction boxes in P
    # according to their confidence scores
    order = scores.argsort()

    # initialise an empty list for
    # filtered prediction boxes
    keep = []

    while len(order) > 0:
        # extract the index of the
        # prediction with highest score
        # we call this prediction S
        idx = order[-1]

        # push S in filtered predictions list
        keep.append(P[idx])

        # remove S from P
        order = order[:-1]

        # sanity check
        if len(order) == 0:
            break

        # select coordinates of BBoxes according to
        # the indices in order
        xx1 = torch.index_select(x1, dim = 0, index = order)
        xx2 = torch.index_select(x2, dim = 0, index = order)
        yy1 = torch.index_select(y1, dim = 0, index = order)
        yy2 = torch.index_select(y2, dim = 0, index = order)

        # find the coordinates of the intersection boxes
        xx1 = torch.max(xx1, x1[idx])
        yy1 = torch.max(yy1, y1[idx])
        xx2 = torch.min(xx2, x2[idx])
        yy2 = torch.min(yy2, y2[idx])

        # find height and width of the intersection boxes
        w = xx2 - xx1
        h = yy2 - yy1

        # take max with 0.0 to avoid negative w and h
        # due to non-overlapping boxes
        w = torch.clamp(w, min = 0.0)
        h = torch.clamp(h, min = 0.0)

        # find the intersection area
        inter = w * h

        # find the areas of BBoxes according to the indices in order
        rem_areas = torch.index_select(areas, dim = 0, index = order)

        # find the union of every prediction T in P
        # with the prediction S
        # Note that areas[idx] represents area of S
        union = (rem_areas - inter) + areas[idx]

        # find the IoU of every prediction in P with S
        IoU = inter / union

        # keep the boxes with IoU less than thresh_iou
        mask = IoU < thresh_iou
        order = order[mask]

    return keep
```

## Testing our NMS function

Let us use a very simple example to test out the `nms_pytorch` function. Notice below that `P` is a tensor containing tensors of the form `(x,y,a,b,c)`, as was discussed above in the input section of the NMS algorithm.

```
# Let P be the following
P = torch.tensor([
    [1, 1, 3, 3, 0.95],
    [1, 1, 3, 4, 0.93],
    [1, 0.9, 3.6, 3, 0.98],
    [1, 0.9, 3.5, 3, 0.97]
])
```

If we plot the prediction boxes in `P`, we get a figure like this.

Now we run our `nms_pytorch` function with `thresh_iou` set to `0.5`.

```
filtered_boxes = nms_pytorch(P,0.5)
```

We get one filtered box and if we plot that, it will look like below.

So all the other extra overlapping boxes have been cleared out and we get a nice and clean prediction.

Now, let's try with a `thresh_iou` of 0.8.

```
filtered_boxes = nms_pytorch(P,0.8)
```

The following is the result that we get:

This time we have two boxes in the final plot. The reason is that this time the threshold was 0.8, and any box that has an IoU of less than 0.8 with the highest confidence box will not be suppressed and will be treated as a separate valid bounding box.

This is a very simple example. However, in this post, we have also used an image with prediction boxes predicted by an actual ResNet model. If we look at the image below without applying NMS, there are a lot of overlapping BBoxes (especially around the face of **Fez**, the **rightmost guy** with the **light green** shirt 🙂 ).

I have provided a `labels.txt` containing all the prediction boxes in the code provided with this post. It also contains the sample image, as well as a function that shows the image before and after applying the NMS algorithm. Make sure you check that out 🙂

Now if we apply our `nms_pytorch`

function, we get the following image.

Woah!! That's a pretty big difference. It looks like NMS removed all the overlapping predictions, and we got a very clean-looking final prediction.

## Outro

Congratulations on making it this far. That was a lot to cover, so I will recap it for you guys:

- We learned what NMS is and why it is used in object detection pipelines.
- We learned about Intersection Over Union as a metric to quantify percent overlap between two BBoxes.
- We got to know how NMS works and implemented it in PyTorch.
- We tested it out on an example image and saw a massive difference between the prediction boxes in the images before and after applying NMS.

I have discussed the vanilla NMS that is currently used in a lot of computer vision pipelines. However, it has some problems that recent research on NMS aims to fix. One of the solutions is the Soft NMS algorithm. If you are interested in reading about it, I recommend that you check out the research paper **Soft-NMS**.

So there you have it, your own implemented NMS in PyTorch that you can use in your deep learning pipelines.

That’s it from my side. I hope you enjoyed reading this. If there are any questions, leave a comment below and I will try to answer them as soon as possible.

Until next time 🙂
