Intersection over Union (IoU) is a number that quantifies the degree of overlap between two regions. In object detection and segmentation, it measures the overlap between the Ground Truth and the Prediction.
If you are a computer vision practitioner, or even an enthusiast, you have likely come across the term often. It is the first checkpoint for evaluating the accuracy of a model; in simple terms, it is a metric that measures the correctness of a prediction.
In this blog post, you will get a detailed and intuitive explanation of the following:
- Intersection over Union (IoU) in Object Detection
- Intersection over Union (IoU) in Image Segmentation
- A sample object detection example
- Implementing IoU using NumPy
- Implementing IoU using the built-in function box_iou in PyTorch
- Implementing IoU manually using PyTorch
- Conclusion
Intersection over Union in Object Detection
Let’s go through the following example to understand how IoU is calculated. Suppose three models, A, B, and C, are trained to detect birds. We pass an image, for which the Ground Truth (marked in red) is already known, through each model. The image below shows the models’ predictions (marked in cyan).
1.1 Observations
- It is clear that the predicted box of Model A overlaps the Ground Truth more than that of Model B.
- Model C has an even higher overlap with the Ground Truth, but it also overlaps a large part of the background.
- Models B and C make it clear that a metric based on overlap alone is not a fair one, as we should also account for localization accuracy. It is not just about matching the Ground Truth but how closely the prediction matches it.
- Therefore, we need a metric that penalizes a prediction whenever:
- The prediction fails to cover the area inside the Ground Truth.
- The prediction overflows the Ground Truth.
The IoU metric is designed keeping these two requirements in mind.
1.2 Designing Intersection over Union metric for Object Detection
IoU is the ratio of the area of overlap to the combined area of the Prediction and the Ground Truth. The numerator decreases when the prediction fails to cover the area inside the Ground Truth, and when the predicted box is unnecessarily large, the denominator increases, lowering the IoU.
IoU values range from 0 to 1. Where 0 means no overlap and 1 means perfect overlap.
Looking closely, adding the two areas counts the area of intersection twice, so it must be subtracted once from the denominator. We therefore actually calculate IoU as shown below.
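In equation form, with A_gt and A_pred denoting the areas of the Ground Truth and Prediction boxes and I the area of their intersection:

$$\text{IoU} = \frac{I}{A_{gt} + A_{pred} - I}$$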
1.3 Qualitative Analysis of Predictions
With the help of an IoU threshold, we can decide whether a prediction is True Positive (TP), False Positive (FP), or False Negative (FN). The example below shows predictions with the IoU threshold α set at 0.5.
Whether a detection is labeled True Positive or False Positive depends entirely on the requirement, as the following cases (and the short sketch after this list) illustrate.
- The first prediction is True Positive, as its IoU is above the threshold of 0.5.
- If we set the threshold at 0.97, it becomes a False Positive.
- Similarly, the second prediction shown above is False Positive due to the threshold but can be True Positive if we set the threshold at 0.20.
- Theoretically, the third prediction can also be True Positive, given that we lower the threshold all the way to 0.
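This decision rule is simple to express in code. Below is a minimal sketch; the function name and its default threshold are ours for illustration, not from any library:

def classify_detection(iou, threshold=0.5):
    # A detection is a True Positive when its IoU with the Ground Truth
    # reaches the threshold; otherwise it counts as a False Positive.
    return 'TP' if iou >= threshold else 'FP'

print(classify_detection(0.81))                  # TP at the default threshold of 0.5
print(classify_detection(0.81, threshold=0.97))  # FP at a stricter threshold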
Intersection over Union in Image Segmentation
IoU in object detection is a helper metric. However, in image segmentation, IoU is the primary metric to evaluate model accuracy.
In the case of Image Segmentation, the area is not necessarily rectangular. It can have any regular or irregular shape. That means the predictions are segmentation masks and not bounding boxes. Therefore, pixel-by-pixel analysis is done here. Moreover, the definition of TP, FP, and FN is slightly different as it is not based on a predefined threshold.
(a) True Positive: The area of intersection between the Ground Truth (GT) and the segmentation mask (S). Mathematically, this is the logical AND of GT and S, i.e., TP = GT ∧ S.
(b) False Positive: The predicted area outside the Ground Truth. This is the logical OR of GT and S, minus GT, i.e., FP = (GT ∨ S) − GT.
(c) False Negative: The number of pixels in the Ground Truth area that the model failed to predict. This is the logical OR of GT and S, minus S, i.e., FN = (GT ∨ S) − S.
We know from object detection that IoU is the ratio of the intersection area to the combined area of the Prediction and the Ground Truth. Since TP, FP, and FN are nothing but areas (counts of pixels), we can write IoU as follows.
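$$\text{IoU} = \frac{TP}{TP + FP + FN}$$

Pixel-wise, this is easy to compute with boolean masks. Here is a minimal NumPy sketch (the function name is ours, chosen for illustration):

import numpy as np

def mask_iou(gt, pred):
    # gt and pred are boolean segmentation masks of the same shape.
    intersection = np.logical_and(gt, pred).sum()  # TP pixels
    union = np.logical_or(gt, pred).sum()          # TP + FP + FN pixels
    return intersection / union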
Note that we have provided Colab notebooks for both the PyTorch and NumPy versions in the code download section, so there is no need to install dependencies manually. However, if you run them locally, install PyTorch from the official source.
A Sample Object Detection Example
In the image above, the blue bounding box is the detected object. Given that the Ground Truth is known (shown in red), let us see how to implement the IoU calculation using NumPy and PyTorch. We will use the available built-in function and also define our own.
The coordinates, in (x1, y1, x2, y2) order from the top-left to the bottom-right corner, are:
☑ Ground truth [1202, 123, 1650, 868]
☑ Prediction [1162.0001, 92.0021, 1619.9832, 694.0033]
In practice, the predictions are obtained from model inference.
Implementing Intersection over Union using NumPy
Now that we know how IoU is calculated in theory, let us define a function to calculate IoU on our data, i.e., the coordinates of the Ground Truth and the Prediction.
(a). Import Dependencies for IoU
import numpy as np
np.__version__
(b). Defining a Function to Calculate IoU
Here, we first find the coordinates of the bounding box surrounding the intersection area. Then we subtract the area of intersection from the sum of the areas of the Ground Truth and the Prediction to obtain the area of union. We add 1 while calculating height and width to counter zero-division errors. Theoretically, one could instead add an infinitesimally small positive value, say 0.0001. However, images are discrete: the minimum possible dimension of a box is 1×1 pixel, so we add 1.
def get_iou(ground_truth, pred):
    # Coordinates of the area of intersection.
    ix1 = np.maximum(ground_truth[0], pred[0])
    iy1 = np.maximum(ground_truth[1], pred[1])
    ix2 = np.minimum(ground_truth[2], pred[2])
    iy2 = np.minimum(ground_truth[3], pred[3])

    # Intersection height and width.
    i_height = np.maximum(iy2 - iy1 + 1, np.array(0.))
    i_width = np.maximum(ix2 - ix1 + 1, np.array(0.))

    area_of_intersection = i_height * i_width

    # Ground Truth dimensions.
    gt_height = ground_truth[3] - ground_truth[1] + 1
    gt_width = ground_truth[2] - ground_truth[0] + 1

    # Prediction dimensions.
    pd_height = pred[3] - pred[1] + 1
    pd_width = pred[2] - pred[0] + 1

    area_of_union = gt_height * gt_width + pd_height * pd_width - area_of_intersection

    iou = area_of_intersection / area_of_union

    return iou
(c). Bounding Box Coordinates
ground_truth_bbox = np.array([1202, 123, 1650, 868], dtype=np.float32)
prediction_bbox = np.array([1162.0001, 92.0021, 1619.9832, 694.0033], dtype=np.float32)
(d). Get IoU Value
iou = get_iou(ground_truth_bbox, prediction_bbox)
print('IOU: ', iou)
Output
IOU: 0.6441399913136432
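As a quick sanity check (this extra example is ours, not part of the original post), two boxes that do not overlap at all should give an IoU of 0:

box_a = np.array([0, 0, 10, 10], dtype=np.float32)
box_b = np.array([20, 20, 30, 30], dtype=np.float32)
print(get_iou(box_a, box_b))  # 0.0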
PyTorch Built-In Function for IoU
PyTorch already has a built-in function, box_iou [1], to calculate IoU; its documentation is linked in the Reference section. It takes two sets of bounding boxes as input and returns a tensor of pairwise IoU values.
# Import dependencies.
import torch
from torchvision import ops
# Bounding box coordinates.
ground_truth_bbox = torch.tensor([[1202, 123, 1650, 868]], dtype=torch.float)
prediction_bbox = torch.tensor([[1162.0001, 92.0021, 1619.9832, 694.0033]], dtype=torch.float)
# Get iou.
iou = ops.box_iou(ground_truth_bbox, prediction_bbox)
print('IOU : ', iou.numpy()[0][0])
Output
IOU : 0.6436676
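Since box_iou computes pairwise IoU, it handles multiple boxes at once: passing N ground-truth boxes and M predicted boxes returns an N×M tensor. A small example with made-up coordinates:

gt_boxes = torch.tensor([[0., 0., 10., 10.], [20., 20., 30., 30.]])
pred_boxes = torch.tensor([[0., 0., 10., 10.], [25., 25., 35., 35.]])
print(ops.box_iou(gt_boxes, pred_boxes))  # 2x2 tensor of pairwise IoU values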
Implementing IoU by defining a function in PyTorch
The code flow is similar to the NumPy implementation above.
(a). Import Dependencies
import torch
torch.__version__
(b). Function to Calculate IoU
def get_iou_torch(ground_truth, pred):
    # Coordinates of the area of intersection.
    ix1 = torch.max(ground_truth[0][0], pred[0][0])
    iy1 = torch.max(ground_truth[0][1], pred[0][1])
    ix2 = torch.min(ground_truth[0][2], pred[0][2])
    iy2 = torch.min(ground_truth[0][3], pred[0][3])

    # Intersection height and width.
    i_height = torch.max(iy2 - iy1 + 1, torch.tensor(0.))
    i_width = torch.max(ix2 - ix1 + 1, torch.tensor(0.))

    area_of_intersection = i_height * i_width

    # Ground Truth dimensions.
    gt_height = ground_truth[0][3] - ground_truth[0][1] + 1
    gt_width = ground_truth[0][2] - ground_truth[0][0] + 1

    # Prediction dimensions.
    pd_height = pred[0][3] - pred[0][1] + 1
    pd_width = pred[0][2] - pred[0][0] + 1

    area_of_union = gt_height * gt_width + pd_height * pd_width - area_of_intersection

    iou = area_of_intersection / area_of_union

    return iou
(c). Bounding Box Coordinates
The prediction bounding box is usually obtained while performing model inference. We are defining it manually here for the sake of simplicity.
ground_truth_bbox = torch.tensor([[1202, 123, 1650, 868]], dtype=torch.float)
prediction_bbox = torch.tensor([[1162.0001, 92.0021, 1619.9832, 694.0033]], dtype=torch.float)
(d). Get IoU Value
iou_val = get_iou_torch(ground_truth_bbox, prediction_bbox)
print('IOU : ', iou_val.numpy())
Output
IOU : 0.64413995
We can see that the two outputs differ slightly. The difference comes from adding 1 to counter the zero-division error; torchvision’s box_iou does not add 1 and instead clamps the intersection width and height to a minimum of 0. Here, let’s keep the +1 for the sake of simplicity. You can also look at the source code [2] for a better understanding. A clamp-based variant is sketched below.
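Here is a minimal sketch of that clamp-based variant (the function name is ours); it reproduces the box_iou output above:

def get_iou_clamped(ground_truth, pred):
    # Coordinates of the area of intersection.
    ix1 = torch.max(ground_truth[0][0], pred[0][0])
    iy1 = torch.max(ground_truth[0][1], pred[0][1])
    ix2 = torch.min(ground_truth[0][2], pred[0][2])
    iy2 = torch.min(ground_truth[0][3], pred[0][3])

    # Clamp the intersection dimensions at 0 instead of adding 1.
    area_of_intersection = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Box areas without the +1 correction.
    gt_area = (ground_truth[0][2] - ground_truth[0][0]) * (ground_truth[0][3] - ground_truth[0][1])
    pd_area = (pred[0][2] - pred[0][0]) * (pred[0][3] - pred[0][1])

    return area_of_intersection / (gt_area + pd_area - area_of_intersection)

print(get_iou_clamped(ground_truth_bbox, prediction_bbox).numpy())  # ~0.6436676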
Conclusion
So that’s all about Intersection over Union, also known as the Jaccard Index. In this blog post, we discussed the basics of IoU and why it is needed, and you learned how to implement IoU using NumPy and PyTorch. It should be noted that IoU does not play the same role in object detection as it does in segmentation.
In object detection, IoU does not calculate the accuracy of a model directly. Rather, it is a helper metric that evaluates the degree of overlap between ground truth and the prediction.
We have the Average Precision (AP) and Mean Average Precision (mAP) metrics for evaluating model accuracy. When we see mAP@0.5, mAP@0.75, etc., these are essentially mAP values calculated at IoU thresholds of 0.5 and 0.75, respectively. We will discuss mAP in more detail in a separate blog post.
Must Read Articles
Congratulations on making your way to the end of the post. I appreciate your commitment to mastering computer vision. Here are a few more articles that you might find interesting.
1. Mean Average Precision in Object Detection
2. Object Detection using YOLOv5 and OpenCV DNN in C++ and Python
3. Custom Object Detection Training using YOLOv5
4. YOLOv6 Object Detection – Paper Explanation and Inference
5. YOLOX Object Detector Paper Explanation and Custom Training
6. YOLOv7 Object Detection Paper Explanation and Inference
7. Fine Tuning YOLOv7