If you are undertaking an object detection project, chances are you will choose one of the many YOLO models. Given how many YOLO object detection models are out there, picking the best one for your use case is a tough choice.
You may find yourself contemplating:
- Which YOLO model to choose for the best FPS?
- What about inference speed on CPU vs GPU?
- Which GPU to choose?
- Tiny, Small, Medium, or Large model?
- Which YOLO model is the most accurate?
These questions become even more relevant when building real-world applications.
Our main objective in writing this article is to address the above questions by performing a thorough performance comparison of the different YOLO object detection models. This definitive guide will give you a complete and well-rounded perspective of where each model stands in terms of its strengths, shortcomings, and more.
For the comparative analysis, we choose the YOLOv5, YOLOv6, and YOLOv7 families of models, as they are among the latest and best-performing YOLO models. Other comparable models, such as YOLOX and YOLOR, may be added to this comparison if time permits.
The performance evaluation criteria will be based on 3 key points:
- Accuracy of the models in terms of Mean Average Precision (mAP)
- Speed of Inference in terms of Frames Per Second (FPS)
- Type of GPU Used: Gaming or AI GPUs, specifically the GTX 1080 Ti, RTX 4090, Tesla V100, and Tesla P100
Along with the above, we will also uncover how the FPS of different YOLO models is influenced when using either a gaming GPU or an AI GPU for inference.
We will additionally provide you with answers to the most frequently asked questions on the YOLO models:
- Which YOLO model is the fastest on the CPU?
- Which YOLO model is the fastest on the GPU?
- Why do we encounter a decrease in FPS with Tiny/Nano models on some GPUs?
- Which YOLO model is the most accurate?
- Which are some of the best models to fine-tune from each of YOLOv5, YOLOv6, and YOLOv7?
- Which models are the best for small object detection?
- How much GPU VRAM do I need for training YOLO models?
We highly recommend this article to anyone building an application with YOLO object detection models who wants to get the best results.
Table of Contents
- FPS Performance Comparison of YOLO Models on CPU
- FPS Performance Comparison of YOLO Models on NVIDIA RTX 4090 GPU
- Performance Comparison of YOLO Models on NVIDIA Tesla P100, V100, GTX 1080 Ti, and RTX 4090
- Performance Comparison of YOLO Models for mAP vs FPS
- YOLOv5 Inference At More than 230 FPS on NVIDIA RTX 4090
- FAQs About Performance Comparison of YOLO Object Detection Models
YOLO Master Post – Every Model Explained
Don’t miss out on this comprehensive resource, Mastering All YOLO Models, for a richer, more informed perspective on the YOLO series.
FPS Performance Comparison of YOLO Models on CPU
For the CPU performance benchmarks, we use a machine with an Intel i7 6850K CPU and 32 GB of RAM.
The following is a bar graph showing the FPS of each model from YOLOv5, YOLOv6, and YOLOv7 in a sorted manner.
From the graph, it’s clearly evident that the YOLOv5 Nano and YOLOv5 Nano P6 are some of the fastest models on CPU.
With just above 30 FPS, they can perform at more than real-time speed.
If you want more than 20 FPS, you can choose any of the four models: YOLOv6 Tiny, YOLOv6 Nano, YOLOv5 Nano P6, or YOLOv5 Nano.
You may observe that it is challenging to break the 30 FPS barrier even on a 6th-generation i7 CPU. To squeeze the real performance out of the YOLO models, we need to use a GPU.
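For reference, the sketch below shows one simple way such an end-to-end FPS measurement can be set up. It is a minimal example under our own assumptions (YOLOv5 Nano weights fetched via torch.hub and a hypothetical test image `test.jpg`), not the exact benchmarking script used for the numbers in this article.

```python
import time

import torch
from PIL import Image

# Load the YOLOv5 Nano model from the official Ultralytics hub (weights download on first run).
model = torch.hub.load("ultralytics/yolov5", "yolov5n")
model.cpu()  # force CPU inference for this benchmark

img = Image.open("test.jpg")  # hypothetical test image

# Warm-up run so one-time setup costs do not skew the timing.
model(img, size=640)

# Time repeated end-to-end inferences (preprocessing + model + NMS) and report the average FPS.
num_runs = 50
start = time.perf_counter()
for _ in range(num_runs):
    model(img, size=640)
elapsed = time.perf_counter() - start
print(f"Average FPS on CPU: {num_runs / elapsed:.1f}")
```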
FPS Performance Comparison of YOLO Models on NVIDIA RTX 4090 GPU
For the GPU inference, we use a machine with the latest flagship CUDA enabled GPU from NVIDIA, the RTX 4090. It is coupled with an AMD Ryzen 9 7950X 16-Core Processor.
FPS Results on 640 Resolution Images
The following graph shows the FPS results for all the YOLOv5, YOLOv6, and YOLOv7 models on 640-resolution images.
The results are absolutely astounding: the YOLOv5 Nano model is running at 230 FPS! Very interestingly, even the largest models from each YOLO family do not go below 30 FPS. All the models run in real time.
FPS Results on 1280 Resolution Images
Bear in mind that the YOLOv5 P6 models and YOLOv7-W6, E6, D6, and E6E are trained on 1280-resolution images. To get the best detection results, we should run inference on them with 1280-resolution images.
The following graph shows the FPS for 1280-resolution images using these models.
We can see drastic drops in FPS when moving from smaller to larger models in YOLOv5. For YOLOv7, the largest models, D6 and E6E, run at less than 30 FPS. The above graph does not include YOLOv6 models, as none of the YOLOv6 models were trained on 1280-resolution images.
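As a quick illustration, the sketch below runs one of the P6 models at its native 1280 resolution via torch.hub. It is a minimal example assuming the yolov5m6 weights from the Ultralytics hub and a hypothetical input image `street.jpg`.

```python
import torch
from PIL import Image

# Load a 1280-resolution (P6) model from the Ultralytics hub.
model = torch.hub.load("ultralytics/yolov5", "yolov5m6")

img = Image.open("street.jpg")  # hypothetical input image

# Run inference at 1280 resolution, matching the resolution the P6 models were trained on.
results = model(img, size=1280)
results.print()  # per-class detection summary
```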
The following performance comparison graphs show the FPS of the YOLO models on the P100, V100, and GTX 1080 Ti GPUs.
Let’s list out some of the observations from the above plots:
- In general, AI GPUs like the TESLA V100 and P100 perform consistently across model scales. This means they output higher FPS on smaller models and lower FPS on larger ones.
- This is not always the case with the GTX 1080 Ti GPU. We can observe that YOLOv5n is running slightly slower than YOLOv5s.
- Even so, in some cases with the P100 (an older AI GPU), we can see smaller models running slower than larger ones. For instance, YOLOv6-Tiny runs at 77 FPS, while YOLOv6-Nano, which is smaller than Tiny, runs at 71 FPS.
Performance Comparison of YOLO Models on NVIDIA Tesla P100, V100, GTX 1080 Ti, and RTX 4090
What Are the Fastest Models from Each YOLO Family on GPU?
In the above sections, we saw how the YOLO models perform on specific CPU and GPU architectures.
Let’s go ahead and conduct a comparison of the YOLO object detection models on different GPUs. Our objective is to find the fastest model after testing them on the following NVIDIA GPUs:
- TESLA P100 GPU
- TESLA V100 GPU
- GTX 1080 Ti GPU
- RTX 4090 GPU
From the above graph, we can observe the following:
- On the RTX 4090 GPU and TESLA P100, YOLOv5 Nano emerges as the fastest.
- YOLOv7 Tiny gives the most throughput on the GTX 1080 Ti and TESLA V100.
- The YOLOv6 Nano and Tiny models do not perform at the same FPS as the YOLOv5 and YOLOv7 models, although they are not very slow.
What are the Fastest YOLO Models on i7 6850K CPU?
On a general consumer CPU, we can expect the YOLOv5 Nano models (either P5 or P6) to be the fastest.
They give real-time FPS (more than 30), while the YOLOv7 Tiny runs at around 20 FPS.
Performance Comparison of YOLO Models for mAP vs. FPS
In this section, we compare the different models on CPU and different GPUs according to their mAP (Mean Average Precision) and FPS.
In the following graphs, all the mAP results have been reported at 0.50:0.95 IoU (Intersection Over Union).
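For context, COCO-style mAP at 0.50:0.95 IoU averages the Average Precision over ten IoU thresholds (0.50, 0.55, ..., 0.95). The sketch below shows how such a value is typically computed with pycocotools, assuming the ground-truth annotations and the model's detections have been exported to COCO JSON files (the file names here are placeholders).

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations and model detections in COCO JSON format (placeholder paths).
coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.loadRes("yolo_predictions.json")

# Evaluate bounding-box detections; summarize() prints AP averaged over IoU 0.50:0.95 first.
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```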
Let’s begin by looking at the mAP vs. FPS comparison graph for the CPU throughput. As the CPUs are not meant for large models, we compare the YOLOv5 Nano (P5 and P6), YOLOv6 Nano, and YOLOv7 Tiny models.
Note: mAP values for YOLOv7 Tiny and YOLOv5 Nano P6 have been recalculated on 640 resolution images.
Now, let’s take it further by comparing mAP and FPS on the GPUs.
Note: In some cases, when running experiments on GTX GPUs (gaming GPU from NVIDIA), we can see anomalies in the FPS of the smallest and the second smallest models in the YOLO family. We speculate that this happens because of the type of layer implementation.
The smallest models in the YOLO families are meant for edge devices and generally don’t use the same layers as the bigger models. Hence, they give slightly lower FPS compared to the second smallest model in that particular family. Such issues get resolved when using newer GPUs (like the RTX series) or AI GPUs (like the TESLA V100 series).
Some of the older AI GPUs like TESLA P100 also show this anomaly.
The following graphs show YOLOv5, YOLOv6, and YOLOv7 models pre-trained on 640-resolution images. The inference was also run on videos with the frames resized to 640 resolution.
We are excluding the YOLOv7-Tiny model from GPU experiments as it was pretrained on 416 resolution images. Besides, we have already seen how the YOLOv7 Tiny model performs on 640 resolution images when run on a CPU.
The following experiments were run on an NVIDIA RTX 4090 GPU.
It’s fascinating to see that even medium and large models from the YOLOv5 family can run at more than 100 FPS on the RTX 4090.
Our next graph shows the mAP and FPS comparison of 1280-resolution pre-trained models. The inference was conducted on 1280-resolution frames.
The YOLOv5m P6 model is running at more than 100 FPS. However, the YOLOv5l P6 model drops below 100 FPS.
The remaining graphs show the mAP and FPS comparison of 640-resolution pre-trained models on the TESLA P100, TESLA V100, and GTX 1080 Ti GPUs.
A key observation here is that the FPS of the YOLOv6 models seems to saturate between 150 and 170 FPS. This is observed with the latest RTX 4090 GPU and the V100 GPU. However, the FPS of the YOLOv5 models does not appear to display this effect.
It is clear from the above graphs that the YOLOv5 Nano P5 model is capable enough to run at more than 230 FPS on the NVIDIA RTX 4090 GPU.
YOLOv5 Inference At More than 230 FPS on NVIDIA RTX 4090
The NVIDIA RTX 4090 is the latest flagship gaming GPU. But as shown above, it can be used just as effectively for AI and deep learning.
We used the RTX 4090 GPU to run inference on the YOLOv5 Nano model to check the FPS.
Guess what? The RTX 4090 GPU can easily output more than 230 frames per second.
The following video shows an example of such a result.
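If you want to reproduce a GPU FPS measurement of this kind, the sketch below shows one reasonable setup. It is a minimal example under our own assumptions (torch.hub weights and a hypothetical `test.jpg`), not the exact script used for this article. Note the explicit `torch.cuda.synchronize()` calls, without which GPU timings can be misleading.

```python
import time

import torch
from PIL import Image

# Load YOLOv5 Nano from the Ultralytics hub and move it to the GPU.
model = torch.hub.load("ultralytics/yolov5", "yolov5n")
model.cuda()

img = Image.open("test.jpg")  # hypothetical test image

# Warm-up so CUDA setup and memory allocation do not skew the timing.
for _ in range(10):
    model(img, size=640)

num_runs = 200
torch.cuda.synchronize()   # ensure all queued GPU work has finished before timing starts
start = time.perf_counter()
for _ in range(num_runs):
    model(img, size=640)
torch.cuda.synchronize()   # wait for the last inference to complete
elapsed = time.perf_counter() - start
print(f"Average end-to-end FPS on GPU: {num_runs / elapsed:.1f}")
```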
FAQs About Performance Comparison of YOLO Object Detection Models
Here are answers to some of the questions most frequently asked by beginners getting started with YOLO object detection.
Which YOLO model is the fastest on the CPU?
Although the numbers vary depending on the CPU architecture, we can find a similar trend for the speed. The smaller the model, the faster it is on the CPU. From our experiments, we find that YOLOv5 Nano and Nano P6 models are the fastest. They can run at more than 30 FPS, even on an older generation i7 CPU.
Which YOLO model is the fastest on the GPU?
It is a bit tricky to answer this question. The FPS will vary depending on the GPU architecture (GTX/RTX Gaming or TESLA AI). Even older TESLA AI GPUs like the P100 may show anomalies in FPS.
If you are looking at more than 200 FPS inference speed, then RTX 4090 GPU paired with YOLO Nano/Tiny/Small models is the way to go.
Why do we encounter a decrease in FPS with Tiny/Nano models on some GPUs?
We encounter this mostly on older GPUs like the TESLA P100 or the GTX 1080 Ti. This is most probably because the processing of the layer implementations is not well optimized in these GPUs for the Nano or Tiny models. We can also see this issue getting resolved in newer RTX and TESLA V100 GPUs.
Which YOLO model is the most accurate?
It is difficult to single out one YOLO model. But it is pretty safe to say that the largest models from each family (v5x, v6l, v7x, and v7-D6/E6E) perform quite well. If your application needs any of the COCO classes, then using one of the mentioned pre-trained models will give you very accurate predictions.
Which are some of the best models to fine-tune from YOLOv5, YOLOv6, and YOLOv7?
Using huge models like YOLOv5x or YOLOv5x6 should be the last resort. From the YOLOv5 family, fine-tuning YOLOv5m at 640 resolution will yield good results. It can run at more than 80 FPS even on an older GPU like the TESLA P100, while still giving an mAP of 45.4.
YOLOv6m is also a pretty good model, with 49.5 mAP and almost 50 FPS on the TESLA P100 GPU. Training the YOLOv6 Medium model on a custom dataset should give some of the best results.
Similarly, fine-tuning YOLOv7 provides a good balance between FPS and mAP. It can run at 56 FPS while giving more than 51 mAP.
Given the COCO pre-trained mAP of the above models, all of them are good for fine-tuning at 640 image resolution.
Which models are the best for small object detection?
If you are working with small object detection, then starting with YOLOv5m6 at 1280 resolution will be a good idea.
Similarly, YOLOv6l at 1280 resolution or YOLOv7-W6 are also fairly good choices.
You can also experiment with multi-resolution training, which in most cases, helps with small object detection.
How much GPU VRAM do I need for training YOLO models?
There are multiple points to take into consideration here. Let's assume you are training a model the size of YOLOv5s, YOLOv5m, or even YOLOv5l. A GPU with 10 GB of VRAM should be enough at an input image resolution of 640. The same applies to YOLOv6m, YOLOv7, and YOLOv7x. This requires playing with the batch size a bit, but the YOLO repositories handle most batch sizes quite well.
If you want to train any of the YOLOv5 P6 models, YOLOv6l, or YOLOv7-W6 to YOLOv7-D6, you should consider having at least 16 GB of VRAM. Further, when training with larger batch sizes like 32 or 64, GPUs with 24 GB of VRAM are better. In cases where you want to carry out multi-resolution training, distributed training across multiple GPUs (at least 2) works best. All of the above YOLO repositories support distributed training.
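If you are unsure whether your GPU has enough memory for a given model and batch size, a simple sanity check is to read PyTorch's peak memory counter after letting training run for a few iterations. The sketch below is a generic PyTorch utility, not something specific to any of the YOLO repositories.

```python
import torch

# Run a few training iterations first, then check how much VRAM the run actually needed.
peak_gb = torch.cuda.max_memory_allocated(device=0) / 1024**3
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"Peak VRAM used: {peak_gb:.1f} GB out of {total_gb:.1f} GB available")

# If the peak is close to the total, reduce the batch size or the input resolution.
```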
To Conclude
In this article, we carried out extensive testing and a comparative analysis of the YOLO object detection models.
We specifically focused on the YOLOv5, YOLOv6, and YOLOv7 models and checked how they stack up against each other in terms of FPS and accuracy.
We also conducted experiments on the GTX, RTX, and TESLA series of NVIDIA GPUs to validate our results.
Congrats on making it this far. If you could follow along easily, or even with a little extra effort, kudos to you. If you run your own experiments on different hardware, please share your findings with us in the comments section. We would love to hear from you!