Developing a new YOLO-based architecture can redefine state-of-the-art (SOTA) object detection by addressing the existing limitations and incorporating recent advancements in deep learning. Deep learning firm Deci.ai has recently launched YOLO-NAS, a deep learning model that delivers superior real-time object detection capabilities and is ready for production. The YOLO-NAS models were constructed using Deci’s AutoNAC™ NAS technology and outperform models such as YOLOv7 and YOLOv8, as well as the recently launched YOLOv6 v3.0.
- What is YOLO-NAS?
- Some Key Architectural Insights into YOLO-NAS
- A Brief Summary of YOLO-NAS Model Training
- How To Use YOLO-NAS For Inference?
- Conclusion
- References
YOLO Master Post – Every Model Explained
Don’t miss out on this comprehensive resource, Mastering All YOLO Models, for a richer, more informed perspective on the YOLO series.
What is YOLO-NAS?
YOLO-NAS is a new real-time state-of-the-art object detection model that outperforms both YOLOv6 & YOLOv8 models in terms of mAP (mean average precision) and inference latency.
Left: YOLO-NAS S | Center: YOLO-NAS M | Right: YOLO-NAS L
YOLO-NAS is a new foundational model for object detection from Deci.ai. The team incorporated recent advancements in deep learning to address key limiting factors of current YOLO models, such as inadequate quantization support and an insufficient accuracy-latency tradeoff. In doing so, they have successfully pushed the boundaries of real-time object detection capabilities.
“Imagine a new YOLO-based architecture that could enhance your ability to detect small objects, improve localization accuracy, and increase the performance-per-compute ratio, making the model more accessible for real-time edge-device applications… And that’s precisely what we’ve done here at Deci.”
– Deci.ai YOLO-NAS team
The newly released models are YOLO-NAS S, YOLO-NAS M, and YOLO-NAS L.
“Designed specifically for production use, YOLO-NAS is fully compatible with high-performance inference engines like NVIDIA® TensorRT™ and supports INT8 quantization for unprecedented runtime performance.”
– Deci.ai team
As of writing this article, three YOLO-NAS models have been released that can be used in FP32, FP16, and INT8 precisions.
Mean Average Precision (mAP) is the standard metric for evaluating object detection models: it averages precision over recall levels, object classes, and (for the COCO-style AP 50:95 metric) IoU thresholds ranging from 0.5 to 0.95.
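Under the hood, mAP relies on Intersection over Union (IoU) to decide whether a predicted box matches a ground-truth box at a given threshold. Below is a minimal, illustrative IoU helper; it is not part of any library mentioned in this post.

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection area divided by union area.
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # ~0.143, i.e., a poor match at an IoU threshold of 0.5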
Currently, the YOLO-NAS model architectures are available under an open-source license, but the pre-trained weights are available for research use (non-commercial) on Deci’s SuperGradients library only.
What does the “NAS” in YOLO-NAS stand for?
The “NAS” stands for “Neural Architecture Search,” a technique used to automate the design process of neural network architectures. Instead of relying on manual design and human intuition, NAS employs optimization algorithms to discover the most suitable architecture for a given task. NAS aims to find an architecture that achieves the best trade-off between accuracy, computational complexity, and model size.
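To make the idea concrete, here is a toy illustration of NAS as a search over a discrete configuration space. The search space, scoring heuristic, and latency budget below are invented purely for clarity; real NAS engines such as AutoNAC operate over vastly larger spaces and use hardware-aware performance predictors rather than hand-written formulas.

import random

# Toy illustration only: these values bear no relation to AutoNAC's real search.
search_space = {
    "num_blocks": [2, 3, 4, 6],
    "channels": [64, 128, 256],
    "block_type": ["conv", "rep_conv"],
}

def evaluate(config):
    # Stand-in for actually training and benchmarking a candidate architecture.
    accuracy = 0.50 + 0.05 * config["num_blocks"] + 0.0002 * config["channels"]
    latency_ms = 1.0 * config["num_blocks"] + 0.01 * config["channels"]
    return accuracy, latency_ms

def score(config, latency_budget_ms=8.0):
    accuracy, latency_ms = evaluate(config)
    # Discard candidates over the latency budget; otherwise rank by accuracy.
    return accuracy if latency_ms <= latency_budget_ms else float("-inf")

candidates = [{k: random.choice(v) for k, v in search_space.items()} for _ in range(100)]
best = max(candidates, key=score)
print(best)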
Some Key Architectural Insights into YOLO-NAS
- The architectures of YOLO-NAS models were “found” using Deci’s proprietary NAS technology, AutoNAC. This engine was used to ascertain the optimal sizes and structures of stages, encompassing block type, the number of blocks, and the number of channels in each stage.
- In all, there were roughly 10¹⁴ possible architecture configurations in the NAS search space. Being hardware- and data-aware, the AutoNAC engine considers all the components in the inference stack, including compilers and quantization, and homed in on a region termed the “efficiency frontier” to find the best models. All three YOLO-NAS models were found in this region of the search space.
- Throughout the NAS process, Quantization-Aware RepVGG (QA-RepVGG) blocks are incorporated into the model architecture, guaranteeing the model’s compatibility with Post-Training Quantization (PTQ).
- Quantization-aware “QSP” and “QCI” modules, built from QA-RepVGG blocks, provide the benefits of 8-bit quantization and reparameterization, enabling minimal accuracy loss during PTQ.
- The researchers also use a hybrid quantization method that selectively quantizes specific layers to optimize accuracy and latency tradeoffs while maintaining overall performance.
- YOLO-NAS models also use attention mechanisms and inference-time reparametrization to improve object detection capabilities (a simplified sketch of the reparameterization idea follows this list).
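To give a feel for what inference-time reparameterization means, here is a simplified RepVGG-style block: parallel branches at training time are folded into a single 3×3 convolution for inference. This is only an illustration; it omits BatchNorm and all the quantization-aware machinery that the actual QA-RepVGG blocks contain, and the class and method names are invented for this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleRepBlock(nn.Module):
    # Training-time block: parallel 3x3, 1x1, and identity branches (BatchNorm omitted for brevity).
    def __init__(self, channels):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv1x1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.conv3x3(x) + self.conv1x1(x) + x)

    def reparameterize(self):
        # Fold the 1x1 and identity branches into a single 3x3 convolution for inference.
        fused = nn.Conv2d(self.conv3x3.in_channels, self.conv3x3.out_channels, kernel_size=3, padding=1)
        kernel = self.conv3x3.weight.data.clone()
        bias = self.conv3x3.bias.data + self.conv1x1.bias.data
        # Pad the 1x1 kernel to 3x3 and add it to the 3x3 kernel.
        kernel += F.pad(self.conv1x1.weight.data, [1, 1, 1, 1])
        # The identity branch equals a 3x3 kernel with a 1 at the center of each channel's own filter.
        identity = torch.zeros_like(kernel)
        for c in range(kernel.shape[0]):
            identity[c, c, 1, 1] = 1.0
        kernel += identity
        fused.weight.data = kernel
        fused.bias.data = bias
        return fused

# Sanity check: the fused conv (plus activation) matches the multi-branch block.
block = SimpleRepBlock(16).eval()
x = torch.randn(1, 16, 32, 32)
fused_conv = block.reparameterize()
print(torch.allclose(block(x), block.act(fused_conv(x)), atol=1e-4))  # True

The fused block does exactly one 3×3 convolution per layer at inference time, which is what makes RepVGG-style designs fast and quantization-friendly despite their multi-branch training-time form.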
A Brief Summary of YOLO-NAS Model Training
The full details of the training regimen have not been published at the time of writing this article. We’ll update this section as soon as a paper or any new information becomes available. From what we can gather from the official press release, the models underwent a comprehensive and expensive training process.
- The models were pre-trained on the well-known Objects365 benchmark dataset, which consists of 2 million images spanning 365 categories.
- A second pre-training round was carried out on 123k unlabeled COCO images using pseudo-labels.
- Knowledge Distillation (KD) and Distribution Focal Loss (DFL) were also incorporated to enhance the training process of the YOLO-NAS models (a generic sketch of DFL follows this list).
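Deci has not published its exact KD and DFL setup. For readers unfamiliar with DFL, the snippet below is only a generic sketch of Distribution Focal Loss as formulated in the Generalized Focal Loss paper (Li et al.), not YOLO-NAS’s implementation; the function name and tensor shapes are assumptions made for illustration. The idea is to predict each box coordinate as a distribution over discrete bins and supervise the two bins nearest the continuous target.

import torch
import torch.nn.functional as F

def distribution_focal_loss(pred_logits, target):
    # pred_logits: (N, num_bins) logits over discrete bins for one box coordinate.
    # target: (N,) continuous regression targets, assumed to lie in [0, num_bins - 1).
    left = target.long()                   # lower neighbouring bin
    right = left + 1                       # upper neighbouring bin
    weight_left = right.float() - target   # the closer bin gets the larger weight
    weight_right = target - left.float()
    loss = (F.cross_entropy(pred_logits, left, reduction="none") * weight_left
            + F.cross_entropy(pred_logits, right, reduction="none") * weight_right)
    return loss.mean()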
“The YOLO-NAS architecture and pre-trained weights define a new frontier in low-latency inference and an excellent starting point for fine-tuning downstream tasks.”
– Deci.ai team
After pre-training, as an experiment, the team tested the models on Roboflow’s “Roboflow 100” dataset to demonstrate their ability to handle complex object detection tasks. Needless to say, YOLO-NAS outperformed other YOLO versions by a considerable margin.
How To Use YOLO-NAS For Inference?
YOLO-NAS models are tightly integrated with and distributed through SuperGradients, Deci’s PyTorch-based, open-source computer vision training library. In effect, using these models is really easy.
To use YOLO-NAS models, we first need to install a few libraries. In a new development environment, execute the following installation commands:
pip install -qU super-gradients imutils
pip install -qU roboflow
pip install -qU pytube
The “super-gradients” package installs all the required dependencies, such as PyTorch and TorchVision with CUDA support, along with other necessary libraries.
Object Detection Inference
For inference, first, we’ll import the two necessary packages:
import torch
from super_gradients.training import models
Running inference on images is straightforward. The following code loads the small YOLO-NAS S model.
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
model = models.get("yolo_nas_s", pretrained_weights="coco").to(device)
# "yolo_nas_m"
# "yolo_nas_l"
We are testing the model on the classic object detection test image. To perform inference, we call the model’s .predict(...) method.
out = model.predict("test_1.jpg", conf=0.6)
Finally, to visualize the outputs, simply run: out.show()
To save the predicted image, call:
out.save("save_folder_path")
Similarly, for running inference on videos, the API calls remain the same. Only this time, we will use the largest available model.
model = models.get("yolo_nas_l", pretrained_weights="coco").to(device)
model.predict("/test_videos/kitchen_small_items.mp4").save("kitchen_small_items_detections.mp4")
On a free-tier Colab T4 GPU, this inference ran at ~22 iterations/sec, i.e., 22 FPS. The 30-second video took about 35 seconds to process completely.
Here are some more video inference results on a drone shot:
The “S” (small) model took 15 seconds, while the “M” (medium) model took 22 seconds on the T4.
Please note that (at the moment) we’re refraining from making a concrete comparison of the models against YOLOv8 and YOLOv6 because, even though YOLO-NAS models score better on paper, the gap is small, and any comparison drawn from a few inference runs would carry little weight. A more meaningful comparison will come in the next post of the YOLO-NAS series, where we train these models on custom tasks and record the ease and quality of training against the current giants.
Comparing the end-to-end inference speeds (pre-processing + forward pass + post-processing), the FP32 YOLO-NAS L model took 24 seconds, while YOLOv8-L took ~12 seconds to process the entire video. The predictions of both models are very similar, and one cannot tell them apart solely by looking at the videos.
Please note that the difference in the output video lengths is because YOLOv8-L encoded the video at 29 FPS, while YOLO-NAS L encoded it at 29.97 FPS (the original frame rate).
Here’s a table listing all the metrics from YOLOv6 3.0, YOLOv8, and YOLO-NAS:
Model | mAP@0.50:0.95 (val) | Numeric Precision | Latency (bs=1) ms | Params (M) |
---|---|---|---|---|
YOLOv8-S | 44.9 | FP32 | 3.2 | 11.2 |
YOLOv6-S | 45.0 | FP32 | 2.9 | 18.5 |
YOLO-NAS S | 47.5 | FP16 | 3.21 | 19.0 |
YOLO-NAS S INT-8 | 47.03 | INT8 | 2.36 | 19.0 |
YOLOv8-M | 50.2 | FP32 | 7.0 | 25.9 |
YOLOv6-M | 50.0 | FP32 | 5.7 | 34.9 |
YOLO-NAS M | 51.55 | FP16 | 5.85 | 51.1 |
YOLO-NAS M INT-8 | 51.0 | INT8 | 3.78 | 51.1 |
YOLOv8-L | 52.9 | FP32 | 11.0 | 43.7 |
YOLOv6-L | 52.8 | FP32 | 10.3 | 59.6 |
YOLO-NAS L | 52.22 | FP16 | 7.87 | 66.9 |
YOLO-NAS L INT-8 | 52.1 | INT8 | 4.78 | 66.9 |
That’s it; that’s all we wanted to cover in this quick introduction to, and hands-on with, the newly released and current SOTA model – YOLO-NAS.
Conclusion
The landscape of object detection state-of-the-art (SOTA) models is constantly evolving, driven by relentless research and innovation in computer vision and deep learning. In recent times, YOLOv6 and YOLOv8 have been regarded as the best openly available real-time object detection models. Quite recently, a new competitor, YOLO-NAS by Deci.ai, has taken the top spot by providing better real-time object detection capabilities.
In this article, we explored the latest installment in the YOLO family, YOLO-NAS. We briefly covered the architecture search and training details of these new models and their performance, and we also performed inference on images and videos.
Soon, we’ll be adding new posts on custom training and deployment of these models.
If you conduct any experiments of your own, please share the results with us in the comment section. We would be delighted to hear about them.
In case you missed it, here’s the complete list of posts from our YOLO series:
- YOLOv8 Ultralytics: State-of-the-Art YOLO Models
- Train YOLOv8 on Custom Dataset – A Complete Tutorial
- Deploying a Deep Learning Model using Hugging Face Spaces and Gradio
- YOLOv6 Custom Training for Underwater Trash Detection
- YOLOv6 Object Detector Paper Explanation and Inference
- YOLOX Object Detector and Custom Training on Drone Dataset
- YOLOv7 Object Detector Training on Custom Dataset
- YOLOv7 Object Detector Paper Explanation and Inference
- YOLOv5 Custom Object Detector Training on Vehicles Dataset
- YOLOv5 Object Detection using OpenCV DNN
- YOLOv4 – Training a Custom Pothole Detector
- YOLOR Paper Explanation and Comparison