In this post, we will understand what YOLOv3 is and learn how to use this state-of-the-art object detector with OpenCV.
YOLOv3 is the latest variant of the popular object detection algorithm YOLO (You Only Look Once). The published model recognizes 80 different objects in images and videos. Most importantly, it is super fast and nearly as accurate as the Single Shot MultiBox Detector (SSD).
Starting with OpenCV 3.4.2, you can easily use YOLOv3 models in your own OpenCV application.
This post mainly focuses on inference, but you can also find our tutorial on training your own YOLOv3 model on your dataset.
How does YOLO work?
We can think of an object detector as a combination of an object locator and an object recognizer.
Traditional computer vision approaches used a sliding window to look for objects at different locations and scales. Because this was such an expensive operation, the aspect ratio of the object was usually assumed to be fixed.
Early Deep Learning based object detection algorithms like the R-CNN and Fast R-CNN used a method called Selective Search to narrow down the number of bounding boxes that the algorithm had to test.
Another approach, called OverFeat, involved scanning the image at multiple scales using a sliding-window-like mechanism applied convolutionally.
This was followed by Faster R-CNN, which used a Region Proposal Network (RPN) to identify bounding boxes that needed testing. By clever design, the features extracted for recognizing objects were also used by the RPN for proposing potential bounding boxes, thus saving a lot of computation.
YOLO, on the other hand, approaches the object detection problem in a completely different way. It forwards the whole image through the network only once. SSD is another object detection algorithm that forwards the image through a deep learning network in a single pass, but YOLOv3 is much faster than SSD while achieving comparable accuracy. YOLOv3 gives faster-than-real-time results on an M40, Titan X, or 1080 Ti GPU.
Let’s see how YOLO detects the objects in a given image.
First, it divides the image into a 13×13 grid of cells. The size of these 169 cells varies depending on the input size. For the 416×416 input size we used in our experiments, the cell size was 32×32. Each cell is then responsible for predicting a number of boxes in the image.
For each bounding box, the network also predicts the confidence that the bounding box actually encloses an object, and the probability of the enclosed object being a particular class.
Most of these bounding boxes are eliminated because their confidence is low or because they enclose the same object as another bounding box with a higher confidence score. This technique is called non-maximum suppression (NMS).
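To get a feel for the numbers involved, here is a quick back-of-the-envelope sketch in Python (assuming the standard YOLOv3 configuration, which predicts 3 boxes per grid cell at each of its 3 output scales):

# Candidate boxes produced by YOLOv3 for a 416x416 input.
# The network predicts at 3 scales, with strides 32, 16, and 8,
# and 3 anchor boxes per grid cell at each scale.
inp = 416
strides = [32, 16, 8]
boxes_per_cell = 3
total = sum((inp // s) ** 2 * boxes_per_cell for s in strides)
print(total)  # (13*13 + 26*26 + 52*52) * 3 = 10647 candidate boxes

Almost all of these candidates are discarded by the confidence threshold and non-maximum suppression.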
The authors of YOLOv3, Joseph Redmon and Ali Farhadi, have made YOLOv3 faster and more accurate than their previous work, YOLOv2. YOLOv3 handles multiple scales better. They have also improved the network by making it bigger and moving it towards residual networks by adding shortcut connections.
Why use OpenCV for YOLO?
Here are a few reasons you may want to use OpenCV for YOLO:
- Easy integration with an OpenCV application: If your application already uses OpenCV and you want to use YOLOv3, you don’t have to worry about compiling and building the extra Darknet code.
- OpenCV CPU version is 9x faster: OpenCV's CPU implementation of the DNN module is astonishingly fast. For example, Darknet, when used with OpenMP, takes about 2 seconds on a CPU for inference on a single image. In contrast, OpenCV's implementation runs in a mere 0.22 seconds! Check out the table below.
- Python support: Darknet is written in C and does not officially support Python. In contrast, OpenCV does. There are third-party Python ports of Darknet, though.
Speed Test for YOLOv3 on Darknet and OpenCV
The following table shows the performance of YOLOv3 on Darknet vs. OpenCV. The input size in all cases is 416×416. It is not surprising that the GPU version of Darknet outperforms everything else. It is also not surprising that Darknet with OpenMP works much better than Darknet without OpenMP, because OpenMP enables the use of multiple processors.
What is indeed surprising is that OpenCV's CPU implementation of DNN is 9x faster than Darknet with OpenMP.
OS | Framework | CPU/GPU | Time(ms)/Frame |
---|---|---|---|
Ubuntu 16.04 | Darknet | 12x Intel Core i7-6850K CPU @ 3.60GHz | 9370 |
Ubuntu 16.04 | Darknet + OpenMP | 12x Intel Core i7-6850K CPU @ 3.60GHz | 1942 |
Ubuntu 16.04 | OpenCV [CPU] | 12x Intel Core i7-6850K CPU @ 3.60GHz | 220 |
Ubuntu 16.04 | Darknet | NVIDIA GeForce 1080 Ti GPU | 23 |
macOS | Darknet | 2.5 GHz Intel Core i7 CPU | 7260 |
macOS | OpenCV [CPU] | 2.5 GHz Intel Core i7 CPU | 400 |
Table 1: Speed Test of YOLOv3 on Darknet vs OpenCV
Note: We ran into problems using OpenCV's GPU implementation of the DNN. The documentation indicates that it is tested only with Intel's GPUs, so the code automatically switches back to the CPU if you do not have an Intel GPU.
Object Detection using YOLOv3 in C++/Python
Let us now see how to use YOLOv3 in OpenCV to perform object detection.
Step 1 : Download the models
We will start by downloading the models using the script file getModels.sh from the command line.
sudo chmod a+x getModels.sh
./getModels.sh
This will download the yolov3.weights file (containing the pre-trained network’s weights), the yolov3.cfg file (containing the network configuration) and the coco.names file, which contains the 80 different class names used in the COCO dataset.
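If you prefer to fetch the files manually instead of running the script, the following commands download the same three files. These URLs are the upstream Darknet download locations at the time of writing; if they have moved, grab the files from the Darknet repository instead.
wget https://pjreddie.com/media/files/yolov3.weights
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names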
Step 2 : Initialize the parameters
The YOLOv3 algorithm generates bounding boxes as the predicted detection outputs. Every predicted box is associated with a confidence score. In the first stage, all the boxes below the confidence threshold parameter are ignored for further processing.
The rest of the boxes undergo non-maximum suppression, removing redundant, overlapping bounding boxes. Non-maximum suppression is controlled by a parameter nmsThreshold. You can try to change these values and see how the number of output predicted boxes changes.
Next, the default values for the input width (inpWidth) and height (inpHeight) of the network's input image are set. We set each of them to 416 to compare our runs to Darknet's C code released by YOLOv3's authors. You can change both of them to 320 to get faster results, or to 608 to get more accurate results.
Python
# Initialize the parameters
confThreshold = 0.5  # Confidence threshold
nmsThreshold = 0.4   # Non-maximum suppression threshold
inpWidth = 416       # Width of network's input image
inpHeight = 416      # Height of network's input image
C++
// Initialize the parameters
float confThreshold = 0.5; // Confidence threshold
float nmsThreshold = 0.4; // Non-maximum suppression threshold
int inpWidth = 416; // Width of network's input image
int inpHeight = 416; // Height of network's input image
Step 3 : Load the model and classes
The file coco.names contains all the objects for which the model was trained. First, we read the class names from this file.
Next, we load the network, which has two parts:
- yolov3.weights : The pre-trained weights.
- yolov3.cfg : The configuration file.
We set the DNN backend to OpenCV here and the target to CPU. You could try setting the preferable target to cv.dnn.DNN_TARGET_OPENCL to run it on a GPU; a short sketch follows the C++ listing below. But keep in mind that the current OpenCV version is tested only with Intel's GPUs; it automatically switches back to the CPU if you do not have an Intel GPU.
Python
# Load names of classes
classesFile = "coco.names"
classes = None
with open(classesFile, 'rt') as f:
    classes = f.read().rstrip('\n').split('\n')

# Give the configuration and weight files for the model and load the network using them.
modelConfiguration = "yolov3.cfg"
modelWeights = "yolov3.weights"

net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)
C++
// Load names of classes
string classesFile = "coco.names";
ifstream ifs(classesFile.c_str());
string line;
while (getline(ifs, line)) classes.push_back(line);
// Give the configuration and weight files for the model
String modelConfiguration = "yolov3.cfg";
String modelWeights = "yolov3.weights";
// Load the network
Net net = readNetFromDarknet(modelConfiguration, modelWeights);
net.setPreferableBackend(DNN_BACKEND_OPENCV);
net.setPreferableTarget(DNN_TARGET_CPU);
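If you want to experiment with the OpenCL path mentioned above, the change is a one-line swap. This is just a sketch; on non-Intel GPUs, OpenCV will silently fall back to the CPU:

# Ask the DNN module to run on an OpenCL-capable (Intel) GPU instead of the CPU.
net.setPreferableTarget(cv.dnn.DNN_TARGET_OPENCL)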
Step 4 : Read the input
In this step, we read the input, which can be an image, a video file, or the webcam stream. We also open the video writer to save the frames with the detected output bounding boxes.
Python
outputFile = "yolo_out_py.avi"
if (args.image):
# Open the image file
if not os.path.isfile(args.image):
print("Input image file ", args.image, " doesn't exist")
sys.exit(1)
cap = cv.VideoCapture(args.image)
outputFile = args.image[:-4]+'_yolo_out_py.jpg'
elif (args.video):
# Open the video file
if not os.path.isfile(args.video):
print("Input video file ", args.video, " doesn't exist")
sys.exit(1)
cap = cv.VideoCapture(args.video)
outputFile = args.video[:-4]+'_yolo_out_py.avi'
else:
# Webcam input
cap = cv.VideoCapture(0)
# Get the video writer initialized to save the output video
if (not args.image):
vid_writer = cv.VideoWriter(outputFile, cv.VideoWriter_fourcc('M','J','P','G'), 30, (round(cap.get(cv.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv.CAP_PROP_FRAME_HEIGHT))))
C++
outputFile = "yolo_out_cpp.avi";
if (parser.has("image"))
{
// Open the image file
str = parser.get<String>("image");
ifstream ifile(str);
if (!ifile) throw("error");
cap.open(str);
str.replace(str.end()-4, str.end(), "_yolo_out_cpp.jpg");
outputFile = str;
}
else if (parser.has("video"))
{
// Open the video file
str = parser.get<String>("video");
ifstream ifile(str);
if (!ifile) throw("error");
cap.open(str);
str.replace(str.end()-4, str.end(), "_yolo_out_cpp.avi");
outputFile = str;
}
// Open the webcaom
else cap.open(parser.get<int>("device"));
catch(...) {
cout << "Could not open the input image/video stream" << endl;
return 0;
}
// Get the video writer initialized to save the output video
if (!parser.has("image")) {
video.open(outputFile, VideoWriter::fourcc('M','J','P','G'), 28, Size(cap.get(CAP_PROP_FRAME_WIDTH), cap.get(CAP_PROP_FRAME_HEIGHT)));
}
Step 5 : Process each frame
The input image to a neural network needs to be in a certain format called a blob.
After a frame is read from the input image or video stream, it is passed through the blobFromImage function to convert it into an input blob for the neural network. In this process, it scales the image pixel values to a target range of 0 to 1 using a scale factor of 1/255. It also resizes the image to the given size of (416, 416) without cropping. Note that we do not perform any mean subtraction here; hence we pass [0,0,0] to the mean parameter of the function and keep the swapRB parameter at its default value of 1. A small sanity-check snippet follows the C++ listing below.
The output blob is then passed into the network as its input, and a forward pass is run to get a list of predicted bounding boxes as the network's output. These boxes go through a post-processing step to filter out the ones with low confidence scores. We will go through the post-processing step in more detail in the next section. We print the inference time for each frame at the top left. The image with the final bounding boxes is then saved to disk, either as an image for an image input or via the video writer for a video stream input.
Python
while cv.waitKey(1) < 0:
    # Get frame from the video
    hasFrame, frame = cap.read()

    # Stop the program if reached end of video
    if not hasFrame:
        print("Done processing !!!")
        print("Output file is stored as ", outputFile)
        cv.waitKey(3000)
        # Release device
        cap.release()
        break

    # Create a 4D blob from a frame.
    blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0], 1, crop=False)

    # Sets the input to the network
    net.setInput(blob)

    # Runs the forward pass to get output of the output layers
    outs = net.forward(getOutputsNames(net))

    # Remove the bounding boxes with low confidence
    postprocess(frame, outs)

    # Put efficiency information. The function getPerfProfile returns the overall time
    # for inference (t) and the timings for each of the layers (in layersTimes).
    t, _ = net.getPerfProfile()
    label = 'Inference time: %.2f ms' % (t * 1000.0 / cv.getTickFrequency())
    cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))

    # Write the frame with the detection boxes
    if (args.image):
        cv.imwrite(outputFile, frame.astype(np.uint8))
    else:
        vid_writer.write(frame.astype(np.uint8))
C++
// Process frames.
while (waitKey(1) < 0)
{
    // Get frame from the video
    cap >> frame;

    // Stop the program if reached end of video
    if (frame.empty()) {
        cout << "Done processing !!!" << endl;
        cout << "Output file is stored as " << outputFile << endl;
        waitKey(3000);
        break;
    }

    // Create a 4D blob from a frame.
    blobFromImage(frame, blob, 1/255.0, cv::Size(inpWidth, inpHeight), Scalar(0,0,0), true, false);

    // Sets the input to the network
    net.setInput(blob);

    // Runs the forward pass to get output of the output layers
    vector<Mat> outs;
    net.forward(outs, getOutputsNames(net));

    // Remove the bounding boxes with low confidence
    postprocess(frame, outs);

    // Put efficiency information. The function getPerfProfile returns the overall time
    // for inference (t) and the timings for each of the layers (in layersTimes).
    vector<double> layersTimes;
    double freq = getTickFrequency() / 1000;
    double t = net.getPerfProfile(layersTimes) / freq;
    string label = format("Inference time for a frame : %.2f ms", t);
    putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255));

    // Write the frame with the detection boxes
    Mat detectedFrame;
    frame.convertTo(detectedFrame, CV_8U);
    if (parser.has("image")) imwrite(outputFile, detectedFrame);
    else video.write(detectedFrame);

    imshow(kWinName, frame);
}
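As a quick sanity check on what blobFromImage produces, here is a small standalone snippet, not part of the script above; the random frame is just a stand-in for a real capture. The blob is a 4D NCHW tensor with values scaled to the 0 to 1 range:

import cv2 as cv
import numpy as np

# A dummy 3-channel BGR "frame" standing in for a real capture.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

blob = cv.dnn.blobFromImage(frame, 1/255, (416, 416), [0, 0, 0], 1, crop=False)
print(blob.shape)              # (1, 3, 416, 416): batch x channels x height x width
print(blob.min(), blob.max())  # values lie in [0, 1] due to the 1/255 scale factor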
Now let’s go into details of some of the function calls used above.
Step 5a : Getting the names of output layers
The forward function in OpenCV's Net class needs to know the final layer up to which it should run the network. Since we want to run through the whole network, we need to identify its last layer. We do that by using the function getUnconnectedOutLayers(), which gives the indices of the unconnected output layers; these are essentially the last layers of the network. Then we run the forward pass of the network to get output from the output layers, as in the previous code snippet (net.forward(getOutputsNames(net))).
Python
# Get the names of the output layers
def getOutputsNames(net):
    # Get the names of all the layers in the network
    layersNames = net.getLayerNames()
    # Get the names of the output layers, i.e. the layers with unconnected outputs.
    # Note: OpenCV 3.4.x returns the indices as 1-element arrays; newer versions
    # return plain integers, in which case use layersNames[i - 1] instead.
    return [layersNames[i[0] - 1] for i in net.getUnconnectedOutLayers()]
C++
// Get the names of the output layers
vector<String> getOutputsNames(const Net& net)
{
static vector<String> names;
if (names.empty())
{
//Get the indices of the output layers, i.e. the layers with unconnected outputs
vector<int> outLayers = net.getUnconnectedOutLayers();
//get the names of all the layers in the network
vector<String> layersNames = net.getLayerNames();
// Get the names of the output layers in names
names.resize(outLayers.size());
for (size_t i = 0; i < outLayers.size(); ++i)
names[i] = layersNames[outLayers[i] - 1];
}
return names;
}
Step 5b : Post-processing the network's output
Each bounding box output by the network is represented by a vector of (number of classes + 5) elements; for the 80-class COCO model, that is 85 numbers per box.
The first 4 elements represent center_x, center_y, width, and height. The fifth element represents the confidence that the bounding box encloses an object.
The rest of the elements are the confidence scores associated with each class (i.e., object type). The box is assigned to the class corresponding to the highest score for that box.
The highest score for a box is also called its confidence. If the confidence of a box is less than the given threshold, the bounding box is dropped and not considered for further processing.
The boxes with confidence equal to or greater than the confidence threshold are then subjected to Non Maximum Suppression. This would reduce the number of overlapping boxes.
Python
# Remove the bounding boxes with low confidence using non-maxima suppression
def postprocess(frame, outs):
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]

    # Scan through all the bounding boxes output from the network and keep only the
    # ones with high confidence scores. Assign the box's class label as the class with the highest score.
    classIds = []
    confidences = []
    boxes = []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            classId = np.argmax(scores)
            confidence = scores[classId]
            if confidence > confThreshold:
                center_x = int(detection[0] * frameWidth)
                center_y = int(detection[1] * frameHeight)
                width = int(detection[2] * frameWidth)
                height = int(detection[3] * frameHeight)
                left = int(center_x - width / 2)
                top = int(center_y - height / 2)
                classIds.append(classId)
                confidences.append(float(confidence))
                boxes.append([left, top, width, height])

    # Perform non maximum suppression to eliminate redundant overlapping boxes with
    # lower confidences.
    indices = cv.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)
    for i in indices:
        i = i[0]  # on newer OpenCV versions the indices are plain ints; use i directly
        box = boxes[i]
        left = box[0]
        top = box[1]
        width = box[2]
        height = box[3]
        drawPred(classIds[i], confidences[i], left, top, left + width, top + height)
C++
// Remove the bounding boxes with low confidence using non-maxima suppression
void postprocess(Mat& frame, const vector<Mat>& outs)
{
vector<int> classIds;
vector<float> confidences;
vector<Rect> boxes;
for (size_t i = 0; i < outs.size(); ++i)
{
// Scan through all the bounding boxes output from the network and keep only the
// ones with high confidence scores. Assign the box's class label as the class
// with the highest score for the box.
float* data = (float*)outs[i].data;
for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
{
Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
Point classIdPoint;
double confidence;
// Get the value and location of the maximum score
minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
if (confidence > confThreshold)
{
int centerX = (int)(data[0] * frame.cols);
int centerY = (int)(data[1] * frame.rows);
int width = (int)(data[2] * frame.cols);
int height = (int)(data[3] * frame.rows);
int left = centerX - width / 2;
int top = centerY - height / 2;
classIds.push_back(classIdPoint.x);
confidences.push_back((float)confidence);
boxes.push_back(Rect(left, top, width, height));
}
}
}
// Perform non maximum suppression to eliminate redundant overlapping boxes with
// lower confidences
vector<int> indices;
NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
for (size_t i = 0; i < indices.size(); ++i)
{
int idx = indices[i];
Rect box = boxes[idx];
drawPred(classIds[idx], confidences[idx], box.x, box.y,
box.x + box.width, box.y + box.height, frame);
}
}
Non-maximum suppression is controlled by the nmsThreshold parameter. If nmsThreshold is set too low, e.g., 0.1, we might not detect overlapping objects of the same or different classes. But if it is set too high, e.g., 1, then we get multiple boxes for the same object. So we used an intermediate value of 0.4 in our code above. The GIF below shows the effect of varying the NMS threshold, and a small standalone example follows the figure.
Figure 1: Effect of changing the parameter nmsThreshold
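To see the effect in isolation, here is a tiny standalone sketch that runs cv.dnn.NMSBoxes on two heavily overlapping boxes with made-up scores (depending on your OpenCV version, the returned indices may be a flat array or an N×1 array):

import cv2 as cv

# Two heavily overlapping boxes in (left, top, width, height) format.
boxes = [[100, 100, 200, 200], [110, 105, 200, 200]]
confidences = [0.9, 0.8]

# With nmsThreshold = 0.4, the boxes' IoU (about 0.86) exceeds the threshold,
# so the lower-scoring box is suppressed and only box 0 survives.
print(cv.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4))

# With a very permissive nmsThreshold = 0.99, both boxes survive.
print(cv.dnn.NMSBoxes(boxes, confidences, 0.5, 0.99))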
Step 5c : Draw the predicted boxes
Finally, we draw the boxes that survived non-maximum suppression on the input frame, with their assigned class labels and confidence scores.
Python
# Draw the predicted bounding box
def drawPred(classId, conf, left, top, right, bottom):
    # Draw a bounding box.
    cv.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 3)

    label = '%.2f' % conf

    # Get the label for the class name and its confidence
    if classes:
        assert(classId < len(classes))
        label = '%s:%s' % (classes[classId], label)

    # Display the label at the top of the bounding box
    labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)
    top = max(top, labelSize[1])
    cv.rectangle(frame, (left, top - round(1.5*labelSize[1])), (left + round(1.5*labelSize[0]), top + baseLine), (255, 255, 255), cv.FILLED)
    cv.putText(frame, label, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 0), 1)
C++
// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
//Draw a rectangle displaying the bounding box
rectangle(frame, Point(left, top), Point(right, bottom), Scalar(255, 178, 50), 3);
//Get the label for the class name and its confidence
string label = format("%.2f", conf);
if (!classes.empty())
{
CV_Assert(classId < (int)classes.size());
label = classes[classId] + ":" + label;
}
//Display the label at the top of the bounding box
int baseLine;
Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
top = max(top, labelSize.height);
rectangle(frame, Point(left, top - round(1.5*labelSize.height)), Point(left + round(1.5*labelSize.width), top + baseLine), Scalar(255, 255, 255), FILLED);
putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0,0,0),1);
}
Subscribe & Download Code
If you liked this article and would like to download code (C++ and Python) and example images used in this post, please click here. Alternately, sign up to receive a free Computer Vision Resource Guide. In our newsletter, we share OpenCV tutorials and examples written in C++/Python, and Computer Vision and Machine Learning algorithms and news.
References:
We used video clips from the following sources:
Pixabay: [1], [2], [3], [4], [5], [6]
Pexels: [2]
Must Read Articles
Here are a few similar blog posts that you may be interested in.
- YOLOv7 Object Detection Paper Explanation and Inference
- Fine Tuning YOLOv7 on Custom Dataset
- YOLOv7 Pose vs MediaPipe in Human Pose Estimation
- YOLOv6 Object Detection – Paper Explanation and Inference
- YOLOX Object Detector Paper Explanation and Custom Training
- Object Detection using YOLOv5 and OpenCV DNN in C++ and Python
- Custom Object Detection Training using YOLOv5
- Pothole Detection using YOLOv4 and Darknet
Hi Satya Mallick,
Your blog always comes with new ideas. Great work. Keep it up.
Muhammad, thanks. It’s not just me, now we are a small team :). For example, this post was written by Dr. Sunita Nayak.
Your team's work is highly appreciated.
Hi Satya, thanks a lot for the post. How do you calculate mean average precision (mAP) in YOLOv3, or any object detector, for both training and testing data? I'm using the Keras implementation of YOLOv3.
You can use the COCO API. For the theory, you can look into the following links:
medium
COCO dataset page
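For reference, here is a minimal sketch of COCO-style mAP evaluation with the pycocotools package, assuming you have exported your ground truth and your detector's outputs in COCO JSON format (the file names below are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations and detections, both in COCO JSON format.
cocoGt = COCO('annotations/instances_val2017.json')
cocoDt = cocoGt.loadRes('yolov3_detections.json')

cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()  # prints AP/AR, including mAP@[.5:.95] and mAP@.5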
Thanks, Vikas, for sharing this link. Can I use this API to calculate mAP for my own custom dataset?
Hi Sunita,
Do you know if the OpenCV dnn module is also compatible with the latest YOLOv3-spp (spatial pyramid pooling) configuration (https://github.com/pjreddie/darknet/blob/master/cfg/yolov3-spp.cfg) and weights (https://pjreddie.com/media/files/yolov3-spp.weights)?
And, of course, thanks a lot for the post!
Hi Fabio,
Yes, the above code runs fine with the yolov3-spp.config and yolov3-spp.weights files too. Thanks for reading !
Hello Satya, thanks for sharing this good information. I want to compare the Darknet version and the OpenCV version.
Thanks, Kim. We have shared the OpenCV version. For darknet, you can directly get it from https://pjreddie.com/darknet/
Hi Satya, what version of OpenCV did you use in the blog?
I am getting the following error:
OpenCV(3.4.1) Error: Parsing error (Unknown layer type: shortcut) in ReadDarknetFromCfgFile, file opencv/modules/dnn/src/darknet/darknet_io.cpp, line 503
Traceback (most recent call last):
  File "object_detection_yolo.py", line 33, in <module>
net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
cv2.error: OpenCV(3.4.1) opencv/modules/dnn/src/darknet/darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile
My mistake, I didn't check the OpenCV version requirement of 3.4.2.
Cool. Did you get it working?
Yup, it's working after upgrading to 3.4.2. Thanks for the post, Sunita & Satya.
Hello Satya. Thanks for a very good post.
By the way, I have made a project using VS 2017 and used this code. After running the code, I got this result: http://prntscr.com/kts9x0. Do you have any suggestion as to why it is not showing anything?
I used OpenCV 3.4.3 and yolov3.cfg, yolov3.weights, coco.names, and my own image.
I have edited the width and height parameters in yolov3.cfg according to the size of my image.
My OpenCV version is 3.4.2 and yet, I still get the error. Mine reads
OpenCV Error: Parsing error (Unknown layer type: shortcut) in ReadDarknetFromCfgFile, file /home/epolicar/Applications/opencv/modules/dnn/src/darknet/darknet_io.cpp, line 503
Traceback (most recent call last):
  File "object_detection_yolo.py", line 33, in <module>
net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
cv2.error: /home/epolicar/Applications/opencv/modules/dnn/src/darknet/darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile
Any ideas?
Could I use a GPU to run YOLOv3 with OpenCV?
You could target it to OPENCL and try that. But OpenCV mentions that they have tested it only on Intel GPUs. So if you don’t have an Intel GPU, they would revert the run back to CPU.
Thanks for the confirmation. I found it hard to believe OpenCV DNN still does not support NVIDIA GPUs. It is nice to know YOLOv3 can run via the OpenCV DNN module, but it is almost useless when you need to process multiple streams; this solution eats too much CPU. GPUs should be the solution for deep learning, and I am not sure why the OpenCV community does not support NVIDIA or AMD first (especially NVIDIA). The only reason I can think of is a business issue.
You can use a wrapper like this:
https://github.com/TommyX12/darknet-cpp-wrapper
I found out MXNet supports a C++ API (I have built it and run YOLOv3 on the GPU). For those who want to develop a stand-alone application with a non-commercial library, maybe MXNet can ease your pain. OpenCV DNN has an easy-to-use API and aggressive optimization on the CPU, but its GPU support is poor; dlib has a decent API and supports GPUs, but there are too few pre-trained models and built-in layers; MXNet has decent support for CPUs and NVIDIA GPUs, but its DLLs are huge. There are no perfect choices; we have to choose the one that suits our needs most.
When I tried running the code on my local machine with the YOLOv3 cfg and weights, I got the following error:
Traceback (most recent call last):
  File "yolov3.py", line 56, in <module>
    net = cv.dnn.readNet(args.model, args.config, args.framework)
cv2.error: OpenCV(3.4.2) /Users/travis/build/skvark/opencv-python/opencv/modules/dnn/src/darknet/darknet_io.cpp:511: error: (-215:Assertion failed) separator_index < line.size() in function 'ReadDarknetFromCfgFile'
When I tried running the same code with YOLOv2 weights, I didn't get this error, but I didn't get any predictions either. The output was the same file as the input in all cases.
Because the code ran without any errors for the YOLOv2 file, I don't think the opencv-contrib folder is the issue, as you suggested in our email conversation.
I have also read a few other forums where people are facing the same issue without any solution.
What should I do to remedy this issue?
Hi Rishabh,
Did you try the code we had shared? It is named object_detection_yolo.py.
Yes, I ran the exact same code that you have shared, and I continue to face this error.
I had exactly the same issue.
It turns out the way I downloaded the config file was incorrect. I used 'right-click' -> 'save link as', which resulted in an HTML file. That's not the correct format. Later I just git-cloned the entire cfg folder, and from there I got the file in the correct format. It's not HTML; the first few lines look like this:
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
Is there a way to build our own weights to use with YOLOv3? I want to use this algorithm to detect only terrains (grass, floor, gravel, stairs, mud, etc.).
Thanks.
Yes, you will need to train it. This post may help
https://medium.com/@manivannan_data/how-to-train-yolov3-to-detect-custom-objects-ccbcafeb13d2
Thanks Sir.
Hey Satya and Sunita, amazing content. Is there a way to remove unwanted classes and make it faster for only the two or three classes that I need? If so, can you please guide me on how to go about it?
You will find some useful info along those lines in the discussions at https://github.com/pjreddie/darknet/issues/142
It looks like it will improve the accuracy if you do that, but still use the big dataset to retrain.
Hey guys, did anyone get this working with Python 2? It works with Python 3, but I get no output running it with Python 2.
Dear Miss Sunita,
First, thank you for this awesome article!
Secondly, I would like to ask some questions regarding the use of YOLOv3 with ROS.
I have a first node which publishes an image and a second node which subscribes to and thus processes the image. The processing is done during the callback when the subscriber receives a message.
But the processing takes so much time, about 2 s per frame, and I think there is a problem somewhere…
Have you ever experienced this kind of error when using it with ROS?
Thank you
We have not tested it on ROS yet, but it would be very interesting. Will update the post when we get a chance to work with ROS.
Hi Sunita,
Does it run fast on an Android device? I think CPU-specific instructions should have been used to improve the performance of the CPU version of YOLOv3.
Thanks a lot for the post.
We have not yet tested it on an android device. Thanks for reading !
Thanks, this is great for someone beginning to learn.
How can I use an NVIDIA GPU with CUDA?
With OpenCV, GPU support for the DNN module is flaky. If you use YOLOv3 directly, you can change the Makefile so it uses the GPU. Check this out:
https://github.com/pjreddie/darknet/blob/master/Makefile
Set GPU and CUDNN to 1.
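For reference, these are the relevant flags at the top of Darknet's Makefile (as found in the pjreddie repository; rebuild after editing):

# Top of Darknet's Makefile: enable GPU and cuDNN support before building.
GPU=1
CUDNN=1
OPENCV=0
OPENMP=0
DEBUG=0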
Is it possible to only look for one type of object, say people? Is there a way to speed up execution by limiting to just one type of object?
You will find some useful info along those lines in the discussions at https://github.com/pjreddie/darknet/issues/142
It looks like it will improve the accuracy if you do that, but still use the big dataset to retrain.
Hi Sunita and Satya,
Thanks for this great solution with OpenCV. I tried your source code and it works great.
I was wondering how you got such good results with the GPU and Darknet (23 ms)?
I am using the original code of YOLOv3 (https://pjreddie.com/darknet/yolo/) with a GPU, and the best I get is 177 ms. I use a Tesla P100, which is better than a GTX 1080 Ti. How did you get results so fast? Did you change parameters in the YOLO configuration?
Thanks
Sorry for the late reply. You will need to change the makefile
https://github.com/pjreddie/darknet/blob/master/Makefile
Set GPU and CUDNN to 1. Might as well set OPENMP to 1 (though it should not matter when it is using the GPU).
Last time I could not manage to build OpenCV 4 even though I followed the instructions correctly. Any help would be valuable. I built it on Ubuntu Linux.
Hi! We are planning to release a post very soon to help out with OpenCV-4 installation. Stay tuned!
Hi,
Can I train it on my own data?
Thanks for the post!
Yes, you can. Search for training YOLO 3 using your own data. You can then bring that model into an OpenCV application.
Hi Sunita and Satya, many thanks for the post.
I see you are not using (or passing to any function) the detection confidence given to each of the 13×13 output cells for containing a bounding box; that is the value detection[4] in line 15 of the postprocess() function in the Python code.
Shouldn't the first bounding-box screening be made based on it, instead of on the confidences of each of the classes for each detection?
Thank you 🙂
Ideally, detection[4] should be compared instead of scores[classId]. But the output you get from OpenCV's forward function satisfies detection[4] > confThreshold multiple times even if all the class scores are zero. You can see it yourself if you insert the following print code before line 19 (if confidence > confThreshold:):
if detection[4] > confThreshold:
    print(detection[4], " - ", scores[classId], " - th : ", confThreshold)
    print(detection)
Also, if you look at the class scores, not all of them add up to 1 in the print output; this is something internal to the function call. Regardless, even if you compare the confidence with detection[4], the results would be similar, as those boxes with zero class scores will be eliminated by non-maximum suppression.
Can it also work on a Raspberry Pi?
I have not tried it, but my guess is that it will be very slow.
Is there a way to segment the classified object using this approach, other than a ‘bounding box’? If not, what methods do you recommend for this?
To segment the objects you should search for 1) Semantic Segmentation 2) Mask R-CNN.
Thank you.
Could someone explain how to update Anaconda's OpenCV version? Currently it is 3.3.1. I downloaded the source code of OpenCV 3.4.2 from https://github.com/opencv/opencv/releases, but I'm not sure how to get it into Anaconda. I'm using Windows.
Hi, I am getting an error running the source code from GitHub:
  File "object_detection_yolo.py", line 33, in <module>
    net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
cv2.error: OpenCV(3.4.1) C:\bld\opencv_1520732670222\work\opencv-3.4.1\modules\dnn\src\darknet\darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function cv::dnn::darknet::ReadDarknetFromCfgFile
Kindly suggest how to get past this issue?
You need to upgrade to OpenCV 3.4.2 with opencv_contrib
Thank you for the quick response, appreciate.
Hi Sunita, great post! I was both enlightened and delighted while reading it 🙂
I've got a question for my project: I just need to detect 2 objects, a box and a hand. Do you think training my own model increases the speed? I am now doing color-based recognition, which is pretty inaccurate and fragile, until I can find a good detection algorithm. For example, right now I've created a box object model whose ID and location I can set and get. Each frame, I search for the closest previously found object and match it with the new detections to pass on object IDs for tracking. But color-based recognition is too weak against lighting changes, minor overlappings, etc. I've got a low-spec computer, so I have to consider speed. How much speed do you think I can obtain by training and using a model with just 2 classes?
Thanks for the amazing post again 🙂
Hi Utku, thanks for reading !
We will write a future post about the performance of YOLOv3 with fewer classes. In the meantime, you will find some useful info along those lines in the discussions at
https://github.com/pjreddie/darknet/issues/142
It looks like it will improve the accuracy if you do that, but still use the big dataset to retrain.
bash: ./object_detection_yolo.out: No such file or directory
I got this when I ran the C++ file. How can I run the C++ file?
You need to compile the code first and make sure it successfully created the object_detection_yolo.out file in your current directory
Hi, thanks for the wonderful video; it really helped. Is there a tutorial on using YOLOv3 to train on your own custom data?
Thanks for reading! We will write a post on that in the future, but in the meantime you can find some guidelines at
https://medium.com/@manivannan_data/how-to-train-yolov3-to-detect-custom-objects-ccbcafeb13d2
Hello,
For a custom dataset, do I need to first train using darknet53.conv.74 to get a weights file? So this code is only for testing?
Please help me. I got: OpenCV Error: Parsing error (Unknown layer type: shortcut) in cv::dnn::darknet::ReadDarknetFromCfgFile, file C:\projects\opencv-python\opencv\modules\dnn\src\darknet\darknet_io.cpp, line 503
Traceback (most recent call last). I'm working with PyCharm.
Hi Sunita,
I ran into this error while running this code.
OpenCV(3.4.1) Error: Parsing error (Unknown layer type: shortcut) in ReadDarknetFromCfgFile, file /home/vineeth/installations/opencv/modules/dnn/src/darknet/darknet_io.cpp, line 503
Traceback (most recent call last):
  File "objectdetection.py", line 33, in <module>
    net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
cv2.error: OpenCV(3.4.1) /home/vineeth/installations/opencv/modules/dnn/src/darknet/darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile
Can you help me to sort this thing out?
How to calculate the accuracy (mAP) for a video or images?
Sunita,
Unless I missed something: I am using macOS and looking to process hundreds of videos, trying to identify objects and then match those objects against a database. I have been trying to follow your directions but have not been able to make it work. Per your article, every video is processed frame by frame and boxes are drawn around detected objects. But there are a few issues with that… 1) When a video is being processed, I am not able to see the object-identification boxes… 2) Once every frame is saved and boxes are drawn, one object could be spread across thousands of frames, and the accuracy of object identification varies from frame to frame for the same object.
I am new to YOLOv3 and would appreciate any insight.
Thank you
Alec
Sir, I am a Mac user and I am really new to this topic. I downloaded this code and ran it in Xcode, but I get this error: "/Users/kursadlacin/Documents/opencv-3.4.2/modules/dnn/src/darknet/darknet_io.cpp:784: error: (-212:Parsing error) Failed to parse NetParameter file: yolov3.cfg in function 'ReadNetParamsFromCfgFileOrDie'". Is this the error you are describing, or something different? Please help me.
Hi Sunita,
I'd just like to say thank you for your post and sample code; I downloaded and tested it, and it worked like a charm. Thank you again.
Hello,
Thanks for such a great tutorial.
I have tested YOLOv3 using OpenCV C++ by following this tutorial. My problem is that when I run this code, it takes very long to generate output, sometimes 7 or 8 seconds per image. Does this code work with NVIDIA GPUs? If possible, could you guide me on how I can use the GPU to generate output? I have an NVIDIA GTX 1060. Thanks.
How do I run the object detection using a webcam?
Is it possible to get higher FPS using the following repo: https://github.com/shizukachan/darknet-nnpack?
If yes, could you make another article about it?
parser.add_argument('--image', help='Path to image file.')
I am facing an issue regarding how to pass an argument to the above command.
I am using a Jupyter notebook on Windows 10.
Hi Sunita, nice tutorial. How can I change the code and the YOLO configuration to detect only certain object classes (like cars and people)? Could this improve execution time?
Hello, I have a problem:
libc++abi.dylib: terminating with uncaught exception of type cv::Exception: OpenCV(3.4.2) /Users/kursadlacin/Documents/opencv-3.4.2/modules/dnn/src/darknet/darknet_io.cpp:784: error: (-212:Parsing error) Failed to parse NetParameter file: yolov3.cfg in function 'ReadNetParamsFromCfgFileOrDie'
I am using macOS. What is my problem? Please help.
indices = cv.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)
Please help me out, as I am getting this error:
Traceback (most recent call last):
  File "E:/Image Classifier/OpenCv-Python-Yolov3/OpenCv-YoloV3.py", line 92, in <module>
    indices = cv2.dnn.NMSBoxes(boxes)
NameError: name 'boxes' is not defined
Do you have any plans to make a course on deep learning like you did with "Computer Vision for Faces"? If yes, please let us know the expected date and maybe the price? Thanks for your awesome projects, by the way!
After processing is done, I have a file named yolo_out_py.avi in the working directory, but it is zero-sized and can't be played. What is the reason for this? Thank you!
Good afternoon Sunita and Satya,
Currently I am running the OpenCV implementation of YOLOv3 on an Intel NUC device with an i5 CPU, and the processing speed is about 2 frames per second. As such, I am exploring adding an Intel GPU card to the Intel NUC so that I can speed up performance to at least 10 frames per second. I would like to seek your recommendation on an Intel GPU card I should purchase for the Intel NUC, so that the YOLOv3 DNN model can be executed on the GPU instead of on the i5 or i7 CPU.
Thank you very much for your advice in this matter.