In this post, we will learn the details of the Histogram of Oriented Gradients (HOG) feature descriptor. We will learn what is under the hood and how this descriptor is calculated internally by OpenCV, MATLAB and other packages.
This post is part of a series I am writing on Image Recognition and Object Detection.
The complete list of tutorials in this series is given below:
- Image recognition using traditional Computer Vision techniques : Part 1
- Histogram of Oriented Gradients : Part 2
- Example code for image recognition : Part 3
- Training a better eye detector: Part 4a
- Object detection using traditional Computer Vision techniques : Part 4b
- How to train and test your own OpenCV object detector : Part 5
- Image recognition using Deep Learning : Part 6
- Object detection using Deep Learning : Part 7
Many things look difficult and mysterious. But once you take the time to deconstruct them, the mystery is replaced by mastery, and that is what we are after. If you are a beginner and are finding Computer Vision hard and mysterious, just remember the following:
Q : How do you eat an elephant ?
A : One bite at a time!
What is a Feature Descriptor?
A feature descriptor is a representation of an image or an image patch that simplifies the image by extracting useful information and throwing away extraneous information.
Typically, a feature descriptor converts an image of size width x height x 3 (channels ) to a feature vector / array of length n. In the case of the HOG feature descriptor, the input image is of size 64 x 128 x 3 and the output feature vector is of length 3780.
Keep in mind that HOG descriptor can be calculated for other sizes, but in this post I am sticking to numbers presented in the original paper so you can easily understand the concept with one concrete example.
This all sounds good, but what is “useful” and what is “extraneous” ? To define “useful”, we need to know what it is “useful” for. Clearly, the feature vector is not useful for the purpose of viewing the image. But it is very useful for tasks like image recognition and object detection. The feature vector produced by these algorithms, when fed into an image classification algorithm like a Support Vector Machine (SVM), produces good results.
But, what kinds of “features” are useful for classification tasks ? Let’s discuss this point using an example. Suppose we want to build an object detector that detects buttons of shirts and coats.
A button is circular ( may look elliptical in an image ) and usually has a few holes for sewing. You can run an edge detector on the image of a button, and easily tell if it is a button by simply looking at the edge image alone. In this case, edge information is “useful” and color information is not. In addition, the features also need to have discriminative power. For example, good features extracted from an image should be able to tell the difference between buttons and other circular objects like coins and car tires.
In the HOG feature descriptor, the distribution ( histogram ) of directions of gradients ( oriented gradients ) is used as the feature. Gradients ( x and y derivatives ) of an image are useful because the magnitude of gradients is large around edges and corners ( regions of abrupt intensity changes ) and we know that edges and corners pack in a lot more information about object shape than flat regions.
How to calculate Histogram of Oriented Gradients ?
In this section, we will go into the details of calculating the HOG feature descriptor. To illustrate each step, we will use a patch of an image.
Step 1 : Preprocessing
As mentioned earlier, the HOG feature descriptor used for pedestrian detection is calculated on a 64×128 patch of an image. Of course, an image may be of any size. Typically patches at multiple scales are analyzed at many image locations. The only constraint is that the patches being analyzed have a fixed aspect ratio. In our case, the patches need to have an aspect ratio of 1:2. For example, they can be 100×200, 128×256, or 1000×2000 but not 101×205.
To illustrate this point I have shown a large image of size 720×475. We have selected a patch of size 100×200 for calculating our HOG feature descriptor. This patch is cropped out of an image and resized to 64×128. Now we are ready to calculate the HOG descriptor for this image patch.
The paper by Dalal and Triggs also mentions gamma correction as a preprocessing step, but the performance gains are minor and so we are skipping the step.
Step 2 : Calculate the Gradient Images
To calculate a HOG descriptor, we need to first calculate the horizontal and vertical gradients; after all, we want to calculate the histogram of gradients. This is easily achieved by filtering the image with the following kernels.
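These kernels are the simple 1-D centered derivative filters used in the original paper: $[-1, 0, 1]$ for the horizontal gradient and $[-1, 0, 1]^T$ for the vertical gradient.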
We can also achieve the same results by using the Sobel operator in OpenCV with kernel size 1.
// C++ gradient calculation.
#include <opencv2/opencv.hpp>
using namespace cv;

// Read image and convert to floating point in the range 0 to 1
Mat img = imread("bolt.png");
img.convertTo(img, CV_32F, 1/255.0);

// Calculate gradients gx, gy ( the last argument is the Sobel kernel size )
Mat gx, gy;
Sobel(img, gx, CV_32F, 1, 0, 1);
Sobel(img, gy, CV_32F, 0, 1, 1);
# Python gradient calculation
import cv2
import numpy as np

# Read image and convert to floating point in the range 0 to 1
im = cv2.imread('bolt.png')
im = np.float32(im) / 255.0

# Calculate gradients gx, gy ( ksize=1 gives the 1x3 derivative kernel )
gx = cv2.Sobel(im, cv2.CV_32F, 1, 0, ksize=1)
gy = cv2.Sobel(im, cv2.CV_32F, 0, 1, ksize=1)
Next, we can find the magnitude and direction of gradient using the following formula
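With $g_x$ and $g_y$ denoting the horizontal and vertical gradients, the magnitude and direction at each pixel are $g = \sqrt{g_x^2 + g_y^2}$ and $\theta = \arctan\frac{g_y}{g_x}$.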
If you are using OpenCV, the calculation can be done using the function cartToPolar as shown below.
// C++ Calculate gradient magnitude and direction (in degrees)
Mat mag, angle;
cartToPolar(gx, gy, mag, angle, 1);
The same code in Python looks like this.
# Python Calculate gradient magnitude and direction ( in degrees )
mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)
The figure below shows the gradients.
Notice that the x-gradient fires on vertical lines and the y-gradient fires on horizontal lines. The magnitude of the gradient fires wherever there is a sharp change in intensity. None of them fire when the region is smooth. I have deliberately left out the image showing the direction of gradient because direction shown as an image does not convey much.
The gradient image removed a lot of non-essential information ( e.g. constant colored background ), but highlighted outlines. In other words, you can look at the gradient image and still easily say there is a person in the picture.
At every pixel, the gradient has a magnitude and a direction. For color images, the gradients of the three channels are evaluated ( as shown in the figure above ). The magnitude of gradient at a pixel is the maximum of the magnitude of gradients of the three channels, and the angle is the angle corresponding to the maximum gradient.
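If you want to reproduce this reduction yourself, here is a minimal NumPy sketch. It assumes mag and angle are the ( height, width, 3 ) arrays produced by cv2.cartToPolar on the 3-channel image, as in the snippets above.
# Python: keep, at every pixel, the channel with the largest gradient magnitude
import numpy as np

idx = np.argmax(mag, axis=2)              # index of the strongest channel at each pixel
rows, cols = np.indices(idx.shape)
max_mag = mag[rows, cols, idx]            # dominant gradient magnitude
max_angle = angle[rows, cols, idx]        # angle of the same ( dominant ) channel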
Step 3 : Calculate Histogram of Gradients in 8×8 cells
In this step, the image is divided into 8×8 cells and a histogram of gradients is calculated for each 8×8 cell.
We will learn about the histograms in a moment, but before we go there let us first understand why we have divided the image into 8×8 cells. One of the important reasons to use a feature descriptor to describe a patch of an image is that it provides a compact representation. An 8×8 image patch contains 8x8x3 = 192 pixel values. The gradient of this patch contains 2 values ( magnitude and direction ) per pixel which adds up to 8x8x2 = 128 numbers.
By the end of this section we will see how these 128 numbers are represented using a 9-bin histogram which can be stored as an array of 9 numbers. Not only is the representation more compact, calculating a histogram over a patch makes this representation more robust to noise. Individual gradients may have noise, but a histogram over an 8×8 patch makes the representation much less sensitive to noise.
But why 8×8 patch ? Why not 32×32 ? It is a design choice informed by the scale of features we are looking for. HOG was used for pedestrian detection initially. 8×8 cells in a photo of a pedestrian scaled to 64×128 are big enough to capture interesting features ( e.g. the face, the top of the head etc. ).
The histogram is essentially a vector ( or an array ) of 9 bins ( numbers ) corresponding to angles 0, 20, 40, 60 … 160.
Let us look at one 8×8 patch in the image and see how the gradients look.
If you are a beginner in computer vision, the image in the center is very informative. It shows the patch of the image overlaid with arrows showing the gradient — the arrow shows the direction of gradient and its length shows the magnitude. Notice how the direction of arrows points to the direction of change in intensity and the magnitude shows how big the difference is.
On the right, we see the raw numbers representing the gradients in the 8×8 cells with one minor difference: the angles are between 0 and 180 degrees instead of 0 to 360 degrees. These are called “unsigned” gradients because a gradient and its negative are represented by the same numbers. In other words, a gradient arrow and the one 180 degrees opposite to it are considered the same. But why not use 0 – 360 degrees ?
Empirically it has been shown that unsigned gradients work better than signed gradients for pedestrian detection. Some implementations of HOG will allow you to specify if you want to use signed gradients.
The next step is to create a histogram of gradients in these 8×8 cells. The histogram contains 9 bins corresponding to angles 0, 20, 40 … 160. The following figure illustrates the process. We are looking at magnitude and direction of the gradient of the same 8×8 patch as in the previous figure.
A bin is selected based on the direction, and the vote ( the value that goes into the bin ) is selected based on the magnitude. Let’s first focus on the pixel encircled in blue. It has an angle ( direction ) of 80 degrees and a magnitude of 2. So it adds 2 to the 5th bin. The gradient at the pixel encircled in red has an angle of 10 degrees and a magnitude of 4. Since 10 degrees is halfway between 0 and 20, the vote by the pixel splits evenly into the two bins.
There is one more detail to be aware of. If the angle is greater than 160 degrees, it is between 160 and 180, and we know the angle wraps around making 0 and 180 equivalent. So in the example below, the pixel with angle 165 degrees contributes proportionally to the 0 degree bin and the 160 degree bin.
The contributions of all the pixels in the 8×8 cells are added up to create the 9-bin histogram. For the patch above, it looks like this
In our representation, the y-axis is 0 degrees. You can see the histogram has a lot of weight near 0 and 180 degrees, which is just another way of saying that in the patch gradients are pointing either up or down.
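To make the binning concrete, here is a small Python sketch ( not the OpenCV implementation ) that builds such a 9-bin histogram for one 8×8 cell, splitting every pixel's vote between the two nearest bins exactly as described above. The function name and arguments are just for illustration.
# Python: 9-bin histogram for one 8x8 cell with vote splitting
import numpy as np

def cell_histogram(mag, angle, nbins=9, bin_width=20.0):
    # mag, angle : 8x8 arrays of gradient magnitude and direction in degrees
    hist = np.zeros(nbins)
    for m, a in zip(mag.ravel(), angle.ravel()):
        a = a % 180.0                          # unsigned gradients : 180 wraps around to 0
        lo = int(a // bin_width) % nbins       # lower bin center : 0, 20, ..., 160
        hi = (lo + 1) % nbins                  # next bin, wrapping 180 back to 0
        frac = (a - lo * bin_width) / bin_width
        hist[lo] += m * (1.0 - frac)           # the closer bin gets the larger share
        hist[hi] += m * frac
    return hist
For example, a pixel with direction 165 degrees and magnitude 85 would add 0.75 x 85 = 63.75 to the 160 bin and 0.25 x 85 = 21.25 to the 0 bin.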
Step 4 : 16×16 Block Normalization
In the previous step, we created a histogram based on the gradient of the image. Gradients of an image are sensitive to overall lighting. If you make the image darker by dividing all pixel values by 2, the gradient magnitude will change by half, and therefore the histogram values will change by half.
Ideally, we want our descriptor to be independent of lighting variations. In other words, we would like to “normalize” the histogram so that it is not affected by lighting variations.
Before I explain how the histogram is normalized, let’s see how a vector of length 3 is normalized.
Let’s say we have an RGB color vector [ 128, 64, 32 ]. The length of this vector is $\sqrt{128^2 + 64^2 + 32^2} = 146.64$. This is also called the L2 norm of the vector. Dividing each element of this vector by 146.64 gives us a normalized vector [0.87, 0.43, 0.22].
Now consider another vector in which the elements are twice the value of the first vector 2 x [ 128, 64, 32 ] = [ 256, 128, 64 ]. You can work it out yourself to see that normalizing [ 256, 128, 64 ] will result in [0.87, 0.43, 0.22], which is the same as the normalized version of the original RGB vector. You can see that normalizing a vector removes the scale.
Now that we know how to normalize a vector, you may be tempted to think that while calculating HOG you can simply normalize the 9×1 histogram the same way we normalized the 3×1 vector above. It is not a bad idea, but a better idea is to normalize over a bigger 16×16 block.
A 16×16 block has 4 histograms which can be concatenated to form a 36 x 1 element vector and it can be normalized just the way a 3×1 vector is normalized. The window is then moved by 8 pixels ( see animation ) and a normalized 36×1 vector is calculated over this window and the process is repeated.
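A rough sketch of this step is shown below. It assumes cell_hists is a ( 16, 8, 9 ) array holding the 9-bin histogram of every 8×8 cell of the 64×128 patch, and it uses plain L2 normalization; the OpenCV implementation actually uses a clipped variant ( L2-Hys ), so treat this only as an illustration of the idea.
# Python: 16x16 block normalization over a 64x128 patch ( illustrative sketch )
import numpy as np

def block_normalize(cell_hists, eps=1e-6):
    # cell_hists : ( 16, 8, 9 ) per-cell histograms of a 64x128 patch
    n_rows, n_cols, _ = cell_hists.shape
    blocks = []
    for r in range(n_rows - 1):                          # 15 vertical block positions
        for c in range(n_cols - 1):                      # 7 horizontal block positions
            v = cell_hists[r:r + 2, c:c + 2].ravel()     # 2x2 cells -> 36 values
            v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)   # L2 normalization
            blocks.append(v)
    return np.concatenate(blocks)                        # 105 blocks x 36 values = 3780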
Step 5 : Calculate the Histogram of Oriented Gradients feature vector
To calculate the final feature vector for the entire image patch, the 36×1 vectors are concatenated into one giant vector. What is the size of this vector ? Let us calculate
- How many positions of the 16×16 blocks do we have ? Moving in steps of 8 pixels, there are ( 64 − 16 ) / 8 + 1 = 7 horizontal and ( 128 − 16 ) / 8 + 1 = 15 vertical positions, making a total of 7 x 15 = 105 positions.
- Each 16×16 block is represented by a 36×1 vector. So when we concatenate them all into one giant vector we obtain a 36×105 = 3780 dimensional vector ( verified in the short snippet below ).
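If you want to verify this number, OpenCV's HOGDescriptor uses exactly these parameters by default ( 64×128 window, 16×16 blocks, 8×8 block stride, 8×8 cells, 9 bins ). A minimal sketch, assuming some image file 'bolt.png' is available:
# Python: the default OpenCV HOGDescriptor produces a 3780-dimensional vector
import cv2

hog = cv2.HOGDescriptor()            # default parameters listed above
im = cv2.imread('bolt.png')
im = cv2.resize(im, (64, 128))       # ( width, height ) of the detection window
descriptor = hog.compute(im)
print(descriptor.size)               # 3780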
Visualizing Histogram of Oriented Gradients
The HOG descriptor of an image patch is usually visualized by plotting the 9×1 normalized histograms in the 8×8 cells. See image on the side. You will notice that dominant direction of the histogram captures the shape of the person, especially around the torso and legs.
Unfortunately, there is no easy way to visualize the HOG descriptor in OpenCV.
Hi, what technique will you use for HOG if OpenCV does not provide a way to do it? Thank you very much.
OpenCV does provide a way to use HOG. It is only the visualization they do not provide. MATLAB and dlib have visualizations.
In the STEP-5
How many positions of the 16×16 blocks do we have ? There are 7
horizontal and 15 vertical positions making a total of 7 x 15 = 105
positions.
How are there 7 horizontal and 15 vertical positions? I am not clear on this point.
Besides that, I have already subscribed to learnopencv.com, but I didn't receive any link to access the code.
The image patch is 64×128. Watch this animation and count the number of positions in the horizontal direction. There are ( 64 − 16 ) / 8 + 1 = 7 positions of the blue 16×16 block. You can apply the same logic to the vertical direction to get ( 128 − 16 ) / 8 + 1 = 15.
https://learnopencv.com/wp-content/uploads/2016/12/hog-16×16-block-normalization.gif
There is no code associated with this particular post. But for other posts, you must have received a welcome email ( check spam folder also ) that contains links to all code in this blog.
Satya
Thank you for the article. I learned some things that hopefully I will be able to apply in the future.
What are some of the pros/cons to trying a “deep learning” approach for pedestrian recognition instead of the approach you detail above? I mean, if you just give the whole image to a neural network, instead of trying to get rid of the extraneous parts first? Is the main disadvantage that it will take longer? Or might the recognition accuracy suffer as well?
The accuracy of using deep learning based methods is much higher than HOG + SVM. Deep Learning approaches like Faster R-CNN can also detect multiple objects at the same time ( e.g. people + vehicles ). They also produce bounding boxes at different aspect ratios, so if you train your pedestrian detector properly, you can also find people in an image who are crawling on the floor. The main disadvantage is the computational resources required. E.g. you can easily implement a real-time HOG detector on a mobile phone, but doing so with current deep learning techniques requires a lot more expertise.
Thank you for the article. When are you publishing the next part of this series?
I am starting a project on 3D model reconstruction from multiple images using traditional vision techniques. I don’t really know where to start from. Any papers, articles, blogs or tutorials you’d refer me to?
The next part is coming this Monday. But a related one was posted last week.
https://learnopencv.com/training-better-haar-lbp-cascade-eye-detector-opencv/
For 3D model reconstruction you should look at VisualSFM. The best book on this subject is one by Hartley and Zisserman but beware that it is theoretical and needs a good grasp of linear algebra.
http://amzn.to/2kFdNyW
I’ve read Haar Cascades are not used very much these days, is that true?
If so, what has replaced it? LBP?
Any suggestions for a book or video course on starting with (Open)CV?
LBP is a reasonable alternative, but usually HAAR still produces slightly better results than LBP. But, LBP cascades are much smaller in size and train way faster than HAAR.
PyImageSearch.com has good books and courses.
Hi,
In section 4, you mentioned that a 16×16 block has 4 histograms which can be concatenated to form a 36 x 1 element vector and it can be normalized just the way a 3×1 vector is normalized …
Can you elaborate that? I’m not clear on this.
Thanks.
Lugia
Let’s say you have a 3×1 vector [ 11, 20, 10 ]. The norm is sqrt ( 11^2 + 20^2 + 10^2 ), and you can normalize the vector by dividing every element of the vector by the norm. Now if you had a 100-element vector, you would take the square root of the sum of squares of all elements to obtain the norm and then divide each element of the vector by the norm.
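A tiny NumPy sketch of the same idea, using the example vector from the answer above:
# Python: L2 normalization of a vector
import numpy as np
v = np.array([11.0, 20.0, 10.0])
v_normalized = v / np.linalg.norm(v)   # norm = sqrt( 11^2 + 20^2 + 10^2 )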
Hi, I'm new to these topics. I got an error when I tried to use HOGDescriptor:
“error C2661: ‘cv::HOGDescriptor::HOGDescriptor’ : no overloaded function takes 12 arguments”
I hope you could help me
Are you on Windows ? Some people are reporting this on EMGU CV.
Hi, could HOG be used for human action recognition?
With some work, yes. Here is what you can start with
http://www.bmva.org/bmvc/2012/WS/paper2.pdf
Hi, how can the HOG of a video file be computed, i.e. HOG features along both the spatial and temporal domains?
Technically you can do it by doing computations over cubes instead of squares, e.g. a 16x16x16 cube instead of a 16×16 block. But the computation will be very expensive.
As this question States:
what kinds of “features” are useful for classification tasks ?
I have one similar question
what kinds of “features” are useful for “Iris Recognition & Matching” tasks ?
People usually use Gabor filters for Iris Recognition, but I have never personally worked on it.
Hello,
I have a question related step 4 normalization:
– Why the 16×16 pixels blocks for the normalization overlap? Why they move just 8 pixels instead of 16 pixels?
Thank you very much
The 16×16 block is a design choice. They must have tried many different sizes and this one performed the best. For contrast normalization, they need some overlap so the same pixel is part of multiple blocks. That way the normalization is more robust ( and not affected by just one block ).
Hi, could this filter be used for human action recognition?
I want to run this project on a Raspberry Pi.
It will not be an easy plug and play. But you can with some work. You can start here
http://www.bmva.org/bmvc/2012/WS/paper2.pdf
Thank you very much.
Your tutorial is great. I was unable to find the OpenCV implementation for this tutorial. Can you refer me to that?
Hi Ravi,
Thanks. Here you go.
http://docs.opencv.org/3.2.0/d5/d33/structcv_1_1HOGDescriptor.html
Satya
Hello Satya,
I am a bit confused by the parameters of the HOGDescriptor::compute() API.
There is a parameter called locations; could you explain to me what it is, please?
Do you think that nowadays HOG surpasses Haar and LBP for car detection?
Most likely yes. But if you want to build a state of the art system, use Deep Learning ( Faster RCNN or SSD ).
Hello Satya,
Thank you for your post. It is really clearer than a lot of other explanations.
But I do have a question. Anyone here knows it, please enlighten me.
The direction of the gradient usually points from a darker area to a lighter area, or vice versa. However, in the HOG visualization, why are the “star dials” along the direction of color change?
With the example you give above, consider the third “star dial” on the first row. Shouldn't the directions of the gradient be perpendicular to the red lines?
https://uploads.disquscdn.com/images/cb68f591f4cca6d6e0812b14edccf8e9ba303c76619d0d7b4112708147cb4d24.png
Hi Yang,
Did you figure out the reason for this? I have another question, and answering it might also explain what you asked. I have attached the image from this same blog where the image gradients and magnitudes are tabulated. If you pay attention to any of the gradients, let us consider the gradient in the 4th row and 3rd column: the visual indication points upwards, so I would assume it to be close to 90. However, in the table it is listed as 1. So there appears to be a 90 degree discrepancy here, which might also explain why you see a similar 90 degree shift in the HOG descriptor. If you figure this out, please share here.
It would also be great if Satya or some other readers could answer this for us. It might sound like a stupid question to experts, but answering it will do a lot of good for beginners like us. https://uploads.disquscdn.com/images/f3b4ef0e3a6e3d8afad7fbb7bd2baa5dc24fd8722d042c13dfcdc5c96eebf5f6.png
Hi Kunal,
I am not 100% sure about the reason behind this yet. But it makes perfect sense if we simply subtract 90 degrees from the real orientation for visualization purposes.
Consider the image below as a [3 3] pixel area. If we calculate the gradient orientation of the center pixel, it will point straight up. https://uploads.disquscdn.com/images/675f81786cec6d3c7dedcfb6567d1bf8512d05ec572a370815aa95d9c8205829.png
But it will be hard to visualize if we have a real-sized image which is much larger than [3 3]. However, if we subtract 90 degrees from the real orientation, the orientation will be along the edge of the intensity change, just as in the second image below. https://uploads.disquscdn.com/images/c62a42b530f4dacd42f44afd5d4879258610b9959e16079b1e711520f4b9ec56.png
By doing that for every pixel in the entire image, the visualization of the gradients highlights all the edges in the image, which is quite meaningful for human eyes and brains to understand. That is just my understanding for now; I am really not sure whether it is the real reason.
Even though we can implement the algorithm without fully understanding every single idea behind it, it would be appreciated if an expert could pop in to clarify.
Thanks Yang. Yes it makes sense and is probably just the way we visualize the stuff. Actual gradients remain the same.
Quoting from Satya’s article — “In our representation, the y-axis is 0 degrees. You can see the histogram has a lot of weight near 0 and 180 degrees, which is just another way of saying that in the patch gradients are pointing either up or down.” . Hope this clarifies. 🙂
Hi Shravani, Thanks a lot. We didn’t quite get that point about y-axis being 0 degree. Now stuff makes perfect sense. Thanks again.
Hello Kunal Singh, the gradient looks like 90 but it is not actually 90; it is 1. This is because the image is represented by (x, y) coordinates with the x-axis being the vertical axis (increasing downwards) and the y-axis the horizontal axis. With reference to these coordinates, zero degrees from the x-axis starts from where it looks like 90 degrees (which is actually w.r.t. the y-axis). In this image the reference axis is the x-axis (vertical axis), which gives such geometry to the vector representation of the gradient. Hope this helps with understanding and visualizing.
Based upon your later response, I assume the following:
1) The direction shown here is only for visualization:
displayed direction = actual direction - 90 degrees.
2) As you have stated, the actual direction of the gradient in the red box above should be perpendicular to the white line.
Is my understanding correct?
That is my understanding. Instead of calling it displayed direction, I will call it edge direction.
I am not 100% sure whether understanding it this way is correct or not. If you could find some more solid reference please also let me know.
Thanks a lot Avishek..:).really useful stuff…Saved my life..:D
According to the documentation of hog() function in skimage.feature: “Image of the HOG. For each cell and orientation bin, the image contains a line segment that is centered at the cell center, is perpendicular to the midpoint of the range of angles spanned by the orientation bin”.
Source: http://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.hog
So, @avishekparajuli:disqus assumption was correct, “displayed direction = actual direction – 90 degree”.
Do you think it would be easier to implement human action recognition using a Kinect camera and skeleton pose? … best regards
Much much easier.
I am a PhD student from Iraq. Do you have an account on Facebook? I want to contact you for consulting. I am working on human action recognition for robotics and smart cameras.
Thank you. The way you explain it is very simple. But why did you choose 8 x 8 cells and a 16 x 16 block while other resources chose 6 x 6 cells and 3 x 3 blocks? Does that affect the performance?
I think I shall use Kinect for human action recognition for robotics; it is easier.
How do I cite this article?
It's a bit too late, but you could use Zotero to help you with your citations. Go check their website; it's really useful. ( We got lessons about this application at my university. )
You can simply use the link.
Here is an interactive UI tool that will help to visualize the HOG descriptor and play with different parameters.
https://github.com/AvishekParajuli/SmartCar_proj/blob/HOGUI/HOGUI/readme.md
Is there a code sample provided for this article in particular on subscription?
HOG is implemented in OpenCV. This article is purely theoretical without any associated code.
Thought so, just checking 😀
Hi, I implemented the code explained in this post https://learnopencv.com/handwritten-digits-classification-an-opencv-c-python-tutorial/ . I am trying to train an SVM with HOG features, but I don't understand how to; any help please?
Awesome article on HOG !
You’ve done a really good job explaining it plain and simple.
Can you also explain a bit about how object detection is done using HOG ?
I didn't get the code for this.
Can you provide the code in your GitHub repository?
@spmallick:disqus I am just a beginner in OpenCV C++. I want to plot a histogram in OpenCV C++. The task is that the x-axis should be the angle and the y-axis the magnitude of the histogram. I calculate the magnitude and angle using the Sobel operator. Now how can I plot the histogram using the magnitude and angle? Please guide me. Thanks
Hi, is HOG free for open applications, or is it patented?
I believe it is free. SIFT and SURF are not.
Hello,
Thanks for the post; it really simplifies the idea of HOG and now it's much clearer to me.
Actually, I'm working on a project that requires fish detection on farms.
We are dealing with frames that include a high density of fish objects, as shown below.
As an initial approach to detecting fish, we used the OpenCV traincascade tool ( specifically the LBP algorithm ) in order to get a classifier for the object. At the beginning this worked fine with a small number of objects per frame,
but now, with a high density of fish per frame, we are facing a problem and the detection produces too many false positives.
I wonder if HOG with SVM would be the right approach to detect the object in the described scene?
I would be thankful if you could share your opinion.
https://uploads.disquscdn.com/images/f678ae6ab000d56be7f4e167793758f9a9c12323031b2070bf1e12ba746fea9e.jpg
Thanks for the kind words. HOG + SVM is not the right approach for these kinds of problems. You need a Deep Learning based method. Try Googling Single Shot Multibox (SSD), YOLO 2 and Faster R-CNN. It is a hard problem, so don’t be discouraged if initial attempts are not as successful as you want.
Thanks for your reply :),
How about TensorFlow and Caffe machine learning methods?
Tensorflow and Caffe are deep learning frameworks and the methods I told you about are Deep Learning architectures. You can use any framework to implement the architecture.
Sir, can I get the code in Python? My mail ID is [email protected]. I am stuck on concatenating the histograms, so please send me the code.
Please, can anyone help me? I need the code for these HOG features.
Could you please tell me where the source code of this project is? I can't find it on your GitHub.
This is a theory post and does not have the source code because HOG is already implemented in OpenCV
https://github.com/opencv/opencv/blob/master/modules/objdetect/src/hog.cpp
Yeah I have already found it in opencv sample codes. Thank you very much!
The file, ProcessedComponent.js contains the HoG algorithm. https://github.com/DavidPynes/HoG/tree/master/src/js/components
Hi Satya,
Thanks for the thorough tutorial. Here is a visualization tool for HoG… https://davidpynes.github.io/HoG/
How can I download the code? When I press the subscribe now button, it lands me on another page where I enter my name and email, but nothing happens after that.
Hi Satya Mallick, I have a small question about calculating the histograms. When the angle is 165, why is 25% transferred to zero and 75% transferred to 160? Why did we consider that proportion? According to me, 82.424 should go to 160 and 2.576 should go to 0. Can you please explain this? Thank you so much in advance.
0 is the same as 180. For the share of 165 that goes into the 160 bin, we do the following calculation
(180 - 165) / (180 - 160) = 0.75.
Similarly, for the 180 bin ( or equivalently 0 ),
we have
(165 - 160) / (180 - 160) = 0.25
A simple way of thinking about it is that the distance between 165 and 160 is a third of the distance between 165 and 180, so more weight goes to 160 than to 180 ( or equivalently 0 ).
Great! Thank you so much for the reply 🙂
very well explained……..Thanks a lot…
I am completely new to this and now I have to start a project where I need to take video through a web camera and then detect the emotion. Could you guide me a bit so that I can get started on this?
Hi Satya Mallick,
This is a wonderful post about HOG. I am trying to separate surgical instruments from the background, but I failed when I used color and texture features ( LBP algorithm ) and the K-means algorithm to generate some points on the image, and then used an SVM to reach my goal, because these features are not very useful for surgical instruments. Is it possible to use HOG + SVM to achieve my goal? Or can you give me some other suggestions?
https://uploads.disquscdn.com/images/20e249d5eff77a3c1861a2791210996776a51b4dd4419946c7d6c4ebe150a553.png
HOG will not be useful here. You need a good segmentation algorithm. You can use some heuristics based on color to get approximate segmentation and then use grabcut to get more accurate segmentation. GrabCut implementation in OpenCV allows you to specify an initial alpha mask.
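For anyone following this suggestion, here is a rough sketch of mask-initialized GrabCut in OpenCV. The file name and the color heuristic are placeholders; replace them with whatever rough segmentation works for your images.
# Python: GrabCut initialized with a rough color-based mask ( illustrative sketch )
import cv2
import numpy as np

img = cv2.imread('frame.png')                            # placeholder file name
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
approx_fg = (hsv[..., 1] < 40) & (hsv[..., 2] > 180)     # crude guess : bright, low-saturation pixels

mask = np.full(img.shape[:2], cv2.GC_PR_BGD, np.uint8)   # everything "probably background"
mask[approx_fg] = cv2.GC_PR_FGD                          # rough foreground guess

bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, None, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)

segmented = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)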
Amazing post, great details ! Thanks a ton!
Thanks for the kind words!
thank you for this tutorial
Thanks, Hasim.
Hello Satya,
Is there any Matlab code for HOG?
thanks
Neda
Yes, MATLAB has a built in function
https://www.mathworks.com/help/vision/ref/extracthogfeatures.html
Hi Satya!
Thanks for the descriptive theoretical insights on HoG! In your opinion, if I wanted to capture textural features of say fabrics, would this perform suitably? Or if I were to explore a DNN architecture, which would you suggest?
Hi Roshan,
What do you want to do with the textures? Classify textures ? If yes, then yes CNN will be very useful. You can start with any standard model ( e.g. GoogleNet ) and it will do a reasonable job. There is no way to tell in advance ( without experimenting ) which architecture will be better.
Thanks. I will try out different architectures. If I wanted to work with the traditional approach(no DL) for texture based classification, what descriptors would you suggest trying? I thought of SIFT, SURF and LBP, anything else you would advise?
Hi Satya, thanks for the post.
I have a small question regarding the normalization step:
If you normalize the first block of 4 histograms, all values will be normalized into the range 0 <= value <= 1.
If you now shift the window 50% to the right, the left side of the block contains already-normalized values, and the right side contains the original histogram values, which are much bigger ( for example 8-bit ). With the L2-norm calculation, the left side, which was normalized once, gets much smaller, because it has now been normalized twice. So the normalized values would be e.g. 0.000012 etc. ( very small ).
I hope you understand what I'm trying to explain; I don't know if my thoughts are correct.
The normalization is not done “in-place”, so the values are not modified as we go from left to right. For example, in an implementation one may use a copy of the image.
Does this make sense ?
I have one picture buffer (block RAM on an FPGA), which acts as the histogram input, from which I gather the 4 histogram arrays (9 values each) into a cluster. Afterwards, I normalize all of the 36 values and write the normalized values into the same buffer again.
After the write process is complete, I shift the cluster one position to the right and gather all the values again. However, this time the left side is already normalized (thus values smaller than 1), but the right half is much bigger (10-bit values).
After normalizing the values, the left side gets smaller again.
All values get written to the buffer again, and the window shifts right again.
I'm writing this on an FPGA and I suppose I am doing something wrong, because I cannot imagine that doubly normalized values like 0.0000214 are correct.
Hi, how can I calculate the gradient of a pixel? How do I use the derivative filters to determine the new value of a pixel?
hello,
I am working on a context-based image retrieval method. So is HOG the right feature to use for context?
I am now studying Sketch Based Image Retrieval (SBIR), and for the retrieval I used histograms of oriented gradients, but how do I extract features from a binary image with HOG? The input in SBIR is a binary image, and I have to extract the features in order to retrieve similar images. What I am confused about is that if we extract the features, the gradient magnitude values are only (-255, 255, 0), so the directions ( using tan^-1 ) will be just (90,-90,45,0,22,-0,22) ( with zero replaced by 1; and if I do not replace 0 by 1, I just end up with 0 and 45 as the possible gradient directions we can get from a binary image ). Can it still be a 9-bin histogram? The paper said they used a 9-bin histogram, but I just don't know how it is done. Thank you so much in advance.
Hello Satya, how do I calculate the HOG features of a binary image? Will the number of histogram bins still be 9? Thanks!
Hey Satya,
could you explain or share the source code on how to visualize the descriptor outputs?
Greetings
I don’t have the code yet in C++ / Python, but people are requesting it. So we will write about it too.
How can I see the image containing only the magnitudes of the HOG?
I don’t have the code yet in C++ / Python, but a lot of people have requested it. So we will write a post about it.
https://uploads.disquscdn.com/images/25f1d32ccd436b52211d701057f79c2ba1fafa99dea5c1535b1a0c9ddc416f37.jpg
Hello Satya,
Congratulations for your post! 😉
I want to detect water boxes in aerial images (see image). Could HOG + SVM techniques be used successfully to detect and locate these objects in images?
The idea is to apply the “Selective Search” algorithm and then apply HOG + SVM to detect the water boxes in the “patches” selected by the algorithm. In this way, the patches with the highest values of probabilities of success would be the ones chosen.
Could anyone tell me which is the best feature extractor for this case?
Hello Satya
I have a small doubt.
The dimension of the image is 64 x 128. That makes 8192 pixels. Each of these 8192 pixels has its own magnitude and direction ( i.e. 8192 magnitude values and 8192 direction values ). After the binning stage we are left with 1152 values, as we converted every 64 pixels into 9 bins based upon their orientation. Can you please explain to me how, after L2 normalization, we get a 3780-dimensional vector?
Thank you very much for this tutorial; it was very useful. I just have a question: I am making a 3D HOG but I have a problem with the 3D gradients. How can I compute them simply, like in 2D?
Sorry for this late reply. Yes, 3D gradients should be no different than 2D ones.
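As a minimal sketch of that idea ( video below is a placeholder for a ( frames, height, width ) grayscale array ):
# Python: 3D ( temporal + spatial ) gradients of a video volume
import numpy as np

# video : placeholder name for a ( frames, height, width ) grayscale array
gt, gy, gx = np.gradient(video.astype(np.float32))   # centered differences along t, y, x
mag = np.sqrt(gx ** 2 + gy ** 2 + gt ** 2)           # 3D gradient magnitude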
Awesome post. Please post on how to use these feature vectors with labels in SVM classification.
Sure, here is what you need
https://learnopencv.com/handwritten-digits-classification-an-opencv-c-python-tutorial/
Hello sir, your article has been very helpful for me so far in doing my project. The implementation has not been done yet and I need help with it. Could you please upload the OpenCV code for the implementation?
Thanks. HOG is already implemented in OpenCV
https://github.com/opencv/opencv/blob/master/modules/objdetect/src/hog.cpp
From the last figure it seems each 8*8 patch has a HoG of 9 elements; therefore the dimension of the HoG feature vector should be 8*16*9 = 1152. I also understand that by sliding a 16*16 window you get a 3780-dimensional HoG feature vector. Therefore my question is: do you only use the 8*8 window for visualization, and use the 16*16 window for the HoG feature vector?