Object detection can be considered as a part of computer vision that deals with detecting objects (such as humans, dogs, and cars) in digital images or videos. It is widely used in various tasks, for example, image annotation, object tracking, facial recognition, etc. Object detection is challenging, and multiple algorithms are used to perform this task. In this blog, we will look at different techniques to detect humans in an image or video.
Computer vision is a field of computer science that allows computers or machines to perceive and understand the world through images. It uses various algorithms to learn from the training data, usually including thousands of images. Computers can understand an image, make useful predictions, and extract information using computer vision. Convolutional Neural Networks are one of the leading technologies used for computer vision.
Human Detection
Human detection is a subfield of object detection. An object detection system that can identify humans in an image is called a human detection system. Let us look at some of the approaches through which we can do human detection.
source
Early approaches for Human Detection
Let us discuss some of the early approaches used for human detection. These approaches require less computing power than modern techniques and are readily available in computer vision libraries such as OpenCV. However, they are not very high in accuracy.
Haar Cascades
Haar cascades is a feature-based object recognition system. It was introduced in 2001 by Paul Viola and Michael Jones in their paper “Rapid Object Detection using a Boosted Cascade of Simple Features”. OpenCV provides Haar Cascade-based object detection and contains pre-trained models for face detection, full-body detection, upper body detection, and lower body detection. It is most commonly used for face detection.
To use Haarcascade for human body detection in python, we first have to import OpenCV by using:
import cv2
You can also try this code with Online Python Compiler
It is a feature descriptor used for object detection. This uses a technique similar to SIFT, which detects and classifies the occurrences of gradient orientations in localized portions of images. This was introduced in 2005 by N. Dalal and B. Triggs in their paper “Histograms of oriented gradients for human detection”. OpenCV provides inbuilt functionality for HOG-based detection. It also contains pre-trained models that can be used for human detection. To use HOG for human detection in python, we can use:
Modern approaches for human detection mainly consist of deep convolution neural networks. Modern object detection systems based on CNN are very accurate and can detect objects belonging to multiple classes.
Deep neural networks tasks can be solved by using TensorFlow. TensorFlow is an open-source API provided by Google. We will talk about three models that can be implemented by using TensorFlow, which can detect humans. Let us look at some of the popular deep neural networks.
SSD Mobilenet V1
Mobilenets are light convolutional neural networks that can be implemented on mobile applications using TensorFlow. You can read more about MobileNets here.
SSD stands for single shot detector. It can learn to predict the bounding boxes and classify them in one go. SSD can be trained end-to-end. You can read the official paper for more details.
This model is used for object detection and is faster than RCNN and FastRCNN. It is used in many real-life object detection tasks and is even used in self-driving cars. RCNN stands for region-based convolution neural network and works by dividing the image into different features or regions and classifying them. The faster RCNN model was introduced in 2016 in the paper “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. Faster RCNN model can provide good accuracy in less time if GPU acceleration is enabled.
NASNet stands for Neural Architecture Search Network. NASNet is one of the most accurate models available, and it can detect humans easily. Below are some of the results shown by NASNet.
As you can see, NASNet can easily detect humans even in a crowded environment with humans having different poses. RCNN is highly accurate, but it is a bit slow. It is not recommended if high computational power is not available.
YOLO
YOLO (You only look once) is a real-time object detection system. YOLO passes the image through a network only once and detects the object, unlike the RCNN discussed above that performs object detection on the image regionally. YOLO is very fast and hence, very popular for object detection. YOLO splits the input image into mxm grids and generates two bounding boxes and class probabilities for each grid. To gain more knowledge about YOLO, read the original paper on YOLO here.
The above modern approaches are more accurate than the earlier approaches but cost more computational time. You can use any of the above methods for human detection based on your requirements. There are many datasets available through which you can train your models. One of the famous datasets is the COCO dataset. COCO datasetis an object detection and captioning dataset by Microsoft.
Are there any other methods by which we can do human detection?
Yes, we can also use recurrent neural networks for human detection.
Can we use some other library for implementing deep neural networks?
Yes, you can use other libraries like Keras.
Are there any more datasets used for human detection?
Yes, there are many datasets such as Human Detection Dataset. It is available on Kaggle.
Conclusion
In this blog, we talked about various methods through which we can do human detection. We talked about older approaches such as Haar cascades and state-of-the-art techniques such as NASNet. Human detection is a crucial task used in many real-life applications such as self-driving cars and surveillance.