

Analysis of Human Detection System

Daniyal Khan Btech CSE Galgotias University

Ankit Pandey Btech CSE Galgotias University

P. Raja Kumar Professor Galgotias University [email protected]

Shayan Khan Btech CSE Galgotias University Bulandshahr (U.P.), India [email protected]

Abstract—This work presents a comparative study of how hyperparameter configuration influences the Histogram of Oriented Gradients (HOG) technique for detecting humans in videos and images. We used the OpenCV implementation of the technique developed by Dalal & Triggs to analyze human detection for real-time systems. In this paper, we present a study of the influence of hyperparameter configuration on classification accuracy, depending on the user and the activities performed by each user. Human detection with higher accuracy can be achieved with deep neural networks such as CNNs, but these require heavy computation and hardware and are still slow, as they can include 150 network layers and millions of parameters and sometimes need large GPUs. Because we need algorithms that can be deployed easily on edge devices, we analyzed the classical computer vision technique HOG combined with a machine learning tool, the Support Vector Machine (SVM), for detecting humans in a frame. The algorithm is run on common datasets (the MIT and INRIA pedestrian datasets). All conclusions are based on readings taken continuously while running the algorithm. The technique was also chosen so that it can compete with modern ANN-based detectors in real-time systems, which distinguishes this work from existing work in the field. The data recorded during execution is analyzed in light of previous work on this technique.

Keywords—HOG (Histogram of Oriented Gradients), winStride, padding, scale, hitThreshold.

I. INTRODUCTION

Human detection is a major field of computer vision. It deals with detecting instances of humans in images and videos; detecting more than one human within a single frame, however, is a challenging task. The limitations and uncertainty of classical classifiers can be overcome by convolutional neural network approaches such as R-CNN, but these require heavy computation and hardware and are still slow, as they can include 150 network layers and millions of parameters and sometimes need large GPUs. We need algorithms that can be deployed easily on edge devices, so we work with a classical computer vision technique and try to increase its accuracy by tuning its hyperparameters. Although the Dalal & Triggs HOG people detector works reasonably well, it rarely performs well out of the box: most of the time, many hyperparameters must be tuned to get the best results. Tuning them is tedious, because we must select a combination of hyperparameters and their values and try their permutations to find the one that gives the best possible outcome. Our study aims to use the HOG feature detector across its full range of settings in different fields.

When we vary the hyperparameter values and try their permutations, we find that the trade-off between accuracy and speed is drastic.

II. LITERATURE SURVEY

The OpenCV people detector uses an SVM that is already trained on the MIT and INRIA pedestrian datasets: HOG features are first calculated for every person image in the dataset and then passed to a linear SVM, which learns its parameters, i.e., the separating hyperplane. The linear SVM works well on this dataset, and its hyperparameters have been set carefully, so it has been trained optimally. When the model encounters a new image, at each stop of the sliding window (and for each level of the image pyramid) it extracts HOG features and passes them to the linear SVM for classification. The feature extraction and classification process involves several tunable parameters. The biggest difference comes from the hyperparameters used when calculating HOG features from an image: the detector must search a single frame many times, which increases processing time and reduces speed, so these values must be selected carefully.
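The detection loop described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not OpenCV's implementation: `detect`, `resize_nearest` and `linear_svm_score` are hypothetical names, layers are built with nearest-neighbour resizing, no padding is applied, and overlapping boxes are not grouped. A real detector would pass a HOG extractor as `extract` and the trained SVM weights as `weights` and `bias`.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]               # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int, float]  # x, y, width, height, SVM score


def linear_svm_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Decision value of a linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def resize_nearest(img: Image, new_w: int, new_h: int) -> Image:
    """Nearest-neighbour resize, used here only to build pyramid layers."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def detect(img: Image,
           extract: Callable[[Image], Sequence[float]],
           weights: Sequence[float],
           bias: float,
           win: Tuple[int, int] = (64, 128),
           stride: Tuple[int, int] = (8, 8),
           scale: float = 1.05,
           hit_threshold: float = 0.0) -> List[Box]:
    """Slide a window over every pyramid layer and keep windows the SVM accepts."""
    height, width = len(img), len(img[0])
    boxes: List[Box] = []
    factor = 1.0  # how much the current layer is shrunk relative to the input
    while True:
        lw, lh = int(width / factor), int(height / factor)
        if lw < win[0] or lh < win[1]:
            break  # the window no longer fits, so the pyramid ends here
        layer = resize_nearest(img, lw, lh)
        for y in range(0, lh - win[1] + 1, stride[1]):
            for x in range(0, lw - win[0] + 1, stride[0]):
                window = [row[x:x + win[0]] for row in layer[y:y + win[1]]]
                score = linear_svm_score(extract(window), weights, bias)
                if score >= hit_threshold:
                    # Map the window back to input-image coordinates.
                    boxes.append((round(x * factor), round(y * factor),
                                  round(win[0] * factor), round(win[1] * factor), score))
        factor *= scale
    return boxes
```

Making `stride` smaller or bringing `scale` closer to 1 makes this loop evaluate more windows, which is exactly the cost discussed in the rest of this section.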

The most impactful of all the hyperparameters involved in the HOG technique are winStride, padding, and scale. Each has a default value suggested by Dalal & Triggs, but these defaults do not work well most of the time. winStride defines the step size by which the detection window moves in the horizontal and vertical directions. The smaller the step size, the finer the details we can capture. winStride is one of the most impactful hyperparameters.

The padding parameter pads the detection window horizontally and vertically. As suggested by Dalal and Triggs in their 2005 CVPR paper, Histograms of Oriented Gradients for Human Detection [1], adding a small amount of padding around the image ROI before HOG feature extraction and classification can actually increase the accuracy of the detector, but the amount must be chosen carefully. The scale hyperparameter defines the factor by which the image is resized at each layer of the image pyramid.

Because the image is resized by this factor at each layer, the scale parameter ultimately determines the number of levels in the image pyramid. Both winStride and scale are extremely important parameters that must be set properly: they have a large effect not only on the accuracy of the detector but also on the speed at which it runs.

For each layer of the pyramid, a window is moved across the whole layer in steps of winStride. Computing multiple layers of the image pyramid lets us find objects in the image at different scales, but it also adds a large computational burden, since each layer requires its own series of sliding windows, HOG feature extractions, and SVM decisions.
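To make this cost concrete, the following sketch counts how many windows the SVM must evaluate over a whole pyramid. It assumes a 64x128 detection window (the size used by OpenCV's default people detector), no padding, and a simplified rule for layer sizes; `pyramid_sizes`, `windows_in_layer` and `total_windows` are hypothetical helpers, and OpenCV's exact counts will differ somewhat.

```python
from typing import List, Tuple


def pyramid_sizes(width: int, height: int, scale: float,
                  win: Tuple[int, int] = (64, 128)) -> List[Tuple[int, int]]:
    """Sizes of the pyramid layers that can still contain one detection window."""
    sizes = []
    factor = 1.0
    while True:
        w, h = int(width / factor), int(height / factor)
        if w < win[0] or h < win[1]:
            return sizes
        sizes.append((w, h))
        factor *= scale


def windows_in_layer(w: int, h: int, stride: Tuple[int, int],
                     win: Tuple[int, int] = (64, 128)) -> int:
    """Number of sliding-window positions in one layer."""
    return ((w - win[0]) // stride[0] + 1) * ((h - win[1]) // stride[1] + 1)


def total_windows(width: int, height: int, stride: Tuple[int, int], scale: float,
                  win: Tuple[int, int] = (64, 128)) -> int:
    """Windows (each one HOG extraction plus one SVM decision) over the whole pyramid."""
    return sum(windows_in_layer(w, h, stride, win)
               for w, h in pyramid_sizes(width, height, scale, win))


if __name__ == "__main__":
    for stride in [(8, 8), (4, 4)]:
        for scale in [1.5, 1.05, 1.02]:
            print(stride, scale, total_windows(400, 300, stride, scale))
```

Halving winStride roughly quadruples the windows in every layer, while a scale closer to 1 adds layers, so decreasing both multiplies the two effects.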

hitThreshold is the threshold for the distance between the HOG features and the SVM classifying plane. In OpenCV's implementation, a window is accepted as a detection only if its SVM decision value, which is proportional to its signed distance from the classifying plane, is at least hitThreshold; windows scoring below the threshold are rejected. Raising hitThreshold therefore rejects more windows, and lowering it (for example, to a negative value) accepts more.
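The acceptance rule can be illustrated with a toy linear SVM. This sketch follows our reading of OpenCV's implementation, in which the unnormalised decision value w·x + b is compared directly with hitThreshold; that value equals the geometric signed distance to the plane multiplied by ||w||, so it has the same sign, and hitThreshold = 0 corresponds to the plane itself. The weights, bias and scores below are made-up numbers, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def decision_value(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Raw linear-SVM output w . x + b, the quantity compared with hitThreshold."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def signed_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Geometric signed distance from x to the plane w . x + b = 0."""
    return decision_value(x, w, b) / math.sqrt(sum(wi * wi for wi in w))


def accepted(scores: List[float], hit_threshold: float) -> List[float]:
    """Keep the windows whose decision value reaches the threshold."""
    return [s for s in scores if s >= hit_threshold]


if __name__ == "__main__":
    w, b = [3.0, 4.0], -5.0  # toy hyperplane with ||w|| = 5
    print(decision_value([2.0, 1.0], w, b), signed_distance([2.0, 1.0], w, b))
    window_scores = [1.2, 0.3, -0.4, -1.5]  # made-up scores for four windows
    for t in (0.5, 0.0, -1.0):
        print(t, accepted(window_scores, t))
```

Lowering the threshold from 0 to -1 keeps the window scoring -0.4 as well, which is the effect exploited in Section IV.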

III. EXPERIMENTAL ANALYSIS

To run the HOG model on a real-time system, we need to reduce the image dimensions to obtain good detection speed. This might seem to reduce accuracy, but it does not, because the OpenCV SVM is trained only on low-resolution images of individual people. Reducing the image size means that fewer sliding windows in the image pyramid need to be evaluated (i.e., have HOG features extracted and passed to the linear SVM), which reduces detection time and increases overall detection throughput. Resizing the image also improves the overall accuracy of our pedestrian detection (i.e., fewer false positives).
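The saving from resizing can be estimated by counting window positions in the first pyramid layer. In this sketch the 1280x720 input and the 400-pixel target width are assumptions chosen for illustration (the paper does not state the resolution used), the window is 64x128 with winStride (8,8), and `resize_to_width` and `windows_in_layer` are hypothetical helpers.

```python
from typing import Tuple


def resize_to_width(width: int, height: int, target_width: int) -> Tuple[int, int]:
    """New size when shrinking to target_width while preserving the aspect ratio."""
    return target_width, int(height * target_width / width)


def windows_in_layer(w: int, h: int, stride=(8, 8), win=(64, 128)) -> int:
    """Sliding-window positions in a single layer."""
    return ((w - win[0]) // stride[0] + 1) * ((h - win[1]) // stride[1] + 1)


if __name__ == "__main__":
    original = (1280, 720)
    small = resize_to_width(*original, 400)
    print(small, windows_in_layer(*original), windows_in_layer(*small))
```

Under these assumptions the first layer drops from 11,475 windows to 559, about 20 times fewer, before the smaller pyramid is even taken into account.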

A. WinStride

winStride defines the step size by which the detection window moves in the horizontal and vertical directions. The smaller the step size, the finer the details we can capture. winStride is one of the most impactful hyperparameters.

With the default winStride of (8,8) suggested by Dalal & Triggs, we obtained an average FPS of 26.450, but accuracy is very low, as the figure clearly shows.

The smaller winStride is, the more windows need to be evaluated, which can quickly become a heavy computational burden. When we reduced winStride to (4,4), the average FPS dropped to 9.249, but accuracy increased. In general, decreasing winStride gave better accuracy at the cost of more time; we manage this time-accuracy trade-off by changing other hyperparameters. With winStride (2,2), accuracy increases substantially, but the average FPS drops to 4.392.
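Converting the reported frame rates into time per frame makes the trade-off easier to compare. The FPS values below are copied from the measurements above; `REPORTED_FPS` and `frame_time_ms` are hypothetical names.

```python
REPORTED_FPS = {(8, 8): 26.450, (4, 4): 9.249, (2, 2): 4.392}  # winStride -> average FPS


def frame_time_ms(fps: float) -> float:
    """Average processing time per frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    base = frame_time_ms(REPORTED_FPS[(8, 8)])
    for stride, fps in REPORTED_FPS.items():
        t = frame_time_ms(fps)
        print(f"winStride={stride}: {t:.1f} ms/frame, {t / base:.2f}x the (8,8) time")
```

This gives roughly 37.8 ms per frame for (8,8), 108.1 ms for (4,4), and 227.7 ms for (2,2), i.e., about 2.9 and 6.0 times the default cost.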

However, we found that winStride (4,4) gives nearly the same accuracy as (2,2) with a much smaller speed penalty, so it is better suited to edge devices.

Conversely, the larger winStride is, the fewer windows need to be evaluated, which dramatically speeds up the detector. However, if winStride becomes too large, detections can easily be missed entirely.
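Why an overly large winStride misses people can be seen with a one-dimensional toy model. Here we assume, hypothetically, that the SVM still fires when a window origin lies within `tolerance` pixels of the ideal position of a person; the real tolerance depends on the trained detector and is not given in this paper. `fraction_covered` is a hypothetical helper that counts the share of person positions some window lands close enough to.

```python
def fraction_covered(stride: int, tolerance: int) -> float:
    """Share of person positions within `tolerance` px of a window origin.

    Window origins lie on a grid 0, stride, 2*stride, ...; by periodicity it is
    enough to check one period of positions.
    """
    covered = 0
    for offset in range(stride):
        # Distance from this position to the nearest grid point.
        distance = min(offset, stride - offset)
        if distance <= tolerance:
            covered += 1
    return covered / stride


if __name__ == "__main__":
    for stride in (4, 8, 16, 32):
        print(stride, fraction_covered(stride, tolerance=4))
```

With a 4-pixel tolerance, strides up to 8 cover every position, while a stride of 32 covers only 9 of 32. In two dimensions both axes must line up, so under this model the covered share is roughly the square of these values.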


B. Padding

The padding parameter pads the detection window in the horizontal and vertical directions. It controls the number of pixels by which the ROI is padded before HOG feature vector extraction and SVM classification. Typical values for padding include (8, 8), (16, 16), (24, 24), and (32, 32).

Larger padding values increase the processing time without affecting accuracy much. Some padding is nevertheless needed for good results, particularly for detecting people near the corners of the image, because padding helps the SVM classify windows at the image borders.
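Padding enlarges the area the window slides over, which is where the extra time comes from. The sketch below uses a simplified model in which `padding` pixels are added on each side of the layer before sliding; OpenCV's internal handling may differ in detail, and `windows_with_padding` is a hypothetical helper. A 400x225 frame, a 64x128 window, and winStride (8,8) are assumed.

```python
from typing import Tuple


def windows_with_padding(w: int, h: int, padding: Tuple[int, int],
                         stride: Tuple[int, int] = (8, 8),
                         win: Tuple[int, int] = (64, 128)) -> int:
    """Window positions in one layer after padding each side by `padding` px."""
    pw, ph = w + 2 * padding[0], h + 2 * padding[1]
    return ((pw - win[0]) // stride[0] + 1) * ((ph - win[1]) // stride[1] + 1)


if __name__ == "__main__":
    for pad in [(8, 8), (16, 16), (32, 32), (64, 64)]:
        print(pad, windows_with_padding(400, 225, pad))
```

Under this model, padding (64,64) produces about 2.5 times as many windows in the first layer as (8,8) (1,711 versus 675).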

Experimenting with this parameter, we found that it increases stability, meaning that the same person is detected across many consecutive frames of a video.

With padding=(0,0), i.e., no padding, there are very few detections and little stability, yet processing takes the same time as with padding=(8,8), which gives good stability and detection.

padding=(64,64) gave the best results in terms of stability and detection, but it runs at around 16.5 FPS, whereas (8,8) runs at 30.589 FPS. Across our experiments on different videos, both settings gave nearly equal stability and detection.

Padding is therefore essential: no padding and padding=(8,8) take the same time (about 30 FPS), but they differ greatly in accuracy.

Padding (0,0): fewer detections and lower accuracy.

Padding (8,8): better results.


Combining this padding with the winStride value chosen above gave better results without increasing processing time, since winStride=(4,4) with padding=(8,8) takes the same time as winStride=(4,4) alone.

C. Scale

The scale hyperparameter defines the factor by which the image is resized at each layer of the image pyramid, and it therefore determines the number of levels in the pyramid.

Both winStride and scale are extremely important parameters that must be set properly; they strongly affect not only the accuracy of the detector but also its speed.

If both winStride and scale are decreased at the same time, the time it takes to perform object detection increases dramatically. A larger scale evaluates fewer layers of the image pyramid, which makes the algorithm faster. However, too large a scale (i.e., too few layers in the image pyramid) can cause pedestrians to go undetected. Conversely, too small a scale dramatically increases the number of image pyramid layers to be evaluated. Not only is this computationally wasteful, it can also dramatically increase the number of false positives reported by the pedestrian detector.
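The gaps that a large scale leaves can be seen by listing the height, in original-image pixels, that the 128-pixel-tall detection window covers at each pyramid level: level k covers 128 × scale^k pixels. This sketch assumes the 64x128 window of OpenCV's default people detector; `window_heights` is a hypothetical helper.

```python
from typing import List


def window_heights(scale: float, max_height: float, win_h: int = 128) -> List[int]:
    """Window height (px, in the original image) covered by each pyramid level."""
    heights = []
    h = float(win_h)
    while h <= max_height:
        heights.append(round(h))
        h *= scale  # each level shrinks the image by `scale`, so the window covers more
    return heights


if __name__ == "__main__":
    print(window_heights(1.5, 500))
    print(window_heights(1.05, 500))
```

With scale=1.5 the levels jump from 128 to 192, 288 and 432 pixels, so a person whose size falls between two levels is matched only loosely; with scale=1.05 the same range is covered in steps of about 5%, at the cost of many more levels.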

A smaller scale increases the number of layers in the image pyramid and the time it takes to process the image. We found that values from 1.01 to 1.5 give usable results; values outside this range produce unwanted results. Dalal & Triggs prefer a value of 1.05, but in our analysis it did not give the expected results. Changing scale while keeping the other hyperparameters at their defaults makes no substantial difference, but combining it with the other tuned parameters improves accuracy considerably.

D. hitThreshold

hitThreshold is the threshold for the distance between the features and the SVM classifying plane; it is usually 0. In OpenCV's implementation, a window is kept as a detection only if its SVM decision value is at least hitThreshold, so raising the threshold rejects more windows and lowering it accepts more. As described by Dalal & Triggs, it usually should not be changed, but when we experimented with it, we obtained good results in combination with other parameters: adjusting it gave a better FPS without compromising accuracy (discussed in Section IV).


IV. RESULT AND CONCLUSION

The parameters described above have a large effect on speed and accuracy. Using a brute-force search, we found ranges of hyperparameter values that give the best results. Here we describe the combination of those values and how they work together to give the best balance of speed and accuracy, one that can run easily on edge devices.

Both winStride and scale are very important when dealing with the speed-accuracy trade-off. Other parameters also play an important role, but these two strongly affect processing time, and time is critical in real-time situations. Decreasing both winStride and scale at the same time dramatically increases the time it takes to perform object detection. We have also seen that padding plays a very important role in improving accuracy, and we found a value for which the speed remains the same as with no padding.

We found that the hyperparameter values given by Dalal & Triggs produce results very quickly, but these results are of little use: there are very few detections and no stability at all. The detection flickers, with boxes appearing and disappearing, and when many people are present, only 1 or 2 of them are detected in a frame. Our hyperparameter values reduce all of these drawbacks.

Default values suggested by Dalal & Triggs:

winStride=(8,8), padding=(4,4), scale=1.05, and hitThreshold=0. This gives an average FPS of 27.080; the figure shows the resulting detections. The results are very poor: processing time is low, but that is of little use when there are almost no detections.

We changed the values to winStride=(4,4), padding=(8,8), and scale=1.02, all found through repeated experiments. This gives optimal detection results but consumes much more time: the average FPS is 9.812, very low compared with 27.080 for the default values.

The figure shows the result of this slower detection configuration.


Because this detection time is very large, we again tried combinations of hyperparameter values. We found that detection time can be reduced by increasing winStride, since winStride plays a major role in determining processing time. As shown above, however, increasing winStride degrades the results. So, when increasing winStride from (4,4) to (8,8), we also decreased hitThreshold from 0 to -1, which lets windows with lower SVM scores be accepted and keeps the detections the same. With this change we obtained an average FPS of 20.080 without losing detections or stability across frames.

winStride=(8,8), padding=(8,8), scale=1.02, and hitThreshold=-1: average FPS = 20.258.
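The three configurations reported in this section can be passed to OpenCV's `HOGDescriptor.detectMultiScale` as keyword arguments. The sketch below assumes the OpenCV Python bindings (`cv2`) with the built-in default people detector; the dictionary keys and the helpers `make_people_detector` and `detect_people` are our own hypothetical names. hitThreshold is omitted for the slower configuration, so OpenCV's default of 0 applies, consistent with the text.

```python
# Keyword arguments for cv2.HOGDescriptor.detectMultiScale, copied from the results above.
CONFIGS = {
    "default": dict(winStride=(8, 8), padding=(4, 4), scale=1.05, hitThreshold=0.0),
    "slow_accurate": dict(winStride=(4, 4), padding=(8, 8), scale=1.02),
    "tuned": dict(winStride=(8, 8), padding=(8, 8), scale=1.02, hitThreshold=-1.0),
}


def make_people_detector():
    """HOG descriptor loaded with OpenCV's pre-trained linear SVM for pedestrians."""
    import cv2  # imported here so CONFIGS can be inspected without OpenCV installed

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    return hog


def detect_people(hog, frame, config: str = "tuned"):
    """Run multi-scale detection on one frame; returns (boxes, SVM weights)."""
    rects, weights = hog.detectMultiScale(frame, **CONFIGS[config])
    return rects, weights
```

In a video loop, one would create the detector once with `make_people_detector()`, read and resize each frame (for example with `cv2.VideoCapture` and `cv2.resize`), and call `detect_people(hog, frame)` per frame, switching the `config` name to compare the three settings.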


Hence, we found specific hyperparameter values that give the best detection and stability with the least processing time, allowing the human detection system to be deployed easily on real-time systems, i.e., edge devices with limited computational power.

ACKNOWLEDGMENT

We thank our colleagues from Galgotias University who provided insight and expertise that greatly assisted the research, although they may not agree with all of the interpretations and conclusions of this paper. We thank our guide, Prof. P. Raja Kumar, for his assistance with this project.

REFERENCES

[1] Navneet Dalal and Bill Triggs, "Histograms of Oriented Gradients for Human Detection," in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Volume 1, pp. 886–893, June 20–26, 2005.

[2] V. de Poortere, J. Cant, B. Van den Bosch, J. de Prins, F. Fransens, and L. Van Gool, "Efficient pedestrian detection: a test case for SVM based categorization," in Workshop on Cognitive Vision, 2002.

[3] O. Tuzel, F. Porikli, and P. Meer, "Human detection via classification on Riemannian manifolds," in IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2007.

[4] E. Osuna, R. Freund, and F. Girosi, "An Improved Training Algorithm for Support Vector Machines," to appear in Proc. of IEEE NNSP'97, Amelia Island, FL, 24–26 Sep. 1997.

[5] M. Pedersoli, A. Vedaldi, and J. Gonzalez, "A coarse-to-fine approach for fast deformable object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 1353–1360.

[6] J. Wu, C. Geyer, and J. Rehg, "Real-time human detection using contour cues," in Proc. Int. Conf. Robot. Autom., 2011, pp. 860–867.

[7] Faizan Ahmad, Aaima Najam, and Zeeshan Ahmed, "Image-based Face Detection and Recognition: 'State of the Art'," IEEE-11329.

[8] Gaurav Kumar and Pradeep Kumar Bhatia, "A Detailed Review of Feature Extraction in Image Processing Systems."

[9] Ligang Zhang and Vinod Chandran, "Discovering the Best Feature Extraction and Selection Algorithms for Spontaneous Facial Expression Recognition."

[10] Sakrapee Paisitkriangkrai, Chunhua Shen, and Jian Zhang, "Face Detection with Effective Feature Extraction."