Navneet Dalal Thesis Statement

Presentation on theme: "Pedestrian Detection and Localization"— Presentation transcript:

1 Pedestrian Detection and Localization
Members: Đặng Trương Khánh Linh, Bùi Huỳnh Lam Bửu
Advisor: A. Professor Lê Hoài Bắc
University of Science, Advanced Program in Computer Science
Year 2011

2 Outline
Introduction
Problem statement & applications
Challenges
Existing approaches
Review of HOG and SVM
Motivation
Overview of methodology
  Learning phase
  Detection phase
Our contributions:
  Spatial selective approach
  Multi-level based approach
  Fusion algorithm: Mean Shift
Conclusions
Future work
References

3 Problem statement
Build a system that automatically detects and localizes pedestrians in static images.
Pedestrians: upright and fully visible.
The goal of our thesis is to build an automatic system that detects and localizes pedestrians in static images. More specifically, our detector scans each given image and draws a bounding box around every pedestrian that appears in it. Pedestrians should be standing upright and fully visible in the picture. Our thesis is based on Dalal's work, the normalized Histogram of Oriented Gradients (HOG). We concentrate on extracting robust features.

4 Applications
Automated driving, or smart cameras in general.
Software that sorts personal album images into the proper catalogue.
Video tracking and smart surveillance.
Action recognition.
One application is a smart-car system that automatically detects objects and shows a warning message when the car is about to hit a person or an obstacle on the street. Everyone has thousands of photos, so another application is software that automatically categorizes a personal album. Object detection is also one of the first phases of many computer vision problems, such as video tracking and action recognition.

5 Challenges
Huge intra-class variation.
Unconstrained illumination.
Variable appearance and clothing.
Complex backgrounds.
Occlusions and different scales.
Several challenges make pedestrian detection more difficult: huge intra-class variation, variable appearance and clothing, and background clutter that varies from image to image. For example, images can be taken indoors or outdoors, and under diverse natural factors such as illumination, viewpoint, and color.

6 Existing approaches
Haar wavelets + SVM: Papageorgiou & Poggio, 2000; Mohan et al., 2000
Rectangular differential features + AdaBoost: Viola & Jones, 2001
Model-based methods: Felzenszwalb, 2008
Local Binary Patterns: Wang, 2009
Histogram of Oriented Gradients: Dalal and Triggs, 2005
[Figures: Felzenszwalb, CVPR 2008; Wang, ICCV 2009]
Object detection in general, and pedestrian detection in particular, has attracted the attention of many researchers. These are some well-known approaches from up to a decade ago. Haar wavelets and rectangular differential features both use the difference between two rectangles. More recently, Felzenszwalb proposed a model-based method that uses HOG as its feature extraction algorithm. SIFT, a traditional method, is another well-known work. LBP uses the ordering of pixel values to construct a histogram. Last but not least, HOG uses the gradient information of pixels.

7 Histogram of Oriented Gradients (HOG)
Based on the gradients of pixels.
Because our thesis is based on the Histogram of Oriented Gradients (HOG for short), we briefly review the method. In pre-processing, the detection window is normalized to reduce the effect of illumination. The gradient at each pixel is computed and votes into spatial and orientation cells. Each block, which consists of 4 cells, is contrast-normalized, and the blocks are concatenated to form the final feature vector of the window.

8 Histogram of Oriented Gradients Review
For example, the sliding window is divided by a grid of points. Each block includes 4 cells, and each cell has a histogram constructed from the pixels in that cell.
[Figure labels: block; cell; 9 orientation bins (degrees); normalize; feature vector f = […, …, …, …]; 9 × 4 feature vector per block]
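The review above can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis implementation: the 8-pixel cell size, the unsigned 0-180 degree orientation range, the 128 × 64 window in the usage note, the one-cell block stride, and the L2 block normalization are all assumptions that follow common HOG defaults, and votes are assigned to a single bin without interpolation. The function names are hypothetical.

```python
import numpy as np

def cell_histograms(gray, cell=8, bins=9):
    """Magnitude-weighted orientation histogram for every cell (assumed 0-180 degrees)."""
    gy, gx = np.gradient(gray.astype(np.float64))   # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    # Hard assignment to one bin; the clamp guards against ang == 180 from rounding.
    idx = np.minimum((ang * bins / 180.0).astype(int), bins - 1)
    rows, cols = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(idx[sl].ravel(), weights=mag[sl].ravel(),
                                     minlength=bins)
    return hist

def hog_descriptor(gray, cell=8, bins=9, eps=1e-6):
    """Concatenate L2-normalized 2x2-cell blocks that overlap by one cell."""
    hist = cell_histograms(gray, cell, bins)
    rows, cols, _ = hist.shape
    blocks = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            v = hist[r:r + 2, c:c + 2].ravel()       # 4 cells x 9 bins = 36 values
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)
```

Under these assumptions, a 128 × 64 window has 16 × 8 cells and 15 × 7 overlapping blocks, so `hog_descriptor` returns 15 × 7 × 36 = 3780 values, which illustrates why the feature vector is described as very high-dimensional.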

12 SVM Review

13 Motivation for choosing HOG
Methods based on blob structure have failed at object detection.
Object detection methods based on edge detection are unreliable.
HOG takes advantage of the rigid shape of the object.
It has good performance and low complexity.
Disadvantages:
  Very high-dimensional feature vector.
  Lacks multi-scale information about the object's shape.
  It is suitable only for matching problems.
  Strongly affected by intra-class variation and background noise.
Although people's shapes are diverse, in specific circumstances such as walking in the street people are usually upright.

14 Contributions
Re-implement the HOG-based pedestrian detector.
Spatial Selective Method (SSM).
Multi-Level Method (MLM).
Starting from the disadvantages of HOG that we observed, we propose two methods to overcome them. First, SSM eliminates unimportant regions of the image to shrink the feature vector. Second, MLM captures more information about the object's shape to make the feature set more robust. We also re-implemented Dalal's HOG. This is not re-inventing the wheel: we had to do it to fully understand the philosophy behind the HOG method.

15 Dataset
INRIA pedestrian dataset
Train: 1208 positive windows, 1218 negative images
Test: 566 positive windows, 453 negative images
We use the INRIA pedestrian dataset, which is very challenging because the people appear in diverse shapes against complex backgrounds.

16 Dataset
[Figure labels: positive images; negative images; positive windows; negative windows]
These are some examples of images and windows. As you can see, a window is a part of an image. In positive windows, the pedestrian stands in the center of the window.

17 Overview of methodology
Learning phase:
  Input: positive windows and negative images.
  Output: binary classifier.
Detection phase:
  Input: arbitrary image.
  Output: bounding boxes containing pedestrians.

18 Learning Phase
In the learning phase, we start with a training dataset that contains negative and positive windows. We extract features over the windows in this training set to create the first classifier. We cannot use this classifier directly, because it is very prone to false positive windows. Instead, we run the first classifier on the negative training set to collect all the false positive windows. We then add these false positives to the training set and train again to obtain a better, second classifier.
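The two-round training described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: `train_linear_svm` is a tiny hinge-loss trainer (Pegasos-style stochastic subgradient descent) used as a stand-in for a real SVM solver, the feature matrices are assumed to be precomputed descriptors with one row per window, and all function names and hyperparameters are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Stand-in linear SVM: hinge loss minimized by Pegasos-style SGD."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias feature
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (Xb[i] @ w)
            w *= 1.0 - eta * lam                # shrink step from the regularizer
            if margin < 1.0:                    # sample violates the margin
                w += eta * y[i] * Xb[i]
    return w

def decision(w, X):
    """Signed score; positive means 'pedestrian'."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def train_with_hard_negatives(pos, neg_initial, neg_pool):
    """Train, collect false positives from the negative pool, then retrain."""
    X = np.vstack([pos, neg_initial])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg_initial))])
    w1 = train_linear_svm(X, y)                     # first classifier
    hard = neg_pool[decision(w1, neg_pool) > 0]     # false positive windows
    X2 = np.vstack([X, hard])
    y2 = np.concatenate([y, -np.ones(len(hard))])
    w2 = train_linear_svm(X2, y2)                   # second classifier
    return w1, w2, hard
```

Here `neg_pool` plays the role of the windows scanned from the negative training images; the windows that the first classifier scores as positive are exactly the false positives that get added back for the second round.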

19 Detection Phase

20 Result of re-implementation

21 Examples

22 Contributions
Improvements: performance (Spatial Selective Approach) and accuracy (Multi-Level Approach).

23 Spatial Selective Approach
[Figure labels: less informative region; descriptor]
By experiment, we observe that a small region in the center of the window, which mostly contains the chest and stomach, is less informative. We remove this small central region and divide the image into 4 parts. For each part we compute a feature vector. Finally, we concatenate the 4 vectors into the feature vector of the whole image:
[A1,..,Z1] [A2,..,Z2] [A3,..,Z3] [A4,..,Z4] → [A1,..,Z1, A2,..,Z2, A3,..,Z3, A4,..,Z4]
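A rough sketch of the idea, working on a grid of per-cell features: drop the cells in a central rectangle, split what remains into four parts, and concatenate the parts. The slides do not give the exact geometry, so the quadrant split, the position and size of the removed rectangle, and the function name are all hypothetical; region overlap and block normalization are left out.

```python
import numpy as np

def spatial_selective(cells, center_rows=(5, 11), center_cols=(2, 6)):
    """cells: (rows, cols, bins) array of per-cell features.

    Returns the concatenation of four part vectors with the central cells removed.
    """
    rows, cols, _ = cells.shape
    keep = np.ones((rows, cols), dtype=bool)
    # Mark the (assumed) less informative central rectangle for removal.
    keep[center_rows[0]:center_rows[1], center_cols[0]:center_cols[1]] = False
    r_mid, c_mid = rows // 2, cols // 2
    quadrants = [
        (slice(0, r_mid), slice(0, c_mid)),
        (slice(0, r_mid), slice(c_mid, cols)),
        (slice(r_mid, rows), slice(0, c_mid)),
        (slice(r_mid, rows), slice(c_mid, cols)),
    ]
    # One feature vector per part, built only from the cells that are kept.
    parts = [cells[q][keep[q]].ravel() for q in quadrants]
    return np.concatenate(parts)
```

With a 16 × 8 grid of 9-bin cells and these example defaults, 24 of the 128 cells are removed, so the vector shrinks from 1152 to 936 values. The numbers only illustrate the mechanism; they are not the thesis configuration.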

24 Spatial Selective Approach
Examples: the center region, which mostly contains the chest and stomach, is less informative. Regions (0) and (2) contain the most information: region (0) contains the head and left shoulder, and region (2) carries the information about the legs. Region (1) covers the right shoulder. Region (3) is unreliable because it sometimes contains no object information at all. When we test these four parts independently, the performance of each is extremely low because each lacks information about the whole object. One more factor that significantly affects performance is the overlap between regions: the more the regions overlap each other, the higher the accuracy. However, a higher percentage of overlap also increases the size of the feature vector.

25 Result: Spatial Selective Method
The performance of the new descriptor is approximately the same as that of the original, even though the length of the new feature vector is reduced by 15-25%.

26 Vector Length vs. Speed
[Figure legend: A:B = deleted cell(s) : overlap cell(s)]

27 Examples
[Figure labels: original; re-implementation]

28 Multi-level Approach
Purpose: enhance performance by capturing more information about the shape of the object.
Inspired by the pyramid model.
Our goal in this method is to enhance the accuracy of the detector by adding more information about the object's shape. We are inspired by the pyramid model, which resembles viewing the object from both near and far distances.

29 Multi-level Approach
Instead of zooming the window in or out as in the pyramid model, we apply a different grid of points at each level. The grids are designed from fine to coarse. With the dense grid, it is as if we look at the object from a near distance, so we see it in detail. At the other levels, it is as if we see the object from a far distance, so we capture its shape more generally. HOG is computed at each level, and the per-level vectors are concatenated:
[A1,..,Z1] [A2,..,Z2] [A3,..,Z3] → [A1,..,Z1, A2,..,Z2, A3,..,Z3]
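One way to picture the fine-to-coarse grids is to compute orientation histograms over the same window at several cell sizes and concatenate the levels. The sketch below assumes three levels with 8-, 16- and 32-pixel cells, unsigned 0-180 degree orientations, and a simple L2 normalization per level; none of these values come from the slides, and the function names are hypothetical.

```python
import numpy as np

def orientation_histograms(gray, cell, bins=9):
    """Magnitude-weighted orientation histograms on a grid with the given cell size."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    idx = np.minimum((ang * bins / 180.0).astype(int), bins - 1)
    rows, cols = gray.shape[0] // cell, gray.shape[1] // cell
    out = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            out[r, c] = np.bincount(idx[sl].ravel(), weights=mag[sl].ravel(),
                                    minlength=bins)
    return out

def multi_level_descriptor(gray, cell_sizes=(8, 16, 32), eps=1e-6):
    """Concatenate one normalized vector per grid level, from fine to coarse."""
    levels = []
    for cell in cell_sizes:
        v = orientation_histograms(gray, cell).ravel()
        levels.append(v / (np.linalg.norm(v) + eps))   # per-level L2 normalization
    return np.concatenate(levels)
```

For a 128 × 64 window these assumed levels contribute 1152, 288 and 72 values, so the coarse levels add shape information at a small cost in vector length.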

30 Result(cont…)

31 Feature vector length v.s Time

32 Examples
[Figure labels: original; multi-level]

33 Examples (cont)

34 Fusion Algorithm – Mean Shift

35 Mean shift
[Figure labels: region of interest; center of mass; mean shift vector]
Slide by Y. Ukrainitz & B. Sarel

42 Mean shift clustering
Cluster: all data points in the attraction basin of a mode.
Attraction basin: the region for which all trajectories lead to the same mode.
Slide by Y. Ukrainitz & B. Sarel

43 Non-maximum suppression
We use a non-maximum suppression method such as mean shift to find the modes among the detections.
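As an illustration of fusing overlapping detections, the sketch below runs mean shift with a Gaussian kernel on detection centers and reports one mode per attraction basin. It is a simplified assumption of how the fusion could work: detections are reduced to 2-D points (scale is ignored), the bandwidth and the merge threshold are arbitrary, and `mean_shift_modes` is a hypothetical name.

```python
import numpy as np

def mean_shift_modes(points, weights=None, bandwidth=16.0, iters=50, tol=1e-3):
    """Move every point uphill on a Gaussian kernel density and merge the end points."""
    points = np.asarray(points, dtype=np.float64)
    if weights is None:
        weights = np.ones(len(points))
    else:
        weights = np.asarray(weights, dtype=np.float64)   # e.g. detection scores
    shifted = points.copy()
    for _ in range(iters):
        # Squared distance from each current estimate to every original point.
        d2 = ((shifted[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        k = weights[None, :] * np.exp(-0.5 * d2 / bandwidth ** 2)
        new = (k @ points) / k.sum(axis=1, keepdims=True)  # kernel-weighted center of mass
        converged = np.abs(new - shifted).max() < tol
        shifted = new
        if converged:
            break
    modes = []
    for p in shifted:
        # Points that ended in the same attraction basin collapse to one mode.
        if all(np.linalg.norm(p - m) > bandwidth / 2 for m in modes):
            modes.append(p)
    return np.array(modes)
```

Each returned mode would then stand for one fused detection; passing classifier scores as `weights` lets confident detections pull the mode toward themselves.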

44 Conclusions
Successfully re-implemented the HOG descriptor.
Proposed the Spatial Selective Approach, which takes advantage of the less informative center region of the image window.
The Multi-level approach captures more information about the shape of the object.
It is a general model and can be applied to any object.

45 Future work
Non-uniform grid of points: focus on the important regions, which contain more information.
Combination of the Spatial Selective and Multi-level approaches: combine the advantages of both methods to enhance the performance and accuracy of the algorithm.

46 Non-uniform grid of points

47 Demonstration

48 References
N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2005.
S. Maji et al. Classification using intersection kernel support vector machines is efficient. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.
C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, pages 147–151, 1988.
D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

49 Scan image at all positions and scales
Object/Non-object classifier
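Scanning at all positions and scales can be sketched as a generator of candidate boxes, each of which would be passed to the object/non-object classifier. The 64 × 128 base window, the 8-pixel stride and the 1.2 scale step are assumptions chosen for illustration, and `sliding_windows` is a hypothetical name. Growing the window here is equivalent to shrinking the image and scanning with a fixed-size window.

```python
def sliding_windows(img_w, img_h, win_w=64, win_h=128, stride=8, scale_step=1.2):
    """Yield (x, y, w, h) candidate boxes in original image coordinates."""
    scale = 1.0
    while win_w * scale <= img_w and win_h * scale <= img_h:
        w, h = int(round(win_w * scale)), int(round(win_h * scale))
        step = max(1, int(round(stride * scale)))   # stride grows with the window
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield x, y, w, h
        scale *= scale_step
```

In a full detector each box would be cropped, resized to the base window, described by the feature extractor, and scored; boxes with a positive score are the raw detections that the fusion step later merges.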

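50 Miss rate
$\text{miss rate} = 1 - \text{recall} = \frac{fn}{tp + fn}$, where $tp$ is the number of true positives and $fn$ the number of false negatives.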

51 Overview of methodology


I am cofounder of Matician, a startup building autonomous home devices.

Prior to this, I was the cofounder of Flutter, where we enabled hand gesture detection over the built-in webcam to allow users to control computers, TVs, tablets and phones. The Flutter app became the number one app on the Mac App Store in 72 countries in 2012 and was picked by Apple that year as one of the best apps of the year. It was one of the rare computer vision consumer software products to be rated 4.8 stars (out of 5). Flutter was acquired by Google in 2013. While at Google, I was involved with various projects at Google Research and at Nest. Prior to Flutter, I was senior researcher at, a visual search engine for online shopping (also acquired by Google).

I am perhaps best known for "Histogram of Oriented Gradient" features (joint work with Bill Triggs, published in CVPR 2005) as a way to encode images and videos, enabling machines to classify and locate objects in images. This paper improved the then state of the art by 100-1000x and became the de facto standard for almost a decade. Last I checked, it has 20,000+ citations. The work was done at INRIA Grenoble as part of my PhD thesis "Finding People in Images and Video Sequences" (coadvised by Bill Triggs and Cordelia Schmid).

I grew up in Chandigarh, India, have lived in Grenoble, France, and have been settled in the San Francisco Bay Area for the last 10+ years. In my spare time I love to bike, hike, travel and ski. When I was young, I used to paint and sketch a lot (however, that's part of a past life now).
