Human Detection
Phanindra Varma

Detection - Overview
- Human detection in static images is based on the HOG (Histogram of Oriented Gradients) encoding of images.
- The training set consists of positive windows (containing humans) and negative images.
- For each window in the training set, the HOG feature vector is computed and a linear SVM is used to learn the classifier.
- For any test image, the feature vector is computed on densely spaced windows at all scales and classified using the learned SVM.

HOG encoding
Preprocessing:
- Gamma normalize each channel of the given window using a square root transformation.
- For each channel, compute gradients using the masks [-1 0 1] and [-1 0 1]^T, and for each pixel find the channel with the largest gradient magnitude.
- Compute the gradient orientation (0 to 180 degrees) for each pixel in this dominant channel.
Descriptor computation:
- Divide the window (64x128) into a dense grid of points with horizontal and vertical spacing equal to 8 pixels.
- Divide the 16x16 region (block) centered on each grid point into cells of size 8x8 (i.e. 4 cells for each grid point).
- For each pixel in the current block, use trilinear interpolation, weighted by gradient strength, to vote into a 2x2x9 histogram.

HOG encoding (Contd.)
- Different voting schemes were used for each of the colored regions.
- Block normalization for illumination invariance is done on each block independently, using the norm of the 2x2x9 vector.
- The final feature vector is the collection of the 2x2x9 feature vectors from all the grid points.
[Figure: a block of 16x16 pixels, with the cell centers and the grid point marked]

Training
- The training set has been obtained from
- The training set consists of positive 64x128 windows (2416) containing humans and negative images.
- Negative windows (12000) are sampled from the negative images at random locations.
- Initial phase learning: learn the SVM classifier on the original training set.
- Generate hard examples: run the learned SVM on the negative images at all scales and window locations, and save all the false positives (approx. 6000).

Training (Contd.)
- Second phase learning: using the newly generated negative examples, learn a new linear SVM (approx. 2400 positive windows and 17000 negative windows in total).
- Following this procedure, 375 windows were misclassified out of the possible 19400 windows (using SVMLight).

Testing
- Given an image, the HOG feature vector is computed across all scales and window locations (window size 64x128), and the locations and scales of all positive windows are saved.
- This procedure gives multiple detections (at many scales and locations).
- To fuse overlapping detections, the mean shift mode detection algorithm is used.
- Represent each detection as a point in a 3D space (x, y, log(s)) and iteratively compute the mean shift vector at each point.
- The resulting modes give the final detections, and the bounding boxes are drawn using the scale of each mode.

Results - Detection
[Figure: an example image, and the detections when the threshold is zero]

Results - Detection (Contd.)
[Figure: the previous image, and the detections when the threshold is equal to one]

Results - Detection
[Figure: an example image, and the detections when the threshold is zero]

Results - Detection (Contd.)
[Figure: result of mean shift mode detection]

Comparison

Detection Video
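
The fusion step described under Testing, where each detection becomes a point in (x, y, log(s)) space and is moved iteratively by the mean shift vector until it settles on a mode, can be sketched roughly as below. This is only an illustrative sketch, not the implementation behind the results shown. The function name `mean_shift_modes`, the Gaussian kernel, the fixed per-axis `bandwidth` values, and the uniform weighting of detections are all assumptions. The slides do not say which kernel, bandwidths, or weights were used.

```python
import math

def mean_shift_modes(detections, bandwidth=(8.0, 16.0, 0.3), iters=50, tol=1e-3):
    """Fuse (x, y, scale) detections into modes found in (x, y, log s) space.

    Hypothetical helper: kernel, bandwidths, and uniform weights are assumed.
    """
    # Map every detection into the 3D space used for fusion.
    points = [(x, y, math.log(s)) for x, y, s in detections]

    def shift(p):
        # One mean shift step: Gaussian-weighted mean of all points around p.
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for q in points:
            d2 = sum(((p[k] - q[k]) / bandwidth[k]) ** 2 for k in range(3))
            w = math.exp(-0.5 * d2)
            den += w
            for k in range(3):
                num[k] += w * q[k]
        return tuple(n / den for n in num)

    modes = []
    for p in points:
        # Follow the mean shift vector from this detection until it stops moving.
        for _ in range(iters):
            new_p = shift(p)
            moved = max(abs(new_p[k] - p[k]) for k in range(3))
            p = new_p
            if moved < tol:
                break
        # Keep the end point only if it is not within one bandwidth of a known mode.
        for m in modes:
            if all(abs(p[k] - m[k]) < bandwidth[k] for k in range(3)):
                break
        else:
            modes.append(p)

    # Convert log-scale back to scale so a bounding box can be drawn.
    return [(x, y, math.exp(ls)) for x, y, ls in modes]


if __name__ == "__main__":
    raw = [(100, 200, 1.0), (104, 198, 1.1), (98, 203, 0.95),
           (400, 150, 2.0), (405, 152, 2.1)]
    for x, y, s in mean_shift_modes(raw):
        # Box size follows the 64x128 window scaled by the fused scale.
        print(f"center=({x:.1f}, {y:.1f}) box={64 * s:.0f}x{128 * s:.0f}")
```

Working in log(s) rather than s means a given relative change in scale counts the same for small and large windows. That is one reason to fuse in this space instead of averaging raw scales.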