This lecture covers a review/wrap-up of supervised learning, an overview of object category detection, and statistical template matching, including the Dalal-Triggs pedestrian detector, the Viola-Jones detector, and the R-CNN detector. It begins by reviewing the previous lecture: exemplar-based methods (transfer category labels from the examples with the most similar features), linear classifiers (confidence in the positive label is a weighted sum of features), non-linear classifiers (predictions based on more complex functions of the features), and generative classifiers (assign the label that best explains the features, i.e., makes them most likely). It also introduces correlation templates.


1.Object Detection with Statistical Template Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem, J. Hays

2.Administrative stuff HW 5 is out Due 11:59pm on Wed, November 16 Scene categorization Please start early Final project proposal Feedback via emails

3.Today’s class Review/finish supervised learning Overview of object category detection Statistical template matching Dalal-Triggs pedestrian detector (basic concept) Viola-Jones detector (cascades, integral images) R-CNN detector (object proposals/CNN)

4.Image Categorization [Figure: training pipeline: training images + training labels → image features → classifier training → trained classifier]

5.Image Categorization [Figure: the same training pipeline plus testing: test image → image features → trained classifier → prediction, e.g., “Outdoor”]

6.Image features: map images to feature space. Classifiers: map feature space to label space. [Figure: three scatter plots of x/o points in an (x1, x2) feature space with different decision boundaries]

7.Different types of classification Exemplar-based : transfer category labels from examples with most similar features What similarity function? What parameters? Linear classifier : confidence in positive label is a weighted sum of features What are the weights? Non-linear classifier : predictions based on more complex function of features What form does the classifier take? Parameters? Generative classifier : assign to the label that best explains the features (makes features most likely) What is the probability function and its parameters? Note: You can always fully design the classifier by hand, but usually this is too difficult. Typical solution: learn from training examples.

8.Exemplar-based Models Transfer the label(s) of the most similar training examples

9.K-nearest neighbor classifier [Figure: x/o training points in (x1, x2) feature space with two query points marked +]

10.1-nearest neighbor [Figure: the same data; each + takes the label of its single nearest neighbor]

11.3-nearest neighbor [Figure: the same data with K = 3]

12.5-nearest neighbor [Figure: the same data with K = 5]

13.Using K-NN Simple, a good one to try first Higher K gives smoother functions No training time (unless you want to learn a distance function) With infinite examples, 1-NN provably has error that is at most twice the Bayes optimal error
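A minimal NumPy sketch of the K-NN rule, assuming Euclidean distance (the slides leave the similarity function as an open design choice):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # majority label
```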

14.Discriminative classifiers Learn a simple function of the input features that confidently predicts the true labels on the training set Training Goals Accurate classification of training data Correct classifications are confident Classification function is simple

15.Classifiers: Logistic Regression Objective / Parameterization / Regularization / Training / Inference [Figure: x/o points in (x1, x2) feature space with a linear decision boundary] The objective function of most discriminative classifiers includes a loss term and a regularization term.

16.Using Logistic Regression Quick, simple classifier (good one to try first) Use L2 or L1 regularization L1 does feature selection and is robust to irrelevant features but slower to train

17.Classifiers: Linear SVM [Figure: x/o points in (x1, x2) feature space separated by a maximum-margin hyperplane]

18.Classifiers: Kernelized SVM [Figure: 1-D x/o data that is not linearly separable becomes separable after mapping x to (x, x²)] Kernel trick: implicitly map inputs into a high-dimensional feature space

19.Using SVMs Good general purpose classifier Generalization depends on margin, so works well with many weak features No feature selection Usually requires some parameter tuning Choosing kernel Linear: fast training/testing – start here RBF: related to neural networks, nearest neighbor Chi-squared, histogram intersection: good for histograms (but slower, esp. chi-squared) Can learn a kernel function
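One way to try the kernels above with scikit-learn; X_train, X_test, and y_train are assumed arrays, and the chi-squared kernel expects nonnegative histogram features:

```python
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

# Linear kernel: fastest to train/test -- the suggested starting point.
linear_svm = SVC(kernel="linear").fit(X_train, y_train)

# RBF kernel: nonlinear decision boundary.
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

# Chi-squared kernel: well suited to histogram features (e.g., HOG, bag of
# words) but slower, since the kernel matrix is computed explicitly.
chi2_svm = SVC(kernel="precomputed").fit(chi2_kernel(X_train, X_train), y_train)
preds = chi2_svm.predict(chi2_kernel(X_test, X_train))
```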

20.Classifiers: Decision Trees [Figure: x/o points in (x1, x2) feature space partitioned by axis-aligned splits]

21.Ensemble Methods: Boosting figure from Friedman et al. 2000

22.Boosted Decision Trees [Figure: two short decision trees with node tests such as “Gray?”, “High in image?”, “Many long lines?”, “Smooth?”, “Green?”, “Blue?” leading to Ground / Vertical / Sky labels] [Collins et al. 2002] P(label | good segment, data)

23.Using Boosted Decision Trees Flexible: can deal with both continuous and categorical variables How to control bias/variance trade-off Size of trees Number of trees Boosting trees often works best with a small number of well-designed features Boosting “stumps” can give a fast classifier

24.Generative classifiers Model the joint probability of the features and the labels Allows direct control of independence assumptions Can incorporate priors Often simple to train (depending on the model) Examples Naïve Bayes Mixture of Gaussians for each class

25.Naïve Bayes Objective / Parameterization / Regularization / Training / Inference Conditional independence [Figure: graphical model with label y as parent of features x1, x2, x3]

26.Using Naïve Bayes Simple thing to try for categorical data Very fast to train/test

27.Web-based demo SVM Neural Network Random Forest

28.Many classifiers to choose from SVM Neural networks Naïve Bayes Bayesian network Logistic regression Randomized Forests Boosted Decision Trees K-nearest neighbor RBMs Deep networks Etc. Which is the best one?

29.No Free Lunch Theorem

30.Generalization Theory It’s not enough to do well on the training set: we want to also make good predictions for new examples

31.Bias-Variance Trade-off E(MSE) = noise² + bias² + variance See the following for an explanation of bias-variance (also Bishop’s “Neural Networks” book): http://www.inf.ed.ac.uk/teaching/courses/mlsc/Notes/Lecture4/BiasVariance.pdf Noise: unavoidable error Bias: error due to incorrect assumptions Variance: error due to variance of parameter estimates from training samples
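Written out for squared loss, with $y = f(x) + \varepsilon$, $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat f$ the predictor fit to a random training set, the slide's decomposition is:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
```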

32.Bias and Variance Error = noise² + bias² + variance [Figure: test error vs. model complexity for many vs. few training examples; low complexity gives high bias/low variance, high complexity gives low bias/high variance]

33.Choosing the trade-off Need validation set Validation set is separate from the test set [Figure: training error decreases with complexity while test error is U-shaped; high bias/low variance on the left, low bias/high variance on the right]

34.Effect of Training Size Fixed classifier [Figure: training error rises and testing error falls as the number of training examples grows, converging toward the generalization error]

35.How to measure complexity? VC dimension Other ways: number of parameters, etc. With probability $1-\eta$, $\text{test error} \le \text{training error} + \sqrt{\frac{h\left(\ln(2N/h)+1\right) - \ln(\eta/4)}{N}}$, where $N$ is the size of the training set and $h$ the VC dimension. What is the VC dimension of a linear classifier for N-dimensional features? For a nearest neighbor classifier?

36.How to reduce variance? Choose a simpler classifier Regularize the parameters Use fewer features Get more training data Which of these could actually lead to greater error?

37.Reducing Risk of Error Margins [Figure: x/o points in (x1, x2) feature space with a large-margin separator]

38.The perfect classification algorithm Objective function: encodes the right loss for the problem Parameterization: makes assumptions that fit the problem Regularization: right level of regularization for the amount of training data Training algorithm: can find parameters that maximize the objective on the training set Inference algorithm: can solve the objective function at evaluation time

39.Comparison Training and inference for common classifiers:
Naïve Bayes: training = record data counts (assuming x in {0, 1})
Logistic Regression: training = gradient ascent
Linear SVM: training = quadratic programming or subgradient optimization
Kernelized SVM: training = quadratic programming; inference = complicated to write
Nearest Neighbor: inference = most similar features → same label

40.Characteristics of vision learning problems Lots of continuous features E.g., HOG template may have 1000 features Spatial pyramid may have ~15,000 features Imbalanced classes: often limited positive examples, practically infinite negative examples Difficult prediction tasks

41.When a massive training set is available Relatively new phenomenon MNIST (handwritten digits) in the 1990s, LabelMe in the 2000s, ImageNet (object images) in 2009, … Want classifiers with low bias (high variance OK) and reasonably efficient training Very complex classifiers with simple features are often effective Random forests Deep convolutional networks

42.New training setup with moderate-sized datasets [Figure: initialize CNN features from a dataset similar to the task with millions of labeled examples, then tune the CNN features and neural-network classifier on the target training images and labels to obtain the trained classifier]

43.Practical tips Preparing features for linear classifiers Often helps to make features zero-mean, unit standard deviation For non-ordinal features, convert to a set of binary features Selecting classifier meta-parameters (e.g., regularization weight) Cross-validation: split data into subsets; train on all but one subset, test on the remaining one; repeat, holding out each subset Leave-one-out, 5-fold, etc. Most popular classifiers in vision SVM: linear for when fast training/classification is needed; performs well with lots of weak features Logistic Regression: outputs a probability; easy to train and apply Nearest neighbor: hard to beat if there is tons of data (e.g., character recognition) Boosted stumps or decision trees: apply to flexible features, incorporate feature selection, powerful classifiers Random forests: output probability; good for simple features, tons of data Deep networks / CNNs: flexible output; learn features; adapt an existing network (trained with tons of data) or train a new one with tons of data Always try at least two types of classifiers
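For the meta-parameter step, a small scikit-learn sketch of 5-fold cross-validation over the regularization weight (X and y are assumed feature/label arrays):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Try several regularization weights; keep the one with the best mean
# 5-fold cross-validation accuracy.
best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    scores = cross_val_score(LogisticRegression(C=C, max_iter=1000), X, y, cv=5)
    if scores.mean() > best_score:
        best_C, best_score = C, scores.mean()
```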

44.Making decisions about data 3 important design decisions: 1) What data do I use? 2) How do I represent my data (what feature)? 3) What classifier / regressor / machine learning tool do I use? These are in decreasing order of importance Deep learning addresses 2 and 3 simultaneously (and blurs the boundary between them). You can take the representation from deep learning and use it with any classifier.

45.Things to remember No free lunch: machine learning algorithms are tools Try simple classifiers first Better to have smart features and simple classifiers than simple features and smart classifiers Though with enough data, smart features can be learned Use increasingly powerful classifiers with more training data (bias-variance tradeoff)

46.Some Machine Learning References General: Tom Mitchell, Machine Learning, McGraw Hill, 1997 Christopher Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995 Adaboost: Friedman, Hastie, and Tibshirani, “Additive logistic regression: a statistical view of boosting”, Annals of Statistics, 2000 SVMs: http://www.support-vector.net/icml-tutorial.pdf Random forests: http://research.microsoft.com/pubs/155552/decisionForests_MSR_TR_2011_114.pdf

47.Object Category Detection Focus on object search: “Where is it?” Build templates that quickly differentiate object patch from background patch [Figure: a dog model scoring candidate patches as object or non-object]

48.Challenges in modeling the object class Illumination Object pose Clutter Intra-class appearance Occlusions Viewpoint Slide from K. Grauman, B. Leibe

49.Challenges in modeling the non-object class Bad Localization Confused with Similar Object Confused with Dissimilar Objects Misc. Background True Detections

50.General Process of Object Recognition Specify Object Model Generate Hypotheses Score Hypotheses Resolve Detections What are the object parameters?

51.Specifying an object model Statistical Template in Bounding Box Object is some (x, y, w, h) in image Features defined w.r.t. bounding box coordinates [Figure: image, template, and visualization] Images from Felzenszwalb

52.Specifying an object model 2. Articulated parts model Object is configuration of parts Each part is detectable Images from Felzenszwalb

53.Specifying an object model 3. Hybrid template/parts model Detections Template Visualization Felzenszwalb et al. 2008

54.Specifying an object model 3D-ish model Object is collection of 3D planar patches under affine transformation

55.General Process of Object Recognition Specify Object Model Generate Hypotheses Score Hypotheses Resolve Detections Propose an alignment of the model to the image

56.Generating hypotheses Sliding window Test patch at each location and scale

57.Generating hypotheses Sliding window Test patch at each location and scale
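A NumPy sketch of this search over an image pyramid; the 128×64 window, 8-pixel step, and 1.25 scale factor are illustrative defaults borrowed from the pedestrian-detection setting, and the image is assumed grayscale:

```python
import numpy as np

def sliding_windows(image, window=(128, 64), step=8, scale=1.25):
    """Yield (x, y, s, patch) for every window position over an image pyramid.
    window is (height, width); scale is the downsampling factor per level."""
    h, w = window
    s = 1.0
    while image.shape[0] >= h and image.shape[1] >= w:
        for y in range(0, image.shape[0] - h + 1, step):
            for x in range(0, image.shape[1] - w + 1, step):
                yield x, y, s, image[y:y + h, x:x + w]
        # Nearest-neighbor downsampling for the next pyramid level
        ys = (np.arange(int(image.shape[0] / scale)) * scale).astype(int)
        xs = (np.arange(int(image.shape[1] / scale)) * scale).astype(int)
        image = image[np.ix_(ys, xs)]
        s *= scale
```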

58.Generating hypotheses 2. Voting from patches/keypoints [Figure: interest points → matched codebook entries → probabilistic voting in a continuous 3D voting space (x, y, s)] ISM model by Leibe et al.

59.Generating hypotheses 3. Region-based proposal Endres and Hoiem 2010

60.General Process of Object Recognition Specify Object Model Generate Hypotheses Score Hypotheses Resolve Detections Mainly gradient-based or CNN features, usually based on a summary representation, many classifiers

61.General Process of Object Recognition Specify Object Model Generate Hypotheses Score Hypotheses Resolve Detections Rescore each proposed object based on whole set

62.Resolving detection scores Non-max suppression [Figure: overlapping detections with scores 0.1, 0.8, 0.8; only the highest-scoring window in each overlapping group is kept]
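A standard greedy implementation of this step (one common variant; the Viola-Jones slides instead average overlapping boxes):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression. boxes: (N, 4) array of [x1, y1, x2, y2]."""
    order = np.argsort(scores)[::-1]   # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou < iou_thresh]   # drop heavily overlapping boxes
    return keep
```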

63.Resolving detection scores 2. Context/reasoning [Figure: scene geometry used to rescore detections by their implied size in meters] Hoiem et al. 2006

64.Object category detection in computer vision Goal: detect all pedestrians, cars, monkeys, etc in image

65.Basic Steps of Category Detection Align E.g., choose position, scale, orientation How to make this tractable? Compare Compute similarity to an example object or to a summary representation Which differences in appearance are important? [Figure: possible objects aligned, then compared to an exemplar or a summary representation]

66.Sliding window: a simple alignment solution

67.Each window is separately classified

68.Statistical Template Object model = sum of scores of features at fixed positions. Example: one window’s feature scores sum to −0.5, below the threshold 7.5 → non-object; another’s sum to 10.5 > 7.5 → object

69.Design challenges How to efficiently search for likely objects Even simple models require searching hundreds of thousands of positions and scales Feature design and scoring How should appearance be modeled? What features correspond to the object? How to deal with different viewpoints? Often train different models for a few different viewpoints Implementation details Window size Aspect ratio Translation/scale step size Non-maxima suppression

70.Example: Dalal-Triggs pedestrian detector Extract fixed-size (64x128 pixel) window at each position and scale Compute HOG (histogram of oriented gradients) features within each window Score the window with a linear SVM classifier Perform non-maxima suppression to remove overlapping detections with lower scores Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
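A compact sketch of this pipeline using scikit-image's hog and scikit-learn's LinearSVC; pos_patches, neg_patches, and test_patch are assumed 128×64 grayscale crops, and the HOG parameters mirror the paper's defaults (giving the 3780-dimensional descriptor counted on slide 76):

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def window_features(patch):
    # 128x64 patch -> 3780-dim HOG descriptor: 9 orientation bins,
    # 8x8-pixel cells, 2x2-cell blocks, L2-Hys block normalization
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Train a linear SVM on positive (pedestrian) and negative (background) crops
X = np.array([window_features(p) for p in pos_patches + neg_patches])
y = np.array([1] * len(pos_patches) + [0] * len(neg_patches))
clf = LinearSVC(C=0.01).fit(X, y)

# Score one test window; the full detector runs this over every window from
# the sliding-window search, then applies non-max suppression
score = clf.decision_function([window_features(test_patch)])[0]
```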

71.Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

72.Tested with RGB, LAB, and grayscale color spaces (RGB/LAB give slightly better performance vs. grayscale) Gamma normalization and compression: square root, log (very slightly better performance vs. no adjustment)

73.Gradient filters tested: uncentered, centered, cubic-corrected, diagonal, Sobel; the simple centered [-1 0 1] mask outperforms the others Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

74.Histogram of gradient orientations Votes weighted by magnitude Bilinear interpolation between cells Orientation: 9 bins (for unsigned angles) Histograms in 8x8 pixel cells Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

75.Normalize with respect to surrounding cells Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

76.The window descriptor concatenates all normalized cell histograms: # features = 15 × 7 × 9 × 4 = 3780 (15 × 7 cell positions × 9 orientations × 4 normalizations by neighboring cells) Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

77.[Figure: visualizations of the positive and negative SVM weights (w_pos, w_neg) of the learned template] Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

78.pedestrian Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

79.Detection examples

80.Something to think about… Sliding window detectors work very well for faces fairly well for cars and pedestrians badly for cats and dogs Why are some classes easier than others?

81.Viola-Jones sliding window detector Fast detection through two mechanisms Quickly eliminate unlikely windows Use features that are fast to compute Viola and Jones. Rapid Object Detection using a Boosted Cascade of Simple Features (2001).

82.Cascade for Fast Detection [Diagram: examples enter Stage 1 (H1(x) > t1?); No → reject, Yes → Stage 2 (H2(x) > t2?), …, Stage N (HN(x) > tN?); only windows that pass every stage are accepted] Choose each threshold for a low false negative rate Fast classifiers early in cascade Slow classifiers later, but most examples don’t get there
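In code, the cascade is just an early-exit loop; a minimal sketch where each stage is a callable returning a boosted score:

```python
def cascade_classify(x, stages, thresholds):
    """Evaluate an attentional cascade: each H_i is a boosted classifier
    returning a real-valued score. A window must pass every stage; most
    negative windows are rejected cheaply by the early stages."""
    for H, t in zip(stages, thresholds):
        if H(x) <= t:      # each t_i is set for a very low false-negative rate
            return False   # reject early and skip the remaining, slower stages
    return True            # survived all stages -> detection
```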

83.Features that are fast to compute “Haar-like features” Differences of sums of intensity Thousands, computed at various positions and scales within the detection window Two-rectangle features Three-rectangle features Etc. [Figure: adjacent rectangular regions weighted −1 and +1]

84.Integral Images ii = cumsum(cumsum(im, 1), 2) ii(x, y) = sum of the values above and to the left of (x, y) (the grey region) How to compute A+D-B-C? How to compute B-A?
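A NumPy sketch of both operations; rect_sum answers the A+D−B−C question, since any axis-aligned rectangle sum takes at most four lookups:

```python
import numpy as np

def integral_image(img):
    # ii(x, y) = sum of all pixels above and to the left, inclusive --
    # the cumsum(cumsum(im, 1), 2) from the slide.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] via four integral-image lookups."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```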

85.Feature selection with Adaboost Create a large pool of features (180K) Select features that are discriminative and work well together “Weak learner” = feature + threshold + parity Choose weak learner that minimizes error on the weighted training set Reweight
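One round of that selection in NumPy, assuming F is an (examples × features) matrix of Haar feature values, y holds labels in {−1, +1}, and w is the current example-weight vector; the exhaustive search is written for clarity, not speed:

```python
import numpy as np

def best_stump(F, y, w):
    """Find the (feature, threshold, parity) stump with lowest weighted error."""
    best_j, best_t, best_p, best_err, best_pred = None, None, None, np.inf, None
    for j in range(F.shape[1]):
        for t in np.unique(F[:, j]):
            for p in (+1, -1):                   # parity: direction of the test
                pred = np.where(p * (F[:, j] - t) > 0, 1, -1)
                err = w[pred != y].sum()         # weighted training error
                if err < best_err:
                    best_j, best_t, best_p, best_err, best_pred = j, t, p, err, pred
    return best_j, best_t, best_p, best_err, best_pred

def adaboost_round(F, y, w):
    """Select one weak learner, then upweight the examples it misclassifies."""
    j, t, p, err, pred = best_stump(F, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # weak learner's vote weight
    w = w * np.exp(-alpha * y * pred)                  # reweight mistakes upward
    return (j, t, p, alpha), w / w.sum()               # renormalized weights
```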

86.Adaboost

87.Adaboost: Immune to Overfitting? [Figure: train and test error vs. number of boosting rounds]

88.Adaboost: Margin Maximizer [Figure: train and test error vs. number of boosting rounds, with the margin indicated]

89.Top 2 selected features

90.Viola-Jones details 38 stages with 1, 10, 25, 50 … features 6061 total used out of 180K candidates 10 features evaluated on average Training Examples 4916 positive examples 10000 negative examples collected after each stage Scanning Scale detector rather than image Scale steps = 1.25 (factor between two consecutive scales) Translation step = 1 × scale (# pixels between two consecutive windows) Non-max suppression: average coordinates of overlapping boxes Train 3 classifiers and take vote

91.Viola-Jones Results MIT + CMU face dataset Speed = 15 FPS (in 2001)

92.R-CNN (Girshick et al. CVPR 2014) Replace sliding windows with “selective search” region proposals (Uijlings et al. IJCV 2013) Extract rectangles around regions and resize to 227x227 Extract features with fine-tuned CNN (initialized with a network trained on ImageNet) Classify last layer of network features with SVM http://arxiv.org/pdf/1311.2524.pdf
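A schematic of the R-CNN test-time loop described above; propose_regions (selective search), cnn_features (the fine-tuned CNN), and svms (per-class SVMs) are assumed callables standing in for the real components, not the paper's actual API:

```python
from skimage.transform import resize

def rcnn_detect(image, propose_regions, cnn_features, svms):
    """Score every region proposal with per-class SVMs on CNN features."""
    detections = []
    for (x1, y1, x2, y2) in propose_regions(image):      # ~2K proposals
        patch = resize(image[y1:y2, x1:x2], (227, 227))  # warp to fixed input size
        f = cnn_features(patch)                          # e.g., last-layer activations
        for cls, svm in svms.items():
            s = svm.decision_function([f])[0]
            if s > 0:
                detections.append((cls, (x1, y1, x2, y2), s))
    return detections   # followed by per-class non-max suppression
```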

93.Sliding window vs. region proposals Sliding window Comprehensive search over position, scale (sometimes aspect, though expensive) Typically 100K candidates Simple Speed boost through convolution often possible Repeatable Even with many candidates, may not be a good fit to object Region proposals Search over regions guided by image contours/patterns with varying aspect/size Typically 2-10K candidates Random (not repeatable) Requires a preprocess (currently 1-5s) Often requires resizing patch to fit fixed size More likely to provide candidates with very good object fit

94.HOG: Dalal-Triggs 2005 HOG Template Statistical Template Matching

95.HOG: Dalal-Triggs 2005 DPM: Felzenszwalb et al. 2008-2012 Deformable Parts Model (v1-v5) HOG Template Better Models of Complex Categories

96.HOG: Dalal-Triggs 2005 DPM: Felzenszwalb et al. 2008-2012 Regionlets: Wang et al. 2013 R-CNN: Girshick et al. 2014 Deformable Parts Model (v1-v5) HOG Template Regionlets R-CNN Better Features Key Advance: Learn effective features from massive amounts of labeled data and adapt to new tasks with less data

97.Mistakes are often reasonable Bicycle: AP = 0.73 Confident Mistakes R-CNN results

98.Horse: AP = 0.69 Confident Mistakes Mistakes are often reasonable R-CNN results

99.Misses are often predictable Small objects, distinctive parts absent or occluded, unusual views Bicycle R-CNN results

100.Strengths and Weaknesses of Statistical Template Approach Strengths Works very well for non-deformable objects: faces, cars, upright pedestrians Fast detection Weaknesses Sliding window has difficulty with deformable objects (proposals with flexible features work better) Not robust to occlusion Requires lots of training data

101.Tricks of the trade Details in feature computation really matter E.g., normalization in Dalal-Triggs improves detection rate by 27% at a fixed false positive rate Template size Typical choice for sliding window is the size of the smallest detectable object For CNNs, typically based on what pretrained features are available “Jittering” to create synthetic positive examples Create slightly rotated, translated, scaled, mirrored versions as extra positive examples Bootstrapping to get hard negative examples Randomly sample negative examples Train detector Sample negative examples that score > -1 Repeat until all high-scoring negative examples fit in memory
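A sketch of that bootstrapping loop; clf_train (fits a detector on features/labels) and mine_negatives (scans negative-only images with the current detector and returns candidate feature and score arrays) are assumed helpers, and pos_feats plus an initial randomly sampled neg_feats are given:

```python
import numpy as np

def bootstrap(clf_train, mine_negatives, pos_feats, neg_feats, rounds=3):
    """Hard-negative mining: retrain on negatives the current detector gets wrong."""
    for _ in range(rounds):
        X = np.vstack([pos_feats, neg_feats])
        y = np.hstack([np.ones(len(pos_feats)), -np.ones(len(neg_feats))])
        clf = clf_train(X, y)                     # train detector
        feats, scores = mine_negatives(clf)       # scan negative-only images
        hard = feats[scores > -1]                 # confident false positives
        if len(hard) == 0:
            break                                 # no hard negatives left
        neg_feats = np.vstack([neg_feats, hard])  # add them and retrain
    return clf
```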

102.Influential Works in Detection Sung-Poggio (1994, 1998): ~2100 citations Basic idea of statistical template detection (I think), bootstrapping to get “face-like” negative examples, multiple whole-face prototypes (in 1994) Rowley-Baluja-Kanade (1996-1998): ~4200 “Parts” at fixed position, non-maxima suppression, simple cascade, rotation, pretty good accuracy, fast Schneiderman-Kanade (1998-2000, 2004): ~2250 Careful feature/classifier engineering, excellent results, cascade Viola-Jones (2001, 2004): ~20,000 Haar-like features, Adaboost as feature selection, hyper-cascade, very fast, easy to implement Dalal-Triggs (2005): ~11,000 Careful feature engineering, excellent results, HOG feature, online code Felzenszwalb-Huttenlocher (2000): ~1600 Efficient way to solve part-based detectors Felzenszwalb-McAllester-Ramanan (2008, 2010): ~4000 Excellent template/parts-based blend Girshick-Donahue-Darrell-Malik (2014): ~300 Region proposals + fine-tuned CNN features (marks a significant advance in accuracy over HOG-based methods)

103.Fails in commercial face detection Things iPhoto thinks are faces http://www.oddee.com/item_98248.aspx

104.Summary: statistical templates Propose Window Sliding window: scan image pyramid Region proposals: edge/region-based, resize to fixed window Extract Features HOG CNN features Fast randomized features Classify SVM Boosted stumps Neural network Post-process Non-max suppression Segment or refine localization

105.Things to remember Sliding window for search Features based on differences of intensity (gradient, wavelet, etc.) Excellent results require careful feature design Boosting for feature selection Integral images, cascade for speed Bootstrapping to deal with many, many negative examples [Diagram: the attentional cascade from slide 82: Stage 1 → Stage 2 → … → Stage N, rejecting at each stage]

106.Next class Part-based models and pose estimation