Correspondence includes matching points, patches, edges, or regions across images; alignment (registration) means solving for the transformation that makes two things match better. Registration is an especially important problem in computer vision. This chapter covers what an interest point is, corner detection, handling scale and orientation, and feature matching.

1.Interest Points Computer Vision Jia-Bin Huang, Virginia Tech Many slides from N. Snavely, K. Grauman & B. Leibe, and D. Hoiem

2.Administrative Stuff HW 1 posted, due 11:59 PM Sept 19. Submission through Canvas. Frequently asked questions for HW 1 posted on Piazza. Reading: Szeliski 4.1

3.What have we learned so far? Light and color: what an image records. Filtering in the spatial domain: filtering = weighted sum of neighboring pixels; smoothing, sharpening, measuring texture. Filtering in the frequency domain: filtering = changing the frequency content of the input image; denoising, sampling, image compression. Image pyramids and template matching: filtering = a way to find a template; image pyramids for coarse-to-fine search and multi-scale detection. Edge detection: Canny edge = smooth -> derivative -> thin -> threshold -> link; finding straight lines, binary image analysis

4.This module: Correspondence and Alignment. Correspondence: matching points, patches, edges, or regions across images. Slide credit: Derek Hoiem

5.This module: Correspondence and Alignment. Alignment/registration: solving for the transformation T that makes two things match better. Slide credit: Derek Hoiem

6.The three biggest problems in computer vision? Registration Registration Registration Takeo Kanade (CMU)

7.Example: automatic panoramas Credit: Matt Brown

8.Example: fitting a 2D shape template. Slide from Silvio Savarese

9.Example: fitting a 3D object model Slide from Silvio Savarese

10.Example: estimating the “fundamental matrix” that relates two views. Slide from Silvio Savarese

11.Example: tracking points (figure: tracked points marked in frames 0, 22, and 49)

12.Today’s class What is an interest point? Corner detection. Handling scale and orientation. Feature matching

13.Why extract features? Motivation: panorama stitching We have two images – how do we combine them?

14.Why extract features? Motivation: panorama stitching We have two images – how do we combine them? Step 1: extract features Step 2: match features

15.Why extract features? Motivation: panorama stitching We have two images – how do we combine them? Step 1: extract features Step 2: match features Step 3: align images

16.Image matching by Diva Sian by swashford

17.Harder case by Diva Sian by scgbt

18.Harder still?

19.NASA Mars Rover images with SIFT feature matches Answer below (look for tiny colored squares…)

20.Applications Keypoints are used for: Image alignment 3D reconstruction Motion tracking Robot navigation Indexing and database retrieval Object recognition

21.Advantages of local features Locality: features are local, so robust to occlusion and clutter. Quantity: hundreds or thousands in a single image. Distinctiveness: can differentiate a large database of objects. Efficiency: real-time performance achievable

22.Overview of Keypoint Matching 1. Find a set of distinctive keypoints. 2. Define a region around each keypoint. 3. Extract and normalize the region content. 4. Compute a local descriptor from the normalized region. 5. Match local descriptors. K. Grauman, B. Leibe

23.Goals for Keypoints Detect points that are repeatable and distinctive

24.Key trade-offs Detection: more repeatable (robust detection, precise localization) vs. more points (robust to occlusion, works with less texture). Description: more distinctive (minimize wrong matches) vs. more flexible (robust to expected variations, maximize correct matches)

25.Choosing interest points Where would you tell your friend to meet you?

26.Choosing interest points Where would you tell your friend to meet you?

27.Corner Detection: Basic Idea How does the window change when you shift it? “Flat” region: no change in any direction. “Edge”: no change along the edge direction. “Corner”: significant change in all directions; shifting the window in any direction causes a big change. Credit: S. Seitz, D. Frolova, D. Simakov

28.Harris corner detection: the math Consider shifting the window W by (u,v): how do the pixels in W change? Compare each pixel before and after by summing up the squared differences (SSD). This defines an SSD “error” E(u,v):
E(u,v) = Σ_{(x,y) ∈ W} [I(x+u, y+v) − I(x,y)]²
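To make the definition concrete, here is a minimal NumPy sketch of E(u,v); the window size, the toy image, and the shift values are illustrative assumptions, not anything from the slides.

```python
import numpy as np

def ssd_error(image, x0, y0, u, v, half=5):
    """SSD error E(u,v) for a (2*half+1)^2 window W centered at (x0, y0):
    compare each pixel before and after the shift, sum squared differences."""
    W = image[y0 - half : y0 + half + 1, x0 - half : x0 + half + 1]
    W_shifted = image[y0 + v - half : y0 + v + half + 1,
                      x0 + u - half : x0 + u + half + 1]
    return float(np.sum((W_shifted - W) ** 2))

# Toy example: a vertical step edge at column 32.
img = np.zeros((64, 64))
img[:, 32:] = 1.0
print(ssd_error(img, 32, 32, u=2, v=0))  # shift across the edge -> E = 22.0
print(ssd_error(img, 32, 32, u=0, v=2))  # shift along the edge  -> E = 0.0
```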

29.Small motion assumption Taylor series expansion of I:
I(x+u, y+v) ≈ I(x,y) + I_x·u + I_y·v
If the motion (u,v) is small, then the first-order approximation is good. Plugging this into the formula on the previous slide…

30.Harris corner detection: the math Using the small motion assumption, replace I with a linear approximation:
E(u,v) ≈ Σ_{(x,y) ∈ W} [I_x·u + I_y·v]²
(Shorthand: I_x = ∂I/∂x, I_y = ∂I/∂y)

31.Corner detection: the math Thus, E(u,v) is locally approximated as a quadratic form:
E(u,v) ≈ [u v] H [u v]ᵀ

32.The second moment matrix The surface E(u,v) is locally approximated by a quadratic form with
H = Σ_{(x,y) ∈ W} [[I_x², I_x·I_y], [I_x·I_y, I_y²]]
Let’s try to understand its shape.
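As a sketch of how the entries of H are computed at every pixel in practice (Sobel gradients plus a window sum), assuming SciPy; the window size is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def second_moment_matrix(image, window=7):
    """Per-pixel entries of H = sum_W [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]].
    uniform_filter gives the box *average* over W; the constant scale
    factor does not change the eigen-analysis that follows."""
    I = image.astype(float)
    Ix = sobel(I, axis=1)          # I_x = dI/dx
    Iy = sobel(I, axis=0)          # I_y = dI/dy
    Sxx = uniform_filter(Ix * Ix, size=window)
    Sxy = uniform_filter(Ix * Iy, size=window)
    Syy = uniform_filter(Iy * Iy, size=window)
    return Sxx, Sxy, Syy
```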

33.Horizontal edge: only I_y is nonzero, so E(u,v) grows with the vertical shift v and stays flat along u (surface plot of E(u,v))

34.Vertical edge: only I_x is nonzero, so E(u,v) grows with the horizontal shift u and stays flat along v (surface plot of E(u,v))

35.General case The shape of H tells us something about the distribution of gradients around a pixel. We can visualize H as an ellipse, with axis lengths determined by the eigenvalues of H and orientation determined by the eigenvectors of H. Ellipse equation: [u v] H [u v]ᵀ = const. The axis of length (λ_max)^(−1/2) lies along the direction of fastest change, and the axis of length (λ_min)^(−1/2) along the direction of slowest change. λ_max, λ_min: eigenvalues of H

36.Quick eigenvalue/eigenvector review The eigenvectors of a matrix A are the vectors x that satisfy Ax = λx. The scalar λ is the eigenvalue corresponding to x. The eigenvalues are found by solving det(A − λI) = 0. In our case, A = H is a 2x2 matrix, so this is a quadratic: λ² − trace(H)·λ + det(H) = 0, with solution λ = [trace(H) ± sqrt(trace(H)² − 4·det(H))] / 2. Once you know λ, you find x by solving (H − λI)x = 0.
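Since H is a symmetric 2x2 matrix, the quadratic can be solved in closed form; a small sketch (the function name and the clamping of the discriminant are my own):

```python
import numpy as np

def eig2x2(h11, h12, h22):
    """Eigenvalues of the symmetric 2x2 matrix [[h11, h12], [h12, h22]],
    from lam^2 - trace(H)*lam + det(H) = 0."""
    tr = h11 + h22
    det = h11 * h22 - h12 * h12
    # The discriminant is >= 0 for symmetric H; the clamp guards round-off.
    disc = np.sqrt(np.maximum(tr * tr / 4.0 - det, 0.0))
    return tr / 2.0 + disc, tr / 2.0 - disc   # lam_max, lam_min
```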

37.Corner detection: the math Eigenvalues and eigenvectors of H define the shift directions with the smallest and largest change in error: x_max = direction of largest increase in E, λ_max = amount of increase in direction x_max; x_min = direction of smallest increase in E, λ_min = amount of increase in direction x_min

38.Corner detection: the math How are λ_max, x_max, λ_min, and x_min relevant for feature detection? What’s our feature scoring function?

39.Corner detection: the math How are λ_max, x_max, λ_min, and x_min relevant for feature detection? What’s our feature scoring function? We want E(u,v) to be large for small shifts in all directions: the minimum of E(u,v) over all unit vectors [u v] should be large, and this minimum is given by the smaller eigenvalue (λ_min) of H.

40.Interpreting the eigenvalues Classification of image points using the eigenvalues of H: “Corner”: λ1 and λ2 are both large, λ1 ~ λ2; E increases in all directions. “Edge”: λ1 >> λ2 (or λ2 >> λ1); E changes only across the edge. “Flat” region: λ1 and λ2 are small; E is almost constant in all directions.

41.Corner detection summary Here’s what you do: compute the gradient at each point in the image; create the H matrix from the entries in the gradient; compute the eigenvalues; find points with a large response (λ_min > threshold); choose those points where λ_min is a local maximum as features.

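The summary above maps almost line for line onto code. Below is a minimal NumPy/SciPy sketch using λ_min as the score; the Gaussian window σ, the relative threshold, and the 5x5 non-maximum window are illustrative choices, not values from the slides. (The gaussian_filter call anticipates the derivative weighting discussed on slide 51.)

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, sobel

def lambda_min_corners(image, sigma=2.0, thresh_rel=0.1):
    """Corners via the recipe above: gradients -> H per pixel ->
    lambda_min -> threshold -> keep local maxima."""
    I = image.astype(float)
    Ix = sobel(I, axis=1)
    Iy = sobel(I, axis=0)
    # Gaussian-weighted sums of gradient products: the entries of H.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    # Smaller eigenvalue of H at every pixel (closed form for 2x2).
    tr = Sxx + Syy
    det = Sxx * Syy - Sxy * Sxy
    lam_min = tr / 2.0 - np.sqrt(np.maximum(tr * tr / 4.0 - det, 0.0))
    # Keep points with a large response that are also local maxima.
    mask = lam_min > thresh_rel * lam_min.max()
    mask &= lam_min == maximum_filter(lam_min, size=5)
    ys, xs = np.nonzero(mask)
    return list(zip(xs, ys))
```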

43.The Harris operator λ_min is a variant of the “Harris operator” for feature detection:
f = λ1·λ2 / (λ1 + λ2) = determinant(H) / trace(H)
The trace is the sum of the diagonals, i.e., trace(H) = h11 + h22. f is very similar to λ_min but less expensive (no square root). Called the “Harris Corner Detector” or “Harris Operator”. There are lots of other detectors; this is one of the most popular.

44.The Harris operator (figure: Harris operator response)

45.Harris detector example

46.

47.f value (red high, blue low)

48.Threshold (f > value)

49.Find local maxima of f

50.Harris features (in red)

51.Weighting the derivatives In practice, using a simple window W doesn’t work too well. Instead, we weight each derivative value based on its distance from the center pixel, e.g. with a Gaussian window.

52.Harris Detector – Responses [ Harris88 ] Effect: A very precise corner detector.

53.Harris Detector – Responses [ Harris88 ]

54.Harris Detector: Invariance Properties Rotation: the ellipse rotates but its shape (i.e. its eigenvalues) remains the same. The corner response is invariant to image rotation.

55.Harris Detector: Invariance Properties Affine intensity change: I → a·I + b. Only derivatives are used, so the response is invariant to the intensity shift I → I + b. Under the intensity scaling I → a·I the response is rescaled, so a fixed threshold can change which points are detected (figure: response R vs. image coordinate x, with the same threshold, before and after scaling). Partially invariant to affine intensity change.

56.Harris Detector: Invariance Properties Scaling: a corner that fits in the window at one scale looks like an edge inside every window when the image is enlarged, so all points along it will be classified as edges. Not invariant to scaling.

57.Scale invariant detection Suppose you’re looking for corners. Key idea: find the scale that gives a local maximum of f in both position and scale. One definition of f: the Harris operator.

58.Automatic Scale Selection K. Grauman, B. Leibe How to find corresponding patch sizes?

59.Automatic Scale Selection Function responses for increasing scale (scale signature) K. Grauman, B. Leibe

65.Implementation Instead of computing f for larger and larger windows, we can implement the search using a fixed window size with a Gaussian pyramid (sometimes we need to create in-between levels, e.g. a ¾-size image).
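A sketch of that implementation idea, assuming SciPy and a NumPy image; the octave and step counts and the base σ are illustrative choices:

```python
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(image, n_octaves=4, steps_per_octave=3, sigma0=1.6):
    """Search over scale with a fixed window: instead of growing the
    window, blur and subsample the image. steps_per_octave produces
    the in-between levels mentioned above."""
    levels = []
    base = image.astype(float)
    for octave in range(n_octaves):
        for step in range(steps_per_octave):
            sigma = sigma0 * 2.0 ** (step / steps_per_octave)
            levels.append((octave, sigma, gaussian_filter(base, sigma)))
        base = zoom(base, 0.5)   # halve the resolution for the next octave
    return levels
```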

66.What Is A Useful Signature Function? Difference-of-Gaussian = “blob” detector K. Grauman, B. Leibe

67.Difference-of-Gaussian (DoG) (figure: a Gaussian at one scale minus a Gaussian at a nearby scale = the DoG kernel) K. Grauman, B. Leibe

68.DoG – Efficient Computation Computation in a Gaussian scale pyramid: blur the original image with a ladder of σ values and subsample by 2 once σ has doubled (figure: pyramid octaves with sampling step σ₄ = 2σ) K. Grauman, B. Leibe

69.Find local maxima in position-scale space of the Difference-of-Gaussian (figure: DoG responses stacked at scales σ₁ … σ₅) ⇒ list of (x, y, σ) K. Grauman, B. Leibe
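Putting slides 66-69 together, a compact sketch of DoG extrema detection within one octave; the σ ladder and the threshold are illustrative choices (assuming intensities roughly in [0, 1]), not Lowe's exact parameters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_extrema(image, sigmas=(1.0, 1.6, 2.5, 4.0, 6.3), thresh=0.02):
    """Local maxima of the DoG response in position-scale space (x, y, s)."""
    I = image.astype(float)
    blurred = [gaussian_filter(I, s) for s in sigmas]
    dog = np.stack([blurred[i + 1] - blurred[i]
                    for i in range(len(sigmas) - 1)])   # shape (scale, y, x)
    # Keep points that beat their whole 3x3x3 scale-space neighborhood.
    is_max = (dog == maximum_filter(dog, size=3)) & (dog > thresh)
    return [(x, y, sigmas[s]) for s, y, x in zip(*np.nonzero(is_max))]
```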

70.Results: Difference-of-Gaussian K. Grauman, B. Leibe

71.Orientation Normalization Compute an orientation histogram over [0, 2π); select the dominant orientation; normalize: rotate to a fixed orientation. [Lowe, SIFT, 1999] T. Tuytelaars, B. Leibe
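A sketch of the orientation-histogram step, assuming a grayscale patch around the keypoint; the 36-bin count follows the SIFT convention repeated on slide 79:

```python
import numpy as np
from scipy.ndimage import sobel

def dominant_orientation(patch, n_bins=36):
    """Peak of a magnitude-weighted orientation histogram over [0, 2*pi).
    Rotating the patch by the negative of this angle normalizes it."""
    gx = sobel(patch.astype(float), axis=1)
    gy = sobel(patch.astype(float), axis=0)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, edges = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                               weights=mag)
    peak = int(np.argmax(hist))
    return (edges[peak] + edges[peak + 1]) / 2.0   # center of the peak bin
```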

72.Maximally Stable Extremal Regions [Matas ‘02] Based on Watershed segmentation algorithm Select regions that stay stable over a large parameter range K. Grauman, B. Leibe

73.Example Results: MSER K. Grauman, B. Leibe

74.Available at a web site near you… For most local feature detectors, executables are available online: http://www.robots.ox.ac.uk/~vgg/research/affine http://www.cs.ubc.ca/~lowe/keypoints/ http://www.vision.ee.ethz.ch/~surf K. Grauman, B. Leibe

75.Local Descriptors The ideal descriptor should be robust, distinctive, compact, and efficient. Most available descriptors focus on edge/gradient information: they capture texture information; color is rarely used. K. Grauman, B. Leibe

76.Scale Invariant Feature Transform Basic idea: take a 16x16 square window around the detected feature; compute the edge orientation (angle of the gradient − 90°) for each pixel; throw out weak edges (threshold on gradient magnitude); create a histogram of the surviving edge orientations over [0, 2π). Adapted from slide by David Lowe

77.SIFT descriptor Full version: divide the 16x16 window into a 4x4 grid of cells (a 2x2 case is shown in the figure); compute an orientation histogram for each cell; 16 cells × 8 orientations = a 128-dimensional descriptor. Adapted from slide by David Lowe
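A bare-bones sketch of that 4x4x8 layout; real SIFT adds Gaussian weighting, interpolated binning, and the 0.2 clamping described on slide 79, so this only shows the histogram structure:

```python
import numpy as np
from scipy.ndimage import sobel

def sift_like_descriptor(patch):
    """128-D vector from a 16x16 patch: a 4x4 grid of cells, each with
    an 8-bin orientation histogram, L2-normalized at the end."""
    assert patch.shape == (16, 16)
    gx = sobel(patch.astype(float), axis=1)
    gy = sobel(patch.astype(float), axis=0)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    desc = []
    for cy in range(4):                      # 4x4 grid of 4x4-pixel cells
        for cx in range(4):
            cell = (slice(4 * cy, 4 * cy + 4), slice(4 * cx, 4 * cx + 4))
            hist, _ = np.histogram(ang[cell], bins=8, range=(0, 2 * np.pi),
                                   weights=mag[cell])
            desc.extend(hist)
    desc = np.asarray(desc)
    return desc / (np.linalg.norm(desc) + 1e-12)
```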

78.Local Descriptors: SIFT Descriptor [Lowe, ICCV 1999] Histogram of oriented gradients Captures important texture information Robust to small translations / affine deformations K. Grauman, B. Leibe

79.Details of Lowe’s SIFT algorithm Run the DoG detector: find maxima in location/scale space and remove edge points. Find all major orientations: bin orientations into a 36-bin histogram, weight by gradient magnitude and by distance to the center (Gaussian-weighted mean), return orientations within 0.8 of the peak, and use a parabola for a better orientation fit. For each (x, y, scale, orientation), create a descriptor: sample 16x16 gradient magnitudes and relative orientations, bin the 4x4 blocks of samples into 4x4 histograms, threshold values to a max of 0.2, and divide by the L2 norm. Final descriptor: 4x4x8 normalized histograms. Lowe IJCV 2004
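In practice you rarely implement this pipeline by hand; OpenCV ships an implementation of the DoG detector, orientation assignment, and 128-D descriptor. A minimal usage example (the input filename is a placeholder):

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)  # N keypoints, N x 128 descriptors
```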

80.SIFT Example 868 SIFT features detected

81.Feature matching Given a feature in I1, how do we find the best match in I2? 1. Define a distance function that compares two descriptors. 2. Test all the features in I2 and return the one with minimum distance.

82.Feature distance How do we define the difference between two features f1, f2? Simple approach: L2 distance, ||f1 − f2||; but it can give good scores to ambiguous (incorrect) matches.

83.Feature distance A better approach: ratio distance = ||f1 − f2|| / ||f1 − f2'||, where f2 is the best match and f2' is the second-best match to f1 in I2; ambiguous matches yield ratios close to 1.

84.Feature matching example 51 matches

85.Feature matching example 58 matches

86.Evaluating the results How can we measure the performance of a feature matcher? (figure: candidate matches with feature distances 50, 75, and 200)

87.True/false positives How can we measure the performance of a feature matcher? The distance threshold affects performance. True positives = # of detected matches that are correct; suppose we want to maximize these: how do we choose the threshold? False positives = # of detected matches that are incorrect; suppose we want to minimize these: how do we choose the threshold? (figure: true and false matches with feature distances 50, 75, and 200)

88.Evaluating the results How can we measure the performance of a feature matcher? True positive rate (“recall”) = # true positives / # positives (features that have a correct match). False positive rate (1 − “precision”) = # false positives / # negatives (features that have no correct match). (plot: true positive rate vs. false positive rate; the example threshold gives a true positive rate of 0.7 at a false positive rate of 0.1)

89.Evaluating the results How can we measure the performance of a feature matcher? Sweeping the threshold traces out the ROC curve (“Receiver Operating Characteristic”): true positive rate (“recall”) vs. false positive rate (1 − “precision”). (plot: ROC curve passing through a true positive rate of 0.7 at a false positive rate of 0.1)
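A sketch of how such a curve is traced: sweep the threshold over the observed distances and record one (FPR, TPR) point per setting. Ground-truth correctness labels are assumed to be available (e.g. from a dataset with known correspondences):

```python
import numpy as np

def roc_points(distances, is_correct):
    """(false positive rate, true positive rate) pairs for every threshold."""
    distances = np.asarray(distances, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)
    points = []
    for t in np.sort(distances):
        accepted = distances <= t          # matches kept at this threshold
        tpr = (accepted & is_correct).sum() / max(is_correct.sum(), 1)
        fpr = (accepted & ~is_correct).sum() / max((~is_correct).sum(), 1)
        points.append((fpr, tpr))
    return points
```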

90.Matching SIFT Descriptors Nearest neighbor (Euclidean distance); threshold on the ratio of the nearest to the 2nd-nearest descriptor distance. Lowe IJCV 2004
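Both strategies are short with OpenCV's brute-force matcher. A sketch of the ratio test; the filenames are placeholders, and the 0.75 ratio is a common illustrative choice (Lowe's paper uses 0.8):

```python
import cv2

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(cv2.imread("img1.jpg", 0), None)
kp2, des2 = sift.detectAndCompute(cv2.imread("img2.jpg", 0), None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
pairs = matcher.knnMatch(des1, des2, k=2)   # two best matches per feature
good = [m for m, n in (p for p in pairs if len(p) == 2)
        if m.distance < 0.75 * n.distance]  # nearest << 2nd nearest
print(len(good), "matches pass the ratio test")
```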

91.SIFT Repeatability Lowe IJCV 2004

92.SIFT Repeatability

93.SIFT Repeatability

94.Local Descriptors: SURF Fast approximation of the SIFT idea. Efficient computation by 2D box filters & integral images: ~6 times faster than SIFT, with equivalent quality for object identification [Bay, ECCV’06], [Cornelis, CVGPU’08]. GPU implementation available: feature extraction @ 200 Hz (detector + descriptor, 640×480 img) http://www.vision.ee.ethz.ch/~surf Many other efficient descriptors are also available. K. Grauman, B. Leibe

95.Local Descriptors: Shape Context Count the number of points inside each bin, e.g.: Count = 4 Count = 10 ... Log-polar binning: more precision for nearby points, more flexibility for farther points. Belongie & Malik, ICCV 2001 K. Grauman, B. Leibe

96.Local Descriptors: Geometric Blur Compute edges at four orientations; extract a patch in each channel; apply spatially varying blur and sub-sample (figure: idealized signal and example descriptor). Berg & Malik, CVPR 2001 K. Grauman, B. Leibe

97.Choosing a detector What do you want it for? Precise localization in x-y: Harris. Good localization in scale: Difference-of-Gaussian. Flexible region shape: MSER. The best choice is often application dependent: Harris-/Hessian-Laplace/DoG work well for many natural categories; MSER works well for buildings and printed things. Why choose? Get more points with more detectors. There have been extensive evaluations/comparisons [Mikolajczyk et al., IJCV’05, PAMI’05]; all detectors/descriptors shown here work well.

98.Comparison of Keypoint Detectors Tuytelaars & Mikolajczyk 2008

99.Choosing a descriptor Again, you need not stick to one. For object instance recognition or stitching, SIFT or a variant is a good choice.

100.Recent advances in interest points FAST: Features from Accelerated Segment Test, ECCV 06. Binary feature descriptors: BRIEF: Binary Robust Independent Elementary Features, ECCV 10; ORB (Oriented FAST and Rotated BRIEF), CVPR 11; BRISK: Binary Robust Invariant Scalable Keypoints, ICCV 11; FREAK: Fast Retina Keypoint, CVPR 12; LIFT: Learned Invariant Feature Transform, ECCV 16.
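Several of these are available in OpenCV; for example ORB, which pairs FAST keypoints with rotated BRIEF descriptors. A minimal sketch (the filename and feature count are placeholders); note that binary descriptors are matched with Hamming distance rather than L2:

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(img, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
```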

101.Things to remember Keypoint detection: repeatable and distinctive; corners, blobs, stable regions; Harris, DoG. Descriptors: robust and selective; spatial histograms of orientation; SIFT.

102.Thank you Next class: feature tracking and optical flow