This lecture first reviews fitting and alignment from the previous lecture, then introduces object instance recognition and an example of alignment-based category recognition. If we know which points belong to the line, how do we find the "optimal" line parameters? Use least squares; if there are outliers, consider robust fitting or RANSAC; and if there are many lines, voting methods such as RANSAC and the Hough transform work well.

1.Alignment and Object Instance Recognition Computer Vision Jia-Bin Huang, Virginia Tech Many slides from S. Lazebnik and D. Hoiem

2.Administrative Stuffs HW 2 due 11:59 PM Oct 3rd; please start early Anonymous feedback Lecture: lectures going too fast; show more examples/code to demonstrate how the algorithms work HW assignments: list functions that are not allowed to be used Piazza: encourage more students to participate (e.g., answer questions); group the questions into threads

3.Today’s class Review fitting Alignment Object instance recognition Example of alignment-based category recognition

4.Previous class Global optimization / Search for parameters Least squares fit Robust least squares Iterative closest point (ICP) Hypothesize and test Generalized Hough transform RANSAC

5.Least squares line fitting Data: (x_1, y_1), …, (x_n, y_n) Line equation: y_i = m x_i + b Find (m, b) to minimize E = sum_i (y_i - m x_i - b)^2 Matlab: p = A \ y; Modified from S. Lazebnik

6.Least squares line fitting
function [m, b] = lsqfit(x, y)
% y = mx + b
% find line that best predicts y given x
% minimize sum_i (m*x_i + b - y_i).^2
A = [x(:) ones(numel(x), 1)];
p = A \ y(:);
m = p(1);
b = p(2);

7.Total least squares Find (a, b, c) to minimize the sum of squared perpendicular distances E = sum_i (a x_i + b y_i + c)^2, subject to a^2 + b^2 = 1 Line: ax + by + c = 0, with unit normal N = (a, b) Solution is the eigenvector corresponding to the smallest eigenvalue of A^T A See details on the Rayleigh quotient: http://en.wikipedia.org/wiki/Rayleigh_quotient Slide modified from S. Lazebnik
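The total_lsqfit routine called later by the RANSAC code is not shown in the deck; a minimal sketch consistent with this slide (smallest eigenvector of A^T A on centered data) might look like the following. The function name and the conversion to slope-intercept form are assumptions.

function [m, b] = total_lsqfit(x, y)
% total least squares fit of a line ax + by + c = 0
% minimizes the sum of squared perpendicular distances
x = x(:); y = y(:);
xm = mean(x); ym = mean(y);
A = [x - xm, y - ym];                          % centered data matrix
[V, D] = eig(A' * A);                          % 2x2 symmetric eigendecomposition
[~, idx] = min(diag(D));
n = V(:, idx);                                 % unit normal (a, b) = smallest eigenvector
a = n(1); nb = n(2);
c = -(a*xm + nb*ym);                           % the best line passes through the centroid
% convert ax + by + c = 0 to y = mx + b (assumes the line is not vertical, nb ~= 0)
m = -a / nb;
b = -c / nb;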

9.Robust Estimator 1. Initialize: e.g., choose the parameters by a least squares fit and set sigma = 1.5 * median(error) 2. Choose params to minimize the robust cost of the residuals given sigma, e.g., by numerical optimization 3. Compute new sigma = 1.5 * median(error) 4. Repeat (2) and (3) until convergence

10.function [m, b] = robust_lsqfit(x, y)
% iterative robust fit of y = mx + b
% find line that best predicts y given x
% iteratively re-estimate sigma and minimize a robust cost of the residuals
[m, b] = lsqfit(x, y);
p = [m; b];
err = sqrt((y - p(1)*x - p(2)).^2);
sigma = median(err)*1.5;
for k = 1:7
    p = fminunc(@(p) geterr(p, x, y, sigma), p);
    err = sqrt((y - p(1)*x - p(2)).^2);
    sigma = median(err)*1.5;
end
m = p(1);
b = p(2);
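The helper geterr above is not shown on the slides; a plausible stand-in, assumed here rather than taken from the original code, is a saturating (Geman-McClure style) robust cost:

function cost = geterr(p, x, y, sigma)
% assumed robust cost for line parameters p = [m; b] (not the author's original geterr)
res2 = (y - p(1)*x - p(2)).^2;                 % squared residuals
cost = sum(res2 ./ (res2 + sigma.^2));         % saturates for large residuals (outliers)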

11.Hough transform P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959 Use a polar representation for the parameter space (image space x-y maps to Hough space) Slide from S. Savarese

12.function [m, b] = houghfit(x, y)
% fit y = mx + b by voting in the polar parameter space
% x*cos(theta) + y*sin(theta) = r
thetas = (-pi+pi/50):(pi/100):pi;
costhetas = cos(thetas);
sinthetas = sin(thetas);
minr = 0; stepr = 0.005; maxr = 1;
% count hough votes
counts = zeros(numel(thetas), (maxr-minr)/stepr + 1);
for k = 1:numel(x)
    r = x(k)*costhetas + y(k)*sinthetas;
    % only count parameters within the range of r
    inrange = find(r >= minr & r <= maxr);
    rnum = round((r(inrange) - minr)/stepr) + 1;
    ind = sub2ind(size(counts), inrange, rnum);
    counts(ind) = counts(ind) + 1;
end
% smooth the bin counts
counts = imfilter(counts, fspecial('gaussian', 5, 0.75));
% get best theta, rho
[maxval, maxind] = max(counts(:));
[thetaind, rind] = ind2sub(size(counts), maxind);
theta = thetas(thetaind);
r = minr + stepr*(rind - 1);
% convert to slope-intercept
b = r/sin(theta);
m = -cos(theta)/sin(theta);

13.RANSAC Algorithm: Sample (randomly) the number of points required to fit the model (#=2) Solve for model parameters using samples Score by the fraction of inliers within a preset threshold of the model Repeat 1-3 until the best model is found with high confidence

14.function [m, b] = ransacfit(x, y)
% RANSAC fit of y = mx + b
N = 200; thresh = 0.03;
bestcount = 0;
for k = 1:N
    % sample two points and fit a candidate line
    rp = randperm(numel(x));
    tx = x(rp(1:2)); ty = y(rp(1:2));
    m = (ty(2)-ty(1)) ./ (tx(2)-tx(1));
    b = ty(2) - m*tx(2);
    % score by the number of inliers within the threshold
    nin = sum(abs(y - m*x - b) < thresh);
    if nin > bestcount
        bestcount = nin;
        inliers = (abs(y - m*x - b) < thresh);
    end
end
% total least squares fitting on inliers
[m, b] = total_lsqfit(x(inliers), y(inliers));

15.Line fitting demo demo_linefit(npts, outliers, noise, method) npts: number of points, outliers: number of outliers, noise: noise level, method: lsq (least squares), tlsq (total least squares), rlsq (robust least squares), hough (Hough transform), ransac (RANSAC)

16.Which algorithm should I use? If we know which points belong to the line, how do we find the “optimal” line parameters? Least squares. What if there are outliers? Robust fitting, RANSAC. What if there are many lines? Voting methods: RANSAC, Hough transform. Slide credit: S. Lazebnik

18.What if you want to align but have no prior matched pairs? Hough transform and RANSAC not applicable Important applications Medical imaging: match brain scans or contours Robotics: match point clouds

19.Iterative Closest Points (ICP) Algorithm Goal: estimate transform between two dense sets of points Initialize transformation (e.g., compute difference in means and scale) Assign each point in {Set 1} to its nearest neighbor in {Set 2} Estimate transformation parameters e.g., least squares or robust least squares Transform the points in {Set 1} using estimated parameters Repeat steps 2-4 until change is very small
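A minimal sketch of this loop for a pure 2D translation, with brute-force nearest neighbors and a least-squares update; the function name and stopping tolerance are assumptions (requires MATLAB R2016b+ for implicit expansion):

function t = icp_translation(P, Q)
% P: n1 x 2 moving points, Q: n2 x 2 fixed points; estimate translation t (1 x 2)
t = mean(Q, 1) - mean(P, 1);                   % initialize with difference in means
for iter = 1:50
    Pt = P + t;                                % transform {Set 1} with current estimate
    % assign each transformed point to its nearest neighbor in {Set 2}
    D = (Pt(:, 1) - Q(:, 1)').^2 + (Pt(:, 2) - Q(:, 2)').^2;
    [~, nn] = min(D, [], 2);
    % least-squares update of the translation given the matches
    t_new = mean(Q(nn, :) - P, 1);
    if norm(t_new - t) < 1e-6, break; end      % stop when the change is very small
    t = t_new;
end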

20.Example: solving for translation Given matched points in {A} and {B}, estimate the translation of the object

21.Example: solving for translation Least squares solution for (t_x, t_y): Write down the objective function, e.g., E(t_x, t_y) = sum_i [ (x_i^A + t_x - x_i^B)^2 + (y_i^A + t_y - y_i^B)^2 ] Derived solution: compute the derivative, compute the solution Computational solution: write in the form Ax = b, solve using the pseudo-inverse or eigenvalue decomposition
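For pure translation the least-squares answer has a closed form: the mean of the matched displacements. A one-line check, assuming A and B are n-by-2 matrices of matched points:

% A, B: n x 2 matrices of matched points, B_i ~ A_i + t
t = mean(B - A, 1);   % [t_x, t_y] minimizing sum_i ||A_i + t - B_i||^2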

22.Example: solving for translation RANSAC solution for (t_x, t_y) Problem: outliers Sample a set of matching points (1 pair) Solve for transformation parameters Score parameters with number of inliers Repeat steps 1-3 N times
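A hedged sketch of this 1-pair RANSAC variant for translation; the function name, threshold, and iteration count are assumptions (requires R2016b+ for implicit expansion):

function [t, inliers] = ransac_translation(A, B, thresh, niter)
% A, B: n x 2 matched points (possibly with outliers); estimate t with B_i ~ A_i + t
if nargin < 3, thresh = 3; end                 % inlier threshold in pixels (assumed)
if nargin < 4, niter = 200; end
bestcount = 0; t = [0 0]; inliers = false(size(A, 1), 1);
for k = 1:niter
    i = randi(size(A, 1));                     % sample a single matched pair
    tk = B(i, :) - A(i, :);                    % solve for the translation it implies
    d = sqrt(sum((A + tk - B).^2, 2));         % residual of every match under tk
    in = d < thresh;
    if sum(in) > bestcount                     % score by number of inliers
        bestcount = sum(in); t = tk; inliers = in;
    end
end
t = mean(B(inliers, :) - A(inliers, :), 1);    % refit on the inliers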

23.Example: solving for translation Hough transform solution for (t_x, t_y) Problem: outliers, multiple objects, and/or many-to-one matches Initialize a grid of parameter values Each matched pair casts a vote for consistent values Find the parameters with the most votes Solve using least squares with inliers
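A small voting sketch for (t_x, t_y) on a coarse grid; the bin size is a made-up value and A, B are again assumed n-by-2 matched point sets:

% each match proposes a translation; vote on a coarse (t_x, t_y) grid
binsz = 5;                                     % bin size in pixels (assumed)
t_all = B - A;
tx_bin = round(t_all(:, 1) / binsz);
ty_bin = round(t_all(:, 2) / binsz);
ox = 1 - min(tx_bin); oy = 1 - min(ty_bin);    % offset so subscripts are positive
acc = accumarray([tx_bin + ox, ty_bin + oy], 1);
[~, idx] = max(acc(:));
[bx, by] = ind2sub(size(acc), idx);
t_est = [bx - ox, by - oy] * binsz;            % translation bin with the most votes
% refine with least squares on the matches that fall in the winning bin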

24.Example: solving for translation ICP solution for (t_x, t_y) Problem: no initial guesses for correspondence Find nearest neighbors for each point Compute transform using matches Move points using transform Repeat steps 1-3 until convergence

25.Example: aligning boundaries 1. Extract edge pixels p_1 … p_n and q_1 … q_m 2. Compute initial transformation (e.g., compute translation and scaling by center of mass, variance within each image) 3. Get nearest neighbors: for each point p_i find the corresponding nearest q_j 4. Compute transformation T based on matches 5. Warp points p according to T 6. Repeat 3-5 until convergence

26.Algorithm Summary Least Squares Fit: closed form solution; robust to noise; not robust to outliers Robust Least Squares: improves robustness to outliers; requires iterative optimization Hough transform: robust to noise and outliers; can fit multiple models; only works for a few parameters (1-4 typically) RANSAC: robust to noise and outliers; works with a moderate number of parameters (e.g., 1-8) Iterative Closest Point (ICP): for local alignment only; does not require initial correspondences

27.Alignment Alignment: find parameters of model that maps one set of points to another Typically want to solve for a global transformation that accounts for most true correspondences Difficulties Noise (typically 1-3 pixels) Outliers (often 30-50%) Many-to-one matches or multiple objects

28.Parametric (global) warping Transformation T is a coordinate-changing machine: p’ = T(p) What does it mean that T is global? It is the same for any point p and can be described by just a few numbers (parameters) For linear transformations, we can represent T as a matrix: p’ = Tp, with p = (x, y) and p’ = (x’, y’)

29.Common transformations (applied to an original image): translation, rotation, aspect, affine, perspective Slide credit (next few slides): A. Efros and/or S. Seitz

30.Scaling Scaling a coordinate means multiplying each of its components by a scalar Uniform scaling means this scalar is the same for all components (e.g., × 2)

31.Scaling Non-uniform scaling: different scalars per component (e.g., X × 2, Y × 0.5)

32.Scaling Scaling operation: x’ = a x, y’ = b y Or, in matrix form: [x’; y’] = [a 0; 0 b] [x; y], where the 2x2 matrix is the scaling matrix S

33.2-D Rotation Rotate (x, y) by angle θ to get (x’, y’): x’ = x cos(θ) - y sin(θ) y’ = x sin(θ) + y cos(θ)

34.2-D Rotation Polar coordinates… x = r cos(φ), y = r sin(φ) x’ = r cos(φ + θ), y’ = r sin(φ + θ) Trig identity… x’ = r cos(φ) cos(θ) - r sin(φ) sin(θ) y’ = r sin(φ) cos(θ) + r cos(φ) sin(θ) Substitute… x’ = x cos(θ) - y sin(θ) y’ = x sin(θ) + y cos(θ)

35.2-D Rotation This is easy to capture in matrix form: [x’; y’] = [cos(θ) -sin(θ); sin(θ) cos(θ)] [x; y] = R [x; y] Even though sin(θ) and cos(θ) are nonlinear functions of θ, x’ is a linear combination of x and y, and y’ is a linear combination of x and y What is the inverse transformation? Rotation by -θ For rotation matrices R, the inverse is the transpose: R^(-1) = R^T

36.Basic 2D transformations Translate Rotate Shear Scale Affine Affine is any combination of translation, scale, rotation, shear

37.Affine Transformations Affine transformations are combinations of linear transformations and translations: [x’; y’] = [a b; c d] [x; y] + [t_x; t_y], or, in homogeneous coordinates, [x’; y’; 1] = [a b t_x; c d t_y; 0 0 1] [x; y; 1] Properties of affine transformations: Lines map to lines Parallel lines remain parallel Ratios are preserved Closed under composition

38.Projective Transformations Projective transformations are combos of Affine transformations, and Projective warps Properties of projective transformations: Lines map to lines Parallel lines do not necessarily remain parallel Ratios are not preserved Closed under composition Models change of basis Projective matrix is defined up to a scale (8 DOF)
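In homogeneous coordinates a projective transformation is p’ ~ H p for a 3x3 matrix H defined only up to scale, which is why it has 8 degrees of freedom. A small sketch of applying an H to 2D points; the matrix values here are made up (requires R2016b+ for implicit expansion):

% apply a 3x3 homography H to n x 2 points P (rows are [x y])
H = [1.1 0.02 5; 0.01 0.95 -3; 1e-4 2e-4 1];   % example homography (made-up values)
P = [10 20; 30 40; 50 60];
Ph = [P, ones(size(P, 1), 1)]';                % homogeneous coordinates, 3 x n
Qh = H * Ph;                                   % transformed homogeneous points
Q = (Qh(1:2, :) ./ Qh(3, :))';                 % divide by w to return to 2D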

39.Projective Transformations ( homography ) The transformation between two views of a planar surface The transformation between images from two cameras that share the same center

40.Application: Panorama stitching Source: Hartley & Zisserman

41.Application: document scanning

42.2D image transformations (reference table)

43.Object Instance Recognition (this class) Match keypoints to the object model Solve for affine transformation parameters Score by inliers and choose solutions with score above threshold

44.Overview of Keypoint Matching 1. Find a set of distinctive keypoints 2. Define a region around each keypoint 3. Extract and normalize the region content 4. Compute a local descriptor from the normalized region (e.g., color) 5. Match local descriptors K. Grauman, B. Leibe

45.Finding the objects (overview) Match interest points from the input image to a stored database image Matched points vote for rough position/orientation/scale of the object Find position/orientation/scales that have at least three votes Compute affine registration and matches using iterative least squares with outlier check Report object if there are at least T matched points

46.Matching Keypoints Want to match keypoints between the query image and a stored image containing the object Given descriptor x_0, find its two nearest neighbors x_1, x_2 with distances d_1, d_2; x_1 matches x_0 if d_1/d_2 < 0.8 This gets rid of 90% of false matches and 5% of true matches in Lowe’s study
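A sketch of this ratio test in MATLAB, assuming D1 holds the query descriptors and D2 the stored descriptors, one per row (requires R2016b+ for implicit expansion):

% D1: n1 x d query descriptors; D2: n2 x d stored descriptors (n2 >= 2)
matches = zeros(size(D1, 1), 1);               % index into D2, 0 means no match
for i = 1:size(D1, 1)
    d = sqrt(sum((D2 - D1(i, :)).^2, 2));      % distances to all stored descriptors
    [ds, order] = sort(d);
    if ds(1) / ds(2) < 0.8                     % Lowe's ratio test
        matches(i) = order(1);
    end
end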

47.Affine Object Model Accounts for 3D rotation of a surface under orthographic projection

48.Fitting an affine transformation Assume we know the correspondences, how do we get the transformation? Want to find M, t to minimize sum_i || M x_i + t - x'_i ||^2

49.Fitting an affine transformation Assume we know the correspondences, how do we get the transformation? Each correspondence (x_i, y_i) -> (x'_i, y'_i) contributes two rows to a linear system in the six affine parameters: [x_i y_i 0 0 1 0; 0 0 x_i y_i 0 1] [m_1; m_2; m_3; m_4; t_1; t_2] = [x'_i; y'_i]

50.Fitting an affine transformation Linear system with six unknowns Each match gives us two linearly independent equations: need at least three to solve for the transformation parameters
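A least-squares sketch of solving this stacked system for k >= 3 correspondences; the function and variable names are assumptions:

function [M, t] = fit_affine(A, B)
% A, B: k x 2 matched points; solve B_i ~ M * A_i + t in the least-squares sense
k = size(A, 1);
X = zeros(2*k, 6); y = zeros(2*k, 1);
for i = 1:k
    X(2*i-1, :) = [A(i,1) A(i,2) 0 0 1 0];     % x'_i = m1*x + m2*y + t_x
    X(2*i,   :) = [0 0 A(i,1) A(i,2) 0 1];     % y'_i = m3*x + m4*y + t_y
    y(2*i-1) = B(i, 1);
    y(2*i)   = B(i, 2);
end
p = X \ y;                                     % six affine parameters
M = [p(1) p(2); p(3) p(4)];
t = [p(5); p(6)];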

51.Finding the objects (in detail) Match interest points from input image to database image Get location/scale/orientation using Hough voting In training, each point has known position/scale/orientation wrt whole object Matched points vote for the position, scale, and orientation of the entire object Bins for x, y, scale, orientation Wide bins (0.25 object length in position, 2x scale, 30 degrees orientation) Vote for two closest bin centers in each direction (16 votes total) Geometric verification For each bin with at least 3 keypoints Iterate between least squares fit and checking for inliers and outliers Report object if > T inliers (T is typically 3, can be computed to match some probabilistic threshold)
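The geometric verification step (iterating between a least-squares fit and an inlier check) could look roughly like the following, reusing the fit_affine sketch above; the pixel threshold and iteration count are assumptions (requires R2016b+ for implicit expansion):

% A, B: k x 2 matched points from one Hough bin; verify with iterative fit + inlier check
inliers = true(size(A, 1), 1);
for iter = 1:5
    [M, t] = fit_affine(A(inliers, :), B(inliers, :));
    res = sqrt(sum((A*M' + t' - B).^2, 2));    % reprojection error of every match
    inliers = res < 5;                         % assumed inlier threshold in pixels
    if sum(inliers) < 3, break; end            % not enough support to refit
end
T = 3;                                         % typical threshold from the slide
if sum(inliers) >= T
    % report a detection for this bin
end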

52.Examples of recognized objects

53.View interpolation Training: given images of different viewpoints, cluster similar viewpoints using feature matches and link features in adjacent views Recognition: feature matches may be spread over several training viewpoints -> use the known links to “transfer votes” to other viewpoints Slide credit: David Lowe [Lowe01]

54.Applications Sony Aibo (Evolution Robotics) SIFT usage: recognize docking station, communicate with visual cards Other uses: place recognition, loop closure in SLAM K. Grauman, B. Leibe Slide credit: David Lowe

55.Location Recognition Slide credit: David Lowe Training [Lowe04]

56.Another application: category recognition Goal: identify what type of object is in the image Approach: align to known objects and choose the category with the best match “Shape matching and object recognition using low distortion correspondence”, Berg et al., CVPR 2005: http://www.cnbc.cmu.edu/cns/papers/berg-cvpr05.pdf

57.Summary of algorithm Input: query q and exemplar e For each: sample edge points and create a “geometric blur” descriptor Compute match cost c to match points in q to each point in e Compute deformation cost H that penalizes change in orientation and scale for pairs of matched points Solve a binary quadratic program to get the correspondence that minimizes c and H, using thin-plate spline deformation Record total cost for e, repeat for all exemplars, choose the exemplar with minimum cost (Figure: input and edge maps, geometric blur, feature points, correspondences)

58.Examples of Matches

59.Examples of Matches

60.Other ideas worth being aware of Thin-plate splines : combines global affine warp with smooth local deformation Robust non-rigid point matching: A new point matching algorithm for non-rigid registration , CVIU 2003 (includes code, demo, paper )

61.Things to remember Alignment: Hough transform, RANSAC, ICP Object instance recognition: find keypoints, compute descriptors; match descriptors; vote for / fit affine parameters; return object if # inliers > T

62.What have we learned? Interest points: find distinct and repeatable points in images; Harris -> corners, DoG -> blobs, SIFT -> feature descriptor Feature tracking and optical flow: find motion of a keypoint/pixel over time; Lucas-Kanade: brightness consistency, small motion, spatial coherence; handle large motion with iterative update + pyramid search Fitting and alignment: find the transformation parameters that best align matched points Object instance recognition: keypoint-based object instance recognition and search

63.Next week – Perspective and 3D Geometry