20_Models_For_Words

1. Computer vision: models, learning and inference. Chapter 20: Models for Visual Words

2. Visual words (Computer vision: models, learning and inference, ©2011 Simon J.D. Prince)
- Most models treat data as continuous, with a likelihood based on the normal distribution
- Visual words are a discrete representation of the image, with a likelihood based on the categorical distribution
- Useful for difficult tasks such as scene recognition and object recognition

3. Motivation: scene recognition

4. Structure
- Computing visual words
- Bag of words model
- Latent Dirichlet allocation
- Single author-topic model
- Constellation model
- Scene model
- Applications

5. Computing a dictionary of visual words
- For each of the I training images, select a set of J_i spatial locations (interest points or a regular grid).
- Compute a descriptor at each spatial location in each image.
- Cluster all of these descriptor vectors into K groups using a method such as the K-means algorithm (or others!).
- The means of the K clusters are used as the K prototype vectors in the dictionary.
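The clustering step can be sketched with a minimal k-means in NumPy. This is a toy illustration under invented data (the helper `build_dictionary` and the 2-D "descriptors" are not from the slides; real descriptors would typically be high-dimensional, e.g. SIFT):

```python
import numpy as np

def build_dictionary(descriptors, K, n_iters=20, seed=0):
    """Cluster descriptor vectors into K groups; the K cluster means
    form the dictionary of visual-word prototypes."""
    rng = np.random.default_rng(seed)
    # Initialise prototypes with K randomly chosen descriptors
    means = descriptors[rng.choice(len(descriptors), K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assign each descriptor to its nearest prototype (Euclidean)
        d2 = ((descriptors[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Update each prototype to the mean of its assigned descriptors
        for k in range(K):
            if np.any(labels == k):
                means[k] = descriptors[labels == k].mean(axis=0)
    return means

# Toy example: 2-D "descriptors" drawn from two well-separated blobs
rng = np.random.default_rng(1)
descs = np.vstack([rng.normal(0, 0.1, (50, 2)),
                   rng.normal(5, 0.1, (50, 2))])
dictionary = build_dictionary(descs, K=2)
```

In practice one would use an optimised implementation (e.g. scikit-learn's KMeans) and far more than two clusters.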

6. Encoding images as visual words
- Select a set of J spatial locations in the image using the same method as for the dictionary.
- Compute the descriptor at each of the J spatial locations.
- Compare each descriptor to the K prototype descriptors in the dictionary.
- Assign to this location the discrete index of the closest word in the dictionary.
- End result: a discrete feature index together with its x and y position at each location.
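The nearest-prototype assignment can be sketched as follows (the helper name `encode_image`, the 3-word dictionary and the descriptors are invented for illustration):

```python
import numpy as np

def encode_image(descriptors, dictionary):
    """Assign each descriptor the index of the closest dictionary
    prototype, yielding one discrete word index per spatial location."""
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

# Hypothetical 3-word dictionary and four image descriptors
dictionary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
descs = np.array([[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [0.1, 0.1]])
words = encode_image(descs, dictionary)
# words[j] is the discrete feature index at spatial location j
```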

7. Structure
- Computing visual words
- Bag of words model
- Latent Dirichlet allocation
- Single author-topic model
- Constellation model
- Scene model
- Applications

8. Bag of words model
Key idea: abandon all spatial information and represent the image only by the relative frequency (histogram) of words from the dictionary. The likelihood of the observed words is a categorical distribution, with a separate parameter vector lambda_n for the n'th class.
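The histogram representation is easy to sketch (toy word indices, invented for illustration):

```python
import numpy as np

def bag_of_words(word_indices, K):
    """Discard the word positions and keep only the relative frequency
    (normalised histogram) of the K dictionary words."""
    hist = np.bincount(word_indices, minlength=K).astype(float)
    return hist / hist.sum()

# Six word tokens over a K = 4 word dictionary
words = np.array([0, 1, 2, 0, 0, 1])
h = bag_of_words(words, K=4)
```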

9. Bag of words

10. Review: categorical distribution
The categorical distribution describes the situation where there are K possible outcomes y = 1 ... K. It takes K parameters lambda_k, where lambda_k >= 0 and the lambda_k sum to one. Alternatively, we can think of the data as a vector with all elements zero except the k'th, e.g. [0,0,0,1,0]. For short we write Pr(x) = Cat_x[lambda].
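A quick numerical illustration of the two equivalent views (parameters chosen arbitrarily):

```python
import numpy as np

# Categorical distribution over K = 4 outcomes; parameters sum to one
lam = np.array([0.1, 0.2, 0.3, 0.4])

# Direct view: Pr(y = k) is just the k'th parameter
p_direct = lam[3]

# One-hot view: with x = [0, 0, 0, 1], Pr(x) = prod_k lam_k ** x_k
x = np.eye(4)[3]
p_onehot = np.prod(lam ** x)
```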

11. Review: Dirichlet distribution
Defined over K values lambda_1 ... lambda_K, where lambda_k >= 0 and the lambda_k sum to one. It has K parameters alpha_k > 0. For short we write Pr(lambda) = Dir_lambda[alpha].
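A quick numerical check of the Dirichlet's key property, that each draw is itself a valid set of categorical parameters (toy alpha values):

```python
import numpy as np

alpha = np.array([2.0, 3.0, 5.0])        # K = 3 parameters, all > 0
rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha, size=1000)

# Every draw lies on the simplex: non-negative, summing to one,
# so each sample is itself a valid set of categorical parameters
mean = samples.mean(axis=0)              # approaches alpha / alpha.sum()
```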

12. Review: categorical distribution, MAP estimate
Take the derivative of the log posterior, set it to zero and re-arrange to obtain
lambda_k = (N_k + alpha_k - 1) / (sum_m (N_m + alpha_m - 1)),
where N_k counts the observations of category k. With a uniform prior (alpha_1..K = 1) this gives the same result as maximum likelihood.
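The MAP formula can be checked numerically; `categorical_map` is a hypothetical helper implementing the re-arranged expression:

```python
import numpy as np

def categorical_map(counts, alpha):
    """MAP estimate of categorical parameters under a Dirichlet prior:
    lam_k = (N_k + alpha_k - 1) / (N + sum(alpha) - K)."""
    counts = np.asarray(counts, float)
    alpha = np.asarray(alpha, float)
    K = len(counts)
    return (counts + alpha - 1) / (counts.sum() + alpha.sum() - K)

counts = np.array([3, 1, 6])                      # word counts N_k
uniform = categorical_map(counts, np.ones(3))     # alpha = 1: equals ML
smoothed = categorical_map(counts, np.full(3, 2.0))
```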

13. Bag of words model: learning and inference
Learning (MAP solution): estimate the categorical word parameters for each class from the pooled word counts of that class's training images.
Inference: apply Bayes' rule to compute the posterior over classes given the words of a new image.
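A minimal sketch of this learning/inference pair, assuming equal class priors and a Dirichlet prior for smoothing (all names and toy data invented for illustration):

```python
import numpy as np

def learn_map(class_words, K, alpha=2.0):
    """Learning: MAP categorical parameters for each class from its
    pooled word counts (Dirichlet prior with parameter alpha)."""
    lam = np.zeros((len(class_words), K))
    for n, words in enumerate(class_words):
        counts = np.bincount(words, minlength=K).astype(float)
        lam[n] = (counts + alpha - 1) / (counts.sum() + K * (alpha - 1))
    return lam

def infer(words, lam, prior=None):
    """Inference: posterior over classes for a new image's words,
    via Bayes' rule with a categorical likelihood per class."""
    if prior is None:
        prior = np.full(len(lam), 1.0 / len(lam))
    loglik = np.log(lam)[:, words].sum(axis=1) + np.log(prior)
    post = np.exp(loglik - loglik.max())   # subtract max for stability
    return post / post.sum()

# Toy data: class 0 favours word 0, class 1 favours word 1 (K = 3 words)
class_words = [np.array([0, 0, 0, 2, 0]), np.array([1, 1, 2, 1, 1])]
lam = learn_map(class_words, K=3)
post = infer(np.array([0, 0, 2]), lam)     # should favour class 0
```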

14. Bag of words for object recognition

15. Problems with bag of words

16. Structure
- Computing visual words
- Bag of words model
- Latent Dirichlet allocation
- Single author-topic model
- Constellation model
- Scene model
- Applications

17. Latent Dirichlet allocation
- Describes the relative frequency of visual words across a set of images (no world term)
- Words are not generated independently (they are connected by a hidden variable)
- Analogy to text documents: each image contains a mixture of several topics (parts), and each topic induces a distribution over words

18. Latent Dirichlet allocation

19. Latent Dirichlet allocation
- Generative equations
- Marginal distribution over features
- Conjugate priors over parameters
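The generative equations can be sketched as ancestral sampling: draw per-image part proportions from a Dirichlet, then a part label and a word at each location (a toy model with invented parameters):

```python
import numpy as np

def sample_lda_image(J, alpha, lam, rng):
    """Generative sketch of LDA for one image: draw part proportions
    pi ~ Dirichlet(alpha); for each of J locations draw a part label
    p_j ~ Cat(pi) and then a word f_j ~ Cat(lam[p_j])."""
    M, K = lam.shape                     # M parts, K dictionary words
    pi = rng.dirichlet(alpha)            # per-image mixture over parts
    parts = rng.choice(M, size=J, p=pi)
    words = np.array([rng.choice(K, p=lam[p]) for p in parts])
    return parts, words

# Hypothetical model: 2 parts, 4 words; part 0 emits words {0, 1},
# part 1 emits words {2, 3}
lam = np.array([[0.5, 0.5, 0.0, 0.0],
                [0.0, 0.0, 0.5, 0.5]])
rng = np.random.default_rng(0)
parts, words = sample_lda_image(J=100, alpha=np.array([1.0, 1.0]),
                                lam=lam, rng=rng)
```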

20. Latent Dirichlet allocation

21. Learning the LDA model
The part labels p are hidden variables. If we knew them, it would be easy to estimate the MAP parameters. How about the EM algorithm? Unfortunately, the parts p within each image are not independent.

22. Latent Dirichlet allocation

23. Learning
Strategy:
1. Write an expression for the posterior distribution over part labels.
2. Draw samples from this posterior using MCMC.
3. Use the samples to estimate the parameters.

24. Step 1: Posterior over part labels
The two terms in the numerator (likelihood and prior) can be computed in closed form; it is "lucky" that we chose conjugate priors! The denominator, however, is intractable.

25. Step 2: Draw samples from the posterior
Gibbs sampling: fix all part labels except one and sample that label from its conditional distribution given the rest. This conditional can be computed in closed form.
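A sketch of one such sweep, in the collapsed count-based form commonly used for LDA-style models (this conditional is a standard variant, not copied from the slides; names and toy data are invented):

```python
import numpy as np

def gibbs_sweep(parts, words, image_ids, M, K, I, alpha, beta, rng):
    """One Gibbs sweep: fix all part labels except one and resample
    that label from its closed-form conditional given the rest."""
    # Count matrices implied by the current labels
    n_mw = np.zeros((M, K))              # part -> word counts
    n_im = np.zeros((I, M))              # image -> part counts
    for p, w, i in zip(parts, words, image_ids):
        n_mw[p, w] += 1
        n_im[i, p] += 1
    for j in range(len(words)):
        p, w, i = parts[j], words[j], image_ids[j]
        n_mw[p, w] -= 1                  # remove token j from the counts
        n_im[i, p] -= 1
        # Conditional over the part label for token j
        cond = ((n_mw[:, w] + beta) / (n_mw.sum(axis=1) + K * beta)
                * (n_im[i] + alpha))
        cond /= cond.sum()
        p = rng.choice(M, p=cond)
        parts[j] = p
        n_mw[p, w] += 1                  # add it back with the new label
        n_im[i, p] += 1
    return parts

# Toy run: two images, image 0 uses words {0, 1}, image 1 uses {2, 3}
words = np.array([0, 1, 0, 1, 2, 3, 2, 3])
image_ids = np.array([0, 0, 0, 0, 1, 1, 1, 1])
rng = np.random.default_rng(0)
parts = rng.integers(0, 2, size=len(words))   # random initial labels
for _ in range(20):
    parts = gibbs_sweep(parts, words, image_ids, M=2, K=4, I=2,
                        alpha=0.5, beta=0.5, rng=rng)
```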

26. Review: Gibbs sampling example, bivariate normal distribution

27. Review: Gibbs sampling example, bivariate normal distribution (continued)
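The bivariate-normal example can be sketched directly: each conditional of a zero-mean, unit-variance bivariate normal with correlation rho is itself normal, x1 | x2 ~ Norm(rho*x2, 1 - rho^2):

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, rng, burn_in=200):
    """Gibbs sampling for a bivariate normal with zero mean, unit
    variances and correlation rho: alternately sample each variable
    from its conditional given the other."""
    x1, x2 = 0.0, 0.0
    out = np.empty((n_samples, 2))
    sd = np.sqrt(1.0 - rho ** 2)         # conditional standard deviation
    for t in range(burn_in + n_samples):
        x1 = rng.normal(rho * x2, sd)    # sample x1 | x2
        x2 = rng.normal(rho * x1, sd)    # sample x2 | x1
        if t >= burn_in:
            out[t - burn_in] = (x1, x2)
    return out

rng = np.random.default_rng(0)
samples = gibbs_bivariate_normal(rho=0.9, n_samples=5000, rng=rng)
emp_rho = np.corrcoef(samples.T)[0, 1]   # should approach 0.9
```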

28. Step 3: Use samples to estimate parameters
The sampled part labels are substituted for the real (unknown) part labels in the original MAP equations.
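A sketch of the substitution step, pooling smoothed counts over the sampled label vectors (the helper and toy data are invented for illustration):

```python
import numpy as np

def estimate_from_samples(part_samples, words, image_ids, M, K, I, alpha=2.0):
    """Substitute sampled part labels into the MAP equations: pool
    counts over samples, then form smoothed (Dirichlet-prior) estimates
    of the part-to-word parameters lam and per-image proportions pi."""
    lam = np.zeros((M, K))
    pi = np.zeros((I, M))
    for parts in part_samples:               # one label vector per sample
        for p, w, i in zip(parts, words, image_ids):
            lam[p, w] += 1.0
            pi[i, p] += 1.0
    lam = (lam + alpha - 1) / (lam + alpha - 1).sum(axis=1, keepdims=True)
    pi = (pi + alpha - 1) / (pi + alpha - 1).sum(axis=1, keepdims=True)
    return lam, pi

# Toy example: 6 word tokens in one image, two MCMC samples of labels
words = np.array([0, 0, 1, 2, 2, 3])
image_ids = np.zeros(6, dtype=int)
part_samples = [np.array([0, 0, 0, 1, 1, 1]),
                np.array([0, 0, 0, 1, 1, 1])]
lam, pi = estimate_from_samples(part_samples, words, image_ids,
                                M=2, K=4, I=1)
```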

29. Structure
- Computing visual words
- Bag of words model
- Latent Dirichlet allocation
- Single author-topic model
- Constellation model
- Scene model
- Applications