Near-Optimal Algorithms for Online Matrix Prediction

This deck first introduces three prediction problems — online collaborative filtering, online max cut, and online gambling — and reduces them all to the Online Matrix Prediction (OMP) problem. It presents the problems and their solutions, proves decomposability results for all three reductions, gives near-optimal algorithms for the various online matrix prediction problems, and obtains nearly tight convex relaxations by exploiting the spectral structure of the comparison matrices.

1. Near-Optimal Algorithms for Online Matrix Prediction
Elad Hazan (Technion), Satyen Kale (Yahoo! Labs), Shai Shalev-Shwartz (Hebrew University)

2. Three Prediction Problems: I. Online Collaborative Filtering
- Users: {1, 2, …, m}; Movies: {1, 2, …, n}
- On round t: user i_t arrives and is interested in movie j_t
- Output predicted rating p_t in [-1, 1]
- User responds with actual rating r_t in [-1, 1]
- Loss = (p_t - r_t)^2
- Comparison class: all m x n matrices with entries in [-1, 1] of trace norm (sum of singular values) ≤ τ
- For each such matrix W, predicted rating = W(i_t, j_t)
- Regret = loss of alg - loss of best bounded-trace-norm matrix
- If no entry is queried twice, [Cesa-Bianchi, Shamir '11] give O(n^{3/2}) regret for τ = O(n)

3. Three Prediction Problems: II. Online Max Cut
- 2 political parties; Voters: {1, 2, …, n}
- On round t: voters i_t, j_t arrive
- Output prediction: votes agree or disagree
- Loss = 1 if incorrect prediction, 0 otherwise
- Comparison class: all possible bipartitions
- Bipartition prediction = "agree" if i_t, j_t are in the same part, "disagree" otherwise
- Regret = loss of alg - loss of best bipartition
- Assign each pair weight = #(disagree) - #(agree); then the best bipartition is exactly a Max Cut!
- Inefficient alg using the 2^n bipartitions as experts: regret = O(√(nT))

4. Three Prediction Problems: III. Online Gambling [Abernethy '10; Kleinberg, Niculescu-Mizil, Sharma '10]
- Teams: {1, 2, …, n}
- In round t: teams i_t, j_t compete
- Output: prediction of which team will win
- Loss = 1 if incorrect prediction, 0 otherwise
- Comparison class: all possible permutations π
- Permutation π predicts i_t if π(i_t) ≤ π(j_t); j_t otherwise
- Regret = loss of alg - loss of best permutation
- Assign each ordered pair weight = #(i wins) - #(j wins); then the best permutation is exactly a Min Feedback Arc Set!
- Inefficient alg using the n! permutations as experts: regret = O(√(T n log n))
- This trivial bound was considered hard to improve efficiently (e.g. [Kanade, Steinke '12])

5. Results

                                   Upper Bound              Lower Bound
  Online Collaborative Filtering   Õ(√(τ √(m+n) T))         Ω(√(τ √(m+n) T)) (*)
  Online Max Cut                   O(√(nT))                 Ω(√(nT))
  Online Gambling                  Õ(√(nT))                 Ω(√(T n log n)) (**)

(*) Stochastic setting; solves the $50 open problem of [Srebro, Shamir '11].
(**) By [Kleinberg, Niculescu-Mizil, Sharma '10].

6. One meta-problem to rule them all: Online Matrix Prediction (OMP)
- In round t: receive pair (i_t, j_t) in [m] x [n]
- Output prediction p_t in [-1, 1]
- Receive true value y_t in [-1, 1]
- Suffer loss L(p_t, y_t)
- Comparison class: a set W of m x n matrices with entries in [-1, 1]
- Prediction for matrix W: entry W(i_t, j_t)
- Regret = loss of alg - loss of best comparison matrix
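
To make the protocol concrete, here is a minimal sketch of the OMP interaction loop in Python. The OMPLearner class and its simple gradient-step update are illustrative placeholders of my own, not the paper's method (the paper's actual update uses the Matrix MW algorithm described later).

```python
import numpy as np

class OMPLearner:
    """Hypothetical interface for an online matrix prediction algorithm.

    Maintains an explicit m x n hypothesis matrix and updates it after
    observing the true value for the queried entry."""

    def __init__(self, m, n):
        self.m, self.n = m, n
        self.W = np.zeros((m, n))  # current hypothesis, entries in [-1, 1]

    def predict(self, i, j):
        # Prediction for the queried entry, clipped to [-1, 1].
        return float(np.clip(self.W[i, j], -1.0, 1.0))

    def update(self, i, j, y, eta=0.1):
        # Placeholder update: online gradient step on the squared loss.
        grad = 2.0 * (self.W[i, j] - y)
        self.W[i, j] = np.clip(self.W[i, j] - eta * grad, -1.0, 1.0)

def run_omp(learner, rounds):
    """rounds: iterable of (i_t, j_t, y_t) triples chosen by the adversary."""
    total_loss = 0.0
    for i, j, y in rounds:
        p = learner.predict(i, j)
        total_loss += (p - y) ** 2  # loss L(p_t, y_t)
        learner.update(i, j, y)     # learner sees y_t only after predicting
    return total_loss
```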

7. Online Collaborative Filtering as OMP
- Users: {1, 2, …, m}; Movies: {1, 2, …, n}
- On round t: user i_t arrives and is interested in movie j_t
- Output predicted rating p_t in [-1, 1]; user responds with actual rating r_t
- Loss = (p_t - r_t)^2
- Comparison class: W = all m x n matrices with entries in [-1, 1] of trace norm ≤ τ
- For each such matrix W, predicted rating = W(i_t, j_t)
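
As a small illustration (not part of the paper), membership in this comparison class can be checked numerically, since the trace norm is the sum of singular values; the helper name in_comparison_class is mine.

```python
import numpy as np

def in_comparison_class(W, tau):
    """Check that W has entries in [-1, 1] and trace norm <= tau."""
    trace_norm = np.linalg.svd(W, compute_uv=False).sum()  # sum of singular values
    return bool(np.all(np.abs(W) <= 1.0) and trace_norm <= tau)

# Example: a rank-1 ratings matrix has small trace norm.
u = np.sign(np.random.randn(50))   # user signs
v = np.sign(np.random.randn(40))   # movie signs
W = np.outer(u, v)                 # entries in {-1, +1}, trace norm = sqrt(50 * 40)
print(in_comparison_class(W, tau=np.sqrt(50 * 40) + 1e-6))  # True
```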

8. Online Max Cut as OMP
- 2 political parties; Voters: {1, 2, …, n}
- On round t: voters i_t, j_t arrive
- Output prediction: votes agree or disagree
- Loss = 1 if incorrect prediction, 0 otherwise
- Comparison class: all possible bipartitions; bipartition prediction = "agree" if i_t, j_t are in the same part, "disagree" otherwise
- W = all 2^n "cut matrices" W_S corresponding to subsets S of [n]
- W_S(i, j) = 0 if both i, j in S or both in [n] \ S; = 1 otherwise

For example, with n = 5 and |S| = 2 (rows/columns ordered with S first, then [n] \ S):

            S    S   [n]\S
   S     [  0    0    1    1    1 ]
   S     [  0    0    1    1    1 ]
   [n]\S [  1    1    0    0    0 ]
   [n]\S [  1    1    0    0    0 ]
   [n]\S [  1    1    0    0    0 ]
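
A minimal sketch constructing such a cut matrix; the function name cut_matrix is mine, and vertices are 0-indexed.

```python
import numpy as np

def cut_matrix(n, S):
    """W_S(i, j) = 0 if i, j are on the same side of the bipartition (S, [n] \\ S),
    1 otherwise."""
    side = np.array([1 if i in S else 0 for i in range(n)])
    # Entry is 1 exactly when the two vertices lie on different sides.
    return (side[:, None] != side[None, :]).astype(int)

W = cut_matrix(5, S={0, 1})
print(W)  # reproduces the block matrix above
# Predicting "disagree" on (i_t, j_t) corresponds to W[i_t, j_t] == 1.
```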

9. Online Gambling as OMP
- Teams: {1, 2, …, n}
- In round t: teams i_t, j_t compete; output a prediction of which team will win
- Loss = 1 if incorrect prediction, 0 otherwise
- Comparison class: all possible permutations π; permutation π predicts i_t if π(i_t) ≤ π(j_t), j_t otherwise
- W = all n! "permutation matrices" W_π corresponding to permutations π
- W_π(i, j) = 1 if π(i) ≤ π(j); = 0 otherwise

For example, for n = 5 and the permutation ranking the teams 2 ≺ 5 ≺ 1 ≺ 3 ≺ 4:

         1  2  3  4  5
   1  [  1  0  1  1  0 ]
   2  [  1  1  1  1  1 ]
   3  [  0  0  1  1  0 ]
   4  [  0  0  0  1  0 ]
   5  [  1  0  1  1  1 ]

Reordering rows and columns by π turns W_π into the all-ones upper triangular matrix:

          π(1) π(2)  …    …  π(n)
   π(1) [  1    1    1    1    1 ]
   π(2) [  0    1    1    1    1 ]
    :   [  0    0    1    1    1 ]
    :   [  0    0    0    1    1 ]
   π(n) [  0    0    0    0    1 ]
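
A corresponding sketch constructing W_π from a ranking (the function name gambling_matrix is mine; it reproduces the example matrix above).

```python
import numpy as np

def gambling_matrix(pi):
    """W_pi(i, j) = 1 if pi(i) <= pi(j), else 0. `pi` maps a 0-based team
    index to its rank; a lower rank means a stronger team."""
    pi = np.asarray(pi)
    return (pi[:, None] <= pi[None, :]).astype(int)

# Ranking 2 < 5 < 1 < 3 < 4 from the slide, as 0-based ranks of teams 1..5.
pi = [2, 0, 3, 4, 1]
W = gambling_matrix(pi)
print(W)
# Predicting that team i_t beats j_t corresponds to W[i_t, j_t] == 1.
```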

10. Decomposability
W is (β, τ)-decomposable if the symmetric square matrix of order m + n

   [ 0    W ]
   [ W^T  0 ]  =  P - N

where
- P, N are positive semidefinite
- diagonal entries P_ii, N_ii ≤ β
- sum of traces Tr(P) + Tr(N) ≤ τ

A class W is (β, τ)-decomposable if every W in W is.
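
Here is a small numerical check of this definition, assuming candidate witnesses P and N are given; the helper name is_decomposition is mine.

```python
import numpy as np

def is_decomposition(W, P, N, beta, tau, tol=1e-8):
    """Check that (P, N) witnesses (beta, tau)-decomposability of W,
    i.e. [[0, W], [W^T, 0]] = P - N with the three conditions above."""
    m, n = W.shape
    sym = np.block([[np.zeros((m, m)), W],
                    [W.T, np.zeros((n, n))]])
    psd = (np.linalg.eigvalsh(P).min() >= -tol and
           np.linalg.eigvalsh(N).min() >= -tol)
    return (np.allclose(sym, P - N) and psd
            and max(np.diag(P).max(), np.diag(N).max()) <= beta + tol
            and np.trace(P) + np.trace(N) <= tau + tol)
```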

11. Main Result for (β, τ)-decomposable OMP
An efficient algorithm for OMP with (β, τ)-decomposable W and L-Lipschitz losses, with regret bound O(L √(β τ T)).

12. The Technology
Theorem: the Matrix Exponentiated Gradient [Tsuda, Rätsch, Warmuth '06] / Matrix Multiplicative Weights [Arora, K. '07] algorithm solves the following online learning problem. In round t:
- Learner chooses a density matrix X_t (i.e. psd with trace 1)
- Nature reveals a loss matrix M_t with eigenvalues in [-1, 1]
- Learner suffers loss Tr(M_t X_t)
- Goal: minimize regret = loss of learner - loss of best fixed density matrix
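
A minimal sketch of the Matrix Multiplicative Weights update, X_{t+1} ∝ exp(-η Σ_{s≤t} M_s). This is only the basic update; the learning-rate tuning and the projection steps used in the paper's full algorithm are omitted.

```python
import numpy as np
from scipy.linalg import expm

def mmw(loss_matrices, eta):
    """Matrix Multiplicative Weights: maintain X_t proportional to
    exp(-eta * sum of past loss matrices). X_t is chosen before seeing M_t."""
    d = loss_matrices[0].shape[0]
    cumulative = np.zeros((d, d))
    total_loss = 0.0
    for M in loss_matrices:
        E = expm(-eta * cumulative)
        X = E / np.trace(E)          # density matrix: psd, trace 1
        total_loss += np.trace(M @ X)
        cumulative += M              # update only after suffering the loss
    return total_loss
```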

13. Overview of Algorithm for OMP
Embed the comparison class W into the convex set K of all square symmetric X of order 2(m+n) such that:
- X is positive semidefinite
- diagonals X_ii ≤ β
- trace Tr(X) ≤ τ

Each W in W maps to the block-diagonal matrix [[P, 0], [0, N]] in K, where [[0, W], [W^T, 0]] = P - N. Algorithm: Matrix MW + Bregman projections into K.
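
Schematically, a prediction can be read off the current iterate X = [[P, 0], [0, N]] as follows; the paper's precise scaling and projection details differ, so treat this only as a sketch.

```python
import numpy as np

def predict_from_iterate(X, m, n, i, j):
    """Given X = [[P, 0], [0, N]] of order 2(m+n), recover the (i, j) entry of W
    from the off-diagonal block of P - N and clip it to a valid prediction."""
    P = X[:m + n, :m + n]
    N = X[m + n:, m + n:]
    D = P - N  # equals [[0, W], [W^T, 0]] for a feasible X
    return float(np.clip(D[i, m + j], -1.0, 1.0))  # W(i, j) sits at entry (i, m + j)
```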

14. Decomposability Theorems
- Online Collaborative Filtering: trace norm ≤ τ matrices are (√(m + n), 2τ)-decomposable.
- Online Max Cut: cut matrices W_S are (½, 2n)-decomposable.
- Online Gambling: permutation matrices W_π are (O(log n), O(n log n))-decomposable.

16. Decomposability for OCF
Thm: Any symmetric matrix M of order n with entries in [-1, 1] and trace norm τ is (√n, τ)-decomposable.
Proof sketch:
- Eigenvalue decomposition: M = Σ_i λ_i v_i v_i^T.
- Define P = Σ_{i: λ_i > 0} λ_i v_i v_i^T and N = -Σ_{i: λ_i < 0} λ_i v_i v_i^T.
- Clearly Tr(P) + Tr(N) = Σ_i |λ_i| = trace-norm(M) = τ.
- Diagonals of (P + N)^2 = M^2 are bounded by n (each is a sum of n squared entries of M).
- So diagonals of P + N are bounded by √n (for psd A, A_ii^2 ≤ (A^2)_ii).
- So diagonals of P, N are bounded by √n.
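
The positive/negative eigen-part split in this proof is easy to verify numerically; pos_neg_parts below is my name for it.

```python
import numpy as np

def pos_neg_parts(M):
    """Split symmetric M into its positive and negative eigen-parts:
    M = P - N, with P, N psd built from the positive / negative eigenvalues."""
    lam, V = np.linalg.eigh(M)
    P = (V * np.maximum(lam, 0)) @ V.T   # V diag(max(lam, 0)) V^T
    N = (V * np.maximum(-lam, 0)) @ V.T  # V diag(max(-lam, 0)) V^T
    return P, N

n = 30
A = np.random.randn(n, n)
M = np.sign(A + A.T)                     # symmetric, entries in {-1, +1}
P, N = pos_neg_parts(M)
trace_norm = np.abs(np.linalg.eigvalsh(M)).sum()
print(np.isclose(np.trace(P) + np.trace(N), trace_norm))              # True
print(max(np.diag(P).max(), np.diag(N).max()) <= np.sqrt(n) + 1e-9)   # True
```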

18. Decomposability for Online Gambling
Thm: The all-1's upper triangular matrix of order n is (O(log n), O(n log n))-decomposable.
Proof idea: decompose recursively, T(n) = one rank-1 matrix (the all-ones block covering the cross interactions between the two halves) + two non-overlapping copies of T(n/2). The recursion depth satisfies B(n) = 1 + B(n/2), so B(n) = O(log n).
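
A sketch of this recursion, verifying that the rank-1 all-ones blocks exactly tile the strict upper triangle and that the recursion depth is O(log n); function names are mine.

```python
import numpy as np

def triangular_decomposition(lo, hi, pieces, depth=0):
    """Recursively split the all-ones upper triangular matrix on indices
    [lo, hi): one all-ones (rank-1) block for the cross interactions,
    plus two independent subproblems of half the size."""
    if hi - lo <= 1:
        return
    mid = (lo + hi) // 2
    pieces.append((lo, mid, hi, depth))  # rank-1 block: rows [lo, mid) x cols [mid, hi)
    triangular_decomposition(lo, mid, pieces, depth + 1)
    triangular_decomposition(mid, hi, pieces, depth + 1)

def reconstruct(n):
    pieces = []
    triangular_decomposition(0, n, pieces)
    T = np.triu(np.ones((n, n)))  # target: all-ones upper triangular
    M = np.eye(n)                 # the diagonal is handled separately
    for lo, mid, hi, _ in pieces:
        M[lo:mid, mid:hi] = 1.0   # add the rank-1 all-ones block
    max_depth = max(d for *_, d in pieces)
    return np.array_equal(M, T), max_depth

print(reconstruct(16))  # (True, 3): exact tiling, depth ~ log2(n)
```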

19. Concluding Remarks
- Gave near-optimal algorithms for various online matrix prediction problems
- Exploited the spectral structure of the comparison matrices to get near-tight convex relaxations
- Solved 2 COLT open problems, from [Abernethy '10] and [Shamir, Srebro '11]
- Open problem: get rid of the logarithmic gap between the upper and lower bounds
- The decompositions in the paper are optimal up to constant factors, so a fundamentally different algorithm seems necessary

20. Thanks!