# Near-Optimal Algorithms for Online Matrix Prediction

1.Near-Optimal Algorithms for Online Matrix Prediction Elad Hazan ( Technion ) Satyen Kale (Yahoo! Labs) Shai Shalev-Shwartz (Hebrew University)

2.Three Prediction Problems: I. Online Collaborative Filtering Users: {1, 2, …, m} Movies: {1, 2, …, n} On round t : User i t arrives and is interested in movie j t Output predicted rating p t in [-1, 1] User responds with actual rating r t in [-1, 1] Loss = ( p t – r t ) 2 Comparison class: all m x n matrices with entries in [-1, 1] of trace norm ≤ τ For each such matrix W , predicted rating = W(i t , j t ) Regret = loss of alg – loss of best bounded trace-norm matrix Sum of singular values If no entries repeated, [ Cesa -Bianchi, Shamir ‘11] : O(n 3/2 ) regret for τ = O(n)

3.Three Prediction Problems: II. Online Max Cut 2 political parties Voters: {1, 2, …, n} On round t : Voters i t , j t arrive Output prediction: votes agree or disagree Loss = 1 if incorrect prediction, 0 o.w . Comparison class: all possible bipartitions Bipartition prediction = “agree” if i t , j t in same partition, “disagree” o.w . Regret = loss of alg – loss of best bipartition Weight = #(disagree) - #(agree) Best bipartition = Max Cut! Inefficient alg using the 2 n bipartitions as experts: regret =

4.Three Prediction Problems: III. Online Gambling [Abernethy ’ 10, Kleinberg, Niculescu-Mizil , Sharma ‘10] Teams: {1, 2, …, n} In round t : Teams i t , j t compete Output: prediction which team will win Loss = 1 if incorrect prediction, 0 o.w . Comparison class: all possible permutations π Permutation π prediction = i t if π (i t ) ≤ π ( j t ) ; j t o.w . Regret = loss of alg – loss of best permutation Weight = #( i wins) - #(j wins) i j Best permutation = Min Feedback Arc Set! Inefficient alg using the n! permutations as experts: regret = Trivial bound of considered hard to improve (e.g. [ Kanade , Steinke ’ 12] )

5.Results Upper Bound Lower Bound Online Collaborative Filtering ﻿ Online Max Cut Online Gambling Stochastic; solves \$50 open problem of [ Srebro , Shamir ’ 11] By [Kleinberg, Niculescu-Mizil , Sharma ‘10]

6.One meta-problem to rule them all: Online Matrix Prediction (OMP) In round t : Receive pair i t , j t in [m] x [n] Output prediction p t in [-1, 1] Receive true value y t in [-1, 1] Suffer loss L( p t , y t ) Comparison class : set W of m x n matrices with entries in [-1, 1] Prediction for matrix W : entry W(i t , j t ) Regret = loss of alg – loss of best comparison matrix m x n matrices 1 2 … … n 1 2 : m

7.Online Collaborative Filtering as OMP Users: {1, 2, …, m} Movies: {1, 2, …, n} On round t : User i t arrives and is interested in movie j t Output predicted rating p t in [-1, 1] User responds with actual rating r t Loss = ( p t – r t ) 2 Comparison class: W = all m x n matrices with entries in [-1, 1] of trace norm ≤ τ For each such matrix W , predicted rating = W(i t , j t )

8.Online Max Cut as OMP On round t : Voters i t , j t arrive Output prediction: votes agree or disagree Loss = 1 if incorrect prediction, 0 o.w . Comparison class: all possible bipartitions Bipartition prediction = “agree” if i t , j t in same partition, “disagree” o.w . 2 political parties Voters: {1, 2, …, n} W = all 2 n “cut matrices” W S corresponding to subsets S of [n] W S ( i , j) = 0 if both i , j in S or [n] \ S , = 1 o.w . 0 0 1 1 1 0 0 1 1 1 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 S S [n] \ S [n] \ S

9.Online Gambling as OMP Teams: {1, 2, …, n} In round t : Teams i t , j t compete Output: prediction which team will win Loss = 1 if incorrect prediction, 0 o.w . π(1) π(2) … … π(n) π(1) 1 1 1 1 1 π(2) 0 1 1 1 1 : 0 0 1 1 1 : 0 0 0 1 1 π(n) 0 0 0 0 1 1 2 … … n 1 1 0 1 1 0 2 1 1 1 1 1 : 0 0 1 1 0 : 0 0 0 1 0 n 1 0 1 1 1 Comparison class: all possible permutations π Permutation π prediction = i t if π (i t ) ≤ π ( j t ) ; j t o.w . W = all n! “permutation matrices” W π corresponding to permutations π W π ( i , j) = 1 if π( i ) ≤ π(j) = 0 o.w .

10.Decomposability 0 W W T 0 P = − N W is (β, τ ) -decomposable if where P, N are positive semidefinite Diagonal entries P ii , N ii ≤ β Sum of traces Tr (P) + Tr (N) ≤ τ Class W is (β, τ ) - decomposable if every W in W is. Symmetric square matrix of order m + n

11.Main Result for ( β , τ )-decomposable OMP An efficient algorithm for OMP with ( β, τ )- decomposable W and Lipschitz losses with regret bound

12.The Technology Theorem: Matrix Exponentiated Gradient [ Tsuda , Rätsch , Warmuth ’ 06]/ Matrix Multiplicative Weights [ Arora , K. ‘07 ] algorithm Online Learning problem: in round t , Learner chooses density (i.e. psd , trace 1 ) matrix X t Nature reveals loss matrix M t with eigenvalues in [-1, 1] Learner suffers loss Tr ( M t X t ) Goal: minimize regret = loss of learner – loss of best density matrix

13.Overview of Algorithm for OMP W K All square symmetric X of order 2( m+n ) s.t. X is positive semidefinite Diagonals X ii ≤ β Trace Tr (X) ≤ τ P 0 0 N W Matrix MW algorithm + Bregman projections into K 0 W W T 0 = P - N

14.Decomposability Theorems Online Collaborative Filtering Trace norm ≤ τ matrices are (√(m + n), 2τ) - decomposable. Online Max Cut Cut matrices W S are (½, 2n) -decomposable. Online Gambling Permutation matrices W π are (O(log n), O(n log n)) -decomposable.

15.Decomposability Theorems Online Collaborative Filtering Trace norm ≤ τ matrices are (√(m + n), 2τ) - decomposable. Online Max Cut Cut matrices W S are (½, 2n) -decomposable. Online Gambling Permutation matrices W π are (O(log n), O(n log n)) -decomposable.

16.Decomposability for OCF Thm : Any symmetric matrix M of order n with entries in [-1, 1] and trace norm τ is (√n, τ ) -decomposable Eigenvalue decomposition: Define and Clearly Tr (P) + Tr (N) = trace-norm(M) = τ . Diagonals of (P + N) 2 = M 2 bounded by n . So diagonals of (P + N) bounded by √n . So diagonals of P, N bounded by √n .

17.Decomposability for OCF Thm : Any symmetric matrix M of order n with entries in [-1, 1] and trace norm τ is (√n, τ ) -decomposable Eigenvalue decomposition: Define and Clearly Tr (P) + Tr (N) = trace-norm(M) = τ . Diagonals of (P + N) 2 = M 2 bounded by n . So diagonals of (P + N) bounded by √n . So diagonals of P, N bounded by √n .

18.Decomposability for Online Gambling Thm : The all 1 ’s upper triangular matrix of order n is (O(log n), O(n log n)) -decomposable. T(n) = One rank- 1 matrix + t wo non-overlapping T(n/2) B(n) = 1 + B(n/2) B(n) = O(log n).

19.Concluding Remarks Gave near-optimal algorithms for various online matrix prediction problems Exploited spectral structure of comparison matrices to get near-tight convex relaxations Solved 2 COLT open problems from [Abernethy ‘10] and [Shamir, Srebro ‘11] Open problem: get rid of the logarithmic gap between upper and lower bounds Decompositions in the paper are optimal up to constant factors, so a fundamentally different algorithm seems necessary

20.Thanks!