## 展开查看详情

1.Recommendation and Advertising Shannon Quinn (with thanks to J . Leskovec , A. Rajaraman , and J. Ullman of Stanford University)

2.Lecture breakdown Part 1: Advertising Bipartite Matching AdWords Part 2: Recommendation Collaborative Filtering Latent Factor Models

3.1: Advertising on the Web

4.Example: Bipartite Matching 1 2 3 4 a b c d Boys Girls 4 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Nodes: Boys and Girls; Edges: Preferences Goal: Match boys to girls so that maximum number of preferences is satisfied

5.Example: Bipartite Matching M = {(1,a),(2,b),(3,d)} is a matching Cardinality of matching = |M| = 3 1 2 3 4 a b c d Boys Girls 5 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

6.Example: Bipartite Matching 1 2 3 4 a b c d Boys Girls M = {(1,c),(2,b),(3,d),(4,a)} is a perfect matching 6 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Perfect matching … all vertices of the graph are matched Maximum matching … a matching that contains the largest possible number of matches

7.Matching Algorithm Problem: Find a maximum matching for a given bipartite graph A perfect one if it exists There is a polynomial-time offline algorithm based on augmenting paths ( Hopcroft & Karp 1973, see http://en.wikipedia.org/wiki/Hopcroft-Karp_algorithm ) But what if we do not know the entire graph upfront? 7 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

8.Online Graph Matching Problem Initially, we are given the set boys In each round , one girl’s choices are revealed That is, girl’s edges are revealed At that time, we have to decide to either: Pair the girl with a boy Do not pair the girl with any boy Example of application: Assigning tasks to servers 8 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

9.Greedy Algorithm Greedy algorithm for the online graph matching problem: Pair the new girl with any eligible boy If there is none, do not pair girl How good is the algorithm? 9 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

10.Competitive Ratio For input I , suppose greedy produces matching M greedy while an optimal matching is M opt Competitive ratio = min all possible inputs I (| M greedy |/| M opt |) (what is greedy’s worst performance over all possible inputs I ) 10 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

11.Worst-case Scenario 1 2 3 4 a b c (1,a) (2,b) d 11 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

12.History of Web Advertising Banner ads (1995-2001) Initial form of web advertising Popular websites charged X $ for every 1,000 “ impressions” of the ad Called “ CPM ” rate ( Cost per thousand impressions) Modeled similar to TV, magazine ads From untargeted to demographically targeted Low click-through rates L ow ROI for advertisers J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 12 CPM …cost per mille Mille…thousand in Latin

13.Performance-based Advertising Introduced by Overture around 2000 Advertisers bid on search keywords When someone searches for that keyword, the highest bidder’s ad is shown Advertiser is charged only if the ad is clicked on Similar model adopted by Google with some changes around 2002 Called Adwords 13 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

14.Ads vs. Search Results 14 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

15.Web 2.0 Performance-based advertising works! Multi-billion-dollar industry Interesting problem: What ads to show for a given query? (Today’s lecture) If I am an advertiser, which search terms should I bid on and how much should I bid? (Not focus of today’s lecture) 15 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

16.Adwords Problem Given : 1. A set of bids by advertisers for search queries 2. A click-through rate for each advertiser-query pair 3. A budget for each advertiser (say for 1 month) 4. A limit on the number of ads to be displayed with each search query Respond to each search query with a set of advertisers such that: 1. The size of the set is no larger than the limit on the number of ads per query 2. Each advertiser has bid on the search query 3. Each advertiser has enough budget left to pay for the ad if it is clicked upon J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 16

17.Adwords Problem A stream of queries arrives at the search engine: q 1 , q 2 , … Several advertisers bid on each query When query q i arrives, search engine must pick a subset of advertisers whose ads are shown Goal : Maximize search engine’s revenues Simple solution: Instead of raw bids, use the “ expected revenue per click ” (i.e., Bid*CTR ) Clearly we need an online algorithm! 17 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

18.The Adwords Innovation J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 18 Advertiser Bid CTR Bid * CTR A B C $1.00 $0.75 $0.50 1% 2% 2.5% 1 cent 1.5 cents 1.125 cents Click through rate Expected revenue

19.The Adwords Innovation J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 19 Advertiser Bid CTR Bid * CTR A B C $1.00 $0.75 $0.50 1% 2% 2.5% 1 cent 1.5 cents 1.125 cents

20.Complications: Budget Two complications: Budget CTR of an ad is unknown Each advertiser has a limited budget Search engine guarantees that the advertiser will not be charged more than their daily budget J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 20

21.Complications: CTR CTR: Each ad has a different likelihood of being clicked Advertiser 1 bids $2, click probability = 0.1 Advertiser 2 bids $1, click probability = 0.5 Clickthrough rate (CTR) is measured historically Very hard problem: Exploration vs. exploitation Exploit: Should we keep showing an ad for which we have good estimates of click-through rate or Explore: Shall we show a brand new ad to get a better sense of its click-through rate J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 21

22.BALANCE Algorithm [MSVV] BALANCE Algorithm by Mehta , Saberi , Vazirani , and Vazirani For each query, pick the advertiser with the largest unspent budget Break ties arbitrarily ( but in a deterministic way ) 22 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

23.Example: BALANCE Two advertisers A and B A bids on query x , B bids on x and y Both have budgets of $ 4 Query stream: x x x x y y y y BALANCE choice: A B A B B B _ _ Optimal: A A A A B B B B In general: For BALANCE on 2 advertisers Competitive ratio = ¾ 23 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

24.BALANCE: General Result In the general case, worst competitive ratio of BALANCE is 1–1/e = approx. 0.63 Interestingly, no online algorithm has a better competitive ratio! 24 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

25.2: Recommender Systems

26.Recommendations Items Search Recommendations Products, web sites, blogs , news items, … 26 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Examples:

27.Sidenote : The Long Tail Source: Chris Anderson (2004) 27 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

28.Formal Model X = set of Customers S = set of Items Utility function u : X × S R R = set of ratings R is a totally ordered set e.g., 0-5 stars, real number in [0,1] 28 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

29.Utility Matrix Avatar LOTR Matrix Pirates Alice Bob Carol David 29 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org