Deco +Crowdsourcing Summary

1.Deco: Declarative Crowdsourcing Litian Ma

2.Presentation Outline: Overview; Running Examples; Data Model; Query Language; Query Processing; System Architecture; Experiments.

3.Overview: Conventional data management, extended to incorporate “human computation”. (Diagram labels: a DBMS-like system, declarative queries, the Web.)

4.Main Challenges: Resolving disagreeing human opinions. How does the database system interact with human workers? How can external sources be used in addition to the crowd? Choosing the right data model and query language. Materialization of crowdsourced data. An efficient query processor.

5.Running Example: Restaurant(name, address, rating, cuisine); AddrInfo(address, city, zip).

6.Data Model (diagram): The schema designer defines the Conceptual Schema (relations and other stuff). The end user sees relations. The Raw Schema, stored in the DBMS, is derived automatically by the system.

7.Conceptual Schema: Relations, like Restaurant and AddrInfo. Partitioning of the attributes in each conceptual relation into anchor attributes (identifiers) and dependent attribute-groups (properties). Fetch rules: how to obtain data from external sources, including humans. Resolution rules: how to reconcile inconsistent or uncertain values.

8.Raw Schema: The tables actually stored in the DBMS. For each relation R in the conceptual schema: one anchor table whose attributes are the anchor attributes of R, and one dependent table for each dependent attribute-group D in R, containing the attributes in the resolution rule for D.
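The derivation rule on this slide can be sketched in a few lines of Python. This is only an illustration, not Deco's implementation: the function name `derive_raw_schema`, the table-naming convention (`<prefix>A`, `<prefix>D1`, ...) and the left-hand sides of the resolution rules are assumptions, chosen so that the output matches the raw tables listed on the recap slide.

```python
from typing import Dict, List, Sequence, Tuple

# A resolution rule is modelled as (anchor attributes it uses, dependent attribute-group).
ResolutionRule = Tuple[Sequence[str], Sequence[str]]


def derive_raw_schema(
    prefix: str,
    anchors: Sequence[str],
    resolution_rules: Sequence[ResolutionRule],
) -> Dict[str, List[str]]:
    """Return {raw table name: attributes} for one conceptual relation.

    Hypothetical helper: one anchor table, plus one dependent table per
    dependent attribute-group holding the attributes of its resolution rule.
    """
    tables = {f"{prefix}A": list(anchors)}
    for i, (lhs, group) in enumerate(resolution_rules, start=1):
        tables[f"{prefix}D{i}"] = list(lhs) + list(group)
    return tables


# Assumed rules: (name, address) -> rating and name -> cuisine.
restaurant_raw = derive_raw_schema(
    "Rest",
    anchors=["name", "address"],
    resolution_rules=[(["name", "address"], ["rating"]), (["name"], ["cuisine"])],
)
for table, attrs in restaurant_raw.items():
    print(table, attrs)
```

Note that the dependent table for cuisine carries only `name`, because the assumed rule for cuisine uses a subset of the anchor attributes.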

9.Components of the Data Model: The Fetch-Resolve-Join sequence is a logical concept; the steps may be interleaved. Conceptual data is not materialized.

10.Conceptual Relations - Restaurant: Restaurant(name, address, rating, cuisine) and AddrInfo(address, city, zip) are written as Restaurant(name, address, [rating], [cuisine]) and AddrInfo(address, [city, zip]). Square brackets enclose dependent attribute-groups.

11.Resolution Rules: (1) A’ → D : f, where A’ is a subset of the anchor attributes (A’ ⊆ A) and D is a dependent attribute-group. The function f “cleans” the set of dependent values associated with specific anchor values, when the dependent values may be inconsistent or uncertain. (2) ∅ → A : f, where A is the set of anchor attributes. The function f “cleans” a set of anchor (A) values.
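To make the two rule forms concrete, here is a minimal Python sketch. The resolution functions (`average`, `majority`, `dedup_anchors`), the `resolve` helper and the sample restaurant data are all invented for illustration; the slides do not say which functions the running example actually uses.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def average(values: List[float]) -> float:
    """Example f for a numeric dependent attribute such as rating."""
    return sum(values) / len(values)


def majority(values: List[Hashable]) -> Hashable:
    """Example f that keeps the most frequently reported value."""
    return Counter(values).most_common(1)[0][0]


def resolve(raw_rows: Iterable[Tuple[tuple, object]], f: Callable) -> Dict[tuple, object]:
    """Apply a rule A' -> D : f. Each raw row is (A' values, one dependent value)."""
    groups = defaultdict(list)
    for anchor_values, dependent_value in raw_rows:
        groups[anchor_values].append(dependent_value)
    return {anchor_values: f(vals) for anchor_values, vals in groups.items()}


def dedup_anchors(anchor_rows: Iterable[tuple]) -> List[tuple]:
    """Example f for the anchor rule: drop exact duplicates, keep first-seen order."""
    return list(dict.fromkeys(anchor_rows))


# Invented sample data: three workers rated the same restaurant differently.
rest_d1 = [
    (("Bytes", "12 Main St"), 4),
    (("Bytes", "12 Main St"), 5),
    (("Bytes", "12 Main St"), 3),
]
rest_d2 = [(("Bytes",), "Thai"), (("Bytes",), "Thai"), (("Bytes",), "Chinese")]

ratings = resolve(rest_d1, average)
cuisines = resolve(rest_d2, majority)
anchors = dedup_anchors([("Bytes", "12 Main St"), ("Bytes", "12 Main St")])
print(ratings, cuisines, anchors)
```

In both cases the raw tables keep every reported value; the function f is only applied when the conceptual relation is computed.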

12.Resolution Rules

13.Fetch Rules: A fetch rule takes the form A1 ⇒ A2 : P, where A1 and A2 are sets of attributes from one relation and P is a fetch procedure that implements access to human workers or other external sources.
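One way to picture a fetch rule in code is as a record holding the two attribute sets and a callable for P. The sketch below is hypothetical: the `FetchRule` class, its `invoke` method and the `simulated_rating_worker` procedure are invented here, and the worker simply returns a canned answer instead of contacting a crowdsourcing platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class FetchRule:
    """A1 => A2 : P, with P modelled as a plain Python callable (an assumption)."""

    lhs: Tuple[str, ...]  # A1: attributes whose values are given to the procedure
    rhs: Tuple[str, ...]  # A2: attributes whose values the procedure returns
    procedure: Callable[[Row], List[Row]]

    def invoke(self, given: Row) -> List[Row]:
        missing = [a for a in self.lhs if a not in given]
        if missing:
            raise ValueError(f"missing values for {missing}")
        bound = {a: given[a] for a in self.lhs}
        answers = self.procedure(bound)
        # Each answer becomes one new raw tuple over A1 and A2.
        return [{**bound, **{a: ans[a] for a in self.rhs}} for ans in answers]


def simulated_rating_worker(bound: Row) -> List[Row]:
    """Stand-in for a human worker; always answers with a rating of 4."""
    return [{"rating": 4}]


rating_rule = FetchRule(("name", "address"), ("rating",), simulated_rating_worker)
new_rows = rating_rule.invoke({"name": "Bytes", "address": "12 Main St"})
print(new_rows)
```

Keeping P behind a callable means the same rule shape can wrap a human task or any other external source.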

14.Fetch Rules Verification

15.Raw Schema (recap): One anchor table, plus one dependent table for each dependent attribute-group. RestA(name, address); RestD1(name, address, rating); RestD2(name, cuisine); AddrA(address); AddrD1(address, city, zip).

16.Valid Instance: Start with the current contents of the raw tables and logically perform three steps. Fetch: add tuples to the raw tables. Resolve: resolve the dependent attributes. Join: take the full outerjoin of the resolved tables for each relation. The result is a set of data for the conceptual relations. These are logical steps: the system need not actually perform them, nor perform them in this order.
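The Join step can be illustrated with a small Python sketch. It assumes the Fetch and Resolve steps have already produced the tables below; the `full_outer_join` and `conceptual_rows` helpers and the sample rows are invented for illustration, and `None` stands in for NULL.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def full_outer_join(left: List[Row], right: List[Row], on: Sequence[str]) -> List[Row]:
    """Naive full outer join on the attributes in `on` (nested loops)."""
    out: List[Row] = []
    matched_right = set()
    for l in left:
        hit = False
        for i, r in enumerate(right):
            if all(l.get(a) == r.get(a) for a in on):
                out.append({**l, **r})
                matched_right.add(i)
                hit = True
        if not hit:
            out.append(dict(l))  # left row with no partner
    # Right rows with no partner are kept as well.
    out.extend(dict(r) for i, r in enumerate(right) if i not in matched_right)
    return out


def conceptual_rows(rows: List[Row], columns: Sequence[str]) -> List[Row]:
    """Pad missing attributes with None so every row has all columns."""
    return [{c: row.get(c) for c in columns} for row in rows]


# Invented, already-resolved tables for Restaurant.
rest_a = [{"name": "Bytes", "address": "12 Main St"}, {"name": "Cafe Q", "address": "9 Elm St"}]
rest_d1 = [{"name": "Bytes", "address": "12 Main St", "rating": 4.0}]
rest_d2 = [{"name": "Bytes", "cuisine": "Thai"}]

joined = full_outer_join(rest_a, rest_d1, on=["name", "address"])
joined = full_outer_join(joined, rest_d2, on=["name"])
restaurant = conceptual_rows(joined, ["name", "address", "rating", "cuisine"])
for row in restaurant:
    print(row)
```

The second restaurant appears with NULL rating and cuisine, which is why a query constraint on the number of non-NULL tuples can force further fetches.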

17.Metadata: Extra columns in the raw tables. Not a first-class part of the data model, but crucial for some crowdsourcing applications. Deals with the messy aspects of using crowdsourced data. Examples: data expiration, worker quality, voting, confidence scores, etc.

18.Query Language: A Deco query Q is a relational query over the conceptual relations. The answer to Q is the result of evaluating Q over some valid instance of the database. Without further constraints, the answer may therefore be empty (“Empty!”). With an “At Least 5” constraint, at least 5 tuples with non-NULL attributes will be returned.

19.Query Processing: Push-pull hybrid execution model. Incremental push: borrows ideas from incremental view maintenance; the result of a fetch rule becomes an update to one or more base tables, which is then propagated to the view (the conceptual table). Asynchronous pull: borrows ideas from asynchronous iteration; multiple new fetches are initiated in parallel and the resulting tuples are fed back to the plan as soon as possible. Two phases. Materialization: try to answer the query using the raw tables. Accretion: issue fetch rules to obtain more results.
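The two phases can be sketched as a short Python loop. This is a heavily simplified illustration, not Deco's query processor: the `answer` function, its parameters and the canned `simulated_fetch` are assumptions, resolution is skipped entirely, and a thread pool stands in for asynchronous fetches to the crowd.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Tuple

Tup = Tuple[str, str]  # (name, cuisine), invented for this sketch


def answer(
    stored: List[Tup],
    predicate: Callable[[Tup], bool],
    min_tuples: int,
    fetch: Callable[[int], List[Tup]],
    max_fetches: int = 20,
    parallelism: int = 4,
) -> List[Tup]:
    # Phase 1 (materialization): try to answer from what is already stored.
    results = [t for t in stored if predicate(t)]
    issued = 0
    # Phase 2 (accretion): fetch only while the answer is still too small.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        while len(results) < min_tuples and issued < max_fetches:
            batch = min(parallelism, max_fetches - issued)
            futures = [pool.submit(fetch, issued + i) for i in range(batch)]
            issued += batch
            # Consume fetches as they finish rather than in submission order.
            for fut in as_completed(futures):
                for t in fut.result():
                    if predicate(t) and t not in results:
                        results.append(t)  # push the new tuple into the answer
    return results


CANNED = [("Thai Hut", "Thai"), ("Pasta Bar", "Italian"), ("Siam", "Thai"), ("Lotus", "Thai")]


def simulated_fetch(i: int) -> List[Tup]:
    """Stand-in for one crowd fetch; returns a canned tuple, or nothing when exhausted."""
    return [CANNED[i]] if i < len(CANNED) else []


thai = answer([("Bytes", "Thai")], lambda t: t[1] == "Thai", 3, simulated_fetch)
print(sorted(thai))
```

If the stored tuples already satisfy the constraint, the loop body never runs and no fetch is issued, which mirrors the point of trying materialization first.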

20.Query Plans

21.Query Plans

22.System Design

23.Experiment Setup: Country(name, [language], [capital])

24.Benchmark Query: Country(name, [language], [capital])

25.Experiment 1- Fetch Configurations

26.Experiment 2 – Query Plans: Plan “Down”: push all predicates down as far as possible; similar to the reverse-fetch query plan. Plan “Up”: apply the predicate pull-up transformation; similar to the filter-later query plan.

27.Experiment 2 – Query Plans