申请试用
HOT
登录
注册
 
05_aggregate_functions
7 点赞
2 收藏
0下载
overflow
/
发布于
/
1645
人观看
Efficient computation of aggregate functions in large scale data processing framewor
展开查看详情

1.Efficient computation of aggregate functions in large scale data processing frameworks Chao Zhang, Farouk Toumani and Emmanuel Gangler University Clermont Auvergne, France

2.MapReduce Paradigm Shuffle: sending <k,v> with Mapper: same k to identical reducer generating <k,v> Reducer: computing output on <k,v> with same k Communication cost:a bottleneck of MapReduce 2

3.Aggregation The inherent property of aggregation The property to break down the dependency : taking all values as input and : associativity. returning only one value as output. Input : 𝑋 = {𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛 } Input : 𝑋 = 𝑋1 ∪ 𝑋2 … ∪ 𝑋𝑙 Output : 𝑦 = 𝛼(𝑋) Output : 𝑦 = 𝛼(𝑋) 𝑥1 𝑋1 𝛼(𝑋1 ) 𝑥2 𝑋2 𝛼(𝑋2 ) 𝑥3 𝛼 y 𝑋3 𝛼(𝑋3 ) 𝛼 y . . . . . . . . . 𝑥𝑛 𝑋𝑙 𝛼(𝑋𝑙 ) Partial aggregation:reducing communication cost 3

4.Research agenda Q: given an arbitrary aggregation 𝛼, when it is decomposable and how to decompose it? • Classifying aggregations into symmetric (commutative) and asymmetric families. • Generic frameworks to decompose aggregation. • Properties for the frameworks to be efficient: • Sufficient and necessary condition for symmetric aggregation; • Sufficient condition for a class of asymmetric aggregation. 4

5.An overview of solutions 5

7 点赞
2 收藏
0下载
相关文档