- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
05_aggregate_functions
展开查看详情
1 .Efficient computation of aggregate functions in large scale data processing frameworks Chao Zhang, Farouk Toumani and Emmanuel Gangler University Clermont Auvergne, France
2 .MapReduce Paradigm Shuffle: sending <k,v> with Mapper: same k to identical reducer generating <k,v> Reducer: computing output on <k,v> with same k Communication cost:a bottleneck of MapReduce 2
3 .Aggregation The inherent property of aggregation The property to break down the dependency : taking all values as input and : associativity. returning only one value as output. Input : 𝑋 = {𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛 } Input : 𝑋 = 𝑋1 ∪ 𝑋2 … ∪ 𝑋𝑙 Output : 𝑦 = 𝛼(𝑋) Output : 𝑦 = 𝛼(𝑋) 𝑥1 𝑋1 𝛼(𝑋1 ) 𝑥2 𝑋2 𝛼(𝑋2 ) 𝑥3 𝛼 y 𝑋3 𝛼(𝑋3 ) 𝛼 y . . . . . . . . . 𝑥𝑛 𝑋𝑙 𝛼(𝑋𝑙 ) Partial aggregation:reducing communication cost 3
4 .Research agenda Q: given an arbitrary aggregation 𝛼, when it is decomposable and how to decompose it? • Classifying aggregations into symmetric (commutative) and asymmetric families. • Generic frameworks to decompose aggregation. • Properties for the frameworks to be efficient: • Sufficient and necessary condition for symmetric aggregation; • Sufficient condition for a class of asymmetric aggregation. 4
5 .An overview of solutions 5