Distributed Data-Parallel Programs from Sequential Building Blocks

using a histogram describing the distribution of likely execution times rather than a ... 4] have introduced strong incentives for cost-aware scheduling in database ...

1.DistributionBased Query Scheduling -- 邹立达

2. 文献来源  Proceedings of the VLDB Endowment, Vol. 6, No. 9  Yun Chi  Department of Computer Sciences, Univers ity of WisconsinMadison ,NEC

3. 背景  当前 SLA 调度,都是假设查询的精确执行 时间是一定的且可以预知的。  现实场景中,执行时间 is not achievable.  准确预估查询时间,在动态、并发环境中 是非常困难的。主要受到硬件、内存上下 文、并发查询的影响。

4. 思想  using a histogram describing the distributio n of likely execution times rather than a sin gle, point estimate of the true running time  one can monitor execution times for instan tiations of each template, and learn a dis tribution of running times 。

5. 直方图分布的优点:  两个查询  Deadline 在不同时, 策略不同。

6. The rapid growth of cloud computing, datab ase as a service (DaaS), and multitenant d atabases [1, 4] have introduced strong inc entives for cost-aware scheduling in datab ase systems as well.

7. CBS 及其缺点  不精确延迟就会导致违约。  一个小的分布扰动,对 CBS 优先级改变很 大。  因此 CBS 不对执行时间不确定的查询,不 具有鲁棒性。并通过实验表明,影响较大 。

8. Shepherd  In Shepherd, instead of relying on a single point estimation for the execution time of a query q, we consider all the possible value s of the execution time of q

9.执行时间均衡分布的 p 值

10.Robustness of the Shepherd Score

11. SHEPHERD IN GENERAL  直方图的获得  历史统计  机器学习预估(贝叶斯)  直方图分解

12. Multiple-step SLA Cost Functions  每个直方图对应每个 线段

13. TPC-W query templates

14. We used two types of workload trac: static and bursty.  Poisson arrivals,  For bursty we used the well-known 1998 World Cup trace

15. 查询时间平均 30ms  We set the deadline d at be 10 ms after th e arrival of each query and we set the cost simply at c = 1.