How Good are Query Optimizers, Really?

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentall revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates.
展开查看详情

1. How Good Are Query Optimizers, Really? Viktor Leis Andrey Gubichev Atanas Mirchev TUM TUM TUM leis@in.tum.de gubichev@in.tum.de mirchev@in.tum.de Peter Boncz Alfons Kemper Thomas Neumann CWI TUM TUM p.boncz@cwi.nl kemper@in.tum.de neumann@in.tum.de HJ ABSTRACT B Finding a good join order is crucial for query performance. In this cardinality cost INL SELECT ... estimation model B T paper, we introduce the Join Order Benchmark (JOB) and exper- imentally revisit the main components in the classic query opti- FROM R,S,T v S mizer architecture using a complex, real-world data set and realistic WHERE ... R plan space multi-join queries. We investigate the quality of industrial-strength enumeration cardinality estimators and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using an- Figure 1: Traditional query optimizer architecture other set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. Finally, we investigate plan enumera- • How important is an accurate cost model for the overall query tion techniques comparing exhaustive dynamic programming with optimization process? heuristic algorithms and find that exhaustive enumeration improves • How large does the enumerated plan space need to be? performance despite the sub-optimal cardinality estimates. To answer these questions, we use a novel methodology that allows 1. INTRODUCTION us to isolate the influence of the individual optimizer components on query performance. Our experiments are conducted using a real- The problem of finding a good join order is one of the most stud- world data set and 113 multi-join queries that provide a challeng- ied problems in the database field. Figure 1 illustrates the classical, ing, diverse, and realistic workload. Another novel aspect of this cost-based approach, which dates back to System R [36]. To obtain paper is that it focuses on the increasingly common main-memory an efficient query plan, the query optimizer enumerates some subset scenario, where all data fits into RAM. of the valid join orders, for example using dynamic programming. The main contributions of this paper are listed in the following: Using cardinality estimates as its principal input, the cost model then chooses the cheapest alternative from semantically equivalent • We design a challenging workload named Join Order Bench- plan alternatives. mark (JOB), which is based on the IMDB data set. The Theoretically, as long as the cardinality estimations and the cost benchmark is publicly available to facilitate further research. model are accurate, this architecture obtains the optimal query plan. In reality, cardinality estimates are usually computed based on sim- • To the best of our knowledge, this paper presents the first plifying assumptions like uniformity and independence. In real- end-to-end study of the join ordering problem using a real- world data sets, these assumptions are frequently wrong, which world data set and realistic queries. may lead to sub-optimal and sometimes disastrous plans. In this experiments and analyses paper we investigate the three • By quantifying the contributions of cardinality estimation, main components of the classical query optimization architecture the cost model, and the plan enumeration algorithm on query in order to answer the following questions: performance, we provide guidelines for the complete design of a query optimizer. We also show that many disastrous • How good are cardinality estimators and when do bad esti- plans can easily be avoided. mates lead to slow queries? The rest of this paper is organized as follows: We first discuss important background and our new benchmark in Section 2. Sec- tion 3 shows that the cardinality estimators of the major relational database systems produce bad estimates for many realistic queries, This work is licensed under the Creative Commons Attribution- in particular for multi-join queries. The conditions under which NonCommercial-NoDerivatives 4.0 International License. To view a copy these bad estimates cause slow performance are analyzed in Sec- of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For tion 4. We show that it very much depends on how much the any use beyond those covered by this license, obtain permission by emailing query engine relies on these estimates and on how complex the info@vldb.org. Proceedings of the VLDB Endowment, Vol. 9, No. 3 physical database design is, i.e., the number of indexes available. Copyright 2015 VLDB Endowment 2150-8097/15/11. Query engines that mainly rely on hash joins and full table scans, 204

2.are quite robust even in the presence of large cardinality estima- movie_info tion errors. The more indexes are available, the harder the problem becomes for the query optimizer resulting in runtimes that are far info_type away from the optimal query plan. Section 5 shows that with the currently-used cardinality estimation techniques, the influence of cost model errors is dwarfed by cardinality estimation errors and movie_info_idx that even quite simple cost models seem to be sufficient. Sec- company_type tion 6 investigates different plan enumeration algorithms and shows that—despite large cardinality misestimates and sub-optimal cost models—exhaustive join order enumeration improves performance movie_companies info_type and that using heuristics leaves performance on the table. Finally, after discussing related work in Section 7, we present our conclu- sions and future work in Section 8. 2. BACKGROUND AND METHODOLOGY company_name title kind_type Many query optimization papers ignore cardinality estimation and only study search space exploration for join ordering with ran- Figure 2: Typical query graph of our workload domly generated, synthetic queries (e.g., [32, 13]). Other papers investigate only cardinality estimation in isolation either theoreti- cally (e.g., [21]) or empirically (e.g., [43]). As important and in- files. The two largest tables, cast info and movie info have teresting both approaches are for understanding query optimizers, 36 M and 15 M rows, respectively. they do not necessarily reflect real-world user experience. The goal of this paper is to investigate the contribution of all rele- 2.2 The JOB Queries vant query optimizer components to end-to-end query performance Based on the IMDB database, we have constructed analytical in a realistic setting. We therefore perform our experiments using a SQL queries. Since we focus on join ordering, which arguably is workload based on a real-world data set and the widely-used Post- the most important query optimization problem, we designed the greSQL system. PostgreSQL is a relational database system with queries to have between 3 and 16 joins, with an average of 8 joins a fairly traditional architecture making it a good subject for our per query. Query 13d, which finds the ratings and release dates for experiments. Furthermore, its open source nature allows one to in- all movies produced by US companies, is a typical example: spect and change its internals. In this section we introduce the Join SELECT cn.name, mi.info, miidx.info Order Benchmark, describe all relevant aspects of PostgreSQL, and FROM company_name cn, company_type ct, present our methodology. info_type it, info_type it2, title t, 2.1 The IMDB Data Set kind_type kt, movie_companies mc, movie_info mi, movie_info_idx miidx Many research papers on query processing and optimization use WHERE cn.country_code =’[us]’ standard benchmarks like TPC-H, TPC-DS, or the Star Schema AND ct.kind = ’production companies’ Benchmark (SSB). While these benchmarks have proven their value AND it.info = ’rating’ for evaluating query engines, we argue that they are not good bench- AND it2.info = ’release dates’ marks for the cardinality estimation component of query optimiz- AND kt.kind = ’movie’ ers. The reason is that in order to easily be able to scale the bench- AND ... -- (11 join predicates) mark data, the data generators are using the very same simplifying assumptions (uniformity, independence, principle of inclusion) that Each query consists of one select-project-join block4 . The join query optimizers make. Real-world data sets, in contrast, are full graph of the query is shown in Figure 2. The solid edges in the of correlations and non-uniform data distributions, which makes graph represent key/foreign key edges (1 : n) with the arrow head cardinality estimation much harder. Section 3.3 shows that Post- pointing to the primary key side. Dotted edges represent foreign greSQL’s simple cardinality estimator indeed works unrealistically key/foreign key joins (n : m), which appear due to transitive join well for TPC-H. predicates. Our query set consists of 33 query structures, each with Therefore, instead of using a synthetic data set, we chose the 2-6 variants that differ in their selections only, resulting in a total Internet Movie Data Base1 (IMDB). It contains a plethora of in- of 113 queries. Note that depending on the selectivities of the base formation about movies and related facts about actors, directors, table predicates, the variants of the same query structure have dif- production companies, etc. The data is freely available2 for non- ferent optimal query plans that yield widely differing (sometimes commercial use as text files. In addition, we used the open-source by orders of magnitude) runtimes. Also, some queries have more imdbpy3 package to transform the text files into a relational database complex selection predicates than the example (e.g., disjunctions with 21 tables. The data set allows one to answer queries like or substring search using LIKE). “Which actors played in movies released between 2000 and 2005 Our queries are “realistic” and “ad hoc” in the sense that they with ratings above 8?”. Like most real-world data sets IMDB is full answer questions that may reasonably have been asked by a movie of correlations and non-uniform data distributions, and is therefore 4 Since in this paper we do not model or investigate aggregation, much more challenging than most synthetic data sets. Our snap- we omitted GROUP BY from our queries. To avoid communica- shot is from May 2013 and occupies 3.6 GB when exported to CSV tion from becoming the performance bottleneck for queries with 1 large result sizes, we wrap all attributes in the projection clause 2 http://www.imdb.com/ with MIN(...) expressions when executing (but not when es- ftp://ftp.fu-berlin.de/pub/misc/movies/database/ timating). This change has no effect on PostgreSQL’s join order 3 https://bitbucket.org/alberanid/imdbpy/get/5.0.zip selection because its optimizer does not push down aggregations. 205

3.enthusiast. We also believe that despite their simple SPJ-structure, the EXPLAIN command for PostgreSQL). We will later use these the queries model the core difficulty of the join ordering problem. estimates of different systems to obtain optimal query plans (w.r.t. For cardinality estimators the queries are challenging due to the sig- respective systems) and run these plans in PostgreSQL. For exam- nificant number of joins and the correlations contained in the data ple, the intermediate results of the chain query set. However, we did not try to “trick” the query optimizer, e.g., by picking attributes with extreme correlations. Also, we intention- σx=5 (A) A.bid=B.id B B.cid=C.id C ally did not include more complex join predicates like inequalities are σx=5 (A), σx=5 (A) B, B C, and σx=5 (A) B C. or non-surrogate-key predicates, because cardinality estimation for Additionally, the availability of indexes on foreign keys and index- this workload is already quite challenging. nested loop joins introduces the need for additional intermediate We propose JOB for future research in cardinality estimation and result sizes. For instance, if there exists a non-unique index on the query optimization. The query set is available online: foreign key A.bid, it is also necessary to estimate A B and http://www-db.in.tum.de/˜leis/qo/job.tgz A B C. The reason is that the selection A.x = 5 can only be applied after retrieving all matching tuples from the index on 2.3 PostgreSQL A.bid, and therefore the system produces two intermediate results, PostgreSQL’s optimizer follows the traditional textbook archi- before and after the selection. Besides cardinality estimates from tecture. Join orders, including bushy trees but excluding trees with the different systems, we also obtain the true cardinality for each cross products, are enumerated using dynamic programming. The intermediate result by executing SELECT COUNT(*) queries5 . cost model, which is used to decide which plan alternative is cheaper, We further modified PostgreSQL to enable cardinality injection is described in more detail in Section 5.1. The cardinalities of base of arbitrary join expressions, allowing PostgreSQL’s optimizer to tables are estimated using histograms (quantile statistics), most com- use the estimates of other systems (or the true cardinality) instead mon values with their frequencies, and domain cardinalities (dis- of its own. This allows one to directly measure the influence of tinct value counts). These per-attribute statistics are computed by cardinality estimates from different systems on query performance. the analyze command using a sample of the relation. For com- Note that IBM DB2 allows a limited form of user control over the plex predicates, where histograms can not be applied, the system estimation process by allowing users to explicitly specify the se- resorts to ad hoc methods that are not theoretically grounded (“magic lectivities of predicates. However, selectivity injection cannot fully constants”). To combine conjunctive predicates for the same table, model inter-relation correlations and is therefore less general than PostgreSQL simply assumes independence and multiplies the se- the capability of injecting cardinalities for arbitrary expressions. lectivities of the individual selectivity estimates. The result sizes of joins are estimated using the formula 2.5 Experimental Setup |T1 ||T2 | The cardinalities of the commercial systems were obtained using |T1 x=y T2 | = , a laptop running Windows 7. All performance experiments were max(dom(x), dom(y)) performed on a server with two Intel Xeon X5570 CPUs (2.9 GHz) where T1 and T2 are arbitrary expressions and dom(x) is the do- and a total of 8 cores running PostgreSQL 9.4 on Linux. Post- main cardinality of attribute x, i.e., the number of distinct values of greSQL does not parallelize queries, so that only a single core was x. This value is the principal input for the join cardinality estima- used during query processing. The system has 64 GB of RAM, tion. To summarize, PostgreSQL’s cardinality estimator is based on which means that the entire IMDB database is fully cached in RAM. the following assumptions: Intermediate query processing results (e.g., hash tables) also easily • uniformity: all values, except for the most-frequent ones, are fit into RAM, unless a very bad plan with extremely large interme- assumed to have the same number of tuples diate results is chosen. We set the memory limit per operator (work mem) to • independence: predicates on attributes (in the same table or 2 GB, which results in much better performance due to the from joined tables) are independent more frequent use of in-memory hash joins instead of ex- ternal memory sort-merge joins. Additionally, we set the • principle of inclusion: the domains of the join keys overlap buffer pool size (shared buffers) to 4 GB and the size such that the keys from the smaller domain have matches in of the operating system’s buffer cache used by PostgreSQL the larger domain (effective cache size) to 32 GB. For PostgreSQL it is gen- The query engine of PostgreSQL takes a physical operator plan erally recommended to use OS buffering in addition to its own and executes it using Volcano-style interpretation. The most im- buffer pool and keep most of the memory on the OS side. The de- portant access paths are full table scans and lookups in unclustered faults for these three settings are very low (MBs, not GBs), which B+Tree indexes. Joins can be executed using either nested loops is why increasing them is generally recommended. Finally, by in- (with or without index lookups), in-memory hash joins, or sort- creasing the geqo threshold parameter to 18 we forced Post- merge joins where the sort can spill to disk if necessary. The de- greSQL to always use dynamic programming instead of falling cision which join algorithm is used is made by the optimizer and back to a heuristic for queries with more than 12 joins. cannot be changed at runtime. 2.4 Cardinality Extraction and Injection 3. CARDINALITY ESTIMATION We loaded the IMDB data set into 5 relational database sys- Cardinality estimates are the most important ingredient for find- tems: PostgreSQL, HyPer, and 3 commercial systems. Next, we ing a good query plan. Even exhaustive join order enumeration and ran the statistics gathering command of each database system with a perfectly accurate cost model are worthless unless the cardinal- default settings to generate the database-specific statistics (e.g., his- ity estimates are (roughly) correct. It is well known, however, that tograms or samples) that are used by the estimation algorithms. We 5 For our workload it was still feasible to do this na¨ıvely. For larger then obtained the cardinality estimates for all intermediate results data sets the approach by Chaudhuri et al. [7] may become neces- of our test queries using database-specific commands (e.g., using sary. 206

4. PostgreSQL DBMS A DBMS B DBMS C HyPer [log scale] overestimation → 1e4 1e2 1 1e2 ← underestimation 1e4 95th percentile 1e6 75th percentile median 25th percentile 5th percentile 1e8 0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5 6 number of joins Figure 3: Quality of cardinality estimates for multi-join queries in comparison with the true cardinalities. Each boxplot summarizes the error distribution of all subexpressions with a particular size (over all queries in the workload) median 90th 95th max curate estimates for arbitrary base table predicates as long as the PostgreSQL 1.00 2.08 6.10 207 selectivity is not too low. When we looked at the selections where DBMS A 1.01 1.33 1.98 43.4 DBMS A and HyPer produce errors above 2, we found that most DBMS B 1.00 6.03 30.2 104000 of them have predicates with extremely low true selectivities (e.g., DBMS C 1.06 1677 5367 20471 10−5 or 10−6 ). This routinely happens when the selection yields HyPer 1.02 4.47 8.00 2084 zero tuples on the sample, and the system falls back on an ad-hoc estimation method (“magic constants”). It therefore appears to be Table 1: Q-errors for base table selections likely that DBMS A also uses the sampling approach. The estimates of the other systems are worse and seem to be based on per-attribute histograms, which do not work well for many cardinality estimates are sometimes wrong by orders of magnitude, predicates and cannot detect (anti-)correlations between attributes. and that such errors are usually the reason for slow queries. In this Note that we obtained all estimates using the default settings af- section, we experimentally investigate the quality of cardinality es- ter running the respective statistics gathering tool. Some commer- timates in relational database systems by comparing the estimates cial systems support the use of sampling for base table estimation, with the true cardinalities. multi-attribute histograms (“column group statistics”), or ex post feedback from previous query runs [38]. However, these features 3.1 Estimates for Base Tables are either not enabled by default or are not fully automatic. To measure the quality of base table cardinality estimates, we use the q-error, which is the factor by which an estimate differs from the true cardinality. For example, if the true cardinality of 3.2 Estimates for Joins an expression is 100, the estimates of 10 or 1000 both have a q- Let us now turn our attention to the estimation of intermediate error of 10. Using the ratio instead of an absolute or quadratic results for joins, which are more challenging because sampling or difference captures the intuition that for making planning decisions histograms do not work well. Figure 3 summarizes over 100,000 only relative differences matter. The q-error furthermore provides cardinality estimates in a single figure. For each intermediate re- a theoretical upper bound for the plan quality if the q-errors of a sult of our query set, we compute the factor by which the estimate query are bounded [30]. differs from the true cardinality, distinguishing between over- and Table 1 shows the 50th, 90th, 95th, and 100th percentiles of the underestimation. The graph shows one “boxplot” (note the legend q-errors for the 629 base table selections in our workload. The in the bottom-left corner) for each intermediate result size, which median q-error is close to the optimal value of 1 for all systems, allows one to compare how the errors change as the number of joins indicating that the majority of all selections are estimated correctly. increases. The vertical axis uses a logarithmic scale to encompass However, all systems produce misestimates for some queries, and underestimates by a factor of 108 and overestimates by a factor of the quality of the cardinality estimates differs strongly between the 104 . different systems. Despite the better base table estimates of DBMS A, the overall Looking at the individual selections, we found that DBMS A and variance of the join estimation errors, as indicated by the boxplot, HyPer can usually predict even complex predicates like substring is similar for all systems with the exception of DBMS B. For all search using LIKE very well. To estimate the selectivities for base systems we routinely observe misestimates by a factor of 1000 or tables HyPer uses a random sample of 1000 rows per table and more. Furthermore, as witnessed by the increasing height of the applies the predicates on that sample. This allows one to get ac- box plots, the errors grow exponentially (note the logarithmic scale) 207

5. JOB 6a JOB 16d JOB 17b JOB 25c TPC-H 5 TPC-H 8 TPC-H 10 [log scale] overestimation → as the number of joins increases [21]. For PostgreSQL 16% of the estimates for 1 join are wrong by a factor of 10 or more. This per- 1e2 centage increases to 32% with 2 joins, and to 52% with 3 joins. For DBMS A, which has the best estimator of the systems we com- pared, the corresponding percentages are only marginally better at 1 15%, 25%, and 36%. Another striking observation is that all tested systems—though DBMS A to a lesser degree—tend to systematically underestimate 1e2 the results sizes of queries with multiple joins. This can be deduced ← underestimation from the median of the error distributions in Figure 3. For our query set, it is indeed the case that the intermediate results tend to de- 1e4 crease with an increasing number of joins because more base table selections get applied. However, the true decrease is less than the independence assumption used by PostgreSQL (and apparently by 0123456 0123456 0123456 0123456 0123456 0123456 0123456 the other systems) predicts. Underestimation is most pronounced number of joins with DBMS B, which frequently estimates 1 row for queries with more than 2 joins. The estimates of DBMS A, on the other hand, Figure 4: PostgreSQL cardinality estimates for 4 JOB queries have medians that are much closer to the truth, despite their vari- and 3 TPC-H queries ance being similar to some of the other systems. We speculate that DBMS A uses a damping factor that depends on the join size, sim- PostgreSQL PostgreSQL (true distinct) [log scale] ilar to how many optimizers combine multiple selectivities. Many 1 estimators combine the selectivities of multiple predicates (e.g., for a base relation or for a subexpression with multiple joins) not by assuming full independence, but by adjusting the selectivities “up- ← underestimation wards”, using a damping factor. The motivation for this stems from the fact that the more predicates need to be applied, the less certain 1e2 one should be about their independence. Given the simplicity of PostgreSQL’s join estimation formula (cf. Section 2.3) and the fact that its estimates are nevertheless com- petitive with the commercial systems, we can deduce that the cur- rent join size estimators are based on the independence assumption. 1e4 0 1 2 3 4 5 6 0 1 2 3 4 5 6 No system tested was able to detect join-crossing correlations. Fur- number of joins thermore, cardinality estimation is highly brittle, as illustrated by the significant number of extremely large errors we observed (fac- Figure 5: PostgreSQL cardinality estimates based on the de- tor 1000 or more) and the following anecdote: In PostgreSQL, we fault distinct count estimates, and the true distinct counts observed different cardinality estimates of the same simple 2-join query depending on the syntactic order of the relations in the from and/or the join predicates in the where clauses! Simply by swap- all queries like in Figure 3). Clearly, the TPC-H query workload ping predicates or relations, we observed the estimates of 3, 9, 128, does not present many hard challenges for cardinality estimators. or 310 rows for the same query (with a true cardinality of 2600)6 . In contrast, our workload contains queries that routinely lead to se- Note that this section does not benchmark the query optimizers vere overestimation and underestimation errors, and hence can be of the different systems. In particular, our results do not imply considered a challenging benchmark for cardinality estimation. that the DBMS B’s optimizer or the resulting query performance is necessarily worse than that of other systems, despite larger errors 3.4 Better Statistics for PostgreSQL in the estimator. The query runtime heavily depends on how the As mentioned in Section 2.3, the most important statistic for join system’s optimizer uses the estimates and how much trust it puts estimation in PostgreSQL is the number of distinct values. These into these numbers. A sophisticated engine may employ adaptive statistics are estimated from a fixed-sized sample, and we have ob- operators (e.g., [4, 8]) and thus mitigate the impact of misestima- served severe underestimates for large tables. To determine if the tions. The results do, however, demonstrate that the state-of-the-art misestimated distinct counts are the underlying problem for cardi- in cardinality estimation is far from perfect. nality estimation, we computed these values precisely and replaced the estimated with the true values. 3.3 Estimates for TPC-H Figure 5 shows that the true distinct counts slightly improve the We have stated earlier that cardinality estimation in TPC-H is variance of the errors. Surprisingly, however, the trend to underes- a rather trivial task. Figure 4 substantiates that claim by show- timate cardinalities becomes even more pronounced. The reason is ing the distributions of PostgreSQL estimation errors for 3 of the that the original, underestimated distinct counts resulted in higher larger TPC-H queries and 4 of our JOB queries. Note that in the estimates, which, accidentally, are closer to the truth. This is an ex- figure we report estimation errors for individual queries (not for ample for the proverbial “two wrongs that make a right”, i.e., two errors that (partially) cancel each other out. Such behavior makes 6 The reasons for this surprising behavior are two implementation analyzing and fixing query optimizer problems very frustrating be- artifacts: First, estimates that are less than 1 are rounded up to 1, cause fixing one query might break another. making subexpression estimates sensitive to the (usually arbitrary) join enumeration order, which is affected by the from clause. The second is a consistency problem caused by incorrect domain sizes of predicate attributes in joins with multiple predicates. 208

6. default + no nested-loop join + rehashing 4. WHEN DO BAD CARDINALITY ESTI- MATES LEAD TO SLOW QUERIES? 60% (a) (b) (c) While the large estimation errors shown in the previous section are certainly sobering, large errors do not necessarily lead to slow 40% query plans. For example, the misestimated expression may be cheap in comparison with other parts of the query, or the relevant plan alternative may have been misestimated by a similar factor 20% thus “canceling out” the original error. In this section we investi- gate the conditions under which bad cardinalities are likely to cause slow queries. 0% One important observation is that query optimization is closely .9 ) [1 1) [2 ) 0, ) >1 ) 00 .9 ) [1 1) [2 ) 0, ) >1 ) 00 .9 ) [1 1) [2 ) 0, ) >1 ) 00 [0 0.9 ,2 [1 ,10 0 [0 0.9 ,2 [1 ,10 0 [0 0.9 ,2 [1 ,10 0 intertwined with the physical database design: the type and number . 10 . 10 . 10 ,1 .1 ,1 .1 ,1 .1 , , , .3 .3 .3 [0 [0 [0 of indexes heavily influence the plan search space, and therefore affects how sensitive the system is to cardinality misestimates. We Figure 6: Slowdown of queries using PostgreSQL estimates therefore start this section with experiments using a relatively ro- w.r.t. using true cardinalities (primary key indexes only) bust physical design with only primary key indexes and show that in such a setup the impact of cardinality misestimates can largely be mitigated. After that, we demonstrate that for more complex con- figurations with many indexes, cardinality misestimation makes it join algorithm and 1,000,001 with a hash join, PostgreSQL will much more likely to miss the optimal plan by a large margin. always prefer the nested-loop algorithm even if there is a equality join predicate, which allows one to use hashing. Of course, given 4.1 The Risk of Relying on Estimates the O(n2 ) complexity of nested-loop join and O(n) complexity of To measure the impact of cardinality misestimation on query per- hash join, and given the fact that underestimates are quite frequent, formance we injected the estimates of the different systems into this decision is extremely risky. And even if the estimates happen PostgreSQL and then executed the resulting plans. Using the same to be correct, any potential performance advantage of a nested-loop query engine allows one to compare the cardinality estimation com- join in comparison with a hash join is very small, so taking this high ponents in isolation by (largely) abstracting away from the different risk can only result in a very small payoff. query execution engines. Additionally, we inject the true cardinali- Therefore, we disabled nested-loop joins (but not index-nested- ties, which computes the—with respect to the cost model—optimal loop joins) in all following experiments. As Figure 6b shows, when plan. We group the runtimes based on their slowdown w.r.t. the op- rerunning all queries without these risky nested-loop joins, we ob- timal plan, and report the distribution in the following table, where served no more timeouts despite using PostgreSQL’s estimates. each column corresponds to a group: Also, none of the queries performed slower than before despite having less join algorithm options, confirming our hypothesis that <0.9 [0.9,1.1) [1.1,2) [2,10) [10,100) >100 nested-loop joins (without indexes) seldom have any upside. How- PostgreSQL 1.8% 38% 25% 25% 5.3% 5.3% ever, this change does not solve all problems, as there are still a DBMS A 2.7% 54% 21% 14% 0.9% 7.1% number of queries that are more than a factor of 10 slower (cf., red DBMS B 0.9% 35% 18% 15% 7.1% 25% bars) in comparison with the true cardinalities. DBMS C 1.8% 38% 35% 13% 7.1% 5.3% When investigating the reason why the remaining queries still HyPer 2.7% 37% 27% 19% 8.0% 6.2% did not perform as well as they could, we found that most of them contain a hash join where the size of the build input is underesti- A small number of queries become slightly slower using the true mated. PostgreSQL up to and including version 9.4 chooses the instead of the erroneous cardinalities. This effect is caused by cost size of the in-memory hash table based on the cardinality estimate. model errors, which we discuss in Section 5. However, as expected, Underestimates can lead to undersized hash tables with very long the vast majority of the queries are slower when estimates are used. collisions chains and therefore bad performance. The upcoming Using DBMS A’s estimates, 78% of the queries are less than 2× version 9.5 resizes the hash table at runtime based on the number slower than using the true cardinalities, while for DBMS B this is of rows actually stored in the hash table. We backported this patch the case for only 53% of the queries. This corroborates the findings to our code base, which is based on 9.4, and enabled it for all re- about the relative quality of cardinality estimates in the previous maining experiments. Figure 6c shows the effect of this change section. Unfortunately, all estimators occasionally lead to plans in addition with disabled nested-loop joins. Less than 4% of the that take an unreasonable time and lead to a timeout. Surprisingly, queries are off by more than 2× in comparison with the true cardi- however, many of the observed slowdowns are easily avoidable de- nalities. spite the bad estimates as we show in the following. To summarize, being “purely cost-based”, i.e., not taking into When looking at the queries that did not finish in a reasonable account the inherent uncertainty of cardinality estimates and the time using the estimates, we found that most have one thing in asymptotic complexities of different algorithm choices, can lead to common: PostgreSQL’s optimizer decides to introduce a nested- very bad query plans. Algorithms that seldom offer a large benefit loop join (without an index lookup) because of a very low cardinal- over more robust algorithms should not be chosen. Furthermore, ity estimate, whereas in reality the true cardinality is larger. As we query processing algorithms should, if possible, automatically de- saw in the previous section, systematic underestimation happens termine their parameters at runtime instead of relying on cardinality very frequently, which occasionally results in the introduction of estimates. nested-loop joins. The underlying reason why PostgreSQL chooses nested-loop joins 4.2 Good Plans Despite Bad Cardinalities is that it picks the join algorithm on a purely cost-based basis. For The query runtimes of plans with different join orders often vary example, if the cost estimate is 1,000,000 with the nested-loop by many orders of magnitude (cf. Section 6.1). Nevertheless, when 209

7. PK indexes PK + FK indexes correlated predicates involve columns from different tables, con- 60% (a) (b) nected by joins. These we call “join-crossing correlations”. Such correlations frequently occur in the IMDB data set, e.g., actors born 40% in Paris are likely to play in French movies. Given these join-crossing correlations one could wonder if there exist complex access paths that allow to exploit these. One exam- 20% ple relevant here despite its original setting in XQuery processing is ROX [22]. It studied runtime join order query optimization in the context of DBLP co-authorship queries that count how many 0% Authors had published Papers in three particular venues, out of many. These queries joining the author sets from different venues ) [1 ) ) [1 0) 0) 00 [0 .9) [1 ) ) [1 0) 0) 00 .9 .1 ,2 .1 ,2 ,1 10 ,1 10 >1 >1 ,0 ,1 .1 ,0 ,1 .1 clearly have join-crossing correlations, since authors who publish [2 [2 .3 .9 0, .3 .9 0, [0 [0 [0 in VLDB are typically database researchers, likely to also publish in Figure 7: Slowdown of queries using PostgreSQL estimates SIGMOD, but not—say—in Nature. w.r.t. using true cardinalities (different index configurations) In the DBLP case, Authorship is a n : m relationship that links the relation Authors with the relation Papers. The op- timal query plans in [22] used an index-nested-loop join, look- ing up each author into Authorship.author (the indexed pri- the database has only primary key indexes, as in all in experiments mary key) followed by a filter restriction on Paper.venue, which so far, and once nested loop joins have been disabled and rehashing needs to be looked up with yet another join. This filter on venue has been enabled, the performance of most queries is close to the would normally have to be calculated after these two joins. How- one obtained using the true cardinalities. Given the bad quality ever, the physical design of [22] stored Authorship partitioned by of the cardinality estimates, we consider this to be a surprisingly Paper.venue.7 This partitioning has startling effects: instead of positive result. It is worthwhile to reflect on why this is the case. one Authorship table and primary key index, one physically has The main reason is that without foreign key indexes, most large many, one for each venue partition. This means that by accessing (“fact”) tables need to be scanned using full table scans, which the right partition, the filter is implicitly enforced (for free), before dampens the effect of different join orders. The join order still the join happens. This specific physical design therefore causes matters, but the results indicate that the cardinality estimates are the optimal plan to be as follows: first join the smallish authorship usually good enough to rule out all disastrous join order decisions set from SIGMOD with the large set for Nature producing almost like joining two large tables using an unselective join predicate. no result tuples, making the subsequent nested-loops index lookup Another important reason is that in main memory picking an index- join into VLDB very cheap. If the tables would not have been parti- nested-loop join where a hash join would have been faster is never tioned, index lookups from all SIGMOD authors into Authorships disastrous. With all data and indexes fully cached, we measured would first find all co-authored papers, of which the great majority that the performance advantage of a hash join over an index-nested- is irrelevant because they are about database research, and were not loop join is at most 5× with PostgreSQL and 2× with HyPer. Ob- published in Nature. Without this partitioning, there is no way to viously, when the index must be read from disk, random IO may avoid this large intermediate result, and there is no query plan that result in a much larger factor. Therefore, the main-memory setting comes close to the partitioned case in efficiency: even if cardinality is much more forgiving. estimation would be able to predict join-crossing correlations, there would be no physical way to profit from this knowledge. 4.3 Complex Access Paths The lesson to draw from this example is that the effects of query So far, all query executions were performed on a database with optimization are always gated by the available options in terms of indexes on primary key attributes only. To see if the query opti- access paths. Having a partitioned index on a join-crossing predi- mization problem becomes harder when there are more indexes, cate as in [22] is a non-obvious physical design alternative which we additionally indexed all foreign key attributes. Figure 7b shows even modifies the schema by bringing in a join-crossing column the effect of additional foreign key indexes. We see large perfor- (Paper.venue) as partitioning key of a table (Authorship). The mance differences with 40% of the queries being slower by a factor partitioned DBLP set-up is just one example of how one particu- of 2! Note that these results do not mean that adding more indexes lar join-crossing correlation can be handled, rather than a generic decreases performance (although this can occasionally happen). In- solution. Join-crossing correlations remain an open frontier for deed overall performance generally increases significantly, but the database research involving the interplay of physical design, query more indexes are available the harder the job of the query optimizer execution and query optimization. In our JOB experiments we do becomes. not attempt to chart this mostly unknown space, but rather charac- terize the impact of (join-crossing) correlations on the current state- 4.4 Join-Crossing Correlations of-the-art of query processing, restricting ourselves to standard PK and FK indexing. There is consensus in our community that estimation of interme- diate result cardinalities in the presence of correlated query predi- cates is a frontier in query optimization research. The JOB work- 5. COST MODELS load studied in this paper consists of real-world data and its queries The cost model guides the selection of plans from the search contain many correlated predicates. Our experiments that focus on space. The cost models of contemporary systems are sophisticated single-table subquery cardinality estimation quality (cf. Table 1) 7 show that systems that keep table samples (HyPer and presumably In fact, rather than relational table partitioning, there was a sep- arate XML document per venue, e.g., separate documents for DBMS A) can achieve almost perfect estimation results, even for SIGMOD, VLDB, Nature and a few thousand more venues. Stor- correlated predicates (inside the same table). As such, the cardinal- age in a separate XML document has roughly the same effect on ity estimation research challenge appears to lie in queries where the access paths as partitioned tables. 210

8.software artifacts that are resulting from 30+ years of research and PostgreSQL estimates true cardinalities development, mostly concentrated in the area of traditional disk- based systems. PostgreSQL’s cost model, for instance, is com- 1e4 standard cost model prised of over 4000 lines of C code, and takes into account various subtle considerations, e.g., it takes into account partially correlated index accesses, interesting orders, tuple sizes, etc. It is interest- 1e2 ing, therefore, to evaluate how much a complex cost model actually contributes to the overall query performance. First, we will experimentally establish the correlation between 1 the PostgreSQL cost model—a typical cost model of a disk-based (a) (b) DBMS—and the query runtime. Then, we will compare the Post- greSQL cost model with two other cost functions. The first cost model is a tuned version of PostgreSQL’s model for a main-memory runtime [ms] [log scale] 1e4 setup where all data fits into RAM. The second cost model is an ex- tuned cost model tremely simple function that only takes the number of tuples pro- duced during query evaluation into account. We show that, un- 1e2 surprisingly, the difference between the cost models is dwarfed by the cardinality estimates errors. We conduct our experiments on a database instance with foreign key indexes. We begin with a brief 1 description of a typical disk-oriented complex cost model, namely (c) (d) the one of PostgreSQL. 5.1 The PostgreSQL Cost Model 1e4 simple cost model PostgreSQL’s disk-oriented cost model combines CPU and I/O costs with certain weights. Specifically, the cost of an operator is defined as a weighted sum of the number of accessed disk pages 1e2 (both sequential and random) and the amount of data processed in memory. The cost of a query plan is then the sum of the costs of all operators. The default values of the weight parameters used 1 in the sum (cost variables) are set by the optimizer designers and (e) (f) are meant to reflect the relative difference between random access, 1e+05 1e+07 1e+03 1e+05 1e+07 sequential access and CPU costs. cost [log scale] The PostgreSQL documentation contains the following note on cost variables: “Unfortunately, there is no well-defined method Figure 8: Predicted cost vs. runtime for different cost models for determining ideal values for the cost variables. They are best treated as averages over the entire mix of queries that a particular installation will receive. This means that changing them on the ba- the PostgreSQL cost model a reliable predictor of the runtime, as sis of just a few experiments is very risky.” For a database adminis- has been observed previously [42]. trator, who needs to actually set these parameters these suggestions Intuitively, a straight line in Figure 8 corresponds to an ideal are not very helpful; no doubt most will not change these param- cost model that always assigns (predicts) higher costs for more ex- eters. This comment is of course, not PostgreSQL-specific, since pensive queries. Naturally, any monotonically increasing function other systems feature similarly complex cost models. In general, would satisfy that requirement, but the linear model provides the tuning and calibrating cost models (based on sampling, various ma- simplest and the closest fit to the observed data. We can therefore chine learning techniques etc.) has been a subject of a number of interpret the deviation from this line as the prediction error of the papers (e.g, [42, 25]). It is important, therefore, to investigate the cost model. Specifically, we consider the absolute percentage error impact of the cost model on the overall query engine performance. |T (Q)−Tpred (Q)| This will indirectly show the contribution of cost model errors on of a cost model for a query Q: (Q) = real Treal (Q) , where query performance. Treal is the observed runtime, and Tpred is the runtime predicted by our linear model. Using the default cost model of PostgreSQL and the true cardinalities, the median error of the cost model is 38%. 5.2 Cost and Runtime The main virtue of a cost function is its ability to predict which 5.3 Tuning the Cost Model for Main Memory of the alternative query plans will be the fastest, given the cardinal- As mentioned above, a cost model typically involves parame- ity estimates; in other words, what counts is its correlation with the ters that are subject to tuning by the database administrator. In a query runtime. The correlation between the cost and the runtime of disk-based system such as PostgreSQL, these parameters can be queries in PostgreSQL is shown in Figure 8a. Additionally, we con- grouped into CPU cost parameters and I/O cost parameters, with sider the case where the engine has the true cardinalities injected, the default settings reflecting an expected proportion between these and plot the corresponding data points in Figure 8b. For both plots, two classes in a hypothetical workload. we fit the linear regression model (displayed as a straight line) and In many settings the default values are sub optimal. For example, highlight the standard error. The predicted cost of a query corre- the default parameter values in PostgreSQL suggest that process- lates with its runtime in both scenarios. Poor cardinality estimates, ing a tuple is 400x cheaper than reading it from a page. However, however, lead to a large number of outliers and a very wide stan- modern servers are frequently equipped with very large RAM ca- dard error area in Figure 8a. Only using the true cardinalities makes pacities, and in many workloads the data set actually fits entirely 211

9.into available memory (admittedly, the core of PostgreSQL was JOB 6a JOB 13a JOB 16d JOB 17b JOB 25c shaped decades ago when database servers only had few megabytes of RAM). This does not eliminate the page access costs entirely no indexes (due to buffer manager overhead), but significantly bridges the gap between the I/O and CPU processing costs. Arguably, the most important change that needs to be done in the cost model for a main-memory workload is to decrease the propor- tion between these two groups. We have done so by multiplying the PK indexes CPU cost parameters by a factor of 50. The results of the workload run with improved parameters are plotted in the two middle subfig- ures of Figure 8. Comparing Figure 8b with d, we see that tuning does indeed improve the correlation between the cost and the run- time. On the other hand, as is evident from comparing Figure 8c and d, parameter tuning improvement is still overshadowed by the PK + FK indexes difference between the estimated and the true cardinalities. Note that Figure 8c features a set of outliers for which the optimizer has accidentally discovered very good plans (runtimes around 1 ms) without realizing it (hence very high costs). This is another sign of “oscillation” in query planning caused by cardinality misestimates. 1 1e2 1e3 1e4 1 1e2 1e3 1e4 1 1e2 1e3 1e4 1 1e2 1e3 1e4 1 1e2 1e3 1e4 In addition, we measure the prediction error of the tuned cost cost relative to optimal FK plan [log scale] model, as defined in Section 5.2. We observe that tuning improves the predictive power of the cost model: the median error decreases Figure 9: Cost distributions for 5 queries and different index from 38% to 30%. configurations. The vertical green lines represent the cost of the optimal plan 5.4 Are Complex Cost Models Necessary? As discussed above, the PostgreSQL cost model is quite com- plex. Presumably, this complexity should reflect various factors than the built-in cost function. This improvement is not insignifi- influencing query execution, such as the speed of a disk seek and cant, but on the other hand, it is dwarfed by improvement in query read, CPU processing costs, etc. In order to find out whether this runtime observed when we replace estimated cardinalities with the complexity is actually necessary in a main-memory setting, we will real ones (cf. Figure 6b). This allows us to reiterate our main mes- contrast it with a very simple cost function Cmm . This cost func- sage that cardinality estimation is much more crucial than the cost tion is tailored for the main-memory setting in that it does not model model. I/O costs, but only counts the number of tuples that pass through each operator during query execution: 6. PLAN SPACE  τ · |R| if T = R ∨ T = σ(R) Besides cardinality estimation and the cost model, the final im- portant query optimization component is a plan enumeration algo-   |T | + C (T ) + C (T ) if T = T  HJ mm mm T2 Cmm (T ) = 1 2 1 rithm that explores the space of semantically equivalent join orders.   Cmm (T1 )+ if T = T1 INL T2 , Many different algorithms, both exhaustive (e.g., [29, 12]) as well λ · |T1 | · max( |T|T 1 R| as heuristic (e.g, [37, 32]) have been proposed. These algorithms  , 1) (T2 = R ∨ T2 = σ(R))  1| consider a different number of candidate solutions (that constitute In the formula above R is a base relation, and τ ≤ 1 is a pa- the search space) when picking the best plan. In this section we rameter that discounts the cost of a table scan in comparison with investigate how large the search space needs to be in order to find a joins. The cost function distinguishes between hash HJ and index- good plan. nested loop INL joins: the latter scans T1 and performs index The experiments of this section use a standalone query optimizer, lookups into an index on R, thus avoiding a full table scan of R. which implements Dynamic Programming (DP) and a number of A special case occurs when there is a selection on the right side of heuristic join enumeration algorithms. Our optimizer allows the in- the index-nested loop join, in which case we take into account the jection of arbitrary cardinality estimates. In order to fully explore number of tuple lookups in the base table index and essentially dis- the search space, we do not actually execute the query plans pro- card the selection from the cost computation (hence the multiplier duced by the optimizer in this section, as that would be infeasible max( |T|T 1 R| 1| , 1)). For index-nested loop joins we use the constant due to the number of joins our queries have. Instead, we first run λ ≥ 1 to approximate by how much an index lookup is more ex- the query optimizer using the estimates as input. Then, we recom- pensive than a hash table lookup. Specifically, we set λ = 2 and pute the cost of the resulting plan with the true cardinalities, giving τ = 0.2. As in our previous experiments, we disable nested loop us a very good approximation of the runtime the plan would have joins when the inner relation is not an index lookup (i.e., non-index in reality. We use the in-memory cost model from Section 5.4 and nested loop joins). assume that it perfectly predicts the query runtime, which, for our The results of our workload run with Cmm as a cost function are purposes, is a reasonable assumption since the errors of the cost depicted in Figure 8e and f. We see that even our trivial cost model model are negligible in comparison the cardinality errors. This ap- is able to fairly accurately predict the query runtime using the true proach allows us to compare a large number of plans without exe- cardinalities. To quantify this argument, we measure the improve- cuting all of them. ment in the runtime achieved by changing the cost model for true cardinalities: In terms of the geometric mean over all queries, our 6.1 How Important Is the Join Order? tuned cost model yields 41% faster runtimes than the standard Post- We use the Quickpick [40] algorithm to visualize the costs of greSQL model, but even a simple Cmm makes queries 34% faster different join orders. Quickpick is a simple, randomized algorithm 212

10.that picks joins edges at random until all joined relations are fully PK indexes PK + FK indexes connected. Each run produces a correct, but usually slow, query median 95% max median 95% max plan. By running the algorithm 10,000 times per query and com- zig-zag 1.00 1.06 1.33 1.00 1.60 2.54 puting the costs of the resulting plans, we obtain an approximate left-deep 1.00 1.14 1.63 1.06 2.49 4.50 distribution for the costs of random plans. Figure 9 shows density right-deep 1.87 4.97 6.80 47.2 30931 738349 plots for 5 representative example queries and for three physical database designs: no indexes, primary key indexes only, and pri- Table 2: Slowdown for restricted tree shapes in comparison to mary+foreign key indexes. The costs are normalized by the opti- the optimal plan (true cardinalities) mal plan (with foreign key indexes), which we obtained by running dynamic programming and the true cardinalities. The graphs, which use a logarithmic scale on the horizontal cost trees are worse than zig-zag trees, as expected, but still result in axis, clearly illustrate the importance of the join ordering problem: reasonable performance. Right-deep trees, on the other hand, per- The slowest or even median cost is generally multiple orders of form much worse than the other tree shapes and thus should not be magnitude more expensive than the cheapest plan. The shapes of used exclusively. The bad performance of right-deep trees is caused the distributions are quite diverse. For some queries, there are many by the large intermediate hash tables that need to be created from good plans (e.g., 25c), for others few (e.g., 16d). The distribution each base relation and the fact that only the bottom-most join can are sometimes wide (e.g., 16d) and sometimes narrow (e.g., 25c). be done via index lookup. The plots for the “no indexes” and the “PK indexes” configurations are very similar implying that for our workload primary key in- 6.3 Are Heuristics Good Enough? dexes alone do not improve performance very much, since we do So far in this paper, we have used the dynamic programming not have selections on primary key columns. In many cases the algorithm, which computes the optimal join order. However, given “PK+FK indexes” distributions have additional small peaks on the the bad quality of the cardinality estimates, one may reasonably ask left side of the plot, which means that the optimal plan in this index whether an exhaustive algorithm is even necessary. We therefore configuration is much faster than in the other configurations. compare dynamic programming with a randomized and a greedy We also analyzed the entire workload to confirm these visual ob- heuristics. servations: The percentage of plans that are at most 1.5× more The “Quickpick-1000” heuristics is a randomized algorithm that expensive than the optimal plan is 44% without indexes, 39% with chooses the cheapest (based on the estimated cardinalities) 1000 primary key indexes, but only 4% with foreign key indexes. The random plans. Among all greedy heuristics, we pick Greedy Op- average fraction between the worst and the best plan, i.e., the width erator Ordering (GOO) since it was shown to be superior to other of the distribution, is 101× without indexes, 115× with primary deterministic approximate algorithms [11]. GOO maintains a set key indexes, and 48120× with foreign key indexes. These sum- of join trees, each of which initially consists of one base relation. mary statistics highlight the dramatically different search spaces of The algorithm then combines the pair of join trees with the lowest the three index configurations. cost to a single join tree. Both Quickpick-1000 and GOO can pro- duce bushy plans, but obviously only explore parts of the search 6.2 Are Bushy Trees Necessary? space. All algorithms in this experiment internally use the Post- Most join ordering algorithms do not enumerate all possible tree greSQL cardinality estimates to compute a query plan, for which shapes. Virtually all optimizers ignore join orders with cross prod- we compute the “true” cost using the true cardinalities. ucts, which results in a dramatically reduced optimization time with Table 3 shows that it is worthwhile to fully examine the search only negligible query performance impact. Oracle goes even fur- space using dynamic programming despite cardinality misestima- ther by not considering bushy join trees [1]. In order to quantify tion. However, the errors introduced by estimation errors cause the effect of restricting the search space on query performance, we larger performance losses than the heuristics. In contrast to some modified our DP algorithm to only enumerate left-deep, right-deep, other heuristics (e.g., [5]), GOO and Quickpick-1000 are not re- or zig-zag trees. ally aware of indexes. Therefore, GOO and Quickpick-1000 work Aside from the obvious tree shape restriction, each of these better when few indexes are available, which is also the case when classes implies constraints on the join method selection. We fol- there are more good plans. low the definition by Garcia-Molina et al.’s textbook, which is re- To summarize, our results indicate that enumerating all bushy verse from the one in Ramakrishnan and Gehrke’s book: Using trees exhaustively offers moderate but not insignificant performance hash joins, right-deep trees are executed by first creating hash ta- benefits in comparison with algorithms that enumerate only a sub bles out of each relation except one before probing in all of these set of the search space. The performance potential from good car- hash tables in a pipelined fashion, whereas in left-deep trees, a new dinality estimates is certainly much larger. However, given the ex- hash table is built from the result of each join. In zig-zag trees, istence of exhaustive enumeration algorithms that can find the opti- which are a super set of all left- and right-deep trees, each join mal solution for queries with dozens of relations very quickly (e.g., operator must have at least one base relation as input. For index- [29, 12]), there are few cases where resorting to heuristics or dis- nested loop joins we additionally employ the following convention: abling bushy trees should be necessary. the left child of a join is a source of tuples that are looked up in the index on the right child, which must be a base table. Using the true cardinalities, we compute the cost of the optimal 7. RELATED WORK plan for each of the three restricted tree shapes. We divide these Our cardinality estimation experiments show that systems which costs by the optimal tree (which may have any shape, including keep table samples for cardinality estimation predict single-table “bushy”) thereby measuring how much performance is lost by re- result sizes considerably better than those which apply the inde- stricting the search space. The results in Table 2 show that zig-zag pendence assumption and use single-column histograms [20]. We trees offer decent performance in most cases, with the worst case think systems should be adopting table samples as a simple and ro- being 2.54× more expensive than the best bushy plan. Left-deep bust technique, rather than earlier suggestions to explicitly detect 213

11. PK indexes PK + FK indexes PostgreSQL estimates true cardinalities PostgreSQL estimates true cardinalities median 95% max median 95% max median 95% max median 95% max Dynamic Programming 1.03 1.85 4.79 1.00 1.00 1.00 1.66 169 186367 1.00 1.00 1.00 Quickpick-1000 1.05 2.19 7.29 1.00 1.07 1.14 2.52 365 186367 1.02 4.72 32.3 Greedy Operator Ordering 1.19 2.29 2.36 1.19 1.64 1.97 2.35 169 186367 1.20 5.77 21.0 Table 3: Comparison of exhaustive dynamic programming with the Quickpick-1000 (best of 1000 random plans) and the Greedy Operator Ordering heuristics. All costs are normalized by the optimal plan of that index configuration certain correlations [19] to subsequently create multi-column his- Our experiments with the second query optimizer component be- tograms [34] for these. sides cardinality estimation, namely the cost model, suggest that However, many of our JOB queries contain join-crossing cor- tuning cost models provides less benefits than improving cardi- relations, which single-table samples do not capture, and where nality estimates, and in a main-memory setting even an extremely the current generation of systems still apply the independence as- simple cost-model can produce satisfactory results. This conclu- sumption. There is a body of existing research work to better esti- sion resonates with some of the findings in [42] which sets out to mate result sizes of queries with join-crossing correlations, mainly improve cost models but shows major improvements by refining based on join samples [17], possibly enhanced against skew (end- cardinality estimates with additional sampling. biased sampling [10], correlated samples [43]), using sketches [35] For testing the final query optimizer component, plan enumera- or graphical models [39]. This work confirms that without ad- tion, we borrowed in our methodology from the Quickpick method dressing join-crossing correlations, cardinality estimates deterio- used in randomized query optimization [40] to characterize and vi- rate strongly with more joins [21], leading to both the over- and sualize the search space. Another well-known search space visu- underestimation of result sizes (mostly the latter), so it would be alization method is Picasso [18], which visualizes query plans as positive if some of these techniques would be adopted by systems. areas in a space where query parameters are the dimensions. Inter- Another way of learning about join-crossing correlations is by estingly, [40] claims in its characterization of the search space that exploiting query feedback, as in the LEO project [38], though there good query plans are easily found, but our tests indicate that the it was noted that deriving cardinality estimations based on a mix of richer the physical design and access path choices, the rarer good exact knowledge and lack of knowledge needs a sound mathemat- query plans become. ical underpinning. For this, maximum entropy (MaxEnt [28, 23]) Query optimization is a core database research topic with a huge was defined, though the costs for applying maximum entropy are body of related work, that cannot be fully represented in this sec- high and have prevented its use in systems so far. We found that tion. After decades of work still having this problem far from re- the performance impact of estimation mistakes heavily depends on solved [26], some have even questioned it and argued for the need the physical database design; in our experiments the largest impact of optimizer application hints [6]. This paper introduces the Join is in situations with the richest designs. From the ROX [22] dis- Order Benchmark based on the highly correlated IMDB real-world cussion in Section 4.4 one might conjecture that to truly unlock data set and a methodology for measuring the accuracy of cardinal- the potential of correctly predicting cardinalities for join-crossing ity estimation. Its integration in systems proposed for testing and correlations, we also need new physical designs and access paths. evaluating the quality of query optimizers [41, 16, 14, 27] is hoped Another finding in this paper is that the adverse effects of cardi- to spur further innovation in this important topic. nality misestimations can be strongly reduced if systems would be “hedging their bets” and not only choose the plan with the cheapest expected cost, but take the probabilistic distribution of the estimate into account, to avoid plans that are marginally faster than others 8. CONCLUSIONS AND FUTURE WORK but bear a high risk of strong underestimation. There has been work In this paper we have provided quantitative evidence for conven- both on doing this for cardinality estimates purely [30], as well as tional wisdom that has been accumulated in three decades of prac- combining these with a cost model (cost distributions [2]). tical experience with query optimizers. We have shown that query The problem with fixed hash table sizes for PostgreSQL illus- optimization is essential for efficient query processing and that ex- trates that cost misestimation can often be mitigated by making the haustive enumeration algorithms find better plans than heuristics. runtime behavior of the query engine more “performance robust”. We have also shown that relational database systems produce large This links to a body of work to make systems adaptive to estima- estimation errors that quickly grow as the number of joins increases, tion mistakes, e.g., dynamically switch sides in a join, or change and that these errors are usually the reason for bad plans. In con- between hashing and sorting (GJoin [15]), switch between sequen- trast to cardinality estimation, the contribution of the cost model to tial scan and index lookup (smooth scan [4]), adaptively reordering the overall query performance is limited. join pipelines during query execution [24], or change aggregation Going forward, we see two main routes for improving the plan strategies at runtime depending on the actual number of group-by quality in heavily-indexed settings. First, database systems can in- values [31] or partition-by values [3]. corporate more advanced estimation algorithms that have been pro- A radical approach is to move query optimization to runtime, posed in the literature. The second route would be to increase the when actual value-distributions become available [33, 9]. However, interaction between the runtime and the query optimizer. We leave runtime techniques typically restrict the plan search space to limit the evaluation of both approaches for future work. runtime plan exploration cost, and sometimes come with functional We encourage the community to use the Join Order Benchmark restrictions such as to only consider (sampling through) operators as a test bed for further experiments, for example into the risk/re- which have pre-created indexed access paths (e.g., ROX [22]). ward tradeoffs of complex access paths. Furthermore, it would be interesting to investigate disk-resident and distributed databases, which provide different challenges than our main-memory setting. 214

12.Acknowledgments [22] R. A. Kader, P. A. Boncz, S. Manegold, and M. van Keulen. We would like to thank Guy Lohman and the anonymous reviewers ROX: run-time optimization of XQueries. In SIGMOD, for their valuable feedback. We also thank Moritz Wilfer for his pages 615–626, 2009. input in the early stages of this project. [23] R. Kaushik, C. R´e, and D. Suciu. General database statistics using entropy maximization. In DBPL, pages 84–99, 2009. 9. REFERENCES [24] Q. Li, M. Shao, V. Markl, K. S. Beyer, L. S. Colby, and [1] R. Ahmed, R. Sen, M. Poess, and S. Chakkappen. Of G. M. Lohman. Adaptively reordering joins during query snowstorms and bushy trees. PVLDB, 7(13):1452–1461, execution. In ICDE, pages 26–35, 2007. 2014. [25] F. Liu and S. Blanas. Forecasting the cost of processing [2] B. Babcock and S. Chaudhuri. Towards a robust query multi-join queries via hashing for main-memory databases. optimizer: A principled and practical approach. In SIGMOD, In SoCC, pages 153–166, 2015. pages 119–130, 2005. [26] G. Lohman. Is query optimization a solved problem? [3] S. Bellamkonda, H.-G. Li, U. Jagtap, Y. Zhu, V. Liang, and http://wp.sigmod.org/?p=1075, 2014. T. Cruanes. Adaptive and big data scale parallel execution in [27] L. F. Mackert and G. M. Lohman. R* optimizer validation Oracle. PVLDB, 6(11):1102–1113, 2013. and performance evaluation for local queries. In SIGMOD, [4] R. Borovica-Gajic, S. Idreos, A. Ailamaki, M. Zukowski, pages 84–95, 1986. and C. Fraser. Smooth scan: Statistics-oblivious access [28] V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, paths. In ICDE, pages 315–326, 2015. and U. Srivastava. Consistently estimating the selectivity of [5] N. Bruno, C. A. Galindo-Legaria, and M. Joshi. Polynomial conjuncts of predicates. In VLDB, pages 373–384, 2005. heuristics for query optimization. In ICDE, pages 589–600, [29] G. Moerkotte and T. Neumann. Dynamic programming 2010. strikes back. In SIGMOD, pages 539–552, 2008. [6] S. Chaudhuri. Query optimizers: time to rethink the [30] G. Moerkotte, T. Neumann, and G. Steidl. Preventing bad contract? In SIGMOD, pages 961–968, 2009. plans by bounding the impact of cardinality estimation [7] S. Chaudhuri, V. R. Narasayya, and R. Ramamurthy. Exact errors. PVLDB, 2(1):982–993, 2009. cardinality query optimization for optimizer testing. PVLDB, [31] I. M¨uller, P. Sanders, A. Lacurie, W. Lehner, and F. F¨arber. 2(1):994–1005, 2009. Cache-efficient aggregation: Hashing is sorting. In [8] M. Colgan. Oracle adaptive joins. SIGMOD, pages 1123–1136, 2015. https://blogs.oracle.com/optimizer/ [32] T. Neumann. Query simplification: graceful degradation for entry/what_s_new_in_12c, 2013. join-order optimization. In SIGMOD, pages 403–414, 2009. [9] A. Dutt and J. R. Haritsa. Plan bouquets: query processing [33] T. Neumann and C. A. Galindo-Legaria. Taking the edge off without selectivity estimation. In SIGMOD, pages cardinality estimation errors using incremental execution. In 1039–1050, 2014. BTW, pages 73–92, 2013. [10] C. Estan and J. F. Naughton. End-biased samples for join [34] V. Poosala and Y. E. Ioannidis. Selectivity estimation without cardinality estimation. In ICDE, page 20, 2006. the attribute value independence assumption. In VLDB, [11] L. Fegaras. A new heuristic for optimizing large queries. In pages 486–495, 1997. DEXA, pages 726–735, 1998. [35] F. Rusu and A. Dobra. Sketches for size of join estimation. [12] P. Fender and G. Moerkotte. Counter strike: Generic TODS, 33(3), 2008. top-down join enumeration for hypergraphs. PVLDB, [36] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. 6(14):1822–1833, 2013. Lorie, and T. G. Price. Access path selection in a relational [13] P. Fender, G. Moerkotte, T. Neumann, and V. Leis. Effective database management system. In SIGMOD, pages 23–34, and robust pruning for top-down join enumeration 1979. algorithms. In ICDE, pages 414–425, 2012. [37] M. Steinbrunn, G. Moerkotte, and A. Kemper. Heuristic and [14] C. Fraser, L. Giakoumakis, V. Hamine, and K. F. randomized optimization for the join ordering problem. Moore-Smith. Testing cardinality estimation models in SQL VLDB J., 6(3):191–208, 1997. Server. In DBtest, 2012. [38] M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - [15] G. Graefe. A generalized join algorithm. In BTW, pages DB2’s learning optimizer. In VLDB, pages 19–28, 2001. 267–286, 2011. [39] K. Tzoumas, A. Deshpande, and C. S. Jensen. Lightweight [16] Z. Gu, M. A. Soliman, and F. M. Waas. Testing the accuracy graphical models for selectivity estimation without of query optimizers. In DBTest, 2012. independence assumptions. PVLDB, 4(11):852–863, 2011. [17] P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. [40] F. Waas and A. Pellenkoft. Join order selection - good Selectivity and cost estimation for joins based on random enough is easy. In BNCOD, pages 51–67, 2000. sampling. J Computer System Science, 52(3):550–569, 1996. [41] F. M. Waas, L. Giakoumakis, and S. Zhang. Plan space [18] J. R. Haritsa. The Picasso database query optimizer analysis: an early warning system to detect plan regressions visualizer. PVLDB, 3(2):1517–1520, 2010. in cost-based optimizers. In DBTest, 2011. [19] I. F. Ilyas, V. Markl, P. J. Haas, P. Brown, and A. Aboulnaga. [42] W. Wu, Y. Chi, S. Zhu, J. Tatemura, H. Hacig¨um¨us, and J. F. CORDS: automatic discovery of correlations and soft Naughton. Predicting query execution time: Are optimizer functional dependencies. In SIGMOD, pages 647–658, 2004. cost models really unusable? In ICDE, pages 1081–1092, 2013. [20] Y. E. Ioannidis. The history of histograms (abridged). In VLDB, pages 19–30, 2003. [43] F. Yu, W. Hou, C. Luo, D. Che, and M. Zhu. CS2: a new database synopsis for query estimation. In SIGMOD, pages [21] Y. E. Ioannidis and S. Christodoulakis. On the propagation of 469–480, 2013. errors in the size of join results. In SIGMOD, 1991. 215