关系数据库管理系统中的访问路径选择

查询优化的基础知识。 SQL是声明性的,即你所使用的是查询语言指定所需的数据,并不是你想要的数据进行查询。 通常有多种方式(query plans)执行查询。数据库系统检查多个计划并确定最佳计划,此过程称为查询优化。进行查询优化的传统方法是为不同的访问方法和查询计划建立成本模型(cost-model)。本文介绍了成本模型和动态编程算法,用以选择最佳计划。
展开查看详情

1. Access Path Selection in a Relational Database Management System P. Griffiths Selinger M. M. Astrahan D. D. Chamberlin ‘,.::' It. A. Lorie T. G. Price 4: IBM Research Division, San Jose, California 95193 ABSTRACT: In a high level query and data retrieval. Nor does a user specify in what manipulation language such as SQL, requests order joins are to be performed. The are stated non-procedurally, without System R optimizer .chooses both join order reference to access paths. This paper and an access path for each table in the describes how System R chooses access paths SQL statement. Of the many possible for both simple (single relation) and choices, the optimizer chooses the one complex queries (such as joins), given a which minimizes "total access cost" for user specification of desired data as a performing the entire statement. boolean expression of predicates. System R is an experimental database management This paper will address the issues of system developed to carry out research on access path selection for queries. the relational model of data. System R was Retrieval for data manipulation (UPDATE, designed and built by members of the IBM DELETE) is treated similarly. Section 2 San Jose Research'Laboratory. will describe the place of the optimizer in the processing of a SQL statement, and 1. Introduction section 3 will describe the storage compo- nent access paths that are available on a System' R is an experimental database single physically stored table. In section management system based on the relational 4 the optimizer cost formulas are intro- model of data which has been under develop- duced for single table queries, and section ment at the IBM San Jose Research Laborato- 5 discusses the joining of two or more ry since 1975 Cl>. The software was tables, and their corresponding costs. . developed as a research vehicle in rela- Nested queries (queries in predicates) are tional database, and is not generally covered in section 6. available outside the IBM Research Divi- sion. 2. processi.Bg & B.B u statement This paper assumes familiarity with A SQL statement is subjected to four relational data model terminology as phases of processing. Depending on 'the described in Codd <7> and Date <a>. The origin and contents of the statement., these user interface in System R is the unified phases may be separated by arbitrary query, data definition, and manipulation intervals. of time. In System RI these language SQL <5>. Statements in SQL can be arbitrary time intervals are transparent to issued both from an on-line casual-user-or- the system components which process a SQL iented terminal interface and from program- statement. These mechanisms and a descrip- ming languages such as PL/I and COBOL. tion of the processing of SQL statements from both programs and terminals are In System R a user need not know how further discussed in <2>. Only an overview the tuples are physically stored and what of those processing steps that are relevant access paths are available (e.g. which to access path selection will be discussed columns have indexes). SQL statements do here. not require the user to specify anything about the access path to be used for tuple The four phases of statement processing are-parsing, optimization. code generation. Permission to copy without fee all or part of this and execution. Each SQL statement is sent material is granted provided that the copies are to , the parser. where it is checked for not made or distributed for direct couunercial ad- correct syntax. A guery block is repre- vantage, the ACMcopyright notice and the title of sented by a SELECT list, a FROM list, and a the publication and its date appear, and notice is WHERE tree, containing, respectively the given that copying is by permission of the Associa- list of .items to be retrieved, the table(s) tion for Computing Machinery. To copy otherwise, referenced, and the boolean combination of or to republish, requires a fee end/or specific simple predicates specified by the user. A permission. single SQL statement may have many query 01979 ACM0-89791-001-X/79/0500-0023 $00.75 blocks because a predicate may have one 23

2.operand which is itself a query. 3. _T'he Research Storaae System If the parser returns without any The Research Storage System (RSSI is errors detected, the OPTIMIZER component is the storage subsystem of System R. It is called. The OPTIMIZER accumulates the responsible for maintaining Physical names of tables and columns referenced in storage of relations, access paths on these the query and looks them up in the System R relations, locking (in a multi-user envi- catalogs to verify their existence and to ronment), and logging and recovery facili- retrieve information about them. ties. The RSS presents a tuple-oriented interface (RSII to its users. Although the The catalog lookup portion of the RSS may be used independently of System R, OPTIMIZER also obtains statistics about the we are concerned here with its use for referenced relations, and the access paths executing the code generated by the proces- ‘available on each of them. These will be sing of SQL statements in System R, as used later in access path selection. After described in the previous section. For a catalog lookup has obtained the datatype complete description of the RSS, see <l>. and length of each column, the OPTIMIZER rescans the SELECT-list and WHERE-tree to Relations are stored in the RSS as a check for semantic errors and type compati- collection of tuples whose columns are bility in both expressions and predicate physically contiguous. These tuples are comparisons. stored on 4K byte pages; no tuple spans a page. Pages are organized into logical units called segments. Segments may contain one or more relations, but no Finally the OPTIMIZER performs access relation rn%y span a segment. Tuples from path selection. It first determines the two or more relations may occur on the same evaluation order among the query blocks in page. Each tuple is tagged with the the statement. Then for each query block, identification of the relation to which it the relations in the FROM list are belongs. processed. If there is more than one relation in a block, permutations of the The'primary way of accessing tuples in join order and of the method of joining are a relation is via an RSS scan. A scan evaluated. The access paths that minimize returns a tuple at a time along a given total cost for the block are chosen from a access path. OPEN, NEXT, and CLOSE are the tree of alternate path choices. This principal commands on a scan. minimum cost .solution is represented by a structural modification of the parse tree. Two types of scans are currently The result is an execution plan in the available for SQL statements. The first Access Specification Language (ASLI <lo>. type is a segment scan to find all the tuples of a given relation. A series of After a plan is chosen for each query NEXTs on a segment scan simply examines all block and represented in the parse tree, pages of the segment which contain tuples, the CODE GENERATOR is called. The CODE from any relation, and returns those tuples GENERATOR is a table-driven program which belonging to the given relation. translates ASL trees into machine language code to execute the plan chosen by the The second type of scan is an index OPTIMIZER. In doing this it uses a rela- scan. An index may be created by a.Sy.stem tively small number of code templates, one R user on one or more columns of a rela- for each type of join method (including no tion, and a relation may have any number join). Query blocks for nested queries are (including zero1 of indexes on it. These treated as "subroutines" which return indexes are stored on separate pages from values to the predicates in which they those containing the relation tuples. occur. The CODE GENERATOR is further Indexes are implemented as B-trees <3>, described in <9>. whose leaves are pages containing sets of (key, identifiers of tuples .. which contain During code generation, the parse tree that key). Therefore a series of NEXTs on is replaced by executable machine code and an index scan does a sequential read along its associated data structures. Either the leaf Pages of the index, obtaining the control is immediately transfered to this tuple identifiers matching a key, and using code or the code is stored away in the them to find and return the data tuples to database for later execution, depending on the user in key value order. Index leaf the origin of the statement (program or pages are chained together so that NEXTs terminal). In either case, when the code needinot reference any upper level Pages Of is ultimately enecuted, it calls upon the the i,ndex. System R internal storage system (RSS) via the storage system interface (RSII to scan In a segment scan, all the non-empty each of the physically stored relations in page5 of a segment will be touched. regard- the query. These scans are along the less of whether there are any tuples from access paths chosen by the OPTIMIZER. The the desired relation on them. However, RSI commands that may be used by generated each page is touched only once. When an code are described in the next section. entire relation is enamined via an index scan, each page of the index is touched 24

3. only once, but 'a data page may be examined considered to be in conjunctive normal more than once if it has two tuples on it form, and every conjunct is called a which are not "close" in the index order- boolean factoy. Boolean factors are ing. If the tuples are inserted into notable because every tuple returned to the segment pages in the index 'ordering, and if user must satisfy every boolean factor. An this physical proximity corresponding to index is said to match a boolean factor i,f index key value is maintained, we say that the boolean factor is a sargable predicate the index is clustered. A clustered index whose referenced column is the index key; has the property that not only each index e.g., an index on SALARY matches the paw P but also each data page containing a predicate 'SALARY = 20000'. More precise- tuple from that relation will be touched ly, we say that a predicate or set of only once in a scan on that index. ,.. predicates matches an index access path .: when the predicates are sargable and the An index scan need not scan the *entire columns mentioned in the predicate(s) are relation. Starting and stopping key values an initial substring of the set of columns may be specified in order to scan only of the index key. For example. a NAME, those tuples which have a key in a range of LOCATION index matches NAME = 'SMITH' AND index values. Both index and segment scans LOCATION = 'SAN JOSE'. If an index matches may optionally take a set of predicates, a boolean factor, an access using that called search arguments (or SARGS), which index is an efficient way to satisfy the are applied to a tuple before it is boolean factor. Sargable boolean factors returned to the RSI caller. If the tuple can also be efficiently satisfied if they satisfies the predicates, it is returned; are expressed as search arguments. Note otherwise the scan continues until it that a boolean factor may be an entire tree either finds a tuple which satisfies the of predicates headed by an OR. SARGS or exhausts the segment or the specified index value range. This reduces During catalog lookup, the OPTIIlIZER cost by eliminating the overhead of making retrieves statistics on the relations in RSI calls for tuples which can be effi- the query and on the access paths available ciently rejected within the RSS. Not all on each relation. The statistics kept are predicates are of the form that can become the following: SARGS. A sm predicate is one of the form (or which can be put into the form) For each relation T. ncolumn comparison-operator value". SARGS - NCARD(TIr the cardinality of relation T. are expressed as a boolean expression of - TCARDfT). the number of pages in the such predicates in disjunctive normal form. segment that hold tuples of relation T. - P(T), the fraction of data pages in the f 4. Costs fox sinqle relation access paths segment that hold tuples of relation T. P(T) = TCARD(T1 / (no. of non-empty In the next 'several sections we will pages in the segment). describe the process of choosing a plan for evaluating a query. We will first describe For each index I on relation T, the simplest case, accessing a single - ICARD( number of distinct keys in relation, and show how it extends and index I. generalizes to t-way joins of relations, - NINDXfIlr the number of pages in index I. n-way joins, and finally multiple query blocks (nested queries). These statistics are maintained in the System R catalogs, and come from several The OPTIMIZER examines both the predi- sources. Initial relation loading and cates in the query and the access paths index creation ihitialize these statistics. available on the relations referenced by They are then updated periodically by an the queryI and.formu1ate.s a cost prediction UPDATE STATISTICS command, which can be run for each access plan, using the following by any user. System R does not update cost formula: these statistics at every INSERT, DELETE, COST = PAGE -FETCHES + W * (RSI CALLS). or UPDATE because of the extra database This cost is' a weighted measure of I/O operations and the locking bottleneck th,is (pages fetched) and CPU utilization would create at the system catalogs. (instructions executed). W is an adjusta- Dynamic updating of statistics would tend ble weighting factor between I/O and CPU. to serialize accesses that modify the RSI CALLS is the predicted number 0.f tuples relation contents. returned from the RSS. Since most of System R's CPU time is spent in the RSS, Using these statistics, the OPTIMIZER the number of RSI calls is a good approxi- assigns a selectivity factor 'F' for each mation for CPU utilization. Thus the boolean factor in the predicate list. This choice of a minimum cost path to process a selectivity factor very roughly corresponds query attempts to minimize total resources tb the expected fraction of tuples which required. wk 11 satisfy the predicate. TABLE 1 gives the selectivity factors for different kinds During execution of the type-compati- of predicates. We assume that a lack of bility and semantic checking portion of the statistics implies that the relation is OPTIMIZER, each query block's WHERE tree of small, so an arbitrary factor is chosen. predicates is examined. The WHERE tree is 25

4. TABLE 1 SELECTIVITY FACTORS column = value . F = 1 / ICARDfcolumn index) if there is an index on column This assumes an even distribution of tuples among the index key values. F = l/10 otherwise column1 = column2 F = l/MAX(ICARD(columnl index), ICARD(column2 index)) if there are indexes on both column1 and column2 This assumes that each key value in the index with the smaller cardinality has a matching value in the other index. F = l/ICARD(col.umn-i index) if there is only an index on Column-i F = l/l0 otherwise column > value (or any other open-ended comparison) F = (high key value - value) / (high key value - low key value)‘ Linear interpolation of the value within the range of key values yields F if the column is an arithmetic type and value is kno,wn at access path selection time. F = l/3 otherwise (i.e. column not arithmetic) There is no significance to this number, other than the fact that it is l’ess selective than the guesses for equal predicates for which there are no indexes, and that it is less than l/2. We hypothesize that few queries use predicates that are satisfied by more than half the tuples. column BETWEEN value1 AND value2 F = (value? -. value11 / (high key value - low key value) A ratio of the BETWEEN value range to the entire key value range is used as the selectivity factor if column is arithmetic and both value1 and value2 are known at access path selection. F = l/4 otherwise Again there is no significance to this choice except that it is between the default selectivity factors for an equal predicate and a range p.redicate. column IN (list of values) F = (number of items in list) * (selectivity factor for column = value 1 This is allowed to be no more than l/2. columnA IN subquery F = (expected cardinality of the subquery result) / (product of the cardinalities of all the relations in the subquery’s FROM-list). The computation of query cardinality will be discussed below. This formula is derived by the following argument: Consider the simplest case, where subquery is of the form “SELECT columnB FROM relationc . ..“. Assume that the set of all columnB values in relationc contains the set of all columnA values. If all the tuples of relationc are selected by the subquery, then the predicate is always TRUE and F = 1. If the tuples of the subquery are restricted by a selectivity factor F’, then assume that the set of unique values in the subquery result that match columnA values is proportionately restricted, i.e. the selectivity factor for the predicate should be F’. F’ is the product of all the subquery’s selectivity factors. namely (subquery cardinality) / (cardinality of all possible subquery answers). With a little optimism, we can extend this reasoning to include sifbqueries which are joins and subqueries in which columnB is replabed by an arithmetic expression involving column names. This leads to the formula given above. fpred expression11 OR (pred expression2) F = F(pred1) + F(pred2) - Ffpredll * F(pred21 26

5. (predll AND (predtl F = Ffpredl) * F(pred27 Note that this assumes that column values are independent. NOT pred F = 1 - Ffpredl Query cardinality (QCARD) is the product of relation query, we need only to examine the the cardinalities of every relation in the cheapest access path which produces tuples query block's FROM list times the product in each "interesting" order and the cheap- of all the selectivity factors of that est "unordered" access path. Note that an query block's boolean factors. The number "unordered" access path may in fact produce of expected RSI calls (RSICARD) i* the tuples in some order, but the order is not product of the relation cardinalities times "interesting". If there are no GROUP BY or the selectivity factors of the s_arsablq ORDER BY clauses on the query, then there boolean factors, since the sargable boolean will be no interesting orderings, and the factors will be put into search arguments cheapest access path is the one chosen. If which will filter out tuples without there are GROUP BY or ORDER BY clauses, returning across the RSS interface. then the cost for producing that interest- ing ordering must be compared to the cost Choosing,an optimal access path for a of the cheapest unordered path J&U the single relation consists of using these cost of sorting QCARD tuples into the selectivity factors in formulas together proper order. The cheapest of these with the statistics on available access alternatives is chosen as the plan for the paths. Before this process is described, a query block. definition is needed. Using an index access path or sorting tuples produces The cost formulas for single'relation tuples in the index value or sort key access paths are given in TABLE 2. These order. We say that a tuple order is an formulas give index pages fetched plus data jnterestinq order if that order is one pages fetched plus the weighting factor specified by the query block's GROUP BY or times RSI tuple retrieval calls. W is the ORDER BY clauses. weighting factor between page fetches and RSI calls. Some situations give several For single relations, the cheapest alternative formulas depending on whether access path is obtained by evaluating the the set of tuples retrieved will fit cost for each available access path (each entirely in the RSS buffer pool for effec- index on the relation, plus a segment tive buffer pool per user). We assume for scan). The costs will be- described below. clustered indexes that a page remains. in For each such access path, a predicted cost the buffer long enough for every tup1e to is computed along with the ordering of the be retrieved from it. For non-clustered tuples it will produce. Scanning along the indexes, it is assumed that for those SALARY index in ascending order, for relations not fitting in the buffer, the example, will produce some cost C and a .relation is sufficiently large with respect tuple order of SALARY (ascending). To find to the buffer size that a page fetch is the cheapest access plan fo; a single required for every tuple retrieval. TABLE 2 COST FORMULAS SITUATIOR lxGlc(inaases) Unique index matching 1+1+w an equal predicate Clustered index I matching F(predsI * (NINDX(Il + TCARD) + W x RSICARD one or more boolean factors Non-clustered index I matching F(predsl * (NINDXfI) + NCARD) + W * RSICARD one or more boolean factors or F(predsl * (NINDXfI) + TCARD) + W * RSICARD if this number fits in the System R buffer Clustered index I not (NINDX(Il + TCARDI + W * RSICARD matching any boolean factors Non-clustered index I not (NINDX(Il + NCARDI + W * RSICARD matching any boolean factors or (NINDX(II + TCARD) + W * RSICARD if this number fits in the System R buffer Segment scan TCARD/P + W * RSICARD 27

6.5. pccess & selecti- .&E joins “Clustering” on a column means that tuples which have the same value in that column In 1976, Blasgen and Eswaran <4> are physically stored close to each other examined a number of methods for performing so that one Page access will retrieve 2-way joins. The performance of each of several tuples. these methods was analyzed under a variety of relation cardinalities. Their evidence N-way joins can be visualized as a indicates that for other than very small sequence of a-way joins. In this visuali- relations, one of two join methods were zation, two relations are joined together, always optimal or near optimal. The System the resulting composite relation is joined R optimizer chooses between these two with the third relation, etc. At each step methods. We first describe these methods, of the n-way join it is possible to identi- and then discuss how they are extended for fy the outer relation (which in general is n-way joins. Finally we specify how the composite.1 and the inner relation (the join order (the order in which the rela- relation being added to the join). Thus tions are joined) is chosen. For joins the methods described above for two way involving two relations, the two relations joins are easily generalized to n-way are called the outer relation, from which a joins. However, it should be emphasized tuple will be retrieved first, and the that the first 2-way join does not have to m relation, from which tuples will be be completed before the second t-way join retrieved, possibly depending on the values is started. As soon as we get a composite obtained in the outer relation tuple. A tuple for the first t-way join, it can be predicate which relates columns of two joined with tuples of the third relation to tables to be joined is called a &&I form result tuples for the 3-way join, etc. Predicate. The columns referenced in a Nested ioop joins and merge scan joins may join predicate are called && golumnq. be mixed in the same query, e.g. the first two relations of a three-way join may be The first join method, called the joined using merge scans and the composite nested loops. method, uses scans, in any result may be joined with the third rela- order, on the outer and inner relations. tion using a nested loop join. The The scan on the outer relation is opened intermediate composite relations are and the first tuple is retrieved. For each physically stored only if a sort is outer relation tuple obtained, a scan is required for the next join step. When a opened on the inner relation to retrieve, sort of the composite relation is not one at a time, all the tuples of the inner specified, the composite relation will be relation which satisfy the join predicate. materialized one tuple at a time to parti- The composite tuples formed by the cipate in the next join. outer-relation-tuple / inner-relation-tuple pairs comprise the result of this join. We now consider the order in which the relations are chosen to be joined. It The second join method, called mereinq should be noted that although the cardinal- scans. requires the outer and inner rela- ity of the join of n relations is the same tions to be scanned in join column order. regardless of join order, the cost of This implies that, along with the columns joining in different orders can be substan- mentioned in ORDER BY and GROUP BY, columns tially different. If a query block has n of equi-join predicates (those of the form relations in its FROM list, then there are Table1 .columnl = TableZ.column2) also n factorial permutations of relation join define “interesting” orders. If there is orders. The search space can be reduced by more than one join predicate, one of them observing that that once the first k is used as the join predicate and the relations are joined, the method to join others are treated as ordinary predicates. the composite to the k+l-st relation is The merging scans method is only applied to independent of the order of joining the equi-joins, although in principle it could first k; i.e. the applicable predicates are be applied to other types of joins. If one the same, the set of interesting orderings or both of the relations to be joined has is the same, the possible join methods are no indexes on the join column, it must be the same, etc. Using this property, an sorted into a temporary list which is efficient way to organize the search is to ordered by the join column. find the best join order for successively larger subsets of tables. The more complex logic of the merging scan join method takes advantage of the A heuristic is used to reduce the join ordering on join columns to avoid rescan- order permutations which are considered. ning the entire inner relation (looking for Whe’h possible, the search is reduced by a match 1 for each tuple of the outer consideration only of join orders which relation. It does this by synchronizing have join predicates relating the inner the inner and outer scans by reference to relation to the other relations already matching join column values and by “remem- participating in the join. This means that ber ing” where matching join groups .are in joining relations tl,tt,...,tn only located. Further savings occur if the those orderings til,ti2,...,tin are inner relation is clustered on the join examined .in which for all j (j=2,...,n) column (as would be true if it is the either output of a sort on the join column). (1) tij has at least one join predicate 28

7.with some relation tik, where k < j, or any was specified. Note that if a solution (2) for all k > j, tik has no join predi- exists with the correct order, no sort is cate with til,tit,...,or ti(j-1). performed for ORDER BY or GROUP BY, unless This means that all joins requiring Carte- the ordered solution is more expensive than sian products are performed as late in the the cheapest unordered solution plus the join sequence as possible. For example, if COSt of. sorting into the required order. Tl.T2,T3 are the three relations in a query block’s FROM list, and there are join The number of solutions which must be predicates between Tl and T2 and between T2 stored is at most 2X*n (the number of and T3 on different columns than the Tl-T2 subsets of n tables) t i.mes the number of join, then the following permutations are interesting result orders. The computation not considered: .’ time to generate the tree is approximately T l-T3-T2 *, proportional to the same number. This T3-T l-T2 number is frequently reduced substantially by the join order heuristic. Our experi- To find the optimal plan for joining n ence is that typical cases require only a relations, a tree of possible solutions is few thousand bytes of storage and a few constructed. As discussed above, the tenths of a second of 3701158 CPU time. search is performed by finding the best way Joins of 8 tables have been optimized in a to join subsets of the relations. For each few seconds. ' set of relations joined, the cardinality of the composite relation is estimated and Gomputation DJ costs saved. In addition, for the unordered The costs for joins are computed from join, and for each interesting order the costs of the scans on each of the obtained by the join thus far, the cheapest relations and the cardinalities. The costs solution for achieving that order and the of the scans on each of the relations are cost of that solution are saved. A solu- computed using the cost formulas for single tion consists of an ordered list of the relation access paths presented in section relations to be joined, the join method b. used for each join, and a plan indicating how each relation is to be accessed. If Let C-outerfpath 1) be the cost of scanning either the outer composite relation or the the outer relation via pathl, and N be the inner relation needs to be sorted before cardinality of the outer relation tuples the join, then that is also included in the which satisfy the applicable predicates. N plan. As in the single relation case, is computed by: “interesting” orders are those listed in N = (product of the cardinalities of all the query block’s GROUP BY or ORDER BY relations T of the join so far) * clause. if any. Also every join column (product of the selectivity factors of defines an “interesting” order. To mini- al 1 applicable predicates). nimize the number of different interesting Let C-innercpatht) be the cost of scanning orders and hence the number of solutions in the inner relation, applying all applicable the tree, equivalence clns,ses for interest- predicates. Note that in the merge scan ing orders are computed and only the best join this means scanning the contiguous solution for each equivalence class is group of the inner relation which corres- saved. For example, if there is a join ponds to one join column value in the outer predicate E. DNO = D.DNO and another join relation. Then the cost of a nested loop predicate D.DNO = F.DNOe then all three of join is these columns belong to the same order c-nested-loop-join(pathl.path2)E equivalence class. C-outerfpathll + N * C-inner(path21 The search tree is constructed by The cost of a merge scan join can be iteration ‘on the number of relations joined broken up into the cost of actually doing so far. First, the best way is found to the merge plus the cost of sorting the access each single relation for each outer or inner relations, if required. The interesting tuple ordering and for the cost of doing the merge is unordered case. Next, the best way of C-merge(pathlspath2)= joining any relation to these is found, C-outer(path1) + N * C-inner(path2) subject to the heuristics for join order. This produces solutions for joining pairs For the case where the inner relation of relations. Then the best way to join is sorted into a temporary relation none of sets of three relations is found by COnSid- the single relation access path formulas in eration of all sets of two relations and section 4 apply. In this case the inner joining in each third relation permitted by scan is like a segment scan except that the the join order heuristic. For each plan to merging scans method makes use of the fact . join a set of relations, the order of the that the inner relation is sorted so that composite result is kept in the tree. This it is not necessary to scan the entire allows consideration of a merge scan join inner relation looking for a match. For which would not require sorting the compo- this case we use the following formula for site. After the complete solutions (all Of the cost of the inner scan. the relations joined together) have been C-innercsorted list) = found, the optimizer chooses the cheapest TEHPPAGES/N + W*RSICARD solution which gives the required order, if where TEMPPAGES is the number of pages 29

8.required to hold the inner relation. This formula assumes that during the merge each page of the inner relation is fetched once. It is interesting to observe that the cost formula for nested loop joins and the cost formula for merging scans are essen- tially the same. The reason that merging scans is sometimes better than nested loops is that the cost of the inner scan may be much less. After sorting, the inner relation is clustered on the join column which tends to minimize the number of pages fetched, and it is not necessary to scan the entire inner relation (looking for a JoB pzqxr--JOB match) for each tuple of the outer rela- 5 CLERK tion. 6 TYPIST 9 SALES The cost of sorting a relation, 12 MECHANIC C-sort(path), includes the cost of retriev- ing the data using the specified access Path, sorting the data, which may involve SELECT NAME, TITLE, SAL, DNAME several passes, and putting the results FROM EMP, DEPT, JOB into a temporary list. Note that prior to WHERE TITLE=‘CLERK’ sorting the inner table, only the local AND LOCYDENVER predicates can be applied. Also, if it is AND EMP.DNO=DEPT.DNO necessary to sort a composite result, the AND EMP.JOB=JOB.JOB entire composite relation must be stored in a temporary relation before it can be “Retrieve the name, salary, job title, and department sorted. The cost of inserting the compo- name of employees who are clerks and work for site tuples into a temporary relation departments in Denver.” before sorting is included in C-sortfpath). Figure 1. JOIN example We now show how the search is done for Access Path for Single Relations the example join shown in Fig. 1. First we find all of the reasonable access paths for l Eligible Predicates: Local Predicates Only single relations with only their local l “Interesting” Orderings: DNO,JOB predicates applied. The results for this example are shown in Fig. 2. There are three access paths for the EHP table: an index on DNO, an index on JOB, and a EM” 1 %:DNO 1 :I., 1 Ftt segment .scan. The interesting orders are DNO and JOB. The index on DNO provides the tuples in DNO order and the index on JOB $EMP.DNO, CIEMP seg. scan) provides the tuples in JOB order. The segment 'scan access path is, for our purposes, unordered. For this example we assume that the index on JOB is the cheap- est path, so the segment scan path is ’ N2 ’ N2 pruned. For the DEPT relation there are C(DEPT.DNO) C(DEPT reg. scan) two access paths, an index on DNO and a X pruned segment scan. We assume that the index on DNO is cheaper so the segment scan path is JOB: pruned. For the JOB relation there are two index segment access paths, an index on JOB and a segment JOBJOB scan on JOB scan. We assume that the segment scan path is cheaper, so both paths are saved. The I I N $JOB.JOB) doe sag. scan) results just described are saved in the search tree as shown in Fig. 3. In the figures, the notation C(EMP.DNOl or Figure 2. C(E.DNOl means the cost of scanning EMP via ‘:t the DNO index, applying all predicates the.$esults for single relations shown in which are applicable given that tuples from Fig. 3. For single each relation, we find the specified set of relations have already access paths for joining in each second been fetched. The notation Ni is used to relation for which there exists a predicate represent the cardinalities of the differ- connecting it to the first relation. First ent partial results. we consider access path selection for nested loop joins. In this example we Next, solutions for pairs of relations assume that the EMP-JOB join is cheapest by are found by joining a second relation to accessing JOB on the JOB index. This is 30

9.likely since it can fetch directly the For merging JOB with EMP, ue only tuples with matching JOB, (without having to consider the JOB index on EMP since it is scan the entire relation). In practice the the cheapest access path for EHP regardless cost of joining is estimated using the of order. Using the JOB index on JOB, we formulas given earlier and the cheapest can merge without any sorting. However, path is chosen. For joining the EMP it might be cheapter to sort JOB using a relation to the DEPT ,relation we assume relation scan as input to the sort and then that the DHO index is cheapest. The best do the merge. access path for each second-level relation is combined with each of the plans in Fig. Referring to Fig. 3, we see that the 3 to form the nested loop solutions ‘shown ,:' access path chosen for the' the DEPT rela- in Fig. 4. *, tion is the DHO index. After accessing DEPT via this index, we can merge with EMP using the DHO index on EMP, again without any sorting. However, it might be cheaper to sort EMP first using the JOB index as input to the sort and then do the merge. Both of these cases are shown in Fig. 5. As each of the costs shown in Figs. 4 and 5 are computed they are compared with the cheapest equivalent solution (same C(EMP.DNOI C(EMP.JOBI UDEPT.ONO~ CIJOB.JOBi CIJOB rsg. scanI unordered tables and same result order) found so far, DNO order JOB ordef ON0 order JOB order and the cheapest solution is saved. After this pruning. solutions for all three Figure 3. Search tree for single relations relations are found. For each pair of relations, we find access paths for joining Next we generate solutions using the in the remaining third relation. As before merging scans method. As we see on the we will extend the tree using nested loop left side of Fig. 3, there is a scan on the joins and merging scans to join the third EHP relation in DHO order, so it is possi- relation. The search tree for three ble to use this scan and the DHO scan on relations is shown in Fig. 6. Note that in the DEPT relation to do a merging scans one case both the composite relation‘ and join, without any sorting. Although it is the table being added (JOB1 are sorted. possible to do the merging join without Note also that for some of the cases. no sorting as just described, it might be sorts are performed at all. In these cheaper to use the JOB index on EMP, sort cases, the composite result is materialized on DHO. and then merge. Note that we never one tuple at a time and the intermediate consider sorting the DEPT table because the composite relation is never stored. As cheapest scan on that table is already in 'before, as each of the costs are computed DHO order. they are compared with the cheapest solu- (EMP. OEPT) 1 (DEPT. EMP) (JOB, EMP) segment h&x Index Index EMP.DNO EMP.JOB JOB.JOB 3 L N3 N3 ‘Nt Nl Index Index index Index DEPT.DNO DEPT.DNO EMPJOB EMP.JOB n 1 % n NS N4 N4 C(E.DNO) C(E.JOBl C(E.DNO) CULJOBI C(D.DNO) IXJJOB) C(J seg scan) + + + + N&(D.DNO) N,C,(D.DNO) N,&J.JOB) N&J.JOBI N,C,(E.DNO) N,C,(E.JOBI NIL&JOB) DNO order JOE order DNO order JOE order DNO order JOB order unordered Figure 4. Extended search tree for second relation (nested loop join) 31

10. *JOB, IEMP, JOB1 EMP) f index Index Be!lmmt E.JOB D.DNO SG :i .JC NI 4 “JI N3 N3 son Sort JOB reg scan Sort JOI E.JOB tq JOB into L2 \ s.?g. scan ly DNO into hy JOE into L2 Ll \ 0 Merge Merpa M-C Merge Merge Merge Mew EON0 Ll E.JO8 EJOB D.DNO J.JOB LZ with with with with with with with DDNO D.LlNO J.JOB LZ Ll E.JOB E.JOB N, l NS N5 N5 Ns DNO order ON0 order JOB order JOB order JOB order JOB order Figure 5. Extended search tree for second relation (merge join) ,EMP. DEW fi L5 Merge M-9 with b with D.DNO I I D.DNO Figure 6. Extended search tree for third relation 32

11. 6. Nested Queries the query: SELECT NAME A query may appear as an operand of a FROM EMPLOYEE X predicate of the form "expression operator WHERE SALARY > (SELECT SALARY query". Such a query is called a Nested FROM EMPLOYEE Query or -a Subquery. If the operator is WHERE EMPLOYEE-NUMBER= one of the six scalar comparisons (=, -1, X.MANAGERl >t >=, <, <=I, then the subquery must This selects names of EMPLOYEE's that earn return a single .value. The following more than their HANAGER. Here X identifies example using the 'I=" operator was given in the query block and relation which furnish- section 2: Ii es the candidate tuple for the correlation. SELECT NAME 3, For each candidate tuple of the top level FROM EMPLOYEE query block, the MANAGER value is used for WHERE SALARY = evaluation of the subquery. The subquery (SELECT AVG(SALARY) result is then returned to the "SALARY >" FROM EHPLOYEE) predicate for testing acceptance of the candidate tuple. If the operator is IN or NOT IN then the subquery may return a set of values. If a correlation subquery is not For example: directly below the query block it referenc- SELECT NAME es but is separated from that block by one FROM EMPLOYEE or more intermediate blocks, then the WHERE DEPARTMENT-NUMBER IN correlation subquery evaluation will be (SELECT DEPARTMENT-NUHBER done before evaluation of the highest of FROM DEPARTMENT the intermediate blocks. For example: WHERE LOCATION='DENVER'l level 1 SELECT NAME FROM EMPLOYEE X In both examples, the subquery needs to WHERE SALARY > be evaluated only once. The OPTIMIZER will level 2 (SELECT SALARY arrange for the subquery to be evaluated FROM EMPLOYEE before the top level query is evaluated. WHERE EMPLOYEE-NUMBER = If a single value is returned, it is level 3 (SELECT MANAGER incorporated into the top level query as FROM ERPLOYEE though it had been part of the original WHERE EMPLOYEE-NUMBER = query statement; for example, if AVG(SAL1 X.MANAGERll above evaluates to 15000 at execution time, This selects names of EMPLOYEE's that earn then the predicate becomes "SALARY = more than their MANAGER's MANAGER. As 15000". If the subquery can return a set before, for each candidate tuple of the of values, they are returned in a temporary level-l query block, the EMPLOYEE.MANAGER list, an internal form which is more value is used for evaluation of the level-3 efficient than a relation but which can query block. In this case, because the only be accessed sequentially. In the level 3 subquery references a level 1 value example above, if the subquery returns the but does not reference level 2 values, it list (17,241 then the predicate is evaluat- is evaluated once for every new level 1 ed in a manner similar to the way in which candidate tuple. but not for every level 2 it would have been evaluated if the origi- candidate tuple. nal predicate had,been DEPARTMENT-NUMBER IN (17,210. If the value referenced by a correla- tion subquery (X.MANAGER above) is not A subquery may also contain a predicate unique in the set of candidate tuples with a subquery. down to a (theoretically) (e.g., many employees have the same manag- arbitrary level of nesting. When such er), the procedure given above will still subqueries do not reference columns from cause the subquery to be re-evaluated for tables in higher level query blocks, they each occurrence of a replicated value. are all evaluated before the top level However, if the referenced relation is query is evaluated. In this case, the most ordered on the referenced column. the deeply nested subqueries are evaluated re-evaluation can be made conditional, first, since any subquery must be evaluated depending. on a test of whether or not the before its parent query can be evaluated. current referenced value is the same as the one in the previous candidate tuple. If A subquery may contain a reference to a they are the same, the previous evaluation value obtained from a candidate tuple of a result can be used again. In some cases, higher level query block (see example it might even pay to sort the referenced below). Such a query is called a correla- relation on the referenced column in order tion subquery. A correlation subquery must to avoid re-evaluating subqueries unneces- in principle be re-evaluated for each sarily. In order to determine whether or candidate tuple from the referenced query not the referenced column values are block. This re-evaluation must be done unique. the OPTIHIZER can use clues like before the correlation subquery's parent NCARD > ICARD. where NCARD is the relation predicate in the higher level block can be cardinality and ICARD.is the index cardi- tested for acceptance qr rejection of the nality of an index on the referenced candidate tuple. As an example, consider column. 33

12.7. Conclusion the current more procedural languages. The System R access path selection has Cited and General References been described for single table queries, <l> Astrahan, M. M. et al. System R: joins, and nested queries. Evaluation work Relational Approach to Database Management. on comparing the choices made to the ACM Transactions on Database Systems, Vol. "right" choice is in progress, and will be 1, No. 2, June 1936, pp. 97-137. described in a forthcoming paper. Prelimi- <2> Astrahan, M. M. et al. System R: A nary results indicate that, although the Relational Database Management System. To costs p.redicted by the optimizer are often appear in Computer. not accurate in absolute value, the true <3> Bayer, R. and McCreight, E. Organiza- optimal path is selected in a large majori- tion and Maintenance of Large Ordered ty of cases. In many cases, the ordering Indices. Acta Infornatica, Vol. 1, 1972. among the estimated costs for 'all paths <4> Blasgen, M.W. and Eswaran, K.P. On the considered is precisely the same as that Evaluation of Queries in a Relational Data among the actual measured costs. Base System. IBM Research Report RJl745. April, 1976. Furthermore. the cost of path selection <5> Chamberlin, D.D., et al. SEQUELZ: A is not overwhelming. For a two-way join, Unified Approach to Data Definition, the cost of optimization is approximately Manipulation, and Control. IBH Journal of equivalent to between 5 and 20 database Research and Development, Vol. 20, No. 6, retrie,vals. This number becomes even more Nov. 1976, pp. 560-575. insignificant when such a path selector is <6> Chamberlin, D.D., Gray, J.N., and placed in an environment such as System R, Traiger, 1.1. Views, Authorization and where application programs are compiled Locking in a Relational Data Base System. once and run many times. The cost of ACM National Computer Conference Proceed- optimization is amortized over many runs. ings, 1975, pp. 425-430. <7> Codd, E.F. A Relational Model of Data The key contributions of this path for Large Shared Data Banks. ACM Communi- selector over other work in this area are cations, Vol. 13. No. 6, June, 1970, pp. the expanded use of statistics (index 377-387. cardinality, for example), the inclusion of <8> Date, C.J. An Introduction to Data Base CPU utilization into the cost formulas, and Systems, Addison-Wesley, 1975. the method of determining join order. Many CS> Lorie. R.A. and Wade, B.W. The Compila- queries are CPU-bound, particularly merge tion of a Very High Level Data .Language. joins for which temporary relations are IBM Research Report RJ2008, May, 1977. created and sorts performed. The concept <lO> Lorie, R.A. and Nilsson, J.F. An of "selectivity factor" permits the optim- Access Specification Language for ,a Rela- izer to take advantage of as many of the tional Data Base System. IBH Research query's restriction predicates as possible Report RJ2218. April, 1978. in the RSS search arguments and access (11) Stonebraker, M.R., Wang, E., Kreps. paths. By remembering "interesting order- P., and Held, G-D. The Design and Implemen- ing" equivalence classes for joins and tation of INGRES. ACM Trans. on Database ORDER or GROUP specifications, the optimiz- Systems, Vol. 1, No. 3, September, 1976, er does more bookkeeping than most path PP. 189-222. selectors, but this additional work in many (12) Toad, s. PRTV: An Efficient Implemen- cases results in avoiding the storage and tation for Large Relational Data Bases. sorting of intermediate query results. Proc. International Conf. on. Very Large Tree pruning and tree searching techniques Data Bases, Framingham. Mass., September, allow this additional bookkeeping to be 1975. performed efficiently. <13> Wong, E., and Youssefi, K. Decomposi- tion - A Strategy for Query Processing. ACH More work on validation of the optimiz- Transactions on Database Systems, Vol. 1, er cost formulas needs to be done, but we No. 3 (Sept. 1976) pp. 223-2’41. can conclude from this preliminary work (19) ZlOOf, M.H. Query by Example. Proc. that database management systems can AFIPS 1975 NCC, Vol. 94, AFIPS Press, support non-procedural query languages with Montvale, N.J., pp. 431-437. I) performance comparable to those supporting 34