# Improving Query Performance with Variant Indexes

Improved Query Performance with Variant Indexes (1997): Analytical databases and OLTP databases call for different trade-offs, and these are reflected in the choice of index data structures. This paper discusses several index data structures that are better suited to analytical databases.

extremely fast for Bitmaps. Given Bitmaps B1 and B2, we can calculate a new Bitmap B3, B3 = B1 AND B2, by treating all bitmaps as arrays of long ints and looping through them, using the & operation of C:

    for (i = 0; i < len(B1); i++)
        /* Note: len(B1)=len(B2)=len(B3) */
        B3[i] = B1[i] & B2[i];
        /* B3 = B1 AND B2 */

We would not normally expect the entire Bitmap to be memory resident, but would perform a loop to operate on Bitmaps by reading them in from disk in long Fragments. We ignore this loop here. Using a similar approach, we can calculate B3 = B1 OR B2. But calculating B3 = NOT(B1) requires an extra step. Since some bit positions can correspond to non-existent rows, we postulate an Existence Bitmap (designated EBM) which has exactly those 1 bits corresponding to existing rows. Now when we perform a NOT on a Bitmap B, we loop through a long int array performing the ~ operation of C, then AND the result with the corresponding long int from EBM.

    for (i = 0; i < len(B1); i++)
        B3[i] = ~B1[i] & EBM[i];
        /* B3 = NOT(B1) for rows that exist */

Typical Select statements may have a number of predicates in their Where Clause that must be combined in a Boolean manner. The resulting set of rows, which is then retrieved or aggregated in the Select target-list, is called a Foundset in what follows. Sometimes, the rows filtered by the Where Clause must be further grouped, due to a group-by clause, and we refer to the set of rows restricted to a single group as a Groupset.

Finally, we show how the COUNT function for a Bitmap of a Foundset can be efficiently performed. First, a short int array shcount[ ] is declared, with entries initialized to contain the number of bits in the entry subscript. Given this array, we can loop through a Bitmap as an array of short int values, to get the count of the total Bitmap as shown in Algorithm 2.1. Clearly the shcount[ ] array is used to provide parallelism in calculating the COUNT on many bits at once.

Algorithm 2.1. Performing COUNT with a Bitmap

    /* Assume B1[ ] is a short int array
       overlaying a Foundset Bitmap */
    count = 0;
    for (i = 0; i < SHNUM; i++)
        count += shcount[B1[i]];
        /* add count of bits for next short int */

Loops for Bitmap AND, OR, NOT, or COUNT are extremely fast compared to loop operations on RID lists, where several operations are required for each RID, so long as the Bitmaps involved have reasonably high density (down to about 1%).

Example 2.1. In the Set Query benchmark of [O'NEI91], the results from one of the SQL statements in Query Suite Q5 give a good illustration of Bitmap performance. For a table named BENCH of 1,000,000 rows, two columns named K10 and K25 have cardinalities 10 and 25, respectively, with all rows in the table equally likely to take on any valid value for either column. Thus the Bitmap densities for indexes on these columns are 10% and 4% respectively. One SQL statement from the Q5 Suite is:

    [2.1] SELECT K10, K25, COUNT(*) FROM BENCH
          GROUP BY K10, K25;

A 1995 benchmark on a 66 MHz Power PC of the Praxis Omni Warehouse, a C language version of MODEL 204, demonstrated an elapsed time of 19.25 seconds to perform this query. The query plan was to read Bitmaps from the indexes for all values of K10 and K25, perform a double loop through all 250 pairs of values, AND all pairs of Bitmaps, and COUNT the results. The 250 ANDs and 250 COUNTs of 1,000,000 bit Bitmaps required only 19.25 seconds on a relatively weak processor. By comparison, MVS DB2 Version 2.3, running on an IBM 9221/170, used an algorithm that extracted and wrote out all pairs of (K10, K25) values from the rows, sorted by value pair, and counted the result in groups, taking 248 seconds of elapsed time and 223 seconds of CPU. (See [O'NEI96] for more details.)

#### 2.1.3 Segmentation

To optimize Bitmap index access, Bitmaps can be broken into Fragments of equal sizes to fit on single fixed-size disk pages. Corresponding to these Fragments, the rows of a table are partitioned into Segments, with an equal number of row slots for each segment. In MODEL 204 (see [M204, O'NEI87]), a Bitmap Fragment fits on a 6 KByte page, and contains about 48K bits, so the table is broken into segments of about 48K rows each. This segmentation has two important implications.

The first implication involves RID-lists. When Bitmaps are sufficiently sparse that they need to be converted to RID-lists, the RID-list for a segment is guaranteed to fit on a disk page (1/32 of 48K is about 1.5K; MODEL 204 actually allows sparser Bitmaps than 1/32, so several RID lists might fit on a single disk page). Furthermore, RIDs need only be two bytes in length, because they only specify the row position within the segment (the 48K rows of a segment can be counted in a short int). At the beginning of each RID-list, the segment number will specify the higher order bits of a longer RID (4 bytes or more), but the segment-relative RIDs only use two bytes each. This is an important form of prefix RID compression, which greatly speeds up index range search.

The second implication of segmentation involves combining predicates. The B-tree index entry for a particular value in MODEL 204 is made up of a number of pointers by segment to Bitmap or RID-list Fragments, but there are no pointers for segments that have no representative rows. In the case of a clustered index, for example, each particular index value entry will have pointers to only a small set of segments. Now if several predicates involving different column indexes are ANDed, the evaluation takes place segment-by-segment. If one of the predicate indexes has no pointer to a Bitmap Fragment for a segment, then the segment Fragments for the other indexes can be ignored as well. Queries like this can turn out to be very common in a workload, and the I/O saved by skipping these index Fragments can significantly improve performance.

In some sense, Bitmap representations and RID-list representations are interchangeable: both provide a way to list all rows with a given index value or range of values. It is simply the case that, when the Bitmap representations involved are relatively dense, Bitmaps are much more efficient than RID-lists, both in storage use and efficiency of Boolean operations. Indeed a Bitmap index can contain RID-lists for some entry values or even for some Segments within a value entry, whenever the number of rows with a given keyvalue would be too sparse in
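The Bitmap loops of this section (AND, NOT masked by the Existence Bitmap, and the table-driven COUNT of Algorithm 2.1) can be sketched as compilable C. This is a minimal sketch under stated assumptions, not the MODEL 204 implementation: the function names `bitmap_and`, `bitmap_not`, `bitmap_count`, and `init_shcount` are hypothetical, Bitmaps are held entirely in memory as arrays of 64-bit words, and the 16-bit lookup is done by shifting and masking each word rather than by overlaying a short int array on the Bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* shcount[v] = number of 1-bits in the 16-bit value v (as in Algorithm 2.1). */
static uint8_t shcount[65536];

/* Fill the lookup table once: bits(v) = lowest bit of v + bits(v >> 1). */
void init_shcount(void) {
    for (uint32_t v = 1; v < 65536u; v++)
        shcount[v] = (uint8_t)((v & 1u) + shcount[v >> 1]);
}

/* out = a AND b, one machine word at a time. */
void bitmap_and(const uint64_t *a, const uint64_t *b, uint64_t *out, size_t nwords) {
    assert(a && b && out);
    for (size_t i = 0; i < nwords; i++)
        out[i] = a[i] & b[i];
}

/* out = NOT(a), masked by the Existence Bitmap so that bit positions
   with no corresponding row never turn on. */
void bitmap_not(const uint64_t *a, const uint64_t *ebm, uint64_t *out, size_t nwords) {
    assert(a && ebm && out);
    for (size_t i = 0; i < nwords; i++)
        out[i] = ~a[i] & ebm[i];
}

/* COUNT of 1-bits: four 16-bit table lookups per 64-bit word.
   Requires init_shcount() to have been called. */
uint64_t bitmap_count(const uint64_t *a, size_t nwords) {
    assert(a);
    uint64_t count = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint64_t w = a[i];
        count += shcount[w & 0xFFFF] + shcount[(w >> 16) & 0xFFFF]
               + shcount[(w >> 32) & 0xFFFF] + shcount[(w >> 48) & 0xFFFF];
    }
    return count;
}
```

A caller would run `init_shcount()` once, then combine `bitmap_and` and `bitmap_count` to evaluate something like one cell of query [2.1]. On current hardware a population-count instruction would likely replace the table, but the table keeps the sketch close to the paper's description.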

assuming that I/O requires 10K instructions is: ((f · 100 · 10,000 + k · 1000 · 10) / 1,000,000) · \$12. Since k ≤ f · 100, the formula f · 100 · 10,000 + k · 1000 · 10 ≤ f · 100 · 10,000 + f · 100 · 1000 · 10 = f · 2,000,000. Thus, the total CPU cost is bounded above by f · \$24, which is still cheap compared to an I/O cost of f · \$600. Yet this is the highest cost we assume for CPU due to I/O, which is the dominant CPU term. In Table 3.4, we give the maximum dollar cost for each index approach.

| Method | \$Cost for 10K ins per I/O |
|---|---|
| Projection index | f · \$624 |
| Value-List index | \$642 |
| Bit-Sliced index | f · \$425 |

Table 3.4. Costs of the four plans in dollars, with kM rows and clustering fraction f

The clustered case clearly affects the plans by making the Projection and Bit-Sliced indexes more efficient compared to the Value-List index.

### 3.2 Evaluating Other Column Aggregate Functions

We consider aggregate functions of the form in [3.2], where AGG is an aggregate function, such as COUNT, MAX, MIN, etc.

    [3.2] SELECT AGG(C) FROM T WHERE condition;

Table 3.5 lists a group of aggregate functions and the index types to evaluate these functions. We enter the value "Best" in a cell if the given index type is the most efficient one to have for this aggregation, "Slow" if the index type works but not very efficiently, etc. Note that Table 3.5 demonstrates how different index types are optimal for different aggregate situations.

| Aggregate | Value-List Index | Projection Index | Bit-Sliced Index |
|---|---|---|---|
| COUNT | Not needed | Not needed | Not needed |
| SUM | Not bad | Good | Best |
| AVG (SUM/COUNT) | Not bad | Good | Best |
| MAX and MIN | Best | Slow | Slow |
| MEDIAN, N-TILE | Usually Best | Not Useful | Sometimes Best² |
| Column-Product | Very Slow | Best | Very Slow |

Table 3.5. Tabulation of Performance by Index Type for Evaluating Aggregate Functions

² Best only if there is a clustering of rows in B in a local region, a fraction f of the pages, f ≤ 0.755.

The COUNT and SUM aggregates have already been covered. COUNT requires no index, and AVG can be evaluated as SUM/COUNT, with performance determined by SUM.

The MAX and MIN aggregate functions are best evaluated with a Value-List index. To determine MAX for a Foundset Bf, one loops from the largest value in the Value-List index down to the smallest, until finding a row in Bf. To find MAX and MIN using a Projection index, one must loop through all values stored. The algorithm to evaluate MAX or MIN using a Bit-Sliced index is given in our extended paper, [O'NQUA], together with other algorithms not detailed in this Section.

To calculate MEDIAN(C) with C a keyvalue in a Value-List index, one loops through the non-null values of C in decreasing (or increasing) order, keeping a count of rows encountered, until for the first time with some value v the number of rows encountered so far is greater than COUNT(Bf AND Bnn)/2. Then v is the MEDIAN. Projection indexes are not useful for evaluating MEDIAN, unless the number of rows in the Foundset is very small, since all values have to be extracted and sorted. Surprisingly, a Bit-Sliced index can also be used to determine the MEDIAN, in about the same amount of time as it takes to determine SUM (see [O'NQUA]).

The N-TILE aggregate function finds values v1, v2, ..., vN-1, which partition the rows in Bf into N sets of (approximately) equal size based on the interval in which their C value falls: C <= v1, v1 < C <= v2, ..., vN-1 < C. MEDIAN equals 2-TILE.

An example of a COLUMN-PRODUCT aggregate function is one which involves the product of different columns. In the TPC-D benchmark, the LINEITEM table has columns L_EXTENDEDPRICE and L_DISCOUNT. A large number of queries in TPC-D retrieve the aggregate SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)), usually with the column alias "REVENUE". The most efficient method for calculating Column-Product Aggregates uses Projection indexes for the columns involved. It is possible to calculate products of columns using Value-List or Bit-Sliced indexes, with the sort of algorithm that was used for SUM, but in both cases, Foundsets of all possible cross-terms of values must be formed and counted, so the algorithms are terribly inefficient.

## 4. Evaluating Range Predicates

Consider a Select statement of the following form:

    [4.1] SELECT target-list FROM T
          WHERE C-range AND <condition>;

Here, C is a column of T, and <condition> is a general search-condition resulting in a Foundset Bf. The C-range represents a range predicate, {C > c1, C >= c1, C = c1, C <= c1, C < c1, C between c1 and c2}, where c1 and c2 are constant values. We will demonstrate below how to further restrict the Foundset Bf, creating a new Foundset BF, so that the compound predicate "C-range AND <condition>" holds for exactly those rows contained in BF. We do this with varying assumptions regarding index types on the column C.

Evaluating the Range using a Projection Index. If there is a Projection index on C, we can create BF by accessing each C value in the index corresponding to a row number in Bf and testing whether it lies within the specified range.

Evaluating the Range using a Value-List Index. With a Value-List index, evaluating the C-range restriction of [4.1] uses an algorithm common in most database products, looping through the index entries for the range of values. We vary slightly by accumulating a Bitmap Br as an OR of all row sets in the index for values that lie in the specified range, then AND this result with Bf to get BF. See Algorithm 4.1.

Note that for Algorithm 4.1 to be efficiently performed, we must find some way to guarantee that the Bitmap Br remains in memory at all times as we loop through the values v in the range. This requires some forethought in the Query Optimizer if the table T being queried is large: 100 million rows will mean that a Bitmap Br of 12.5 MBytes must be kept resident.

Algorithm 4.1. Range Predicate with a Value-List Index

    Br = the empty set
    For each entry v in the index for C that satisfies the range specified
        Designate the set of rows with the value v as Bv
        Br = Br OR Bv
    BF = Bf AND Br

Evaluating the Range using a Bit-Sliced Index. Rather surprisingly, it is possible to evaluate range predicates efficiently using a Bit-Sliced index. Given a Foundset Bf, we demonstrate in Algorithm 4.2 how to evaluate the set of rows BGT such that C > c1, BGE such that C >= c1, BEQ such that C = c1, BLE such that C <= c1, BLT such that C < c1.

In use, we can drop Bitmap calculations in Algorithm 4.2 that do not evaluate the condition we seek. If we only need to evaluate C >= c1, we don't need steps that evaluate BLE or BLT.

Algorithm 4.2. Range Predicate with a Bit-Sliced Index

    BGT = BLT = the empty set; BEQ = Bnn
    For each Bit-Slice Bi for C in decreasing significance
        If bit i is on in constant c1
            BLT = BLT OR (BEQ AND NOT(Bi))
            BEQ = BEQ AND Bi
        else
            BGT = BGT OR (BEQ AND Bi)
            BEQ = BEQ AND NOT(Bi)
    BEQ = BEQ AND Bf;
    BGT = BGT AND Bf; BLT = BLT AND Bf
    BLE = BLT OR BEQ; BGE = BGT OR BEQ

Proof that BEQ, BGT, and BGE are properly evaluated. The method to evaluate BEQ clearly determines all rows with C = c1, since it requires that all 1-bits in c1 be on and all 0-bits in c1 be off for all rows in BEQ. Next, note that BGT is the OR of a set of Bitmaps with certain conditions, which we now describe.

Assume that the bit representation of c1 is $b_N b_{N-1} \ldots b_1 b_0$, and that the bit representation of C for some row r in the database is $r_N r_{N-1} \ldots r_1 r_0$. For each bit position i from 0 to N with bit $b_i$ off in c1, a row r will be in BGT if bit $r_i$ is on and bits $r_N r_{N-1} \ldots r_{i+1}$ are all equal to bits $b_N b_{N-1} \ldots b_{i+1}$. It is clear that C > c1 for any such row r in BGT. Furthermore, for any value of C > c1, there must be some bit position i such that the i-th bit position in c1 is off, the i-th bit position of C is on, and all more-significant bits in the two values are identical. Therefore, Algorithm 4.2 properly evaluates BGT.

### 4.1 Comparing Algorithm Performance

Now we compare performance of these algorithms to evaluate a range predicate, "C between c1 and c2". We assume that C values are not clustered on disk. The cost of evaluating a range predicate using a Projection index is similar to evaluating SUM using a Projection index, as seen in Fig. 3.2. We need the I/O to access each of the index pages with C values plus the CPU cost to test each value and, if the row passes the range test, to turn on the appropriate bit in a Foundset.

As we have just seen, it is possible to determine the Foundset of rows in a range using Bit-Sliced indexes. We can calculate the range predicate c2 >= C >= c1 using a Bit-Sliced index by calculating BGE for c1 and BLE for c2, then ANDing the two. Once again the calculation is generally comparable in cost to calculating a SUM aggregate, as seen in Fig. 3.2.

With a Value-List index, algorithmic effort is proportional to the width of the range, and for a wide range, it is comparable to the effort needed to perform SUM for a large Foundset. Thus for wide ranges the Projection and Bit-Sliced indexes have a performance advantage. For short ranges the work to perform the Projection and Bit-Sliced algorithms remains nearly the same (assuming the range variable is not a clustering value), while the work to perform the Value-List algorithm is proportional to the number of rows found in the range. Eventually, as the width of the range decreases, the Value-List algorithm is the better choice. These considerations are summarized in Table 4.1.

| Range Evaluation | Value-List Index | Projection Index | Bit-Sliced Index |
|---|---|---|---|
| Narrow Range | Best | Good | Good |
| Wide Range | Not bad | Good | Best |

Table 4.1. Range Evaluation Performance by Index Type

### 4.2 Range Predicate with a Non-Binary Bit-Sliced Index

Sybase IQ was the first product to demonstrate in practice that the same Bit-Sliced index, called the "High NonGroup Index" [EDEL95], could be used both for evaluating range predicates (Algorithm 4.2) and performing Aggregates (Algorithm 3.2, et al.). For many years, MODEL 204 has used a form of indexing to evaluate range predicates, known as "Numeric Range" [M204]. Numeric Range evaluation is similar to Bit-Sliced Algorithm 4.2, except that numeric quantities are expressed in a larger base (base 10). It turns out that the effort of performing a range retrieval can be reduced if we are willing to store a larger number of Bitmaps. In [O'NQUA] we show how Bit-Sliced Algorithm 4.2 can be generalized to base 8, where the Bit-Slices represent sets of rows with octal digit $O_i \ge c$, c a non-zero octal digit. This is a generalization of Binary Bit-Slices, which represent sets of rows with binary digit $B_i \ge 1$.

## 5. Evaluating OLAP-style Queries

Figure 5.1 pictures a star-join schema with a central fact table, SALES, containing sales data, together with dimension tables known as TIME (when the sales are made), PRODUCT (product sold), and CUSTOMER (purchaser in the sale). Most OLAP products do not express their queries in SQL, but much of the work of typical OLAP queries could be represented in SQL [GBLP96] (although more than one query might be needed).

    [5.1] SELECT P.brand, T.week, C.city, SUM(S.dollar_sales)
          FROM SALES S, PRODUCT P, CUSTOMER C, TIME T
          WHERE S.day = T.day and S.cid = C.cid
            and S.pid = P.pid and P.brand = :brandvar
            and T.week >= :datevar and C.state in
            ('Maine', 'New Hampshire', 'Vermont',
             'Massachusetts', 'Connecticut', 'Rhode Island')
          GROUP BY P.brand, T.week, C.city;
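Returning to Algorithm 4.2 of Section 4, the bit-sliced range evaluation can be written out in a few lines of C. This is a minimal sketch under assumptions that are not in the paper: the column C is an 8-bit unsigned value, the table has at most 64 rows so that each Bitmap fits in one `uint64_t`, and the names `build_slices`, `range_bit_sliced`, and `RangeResult` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSLICES 8   /* assumption: C is an 8-bit unsigned column */

/* The five result Bitmaps of Algorithm 4.2 (toy table of up to 64 rows). */
typedef struct {
    uint64_t lt, le, eq, ge, gt;
} RangeResult;

/* Build Bit-Slice i: bit r is on when bit i of the column value c[r] is on. */
void build_slices(const uint8_t *c, int nrows, uint64_t slices[NSLICES]) {
    assert(nrows >= 0 && nrows <= 64);
    for (int i = 0; i < NSLICES; i++)
        slices[i] = 0;
    for (int r = 0; r < nrows; r++)
        for (int i = 0; i < NSLICES; i++)
            if ((c[r] >> i) & 1u)
                slices[i] |= (uint64_t)1 << r;
}

/* Algorithm 4.2: bnn marks rows with non-null C, bf is the Foundset,
   c1 is the comparison constant. */
RangeResult range_bit_sliced(const uint64_t slices[NSLICES],
                             uint64_t bnn, uint64_t bf, uint8_t c1) {
    uint64_t bgt = 0, blt = 0, beq = bnn;
    for (int i = NSLICES - 1; i >= 0; i--) {   /* decreasing significance */
        uint64_t bi = slices[i];
        if ((c1 >> i) & 1u) {
            /* Rows still equal so far but with a 0 here are smaller. */
            blt |= beq & ~bi;
            beq &= bi;
        } else {
            /* Rows still equal so far but with a 1 here are larger. */
            bgt |= beq & bi;
            beq &= ~bi;
        }
    }
    /* ~bi may set bits past the last row; ANDing with beq (a subset of
       bnn) and then with bf keeps only real rows. */
    beq &= bf;
    bgt &= bf;
    blt &= bf;
    RangeResult res = { blt, blt | beq, beq, bgt | beq, bgt };
    return res;
}
```

For the predicate "C between c1 and c2", one would call `range_bit_sliced` once with c1 and once with c2, then AND the `ge` Bitmap of the first result with the `le` Bitmap of the second, as described in Section 4.1.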

Query [5.1] retrieves total dollar sales that were made for a product brand during the past 4 weeks to customers in New England.

[Figure 5.1: star-join schema. The SALES fact table (cid, pid, day, dollar_sales, dollar_cost, unit_sales) is surrounded by three dimension tables: CUSTOMER (cid, gender, city, state, zip, hobby), PRODUCT (pid, SKU, brand, size, weight, package_type), and TIME (day, week, month, year, holiday_flg, weekday_flg).]

Figure 5.1. Star Join Schema of SALES, CUSTOMER, PRODUCT, and TIME

An important advantage of OLAP products is evaluating such queries quickly, even though the fact tables are usually very large. The OLAP approach precalculates results of some Grouped queries and stores them in what we have been calling summary tables. For example, we might create a summary table where sums of Sales.dollar_sales and sums of Sales.unit_sales are precalculated for all combinations of values at the lowest level of granularity for the dimensions, e.g., for C.cid values, T.day values, and P.pid values. Within each dimension there are also hierarchies sitting above the lowest level of granularity. A week has 7 days and a year has 52 weeks, and so on. Similarly, a customer exists in a geographic hierarchy of city and state. When we precalculate a summary table at the lowest dimensional level, there might be many rows of detail data associated with a particular cid, day, and pid (a busy product reseller customer), or there might be none. A summary table, at the lowest level of granularity, will usually save a lot of work, compared to detailed data, for queries that group by attributes at higher levels of the dimensional hierarchy, such as city (of customers), week, and brand. We would typically create many summary tables, combining various levels of the dimensional hierarchies. The higher the dimensional levels, the fewer elements in the summary table, but there are a lot of possible combinations of hierarchies. Luckily, we don't need to create all possible summary tables in order to speed up the queries a great deal. For more details, see [STG95, HRU96].

By doing the aggregation work beforehand, summary tables provide quick response to queries, so long as all selection conditions are restrictions on dimensions that have been foreseen in advance. But, as we pointed out in Example 1.1, if some restrictions are non-dimensional, such as temperature, then summary tables sliced by dimensions will be useless. And since the size of data in the summary tables grows as the product of the number of values in the independent dimensions (counting values of hierarchies within each dimension), it soon becomes impossible to provide dimensions for all possible restrictions. The goal of this section is to describe and analyze a variant indexing approach that is useful for evaluating OLAP-style queries quickly, even when the queries cannot make use of preaggregation. To begin, we need to explain Join indexes.

### 5.1 Join Indexes

Definition 5.1. Join Index. A Join index is an index on one table for a quantity that involves a column value of a different table through a commonly encountered join.

Join indexes can be used to avoid actual joins of tables, or to greatly reduce the volume of data that must be joined, by performing restrictions in advance. For example, the Star Join index, invented a number of years ago, concatenates ordinal encodings of column values from different dimension tables of a Star schema, and lists RIDs in the central fact table for each concatenated value. The Star Join index was the best approach known in its day, but there is a problem with it, comparable to the problem with summary tables. If there are numerous columns used for restrictions in each dimension table, then the number of Star Join indexes needed to be able to combine arbitrary column restrictions from each dimension table is a product of the number of columns in each dimension. Thus, there will be a "combinatorial explosion" of Join Indexes in terms of the number of independent columns.

The Bitmap join index, defined in [O'NGG95], addresses this problem. In its simplest form, this is an index on a table T based on a single column of a table S, where S commonly joins with T in a specified way. For example, in the TPC-D benchmark database, the O_ORDERDATE column belongs to the ORDER table, but two TPC-D queries need to join ORDER with LINEITEM to restrict LINEITEM rows to a range of O_ORDERDATE. This can better be accomplished by creating an index for the value ORDERDATE on the LINEITEM table. This doesn't change the design of the LINEITEM table, since the index on ORDERDATE is for a virtual column through a join. The number of indexes of this kind increases linearly with the number of useful columns in all dimension tables. We depend on the speed of combining Bitmapped indexes to create ad-hoc combinations, and thus the explosion of Star Join indexes because of different combinations of dimension columns is not a problem. Another way of looking at this is that Bitmap join indexes are Recombinant, whereas Star join indexes are not.

The variant indexes of the current paper lead to an important point, that Join indexes can be of any type: Projection, Value-List, or Bit-Sliced. To speed up Query [5.1], we use Join indexes on the SALES fact table for columns in the dimensions. If appropriate join indexes exist for all dimension table columns mentioned in the queries, then explicit joins with dimension tables may no longer be necessary at all. Using Value-List or Bit-Sliced join indexes we can evaluate the selection conditions in the Where Clause to arrive at a Foundset on SALES, and using Projection join indexes we can then retrieve the dimensional values for the Query [5.1] target-list, without any join needed.

### 5.2 Calculating Groupset Aggregates

We assume that in star-join queries like [5.1], an aggregation is performed on columns of the central fact table, F. There is a

Foundset of rows on the fact table, and the group-by columns in the Dimension tables D1, D2, . . . (they might be primary keys of the Dimension tables, in which case they will also exist as foreign keys on F). Once the Foundset has been computed from the Where Clause, the bits in the Foundset must be partitioned into groups, which we call Groupsets, again sets of rows from F. Any aggregate functions are then evaluated separately over these different Groupsets. In what follows, we describe how to compute Groupset aggregates using our different index types.

Computing Groupsets Using Projection Indexes. We assume Projection indexes exist on F for each of the group-by columns (these are Join Indexes, since the group-by columns are on the Dimension tables), and also for all columns of F involved in aggregates. If the number of group cells is small enough so that all grouped aggregate values in the target list will fit into memory, then partitioning into groups and computing aggregate functions for each group can usually be done rather easily.

For each row of the Foundset returned by the Where clause, classify the row into a group-by cell by reading the appropriate Projection indexes on F. Then read the values of the columns to be aggregated from Projection indexes on these columns, and aggregate the result into the proper cell of the memory-resident array. (This approach can be used directly for functions such as SUM(C); for functions such as AVG(C), it can be done by accumulating a "handle" of results, SUM(C) and COUNT(C), to calculate the final aggregate.)

If the total set of cells in the group-by cannot be retained in a memory-resident array, then the values to be aggregated can be tagged with their group cell values, and then values with identical group cell values brought together using a disk sort (this is a common method used today, not terribly efficient).

Computing Groups Using Value-List Indexes. The idea of using Value-List indexes to compute aggregate groups is not new. As mentioned in Example 2.1, Model 204 used them years ago. In this section we formally present this approach.

Algorithm 5.1. Grouping by columns D1.A, D2.B using a Value-List Index

    For each entry v1 in the Value-List index for D1.A
        For each entry v2 in the Value-List index for D2.B
            Bg = Bv1 AND Bv2 AND Bf
            Evaluate AGG(F.C) on Bg
            /* We would do this with a Projection index */

Algorithm 5.1 presents an algorithm for computing aggregate groups that works for queries with two group-by columns (with Bitmap Join Value-List indexes on Dimension tables D1 and D2). The generalization of Algorithm 5.1 to the case of n group-by attributes is straightforward. Assume the Where clause condition already performed resulted in the Foundset Bf on the fact table F. The algorithm generates a set of Groupsets, Bg, one for each (D1.A, D2.B) group. The aggregate function AGG(F.C) is evaluated for each group using Bg in place of Bf.

Algorithm 5.1 can be quite inefficient when there are a lot of Groupsets and rows of table F in each Groupset are randomly placed on disk. The aggregate function must be re-evaluated for each group and, when the Projection index for the column F.C is too large to be cached in memory, we must revisit disk pages for each Groupset. With many Groupsets, we would expect there to be few rows in each, and evaluating the Grouped AGG(F.C) in Algorithm 5.1 might require an I/O for each individual row.

### 5.3 Improved Grouping Efficiency Using Segmentation and Clustering

In this section we show how segmentation and clustering can be used to accelerate a query with one or more group-by attributes, using a generalization of Algorithm 5.1. We assume that the rows of the table F are partitioned into Segments, as explained in Section 2.1. Query evaluation is performed on one Segment at a time, and the results from evaluating each Segment are combined at the end to form the final query result. Segmentation is most effective when the number of rows per Segment is the number of bits that will fit on a disk page. With this Segment size, we can read the bits in an index entry that correspond to a segment by performing a single disk I/O.

As pointed out earlier, if a Segment s1 of the Foundset (or Groupset) is completely empty (i.e., all bits are 0), then ANDing s1 with any other Segment s2 will also result in an empty Segment. As explained in [O'NEI87], the entry in the B-tree leaf level for a column C that references an all-zeros Bitmap Segment is simply missing, and a reasonable algorithm to AND Bitmaps will test this before accessing any Segment Bitmap pages. Thus neither s1 nor s2 will need be read from disk after an early phase of evaluation. This optimization becomes especially useful when rows are clustered on disk by nested dimensions used in grouping, as we will see.

Consider a Star Join schema with a central fact table F and a set of three dimension tables, D1, D2, D3. We can easily generalize the analysis that follows to more than three dimensions. Each dimension $D_m$, $1 \le m \le 3$, has a primary key, $d_m$, with a domain of values having an order assigned by the DBA. We represent the number of values in the domain of $d_m$ by $n_m$, and list the values of $d_m$ in increasing order, differentiated by superscript, as $d_m^1, d_m^2, \ldots, d_m^{n_m}$. For example, the primary key of the TIME dimension of Figure 5.1 would be days and have a natural temporal order. The DBA would probably choose the order of values in the PRODUCT dimension so that the most commonly used hierarchies, such as product_type or category, consist of contiguous sets of values in the dimensional order. See Figure 5.2.

[Figure 5.2: a PRODUCT hierarchy with the columns Category, Product Type, and Product. The labels recoverable from the source are the category Personal Hygiene, the product types Soap and Shampoo, and the products PROD1 through PROD8 listed in dimensional order.]

Figure 5.2. Order of Values in PRODUCT Dimensions

In what follows, we will consider a workload of OLAP-type queries which have group-by clauses on some values in the dimension tables (not necessarily the primary key values). The fact table F contains foreign key columns that match the primary keys of the various dimensions. We will assume indexes on these foreign keys for table F and make no distinction between these and the primary keys of the Dimensions. We intend to demonstrate how these indexes can be efficiently used to perform group-by queries using Algorithm 5.1.

We wish to cluster the fact table F to improve performance of the most finely divided group-by possible (grouping by primary key values of the dimensions rather than by any hierarchy values above these). It will turn out that this clustering is also effective for arbitrary group-by queries on the dimensions. To evaluate the successive Groupsets by Algorithm 5.1, we consider performing the nested loop of Figure 5.3.

    For each key-value v1 in order from D1
        For each key-value v2 in order from D2
            For each key-value v3 in order from D3
                <calculate aggregates for cell v1, v2, v3>
            End For v3
        End For v2
    End For v1

Figure 5.3. Nested Loop to Perform a Group-By

In the loop of Figure 5.3, we assume the looping order for dimensions (D1, D2, D3) is determined by the DBA (this order has long-term significance; we give an example below). The loop on dimension values here produces conjoint cells (v1, v2, v3) of the group-by. Each cell may contain a large number of rows from table F or none. The set of rows in a particular cell is what we have been referring to as a Groupset.

It is our intent to cluster the rows of the fact table F so that all the rows with foreign keys matching the dimension values in each cell (v1, v2, v3) are placed together on disk, and furthermore that the successive cells fall in the same order on disk as the nested loop above on (D1, D2, D3).

Given this clustering, the Bitmaps for each Groupset will have 1-bits in a limited contiguous range. Furthermore, as the loop is performed to calculate a group-by, successive cells will have rows in Groupset Bitmaps that are contiguous one to another and increase in row number. Figure 5.4 gives a schematic representation of the Bitmaps for index values of three dimensions.

    D1 = d1^1    1111111111111111100000000000000000000...
       = d1^2    0000000000000000011111111111111111000...
       . . .
    D2 = d2^1    1111111000000000011111110000000000111...
       = d2^2    0000000111111100000000001111111000000...
       . . .
    D3 = d3^1    1100000110000000011000000000000000110...
       = d3^2    0011000001100000000110000000000000001...
       . . .
       = d3^n3   0000011000001100000000110000000000001...

Figure 5.4. Schematic Representation of Dimension Index Bitmaps for Clustered F

The Groupset Bitmaps are calculated by ANDing the appropriate index Bitmaps for the given values. Note that as successive Groupset Bitmaps in loop order are generated from ANDing, the 1-bits in each Groupset move from left to right. In terms of

The Groupset for the next few cells will have Bitmaps:

    0011000000000000000000000000000000000...
    0000110000000000000000000000000000000...

And so on, moving from left to right.

To repeat: as the loop to perform the most finely divided group-by is performed, and Groupset Bitmaps are generated, successive blocks of 1-bits by row number will be created, and successive row values from the Projection index will be accessed to evaluate an aggregate. Because of Segmentation, no unnecessary I/Os are ever performed to AND the Bitmaps of the individual dimensions. Indeed, due to clustering, it is most likely that Groupset Bitmaps for successive cells will have 1-bits that move from left to right on each Segment Bitmap page of the Value index, and the column values to aggregate will move from left to right in each Projection index page, only occasionally jumping to the next page. This is tremendously efficient, since relevant pages from the Value-list dimension indexes and Projection indexes on the fact table need be read only once from left to right to perform the entire group-by.

If we consider group-by queries where the Groupsets are less finely divided than in the primary key loop given, grouping instead by higher hierarchical levels in the dimensions, this approach should still work. We materialize the grouped Aggregates in memory, and aggregate in nested loop order by the primary keys of the dimensions as we examine rows in F. Now for each cell (v1, v2, v3) in the loop of Figure 5.3, we determine the higher order hierarchy values of the group-by we are trying to compute. Corresponding to each dimension primary key value of the current cell, $v_i = d_i^m$, there is a value $h_i^r$ in the dimension hierarchy we are grouping by; thus, as we loop through the finely divided cells, we aggregate the results for $(d_1^{m_1}, d_2^{m_2}, d_3^{m_3})$ into the aggregate cell for $(h_1^{r_1}, h_2^{r_2}, h_3^{r_3})$. As long as we can hold all aggregates for the higher hierarchical levels in memory at once, we have lost none of the nested loop efficiency. This is why we attempted to order the lowest level dimension values by higher level aggregates, so the cells here can be materialized, aggregated, and stored on disk in a streamed fashion. In a similar manner, if we were to group by only a subset of dimensions, we would be able to treat all dimensions not named as the highest hierarchical level for that dimension, which we refer to as ALL, and continue to use this nested loop approach.

### 5.4 Groupset Indexes

While Bitmap Segmentation permits us to use normal Value-List indexing, ANDing Bitmaps (or RID-lists) from individual indexes to find Groupsets, there is some inefficiency associated with calculating which Segments have no 1-bits for a particular Cell to save ANDing segment Bitmaps. In Figure 5.1, for example, the cell $(d_1^1, d_2^1, d_3^1)$ has only the leftmost 2 bits on, but the Value-List index Bitmaps for these values have many other segments with bits on, as we see in Figure 5.4, and Bitmaps for individual index values might have 1-bits that span many Segments.
Figure 5.4, the Groupset for the first cell (d1 1 , d2 1 , d3 1 ) calcu- lated by a Bitmap AND of the three index Bitmaps D1 = d11, D2 = To reduce this overhead, we can create a Groupset index, whose d21, and D3 = d31, is as follows. keyvalues are a concatenation of the dimensional primary-key values. Since the Groupset Bitmaps in nested loop order are 1100000000000000000000000000000000000... represented as successive blocks of 1-bits in row number, the -11-

Groupset index value can be represented by a simple integer, which represents the starting position of the first 1-bit in the Groupset, and the ending position of that Bitmap can be determined as one less than the starting position for the following index entry. Some cells will have no representative rows, and this will be most efficiently represented in the Groupset index by the fact that there is no value representing a concatenation of the dimensional primary-key values.

We believe that the Groupset index makes the calculation of a multi-dimensional group-by as efficient as possible when pre-calculating aggregates in summary tables isn't appropriate.

6. Conclusion

The read-mostly environment of data warehousing has made it feasible to use more complex index structures to speed up the evaluation of queries. This paper has examined two new index structures: Bit-Sliced indexes and Projection indexes. Indexes like these were used previously in commercial systems, Sybase IQ and MODEL 204, but never examined in print.

As a new contribution, we have shown how ad-hoc OLAP-style queries involving aggregation and grouping can be efficiently evaluated using indexing and clustering, and we have introduced a new index type, Groupset indexes, that are especially well-suited for evaluating this type of query.

References

[COMER79] Comer, D. The Ubiquitous B-tree. Comput. Surv. 11 (1979), pp. 121-137.

[EDEL95] Herb Edelstein. Faster Data Warehouses. Information Week, Dec. 4, 1995, pp. 77-88. Give title and author on http://www.techweb.com/search/advsearch.html.

[FREN95] Clark D. French. "One Size Fits All" Database Architectures Do Not Work for DSS. Proceedings of the 1995 ACM SIGMOD Conference, pp. 449-450.

[GBLP96] Jim Gray, Adam Bosworth, Andrew Layman, and Hamid Pirahesh. Data Cube: A Relational Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Proc. 12th Int. Conf. on Data Eng., pp. 152-159, 1996.

[GP87] Jim Gray and Franco Putzolu. The Five Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. Proc. 1987 ACM SIGMOD, pp. 395-398.

[HRU96] Venky Harinarayan, Anand Rajaraman, and Jeffrey D. Ullman. Implementing Data Cubes Efficiently. Proc. 1996 ACM SIGMOD, pp. 205-216.

[KIMB96] Ralph Kimball. The Data Warehouse Toolkit. John Wiley & Sons, 1996.

[M204] MODEL 204 File Manager's Guide, Version 2, Release 1.0, April 1989, Computer Corporation of America.

[O'NEI87] Patrick O'Neil. Model 204 Architecture and Performance. Springer-Verlag Lecture Notes in Computer Science 359, 2nd Int. Workshop on High Performance Transactions Systems (HPTS), Asilomar, CA, 1987, pp. 40-59.

[O'NEI91] Patrick O'Neil. The Set Query Benchmark. The Benchmark Handbook for Database and Transaction Processing Systems, Jim Gray (Ed.), Morgan Kaufmann, 2nd Ed. 1993, pp. 359-395.

[O'NEI96] Patrick O'Neil. Database: Principles, Programming, and Performance. Morgan Kaufmann, 3rd printing, 1996.

[O'NGG95] Patrick O'Neil and Goetz Graefe. Multi-Table Joins Through Bitmapped Join Indices. SIGMOD Record, September, 1995, pp. 8-11.

[O'NQUA] Patrick O'Neil and Dallan Quass. Improved Query Performance with Variant Indexes. Extended paper, available on http://www.cs.umb.edu/~poneil/varindexx.ps

[PH96] D. A. Patterson and J. L. Hennessy. Computer Architecture, A Quantitative Approach. Morgan Kaufmann, 1996.

[STG95] Stanford Technology Group, Inc., An INFORMIX Co. Designing the Data Warehouse on Relational Databases. Informix White Paper, 1995, http://www.informix.com.

[TPC] TPC Home Page. Descriptions and results of TPC benchmarks, including the TPC-C and TPC-D benchmarks. http://www.tpc.org.
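As a rough illustration of the clustered group-by of Figure 5.3 and the Groupset index of Section 5.4, the following C sketch may help. It is only a toy model under stated assumptions, not the paper's implementation: the sizes (three dimensions with two key values each, two fact rows per cell), the function names `dim_bitmap`, `groupset_bitmap`, `group_by_sum`, and `groupset_start`, and the use of a single 64-bit word in place of Segmented Bitmaps read from disk are all hypothetical. Row r is stored in bit r counting from the least significant bit, so the "left to right" movement in Figure 5.4 corresponds to increasing bit positions here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: 3 dimensions, 2 key values each, 2 rows per cell.
   The fact table is assumed clustered in nested-loop order (D1, D2, D3). */
enum { N1 = 2, N2 = 2, N3 = 2, ROWS_PER_CELL = 2,
       NROWS = N1 * N2 * N3 * ROWS_PER_CELL };

/* Value-List Bitmap for one dimension value over the clustered table.
   In a clustered table, a dimension's key repeats every `period` rows
   in runs of `run` rows; bit r is set when row r carries `value`. */
static uint64_t dim_bitmap(unsigned value, unsigned run, unsigned period)
{
    uint64_t bits = 0;
    for (unsigned r = 0; r < NROWS; r++)
        if ((r % period) / run == value)
            bits |= (uint64_t)1 << r;
    return bits;
}

/* Groupset Bitmap for cell (v1, v2, v3): AND of the three index Bitmaps. */
static uint64_t groupset_bitmap(unsigned v1, unsigned v2, unsigned v3)
{
    unsigned run3 = ROWS_PER_CELL, run2 = run3 * N3, run1 = run2 * N2;
    return dim_bitmap(v1, run1, run1 * N1)
         & dim_bitmap(v2, run2, run2 * N2)
         & dim_bitmap(v3, run3, run3 * N3);
}

/* Nested loop of Figure 5.3 with SUM as the aggregate: `projection`
   holds one column value per row (a stand-in for a Projection index),
   and sums[] receives one result per cell in nested-loop order. */
static void group_by_sum(const int *projection, long *sums)
{
    assert(projection != NULL && sums != NULL);
    size_t cell = 0;
    for (unsigned v1 = 0; v1 < N1; v1++)
        for (unsigned v2 = 0; v2 < N2; v2++)
            for (unsigned v3 = 0; v3 < N3; v3++, cell++) {
                uint64_t g = groupset_bitmap(v1, v2, v3);
                sums[cell] = 0;
                for (unsigned r = 0; r < NROWS; r++)
                    if (g & ((uint64_t)1 << r))
                        sums[cell] += projection[r];
            }
}

/* Groupset index entry: row number of the first 1-bit of the Groupset.
   Returns -1 for an empty Groupset, which would simply have no entry. */
static int groupset_start(uint64_t g)
{
    for (unsigned r = 0; r < NROWS; r++)
        if (g & ((uint64_t)1 << r))
            return (int)r;
    return -1;
}
```

With these assumed sizes, successive cells in loop order yield Groupset Bitmaps that are adjacent two-bit blocks, so storing only each block's starting row is enough to recover the block: its end is one less than the next entry's start.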
