Most pmv~ous work m the ama of mam memory database systems

Most pmv~ous work m the ama of mam memory database systems has focused on the problem of developing query processmg techmoues that work well wnh a very large buffer pool In thus paper, -we address query processmg issues for memoryresrdent relauonal databases, an envuonment wrth a very dtfferent set of costs and pnonues We present an arclutectum for a main memory DBMS, discussing the ways m whtch a memory resident database differs from a disk-based database We then address the problem of processmg relauonal quenes 1n bus archttecture, cons1denng altemat1ve algonthms for seleetron, pmpecuon, and Join opemnons and studying then performance

1. Query Processing in Main Memory Database Management Systems Tobm J L.&man Mchael J Carey Computer Sctenccs Department Untversity of Wisconsm Ma&son, WI 53706 ABSTRACT m mam memory For those apphcauons whose storage reqmrements contmue to exceed memory capacmes, them may sttll be often- Most pmv~ouswork m the ama of mam memory database sys- tems has focused on the problem of developing query processmg referenced relattons that wtll fit m memory, m which case rt may pay techmoues that work well wnh a very large buffer pool In thus to pamuon the databaseinto memory resident and disk resident por- paper, -we addressquery processmg issues for memoryresrdent rela- ttons and then use memory-specific techmques for the memory uonal databases, an envuonment wrth a very dtfferent set of costs restdent pomon (much hke IMS Fastpath and IMS [Dat81]) In and pnonues We present an arclutectum for a main memory DBMS, discussing the ways m whtch a memory resident database add1uonto tradruonal database appllcattons, there are a number of differs from a disk-based database We then addressthe problem of emerging appl1cattonsfor wluch mam memory sizes will almost cer- processmgrelauonal quenes 1nbus archttecture, cons1denngaltema- tainly be sufficient - appllcat1on.sthat wish to be able to store and t1ve algonthms for seleetron, pmpecuon, and Join opemnons and access mlaaonal data mostly because the relauonal model and 1ts studying then performance We show that a new mdex structure, the T Tree, works well for selectton and JOM processing in memory assoctated operattons provide an attracuve abstracaon for their restdent databases We also show that bashmg methods work well needs Horwuz and Teltelbaum have proposed using relaaonal for processmg pro~cttons and JOUIS. and that an old Join method, storage for program mformanon 1nlanguage-basededitors, as adding sort-merge, shll has a place m mam memory relations and relaaonal operanons to attnbute grammars provides a mce mechamsm for speafy1ng and building such systems [HoT85] 1 Introductron Linton has also proposed the use of a databasesystem as the basis for constructmg program development envlromnents but841 Today, medmm to hrgh-end computer systems typmally have Snodgrasshas shown that the relauonal model provides a good basis memory capacmes m the range of 16 to 128 megabytes, and 1t 1spro- for the development of performance momtonng tools and their inter- Jetted that chp densmes will contmue their current trend of doubling faces [Sno84] Finally, Warren (and others) have addressedthe rela- every year for the foreseeable future [Rs86] As a result, It IS ttonsh1p between Prolog and mlahonal database systems fJVar81], expected that mam memory sizes of a ggabyte or mote will be feasi- and having efficient algonthms for telauonal operaaons 1n mam ble and perhaps even fanly common wuhm the next decade Some memory could be useful for processing quenes m future logic pro- researchers b&eve that many apphcaaons wuh memory requue- gramming language implementauons ments whtch currently exceed those of today’s technology wtll thus become memory restdent apphcauons 1n the not-too-dntant future Mouvated by these considerations, we are addressing the ques- [GLV83], and the database systems area 1scertam to be affected 1n uon of how to manage large memory res&nr relauonal databases some way by these trends Pmv~ousstudies of how large amounts of Whereas tradmonal database algonthms are usually designed to memory will affect the design of databasemanagementsystems have muum1zedisk traffic, a main memory database system must employ focused almost entuely on how to make use of a large buffer pool algonthms that am dnven by other cost factors such as the number of [DK084, DeG85, ElB84, Sha86] data compansons and the amount of data movement. We are study- mg these issues, evaluatmg both old and new algonthms to deter- With memory stzes growing as they am, 1t IS qtute 11kelythat mme which ones make the best use of both CPU cycles and memory databases,at least for some apphcauons, will eventually fit enurely (Note that whde memory can be expected to be large, 1twdl never be free ) We have focused mostly on query processing issues to date, but we also plan to examme concurrency control and recovery issues Thisresearch wasparttallysupported by soIBM FellowshIp, anIBM Faculty III our research - main memory databases ~111still be mulu-user DevelosmentAward, and NattonalScrwceFoundauonGrant NumberDCR- 84028l-8 systems, and many apphcauons will require their data to be stored safely on disk as well as 1nmam memory The remainder of thts paper IS orgamzed as follows Secuon 2 PermIssIonto copy wtthout fee all or part of this matenal ISgranted describes our main memory DBMS archnecture, pointing out ways provtded that the copies are not made or dtstrtbuted for dnect m which the orgamzauon of mam memory databasescan profitably commerctal advantage, the ACM copynght nottce and the tttle of the doffer from drsk-based databases Sections 3 presents our work on pubheatron and its date appear, and nottce ISgrven that copymg 1sby algonthms for tmplementmg selechon, projectton, and JoIn opera- permtsstonof the Associatron for Computrng Machmery To copy otherwtse, or to repubhsh, reqmres a fee and/or spectfic permtsston uons Both algonthms and performance results are given for each of these operauons Fmally, Secuon 4 presents our conclusions and 0 1986 ACM 0-89791-191-l/86/0500/0239 $00 75 discussestheir impact on query opumizatton 239

2.2 Mam Memory DBMS Arch&cture references Department tuples, the MM-DBMS will subsmute a In ths sectlon, we present the design of a mam memory data- Department tuple pointer m tts place, The MM-DBMS can then sim- base management system (MM-DBMS) that we are bmldmg as part ply perform the selection on the Employee relanon, followmg the of a research proJect at the Umverslty of W~s~~~n-Mad~son The Department pointer of each result tuple key aspects of the design are the stmcture of relanons, m&es, and Assuming that the Department Relahon does not have pointers temporary hsts (for holding query results and temporary relanons) to the Employee Relation, remevmg data m the other drecaon Ideas for approachmg the problems of concurrency control and would sull requne a Jam operation, but the Join’s comparison can be recovery are in the development stages The design 1s presented done on pointers rather than on data Usmg the relanow from the under the assumption that the. entire. database resides m mam example above, consider the followmg query memory, ignonng (for now) the case of a pamuoned database Query 2 Retneve the names of all employees who work m the Toy or Shoe Departments 2 1 Relations To process tis query, a selecuon wdl be done on the Department Every relanon in the MM-DBMS ~11 be broken up mto partl- relauon to remeve the “Shoe” and “Toy” Department tuples, and the tions, a partmon is a umt of recovery that IS larger than a typical &Sk result will then be homed with the Employee relanon For the Join, page, probably on the order of one or two Qsk tracks In order to compansons will be performed using the tuple pointers for the allow more freedom of design of these pamuons, the relanons ~111 selecuon’s result and the Department tuple pointers m the Employee not be allowed to be traversed dltecdy, so all accessto a relation 1s relation Wlnle this would be equivalent m cost to Jmmng on through an index (Note that tins reqmres all telauons to have at Dept-Id m tis example, it could lead to a slgmficant cost savmgs If least one mdex ) Although physical contiguity 1snot a major perfor- the Join columns were smng values instead mance issue m mam memory (indeed, the tuples of a relation could be scattered across all of memory), keepmg the tuples grouped 2 2 Indices together m a pamnon aids m space management and recovery, as Since relauons are memory resident, it 1s not necessary for a well as being more efficient in a multi-level cache environment (In a main memory index to store actual attnbute values Instead, pomters single-level cache, cache block Sizes are typically smaller than the to tuples can be stored m then place, and these potntem can be used size of a tuple, but m a muln-level cache where there are several to extract the atmbute values when needed This has several advan- cache block sizes, the larger sized cache blocks could hold most or tages First, a single tuple pointer provides the index with accessto all of a pamtion ) both the atmbute value of a tuple and the tuple itself, reducing the The tuples m a pamuon will be referred to directly by memory Size of the index Second, dus eliminates the complexity of dealing addresses,so tuples must not change locauons once they have been with long fields, vanable length fields, compression techmques, and entered mto the database For a vanable-length field, the tuple itself calculatmg storage reqmrements for the index Third, movmg will contain a pointer to the field m the ptitron’s heap space, so pointers will tend to be cheaper than moving the (usually longer) attnbute values when updates necessitate mdex operaaons Finally, tuple growth will not cause tuples to move ’ Since tuples in memory Since a single tuple pomter provides accessto any field in the tuple, can be randomly accessedwith no loss m performance, tt 1spossible muIn-atmbute m&ces will need less m the way of special mechan- for the MM-DBMS to use pointers where it would otherwise be isms Figure 1 shows an example of two m&es built for the necessary to copy data in a disk-based DBMS For example, If Employee relauon (The m&es are shown as sorted tables for sim- foreign keys (attnbutes that reference tuples m other relations) are PllW) ldentlfied m the manner proposed by Date [DatW, the MM-DBMS can substitute a tuple pointer field for the foreign key field (Tins The MM-DBMS design has two types of dynanuc mdex struc- field could hold a single pointer value m the case of a one to one tures, each serving a &fferent purpose The T Tree mC851, a rela- relatlonshlp, or It could hold a list of pomters If the relationship 1s uvely new index structure designed for use m mam memory, is used one to many ) When the foreign key field’s value 1sreferenced, the as the general purpose index for ordered data It 1sable to grow and MM-DBMS can simply follow the pointer to the foreign relation shnnk gracefully, be scanned m either Qrection, use storage tuple to obtam the desired value %s will be more space efficient, efficiently, and handle duplicates with httle extra work Modified as pointers will usually be as small as or smaller than data values Linear Hashmg, a vanant of Linear Hashmg [Lit801 that has been (especially when the values are stnngs) Tlus wdl also enhance modified for use m mam memory mC85], is used for unordered remeval performance by allowing the use of precomputed Jams data Several other index structures were constructed to aid in the Consider the following example exammatlon of Join and project methods shown later m tis paper The array index structure [AHK85] was used to store ordered data It Employee Relauon (Name, Id, Age, DeptId) Department Relation (Name, Id) 1seasy to build and scan, but it 1suseful only as a read-only index because It does not handle updates well Chained Bucket Hashmg Query 1 Retneve the Employee name, Employee age, and Depart- [AHU74] was used as the temporary index structure for unordered ment name for all employees over age 65 data, as It has excellent performance for static data (Ongmally, Most convenuonal DBMSs lack precomputed Jams and would Chained Bucket Hashmg was going to be used for static structuresin require a Join operation to answer tis query Even with precom- the MM-DBMS, but it has Since been replaced by Mo&ied Lmear puted Joms, a conventional DBMS would need to have the Depart- Hashmg, becauseIt was discovered that the two have slrmlar perfar- ment tuples clustered with the Employee tuples or It could pay the mance when the number of elements remants stabc ) pnce of a disk access for every Department tuple remeved In the MM-DBMS, usmg precomputed Joins is much easier Assummg 2 3 Temporary hsts that the Emp Depud field has been ldentlfied as a foreign key that The MM-DBMS uses a temporary hst stmctum far srormg mtermedmte result nlahons A temm hst is a hst of tuple pomters plus an awxx@d result descnptor The pomters point to the source relauon(s) from which the temporary relanon was formed, 240

3.and the result descnptor 1dentlfiesthe fields that are contamed 1nthe actual update 1sdone to the database,as 1sdone 1nIMS FASTPATH relation that the temporary hst represents The descnptor takes the [IBM791 If the transaction aborts, then the log entry 1sremoved and place of projection - no wdtb reduction 1sever done, so there 1sht- no undo 1sneeded If the transacuon comrmts, then the updates are tle motivation for computing pmJect1onsbefore the last step of query propagated to the database After a crash, the MM-DBMS can processmg unless a sigmficant number of duphcates can be ehm- continue processmg as soon as the workmg sets of the current tran- 1nated Unlike regular relauons, a temporary hst can be traversed sacDonsare present in main memory The process of readmg in a Qrectly, however, 1t1salso possible to have an index on a temporary workmg set works as follows Each pamtlon that pamclpates 1n the hst working set 1sread from the &sk copy of the database The log dev- 1ce1schecked for any updates to that pamuon that have not yet been As an example, 1f the Employee and Department relations of propagated to the disk copy Any updates that exist are merged with Figure 1 were Joined on the Department Id fields, then each result the parution on the fly and the updated pamuon 1s placed 1n tuple 1n the temporary hst would hold a pourof tuple pomters (one memory Once the workmg set has been read in, the MM-DBMS poumng to an Employee tuple and one pomting to a Department should be able to run at close to 1tsnormal rate while the remainder tuple), and the result descnptor would hst the fields 1n each relation of the database1sread 1nby a background process A related propo- that appear 1nthe result Rgure 1 also shows the result hst for such sal for mam memory databaserecovery has been developed in paral- an equijom on Department Id (Query 1) lel with ours [&86], since both schemes are 1n theu development EmDlOVeeRelatton r Employee 1 stages,however, 1twould be premature to compare them here c Log Device I I L-L-Icl- 124 102 110 124 105 110 CPU 137 137 DeDartment L Flgure 2 - Recovery Components Concurrency control costs are different for a memory resident I 1 287 1 Pamt 1 455 1 1 (102,201)1 ’ - database Transactions will be much shorter 1n the absence of &sk accesses In tis environment, 1t will be reasonable to lock large Flgure 1 - Relation and Index Design items, as locks will be held for only a short time Complete senall- zat1onwould even be possible 1f all transactionscould be guaranteed to be reasonably short, but transachon interleaving 1s necessary for 2 4 Concurrency Control and Recovery fairness 1f some transactions will be long We expect to set locks at The MM-DBMS 1sintended to provide very high performance the partition level, a frurly coarSelevel of granulanty, as tuple-level for the apphcahons that 1t 1scapable of serving, many of which urlll locking would be prolubrtlvely expensive here (A lock table 1sbasi- reqmre their data to be stored safely on disk as well as 1n memory tally a hashed relation, so the cost of lockmg a tuple would be com- Thus, the MM-DBMS must have a fast recovery mechamsm The parable to the cost of accessing 1t - thus doubling the cost of tuple system 1s intended for multiple users, so 1t must also provide con- accesses1f tuple-level lockmg 1sused ) Recall that the Size of a par- currency control While we have not yet fimshed the design of these tmon 1s expected to be on the order of one or several disk tracks subsystems, we wish to point out some of the major issues that are (since tlus 1sthe unit of recovery) Partmon-level lockmg may lead gmd1ngtheir design to problems with certam types of transactions that are mherently One proposed solution to the recovery problem 1s to use long (e g , conversational transachons) We will addressthese issues battery-backup RAM modules &eR85], but thus does not protect 1nfuture work memory from the possibility of a me&a failure - a malfunctionmg CPU or a memory frulure could destroy a several gigabyte database 3 Query Processmg m Mam Memory DBMS Thus, disks will stdl be needed to provide a stable storage medium The direct addressatnhty of data 1na memory resident database for the database Given the size of memory, appl1cauonsthat depend has a profound impact on query processmg With the nouon of clus- on the DBMS ~11 probably not be able to afford to wat for the tenng removed, the methods for selection, Join and projection entire database to be reloaded and brought up to date from the log acquire new cost formulas Old and new algonthms for these query Thus, we are developing an approach that will allow normal process- processing operations were tested to determme which algonthms 1ng to contmue 1mme&ately, although at a slower pace until the perform best 1na main memory environment workmg sets of the current transacuonsare read into mam memory Our approach to recovery 1n the MM-DBMS 1s based on an 3 1 The Test Environment active log device. Dunng normal operation, the log device reads the All of the tests reported here were run on a PDP VAX 1l/750 updates of comrmtted transactions from the stable log buffer and running with two megabytes of real memory (as opposed to virtual updates the disk copy of the database The log device holds a change memory) ’ Each of the algontbms was Implemented 1n the C pro- accumulation log, so 1t does not need to update the disk version of gramming language, and every effort was made to ensure that the the database every ume a partmon 1s modified The MM-DBMS quality of the implementauons was umform acmss the algonthms wntes all log 1nformauondirectly into a stable log buffer before the The validity of the execution umes reported here was venfied by 241

4.recording and exanumng the number of compansons, the amount of data movement, the number of hash funcoon calls, and other nuscel- laneous operations to ensure that the algonthms were doing what they were supposed to (1e , neither more nor less) These counters were compiled out of the code when the final performance tests were run, so the execution hmes presented here reflect the mnmng times of the actual operaaons ~rlth very little ume spent m overhead (e g , dnver) routmes Tumng was done usmg a rouhne sundar to the ‘getrusage’ facility of Umx ’ Figure 3 - A T Tree 3 2 Selection + Half-Leaf Nodes This section summarizes the results from a study of index mechamsmsfor mam memory databases[Lec85] The index stmc- tures tested were AVL Trees [AHU74], B Trees [Com7913,arrays data, data, data, “’ dat [AHK85], Chamed Bucket hashmg [Knu73], Extendible Hashmg I I [FNP79], Linear Hashmg &t80], Modified Linear Hashmg [LeC85], and one new method, the T Tree [LeC85] (Modified Linear Hashmg uses the basic pnnclples of Lmear Hashmg, but uses very small nodes m the directory, single-item overflow buckets, and average overtlow cham length as the cntena to control duectoly growth ) All Figure 4 - T Nodes of these mdex structures, except for the T Tree, are well-known, and tbelr algontbms are described m the hterature Thus, we descnbe called an rnrernal node A T-node that has one NIL chdd pointer and only the T Tree here one non-NIL cluld pointer 1scalled a half-leaf A node that has two 3 2 1 The T Tree Index Structure NIL cluld pointers 1scalled a leaf For a node N and a value X, If X lies between the mlmmum element of N and the maximum element The T Tree 1sa new balanced tree structure that evolved from of N (mcluslve), then we say that node N bounds the value X Since AVL and B Trees, both of which have certam posmve quahhes for the data m a T-node ISkept m sorted or&r, its leftmost element ISthe use m mam memory The AVL Tree was designed as an internal smallest element m the node and its nghrmost element 1sthe largest. memory data structure It uses a bmary tree search, which 1s fast For each internal node A, there 1sa cormspondmg leaf (or half-leaf) sum the binary search 1smtrm~c to the tree structure (1e , no anth- that holds the data value that 1s the predecessor to the mimmum mettc calculations are needed) Updates always affect a leaf node, value m A, and there 1salso a leaf (or half-leaf) that holds the succes- and may result m an unbalanced tree, so the tree IS kept balanced by sor to the maximum value m A The predecessorvalue 1scalled the rotation operations The AVL Tree has one major hsadvantage - greatest lower bound of the internal node A, and the successorvalue its poor storage uuhza0on Each tree node holds only one data item, IScalled the least upper bound so there are two pointers and some control mformation for every data item The B Tree IS also good for memory use - its storage unhza- Associated with a T 1 ree is a nummum count and a maxlmum count Internal nodes no&s keep their occupancy (I e the number of uon IS better since there are many data items per pomter4, searchmg data items m the node) m dus range The mlmmum and maximum 1s fairly fast since a small number of nodes are searched with a counts will usually differ by Just a small amount, on the order of one binary search, and updatmg 1s fast smce data movement usually or two items, whch turns out to be enough to slgmficandy reduce the involves only one node need for tree rotations With a mix of mserts and deletes, dns little The T Tree is a binary tree with many elements per node (Rg- bit of extra mom reduces the amount of data passeddown to leaves ure 3) Figure 4 shows a node of a T Tree, called a T Node Since due to msert overtlows, and It also reduces the amount of data bor- the T Tree IS a binary tree, It retams the mtnnslc binary searchnature rowed from leaves due to delete underllows Thus, havmg flexlblhty of the AVL Tree, and, becausea T node contmns many elements, the m the occupancy of internal nodes allows storage uuhxanon and T Tree has the good update and storage charactenshcsof the B Tree insert/delete time to be traded off to some extent Leaf nodes and Data movement 1s required for msemon and deletion, but it 1susu- half-leaf nodes have an occupancy ranging from zero to the max- ally needed only withm a single node Rebalancing IS done using Imum count rotations smular to those of the AVL Tree, but it 1sdone much less Searchmg m a T Tree 1s sumlar to seamhmg m a binary tree often than m an AVL ‘Fiee due to the possibility of mtra-node data The mam difference.1sthat compansons are made wltb the mimmum movement and maximum values of the node rather than a single value as m a To md in our dlscuss:on of T Trees, we begin by mtroducmg bmary tree no& The search conslsta of a bmary tree search to iind some helpful termmology There are three different types of T- the node that bounds the searchvalue and then a bmary searchof the nodes, as shown in Figure 4 A T-node that has two subtrees 1s node to find the value, if such a node ts found To insert mto a T Tree, one first searches for a node that of AT&T Bell Laboratones 2Unvr IS a trademark bounds the msert value If such a node 1sfound, the item 1sinserted ’ Wereferto theonglaalB Tree,notthecommonlyused B+ Tree Testsre- there If the insert causesan overtlow, the mimmum elementSof that ported m [L&85] showed that the B+ Tree uses more storage than the B Tree and does not perform any better m mam memory ’ A B Tree mternal node contams (N + 1) node pomters for every N data shlovmg the muummn element mqures less total data movement tbao movmg Items whilea B Treeleafnodecoota~ns onlydataitems Sinceleafnodesgreatly the maxmum element Smularly, when a node underflow because of a deletion, outnumber mlemalnodesfor typlcalvaluesof N, therearemanydataitemspernode borrowlog the greatest lower bound from a leaf node rqmres less work than kc- poulter rowmg the least upper bound These details are explamed m [L.eC851 242

5.node IS transferred to a leaf node, becoming the new greatest lower Modltied LlFear Hash bound for the node it used to occupy If no boundmg node can be found, then the leaf node where the search ended 1sthe node where the insert value goes If the leaf node 1sfull, a new leaf 1sadded and the tree IS rebalanced To delete from a T Tree, one first searches for the node that bounds the delete value Then, one searchesthe node for the delete value If a boundmg node IS not found, or the delete value wlthm the bounding node IS not found, the delete returns unsuccessful Other- wse, the item IS removed from the node If deletmg from the node Array causes an underllow, then the greatest lower bound for tlus node 1s borrowed from a leaf If dns causes a leaf node to become empty, T Tree the leaf node 1sdeleted and the tree 1srebalanced If there 1sno leaf to borrow from, then the node (which must be a leaf) 1sallowed to Seconds AVL Tree underflow 3 2 2 The Index Tests Each index structure (arrays, AVL Trees, B Trees, Chained Bucket Hashmg, Extedble Hashmg, Lmear Hashmg, Modified ChamedBucket Hash Linear Hashmg, and T Trees) was tested for all aspectsof mdex use creation, search, scan, range quenes (hash structures excluded), query nuxes (mtenmxed searches,mserts and deletes), and deletion o! ’ ’ ’ 1 8 1 ’ ’ 0 ’ Each test used m&x structures filled ~rlth 30,000 umque elements 0 10 20 30 40 50 60 70 80 90 100 (except for create, which inserted 30,000 elements) The m&ces Node Size were configured to run as umque mdlces - no duphcates were per- Graph 1- Index Search mltted The index structures were constructed in a *mam memory” style, that is, the Indices held only tuple pomters instead of actual “Node Size” 1sreally average cham length for Mo&fied Linear Hash- key values or whole tuples We summanze the results of three of the ing 1 tests from [LeCSS] searchmg, a query mrx of searchesand updates, and storage cost measurements In order to compare the perfor- Query MIX mance of the mdex structuresm the same graphs, the number of van- The query mix test IS most Important, as It shows the index able parameters of the vanous st.mctureswas reduced to one - node structures m a normal workmg envlromnent Tests wem performed size In the caseof Mtified Lmear Hashmg, single-item nodes were for three query nuxes using different percentages of mterspersed used, so the “Node Sue” axis m the graphs refers to the average searches,inserts and deletes overflow bucket cham length Those structures without vanable 1) 80% searches,10% msew, 10% deletes node sizes simply have straight hnes for their execution umes The 2) 60% searches,20% mserta, 20% deletes graphs represent the hashmg algontbms with dashed lines and the 3) 40% searches,30% inserts, 30% deletes order-presexvmgstructures with solid lines The query mix of 60 percent searches,20 percent mserta and 20 per- cent deletes (Graph 2) was representative of the three query mix Search graphs The T Tree performs better than the AVL Tree and the B Graph 1 shows the search times of each algonthm for vanous Tree here because of its better combmed search / update capability node sizes The array usesa pure bmary search The overhead of the The AVL tree is faster than the B Ttee because It 1s able to search anthmeac calculauon and movement of pointers 1snonceable when faster than the B Tree, but the execution times are smular becauseof compared to the “hanlwu&” binary search of a binary tree In con- the B Tree’s better update capability For the smallest node sizes, trast, the AVL Tree needs no anthmetrc calculahons, as It Just does Modified Linear Hashmg, Extendible Hashmg, and Chamed Bucket one compare and then follows a pomter The T Tree does the major- Hashing are all basically equivalent They have similar search cost, ity of lta search m a manner slmllar to that of the AVL Tree, then, and when the need to resize the directory 1snot present, they all have when It locates the correct node, It sHrltchesto a binary search of that the same update cost Lmear Hashmg, on the other hand, was much node Thus, the search cost of the T Tree search 1s shghdy more slower because, trying to mamtam a pamcular storage uuhzatlon than the AVL cost, as some time 1slost m binary search- (number of data bytes used I total number of data bytes aviulable), It mg the final node The B Tree search ume IS the worst of the four did a slgmficant amount of data reorgamzmon even though the order-preservmg structures, because It reqmres several bmary number of elements was relatively constant As for the array index, searches,one for each no& m the searchpath ita performance was two orders of magmmde worse than that of the The hashmg schemes have a fixed cost for the hash funcaon other index structures becauseof the large amount of data movement computation plus the cost of a linear searchof the node and any asso- reqmred to keep the amy m sorted order (Every update reqmres ciated overflow buckets For the smallest node sizes, all four hash- moving half of the army, on the average ) mg methods are basically equivalent The differences he in the search fimes as the nodes get larger Linear Hashmg and Extendible Storage Cost Hashmg are Just about the same, as they both search mulaple-Item Space considerations preclude the mcluslon of the storage nodes Mo&fied Linear Hashmg searchesa linked hst of single-Item results graph, but we summanze them here The array uses the nodes, so each data reference reqmres traversing a pointer Tlus mlmmum amount of storage, so we discuss the storage costs of the overhead 1s noticeable when the cham becomes long (Recall that other algonthms as a raao of their storage cost to the array storage 243

6. 947 based methods tested, Modified Linear Hashmg pmvlded the best Array overall performance & 501 I I Lookmg at the order-preserving mdex structures, AVL Trees I i : have good search execution times and fair update execution rimes,, 45 - \ Lmear Hash : but they have tigh storage costs Arrays have reasonable search : / umes and low storage costs, but any update actlvlty at all causesit to /’ have execunon rimes orders of mugnrrude higher than the other ,/ Modified Lmear Hash mdex structures AVL Trees and arrays do not have suffiaently good performance I storage charactenshcs for conslderaaon as mam memory indices T Trees and B Trees do not have the storage prob- lems of dynarmc haslung methods, they have low storage costs for 30 1 i ; !I those node sizes that lead to good performance The T Tree seemsto be the best of choice for an order-preservmg mdex structure, as it performs umformly well in all of the tests Extendible Ha+,. ,//--’ /’ /’ \ B Tree /’ L AVL Tree ,‘- / c / T Tree 5~-------------------------------- ChamedBucket Hash 0- I I , I I , t I 0 10 20 30 40 50 60 70 80 90 100 Table 1 -Index Study Results Node Size Graph 2 - Query MIX of 60% Searches 3 3 Jorn cost Fust, we consider the fixed values the AVL Tree storage fac- Previous Jam studies mvolvmg large memones have been tor was 3 because of the two node pomters it needs for each data based on the large buffer pool assumphon [Sha86], [DK084], item, and Chamed Bucket Hashmg had a storage factor of 2 3 [DeG851 (Others have studled hash moms as well m a normal dtsk because it had one pointer for each data Item and part of the table environment [Bab791, [VaG84], [Bra84], but tbelr results are less remained unused (the hash funcnon was not perfecrly umform) applicable here ) Three mam Jam methods were tested m [DeG85] Modified Lmear Hashmg was slmdar to Chamed Bucket Hasbmg for Nested Loops wtth a hashed index, Sort Merge [BlE77], and three an average hash chain length of 2, but, for larger hash chams, the hasbmg methods, Simple Hash, Hybnd Hash and GRACE Hash number of empty slots m the table decreasedand eventually the table [DK084] The msulta showed that when both Elations fit m became completely full Finally, Linear Hashmg, B Trees, Extendl- memory, the three hash algonthms became equvalent, and the ble Hashmg and T Trees all had nearly equal storage factors of 15 nestedloops Jam with a bash index was found to perform Just as well for medmm to large size nodes Extendible Hashmg tended to use as the other hash algonthms (and outperformed Sort Merge) They tbe largest amount of storage for small nodes sizes (2,4 and 6) This also studied the use of semqom pmcessmg with ht vectors to reduce was because a small node size mcreased the probablbty that some the number of disk accessesinvolved m the Join, but dus senqom nodes would get more values than others, causing tbe directory to pass 1s redundant when the relattons are memory resident The double repeatedly and thus use large amounts of storage As its node variety of Join relation compostaons (e g , sizes, Join selechvitles, size was mcreased,the probabllny of dus happemng became lower Jam column value dtstnbunons) used m their study was small, and may not completely reflect all posslblhties (performance-wise) 3 2 3 Index Study Results In dus study, we examme the performance of a number of can- Table 1 summarizes the results of our study of mam memory didate Join methods for the MM-DBMS We use a wide selection of index structures We use a four level ratmg scale (poor, far, good, relation composlhons so as to evaluate the algonthms under a wde great) to show the performance of the index structures m the three vanety of possible condmons categones An important dung to nonce about the hash-based indices is that, whtle Extendble Hashmg and Mtified Linear Hash- 3 3 1 Relation GeneratIon mg had very good performance for small nodes, they also had hgh In or&r to support our intent to test a variety of relanon com- storage costs for small nodes (However, the storage uhhzatlon for poslhons, we constructed our test relahons so that we could vary Modified Linear Hashmg can probably be Improved by using several parameters The vanable parameterswere multiple-item nodes, thereby reducing the pointer to data Item rauo, (1) The relauon cardmahty (IRj) the version of Modified Linear Hasbmg tested here used smgle-Item nodes, so there was 4 bytes of pointer overhead for each data item ) (2) Tbe number of Jam column duplicate values (as a percentageof As for the other two hash-basedmethods Chained Bucket Hashmg IRI) and then &stnbuhon had good search and update performance, but it also bad fairly high (3) The semjorn selecnvlty (the number of values m the larger storage costs, and it 1s only a stauc stn~cture, and finally, Linear relauon tbat paruclpate in the Join, expressed as a percentageof Hashing is Just too slow to use m mam memory Among the hash- the larger telauon) 244

7. In order to get a vanable semlJom selecnvlty, the smaller rela- tuples that have to be scanned m the inner relation The Hash Join tlon was built with a specified number of values from the larger rela- bmlds a Cham Bucket Hash index on the Join column of the mner tion To get a vanable number of duphcates, a specified number of relation, and then it uses tis index to find matchmg tuples durmg the umque values were generated (either from a random number genera- JoIn The Tree Jom uses an exlstmg T Tree index on the mner rela- tor or from the larger relauon), and then the number of occurrences hon to find matchmg tuples We do not include the posslblhty of of each of these values was determmed usmg a random sampling building a T Tree on the mner relation for the Join because It turns procedure based on a truncated normal dlstnbutlon with a vanable out to be a viable alternative only if the T tree already exists as a reg- standard deviation Graph 3 shows the three duphcate dlstnbuhons ular index - if the cost to build the tree is included, a Tree Jom ~111 used for the tests - a skewed &smbutlon (where the standarddevla- aZwuyscost more than a Hash Jom, as a T tree costs more to bmld hon was 0 1), a moderately skewed dlstnbuhon (the 0 4 curve m the and a hash table IS faster for single value remeval DC851 On the graph), and a near-umform dlsmbution (the 0 8 curve m the graph) other hand, we always include the cost of bmldmg a hash table, becausewe feel tbat a hash table mdex 1sless likely to emst than a T 100 Tree index The cost of creating a hash table with 30,000 elements 1s about 5 secondsm our envlmnment OCSS] 90 The merge Jam algonthm [BlE77] was implemented using two index struchzes, an array mdex and a T Tree mdex For the Son 80 Merge algorithm tested here, array indexes were built on both rela- tions and then sorted The sort was done using qmcksort wtth an msemon sort for subarrays of ten elements or less 6 For the Tree 70 Merge tests, we built T Tree mdlces on the Jam columns of each relation, and then performed a merge Join usmg these indices How- 60 ever, we do not report the T Tree constmctton ames m our tests - tt turns out that the T Merge algorithm 1s only a viable altemanve if SO the indices already exist Prehmmary tests showed that the arrays Percent can be built and sorted m 60 percent of the time to bmld the trees, Tuples and also that the array can be scannedm about 60 percent of the time 40 It takes to scan a tree 30 3.3 3 Jom Tests The Jom algonthms wem each tested with a variety of relation i 20 composltrons m order to determine their relative performance Six ; tests were performed m all, and they are summarized below In our descnpuon of the tests, IRl( denotes the outer relation and IR21 -I 10 I denotesthe mner relation -i (1) Vary Cardtnuhty Vary the sizes of the relations with (Rll = 0 jR21,0% duphcates, and a semlJomselechvlty of 100% 0 10 20 30 40 50 60 70 80 90 100 (2) Vary Inner Curdmnultty Vary the size of R2 (IR2( = l-100% of Percent Values IRll) with [Rll = 30,000, 0% duplicates, and a semJom selec- Graph 3 - Dlstrlhutlon of Duphcate Values t1v1ty of 100% (3) Vury Outer Cardmltty Vary the size of Rl (IRll = l-100% of The results for the 0 4 and 0 8 caseswere slmdar, so results are given IR21)with IR21= 30,000, 0% duplicates, and a senuJomselec- here only for the two extreme cases t1v1ty of 100% (4) Vary Dupltcate Percentage (skewed) Vary the duplicate per- 3.3 2 The Jom Algorithms centage of both relations from O-100% with IRll = IR21 = For memory resident databases, all of the hash-based algo- 20,000, a semiJom selecnvlty of 1008, and a skewed duplicate nthms tested m [DeG85] were found to perform equally well dlsmbution Therefore, the hash-based nested loops algonthm IS the only hash- (5) Vary Duphcate Percentage (uniform) Vary the duplicate per- based algonthm that we examme here For our tests, we lmple- centage of both mlauons from O-100% with IRll = IR21 = mented and measured the performance of a total of five Join algo- nthms Nested Loops, a simple mam-memory version of a nested 20,000, a semlJomselecuvlty of lOO%,and a umform duplicate loops Jam with no index, Hash Jom and Tree Jom, two vanants of dlsmbuuon the nested loops Jom that use indices, and Sort Merge and Tree (6) Vary Semyorn Selectrvrty Vary the semiJoin selecmrty from Merge, two vanants of the sort-merge Jom method of [BlE77] We l-100% with jRl[ = IR21= 30,000 and a duplicate percentageof bnefly descnbe each of these methods m turn Recall that relauons 50% with a umform duplicate dlsmbunon are always accessed vta an index, unless otherwise specified, an array index was used to scan the relations m our tests 6 We ran a test to detenmne the optunalsubarray sue for swtchmg from The pure Nested Loops Join IS an O(N*) algontbm It uses one qucksort to’msertmn sort, the ophmal subarray size was 10 relation as the outer, scamung each of its tuples once For each outer tuple, it then scansthe entire inner relaaon lookmg for tuples with a matchmg Jom column value The Hash Jom and Tree Jom algo- nthms are similar, but they each use an index to limit the number of 245

8. -- 3 3 4 Jom Test Results Test 2 - Vary Inner Cardmallty we present the results of each of the JOHI tests m dns secuon Graph 5 shows the performance of the JOHI methods as R2’s The results for the Nested Loops algorithm ~11 be presented cardmahty 1s vaned from l-10036 of the cardmahty of Rl In tis separately at the end of the secaon, as its performance was typIcally test, Rl’s cardmahty IS fixed at 30,000, the Join columns were agam two orders of magmtude worse than that of the other Jam methods keys (1e , no duplicates), and the senuJom selectlmty was agam 100% The results obtamed here are snmlar to those of Test 1, with Test 1 - Vary Cardmality Tree Merge performmg the best if T Tree m&ces exist on both Join Graph 4 shows the performance of the Jam methods for rela- columns, and Hash Jom performing the best othemse In dus test, tlons with equal cardmahues The relations are Jomed on keys (1e , each of the the mdex JOIIIS were basically doing (RI\ searchesof an no duplicates) w1t.ha semlJom selectlvlty of 100% (1e , all tuples index of (Increasing) cardmahty IR21 pamclpate m the Join) If both m&ces are av;ulable, then a Tree Merge gives the best performance It does the least amount of work, 20- as the T Tree indices are assumed to exist, and scanmng them m JOIN TEST 2 (Vary IR21) order 1lmlt.sthe number of compansons required to perform the Join The number of compansons done 1sapproximately ([Rll + IR21* 2), Hash Jom as each element in Rl 1sreferenced once and each element m R2 IS - Tree Join referenced twice me presence of duplicates would increase the ------ Sort Merge number of hmes the elements m R2 are referenced) If it 1snot that 15- --- case that both mdlces are avadable, it 1sbest to do a Hash Join It turns out that, m tlus case, it is actually faster to build and use a hash table on the mner relanon than to simply use an exishng T Tree index A Hash table has a fixed cost, independent of the index size, to look up a value The number of compansons done in a Hash Join ,/- I 1sapproximately (IRlj + (IRll * k)) where k IS the fixed lookup cost, lo- /’ /’ /’ whereas the number of compansons m a Tree Jom 1sroughly (IR1I + (IRl) * Log@2]))) The value of k 1s much smaller than Seconds 1’ , Logz(lR21)))but larger than 2 Finally, the Sort Merge algorithm has the worst performance of the algonthms in dus test, as the cost of / , bulldmg and sorting the arrays for use m the merge phase 1stoo high (WI * Lw$lW) + WI * L4$lW)) + WI + WIN 5 I, -- __--- --/ /-- JOIN TEST 1 ((RI! = IR2() _--- -- 0- 0 25% 50% 75% 100% Hash Jom 1’ /’ Graph 5 - 1R2) Percentage of JR11 Vary Inner Cardmahty - ------ --- Tree Jam Sort Merge Tree Merge /’ /’ /’ / 1’ 1’ Test 3 -Vary Outer Cardmahty /’ The parameters of Test 3 were identical to those of Test 2 10 - /’ except that jRl( was vaned instead of (R2( The results of dus test are /’ 1’ , shown m Graph 6 The Tree Merge, Hash Join, and Sort Merge algonthms perform much the same as they did m Test 2 In dus /’ Seconds /’ II/-; case, however, the Tree Join outperforms the others for small values of (Rll, beatmg even the Tree Merge algorithm for the smallest (RI1 values Tlus IS mtumve, as this algorithm behaves like a simple selecuon when [Rll contams few tuples Once IR21 mcreases to about 60% of IRlI, the Hash Jom algorithm becomes the better method agam because the speed of the hash lookup overcomes the lmtlal cost of bmldmg the hash table, both of which combmed are cheaper than the cost of many T Tree searchesfor large values of lRl/ Note if a hash table index already existed for R2, then the Hash Jom would be faster than the Tree Jom (recall that bmldmg the 0 7500 15000 22500 30000 hash table takes about 5 seconds) Number Of Tuples Graph 4 - Vary Cardmahty 246

9. 20 JOIN TEST 3 Wary lR11) JOIN TEST 4 (Vary Duplicates - Skewed Dust ) 10000 Hash Jom - Tree Jom Hash Join - Tree Jom ------ Sort Merge --- Tree Merge 10 Seconds Seconds 10 50% 75% 100% 1 0 25% 75% 100% IRll Percentage of IR21 0 25% 50% Duplicate Percentage Graph 6 - Vary Outer Cardmahty Graph 7 - Vary Duplicate Percentage (skewed) Test 4 -Vary Duphcate Percentage (skewed) IN TEST 5 (Vary Duphcates - Uniform Dust ) For test 4, IRll and /R21were fixed at 20,000, the sern1Jom 1000 selecttvity was kept at lOO%, and the duphcate percentage for both relahons was vaned from 1 to 100% The results of dus test are shown m Graph 7 The duplicate chsmbuuon was skewed, so there Hash Jom were many duplicates for some values and few or none for others - Tree Jom (The duplicate percentagesof the two relauons were different m this ------ Sort Merge test - a result of the relauon construction procedure In order to --- Tree Merge actieve 100 percent semqom selecnvlty, the values for R2 were II 100 chosen from Rl, wluch already contamed a non-urnform dlsmbuhon of duphcates Therefore, number of duplicates m R2 1sgreater than that of Rl The duplicate percentagesm Graph 7 refer to Rl ) Once the number of duplicates becomes sigmficant, the number of match- mg tuples (and hence result tuples) becomes large, resultmg m many Seconds more tuples being scanned The Sort Merge method 1s the most efficient of the algonthms for scanmng large numbers of tuples - once the skewed duplicate percentage leaches about 80 percent, the 10 cost of bmldmg and sortmg the arrays IS overcome by the efficiency / of scannmg the relations via the arrays, so It beats even Tree Merge m tlus case Although the number of compansons 1s the same, as both Tree Merge and Sort Merge use the same Merge Jam algorithm,, the array index can be scanned faster than the T Tree index because the army index holds a hst of contiguous elements whereas the T Tree holds nodes of con@uous elements Joined by pomters Test results from [L&851 show that the array can be scannedm about 2/3 1 the time It takes to scan a T Tree The Index Jam methods are less 0 25% 50% 75% 100% efficient for processmg large numbers of elements for each Jcnn Duplicate Percentage value, so they begin to lose to Sort Merge when the skewed duphcate Graph 8 - Vary Duplicate Percentage (uniform) percentagereachesabout 40 percent (Note that the duplicate percentagesof Rl and R2 are the same here, Test 5 -Vary Duplicate Percentage (muform) because R2 was created with a umform &smbunon of Rl values ) Test 5 is ldenucal to Test 4 except that the chsmbuaon of Here, the Tree Merge algonthm remamed the best method unttl the duphcates was umform The results of Test 5 are shown m Graph 8 duphcate percentage exceeded about 97 percent because the output 247

10.of the loin was much lower for most duplicate percentages When 3 3 5 Jom Test Result Summary the duplicate percentages were low (O-60 percent), the Jam algo- If the proper pair of tree indices IS present, the Tree Merge Jam nthms had behavior similar to that of earlier tests Once the duph- method was found to perform the best m almost all of the sltuatlons cate percentage became tigh enough to cause a lugh output Jam (at tested It turned out never to be advantageous to budd the T Tree about 97 percent), Sort Merge agam became the fastestJoin method Indices for thusJam method, however, as it would then be slower then the other three methods In sltuauons where one of the two Test 6 -Vary SenuJom Selectlvlty relations 1smissing a Jam column index, the Hash Jom method was In the previous tests, the senuJomselecnvlty was held constant found to be the best choice There are only two exceptions to these at 100% In Test 6, however, it was vaned, and the results of tis statements test are shown in Graph 9 For @IIStest, /Rlj = JR21= 30,000 ele- (1) If an mdex exists on the larger relation and the smaller relation ments, the duplicate percentage was fixed at 50% m each relauon 1sless than half the size of the larger relation, then a Tree Jam with a umform dlsmbuhon (so there were roughly two occurrences (T Tree index Join) was found to execute faster than a Hash of each JOUI column value in each relation), and the senuJomselec- Join In tis slmaaon, the tuples m the smaller relation can be tivity was vaned from l-100% The Tree Jam was affected the most looked up m the tree mdex faster than a hash table can be bmlt by the mcrease in matching values, a bnef descnpaon of the search and scanned This would also be true for a hash mdex if It procedure ~111explam why When the T Tree ISsearchedfor a set of already existed tuples with a single value, the search stops at any tuple with that value, and the tree 1sthen scannedm both dnec~ons from that poa- (2) When the semiJom selecnvlty and the duplicate percentage are non (smce the list of tuples for a gven value 1slogically connguous both tigh, the Sort Merge Jam method should be used, pamcu- m the tree) If the lmtlal search does not find any tuples matchmg larly if the duplicate dlsmbutlon IS highly skewed A Tree the search value, then the scan phase 1s bypassed and the search Merge Join IS also sansfactoly is this case, but the required returns unsuccessful When the percentage of matchmg values IS mdlces may not be present If the indices must be bmlt, then low then, most of the searchesare unsuccessfuland the total cost IS the Tree Merge Join will be more costly than the Hash Jom for much lower than when the maJonty of searches are successful A duplicate percentagesless then 60 m the skewed case and 80 m slmllar case can be made for the Hash Jam m that unsuccessful the umform case searches sometimes reqmre less work than successful ones - an It should be mentioned that only eqmJomswere tested Non- unsuccessful search may scan an empty hash chain instead of a full equljoins other than “not equals” can make use of ordenng of the one The increase m the Tree Merge execunon time m Graph 9 was data, so the Tree Jom should be used for such (<, 5, >, 2) Jams due mostly to the extra data compansons and the extra overhead of As mennoned earher, we also tested the nested loops Join recording the mcreasmg number of matchmg tuples Sort Merge 1s method Due to the fact that its performance was usually several less affected by the increase m matchmg tuples because the sortmg orders of magnitude worse than the other Join methods, we were time overshadowsthe time required to perform the actual merge Jam unable to present them on the samegraphs Graph 10 shows the cost of nested loops Jam for a pomon of Test 1, with IRl( = jR21vaned from 1,000 to 20,000 It is clear that, unless one plans to generate 20 full cross products on a regular basis, nested loops Join should slm- JOIN TEST 6 (Vary SemlJom Selectwty) ply never be considered as a practical Join method for a mam memory DBMS 15 fl The precomputed Jam described m Section 2 1 was not tested along with the other Join methods Intumvely, It would beat each of the Join methods m every case, because the Jolmng tuples have already been poured Thus, the tuple pointers for the result relation can simply be extracted from a single relation 3 4 ProJectIon 10 In our discussion of the MM-DBMS m Section 2, we explained Hash Jom that much of the work of the proJectlon phase of a query 1simphcltly Tree Jom done by speclfymg the attnbutes in the form of result descnptors ------ Sort Merge Seconds _--- Thus, the only step requmng any slgmficant processing is the final Tree Merge operation of removing duplicates For duphcate ehmmatlon, we tested two candidate methods Sort Scan [BBD83] and Hashmg [DK084] Agam, we implemented both methods and compared their _--- --- performance _--- _ c -- In these tests, the composltlon of the relation to be proJected _---- was vaned m ways similar to the those of the Join tests - both the relation cardmahty and its duplicate percentage were vaned Since prehmmary tests showed that the dlsmbunon of duphcates had no 1 effect on the results, we do not vary the dlsmbuuon in the tests 0 25% 50% 75% 100% presentedhere Percent Matchmg Values Graph 9 - Vary Semqom Selectwlty 248

11. 10000 NESTED LOOPS JOIN 100 Seconds 10 17 0 10000 20000 30000 0 10000 20000 30000 Number of Tuples (IF11 = JR2)) Number of Tuples Graph 10 - Nested Loops Join Graph 11 - Vary Cardmahty Graph 11 shows the performance of the two duphcate ehmma- non algonthms for relations of various sizes For dus test, no duph- cates were actually introduced m the relation, so the startmg size of the relation and lta final stze were the same The msemon overhead m the hash table IS linear for all values of JR1(since the hash table PROJECT TEST 2 (Vary Duphcate Percentage) size was always chosen to be (R1/2),whde the cost for sortmg goes as 8 O(lRI log IRI) As the number of tuples becomes large, this sorting cost dommates the performance of the Sort Scan method In addl- tlon, these tests were performed using single column relations - the 7 number of compansons IS much higher m the sort process, and thus cost would only be exacerbated If more columns paruclpated m the proJection Thus, the Hashmg method ISthe clear winner m dus test 6 Graph 12 shows the results for a relation with 30,000 elements but a varying number of duplicates As the number of duplicates Increases,the hash table stores fewer elements (since the duplicates 5 are discarded as they are encountered) The Hashmg method IS thus able to run faster than It would with all the elements (since it has shorter chams of elements to process for each hash value) Sortmg, 4 on the other hand, realizes no such advantage, as it must shll sort the Seconds entire hst before ehmmatmg tuples durmg the scan phase The large number of duplicates does affect the sort to some degree, however, 3 Hash because the msertlon sort has less work to do when there are many duplicates - with many equal values, the subarray m quicksort IS often already sorted by the time it ISpassedto the msemon sort 4 Conclusions and Future Work In thus paper, we have addressedquery processmg Issues and slgonthms for a mam memory database management system We sketched an archnecture for such a system, the MM-DBMS archtec- ture, pomtmg out the major dtfferences between disk-based data- 25% 50% 75% 100% 0 bases and memory resident databases We then addressedthe prob- Duphcate Percentage lem of processmg relational quenes m the MM-DBMS architecture, studying algonthms for the selection, Join, and proJechonoperations Graph 12 - Vary Duphcate Percentage 249

12. --- 1 A number of candldate algonthms were implemented for each opera- [DeC85] D DeWltt and R Gerber, Multiprocessor Hash-Based uon, and their performance was expenmentally compared We Jom Algonthms, Proc of Ilth Int Conf on Very Lurge found that, for selection, the T Tree provides excellent overall per- Data Bases, Stockholm, Sweden, August 1985 formance for quenes on ordered data, and that Modified Linear [ElC86] M Elch, MMDB Recovery, Southern Methodist Umv Hashing 1sthe best mdex structure (of those exammed) for unordered Dept of Computer Sciences Tech Rep # 86-CSE-11, data For JOHIS, when a precomputed JOIII does not exist, we found March 1986 that a T Tree based merge Join offers good performance if both [ElB84] K Elhardt and R Bayer, A Database Cache for &gh mdlces exist, and that hashmg tends to offer the best performance Performance and Fast Restart m Database Systems, otherwlse A mam memory vanant of the sort merge algorithm was ACM Transacttons on Database Systems 9,4 (December found to perform well for high output JOHIS Fmally, it was shown 1984), 503-526 that hashmg 1sthe dominant algorithm for processmg proJectionsm [FNP791 R Fagm, J Nlevergelt, N Pippenger and H Strong, mam memory Extenhble Hashmg - A Fast Access Method for Dynanuc Rles, ACM Transactions on Database Systems In light of these results, query optimization m MM-DBMS 4,3 (Sept 1979) should be simpler than m convenuonal databasesystems, as the cost [FE861 M Fishem, Techology ‘86 Solid State, IEEE Spectrum formulas are less complicated [SAC791 The issue of clustenng and 23,l (January 1986) proJection for size reduction has been removed from conslderauon, [CiLV83] H Garcia-Mohna, R J Lipton and J Valdes, A Massive thereby slmphfymg the choice of algonthms (ProJectlon may be Memory Machme, Prmceton Umv EECS Dept Tech needed to reduce the number of duplicate entnes m a temporary Rep # 315, July 1983 result, but it is never needed to reduce the szzeof the result tuples, [HoT85] S Horwltz and T Teltelbaum, Relations and Attnbutes because tuples are never copied, only pointed to ) There are three A Symblotlc Basis for Edmng Environments, Proc of possible access paths for selection (hash lookup, tree lookup, or the ACM SIGPLAN Notrces Conf on Language Issues m sequential scan through an unrelated index), three mam Jom methods Programmrng Envrronments, Seattle, WA, June 1985 (precomputedJoin, Tree Merge Join, and Hash Join) and one method [IBM791 IBM, IMS Verwn I Release I5 Fast Path Feature for ehmmatmg duphcates (Hash) Moreover, the choice of w2llch Descnptron and Destgn Gut&, IBM World Trade algorithm IS slmphfied because there 1s a more defimte ordenng of Systems Centers (G320-5775), 1979 preference a hash lookup (exact match only) IS always faster than a [Knu73] D Knuth, Sortzng and Searchmng, Addison-Wesley, tree lookup whch 1salways faster than a sequenhal scan, a precom- 1973 puted Jam 1s always faster than the other Jam methods, and a Tree Merge Jam IS nearly always preferred when the T Tree m&ces [LeC851 T Lehman and M Carey, A Study of Index Structures for Mam Memory Database Management Systems, UW already exist CS Tech Rep # 605, July 1985 (A revised version has been submttted for pubhcauon) 5 References IWW M Leland and W Roome, The S&on Database [AHU74] A Aho, J Hopcroft and J Ullman, The Design and Machme, Proc 4th Int Workhop on Database Analysrs of Computer Algorrthms, Addison-Wesley Machtnes, Grand Bahama Island, March 1985 Pubhshmg Company, 1974 &in841 M Lmton, Implementmg Relational Views of Programs, [AHK85] A Ammann, M Hanrahan and R Knshnamurthy, Proc of the ACM Software Eng NoteslSlGPLAN Design of a Memory Resident DBMS, Proc IEEE Nohces Sofhvare Eng Symp on PraChCal Sofhvare COMPCON, San Francisco, February 1985 Development Envwonments, Pmsburgh, PA, Apnl 1984 [Bab79] E Babb, Implementing a Relational Database by Means &t80] W Lltwm, Linear Hashmg A New Tool For File and of Specialized Hardware, ACM Transactions on Table Addressing, Proc of 6th Int Conf on Very Large Database Systems 4,l (March 1979), l-29 Data Bases, Montreal, Canada, October 1980 [BBD83] D Bltton, H Boral, D DeWltt and W Wllkmson, [SAC791 P Sehnger, M Astrahan, D Chamberhn, R Lone and Parallel Algonthms for the execution of Relational T Price, Access Path Selection m a Relational DBMS, Database Operatrons, ACM Transacnons on Database Proc ACM SIGMOD Conf, June 1979 Systems8,3 (September 1983), 324 [Sha86] L D Shapiro, Jom Processingm DatabaseSystemswith [BlE77] M Blasgen and K Eswaran, Storage and Access m Large Mam Memones, ACM Transachons on Database Relational Databases,IBM Systems Journal 16,4 (1977) Systems, 1986 (to appear) [Bra841 K Bratbeesengen, Hashing Methods and Relational [Sno84] R Snodgrass, Momtonng m a Software Development Algebra Operahons, Proc of 10th Znt Conf on Very Environment A Relauonal Approach, Proc of the ACM Large Data Bases, Singapore, August 1984,323 Software Eng NoteslSIGPLAN Nottces Sofrware Eng [Corn791 D Comer, The Ublqmtous B-Tree, Computrng Surveys Symp on Pracncal Sojlware Development 11,2 (June 1979) Envaronments, Pmsburgh, PA, Apnl1984 [Dat81] C J Date, An Introductron to Database Systems, [VaG84] P Valdunez and G Gardann, Jom and SemiJoin Addison-Wesley, 1981 (3rd Ed ) Algonthms for a Mmluprocessor Database Machme [Dat85] C J Date, An Introductton to Database Systems, Transacuons on Database Systems, ACM Transachons Addison-Wesley, 1985 (4th Ed ) on Database Systems 9,1 (March 1984), 133 [DK084] D Dewitt, R Katz, F Olken, L Shapiro, M mar8 11 D H D Warren, Efficient Processing of Interactive Stonebraker and D Wood, Implementahon Techmques Relational Database Quenes Expressed m Loge, Proc for Mam Memory Database Systems, Proc ACM of 7th Int Conf on Very Large Data Bases, Cannes, SIGMOD Conf, June 1984, l-8 Fance, September, 1981 250