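System R, an experimental database system, was constructed to demonstrate that the usability advantages of the relational data model can be realized in a system with the complete function and high performance required for everyday production use. This paper describes the three principal phases of the System R project and discusses some of the lessons learned from System R about the design of relational systems and database systems in general.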


COMPUTING PRACTICES

A History and Evaluation of System R

Donald D. Chamberlin, Morton M. Astrahan, Michael W. Blasgen, James N. Gray, W. Frank King, Bruce G. Lindsay, Raymond Lorie, James W. Mehl, Thomas G. Price, Franco Putzolu, Patricia Griffiths Selinger, Mario Schkolnick, Donald R. Slutz, Irving L. Traiger, Bradford W. Wade, Robert A. Yost

IBM Research Laboratory, San Jose, California

Communications of the ACM, October 1981, Volume 24, Number 10

SUMMARY: System R, an experimental database system, was constructed to demonstrate that the usability advantages of the relational data model can be realized in a system with the complete function and high performance required for everyday production use. This paper describes the three principal phases of the System R project and discusses some of the lessons learned from System R about the design of relational systems and database systems in general.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

Key words and phrases: database management systems, relational model, compilation, locking, recovery, access path selection, authorization

CR Categories: 3.50, 3.70, 3.72, 4.33, 4.6

Authors' address: D. D. Chamberlin et al., IBM Research Laboratory, 5600 Cottle Road, San Jose, California 95193.

© 1981 ACM 0001-0782/81/1000-0632 75¢.

1. Introduction

Throughout the history of information storage in computers, one of the most readily observable trends has been the focus on data independence. C.J. Date [27] defined data independence as "immunity of applications to change in storage structure and access strategy." Modern database systems offer data independence by providing a high-level user interface through which users deal with the information content of their data, rather than the various bits, pointers, arrays, lists, etc. which are used to represent that information. The system assumes responsibility for choosing an appropriate internal representation for the information; indeed, the representation of a given fact may change over time without users being aware of the change.

The relational data model was proposed by E.F. Codd [22] in 1970 as the next logical step in the trend toward data independence. Codd observed that conventional database systems store information in two ways: (1) by the contents of records stored in the database, and (2) by the ways in which these records are connected together. Different systems use various names for the connections among records, such as links, sets, chains, parents, etc. For example, in Figure 1(a), the fact that supplier Acme supplies bolts is represented by connections between the relevant part and supplier records. In such a system, a user frames a question, such as "What is the lowest price for bolts?", by writing a program which "navigates" through the maze of connections until it arrives at the answer to the question. The user of a "navigational" system has the burden (or opportunity) to specify exactly how the query is to be processed; the user's algorithm is then embodied in a program which is dependent on the data structure that existed at the time the program was written.

Fig. 1(a). A "Navigational" Database.

Relational database systems, as proposed by Codd, have two important properties: (1) all information is represented by data values, never by any sort of "connections" which are visible to the user; (2) the system supports a very high-level language in which users can frame requests for data without specifying algorithms for processing the requests. The relational representation of the data in Figure 1(a) is shown in Figure 1(b). Information about parts is kept in a PARTS relation in which each record has a "key" (unique identifier) called PARTNO. Information about suppliers is kept in a SUPPLIERS relation keyed by SUPPNO. The information which was formerly represented by connections between records is now contained in a third relation, PRICES, in which parts and suppliers are represented by their respective keys. The question "What is the lowest price for bolts?" can be framed in a high-level language like SQL [16] as follows:

    SELECT MIN(PRICE)
    FROM PRICES
    WHERE PARTNO IN
        (SELECT PARTNO
         FROM PARTS
         WHERE NAME = 'BOLT');

    PARTS
    PARTNO  NAME
    P107    Bolt
    P113    Nut
    P125    Screw
    P132    Gear

    SUPPLIERS
    SUPPNO  NAME
    S51     Acme
    S57     Ajax
    S63     Amco

    PRICES
    PARTNO  SUPPNO  PRICE
    P107    S51       .59
    P107    S57       .65
    P113    S51       .25
    P113    S63       .21
    P125    S63       .15
    P132    S57      5.25
    P132    S63     10.00

Fig. 1(b). A Relational Database.

A relational system can maintain whatever pointers, indices, or other access aids it finds appropriate for processing user requests, but the user's request is not framed in terms of these access aids and is therefore not dependent on them. Therefore, the system may change its data representation and access aids periodically to adapt to changing requirements without disturbing users' existing applications.

Since Codd's original paper, the advantages of the relational data model in terms of user productivity and data independence have become widely recognized. However, as in the early days of high-level programming languages, questions are sometimes raised about whether or not an automatic system can choose as efficient an algorithm for processing a complex query as a trained programmer would. System R is an experimental system constructed at the San Jose IBM Research Laboratory to demonstrate that a relational database system can incorporate the high performance and complete function required for everyday production use.

The key goals established for System R were:

(1) To provide a high-level, nonnavigational user interface for maximum user productivity and data independence.
(2) To support different types of database use including programmed transactions, ad hoc queries, and report generation.
(3) To support a rapidly changing database environment, in which tables, indexes, views, transactions, and other objects could easily be added to and removed from the database without stopping the system.
(4) To support a population of many concurrent users, with mechanisms to protect the integrity of the database in a concurrent-update environment.
(5) To provide a means of recovering the contents of the database to a consistent state after a failure of hardware or software.
(6) To provide a flexible mechanism whereby different views of stored data can be defined and various users can be authorized to query and update these views.
(7) To support all of the above functions with a level of performance comparable to existing lower-function database systems.

Throughout the System R project, there has been a strong commitment to carry the system through to an operationally complete prototype which could be installed and evaluated in actual user sites.

The history of System R can be divided into three phases. "Phase Zero" of the project, which occurred during 1974 and most of 1975, involved the development of the SQL user interface [14] and a quick implementation of a subset of SQL for one user at a time. The Phase Zero prototype, described in [2], provided valuable insight in several areas, but its code was eventually abandoned. "Phase One" of the project, which took place throughout most of 1976 and 1977, involved the design and construction of the full-function, multiuser version of System R. An initial system architecture was presented in [4] and subsequent updates to the design were described in [10]. "Phase Two" was the evaluation of System R in actual use. This occurred during 1978 and 1979 and involved experiments at the San Jose Research Laboratory and several other user sites. The results of some of these experiments and user experiences are described in [19-21]. At each user site, System R was installed for experimental purposes only, and not as a supported commercial product.¹

¹ The System R research prototype later evolved into SQL/Data System, a relational database management product offered by IBM in the DOS/VSE operating system environment.

This paper will describe the decisions which were made and the lessons learned during each of the three phases of the System R project.

2. Phase Zero: An Initial Prototype

Phase Zero of the System R project involved the quick implementation of a subset of system functions. From the beginning, it was our intention to learn what we could from this initial prototype, and then scrap the Phase Zero code before construction of the more complete version of System R. We decided to use the relational access method called XRM, which had been developed by R. Lorie at IBM's Cambridge Scientific Center [40]. (XRM was influenced, to some extent, by the "Gamma Zero" interface defined by E.F. Codd and others at San Jose [11].) Since XRM is a single-user access method without locking or recovery capabilities, issues relating to concurrency and recovery were excluded from consideration in Phase Zero.

An interpreter program was written in PL/I to execute statements in the high-level SQL (formerly SEQUEL) language [14, 16] on top of XRM. The implemented subset of the SQL language included queries and updates of the database, as well as the dynamic creation of new database relations. The Phase Zero implementation supported the "subquery" construct of SQL, but not its "join" construct. In effect, this meant that a query could search through several relations in computing its result, but the final result would be taken from a single relation.

The Phase Zero implementation was primarily intended for use as a standalone query interface by end users at interactive terminals. At the time, little emphasis was placed on issues of interfacing to host-language programs (although Phase Zero could be called from a PL/I program). However, considerable thought was given to the human factors aspects of the SQL language, and an experimental study was conducted on the learnability and usability of SQL [44].

One of the basic design decisions in the Phase Zero prototype was that the system catalog, i.e., the description of the content and structure of the database, should be stored as a set of regular relations in the database itself. This approach permits the system to keep the catalog up to date automatically as changes are made to the database, and also makes the catalog information available to the system optimizer for use in access path selection.

The structure of the Phase Zero interpreter was strongly influenced by the facilities of XRM. XRM stores relations in the form of "tuples," each of which has a unique 32-bit "tuple identifier" (TID). Since a TID contains a page number, it is possible, given a TID, to fetch the associated tuple in one page reference. However, rather than actual data values, the tuple contains pointers to the "domains" where the actual data is stored, as shown in Figure 2. Optionally, each domain may have an "inversion," which associates domain values (e.g., "Programmer") with the TIDs of tuples in which the values appear. Using the inversions, XRM makes it easy to find a list of TIDs of tuples which contain a given value. For example, in Figure 2, if inversions exist on both the JOB and LOCATION domains, XRM provides commands to create a list of TIDs of employees who are programmers, and another list of TIDs of employees who work in Evanston. If the SQL query calls for programmers who work in Evanston, these TID lists can be intersected to obtain the list of TIDs of tuples which satisfy the query, before any tuples are actually fetched.

Fig. 2. XRM Storage Structure. (A tuple, TID1, holds pointers into Domain #1: Names ("John Smith"), Domain #2: Jobs ("Programmer"), and Domain #3: Locations ("Evanston").)

The most challenging task in constructing the Phase Zero prototype was the design of optimizer algorithms for efficient execution of SQL statements on top of XRM. The design of the Phase Zero optimizer is given in [2]. The objective of the optimizer was to minimize the number of tuples fetched from the database in processing a query. Therefore, the optimizer made extensive use of inversions and often manipulated TID lists before beginning to fetch tuples. Since the TID lists were potentially large, they were stored as temporary objects in the database during query processing.

The results of the Phase Zero implementation were mixed. One strongly felt conclusion was that it is a very good idea, in a project the size of System R, to plan to throw away the initial implementation. On the positive side, Phase Zero demonstrated the usability of the SQL language, the feasibility of creating new tables and inversions "on the fly" and relying on an automatic optimizer for access path selection, and the convenience of storing the system catalog in the database itself. At the same time, Phase Zero taught us a number of valuable lessons which greatly influenced the design of our later implementation. Some of these lessons are summarized below.

(1) The optimizer should take into account not just the cost of fetching tuples, but the costs of creating and manipulating TID lists, then fetching tuples, then fetching the data pointed to by the tuples. When these "hidden costs" are taken into account, it will be seen that the manipulation of TID lists is quite expensive, especially if the TID lists are managed in the database rather than in main storage.

(2) Rather than "number of tuples fetched," a better measure of cost would have been "number of I/Os." This improved cost measure would have revealed the great importance of clustering together related tuples on physical pages so that several related tuples could be fetched by a single I/O. Also, an I/O measure would have revealed a serious drawback of XRM: Storing the domains separately from the tuples causes many extra I/Os to be done in retrieving data values. Because of this, our later implementation stored data values in the actual tuples rather than in separate domains. (In defense of XRM, it should be noted that the separation of data values from tuples has some advantages if data values are relatively large and if many tuples are processed internally compared to the number of tuples which are materialized for output.)

(3) Because the Phase Zero implementation was observed to be CPU-bound during the processing of a typical query, it was decided the optimizer cost measure should be a weighted sum of CPU time and I/O count, with weights adjustable according to the system configuration.

(4) Observation of some of the applications of Phase Zero convinced us of the importance of the "join" formulation of SQL. In our subsequent implementation, both "joins" and "subqueries" were supported.

(5) The Phase Zero optimizer was quite complex and was oriented toward complex queries. In our later implementation, greater emphasis was placed on relatively simple interactions, and care was taken to minimize the "path length" for simple SQL statements.

3. Phase One: Construction of a Multiuser Prototype

After the completion and evaluation of the Phase Zero prototype, work began on the construction of the full-function, multiuser version of System R. Like Phase Zero, System R consisted of an access method (called RSS, the Research Storage System) and an optimizing SQL processor (called RDS, the Relational Data System) which runs on top of the RSS. Separation of the RSS and RDS provided a beneficial degree of modularity; e.g., all locking and logging functions were isolated in the RSS, while all authorization and access path selection functions were isolated in the RDS. Construction of the RSS was underway in 1975 and construction of the RDS began in 1976. Unlike XRM, the RSS was originally designed to support multiple concurrent users.

The multiuser prototype of System R contained several important subsystems which were not present in the earlier Phase Zero prototype. In order to prevent conflicts which might arise when two concurrent users attempt to update the same data value, a locking subsystem was provided. The locking subsystem ensures that each data value is accessed by only one user at a time, that all the updates made by a given transaction become effective simultaneously, and that deadlocks between users are detected and resolved. The security of the system was enhanced by view and authorization subsystems. The view subsystem permits users to define alternative views of the database (e.g., a view of the employee file in which salaries are deleted or aggregated by department).
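The view subsystem's example of an employee file with salaries aggregated by department can be illustrated with a small runnable sketch. It is not System R code: it assumes Python's standard sqlite3 module and SQLite's CREATE VIEW syntax, which may differ from System R's own, and the EMP table, its columns, the sample rows, and the DEPT_SALARY view name are all hypothetical.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a hypothetical employee table and an aggregated view over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, NAME TEXT,
                          DEPT TEXT, SALARY REAL);
        INSERT INTO EMP VALUES
            (1, 'Smith', 'D1', 100.0),
            (2, 'Jones', 'D1', 300.0),
            (3, 'Brown', 'D2', 250.0);
        -- The view exposes per-department aggregates only;
        -- no individual salary appears in its result.
        CREATE VIEW DEPT_SALARY AS
            SELECT DEPT, COUNT(*) AS HEADCOUNT, AVG(SALARY) AS AVG_SALARY
            FROM EMP
            GROUP BY DEPT;
        """
    )
    return conn


def dept_salaries(conn: sqlite3.Connection) -> list:
    """Query the view exactly as if it were a stored table."""
    return conn.execute(
        "SELECT DEPT, HEADCOUNT, AVG_SALARY FROM DEPT_SALARY ORDER BY DEPT"
    ).fetchall()
```

A user who is authorized on DEPT_SALARY but not on EMP would see one row per department and never an individual salary. SQLite has no authorization subsystem, so that part of the picture is not modeled here.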

The authorization subsystem ensures that each user has access only to those views for which he has been specifically authorized by their creators. Finally, a recovery subsystem was provided which allows the database to be restored to a consistent state in the event of a hardware or software failure.

In order to provide a useful host-language capability, it was decided that System R should support both PL/I and Cobol application programs as well as a standalone query interface, and that the system should run under either the VM/CMS or MVS/TSO operating system environment. A key goal of the SQL language was to present the same capabilities, and a consistent syntax, to users of the PL/I and Cobol host languages and to ad hoc query users. The imbedding of SQL into PL/I is described in [16]. Installation of a multiuser database system under VM/CMS required certain modifications to the operating system in support of communicating virtual machines and writable shared virtual memory. These modifications are described in [32].

The standalone query interface of System R (called UFI, the User-Friendly Interface) is supported by a dialog manager program, written in PL/I, which runs on top of System R like any other application program. Therefore, the UFI support program is a cleanly separated component and can be modified independently of the rest of the system. In fact, several users improved on our UFI by writing interactive dialog managers of their own.

The Compilation Approach

Perhaps the most important decision in the design of the RDS was inspired by R. Lorie's observation, in early 1976, that it is possible to compile very high-level SQL statements into compact, efficient routines in System/370 machine language [42]. Lorie was able to demonstrate that SQL statements of arbitrary complexity could be decomposed into a relatively small collection of machine-language "fragments," and that an optimizing compiler could assemble these code fragments from a library to form a specially tailored routine for processing a given SQL statement. This technique had a very dramatic effect on our ability to support application programs for transaction processing. In System R, a PL/I or Cobol program is run through a preprocessor in which its SQL statements are examined, optimized, and compiled into small, efficient machine-language routines which are packaged into an "access module" for the application program. Then, when the program goes into execution, the access module is invoked to perform all interactions with the database by means of calls to the RSS. The process of creating and invoking an access module is illustrated in Figures 3 and 4. All the overhead of parsing, validity checking, and access path selection is removed from the path of the executing program and placed in a separate preprocessor step which need not be repeated. Perhaps even more important is the fact that the running program interacts only with its small, special-purpose access module rather than with a much larger and less efficient general-purpose SQL interpreter. Thus, the power and ease of use of the high-level SQL language are combined with the execution-time efficiency of the much lower level RSS interface.

Fig. 3. Precompilation Step. (A PL/I source program containing the statement SELECT NAME INTO $X FROM EMP WHERE EMPNO=$Y passes through the System R precompiler (XPREP), which produces a modified PL/I program, in which the statement is replaced by a CALL, and an access module of machine code ready to run on the RSS.)

Fig. 4. Execution Step. (The user's object program calls the execution-time system (XRDI), which loads and then calls the access module; the access module calls the RSS.)

Since all access path selection decisions are made during the preprocessor step in System R, there is the possibility that subsequent changes in the database may invalidate the decisions which are embodied in an access module. For example, an index selected by the optimizer may later be dropped from the database. Therefore, System R records with each access module a list of its "dependencies" on database objects such as tables and indexes. The dependency list is stored in the form of a regular relation in the system catalog. When the structure of the database changes (e.g., an index is dropped), all affected access modules are marked "invalid." The next time an invalid access module is invoked, it is regenerated from its original SQL statements, with newly optimized access paths. This process is completely transparent to the System R user.

SQL statements submitted to the interactive UFI dialog manager are processed by the same optimizing compiler as preprocessed SQL statements. The UFI program passes the ad hoc SQL statement to System R with a special "EXECUTE" call. In response to the EXECUTE call, System R parses and optimizes the SQL statement and translates it into a machine-language routine. The routine is indistinguishable from an access module and is executed immediately. This process is described in more detail in [20].

RSS Access Paths

Rather than storing data values in separate "domains" in the manner of XRM, the RSS chose to store data values in the individual records of the database. This resulted in records becoming variable in length and longer, on the average, than the equivalent XRM records. Also, commonly used values are represented many times rather than only once as in XRM. It was felt, however, that these disadvantages were more than offset by the following advantage: All the data values of a record could be fetched by a single I/O.

In place of XRM "inversions," the RSS provides "indexes," which are associative access aids implemented in the form of B-Trees [26]. Each table in the database may have anywhere from zero indexes up to an index on each column (it is also possible to create an index on a combination of columns). Indexes make it possible to scan the table in order by the indexed values, or to directly access the records which match a particular value. Indexes are maintained automatically by the RSS in the event of updates to the database.

The RSS also implements "links," which are pointers stored with a record which connect it to other related records. The connection of records on links is not performed automatically by the RSS, but must be done by a higher level system.

The access paths made available by the RSS include (1) index scans, which access a table associatively and scan it in value order using an index; (2) relation scans, which scan over a table as it is laid out in physical storage; (3) link scans, which traverse from one record to another using links. On any of these types of scan, "search arguments" may be specified which limit the records returned to those satisfying a certain predicate. Also, the RSS provides a built-in sorting mechanism which can take records from any of the scan methods and sort them into some value order, storing the result in a temporary list in the database. In System R, the RDS makes extensive use of index and relation scans and sorting. The RDS also utilizes links for internal purposes but not as an access path to user data.

The Optimizer

Building on our Phase Zero experience, we designed the System R optimizer to minimize the weighted sum of the predicted number of I/Os and RSS calls in processing an SQL statement (the relative weights of these two terms are adjustable according to system configuration). Rather than manipulating TID lists, the optimizer chooses to scan each table in the SQL query by means of only one index (or, if no suitable index exists, by means of a relation scan). For example, if the query calls for programmers who work in Evanston, the optimizer might choose to use the job index to find programmers and then examine their locations; it might use the location index to find Evanston employees and examine their jobs; or it might simply scan the relation and examine the job and location of all employees. The choice would be based on the optimizer's estimate of both the clustering and selectivity properties of each index, based on statistics stored in the system catalog. An index is considered highly selective if it has a large ratio of distinct key values to total entries. An index is considered to have the clustering property if the key order of the index corresponds closely to the ordering of records in physical storage. The clustering property is important because when a record is fetched via a clustering index, it is likely that other records with the same key will be found on the same page, thus minimizing the number of page fetches. Because of the importance of clustering, mechanisms were provided for loading data in value order and preserving the value ordering when new records are inserted into the database.

The techniques of the System R optimizer for performing joins of two or more tables have their origin in a study conducted by M. Blasgen and K. Eswaran [7]. Using APL models, Blasgen and Eswaran studied ten methods of joining together tables, based on the use of indexes, sorting, physical pointers, and TID lists. The number of disk accesses required to perform a join was predicted on the basis of various assumptions for the ten join methods. Two join methods were identified such that one or the other was optimal or nearly optimal under most circumstances. The two methods are as follows:

Join Method 1: Scan over the qualifying rows of table A. For each row, fetch the matching rows of table B (usually, but not always, an index on table B is used).

Join Method 2: (Often used when no suitable index exists.) Sort the qualifying rows of tables A and B in order by their respective join fields. Then scan over the sorted lists and merge them by matching values.

When selecting an access path for a join of several tables, the System R optimizer considers the problem to be a sequence of binary joins. It then performs a tree search in which each level of the tree consists of one of the binary joins. The choices to be made at each level of the tree include which join method to use and which index, if any, to select for scanning. Comparisons are applied at each level of the tree to prune away paths which achieve the same results as other, less costly paths. When all paths have been examined, the optimizer selects the one of minimum predicted cost. The System R optimizer algorithms are described more fully in [47].

Views and Authorization

The major objectives of the view and authorization subsystems of System R were power and flexibility. We wanted to allow any SQL query to be used as the definition of a view. This was accomplished by storing each view definition in the form of an SQL parse tree. When an SQL operation is to be executed against a view, the parse tree which defines the operation is merged with the parse tree which defines the view, producing a composite parse tree which is then sent to the optimizer for access path selection. This approach is similar to the "query modification" technique proposed by Stonebraker [48]. The algorithms developed for merging parse trees were sufficiently general so that nearly any SQL statement could be executed against any view definition, with the restriction that a view can be updated only if it is derived from a single table in the database. The reason for this restriction is that some updates to views which are derived from more than one table are not meaningful (an example of such an update is given in [24]).

The authorization subsystem of System R is based on privileges which are controlled by the SQL statements GRANT and REVOKE. Each user of System R may optionally be given a privilege called RESOURCE which enables him/her to create new tables in the database. When a user creates a table, he/she receives all privileges to access, update, and destroy that table. The creator of a table can then grant these privileges to other individual users, and subsequently can revoke these grants if desired. Each granted privilege may optionally carry with it the "GRANT option," which enables a recipient to grant the privilege to yet other users. A REVOKE destroys the whole chain of granted privileges derived from the original grant. The authorization subsystem is described in detail in [37] and discussed further in [31].

The Recovery Subsystem

The key objective of the recovery subsystem is provision of a means whereby the database may be recovered to a consistent state in the event of a failure. A consistent state is defined as one in which the database does not reflect any updates made by transactions which did not complete successfully. There are three basic types of failure: the disk media may fail, the system may fail, or an individual transaction may fail. Although both the scope of the failure and the time to effect recovery may be different, all three types of recovery require that an alternate copy of data be available when the primary copy is not.

When a media failure occurs, database information on disk is lost. When this happens, an image dump of the database plus a log of "before" and "after" changes provide the alternate copy which makes recovery possible. System R's use of "dual logs" even permits recovery from media failures on the log itself. To recover from a media failure, the database is restored using the latest image dump and the recovery process reapplies all database changes as specified on the log for completed transactions.

When a system failure occurs, the information in main memory is lost. Thus, enough information must always be on disk to make recovery possible. For recovery from system failures, System R uses the change log mentioned above plus something called "shadow pages." As each page in the database is updated, the page is written out in a new place on disk, and the original page is retained. A directory of the "old" and "new" locations of each page is maintained. Periodically during normal operation, a "checkpoint" occurs in which all updates are forced out to disk, the "old" pages are discarded, and the "new" pages become "old." In the event of a system crash, the "new" pages on disk may be in an inconsistent state because some updated pages may still be in the system buffers and not yet reflected on disk. To bring the database back to a consistent state, the system reverts to the "old" pages, and then uses the log to redo all committed transactions and to undo all updates made by incomplete transactions. This aspect of the System R recovery subsystem is described in more detail in [36].

When a transaction failure occurs, all database changes which have been made by the failing transaction must be undone. To accomplish this, System R simply processes the change log backwards removing all changes made by the transaction. Unlike media and system recovery which both require that System R be reinitialized, transaction recovery takes place on-line.

The Locking Subsystem

A great deal of thought was given to the design of a locking subsystem which would prevent interference among concurrent users of System R. The original design involved the concept of "predicate locks," in which the lockable unit was a database property such as "employees whose location is Evanston." Note that, in this scheme, a lock might be [...]

[...] "intention" locks are simultaneously acquired on the larger objects which contain them. For example, user A and user B may both be updating employee records. Each user holds an "intention" lock on the employee table, and "exclusive" locks on the particular records being updated. If user A attempts to trade her individual record locks for an "exclusive" lock at the table level, she must wait until user B ends his transaction and releases his "intention" lock on the table.

4. Phase Two: Evaluation

The evaluation phase of the System R project lasted approximately 2 1/2 years and consisted of two parts: [...]

[...]tal applications, although no specific performance comparisons were drawn. In general, the experimental databases used with System R were smaller than one 3330 disk pack (200 Megabytes) and were typically accessed by fewer than ten concurrent users. As might be expected, interactive response slowed down during the execution of very complex SQL statements involving joins of several tables. This performance degradation must be traded off against the advantages of normalization [23, 30], in which large database tables are broken into smaller parts to avoid redundancy, and then joined back together by the view mechanism or user applications.
held on the predicate LOC = 'EVANS- (l) experiments performed on the TON', even if no employees currently system at the San Jose Research Lab- The SQL Language satisfy that predicate. By comparing oratory, and (2) actual use of the The SQL user interface of System the predicates being processed by system at a number of internal IBM R was generally felt to be successful different users, the locking subsys- sites and at three selected customer in achieving its goals of simplicity, tem could prevent interference. The sites. At all user sites, System R was power, and data independence. The "predicate lock" design was ulti- installed on an experimental basis language was simple enough in its mately abandoned because: (1) de- for study purposes only, and not as basic structure so that users without termining whether two predicates are a supported commercial product. prior experience were able to learn a mutually satisfiable is difficult and The first installations of System R usable subset on their first sitting. At time-consuming; (2) two predicates took place in June 1977. the same time, when taken as a may appear to conflict when, in fact, whole, the language provided the the semantics of the data prevent any General User Comments query power of the first-order pred- conflict, as in "PRODUCT = AIR- In general, user response to Sys- icate calculus combined with opera- CRAFT" and "MANUFACTURER ---~ tem R has been enthusiastic. The tors for grouping, arithmetic, and ACME STATIONERY CO."; a n d (3) w e system was mostly used in applica- built-in functions such as SUM and desired to contain the locking sub- tions for which ease of installation, AVERAGE. 
system entirely within the RSS, and a high-level user language, and an Users consistently praised the therefore to make it independent of ability to rapidly reconfigure the uniformity of the SQL syntax across any understanding of the predicates database were important require- the environments of application pro- being processed by various users. ments. Several user sites reported grams, ad hoc query, and data defi- The original predicate locking that they were able to install the nition (i.e., definition of views). scheme is described in [29]. system, design and load a database, Users who were formerly required to The locking scheme eventually and put into use some application learn inconsistent languages for these chosen for System R is described in programs within a matter of days. purposes found it easier to deal with [34]. This scheme involves a hierar- User sites also reported that it was the single syntax (e.g., when debug- chy of locks, with several different possible to tune the system perform- ging an application program by sizes of lockable units, ranging from ance after data was loaded by creat- querying the database to observe its individual records to several tables. ing and dropping indexes without " effects). The single syntax also en- The locking subsystem is transparent impacting end users or application hanced communication among dif- to end users, but acquires locks on programs. Even changes in the data- ferent functional organizations (e.g., physical objects in the database as base tables could be made transpar- between database administrators and they are processed by each user. ent to users if the tables were read- application programmers). When a user accumulates many only, and also in some cases for up- While developing applications small locks, they may be "traded" dated tables. 
using SQL, our experimental users for a larger lockable unit (e.g., locks Users found the performance made a number of suggestions for on many records in a table might be characteristics and resource con- extensions and improvements to the traded for a lock on the table). When sumption of System R to be gener- language, most of which were imple- locks are acquired on small objects, ally satisfactory for their experimen- mented during the course of the proj- 639 Communications October 1981 of Volume 24 the ACM N u m b e r 10

9.COMPUTING The CompilationApproach compilation are obvious. All the The approach of compiling SQL overhead of parsing, validity check- PRACTICES statements into machine code was ing, and access path selection are one of the most successful parts of removed from the path of the run- the System R project. We were able ning transaction, and the application ect. Some of these suggestions are to generate a machine-language rou- program interacts with a small, spe- summarized below: tine to execute any SQL statement of cially tailored access module rather (1) Users requested an easy-to- arbitrary complexity by selecting than with a larger and less efficient use syntax when testing for the exist- code fragments from a library of ap- general-purpose interpreter pro- ence or nonexistence of a data item, proximately 100 fragments. The re- gram. Experiments [38] showed that such as an employee record whose sult was a beneficial effect on trans- for a typical short transaction, about department number matches a given action programs, ad hoc query, and 80 percent of the instructions were department record. This facility was system simplicity. executed by the RSS, with the re- implemented in the form of a special In an environment of short, re- maining 20 percent executed by the "EXISTS" predicate. petitive transactions, the benefits of access module and application pro- (2) Users requested a means of seaching for character strings whose contents are only partially known, such as "all license plates beginning with NVK." This facility was imple- mented in the form of a special Example 1 : "LIKE" predicate which searches for "patterns" that are allowed to con- SELECT SUPPNO, PRICE FROM QUOTES tain "don't care" characters. 
WHERE PARTNO = '010002' (3) A requirement arose for an AND MI NQ < = 1000 AND M A X Q > = 1000; application program to compute an CPU time Number SQL statement dynamically, submit Operation (msec on 168) of I / O s the statement to the System R optim- Parsing 13.3 0 izer for access path selection, and then execute the statement repeat- Access Path 40.0 9 edly for different data values without Selection reinvoking the optimizer. This facil- Code 10.1 0 ity was implemented in the form of Generation PREPARE and EXECUTE statements Fetch 1.5 0.7 which were made available in the answer set host-language version of SQL. (per record) (4) In some user applications the need arose for an operator which Codd has called an "outer join" [25]. Suppose that two tables (e.g., suP- Example 2: PLIERS and PROJECTS) are related by SELECT ORDERNO,ORDERS.PARTNO,DESCRIP,DATE,QTY a common data field (e.g., PARTNO). FROM ORDERS,PARTS In a conventional join of these tables, WHERE ORDERS.PARTNO = PARTS.PARTNO AND DATE BETWEEN '750000' AND '751231' supplier records which have no AND SUPPNO = '797'; matching project record (and vice versa) would not appear. In an CPU time Number Operation (msec on 168) of I / O s "outer join" of these tables, supplier records with no matching project rec- Parsing 20.7 0 ord would appear together with a Access Path 73.2 9 "synthetic" project record containing Selection only null values (and similarly for Code 19.3 0 projects with no matching supplier). Generation An "outer-join" facility for SQL is Fetch 8.7 10.7 currently under study. answer set A more complete discussion of (per record) user experience with SQL and the resulting language improvements is presented in [19]. Fig. 5. Measurements of Cost of Compilation. 64O Communications October 1981 of Volume 24 the ACM N u m b e r l0

10.gram. Thus, the user pays only a (2) If code generation results in ords by a three-level index. If we small cost for the power, flexibility, a routine which runs more efficiently wish to begin an associative scan and data independence of the SQL than an interpreter, the cost of the through a large table, three I/Os will language, compared with writing the code generation step is paid back typically be required (assuming the same transaction directly on the after fetching only a few records. (In root page is referenced frequently lower level RSS interface. Example 1, if the CPU time per rec- enough to remain in the system In an ad hoc query environment ord of the compiled module is half buffers, we need an I / O for the in- the advantages of compilation are that of an interpretive system, the termediate-level index page, the less obvious since the compilation cost of generating the access module "leaf" index page, and the data must take place on-line and the is repaid after seven records have page). If several records are to be query is executed only once. In this been fetched.) fetched using the index scan, the environment, the cost of generating three start-up I/Os are relatively in- A final advantage of compilation significant. However, if only one rec- a machine-language routine for a is its simplifying effect on the system ord is to be fetched, other access given query must be balanced architecture. With both ad hoc que- techniques might have provided a against the increased efficiency of ries and precanned transactions quicker path to the stored data. this routine as compared with a more being treated in the same way, most Two common access techniques conventional query interpreter. Fig- of the code in the system can be which were not utilized for user data ure 5 shows some measurements of made to serve a dual purpose. 
This in System R are hashing and direct the cost of compiling two typical ties in very well with our objective of links (physical pointers from one rec- SQL statements (details of the exper- supporting a uniform syntax between ord to another). Hashing was not iments are given in [20]). From this query users and transaction pro- used because it does not have the data we may draw the following con- grams. convenient ordering property of a B- clusions: tree index (e.g., a B-tree index on (1) The code generation step Available Access Paths SALARY enables a list of employees adds a small amount of CPU time As described earlier, the principal ordered by SALARY to be retrieved and no I/Os to the overhead of pars- access path used in System R for very easily). Direct links, although ing and access path selection. Parsing retrieving data associatively by its they were implemented at the RSS and access path selection must be value is the B-tree index. A typical level, were not used as an access path done in any query system, including index is illustrated in Figure 6. If we for user data by the RDS for a two- interpretive ones. The additional in- assume a fan-out of approximately fold reason. Essential links (links structions spent on code generation 200 at each level of the tree, we can whose semantics are not known to are not likely to be perceptible to an index up to 40~000 records by a two- the system but which are connected end user. level index, and up to 8,000,000 rec- directly by users) were rejected be- cause they were inconsistent with the nonnavigational user interface of a relational system, since they could not be used as access paths by an automatic optimizer. Nonessential ] Root links (links which connect records to other records with matching data values) were not implemented be- cause of the difficulties in automati- cally maintaining their connections. 
Intermediate When a record is updated, its con- Pages nections on many links may need to be updated as well, and this may involve many "subsidiary queries" to find the other records which are in- Leaf volved in these connections. Prob- Pages lems also arise relating to records which have no matching partner rec- ord on the link, and records whose link-controlling data value is null. [] [] [] [] Data In general, our experience [] Pages showed that indexes could be used very efficiently in queries and trans- Fig. 6. A B-Tree Index. actions which access many records, 641 Communications October 1981 of Volume 24 the ACM N u m b e r 10

but that hashing and links would have enhanced the performance of "canned transactions" which access only a few records. As an illustration of this problem, consider an inventory application which has two tables: a PRODUCTS table, and a much larger PARTS table which contains data on the individual parts used for each product. Suppose a given transaction needs to find the price of the heating element in a particular toaster. To execute this transaction, System R might require two I/Os to traverse a two-level index to find the toaster record, and three more I/Os to traverse another three-level index to find the heating element record. If access paths based on hashing and direct links were available, it might be possible to find the toaster record in one I/O via hashing, and the heating element record in one more I/O via a link. (Additional I/Os would be required in the event of hash collisions or if the toaster parts records occupied more than one page.) Thus, for this very simple transaction hashing and links might reduce the number of I/Os from five to three, or even two. For transactions which retrieve a large set of records, the additional I/Os caused by indexes compared to hashing and links are less important.

The Optimizer

A series of experiments was conducted at the San Jose IBM Research Laboratory to evaluate the success of the System R optimizer in choosing among the available access paths for typical SQL statements. The results of these experiments are reported in [6]. For the purpose of the experiments, the optimizer was modified in order to observe its behavior. Ordinarily, the optimizer searches through a tree of path choices, computing estimated costs and pruning the tree until it arrives at a single preferred access path. The optimizer was modified in such a way that it could be made to generate the complete tree of access paths, without pruning, and to estimate the cost of each path (cost is defined as a weighted sum of page fetches and RSS calls). Mechanisms were also added to the system whereby it could be forced to execute an SQL statement by a particular access path and to measure the actual number of page fetches and RSS calls incurred. In this way, a comparison can be made between the optimizer's predicted cost and the actual measured cost for various alternative paths.

In [6], an experiment is described in which ten SQL statements, including some single-table queries and some joins, are run against a test database. The database is artificially generated to conform to the two basic assumptions of the System R optimizer: (1) the values in each column are uniformly distributed from some minimum to some maximum value; and (2) the distributions of values in the various columns are independent of each other. For each of the ten SQL statements, the ordering of the predicted costs of the various access paths was the same as the ordering of the actual measured costs (in a few cases the optimizer predicted two paths to have the same cost when their actual costs were unequal but adjacent in the ordering).

Although the optimizer was able to correctly order the access paths in the experiment we have just described, the magnitudes of the predicted costs differed from the measured costs in several cases. These discrepancies were due to a variety of causes, such as the optimizer's inability to predict how much data would remain in the system buffers during sorting.

The above experiment does not address the issue of whether or not a very good access path for a given SQL statement might be overlooked because it is not part of the optimizer's repertoire. One such example is known. Suppose that the database contains a table T in which each row has a unique value for the field SEQNO, and suppose that an index exists on SEQNO. Consider the following SQL query:

    SELECT * FROM T WHERE SEQNO IN (15, 17, 19, 21);

This query has an answer set of (at most) four rows, and an obvious method of processing it is to use the SEQNO index repeatedly: first to find the row with SEQNO = 15, then SEQNO = 17, etc. However, this access path would not be chosen by System R, because the optimizer is not presently structured to consider multiple uses of an index within a single query block. As we gain more experience with access path selection, the optimizer may grow to encompass this and other access paths which have so far been omitted from consideration.

Views and Authorization

Users generally found the System R mechanisms for defining views and controlling authorization to be powerful, flexible, and convenient. The following features were considered to be particularly beneficial:

(1) The full query power of SQL is made available for defining new views of data (i.e., any query may be defined as a view). This makes it possible to define a rich variety of views, containing joins, subqueries, aggregation, etc., without having to learn a separate "data definition language." However, the view mechanism is not completely transparent to the end user, because of the restrictions described earlier (e.g., views involving joins of more than one table are not updateable).

(2) The authorization subsystem allows each installation of System R to choose a "fully centralized policy" in which all tables are created and privileges controlled by a central administrator; or a "fully decentralized policy" in which each user may create tables and control access to them; or some intermediate policy.

During the two-year evaluation of System R, the following suggestions, numbered (1) through (3) below, were made by users for improvement of the view and authorization subsystems.
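Several of the figures quoted in this evaluation follow from simple arithmetic. The sketch below, written in Python purely for illustration, recomputes the index capacities for a fan-out of 200, the I/O counts in the toaster example, and the break-even point of seven records for Example 1 of Fig. 5. The function names are hypothetical and are not part of System R. The only assumptions are the ones the text itself states: the root index page stays in the system buffers, and an interpreter would need twice the CPU time per record of the compiled module.

```python
import math

FAN_OUT = 200  # approximate fan-out per index level, as assumed in the text


def index_capacity(levels: int, fan_out: int = FAN_OUT) -> int:
    """Number of records addressable by a B-tree index with `levels` levels."""
    return fan_out ** levels


def ios_via_index(levels: int) -> int:
    """I/Os needed to reach one record when the root page is already buffered.

    Each of the remaining (levels - 1) index pages costs one I/O, and the
    data page costs one more.
    """
    return (levels - 1) + 1


def break_even_records(codegen_msec: float,
                       compiled_msec_per_record: float,
                       interpreter_slowdown: float) -> int:
    """Records that must be fetched before code generation pays for itself.

    `interpreter_slowdown` is the assumed ratio of interpretive to compiled
    CPU time per record (2.0 means the interpreter takes twice as long).
    """
    saving_per_record = compiled_msec_per_record * (interpreter_slowdown - 1.0)
    return math.ceil(codegen_msec / saving_per_record)


if __name__ == "__main__":
    print(index_capacity(2))   # 40000 records with a two-level index
    print(index_capacity(3))   # 8000000 records with a three-level index
    # Toaster example: two-level index on PRODUCTS, three-level index on PARTS.
    print(ios_via_index(2) + ios_via_index(3))   # 5 I/Os in total
    # Fig. 5, Example 1: 10.1 msec of code generation, 1.5 msec per fetch.
    print(break_even_records(10.1, 1.5, 2.0))    # 7 records
```

The hashing-and-links alternative in the toaster example costs 1 + 1 = 2 I/Os in the best case, which is the lower end of the reduction "from five to three, or even two" quoted in the text.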

(1) The authorization subsystem could be augmented by the concept of a "group" of users. Each group would have a "group administrator" who controls enrollment of new members in the group. Privileges could then be granted to the group as a whole rather than to each member of the group individually.

(2) A new command could be added to the SQL language to change the ownership of a table from one user to another. This suggestion is more difficult to implement than it seems at first glance, because the owner's name is part of the fully qualified name of a table (i.e., two tables owned by Smith and Jones could be named SMITH.PARTS and JONES.PARTS). References to the table SMITH.PARTS might exist in many places, such as view definitions and compiled programs. Finding and changing all these references would be difficult (perhaps impossible, as in the case of users' source programs which are not stored under System R control).

(3) Occasionally it is necessary to reload an existing table in the database (e.g., to change its physical clustering properties). In System R this is accomplished by dropping the old table definition, creating a new table with the same definition, and reloading the data into the new table. Unfortunately, views and authorizations defined on the table are lost from the system when the old definition is dropped, and therefore they both must be redefined on the new table. It has been suggested that views and authorizations defined on a dropped table might optionally be held "in abeyance" pending reactivation of the table.

The Recovery Subsystem

The combined "shadow page" and log mechanism used in System R proved to be quite successful in safeguarding the database against media, system, and transaction failures. The part of the recovery subsystem which was observed to have the greatest impact on system performance was the keeping of a shadow page for each updated page. This performance impact is due primarily to the following factors:

(1) Since each updated page is written out to a new location on disk, data tends to move about. This limits the ability of the system to cluster related pages in secondary storage to minimize disk arm movement for sequential applications.

(2) Since each page can potentially have both an "old" and "new" version, a directory must be maintained to locate both versions of each page. For large databases, the directory may be large enough to require a paging mechanism of its own.

(3) The periodic checkpoints which exchange the "old" and "new" page pointers generate I/O activity and consume a certain amount of CPU time.

A possible alternative technique for recovering from system failures would dispense with the concept of shadow pages, and simply keep a log of all database updates. This design would require that all updates be written out to the log before the updated page migrates to disk from the system buffers. Mechanisms could be developed to minimize I/Os by retaining updated pages in the buffers until several pages are written out at once, sharing an I/O to the log.

The Locking Subsystem

The locking subsystem of System R provides each user with a choice of three levels of isolation from other users. In order to explain the three levels, we define "uncommitted data" as those records which have been updated by a transaction that is still in progress (and therefore still subject to being backed out). Under no circumstances can a transaction, at any isolation level, perform updates on the uncommitted data of another transaction, since this might lead to lost updates in the event of transaction backout.

The three levels of isolation in System R are defined as follows:

Level 1: A transaction running at Level 1 may read (but not update) uncommitted data. Therefore, successive reads of the same record by a Level-1 transaction may not give consistent values. A Level-1 transaction does not attempt to acquire any locks on records while reading.

Level 2: A transaction running at Level 2 is protected against reading uncommitted data. However, successive reads at Level 2 may still yield inconsistent values if a second transaction updates a given record and then terminates between the first and second reads by the Level-2 transaction. A Level-2 transaction locks each record before reading it to make sure it is committed at the time of the read, but then releases the lock immediately after reading.

Level 3: A transaction running at Level 3 is guaranteed that successive reads of the same record will yield the same value. This guarantee is enforced by acquiring a lock on each record read by a Level-3 transaction and holding the lock until the end of the transaction. (The lock acquired by a Level-3 reader is a "share" lock which permits other users to read but not update the locked record.)

It was our intention that Isolation Level 1 provide a means for very quick scans through the database when approximate values were acceptable, since Level-1 readers acquire no locks and should never need to wait for other users. In practice, however, it was found that Level-1 readers did have to wait under certain circumstances while the physical consistency of the data was suspended (e.g., while indexes or pointers were being adjusted). Therefore, the potential of Level 1 for increasing system concurrency was not fully realized.

It was our expectation that a tradeoff would exist between Isolation Levels 2 and 3 in which Level 2 would be "cheaper" and Level 3 "safer." In practice, however, it was observed that Level 3 actually involved less CPU overhead than Level 2, since it was simpler to acquire locks and keep them than to acquire locks and immediately release them. It is true that Isolation Level 2 permits a greater degree of
access to the database by concurrent readers and updaters than does Level 3. However, this increase in concurrency was not observed to have an important effect in most practical applications.

As a result of the observations described above, most System R users ran their queries and application programs at Level 3, which was the system default.

The Convoy Phenomenon

Experiments with the locking subsystem of System R identified a problem which came to be known as the "convoy phenomenon" [9]. There are certain high-traffic locks in System R which every process requests frequently and holds for a short time. Examples of these are the locks which control access to the buffer pool and the system log. In a "convoy" condition, interaction between a high-traffic lock and the operating system dispatcher tends to serialize all processes in the system, allowing each process to acquire the lock only once each time it is dispatched.

In the VM/370 operating system, each process in the multiprogramming set receives a series of small "quanta" of CPU time. Each quantum terminates after a preset amount of CPU time, or when the process goes into page, I/O, or lock wait. At the end of the series of quanta, the process drops out of the multiprogramming set and must undergo a longer "time slice wait" before it once again becomes dispatchable. Most quanta end when a process waits for a page, an I/O operation, or a low-traffic lock. The System R design ensures that no process will ever hold a high-traffic lock during any of these types of wait. There is a slight probability, however, that a process might go into a long "time slice wait" while it is holding a high-traffic lock. In this event, all other dispatchable processes will soon request the same lock and become enqueued behind the sleeping process. This phenomenon is called a "convoy."

In the original System R design, convoys are stable because of the protocol for releasing locks. When a process P releases a lock, the locking subsystem grants the lock to the first waiting process in the queue (thereby making it unavailable to be reacquired by P). After a short time, P once again requests the lock, and is forced to go to the end of the convoy. If the mean time between requests for the high-traffic lock is 1,000 instructions, each process may execute only 1,000 instructions before it drops to the end of the convoy. Since more than 1,000 instructions are typically used to dispatch a process, the system goes into a "thrashing" condition in which most of the cycles are spent on dispatching overhead.

The solution to the convoy problem involved a change to the lock release protocol of System R. After the change, when a process P releases a lock, all processes which are enqueued for the lock are made dispatchable, but the lock is not granted to any particular process. Therefore, the lock may be regranted to process P if it makes a subsequent request. Process P may acquire and release the lock many times before its time slice is exhausted. It is highly probable that process P will not be holding the lock when it goes into a long wait. Therefore, if a convoy should ever form, it will most likely evaporate as soon as all the members of the convoy have been dispatched.

Additional Observations

Other observations were made during the evaluation of System R and are listed below:

(1) When running in a "canned transaction" environment, it would be helpful for the system to include a data communications front end to handle terminal interactions, priority scheduling, and logging and restart at the message level. This facility was not included in the System R design. Also, space would be saved and the working set reduced if several users executing the same "canned transaction" could share a common access module. This would require the System R code generator to produce reentrant code. Approximately half the space occupied by the multiple copies of the access module could be saved by this method, since the other half consists of working storage which must be duplicated for each user.

(2) When the recovery subsystem attempts to take an automatic checkpoint, it inhibits the processing of new RSS commands until all users have completed their current RSS command; then the checkpoint is taken and all users are allowed to proceed. However, certain RSS commands potentially involve long operations, such as sorting a file. If these "long" RSS operations were made interruptible, it would avoid any delay in performing checkpoints.

(3) The System R design of automatically maintaining a system catalog as part of the on-line database was very well liked by users, since it permitted them to access the information in the catalog with exactly the same query language they use for accessing other data.

5. Conclusions

We feel that our experience with System R has clearly demonstrated the feasibility of applying a relational database system to a real production environment in which many concurrent users are performing a mixture of ad hoc queries and repetitive transactions. We believe that the high-level user interface made possible by the relational data model can have a dramatic positive effect on user productivity in developing new applications, and on the data independence of queries and programs. System R has also demonstrated the ability to support a highly dynamic database environment in which application requirements are rapidly changing.

In particular, System R has illustrated the feasibility of compiling a very high-level data sublanguage, SQL, into machine-level code. The
result of this compilation technique is that most of the overhead cost for implementing the high-level language is pushed into a "precompilation" step, and performance for canned transactions is comparable to that of a much lower level system. The compilation approach has also proved to be applicable to the ad hoc query environment, with the result that a unified mechanism can be used to support both queries and transactions.

The evaluation of System R has led to a number of suggested improvements. Some of these improvements have already been implemented and others are still under study. Two major foci of our continuing research program at the San Jose laboratory are adaptation of System R to a distributed database environment, and extension of our optimizer algorithms to encompass a broader set of access paths.

Sometimes questions are asked about how the performance of a relational database system might compare to that of a "navigational" system in which a programmer carefully hand-codes an application to take advantage of explicit access paths. Our experiments with the System R optimizer and compiler suggest that the relational system will probably approach but not quite equal the performance of the navigational system for a particular, highly tuned application, but that the relational system is more likely to be able to adapt to a broad spectrum of unanticipated applications with adequate performance. We believe that the benefits of relational systems in the areas of user productivity, data independence, and adaptability to changing circumstances will take on increasing importance in the years ahead.

Acknowledgments

From the beginning, System R was a group effort. Credit for any

from E. F. Codd, whose landmark paper [22] introduced the relational model of data. The manager of the project through most of its existence was W. F. King.

In addition to the authors of this paper, the following people were associated with System R and made important contributions to its development:

    M. Adiba        M. Mresse
    R.F. Boyce      J.F. Nilsson
    A. Chan         R.L. Obermarck
    D.M. Choy       D. Stott Parker
    K. Eswaran      D. Portal
    R. Fagin        N. Ramsperger
    P. Fehder       P. Reisner
    T. Haerder      P.R. Roever
    R.H. Katz       R. Selinger
    W. Kim          H.R. Strong
    H. Korth        P. Tiberio
    P. McJones      V. Watson
    D. McLeod       R. Williams

References

1. Adiba, M.E., and Lindsay, B.G. Database snapshots. IBM Res. Rep. RJ2772, San Jose, Calif., March 1980.
2. Astrahan, M.M., and Chamberlin, D.D. Implementation of a structured English query language. Comm. ACM 18, 10 (Oct. 1975), 580-588.
3. Astrahan, M.M., and Lorie, R.A. SEQUEL-XRM: A Relational System. Proc. ACM Pacific Regional Conf., San Francisco, Calif., April 1975, p. 34.
4. Astrahan, M.M., et al. System R: A relational approach to database management. ACM Trans. Database Syst. 1, 2 (June 1976), 97-137.
5. Astrahan, M.M., et al. System R: A relational data base management system. IEEE Comptr. 12, 5 (May 1979), 43-48.
6. Astrahan, M.M., Kim, W., and Schkolnick, M. Evaluation of the System R access path selection mechanism. Proc. IFIP Congress, Melbourne, Australia, Sept. 1980, pp. 487-491.
7. Blasgen, M.W., Eswaran, K.P. Storage and access in relational databases. IBM Syst. J. 16, 4 (1977), 363-377.
8. Blasgen, M.W., Casey, R.G., and Eswaran, K.P. An encoding method for multifield sorting and indexing. Comm. ACM 20, 11 (Nov. 1977), 874-878.
9. Blasgen, M., Gray, J., Mitoma, M., and Price, T. The convoy phenomenon. Operating Syst. Rev. 13, 2 (April 1979), 20-25.
10. Blasgen, M.W., et al. System R: An ar-
12. Boyce, R.F., and Chamberlin, D.D. Using a structured English query language as a data definition facility. IBM Res. Rep. RJ1318, San Jose, Calif., Dec. 1973.
13. Boyce, R.F., Chamberlin, D.D., King, W.F., and Hammer, M.M. Specifying queries as relational expressions: The SQUARE data sublanguage. Comm. ACM 18, 11 (Nov. 1975), 621-628.
14. Chamberlin, D.D., and Boyce, R.F. SEQUEL: A structured English query language. Proc. ACM-SIGMOD Workshop on Data Description, Access, and Control, Ann Arbor, Mich., May 1974, pp. 249-264.
15. Chamberlin, D.D., Gray, J.N., and Traiger, I.L. Views, authorization, and locking in a relational database system. Proc. 1975 Nat. Comptr. Conf., Anaheim, Calif., pp. 425-430.
16. Chamberlin, D.D., et al. SEQUEL 2: A unified approach to data definition, manipulation, and control. IBM J. Res. and Develop. 20, 6 (Nov. 1976), 560-575 (also see errata in Jan. 1977 issue).
17. Chamberlin, D.D. Relational database management systems. Comptng. Surv. 8, 1 (March 1976), 43-66.
18. Chamberlin, D.D., et al. Data base system authorization. In Foundations of Secure Computation, R. Demillo, D. Dobkin, A. Jones, and R. Lipton, Eds., Academic Press, New York, 1978, pp. 39-56.
19. Chamberlin, D.D. A summary of user experience with the SQL data sublanguage. Proc. Internat. Conf. Data Bases, Aberdeen, Scotland, July 1980, pp. 181-203 (also IBM Res. Rep. RJ2767, San Jose, Calif., April 1980).
20. Chamberlin, D.D., et al. Support for repetitive transactions and ad-hoc queries in System R. ACM Trans. Database Syst. 6, 1 (March 1981), 70-94.
21. Chamberlin, D.D., Gilbert, A.M., and Yost, R.A. A history of System R and SQL/data system (presented at the Internat. Conf. Very Large Data Bases, Cannes, France, Sept. 1981).
22. Codd, E.F. A relational model of data for large shared data banks. Comm. ACM 13, 6 (June 1970), 377-387.
23. Codd, E.F. Further normalization of the data base relational model. In Courant Computer Science Symposia, Vol. 6: Data Base Systems, Prentice-Hall, Englewood Cliffs, N.J., 1971, pp. 33-64.
24. Codd, E.F. Recent investigations in relational data base systems. Proc. IFIP Congress, Stockholm, Sweden, Aug. 1974.
25. Codd, E.F. Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4, 4 (Dec. 1979), 397-434.
success of the project properly be- chitectural overview. I B M Syst. J. 20, 1 (Feb. 1981), 41-62. 26. Comer, D. The ubiquitous B-Tree. longs to the team as a whole rather Comptng. Surv. 11, 2 (June 1979), 121-137. than to specific individuals. 11. Bjorner, D., Codd, E.F., Deckert, K.L., and Traiger, I.L. The Gamma Zero N-ary 27. Date, C.J. An Introduction to Database The inspiration for constructing relational data base interface. IBM Res. Rep. Systems. 2nd Ed., Addison-Wesley, New a relational system came primarily RJ 1200, San Jose, Calif., April 1973. York, 1977. 645 Communications October 1981 of Volume 24 the ACM Number 10

28. Eswaran, K.P., and Chamberlin, D.D. Functional specifications of a subsystem for database integrity. Proc. Conf. Very Large Data Bases, Framingham, Mass., Sept. 1975, pp. 48-68.
29. Eswaran, K.P., Gray, J.N., Lorie, R.A., and Traiger, I.L. On the notions of consistency and predicate locks in a database system. Comm. ACM 19, 11 (Nov. 1976), 624-633.
30. Fagin, R. Multivalued dependencies and a new normal form for relational databases. ACM Trans. Database Syst. 2, 3 (Sept. 1977), 262-278.
31. Fagin, R. On an authorization mechanism. ACM Trans. Database Syst. 3, 3 (Sept. 1978), 310-319.
32. Gray, J.N., and Watson, V. A shared segment and inter-process communication facility for VM/370. IBM Res. Rep. RJ1579, San Jose, Calif., Feb. 1975.
33. Gray, J.N., Lorie, R.A., and Putzolu, G.F. Granularity of locks in a large shared database. Proc. Conf. Very Large Data Bases, Framingham, Mass., Sept. 1975, pp. 428-451.
34. Gray, J.N., Lorie, R.A., Putzolu, G.R., and Traiger, I.L. Granularity of locks and degrees of consistency in a shared data base. Proc. IFIP Working Conf. Modelling of Database Management Systems, Freudenstadt, Germany, Jan. 1976, pp. 695-723 (also IBM Res. Rep. RJ1654, San Jose, Calif.).
35. Gray, J.N. Notes on database operating systems. In Operating Systems: An Advanced Course, Goos and Hartmanis, Eds., Springer-Verlag, New York, 1978, pp. 393-481 (also IBM Res. Rep. RJ2188, San Jose, Calif.).
36. Gray, J.N., et al. The recovery manager of a data management system. IBM Res. Rep. RJ2623, San Jose, Calif., June 1979.
37. Griffiths, P.P., and Wade, B.W. An authorization mechanism for a relational database system. ACM Trans. Database Syst. 1, 3 (Sept. 1976), 242-255.
38. Katz, R.H., and Selinger, R.D. Internal comm., IBM Res. Lab., San Jose, Calif., Sept. 1978.
39. Kwan, S.C., and Strong, H.R. Index path length evaluation for the research storage system of System R. IBM Res. Rep. RJ2736, San Jose, Calif., Jan. 1980.
40. Lorie, R.A. XRM: An extended (N-ary) relational memory. IBM Tech. Rep. G320-2096, Cambridge Scientific Ctr., Cambridge, Mass., Jan. 1974.
41. Lorie, R.A. Physical integrity in a large segmented database. ACM Trans. Database Syst. 2, 1 (March 1977), 91-104.
42. Lorie, R.A., and Wade, B.W. The compilation of a high level data language. IBM Res. Rep. RJ2598, San Jose, Calif., Aug. 1979.
43. Lorie, R.A., and Nilsson, J.F. An access specification language for a relational data base system. IBM J. Res. and Develop. 23, 3 (May 1979), 286-298.
44. Reisner, P., Boyce, R.F., and Chamberlin, D.D. Human factors evaluation of two data base query languages: SQUARE and SEQUEL. Proc. AFIPS Nat. Comptr. Conf., Anaheim, Calif., May 1975, pp. 447-452.
45. Reisner, P. Use of psychological experimentation as an aid to development of a query language. IEEE Trans. Software Eng. SE-3, 3 (May 1977), 218-229.
46. Schkolnick, M., and Tiberio, P. Considerations in developing a design tool for a relational DBMS. Proc. IEEE COMPSAC 79, Nov. 1979, pp. 228-235.
47. Selinger, P.G., et al. Access path selection in a relational database management system. Proc. ACM SIGMOD Conf., Boston, Mass., June 1979, pp. 23-34.
48. Stonebraker, M. Implementation of integrity constraints and views by query modification. Tech. Memo ERL-M514, College of Eng., Univ. of Calif. at Berkeley, March 1975.
49. Strong, H.R., Traiger, I.L., and Markowsky, G. Slide Search. IBM Res. Rep. RJ2274, San Jose, Calif., June 1978.
50. Traiger, I.L., Gray, J.N., Galtieri, C.A., and Lindsay, B.G. Transactions and consistency in distributed database systems. IBM Res. Rep. RJ2555, San Jose, Calif., June 1979.

Communications of the ACM, October 1981, Volume 24, Number 10