Federated Database Systems for Managing Distributed,


1.Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases’ AMIT P. SHETH Bellcore, lJ-210, 444 Hoes Lane, Piscataway, New Jersey 08854 JAMES A. LARSON Intel Corp., HF3-02, 5200 NE Elam Young Pkwy., Hillsboro, Oregon 97124 A federated database system (FDBS) is a collection of cooperating database systems that are autonomous and possibly heterogeneous. In this paper, we define a reference architecture for distributed database management systems from system and schema viewpoints and show how various FDBS architectures can be developed. We then define a methodology for developing one of the popular architectures of an FDBS. Finally, we discuss critical issues related to developing and operating an FDBS. Categories and Subject Descriptors: D.2.1 [Software Engineering]: Requirements/ Specifications-methodologies; D.2.10 [Software Engineering]: Design; H.0 [Information Systems]: General; H.2.0 [Database Management]: General; H.2.1 [Database Management]: Logical Design--data models, schema and subs&ma; H.2.4 [Database Management]: Systems; H.2.5 [Database Management]: Heterogeneous Databases; H.2.7 [Database Management]: Database Administration General Terms: Design, Management Additional Key Words and Phrases: Access control, database administrator, database design and integration, distributed DBMS, federated database system, heterogeneous DBMS, multidatabase language, negotiation, operation transformation, query processing and optimization, reference architecture, schema integration, schema translation, system evolution methodology, system/schema/processor architecture, transaction management INTRODUCTION tern (DBMS), and one or more databases that it manages. A federated database sys- Federated Database System tem (FDBS) is a collection of cooperating A database system (DBS) consists of soft- but autonomous component database sys- ware, called a database management sys- tems (DBSs). The component DBSs are ’ The views and conclusions in this paper are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of Bellcore, Intel Corp., or the authors’ past or present affiliations. It is the policy of Bellcore to avoid any statements of comparative analysis or evaluation of vendors’ products. Any mention of products or vendors in this document is done where necessary for the sake of scientific accuracy and precision, or for background information to a point of technology analysis, or to provide an example of a technology for illustrative purposes and should not be construed as either positive or negative commentary on that product or that vendor. Neither the inclusion of a product or a vendor in this paper nor the omission of a product or a vendor should be interpreted as indicating a position or opinion of that product or vendor on the part of the author(s) or of Bellcore. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1990 ACM 0360-0300/90/0900-0183 $01.50 ACM Computing Surveys, Vol. 22, No. 3, September 1990

2.184 l Amit Sheth and James Larson CONTENTS or component DBMS, can be a centralized or distributed DBMS or another FDBMS. The component DBMSs can differ in such aspects as data models, query languages, INTRODUCTION and transaction management capabilities. Federated Database System One of the significant aspects of an Characteristics of Database Systems FDBS is that a component DBS can con- Taxonomy of Multi-DBMS and Federated tinue its local operations and at the same Database Systems Scope and Organization of this Paper time participate in a federation. The inte- 1. REFERENCE ARCHITECTURE gration of component DBSs may be man- 1.1 System Components of a Reference aged either by the users of the federation Architecture or by the administrator of the FDBS 1.2 Processor Types in the Reference Architecture together with the administrators of the 1.3 Schema Types in the Reference Architecture component DBSs. The amount of integra- 2. SPECIFIC FEDERATED DATABASE tion depends on the needs of federation SYSTEM ARCHITECTURES users and desires of the administrators 2.1 Loosely Coupled and Tightly Coupled FDBSs of the component DBSs to participate in 2.2 Alternative FDBS Architectures 2.3 Allocating Processors and Schemas the federation and share their databases. to Computers The term federated database system was 2.4 Case Studies coined by Hammer and McLeod [ 19791 and 3. FEDERATED DATABASE SYSTEM Heimbigner and McLeod [1985]. Since its EVOLUTION PROCESS 3.1 Methodology for Developing a Federated introduction, the term has been used for Database System several different but related DBS archi- 4. FEDERATED DATABASE SYSTEM tectures. As explained in this Introduc- DEVELOPMENT TASKS tion, we use the term in its broader con- 4.1 Schema Translation text and include additional architectural 4.2 Access Control 4.3 Negotiation alternatives as examples of the federated 4.4 Schema Integration architecture. 5. FEDERATED DATABASE SYSTEM The concept of federation exists in many OPERATION contexts. Consider two examples from the 5.1 Query Formulation 5.2 Command Transformation political domain-the United Nations 5.3 Query Processing and Optimization (UN) and the Soviet Union. Both entities 5.4 Global Transaction Management exhibit varying levels of autonomy and 6. FUTURE RESEARCH AND UNSOLVED heterogeneity among the components (sov- PROBLEMS ereign nations and the republics, respec- ACKNOWLEDGMENTS REFERENCES tively). The autonomy and heterogeneity is BIBLIOGRAPHY greater in the UN than in the Soviet Union. GLOSSARY The power of the federation body (the Gen- APPENDIX: Features of Some eral Assembly of the UN and the central FDBS/Multi-DBMS Efforts government of the Soviet Union, respec- tively) with respect to its components in the two cases is also different. Just as peo- ple do not agree on an ideal model or the integrated to various degrees. The software utility of a federation for the political that provides controlled and coordinated bodies and the governments, the database manipulation of the component DBSs is context has no single or ideal model of called a federated database management federation. A key characteristic of a feder- system (FDBMS) (see Figure 1). ation, however, is the cooperation among Both databases and DBMSs play impor- independent systems. In terms of an FDBS, tant roles in defining the architecture of an it is reflected by controlled and sometimes FDBS. Component database refers to a da- limited integration of autonomous DBSs. tabase of a component DBS. A component The goal of this survey is to discuss the DBS can participate in more than one fed- application of the federation concept for eration. The DBMS of a component DBS, managing existing heterogeneous and au- ACM Computing Surveys, Vol. 22, No. 3, September 1990

3. Federated Database Systems l 185 FDBS FDBMS ... Figure 1. An FDBS and its components. tonomous DBSs. We describe various ar- tion management strategies) has been pro- chitectural alternatives and components of posed by Elmagarmid [1987]. Such a a federated database system and explore characterization is particularly relevant to the issues related to developing and oper- the study and development of transaction ating such a system. The survey assumes management in FDBMS, an aspect of an understanding of the concepts in basic FDBS that is beyond the scope of this database management textbooks [ Ceri and paper. Pelagatti 1984; Date 1986; Elmasri and Navathe 1989; Tsichritzis and Lochovsky Distribution 19821 such as data models, the ANSI/ SPARC schema architecture, database de- Data may be distributed among multiple sign, query processing and optimization, databases. These databases may be stored transaction management, and distributed on a single computer system or on multiple database management. computer systems, co-located or geograph- ically distributed but interconnected by a Characteristics of Database Systems communication system. Data may be dis- tributed among multiple databases in dif- Systems consisting of multiple DBSs, of ferent ways. These include, in relational which FDBSs are a specific type, may be terms, vertical and horizontal database par- characterized along three orthogonal di- titions. Multiple copies of some or all of the mensions: distribution, heterogeneity, and data may be maintained. These copies need autonomy. These dimensions are discussed not be identically structured. below with an intent to classify and define Benefits of data distribution, such as in- such systems. Another characterization creased availability and reliability as well based on the dimensions of the networking as improved access times, are well known environment [single DBS, many DBSs in a [Ceri and Pelagatti 19841. In a distributed local area network (LAN), many DBSs in DBMS, distribution of data may be in- a wide area network (WAN), many net- duced; that is, the data may be deliberately works], update related functions of partic- distributed to take advantage of these ben- ipating DBSs (e.g., no update, nonatomic efits. In the case of FDBS, much of the updates, atomic updates), and the types of data distribution is due to the existence of heterogeneity (e.g., data models, transac- multiple DBSs before an FDBS is built. ACM Computing Surveys, Vol. 22, No. 3, September 1990

4.186 l Amit Sheth and James Larson derlying data model used to define data DatabaseSystems structures and constraints. Both represen- Differencesin DBMS tation (structure and constraints) and lan- -data models guage aspects can lead to heterogeneity. (structures,constraints,querylanguages) -system levelsupport l Differences in structure: Different (concurrencycontrol,commit,recovery) data models provide different structural SemanticHeterogeneity primitives [e.g., the information modeled C using a relation (table) in the relational OperatingSystem 0 model may be modeled as a record type -file systems m in the CODASYL model]. If the two rep- -naming, file types,operations m U resentations have the same information -transaction support n content, it is easier to deal with the dif- -interprocess communication I C ferences in the structures. For example, Hardware/System a address can be represented as an entity -instruction set t in one schema and as a composite attri- -data formats8 representation I 0 bute in another schema. If the informa- -configuration n tion content is not the same, it may be very difficult to deal with the difference. Figure 2. Types of heterogeneities. As another example, some data models (notably semantic and object-oriented models) support generalization (and property inheritance) whereas others do not. Many types of heterogeneity are due to l Differences in constraints: Two data technological differences, for example, dif- models may support different con- ferences in hardware, system software straints. For example, the set type in a (such as operating systems), and commu- CODASYL schema may be partially nication systems. Researchers and devel- modeled as a referential integrity con- opers have been working on resolving such straint in a relational schema. CODA- heterogeneities for many years. Several SYL, however, supports insertion and commercial distributed DBMSs are avail- retention constraints that are not cap- able that run in heterogeneous hardware tured by the referential integrity con- and system software environments. straint alone. Triggers (or some other The types of heterogeneities in the da- mechanism) must be used in relational tabase systems can be divided into those systems to capture such semantics. due to the differences in DBMSs and those l Differences in query languages: due to the differences in the semantics of Different languages are used to manipu- data (see Figure 2). late data represented in different data models. Even when two DBMSs support the same data model, differences in their Heterogeneities due to Differences in DBMSs query languages (e.g., QUEL and SQL) An enterprise may have multiple DBMSs. or different versions of SQL supported Different organizations within the enter- by two relational DBMSs could contrib- prise may have different requirements and ute to heterogeneity. may select different DBMSs. DBMSs Differences in the system aspects of the purchased over a period of time may be DBMSs also lead to heterogeneity. Exam- different due to changes in technology. Het- ples of system level heterogeneity include erogeneities due to differences in DBMSs differences in transaction management result from differences in data models and primitives and techniques (including differences at the system level. These are concurrency control, commit protocols, described below. Each DBMS has an un- and recovery), hardware and system ACM Computing Surveys, Vol. 22, No. 3, September 1990

5. Federated Database Systems l 187 software requirements, and communication ams between 61 and 75, it may not be capabilities. possible to correlate it to a score in DB2.CLASS.SCORE because both 73 and 77 would have been represented by a score Semantic Heterogeneity of 7.5. Semantic heterogeneity occurs when there Detecting semantic heterogeneity is a is a disagreement about the meaning, inter- difficult problem. Typically, DBMS sche- pretation, or intended use of the same or mas do not provide enough semantics to related data. A recent panel on semantic interpret data consistently. Heterogeneity heterogeneity [Cercone et al. 19901 showed due to differences in data models also con- that this problem is poorly understood and tributes to the difficulty in identifica- that there is not even an agreement regard- tion and resolution of semantic hetero- ing a clear definition of the problem. Two geneity. It is also difficult to decouple examples to illustrate the semantic heter- the heterogeneity due to differences in ogeneity problem follow. DBMSs from those resulting from semantic Consider an attribute MEAL-COST of heterogeneity. relation RESTAURANT in database DBl that describes the average cost of a meal per person in a restaurant without service Autonomy charge and tax. Consider an attribute by the same name (MEAL-COST) of relation The organizational entities that manage BOARDING in database DB2 that de- different DBSs are often autonomous. In scribes the average cost of a meal per per- other words, DBSs are often under separate son including service charge and tax. Let and independent control. Those who con- both attributes have the same syntactic trol a database are often willing to let others properties. Attempting to compare at- share the data only if they retain control. tributes DBl.RESTAURANTS.MEAL- Thus, it is important to understand the COST and DBS.BOARDING.MEAL- aspects of component autonomy and how COST is misleading because they are they can be addressed when a component semantically heterogeneous. Here the DBS participates in an FDBS. heterogeneity is due to differences in A component DBS participating in an the definition (i.e., in the meaning) of FDBS may exhibit several types of auton- related attributes [Litwin and Abdellatif omy. A classification discussed by Veijalai- 19861. nen and Popescu-Zeletin [ 19881 includes As a second example, consider an attri- three types of autonomy: design, commu- nication, and execution. These and an ad- bute GRADE of relation COURSE in database DBl. Let COURSE.GRADE de- ditional type of component autonomy scribe the grade of a student from the set called association autonomy are discussed of values {A, B, C, D, FJ. Consider another below. attribute SCORE of relation CLASS in da- Design autonomy refers to the ability of tabase DB2. Let SCORE denote a normal- a component DBS to choose its own design ized score on the scale of 0 to 10 derived by with respect to any matter, including first dividing the weighted score of all ex- ams on the scale of 0 to 100 in the course (a) The data being managed (i.e., the Uni- and then rounding the result to the nearest verse of Discourse), half-point. DBl.COURSE.GRADE and (b) The representation (data model, query DBB.CLASS.SCORE are semantically het- language) and the naming of the data erogeneous. Here the heterogeneity is due elements, to different precision of the data values (c) The conceptualization or semantic taken by the related attributes. For exam- interpretation of the data (which ple, if grade C in DBl.COURSE.GRADE greatly contributes to the problem of corresponds to a weighted score of all ex- semantic heterogeneity), ACM Computing Surveys, Vol. 22, No. 3, September 1990

6.188 l Amit Sheth and James Larson (d) Constraints (e.g., semantic integrity and resources (i.e., the data it manages) constraints and the serializability cri- with others. This includes the ability to teria) used to manage the data, associate or disassociate itself from the fed- (e) The functionality of the system (i.e., eration and the ability of a component DBS the operations supported by system), to participate in one or more federations. (f) The association and sharing with other Association autonomy may be treated as systems (see association autonomy be- a part of the design autonomy or as an low), and autonomy in its own right. Alonso and Barbara [1989] discuss the issues that are k) The implementation (e.g., record and relevant to this type of autonomy. file structures, concurrency control A subset of the above types of autonomy algorithms). were also identified by Heimbigner and Heterogeneity in an FDBS is primarily McLeod [1985]. Du et al. [1990] use the caused by design autonomy among compo- term local autonomy for the autonomy of a nent DBSs. component DBS. They define two types of The next two types of autonomy involve local autonomy requirements: operation the DBMS of a component DBS. Commu- autonomy requirements and service auton- nication autonomy refers to the ability of omy requirements. Operation autonomy re- a component DBMS to decide whether quirements relate to the ability of a to communicate with other component component DBS to exercise control over its DBMSs. A component DBMS with com- database. These include the requirements munication autonomy is able to decide related to design and execution autonomy. when and how it responds to a request from Service autonomy requirements relate to the another component DBMS. right of each component DBS to make de- Execution autonomy refers to the ability cisions regarding the services it provides to of a component DBMS to execute local other component DBSs. These include the operations (commands or transactions sub- requirements related to association and mitted directly by a local user of the com- communication autonomy. Garcia-Molina ponent DBMS) without interference from and Kogan [1988] provide a different clas- external operations (operations submitted sification of the types of autonomy. Their by other component DBMSs or FDBMSs) classification is particularly relevant to the and to decide the order in which to execute operating system and transaction manage- external operations. Thus, an external sys- ment issues. tem (e.g., FDBMS) cannot enforce an order The need to maintain the autonomy of of execution of the commands on a com- component DBSs and the need to share ponent DBMS with execution autonomy. data often present conflicting require- Execution autonomy implies that a com- ments. In many practical environments, it ponent DBMS can abort any operation that may not be desirable to support the auton- does not meet its local constraints and that omy of component DBSs fully. Two exam- its local operations are logically unaffected ples of relaxing the component autonomy by its participation in an FDBS. Further- follow: more, the component DBMS does not need to inform an external system of the order l Association autonomy requires that each in which external operations are executed component DBS be free to associate or and the order of an external operation with disassociate itself from the federation. respect to local operations. Operationally, This would require that the FDBS be a component DBMS exercises its execution designed so that its existence and opera- autonomy by treating external operations tion are not dependent on any single in the same way as local operations. component DBS. Although this may be a Association autonomy implies that a com- desirable design goal, the FDBS may ponent DBS has the ability to decide moderate it by requiring that the entry whether and how much to share its func- or departure of a component DBS must tionality (i.e., the operations it supports) be based on an agreement between the ACM Computing Surveys, Vol. 22, No. 3, September 1990

7. Federated Database Systems l 189 federation (i.e., its representative entity Different architectures and types of such as the administrator of the FDBS) FDBSs are created by different levels of and the component DBS (i.e., the admin- integration of the component DBSs and by istrator of a component DBS) and cannot different levels of global (federation) serv- be a unilateral decision of the component ices. We will use the taxonomy shown in DBS. Figure 3 to compare the architectures of l Execution autonomy allows a component various research and development efforts. DBS to decide the order in which exter- This taxonomy focuses on the autonomy nal and local operations are performed. dimension. Other taxonomies are possible Futhermore, the component DBS need by focusing on the distribution and heter- not inform the external system (e.g., ogeneity dimensions. Some recent publica- FDBS) of this order. This latter aspect tions discussing various architectures or of autonomy may, however, be relaxed by different taxonomies include Eliassen and informing the FDBS of the order of Veijalainen [ 19881, Litwin and Zeroual transaction execution (or transaction [ 19881, Ozsu and Valduriez [ 19901, and wait-for graph) to allow simpler and Ram and Chastain [ 19891. more efficient management of global MDBSs can be classified into two types transactions. based on the autonomy of the component DBSs: nonfederated database systems and federated database systems. A nonfederated Taxonomy of Multi-DBMS and Federated database system is an integration of com- Database Systems ponent DBMSs that are not autonomous. A DBS may be either centralized or distrib- It has only one level of management,2 and uted. A centralized DBS system consists of all operations are performed uniformly. In a single centralized DBMS managing a sin- contrast to a federated database system, a gle database on the same computer system. nonfederated database system does not dis- A distributed DBS consists of a single dis- tinguish local and nonlocal users. A partic- tributed DBMS managing multiple data- ular type of nonfederated database system bases. The databases may reside on a single in which all databases are fully integrated computer system or on multiple computer to provide a single global (sometimes called systems that may differ in hardware, sys- enterprise or corporate) schema can be tem software, and communication support. called a unified MDBS. It logically appears A multidatabase system (MDBS) supports to its users like a distributed DBS. operations on multiple component DBSs. A federated database system consists of Each component DBS is managed by (per- component DBSs that are autonomous yet haps a different) component DBMS. A participate in a federation to allow partial component DBS in an MDBS may be cen- and controlled sharing of their data. Asso- tralized or distributed and may reside on ciation autonomy implies that the compo- the same computer or on multiple com- nent DBSs have control over the data they puters connected by a communication sub- manage. They cooperate to allow different system. An MDBS is called a homogeneous degrees of integration. There is no central- MDBS if the DBMSs of all component ized control in a federated architecture be- DBSs are the same; otherwise it is called a cause the component DBSs (and their heterogeneous MDBS. A system that only database administrators) control access to allows periodic, nontransaction-based ex- their data. change of data among multiple DBMSs FDBS represents a compromise between (e.g., EXTRACT [Hammer and Timmer- no integration (in which users must explic- man 19891) or one that only provides access itly interface with multiple autonomous da- to multiple DBMSs one at a time (e.g., no tabases) and total integration (in which joins across two databases) is not called an MDBS. The former is a data exchange sys- * This definition may be diluted to include two levels tem; the latter is a remote DBMS interface of management, where the global level has the author- [Sheth 1987a]. ity for controlling data sharing. ACM Computing Surveys, Vol. 22, No. 3, September 1990

8.190 l Amit Sheth and James Larson Multidatabase will consist of heterogeneous component Systems DBSs. In the rest of this paper, we will use the term FDBS to describe a heterogeneous distributed DBS with autonomy of compo- nent DBSs. Nonfederated Federated FDBSs can be categorized as loosely DatabaseSystems DatabaseSystems coupled or tightly coupled based on who e.g., UNIBASE /\ manages the federation and how the com- [Brzezinskiet 784 \ ponents are integrated. An FDBS is loosely coupled if it is the user’s responsibility to create and maintain the federation and Loosely Coupled Tightly Coupled there is no control enforced by the feder- e.g., MRDSM ated system and its administrators. Other [Litwin 19851 terms used for loosely coupled FDBSs are interoperable database system [Litwin and /\ Abdellatif 19861 and multidatabase system Single Multiple [Litwin et al. 1982].3 A federation is tightly Federation Fedsrations coupled if the federation and its adminis- e.g., DDTS e.g., Mermaid trator(s) have the responsibility for creat- [Dwyerand Larson19871 [Templetonet al. 1987a] ing and maintaining the federation and Figure 3. Taxonomy of multidatabase systems. actively control the access to component DBSs. Association autonomy dictates that, in both cases, sharing of any part of a autonomy of each component DBS is sac- component database or invoking a capabil- rificed so that users can access data through ity (i.e., an operation) of a component DBS a single global interface but cannot directly is controlled by the administrator of the access a DBMS as a local user). The fed- component DBS. erated architecture is well suited for mi- A federation is built by a selective and grating a set of autonomous and stand- controlled integration of its components. alone DBSs (i.e., DBSs that are not sharing The activity of developing an FDBS results data) to a system that allows partial and in creating a federated schema upon which controlled sharing of data without affecting operations (i.e., query and/or updates) are existing applications (and hence preserving performed. A loosely coupled FDBS always significant investment in existing applica- supports multiple federated schemas. A tion software). tightly coupled FDBS may have one or To allow controlled sharing while pre- more federated schemas. A tightly coupled serving the autonomy of component DBSs FDBS is said to have single federation if it and continued execution of existing appli- allows the creation and management of cations, an FDBS supports two types of only one federated schema.* Having a single operations: local and global (or federation). This dichotomy of local and global opera- 3 The term multidatabase has been used by different people to mean different things. For example, Litwin tions is an essential feature of an FDBS. [1985] and Rusinkiewicz et al. [1989] use the term Global operations involve data access using multidatabase to mean loosely coupled FDBS (or in- the FDBMS and may involve data managed teroperable system) in our taxonomy; Ellinghaus et al. by multiple component DBSs. Component [1988] and Veijalainen and Popescu-Zeletin [1988] use it to mean client-server type of FDBS in our taxon- DBSs must grant permission to access the omy; and Dayal and Hwang [1984], Belcastro et al. data they manage. Local operations are [1988], and Breitbart and Silberschatz [1988] use it to submitted to a component DBS directly. mean tightly coupled FDBS in our taxonomy. They involve only data in that component 4 Note that a tightly coupled FDBS with a single DBS. A component DBS, however, does not federated schema is not the same as a unified MDBS but is a special case of the latter. It espouses the need to distinguish between local and global federation concepts such as autonomy of component operations. In moSt environment% the DBMS~, dichotomy of operations, and controlled FDBS will also be heterogeneous, that is, sharing that a unified MDBS does not. ACM Computing Surveys, Vol. 22, No. 3, September 1990

9. Federated Database Systems l 191 federated schema helps in maintaining uni- A type of FDBS architecture called the formity in semantic interpretation of the client-server architecture has been dis- integrated data. A tightly coupled FDBS is cussed by Ge et al. [ 19871 and Eliassen and said to have multiple federations if it allows Veijalainen [1988]. In such a system, there the creation and management of multiple is an explicit contract between a client and federated schemas. Having multiple feder- one or more servers for exchanging infor- ated schemas readily allows multiple inte- mation through predefined transactions. A grations of component DBSs. Constraints client-server system typically does not al- involving multiple component DBS, how- low ad hoc transactions because the server ever, may be difficult to enforce. An orga- is designed to respond to a set of predefined nization wanting to exercise tight control requests. The schema architecture of a over the data (treated as a corporate re- client-server system is usually quite simple. source) and the enforcement of constraints The schema of each server is directly (including the so-called business rules) may mapped to the schema of the client. Thus choose to allow only one federated schema. the client-server architecture can be con- The terms federated database system and sidered to be a tightly coupled one for federated database architecture were intro- FDBS with multiple federations. duced by Heimbigner and McLeod [1985] to mean “collection of components to unite Scope and Organization of this Paper loosely coupled federation in order to share and exchange information” and “an orga- Issues involved in managing an FDBS deal nization model based on equal, autonomous with distribution, heterogeneity, and au- databases, with sharing controlled by ex- tonomy. Issues related to distribution have plicit interfaces.” The multidatabase archi- been addressed in past research and devel- tecture of Litwin et al. [1982] shares many opment efforts on distributed DBMSs. We features of the above architecture. These will concentrate on the issues of autonomy definitions include what we have defined as and heterogeneity. Recent surveys on the loosely coupled FDBSs. The key FDBS related topics include Barker and Ozsu concepts, however, are autonomy of com- [1988]; Litwin and Zeroual [1988]; Ram ponents, and partial and controlled sharing and Chastain [ 19891, and Siegel [1987]. of data. These can also be supported when The remainder of this paper is organized the components are tightly coupled. Hence as follows. In Section 1 we discuss a refer- we include both loosely and tightly coupled ence architecture for DBSs. Two types of FDBSs in our definition of FDBSs. system components-processors and sche- MRDSM [Litwin 19851, OMNIBASE mas-are particularly applicable to FDBSs. [Rusinkiewicz et al. 19891, and CALIDA In Section 2 we use the processors and [Jacobson et al. 19881 are examples of schemas to define various FDBS architec- loosely coupled FDBSs. In CALIDA, fed- tures. In Section 3 we discuss the phases in erated schemas are generated by a database an FDBS evolution process. We also dis- administrator rather than users as’in other cuss a methodology for developing a tightly loosely coupled FDBSs. Users must be rel- coupled FDBS with multiple federations. atively sophisticated in other loosely cou- In Section 4 we discuss four important pled FDBSs to be able to define schemas/ tasks in developing an FDBS: schema views over multiple component DBSs. translation, access control, negotiation, and SIRIUS-DELTA [Litwin et al. 19821 and schema integration. In Section 5 we discuss DDTS [Dwyer and Larson 19871 can be four tasks relevant to operating an FDBS: categorized as tightly coupled FDBSs with query formulation, command transforma- single federation. Mermaide [Templeton tion, query processing and optimization, et al. 1987131and Multibase [Landers and and transaction management. Section 6 Rosenberg 19821 are examples of tightly summarizes and discusses issues that need coupled FDBSs with multiple federations. further research and development. The paper ends with references, a comprehen- @Mermaid is a trademark of Unisys Corporation. sive bibliography, a glossary of the terms ACM Computing Surveys, Vol. 22, No. 3, September 1990

10.192 l Amit Sheth and James Larson used throughout this paper, and an appen- structure descriptions) (e.g., table defi- dix comparing some features of relevant nitions in a relational model), and entity prototype efforts. types and relationship types in the entity-relationship model. 1. REFERENCE ARCHITECTURE l Mappings: Mappings are functions that correlate the schema objects in one A reference architecture is necessary to schema to the schema objects in another clarify the various issues and choices within schema. a DBS. Each component of the reference architecture deals with one of the impor- These basic components can be com- tant issues of a database system, federated bined in different ways to produce different or otherwise, and allows us to ignore details data management architectures. Figure 4 irrelevant to that issue. We can concentrate illustrates the iconic symbols used for each on a small number of issues at a time by of these basic components. The reasons for analyzing a single component. A reference choosing these components are as follows: architecture provides the framework in l Most centralized, distributed, and feder- which to understand, categorize, and com- ated database systems can be expressed pare different architectural options for de- using these basic components. veloping federated database systems. Section 1.1 discusses the basic system com- l These components hide many of the ponents of a reference architecture. Section implementation details that are not 1.2 discusses various types of processors relevant to understanding the im- and the operations they perform on com- portant differences among alternate mands and data. Section 1.3 discusses a architectures. schema architecture of a reference archi- Two basic components, processors and tecture. Other reference architectures de- schemas, play especially important roles scribed in the literature include Blakey in defining various architectures. The pro- [ 19871, Gligor and Luckenbaugh [ 19841, cessors are application-independent soft- and Larson [ 19891. ware modules of a DBMS. Schemas are application-specific components that de- 1.1 System Components of a Reference fine database contents and structure. They Architecture are developed by the organizations to which the users belong. Users of a DBS include A reference architecture consists of various both persons performing ad hoc operations system components. Basic types of system and application programs. components in our reference architecture are as follows: 1.2 Processor Types in the Reference Data: Data are the basic facts and in- Architecture formation managed by a DBS. Database: A database is a repository of Data management architectures differ in data structured according to a data the types of processors present and the model. relationships among those processors. There are four types of processors, each Commands: Commands are requests performing different functions on data ma- for specific actions that are either entered nipulation commands and accessed data: by a user or generated by a processor. transforming processors, filtering proces- Processors: Processors are software sors, constructing processors, and accessing modules that manipulate commands and processors. Each of the processor types is data. discussed below. Schemas: Schemas are descriptions of data managed by one or more DBMSs. A 1.2.1 Transforming Processor schema consists of schema objects and their interrelationships. Schema objects Transforming processors translate com- are typically class definitions (or data mands from one language, called source ACM Computing Surveys, Vol. 22, No. 3, September 1990

11. Federated Database Systems l 193 Component Icon (with [Onuegbe et al. 1983; Zaniolo 19791, Example) allowing a CODASYL DBS to be proc- Type essed using SQL commands. l A program generator that translates SQL commands into equivalent COBOL pro- Processor grams allowing a file system to be proc- essed using SQL commands. Command For some command-transforming pro- cessors, there may exist companion data- transforming processors that convert data Data produced by the transformed commands <--ii-> into data compatible with the commands in the source format. For example, a data- transforming processor that is the com- panion to the above SQL-to-CODASYL Schema command-transforming processor is a table builder that accepts individual database records produced by the CODASYL DBMS and builds complete tables for display to Information the SQL user. Mapping Figure 5(a) illustrates a pair of compan- ion transforming processors. Using infor- mation from schema A, schema B, and the mappings between them, the command- transforming processor converts com- Database mands expressed using schema A’s descrip- tion into commands expressed using schema B’s description. Using the same information, the companion data- transforming processor transforms data Figure 4. Basic system components of the data man- agement reference architecture. described using schema B’s description into data described using schema A’s description. To perform these transformations, a language, to another language, called target transforming processor needs mappings be- language, or transform data from one tween the objects of each schema. The task format (source format) to another format of schema translation involves transform- (target format). Transforming processors ing a schema (schema A) describing data in provide a type of data independence called one data model into an equivalent schema data model transparency in which the data (schema B) describing the same data in a structures and commands used by one pro- different data model. This task also gener- cessor are hidden from other processors. ates the mappings that correlate the Data model transparency hides the dif- schema objects in one schema (schema B) ferences in query languages and data for- to the schema objects in another schema mats. For example, the data structures (schema A). The task of command transfor- used by one processor can be modified to mation entails using these mappings to improve overall efficiency without requiring translate commands involving the schema changes to other processors. Examples of objects of one schema (schema B) into com- command-transforming processors include mands involving the schema objects of the the following: other schema (schema A). The schema l A command transformer that trans- translation problem and the command lates SQL commands into CODASYL transformation problem are further dis- data manipulation language commands cussed in Sections 4.1 and 5.2, respectively. ACM Computing Surveys, Vol. 22, No. 3, September 1990

12.194 . Amit Sheth and James Larson CA Schema B (b) Figure5. Transforming processors. (a) A pair of companion transforming processors. (b) An abstract transforming processor. Mappings are associated with a trans- command-transforming processor and data forming processor in one of two ways. In converter pair. the first case, the mappings are encoded into the transforming processor’s logic, making the transforming processor specific 1.2.2 Filtering Processor to the schemas. Alternatively, the map- Filtering processors constrain the com- pings are stored in a separate data structure mands and associated data that can be and accessed by the transforming processor passed to another processor. Associated when converting commands and data. This with each filtering processor are mappings is a more general approach. It may also be that describe the constraints on commands possible to generate a transforming proces- and data. These constraints may either be sor for transforming specific commands embedded into the code of the filtering or data automatically. For example, an processor or be specified in a separate data SQL-to-COBOL program generator might structure. Examples of filtering processors generate a specific data-transforming pro- include the following: cessor, the generated COBOL program, that converts data to the required form. Syntactic constraint checker, which For the remainder of this paper we will checks commands to verify that they are illustrate a command-transforming proces- syntactically correct. sor and data converter pair as a single Semantic integrity constraint checker, transforming processor as illustrated in which performs one or more of the follow- Figure 4(b). This higher-level abstraction ing functions: (a) checks commands to enables us to hide the differences between verify that they will not violate semantic a single data-transforming processor, a sin- integrity constraints, (b) modifies com- gle command-transforming processor, or a mands in such a manner that when the ACM Computing Surveys, Vol. 22, No. 3, September 1990

13. I CommandFiltering Processor the Data Structures (4 (b) Figure 6. Filtering processors. (a) A pair of companion filtering processors. (b) An abstract filtering processor. commands are interpreted, semantic in- than one way to translate an update com- tegrity constraints will automatically be mand. We do not discuss the view update enforced, or (c) verifies that data pro- task in more detail because we feel that a duced by another processor does not vi- loosely coupled FDBS is not well suited to olate any semantic integrity constraint. support updates, and solving this problem l Access controller, which verifies that the in a tightly coupled FDBS is very similar user is permitted to perform the com- to solving it in a centralized or distributed mand on the indicated data or verifies DBMS [Sheth et al. 1988a]. that the user is permitted to use data produced by another processor. 1.2.3 Constructing Processor Figure 6(a) illustrates two filtering pro- Constructing processors partition and/or cessors, one that controls commands and replicate an operation submitted by a single one that controls data. Again, we will ab- processor into operations that are accepted stract command- and data-filtering proces- by two or more other processors. Construct- sors into a single filtering processor as ing processors also merge data produced by illustrated in Figure 6(b). several processors into a single data set for An important task that may be solved by consumption by another single processor. a filtering processor is that of view update. They can support location, distribution, This task occurs when the differences in and replication transparencies because a data structures between the view and the processor submitting a command does not schema is such that there may be more need to know the location, distribution, and ACM Computing Surveys, Vol. 22, No. 3, September 1990

14. 196 . Amit Sheth and James Larson <a> iYGzA /Data Exoressed\ (2 Schema A (b) Figure 7. Constructing processors. (a) A pair of constructing processors. (b) An abstract constructing processor. number of processors participating in pro- of companion constructing processors. Us- cessing that command. ing information from schema A, schema B, Tasks that can be handled by construct- schema C, and the mappings from schema ing processors include the following: A to schemas B and C, the command de- composer uses the commands expressed us- Schema integration: Integrating mul- ing the schema A objects to generate the tiple schemes into a single schema commands using the objects in schemas B Negotiation: Determining what proto- and C. Schema A is an integrated schema col should be used among the owners of that contains a description of all or parts various schemas to be integrated in de- of the data described by schemas B and C. termining the contents of an integrated Using the same information, the data schema merger generates data in the format of Query (command) decomposition schema A objects from data in the formats and optimization: Decomposing and of the objects in schemas B and C. optimizing a query (command) expressed Again we will abstract the command par- on an integrated schema titioner and data merger pair into a single constructing processor as illustrated in Global transaction management: Performing the concurrency and atomic- Figure 7(b). ity control 1.2.4 Accessing Processor These issues are further discussed in Sec- An accessing processor accepts commands tions 4 and 5. Figure 7(a) illustrates a pair and produces data by executing the ACM Computing Surveys, Vol. 22, No. 3, September 1990

15. Federated Database Systems l commands against a database. It may ac- cept commands from several processors and interleave the processing of those com- mands. Examples of accessing processors include the following: l A file management system that executes access procedures against stored file l A special application program that ac- cepts commands and generates data to be returned to the processor generating the Figure 8. Accessing processor. commands l A data manager of a DBMS containing data access methods those structures. It is an attempt to de- l A dictionary manager that manages ac- scribe all data of interest to an enterprise. cess to dictionary data In the context of the ANSI/X3/SPARC architecture, it is a database schema as Figure 8 illustrates an accessing processor expressed in the data definition language that accepts data manipulation commands of a centralized DBMS. The internal and uses access methods to retrieve data schema describes physical characteristics of from the database. the logical data structures in the conceptual Issues that are addressed by accessing schema. These characteristics include in- processors include local concurrency con- formation about the placement of records trol, commitment, backup, and recovery. on physical storage devices, the placement These problems and their solutions are ex- and type of indexes and physical represen- tensively discussed in the literature for cen- tation of relationships between logical rec- tralized and distributed DBMSs. Some of ords. Much of the description in the the issues of adapting these problems to internal schema can be changed without deal with heterogeneity and autonomy in having to change the conceptual schema. the FDBSs are discussed in Section 5.4. By making changes to the description in the internal schema and making the cor- 1.3 Schema Types in the Reference responding changes to the data in the da- Architecture tabase, it is possible to change the physical In this section, we first review the standard representation without changing any appli- three-level schema architecture for central- cation program source code. Thus it is ized DBMSs. We then extend it to a five- possible to fine tune the physical represen- level architecture that addresses the tation of data and optimize the perfor- requirements of dealing with distribution, mance of the DBMS in providing database autonomy, and heterogeneity in an FDBS. access for selected applications. Most users do not require access to all of the data in a database. Thus they do not 1.3.1 ANSIISPARC Three-Level Schema require access to all of the schema objects Architecture in the conceptual schema. Each user or The ANSI/X3/SPARC Study Group on class of users may require access to only a Database Systems outlined a three-level portion of the database. The subset of the data description architecture [Tsichritzis database that may be accessed by a user or and Klug 19781. The three levels of data a class of users is described by an external description are the conceptual schema, the schema. Because different users may need internal schema, and the external schema. access to different portions of the database, A conceptual schema describes the con- each user or a class of users may require a ceptual or logical data structures (i.e., the separate external schema. schema consists of objects that provide a In terms of the above constructs, filtering conceptual- or logical-level description of processors use the information in the ex- the database) and the relationships among ternal schemas to control what data can be ACM Computing Surveys, Vol. 22, No. 3, September 1990

16.198 . Amit Sheth and James Larson Filtering Processor n Transforming Processor m Internal Accessing Processor Figure 9. System architecture of a centralized DBMS. accessed by which users. A transforming chitecture for federated systems shown in processor translates commands expressed Figure 10. A system architecture consisting using the conceptual schema objects into of both processors and schemas of an FDBS commands using the internal schema ob- is shown in Figure 11. jects. An accessing processor executes the The five-level schema architecture of an commands to retrieve data from the phys- FDBS includes the following: ical media. A system architecture consist- ing of both processors and schemas of a Local Schema: A local schema is the con- centralized DBS is shown in Figure 9. ceptual schema of a component DBS. A local schema is expressed in the native data model of the component DBMS, and hence 1.3.2 A Five-Level Schema Architecture for different local schemas may be expressed Federated Databases in different data models. The three-level schema architecture is ad- Component Schema: A component equate for describing the architecture of a schema is derived by translating local sche- centralized DBMS. It, however, is inade- mas into a data model called the canonical quate for describing the architecture of an or common data model (CDM) of the FDBS. FDBS. The three-level schema must be ex- Two reasons for defining component sche- tended to support the three dimensions of mas in a CDM are (1) they describe the a federated database system-distribution, divergent local schemas using a single rep- heterogeneity, and autonomy. Examples of resentation and (2) semantics that are extended schema architectures include a missing in a local schema can be added to four-level schema architecture in Mermaid its component schema. Thus they facilitate [Templeton et al. 1987131,five-level schema negotiation and integration tasks per- architectures in DDTS [Devor et al. 1982b] formed when developing a tightly coupled and SIRIUS-DELTA [Litwin et al. 19821, FDBS. Similarly, they facilitate negotia- and others [Blakey 1987; Ram and tion and specification of views and multi- Chastain 19891. We have adapted these database queries in a loosely coupled architectures for our five-level schema ar- FDBS. ACM Computing Surveys, Vol. 22, No. 3, September 1990

17. Federated Database Systems . 199 I Local bb b Schema Figure 10. Five-level schema architecture of an FDBS. I onstructinq Processor onstructing Processor onstructina Processor Filtering Processor Filtering Processor Filtering Processor (F) (F) (Campon:nt) Transforming Processor Transforming Processor Figure 11. System architecture for an FDBS. I)’ The process of schema translation from schema objects. Transforming processors a local schema to a component schema use these mappings to transform com- generates the mappings between the com- mands on a component schema into com- ponent schema objects and the local mands on the corresponding local schema. ACM Computing Surveys, Vol. 22, No. 3, September 1990

18.200 l Amit Sheth and James Larson Such transforming processors and the com- similar to that of federated schema is rep- ponent schemas support the heterogeneity resented by the terms import schema feature of an FDBS. [Heimbigner and McLeod 19851, global schema [Landers and Rosenberg 1982J, Export Schema: Not all data of a com- global conceptual schema [Litwin et al. ponent DBS may be available to the fed- 19821, unified schema, and enterprise eration and its users. An export schema schema, although the terms other than im- represents a subset of a component schema port schemas are usually used when there that is available to the FDBS. It may in- is only one such schema in the system. clude access control information regarding its use by specific federation users. The External Schema: An external schema purpose of defining export schemas is to defines a schema for a user and/or appli- facilitate control and management of asso- cation or a class of users/applications. Rea- ciation autonomy. A filtering processor can sons for the use of external schemas are as be used to provide the access control as follows: specified in an export schema by limiting the set of allowable operations that can be l Customization: A federated schema submitted on the corresponding component can be quite large, complex, and difficult schema. Such filtering processors and the to change. An external schema can be export schemas support the autonomy fea- used to specify a subset of information in ture of an FDBS. a federated schema that is relevant to the Alternatively, the data available to the users of the external schema. They can FDBS can be defined as the transactions be changed more readily to meet chang- that can be executed by a component DBS ing users’ needs. The data model for an (e.g., [Ge et al. 1987; Heimbigner and external schema may be different than McLeod 1985; Veijalainen and Popescu- that of the federated schema. Zeletin 19881). In this paper, however, we Additional integrity constraints: will not consider that case of exporting Additional integrity constraints can also transactions. be specified in the external schema. Access control: Export schemas pro- Federated Schema: A federated schema vide access control with respect to the is an integration of multiple export sche- data managed by the component data- mas. A federated schema also includes the bases. Similarly, external schemas pro- information on data distribution that is vide access control with respect to the generated when integrating export sche- data managed by the FDBS. mas. Some systems use a separate schema called a distribution schema or an allocation A filtering process analyzes the com- schema to contain this information. A con- mands on an external schema to ensure structing processor transforms commands their conformance with access control and on the federated schema into the com- integrity constraints of the federated mands on one or more export schemas. schema. If an external schema is in a dif- Constructing processors and the federated ferent data model from that of the federated schemas support the distribution feature of schema, a transforming processor is also an FDBS. needed to transform commands on the ex- There may be multiple federated sche- ternal schema into commands on the fed- mas in an FDBS, one for each class of erated schema. federation users. A class of federation users Most existing prototype FDBSs support is a group of users and/or applications per- only one data model for all the external forming a related set of activities. For ex- schemas and one query language interface. ample, in a corporate environment, all Exceptions are a version of Mermaid that managers may be one class of federation supported two query language interfaces, users, and all employees and applications SQL and ARIEL, and a version of DDTS in the accounting department may be an- that supported SQL and GORDAS (a other class of federation users. A concept query language for an extended ER model). ACM Computing Surveys, Vol. 22, No. 3, September 1990

19. Federated Database Systems 201 Future systems are likely to provide ing local schema. The additional semantics more support for multimode1 external are supplied by the FDBS developer during schemas and multiquery language interfaces the schema design, integration, and trans- [Cardenas 1987; Kim 19891. lation processes. The five-level schema architecture Besides adding to the levels in the presented above has several possible schema architecture, heterogeneity and au- redundancies. tonomy requirements may also dictate changes in the content of a schema. For Redundancy between external and example, if an FDBS has multiple hetero- federated schemas: External schemas geneous DBMSs providing different data can be considered redundant with feder- management capabilities, a component ated schemas since a federated schema schema should contain information on the could be generated for every different operations supported by a component federation user. This is the case in the DBMS. schema architecture of Heimbigner and An FDBS may be required to support McLeod [ 19851 (they use the term import local and external schemas expressed in schema rather than federated schema). In different data models. To facilitate their loosely coupled FDBSs, a user defines the design, integration, and maintenance, how- federated schema by integrating export ever, all component, export, and federated schemas. Thus there is usually no need schemas should be in the same data model. for an additional level. In tightly coupled This data model is called canonical or com- FDBSs, however, it may be desirable to mon data model (CDM). A language asso- generate a few federated schemas for ciated with the CDM is called an internal widely different classes of users and to command language. All commands on fed- customize these further by defining ex- erated, export, and component schemas are ternal schemas. Such external schemas expressed using this internal command can also provide additional access language. control. Database design and integration is a Redundancy between an external complex process involving not only the schema of a component DBS and an structure of the data stored in the databases export schema: If a component DBMS but also the semantics (i.e., the meaning supports proper access control security and use) of the data. Thus it is desirable to features for its external schemas and if use a high-level, semantic data model [Hull translating a local schema into a compo- and King 1987; Peckham and Maryanski nent schema is not required (e.g., the data 19881 for the CDM. Using concepts from model of the component DBMS is the object-oriented programming along with a same as CDM of the FDBS), then the semantic data model may also be appropri- external schemas of a component DBS ate for use as a CDM [Kaul et al. 19901. may be used as an export schema in the Although many existing FDBS prototypes five-level schema architecture (external use some form of the relational model as schemas of component DBSs are not the CDM (Appendix), we believe that fu- shown in the five-level schema architec- ture systems are more likely to use a se- ture of Figure 10). mantic data model or a combination of an Redundancy between component object-oriented model and a semantic data schemas and local schemas: When model. Most of the semantic data models component DBSs uses CDM of the will adequately meet requirements of a FDBS and have the same functionality, CDM, and the choice of a particular one is it is unnecessary to define component likely to be subjective. Because a CDM schemas. using a semantic data model may provide richer semantic constructs than the data Figure 12 shows an example in which models used to express the local schemas, some of the schema levels are not used. No the component schema may contain more external schemas are defined over Feder- semantic information than the correspond- ated Schema 2 (all of it is presented to all ACM Computing Surveys, Vol. 22, No. 3, September 1990

20.202 l Amit Sheth and James Larson Figure 12. Example FDBS schemas with missing schemas at some levels. federation users using it). Component in one type of schema may also vary from Schema 2 is the same as the Local Schema those in another type of schema. For ex- 2 (the data model of the Component DBMS ample, a federated schema may have 2 is the same as the CDM). No export schema objects describing the capabilities schema is defined over Component Schema of the various component DBMSs in the 3 (all of it is exported to the FDBS). system, whereas no such objects exist in An important type of information asso- the local schemas. ciated with all FDBS schemas is the map- Two important features of the schema pings. These correlate schema objects at architecture are how autonomy is preserved one level with the schema objects at the and how access control is managed. These next lower level of the architecture. Thus, involve exercising control over schemas at there are mappings from each external different levels. Two types of administra- schema to the federated schema over which tive individuals are involved in developing, it is defined. Similarly, there are mappings controlling, and managing an FDBS: from each federated schema to all of the export schemas that define it. The map- l A component DBS administrator (com- pings may either be stored as a part of the ponent DBA) manages a component schema information or as distinct objects DBS. There is one component DBA5 for within the FDBS data dictionary (which each component DBS. The local, com- also stores schemas). The amount of dic- ponent, and export schemas are con- tionary information needed to describe a trolled by the component DBAs of the schema object in one type of schema may respective component DBSs. A key man- be different from that needed for another agement function of a component DBA type of schema. For example, the descrip- tion of an entity type in a federated schema may include the names of the users that ’ Here a database administrator is a logical entity. In reality, multiple authorized individuals may play the can access it, whereas such information is role of a single (logical) DBA, or the same individual not stored for an entity type in a compo- may play the role of the component DBA for multiple nent schema. The types of schema objects component DBSs. ACM Computing Surveys, Vol. 22, No. 3, September 1990

21. Federated Database Systems l 203 is to define the export schemas that spec- mas” in Heimbigner and McLeod [1985]), ify the access rights of federation users defining a view using a set of operators to access different data in the component (e.g., defining “superviews” in Motro databases. and Buneman [1981]), or defining a view . A federation DBA defines and manages a using a query in a multidatabase lan- federated schema and the external sche- guage ([Czejdo et al. 1987; Litwin and mas related to the federated schema. Abdellatif 19861; see Section 5.1). In a There can be one federation DBA for tightly coupled FDBS, it takes the form of each federated schema or one federation schema integration ([Batini et al. 19861; see DBA for the entire FDBS. Each federa- Section 4.4). tion DBA in a tightly coupled FDBS is a A typical process of developing federated specially authorized system administra- schemas in a loosely coupled FDBS is as tor and is not a federation user. In a follows. Each federation user is the admin- loosely coupled FDBS, federated schemas istrator of his or her own federated schema. are defined and maintained by the users, First, a federation user looks at the avail- not by the system-assigned federation able set of export schemas to determine DBA. This is further discussed in Sec- which ones describe data he or she would tion 2.1. like to access. Next, the federation user defines a federated schema by importing 2. SPECIFIC FEDERATED DATABASE the export schema objects by using a user SYSTEM ARCHITECTURES interface or an application program or by defining a multidatabase language query The architecture of an FDBS is primarily that references export schema objects. The determined by which schemas are present, user is responsible for understanding the how they are arranged, and how they are semantics of the objects in the export sche- constructed. In this section, we begin by mas and resolving the DBMS and semantic discussing the loosely coupled and tightly heterogeneity. In some cases, component coupled architectures of our taxonomy in DBMS dictionaries and/or the federated additional detail. Then we discuss how sev- DBMS dictionary may be consulted for ad- eral alternate architectures can be derived ditional information. Finally, the federated from the five-level schema architecture by schema is named and stored under account inserting additional basic components, re- of the federation user who is its owner. It moving all basic components of a specific can be referenced or deleted at any time by type, and arranging the components of the that federation user. five-level schema architecture in different A typical scenario for the administration ways. We then discuss assignment of com- of a tightly coupled FDBS is as follows. For ponents to computers. Finally, we briefly simplicity, we assume single (logical) fed- discuss four case studies. eration DBA for the entire tightly coupled FDBS. Export schemas are created by ne- 2.1 Loosely Coupled and Tightly Coupled gotiation between a component DBA and FDBSs the federation DBA; the component DBA has authority or control over what is in- With the background of Section 1, we dis- cluded in the export schemas. The federa- cuss distinctions between the loosely cou- pled and tightly coupled FDBSs in more tion DBA is usually allowed to read the detail. component schemas to help determine what data are available and where they are located and then negotiate for their access. 2.1.1 Creation and Administration of Federated The federation DBA creates and controls Schemas the federated schemas. External schemas The process of creating a federated schema are created by negotiation between a fed- takes different forms. In a loosely coupled eration user (or a class of federation users) FDBS, it typically takes the form of schema and the federation DBA who has the importation (e.g., defining “import sche- authority over what is included in each ACM Computing Surveys, Vol. 22, No. 3, September 1990

22. 204 l Amit Sheth and James Larson external schema. It may be possible to in- eration DBA [Litwin and Abdellatif stitute detailed and well-defined negotia- 19861. tion protocols as well as business rules (or some types of constraints) for creating, An example of multiple semantics is as modifying, and maintaining the federated follows. Suppose that there are two export schemas. schemas, each containing the entity SHOE. Based on how often the federated sche- The colors of SHOE in one component mas are created and maintained as well as schema, schemal, are brown, tan, cream, on their stability, an FDBS may be termed white, and black. The colors of SHOE in dynamic or static. Properties of a dynamic the other component schema, schema2, are FDBS are as follows: (a) A federated brown, tan, white, and black. Users defin- schema can be promptly created and ing different federated schemas may define dropped; (b) there is no predetermined pro- different mappings that are relevant to cess for controlling the creation of a feder- their applications. For example, ated schema. As described above, defining a federated schema in a loosely coupled l User1 maps cream in his federated sche- FDBS is like creating a view over the sche- mas to cream in schema1 and tan in mas of the component DBSs. Since such a schema2, federated schema may be managed on the l User2 maps cream in her federated fly (created, changed, dropped easily) by a schema to tan or cream in schema1 and user, loosely coupled FDBSs are dynamic. tan or white in schema2. A tightly coupled federation is almost al- ways static because creating a federated Proponents of the loosely coupled archi- schema is like database schema integration. tecture argue that a federated schema cre- A federated schema in a tightly coupled ated and maintained by a single federation FDBS evolves gradually and in a more con- DBA is utopian and totalitarian in nature trolled fashion. [Litwin 1987; Rusinkiewicz 19871. We feel that a loosely coupled approach may be better suited for integrating a large number 2.1.2 Case for Loosely Coupled FDBS of very autonomous read only databases A loosely coupled FDBS provides an inter- accessible over communication networks face to deal with multiple component (e.g., public databases of the types dis- DBMSs directly. A typical way to formulate cussed by Litwin and Abdellatif [ 19861). queries is to use a multidatabase language User management of federated schemas (see Section 5.1). This architecture has the means that the FDBMS can do little to - following advantages: optimize queries. In most cases, however, the users are free to use their own under- l A user can precisely specify relationships standing of the component DBSs to design and mappings among objects in the ex- a federated schema and to specify queries port schema. This is desirable when the to achieve good performance. federation DBA is unable to specify the mappings in order to integrate data in multiple databases in a manner meaning- 2.1.3 Case for Tightly Coupled FDBS ful to the user’s precise needs [Litwin The loosely coupled approach may be ill and Abdellatif 19861. suited for more traditional business or cor- l It is also possible to support multiple porate databases, where system control (via semantics since different users can im- DBAs that represent local and federation port or integrate export schemas differ- level authories) is desirable, where the users ently and maintain different mappings are naive and would find it difficult to from their federated schemas to export perform negotiation and integration them- schemas. This can be a significant advan- selves, or where location, distribution, and tage when the needs of the federation replication transparencies are desirable. users cannot be anticipated by the fed- Furthermore, in our opinion, a loosely ACM Computing Surveys, Vol. 22, No. 3, September 1990

23. Federated Database Systems l 205 coupled FDBS is not suitable for update grated to develop a single federated schema. operations. Updating in a loosely coupled Sometimes an organization will insist on FDBS may degrade data integrity. When a having a single federated schema (also user of a loosely coupled FDBSs creates called enterprise schema or global concep- a federated schema using a view definition tual schema) to have a single point of con- process, view update transformations are trol for all data sharing in the organization often not determined. The users may not across the component DBS boundaries. Us- have complete information on the compo- ing a single federated schema helps in de- nent DBSs and different users may use fining uniform semantics of the data in the different semantic interpretations of the FDBS. With a single federated schema, it data managed by the component DBSs (i.e., is also easier to enforce constraints that loosely coupled FDBSs support multiple cross export schemas (and hence multiple semantic interpretations). Thus different databases) then when multiple federated users can define different federated sche- schemas are allowed. mas over the same component DBSs, and Because one federated schema is created different transformations may be chosen by integrating all export schemas and be- for the same updates submitted on different cause this federated schema supports data federated schemas. Similar problems can requirements of all federation users, it may occur in a tightly coupled FDBS with mul- become too large and hence difficult to tiple federations but can be resolved at the create and maintain. In this case, it may time of federated schema creation through become necessary to support external sche- schema integration. A federation DBA cre- mas for different federation users. ating a federated schema using a schema A tightly coupled FDBS with multiple integration process can be expected to have federations allows the tailoring of the use more complete knowledge of the compo- of the FDBS with respect to multiple nent DBSs and other federated schemas. classes of federation users with different In addition to the update transformation data access requirements. Integrations of issue, transaction management issues need the same set of schemas can lead to differ- to be addressed (see Section 5.4). ent integrated schemas if different seman- A tightly coupled FDBS provides loca- tics are used. Thus this architecture can tion, replication, and distribution transpar- support multiple semantics, but the seman- ency. This is accomplished by developing a tics are decided upon by the federation federated schema that integrates multiple DBAs when defining the federated schemas export schemas. The transparencies are and their mappings to the export schemas. managed by the mappings between the fed- A federation user can select from among erated schema and the export schemas, and multiple alternative mappings by selecting a federation user can query using a classical from among multiple federated schemas. query language against the federated When an FDBS allows updates, multiple schema with an illusion that he or she is semantics could lead to inconsistencies. For accessing a single system. A loosely coupled this reason, federation DBAs have to be system usually provides none of these very careful in developing the federated transparencies. Hence a user of a loosely schemas and their mappings to the export coupled FDBS has to be sophisticated to schemas. Updates are easier to support in find appropriate export schemas that can tightly coupled FDBSs where DBAs care- provide required data and to define map- fully define mappings than in a loosely pings between his or her federated schema coupled FDBS where the users define the and export schemas. Lack of adequate se- mappings. mantics in the component schemas make this task particularly difficult. Let us now 2.2 Alternative FDBS Architectures discuss two alternatives for tightly coupled FDBSs in more detail. In this section, we discuss how processors In a tightly coupled FDBS with a single and schemas are combined to create various federation, all export schemas are inte- FDBS architectures. ACM Computing Surveys, Vol. 22, No. 3, September 1990

24. 206 . Amit Sheth and James Larson 2.2.1 A Complete Architecture of a Tightly component schemas. Mermaid [Temple- Coupled FDBS ton et al. 1987b] falls into this category.‘j An architecture of a tightly coupled FDBS, No filtering processors or export shown in Figure 11, consists of multiple schemas: All of the component schemas basic components as described below. are integrated into a single federated schema resulting in a tightly coupled sys- Multiple l export schemas and filter- tem in which component DBAs do not ing processors: Any number of exter- control what users can access. This ar- nal schemas can be defined, each with its chitecture fails to support component own filtering processor. Each external DBS autonomy fully. UNIBASE [Brze- schema supports the data requirements zinski et al. 19841 is in this category, and of a single federation user or a class of hence it is classified as a nonfederated federation users. system. l Multiple federated schemas and con- No constructing processor: The user structing processors: Any number of or programmer performs the constructing federated schemas can be defined, each process via a query or application pro- with its own constructing processor. Each gram containing references to multiple federated schema may integrate different export schemas. The programmer must export schemas (and the same export be aware of what data are available in schema may be integrated differently in each export schema and whether data are different federated schemas). replicated at multiple sites. This archi- l Multiple export schemas and filter- tecture, classified as a loosely coupled ing processors: Multiple export sche- FDBS, fails to support location, distri- mas represent different parts of a bution, and replication transparencies. If database to be integrated into different data are copied or moved between com- federated schemas. A filtering processor ponent databases, any query or applica- associated with an export schema sup- tion using them must be modified. ports access control for the related com- In practice, two processors may be com- ponent schema. bined into a single module, or two schemas . Multiple component schemas and may be combined into a single implemen- transforming processors: Each com- tation schema. For example, a component ponent schema represents a different schema and its export schemas are fre- component database expressed in the quently combined into a single schema with CDM. Each transforming processor a single processor that performs both trans- transforms a command expressed on the formation and filtering. associated component schema into one or more commands on the corresponding 2.2.3 Architectures with Additional Basic local schema. Components There are several types of architectures 2.2.2 Architectures with Missing Basic with additional components that are exten- Components sions or variations of the basic components There are several architectures in which all of the reference architecture. Such compo- of the processors of one type and all sche- nents enhance the capabilities of an FDBS. mas of one type are missing. Several ex- Examples of such components include the amples follow. following: l No transforming processors or com- l Auxiliary schema: Some FDBSs have ponent schemas: All of the local sche- an additional schema called an auxiliary mas are described in a single data model. In other words, the FDBS does not sup- ‘Its design, however, has provisions to store model port component DBSs that use different transformation information and attach a transforming data models. Hence there is no need for processor. ACM Computing Surveys, Vol. 22, No. 3, September 1990

25. Federated Database Systems l 207 schema that stores the following types of information: “‘“““‘;” Schema) Data needed by federation users but not available in any of the (preexisting) component DBSs. Information needed to resolve incom- patibilities (e.g., unit translation tables, format conversion information). . Statistical information helpful in per- forming query processing and optimi- Figure 13. Using an auxiliary schema to store trans- zation. lation information needed by a constructing processor. Multibase [Landers and Rosenberg 19821 describes the first two types of information in its auxiliary schema, This, however, can limit or conflict with whereas DQS [Belcastro et al. 19881 de- the autonomy of the component DBSs. scribes the last two types of information in its auxiliary schema. Mermaid [Tem- pleton et al. 1987133 describes the third 2.2.4 Extended Federated Architectures type of information in its federated To allow a federation user to access data schema. As illustrated in Figure 13, the from systems other than the component auxiliary schema and the federated DBSs, the five-level schema architecture schema are used by constructing proces- can be extended in additional ways. sors. It is also possible to consider the auxiliary schema to be a part (or sub- l Atypical component DBMS: Instead schema) of a federated schema. of a typical centralized DBMS, a com- . Enforcing constraints among com- ponent DBMS may be a different type of ponent schemas: As illustrated in Fig- data management system such as a file ure 14, an FDBS can have a filtering server, a database machine, a distributed processor in addition to a constructing DBMS, or an FDBMS. OMNIBASE uses processor between a federated schema a distributed DBMS as one of its com- and the component schemas. The filter- ponent DBMSs [Rusinkiewicz et al. ing processor enforces constraints that 19891. Figure 15 illustrates how one span multiple component schemas. The FDBS can act as a backend for another constructing processor, as discussed be- FDBS. By making local schema A2 of fore, transforms a query into subqueries FDBS A the same as external schema B2 against the component schemas of the of FDBS B, the component DBS A2 of component DBSs. Integrity constraints FDBS A is replaced by FDBS B. may be stored in an external schema or l Replacing a component database by a federated schema. The constraints may a collection of application pro- involve data represented in multiple ex- grams: It is conceptually possible to re- port schemas. The filtering processor place some database tables by application checks and modifies each update request programs. For example, a table contain- so when data in multiple component da- ing pairs of equivalent Fahrenheit and tabases are modified, the intercomponent Celsius values can be replaced by a pro- constraints are not violated. This capa- cedure that calculates values on one scale bility is appropriate in a tightly coupled given values on the other. A collection of system in which constraints among mul- conversion procedures can be modeled tiple component databases must be en- by the federated system as a special- forced. An early description of DDTS component database. A special-access [Devor et al. 1982aJ suggested enforce- processor can be developed that accepts ment of semantic integrity constraints requests for conversion information and spanning components in this manner. invokes the appropriate procedure rather ACM Computing Surveys, Vol. 22, No. 3, September 1990

26.208 l Amit Sheth and James Larson Integrity Constraints in External/Federated Schema 1 1 Filtering processor ] I Constructing Processor Figure 14. Using a filtering processor to enforce constraints across export schemas. than access a stored database. Navathe larger granularity and allocate them as et al. [1989] discuss a federated architec- desired. For example, DDTS [Dwyer and ture being developed to provide access Larson 19871 defines two types of modules: to databases as well as application Application Processor and Data Processor programs. (Figure 17). An Application Processor in- cludes a federated schema with the associ- 2.3 Allocating Processors and Schemas to ated constructing processor and all the Computers external schemas defined over the feder- ated schema with the associated filtering It is possible to allocate all processors and processors and transforming processors (if schemas to a single computer, perhaps to present). A Data Processor includes a local allow federation users to access data man- schema, a component schema, and the as- aged by multiple component DBSs on that sociated transforming processor and all ex- computer. Usually, however, different com- port schemas defined over the component ponent DBSs reside on different computers schema with associated filtering processors. connected by a communication system. Dif- An Application Processor performs the ferent allocations of the FDBS components user interface and distributed transaction result in different FDBS configurations. management and coordination functions Figure 16 illustrates the configuration of and is located at every site at which there a typical FDBS. A general-purpose com- are federation users. A Data Processor per- puter at site 1 supports a single component forms the data management functions re- DBS and two federation schemas for two lated to the data managed by a single different classes of federation users. Site 2 component DBS and is located at every site is a workstation that supports two export at which a component DBS is located. A schemas, each containing different data for site can have either or both of the two use by different federation users. Site 3 is modules. Mermaid [Templeton et al. a small workstation that supports a single 1987b] divides the processors and the sche- federation user and no component DBS. mas into four types of modules of smaller Site 4 is a database computer that has one granularity. component DBS but supports no federation Special communication processors can users. also be placed on each computer to enable It may be desirable to group a related set processors on two different sites to com- of processors and schemas into modules of municate with each other. Communication ACM Computing Surveys, Vol. 22, No. 3, September 1990

27. Federated Database Systems l 209 FDBS A I Component Component DBS Al DBS An 0) External Schema B 1 FDBS B Figure 15. FDBS B acting as a back end to FDBS A. processors are not shown in our reference federated schema called the Global Repre- architecture. They are placed between any sentation Schema, which is expressed in pair of adjacent processors that are allo- the relational data model. It has an external cated to different computers. schema called the Conceptual Schema rep- resented in the Entity-Category-Relation- 2.4 Case Studies ship (ECR) model [Elmasri et al. 19851. Users formulate requests directly against In this section we relate the terms and the Conceptual Schema in the GORDAS concepts of the reference architecture to query language [Elmasri 19811. The ECR those used in four example FDBSs. Our data model is rich in semantics (e.g., it purpose is not to survey these systems shows cardinality and operation con- [Thomas et al. 19901 but to show how the straints on an entity’s participation in re- reference architecture can be used to rep- lationships). The transforming part of the resent the architectures of various FDBSs Translation and Integrity Control proces- uniformly. This uniform representation sor is responsible for translating requests can greatly simplify the task of studying written in GORDAS on the ECR data and comparing these systems. model into the internal form of a relational 2.4.1 DOTS query language against the Global Repre- sentational Schema. The filtering part Figure 17 illustrates the original architec- of the Translation and Integrity Control ture of DDTS [Devor et al. 1982a] using processor is responsible for modifying the terminology of the reference architec- each query, so when it is processed, the ture (to the left of each colon) and the constraints specified in the Conceptual terminology used by DDTS (to the right of Schema will be enforced. For example, a each colon and in italics). It has a single GORDAS query that deletes a record will ACM Computing Surveys, Vol. 22, No. 3, September 1990

28. Site 1 Site 2 Site 3 External Schema External Schema (External Schema 1 External Schema I I L / Filtering Processor Filtering Processor I 1 Filtering, Processor 1 I I Federated Schema Federated Schema I f Federated Schema 3 I I I onstructing Processor Constructing Processor Constructing Processor Constructing Processor \ Site 4 \ (Export Schema I 1 Filtering Processor 1 Filtering Processor I (Component Schema 3 (Component Schema 1 TransforTing Proce: 1 I (Local (Local Schema 1 Component DBS 1 1 Component DBS 1 1 Component DBS ] Figure 16. Typical FDBS system configuration.

29. Federated Database Systems 211 Application Processor F(T) Federated: Global Representational I Constructing: -1 -1 I . a Processor sor Component I: Local Component n: Local Representational Representational Schema I I Transforming 1: Transforming n: Local Operations Local Operations Module I Module n I I I Component DBMS 1: Component DBMS n: IDS.2 Wet works RAM (Relational) I I Figure 17. DDTS architecture. be modified so that it verifies that deleting There are two separate processors in the record will not violate any semantic DDTS’s constructing processor, reflecting integrity constraints before the record is the decision to separate distributed query actually deleted. DDTS has since been ex- optimization from distributed query exe- tended to support external schemas [Dwyer cution. The Materialization and Access and Larson 19871 expressed in the rela- Planning component generates a distrib- tional data model defined over the Global uted execution strategy consisting of sets Representation Model. SQL is used to of commands, each expressed in terms query such external schemas. of one of the Local Representational ACM Computing Surveys, Vol. 22, No. 3, September 1990