Semantic heterogeneity resolution in federated databases

数据意图系统之间相互操作的一个关键方面涉及跨数据库边界的元数据和本体的中介。实现本地数据库和远程数据库之间的中介的一种方法是将远程元数据折叠到本地元数据中,从而创建一个公共平台,通过该平台可以实现信息共享和交换。图式植入和语义进化,我们对元数据折叠问题的方法,是一个部分数据库集成方案,在远程和本地(元)数据集成在一个逐步的人。我们介绍了元数据注入和逐步进化技术,以相互关联数据库元素,并解决数据库元素(类、属性和个体实例)的结构和语义上的冲突。我们使用了一个语义丰富的经典数据模型,以及一个增量积分和半对齐解析方案。在我们的方法中,只要对本地和远程信息单元的语义有足够的了解,它们之间的关系就会被阻止。通过将远程数据库元素嵌入到本地数据库中,将远程数据库元素导入本地数据库环境,假设本地远程类的相关性,并自定义重调度元数据的组织,解决了元数据折叠问题。我们已经实现了一个原型系统,并演示了它在实验神经科学环境中的应用。
展开查看详情

1.The VLDB Journal (1999) 8: 120–132 The VLDB Journal c Springer-Verlag 1999 Semantic heterogeneity resolution in federated databases by metadata implantation and stepwise evolution Goksel Aslan, Dennis McLeod Computer Science Department, University of Southern California, Los Angeles, CA 90089-0782, USA; e-mail: {gokselas,mcleod}@usc.edu Edited by R. King · Received June 19, 1998 / Accepted April 20, 1999 Abstract. A key aspect of interoperation among data-inten- sive systems involves the mediation of metadata and ontolo- gies across database boundaries. One way to achieve such mediation between a local database and a remote database is to fold remote metadata into the local metadata, thereby cre- ating a common platform through which information sharing and exchange becomes possible. Schema implantation and semantic evolution, our approach to the metadata folding problem, is a partial database integration scheme in which remote and local (meta)data are integrated in a stepwise man- ner over time. We introduce metadata implantation and step- Fig. 1. The metadata folding problem wise evolution techniques to interrelate database elements in different databases, and to resolve conflicts on the structure and semantics of database elements (classes, attributes, and individual instances). We employ a semantically rich canon- [14, 35], which unite into a loosely coupled form in order to ical data model, and an incremental integration and semantic interoperate. Interoperability between component database heterogeneity resolution scheme. In our approach, relation- systems is achieved by means of the ability of individ- ships between local and remote information units are deter- ual components to actively and cooperatively share and ex- mined whenever enough knowledge about their semantics change information units with other components in the fed- is acquired. The metadata folding problem is solved by im- eration. planting remote database elements into the local database, a Information sharing and exchange necessitates data and process that imports remote database elements into the local metadata to be mediated across component databases in a database environment, hypothesizes the relevance of local federation. One way to achieve such mediation is to fold re- and remote classes, and customizes the organization of re- mote metadata (conceptual schema) into the local metadata, mote metadata. We have implemented a prototype system thereby creating a common platform through which infor- and demonstrated its use in an experimental neuroscience mation sharing and exchange becomes possible. This prob- environment. lem is termed as “the metadata folding problem” (Fig. 1); it can be stated more formally as: given two independently Key words: Federated databases – Semantic hetrogeneity maintained databases, one local and one remote, it is the resolution – Database interoperability – Database integration process of importing and customizing remote database ele- – Schema evolution ments (e.g., classes, attributes, and individual instances) into the local database in the presence of semantic differences be- tween the two, so that remote information units can be ac- cessed/manipulated from/within the local database environ- ment. Importing means bringing remote database elements 1 Introduction into, and making them accessible within the local database environment. Customization, on the other hand, refers to the A cooperative federated database system is a collection of process of reorganizing and/or tuning local and previously autonomous and heterogeneous component database systems imported remote database elements by taking the real-world This research has been funded in part by the USC Integrated Me- concepts they model and their corresponding contexts into dia Systems Center (IMSC), a National Science Roundation Engineering account. In sum, the metadata folding problem is the prob- Research Center, with additional support from the NIHM under grant no. lem of (partial) integration of remote and local databases in 5P01MD/DA52194-02. the presence of semantic conflicts.

2.G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases 121 Previous approaches that are applicable to the metadata folding problem assume that either (1) knowledge required to relate interdatabase elements is available – the global schema approach, (2) derivable within the federation – approaches that employ semantic dictionaries or ontologies, (3) or ob- tainable from users – the multidatabase language approach. Most previous approaches propose a largely one-step solu- tion, in which resolution of conflicts on the semantics of information units and integration of remote database ele- ments into the local database are performed in a single pass. There are only a few attempts that consider the importance of the acquisition of knowledge required to interrelate informa- Fig. 2. Global schema approach tion units in different databases. Assuming that knowledge required for semantic heterogeneity resolution and schema integration is always available or derivable results in either frequent user consultations on the semantics and interrela- schematic conflicts and techniques to resolve them on a case- tionships of information units and/or inconsistencies. by-case basis. We propose here a uniform approach to the metadata There are three main categories of research that are folding problem, which recognizes the fact that such a directly applicable to the metadata folding problem. (1) knowledge may not be available. This results in a step- Global schema approach: components agree on a common, wise approach to semantic heterogeneity resolution: a step- federation-wide global schema before any information shar- wise (partial) integration scheme. In our approach, required ing and exchange takes place. Information unit semantics knowledge is incrementally accumulated from information and interdatabase relationships are fixed on this schema. In- unit owners and maintainers (e.g., modelers, administrators, formation sharing and exchange occurs through this shared current users and application maintainers), who are the best global schema. (2) Federated databases with semantic dictio- experts on the semantics of information units they main- nary or ontologies: components agree on a pool of real-world tain in their databases. Interrelationships between database concepts and relationships between concepts. Each compo- elements in different components are determined whenever nent database is responsible for expressing the sharable por- sufficiently precise knowledge about their semantics is ac- tion of its conceptual schema in terms of this common vo- quired. cabulary. Information sharing and exchange occurs by an- The remainder of this paper is organized as follows. Sec- alyzing the actual concepts implied by individual database tion 2 examines previous work that is applicable to the meta- elements, by investigating interconcept relationships, and by data folding problem. Section 3 provides an example sharing deriving the meanings of unknown concepts when necessary. scenario between a local and a remote database. It outlines (3) Multidatabases with multidatabase languages: a multi- key ideas on which our approach to the metadata folding database system employs a powerful language which is fur- problem is based, and specifies the specific sub-problems on nished with explicit primitives that enable a user to mediate which our approach focuses. Section 4 defines the canonical through the component databases. Users in such an environ- data model we employ as the common sharing and exchange ment are assumed to be knowledgeable enough to be able language in the federation. Section 5 presents our approach to express their intentions by using this language statements and its sub-phases in detail. Section 6 describes implemen- in order to achieve information sharing and exchange. tation prototypes which realize our approach to the metadata Approaches assuming a global schema [1, 4, 7–10, 18, folding problem. Finally, Sect. 7 presents a summary, con- 20, 22, 31] (Fig. 2) address key issues such as generating a clusions, and potential enhancements to our work. global schema physically or virtually (e.g., via a view mech- anism), generating global schema-local schema mappings, usage of generalization primitive in schema integration, and the notions of equivalence of domains, classes, and attributes 2 Related work between databases. For instance, [7, 20] focus on the prob- lem of constructing a global schema given correspondence There are considerable research results that focus on the in- assertions (conditions among classes, attributes, or composi- dividual aspects of the metadata folding problem. A key fo- tion hierarchies of object schemas in different components). cus has been on attempts to explicitly capture the semantics Based on the correspondence assertions, integration rules are of information units within databases. For example, [11] de- constructed, which use a set of primitive integration opera- scribe what should be considered semantics and what should tors to achieve the integration. be considered structure in object-oriented databases. Seman- The most serious limitation of global schema approach tics of information units are tightly coupled with the envi- is that it requires too much and too broad global knowledge ronment where information units reside. Therefore, informa- to generate the global schema. It also requires a large global tion maintained in different database environments should be structure to be maintained. The required integration effort processed by respecting their corresponding environments, is very high. Another limitation is the lack of flexibility namely their contexts. [6, 33, 34, 36, 37] Semantic hetero- in the relationships between component database elements, geneity identification/resolution has been the focus of [12, since all the possible relationships between schema elements 26], while [19] provides a comprehensive classification of in different components are fixed in the global schema. A

3.122 G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases “Schema implantation and semantic evolution”, our ap- proach to the metadata folding problem, requires no global structures, although it does not rule out usage of such struc- tures. Here, global knowledge required from a component in order to interoperate with other components is minimal, leading to a very low integration effort. Interrelationships between database elements in the federation are highly dy- namic, and the effort that has to be spent during actual shar- ing and exchange is moderate. Several other recent research results address partially the metadata folding problem with the desired characteristics defined above (e.g., minimum user involvement, minimum Fig. 3. Federated databases with global structures (semantic dictio- nary/ontologies) global structures, minimum global knowledge). For instance, the OBSERVER system [28, 29] addresses the problems of query processing and query reformulation in global infor- mation systems. In this study, information contents of com- ponents are captured in the form of metadata (e.g., semantic descriptions of heterogeneous repositories) in vocabularies (ontologies). The problem of different vocabularies is re- solved by maintaining interrelationships between the terms in different ontologies. By contrast, our approach does not have any global structures which resemble ontologies and structures maintaining interontology relationships. The SIMS project [2, 3] constructs a domain model, a Fig. 4. Multidatabases with a multidatabase language hierarchical, terminological knowledge base, in order to de- scribe application domain semantics. By contrast, we dis- tribute the responsibility of maintaining information unit se- tightly coupled federated database with multiple federated mantics into individual components, and assume an environ- schemas [35] is a generalization of the global schema ap- ment where there are multiple component databases which, proach, and it also suffers from the same limitations. unlike SIMS, may model overlapping or even independent The federated databases with semantic dictionaries/on- application domains. The Carnot project at MCC [15, 39, tologies approach [12, 13, 17, 26, 27, 40, 41] (Fig. 3) is 42–44, 47] bears similar limitations. A global schema or based on the idea of agreeing on a collection of concepts context in the form of a Cyc knowledge base is used as the and interconcept relationships federation-wide, and describ- federating mechanism. Here, a relationship between a do- ing sharable portions of component databases in terms of main concept from a local component and one or more con- this commonly understood set of concepts. When consid- cepts in the global context is expressed as an articulation ax- ered within the context of the metadata folding problem, this iom, a statement of equivalence between these contexts [15]. approach requires a moderate amount of global knowledge, One of the services Carnot provides is the semantic services, global structures, integration effort, and actual exchange as whose purpose is to provide a global, enterprise-wide view compared with the global schema approach. The main lim- of all the resources being integrated [43]. Another limitation itation of this category is the difficulty of agreeing on a set is the difficulty of obtaining the declarative resource con- of concepts and interconcept relationships in a federation straint base which contains interresource dependencies, con- environment that is dynamic (evolves). sistency requirements, and consistency restoration strategies. In the third key approach to the metadata folding prob- The InfoSleuth project [5, 16, 45–48] investigates the use of lem (Fig. 4), multidatabase systems offer multidatabase users Carnot technology in a more dynamically changing envi- with powerful multidatabase languages through which users ronment such as the Internet [5]. In such an ever-growing can manipulate data in different non-integrated schemas[21, and ever-changing environment, information advertisement, 23, 24, 30]. The MDSL [23] multidatabase language of information discovery, and collaboration between clients to MRDSM (a prototype multidatabase system) contains ca- fuse information from many information sources are neces- pabilities to join data in different databases, to broadcast sary. InfoSleuth follows an agent-based approach to these user intentions over a number of database schemas, to flow problems, in which responsibility of carrying out different data between databases, and to aggregate data from vari- tasks are distributed over highly specialized agents. For ex- ous databases. This approach, when applied to the metadata ample, InfoSleuth employs a special agent, called an “ontol- folding problem, requires a comparatively small amount of ogy server”, which is responsible for managing the creation, global structures to be maintained. Interrelationships of dif- update, and querying of multiple ontologies. ferent databases are highly dynamic. Nevertheless, its re- quirement for database users to have and maintain a high degree of global knowledge about the remote information unit semantics constitutes a very important limitation; this likely results in high user effort during information sharing and exchange.

4.G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases 123 relate database elements in different component databases, and to resolve conflicts on the semantics of database ele- ments. It is a uniform approach to the problems of (partial) schema integration, semantic heterogeneity resolution, and schema customization for the purpose of ensuring database interoperability in federated databases. It is based on the following key ideas. • The canonical data model employed in the federation and its constructs are important in making information unit semantics explicit. Fig. 5. Local and remote conceptual schemas • An incremental integration and semantic heterogeneity resolution process is utilized, wherein relationships be- tween local and imported remote information units are 3 Schema implantation and semantic evolution determined whenever enough knowledge about their se- mantics is acquired, therefore, recognizing the possibility 3.1 An example sharing scenario of incomplete, missing, and insufficient knowledge about information unit semantics in a federation. In this section, we present an example sharing scenario be- • A partial integration scheme is used, in which semantics tween two federation components which maintain related and context of local information units are dominant to information in their databases. Throughout the rest of this that of remote information units, thus enabling multiple paper, we will illustrate individual phases of our mechanism, semantics to co-exist in a federation. and new concepts we introduce by means of this example from the neuroscience domain. Figure 5 shows an example of a local component which The metadata folding problem can be decomposed into desires to expand its database with related remote infor- five key sub-problems: (1) information discovery: remote mation units. The local schema is managed by an object- database and the portion of the remote database that is relational DBMS. In the local database, information about of interest to local database needs should be located, (2) experiments and people performing the experiments are schema transformation: as component database systems may recorded, as well as information about research data. Each employ different metadata languages (data models) in or- experiment has a name, a start date, a description, and a der to model real-world entities, they need to be brought contact person. Each person has a name, a title, a phone into a common formalism, a canonical data model, so that number, and an address. The researchData relation keeps re- they can be compared, (3) semantic heterogeneity resolu- lated information on what experiment is performed on what tion: metadata, and data if necessary, should be analyzed subject. In addition, each relation has a unique identifier. in order to determine whether there exist relationships be- The remote component is interested in detailed analy- tween schema elements in different components, their real- sis of result data associated with experiments. As a result, world counterparts (their meanings) should be well under- the Physiology Data relation keeps record of different ex- stood, and conflicts in their meanings should be resolved, perimental data sets published in the literature. For each (4) schema/instance importation: remote data and metadata scientific data item, its authors, related neurons, and an- should be brought into and made accessible to the local notations are recorded. Moreover, two kinds of physiology environment, (5) schema/instance customization: while re- data are identified (indicated by arrows as sub-relations in specting the semantics of imported and local information Fig. 5). In addition to the attributes of Physiology Data, the units, the organization of imported and local information Time Series Data relation records number of traces, a set of units should be tuned so that imported information units fit trace identifiers, and data sets, while data points and scaling into the local context. Our approach specifically covers sub- information are kept in the Histogram Data relation. problems (2) through (5), while the information discovery At some point in time, the local database may be more in- problem is addressed elsewhere [17, 27]. volved with experimental results, e.g., different kinds of data In our example in Fig. 5, local database user/administrator sets obtained from experiments. The need here would then finds the relevant remote database and projects the re- be to extend the information content of the local database mote database portion that is of interest to the local ap- with the information content of the remote database. This plication. Both local and remote databases should be in a allows the access and manipulation of remote information common language, the canonical data model of the federa- units within/from the local database. The key process then is tion. Therefore, they are translated from extended relational to fold the remote conceptual schema into the local schema, model to the canonical data model. After local and remote which will take place within the local database context. database schemas are analyzed for potential conflicts, re- mote schema portions (Physiology Data, Histogram Data, Time Series Data) and corresponding instances are import- 3.2 Schema implantation and semantic evolution approach ed into the local database. The last step is to establish con- crete relationships between the local and imported remote Schema implantation and semantic evolution employs meta- schemas, e.g., where the Physiology Data and its subtables data implantation and stepwise evolution techniques to inter- belong within the local schema should be determined.

5.124 G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases 4 Heterogeneous Semantic Data Model (HSDM) as the are associated with primitive attributes (an attribute whose canonical data model domain class is primitive) in order to clarify their meanings. Every non-primitive object class in HSDM has a class We employ a semantically rich and expressive object-based key, and an instance key, which are both user-defined. The data model, called HSDM (Heterogeneous Semantic Data class key of an object class is a collection of attributes of Model), as the canonical data model in use federation-wide. that class whose existence conceptually distinguishes that HSDM supports traditional object-based notions such as ob- class from others within the local context. Class keys al- ject identity, classes, attributes, and a set of semantic primi- low determination of the most appropriate place a particular tives (classification, aggregation, and generalization). More- class belongs within the generalization hierarchy, and can be over, it introduces new constructs in order to represent se- interpreted as conceptual class identifiers in this regard. Indi- mantics within conceptual schemas. The purpose of HSDM vidual attributes appearing in a class key represent the most is to ease information sharing and exchange among feder- characteristic aspects of their classes, and help in finding an ation components by producing conceptual schemas where appropriate location for the class in the class hierarchy. Im- information unit semantics are easily understood. HSDM is mediate sub-roots of the “object” class have single-attribute specifically an extension of PDM (Personal Data Manager) class keys: the attribute that is unique across all classes. [25]. The class key of a sub-class consists of class key of its An HSDM database is a collection of objects and rela- super-class combined with either a unique attribute across tionships between objects. Each object has a name, a char- all classes (in the case of attribute-defined sub-classes) or acter string representation of the object, which functions a predicate value that does not overlap with predicate val- as an object identifier. An HSDM class is a collection of ues of all predicate-defined sub-classes defined on the same similar objects with common attributes. An HSDM concep- attribute (in the case of predicate-defined sub-classes). It is tual schema forms a generalization hierarchy representing the existence of class keys and complementary attributes sub-class–super-class relationships between classes, where (Kind, Unitofmeasure, Format, and Length) which enables class “Object” is the root. STRING, NUMBER, and their easy placement of remote classes into the local class hier- sub-classes (P STRING, N STRING, M STRING, CHAR- archy. An instance key is a collection of attributes whose ACTER, INTEGER, REAL) constitute the predefined prim- value distinguishes an instance from others within a class. itive classes, classes which do not have any user-defined HSDM contains a number of schema evolution primi- attributes, in the class hierarchy. Distinguishing four sub- tives that provide capabilities such as renaming an attribute classes of STRING (P STRING, N STRING, M STRING, (class), dropping an attribute (class), adding a new attribute CHARACTER) helps resolution of certain conflicts between to a class, combining two classes, and making a class a attributes. P STRING (pure string) class can contain val- super-class (sub-class) of another. These primitives are ex- ues which consist of a sequence of non-numeric characters. tensively used during restructuring the implanted remote N STRING (numerable string) represents values which con- conceptual schema during folding: once the relationship be- sist of a sequence of numeric characters, an optional sign tween a local and a remote class is determined, these primi- character (‘+’ or ‘−’), and an optional fixed point charac- tives enable placement of remote class (instances) into their ter (‘.’). An M STRING (mixed string) value is a sequence designated places in the local class hierarchy. of numeric and non-numeric characters, provided that it has HSDM supports a number of null values corresponding at least one numeric and one non-numeric character in it. to different kinds of uncertainties while modeling incom- Classes CHARACTER, INTEGER, and REAL correspond plete information. Initial null (nulli ) corresponds to the ini- to usual data types present in most data models. tial unknown state of a newly introduced complementary Three kinds of sub-class–super-class relationships are attribute. The don’t know null (null? ) corresponds to a miss- identified in HSDM. 1. Attribute defined: the sub-class has ing attribute value. Finally, inapplicable null (nullx ) signifies at least one attribute that is different than the ones inherited improper attachment of an attribute to a class. from its super-class and the ones defined by its siblings, 2. Predicate defined: the sub-class and its super-class have the same set of attributes; the values of one of the attributes 5 Phases of schema implantation and semantic evolution are used to distinguish the sub-class instances from that of super-class and that of its siblings, 3. User defined: mem- bership of sub-class instances cannot be determined without Figure 6 illustrates the schema implantation and semantic user involvement. We assume that in HSDM each class has evolution approach to the metadata folding problem. Remote at most one super-class. and local schemas, which are translated into HSDM, carry Attributes in HSDM represent significant aspects of substantive knowledge that makes information unit seman- classes. They can be categorized as the ones that exist in tics explicit. This is achieved by means of unique HSDM original schemas before we apply our technique (application concepts like class keys, instance keys, and complementary attributes), and others which are introduced during the appli- attributes. After implanting remote database elements into cation of our technique (complementary attributes). Unlike the local database, knowledge required to interrelate them application attributes, complementary attributes may be as- is acquired in a stepwise fashion during the semantic evo- sociated with both classes and other attributes. Descriptor at- lution phase. Acquired knowledge is used to obtain a new tributes are complementary attributes, which define an object local database schema that is more tightly coupled with im- class’s (attribute’s) meaning in natural language. Comple- planted remote schema elements than that of the previous mentary attributes Kind, Unitofmeasure, Format, and Length version of the local database schema.

6.G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases 125 Fig. 7. Semantic clarification Fig. 8. Local and remote conceptual schemas after semantic clarification Fig. 6. Our approach: schema implantation and semantic evolution any potential attribute which can serve as the (part of the) class key or instance key, complementary attributes are in- The schema implantation and semantic evolution tech- troduced for this purpose. User-defined sub-classes are con- nique consists of three phases: (a) semantic clarification, (b) verted to attribute-defined or predicate-defined sub-classes, schema implantation, and (c) semantic evolution. without loss of generality. Figure 8 shows example local and remote schemas after the semantic clarification phase. Class keys of classes are 5.1 Semantic clarification phase shown as attributes enclosed by boxes, while instance keys are underlined. In the local schema, existence of attribute The inability of most data models to represent information phone, the class key of person, conceptually distinguishes unit semantics in an explicit, comparable, and easily inter- the person class from all other classes. The specification of pretable manner necessitates explicit structures to maintain instance keys indicates the application’s point of view of the descriptions of and interrelationships between information real world. For example, the local database distinguishes ex- unit semantics that may reside in different databases. In or- periments by their name attribute values. Class keys and in- der to alleviate the need for such global structures, HSDM stance keys can contain complementary attributes, although provides special constructs that augment the semantics of such a need is not observed in our example. existing metadata (e.g., class keys, instance keys, descriptor attributes, the Kind attribute for primitive attributes). The semantic clarification phase (Fig. 7) transforms a 5.2 Schema implantation phase component conceptual schema from its native data model into HSDM (schema transformation), and augments it with The schema implantation phase (Fig. 9) loosely integrates additional information (semantic enrichment). Activities in local and remote conceptual schema elements. It consists of this phase are performed for each component only once when two sub-phases: superimposing and hypothesis specification. the component enters into the federation, or when it de- During superimposing (Fig. 10), remote schema elements cides to share and exchange information units with other are imported and super-imposed onto the local schema. Im- components. In this process, descriptor attributes are created mediate sub-trees of the remote schema are added as new and attached to each class and each attribute. For primi- immediate sub-trees of class “Object” in the local schema. tive attributes, complementary attributes Kind, Unitofmea- Predefined primitive classes constitute the directly sharable sure, Format, and Length are created in order to clarify their (agreed on) portion of conceptual schemas in the federation, meaning. For each user-defined class in the class hierarchy, and do not introduce any problems during the integration. a class key and an instance key are determined with the User-defined primitive classes of local and remote databases, help of a domain expert. In cases where there does not exist on the other hand, are interrelated by domain experts. Indi-

7.126 G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases Fig. 12. Local schema implanted with remote schema portion Fig. 9. Schema implantation lence assertion, which specifies interobject relationships be- tween local and remote class instances. The associated local class and associated remote class of a harmonizer are termed semantic peers. Harmonizer construction activity in the hypothesis speci- fication phase is limited to the immediate sub-roots of the lo- cal conceptual schema implanted with the remote conceptual schema. Semantic peers are bound to each other in the form of a harmonizer for a common goal: to establish a semantic relationship between them. In order to realize this goal, we must establish that they can complement each other’s struc- tural, behavioral, and semantic aspects: with the construction of a harmonizer, the local class definition is expanded with attributes that exist in the remote class definition but not in the local class definition. Similarly, the remote class defini- tion is expanded with attributes that exist in the local class definition but not in the remote class definition. Values for these newly introduced attributes are accumulated during the next phase. Fig. 10. Superimposing sub-phase With regard to our example, Fig. 12 shows the local schema implanted with the remote schema elements. (At- tribute domains and details are not shown for simplicity.) A vidual remote instances are also imported and made available harmonizer, Harmonizer1 , is constructed between the local within the local database during this sub-phase. class researchData and the remote class Physiology Data. The second sub-phase (Fig. 11) is termed hypothesis The structure of Harmonizer1 is shown in Fig. 13. Here, we specification; here, a number of hypothetical relevances (har- see how complementary attributes are introduced by seman- monizers) are specified between local and remote classes by tic peers on each other to complement each other’s struc- means of harmonizers. A harmonizer is a persistent structure tural and semantic aspects. In Fig. 13, Harmonizer1 is built that associates a local abstract class with a remote abstract as part of the schema implantation phase in order to investi- class for the purpose of investigating whether or not they gate the relevance of the associated local class researchData are semantically related (e.g., equivalence, sub-class, super- and the associated remote class Physiology Data. While no class, overlapping, or distinct). Each harmonizer has a name, attribute equivalences are specified for this harmonizer1 , an an associated local class, an associated remote class, attribute object equivalence assertion depends on the instance key of equivalence assertions, known equivalences between the at- the associated local class. According to this assertion, an in- tributes of local and remote classes, and an object equiva- stance of researchData class will be considered equivalent to another instance of Physiology Data class if the value of at- tribute experiment of the local instance is equal to the value of attribute experiment of the remote instance. After Harmonizer1 is constructed, the local class re- searchData is added with complementary attributes, authors, neurons, and annotations, which originally belong to the re- mote class Physiology Data. Accordingly, the remote class Physiology Data is complemented with complementary at- tributes, subject and experiment, which originally belong to Fig. 11. Hypothesis specification sub-phase the local researchData.

8.G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases 127 Fig. 13. The structure of Harmonizer1 Fig. 15. Acquisition of knowledge about remote class instances explicit (e.g., attribute definitions, descriptor attributes, lo- cation of the attribute’s class within the class hierarchy), and are shipped to the remote database system (Fig. 15). Based on the semantic information about attributes, the remote domain expert investigates if the remote class defi- nition can be expanded with these new attributes by trying to provide meaningful attribute values for the characteris- tic sub-set of the remote class, and ships back these new values to the local database. New attribute values are cho- sen among permissible values in the domain classes of at- tributes or among different null values. For example, it is easy to choose values for a primitive attribute (an attribute whose domain class is primitive such as REAL, INTEGER, etc.), since primitive classes constitute the common part of Fig. 14. Semantic evolution each and every component database schema in the federa- tion. However, an interclass attribute (an attribute between two user-defined classes) cannot always be given meaning- According to the implicit contract between research- ful values from the domain class of the attribute. This is Data and Physiology Data as the consequence of building because the domain class of the attribute is defined in one Harmonizer1 , (1) researchData class will supply meaning- context (local database) and may not be well known to the ful values for the complementary attributes borrowed from other context (remote database). The value of such an at- the remote class Physiology Data (authors, neurons, and an- tribute can be chosen among different kinds of null values notations), and (2) Physiology Data will supply meaningful depending on whether or not the attribute makes sense for values for complementary attributes borrowed from the local the specific instances in the characteristic sub-set. class researchData (subject and experiment). Figure 16 shows the possible states of a harmonizer dur- ing the testing period. A harmonizer enters into the initial state when it is first created. The harmonizer sends local 5.3 Semantic evolution phase and remote requests to domain experts requesting additional knowledge about the associated local and associated remote The semantic evolution phase (Fig. 14) investigates whether classes of the harmonizer (complementary attribute values previously hypothesized relevances hold, configures harmo- which were introduced by the semantic peers on each other). nizers for different local-class–remote-class combinations, In the initial state, the harmonizer waits for this additional and activates schema evolution primitives depending on the knowledge to arrive. Either the local knowledge or the re- outcome of these investigations. With the construction of a mote knowledge may arrive first. If the local knowledge ar- harmonizer, the testing period starts for the harmonizer, dur- rives first, the harmonizer jumps to the WR state (in Fig. 16), ing which information is accumulated for attributes imposed where it waits for the remote knowledge to arrive next. Sim- by local and remote classes on each other. ilarly, the WL state is reached when the harmonizer is in the A sub-set of instances of a class that can be treated as initial state, and it receives the remote knowledge. In both representative for all instances is called a characteristic sub- cases, the harmonizer state changes to RE. In the RE state, set of that class. A characteristic sub-set of a class includes the harmonizer is ready to be evaluated, since it has already at least one instance from each of its sub-classes. During acquired both the local and the remote knowledge in order the testing period, local domain expertise tries to provide to determine the relevance of the semantic peers. Receiving meaningful values for the attributes imposed by the remote the evaluate directive, it determines the relationship between class. Values are supplied for only the characteristic sub-set semantic peers, and after a final user verification, it activates of the local class, not for all instances. Similarly, when a schema evolution primitives to establish the relationship on harmonizer is constructed, attributes that exist in the local the conceptual schema. The state F is the final state, where class definition but not in the remote class definition are this harmonizer is not needed any more, and can be deleted packed along with information that makes their semantics or archived.

9.128 G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases Fig. 16. State transition diagram for harmonizers Fig. 18. Final local conceptual schema after semantic evolution the remote class. The resulting conceptual schema is shown Fig. 17. Possible states of a harmonizer and consequent actions after acqui- in Fig. 18. sition of necessary knowledge about semantics of local and remote classes 6 Implementation A harmonizer is said to reach the equilibrium state when Implementation of a full-fledged mechanism which realizes it is able to suggest a relationship between semantic peers. our approach requires two components: (1) translators that Based on the collected attribute values, a decision is made transform database schemas from various data models to regarding whether newly introduced remote (local) attributes HSDM, and (2) a database tool that manages the interac- make sense for the local (remote) class. For example, if the tions between HSDM-managed databases. As there have characteristic sub-set of the remote class has meaningful val- been many efforts in the first area, we have focused on ues for the class key attributes of the local object class, then developing a tool which realizes the schema implantation it will be deduced that the remote class instances can sat- and semantic evolution approach in a federation where all isfy the condition to be regarded as the instances of the the component database systems employ HSDM as their data local class. Figure 17 shows possible states, decisions made model. This prototype database tool is named the HSDM Me- and consequent actions. As a result of this analysis, new diator, and it was tested with examples from neuroscience harmonizers may be formed, existing ones may be propa- domain within the context of USC Brain Project. gated down into the class hierarchy, or individual schema The prototype HSDM Mediator was built in two consec- evolution primitives may be activated to build explicit rela- utive parts. As HSDM is an extension of PDM [25], first, tionships between these semantic peers. we have implemented a database tool which allows mod- In our example, in order to test the relevance of the lo- eling, representation, and management of information units cal and remote classes researchData and Physiology Data using the PDM data model. Second, we have implemented we have built a harmonizer between these two classes. Dur- the HSDM Mediator itself. ing the testing period of this harmonizer, the remote domain expert tries to supply values to subject and experiment at- tributes of the local class for the characteristic sub-set of 6.1 PDM implementation the remote class. The local domain expert provides values to authors, neurons, and annotations attributes of the remote We have implemented a simple, easy-to-use, object-based class for the characteristic sub-set of the local class. While DBMS, called PDM, on top of ObjectStore [32]. PDM is the remote domain expert succeeds in providing meaning- based on a simple semantic data model proposed by McLeod ful values for the attributes subject and experiment, being and Lyngbaek [25]. It has a friendly user interface, in which unable to find meaningful remote class attribute values for users can browse the conceptual schema, kind1 definitions, some instances in the characteristic sub-set of the local class, and individual instances. It also supports individual data the local domain expert provides inapplicable nulls for these manipulation operations such as insert, modify, and delete. attribute values. Therefore, evaluation of the harmonizer re- sults in a super-class relationship between the local class and 1 A class in object-oriented data models is called an object kind in PDM

10.G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases 129 PDM supports a basic naming scheme, according to which every object in the database has a character string representa- tion for identification purposes. This naming scheme can be seen as a primitive form of object identifier in object-based data models. Users can browse/create/manipulate PDM con- ceptual schemas and PDM databases by means of an im- portant construct, namely Working Kind. A Working Kind is like a cursor in relational DBMSs. It can be bound to a number of instances of an object kind. Most of the operations work relative to the Working Kind. PDM software was written in C++ with embedded Ob- jectStore [32] calls for database-related functionality, and embedded “curses” library function calls [38] for user- interface-related functionality. We have used ObjectStore as the storage sub-system in implementing PDM. Fig. 19. Implementation architecture 6.2 Implementation of the HSDM Mediator The HSDM Mediator has two functions within a federation. First, it functions as a database modeling tool by enabling information units to be modeled/maintained in an HSDM database. Second, it allows HSDM databases to interoper- ate with each other under the principles of our approach. Like the PDM software, the HSDM Mediator software has Fig. 20. Functionality of the HSDM mediator been written in C++ with embedded ObjectStore statements for database-related functionality, and embedded “curses” li- brary function calls for user-interface-related functionality. ity. The Mediator’s functionality is divided into five cate- We have used ObjectStore as the storage sub-system in im- gories corresponding to five sub-menus in the user interface: plementing the HSDM Mediator also. Database, Browse Schema, Working Kind, Manipulate DB, In order to implement a prototype tool for HSDM, which Interoperate (Fig. 20). also realizes our methodology, we have customized the PDM In Fig. 20, the Database sub-menu contains menu items software. This has been achieved by implementing the ca- which operate on databases, such as create, open, and status. pabilities HSDM supports but PDM lacks. In particular, we The Browse Schema sub-menu enables users to browse the implemented the following capabilities of HSDM, and inte- class hierarchy, as well as to browse class definitions. In- grated it with the already existing PDM software. PDM does dentation, highlighting, and underlining techniques are used not offer any means to clarify schema semantics. HSDM, in to display the class hierarchy on the screen. The third sub- contrast, allows specification of complementary attributes, menu in the user interface of the HSDM Mediator is the class keys, and instance keys. Therefore, we have written Working Kind sub-menu. This sub-menu enables manipula- code that enables classes to have instance and class keys, tion and retrieval of the Working Kind. The Manipulate DB and that allows attachment of complementary attributes to sub-menu includes operations that change the database state classes and attributes. For explicit information unit seman- in a permanent manner such as creating a kind, creating an tics, we have implemented attribute-defined and predicate- instance, or modifying an instance. defined sub-class mechanisms. Code required for interop- The Interoperate sub-menu (Fig. 21) enables HSDM- eration purposes such as for importing remote classes and managed components to interoperate. The Implant DB is instances, and code for harmonizer construction and main- used to implant remote databases into a local database. The tenance has been produced, and added into the software. operation implants a remote database into the local database Finally, schema evolution primitives we need to restructure conceptual schemas were added to the HSDM Mediator soft- ware. Figure 19 shows the implementation architecture of our approach to the metadata folding problem, where the local component communicates with the remote component via the “Implant DB” operation and via requests for comple- mentary information. The remote component provides re- mote data and metadata when “Implant DB” operation is invoked. It provides complementary attribute values for a particular harmonizer on request. We now discuss the functionality of the HSDM Media- tor prototype by describing individual operations it provides. We specifically focus on operations related to interoperabil- Fig. 21. The interoperate sub-menu

11.130 G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases Fig. 22. Possible solutions to the metadata folding problem by importing the class hierarchy, kinds, and instances of the be evaluated” state, the Eval. Harmonizer operation is ac- specified remote database. The All Remote Kinds operation tivated. As a result, accumulated information is analyzed, is necessary for displaying imported remote kinds that were and a suggestion is made regarding the possible relationship not tied with the local class hierarchy yet. The databases between the associated local kind and the associated remote from which these remote kinds originate, and any harmo- kind of this harmonizer. After the user is prompted with the nizers in which they are involved are also displayed. consequences of the suggested action, he/she is expected to The remaining operations are for harmonizer processing. confirm this suggestion. When the user confirms a suggested Using the Create Harmonizer operation, users specify hypo- action, individual schema evolution primitives are activated thetical relevances, harmonizers, between local and imported to establish the suggested relationship between the associ- remote kinds. All Harmonizers displays all the harmonizers ated local and remote kinds of the harmonizer. under investigation. Harmonizer Status, on the other hand, displays information about a specific harmonizer such as its name, associated local kind, associated remote kind, and its 7 Discussion, conclusions and research directions status (waiting for local and remote data, waiting for local data, waiting for remote data, ready to be evaluated, and In this paper, we have described an approach to the metadata evaluated). Harmonizers are persistent structures in our pro- folding problem, which emphasizes incremental acquisition totype. They should be disposed of or archived when they of knowledge required to fold a remote conceptual schema complete performing their functions (e.g., after they are eval- onto a local one for the sake of information sharing and ex- uated); Delete Harmonizer is used for this purpose. change. One observation we have made is the difficulty of The Harm. Enter Data operation enables the local do- maintaining and/or agreeing on global structures in a fed- main expert to enter complementary attribute values for the erated database system environment. Consequently, we do characteristic sub-set of the local kind. Using the RHarm. not assume any global entities in our approach, except for Enter Data operation, the remote domain expert provides HSDM. complementary attribute values for the characteristic sub-set We employ hypothetical processing (hypothesizing that of the remote kind. Supplied values are sent back to the two classes are related via equivalence, specialization, gener- local component. When a harmonizer is in the “ready to alization, overlapping, or irrelevance, and endeavor to prove that such a hypothesis holds for a small, but very typical

12.G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases 131 sub-set of instances) while investigating whether local and 5. Bayardo R, Bohrer W, Brice R, Cichocki A, Flowler G, Helal A, remote classes have the ability to complement each other’s Kashyap V, Ksiezyk T, Martin G, Nodine M, Rashid M, Rusinkiewicz semantic and structural aspects. This is in order to form a M, Shea R, Unnikrishnan C, Unruh A, Woelk D (1997) InfoSleuth: Semantic Integration of Information in Open and Dynamic Environ- relationship between them in the local class hierarchy, there- ments. In: Peckham J (ed) Proceedings of ACM SIGMOD International fore incrementally placing remote conceptual schema ele- Conference on Management of Data, 1997, Tucson, Arizona. 26(2): ments into the local class hierarchy where they make sense. 195–206 When considered within the context of the metadata fold- 6. Bresson S, Goh CH, Fynn K, Jakobisiak M, Hussein K, Kon HB, Lee ing problem, each approach to the metadata folding prob- T, Madnick SE, Pena T, Qu J, Shum AW, Siegel M (1997) The Context lem has its pros and cons, which are summarized in Fig. 22. Interchange Mediator Prototype. ACM SIGMOD Rec 26: 525–527 7. Chen ALP, Koh JL, Kuo TCT, Liu CC (1995) Schema integration For example, the global schema approach requires too much and query processing for multiple object databases. Integrated Comput global knowledge from federation users both during inte- Aided Eng (Special Issue on Multidatabase and Interoperable Systems) gration and during actual sharing and exchange. It requires 2(1): 21–34 huge amounts of space to store the global schema, and the 8. Czejdo B, Rusinkiewicz M, Embley D (1987) An approach to schema initial integration effort that has to be spent is very costly and integration and query formulation in federated database systems. In: prohibitive. Furthermore, semantics and interrelationships of Proceedings of the 3rd IEEE Conference on Data Engineering, 1987, Los Angeles, CA. IEEE Comp Society, pp 477–484 schema elements are not dynamic because of the fixed global 9. Dayal U, Hwang H (1984) View definition and generalization for schema. Nevertheless, exchange effort is very low, since ev- database integration in a multidatabase system. IEEE Trans Software ery possible sharing pattern is fixed and obvious in the form Eng 10(6): 628–644 of the global schema. The schema implantation and seman- 10. Elmasri R, Navathe S (1984) Object integration in logical database tic evolution approach on the other hand requires minimal design. In: Proceedings of IEEE Computer Society 1st International global knowledge, minimal global structures, and minimal Conference on Data Engineering, 1984, Los Angeles, CA. IEEE Comp Society, pp 426–433 initial integration effort. Interrelationships of schema ele- 11. Geller J, Perl Y, Neuhold EJ (1991) Structure and semantics in object- ments in different components are highly dynamic since they oriented database class specifications. ACM SIGMOD Rec 20(4): 40– are not tightly bounded. However, it necessitates moderate 43 exchange effort to be spent because of the need to acquire 12. Hammer J, McLeod D (1993) An approach to resolving semantic het- additional attribute values for classes that are hypothesized erogeneity in a federation of autonomous, heterogeneous database sys- to be related. tems. Int J Intelligent Coop Inf Syst 2(1): 51–83 13. Hammer J, McLeod D, Si A (1994) Object discovery and unification The ultimate success of our approach, and of any ap- in a federated database system. Technical Report USC-CS. Computer proach claiming to provide a solution to the metadata folding Science Department, University of Southern California, Los Angeles, problem, depends on the amount of user interaction/consultation Calif. required. Acquiring additional information for only a char- 14. Heimbigner D, McLeod D (1985) A federated architecture for infor- acteristic sub-set of a class in our approach is a direct result mation management. ACM Trans Off Inf Syst 3(3): 253–278 of this concern. One cause for still too much user interac- 15. Huhns M, Jacobs N, Ksiezyk T, Shen W, Singh M, Cannata P (1992) Enterprise Information Modeling and Model Integration in Carnot. In: tion in our scheme is choosing harmonizers in a way that Petrie CJ jr (ed) Enterprise Integration Modeling: Proceedings of the results in irrelevance. Currently, our approach depends on First International Conference, 1992. MIT Press, Cambridge, Mass. user intuition in building initial harmonizers. It would be 16. Jacobs N, Shea R (1996) The Role of Java in InfoSleuth: Agent-based worthwhile to consider building a mechanism that analyzes Exploitation of Heterogeneous Information Resources. In: IntraNet96 data and metadata of remote and local databases, and sug- Java Developers Conference, April 1996 gests potential harmonizers to users. This would greatly re- 17. Kahng J, McLeod D (1996) Dynamic classificational ontologies for discovery in cooperative federated databases. In: Proceedings of the duce required user input, since it will reduce the number of 1st International Conference on Cooperative Information Systems, June harmonizers to be built in order to integrate the schemas. 1996, Brussels, Belgium. IEEE-CS Press Another shortcoming of our approach is that we depend on 18. Kaul M, Drosten K, Neuhold EJ (1990) Viewsystem: Integrating het- structural properties of a class definition while investigating erogeneous information bases by object-oriented views. In: Proceed- its relevance to a remote class. An extension which consid- ings of International Conference on Data Engineering 6, 1990, Los ers behavioral properties (methods) of class definitions as Angeles, CA. IEEE Comp Society, pp 2–10 19. Kim W, Choi I, Gala S, Scheevel M (1993) On resolving schematic well would contribute additional power. Still another inter- heterogeneity in multidatabase systems. Distrib Parallel Databases 1(3): esting extension would be to study implications of multiple 251–279 inheritance within this framework. 20. Koh JL, Chen ALP (1993) Integration of heterogeneous object schemas. In: Elmasri R, Kouramojian V, Thalheim B (eds) Proceedings of the 12th International Conference on Entity Relationship Approach, 1993, Lecture notes in CS, Vol 823, Springer, pp 297–314 References 21. Krishnamurthy R, Litwin W, Kent W (1991) Language features for IEEE interoperability of databases with schematic discrepancies. In: 1. Abiteboul S, Bonner A (1991) Objects and views. In: Clifford J, King Clifford J, King R (eds) Proceedings of ACM SIGMOD International R (eds) Proceedings of ACM SIGMOD, 1991, Rec 20(2): 238–247 Conference on Management of Data, 1991, Denver Colo. SIGMOD 2. Arens Y, Knoblock CA, Hsu C (1996) Query processing in the SIMS Record 20(2): 40–49 Information Mediator. In: Tate A (ed) Advanced Planning Technology. 22. Larson J, Navathe SB, Elmasri R (1989) A theory of attribute equiva- AAAI Press, Menlo Park, Calif. lence in databases with application to schema integration. IEEE Trans 3. Arens Y, Knoblock CA, Shen W (1996) Query Reformulation for Dy- Software Eng 15(4): 449–463 namic Information Integration. J Intelligent Inf Syst 6(2/3): 99–130 23. Litwin W, Abdellatif A (1986) Multidatabase interoperability. IEEE 4. Batini C, Lenzerini M, Navathe S (1986) A comparative analysis of Comput 19(12): 10–18 methodologies for database schema integration. ACM Comput Surv 24. Litwin W, Mark L, Roussopoulos N (1990) Interoperability of multiple 18(4): 323–364 autonomous databases. ACM Comput Surv 22(3): 267–293

13.132 G. Aslan, D. McLeod: Semantic heterogeneity resolution in federated databases 25. Lyngbaek P, McLeod D (1984) A personal data manager. In: Dayal U, 36. Siegel M, Madnick SE (1991) A metadata approach to resolving se- Schlageter G, Seng LH (eds) Proceedings of the International Confer- mantic conflicts. In: Lohman GM, Sernadas A, Camps R (eds) Pro- ence on Very Large Data Bases, 1984, Singapore Morgan Kaufmann, ceedings of the International Conference on Very Large Data Bases, pp 14–25 1991, Barcelona, Catalonia, Spain. Morgan Kaufman, pp 133–145 26. McLeod D (1991) The identification and resolution of semantic het- 37. Siegel M, Madnick SE (1991) Context Interchange: sharing the mean- erogeneity in multidatabase systems. In: International Workshop on ing of data. ACM SIGMOD Rec 20(4): 77–78 Interoperability in Multidatabase Systems, 1991, Kyoto, Japan 38. Strong J (1986) Programming with curses. O’Reilly & Associates, Se- 27. McLeod D, Si A (1995) The design and experimental evaluation of bastopol, Calif. an information discovery mechanism for networks of autonomous 39. Tomlinson C, Lavender G, Meredith G, Woelk D, Cannata P (1992) database systems. In: Yu PS, Chen ALP (eds) Proceedings of IEEE The Carnot Extensible Services Switch (ESS) -Support for Service International Conference on Data Engineering, 1995, Taipei, Taiwan. Execution. In: Petrie CJ jr (ed) Enterprise Integration Modeling: Pro- IEEE Comp Society, pp 15–24 ceedings of the First International Conference, 1992. MIT Press, Cam- 28. Mena E, Kashyap V, Illarramendi A, Sheth A (1996) Managing Mul- bridge, Mass. tiple Information Sources through Ontologies: Relationship between 40. Tsai PSM, Chen ALP (1994) Concept hierarchies for database inte- Vocabulary Heterogeneity and Loss of Information. In: Baader F, Buch- gration in a multidatabase system. In: 6th International Conference on heit M, Jeusfeld MA, Nutt W (eds) Proceedings of the Third Workshop Management of Data, 1994, Bangalore, India Knowledge Representation Meets Databases (KRDB), 1996, Budapest, 41. Wiederhold G (1994) Interoperation, mediation and ontologies. In: Hungary. CEUR Workshop Proceedings No. 4 Workshop on Heterogeneous Knowledge-Bases, 1994. W3, pp 33–48 29. Mena E, Kashyap V, Sheth A, Illarramendi A (1996) OBSERVER: An 42. Woelk D, Shen W, Huhns M, Cannata P (1992) Model-Driven En- Approach for Query Processing in Global Information Systems based terprise Information Management in Carnot. In: Petrie CJ jr (ed) En- on Interoperation across Pre-existing Ontologies. In: Proceedings of terprise Integration Modeling: Proceedings of the First International the First IFCIS International Conference on Cooperative Information Conference, 1992. MIT Press, Cambridge, Mass. Systems (CoopIS), 1996, Brussels, Belgium. IEEE-CS Press, pp 14–25 43. Woelk D, Cannata R, Huhns M, Shen W, Tomlinson C (1993) Using 30. Missier R, Rusinkiewicz M (1995) Extending a Multidatabase Manipu- Carnot for Enterprise Information Integration. In: Second International lation Language to Resolve Schema and Data Conflicts. In: Meersman Conference on Parallel and Distributed Information Systems, January R, Mark L (eds) Proceedings of the Sixth IFIP TC-2 Working Confer- 1993, San Diego, Calif. IEEE-CS, pp 133–136 ence on Database Semantics (DS)-6, 1995, Stone Mountain, Atlanta, 44. Woelk D (1994) Carnot Intelligent Agents and Digital Libraries. In: Georgia, USA, pp 93–115 Proceedings of the First Annual Conference on the Theory and Practice 31. Motro A (1987) Superviews: Virtual integration of multiple databases. of Digital Libraries, June 1994 IEEE Trans Software Eng 13(7):785–798 45. Woelk D, Tomlinson C (1994) The InfoSleuth Project: Intelligent 32. Object Design Inc (1993) ObjectStore User Guide: Library Interface, Search Management via Semantic Agents. In: Second International Release 3.0. Object Design Inc. World Wide Web Conference, October 1994, Chicago, USA 33. Sciore E, Siegel M, Rosenthal A (1992) Context Interchange using 46. Woelk D, Tomlinson C (1995) InfoSleuth: Networked Exploitation meta-attributes. In: Finin TW, Nichola CK, Yesha Y (eds) First In- of Information Using Semantic Agents. In: COMPCON Conference, ternational Conference on Information and Knowledge Management, March 1995, San Francisco, Calif. IEEE-CS, pp 147–152 1992, Lecture notes in CS, Vol 752; Springer, Baltimore, Md, pp 377– 47. Woelk D, Tomlinson C (1995) Carnot and InfoSleuth: Database Tech- 386 nology and the World Wide Web. In: Carey MJ, Schneider DA (eds) 34. Sciore E, Siegel M, Rosenthal A (1990) Using semantic salues to facil- ACM SIGMOD Int. Conference on the Management of Data, May itate interoperability among heterogeneous information systems. ACM 1995, San Jose, Calif., SIGMOD Record 24(2):443–444 Comput Surv 22(3): 183–236 48. Woelk D, Huhns M, Tomlinson C (1995) InfoSleuth Agents: The Next 35. Sheth A, Larson J (1990) Federated database systems for managing Generation of Active Objects. MCC Technical Report INSL-054-95. distributed, heterogeneous, and autonomous databases. ACM Comput MCC, Austin, Tex. Surv 22(3): 183–236