HiFi: A Unified Architecture for High Fan-in Systems

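Advances in data acquisition and sensor technologies are leading to the development of "High Fan-in" architectures: widely distributed systems whose edges consist of numerous receptors such as sensor networks and RFID readers, and whose interior nodes consist of traditional host computers organized using the principle of successive aggregation. Such architectures pose significant new data management challenges. The HiFi system, under development at UC Berkeley, aims to address these challenges. We demonstrate an initial prototype of HiFi that uses data stream query processing to acquire, filter, and aggregate data from multiple devices, including sensor motes, RFID readers, and low-power gateways organized as a High Fan-in system.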

(System Demonstration)

Owen Cooper*, Anil Edakkunni*, Michael J. Franklin*, Wei Hong+, Shawn R. Jeffery*, Sailesh Krishnamurthy*, Fredrick Reiss*, Shariq Rizvi*, and Eugene Wu*

*EECS Dept., UC Berkeley
+Intel Research Berkeley

* This work was funded in part by NSF under ITR grants IIS-0086057 and SI-0122599, by the IBM Faculty Partnership Award program, and by research funds from Intel, Microsoft, and the UC MICRO program.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004

Abstract

Advances in data acquisition and sensor technologies are leading towards the development of “High Fan-in” architectures: widely distributed systems whose edges consist of numerous receptors such as sensor networks and RFID readers and whose interior nodes consist of traditional host computers organized using the principle of successive aggregation. Such architectures pose significant new data management challenges. The HiFi system, under development at UC Berkeley, is aimed at addressing these challenges. We demonstrate an initial prototype of HiFi that uses data stream query processing to acquire, filter, and aggregate data from multiple devices including sensor motes, RFID readers, and low power gateways organized as a High Fan-in system.

1. Introduction

Emerging wireless sensor networks and RFID technologies are on a fast track to widespread deployment in applications such as environmental monitoring, asset tracking, telemetry-based remote monitoring, and real-time supply chain management. Driven by both technological and market forces, sensor and RFID-based deployments raise the promise of taking computing from its current, user-driven mode, to one of direct and continuous interaction with the physical world.

1.1 High Fan-in Systems

In many cases, sensors and readers will serve as the receptors at the edges of widely distributed systems. For example, in a supply chain management deployment, collections of sensors and RFID readers on individual store shelves (in a retail scenario) or dock doors (in a warehouse/manufacturing scenario) continuously collect readings. These readings include “beeps” from low-function passive RFID tags (indicating the presence of particular tagged objects, such as cases of goods), as well as more content-rich information from smart sensors and higher-function tags (such as temperature readings, shipping histories, etc.).

These “edge” devices produce data that will be aggregated locally with data from other nearby devices. That data will be further aggregated within a larger area, and so on. This arrangement results in a distinctive bowtie topology we refer to as a High Fan-in system (see Figure 1).

A sophisticated system such as one supporting a nation-wide supply chain application may consist of vast numbers of widely dispersed receptor devices (i.e., many thousands or more depending on the technology) and many levels of successively wider-scoped aggregation and storage. Such systems will comprise a vast array of heterogeneous resources, including inexpensive tags, wired and wireless sensing devices, low-power compute nodes and PDAs, and computers ranging from laptops to the largest mainframes and clusters.

Figure 1 - A High Fan-in Architecture
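The successive aggregation described in Section 1.1 can be illustrated with a small sketch. This is not HiFi code: HiFi expresses such logic as stream queries, whereas the Python below only mimics the shape of the computation. The reading format `(region, store, shelf, tag_id)`, the function names `aggregate_shelves` and `roll_up`, and the sample data are all assumptions made for illustration. Repeated "beeps" for the same tag are collapsed at the edge, and each interior level sums the counts handed up from the level below.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical reading format (an assumption, not from the paper):
# (region, store, shelf, tag_id)
Reading = Tuple[str, str, str, str]


def aggregate_shelves(readings: Iterable[Reading]) -> Dict[tuple, int]:
    """Edge level: count distinct tags per shelf, collapsing repeated beeps."""
    seen = set(readings)  # a tag read many times on a shelf counts once
    return dict(Counter((region, store, shelf) for region, store, shelf, _ in seen))


def roll_up(counts: Dict[tuple, int]) -> Dict[tuple, int]:
    """Interior level: drop the last key component and sum the counts."""
    totals: Counter = Counter()
    for key, n in counts.items():
        totals[key[:-1]] += n
    return dict(totals)


# Illustrative data only.
readings = [
    ("west", "store1", "shelfA", "tag1"),
    ("west", "store1", "shelfA", "tag1"),  # repeated beep from the same tag
    ("west", "store1", "shelfA", "tag2"),
    ("west", "store1", "shelfB", "tag3"),
    ("west", "store2", "shelfA", "tag4"),
    ("east", "store3", "shelfA", "tag5"),
]

per_shelf = aggregate_shelves(readings)  # shelf-level summaries
per_store = roll_up(per_shelf)           # store-level summaries
per_region = roll_up(per_store)          # region-level summaries
```

Each level forwards only a summary of what it received, so data volume shrinks as it moves toward the interior; this is the fan-in property the section describes.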

1.2 Query Processing: A Unifying Framework

Traditionally, sensor-based information systems have been deployed using a piecemeal approach: a sensor-specific programming environment is used to task the edge receptors, a separate transport or information bus is used to route the sensor readings, and a database system or other data manager is used to collect and process the sensor readings. As a result, sensor application deployments have tended to be costly, difficult, and inflexible.

In contrast, our work is based on the notion that database techniques in general and data stream query processing in particular have a role to play at all levels of large-scale High Fan-in systems. In fact, we propose to use stream query processing and views as a unifying framework for data access across an entire High Fan-in environment. Thus, while it is certainly the case that the various types of sensor devices and computing platforms present in such systems all have their own unique characteristics and idiosyncrasies, our design uses stream query processing as the glue that binds these disparate pieces together to become a highly-functional application deployment platform.

Building on earlier work at Berkeley on both adaptive data stream processing (TelegraphCQ [CCDF+03]) and sensor network databases (TinyDB [MFHH03]) we have developed a new multi-level system for continuous and historical data processing in large-scale, sensor-rich applications. Our system, called HiFi, consists of host-centric processing components and device-centric processing components that can interoperate in a seamless manner. As such, HiFi runs across a gamut of platforms, ranging from battery powered wireless sensor motes and RFID readers, to mid-tier Linux-based sensor platforms and PDAs, to plugged-in, diskful servers.

2. HiFi Challenges and Architecture

While building on the growing body of work in the areas of data stream processing, sensor network databases, and data integration, the design of HiFi also addresses a number of challenges that arise from the unique properties of High Fan-in architectures and the applications they support.

2.1 The Challenges of Scale

From a data management perspective, the most challenging new aspect that High Fan-in systems bring to the table is the tremendous range they span in terms of three key characteristics: time, space, and resources.

Time – Timescales of interest in a High Fan-in system can range from seconds or less at the edges, to years in the interior of the system. At the edges of a High Fan-in system are the receptor devices that repeatedly measure some aspect of the physical world (or perhaps the virtual world, say in the case of a network monitor or other logical sensor). These devices are typically concerned with fairly short time scales, perhaps on the order of seconds or less. As one moves away from the edges, the timescales of interest increase.

For example, in a retail RFID scenario, individual readers on shelves may read several times a second, while the manager of a store may be concerned with how sales of particular items are going over the course of a morning, and planners at regional and corporate centers may be more concerned with longer-term sales trends over a season or several seasons.

Space – As with time, the area of geographic interest grows significantly as one moves from the edges of a High Fan-in system to the interior. Again using the retail RFID scenario, individual readers are concerned with a space of a few square meters, aggregation points within the store would be concerned with entire departments or perhaps the store as a whole, and regional and national centers are concerned with those much larger geographical areas.

Resources – Finally, the range of computing resources available at various levels of a High Fan-in system also varies dramatically, from small, cheap sensor motes (e.g., Berkeley Motes, see Figure 2) on the edges up to the largest mainframes in the interior of the system. Communication resources also can range from low power, lossy radios at the edges, to dedicated high-speed fiber in the interior.

Figure 2 - Berkeley Mica Wireless Sensor Mote

The key architectural principle on which the design of HiFi rests is the use of declarative query processing to provide uniform data access across all of these various scales. As with other data-intensive environments, the idea is to use the declarative approach to shield application developers from the complexities of the underlying platform, while enabling the system to optimize and efficiently execute data access and processing operations.

2.2 Cascading Stream Processing

HiFi is based on a notion of cascading stream processing, in which data streams collected at the very edges of the network are continually filtered, refined, aggregated and brought in towards the interior of the network. The multi-platform nature of the system enables seamless transitions between sensor-like devices, mid-tier single-board sensor aggregators, and host computers.

Each RFID reader or sensor network access point in a HiFi network is a data stream generation machine at the edge of a large (potentially global) information system. While the volume of data from any single stream is likely to be modest, the low cost and eventual ubiquity of such devices lead to a torrent of data in aggregate. Furthermore, because these devices are embedded in the physical world, they are geographically proximate to the entities or spaces they are intended to be monitoring. As such, the processing of the data streaming from these devices will exhibit highly-localized patterns at many granularities. For example, an RFID reader on a store shelf sampling several times a second will constantly and repeatedly detect the presence of items on that shelf. While at one level, these “beeps” are indeed a data stream, in general the continued presence of an item on the shelf is not data of interest beyond the scope of that shelf. Rather, it is only when an item is removed or a new item appears that an interesting event worthy of propagation can be said to occur.

Such concerns argue for the ability to place stream processing at or close to the edges of these receptor-based systems, in order to perform highly-local tasks such as data cleaning, filtering, and simple event detection. Another argument for pushing stream query processing out towards the edges of the network is to reduce the communication requirements for battery-powered wireless devices such as Berkeley Motes, as has been demonstrated in our TinyDB work and other sensor network database projects.

On the other hand, much stream processing is likely to be best performed on mid-tier devices such as the Intel Stargate single-board computer (see Figure 3), which is a low-power Linux machine that can run on Li Ion batteries or can be plugged into the wall, and can use 802.11 communication (compared to the lower range/lower power radios used on sensor motes). Processing tasks that involve the correlation of multiple streams and/or the application of more sophisticated filtering and business rules are good candidates for such devices.

Still other stream processing will best be done on host computers. As you move in towards the interior of the network, three factors combine to change the nature of device requirements. First, these nodes serve as aggregation points for ever larger geographic regions, resulting in an increase in the number of streams being monitored and the aggregate data volume to be processed. Secondly, many applications will need to archive streams and provide access to that past data. Thus, diskful systems will be required. Finally, the availability of large volumes of both live and historical stream data will make such systems magnets for queries from throughout the network.

Figure 3 - Stargate Mid-tier Processing Node

The hierarchical nature of a High Fan-in architecture leads naturally to the use of stream query processing for aggregation. That is, as data streams flow from the edges towards the interior they can be combined and aggregated in order to produce summaries and reduce data volume. Indeed, aggregation is one of the major uses of stream query processing in HiFi. There are, however, a number of other important tasks for which we believe stream query processing is particularly well suited. These include:

• Data Cleaning – Sensors and RFID readers are notoriously noisy devices, and dealing with the poor quality of data they produce is one of the main challenges in a High Fan-in (or any sensor-based) system. We believe that declarative queries can be used to specify cleaning functionality for any single device as well as across groups of devices.

• Event Monitoring – One of the main functions of a High Fan-in system is to continuously monitor the edge environment, and to send alerts when events of interest are detected. These events may vary significantly in terms of the timescales and geographic areas over which they are detected. While many commercial systems have developed their own “event languages”, we believe that a stream query language (perhaps suitably extended) is the right substrate for such functionality.

• Stream Correlation – A further advantage of a query-based approach is that it natively supports the ability to compare and correlate data from multiple streams. Such streams may be homogeneous, as in the case of comparing temperature readings from a group of identical sensors, or heterogeneous, as in the case of combining temperature readings with RFID “beeps”.

• Outlier Detection – Another form of data reduction provided by a HiFi System is outlier detection. In many monitoring systems, expected events are of less immediate interest than anomalies. Queries can be used to detect and propagate various types of outliers in a streaming environment.

As the above (partial) list indicates, we believe that stream query processing can serve as the basis of a wide range of High Fan-in functionality in a uniform manner. This is a significant departure from the ad hoc way in which such systems are currently being constructed. Our initial HiFi prototype has been developed as a proof-of-concept platform, in order to test the viability of our query-based approach.

3. Overview of the Demo

We have built an initial version of HiFi using the TelegraphCQ stream query processor and the TinyDB sensor database system. The goal of this prototype is to examine the feasibility of the uniform query processing model and to derive a better understanding of the core components required for building High Fan-in systems.

In this demo, we will show how to use HiFi to query and correlate data from multiple streaming sources. In particular we intend to implement a simple tracking application using passive RFID tags and sensor motes, showing the power of stream query processing for providing real-time analyses of correlated readings across multiple streams and device types and successive levels of aggregation. In addition to showing some gadgets such as RFID tags/readers and several classes of wireless sensor devices, we will also demonstrate some of the salient features of the architecture, including:

• Integration of multiple platforms including wireless sensor motes, passive RFID tags and readers, mid-tier sensor aggregation points, and host computers.

• Unified and adaptive cross-platform query optimization in this heterogeneous environment.

• In-network query processing both in the interior and at the edges of the network.

• The use of stream queries and views for providing key functionality such as data cleaning, filtering, and event detection.

• Support for continuous multi-level aggregation.

We will show HiFi running on a heterogeneous network consisting of at least the following platforms:

1) Laptop PCs.

2) Intel Stargate low-power single-board computer.

3) Berkeley sensor motes capable of sensing light, temperature, and sound.

4) RFID readers with passive, read-only RFID tags.

The demonstration will highlight the various features and underlying technology of the HiFi design and will emphasize the power and flexibility of a uniform stream query model for supporting applications over multiple classes of receptor devices and networks.

REFERENCES

[CCDF+03] S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah, “TelegraphCQ: Continuous Data Flow Processing for an Uncertain World”, Proceedings of the 1st Conference on Innovative Data Systems Research (CIDR 2003), Asilomar, CA, January 2003.

[MFHH03] S. Madden, M. Franklin, J. Hellerstein, and W. Hong, “The Design of an Acquisitional Query Processor for Sensor Networks”, Proceedings of the ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD 2003), San Diego, CA, June 2003, pp. 491-502.