
The VLDB Journal 11: 68–91 (2002) / Digital Object Identifier (DOI) 10.1007/s007780200062

Query processing techniques for arrays

Arunprasad P. Marathe*, Kenneth Salem

Department of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
e-mail: {apmarathe,kmsalem}@uwaterloo.ca

* Present address: Open Text Corporation, 185 Columbia Street West, Waterloo, Ontario N2L 5Z5, Canada; e-mail: amarathe@opentext.com

Edited by M. Carey. Received: 10 August 2001 / Accepted: 11 December 2001. Published online: 24 May 2002 – © Springer-Verlag 2002

This research was partially supported by NSERC (Natural Sciences and Engineering Research Council of Canada). We are grateful to the anonymous reviewers for their comments and suggestions.

Abstract. Arrays are a common and important class of data. At present, database systems do not provide adequate array support: arrays can neither be easily defined nor conveniently manipulated. Further, array manipulations are not optimized. This paper describes a language called the Array Manipulation Language (AML) for expressing array manipulations, and a collection of optimization techniques for AML expressions. In the AML framework for array manipulation, arbitrary externally-defined functions can be applied to arrays in a structured manner. AML can be adapted to different application domains by choosing appropriate external function definitions. This paper concentrates on arrays occurring in databases of digital images such as satellite or medical images.

AML queries can be treated declaratively and subjected to rewrite optimizations. Rewriting minimizes the number of applications of potentially costly external functions required to compute a query result. AML queries can also be optimized for space. Query results are generated a piece at a time by pipelined execution plans, and the amount of memory required by a plan depends on the order in which pieces are generated. An optimizer can consider generating the pieces of the query result in a variety of orders, and can efficiently choose orders that require less space. An AML-based prototype array database system called ArrayDB has been built, and it is used to show the effectiveness of these optimization techniques.

Key words: Array manipulation language – Array query optimization – Declarative query language – User-defined functions – Pipelined evaluation – Memory-usage optimization

1 Introduction

Arrays are an appropriate model for many types of data, including digital images, digital video, and gridded outputs from computational models. Array data are common in many application domains, such as remote sensing and medical imaging [3,21,22]. Most database management systems (DBMS), however, provide very limited support for arrays.

This paper presents the Array Manipulation Language (AML), which can be used to describe array queries. AML expressions describe how arbitrary, externally defined functions are used to generate a desired query result. Thus, by appropriate choice of functions, AML can be customized for a particular application. Because arrays may be large, and array manipulations complex, array queries may be expensive. This paper presents an array query processing algorithm that generates optimized, pipelined query evaluation plans from AML queries. The query processor is implemented in an array database system called ArrayDB.

Figure 1 shows a remote sensing example that illustrates the kind of array queries for which AML is well suited. The three-dimensional array A in Fig. 1 represents a multi-spectral image captured by the Landsat Thematic Mapper. Two of the array dimensions are spatial, and the third is spectral. This array can be thought of as a stack of seven two-dimensional images of the same scene, each captured using a sensor sensitive to a different band of the electromagnetic spectrum.

Fig. 1. A Thematic Mapper image and several derived images. (The figure shows the seven-band Thematic Mapper image A, whose dimension 2 is spectral; the noise-reduced band 3 and band 4 images B and C; and the TVI image D derived from them.)

Often, multi-spectral images such as A are not used directly. Instead, useful parameters are derived from them.

An example of such a parameter is the transformed vegetation index (TVI). The TVI at any point is computed from the radiance in the third and fourth spectral bands at the corresponding point in the Thematic Mapper image, using the function [20, Chap. 7]:

    f_tvi(b3, b4) = ((b4 − b3)/(b4 + b3) + 0.5)^0.5,    (1)

where b_i denotes the radiance in the ith band. The TVI at each point in the scene is indicative of the amount of green biomass present there [20].

A Thematic Mapper image may include noise from a variety of sources. Since noise can degrade the true radiometric information content of the image, it may be desirable to attempt to reduce the noise before performing further calculations, such as extraction of the TVI [20]. Thus, a TVI image may be generated in two steps, as illustrated in Fig. 1. First, noise reduction is applied to the third and fourth bands of the Thematic Mapper image, resulting in arrays B and C respectively. The TVI image (array D) is then derived from the noise-reduced bands.

There are many kinds of noise reduction techniques. To make the example concrete, we will assume that noise reduction is accomplished using a convolution filter. The filter computes the noise-reduced radiance at a particular point from the original radiance at that point and the radiances of its eight immediate neighbors. (Noise reduction is applied independently to each of the bands of interest.) The exact calculation, which is adapted from [20], is shown in Fig. 2.

    f_nr(v0, v1, v2, v3, v4, v5, v6, v7, v8) {
        x ← (v1 + v3 + v5 + v7)/4;
        y ← (v2 + v4 + v6 + v8)/4;
        z ← |x − y|;
        if ((|v0 − x| > 2z) ∨ (|v0 − y| > 2z)) return y;
        else return v0;
    }

Fig. 2. A noise reduction filter. v0 is the original cell value; v1 through v8 are the values of its eight neighbors, numbered clockwise from the upper left.

This example illustrates several points. First, there is a wide variety of complex, domain-specific array transformations that may be used to define the desired result, such as the TVI array, of an array query. An array query language should be flexible enough to express them.

Second, there is considerable room for query optimization. Array queries may have a regular structure that can be exploited. In the case of Fig. 1, it is not difficult to determine which points in the original Thematic Mapper image (A) contribute to the TVI at a particular point in array D. Thus, if only part of array D is required, it may be possible to generate that part without using all of A. Also, the TVI values in D can be calculated in any order.

There are other opportunities as well. Redundant calculations can be eliminated using techniques such as caching and view materialization. For example, several different parameters may be derived from the noise-reduced arrays B and C, although only one is illustrated in Fig. 1. In that case, it might be a good idea to materialize (compute and store) arrays B and C. Finally, specific array transformations may have properties that can be exploited by an optimizer that understands them. For example, the noise reduction algorithm that produces arrays B and C is a discrete two-dimensional convolution. An optimizer with some knowledge of linear systems might be able to infer, for example, that adding two noise-reduced images is equivalent to applying noise reduction to their sum.

This paper makes the following contributions. First, it presents AML, a language for defining array queries. Each of the arrays B, C and D from Fig. 1 can be described as an AML query against the original Landsat Thematic Mapper image A. Since the result of an AML query is an array, AML can be used to define views, such as the TVI image, on stored base arrays, such as the multi-spectral Thematic Mapper image.

Second, it describes the array query processing techniques that are implemented in ArrayDB. ArrayDB has a rewrite optimizer that transforms AML expressions into equivalent expressions that may be much cheaper to evaluate. From these optimized expressions, ArrayDB generates pipelined query evaluation plans. Arrays flow through the pipelines in small chunks, and materialization of potentially large intermediate results can often be avoided. ArrayDB's optimizer does not exploit all of the optimization opportunities that are described above, primarily because AML itself does not capture everything needed to exploit them. For example, the optimizer does not "understand" convolution. However, AML is quite good at capturing regular structure in array queries. ArrayDB exploits this to reorder query operators, eliminate unnecessary work, and reduce space requirements.

Third, it presents an empirical evaluation of ArrayDB's query processor. The evaluation is based on a small suite of array queries, including the TVI query illustrated in Fig. 1. The evaluation demonstrates the effectiveness of the query processor. It also illustrates some of the limitations of AML, and of ArrayDB's query evaluation strategies.

The rest of the paper is organized as follows. The array data model and the AML query language are described in Sect. 2 and Sect. 3 respectively. Section 4 describes ArrayDB, and the techniques that it uses for optimizing and evaluating AML queries. Section 5 describes the array query suite that serves as ArrayDB's workload for the performance evaluation. The evaluation itself is presented in Sect. 6. Section 7 consists of a survey of related work. The scenario illustrated in Fig. 1 is used as a running example throughout this paper.

2 Data model and terminology

Many of the definitions in this paper involve infinite vectors of non-negative integers. The notation x[i] refers to the ith element of the vector x. Indexing starts at zero. The vectors consisting entirely of zeros or entirely of ones are denoted by 0 and 1, respectively.

Operations on vectors are applied element by element, unless otherwise stated. Thus, z = x/y means that for all i ≥ 0, z[i] = x[i]/y[i]. Similarly, predicates such as x < y are true iff x[i] < y[i] for all i ≥ 0.

Definition 1 (Shape). A shape is an infinite vector of non-negative integers.

Shapes are written by listing the vector elements between angled brackets, and all elements not listed explicitly are assumed to be ones.
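Returning to the running example: the noise-reduction filter of Fig. 2 and the TVI function of Eq. (1) translate directly into executable form. The following Python sketch is an illustration only, not ArrayDB code; the function names mirror the paper's f_nr and f_tvi.

```python
import math

def fnr(v):
    # Noise-reduction filter of Fig. 2. v[0] is the original cell value;
    # v[1]..v[8] are its eight neighbors, numbered clockwise from the
    # upper left, so the odd- and even-numbered neighbors are averaged
    # separately.
    x = (v[1] + v[3] + v[5] + v[7]) / 4
    y = (v[2] + v[4] + v[6] + v[8]) / 4
    z = abs(x - y)
    if abs(v[0] - x) > 2 * z or abs(v[0] - y) > 2 * z:
        return y           # center value is an outlier: replace it
    return v[0]            # otherwise keep the original value

def ftvi(b3, b4):
    # Transformed vegetation index of Eq. (1).
    return math.sqrt((b4 - b3) / (b4 + b3) + 0.5)
```

On a flat neighborhood the filter leaves the center value unchanged; an outlier center value is replaced by the neighbor average y.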

Thus, the shapes ⟨1, 1, 2⟩ and ⟨4, 4⟩ denote the infinite vectors ⟨1, 1, 2, 1, 1, 1, …⟩ and ⟨4, 4, 1, 1, 1, …⟩, respectively.

Definition 2 (Vector containment). A vector x is contained in a shape A iff 0 ≤ x < A. We write x ∈ A or "x in A".

Definition 3 (Array). An array A consists of a shape A, a value domain D_A, and a mapping M_A. The ith element of the shape A, A[i], represents the length of the array in dimension i. The domain D_A is a non-empty set of values. The mapping M_A maps each vector x ∈ A to an element of D_A.

AML arrays have an infinite number of dimensions, numbered from zero. Vectors x that are in A are sometimes called the cells or elements or points of A. A[x] denotes the value of element x of A. That is, A[x] = M_A(x). (In contrast, as mentioned in Definition 3, A[i] represents the length of A in dimension i.) When the position of an array element in some dimension is not specified, it is assumed to be zero. Thus, both A[0, 1] and A[0, 1, 0, 0, …] denote the value of the same element of array A.

The mapping function M_A of an array A only defines domain values for points x that are within A. Sometimes, however, it will be convenient to think of arrays as having infinite lengths in all dimensions. For this purpose, A[x] is defined to be ⊥ for all points x ∉ A, where ⊥ is a special value distinct from any other value in any domain.

In programming languages, the "type" of an array is frequently a composite made up of the array's shape and the array's value domain. However, for our purposes it is more convenient to keep these two aspects of type distinct from each other. Therefore, Definition 3 distinguishes between the shape (A) and the domain (D_A) of array A.

Definition 4 (Size). The size of an array A, written |A|, is the product of A[i] over all i ≥ 0.

Definition 5 (Dimensionality). The dimensionality of array A is written dim(A). If |A| is 0 then dim(A) is undefined; if |A| is ∞ then dim(A) is ∞; otherwise, dim(A) is the smallest i such that A[j] = 1 for all j ≥ i. If dim(A) is d, then A is called a d-dimensional array.

An array having a length of zero in one or more dimensions is called a null array, denoted by NULL. Such arrays have zero size and undefined dimensionality.

Definition 6 (Subarray). Array B is a subarray at point x of array A iff D_B = D_A, x ∈ A, and for every point y in B, B[y] = A[x + y].

As Fig. 3 shows, a subarray is simply an array that is wholly contained within another. The position of the subarray within the containing array is identified by the position of the subarray's smallest point (indicated by a dot in Fig. 3).

Definition 7 (Array slab). A slab of an array A in dimension i (i-slab for short) is a subarray of A with the shape ⟨…, A[i−1], 1, A[i+1], …⟩.

As illustrated in Fig. 3, an i-slab is simply a slice of unit width through an array in the ith dimension. There are A[i] i-slabs in an array A.

Fig. 3. Subarrays and slabs. (The figure shows a subarray of array A at point x, a slab along dimension 0, and a slab along dimension 1.)

3 The array manipulation language

AML is an algebra consisting of three operators that manipulate arrays [26]. Each operator takes one or more arrays as arguments, and produces an array as result. Subsample (sub for short) is a unary operator that can delete data. Merge is a binary operator that combines two arrays defined over the same domain. Apply applies a user-defined function to an array to produce a new array. The manner in which the function is applied is described in Sect. 3.3. All of the AML operators take bit patterns as parameters.

Definition 8 (Bit pattern). A bit pattern P is an infinite binary vector of the form r*, where r is a finite binary vector.

The finite vector r is used to represent the infinite pattern P. For example, we write P = 1011 to mean P = 101110111011…. Notice that "01" and "0101" represent the same pattern 01010101…. When appropriate, we use run-length encoding to further compress the pattern notation. For example, P = 0³1²0² means that P consists of an infinite number of repetitions of two ones sandwiched between three zeros and two zeros. That is, P = 000110000011000001100…. P̄ denotes the bit-wise complement of a pattern P. To simplify our notation, we define that P[i] = 0 for all i < 0.

The operator definitions make use of two pattern functions, index and count.

Definition 9 (Index). If P is a bit pattern and k a positive integer, index(P, k) is the index of the kth '1' in P. If P = 0 (denoting the pattern 0*), index(P, k) is defined to be −1 for all k > 0.

Definition 10 (Count). If P is a bit pattern and k a non-negative integer, count(P, k) is the number of ones in the first k + 1 positions of P, i.e., from P[0] to P[k], inclusive.

3.1 The subsample operation

The sub operator takes an array, a dimension number, and a pattern as parameters, and produces an array. The dimension number is written as a subscript, as in

    B = sub_i(P, A),

where A is an array, P is a pattern, and i is the dimension number. The operator divides A into slabs along dimension i, and then retains or discards slabs based on the pattern P. If P[k] = 1, then slab k is retained, otherwise it is not. The retained slabs are concatenated to produce the result B.
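The pattern functions of Definitions 9 and 10, and the slab-filtering behavior of sub just described, can be sketched in Python. This is a hedged illustration only: patterns are represented by their finite repeating vector r (Definition 8), arrays by nested lists, and the helper names pat, index, count, and sub are ours, not ArrayDB's.

```python
def pat(r, i):
    # Bit i of the infinite pattern r* (Definition 8); P[i] = 0 for i < 0.
    return 0 if i < 0 else r[i % len(r)]

def index(r, k):
    # Index of the kth '1' in the pattern (Definition 9); -1 if no ones.
    if 1 not in r:
        return -1
    i, seen = 0, 0
    while True:
        if pat(r, i):
            seen += 1
            if seen == k:
                return i
        i += 1

def count(r, k):
    # Number of ones in P[0]..P[k], inclusive (Definition 10).
    return sum(pat(r, i) for i in range(k + 1))

def sub(i, r, a):
    # sub_i(P, A) on a nested-list array a: split a into slabs along
    # dimension i, keep slab k iff P[k] = 1, and concatenate the rest.
    if i == 0:
        return [slab for k, slab in enumerate(a) if pat(r, k)]
    return [sub(i - 1, r, row) for row in a]

A = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]]
```

For example, sub(0, [1, 0], A) keeps every other row (0-slab), and sub(1, [1, 0], A) keeps every other column (1-slab).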

Definition 11 (subsample). Let A be an array and P be a pattern. The result of sub_i(P, A) is an array. The resulting array, which will be called B, is defined in terms of A and P as follows:

• D_B = D_A
• if A[i] > 0, then B[i] = count(P, A[i] − 1), else B[i] = 0
• for all j ≥ 0 except j = i, B[j] = A[j]
• for all points x in B, B[…, x[i−1], x[i], x[i+1], …] = A[…, x[i−1], index(P, x[i] + 1), x[i+1], …]

Fig. 4. Examples of the subsample operation: B = sub_1(10, A), B = sub_0(10, A), and B = sub_1(0000111, A), each applied to a 3 × 6 array A.

Several applications of the subsample operator are illustrated in Fig. 4. With the sub pattern "10", the array B in the top expression in Fig. 4 is formed by choosing every other 1-slab of the array A. In the middle expression, the sub pattern "10" selects every other 0-slab (row) from the array A. In the bottom expression, the sub selects the last two 1-slabs (columns) of A. Note that in the expression sub_i(P, A), only the first A[i] (the length of A in dimension i) bits of the pattern P are relevant.

In the example shown in Fig. 1, the expression

    B′ = sub_2(0010000, A)    (2)

can be used to extract the third spectral band from Thematic Mapper array A. The sub operation is applied to A in dimension two, which is the spectral dimension. The '1' in the third position of the sub pattern indicates that the third band is to be kept. A similar expression can be used to extract the fourth spectral band in Fig. 1.

The following theorems follow easily from the definition of subsample, and are stated without proof.¹

Theorem 1 (Sub with NULL array). sub_i(P, NULL) = NULL.

Theorem 2 (Sub with '0' pattern). sub_i(0, A) = NULL.

Theorem 3 (Sub with '1' pattern). sub_i(1, A) = A.

¹ Proofs of all of the nontrivial theorems involving AML operators can be found in [23]. To illustrate the general proof technique, a proof of one such nontrivial theorem, Theorem 10, is given in Appendix A.

The next two theorems describe how two successive sub operations can be combined or reordered.

Theorem 4 (Combining two subs). If P ≠ 0 and Q ≠ 0, then

    sub_i(Q, sub_i(P, A)) = sub_i(R, A),

where R is defined by

    R[j] = P[j] ∧ Q[count(P, j) − 1]

for all j ≥ 0.

Note that in Theorem 4, it suffices to generate A[i] bits of R, and treat it as the r in Definition 8, because subsequent bits of R are not relevant. A similar observation applies to all of the new patterns that the subsequent AML rewrite rules define, although we will not explicitly state it.

Theorem 5 (Reordering two subs). If i ≠ j then

    sub_i(Q, sub_j(P, A)) = sub_j(P, sub_i(Q, A)).

3.2 The merge operation

The merge operator takes two arrays, a dimension number, a pattern, and a default value as parameters. It merges the two arrays to produce its result. The dimension number is written as a subscript, as in

    C = merge_i(P, A, B, δ),

where A and B are arrays, P is the pattern, and δ is the default value. The explicit reference to δ will often be dropped if the default is not important. Merge is defined only if D_A = D_B and δ ∈ D_A.

Conceptually, merge divides both A and B into slabs along dimension i. C is obtained by merging these slabs according to the pattern P; ones in P correspond to slabs from A (the first array), and zeros to slabs from B (the second array). For example, if P = 101 (which stands for the infinite pattern 101101101…), then a slab from B is sandwiched between two slabs from A. The merging process repeats until all the slabs from both A and B are exhausted.

It is convenient to define merge formally in two steps. The first step generates an array C′ by interleaving slabs from A and B, as described above. Because of shape mismatches between A and B, however, or because of the particular pattern P, some values in C′ may be ⊥. The second step eliminates this problem by transforming any such ⊥ values to the default value δ.

Definition 12 (merge). Let A and B be arrays such that D_A = D_B. Let P be a pattern, and δ be a value from D_A. The result of merge_i(P, A, B, δ) is an array. This array, which will be called C, is defined in two steps. First, an intermediate array C′ is defined as follows:

• D_C′ = D_A ∪ {⊥}
• if A[i] = 0 and B[i] = 0, then C′[i] = 0; otherwise C′[i] = max(index(P, A[i]), index(P̄, B[i])) + 1
• for all j ≥ 0 except j = i, C′[j] = max(A[j], B[j])

• for all points x in C′:
  – if P[x[i]] = 1, then C′[…, x[i−1], x[i], x[i+1], …] = A[…, x[i−1], count(P, x[i]) − 1, x[i+1], …],
  – otherwise C′[…, x[i−1], x[i], x[i+1], …] = B[…, x[i−1], count(P̄, x[i]) − 1, x[i+1], …].

Next, array C is defined as C′ with ⊥ values replaced by δ. That is: D_C = D_A; for all i ≥ 0, C[i] = C′[i]; and for all points x in C, if C′[x] = ⊥, then C[x] = δ, otherwise C[x] = C′[x].

Fig. 5. Examples of the merge operation: merge_1(110, A, B), which interleaves the columns of a 2 × 4 array A and a 2 × 2 array B, and merge_0(101, A, B, δ), which interleaves rows and pads with the default value δ.

Figure 5 illustrates the merge operation. The first example illustrates a merge in dimension 1. That is, columns of A are merged with columns of B. The second example shows a row (dimension zero) merge. The second example also shows how the default value (δ) is used by a merge_i operation. It serves two purposes. First, in a dimension j ≠ i, the lengths of the two arrays may not match. If so, the shorter array (B in Fig. 5) is expanded, using δ values, until it matches the length of the longer array. Second, as the two arrays are interleaved in dimension i, the operation may exhaust the slabs of one array before it exhausts the slabs of the other. In this case also, slabs filled with δ values are used in place of the array slabs from the exhausted array.

For some merge operators with particular patterns, the arrays C and C′ from Definition 12 are identical. If so, the merge operator is said to be balanced.

Definition 13 (Balanced merge). The merge operation merge_i(P, A, B, δ) is balanced if both of the following conditions hold, where C is the array defined by the merge:

1. For all dimensions j ≠ i, A[j] = B[j].
2. C[i] = A[i] + B[i].

In Fig. 5, the top merge is balanced, whereas the bottom merge is not. An AML expression in which all merge operations are balanced is said to be in merge-balanced form. Theorems 10 and 11 that follow hold only for AML expressions that are in merge-balanced form.

In the running example of Fig. 1, array B can be put on top of array C using the expression

    D′ = merge_2(10, B, C).    (3)

(D′ is not explicitly shown in Fig. 1.) When the TVI function is applied to D′, the TVI image D results. As Eq. (3) illustrates, merge can be used to increase the dimensionalities of arrays.

Theorems 6–8 follow easily from the definition of merge.

Theorem 6 (Merge with '0' pattern). merge_i(0, A, B, δ) = B.

Theorem 7 (Merge with '1' pattern). merge_i(1, A, B, δ) = A.

The merge operator is commutative and associative, provided that the merge patterns are properly adjusted.

Theorem 8 (Commutativity of merge). merge_i(P, A, B, δ) = merge_i(P̄, B, A, δ).

Theorem 9 (Associativity of merge). If P ≠ 0, P ≠ 1, Q ≠ 0, Q ≠ 1, and the expression on the left is merge-balanced, then

    merge_i(Q, merge_i(P, A, B, δ), C, δ) = merge_i(R, A, merge_i(S, B, C, δ), δ),

where

    R[j] = Q[j] ∧ P[count(Q, j) − 1]

and

    S[j] = Q[index(R̄, j + 1)]

for all j ≥ 0. Furthermore, the AML expression on the right-hand side is merge-balanced.

The following two theorems show that sub and merge operators can be reordered. In particular, they describe how a sub can be pushed through a subsequent merge operator. A proof of Theorem 10 appears in Appendix A.

Theorem 10 (Pushing sub through merge, version 1). If P ≠ 0, P ≠ 1, Q ≠ 0, and the expression on the left is merge-balanced, then

    sub_i(Q, merge_i(P, A, B, δ)) = merge_i(T, sub_i(R, A), sub_i(S, B), δ),

where

    R[j] = Q[index(P, j + 1)],
    S[j] = Q[index(P̄, j + 1)],
    T[j] = P[index(Q, j + 1)],

for all j ≥ 0. Furthermore, the merge operation on the right is balanced.

Theorem 11 (Pushing sub through merge, version 2). If i ≠ j and the merge on the left is balanced, then

    sub_i(Q, merge_j(P, A, B, δ)) = merge_j(P, sub_i(Q, A), sub_i(Q, B), δ).

Furthermore, the merge operation on the right is balanced.
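The slab interleaving and δ-padding of Definition 12 can be sketched for the one-dimensional case. This is a list-based illustration under our own naming, not ArrayDB code; the all-ones and all-zeros patterns are short-circuited per Theorems 6 and 7.

```python
def pat(r, i):
    # Bit i of the infinite pattern r* (Definition 8).
    return 0 if i < 0 else r[i % len(r)]

def merge0(r, a, b, delta):
    # merge_0(P, A, B, delta) on one-dimensional lists: positions whose
    # pattern bit is 1 take the next slab of A, positions whose bit is 0
    # take the next slab of B; an exhausted input contributes delta slabs
    # until both inputs are consumed.
    if 0 not in r:               # all-ones pattern: result is A (Thm. 7)
        return list(a)
    if 1 not in r:               # all-zeros pattern: result is B (Thm. 6)
        return list(b)
    out, ia, ib, j = [], 0, 0, 0
    while ia < len(a) or ib < len(b):
        if pat(r, j):
            out.append(a[ia] if ia < len(a) else delta)
            ia += 1
        else:
            out.append(b[ib] if ib < len(b) else delta)
            ib += 1
        j += 1
    return out
```

In merge0([1, 0], [1, 2, 3], [9], 0), the second input is exhausted first, so a δ slab (here 0) is interpolated and the result has length 5, matching max(index(P, A[0]), index(P̄, B[0])) + 1 from Definition 12.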

3.3 The apply operation

The apply operator applies an externally-defined function to an array to produce a new array. In its most general form, it is written as

    B = apply(f, A, P_0, P_1, …, P_{d−1}),

where f is the function to be applied, A is the array to apply it to, the P_i's are patterns, and d = dim(A). Patterns that consist entirely of ones are often dropped from the notation. In particular, in the expression

    B = apply(f, A),

all of the patterns are assumed to be ones.

A simple way to define an operation, like apply, that applies an externally defined function f would be to insist that f map from arrays of A's shape and domain to arrays of B's shape and domain. The operator would then simply compute B = f(A). However, many common array functions have some structural locality: the value found at a particular point in B depends only on the values at certain points in A, not on the values at all points in A. For example, if f is a smoothing function that maps each point in A to the average of that point and its neighbors, then to determine the value at some point in B, we need only look at the corresponding point and its neighbors in A. Such information can be very valuable for optimizing the execution of an expression involving the array operators.

The apply operation is defined so that this kind of structural relationship can be made explicit when it exists. Apply requires that f be defined to map subarrays of A of some fixed shape D_f to subarrays of B of some fixed shape R_f. The shape D_f is called the domain box of f, and R_f is called the range box. The apply operator applies f to some or all of the subarrays (of shape D_f) of A. The results of these applications are concatenated to generate B. This process is illustrated in Fig. 6.

Fig. 6. An illustration of the apply operation, B = apply(f, A, 10, 0110). The notation f(A, x) refers to the result of applying f to the subarray of A of shape D_f at x. Thus, f(A, x) is an array of shape R_f. In this example, D_f is ⟨2, 2⟩ and R_f is ⟨1, 2⟩; f is applied at [0, 1], [0, 2], [2, 1], and [2, 2].

The pattern arguments of apply specify to which of the possible subarrays of the input array A the function f should be applied. Pattern P_i can be thought of as selecting slabs in dimension i, with the selected slabs corresponding to the ones in the pattern. The function f is applied to the subarray at x only if x falls in selected slabs in all d dimensions of the array; that is, only if P_i[x[i]] = 1 for all 0 ≤ i < dim(A).

In Fig. 6, the pattern P_0 = 10 (which means 101010…) selects the first and third row slabs, whereas the pattern P_1 = 0110 selects the second and third column slabs. This leads to a total of four applications of the function f. The dashed squares in A in Fig. 6 show the four subarrays to which f is applied. Each application of f produces a result of shape R_f = ⟨1, 2⟩. These four arrays are concatenated, as shown in Fig. 6, to produce the final result.

Definition 14 (apply). Let A be an array, f be a function that maps arrays of shape D_f over domain D_A to arrays of shape R_f over domain D, and P_0, P_1, …, P_{dim(A)−1} be patterns. Let f(A, x), where x ∈ A, represent the result of applying f to the subarray of A of shape D_f at x. The result of the expression apply(f, A, P_0, P_1, …, P_{dim(A)−1}) is an array. This array, which will be called B, is defined as follows:

• D_B = D
• for all i ≥ 0,
  – if A[i] < D_f[i] or P_i = 0, then B[i] = 0
  – otherwise B[i] = count(P_i, A[i] − D_f[i]) · R_f[i]
• for all x in B, B[x] = f(A, y)[x mod R_f], where y[i] = index(P_i, ⌊x[i]/R_f[i]⌋ + 1) for all 0 ≤ i < dim(A)

Several important properties of this definition are illustrated in Fig. 6. First, although the subarrays to which f is applied may overlap in A, the resulting subarrays do not overlap in the array B. Second, the arrangement of resulting subarrays in B preserves the spatial arrangement of the selected subarrays in A. Finally, the subarrays to which f is applied must be entirely contained within A. In the example in Fig. 6, this means that even if the subarray at [3, 3] was selected by the patterns, f(A, [3, 3]) would not be evaluated, since that subarray lies partially outside of A.

In the running example in Fig. 1, array B results from applying the noise reduction function f_nr to the third spectral band (array B′ defined by Eq. (2)). f_nr maps arrays of shape ⟨3, 3⟩ to arrays of unit size. That is, D_nr is ⟨3, 3⟩, and R_nr is ⟨1, 1⟩. The AML expression that computes array B from array B′ is simply

    B = apply(f_nr, B′).    (4)
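Definition 14 can be sketched for the two-dimensional setting of Fig. 6: a function with domain box ⟨2, 2⟩ and range box ⟨1, 2⟩, with P_0 = 10 and P_1 = 0110, is applied four times to a 4 × 5 array and the range-box results are concatenated into a 2 × 4 result. The helper names below are ours (an illustration, not ArrayDB code).

```python
def pat(r, i):
    # Bit i of the infinite pattern r* (Definition 8).
    return 0 if i < 0 else r[i % len(r)]

def apply_op(f, a, p0, p1, dbox, rbox):
    # apply(f, A, P0, P1) for a two-dimensional nested-list array a.
    # f maps a dbox-shaped subarray of a to an rbox-shaped array; it is
    # invoked at each point x with P0[x[0]] = P1[x[1]] = 1 whose domain
    # box fits entirely inside a, and the results are concatenated.
    rows = [x for x in range(len(a) - dbox[0] + 1) if pat(p0, x)]
    cols = [y for y in range(len(a[0]) - dbox[1] + 1) if pat(p1, y)]
    out = [[None] * (len(cols) * rbox[1]) for _ in range(len(rows) * rbox[0])]
    for bi, x in enumerate(rows):
        for bj, y in enumerate(cols):
            block = f([row[y:y + dbox[1]] for row in a[x:x + dbox[0]]])
            for u in range(rbox[0]):
                for v in range(rbox[1]):
                    out[bi * rbox[0] + u][bj * rbox[1] + v] = block[u][v]
    return out

# A 4 x 5 input, as in Fig. 6; f returns a 1 x 2 block per application.
A = [[c + 10 * r for c in range(5)] for r in range(4)]
f = lambda s: [[sum(s[0]) + sum(s[1]), max(s[0] + s[1])]]
B = apply_op(f, A, [1, 0], [0, 1, 1, 0], (2, 2), (1, 2))
```

As in the text, subarrays whose domain box would extend past the edge of A are never evaluated, so B has shape ⟨2, 4⟩ here.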

Recall that an intermediate array D′ that puts B on top of C is defined by Eq. (3). The TVI array D can be defined using

D = apply(f_tvi, D′),    (5)

assuming that the function f_tvi is defined to have D_tvi = ⟨1, 1, 2⟩ and R_tvi = ⟨1, 1⟩. The apply operation applies f_tvi to the corresponding pairs of cells in D′. By combining Eqs. (4) and (5) with Eqs. (2) and (3), we arrive at the full AML expression that defines the TVI array in terms of the seven-band Thematic Mapper array A:

D = apply(f_tvi,
      merge2(10,
        apply(f_nr, sub2(0010000, A)),
        apply(f_nr, sub2(0001000, A)))).    (6)

Often, it is necessary to apply an externally-defined function to all non-overlapping subarrays of a particular shape. For example, an inexpensive way to compute a low-resolution version of an array is to tile the array, and to replace each tile with a single cell having the average value of the cells under the tile. Since this type of function application is quite common, the tiled apply operator is defined to support it. Assuming that dim(A) = d, it is defined as follows:

tiled_apply(f, A) ≡ apply(f, A, 10^{D_f[0]−1}, 10^{D_f[1]−1}, ..., 10^{D_f[d−1]−1}).    (7)

The following theorem follows immediately from the definition of apply.

Theorem 12 (Apply with a '0' pattern). apply(f, A, P_0, P_1, ..., P_i, ...) = NULL if any P_i = 0.

The next two theorems show how the structural locality captured by an apply operator can be used to reduce the number of applications of an externally-defined function, or to identify and eliminate unnecessary portions of the input array. Theorem 13 shows that if an apply operation produces data that a subsequent sub operation deletes, then under certain conditions the apply operation's patterns can be adjusted so that it avoids producing such data in the first place. Specifically, if the sub operation deletes all of the data produced by a particular function application, then that function application can be eliminated. However, if any portion of the function's range box is not eliminated by the sub, then the function application cannot be eliminated, since some of the values that it produces are required.

Theorem 13 (Pushing sub into apply). If P_i ≠ 0, Q ≠ 0, and R_f[i] > 0, then

sub_i(Q, apply(f, A, P_0, P_1, ..., P_i, ...)) = sub_i(S, apply(f, A, P_0, P_1, ..., P′_i, ...)),

where

P′_i[j] = (∨_{t=0}^{R_f[i]−1} Q[((count(P_i, j) − 1) · R_f[i]) + t]) ∧ P_i[j],

and

S[j] = Q[((count(P_i, index(P′_i, ⌊j/R_f[i]⌋ + 1)) − 1) · R_f[i]) + (j mod R_f[i])],

for all j ≥ 0.

When Theorem 13 is applied, the new apply pattern P′_i will contain fewer ones than the original pattern P_i. As a result, the apply operator will apply its function f fewer times.

Theorem 14 is similar to Theorem 13, except that it applies to the input of an apply operator. Theorem 14 says that if there are slabs of an apply's input array that are not used by any of the function applications that the apply performs, then those slabs can be eliminated from the input array. Slab elimination is accomplished by introducing a sub operator prior to the apply. This does not reduce the number of function applications that the apply operation must perform. The benefit of Theorem 14's rewrite is that it may be possible to push the newly introduced sub operation down into earlier apply operations using the other rewrite rules. That is, the new sub operation may be used to make earlier apply operations less expensive.

Theorem 14 (Pulling sub out of apply). If P_i ≠ 0 and A[i] ≥ D_f[i] > 0, then

apply(f, A, P_0, P_1, ..., P_i, ...) = apply(f, sub_i(Q, A), P_0, P_1, ..., P′_i, ...),

where, for 0 ≤ j ≤ (A[i] − D_f[i]),

Q[j] = ∨_{t=j−D_f[i]+1}^{j} P_i[t];

for (A[i] − D_f[i] + 1) ≤ j < A[i],

Q[j] = 0;

and, for all j ≥ 0,

P′_i[j] = P_i[index(Q, j + 1)].

3.4 More on patterns and shapes

Patterns appearing in AML operations can be defined in terms of the shapes of arrays, domain boxes, or range boxes. For example, the expression

sub0(10^{A[0]−1}, A)

selects only the first 0-slab (row) of array A. Aliases can be used to define names for intermediate arrays. In the AML expression

sub0(10^{B[0]−1}, apply(f, A) as B),

the alias B refers to the array that results from the apply operation. The definition of the tiled apply operator (Eq. 7) illustrates the use of a domain box shape to define a pattern.

Pattern definitions are not allowed to refer to array element values. A consequence of this restriction is that the shape of the result of an AML operation can always be determined (without actually evaluating the operator) if the shapes of the operator's array arguments are known. By induction, the shape of the result of an arbitrary AML expression can be determined once the shapes of the expression's terminal, or leaf, arrays are known. (The FISh programming language, an experimental functional programming language for array programming, also puts a lot of emphasis on static shape analysis [17].) This property is useful when evaluating AML expressions because it implies that the space required to implement an AML operation can be determined in advance.
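The low-resolution example above can be made concrete. The following is a minimal sketch, not ArrayDB code, of the two-dimensional tiled apply of Eq. (7): an externally-defined function (here, tile averaging) is applied to every non-overlapping tile, and each application yields one output cell. The names `tiled_apply` and `tile_average` are illustrative.

```python
def tiled_apply(f, array, th, tw):
    """Apply f to every non-overlapping (th x tw) tile of a 2-D array."""
    rows, cols = len(array), len(array[0])
    out = []
    for i in range(0, rows - th + 1, th):
        row = []
        for j in range(0, cols - tw + 1, tw):
            # Extract the tile whose lowest-indexed corner is (i, j).
            tile = [r[j:j + tw] for r in array[i:i + th]]
            row.append(f(tile))
        out.append(row)
    return out

def tile_average(tile):
    """Replace a tile with the average of the cells under it."""
    cells = [v for r in tile for v in r]
    return sum(cells) / len(cells)

a = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
low_res = tiled_apply(tile_average, a, 2, 2)  # -> [[3.5, 5.5], [11.5, 13.5]]
```

Each 2 × 2 tile collapses to a single averaged cell, halving the resolution in both dimensions.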

3.5 On AML's expressiveness

A query language is expressive if it can perform many useful operations in its application domain. AML's expressiveness in image processing can be judged by an answer to the question: what image processing operations can AML express? Notice that AML can express any operation that produces an array from an array by using a singleton apply operation, that is, an apply operation that directly maps from the input array to the output array. Of course, this characterization is neither interesting nor useful. AML is designed to exploit the structural locality often found in array manipulations: an output array element can often be computed from a small set of adjacent elements of the input arrays. An AML evaluator is expected to optimize and efficiently evaluate array queries that contain structural locality. Since user-defined functions are not interpreted by AML, expressions that contain singleton apply operators will probably not be optimized effectively. Therefore, the expressiveness question should be rephrased as: what image processing operations can AML express without using singleton applys?

There is no single, widely-accepted image processing language; no universal set of image processing operations exists. To gauge AML's ability to express image processing operations, we compared it to Image Algebra [33,34], a language believed to be very expressive in the image processing domain. Ritter and Wilson [34] have gathered over 80 computer vision algorithms and their formulations in Image Algebra. (It should be noted that some of these algorithms use assignment statements and loops in addition to Image Algebra statements.) At least one array database system, RasDaMan, has chosen a query language based on Image Algebra. RasDaMan's query language RasQL [3,43] is based on a subset of the Image Algebra operators.

The detailed comparison is reported in [23]. For the sake of brevity, we only summarize the conclusions of that study here. Despite containing only three operators, AML does a reasonable job of expressing many Image Algebra operators. Without resorting to singleton applys, AML can express the following image-manipulating operators of Image Algebra: induced operators, global reduce operators, some spatial transformations, image catenation, range restrictions, some domain restrictions, and image extension. AML's apply can also express the non-recursive version of image-template product, Image Algebra's most useful operator. On the other hand, AML cannot express the following image-manipulating operators of Image Algebra without resorting to singleton applys: arbitrary spatial transformations, arbitrary domain restrictions, and recursive image-template product. Using recursive image-template product, one can enforce the order in which the pixels of an image are processed, for example, row-major order or serpentine scan order. It will be seen in Sect. 4 that the AML query processor we have built considers several alternative orders when processing input array elements, but there is no way to specify such an order in an AML query. We view this as one of AML's strengths: the query processor has the flexibility to choose an appropriate order.

Image Algebra's primary design goals seem to have been expressiveness and generality; optimizability was not a primary concern. For AML, the design goals were optimizability and extensibility, with an emphasis on the former goal. It is accurate to say that we included only those operators in AML that we knew we could optimize. Other operations must be implemented using singleton applys.

4 AML query processing

To process an AML query, ArrayDB first generates an efficient evaluation plan for the query, and then executes the plan to produce the array that the query defines [27]. As shown in Fig. 7, four steps are used to convert an AML query into an evaluation plan prior to execution: preprocessing, logical rewriting, plan generation, and plan refinement. The remainder of this section describes these steps in more detail.

[Fig. 7. Overview of query processing in ArrayDB. An AML query is preprocessed (using the catalogs) into a merge-balanced AML query, logically rewritten into an optimized AML query, converted into a query evaluation plan, refined, and finally evaluated against the input arrays to produce the array result.]

4.1 Query preprocessing

The preprocessor begins by parsing the AML query, generating a parse tree with one internal node for each sub, merge, and apply operator in the query, and a leaf node for each input array.

ArrayDB treats a leaf array as a special type of tiled apply operator which has no input array. Conceptually, the leaf apply operator generates the corresponding leaf array. Like other apply operations, each leaf apply is associated with an externally-defined function. In the case of a leaf apply, this function is called an accessor function. A leaf apply operator generates a portion of its output array with each call to its accessor function. In a non-leaf apply, each external function call uses a portion of the input array to produce a portion of the output array. In the case of a leaf apply, each call to the accessor function generates a portion of the output by reading the stored representation of the leaf array. Leaf apply operators also have pattern parameters. These patterns have the same meaning that they do for non-leaf applys: they specify which portions of the result array need to be generated. The ArrayDB preprocessor assigns to each leaf apply node patterns that indicate that the entire leaf array is to be generated. However, these patterns may be modified during the logical rewriting phase.
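The role of a leaf apply's pattern parameters can be sketched in a few lines. This is a hypothetical simplification, not ArrayDB's interface: for a two-dimensional tiled array, tile (x0, x1) is generated only if bit x0 of the dimension-0 pattern and bit x1 of the dimension-1 pattern are both set, and `read_tile` stands in for the externally-defined accessor function.

```python
def leaf_apply(read_tile, n_tiles, patterns):
    """Yield the selected tiles of a 2-D tiled array in row-major order."""
    p0, p1 = patterns
    for x0 in range(n_tiles[0]):
        for x1 in range(n_tiles[1]):
            if p0[x0] and p1[x1]:
                # Only masked-in tiles are ever read from storage.
                yield (x0, x1), read_tile(x0, x1)

# Select only tile-row 0, and tile-columns 0 and 2, of a 2 x 3 tiling:
got = list(leaf_apply(lambda i, j: f"tile{i}{j}", (2, 3),
                      ([1, 0], [1, 0, 1])))
assert got == [((0, 0), "tile00"), ((0, 2), "tile02")]
```

Assigning all-ones patterns, as the preprocessor initially does, generates every tile; logical rewriting may later clear bits so that fewer tiles are read.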

Once the query tree has been created, the preprocessor associates type information with each apply (leaf and non-leaf) in the query by consulting the ArrayDB catalogs. ArrayDB maintains three catalogs:

Function Catalog: The function catalog records the name, domain and range box shapes, and domain and range element type names of each external function known to ArrayDB.

Array Catalog: The array catalog records the name, shape, element type name, location, and accessor function name of each stored array known to ArrayDB.

Type Catalog: The type catalog records the name and representation size of each element type known to ArrayDB.

After the catalogs have been consulted, the preprocessor performs bottom-up type inference to determine the shape and element type of the array produced by each AML operator in the query. As was noted in Sect. 3.4, AML is designed so that this is always possible.

Finally, the preprocessor converts the AML query into the merge-balanced form that was defined in Sect. 3.2. Merge-balancing involves replacing unbalanced merge operations with expressions involving balanced merge operations and new leaf array constants. (All elements of these new arrays have the same value, and ArrayDB represents them using constant space, irrespective of the size of the array.) For example, the bottom unbalanced merge in Fig. 5 is balanced as illustrated in Fig. 8. In the worst case, merge-balancing may add up to 2dn nodes to the query tree, where n is the number of nodes before merge-balancing, and d is the maximum dimensionality of the arrays in the query [23].

[Fig. 8. Illustration of merge balancing. The unbalanced merge of arrays A and B is replaced by balanced merge operations (merge0 and merge1, with patterns such as 101, 110, and 1100) over A, B, and new DEFAULT arrays whose cells hold the default value δ.]

4.2 Logical rewriting

During logical rewriting, ArrayDB systematically transforms the AML query into an equivalent form that is expected to be more efficient to evaluate. Specifically, the logical rewriting phase aims to reduce the function cost of an AML query.

Definition 15 (Function cost). Suppose that an AML query Q contains k apply operators, including its leaves (which are treated like applys). Each apply operator applies its associated function some number of times. Let c_i represent the number of function applications performed by the ith apply operator. The function cost of Q, written cost(Q), is Σ_{i=1}^{k} c_i; i.e., it is the total number of function applications performed by Q.

The logical rewriting procedure rewrites an AML query so that its function cost is minimized, in a restricted sense that is explained below. Since no step in the rewriting procedure increases the number of applications of any function, function cost minimization means that no apply operation will perform more function applications in the rewritten query than it did in the original. We expect that a reduction in the number of function applications should lead to a reduction in the time required to evaluate the query. Furthermore, since data retrieval is modeled as function applications, reductions in the number of function applications in the query's leaves translate directly to reductions in the amount of disk I/O.

The logical rewriting procedure finds a query with minimum function cost from among the queries that are both equivalent to and apply-consistent with the original query Q. (A proof of this claim appears in [23].) Queries that are apply-consistent with Q apply the same functions, in the same order, as those that are applied by Q, although the number of applications of each function may vary.

Definition 16 (Apply-consistent). An AML query Q′ is apply-consistent with another AML query Q if there exists a total mapping m from the apply operations in Q′ to the apply operations in Q such that both of the following conditions hold:

• For every apply operation x in Q′, x and m(x) use the same external function.
• For all pairs x1, x2 of apply operations in Q′, if x1 precedes x2, then m(x1) precedes m(x2) in Q.

Even though ArrayDB's rewriting procedure finds a minimum-cost, apply-consistent equivalent query, it is possible that there are lower-cost equivalent queries that are not apply-consistent with the original. For example, it may be possible to transform apply(f_b, apply(f_a, A)) into an equivalent expression apply(f_c, A), where f_c is a composition of f_b and f_a. ArrayDB does not attempt to find such rewrites. Indeed, it cannot find them, since it knows nothing about external functions except their domains and ranges. (Even with such limited knowledge, adjacent functions could be composed in some very special cases. For example, if two adjacent functions f_a and f_b map scalar elements to scalar elements, a composite function f_c that calls f_a and f_b in sequence could be "created." ArrayDB does not attempt such rewrites.)

The equivalence theorems from Sect. 3 are the basis for ArrayDB's rewrite transformations. Figure 9 summarizes the transformations that are used during rewriting. ArrayDB applies these transformations by making d top-down passes through the query tree, where d is the maximum dimensionality of any array appearing in the query (this includes leaf arrays, the result array, and intermediate arrays produced by the query's operators). When the rewrite process visits a node x on its ith top-down pass, it attempts to
apply a rewrite at x if x is one of sub_i, merge_i, or apply; otherwise, x is ignored. If a rewrite can be applied at x, the query tree is modified as illustrated in Fig. 9, and the pass continues in the modified tree. (An alternative would be to try to apply every possible rewrite in all of the d dimensions at a node before proceeding to its children.) The time complexity of the rewriting procedure is O(d²n), where n is the number of nodes in the query tree prior to the first rewriting pass [23].

[Fig. 9. ArrayDB logical rewrites. The tables show the rewrites considered by ArrayDB during its ith top-down rewriting pass through the query tree; the "current" node before and after each transformation is indicated using bold lines. Rewrites for sub_i: 1 (Theorem 2), 2 (Theorem 3), 6 (Theorem 4), 7 (Theorem 5), 8 (Theorem 10), 9 (Theorem 11), 10 (Theorem 13). Rewrites for merge_i: 3 (Theorem 6), 4 (Theorem 7). Rewrites for apply: 5 (Theorem 12), 11 (Theorem 14).]

An example of the logical rewriting

Consider the AML query in Expression 8, which returns the lower-left quadrant of the TVI array (array D) from Fig. 1 in Sect. 1. We have assumed that the Thematic Mapper array A's shape is ⟨1024, 1024, 7⟩. That is, each of the seven bands is of shape ⟨1024, 1024⟩. The TVI array's shape will then be ⟨1022, 1022⟩. The two outermost sub operations clip the lower-left quadrant: sub1(1^511 0^511) keeps only the first 511 columns of the TVI array, and sub0(1^511 0^511) keeps only the first 511 rows. The remainder of the expression is simply the definition of the TVI array in terms of the Thematic Mapper array (array A), as given in Eq. (6). (The function f_A is the accessor function associated with the array A.)

sub1(1^511 0^511,
  sub0(1^511 0^511,
    apply(f_tvi,
      merge2(10,
        apply(f_nr, sub2(0010000, apply(f_A, A))),
        apply(f_nr, sub2(0001000, apply(f_A, A)))))))    (8)

ArrayDB's logical rewriting procedure produces Expression 9 from Expression 8. In the unoptimized expression (Expression 8), the apply(f_A, A) operations read in the entire Thematic Mapper array one band at a time, and the subsequent sub2 operations filter all but the desired bands. In the optimized formula, the sub2 operations have been pushed into the apply operations below them. Instead of reading the entire array, each of these leaf apply operations now reads only the band that is required for the computation, as specified by the patterns P2 in the applys.

apply(f_tvi,
  merge2(10,
    apply(f_nr,
      sub0(1^513 0^511,
        sub1(1^513 0^511,
          apply(f_A, A, P2 = 0010000)))),
    apply(f_nr,
      sub0(1^513 0^511,
        sub1(1^513 0^511,
          apply(f_A, A, P2 = 0001000))))))    (9)

Similarly, the two outer sub operations that performed the clipping in Expression 8 have been pushed through the applications of f_tvi and f_nr. (In this example, they cannot be pushed all the way into the leaf apply operations, since the accessor function f_A reads the stored Thematic Mapper array a full band at a time.) As a result, many fewer applications of f_nr and f_tvi are needed to evaluate Expression 9 than Expression 8. Notice that as the clipping subs are pushed through
the apply operations, the clipping window expands slightly. The larger window is required so that f_nr can be applied properly to elements on the window's boundary. By exploiting its knowledge of the shapes of the domain and range boxes of the applied functions, ArrayDB's rewrite procedure is able to determine when such adjustments are necessary.

4.3 Plan generation

ArrayDB's plan generator generates a query evaluation plan from an AML expression. A query evaluation plan consists of a tree of physical operators. Each (non-leaf) operator in the plan consumes data produced by its children. The root operator produces the query result. Conceptually, each physical operator produces an array, and consumes the arrays produced by its children. However, operators do not normally fully materialize the arrays that they produce. Instead, arrays are produced and consumed a chunk at a time. Chunks are non-overlapping rectangular subarrays.

ArrayDB's physical operators are iterators [11]. Specifically, they are array chunk iterators. A chunk iterator produces its output chunks one at a time, in response to NextChunk requests from its parent in the query plan. To obtain the data it needs to produce a chunk, a chunk iterator may, in turn, make one or more NextChunk requests to its children.

A chunk iterator produces the chunks of its output array in a particular order, e.g., column-major order or row-major order. Similarly, it expects to be able to consume its input chunks in a particular order. In a plan, successive operators must have compatible chunk shapes and chunk generation orders. For example, if an operator produces chunks of shape ⟨3, 3⟩ in row-major order, then its parent must consume ⟨3, 3⟩ chunks in row-major order.

The plan generator produces plans in which the chunk shapes of successive operators are compatible. It leaves the operators' chunk generation orders unspecified. Chunk generation orders are chosen during the plan refinement phase, which is described in Sect. 4.4.

4.3.1 ArrayDB physical operators

ArrayDB has six physical operators: apply_p, replicate_p, regroup_p, combine_p, leaf_p, and reorder_p. (The suffix "_p" emphasizes that these are physical operators.) The physical operators are summarized in Fig. 10.

[Fig. 10. Properties of ArrayDB's physical operators:
- apply_p: arity 1; parameters: external function f; memory cost (buffer space required in i-order): |R_f| elements; restrictions: input chunk shape is D_f, output chunk shape is R_f.
- replicate_p: arity 1; parameters: output chunk shape (C_out), chunk order, patterns; memory cost: C_out[i] i-slabs of the input array, plus |C_out| elements; restrictions: input chunks must be of unit size.
- regroup_p: arity 1; parameters: chunk order, input chunk shape (C_in); memory cost: C_in[i] i-slabs of the input array, plus 1 element; restrictions: output chunks are of unit size.
- combine_p: arity k (k ≥ 1); parameters: chunk order, filter patterns, write patterns; memory cost: 1 element; restrictions: input and output chunks are of unit size.
- leaf_p: arity 0; parameters: accessor function (f), chunk order, patterns; memory cost: |R_f| elements.
- reorder_p: arity 1; parameters: input chunk order, output chunk order, chunk shape (C); memory cost: entire input array, plus |C| elements; restrictions: input and output chunk shapes are identical.]

apply_p: The apply_p operator applies an externally-defined function to each input chunk. Each function application produces one output chunk. The input chunk shape of an apply_p operator matches the domain box shape of the function it is applying, and its output chunk shape matches the function's range box shape.

leaf_p: The leaf_p operator is similar to the apply_p operator, except that leaf_p does not take input from other operators. Instead, leaf_p operators feed stored array data into a plan. ArrayDB assumes that arrays are stored using regular tiling [35,9]. A regularly tiled array is (conceptually) made up of non-overlapping subarrays (tiles). All of the tiles are of the same shape, and they completely span the array. The individual array elements within a tile are stored contiguously on a physical storage device such as a disk.

On each NextChunk call, leaf_p invokes an externally-defined accessor function to retrieve one array tile. Thus, the leaf_p operator's output chunks have the same shape as the tile shape.

A leaf_p operator can be configured so that it will produce some, but not all, of the chunks of the stored array. The chunks to be produced are determined by a set of bit patterns which are supplied to the leaf_p operator as parameters. A leaf_p operator takes one pattern parameter for each dimension of its stored array. The patterns act as masks on the slabs of array tiles (chunks) in each dimension. Specifically, the tiles in the jth slab in dimension i are produced by the leaf_p operator only if the jth bit of the ith pattern parameter is set.

regroup_p: The regroup_p operator obtains chunks of a given shape (C_in), and produces chunks of unit size. Figure 11 illustrates the effect of a regrouping operation on a ⟨6, 6⟩ array with an input chunk shape of ⟨2, 2⟩. A regroup_p operator neither generates nor destroys array elements. Its output array is identical to its input array, except that it is chunked differently.

In general, a regroup_p operator will have to buffer the chunks that it consumes so that it can produce its output chunks in the proper order. For example, in the scenario illustrated in Fig. 11, the regroup_p operator will have
to buffer each row of three input chunks. (Alternatively, it would be possible for regroup_p to reopen its input operator and reread chunks from its input array. However, regeneration of the input array may be substantially more expensive than reusing buffered chunks, and this operation is not supported in ArrayDB.) The amount of buffering required depends on the shape of the array and on the shapes of the input chunks. In general, it may also depend on the order in which the regroup_p operator produces and consumes the chunks. This issue is discussed in more detail in Sect. 4.4.

[Fig. 11. Regrouping a ⟨6, 6⟩ array with ⟨2, 2⟩ input chunks, in row-major order (0-order). Input chunks are consumed in row-major order, as illustrated by the arrow. Output chunks (which have unit size) are produced in row-major order.]

replicate_p: A replicate_p operator consumes chunks of unit size and produces output chunks with a specified shape C_out. The operator produces all possible (overlapping) subarrays of shape C_out of its input array. Figure 12 shows an example of its behavior. Like regroup_p, replicate_p must, in general, buffer input chunks, because it has to produce output chunks in the proper order, and because each element of the input array may appear in more than one output chunk.

Like leaf_p, replicate_p can be configured so that it will produce some, but not all, of the possible output chunks. The chunks to be produced are specified by a set of pattern parameters, one for each dimension of the input array. As was the case for leaf_p, the patterns act as masks. The replicate_p produces the chunk at position x only if, for all dimensions 0 ≤ i < d, the x[i]th bit of the ith pattern parameter is set.

[Fig. 12. Replicate_p in row-major order with output chunks of shape ⟨2, 2⟩ applied to a ⟨4, 4⟩ array. Elements of the input array are numbered to indicate the order in which they will be consumed by the operator. The output chunks are produced in row-major order, and the numbers in the cells of the output array indicate the input cells to which they correspond.]

combine_p: ArrayDB's combine_p operator was designed to replace a logical subtree made up entirely of sub and merge operations. If the subtree has x input arrays (x ≥ 1), then the combine_p operator will be x-ary. Replacing such a subtree by a single combine_p operator avoids the generation of intermediate arrays.

ArrayDB's implementation of combine_p requires that the input and output chunks be of unit size. That is, it consumes its input arrays one element at a time, and produces its output array one element at a time.

The behavior of a combine_p operator is controlled by a set of pattern parameters. For each of its input arrays, a combine_p operator accepts one filter pattern and one write pattern for each dimension of the input array. (Filter patterns and write patterns are computed from the patterns of the sub and merge operations that the combine_p operator replaces. The details of the computations are given in [23].) The filter patterns act as masks to determine which of the input array's elements will appear in the output of combine_p. An array element in the jth slab in dimension i of the input array will appear in the output array only if the jth bit of the ith filter pattern is set. The write patterns determine where the unfiltered input elements will appear in the output. If the jth set bit of the dimension-i write pattern occurs in the kth position, then elements from the jth unfiltered i-slab in the input are placed in the kth i-slab in the output. The net effect of the filter and write patterns is to map, in each dimension i, i-slabs of the input array to i-slabs of the output array. The mapping is one-to-one and onto, and is, in general, partial. Figure 13 shows an example of the operation of a combine_p operator.

[Fig. 13. Combine_p with two input arrays. Array element labels indicate which input array elements appear in the output, and where they appear.]

reorder_p: The reorder_p operator reorders the chunks of its input array. It does not change the chunk shape, and it does not affect the values of the chunk elements. For example, a reorder_p operator can consume array chunks in row-major order and produce them in column-major order, as illustrated in Fig. 14. Reorder_p buffers its entire input array so that it can produce its output chunks in the correct order. Note that, in the sense of relational query optimization, chunk order is a physical property, that is, a property of the implementation plans that is not visible at the logical level. Reorder_p then plays the role of a physical property enforcer.

[Fig. 14. A reorder_p operator with ⟨2, 2⟩ chunks, row-major input, and column-major output.]
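The pipelined, demand-driven iterator model described above can be sketched with Python generators. This is an illustrative simplification, not ArrayDB's actual interface: a generator's next-value request plays the role of NextChunk, `leaf_p` stands in for an accessor function over stored tiles, and `apply_p` applies a function to each input chunk to produce one output chunk.

```python
def leaf_p(tiles):
    """Accessor stand-in: yield stored tiles one NextChunk call at a time."""
    for tile in tiles:
        yield tile

def apply_p(f, child):
    """Apply f to each input chunk; each application yields one output chunk."""
    for chunk in child:        # implicit NextChunk request to the child
        yield f(chunk)

tiles = [[1, 2], [3, 4]]       # two stored "tiles" of a tiny array
plan = apply_p(lambda c: [v * 10 for v in c], leaf_p(tiles))
result = list(plan)            # drive the pipeline to completion
assert result == [[10, 20], [30, 40]]
```

Because each operator pulls chunks only on demand, no operator materializes its whole output array, which is exactly the property that makes the memory costs of Fig. 10 depend on chunk shapes and orders rather than on array sizes.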

4.3.2 Plan generation algorithm

A query evaluation plan is generated by a recursive, top-down translation of an AML expression tree. The AML operators are translated into physical operators as follows.

Non-leaf apply: If the root operator of the AML expression tree is a non-leaf apply with domain box D_f and range box R_f, an apply_p operator, a replicate_p operator, and a regroup_p operator are added to the plan as shown in Fig. 15. The apply_p operator's input chunk shape is D_f and its output chunk shape is R_f. The replicate_p operator generates only those chunks required by the subsequent apply_p operator. Thus, the pattern arguments for the replicate_p are the pattern arguments of the AML apply operator. The purpose of the regroup_p operator is to change the chunks produced by the input expression T to the unit chunks required by the subsequent replicate_p operator. Note that the regroup_p and replicate_p operators change, in two steps, the (arbitrary) chunk shape produced by input expression T to match the domain box shape of the apply's external function.

Leaf apply: If the root node of the expression tree is a leaf apply, a leaf_p operator is generated. The leaf_p operator's patterns are those of the leaf apply operator in the AML query.

Sub or merge: If the root operator of the AML expression tree is a sub or a merge, the plan generator finds the maximal tree of sub and merge operations rooted there. The tree is translated into a k-ary combine_p operator and k regroup_p operators, where k is the number of leaves of the tree. This translation is shown in Fig. 16. The combine_p operator's write and filter patterns are derived from the sub and merge patterns. The purpose of the regroup_p operations is to produce chunks of unit size, as required by the combine_p operator.

The plan generation algorithm converts the AML expression given in Eq. (9) for the optimized TVI query to the iterator plan shown in Fig. 17a.

[Fig. 15. Plan for an apply node: an AML apply over input expression T is translated into a regroup_p, a replicate_p, and an apply_p.]

[Fig. 16. Plan for a subtree made up of sub and merge nodes: the subtree over inputs T1, T2, and T3 is translated into regroup_p operators feeding a combine_p.]

4.4 Plan refinement

The plan refinement phase minimizes the memory cost of an AML iterator plan.

Definition 17 (Memory cost). The memory cost of an AML iterator plan is the sum of the buffer space requirements of the individual operators in the plan. The memory costs of individual plan operators are defined in Fig. 10.

Reducing the memory cost of a plan is important because it can make the difference between a plan that can execute entirely in memory and one that cannot. In the latter case, it is necessary to split the plan by materializing partial results on secondary storage, with a corresponding increase in execution cost.

The plan refinement phase achieves memory cost reduction in two ways. First, it eliminates unnecessary operations from the plan. Second, it determines the order in which each plan operator will produce and consume array chunks. The orders are chosen so that the plan's memory cost is minimized.

The first task is relatively straightforward. A regroup_p operator is unnecessary if its input and output chunk shapes are the same. A combine_p operator is unnecessary if: (1) it has only one child; and (2) its filter patterns consist only of ones. Eliminating unnecessary regroup_p and combine_p operations not only reduces memory cost, but also avoids unnecessary data copying. The plan shown in Fig. 17a contains five unnecessary regroup_p nodes (indicated by arrows). They are deleted during the plan refinement phase, resulting in the plan shown in Fig. 17b.

The second task is more involved. As was noted in Sect. 4.3, each physical operator produces and consumes array chunks in a particular order. Many chunk orders are possible. The chunk order is important because it can affect the number of chunks that must be buffered by regroup_p and replicate_p, thus affecting the memory costs of these two operators.

Figure 18 illustrates how chunk order affects memory cost for the case of a regroup_p operator with a ⟨2, 4⟩ input chunk shape operating on an ⟨8, 8⟩ array. The left-hand side of the figure illustrates how the operation is performed in row-major order. The solid arrow indicates the order in which the output chunks (unit size) must be produced. Clearly, the regroup_p operator must buffer at least two input chunks if it is to produce the output chunks in the proper row-major order. The right-hand side of the figure shows the same operation performed in column-major order. In this case, the regroup_p must buffer four input chunks. Modifying the shape of the array would change this comparison. For example, if the array were wider, the memory requirement for row-major order would increase, but the requirement for column-major order would remain unchanged.

Because operators consume the chunks of their input arrays in the order in which they are produced, the chunk orders for the operators in a plan are not independent of one another. Nevertheless, a producer and a consumer can use different chunk orders if a reorder_p operator is inserted between them in the plan. A reorder_p operator itself has a memory cost, since it must buffer chunks to reorder them. To determine the best chunk order for each operator, the optimizer must balance the additional cost of reordering with the potential downstream benefits it may bring.

ArrayDB uses a dynamic programming algorithm to select the chunk orders of a plan's operators. If an operator's input array consists of k chunks, there are k! ways those chunks

A.P. Marathe, K. Salem: Query processing techniques for arrays 81

[Figure 17 shows the initial (a) and refined (b) iterator plans as trees of LEAF_P, REGROUP_P, COMBINE_P, REPLICATE_P, and APPLY_P operators, with chunk shapes such as ⟨1, 1⟩, ⟨3, 3, 1⟩, and ⟨1024, 1024, 1⟩ labeling the edges.]

Fig. 17. Initial (a) and refined (b) plans for the optimized TVI query (Eq. 9). Each edge is labeled with the shape of the chunks that flow along that edge. Arrows are used to indicate operators in the initial plan that are unnecessary. These are eliminated in the refined plan

[Figure 18 shows a regroup_p with a ⟨2, 4⟩ input chunk shape consuming its chunks in row-major order (0-order) on the left and in column-major order (1-order) on the right.]

Fig. 18. Regrouping in 0-order and in 1-order

could be ordered. The optimizer does not consider all such orderings. Instead, it considers d possible iteration orders for each operator, where d is the maximum dimensionality of any array appearing in the AML plan. Specifically, it considers i-order, for 0 ≤ i < d, as described in Definition 18. For example, 2-order means the chunks are sorted in dimension 2, then dimension 0, then dimension 1, then dimension 3, then dimension 4, and so on. In two-dimensional arrays, 0-order is row-major order, and 1-order is column-major order. Other orders, such as the Z-curve or the Hilbert curve [1], are also possible, and possibly even useful, especially if chunks in the base arrays have been laid out in such an order on secondary storage. For simplicity's sake, the optimizer does not consider them.

Definition 18 (Chunk i-order). Chunk i-order (i-order for short) for an array A is determined by sorting the chunks of A using their positions in dimension i as the primary sort key, and their positions in the remaining dimensions, in order of increasing dimension number, as secondary sort keys.

For an n-operator plan, there are d^n possible assignments of chunk orders to plan operators. The dynamic programming algorithm finds a minimum memory cost assignment in O(nd^2) time. The algorithm proceeds bottom up through a plan tree. At each node x, it determines for each 0 ≤ i < d the minimum total memory cost of the plan subtree rooted at x, assuming that x's output is in i-order. This cost, denoted by C_i(x), is determined for each non-leaf node by the formula:

C_i(x) = c_i(x) + Σ_{y ∈ X} min( C_i(y), min_{j ≠ i} ( C_j(y) + c_{ji}(reord(y)) ) ),   (10)

where X is the set of children of x, c_i(x) is the memory cost of operator x itself in i-order, and c_{ji}(reord(y)) is the cost of a j-order to i-order reorder_p operator inserted between y and x in the plan. In other words, to produce x's result in i-order, each child of x either produces its result in i-order, or it produces its result in some other order and a reorder_p is inserted after that child to convert its output to i-order before it reaches x. If x is a leaf_p operator, then C_i(x) = c_i(x).
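The recurrence just given can be sketched in a few lines of Python. This is a simplified illustration only, not ArrayDB's implementation: the PlanNode class, the per-order cost list op_cost, and the reorder_cost table are hypothetical stand-ins for ArrayDB's operator costing methods.

```python
# Simplified sketch of chunk-order selection via Eq. (10); PlanNode,
# op_cost, and reorder_cost are hypothetical stand-ins, not ArrayDB's
# actual interfaces.

class PlanNode:
    def __init__(self, op_cost, children=()):
        self.op_cost = op_cost          # op_cost[i]: c_i(x), cost of x in i-order
        self.children = list(children)

def subtree_costs(x, d, reorder_cost):
    """Return C, where C[i] is the minimum memory cost of the subtree
    rooted at x when x's output is produced in i-order (Eq. 10)."""
    if not x.children:                  # leaf_p: C_i(x) = c_i(x)
        return list(x.op_cost)
    child_tables = [subtree_costs(y, d, reorder_cost) for y in x.children]
    C = []
    for i in range(d):
        total = x.op_cost[i]
        for Cy in child_tables:
            # Each child y either produces i-order directly, or produces
            # j-order and a reorder_p (cost c_ji) is inserted after it.
            total += min(Cy[j] + (0 if j == i else reorder_cost[j][i])
                         for j in range(d))
        C.append(total)
    return C

# A toy two-level plan with d = 2 iteration orders (0-order and 1-order):
reorder = [[0, 3], [3, 0]]              # reorder[j][i]: j-order -> i-order cost
plan = PlanNode([2, 2], [PlanNode([4, 1]), PlanNode([1, 4])])
costs = subtree_costs(plan, 2, reorder)  # costs[i] = C_i(root)
```

The full algorithm additionally records the per-child "choice" entries described below, so that a final top-down pass can fix each operator's order; the sketch returns only the cost table.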

With each plan tree node x, the dynamic programming algorithm associates a cost table with d entries. The ith entry is of the form (C_i(x), choice_1, choice_2, ..., choice_|X|). choice_j records the iteration order for the jth child (1 ≤ j ≤ |X|) that was used to achieve the minimum subtree cost C_i(x). When the dynamic programming algorithm finishes, d plans are available to evaluate the AML expression, each one generating the result array in a certain order. Out of these d plans, the cheapest plan is chosen for evaluation. The chunk orders of the operators in the cheapest plan are determined using a final top-down traversal of the plan tree to select the appropriate "choice" entries from the cost tables. When the chunk orders of two successive operators differ, a reorder_p operator is inserted between them in the plan.

The cost c_i(x) of a particular operator x depends on details of its implementation, such as the granularity of memory allocation. Each ArrayDB operator has an associated costing method which can be invoked by the optimizer to obtain a cost estimate for evaluation of that operator. The cost estimate used for each of ArrayDB's physical operators — the buffer space required to implement the physical operation in a certain i-order — is given in Fig. 10. These estimates assume that the unit of memory allocation is a slab of input chunks. In addition, each operator is charged for a buffer to be used to pass output chunks to its parent in the query plan. This buffer is just large enough to hold one output chunk.

5 The query suite

One way to evaluate performance of a DBMS is to run it on a benchmark. Since there are no benchmarks for array database systems, we created a suite of array queries to be used to measure ArrayDB's performance. The queries in the suite are described in this section. The empirical results obtained by measuring ArrayDB's performance on the queries in the suite are presented in Sect. 6.

The suite contains five queries from the digital image processing domain. For easy reference, the queries in the suite are given the following names: TVI, NDVI, DESTRIPE, MASK, and WAVELET. TVI, NDVI, and DESTRIPE are based on common image processing operations described in [20]. MASK was inspired by a query described in [21]. WAVELET uses wavelet reconstruction as a method of constructing a high-resolution image from four low-resolution images [38]. For simplicity and uniformity, all the queries except WAVELET are constructed such that they manipulate one or more bands of a multi-band satellite image such as the image A shown in Fig. 1. For brevity, bands 1 through 7 of that image will be denoted by the names A1 through A7.

5.1 DESTRIPE

The destriping procedure [20, p. 483] — a noise removal operation — is an example of an image rectification and restoration operation. Such operations correct distorted or degraded image data to create a more faithful representation of the original scene.

Some multi-spectral scanners aboard satellites sweep multiple scan lines simultaneously. To do that, they have multiple detectors in each band. The multiple detectors — six, for example — are carefully calibrated and matched prior to the satellite launch. However, their radiometric response tends to drift over time, resulting in relatively higher or lower values along every sixth line in the image data (for example). Valid data is present in the defective lines, but it must be normalized with respect to its neighboring observations. The normalization is performed by subtracting a value δ from every sixth line in the original image. The value δ is determined by computing a histogram for scan lines 1, 7, 13, and so on; a second one for lines 2, 8, 14, and so on; and so forth. These histograms are compared in terms of their mean and median values to arrive at the value of δ. Lillesand and Kiefer show an illustration of the destriping procedure [20, p. 484].

For concreteness, let δ = 25. Suppose that the apply function deduct25 with unit-sized domain and range boxes performs the noise removal for one pixel value. The apply pattern in dimension 0 can be used to apply deduct25 selectively to the scan lines 1, 7, 13, and so on. The corrected lines can then be merged with a subsampled version of the original image where the problem lines have been eliminated. In the AML expression below, it is assumed that destriping is performed on band five. The AML expression for A5 is sub_2(0000100, A); the other bands can also be extracted from A similarly.

merge_0(10^5, apply(deduct25, A5, P_0 = 10^5), sub_0(01^5, A5))   (11)

5.2 TVI

Computing vegetation indices using between-band differences and ratios is a commonly used image enhancement method. Image enhancement aims to create enhanced images from the original image data to increase the amount of information that can be visually interpreted from the data.

The TVI computation and its expression in AML have already been described in detail in Sect. 1 and Sect. 3, respectively. The following AML expression for TVI is just an abbreviated form of Eq. (6).

apply(tvi, merge_2(10, apply(nr, A3), apply(nr, A4)))   (12)

In the above expression, D_tvi = ⟨1, 1, 2⟩, R_tvi = ⟨1, 1⟩, D_nr = ⟨3, 3⟩, and R_nr = ⟨1, 1⟩.

5.3 NDVI

Like TVI, NDVI (normalized difference vegetation index) is also a vegetation index. NDVI is computed from data in the AVHRR (advanced very high-resolution radiometer) sensor's bands one and two using the formula

NDVI = (b2 − b1) / (b2 + b1),   (13)

where b1 and b2 represent data from bands one and two, respectively [20, p. 448]. Vegetated areas have positive NDVI values; areas with clouds, water, and snow have negative NDVI values; rock and bare soil give NDVI values near 0. It is preferable that the data values b1 and b2 be in terms of radiance or

reflectance [20, p. 448],9 rather than in units of pixel intensities. Suppose that the pixel intensities in bands A1 and A2 are in the range 0 to 255. Pixel intensity and absolute radiance are related to each other by the following formula [20, p. 481]:

b_out = ((LMAX − LMIN) / 255) · b_in + LMIN.   (14)

Here, b_out is the absolute spectral radiance value, b_in is the pixel intensity, LMIN is the spectral radiance corresponding to the pixel intensity of 0, and LMAX is the spectral radiance required to generate the maximum pixel intensity of 255. The constants LMIN and LMAX are sensor-specific.

9 Radiance is a measure of the "brightness" of a point on the ground, whereas reflectance is a measure of the amount of light reflected by a surface. Radiance and reflectance are related [20, p. 22].

Suppose that the apply function dn2ar performs the conversion described by Eq. (14), and that the apply function ndvi computes the NDVI as per Eq. (13). The AML query for the NDVI computation can now be given as follows.

apply(ndvi, merge_2(10, apply(dn2ar, A1), apply(dn2ar, A2)))   (15)

In the above expression, dn2ar has unit-sized domain and range boxes, D_ndvi is ⟨1, 1, 2⟩, and R_ndvi is ⟨1, 1⟩.

5.4 MASK

MASK is an example of an image classification operation. Image classification categorizes all the pixels in a digital image into one of several classes. MASK's computation is described as follows [21]: In an image, retrieve all the pixels whose intensities, when averaged with all the neighboring pixels, are between two constant values, say 10 and 100.

The result pixels of the MASK query might not form an AML array and therefore, MASK's result is defined to be a binary image containing a '1' in each position where the pixel satisfies the criterion, and a '0' in all the other positions — these are the two classification classes.

Suppose that band one contains the original n × n image, and that the function avg9 with D_avg9 = ⟨3, 3⟩ and R_avg9 = ⟨1, 1⟩ calculates the average of the nine pixels (a central pixel and its eight neighbors), compares it to the two constants 10 and 100, and returns either '0' or '1'. The AML expression for MASK is as follows.

apply(avg9, A1)   (16)

Due to apply's semantics, the output array of MASK has the shape ⟨n − 2, n − 2⟩. If necessary, such a mask can be expanded — using merge operators — by adding two rows and two columns to it. The boundary pixels can be arbitrarily assigned to the class '0'. (Other ways of handling the boundary condition are also possible.)

5.5 WAVELET

WAVELET's computation is an example of multi-resolution image processing. In multi-resolution image processing, images need to be viewed at multiple resolutions. For example, in remote sensing, the spatial resolution required to study an urban area is usually much different than that needed to study an agricultural area or the open ocean [20, p. 599]. The wavelet transform is one way to decompose an image into many components so that the image can be reconstructed at multiple resolutions as needed. To understand how wavelet reconstruction works, it is first necessary to describe the wavelet-based image decomposition.

Figure 19 shows an n × n image A on the left. Wavelet decomposition transforms each row of A as follows. A row is logically divided into n/2 groups of two adjacent pixels each. (n is even.) Suppose that the pixel values in a group are b and c. As per the wavelet transform with the Haar basis [38], two functions h1 and h2, defined by the following equations, are applied to b and c.

h1 = (b + c)/2   (17)
h2 = (b − c)/2   (18)

Notice that h1 + h2 = b and that h1 − h2 = c. That is, the transform is invertible without loss of information.

In Fig. 19, image B gathers the results of all the h1 function applications, and image C gathers the results of all the h2 function applications. Images B and C have shapes n × n/2. Next, the decomposition just described is applied to all the columns in images B and C. As a result, the column lengths shrink by half, and a set of four n/2 × n/2 images D, E, F, and G is generated. D contains the low-frequency components of A, whereas G contains the high-frequency components of A. The decomposition may then proceed recursively on the image D. (n is conveniently chosen to be a power of two.) The decomposition ends when a set of "small" — for example, 32 × 32 — images is generated.

[Figure 19 shows an n × n image A decomposed by the h1 and h2 functions, first into images B and C, and then into the four images D, E, F, and G; D holds the low-frequency components of A, and G the high-frequency components.]

Fig. 19. Wavelet decomposition

[Figure 20 shows the reverse process: the four low-resolution images D, E, F, and G are combined by the functions ĥ and k̂, via the intermediate images H, I, B, C, and J, into the high-resolution image A.]

Fig. 20. Wavelet reconstruction

Wavelet reconstruction combines four low-resolution images to form a high-resolution image. Figure 20 illustrates

wavelet reconstruction. Image names have been retained from Fig. 19. Suppose that D, E, F, and G are n/2 × n/2 images. Wavelet reconstruction begins by combining D and E by putting one atop the other in dimension 2 to generate the image H. Likewise, F and G combine to form I.10 Suppose that (d, e) is a pair of matching pixels in H, with d coming from D and e from E. According to the Haar wavelet transform, two functions ĥ1 and ĥ2 are applied to the pair (d, e) as follows.

ĥ1 = d + e   (19)
ĥ2 = d − e   (20)

10 These two steps are unnecessary; they are included only because later on in this section, AML will be used to express the wavelet reconstruction computation. Having these steps facilitates a simple translation of wavelet reconstruction to AML.

In Fig. 20, the function ĥ performs the tasks of ĥ1 and ĥ2 by producing a 2 × 1 array with values (d + e, d − e) as output for each pair of pixels (d, e). Therefore, the result of applying ĥ to H (image B) is twice as high as H. Similarly, ĥ applied to I produces the image C. The images B and C of shapes n × n/2 are put one atop the other to form the image J.11 The function k̂ is similar to ĥ except that one application of k̂ produces a 1 × 2 array. Therefore, applying k̂ to J produces an n × n high-resolution image A. Wavelet reconstruction can continue on the image A by combining it with three other n × n images.

11 Once again, this step is performed only because it facilitates a simple translation of wavelet reconstruction to AML.

Both wavelet decomposition and wavelet reconstruction can be expressed using AML queries; the following description only shows how wavelet reconstruction is achieved using AML. Specifically, it is shown how AML can express one step of wavelet reconstruction whereby the four low-resolution images D, E, F, and G in Fig. 20 combine to form the high-resolution image A. The four low-resolution images are typically stored together in one array. Suppose that the array X stores D, E, F, and G concatenated in dimension 0. D can be extracted from X as follows; the other three images can be extracted from X similarly.

D = sub_0(1^{n/2} 0^{3n/2}, X)   (21)

The AML expressions for the images B, C, and A are as follows. (D_ĥ = ⟨1, 1, 2⟩, R_ĥ = ⟨2, 1, 1⟩, D_k̂ = ⟨1, 1, 2⟩, and R_k̂ = ⟨1, 2, 1⟩.)

B = apply(ĥ, merge_2(10, D, E))   (22)
C = apply(ĥ, merge_2(10, F, G))   (23)
A = apply(k̂, merge_2(10, B, C))   (24)

It is an interesting fact that all of the wavelet decomposition and reconstruction transforms (and not just the ones with the Haar basis functions that we have chosen) have recursive structures similar to the ones shown in Fig. 19 and Fig. 20. Therefore, AML can express all such transforms.

6 Experimental results

This section presents an empirical evaluation of ArrayDB. The evaluation is intended to answer two questions. First, are the query optimization techniques presented in Sect. 4 effective? That is, do they reduce the cost of evaluating an array query? The results presented in Sects. 6.2.1 and 6.2.2 show that ArrayDB's optimizations can significantly reduce the time and space required for query evaluation with little optimization overhead. Second, how efficient are ArrayDB's optimized, iterator-based evaluation plans? In absolute terms, can ArrayDB execute array queries quickly? The results presented in Sect. 6.3 show that ArrayDB's query evaluation times are close to those of custom, hand-coded programs in some cases, but not in others. These results suggest several avenues of improvement for ArrayDB.

6.1 Workload and evaluation environment

In all of the experiments described in this section, the workload consists of the query suite described in Sect. 5. Figure 21 summarizes the default properties of the workload, which are applicable unless otherwise indicated. For each of the first four queries, the input array is a seven-band, 1024 × 1024 multi-spectral image of the Washington, DC area. For the WAVELET query, the input array consists of four concatenated 512 × 512 images produced by the wavelet decomposition procedure described in Sect. 5.5. (For WAVELET, n in Eq. (21) is 1024.) Unless specified otherwise, all input arrays were laid out in 4 KB tiles of shape ⟨64, 64, 1⟩. The output chunk shapes of the ArrayDB leaf_p operators match the tile shape, so that any stored tile can be retrieved with a single I/O operation.

All experiments were run on a Sun Ultra-10 computer with 128 MB of main memory, running the Solaris 2.6 operating system. The "direct I/O" feature of Solaris 2.6 was used to bypass the file system's buffer cache, so all runs were effectively cold runs. The machine was run in single-user mode to minimize the pollution of measured wall-clock running times by operating system multi-tasking.

Unless otherwise indicated, each reported running time is an average over approximately twenty runs. Confidence intervals were calculated at a 99% confidence level. Confidence intervals are not shown in the results unless they are at least 5% of the mean running time. This was done to reduce clutter in the graphs.

In the descriptions of empirical results, the phrase "optimization on" means that all of the AML query optimizations discussed in this paper were enabled; the phrase "optimization off" means that the logical rewriting step and the step in the plan refinement phase that deletes no-op physical operators from an AML plan were disabled.

6.2 Effectiveness of optimization

This paper describes two important array query optimization techniques. The first one saves disk I/O and CPU time by avoiding the reading and processing of array data that are not needed to compute the query result. The experiments reported in Sect. 6.2.1 demonstrate the effectiveness of this technique. The second technique reduces the buffer space requirement of an array query plan by controlling iteration orders. The experiments reported in Sect. 6.2.2 show the effectiveness of this technique.

Query      Input array shape   Input array size (MB)   Input tile shape   Input tile size (KB)   AML query expression
TVI        ⟨1024, 1024, 7⟩     7                       ⟨64, 64, 1⟩        4                      Equation 12
NDVI       ⟨1024, 1024, 7⟩     7                       ⟨64, 64, 1⟩        4                      Equation 15
DESTRIPE   ⟨1024, 1024, 7⟩     7                       ⟨64, 64, 1⟩        4                      Equation 11
MASK       ⟨1024, 1024, 7⟩     7                       ⟨64, 64, 1⟩        4                      Equation 16
WAVELET    ⟨2048, 512⟩         1                       ⟨64, 64⟩           4                      Equation 24

Fig. 21. Query workload characteristics

[Figure 22 shows a clipping window, with axes labeled x and y, positioned at the center of the output array.]

Fig. 22. Clipping window

6.2.1 Rewrite optimization

To measure the effect of rewrite optimization, we applied a clipping window to the results of each of the queries in the suite, as illustrated in Fig. 22. The clipping fraction is defined as the size of the clipping window divided by the size of the full output array. Clipping is implemented using two AML sub operations. If the original workload query expression is Q, and the clipping fraction is f^2, the expression for the clipped query is:

sub_1(0^{(1−f)|Q|[1]/2} 1^{f|Q|[1]} 0^{(1−f)|Q|[1]/2}, sub_0(0^{(1−f)|Q|[0]/2} 1^{f|Q|[0]} 0^{(1−f)|Q|[0]/2}, Q)),   (25)

where |Q| is the shape of the result of query Q. We expect the query evaluation time to vary with the clipping fraction.

Figure 23 shows the wall-clock query running times as a function of the clipping fraction, with optimization on. Figure 24 shows the corresponding speedup curves. Query evaluation times decrease with the clipping fraction, as expected, because the optimizer is able to "push" the clipping window down into the query expression, reducing the number of function applications and the number of stored tiles to be retrieved from the disk.

[Figure 23 plots running time (wall-clock) in seconds against clipping fractions 1/1 through 1/256 for TVI, NDVI, DESTRIPE, MASK, and WAVELET.]

Fig. 23. Running times of clipped queries with optimization on

[Figure 24 plots speedup against clipping fractions 1/1 through 1/256 for the five queries, together with the ideal speedup line.]

Fig. 24. Speedup curves for clipped queries with optimization on

Ideally, query evaluation times should be proportional to the clipping fraction. However, as Fig. 24 shows, the speedup falls off as the clipping fraction gets smaller. There are several reasons for this. First, there is some query evaluation overhead — for example, optimization time and plan initiation time — that is independent of result size. Second, I/O costs do not decrease smoothly with the clipping fraction because tile sizes are fixed. Any tile that is at least partially covered by the clipping window is retrieved by the query evaluation plan.

When rewrite optimization is disabled, evaluation times for the clipped queries are essentially independent of the clipping fraction. (This is not shown in Fig. 23.) When optimization is disabled, the query plan generates the full query result and then clips it. Not surprisingly, the cost of generating the full query result dominated the query evaluation times.

In all of the experiments that we performed, query optimization times were insignificant compared to the query evaluation times, unless the query evaluation time was very small. In absolute terms, the query optimization time did not exceed 0.3 s in any of our experiments.

6.2.2 Optimization of iteration order

ArrayDB's optimizer uses a dynamic programming algorithm to select the iteration orders of each of the operators in a query plan. To determine the effectiveness of this technique, we measured the total memory requirement of the query evaluation plan, with and without optimization. We varied the tile shapes of the stored input arrays, since the tile shape affects the memory requirement of a plan.

The results of this experiment are summarized in Fig. 25. For the sake of brevity, only the results for the TVI query are shown. The plans for the other queries exhibited similar

properties.

Tile shape          Optimization on              Optimization off
(tile size = 4 KB)  Cost (KB)  Iteration order   Order-0 cost   Order-1 cost   Order-2 cost
⟨512, 8, 1⟩         33         1                 2222           133            2222
⟨256, 16, 1⟩        49         1                 2337           248            2337
⟨128, 32, 1⟩        82         1                 1853           477            2566
⟨64, 64, 1⟩         147        0                 936            936            3025
⟨32, 128, 1⟩        82         0                 477            1853           2566
⟨16, 256, 1⟩        49         0                 248            2337           2337
⟨8, 512, 1⟩         33         0                 133            2222           2222

Fig. 25. Memory costs of TVI plans (in KB) with and without optimization

Query      ArrayDB CPU time (s)   C++ CPU time (s)   ArrayDB slower by a factor of
TVI        12.53                  2.22               5.64
NDVI       8.05                   1.47               5.48
DESTRIPE   5.44                   0.03               181.33
MASK       3.67                   0.34               10.79
WAVELET    9.36                   0.18               52.00

Fig. 26. Comparison of ArrayDB versus C++ programs

The figure's first column shows the shape of the input array's tile. The second column shows the total memory requirement of the plan generated by ArrayDB, with the optimizer enabled. (These total memory requirements include the data buffer space required by all of the plan's operators, but not the buffer used to hold the query result, since this was the same size in all the cases.) The third column shows the iteration order selected by the optimizer for the plan operators. In this experiment, the optimizer always chose to assign the same iteration order to all of the operators in a given plan, though this is not always the case. The final three columns show the total memory cost when the optimizer is disabled (no rewrite optimization or iteration order selection). Three values are shown, one assuming that all operators execute in 0-order, one assuming that all operators execute in 1-order, and one assuming that all operators execute in 2-order. (Since the input array is three dimensional, these are the only orders that are considered.)

Several conclusions can be drawn from the data in Fig. 25. First, iteration order matters. The last three columns show that a bad iteration order can be an order of magnitude costlier than a good one. Second, the best choice of iteration order varies with the shape of the input array's tiles. Unless the physical layout is fixed for all data (which is not a good idea because different workloads might benefit from different layouts), evaluation order should be chosen dynamically to reflect the layout of data used by a particular query. In Fig. 25, notice that any fixed choice of iteration order will result in costly plans for at least some of the input tile shapes. The ArrayDB optimizer has the flexibility to adapt to the physical design, choosing the right iteration order for each input tile shape.

Finally, the memory cost with ArrayDB optimization on is less than the cost with optimization off, even when both plans use the same iteration order. For example, when the tile shape is ⟨64, 64, 1⟩, the optimized plan requires 147 KB and runs in 0-order, while the unoptimized 0-order plan requires 936 KB. This difference is due to rewrite optimization. Although both plans iterate in 0-order, the optimized plan generates smaller intermediate results (because of rewrites), and so requires less buffer space. Thus, rewrite optimization helps to reduce memory cost as well as query evaluation time.

6.3 Quality of ArrayDB's query evaluation plans

To gauge the quality of ArrayDB's query evaluation plans, we compared them to custom, manually generated, query-specific C++ programs. For each of the five queries in the query suite, we wrote a C++ program to evaluate the query. The C++ programs were given as much memory as they required; they were not limited to the amounts of memory consumed by the corresponding ArrayDB plans.

We do not expect the generated plans to match the running times of the custom programs.12 The purpose of this experiment was to measure the cost (in terms of query evaluation time) of the declarative interface and physical data independence offered by an array database system, and to identify and determine the causes of any performance problems.

12 A similar observation was made by Musick and Critchlow [29] when they compared performance of relational DBMSs and OR-DBMSs executing point, multi-point, and range queries with that of native Unix fwrite and fread system calls.

In these experiments, an input array tile shape of ⟨1024, 1024, 1⟩ was used. That is, the input arrays were laid out in band-major order. This simplifies I/O for the C++ programs. No clipping was applied in these experiments. Each query generates its full output array.

Figure 26 shows the query evaluation times for ArrayDB and the custom C++ programs for each of the queries in the suite. The figure shows CPU times, rather than wall-clock times. For each of the queries except DESTRIPE, the ArrayDB plan does the same amount of disk I/O as the corresponding C++ program does. We have focused on the CPU times, since it is principally in CPU time that the ArrayDB and C++ plans differ. The last column of Fig. 26 shows the factor by which ArrayDB was slower than the corresponding C++ program.

For TVI, NDVI, and MASK, ArrayDB comes relatively close to the custom programs. ArrayDB's operator implementations are not heavily optimized, and we believe that much of the performance difference in those three cases could have been eliminated by better implementations. For the DESTRIPE and WAVELET queries, however, ArrayDB is much slower, and we can identify specific reasons for the difference in speed. One is that ArrayDB's plans do more copying and reorganization of data than the custom programs do. Other problems include ArrayDB's inability to detect and exploit repeated subexpressions in AML queries, and the lack of an in-place update operation.

The data copying overhead occurs in WAVELET and DESTRIPE for the following reasons. The AML query for WAVELET contains three merge operators because apply is a unary operator and the inverse Haar basis functions are binary operations. To apply the inverse Haar transformations, AML must first combine the two input arrays (using merge) into a single array. In the resulting plan, each merge is implemented by a combine_p operator. ArrayDB's implementation of the combine_p operator requires explicit data move-

ment. The C++ WAVELET program avoids data movement by stepping through the elements of the two arrays in lock step, performing calculations on-the-fly (and thus also avoiding function-call overhead). For DESTRIPE, the C++ program reads the desired band and simply corrects every sixth row in it, making updates in place. ArrayDB first computes the corrected rows, then computes the uncorrected rows, and then merges the arrays formed in the previous two steps. Each of these steps involves copying data.

ArrayDB's failure to detect common subexpressions further affects DESTRIPE. In the plan for DESTRIPE, ArrayDB reads the base array twice from disk, once to compute the corrected rows and once to extract the uncorrected rows.13 With common subexpression detection, one reading would have been avoided.

13 This is the reason why ArrayDB and the C++ program for DESTRIPE do not perform the same number of I/O operations.

In summary, these experiments have identified some of the limitations of ArrayDB's query evaluation plans: the need to copy data as it flows through plan operators, the lack of support for in-place update, and the lack of a binary apply operator. It is not clear whether these issues are best addressed at the language level, or in the optimizer and evaluator, or both. For example, a more sophisticated optimizer, with a larger palette of plan operators, might be able to identify what is, in effect, a binary apply operation or an in-place update and implement it using an appropriate physical operator. On the other hand, if AML had a binary apply operator or an update operator, the optimizer would have a much easier time identifying such operations. We leave the exploration of these issues for future work.

7 Related work

Languages which support array manipulation can be classified into two broad categories: collection-oriented languages and scalar-oriented languages. A language is considered collection-oriented if collection types (for example, sets, sequences, arrays, vectors, and lists) and operations for manipulating them "as a whole" are primitive in the language [37]. APL, the Image Algebra [34], FORTRAN 90, and AML are examples of collection-oriented languages with array types. In scalar-oriented languages, such as C and Pascal, collections are manipulated element by element.

Some implementations of collection-oriented languages perform early filtering of array data, much as ArrayDB does. The data to be filtered are identified by lineage tracing, a kind of data-flow analysis. The complexity of such an analysis depends on the kinds of array manipulations that the language supports. Often lineage is traced only through array operators that operate independently on each individual element of an array [3,5,12,42,43]. ArrayDB and AML are notable in that they support lineage tracing (and pushdown of filtering operations) through array operators that operate on rectangular array chunks of arbitrary size [24,25].

A common optimization in collection-oriented languages is to (effectively) combine several consecutive array operations into a composite array operation. Evaluation of the composite operation avoids generating intermediate arrays, reduces redundant data movement, and reduces parallel loop synchronization overhead. Guibas and Wyatt's influential work on this topic [12] — performed in the context of a subset of APL operators — has subsequently been extended [4,15,41]. The study reported in [15] shows how to compose FORTRAN 90 operators such as RESHAPE, EOSHIFT, MERGE, WHERE constructs, and array reduction operations. The replacement of the contiguous sub and merge operations by a single combine_p operation in ArrayDB is also an example of such an array function composition. Common subexpressions and one-to-many array operations are problematic to handle in this optimization framework [15].

In scalar-oriented programming languages, array traversals must be coded explicitly, often using some form of looping control structure. Because of the big difference in the performance of CPU and memory (be it main memory or cache), loop optimizations try to improve the temporal and spatial locality of array accesses so that elements accessed together (temporally or spatially) can be found in faster memories. Many loop-related optimizations are known [28]. Some, such as tiling [28, p. 694], improve cache locality by manipulating the order in which array elements are visited. Other optimizations, such as loop interchange, skewing, and reversal, and the order in which they can be applied to loop nests, have been studied [44]. Array restructuring is a loop-related optimization that changes the array layout in memory so that the spatial data locality of array element accesses in a loop improves [18]. Array layouts chosen for one loop may affect the performance of subsequent loops, and therefore, it may be necessary to find globally advantageous array layouts.

Such loop-related optimizations are similar in spirit to the iteration order optimization implemented for AML by ArrayDB. Although the optimizations are different, both seek to exploit iteration order to reduce costs associated with the memory hierarchy. In the case of ArrayDB, iteration order selection produces memory-efficient plans. In such plans, pieces of several arrays can fit into smaller and faster memories, and therefore, the memory hierarchy is well utilized.

7.1 Arrays in relational database management systems

Two approaches can be taken to support arrays in relational database management systems or extended relational systems. The first approach is to store each array element as a relational tuple in which the element's indices (as well as its value) are represented explicitly. Array manipulations can then be defined using SQL or some other relational query language. However, SQL is not particularly well-suited to the kinds of array queries described in this paper [23].

The second approach uses array-valued relational attributes. Such attribute values can be represented using binary large objects (BLOBs). Alternatively, in object-relational DBMSs such as Illustra [16], Postgres [40], Paradise [8], and Informix Universal Server [31], an array type with associated methods can be defined. Standardization initiatives are underway for an image data type: Part 5 of the upcoming SQL standard for multi-media (SQL/MM) is devoted to still images [39].

If BLOBs are used, the DBMS stores arrays but provides little support for array manipulations. It may be possible to select a portion of an array by retrieving only the correspond-

ing portion of the array BLOB, but other array manipulations would have to be implemented by the application itself. If an array type is available, array operations can be included in queries by making calls to the array methods defined for the array type. That is, each query will have a relational part and a non-relational part, where the non-relational part consists of expressions involving array methods. Unfortunately, in most object-relational systems, optimization of the non-relational parts of a query is very limited. Method invocations appearing in the non-relational parts of a query are treated as black boxes. Since the optimizer does not understand these methods, little or no optimization of the array manipulations is possible. At best, the DBMS might optimize the placement of the non-relational parts of the query within the relational evaluation plan [14].

Efforts to address this problem are still at the research stage. PREDATOR is a research prototype DBMS in which relational and non-relational optimizers can be combined to support queries with relational and non-relational parts [36]. In particular, an array query optimizer could be applied to an array expression composed of the methods of an array ADT. AML and the ArrayDB optimizer would be well-suited for use in such an environment.

7.2 Array database systems

Several database management systems have been designed, like ArrayDB, specifically for array-structured data. Array database systems are often designed for specific application domains such as scientific computing and online analytical processing (OLAP).

T2 [5], Titan [6], and RasDaMan [2] are database management systems designed for multi-spectral images and other raster data. These systems are similar to ArrayDB in that they allow externally-defined functions to be applied to array data. However, AML is more general than the query languages used in these systems in that it allows such functions to be applied to rectangular subarrays of any size. This allows ArrayDB to directly implement and optimize a broader class of array operations. In addition, ArrayDB performs some optimizations, such as choosing the iteration order for function application, that are not considered in the other array database systems.

AQL is a scalar-oriented query language with low-level array manipulation primitives. A prototype AQL database system is described in [19]. Unlike AML, AQL is not a framework for applying externally-defined, application-specific array manipulations. Instead, application-specific array operations can be defined within AQL using four array-related primitives plus such things as conditionals and arithmetic operations. Two of the array primitives create arrays; one performs subscripting (extracting a value from an array); and one determines the shape of an array. Lineage tracing can be performed at the array element level on AQL expressions. AQL is even more flexible than AML in terms of the types of lineage tracing that it can support. One drawback of this flexibility, however, is that it is not clear how to produce efficient, pipelined query evaluation plans for AQL queries.

Like the systems described above, image database systems allow array-structured data to be stored and retrieved [7]. However, image database systems typically focus on the problem of selecting images, often on the basis of image content, from a large collection of stored images. Manipulation of the selected images is a secondary concern. Thus, such systems are complementary to systems such as ArrayDB.

File-based storage packages such as NetCDF [32] are widely used to store array data. Like ArrayDB, NetCDF and similar packages help to isolate applications from the details of the physical organization of array data. However, these packages are not database systems. Only simple retrieval and storage operations are supported, so most array manipulation is performed by the application.

Multi-dimensional OLAP (MOLAP) systems such as Essbase are decision-support systems that store and manipulate multi-dimensional arrays [10]. MOLAP systems emphasize efficient combination, grouping, and aggregation of array elements. A formal model for OLAP systems is described in [13]. Such systems exploit some of the same kinds of optimizations, such as early data filtering, used by ArrayDB. However, many operations in OLAP applications are performed on irregular, data-dependent groupings of array elements. In contrast, AML's operators are best suited for operations with a regular structure based on array indices.

8 Conclusions and future work

This paper describes AML, a query algebra for arrays, and techniques for optimizing and evaluating AML expressions. AML expressions define structured applications of uninterpreted, externally-defined functions to arrays. AML query processing is implemented in ArrayDB, a database management system for arrays. ArrayDB's query optimizer is capable of rewriting AML queries to eliminate unnecessary function applications and I/O. The optimizer also performs other optimizations, such as cost-based selection of the order in which array elements are processed. Using a suite of image processing queries, we have shown that ArrayDB's query processing techniques are effective at reducing query evaluation times and memory requirements.

The research reported in this paper can be extended in several directions. First, it would be interesting to extend AML so that it incorporates other index-based operators (such as a transpose or dimension reordering operator) and content-based operators. A content-based operator would restructure an array, or apply functions to an array, in a manner that would depend on the value of an array element, rather than its position. Second, query optimization techniques that exploit some of the properties of the user-defined functions (such as commutativity or associativity) can be studied. An immediate question is how to describe these properties to the query optimizer so that it can reason about them and exploit them. It should not be too difficult to recognize instances where two adjacent user-defined functions can be composed by manufacturing a composite function that calls the two original functions in sequence. Such an optimization avoids generation of some of the intermediate arrays during query evaluation. Third, the plan refinement phase can be generalized to consider different chunk shapes in addition to considering different chunk orders. At present, ArrayDB's physical operators assume fixed input and output chunk shapes. Physical operators such as regroup_p and combine_p can potentially produce and consume subar-
rays of various shapes. The dynamic programming-based algorithm would need to be revisited to examine whether it can be generalized in the presence of variable chunk shapes. A fourth issue is the integration of array query processing into a relational database system or image database system. Images and other arrays are usually associated with non-array meta-data. Ideally, it should be possible for an application to define queries that involve the kinds of array manipulations described in this paper, and that also use the meta-data for filtering or for other purposes.

A final issue is parallel evaluation of AML queries. AML is a data-parallel language. Data-parallel languages permit efficient parallel implementations because the operators in such languages provide implicit parallelism [37]. The query compiler does not have to do complex loop analysis to find parallelism. Some of the issues involved in building a parallel evaluator for AML are: data partitioning and layout schemes; methods for coordinating data retrieval; methods for coordinating computation; and methods for interprocessor communication. Prior research has addressed issues such as the data partitioning problem for user-defined functions that consume and produce one-dimensional streams [30], and parallel evaluation of specialized forms of queries on remote-sensing data [6].

A A proof of an AML logical rewrite rule

A proof of Theorem 10 appears in this appendix. The proof technique contained therein is more generally useful in that other rewrite rules can also be proved using a similar approach. Proofs of all of the non-trivial AML rewrite rules can be found in [23]. Sub and merge operators map slabs in their input arrays to slabs in their output arrays. Therefore, the proof that follows shows that the original expression and the rewritten expression generate the same array slabs. Since sub and merge do not change or permute array cell values in slabs, it then follows that the result arrays from the original expression and the rewritten expression are identical.

The following observations, which follow from the definitions of sub and merge, help in the proof. Each observation establishes correspondences between the i-slabs of the output array and the i-slabs of the input arrays of a particular AML operator. The i-slabs themselves are numbered from 0; that is, the slab number is the index of the i-slab in an array.

Observation A1. In the AML equation Y = sub_i(P, A), where P ≠ 0, the i-slab number j (j ≥ 0) of Y equals the i-slab number index(P, j+1) of A.

Observation A2. In the merge-balanced AML expression Y = merge_i(P, A, B, δ), where P ≠ 0 and P ≠ 1, the i-slab number j (j ≥ 0) of A equals the i-slab number index(P, j+1) of Y; the i-slab number j (j ≥ 0) of B equals the i-slab number index(P̄, j+1) of Y.

Theorem 10 (pushing sub through merge, version 1). If P ≠ 0, P ≠ 1, Q ≠ 0, and the expression on the left is merge-balanced, then

  sub_i(Q, merge_i(P, A, B, δ)) = merge_i(T, sub_i(R, A), sub_i(S, B), δ),

where

  R[j] = Q[index(P, j+1)],
  S[j] = Q[index(P̄, j+1)],
  T[j] = P[index(Q, j+1)],

for all j ≥ 0. Furthermore, the merge operation on the right is balanced.

Proof. Let Y_P = merge_i(P, A, B, δ); let Y_Q = sub_i(Q, Y_P); let Z_R = sub_i(R, A); let Z_S = sub_i(S, B); and let Z_T = merge_i(T, Z_R, Z_S, δ). The goal is to prove that Y_Q and Z_T have the same i-slabs. Moreover, it needs to be shown that if the merge operator in the original expression is balanced, then the merge operator in the rewritten expression is also balanced.

Since sub and merge operators do not reorder or duplicate the slabs coming from the same array, to prove that Y_Q and Z_T have the same i-slabs, it suffices to show the following three statements: (1) i-slab j (j ≥ 0) of A is in Y_Q iff it is in Z_T; (2) i-slab j (j ≥ 0) of B is in Y_Q iff it is in Z_T; and (3) i-slab j (j ≥ 0) of Y_Q comes from A iff i-slab j (j ≥ 0) of Z_T comes from A.

The first statement can be proved as follows. As per Observation A2 applied to Y_P = merge_i(P, A, B, δ), the i-slab j (j ≥ 0) of A is equal to the i-slab index(P, j+1) of Y_P. Now the i-slab index(P, j+1) of Y_P is in Y_Q iff Q[index(P, j+1)] = 1.

Now the i-slab j (j ≥ 0) of A is in Z_T iff R[j] = 1. From the definition of R, the i-slab j (j ≥ 0) of A is in Z_T iff Q[index(P, j+1)] = 1. By comparing this conclusion to the one reached in the previous paragraph, the first statement is proved.

The proof of the second statement, which involves using the definition of S, is symmetric to that of the first statement.

The third statement can be proved as follows. As per Observation A1 applied to Y_Q = sub_i(Q, Y_P), the i-slab j (j ≥ 0) of Y_Q is equal to the i-slab index(Q, j+1) of Y_P. Now the i-slab index(Q, j+1) of Y_P comes from A iff P[index(Q, j+1)] = 1.

The i-slab j (j ≥ 0) of Z_T comes from A iff T[j] = 1. From the definition of T, the i-slab j (j ≥ 0) of Z_T comes from A iff P[index(Q, j+1)] = 1. By comparing this conclusion to the one reached in the previous paragraph, the third statement is proved.

Finally, let us prove that the merge operator in the rewritten expression is balanced. The merge operator in the original expression is balanced and therefore, for all dimensions j ≠ i, A[j] = B[j]. In the rewritten expression, Z_R[j] = A[j] and Z_S[j] = B[j] for all j ≠ i because the sub operators with the patterns R and S do not change the array lengths of their argument arrays in dimensions other than dimension i. Therefore, the merge operator in the rewritten expression is balanced as far as all dimensions j ≠ i are concerned.

Next, let us prove that the merge operator in the rewritten expression is balanced in dimension i. Y_P[i] = A[i] + B[i] because the merge operator in the original expression is balanced. Suppose that, in the original expression, the sub operator deletes a i-slabs of A and b i-slabs of B (a ≥ 0, b ≥ 0).
Therefore, Y_Q[i] = A[i] + B[i] − a − b. Now in the rewritten expression, the sub operators must delete a i-slabs from A and b i-slabs from B, because otherwise the two expressions will not be equivalent. Therefore, Z_R[i] = A[i] − a and Z_S[i] = B[i] − b. Now Z_T[i] must be equal to Y_Q[i] because otherwise the two expressions will not be equivalent. Therefore, Z_T[i] = A[i] + B[i] − a − b. Now Z_R[i] + Z_S[i] is equal to (A[i] − a) + (B[i] − b) which, in turn, is equal to Z_T[i]. Therefore, the merge operator in the rewritten expression is balanced in dimension i.

References

1. Arya M, Cody W, Faloutsos C, Richardson J, Toga A (1994) QBISM: extending a DBMS to support 3D medical images. In: Proceedings of the 10th International Conference on Data Engineering, Houston, Texas, February. IEEE Computer Society Press, pp 314–325
2. Baumann P, Dehmel A, Furtado P, Ritsch R, Widmann N (1998) The multidimensional database system RasDaMan. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, Washington, June, pp 575–577
3. Baumann P (1994) Management of multidimensional discrete data. VLDB J 3(4):401–444
4. Budd T (1988) An APL compiler. Springer, Berlin Heidelberg New York
5. Chang C, Acharya A, Sussman A, Saltz J (1998) T2: a customizable parallel database for multi-dimensional data. SIGMOD Rec 27(1):58–66
6. Chang C, Moon B, Acharya A, Shock C, Sussman A, Saltz JH (1997) Titan: a high-performance remote sensing database. In: Proceedings of the Thirteenth International Conference on Data Engineering, Birmingham, UK, April, pp 375–384
7. Chang S, Hsu A (1992) Image information systems: where do we go from here? IEEE Trans Knowl Data Eng 4(5):431–442
8. DeWitt DJ, Kabra N, Luo J, Patel JM, Yu J (1994) Client–server Paradise. In: Proceedings of the 20th VLDB Conference, Santiago, Chile, pp 558–569
9. Furtado P, Baumann P (1999) Storage of multidimensional arrays based on arbitrary tiling. In: Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, March, pp 480–489
10. Garcia-Molina H, Ullman JD, Widom J (2000) Database system implementation. Prentice Hall, Upper Saddle River, New Jersey
11. Graefe G (1993) Query evaluation techniques for large databases. ACM Comput Surv 25(2):73–170
12. Guibas LJ, Wyatt DK (1978) Compilation and delayed evaluation in APL. In: Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages, Tucson, Arizona, January, pp 1–8
13. Gyssens M, Lakshmanan LVS (1997) A foundation for multi-dimensional databases. In: Proceedings of the 23rd International Conference on Very Large Data Bases, Athens, Greece, August. Morgan Kaufmann, pp 106–115
14. Hellerstein JM, Stonebraker M (1993) Predicate migration: optimizing queries with expensive predicates. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, D.C., ACM, pp 267–276
15. Hwang GH, Lee JK, Ju RD (1998) A function-composition approach to synthesize Fortran 90 array operations. J Parallel Dist Comput 54(1):1–47
16. Illustra Information Technologies, Inc. (1994) Illustra user's guide. Oakland, Calif.
17. Jay CB (1999) Shaping distributions. In: Hammond K, Michaelson G (eds) Research directions in parallel functional programming. Springer, London, pp 219–232
18. Leung S, Zahorjan J (1995) Optimizing data locality by array restructuring. Technical Report 95-09-01, Department of Computer Science and Engineering, University of Washington, Seattle, Wash.
19. Libkin L, Machlin R, Wong L (1996) A query language for multidimensional arrays: design, implementation, and optimization techniques. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Canada, ACM, pp 228–239
20. Lillesand TM, Kiefer RW (1999) Remote sensing and image interpretation, 4th edn. Wiley, New York
21. Lohman GM, Stoltzfus JC, Benson AN, Martin MD, Cardenas AF (1983) Remotely-sensed geophysical databases: experience and implications for generalized DBMS. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, San Jose, Calif., May, pp 146–160
22. Maier D, Vance B (1993) A call to order. In: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp 1–16
23. Marathe AP (2001) Query processing techniques for arrays. PhD thesis, Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, January
24. Marathe AP (2001) Tracing lineage of array data. In: Proceedings of the Thirteenth International Conference on Scientific and Statistical Database Management, Fairfax, Virginia, July, pp 69–78
25. Marathe AP (2001) Tracing lineage of array data. J Intell Inf Syst 17(2/3):193–214
26. Marathe AP, Salem K (1997) A language for manipulating arrays. In: Proceedings of the 23rd International Conference on Very Large Data Bases, Athens, Greece, August. Morgan Kaufmann, pp 46–55
27. Marathe AP, Salem K (1999) Query processing techniques for arrays. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, June. ACM Press, pp 323–334
28. Muchnick SS (1997) Advanced compiler design and implementation. Morgan Kaufmann, San Francisco
29. Musick R, Critchlow T (1999) Practical lessons in supporting large-scale computational science. SIGMOD Rec 28(4):49–57
30. Ng KW, Muntz RR (1999) Parallelizing user-defined functions in distributed object-relational DBMS. In: Proceedings of the 1999 International Database Engineering and Applications Symposium, Montreal, Canada, August, pp 442–445
31. Olson MA, Hong WM, Ubell M, Stonebraker M (1996) Query processing in a parallel object-relational database system. Bull IEEE Comput Soc Tech Comm Data Eng 19(4):3–10
32. Rew R, Davis G, Emmerson S, Davies H (1996) NetCDF user's guide, version 2.4. Unidata Program Center, Boulder, Colorado
33. Ritter GX, Wilson JN, Davidson JL (1990) Image algebra: an overview. Comput Vision Graph Image Process 49:297–331
34. Ritter GX, Wilson JN (1996) Handbook of computer vision algorithms in image algebra. CRC Press, Boca Raton, Florida
35. Sarawagi S, Stonebraker M (1994) Efficient organization of large multidimensional arrays. In: Proceedings of the 10th International Conference on Data Engineering, Houston, Texas, February. IEEE Computer Society Press, pp 328–336
36. Seshadri P, Livny M, Ramakrishnan R (1997) The case for enhanced abstract data types. In: Proceedings of the 23rd VLDB Conference, Athens, Greece, pp 66–75
37. Sipelstein J, Blelloch GE (1991) Collection-oriented languages. Proc IEEE 79(4):504–523
38. Stollnitz EJ, DeRose TD, Salesin DH (1996) Wavelets for computer graphics: theory and applications. Morgan Kaufmann, San Francisco
39. Stolze K (2000) SQL/MM part 5: still image – the standard and implementation aspects. Jenaer Schriften zur Mathematik und Informatik Math/Inf/00/27, Institut für Informatik, Friedrich-Schiller-Universität Jena, September
40. Stonebraker M, Rowe LA, Hirohama M (1990) The implementation of POSTGRES. IEEE Trans Knowl Data Eng 2(1):125–142
41. Treat JM, Budd TA (1984) Extensions to grid selector composition and compilation in APL. Inf Process Lett 19(3):117–123
42. Vandenberg SL, DeWitt DJ (1991) Algebraic support for complex objects with arrays, identity, and inheritance. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, ACM, pp 158–167
43. Widmann N, Baumann P (1998) Efficient execution of operations in a DBMS for multidimensional arrays. In: Proceedings of the 10th International Conference on Scientific and Statistical Database Management, Capri, Italy, July
44. Wolf MW, Lam MS (1991) A data locality optimizing algorithm. In: Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation (PLDI), Toronto, Canada, June, pp 30–44
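As a concrete illustration of Theorem 10 (pushing sub through merge), the rewrite can be checked mechanically. The sketch below is not part of ArrayDB; it is a minimal one-dimensional model in which an array is a list of slab labels and a pattern is a repeating list of 0/1 bits. The helper names (pat, index, sub, merge) are illustrative, the fill value δ is omitted (the lengths are chosen so it is never needed), and P̄ is written Pbar.

```python
# A slab-level, one-dimensional sketch of AML's sub and merge, used to
# check the rewrite of Theorem 10 on a concrete instance.

def pat(P, pos):
    """Bit of the infinitely repeated pattern P at position pos."""
    return P[pos % len(P)]

def index(P, k):
    """0-based position of the k-th one (k >= 1) in the repetition of P."""
    pos, seen = 0, 0
    while True:
        if pat(P, pos) == 1:
            seen += 1
            if seen == k:
                return pos
        pos += 1

def sub(P, A):
    """Keep slab j of A iff bit j of the repeated pattern P is 1."""
    return [x for j, x in enumerate(A) if pat(P, j) == 1]

def merge(P, A, B):
    """Result position m takes the next slab of A if bit m of P is 1,
    otherwise the next slab of B (balanced case: no padding needed)."""
    out, ai, bi = [], 0, 0
    for m in range(len(A) + len(B)):
        if pat(P, m) == 1:
            out.append(A[ai]); ai += 1
        else:
            out.append(B[bi]); bi += 1
    return out

# Arbitrary example arrays and patterns, chosen for illustration.
A = ['a0', 'a1', 'a2', 'a3']
B = ['b0', 'b1', 'b2', 'b3']
P = [1, 0]        # merge pattern
Q = [1, 1, 0]     # sub pattern
Pbar = [1 - b for b in P]

lhs = sub(Q, merge(P, A, B))

# Patterns R, S, T as defined in Theorem 10, expanded just far enough.
R = [pat(Q, index(P, j + 1)) for j in range(len(A))]
S = [pat(Q, index(Pbar, j + 1)) for j in range(len(B))]
T = [pat(P, index(Q, j + 1)) for j in range(len(lhs))]

rhs = merge(T, sub(R, A), sub(S, B))
print(lhs)          # ['a0', 'b0', 'b1', 'a2', 'a3', 'b3']
print(lhs == rhs)   # True
```

Running the sketch shows the two sides producing the same slab sequence, and, as the theorem predicts, the rewritten side applies sub to A and B directly, filtering slabs before the merge rather than after it.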