Page 18 - Krész, Miklós, and Andrej Brodnik (eds.). MATCOS-13. Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science. Koper: University of Primorska Press, 2016.
P. 18
ucture for four-sided queries, with the query running time Lemma 2. [4]The layout L and search tree Y can be used
O(logB N + T /B) and O(log22 N ) space usage. This struc- to answer any two-sided range query in O(logB N T /B) mem-
ture doesn’t have any assumption about B and it’s much ory transfers, where T is the number of reported points.
simpler. In 2009, Afshani et al.[1] proved, that for optimal
query bound, for three-sided range reporting, the structure 2.1.1 The construction
has to use Ω(N (log log N )ε) space, where ε > 0.
Arge et al.[4] presented a contruction method for the layout.
2. CACHE-OBLIVIOUS DATA STRUCTURE The basic idea is to store all the points, sorted by the x-
coordinate, in the leafs of a balanced binary search tree,
For the cache-oblivious data structure we chose the data stored implicitly using the van-Emde-Boas layout. All the
structure from [4], as it’s the simplest and doesn’t have as- internal nodes carry additional information about the actual
sumptions about B, and has the same running time. We points in the subtree. Then using the sweep line approach;
first describe the data structure for two-sided queries and starting from −∞, we sweep a horizontal line upwards across
how to construct it. After that we describe the construction the plane. Each time the sweep line is at point p, we traverse
of the data structure for four-sided queries. the tree leaf-to-root and update all the nodes on the path.
After this step, we have to check in the root of the tree, if
2.1 Data structure for two-sided queries there exists a sparse query. If it does, we have to traverse a
root-to-leaf path (specified by the information in the internal
The structure consists of two parts: a sequence L of length nodes), to find the point, we use to create the tree Y. The
O(N ), called the layout, which stores the points of S, with whole process was proved[4] to take O(N logB N ) memory
the possibility of duplicate entries; and a balanced binary transfers.
search tree Y on a subset of the y-coordinates of the points
in S, stored implicitly in the van-Emde-Boas layout. Each 2.2 Data structure for four-sided queries
leaf in the tree stores a pointer to an element of L. To answer
a query (−∞, x] × [y, ∞), a search for y in Y is done and We first explain how to construct a data structure for three-
then the pointer is followed into L. Then a scan forward sided queries, as it is a step to construct the data structure
over L is performed until an x-coordinate greater than x is for four-sided queries. Our three-sided structure consists of
found. a balanced binary tree T with the points in S stored at
the leaves, sorted by their x-coordinates. T is laid out in
The key point of the data structure is the way the points memory using the van-Emde-Boas layout, so that a root-to-
are stored in the layout L. Assume that we are prepared leaf path can be traversed cache-obliviously in O(logB N )
to scan through αT points for α > 1, when answering a memory transfers. For each internal node v in T , let Sv be
query with output T . We will call it a dense if we scan at the points of S stored in the subtree rooted at v. We store
most αT points, and sparse otherwise. Consider the mini- the points in Sv in two secondary structures, Lv and Rv,
mum y-coordinate y1 such that there exists a sparse query associated with v. Lv is a structure for answering two-sided
(−∞, x] × [y1, ∞). Let S0 be a sequence sorted by the x- queries of the form (∞, x] × [y, ∞) (with the x-opening to
coordinate, with y coosrdinate less than y1. As there are the left); Rv is a structure for answering two-sided queries
no sparse queries with the y-coordinate less than y1, we can of the form [x, ∞)×[y, ∞) (with the x-opening to the right).
answer those queries efficiently. As we repeat the step with
the remaining points, we get a concatenation of sequences Each point is stored in two linear space structures on each
S0, S1, ..., Sk of points, with which we can answer all the of the O levels of the tree T , the structure uses O(N log2 N )
queries efficiently. The problem with this approach is that space. To answer the query [xl, xr] × [yb, ∞), we take the
the space can be more than linear, since the worst case size first node v such that xl is contained in the subtree rooted
is Θ(N ). at he left child l, and xr is contained in the subtree rooted
at the right child r. Then we query Rl with the query range
To reduce the size of the layout we need to store the se- [xl, ∞) × [yb, ∞) and Lr with the query range (inf ty, xr] ×
quences in such a way, that every sequence Si is identical [yb, ∞).
with Si+1 having a suffix valuse as large as possible. We take
a point with the minimum y-coordinate, such that there ex- To construct the data structure for four-sided queries, we
ists a sparse query (−∞, xi] × [yi, ∞). Instead of storing all need to apply the same method again. We store all the
the points with y-coordinate less than yi, we store only the points, this time sorted by the y-coordinate, in a balanced
points that has the x-coordinate less than xi too. With this binary tree, stored using the van-Emde-Boas layout. For
improvement it can be proved that the size of the layout L each internal node v, let Sv represent the points from S
is linear: stored in the subtree rooted in v. We store the points in
Sv in two secondary structures Uv and Bv, associated with
Lemma 1. [4]Layout L uses at most α N = O(N ) space. v. Uv is a structure for answering a three-sided query of the
α−1 form [xl, xr]×[y, ∞); Bv is a structure for answering a three-
sided query of the form [xl, xr] × [y, −∞). This construction
To answer the two-sided queries we create, in addition to adds another factor of O(log2N ) to the space complexity,
the leayout L, a binary search tree Y over the y-coordinates to get the final O(N log22 N ) space complexity of the data
we used to construct L, and store the whole tree using the structure for four-sided queries.
van-Emde-Boas layout. Each leaf stores a pointer to the 3. RESULTS
start of the sequence it produced in L.With both, the tree
Y and the layout L, we can answer two-sided range queries As we implemented the mentioned data structures, we no-
in O(logB N T /B). ticed that the construction algorithm rely heavily on root-
m a t c o s -1 3 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science 18
Koper, Slovenia, 10-11 October
O(logB N + T /B) and O(log22 N ) space usage. This struc- to answer any two-sided range query in O(logB N T /B) mem-
ture doesn’t have any assumption about B and it’s much ory transfers, where T is the number of reported points.
simpler. In 2009, Afshani et al.[1] proved, that for optimal
query bound, for three-sided range reporting, the structure 2.1.1 The construction
has to use Ω(N (log log N )ε) space, where ε > 0.
Arge et al.[4] presented a contruction method for the layout.
2. CACHE-OBLIVIOUS DATA STRUCTURE The basic idea is to store all the points, sorted by the x-
coordinate, in the leafs of a balanced binary search tree,
For the cache-oblivious data structure we chose the data stored implicitly using the van-Emde-Boas layout. All the
structure from [4], as it’s the simplest and doesn’t have as- internal nodes carry additional information about the actual
sumptions about B, and has the same running time. We points in the subtree. Then using the sweep line approach;
first describe the data structure for two-sided queries and starting from −∞, we sweep a horizontal line upwards across
how to construct it. After that we describe the construction the plane. Each time the sweep line is at point p, we traverse
of the data structure for four-sided queries. the tree leaf-to-root and update all the nodes on the path.
After this step, we have to check in the root of the tree, if
2.1 Data structure for two-sided queries there exists a sparse query. If it does, we have to traverse a
root-to-leaf path (specified by the information in the internal
The structure consists of two parts: a sequence L of length nodes), to find the point, we use to create the tree Y. The
O(N ), called the layout, which stores the points of S, with whole process was proved[4] to take O(N logB N ) memory
the possibility of duplicate entries; and a balanced binary transfers.
search tree Y on a subset of the y-coordinates of the points
in S, stored implicitly in the van-Emde-Boas layout. Each 2.2 Data structure for four-sided queries
leaf in the tree stores a pointer to an element of L. To answer
a query (−∞, x] × [y, ∞), a search for y in Y is done and We first explain how to construct a data structure for three-
then the pointer is followed into L. Then a scan forward sided queries, as it is a step to construct the data structure
over L is performed until an x-coordinate greater than x is for four-sided queries. Our three-sided structure consists of
found. a balanced binary tree T with the points in S stored at
the leaves, sorted by their x-coordinates. T is laid out in
The key point of the data structure is the way the points memory using the van-Emde-Boas layout, so that a root-to-
are stored in the layout L. Assume that we are prepared leaf path can be traversed cache-obliviously in O(logB N )
to scan through αT points for α > 1, when answering a memory transfers. For each internal node v in T , let Sv be
query with output T . We will call it a dense if we scan at the points of S stored in the subtree rooted at v. We store
most αT points, and sparse otherwise. Consider the mini- the points in Sv in two secondary structures, Lv and Rv,
mum y-coordinate y1 such that there exists a sparse query associated with v. Lv is a structure for answering two-sided
(−∞, x] × [y1, ∞). Let S0 be a sequence sorted by the x- queries of the form (∞, x] × [y, ∞) (with the x-opening to
coordinate, with y coosrdinate less than y1. As there are the left); Rv is a structure for answering two-sided queries
no sparse queries with the y-coordinate less than y1, we can of the form [x, ∞)×[y, ∞) (with the x-opening to the right).
answer those queries efficiently. As we repeat the step with
the remaining points, we get a concatenation of sequences Each point is stored in two linear space structures on each
S0, S1, ..., Sk of points, with which we can answer all the of the O levels of the tree T , the structure uses O(N log2 N )
queries efficiently. The problem with this approach is that space. To answer the query [xl, xr] × [yb, ∞), we take the
the space can be more than linear, since the worst case size first node v such that xl is contained in the subtree rooted
is Θ(N ). at he left child l, and xr is contained in the subtree rooted
at the right child r. Then we query Rl with the query range
To reduce the size of the layout we need to store the se- [xl, ∞) × [yb, ∞) and Lr with the query range (inf ty, xr] ×
quences in such a way, that every sequence Si is identical [yb, ∞).
with Si+1 having a suffix valuse as large as possible. We take
a point with the minimum y-coordinate, such that there ex- To construct the data structure for four-sided queries, we
ists a sparse query (−∞, xi] × [yi, ∞). Instead of storing all need to apply the same method again. We store all the
the points with y-coordinate less than yi, we store only the points, this time sorted by the y-coordinate, in a balanced
points that has the x-coordinate less than xi too. With this binary tree, stored using the van-Emde-Boas layout. For
improvement it can be proved that the size of the layout L each internal node v, let Sv represent the points from S
is linear: stored in the subtree rooted in v. We store the points in
Sv in two secondary structures Uv and Bv, associated with
Lemma 1. [4]Layout L uses at most α N = O(N ) space. v. Uv is a structure for answering a three-sided query of the
α−1 form [xl, xr]×[y, ∞); Bv is a structure for answering a three-
sided query of the form [xl, xr] × [y, −∞). This construction
To answer the two-sided queries we create, in addition to adds another factor of O(log2N ) to the space complexity,
the leayout L, a binary search tree Y over the y-coordinates to get the final O(N log22 N ) space complexity of the data
we used to construct L, and store the whole tree using the structure for four-sided queries.
van-Emde-Boas layout. Each leaf stores a pointer to the 3. RESULTS
start of the sequence it produced in L.With both, the tree
Y and the layout L, we can answer two-sided range queries As we implemented the mentioned data structures, we no-
in O(logB N T /B). ticed that the construction algorithm rely heavily on root-
m a t c o s -1 3 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science 18
Koper, Slovenia, 10-11 October