Lecture04 Range Searching
Lecture 4, CS 631100
Outline
Problem: querying a database
Solution in one dimension
Data structure in ℝ²: range trees
Extension to higher dimensions
log n factor improvement
Reference
Textbook, Chapter 5; Mount's Lectures 17 and 18
Geometric interpretation
Query problems
Assume n is the total number of transactions in the database. We will show how to build, in O(n log n) time, a data structure that allows us to perform this type of query in O(k + log n) time, where k is the size of the output (the number of transactions that are reported).
The data structure is built only once; then a large number of queries can be answered quickly.
O(n log n) is the preprocessing time.
O(k + log n) is the query time.
Boxes
3D box: [0, 3] × [0, 2.5] × [0, 2]
Generalizes to any dimension
2D box: also known as a rectangle
Boxes are parallel to the coordinate axes
Algorithmic problems with boxes are relatively easy
Problem statement
Let P be a set of n points in ℝ^d. We assume d = O(1). Preprocess P so as to answer queries of the following type:
Input: (a1, b1, a2, b2, ..., ad, bd)
Output: P ∩ ([a1, b1] × [a2, b2] × ... × [ad, bd])
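As a baseline, the query can be answered with no preprocessing at all by testing every point, in O(dn) time per query. The sketch below does exactly that; passing the query as a list of (ai, bi) pairs instead of the flat tuple above is an assumption made for convenience, and the function name is hypothetical. A brute-force oracle like this is mainly useful for testing the faster structures that follow.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]  # [(a1, b1), (a2, b2), ..., (ad, bd)]


def naive_range_query(points: Sequence[Point], box: Box) -> List[Point]:
    """Return the points of P inside the closed box [a1, b1] x ... x [ad, bd].

    Checks every coordinate of every point: O(d * n) time, no preprocessing.
    """
    return [p for p in points
            if all(a <= c <= b for c, (a, b) in zip(p, box))]


print(naive_range_query([(1, 2), (3, 4), (5, 1)], [(0, 4), (0, 5)]))
# [(1, 2), (3, 4)]
```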
Problem statement (one dimension)
P is a set of real numbers. Queries: find all the points in P that are between a and b. Data structure:
Balanced binary search tree (BBST). Preprocessing time: O(n log n) to build the BBST. Space usage: O(n).
Answering a query
Algorithm Report(T, a, b)
Input: a BBST T storing P, an interval [a, b]
Output: P ∩ [a, b]
    if T = NULL
        then return
    x ← value stored at the root of T
    if a < x
        then Report(T.left, a, b)
    if a ≤ x ≤ b
        then output x
    if x < b
        then Report(T.right, a, b)
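The Report procedure translates almost line for line into Python. In this sketch the BBST is built by placing the median of a sorted list at the root; the `Node` class and `build_bbst` helper are hypothetical names, and keys are assumed distinct, as the pseudocode implicitly does.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float
    left: Optional[Node] = None
    right: Optional[Node] = None


def build_bbst(sorted_keys: List[float]) -> Optional[Node]:
    """Balanced BST from sorted, distinct keys: the median becomes the root."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build_bbst(sorted_keys[:mid]),
                build_bbst(sorted_keys[mid + 1:]))


def report(t: Optional[Node], a: float, b: float, out: List[float]) -> None:
    """Append the keys of t that lie in [a, b] to out, in increasing order."""
    if t is None:
        return
    x = t.key
    if a < x:               # keys in [a, x) can only be in the left subtree
        report(t.left, a, b, out)
    if a <= x <= b:
        out.append(x)
    if x < b:               # keys in (x, b] can only be in the right subtree
        report(t.right, a, b, out)


root = build_bbst(sorted([5, 1, 9, 3, 7]))
found: List[float] = []
report(root, 2, 7, found)
print(found)  # [3, 5, 7]
```

Collecting into a caller-supplied list mirrors "output x" without building intermediate lists at every recursion level.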
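Report the points on the left path, the right path and vsplit that lie in [a, b], and all the subtrees in between, where vsplit is the node at which the search paths for a and b diverge. The path from the root to vsplit, the left path and the right path each have length at most the height of T, i.e. O(log n), and each subtree in between is reported in time linear in its size, so a query takes O(k + log n) time.
[Figure: path from root to vsplit, left path, right path]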
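The decomposition into search paths and subtrees in between can be made explicit. The sketch below, for a BBST storing one key per node as in Report, returns the in-range keys found on the paths plus the roots of the subtrees lying entirely inside [a, b]; this is exactly the information the 2-dimensional range tree needs. The function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: float
    left: Optional[Node] = None
    right: Optional[Node] = None


def build_bbst(sorted_keys: List[float]) -> Optional[Node]:
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid], build_bbst(sorted_keys[:mid]),
                build_bbst(sorted_keys[mid + 1:]))


def find_split(t: Optional[Node], a: float, b: float) -> Optional[Node]:
    """First node on the search path whose key lies in [a, b] (vsplit), or None."""
    while t is not None and not (a <= t.key <= b):
        t = t.left if b < t.key else t.right
    return t


def decompose(t: Optional[Node], a: float, b: float) -> Tuple[List[float], List[Node]]:
    """Split the query [a, b] into in-range path keys and fully covered subtrees."""
    keys: List[float] = []
    subtrees: List[Node] = []
    split = find_split(t, a, b)
    if split is None:
        return keys, subtrees
    keys.append(split.key)
    v = split.left                      # left path: search for a
    while v is not None:
        if a <= v.key:
            keys.append(v.key)
            if v.right is not None:     # keys in (v.key, split.key): all inside [a, b]
                subtrees.append(v.right)
            v = v.left
        else:
            v = v.right
    v = split.right                     # right path: search for b
    while v is not None:
        if v.key <= b:
            keys.append(v.key)
            if v.left is not None:      # keys in (split.key, v.key): all inside [a, b]
                subtrees.append(v.left)
            v = v.right
        else:
            v = v.left
    return keys, subtrees


def collect(v: Optional[Node], out: List[float]) -> None:
    """Append every key of the subtree rooted at v, in order."""
    if v is not None:
        collect(v.left, out)
        out.append(v.key)
        collect(v.right, out)


root = build_bbst(list(range(1, 16)))
keys, subs = decompose(root, 4, 11)
print(keys, [s.key for s in subs])  # [8, 4, 10, 11] [6, 9]
```

Both loops run for at most the height of the tree, which is where the O(log n) term of the query time comes from; the O(k) term comes from calling `collect` on the returned subtrees.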
Introduction
A set P of n points in ℝ². Query: given (a1, b1, a2, b2), find all points (x, y) of P in the rectangle [a1, b1] × [a2, b2]. Results presented in this section:
O(n log n) preprocessing time
O(n log n) space usage
O(k + log^2 n) query time
Canonical sets
First store P in a BBST T, using the x-coordinates as keys. We associate each node v of T with a canonical set Cv containing the points of P stored in the subtree rooted at v.
Range trees in ℝ²
Each canonical set Cv is stored in a BBST Tv using the y coordinates as keys. Tv is called the canonical tree at node v .
We make the query in TWO steps: first on the x-coordinates, then on the y-coordinates (as shown in the following slides).
Partitioning P
After making the query with range [a1, b1] on the x-coordinates, we take the nodes on the left path and the right path; the points stored at these nodes are tested directly against the query rectangle. For each node on the left path at which the path continues to the left child, select the canonical tree Ti of its right child (this gives some Ci). For each node on the right path at which the path continues to the right child, select the canonical tree Ti of its left child (this gives some Ci). This takes O(log n) time (the height of the BBST). There are c = O(log n) canonical sets in our partition.
For each i, use the interval [a2, b2] to perform a 1-dimensional search query in Ci using the canonical tree Ti.
The union of all these results, together with the points on the left and right paths that lie in the rectangle, gives P ∩ ([a1, b1] × [a2, b2]).
Analysis of query time:
Let ki be the number of points reported from Ti; then $\sum_{i=1}^{c} k_i \le k$. Query time:
$$\sum_{i=1}^{c} O(\log n + k_i) = O\Big(c\log n + \sum_{i=1}^{c} k_i\Big) = O(\log^2 n + k).$$
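The two-step query can be sketched in Python. To keep it short, each canonical set Cv is stored as a y-sorted array searched with `bisect` instead of a separate BBST Tv; for a static set this gives the same O(log n + ki) 1-dimensional query. Class and function names are hypothetical, and sorting each node's array from scratch gives O(n log^2 n) preprocessing.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class RTNode:
    point: Point                  # point stored at this node of T (keyed on x)
    left: Optional[RTNode]
    right: Optional[RTNode]
    by_y: List[Point]             # canonical set Cv sorted by y (stands in for Tv)
    ys: List[float]               # y-coordinates of by_y, for binary search


def build(points_by_x: List[Point]) -> Optional[RTNode]:
    """Range tree on points already sorted by x."""
    if not points_by_x:
        return None
    mid = len(points_by_x) // 2
    by_y = sorted(points_by_x, key=lambda p: p[1])   # O(|Cv| log |Cv|) per node
    return RTNode(points_by_x[mid], build(points_by_x[:mid]),
                  build(points_by_x[mid + 1:]), by_y, [p[1] for p in by_y])


def query(root: Optional[RTNode], a1: float, b1: float,
          a2: float, b2: float) -> List[Point]:
    out: List[Point] = []

    def check(p: Point) -> None:               # points on the paths: tested directly
        if a1 <= p[0] <= b1 and a2 <= p[1] <= b2:
            out.append(p)

    def search_y(u: Optional[RTNode]) -> None:  # 1-D query [a2, b2] in Cu
        if u is not None:
            out.extend(u.by_y[bisect_left(u.ys, a2):bisect_right(u.ys, b2)])

    v = root
    while v is not None and not (a1 <= v.point[0] <= b1):   # descend to vsplit
        v = v.left if b1 < v.point[0] else v.right
    if v is None:
        return out
    check(v.point)
    u = v.left
    while u is not None:                  # left path
        if a1 <= u.point[0]:
            check(u.point)
            search_y(u.right)             # canonical set of the right child
            u = u.left
        else:
            u = u.right
    u = v.right
    while u is not None:                  # right path
        if u.point[0] <= b1:
            check(u.point)
            search_y(u.left)              # canonical set of the left child
            u = u.right
        else:
            u = u.left
    return out


pts = [(2, 3), (5, 1), (7, 8), (1, 6), (4, 4)]
print(sorted(query(build(sorted(pts)), 1, 5, 2, 6)))  # [(1, 6), (2, 3), (4, 4)]
```

Each `search_y` call costs O(log n + ki), and there are O(log n) of them, matching the O(log^2 n + k) bound.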
[Figure: canonical set Ci and its canonical tree Ti]
Query on x-coordinates on T: obtain the points on the left and right paths and the canonical trees Ti. This takes O(log n) time.
Query on y-coordinates on the Ti: this takes O(log^2 n + k) time (see the previous slide).
A point p belongs to all the canonical sets on the path from the node of T that stores p up to the root (and only to these canonical sets). Thus p lies in O(log n) canonical sets. Hence
$$\sum_{v \in T} |C_v| = O(n \log n),$$
where Cv is the canonical set at node v. The memory space used is therefore O(n log n). Actually, it is Θ(n log n).
Why?
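[Figure: sizes of the canonical sets, level by level]
Root: one canonical set of size n.
Depth 1: two canonical sets of size n/2, total 2(n/2) = n.
Depth 2: four canonical sets of size n/4, total 4(n/4) = n.
...
Each of the Θ(log n) levels of T contributes n, so the sum of the |Cv| is Θ(n log n).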
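The level-by-level argument can be checked numerically. The hypothetical helper below assumes the tree is built by splitting at the median and computes the sum of |Cv| directly from the tree shape, without materializing the sets. For a perfect tree with h levels (n = 2^h - 1) the sum equals (h - 1) 2^h + 1, which lies between (h - 1) n and h n, i.e. Θ(n log n).

```python
def total_canonical_size(n: int) -> int:
    """Sum of |Cv| over all nodes v of a median-split BBST on n points.

    Every node contributes the size of its own subtree (its canonical set).
    """
    if n == 0:
        return 0
    mid = n // 2
    return n + total_canonical_size(mid) + total_canonical_size(n - mid - 1)


for h in range(1, 6):
    n = 2 ** h - 1                       # perfect tree with h levels
    print(n, total_canonical_size(n), (h - 1) * n, h * n)
```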
Preprocessing time
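Tv can be built in O(|Cv| log |Cv|) time, and |Cv| log |Cv| ≤ |Cv| log n. Hence the range tree can be built in time
$$\sum_{v \in T} O(|C_v| \log |C_v|) \le O\Big(\log n \sum_{v \in T} |C_v|\Big) = O(\log n) \cdot O(n \log n) = O(n \log^2 n).$$
This can be improved to O(n log n) by building the y-sorted canonical sets bottom-up, merging the sorted lists of the two children, so that each Tv is built in O(|Cv|) time from a sorted list.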
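A bottom-up construction avoids sorting each canonical set from scratch. In the sketch below, P is assumed to be sorted by x once. The two subtrees are built first; the y-sorted canonical set of v is then obtained by merging the children's lists with v's own point, which costs O(|Cv|) instead of O(|Cv| log |Cv|), so the total is O(n log n). Names are hypothetical, and the y-sorted list stands in for Tv (a BBST can be built from a sorted list in linear time).

```python
from __future__ import annotations

from heapq import merge
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    def __init__(self, point: Point, left: Optional[Node], right: Optional[Node],
                 by_y: List[Point]) -> None:
        self.point, self.left, self.right = point, left, right
        self.by_y = by_y              # canonical set Cv sorted by y


def build(points_by_x: List[Point]) -> Optional[Node]:
    """Build T bottom-up; each y-sorted canonical set is merged from the children."""
    if not points_by_x:
        return None
    mid = len(points_by_x) // 2
    left = build(points_by_x[:mid])
    right = build(points_by_x[mid + 1:])
    # Merge of already-sorted inputs plus this node's own point: O(|Cv|) time.
    by_y = list(merge(left.by_y if left else [],
                      [points_by_x[mid]],
                      right.by_y if right else [],
                      key=lambda p: p[1]))
    return Node(points_by_x[mid], left, right, by_y)
```

`heapq.merge` assumes each input is already sorted by the key, which holds here by induction on the tree.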
Idea
We assume d > 1 and d = O(1). We want to perform range searching in ℝ^d. We still build T with respect to the x1-coordinate.
For each canonical set of T we build a (d - 1)-dimensional range searching data structure using the coordinates (x2, x3, ..., xd). To answer a d-dimensional query:
Find the canonical trees of T associated with [a1, b1] (the points on the search paths are tested directly).
Make a (d - 1)-dimensional query on each canonical tree recursively, using [a2, b2] × [a3, b3] × ... × [ad, bd].
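The recursion can be sketched in Python, assuming that points are tuples of d coordinates and that the query box is a list of (ai, bi) pairs. Each node of the tree on coordinate `dim` keeps an associated range tree on the remaining coordinates, built on its canonical set. At the last coordinate, a fully covered subtree is reported as a whole, because the outer levels have already filtered the earlier coordinates. This is the plain recursive structure without fractional cascading, and all names are hypothetical.

```python
from __future__ import annotations

from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]   # [(a1, b1), ..., (ad, bd)]


class Node:
    def __init__(self, point: Point, left: Optional[Node], right: Optional[Node],
                 assoc: Optional[Node]) -> None:
        self.point, self.left, self.right = point, left, right
        self.assoc = assoc            # range tree on the next coordinates, built on Cv


def build(points: Sequence[Point], dim: int, d: int) -> Optional[Node]:
    """Range tree on coordinates dim, ..., d-1 (0-based)."""
    return _build(sorted(points, key=lambda p: p[dim]), dim, d)


def _build(pts: List[Point], dim: int, d: int) -> Optional[Node]:
    if not pts:
        return None
    mid = len(pts) // 2
    assoc = build(pts, dim + 1, d) if dim + 1 < d else None
    return Node(pts[mid], _build(pts[:mid], dim, d), _build(pts[mid + 1:], dim, d), assoc)


def _report_all(v: Optional[Node], out: List[Point]) -> None:
    if v is not None:
        _report_all(v.left, out)
        out.append(v.point)
        _report_all(v.right, out)


def query(v: Optional[Node], box: Box, dim: int, out: List[Point]) -> None:
    """Append to out the points of the tree rooted at v that lie in box."""
    d = len(box)
    a, b = box[dim]

    def check(p: Point) -> None:          # path points: test all coordinates
        if all(lo <= c <= hi for c, (lo, hi) in zip(p, box)):
            out.append(p)

    def canonical(u: Optional[Node]) -> None:
        if u is None:
            return
        if dim + 1 < d:
            query(u.assoc, box, dim + 1, out)   # (d-1)-dimensional query on Cu
        else:
            _report_all(u, out)                 # last coordinate: all of Cu qualifies

    while v is not None and not (a <= v.point[dim] <= b):   # descend to vsplit
        v = v.left if b < v.point[dim] else v.right
    if v is None:
        return
    check(v.point)
    u = v.left
    while u is not None:                  # left path
        if a <= u.point[dim]:
            check(u.point)
            canonical(u.right)
            u = u.left
        else:
            u = u.right
    u = v.right
    while u is not None:                  # right path
        if u.point[dim] <= b:
            check(u.point)
            canonical(u.left)
            u = u.right
        else:
            u = u.left
```

A 3-dimensional query would then be `out = []; query(build(pts, 0, 3), [(1, 5), (2, 8), (0, 12)], 0, out)`.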
Analysis
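Assume by induction that a (d - 1)-dimensional range tree on m points uses O(m log^(d-2) m) space. The associated structures at each depth of T then use:
$$\begin{aligned}
\text{root:}\quad & O(n \log^{d-2} n)\\
\text{depth 1:}\quad & 2\,O\big(\tfrac{n}{2}\log^{d-2}\tfrac{n}{2}\big) = O\big(n\log^{d-2}\tfrac{n}{2}\big)\\
\text{depth 2:}\quad & 4\,O\big(\tfrac{n}{4}\log^{d-2}\tfrac{n}{4}\big) = O\big(n\log^{d-2}\tfrac{n}{4}\big)\\
& \quad\vdots\\
\text{leaves:}\quad & n\,O(1) = O(n)
\end{aligned}$$
Then the size of the d-dimensional range tree is
$$O(n\log^{d-2} n) + O\big(n\log^{d-2}\tfrac{n}{2}\big) + O\big(n\log^{d-2}\tfrac{n}{4}\big) + \cdots + O(n) \le O(\log n)\cdot O(n\log^{d-2} n) = O(n\log^{d-1} n).$$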
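The recurrence behind this sum can be evaluated exactly. The sketch below (hypothetical names; it assumes median-split trees and counts one unit per stored point, ignoring constant factors per node) computes the size of a d-dimensional range tree from the definition: each node of T stores one point plus a (d - 1)-dimensional structure on its canonical set. For perfect trees (n = 2^h - 1) it gives h 2^h entries for d = 2 and 2^h h(h + 1)/2 + n entries for d = 3, in line with the O(n log^(d-1) n) bound.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def size(d: int, n: int) -> int:
    """Stored entries of a d-dimensional range tree on n points (median splits)."""
    if n == 0:
        return 0
    if d == 1:
        return n                  # a 1-dimensional structure stores each point once
    mid = n // 2
    # this node + its (d-1)-dim structure on its n points + the two subtrees
    return 1 + size(d - 1, n) + size(d, mid) + size(d, n - mid - 1)


def bound(d: int, n: int) -> float:
    """n log^(d-1) n, the claimed order of growth."""
    return n * math.log2(n) ** (d - 1)


for d in (2, 3):
    n = 2 ** 12 - 1
    print(d, size(d, n), round(size(d, n) / bound(d, n), 3))
```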
Motivation
In ℝ² the query time of range trees is O(k + log^2 n). For comparison-based algorithms, Ω(k + log n) is a lower bound. Can we do better and match the lower bound? Yes: we will now show how to obtain the optimal O(k + log n) query time.
Take the nodes on the left path and the right path. Select the canonical set Ci at the right child of a node on the left path, and the canonical set Cj at the left child of a node on the right path. This takes O(log n) time (the height of the BBST T). Let {C1, C2, ..., Cc} be the canonical sets selected, where c = O(log n).
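Store each canonical set Cv as an array Av sorted by y-coordinate. Each entry of Av keeps a pointer to the first entry of Av.left, and one to the first entry of Av.right, whose y-coordinate is greater than or equal to its own.
Thus after performing a search for a2 or b2 in Av, we can find the position of a2 or b2 in Av.left and Av.right in O(1) time.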
For each selected canonical array Ai: search for the key a2 in Ai; then, starting from a2, walk along Ai until b2 is exceeded.
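A single binary search for a2 in the array at the root takes O(log n) time. After that, by following pointer links along the search paths, we find the position of a2 in each canonical array Ai in O(1) time. Starting from a2, walk along Ai, reporting the points, until b2 is exceeded.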
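A minimal sketch of this layered structure in Python, under these assumptions: each canonical set is a y-sorted array `A`, and every entry i of `A` stores the index `lptr[i]` (resp. `rptr[i]`) of the first entry in the left (resp. right) child's array whose y-coordinate is at least that of `A[i]`, plus one extra past-the-end entry. Only one binary search is performed (here at vsplit rather than the root; either works), and every other position of a2 is found by following a pointer in O(1). Names are hypothetical. The pointers are computed with `bisect` for brevity, which makes preprocessing O(n log^2 n); a linear merge-style pass per node would give O(n log n).

```python
from __future__ import annotations

from bisect import bisect_left
from heapq import merge
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    """Node of T (keyed on x) carrying the cascaded y-sorted array of Cv."""

    def __init__(self, point: Point, left: Optional[Node], right: Optional[Node]) -> None:
        self.point, self.left, self.right = point, left, right
        la = left.A if left else []
        ra = right.A if right else []
        self.A: List[Point] = list(merge(la, [point], ra, key=lambda p: p[1]))
        self.ys = [p[1] for p in self.A]
        lys = left.ys if left else []
        rys = right.ys if right else []
        # Pointer i -> first index in the child's array with y >= A[i].y, plus past-the-end.
        self.lptr = [bisect_left(lys, y) for y in self.ys] + [len(lys)]
        self.rptr = [bisect_left(rys, y) for y in self.ys] + [len(rys)]


def build(points_by_x: List[Point]) -> Optional[Node]:
    if not points_by_x:
        return None
    mid = len(points_by_x) // 2
    return Node(points_by_x[mid], build(points_by_x[:mid]), build(points_by_x[mid + 1:]))


def query(root: Optional[Node], a1: float, b1: float, a2: float, b2: float) -> List[Point]:
    out: List[Point] = []

    def check(p: Point) -> None:
        if a1 <= p[0] <= b1 and a2 <= p[1] <= b2:
            out.append(p)

    def walk(u: Optional[Node], i: int) -> None:
        # Report u.A[i:] until b2 is exceeded: O(1 + number reported).
        while u is not None and i < len(u.A) and u.A[i][1] <= b2:
            out.append(u.A[i])
            i += 1

    v = root
    while v is not None and not (a1 <= v.point[0] <= b1):   # descend to vsplit
        v = v.left if b1 < v.point[0] else v.right
    if v is None:
        return out
    check(v.point)
    pos = bisect_left(v.ys, a2)           # the only binary search: O(log n)
    u, i = v.left, v.lptr[pos]            # invariant: i = first index in u.A with y >= a2
    while u is not None:                  # left path
        if a1 <= u.point[0]:
            check(u.point)
            walk(u.right, u.rptr[i])
            u, i = u.left, u.lptr[i]
        else:
            u, i = u.right, u.rptr[i]
    u, i = v.right, v.rptr[pos]
    while u is not None:                  # right path
        if u.point[0] <= b1:
            check(u.point)
            walk(u.left, u.lptr[i])
            u, i = u.right, u.rptr[i]
        else:
            u, i = u.left, u.lptr[i]
    return out
```

The pointer invariant holds because a child's array is a subset of its parent's. The first entry at or above a2 in the child is therefore also the first entry at or above the parent's first such y-value.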
Hence we can answer a 2-dimensional range query in optimal O(log n + k) time. This technique is known as fractional cascading. By induction, it also improves the results for d > 2 by a factor O(log n) (using canonical arrays and linking pointers at the last level). Hence range trees with fractional cascading in d ≥ 2 dimensions yield:
Query time: O(k + log^(d-1) n) (improved by an O(log n) factor)
Space usage: O(n log^(d-1) n) (same as before)
Preprocessing time: O(n log^(d-1) n) (same as before)
O(log n + k) query time and O(n log n) preprocessing time are optimal. But the space complexity is NOT optimal: O(n log n / log log n) space is possible in 2 dimensions with the same query time, and this is optimal (not covered in this course).
Concluding remarks
Range trees: simple and nearly optimal.
Next Lecture
Segment Trees and Interval Trees