Lecture04 Range Searching


Orthogonal Range Searching

Lecture 4, CS 631100

Sheung-Hung Poon

Fall 2011 National Tsing Hua University (NTHU)

Outline

Reference
Textbook Chapter 5; Mount's Lectures 17 and 18

Problem: querying a database
Solution in one dimension
Data structure in $\mathbb{R}^2$: range trees
Extension to higher dimensions
$\log n$ factor improvement

An Example of Application on Database


A database in a bank records transactions.
A query: find all the transactions such that
The amount is between $1000 and $2000, and
It happened between 10:40am and 11:20am.

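Geometric interpretation: each transaction is a point (time, amount) in the plane, and the query is an axis-parallel rectangle. The answer is the set of points inside the rectangle.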

Query problems

Assume $n$ is the total number of transactions in the database.
We will show how to build a data structure in $O(n \log n)$ time that answers this type of query in $O(k + \log n)$ time, where $k$ is the size of the output (the number of transactions that are reported).
The data structure is built only once; then a large number of queries can be answered quickly.
$O(n \log n)$ is the preprocessing time.
$O(k + \log n)$ is the query time.

Boxes

2D box: also known as a rectangle, with sides parallel to the coordinate axes.
3D box: for example $[0, 3] \times [0, 2.5] \times [0, 2]$.
Boxes generalize to any dimension.
Algorithmic problems with boxes are relatively easy.

Problem statement

Let $P$ be a set of $n$ points in $\mathbb{R}^d$.
We assume $d = O(1)$.
Preprocess $P$ so as to answer queries of the type
Input: $(a_1, b_1, a_2, b_2, \dots, a_d, b_d)$
Output: $P \cap ([a_1, b_1] \times [a_2, b_2] \times \dots \times [a_d, b_d])$

We denote $k = |P \cap ([a_1, b_1] \times [a_2, b_2] \times \dots \times [a_d, b_d])|$.
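
For reference, the query can always be answered without any preprocessing by scanning all of $P$, which costs $O(dn)$ per query. The sketch below is only a baseline to compare the later data structures against; the function name `range_query_bruteforce` and the representation of a box as a list of `(a_i, b_i)` pairs are assumptions made for this illustration.

```python
def range_query_bruteforce(points, box):
    """Return every point p with box[i][0] <= p[i] <= box[i][1] for all i.

    points: list of d-tuples; box: list of d (a_i, b_i) pairs (assumed layout).
    """
    return [p for p in points
            if all(lo <= c <= hi for c, (lo, hi) in zip(p, box))]


points = [(1, 5), (2, 2), (4, 4), (6, 1), (7, 6)]
box = [(2, 6), (1, 4)]          # [a1, b1] x [a2, b2]
print(range_query_bruteforce(points, box))  # [(2, 2), (4, 4), (6, 1)]
```

Here $k = 3$. The rest of the lecture is about replacing the linear scan by a search whose cost depends on $k$ and $\log n$ only.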

One Dimensional Case (d=1): Using BBST

Problem statement

$P$ is a set of real numbers.
Queries: find all the points in $P$ that are between $a$ and $b$.
Data structure: Balanced Binary Search Tree (BBST)
Preprocessing time: $\Theta(n \log n)$ time to build a BBST
Space usage: $\Theta(n)$

Query time: $\Theta(k + \log n)$ time. How?

Answering a query
Algorithm Report(T, a, b)
Input: a BBST T storing P, an interval [a, b]
Output: P ∩ [a, b]
if T = NULL
    then return
x ← value stored at the root of T
if a < x
    then Report(T.left, a, b)
if a ≤ x ≤ b
    then output x
if x < b
    then Report(T.right, a, b)
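
The pseudocode translates almost line by line into Python. This is a minimal sketch, assuming distinct keys; the `Node` class and the helper `build_bbst` (which builds a perfectly balanced tree from a sorted list instead of using a self-balancing tree) are illustrative choices, not part of the lecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_bbst(sorted_keys, lo=0, hi=None):
    """Build a balanced BST from an already sorted list (median at the root)."""
    if hi is None:
        hi = len(sorted_keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(sorted_keys[mid],
                build_bbst(sorted_keys, lo, mid),
                build_bbst(sorted_keys, mid + 1, hi))


def report(t, a, b, out):
    """Append to `out` every key x stored in t with a <= x <= b."""
    if t is None:
        return
    x = t.key
    if a < x:                 # the left subtree may still contain keys >= a
        report(t.left, a, b, out)
    if a <= x <= b:
        out.append(x)
    if x < b:                 # the right subtree may still contain keys <= b
        report(t.right, a, b, out)


tree = build_bbst(sorted([3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89]))
found = []
report(tree, 18, 77, found)
print(found)  # [19, 23, 30, 37, 49, 59, 62, 70]
```

Because the recursion is an in-order traversal restricted to the query interval, the keys come out in sorted order.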

Analysis of query time

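The nodes visited by Report lie on the path from the root to $v_{split}$ (the node where the search paths for $a$ and $b$ split), on the left path, on the right path, or in the subtrees in between (red in the figure).
Length of the path from the root to $v_{split}$, of the left path, and of the right path: all lengths are $O(\log n)$.
Sum of the sizes of the red subtrees: at most $k$, since every point in them is reported.

Query time: $O(k + \log n)$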
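
The same $O(k + \log n)$ bound can also be seen on a plain sorted array, which is a different (static) structure from the BBST above: two binary searches locate the ends of the answer, and the answer itself is a contiguous slice. The sketch below uses Python's standard `bisect` module; the function name `range_query_1d` is an assumption for this illustration.

```python
from bisect import bisect_left, bisect_right


def range_query_1d(sorted_keys, a, b):
    """Return all keys x with a <= x <= b from an ascending list."""
    lo = bisect_left(sorted_keys, a)    # first index with key >= a, O(log n)
    hi = bisect_right(sorted_keys, b)   # first index with key >  b, O(log n)
    return sorted_keys[lo:hi]           # the k reported keys, O(k)


keys = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89]
print(range_query_1d(keys, 18, 77))  # [19, 23, 30, 37, 49, 59, 62, 70]
```

Sorted arrays reappear later in the lecture as canonical arrays; the BBST is kept here because its subtrees are what generalizes to two dimensions.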

Two Dimensional Case (d=2): Using range tree

Introduction

A set $P$ of $n$ points in $\mathbb{R}^2$.
Query: given $(a_1, b_1, a_2, b_2)$, find all points $(x, y)$ from $P$ in the rectangle $[a_1, b_1] \times [a_2, b_2]$.
Results presented in this section:
$\Theta(n \log n)$ preprocessing time
$\Theta(n \log n)$ space usage
$\Theta(k + \log^2 n)$ query time

Query time will be slightly improved in the last section

Canonical sets
First store $P$ in a BBST $T$ using the x-coordinates as keys.
We associate each node $v$ of $T$ with a canonical set $C_v$ containing the points of $P$ stored in the subtree rooted at $v$.

Range trees in $\mathbb{R}^2$

Each canonical set $C_v$ is stored in a BBST $T_v$ using the y-coordinates as keys. $T_v$ is called the canonical tree at node $v$.

We make the query in two steps: first on x-coordinates, then on y-coordinates (as shown in the following slides).

Step 1: Querying x-coordinates


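First make the query with range $[a_1, b_1]$ on x-coordinates.
Let $P' = P \cap ([a_1, b_1] \times (-\infty, \infty))$.
Let $P''$ be the set of points stored on the left path and the right path (when searching for $a_1$ and $b_1$). Points of $P''$ may lie outside $[a_1, b_1]$, so they are tested individually in Step 2.
We partition $P' \setminus P''$ into $c$ canonical subsets $C_1, C_2, \dots, C_c$.
Thus $P' \subseteq P'' \cup C_1 \cup C_2 \cup \dots \cup C_c$.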

Partitioning P

After we make the query with range $[a_1, b_1]$ on x-coordinates:
We take the nodes on the left path and the right path, which gives $P''$.
For each node on the left path where the search for $a_1$ turns left, select the canonical tree $T_i$ of its right child (gives some $C_i$).
For each node on the right path where the search for $b_1$ turns right, select the canonical tree $T_i$ of its left child (gives some $C_i$).
It takes $O(\log n)$ time (height of the BBST).
There are $c = O(\log n)$ canonical sets in our partition.

Step 2: Querying y-coordinates

For every $p \in P''$, check whether $p \in [a_1, b_1] \times [a_2, b_2]$, and report it if it is.

For all $i$, use the interval $[a_2, b_2]$ to perform a 1-dim. search query in $C_i$ using canonical tree $T_i$.

The union of all these results gives $P \cap ([a_1, b_1] \times [a_2, b_2])$.
Analysis of query time:
Let $k_i$ = no. of points reported from $T_i$, so $\sum_{i=1}^{c} k_i \le k$.
Query time:
$$\sum_{i=1}^{c} O(\log n + k_i) = O\Bigl(c \log n + \sum_{i=1}^{c} k_i\Bigr) = O(\log^2 n + k)$$
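
The two steps can be put together in a short Python sketch. It is an illustration under stated assumptions, not the lecture's implementation: each canonical tree $T_v$ is replaced by a sorted list `by_y` searched with `bisect` (same $O(\log n + k_i)$ cost per canonical set), every node stores one point, and the structure is built by re-sorting at each node, which costs $O(n \log^2 n)$ rather than the better bound discussed later. The names `Node`, `report_canonical` and `range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, pts):                      # pts: non-empty, sorted by x
        mid = len(pts) // 2
        self.point = pts[mid]
        self.left = Node(pts[:mid]) if mid > 0 else None
        self.right = Node(pts[mid + 1:]) if mid + 1 < len(pts) else None
        # Canonical set C_v keyed by y (stand-in for the canonical tree T_v).
        self.by_y = sorted((y, x) for x, y in pts)


def report_canonical(node, a2, b2, out):
    """1-dim. query with [a2, b2] on the canonical set of `node`."""
    if node is None:
        return
    lo = bisect_left(node.by_y, (a2, float("-inf")))
    hi = bisect_right(node.by_y, (b2, float("inf")))
    out.extend((x, y) for y, x in node.by_y[lo:hi])


def range_query(root, a1, b1, a2, b2):
    out = []

    def check(p):                                 # points of P'' are tested one by one
        if a1 <= p[0] <= b1 and a2 <= p[1] <= b2:
            out.append(p)

    v = root
    while v is not None and not (a1 <= v.point[0] <= b1):   # find v_split
        v = v.left if b1 < v.point[0] else v.right
    if v is None:
        return out
    check(v.point)
    u = v.left                                    # left path (search for a1)
    while u is not None:
        check(u.point)
        if a1 <= u.point[0]:
            report_canonical(u.right, a2, b2, out)
            u = u.left
        else:
            u = u.right
    u = v.right                                   # right path (search for b1)
    while u is not None:
        check(u.point)
        if u.point[0] <= b1:
            report_canonical(u.left, a2, b2, out)
            u = u.right
        else:
            u = u.left
    return out


points = [(1, 5), (2, 2), (4, 4), (6, 1), (7, 6), (3, 7), (5, 3), (8, 2)]
root = Node(sorted(points))
print(sorted(range_query(root, 2, 6, 1, 4)))  # [(2, 2), (4, 4), (5, 3), (6, 1)]
```

Each call to `report_canonical` is one of the $c = O(\log n)$ one-dimensional queries counted in the analysis above.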

Analysis of total query time

[Figure: a canonical set $C_i$ selected in $T$ and its canonical tree $T_i$.]

Query on x-coordinates on $T$:
Obtain $P''$ (points on left & right paths) and the canonical trees $T_i$. It takes $O(\log n)$ time.

Query on y-coordinates on the $T_i$:
It takes $O(\log^2 n + k)$ time (refer to previous slide).

Total query time $= O(\log n) + O(\log^2 n + k) = O(\log^2 n + k)$.


Space complexity (Proof 1)

A point $p$ belongs to all the canonical sets on the path from the vertex of $T$ that stores $p$ to the root (and only these canonical sets).
Thus $p$ lies in $O(\log n)$ canonical sets.
Hence
$$\sum_{v \in T} |C_v| = O(n \log n),$$
where $C_v$ = the canonical set at node $v$.
The memory space used is $O(n \log n)$. Actually, it is $\Theta(n \log n)$. Why?

Space complexity (Proof 2)

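Sizes of the canonical trees $T_i$, level by level in $T$ ($\log n$ levels):
Root: 1 tree of size $n$, total $n$
Level 1: 2 trees of size $n/2$, total $2(n/2) = n$
Level 2: 4 trees of size $n/4$, total $4(n/4) = n$
...
Leaves: $n$ trees of size 1, total $n(1) = n$

Total $= \Theta(n \log n)$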
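
A quick numerical check of this level-by-level count, as a sketch. It assumes the layout used in the other examples of these notes, where every node of $T$ stores one point (the median), so subtree sizes are roughly $n, n/2, n/4, \dots$ rather than exactly those values; the function name `total_canonical_size` is hypothetical.

```python
from math import log2


def total_canonical_size(n):
    """Sum of |C_v| over all nodes of a balanced BST on n points
    (median stored at the root, assumed layout)."""
    if n == 0:
        return 0
    mid = n // 2
    return n + total_canonical_size(mid) + total_canonical_size(n - mid - 1)


for h in (3, 10, 14):
    n = 2 ** h - 1
    total = total_canonical_size(n)
    print(n, total, round(total / (n * log2(n)), 3))
```

For $n = 2^h - 1$ the sum is exactly $h(n + 1) - n$, for instance 17 for $n = 7$, so the ratio to $n \log_2 n$ approaches 1 as $n$ grows, in line with the $\Theta(n \log n)$ total.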


Preprocessing time
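$T_v$ can be built in $O(|C_v| \log |C_v|)$ time, and $|C_v| \log |C_v| \le |C_v| \log n$.

Hence the range tree can be built in time
$$\sum_{v \in T} O(|C_v| \log n) = \log n \cdot O(n \log n) = O(n \log^2 n).$$

We can do better ...

Compute the $T_v$'s from leaves to root.
Computing $T_v$ amounts to merging two sorted sequences (the y-sorted canonical sets of the two children).
It takes $O(|C_v|)$ time.
Overall, we can build the range tree in time $\Theta\bigl(\sum_{v \in T} |C_v|\bigr) = \Theta(n \log n)$.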
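
The leaves-to-root construction can be sketched as a post-order recursion in which each node merges the y-sorted lists of its children. This is a sketch under assumptions: every node stores one point, so three sorted runs are merged instead of two (still linear in $|C_v|$), canonical sets are kept as sorted lists rather than BBSTs, and the input is sorted by x once up front. `Node` and `build` are illustrative names; `heapq.merge` is the standard-library merge of sorted iterables.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Optional


@dataclass
class Node:
    point: tuple
    left: Optional["Node"]
    right: Optional["Node"]
    by_y: list            # canonical set C_v, sorted by y-coordinate


def build(pts_by_x, lo=0, hi=None):
    """Build the tree on pts_by_x[lo:hi]; children are finished before the parent."""
    if hi is None:
        hi = len(pts_by_x)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    left = build(pts_by_x, lo, mid)
    right = build(pts_by_x, mid + 1, hi)
    # Merge the already sorted runs: the node's own point plus the children's lists.
    runs = [[pts_by_x[mid]]] + [c.by_y for c in (left, right) if c is not None]
    by_y = list(merge(*runs, key=lambda p: p[1]))
    return Node(pts_by_x[mid], left, right, by_y)


points = [(1, 5), (2, 2), (3, 7), (4, 4), (5, 3), (6, 1), (7, 6), (8, 2)]
root = build(sorted(points))            # one O(n log n) sort by x
print([p[1] for p in root.by_y])        # [1, 2, 2, 3, 4, 5, 6, 7]
```

No sorting by y is ever performed explicitly: the y-order at the root emerges from the merges, exactly as in merge sort.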

Range trees in higher dimensions

Idea
We assume $d > 1$ and $d = O(1)$.
We want to perform range searching in $\mathbb{R}^d$.
We still build $T$ with respect to the $x_1$-coordinate.

For each canonical set of $T$ we build a $(d-1)$-dimensional range searching data structure using coordinates $(x_2, x_3, \dots, x_d)$.
To answer a $d$-dimensional query:
Find the canonical trees of $T$ associated with $[a_1, b_1]$.
Make a $(d-1)$-dimensional query on each canonical tree recursively, using $[a_2, b_2] \times [a_3, b_3] \times \dots \times [a_d, b_d]$.
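
The recursion on the dimension can be sketched compactly in Python. This is an illustration under several assumptions: the last coordinate is handled by a sorted list with `bisect`; each node caches the minimum and maximum key of its subtree so that the canonical nodes are found by a containment test rather than by walking the left and right paths explicitly; and the construction simply re-sorts at every level, so it does not achieve the preprocessing bound of the lecture. The class names `RangeTree` and `_Node` are hypothetical.

```python
from bisect import bisect_left, bisect_right


class _Node:
    def __init__(self, pts, dim):                 # pts: non-empty, sorted by coordinate dim
        mid = len(pts) // 2
        self.point = pts[mid]
        self.min, self.max = pts[0][dim], pts[-1][dim]
        self.assoc = RangeTree(pts, dim + 1)      # (d-1)-dim. structure on C_v
        self.left = _Node(pts[:mid], dim) if mid > 0 else None
        self.right = _Node(pts[mid + 1:], dim) if mid + 1 < len(pts) else None


class RangeTree:
    """Range tree on coordinates dim, dim+1, ..., d-1 of a non-empty point list."""

    def __init__(self, pts, dim=0):
        self.dim = dim
        self.pts = sorted(pts, key=lambda p: p[dim])
        self.last = dim == len(self.pts[0]) - 1
        if self.last:
            self.keys = [p[dim] for p in self.pts]
        else:
            self.root = _Node(self.pts, dim)

    def query(self, box):
        """box: list of d (a_i, b_i) pairs; returns the points inside the box."""
        out = []
        self._collect(box, out)
        return out

    def _collect(self, box, out):
        a, b = box[self.dim]
        if self.last:                             # 1-dim. query on a sorted list
            out.extend(self.pts[bisect_left(self.keys, a):bisect_right(self.keys, b)])
        else:
            self._visit(self.root, a, b, box, out)

    def _visit(self, v, a, b, box, out):
        if v is None or b < v.min or v.max < a:   # subtree entirely outside [a, b]
            return
        if a <= v.min and v.max <= b:             # canonical node: recurse on next coordinate
            v.assoc._collect(box, out)
            return
        if all(lo <= c <= hi for c, (lo, hi) in zip(v.point, box)):
            out.append(v.point)                   # node on a search path: test directly
        self._visit(v.left, a, b, box, out)
        self._visit(v.right, a, b, box, out)


points = [(1, 2, 3), (4, 5, 6), (2, 8, 1), (7, 1, 5), (5, 5, 5), (3, 3, 9)]
tree = RangeTree(points)
print(sorted(tree.query([(1, 5), (2, 6), (3, 6)])))  # [(1, 2, 3), (4, 5, 6), (5, 5, 5)]
```

The containment test selects the same kind of canonical nodes as the path-based description: maximal subtrees whose keys all lie in $[a_1, b_1]$.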

Analysis

Query time: $O(\log^d n + k)$

Due to the $d$ nested levels in a $d$-dim. range tree, searching through the $d$ levels takes $O(\log^d n)$ time.
Reporting all points inside the query range takes $O(k)$ time.

Space complexity: $O(n \log^{d-1} n)$

By induction on $d$ (see next slide).

Preprocessing time: $O(n \log^{d-1} n)$

Compute the $T_v$'s from leaves to root.
As the size of the range tree is $O(n \log^{d-1} n)$, building the whole range tree takes $O(n \log^{d-1} n)$ time.

Space complexity (Proof by Induction)


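Suppose a $(d-1)$-dim. range tree on $n$ points has size $O(n \log^{d-2} n)$.

Sizes of the associated $(d-1)$-dim. structures $T_i$, level by level in $T$ ($\log n$ levels):
Root: $O(n \log^{d-2} n)$
Level 1: $2 \cdot O\bigl(\frac{n}{2} \log^{d-2} \frac{n}{2}\bigr) = O\bigl(n \log^{d-2} \frac{n}{2}\bigr)$
Level 2: $4 \cdot O\bigl(\frac{n}{4} \log^{d-2} \frac{n}{4}\bigr) = O\bigl(n \log^{d-2} \frac{n}{4}\bigr)$
...
Leaves: $n \cdot O(1) = O(n)$

Then the size of the $d$-dim. range tree is
$$O(n \log^{d-2} n) + O\bigl(n \log^{d-2} \tfrac{n}{2}\bigr) + O\bigl(n \log^{d-2} \tfrac{n}{4}\bigr) + \dots + O(n) = \log n \cdot O(n \log^{d-2} n) = O(n \log^{d-1} n).$$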

Improved range trees: Fractional cascading

Motivation

In $\mathbb{R}^2$ the query time of range trees is $\Theta(k + \log^2 n)$.
For comparison-based algorithms, $\Omega(k + \log n)$ is a lower bound.
Can we do better to achieve the lower bound?
Yes, we'll then show how to obtain $\Theta(k + \log n)$ optimal query time.

Step 1: Querying x-coordinates (same as before)

Make the query with range $[a_1, b_1]$ on x-coordinates.
Take the nodes on the left path and the right path.
Select canonical set $C_i$ at the right child of a node on the left path (where the search for $a_1$ turns left).
Select canonical set $C_j$ at the left child of a node on the right path (where the search for $b_1$ turns right).
It takes $O(\log n)$ time (height of the BBST $T$).
Let $\{C_1, C_2, \dots, C_c\}$ = canonical sets selected, where $c = O(\log n)$.

Step 2: Querying y-coordinates (Modified)

When processing a query $(a_1, b_1, a_2, b_2)$, we search the canonical trees $T_v$, always with the same two keys $a_2$ and $b_2$. For each such tree, we spend $O(\log n)$ searching time.
Main idea: as $C_{v.left}$ and $C_{v.right}$ are subsets of $C_v$, we keep pointers from each node of $T_v$ to the nodes of $T_{v.left}$ and $T_{v.right}$ that hold the same key, or the next larger key.

[Figure: $A_v$ with pointers into $A_{v.left}$ and $A_{v.right}$.]

Thus after performing a search on $a_2$ or $b_2$ in $T_v$, we can perform the search on $a_2$ or $b_2$ in $T_{v.left}$ and $T_{v.right}$ in $O(1)$ time.

Minor idea: replace each canonical tree $T_i$ by a canonical array $A_i$ holding the canonical set $C_i$ sorted by y-coordinate:

Make a search for key $a_2$ in array $A_i$.
Starting from $a_2$, walk along array $A_i$ until $b_2$ is exceeded.

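First make a binary search for $a_2$ in $A_{root}$, which takes $O(\log n)$ time.

[Figure: arrays $A_{root}$, $A_u$, $A_v$, $A_w$ along the search paths, linked by pointers down to the canonical arrays $A_i$, $A_j$ of the selected sets $C_i$, $C_j$.]

By following pointer links, we can locate $a_2$ in each canonical array $A_i$ in $O(1)$ time.
Starting from $a_2$, walk along array $A_i$ (reporting the points) until $b_2$ is exceeded.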
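
The pointer-following query can be sketched in Python as follows. This is a minimal illustration under assumptions, not the lecture's implementation: every node stores one point, canonical arrays are Python lists built by merging, and the pointers are stored as index lists `lptr` and `rptr` with one extra "past the end" slot. The names `Node`, `_links`, `build` and `query` are hypothetical.

```python
from bisect import bisect_left
from heapq import merge


def _links(parent_ys, child_ys):
    """links[i] = first index in child_ys with key >= parent_ys[i];
    the final slot handles a position past the end of the parent array."""
    links, j = [], 0
    for y in parent_ys:
        while j < len(child_ys) and child_ys[j] < y:
            j += 1
        links.append(j)
    links.append(len(child_ys))
    return links


class Node:
    def __init__(self, point, left, right):
        self.point, self.left, self.right = point, left, right
        kids = [c.arr for c in (left, right) if c is not None]
        self.arr = list(merge([point], *kids, key=lambda p: p[1]))  # canonical array A_v
        self.ys = [p[1] for p in self.arr]
        self.lptr = _links(self.ys, left.ys if left else [])
        self.rptr = _links(self.ys, right.ys if right else [])


def build(pts_by_x, lo=0, hi=None):
    if hi is None:
        hi = len(pts_by_x)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(pts_by_x[mid], build(pts_by_x, lo, mid), build(pts_by_x, mid + 1, hi))


def query(root, a1, b1, a2, b2):
    out = []

    def check(p):                       # nodes on the search paths are tested directly
        if a1 <= p[0] <= b1 and a2 <= p[1] <= b2:
            out.append(p)

    def walk(node, j):                  # report from position j until b2 is exceeded
        while node is not None and j < len(node.arr) and node.ys[j] <= b2:
            out.append(node.arr[j])
            j += 1

    if root is None:
        return out
    v, i = root, bisect_left(root.ys, a2)          # the only binary search
    while v is not None and not (a1 <= v.point[0] <= b1):
        v, i = (v.left, v.lptr[i]) if b1 < v.point[0] else (v.right, v.rptr[i])
    if v is None:
        return out
    check(v.point)
    u, j = v.left, v.lptr[i]                       # left path
    while u is not None:
        check(u.point)
        if a1 <= u.point[0]:
            walk(u.right, u.rptr[j])
            u, j = u.left, u.lptr[j]
        else:
            u, j = u.right, u.rptr[j]
    u, j = v.right, v.rptr[i]                      # right path
    while u is not None:
        check(u.point)
        if u.point[0] <= b1:
            walk(u.left, u.lptr[j])
            u, j = u.right, u.rptr[j]
        else:
            u, j = u.left, u.lptr[j]
    return out


points = [(1, 5), (2, 2), (3, 7), (4, 4), (5, 3), (6, 1), (7, 6), (8, 2)]
root = build(sorted(points))
print(sorted(query(root, 2, 6, 1, 4)))  # [(2, 2), (4, 4), (5, 3), (6, 1)]
```

Only the position of $a_2$ is carried down the tree; the end of each reported run is found by the walk itself, which is charged to the $k$ output points.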

Improving d-dim. range trees

Hence we can answer a 2-dim. range query in $O(\log n + k)$ optimal time.
This technique is known as fractional cascading.
By induction, it also improves by a factor $O(\log n)$ the results in $d > 2$ (by using canonical arrays at the last level, and the linking pointers).
Hence range trees with fractional cascading in $d \ge 2$ yield

Query time: $O(k + \log^{d-1} n)$ (improved by an $O(\log n)$ factor)
Space usage: $O(n \log^{d-1} n)$ (same as before)
Preprocessing time: $O(n \log^{d-1} n)$ (same as before)

Remarks on 2-dim. improved range trees

$O(\log n + k)$ query time and $O(n \log n)$ preprocessing time are optimal.
But the space complexity is NOT optimal.
$O(n \log n / \log \log n)$ space is possible in 2 dimensions with the same query time, and this is optimal (not covered in this course).

Concluding remarks

Range trees:
simple
nearly optimal

Spatial databases mainly use R-trees:
not covered in this course
good in practice with real data sets
but no performance guarantee (no good worst-case bound on the query time)

Next Lecture

Summary of this lecture:
Orthogonal Range Searching
2-dim. range trees
d-dim. range trees
Fractional cascading

Next lecture:
Segment Trees and Interval Trees
