1 Introduction

Nowadays, people increasingly rely on location-based services (LBS), such as Google Maps and AMap. As an essential technique for LBS systems, spatial keyword search [8, 17, 31, 33, 41, 44, 45, 47, 56, 57, 60] has become an important research topic, and a great deal of effort [22, 38, 42, 43, 46] has been devoted to making location-based services efficient and of high quality.

In recent years, extensive studies have been carried out on processing Spatial Keyword Queries (SKQ) [10, 14, 28, 34, 36, 46, 48, 50, 51]. Pioneering works [18, 39, 52] mainly deal with SKQs with boolean or approximate keyword matching. Though some classical indexing structures, such as the IR-tree [18], S2I [39] and MHR-tree [52], successfully reduce the computational and I/O costs, they are unable to retrieve objects whose keywords are synonyms of, but literally different from, the query keywords. More recently, some semantic-aware SKQ approaches [15, 19, 37] have been proposed to capture the semantic meanings of keywords learned via natural language processing tools (e.g. word embedding [1]), so that more meaningful results can be returned. However, since the semantic vectors are high-dimensional, these indices, e.g. the NIQ-tree proposed in [37], show an unsatisfactory pruning effect in query processing due to the ‘curse of dimensionality’. More effective solutions are thus needed to facilitate both effective and efficient querying.

Therefore, this paper aims to design a more effective indexing and querying framework for semantic-aware SKQs. This is a challenging problem for two main reasons: firstly, the semantic understanding of keywords requires large-scale latent features (e.g. parsed by word embedding), with each phrase represented by a high-dimensional vector [1, 26], leading not only to more memory space in storage, but also to higher I/O cost in data access. Secondly, each spatial object is associated with large-scale latent features in semantic space, which severely deteriorates the pruning effect of multi-dimensional spatial keyword search algorithms [37]. This explains the poor performance of existing semantic-aware indexing methods when the dimensionality (for describing semantics) is large. A key problem is thus how to index high-dimensional data in relatively low dimensions while still guaranteeing accurate semantic-aware SKQ results.

Motivated by the effectiveness of pivot-based metric indexes [3, 35, 49], we propose a novel hybrid indexing structure called the S2R-tree to address all the above issues. The S2R-tree adopts a hierarchical structure to seamlessly integrate information in the spatial and semantic domains. A pivot-based space mapping mechanism is designed, so that high-dimensional vectors can be transformed to low-dimensional coordinates in which the data tend to have high variance. In this way, the S2R-tree indexes the semantic vectors using the low-dimensional pivot-based coordinates rather than the original vectors, so that large dead space can be avoided. A query processing algorithm on top of the S2R-tree is further proposed to prune the search space based on theoretical bounds. We also exploit pivot-based principles of data partitioning and filtering, so as to prune in the original semantic space while using only the low-dimensional S2R-tree. The contributions of this paper can be summarized as:

  1. We design a set of pivot-based principles for space transformation and partitioning, so that high-dimensional semantic vectors can be rationally mapped to low-dimensional coordinates with high data variance.

  2. We propose a novel hybrid indexing structure, the S2R-tree, which not only integrates spatial and semantic information seamlessly, but also represents semantic information by pivot-based coordinates in low dimensions, so that the pruning effect can be improved significantly.

  3. We design an efficient and accurate SKQ processing algorithm on top of the S2R-tree, which can greatly prune the high-dimensional search space based on theoretical bounds.

  4. We conduct an extensive experimental analysis and make comparisons with baseline algorithms, demonstrating the efficiency of our method.

The remainder of this paper is organized as follows. Section 2 surveys related work. Section 3 formalizes the problem. Section 4 introduces the baseline algorithm and our solution. Section 5 presents the experimental results. Finally, Section 6 concludes the paper and discusses future work.

2 Related work

Spatial keyword query is widely used in location-based devices and services; it takes both the spatial and textual relevance between a query and the objects in the dataset into consideration. A range of contributions have been made in the literature studying different aspects of spatio-textual querying [5,6,7, 9, 11, 12, 16, 23,24,25, 30, 32, 53]. Some efforts support the Spatial Keyword Boolean Query (SKBQ) [18, 20], which requires exact keyword matches and may therefore return few or no results. To overcome this problem, much work has been done to support the Spatial Keyword Approximate Query (SKAQ) [29, 39, 52], which ensures that the query results are no longer sensitive to spelling errors and conventional spelling differences. Numerous works study spatial keyword queries on road networks, collective queries [4, 21], diversified querying [54], why-not questions [13, 58], interactive querying [48, 59] and so on.

Many classical indexing structures have been proposed to support spatial keyword queries, such as the IR-tree [18], MHR-tree [52], S2I [39], RCA [55], etc. All of them lack semantics for objects and queries, which makes them unable to retrieve objects whose keywords are synonyms of, but literally different from, the query keywords. For this purpose, we need a novel structure that can accurately return objects with both high spatial and high semantic similarity to the query.

In [37], Qian et al. propose an iDistance-based hybrid indexing structure, called the NIQ-tree, which is the first to take semantic relevance into consideration. The NIQ-tree combines a Quadtree for the spatial domain, iDistance for the semantic domain, and inverted lists for keywords. It is organized hierarchically, so that pruning in all domains can be performed simultaneously. However, since the semantics of a user’s query are represented by high-dimensional vectors obtained via word embedding [1] or LDA [2], query processing becomes inefficient due to the ‘curse of dimensionality’, which severely deteriorates the pruning effect of multi-dimensional spatial keyword search algorithms. More effective methods that can enhance the efficiency of high-dimensional data management are thus highly sought after.

As pivot-based indexing methods [3, 35, 49] have proven successful in indexing high-dimensional metric data, we adopt them to address the inefficiency of pruning in high-dimensional space. A variant of iDistance [27], the M-Index [35], adopts a pivot-based Voronoi space partitioning technique in which each node is represented by a pivot permutation; based on the similarity of permutations, the M-Index improves pruning efficiency and thus search performance. Traina et al. [49] propose a pivot-based metric indexing method that greatly improves indexing performance by referring to a set of pivots when building the index. However, as far as we know, existing semantic-aware spatial keyword indexes cannot meet the accuracy and efficiency requirements of semantic-based retrieval for SKQ. In this paper, we propose a novel hybrid indexing structure which not only integrates spatial and semantic information seamlessly, but also represents semantic information by pivot-based coordinates in low dimensions, so that the pruning effect can be improved significantly.

3 Problem definition

In this section, we briefly introduce some basic definitions and the statement of our problem. Table 1 summarizes the notations used throughout the paper.

Table 1 Summary of notations

Definition 1 (Spatial object)

A spatial object is an object in geographical space, such as a restaurant, a cinema or a library. Each spatial object o is modelled as o = (ε,μ), where ε is a geographical location consisting of a longitude and a latitude in two-dimensional space, and μ is a set of keywords describing o.

Definition 2 (Spatial keyword query)

A spatial keyword query is a user’s input. Each query is formalized as q = (ε,μ), where ε is the position of q, represented by a longitude and a latitude in two-dimensional geographical space, and μ is a set of words representing the user’s interests, such as ‘Sichuan restaurant’. We use the term query for short in the rest of this paper.

Example 1

Figure 1 shows an example with a query q and seven spatial objects, where each spatial object is a point-of-interest (POI) with a location and a set of keywords. The query aims to find a POI w.r.t. ‘coffee shop’ close to the query location. The keywords of objects o2 (‘STARBUCKS’) and o3 (‘ENO COFFEE’) are semantically consistent with the query keywords, while the other objects obviously have low semantic relevance to the query. We thus use the term semantic vector to model the semantics of any given set of keywords.

Fig. 1 An example of spatial keyword query

Definition 3 (Semantic vector)

A semantic vector describes the semantic information of a set of keywords as a d-dimensional vector γ = (γ1,γ2,⋯,γd) in latent semantic space, where each component γi represents a latent feature capturing useful syntactic and semantic properties of the words. We use γq and γo to denote the semantic vectors of a query q and a spatial object o, respectively, in the rest of this paper.

Next we discuss the ranking of spatial objects.

Definition 4 (Distance function)

A distance function is used to calculate the distance between an object and a query, so that the objects best meeting the user’s requirements can be returned. In the spatial domain, we use the Euclidean distance, denoted dist(q.ε,o.ε), to measure spatial distance, and normalize it to [0,1] with a sigmoid function, as shown in Eq. 1 [40].

$$ \mathcal{S}\mathcal{D}(q,o)=\frac{2}{1+e^{-dist(q.\varepsilon ,o.\varepsilon )}}-1 $$
(1)

In the semantic domain, we also utilize the Euclidean distance to calculate the semantic distance \(\mathcal {TD}(q,o)\). In order to let \(\mathcal {S}\mathcal {D}(q,o)\) and \(\mathcal {TD}(q,o)\) contribute equally to the result, we normalize it to [0,1] via Eq. 2.

$$ \mathcal{TD}(q,o)=\frac{\sqrt{\textstyle{\sum}_{i=0}^{d-1}(\gamma_{qi}-\gamma_{oi})^{2}}}{\max\mathcal {TD}}\\ $$
(2)

where γ∗i (∗ ∈ {q,o}) is the i-th component of ∗’s semantic vector γ∗; d is the dimensionality of the semantic vectors; \( \max \mathcal {TD} \) is the maximum possible pairwise distance in semantic space.

Considering both spatial and semantic influences on the final result, we use a weighting factor λ to balance the two distances. The distance function \(\mathcal {D}(q,o)\) [10] is then computed as follows:

$$ \mathcal{D}(q,o)=\lambda \times \mathcal{S}\mathcal{D}(q,o)+(1-\lambda )\times \mathcal{TD}(q,o) $$
(3)
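The three distances above can be sketched directly in Python (a minimal sketch; the function names and the λ default are our own, and \(\max \mathcal {TD}\) is assumed to be precomputed over the dataset):

```python
import math

def spatial_dist(q_loc, o_loc):
    """Euclidean distance between two 2D locations."""
    return math.hypot(q_loc[0] - o_loc[0], q_loc[1] - o_loc[1])

def SD(q_loc, o_loc):
    """Normalized spatial distance via a sigmoid (Eq. 1)."""
    return 2.0 / (1.0 + math.exp(-spatial_dist(q_loc, o_loc))) - 1.0

def TD(gamma_q, gamma_o, max_td):
    """Normalized semantic distance in [0, 1] (Eq. 2)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(gamma_q, gamma_o)))
    return d / max_td

def D(q_loc, o_loc, gamma_q, gamma_o, max_td, lam=0.5):
    """Combined distance (Eq. 3): lam balances spatial vs. semantic."""
    return lam * SD(q_loc, o_loc) + (1 - lam) * TD(gamma_q, gamma_o, max_td)
```

Note that λ = 0 makes the ranking purely semantic and λ = 1 purely spatial, matching the roles of the two terms in Eq. 3.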

Example 2

We continue with the example in Fig. 1, where the spatial distances between the query location and the spatial objects are also given. According to Eq. 2, we can derive the semantic distances between q and o2 and o3, i.e. \(\mathcal {TD}(q,o_{2})=0.141\) and \(\mathcal {TD}(q,o_{3})=0.063\); both are thus highly relevant to the query keywords in semantics. When considering spatial distance and semantics simultaneously, however, the distance from q to o2, \(\mathcal {D}(q,o_{2})=0.1955\), is much smaller than that to o3, \(\mathcal {D}(q,o_{3})=0.3315\), so o2 has a higher priority to be returned. This coincides with our observation that o3 would incur much more travel cost than o2.

Problem formalization

Given a set of spatial objects O, a spatial keyword query q and an integer k, the problem of this paper is to return the top-k objects in O that are most similar to q according to Eq. 3.

4 Main algorithms

In this section, we first introduce the baseline algorithm and then propose our solution.

4.1 NIQ*-tree based algorithm

Towards the SKQ problem defined in Section 3, the NIQ-tree [37] can be modified into a two-layered structure, called the NIQ*-tree, which combines the spatial and semantic domains hierarchically.

As shown in Fig. 2, we adopt a spatial-first method because of the better pruning effect in the spatial domain due to its 2D nature. The NIQ*-tree utilizes a Quadtree to index objects according to their spatial closeness, since the Quadtree is more stable when a new object is inserted. For each leaf node of the Quadtree, all objects are further organized by an iDistance index in the semantic layer, such that objects are grouped and managed by their semantic coherence; a B+-tree is then constructed to organize these objects according to their key values, calculated as follows:

$$ key=i \times r+ \mathcal{TD}(p_{i},o) $$
(4)

where i is the identifier of the cluster Ci; r is a constant mapping the objects in Ci into the range [i × r,(i + 1) × r); pi is the reference point of Ci. The basic form of a node in the NIQ*-tree is \(n=(p,\mathcal {R},o,r)\), where p is the pointer(s) to its child node(s); \( \mathcal {R} \) is the minimum bounding rectangle (MBR for short) covering all objects contained by n in the spatial domain; o and r are the center point and radius of a semantic hyper-sphere covering all objects contained by n in the semantic domain.
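The iDistance key of Eq. 4 can be sketched as below (a sketch; the function name is ours, and r is assumed to be chosen larger than any intra-cluster semantic distance so that the key ranges of different clusters do not overlap):

```python
def idistance_key(i, r, td_pi_o):
    """Eq. 4: map object o of cluster C_i, lying at semantic distance
    td_pi_o from the reference point p_i, into the range [i*r, (i+1)*r)."""
    assert 0.0 <= td_pi_o < r, "r must exceed intra-cluster distances"
    return i * r + td_pi_o
```

Because each cluster occupies a disjoint key interval, a one-dimensional B+-tree over these keys supports range scans restricted to a single cluster.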

Fig. 2 NIQ*-tree

During the search process, we use a priority queue to traverse the spatial-layer nodes according to the best match distance \(\mathcal D_{bm}\) to a query q, which is calculated as follows:

$$ \mathcal D_{bm}(q,n)=\lambda \times \min \mathcal {D}_{s}(q,n)+(1-\lambda )\times \min \mathcal {D_{T}}(q,n) $$
(5)

and

$$ \min \mathcal D_{T}(q,n)=\left\{\begin{array}{ll} 0 & \mathcal{TD}(q,n.o)\leq n.r\\ \mathcal {TD}(q,n.o)-n.r & \mathcal {TD}(q,n.o) > n.r \end{array}\right.; $$

where λ is a weighting factor between 0 and 1; \(\min \mathcal {D}_{s}(q,n)\) is the minimum spatial distance between the query location q.ε and the spatial MBR of n; \(\min \mathcal {D_{T}}(q,n)\) is the minimum possible semantic distance between q and n; \(\mathcal {TD}(q,n.o)\) is the semantic distance between q and n.o. Note that \( \mathcal D_{bm}(q,n) \) is a lower bound distance to q for all unvisited objects.

During query processing, we dynamically maintain the top-k minimum distances over all scanned objects and keep the k-th minimum distance as an upper bound. If the node fetched from the priority queue is a non-leaf node, we add all its child nodes to the queue; otherwise, we access the objects in the iDistance nodes of n that intersect the search space, and then update the top-k results and the upper bound. The search terminates when the lower bound is no less than the upper bound, indicating that the remaining unvisited objects cannot be better than the current top-k results.

4.2 S2R-tree based algorithm

Since pruning in semantic space is usually inefficient due to the high dimensionality, a more rational solution is to transform the semantic vectors into a low-dimensional space for indexing. To this end, motivated by the OmniR-tree [49], this section proposes a more effective hybrid indexing structure, called the Spatial-Semantic R-tree (S2R-tree for short), to improve querying efficiency without sacrificing accuracy.

Index structure

The S2R-tree adopts a two-layer structure, as shown in Fig. 3. Similar to the NIQ*-tree, it is a spatial-first structure: an R-tree is used to group objects according to their geographical coordinates, due to its superior pruning effect in 2D space.

Fig. 3 An example of S2R-tree

In the semantic domain, motivated by [49], the S2R-tree adopts a pivot-based indexing method to avoid the ‘large dead space’ phenomenon of iDistance. The basic idea is to map high-dimensional semantic vectors to a low-dimensional space, so that more effective indexing can be achieved. More specifically, we select a set P = (p0,p1,⋯,pm− 1) of m pivots (known as reference points) in the original space, so that a d-dimensional semantic vector γ = (γ0,γ1,⋯,γd− 1) can be transformed to an m-dimensional pivot-based coordinate \( \gamma ^{P} = ({\gamma ^{P}_{0}}, {\gamma ^{P}_{1}}, ..., \gamma ^{P}_{m-1}) \).

Definition 5 (Pivot-based coordinates)

Let P = (p0,p1,⋯,pm− 1) be a set of pivots in d-dimensional space. The pivot-based coordinate of a semantic vector is a vector in the pivot-based system subject to P. Given a semantic vector γ in the original d-dimensional space, it is projected to an m-dimensional pivot-based coordinate \( \gamma ^{P} = ({\gamma ^{P}_{0}}, {\gamma ^{P}_{1}},\cdots , \gamma ^{P}_{m-1}) \), where each component \( {\gamma ^{P}_{i}} (0 \leq i < m) \) is the semantic distance between γ and pivot pi in the original semantic space, i.e. \({\gamma ^{P}_{i}} = \mathcal {TD}(\gamma ,p_{i})\), calculated as in Eq. 2.

Example 3

As shown in Fig. 3b, o1,o2,⋯,o7 are the objects in Example 1, and we choose objects o0 and o4 as pivots p0 and p1 respectively, i.e. P = {o0,o4}. The semantic vectors (e.g. of o0 and o4) obtained by Word2Vec are d-dimensional, while in the pivot-based system each object is represented by an m-dimensional coordinate, where each dimension is the semantic distance between the object and a pivot in the original space. Assuming the distances between o2 and the pivots in the original space are \(\mathcal {TD}(o_{2},o_{0})=0.5\) and \(\mathcal {TD}(o_{2},o_{4})=0.95\), the pivot-based coordinate of o2 is \({o_{2}^{P}}=(\mathcal {TD}(o_{2},o_{0}), \mathcal {TD}(o_{2},o_{4}))=(0.5,0.95)\).
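The mapping of Definition 5 can be sketched as follows (helper names are ours; plain Euclidean distance stands in for \(\mathcal {TD}\), omitting the \(\max \mathcal {TD}\) normalization of Eq. 2 for brevity):

```python
import math

def td(u, v):
    """Semantic distance in the original d-dimensional space
    (unnormalized Euclidean distance, cf. Eq. 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pivot_coords(gamma, pivots):
    """Definition 5: the i-th pivot-based coordinate of gamma is its
    distance to pivot p_i, mapping d dimensions down to m = len(pivots)."""
    return [td(gamma, p) for p in pivots]
```

The result is an m-dimensional point, regardless of the original dimensionality d, which is what the low-dimensional R-tree in the semantic layer indexes.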


By projecting the semantic vectors to pivot-based coordinates, the data is transformed from the high-dimensional space to a lower-dimensional one (i.e. from d to m dimensions). Note that the data variance in the low-dimensional representation is expected to be maximized, and it is affected by the choice of pivots in P. We apply the HF algorithm [49] to generate P, in which pivots are located on the border of the dataset so as to best distinguish objects in semantic space collectively. We finally utilize an R-tree to index the pivot-based coordinates in m dimensions, which is expected to be effective given that m is always small.

Next, we discuss how to reasonably generate the pivot set P, so as to maintain the properties of the original space in the low-dimensional space. We aim to find a set of m pivots with the greatest dissimilarity in the original space, such that they can best distinguish objects in the semantic space collectively. Since finding such a set of pivots is known to be NP-complete, we apply a heuristic method [49]. As shown in Algorithm 2, we first initialize P with the pair of points (i.e. p0 and p1) with the maximum pairwise distance in the dataset. Afterwards, we keep adding semantic vectors into P until m pivots are selected (i.e. |P| = m). In each round, the semantic vector with the maximum distance to all partially selected pivots in P is chosen as the new pivot (Lines 2–4), since this ensures high data variance in the low-dimensional space. Specifically, we use the following formula to measure the distance between a semantic vector and all pivots in P:

$$ \mathcal{MG}(o,P)=\sum\limits_{p \in P} \mathcal {TD}(o,p) $$
(6)

where \(\mathcal {TD}(o,p)\) is the semantic distance between o and p.
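The greedy selection above can be sketched as follows (a sketch of Algorithm 2 under our own signature; `dist` is any metric playing the role of \(\mathcal {TD}\), and the farthest-pair initialization is done by brute force):

```python
def select_pivots(vectors, m, dist):
    """Heuristic pivot selection: start from the pair with the maximum
    pairwise distance, then greedily add the vector maximizing the
    MG score of Eq. 6 (sum of distances to the pivots chosen so far)."""
    # Initialization: the farthest pair becomes p0 and p1.
    _, i0, j0 = max((dist(u, v), i, j)
                    for i, u in enumerate(vectors)
                    for j, v in enumerate(vectors) if i < j)
    pivots, chosen = [vectors[i0], vectors[j0]], {i0, j0}
    while len(pivots) < m:
        # Pick the unchosen vector with the largest MG(o, P).
        _, idx = max((sum(dist(vectors[i], p) for p in pivots), i)
                     for i in range(len(vectors)) if i not in chosen)
        pivots.append(vectors[idx])
        chosen.add(idx)
    return pivots
```

The brute-force initialization is quadratic in the dataset size; the HF algorithm [49] uses cheaper seeding in practice, but the greedy MG-based growth step is the same idea.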


The format of each node of the S2R-tree is \( n = (ptr, \mathcal {R}, \mathcal {B}) \), where ptr is the pointer(s) to the child node(s); \( \mathcal {R} \) is the spatial MBR covering all objects contained by n geographically; \( \mathcal {B} \) is the minimum bounding box (MBB for short) covering the (m-dimensional) pivot-based coordinates of the objects contained by n. The MBB of each node in the spatial layer is computed from those of its child nodes in a bottom-up way. In this way, we derive the hybrid index S2R-tree, which integrates spatial and low-dimensional semantic information seamlessly.

Query processing

On top of the S2R-tree, the SKQ processing is carried out in spatial and semantic spaces collaboratively. Algorithm 1 shows the details.

We use an upper bound \(\mathcal {D}_{ub}\) and a lower bound \(\mathcal {D}_{lb}\) to delimit the current search scope (Lines 1–2). Query processing proceeds as follows. Starting from the root node, we traverse the S2R-tree using a priority queue Q (Line 4), and keep visiting the nodes popped from Q (Lines 5–16). In this procedure, we dynamically maintain the top-k candidates among the objects seen so far in C, initialized as empty (Line 3): for any object o ∈ C and any visited object o′∉C, it holds that \(\mathcal D(q, o) \leq \mathcal D(q, o^{\prime }) \). We also keep track of the distance of the k-th object in C, which is an upper bound \(\mathcal D_{ub} = \max \left \{ \mathcal D(q,o)|o \in C \right \} \) on the distance of the final results (since C contains the best objects found so far). Every time a node n is popped from the queue Q, we perform:

  • if n is a leaf node of the S2R-tree, we visit all objects belonging to n (Lines 7–12). More specifically, we calculate their actual distances to the query. If \(\mathcal D(q, o) <\mathcal D_{ub} \), meaning that the object o is superior to at least one of the top-k candidates in C, then o replaces the worst object in C, and \(\mathcal D_{ub} \) is updated accordingly.

  • if n is a non-leaf node, we simply insert all its child nodes into Q (Lines 14–15), in which all nodes are ranked by their minimum possible distance \( \min \mathcal {D}(q, n) \) to the query q, defined as:

    $$ \min \mathcal{D}(q,n)=\lambda \times \min \mathcal {SD}(q,n)+(1-\lambda )\times \min \mathcal {TD}(q,n) $$
    (7)

where λ is a weighting factor between 0 and 1; \(\min \mathcal {SD}(q,n)\) is the minimum spatial distance from q to node n, and \(\min \mathcal {TD}(q,n)\) is the minimum possible semantic distance from q to any object contained in node n, which can be calculated as follows (see Lemma 2 for the detailed proof):

$$ \min \mathcal {TD}(q,n)=\max (\min \mathcal {TD}_{i}(q,n),0\leq i<m ) $$
(8)

where \(\min \mathcal {TD}_{i}(q,n)\) is the minimum possible semantic distance between q and n w.r.t. (the bound derived from) the pivot pi, such that (see Lemma 1 for the detailed proof)

$$ \min \mathcal {TD}_{i}(q,n)=\left\{\begin{array}{ll} \min(n.\mathcal{B}_{i})-\mathcal {TD}(q,p_{i})\text{ ,}& \mathcal {TD}(q,p_{i})\leq \min(n.\mathcal{B}_{i})\\ 0\text{ ,}&\min(n.\mathcal{B}_{i})< \mathcal {TD}(q,p_{i})\leq \max(n.\mathcal{B}_{i})\\ \mathcal {TD}(q,p_{i})-\max(n.\mathcal{B}_{i})\text{ ,} & \mathcal {TD}(q,p_{i})>\max(n.\mathcal{B}_{i}) \end{array}\right. $$
(9)

where \(n.\mathcal {B}_{i}\) refers to the i-th dimension of node n’s MBB in the low-dimensional space, and \(\min (n.\mathcal {B}_{i})\) and \(\max (n.\mathcal {B}_{i})\) are its minimum and maximum values respectively, denoting the minimum and maximum distances between any object o ∈ n and pivot pi in the original space, i.e. \( \min (n.\mathcal {B}_{i}) \leq \mathcal {TD}(o, p_{i}) \leq \max (n.\mathcal {B}_{i}) \).
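The bounds of Eqs. 8 and 9 translate into a few lines of code (function and argument names are ours; `mbb` is the list of per-dimension (min, max) extents of \(n.\mathcal {B}\), and `td_q_pivots[i]` holds \(\mathcal {TD}(q,p_{i})\)):

```python
def min_td_i(td_q_pi, b_min, b_max):
    """Per-pivot lower bound of Eq. 9, derived from the triangle
    inequality, where [b_min, b_max] is the i-th extent of the MBB."""
    if td_q_pi <= b_min:
        return b_min - td_q_pi
    if td_q_pi <= b_max:
        return 0.0
    return td_q_pi - b_max

def min_td(td_q_pivots, mbb):
    """Node-level lower bound of Eq. 8: each per-pivot bound is valid,
    so the tightest (maximum) of them is still a valid lower bound."""
    return max(min_td_i(t, lo, hi)
               for t, (lo, hi) in zip(td_q_pivots, mbb))
```

With the numbers of Example 4 (\(\mathcal {TD}(q,p_{1})=0.45\), extent [0.12, 0.3]), `min_td_i` yields 0.15.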

Before proving that \( \min \mathcal {D}(q, n) \) is the minimum possible distance from q to any object o belonging to node n, we first use Lemma 1 to establish a bound on the distance between the query q and the objects in n, derived from a pivot pi in the original space using the S2R-tree.

Lemma 1

Given a query q and a node n of the S2R-tree, when considering the pivot \(p_{i} \in P\), the distance \( \min \mathcal {TD}_{i}(q,n)\) in Eq. 9 is a lower bound on the semantic distance between q and any object o in n with respect to pi, i.e. \( \forall o \in n: \mathcal {TD}(q, o) \geq \min \mathcal {TD}_{i}(q,n) \).

Proof

As stated above, given a node n of the S2R-tree, \(n.\mathcal {B}_{i}\) refers to the i-th dimension of node n’s MBB in the low-dimensional space, and \(\min (n.\mathcal {B}_{i})\) and \(\max (n.\mathcal {B}_{i})\) denote the minimum and maximum distances, respectively, between any object o ∈ n and pivot pi in the original space.

In accordance with the triangle inequality, for any given points of o, q and pi in the original space, we have:

$$ \mathcal{TD}(o,p_{i})\leq \mathcal{TD}(q,o)+\mathcal{TD}(q,p_{i}) $$
(10)
  • if \(\mathcal {TD}(q,p_{i})\leq \min (n.\mathcal {B}_{i})\) and according to Eq. 10, we have:

    $$ \left.\begin{array}{ll} \mathcal{TD}(q,o) \geq \mathcal{TD}(o,p_{i})-\mathcal{TD}(q,p_{i})\\ \\ \forall o \in n,\mathcal {TD}(o,p_{i})\geq \min(n.\mathcal{B}_{i}) \end{array}\right\}\Rightarrow \mathcal{TD}(q,o) \geq \min(n.\mathcal{B}_{i})-\mathcal{TD}(q,p_{i}) $$

    Thus,

    $$ \min \mathcal {TD}_{i}(q,n)=\min(n.\mathcal{B}_{i})-\mathcal{TD}(q,p_{i}) $$
    (11)
  • if \(\min (n.\mathcal {B}_{i}) < \mathcal {TD}(q,p_{i})\leq \max (n.\mathcal {B}_{i})\), q may lie within the semantic range of n w.r.t. pi, i.e.

    $$ \min \mathcal {TD}_{i}(q,n)=0 $$
    (12)
  • if \(\mathcal {TD}(q,p_{i}) > \max (n.\mathcal {B}_{i})\) and according to Eq. 10, we have:

    $$ \left.\begin{array}{ll} \mathcal {TD}(q,o) \geq \mathcal {TD}(q,p_{i})-\mathcal {TD}(o,p_{i})\\ \\ \forall o \in n,\mathcal {TD}(o,p_{i})\leq \max(n.\mathcal{B}_{i}) \end{array}\right\}\Rightarrow \mathcal {TD}(q,o) \geq \mathcal {TD}(q,p_{i})-\max(n.\mathcal{B}_{i}) $$

    Thus,

    $$ \min \mathcal {TD}_{i}(q,n)=\mathcal{TD}(q,p_{i})-\max(n.\mathcal{B}_{i}) $$
    (13)

By combining Eqs. 11, 12 and 13, Eq. 9 is proven. □

Example 4

According to Fig. 3, we have \(\min (R_{5}.\mathcal {B}_{1})=0.12\) and \(\max (R_{5}.\mathcal {B}_{1})=0.3\). Assume that the semantic distance between q and p1 is \( \mathcal {TD}(q,p_{1}) =0.45\), which is greater than \( \max (R_{5}.\mathcal {B}_{1}) \). Thus \(\forall o \in R_{5},\mathcal {TD}(q,o) \geq \mathcal {TD}(q,p_{1})-\max (R_{5}.\mathcal {B}_{1})=0.15\), i.e. \(\min \mathcal {TD}_{1}(q,R_{5})=0.15\).

Based on Lemma 1 and considering multiple pivots, Eq. 8 gives a lower bound on the semantic distance between the query q and any object in node n (in the original semantic space), as stated in Lemma 2.

Lemma 2

Given a query q and a node n of the S2R-tree, the minimum possible semantic distance \(\min \mathcal {TD}(q,n)\) from q to all objects in n can be calculated as in Eq. 8.

Proof

According to Lemma 1, for each pivot pi we have a minimum possible semantic distance \( \min \mathcal {TD}_{i}(q,n) (0\leq i<m)\) between q and all objects in n. Since each of these is a valid lower bound, their maximum is also a valid lower bound, which gives Eq. 8. Thus Lemma 2 is proven. □

Based on Lemmas 1 and 2, \( \min \mathcal {D}(q, n) \) is the minimum possible distance between the query q and node n, as shown in Theorem 1.

Theorem 1

Given a query q and a node n of the S2R-tree, \( \min \mathcal {D}(q, n) \) in Eq. 7 is the minimum possible distance between q and any object o ∈ n, i.e. \( \min \mathcal {D}(q, n) \leq \mathcal{D}(q, o) \).

Proof

According to Lemma 2, in the semantic domain, the minimum possible semantic distance between q and n can be calculated using Eq. 8, i.e.

$$ \forall o \in n, \min \mathcal {TD}(q,n) \leq \mathcal {TD}(q,o) $$
(14)

and in spatial domain, we have:

$$ \forall o \in n, \min \mathcal {SD}(q,n) \leq \mathcal {SD}(q,o) $$
(15)

Combining Eqs. 14 and 15, and considering the joint impact of the spatial and semantic distances via the weighting factor λ, we have:

$$ \begin{array}{@{}rcl@{}} \min \mathcal{D}(q, n) &=& \lambda \times \min \mathcal {SD}(q,n)+(1- \lambda) \times \min \mathcal {TD}(q,n)\\ & \leq& \lambda \times \mathcal {SD}(q,o)+(1- \lambda) \times \mathcal {TD}(q,o)=\mathcal {D}(q,o) \end{array} $$

i.e. \(\min \mathcal {D}(q, n) \leq \mathcal{D}(q, o)\). Theorem 1 is proven. □

Note that all information needed for \( \min \mathcal {D}(q, n) \) can be obtained from the S2R-tree structure. That means the pivot-based indexing in the low-dimensional space allows us to accurately prune objects in the original (high-dimensional) semantic space. Every time a node n is popped from Q, we can also derive a lower bound distance for all unvisited objects, i.e. \(\mathcal D_{lb} = \min \mathcal {D}(q, n) \), according to Theorem 2.

Theorem 2

Every time a node n is popped out from Q, for any unvisited object o, it must hold that \(\min \mathcal{D}(q,n) \leq \mathcal{D}(q,o)\).

Proof

Let n′ be any unvisited node in Q other than n. As \(\min \mathcal {D}(q,n^{\prime })\) is the minimum possible distance from the query q to n′ and n is the top element of Q, we have:

$$ \left.\begin{array}{ll} \forall o \in n^{\prime},\min \mathcal{D}(q,n^{\prime})\leq \mathcal{D}(q,o)\\ \\ \min \mathcal{D}(q,n)\leq \min \mathcal{D}(q,n^{\prime}) \end{array}\right\}\Rightarrow \min \mathcal{D}(q,n)\leq \mathcal{D}(q,o) $$
(16)

i.e. \(\min \mathcal {D}(q,n)\) is the lower bound distance \(\mathcal D_{lb}\) to q for all unvisited objects. Thus, Theorem 2 can be proven. □

Once the condition \(\mathcal D_{lb}\geq \mathcal D_{ub} \) is met (Lines 15–16), we can safely terminate the SKQ search process, since no unvisited object can replace the current top-k candidates found in C so far. Otherwise, the search stops when Q is empty, meaning that all nodes of the S2R-tree have been traversed. Finally, we return all spatial objects in the candidate set C to the user (Line 18).
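The whole best-first procedure can be sketched as follows (a sketch of Algorithm 1; the node attributes `.is_leaf`, `.children` and `.objects`, and the callables `min_dist` (the lower bound of Eq. 7) and `obj_dist` (the exact distance of Eq. 3), are our own interface assumptions):

```python
import heapq
import itertools
import math

def topk_search(root, q, k, min_dist, obj_dist):
    """Best-first top-k search over an S2R-tree-like structure.
    Returns the k (distance, object) pairs with the smallest distances."""
    tick = itertools.count()                 # tie-breaker for both heaps
    queue = [(min_dist(q, root), next(tick), root)]
    best = []                                # max-heap: (-D, tick, object)
    d_ub = math.inf                          # distance of current k-th result
    while queue:
        d_lb, _, n = heapq.heappop(queue)
        if d_lb >= d_ub:                     # Theorem 2: safe termination
            break
        if n.is_leaf:
            for o in n.objects:              # Lines 7-12: exact distances
                d = obj_dist(q, o)
                if len(best) < k:
                    heapq.heappush(best, (-d, next(tick), o))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, next(tick), o))
                if len(best) == k:
                    d_ub = -best[0][0]       # update the upper bound
        else:
            for c in n.children:             # Lines 14-15: enqueue children
                heapq.heappush(queue, (min_dist(q, c), next(tick), c))
    return sorted(((-nd, o) for nd, _, o in best), key=lambda t: t[0])
```

Any admissible `min_dist` (one never exceeding the true distance of an object in the subtree, as guaranteed by Theorem 1) preserves correctness; a tighter bound only triggers the termination test earlier.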

Assume that n objects are stored in an S2R-tree and the top-k objects most similar to the query are to be returned. When k is small, the time complexity is determined by the fanout f of the S2R-tree, i.e. O(log_f n); when k is large (i.e. close to n), the time complexity is O(n). The space complexity of the S2R-tree is O(n).

5 Experiments

In this section, we conduct several experiments to compare our proposed algorithms and present the results.

5.1 Experiment settings

On one hand, we use a real dataset of online check-in records from Foursquare, which consists of user IDs, check-in times, venues with geo-locations (points of interest) and other information written in plain English. There are 422,030 objects in the whole dataset. On the other hand, we use another dataset crawled from DaZhong Comments, which records information about shops (points of interest) written in Chinese, including the name and location of each shop and users’ comments on each POI.

To compare the semantic similarity between different venues, we utilize Word2Vec to derive the semantic vectors, where each component is a latent feature of a word. Word2vec is a group of related models used to produce word embeddings: shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. Word2vec takes a large corpus of text as input and produces a vector space, typically of several hundred dimensions, in which each unique word in the corpus is assigned a corresponding vector. Word vectors are positioned such that words sharing common contexts in the corpus are located close to one another in the space.

In our experiments, we apply the HF algorithm [49] to generate all pivots. These well-selected pivots are located on the border of the dataset so as to best distinguish objects in semantic space collectively.

All algorithms are implemented in C++ and evaluated on the same datasets, Foursquare and DaZhong Comments. We compare our method with the NIQ*-tree and the R-tree in terms of query time and I/O cost, the latter measured by the number of visited objects. The default parameter values are given in Table 2. During the experiments, we vary one parameter while keeping the others at their default values. The experiments are conducted on a PC with 8 GB of memory.

Table 2 Default values of parameters

5.2 Comparison with proposed algorithms

In this subsection, we compare our algorithm with the baselines under the different parameter settings shown in Table 2.

Effect of |D|

As shown in Fig. 4, the efficiency of the three algorithms follows a similar tendency on both Foursquare and DaZhong Comments. Both query time and I/O cost trend upward as the dataset size grows, because more objects must be retrieved and more distance calculations must be performed. As |D| becomes larger, the efficiency of the NIQ*-tree approaches that of the R-tree owing to its weaker spatial pruning. Meanwhile, both the S2R-tree and the NIQ*-tree outperform the R-tree, since they prune in the spatial and semantic spaces collaboratively. The S2R-tree achieves the best performance, especially on large datasets, which can be explained by its pruning in the lower dimensional space.

Fig. 4 Effect of |D|

Effect of k

Figure 5 shows the performance of the R-tree, NIQ*-tree and S2R-tree when k ranges from 10 to 50. All three algorithms show a slight increase on both datasets. As shown in Fig. 5a and c, the number of visited objects trends upward as k grows from 10 to 50. This is because returning more objects with high similarity to the query requires visiting more objects and performing more distance calculations. Additionally, the R-tree and NIQ*-tree perform almost identically, which shows that the NIQ*-tree has poor pruning ability in high dimensional space. The S2R-tree is significantly superior to both in I/O cost and query time, though it is more sensitive to k than the R-tree and NIQ*-tree, since its efficiency stems from pruning dead space by mapping the high dimensional data into a lower dimensional space; this matches our expectation.

Fig. 5 Effect of k

Effect of λ

From Fig. 6, it is clear that as λ increases, both the number of visited objects and the query time decrease, owing to the better pruning ability in the 2D spatial domain. More specifically, the NIQ*-tree is superior to the R-tree only when λ is less than 0.5, while the S2R-tree outperforms both throughout. As shown in Fig. 6a and b, once λ exceeds 0.6 the number of visited objects drops sharply for all three algorithms: as λ approaches 1, the spatial distance accounts for an increasing proportion of the final distance. The S2R-tree remains the best, as expected, because of its efficient pruning in the low-dimensional space. Furthermore, the R-tree is the most sensitive to changes in λ because it prunes only in the spatial domain.
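Assuming the common linear combination of spatial and semantic distances (the exact form and normalization used in the paper may differ), the role of λ can be sketched as follows: as λ approaches 1, the spatial term dominates the final distance, which is why purely spatial pruning becomes increasingly effective:

```python
def combined_distance(spatial_dist, semantic_dist, lam):
    # Weighted combination of (normalized) spatial and semantic distances:
    # a larger lambda emphasizes spatial proximity over semantic similarity.
    return lam * spatial_dist + (1.0 - lam) * semantic_dist

# With lambda = 0.8, a spatially near but semantically far object beats
# a spatially far but semantically near one.
near_far = combined_distance(0.1, 0.9, 0.8)   # 0.8*0.1 + 0.2*0.9 = 0.26
far_near = combined_distance(0.9, 0.1, 0.8)   # 0.8*0.9 + 0.2*0.1 = 0.74
print(near_far < far_near)  # True
```

With λ = 0.2 the ranking of the same pair reverses, which explains why the purely spatial R-tree is the most sensitive to λ among the three methods.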

Fig. 6 Effect of λ

5.3 Evaluation on S2R-tree parameters

In this subsection, we further evaluate the performance of S2R-tree by varying the parameters c and m.

Effect of m

As shown in Fig. 7, the number of pivots m in the pivot-based space affects the performance of our proposed indexing structure. As m increases from 2 to 8, we observe from Fig. 7a and c that the number of visited objects decreases slightly, while Fig. 7b and d show that the query time descends gently. This is because more dead space is pruned in the low dimensional pivot-based space, so less time is spent on distance calculation, which matches our expectation. It is also noted from Fig. 7 that, for a fixed m, a larger data size |D| incurs more I/O cost and query time on both Foursquare and DaZhong Comments.

Fig. 7 Effect of m

Effect of c

According to Fig. 8, the performance of the S2R-tree is affected by c, the capacity of an S2R-tree leaf node. On one hand, we observe from Fig. 8a and c that the number of visited objects remains almost constant with respect to c but increases with the data size |D|: although the number of objects in each leaf node grows, the efficiency of dead-space pruning improves correspondingly, so the number of visited objects stays nearly constant as c ranges from 30 to 150. On the other hand, as shown in Fig. 8b and d, the query time decreases as c grows from 30 to 150, since more dead space is pruned and less time is spent computing distances. Additionally, for a fixed c, a larger data size |D| again incurs more I/O cost and query time.

Fig. 8 Effect of c

In summary, we conclude that the S2R-tree based search algorithm is more efficient than the two baseline algorithms, the R-tree and NIQ*-tree, in almost all test settings. This demonstrates the superior querying performance of the S2R-tree, owing not only to pruning in a low dimensional space after coordinate transformation, but also to the effectiveness of a series of bounds used in query optimization.

6 Conclusion and future work

This paper proposes a novel hybrid index structure, the S2R-tree, which integrates spatial and semantic information seamlessly. Instead of indexing objects in the original semantic space, we carefully design a pivot-based space mapping mechanism to transform the high dimensional semantic vectors into a low dimensional space, so that a stronger pruning effect can be achieved. A novel SKQ search algorithm on top of the S2R-tree is further designed, using theoretical bounds to speed up query processing significantly. Extensive experiments demonstrate the efficiency of our methods compared with the baseline algorithms.

In the future, we would like to consider road network constraints in the spatial domain, develop a network topology-aware SKQ querying framework, and further optimize query processing.