Infrastructure for Efficient Exploration of Large Scale Linked Data via Contextual Tag Clouds

Zhang, Xingjian; Song, Dezhao; Priya, Sambhawa; Heflin, Jeff

doi:10.1007/978-3-642-41335-3_43

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8218)

Included in the following conference series:

International Semantic Web Conference

Abstract

In this paper we present the infrastructure of the contextual tag cloud system, which can execute large volumes of queries about the number of instances that use particular ontological terms. The contextual tag cloud system is a novel application that helps users explore a large-scale RDF dataset: the tags are ontological terms (classes and properties), the context is a set of tags that defines a subset of instances, and the font sizes reflect the number of instances that use each tag. It visualizes the patterns of the instances specified by the context a user constructs. Given a request with a specific context, the system needs to quickly find which other tags the instances in the context use, and how many instances in the context use each tag. The key question we answer in this paper is how to scale to Linked Data; in particular, we use a dataset with 1.4 billion triples and over 380,000 tags. This is complicated by the fact that the calculation should, when directed by the user, consider the entailment of taxonomic and/or domain/range axioms in the ontology. We combine a scalable preprocessing approach with a specially constructed inverted index and use three approaches to prune unnecessary counts for faster intersection computations. We compare our system with a state-of-the-art triple store, examine how pruning rules interact with inference, and analyze our design choices.
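The core query described in the abstract (given a context, count how many of its instances use each other tag) can be illustrated with a small in-memory inverted index. The following is only a minimal sketch under stated assumptions, not the system's actual implementation: the function names `build_index` and `tag_counts`, the toy data, and the choice to materialize subclass entailment at preprocessing time are all hypothetical, and the paper's three pruning approaches are not reproduced here.

```python
from collections import defaultdict


def build_index(instance_tags, superclasses=None):
    """Map each tag to the set of instance ids that use it.

    superclasses (hypothetical helper input): tag -> set of ancestor tags.
    When given, an instance is also indexed under every ancestor of its
    tags, which materializes taxonomic entailment during preprocessing.
    """
    superclasses = superclasses or {}
    index = defaultdict(set)
    for instance, tags in instance_tags.items():
        for tag in tags:
            index[tag].add(instance)
            for ancestor in superclasses.get(tag, ()):
                index[ancestor].add(instance)
    return index


def tag_counts(index, context):
    """For each tag outside the context, count the context instances using it."""
    if not context:
        return {tag: len(ids) for tag, ids in index.items()}
    # Intersect the shortest posting sets first so the working set shrinks early.
    postings = sorted((index.get(tag, set()) for tag in context), key=len)
    members = set(postings[0])
    for ids in postings[1:]:
        members &= ids
        if not members:
            break
    counts = {}
    for tag, ids in index.items():
        if tag in context:
            continue
        n = len(members & ids)
        if n:  # tags with no instances in the context are left out of the cloud
            counts[tag] = n
    return counts


# Toy data (assumed for illustration): four instances and the tags they use.
instance_tags = {
    "i1": {"Person", "name"},
    "i2": {"Student", "name", "advisor"},
    "i3": {"Student", "advisor"},
    "i4": {"Course", "name"},
}
superclasses = {"Student": {"Person"}}

index_plain = build_index(instance_tags)
index_inferred = build_index(instance_tags, superclasses)

plain = tag_counts(index_plain, {"Person"})
inferred = tag_counts(index_inferred, {"Person"})
narrowed = tag_counts(index_inferred, {"Student", "name"})
```

In this toy example the context {Person} contains one instance without inference and three once the assumed Student-to-Person subclass axiom is applied, so the counts behind the font sizes change with the user's inference setting. A real deployment at the scale reported above would need compressed posting lists and pruning rather than Python sets.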

Author information

Authors and Affiliations

Dept. of Computer Science and Engineering, Lehigh University, 19 Memorial Drive West, Bethlehem, PA, 18015, USA
Xingjian Zhang, Dezhao Song, Sambhawa Priya & Jeff Heflin

Editor information

Editors and Affiliations

Knowledge Media Institute, The Open University, Milton Keynes, UK
Harith Alani
Massachusetts Institute of Technology, Cambridge, MA, USA
Lalana Kagal
IBM Research, Hawthorne, NY, USA
Achille Fokoue
Free University Amsterdam, The Netherlands
Paul Groth
Technical University Darmstadt, Germany
Chris Biemann
Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Josiane Xavier Parreira
VU Amsterdam, The Netherlands
Lora Aroyo
Stanford University, CA, USA
Natasha Noy
IBM Research, Yorktown Heights, NY, USA
Chris Welty
University of California, Santa Barbara, CA, USA
Krzysztof Janowicz

About this paper

Cite this paper

Zhang, X., Song, D., Priya, S., Heflin, J. (2013). Infrastructure for Efficient Exploration of Large Scale Linked Data via Contextual Tag Clouds. In: Alani, H., et al. The Semantic Web – ISWC 2013. ISWC 2013. Lecture Notes in Computer Science, vol 8218. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41335-3_43

DOI: https://doi.org/10.1007/978-3-642-41335-3_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41334-6
Online ISBN: 978-3-642-41335-3
eBook Packages: Computer Science, Computer Science (R0)
