Diving into the Deep End of RDF: OWL, SHACL, and SPARQL vs TerminusDB data products
A new way of managing linked data is emerging, captured in the term semantic data products. At its core, it enables a horizontally scalable approach to semantic knowledge graphs. Potentially, a new term could emerge for this: a local-first approach to Zero Copy Integration.
This new way of dealing with semantic knowledge graphs is enabled by the open source TerminusDB engine for data products. Such data products include both version control (git-like operations) and a semantic layer on top of a traditional RDF triple store, with triples exposed as technical documents in JSON-LD format. Crucially, they include a datalog interface!
What this means in practice is that you get a practical distribution mechanism that enables collaboration at scale with semantic knowledge graphs, and an awesome foundation for distributed expert systems. More on that in future articles. Let’s go to the core of what this new approach means first.
A deep-dive. Not a short one.
A brief background
I’m the founder of DFRNT.com. Users of our semantic data product builder come from many data modelling backgrounds. Some know knowledge graph modelling primarily from the world of RDF, OWL and SPARQL; others come from the world of SQL and regular relational databases. Most have done at least a bit of both. I wanted to contrast RDF, OWL and SPARQL, for which there is plenty of information available, with what is provided by TerminusDB, as a community resource.
To put the DFRNT data product builder in context, it provides a data modelling user interface, visual modeller, and data product manager for TerminusDB data products. As such, it needs to closely follow the semantics of the TerminusDB semantic knowledge graph implementation. That makes it important to understand the similarities and differences between the knowledge graph implementations commonly in use.
After I recently presented to a group of seasoned knowledge modellers, questions were floated for which only scattered information was available. This post is intended to relate DFRNT and TerminusDB data products, with their unified semantic schema, to existing triple store paradigms such as OWL, SHACL and, naturally, RDF.
Comparison between OWL+SHACL and TerminusDB
Let’s start with the similarities and differences between the world of RDF, OWL+SHACL and SPARQL on one side, and the JSON-LD document-based RDF representation, JSON-LD based schema and WOQL world of TerminusDB (and thus also DFRNT) on the other.
We will pick these concepts apart, shedding light on how they relate and how to think about them when modelling data in DFRNT as a data product builder.
Understanding RDF, Linked Data, OWL and SHACL
RDF, JSON-LD and OWL are representations used to store, manage, and query knowledge graph data based on subject-predicate-object triples. They form a combination of technologies that allows developers to access large amounts of connected knowledge in an efficient manner, especially when the shape of the data for a domain is emerging and possibly unknown.
RDF stands for Resource Description Framework. It is a format for representing information on the web using triples composed of subject-predicate-object elements, which can be stored in databases known as triple stores. OWL (Web Ontology Language) is an extension of RDF that provides more advanced features, such as defining classes and relationships between RDF resources and thus organising the knowledge nuggets. This makes it easier to search for specific data within a database, as the data is typed and relationships are expressed formally.
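As a minimal illustration (the example.org resource names are invented), a couple of subject-predicate-object triples stating that Jane is a Person who knows John can be written in JSON-LD as follows:

```json
{
  "@context": { "ex": "http://example.org/" },
  "@id": "ex:Jane",
  "@type": "ex:Person",
  "ex:knows": { "@id": "ex:John" }
}
```

Expanded to triples, this states that ex:Jane has rdf:type ex:Person, and that ex:Jane ex:knows ex:John.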
SHACL (Shapes Constraint Language) is a language for specifying rules and conditions, and for defining the shape, or structure, that the resources of an RDF graph should conform to. It is used primarily for data validation and to ensure that data instances and their properties are conformant. It provides a way to express rules for expected structures, data types, cardinality and more for RDF and OWL resources. It came about because of the need for closed-world interpretations alongside OWL[^i].
Finally, SPARQL (SPARQL Protocol and RDF Query Language) is used to query RDF graphs, primarily contained in triple stores and described with OWL and SHACL, and can also be used over other kinds of databases, including relational databases, through mappings.
TerminusDB Data Products + JSON-LD Linked Data
The TerminusDB triple store is also designed to store and query data as RDF triples, organised into data products, together with versioning and collaboration on the context-bounded data products themselves.
In contrast to these popular tools and languages in the OWL space, the idea behind the TerminusDB semantic knowledge graph model is to better support the needs of enterprise data modelling. We see that the TerminusDB model also enables the cross-organisational use cases that DFRNT customers use it for, in domains characterised by high complexity.
It is important to note that TerminusDB also builds upon an RDF-based triple store. It has a schema engine compatible with OWL ontologies (but with a closed-world interpretation, similar to when adding SHACL, and with some constructs omitted for performance reasons), and a Prolog-based datalog query engine with the WOQL query language.
Furthermore, TerminusDB provides a way to make the data boundaries for a context clear. Its engine introduces the concept of bounded data products for instances and schema with specific prefixes stated as a shared context, along with a git-like model for data product collaboration.
The query language for the TerminusDB datalog engine is WOQL (Web Object Query Language). It is a domain-specific language (DSL) that builds on constraint-based declarative logic programming. It is expressed directly through the SDKs, native to the host language in use (JavaScript and Python), with a JSON wire representation in the WOQL API. Reasoning can be performed using declarative constructs connecting with equivalents from the Prolog inner implementation.
The schema engine itself is largely compatible with OWL constructs expressed as JSON-LD schema documents based on underlying RDF triple-store schema representations, including, for example, multiple inheritance of properties in the data model (when diamond property constraints are not violated). TerminusDB follows a closed-world interpretation suitable for the enterprise, merging OWL and SHACL constructs into a single schema language that can be computed over in a performant way[¹]. The closed world and open world interpretations will be highlighted further down in this post. The JSON-LD representation in TerminusDB assumes the context of the data product. It is worth noting that the @context key of JSON-LD maps is not mandatory[^ii], and in TerminusDB it is returned separately from instance data in document queries.
The Difference Between Open and Closed World Interpretations
To understand the difference between closed world and open world interpretations of what is stored in an RDF triple store, it is important to understand what each of these terms means. A closed world interpretation means that the data in each database (or data product) can only be interpreted within its own context.
That is, statements made about a specific dataset are limited by the contextual boundaries of the knowledge stored in the triple store being processed. In TerminusDB, each data product has a contextual description indicating the @base of the data product and the URI prefixes of the schema and instance data stored in it: the bounded context of triples that is considered the entire “world”.
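As a rough sketch of what such a contextual description can look like, a TerminusDB data product carries a schema context document along the following lines; the prefixes shown are the TerminusDB defaults and serve only as an illustration:

```json
{
  "@type": "@context",
  "@base": "terminusdb:///data/",
  "@schema": "terminusdb:///schema#"
}
```

Instance documents get identifiers under @base, while classes and properties resolve under @schema, which is what bounds the “world” of the data product.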
On the other hand, open world interpretations allow for more flexibility when interpreting data, as they take into consideration external sources of information that may provide additional insight not available within a single triple store.
The key difference between an open world interpretation and a closed world interpretation could be summarised as follows:
- The open world interpretation returns unknown for an empty response.
- The closed world interpretation returns not found for an empty response.
RDF Triples as Controlled Documents in TerminusDB
TerminusDB exposes its triple store as individual triples through its datalog engine and the WOQL datalog query language, and through the GraphQL API. A key advantage of using triple store data stored in TerminusDB is its ability not only to assemble JSON-LD documents from the RDF triples it stores, but also to use the JSON-LD node ID to let users and developers interact with it through document-oriented RESTful CRUD operations (create, read, update and delete) on sets of triples that belong together.
The schema is structured as layered, or framed, JSON-LD documents describing the schema as triples, in a similar way to OWL ontologies. Top-level JSON-LD frames for classes are called document types. They can have specific data properties as part of their maps, as well as property references to “subdocuments” for triples that are directly connected to, and part of, the top-level document. These “subdocuments” are created, fetched, deleted, and updated along with their parent document, and are represented as maps connected to properties in the JSON-LD representations.
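To make this concrete, here is a hedged sketch of a document type with a subdocument in the TerminusDB JSON-LD schema syntax, followed by a matching instance document; the Person and Address classes and their properties are invented for illustration:

```json
[
  {
    "@type": "Class",
    "@id": "Person",
    "@key": { "@type": "Lexical", "@fields": ["name"] },
    "name": "xsd:string",
    "address": "Address"
  },
  {
    "@type": "Class",
    "@id": "Address",
    "@subdocument": [],
    "@key": { "@type": "Random" },
    "street": "xsd:string",
    "city": "xsd:string"
  }
]
```

An instance of the document type is then created, fetched, updated and deleted as one unit, with the subdocument embedded as a map:

```json
{
  "@type": "Person",
  "name": "Jane Smith",
  "address": {
    "@type": "Address",
    "street": "Main Street 1",
    "city": "Springfield"
  }
}
```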
The TerminusDB RDF triple store and query engine
TerminusDB’s triple store RDF engine, its built-in schema checking engine and its datalog WOQL query engine offer an intuitive platform for managing large amounts of structured data with ease, especially in enterprise contexts where typed and trusted data is crucial.
With the unique combination of technologies in TerminusDB, its users can quickly create schemas defining relationships between resources stored in their data products while also leveraging existing RDF and OWL knowledge. The RDF notion that everything is a URI helps connect TerminusDB data product contexts, schema, and instance data to any other knowledge representation that builds upon RDF and OWL.
This interoperability allows for more sophisticated searching and analysis of data, as well as the ability to connect with information in a much broader context in applications that use TerminusDB data products. TerminusDB allows storing references to remote data instances through types created with the “Foreign” keyword (instead of Class).
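As a hedged sketch of the construct mentioned above (the Supplier type name is invented), a foreign type is declared in the schema so that local documents can hold references to instances whose data lives in another data product:

```json
{
  "@type": "Foreign",
  "@id": "Supplier"
}
```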
Comparing RDF, OWL and SPARQL, with TerminusDB
The world of data management has changed dramatically in recent decades with the emergence of knowledge graph representations such as RDF, OWL, SHACL and SPARQL, and tools implementing them. These tools are designed to make it easier for developers to store and query structured knowledge with RDF triples composed of subject-predicate-object elements, stored in databases known as triple stores.
TerminusDB takes a somewhat different and integrated approach, combining a triple store with its document-oriented schema checker and datalog query engine into one powerful semantic knowledge graph platform, for which DFRNT provides a user interface.
Understanding the Differences Between the Two Semantic Technology Stacks
Although RDF, OWL, SHACL and SPARQL on one hand, and TerminusDB on the other, both offer ways of managing large amounts of structured data, they differ significantly in terms of their capabilities.
Triple stores for one or many contexts — the case for a data product approach
Both use RDF as the base framework for representing knowledge using triples composed of subject-predicate-object elements. A first difference is that most RDF stores keep triples for one or many contexts in a single database, whereas TerminusDB takes a different approach and stores the triples for a specific, bounded context together in one triple store, a data product, expecting different contexts to live in separate data products.
Performing data processing and analytics over triples from more than one context is done by instructing the datalog engine to operate over multiple data products co-located in the same instance, thanks to the data product collaboration abilities (clone/push/pull) that localize data products. The advantage of this data product approach is that the git-like collaboration and versioning abilities enable data governance over the data products, where clear ownership, authority and processes for updates can be defined. Additionally, data can live in multiple places through an integrative, data product-oriented, local-first interpretation of the zero copy integration approach that is also suitable for data mesh scenarios.
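As an illustrative sketch only, the JSON wire form of a WOQL query scoped to a second, co-located data product could look roughly like the following; the admin/suppliers path and Supplier type are invented, and the exact field names should be verified against the WOQL JSON schema:

```json
{
  "@type": "Using",
  "collection": "admin/suppliers",
  "query": {
    "@type": "Triple",
    "subject": { "@type": "NodeValue", "variable": "Supplier" },
    "predicate": { "@type": "NodeValue", "node": "rdf:type" },
    "object": { "@type": "Value", "node": "@schema:Supplier" }
  }
}
```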
Closed world schema definitions with one language instead of two, for the enterprise
OWL and SHACL together provide a closed-world interpretation on top of OWL, but their restriction sets overlap, which can be confusing. With wide contexts of triples in traditional triple stores (quads, rather than triples, are often used to contextualise the data), the boundaries of the data are often harder to reason about in practice by comparison. OWL (Web Ontology Language) is an extension of RDF that provides additional capabilities such as defining classes, relationships between them, and properties, and SHACL enables constraints to be defined on top of OWL.
TerminusDB makes a different assumption: the data product has a specific context for which it is authoritative. The data product consists of a schema and data instances of that schema. The interpretation is clearer, and the schema enables a document-oriented approach to the triple store. That eases interaction from a computational perspective, since data structures in most languages map more easily onto documents, which is also how JSON has become a de facto standard for API integration. It is easier to process linked data presented as coherent, structured technical documents in JSON-LD format than to operate on triples alone.
As with OWL, classes, relationships between them, and properties are supported in the TerminusDB schema and follow similar patterns. The difference is that the constraints are part of the TerminusDB schema language, expressed as JSON-LD schema documents, and are enforced by the TerminusDB schema engine over the instance data[^iii].
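As a hedged sketch of how such constraints appear directly in the schema language (the Project class, Status enum and their properties are invented), type families such as Optional and Set take the role that OWL restrictions and SHACL shapes play in the traditional stack:

```json
[
  {
    "@type": "Enum",
    "@id": "Status",
    "@value": ["active", "archived"]
  },
  {
    "@type": "Class",
    "@id": "Project",
    "name": "xsd:string",
    "status": "Status",
    "description": { "@type": "Optional", "@class": "xsd:string" },
    "members": { "@type": "Set", "@class": "Person" }
  }
]
```

The schema engine rejects instance documents that violate these declarations, so the same definitions serve both as ontology and as validation shapes.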
Query language differences: SPARQL and WOQL
SPARQL is an SQL-like query language used to extract information from RDF triple stores. It can use OWL for reasoning and for providing the necessary descriptions of the data stored in the triple store. SHACL helps ensure that the triples in the data store conform to the constraints imposed upon the data in the RDF store.
This sounds similar to what the TerminusDB triple store schema checker does, which is an accurate observation. The TerminusDB schema uses the same definitions both for the schema checker and for the shapes of the data, keeping everything in one place as individual, reusable definitions that are easy to update and maintain. To be clear, TerminusDB does not implement SHACL; instead, similar constructs are integrated directly into the TerminusDB JSON-LD schema definitions.
The WOQL Datalog Query Language
The WOQL query language offers a declarative datalog experience for extracting data from the document-oriented triple store. JSON-LD documents and subdocuments are traversable from the perspective of the triples in the triple store, without being limited by the JSON-LD document representations.
If a JSON-LD node is expected (based on the schema), it can be returned as a document or as its @id reference. The same way of expressing results declaratively is used in GraphQL for traversing the triples from one node, via a property, to another node, regardless of whether it is a document or a subdocument, and it easily crosses the boundaries of typed documents.
Both the GraphQL and WOQL approaches support path queries, enabling traversal over multiple hops in the knowledge graph when needed.
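As an illustrative sketch of such a traversal in the JSON wire form of WOQL (the address and city properties follow the earlier invented example, and the exact field names should be checked against the WOQL JSON schema), a conjunction of two triple patterns walks from a person, via its address subdocument, to a city value:

```json
{
  "@type": "And",
  "and": [
    {
      "@type": "Triple",
      "subject": { "@type": "NodeValue", "variable": "Person" },
      "predicate": { "@type": "NodeValue", "node": "@schema:address" },
      "object": { "@type": "Value", "variable": "Address" }
    },
    {
      "@type": "Triple",
      "subject": { "@type": "NodeValue", "variable": "Address" },
      "predicate": { "@type": "NodeValue", "node": "@schema:city" },
      "object": { "@type": "Value", "variable": "City" }
    }
  ]
}
```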
Considerations for making your choice between SPARQL and TerminusDB
Ultimately, the choice between RDF, OWL and SPARQL on one hand, and TerminusDB on the other, will depend on the project needs. A key distinction is whether the representation and its bounds are triples/quads or separate data products, and whether RDF, OWL, SPARQL and SHACL are important properties of the ecosystem or an all-in-one model is preferred.
Advantages of Using the TerminusDB Model
We believe that the TerminusDB model offers many advantages over an RDF, OWL, SHACL and SPARQL combination, with the key distinctions being the clarity of the bounded data product model and the document-oriented JSON-LD approach, together with a single, unified schema representation language.
We believe TerminusDB is great for the enterprise
We believe the TerminusDB data model is especially useful for projects that require a high degree of precision and accuracy in managing data, offering a fully typed data model with framed JSON-LD data definitions. By avoiding the impedance mismatch between traditional knowledge graphs and data processing, it helps developers access large amounts of semantic knowledge graph data quickly and easily, which makes it easier to build semantic knowledge graph applications and frontends.
The ease of building is aided by data definitions that can be reused across technology stacks and are easy for software engineers to recognise. Offering a type-checked, structured document approach makes the interaction very similar to any other JSON-based REST API, but with the advantage of fully typed data.
Emerging semantic modelling tooling and DFRNT
Furthermore, on top of the TerminusDB technology stack, the DFRNT data product builder and modeller enables TerminusDB users to quickly create schemas and define classes and relationships between them, together with rich properties and constraints on them.
This improves the ability of digital architects and data wizards to express their data models in a reusable format and helps them interpret information within a much broader context than with other technologies alone.
The data product paradigm shift
Finally, TerminusDB offers a closed world interpretation, versus the open world interpretation in OWL[^iv]. The OWL open world approach is useful for applications that need to account for unknown elements or potential inconsistencies in datasets. On the other hand, the closed world approach increases data accuracy, as data becomes definitive and authoritative for the domain under consideration. This is important for the enterprise, as data for production use needs to be well-defined and deterministic for collaboration and use.
In conclusion, the TerminusDB model provides an intuitive and powerful approach for managing structured data with ease of use. It also boasts features such as versioning using a git-like approach, and a datalog engine with the WOQL declarative query language for reasoning about the knowledge graph in its base triple encoding.
RDF with OWL, SHACL and SPARQL, or TerminusDB?
Choosing either RDF, OWL and SPARQL, or TerminusDB, for structuring data has its own implications depending on the specific project needs. For example, using RDF, OWL, SHACL and SPARQL together is more suitable for projects that require an interoperability layer with the academic world and with existing data already following OWL (and SHACL) definitions.
On the other hand, TerminusDB may be better suited for projects that require a datalog reasoning engine, easy-to-use document-oriented APIs, a GraphQL engine and, most importantly, a git-like collaboration framework, with versioning, delta change requests, and branching built in, centred around context-bound data products as the unit of collaboration within and outside of the enterprise.
The TerminusDB semantic data product paradigm
The TerminusDB data product paradigm enables cross-organisational and cross-department collaboration, as the authority and governance aspects of a data product paradigm and contextualised domains can both be defined and managed. Each domain can distribute versioned data products and receive change requests back with improvement suggestions, corrections and more.
Multiple data products can be cloned and co-located for analysis that spans data products, with delta updates pushed and pulled to/from the data domain source data products.
In these kinds of scenarios, the closed world interpretation of the schema and instance data is an advantage compared to the open world interpretation of OWL, thanks to the accuracy it brings, similar to what SHACL offers, but with the benefits of the data product paradigm and a unified schema language that can easily be transformed and maintained in software structures and functions.
Tips for Getting Started with TerminusDB
If you are looking to get started with TerminusDB for your own projects, here are some tips that may be useful. Firstly, it is important to become familiar with the structure of the triple store schema representations and differences between SPARQL and the WOQL datalog query engine for your use cases.
The datalog foundations for WOQL could offer some important benefits over SPARQL, whereas SPARQL might enable reuse of existing queries and ways of working. If coming from the world of OWL, it is also beneficial to understand the basic differences between the open and closed world interpretations of OWL.
The amount of control and customisation you need
Once you have established a basic understanding of these fundamentals of linked data, it is important to consider what type of data you are working with. How much control and customisation do you need over that data?
Understanding the pros and cons of a semi-structured versus a completely structured dataset is key, as this helps you compare using SHACL and OWL together with using the single schema language of TerminusDB, which offers both structure and constraints in one data control plane. This is an important factor in deciding whether or not TerminusDB is the right platform for your project.
Understand how the data and model will be deployed
Finally, it is important to consider how you intend to deploy your data, in one or many data products, or as a single triple or quad store. There are several options for running TerminusDB available depending on your specific needs, and too many options for RDF and OWL to mention here.
Hosted TerminusDB instances are available from TerminusDB as TerminusCMS, an Azure-hosted cloud-native offering, and as integrated hosting with the DFRNT data modeller here at dfrnt.com. TerminusDB can also be self-hosted with the open source TerminusDB version using Docker and connected to with DFRNT. It can then run locally on your own laptop, even on Windows!
With these considerations in mind, you should be ready to get started with building out your own TerminusDB instance and data products in it.
Integrated TerminusDB data product hosting in DFRNT
We hope to provide one of the most intuitive data product builder experiences with the DFRNT visual data modeller, as it offers forms-based, YAML and JSON modelling of both instance and schema data, together with a GraphQL user interface and the ability to visualise TerminusDB schema and data. A simple example data product is built into the DFRNT free trial, making it easy to get started.
Both paradigms offer powerful capabilities for semantic knowledge graphs
In conclusion, both the combination of RDF, OWL and SPARQL, and TerminusDB’s triple store, schema engine and WOQL datalog query engine, offer powerful capabilities for managing structured data.
The open world interpretation versus the closed world interpretation offered by TerminusDB is an important consideration to make before choosing your tech stack. The open world interpretation found in OWL without SHACL adds an additional layer of flexibility when dealing with unknown elements or potential inconsistencies in a dataset, but it also leads to risks of data inconsistencies that enterprises often want to avoid.
Both stacks offer closed world interpretations of data: through SHACL when used in conjunction with OWL, and as one unified schema expression in TerminusDB. We believe that the unified stack, with TerminusDB data products and collaboration abilities, offers advantageous characteristics for many use cases, especially in complex cross-organisational ecosystems.
Use cases
Based on conversations with our users, we see that models of intra- and inter-organisational, large-scale, complex collaboration ecosystems have benefited immensely from the specific properties of TerminusDB data products. But we also hear of much simpler data products with simple models, where the easy modelling experience makes it straightforward to express the shape of the data and to add data to the data product through document-oriented triples, using forms, tabular and YAML-based data entry and editing interfaces, with optional JSON-LD as well.
Ultimately, the choice between RDF, OWL and SPARQL or TerminusDB will depend on the specific needs of a project. With the considerations outlined in this blog post, software developers, architects and semantic data engineers should be able to make informed decisions on which platform best suits their requirements and get started with building out their own projects.
This was a long blog post! Thanks for taking the time, hope it is useful to you!
This post originally appeared in The Semantic Data Engineer publication at Medium, and on the DFRNT blog: How to Make an Informed Decision with RDF: OWL, SHACL, and SPARQL vs TerminusDB | DFRNT.
Footnotes:
[¹]: Considerations in using SHACL instead of OWL / RDFS for schema constraints: see Issue #123 · terminusdb/terminusdb · GitHub
[^i]: SHACL and OWL Compared (spinrdf.org)
[^ii]: JSON-LD nodes do not require the @context element. A context has unique properties.
[^iii]: Schema Reference — TerminusDB
[^iv]: Consider using SHACL instead of OWL / RDFS for schema constraints · Issue #123 · terminusdb/terminusdb · GitHub
The Semantic Data Engineer publication on Medium is found here: https://medium.com/the-semantic-data-engineer