Please check the errata for any errors or issues reported since publication. This document is also available in this non-normative format: diff to previous version.
Copyright © 2010-2015 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
The last couple of years have witnessed a fascinating evolution: while the Web was initially built predominantly for human consumption, web content is increasingly consumed by machines which expect some amount of structured data. Sites have started to identify a page's title, content type, and preview image to provide appropriate information in a user's newsfeed when she clicks the "Like" button. Search engines have started to provide richer search results by extracting fine-grained structured details from the Web pages they crawl. In turn, web publishers are producing increasing amounts of structured data within their Web content to improve their standing with search engines.
A key enabling technology behind these developments is the ability to add structured data to HTML pages directly. RDFa (Resource Description Framework in Attributes) is a technique that allows just that: it provides a set of markup attributes to augment the visual information on the Web with machine-readable hints. In this Primer, we show how to express data using RDFa in HTML, and in particular how to mark up existing human-readable Web page content to express machine-readable data.
This document provides only a Primer to RDFa 1.1. The complete specification of RDFa, with further examples, can be found in the RDFa 1.1 Core [rdfa-core], RDFa Lite [rdfa-lite], XHTML+RDFa 1.1 [xhtml-rdfa], and HTML5+RDFa 1.1 [html-rdfa] specifications.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was published by the RDFa Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to public-rdfa@w3.org (subscribe, archives). All comments are welcome.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 14 October 2005 W3C Process Document.
The web is a rich, distributed repository of interconnected information. Until recently, it was organized primarily for human consumption. On a typical web page, an HTML author might specify a headline, then a smaller sub-headline, a block of italicized text, a few paragraphs of average-size text, and, finally, a few single-word links. Web browsers will follow these presentation instructions faithfully. However, only the human mind understands that the headline is a blog post title, the sub-headline indicates the author, the italicized text is the article's publication date, and the single-word links are subject categories. Computers do not grasp these distinctions; the gap between what programs and humans understand is large.
What if the browser, or any machine consumer such as a Web crawler, received information on the meaning of a web page's visual elements? A dinner party announced on a blog could be copied to the user's calendar, an author's complete contact information to the user's address book. Users could automatically recall previously browsed articles according to categorization labels (i.e., tags). A photo copied and pasted from a web site to a school report would carry with it a link back to the photographer, giving him proper credit. A link shared by a user to his social network contacts would automatically carry additional data pulled from the original web page: a thumbnail, an author, and a specific title. When web data meant for humans is augmented with hints meant for computer programs, these programs become significantly more helpful, because they begin to understand the data's structure.
RDFa allows HTML authors to do just that. Using a few simple HTML attributes, authors can mark up human-readable data with machine-readable indicators for browsers and other programs to interpret. A web page can include markup for items as simple as the title of an article, or as complex as a user's complete social network.
Historically, RDFa 1.0 [rdfa-syntax] was specified only for XHTML. RDFa 1.1 [rdfa-core] is the newer version and the one used in this document. RDFa 1.1 is specified for both XHTML [xhtml-rdfa] and HTML5 [html-rdfa]. In fact, RDFa 1.1 also works for any XML-based language, such as SVG [svg11]. This document uses HTML in all of the examples; for simplicity, we use the term "HTML" throughout this document to refer to all of the HTML-family languages.
RDFa is based on attributes. While some HTML attributes (e.g., href, src) have been re-used, other RDFa attributes are new. This matters because some (X)HTML validators may not properly validate the HTML code until they are updated to recognize the new RDFa attributes. This is rarely a problem in practice, since browsers simply ignore attributes that they do not recognize. None of the RDFa-specific attributes have any effect on the visual display of the HTML content: authors do not have to worry about pages marked up with RDFa looking any different to a human reader from pages not marked up with RDFa.
We begin the introduction to RDFa by using a subset of all the possibilities called RDFa Lite 1.1 [ rdfa-lite ]. The goal, when defining that subset, was to define a set of possibilities that can be applied to most simple to moderate structured data markup tasks, without burdening the authors with additional complexities. Many Web authors will not need to use more than this minimal subset.
Consider Alice, a blogger who publishes a mix of professional and personal articles at http://example.com/alice. We will construct markup examples to illustrate how Alice can use RDFa. A more complete markup of these examples is available on a dedicated page.
The previous example demonstrated how Alice can mark up text to make it machine-readable. She would also like to mark up the links in a machine-readable way, to express the type of link being described. RDFa lets the publisher add a "flavor", i.e., a label, to an existing clickable link that processors can understand. This way, the same markup serves both humans and machines.
In her blog's footer, Alice already declares her content to be freely reusable, as long as she receives due credit when her articles are cited. The HTML includes a link to a Creative Commons [cc-about] license:
<p>All content on this site is licensed under <a href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
A human clearly understands this sentence, in particular the meaning of the link with respect to the current document: it indicates the document's license, the conditions under which the page's contents are distributed. Unfortunately, when Bob visits Alice's blog, his browser sees only a plain link that could just as well point to one of Alice's friends or to her CV. For Bob's browser to understand that this link actually points to the document's licensing terms, Alice needs to add some flavor, some indication of what kind of link this is.
She can add this flavor by using, once again, the property attribute. When the element carries an href (or src) attribute, property is automatically associated with the value of that attribute rather than with the textual content of the a element. The value of the property attribute is http://creativecommons.org/ns#license, a term defined by Creative Commons:
<p>All content on this site is licensed under
<a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/">
a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
With this small update, Bob's browser will now understand that this link has a flavor: it indicates the blog's license:
Alice is quite pleased that she was able to add only structured-data hints via RDFa, never having to repeat the content of her text or the URL of her clickable links.
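The rule that an href value takes precedence over the element's text can be illustrated with a short Python sketch. It is not an RDFa processor: it uses the standard library's html.parser, looks only at elements that carry property, and assumes for simplicity that text-valued elements contain no nested tags. The class PropertyScanner and the helper property_values are hypothetical names, and plain http:// URLs are used.

```python
from html.parser import HTMLParser


class PropertyScanner(HTMLParser):
    """Collect (property, value) pairs from a flat HTML snippet (illustration only)."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # [property, text parts] while inside a text-valued element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        link = attrs["href"] if "href" in attrs else attrs.get("src")
        if link is not None:
            # A "link with flavor": the value is the link target, not the link text.
            self.pairs.append((prop, link))
        else:
            self._pending = [prop, []]

    def handle_data(self, data):
        if self._pending is not None:
            self._pending[1].append(data)

    def handle_endtag(self, tag):
        # Assumes no nested tags inside a text-valued element.
        if self._pending is not None:
            prop, parts = self._pending
            self.pairs.append((prop, "".join(parts).strip()))
            self._pending = None


def property_values(html):
    scanner = PropertyScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.pairs


footer = (
    '<p>All content on this site is licensed under '
    '<a property="http://creativecommons.org/ns#license" '
    'href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. '
    '©2011 Alice Birpemswick.</p>'
)
print(property_values(footer))
# [('http://creativecommons.org/ns#license', 'http://creativecommons.org/licenses/by/3.0/')]
```

The link text " a Creative Commons License" never appears in the output: with href present, the property takes the URL.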
In a number of simple use cases, such as our example with Alice's blog, HTML authors will predominantly use a single vocabulary. However, while generating full URLs via a content management system (CMS) is not a particular problem, typing them by hand is tedious and error-prone. To alleviate this problem, RDFa introduces the vocab attribute, which lets the author declare a single vocabulary for a chunk of HTML. Thus, instead of:
Thus,
instead
of:
<html> <head> ... </head> <body> ... <h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2> <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p> ... </body>
Alice can write:
<html> <head> ... </head> <body vocab="http://purl.org/dc/terms/"> ... <h2 property="title">The Trouble with Bob</h2> <p>Date: <span property="created">2011-09-10</span></p> ... </body>
Note how the property values are single "terms" now; these are simply concatenated to the URL defined via the vocab attribute. The attribute can be placed on any HTML element (i.e., not only on the body element as in the example), and its effect applies to that element and to all the elements below it.
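Expressed as code, the concatenation is essentially a one-liner. The Python sketch below is a simplification under stated assumptions, not the normative RDFa algorithm: it treats any value containing a colon as an already-complete URL (prefixes are not handled) and reports a bare term as undefined when no vocab is in scope. The function name expand_term is hypothetical.

```python
from typing import Optional


def expand_term(value: str, vocab: Optional[str]) -> Optional[str]:
    """Turn a property value into a full URL (simplified sketch)."""
    if ":" in value:
        # Assumed to be a complete URL such as http://purl.org/dc/terms/created.
        return value
    if vocab:
        # A bare term: simply append it to the vocabulary URL.
        return vocab + value
    # No default vocabulary in scope: this sketch leaves the term undefined.
    return None


DC = "http://purl.org/dc/terms/"
print(expand_term("title", DC))    # http://purl.org/dc/terms/title
print(expand_term("created", DC))  # http://purl.org/dc/terms/created
```

Passing a full URL through unchanged is what allows vocabulary terms and full URLs to be mixed freely on the same page.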
Default vocabularies and full URIs can be mixed at any time. For example, Alice could have written:
<html> <head> ... </head> <body vocab="http://purl.org/dc/terms/"> ... <h2 property="title">The Trouble with Bob</h2> <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p> ... </body>
Perhaps a more interesting example is the combination of the header with the licensing segment of her web page:
<html> <head> ... </head> <body vocab="http://purl.org/dc/terms/"> ... <h2 property="title">The Trouble with Bob</h2> <p>Date: <span property="created">2011-09-10</span></p> ... <p>All content on this site is licensed under <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©2011 Alice Birpemswick.</p> </body> </html>
The full URL for the license term is necessary because that term comes from the Creative Commons vocabulary, not from the Dublin Core vocabulary declared via vocab. As an alternative, Alice could also have chosen to use the vocab attribute again:
<html> <head> ... </head> <body vocab="http://purl.org/dc/terms/"> ... <h2 property="title">The Trouble with Bob</h2> <p>Date: <span property="created">2011-09-10</span></p> ... <p vocab="http://creativecommons.org/ns#">All content on this site is licensed under <a property="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©2011 Alice Birpemswick.</p> </body> </html>
This works because the vocab attribute on the license paragraph overrides the definition inherited from the body of the document.
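The override follows from treating vocab as inherited state: each element uses its own vocab if it has one, and otherwise the nearest ancestor's. Below is a minimal Python sketch of that scoping, keeping a stack inside the standard library's html.parser. It rests on simplifying assumptions (explicit end tags, only vocab and property considered, any value containing a colon taken as a full URL), and the names VocabScope and expanded_properties are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get an end tag; they must not push onto the scope stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class VocabScope(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = [None]     # vocab in force; None means no default vocabulary
        self.properties = []    # expanded property URLs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element's own vocab overrides the inherited one (an empty value clears it).
        vocab = attrs.get("vocab", self.stack[-1])
        prop = attrs.get("property")
        if prop is not None:
            if ":" in prop:
                self.properties.append(prop)  # already a full URL
            else:
                self.properties.append(vocab + prop if vocab else None)
        if tag not in VOID:
            self.stack.append(vocab)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()


def expanded_properties(html):
    parser = VocabScope()
    parser.feed(html)
    parser.close()
    return parser.properties


page = (
    '<body vocab="http://purl.org/dc/terms/">'
    '<h2 property="title">The Trouble with Bob</h2>'
    '<p>Date: <span property="created">2011-09-10</span></p>'
    '<p vocab="http://creativecommons.org/ns#">All content on this site is licensed under '
    '<a property="license" href="http://creativecommons.org/licenses/by/3.0/">'
    'a Creative Commons License</a>.</p>'
    '<p property="rights">Outside the license paragraph again</p>'
    '</body>'
)
for url in expanded_properties(page):
    print(url)
```

The last paragraph, with its hypothetical rights property, is only there to show that once the license paragraph closes, the Dublin Core vocabulary from body is in force again.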
The vocab attribute references structured data vocabularies, identified using URLs. RDFa does not limit the form of these URLs or the document formats accessible by dereferencing them; however, users SHOULD aim to use widely shared, conventional values for identifying such vocabularies, following the conventions of case, spelling, etc. established by their publishers.
Alice's blog page may, of course, contain multiple entries. Sometimes, Alice's sister Eve guest blogs, too. The front page of the blog lists the 10 most recent entries, each with its own title, author, and introductory paragraph. How, then, should Alice mark up the title of each of these entries individually, even though they all appear within the same web page?
RDFa provides resource, an attribute for specifying the "context", i.e., the exact URL to which the contained RDFa markup applies:
<body vocab="http://purl.org/dc/terms/"> ... <div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> <p>Date: <span property="created">2011-09-10</span></p> <h3 property="creator">Alice</h3> ... </div> ... <div resource="/alice/posts/jos_barbecue"> <h2 property="title">Jo's Barbecue</h2> <p>Date: <span property="created">2011-09-14</span></p> <h3 property="creator">Eve</h3> ... </div> ... </body>
(Note that we used relative URLs in the example; the value of resource could have been any URL, relative or absolute.)
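How a processor turns such relative values into full URLs can be sketched with Python's urllib.parse.urljoin, which implements the standard URL reference-resolution rules. The sketch assumes the front page is served at http://example.com/alice with no base element; the helper post_triples and the hard-wired Dublin Core namespace are hypothetical simplifications.

```python
from urllib.parse import urljoin

PAGE = "http://example.com/alice"   # assumed document URL (no <base> element)
DC = "http://purl.org/dc/terms/"    # the vocab declared on <body>


def post_triples(resource, title, created, creator, base=PAGE):
    """Statements for one blog entry: the resolved resource URL is the subject."""
    subject = urljoin(base, resource)
    return [
        (subject, DC + "title", title),
        (subject, DC + "created", created),
        (subject, DC + "creator", creator),
    ]


for triple in post_triples("/alice/posts/jos_barbecue", "Jo's Barbecue", "2011-09-14", "Eve"):
    print(triple)
```

Because the assumed page URL has no trailing slash, a value written without the leading slash, such as posts/jos_barbecue, would resolve to http://example.com/posts/jos_barbecue rather than to a URL under /alice/; writing the leading slash, as the markup does, avoids that surprise.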
We can represent this, once again, as a diagram connecting URLs to properties:
Alice can use the same technique to give her friend Bob proper credit when she posts one of his photos:
<div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> ... The trouble with Bob is that he takes much better photos than I do: ... <div resource="http://example.com/bob/photos/sunset.jpg"> <img src="http://example.com/bob/photos/sunset.jpg" /> <span property="title">Beautiful Sunset</span> by <span property="creator">Bob</span>. </div> </div>
Notice how the innermost resource value, http://example.com/bob/photos/sunset.jpg, "overrides" the outer value /alice/posts/trouble_with_bob for all markup inside the containing div.
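A stack of "current subjects" explains this override: a resource attribute pushes a new subject that stays in force until its element closes. The Python sketch below (standard library only) replays the photo example under stated simplifications: every property value is assumed to be a Dublin Core term, resource is treated as a new subject only when the element has no property, and http://example.com/alice is assumed as the document URL. The class name SubjectScope is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
DC = "http://purl.org/dc/terms/"   # assumed vocab for every property value
PAGE = "http://example.com/alice"  # assumed document URL


class SubjectScope(HTMLParser):
    def __init__(self, base=PAGE):
        super().__init__()
        self.base = base
        self.subjects = [base]  # stack of current subjects; the page is outermost
        self.triples = []
        self._pending = None    # [subject, property URL, text parts, stack depth]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        subject = self.subjects[-1]
        if "resource" in attrs and "property" not in attrs:
            # resource without property: a new, innermost subject
            subject = urljoin(self.base, attrs["resource"])
        if "property" in attrs and tag not in VOID and self._pending is None:
            self._pending = [subject, DC + attrs["property"], [], len(self.subjects)]
        if tag not in VOID:
            self.subjects.append(subject)

    def handle_data(self, data):
        if self._pending is not None:
            self._pending[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or len(self.subjects) == 1:
            return
        self.subjects.pop()
        pending = self._pending
        if pending is not None and len(self.subjects) == pending[3]:
            # Back at the depth where the property element opened: it is closed.
            subject, prop, parts, _depth = pending
            self.triples.append((subject, prop, "".join(parts).strip()))
            self._pending = None


post = (
    '<div resource="/alice/posts/trouble_with_bob">'
    '<h2 property="title">The trouble with Bob</h2>'
    ' The trouble with Bob is that he takes much better photos than I do: '
    '<div resource="http://example.com/bob/photos/sunset.jpg">'
    '<img src="http://example.com/bob/photos/sunset.jpg" />'
    '<span property="title">Beautiful Sunset</span> by '
    '<span property="creator">Bob</span>.'
    '</div></div>'
)
parser = SubjectScope()
parser.feed(post)
parser.close()
for triple in parser.triples:
    print(triple)
```

The first title statement is about the blog post; the second title and the creator statement are about the photo, because the inner div pushed the photo URL as the current subject.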
Once again, here is a diagram that represents the underlying data of this new portion of markup:
We have seen, in a previous section, how Alice can use RDFa to include Creative Commons statements on her blog. However, the solution in that section assigned these statements to the whole page, and not to individual blog items. This may be an issue if the page includes multiple items. Indeed, Alice may be forced to repeat the relevant statements like this:
<body vocab="http://purl.org/dc/terms/"> ... <div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> <p>Date: <span property="created">2011-09-10</span></p> <h3 property="creator">Alice</h3> ... <p vocab="http://creativecommons.org/ns#">All content on this blog item is licensed under <a property="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. <span property="attributionName">©2011 Alice Birpemswick</span>.</p> </div> ... <div resource="/alice/posts/jims_concert"> <h2 property="title">I was at Jim's concert the other day</h2> <p>Date: <span property="created">2011-10-22</span></p> <h3 property="creator">Alice</h3> ... <p vocab="http://creativecommons.org/ns#">All content on this blog item is licensed under <a property="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. <span property="attributionName">©2011 Alice Birpemswick</span>.</p> </div> ... </body>
which may be tedious and error prone.
HTML+RDFa introduces the notion of "Property copying" to alleviate this situation. Using this feature, Alice can "collect" a number of statements as a pattern, and refer to that pattern from other parts of the page. This is done using the magic property rdfa:copy and the magic type rdfa:Pattern as follows:
<body vocab="http://purl.org/dc/terms/"> ... <div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> <p>Date: <span property="created">2011-09-10</span></p> <h3 property="creator">Alice</h3> ... <link property="rdfa:copy" href="#ccpattern"/> </div> ... <div resource="/alice/posts/jims_concert"> <h2 property="title">I was at Jim's concert the other day</h2> <p>Date: <span property="created">2011-10-22</span></p> <h3 property="creator">Alice</h3> ... <link property="rdfa:copy" href="#ccpattern"/> </div> ... <div resource="#ccpattern" typeof="rdfa:Pattern"> <p vocab="http://creativecommons.org/ns#">All content on this blog item is licensed under <a property="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. <span property="attributionName">©2011 Alice Birpemswick</span>.</p> </div> </body>
(Alice may choose to use CSS to make the CC statements invisible on the screen if she wants.)
The effect of this structure is, conceptually, to "copy" all the RDFa statements appearing in the pattern so that they replace each link element, yielding the following structure:
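The copying step can be pictured as a transformation over the extracted statements. Below is a simplified, single-pass Python sketch, assuming the statements are already available as (subject, property, value) tuples and that rdfa: stands for http://www.w3.org/ns/rdfa#. The function name expand_copies is hypothetical. This sketch drops the pattern's own statements afterwards and does not handle patterns that themselves use rdfa:copy; the HTML+RDFa specification gives the normative rules.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFA = "http://www.w3.org/ns/rdfa#"


def expand_copies(triples):
    """Replace each (item, rdfa:copy, pattern) link with copies of the pattern's statements."""
    patterns = {s for s, p, o in triples if p == RDF_TYPE and o == RDFA + "Pattern"}
    result = []
    for s, p, o in triples:
        if s in patterns:
            continue  # the pattern itself is not kept in this sketch
        if p == RDFA + "copy" and o in patterns:
            # Copy every statement of the pattern, except its rdfa:Pattern type.
            result.extend(
                (s, pp, oo)
                for ps, pp, oo in triples
                if ps == o and not (pp == RDF_TYPE and oo == RDFA + "Pattern")
            )
        else:
            result.append((s, p, o))
    return result


DC = "http://purl.org/dc/terms/"
CC = "http://creativecommons.org/ns#"
BOB = "http://example.com/alice/posts/trouble_with_bob"
JIM = "http://example.com/alice/posts/jims_concert"
PATTERN = "http://example.com/alice#ccpattern"

extracted = [
    (BOB, DC + "title", "The trouble with Bob"),
    (BOB, RDFA + "copy", PATTERN),
    (JIM, DC + "title", "I was at Jim's concert the other day"),
    (JIM, RDFA + "copy", PATTERN),
    (PATTERN, RDF_TYPE, RDFA + "Pattern"),
    (PATTERN, CC + "license", "http://creativecommons.org/licenses/by/3.0/"),
    (PATTERN, CC + "attributionName", "©2011 Alice Birpemswick"),
]
for triple in expand_copies(extracted):
    print(triple)
```

Each blog item ends up with its own license and attribution statements, even though the markup states them only once.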
Alice may want to add her personal data to her individual blog items, too. She decides to combine her FOAF data with the blog items, i.e.:
<div vocab="http://purl.org/dc/terms/">
<div resource="/alice/posts/trouble_with_bob">
<h2 property="title">The trouble with Bob</h2>
...
<h3 vocab="http://xmlns.com/foaf/0.1/" property="http://purl.org/dc/terms/creator" typeof="Person">
<span property="name">Alice Birpemswick</span>,
Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>,
Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
</h3>
...
</div>
...
</div>
The structured data she generates looks like this:
Unfortunately, this solution is not optimal in two respects. First of all, notice that Alice had to use the full URI for the creator property: because the vocab attribute on the same element is set to the FOAF vocabulary, the simple creator value would have been misinterpreted as a FOAF term. We will come back to the issue of using several vocabularies in another section below.
The other issue is that Alice would like to design her Web page so that her personal data does not appear in each individual blog item but, rather, in one place, such as a footnote or a sidebar. What she would like to see is something like:
If the FOAF data were included in each blog item, Alice would have to create a complex set of CSS rules to achieve the visual effect she wants.
To solve this, Alice decides to make use of the structure she already used for her FOAF data but, this time, to assign it a separate URI using the resource attribute:
<div vocab="http://xmlns.com/foaf/0.1/" resource="#me" typeof="Person">
<p>
<span property="name">Alice Birpemswick</span>,
Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>,
Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
</p>
...
</div>
It is actually considered good practice to use real URIs whenever possible, i.e., Alice's new alternative should be preferred in general. Indeed, if a real URI is used, then it becomes possible to refer unambiguously to that particular piece of information, whereas that is more complicated with blank nodes (the unnamed nodes created, as in the previous version, when typeof appears without resource).
The resource="#me" markup (which, by the way, also presupposes that the target is in the same HTML scope) is a FOAF convention: the URL that represents the person Alice is http://example.com/alice#me. It should not be confused with Alice's homepage, http://example.com/alice. Of course, Alice could have used a different URI if, for example, her blog and her personal homepage were kept separate; e.g., she could have used resource="http://alice.example.com/alice/home#myself" instead of resource="#me".
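The distinction between the person and the page comes down to URL resolution plus a fragment. A minimal Python sketch with urllib.parse, assuming the blog page URL http://example.com/alice and no base element (the helper person_and_page is hypothetical):

```python
from urllib.parse import urldefrag, urljoin

PAGE = "http://example.com/alice"  # assumed document URL


def person_and_page(resource, base=PAGE):
    """Resolve a resource value and split off the fragment that names the thing."""
    person = urljoin(base, resource)
    page, _fragment = urldefrag(person)
    return person, page


print(person_and_page("#me"))
print(person_and_page("http://alice.example.com/alice/home#myself"))
```

The two values differ only by the fragment, which is what lets a consumer keep statements about Alice separate from statements about her homepage.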
Using the explicit URI for her FOAF data, Alice can add a direct reference from the blog item to that data, again using the resource attribute:
<div vocab="http://purl.org/dc/terms/">
<div resource="/alice/posts/trouble_with_bob">
<h2 property="title">The trouble with Bob</h2>
<h3 property="creator" resource="#me">Alice</h3>
...
</div>
</div>
...
<div class="sidebar" vocab="http://xmlns.com/foaf/0.1/" resource="#me" typeof="Person">
<p>
<span property="name">Alice Birpemswick</span>,
Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>,
Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
</p>
...
</div>
The resource attribute appears, in this case, together with property on the same element: in this situation, resource indicates the "target" of the relation. This attribute allows Alice to "distribute" the various parts of her structured data across her page. What she gets is a slightly modified version of the previous structure, where the only difference is the use of an explicit URI instead of a blank node:
Using this approach, it becomes very easy to also add references to the same data from different blog posts:
<div vocab="http://purl.org/dc/terms/"> <div resource="/alice/posts/trouble_with_bob"> <h2 property="title">The trouble with Bob</h2> <h3 property="creator" resource="#me">Alice</h3> ... </div> </div> ... <div vocab="http://purl.org/dc/terms/"> <div resource="/alice/posts/my_photos"> <h2 property="title">I will post my photos nevertheless…</h2> <h3 property="creator" resource="#me">Alice</h3> ... </div> </div> ... <div class="sidebar" vocab="http://xmlns.com/foaf/0.1/" resource="#me" typeof="Person"> <p> <span property="name">Alice Birpemswick</span>, Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>, Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p> ... </div>
This leads to the following structure:
Combined with property, the resource attribute plays exactly the same role as href, already used for "links with flavor", except that it does not provide a clickable link to the browser the way href does. Also, the resource attribute can be used on any HTML element, whereas href is restricted, in HTML, to a few elements such as a and link.
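The relationship between resource, href and the element's text can be summarised as a small decision function. The Python sketch below is deliberately simplified: it checks resource, then href, then src, and otherwise falls back to the element's text, ignoring other RDFa attributes (such as content, datatype or typeof) that also influence the result. The function name property_object and the base URL http://example.com/alice are assumptions for illustration.

```python
from urllib.parse import urljoin

BASE = "http://example.com/alice"  # assumed document URL


def property_object(attrs, text, base=BASE):
    """Return ("url", value) or ("text", value) for an element that carries property."""
    for name in ("resource", "href", "src"):  # first one present wins
        if name in attrs:
            return ("url", urljoin(base, attrs[name]))
    return ("text", text.strip())


print(property_object({"property": "creator", "resource": "#me"}, "Alice"))
print(property_object({"property": "license",
                       "href": "http://creativecommons.org/licenses/by/3.0/"},
                      " a Creative Commons License"))
print(property_object({"property": "title"}, "The trouble with Bob"))
```

The h3 in Alice's markup therefore points at http://example.com/alice#me rather than carrying the text "Alice", exactly as an a element with href would, but without producing a clickable link.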
There is a similarity between this issue and its solution, on the one hand, and the issue and approach discussed in the section on property copying, on the other. There is, however, a subtle but important difference between the two. The solution using the resource attribute introduces a new node in the graph, as shown in Figure 12, whereas copying the properties does not. Which of the two approaches should be adopted often depends on the vocabulary that is used.
The previous examples show that, for more complex cases, multiple vocabularies have to be used to express the various aspects of structured data. We have seen Alice using the Dublin Core, FOAF, and Creative Commons vocabularies, but there may be more. For example, Alice may want to add vocabulary elements defined by search engines on their schema.org site [schema].
Alice can either use full URLs for all the terms or use the vocab attribute to abbreviate the terms of the predominant vocabulary. But, in some cases, the vocabularies cannot be separated easily, which means that the use of vocab may become awkward. Here is, for example, the kind of HTML she might end up with:
<html> <head> ... </head> <body vocab="http://schema.org/"> <div resource="/alice/posts/trouble_with_bob" typeof="BlogPosting"> <h2 property="http://purl.org/dc/terms/title">The trouble with Bob</h2> ... <h3 property="http://purl.org/dc/terms/creator" resource="#me">Alice</h3> <div property="articleBody"> <p>The trouble with Bob is that he takes much better photos than I do:</p> </div> ... </div> ... </body> </html>
Note that the schema.org and Dublin Core terms are intertwined within a single blog post, and it becomes an arbitrary choice whether to use the vocab attribute for http://purl.org/dc/terms/ or for http://schema.org/. We have seen the same problem in a previous section when FOAF and Dublin Core terms were mixed.
To alleviate this problem, RDFa offers the possibility of using prefixed terms: a special prefix attribute can assign prefixes to represent URLs and, using those prefixes, the vocabulary elements themselves can be abbreviated. The prefix:reference syntax is used: the URL associated with prefix is simply concatenated with reference to create a full URL. (Note that we have already used this convention to simplify our figures.)
Here is how the HTML of the previous example looks when prefixes are used:
<html> <head> ... </head> <body prefix="dc: http://purl.org/dc/terms/ schema: http://schema.org/"> <div resource="/alice/posts/trouble_with_bob" typeof="schema:BlogPosting"> <h2 property="dc:title">The trouble with Bob</h2> ... <h3 property="dc:creator" resource="#me">Alice</h3> <div property="schema:articleBody"> <p>The trouble with Bob is that he takes much better photos than I do:</p> </div> ... </div> </body> </html>
The use of prefixes can greatly reduce possible errors by concentrating the vocabulary choices in one place in the file. Just like vocab, the prefix attribute can appear anywhere in the HTML file, affecting only the element on which it appears and the elements below it.
prefix and vocab can also be mixed, for example:
<html> <head> ... </head> <body vocab="http://purl.org/dc/terms/" prefix="schema: http://schema.org/"> <div resource="/alice/posts/trouble_with_bob" typeof="schema:BlogPosting"> <h2 property="title">The trouble with Bob</h2> ... <h3 property="creator" resource="#me">Alice</h3> <div property="schema:articleBody"> <p>The trouble with Bob is that he takes much better photos than I do:</p> </div> ... </div> </body> </html>
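The prefix:reference rule, together with the vocab fallback for bare terms used in the mixed example, can be sketched in a few lines of Python. This is a simplification under stated assumptions: the prefix attribute is taken to be well-formed "name:" URL pairs, a value whose part before the colon is not a declared prefix is treated as a full URL, and the helper names parse_prefixes and expand are hypothetical.

```python
from typing import Dict, Optional


def parse_prefixes(value: str) -> Dict[str, str]:
    """Parse a prefix attribute such as "dc: http://purl.org/dc/terms/ schema: http://schema.org/"."""
    tokens = value.split()
    mapping = {}
    # Assumes well-formed "name:" URL pairs.
    for name, url in zip(tokens[0::2], tokens[1::2]):
        if name.endswith(":"):
            mapping[name[:-1]] = url
    return mapping


def expand(value: str, prefixes: Dict[str, str], vocab: Optional[str]) -> Optional[str]:
    """Expand a property or typeof value into a full URL."""
    if ":" in value:
        prefix, _, reference = value.partition(":")
        if prefix in prefixes:
            return prefixes[prefix] + reference  # prefix:reference -> concatenation
        return value                             # treated as an already-full URL
    return vocab + value if vocab else None      # bare term: default vocabulary


prefixes = parse_prefixes("schema: http://schema.org/")
vocab = "http://purl.org/dc/terms/"
for value in ("schema:BlogPosting", "title", "creator", "schema:articleBody"):
    print(value, "->", expand(value, prefixes, vocab))
```

Prefixed values and bare terms can sit side by side because they are told apart syntactically: only a value with a colon is looked up in the prefix mappings.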
Note: a problem may arise if the html element contains a large number of prefix declarations.
The character encoding (i.e., UTF-8, UTF-16, ASCII, etc.) used for an HTML5 file is declared using a meta element in the header. In HTML5, this meta declaration must fall within the first 512 bytes of the page, or the HTML5 processor (browser, parser, etc.) will try to detect the encoding using some heuristics. A very "long" html tag may therefore lead to problems. One way of avoiding the issue is to place most of the prefix declarations on the body element.
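Whether the charset declaration still falls within the limit can be checked mechanically. The Python sketch below assumes a UTF-8 page whose meta element is written as <meta charset=...>, and takes the 512-byte figure stated above as a parameter; the helper charset_within_limit and the generated prefix names are hypothetical.

```python
def charset_within_limit(html: str, limit: int = 512) -> bool:
    """True if a <meta charset=...> element ends within the first `limit` bytes."""
    data = html.encode("utf-8")
    start = data.lower().find(b"<meta charset")
    if start == -1:
        return False  # no declaration of this form at all
    end = data.find(b">", start)
    return end != -1 and end + 1 <= limit


# Thirty made-up prefix declarations, enough to push the meta element past 512 bytes.
many = " ".join(f"p{i}: http://example.org/vocabulary/{i}/" for i in range(30))

on_html = f'<!DOCTYPE html><html prefix="{many}"><head><meta charset="utf-8"></head><body></body></html>'
on_body = f'<!DOCTYPE html><html><head><meta charset="utf-8"></head><body prefix="{many}"></body></html>'

print(charset_within_limit(on_html))  # False
print(charset_within_limit(on_body))  # True
```

Moving the same declarations from html to body leaves the vocabulary mappings intact for the page content while keeping the encoding declaration near the top of the file.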
The previous example, whereby the Dublin Core and schema.org vocabularies are used within the same blog post, raises another issue. It so happens that not only Dublin Core, but also schema.org has a property called creator. Because RDFa uses URIs to denote properties, that, by itself, is not a problem. However, if Alice wants to use both these properties in the same blog post (e.g., because she wants search engines to manage her blog post but, at the same time, she wants Dublin Core aware applications, like catalogs, to handle her blog post, too), this is what she may have to do:
<html> <head> ... </head> <body prefix="dc: http://purl.org/dc/terms/ schema: http://schema.org/"> <div resource="/alice/posts/trouble_with_bob" typeof="schema:BlogPosting"> <h2 property="dc:title">The trouble with Bob</h2> ... <h3 property="dc:creator" resource="#me"><span property="schema:creator" resource="#me">Alice</span></h3> <div property="schema:articleBody"> <p>The trouble with Bob is that he takes much better photos than I do:</p> </div> ... </div> </body> </html>
This is a bit awkward. Fortunately, RDFa allows the value of a property attribute to be a whitespace-separated list of values, i.e., she can also write:
<html>
<head>
...
</head>
<body prefix="dc: http://purl.org/dc/terms/ schema: http://schema.org/">
<div resource="/alice/posts/trouble_with_bob" typeof="schema:BlogPosting">
<h2 property="dc:title">The trouble with Bob</h2>
...
<h3 property="dc:creator schema:creator" resource="#me">Alice</h3>
<div property="schema:articleBody">
<p>The trouble with Bob is that he takes much better photos than I do:</p>
</div>
...
</div>
</body>
</html>
yielding the structure:
Similarly to property, typeof also accepts a list of values. For example, schema.org also has a notion of a Person, similar to FOAF; Alice may choose to use both:
<div class="sidebar" prefix="foaf: http://xmlns.com/foaf/0.1/ schema: http://schema.org/"
resource="#me" typeof="foaf:Person schema:Person">
<p>
<span property="foaf:name">Alice Birpemswick</span>,
Email: <a property="foaf:mbox" href="mailto:alice@example.com">alice@example.com</a>,
Phone: <a property="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
</p>
...
</div>
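Each whitespace-separated value simply yields its own statement. A Python sketch, assuming the prefixes declared in these examples and using hypothetical helper names (curie, statements), shows how the two list-valued attributes multiply out:

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PREFIXES = {  # as declared by the prefix attributes in the examples
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "http://schema.org/",
}


def curie(value):
    """Expand prefix:reference using the declared prefixes."""
    prefix, _, reference = value.partition(":")
    return PREFIXES[prefix] + reference


def statements(subject, typeof="", prop="", obj=None):
    """One rdf:type statement per typeof value, one statement per property value."""
    out = [(subject, RDF_TYPE, curie(t)) for t in typeof.split()]
    if obj is not None:
        out.extend((subject, curie(p), obj) for p in prop.split())
    return out


POST = "http://example.com/alice/posts/trouble_with_bob"
ME = "http://example.com/alice#me"
for s in statements(POST, prop="dc:creator schema:creator", obj=ME):
    print(s)
for s in statements(ME, typeof="foaf:Person schema:Person"):
    print(s)
```

The single h3 thus produces the same two creator statements as the nested span version, without repeating the resource value.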
A number of vocabularies are very widely used by the Web community with well-known prefixes; the Dublin Core vocabulary is a good example. Authors tend to declare these common prefixes over and over again, and sometimes forget to declare them altogether.
To alleviate this issue, RDFa introduces the concept of an initial context that defines a set of default prefixes. These prefixes, whose list is maintained and regularly updated by the W3C, are pre-defined and known to the RDFa processor. Prefix declarations in a document always override the defaults, but if a web page author forgets to declare a common vocabulary such as Dublin Core or FOAF, the RDFa processor will fall back to them. The list of default prefixes is available on the Web for everyone to read.
For example, the following markup does not declare the dc: prefix using a prefix attribute:
<html> <head> ... </head> <body> <div> <h2 property="dc:title">The trouble with Bob</h2> ... <h3 property="dc:creator" resource="#me">Alice</h3> ... </div> </body> </html>
However, an RDFa processor still recognizes the dc:title and dc:creator shorthands and expands the values to the corresponding URLs. The RDFa processor is able to do this because the dc prefix is part of the default prefixes in the initial context.
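The fallback behaviour amounts to a dictionary merge in which the document's own declarations win. The Python sketch below uses a small, hand-picked subset of default mappings as an assumption (the published initial-context list is authoritative and longer), and the names DEFAULT_PREFIXES and resolve_curie are hypothetical.

```python
from typing import Dict, Optional

# Assumption: a small, hand-picked subset of the RDFa initial context.
# The published list of default prefixes is authoritative.
DEFAULT_PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "http://schema.org/",
}


def resolve_curie(value: str, declared: Dict[str, str]) -> Optional[str]:
    """Expand prefix:reference, preferring the document's own prefix declarations."""
    prefix, sep, reference = value.partition(":")
    mappings = {**DEFAULT_PREFIXES, **declared}  # later entries (declared) win
    if sep and prefix in mappings:
        return mappings[prefix] + reference
    return None  # unknown prefix: left unresolved in this sketch


print(resolve_curie("dc:title", {}))
print(resolve_curie("dc:title", {"dc": "http://purl.org/dc/elements/1.1/"}))
```

The second call shows why explicit declarations are the safer choice: whatever the defaults say, a prefix attribute in the document fixes the meaning the author intended.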
Default prefixes are used as a mechanism to correct RDFa documents whose authors accidentally forgot to declare common prefixes. While authors may rely on these being available for RDFa documents, the prefixes may change over the course of 5-10 years, although the policy of W3C is that once a prefix is defined as part of a default profile, that particular prefix will not be changed or removed. Nevertheless, the best way to ensure that the prefixes document authors use always map to the author's intent is to use the prefix attribute to declare these prefixes.
Since default prefixes are meant to be a last-resort mechanism to help novice document authors, the markup above is not recommended. The rest of this document will utilize authoring best practices by declaring all prefixes in order to make the document author's intentions explicit.
As we have seen in the previous sections, RDFa Lite is fairly powerful; Alice could indeed express complex sets of structured information with it. However, there are cases where the attributes presented so far do not cover all needs, or where they make the resulting HTML structure awkward and possibly error-prone. In those cases, additional RDFa attributes may come to the rescue; some of them are presented in this section.
RDFa Lite does not define a separate class of RDFa processors. In other words, conforming RDFa processors are expected to handle all RDFa features, not only those used by RDFa Lite.
content attribute
When creating her blog, Alice decided to use this simple structure to add Dublin Core information to her blog post (see also Figure 2):
<html>
<head>
...
</head>
<body>
...
<h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
<p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
...
</body>
</html>
However, to do that, Alice had to accept a small compromise. Although the string "2011-09-10" unambiguously identifies a date for a machine, it does not look very natural to a human reader; a native English reader would surely prefer something like "10th of September, 2011". On the other hand, although a machine could of course parse and interpret that string as a date, too, doing so is clearly more complicated. The problem is that, by default, RDFa uses the textual content of the element as the property value. While this works well in most cases, it sometimes has awkward consequences, as in this example.
To alleviate this problem, RDFa makes it possible to reuse the content attribute of HTML. The blog entry could be written as follows:
<html>
<head>
...
</head>
<body>
...
<h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
<p>Date: <span property="http://purl.org/dc/terms/created" content="2011-09-10">10th of September, 2011</span></p>
...
</body>
</html>
The resulting structure is exactly the same as before (i.e., Figure 2). The difference is the presence of the content attribute. It instructs the RDFa processor to override the default behavior of using the textual content, and to use the value of the content attribute instead. Using this attribute, Alice can provide a more readable date while keeping an unambiguous value for machines that consume the structured data.
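The value-selection rule just described can be sketched in Python with the standard html.parser module. This is a minimal illustration, assuming a simplified processor that only looks at property and content. It ignores subjects, datatypes, and the other RDFa attributes. The names PropertyValues and property_values are hypothetical and not part of any RDFa library.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not kept on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PropertyValues(HTMLParser):
    """Collect (property, value) pairs, preferring content over text."""

    def __init__(self):
        super().__init__()
        self.results = []  # (property, value) pairs
        self.stack = []    # open elements as [tag, property or None, text parts]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is not None and attrs.get("content") is not None:
            # The content attribute overrides the element's text content.
            self.results.append((prop, attrs["content"]))
            prop = None  # the text of this element is no longer needed
        if tag not in VOID:
            self.stack.append([tag, prop, []])

    def handle_data(self, data):
        # Text counts towards every open element that encloses it.
        for entry in self.stack:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or all(entry[0] != tag for entry in self.stack):
            return
        while self.stack:
            open_tag, prop, parts = self.stack.pop()
            if prop is not None:
                # Default behavior: the value is the element's text content.
                self.results.append((prop, "".join(parts)))
            if open_tag == tag:
                break


def property_values(html):
    parser = PropertyValues()
    parser.feed(html)
    parser.close()
    return dict(parser.results)


DC = "http://purl.org/dc/terms/"
print(property_values(
    f'<p>Date: <span property="{DC}created" content="2011-09-10">'
    '10th of September, 2011</span></p>'))
```

With or without the human-readable text, the machine-readable value stays "2011-09-10"; only the source of that value changes.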
The content attribute has another important use. The "traditional" approach to adding simple metadata to a Web page has been to use the link and meta elements in the document header. Using link in RDFa Lite poses no problem, since it uses the href attribute and can therefore be used to define "flavored" links. However, in a conforming HTML document the meta element cannot have any text content. The only way to make such statements with meta in the header is therefore through the content attribute. For example, using the meta element is the approach suggested by Facebook for the Open Graph Protocol [ ogp ] vocabulary. If Alice wants to make use of the "Like" button in her posts, this is what she would add to her header:
<html>
<head prefix="og: http://ogp.me/ns#">
...
<meta property="og:title" content="The Trouble with Bob" />
<meta property="og:type" content="text" />
<meta property="og:image" content="http://example.com/alice/bob-ugly.jpg" />
...
</head>
<body>
...
</body>
</html>
In this example, the prefix for the Open Graph Protocol vocabulary is defined via the prefix attribute. Alas, many authors forget to do so. Fortunately, the og prefix is part of the initial context for RDFa, so the resulting structured data is the same even without the prefix declaration.
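In the header, a processor has no text content to fall back on, so the value must come from content. The Python sketch below works under simplifying assumptions. Prefix declarations are treated as document-wide rather than scoped to an element, and the initial context is cut down to the single og entry. It collects the Open Graph statements from meta elements and shows that the expanded property URLs are the same with or without the prefix attribute. MetaProperties and meta_pairs are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed one-entry excerpt of the RDFa initial context.
INITIAL_CONTEXT = {"og": "http://ogp.me/ns#"}


class MetaProperties(HTMLParser):
    """Collect (expanded property URL, content) pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}  # mappings declared through prefix attributes
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Simplification: declarations apply document-wide, not per element.
        tokens = (attrs.get("prefix") or "").split()
        for name, url in zip(tokens[0::2], tokens[1::2]):
            if name.endswith(":"):
                self.prefixes[name[:-1]] = url
        if tag == "meta" and attrs.get("property") and attrs.get("content") is not None:
            # meta is a void element: its value can only come from content.
            self.pairs.append((self.expand(attrs["property"]), attrs["content"]))

    def expand(self, curie):
        prefix, sep, reference = curie.partition(":")
        # Declared prefixes win; the initial context is only a fallback.
        base = self.prefixes.get(prefix) or INITIAL_CONTEXT.get(prefix)
        return base + reference if sep and base else curie


def meta_pairs(html):
    parser = MetaProperties()
    parser.feed(html)
    parser.close()
    return parser.pairs


declared = ('<head prefix="og: http://ogp.me/ns#">'
            '<meta property="og:title" content="The Trouble with Bob" /></head>')
undeclared = '<head><meta property="og:title" content="The Trouble with Bob" /></head>'
print(meta_pairs(declared) == meta_pairs(undeclared))  # True
```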
Alice has already put license information on her page:
<p>All content on this site is licensed under <a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
but she would like to complete this by recording the date of her copyright statement as structured data, too. She can use the date term of Dublin Core:
<p>All content on this site is licensed under
<a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/">
a Creative Commons License</a>.
©<span property="dc:date">2011</span> Alice Birpemswick.</p>
However, the value used for the date may be ambiguous for machines. Of course, if a program "knows" that http://purl.org/dc/terms/date refers to a date, it can find out that the string "2011" stands for a year. But some processors may, for example, provide a visual presentation of all the structured data on a specific page. Such a processor might want to use one "widget" to represent a year and another one to represent, say, an integer. How would it know which one to choose?
Alice may decide to be helpful by adding information to that item in the form of a datatype. This information can be conveyed to the RDFa processor using the datatype attribute, as follows:
<p>All content on this site is licensed under
<a property="http://creativecommons.org/ns#license" href="http://creativecommons.org/licenses/by/3.0/">
a Creative Commons License</a>.
©<span property="dc:date" datatype="xsd:gYear">2011</span> Alice Birpemswick.</p>
where xsd:gYear stands for http://www.w3.org/2001/XMLSchema#gYear. It is one of the standard datatypes defined by W3C's XML Schema Datatypes specification [ xmlschema11-2 ], which contains such types as booleans, integers, dates, and doubles. (xsd is one of the default prefixes for RDFa.)
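To see what the datatype adds, here is a minimal Python sketch. It serializes the resulting statement as an N-Triples line, where the typed literal is written "2011"^^<...#gYear>. It also shows how a hypothetical consumer could use the datatype to pick a display widget. The page URL, the function names, and the widget names are all assumptions made for illustration. Escaping is simplified to quotes and backslashes.

```python
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"  # the xsd default prefix


def expand_datatype(datatype: str) -> str:
    """Expand 'xsd:...' names; anything else is assumed to be a full URL."""
    return XSD + datatype[len("xsd:"):] if datatype.startswith("xsd:") else datatype


def literal_triple(subject: str, predicate: str, value: str,
                   datatype: Optional[str] = None) -> str:
    """Serialize one literal-valued statement as an N-Triples line."""
    # Simplified escaping: only backslashes and double quotes are handled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    obj = f'"{escaped}"'
    if datatype:
        obj += f"^^<{expand_datatype(datatype)}>"  # typed literal
    return f"<{subject}> <{predicate}> {obj} ."


# Hypothetical consumer: choose a display widget based on the datatype.
WIDGETS = {XSD + "gYear": "year picker", XSD + "integer": "number field"}


def choose_widget(datatype: Optional[str]) -> str:
    if not datatype:
        return "plain text"  # untyped literal: nothing more is known
    return WIDGETS.get(expand_datatype(datatype), "plain text")


page = "http://www.example.com/alice/"  # hypothetical page URL
print(literal_triple(page, "http://purl.org/dc/terms/date", "2011", "xsd:gYear"))
# <http://www.example.com/alice/> <http://purl.org/dc/terms/date> "2011"^^<http://www.w3.org/2001/XMLSchema#gYear> .
print(choose_widget("xsd:gYear"))  # year picker
```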
about
Alice has used the following patterns to define structured data for the individual blogs:
<div resource="/alice/posts/trouble_with_bob">
<h2 property="title">The trouble with Bob</h2>
<h3 property="creator" resource="#me">Alice</h3>
...
</div>
The role of the resource attribute on the div element is to set the "context", i.e., the subject of all the statements made by the elements nested inside it. Also, when combined with the property attribute on the same element, resource can be used to set the "target", i.e., the object of the statement (much like href).
This pattern is perfectly fine, but it may become too verbose in some cases. Suppose, for instance, that Alice would like to set up a separate index page for all her blog posts, and the only structured data she would like to put there is references to the titles. Following the same pattern, she would have to write something like:
<ul>
<li resource="/alice/posts/trouble_with_bob"><span property="title">The trouble with Bob</span></li>
<li resource="/alice/posts/jos_barbecue"><span property="title">Jo's Barbecue</span></li>
...
</ul>
This of course works, but it is a bit convoluted. Merging the information into one element, i.e.:
<ul resource="/alice/posts/trouble_with_bob">
<li resource="/alice/posts/trouble_with_bob" property="title">The trouble with Bob</li>
...
</ul>
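would not be correct. When property and resource appear on the same element, the value of resource becomes the object of the title statement. The processor would therefore generate a different statement than originally intended, and the title text would be ignored.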
RDFa introduces a separate attribute, called about, that can be used as an alternative to resource for setting the context. Using that attribute, Alice could write:
<ul>
<li about="/alice/posts/trouble_with_bob" property="title">The trouble with Bob</li>
<li about="/alice/posts/jos_barbecue" property="title">Jo's Barbecue</li>
...
</ul>
The fundamental difference between about and resource is that the former is only ever used to set the context, whether or not it is combined with the property attribute on the same element. This also means that, when the only goal is to set the context for nested elements, about and resource are interchangeable. For example, in her original blog item, Alice could have chosen to write:
<div about="/alice/posts/trouble_with_bob">
<h2 property="title">The trouble with Bob</h2>
<h3 property="creator" resource="#me">Alice</h3>
...
</div>
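The difference can be summarized in a small Python model. It is a deliberately simplified sketch of the rules described above, not the full RDFa processing algorithm. It only looks at about, resource, and property, and it does not model typeof, vocab expansion, or the resolution of relative URLs. IRIs are written in angle brackets and literals in quotes. The function names and the page URL /alice/posts/ are hypothetical.

```python
def new_context(context, attrs):
    """Subject inherited by nested elements (typeof is not modeled)."""
    if "about" in attrs:
        return attrs["about"]       # about always sets the context
    if "resource" in attrs and "property" not in attrs:
        return attrs["resource"]    # resource alone also sets it
    return context                  # otherwise the context is unchanged


def element_triples(context, attrs, text):
    """Triples produced by one element carrying property (simplified)."""
    prop = attrs.get("property")
    if prop is None:
        return []
    if "about" in attrs:
        # about only changes the subject; the value is still the text.
        return [(f"<{attrs['about']}>", prop, f'"{text}"')]
    if "resource" in attrs:
        # resource next to property becomes the object of the statement.
        return [(f"<{context}>", prop, f"<{attrs['resource']}>")]
    return [(f"<{context}>", prop, f'"{text}"')]


page = "/alice/posts/"  # hypothetical URL of the index page
post = "/alice/posts/trouble_with_bob"

# Merging resource and property on one element: the title becomes a URL.
print(element_triples(page, {"resource": post, "property": "title"},
                      "The trouble with Bob"))
# Using about instead: the post gets the intended literal title.
print(element_triples(page, {"about": post, "property": "title"},
                      "The trouble with Bob"))
# <div about=...> and <div resource=...> set the same context for <h2>.
for container in ({"about": post}, {"resource": post}):
    ctx = new_context(page, container)
    print(element_triples(ctx, {"property": "title"}, "The trouble with Bob"))
```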
rel
Another pattern that Alice used in her code is as follows:
<div vocab="http://xmlns.com/foaf/0.1/" resource="#me">
<ul>
<li property="knows" resource="http://example.com/bob/#me" typeof="Person">
<a property="homepage" href="http://example.com/bob/"><span property="name">Bob</span></a>
</li>
<li property="knows" resource="http://example.com/eve/#me" typeof="Person">
<a property="homepage" href="http://example.com/eve/"><span property="name">Eve</span></a>
</li>
<li property="knows" resource="http://example.com/manu/#me" typeof="Person">
<a property="homepage" href="http://example.com/manu/"><span property="name">Manu</span></a>
</li>
</ul>
</div>
Each "branch" in the list sets a separate object, here identified by URLs such as http://example.com/bob/#me. The same property (foaf:knows) is used to bind them all to the same context. The property="knows" attribute had to be repeated on each list element to define the corresponding property. If this structure is generated by a CMS, this is of course not a problem. However, if such a structure is authored manually, it is clearly error-prone: the property name can be misspelled or forgotten. Instead, Alice could use another RDFa attribute, namely rel. Using this attribute, the corresponding HTML would look like this:
<div vocab="http://xmlns.com/foaf/0.1/" resource="#me">
<ul rel="knows">
<li resource="http://example.com/bob/#me" typeof="Person">
<a property="homepage" href="http://example.com/bob/"><span property="name">Bob</span></a>
</li>
<li resource="http://example.com/eve/#me" typeof="Person">
<a property="homepage" href="http://example.com/eve/"><span property="name">Eve</span></a>
</li>
<li resource="http://example.com/manu/#me" typeof="Person">
<a property="homepage" href="http://example.com/manu/"><span property="name">Manu</span></a>
</li>
</ul>
</div>
In contrast to property, rel never considers the textual content of an element (or the value of the content attribute). Instead, if no explicit target has been specified for a link via, e.g., a resource or an href attribute, the processor goes “down” the hierarchy, finds one or more targets there, and uses those. This is what happens in this case. The rel="knows" attribute on the ul element does not include any target, so the processor finds the targets in the individual li elements and uses those. This pattern is typical for the usage of rel.
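The “going down” behavior can be sketched as a recursive walk. The following Python is a minimal model under strong assumptions. An element is reduced to a hypothetical Element record holding only resource, rel, and children; typeof, property, href, and vocab expansion are ignored. A rel without a target on its own element simply waits for the nearest descendants that carry a resource. The sketch shows the idea only and does not reproduce the full RDFa algorithm. The document URL passed as the starting subject is also hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Element:
    """Hypothetical element model: only the attributes this sketch needs."""
    resource: Optional[str] = None
    rel: Optional[str] = None
    children: List["Element"] = field(default_factory=list)


def rel_triples(el: Element, subject: str,
                pending: Optional[str] = None) -> List[Triple]:
    """Collect the triples produced by rel, including 'hanging' rels."""
    triples: List[Triple] = []
    if el.resource is not None:
        if pending is not None:
            # A descendant with a target completes the hanging rel.
            triples.append((subject, pending, el.resource))
            pending = None
        if el.rel is not None:
            # rel and resource on the same element: an ordinary link.
            triples.append((subject, el.rel, el.resource))
        subject = el.resource  # nested elements now describe this resource
    elif el.rel is not None:
        pending = el.rel       # no target here: look further down
    for child in el.children:
        triples.extend(rel_triples(child, subject, pending))
    return triples


# Alice's list: <div resource="#me"><ul rel="knows"><li resource="..."> ...
friends = ["http://example.com/bob/#me", "http://example.com/eve/#me",
           "http://example.com/manu/#me"]
page = Element(resource="#me", children=[
    Element(rel="knows", children=[Element(resource=f) for f in friends])])
for triple in rel_triples(page, "http://example.com/alice/"):
    print(triple)
```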
In many situations, property and rel are interchangeable when the intended structured data involves (flavored) links. There are, however, subtle differences that call for care, involving for example “chaining”. The interested reader should consult the relevant section of the RDFa 1.1 specification for further details. In general, using property is advised whenever possible.
RDFa benefits from the power of RDF [ rdf11-primer ], the W3C's standard for interoperable machine-readable data. Although readers of this document are not expected to understand RDF, some may be interested in how these two specifications interrelate.
RDF, the Resource Description Framework, is the abstract data representation we have drawn out as graphs in the examples above. Each arrow in the graph is represented as a subject-property-object triple: the subject is the node at the start of the arrow, the property is the arrow itself, and the object is the node or literal at the end of the arrow. A set of such RDF triples is often called an "RDF graph", and is typically stored in what is known as a "Triple Store" or a "Graph Store".
Consider the first example graph:
Using the Turtle syntax [ turtle ] for RDF, the two RDF triples for this graph are written as follows:
<http://www.example.com/alice/posts/trouble_with_bob>
<http://purl.org/dc/terms/title> "The Trouble with Bob" ;
<http://purl.org/dc/terms/created> "2011-09-10" .
The TYPE arrows we drew are no different from other arrows. The TYPE is just another property that happens to be a core RDF property, namely rdf:type. The rdf vocabulary is located at http://www.w3.org/1999/02/22-rdf-syntax-ns#.
The contact information example from above should thus be diagrammed as:
The point of RDF is to provide a universal language for expressing data and relationships. A unit of data can have any number of properties that are expressed as URLs. These URLs can be reused by any publisher, much like any web publisher can link to any web page, even ones they did not create themselves. Using data in the form of RDF triples, collected from various locations, and also using the RDF query language SPARQL [ sparql11-query ], one can search for "friends of Alice's who created items whose title contains the word 'Bob'," whether those items are blog posts, videos, calendar events, or other data types.
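SPARQL is the proper tool for such a query. Purely to illustrate the underlying idea, the Python sketch below stores a handful of triples as (subject, property, object) tuples and answers the "friends of Alice" question by joining them by hand. The data, the example.com URLs, and the function names are hypothetical. Literals and URLs are both plain strings here, and a roughly equivalent SPARQL pattern is given in the docstring.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/terms/"
ALICE = "http://example.com/alice/#me"
BOB = "http://example.com/bob/#me"
EVE = "http://example.com/eve/#me"

# Hypothetical triples, as if collected from several different sites.
TRIPLES = {
    (ALICE, FOAF + "knows", BOB),
    (ALICE, FOAF + "knows", EVE),
    ("http://example.com/eve/posts/1", DC + "creator", EVE),
    ("http://example.com/eve/posts/1", DC + "title", "Why Bob is always late"),
    ("http://example.com/bob/videos/7", DC + "creator", BOB),
    ("http://example.com/bob/videos/7", DC + "title", "Holiday in Spain"),
}


def objects(subject, prop):
    return {o for (s, p, o) in TRIPLES if s == subject and p == prop}


def subjects(prop, obj):
    return {s for (s, p, o) in TRIPLES if p == prop and o == obj}


def friends_with_matching_items(person, word):
    """Roughly: SELECT DISTINCT ?friend WHERE { <person> foaf:knows ?friend .
    ?item dc:creator ?friend ; dc:title ?title . FILTER(CONTAINS(?title, word)) }"""
    found = set()
    for friend in objects(person, FOAF + "knows"):
        for item in subjects(DC + "creator", friend):
            if any(word in title for title in objects(item, DC + "title")):
                found.add(friend)
    return found


print(friends_with_matching_items(ALICE, "Bob"))  # {'http://example.com/eve/#me'}
```

The query does not care whether an item is a blog post or a video; only the shared properties matter.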
RDF is an abstract data model meant to maximize the reuse of vocabularies. RDFa is a way to express RDF data within HTML in machine-readable form, reusing the existing human-readable data in the document.
As Alice marks up her page with RDFa, she may discover the need to express data, such as her favorite photos, that is not covered by existing vocabularies. If she needs to, Alice can create a custom vocabulary suited for her needs. Once a vocabulary is created, it can be used in RDFa markup like any other vocabulary.
The instructions on how to create a vocabulary, also known as an RDF Schema, are available in the RDF Primer [ rdf11-primer ].
At a high level, the creation of a vocabulary for RDFa involves:
1. Selecting a URL where the vocabulary will reside, e.g., http://example.com/photos/vocab#.
2. Publishing the vocabulary at that URL, defining its terms; for example, the classes Photo and Camera, as well as the property takenWith that relates a photo to the camera with which it was taken.
3. Using the vocabulary in RDFa markup, either with the vocab attribute or with the prefix declaration mechanism. For example: prefix="photo: http://example.com/photos/vocab#" and typeof="photo:Camera".
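As a rough illustration of these three steps, the Python sketch below does three things. It fixes the vocabulary URL from the example, writes out the term definitions as a small RDF Schema in Turtle, and checks that a CURIE such as photo:Camera, used with the prefix declaration above, resolves to a term the vocabulary actually defines. The rdfs:domain and rdfs:range statements for takenWith, as well as the function names, are assumptions for illustration. They are not prescribed by this document.

```python
from typing import Optional

VOCAB = "http://example.com/photos/vocab#"  # step 1: the vocabulary URL

# Step 2: term definitions (domain/range for takenWith are an assumption).
TERMS = {
    "Photo": ["a rdfs:Class"],
    "Camera": ["a rdfs:Class"],
    "takenWith": ["a rdf:Property",
                  f"rdfs:domain <{VOCAB}Photo>",
                  f"rdfs:range <{VOCAB}Camera>"],
}


def vocabulary_turtle() -> str:
    """Render the vocabulary as a small RDF Schema document in Turtle."""
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    ]
    for name, statements in TERMS.items():
        lines.append(f"<{VOCAB}{name}> " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


def resolve_vocab_term(curie: str, prefix_attr: str) -> Optional[str]:
    """Step 3: expand a CURIE and confirm the term exists in the vocabulary."""
    tokens = prefix_attr.split()
    prefixes = {name[:-1]: url for name, url in zip(tokens[0::2], tokens[1::2])
                if name.endswith(":")}
    prefix, _, term = curie.partition(":")
    if prefixes.get(prefix) == VOCAB and term in TERMS:
        return VOCAB + term
    return None  # unknown prefix or a term the vocabulary does not define


print(vocabulary_turtle())
print(resolve_vocab_term("photo:Camera", "photo: http://example.com/photos/vocab#"))
```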
It is worth noting that anyone who can publish a document on the Web can publish a vocabulary and thus define new data fields they may wish to express. RDF and RDFa allow fully distributed extensibility of vocabularies.
There is a wide variety of tools that can be used to generate or process RDFa data. Good sources for these are the RDFa page of the W3C Semantic Web Wiki and the RDFa community site’s implementation page. Care should be taken with the wiki, as some tools listed there may relate to a previous version of RDFa. Both sources are constantly evolving. The latter is part of a more general community page that contains further examples of using RDFa, general information, and information on how to get involved. In particular, RDFa fragments can be tested using the real-time RDFa 1.1 editor, which can also display a visual representation of the underlying structured data.
At the time of publication, the active members of the RDF Web Applications Working Group were:
Thanks also to Grant Robertson and Guus Schreiber who, though not part of the Working Group, have provided useful comments on earlier drafts of this note.