Linked Data for Librarians

by Seth van Hooland and Ruben Verborgh

Part 1 – Module 3: Possibilities and limitations of RDF

Institute of Museum and Library Services Drexel University College of Computing & Informatics

Tim Berners-Lee proposed
4 principles to publish Linked Data.

  1. Use URIs as names for things.
  2. Use HTTP URIs so people
    can look up those names.
  3. When someone looks up a URI,
    provide useful information using the standards.
  4. Include links to other things,
    so people can discover more.

(Non-)information resources
should be uniquely identifiable.

The URL is part of a broader family
of technologies related to identification.

The broadest member of that family is the IRI,
which also supports non-ASCII characters.

An HTTP URL identifies and locates a resource anywhere in the universe.

Using HTTP URLs ensures that
anybody can look up the resource.

Dereferencing a URI should lead to
useful information about that resource.
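
As a minimal sketch of dereferencing, a client can ask for RDF instead of an HTML page through the HTTP Accept header. This assumes the server supports content negotiation and can return Turtle, which not every server does. The function names `build_rdf_request` and `dereference` are hypothetical, chosen for this illustration.

```python
from urllib.request import Request, urlopen

def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET request that asks for an RDF serialization."""
    return Request(uri, headers={"Accept": media_type})

def dereference(uri: str) -> str:
    """Follow the URI and return the response body as text (needs network access)."""
    with urlopen(build_rdf_request(uri)) as response:
        return response.read().decode("utf-8")

# Building the request does not touch the network; calling dereference() does.
request = build_rdf_request("http://dbpedia.org/resource/Pablo_Picasso")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `dereference(...)` on the same URI would perform the actual lookup; what comes back depends on the server.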

By including links to other resources,
we create a Web of Data.

An immense amount of Linked Data
is available on the Web for reuse.

No Linked Data set is ever complete.
We make the open-world assumption.
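
The difference between a closed-world and an open-world reading can be sketched in a few lines. This is only an illustration: the `ex:` identifiers are made up, and `None` is an arbitrary choice to stand for "unknown".

```python
# A tiny, deliberately incomplete dataset of (subject, predicate, object) triples.
# All ex: names are hypothetical identifiers used for illustration only.
triples = {
    ("ex:Guernica", "ex:creator", "ex:Pablo_Picasso"),
}

def closed_world(triple):
    """Closed-world reading: anything that is not stated is false."""
    return triple in triples

def open_world(triple):
    """Open-world reading: anything that is not stated is unknown, not false."""
    return True if triple in triples else None  # None stands for "unknown"

stated = ("ex:Guernica", "ex:creator", "ex:Pablo_Picasso")
missing = ("ex:Guernica", "ex:location", "ex:Museo_Reina_Sofia")
print(closed_world(missing))  # False
print(open_world(missing))    # None
```

Under the open-world assumption, the missing location triple simply has not been published yet; another dataset may supply it.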

The Dublin Core Metadata Element Set defines
15 common metadata properties.

Schema.org is a single vocabulary
that covers many different fields.

RDF Schema is an RDF vocabulary
to model other RDF vocabularies.
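
To give a feel for what modelling a vocabulary with RDFS makes possible, the sketch below applies one RDFS rule by hand: an instance of a class is also an instance of its superclasses. It is a simplified illustration, not a full RDFS reasoner, and the `ex:` classes are hypothetical.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# A made-up vocabulary (first two triples) and one instance (last triple).
triples = {
    ("ex:Painter", SUBCLASS_OF, "ex:Artist"),
    ("ex:Artist", SUBCLASS_OF, "ex:Person"),
    ("ex:picasso", TYPE, "ex:Painter"),
}

def superclasses(cls, data):
    """Return every class reachable from cls through rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in data:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found

def infer_types(data):
    """Add the rdf:type triples that follow from the subclass hierarchy."""
    inferred = set(data)
    for s, p, o in data:
        if p == TYPE:
            inferred |= {(s, TYPE, parent) for parent in superclasses(o, data)}
    return inferred

closure = infer_types(triples)
print(("ex:picasso", TYPE, "ex:Person") in closure)  # True
```

The data never states that `ex:picasso` is a person; that triple follows from the vocabulary alone.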

Practitioners in the RDF world often
refer to vocabularies as ontologies.

The Web Ontology Language (OWL) provides concepts for ontologies.

SPARQL Protocol and RDF Query Language: query and update RDF datasets.
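
The protocol half of SPARQL describes how a query travels over HTTP. As a rough sketch, a query can be sent with GET by URL-encoding it into a `query` parameter. The helper name `build_sparql_request` is hypothetical, and the DBpedia endpoint URL and the JSON results media type are assumptions about what a given server offers.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Encode the query as a `query` parameter on the endpoint URL (HTTP GET)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL results in JSON; a server may offer other formats as well.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

# Assumed endpoint; building the request does not contact the server.
query = "ASK { <http://dbpedia.org/resource/Pablo_Picasso> ?p ?o }"
request = build_sparql_request("https://dbpedia.org/sparql", query)
print(request.full_url)
```

Passing the resulting request to `urllib.request.urlopen` would execute the query against the endpoint.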

The SPARQL language defines
the forms a query can take.

There are currently 4 read-only query forms:

  • SELECT: find values that satisfy conditions
  • CONSTRUCT: create triples that satisfy conditions
  • ASK: check whether data exists
  • DESCRIBE: show information about a resource

The main unit of a SPARQL query
is a Basic Graph Pattern (BGP).
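
A BGP is a set of triple patterns that must all match at the same time, with shared variables tying the patterns together. The sketch below shows the idea on a handful of in-memory triples. It is a naive illustration rather than how a real SPARQL engine is implemented, and the `ex:` data is made up.

```python
def is_variable(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Try to extend a binding so that the pattern equals the triple."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None  # variable is already bound to another value
        elif p_term != t_term:
            return None      # constant does not match
    return result

def evaluate_bgp(patterns, data):
    """Return all bindings that satisfy every triple pattern at once."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Hypothetical data, for illustration only.
data = [
    ("ex:pollock", "rdf:type", "ex:Artist"),
    ("ex:pollock", "ex:influencedBy", "ex:picasso"),
    ("ex:einstein", "rdf:type", "ex:Scientist"),
]
bgp = [
    ("?person", "rdf:type", "ex:Artist"),
    ("?person", "ex:influencedBy", "ex:picasso"),
]
print(evaluate_bgp(bgp, data))  # [{'?person': 'ex:pollock'}]
```

Because `?person` appears in both patterns, only a resource that satisfies both ends up in the result.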

This query searches DBpedia for artists influenced by Picasso.

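PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?person ?personLabel WHERE {
  ?person a dbo:Artist.                        # the person is an artist
  ?person foaf:name ?personLabel.              # retrieve a display name
  ?person dbo:influencedBy dbr:Pablo_Picasso.  # the person was influenced by Picasso
}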

Here is the live result of that query.

This query searches Wikidata for artists influenced by Picasso.

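PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
SELECT DISTINCT ?person ?personLabel WHERE {
  ?person wdt:P106/wdt:P279* wd:Q483501.  # occupation is artist, or any subclass of artist
  ?person wdt:P737 wd:Q5593.              # influenced by Pablo Picasso
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }  # fills in ?personLabel
}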

Here is the live result of that query.

Why are the results
and the queries different?

[cars in rainbow colors]
by PictureWendy, CC BY-NC 2.0

Heterogeneity exists on multiple levels
across metadata collections.

Heterogeneity is our best friend
and our largest enemy.

[sad and smiley face]
Kunsthal by PictureWendy, CC BY-NC 2.0

Standardization and agreement
have provided us with foundations.

The current level of standardization
still leaves some areas uncovered.

Which vocabularies should we use
to describe our metadata, and how?

Web APIs are the Achilles’ heel
of interoperability on the Web.

See more in the REST module.

Self-assessment 1: HTTP URLs

Why are HTTP URLs important for Linked Data?

  1. HTTP URLs are not important for Linked Data.
    • No. While not necessary for the RDF data model itself,
      the Linked Data principles mandate HTTP URLs.
  2. Because they guarantee consistent semantics.
    • No. Semantic consistency comes from the reuse of unique concept identifiers (URIs), not specifically from HTTP URLs.
  3. So we can look up (un)known concepts.
    • Yes. HTTP URLs can be dereferenced: to obtain more data about a concept, follow its URL.

Self-assessment 2: OWL and RDFS

Which of the following propositions are true?

  1. RDFS and OWL are an answer to Schema.org.
    • No: Schema.org is (mainly) a vocabulary with terms to describe concrete things, such as books, people, articles, … RDFS and OWL contain terms to model other ontologies or vocabularies (such as Schema.org).
  2. OWL replaces RDFS.
    • No: RDFS is still required to express basic ontological relations. As such, RDFS and OWL are often used side by side.
  3. OWL extends RDFS.
    • Yes: OWL extends RDFS with more advanced ontological concepts.

Self-assessment 3: SPARQL

What is SPARQL?

  1. A data model.
    • No. The data model is RDF.
  2. A query language.
    • Yes, SPARQL is a query language for RDF.
  3. A protocol.
    • Yes, SPARQL is a protocol to execute SPARQL queries over HTTP.

Self-assessment 4: SPARQL queries

Does the same SPARQL query return the same results
on different sources about the same metadata?

  1. In theory, but not in practice.
    • Yes: in theory, SPARQL queries should be interoperable across data sources. Even if different sources use different ontologies, reasoning can bridge the gap.
  2. In practice, but not in theory.
    • No: in practice, datasets use different ontologies and most endpoints do not have reasoning enabled to bridge the gap.
  3. In theory and in practice.
    • No: in practice, datasets use different ontologies and most endpoints do not have reasoning enabled to bridge the gap.