When someone looks up a URI, provide useful information using standards.
Include links to other URIs so people can discover more.
Controlled vocabularies can play a pivotal role in addressing these two principles.
Role of Controlled Vocabularies
Subset of natural language
Created to avoid the problems that arise when natural language is used
for indexing and retrieval
Improving precision and recall
By providing synonymy control, controlled vocabularies improve recall,
which is the proportion of the relevant documents that were successfully
retrieved. Greater precision, the proportion of retrieved documents that are
relevant to the search, is achieved through the control of polysemy.
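The two measures follow directly from their definitions. The sketch below computes them for a tiny, invented collection (all document IDs, keywords, and the synonym mapping are hypothetical) and models synonymy control as mapping every synonym to one preferred term before matching. Only the synonymy/recall half is illustrated; polysemy control would be shown by a query whose free-text match also pulls in irrelevant documents.

```python
# Hypothetical mini-collection: each document was indexed with one free-text keyword.
index = {"d1": "automobiles", "d2": "cars", "d3": "motorcars", "d4": "railroads"}
relevant = {"d1", "d2", "d3"}  # the documents that are actually about cars

# Hypothetical controlled vocabulary: every synonym maps to one preferred term.
preferred = {
    "automobiles": "automobiles",
    "cars": "automobiles",
    "motorcars": "automobiles",
    "railroads": "railroads",
}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def search_free_text(term):
    # Exact string match on the keyword as entered by the indexer.
    return {doc for doc, kw in index.items() if kw == term}


def search_controlled(term):
    # Map both the query and the index keywords to their preferred term first.
    target = preferred.get(term, term)
    return {doc for doc, kw in index.items() if preferred.get(kw, kw) == target}


print(precision_recall(search_free_text("cars"), relevant))   # precision 1.0, recall 1/3
print(precision_recall(search_controlled("cars"), relevant))  # (1.0, 1.0)
```

Without synonymy control, the query "cars" misses the documents indexed under "automobiles" and "motorcars"; with it, all three relevant documents are retrieved.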
Come in different flavours
Within the library and information science (LIS) community, we can distinguish
three different types of controlled vocabularies:
OpenRefine cannot make a choice if a value matches different headings.
If one concept uses a label as its preferred term, and another uses
the same label to designate a non-preferred term, OpenRefine cannot choose
between them. For example, Skating is an alternative label of the term with
the preferred label Ice skating (sj96005713), but a separate term with the
preferred label Skating (sj85123105) also exists!
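A minimal sketch of why the clash blocks an automatic choice, assuming a simple case-insensitive label lookup over preferred and alternative labels. The two identifiers come from the example above; the record layout and the `candidates` helper are hypothetical simplifications, not OpenRefine's matching logic.

```python
# Hypothetical, simplified excerpt of LCSH-like data (not the full records).
concepts = {
    "sj96005713": {"pref": "Ice skating", "alt": ["Skating"]},
    "sj85123105": {"pref": "Skating", "alt": []},
}


def candidates(label):
    """Return every concept whose preferred or alternative label matches (case-insensitive)."""
    needle = label.casefold()
    matches = []
    for concept_id, labels in concepts.items():
        if labels["pref"].casefold() == needle:
            matches.append((concept_id, "preferred"))
        elif any(alt.casefold() == needle for alt in labels["alt"]):
            matches.append((concept_id, "alternative"))
    return matches


print(candidates("skating"))
# [('sj96005713', 'alternative'), ('sj85123105', 'preferred')]
```

Both candidates match the string exactly, so the label alone gives a matcher no evidence to prefer one over the other.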
Preprocessing of the LCSH
Some changes were made in our version of the LCSH:
Subdivisions are only present if they do not conflict with
an existing main heading with the same label.
Alternate labels were only added to the extent that they do not
cause clashes with other labels.
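The two preprocessing rules can be sketched as a filter over the vocabulary. This is a hypothetical illustration, not the script used to prepare the preprocessed LCSH: the record layout (`kind`, `pref`, `alt`), the identifiers `sub-history` and `main-history`, the alternative label `Ice-skating`, and the case-insensitive comparison are all assumptions. A "clash" is taken to mean that another concept uses the same string as a preferred or alternative label.

```python
from collections import Counter

# Hypothetical input: "kind" is either "main" or "subdivision".
records = [
    {"id": "sj96005713", "kind": "main", "pref": "Ice skating", "alt": ["Skating", "Ice-skating"]},
    {"id": "sj85123105", "kind": "main", "pref": "Skating", "alt": []},
    {"id": "sub-history", "kind": "subdivision", "pref": "History", "alt": []},
    {"id": "main-history", "kind": "main", "pref": "History", "alt": []},
]


def preprocess(records):
    key = str.casefold
    main_labels = {key(r["pref"]) for r in records if r["kind"] == "main"}

    # Rule 1: keep a subdivision only if no main heading has the same label.
    kept = [r for r in records if r["kind"] == "main" or key(r["pref"]) not in main_labels]

    # Count how many kept concepts use each label (preferred or alternative);
    # a set per concept avoids counting a concept twice for the same string.
    usage = Counter()
    for r in kept:
        for label in {key(r["pref"]), *map(key, r["alt"])}:
            usage[label] += 1

    # Rule 2: keep an alternative label only if no other concept uses the same string.
    return [{**r, "alt": [a for a in r["alt"] if usage[key(a)] == 1]} for r in kept]


for r in preprocess(records):
    print(r["id"], r["alt"])
```

The subdivision History is dropped because a main heading with that label exists, and Skating is removed as an alternative label of Ice skating because it is already the preferred label of sj85123105.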
Configuring LCSH as reconciliation source
Click the RDF button, then
select Add reconciliation service, Based on SPARQL endpoint.
Start the reconciliation process for the Categories column with
this new endpoint (Reconcile, Start reconciling, LCSH (preprocessed)).
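Behind a SPARQL-based reconciliation service, each cell value ultimately has to be looked up among the labels in the graph. The function below builds one such lookup query as a hypothetical illustration; it is not the query the RDF extension actually sends, and the choice of `skos:prefLabel` and `skos:altLabel` as label properties is an assumption. Matching on `skos:altLabel` is what lets non-preferred terms contribute to the match, and also what produces the clashes described above.

```python
def label_lookup_query(term: str, limit: int = 10) -> str:
    """Build a SPARQL 1.1 query matching a term against SKOS preferred and alternative labels.

    Hypothetical illustration only; not the query generated by the OpenRefine RDF extension.
    """
    # Escape backslashes, double quotes and newlines so the term is a valid string literal.
    literal = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?labelProp ?label WHERE {{
  VALUES ?labelProp {{ skos:prefLabel skos:altLabel }}
  ?concept ?labelProp ?label .
  FILTER (LCASE(STR(?label)) = LCASE("{literal}"))
}}
LIMIT {limit}"""


print(label_lookup_query("Skating"))
```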
Important: Experiment first with a very small subset of the records
(fewer than 100). The process is slow: if you launch it on the entire
data set, it will take at least a day on your laptop.
For example, create a text filter on the Record ID column with the value 123,
so that you get results within a couple of seconds.
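A text filter keeps only the rows whose Record ID contains the filter string, which is why it shrinks the data set so drastically. Assuming, purely for illustration, sequential numeric record IDs from 1 to 9,999 (hypothetical; the IDs in your data set may differ), and modelling the filter as a plain substring test:

```python
# Hypothetical record IDs: 1 .. 9999 as strings.
record_ids = [str(n) for n in range(1, 10_000)]

# Substring match, modelling a plain (non-regex) text filter.
subset = [rid for rid in record_ids if "123" in rid]

print(len(subset))  # 20
print(subset[:5])   # ['123', '1123', '1230', '1231', '1232']
```

Only 20 of the 9,999 hypothetical records survive, which is small enough to reconcile in seconds.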
Impact of reconciliation
Creating a link between your catalog record and an entry of the LCSH
unfortunately does not automatically connect you to all
other records that link to that heading!
Always keep in mind that links are unidirectional: you point to the LCSH,
but the Library of Congress (LoC) is unaware that you point to it.
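The sketch below models this with plain Python dictionaries; the record identifiers and the `lcsh:` prefix are hypothetical. Each record stores only its outgoing links, and the heading itself carries no list of who points at it. Finding all records that point to a heading requires collecting every record in one place and inverting the links, much as loading all the data into a single graph and querying it with SPARQL would let you do.

```python
from collections import defaultdict

# Hypothetical catalog records, each published separately by a different library.
# Every record only knows its own outgoing links.
record_links = {
    "libA:rec1": ["lcsh:sj85123105"],
    "libB:rec7": ["lcsh:sj85123105"],
    "libC:rec3": ["lcsh:sj96005713"],
}


def backlinks(all_records):
    """Invert outgoing links into an index of incoming links.

    Only possible after all records have been gathered in one place.
    """
    incoming = defaultdict(list)
    for source, targets in all_records.items():
        for target in targets:
            incoming[target].append(source)
    return incoming


incoming = backlinks(record_links)
print(incoming["lcsh:sj85123105"])  # ['libA:rec1', 'libB:rec7']
```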
I wanted the act of adding a link to be trivial. So long
as I didn’t introduce some central link database, everything
would scale nicely.
The unidirectionality of links was an explicit design choice. Asking
the linked entity to confirm the link would create too much of a bottleneck
for the Web to grow. Imagine someone at LoC whose job it is to check each link created to the LCSH…
Alternative designs with bidirectional links exist, such as
Xanadu by Ted Nelson.
Self-assessment 1: thesauri
What key aspect distinguishes thesauri from other forms of controlled vocabularies?
A formal standard exists to verify their well-formedness.
Yes, the ISO 25964 standard defines exactly how a thesaurus should be constructed.
A thesaurus provides description at a more granular level.
No, this does not depend on the type of vocabulary.
Thesauri can be represented in SKOS.
No. Thesauri can indeed be represented in SKOS, but so can other types of vocabularies, as illustrated by the LCSH.
Self-assessment 2: non-preferred terms
Why add non-preferred terms to a vocabulary?
It reduces the negative effect of synonymy on search results.
Yes! Even if an end user searches with a synonym encoded as a non-preferred term, rather than with the preferred term used for indexing, the results are the same.
It reduces the negative effect of polysemy on search results.
No! Adding too many, potentially even irrelevant, non-preferred terms will increase the negative impact of polysemy.
You can increase the success rate of the reconciliation process.
Yes! That is, if you have configured the process to include the non-preferred terms.
Self-assessment 3: labels and concepts
How do labels and concepts relate to each other in a SKOS vocabulary?
Labels allow defining the structure whereas concepts can express the specific terms used.
No! It is the other way around: the structure is defined between concepts, while labels express the specific terms used.
Labels are used to express semantic relations between concepts.
No, semantic relations are expressed by using properties such as broader or narrower.
Concepts are abstract units of thought; labels are strings of characters associated with a concept.
Self-assessment 4: unidirectionality
Why is it important to acknowledge unidirectionality when creating URLs?
It explains why we don’t need SPARQL.
No, it’s the opposite! SPARQL is precisely what allows us to traverse links in both directions across a graph.
It helps us understand why it isn’t straightforward to connect all the records that point to a central authority file.
Exactly! Linking to the LCSH does not make the LCSH, or other people referring to the same heading, aware of your link.
In order to prevent the creation of dead links.
No, but understanding unidirectionality helps us to realize why dead links are unavoidable.