These days taxonomy management systems, such as Graphologi, use a set of standards to store data. These are SKOS, SKOS-XL and OWL.
If you haven't heard of SKOS take a look at this tutorial:
Getting Started with Standards In Graphologi
Behind the standards is RDF (Resource Description Framework). RDF data is stored in a ‘triple store’. Triple stores can almost always be queried using a language called SPARQL. SPARQL is a very powerful language, but can be a bit intimidating to start with.
With regard to taxonomies, it can be very useful to query the data behind them with SPARQL, for various reasons. One is integration, as SPARQL gives the most flexibility in asking for exactly the data that an integration requires. Another relates to a feature in Graphologi - reporting - which is based upon SPARQL queries. If you can write some SPARQL you can create all sorts of reports for different purposes - additional business rules, quality checks, or management reporting.
This tutorial covers some of the basics of SPARQL that will be helpful in writing queries aimed at generating reports in Graphologi, but the principles are applicable more widely.
Within Graphologi you can create and run SPARQL queries using the SPARQL dashboard.
What does a SPARQL query look like? Here is a very basic one:
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT * WHERE {
      ?s a skos:Concept .
    }
    LIMIT 10
What is going on here? This query is saying ‘give me back 10 SKOS concepts’ or, to be more precise, the IRIs of those concepts.
There are several key aspects:
The first line is a prefix statement. This simply allows you to write queries in a more compact way. Without prefixes you have to enter full IRIs, which is time-consuming and less readable. Here we have a prefix for the SKOS namespace - skos - and you can see we have used it on line 3. Graphologi automatically detects well-known prefixes and adds the PREFIX line for you - just start typing skos: and the rest happens - meaning you don’t have to remember them.
The second line defines the type of query and the results we want back. SELECT is the standard query for most purposes. SELECT * means ‘give me back the value of every variable used in the query’. Often you don’t want to do that, but for this simple example it’s fine.
The third line is the interesting bit! This is saying ‘find all things that are concepts’. ?s is a variable that will be populated with results. The a in the middle is shorthand for rdf:type (‘has the type’) and skos:Concept uses our prefix to say SKOS concepts. Note the period (full stop) at the end of the line. This is important when you have multiple lines in this part of the query.
The last line restricts the query to 10 results. It is always a good idea, certainly when first writing a query, to put a LIMIT on it to avoid queries running amok (which can happen).
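To see what the prefix and the a shorthand save you, here is a sketch of the same query written out in full. It assumes nothing beyond standard SPARQL: a stands for the rdf:type IRI, and skos:Concept expands to the namespace IRI followed by Concept.

```sparql
# Same query as above, with no PREFIX line and no shorthand.
SELECT * WHERE {
  ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
}
LIMIT 10
```

Both forms should return the same results. The prefixed version is simply easier to read and type.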
Let’s write something a bit more useful…
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT * WHERE {
      ?s a skos:Concept .
      ?s skos:prefLabel ?label .
      FILTER (LANG(?label) = 'en')
    }
    ORDER BY ?label
    LIMIT 10
This query is asking for 10 concepts, where each result includes the IRI and the preferred label in English, and the results are ordered by label.
To explain some of the additional parts this query contains:
Lines 3 and 4 contain what is called a ‘join’. This happens because they both use the variable ?s. What this means is that results must have both a type of skos:Concept and a preferred label - skos:prefLabel is the property for that.
Generally the lines with variables form one of a few patterns, e.g. variable name -> property -> variable name, or variable name -> property -> value.
The FILTER line is saying ‘keep only the labels that have a language code of en’.
The ORDER BY line is saying ‘order by the values in the ?label variable’ (i.e. order by preferred label).
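The language filter above matches the tag en exactly. If your labels might carry regional tags such as en-GB or en-US (an assumption about your data, not something every taxonomy has), a sketch using the standard LANGMATCHES function would match those too:

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT * WHERE {
  ?s a skos:Concept .
  ?s skos:prefLabel ?label .
  # Matches 'en' and any regional variant such as 'en-GB'.
  FILTER (LANGMATCHES(LANG(?label), 'en'))
}
ORDER BY ?label
LIMIT 10
```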
One of the issues with the last query is that it will return concepts from all taxonomies (concept schemes). Most of the time we probably want to narrow it down to a single taxonomy. So let’s do that with the next query.
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT * WHERE {
      ?s a skos:Concept .
      ?s skos:inScheme <http://skos.um.es/unescothes/CS000> .
      ?s skos:prefLabel ?label .
      FILTER (LANG(?label) = 'en')
    }
    ORDER BY ?label
    LIMIT 10
In the above query we have added another ‘graph pattern’. This introduces another useful property - skos:inScheme - which every concept that belongs to a taxonomy will have. In this example our query uses the UNESCO thesaurus. The IRI in angle brackets on the skos:inScheme line is the IRI of the UNESCO thesaurus. The IRI of a taxonomy is the identifier for it. It is also the IRI you enter when creating a taxonomy in Graphologi and can be found on the taxonomy’s details screen.
If you’re wondering why the property is called skos:inScheme, it is because SKOS uses the term ‘concept scheme’ as the technical name for what we call a taxonomy. Concept schemes have a type, just as concepts do, but for concept schemes it is skos:ConceptScheme.
If you want to retrieve all the taxonomies in a project in Graphologi you need a query similar to the following:
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT * WHERE {
      ?s a skos:ConceptScheme .
      ?s dct:title ?title .
    }
    ORDER BY ?title
    LIMIT 10
In the above we have a new prefix, whilst line 4 contains the pattern looking for concept schemes and line 5 uses a new property - dct:title - which, in Graphologi, is the property used for taxonomy titles. The dct namespace is Dublin Core Terms - a very widely used metadata model.
One other thing to note about this query is that we have taken the language filter off, so this will return all the titles for each taxonomy.
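If you only want one title per taxonomy, you can put the language filter back, this time on ?title. This is a sketch that assumes your titles carry an en language tag. Titles with no language tag would be dropped by this filter.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?s ?title WHERE {
  ?s a skos:ConceptScheme .
  ?s dct:title ?title .
  # Keep English titles only.
  FILTER (LANG(?title) = 'en')
}
ORDER BY ?title
LIMIT 10
```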
What if we wanted to get all the titles for the UNESCO taxonomy? We need a query like the following:
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT * WHERE {
      <http://skos.um.es/unescothes/CS000> dct:title ?title .
    }
    ORDER BY ?title
    LIMIT 10
This introduces a new idea - using IRIs on the left of a graph pattern. Because we know the IRI of the UNESCO taxonomy, we can ask for its titles directly.
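The same idea extends to listing everything stored about a taxonomy. The sketch below relies on the fact that standard SPARQL also allows a variable in the property position. The variable names ?p and ?o are chosen here purely for illustration.

```sparql
# Every property (?p) and value (?o) recorded for the UNESCO taxonomy.
SELECT ?p ?o WHERE {
  <http://skos.um.es/unescothes/CS000> ?p ?o .
}
LIMIT 50
```

This can be a handy way to discover which properties a taxonomy actually uses before writing a more specific query.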
Let’s revisit the following query:
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT * WHERE {
      ?s a skos:Concept .
      ?s skos:inScheme <http://skos.um.es/unescothes/CS000> .
      ?s skos:prefLabel ?label .
      FILTER (LANG(?label) = 'en')
    }
    ORDER BY ?label
    LIMIT 10
By default this query will return rows of results, each containing a concept IRI (?s) and its English preferred label (?label).
As we used SELECT * this returns all the ‘bound’ variables. A variable is bound when it has a value.
We can write this query in a different way to be more explicit about the values we want returned. Take a look at the second line in the following example.
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?s ?label WHERE {
      ?s a skos:Concept .
      ?s skos:inScheme <http://skos.um.es/unescothes/CS000> .
      ?s skos:prefLabel ?label .
      FILTER (LANG(?label) = 'en')
    }
    ORDER BY ?label
    LIMIT 10
Note how we now have SELECT ?s ?label, which means we are telling the query to return just the values for ?s and ?label in each row of the results. This will actually give us the same results as the previous query because the query only has two variables. However, if you change it to SELECT ?label then you would just get back the labels.
It is generally a good idea to be explicit about the variables you want returned, because it can reduce the amount of data returned and may improve the performance of queries.
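As a sketch of that last variation, here is the same query returning only the labels. Nothing else changes. ?s is still used inside the WHERE clause to join the patterns, it is just no longer part of the output.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?label WHERE {
  ?s a skos:Concept .
  ?s skos:inScheme <http://skos.um.es/unescothes/CS000> .
  ?s skos:prefLabel ?label .
  FILTER (LANG(?label) = 'en')
}
ORDER BY ?label
LIMIT 10
```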
We will now cover some common properties. First is getting the top concepts in a taxonomy.
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?s ?label WHERE {
      ?s a skos:Concept .
      ?s skos:topConceptOf <http://skos.um.es/unescothes/CS000> .
      ?s skos:prefLabel ?label .
      FILTER (LANG(?label) = 'en')
    }
    ORDER BY ?label
The following gets the narrower concepts of a particular concept - http://skos.um.es/unescothes/C00016. Replace the concept IRI with the one you want to use.
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?s ?label WHERE {
      <http://skos.um.es/unescothes/C00016> skos:narrower ?s .
      ?s skos:prefLabel ?label .
      FILTER (LANG(?label) = 'en')
    }
    ORDER BY ?label
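The same relationship can be asked from the other direction using skos:broader, which SKOS defines as the inverse of skos:narrower. This sketch puts the known IRI on the right of the pattern instead of the left. Note the assumption: it only returns results if your data actually contains skos:broader statements, since a triple store without inference returns only what is stored.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?s ?label WHERE {
  # Concepts whose broader concept is C00016, i.e. its narrower concepts.
  ?s skos:broader <http://skos.um.es/unescothes/C00016> .
  ?s skos:prefLabel ?label .
  FILTER (LANG(?label) = 'en')
}
ORDER BY ?label
```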
The following table lists some of the other common SKOS properties you might want to use.
Property | Description |
---|---|
skos:broader | Broader concepts. This property is the inverse of skos:narrower. |
skos:altLabel | Alternative label. |
skos:hiddenLabel | Hidden label. |
skos:related | This property is used for related concepts in the same taxonomy. |
skos:relatedMatch | Related concepts in other taxonomies. |
skos:exactMatch | Concepts in other taxonomies that are considered equivalent. |
skos:closeMatch | Concepts in other taxonomies that are closely related. |
skos:changeNote | Notes detailing changes to a concept. |
skos:historyNote | Notes detailing the history of a concept. |
skos:definition | A definition of a concept. |
The general pattern to use these properties is the same as the other examples, as shown in the following example. Simply replace {property} with the relevant property from the table.
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?s ?o WHERE {
      ?s {property} ?o .
    }
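Note that {property} is a placeholder, not valid SPARQL, so the template will not run until you fill it in. As an illustration, here is the template with skos:definition substituted. The LIMIT is added here only as a precaution.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?s ?o WHERE {
  # ?s is the concept, ?o is its definition text.
  ?s skos:definition ?o .
}
LIMIT 10
```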
You can combine patterns in a query as we showed further above, but remember that all the patterns must match for a result to be included. If you think that some of the patterns may not be present then you can include the pattern using the OPTIONAL keyword, as in the following example:
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?s ?label WHERE {
      ?s a skos:Concept .
      OPTIONAL {
        ?s skos:altLabel ?label .
        FILTER (LANG(?label) = 'en')
      }
    }
    ORDER BY DESC(LCASE(?label))
    LIMIT 10
The above will include results whether or not they have alternative labels. You can include multiple OPTIONAL clauses if there are multiple properties that may not exist. Note that we are ordering our results by descending (DESC) lower case (LCASE) labels.
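A sketch with two OPTIONAL clauses might look like the following. The variable names ?altLabel and ?definition are chosen here for illustration, and the query assumes labels and definitions carry an en language tag.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?s ?label ?altLabel ?definition WHERE {
  ?s a skos:Concept .
  ?s skos:prefLabel ?label .
  FILTER (LANG(?label) = 'en')
  # Each OPTIONAL block is matched independently of the other.
  OPTIONAL {
    ?s skos:altLabel ?altLabel .
    FILTER (LANG(?altLabel) = 'en')
  }
  OPTIONAL {
    ?s skos:definition ?definition .
    FILTER (LANG(?definition) = 'en')
  }
}
ORDER BY ?label
LIMIT 10
```

Concepts without an alternative label or definition still appear, with those variables left unbound. A concept with several alternative labels will appear on several rows, one per label.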
The order of patterns and statements in a query is important, but for the vast majority of queries the right order naturally occurs due to how you think about the query. That is, patterns, followed by filters, followed by OPTIONAL clauses (which may also include filters).
If you are comfortable with all that you have read above, then the next step is to read the official SKOS primer at:
https://www.w3.org/TR/skos-primer/
This will give you a thorough introduction to SKOS and how it is represented in RDF.