Mapping language

Template

As with any XSLT stylesheet, we start by declaring the XML header and the xsl:stylesheet element.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:spinque="com.spinque.tools.importStream.EmitterWrapper"
    xmlns:su="com.spinque.tools.importStream.Utils"
    extension-element-prefixes="spinque">

  <xsl:output method="text" encoding="UTF-8"/>

...

</xsl:stylesheet>

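We include two Spinque namespaces. Declaring the spinque namespace makes the elements for generating triples available in the stylesheet. Because the spinque prefix is listed in extension-element-prefixes, the XSLT processor treats spinque:* elements as extension elements instead of copying them to the output.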

The su namespace makes additional utility functions available, such as methods to transform strings (changing the case, splitting, normalizing). There are also methods for processing numbers, dates and person names.

Additional XML namespaces

In the xsl:stylesheet element you can also declare any additional XML namespaces you need later on: the namespaces used in your data that you want to match on, and the namespaces you want to use in the generated triples.

<xsl:stylesheet 
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:spinque="com.spinque.tools.importStream.EmitterWrapper"
    xmlns:su="com.spinque.tools.importStream.Utils"

    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    xmlns:schema="https://schema.org/"

    extension-element-prefixes="spinque">
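As an illustration, the sketch below assumes a data source whose virtual XML fragments are OAI-PMH records using the oai and dc namespaces declared above; the exact shape of the fragments depends on your data source. Because the prefixes are declared on xsl:stylesheet, they can be used in match patterns and XPath expressions. The attribute name is written as a full URI, in the same way as the RDF type predicate described later in this section; this assumes attribute names, like predicates, may be full URIs.

```xml
<!-- Sketch: assumes fragments of the form
     <oai:record><oai:header><oai:identifier>…</oai:identifier></oai:header>
     <oai:metadata>…<dc:title>…</dc:title>…</oai:metadata></oai:record> -->
<xsl:template match="oai:record">
  <!-- The OAI identifier is used as the subject of the generated triples -->
  <xsl:variable name="recordID" select="oai:header/oai:identifier"/>
  <!-- Emit one title attribute per dc:title found anywhere below oai:metadata -->
  <xsl:for-each select="oai:metadata//dc:title">
    <spinque:attribute subject="{$recordID}" attribute="http://purl.org/dc/elements/1.1/title" value="{.}" type="string"/>
  </xsl:for-each>
</xsl:template>
```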

Templates

In XSLT you define templates that match elements in your XML data. In a Spinque mapping, these are the elements of your virtual XML fragments. For example, to match a row element coming from a CSV container, you define a template as follows:

<xsl:template match="row">
  ...
</xsl:template>
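Match patterns can also be more specific than a bare element name. A minimal sketch, assuming rows with an empty productID (a column used in the examples further on) should be skipped: the first template handles rows that have an identifier, and the empty second template catches the rest. In XSLT 1.0 a pattern with a predicate has a higher default priority (0.5) than a bare element name (0), so the first template wins whenever both match.

```xml
<!-- Rows with a non-empty productID: default priority 0.5 -->
<xsl:template match="row[normalize-space(field[@name='productID']) != '']">
  <!-- generate triples for the row here -->
</xsl:template>

<!-- All other rows only match this empty template (default priority 0) and produce nothing -->
<xsl:template match="row"/>
```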

Generating triples

The key difference between a normal XSLT file and a Spinque data mapping is that we do not write the normal textual output of the XSLT, but instead generate triples. There are two elements for generating a triple. With spinque:attribute you generate a triple with a literal value; in other words, you define an attribute of an object, such as a string value, a date or geographic coordinates. The parameters subject, attribute, value and type are mandatory; lang is optional.

<spinque:attribute subject="..." attribute="..." value="..." type="..." lang="..."/>
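For instance, a text value in a known language could be emitted with lang filled in. The description column and the language code en are hypothetical, and the sketch assumes lang takes a language code for the literal value.

```xml
<!-- Hypothetical 'description' column; lang marks the language of the literal value -->
<spinque:attribute subject="{$productID}" attribute="description" value="{field[@name='description']}" type="string" lang="en"/>
```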

With spinque:relation you generate a relation between two objects.

<spinque:relation subject="..." predicate="..." object="..."/>

Spinque automatically creates the required resources when you output an attribute or a relation. When you generate an attribute, resources are created for the subject and the attribute; when you generate a relation, resources are created for the subject, the predicate and the object.

Given the virtual XML fragment for a row from a CSV file:

<row>
  <field column="0" name="productID">product12</field>
  <field column="1" name="productName">Bicycle</field>
  <field column="2" name="supplier">supplier45</field>
</row>

We would generate an attribute for the name and a relation to the supplier:

<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <xsl:variable name="supplierID" select="field[@name='supplier']"/>
  <spinque:attribute subject="{$productID}" attribute="name" value="{field[@name='productName']}" type="string"/>
  <spinque:relation subject="{$productID}" predicate="supplier" object="{$supplierID}"/>
</xsl:template>

Note that variables and other XPath expressions in spinque:attribute and spinque:relation are put between curly braces {…}, e.g. subject="{$productID}" or value="{field[@name='productName']}". Constant values are written without braces, e.g. predicate="supplier".
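Since these are XSLT attribute value templates, constant text and expressions can in principle be mixed in one value. The sketch below assumes Spinque evaluates such mixed values the same way as plain {$productID} ones, and uses a hypothetical convention of prefixing identifiers with their kind. If you adopt a prefix like this, apply it consistently in every mapping, because data sources are only integrated when they produce identical identifiers.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <xsl:variable name="supplierID" select="field[@name='supplier']"/>
  <!-- Constant text plus expression: the subject becomes e.g. "product/product12" -->
  <spinque:attribute subject="product/{$productID}" attribute="name" value="{field[@name='productName']}" type="string"/>
  <!-- Constant predicate without braces; the object uses the same hypothetical prefix scheme -->
  <spinque:relation subject="product/{$productID}" predicate="supplier" object="supplier/{$supplierID}"/>
</xsl:template>
```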

In the example we used XSL variables to store the identifiers of the resources. This is not required, but it does make your mapping file more readable.
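For comparison, here is the same template without variables, with the XPath expressions placed directly between the braces. It should generate the same triples as the version above.

```xml
<xsl:template match="row">
  <!-- Same attribute and relation as before, with the expressions inlined instead of stored in variables -->
  <spinque:attribute subject="{field[@name='productID']}" attribute="name" value="{field[@name='productName']}" type="string"/>
  <spinque:relation subject="{field[@name='productID']}" predicate="supplier" object="{field[@name='supplier']}"/>
</xsl:template>
```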

Creating a graph

The relation in the example above creates a resource for the supplier. At this point we have no information about the supplier other than its identifier. We can now add a data source with information about suppliers:

<row>
  <field column="0" name="supplierID">supplier45</field>
  <field column="1" name="name">bicycles.com</field>
</row>

In the data mapping for this data source we would generate an attribute with the name for the supplier.

<xsl:template match="row">
  <xsl:variable name="supplierID" select="field[@name='supplierID']"/>
  <spinque:attribute subject="{$supplierID}" attribute="name" value="{field[@name='name']}" type="string"/>
</xsl:template>

Because both mappings use the same identifier (supplier45), the two data sources are automatically integrated in the resulting data graph.

Generated data graph

A special relation is the type of a resource. We use http://www.w3.org/1999/02/22-rdf-syntax-ns#type from the RDF vocabulary for this purpose. Although specifying a type relation is not required, we strongly advise you to do so: types are very useful when you model search strategies over your data graph.

<spinque:relation subject="{$subjectID}" predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" object="product"/>
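Applied to the product example, a sketch that types both ends of the supplier relation could look like this. The type names product and supplier are illustrative choices; the supplier mapping could assign the same supplier type to its rows.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <xsl:variable name="supplierID" select="field[@name='supplier']"/>
  <!-- Type the product resource -->
  <spinque:relation subject="{$productID}" predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" object="product"/>
  <spinque:attribute subject="{$productID}" attribute="name" value="{field[@name='productName']}" type="string"/>
  <spinque:relation subject="{$productID}" predicate="supplier" object="{$supplierID}"/>
  <!-- Type the supplier resource as well (illustrative type name) -->
  <spinque:relation subject="{$supplierID}" predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" object="supplier"/>
</xsl:template>
```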

spinque:attribute

The <spinque:attribute …/> element expects at least 4 XML-attributes:

  • subject : the id of the object to which the attribute should be added

  • attribute : the name of the attribute

  • value : the actual value

  • type : can be string, date, integer, double or point (geometric object).
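A sketch of numeric attributes, assuming hypothetical stock and price columns in the product CSV. It sticks to integer and double, since the value formats expected for date and point are not specified above.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <!-- Hypothetical columns: 'stock' holds whole numbers, 'price' decimal numbers -->
  <spinque:attribute subject="{$productID}" attribute="stock" value="{field[@name='stock']}" type="integer"/>
  <spinque:attribute subject="{$productID}" attribute="price" value="{field[@name='price']}" type="double"/>
</xsl:template>
```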

Besides these XML-attributes, it is also possible to specify a few others:

  • p : a probability expressing how certain it is that the value exists

  • flags : one or more values, separated by spaces or commas. Currently supported are:
      – allowEmpty : keep empty strings instead of throwing them away
      – noNormalizeWhitespace : preserve whitespace as it was
    Example: flags="allowEmpty, noNormalizeWhitespace"
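A sketch of both optional XML-attributes inside the product template, assuming p takes a number between 0 and 1 and using hypothetical color and notes columns:

```xml
<!-- Hypothetical 'color' column extracted with some uncertainty: emitted with probability 0.6 -->
<spinque:attribute subject="{$productID}" attribute="color" value="{field[@name='color']}" type="string" p="0.6"/>
<!-- Hypothetical free-text 'notes' column: keep empty values and the original whitespace -->
<spinque:attribute subject="{$productID}" attribute="notes" value="{field[@name='notes']}" type="string"
    flags="allowEmpty, noNormalizeWhitespace"/>
```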

spinque:relation

The <spinque:relation …/> element expects at least 3 XML-attributes:

  • subject : the id of the object to which the relation should be added

  • predicate : the name of the relation

  • object : the id of the target object

Besides these XML-attributes, it is also possible to specify a few others:

  • p : a probability expressing how certain it is that the relation exists
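A sketch, assuming a hypothetical similarTo column that only suggests a related product (for example, produced by a recommender), so the relation is emitted with a probability below 1:

```xml
<!-- Hypothetical 'similarTo' column; the relation is emitted with probability 0.7 -->
<spinque:relation subject="{$productID}" predicate="similarTo" object="{field[@name='similarTo']}" p="0.7"/>
```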

Debug statements

Sometimes it is handy to get some intermediate output. The spinque:debug element allows you to generate output anywhere in your stylesheet. When you run the debugger or the indexer from the Spinque CLI, the debug messages are shown.

<spinque:debug message="this is debug output: {$subjectID}"/>
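Debug messages combine well with ordinary XSLT conditionals. The sketch below, for the product example, only emits the supplier relation when a supplier is present and logs the rows that lack one; the message text is arbitrary.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <xsl:variable name="supplierID" select="normalize-space(field[@name='supplier'])"/>
  <xsl:choose>
    <xsl:when test="$supplierID != ''">
      <spinque:relation subject="{$productID}" predicate="supplier" object="{$supplierID}"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- Shown when running the debugger or the indexer from the Spinque CLI -->
      <spinque:debug message="row without supplier: {$productID}"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```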