Link Rewriting

last modified on Jun 13, 2017

When exporting, importing or publishing files that link to each other, it might be necessary to rewrite the links between the files. For example, suppose you have two local files called A.xml and I.xml, and A.xml contains a reference to I.xml:

A.xml

  <ref target="I.xml#inquisition">siehe Inquisition</ref>

Now, when A.xml has been imported as textgrid:4711.0 and I.xml as textgrid:4721.0, this link should read

textgrid:4711.0 (former A.xml)

  <ref target="textgrid:4721.0#inquisition">siehe Inquisition</ref>

since the original file name is no longer known and TextGrid URIs are now the means of reference. Similarly, after publication those URIs should be rewritten to PIDs.

Where URIs are rewritten depends on the content type of the respective file. For example, in TEI files we should rewrite //ref/@target (among others), while in XHTML we should rewrite, e.g., //img/@src and //a/@href.

Choosing a rewrite method

By default, the Import and Export tool will select an appropriate rewrite method for your document's detected content type. You can change this for individual items by clicking the corresponding table cell in the import or export tool. A combo box appears in which you can choose from the built-in rewriting specifications.

You can also specify the URI to a rewriting spec by typing it into the cell, e.g., internal:tei#tei for the built-in TEI transformation, or, say, textgrid:9876#myformat for the spec with the ID myformat in the object at textgrid:9876.

Rolling your own rewrite method

To specify your own rewrite method, you need to write an XML file that conforms to the import specification schema. We'll use the specification for TEI documents as an example since it demonstrates all available features:

<rw:importSpec xmlns:rw="http://textgrid.info/import"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xi="http://www.w3.org/2001/XInclude">

    <rw:xmlConfiguration xml:id="tei"
        description="TEI P5 (Basic rewriting + XLink)">

This first defines the importSpec and declares the required namespaces. We then start an xmlConfiguration (i.e. the spec for a single format). This requires an id (here tei), and you should also provide a description that can be shown in the user interface.

Now we describe the elements and attributes that should be rewritten:

        <rw:element name="tei:ref" method="none">
            <rw:attribute name="target" method="token" />
        </rw:element>

The element tei:ref is associated with the method none, which means its content shouldn't be rewritten. However, it has an attribute named target that can contain URIs which we should rewrite. The token method means that the attribute can contain a whitespace-separated list of URIs, each of which is rewritten separately. The available methods are none (no rewriting), token (whitespace-separated list of values) and full (the whole attribute value is one value).
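
As a rough illustration of the token method, assume a third file refers to both A.xml and I.xml from the introductory example in a single target attribute (the fragment #intro is made up for this sketch). Before import, the reference might look like this:

  <ref target="A.xml#intro I.xml#inquisition">siehe Inquisition</ref>

With method="token", each whitespace-separated entry is looked up and rewritten on its own, so after import we would expect:

  <ref target="textgrid:4711.0#intro textgrid:4721.0#inquisition">siehe Inquisition</ref>

With method="full", the whole string would be treated as one single URI, which presumably would not match either imported file, so the value would most likely stay unchanged.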

        <rw:element name="tei:idno" method="full">
            <rw:mode>import</rw:mode>
            <rw:required attribute="type" pattern="textgrid|handle" />
        </rw:element>

For the idno element, we only want rewriting on import (or publication). On export, existing values should be kept as-is. Additionally, we only want rewriting when the idno element has a type attribute that matches the regular expression textgrid|handle, i.e. we only want to rewrite TextGrid URIs and Handles.
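
To make the type condition concrete, here is a small sketch of which idno elements the rule above would select. The sample values are placeholders chosen for illustration, and we assume that the pattern is matched against the whole value of the type attribute:

        <!-- type matches textgrid|handle: content is eligible for rewriting
             on import or publication, but not on export -->
        <idno type="textgrid">textgrid:4721.0</idno>

        <!-- type does not match the pattern: left as-is -->
        <idno type="shelfmark">placeholder value</idno>

        <!-- no type attribute at all: the requirement is not met, left as-is -->
        <idno>placeholder value</idno>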

Sometimes you'll want to handle any element, without having to list each one explicitly. You can do so as illustrated here:

        <rw:any-element name="default" method="none">
            <rw:attribute name="xlink:href" method="full" />
            <rw:attribute name="url" method="full" />
            <rw:attribute name="facs" method="full" />
        </rw:any-element>

That is, we'd like to support the attributes xlink:href, url and facs on any element.

Here's the rest of the TEI spec:

        <rw:element name="tei:ptr" method="none">
            <rw:attribute name="target" method="token" />
        </rw:element>
        <rw:element name="tei:link" method="none">
            <rw:attribute name="target" method="token" />
            <rw:attribute name="targets" method="token" />
        </rw:element>
        <rw:element name="tei:graphic" method="none">
            <rw:attribute name="url" method="full" />
        </rw:element>
        <rw:element name="tei:gloss" method="none">
            <rw:attribute name="target" method="token" />
        </rw:element>
        <rw:element name="xi:include" method="none">
            <rw:attribute name="href" method="full" />
        </rw:element>
    </rw:xmlConfiguration>
</rw:importSpec>
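
Following the same pattern, a custom spec for the XHTML case mentioned at the top (//img/@src and //a/@href) could be sketched as follows. This is only an illustration built from the elements shown above, not a built-in specification: the id myformat, the description text and the restriction to these two attributes are assumptions made for this example.

<rw:importSpec xmlns:rw="http://textgrid.info/import"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">

    <!-- id and description are made up for this sketch -->
    <rw:xmlConfiguration xml:id="myformat"
        description="XHTML (images and hyperlinks only)">

        <!-- an image source is a single URI, hence method full -->
        <rw:element name="xhtml:img" method="none">
            <rw:attribute name="src" method="full" />
        </rw:element>

        <!-- a hyperlink target is a single URI as well -->
        <rw:element name="xhtml:a" method="none">
            <rw:attribute name="href" method="full" />
        </rw:element>
    </rw:xmlConfiguration>
</rw:importSpec>

If this file were stored in the object textgrid:9876, you could select it by typing textgrid:9876#myformat into the rewrite method cell, as described under "Choosing a rewrite method".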

Known Limitations

  • There's no support for xml:base yet.
  • We don't support patterns or XPath expressions for element or attribute values.