Link Rewriting
last modified on Jun 13, 2017
When you export, import, or publish files that link to each other,
it may be necessary to rewrite the links between them. For example,
suppose you have two local files called A.xml and I.xml, and A.xml
contains a reference to I.xml:
A.xml
<ref target="I.xml#inquisition">siehe Inquisition</ref>
Now, when A.xml has been imported as textgrid:4711.0 and I.xml as
textgrid:4721.0, this link should read
textgrid:4711.0 (former A.xml)
<ref target="textgrid:4721.0#inquisition">siehe Inquisition</ref>
since the original filename is no longer known and TextGrid URIs are now the means of reference. Similarly, after publication, those URIs should be rewritten to PIDs.
Where URIs are rewritten depends on the content type of the respective
file. For example, in TEI files we should rewrite //ref/@target (among
others), while in XHTML we should rewrite, e.g., //img/@src and //a/@href.
Choosing a rewrite method
By default, the Import and Export tool selects an appropriate rewrite method based on your document's detected content type. You can change this for individual items by clicking the corresponding table cell in the import or export tool: a combo box appears from which you can choose one of the built-in rewriting specifications.
You can also specify the URI of a rewriting spec by typing it into the
cell, e.g., internal:tei#tei for the built-in TEI transformation, or,
say, textgrid:9876#myformat for the spec with the ID myformat in the
object at textgrid:9876.
Rolling your own rewrite method
To specify your own rewrite method, you need to write an XML file that conforms to the import specification schema. We'll use the specification for TEI documents as an example since it demonstrates all available features:
<rw:importSpec xmlns:rw="http://textgrid.info/import"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xi="http://www.w3.org/2001/XInclude">
<rw:xmlConfiguration xml:id="tei"
    description="TEI P5 (Basic rewriting + XLink)">
This first defines the importSpec and declares the required namespaces.
We then start an xmlConfiguration (i.e., the spec for a single format).
This requires an id (here tei), and you should also provide a
description that can be shown in the user interface.
Now we describe the elements and attributes that should be rewritten:
<rw:element name="tei:ref" method="none">
  <rw:attribute name="target" method="token" />
</rw:element>
The element tei:ref is associated with method="none", which means its
content shouldn't be rewritten. However, it has an attribute named
target that can contain URIs which we should rewrite. The token method
means that the attribute can contain a whitespace-separated list of
URIs, each of which is rewritten separately. The available methods are
none (no rewriting), token (whitespace-separated list of values), and
full (the whole attribute value is one value).
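To make the token method concrete, here is a sketch of its effect on a target holding two URIs. It reuses the import mapping from the introduction (A.xml as textgrid:4711.0, I.xml as textgrid:4721.0); the fragment identifier #intro and the link text are made up for illustration.

```xml
<!-- before import: two whitespace-separated URIs in one attribute -->
<ref target="A.xml#intro I.xml#inquisition">see both passages</ref>

<!-- after import with method="token": each token is rewritten on its own -->
<ref target="textgrid:4711.0#intro textgrid:4721.0#inquisition">see both passages</ref>
```

With method="full", the whole string, including the space, would be treated as a single value, so full presumably only fits attributes that hold exactly one URI.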
<rw:element name="tei:idno" method="full">
  <rw:mode>import</rw:mode>
  <rw:required attribute="type" pattern="textgrid|handle" />
</rw:element>
For the idno element, we only want rewriting on import (or
publication); on export, existing values should be kept as-is.
Additionally, we only want rewriting when the idno element has a type
attribute that matches the regular expression textgrid|handle, i.e., we
only want to rewrite TextGrid URIs and Handles.
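As a rough illustration of the type restriction, the following fragment shows which idno elements would qualify under this rule. The values are placeholders (the TextGrid URI is borrowed from the introduction, the ISBN is a dummy), and the rewritten results are deliberately not shown because they depend on the import or publication run.

```xml
<!-- type matches textgrid|handle: candidate for rewriting on import or publication -->
<idno type="textgrid">textgrid:4721.0</idno>

<!-- type does not match the pattern: the content is left as-is -->
<idno type="ISBN">978-3-16-148410-0</idno>
```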
Sometimes you'll want to handle any element without having to list each one explicitly. You can do so as illustrated here:
<rw:any-element name="default" method="none">
  <rw:attribute name="xlink:href" method="full" />
  <rw:attribute name="url" method="full" />
  <rw:attribute name="facs" method="full" />
</rw:any-element>
That is, we'd like to support the attributes xlink:href, url, and facs
on any element.
Here's the rest of the TEI spec:
<rw:element name="tei:ptr" method="none">
  <rw:attribute name="target" method="token" />
</rw:element>
<rw:element name="tei:link" method="none">
  <rw:attribute name="target" method="token" />
  <rw:attribute name="targets" method="token" />
</rw:element>
<rw:element name="tei:graphic" method="none">
  <rw:attribute name="url" method="full" />
</rw:element>
<rw:element name="tei:gloss" method="none">
  <rw:attribute name="target" method="token" />
</rw:element>
<rw:element name="xi:include" method="none">
  <rw:attribute name="href" method="full" />
</rw:element>
</rw:xmlConfiguration>
</rw:importSpec>
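By analogy, a spec for XHTML that covers the //img/@src and //a/@href locations mentioned in the introduction might look roughly like the sketch below. The id xhtml-sketch, the description, and the html prefix are illustrative assumptions; the built-in XHTML spec may well differ.

```xml
<!-- Hypothetical spec: id, description, and the html prefix are assumptions -->
<rw:importSpec xmlns:rw="http://textgrid.info/import"
    xmlns:html="http://www.w3.org/1999/xhtml">
  <rw:xmlConfiguration xml:id="xhtml-sketch"
      description="XHTML (sketch: image sources and hyperlinks)">
    <!-- img/@src holds exactly one URI, so the whole value is rewritten -->
    <rw:element name="html:img" method="none">
      <rw:attribute name="src" method="full" />
    </rw:element>
    <!-- a/@href likewise holds a single URI -->
    <rw:element name="html:a" method="none">
      <rw:attribute name="href" method="full" />
    </rw:element>
  </rw:xmlConfiguration>
</rw:importSpec>
```

Stored in a TextGrid object, such a spec would be selected by typing the object's URI followed by #xhtml-sketch into the rewrite method cell, following the pattern described under "Choosing a rewrite method".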
Known Limitations
- There's no support for xml:base yet.
- We don't support patterns or XPath expressions for element or attribute values.