Handling Documents and Irregular Data

One of the benefits of using XML is the ability to model irregular data hierarchies, including data with these characteristics:

Collections of heterogeneous elements
Structures with many optional elements
Structures where the order is important
Recursive structures
Structures with complex containment requirements

This sounds complex, but most of these conditions are present in XML that represents documents. The following Pole example exhibits many of these characteristics.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="pole.xsl"?>
<document>
  <title>To the Pole and Back</title>
  <section>
    <title>The First Day</title>
    <p>It was the <emph>best</emph> of days, it was the
      <emph>worst</emph> of days.</p>
    <list>
      <item><emph>best</emph> in that the sun was out.</item>
      <item><emph>worst</emph> in that it was 39 degrees below zero.</item>
    </list>
    <section>
      <title>Lunch Menu</title>
      <list>
        <item>ice cream</item>
        <item>popsicles</item>
      </list>
    </section>
  </section>
  <section>
    <title>The Second Day</title>
    <p>Ditto the first day.</p>
  </section>
</document>

Comparing this sample with the characteristics of irregular data, you can see that it contains heterogeneous collections of elements—a section can contain an arbitrary collection of <title> elements, <p> elements, <list> elements, and so on. Many elements are indeed optional—a section need not contain <p> or <list> elements, or other "section" elements. The order of most elements is important to preserve in the output—the first section comes before the second section. The structure is recursive because a "section" element can contain another "section". The <emph> element is probably allowed anywhere—indicating a complex set of containment requirements.

XSLT's ability to handle such irregular and recursive data makes it useful for transforming documents into a display language such as HTML—hence the name and origins of the Extensible Stylesheet Language.

The mechanism for handling data-driven transformations is similar to subroutines in programming languages. Template fragments (subroutines) can be defined and called. Instead of being called by name, however, the most appropriate template is chosen for each node by comparing the node against each template's match pattern.
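As a minimal sketch of this difference, the style sheet below contrasts a named template, which runs only when invoked with <xsl:call-template>, with a matched template that the processor selects on its own during <xsl:apply-templates>. The template name "footer" and its <HR/> content are hypothetical and are not part of the Pole sample.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Named template: runs only when a caller asks for it by name.
       "footer" is a hypothetical name used for illustration. -->
  <xsl:template name="footer">
    <HR/>
  </xsl:template>

  <!-- Matched template: the processor picks it for any "emph" node
       it is asked to process. -->
  <xsl:template match="emph">
    <I><xsl:apply-templates/></I>
  </xsl:template>

  <xsl:template match="/">
    <BODY>
      <!-- Data-driven: the processor finds a template for each selected node. -->
      <xsl:apply-templates select="document/section"/>
      <!-- Subroutine-style: the caller names the template explicitly. -->
      <xsl:call-template name="footer"/>
    </BODY>
  </xsl:template>
</xsl:stylesheet>
```

The rest of this topic uses only the data-driven form.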

To manage this data, start by writing an output template for the HTML "wrapper," which inserts the document title into the output in two places, and then asks XSLT to find the appropriate template (call the appropriate subroutine) for "section" elements. For example:

<HTML>
  <HEAD>
    <TITLE><xsl:value-of select="document/title"/></TITLE>
  </HEAD>
  <BODY>
    <H1><xsl:value-of select="document/title"/></H1>
    <xsl:apply-templates select="document/section"/>
  </BODY>
</HTML>

The <xsl:apply-templates> element selects the "section" children of the <document> element (only the top-level sections, not the nested ones) and asks XSLT to find and apply an appropriate template to each of them. The next step is to write a template that is appropriate for "section" elements.

<xsl:template match="section">
  <DIV>
    <xsl:apply-templates />
  </DIV>
</xsl:template>

The XSLT processor will output this template fragment for each of the "section" elements selected by the <xsl:apply-templates> element. The value of the match attribute indicates the kinds of nodes for which this template is appropriate. In this case it indicates that this template is appropriate for "section" elements. The nodes selected by <xsl:apply-templates> are matched up with the correct template.

Note that the template for sections itself contains an <xsl:apply-templates> element. Without a select attribute, all the children will be selected, and the XSLT processor will take each one in order (title, p, list, section) and look for an appropriate template. There already is a section template—this one—and the XSLT processor will recursively apply it, resulting in a nested structure of <DIV> elements that mirrors the nested structure of "section" elements in the source document.
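When only some of the children should be processed, a select attribute narrows the set. The following is a hypothetical variant of the section template, shown for illustration only and not used in the rest of this walkthrough. It assumes you want to skip nested "section" elements, so no nested <DIV> elements would be produced.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Hypothetical variant, for illustration only. -->
  <xsl:template match="section">
    <DIV>
      <!-- select narrows the node set: every element child
           except nested "section" elements. -->
      <xsl:apply-templates select="*[not(self::section)]"/>
    </DIV>
  </xsl:template>
</xsl:stylesheet>
```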

Now define some more templates to handle other element types.

<xsl:template match="title">
  <H2><xsl:apply-templates /></H2>
</xsl:template>
<xsl:template match="p">
  <P><xsl:apply-templates /></P>
</xsl:template>
<xsl:template match="list">
  <UL>
    <xsl:for-each select="item">
      <LI><xsl:apply-templates /></LI>
    </xsl:for-each>
  </UL>
</xsl:template>
<xsl:template match="emph">
  <I><xsl:apply-templates /></I>
</xsl:template>

In each case you can include <xsl:apply-templates> to continue selecting the children (whatever they may be) and finding the appropriate template.

The <xsl:apply-templates> element is not limited to selecting element children; it selects other child nodes as well, including text. You can add a template that copies text children to the output. (The built-in XSLT 1.0 rule for text nodes already does this, so the template is optional.)

<xsl:template match="text()"><xsl:value-of select="."/></xsl:template>
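For reference, the XSLT 1.0 specification defines built-in template rules that apply whenever no template in the style sheet matches a node. Written out as ordinary templates, they are equivalent to the sketch below. You do not need to include them in a style sheet; they are spelled out here only to show what the processor falls back on.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Elements and the root: keep processing the children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the string value to the output. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: produce nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```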

When run against the Pole sample document, the templates above produce the following output:

<HTML>
  <HEAD>
    <TITLE>To the Pole and Back</TITLE>
  </HEAD>
  <BODY>
    <H1>To the Pole and Back</H1>
    <DIV>
      <H2>The First Day</H2>
      <P>It was the <I>best</I> of days, it was the
        <I>worst</I> of days.</P>
      <UL>
        <LI><I>best</I> in that the sun was out.</LI>
        <LI><I>worst</I> in that it was 39 degrees below zero.</LI>
      </UL>
      <DIV>
        <H2>Lunch Menu</H2>
        <UL>
          <LI>ice cream</LI>
          <LI>popsicles</LI>
        </UL>
      </DIV>
    </DIV>
    <DIV>
      <H2>The Second Day</H2>
      <P>Ditto the first day.</P>
    </DIV>
  </BODY>
</HTML>

By recursively processing the source document with <xsl:apply-templates>, this style sheet essentially converts the element types in the source XML to HTML element types. Even though this example is fairly trivial, you can already see some additional structural modifications occurring, notably the creation of the <HEAD> element and the duplication of the document title in both the <H1> element and the <TITLE> element.

This collection of templates can be packaged into a style sheet file by placing them within an <xsl:stylesheet> element. The XSLT namespace must be declared here.

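The top-level template (or root template) must be marked as such by wrapping it in an <xsl:template> element whose match attribute is the pattern /, which matches the document root. Here is the final complete style sheet. Note that it handles section titles slightly differently from the fragments above: the section template writes the <H2> itself, and the title template is left empty so that the title is not output twice.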

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE><xsl:value-of select="document/title"/></TITLE>
      </HEAD>
      <BODY>
        <H1><xsl:value-of select="document/title"/></H1>
        <xsl:apply-templates select="document/section"/>
      </BODY>
    </HTML>
  </xsl:template>
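  <!-- Intentionally empty: the section template below writes the title
       into the H2 itself, so title elements must not be output again. -->
  <xsl:template match="title">
  </xsl:template>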
  <xsl:template match="section">
    <DIV>
      <H2><xsl:value-of select="title"/></H2>
      <xsl:apply-templates />
    </DIV>
  </xsl:template>
  <xsl:template match="p">
    <P><xsl:apply-templates /></P>
  </xsl:template>
  <xsl:template match="list">
    <UL>
      <xsl:for-each select="item">
        <LI><xsl:apply-templates /></LI>
      </xsl:for-each>
    </UL>
  </xsl:template>
  <xsl:template match="emph">
    <I><xsl:apply-templates /></I>
  </xsl:template>
</xsl:stylesheet>

This example illustrates the data-driven model of XSLT processing. For most of the structure of a document, you won't know what could be coming next. Instead, you can create isolated templates for the types of nodes you expect to see in the source document without too much consideration of how they are arranged. In places where the structure is locally known, you can use <xsl:for-each> and <xsl:value-of> to populate the template. For instance, "list" and "item" elements appear in a regular and predictable structure. The ability to switch smoothly between data-driven and template-driven transformation is an important feature of XSLT.
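To make the contrast concrete, the sketch below rewrites the list handling in the data-driven style. Instead of looping over "item" elements with <xsl:for-each>, the list template hands them to <xsl:apply-templates>, and a separate item template produces each <LI>. This is an alternative shown for comparison, not the form used in the sample; for the Pole document both forms should produce the same list markup.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Data-driven alternative: the list template delegates to whatever
       template matches each item. -->
  <xsl:template match="list">
    <UL>
      <xsl:apply-templates select="item"/>
    </UL>
  </xsl:template>

  <!-- Each item is handled by its own isolated template. -->
  <xsl:template match="item">
    <LI><xsl:apply-templates/></LI>
  </xsl:template>
</xsl:stylesheet>
```

The <xsl:for-each> form keeps the list logic in one place, while the template form makes it easier to handle items that appear in other contexts later.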

Try it! This style sheet and data can be found in the To The Pole sample at XSLT Developer's Guide Samples.

See the XSL version of this sample at XSL Developer's Guide Samples.