XSL and mixed content

The following example shows a <content> element with a mixed content model that may contain <url> and <emphasis> child elements:

<content>The <url href="http://w3.org/XML">XML</url> language
  is <emphasis>easy</emphasis> to learn. However you need
  some <emphasis>time</emphasis>.</content>

The element nodes <url> and <emphasis> are embedded between plain text nodes. A possible XHTML output might look like:

<p>The <a href="http://w3.org/XML">XML</a> language is <em>easy</em> to learn. However you
need some <em>time</em>.</p>

We start with a first version of an XSL template:

  <xsl:template match="content">
    <p>
      <xsl:value-of select="."/>
    </p>
  </xsl:template>

As mentioned earlier, <xsl:value-of select="."/> returns the string value of the element, so all text nodes of the whole subtree are glued together and the markup is lost:

<p>The XML language is easy to learn. However you need some time.</p>

Our next attempt is to define templates to format the elements <url> and <emphasis>:

...
<xsl:template match="content">
  <p>
    <xsl:apply-templates select="emphasis|url"/>
  </p>
</xsl:template>

<xsl:template match="url">
  <a href="{@href}"><xsl:value-of select="."/></a>
</xsl:template>

<xsl:template match="emphasis">
  <em><xsl:value-of select="."/></em>
</xsl:template>
...

As expected, the child elements are formatted correctly. Unfortunately, the text nodes between the element nodes are lost, because the select expression only picks <emphasis> and <url>:

<p>
  <a href="http://w3.org/XML">XML</a>
  <em>easy</em>
  <em>time</em>
</p>

To correct this stylesheet we have to tell the XSLT processor to include bare text nodes in the output. XPath defines the node test text() for this purpose. It is not a function returning a boolean value: as a step in a select expression, it selects all children of the context node that are text nodes:

...
<xsl:template match="content">
 <p>
   <xsl:apply-templates select="text()|emphasis|url"/>
 </p>
</xsl:template>
...

This yields the desired output, with the text nodes back in place between the formatted elements:

<p>The <a href="http://w3.org/XML">XML</a> language is <em>easy</em> to learn. However
you need some <em>time</em>.</p>
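
For reference, the fragments above can be assembled into one complete stylesheet. The following is a minimal sketch, assuming XSLT 1.0 and an input document whose root element is <content>. The <xsl:output> settings are an assumption made for illustration and are not part of the original example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: no XML declaration and no added indentation, so that
       whitespace inside the mixed content is left untouched. -->
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>

  <!-- The union selects text nodes and the two known child elements;
       they are processed in document order. -->
  <xsl:template match="content">
    <p>
      <xsl:apply-templates select="text()|emphasis|url"/>
    </p>
  </xsl:template>

  <!-- {@href} is an attribute value template copying the href attribute. -->
  <xsl:template match="url">
    <a href="{@href}"><xsl:value-of select="."/></a>
  </xsl:template>

  <xsl:template match="emphasis">
    <em><xsl:value-of select="."/></em>
  </xsl:template>

</xsl:stylesheet>
```

Assuming the hypothetical file names mixed.xsl and content.xml, a command-line processor such as xsltproc could apply it with: xsltproc mixed.xsl content.xml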

Some remarks:

  1. The XPath expression select="text()|emphasis|url" corresponds nicely to the schema's content model definition:

    <xs:element name="content">
      <xs:complexType mixed="true">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="emphasis"/>
          <xs:element ref="url"/>
        </xs:choice>
             ...
      </xs:complexType>
    </xs:element>
  2. In most mixed content models, all child elements of e.g. <content> have to be formatted. During development, some of the elements defined in a schema are easily omitted by accident. For this reason, the typical XPath expression for mixed content selects the text nodes together with all child elements, whatever their names:

    select="text()|*"
  3. For select="text()|emphasis|url" we have defined two templates, for the element nodes <emphasis> and <url>. What happens to the text nodes selected by text()? They are handled by a built-in template rule, which copies the content of a text node to the output. We may, however, override this built-in rule by adding a template:

    <xsl:template match="text()">
      <span style="color:red">
        <xsl:value-of select="."/>
      </span>
    </xsl:template>

    This yields (shown indented here for readability):

    <p>
       <span style="color:red">The </span>
       <a href="http://w3.org/XML">XML</a>
       <span style="color:red"> language is </span>
       <em>easy</em>
       <span style="color:red"> to learn. However you need some </span>
       <em>time</em>
       <span style="color:red">.</span>
    </p>

    In most cases it is not desirable to reformat all text nodes throughout the whole document. In the current example we might want to format only those text nodes that are immediate children of <content>. This is achieved by restricting the match pattern: <xsl:template match="content/text()">.
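
Remarks 2 and 3 can be combined in one stylesheet. The following is a sketch under stated assumptions, not a definitive solution: it assumes XSLT 1.0 and, as a deliberate variation on the templates above, uses <xsl:apply-templates/> inside the <emphasis> template so that the difference between match="text()" and match="content/text()" becomes visible.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- text()|* selects every text node and every element child, so no
       child element of <content> is skipped by accident. -->
  <xsl:template match="content">
    <p>
      <xsl:apply-templates select="text()|*"/>
    </p>
  </xsl:template>

  <!-- Only text nodes that are immediate children of <content> are wrapped. -->
  <xsl:template match="content/text()">
    <span style="color:red">
      <xsl:value-of select="."/>
    </span>
  </xsl:template>

  <xsl:template match="url">
    <a href="{@href}"><xsl:value-of select="."/></a>
  </xsl:template>

  <!-- Variation for illustration: the text inside <emphasis> is passed on
       to the template rules. Its parent is <emphasis>, not <content>, so
       the built-in text rule applies and the text is copied unchanged. -->
  <xsl:template match="emphasis">
    <em><xsl:apply-templates/></em>
  </xsl:template>

</xsl:stylesheet>
```

With match="text()" in place of match="content/text()", the text inside <emphasis> would be wrapped in a red <span> as well. Child elements of <content> without a template of their own are handled by the built-in element rule, which processes their children and therefore still outputs their text.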