## Well-formed XML documents

The general structure of an XML document is as follows:

We explore a simple XML document representing E-mail type messages:

We deliberately omit the closing tag </from>:

Experienced HTML authors may be confused: older versions of HTML are not XML applications. Instead, HTML belongs to the set of SGML applications, where certain closing tags may be omitted. SGML, the Standard Generalized Markup Language, is a much older standard that is now mostly of historic interest.

Even if every XML element has a closing counterpart, the resulting XML may still not be well-formed:

This type of error is caused by so-called improper nesting of elements: the element <from> is closed before its inner element <to> has been closed. This contradicts the representation of XML documents as tree-like structures. The parser thus echoes:

We provide two examples illustrating proper and improper nesting of XML documents:

The following example violates the XML proper nesting constraint and thus does not represent a well-formed document:
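The nesting rules can also be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the memo fragments and the helper name `is_well_formed` are invented for illustration and are not the original listings:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments modelled on the memo example.
proper = "<memo><from>Alice</from><to>Bob</to></memo>"
# The closing tag </from> is missing.
missing_end_tag = "<memo><from>Alice<to>Bob</to></memo>"
# <from> is closed before its inner element <to> has been closed.
improper = "<memo><from>Alice<to>Bob</from></to></memo>"

for name, text in [("proper", proper),
                   ("missing end tag", missing_end_tag),
                   ("improper nesting", improper)]:
    print(name, "->", is_well_formed(text))
```

Only the first fragment is accepted. The other two are rejected because their tags cannot be arranged into a tree.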

XML elements may have attributes like date in the following example:
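A parser typically exposes attributes as a mapping from names to values. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the memo content and the attribute values are hypothetical:

```python
import xml.etree.ElementTree as ET

# Two hypothetical memos differing only in the order of their attributes.
memo = ET.fromstring(
    '<memo date="2024-01-15" priority="high">'
    "<from>Alice</from><to>Bob</to>"
    "</memo>"
)
reordered = ET.fromstring(
    '<memo priority="high" date="2024-01-15">'
    "<from>Alice</from><to>Bob</to>"
    "</memo>"
)

# Attributes form a name-to-value mapping, so their order is irrelevant.
attributes_equal = memo.attrib == reordered.attrib

# Child elements, in contrast, are kept in document order.
child_order = [child.tag for child in memo]

print(memo.get("date"), attributes_equal, child_order)
```

Attribute order carries no meaning here, whereas the sequence of child elements is preserved by the parser.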

No. 2

### Single and double attribute value quotes

Q:

We recall the problem of nested quotes yielding XML code that is not well-formed:

<img src="bold.gif" alt="We may use "quotes" here" />

The XML specification defines legal attribute value definitions as:

Literals
    [1] EntityValue   ::= '"' ([^%&"] | PEReference | Reference)* '"'
                        | "'" ([^%&'] | PEReference | Reference)* "'"
    [2] AttValue      ::= '"' ([^<&"] | Reference)* '"'
                        | "'" ([^<&'] | Reference)* "'"
    [3] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
    [4] PubidLiteral  ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
    [5] PubidChar     ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

Find out how it is possible to set the attribute alt's value to the string We may use "quotes" here.

A:

The production rule for attribute values reads:

 [2] AttValue ::= '"' ([^<&"] | Reference)* '"' |  "'" ([^<&'] | Reference)* "'"

This allows us to use either of two alternatives to delimit attribute values:

<img ... alt="..."/>

Constraint: the value string must not contain a literal " character.

<img ... alt='...'/>

Constraint: the value string must not contain a literal ' character.

We may take advantage of the second rule:

<img src="bold.gif" alt='We may use "quotes" here' />

Notice that according to the AttValue production the delimiting quotes must not be mixed. The following code is thus not well-formed:

<img src="bold.gif'/>
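The quoting alternatives can be tried out with a parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper name `parse` is made up for illustration. Besides switching the delimiter, it also shows the predefined entity reference `&quot;`, which the `Reference` alternative of the production permits inside a value:

```python
import xml.etree.ElementTree as ET


def parse(markup):
    """Return the parsed element, or None if markup is not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


# Single quotes as delimiters allow double quotes inside the value.
single = parse("""<img src="bold.gif" alt='We may use "quotes" here' />""")

# Alternatively keep double quotes as delimiters and escape the inner ones.
escaped = parse('<img src="bold.gif" alt="We may use &quot;quotes&quot; here" />')

# Nested double quotes and mixed delimiters are both rejected.
nested = parse('<img src="bold.gif" alt="We may use "quotes" here" />')
mixed = parse("""<img src="bold.gif'/>""")

print(single.get("alt"))
print(escaped.get("alt"))
print(nested, mixed)
```

Both accepted variants yield the same attribute value, so the choice of delimiter is purely a matter of notation.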

No. 3

### A graphical representation of a memo

Q:

Draw a graphical representation, similar to Figure 659, “MathML tree graph representation”, of the memo document given in Figure 674, ““date” and “priority” attributes”.

A:

The memo document's structure may be visualized as:

A graphical representation of Figure 674, ““date” and “priority” attributes”:

The sequence of element child nodes is important in XML and has to be preserved. Only the order of the two attributes date and priority is undefined: they belong to the enclosing element node, which serves as a dictionary with the attribute names as keys and the attribute values as values.

### Attributes and quotes

As stated before, XML attribute values have to be enclosed in single or double quotes. Construct an XML document with mixed quotes like

Constraints being imposed on XML documents:

These constraints are part of the definition of a well-formed document. The specification imposes additional constraints for a document to be well-formed.

No. 4

### CDATA usage limitation

Q:

State the obvious limitation of CDATA sections with respect to representing document content. Hint: Is there any content you may not be allowed to use?

A:

The CDATA termination symbol ]]> itself cannot be represented inside a CDATA section:

<h3><![CDATA[A CDATA section is being terminated by «]]>».]]></h3>
xmlparse /tmp/pre.xhtml
file:///tmp/pre.xhtml:1:63: fatal error org.xml.sax.SAXParseException;
systemId: file:///tmp/pre.xhtml; lineNumber: 2; columnNumber: 63;
The character sequence "]]>" must not appear in content unless used to
mark the end of a CDATA section.

### Note

A CDATA section's closing delimiter is exactly ]]>. A variant such as ]] > containing a space is not recognized as a delimiter and therefore does not cause any parsing problem.
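The behaviour can be reproduced with a short experiment. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module rather than the parser used above; the helper name `text_of` and the sample strings are invented for illustration. The third case shows a commonly used workaround that is not part of the exercise: the sequence is split across two adjacent CDATA sections so that `]]>` never appears inside a single one.

```python
import xml.etree.ElementTree as ET


def text_of(markup):
    """Return the root element's text, or None if markup is not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


# The first ]]> ends the section; the second one then appears in plain content.
broken = text_of("<h3><![CDATA[terminated by ]]>.]]></h3>")

# With a space, "]] >" is ordinary character data.
spaced = text_of("<h3><![CDATA[terminated by ]] >.]]></h3>")

# Workaround: end the first section after "]]" and start a new one before ">".
split = text_of("<h3><![CDATA[terminated by ]]]]><![CDATA[>.]]></h3>")

print(broken)
print(spaced)
print(split)
```

The first document is rejected, while the other two are accepted. Only the split variant delivers the literal sequence `]]>` to the application.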