XML DTD (Document Type Definition)


A DTD (Document Type Definition) defines the structure and the legal elements and attributes of an XML document. With a DTD, independent groups of people can agree on a standard format for interchanging data. An application can use a DTD to verify that XML data is valid.

You can use an internal or an external DTD. Below is an example of an internal DTD as provided by w3schools.com.

<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
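To see what a parser picks up from the internal DTD above, here is a minimal Python sketch. It is not a validator: Python's standard library does not validate against a DTD, so this only lists the declared elements and compares them by hand with the children of note. The helper names declared_elements and child_tags are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

NOTE_XML = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>"""


def declared_elements(xml_text):
    """Return the element names declared in the DTD, in declaration order."""
    names = []
    parser = expat.ParserCreate()
    # expat calls this handler once per <!ELEMENT ...> declaration.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(xml_text, True)
    return names


def child_tags(xml_text):
    """Return the root tag and the tags of its direct children, in order."""
    root = ET.fromstring(xml_text)
    return root.tag, [child.tag for child in root]


declared = declared_elements(NOTE_XML)
root_tag, children = child_tags(NOTE_XML)

# Hand-rolled check for this one document only: the root is the first
# declared element and its children follow the declared sequence.
matches_dtd = root_tag == declared[0] and children == declared[1:]
print(declared, children, matches_dtd)
```

For real validation you would reach for a validating parser; the sketch only shows that the declarations and the document content line up.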

Seen from a DTD point of view, all XML documents are made up of the following building blocks:

  • Elements
  • Attributes
  • Entities
  • PCDATA
  • CDATA

Elements were briefly discussed in the HTML Introduction post. Attributes were also briefly discussed in a previous post, HTML Introduction Part 2.

HTML Entities

Some characters, such as < and >, have a special meaning in an HTML document. The HTML entity:

"&nbsp;"

is the “non-breaking space” entity used in HTML to insert an extra space in a document. It is defined by HTML and is not one of the entities predefined in XML. Entities are expanded when a document is parsed by an XML parser. In XML, the less-than entity is &lt; and the greater-than entity is &gt;. There are also the following entities for ampersand, double quote and apostrophe.

&amp;	&
&quot;	"
&apos;	'
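As a quick illustration of entity expansion, the following Python sketch parses a string that uses all five predefined XML entities and then escapes raw text going the other way. The sample strings and the helper name parses are invented for the example; ElementTree and xml.sax.saxutils.escape come from the standard library.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five predefined XML entities are expanded by the parser.
parsed = ET.fromstring(
    "<msg>1 &lt; 2 &amp; 3 &gt; 2, &quot;quoted&quot; and &apos;single&apos;</msg>"
)
expanded_text = parsed.text

# Going the other way: escape raw text before placing it in a document.
# escape() handles &, < and > by default.
escaped_text = escape("1 < 2 & 3 > 2")


def parses(xml_text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &nbsp; comes from HTML; without a DTD that declares it, an XML parser
# reports it as an undefined entity.
nbsp_is_known = parses("<msg>a&nbsp;b</msg>")

print(expanded_text)
print(escaped_text)
print(nbsp_is_known)
```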

PCDATA

As w3schools.com says: “PCDATA means parsed character data. Think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup. Tags inside the text will be treated as markup and entities will be expanded. However, parsed character data should not contain any &, <, or > characters; these need to be represented by the &amp; &lt; and &gt; entities, respectively.”
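A small Python sketch of what “parsed” means in practice: a tag inside the text becomes a child element, the entity is expanded, and a bare ampersand makes the document fail to parse. The sample strings and the helper name is_well_formed are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<body>Call <b>Tove</b> &amp; Jani</body>")

text_before = root.text      # text before the nested tag
nested_tag = root[0].tag     # the tag was treated as markup
nested_text = root[0].text   # text inside the nested element
text_after = root[0].tail    # text after the nested tag, entity expanded


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_ampersand_ok = is_well_formed("<body>Tove & Jani</body>")
escaped_ampersand_ok = is_well_formed("<body>Tove &amp; Jani</body>")

print(repr(text_before), nested_tag, nested_text, repr(text_after))
print(raw_ampersand_ok, escaped_ampersand_ok)
```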

CDATA

CDATA means character data. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
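For comparison, here is a minimal Python sketch using a CDATA section in a document. The sample string is invented for the example. Inside the section, the parser leaves both the tag and the entity reference untouched.

```python
import xml.etree.ElementTree as ET

CDATA_XML = "<body><![CDATA[Call <b>Tove</b> &amp; Jani]]></body>"
root = ET.fromstring(CDATA_XML)

# Everything inside the CDATA section arrives as literal text.
cdata_text = root.text
# No child element was created for <b>, because it was not parsed as markup.
child_count = len(root)

print(cdata_text)
print(child_count)
```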

An XML Schema describes the structure of an XML document, just like a DTD. An XML Schema is an XML-based alternative to a DTD. XML Schemas are more powerful than DTDs. One of the greatest strengths of XML Schemas is the support for data types, and another is that they are written in XML. These comments from w3schools.com lead us to the blog post XSD Introduction.
