DTD – XML Building Blocks

Elements are the main building blocks of both XML and HTML documents.

The Building Blocks of XML Documents:

Seen from a DTD point of view, all XML documents are made up of the following building blocks:

  • Elements
  • Attributes
  • Entities
  • PCDATA
  • CDATA

Elements:

Elements are the main building blocks of both XML and HTML documents. An element can contain text, contain other elements, or be empty.

Example: HTML elements:

"body" and "table"

Example: XML elements:

"note" and "message"

Example: empty HTML elements:

"hr", "br" and "img"

Example:

<body>hello world</body>
<message>how are you</message>
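
The three kinds of element content described above (text, other elements, or nothing) can be checked with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document, the "separator" element and the describe() helper are assumptions made up for this illustration, not part of any DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = """<note>
  <message>how are you</message>
  <separator/>
</note>"""

root = ET.fromstring(doc)


def describe(element):
    """Classify an element by what it contains (hypothetical helper)."""
    if len(element) > 0:  # len() counts child elements
        return "contains other elements"
    if element.text and element.text.strip():
        return "contains text"
    return "empty"


root_kind = describe(root)
summary = {child.tag: describe(child) for child in root}

print(root.tag, "->", root_kind)
for tag, kind in summary.items():
    print(tag, "->", kind)
```

Here "note" holds other elements, "message" holds text, and "separator" is empty.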

Attributes:

Attributes provide extra information about elements. They are always placed inside the opening tag of an element and always come in name/value pairs.

Example:

<img src="book.gif" />

Explanation:

In the above example, the “img” element carries additional information about a source file. Here, “img” is the name of the element, “src” is the name of the attribute, and “book.gif” is the value of the attribute. Because the element itself is empty, it is closed by “ /”.
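
To see the name/value pairing from a program's point of view, here is a small sketch that parses the same “img” element with Python's standard xml.etree.ElementTree module. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The empty element from the explanation above, with one attribute.
img = ET.fromstring('<img src="book.gif" />')

element_name = img.tag         # name of the element
attributes = dict(img.attrib)  # all name/value pairs of the opening tag
src_value = img.get("src")     # value of one attribute, looked up by name

# An empty element has no child elements and no text.
is_empty = len(img) == 0 and img.text is None

print(element_name, attributes, src_value, is_empty)
```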

Entities:

Some characters have a special meaning in XML. For instance, the less-than sign (<) marks the start of an XML tag, so it cannot be written literally inside text; an entity reference such as “&lt;” is used in its place. A familiar example from HTML is “&nbsp;”, the non-breaking space entity, which inserts an extra space in an HTML document. When a document is parsed by an XML parser, the entities are expanded.

The predefined entities in XML:

Entity Reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
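
Entity expansion can be observed directly. The sketch below, which assumes Python's standard library (xml.etree.ElementTree and xml.sax.saxutils), parses a made-up “message” element that uses all five predefined entities, then goes the other way and escapes raw text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample text using all five predefined entities.
doc = ("<message>1 &lt; 2 &amp; 3 &gt; 2, "
       "&quot;quoted&quot; &apos;text&apos;</message>")

# The parser expands each entity reference to its character.
text = ET.fromstring(doc).text
print(text)

# Going the other way: escape() replaces &, < and > with entity references.
escaped = escape("1 < 2 & 3 > 2")
print(escaped)
```

By default escape() handles only &, < and >. Quote characters can be added through its optional second argument, a dictionary of extra replacements.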

PCDATA:

Parsed Character Data (PCDATA) is text that the XML parser will parse. Most of the text in an XML document is PCDATA: when an element is parsed, the text between its tags is parsed as well. The parser examines this text for entities and markup. Tags inside the text are treated as markup, and entities are expanded. For that reason the characters & and < must not appear literally in parsed character data, and > is conventionally escaped too. Use the entities &amp;, &lt; and &gt; to represent them.
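
The following sketch illustrates both points with Python's standard xml.etree.ElementTree module: a tag inside parsed text becomes a child element, and a raw & or < makes the document fail to parse. The sample strings and the parses() helper are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# A tag inside parsed text is treated as markup, not as text.
msg = ET.fromstring("<message>how <b>are</b> you</message>")
child_tags = [child.tag for child in msg]  # the <b> tag became an element
leading_text = msg.text                    # text before the first child


def parses(xml_text):
    """Return True if xml_text is well-formed (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_amp_ok = parses("<message>fish & chips</message>")      # raw &
raw_lt_ok = parses("<message>1 < 2</message>")              # raw <
escaped_ok = parses("<message>fish &amp; chips</message>")  # escaped &

print(child_tags, repr(leading_text), raw_amp_ok, raw_lt_ok, escaped_ok)
```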

CDATA:

Character Data (CDATA) is text that the XML parser will not parse. Tags inside the text are NOT treated as markup, and entities are not expanded.
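
One place where this behavior is visible is a CDATA section, which standard XML writes as <![CDATA[ ... ]]>. That syntax is not shown in the text above and is brought in here only for illustration. A minimal sketch with Python's standard xml.etree.ElementTree module and a made-up “message” element:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the tag and the entity sit inside a CDATA section.
doc = "<message><![CDATA[<b>how</b> are you &amp; yours]]></message>"

msg = ET.fromstring(doc)
text = msg.text         # the content arrives exactly as written
child_count = len(msg)  # <b> was not turned into a child element

print(text)
print(child_count)
```

The “<b>” tag stays plain text and “&amp;” is left unexpanded, in contrast to the PCDATA behavior.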