CDATA

The term CDATA (/ˈsi.deɪ.tə/), meaning character data, is used for distinct but related purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.

CDATA sections in XML

In an XML document or external parsed entity, a CDATA section is a section of element content that is marked for the parser to interpret as only character data, not markup. A CDATA section is merely an alternative syntax for expressing character data; there is no semantic difference between character data written as a CDATA section and character data written in the usual syntax, in which "<" and "&" are represented by "&lt;" and "&amp;", respectively.


Syntax and interpretation

A CDATA section starts with the following sequence:


<![CDATA[


and ends with the first occurrence of the sequence:


]]>


All characters enclosed between these two sequences are interpreted as characters, not markup or entity references. For example, in a line like this:


<sender>John Smith</sender>


the opening and closing "sender" tags are interpreted as markup. However, if written like this:


<![CDATA[<sender>John Smith</sender>]]>


then the code is interpreted the same as if it had been written like this:


&lt;sender>John Smith&lt;/sender>


or like this:


&lt;sender&gt;John Smith&lt;/sender&gt;


That is, the "sender" tags will have exactly the same status as "John Smith": they will be treated as text.
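
The equivalence can be checked with any conforming XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the wrapping "msg" element and the variable names are assumptions made for illustration, since a CDATA section must appear inside element content.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same character data inside a hypothetical <msg> wrapper.
documents = [
    "<msg><![CDATA[<sender>John Smith</sender>]]></msg>",
    "<msg>&lt;sender>John Smith&lt;/sender></msg>",
    "<msg>&lt;sender&gt;John Smith&lt;/sender&gt;</msg>",
]

roots = [ET.fromstring(doc) for doc in documents]

# The text content the parser reports for each spelling.
texts = [root.text for root in roots]

# Number of child elements; "sender" is never parsed as an element here.
child_counts = [len(root) for root in roots]

print(texts)
print(child_counts)
```

All three documents should report the same text and no child elements, because in each case the "sender" tags are character data rather than markup.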


Similarly, if the numeric character reference &#240; appears in element content, it will be interpreted as the single Unicode character 00F0 (small letter eth). But if the same appears in a CDATA section, it will be parsed as six characters: ampersand, hash mark, digit 2, digit 4, digit 0, semicolon.
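
A short sketch of this difference, again using Python's xml.etree.ElementTree; the element name "e" is an arbitrary choice for the example.

```python
import xml.etree.ElementTree as ET

# The reference is expanded when it appears in ordinary element content...
in_content = ET.fromstring("<e>&#240;</e>").text

# ...but is left as literal characters inside a CDATA section.
in_cdata = ET.fromstring("<e><![CDATA[&#240;]]></e>").text

print(len(in_content), len(in_cdata))  # 1 6
```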


Uses of CDATA sections

New authors of XML documents often misunderstand the purpose of a CDATA section, mistakenly believing that its purpose is to "protect" data from being treated as ordinary character data during processing. Some APIs for working with XML documents do offer options for independent access to CDATA sections, but such options exist above and beyond the normal requirements of XML processing systems, and still do not change the implicit meaning of the data. Character data is character data, regardless of whether it is expressed via a CDATA section or ordinary markup.
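
As one illustration of such an API, Python's xml.dom.minidom keeps CDATA sections as a separate node type, while the character data itself is unchanged. This is a sketch of that one library's behaviour, not a statement about XML processors in general; the element name "a" is arbitrary.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# The same character data, once as a CDATA section and once in the usual syntax.
with_cdata = parseString("<a><![CDATA[1 < 2]]></a>").documentElement.firstChild
with_escape = parseString("<a>1 &lt; 2</a>").documentElement.firstChild

# The DOM reports how the data was written...
is_cdata_node = with_cdata.nodeType == Node.CDATA_SECTION_NODE
is_text_node = with_escape.nodeType == Node.TEXT_NODE

# ...but the data is identical either way.
same_data = with_cdata.data == with_escape.data

print(is_cdata_node, is_text_node, same_data)
```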


CDATA sections are useful for writing XML code as text data within an XML document. For example, if one wishes to typeset a book with XSL explaining the use of an XML application, the XML markup to appear in the book itself can be written in the source file in a CDATA section. However, a CDATA section cannot contain the string "]]>", and therefore a CDATA section cannot contain nested CDATA sections. The preferred approach to encoding text that contains the sequence "]]>" is to use multiple CDATA sections, splitting each occurrence of the sequence just before the ">". For example, to encode "]]>" one would write:

 <![CDATA[]]]]><![CDATA[>]]> 

This means that to encode "]]>" in the middle of a CDATA section, replace each occurrence of "]]>" with the following:

 ]]]]><![CDATA[> 

(This effectively stops and restarts the CDATA section.)
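
The replacement rule can be sketched as a small helper. The function name to_cdata and the wrapping "r" element are hypothetical, chosen only for this example; the round trip is checked with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" just before the ">"."""
    # Each "]]>" becomes "]]" + end of section + start of section + ">".
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


encoded = to_cdata("]]>")

# Round trip: a parser should see the original text again.
original = "before ]]> after"
round_tripped = ET.fromstring("<r>" + to_cdata(original) + "</r>").text

print(encoded)
print(round_tripped == original)
```

Applied to the string "]]>" alone, the helper produces the two adjacent CDATA sections shown earlier.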


CDATA in DTDs

CDATA-type attribute value

In Document Type Definition (DTD) files for SGML and XML, an attribute value may be designated as being of type CDATA: arbitrary character data. Within a CDATA-type attribute, character and entity reference markup is allowed and will be processed when the document is read.


For example, if an XML DTD contains

 <!ATTLIST foo a CDATA #IMPLIED> 

it means that elements named foo may optionally have an attribute named "a" which is of type CDATA. In an XML document that is valid according to this DTD, an element like this might appear:

 <foo a="1 &amp; 2 are &lt; &#51;"/> 

and an XML parser would interpret the "a" attribute's value as being the character data "1 & 2 are < 3".
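
A minimal sketch of this behaviour with Python's xml.etree.ElementTree follows. The internal DTD subset, including the added ELEMENT declaration, is an assumption made so the example is a complete document; the parser used here does not validate, it only expands the references in the attribute value.

```python
import xml.etree.ElementTree as ET

# A complete document: the ATTLIST declaration from the text, placed in an
# internal DTD subset, followed by a foo element that uses the attribute.
document = (
    "<!DOCTYPE foo ["
    "<!ELEMENT foo EMPTY>"
    "<!ATTLIST foo a CDATA #IMPLIED>"
    "]>"
    '<foo a="1 &amp; 2 are &lt; &#51;"/>'
)

# References in the attribute value are expanded by the parser.
value = ET.fromstring(document).get("a")

print(value)  # 1 & 2 are < 3
```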


CDATA-type entity

An SGML or XML DTD may also include entity declarations in which the token CDATA is used to indicate that the entity consists of character data. The character data may appear within the declaration itself or may be available externally, referenced by a URI. In either case, character reference and parameter entity reference markup is allowed in the entity, and will be processed as such when it is read.


CDATA-type element content

An SGML DTD may declare an element's content as being of type CDATA. Within a CDATA-type element, no markup will be processed. It is similar to a CDATA section in XML, but has no special boundary markup, as it applies to the entire element.


The Wikipedia article included on this page is licensed under the GFDL.