XML Schema Language Comparison

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntax constraints imposed by XML itself. There are several different languages available for specifying an XML schema. Each language has its strengths and weaknesses.


Note: the schema language defined by the W3C is itself called "XML Schema". That name is confusing in an article that discusses several XML schema languages. Throughout this article, therefore, "XML schema" refers to a schema written in any XML schema language, while "W3C XML Schema" refers specifically to the W3C-defined language.


Overview

Though a number of schema languages are available, the three primary ones are Document Type Definitions (DTDs), W3C XML Schema, and RELAX NG. Each has its own advantages and disadvantages.


This article also briefly surveys some of the minor schema languages.


The primary purpose of a schema language is to specify what the structure of an XML document can be: which elements may appear inside which other elements, which attributes are legal on a particular element, and so forth. A schema is roughly equivalent to a grammar for a language; it defines the vocabulary of the language and what counts as a valid "sentence".


Document Type Definitions

Document Type Definition (DTD), defined slightly differently within the XML and SGML specifications, is one of several SGML and XML schema languages, and is also the term used to describe a document or portion thereof that is authored in the DTD language.

Advantages

DTDs are perhaps the most widely supported schema language for XML. Most XML processors will understand a DTD even if they do not validate against it.


Of the three primary languages, DTDs are the only one that can be defined inline; that is, the DTD can be embedded directly in the document it describes.
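
As a minimal sketch of an inline DTD, the document below carries its own declarations in the internal subset of its DOCTYPE. The note vocabulary (note, to, body, priority) is hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- The DOCTYPE below carries the DTD in its internal subset. -->
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note priority (low | high) "low">
]>
<note priority="high">
  <to>Editor</to>
  <body>Schema and instance travel in one file.</body>
</note>
```

A validating processor can check this document without fetching any external file.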


DTDs can define more than the content model. They can also define entities, named pieces of replacement text that the document can then reference, much as a C or C++ preprocessor has #defines that are used internally.
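
A short sketch of such a definition, assuming a hypothetical memo document and entity name: the DTD declares a general entity, and the document body refers to it.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!-- A general entity: replacement text that the document can reference. -->
  <!ENTITY publisher "Example Press">
]>
<memo>Printed by &publisher;.</memo>
```

A processor that reads the DTD sees the text "Printed by Example Press."; one that ignores the DTD cannot resolve the reference.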


The DTD language is compact and highly readable, though it does require some experience to understand.


Disadvantages

The primary disadvantage of DTDs is their lack of specificity. Their content models are very basic, particularly compared to those of the other two languages.


Overuse of DTD-defined entities may make a document illegible or incomprehensible without the associated DTD. Additionally, several XML processors, typically for ease of implementation, do not understand DTDs, so they will not recognize any DTD-defined entities that a document uses.


The language that DTDs are written in is not XML, so DTDs cannot use the various frameworks that have been built around XML. For example, XML editors that support writing DTDs must parse an additional language to do so.


The DTD concept for XML was borrowed from SGML. As a result, the construct could not be changed when XML was extended with namespaces, and DTDs are namespace-unaware.


DTDs offer limited support for defining the type of the contained data. They are primarily structural in nature and cannot specify that an element contains an integer, a real number, a date, or anything of that nature.


W3C XML Schema

Main article: XML Schema (W3C)

XML Schema, published as a W3C Recommendation in May 2001, is one of several XML schema languages.

Advantages over DTDs

Compared to DTDs, W3C XML Schemas are exceptionally powerful. They provide much greater specificity than DTDs can, are namespace-aware, and support data types.


W3C XML Schema is written in XML itself, and therefore has a schema of its own (appropriately, written in W3C XML Schema).


W3C XML Schema has a large number of built-in and derived data types. These are specified by the W3C XML Schema specification, so all W3C XML Schema validators and processors must support them.
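
As a brief sketch of the built-in types in use, the schema below types three child elements with xs:positiveInteger, xs:decimal, and xs:date. The order vocabulary is hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- Each child is checked against a built-in simple type. -->
        <xs:element name="quantity" type="xs:positiveInteger"/>
        <xs:element name="price"    type="xs:decimal"/>
        <xs:element name="shipDate" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema, a quantity of "three" or a shipDate of "tomorrow" would fail validation, something a DTD cannot express.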


Due to the nature of the schema language, after an XML document is validated, the entire XML document, both content and structure, can be expressed in terms of the schema itself. This functionality, known as the Post-Schema-Validation Infoset (PSVI), can be used to transform the document into a hierarchy of typed objects that can be accessed in a programming language through a neutral interface.


Commonality with RELAX NG

Both RELAX NG and W3C XML Schema allow for similar mechanisms of specificity. Both allow for a degree of modularity, to the point that a schema can be split across multiple files. And both are, or can be, written in an XML syntax.


Advantages over RELAX NG

RELAX NG lacks any analog to PSVI.


Additionally, RELAX NG has slightly poorer specificity in certain respects. For example, it is not possible to define a specific number or range of repetitions of patterns under RELAX NG; under W3C XML Schema, this is possible.
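
A minimal sketch of a repetition range in W3C XML Schema, assuming a hypothetical team element that must hold between two and five members:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="team">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs and maxOccurs bound the number of repetitions. -->
        <xs:element name="member" type="xs:string"
                    minOccurs="2" maxOccurs="5"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

RELAX NG has no occurrence counters, so the same constraint has to be written out by hand, for example as two required member patterns followed by three nested optional ones.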


Also, RELAX NG cannot supply default values for an element's attributes, while W3C XML Schema can.
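
The following sketch shows a default attribute value in W3C XML Schema. The item element and its currency attribute are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <!-- If the document omits currency, the validator supplies "USD". -->
          <xs:attribute name="currency" type="xs:string" default="USD"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance written as an item element with no currency attribute is then reported to the application as if currency="USD" had been present.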


RELAX NG has only two built-in data types (string and token), while W3C XML Schema has far more.


W3C XML Schema has a formal mechanism for attaching a schema to an XML document.
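
A sketch of that attachment, assuming a document with no namespace and a hypothetical schema file named order.xsd:

```xml
<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="order.xsd">
  <quantity>3</quantity>
</order>
```

For documents that use a namespace, the xsi:schemaLocation attribute plays the same role, pairing each namespace URI with a schema location. Processors treat both attributes as hints and may locate schemas by other means.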


Disadvantages

W3C XML Schema does not allow for the definition of which element, or elements, may be valid root elements of a document. As such, every globally declared element in the schema can be a root element. A well-defined list of root elements (and, equally importantly, of elements that cannot be roots) is a useful feature for a schema, particularly for languages that are intended to be split into multiple files and included from one to another. It allows the separate parts to be validated easily while rejecting documents whose root element is not permitted.
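
For contrast, a RELAX NG grammar names its permitted roots in a start pattern. The sketch below uses a hypothetical book and chapter vocabulary.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Only "book" may be the root; "chapter" is valid only inside it. -->
  <start>
    <ref name="book"/>
  </start>
  <define name="book">
    <element name="book">
      <oneOrMore>
        <ref name="chapter"/>
      </oneOrMore>
    </element>
  </define>
  <define name="chapter">
    <element name="chapter"><text/></element>
  </define>
</grammar>
```

A document whose root is chapter is rejected by this grammar. The equivalent W3C XML Schema with two global element declarations would accept it.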


It is difficult to define that a particular set of elements may occur in any order relative to one another. This is a valuable construct, particularly for formats pertaining to documents.
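
The closest construct is xs:all, sketched here for a hypothetical person element:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <!-- Children may appear in any order. -->
      <xs:all>
        <xs:element name="name"  type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The difficulty lies in the restrictions: in the 1.0 Recommendation, each child of xs:all may appear at most once, and xs:all cannot be nested inside or combined with xs:sequence or xs:choice. A content model such as "any number of phone elements mixed freely with name and email" therefore cannot be written this way.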


While being written in XML is an advantage, it is also a disadvantage in some ways. The W3C XML Schema language in particular can be quite verbose, while a DTD can be terse and relatively easily editable.


W3C XML Schema does not reproduce most of the DTD's ability to supply entities to a document. While this is technically a deficiency by comparison, it also avoids the problems that this ability can create, which can be counted as a strength.


While W3C XML Schema's ability to add default attributes to elements is an advantage, it is a disadvantage in some ways as well. It means that an XML file may not be usable in the absence of its schema, even if the document would validate against that schema. In effect, all users of such an XML document must also implement the W3C XML Schema specification, which rules out minimalist or older XML parsers. It can also dramatically slow down processing of the document, as the processor may have to download and process a second XML file (the schema).


Though W3C XML Schema has an extensive set of built-in data types, it provides no mechanism for the user to add new primitive types; new types can only be derived from the existing ones, by restriction, list, or union. This means that a user who needs a type that cannot be expressed this way will need to provide post-schema-validation services for that kind of data.


RELAX NG

Main article: RELAX NG

In computing, RELAX NG (REgular LAnguage for XML Next Generation) is a schema language for XML, based on Murata Makoto's RELAX and James Clark's TREX. A RELAX NG schema specifies a pattern for the structure and content of an XML document.

Advantages over W3C XML Schema

RELAX NG provides most of the advantages that W3C XML Schema has over DTDs.


The primary advantage of RELAX NG over W3C XML Schema is its compact form. While the language can be written in XML, RELAX NG also has an equivalent syntax that is much more like a DTD, but with greater specifying power. A schema can be converted between the two forms with no loss of features or even of comments.


RELAX NG provides very strong support for unordered content. That is, it allows the schema to state that a sequence of patterns may appear in any order.
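
A minimal sketch of unordered content, using the interleave pattern on a hypothetical person element:

```xml
<element name="person" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- The three patterns below may be mixed in any order. -->
  <interleave>
    <element name="name"><text/></element>
    <element name="email"><text/></element>
    <zeroOrMore>
      <element name="phone"><text/></element>
    </zeroOrMore>
  </interleave>
</element>
```

Unlike the children of an unordered group in W3C XML Schema, an interleaved pattern can itself repeat, so any number of phone elements may appear before, between, or after name and email.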


RELAX NG also allows for non-deterministic content models. What this means is that RELAX NG allows the specification of a sequence like the following:

 <zeroOrMore>
   <ref name="odd"/>
   <ref name="even"/>
 </zeroOrMore>
 <optional>
   <ref name="odd"/>
 </optional>

When the validator encounters something that matches the "odd" pattern, it cannot tell, without looking ahead at the data, whether this is the optional last "odd" reference or simply one in the zeroOrMore sequence. RELAX NG allows this kind of specification. W3C XML Schema requires all of its sequences to be fully deterministic, so content models like the one above must either be specified in a different way or omitted altogether.


RELAX NG allows attributes to be treated as elements in content models. In particular, this means that one can provide the following:

 <element name="some_element">
   <choice>
     <attribute name="has_name">
       <value>false</value>
     </attribute>
     <group>
       <attribute name="has_name">
         <value>true</value>
       </attribute>
       <element name="name"><text/></element>
     </group>
   </choice>
 </element>

This block states that the element "some_element" must have an attribute named "has_name". This attribute can only take true or false as values, and if it is true, the first child element of the element must be "name", which stores text. If "name" did not need to be the first element, then the choice could be wrapped in an "interleave" element along with other elements. The order of the specification of attributes in RELAX NG has no meaning, so this block need not be the first block in the element definition.


W3C XML Schema cannot specify such a dependency between the content of an attribute and child elements.


RELAX NG's specification only lists two built-in types (string and token), but it allows for the definition of many more. In theory, the lack of a specific list allows a processor to support data types that are very problem specific. In practice, most RELAX NG processors support the W3C XML Schema set, though some support user-defined types and type validation as well.
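
As a sketch of the common case, a RELAX NG schema can borrow the W3C XML Schema datatypes by naming that library in a datatypeLibrary attribute. The order vocabulary here is hypothetical.

```xml
<element name="order" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- The data pattern refers to a type from the named library. -->
  <element name="quantity">
    <data type="positiveInteger"/>
  </element>
  <element name="shipDate">
    <data type="date"/>
  </element>
</element>
```

Pointing datatypeLibrary at a different URI selects a different library, provided the processor in use supports it.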


Most RELAX NG schemas can be algorithmically converted into W3C XML Schemas and even DTDs (except when using RELAX NG features not supported by those languages, as above). The reverse is not true. As such, RELAX NG can be used as a normative version of the schema, and the user can convert it to other forms for tools that do not support RELAX NG.


Disadvantages

Most of RELAX NG's disadvantages are covered under the section on W3C XML Schema's advantages over RELAX NG.


Schematron

Main article: Schematron

Schematron is an unusual schema language. Unlike the three primary languages, it describes a valid XML file as a list of XPath-based rules. If the document passes these rules, then it is valid.


Advantages

Because of its rule-based nature, Schematron's specificity is very strong. It can require that the content of an element be controlled by one of its siblings. It can also request or require that the root element, regardless of what element that happens to be, have specific attributes.
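
Both kinds of rule can be sketched as follows, assuming the ISO Schematron namespace and a hypothetical vocabulary (a version attribute, and a range element with min and max children):

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <!-- Applies to the root element, whatever its name is. -->
    <rule context="/*">
      <assert test="@version">The root element must carry a version attribute.</assert>
    </rule>
    <!-- The content of one element is constrained by a sibling. -->
    <rule context="range">
      <assert test="number(min) &lt;= number(max)">min must not exceed max.</assert>
    </rule>
  </pattern>
</schema>
```

Each assert holds an XPath expression that must evaluate to true for every node matched by the rule's context; the text inside the assert is the message reported on failure.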


Disadvantages

Schematron's weakness is in specifying basic structure. While it is good at relational constructs, using it to specify the basic structure of a document, that is, which elements can go where, results in a very verbose schema.


The typical way to solve this is to combine Schematron with RELAX NG or W3C XML Schema. Several schema processors for both languages support this combined form. This allows Schematron rules to add further constraints on top of the structure defined by the W3C XML Schema or RELAX NG schema.


Namespace Routing Language (NRL)

This is not technically a schema language. Its sole purpose is to direct parts of documents to individual schemas based on the namespace of the encountered elements. An NRL file is merely a list of XML namespaces, each paired with the path to the schema it corresponds to. This allows each schema to be concerned with only its own language definition, while the NRL file routes the schema validator to the correct schema file based on the namespace of each element.


This XML format is schema-language agnostic and works for just about any schema language.
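
A hedged sketch of such a routing file follows. The element names and namespace URI are given as they are commonly cited from the NRL specification and should be checked against it; the schema file names are hypothetical.

```xml
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <!-- Elements in the SOAP envelope namespace go to a W3C XML Schema. -->
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <!-- Elements in the XHTML namespace go to a RELAX NG schema. -->
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
</rules>
```

Note that the two target schemas are written in different schema languages, which is the sense in which the format is schema-language agnostic.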


The Wikipedia article included on this page is licensed under the GFDL.