XML (Extensible Markup Language) is a markup language similar to HTML, but without predefined tags: you define your own tags to suit your needs. It is a W3C-recommended, general-purpose specification whose primary purpose is sharing data across different systems, such as over the Internet.
There are many languages based on XML; some examples are XHTML, MathML, SVG, XUL, XBL, RSS, and RDF. You can also create your own.
"Correct" XML (valid and well-formed)
For an XML document to be correct it must be a well-formed document, conforming to all of XML's syntax rules, and valid, conforming to a specific language's rules. An example of a document that is not well formed is one with an element that has an opening tag with no closing tag and is not self-closing.
Example
In the example below, we see a document in which a tag that isn't self-closing has no closing tag.
<message> <warning> Hello World <!--missing </warning> --> </message>
Now let's look at a corrected version of that same document:
<message> <warning> Hello World </warning> </message>
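To see the difference between the two documents programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` is an assumption made for illustration; it only reports whether the parser accepts the text and says nothing about validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser rejects the document, e.g. because a tag is never closed.
        return False


broken = "<message> <warning> Hello World <!--missing </warning> --> </message>"
fixed = "<message> <warning> Hello World </warning> </message>"

print(is_well_formed(broken))  # False: <warning> is never closed
print(is_well_formed(fixed))   # True
```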
To be valid, an XML document needs to conform to some semantic rules which are usually set in an XML schema or a Document Type Definition (DTD). A document that contains an undefined tag is invalid. For example, if we never defined the <warning> tag, the document above wouldn't be valid.
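The idea of an undefined tag can be illustrated with a rough sketch. This is not real DTD or schema validation (Python's standard library does not provide that); it only compares the element names in a document against a hypothetical set of allowed names, `ALLOWED_TAGS`, which stands in for the language definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the only element names our made-up language defines.
ALLOWED_TAGS = {"message", "warning"}


def undefined_tags(xml_text: str, allowed: set[str]) -> set[str]:
    """Return the element names used in xml_text that are not in allowed."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return {element.tag for element in root.iter()} - allowed


doc = "<message> <warning> Hello World </warning> </message>"

print(undefined_tags(doc, ALLOWED_TAGS))  # set(): every tag is defined
print(undefined_tags(doc, {"message"}))   # {'warning'}: <warning> was never defined
```

A real validator also checks nesting, attributes, and content rules, so treat this only as a picture of what "undefined" means.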
Most browsers offer a debugger that can identify poorly formed XML documents.
Entities
Like HTML, XML provides entities for referring to reserved characters (such as the greater-than sign, which is used in tags). There are five predefined entities that you should know:
Entity | Character | Description |
---|---|---|
&lt; | < | Less than sign |
&gt; | > | Greater than sign |
&amp; | & | Ampersand |
&quot; | " | One double-quotation mark |
&apos; | ' | One apostrophe (or single-quotation mark) |
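In practice, these entities are usually produced by a library rather than typed by hand. As a small sketch, Python's standard `xml.sax.saxutils` module offers `escape()` and `unescape()`; by default they handle only `&`, `<`, and `>`, so the two quote entities are passed in explicitly through the `QUOTES` mapping (a name chosen here for illustration).

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; the quote entities are passed explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {entity: char for char, entity in QUOTES.items()}

raw = """5 < 6 & "six" > 'five'"""

escaped = escape(raw, QUOTES)
print(escaped)  # 5 &lt; 6 &amp; &quot;six&quot; &gt; &apos;five&apos;

# Reversing the mapping restores the original text.
restored = unescape(escaped, UNQUOTES)
print(restored == raw)  # True
```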
Even though there are only five predefined entities, more can be added using the document's Document Type Definition. For example, to create a new &warning; entity, you can do this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE body [
  <!ENTITY warning "Warning: Something bad happened... please refresh and try again.">
]>
<body>
  <message> &warning; </message>
</body>
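To check how such an entity is expanded, the document can be fed to a parser. The sketch below assumes Python's standard `xml.etree.ElementTree`, whose underlying parser expands entities declared in the internal DTD subset; other parsers may be configured to refuse DTDs for security reasons.

```python
import xml.etree.ElementTree as ET

# The same document as above, with the &warning; entity declared in the DOCTYPE.
doc = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE body [
  <!ENTITY warning "Warning: Something bad happened... please refresh and try again.">
]>
<body>
  <message> &warning; </message>
</body>"""

root = ET.fromstring(doc)

# The parser replaces &warning; with the text declared in the DTD.
message_text = root.find("message").text.strip()
print(message_text)
```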
You can also use numeric character references to specify special characters; for example, &#xA9; is the "©" symbol.
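Numeric character references can be written in hexadecimal (`&#xA9;`) or decimal (`&#169;`) form. As a quick sketch, again assuming Python's standard `xml.etree.ElementTree`, both forms decode to the same character:

```python
import xml.etree.ElementTree as ET

hex_ref = ET.fromstring("<p>&#xA9;</p>").text  # hexadecimal form
dec_ref = ET.fromstring("<p>&#169;</p>").text  # decimal form of the same code point

# Both references resolve to U+00A9, the copyright sign.
print(hex_ref == dec_ref == "\u00a9")  # True
```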
Displaying XML
XML is usually used for descriptive purposes, but there are ways to display XML data. If you don't define a specific way for the XML to be rendered, the raw XML is displayed in the browser.
One way to style XML output is to specify CSS to apply to the document using the xml-stylesheet processing instruction.
<?xml-stylesheet type="text/css" href="stylesheet.css"?>
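If the XML is generated by a program, the processing instruction can be added at that point. The following is a minimal sketch using Python's standard `xml.dom.minidom`; the sample document and the `stylesheet.css` file name are placeholders.

```python
from xml.dom import minidom

doc = minidom.parseString("<message><warning>Hello World</warning></message>")

# Build the processing instruction and place it before the root element.
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/css" href="stylesheet.css"'
)
doc.insertBefore(pi, doc.documentElement)

# Serialize the document; the instruction now precedes <message>.
styled_xml = doc.toxml()
print(styled_xml)
```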
A more powerful way to display XML is Extensible Stylesheet Language Transformations (XSLT), which can transform XML into other languages such as HTML. This makes XML incredibly versatile.
<?xml-stylesheet type="text/xsl" href="transform.xsl"?>
Recommendations
This article is only a very brief introduction to XML, with a few small examples and references to get you started. For more details, look for more in-depth articles on the Web.
Learning the HyperText Markup Language (HTML) will help you better understand XML.
See also
The Using XML article is a great resource for information on transforming XML and creating your own language.