XML puts the X in text

Feature

XML is a new, improved meta-language that enables users to make web and other documents look just the way they want them to

Extensible mark-up language is used primarily as a web creation tool. It allows users to set standards defining what information should be included in a document and in what order. Used correctly in combination with other standards, it allows users to define the content of the document separately from its formatting, which means you can re-use that content in other applications, or in different formats of presentation.

XML works because it is very, very simple. It provides a basic syntax that can share information between various computers and different applications, without the need to convert the file format.

Clearly targeted at web users, XML has two strengths. The first is that it provides a syntax for document mark-up (like HTML). The second is that it provides a syntax for declaring the structure of documents.

Unlike HTML, which is a fixed set of tags, XML is a meta-language: a language for defining other mark-up languages. It is a subset of the Standard Generalised Markup Language (SGML), and its specification is controlled by the World Wide Web Consortium (W3C). XML developed as a result of organisations needing to produce very large volumes of documents quickly. It has a smaller and simpler syntax than SGML.

SGML is a bulky language, whereas XML is a leaner meta-language that allows users to define their own document mark-up. Within HTML, users cannot alter or extend the fixed set of tags, such as <HEAD> and <BODY>. XML allows you to define tags according to your own needs. For example, you could have <Heading1> or <Quote>. The structure of these elements is defined through document type definitions (DTDs) and their appearance through style sheets, both of which can be applied to as many XML documents as the user chooses.
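
As a rough illustration of user-defined tags, the sketch below parses a small document that uses <Heading1> and <Quote> with Python's standard xml.etree.ElementTree module. The <Article> wrapper and the sample text are assumptions made for the example, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are the author's own choice.
ARTICLE = """<Article>
  <Heading1>XML puts the X in text</Heading1>
  <Quote>XML works because it is very, very simple.</Quote>
</Article>"""

root = ET.fromstring(ARTICLE)

# List every element name the parser found, in document order.
tag_names = [element.tag for element in root.iter()]
print(tag_names)  # ['Article', 'Heading1', 'Quote']
```

The parser needs no prior knowledge of these tag names; it only requires that the mark-up follows XML's syntax rules.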

Many XML applications currently support CSS. However, XML does have its own stylesheet language: Extensible Stylesheet Language (XSL). The aim of XSL is that XML documents will appear as the designer intended, no matter what browser or platform they are being viewed on.

Composing XML documents

XML documents are made up of elements. An element consists of two tags. The first places the name of the element between a less-than sign (<) and a greater-than sign (>). The second is identical except for the forward slash (/) that appears before the element name. As in HTML, text between the opening and closing tags is considered part of the element and is formatted according to the rules set up for that element.
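
The anatomy of an element can be seen by building one and writing it out. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the element name and its text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Build one element: a name plus the text that sits between its two tags.
quote = ET.Element("Quote")  # hypothetical element name
quote.text = "Simple, but effective"

# Serialising shows the opening tag, the text and the closing tag.
serialised = ET.tostring(quote, encoding="unicode")
print(serialised)  # <Quote>Simple, but effective</Quote>
```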

With XML, it's vital for users to unlearn bad habits picked up while using HTML. While an HTML browser can ignore simple errors in HTML code, an XML parser will not. For this reason attribute values must be in quotation marks, every element must have both an opening and a closing tag, and tags must be nested correctly.
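
The three rules above can be tried out against a real parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample fragments are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical fragments, one per rule named in the text.
samples = {
    "quoted attribute": '<Quote source="editor">Hello</Quote>',
    "unquoted attribute": "<Quote source=editor>Hello</Quote>",
    "missing closing tag": "<Page><Quote>Hello</Page>",
    "badly nested tags": "<Page><Quote>Hello</Page></Quote>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first fragment is accepted. An HTML browser would probably display all four without complaint.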

A complete XML publication is made up of three components, of which only the first is required for a well-formed document (i.e. a document that XML parsers will recognise correctly):

The document. This is a file containing the document data, i.e. a tagged document with XML elements, some of which contain attributes.

The stylesheet. This determines how a document will be formatted when it is viewed, regardless of the application it is viewed in. Bear in mind that you can have several stylesheets for one document, depending on the environment it is used in.

Document Type Definition (DTD). This specifies rules for how the elements, attributes and other data in an XML document are defined and related to one another. Checking a document against its DTD is known as validation. Validation is a very important aspect of authoring XML and can be performed at any step in processing, because developers have to screen documents to check their structure. Applications which need to process a large amount of information quickly, or don't have the additional processing requirements imposed by validation, can stick to well-formed documents. These are also an easy way to get started with XML.
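
To make the idea of a DTD concrete, here is a sketch of a document that carries its own small DTD. The DTD, the tag names and the follows_article_rule helper are all hypothetical. Python's standard xml.etree.ElementTree parser checks well-formedness only, so the hand-written function below merely stands in for what a validating parser would do with the first rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD: an Article is one Heading1 followed by one or more Quotes.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE Article [
  <!ELEMENT Article (Heading1, Quote+)>
  <!ELEMENT Heading1 (#PCDATA)>
  <!ELEMENT Quote (#PCDATA)>
]>
<Article>
  <Heading1>Well-formed documents</Heading1>
  <Quote>Validation can be performed at any step.</Quote>
</Article>"""

def follows_article_rule(root):
    """Hand-written stand-in for validation against the Article rule."""
    tags = [child.tag for child in root]
    return (
        root.tag == "Article"
        and len(tags) >= 2
        and tags[0] == "Heading1"
        and all(tag == "Quote" for tag in tags[1:])
    )

root = ET.fromstring(DOCUMENT)
print(follows_article_rule(root))  # True
```

A document that put the Quote before the Heading1 would still be well-formed, but it would fail this check, which is the distinction the text draws between well-formed and valid documents.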

Well-formed documents

By using well-formed documents, which contain just the basic XML syntax and no DTD, developers can create parseable documents which they can then extend to use more formal DTDs when the need arises.

If the authoring tools do their job, the XML itself should not be visible to the people writing documents. For most users, it will hide behind tools. Once a standard DTD has been created, users won't need to create their own; they will simply be able to apply it to their documents, making modifications according to the structure of each document.

Simple, but effective code

XML relies on a small set of rules which are easy to comprehend and readable by both humans and computers. This means that developers should find it fairly simple to get to grips with the basics. DTDs can be developed from scratch or based on the structure of documents that seem to work well. The parsers are also simple to build.

XML documents are founded upon a core set of basic, nested structures. While these structures can become very complicated as multiple layers of details are added to them, the mechanisms that underlie them require very little implementation effort, and structures do not have to change when the document objects change.
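
The point about nesting can be sketched in a few lines. The site map below and the outline function are invented for the example; the same short recursive routine handles the document however many layers are added.

```python
import xml.etree.ElementTree as ET

# Hypothetical site map with sections nested inside sections.
SITE_MAP = """<Site>
  <Section name="News">
    <Page>Headlines</Page>
    <Section name="Archive">
      <Page>1998</Page>
    </Section>
  </Section>
</Site>"""

def outline(element, depth=0):
    """Return one indented line per element, however deep the nesting goes."""
    label = element.get("name") or (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + label if label else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(SITE_MAP))))
```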

The X in text

The X stands for extensible. XML grows with the developer's experience, because developers can use it to create their own DTDs, effectively creating extensible tag sets which are then available for use in multiple applications.

XML provides a core standard around which other standards will grow, and it is constantly being extended with additional standards that add linking, referencing and style abilities to the core.

XML was designed to be extended! As a meta-language - that is, a language for defining the rules of other mark-up languages - it is made to grow, and DTDs make sure it does just that.

XML also works with many of the standards used with HTML, such as the Hypertext Transfer Protocol (HTTP) and CSS. The W3C is still developing other standards specifically for use with XML.

Multiple platforms

Unlike HTML, which appears differently according to which piece of software you use to view documents, XML can be interpreted with a wide range of tools and on a variety of platforms. The document structure is consistent, which means that parsers can be built to interpret it at low cost. XML complements Java well and many of the developments in XML have been made in Java. Parsers are also available in C++, C, JavaScript, Tcl and Python, and through a generic application-programming interface (API). These developments have so far utilised freeware plug-ins, which of course lower the cost of building XML-enabled applications.

Developing standards

The development of standards for XML is a continuing process. Details are freely available on the W3C site. XML is supported by the XML Working Group, which develops the supporting standards and publishes them once completed.

XML documents have an openness about them, alien to their binary counterparts. While companies may wish to disable this aspect of XML by encrypting their data (to protect their application development), this would also lose many of XML's benefits. XML does allow proprietary formats, but its openness is a core strength of the language.

Most authors find XML a fairly simple language to learn because it shares many features with SGML. Designed to be used over the Internet, XML will most likely find its chief function on the Web. This doesn't mean that XML won't be used in other applications; it has the potential to become a multi-application standard language that will tie together many of our systems.

XML and CSS make marking up pages for presentation on the Internet easy. HTML alone cannot cope with the requirements of multiple platforms and is slow for developers building large sites. However, using CSS, users can set the style of repeated elements once and thus save themselves time designing repetitive pages. The formatting information in the stylesheet links to the XML tags in the document, which allows documents to be marked up based on content rather than requiring precise formatting. The use of style sheets should also lead to a reduction in bandwidth used because CSS centralises the formatting information required for the site.
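
The link between tags and formatting can be sketched without a real stylesheet language. In the example below two plain Python dictionaries stand in for stylesheets; they are not CSS or XSL, and every name in the code is an assumption made for illustration. The same content is rendered twice, once per "stylesheet", without being changed.

```python
import xml.etree.ElementTree as ET

# Content marked up by meaning only; no formatting appears in it.
CONTENT = "<Article><Heading1>Price cuts</Heading1><Quote>Modems are cheaper.</Quote></Article>"

# Two hypothetical "stylesheets": each maps a tag name to a text template.
PLAIN_TEXT_STYLE = {"Heading1": "{}\n", "Quote": '  "{}"\n'}
HTML_STYLE = {"Heading1": "<h1>{}</h1>\n", "Quote": "<blockquote>{}</blockquote>\n"}

def render(xml_text, style):
    """Format each child element with the template its tag name selects."""
    root = ET.fromstring(xml_text)
    return "".join(style[child.tag].format(child.text) for child in root)

print(render(CONTENT, PLAIN_TEXT_STYLE))
print(render(CONTENT, HTML_STYLE))
```

Because the formatting rules live in one place, changing the look of every <Quote> on a site would mean editing a single entry rather than every page.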

Another advantage with XML is the ability to re-use page contents. If the user has appropriate supporting software, they could extract XML data from a page and keep it on their hard drive, so capturing information on the site for their use (e.g. price lists and site maps).
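
As a sketch of this kind of re-use, the code below pulls a price list out of a page's XML with Python's standard xml.etree.ElementTree module. The tag names, the products and the prices are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical price list as it might be delivered inside a page.
PAGE = """<PriceList>
  <Item><Name>Modem</Name><Price>49.99</Price></Item>
  <Item><Name>Monitor</Name><Price>199.00</Price></Item>
</PriceList>"""

def extract_prices(xml_text):
    """Return a dictionary of product name to price taken from the XML."""
    root = ET.fromstring(xml_text)
    return {
        item.findtext("Name"): float(item.findtext("Price"))
        for item in root.iter("Item")
    }

prices = extract_prices(PAGE)
print(prices)  # {'Modem': 49.99, 'Monitor': 199.0}
```

Once extracted, the data could be saved locally or fed into another application without any knowledge of how the page was laid out.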

XML is also search-engine efficient, because it enables data categorisation: searching within a tagged category is much faster than context-based searching.
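
The difference between the two kinds of search can be sketched as follows. The catalogue and its tag names are hypothetical, and the example shows the gain in precision rather than measuring speed.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue in which "Brown" appears in two different roles.
CATALOGUE = """<Catalogue>
  <Book><Title>Brown Bears</Title><Author>Smith</Author></Book>
  <Book><Title>Gardening</Title><Author>Brown</Author></Book>
</Catalogue>"""

root = ET.fromstring(CATALOGUE)
books = list(root.iter("Book"))

# Plain text search: any book that mentions the word anywhere.
text_hits = [b.findtext("Title") for b in books if "Brown" in "".join(b.itertext())]

# Category search: only books whose Author element is "Brown".
author_hits = [b.findtext("Title") for b in books if b.findtext("Author") == "Brown"]

print(text_hits)    # ['Brown Bears', 'Gardening']
print(author_hits)  # ['Gardening']
```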

Conclusion

XML is a web language for the future. However, it is also useful for other applications because it facilitates data transfer in a universal format. Provided applications can parse XML, they can share structured information between platforms and applications.

XML is a meta-language which provides a set of standards developers use to develop their own standards. The culture of openness surrounding these standards facilitates the development of new data management systems that are both simple and effective.

Rachel Hodgkins


This was first published in August 1999

 
