IBM is working on ways to make XML documents and data easier to pull into its content management software, and to index and search the data once it is there.
Jim Reimer, chief architect of content management at IBM, said the initiative, code-named Cinnamon, covers the handling of XML documents, including tasks such as ingesting them automatically.
So far, the work on XML handling has focused on receiving documents written against different document type definitions (DTDs) and schemas and, in effect, mapping them into rows in a database.
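As a rough illustration of what mapping documents from different DTDs into database rows can look like, the sketch below uses Python's standard xml.etree.ElementTree and sqlite3 modules. It is not IBM's implementation. The document types, element names, table layout, and the MAPPINGS dictionary are all hypothetical, chosen only to show two differently structured documents landing in the same table.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mappings, one per document type: column name -> element path.
# Each entry stands in for the knowledge of one DTD or schema.
MAPPINGS = {
    "article": {"title": "headline", "author": "byline/name"},
    "memo": {"title": "subject", "author": "from"},
}


def to_row(xml_text):
    """Parse one document and flatten it into a (doc_type, title, author) row."""
    root = ET.fromstring(xml_text)
    mapping = MAPPINGS[root.tag]  # choose the mapping by root element name
    return (
        root.tag,
        root.findtext(mapping["title"]),
        root.findtext(mapping["author"]),
    )


def ingest(conn, documents):
    """Store every document as one row in a shared table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (doc_type TEXT, title TEXT, author TEXT)"
    )
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)", [to_row(d) for d in documents]
    )
    conn.commit()


# Two sample documents that follow different (made-up) structures.
ARTICLE = (
    "<article><headline>Q3 results</headline>"
    "<byline><name>A. Writer</name></byline></article>"
)
MEMO = "<memo><subject>Office move</subject><from>B. Manager</from></memo>"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    ingest(conn, [ARTICLE, MEMO])
    for row in conn.execute("SELECT doc_type, title, author FROM docs"):
        print(row)
```

In this sketch, supporting a new or revised document type means adding or editing one entry in MAPPINGS rather than changing the table, which is one simple way to picture the mapping step.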
"One way of gauging the completeness of a content management system is how rich a model you are able to manage for the way in which you are describing the content objects that are in the collection," he said.
"Content systems frequently have much more extensive description methods, like hierarchy and structure, like folders or folders in folders."
In Content Manager Version 8, its latest release, IBM extended what can be represented in a data collection, including the primitives, the data modeling services, and what can be expressed in an XML document, such as multi-valued attribute sets, arbitrary hierarchy, links, and relationships.
"The challenge, if you have such documents, is how to get them into CM and, second, how to deal with the landscape where you have evolving DTDs and schemas and different authors, writing in different DTDs and schemas, that are giving you content," Reimer explained.
The underlying technology aimed at dealing with evolving schemas is also being researched at IBM, in a project dubbed Clio, which is part of the company's overall eXperanto effort.
The next objective is to handle the automatic ingestion of documents, including all the parsing, extraction, and projection into the new data model.
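The parse, extract, and project steps can be pictured with a small Python sketch. It assumes a simplified stand-in for the richer data model mentioned earlier, with nested items and multi-valued attributes. The ContentItem class, the project function, and the sample document are hypothetical and do not reflect Content Manager's actual interfaces.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """Hypothetical content object: a kind, multi-valued attributes, child items."""
    kind: str
    attributes: dict = field(default_factory=dict)  # name -> list of values
    children: list = field(default_factory=list)    # nested ContentItem objects


def project(element):
    """Project a parsed XML element into the hierarchical model."""
    item = ContentItem(kind=element.tag)
    for child in element:
        if len(child) == 0:
            # Leaf element: extract its text as one value of a multi-valued attribute.
            value = (child.text or "").strip()
            item.attributes.setdefault(child.tag, []).append(value)
        else:
            # Element with sub-elements: keep it as a nested item (folder in folder).
            item.children.append(project(child))
    return item


def ingest(xml_text):
    """Parse the document text, then extract and project its contents."""
    return project(ET.fromstring(xml_text))


SAMPLE = """
<folder>
  <name>Contracts</name>
  <folder>
    <name>2003</name>
    <document>
      <title>Lease</title>
      <keyword>legal</keyword>
      <keyword>property</keyword>
    </document>
  </folder>
</folder>
"""

if __name__ == "__main__":
    root = ingest(SAMPLE)
    document = root.children[0].children[0]
    print(document.kind, document.attributes)
```

Repeated elements such as keyword become several values under one attribute name, and nested folder elements become nested items, so the hierarchy of the source document is kept rather than flattened.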
"It is very important, also, to be able to live with the evolution of those schemas," Reimer said.
Stephen O'Grady, an analyst at RedMonk, said that Cinnamon addresses a problem companies will face as they collect increasing numbers of XML documents.
"There is no question that having documents in XML will be advantageous to companies for a host of reasons," O'Grady said, estimating that major companies will need to address XML content management within about a year and a half.
Tom Sullivan writes for InfoWorld.