Checking the form on file formats

Ensuring data is widely accessible is a key concern for all businesses, says Jack Schofield


There are three ages of computing: the age of hardware, the age of software, and the age of data. It is data compatibility that is now becoming most important.

In the first age of computing, it was the choice of hardware that ultimately drove the business – were you an IBM or DEC shop? In the second age, hardware brands generally became less important than the operating system and applications. Businesses chose Unix or Microsoft Windows, DB2 or Oracle, Lotus Notes or Novell Groupwise.

Today, there is no doubt that IT is starting to revolve around data and file formats. Which is not to claim that this is a new idea or that data formats have not been important before. Of course they have – look at the impact of SQL. But formats are becoming a central concern for a number of reasons.

First, data is now horribly expensive to create and store, while hardware and software have become relatively cheap.

Second, data needs to be accessed more widely by more devices: it is online and on the desktop, as well as on the server; you may need to access it from a mobile phone or PDA, or from several different applications, not just a VT100 terminal.

Third, regulatory changes on data retention and protection, and on long-term access to that data, mean businesses have to take these problems seriously.

The point is that multiple applications from multiple suppliers should be able to generate compatible files. Microsoft’s response includes its Office Open XML file formats, which are being standardised through the industry association Ecma International. These will be the default when Office System 2007 launches next January.

Again, other firms will be able to produce compatible files – Novell has already produced a prototype that does so. And, yes, Microsoft really wants you to convert billions of old .doc files, and will provide batch programs to do it.

Microsoft has also provided PDF support in the beta version of Office 2007, though this has led to a spat with Adobe, which claims to be worried about Microsoft embracing and extending the standard.

On the web, Yahoo has gone further by including microformats such as hReview, hCard and hCalendar (Computer Weekly, 24 January).

I believe microformats based on XHTML are going to be important in making data accessible over the internet – that means re-usable data in standard fields. Microsoft is also aware of the issues, and it would be nice if Google woke up.
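To make "re-usable data in standard fields" concrete, here is a minimal sketch in Python of how a program might pull contact details out of a page marked up with hCard. The XHTML fragment, the person and company in it, and the helper name extract_hcard are all invented for illustration; the class names fn, org and email are hCard property names. It assumes well-formed XHTML and handles only these three flat fields, so treat it as an outline rather than a full microformats parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment using hCard class names (illustrative data only).
SNIPPET = """
<div class="vcard">
  <span class="fn">Jane Example</span>
  <span class="org">Example Ltd</span>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
</div>
"""

# The small subset of hCard properties this sketch looks for.
FIELDS = ("fn", "org", "email")


def extract_hcard(xhtml):
    """Return a dict of hCard fields found in a well-formed XHTML fragment."""
    root = ET.fromstring(xhtml.strip())
    card = {}
    for element in root.iter():
        # A class attribute may hold several space-separated names.
        for name in element.get("class", "").split():
            if name in FIELDS and name not in card:
                # Join all nested text so inline markup does not split a value.
                card[name] = "".join(element.itertext()).strip()
    return card


print(extract_hcard(SNIPPET))
```

Because the field names are agreed in advance, any application that knows them can read the same page without a custom scraper for each site, which is the sense in which the data becomes re-usable.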

It is a pretty safe bet that every established business has run into the problem of needing access to data in old files it can no longer read because either the hardware or software no longer exists. That is normal. But looking ahead, it won’t be, because it can’t be.

Jack Schofield is computer editor at The Guardian
