Book review: Spectra Logic's Society's Genome

Written by Spectra Logic CEO Nathan Thompson with product specialist Bob Cone and software developer John Kranz, Society’s Genome – Genetic diversity in digital preservation makes the case for “genetic diversity” in data protection.

The premise of Society’s Genome is a weighty one: that the very survival of society as we know it depends on the preservation of its data, and that data, along with the information and knowledge we gain from it, is “society’s genome”, the “recipe” by which modernity is propagated.

For the author, data must be preserved against “far-reaching and complex” threats, and “[e]nsuring that this data can be retrieved by future generations is one of the greatest priorities of our time.”

To emphasise this, the author reminds us that both the Roman scholar Ptolemy’s Geography and the ancient Sanskrit Bakhshali Manuscript, a mathematical encyclopedia, were lost for several centuries.

With these reminders of former “dark ages”, Thompson asks, “How much more dangerous would it be then to lose substantive geographic data in the age of GPS?”

The strategy proposed for avoiding such a new dark age rests on “genetic diversity” in data preservation, drawing on insights from the natural world that can be applied to counter contemporary threats to data.

Citing examples that include the British royal family’s history of cousin marriage and resulting haemophilia, as well as the genetic narrowness of the lumper potato, the staple crop whose failure contributed to the Irish famine of the 1840s, the author argues that business IT is likewise heading towards a lack of “genetic” diversity.

Namely, while data volumes grow exponentially, the locations and media that store that data are becoming concentrated in fewer datacentres and on fewer media types.

“In a way,” says Thompson, “consolidation and standardisation have made enterprise disk drives a modern-day lumper potato.”

The answer, in short, says Spectra Logic’s CEO, is to diversify data storage across a variety of media: a combination of disk, tape and optical.

Citing IDC figures showing that 47% of the world’s data that should be secured has no security at all, the book goes on to assert that “The question is not if – but when – a data catastrophe will occur,” and that services vital to society, such as communications, energy generation, transport, defence, commerce and healthcare, are at stake.

To sum up, the argument runs as follows: data and information are vital to human society, so a catastrophic loss of that data would threaten our existence as we know it. Nature ensures survival through genetic diversity, and the same principle can be applied to data protection. What is needed, therefore, is a variety of storage media to keep data safe.

The bulk of the book provides a very readable account of the growing volume and importance of data driven by the internet of things, social media and mobile; the importance of data to the corporate world; the increasingly ingenious ways in which data is interrogated; and the threats it faces, including natural disasters, hacking and cyber war.

When it comes to practicalities, Society’s Genome starts with four principles: diversity of protocol, of media, of volatility and of geography.

Diversity of protocol dictates that data copies should be held in different formats so that a threat to one is not necessarily a threat to other copies.

When discussing diversity of media, Thompson says, “A complete copy of disk data stored on a secondary disk (preferably using a different storage protocol), tape, or optical media offers the best chance of data survival against malicious threats or human error.”

The book goes on to quote Google’s Raymond Blum explaining the web giant’s use of tape, “Tape is great because it is not disk,” and favourably contrasts tape’s lifecycle (eight years) with that of disk (three years).

Diversity of volatility appears to refer to a medium’s susceptibility to catastrophic power surges, such as those caused by an electromagnetic pulse attack. Here again tape comes out well, as does optical media, owing to the built-in “air gap” between the media and any network that might carry a threat.

Finally, Thompson deals with geographic separation and concludes that while so-called tectonic separation may be necessary for some organisations, for others, perhaps smaller ones, the likelihood that they could not carry on at all after a major event makes world-scale redundancy of data stores unnecessary.

So, how does it all stack up?

Well, it’s a well-written and readable book. It is well researched and contains many chapters of interesting examples, drawn not just from storage and IT but from a wide sweep of human history.

But at the same time one can’t help feeling that the whole endeavour is an exercise in “top-of-the-funnel” marketing, and I doubt Spectra Logic executives would deny that.

So one reads certain sections, especially those that trumpet the advantages of tape – albeit alongside other media – in the spirit of “Well, they would say that, wouldn’t they?”, given that Spectra Logic is heavily invested in the tape storage market.

Having said that, the book frequently mentions the suitability of optical media for long-term storage, and as far as I know Spectra Logic has no direct investment in that space.

What is obviously missing, however, from a book with ambitions to cover a wide sweep of history and act as a guide to the future, is any discussion of next-generation storage technologies. Here I am thinking of holographic, DNA and quantum storage, potential game changers that still await crucial breakthroughs in various fields.

In this sense, the practical conclusions and technological focus of Society’s Genome do not look forward; they are firmly rooted in contemporary or, one might argue, last-generation technologies.