The amount of information that is held throughout the company network has been growing ever since computers were introduced to the workplace.
Figures released by analyst firm IDC showed that network attached storage grew by more than half in 2005 compared with the preceding year.
This is a direct result of the growing amount of information that companies are now accumulating. As a company’s information keeps growing, IT needs to manage it, and the obvious response is to archive it.
Claus Egge, storage analyst at IDC, said: “There is an element of compliance involved; there is legislation where some firms have to ensure that certain information is never deleted. The increase in requirements to keep this information certainly means that storage requirements will increase.”
However, having spent money on ensuring that adequate storage is available, IT directors are left with one essential problem to solve: enabling end users to access the information. It is estimated that employees can spend up to 25% of their day searching for information they know exists but cannot find.
The task has grown from simply pulling basic data from a database to handling complex queries: searching for documents and for information held within documents, across the internet and intranets, and even delivering sound and images relating to a search.
Adding to the complexity of providing access to valuable information is the number of e-mails that now contain essential business information which must be kept and tracked.
It is clear that merely increasing storage capacity will not solve the problem. A report published in February this year by technology analyst company Taneja Group states: “By 2008, we believe that Information Classification and Management (ICM) will be a non-negotiable necessity for enterprise IT.”
Tools must be put in place that allow users to access the information they need. As the report puts it, “any next-generation response to today’s data tsunami absolutely requires some intelligence at its head in order to ensure a successful deployment.”
The first step to being able to retrieve information is knowing what is out there and where it is. It is important to have sensible policies on what is stored, for how long and where. For any information storage and retrieval system to work, archiving must be prioritised according to the importance of the information.
Services from storage specialists or consultancy companies such as Unisys and Incentra can help with implementing policies on which information should be kept, for how long, and exactly what storage that requires. Many IT managers may groan at the prospect of more storage requirements, but carefully managed storage need not mean hugely inflated budgets for future storage.
Tier-one, high-availability storage products should be used for business-critical data and information, while less essential information can be archived on cheaper tier-two products such as SATA disks. Finally, for information that needs to be stored for many years, there is always tape.
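The tiering rule above is simple enough to express as a policy function. The sketch below is a minimal illustration rather than any vendor’s product: the `Item` fields, the order in which the checks are made (criticality first, then retention period) and the seven-year tape threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    TIER_ONE = "tier-one high-availability storage"
    TIER_TWO = "tier-two SATA disk"
    TAPE = "tape"


@dataclass
class Item:
    """Hypothetical description of a piece of stored information."""
    name: str
    business_critical: bool
    retention_years: int  # how long policy says the item must be kept


# Hypothetical threshold: anything kept at least this long goes to tape.
LONG_TERM_YEARS = 7


def assign_tier(item: Item) -> Tier:
    """Map an item to a storage tier using a simple, assumed policy."""
    if item.business_critical:
        return Tier.TIER_ONE  # critical data stays on fast, resilient storage
    if item.retention_years >= LONG_TERM_YEARS:
        return Tier.TAPE      # long-term archive goes to cheap tape
    return Tier.TIER_TWO      # everything else goes to cheaper SATA disk


if __name__ == "__main__":
    for item in [
        Item("order database", business_critical=True, retention_years=10),
        Item("project files", business_critical=False, retention_years=2),
        Item("tax records", business_critical=False, retention_years=10),
    ]:
        print(f"{item.name}: {assign_tier(item).value}")
```

Checking criticality first means a business-critical record with a long retention period stays on tier-one storage; a real policy might instead move it to tape once it is no longer in active use.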
Based on policy, software tools can then be deployed that allow efficient storage and retrieval of information. A wide range of products is available, depending on the level of complexity you need. Electronic document management systems, such as that from Invu, enable businesses to retrieve documents by first scanning them into a system where they are automatically indexed and filed, and can then be searched by keyword. Through a single interface, users can view any documents returned by a search.
Alternatively, there are content management systems such as the Universal Content Manager from Stellent. The addition of Universal Records Management enables companies to apply records and retention schedules, as well as litigation holds, to content stored in repositories and applications.
Stellent has its own repositories, including Content Server and Imaging and Business Process Management, but the Universal Records Management agents also work with Microsoft’s SharePoint Server and Symantec’s Enterprise Vault, as well as Windows, Unix and Linux file servers.
Another comprehensive option is Njini’s ICM system, IAM Suite, which looks inside unstructured documents from desktop applications, such as spreadsheets and PDFs, and rates each file’s content and value against policies and keywords that have been set to define how documents should be classified. It can then control duplication, store documents on the appropriate storage tier, and manage access policies.
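The article does not describe how Njini detects duplicates, so the following is only a sketch of the generic content-hash technique: files whose bytes produce the same SHA-256 digest are treated as copies of one another. The function names, and the choice to measure the duplicate share as redundant copies divided by total files, are assumptions for illustration and may not match how any vendor reports duplication rates.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash; keep only groups with copies."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}


def duplicate_share(root: Path) -> float:
    """Fraction of files that are redundant copies (all but one in each group)."""
    files = [p for p in root.rglob("*") if p.is_file()]
    if not files:
        return 0.0
    redundant = sum(len(paths) - 1 for paths in find_duplicates(root).values())
    return redundant / len(files)


if __name__ == "__main__":
    # Build a throwaway "shared drive" containing one duplicated file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "report.xls").write_bytes(b"Q3 figures")
        (root / "copy of report.xls").write_bytes(b"Q3 figures")
        (root / "notes.doc").write_bytes(b"meeting notes")
        for paths in find_duplicates(root).values():
            print("duplicates:", [p.name for p in paths])
        print(f"redundant copies: {duplicate_share(root):.0%}")
```

A production system would also have to decide which copy to keep and how to replace the others without users noticing, which this sketch leaves out.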
Virgin Mobile has implemented the NjiniEngine for unstructured data management, de-duplication and policy implementation. The company undertook an analysis exercise with Njini to examine the growth and level of duplication on its Windows-based shared network drives. Keith Bennett, lead infrastructure architect at Virgin Mobile, explains its policy: “From that analysis, we discovered we had duplication rates of around 24% of the files held and a growth rate of around 65%. Our policy at the time was simply a case of introducing storage quotas for users and buying additional storage. Armed with the analysis undertaken and an idea of potential future growth, we took the decision to undertake a Proof of Concept with Njini to see how VM could benefit from implementing the njiniEnroll and njiniEnforce components.”
After implementation, Virgin Mobile found that for the 50 users in the proof-of-concept (PoC) phase it could save 50GB of storage capacity through de-duplication. Bennett continues: “From VM’s perspective, the key driver has always been reducing the cost of managing unstructured data.
“The fact that we are able to prove we can reduce our storage costs by reducing our storage requirements, coupled with the fact that users see no change at all, has been the biggest benefit. Also, having a much greater level of detail about what data is being held, and how best to organise that data, has been very important to VM.”
Virgin Mobile is now in the process of purchasing and planning the full implementation of Njini across its whole user population of approximately 1,500 users at three separate locations.
Meanwhile, Google has brought its search technology to business with the launch of Google Mini for SMBs and the Google Search Appliance for enterprises. The Search Appliance can index up to 30 million documents and provides access to information held within applications including calendaring, CRM, ERP and business intelligence.
The Google Search Appliance crawls content across the network and creates a master index of documents, which Google’s search technology then queries.
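The article does not describe how the appliance builds its master index, so the following is a generic sketch of the crawl-then-index idea rather than Google’s method: each crawled document is split into keywords, an inverted index maps every keyword to the documents containing it, and a query returns the documents that contain all of its keywords. The tokenisation rule, the AND-only query semantics and the example document paths are simplifying assumptions; commercial search products add ranking, stemming and access controls on top.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: keyword -> set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return IDs of documents that contain every keyword in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    results = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical stand-in for content gathered by crawling shares and intranet pages.
    crawled = {
        "shares/contracts/acme.doc": "Supplier contract renewal for Acme",
        "intranet/board/minutes.html": "Board minutes: contract review and budget",
        "shares/hr/expenses.pdf": "Expenses policy for roaming consultants",
    }
    index = build_index(crawled)
    print(search(index, "contract"))
    print(search(index, "supplier contract"))
```

The same structure underlies the keyword search offered by document management systems: indexing cost is paid once at crawl or scan time, so each query only has to intersect a few pre-built sets.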
Another option is a combined hardware and software product such as Hitachi’s Content Archive Platform, an active digital archiving product. It supports policy-based integration of storage repositories around the network, such as e-mail, file systems, databases, applications and content or document management systems.
Like the other products, the Content Archive Platform provides centralised search, policy-based retention, authentication and protection.
The importance of being able to trace and access e-mail was highlighted in 2005, when Morgan Stanley was ordered to pay damages of $604 million after failing to produce e-mail evidence during a court case. The ruling was seen as a move to make corporations more accountable for retrieving documented information.
However, litigation is not the only reason to be able to archive and retrieve information. Human resources may also need archived information to settle disputes between employees or in dismissal cases, and information may have to be produced in payment or supplier disputes.
Managed e-mail services, such as those from Postini, are one way of reducing the burden of tracking information held in e-mails. Postini’s Archive Manager allows companies to set and enforce policies for the discovery, processing and archiving of e-mail and IM. These policies can be based on global, group or user requirements. As with many e-mail information management systems, it captures inbound and outbound message content and attachments, and the messaging environment is controlled through a web console.
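Policies set at global, group and user level have to be combined somehow. One common approach, assumed here rather than taken from Postini’s documentation, is to let the most specific level win: start from the global defaults, apply each group’s overrides, then apply the user’s own. The policy keys (`archive_email`, `archive_im`, `retention_days`) and the rule that later groups override earlier ones are hypothetical choices for this sketch.

```python
# Hypothetical policy keys; not Postini's actual configuration format.
GLOBAL_POLICY = {"archive_email": True, "archive_im": False, "retention_days": 7 * 365}


def resolve_policy(
    global_policy: dict,
    group_policies: dict[str, dict],
    user_policies: dict[str, dict],
    user: str,
    groups: list[str],
) -> dict:
    """Merge policy layers so that more specific settings override broader ones.

    Assumed precedence: global < group < user. If a user belongs to several
    groups, groups later in the list override earlier ones.
    """
    effective = dict(global_policy)  # copy so the company-wide defaults are untouched
    for group in groups:
        effective.update(group_policies.get(group, {}))
    effective.update(user_policies.get(user, {}))
    return effective


if __name__ == "__main__":
    groups = {"legal": {"retention_days": 10 * 365}}  # e.g. legal keeps mail longer
    users = {"alice": {"archive_im": True}}           # one user's IM is also archived
    print(resolve_policy(GLOBAL_POLICY, groups, users, "alice", ["legal"]))
```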
There are also e-mail management software products that companies can implement in-house. These can automatically index and store inbound and outbound e-mails and, depending on policy, archive them to other storage areas so that they can still be retrieved after they have been deleted from individual mailboxes. The Invu electronic document management system also includes this feature.
CODA, a financial accounting software provider, adopted HP’s StorageWorks Reference Information Storage System (RISS) when it found it was running out of electronic storage space. Like most companies, CODA conducts the majority of its communication by e-mail, and with 500 roaming consultants it faced problems with personal folders scattered throughout the organisation and with the security of information. CODA bought the RISS hardware and software product, which receives data from applications and actively archives it so that it remains searchable online.
“It’s about clever management of information,” explained Richard Hall, CODA’s Group IT Manager. “We estimated that everyone was searching their e-mail for about 20 to 30 minutes per day, and previously there wasn’t a facility to do this.” RISS automatically removes attachments and archives messages more than 30 days old. These are stored centrally but look the same to the user.
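The 30-day rule can be sketched as a periodic sweep over a mailbox. This is an illustration under stated assumptions, not HP’s RISS implementation: the `Message` structure, the `archive://` stub format and the in-memory dictionary standing in for the central store are all hypothetical. Messages more than 30 days old are copied in full to the central store, and their attachments in the mailbox are replaced by stubs, so the message list itself looks unchanged to the user.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

ARCHIVE_AFTER = timedelta(days=30)  # messages older than this are archived


@dataclass
class Message:
    """Hypothetical mailbox entry."""
    msg_id: str
    received: datetime
    subject: str
    attachments: dict[str, bytes] = field(default_factory=dict)
    stubs: list[str] = field(default_factory=list)  # pointers to archived attachments


def archive_old_messages(mailbox: list[Message], store: dict[str, Message], now: datetime) -> int:
    """Copy old messages to the central store and stub out their attachments.

    Returns the number of messages archived in this sweep.
    """
    count = 0
    for msg in mailbox:
        if msg.msg_id in store or now - msg.received <= ARCHIVE_AFTER:
            continue  # already archived, or not yet old enough
        # Keep a full central copy, including the attachment content.
        store[msg.msg_id] = Message(msg.msg_id, msg.received, msg.subject, dict(msg.attachments))
        # Replace attachment content in the mailbox with lightweight stubs.
        msg.stubs = [f"archive://{msg.msg_id}/{name}" for name in msg.attachments]
        msg.attachments = {}
        count += 1
    return count


if __name__ == "__main__":
    now = datetime(2006, 11, 1)
    mailbox = [
        Message("m1", now - timedelta(days=45), "Q3 accounts", {"q3.xls": b"figures"}),
        Message("m2", now - timedelta(days=5), "Lunch"),
    ]
    store: dict[str, Message] = {}
    print(archive_old_messages(mailbox, store, now), "message(s) archived")
    print(mailbox[0].subject, mailbox[0].stubs)
```

Skipping messages already in the store makes repeated sweeps safe, which matters if archiving runs on a schedule.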
By combining existing products with backup and restore assessments and storage metering tools, IT departments can make informed decisions about their future storage requirements. Data assessment services such as profiling and classification, together with data management products, now also allow them to decide sensibly what information to keep and where.
This was first published in November 2006