
Turning machine data into operational intelligence

We look at how companies analyse server and security logs to tackle cybercrime and internal fraud, and optimise the user experience

Log management originates from a time when storage was limited and expensive.

The aim was to preserve logs generated by a range of IT devices and store them centrally for future scrutiny, at least for a period of time, before they were overwritten on the device.

The need to examine old logs may have been to prove compliance of some sort, investigate a security incident or scrutinise a user’s suspected misbehaviour.

As storage became cheaper, devices could keep archives of more log data, so the original need to prevent overwriting old data became less pressing. But machine data had proved its worth for investigating IT incidents.

With more and more devices churning it out – servers (real and virtual), routers, load balancers, application delivery controllers, firewalls, user devices and cloud platforms – another challenge emerged: correlating it all.
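The scale of that correlation problem is easy to illustrate. The sketch below – assuming a simplified, hypothetical syslog-style line format – parses events from several device types into structured records and groups them into shared time buckets so related activity can be viewed side by side:

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical log format: "2014-06-01T10:02:17 fw01 DROP src=10.0.0.5"
LINE = re.compile(r"(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<msg>.*)")

def parse(line):
    """Turn one raw log line into a structured record, or None."""
    m = LINE.match(line)
    if not m:
        return None
    return {
        "time": datetime.fromisoformat(m.group("ts")),
        "host": m.group("host"),
        "message": m.group("msg"),
    }

def correlate(lines, window_seconds=60):
    """Group events from all devices into fixed time buckets, so
    activity on the firewall, web server and load balancer around
    the same moment can be compared."""
    buckets = defaultdict(list)
    for line in lines:
        event = parse(line)
        if event is None:
            continue  # skip lines that do not match the format
        bucket = int(event["time"].timestamp()) // window_seconds
        buckets[bucket].append(event)
    return buckets
```

Real tools index and query the data rather than bucketing it in memory, but the principle – normalise first, then join on time and host – is the same.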

A primary early use case was security. This led to log management being rebranded as security information and event management (SIEM), which a previous Computer Weekly buyer’s guide examined in the context of context-aware security. But attaching “security” to the name diverted attention from two broader capabilities some of the emerging tools offered.

First, machine data is valuable to the general management of IT systems. For example, monitoring virtual and cloud environments to identify inactive (ghost) or rogue virtual machines; or detecting when an online application needs extra resources from a cloud platform (cloud-bursting) and checking beforehand that the demand comes from real users and not a volumetric denial of service attack. Recent research shows European businesses benefiting from such techniques to manage IT complexity.
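A minimal sketch of the first example – ghost virtual machine detection – assuming the platform inventory and a per-VM "last log event seen" timestamp have already been extracted from machine data (all names and thresholds hypothetical):

```python
from datetime import datetime, timedelta

def find_ghost_vms(inventory, last_seen, now, idle_days=30):
    """Return VMs that appear in the platform inventory but have
    emitted no log events recently -- candidates for 'ghost' or
    rogue machines. `inventory` is a list of VM names; `last_seen`
    maps a VM name to the timestamp of its most recent log event."""
    cutoff = now - timedelta(days=idle_days)
    ghosts = []
    for vm in inventory:
        seen = last_seen.get(vm)
        # A VM with no events at all, or none since the cutoff,
        # is flagged for review
        if seen is None or seen < cutoff:
            ghosts.append(vm)
    return ghosts
```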

Second, the intelligence could lend insight into an organisation’s commercial activity and inform business decisions. For example, correlating call centre volumes and/or waiting times with other customer-related data, monitoring key performance indicators (KPIs) to ensure acceptable transaction times for web-based services or tailoring online adverts based on user information, such as device type. Research shows that managing the cross-channel customer experience is easier with the insight provided by machine data.

From log management to operational intelligence

Research in 2014 looked at the maturity of European organisations using machine data. The most advanced were going well beyond the use of traditional log files, enriching the data with other sources. This included communications records (for example from email and voice systems); internet activity data (click streams, cookie records); data from other sorts of machines (industrial plant, sensors); and external feeds such as software vulnerability information, social media and weather reports.

These organisations were doing this to take operational intelligence to a higher level: to provide insight – through the automated collection, management and analysis of data from a wide range of machines – into the very heart of the organisation’s activities.

Applications for this data needed tools capable of using it in real time, so systems could respond to the issues arising – be they security, performance or business-related.

Four stages of operational intelligence capability were recognised in the report:

  • Search and investigate – the original concept of log management, the historic view;
  • Proactive monitoring – this covers SIEM, which some suppliers have evolved to provide a real-time capability to respond to security incidents;
  • Operational visibility – the capability to automate certain aspects of IT management, depending on events;
  • Real-time business insights – the ability for IT systems to respond to business issues as they arise.
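The second stage, proactive monitoring, can be sketched as a simple sliding-window rule – here, hypothetically, alerting when one source address generates too many failed logins within a minute (the threshold and window are illustrative, not taken from any particular product):

```python
from collections import deque

class FailedLoginMonitor:
    """Raise an alert when a source exceeds a threshold of failed
    logins within a sliding time window -- a minimal sketch of the
    'proactive monitoring' stage."""

    def __init__(self, threshold=5, window_seconds=60):
        self.threshold = threshold
        self.window = window_seconds
        self.events = {}  # source IP -> deque of event timestamps

    def record_failure(self, source_ip, timestamp):
        """Record one failed login; return True if this source
        has now crossed the alert threshold."""
        q = self.events.setdefault(source_ip, deque())
        q.append(timestamp)
        # Drop events that have slid out of the window
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) >= self.threshold
```

In a real SIEM deployment the same logic would run continuously against the indexed event stream and feed an alerting or ticketing system rather than returning a flag.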

The final level requires tools with user interfaces that business managers, as well as IT operations staff, can use. Research shows only organisations with the most advanced operational intelligence capability provide insight all the way to board level.

Another challenge is to collect data from as wide a range of sources as possible. Around 40% of organisations say they collect all or most log data from IT applications, infrastructure and websites; this falls to about 20% for mobile devices and applications. However, where purpose-built tools are used, the proportion gathered rises quickly.

Log management tools

The buyer’s guide on context-aware security covered specialised SIEM software suppliers, although it should be pointed out that some of their products – such as HP ArcSight, IBM QRadar and LogRhythm – claim to serve the broader use case, alongside specialist operational intelligence suppliers. Perhaps the best known of these is Splunk.

By no means does Splunk have the market to itself. Competitors include XpoLog, LogEntries, Tibco (which acquired LogLogic in 2012) and Vitria; and open-source tools such as Graylog and Logstash. There are also cloud-based tools such as Sumo Logic, Loggly, LogEntries and, inevitably, there is now Splunk Cloud.

Perhaps the biggest competitor to all of these comes from attempts to turn machine data into operational intelligence using existing in-house business intelligence and data processing tools. Either way, machine data is a free resource many organisations are not exploiting – but could benefit from doing so.

Buying criteria for operational intelligence tools

Data collection and storage
  • Does the tool have built-in adapters to gather machine data from a wide range of sources?
  • Does it have an API for adding your own custom sources?
  • Does the tool have a flexible repository to accommodate data from multiple sources, while providing a meaningful representation of the underlying raw machine data?
  • Can you easily add data sources to existing data stores?
  • Is the tool able to integrate data from multiple sources to provide a single view?
  • Can large volumes of data be processed fast enough to provide real-time correlation between events happening now and historic data?
  • Does the repository support or integrate with big data stores such as Hadoop?
  • Can external data feeds be integrated to enrich internally gathered machine data (for example, adding real-world locations to IP addresses and mapping information)?
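The enrichment question in the last point above can be sketched as follows, with a small in-memory table standing in for a real geo-IP feed (the addresses, field names and `enrich` helper are all illustrative):

```python
# Hypothetical lookup table standing in for a real geo-IP database
IP_LOCATIONS = {
    "81.2.69.142": {"city": "London", "country": "GB"},
    "8.8.8.8": {"city": "Mountain View", "country": "US"},
}

def enrich(event, locations=IP_LOCATIONS):
    """Return a copy of a parsed log event with location fields
    added from an external feed, leaving the raw event untouched."""
    enriched = dict(event)
    location = locations.get(event.get("src_ip"))
    if location:
        enriched.update(location)
    return enriched
```

Keeping the raw event intact while layering enriched fields on top is what allows the "meaningful representation of the underlying raw machine data" asked for earlier in this checklist.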
User interface
  • Is the user interface suitable for use by business as well as IT users (for example, offering a drag-and-drop interface)?
  • Is the user interface customisable to support individual user preferences?
  • Does it represent machine data in a format that can be understood by non-IT users?
  • Are graphics/reports easy to create, customise and share?
  • Is the user interface powerful enough to support complex correlations (for example, support for pivot tables)?
  • Does the tool provide automated alerting?
  • Does the licensing model accommodate broad rather than narrow use of the interface?
  • Do the tools enable co-operation with users from external organisations, such as partners and service providers?
Administration and programming
  • Is the tool easy to manage with a clear administrator console?
  • Does the tool provide automated alerts on security and other IT administration issues?
  • Is the tool extensible, that is, can programmers build their own add-on applications on both the client and server?
  • Does the tool support standard methods for building extensions (for example, JavaScript, Python etc)?
  • Are add-on applications easily shared among communities of developers and their users?
  • Does the tool integrate with other enterprise applications or custom applications?

Bob Tarzey is an analyst at Quocirca.

Read more on IT operations management and IT support