Splunk University: big data skills training from the ninjas

Alongside its annual event, .conf, Splunk runs a learning and certification programme, Splunk University, where in 2017 I spent a day learning how to use Splunk to dig into data and improve my big data skills.

As is common with third-party software, vendors often run certification courses that employees can take to become experts in the software their company is using.

After a few wines I decided it would be a really good idea to go on one of these courses and learn a little more about one of the firms I write about often, and managed to get the firm to agree.

I’ve written before about my fear of getting too techy, having chosen to put my degree in computer science on a shelf and never look at it again.

When it came time to actually attend the Splunk University course before the firm’s annual .conf event in Washington D.C., I regretted agreeing to go – some of the coding workshops at university had been my least favourite parts of the course and I was worried it would bring back too many disappointing memories.

I joined the Splunk Fundamentals 1 course, which is advertised to do exactly what it says on the tin and teach you the basics of using Splunk to get the most out of your data.

The one-day course promised eight modules covering searching in Splunk, using fields as part of searches, creating dashboards and reports, learning about Splunk’s search language, using transforming commands and creating lookups, reports and alerts.

We started by logging in to a dedicated server which was hosting all of the example data we’d be searching and accessing in real time during the session.

Yes, I did make a delighted “OOH!” noise as I successfully logged in to the server. Yes, I realise logging into the server is not the most complicated thing I’ve ever been asked to do, but I was anxious so I’ll allow myself this joy.

On this server was data from a virtual company called “Buttercup Games”, a Splunk invention created to generate data for testing and teaching the Splunk environment.

Buttercup is actually quite a big thing in the Splunk universe; the pony mascot has its own game and always stands tall at Splunk events.

Fictional company Buttercup Games is a platform selling video games and some accessories through third party retailers and an online store with three web servers – two different kinds of data injected into Splunk that can then be interpreted.

Data can come from logs, configurations, messages, call detail records, alerts, metrics, scripts, clickstreams, virtual machines, internet devices, communications devices, sensors, databases – basically “any system using a computer, any device that has logs” according to our session teacher and Splunk senior training technical consultant Laurent Dongradi.

Data is indexed as it enters the system. An index is a location within Splunk where event data is stored so that it can be searched, and different users have different levels of access to each index depending on what they need to know.

For example, a security team might need access to the web index and the security index to spot patterns in web traffic and judge whether these may have contributed to cyber-attacks.

As the class started and we were given time to acclimatise to the dashboard, the person sitting next to me immediately said: “Right, let’s click on some stuff,” to which my response was to stare at him in terror.

There’s just something about code that makes me feel unwell – and during our first module, where we were talking through searching the data loaded into Splunk, I realised we had to use Splunk’s own search language, the Search Processing Language, or “SPL”.

Since I quit coding I’ve been continually assured that in some cases it is becoming easier to learn – automation is allowing some companies to develop software that requires little training or technical knowledge to use.

SPL was similar to many other languages, and once the familiarity set in I realised the best way to tackle each module was to power through… until a search returned a ridiculous result and then suddenly I was back to being terrified. I proceeded with caution.

Dongradi explained that “we could define Splunk very simply as a search engine”, allowing you to use different terms to search through data and use the results to decide what to do next.

For example Shazam uses Splunk to analyse usage data for its application to assess whether any changes to functionality have been successful, and Travis Perkins uses it to help respond more quickly to cyber-security threats.

What you’re searching for will depend upon the business – as a fake retailer Buttercup Games may be interested in looking at online vs in-store sales or trying to notice patterns surrounding abandoned online baskets.

If there are searches you hope to return to or check regularly they can be made into reports, and can be displayed on a dashboard which can be shared around the organisation – visualisations of data can be generated to help to drill down to underlying events.

But as we progressed it became clear to me that learning to use Splunk was by no means the most difficult part of the session – it was understanding what data to search and how to use the results.

For example, we began by searching for failed login attempts because our imaginary manager had asked us to.

This made sense to me – I knew that if a company is concerned about cyber-attacks it is a good idea to look at failed login attempts to see if there are any patterns relating to the server the attempts occurred on, the time gap between failures and whether the same person is consistently attempting entry and failing.

Insights like this can help pinpoint issues and more quickly prevent a disaster.
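
For anyone who wants to see what that kind of pattern-spotting looks like outside Splunk, here is a rough sketch in Python rather than SPL. It is not what we typed in class: the event records, server names and usernames are all invented for illustration, and the function name `failed_login_summary` is my own. It simply counts failures per server and per user, and measures the gap in seconds between each user's consecutive failures.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical login events: (timestamp, server, user, outcome).
# Every value here is invented for illustration.
EVENTS = [
    ("2017-09-25 09:00:00", "web1", "alice", "failure"),
    ("2017-09-25 09:00:05", "web1", "alice", "failure"),
    ("2017-09-25 09:00:09", "web1", "alice", "failure"),
    ("2017-09-25 09:03:00", "web2", "bob", "success"),
    ("2017-09-25 09:10:00", "web2", "carol", "failure"),
]


def failed_login_summary(events):
    """Summarise failed logins by server, by user, and by time gap."""
    by_server = Counter()
    by_user = Counter()
    failure_times = defaultdict(list)

    for stamp, server, user, outcome in events:
        if outcome != "failure":
            continue  # only failed attempts are of interest here
        by_server[server] += 1
        by_user[user] += 1
        failure_times[user].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))

    # Seconds between each user's consecutive failed attempts.
    gaps = {}
    for user, stamps in failure_times.items():
        stamps.sort()
        gaps[user] = [int((later - earlier).total_seconds())
                      for earlier, later in zip(stamps, stamps[1:])]

    return by_server, by_user, gaps


if __name__ == "__main__":
    servers, users, gaps = failed_login_summary(EVENTS)
    print("Failures per server:", dict(servers))
    print("Failures per user:", dict(users))
    print("Seconds between failures:", gaps)
```

In this made-up data, one user failing three times within ten seconds on the same server is the sort of pattern that would be worth a closer look.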

Later on in the workshop we were asked to look into other pieces of data, the reasons for which were a little less obvious.

One of the searches developed a chart which showed each of the actions that took place on the Buttercup Games website for each of the products the site sold.

For example, there were certain games that were added to the cart more than others, and also many that had been viewed, but then not purchased.

As it happens, World of Cheese was a good seller, but Holy Blade of Gouda was not very popular – there is an argument that data like this can determine which are the bestselling games or perhaps whether there may be a problem with a certain purchasing webpage if the purchase is abandoned many times for a particular game.
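
The idea behind that chart can be sketched in a few lines of Python (again, not SPL, and not the search we actually ran). The product names are the ones from the class, but the click records, the action labels `view`, `addtocart` and `purchase`, and the helper names are assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream records: (product, action). The counts are invented.
CLICKS = [
    ("World of Cheese", "view"),
    ("World of Cheese", "addtocart"),
    ("World of Cheese", "purchase"),
    ("World of Cheese", "view"),
    ("World of Cheese", "addtocart"),
    ("World of Cheese", "purchase"),
    ("Holy Blade of Gouda", "view"),
    ("Holy Blade of Gouda", "view"),
    ("Holy Blade of Gouda", "addtocart"),
]


def actions_by_product(clicks):
    """Build a product -> {action: count} table, like a chart of actions per product."""
    table = defaultdict(Counter)
    for product, action in clicks:
        table[product][action] += 1
    return table


def abandonment_rate(counts):
    """Share of add-to-cart actions that were not followed by a purchase."""
    added = counts["addtocart"]
    if added == 0:
        return 0.0  # nothing was added to a cart, so nothing was abandoned
    return max(0.0, 1 - counts["purchase"] / added)


if __name__ == "__main__":
    for product, counts in actions_by_product(CLICKS).items():
        print(product, dict(counts), "abandonment:", abandonment_rate(counts))
```

A high abandonment figure for one game on its own proves nothing, but it is the kind of number that might prompt someone to check that game's purchase page.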

Then, as an example of how to use transforming commands such as ‘top’, ‘rare’ and ‘stats’, we were asked to find the top two places visitors to the website were coming from.

For some reason when I successfully extracted this data and displayed it as a pie chart I got very excited – possibly because we’d been sitting in a room for a long time without seeing the sun and it was one of the last things we were asked to do.
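
As a rough analogue of what a ‘top’ or ‘rare’ style command does, here is a small Python sketch. It is my own illustration, not Splunk's implementation: the referrer values are invented, and the `top` and `rare` functions below are hypothetical helpers that only borrow the command names. The first returns the most frequent values with their counts and share of the total, the second the least frequent.

```python
from collections import Counter

# Hypothetical referrer values for website visits. These are invented.
REFERRERS = [
    "google.com", "google.com", "google.com",
    "bing.com", "bing.com",
    "yahoo.com",
]


def top(values, limit=2):
    """Return the `limit` most common values as (value, count, percent) tuples."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, round(100 * count / total, 1))
            for value, count in counts.most_common(limit)]


def rare(values, limit=2):
    """Return the `limit` least common values as (value, count) tuples."""
    counts = Counter(values)
    # most_common() sorts from most to least frequent, so walk it backwards.
    return counts.most_common()[:-limit - 1:-1]


if __name__ == "__main__":
    print(top(REFERRERS))      # the two biggest sources of visitors
    print(rare(REFERRERS, 1))  # the least common source
```

The percentages returned by `top` are the numbers a pie chart like mine would be drawn from.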

Using Splunk was easy enough to learn, but as is so often the case, investing in expensive software does not by itself produce insight for your company – data scientists are in high demand at the moment as technology adoption and digital transformation have dramatically increased the amount of data firms are collecting.

Learning as many tech skills as possible can only be a positive thing – as an organisation Splunk works to help people in and outside of the firm to learn the skills needed to use its big data technology.

But data should not be underestimated – when properly interpreted it can be the perfect companion to a business plan, but it’s also very easy to drown in a sea of figures.

It’s all very well and good finding out which products are selling well online and which are struggling… but what are you then going to do about it?

Learning exactly how data ninjas glean insights from software such as Splunk was extremely interesting and gave me a newfound respect for exactly how much data organisations are dealing with.

Will I place my Splunk University participation certificate on the shelf next to my degree and never look at it again? I’ve not decided yet…