Toward the end of 2012, Quocirca met with an interesting
company called DataSift. DataSift is a
social data platform company - it takes feeds of data from the majority of
social media sites and can then mine through social conversations for content,
trends and insights. This is of obvious
interest to organisations that are tracking sentiment towards their brand in the
market, but it may have other uses as well.
The most obvious target for DataSift is Twitter - the vast
majority of Twitter data is available in the public domain, with only direct
messages (DMs) hidden from general view.
However, DataSift can also track activity around an organisation's
Facebook page, content from blogs and forums - including other semi-private
information the organisation accesses via social networks established between
itself and the public.
The platform is cloud-based, with prices based on a
combination of "complexity", hours and hourly cost, along with a data cost. The hourly element is the simplest to explain: it is based on the length of the period being
analysed - for a week, this would be 168 hours; for a month (nominally), 720
hours. Complexity is more difficult, as it is
based on a calculation that can only be completed once the query has been
created. However, the business model
does mean that you only pay for what you get: there are no ongoing subscriptions that
have to be paid no matter what - everything is on a per-use basis. The data cost is based on a small charge per
Tweet analysed. DataSift recommends a 10% sample rate
as sufficient for statistical validity, which lowers the price
significantly.
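As a rough illustration of how these pricing elements might combine, here is a minimal Python sketch. The article does not give DataSift's actual formula, so the way the terms are multiplied and added is an assumption, as are the function name `estimate_cost` and every rate used in the example. Only the 168 and 720 hour figures and the 10% sample rate come from the text above.

```python
# Hours in the analysis periods mentioned in the text.
HOURS_PER_WEEK = 7 * 24            # 168
HOURS_PER_NOMINAL_MONTH = 30 * 24  # 720


def estimate_cost(complexity, hours, hourly_rate,
                  tweets, per_tweet_charge, sample_rate=1.0):
    """Estimate the cost of one query run.

    ASSUMPTION: processing cost = complexity * hours * hourly_rate and
    data cost = tweets * sample_rate * per_tweet_charge. This is an
    illustrative guess at how the elements combine, not DataSift's
    published formula.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in the range (0, 1]")
    processing_cost = complexity * hours * hourly_rate
    data_cost = tweets * sample_rate * per_tweet_charge
    return processing_cost + data_cost


# Hypothetical rates, chosen only to show the effect of sampling.
full = estimate_cost(2.0, HOURS_PER_WEEK, 0.5, 1000, 0.001)
sampled = estimate_cost(2.0, HOURS_PER_WEEK, 0.5, 1000, 0.001,
                        sample_rate=0.1)
print(f"full: {full:.2f}, 10% sample: {sampled:.2f}")
```

Under this assumed model, sampling only reduces the data element of the price; the processing element is unchanged.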
As a test, Quocirca asked DataSift to run a Twitter-only analysis
of 2012 Twitter activity for a named set of vendors who are often mentioned in
the same breath as big data. The query
required just 10 lines of code to be written, and gave a complexity score of
2.1. Without the 10% filter in place,
2.23 million Tweets were analysed.
The topic made an interesting basis for the test,
and Quocirca will be writing a more detailed piece on the findings, but the
highlights below illustrate the potential power of the system:
- Twitter activity around big data grew by 64% over the year. This is not surprising - big data was still an emerging topic back at the beginning of the year, but was being pushed harder and harder by the vendors and the media as the year progressed.
- Nearly three quarters of Tweets contained an active link. People were not just dropping Twitter comments about big data - they were referring people to other content outside of Twitter.
- Apache had the biggest footprint with 9.4% of vendor mentions in Tweets being about it. Apache, with its Hadoop parallel processing engine and Cassandra database, is unsurprisingly the big player here.
- In second place was 10gen, the commercial entity that looks after MongoDB, with 6.24% of vendor mentions.
- Of the "big guys", IBM gained a creditable third place with 3.25%, with HP in fourth with 2.38%.
- There were geographic differences - IBM's strongest country was France; Cloudera's was Japan. SAP was (unsurprisingly) strong in Germany; DataSift itself was very strong in the UK.
- At a domain level (the sites that people linked to most from their Tweets), Forbes.com was a surprise winner. Behind that, GigaOM.com and TechCrunch.com were the next biggest content sources.
As a single point of interest, Quocirca also looked at HP at a
sentiment analysis level. Through the
first part of the year, people's views of HP remained fairly level, with a net
sentiment score (positive comments minus negative comments) of 0 - not good
news in itself, but it could have been worse.
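The net sentiment score described above (positive comments minus negative comments) can be sketched in a few lines of Python. The function name, the label strings and the sample data are all hypothetical and for illustration only; how DataSift classifies an individual comment is not covered here.

```python
from collections import defaultdict


def daily_net_sentiment(comments):
    """Return {day: positive count minus negative count}.

    `comments` is an iterable of (day, label) pairs, where label is
    "positive", "negative" or anything else (treated as neutral).
    The labels are assumed to come from an upstream classifier.
    """
    scores = defaultdict(int)
    for day, label in comments:
        if label == "positive":
            scores[day] += 1
        elif label == "negative":
            scores[day] -= 1
        else:
            scores[day] += 0  # neutral: record the day without moving the score
    return dict(scores)


# Made-up sample data, not taken from the analysis in this article.
sample = [
    ("day 1", "positive"),
    ("day 1", "negative"),
    ("day 2", "negative"),
    ("day 2", "negative"),
    ("day 2", "neutral"),
]
result = daily_net_sentiment(sample)
print(result)
```

A score of zero, as on "day 1" here, means positive and negative comments cancel out; it does not mean that nobody was commenting.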
However, between 14th November and 10th December,
a lot of sentiment activity took place.
On the 21st November, HP's sentiment score
plunged close to -10,000. It recovered back
to zero by the 24th, and then went back down to -5,000 on the 28th,
rose again and then crashed down to -7,000 on the 1st December.
Why? On 20th November,
HP's CEO Meg Whitman told Wall Street analysts that HP had massively overpaid
for software firm Autonomy, and accused former executives at Autonomy of
cooking the books. Financial and
technical analysts went into a frenzy - the very people who use social
networking the most to get information out as quickly as possible. The ongoing fall-out was what caused the
triple-dip poor sentiment scores over the following weeks.
This shows that, although HP took fourth place in
big data mentions, those mentions were not necessarily positive for HP's
brand. This is why a company such as
DataSift is important - it can not only remove the grunt work of
analysing the massive firehose of data that comes from social networks, but
also applies solid analytics to that data so that the results a customer sees
are presented in context.


