Case study: JustGiving develops Facebook-like graph on HDInsight

Online fundraising site JustGiving has created a Facebook-like social media graph database to understand how people donate

Online fundraising site JustGiving has created a Facebook-like social media graph database called GiveGraph to understand how people donate to causes.

Unlike Facebook, GiveGraph attempts to calculate a weighting for the strength of the relationships between people and the causes they are most likely to support.

The implementation is believed to be the most complex of its kind, built using Microsoft's cloud-based HDInsight Hadoop distribution and integrated with Facebook’s OpenGraph.

Speaking to Computer Weekly at Microsoft's Future Decoded event in London on 10 November 2014, Mike Bugembe, chief analytics officer at JustGiving, said it was trying to find out what people care about.

In the world of e-commerce, retailers look at the Amazon recommendation engine or the technology behind eBay as methods to improve their understanding of the customer. But the giving sector is different.

"We needed giving to become social, but it also had to be relevant," said Bugembe. 

He pointed out that retail transaction data could be mined to understand customer demographics, whereas in the charity sector "the transaction data does not give you a good idea of what people care about".

Instead, people experience life events, which change their outlook, said Bugembe. "Your previous behaviour is not the best indication of your future," he added.

Proof and the team

The project began with a proof of concept. "We had to answer the question of whether a machine is better than intuition," said Bugembe.

The online fundraising site ran two sets of test subjects, the results of which demonstrated that the computer-based GiveGraph system was able to beat the marketing team by 91% in its ability to predict the causes people would be most likely to support.

The team behind the GiveGraph system was made up of 14 machine-learning experts, developers and statisticians. Putting together the team for the project was as challenging as the technical implementation. 

"We looked for the mythical data scientist, but it was impossible, so found individuals who had multiple competencies," said Bugembe.

The programming team comprises engineers with a statistical background, statisticians with a computer science background and machine-learning experts with coding and maths knowledge, he said.

Key to the team’s success was having people who were "intellectually curious", according to Bugembe.

The most extensive machine-learning libraries were in Java, while the developers used .Net, so the application needed to be developed in a way that gave the multi-disciplinary team the flexibility to use Java, .Net or Linux.

Relationships generate big data

Prior to starting the GiveGraph project, JustGiving used traditional methods for data analysis. This provided the JustGiving site with a first attempt at segmenting the people who used it, based on a collaborative filter. Users were divided into groups of similar types of people to provide recommendations.
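As a rough illustration of the segment-based approach described above, the sketch below recommends whichever causes are most popular within a user's group. It is not JustGiving's code: the donation records, segment names and the `recommend_by_segment` function are all invented for this example.

```python
from collections import Counter

# Hypothetical donation records: (user, segment, cause).
# All names and segments are invented for illustration.
donations = [
    ("ann", "runners", "heart research"),
    ("bob", "runners", "heart research"),
    ("cat", "runners", "hospice care"),
    ("dan", "students", "disaster relief"),
]


def recommend_by_segment(records, segment, top_n=1):
    """Return the causes most often supported within one segment."""
    counts = Counter(cause for _, seg, cause in records if seg == segment)
    return [cause for cause, _ in counts.most_common(top_n)]


print(recommend_by_segment(donations, "runners"))
```

Every member of a segment receives the same suggestion, which is the limitation Bugembe goes on to describe.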

However, Bugembe said this didn’t work because what people care about is very personal to them: "Cancer or earthquakes are not selective."

Through its research and analysis, the JustGiving data analytics team realised that the causes people care about are related to their relationships and connections with each other, and these relationships may be several steps removed from the individual who is donating.

Bugembe and the team of developers and data analysts decided to follow Facebook’s Open Graph functionality. "While Facebook looks at people-to-people relationships, we have people-to-people-to-cause," he said.       
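The people-to-people-to-cause idea can be sketched as a small weighted graph. The example below is an assumption-laden illustration rather than GiveGraph's actual method: the names, the weights and the scoring rule (tie strength multiplied by support strength, summed over friends) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical person-to-person ties; weights between 0 and 1 are invented.
friendships = {
    "ann": {"bob": 0.9, "cat": 0.2},
    "bob": {"ann": 0.9},
    "cat": {"ann": 0.2},
}

# Hypothetical person-to-cause support strengths, also invented.
support = {
    "bob": {"cancer research": 1.0},
    "cat": {"disaster relief": 0.8, "cancer research": 0.5},
}


def score_causes(person):
    """Score each cause by walking person -> friend -> cause.

    Assumed rule: add tie strength * support strength for every path.
    """
    scores = defaultdict(float)
    for friend, tie in friendships.get(person, {}).items():
        for cause, strength in support.get(friend, {}).items():
            scores[cause] += tie * strength
    # Highest-scoring causes first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for cause, score in score_causes("ann"):
    print(f"{cause}: {score:.2f}")
```

In this toy data, the cause backed by a close friend outranks one backed only by a distant acquaintance, which is the kind of relationship weighting the article describes.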

Bugembe said social media interactions can be highly personal. These interactions can be linked through systems of association to predict the causes people would be most likely to support. "We are able to look at all people who use JustGiving, the relationship they have with each other and their reputation," he said.

Traditional relational database designs cannot handle this volume of data. "The dataset triples in size when you look at a media graph, and most databases cannot handle this," said Bugembe.

This is because a relational database, which is optimised for processing rows and columns, is not suitable for processing where any row and any column could be related, he explained: "When you look at a system of relationship, you can have a relationship with any row or column." 

Indexing to speed up data processing is almost impossible, because with millions of inter-related data points a query would have to scan the entire database.
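To show why relationships several steps removed suit a graph structure, the sketch below finds everyone within a given number of hops using a breadth-first search over an adjacency list. In a relational table of pairs, each extra hop would typically need another self-join. The `connections` data and the `within_hops` function are hypothetical and are not drawn from JustGiving's system.

```python
from collections import deque

# Hypothetical adjacency list: each person maps to their direct connections.
connections = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
}


def within_hops(start, max_hops):
    """Return {person: hop count} for everyone within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in connections.get(person, []):
            if neighbour not in distance:
                distance[neighbour] = distance[person] + 1
                queue.append(neighbour)
    del distance[start]  # the starting person is not their own connection
    return distance


print(within_hops("ann", 2))
```

The search only touches the people it actually reaches, rather than scanning every row for each additional step.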


Scaling with Azure

In terms of implementation, JustGiving used the Microsoft Azure platform as a service. "We already knew .Net, but we realised our implementation would be pushing the boundaries of the technology. The GiveGraph applications have to work with five million users a month, so we needed to ensure the site was robust," said Bugembe.

JustGiving built GiveGraph using Windows Azure to ensure it did not have to worry about the availability of the back end. It used HDInsight – Microsoft’s distribution of Hadoop in the Azure cloud – to build the graph.

GiveGraph integrates with Facebook’s OpenGraph so it can rank content from Facebook, but additionally calculates a weighting against people’s relationships.
