Skillpages deploys DataStax Cassandra to rank trades with NoSQL

How do you find a good tree surgeon you can trust because he or she has been recommended by someone in your social circle?

How do you find a good tree surgeon? One you can trust, because he or she has been recommended by someone in your social circle? These questions are at the heart of Skillpages, a Dublin-based web company founded in 2010 by Iain MacDonald and Michael Gallagher with $4m.

Mike McCarthy, CTO of Skillpages, describes the company as similar to LinkedIn, but blue-collar and with skills to the fore. It could be that you are an IT consultant, specialising in data warehousing by day, but a guitar teacher out of hours.

The website has a presence in 170 countries and there is a Spanish language version. In 2010 it had 65,000 users. It now has 19 million.

Traffic growth points to NoSQL

In its early stages, the service used Microsoft SQL Server as its database engine, but as traffic increased through 2011 – generating more than a billion rows of data from 3 million users – the company explored NoSQL databases.

McCarthy says his developer team built prototypes using MongoDB and Cassandra. Although MongoDB seemed better out of the box, the team opted for Cassandra because it met their need for scalability, resilience and ease of deployment. “Cassandra is good for scale out, meeting our plans for multi-region replication,” he says.

In moving to production they chose DataStax as the Cassandra supplier. Doing it themselves would have meant making and then learning from mistakes, he says, and that would have been too time-consuming. “And so we did not have to add any dedicated engineers to the project,” he says.

In making its move to Cassandra, the team ran the new system in parallel with the old one for six weeks in 2011. Skillpages then expanded its cluster from an initial three nodes to six in 2011, and to 12 nodes in 2012.

“We find DataStax OpsCenter particularly powerful and insightful in detecting and resolving issues before they become problems,” he adds. 

“It is a good tool for managing the clusters.”

Secret sauce

The graph database, SkillGraph, which makes up what McCarthy calls the “secret sauce” of the business, is one they did build for themselves. It models users’ real-life connections on and off the platform, using users’ profiles on Facebook and Gmail, but not LinkedIn.

“We process in excess of 40 GB additional telemetry data on a daily basis. This data excludes user generated content and profile data. Our primary social graph models in excess of 2.5 billion objects. This is continuously reconciled against connected social media accounts,” says the company.

McCarthy explains that the company’s approach is to achieve relevance that may be indirect. For example, US army veterans could be a good source for the skill of putting up telephone poles, and the company’s data mining will surface such skill pools, he contends. He gives another example of a plumber in Manhattan, whose ranking, calculated by the company’s proprietary software SkillRank, will change according to how actively he or she contributes to the service’s community.

Diverse IT estate

The company’s IT estate – which runs on Amazon’s cloud services – is “certainly diverse”, says McCarthy. 

“We still have a Microsoft SQL footprint, we use Solr for search and we do a lot with Oracle’s MySQL, too. We use Hadoop and are generally open to open source,” he says.

“But we are not religious about particular technologies. We look to hire developers who can solve problems rather than those specialising in certain technologies.”

He has 20 developers, and “almost all of the European Union is represented” in the team, he says, which is based in the Republic of Ireland.

The NoSQL option was not so much about cost, he says. “We won’t go cheap necessarily, but we don’t flash our money around either. We invest the investors’ money wisely,” he says.
