
Silicon Valley startups look to plug gaps in data value chain

A group of California-based startups and early-stage data management companies are promising CIOs ways to plug gaps in the data-to-insight value chain

This article can also be found in the Premium Editorial Download: Computer Weekly: The startups transforming data analytics

The classic strategic management model of the “value chain”, made famous by Michael Porter, can be applied to data.

Each point in the chain adds something new: something of value. In the context of data work, you start off with raw data, which goes through stages of refinement before finally yielding insight that is significant for a company, contributing either to making money or to saving money.

So far, so simple. Except it’s not.

Computer Weekly was represented on a recent European IT press visit to business applications and data analytics companies in San Francisco and Silicon Valley.

Often, though not always, hints about the future shape of UK corporate IT can be seen in what is coming out of new and relatively new technology companies in northern California.

Five of the companies visited on the recent tour are tackling stages in what could be called the “value chain” of taking raw data to final analytical value: Fivetran, Promethium, Alation, DataGrail, and Anaplan.

Fivetran: Get data pipelines out of the ’90s

George Fraser, CEO of Oakland-based extract, load and transform (ELT) data company Fivetran, said he and his co-founder and chief operations officer Taylor Brown were “baffled” as to why data cannot be easily extracted from source systems and loaded into data warehouses and data lakes.

Their focus is on cloud data warehouses, such as Snowflake, and their belief is that the pipeline stage of the data stack is still stuck in the 1990s, while data storage and analytics have moved on. The company’s closest strategic partnerships are, it says, with Google Cloud, Snowflake and BI platform provider Looker.
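The ELT ordering in Fivetran’s name (extract and load the raw data first, then transform it inside the warehouse, rather than transforming it on the way in as classic ETL does) can be sketched in a few lines. This is an illustration under stated assumptions, not Fivetran’s code: Python’s built-in sqlite3 in-memory database stands in for a cloud warehouse such as Snowflake, and the order rows and the table names `raw_orders` and `daily_revenue` are hypothetical.

```python
import sqlite3

# Hypothetical source rows, e.g. pulled from a SaaS API: (id, day, amount in pence as text).
SOURCE_ROWS = [
    ("o1", "2019-03-01", "1999"),
    ("o2", "2019-03-01", "500"),
    ("o3", "2019-03-02", "2500"),
]

def extract():
    """Extract: read raw rows from the source system unchanged."""
    return list(SOURCE_ROWS)

def load(conn, rows):
    """Load: land the raw rows in the warehouse as-is, with no cleaning yet."""
    conn.execute("CREATE TABLE raw_orders (id TEXT, day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

def transform(conn):
    """Transform: model the data afterwards, inside the warehouse, using SQL."""
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT day, SUM(CAST(amount AS INTEGER)) / 100.0 AS revenue
        FROM raw_orders
        GROUP BY day
        """
    )

def run_elt():
    """Run extract -> load -> transform and return revenue per day."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT day, revenue FROM daily_revenue ORDER BY day"
    ).fetchall()

if __name__ == "__main__":
    print(run_elt())
```

Because the raw table stays in the warehouse, a transformation can be rewritten and rerun without going back to the source system.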

The company was founded in 2012, has 600 customers, 110 employees and 12 office dogs. It opened an EMEA office in Dublin in January 2018.

The company’s name is a play on Fortran, a programming language that will evoke nostalgia in the minds of many Computer Weekly readers. It was, however, just a holding name while the company’s founders looked for a business technology problem to solve and to build a company around.


And they believe they have found it. Their thesis is that a good deal has changed in the end-to-end data stack, driven by the advent and growth of cloud computing and the plunging cost of storage.

Modern data warehouses are, on this view, cloud-based and columnar in data architecture, with compute separated from storage. But extract, transform and load (ETL) has, in their view, not changed in almost two decades.
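What “columnar” means can be shown with a toy example in plain Python. This is a sketch of the layout idea only, not how any particular warehouse stores data; the `Sale` record and its fields are hypothetical. In a column layout, an aggregate over one field reads a single list rather than every field of every record.

```python
from dataclasses import dataclass

@dataclass
class Sale:
    city: str
    product: str
    units: int

# Row-oriented layout: each record keeps all of its fields together.
ROWS = [
    Sale("London", "gilet", 3),
    Sale("Leeds", "scarf", 5),
    Sale("London", "scarf", 2),
]

def to_columns(records):
    """Pivot row records into one list per field: a toy columnar layout."""
    return {
        "city": [r.city for r in records],
        "product": [r.product for r in records],
        "units": [r.units for r in records],
    }

def total_units_rows(records):
    # A row store conceptually reads each whole record to reach one field.
    return sum(r.units for r in records)

def total_units_columns(columns):
    # A column store reads only the single column the query needs.
    return sum(columns["units"])

if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(total_units_rows(ROWS), total_units_columns(cols))
```

Real columnar engines typically add compression and vectorised execution on top of this layout; the sketch shows the layout alone.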

Data warehousing has gone through the Hadoop revolution into the cloud, while, at the other end of the value chain, BI has gone from the reporting tools generation of Business Objects, Cognos and MicroStrategy through self-service BI to what they term “centralised, cloud-native, self-service” BI.

Fraser also counsels traditional companies against modelling themselves on the vanguard of companies with “exotic data needs, like Netflix. For the median company the cost of storage is not a concern, really. Getting people to look at the data is the hard part.”

Striking fire from data: Promethium

Prometheus was the Greek god who bestowed the gift of fire upon mankind. Kaycee Lai, CEO and founder of Promethium, based in Menlo Park, hopes to set fire to the many stages that lie between discovering a data source and running a query that will yield a relevant insight to a business.

The thesis of the company is that steps like determining access, ingesting and integrating data, selecting relevant subsets of data and then assembling those for query can be collapsed.

“Some 90% of the cost of business intelligence is labour. You’re never sure if the data is right until you get to the query stage,” says Lai. “In the meantime, you have expended several months of time on the labour of business analysts, IT professionals, data scientists, DBAs, and so on. That is where the bottleneck is: someone has to figure out the mess.

“We think you need to change the process from the beginning. Instead of starting from the bottom, and finding all possible data sources, start from: ‘what question can I answer?’”

His firm’s software, he said, presents the user with a menu of questions to ask and automatically joins data across all sources – he cites a Teradata data warehouse, an Oracle database and a Hadoop data lake as typical – leaving the user to run the query and validate the results.
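Promethium has not published how its software works internally, but the basic idea of starting from a question and joining across separate systems to answer it can be sketched with Python’s sqlite3 module, which can attach several databases to one connection and join across them. The two in-memory databases below are hypothetical stand-ins for, say, a sales warehouse and a separate store database, and the question menu is an illustrative assumption rather than the product’s interface.

```python
import sqlite3

def build_sources():
    """Create one connection over two separate, hypothetical source databases."""
    conn = sqlite3.connect(":memory:")                 # "main": stands in for a warehouse
    conn.execute("ATTACH DATABASE ':memory:' AS crm")  # a second, independent database
    conn.execute(
        "CREATE TABLE main.sales (store_id INTEGER, product TEXT, units INTEGER, year INTEGER)"
    )
    conn.execute("CREATE TABLE crm.stores (store_id INTEGER, city TEXT)")
    conn.executemany(
        "INSERT INTO main.sales VALUES (?, ?, ?, ?)",
        [(1, "gilet", 4, 2018), (2, "gilet", 7, 2018), (1, "gilet", 2, 2017), (3, "scarf", 9, 2018)],
    )
    conn.executemany(
        "INSERT INTO crm.stores VALUES (?, ?)",
        [(1, "London"), (2, "Manchester"), (3, "London")],
    )
    return conn

# Hypothetical menu: each business question maps to parameterised SQL joining both sources.
QUESTIONS = {
    "units sold by product, city and year": """
        SELECT COALESCE(SUM(s.units), 0)
        FROM main.sales AS s
        JOIN crm.stores AS st ON st.store_id = s.store_id
        WHERE s.product = ? AND st.city = ? AND s.year = ?
    """,
}

def ask(conn, question, *params):
    """Run the SQL behind a menu question; the user still validates the answer."""
    return conn.execute(QUESTIONS[question], params).fetchone()[0]

if __name__ == "__main__":
    conn = build_sources()
    print(ask(conn, "units sold by product, city and year", "gilet", "London", 2018))
```

The user picks a question and supplies parameters; writing the join is moved out of the user’s hands, which is the step Lai argues consumes most of the labour.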

He said that while it typically takes 180 days of labour to aggregate data across all sources, with Promethium it takes six; and that while it takes 360 days of labour to “determine accurate context”, it takes one day with its software.

“I love data catalogues, but this is a different thing”, he added. “Anyone can search a file system, but that doesn’t mean you know what is in it. That was the big ‘Aha!’ moment for me [when deciding to set up the company]: ‘How is a human being supposed to make sense of data at the necessary level of abstraction, such as “How many gilets did we sell in London last year?”’ Shouldn’t I be able to just ask a question, rather than figure out how to form a SQL statement, which is an unnatural procedure?”

Alation: Surfacing the social graph among knowledge workers

Data catalogue vendor Alation – not a new company to the tour – is, according to co-founder and CEO Satyen Sangani, continuing its efforts to make customers’ investments in big data technology and personnel work better than they have so far.

The thesis of the company, based in Redwood City, is that digital transformations fail because complex organisations lack a data culture, and that data catalogue technology can act as a catalyst by making the social graph within such organisations visible to knowledge workers, so that they know who else has been looking at what. The company also applies machine learning to how data stores are used, to translate “data speak into plain English” (“plain English” here stands for natural language generally: the supplier’s software is not restricted to English).

“The reason why we think the data catalogue is key is that we have to transform not just the tools that knowledge workers use, but how they think: adopting a scientific mindset. And leveraging AI not just in the end products people are using, but in the workflows of how they are doing their jobs on a day-to-day basis, and making them more data literate.

“Our growth, from 200 to 400 employees, and with a broad customer base in big global enterprises, shows why that is important.”

Customers include eBay, Munich Re, Pfizer, BMW and SurveyMonkey. Munich Re has, he says, used Alation on a data lake to launch a new green energy business, rolling it out to some 2,000 actuaries, analysts and data scientists.

“We apply machine learning to translate data speak, the things in the technical systems, into English to surface stuff like: ‘who has touched the data, how often has this question been asked’ and so on.”
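The two usage signals Sangani mentions, who has touched the data and how often a question has been asked, can in their simplest form be counted from a query log. The sketch below assumes a hypothetical log of (user, table, normalised query) records and uses only Python’s standard library; it illustrates surfacing usage, not Alation’s machine learning.

```python
from collections import Counter, defaultdict

# Hypothetical query-log entries: (user, table, normalised query text).
QUERY_LOG = [
    ("asha", "sales", "revenue by region"),
    ("ben", "sales", "revenue by region"),
    ("asha", "claims", "claims by postcode"),
    ("chen", "sales", "top products"),
    ("ben", "sales", "revenue by region"),
]

def who_touched(log):
    """Map each table to the sorted list of distinct users who queried it."""
    users = defaultdict(set)
    for user, table, _ in log:
        users[table].add(user)
    return {table: sorted(names) for table, names in users.items()}

def question_frequency(log):
    """Count how often each normalised question has been asked."""
    return Counter(query for _, _, query in log)

if __name__ == "__main__":
    print(who_touched(QUERY_LOG))
    print(question_frequency(QUERY_LOG).most_common(1))
```

Surfacing these counts next to each table in a catalogue is one way to make the “social graph” around data visible: a newcomer can see who already works with a data set and which questions are asked most.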

DataGrail: Managing data in the “age of privacy”

Daniel Barber, CEO of DataGrail, founded the San Mateo-based startup with the wave of data privacy compliance in mind, a wave signified by the GDPR in Europe and the CCPA in California.

The idea behind the company is to operate at a meta level vis-à-vis the personal data in business applications in order to make its use compliant, responding to requests for deletion, and so on. And its philosophy is avowedly a direct counter to Mark Zuckerberg’s famous 2010 edict that privacy is dead, says Barber.

Large companies especially, he said, run a plethora of software-as-a-service marketing systems: Marketo, Eloqua, Salesforce and the rest. Managing personal data across those business systems is increasingly problematic.

Barber claims his company’s software makes handling data requests a simpler task. Their “request manager” system receives such requests on their client’s behalf, and “automatically performs access, delete, portability, or other privacy requests” across a gamut of CRM applications. They then email the requestor after the company has reviewed the output, and store a compliance log for internal and auditor review.
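The workflow Barber describes, in which one incoming privacy request is executed across many SaaS systems and then recorded, can be sketched as a simple fan-out. Everything here is a hypothetical placeholder rather than DataGrail’s API or any vendor’s: the in-memory `FakeSaaSSystem` stands in for an application reached through its own API, only access and delete requests are covered, and the human review step is represented by a flag in the log entry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FakeSaaSSystem:
    """Hypothetical stand-in for a SaaS app holding personal data keyed by email."""
    name: str
    records: dict = field(default_factory=dict)

    def access(self, email):
        # Return whatever this system holds about the person, or None.
        return self.records.get(email)

    def delete(self, email):
        # Remove the person's data; report whether anything was deleted.
        return self.records.pop(email, None) is not None

@dataclass
class RequestManager:
    systems: list
    audit_log: list = field(default_factory=list)

    def handle(self, kind, email):
        """Run one access or delete request against every system, then log it."""
        results = {}
        for system in self.systems:
            if kind == "access":
                results[system.name] = system.access(email)
            elif kind == "delete":
                results[system.name] = system.delete(email)
            else:
                raise ValueError(f"unsupported request type: {kind}")
        # Compliance log entry kept for internal and auditor review.
        self.audit_log.append({
            "kind": kind,
            "subject": email,
            "systems": sorted(results),
            "at": datetime.now(timezone.utc).isoformat(),
            "reviewed": False,  # set to True once a person has checked the output
        })
        return results

if __name__ == "__main__":
    crm = FakeSaaSSystem("crm", {"a@example.com": {"name": "A"}})
    email_tool = FakeSaaSSystem("email", {"a@example.com": {"opted_in": True}})
    manager = RequestManager([crm, email_tool])
    print(manager.handle("delete", "a@example.com"))
```

Keeping the connectors behind one small interface is what lets a single request cover a long tail of applications; adding a system means adding a connector, not changing the request logic.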

Barber’s background includes stints at Responsys and Node.io, where he met his co-founders, Earl Hathaway and Ignacio Zendejas, each of whom has built machine learning systems at companies including Quantcast and Facebook.

“We all observed that businesses use a number of SaaS [software-as-a-service] systems to run their business. At Datanyze, I saw we tracked all the systems that businesses would use on their web sites, and it is not in single digits.”

Anaplan: Connecting dots

Like Alation, Anaplan is not new to this particular Silicon Valley IT press tour. Nor is it a startup or even an early stage company: it is public and has moved into a much bigger office in San Francisco than the one where this tour encountered it on previous visits.

Nevertheless, its focus on providing software as a service to enable its customers to do business planning, using data in a more connected way, fits the theme of generating value from raw data. And putting spreadsheets to the sword into the bargain.

The supplier now says it has 1,100 customers in 46 countries, and it reported $240m in revenue in its most recent annual results. It estimates that 81% of companies worldwide use spreadsheets as their primary planning tool and that 72% still rely on on-premises software, so it sees scope for further growth.

Ed Tang, vice-president of strategic finance, business analytics and go-to-market operations at Box, presented on the IT Press Tour and testified: “Before we implemented Anaplan, we used spreadsheets for planning functions in both Sales and Finance.

“These were often massive files that existed in hundreds of versions and were painfully slow to open. Now we have a trusted, easy-to-use, cloud-based platform for broader collaboration, improved alignment across departments and better decision-making”.

Data may be the new oil, but it looks like the refining process needs more work if companies and organisations are to get business value from it.

