Databricks CEO: Managing open source in the cloud is hard

This is a guest post for Computer Weekly Open Source Insider written by Ali Ghodsi in his capacity as co-founder and CEO at data science, big data processing and machine learning company Databricks.

Databricks was founded by the creators of Apache Spark to provide an alternative to the MapReduce system, and it offers a just-in-time cloud-based platform for big data processing clients.

The company’s technology is used by developers, data scientists and analysts to help users integrate the fields of data science, engineering and the business behind them across the machine learning lifecycle. 

Ghodsi writes as follows…

The open source community around the world is continuing to grow. According to an Octoverse report, over 1.3 million first-time contributors joined the open source community in 2019. Plus, over 3.6 million software code repositories depend on each of the top 50 open source projects.

So although open source communities are thriving, how are open source powered software vendors faring?

Despite initial skepticism, investors and end users today see the value and potential of open source software, leading to a whole list of open source companies receiving billions in funding every year.

However, now that the open source business model has proven its worth, new challenges have become apparent.

The red herring

But there’s a red herring out there, i.e. the suggestion that public cloud vendors are hurting open source software.

There is a growing perception that cloud vendors are exploiting open source without giving anything back and that open source vendors are hitting back by changing licenses. 

But the real issue is that it’s extremely hard to manage and run a high-quality managed service in the cloud, and not all open source companies are good at it.

Red Hat enjoyed huge success by becoming the prime open source enterprise vendor at a time when on-premise was the only deployment method for businesses. However, the on-premise paradigm was fundamentally different from the SaaS paradigm. In the former, most of the value of the vendor came from support, training and services.

On-premise vendors were heavily reliant on human expertise and services, which came with higher churn and lower upsells because these vendors could easily be replaced independently of the software. In contrast, the SaaS open source business model requires the SaaS vendor to be responsible for a host of additional things, like providing security and reliability guarantees and automatic software upgrades.

The two business models are very different, as both the strategic relevance and the level of engineering required by a SaaS-based model are much higher.

Public cloud value

Today, as cloud adoption continues to take off, open source vendors are realising that they need to shift to a SaaS-based offering. But they know cloud vendors are naturally better at operating cloud-hosted software. They perceive this as a threat and thus might attempt to shut the cloud vendors out by changing the licensing terms.

The reality is open source software itself has zero intrinsic monetisation value because anyone can use it, so there will always be a requirement for open source vendors to determine the value beyond the software. 

We believe this value lies in the vendor’s ability to deliver open source software as a service. The cloud is an inevitability that these vendors will need to embrace, and they will need to prove their performance at scale to cope with the increase in demand for edge computing. Instead of wasting time on pushing cloud providers away, they should be focused on building great SaaS offerings.

Limiting the license of their software will just lead to less adoption and community-driven innovation around those open source projects, which poses a far bigger existential threat to their business.

Collaborating with cloud vendors

In recent years, Microsoft, Google and AWS have been very engaged with open source communities, and the positive approach from the world’s biggest tech firms is a marked change from how they behaved in the past. In spring last year, Google announced seven partnerships with open source vendors – a landmark statement that open source has arrived for enterprises. Microsoft, fuelled by its strategic mandate, has also hand-picked open source companies to keep innovation vibrant.

Microsoft and Google’s pioneering partnership approach is the benchmark for how the big tech giants can help open source tech companies. They are treating them as partners rather than third-party providers by directly integrating them on their cloud platforms and providing clarity in billing and support, all on one interface. 

For example, one of the fastest-growing and most broadly used AI and data services on Azure is Azure Databricks, a service provided through a deep partnership between an open source vendor and Microsoft. Today, customers process over two exabytes per month on Azure Databricks with millions of server-hours spinning up every day.

The next frontier will be centred around how open source businesses handle the data they capture or create, its value and the ecosystem built around it. We’re only at the beginning of this, and it’s exciting to see the emergence of economies forming around data itself.


Databricks CEO Ghodsi: it’s extremely hard to manage and run a high quality managed service in the cloud.




