Calmer waters promised in the data lake through Linux Foundation Delta Lake Project 

The Linux Foundation’s promotion and hosting of Delta Lake is an interesting development.

Delta Lake (wait for it… the clue is in the name) is a project focusing on improving the reliability and performance of data lakes. 

Unified analytics company Databricks announced Delta Lake earlier this year; this autumn it became a Linux Foundation project with an open governance model.

The team points out that organisations in every vertical aspire to get more value from data through data science, machine learning and analytics, but they are hindered by the lack of data reliability within data lakes. 

Delta Lake addresses data reliability challenges by making transactions ACID compliant, which enables concurrent reads and writes.

NOTE: ACID compliance describes database transactions that guarantee atomicity, consistency, isolation and durability. MariaDB provides a nice fully-fledged definition if you want to read more.
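
To make the idea of concurrent reads and writes a little more concrete, here is a toy sketch in plain Python. It is not Delta Lake's code or API; the ToyTable class, the "_log" directory and the file naming are all invented for illustration. It assumes one simple approach: every commit is a numbered file, a writer may only create the next number, and readers only ever look at files up to a version they have chosen.

```python
import json
import os
import tempfile


class ToyTable:
    """Hypothetical append-only table whose state is a series of numbered commit files."""

    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")  # illustrative name only
        os.makedirs(self.log_dir, exist_ok=True)

    def _path(self, version):
        return os.path.join(self.log_dir, f"{version:08d}.json")

    def latest_version(self):
        versions = [int(name.split(".")[0]) for name in os.listdir(self.log_dir)]
        return max(versions, default=-1)

    def read(self, version=None):
        """Return every row visible at `version`, ignoring any later commits."""
        if version is None:
            version = self.latest_version()
        rows = []
        for v in range(version + 1):
            with open(self._path(v)) as f:
                rows.extend(json.load(f))
        return rows

    def commit(self, rows, expected_version):
        """Try to write version expected_version + 1; return False if another writer got there first."""
        path = self._path(expected_version + 1)
        try:
            # O_EXCL means the file is created only if it does not exist yet,
            # so at most one writer can claim a given version number.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        # Simplification: a real system would also have to make the file's
        # contents visible to readers in one step, which this toy does not do.
        with os.fdopen(fd, "w") as f:
            json.dump(rows, f)
        return True


table = ToyTable(tempfile.mkdtemp())
first_ok = table.commit([{"id": 1}], table.latest_version())

# Two writers both start from the same snapshot of the table.
snapshot = table.latest_version()
writer_a_ok = table.commit([{"id": 2}], snapshot)
writer_b_ok = table.commit([{"id": 3}], snapshot)  # loses the race and must retry

old_view = table.read(snapshot)  # a reader pinned to the earlier version
new_view = table.read()
```

The losing writer is told about the conflict rather than silently overwriting the winner, and a reader pinned to an earlier version keeps seeing the same rows while new commits arrive.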

Conformant comfort

The schema enforcement capability in Delta Lake is said to help ensure that the data lake is free of corrupt and non-conformant data.
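
As a rough illustration of what schema enforcement means in practice, the sketch below checks every row of a batch against a declared schema before anything is appended. It is a toy in plain Python, not Delta Lake's implementation; the column names, the EXPECTED_SCHEMA dictionary and the helper functions are assumptions made up for the example.

```python
# Hypothetical schema: column name mapped to the Python type it must hold.
EXPECTED_SCHEMA = {"user_id": int, "country": str, "amount": float}


class SchemaMismatch(ValueError):
    """Raised when a row does not conform to the declared schema."""


def validate_row(row, schema):
    # Reject rows with missing or unexpected columns.
    if set(row) != set(schema):
        raise SchemaMismatch(f"columns {sorted(row)} do not match {sorted(schema)}")
    # Reject rows whose values have the wrong type (exact type match).
    for column, expected_type in schema.items():
        if type(row[column]) is not expected_type:
            raise SchemaMismatch(
                f"column {column!r} expected {expected_type.__name__}, "
                f"got {type(row[column]).__name__}"
            )


def append_rows(table, rows, schema):
    """Append `rows` to `table` only if every row in the batch conforms."""
    for row in rows:
        validate_row(row, schema)
    # Validation happens before any write, so a bad batch leaves the table untouched.
    table.extend(rows)


table = []
append_rows(table, [{"user_id": 1, "country": "UK", "amount": 9.5}], EXPECTED_SCHEMA)

bad_batch = [
    {"user_id": 2, "country": "DE", "amount": 3.0},
    {"user_id": "three", "country": "FR", "amount": 1.0},  # wrong type for user_id
]
try:
    append_rows(table, bad_batch, EXPECTED_SCHEMA)
    rejected = False
except SchemaMismatch:
    rejected = True
```

Because the whole batch is checked before anything is written, one bad row is enough to reject the write, and the rows already in the table stay clean.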

“Bringing Delta Lake under the neutral home of the Linux Foundation will help the open source community dependent on the project develop the technology addressing how big data is stored and processed, both on-prem and in the cloud,” said Michael Dolan, VP of strategic programs at the Linux Foundation. 

“Alibaba has been a leader, contributor, consumer and supporter for various open source initiatives, especially in the big data and AI area. We have been working with Databricks on a native Hive connector for Delta Lake on the open source front and we are thrilled to see the project joining the Linux Foundation. We will continue to foster and contribute to the open source community,” said Yangqing Jia, VP of big data & AI at Alibaba.

As noted above, Delta Lake will have an open governance model that encourages participation and technical contribution and will provide a framework for long-term stewardship by an ecosystem invested in Delta Lake. 




