As more organisations look to navigate the uncharted sea that is ‘enterprise AI’, data management expert Cloudera has thrown its hat (or should that be oars?) into the ring by announcing that its Cloudera Data Platform (CDP) now supports Apache Iceberg.
For the uninitiated, Apache Iceberg is an open table format designed and developed as an open community standard. It allows anyone with SQL skills to build data lakes and perform related operations without needing to learn a new language.
With its announcement, Cloudera claims its open data lakehouse offerings will provide customers with a foundation for analytics and AI capabilities for all their enterprise data, irrespective of whether it resides in the cloud or on-premise.
Cloudera has been an early proponent of Apache Iceberg, introducing it to its Public Cloud offering last year and recently rolling out support for Iceberg V2. At the same time, data lakehouses have become all the rage in recent years as companies look to store and access data across multiple public clouds and on-premise environments.
According to Merv Adrian, founder and principal analyst at IT Market Strategy, “Apache Iceberg is a key technology capable of enabling multi-function and multi-vendor data ecosystems, a big win for enterprises that need to involve all their data to get the most from AI.”
Cloudera states that by using CDP, companies have a safe and fast path to trusted Enterprise AI based on an advanced open data lakehouse, enabling them to deploy the latest AI models with data anywhere.
Only as good as your data
Indeed, generative AI and large language models (LLMs) are only as good as the data they have been trained on.
With increased support for Apache Iceberg, LLMs can draw on all the data an organisation manages with Cloudera, enabling users to tap into more of their data, and in different ways. In fact, Cloudera recently launched its own LLM Chatbot Augmented with Enterprise Data, so that businesses can build their own AI applications powered by an open source LLM of their choice using their internally hosted data.
Of course, more data requires more security and governance, with Cloudera only too happy to highlight its unified capabilities across both structured and unstructured data.
25 million terabytes served
But in all seriousness, that clearly counts for a lot, as Cloudera recently claimed that its solutions are managing 25 million terabytes of data.
As Ram Venkatesh, Cloudera’s chief technology officer, says: “Large enterprises want to get value from all their data using AI and data analytics.”
While there seems to be a new ‘generative AI’ product being released every minute, the reality is we have only reached the tip of the iceberg (sorry!). However, it’s clear that without effective and well-governed data management many initiatives will likely sink rather than swim.