Cloudera Evolve25: Iceberg foundations form unified data approach to AI & analytics 

Cloudera has announced updates to its platform in line with its product roadmap, using its Evolve25 practitioner and partner conference in New York this month to share the news.

Specifically, the company has introduced Cloudera Iceberg REST Catalog and Cloudera Lakehouse Optimizer, positioning both as part of its commitment to delivering what it calls the best open data lakehouse powered by Apache Iceberg.

With these updates, Cloudera says that Iceberg REST Catalog provides the open interoperability needed to share data, while Lakehouse Optimizer helps to ensure data is always optimised and cost-effective for all engines accessing the data, all under Cloudera’s unified governance and security. 

“As enterprises race to unlock the power of AI and analytics, they face significant barriers: complex data architectures, siloed platforms and inconsistent governance. Moving data between systems for analysis or AI training increases costs, introduces security risks and delays insights,” says the company. “Modern organisations need open, secure and interoperable data architectures that support data anywhere for AI everywhere and multi-engine analytics without forcing data duplication or vendor lock-in.”

Zero-copy data sharing 

Cloudera has integrated the Iceberg REST Catalog across what it describes as a full-lifecycle data and AI platform. The aim is to enable secure, zero-copy data sharing and unified governance across any cloud or data centre.

The integration is intended to address the challenges described above. It allows third-party engines to access Cloudera-managed data directly, without copying or moving it, and is designed to ensure consistent policy enforcement and metadata intelligence across public clouds, data centres and the edge.
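To make the idea of zero-copy access more concrete, here is a minimal sketch of how a third-party Python client might read a table in place through an Iceberg REST catalog using the open source PyIceberg library. The endpoint URL, token and table name are placeholders invented for illustration; they are not Cloudera's actual endpoints, and how Cloudera issues credentials is not covered in the announcement.

```python
from typing import Dict, Optional


def rest_catalog_properties(
    uri: str, token: str, warehouse: Optional[str] = None
) -> Dict[str, str]:
    """Build client-side properties for an Iceberg REST catalog connection."""
    props = {"type": "rest", "uri": uri, "token": token}
    if warehouse is not None:
        props["warehouse"] = warehouse
    return props


def read_table(props: Dict[str, str], identifier: str):
    """Load a table via the catalog and scan it in place.

    Requires the optional pyiceberg and pyarrow packages, so the import
    is done lazily inside the function.
    """
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("shared", **props)
    table = catalog.load_table(identifier)
    # The scan reads the existing data files; nothing is copied elsewhere.
    return table.scan().to_arrow()


if __name__ == "__main__":
    # Placeholder values, assumed for illustration only.
    props = rest_catalog_properties(
        uri="https://catalog.example.com/iceberg",
        token="REPLACE_ME",
    )
    print(props)
    # read_table(props, "sales.orders")  # hypothetical table name
```

The client only needs the catalog address and a credential. The table's data files stay where they are, and the catalog remains the single place where access decisions are made.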

Cloudera insists that it is the only vendor (across all clouds) that is capable of delivering unified security, governance and interoperability all the way from real-time ingestion and large-scale processing to AI and BI consumption. 

“Following these updates, all Cloudera customers on Iceberg will now benefit from seamless zero-copy interoperability across the ecosystem, enabling connections to leading analytics and AI engines such as Snowflake, Databricks, AWS Athena, AWS EMR and Salesforce – with full ACID compliance and unified access policies. They will also gain enterprise-grade governance, extending fine-grained access controls, lineage and auditing to third-party tools through Cloudera’s Shared Data Experience (SDX), which ensures secure data democratisation and compliance at scale,” notes the company.

In addition, customers will have open metadata access, providing instant discoverability of data assets without being locked into proprietary catalogues, which accelerates AI development and business intelligence.

Finally, these enhancements are said to deliver a lower total cost of ownership. 

Cloudera Lakehouse Optimizer 

The Cloudera Lakehouse Optimizer is a new service designed to create automated optimisations and table maintenance for Apache Iceberg within Cloudera’s lakehouse. 

It offers optimisations that are said to “go beyond basic table maintenance”, including tasks such as rewriting manifest files and position delete files. By optimising tables automatically, it is intended to remove the need for manual data management tasks and to reduce operational costs, allowing customers to focus on extracting insights from their data. Cloudera describes it as an open solution, applicable to any Iceberg-compatible engine on any public cloud. It provides enterprise-ready observability and control through a unique user interface that allows for granular policy definition and modification.
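For readers unfamiliar with these maintenance tasks, the sketch below generates the Spark SQL calls that an engineer would otherwise run by hand, using the stored procedures that ship with Apache Iceberg itself. This illustrates the manual work being automated and is not Lakehouse Optimizer's own interface; the catalog name `lake` and table name `sales.orders` are assumptions.

```python
from typing import List


def maintenance_statements(catalog: str, table: str) -> List[str]:
    """Return Spark SQL CALL statements for two Iceberg maintenance tasks.

    rewrite_manifests and rewrite_position_delete_files are procedures
    provided by Apache Iceberg's Spark integration.
    """
    procedures = ["rewrite_manifests", "rewrite_position_delete_files"]
    return [
        f"CALL {catalog}.system.{proc}(table => '{table}')"
        for proc in procedures
    ]


if __name__ == "__main__":
    for stmt in maintenance_statements("lake", "sales.orders"):
        # With a live Spark session this would be: spark.sql(stmt)
        print(stmt)
```

Running such statements on a schedule for every table is the kind of repetitive operational work an automated service would take over.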

Policies can be applied to a specific table or to an entire catalogue. Cloudera says the service will also become available on-premises in a future release, where it claims it will be the only one of its kind. According to the company's internal benchmarks, the service improves query performance by up to 13x and reduces storage costs by 36%.
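One way to picture table-level and catalogue-level policies is as a simple lookup with a fallback. The sketch below is purely hypothetical: the `Policy` fields, the names and the rule that a table-level policy takes precedence over a catalogue-wide one are all assumptions made for illustration, since the announcement does not describe how Lakehouse Optimizer resolves overlapping policies.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical maintenance policy; fields are illustrative only."""
    name: str
    rewrite_manifests: bool
    rewrite_position_deletes: bool


def resolve_policy(
    table: str,
    table_policies: Dict[str, Policy],
    catalog_policy: Optional[Policy],
) -> Optional[Policy]:
    """Pick the policy for a table.

    Assumed rule: a policy set on the table itself wins; otherwise the
    catalogue-wide policy applies, if there is one.
    """
    return table_policies.get(table, catalog_policy)


if __name__ == "__main__":
    default = Policy("catalogue-default", True, False)
    strict = Policy("orders-strict", True, True)
    overrides = {"sales.orders": strict}

    print(resolve_policy("sales.orders", overrides, default).name)
    print(resolve_policy("sales.returns", overrides, default).name)
```

Under this assumed rule, most tables inherit the catalogue default while a few high-traffic tables receive more aggressive settings.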