AI workflows - Starburst: Chewing into the fruits of AI data products
This is a guest post for the Computer Weekly Developer Network written by Justin Borgman, CEO of Starburst.
Starburst is known for its data platform, which allows data science and software application development teams to discover, trust and act on data – with built-in governance and context – across cloud, on-premises or hybrid environments, without moving data.
Arguing that enterprise AI is built around access, collaboration and governance, Borgman explores a set of new AI workflows designed to address the key challenges enterprises face when scaling AI agents and other AI initiatives, particularly in terms of access, usability and control.
He writes in full as follows…
Every enterprise is talking about AI adoption and it’s easy to understand why. In a very short space of time, we’ve moved from the foundational building blocks of large language models (LLMs) powering gen-AI services and RAG workflows to the advent of specialised AI agents.
In time, this new generation of AI agents will create a competitive advantage for the enterprises deploying them, driving greater productivity and improving efficiency. But as things stand, enterprise organisations are at different stages of development on the agentic AI journey.
Some are surging ahead, while others are experimenting with AI but are struggling to get those initiatives off the ground. In many cases, this is due to an AI implementation gap at the heart of enterprise AI. But the underlying factor here is that AI is only as good as the data that fuels it. For AI agents to achieve the critical mass of value that businesses demand, they need access to high-quality, contextual data across the entire organisation.
Data products are fuel
As enterprise AI moves from experimentation to reality, organisations are seeking new methodologies and solutions that will turn ideas into action and innovation into production. Integral to this new reality is the emergence of "data products": curated, purpose-built and reusable data sets.
Data products use business-approved metadata to solve specific, targeted business problems quickly across analytics and AI workloads.
A data product combines different data entities with built-in governance, metadata and access controls, making it easy for teams to discover, trust and consume data without manual preparation. This approach turns raw or siloed data into well-defined, query-ready data assets that can be securely shared across teams and environments.
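As a minimal sketch of this idea (the class, field names and roles below are illustrative, not Starburst's API), a data product can be modelled as a data set bundled with its description, schema metadata and access controls, so consumers can discover and trust it without manual preparation:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """A purpose-built, reusable data set bundled with governance metadata."""
    name: str
    description: str
    owner: str
    schema: dict        # column name -> type, aids discovery and AI context
    allowed_roles: set  # simple built-in access control
    tags: list = field(default_factory=list)

    def can_read(self, role: str) -> bool:
        """Governance check: only approved roles may consume the product."""
        return role in self.allowed_roles

# A query-ready, documented asset rather than a raw table dump.
churn = DataProduct(
    name="customer_churn_monthly",
    description="Monthly churn rates per region, approved for analytics and AI.",
    owner="data-platform-team",
    schema={"region": "varchar", "month": "date", "churn_rate": "double"},
    allowed_roles={"analyst", "ml_engineer"},
    tags=["churn", "governed"],
)

print(churn.can_read("analyst"))  # True
print(churn.can_read("intern"))   # False
```

The point of the sketch is that governance and metadata travel with the data set itself, rather than living in a separate document that drifts out of date.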
Data products are essential for AI workflows because AI thrives on context, and metadata is critical for retrieving accurate results. By capturing and curating metadata in a structured, consistent way, data products supply that context: metadata covering schema details and governance rules helps AI models understand the data they are given, which in turn improves feature engineering, model accuracy and explainability.
Building the data stack
Fortunately, many enterprise organisations already have the infrastructure in place to support AI workflows and create their own AI data stacks. This is down to the fact that many enterprises have adopted a federated data system that allows them to access and query data in real time, regardless of where it lives, virtually or geographically.
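As a toy illustration of the federated idea (using SQLite's `ATTACH` in place of a real federated engine), a single query can join data that lives in two separate databases without first copying either one into a central store:

```python
import sqlite3

# Two independent "sources": a sales store and a customer (CRM) store.
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
sales.executemany("INSERT INTO orders VALUES (?, ?)",
                  [(1, 120.0), (2, 80.0), (1, 50.0)])

# ATTACH lets one connection query a second database in place,
# loosely analogous to a federated engine spanning remote catalogs.
sales.execute("ATTACH DATABASE ':memory:' AS crm")
sales.execute("CREATE TABLE crm.customers (id INTEGER, region TEXT)")
sales.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                  [(1, "EMEA"), (2, "APAC")])

# One query across both sources; the data never moves.
rows = sales.execute("""
    SELECT c.region, SUM(o.amount)
    FROM orders o JOIN crm.customers c ON o.customer_id = c.id
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)  # [('APAC', 80.0), ('EMEA', 170.0)]
```

A production federated engine adds pushdown, security and connectors, but the shape of the interaction – one SQL statement over data that stays where it lives – is the same.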
Organisations have been able to consolidate their data management using managed data lakehouse and analytics solutions designed to quickly find relevant data across any dimension, source or index.
As a result, developers, DevOps and data teams can rely on AI workflows that will help them to accelerate AI adoption and innovation. This is helping them to move AI strategy from experimentation to production by making governed, proprietary data instantly usable for a variety of use cases. This includes SQL functions that enable analysts to apply gen-AI directly within SQL queries, making it easy to analyse and transform unstructured text using functions like classification, sentiment analysis and translation.
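A hedged sketch of that in-query pattern: the function below is a trivial keyword stand-in for an LLM-backed classifier, and its name is made up for illustration, but it shows how a sentiment function can be registered and then called directly from inside a SQL query:

```python
import sqlite3

def toy_sentiment(text: str) -> str:
    """Stand-in for a gen-AI classifier: keyword matching only."""
    t = text.lower()
    if any(w in t for w in ("great", "love", "excellent")):
        return "positive"
    if any(w in t for w in ("bad", "poor", "terrible")):
        return "negative"
    return "neutral"

conn = sqlite3.connect(":memory:")
conn.create_function("ai_sentiment", 1, toy_sentiment)  # expose it to SQL

conn.execute("CREATE TABLE reviews (body TEXT)")
conn.executemany("INSERT INTO reviews VALUES (?)",
                 [("Great service!",), ("Terrible wait times.",), ("It arrived.",)])

# Analysts transform unstructured text without leaving SQL.
labels = [r[0] for r in conn.execute("SELECT ai_sentiment(body) FROM reviews")]
print(labels)  # ['positive', 'negative', 'neutral']
```

In a real deployment the registered function would call out to a governed model endpoint rather than a keyword list, but the analyst-facing surface – a function inside a SELECT – stays this simple.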
Other features include AI search, which transforms structured, semi-structured and unstructured data into formats that AI agents can leverage. They can also access proprietary and third-party AI models within a robust data governance framework.
Building the agentic AI workforce
AI workflow technologies and methodologies will also play a critical role in the rollout of enterprise AI agents that will take on different levels of responsibility throughout the organisation. This is important because, despite the enormous potential of agentic AI, there are still some risks associated with relying on autonomous AI agents to carry out specific tasks that, if managed incorrectly, could damage operations or harm reputations.
This is why it’s vital to have human oversight in place to control and manage every aspect of an agentic AI workforce – from agents designed to manage the most basic of repeatable tasks, to those capable of making more complex decisions higher up the enterprise food chain, to the point that we will have agents capable of managing other agents. For this to be a success, humans will be required to manage agentic workflows – essentially orchestrating and coordinating agents to complete multi-step tasks reliably, within their defined roles, to the best of their abilities. As this workforce begins to scale, humans will be required to ensure that agents act within governance, policies and a shared context.
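One minimal way to picture that oversight (every name here is hypothetical, not a specific product's API) is an orchestrator that runs a multi-step workflow but refuses to execute a risky step until a human has approved it:

```python
def run_workflow(steps, approvals):
    """Run multi-step agent tasks; gate risky steps on human approval.

    steps: list of (agent, action, risky) tuples.
    approvals: mapping of action -> bool, set by a human reviewer.
    """
    log = []
    for agent, action, risky in steps:
        if risky and not approvals.get(action, False):
            log.append((agent, action, "blocked: awaiting human approval"))
            continue
        log.append((agent, action, "done"))
    return log

steps = [
    ("report_agent", "summarise_sales", False),
    ("ops_agent", "delete_stale_records", True),  # destructive: needs a human
]

# Without approval the destructive step is held back...
print(run_workflow(steps, approvals={}))
# ...and proceeds once a human signs off.
print(run_workflow(steps, approvals={"delete_stale_records": True}))
```

Real orchestration frameworks add retries, shared context and agent-to-agent delegation, but the governance core is this approval gate plus the log of what each agent did.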
Eventually, this will lead to the development of specialised small language models (SLMs) that will operate individually or in tandem with LLMs to perform very specific tasks. We will reach a stage where these models become agents once they are given a role, a shared context and the tools required to deliver on a specific task.
A federated approach
Having effective AI workflows in place will deliver trust and accountability, especially when it comes to agentic AI and building out an enterprise AI workforce. It will help to eliminate concerns about data leakage, biased outcomes, shadow automation and unclear liability. It will allow organisations to deliver a form of ‘AgentOps’ that will track and manage the performance of agents, providing an audit trail that can be used for continuous assessment and improvement.
All of this is possible by having a federated approach to data in place that will underpin AI workflows for every type of AI use case.
It will also help to power an enterprise organisation’s AI workforce by providing the building blocks – in the form of a data lakehouse platform, connectors, data products and AI intelligence – needed to facilitate data queries from disparate resources and return results with traceable lineage. This will ensure that every agent’s query, response, interaction and workflow can be traced to a verifiable data source.
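A sketch of what that audit trail might record (names and data are illustrative only): for every agent query, log which agent asked, what it asked and which governed source supplied the answer, so each response can later be traced back and assessed:

```python
import datetime

audit_log = []

def traced_query(agent: str, question: str, source: str, answer: str) -> str:
    """Return an answer while appending a lineage record for later review."""
    audit_log.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "agent": agent,
        "question": question,
        "source": source,  # the verifiable data source the answer came from
    })
    return answer

traced_query("churn_agent", "Q3 churn by region?",
             "customer_churn_monthly", "toy answer")
traced_query("ops_agent", "Stale record count?",
             "ops_inventory", "toy answer")

# The trail supports continuous assessment: who asked what, and from where.
for entry in audit_log:
    print(entry["agent"], "->", entry["source"])
```

In practice the log would live in durable, queryable storage rather than a Python list, which is what lets 'AgentOps' teams audit and improve agent behaviour over time.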
By adopting high-leverage, auditable AI workflows, developers, DevOps and data teams can establish the governance and data foundation that will allow them to build out their AI capabilities with confidence.