Without a data strategy, AI will just scale your chaos

Snowflake’s chief data analytics officer, Anahita Tafvizi, explains why a data strategy focused on governance, consistency and accuracy is the only way to build artificial intelligence that users will trust

For many organisations, the promise of artificial intelligence (AI) is being undermined by a fundamental problem: their data isn’t ready for it. This can lead to conflicting metrics and inconsistent AI agents, creating what Snowflake’s chief data analytics officer (CDAO), Anahita Tafvizi, calls “chaos at scale”.

Tafvizi argues there can be no AI strategy without a data strategy. She believes that unless organisations first do the hard work of removing data silos, establishing clear governance and ensuring accuracy, their AI initiatives are destined to fail. For her, building a strong data foundation is not a trade-off for innovation, but an essential enabler for it.

Speaking with Computer Weekly ahead of the Snowflake World Tour in Sydney, Tafvizi, who joined the data platform provider in December 2023, suggests some practical steps CDAOs can take to build a data foundation that enables AI innovation.

Editor’s note: This interview was edited for clarity and brevity.

What are the most pressing practical steps for getting the most value from an organisation’s data?

Sridhar [Ramaswamy, Snowflake CEO] and I say this all the time: there is no AI strategy without a data strategy. To get the full advantage of AI, you need to make your data AI-ready.

You have to get your data ready in a practical sense, but also get the culture of the company ready, which means data literacy, the right governance and the right education.

AI readiness for data has several components. The first is: how do we remove the silos? When I joined Snowflake, we didn’t have a centralised data team. We had a data team under sales, a data team under marketing, a data team under HR, and so on. Each of these data teams was doing amazing work, but they were doing it in silos, independently of each other.

It resulted in things like metric sprawl and dashboard sprawl. The same metric was defined differently by different teams. Even simple definitions, such as the lifetime value of a user or the number of active accounts, could be different. When I joined, we decided to centralise the data teams under me, which has helped us bring these definitions together, remove data silos and make sure people are speaking the same language. We are also creating what we call the Metrics Council: every time we define a new metric, it will go through this council to make sure we are all in alignment.
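As an illustration of the idea (not Snowflake's implementation), here is a minimal Python sketch of a central metric registry; the metric names and SQL are hypothetical. The point is that every team resolves a metric such as "active accounts" to one council-approved definition rather than maintaining its own copy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    definition_sql: str     # the single agreed-upon computation
    owner: str              # team accountable for the definition
    council_approved: bool  # has the Metrics Council signed off?

# Hypothetical registry: one canonical definition per metric.
METRIC_REGISTRY: dict[str, Metric] = {
    "active_accounts": Metric(
        name="active_accounts",
        definition_sql=(
            "SELECT COUNT(DISTINCT account_id) FROM events "
            "WHERE event_date >= DATEADD('day', -30, CURRENT_DATE)"
        ),
        owner="central-data-team",
        council_approved=True,
    ),
}

def get_metric(name: str) -> Metric:
    """Return the canonical definition; fail loudly on unregistered metrics."""
    try:
        return METRIC_REGISTRY[name]
    except KeyError:
        raise KeyError(f"'{name}' has no council-approved definition") from None
```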

Consistency is a common challenge for a lot of CDAOs. In the age of AI, this is becoming an even bigger problem, because metric sprawl and dashboard sprawl are now joined by agent sprawl. It’s not very hard to build an agent. As a result, you now have different teams building agents that may be inconsistent and making different decisions, which magnifies the problem you had before.

The second part is having the right metadata, which we call the semantic data layer. This is how you build a translation layer between your data and your business logic. I’ve led data teams for a couple of decades, and every time I went to a new company, the biggest challenge was a lack of documentation. No one documents anything. So when people come in and ask, ‘Where do I find the revenue data?’, they have to ask the person next to them. If the only documentation is in people’s heads, how do you give that context to the AI models?
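To make that translation layer concrete, here is a minimal sketch of what one semantic layer entry might capture; every table, column and synonym name is hypothetical, and a real semantic model would hold the same kind of information in whatever format the platform expects.

```python
# Hypothetical semantic layer: maps business vocabulary to governed data,
# so the context that used to live only in people's heads is written down
# where an AI model (or a new hire) can use it.
SEMANTIC_LAYER = {
    "revenue": {
        "description": "Recognised revenue in USD, net of refunds.",
        "table": "finance.fct_revenue",
        "column": "net_revenue_usd",
        "grain": "one row per invoice line",
        "synonyms": ["sales", "turnover", "income"],
    },
}

def resolve_business_term(term: str) -> dict:
    """Translate a business question's vocabulary into physical data."""
    term = term.lower()
    for name, entry in SEMANTIC_LAYER.items():
        if term == name or term in entry["synonyms"]:
            return entry
    raise LookupError(f"No semantic mapping for '{term}' - document it first")

print(resolve_business_term("turnover")["table"])  # -> finance.fct_revenue
```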

The third part is a verified query repository. We’re solving this AI problem for both structured and unstructured data, and interestingly, unstructured data is easier to solve. With unstructured data, LLMs [large language models] are all probabilistic – they’re basically determining what the next word is most likely to be in a particular sentence.

The problem with structured data, in my opinion, is that the bar for accuracy is 100%. If you ask, ‘How much was my Q1 revenue?’, there is only one correct answer. You have to be correct down to the penny. That’s a very difficult problem to solve, and the way we have done it is to create a verified query repository – a set of queries verified by the data team so that business users know they are reliable.
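A deliberately simple sketch of that pattern follows, with hypothetical question keys and SQL: the assistant only serves answers backed by a query the data team has signed off on, and refuses to guess where the bar for accuracy is 100%.

```python
# Hypothetical verified query repository: each entry is SQL the data team
# has reviewed, so a business user's answer is correct down to the penny.
VERIFIED_QUERIES = {
    "q1_revenue": {
        "question": "How much was my Q1 revenue?",
        "sql": (
            "SELECT SUM(net_revenue_usd) FROM finance.fct_revenue "
            "WHERE fiscal_quarter = 'Q1'"
        ),
        "verified_by": "central-data-team",
    },
}

def answer(question_key: str) -> str:
    """Serve only human-verified queries; refuse to improvise SQL."""
    entry = VERIFIED_QUERIES.get(question_key)
    if entry is None:
        return "No verified query for this question - escalate to the data team."
    return entry["sql"]  # in production this would be executed, not returned
```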

Lastly, there is continuous testing and learning, which we call reinforcement learning. Every time you find an error, you add to your instructions, you add to your verified query repository, you make your semantic layer better, and then you keep iterating. This is how my team built a GTM [go-to-market] AI assistant, a tool our sales organisation can use to prepare for a meeting or better understand their accounts.
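That feedback loop might look like the sketch below; the function and field names are hypothetical. Each reported error is folded back into the assistant's standing instructions, its verified query repository, or the semantic layer.

```python
# Hypothetical sketch of the iterate-on-errors loop: every mistake a user
# reports becomes a standing instruction and a newly verified query, so the
# assistant's coverage and accuracy grow with use.
def handle_reported_error(error: dict,
                          instructions: list[str],
                          verified_queries: dict) -> None:
    # 1. Capture the corrective guidance as a standing instruction.
    instructions.append(error["correction_note"])
    # 2. Promote the corrected SQL into the verified repository.
    verified_queries[error["question_key"]] = {
        "question": error["question"],
        "sql": error["corrected_sql"],
        "verified_by": error["reviewer"],
    }
    # 3. If the root cause was a missing or ambiguous business term, the
    #    relevant semantic layer entry would be updated here as well.
```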

The combination of great AI models and bad data or bad governance brings chaos at scale. You previously had chaos; now you have scaled your chaos. It has never been more important than it is now to build governance across technology, operations and even the company culture.

How should CDAOs balance the desire for a single source of truth with the growing availability of real-time data for real-time decisions?

Removing the silos gives you a single source of truth.

Moving to a unified business intelligence platform and removing dashboard and metric sprawl felt like a thankless, back-end job. No one really sees it if it’s done well. But when I started having customer meetings, everyone was interested in hearing about the challenges of dashboard and metric sprawl within their companies.

Their business users don’t know where the truth lies. Why do two dashboards show two different numbers? While there are always nuances to explain these discrepancies, such as different definitions or time frames, you have to explain them, and that makes it very hard for users. Every minute a salesperson spends trying to debug and triangulate two different dashboards is a minute they don’t spend with customers. Our job is to make the information readily available, trustworthy and consistent for them.

Building a very strong foundation layer to enable AI innovation is something I have personally experienced and have always pushed for, but it’s interesting to see this is a consistent theme with other CDAOs across the industry as well.

We had a CDAO panel at the Snowflake Summit a few months ago, and the question of balancing governance and innovation came up. Governance means taking the time to build your data foundation the right way, but that may mean you can't build your next shiny AI project as fast as you'd like.

But I don’t consider them trade-offs. I think a data foundation enables innovation, rather than working against it. I set a very high bar: if you release an AI model that’s only 80% accurate to your sales team, you lose their trust and they won’t use the model, so what was the point? If you take your time, you can come in with very high quality, even if that means going to market a little later. Don’t get me wrong, I also always push for execution and innovation, but at the end of the day, I hold quality to a very high standard.

What do you think are the most important AI governance practices that CDAOs should be putting into action right now?

Governance has many components. To me, accuracy is non-negotiable. In the past, we would build dashboards and, allowing for nuances of definition, the data was by and large accurate. With AI agents, it has been harder to get accuracy to that level, but that’s the expectation. Otherwise, it’s not helpful. So the first step is to deliver timely, accurate, high-quality data.

Then there’s access control. Snowflake makes it very easy to bring the access controls for different tables to your AI agent. For example, the GTM AI assistant we built is open to the entire sales organisation. But when salespeople use it, they only see the accounts they’re supposed to see, not the entire company’s revenue. And a sales manager can see the performance of their team, but not their peers’ teams. The access controls you have in Snowflake flow through to the AI agents.
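As a rough illustration of that behaviour (the real enforcement happens inside Snowflake's own access controls; the data and filter below are purely hypothetical):

```python
# Hypothetical account ownership data and role-scoped visibility rules.
ACCOUNT_OWNERS = {
    "acct-001": "alice",  # owned by salesperson alice
    "acct-002": "bob",    # owned by salesperson bob
}

def visible_accounts(user: str, manager_of: dict[str, str], role: str) -> list[str]:
    """A salesperson sees only their own accounts; a manager, their team's."""
    if role == "sales_rep":
        return [a for a, owner in ACCOUNT_OWNERS.items() if owner == user]
    if role == "sales_manager":
        team = {rep for rep, mgr in manager_of.items() if mgr == user}
        return [a for a, owner in ACCOUNT_OWNERS.items() if owner in team]
    return []  # no recognised role, no data

# carol manages alice and bob, so she sees both of their accounts:
print(visible_accounts("carol", {"alice": "carol", "bob": "carol"}, "sales_manager"))
```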

To summarise, CDAOs should be concerned with giving the right people the right access to the right data at the right time.

There doesn’t seem to be agreement about what the term ‘AI guardrails’ means. How can CDAOs come to sensible conclusions about what they should expect?

There are some things we can all agree AI shouldn’t do – promote violent crimes, hate speech, self-harm and sexual content. As technologists, we should ensure LLMs are blocked from providing these sorts of responses. In fact, last year, we released Cortex Guard, which enables enterprises to easily implement safeguards that filter out potentially inappropriate or unsafe LLM responses.
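This is not Cortex Guard's actual interface, but the pattern it implements can be sketched simply: classify a model's response against unsafe categories and withhold it if any are flagged. The classifier below is a stub.

```python
UNSAFE_CATEGORIES = {"violent_crime", "hate_speech", "self_harm", "sexual_content"}

def classify(response: str) -> set[str]:
    """Stand-in for a trained safety classifier; returns detected categories."""
    return set()  # a production guardrail would call a real safety model here

def guarded(response: str) -> str:
    """Pass a response through only if no unsafe category is detected."""
    flagged = classify(response) & UNSAFE_CATEGORIES
    if flagged:
        return "Response withheld: flagged for " + ", ".join(sorted(flagged))
    return response
```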

When it comes to the broader view on guardrails, I believe it’s in all our interests that we do everything we can to ensure AI is trusted – whether that is through accountability and data governance, risk management or transparency. Trust should be considered non-negotiable.
