Giving ourselves the chance to lead the AI race, and stay ethical too
This is a guest blog post by Denas Grybauskas, Chief Governance and Strategy Officer at Oxylabs.
Within the UK and Europe more broadly, many companies developing AI must balance an ever-growing list of priorities. AI models require huge amounts of data, yet the rights of original content creators must be respected. In the last year alone, we have seen the EU AI Act and the GPAI template come into effect. These bring some order, but the nuts and bolts of implementation are still being worked out and negotiated by lawmakers and businesses alike.
Europe has long been seen as regulating industries more strictly than the US and China, and AI is no exception. But what are the effects of this approach? Are we striking an effective balance between innovation and ethics for the enterprises in our region?
Keeping up with the need for data
To understand how to move forward, we first have to put ourselves in the shoes of organisations currently expanding their AI operations. Developers need a consistent pipeline of clean, broad data to train models and keep them current, so preserving access to large datasets is essential. Smaller datasets, as in any data practice, risk replicating bias; larger datasets tend to mean more balanced training data, which can lead to less biased outcomes. And there is no larger dataset than the whole open Internet. It follows that open access to the public web is crucial for responsible AI development.
We have already seen Big Tech strike deals to acquire the training data its operations depend on. Such deals, while a step forward, do not solve every problem. Consider the deal between Amazon and the New York Times (NYT). If NYT is its only major source of training data, Amazon’s AI will lean heavily towards NYT’s point of view. To avoid this and build a versatile AI that takes multiple viewpoints into account, Amazon would have to either strike hundreds, if not thousands, of such deals or source data elsewhere. Ideally, it can achieve a more holistic AI by using public data aggregated from many online sources.
Additionally, if we pursue the dealmaking route, many players will likely be left behind. Smaller AI companies and smaller publishers alike will face economic barriers, potentially finding themselves without data or without compensation for their content. Data is the new oil, especially for AI, but as with any resource, egalitarian access isn’t guaranteed.
Pairing regulation with ethics – can it be done?
The need for large datasets is evident, and legislators are already treading carefully in how they police access to them. Earlier this year, the Data (Use and Access) Act was debated in the House of Lords. With this legislation, the UK government aims to make training data more accessible, recognising its crucial role in AI innovation. However, there are concerns that more consideration should be given to protecting content creators, and the government has promised separate legislation to address this.
Meanwhile, the US is making its regulatory stance clear with the Big Beautiful Bill: innovation will be prioritised over all else, giving US AI model developers clear direction for the time being. Such an act could hardly exist in Europe, where rapid regulation, not rapid innovation, often takes precedence. Regulation can, of course, protect both private persons and businesses. However, rushing to regulate an industry without giving yourself time to understand it has downsides.
We’ve seen this with GDPR. Companies that built in privacy by design early avoided fines and gained easier access to other markets that later adopted GDPR-like standards. At the same time, some parts of GDPR were hard to interpret, and compliance became burdensome for businesses. Ideally, legislators will have learned these lessons and will aim to avoid repeating them as they turn their eyes to the AI industry.
The steps needed for innovation and compliance
While regulation is still evolving, we have the opportunity to strike a balance between establishing proper guardrails and leaving space for innovation. One way forward is an approach that requires disclosure of training data sources rather than requiring companies to secure affirmative consent from every rights holder. Such requirements address at least some of the safety and fairness concerns, and they are less likely to drive innovative companies away and leave Europe behind in the AI race.
If companies are backed into a corner and unable to collect enough data, they might push ahead with biased AI models built on small datasets. Alternatively, if they decide the regulation isn’t something they can follow, they might find loopholes to collect data regardless and ignore the guardrails altogether. Overly burdensome legislation might also drive Big Tech companies in the US and China to lobby for exemptions; we can already see this dynamic in how the US frames EU regulation as a digital tax aimed at Big Tech.
The European Commission is attempting to find a solution and promote transparency through the GPAI template, which requires AI companies to disclose information on data sources, compliance measures, and other relevant details. We are only at the starting point, but a focus on transparency can help create the conditions for a fair AI and data ecosystem in which the concerns of all stakeholders are addressed. From there, we can work toward a model of remuneration for content creators that allows both their industry and the AI industry to thrive.
We still need to see the GPAI template’s actual effects before we can tell whether it provides enough protection for one industry without excessively burdening the other.
The EU often prefers to risk over-regulation rather than under-regulation, which is all the more reason to pause and take stock of existing rules before rushing to expand them. Making effective use of the regulatory tools we already have, rather than creating new ones, may be the best way to bring clarity and balance to the market.
