Taking back control: Could a distributed model breed a better AI?

AI tools such as ChatGPT are trained on datasets scraped from the web, but you don’t have much say in whether your data is used. Technologist Bruce Schneier says it’s time to give control of AI training data back to the people

In his latest book, A hacker’s mind: How the powerful bend society’s rules, and how to bend them back, public interest technologist and cyber security expert Bruce Schneier describes a world a few years hence where artificial intelligence (AI) has had a profound effect on humanity.

Schneier is not alone in his assessment that the short-term future for AI will be a wild ride. He worries that the power of AI models to hack – defined here, in its simplest form, as any act of discovering and exploiting vulnerabilities and loopholes in systems, not necessarily in a cyber security context – will far outpace that of human hackers.

He argues that the hacks AIs will discover will almost inevitably be used to benefit the wealthy and the powerful. Imagine, if you dare, a scenario where AIs become so adept at exploiting tax and regulatory systems in the service of amoral hedge funds and venture capitalists that wealth inequality increases exponentially and economic systems begin to crash. It’s not possible today, but it’s probable tomorrow.

“I talk about the notion of AI hacking [and] finding vulnerabilities in systems,” says Schneier. “In general, AI is very discontinuous technology and we don’t know what’s possible – things that we think are easy end up being hard and vice versa. So we don’t know.

“But I think this is going to be the biggest change in human society. I think it’s going to affect everything.”

Nobody, not even Schneier, yet knows how to solve these problems, but through his work as chief of security architecture at Inrupt, where he has reunited with long-time collaborator John Bruce and World Wide Web inventor Tim Berners-Lee, he is now working on an idea that, if it comes good, may give some power over AI back to the people.

Berners-Lee has always been an advocate for the open web and makes no secret of wanting to safeguard the democratic principles on which he founded it. He and John Bruce set up Inrupt on similar principles: enabling individuals to control their experience and data in a way that has been lost since the advent of platforms like Google and Facebook, now Meta, in the mid-2000s.

Put as simply as possible, Inrupt’s technology – the Solid Platform – organises data, applications and identities in a way that gives the data owner the power to choose how and where it is stored, and who can access it, via their own personal online data store or Pod.

Early adopters have included NatWest Bank, the BBC, the government of Flanders in Belgium, and the NHS, which have been exploring pilot use cases for an enterprise version since 2020.

What does this have to do with AI, then?

So it’s a cloud storage service? Not exactly. Think of a Pod as something more akin to a private website where you control how your personal data is made available to applications or other people in a way that makes sense to you.

Were you at a party with someone? Then you can let them see photos you took at the party, but not your holiday snaps. Did you work with someone on a project? Then you can let them access the project files, but not the draft of your novel. Have you gone through a relationship breakdown? Then you can rescind your ex’s access to your data.
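The sharing rules described above can be pictured as a small per-resource permission table. The Python below is a toy sketch of that idea only: the `Pod` class, its method names and the example paths are hypothetical illustrations, not Inrupt’s or Solid’s actual API, and a real Pod would sit behind a web server with proper authentication.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy stand-in for a personal data store (hypothetical, not the Solid API)."""

    owner: str
    # Maps a resource path to its stored content.
    resources: dict[str, str] = field(default_factory=dict)
    # Maps a resource path to the set of people allowed to read it.
    readers: dict[str, set[str]] = field(default_factory=dict)

    def put(self, path: str, content: str) -> None:
        """Store a resource; nobody but the owner can read it until granted."""
        self.resources[path] = content
        self.readers.setdefault(path, set())

    def grant(self, path: str, who: str) -> None:
        """Let one person read one resource."""
        if path not in self.resources:
            raise KeyError(path)
        self.readers[path].add(who)

    def revoke_all(self, who: str) -> None:
        """Rescind every grant held by one person."""
        for allowed in self.readers.values():
            allowed.discard(who)

    def can_read(self, path: str, who: str) -> bool:
        """The owner can always read; others need an explicit grant."""
        return who == self.owner or who in self.readers.get(path, set())

    def read(self, path: str, who: str) -> str:
        if not self.can_read(path, who):
            raise PermissionError(f"{who} may not read {path}")
        return self.resources[path]


# Share the party photos with a friend, but not the holiday snaps.
pod = Pod(owner="alice")
pod.put("photos/party.jpg", "party photo")
pod.put("photos/holiday.jpg", "holiday snap")
pod.grant("photos/party.jpg", "bob")

# Later, withdraw everything that person could see.
pod.revoke_all("bob")
```

The point of the sketch is that access is decided resource by resource and can be withdrawn by the data owner at any time, rather than being set by whichever platform happens to hold the data.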

“It is putting data in the hands of people in a way that is generative, which is something we’ve lost with the big tech platforms,” says Schneier. “In a sense, the early web was generative in that everybody could do their thing.

“This is a way to almost take data sideways. Instead of Fitbit having your Fitbit data and your refrigerator having your refrigerator data, Inrupt turns it sideways so that you have your data from all those places in one place, and apps can be written that use data from here or there easily.”

How does this relate to AI? Well, right now, a lot of the concern over large language models, such as the one that underlies the current AI bogeyman, ChatGPT, lies in how they are trained – using all the data they can possibly scrape from every corner of the public internet, without asking. At the very least, this is concerning from a privacy perspective.

Schneier says this model of “training without consent” is ripe for disruption. He asks: what if we disrupted this and took back control of what data AIs are allowed to be trained on?

“The notion that you can train a personal AI on yourself will be powerful in ways we don’t even comprehend,” says Schneier. “That’s just not possible now because Fitbit has your Fitbit data, and Twitter has your tweets, and Facebook has your Facebook stuff, and Google has your email. You can’t get to all of your stuff.”

So long Siri, adieu Alexa

In this context, says Schneier, the power of turning data sideways and controlling who or what can access it is that, theoretically, it could enable you to train a personal, private AI that is tailored to your specific needs and interests in the online sphere.

“Right now, you have a large language model that is trained on everything, so it’s racist, because a lot of people are racist. But if you could train the large language model on just me, it could be a better assistant,” he says.

“For example, it could produce a first draft of something. That would be neat, and that I would be able to make a second draft out of.

“If AI is trained on me, as me, then it becomes my assistant, working for me, not someone else. That seems to be a huge game-changer.”

Schneier is a fan of assistive technology. “I want a world where someone who is not very articulate can write a letter to their congressperson. I want that world,” he says. “But I don’t want a world where an AI is writing a million letters pretending to be from people. So how do we unlock that assistant feature? One of the ways is through personalisation.”

The value to a creative professional, like Schneier, who has published close to 20 books in his career, or this reporter, who has published zero but keeps trying just in case, is clear.

But the applications of this theory go way beyond generating a useable first draft of a book, or a letter to a politician. What if you could train an AI assistant on the venues you checked into on Facebook, or the photos you posted to Instagram? Maybe you’re into street food or good beer; if you find yourself in a new city, your AI might be able to recommend some excellent food trucks or real ale pubs. Did you enjoy that one band at Glastonbury? They’re playing near you tonight.

This is already done to some extent – Amazon trains its algorithms on your data all the time, but for its own benefit, not yours. “It’s to the user’s benefit if they coincide, but if they come in conflict, Amazon wins because they own the Echo,” says Schneier.

Are you ready for this?

The advent of a version of Alexa that has any utility beyond providing you with a weather update before leaving the house, setting a timer, or accessing the BBC Sounds app, is certainly an interesting prospect to consider.

But even if you’re on board with that, there will be other issues to weigh, not least the idea of moving beyond the walled gardens of the tech giants that, for all their failings, we have become used to over the past 15 years.

Many may fear that setting up and running their own AI assistants and Pods will be beyond their technical capabilities, a concern Schneier acknowledges but does not expect to be a huge issue.

He says: “Sure, you could run your own Pod. Just like you could run your own email server, but you don’t – you use Apple or Google or Microsoft. Most likely, your Pod will be hosted by someone else.”

Precisely who these hosts will be is yet to be decided. Your Pod could be supplied as part of your broadband or mobile package. If you want to buy your own physical storage device, it could sit there. It could even be offered as a service by a tech giant.

“There are all these different ways that we put stuff in the cloud, because that’s the way it makes sense. This will be that too…. [But] the mechanics of it have to be transparent to the user. It can’t be that you need to be an expert to have your own Pod.”

Is Schneier comfortable with building this service only to put it in the hands of an Apple or Google, or worse, a Meta?

“You’ve got to worry about the tech giants,” he says, “but you can also move the Pod like a web server. If you don’t like what they [the host] are doing, go somewhere else, it’s super easy…. Certainly, it would be better if you ran your own server, but you’re not going to do that, let’s be reasonable.”

The growing influence of federated social media platform Mastodon could serve as a possible model for consumer Pods. Mastodon is not a single, privately held social network, as Twitter now is, but rather a network of distributed servers, or instances, that federate with one another. If a Mastodon user doesn’t like their server, they’re free to move whenever they like.

Changing the internet, again

If AI proves to be a nightmare movie plot threat, like Skynet in Terminator, all the Pods in the world won’t stop the nukes from flying. But if this plays out as Schneier suggests it may, the idea of decentralising data in the service of giving individuals agency over AI, and their own privacy, is certainly an attractive one.

“This notion of distributing our data is much more resilient, reliable, generative and better than if the big tech monopolies have it,” says Schneier. “Google has all of your data, but they don’t have to have it, it’s just convenient for them to have it.

“We can make it convenient for you to have it, and that would be better for you. That’s the vision of Pods, the vision of Solid. The reality is not there yet, but Tim Berners-Lee changed the internet once. He has a track record of changing the way the internet works, so I wouldn’t put it past him to do it again.”

