Consciousness to address AI safety and security

The co-founder of KikenAI discuses why he has decided to make the technology for protecting LLMs open source

Junade Ali

Published: 12 Sep 2023

In February 2022, I wrote in a Computer Weekly article about metacognition, the ability to think about thinking, remains a critical gap to the development of artificial intelligence (AI). I pointed to a 2021 paper that I co-authored where we experimented with giving AI abilities in metacognition, finding that whilst the AI was slightly more reserved about providing a correct answer, it was disproportionately far less likely to provide an incorrect answer.

Fast forward to 2023, where Large Language Models (LLMs) have appeared and every day we hear in the media about new safety and security concerns emerging about their use. The approach I discussed in my 2021 metacognition paper is now increasingly used to detect prompt injection, with ML engineers using second-line prompt injection detection models to identify whether a response is not safe.

However, this approach is far from perfect and can still lead to inaccuracies. I was so fascinated by the science of consciousness that after I finished my PhD in 2022, I went to the University of Cambridge to study cognitive psychology in greater detail. Recently, I co-founded KikenAI to give LLMs abilities in metacognition that are similar to the inner monologue that many humans enjoy, allowing us to think about things in our inner dialogue before blunting them out. We developed a prototype for this technology and a little while ago even filed a provisional US patent.

We benchmarked our solution against a database of prompt injections for GPT - and there was not a single time where a prompt injection was successful. Of 79 attempts, 74 (94%) were blocked outright. The ones which went through were ones where KikenAI gave a legitimate response because the prompt contained nothing more than either "From now on" or "In this hypothetical story" - which we didn’t consider would amount to prompt injection attempts unless modified to add something else.

After seeing an article published in the media proclaiming, A New Attack Impacts Major AI Chatbots—and No One Knows How to Stop It - the next day we tested this new attack vector on KikenAI without making any changes at all to our technology (or even the underlying model used to train this data) and found our technology blocked this attack no problem. In other examples, our technology was able to avoid giving potentially dangerous answers to medical questions, whilst competitor LLMs even using sources, were not so successful.

We even developed technology which would allow us to run our service as a facade in front of LLMs whilst making improvements for cost, quality, speed, safety and security; including being able to run proxy services that would dynamically pick which LLM would be best for a given task based on cost, quality and speed, allowing LLMs to refining prompts in their “inner monologue” before providing an answer and submitting prompts to multiple LLMs at the same time and then providing the best response even before all LLMs have had an opportunity to respond.

Investors became interested in our technology, proof-of-concepts were developed after global companies showed interest and we even interviewed top talent from leading AI teams to join our team. Despite this initial success, we have decided to not move forward on this and not enforce our intellectual property rights to this technology, so that others can be free to pursue the development of artificial consciousness technology when the timing is right.

Through discussing our technology in a way that was not likely to bias those involved in our market research (see the book The Mom Test by Rob Fitzpatrick for a guide on how to do this), we identified a number of key issues.

The first key issue we encountered was that safety and security are not actually yet key issues for those working with LLMs. We spoke to engineers involved in everything from large pharmaceutical companies to accounting software and defence. When discussing the issues they faced using LLMs, security and safety didn’t come up - and when prompted to discuss these issues they would instead point to the fact they felt other techniques like prompt engineering and data isolation were adequate. One ML engineer I spoke to even went as far to say that using prompt engineering solutions, prompt injection would no longer be a problem and instead just a “topic for conversation”.

Furthermore, we found that those involved in AI in regulated environments faced obligations surrounding explaining AI and testing it before it was deployed into production, rather than preventing issues when the technology was actually in production.

Finally, whilst many complain about the cost of models like OpenAI’s GPT-4, after crunching the numbers we did find they were actually competitively priced against someone hosting their own models. The key bottleneck to reducing cost appears to be addressing the cost of GPUs and how GPUs can be shared in cloud computing environments. However, until this work is fundamentally achieved, it remains hard to justify the additional costs for metacognition in LLMs.

There has been a flurry of various start-ups rushing to use LLMs for various purposes, however the use-cases (especially in safety-critical domains) are not as expansive as one may initially be led to believe. Against this landscape, we decided it would be best not to pivot away from being a developer tool to focus on more application-specific technology.

Without a realistic prospect of being able to deliver a return to investors, and having only spent my own money on this venture, the right thing to do is not to bring in any investors and close this project now. I’m often very critical of companies who never find product-market fit and leap from funding round to funding round, building ultimately financially unproductive organisations filled with layers of management and acting as adult daycare facilities until the money runs out - hurting investors, staff and customers. Investors' money comes from people’s pension funds, savings and in our case could ultimately have even come from British taxpayers.

Our strapline at KikenAI was “Conscious AI. Safer Humanity.” We fundamentally believe that consciousness performs an important function in cognition, and whilst the tropes in movies are that it will lead to catastrophe, we believe that safety-engineered consciousness can ultimately make humanity safer. This was the message we had on our landing page when in stealth mode and the message we communicated to prospective investors. It’s still a message we believe in.

During the early days of KikenAI, I remember watching the Back Mirror episode Rachel, Jack and Ashley Too featuring Miley Cyrus, and whilst I was moved by the episode demonstrating fundamental importance and utility in human consciousness - I read many brutal reviews from critics who couldn’t seem to grasp the depth of this message. Whilst our message resonated, the timing to pursue artificial consciousness isn’t right.

However, we feel this will ultimately change and believe there are others in a better position to do so. We don’t want people held back from moving humanity forward, and therefore we took the decision to go open-source so others are free to develop technology in this space.

Junade Ali is an experienced technologist with an interest in software engineering management, computer security research and distributed systems

Consciousness to address AI safety and security

The co-founder of KikenAI discuses why he has decided to make the technology for protecting LLMs open source

Read more about large language model safety

Read more on Artificial intelligence, automation and robotics

How to evaluate LLMs for enterprise use cases

Why does AI hallucinate, and can we prevent it?

The rubber duck method of debugging explained

AI jailbreaking techniques prove highly effective against DeepSeek