The UK’s National Cyber Security Centre (NCSC) has issued advice and guidance for users of AI tools such as ChatGPT that rely on large language model (LLM) algorithms, saying that while they present some data privacy risks, they are not currently all that useful to cyber criminals.
Use of LLMs has seen exponential growth since US startup OpenAI released ChatGPT into the wild at the end of 2022, prompting the likes of Google and Microsoft to unveil their own AI chatbots at speed, with varying results.
LLMs work by incorporating vast amounts of text-based data, usually scraped without explicit permission from the public internet. In doing so, said the NCSC, they do not necessarily filter all offensive or inaccurate content, meaning potentially controversial content is likely to be included from the get-go.
The algorithm then analyses the relationships between the words in its dataset and turns these into a probability model that is used to provide an answer based on these relationships when the chatbot is prompted.
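As a rough illustration of what "turning relationships between words into a probability model" can mean, the Python sketch below builds a toy bigram model: it counts which word follows which in a tiny made-up corpus and converts those counts into probabilities. This is an assumption-laden simplification and not how ChatGPT or any production LLM is built; real models use neural networks trained on vastly larger datasets. The function names and the sample text are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count which word follows which, then normalise counts into probabilities."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    # Pair every word with the word that comes directly after it.
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Probability of each follower = its count / all followers seen.
        model[current] = {w: c / total for w, c in followers.items()}
    return model


def most_likely_next(model, word):
    """Return the most probable next word, or None if the word was never followed by anything."""
    options = model.get(word.lower())
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical toy corpus; a real LLM is trained on vast amounts of text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
```

With this corpus, `most_likely_next(model, "the")` picks "cat", because "cat" follows "the" in two of the three places "the" is followed by anything. The sketch also hints at why such systems can get things wrong: the answer reflects only what was frequent in the training text, not what is true.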
“LLMs are undoubtedly impressive for their ability to generate a huge range of convincing content in multiple human and computer languages. However, they’re not magic, they’re not artificial general intelligence, and contain some serious flaws,” said the NCSC’s researchers.
For example, such chatbots often get things wrong and have been seen “hallucinating” incorrect facts. They are prone to bias and can often be very gullible if asked a leading question. They need huge compute resources and vast datasets, and obtaining the latter poses ethical and privacy questions. Finally, said the NCSC, they can be coaxed into creating toxic content and are prone to injection attacks.
The research team also warned that while LLMs do not necessarily learn from the queries with which they are prompted, the queries will in general be visible to the organisation that owns the model, which may use them to further develop its service. The hosting organisation may also be acquired by an organisation with a different approach to privacy, or fall victim to a cyber attack that results in a data leak.
Queries containing sensitive data also raise a concern – for example, someone who asks an AI chatbot for investment advice based on prompting it with non-public information may well commit an insider trading violation.
The NCSC also suggested that organisations considering using LLMs to automate some business tasks avoid public LLMs, and instead either turn to a hosted, private service or build their own models.
Cyber criminal use of LLMs
The past couple of months have seen lengthy debate about the utility of LLMs to malicious actors, so the NCSC researchers also considered whether or not these models make life easier for cyber criminals.
Acknowledging that there have been some “incredible” demonstrations of how LLMs can be used by low-skilled individuals to write malware, the NCSC said that at the present time, LLMs tend to produce output that looks more convincing than it is, and are better suited to simple tasks. This means they are rather more useful for helping someone who is already an expert in their field save time, since the expert can validate the results on their own, than for helping someone who is starting from scratch.
“For more complex tasks, it’s currently easier for an expert to create the malware from scratch, rather than having to spend time correcting what the LLM has produced,” said the researchers.
“However, an expert capable of creating highly capable malware is likely to be able to coax an LLM into writing capable malware. This trade-off between ‘using LLMs to create malware from scratch’ and ‘validating malware created by LLMs’ will change as LLMs improve.”
The same goes for employing LLMs to help conduct cyber attacks that are beyond the attacker’s own capabilities. Again, they currently come up short here because while they may provide convincing-looking answers, these may not be entirely correct. Hence, an LLM could inadvertently cause a cyber criminal to do something that will make them easier to detect. The problem of cyber criminal queries being retained by LLM operators is also relevant here.
The NCSC did, however, acknowledge that since LLMs are proving adept at replicating writing styles, the risk of them being used to write convincing phishing emails – perhaps avoiding some of the common errors made by Russian speakers when they write or speak English, such as discarding definite articles – is rather more pressing.
“This may aid attackers with high technical capabilities but who lack linguistic skills, by helping them to create convincing phishing emails or conduct social engineering in the native language of their targets,” said the team.