Efficient Ether founder: Advancing LLM precision & reliability

This is a guest post for the Computer Weekly Developer Network written by Ryan Mangan, datacentre and cloud evangelist… and founder of Efficient Ether.

Efficient Ether is a startup on a mission to reshape cloud cost management, XaaS (Everything as a Service) and environmental stewardship for companies of all sizes, globally. 

Mangan writes as follows…

Over the past eighteen months, the IT industry has made notable advances with the introduction of Large Language Models (LLMs) and, in turn, many organisations are now planning for or integrating AI into their businesses or technology stacks. This rapid progression brings both opportunities and risks, but it has not stopped the major tech companies from racing to build, develop and integrate LLM technology into their wider product offerings, such as Microsoft Copilot and Amazon Q.

The fast adoption of LLM technology has also drawn attention from governments and regulatory bodies: in December 2023, ISO/IEC 42001 was published, setting a benchmark for the use and development of AI technology. For developers, this new era of technological growth presents various challenges, including adapting to new organisational, training and ethical standards. They also face technical issues such as hallucinations, as well as challenges like Model Autophagy Disorder (MAD), which has been likened to Mad Cow Disease, highlighting the complexities of managing advanced AI systems.

Challenges of hallucinations

Those who have any experience with Large Language Models will know that they can generate ‘hallucinations’ from time to time, producing outputs that are factually incorrect or nonsensical. If not monitored, such inaccuracies can have serious ramifications, especially in industry sectors that handle sensitive data, such as healthcare and finance.

LLMs often prioritise data patterns at the expense of accuracy, which at times leads to randomly produced misleading outputs. Interestingly, a paper by Zhang et al. (2023), “How Language Model Hallucinations Can Snowball”, details the progression and intensification of these inaccuracies and the mechanisms that lead to factually incorrect or nonsensical outputs, illustrating some of the complexities and challenges in identifying and mitigating hallucinations.

There are also other considerations to be aware of, including the quality of data, as highlighted in a paper by Alemohammad et al., “Self-Consuming Generative Models Go MAD”, which describes Model Autophagy Disorder (MAD): a tendency for generative models to deteriorate in output quality over time. The paper highlights the importance of maintaining data quality and continuously monitoring and validating LLM outputs to ensure their reliability and integrity. Simply incorporating an LLM into systems without adequate controls could present substantial risks.

Enhance LLMs with RAG & CRAG

There are a number of ways to reduce LLM hallucinations, and many organisations are turning to Retrieval Augmented Generation (RAG) technology.

This technology helps reduce AI-generated inaccuracies: RAG combines the linguistic capabilities of Large Language Models (LLMs) with real-time retrieval of data from dependable sources held in a vector database. The RAG methodology grounds AI outputs in retrieved evidence, improving factual accuracy and reducing the incidence of hallucinations. RAG-integrated LLMs provide responses that are not only contextually relevant but also substantially improved in overall factual quality.
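
To make the idea concrete, the sketch below shows the basic RAG loop in Python: score the user's question against a small in-memory document store standing in for the vector database, then pass the closest documents to the model as grounding context. The embed, retrieve and call_llm functions are illustrative placeholders, not any specific vendor's API.

```python
# Minimal RAG sketch: retrieve grounding documents, then prompt the model with them.
# The embedding and LLM calls are illustrative stand-ins, not a specific vendor API.
from collections import Counter
import math

DOCUMENTS = [
    "ISO/IEC 42001:2023 is a management system standard for artificial intelligence.",
    "Retrieval Augmented Generation grounds model answers in retrieved documents.",
    "Model Autophagy Disorder describes generative models degrading on self-generated data.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector. A real system would use a trained embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (the vector database lookup)."""
    q = embed(query)
    return sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for the actual LLM call; swap in a real model client here."""
    return f"[model response to a prompt of {len(prompt)} characters]"

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below. If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(rag_answer("What is ISO/IEC 42001?"))
```

The key design point is the final prompt: the model is explicitly instructed to answer only from the retrieved context, which is what grounds the output and discourages hallucinated detail.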

This sounds great; however, on occasion RAG may not deliver the desired result and outputs can still hallucinate. An interesting paper titled “Corrective Retrieval Augmented Generation” (CRAG) by Yan et al. (2024) details ways to further improve the robustness of generation by addressing a critical challenge inherent in RAG: the heavy reliance on the relevance of retrieved documents and the potential for inaccuracies if the retrieval process falters. CRAG introduces a lightweight retrieval evaluator designed to assess the quality of the documents retrieved for a query, producing a degree of confidence that triggers different knowledge retrieval actions accordingly.


Accepting that retrieval from static and limited databases may not always provide the required documents or information, the paper describes the use of web searches to augment retrieval results. Moreover, a decompose-then-recompose algorithm is applied to the retrieved documents to concentrate on essential information and selectively discard irrelevant content. CRAG can be used to improve a RAG-based model’s performance across both short- and long-form generation tasks, as detailed in the paper.
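
A simplified sketch of that control flow is shown below: a lightweight evaluator scores each retrieved document, and the best score decides whether to refine the local documents, fall back to a web search, or combine the two. The scoring heuristic, thresholds and helper functions here are assumptions made for illustration, not the paper's reference implementation.

```python
# Simplified CRAG-style control flow (thresholds and helpers are illustrative assumptions).

def evaluate_relevance(question: str, document: str) -> float:
    """Stand-in for the lightweight retrieval evaluator; returns a relevance score in [0, 1]."""
    q_terms = set(question.lower().split())
    overlap = q_terms & set(document.lower().split())
    return len(overlap) / max(len(q_terms), 1)

def decompose_then_recompose(documents: list[str], question: str) -> str:
    """Keep only sentences that share terms with the question, discarding irrelevant content."""
    q_terms = set(question.lower().split())
    keep = []
    for doc in documents:
        for sentence in doc.split(". "):
            if q_terms & set(sentence.lower().split()):
                keep.append(sentence.strip())
    return " ".join(keep)

def web_search(question: str) -> list[str]:
    """Placeholder for augmenting retrieval with a web search when local retrieval falters."""
    return [f"[web result for: {question}]"]

def crag_context(question: str, retrieved: list[str],
                 upper: float = 0.6, lower: float = 0.2) -> str:
    scores = [evaluate_relevance(question, d) for d in retrieved]
    best = max(scores, default=0.0)
    if best >= upper:      # confident the retrieval is correct: refine the documents
        return decompose_then_recompose(retrieved, question)
    if best <= lower:      # retrieval looks incorrect: discard it and search the web instead
        return decompose_then_recompose(web_search(question), question)
    # ambiguous: combine refined local documents with web results
    return decompose_then_recompose(retrieved + web_search(question), question)

print(crag_context("What does ISO/IEC 42001 cover?",
                   ["ISO/IEC 42001 is an AI management system standard."]))
```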

In addition to RAG and CRAG, there is also a capability known as function calling within LLMs, which further enhances an LLM’s utility and reliability. Function calling enables LLMs to interact with external databases and services, such as public APIs, which can provide up-to-date and precise information such as weather forecasts.

This feature turns the LLM’s traditionally static knowledge repository into a dynamic one, enabling real-time data retrieval and processing. This is a crucial advancement that helps reduce the occurrence of hallucinations and increases the reliability and effectiveness of the models.
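
The pattern itself is simple, as the provider-agnostic sketch below suggests: the model emits a structured request naming a function and its arguments, the application executes the matching function, and the result is handed back to the model to phrase the final answer. The tool name, JSON format and model stub are assumptions for illustration, not a particular vendor's function-calling API.

```python
# Provider-agnostic sketch of the function-calling pattern. The model returns a structured
# request; the application runs the matching function and feeds the result back to the model.
import json

def get_weather(city: str) -> dict:
    """The real external service call would live here; fixed data keeps the sketch runnable."""
    return {"city": city, "forecast": "light rain", "temperature_c": 11}

TOOLS = {"get_weather": get_weather}

def model_decides(question: str) -> str:
    """Stand-in for the LLM choosing a tool; a real model would emit this JSON itself."""
    return json.dumps({"name": "get_weather", "arguments": {"city": "London"}})

def answer_with_tools(question: str) -> str:
    call = json.loads(model_decides(question))
    result = TOOLS[call["name"]](**call["arguments"])  # execute the requested function
    # In practice the tool output is passed back to the model to compose the final reply.
    return f"Forecast for {result['city']}: {result['forecast']}, {result['temperature_c']}°C"

print(answer_with_tools("Will it rain in London today?"))
```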

Crafting precision in LLM interactions

Prompt engineering is now considered a required technical skill. It involves understanding the model’s internal workings and the context of its application, then fine-tuning prompts to achieve optimal outputs.

A well-crafted prompt can reduce the chance and risk of hallucination, as it essentially guides the LLM towards outputs that are accurate and relevant to the required context.
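
As a small illustration, the snippet below contrasts a vague prompt with one that constrains the model: it sets a role, restricts the answer to supplied context, gives the model permission to say it does not know and bounds the output length. The prompt wording and example context are purely illustrative.

```python
# Illustrative contrast between a vague prompt and a more carefully engineered one.
vague_prompt = "Tell me about ISO 42001."

engineered_prompt = (
    "You are a compliance assistant answering questions about AI governance.\n"
    "Answer only from the supplied context; if the context does not cover the question, "
    "reply 'I don't know' rather than guessing.\n"
    "Keep the answer under 100 words.\n"
    "Context: {context}\n"
    "Question: {question}"
)

print(engineered_prompt.format(
    context="ISO/IEC 42001:2023 specifies requirements for an AI management system.",
    question="What does ISO/IEC 42001 cover?",
))
```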

Closing points

Large Language Models offer tremendous innovation potential and open wider opportunities for organisations; however, care should be taken when navigating their inherent risks, including hallucinations and other challenges such as Model Autophagy Disorder.

As we delve further down the rabbit hole of AI, concepts like RAG, CRAG and function calling represent significant strides towards hardening the reliability of LLMs. However, this is only one part of the bigger picture: there still needs to be awareness and application of responsible AI, through adherence to ethical standards and a commitment to ensuring AI is reliable.

As developers in this fast-evolving landscape, our focus must remain on leveraging AI to augment human decision-making, ensuring a future where technology is a dependable and ethical enhancement of our capabilities.

Citations & references

Alemohammad, S., Luzi, L., Humayun, A. I., Babaei, H., LeJeune, D., Siahkoohi, A., & Baraniuk, R. G. (2023). Self-Consuming Generative Models Go MAD. arXiv:2307.01850

ISO/IEC 42001:2023, Information technology – Artificial intelligence – Management system. Status: Published.

Zhang, M., Press, O., Merrill, W., Liu, A., & Smith, N. A. (2023). How Language Model Hallucinations Can Snowball. arXiv:2305.13534

Yan, S., Gu, J., Zhu, Y., & Ling, Z. (2024). Corrective Retrieval Augmented Generation. arXiv:2401.15884
