
What developers need to know about LLMs in the enterprise

We look at the main areas enterprise developers need to consider when building, testing and deploying enterprise applications powered by large language models

Thanks to the availability of popular generative artificial intelligence (GenAI) software, such as Copilot in the Microsoft Office suite, almost anyone can start using public large language models (LLMs).

Analyst firm Gartner describes use of such AI-enhanced tools as “everyday AI” – in other words, these are tools that augment what people already do. For instance, Copilot in Teams summarises meetings, Copilot in Word helps with document drafting and Copilot in PowerPoint is designed to help people create presentation slide decks quickly. But since everyone has access to this technology, arguably it cannot be regarded as a business differentiator.

The second application area for LLMs is what Gartner deems “game-changer” AI, where the technology is used to build entirely new products and services that disrupt industries.

While tools such as ChatGPT have captured people’s imagination and demonstrate the power of public LLMs, it is the game-changing application areas that are likely to be the ultimate goal for business leaders and the software development teams tasked with building the supporting technologies.

What is clear from experts Computer Weekly has spoken to is that LLMs require a very different mindset to traditional software development in terms of programming skills and quality assurance testing. The IT infrastructure is also radically different to traditional enterprise applications, and IT leaders will need to balance data privacy and data protection regulations with the ability to train LLMs and test their accuracy.

How to get started with LLMs

In traditional programming, when a programmer uses a system or application programming interface (API) with known capabilities, it is possible to document inputs and expected results.

David Colwell, vice-president of AI and machine learning (ML) at Tricentis, says: “You can make confident assertions about the results to expect because the system, while not necessarily stable, can be expected to be both deterministic and pre-knowable.”

However, as Colwell points out, learning systems inherently break this paradigm. “This is not a bug; rather it’s a feature of their very nature,” he says. “They are trained on vast datasets, but we do not expect them to get every answer right; we just expect them to improve over time.”

Sometimes, an improvement in one area can come at the expense of another, and that’s acceptable.
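
This shift calls for statistical rather than exact assertions in test suites. The sketch below is illustrative only: ask_llm is a hypothetical stand-in for a real model call, and the test asserts a minimum pass rate over repeated samples instead of a single expected string.

```python
import random

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; outputs vary run to run."""
    return random.choice(["Paris", "Paris.", "The capital of France is Paris"])

def pass_rate(prompt: str, check, samples: int = 20) -> float:
    """Run the same prompt many times and report how often the check passes."""
    hits = sum(check(ask_llm(prompt)) for _ in range(samples))
    return hits / samples

# Assert a minimum pass rate rather than one exact, deterministic answer.
rate = pass_rate("What is the capital of France?", lambda out: "Paris" in out)
assert rate >= 0.95, f"Only {rate:.0%} of responses mentioned Paris"
```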

For Colwell, AI systems should not be interrogated, blamed or held to account for the fact that they are non-deterministic. Instead, he recommends IT decision-makers take time to acknowledge the limitations of the technology.

Andrea Mirabile, global director of AI research at Zebra Technologies, notes that working with LLMs often involves a deeper understanding of machine learning algorithms. However, he says that while the average programmer may find it beneficial to have a foundational knowledge of ML concepts, some tools and frameworks offer a more accessible entry point.

In his experience, understanding tasks such as model fine-tuning, hyperparameter tuning and nuances in handling training data may contribute to achieving optimal results. Low-code tools are also useful. Mirabile suggests IT decision-makers consider how low-code tools could be used to provide a more approachable interface for those with less ML expertise. For example, LangChain is an open source framework that offers developers tools for rapid prototyping and experimentation with LLMs.
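
A minimal sketch of that rapid prototyping, assuming the langchain-openai package is installed and an OpenAI API key is set in the environment (the model name is just an example):

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Compose a prompt template and a model into a single runnable pipeline.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You summarise meeting notes as three bullet points."),
    ("human", "{notes}"),
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm

result = chain.invoke({"notes": "Alice: release slips to Friday. Bob: tests still failing."})
print(result.content)
```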

Read more about large language models (LLMs) in the enterprise

  • Amazon Web Services will build on its existing ties to Anthropic to help the firm scale its enterprise-grade LLM, Claude.
  • AI startup Anyscale has introduced a service that lets enterprises deploy popular LLMs, such as Llama 2, into their applications via familiar APIs.
  • Increased efficiency is one of the main benefits of LLMs, so one of the easiest ways for enterprises to start using the technology is by targeting inefficiencies.
  • It’s essential to decide where to run your private AI – or who you will get to run it for you, in some cases – but that is just the start. 

However, while low-code tools streamline certain aspects of developing LLM-powered AI applications, Mirabile warns that they may have limitations in handling highly specialised tasks or complex model configurations. In addition, a foundational understanding of machine learning is needed for developers to make informed decisions on the model’s behaviour.

Why you need to focus on quality and diverse datasets

LLMs heavily depend on the quality and diversity of the training data. If the data is biased or lacks diversity, the model’s outputs may exhibit biases or perpetuate stereotypes, Mirabile warns. Biased outputs can lead to unfair or undesirable consequences, especially in applications involving sensitive topics or diverse user bases.
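
A simple first check is to measure how outcomes are distributed across the groups represented in the training data. The sketch below assumes a hypothetical dataset of (text, label, group) rows; a heavy skew in the printed shares is an early warning sign.

```python
from collections import Counter

# Hypothetical training rows: (text, label, group), where "group" is any
# slice you care about auditing (region, language, demographic...).
dataset = [
    ("loan approved ...", "positive", "group_a"),
    ("loan denied ...", "negative", "group_b"),
    # ...thousands more rows in practice
]

# Compare outcome shares across groups; heavy skew suggests bias risk.
by_group = Counter((group, label) for _, label, group in dataset)
totals = Counter(group for _, _, group in dataset)
for (group, label), count in sorted(by_group.items()):
    print(f"{group:10s} {label:10s} {count / totals[group]:.1%}")
```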

LLMs also hallucinate and can be steered off course, as Ebenezer Schubert, vice-president of engineering at OutSystems, explains: “They can also be tricked into getting into areas or saying things. The prompt used for LLMs can be hijacked if you are not careful. If you are doing any fine-tuning, based on your interaction, and not paying attention to the dataset, this could create some adversarial training effects as well. These are some of the things that you need to pay attention to.”
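
One basic mitigation against prompt hijacking is to keep untrusted input strictly out of the system prompt and screen it for common injection phrasing. The sketch below is a first line of defence only; the marker list and function names are illustrative, and real deployments layer this with output filtering and review.

```python
SYSTEM_PROMPT = "You are a support assistant. Answer only questions about our product."

# Naive denylist of phrases that often signal prompt-injection attempts.
SUSPICIOUS = ("ignore previous instructions", "reveal your system prompt", "you are now")

def build_messages(user_input: str) -> list[dict]:
    lowered = user_input.lower()
    if any(marker in lowered for marker in SUSPICIOUS):
        raise ValueError("Possible prompt injection detected")
    # Untrusted text stays in the user role; it is never concatenated
    # into the system prompt.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
```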

However, fine-tuning LLMs for specific tasks requires expertise. According to Mirabile, achieving optimal performance often involves experimenting with hyperparameters and adapting the model to the task at hand. “Inadequate fine-tuning may result in suboptimal performance or difficulty in adapting the model to specific use cases,” he says.
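
To give a flavour of that experimentation, the sketch below sets typical hyperparameters with the Hugging Face transformers library; the values are illustrative starting points rather than recommendations, and the right ones depend on the task and dataset.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="finetune-out",
    learning_rate=2e-5,             # too high and the model forgets; too low and it barely adapts
    num_train_epochs=3,             # more epochs risk overfitting a small dataset
    per_device_train_batch_size=8,  # bounded by GPU memory
    weight_decay=0.01,
    warmup_ratio=0.1,
)
# args would then be passed to transformers.Trainer along with the model
# and a carefully curated, reviewed training dataset.
```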

Mirabile urges IT decision-makers to be cautious when deploying LLMs that incorporate complex deep learning models, as these can be challenging to interpret. “Understanding why a model made a specific prediction can be elusive,” he says. “A lack of interpretability may hinder trust in the model’s decisions, especially in critical applications where accountability and transparency are essential.”

What is a private LLM?

One of the first decisions IT leaders need to take is whether to use one of the growing number of public LLMs or to deploy one internally, ring-fenced from the public internet.

Commercial LLMs on the public cloud are available to anyone through a subscription service. These are trained on vast swathes of publicly available data, scooped up from websites and social media. An obvious question for an IT decision-maker contemplating a public LLM subscription is what corporate data their business is willing to share publicly, given the risk of data leakage.

Another factor is cost. According to Ilkka Turunen, field CTO at Sonatype, most API-accessible models are paid per token sent and received. Turunen warns that this means costs can accrue fast following extensive use. “The calculations for these requests are not always straightforward and an intimate understanding of the payload is required,” he adds.
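
A back-of-the-envelope model makes the point. The sketch below counts tokens with the tiktoken library and applies hypothetical per-million-token prices; check your provider’s actual price list and tokeniser.

```python
import tiktoken

PRICE_IN_PER_M = 3.00    # USD per million prompt tokens (assumed)
PRICE_OUT_PER_M = 15.00  # USD per million completion tokens (assumed)

enc = tiktoken.get_encoding("cl100k_base")

def estimate_cost(prompt: str, completion: str) -> float:
    tokens_in = len(enc.encode(prompt))
    tokens_out = len(enc.encode(completion))
    return (tokens_in * PRICE_IN_PER_M + tokens_out * PRICE_OUT_PER_M) / 1_000_000

# 500 employees making 40 calls a day, 22 working days a month, adds up fast.
per_call = estimate_cost("Summarise this contract: ..." * 50, "The contract states ..." * 20)
print(f"~${per_call:.4f} per call, ~${per_call * 500 * 40 * 22:,.0f} per month")
```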

There are also huge intellectual property (IP) ramifications around how these models are trained, which have yet to be resolved.

In a meeting with the House of Lords Communications and Digital Committee in November 2023, Owen Larter, director of public policy at the office for responsible AI at Microsoft, responded to a question about copyright by saying: “Jurisdictions like the EU and Japan recently clarified that there is an exception within their law for text and data mining within the context of training an AI model. There’s a longer list of countries that have that type of regime.”

But in the US, The New York Times recently filed a lawsuit against Microsoft and OpenAI, the developer of ChatGPT, claiming that the two companies unlawfully used work from its websites to create artificial intelligence products that compete with and threaten The Times’ ability to provide its news service.

Beyond the legal wrangles over intellectual property rights, using domain-specific or proprietary data can help a business differentiate its LLM from those of rivals.

A private LLM is one where the models run inside an organisation’s internal IT infrastructure without relying on any outside connections. “By keeping these models within their own secure IT infrastructure, the enterprise knowledge and data can be protected,” says Oliver King-Smith, CEO of smartR AI.

However, he points out that private models need buy-in from all stakeholders in the organisation. King-Smith urges IT decision-makers looking at deploying private LLMs to undertake a risk assessment prior to implementation.

“When they are deployed, companies should have well-defined policies for their use,” he adds. “As with any critical IT resource, key employee access control needs to be implemented, especially when they deal with sensitive information.”

For instance, businesses that need to comply with regulations such as International Traffic in Arms Regulations (ITAR), the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) need to consider whether their LLMs are compliant. As an example of accidental misuse, King-Smith says some lawyers have been caught preparing cases using ChatGPT, which is a clear violation of attorney-client privilege.

According to King-Smith, a major benefit of private models over ChatGPT is that they can learn internal knowledge locked within the organisation’s emails, internal documents, project management systems and other data sources. “This rich storehouse captured into your private model enhances the model’s ability to operate inside your enterprise,” he says.
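
One common way to surface that internal knowledge is retrieval-augmented generation: index the documents, then feed the most relevant passages to the model with each query. The sketch below uses the sentence-transformers library and illustrates the general pattern, not smartR AI’s specific implementation; the documents are invented.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Invented internal documents, assumed already access-controlled.
docs = [
    "Q3 project Falcon status: integration testing delayed two weeks.",
    "Expense policy: travel over £500 requires director approval.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, as the vectors are normalised
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved passages are prepended to the prompt sent to the private
# LLM, grounding its answers in the organisation's own knowledge.
print(retrieve("Who needs to approve a £700 flight?"))
```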

What IT infrastructure you need to deploy LLMs

The implication of operating a private LLM is that the internal IT function of the organisation is responsible for the upkeep of hardware and software. Training LLMs is achieved using an array of graphics processing units (GPUs) running on servers optimised for AI.

Rather than running servers on-premises, many organisations choose to have the hardware hosted externally and accessed via a public infrastructure-as-a-service (IaaS) provider. For instance, retail giant Walmart uses both public cloud providers and its own in-house generative AI technology stack.

“It’s paramount for us that our user data or customer data and our IP stays within our firewall and isn’t used to train other datasets, so we spend a lot of time figuring out how to do that,” says Walmart senior vice-president David Glick, who heads up the retailer’s enterprise business services department.

Using the public cloud to host and run private LLMs allows IT leaders to avoid the risk of data leakage, which can occur if proprietary data is uploaded to a public LLM. A private LLM running on cloud infrastructure allows organisations to take advantage of the scale and elasticity on tap in the public cloud, while keeping company-specific data secure.

The industry has recognised the demand for IT infrastructure and platforms optimised for running AI. The major server providers have all pivoted their offerings to cater for machine learning and LLM training workloads.

“Companies see the potential of generative AI and want to get the same results. But they wonder whether they want to bring their data to the cloud,” says Paulo Pereira, vice-president of system engineering at Nutanix. “Your internal data becomes part of the public domain. But models by themselves are worthless without data.”

Nutanix has developed what it calls “GPT-in-a-box”, which uses open source models and trains them with private data. “Users retain control over their data because it stays within the organisation. But models are worthless without your data,” adds Pereira.
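
Nutanix has not published the product’s internals, but the general pattern of self-hosting an open source model is easy to sketch with the Hugging Face transformers library (the model name below is just an example, and device_map="auto" assumes the accelerate package is installed):

```python
from transformers import pipeline

# Weights are downloaded once; after that, prompts and private data never
# leave the organisation's own infrastructure.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # example open model; size to your GPUs
    device_map="auto",
)
out = generator("Summarise our returns policy in one sentence:", max_new_tokens=80)
print(out[0]["generated_text"])
```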

What guardrails do you need for LLMs?

Finally, beyond technical infrastructure and programming skills, it is essential that software development teams use appropriate guardrails to ensure security, privacy and compliance when building enterprise AI applications. Business leaders also need to consider the ethical implications of using LLMs, along with auditability and explainability, so they can demonstrate to auditors that policies and procedures are in place to keep LLM-powered decision-making applications fair and unbiased.

James Tedman, head of Europe region at BlueFlame AI, suggests focusing on the following areas:

  • Data security: Implement strong encryption for data at rest and in transit. Regularly review and update security protocols to protect sensitive information processed by various LLMs.
  • Privacy compliance: Adhere to privacy laws like GDPR or the California Consumer Privacy Act (CCPA). Ensure that all LLMs used comply with these regulations, particularly regarding user data handling and consent. Ensure that you have commercial agreements in place with the LLMs to prevent the use of data for model training.
  • Access control: Implement strict access controls and authentication mechanisms to prevent unauthorised access to AI systems and sensitive data.
  • Auditing and monitoring: Regularly audit AI systems for security vulnerabilities and monitor usage to detect and respond to malicious activities (see the logging sketch after this list).
  • Bias and ethical considerations: Regularly evaluate different LLMs for biases. Implement measures to reduce the impact of these biases on decision-making and outputs.
  • Compliance with industry standards: Ensure that all AI solutions comply with industry-specific standards and regulations, particularly in sectors such as healthcare, finance and legal.
  • Transparent data usage: Maintain transparency in how AI systems use and process data, informing stakeholders about the AI models in use and their data handling practices.
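
As a concrete illustration of the auditing and monitoring point, the minimal sketch below wraps each model call in a structured audit record. call_model is a placeholder for whatever client the application uses, and logging sizes rather than raw text is one way to limit data exposure.

```python
import json
import logging
import time

logging.basicConfig(filename="llm_audit.log", level=logging.INFO)

def audited_call(user_id: str, prompt: str, call_model) -> str:
    """Wrap a model call with an audit record; call_model is your LLM client."""
    started = time.time()
    response = call_model(prompt)
    logging.info(json.dumps({
        "user": user_id,               # ties usage back to access control
        "prompt_chars": len(prompt),   # sizes, not raw text, limit exposure
        "response_chars": len(response),
        "latency_s": round(time.time() - started, 2),
        "ts": started,
    }))
    return response
```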
