A developer strolls casually into work and gets comfy in their cubicle.
Suddenly there’s an update alert on the laptop screen – a new generative Artificial Intelligence (gen-AI) function has been added to the software application development team’s central platform and Integrated Development Environment (IDE) and the message reads as follows…
Welcome to SnoopyGuru, the new gen-AI function that all engineers will be able to ‘leverage’ in the days ahead. Please familiarise yourself with the Large Language Model (LLM) powered capabilities of this tool and make your team leader aware if you need to sign up for vector database embedding training sessions modules 1-through-37,890 in the coming weeks.
SnoopyGuru is equipped with Natural Language Understanding (NLU) and wants to be your friend. We’re excited to have you with us we now work to implement gen-AI automations and a new tier of generative creativity to our existing AI fabric, which (as you know) provides ‘intelligent decisioning and reasoning’ for stakeholders through our recently augmented User eXperience (UX) layer.
Sound familiar… or plausible at least?
Even if it’s not SnoopyGuru (which let’s face it works well, despite being completely fictitious), developers are now faced with the need to work with gen-AI substrate technologies related to LLMs and new database connectors with new data science functions and more…. so where do they start and how do they do it?
CWDN LLM series
Welcome to the Computer Weekly Developer Network (CWDN) LLM series, a selection of guest articles written by software engineering and data science professionals who know their GPT-4 from their PaLM 2, Claude v1, Cohere, Falcon or LLaMA.
NOTE: This series in its current form ends on Friday, Jan 12, 2024. This is not a showcase for informing the world that vendor X has added a new generative AI function and enabled multi-cloud hyperscaler deployment with natural language support – we know that – everyone (more or less) has or is in the process of doing so. This is a more thoughtful analysis of the types of technologies, practices and protocols that developers need to consider when embarking on building enterprise applications in this space.
At this stage we probably don’t even know all the questions, that’s how embryonic some of the niches and fields are in this space. But among the questions we might ask are – which LLM do you start with (and does everything start with a foundation model or not?), which is the most complex, which offer the most support etc?
Does working with LLMs require a wider grounding in Machine Learning (ML) algorithm knowledge for the average programmer – and if there are shortcuts available here in the form of low-code tools, are they enough?
How is a developer supposed to know if the LLM they are using has enough data, has properly cleaned, sanitised and de-deuplicated data – and (for that matter) what guardrails are in place to ensure the ‘open’ data stemming from the LLMs model is not intertwined with mission-critical data or Personally Identifiable Information (PII)?
While we’re on safety, should we be using closed source LLMs or open source LLMs and what’s the essential difference?
Is it all plain sailing, or are there innate challenges when it comes to the use of LLMs?
As TechTarget’s Sean Michael Kerner reminds us, there are LLM challenges such as, “Development costs – to run, LLMs generally require large quantities of expensive graphics processing unit hardware and massive data sets. Operational costs – after the training and development period, the cost of operating an LLM for the host organisation can be very high.” Then there is bias, explainability, hallucination and just an overall level of complexity.
“With billions of parameters, modern LLMs are exceptionally complicated technologies that can be particularly complex to troubleshoot. [We should also consider] glitch tokens – maliciously designed prompts that cause an LLM to malfunction, known as glitch tokens, are part of an emerging trend since 2022,” adds Kerner.
Another key question – what is prompt injection?
According to Luis Minvielle writing on WeAreDevelopers, “Since LLMs look like they know what they’re saying but are actually just repeating words and probabilities, they carry biases and can share prankish texts. Companies behind LLMs add obstacles so that the output isn’t harmful or against their rules. But by providing very specific prompts, any user can bypass this limitation. This is called prompt injection. Home-brewed prompt injection made strides on the web when someone asked ChatGPT the best sites “not to visit” if they were against torrents. The chat, of course, proceeded to list the top torrenting sites so that the user could avoid them.
Continuing this thought and thread, what is the fine-tuning phase for LLMs? According to Will Hillier writing on CareerFoundry, “After pretraining, LLMs can be tailored for more specific tasks by learning from particular examples and instructions. Fine-tuning involves taking the basic knowledge that the model has learned from all of its training data and then teaching it to contextualise this to specific tasks such as answering questions, translating languages, or any of the other jobs associated with the use cases we went through earlier.
Looking ahead, will LLMs become more task-specific and industry-specific and will we ever be in a position where LLMs are as subsumed and integrated into our enterprise software fabric as the spellchecker in our favourite Word-type app? There’s a long and complex road ahead, let’s get going.