Zebra AI lead: A developer deep dive on LLMs (part 1)

This is a discussion recorded for the Computer Weekly Developer Network (CWDN) with Andrea Mirabile in his position as global director of AI research at Zebra Technologies.

With over 10 years as an AI scientist, Mirabile is well positioned to discuss LLMs with a deep understanding of how the technology works.

Zebra Technologies has established a global AI research network to examine the on-device execution of generative AI LLMs to empower front-line workers.

CWDN: Which LLM do you start with (and does everything start with a foundation model or not?), which is the most complex, which offers the most support etc?

Mirabile: Recommendations hinge on the task’s objective. When it comes to building an application, the development team should begin with LLMs behind APIs, such as those offered by OpenAI, Google, Microsoft and Anthropic, for streamlined development. APIs often incorporate advanced tricks to enhance overall model performance in terms of speed and accuracy. Once satisfied with the application, consider optimising costs by transitioning to open source models from Mistral and Meta. For those interested in open source LLMs, hubs like Hugging Face provide a collection of models with a focus on text generation, offering a resource for exploration and implementation.

For researchers, opting for smaller models is advisable.

The shorter training time for smaller models allows faster iteration through different experiments. Swiftly identifying effective approaches and refining models becomes more feasible with quicker experimentation, contributing to the improvement of LLMs. The choice of LLM depends on the specific objectives and context, whether for application development or research endeavours.
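The advice to start with a hosted API and later move to an open source model is easier to follow if the application never depends on one provider directly. The following is a minimal sketch of that idea, not anything described in the interview: the `TextGenerator` interface, both backend classes and `summarise_ticket` are hypothetical names, and the backends return placeholder strings where a real implementation would call a provider or a local model.

```python
from dataclasses import dataclass
from typing import Protocol


class TextGenerator(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class HostedApiBackend:
    """Stand-in for a hosted LLM API (assumption: a real one would make a network call)."""
    model_name: str

    def generate(self, prompt: str) -> str:
        # Placeholder output so the sketch runs without credentials.
        return f"[hosted:{self.model_name}] {prompt}"


@dataclass
class LocalOpenModelBackend:
    """Stand-in for a self-hosted open source model."""
    model_name: str

    def generate(self, prompt: str) -> str:
        # Placeholder output so the sketch runs without model weights.
        return f"[local:{self.model_name}] {prompt}"


def summarise_ticket(llm: TextGenerator, ticket: str) -> str:
    """Application logic depends only on the interface, not on a provider."""
    return llm.generate(f"Summarise this support ticket: {ticket}")


# Prototype against the hosted backend, then swap in the local one later.
print(summarise_ticket(HostedApiBackend("demo-hosted"), "Scanner will not pair."))
print(summarise_ticket(LocalOpenModelBackend("demo-open"), "Scanner will not pair."))
```

Because `summarise_ticket` only sees the interface, changing provider for cost reasons becomes a one-line change at the call site.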

CWDN: Does working with LLMs require a wider grounding in Machine Learning (ML) algorithm knowledge for the average programmer – and if there are shortcuts available here in the form of low-code tools, are they enough?

Mirabile: Working with LLMs often involves a deeper understanding of Machine Learning (ML) algorithms. While the average programmer may find it beneficial to have a foundational knowledge of ML concepts, there are tools and frameworks that offer a more accessible entry point.

Familiarity with ML concepts, especially in natural language processing (NLP) and deep learning, can be helpful. Tutorials ranging from beginner to advanced level are available at deeplearning.ai, for example. Understanding tasks such as model fine-tuning, hyperparameter tuning and the nuances of handling training data may contribute to achieving optimal results. Low-code tools that provide a more approachable interface are worth exploring for those with less ML expertise. For example, LangChain is a platform offering tools for developers, facilitating rapid prototyping and experimentation with LLMs.

When it comes to shortcuts and limitations, I’d highlight automation limits.

While low-code tools streamline certain aspects, they may have limitations in handling highly specialised tasks or complex model configurations. Plus, even with low-code tools, comprehending the outputs of LLMs and making informed decisions based on model behaviour would benefit from a foundational understanding of ML concepts.

CWDN: How is a developer supposed to know if the LLM they are using has enough data and whether that data has been properly cleaned, sanitised and de-duplicated – and (for that matter) what guardrails are in place to ensure the ‘open’ data stemming from the LLM’s model is not intertwined with mission-critical data or Personally Identifiable Information (PII)?

Mirabile: To address this, the model’s author must disclose the datasets used for training.

Open source models generally provide more transparency regarding the training data, allowing other researchers to replicate experiments. However, exceptions exist, such as Llama, an open source model that incorporates non-public, internal data. Despite standard practices of cleaning and anonymising data during model training, the accuracy of large-scale cleaning processes may vary.
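To make the cleaning and anonymising steps mentioned above concrete, here is a small illustrative sketch of exact-match de-duplication and pattern-based PII redaction. It is an assumption about what such a step could look like, not a description of any model author's pipeline. The function names and the two regular expressions are hypothetical and deliberately simple, and they will miss many real-world cases, which is one reason the accuracy of large-scale cleaning varies.

```python
import hashlib
import re

# Deliberately simple patterns (assumptions); real pipelines need far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(records: list[str]) -> list[str]:
    """Keep the first occurrence of each record, comparing hashes of normalised text."""
    seen: set[str] = set()
    unique: list[str] = []
    for record in records:
        digest = hashlib.sha256(normalise(record).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like number runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


docs = ["Hello  World", "hello world", "Mail jane.doe@example.com or call +44 20 7946 0958."]
cleaned = [redact_pii(doc) for doc in deduplicate(docs)]
print(cleaned)
```

Exact hashing only catches duplicates that match after normalisation; near-duplicates and less regular forms of PII need fuzzier techniques.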

CWDN: While we’re on safety, should we be using closed source LLMs or open source LLMs and what’s the essential difference?

Mirabile: The primary distinction between closed and open source LLMs lies in the transparency they offer. Closed LLMs operate as black boxes, providing minimal information about the training data, optimisation techniques and additional information sources enhancing model performance. On the other hand, transparency becomes a pivotal advantage for open source LLMs. From a security standpoint, there isn’t a definitive winner, as each approach has its own set of constraints.

With closed source, the proprietary nature of the model may provide security through obscurity, making it challenging for malicious actors to exploit vulnerabilities. However, this also implies that identifying and addressing security issues might be a prolonged process.

With open source, we have security gains from the collaborative efforts of the community. The scrutiny of many eyes on the code facilitates the swift detection and resolution of security vulnerabilities.

Nevertheless, the public exposure of the code may reveal potential weaknesses.

CWDN: Is it all plain sailing, or are there innate challenges when it comes to the use of LLMs?

Mirabile: While the use of LLMs offers remarkable capabilities, it is not always plain sailing and there are inherent challenges associated with their deployment and utilisation. LLMs heavily depend on the quality and diversity of the training data. If the data is biased or lacks diversity, the model’s outputs may exhibit biases or perpetuate stereotypes. Biased outputs can lead to unfair or undesirable consequences, especially in applications involving sensitive topics or diverse user bases.

LLMs, particularly complex deep learning models, can be challenging to interpret. Understanding why a model made a specific prediction can be elusive. A lack of interpretability may hinder trust in the model’s decisions, especially in critical applications where accountability and transparency are essential.

Fine-tuning LLMs for specific tasks requires expertise.

Achieving optimal performance often involves experimenting with hyperparameters and adapting the model to the task at hand. Inadequate fine-tuning may result in suboptimal performance or difficulty in adapting the model to specific use cases.
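The hyperparameter experimentation described here often takes the form of a sweep over candidate settings. The sketch below shows the shape of a basic grid search under strong assumptions: `evaluate` is a hypothetical stand-in that returns a made-up score, where a real version would fine-tune the model with the given settings and measure it on a validation set.

```python
from itertools import product


def evaluate(learning_rate: float, epochs: int) -> float:
    """Hypothetical stand-in for 'fine-tune, then score on validation data'.

    Returns a made-up score that peaks at learning_rate=1e-4 and epochs=3.
    """
    return -abs(learning_rate - 1e-4) * 1e4 - abs(epochs - 3)


def grid_search(learning_rates: list[float], epoch_counts: list[int]):
    """Try every combination and keep the best-scoring configuration."""
    best_score = float("-inf")
    best_config = None
    for lr, epochs in product(learning_rates, epoch_counts):
        score = evaluate(lr, epochs)
        if score > best_score:
            best_score = score
            best_config = {"learning_rate": lr, "epochs": epochs}
    return best_config, best_score


config, score = grid_search([1e-5, 1e-4, 1e-3], [1, 3, 5])
print(config, score)
```

Each call to a real `evaluate` is a full fine-tuning run, so the cost of the sweep grows with the number of combinations; this is part of why smaller models allow faster iteration.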

Training and using LLMs can be computationally expensive and resource-intensive. This is particularly true of cloud-based LLMs; on-device LLMs can provide cost savings by comparison. This poses challenges for smaller organisations or individuals with limited computing resources. Deploying and maintaining LLMs can incur significant operational and development costs, including expenses related to infrastructure, software development and ongoing maintenance. High costs may limit the accessibility of advanced language models, again hindering their adoption by organisations with budget constraints. For end users, on-device LLMs should be considered as a cost-efficient option, where appropriate.

LLMs can inadvertently generate inappropriate or harmful content.

Determining ethical guidelines for model behaviour and ensuring responsible use is essential. Ethical lapses can result in reputational damage and legal consequences, emphasising the need for robust ethical frameworks. Addressing bias in LLMs, ensuring explainability in their decision-making processes and mitigating hallucinations (generating incorrect or nonsensical information) are ongoing challenges. Unaddressed bias, lack of explainability, or hallucinations can undermine the reliability and trustworthiness of LLMs, affecting their suitability for various applications.

It’s worth noting that leveraging tools provided by tech players such as Microsoft, Google, OpenAI and Amazon can contribute to mitigating operational and development costs associated with LLMs. These companies offer platforms and services that streamline the deployment, management and fine-tuning of language models.

Part 2 of this discussion is posted here.