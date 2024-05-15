Google has fired the latest salvo in the race for artificial intelligence (AI) supremacy with significant enhancements to its Gemini model, including a groundbreaking two million token context window for Gemini 1.5 Pro.

Gemini 1.5 Pro, Google’s multimodal generative AI model, can analyse and classify video, audio, code, and text. This enables applications like chatbots to handle complex scenarios involving various content types, such as processing motor claims with related video and textual evidence.

Launched earlier this year with a one-million-token context window, the model now offers double that capacity. This lets it take in significantly more information in a single prompt, such as 30,000 lines of code, or entire database tables and schemas for streamlined SQL analysis.
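To give a feel for what a token budget means in practice, the minimal Python sketch below estimates whether a codebase fits in a given context window. It assumes roughly four characters per token, a common rule of thumb for English text rather than a property of Gemini's tokenizer (code often tokenises differently). The names `estimate_tokens` and `fits_in_context` are hypothetical helpers, not part of any Google API.

```python
import math

# ASSUMPTION: ~4 characters per token is a rough rule of thumb, not Gemini's tokenizer.
CHARS_PER_TOKEN = 4.0

ONE_MILLION_TOKENS = 1_000_000  # original Gemini 1.5 Pro window
TWO_MILLION_TOKENS = 2_000_000  # expanded window described in this article


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Hypothetical helper: crude token estimate from character count."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(files: dict[str, str], window: int) -> tuple[bool, int]:
    """Return (fits, estimated_tokens) for a mapping of file name -> source text."""
    total = sum(estimate_tokens(source) for source in files.values())
    return total <= window, total


if __name__ == "__main__":
    # Synthetic codebase: 30 files x 1,000 lines x 40 characters = 30,000 lines.
    line = "a" * 39 + "\n"
    codebase = {f"module_{i}.py": line * 1_000 for i in range(30)}
    print(fits_in_context(codebase, TWO_MILLION_TOKENS))  # (True, 300000)
```

Real token counts depend on the tokenizer and the kind of content, so a heuristic like this is only a first approximation.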

However, the new enhancement, currently available through a waitlist for developers, goes beyond simply handling large volumes of data.

“It’s about smarter, more comprehensive interactions with information,” said Stephanie Wong, Google’s head of technical marketing, in a LinkedIn post. “The coherence provides highly relevant answers across modalities that can refer back to earlier parts of the conversation.” Wong added that Google is aiming for an unlimited context window size in the future.

Next month, Google will also introduce context caching to Gemini 1.5 Pro. This will allow users to send large files and other parts of a prompt only once, rather than with every request, making the expansive context window more useful and cost-effective.
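The saving from context caching comes from not re-uploading the same large prefix on every request. The Python sketch below models that idea on the client side only. `HypotheticalPromptCache` is an invented class, not Google's caching API (whose interface the announcement does not describe), and it counts whitespace-separated words as a stand-in for tokens. It says nothing about how Google actually stores or bills cached context.

```python
import hashlib


class HypotheticalPromptCache:
    """Illustrative model of context caching (NOT Google's API)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.tokens_sent = 0  # running count of (estimated) tokens uploaded

    @staticmethod
    def _count(text: str) -> int:
        # ASSUMPTION: whitespace-separated words stand in for tokens.
        return len(text.split())

    def upload(self, context: str) -> str:
        """Send a large context once and return a handle for later prompts."""
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        if key not in self._store:  # re-uploading identical content costs nothing
            self._store[key] = context
            self.tokens_sent += self._count(context)
        return key

    def ask(self, key: str, question: str) -> str:
        """Send only the short question; the cached context is looked up by key."""
        self.tokens_sent += self._count(question)
        return self._store[key] + "\n\n" + question  # full prompt the model would see


def tokens_without_cache(context: str, questions: list[str]) -> int:
    """Tokens uploaded if the full context is resent with every question."""
    return sum(len(context.split()) + len(q.split()) for q in questions)


if __name__ == "__main__":
    context = "word " * 100_000                  # stand-in for a large file
    questions = ["What does module A do?"] * 10  # ten short follow-up prompts
    cache = HypotheticalPromptCache()
    key = cache.upload(context)
    for q in questions:
        cache.ask(key, q)
    print(cache.tokens_sent, tokens_without_cache(context, questions))  # 100050 1000050
```

In this toy setting, ten questions against a 100,000-word document upload about a tenth as much data with caching as without it.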

Gemini 1.5 Pro's ability to handle larger context windows stems from Google’s Mixture-of-Experts (MoE) architecture, which increases model capacity without a proportional increase in computation. A longer context window, in turn, reduces the need to fine-tune foundation models or rely heavily on retrieval-augmented generation (RAG) to ground model responses in external data.
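The core MoE idea, more parameters but only a few of them used per input, can be illustrated with a toy top-k routing layer. The Python sketch below is a generic illustration of sparse expert routing, not Gemini's architecture, whose details Google has not set out here. `ToyMoELayer` and its parameters are hypothetical, and each "expert" is just a random linear map with a ReLU.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim: int, rng: random.Random) -> Callable[[Vector], Vector]:
    """Toy expert: a random dim x dim linear map followed by ReLU."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]

    def expert(x: Vector) -> Vector:
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return expert


class ToyMoELayer:
    """Sparse MoE layer with top-k routing (illustrative, not Gemini's design)."""

    def __init__(self, dim: int, num_experts: int, top_k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.experts = [make_expert(dim, rng) for _ in range(num_experts)]
        # Router: one scoring vector per expert.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
        self.top_k = top_k
        self.expert_calls = 0  # how many expert forward passes have run
        self.num_expert_params = num_experts * dim * dim  # capacity grows with experts

    def __call__(self, x: Vector) -> Vector:
        scores = [sum(r * xi for r, xi in zip(row, x)) for row in self.router]
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        gates = softmax([scores[i] for i in chosen])  # weights over the chosen experts only
        out = [0.0] * len(x)
        for gate, i in zip(gates, chosen):
            y = self.experts[i](x)  # only top_k experts are evaluated
            self.expert_calls += 1
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    layer = ToyMoELayer(dim=8, num_experts=16, top_k=2)
    layer([0.5] * 8)
    print(layer.num_expert_params, layer.expert_calls)  # 1024 2
```

Doubling `num_experts` doubles the expert parameters, yet each input still triggers only `top_k` expert evaluations (plus a cheap router), which is the capacity-versus-compute trade-off described above.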

Despite this advancement, RAG still plays a crucial role in refining output accuracy and relevance for use cases like coding.

“With RAG, you'll be able to parse your private code base to get contextually relevant coding suggestions,” explained Brad Calder, vice-president and general manager of Google Cloud Platform and technical infrastructure, during Google Cloud Next ’24 last month.

“It's going to continue to be an important tool and mechanism for you to take your IP [intellectual property] and find information that's closest to what you're looking for,” he said.
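The retrieval step Calder describes can be sketched in a few lines: score snippets from a private codebase against the request, then put the closest matches into the prompt. The Python example below uses simple bag-of-words cosine similarity as a stand-in for the learned embeddings and vector indexes that production RAG systems typically use; `retrieve`, `build_prompt` and the sample files are hypothetical and not part of any Google product.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric runs; 'parse_invoice' -> ['parse', 'invoice']."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda name: cosine(q, Counter(tokenize(chunks[name]))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: dict[str, str], k: int = 2) -> str:
    """Ground the request in retrieved code before it is sent to the model."""
    context = "\n\n".join(f"# {name}\n{chunks[name]}" for name in retrieve(query, chunks, k))
    return f"Relevant code from our repository:\n\n{context}\n\nTask: {query}"


if __name__ == "__main__":
    codebase = {
        "billing/invoice.py": "def parse_invoice(raw): return Invoice.from_json(raw)",
        "auth/session.py": "def refresh_session(token): return Session.renew(token)",
        "billing/tax.py": "def invoice_tax(invoice, rate): return invoice.total * rate",
    }
    print(retrieve("Add validation when we parse an invoice", codebase))
    # ['billing/invoice.py', 'billing/tax.py']
```

Only the retrieved snippets reach the model, which is how RAG keeps suggestions tied to a company's own code without sending the whole repository.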

For applications demanding low latency and cost efficiency, Google has introduced Gemini 1.5 Flash. This smaller, lightweight model is optimised for narrower or high-frequency tasks where rapid response times are critical.

Demis Hassabis, CEO of Google DeepMind, said in a blog post that Gemini 1.5 Flash “excels at summarisation, chat applications, image and video captioning, data extraction from long documents and tables, and more”.

“This is because it’s been trained by 1.5 Pro through a process called ‘distillation’, where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model,” he added.
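For readers curious what distillation looks like in code, the sketch below shows the classic soft-target loss popularised by Hinton et al. (2015): the student is trained to match the teacher's temperature-softened output distribution. Google has not said exactly how Gemini 1.5 Flash was distilled from 1.5 Pro, so treat this as a generic textbook illustration; the logits are made up and `distillation_loss` is a hypothetical helper.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)  # soft targets from the large model
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2  # T^2 keeps gradient scale comparable across temperatures


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    print(distillation_loss([1.9, 1.1, 0.1], teacher))  # small: student close to teacher
    print(distillation_loss([0.1, 1.0, 2.0], teacher))  # larger: student disagrees
```

In practice this term is usually combined with an ordinary loss on the true labels (or next tokens, for a language model) and minimised with respect to the student's parameters only, so the smaller model inherits the larger model's behaviour.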