AWS details vector search for Amazon MemoryDB

This week sees the move to general availability for vector search for Amazon MemoryDB.

Amazon MemoryDB is an in-memory database service that is compatible with the open source Redis data store.

This news gives cloud-centric developers a new way to store, index, retrieve and search vectors when building real-time machine learning (ML) and generative artificial intelligence (gen-AI) applications, with in-memory performance and Multi-AZ durability.

NOTE: Multi-Availability Zone (AZ) durability means MemoryDB persists data in a distributed transaction log that is replicated across multiple Availability Zones. This is how it combines durability with in-memory performance.

With this launch, AWS says Amazon MemoryDB delivers vector search performance at the highest recall rates among popular vector databases on Amazon Web Services (AWS). The company claims that users will “no longer have to make trade-offs” between throughput, recall and latency, which are traditionally in tension with one another.

“You can now use one MemoryDB database to store your application data and millions of vectors with single-digit millisecond query and update response times at the highest levels of recall. This simplifies your generative AI application architecture while delivering peak performance and reducing licensing cost, operational burden, and time to deliver insights on your data,” notes Channy Yun, principal developer advocate for AWS.

With vector search for Amazon MemoryDB, users can use the existing MemoryDB API to implement generative AI use cases such as Retrieval Augmented Generation (RAG), anomaly (fraud) detection, document retrieval and real-time recommendation engines.
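Because vector search is exposed through the existing Redis-compatible API, indexing and querying are plain commands. The sketch below only builds argument lists; it assumes the RediSearch-style `FT.CREATE` / `FT.SEARCH` syntax and a little-endian FLOAT32 byte layout for vectors. The index name, key prefix, field name and dimension are hypothetical, and exact option support should be checked against the MemoryDB documentation.

```python
import struct

def to_float32_blob(vector):
    """Pack floats as little-endian FLOAT32 bytes (assumed vector field layout)."""
    return struct.pack(f"<{len(vector)}f", *vector)

def create_index_args(index, prefix, field, dim, algorithm="HNSW", metric="COSINE"):
    """Build FT.CREATE arguments for a vector index over hashes whose keys start with prefix."""
    return [
        "FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix,
        "SCHEMA", field, "VECTOR", algorithm,
        "6",  # number of attribute tokens that follow: three name/value pairs
        "TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", metric,
    ]

def knn_query_args(index, field, query_vector, k):
    """Build FT.SEARCH arguments for a k-nearest-neighbour query."""
    return [
        "FT.SEARCH", index, f"*=>[KNN {k} @{field} $query_vec]",
        "PARAMS", "2", "query_vec", to_float32_blob(query_vector),
        "DIALECT", "2",
    ]
```

With a Redis-compatible client such as redis-py, each list could be sent with `client.execute_command(*args)`.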

Users can also generate vector embeddings using artificial intelligence and machine learning (AI/ML) services like Amazon Bedrock and Amazon SageMaker and store them within MemoryDB.

So then Mr Yun, which use cases would benefit most from vector search for MemoryDB?

“You can use vector search for MemoryDB for real-time semantic search for retrieval-augmented generation (RAG),” said Yun. “You can use vector search to retrieve relevant passages from a large corpus of data to augment a Large Language Model (LLM). This is done by taking your document corpus, ‘chunking’ them into discrete buckets of texts and generating vector embeddings for each chunk with embedding models such as the Amazon Titan Multimodal Embeddings G1 model, then loading these vector embeddings into Amazon MemoryDB.”
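The chunk-then-embed step Yun describes can be sketched as follows. This is a minimal illustration under stated assumptions: chunks are counted in words with a fixed overlap (real pipelines often count tokens), `embed` is a hypothetical caller-supplied function that would wrap an embedding model such as Titan, and the `doc:<id>:chunk:<n>` key scheme is invented for the example.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of up to chunk_size words; consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

def build_records(doc_id, text, embed, chunk_size=200, overlap=50):
    """Map hypothetical hash keys to {text, embedding} records ready to load into the database."""
    return {
        f"doc:{doc_id}:chunk:{i}": {"text": chunk, "embedding": embed(chunk)}
        for i, chunk in enumerate(chunk_text(text, chunk_size, overlap))
    }
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side.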

With RAG and MemoryDB, users can build real-time generative AI applications to find similar products or content by representing items as vectors, or search documents by representing text documents as dense vectors that capture semantic meaning.
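Finding "similar" items means ranking stored vectors by a similarity score against a query vector. The sketch below does an exact, brute-force cosine k-nearest-neighbour search over an in-memory dict, roughly what a flat (exhaustive) index computes; an HNSW index approximates the same ranking more cheaply. The dict stands in for MemoryDB and is purely illustrative.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn(query, items, k):
    """Return the k (key, similarity) pairs most similar to query, best first."""
    scored = ((cosine_similarity(query, vec), key) for key, vec in items.items())
    return [(key, sim) for sim, key in heapq.nlargest(k, scored)]
```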

How about low latency durable semantic caching?

Yep, says Yun, that’s here too.

NOTE: Semantic caching is a process to reduce computational costs by storing previous results from the foundation model (FM) in-memory. Users can store prior inferenced answers alongside the vector representation of the question in MemoryDB and reuse them instead of inferencing another answer from the LLM.

“If a user’s query is semantically similar based on a defined similarity score to a prior question, MemoryDB will return the answer to the prior question. This use case will allow your generative AI application to respond faster with lower costs from making a new request to the FM and provide a faster user experience for your customers,” said Yun.
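The caching logic can be sketched in a few lines. In MemoryDB the cache would be a vector index over question embeddings with the answer stored alongside; here a Python list stands in. `embed` and `call_model` are hypothetical placeholders for an embedding model and a foundation-model call, and the 0.9 threshold is an arbitrary example of the "defined similarity score".

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Toy semantic cache: reuse a prior answer when a new question is similar enough."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # hypothetical: maps question text to a vector
        self.threshold = threshold  # minimum cosine similarity for a cache hit
        self.entries = []           # (question_vector, answer) pairs

    def _lookup(self, qv):
        best_sim, best_answer = -1.0, None
        for vec, answer in self.entries:
            sim = _cosine(qv, vec)
            if sim > best_sim:
                best_sim, best_answer = sim, answer
        return best_answer if best_sim >= self.threshold else None

    def answer(self, question, call_model):
        qv = self.embed(question)
        cached = self._lookup(qv)
        if cached is not None:
            return cached               # hit: no call to the foundation model
        result = call_model(question)   # miss: ask the model, then cache the result
        self.entries.append((qv, result))
        return result
```

Raising the threshold trades fewer wrong reuses for more model calls.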

Can we count on real-time anomaly (fraud) detection then?

Rule-based & batch ML

Absolutely, says Yun. Users can use vector search for anomaly (fraud) detection to supplement their rule-based and batch ML processes by storing vector representations of transactional data alongside metadata recording whether each transaction was identified as fraudulent or valid.

“The machine learning processes can detect users’ fraudulent transactions when the net new transactions have a high similarity to vectors representing fraudulent transactions. With vector search for MemoryDB, you can detect fraud by modeling fraudulent transactions based on your batch ML models, then loading normal and fraudulent transactions into MemoryDB to generate their vector representations through statistical decomposition techniques such as principal component analysis (PCA),” he noted.
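PCA is one way to turn raw transaction features into compact vectors. As a stdlib-only sketch, the code below finds principal components by power iteration on the covariance matrix with deflation; in practice a library implementation (for example scikit-learn's `PCA`) would normally be used. The feature layout is assumed, and the resulting vectors would be stored with their fraud/valid labels.

```python
import math
import random

def fit_pca(rows, n_components, iters=200, seed=0):
    """Return (mean, components) via power iteration with deflation on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mean)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(0.1, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        # Variance along v (the eigenvalue) via the Rayleigh quotient v^T C v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components

def project(row, mean, components):
    """Map a raw transaction feature vector to its PCA coordinates."""
    centered = [x - m for x, m in zip(row, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]
```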

As inbound transactions flow through a front-end application, users can generate each transaction’s vector representation through PCA and run a vector search against MemoryDB. If the transaction is highly similar to a previously detected fraudulent transaction, it can be rejected within single-digit milliseconds to minimize the risk of fraud.
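The accept/reject decision can be sketched as a k-nearest-neighbour check. This assumes the incoming vector is already in the same PCA space as the stored, labeled vectors; the `k=5` and `threshold=0.95` defaults are illustrative, not values from AWS.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def screen_transaction(vector, labeled, k=5, threshold=0.95):
    """Return "reject" if any of the k nearest labeled vectors is fraudulent
    with cosine similarity >= threshold, otherwise "accept".
    labeled is a list of (vector, is_fraud) pairs."""
    scored = sorted(
        ((cosine(vector, v), is_fraud) for v, is_fraud in labeled),
        key=lambda pair: pair[0],
        reverse=True,
    )
    for sim, is_fraud in scored[:k]:
        if is_fraud and sim >= threshold:
            return "reject"
    return "accept"
```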

New GA features

Yun explains that AWS released vector search for MemoryDB in preview at re:Invent 2023. Based on feedback, new features and improvements include the ability for MemoryDB to operate as a low latency durable semantic cache, enabling cost optimisation and performance improvements for generative AI applications.

Users will also notice better filtering on similarity when conducting vector search. Vectors now use shared memory so that they are not duplicated: vectors are stored within the MemoryDB keyspace, and the vector index holds pointers to them. Performance improvements at high filtering rates have also been included to power performance-intensive generative AI applications.
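The shared-memory design can be pictured with a toy model: the index keeps only keys, and every search dereferences the single copy of each vector held in the keyspace. This is a conceptual illustration, not MemoryDB's internals; the `min_similarity` filter is an assumed stand-in for filtering on similarity.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class VectorIndex:
    """Toy index that stores references (keys), never copies of the vectors."""

    def __init__(self, keyspace, field):
        self.keyspace = keyspace  # key -> hash of fields; the only copy of each vector
        self.field = field
        self.keys = set()

    def add(self, key):
        self.keys.add(key)

    def search(self, query, k, min_similarity=None):
        results = []
        for key in self.keys:
            vec = self.keyspace[key][self.field]  # follow the pointer into the keyspace
            sim = cosine(query, vec)
            if min_similarity is None or sim >= min_similarity:
                results.append((sim, key))
        results.sort(reverse=True)
        return [(key, sim) for sim, key in results[:k]]
```

Since there is one copy of each vector, updating it in the keyspace is immediately visible to searches with no second copy to keep in sync.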

Vector search is available in all Regions where MemoryDB is currently available.