Data scientists love vector databases, and this year more than ever.
Why is this so? Vector databases can detect and identify relationships in data, so their use has grown as users seek to draw more meaning from the data they hold.
Vector databases are ideal for applications like recommendation systems, anomaly detection and natural language processing, and as data sources for AI applications, specifically Large Language Models (LLMs).
As lakefs.io reminds us, “A vector database is designed to store and retrieve vector embeddings – the distilled representations of training data produced and output from the training stage of the machine learning process. They serve as the filter through which fresh data is processed during inference.”
All well and good for vector database companies then (organisations like DataStax, KX and Elasticsearch come first to mind).
Milvus, Pinecone & Weaviate
Open source ‘data movement’ (another way of saying data integration) platform Airbyte has now made additional connectors available for three vector databases: Milvus (an open source vector database built for developing and maintaining AI applications), Qdrant (a vector database and vector similarity search engine that deploys as an API service, providing search for the nearest high-dimensional vectors) and Weaviate (an open source vector database that allows users to store data objects and vector embeddings from ML models and scale them into billions of data objects). Each can serve as the destination for data moved from hundreds of data sources, which can then be accessed by Artificial Intelligence (AI) models.
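For readers who want a feel for what "search for the nearest high-dimensional vectors" means in practice, here is a minimal sketch in plain Python. It is not how Milvus, Qdrant or Weaviate are implemented (production engines use approximate indexes and far higher dimensions); the function names, the three-dimensional vectors and the document keys are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(query, store, k=2):
    """Return the keys of the k stored vectors most similar to the query.

    This is a brute-force scan over every stored vector, which is fine
    for a toy example but not how a real vector database scales.
    """
    scored = [(cosine_similarity(query, vec), key) for key, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical embeddings: real ones have hundreds or thousands of dimensions.
store = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_tax": [0.0, 0.1, 0.9],
}

print(nearest([1.0, 0.0, 0.0], store, k=2))
```

The query vector points in roughly the same direction as the two pet-related entries, so those are the ones that come back, while the unrelated entry is ranked last.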
“We were the first general purpose data movement platform to add support for vector databases – the first to build a bridge between data movement platforms and AI,” said Michel Tricot, CEO, Airbyte. “Now, we are doubling down as our users are clamouring for more and more vector database support so they don’t have to struggle with creating custom code to bring in data; they can use the new Airbyte connector to select the data sources they want.”
The vector database destination in Airbyte enables users to configure the full ELT pipeline, starting from extracting records from a variety of sources to separating unstructured and structured data, preparing and embedding text contents of records, and finally loading them into vector databases – all through a single, user-friendly interface.
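The stages named above (extract, separate structured from unstructured data, prepare and embed text, load) can be sketched roughly as follows. This is an illustration only and not Airbyte's code or API: every function name is hypothetical, the "embedding" is a hash-based stand-in rather than a real model, and the "vector database" is just a Python list.

```python
import hashlib


def split_record(record, text_fields):
    """Separate unstructured text from structured metadata in one record."""
    text = " ".join(str(record[f]) for f in text_fields if f in record)
    metadata = {k: v for k, v in record.items() if k not in text_fields}
    return text, metadata


def chunk_text(text, chunk_size=5):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def toy_embed(text, dims=4):
    """Stand-in for an embedding model: derives floats in [0, 1] from a hash.

    A real pipeline would call an embedding model here; this placeholder
    carries no semantic meaning at all.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dims)]


def run_pipeline(records, text_fields, chunk_size=5):
    """Extract -> split -> chunk -> embed -> load into an in-memory list."""
    index = []  # stands in for the vector database destination
    for record in records:
        text, metadata = split_record(record, text_fields)
        for chunk in chunk_text(text, chunk_size):
            index.append(
                {"vector": toy_embed(chunk), "text": chunk, "metadata": metadata}
            )
    return index


# Hypothetical source record, as if extracted from a helpdesk system.
records = [
    {
        "id": 1,
        "title": "Refund policy",
        "body": "Refunds are issued within ten working days of a return.",
    },
]

index = run_pipeline(records, text_fields=["title", "body"])
for entry in index:
    print(entry["metadata"], entry["text"])
```

Keeping the structured fields as metadata alongside each vector is a common design choice, since it lets a later query filter on fields such as an ID while still searching by similarity on the text.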
These vector databases can then be accessed by LLMs. All existing advantages of the Airbyte platform are now extended to vector databases, including the availability of a no-code connector builder that makes it possible to create new connectors for data integrations that address the ‘long tail’ of data sources.
Certified connectors for both Airbyte Cloud and Airbyte Open Source Software (OSS) versions are now available for Milvus, Pinecone and Weaviate.
A community connector for Qdrant is available for both versions of Airbyte, and a community connector for Chroma is available for Airbyte OSS. More options are planned for the future.