AWS recalibrates data economics further with S3 Vectors, batch & Intelligent-Tiering
AI needs data, AI needs contextual linking between (and within) data repositories and AI needs all of that served by a whole menu of search types: contextual search, semantic search, SQL search and now, especially in a world of AI-driven, LLM-enriched informatics, vector search too.
During this year’s AWS re:Invent 2025 in Las Vegas, AWS announced that Amazon S3 Vectors is now generally available, with significant scale and performance improvements.
What is Amazon S3 Vectors?
Amazon S3 Vectors is an AWS service that adds native vector database functionality to Amazon S3, the company’s core object storage service, which AWS says has “recalibrated the economics” of storage. Vector databases are data stores capable of weaving interrelationships between not just two objects, but three and, often significantly, many more. S3 Vectors lets data science teams store and query “vector embeddings” directly within S3.
Vector embeddings are numerical representations of unstructured data (names, words, phrases, images or audio) that capture meaning and relationships, enabling machine learning models to comprehend, process and analyse that data. Sometimes described as digital fingerprints for the data itself, vector embeddings are produced by transforming individual data points into multidimensional vectors that encapsulate semantic context by encoding the relationships between them.
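To make that “digital fingerprint” idea concrete, here is a minimal, illustrative Python sketch (the tiny three-phrase vocabulary and its 4-dimensional vectors are invented for the example, not produced by any real embedding model) showing how semantic closeness between embeddings is typically measured with cosine similarity:

```python
# Minimal illustration of vector embeddings: each phrase becomes a list of
# numbers, and "semantic closeness" is measured by cosine similarity.
# The 4-dimensional vectors below are made up purely for illustration;
# real embedding models produce hundreds or thousands of dimensions.
import math

embeddings = {
    "invoice overdue":        [0.81, 0.10, 0.05, 0.55],
    "payment reminder":       [0.78, 0.12, 0.07, 0.60],
    "holiday photo of a dog": [0.02, 0.91, 0.40, 0.03],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = embeddings["invoice overdue"]
for text, vector in embeddings.items():
    print(f"{text!r:28} similarity = {cosine_similarity(query, vector):.3f}")
# Semantically related phrases score close to 1.0; unrelated ones score much lower.
```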
What is AWS doing with Amazon S3 Vectors?
The company says Amazon S3 Vectors makes S3 a “more cost-effective and scalable” solution for AI and machine learning applications that search and retrieve data based on semantic meaning, such as semantic search, AI agent memory and recommendation systems.
How does it work?
In line with the definitions above, S3 Vectors stores vector embeddings created by AI models. Users store this data in “vector buckets” and organise it into “vector indexes” within those buckets, each capable of holding millions of vectors (a minimal usage sketch follows the list below). Two capabilities stand out:
- Native vector search – enables sub-second query performance for these vector indexes, allowing users to find semantically similar data without needing a separate, external vector database.
- Metadata – means users can attach metadata to each vector to filter search results based on conditions like date, category or user preferences.
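As a rough idea of what that looks like in practice, here is a hedged sketch using the boto3 “s3vectors” client, with operation and parameter names taken from AWS’s preview documentation (they may have shifted at general availability, so check the current SDK reference); the bucket, index, metadata and embedding values are all hypothetical:

```python
# Hedged sketch: store and query vector embeddings directly in S3 Vectors.
# Assumes the boto3 "s3vectors" client and the parameter names used in AWS's
# preview documentation; bucket, index and embedding values are hypothetical.
import boto3

s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# 1. Create a vector bucket and an index sized to the embedding model's output.
s3vectors.create_vector_bucket(vectorBucketName="demo-vector-bucket")
s3vectors.create_index(
    vectorBucketName="demo-vector-bucket",
    indexName="product-docs",
    dataType="float32",
    dimension=3,                  # tiny dimension purely for this sketch; real models output 1,024+
    distanceMetric="cosine",
)

# 2. Write embeddings, each with optional metadata for filtered search.
s3vectors.put_vectors(
    vectorBucketName="demo-vector-bucket",
    indexName="product-docs",
    vectors=[{
        "key": "doc-001",
        "data": {"float32": [0.012, -0.034, 0.155]},
        "metadata": {"category": "pricing", "year": 2025},
    }],
)

# 3. Query by similarity, filtering on metadata, without a separate vector database.
response = s3vectors.query_vectors(
    vectorBucketName="demo-vector-bucket",
    indexName="product-docs",
    queryVector={"float32": [0.010, -0.030, 0.160]},  # embedding of the query text
    topK=5,
    filter={"category": "pricing"},                   # simple equality filter on metadata
    returnMetadata=True,
    returnDistance=True,
)
print(response["vectors"])
```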
20 trillion vectors per bucket
Designed to provide the same elasticity, scale and durability as Amazon S3, S3 Vectors scales up to two billion vectors per index (40x its preview capacity), supports up to 20 trillion vectors per bucket, delivers 2-3x faster performance for frequent queries and reduces costs by up to 90% compared with alternatives – eliminating overhead for customers building AI applications.
S3 Vectors brings these capabilities to customer data and integrates with Amazon Bedrock Knowledge Bases and Amazon OpenSearch Service, making it easy to build AI agents, RAG systems, inference pipelines and semantic search applications that understand context and intent.
Customers like BMW Group, MIXI, Precisely, Qlik and Twilio are using S3 Vectors to accelerate AI search and power recommendation systems at scale – without the complexity or cost of managing a dedicated vector infrastructure.
50TB should be enough for anyone
Remember when Bill Gates famously said that 640K ought to be enough for anybody? Spoiler alert: it’s probably apocryphal and he denies ever saying it, but the point stands that even technology leaders can underestimate future needs for storage, processing, analytics and more.
So then, in related product news, AWS notes that data volumes have surged in recent years to the point where Amazon S3 now stores more than 500 trillion objects and hundreds of exabytes of data.
“As individual objects also grow larger, AWS is increasing the maximum S3 object size 10x from 5TB to 50TB, so customers can store massive data files like high-resolution videos, seismic data and AI training datasets as single objects in their original form – simplifying workflows while maintaining full access to all S3 storage classes and features,” detailed AWS, in a press statement.
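In practical terms, objects of this size are uploaded in parts. The following is a minimal sketch of a large single-object upload using boto3’s managed transfer, which handles multipart uploads and parallelism automatically; the file path, bucket name and tuning values are hypothetical:

```python
# Hedged sketch: uploading a very large file (e.g. a seismic dataset) to S3 as a
# single object. boto3's managed transfer splits it into parallel multipart
# uploads under the hood; the file path and bucket name are hypothetical.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=128 * 1024 * 1024,   # switch to multipart above 128 MB
    multipart_chunksize=512 * 1024 * 1024,   # 512 MB parts for very large objects
    max_concurrency=16,                      # parallel part uploads
    use_threads=True,
)

s3.upload_file(
    Filename="/data/seismic/survey-2025.segy",
    Bucket="demo-large-objects",
    Key="seismic/survey-2025.segy",
    Config=config,
)
```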
AWS has also accelerated Amazon S3 Batch Operations for large jobs to run up to 10x faster, delivering the speed that users need for large-scale data processing and time-sensitive data migrations.
“With S3 Batch Operations, customers can perform batch workloads such as replicating objects across AWS Regions for backup or disaster recovery, tagging objects for S3 Lifecycle management and computing object checksums to verify the content of stored datasets, at a scale of up to 20 billion objects in a job,” said the company.
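As a flavour of how such a batch workload is submitted, here is a hedged sketch that creates an S3 Batch Operations tagging job via the boto3 “s3control” client; the account ID, role, manifest location, ETag and report bucket are all hypothetical placeholders:

```python
# Hedged sketch: submit an S3 Batch Operations job that tags every object listed
# in a CSV manifest, one of the batch workloads mentioned above. All ARNs,
# bucket names and the ETag are hypothetical placeholders.
import boto3

s3control = boto3.client("s3control", region_name="us-east-1")

response = s3control.create_job(
    AccountId="111122223333",
    ConfirmationRequired=False,
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/batch-ops-role",
    Operation={
        "S3PutObjectTagging": {
            "TagSet": [{"Key": "lifecycle", "Value": "archive-after-90-days"}]
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::demo-manifests/objects-to-tag.csv",
            "ETag": "example-etag",
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::demo-reports",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "ReportScope": "FailedTasksOnly",
    },
)
print("Batch job created:", response["JobId"])
```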
Since launch, Amazon S3 Tables – the service built for Apache Iceberg workloads – has grown to more than 400,000 tables and has gained over 15 new features and capabilities in the last 12 months, driving the development of S3’s native Iceberg support for data lakes.
At this year’s re:Invent, AWS is adding two major capabilities to S3 Tables: support for the Intelligent-Tiering storage class and automatic replication across AWS Regions and accounts.
Why use Intelligent-Tiering?
AWS says Intelligent-Tiering is all about automatic cost optimisation; the storage class automatically moves table data across three access tiers:
- Frequent Access
- Infrequent Access
- Archive Instant Access
This archiving is based on access patterns and it offers up to 80% storage cost savings without performance impact or operational overhead.
Automatic replication enables distributed teams to query local data for faster performance while maintaining consistency across Regions and accounts. Customers can now automatically replicate tables, eliminating manual updates and complex syncing – simplifying compliance and backup management while keeping complete table structures intact and ready to use.
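The new S3 Tables integration applies Intelligent-Tiering at the table level, and its exact configuration surface is not shown here; for ordinary S3 objects, however, the same storage class has long been selectable per request, which illustrates the underlying idea. A minimal sketch, with hypothetical bucket, key and file names:

```python
# Hedged sketch: Intelligent-Tiering for an ordinary S3 object, illustrating the
# storage class that S3 Tables now applies at the table level. Bucket, key and
# local file names are hypothetical.
import boto3

s3 = boto3.client("s3")

with open("part-0000.parquet", "rb") as body:
    s3.put_object(
        Bucket="demo-analytics-data",
        Key="events/2025/12/01/part-0000.parquet",
        Body=body,
        StorageClass="INTELLIGENT_TIERING",  # S3 shifts the object between access tiers automatically
    )
```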
Finally, for this group of updates, AWS is expanding Amazon S3 Access Points to support Amazon FSx for NetApp ONTAP so that customers can access the files they store in FSx as if the data were in S3.
“Customers can now use the data they store in FSx for NetApp ONTAP with AWS’s AI, ML and analytics services that are built to work with S3 data – such as Amazon Bedrock Knowledge Bases, Amazon SageMaker and Amazon Athena. With this integration, customers with NetApp ONTAP enterprise data on premises can easily migrate to FSx for NetApp ONTAP and start using that data with all S3 compatible tools and applications for analytics and AI.”
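For context, S3 Access Points are addressed by passing the Access Point ARN wherever a bucket name would normally go. The sketch below assumes that this established convention carries over to Access Points attached to FSx for NetApp ONTAP; the ARN, key and region are hypothetical:

```python
# Hedged sketch: reading a file through an S3 Access Point using standard S3 calls.
# Assumes the existing Access Point convention (pass the Access Point ARN where a
# bucket name would go) also applies to the new FSx for NetApp ONTAP attachment;
# the ARN, key and region are hypothetical.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

response = s3.get_object(
    Bucket="arn:aws:s3:us-east-1:111122223333:accesspoint/fsx-ontap-data",
    Key="finance/reports/q4-forecast.csv",
)
print(response["Body"].read()[:200])
```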
Recalibrated data economics?
Has AWS truly recalibrated data economics, then?
Many would argue that it has. By sheer weight of its influence in the data industry, any shift that AWS (or, for argument’s sake, any hyperscale cloud service provider) makes at the level of storage management, data processing and vector capabilities is likely to have a profound impact on how modern enterprise applications dovetail with the most demanding AI-centric use cases – particularly when ease of use manifests itself in the platform via agentic services that help transform datasets, workloads and the applications themselves.

