The Art and Science of Sizing Search Nodes
Getting the most out of your search deployment isn't just about writing the perfect query; it's about ensuring the underlying system is perfectly sized for your workload. For many, this has meant facing a difficult choice. If your search indexes are large but your query and indexing rates are moderate, you may have been forced to scale up to more expensive, higher-tiered nodes simply to get the storage capacity you need. This often leads to overprovisioning compute resources and unnecessary costs.
To solve this and provide more cost-effective scaling, we are introducing storage-optimized search nodes. These nodes are designed specifically for use cases where large index sizes are the primary scaling factor, rather than high computational demands from indexing or querying.
This post will delve into the key components of sizing a search deployment, from data ingestion and index size to query performance. We'll provide context on how to scope your workloads and show how our new storage-optimized nodes offer a powerful new way to build the most cost-effective and performant solution for your specific needs.
Understanding the core components of search node sizing
Several key factors influence the sizing of your Atlas Search Node deployment:
1. Data size and index size
The first consideration is your index size. Dedicated search nodes (DSNs) use local solid-state drives (SSDs) of a fixed size, so a node must have adequate disk space for the index. It's crucial to remember that a collection's size and the resulting search index's size are not always directly related, because index mappings determine what gets indexed. For example, if your documents have 100 fields but your search index maps only 5 of them, the index will be substantially smaller than the collection. Conversely, mapping all fields or using features like autocomplete can significantly increase index size.
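To make the mapping point concrete, here is a minimal sketch of a static index definition (the field names are hypothetical): with dynamic set to false, only the listed fields are indexed, which is what keeps the index smaller than the collection.

```python
# Hypothetical static mapping: only "title" and "plot" are indexed, no matter
# how many other fields each document contains. Setting "dynamic": True
# instead would index every field and grow the index accordingly.
index_definition = {
    "mappings": {
        "dynamic": False,  # do not index unlisted fields
        "fields": {
            "title": {"type": "string"},
            "plot": {"type": "string"},
        },
    }
}
```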
Estimating index size (sketched in code below):
1. Insert 1-2 GB of data, or create a small sample collection using $out.
2. Create a search index with your desired field mappings.
3. Compare the resulting index size to the sample collection's size to get an index-to-collection ratio.
4. Use this ratio to estimate the total index size for your expected collection size. For instance, if a 1 GB collection yields a 250 MB index (a 0.25:1 ratio), a 12 GB collection would likely result in an approximately 3 GB index. If you already use Atlas Search, you can find the index size in cluster metrics or on the index list page.
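As a rough sketch of that workflow with PyMongo (the URI, database, and collection names are placeholders, and this assumes PyMongo 4.5+ for create_search_index; the index size itself is read from the Atlas UI once the build completes):

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb+srv://<your-cluster-uri>")
db = client["sample_db"]

# 1. Copy a sample of documents into a scratch collection with $out.
#    Tune the sample size until the scratch collection is roughly 1-2 GB.
db["movies"].aggregate([
    {"$sample": {"size": 100_000}},
    {"$out": "movies_sample"},
])

# 2. Create a search index on the sample with your intended field mappings.
db["movies_sample"].create_search_index(
    SearchIndexModel(
        name="sizing_test",
        definition={"mappings": {"dynamic": False, "fields": {
            "title": {"type": "string"},
            "plot": {"type": "string"},
        }}},
    )
)

# 3. Once the index has built, divide its size (from the Atlas UI) by the
#    sample collection's data size below to get your ratio.
sample_bytes = db.command("collStats", "movies_sample")["size"]
print(f"Sample collection size: {sample_bytes / 1024**3:.2f} GiB")
```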
2. Data ingestion
Data must first be inserted into a MongoDB collection to become searchable. When a search index is created, an initial collection scan populates the index. To keep the index current, Atlas Search then uses change streams to monitor changes to the collection. Both the initial indexing and ongoing synchronization can impose considerable read pressure on the cluster, so the cluster must be sized to handle it; otherwise, replication lag between the cluster and the search index can grow. For very heavy data ingestion, MongoDB sharding can distribute the read/write load.
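Atlas Search manages this synchronization for you, but a short sketch of what consuming a change stream looks like illustrates where the read pressure comes from (the URI and names are placeholders):

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<your-cluster-uri>")
db = client["sample_db"]

# Each open change stream cursor adds read load to the replica set; Atlas
# Search keeps change streams open continuously to keep its indexes in sync.
with db["movies"].watch(full_document="updateLookup") as stream:
    for change in stream:
        print(change["operationType"], change.get("documentKey"))
```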
3. Indexing
Indexing is the process of applying inserts, updates, and deletes from change streams to the search index. This can be resource-intensive, depending on ingest and update rates. Optimizing indexing involves considering both the cluster and the search node.
4. Steady-state replication and lag
The goal is to replicate data from the collection to the index in under one second, though various factors can extend this time. To minimize replication lag, consider the potential bottlenecks below; a simple way to measure end-to-end lag yourself is sketched after the list.
- Cluster: High resource utilization on the cluster can impair its ability to publish change streams quickly enough. Aim to minimize overall load and ensure it is spread evenly across replica set members.
- Change streams: Listening to change streams can put substantial read pressure on a cluster. Adding secondaries to the replica set can alleviate this.
- DSN indexing: Search Nodes use high-performance local disks, and each tier offers higher input/output operations per second (IOPS). If you observe high vCPU utilization under heavy insert rates, upgrading to a higher DSN tier may help.
- Indexing parallelism: For extremely heavy indexing loads (e.g., more than 10,000 inserts/updates per second), sharding the cluster may be necessary. Sharding allows Atlas Search to index each shard independently, reducing the load on any one node.
- Number of indexes: A large number of search indexes can also contribute to replication lag and affect eventual consistency. Each index adds overhead, and many indexes slow the replication process.
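The end-to-end probe mentioned above can be as simple as writing a marker document and timing how long it takes to surface in $search. This sketch assumes the probe field is covered by the index (e.g., via dynamic mappings) and that the index is named default; all names and the URI are placeholders:

```python
import time
import uuid

from pymongo import MongoClient

client = MongoClient("mongodb+srv://<your-cluster-uri>")
db = client["sample_db"]

# Write a uniquely identifiable marker document.
marker = str(uuid.uuid4())
db["movies"].insert_one({"lag_probe": marker})
start = time.monotonic()

# Poll $search until the marker is visible; the elapsed time approximates
# collection-to-index replication lag.
while True:
    hits = list(db["movies"].aggregate([
        {"$search": {
            "index": "default",
            "text": {"query": marker, "path": "lag_probe"},
        }},
        {"$limit": 1},
    ]))
    if hits:
        break
    time.sleep(0.05)

print(f"Observed replication lag: {time.monotonic() - start:.2f}s")
```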
5. Query performance: QPS and latency
When sizing your Atlas Search deployment, two performance metrics are critical: queries per second (QPS) and latency.
Queries per second (QPS): This measures sustained query throughput. A general starting point for estimation is 10 QPS per vCPU core. For example, a minimum setup of two S20 nodes (each with 2 vCPUs) provides 4 vCPUs, supporting roughly 40 QPS. This is a baseline: query complexity and index mappings will influence actual throughput. QPS scales horizontally; you can deploy up to 32 Search Nodes per cluster/shard/region to increase the overall vCPU count.
Latency: This is the time between issuing a query and receiving the response. The general aim is sub-100ms latency, though some use cases demand much lower. Latency is a function of DSN resources and can be improved by vertical scaling (moving to a higher search node tier); sufficient CPU, RAM, and disk I/O are essential. CPU is primarily leveraged by queries using concurrent segment search (see the example below); if adding CPU resources reduces latency, it indicates the previous tier was underprovisioned.
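For example, concurrent segment search is requested per query via the concurrent option on the $search stage (available on dedicated Search Nodes; the URI, index, and field names here are placeholders):

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<your-cluster-uri>")
db = client["sample_db"]

# "concurrent": True asks Atlas Search to parallelize this query across
# index segments, spending extra vCPU to reduce latency.
results = list(db["movies"].aggregate([
    {"$search": {
        "index": "default",
        "text": {"query": "adventure", "path": "plot"},
        "concurrent": True,
    }},
    {"$limit": 10},
]))
```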
The challenge: When storage is the bottleneck
While horizontal and vertical scaling effectively address performance, what happens when your primary challenge isn't speed, but size? As applications mature, search indexes can grow to hundreds of gigabytes or even terabytes.
With a traditional coupled architecture, this growth forces you to scale up your entire database cluster, leading to significant costs for compute resources you may not even need. Even with dedicated search nodes, which utilize fast but fixed-size local SSDs, you previously had to upgrade to a higher tier solely for more storage capacity. This often results in overprovisioning compute resources and inflating costs. For example, a customer needing the compute of an S50 node but the storage of an S70 could face a major price premium.
The solution: Introducing storage-optimized search nodes
To address this exact problem, MongoDB is introducing storage-optimized search nodes. These nodes are engineered for workloads where the footprint of your index is the main scaling factor, not high query rates or intense indexing operations. If your indexes are large but your query load is moderate, these nodes offer a cost-effective path to scale without overprovisioning.
Key benefits include:
- Increased storage capacity: They provide more than double the storage capacity of other node classes.
- Cost-effectiveness: These nodes can save up to 50% on DSN storage costs and are approximately 40% less expensive than existing high-CPU nodes when anchored on RAM. At the same vCPU count, users receive roughly 3x the storage compared to high-CPU nodes.
- Optimized architecture: With an 8:1 RAM-to-vCPU ratio, they offer a balanced profile well suited to large indexes.
Choosing the right tool for the job: A comparison of search nodes
Atlas Search now provides three distinct classes of dedicated nodes, giving you the flexibility to isolate workloads and perfectly match your infrastructure to your needs.
Node classes comparison:
| Class | Notes | Use Case(s) | Storage Capacity Range |
| --- | --- | --- | --- |
| Low-CPU | 8:1 RAM:vCPU ratio, very large memory options, smaller storage options | Vector search, low data volume | 50-3200 GB |
| High-CPU | 2:1 RAM:vCPU ratio | Full-text/lexical search, vector search, heavy indexing, heavy query load | 100-3200 GB |
| Storage-optimized | 8:1 RAM:vCPU ratio, very large memory options, 2x+ storage capacity over high-/low-CPU | Large search indexes, moderate indexing, moderate query load, binary quantized vector search | 375-6000 GB |
Unlocking modern AI: The impact on vector search
The rise of AI has put a new focus on vector search. While vector search has traditionally been memory-constrained, modern techniques like automatic binary quantization are shifting the bottleneck from RAM to storage. Binary quantization makes indexes more storage-constrained, and storage-optimized nodes are the perfect solution.
For a large-scale vector search deployment using binary quantization that requires 3600 GB of storage, you can now select a storage-optimized node that fits your needs precisely, rather than drastically overprovisioning a high-CPU node just for its disk space. This alignment of resources to workload ensures you can build and scale modern AI applications efficiently and economically.
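For reference, automatic binary quantization is enabled in the vector index definition itself. Here is a minimal sketch (the dimension count, names, and URI are placeholders, and this assumes PyMongo 4.6+ for vectorSearch index creation):

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb+srv://<your-cluster-uri>")
db = client["ai_app"]

# Binary quantization holds compact binary vectors in memory while the
# full-fidelity vectors stay on disk, shifting the bottleneck toward storage.
db["embeddings"].create_search_index(
    SearchIndexModel(
        name="vector_bq",
        type="vectorSearch",
        definition={"fields": [{
            "type": "vector",
            "path": "embedding",
            "numDimensions": 1536,  # match your embedding model
            "similarity": "cosine",
            "quantization": "binary",  # automatic binary quantization
        }]},
    )
)
```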
Concluding thoughts
Sizing search deployments is a blend of art and science. This post has provided guidance on the specific components of search and general approaches to maximize search indexing and query performance. The introduction of storage-optimized search nodes further enhances your ability to right-size your Atlas Search deployments, ensuring you have the most cost-effective and performant solution for your specific workload needs.
Learn more about MongoDB Atlas Search Nodes and the new storage-optimized options in our documentation.
August 12, 2025