Best Practices For Time Series Collections - MongoDB Manual v8.0
This page describes best practices to improve performance and data usage for time series collections.
{
timestamp: ISODate("2020-01-23T00:00:00.441Z"),
coordinates: [1.0, 2.0]
},
{
timestamp: ISODate("2020-01-23T00:00:10.441Z"),
coordinates: []
},
{
timestamp: ISODate("2020-01-23T00:00:20.441Z"),
coordinates: [3.0, 5.0]
}
Alternating between coordinates fields with populated values and coordinates fields with an empty array
results in a schema change for the compressor. The schema change causes the second and third documents
in the sequence to remain uncompressed.
Optimize compression by omitting the fields with empty values, as shown in the following documents:
{
timestamp: ISODate("2020-01-23T00:00:00.441Z"),
coordinates: [1.0, 2.0]
},
{
timestamp: ISODate("2020-01-23T00:00:10.441Z")
},
{
timestamp: ISODate("2020-01-23T00:00:20.441Z"),
coordinates: [3.0, 5.0]
}
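One way to apply this before inserting is to strip empty fields on the client. The following is a minimal sketch, not part of the MongoDB API: omitEmptyFields is a hypothetical helper, and treating empty strings and empty plain objects as "empty" alongside empty arrays is an assumption.

```javascript
// Hypothetical helper (not a MongoDB API): returns a shallow copy of a
// document without top-level fields whose value is an empty array, an
// empty string, or an empty plain object.
function omitEmptyFields(doc) {
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    const isEmptyArray = Array.isArray(value) && value.length === 0;
    const isEmptyString = value === "";
    const isEmptyObject =
      value !== null &&
      typeof value === "object" &&
      Object.getPrototypeOf(value) === Object.prototype &&
      Object.keys(value).length === 0;
    if (!(isEmptyArray || isEmptyString || isEmptyObject)) {
      result[key] = value;
    }
  }
  return result;
}

// Same shape as the documents above; new Date(...) stands in for ISODate(...).
const measurements = [
  { timestamp: new Date("2020-01-23T00:00:00.441Z"), coordinates: [1.0, 2.0] },
  { timestamp: new Date("2020-01-23T00:00:10.441Z"), coordinates: [] },
  { timestamp: new Date("2020-01-23T00:00:20.441Z"), coordinates: [3.0, 5.0] }
];

// The second document loses its empty coordinates field; the others are unchanged.
const cleaned = measurements.map(omitEmptyFields);
```

The helper only inspects top-level fields. Nested empty values would need a recursive variant.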
If possible, insert data that contains identical metaField values in the same batches.
The following operation inserts six documents, but only incurs the cost of two inserts (one
per metaField value), because the documents are ordered by sensor. The ordered parameter is set
to false to improve performance:
db.temperatures.insertMany(
[
{
metaField: {
sensor: "sensorA"
},
timestamp: ISODate("2021-05-18T00:00:00.000Z"),
temperature: 10
},
{
metaField: {
sensor: "sensorA"
},
timestamp: ISODate("2021-05-19T00:00:00.000Z"),
temperature: 12
},
{
metaField: {
sensor: "sensorA"
},
timestamp: ISODate("2021-05-20T00:00:00.000Z"),
temperature: 13
},
{
metaField: {
sensor: "sensorB"
},
timestamp: ISODate("2021-05-18T00:00:00.000Z"),
temperature: 20
},
{
metaField: {
sensor: "sensorB"
},
timestamp: ISODate("2021-05-19T00:00:00.000Z"),
temperature: 25
},
{
metaField: {
sensor: "sensorB"
},
timestamp: ISODate("2021-05-20T00:00:00.000Z"),
temperature: 26
}
],
{ "ordered": false }
)
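If measurements arrive interleaved across sensors, they can be regrouped on the client before the batch is sent. The sketch below assumes the metadata lives in a field named metaField, as in the example above. orderByMeta is a hypothetical helper, not a MongoDB API.

```javascript
// Hypothetical helper: reorders a batch so that documents sharing the same
// metaField value are adjacent. Relative order within each group is kept.
// Grouping uses JSON.stringify, so two metaField objects only match when
// their keys appear in the same order.
function orderByMeta(docs, metaKey = "metaField") {
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(doc[metaKey]);
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(doc);
  }
  return [...groups.values()].flat();
}

const interleaved = [
  { metaField: { sensor: "sensorA" }, timestamp: new Date("2021-05-18T00:00:00.000Z"), temperature: 10 },
  { metaField: { sensor: "sensorB" }, timestamp: new Date("2021-05-18T00:00:00.000Z"), temperature: 20 },
  { metaField: { sensor: "sensorA" }, timestamp: new Date("2021-05-19T00:00:00.000Z"), temperature: 12 },
  { metaField: { sensor: "sensorB" }, timestamp: new Date("2021-05-19T00:00:00.000Z"), temperature: 25 }
];

const batch = orderByMeta(interleaved);

// Only runs inside mongosh, where the global db object exists.
if (typeof db !== "undefined") {
  db.temperatures.insertMany(batch, { ordered: false });
}
```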
For example, inserting the following documents, all of which have the same field order, results in optimal
insert performance.
{
_id: ObjectId("6250a0ef02a1877734a9df57"),
timestamp: ISODate("2020-01-23T00:00:00.441Z"),
name: "sensor1",
range: 1
},
{
_id: ObjectId("6560a0ef02a1877734a9df66"),
timestamp: ISODate("2020-01-23T01:00:00.441Z"),
name: "sensor1",
range: 5
}
In contrast, the following documents do not achieve optimal insert performance, because their field orders
differ:
{
range: 1,
_id: ObjectId("6250a0ef02a1877734a9df57"),
name: "sensor1",
timestamp: ISODate("2020-01-23T00:00:00.441Z")
},
{
_id: ObjectId("6560a0ef02a1877734a9df66"),
name: "sensor1",
timestamp: ISODate("2020-01-23T01:00:00.441Z"),
range: 5
}
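When documents come from sources that do not guarantee a field order, one option is to normalize the order on the client before inserting. This is a minimal sketch under that assumption. withFieldOrder and the chosen order are hypothetical, not part of the MongoDB API.

```javascript
// Hypothetical helper: returns a copy of doc whose fields follow fieldOrder.
// Fields not listed in fieldOrder are appended afterwards in their original
// order. This relies on JavaScript preserving insertion order for string keys.
function withFieldOrder(doc, fieldOrder) {
  const ordered = {};
  for (const field of fieldOrder) {
    if (field in doc) {
      ordered[field] = doc[field];
    }
  }
  for (const [field, value] of Object.entries(doc)) {
    if (!(field in ordered)) {
      ordered[field] = value;
    }
  }
  return ordered;
}

const FIELD_ORDER = ["_id", "timestamp", "name", "range"];

// Same field layout as the first out-of-order document above, with a
// placeholder string standing in for the ObjectId.
const unordered = {
  range: 1,
  _id: "placeholder-id",
  name: "sensor1",
  timestamp: new Date("2020-01-23T00:00:00.441Z")
};

const normalized = withFieldOrder(unordered, FIELD_ORDER);
```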
Increase the Number of Clients
Increasing the number of clients that write data to your collections can improve performance.
NOTE
Starting in MongoDB 8.0, the use of the timeField as a shard key in time series collections is
deprecated.
If possible, select identifiers or other stable values that are common in filter expressions as part of
your metaField.
Avoid selecting fields that are not used for filtering as part of your metaField. Instead, use those fields
as measurements.
You can improve performance by setting the granularity or custom bucketing parameters to the best
match for the time span between incoming measurements from the same data source. For example, if you
are recording weather data from thousands of sensors but only record data from each sensor once every 5
minutes, you can either set granularity to "minutes" or set the custom bucketing parameters
(bucketMaxSpanSeconds and bucketRoundingSeconds) to 300 (seconds).
In this case, setting the granularity to hours groups up to a month's worth of data ingest events into a
single bucket, resulting in longer traversal times and slower queries. Setting it to seconds leads to multiple
buckets per polling interval, many of which might contain only a single document.
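The two alternatives for the 5-minute example can be sketched as collection options. This assumes the custom bucketing parameters are bucketMaxSpanSeconds and bucketRoundingSeconds (available in MongoDB 6.3 and later, and set to the same value), and that the time and metadata fields are named timestamp and metaField as in the examples on this page. The collection name weather is illustrative.

```javascript
// Option 1: let MongoDB derive the bucket span from a granularity value.
const byGranularity = {
  timeseries: {
    timeField: "timestamp",
    metaField: "metaField",
    granularity: "minutes"
  }
};

// Option 2: custom bucketing, one bucket boundary every 300 seconds.
// granularity is left out because it is not combined with the custom
// bucketing parameters.
const byCustomBucketing = {
  timeseries: {
    timeField: "timestamp",
    metaField: "metaField",
    bucketMaxSpanSeconds: 300,
    bucketRoundingSeconds: 300
  }
};

// Only runs inside mongosh, where the global db object exists. The call
// fails if a collection named "weather" already exists.
if (typeof db !== "undefined") {
  db.createCollection("weather", byCustomBucketing);
}
```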
The following table shows the maximum time interval included in one bucket of data when using a
given granularity value:
granularity    Maximum time interval per bucket
seconds        1 hour
minutes        24 hours
hours          30 days
TIP
See also:
Timing of Automatic Removal
Use the timeField and other indexed fields for range queries.
General indexing strategies also apply to time series collections. For more information, see Indexing
Strategies.
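As an illustration, a range query on the timeField can be combined with an equality match on an indexed metadata sub-field. The sketch below is an assumption-laden example: timeRangeFilter is a hypothetical helper, and the weather collection, the timestamp and metaField field names, and the sensorId value are illustrative.

```javascript
// Hypothetical helper: builds a filter that selects a half-open time range
// [start, end) on the timestamp field, merged with any extra conditions.
function timeRangeFilter(start, end, extraFilter = {}) {
  return { ...extraFilter, timestamp: { $gte: start, $lt: end } };
}

// All measurements from one sensor during a single day.
const filter = timeRangeFilter(
  new Date("2021-05-18T00:00:00.000Z"),
  new Date("2021-05-19T00:00:00.000Z"),
  { "metaField.sensorId": 5578 }
);

// Only runs inside mongosh, where the global db object exists.
if (typeof db !== "undefined") {
  // Compound index: equality field first, then the range field.
  db.weather.createIndex({ "metaField.sensorId": 1, timestamp: 1 });
  printjson(db.weather.find(filter).toArray());
}
```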
db.weather.insertMany( [
{
metaField: { sensorId: 5578, type: "temperature" },
timestamp: ISODate( "2021-05-18T00:00:00.000Z" ),
temp: 12
},
{
metaField: { sensorId: 5578, type: "temperature" },
timestamp: ISODate( "2021-05-18T04:00:00.000Z" ),
temp: 11
}
] )
The following query on the sensorId and type scalar sub-fields returns the first document that matches
the query criteria:
db.weather.findOne( {
"metaField.sensorId": 5578,
"metaField.type": "temperature"
} )
Example output:
{
_id: ObjectId("6572371964eb5ad43054d572"),
metaField: { sensorId: 5578, type: 'temperature' },
timestamp: ISODate( "2021-05-18T00:00:00.000Z" ),
temp: 12
}
For example, to query for distinct meta.type values on documents where meta.project = 10, instead
of:
db.foo.distinct("meta.type", {"meta.project": 10})
Use:
db.foo.createIndex({"meta.project":1, "meta.type":1})
db.foo.aggregate([{$match: {"meta.project": 10}},
{$group: {_id: "$meta.type"}}])
1. Creating a compound index on meta.project and meta.type supports the aggregation.
2. The $match stage filters for documents where meta.project = 10.
3. The $group stage uses meta.type as the group key to output one document per unique value.