Transform Your Data with Aggregation

Overview

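In this guide, you can learn how to use the Ruby driver to perform aggregation operations.

Aggregation operations process data in your MongoDB collections and return computed results. The MongoDB Aggregation framework, which is part of the Query API, is modeled on the concept of data processing pipelines. Documents enter a pipeline that contains one or more stages, and the pipeline transforms the documents into an aggregated result.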
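To make the pipeline idea concrete, here is a minimal pure-Ruby sketch that imitates a two-stage pipeline over an in-memory array. It does not use the driver or MongoDB, and it is not how the server implements aggregation. The names sample_docs, match_stage, group_stage, and run_pipeline are hypothetical and exist only for illustration.

```ruby
# Illustration only: imitates pipeline stages with plain Ruby lambdas.
sample_docs = [
  { 'name' => 'A', 'cuisine' => 'Bakery', 'borough' => 'Bronx' },
  { 'name' => 'B', 'cuisine' => 'Pizza',  'borough' => 'Bronx' },
  { 'name' => 'C', 'cuisine' => 'Bakery', 'borough' => 'Queens' },
  { 'name' => 'D', 'cuisine' => 'Bakery', 'borough' => 'Bronx' }
]

# Stage 1: keep only documents whose cuisine is "Bakery".
match_stage = ->(docs) { docs.select { |doc| doc['cuisine'] == 'Bakery' } }

# Stage 2: group by borough and count the documents in each group.
group_stage = lambda do |docs|
  docs.group_by { |doc| doc['borough'] }
      .map { |borough, group| { '_id' => borough, 'count' => group.size } }
end

# Each stage receives the output of the previous stage.
def run_pipeline(docs, stages)
  stages.reduce(docs) { |current, stage| stage.call(current) }
end

result = run_pipeline(sample_docs, [match_stage, group_stage])
result.each { |doc| puts doc }
```

The order of the stages matters: filtering before grouping means the grouping step only sees the documents that passed the filter.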

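Tip

Complete Aggregation Tutorials

You can find tutorials that explain common aggregation tasks in detail in the Complete Aggregation Pipeline Tutorials section of the Server manual. Select a tutorial, and then select Ruby from the Select your language drop-down menu in the upper-right corner of the page.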

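Analogy

An aggregation operation is similar to a car factory. A car factory has an assembly line made up of assembly stations, each equipped with specialized tools for a specific job, such as drills and welders. Raw parts enter the factory, and the assembly line transforms and assembles them into a finished product.

The aggregation pipeline is the assembly line, aggregation stages are the assembly stations, and operator expressions are the specialized tools.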

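Compare Aggregation and Find Operations

The following table lists the tasks that find operations can perform and compares them with the tasks that aggregation operations can perform. The aggregation framework provides expanded functionality that allows you to transform and manipulate your data.

Find Operations	Aggregation Operations
Select certain documents to return; Select which fields to return; Sort the results; Limit the results; Count the results	Select certain documents to return; Select which fields to return; Sort the results; Limit the results; Count the results; Rename fields; Compute new fields; Summarize data; Connect and merge data sets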

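Limitations

Consider the following limitations when you perform aggregation operations:

Returned documents cannot violate the BSON document size limit of 16 megabytes.
Pipeline stages have a memory limit of 100 megabytes by default. You can exceed this limit by passing a value of true to the allow_disk_use method and chaining that method to aggregate.
The $graphLookup operator has a strict memory limit of 100 megabytes and ignores the value passed to the allow_disk_use method.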
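The following sketch shows one way to apply the allow_disk_use setting described in the list above. It assumes that collection is a Mongo::Collection from the Ruby driver and that pipeline is an array of stage hashes. The helper name aggregate_with_disk_use is hypothetical.

```ruby
# Hypothetical helper: builds an aggregation that may write temporary
# data to disk when a stage exceeds the default memory limit.
# `collection` is assumed to be a Mongo::Collection.
def aggregate_with_disk_use(collection, pipeline)
  # allow_disk_use(true) is chained to aggregate, as described above.
  collection.aggregate(pipeline).allow_disk_use(true)
end

# Example usage (requires a connected client and collection):
# view = aggregate_with_disk_use(restaurants_collection, pipeline)
# view.each { |doc| puts doc }
```

As noted in the limitations, this setting has no effect on the $graphLookup operator.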

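Run Aggregation Operations

Note

Sample Data

The examples in this guide use the restaurants collection in the sample_restaurants database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see the Get Started with Atlas guide.

To perform an aggregation, define each pipeline stage as a Ruby hash, and then pass the pipeline of operations to the aggregate method.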

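Aggregation Example

The following code example counts the number of bakeries in each borough of New York. To do so, it uses an aggregation pipeline with the following stages:

A $match stage to filter for documents in which the cuisine field contains the value "Bakery".
A $group stage to group the matching documents by the borough field, accumulating a count of documents for each distinct value.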

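# Assumes `client` is an existing Mongo::Client connected to your deployment.
# Client#use returns a client scoped to the named database.
database = client.use('sample_restaurants')
restaurants_collection = database[:restaurants]

# Stage 1 keeps bakeries; stage 2 groups them by borough and counts each group.
pipeline = [
  { '$match' => { 'cuisine' => 'Bakery' } },
  { '$group' => {
      '_id' => '$borough',
      'count' => { '$sum' => 1 }
    }
  }
]
aggregation = restaurants_collection.aggregate(pipeline)

# Print each result document.
aggregation.each do |doc|
  puts doc
end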

{"_id"=>"Bronx", "count"=>71}
{"_id"=>"Manhattan", "count"=>221}
{"_id"=>"Queens", "count"=>204}
{"_id"=>"Missing", "count"=>2}
{"_id"=>"Staten Island", "count"=>20}
{"_id"=>"Brooklyn", "count"=>173}
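As a possible extension of the preceding example, you can append a $sort stage so that boroughs are returned from the highest count to the lowest. This sketch is not part of the original example. The variable name sorted_pipeline is hypothetical, and the code only builds the pipeline array without contacting a server.

```ruby
# Same two stages as the preceding example.
pipeline = [
  { '$match' => { 'cuisine' => 'Bakery' } },
  { '$group' => { '_id' => '$borough', 'count' => { '$sum' => 1 } } }
]

# Append a $sort stage: -1 sorts by count in descending order.
sorted_pipeline = pipeline + [{ '$sort' => { 'count' => -1 } }]

# With a connected collection, pass the new pipeline to aggregate:
# restaurants_collection.aggregate(sorted_pipeline).each { |doc| puts doc }
puts sorted_pipeline.length
```

Because pipeline + [...] creates a new array, the original two-stage pipeline stays unchanged and can still be reused.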

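Explain an Aggregation

To view information about how MongoDB executes your operation, you can instruct the MongoDB query planner to explain it. When MongoDB explains an operation, it returns execution plans and performance statistics. An execution plan is a potential way in which MongoDB can complete an operation. When you instruct MongoDB to explain an operation, by default it returns both the plan that MongoDB executed and any rejected execution plans.

To explain an aggregation operation, chain the explain method to the aggregate method.

The following example instructs MongoDB to explain the aggregation operation from the preceding aggregation example: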

explanation = restaurants_collection.aggregate(pipeline).explain()
puts explanation

{"explainVersion"=>"2", "queryPlanner"=>{"namespace"=>"sample_restaurants.restaurants",
"parsedQuery"=>{"cuisine"=> {"$eq"=> "Bakery"}}, "indexFilterSet"=>false,
"planCacheKey"=>"6104204B", "optimizedPipeline"=>true, "maxIndexedOrSolutionsReached"=>false,
"maxIndexedAndSolutionsReached"=>false, "maxScansToExplodeReached"=>false,
"prunedSimilarIndexes"=>false, "winningPlan"=>{"isCached"=>false,
"queryPlan"=>{"stage"=>"GROUP", "planNodeId"=>3,
"inputStage"=>{"stage"=>"COLLSCAN", "planNodeId"=>1, "filter"=>{},
"direction"=>"forward"}},...}

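Run an Atlas Full-Text Search

Note

Only Available for Collections with an Atlas Search Index

This aggregation pipeline operator is only available for collections with an Atlas Search index.

To specify a full-text search of one or more fields, you can create a $search pipeline stage.

This example creates pipeline stages to perform the following actions:

Search the name field for the term "Salt"
Project only the _id and name values of matching documents

Important

To run the following example, you must create an Atlas Search index on the restaurants collection that covers the name field. Then, replace the "<your_search_index_name>" placeholder with the name of the index. To learn how to create an Atlas Search index, see the Atlas Search Indexes guide.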

search_pipeline = [
  {
    '$search' => {
      'index' => '<your_search_index_name>',
      'text' => {
        'query' => 'Salt',
        'path' => 'name'
      },
    }
  },
  {
    '$project' => {
      '_id' => 1,
      'name' => 1
    }
  }
]
    
results = restaurants_collection.aggregate(search_pipeline)
results.each do |document|
  puts document
end

{"_id"=>{"$oid"=>"..."}, "name"=>"Fresh Salt"}
{"_id"=>{"$oid"=>"..."}, "name"=>"Salt & Pepper"}
{"_id"=>{"$oid"=>"..."}, "name"=>"Salt + Charcoal"}
{"_id"=>{"$oid"=>"..."}, "name"=>"A Salt & Battery"}
{"_id"=>{"$oid"=>"..."}, "name"=>"Salt And Fat"}
{"_id"=>{"$oid"=>"..."}, "name"=>"Salt And Pepper Diner"}
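If you run the same kind of search with different terms, a small helper can build the pipeline. The following is a sketch only. build_search_pipeline and its parameters are hypothetical names, and the index name passed in is a placeholder that you must replace with the name of a real Atlas Search index.

```ruby
# Hypothetical helper: builds a $search stage followed by a $project stage
# for a text search on a single field.
def build_search_pipeline(index_name, query, path)
  [
    {
      '$search' => {
        'index' => index_name,
        'text' => { 'query' => query, 'path' => path }
      }
    },
    # Return only _id and the searched field.
    { '$project' => { '_id' => 1, path => 1 } }
  ]
end

search_pipeline = build_search_pipeline('<your_search_index_name>', 'Salt', 'name')
puts search_pipeline.length

# With a connected collection that has the search index:
# restaurants_collection.aggregate(search_pipeline).each { |doc| puts doc }
```

The helper only assembles hashes, so you can inspect the resulting pipeline before you send it to the server.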