How to Handle Advanced Data Processing with MongoDB's Aggregation Framework

2023-09-06

MongoDB has come a long way. Even though there are many NoSQL databases out there, MongoDB is the first database that comes to mind when talking about NoSQL databases.

Although there always has been a bit of heat between people who like SQL and people who prefer NoSQL, the truth is, databases like MongoDB solve a different problem.

And they can be really handy when handling unstructured data, where manipulating the shape of data quickly and efficiently (and turning it into relevant knowledge) is more helpful than the rock-solid performance provided by old-school SQL databases.

MongoDB comes with a powerful framework for doing this, that is, manipulating data right on the server: the Aggregation Framework. Let's get into it and cover what it is and why it is important.

What is the Aggregation Framework?

The Aggregation framework is just a way to query documents in a collection in MongoDB. This framework exists because when you start working with and manipulating data, you often need to crunch collections together, modify them, pluck out fields, rename fields, concatenate them, group documents by field, explode an array field into separate documents, and so on.

This cannot be done by the traditional querying system which MongoDB comes with (that is, the find query or update query, or any other query you might have used).

The simple queries in MongoDB only allow you to retrieve whole documents or parts of them. They don't really allow you to manipulate the documents on the server before returning them to your application.

This is where the aggregation framework from MongoDB comes in. It's nothing external, as aggregation comes baked into MongoDB. You can learn to work with the MongoDB aggregation framework using this free YouTube playlist I made.

Pipeline

The Aggregation framework relies on the pipeline concept. Let's see an image which explains it in a better way:

Here, as you can see, we pick up a collection and pass it through a pipeline. This pipeline consists of certain stages where certain operators modify the documents in the collection using various techniques. Finally, the output is returned to the application calling the query.

Compare it with a simple query, like find. Sure, it works in most ways, but it is not really useful when you want to modify the data as well while retrieving it.

Either you'll need to fetch the documents and modify them accordingly in the application on the server, or worse, you'll send them to the client and let the frontend code modify them for you.

In both cases, you're wasting resources and bandwidth. The aggregation framework neatly addresses this problem by doing the work inside the database. Let's see how it does that with the operators.

Pipeline operators

In MongoDB, the pipeline is an array consisting of various operators, which take in a bunch of documents and spit out modified documents according to the rules specified by the programmer. The next operator takes in the documents spat out by the previous operator, hence, it's called a pipeline.

You can have many operators in a pipeline, and these operators can be repeated as well, unlike regular MongoDB queries.
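
As a minimal sketch of that array shape, here is what a pipeline with a repeated operator might look like in JavaScript. The `orders` collection and its `status`, `customer`, and `amount` fields are hypothetical, and the `db.orders.aggregate(pipeline)` call is left as a comment because it needs a running MongoDB server.

```javascript
// Hypothetical pipeline for an "orders" collection (all names are assumptions).
// Each array element is one operator; documents flow from top to bottom.
const pipeline = [
  // Keep paid orders only.
  { $match: { status: "paid" } },
  // Produce one document per customer with the summed amount.
  { $group: { _id: "$customer", total: { $sum: "$amount" } } },
  // $match again, this time on the output of $group.
  { $match: { total: { $gte: 100 } } },
];

// In mongosh or a driver this would be run as:
// db.orders.aggregate(pipeline)

// List the operator names in the order the documents pass through them.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```

The second `$match` filters on `total`, a field that only exists after `$group` has run, which is something a single `find` query cannot express.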

Let's take a look at some common pipeline operators in MongoDB.

$group

This operator allows you to group a bunch of documents together on the basis of a certain field in the documents. It can also combine field values from the grouped documents, for example by summing them or collecting them into an array.

I'm a big believer in the saying that a picture is worth a thousand words. A video is worth a thousand pictures (well, technically a lot more pictures, but okay), so let's see a quick video on that:
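
As a rough text counterpart, the sketch below shows a `$group` operator and a plain-JavaScript emulation of what it would compute. The `users` documents and the `city` and `age` fields are made up for illustration, and the emulation only mimics the operator's behavior for this one case; it is not how MongoDB implements it.

```javascript
// Hypothetical sample documents (assumed data, for illustration only).
const users = [
  { name: "Ana", city: "Delhi", age: 30 },
  { name: "Ben", city: "Paris", age: 20 },
  { name: "Cho", city: "Delhi", age: 40 },
];

// The operator as it would appear in a pipeline:
// db.users.aggregate([groupStage])
const groupStage = {
  $group: { _id: "$city", count: { $sum: 1 }, totalAge: { $sum: "$age" } },
};

// Plain-JavaScript emulation of what that operator computes.
function groupByCity(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Start a new group the first time a city is seen.
    const group = groups.get(doc.city) || { _id: doc.city, count: 0, totalAge: 0 };
    group.count += 1;          // { $sum: 1 } counts documents
    group.totalAge += doc.age; // { $sum: "$age" } adds up a field
    groups.set(doc.city, group);
  }
  return [...groups.values()];
}

const grouped = groupByCity(users);
console.log(grouped);
```

Note that MongoDB does not guarantee the order of the documents coming out of `$group`, so add a `$sort` after it if order matters.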

$match

The match pipeline operator works very similarly to how the regular find operator works. The good thing about this, however, is that it can be used multiple times because you're in a pipeline environment! This makes it powerful.

Let's see how it is used on a collection:
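
Here is a minimal sketch, assuming a hypothetical `products` collection with `name`, `price`, and `inStock` fields. The `$match` operator uses the same filter syntax as `find`; the `filter` call below it is only a plain-JavaScript stand-in to show which documents would pass.

```javascript
// Hypothetical sample documents (assumed data, for illustration only).
const products = [
  { name: "pen", price: 2, inStock: true },
  { name: "desk", price: 150, inStock: true },
  { name: "lamp", price: 40, inStock: false },
];

// The operator as it would appear in a pipeline:
// db.products.aggregate([matchStage])
const matchStage = { $match: { inStock: true, price: { $lt: 100 } } };

// Plain-JavaScript stand-in for the same condition:
// in stock AND price below 100.
const matched = products.filter((doc) => doc.inStock === true && doc.price < 100);
console.log(matched.map((doc) => doc.name));
```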

$limit

The $limit pipeline operator passes only the first N documents to the next operator and discards the rest. Let's see a quick example:
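
A minimal sketch of `$limit`, assuming a hypothetical `scores` collection. The `slice` call is a plain-JavaScript stand-in for what the operator lets through.

```javascript
// Hypothetical sample documents (assumed data, for illustration only).
const scores = [
  { player: "a", score: 90 },
  { player: "b", score: 85 },
  { player: "c", score: 70 },
  { player: "d", score: 60 },
];

// The operator as it would appear in a pipeline:
// db.scores.aggregate([limitStage])
const limitStage = { $limit: 2 };

// Plain-JavaScript stand-in: keep only the first N documents.
const limited = scores.slice(0, limitStage.$limit);
console.log(limited.map((doc) => doc.player));
```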

$skip

The $skip pipeline operator skips the first N documents and passes the rest of the documents to the next operator. Let's see a quick example:
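
A minimal sketch of `$skip`, again on a hypothetical `scores` collection, with `slice` as a plain-JavaScript stand-in. The second half pairs it with `$limit`, a common way to page through results.

```javascript
// Hypothetical sample documents (assumed data, for illustration only).
const scores = [
  { player: "a", score: 90 },
  { player: "b", score: 85 },
  { player: "c", score: 70 },
  { player: "d", score: 60 },
  { player: "e", score: 50 },
];

// The operator as it would appear in a pipeline:
// db.scores.aggregate([skipStage])
const skipStage = { $skip: 2 };

// Plain-JavaScript stand-in: drop the first N documents, keep the rest.
const skipped = scores.slice(skipStage.$skip);
console.log(skipped.map((doc) => doc.player));

// Paging sketch: skip two documents, then take the next two.
// db.scores.aggregate([{ $skip: 2 }, { $limit: 2 }])
const page = scores.slice(2).slice(0, 2);
console.log(page.map((doc) => doc.player));
```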

$unwind

This operator is personally my favorite. $unwind takes an array field with N elements and explodes the document into N documents, with the i-th document holding the i-th element of the array as the value of that field.

Combined with other operators like $group and $match, this becomes very powerful for data processing. Sounds confusing? Let's look at a simple example:
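
A minimal sketch, assuming a hypothetical `posts` collection where each document has a `tags` array. The function below is a plain-JavaScript emulation of the operator's default behavior for documents whose `tags` field is an array; it is not MongoDB's implementation.

```javascript
// Hypothetical sample documents (assumed data, for illustration only).
const posts = [
  { _id: 1, title: "Intro", tags: ["mongodb", "nosql", "db"] },
  { _id: 2, title: "Empty", tags: [] },
];

// The operator as it would appear in a pipeline:
// db.posts.aggregate([unwindStage])
const unwindStage = { $unwind: "$tags" };

// Plain-JavaScript emulation: one output document per array element,
// with the array field replaced by that single element.
// A document with an empty array produces no output, which matches
// the operator's default behavior.
function unwindTags(docs) {
  return docs.flatMap((doc) => doc.tags.map((tag) => ({ ...doc, tags: tag })));
}

const unwound = unwindTags(posts);
console.log(unwound);
```

The first post turns into three documents that share `_id` and `title` and differ only in `tags`, which is what makes a following `$group` on `tags` possible.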

$project

The project operator allows you to pluck out a bunch of fields from every document and discard the rest. Not only that, but you can also rename the plucked fields, concat strings, take out substrings and much more!

Let's see how this works in a nutshell:
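
A minimal sketch, assuming a hypothetical `people` collection. The operator below renames one field, builds one with `$concat`, takes a substring with `$substrCP`, and drops everything else; the `map` call is a plain-JavaScript stand-in for the resulting shape.

```javascript
// Hypothetical sample documents (assumed data, for illustration only).
const people = [
  { _id: 1, first: "Ada", last: "Lovelace", email: "ada@example.com", password: "secret" },
];

// The operator as it would appear in a pipeline:
// db.people.aggregate([projectStage])
const projectStage = {
  $project: {
    _id: 0,                                          // drop _id
    contact: "$email",                               // rename email -> contact
    fullName: { $concat: ["$first", " ", "$last"] }, // concatenate strings
    initial: { $substrCP: ["$first", 0, 1] },        // first character of "first"
  },
};

// Plain-JavaScript stand-in for the documents that come out.
const projected = people.map((doc) => ({
  contact: doc.email,
  fullName: doc.first + " " + doc.last,
  initial: doc.first.substring(0, 1),
}));
console.log(projected);
```

Fields that are not listed, such as `password` here, never leave the database.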

Best Practices for Using the Aggregation Framework

With great power comes great responsibility. You can easily exploit the aggregation framework for doing simple queries too, so it's important to make sure that you're not writing poor database queries.

To begin with, keep the following points in mind:

MongoDB limits each operator to 100MB of RAM and throws an error when one exceeds it (unless the operator is allowed to write temporary data to disk). So trim down your data as early in the pipeline as possible, since a single operator should not take up more than 100MB of memory.
Order matters! Putting $match first will reduce the number of documents passed to the rest of the pipeline. Putting $project next will then further reduce the size of an individual document by getting rid of fields.
Finally, make sure you do all the work which requires the use of indexed fields (sorting, matching, etc.) before you use operators like $project or $unwind. This is because these operators create new documents that do not have the indexes from the original document.
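
The points above can be sketched as a single pipeline. This is only an illustration under assumptions: the `orders` collection, its fields, and the presence of indexes on `status` and `createdAt` are all hypothetical.

```javascript
// Hypothetical "orders" collection; collection and field names are assumptions.
const pipeline = [
  // 1. Filter first, so fewer documents reach the later operators.
  { $match: { status: "paid" } },
  // 2. Sort while the documents are still in their original, indexable form.
  { $sort: { createdAt: -1 } },
  // 3. Drop the fields that are not needed to shrink each document.
  { $project: { _id: 0, customer: 1, items: 1 } },
  // 4. Reshape last, since the resulting documents are new ones.
  { $unwind: "$items" },
];

// In mongosh or a driver this would be run as:
// db.orders.aggregate(pipeline)
// If an operator still needs more than 100MB, disk use can be allowed:
// db.orders.aggregate(pipeline, { allowDiskUse: true })

// Small self-check: index-friendly operators come before reshaping ones.
const order = pipeline.map((stage) => Object.keys(stage)[0]);
const lastIndexFriendly = Math.max(order.indexOf("$match"), order.indexOf("$sort"));
const firstReshaping = Math.min(order.indexOf("$project"), order.indexOf("$unwind"));
console.log(lastIndexFriendly < firstReshaping);
```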

Conclusion

MongoDB is a great database tool and can be really helpful for small startups and businesses that want to iterate quickly. This is in part due to its loose restrictions and forgiving nature.

I'm using MongoDB myself at codedamn - a platform for developers like you where everyone learns and connects!

Peace!
