English | 简体中文 | 繁體中文 | Русский язык | Français | Español | Português | Deutsch | 日本語 | 한국어 | Italiano | بالعربية

MongoDB Aggregation

Aggregate operations process data records and return calculated results. Aggregate operations group values from multiple documents together and can perform various operations on grouped data to return a single result. In SQL, count(*)and group by are equivalent to MongoDB aggregation.

Aggregate() method

For MongoDB aggregation, you should use aggregate() Method.

Syntax

aggregate()The basic syntax of the method is as follows-

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Online examples

In a collection, you have the following data-

{
   _id: 'ObjectId('7df78ad8902c)
   title: 'MongoDB Overview', 
   description: 'MongoDB is not a SQL database',
   by_user: 'oldtoolbag.com',
   url: 'https://www.oldtoolbag.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: 'ObjectId('7df78ad8902d)
   title: 'NoSQL Overview', 
   description: 'No SQL database is very fast',
   by_user: 'oldtoolbag.com',
   url: 'https://www.oldtoolbag.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: 'ObjectId('7df78ad8902e)
   title: 'Neo4j Overview', 
   description: 'Neo4j is No SQL database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
},

Now, if you want to display a list from the above collection to show how many tutorials each user has written, you will use the followingaggregate()Method-

> db.mycol.aggregate([{$group : {"_id" : "$by_user", "num_tutorial" : {$sum : 1}}}])
{"_id": "oldtoolbag.com", "num_tutorial": 2 }
{"_id": "Neo4j", "num_tutorial": 1 }
>

The SQL equivalent query for the above use case will be select by_user, count(*) from mycol group by by_user.

In the above example, we have sorted by the fieldby_userThe documents are grouped, and the previous sum value will increase each time the documents are grouped by the user. Here is a list of available aggregation expressions.

ExpressionDescriptionExample
$sumCalculate the total.db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avgCalculate the average valuedb.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$minGet the minimum value of the corresponding value of all documents in the collection.db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$maxGet the maximum value of the corresponding value of all documents in the collection.db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$pushInsert a value into an array in the result document.db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSetInsert a value into an array in the result document without creating a duplicate.db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$firstGet the first document data according to the sorting of the resource document.db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$lastGet the last document data according to the sorting of the resource documentdb.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])

The concept of a pipeline

Pipes are generally used in Unix and Linux to pass the output of the current command as an argument to the next command.

MongoDB's aggregation pipeline passes the result of processing a MongoDB document to the next pipeline for processing. Pipeline operations can be repeated.

Expression: Processes input documents and outputs. Expressions are stateless and can only be used to compute the documents in the current aggregation pipeline, not others.

Here, we introduce several commonly used operations in the aggregation framework:

  • $project: Modify the structure of input documents. It can be used to rename, add, or delete fields, as well as to create computed results and nested documents.

  • $match: Used for filtering data, only outputting documents that meet the conditions. $match uses MongoDB's standard query operations.

  • $limit: Used to limit the number of documents returned by MongoDB's aggregation pipeline.

  • $skip: Skip a specified number of documents in the aggregation pipeline and return the remaining documents.

  • $unwind: Split a certain array type field in a document into multiple entries, each containing a value from the array.

  • $group: Group documents in a collection, which can be used for statistical results.

  • $sort: Sort the input documents and then output them.

  • $geoNear: Output documents that are close to a certain geographical location in order.

an instance of pipeline operator

1, an instance of $project

db.article.aggregate(
    { $project : {
        title : 1 ,
        author : 1 ,
    }}
 );

In this way, the result will only contain the _id, title, and author fields. By default, the _id field is included. If you want to exclude _id, you can do it like this:

db.article.aggregate(
    { $project : {
        _id : 0 ,
        title : 1 ,
        author : 1
    }});

2.$match instance

db.articles.aggregate( [
                        { $match : { score : { $gt : 70, $lte : 90 } } },
                        { $group: { _id: null, count: { $sum: 1 } } }
                       ] );

$match is used to get the score greater than70 is less than or equal to90 records, then the qualified records are sent to the next stage $group pipeline operator for processing.

3.$skip instance

db.article.aggregate(
    { $skip : 5 });

The first five documents are "filtered out" after the $skip pipeline operator is processed.