Simple find() queries are great for retrieving documents, but what if you need to perform more complex analysis? For example, what if you want to calculate the average price of all products in a specific category, or find the top 5 most active users?

This is where the Aggregation Framework comes in. It's a pipeline for data processing, where documents from a collection pass through a series of stages. Each stage transforms the documents in some way (e.g., filtering, grouping, reshaping) and passes the result to the next stage.

The Structure of a Pipeline

An aggregation pipeline is an array of stage documents. The db.collection.aggregate([...]) method takes this array as its argument.

JavaScript


db.collection.aggregate([
    { <stage_1> },
    { <stage_2> },
    { <stage_3> },
    // ...
])
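
To make the data flow concrete, here is a rough plain-JavaScript model of a pipeline. It is only an illustration of the idea and not how MongoDB implements aggregation: the `runPipeline` helper and the three stage functions are hypothetical names invented for this sketch, and the tiny `docs` array stands in for a collection.

```javascript
// Hypothetical helper: a "stage" is modelled as a function that takes an
// array of documents and returns a new array of documents.
function runPipeline(docs, stages) {
    // Feed the output of each stage into the next one, in order.
    return stages.reduce((current, stage) => stage(current), docs);
}

// Stand-in for a collection.
const docs = [
    { item: "laptop", price: 1200 },
    { item: "mouse", price: 25 },
    { item: "keyboard", price: 100 }
];

const stages = [
    (input) => input.filter((d) => d.price >= 100),          // filter, like $match
    (input) => [...input].sort((a, b) => a.price - b.price), // order, like $sort
    (input) => input.map((d) => ({ item: d.item }))          // reshape, like $project
];

const result = runPipeline(docs, stages);
console.log(result);
```

The point to notice is that each stage only ever sees the output of the stage before it, which is why the order of stages matters.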

Let's look at some of the most common and powerful stages. We'll use a sample orders collection for our examples:

JSON


// Sample documents in 'orders' collection
{ "customer_id": "alice", "item": "laptop", "price": 1200, "quantity": 1 }
{ "customer_id": "bob", "item": "mouse", "price": 25, "quantity": 2 }
{ "customer_id": "alice", "item": "keyboard", "price": 100, "quantity": 1 }
{ "customer_id": "charlie", "item": "mouse", "price": 25, "quantity": 1 }
{ "customer_id": "alice", "item": "mouse", "price": 25, "quantity": 1 }

Common Aggregation Stages

$match - The Filter

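This is usually the first stage in a pipeline: filtering early reduces the number of documents that later stages have to process, and a $match at the start of a pipeline can use indexes. It passes only the documents that match the specified conditions to the next stage, using the same query syntax as the find() method.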

Goal: Find all orders placed by "alice".

JavaScript


db.orders.aggregate([
    { $match: { customer_id: "alice" } }
])
// Output: All 3 documents belonging to alice
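
Because $match accepts find()-style queries, query operators such as $gte can be used inside it too. The sketch below assumes the sample orders collection shown earlier; the variable name `pipeline` is just for illustration, and the `typeof db` guard is there so the snippet can also be loaded outside the MongoDB shell without failing.

```javascript
// Filter on two conditions at once: customer is "alice" AND price >= 100.
const pipeline = [
    { $match: { customer_id: "alice", price: { $gte: 100 } } }
];

// `db` only exists inside the MongoDB shell, so guard the call. Outside the
// shell this script just builds the pipeline array.
if (typeof db !== "undefined") {
    console.log(db.orders.aggregate(pipeline).toArray());
}
// With the sample data, this matches alice's laptop and keyboard orders.
```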

$group - The Summarizer

This is often the workhorse of a pipeline. It groups input documents by a specified identifier (_id) and applies accumulator expressions to each group, emitting one output document per group.

Goal: Calculate the total amount spent by each customer.

JavaScript


db.orders.aggregate([
    {
        $group: {
            _id: "$customer_id", // Group by the customer_id field
            totalSpent: { $sum: { $multiply: ["$price", "$quantity"] } } // Accumulator
        }
    }
])
  • _id: "$customer_id": Tells MongoDB to create one group for each unique customer_id. The $ prefix indicates we're using the value from a field.
  • $sum: An accumulator that sums up the values. Here, we first $multiply the price and quantity for each document before summing.

Output (without a $sort stage, the order of the groups is not guaranteed):

JSON


[
  { "_id": "charlie", "totalSpent": 25 },
  { "_id": "bob", "totalSpent": 50 },
  { "_id": "alice", "totalSpent": 1325 }
]
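
These totals can be checked by hand. The following plain-JavaScript sketch mirrors the grouping logic on the five sample documents; it is only meant to show the arithmetic and is not how MongoDB executes $group. The `orders` array and `totals` object are names chosen for this illustration.

```javascript
// The five sample documents from the orders collection.
const orders = [
    { customer_id: "alice", item: "laptop", price: 1200, quantity: 1 },
    { customer_id: "bob", item: "mouse", price: 25, quantity: 2 },
    { customer_id: "alice", item: "keyboard", price: 100, quantity: 1 },
    { customer_id: "charlie", item: "mouse", price: 25, quantity: 1 },
    { customer_id: "alice", item: "mouse", price: 25, quantity: 1 }
];

// One running total per customer_id, like one group per _id.
const totals = {};
for (const order of orders) {
    const lineTotal = order.price * order.quantity; // the $multiply step
    totals[order.customer_id] = (totals[order.customer_id] || 0) + lineTotal; // the $sum step
}

console.log(totals);
// alice:   1200*1 + 100*1 + 25*1 = 1325
// bob:     25*2 = 50
// charlie: 25*1 = 25
```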

$sort - The Orderer

Sorts the documents passed to it. Use 1 for ascending order and -1 for descending.

Goal: List customers by total amount spent, highest first.

JavaScript


db.orders.aggregate([
    {
        $group: {
            _id: "$customer_id",
            totalSpent: { $sum: { $multiply: ["$price", "$quantity"] } }
        }
    },
    { $sort: { totalSpent: -1 } } // Sort by our newly calculated field
])
// Output: alice, then bob, then charlie

$project - The Reshaper

Passes along documents with a new shape. You can add new fields, remove existing fields, or rename them.

Goal: Reshape the output from the previous stage to have a customer field instead of _id.

JavaScript


db.orders.aggregate([
    {
        $group: {
            _id: "$customer_id",
            totalSpent: { $sum: { $multiply: ["$price", "$quantity"] } }
        }
    },
    { $sort: { totalSpent: -1 } },
    {
        $project: {
            _id: 0, // Exclude the default _id field
            customer: "$_id", // Rename _id to customer
            totalSpent: 1 // Include the totalSpent field
        }
    }
])

Final Output:

JSON


[
  { "totalSpent": 1325, "customer": "alice" },
  { "totalSpent": 50, "customer": "bob" },
  { "totalSpent": 25, "customer": "charlie" }
]
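
$project can also add computed fields, not just include, exclude, or rename existing ones. As a small sketch against the same sample orders collection, the pipeline below keeps item and adds a lineTotal field; the names `lineTotalsPipeline` and `lineTotal` are made up for this example, and the `typeof db` guard only exists so the snippet can be loaded outside the MongoDB shell.

```javascript
// Keep `item` and add a computed `lineTotal` field to every order.
const lineTotalsPipeline = [
    {
        $project: {
            _id: 0,                                           // drop the default _id
            item: 1,                                          // keep the item name
            lineTotal: { $multiply: ["$price", "$quantity"] } // new computed field
        }
    }
];

// `db` only exists inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    console.log(db.orders.aggregate(lineTotalsPipeline).toArray());
}
// For example, bob's order of 2 mice becomes { item: "mouse", lineTotal: 50 }.
```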

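Other useful stages include $limit (keep only the first N documents), $skip (skip the first N documents, e.g. for pagination), $unwind (emit one document per element of an array field), and $lookup (perform a left outer join with another collection). By combining these stages, you can build data analysis queries that run inside the database instead of pulling every document into application code.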
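
As one example of combining stages, a "top N" query is typically a $group, a $sort, and a $limit in that order. The sketch below finds the two best-selling items in the sample orders collection; `topItemsPipeline` and `totalQuantity` are illustrative names, the secondary sort on _id is added here only to make ties deterministic, and the `typeof db` guard lets the snippet load outside the MongoDB shell.

```javascript
// Top 2 items by total quantity sold.
const topItemsPipeline = [
    // One group per item, summing the quantity across its orders.
    { $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } },
    // Highest quantity first; break ties alphabetically by item name.
    { $sort: { totalQuantity: -1, _id: 1 } },
    // Keep only the first two groups.
    { $limit: 2 }
];

// `db` only exists inside the MongoDB shell, so guard the call.
if (typeof db !== "undefined") {
    console.log(db.orders.aggregate(topItemsPipeline).toArray());
}
// With the sample data: mouse (2 + 1 + 1 = 4), then keyboard (1).
```

Changing the $limit value to 5 gives a "top 5" query of the kind mentioned at the start of this section.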