mongodb集群搭建詳細，MongoDB聚合(二)-技術動態-匯編語言學習筆記

2019獨角獸企業重金招聘Python工程師標準>>>
MongoDB聚合(二)2.3?$group
$group操作可以將文檔根據給定字段的不同值進行分組。
如果選定了需要分組的字段，可以將選定的字段傳遞給"$group"函數的"_id"字段。
{?"$group"?:?{"_id"?:?"$day"}?}
{?"$group"?:?{"_id"?:?"$grade"}?}
{?"$group"?:?{"_id"?:?{"state"?:?"$state",?"city"?:?"$city"}}?}
2.3.1?分組操作符
允許對每個分組進行計算，得到相應的結果。
2.3.2?算術操作符
兩個操作符用于對數值類型的字段的值進行計算："$sum"和"$average"。
"$sum"，對于分組中的每個文檔，將value與計算結果相加。
例如：
插入測試數據：
countrys?=?["china",?"japan",?"germany",?"canada",?"india"]
for?(var?i?=?1;?i?<?100;?i?++)?{name?=?countrys[Math.floor(Math.random()?*?countrys.length)]money?=??Math.floor(1000?*?Math.random())db.foo.insert({"country"?:?name,?"_id"?:?i,?"money"?:?money})
}
分組：
db.foo.aggregate(
{"$group"?:?{"_id"?:?"$country","total"?:?{"$sum"?:?"$money"}}
})
返回結果：
{?"_id"?:?"canada",?"total"?:?11316?}
{?"_id"?:?"china",?"total"?:?7225?}
{?"_id"?:?"india",?"total"?:?11190?}
{?"_id"?:?"germany",?"total"?:?10288?}
{?"_id"?:?"japan",?"total"?:?9330?}
$avg，返回每個分組的平均值。
例如：
聚合：
db.foo.aggregate(
{"$group"?:?{"_id"?:?"$country","average"?:?{"$avg"?:?"$money"}}
})
返回結果：
{?"_id"?:?"canada",?"average"?:?595.578947368421?}
{?"_id"?:?"china",?"average"?:?481.6666666666667?}
{?"_id"?:?"india",?"average"?:?559.5?}
{?"_id"?:?"germany",?"average"?:?411.52?}
{?"_id"?:?"japan",?"average"?:?466.5?}
2.3.3?極值操作符
極值操作符獲取數據集合的邊緣值。
"$max"?:?expr
返回最大值。
"$min"?:?expr
返回最小值。
"$first"?:?expr
返回分組第一個值。
"$last"?:?expr
返回分組最后一個值。
"$max"和"$min"會查看文檔沒一個值。
例如：
db.foo.aggregate(
{"$group"?:?{"_id"?:?"$country","lowest"?:?{"$min"?:?"$money"},"highest"?:?{"$max"?:?"$money"},}
})
返回結果：
{?"_id"?:?"canada",?"lowest"?:?130,?"highest"?:?950?}
{?"_id"?:?"china",?"lowest"?:?15,?"highest"?:?949?}
{?"_id"?:?"india",?"lowest"?:?98,?"highest"?:?994?}
{?"_id"?:?"germany",?"lowest"?:?36,?"highest"?:?960?}
{?"_id"?:?"japan",?"lowest"?:?28,?"highest"?:?850?}
如果結果集是有序的，則使用"$first"和"$last"操作。
例如：
插入測試數據：
for?(var?i?=?1;?i?<?100;?i++)?{name?=?"caoqing",age?=?idb.foo1.insert({"name"?:?name,?"age"?:?age})
}
聚合：
db.foo1.aggregate(
{"$group"?:?{"_id"?:?"$name","lowest"?:?{"$first"?:?"$age"},"highest"?:?{"$last"?:?"$age"},????}
}
)
查詢結果：
{?"_id"?:?"caoqing",?"lowest"?:?1,?"highest"?:?99?}
2.3.4?數組操作符
對數組進行操作。
"$addToSet"?:?expr
如果當前數組不包含expr，則添加。返回結果集中，每個元素最多出現一次。
"$push"?:?expr
添加expr到數組，返回包含所有值的數組。
2.3.5?分組行為
大多數操作符都是流式工作的，只要有新文檔進入，就可以處理，但是"$group"必須等到收入所有文檔后，才能分組，然后將分組發送給管道中的下一操作符。
在分片情況下，"$group"先在每個分片上執行，然后把分組結果發送到mongos再進行統一分組，剩余的管道操作也都是在mongos上運行的。2.4?$unwind
拆分可以將數組的每一個值拆分為單獨的文檔。
例如：
db.blog.insert({"auther"?:?"caoqing","comments"?:?[{"CEO"?:?"Bill?Gates","corp"?:?"MicroSoft"},{"CEO"?:?"Mark?Zuckerberg","corp"?:?"Facebook"},{"CEO"?:?"Dick?Costolo","corp"?:?"Twitter"}]
})
拆分：
db.blog.aggregate(
{"$unwind"?:?"$comments"
})
結果：
{?"_id"?:?ObjectId("5360c4f909e603f892b1457c"),?"auther"?:?"caoqing",?"comments"?:?{?"CEO"?:?"Bill?Gates",?"corp"?:?"MicroSoft"?}?}
{?"_id"?:?ObjectId("5360c4f909e603f892b1457c"),?"auther"?:?"caoqing",?"comments"?:?{?"CEO"?:?"Mark?Zuckerberg",?"corp"?:?"Facebook"?}?}
{?"_id"?:?ObjectId("5360c4f909e603f892b1457c"),?"auther"?:?"caoqing",?"comments"?:?{?"CEO"?:?"Dick?Costolo",?"corp"?:?"Twitter"?}?}
如果希望得到特定的子文檔，可以先拆分，再"$match"得到結果。
db.blog.aggregate(
{"$project"?:?{"comments"?:?"$comments"}},{"$unwind"?:?"$comments"},{"$match"?:?{"comments.CEO"?:?"Mark?Zuckerberg"}}
)
結果：
{?"_id"?:?ObjectId("5360c4f909e603f892b1457c"),?"comments"?:?{?"CEO"?:?"Mark?Zuckerberg",?"corp"?:?"Facebook"?}?}2.5?$sort
可以根據任意字段或多個字段進行排序。如果需要大量排序，最好在管道的第一階段進行排序，這時可以使用索引。否則排序會很慢，占用大量內存。
例如：
db.foo.aggregate(
{"$group"?:?{"_id"?:?"$country","total"?:?{"$sum"?:?"$money"}}},{"$sort"?:?{"total"?:?-1,?"country"?:?1}}
)
結果：
{?"_id"?:?"canada",?"total"?:?11316?}
{?"_id"?:?"india",?"total"?:?11190?}
{?"_id"?:?"germany",?"total"?:?10288?}
{?"_id"?:?"japan",?"total"?:?9330?}
{?"_id"?:?"china",?"total"?:?7225?}2.6?$limit
$limit接受一個參數n，返回結果集中前n個文檔。
例如：
db.foo.aggregate(
{"$group"?:?{"_id"?:?"$country","total"?:?{"$sum"?:?"$money"}}},{"$sort"?:?{"total"?:?-1,?"country"?:?1}},{"$limit"?:?3}
)
結果：
{?"_id"?:?"canada",?"total"?:?11316?}
{?"_id"?:?"india",?"total"?:?11190?}
{?"_id"?:?"germany",?"total"?:?10288?}2.7?$skip
$skip接受一個參數n,丟棄結果集中前n個文檔，將剩余文檔返回。如果需要丟棄大量數據，效率會很低。
例如：
db.foo.aggregate(
{"$group"?:?{"_id"?:?"$country","total"?:?{"$sum"?:?"$money"}}},{"$sort"?:?{"total"?:?-1,?"country"?:?1}},{"$skip"?:?2}
)
結果：
{?"_id"?:?"germany",?"total"?:?10288?}
{?"_id"?:?"japan",?"total"?:?9330?}
{?"_id"?:?"china",?"total"?:?7225?}2.8?使用管道
應該在管道開始階段(執行"$project",?"$group",?"$unwind"操作前)將盡可能多的文檔和字段過濾掉，管道如果不是直接從原來的集合中使用數據，就無法在篩選和排序過程中使用索引。
Mongodb不允許單一的聚合操作占用過多的內存，如果某個操作占用20%以上內存，操作會直接輸出錯誤。
可以將結果集放入一個集合方便以后調用。