読者です 読者をやめる 読者になる 読者になる

ログってなんぼ

日々のメモです

MongoDBに入れたデータをgroupしてsortしてlimitしてみるメモ

MongoDB

f:id:Okisanjp:20160714124359p:plain

Webサービスでの検索ワードをMongoDBに入れているのでそれをサンプルに。

とりあえずfind

こんなかんじで入ってる

> db.search.find()
{ "_id" : ObjectId("51c29e216ea305de1a000001"), "keywords" : "DMP", "update" : "2013-06-20 15:16:01" }
{ "_id" : ObjectId("51c29e9b6ea305701c000002"), "keywords" : "広告", "update" : "2013-06-20 15:18:03" }
{ "_id" : ObjectId("51c29eac6ea305701c000003"), "keywords" : "広告", "update" : "2013-06-20 15:18:20" }
{ "_id" : ObjectId("51d310706ea3057a1e000000"), "keywords" : "早見泰美", "update" : "2013-07-03 02:40:00" }
{ "_id" : ObjectId("51db8cdf6ea3051f49000000"), "keywords" : "早水 桃子", "update" : "2013-07-09 13:09:03" }
{ "_id" : ObjectId("51dcc26d6ea3050653000000"), "keywords" : "Ixy", "update" : "2013-07-10 11:09:49" }
{ "_id" : ObjectId("51dcc26e6ea3050653000001"), "keywords" : "HarperCollinsChildren", "update" : "2013-07-10 11:09:50" }
{ "_id" : ObjectId("51dcc2726ea3050653000002"), "keywords" : "黒瀧公之", "update" : "2013-07-10 11:09:54" }
{ "_id" : ObjectId("51dcc2736ea3050653000003"), "keywords" : "黒史郎", "update" : "2013-07-10 11:09:55" }
{ "_id" : ObjectId("51dcc2756ea3050653000004"), "keywords" : "高橋昌一郎", "update" : "2013-07-10 11:09:57" }
{ "_id" : ObjectId("51dcc2786ea3050653000005"), "keywords" : "鈴木健介", "update" : "2013-07-10 11:10:00" }
{ "_id" : ObjectId("51dcc27a6ea3050653000006"), "keywords" : "鈴木みそ", "update" : "2013-07-10 11:10:02" }
{ "_id" : ObjectId("51dcc27b6ea3050653000007"), "keywords" : "鈴木ちなみ", "update" : "2013-07-10 11:10:03" }
{ "_id" : ObjectId("51dcc27e6ea3050653000008"), "keywords" : "重松清", "update" : "2013-07-10 11:10:06" }
{ "_id" : ObjectId("51dcc27f6ea3050653000009"), "keywords" : "谷本真由美(@May_Roma)", "update" : "2013-07-10 11:10:07" }
{ "_id" : ObjectId("51dcc2826ea305065300000a"), "keywords" : "誉田哲也", "update" : "2013-07-10 11:10:10" }
{ "_id" : ObjectId("51dcc2846ea305065300000b"), "keywords" : "西村ミツル", "update" : "2013-07-10 11:10:12" }
{ "_id" : ObjectId("51dcc2866ea305065300000c"), "keywords" : "藤村真理", "update" : "2013-07-10 11:10:14" }
{ "_id" : ObjectId("51dcc2896ea305065300000d"), "keywords" : "藤本 壱", "update" : "2013-07-10 11:10:17" }
{ "_id" : ObjectId("51dcc28a6ea305065300000e"), "keywords" : "草薙雅哉", "update" : "2013-07-10 11:10:18" }
Type "it" for more

ちょっと見づらいので

_idを非表示にするのと、updateの降順でデータをとりなおす

> db.search.find({},{_id:0}).sort({update: -1})
{ "keywords" : "統計学", "update" : "2013-08-23 20:44:30" }
{ "keywords" : "統計学", "update" : "2013-08-23 20:44:29" }
{ "keywords" : "統計学", "update" : "2013-08-23 20:44:17" }
{ "keywords" : "統計学", "update" : "2013-08-23 20:44:03" }
{ "keywords" : "QP:flapper", "update" : "2013-08-23 16:00:41" }
{ "keywords" : "阿智 太郎", "update" : "2013-08-23 16:00:06" }
{ "keywords" : "hunter", "update" : "2013-08-22 21:33:09" }
{ "keywords" : "hunter", "update" : "2013-08-22 21:31:11" }
{ "keywords" : "オリンピック", "update" : "2013-08-22 17:38:26" }
{ "keywords" : "古着系アイドル18", "update" : "2013-08-21 22:18:47" }
{ "keywords" : "Linda ", "update" : "2013-08-16 02:40:11" }
{ "keywords" : "北島 英明=講師;羽根田 治=構成", "update" : "2013-08-12 15:35:59" }
{ "keywords" : "なれるSE", "update" : "2013-08-09 13:55:10" }
{ "keywords" : "ソキウス・ジャパン", "update" : "2013-08-07 18:10:21" }
{ "keywords" : "タモリ", "update" : "2013-08-07 18:10:18" }
{ "keywords" : "デビッド・バーカー", "update" : "2013-08-07 18:10:12" }
{ "keywords" : "ドラム", "update" : "2013-08-07 18:10:11" }
{ "keywords" : "ネーゼ", "update" : "2013-08-07 18:10:07" }
{ "keywords" : "ハル(@sakuraSoftware)", "update" : "2013-08-07 18:10:06" }
{ "keywords" : "ビクトル・ユーゴー", "update" : "2013-08-07 18:10:03" }
Type "it" for more

では集計

キーワードでグルーピングして件数で並べ替えて上位5件を出します

> db.search.aggregate(
    {
    $group:{
        "_id":"$keywords",
        "count":{"$sum":1}
        }
    },
    {
    $sort:{
        count:-1
        }
    },
    {
    $limit:5
    }
);


{
    "result" : [
        {
            "_id" : "古着系アイドル18",
            "count" : 33
        },
        {
            "_id" : "統計",
            "count" : 7
        },
        {
            "_id" : "ビッグデータ",
            "count" : 4
        },
        {
            "_id" : "ホーリーランド",
            "count" : 4
        },
        {
            "_id" : "統計学",
            "count" : 4
        }
    ],
    "ok" : 1
}

出来ましたね。古着系アイドル18ってなんだろ?

参考URL

SQL to Aggregation Mapping Chart