
Treasure Data: Querying the Data I Sent with td-agent


I've left the td-agent I set up in yesterday's entry running, and after sitting overnight the data had grown, so I played around with it a little more.
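For context, the records come from tailing an Apache access log with td-agent and shipping it up with the tdlog output plugin. The real config is in yesterday's entry; the sketch below is just a minimal version of that kind of setup, and the log path, pos_file, and API key placeholder are assumptions. The tag's td.&lt;database&gt;.&lt;table&gt; form is what maps the records into the test database and kndl table queried below.

[code]

# /etc/td-agent/td-agent.conf -- minimal sketch, not the exact config from yesterday's entry
<source>
  type tail
  format apache
  path /var/log/httpd/access_log                    # assumed log location
  pos_file /var/log/td-agent/httpd.access_log.pos   # assumed position file
  tag td.test.kndl                                   # td.<database>.<table> => database "test", table "kndl"
</source>

<match td.*.*>
  type tdlog
  apikey YOUR_TD_API_KEY                             # placeholder
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
</match>

[/code]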

I queried the top 10 URLs by access count.

[code]

td query -w -d test "select v['path'] as path, count(1) as count from kndl group by v['path'] order by count desc limit 10"

Job 4305790 is queued. Use 'td job:show 4305790' to show the status.
queued...
started at 2013-08-17T12:41:53Z
Hive history file=/mnt/hive/tmp/2655/hive_job_log__2070257873.txt
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Defaulting to jobconf value of: 4
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201306191947_86974, Tracking URL = http://ip-10-149-50-132.ec2.internal:50030/jobdetails.jsp?jobid=job_201306191947_86974
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306191947_86974
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2013-08-17 12:42:11,902 Stage-1 map = 0%, reduce = 0%
2013-08-17 12:42:21,000 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.06 sec
2013-08-17 12:42:22,031 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.06 sec
2013-08-17 12:42:23,044 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.06 sec
2013-08-17 12:42:24,063 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.06 sec
2013-08-17 12:42:25,083 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 8.73 sec
2013-08-17 12:42:26,102 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 8.73 sec
2013-08-17 12:42:27,122 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 8.73 sec
2013-08-17 12:42:28,192 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 11.47 sec
2013-08-17 12:42:29,253 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 11.47 sec
2013-08-17 12:42:30,270 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 11.47 sec
2013-08-17 12:42:31,290 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 11.47 sec
2013-08-17 12:42:32,312 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 14.06 sec
2013-08-17 12:42:33,328 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 14.06 sec
2013-08-17 12:42:34,338 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 14.06 sec
2013-08-17 12:42:35,348 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 16.65 sec
2013-08-17 12:42:36,367 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 16.65 sec
2013-08-17 12:42:37,387 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 16.65 sec
2013-08-17 12:42:38,406 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 16.65 sec
MapReduce Total cumulative CPU time: 16 seconds 650 msec
Ended Job = job_201306191947_86974
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201306191947_86975, Tracking URL = http://ip-10-149-50-132.ec2.internal:50030/jobdetails.jsp?jobid=job_201306191947_86975
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306191947_86975
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2013-08-17 12:42:44,341 Stage-2 map = 0%, reduce = 0%
2013-08-17 12:42:50,449 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.09 sec
2013-08-17 12:42:51,459 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.09 sec
2013-08-17 12:42:52,468 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.09 sec
2013-08-17 12:42:53,518 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.09 sec
2013-08-17 12:42:54,537 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.76 sec
2013-08-17 12:42:55,546 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.76 sec
2013-08-17 12:42:56,555 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.76 sec
2013-08-17 12:42:57,566 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.76 sec
MapReduce Total cumulative CPU time: 3 seconds 760 msec
Ended Job = job_201306191947_86975
finished at 2013-08-17T12:43:01Z
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 4 Cumulative CPU: 16.65 sec HDFS Read: 12162 HDFS Write: 29802 SUCCESS
Job 1: Map: 1 Reduce: 1 Cumulative CPU: 3.76 sec HDFS Read: 30915 HDFS Write: 460 SUCCESS
Total MapReduce CPU Time Spent: 20 seconds 410 msec
OK
MapReduce time taken: 54.646 seconds
Time taken: 54.839 seconds
Status : success
Result :
+-------------------------------+-------+
| path                          | count |
+-------------------------------+-------+
| /items/detail/B00E7PC00W      | 26    |
| /items/topseller/2292480051   | 25    |
| /items/topseller/2275256051   | 25    |
| /robots.txt                   | 23    |
| /                             | 14    |
| /items/detail/B00DBYIBE4/     | 4     |
| /items/detail/B00D1FIK6M/     | 4     |
| /items/newrelease/2275256051/ | 4     |
| /items/detail/B00E81IGTY/     | 4     |
| /items/detail/B009RGAF6C      | 3     |
+-------------------------------+-------+
10 rows in set

[/code]
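One note on the query itself: tables imported this way expose each record as a schemaless map column v (plus a time column), which is why the path is referenced as v['path'] rather than as a normal column. Any other field captured by the apache format can be pulled out the same way. For example, here is a variant I didn't actually run, grouping by HTTP status, on the assumption that the apache parser's code field is present in these records:

[code]

td query -w -d test "select v['code'] as status, count(1) as count from kndl group by v['code'] order by count desc"

[/code]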

The whole dataset is only a few hundred records, so yeah, this is about what you'd expect (lol).

With this little data and a summary this simple, a one-liner against the raw log would get the job done, but it should really show its strength with bigger, more complex data, or with log formats other than the Apache access log.
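For reference, the kind of one-liner I mean, assuming a standard combined-format access_log where the request path is the 7th whitespace-separated field:

[code]

# top 10 request paths by hit count, straight from the raw Apache log (path assumed)
awk '{ print $7 }' /var/log/httpd/access_log | sort | uniq -c | sort -rn | head -10

[/code]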

Apparently the free account is capped at 60 queries a month and 150 GB of storage, with best-effort CPU, so it doesn't look like I can play around much more than this. The enterprise plan, though, seems to come with 8 dedicated CPU cores, 2 TB of storage, unlimited queries, support, and various other perks, so maybe next time I'll finally try it for real work?!

Hmm, but the $3,000 plan only goes up to 2 TB, and anything beyond that is "contact us"...

Treasure Data, please add a plan around $3,500 for 10 TB! >< (lol)

TreasureData is interesting.