Treasure Data: Getting td command results as a JSON file

A quick memo on getting the results of a Treasure Data query back as JSON.

Checking the help

[code]
$ td help query
usage:
  $ td query [sql]

example:
  $ td query -d example_db -w -r rset1 "select count(*) from table1"
  $ td query -d example_db -w -r rset1 -q query.txt

description:
  Issue a query

options:
  -g, --org ORGANIZATION       issue the query under this organization
  -d, --database DB_NAME       use the database (required)
  -w, --wait                   wait for finishing the job
  -G, --vertical               use vertical table to show results
  -o, --output PATH            write result to the file
  -f, --format FORMAT          format of the result to write to the file (tsv, csv, json or msgpack)
  -r, --result RESULT_URL      write result to the URL (see also result:create subcommand)
  -u, --user NAME              set user name for the result URL
  -p, --password               ask password for the result URL
  -P, --priority PRIORITY      set priority
  -R, --retry COUNT            automatic retrying count
  -q, --query PATH             use file instead of inline query
  -T, --type TYPE              set query type (hive or pig)
  --sampling DENOMINATOR       enable random sampling to reduce records 1/DENOMINATOR
  -x, --exclude                do not automatically retrieve the job result
[/code]

Hm hm, I see.
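So the two options that matter for this memo are -f json and -o PATH. As a minimal sketch, reusing the example_db / table1 placeholder names from the help text above (they are not real tables here), writing a result straight to a JSON file boils down to:

[code]
# Minimal sketch: run a query, wait for the job (-w), and write the
# result to ./result.json as JSON. example_db and table1 are the
# placeholder names from the help output, not actual objects.
$ td query -w -d example_db -f json -o ./result.json "select count(*) from table1"
[/code]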

Trying it out

Raw JSON is hard to read, so let's pretty-print it with python -mjson.tool.

[code]
$ td query -w -d test -o ./result.json -f json "select v['path'] as path, count(1) as count from kndl group by v['path'] order by count desc limit 10" && cat result.json | python -mjson.tool

Job 4348887 is queued.
Use 'td job:show 4348887' to show the status.
queued...
started at 2013-08-20T12:51:12Z
Hive history file=/mnt/hive/tmp/2655/hive_job_log__1923815255.txt
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Defaulting to jobconf value of: 4
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201306191947_92892, Tracking URL = http://ip-10-149-50-132.ec2.internal:50030/jobdetails.jsp?jobid=job_201306191947_92892
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306191947_92892
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2013-08-20 12:51:31,869 Stage-1 map = 0%, reduce = 0%
2013-08-20 12:51:41,002 Stage-1 map = 99%, reduce = 0%
2013-08-20 12:51:42,019 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.26 sec
2013-08-20 12:51:43,041 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.26 sec
2013-08-20 12:51:44,061 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.26 sec
2013-08-20 12:51:45,080 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.26 sec
2013-08-20 12:51:46,100 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.26 sec
2013-08-20 12:51:47,119 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 11.32 sec
2013-08-20 12:51:48,138 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 11.32 sec
2013-08-20 12:51:49,158 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 11.32 sec
2013-08-20 12:51:50,187 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 11.32 sec
2013-08-20 12:51:51,206 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 14.5 sec
2013-08-20 12:51:52,225 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 14.5 sec
2013-08-20 12:51:53,244 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 14.5 sec
2013-08-20 12:51:54,264 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 17.56 sec
2013-08-20 12:51:55,283 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 17.56 sec
2013-08-20 12:51:56,302 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 17.56 sec
2013-08-20 12:51:57,321 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 17.56 sec
2013-08-20 12:51:58,377 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 20.56 sec
2013-08-20 12:51:59,390 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 20.56 sec
2013-08-20 12:52:00,399 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 20.56 sec
MapReduce Total cumulative CPU time: 20 seconds 560 msec
Ended Job = job_201306191947_92892
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201306191947_92893, Tracking URL = http://ip-10-149-50-132.ec2.internal:50030/jobdetails.jsp?jobid=job_201306191947_92893
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306191947_92893
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2013-08-20 12:52:06,186 Stage-2 map = 0%, reduce = 0%
2013-08-20 12:52:12,240 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.85 sec
2013-08-20 12:52:13,249 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.85 sec
2013-08-20 12:52:14,259 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.85 sec
2013-08-20 12:52:15,269 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.85 sec
2013-08-20 12:52:16,287 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.77 sec
2013-08-20 12:52:17,297 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.77 sec
finished at 2013-08-20T12:52:20Z
2013-08-20 12:52:18,306 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.77 sec
MapReduce Total cumulative CPU time: 4 seconds 770 msec
Ended Job = job_201306191947_92893
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 4   Cumulative CPU: 20.56 sec   HDFS Read: 25887   HDFS Write: 140935   SUCCESS
Job 1: Map: 1  Reduce: 1   Cumulative CPU: 4.77 sec    HDFS Read: 142048  HDFS Write: 473      SUCCESS
Total MapReduce CPU Time Spent: 25 seconds 330 msec
OK
MapReduce time taken: 54.637 seconds
Time taken: 54.817 seconds
Status     : success
Result     : written to ./result.json in json format
[
    [
        "/robots.txt",
        100
    ],
    [
        "/items/detail/B00E7PC00W",
        55
    ],
    [
        "/",
        45
    ],
    [
        "/items/topseller/2275256051",
        28
    ],
    [
        "/items/topseller/2291657051",
        27
    ],
    [
        "/items/topseller/2293263051",
        26
    ],
    [
        "/items/topseller/2291568051",
        26
    ],
    [
        "/items/topseller/2292480051",
        25
    ],
    [
        "/items/topseller/2292340051",
        20
    ],
    [
        "/items/topseller/2291791051",
        20
    ]
]
[/code]
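One thing to note: with -f json the file is a single JSON array of [path, count] rows, not objects keyed by column name, so downstream code has to unpack the values positionally. A minimal sketch of reading it back, using the same stock Python as above and assuming ./result.json was just written by the query:

[code]
# Minimal sketch: load the array-of-arrays result and unpack each
# [path, count] row positionally (no extra dependencies assumed).
$ python -c 'import json
rows = json.load(open("./result.json"))
for path, count in rows:
    print("%s\t%d" % (path, count))'
/robots.txt	100
/items/detail/B00E7PC00W	55
...
[/code]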

Nice!