ログってなんぼ (Logs Are What Count)

Day-to-day notes.

Sending Apache access logs to Treasure Data (td-agent)


Notes on sending data to Treasure Data with td-agent.

キンドルの雑貨屋さん - Kindleで電子書籍を読もう! (The Kindle General Store: read e-books on Kindle!)

For now, I'll just try sending this site's access logs.

Sign up for an account

Features & Pricing | Treasure Data

Install the Toolbelt

Quickstart Guide | Treasure Data

On CentOS, install it by following this page:

Installing the Treasure Data CLI | Treasure Data

Send Apache logs via td-agent

Analyzing Apache Logs on the Cloud | Treasure Data

Authenticate the account

[code]

td account

Enter your Treasure Data credentials.
Email: xxxxx@example.com
Password (typing will be hidden):
Authenticated successfully.
Use 'td db:create <db_name>' to create a database.
[/code]

Create a database and a table

[code]

td db:create test

Database 'test' is created. Use 'td table:create test <table_name>' to create a table.

td table:create test kndl

Table 'test.kndl' is created.
[/code]

Make a note of the API key

[code]

td apikey:show

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX [/code]

/etc/td-agent/td-agent.conf

[code]

# Treasure Data Input and Output
<match td.*.*>
  type tdlog
  apikey XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>

<source>
  type tail
  path /var/log/httpd/kndl.access_log
  pos_file /tmp/kndl.access.log.pos
  format apache
  tag td.test.kndl
</source>
[/code]
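This is a rough Python sketch of what `format apache` does to each line. It splits one access-log line into an event time plus the string fields that appear in the query result later in this post (host, user, method, path, code, size, referer, agent). The regex is my own approximation of fluentd's built-in apache pattern, not a copy of it.

```python
import re
from datetime import datetime

# Approximation (assumption) of the pattern behind fluentd's `format apache`.
APACHE_RE = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^ ]*) +\S*)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 16/Aug/2013:23:51:11 +0900


def parse_apache_line(line: str):
    """Return (unix_time, record) for one access-log line, or None if it doesn't match."""
    m = APACHE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    fields = m.groupdict()
    # The bracketed timestamp becomes the event time; everything else stays a string.
    event_time = int(datetime.strptime(fields.pop("time"), TIME_FORMAT).timestamp())
    record = {k: v for k, v in fields.items() if v is not None}
    return event_time, record
```

Here is a hypothetical line reconstructed from that query result: `119.240.238.160 - - [16/Aug/2013:23:51:11 +0900] "GET / HTTP/1.1" 200 45717 "-" "Mozilla/5.0 ..."`. It should come back with event time 1376664671 and `code` as the string "200`, matching how the values show up in Treasure Data.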

With auto_create_table set, maybe the table doesn't need to be created in advance? I'm not sure, but I'll keep going and look into it properly later.
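As far as I understand it, the tdlog output picks the destination from the tag. `td.test.kndl` goes to database `test`, table `kndl`, and `<match td.*.*>` catches any tag of that three-part shape. The Python sketch below shows both ideas. The matcher is a simplified take on fluentd's wildcards (`*` matches exactly one tag part, `**` matches zero or more), and `td_destination` is a hypothetical helper, not part of td-agent.

```python
def match_tag(pattern: str, tag: str) -> bool:
    """Simplified fluentd-style tag matching.

    '*' matches exactly one dot-separated tag part, '**' matches zero or more
    parts. Other pattern syntax (e.g. {a,b} alternation) is not supported.
    """
    return _match(pattern.split("."), tag.split("."))


def _match(pat: list, parts: list) -> bool:
    if not pat:
        return not parts  # pattern used up: match only if the tag is used up too
    head, rest = pat[0], pat[1:]
    if head == "**":
        # Let '**' absorb 0, 1, 2, ... leading tag parts.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        return _match(rest, parts[1:])
    return False


def td_destination(tag: str) -> tuple:
    """Hypothetical helper: map 'td.<database>.<table>' to (database, table)."""
    parts = tag.split(".")
    if len(parts) != 3 or parts[0] != "td":
        raise ValueError(f"expected td.<database>.<table>, got {tag!r}")
    return parts[1], parts[2]
```

With this, `match_tag("td.*.*", "td.test.kndl")` is true, `td.test` alone would not match, and `td_destination("td.test.kndl")` gives `("test", "kndl")`, i.e. the database and table created earlier.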

Restart td-agent

The td-agent user probably doesn't have read permission on the Apache access log. Make sure it can read the log first, for example by adding the td-agent user to the log file's group.

[code]

/etc/init.d/td-agent restart

[/code]
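On the permission point, this is the classic Unix read check, sketched in Python. Only one class of bits applies. If the user owns the file, the owner bits decide. Otherwise, if the user is in the file's group, the group bits decide. Otherwise the "other" bits decide. That's why putting td-agent in the log file's group works when the file is group-readable.

The sketch ignores root, ACLs and SELinux. Also keep in mind that td-agent needs execute (search) permission on every directory along the path, such as /var/log/httpd.

```python
import stat


def can_read(mode: int, file_uid: int, file_gid: int, uid: int, gids: set) -> bool:
    """Classic Unix read check for a non-root user.

    Exactly one class of permission bits applies: owner bits if the user owns the
    file, else group bits if any of the user's groups is the file's group, else
    the 'other' bits. Root, ACLs and SELinux are deliberately ignored.
    """
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)
```

The values for a real file come from `os.stat()` (`st_mode`, `st_uid`, `st_gid`). Here is an example with made-up ids, for a file with mode 0640 owned by root (uid 0) and group-owned by gid 48. A td-agent user with uid 498 gets `can_read(0o640, 0, 48, 498, {498})` → False. Once gid 48 is among its groups, it gets `can_read(0o640, 0, 48, 498, {498, 48})` → True.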

Check td-agent.log

[code]

tail -f /var/log/td-agent/td-agent.log

2013-08-16 23:48:04 +0900 [info]: adding source type="http"
2013-08-16 23:48:04 +0900 [info]: adding source type="debug_agent"
2013-08-16 23:48:04 +0900 [info]: adding source type="tail"
2013-08-16 23:48:04 +0900 [warn]: 'pos_file PATH' parameter is not set to a 'tail' source.
2013-08-16 23:48:04 +0900 [warn]: this parameter is highly recommended to save the position to resume tailing.
2013-08-16 23:48:04 +0900 [info]: adding match pattern="debug.**" type="stdout"
2013-08-16 23:48:04 +0900 [info]: adding match pattern="td.*.*" type="tdlog"
2013-08-16 23:48:04 +0900 [info]: listening fluent socket on 0.0.0.0:24224
2013-08-16 23:48:04 +0900 [info]: listening dRuby uri="druby://127.0.0.1:24230" object="Engine"
2013-08-16 23:48:04 +0900 [info]: following tail of /var/log/httpd/kndl.access_log
[/code]

There are some warnings, which I'll look into later, but for now it seems to be working.
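The warning is about pos_file. The tail input uses that file to remember how far into the log it has read, so it can resume after a restart instead of losing or resending lines. Below is a minimal Python sketch of the resume-from-offset idea, under simplifying assumptions. The pos file here holds just a decimal byte offset, and log rotation is not handled. fluentd's real pos file uses its own format and also tracks the file's inode to cope with rotation.

```python
import os


def read_new_lines(log_path: str, pos_path: str) -> list:
    """Return complete lines appended to log_path since the last call.

    The byte offset already consumed is stored in pos_path as a plain decimal
    number (a simplification, not fluentd's pos file format). No rotation handling.
    """
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as f:
            offset = int(f.read().strip() or 0)
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline; a trailing partial line is re-read next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    with open(pos_path, "w") as f:
        f.write(str(offset + end))
    return lines
```

Calling it twice in a row returns the new lines once and then an empty list, because the saved offset already points at the end of what was read.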

Try running a query

[code]

td query -w -d test "SELECT COUNT(*) FROM kndl"

Job 4293650 is queued.
Use 'td job:show 4293650' to show the status.
queued...
started at 2013-08-16T15:29:18Z
Hive history file=/mnt/hive/tmp/2655/hive_job_log_831129948.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201306191947_85279, Tracking URL = http://ip-10-149-50-132.ec2.internal:50030/jobdetails.jsp?jobid=job_201306191947_85279
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306191947_85279
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-08-16 15:29:37,939 Stage-1 map = 0%, reduce = 0%
2013-08-16 15:29:41,990 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.52 sec
2013-08-16 15:29:43,015 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.52 sec
2013-08-16 15:29:44,035 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.52 sec
2013-08-16 15:29:45,054 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.52 sec
2013-08-16 15:29:46,073 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.52 sec
2013-08-16 15:29:47,093 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.52 sec
2013-08-16 15:29:48,113 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
2013-08-16 15:29:49,132 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
finished at 2013-08-16T15:29:53Z
2013-08-16 15:29:50,187 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
MapReduce Total cumulative CPU time: 5 seconds 80 msec
Ended Job = job_201306191947_85279
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 5.08 sec HDFS Read: 1272 HDFS Write: 102 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 80 msec
OK
MapReduce time taken: 20.495 seconds
Time taken: 20.679 seconds
Status : success
Result :
+-----+
| c0  |
+-----+
| 53  |
+-----+
1 row in set
[/code]

[code]

td query -w -d test "SELECT * FROM kndl limit 1"

Job 4293581 is queued.
Use 'td job:show 4293581' to show the status.
queued...
started at 2013-08-16T15:13:46Z
Hive history file=/mnt/hive/tmp/2655/hive_job_log__1500289243.txt
OK
MapReduce time taken: 0.28 seconds
finished at 2013-08-16T15:14:01Z
Time taken: 1.61 seconds
Status : success
Result :
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
| v | time |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
| {"user":"-","method":"GET","code":"200","time":"1376664671","size":"45717","host":"119.240.238.160","referer":"-","path":"/","agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36"} | 1376664671 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
1 row in set
[/code]
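In this result the whole record sits in the `v` column as a JSON-looking map, with every value stored as a string (e.g. "code":"200"). The `time` column is the event time in Unix epoch seconds. This is a small Python sketch for turning such a row into a dict and a readable timestamp. The function name `decode_row` is just for illustration, and the +09:00 offset assumes the JST server used in this post.

```python
import json
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # the server in this post logs in +0900


def decode_row(v: str, time_col: int):
    """Decode one (v, time) result row: the record dict plus the event time in JST."""
    record = json.loads(v)
    return record, datetime.fromtimestamp(time_col, tz=JST)
```

For the row above, 1376664671 comes out as 2013-08-16 23:51:11+09:00 (14:51:11 UTC). That is a few minutes after td-agent started at 23:48:04 according to its log.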

That covers the basics for now.

I'll keep going from here and dig into the details as I go.