trdsql Log集計
Posted on:
Log集計
ApacheやnginxなどのLogをLTSVフォーマットで出力する方法も定着してきました。
そのようなLogをtrdsqlで解析する例です。
出力する側は、apacheのLogFormatの設定を以下のようにカスタマイズフォーマットにします。
LogFormat "host:%h\tident:%l\tuser:%u\ttime:%t\treq:%r\tstatus:%>s\tsize:%b\treferer:\%{Referer}i\tua:%{User-Agent}i" combined_ltsv
host,ident,user,time,req,status,size,referer,uaの項目が出力されます。
実際のLogは以下のようになります。
host:176.99.192.42 ident:- user:- time:[21/Oct/2019:21:33:53 +0900] req:GET /category/software HTTP/1.1 status:200 size:138 referer:- ua:Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
host:192.54.157.102 ident:- user:- time:[21/Oct/2019:21:33:53 +0900] req:GET /item/electronics/4478 HTTP/1.1 status:200 size:60 referer:/category/sports ua:Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
host:88.60.137.115 ident:- user:- time:[21/Oct/2019:21:33:53 +0900] req:POST /search/?c=Games+Electronics HTTP/1.1 status:200 size:98 referer:/item/networking/929 ua:Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3
...
解析
まずは trdsql の -aを実行してみます。
The table name is log.ltsv.
The file type is LTSV.
Data types:
+-------------+------+
| column name | type |
+-------------+------+
| \`host\` | text |
| ident | text |
| \`user\` | text |
| \`time\` | text |
| req | text |
| \`status\` | text |
| \`size\` | text |
| referer | text |
| ua | text |
+-------------+------+
Data samples:
+---------------+-------+----------+------------------------------+--------------------------------+------------+----------+---------+--------------------------------+
| \`host\` | ident | \`user\` | \`time\` | req | \`status\` | \`size\` | referer | ua |
+---------------+-------+----------+------------------------------+--------------------------------+------------+----------+---------+--------------------------------+
| 176.99.192.42 | - | - | [21/Oct/2019:21:33:53 +0900] | GET /category/software | 200 | 138 | - | Mozilla/5.0 (compatible; MSIE |
| | | | | HTTP/1.1 | | | | 9.0; Windows NT 6.1; WOW64; |
| | | | | | | | | Trident/5.0) |
+---------------+-------+----------+------------------------------+--------------------------------+------------+----------+---------+--------------------------------+
Examples:
trdsql "SELECT \`host\`, ident, \`user\`, \`time\`, req, \`status\`, \`size\`, referer, ua FROM log.ltsv"
trdsql "SELECT \`host\`, ident, \`user\`, \`time\`, req, \`status\`, \`size\`, referer, ua FROM log.ltsv WHERE \`host\` = '176.99.192.42'"
trdsql "SELECT \`host\`, count(\`host\`) FROM log.ltsv GROUP BY \`host\`"
trdsql "SELECT \`host\`, ident, \`user\`, \`time\`, req, \`status\`, \`size\`, referer, ua FROM log.ltsv ORDER BY \`host\` LIMIT 10"
Examplesの実行例をヒントにこれまでに紹介したSQLを使用して実行していきます。
上位を出力
アクセスが多いホストTop 5を出力
trdsql -oat "SELECT \`host\`, count(\`host\`) as count FROM log.ltsv GROUP BY \`host\` ORDER BY count DESC LIMIT 5"
+----------------+-------+
| host | count |
+----------------+-------+
| 36.69.176.222 | 5 |
| 92.132.226.51 | 4 |
| 76.222.144.225 | 4 |
| 28.63.137.225 | 4 |
| 28.57.188.28 | 4 |
+----------------+-------+
リクエストが多い順Top 5を出力
trdsql -oat "SELECT req, count(req) as count FROM log.ltsv GROUP BY req ORDER BY count DESC LIMIT 5"
+--------------------------------+-------+
| req | count |
+--------------------------------+-------+
| GET /category/software | 74 |
| HTTP/1.1 | |
| GET /category/electronics | 73 |
| HTTP/1.1 | |
| GET /category/games HTTP/1.1 | 66 |
| GET /category/books HTTP/1.1 | 44 |
| GET /category/office HTTP/1.1 | 30 |
+--------------------------------+-------+
検索条件と組み合わせて出力
status が200以外のリクエストと回数を出力
trdsql -oat "SELECT req, status,count(req) as count FROM log.ltsv WHERE status != '200' GROUP BY req, status ORDER BY count DESC"
+-------------------------------+--------+-------+
| req | status | count |
+-------------------------------+--------+-------+
| GET /item/books/3230 HTTP/1.1 | 404 | 1 |
| GET /item/games/4672 HTTP/1.1 | 404 | 1 |
+-------------------------------+--------+-------+