Checking bot traffic in the access log

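I took a quick look at the access log of a front-end web server at a certain site, and found that all sorts of bots were visiting from countries around the world.

Even a cozy little site gets "visits" from a really wide variety of bots.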


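For now, I counted based on requests to "/robots.txt". A proper investigation would probably turn up more.
My own guess is that humans rarely request "/robots.txt".
Some "Wget" entries are mixed in as well; I can't tell whether those are home-made bots.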


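For now, I extracted just the UA (User-Agent) strings and printed them in descending order of request count. I'm pasting the results here for fun.
# If you spot any problems, please let me know in the comments.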

# grep "/robots.txt" /var/log/apache/access_log | cut -d " " -f12- | sort | uniq -c | sort -nr
  2232 "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
  1315 "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
   529 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
   403 "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
   387 "Baiduspider+(+http://www.baidu.com/search/spider_jp.html)"
   280 "Baiduspider+(+http://www.baidu.jp/spider/)"
   246 "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)"
   218 "-"
   175 "MSMOBOT/1.1 (+http://search.msn.com/msnbot.htm)"
   146 "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
   136 "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
   134 "FeedChecker/0.01"
   124 "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
    90 "SimilarPages/Nutch-1.0-dev (SimilarPages Nutch Crawler; http://www.similarpages.com; info at similarpages dot com)"
    80 "Baiduspider+(+http://help.baidu.jp/system/05.html)"
    77 "Modiphibot/0.91 (http://www.modiphi.com/; 0 subscribers)"
    69 "Gigabot/3.0 (http://www.gigablast.com/spider.html)"
    59 "Mozilla/5.0 (compatible; Yeti/1.0; NHN Corp.; http://help.naver.com/robots/)"
    59 "Mozilla/4.0 (compatible; NaverBot/1.0; http://help.naver.com/customer_webtxt_02.jsp)"
    37 "Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)"
    35 "ICC-Crawler(Mozilla-compatible; icc-crawl-contact(at)ml(dot)nict(dot)go(dot)jp; http://kc.nict.go.jp/project1/crawl.html)"
    34 "Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)"
    30 "Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)"
    28 "SapphireWebCrawler/Nutch-1.0-dev (Sapphire Web Crawler using Nutch; http://boston.lti.cs.cmu.edu/crawler/; mhoy@cs.cmu.edu)"
    27 "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"
    25 "HatenaScreenshot/1.0 (checker)"
    22 "R6_FeedFetcher(www.radian6.com/crawler)"
    21 "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
    20 "Y!J-BSC/1.0 (http://help.yahoo.co.jp/help/jp/blog-search/)"
    20 "DoCoMo/2.0 D903i(c100;TB;W28H20) (compatible; Hatena-Mobile-Gateway/1.2; +http://mgw.hatena.ne.jp/help)"
    19 "Snapbot/1.0 (Snap Shots, +http://www.snap.com)"
    18 "nutch-crawl/Nutch-1.0-dev (imcs; http://imcs.ro; admin@imcs.ro)"
    18 "Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)"
    17 "librabot/1.0 (+http://search.msn.com/msnbot.htm)"
    17 "Yandex/1.01.001 (compatible; Win16; I)"
    16 "T-Mobile Dash Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; Smartphone; 320x240;) MSNBOT-MOBILE/1.1 (+http://search.msn.com/msnbot.htm)"
    16 "Sosospider+(+http://help.soso.com/webspider.htm)"
    16 "Mozilla/5.0 (compatible; Steeler/3.4; http://www.tkl.iis.u-tokyo.ac.jp/~crawler/)"
    15 "Googlebot-Image/1.0"
    14 "R6_CommentReader(www.radian6.com/crawler)"
    13 "ia_archiver-web.archive.org"
    13 "SurveyBot/2.3 (Whois Source)"
    13 "Mozilla/5.0 (compatible; MJ12bot/v1.2.4; http://www.majestic12.co.uk/bot.php?+)"
    13 "Grub/2.0 (IOI crawler; http://index.isc.org/; crawl@index.isc.org)"
    12 "Yandex/1.01.001 (compatible; Win16; i)"
    11 "Mozilla/5.0 (compatible; YoudaoBot/1.0; http://www.youdao.com/help/webmaster/spider/; )"
    11 "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.16) Gecko/20080702 Iceweasel/2.0.0.16 (Debian-2.0.0.16-0etch1)"
    11 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    11
    10 "Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)"
    10 "CazoodleBot/0.0.2 (http://www.cazoodle.com/contact.php; cbot@cazoodle.com)"
     9 "voyager/2.0 (http://www.kosmix.com/crawler.html)"
     9 "urlfan-bot/1.0; +http://www.urlfan.com/site/bot/350.html"
     9 "Wget/1.9.1"
     9 "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
     8 "ichiro/3.0 (http://help.goo.ne.jp/door/crawler.html)"
     8 "Y!J-SRD/1.0"
     8 "Mozilla/5.0 (compatible; Charlotte/1.1; http://www.searchme.com/support/)"
     7 "renlifangbot/1.0 (+http://search.msn.com/msnbot.htm)"
     7 "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com)"
     7 "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8"
     7 "Cyberz Communication Agent (http://www.cyberz.co.jp/)"
     6 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; MEGAUPLOAD 2.0) ( )"
     5 "livedoor ScreenShot/0.10"
     5 "larbin_2.6.3 (larbin2.6.3@unspecified.mail)"
     5 "Wget/1.11.4"
     5 "MSR-ISRCCrawler"
     4 "Mozilla/5.0 (compatible; discobot/1.0; +http://discoveryengine.com/discobot.html)"
     4 "DoCoMo/2.0 N902iS(c100;TB;W24H12)(compatible; moba-crawler; http://crawler.dena.jp/)"
     3 "robotgenius (http://robotgenius.net)"
     3 "libwww-perl/5.805"
     3 "NextGenSearchBot 1 (for information visit http://www.zoominfo.com/About/misc/NextGenSearchBot.aspx)"
     3 "Mozilla/5.0 (compatible;YodaoBot-Image/1.0;http://www.youdao.com/help/webmaster/spider/;)"
     3 "Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)"
     3 "Mozilla/5.0 (Yahoo-MMCrawler/4.0; mailto:vertical-crawl-support@yahoo-inc.com)"
     3 "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)"
     3 "MLBot (www.metadatalabs.com/mlbot)"
     3 "DealGates Bot/1.1 by Luc Michalski (http://spider.dealgates.com/bot.html)"
     2 "robotgenius"
     2 "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
     2 "jp_viewer (larbin2.6.3@unspecified.mail)"
     2 "ip1 (larbin2.6.3@unspecified.mail)"
     2 "gooblogsearch/2.0 (http://help.goo.ne.jp/contact/)"
     2 "Y!J-BRI/0.0.1 crawler ( http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html )"
     2 "Wget/1.10.2"
     2 "Voracious/0.1"
     2 "SimilarPages/Nutch-1.0-dev (SimilarPages Nutch Crawler; http://www.similarpages.com; info@similarpages.com)"
     2 "Mozilla/5.0 (compatible; heritrix/${pom.version} +http://seekda.com)"
     2 "Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; +http://ws.daum.net/aboutWebSearch.html) Daumoa/2.0"
     2 "Mozilla/5.0 (compatible; MJ12bot/v1.2.3; http://www.majestic12.co.uk/bot.php?+)"
     2 "Mozilla/5.0 (Windows; U; Windows NT 6.0; ja; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 (.NET CLR 3.5.30729)"
     2 "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10"
     2 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
     2 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB5; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
     2 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
     2 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB5; .NET CLR 1.1.4322; .NET CLR 2.0.50727; MSIECrawler)"
     2 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 1.0.3705)"
     2 "Mozilla/4.0 (Toread-Crawler/1.1; +http://news.toread.cc/crawler.php)"
     2 "Mediapartners-Google"
     2 "<a href='http://db2-sql.blogspot.com'> DB DB2 ODBC</a>  (support@runnk.com)"
     1 "mixi-mobile-converter/1.0 (http://mixi.jp/)"
     1 "larbin_2.6.3 (myemail@address.co.uk)"
     1 "googlebot (search@socbay.com)"
     1 "fly/6.01 libwww/4.0D"
     1 "flatlandbot/baypup (Flatland Industries Web Spider; http://www.flatlandindustries.com/flatlandbot; jason@flatlandindustries.com)"
     1 "beast/Nutch-0.9 (agentspider; beast@mail.com)"
     1 "YahooFeedSeeker/2.0 (compatible; Mozilla 4.0; MSIE 5.5; http://publisher.yahoo.com/rssguide)"
     1 "Yahoo Pipes 1.0"
     1 "WinWebBot/1.0; (Balaena Ltd, UK); http://www.balaena.com/winwebbot.html; winwebbot@balaena.com;)"
     1 "Webcrawler/Nutch-1.0-dev (Test crawl; lucene.apache.org/nutch/; a@b.net)"
     1 "SiteGuardBot (support@@siteguard.com)"
     1 "Shelob (shelob@gmx.net)"
     1 "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
     1 "Pockey-GetHTML/5.1.1 (Win32; GUI; ix86)"
     1 "Peter Wang/Nutch-1.0-dev (Nutch spiderman; http://peterpuwang.googlepages.com ; MyEmail)"
     1 "Mozilla/5.0 (compatible; OsO; http://oso.octopodus.com/abot.html)"
     1 "Mozilla/5.0 (compatible; BMC/1.0 (Y!J-AGENT))"
     1 "Mozilla/5.0 (Windows; U; Windows NT 5.1; nl; rv:1.8) Gecko/20051107 Firefox/1.5"
     1 "Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 GTB5 (.NET CLR 3.5.30729)"
     1 "Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 (.NET CLR 3.5.30729)"
     1 "Mozilla/4.0"
     1 "Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)"
     1 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1; .NET CLR 2.0.50727)"
     1 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)"
     1 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; MSIECrawler)"
     1 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; MSIECrawler)"
     1 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; MSIECrawler)"
     1 "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; obot)"
     1 "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
     1 "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 5.0 Robot)"
     1 "Mail.Ru/1.0"
     1 "Java/1.5.0_15"
     1 "Googlebot/2.1 (+http://www.google.com/bot.html)"
     1 "Gaisbot/3.0+(robot06@gais.cs.ccu.edu.tw;+http://gais.cs.ccu.edu.tw/robot.php)"
     1 "CCBot/1.0 (+http://www.commoncrawl.org/bot.html)"
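The pipeline above relies on `grep` matching "/robots.txt" anywhere in the line (including the referer) and on the UA starting at the 12th space-separated field. As a rough sketch of a stricter variant, assuming the log is in Apache "combined" format, the line can be split on double quotes instead, so that the request line is field 2 and the User-Agent is field 6. The function name `count_robots_ua` is made up for this example, and a UA containing an escaped double quote would still confuse it.

```shell
# Count User-Agents that requested exactly /robots.txt.
# Assumption: Apache "combined" log format, so splitting a line on double
# quotes gives $2 = request line, $4 = referer, $6 = User-Agent.
# count_robots_ua is a hypothetical helper name, not from the original post.
# Usage: count_robots_ua /var/log/apache/access_log
count_robots_ua() {
  awk -F'"' '
    {
      # $2 looks like: GET /robots.txt HTTP/1.1
      split($2, req, " ")
      path = req[2]
      sub(/\?.*/, "", path)   # ignore any query string
      if (path == "/robots.txt") print $6
    }
  ' "$1" | sort | uniq -c | sort -nr
}
```

Unlike the `cut` version, this prints each UA without the surrounding quotes, and it skips requests that merely mention "/robots.txt" in the referer.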



# grep "/robots.txt" /mnt/data/log/apache2.2/access_log | cut -d " " -f12- | sort | uniq | wc -l
135

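Not all of these are necessarily bots, but the output above contains about 135 distinct user agents.
For reference, the source log file covers roughly 1.4 million requests.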
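For entries like "Wget", where the UA alone does not say who is behind it, one way to dig a little further is to see which client addresses sent that UA. The following is only a sketch under the same assumption of Apache "combined" format (client address as the first space-separated field); `ips_for_ua` is a name invented for this example.

```shell
# List client addresses that requested /robots.txt with a given User-Agent.
# Assumption: Apache "combined" log format; the client address is the first
# space-separated field and the User-Agent is the 6th quote-delimited field.
# ips_for_ua is a hypothetical helper name used only for this sketch.
# Usage: ips_for_ua Wget /var/log/apache/access_log
ips_for_ua() {
  # $1 = substring to look for in the User-Agent, $2 = log file
  awk -F'"' -v ua="$1" '
    {
      split($2, req, " ")    # req[2] is the requested path
      split($1, head, " ")   # head[1] is the client address
      if (req[2] == "/robots.txt" && index($6, ua)) print head[1]
    }
  ' "$2" | sort | uniq -c | sort -nr
}
```

A handful of addresses with regular access patterns would hint at someone's script, though this alone proves nothing about who is running it.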


Web解析Hacks ―オンラインビジネスで最大の効果をあげるテクニック & ツール


  • Authors: Eric T. Peterson, 株式会社デジタルフォレスト, 木下哲也, 有限会社福龍興業
  • Publisher: O'Reilly Japan
  • Release date: 2006/11/08
  • Format: Paperback