
Blocking unwanted crawler bots

Nicolas / Blog, Internet project management, SEO & search ranking / 1 Comment

OK, to borrow from the Les Inconnus sketch, there are good bots and bad bots. The good bot goes unnoticed, grabs "its" data (yes, on the internet the data is yours for the taking), skews your stats a little and leaves. The bad bot grabs "its" data, wrecks your stats, steals your email addresses and your images, turns them into 15 duplicate sites plastered with adult ads and iPhone affiliate links, and tears down your search ranking. So here is a consolidated list to add to your robots.txt. I'll try to keep it up to date. F**** BOTbastard.

If you know of others, please post them in the comments:

# robotstxt_antibadbot begin
User-agent: 8484 Boston Project v 1.0
Disallow: /
User-agent: Atomic_Email_Hunter/4.0
Disallow: /
User-agent: atSpider/1.0
Disallow: /
User-agent: autoemailspider
Disallow: /
User-agent: bwh3_user_agent
Disallow: /
User-agent: China Local Browse 2.6
Disallow: /
User-agent: ContactBot/0.2
Disallow: /
User-agent: ContentSmartz
Disallow: /
User-agent: DataCha0s/2.0
Disallow: /
User-agent: DBrowse 1.4b
Disallow: /
User-agent: DBrowse 1.4d
Disallow: /
User-agent: Demo Bot DOT 16b
Disallow: /
User-agent: Demo Bot Z 16b
Disallow: /
User-agent: DSurf15a 01
Disallow: /
User-agent: DSurf15a 71
Disallow: /
User-agent: DSurf15a 81
Disallow: /
User-agent: DSurf15a VA
Disallow: /
User-agent: EBrowse 1.4b
Disallow: /
User-agent: Educate Search VxB
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailSpider
Disallow: /
User-agent: EmailWolf 1.00
Disallow: /
User-agent: ESurf15a 15
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: Franklin Locator 1.8
Disallow: /
User-agent: FSurf15a 01
Disallow: /
User-agent: Full Web Bot 0416B
Disallow: /
User-agent: Full Web Bot 0516B
Disallow: /
User-agent: Full Web Bot 2816B
Disallow: /
User-agent: Guestbook Auto Submitter
Disallow: /
User-agent: Industry Program 1.0.x
Disallow: /
User-agent: infoConveraCrawler/0.8 ( http://www.authoritativeweb.com/crawl)
Disallow: /
User-agent: ISC Systems iRc Search 2.1
Disallow: /
User-agent: IUPUI Research Bot v 1.9a
Disallow: /
User-agent: LARBIN-EXPERIMENTAL (efp@gmx.net)
Disallow: /
User-agent: LetsCrawl.com/1.0 +http://letscrawl.com/
Disallow: /
User-agent: Lincoln State Web Browser
Disallow: /
User-agent: LMQueueBot/0.2
Disallow: /
User-agent: LWP::Simple/5.803
Disallow: /
User-agent: Mac Finder 1.0.xx
Disallow: /
User-agent: MFC Foundation Class Library 4.0
Disallow: /
User-agent: Microsoft URL Control - 6.00.8xxx
Disallow: /
User-agent: Missauga Locate 1.0.0
Disallow: /
User-agent: Missigua Locator 1.9
Disallow: /
User-agent: Missouri College Browse
Disallow: /
User-agent: Mizzu Labs 2.2
Disallow: /
User-agent: Mo College 1.9
Disallow: /
User-agent: Mozilla/2.0 (compatible; NEWT ActiveX; Win32)
Disallow: /
User-agent: Mozilla/3.0 (compatible)
Disallow: /
User-agent: Mozilla/3.0 (compatible; Indy Library)
Disallow: /
User-agent: Mozilla/3.0 (compatible; scan4mail (advanced version) http://www.peterspages.net/?scan4mail)
Disallow: /
User-agent: Mozilla/4.0 (compatible; Advanced Email Extractor v2.xx)
Disallow: /
User-agent: Mozilla/4.0 (compatible; Iplexx Spider/1.0 http://www.iplexx.at)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
Disallow: /
User-agent: Mozilla/4.0 efp@gmx.net
Disallow: /
User-agent: Mozilla/4.08 [en] (Win98; U ;Nav)
Disallow: /
User-agent: Mozilla/5.0 (Version: xxxx Type:xx)
Disallow: /
User-agent: MVAClient
Disallow: /
User-agent: NameOfAgent (CMS Spider)
Disallow: /
User-agent: NASA Search 1.0
Disallow: /
User-agent: Nsauditor/1.x
Disallow: /
User-agent: PBrowse 1.4b
Disallow: /
User-agent: PEval 1.4b
Disallow: /
User-agent: Poirot
Disallow: /
User-agent: Port Huron Labs
Disallow: /
User-agent: Production Bot 0116B
Disallow: /
User-agent: Production Bot 2016B
Disallow: /
User-agent: Production Bot DOT 3016B
Disallow: /
User-agent: Program Shareware 1.0.2
Disallow: /
User-agent: PSurf15a 11
Disallow: /
User-agent: PSurf15a 51
Disallow: /
User-agent: PSurf15a VA
Disallow: /
User-agent: psycheclone
Disallow: /
User-agent: RSurf15a 41
Disallow: /
User-agent: RSurf15a 51
Disallow: /
User-agent: RSurf15a 81
Disallow: /
User-agent: searchbot admin@google.com
Disallow: /
User-agent: ShablastBot 1.0
Disallow: /
User-agent: snap.com beta crawler v0
Disallow: /
User-agent: Snapbot/1.0
Disallow: /
User-agent: Snapbot/1.0 (Snap Shots, +http://www.snap.com)
Disallow: /
User-agent: sogou develop spider
Disallow: /
User-agent: Sogou Orion spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Disallow: /
User-agent: sogou spider
Disallow: /
User-agent: Sogou web spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Disallow: /
User-agent: sohu agent
Disallow: /
User-agent: SSurf15a 11
Disallow: /
User-agent: TSurf15a 11
Disallow: /
User-agent: Under the Rainbow 2.2
Disallow: /
User-agent: User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Disallow: /
User-agent: VadixBot
Disallow: /
User-agent: WebVulnCrawl.unknown/1.0 libwww-perl/5.803
Disallow: /
User-agent: Wells Search II
Disallow: /
User-agent: WEP Search 00
Disallow: /
# robotstxt_antibadbot end
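
Once the block is pasted into your robots.txt, it is worth checking that it parses the way you expect. Here is a minimal sketch using Python's standard `urllib.robotparser`. The `SAMPLE` excerpt and the `is_blocked` helper are made-up names for illustration, not part of the list. The parser only compares the name token before the first "/" as a case-insensitive substring, so treat it as an approximation of how a real crawler matches its own name. It also only tells you what the file says, not whether a given bot bothers to read it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt: three entries copied from the list above.
SAMPLE = """\
# robotstxt_antibadbot begin
User-agent: EmailSiphon
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: psycheclone
Disallow: /
# robotstxt_antibadbot end
"""


def is_blocked(robots_txt: str, user_agent: str, url: str = "/") -> bool:
    """Return True if robots_txt disallows url for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Googlebot is not in the excerpt, so it should come out as allowed.
    for name in ("EmailSiphon", "ExtractorPro", "psycheclone", "Googlebot"):
        print(name, "blocked" if is_blocked(SAMPLE, name) else "allowed")
```

To test your real file, read it from disk and pass its contents in place of `SAMPLE`.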

Svetlana, September 6, 2013

I had been looking for a list like this for a while. When you see how many bots visit a site, it's hard to separate the wheat from the chaff ;)
