Blocking unwanted crawler bots

Nicolas / Blog, Web project management, SEO & search ranking / 5 Comments

Right, to borrow from the Les Inconnus sketch, there is the good bot and the bad bot. The good bot slips by unnoticed, grabs "its" data (and yes, on the internet the data is yours), skews your stats a little and clears off... The bad bot grabs "its" data, wrecks your stats, swipes your email addresses and your images, churns out 15 duplicate copies plastered with adult ads and iPhone affiliate links, and tanks your search ranking... So here is a consolidated list to paste into your robots.txt. I will try to keep it up to date... F**** BOTbastard.

If you know of others, please point them out in the comments:

# robotstxt_antibadbot begin
User-agent: 8484 Boston Project v 1.0
Disallow: /
User-agent: Atomic_Email_Hunter/4.0
Disallow: /
User-agent: atSpider/1.0
Disallow: /
User-agent: autoemailspider
Disallow: /
User-agent: bwh3_user_agent
Disallow: /
User-agent: China Local Browse 2.6
Disallow: /
User-agent: ContactBot/0.2
Disallow: /
User-agent: ContentSmartz
Disallow: /
User-agent: DataCha0s/2.0
Disallow: /
User-agent: DBrowse 1.4b
Disallow: /
User-agent: DBrowse 1.4d
Disallow: /
User-agent: Demo Bot DOT 16b
Disallow: /
User-agent: Demo Bot Z 16b
Disallow: /
User-agent: DSurf15a 01
Disallow: /
User-agent: DSurf15a 71
Disallow: /
User-agent: DSurf15a 81
Disallow: /
User-agent: DSurf15a VA
Disallow: /
User-agent: EBrowse 1.4b
Disallow: /
User-agent: Educate Search VxB
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailSpider
Disallow: /
User-agent: EmailWolf 1.00
Disallow: /
User-agent: ESurf15a 15
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: Franklin Locator 1.8
Disallow: /
User-agent: FSurf15a 01
Disallow: /
User-agent: Full Web Bot 0416B
Disallow: /
User-agent: Full Web Bot 0516B
Disallow: /
User-agent: Full Web Bot 2816B
Disallow: /
User-agent: Guestbook Auto Submitter
Disallow: /
User-agent: Industry Program 1.0.x
Disallow: /
User-agent: infoConveraCrawler/0.8 ( http://www.authoritativeweb.com/crawl)
Disallow: /
User-agent: ISC Systems iRc Search 2.1
Disallow: /
User-agent: IUPUI Research Bot v 1.9a
Disallow: /
User-agent: LARBIN-EXPERIMENTAL (efp@gmx.net)
Disallow: /
User-agent: LetsCrawl.com/1.0 +http://letscrawl.com/
Disallow: /
User-agent: Lincoln State Web Browser
Disallow: /
User-agent: LMQueueBot/0.2
Disallow: /
User-agent: LWP::Simple/5.803
Disallow: /
User-agent: Mac Finder 1.0.xx
Disallow: /
User-agent: MFC Foundation Class Library 4.0
Disallow: /
User-agent: Microsoft URL Control - 6.00.8xxx
Disallow: /
User-agent: Missauga Locate 1.0.0
Disallow: /
User-agent: Missigua Locator 1.9
Disallow: /
User-agent: Missouri College Browse
Disallow: /
User-agent: Mizzu Labs 2.2
Disallow: /
User-agent: Mo College 1.9
Disallow: /
User-agent: Mozilla/2.0 (compatible; NEWT ActiveX; Win32)
Disallow: /
User-agent: Mozilla/3.0 (compatible)
Disallow: /
User-agent: Mozilla/3.0 (compatible; Indy Library)
Disallow: /
User-agent: Mozilla/3.0 (compatible; scan4mail (advanced version) http://www.peterspages.net/?scan4mail)
Disallow: /
User-agent: Mozilla/4.0 (compatible; Advanced Email Extractor v2.xx)
Disallow: /
User-agent: Mozilla/4.0 (compatible; Iplexx Spider/1.0 http://www.iplexx.at)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
Disallow: /
User-agent: Mozilla/4.0 efp@gmx.net
Disallow: /
User-agent: Mozilla/4.08 [en] (Win98; U ;Nav)
Disallow: /
User-agent: Mozilla/5.0 (Version: xxxx Type:xx)
Disallow: /
User-agent: MVAClient
Disallow: /
User-agent: NameOfAgent (CMS Spider)
Disallow: /
User-agent: NASA Search 1.0
Disallow: /
User-agent: Nsauditor/1.x
Disallow: /
User-agent: PBrowse 1.4b
Disallow: /
User-agent: PEval 1.4b
Disallow: /
User-agent: Poirot
Disallow: /
User-agent: Port Huron Labs
Disallow: /
User-agent: Production Bot 0116B
Disallow: /
User-agent: Production Bot 2016B
Disallow: /
User-agent: Production Bot DOT 3016B
Disallow: /
User-agent: Program Shareware 1.0.2
Disallow: /
User-agent: PSurf15a 11
Disallow: /
User-agent: PSurf15a 51
Disallow: /
User-agent: PSurf15a VA
Disallow: /
User-agent: psycheclone
Disallow: /
User-agent: RSurf15a 41
Disallow: /
User-agent: RSurf15a 51
Disallow: /
User-agent: RSurf15a 81
Disallow: /
User-agent: searchbot admin@google.com
Disallow: /
User-agent: ShablastBot 1.0
Disallow: /
User-agent: snap.com beta crawler v0
Disallow: /
User-agent: Snapbot/1.0
Disallow: /
User-agent: Snapbot/1.0 (Snap Shots, +http://www.snap.com)
Disallow: /
User-agent: sogou develop spider
Disallow: /
User-agent: Sogou Orion spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Disallow: /
User-agent: sogou spider
Disallow: /
User-agent: Sogou web spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Disallow: /
User-agent: sohu agent
Disallow: /
User-agent: SSurf15a 11
Disallow: /
User-agent: TSurf15a 11
Disallow: /
User-agent: Under the Rainbow 2.2
Disallow: /
User-agent: User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Disallow: /
User-agent: VadixBot
Disallow: /
User-agent: WebVulnCrawl.unknown/1.0 libwww-perl/5.803
Disallow: /
User-agent: Wells Search II
Disallow: /
User-agent: WEP Search 00
Disallow: /
# robotstxt_antibadbot end
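
Before deploying a list like this, it can be worth checking how a parser actually reads it. The sketch below is only an illustration: it uses Python's standard `urllib.robotparser` on a two-entry excerpt of the list above, and the helper name `is_allowed` is made up for the example. Other crawlers may match `User-agent` values differently, so this shows what one parser does, not what every bot will do.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the list above, kept small for the example.
ROBOTS_TXT = """\
# robotstxt_antibadbot begin
User-agent: EmailSiphon
Disallow: /
User-agent: psycheclone
Disallow: /
# robotstxt_antibadbot end
"""


def is_allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Hypothetical helper: return True if `user_agent` may fetch `path`
    according to Python's standard robots.txt parser."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("EmailSiphon", "psycheclone", "Googlebot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent) else "blocked"
        print(f"{agent}: {verdict}")
```

An agent with no matching entry (Googlebot here) stays allowed, because the list contains no catch-all `User-agent: *` rule.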

Svetlana, September 6, 2013

I had been looking for a list like this for a while. When you see how many bots hit a site, it is hard to separate the wheat from the chaff ;)

Jean C, November 6, 2020

Hello,

A late comment, admittedly, but I have only recently started looking into this.
I read on sites dedicated to robots.txt that this file only acts as a "No entry" sign, and cannot stop anyone who wants to from ignoring the warning and crawling the entire site.
If that is indeed the case, and if the bots listed above are classed as malicious, what would stop them from overriding these directives, or from simply ignoring the robots.txt file altogether?

Thank you for your reply.

    Nicolas, November 6, 2020

    Hello Jean,

    You are absolutely right, but some tools need this setting to be forced; it is not necessarily native behaviour. SEO bots such as ahrefs or semrush, which are public products, cannot analyze the site if this prohibition is in place. That stops your competitors from analyzing you, for example.

      Jean C, November 6, 2020

      Nicolas,

      Thank you for your reply.
      It confirms that the protection offered by robots.txt will only be effective against bots that are built to respect a behavioural convention. That is still better than nothing.

      Regards.

Marie-David Tihon, November 27, 2020

Hello,

Thanks for this list. You can also disallow the bots from Yandex, BaiduSpider, SeznamBot, BUbiNG and Cliqzbot.

Have a nice day!
