[interchange-bugs] [interchange-core] [rt.icdevgroup.org #344] 80legs webcrawler not recognized as Robot due to NotRobotUA

Kristen Eisenberg kristen.eisenberg at yahoo.com
Sat Oct 1 14:58:40 UTC 2011


<URL: http://rt.icdevgroup.org/Ticket/Display.html?id=344 >

On 03/02/2011 03:15 PM, David Christensen wrote:
>> The 80legs webcrawler identifies itself as:
>>
>> Mozilla/5.0 (compatible; 008/0.83; http://www.80legs.com/webcrawler.html) Gecko/2008032620
>>
>> Because of the NotRobotUA entry 'Gecko', this crawler is not recognized as a robot.
>>
>> Blocking via RobotIP will not work either, as the crawler operates through a distributed network of IPs ... so it will crawl the site and create a pile of session IDs from many different IP addresses.
>
> Yeah, I've been reconsidering the NotRobotUA change.  I like it in principle, but then you end up with cases like this.  Short of a JustKiddingThisIsReallyARobotUA directive, I'm not sure how to handle this generally; it starts to feel like an arms race.  In the general case we'd rather users always be able to have a session/checkout, so cases like this end up as exceptions we handle individually.
>
> Perhaps a suitable negative lookahead/lookbehind pattern would help in this specific case.  I'm also open to other ideas.
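
For discussion, here is a rough Perl sketch of that negative-lookahead idea: only treat 'Gecko' as a browser marker when the user agent does not also mention 80legs. This is a standalone illustration of the regex technique only; the pattern, script, and variable names are assumptions for the sake of the example, not actual Interchange configuration or internals.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Two sample user agents: the 80legs crawler and an ordinary browser.
    my @agents = (
        'Mozilla/5.0 (compatible; 008/0.83; http://www.80legs.com/webcrawler.html) Gecko/2008032620',
        'Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0',
    );

    # 'Gecko' counts as a not-a-robot marker only if the UA does not
    # mention 80legs.com anywhere (negative lookahead anchored at the start).
    my $not_robot = qr/^(?!.*80legs\.com).*Gecko/i;

    for my $ua (@agents) {
        if ($ua =~ $not_robot) {
            print "browser (would match NotRobotUA): $ua\n";
        }
        else {
            print "possible robot: $ua\n";
        }
    }

If something along these lines can be expressed in the NotRobotUA pattern, it would cover this crawler without a new directive, though it still has the arms-race flavour mentioned above.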


Kristen Eisenberg
Billige Flüge Marketing GmbH
Emanuelstr. 3, 10317 Berlin, Germany
Phone: +49 (33) 5310967
Email: utebachmeier at gmail.com
Site: http://flug.airego.de - Compare cheap flights

