[ic] RobotUA Problems

Jamie Neil jamie at versado.net
Fri Mar 19 08:15:56 EST 2004


Hi all,

Just been doing some site optimisation for spiders (disabling "more" in 
search results etc.) and I've stumbled across a problem with the default 
robot detection settings.

RobotUA matches on substrings in the HTTP User Agent. This is fine for 
things like "Googlebot" or "Slurp", but I've noticed when trawling 
through the logs that some users have customised user agent strings 
after installing "branded" browsers or toolbars. A couple of examples:

	Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; AskBar 3.00; YPC 3.0.2; yplus 4.3.01b)

	Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; sureseeker.com; searchengine2000.com)

Both of these will match the default RobotUA list (on "Ask" and "seek" 
respectively) and so won't get a sessionid (which I assume means the 
basket won't work).
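
To make the false positives concrete, here is a minimal Python sketch of
substring-based matching. It is only a simplified model, not
Interchange's actual code: the term list is an illustrative subset taken
from the examples in this post, the function name is hypothetical, and
case-insensitive matching is an assumption.

```python
# Simplified model of substring-based robot detection.
# Assumptions: case-insensitive matching; the term list is only an
# illustrative subset, not the real default RobotUA list.
ROBOT_SUBSTRINGS = ["Googlebot", "Slurp", "Ask", "seek"]


def matched_robot_terms(user_agent, terms=ROBOT_SUBSTRINGS):
    """Return every term that occurs anywhere inside the user agent."""
    ua = user_agent.lower()
    return [term for term in terms if term.lower() in ua]


askbar_ua = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
             "AskBar 3.00; YPC 3.0.2; yplus 4.3.01b)")
sureseeker_ua = ("Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows 98; "
                 "sureseeker.com; searchengine2000.com)")

print(matched_robot_terms(askbar_ua))      # ['Ask']  (from "AskBar")
print(matched_robot_terms(sureseeker_ua))  # ['seek'] (from "sureseeker.com")
```

Both strings come from ordinary browsers, yet each contains one of the
robot terms as part of a longer word.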

I'm not sure whether this is a widespread problem, but searching through 
the usertrack log with:

	tail -n 100000 usertrack | grep 'nsession.*ADD'

showed up 7 users in the last week without a sessionid who tried to add 
stuff to the basket.

I've replaced "Ask" with "Ask?Jeeves?Teoma" (I assume spaces and / are 
not allowed so I've used wildcards), but I'm not sure what to do with 
the more generic matches like "seek" or "search".
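
As a rough check of the "Ask?Jeeves?Teoma" replacement, here is a
Python sketch. It assumes that "?" matches any single character, that
"*" matches any run of characters, and that matching is
case-insensitive; I haven't confirmed that this is exactly how RobotUA
treats wildcards. The helper name and the crawler user agent string are
illustrative.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regex.

    Assumed semantics: '?' is any one character, '*' is any run of
    characters, everything else is literal, case is ignored.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


ask_jeeves = wildcard_to_regex("Ask?Jeeves?Teoma")

# Illustrative crawler string and the toolbar string from this post.
crawler_ua = "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"
toolbar_ua = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
              "AskBar 3.00; YPC 3.0.2; yplus 4.3.01b)")

print(bool(ask_jeeves.search(crawler_ua)))  # True
print(bool(ask_jeeves.search(toolbar_ua)))  # False
```

Under those assumptions the longer pattern still catches the crawler
but no longer trips on "AskBar". The same approach only helps for
"seek" or "search" if the full robot names are known.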

-- 
Jamie Neil | <jamie at versado.net> | 0870 7777 454
Versado I.T. Services Ltd. | http://versado.net/ | 0845 450 1254

