[ic] RobotUA

Jonathan Clark interchange-users@icdevgroup.org
Tue Nov 26 17:56:01 2002


> > I've had my RobotUA all set up for a few days, but examining my rotated
> > access_log files, the robots aren't getting any further than this:
> >
> > 66.196.65.16 - - [25/Nov/2002:18:30:41 -0800] "GET /robots.txt HTTP/1.0"
> > 200 0 "-" "Mozilla/3.0 (Slurp/si; slurp@inktomi.com;
> > http://www.inktomi.com/slurp.html)"
> > 66.196.65.16 - - [25/Nov/2002:18:30:42 -0800] "GET / HTTP/1.0" 301 330
> > "-" "Mozilla/3.0 (Slurp/si; slurp@inktomi.com;
> > http://www.inktomi.com/slurp.html)"
> >
> > Here's my RobotUA entry:
> >
> > RobotUA WebCrawler, BaiDuSpider, ZyBorg, almaden.ibm, Googlebot, Slurp,
> > Girafabot, ia_archiver, LinkWalker, MSIECrawler
> >
> > Has anyone verified that this directive really works to clean up the
> > URLs for spidering?
> >
> > - Grant
>
>
> Get a web browser that lets you change the User-Agent, such as Konqueror
> or even w3m, and turn off cookies. Then go to your web site and look at
> it. This will tell you whether your configuration is working as you
> expected.

Yep. I use the Sam Spade Windows app for this: www.samspade.org. It has a
'no frills' web browser which returns the raw response and lets you set the
UA. It's also really handy for watching cookies and redirect-type responses.
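
If no such browser is to hand, roughly the same check can be scripted. The
following is only a minimal sketch in Python using the standard http.client
module: it sends a single GET with a robot's User-Agent, sends no cookies,
and does not follow redirects, so the raw status and headers stay visible.
The function names probe and summarize are made up for this illustration,
and the User-Agent string is the Slurp one from the log lines quoted above.

```python
import http.client
from urllib.parse import urlsplit

# User-Agent string copied from the access_log lines quoted in this thread.
SLURP_UA = ("Mozilla/3.0 (Slurp/si; slurp@inktomi.com; "
            "http://www.inktomi.com/slurp.html)")


def probe(url, user_agent=SLURP_UA, timeout=10):
    """Send one GET as `user_agent`, with no cookies and no redirect following.

    Hypothetical helper for illustration. Returns (status, headers), where
    headers is a list of (name, value) pairs exactly as the server sent them.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port,
                                           timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port,
                                          timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # http.client never stores or sends cookies on its own, and it does
        # not follow 3xx responses, which is what we want for this check.
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return resp.status, resp.getheaders()
    finally:
        conn.close()


def summarize(status, headers):
    """Pick out the status, any redirect target, and any cookies being set."""
    found = {"status": status, "location": None, "cookies": []}
    for name, value in headers:
        if name.lower() == "location":
            found["location"] = value
        elif name.lower() == "set-cookie":
            found["cookies"].append(value)
    return found
```

Calling summarize(*probe("http://www.example.com/")) (a placeholder URL,
substitute your own catalog) shows what a robot would get back. If the
redirect target still carries a session ID, or a cookie is being set, the
User-Agent is probably not being matched by the RobotUA list.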

Jonathan.