[ic] Just upgraded 4.8.9->5.2 - RobotUA question

DB DB at M-and-D.com
Sun Dec 12 12:57:14 EST 2004


I just upgraded my foundation-based catalog from 4.8.9 to 5.2.0. I 
followed the UPGRADE file instructions and things went pretty smoothly. 
My main reason for the upgrade was to take advantage of the RobotUA feature.

After the upgrade, I added the section below to the end of my 
interchange.cfg; however, I still see entries like this in my Apache 
access_log:

"GET /unlisted.html?id=gAW3nswb HTTP/1.0" 200 17202 "-" "ia_archiver"
"GET /helpfaq.html?id=SRvEvzVq HTTP/1.0" 200 32017 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"

I thought RobotUA prevented spiders from obtaining session ids. Am I 
confused, or can someone tell me why these spiders still appear to be 
obtaining session ids?

Here's what I added to my interchange.cfg, and yes I did restart :) 
Thanks for any input. - DB


# Robots stuff - 12/12/2004

RobotUA <<EOR
     ATN_Worldwide, AltaVista, Arachnoidea, Aranha, Architext, Ask, Atomz,
     BackRub, Builder, CMC, Contact, Digital*Integrity, Directory, EZResult,
     Excite, Ferret, Fireball, Google, Gromit, Gulliver, Harvest, Hubater,
     H?m?h?kki, INGRID, IncyWincy, Jack, KIT*Fireball, Kototoi, LWP, Lycos,
     MegaSheep, Mercator, Nazilla, NetMechanic, NetResearchServer, NetScoop,
     ParaSite, Refiner, RoboDude, Rover, Rutgers, Scooter, Slurp, Spyder,
     T-H-U-N-D-E-R-S-T-O-N-E, Toutatis, Tv*Merc, Valkyrie, Voyager, WIRE,
     Walker, Wget, WhizBang, Wire, Wombat, Yahoo, Yandex, ZyBorg, appie,
     asterias, bot, contact, crawl, collector, fido, find, gazz, grabber,
     griffon, archiver, legs, marvin, mirago, moget, newscan, seek, speedy,
     spider, suke, tarantula, agent, topiclink, whowhere, winona, worm,
     xtreme,
     ia_archiver
EOR

RobotIP <<EOR
     202.9.155.123,      204.152.191.41,         208.146.26.19,
     208.146.26.233,     209.185.141.209,        209.185.141.211,
     209.202.148.36,     209.202.148.41,         216.200.130.207,
     216.35.103.6?,      216.35.103.70,          66.196.65.??,
     209.237.238.173,
EOR

RobotHost <<EOR
     *.crawler*.com,     *.excite.com,           *.googlebot.com,
     *.infoseek.com,     *.inktomi.com,          *.inktomisearch.com,
     *.lycos.com,        *.pa-x.dec.com,         add-url.altavista.com,
     westinghouse-rsl-com-usa.NorthRoyalton.cw.net,
EOR
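
One way to rule out the list itself is to check the two User-Agent strings from the access_log against it outside of Interchange. The sketch below is only that: it assumes RobotUA entries are comma-separated, case-insensitive wildcard patterns ('*' for any run of characters, '?' for a single character) matched anywhere in the User-Agent string. That matching rule is an assumption, not something confirmed from the Interchange source. The helper name compile_wildcard_list and the MSIE browser string are made up for illustration, and ROBOT_UA is a trimmed copy of the list above.

```python
import re

# Trimmed copy of a few entries from the RobotUA list above.
ROBOT_UA = "Google, Slurp, bot, crawl, archiver, spider, ia_archiver, H?m?h?kki, KIT*Fireball"


def compile_wildcard_list(raw):
    """Build one case-insensitive regex from a comma-separated wildcard list.

    ASSUMPTION: '*' means any run of characters, '?' means any single
    character, and a hit anywhere in the User-Agent counts as a match.
    """
    parts = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty entries left by trailing commas
        escaped = re.escape(item)
        # Turn the escaped wildcards back into regex equivalents.
        escaped = escaped.replace(r"\*", ".*").replace(r"\?", ".")
        parts.append(escaped)
    return re.compile("|".join(parts), re.IGNORECASE)


robot_re = compile_wildcard_list(ROBOT_UA)

agents = [
    "ia_archiver",                                          # from the access_log
    "msnbot/0.3 (+http://search.msn.com/msnbot.htm)",       # from the access_log
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",   # hypothetical browser
]

for ua in agents:
    print(ua, "->", "robot" if robot_re.search(ua) else "not a robot")
```

Under that assumed matching rule, both logged agents hit the list ("archiver" and "bot"), which would suggest looking at whether the directive is being read and applied rather than at the entries themselves.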


