[ic] Just upgraded 4.8.9->5.2 - RobotUA question

Jon Jensen jon at endpoint.com
Mon Dec 13 14:15:05 EST 2004


On Sun, 12 Dec 2004, DB wrote:

> I just upgraded my foundation-based catalog from 4.8.9 to 5.2.0. I followed 
> the UPGRADE file instructions and things went pretty smoothly. My main reason 
> for the upgrade was to take advantage of the RobotUA feature.
>
> After the upgrade, I added the section below to the end of my 
> interchange.cfg; however, I still see entries like this in my Apache access_log:
>
> "GET /unlisted.html?id=gAW3nswb HTTP/1.0" 200 17202 "-" "ia_archiver"
> "GET /helpfaq.html?id=SRvEvzVq HTTP/1.0" 200 32017 "-" "msnbot/0.3 
> (+http://search.msn.com/msnbot.htm)"
>
> Now I thought the RobotUA prevented spiders from obtaining session ids? Am I 
> confused, or can someone tell me why these spiders still appear to be 
> obtaining session ids?

Are you sure that they're still obtaining session IDs? All those log 
entries tell you is that they're successfully spidering URLs that have 
session IDs already in them. Most likely their index of your site 
already includes hundreds of URLs with embedded session IDs, and they'll 
keep spidering those, getting results, and thinking everything's fine.

The change you made says that they won't be issued a session ID, which is 
probably working. But it can't purge their old indexes. Perhaps some 
spiders eventually stop polling old addresses that aren't linked any 
longer, but I don't have any evidence of that.

If you want to be sure, do something like:

GET -H 'User-agent: ia_archiver' http://yoururl

And look for session IDs in the URLs you get back on that page.
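
The same check can be scripted if you want to repeat it for several user 
agents. What follows is only a rough Python sketch, not anything that ships 
with Interchange. The helper names fetch_as and links_with_session_id are 
made up for illustration, and it assumes the session ID shows up as an "id" 
query parameter, as in the log lines quoted above.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_with_session_id(html, param="id"):
    """Return the links whose query string carries the given parameter.

    The default "id" is an assumption taken from the access_log entries.
    """
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links
            if param in parse_qs(urlparse(link).query)]


def fetch_as(url, user_agent):
    """Fetch a page while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling links_with_session_id(fetch_as("http://yoururl", "ia_archiver")) 
should give an empty list if the spider is no longer being handed session 
IDs. A non-empty list shows which links still carry one.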

Jon

-- 
Jon Jensen
End Point Corporation
http://www.endpoint.com/
Software development with Interchange, Perl, PostgreSQL, Apache, Linux, ...

