[ic] Just upgraded 4.8.9->5.2 - RobotUA question
Jon Jensen
jon at endpoint.com
Mon Dec 13 14:15:05 EST 2004
On Sun, 12 Dec 2004, DB wrote:
> I just upgraded my foundation-based catalog from 4.8.9 to 5.2.0. I followed
> the UPGRADE file instructions and things went pretty smoothly. My main reason
> for the upgrade was to take advantage of the RobotUA feature.
>
> After the upgrade, I added the section below to the end of my
> interchange.cfg; however, I still see entries like this in my Apache access_log:
>
> "GET /unlisted.html?id=gAW3nswb HTTP/1.0" 200 17202 "-" "ia_archiver"
> "GET /helpfaq.html?id=SRvEvzVq HTTP/1.0" 200 32017 "-" "msnbot/0.3
> (+http://search.msn.com/msnbot.htm)"
>
> Now I thought the RobotUA feature prevented spiders from obtaining session
> IDs? Am I confused, or can someone tell me why these spiders appear to still
> be obtaining session IDs?
Are you sure that they're still obtaining session IDs? All those log
entries tell you is that they're successfully spidering URLs that have
session IDs already in them. Most likely their index of your site
already includes hundreds of URLs with embedded session IDs, and they'll
keep spidering those, getting results, and thinking everything's fine.
The change you made says that they won't be issued a session ID, which is
probably working. But it can't purge their old indexes. Perhaps some
spiders eventually stop polling old addresses that aren't linked any
longer, but I don't have any evidence of that.
If you want to be sure, do something like:
GET -H 'User-agent: ia_archiver' http://yoururl
And look for session IDs in the URLs you get back on that page.
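If the GET command isn't handy, roughly the same check can be sketched in Python. This is only an illustration, not anything shipped with Interchange: the helper names (fetch_as, links_with_session_id) are made up for the example, and it assumes the session ID appears as an "id" query parameter, as in the log lines quoted above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs
import urllib.request


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_with_session_id(html, param="id"):
    """Return the hrefs whose query string carries `param`.

    Assumption: the session ID travels in the "id" query parameter,
    as in the access_log entries quoted in this thread.
    """
    collector = LinkCollector()
    collector.feed(html)
    return [h for h in collector.hrefs if param in parse_qs(urlparse(h).query)]


def fetch_as(url, user_agent="ia_archiver"):
    """Fetch `url` while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling links_with_session_id(fetch_as("http://yoururl")) and getting back an empty list would suggest that the page served to that user agent carries no session IDs in its links. Running it again with an ordinary browser User-Agent string gives you something to compare against.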
Jon
--
Jon Jensen
End Point Corporation
http://www.endpoint.com/
Software development with Interchange, Perl, PostgreSQL, Apache, Linux, ...