[ic] Just upgraded 4.8.9->5.2 - RobotUA question
DB
DB at M-and-D.com
Mon Dec 13 21:09:45 EST 2004
> On Sun, 12 Dec 2004, DB wrote:
>
>> I just upgraded my Foundation-based catalog from 4.8.9 to 5.2.0. I followed
>> the UPGRADE file instructions and things went pretty smoothly. My main reason
>> for the upgrade was to take advantage of the RobotUA feature.
>>
>> After the upgrade, I added the section below to the end of my
>> interchange.cfg; however, I still see entries like this in my Apache access_log:
>>
>> "GET /unlisted.html?id=gAW3nswb HTTP/1.0" 200 17202 "-" "ia_archiver"
>> "GET /helpfaq.html?id=SRvEvzVq HTTP/1.0" 200 32017 "-" "msnbot/0.3
>> (+http://search.msn.com/msnbot.htm)"
>>
>> I thought RobotUA prevented spiders from obtaining session IDs. Am I
>> confused, or can someone tell me why these spiders still appear to be
>> obtaining session IDs?
>
> Are you sure that they're still obtaining session IDs? All those log
> entries tell you is that they're successfully spidering URLs that have
> session IDs already in them. Most likely their index of your site
> already includes hundreds of URLs with embedded session IDs, and they'll
> keep spidering those, getting results, and thinking everything's fine.
>
> The change you made says that they won't be issued a session ID, which is
> probably working. But it can't purge their old indexes. Perhaps some
> spiders eventually stop polling old addresses that aren't linked any
> longer, but I don't have any evidence of that.
>
> If you want to be sure, do something like:
>
> GET -H 'User-agent: ia_archiver' http://yoururl
>
> And look for session IDs in the URLs you get back on that page.
>
> Jon
Hmm, could be. How would I use that GET statement? In a Perl script?
I'm not familiar with the syntax.
DB
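
For context on the question above: GET is a command-line program installed with libwww-perl (an alias for lwp-request), so it is run from a shell prompt rather than from inside a Perl script. Where it is not installed, the same check can be sketched in a few lines of Python. This is only a rough illustration, not part of Interchange: the function names are made up for the example, and the pattern assumes session IDs look like the 8-character alphanumeric id= values in the log lines quoted above.

```python
import re
import urllib.request

# Assumption: session IDs appear as an "id" query parameter holding
# 8 alphanumeric characters, as in "?id=gAW3nswb" from the access_log.
SESSION_ID = re.compile(r'[?&;]id=([A-Za-z0-9]{8})\b')


def find_session_ids(html):
    """Return the distinct session IDs found in href attributes of html."""
    ids = []
    hrefs = re.findall(r'href\s*=\s*["\']([^"\']+)["\']', html, re.IGNORECASE)
    for href in hrefs:
        for sid in SESSION_ID.findall(href):
            if sid not in ids:
                ids.append(sid)
    return ids


def fetch_as(url, user_agent):
    """Fetch url while sending the given User-Agent header (hypothetical helper)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling find_session_ids(fetch_as("http://yoururl", "ia_archiver")) and getting back an empty list would suggest that links served to that user agent no longer carry session IDs; a non-empty list would suggest the RobotUA setting is not matching it.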