[ic] Just upgraded 4.8.9->5.2 - RobotUA question

DB DB at M-and-D.com
Mon Dec 13 21:09:45 EST 2004


> On Sun, 12 Dec 2004, DB wrote:
> 
>> I just upgraded my foundation-based catalog from 4.8.9 to 5.2.0. I followed 
>> the UPGRADE file instructions and things went pretty smoothly. My main reason 
>> for the upgrade was to take advantage of the RobotUA feature.
>>
>> After the upgrade, I added the section below to the end of my 
>> interchange.cfg, however I still see entries like this in my Apache access_log:
>>
>> "GET /unlisted.html?id=gAW3nswb HTTP/1.0" 200 17202 "-" "ia_archiver"
>> "GET /helpfaq.html?id=SRvEvzVq HTTP/1.0" 200 32017 "-" "msnbot/0.3 
>> (+http://search.msn.com/msnbot.htm)"
>>
>> Now I thought RobotUA prevented spiders from obtaining session ids? Am I 
>> confused, or can someone tell me why these spiders still appear to be 
>> obtaining session ids?
> 
> Are you sure that they're still obtaining session IDs? All those log 
> entries tell you is that they're successfully spidering URLs that have 
> session IDs already in them. Most likely their index of your site 
> already includes hundreds of URLs with embedded session IDs, and they'll 
> keep spidering those, getting results, and thinking everything's fine.
> 
> The change you made says that they won't be issued a session ID, which is 
> probably working. But it can't purge their old indexes. Perhaps some 
> spiders eventually stop polling old addresses that aren't linked any 
> longer, but I don't have any evidence of that.
> 
> If you want to be sure, do something like:
> 
> GET -H 'User-agent: ia_archiver' http://yoururl
> 
> And look for session IDs in the URLs you get back on that page.
> 
> Jon

Hmm, could be. How would I use that GET statement? In a Perl script? 
I'm not familiar with the syntax.

DB
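
For reference, the GET in the quoted suggestion is the command-line tool
that installs with Perl's libwww-perl (LWP) package, so it is typed at a
shell prompt rather than placed in a script. If that tool is not
available, the same check can be sketched in a few lines of Python. This
is only a rough sketch: the helper names (fetch_as, links_with_session_id)
are made up for illustration, and it assumes the session ID shows up as an
"id" query parameter, as in the access_log lines above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_with_session_id(html, param="id"):
    """Return links whose query string carries `param`.

    Assumption: the session ID is passed as ?id=..., as in the log
    lines quoted above. Pass a different name if your site differs.
    """
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links
            if param in parse_qs(urlparse(link).query)]


def fetch_as(url, user_agent):
    """Fetch `url` while sending the given User-Agent header."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

To try it, call links_with_session_id(fetch_as("http://yoururl",
"ia_archiver")) with your own catalog URL in place of the placeholder. An
empty list would suggest the spider user agent is no longer being handed
session IDs in the links on that page; any links that do come back are the
ones to look at.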


