[ic] Removing session ID from the URL

John Young john_young at sonic.net
Tue Jan 11 17:55:46 EST 2005


Grant wrote:
>>>>>It looks like IC usually appends a session ID to all links on the
>>>>>first page of the session.  If IC gets a session cookie back from the
>>>>>user after that, it keeps the session IDs out of the links.  Is there
>>>>>any way to keep IC from appending session IDs right away?
>>>>
>>>>If you don't want a session ID ever to appear in the URL, you can set this
>>>>in catalog.cfg:
>>>>
>>>>ScratchDefault  mv_no_session  1
>>>
>>>What a treat!  Thank you!  This means I don't have to maintain Robot*
>>>anymore if I can count on mv_pc not showing up.
>>(Mike wrote:)
>>Hmm.
>>
>>I don't think that is a safe statement -- a great deal of the benefit of
>>using Robot* is that there is no session assigned and you don't have
>>the hit on disk writing and storage.
> 
> 
> I didn't think of that.  Would a session be created even though the
> robot never returns a cookie and never includes an id in the request
> URL?  Actually, maybe a new session would be created every time a
> robot accesses a page with this behavior?


Depends on whether the bot maintains cookies. However, if you have
RobotUA and similar settings, and a visitor is identified as a bot,
then before IC does a bunch of back-end session work, it sorta falls
through a trap door (actually, mv_tmp_session is set). If you want
to see how much junk IC might be dealing with in a session, use
<pre>[dump]</pre> in a test page.


>>I guess I still don't understand the problem. Are you saying that
>>even when the UserAgent is seen as a robot, we are putting session ids
>>in?
> 
> 
> I'm saying normal users sometimes (second page of the session) have an
> id in the URL which creates a "page" that isn't in Google's index. 
> Google then can't display a targeted ad for that page.  Google knows
> /page.html but doesn't know /page.html?id=abcd and definitely sees
> them as two different pages as far as AdSense is concerned.


You might need to be more explicit about what happens on which server,
for those of us who do not use AdSense.

If it's a case where you need to make sure GoogleBot doesn't find
session IDs, then we've already talked about that (use RobotUA,
RobotHost, etc.).

If it's a case where you are displaying Google ads on your site,
and there is some mechanism (JavaScript) where your site sends the
current URL to Google in order for Google to send you relevant ads
for placement on your pages (I'm just guessing about AdSense operation,
here), then perhaps you could filter what is sent to Google to strip
out session id and page count variables.
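
A rough sketch of that kind of filter, in Python only to show the
logic (on a real page it would more likely be JavaScript, and I am
still guessing at how AdSense picks up the URL). The function name
strip_session_params is made up for illustration, and the parameter
names "id" and "mv_pc" are just the ones mentioned in this thread;
adjust them if your varnames aliases differ.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: "id" (session ID alias) and "mv_pc" (page count) are the
# query parameters to hide, as named in this thread.
SESSION_PARAMS = frozenset({"id", "mv_pc"})


def strip_session_params(url, drop=SESSION_PARAMS):
    """Return url with the session-related query parameters removed."""
    parts = urlsplit(url)
    # Keep every query pair whose name is not in the drop set,
    # preserving the original order and any blank values.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in drop
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_session_params("http://www.example.com/page.html?id=abcd&mv_pc=3"))
```

With this, /page.html?id=abcd and /page.html come out as the same URL,
while unrelated parameters are left alone.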


>>WRT the id= thing, you can remove that alias from the equation by
>>editing ICROOT/etc/vars.
> 
> 
> I think you mean ICROOT/etc/varnames, but very cool file.  Editing
> that wouldn't keep a session ID out of the URL though would it?


That's just a way to change variable aliases.


-John Young


