[ic] Inktomi/Yahoo Search Engine Results include Session IDs

Kevin Walsh kevin at cursor.biz
Fri Feb 20 21:08:08 EST 2004


Gary Norton [gnorton at broadgap.com] wrote:
> I was wondering if anyone else was experiencing any problems with this as
> well. 
> 
> To illustrate, if you go to yahoo and search for "Toyota lift kits"
> (http://search.yahoo.com/search?p=toyota+lift+kits&ei=UTF-8&fr=fp-tab-web-t&n=20&fl=0&x=wrt)
> 
> And look at the current #2 listing (suspensionconnection.com) you will
> notice that the session id has been indexed.
> 
> If you go even further and click "More pages from this site"
> (http://search.yahoo.com/search?p=toyota+lift+kits&ei=UTF-8&n=20&fl=0&fr=fp-tab-web-t&vst=0&vs=www.suspensionconnection.com)
> 
> It will display "TOP 20 WEB RESULTS out of about 15,700".  Altogether
> this site should have fewer than 3,000 pages.  If you look at many of the
> links, you can find the same page listed several times with a different
> session ID.
> 
My guess is that you have upgraded to Interchange 5 from 4.8 or lower,
and that these entries are artifacts of earlier spider runs.  When it
identifies a spider, Interchange 5 (and some 4.9.x releases) does not
encode session IDs into the URI arguments, so the index entries come
out clean.  Interchange 4.8 and earlier had no spider-trap code at all.
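As a rough illustration of the spider trap described above, here is a minimal Python sketch. Interchange itself is written in Perl, and this is not its code. The robot substring list, the function names, and the `id` query argument are all assumptions made for the example. The sketch is a link builder that leaves the session ID out of generated URLs whenever the visitor looks like a spider.

```python
from urllib.parse import urlencode

# Hypothetical User-Agent substrings treated as robots; Interchange's real
# spider detection is configured differently, this list is illustrative only.
ROBOT_UA_SUBSTRINGS = ("slurp", "googlebot", "msnbot")


def is_robot(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known robot substring."""
    ua = user_agent.lower()
    return any(token in ua for token in ROBOT_UA_SUBSTRINGS)


def build_link(base: str, page: str, session_id: str, user_agent: str) -> str:
    """Build a page link, appending the session ID only for human visitors.

    The query argument name "id" is an assumption for this sketch.
    """
    url = f"{base}/{page}"
    if is_robot(user_agent):
        # Spider: emit a clean URL so the index entry carries no session ID.
        return url
    # Ordinary browser: keep the session in the URL (e.g. cookies refused).
    return url + "?" + urlencode({"id": session_id})
```

For a User-Agent containing "Slurp", `build_link("http://www.example.com/cgi-bin/store", "lift-kits.html", "AbC123", ...)` yields the bare `.../lift-kits.html` link. For an ordinary browser it appends `?id=AbC123`.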

If a search engine already has a URI with a session ID in its index,
it will check whether that URI is still valid.  To do this, it simply
requests the page as part of its crawl.  Interchange happily serves
the page, so the search engine assumes the index entry is correct.
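The revalidation loop can be modeled roughly as follows. This is a simplified, assumed model of crawler behavior, not any search engine's documented algorithm. The names `revalidate` and `serve_any_session` are hypothetical. The model shows why a store that answers every stale session URL with a 200 keeps those entries alive indefinitely.

```python
from typing import Callable, Optional, Tuple

# A fetch function returns (HTTP status, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def revalidate(indexed_url: str, fetch: Fetch) -> Optional[str]:
    """Simplified model of how a crawler might refresh one index entry.

    Returns the URL to keep in the index, or None to drop the entry.
    This is an assumed model, not any search engine's documented behavior.
    """
    status, location = fetch(indexed_url)
    if status == 200:
        # Page still served: the (session-laden) URL is assumed valid.
        return indexed_url
    if status == 301 and location:
        # Permanent redirect: replace the entry with the target URL.
        return location
    # 404, 410 and anything else: treat the entry as dead.
    return None


def serve_any_session(url: str) -> Tuple[int, Optional[str]]:
    """Stand-in for a store that happily serves pages with stale session IDs."""
    return 200, None
```

Passing `serve_any_session` returns the stale URL unchanged, which is the situation described above. A 301 or 404 response is what would let the entry be replaced or dropped.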

It is relatively easy to purge these stale entries from the search
engine indexes with a small change to the Interchange core, so that
requests for session-laden URIs no longer look valid to the crawler.
Once your website has been re-crawled (perhaps a month later) and the
indexes are clean, the extra core code can be removed.
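The message does not spell out what the core change is. One plausible approach is to answer a robot's request for a session-laden URL with a permanent (301) redirect to the same URL minus its session arguments. It is sketched here in Python only to show the logic. The argument names in `SESSION_PARAMS`, the robot list, and the function names are assumptions, not Interchange's actual code.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of URL arguments that carry session state; check the links
# your own catalog generates before relying on these.
SESSION_PARAMS = {"id", "mv_session_id", "mv_pc"}
ROBOT_UA_SUBSTRINGS = ("slurp", "googlebot", "msnbot")  # hypothetical list


def has_session_args(url: str) -> bool:
    """True if any query argument of the URL carries session state."""
    query = urlsplit(url).query
    return any(k in SESSION_PARAMS
               for k, _ in parse_qsl(query, keep_blank_values=True))


def strip_session(url: str) -> str:
    """Return the URL with any session-carrying query arguments removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def handle_request(url: str, user_agent: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for an incoming request.

    Robots asking for a session-laden URL get a 301 to the clean URL, so
    the search engine can replace its stale index entry. Everyone else is
    served normally.
    """
    ua = user_agent.lower()
    robot = any(token in ua for token in ROBOT_UA_SUBSTRINGS)
    if robot and has_session_args(url):
        return 301, strip_session(url)
    return 200, None
```

Returning 404 or 410 instead would also drop the stale entries. A redirect, however, lets the engine keep the page under its clean URL. As noted above, code like this is only needed until the indexes have been re-crawled.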

At least with Interchange 5 you will not see any new session IDs in
the indexes.  Google, of course, is more sensible and tends not to
follow URIs with arguments at all.




More information about the interchange-users mailing list