[ic] how do I stop Google from trying to index scan pages?

IC ic at tvcables.co.uk
Thu Sep 18 23:21:01 UTC 2008


> Hi list,
> 
> IC 5.4.2, Perl 5.8.8, old Construct cat on CentOS 4.7
> 
> My client has over 15,000 products, but Google only indexes about 400 of
> them. The last 4 pages in the Google index are scans. It SEEMS like after
> hitting 4 scan pages, Google stops and turns away (probably because the page
> content appears to be similar). I make changes to pages and robots.txt, then
> wait a few days to a week to see the new Google ranking. I have a lot of
> respect for Google and always spell it with a capital 'G', but I still have
> this problem. ;-)
> 
> I've searched the archives, but I can't find the precise info I need to
> change my robots.txt to stop these pages from being indexed. This is an
> example of the pages in question:
> 
> http://www.my-domain.com/cgi-bin/storeabc/scan/fi=products/st=db/sf=category/se=DVD%20Video/ml=16/tf=description.html
> 
> I have RobotUA, RobotHost, and RobotIP settings in catalog.cfg. I have a
> robots.txt file in my httpdocs directory, with entries like this (among
> others):
> 
> User-agent: Googlebot
> Disallow: /*?
> 
> User-agent: *
> Disallow: /storeabc/scan
> Disallow: /scan
> Disallow: /storeabc/process
> Disallow: /process
> Disallow: /cgi-bin/storeabc/process
> Disallow: /cgi-bin/storeabc/scan/
> Disallow: /cgi-bin/storeabc/search
> Disallow: /cgi-bin/storeabc/pages/process
> Disallow: /cgi-bin/storeabc/pages/scan/
> Disallow: /cgi-bin/storeabc/pages/search
> 
> I really just want the flypages, like this, to be ranked:
> 
> http://www.my-domain.com/cgi-bin/storeabc/sku12345.html
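One way to check what these rules actually block is Python's standard urllib.robotparser, used here as a rough stand-in for Googlebot. That is an assumption: robotparser does plain prefix matching and ignores Google's * and $ wildcards. It does show one likely culprit. Google's documentation says a crawler obeys only the most specific User-agent group that matches it, so Googlebot reads the "User-agent: Googlebot" group and skips every rule under "User-agent: *". The scan URL contains no "?", so "Disallow: /*?" does not catch it either. The agent name OtherBot, the helper allowed() and the FIXED file are hypothetical, for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above (the "among others" entries are omitted).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /*?

User-agent: *
Disallow: /storeabc/scan
Disallow: /scan
Disallow: /storeabc/process
Disallow: /process
Disallow: /cgi-bin/storeabc/process
Disallow: /cgi-bin/storeabc/scan/
Disallow: /cgi-bin/storeabc/search
Disallow: /cgi-bin/storeabc/pages/process
Disallow: /cgi-bin/storeabc/pages/scan/
Disallow: /cgi-bin/storeabc/pages/search
"""

SCAN_URL = ("http://www.my-domain.com/cgi-bin/storeabc/scan/fi=products/"
            "st=db/sf=category/se=DVD%20Video/ml=16/tf=description.html")
FLY_URL = "http://www.my-domain.com/cgi-bin/storeabc/sku12345.html"


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Googlebot matches its own group first, so the "*" rules never apply to it.
print("Googlebot, scan page:", allowed(ROBOTS_TXT, "Googlebot", SCAN_URL))
# Any other crawler falls through to the "*" group and is blocked.
print("OtherBot,  scan page:", allowed(ROBOTS_TXT, "OtherBot", SCAN_URL))

# Hypothetical fix: repeat the scan rule inside the Googlebot group.
FIXED = ROBOTS_TXT.replace(
    "Disallow: /*?\n",
    "Disallow: /*?\nDisallow: /cgi-bin/storeabc/scan/\n",
)
print("Googlebot, scan page (fixed):", allowed(FIXED, "Googlebot", SCAN_URL))
print("Googlebot, flypage (fixed):  ", allowed(FIXED, "Googlebot", FLY_URL))
```

With the original file, Googlebot is allowed to fetch the scan URL while other crawlers are not. With the scan rule repeated under the Googlebot group, the scan URL is blocked and the flypage stays crawlable. The other scan, search and process lines from the "*" group would need to be copied into the Googlebot group the same way.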
> 
> Any tips, pointers, ideas, ridicule?
> 
> 
> Curt Hauge


You could use [env REQUEST_URI] to put the URI in a variable, then use a bit
of Perl at the top of the page to search it for /scan/. If you find /scan/,
set the robots meta tag for the page to:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> 
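The check described above can be sketched in Python for illustration. In Interchange itself it would be Perl or ITL reading [env REQUEST_URI]. The function name robots_meta is hypothetical. Matching "/scan/" with a slash on both sides avoids catching a flypage whose file name merely contains "scan".

```python
# The exact tag quoted above.
NOINDEX_TAG = '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'


def robots_meta(request_uri: str) -> str:
    """Return the robots meta tag for scan result pages, else an empty string.

    request_uri stands in for the value of Interchange's [env REQUEST_URI].
    """
    # Both slashes are required, so "scan" must be a whole path segment;
    # a flypage such as /cgi-bin/storeabc/scanner.html is not matched.
    if "/scan/" in request_uri:
        return NOINDEX_TAG
    return ""


print(robots_meta("/cgi-bin/storeabc/scan/fi=products/st=db/sf=category"
                  "/se=DVD%20Video/ml=16/tf=description.html"))
print(repr(robots_meta("/cgi-bin/storeabc/sku12345.html")))  # prints ''
```

Two caveats, assuming Googlebot behaves as Google documents it. A URL that robots.txt blocks is never fetched, so Googlebot never sees that page's meta tag. NOFOLLOW also stops it following links from scan pages to the flypages, so "NOINDEX, FOLLOW" may suit better if the scan pages are how the flypages get discovered.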

Andy.




More information about the interchange-users mailing list