[ic] how do I stop Google from trying to index scan pages?

Steve Graham icdev at mrlock.com
Thu Sep 18 21:04:12 UTC 2008


> I managed it like this:
>
> http://www.kynsitukku.hertell.com/robots.txt
> robots.txt
> User-agent: *
> Disallow: /admin/
> Disallow: /ord/
> Disallow: /query/
> Disallow: /scan/
> Disallow: /account.html
> Disallow: /process.html
> Disallow: /search.html
>
> User-agent: Googlebot-Image
> Disallow: /
>
>
> And then I created a sitemap that I submitted via the Google Webmaster
> Tools page. The beginning of the sitemap is static (normal pages like
> about, contact, etc.), and the rest I generated with this simple query:
> http://shop.kynsitukku.hertell.com/sitemap.xml
>
> [query
> list=1
> ml=9999
> sql="
> select *
> from products_fi_FI
> where inactive <> '1'
> "
> ]<url>
>  <loc>http://www.kynsitukku.hertell.com/[sql-code].html</loc>
>  <priority>0.5</priority>
>  <changefreq>daily</changefreq>
> </url>
> [/query]
>
> To get rid of inactive products completely, I added this to the top of
> my flypage:
>
> [if-item-field inactive]
> [tmp page_title][msg arg.0="[item-code]"]Sorry, the page (%s) was not
> found[/msg][/tmp]
> [tag op=header]
> Status: 404 Not found
> Content-type: text/html
> [/tag]
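
For anyone who wants to check rules like the quoted robots.txt before
relying on them, here is a minimal Python sketch using the standard
library's urllib.robotparser. It is not part of Rene's setup. The rules
are copied from the quote above; the helper names (build_parser,
is_allowed) and the sample paths are made up for illustration, and
Python's parser only approximates how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the quoted robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /ord/
Disallow: /query/
Disallow: /scan/
Disallow: /account.html
Disallow: /process.html
Disallow: /search.html

User-agent: Googlebot-Image
Disallow: /
"""

BASE_URL = "http://www.kynsitukku.hertell.com"


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, agent, path):
    """Return True if `agent` may fetch `path` under the parsed rules."""
    return parser.can_fetch(agent, BASE_URL + path)


PARSER = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    # Sample paths are hypothetical; substitute real URLs from your catalog.
    for agent, path in [
        ("Googlebot", "/scan/example.html"),
        ("Googlebot", "/index.html"),
        ("Googlebot-Image", "/index.html"),
    ]:
        print(agent, path, is_allowed(PARSER, agent, path))
```

Feeding the rules in as text keeps the check offline, so a draft
robots.txt can be tested before it is uploaded.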

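The sitemap query in the quote can also be mirrored outside Interchange.
The following is only a rough Python sketch of the same idea, not Rene's
method: it swaps the [query] tag for a plain list of product records.
The function name build_sitemap, the record layout (dicts with "code"
and "inactive" keys), and the sample codes are assumptions. The urlset
wrapper with the sitemaps.org namespace is what a sitemap file needs
around the <url> entries.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(products, base_url):
    """Build sitemap XML from product records, skipping inactive ones.

    `products` is a hypothetical stand-in for the rows the quoted SQL
    query returns: dicts with a "code" key and an optional "inactive" key.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for product in products:
        # Same filter as the quoted query: where inactive <> '1'
        if product.get("inactive") == "1":
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url}/{product['code']}.html"
        # Values taken from the quoted sitemap fragment.
        ET.SubElement(url, "priority").text = "0.5"
        ET.SubElement(url, "changefreq").text = "daily"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Sample product codes are invented for the example.
    sample = [
        {"code": "A100", "inactive": "0"},
        {"code": "B200", "inactive": "1"},
        {"code": "C300"},
    ]
    print(build_sitemap(sample, "http://www.kynsitukku.hertell.com"))
```

Leaving inactive products out of the sitemap complements the 404
response in the flypage: crawlers are neither pointed at those pages nor
given content for them.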

Very nice, Rene,

I wish I had made my catalog a root installation like you did. It's a
little too late for that now, but I appreciate the code.

Steve 