[ic] Spiders and more lists again
interchange at hertell.com
Fri Sep 3 10:24:06 UTC 2010
On 3.9.2010 9:35, Stefan Hornburg (Racke) wrote:
> On 09/02/2010 04:10 PM, DB wrote:
>> This has been discussed before. Spiders crawling "more list" links
>> sometimes put a significant load on my server. I'm trying to identify
>> exactly which of my pages the spiders are trying to crawl so that I
>> can modify the page(s), rewrite the URLs or otherwise fix the problem.
>> An example of such a request from my httpd access log is:
> These URLs point to more pages and are different for each user's
> session. You had better deny access to them in your robots.txt.
>> If I try this URL in a browser, I get "no search was found". Can anyone
>> provide a clue about what exactly the spider is looking for and/or come
>> up with a clever solution?
> I'm using clean URLs for category searches like
What I did was the following. I added this to robots.txt
and created a sitemap that I submitted to Google. I wrote a guide
for that over here: http://wiki.icdevgroup.org/moin.cgi/sitemap.xml
If you have more than 50k products, then you might need to split the
sitemap up into smaller pieces.
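
As a rough illustration of the splitting step (a sketch only, not taken
from the wiki guide): the sitemaps.org protocol caps a single sitemap file
at 50,000 URLs, so a larger catalogue can be written as several files that
are listed in one sitemap index. The function names, the file names
(sitemap-N.xml, sitemap_index.xml) and the example.com base URL below are
assumptions made up for this example.

```python
from xml.sax.saxutils import escape

# Per-file URL limit defined by the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50_000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return one <urlset> document listing the given page URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n'
        f"{entries}</urlset>\n"
    )


def build_index(sitemap_urls):
    """Return a <sitemapindex> document pointing at each sitemap piece."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n'
        f"{entries}</sitemapindex>\n"
    )


def split_sitemaps(urls, base_url, size=MAX_URLS_PER_SITEMAP):
    """Return {filename: xml_text} for every piece plus the index file.

    The file naming scheme is hypothetical; any names work as long as
    the index lists the URLs where the pieces are actually served.
    """
    files = {}
    piece_urls = []
    for n, part in enumerate(chunk(urls, size), start=1):
        name = f"sitemap-{n}.xml"
        files[name] = build_sitemap(part)
        piece_urls.append(f"{base_url}/{name}")
    files["sitemap_index.xml"] = build_index(piece_urls)
    return files
```

Called as split_sitemaps(product_urls, "http://www.example.com"), this
returns the XML text for each piece; you would write those out to the
document root and submit only the index file to the search engine.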