[ic] Spiders and more lists again
Stefan Hornburg (Racke)
racke at linuxia.de
Fri Sep 3 06:35:10 UTC 2010
On 09/02/2010 04:10 PM, DB wrote:
> This has been discussed before. Spiders crawling "more list" links
> sometimes put a significant load on my server. I'm trying to identify
> exactly which of my pages the spiders are trying to crawl so that I
> can modify the page(s), rewrite the URLs or otherwise fix the problem.
>
> An example of such a request from my httpd access log is:
>
> GET
> /scan/MM=13d6003ffad2d76b12eb41868a5277a3:124360:124379:20.html?mv_more_ip=1&mv_nextpage=results&mv_arg=
> HTTP/1.0
>
These URLs point to "more" pages and are different for each user's session.
You had better deny access to them in your robots.txt.
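
As a rough illustration of that advice, the sketch below checks a candidate robots.txt rule against the request from the log using Python's standard urllib.robotparser. The rule text, the www.example.com host and the is_allowed helper are assumptions made for the example. The /scan/MM= prefix is taken from the log line quoted above and may need a script prefix such as /cgi-bin/... on other installations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every session-specific "more" URL,
# which in the quoted log line starts with /scan/MM=
ROBOTS_TXT = """\
User-agent: *
Disallow: /scan/MM=
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The host name is a placeholder; the path is the one from the access log.
more_url = ("http://www.example.com/scan/"
            "MM=13d6003ffad2d76b12eb41868a5277a3:124360:124379:20.html"
            "?mv_more_ip=1&mv_nextpage=results&mv_arg=")
# A clean, session-independent category URL should stay crawlable.
clean_url = "http://www.example.com/rollenspiele/dungeons_dragons_3_5"

print(is_allowed(more_url))
print(is_allowed(clean_url))
```

The rule relies on plain prefix matching only, so it does not depend on wildcard support in the crawler. Spiders that ignore robots.txt are of course not affected by it.
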
> If I try this URL in a browser, I get "no search was found". Can anyone
> provide a clue about what exactly the spider is looking for and/or come
> up with a clever solution?
I'm using clean URLs for category searches like
http://www.f-shop.de/cgi-bin/f-shop/rollenspiele/dungeons_dragons_3_5
and wrote my own paging routine.
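
The idea behind session-independent paging can be sketched as follows. This is a minimal illustration, not the routine used on the site above: the page_urls and page_slice helpers, the ?page=N parameter and the default page size of 20 are all assumptions made for the example.

```python
from math import ceil


def page_urls(base_url, total_items, per_page=20):
    """Return one stable URL per results page (page 1 is the bare URL)."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Always expose at least one page, even for an empty category.
    pages = max(1, ceil(total_items / per_page))
    urls = [base_url]
    # Hypothetical scheme: later pages differ only by a page number,
    # never by a session or search-cache identifier.
    urls += [f"{base_url}?page={n}" for n in range(2, pages + 1)]
    return urls


def page_slice(items, page, per_page=20):
    """Return the items shown on the 1-based `page`."""
    start = (page - 1) * per_page
    return items[start:start + per_page]
```

Because such URLs are the same for every visitor, a spider that follows them gets a valid page each time instead of an expired search.
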
Regards
Racke
--
LinuXia Systems => http://www.linuxia.de/
Expert Interchange Consulting and System Administration
ICDEVGROUP => http://www.icdevgroup.org/
Interchange Development Team