[ic] Spiders and more lists again

Fri Sep 3 06:35:10 UTC 2010

On 09/02/2010 04:10 PM, DB wrote:
> This has been discussed before. Spiders crawling "more list" links
> sometimes put a significant load on my server. I'm trying to identify
> exactly which of my pages the spiders are trying to crawl so that I
> can modify the page(s), rewrite the URLs or otherwise fix the problem.
>
> An example of such a request from my httpd access log is:
>
> GET /scan/MM=13d6003ffad2d76b12eb41868a5277a3:124360:124379:20.html?mv_more_ip=1&mv_nextpage=results&mv_arg= HTTP/1.0
>

These URLs point to "more" pages and are different for each user's session. You
had better deny access to them in your robots.txt.
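
As a rough sketch of such a rule, and a way to check it before deploying:
the Disallow prefix below is taken from the request in your log and is an
assumption about your URL layout. If your catalog lives under a script path,
the prefix has to include it. The helper name "allowed" and the agent name
"ExampleBot" are made up for illustration; the check uses Python's standard
urllib.robotparser, which does plain prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /scan/MM= prefix comes from the logged request
# and must match the URL path as the spider actually sees it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /scan/MM=
"""

# The request from the access log, without the method and protocol.
MORE_URL = (
    "/scan/MM=13d6003ffad2d76b12eb41868a5277a3:124360:124379:20.html"
    "?mv_more_ip=1&mv_nextpage=results&mv_arg="
)


def allowed(path, agent="ExampleBot"):
    """Return True if the rules above let the given agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(allowed(MORE_URL))
    print(allowed("/cgi-bin/f-shop/rollenspiele/dungeons_dragons_3_5"))
```

Only well-behaved spiders honor robots.txt, so this reduces the load from
those and does nothing about crawlers that ignore the file.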

> If I try this URL in a browser, I get "no search was found". Can anyone
> provide a clue about what exactly the spider is looking for and/or come
> up with a clever solution?

I'm using clean URLs for category searches like

http://www.f-shop.de/cgi-bin/f-shop/rollenspiele/dungeons_dragons_3_5

and wrote my own paging routine.
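
To illustrate the idea only (this is not the routine running on the site
above): paging links can be derived from the category URL and a page number
alone, so they are the same for every session and safe for spiders to follow.
The function name "page_links" and the "/page/N" URL scheme are hypothetical.

```python
from math import ceil


def page_links(base_url, total_items, per_page=20):
    """Return one session-independent URL per results page.

    Hypothetical scheme: page 1 is the bare category URL, later pages
    append /page/N.
    """
    pages = max(1, ceil(total_items / per_page))
    return [
        base_url if n == 1 else f"{base_url}/page/{n}"
        for n in range(1, pages + 1)
    ]


if __name__ == "__main__":
    for link in page_links("/rollenspiele/dungeons_dragons_3_5", 45):
        print(link)
```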

Regards
          Racke

-- 
LinuXia Systems => http://www.linuxia.de/
Expert Interchange Consulting and System Administration
ICDEVGROUP => http://www.icdevgroup.org/
Interchange Development Team