[ic] RobotUA

Philip S. Hempel interchange-users@icdevgroup.org
Tue Nov 26 22:45:01 2002


Grant wrote:

>>Grant [listbox@email.com] wrote:
>>
>>>I've had my RobotUA all set up for a few days, but examining my rotated
>>>access_log files, the robots aren't getting any further than this:
>>>
>>>66.196.65.16 - - [25/Nov/2002:18:30:41 -0800] "GET /robots.txt HTTP/1.0" 200
>>>0 "-" "Mozilla/3.0 (Slurp/si; slurp@inktomi.com;
>>>http://www.inktomi.com/slurp.html)"
>>>66.196.65.16 - - [25/Nov/2002:18:30:42 -0800] "GET / HTTP/1.0" 301 330 "-"
>>>"Mozilla/3.0 (Slurp/si; slurp@inktomi.com;
>>>http://www.inktomi.com/slurp.html)"
>>>
>>>Here's my RobotUA entry:
>>>
>>>RobotUA WebCrawler, BaiDuSpider, ZyBorg, almaden.ibm, Googlebot, Slurp,
>>>Girafabot, ia_archiver, LinkWalker, MSIECrawler
>>>
>>One of four things could be happening:
>>
>> 1. Your robots.txt could be limiting access.
>> 2. The spider may object to receiving a "301 Moved" status when asking
>>    for a webpage.  Perhaps it suspects 'cloaking' and just stops there.
>> 3. The spider may intend to return later to ask for more pages.  Some
>>    spiders do this to keep the load on your server to a minimum.
>>    Remember that some servers have lots of websites.
>> 4. RobotUA could be broken, although I doubt it.  You can check it
>>    yourself by pretending to be "Slurp/si; slurp@inktomi.com" and
>>    then requesting '/'.  Check the resulting page for "unfriendly"
>>    links.
>> 5. I said 4, didn't I?  If you can think of another then let me know.
>>    
>>
>
>First of all, thanks a lot to Kevin, Phillip, and Jonathon for answering my
>question.  After using the Sam Spade browser to make sure the links were
>friendly, I'm thinking it must be #2 above.  How can I get www.mystore.com
>to forward to www.mystore.com/cgi-bin/catalog/index.html without issuing a
>301?  I was using .htaccess and the RedirectPermanent directive to
>accomplish that redirect, but that definitely returns a 301.  What can I do
>to make a clean switch there?
>
>- Grant
>
>  
>
With a 301, most spiders usually need a couple of visits before they
decide to go any further into the site. Depending on how long your
system has been running with the 301, changing it now may cause you
more problems. Think of a 301 as telling the mailman you have a new
address and then sending a change of address to all of your magazine
companies. How long does it take for them to get around to sending
your magazines to the new address? Now suppose you suddenly send them
and your mailman yet another change of address, before they have even
acted on the first one. You will wait at least 2 months before you get
any magazines, or a good part of your mail will end up in different
places.

A 301 differs from a 302 here: a 302 says the move is temporary and
not to keep a record of it, while a 301 tells spiders to record the
new location. That makes it a very bad thing, as far as spiders are
concerned, if you keep bouncing around. I speak completely from
experience, since I did this myself and have seen its effects.
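
To see for yourself what a spider gets back from your front page,
something like the following could be used. It is only a minimal
sketch and not part of Interchange or Apache: the names SLURP_UA,
describe_status and fetch_as_robot are made up for illustration, and
the User-Agent string is copied from the access_log lines quoted
above. It sends a single request and does not follow the redirect, so
you see the same status line the spider sees.

```python
import http.client
from urllib.parse import urlsplit

# User-Agent copied from the access_log lines quoted earlier in this thread.
SLURP_UA = ("Mozilla/3.0 (Slurp/si; slurp@inktomi.com; "
            "http://www.inktomi.com/slurp.html)")


def describe_status(status):
    """Say what an HTTP status code tells a spider about the page."""
    if status == 301:
        return "permanent redirect: record the new address"
    if status == 302:
        return "temporary redirect: keep using the old address"
    if 200 <= status < 300:
        return "served directly, no redirect"
    return "other status"


def fetch_as_robot(url, user_agent=SLURP_UA, timeout=10):
    """Request url once as the given robot; return (status, Location header).

    http.client never follows redirects on its own, so a 301 or 302
    comes back as-is instead of being silently resolved.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/",
                     headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling fetch_as_robot("http://www.mystore.com/") and passing the
returned status to describe_status() should show whether the front
page still answers with a 301 and where it points. The same call
against the catalog URL lets you check the returned page for
"unfriendly" links by hand.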

Also, all of your DMOZ entries need to point to your redirected
location for you to get credit for it.

The point is this: if you have only just started doing this move,
leave it alone. It will take at least 2 months for Google and a few
others to catch up. If you have been doing this for a while, you could
completely lose at least a month's worth of crawls until they get
around to seeing the new move.

This happened to me: I got impatient and moved things around again.
I lost a lot of traffic, and when I talked to some people at
webmasterworld, they just told me not to mess with it and to be
patient, since the spiders will crawl your site within one to two
months. If your session ids (sids) are not showing in your links,
they will jump on it soon.

-- 
Philip S. Hempel
debian/rules