[mv] problems w/ MV 3.14-3
****** message to minivend-users from "Brian D. Davison" <brian@davison.net> ******
I've been running minivend for years, but haven't done much new minivend
coding so I've been off this list for quite a while. But I've searched
the archives and docs and can't seem to find some answers I need.
Most of these issues have bothered me for a long time, and perhaps
others will learn from some of the workarounds I've tried.
Session database growing uncontrollably:
We're currently serving some 18k requests per day through minivend.
I expire the catalog (using expire -r -c catalog) every day at a
low-usage time period (which of course means people outside of the US
have problems), because most days, while the expire runs, my http
error logs show failed requests for minivend urls with the reason
"Premature end of script headers". So the expire process appears to
be locking out regular requests. I give the expire 40 minutes to run,
and then I automatically restart MV because sometimes it never
recovers on its own.
While all this is very annoying, we've lived with it because I haven't
had time to dig deep enough to figure it out. But what really becomes
a problem is that the expire doesn't seem to be helping -- over time
(say 6-12 months) the session db grows to be huge (say >200MB) and
at that size, minivend requests take much longer to complete and the
machine gets much slower under the increased load. My not-so-good
solution is to just delete the session and retire_id db files
when they get too big, but I really don't like that approach since
it loses all current data; a rough sketch of the nightly job and this
cleanup is at the end of this section. [I use a SaveExpire of 7 days, and leave
SessionExpire unset (default of 1 day).]
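In case the details help anyone, the nightly job is roughly the cron
sketch below. The paths, the times, and the restart wrapper are
placeholders for whatever your install uses; the expire invocation is
the one quoted above, and the cleanup assumes GDBM session files:
  # run the session expire during the low-usage window
  5 4 * * *   /usr/local/minivend/bin/expire -r -c catalog
  # restart the MV daemon 40 minutes later, since the expire sometimes
  # never comes back on its own
  45 4 * * *  /usr/local/minivend/bin/restart_mv
  # last-resort cleanup (with the server stopped) when the session db
  # has grown huge -- this is the part that throws away current data:
  #   rm -f <session dir>/session.gdbm <session dir>/retire_id.gdbm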
Cached pages:
I'm sure this is a simple one. I use ClearCache Yes, PageCache Yes,
ScratchDir cache, and SearchCache Yes. And when I restart MV, I
get messages in my error log saying that .../cache/SearchCache
and .../cache/PageCache have been cleared, except that those
directories are always empty. The cache files are actually in
.../cache and it never gets cleared unless I do it manually.
The subdirectories are owned by the minivend user (although I see that
the permissions are 700, which is unusual but shouldn't be a problem).
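For reference, the relevant lines in my catalog.cfg are essentially:
  ScratchDir    cache
  ClearCache    Yes
  PageCache     Yes
  SearchCache   Yes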
Hung/stuck processes:
Periodically, I'll find MV hung, with a single process using more than
a few seconds of CPU time (say a few minutes). During this time, no
other MV process is functioning, so all activity is halted. Most of
the time the MV housekeeping will find this job and kill it, but
sometimes it fails in such a way that the job is gone yet the other
MV processes still won't accept new requests, and then I have to
kill them all off manually and restart MV.
What I'd really like is to keep MV from getting hung in the first
place. When I see stuck jobs like that, sometimes I'll kill them
manually (i.e. without waiting for the MV housekeeping), and I'll
notice in the error log that the request was made by a robot
(sometimes a search engine robot, sometimes a less-friendly one).
These robot requests usually include the session id in the URL.
This is wild speculation, but I'm guessing that mv sometimes has a
problem with 'random' session ids. Using robots.txt and apache deny
directives (examples below) is insufficient, as there are some robots
that I want (see next item) and there are always new robots that I
haven't filtered yet. Besides, those filters just hide the problem
(which I assume is within MV).
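For what it's worth, the kind of filtering I've been doing looks like
the sketch below; the robot name and the address are just examples,
not a recommendation:
  # robots.txt -- ask well-behaved robots to stay out of the dynamic area
  User-agent: SomeBadRobot
  Disallow: /cgi-bin/
  # httpd.conf -- deny a misbehaving robot by address (Apache 1.3 syntax)
  <Location /cgi-bin/catalog>
    order allow,deny
    allow from all
    deny from 10.1.2.
  </Location>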
Can't eliminate session ids from URLs:
My site has a large enough catalog (20k+) that I'm unwilling to
create static pages for every item. So, we serve every page
dynamically (except for those served by the cache). However,
a large part of our traffic comes from search engines because
we've managed to get pages from the catalog indexed. I'm familiar
with the [set mv_no_session_id] and [set mv_no_count] and I have
those on _every_ page, and it helps reduce the set of URL names
for my pages that people might use. However, what I really want
[I think] is to eliminate session ids from URLs in all cases.
I realize this makes my catalog unusable by those who don't
allow cookies, but we have alternate ordering methods (email, toll-free
phone, etc.) that people can use if they want. The reason I want
this is that search engine robots (in fact, no robot that I know of)
don't accept cookies, so when they visit, all of the outgoing links
get the session id embedded in them. Which means that periodically,
I get search engine robots asking for a page multiple times with
different session ids (they can't tell that it is really the same
page). This bit me hard a couple of days ago when Inktomi (a robot
that I normally like because it gets us into lots of engines) did
exactly that and triggered the MV hangs described above.
So, if I can avoid generating URLs with session ids, I can eliminate
multiple retrievals, and some of the stuck jobs from robots.
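For the record, what I have at the top of every page right now is
something like this (setting the scratch variables is how I suppress
the id and count in the links that [page]/[area] build):
  [comment] suppress session id and page count in generated links [/comment]
  [set mv_no_session_id]1[/set]
  [set mv_no_count]1[/set]
That helps for cookie-accepting visitors, but cookie-less clients
(the robots) still get the id in the URL, which is the behavior I'd
like to be able to turn off entirely.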
OK, have I ranted enough? I'm sure I've got more questions, but these are
the ones that have been a real pain. I'd really like pointers, or patches
that would help these issues. Does MV4 fix these things?
While I'm using MV3.14-3, these problems are _not_ specific to it because
I've lived with them through multiple versions. I'm currently running
under a Redhat 5.0 server, with perl 5.004_04 and GDBM db files, but I'm
planning downtime this weekend to upgrade to RH 6.1 (for other reasons).
-- Brian
Brian D. Davison - brian@davison.net (732) 249-6492
=============================================================================
Brian's Books - online computer books catalog - http://www.briansbooks.com/
============= WEB CACHING RESOURCES http://www.web-caching.com/ =============