Re: Hardware and Performance
Quoting John Edstrom (edstrom@Poopsie.hmsc.orst.edu):
> ****** message to minivend-users from John Edstrom <edstrom@Poopsie.hmsc.orst.edu> ******
>
> mikeh@minivend.com
> >
> > ****** message to minivend-users from mikeh@minivend.com ******
> >
> > Quoting Colin Mitchell (colin@5points.net):
> > >
> > > There is a lot of overhead involved in establishing the connection between
> > > the SQL server and minivend. Sockets need to be opened, etc. In my
> ...
> > > keep a pool of already opened connections to the server, and you use those
> > > whenever possible instead of opening a new one.
> > >
> >
> > That part is obvious; I can probably reduce connections by doing only
> > one connection per separate database rather than one per table in a
> > single catalog, hashing the connection parameters and saving handles. My
> > understanding is that DBI already does this, though.
>
> I don't think so. As of DBI 1.10 I didn't see anything about caching
> connections. I see where there is a prepare_cached method to save
> statements, but the CacheKids entry in the docs says "... statement
> handles created by the (not yet implemented) connect_cached method."
>
Aha. It must have been in the plans... I know all about that. 8-)
[snip]
>
> DBI doesn't seem to be caching connections.
>
> I wasn't aware that MV makes one connection per table. The one
> connection per database per server that you mentioned above would help
> significantly if somebody uses several tables from each database.
> Connecting and disconnecting can have a significant impact. Here is a
> simple benchmark that roughly measures the overhead of a
> connection-per-table approach:
>
> use strict;
> use DBI;
> use Benchmark;
>
> my ($db, $sth);
>
> timethese(200, {
>     in_loop => sub {
>         for (1..5) {
>             $db = DBI->connect("dbi:mysql:test:host=localhost",
>                                '', '') or die $DBI::errstr;
>             $sth = $db->prepare('select * from x');
>             $sth->execute;
>             $sth->finish;
>             $db->disconnect;
>         }
>     },
>     out_loop => sub {
>         $db = DBI->connect("dbi:mysql:test:host=localhost",
>                            '', '') or die $DBI::errstr;
>         for (1..5) {
>             $sth = $db->prepare('select * from x');
>             $sth->execute;
>             $sth->finish;
>         }
>         $db->disconnect;
>     },
>     out_cached => sub { # look at prepare() overhead for a lark
>         $db = DBI->connect("dbi:mysql:test:host=localhost",
>                            '', '') or die $DBI::errstr;
>         for (1..5) {
>             $sth = $db->prepare_cached('select * from x');
>             $sth->execute;
>             $sth->finish;
>         }
>         $db->disconnect;
>     }
> });
>
> the output:
> Benchmark: timing 200 iterations of in_loop, out_cached, out_loop...
> in_loop: 11 wallclock secs ( 5.71 usr + 1.97 sys = 7.68 CPU)
> out_cached: 4 wallclock secs ( 1.83 usr + 0.48 sys = 2.31 CPU)
> out_loop: 4 wallclock secs ( 2.47 usr + 0.58 sys = 3.05 CPU)
>
> The difference between in_loop and out_loop is 4.63 CPU seconds over
> 200 reps, or 23 ms per rep. in_loop makes 5 connect/disconnect pairs
> per rep and out_loop makes 1, a difference of 4 pairs per rep, so the
> final cost is 23/4 or 5.8 ms per connect/disconnect pair on a P120.
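
The arithmetic in the quoted message can be re-derived from the CPU totals in the benchmark output. This is only a small sketch; the variable and sub names are made up for illustration.

```perl
use strict;
use warnings;

# CPU seconds reported by Benchmark for 200 iterations (from the quoted output).
my %cpu  = (in_loop => 7.68, out_loop => 3.05);
my $reps = 200;

# in_loop does 5 connect/disconnect pairs per rep, out_loop does 1,
# so the two cases differ by 4 pairs per rep.
my $extra_pairs = 5 - 1;

# Extra CPU time per rep, in milliseconds.
sub per_rep_ms  { return ($cpu{in_loop} - $cpu{out_loop}) / $reps * 1000 }

# Cost of a single connect/disconnect pair, in milliseconds.
sub per_pair_ms { return per_rep_ms() / $extra_pairs }

printf "%.2f ms per rep, %.2f ms per connect/disconnect pair\n",
    per_rep_ms(), per_pair_ms();
```

This works out to roughly 23 ms per rep and 5.8 ms per pair, in line with the figures quoted. Note that it uses CPU time, not wallclock time.
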
Thank you, I love getting numbers! I will proceed to cache the table
connection by the 3.15 release -- it seems well worth it. I will need
some SQL people to test this before release though; my understanding of
it is only rudimentary as you can tell.
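
For the SQL people who want to comment: the idea of hashing the connection parameters and saving handles might look something like the sketch below. It is not MiniVend code. cached_handle, %handle_for and the stand-in connector are hypothetical names, and a counting code ref is used in place of DBI so the sketch runs without a database.

```perl
use strict;
use warnings;

my %handle_for;    # cache: connection-parameter key => open handle

# $connect is a code ref that actually opens a connection. With DBI it
# might be: sub { DBI->connect(@_) or die $DBI::errstr }
sub cached_handle {
    my ($connect, @params) = @_;
    # Join with a NUL byte so ('a', 'bc') and ('ab', 'c') get different keys.
    my $key = join "\0", map { defined $_ ? $_ : '' } @params;
    $handle_for{$key} = $connect->(@params) unless exists $handle_for{$key};
    return $handle_for{$key};
}

# Stand-in connector (an assumption for this example): it only counts
# how many times a "connection" is opened.
my $opened = 0;
my $fake_connect = sub { $opened++; return { dsn => $_[0] } };

# Five tables in one database share a handle; a second database gets its own.
cached_handle($fake_connect, 'dbi:mysql:test:host=localhost', '', '') for 1 .. 5;
cached_handle($fake_connect, 'dbi:mysql:other:host=localhost', '', '');
print "connections opened: $opened\n";
```

The cache lives in one process, so it does nothing for connections across forked servers; it only replaces one connection per table with one per distinct set of parameters.
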
Do remember, though, that MiniVend doesn't automatically open tables;
it sets up a pseudo-object that only connects if actual data is
requested. Any number of accesses to one table makes one connection.
My own tendency is to try and keep search lists within one table,
putting the complex stuff in the link to the part number.
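
To illustrate the connect-on-demand behavior described above, here is a rough sketch of such a pseudo-object. LazyTable, its methods, and the sample part number are invented for the example and are not MiniVend's actual classes; a plain hash stands in for the real table.

```perl
use strict;
use warnings;

package LazyTable;    # hypothetical name, not a MiniVend class

sub new {
    my ($class, %args) = @_;
    # Remember how to connect, but do not connect yet.
    return bless { connect => $args{connect}, handle => undef, opens => 0 },
        $class;
}

# Open the connection on first use only, then keep reusing it.
sub handle {
    my ($self) = @_;
    unless (defined $self->{handle}) {
        $self->{handle} = $self->{connect}->();
        $self->{opens}++;
    }
    return $self->{handle};
}

# Any data access goes through handle(), which triggers the connection.
sub field {
    my ($self, $key, $column) = @_;
    return $self->handle->{$key}{$column};
}

package main;

# The "connection" here is just a hash of rows (an assumption for the demo).
my $products = LazyTable->new(
    connect => sub { return { '00-0011' => { price => '19.99' } } },
);
print "opens before access: $products->{opens}\n";
$products->field('00-0011', 'price') for 1 .. 3;
print "opens after three accesses: $products->{opens}\n";
```
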
Also, if someone would try the new global Database directive
on SQL I would appreciate some feedback. I am not positive that SQL
will work on all systems, especially on Perl 5.004.
>
> >
> > The part that is not obvious is how you do anything more in a forked
> > server setup. Essentially, you would have to pass data through a shared
> > memory space. Unless it is a threaded, and not a forked, implementation
> > things are not obvious. The overhead of this could easily kill any
> > savings.
>
> too, too true. I think that this is one of Perl's weakest points.
>
> The only thing I can think of is to have MV spawn a DBI server
> (DBI::Proxy?). Some overhead might be saved by having MV establish
> 'cheap' connections to the proxy which would establish and then
> maintain persistent 'expensive' connections to the db server. But
> that just defers the problem to the proxy; how do you get the proxy to
> keep more than one ball in the air at a time?
>
> I don't see a simple solution to this within Perl either.
I did check out DBI::Proxy yesterday and that doesn't do what we need.
It looks like more code.....
But there are some possible things that can be done with MV4 and the
new virtual tlink. Right now, the problem is that I don't know what
databases a catalog has until well after I have forked. If I know what
catalog is calling before I fork, and I have a list of connections that
should be as persistent as possible, I can check to see if they are open
before forking.
It is a complex situation, one that is not a problem for MiniVend's
internal databases -- a simple microsecond-span file open away. Since
I use those more often than not, I rarely see scalability problems. I
have machines that serve hundreds of thousands of parsed pages a day
and run load averages under one.
--
Mike Heins http://www.minivend.com/ ___
Internet Robotics |_ _|____
131 Willow Lane, Floor 2 | || _ \
It's a little-known fact Oxford, OH 45056 | || |_) |
that the Y1K problem caused <mikeh@minivend.com> |___| _ <
the Dark Ages. -- unknown 513.523.7621 FAX 7501 |_| \_\