[ic] Rolling big tables (mysql)

Kevin Walsh kevin at cursor.biz
Wed Apr 11 18:55:27 EDT 2007


Grant <emailgrant at gmail.com> wrote:
> > > I do keep a separate table of robot UAs and match traffic rows to them
> > > with op=eq to populate another table with robot IPs and non-robot IPs
> > > for the day to speed up the report.  Don't you think it would be
> > > slower to match/no-match each IC request to a known robot UA and write
> > > to the traffic table based on that, instead of unconditionally writing
> > > all requests to the traffic table?  If not, excluding the robot
> > > requests from the traffic table would mean a lot less processing for
> > > the report and a lot fewer records for the traffic table.
> > >
> > Perhaps you should create a column called "spider" in the traffic table
> > and save a true or false value depending upon the [data session spider]
> > value.  You can then generate reports "WHERE spider = 0", for ordinary
> > users, or "WHERE spider = 1" for robots etc.  An index on the spider column
> > would be nice, of course.
> >
> Have you gotten comfortable using a partial match to determine a robot
> UA?  I used to use RobotUA but I ended up wanting to make exact
> matches.
>
I'm only concerned about whether a UA is a robot or not, so
a "partial match" on a list of keywords is fine.  The actual UA
name is saved in the Apache access_log file, so I could examine
and report on individual names later, if I really wanted to.

> 
> Indexing sounds like something I need to make use of.  Is that an
> mysql convention handled completely outside of IC?
> 
Indexing is extremely important.  If you're using MySQL, PostgreSQL
or similar then you'll need to create indexes that are appropriate
to the queries you want to perform.  Database indexes are maintained
by, and administered from within, the the database engine.

> >
> > Then again, I wouldn't save traffic data to a table anyway.  I'd use
> > usertrack and/or the apache access_log for that.  There are lots of tools
> > that will allow you to analyse Apache log files.  You can even save some
> > Interchange usertrack info into a custom Apache access_log file.
> >
> All of my domains run from the same catalog and I don't use
> directories or query strings in the URL so I need something that will
> allow me to track the domain involved with each request.  Can I save
> the domain into a custom access_log?  If so, do you know of an
> analyzer that would allow me to report on that domain info?
>
Each domain should have its own <VirtualHost> block in the Apache
configuration.  If you have this already then you could log hits to
each domain to an individual log file.

If you don't like multiple access_log files, for some reason, then you
can use the "%v" token, with Apache's LogFormat directive, which will
expand to the virtual host name.

Most logfile analysers can handle log files that contain records from
multiple virtual hosts, or you can use something like "split-logfile"
to split the logfile by virtual hostname.

A couple of Google searches will scare up some useful information.

-- 
   _/   _/  _/_/_/_/  _/    _/  _/_/_/  _/    _/
  _/_/_/   _/_/      _/    _/    _/    _/_/  _/   K e v i n   W a l s h
 _/ _/    _/          _/ _/     _/    _/  _/_/    kevin at cursor.biz
_/   _/  _/_/_/_/      _/    _/_/_/  _/    _/


More information about the interchange-users mailing list