[Camps-users] Upcoming development session and a proposed way forward

Sun Jan 11 15:58:37 UTC 2009

All,

We at End Point are going to be having our annual company meeting later
this month.  And at said company meeting, we're going to be having a
hackathon of sorts, which will consist of several groups, each working
on a particular open source project.  One group will be focusing on
doing work for camps.  I will be very closely involved with that group,
and need to figure out what we'll be doing.

We have a small handful of people and a three-hour block, with a few
weeks to prepare.  So, we can't get a ton done.  But it's a
not-insignificant chunk of time and the more clearly we can define what
it is we're setting out to do, the greater our chances of delivering
something of value to the camps project.

With that in mind, I'd like to put forward (yet another) proposal for a
camp system design.  This is derived in part from the proposal I made
almost a year ago (in February), except that it arguably simplifies
things: instead of having "hosts" versus "services" versus "resources",
we just have "resources" which encompass everything.

This does not have to be what we do.  We could look at all sorts of
things, from starting out with a fundamentally new design, to unifying
the camp commands to a single "camp <command> <options>" style, to
improving the docs on the existing system, etc.

I'd love to hear some ideas on what matters, but I'd specifically like
to hear feedback on the proposed design (to follow, in great detail).
If we can have some rough consensus on what a next-generation camp
system would look like, this group could start on it, understanding that
we would start with a bite-sized chunk that we could really deliver in a
relatively short time.

Please let me know your thoughts.  Thanks.
- Ethan

BASIC MODEL

No separation of "hosts", "services", and "resources".  Just "resources".

A resource may be a host, or a service.  There's no need for
distinguishing them.

One resource can depend on another.

One resource may contain another.

We use resource containment (object composition) to build operational
relationships between resources.

We use (hopefully shallow) inheritance hierarchies to define families of
related resource types (like a "database" resource family of which
"Postgres" and "MySQL" are members).

INTERACTING WITH RESOURCES

1. Low-level interaction

A resource has two basic "command interfaces", by which I mean ways of
issuing commands/requests to the thing represented by the resource (not
"interfaces" in the sense of Java-esque interfaces that a class implements):
* a "system" interface: this is the interface through which the camp
system issues commands/requests to initialize and configure the resource
in question (i.e. if you need to initialize a new Postgres cluster, you
use the Postgres resource's system interface)
* a "service" interface: this is the interface that issues
commands/requests to the actual public-facing interface offered by the
resource in question; once you've configured your Postgres resource
through its system interface, you issue DDL commands through its service
interface.

Each resource has a container attribute, which simply points back to the
resource that contains the resource in question.  So, the local
operating system resource is the container for the postgres resource.

By default, a resource's "system" interface simply passes arguments
through to the resource container's "service" interface.  Thus, commands
issued to the resource's system interface are formatted as commands
issued against the container service.  We would probably want some
convention so that adding in resource class-specific functionality (so
all commands issued would be issued relative to some base
script/executable/whatever) is easily done with minimal base/super-class
interaction annoyances (no need to call $self->SUPER::foo, for instance,
in Perl-speak).

The default resource is "localhost", the service and system interfaces
of which are the same.  They simply format commands to go to the
underlying operating system of the primary camp server.

A resource representing a remote host might use the "localhost"
interfaces to format commands, but pipe those commands over an SSH
tunnel, maintained within the resource instance.

A virtual machine resource would presumably implement the SSH tunnel
design for its "service" interface, while the default of using its
container's "service" interface for its own "system" interface would
make perfect sense.  So, for instance, "localhost" might be the
container, meaning that the VM resource's system interface is the same
as the localhost service interface.  So, you are effectively issuing
shell commands to localhost via the VM resource's system interface when
initializing the VM resource.
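
To make the delegation concrete, here is a minimal sketch in plain Perl
(no Moose, so it runs on its own).  The package names, the system_cmd
and service_cmd method names, and the use of psql as the Postgres
service wrapper are all assumptions for illustration; nothing here is
existing camp code.  Commands are only formatted, never executed.

```perl
use strict;
use warnings;

package Camp::Resource;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, container => $args{container} }, $class;
}

sub container { return $_[0]{container} }

# System interface: by default, pass the arguments straight through to
# the container's service interface.
sub system_cmd {
    my ($self, @cmd) = @_;
    return $self->container->service_cmd(@cmd);
}

package Camp::Resource::Localhost;
use parent -norequire, 'Camp::Resource';

# The root resource: both interfaces just format a local shell command.
sub service_cmd { my ($self, @cmd) = @_; return join ' ', @cmd }
sub system_cmd  { my ($self, @cmd) = @_; return $self->service_cmd(@cmd) }

package Camp::Resource::Postgres;
use parent -norequire, 'Camp::Resource';

# Service interface: wrap the public-facing client (psql is assumed here).
sub service_cmd {
    my ($self, @cmd) = @_;
    return $self->container->service_cmd('psql', @cmd);
}

package main;

my $localhost = Camp::Resource::Localhost->new(name => 'localhost');
my $postgres  = Camp::Resource::Postgres->new(
    name      => 'postgres',
    container => $localhost,
);

my $init = $postgres->system_cmd('initdb', '-D', '/tmp/camp12/pgdata');
my $ddl  = $postgres->service_cmd('-f', 'schema.sql');
print "$init\n$ddl\n";
```

A remote-host or VM resource would presumably override service_cmd to
send the formatted command over its SSH tunnel instead of returning it.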

2. High-level interaction

We might want resources to implement a few common methods like:
* initialize
* remove
* start
* stop
* restart

Logically, initialize and remove methods would be primarily concerned
with the system interface.  Start, stop, restart would probably use the
service interface.  For instance, an Apache resource would basically use
file system operations (copying files, removing files) to build up and
tear down an Apache resource instance.  But controlling the resource
once it's initialized would all be done through the httpd command,
around which the service interface would basically be a wrapper.
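
A rough sketch of how the common methods might split across the two
interfaces for an Apache resource.  The class name, the template path,
and the two recording stand-ins for the interfaces are assumptions; the
stand-ins only log what would be run.

```perl
use strict;
use warnings;

package Camp::Resource::Apache;

sub new {
    my ($class, %args) = @_;
    return bless { root => $args{root}, log => [] }, $class;
}

# Stand-ins for the two command interfaces; they only record commands.
sub system_cmd {
    my ($self, @cmd) = @_;
    push @{ $self->{log} }, "system: @cmd";
    return;
}

sub service_cmd {
    my ($self, @cmd) = @_;
    push @{ $self->{log} }, "service: httpd @cmd";
    return;
}

# Build-up and tear-down are file system operations (system interface).
sub initialize { my ($self) = @_; $self->system_cmd('cp', '-r', 'templates/apache', $self->{root}) }
sub remove     { my ($self) = @_; $self->system_cmd('rm', '-rf', $self->{root}) }

# Control goes through the httpd wrapper (service interface).
sub start   { my ($self) = @_; $self->service_cmd('-k', 'start') }
sub stop    { my ($self) = @_; $self->service_cmd('-k', 'stop') }
sub restart { my ($self) = @_; $self->service_cmd('-k', 'restart') }

package main;

my $apache = Camp::Resource::Apache->new(root => '/tmp/camp12/httpd');
$apache->$_() for qw(initialize start restart stop remove);
print "$_\n" for @{ $apache->{log} };
```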

RESOURCE CONFIGURATION

Resources would have attributes that must be configured.  Attribute
values should generally be calculated algorithmically.  The current camp
system basically treats attribute calculation as a set of mathematical
functions (that don't look terribly mathematical); for a given camp
number X, each attribute has only one possible value f(X) (i.e. the camp
number determines everything, from paths to port numbers to hostnames).
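
As an illustration of the f(X) idea, the following sketch treats every
attribute as a pure function of the camp number.  The attribute names,
port bases, paths, and hostname pattern are made up for the example and
are not taken from any real camp configuration.

```perl
use strict;
use warnings;

# Each attribute is a pure function of the camp number (all values here
# are hypothetical).
my %attribute_for = (
    path      => sub { "/home/camper/camp$_[0]" },
    http_port => sub { 9000 + $_[0] },
    db_port   => sub { 8900 + $_[0] },
    hostname  => sub { "camp$_[0].example.com" },
);

sub camp_attributes {
    my ($camp_number) = @_;
    my %attrs = map { ($_ => $attribute_for{$_}->($camp_number)) }
        keys %attribute_for;
    return \%attrs;
}

my $camp = camp_attributes(12);
printf "%-10s %s\n", $_, $camp->{$_} for sort keys %$camp;
```

Calling camp_attributes twice with the same number always gives the same
result, which is exactly the property the persistent attributes
discussed next would relax.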

However, it should be possible for attributes to be persistent, meaning
that once derived they are permanently stored as part of a camp's
configuration, rather than dynamically calculated for all time.

It must be possible to reinitialize or reconfigure a resource (or set of
resources) while preserving persistent attributes rather than
recalculating them.  Control over which persistent attributes to blow
away at re-initialization time must be built in from the beginning if
this is to make sense.

Resource configuration, once determined, should be serialized down to
some storage format like YAML, JSON, XML (!!!!!!!!!!!), etc., so that
persistent attributes are preserved.
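
One possible shape for this, sketched with JSON via the core JSON::PP
module.  The configure function, its named arguments, and the
db_password attribute are hypothetical; the counter-based generator
stands in for whatever would really derive a persistent value.

```perl
use strict;
use warnings;
use JSON::PP ();

my $counter   = 0;
my %generator = (db_password => sub { 'secret' . ++$counter });

sub configure {
    my (%args) = @_;
    my $stored = $args{stored} || {};
    my %reset  = map { ($_ => 1) } @{ $args{reset} || [] };

    # Calculated attributes are always f(camp number).
    my %config = (http_port => 9000 + $args{camp});

    # Persistent attributes are generated once, then reused unless reset.
    for my $name (sort keys %generator) {
        my $keep = exists($stored->{$name}) && !$reset{$name};
        $config{$name} = $keep ? $stored->{$name} : $generator{$name}->();
    }
    return \%config;
}

my $json = JSON::PP->new->canonical->pretty;

my $first = configure(camp => 12);
my $saved = $json->encode($first);    # what would be written to storage

# Reinitialize: persistent values survive unless explicitly blown away.
my $second = configure(camp => 12, stored => $json->decode($saved));
my $third  = configure(
    camp   => 12,
    stored => $json->decode($saved),
    reset  => ['db_password'],
);

print $saved;
print "kept: $second->{db_password}, reset: $third->{db_password}\n";
```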

Furthermore, because the central camp system needs to know all
attributes of the extant camps, this configuration information must be
preserved in the central system.  However, to potentially allow for a
distributed camp system where camps are spread across N nodes in a
server cluster (yes, we have a camp system for which this would be
highly relevant), and simply to allow local inspection of the
configuration, each camp's resource configuration information should
persist within the camp itself (i.e. in userland rather than base
camp-systemland).

This means having some configuration data scattered around, which is
kind of a drag.  But it's a manageable drag.  Camp commands for
manipulating configuration values (camp config set <name> <value>) could
only be counted as successful if they appear to work centrally and
locally, for instance.
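
A small sketch of that success rule, assuming each storage location is
represented by a writer callback.  The config_set function and the
in-memory hashes standing in for the central and local stores are
hypothetical.

```perl
use strict;
use warnings;

# Apply a setting to every store; report success only if all writes work.
sub config_set {
    my ($name, $value, @stores) = @_;
    my $ok = 1;
    for my $store (@stores) {
        $ok = 0 unless eval { $store->($name, $value); 1 };
    }
    return $ok;
}

# In-memory stand-ins for the central and in-camp configuration files.
my (%central, %local);
my $write_central = sub { $central{ $_[0] } = $_[1] };
my $write_local   = sub { $local{ $_[0] }   = $_[1] };
my $broken_local  = sub { die "camp directory not writable\n" };

my $good = config_set(memcached_nodes => 5, $write_central, $write_local);
my $bad  = config_set(memcached_nodes => 3, $write_central, $broken_local);

print "first call: $good, second call: $bad\n";
print "central=$central{memcached_nodes} local=$local{memcached_nodes}\n";
```

Note that the failed call still changed the central copy, so the two
stores disagree afterwards.  Whether to roll back in that case is a
design decision this sketch does not make.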

MAGICAL RESOURCES

The base resource would of course define basic behavior for all resources.

A "localhost" resource would exist by default.  It refers to the
underlying operating system (and shell environment) for the central camp
system itself.

I'd like to propose that individual camps be treated as resources as
well.  This is a fairly new idea (in my mind, anyway) and may be
completely ridiculous.  I'm just putting it out there.  We would
represent the camps themselves as resources, and then the attributes of
the camps (i.e. numbers, owners, base paths, etc.) would be managed like
any other resource attribute.  Furthermore, the persistent attribute
functionality lets us potentially have more control over what goes into
a given camp.  Perhaps the default for the memcached resource is for
camps to get a single memcached server node.  That would be fine for
most cases,
probably, unless you're the guy who needs to do hard-core memcached
usage, testing, etc., in which case you really need multiple memcached
servers for your camp.  So you can configure your camp resource to
indicate a need for 5 memcached servers, say, which in turn affects how
the memcached resources are configured when you reinitialize your camp.

That raises the complexity of things considerably but I like it anyway.
Please act terribly surprised.

RESOURCE IDENTIFICATION

Each resource should have a friendly type-name.  Like "localhost",
"postgres", "apache", "git", "svn", etc.

Any given resource in a camp system deployment should have a name
attribute that can be explicitly set in the configuration, but the
resource should default to its type-name.  This means we have a decent
convention that would work well for relatively simple deployments for
which there is only one sort of resource for each given layer of the
stack (e.g. one Apache instance, one Postgres instance, one appserver
instance, etc.).

If there is truly only one use of a particular resource type within a
deployment, then the bare type-name can suffice to identify that
resource.  However, names can be relative to container resources;
resources representing remote hosts, for instance, could all contain a
"postgres" resource:
  - host1.postgres
  - host2.postgres
  - ...

Using a configuration-specified name only becomes important if you want
to name things according to use-case-specific roles ("master_db" versus
"slave_db", for instance), or if you need multiple instances of the same
resource (an Apache instance that serves static content, and an Apache
instance that functions as an appserver for php/mod_perl/mod_python).
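
Here is one way the naming rules could work, as a sketch.  The class
names, the type_name and qualified_name methods, and the choice to leave
the root "localhost" out of dotted names are assumptions made for the
example.

```perl
use strict;
use warnings;

package Camp::Resource;

sub new {
    my ($class, %args) = @_;
    my $self = bless { %args }, $class;
    # The name defaults to the type-name unless explicitly configured.
    $self->{name} = $self->type_name unless defined $self->{name};
    return $self;
}

sub type_name { return 'resource' }

# Join container names with dots, leaving out the root resource.
sub qualified_name {
    my ($self) = @_;
    my @names = ($self->{name});
    my $node  = $self->{container};
    while ($node && $node->{container}) {
        unshift @names, $node->{name};
        $node = $node->{container};
    }
    return join '.', @names;
}

package Camp::Resource::Postgres;
use parent -norequire, 'Camp::Resource';

sub type_name { return 'postgres' }

package main;

my $root  = Camp::Resource->new(name => 'localhost');
my $host1 = Camp::Resource->new(name => 'host1', container => $root);

my $plain  = Camp::Resource::Postgres->new(container => $root);
my $remote = Camp::Resource::Postgres->new(container => $host1);
my $master = Camp::Resource::Postgres->new(
    name      => 'master_db',
    container => $root,
);

print join(', ', map { $_->qualified_name } $plain, $remote, $master), "\n";
```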

RESOURCE DECLARATION

Naturally, we need a way to declare resources as existing/mattering, as
having relationships with each other, etc.

I need to think this one through more (whereas nothing else defined
above requires any further thinking-through whatsoever, obviously).
However, a few things probably ought to guide us:
* simplicity and clarity
* ease of use
* common-sense defaults/conventions that fit the simple/common case,
easily extended/overridden for the less common cases.

If that sounds a little like Rails' "convention over configuration",
there's probably a reason.

Here's an idea:
* under any given camp type, there's a "resources" directory in which
the resource definitions reside
* any top-level resource (other than "localhost", which is the magical
root resource) appears as a directory within this "resources" directory
(e.g. <camp_type>/resources/postgres/;
<camp_type>/resources/memcached/; <camp_type>/resources/django/; etc.,
etc., etc.)
* the name of the directory identifies the resource's name relative to
its containing resource; in the above examples, that means you get
top-level resources named "postgres", "memcached", and "django"
* within each directory, some file exists that defines the resource
configuration/behavior; it could be named the same for any resource
(i.e. "resource", or "config", etc.), or it could be required to have a
name that matches the resource name (i.e. same as the directory)
* that file would be a Perl module.
* Furthermore, it would, when parsed, default to subclassing the
resource type of the same name as the resource being defined; so, if the
resource is named "postgres", the camp system would look for a
"postgres" resource definition class in its standard library of known
resource types; if found, the "postgres" resource's configuration module
would automatically subclass it.
* Furthermore, the module can override this and explicitly specify what
type of resource it is.  Hence we have convention and configuration.  Yay.
* Still furthermore, the module would automatically be using Moose and
whatever helper functions we want available for easy definition of
attributes, etc.
* A resource can have a subdirectory "resources" that contains still
more resource definitions, establishing the container/contained
relationship sensibly within the filesystem.
* A resource has a subdirectory "templates" that contains template
configuration files for rendering when installing resources into a camp.
This is like the <camp_type>/etc/ directory from the existing camp
design, but cleanly separated by resource.
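
The directory convention in the list above could be walked roughly like
this.  The discover function is hypothetical, and the layout is built in
a scratch directory (via File::Temp) purely so the sketch can run; the
"host1" resource with a nested "postgres" is an invented example of
containment.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Return dotted resource names found under a "resources" directory,
# recursing into each resource's own "resources" subdirectory.
sub discover {
    my ($dir, $prefix) = @_;
    opendir my $dh, $dir or return;
    my @dirs = grep { !/^\./ && -d "$dir/$_" } readdir $dh;
    closedir $dh;

    my @names;
    for my $entry (sort @dirs) {
        my $name = defined $prefix ? "$prefix.$entry" : $entry;
        push @names, $name, discover("$dir/$entry/resources", $name);
    }
    return @names;
}

# Build a throwaway camp type layout to walk.
my $camp_type = tempdir(CLEANUP => 1);
make_path("$camp_type/resources/$_") for qw(
    postgres/templates
    memcached
    host1/resources/postgres
);

my @found = discover("$camp_type/resources");
print "$_\n" for @found;
```

The "templates" directory is not reported as a resource because only
"resources" subdirectories are descended into.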

A master configuration file for the camp type should specify a listing
of resources to include by default in a new camp.  Perhaps that's done
in some nice declarative manner, like:

 default_resources(
     qw(
         postgres
         apache
         memcached
         git
         django
     )
 );

(Or something equivalent, perhaps in YAML.  Whatever).

When operating on a camp (setting up a new one, for instance), the camp
system consults the command-line arguments to determine what resources
to include; if not specified there, it uses the default.
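
A sketch of that fallback using the core Getopt::Long module.  The
--resource option name and the resources_for function are assumptions;
the real command-line syntax is undecided.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# The camp type's declared defaults.
my @default_resources = qw(postgres apache memcached git django);

# Use resources named on the command line, else the defaults.
sub resources_for {
    my (@argv) = @_;
    my @requested;
    GetOptionsFromArray(\@argv, 'resource=s' => \@requested)
        or die "bad options\n";
    return @requested ? @requested : @default_resources;
}

my @explicit = resources_for('--resource', 'postgres', '--resource', 'apache');
my @implicit = resources_for();

print "explicit: @explicit\n";
print "implicit: @implicit\n";
```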

From there, it loads up only the resource modules that are necessary.

All resources get instantiated as objects in memory with their
configuration determined as a first pass, and then templates get
rendered and resources installed/launched as a second pass.
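
One reason to separate the passes, as I read it, is that a template for
one resource may need settings from another.  The sketch below assumes a
made-up {{name}} placeholder syntax, made-up port bases, and in-memory
templates; it only shows the ordering, not a real renderer.

```perl
use strict;
use warnings;

my $camp      = 12;
my %base_port = (postgres => 8900, apache => 9000);
my %templates = (
    postgres => "port = {{port}}\n",
    apache   => "Listen {{port}}\n# database at port {{postgres.port}}\n",
);

# Pass 1: determine every resource's configuration in memory.
my %config;
$config{$_} = { port => $base_port{$_} + $camp } for sort keys %base_port;

# Pass 2: render templates, which may now refer to any resource's
# settings ({{attr}} is local, {{resource.attr}} is another resource's).
my %rendered;
for my $name (sort keys %templates) {
    (my $text = $templates{$name})
        =~ s/\{\{(?:(\w+)\.)?(\w+)\}\}/$config{ defined $1 ? $1 : $name }{$2}/ge;
    $rendered{$name} = $text;
}

print "--- $_ ---\n$rendered{$_}" for sort keys %rendered;
```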

BASE RESOURCE DEFINITIONS

It is basically implied by the above that resource types have a base
definition within the camp system.

So, there would be one base "resource" module defining the stuff common
to all resources.

From there, we would have subclasses of this "resource" module that
implement specific resource types (again, "postgres" versus "mysql"
versus "apache" versus "fabulous shiny pants" versus whatever else).
This is standard inheritance.

We do not fret about getting too generic, at least at first.  I.e. no
"Resource::Database" from which "Resource::Database::Postgres" and
"Resource::Database::MySQL" derive.  We implement postgres and mysql
independently, and if common things can obviously be factored out into
some common ancestor, we do it.  But we shouldn't assume too many
similarities between competing resource types despite them fulfilling
the same role (if Postgres and MySQL can honestly be thought of as
fulfilling the same role, which is a topic for another day).

Because the resource configuration modules in each camp type are Perl
modules that subclass these things, we automatically get the opportunity
for customization within our camps without having to hack the base type
classes themselves.  Joy.  That ought to be a no-brainer, but those of
us accustomed to working with some older appservers will likely find
this most liberating.
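
For instance, a camp type that needs a different Postgres locale could
override just that one piece.  Both classes below are stand-ins written
in plain Perl rather than Moose; the package names, the port formula,
and the locale and initdb_args methods are assumptions for illustration.

```perl
use strict;
use warnings;

# Stand-in for a base type shipped with the camp system.
package Camp::Resource::Postgres;

sub new {
    my ($class, %args) = @_;
    return bless { camp => $args{camp} }, $class;
}

sub port        { my ($self) = @_; return 8900 + $self->{camp} }
sub locale      { return 'C' }
sub initdb_args { my ($self) = @_; return ('--locale', $self->locale) }

# Stand-in for the module under <camp_type>/resources/postgres/; it
# subclasses the base type of the same name and overrides one method.
package MyCampType::Resource::Postgres;
use parent -norequire, 'Camp::Resource::Postgres';

sub locale { return 'en_US.UTF-8' }

package main;

my $pg   = MyCampType::Resource::Postgres->new(camp => 12);
my @args = $pg->initdb_args;

print "port ", $pg->port, ", initdb @args\n";
```

The base class is untouched; the camp type only states what differs.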

I could go on but that's enough for now.

-- 
Ethan Rowe
End Point Corporation
ethan at endpoint.com