[Camps-users] Upcoming development session and a proposed way forward

Mon Jan 19 08:37:59 UTC 2009

Brian J. Miller wrote:
> Ethan Rowe wrote:
>> All,
>>
>>
>> With that in mind, I'd like to put forward (yet another) proposal for a
>> camp system design.  This is derived in part from the proposal I made
>> almost a year ago (in February), except that it arguably simplifies
>> things: instead of having "hosts" versus "services" versus "resources",
>> we just have "resources" which encompass everything.
>>
> 
> Figures, you would go the opposite direction that I already coded in ;-).

It looks like your work has been almost exclusively towards the
command-line interface; the CLI modules have some logic in them here and
there, while everything else is a stub.

Consequently, I feel like we can, at the hackathon, pursue the proposed
"everything's a resource" model without being particularly at odds with
anything you've done.  Which is great; we're effectively developing
different parts in parallel.

>> This does not have to be what we do.  We could look at all sorts of
>> things, from starting out with a fundamentally new design, to unifying
>> the camp commands to a single "camp <command> <options>" style, to
>> improving the docs on the existing system, etc.
>>
> 
> See my other e-mail.  I think there is enough there that building a
> unified command syntax will be the easy part; pulling all the other
> pieces together is more difficult.

Yep.  Though the hard part is the fun part.  ;)

>> I'd love to hear some ideas on what matters, but I'd specifically like
>> to hear feedback on the proposed design (to follow, in great detail).
>> If we can have some rough consensus on what a next-generation camp
>> system would look like, this group could start on it, understanding that
>> we would start with a bite-sized chunk that we could really deliver in a
>> relatively short time.
[snip]
>>
>> BASIC MODEL
>>
>> No separation of "hosts", "services", and "resources".  Just "resources".
>>
>> A resource may be a host, or a service.  There's no need for
>> distinguishing them.
>>
> 
> I'm not sure I fully see the need for "host" at all, but it may be that
> you are thinking more of encapsulation and hierarchy than is in camps
> v3. I was thinking of the host as just being a configuration parameter
> of a resource. Or maybe this is more related to distributed
> environments than my brain is ready to contemplate :-).

In my view, abstracting the host is essential to the viability of camps
going forward.  Bear in mind that we don't need immediate support for
remote hosts and such; we only need the host abstraction, so that remote
hosts can be added naturally in the future.

We've already had at least two significant camp deployments that
required support for resources running on remote hosts.  In both cases,
the main camp host (on which the camp system runs) handled webserver and
appserver, while a separate database server handled all the camp
database clusters.  We've seen need for further remote host usage; for
instance, memcached clusters.  Distributed filesystem development camps
(for End Point's own work with OpenAFS, for instance) would benefit
considerably from support for remote hosts.  And so on.

Beyond that, consider:
* virtualization: virtual servers through Xen and company are often
cited (particularly by people with relatively little experience actually
working with the problem) as the solution to the development environment
problem, except virtualization by itself doesn't give you any meaningful
management of the virtual environments.  A camp system that abstracts
out hosts can treat a virtual server like just another host, with a
configuration step that localhost or SSH-accessed remote hosts don't
typically require.
* scaling for large architectures/teams: once you have a team of 30+
engineers developing a system, the need to scale out your camp server
becomes fairly acute.  Getting a beefy server is one solution, but that
will only get you so far.  So, support for host as a resource gives us
the potential for scaling to multiple servers on a round-robin basis or
similar, and thus support large teams (or complex architectures with
lots of services; or both).

>> One resource can depend on another.
>>
>> One resource may contain another.
>>
>> We use resource containment (object composition) to build operational
>> relationships between resources.
>>
>> We use (hopefully shallow) inheritance hierarchies to define families of
>> related resource types (like a "database" resource family of which
>> "Postgres" and "MySQL" are members).
>>
> 
> Yep, was thinking along these lines.
> 
>> INTERACTING WITH RESOURCES
>>
>> 1. Low level interaction
>>
>> A resource has two basic "command interfaces", by which I mean ways of
>> issuing commands/requests to the thing represented by the resource (not
>> "interfaces" in the sense of Java-esque interfaces that a class
>> implements):
>> * a "system" interface: this is the interface through which the camp
>> system issues commands/requests to initialize and configure the resource
>> in question (i.e. if you need to initialize a new Postgres cluster, you
>> use the Postgres resource's system interface)
>> * a "service" interface: this is the interface that issues
>> commands/requests to the actual public-facing interface offered by the
>> resource in question; once you've configured your Postgres resource
>> through its system interface, you issue DDL commands through its service
>> interface.
>>
>> Each resource has a container attribute, which simply points back to the
>> resource that contains the resource in question.  So, the local
>> operating system resource is the container for the postgres resource.
>>
>> By default, a resource's "system" interface simply passes arguments
>> through to the resource container's "service" interface.  Thus, commands
>> issued to the resource's system interface are formatted as commands
>> issued against the container service.  We would probably want some
>> convention so that adding in resource class-specific functionality (so
>> all commands issued would be issued relative to some base
>> script/executable/whatever) is easily done with minimal base/super-class
>> interaction annoyances (no need to call $self->SUPER::foo, for instance,
>> in Perl-speak).
>>
>> The default resource is "localhost", the service and system interfaces
>> of which are the same.  They simply format commands to go to the
>> underlying operating system of the primary camp server.
>>
>> A resource representing a remote host might use the "localhost"
>> interfaces to format commands, but pipe those commands over an SSH
>> tunnel, maintained within the resource instance.
>>
>> A virtual machine resource would implement the SSH tunnel design,
>> presumably, for its "service" interface, while the default value of
>> using its container's "service" interface for its own "system" interface
>> would make perfect sense.  So, for instance, "localhost" might be the
>> container, meaning that the VM resource's system interface is the same
>> as the localhost service interface.  So, you are effectively issuing
>> shell commands to localhost via the VM resource's system interface when
>> initializing the VM resource.
>>
> 
> I'm not sure I follow all of this, and it sounds like a lot of
> abstraction to try to achieve right away, I'm wondering if it makes more
> sense in a later iteration?

Yeah, I'm trying to outline what the Thing might look like, but we'll
definitely need to get there in an iterative manner.  So I'm not
planning to pursue support for virtualized servers, remote hosts, etc.
at the hackathon.  :)
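
To make the system/service split a little more concrete, here is a
minimal sketch.  It's in Python purely for brevity (the real thing would
be Perl), and every name in it (Resource, Localhost, RemoteHost,
Postgres, system, service) is hypothetical.  It only formats commands as
argument lists and never runs anything, and the RemoteHost class simply
prefixes "ssh" as a stand-in for a properly maintained tunnel.

```python
import shlex


class Resource:
    """Hypothetical base resource; formats commands, never runs them."""

    def __init__(self, name, container=None):
        self.name = name
        self.container = container  # the resource that contains this one

    def service(self, command):
        # Commands aimed at whatever this resource itself offers.
        raise NotImplementedError(f"{self.name} has no service interface")

    def system(self, command):
        # Default: pass straight through to the container's service interface.
        if self.container is None:
            raise RuntimeError(f"{self.name} has no container")
        return self.container.service(command)


class Localhost(Resource):
    """Root resource: system and service interfaces are the same thing."""

    def __init__(self):
        super().__init__("localhost")

    def service(self, command):
        return list(command)  # an argv list for the local shell

    def system(self, command):
        return self.service(command)


class RemoteHost(Resource):
    def __init__(self, name, hostname, container):
        super().__init__(name, container)
        self.hostname = hostname

    def service(self, command):
        # Stand-in for an SSH tunnel: wrap the command for ssh, then hand
        # it to the container (normally localhost) for formatting.
        wrapped = ["ssh", self.hostname, shlex.join(command)]
        return self.container.service(wrapped)


class Postgres(Resource):
    def initialize(self, data_dir):
        # Setup goes through the system interface (the container's shell).
        return self.system(["initdb", "-D", data_dir])

    def service(self, sql):
        # Once set up, SQL goes to the resource's own public interface.
        return self.container.service(["psql", "-c", sql])
```

The same Postgres class works unchanged whether its container is
Localhost or a RemoteHost; only the formatted command differs.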

>> 2. High-level interaction
>>
>> We might want resources to implement a few common methods like:
>> * initialize
>> * remove
>> * start
>> * stop
>> * restart
> 
> To which I would add "refresh" based on the current implementation. But
> the list looks like a good start. I would probably also add "info" as a
> generic way to get just that, info. For example whether a service is
> running/down, outdated, and configuration data, etc.

Yeah, "refresh" is an important one.

It's also important for resources to indicate what they support and what
they don't; a version control system resource has no logical equivalent
to "start", "stop", "restart".  But "initialize", "remove", and
"refresh" make sense.  (One can debate whether in this specific case
"refresh" logically matches up to updating from upstream or tossing out
your local changes to revert to HEAD, or both).

Anyway, for any given deployment, these commands should behave
reasonably depending on the combination of resources involved; if your
camp-managed system doesn't have components that run as a daemon, then
"start" and "stop" should probably throw exceptions in code and return
soft error codes from the shell.
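
Roughly what I have in mind, as a sketch only (Python for brevity; the
class names, the "supported" set, and the 0/1 exit-code convention are
all assumptions made up for illustration):

```python
class UnsupportedCommand(Exception):
    """Raised when a resource is asked to do something it cannot do."""


class Resource:
    # Commands every resource in this sketch understands.
    supported = frozenset({"initialize", "remove", "refresh"})

    def __init__(self, name):
        self.name = name

    def run(self, command):
        if command not in self.supported:
            raise UnsupportedCommand(f"{self.name} does not support {command}")
        return f"{self.name}: {command}"


class Daemon(Resource):
    # Resources that run as a daemon add the lifecycle commands.
    supported = Resource.supported | {"start", "stop", "restart"}


def run_on_camp(resources, command):
    """Apply a command to every resource that supports it.

    Returns a shell-style exit code: 0 if at least one resource handled
    the command, 1 (a soft error) if nothing in the camp understands it.
    """
    targets = [r for r in resources if command in r.supported]
    if not targets:
        return 1
    for resource in targets:
        resource.run(command)
    return 0
```

A camp containing only a version control resource would return the soft
error for "start", while adding a daemon-style resource makes the same
command succeed and quietly skip the resources it doesn't apply to.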

>> Logically, initialize and remove methods would be primarily concerned
>> with the system interface.  Start, stop, restart would probably use the
>> service interface.  For instance, an Apache resource would basically use
>> file system operations (copying files, removing files) to build up and
>> tear down an Apache resource instance.  But controlling the resource
>> once it's initialized would all be done through the httpd command,
>> around which the service interface would basically be a wrapper.
>>
>> RESOURCE CONFIGURATION
>>
>> Resources would have attributes that must be configured.  Attribute
>> values should generally be calculated algorithmically.  The current camp
>> system basically treats attribute calculation as a set of mathematical
>> functions (that don't look terribly mathematical); for a given camp
>> number X, each attribute has only one possible value f(X) (i.e. the camp
>> number determines everything, from paths to port numbers to hostnames).
>>
> 
> Good description.
> 
>> However, it should be possible for attributes to be persistent, meaning
>> that once derived they are permanently stored as part of a camp's
>> configuration, rather than dynamically calculated for all time.
>>
>> It must be possible to reinitialize or reconfigure a resource (or set of
>> resources), and persistent attributes can be preserved rather than
>> recalculated.  Control over what persistent attributes to blow away at
>> re-initialization time must be built in from the beginning if this is to
>> make sense.
>>
>> Resource configuration, once determined, should be serialized down to
>> some storage format like YAML, JSON, XML (!!!!!!!!!!!), etc., so that
>> persistent attributes are preserved.
>>
>> Furthermore, because the central camp system needs to know all
>> attributes of the extant camps, this configuration information must be
>> preserved in the central system.  However, to potentially allow for a
>> distributed camp system where camps are spread across N nodes in a
>> server cluster (yes, we have a camp system for which this would be
>> highly relevant), and simply to allow local inspection of the
>> configuration, each camp's resource configuration information should
>> persist within the camp itself (i.e. in userland rather than base
>> camp-systemland).
>>
> 
> In this vein I have the system create a ".camp" directory inside of the
> camp itself for storing information of this kind. Currently I have it
> store the date+time the camp was created and the type, just as sample data.

I had exactly the same thought: have a ".camp" directory (a default;
this should be configurable) containing local metadata about the local camp.
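
As a rough illustration of derived versus persistent attributes, here
is a sketch (Python for brevity).  The port and path formulas are
invented for the example, as are the function names; JSON stands in for
whichever storage format we end up choosing.

```python
import json


def derive_attributes(camp_number):
    """Every value is a pure function f(X) of the camp number X."""
    return {
        "http_port": 9000 + camp_number,   # made-up formula
        "db_port": 8900 + camp_number,     # made-up formula
        "path": f"/home/camps/camp{camp_number}",
    }


def configure(camp_number, stored=None, reset=()):
    """Merge freshly derived values with previously persisted ones.

    stored: attributes loaded from the camp's local metadata store
    reset:  names of persistent attributes to discard and recalculate
    """
    derived = derive_attributes(camp_number)
    kept = {k: v for k, v in (stored or {}).items() if k not in reset}
    return {**derived, **kept}


def serialize(config):
    """Serialize the configuration for storage (e.g. under .camp/)."""
    return json.dumps(config, sort_keys=True, indent=2)
```

Re-initializing with the stored configuration preserves any overridden
value; naming an attribute in "reset" blows it away so it gets
recalculated from the camp number.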

(As an aside, I think we ought to go further and have camp commands
always walk the filesystem to find the nearest ".camp" directory and
effectively "bind" to that metadata store for camp operations, much like
git.)
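
The lookup itself is simple.  A minimal sketch, in Python for brevity
(the function name and the "marker" parameter are hypothetical):

```python
from pathlib import Path


def find_camp_dir(start, marker=".camp"):
    """Walk upward from `start` and return the nearest metadata directory.

    Returns None if no directory named `marker` exists at or above `start`.
    """
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        candidate = directory / marker
        if candidate.is_dir():
            return candidate
    return None
```

Any camp command could call this with the current working directory and
bind to whatever it finds, so commands work from anywhere inside a camp.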

> 
>> This means having some configuration data scattered around, which is
>> kind of a drag.  But it's a manageable drag.  Camp commands for
>> manipulating configuration values (camp config set <name> <value>) could
>> only be counted as successful if they appear to work centrally and
>> locally, for instance.
>>
> 
> Agreed.
> 
>> MAGICAL RESOURCES
>>
>> The base resource would of course define basic behavior for all
>> resources.
>>
>> A "localhost" resource would exist by default.  It refers to the
>> underlying operating system (and shell environment) for the central camp
>> system itself.
>>
>> I'd like to propose that individual camps be treated as resources as
>> well.  This is a fairly new idea (in my mind, anyway) and may be
>> completely ridiculous.  I'm just putting it out there.  Representing the
>> camps themselves as resources, and then the attributes of the camps
>> (i.e. numbers, owners, base paths, etc.) are managed like any other
>> resource attribute.  Furthermore, the persistent attribute functionality
>> lets us potentially have more command over what goes into a given camp.
>>  Perhaps the default for the memcached resource is for camps to get a
>> single memcached server node.  That would be fine for most cases,
>> probably, unless you're the guy who needs to do hard-core memcached
>> usage, testing, etc., in which case you really need multiple memcached
>> servers for your camp.  So you can configure your camp resource to
>> indicate a need for 5 memcached servers, say, which in turn affects how
>> the memcached resources are configured when you reinitialize your camp.
>>
> 
> This fits in with another desire I had for the new system, and that is
> to allow the ability to have a camp's contents be pulled from multiple
> repos, of possibly different VCS types. In our world that would allow
> for multiple camp projects to pull from a single IC resource for example
> which would make upgrades easier, etc.

Yep.  :)

> 
>> That raises the complexity of things considerably but I like it anyway.
>>  Please act terribly surprised.

You didn't act surprised.

>> RESOURCE IDENTIFICATION
>>
>> Each resource should have a friendly type-name.  Like "localhost",
>> "postgres", "apache", "git", "svn", etc.
>>
>> Any given resource in a camp system deployment should have a name
>> attribute that can be explicitly set in the configuration, but the
>> resource should default to its type-name.  This means we have a decent
>> convention that would work well for relatively simple deployments for
>> which there is only one sort of resource for each given layer of the
>> stack (e.g. one Apache instance, one Postgres instance, one appserver
>> instance, etc.).
>>
>> If there is truly only one use of a particular resource type within a
>> deployment, then the bare type-name can suffice to identify that
>> resource.  However, names can be relative to container resources;
>> resources representing remote hosts, for instance, could all contain a
>> "postgres" resource:
>>   - host1.postgres
>>   - host2.postgres
>>   - ...
>>
>> Using a configuration-specified name only becomes important if you want
>> to name things according to use-case-specific roles ("master_db" versus
>> "slave_db", for instance), or if you need multiple instances of the same
>> resource (an Apache instance that serves static content, and an Apache
>> instance that functions as an appserver for php/mod_perl/mod_python).
>>
> 
> We already talked a bit about this offline but I think it is a nice
> setup. I think a specific resource of a given type should be able to be
> marked as "default" allowing us to determine the default and call it by
> resource type even in the case that there are multiple with names.

Yep.  This is stuff I'm figuring would come in some subsequent iteration
post-hackathon.
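
For whenever we do get to it, the naming rules could look something
like this sketch (Python for brevity; the classes, the "default" flag,
and the lookup function are all hypothetical):

```python
class Resource:
    type_name = "resource"

    def __init__(self, name=None, container=None, default=False):
        # The name falls back to the type-name unless set explicitly.
        self.name = name or self.type_name
        self.container = container
        self.default = default

    @property
    def qualified_name(self):
        # Names are relative to the container: "host1.postgres", etc.
        if self.container is None:
            return self.name
        return f"{self.container.qualified_name}.{self.name}"


class Host(Resource):
    type_name = "host"


class Postgres(Resource):
    type_name = "postgres"


def lookup(resources, identifier):
    """Resolve by qualified name first, then by bare type-name."""
    for resource in resources:
        if resource.qualified_name == identifier:
            return resource
    matches = [r for r in resources if r.type_name == identifier]
    if len(matches) == 1:
        return matches[0]
    # Several resources of this type: fall back to the one marked default.
    defaults = [r for r in matches if r.default]
    if len(defaults) == 1:
        return defaults[0]
    raise LookupError(f"ambiguous or unknown resource: {identifier}")
```

With two Postgres resources, "postgres" resolves to whichever one is
marked as the default, while "host2.postgres" always names one exactly.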

>> RESOURCE DECLARATION
>>
>> Naturally, we need a way to declare resources as existing/mattering, as
>> having relationships with each other, etc.
>>
> 
> Yep. I think we can look at just about any dependency handling in any
> package manager and get some hints.
> 
>> I need to think this one through, more (whereas nothing else defined
>> above requires any further thinking-through whatsoever, obviously).
>> However, a few things probably ought to guide us:
>> * simplicity and clarity
>> * ease of use
>> * common-sense defaults/conventions that fit the simple/common case,
>> easily extended/overridden for the less common cases.
>>
>> If that sounds a little like Rails' "convention over configuration",
>> there's probably a reason.
>>
>> Here's an idea:
>> * under any given camp type, there's a "resources" directory in which
>> the resource definitions reside
>> * any top-level resource (other than "localhost", which is the magical
>> root resource) appears as a directory within this "resources" directory
>> (e.g. <camp_type>/resources/postgres/;
>> <camp_type>/resources/memcached/; <camp_type>/resources/django/; etc.,
>> etc., etc.)
>> * the name of the directory identifies the resource's name relative to
>> its containing resource; in the above examples, that means you get
>> top-level resources named "postgres", "memcached", and "django"
>> * within each directory, some file exists that defines the resource
>> configuration/behavior; it could be named the same for any resource
>> (i.e. "resource", or "config", etc.), or it could be required to have a
>> name that matches the resource name (i.e. same as the directory)
>> * that file would be a Perl module.
>> * Furthermore, it would, when parsed, default to subclassing the
>> resource type of the same name as the resource being defined; so, if the
>> resource is named "postgres", the camp system would look for a
>> "postgres" resource definition class in its standard library of known
>> resource types; if found, the "postgres" resource's configuration module
>> would automatically subclass it.
>> * Furthermore, the module can override this and explicitly specify what
>> type of resource it is.  Hence we have convention and configuration. 
>> Yay.
>> * Still furthermore, the module would automatically be using Moose and
>> whatever helper functions we want available for easy definition of
>> attributes, etc.
>> * A resource can have a subdirectory "resources" that contains still
>> more resource definitions, establishing the container/contained
>> relationship sensibly within the filesystem.
>> * A resource has a subdirectory "templates" that contains template
>> configuration files for rendering when installing resources into a camp.
>>  This is like the <camp_type>/etc/ directory from the existing camp
>> design, but cleanly separated by resource
>>
> 
> Other than the storage of the template files this seems like a lot of
> extra work to traverse and read from the filesystem when it could be
> dropped into a single configuration file that has arbitrary depth
> (XML-esque) fairly easily.

Let me back up a bit and explain my thinking a little better, in case
it's not clear.

It's probably clear from what we've been discussing that each type of
resource (e.g. "Postgres", "MySQL", "Git", "Apache", "Rails", "Django",
"Memcached", etc.) would have its own class, each being a subclass
(directly or indirectly) of the basic resource class.

Within a given camp type, each resource would in turn be another class,
inheriting from the relevant resource type.  This gives us the
opportunity to do custom configuration, behavior, etc. per resource
within a camp type.

When we manipulate a specific camp, each resource within the camp is an
instance of the relevant class.
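
The three layers (resource type, per-camp-type subclass, per-camp
instance) look something like this sketch.  It's Python for brevity,
and the class names and port numbers are invented for illustration:

```python
class Resource:
    """Base resource class in the camp system's library."""

    def __init__(self, camp_number):
        self.camp_number = camp_number


class Postgres(Resource):
    """Resource type: behavior shared by all Postgres resources."""

    base_port = 8900  # made-up default

    @property
    def port(self):
        return self.base_port + self.camp_number


class ShopPostgres(Postgres):
    """Per-camp-type subclass: custom configuration for one camp type."""

    base_port = 5500  # made-up override


# Per-camp instance: one object per resource within a specific camp.
camp12_db = ShopPostgres(12)
```

The camp type customizes the resource without touching the base type
class, and each camp just gets its own instance.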

My outline above for file system organization is probably a bit much.
So, how about something like this instead:

<camp_base>/
    <camp_type>/
        camp.pm
        resources/
            apache.pm
            rails.pm
            db_server.pm
            db_server/
                postgres.pm
        templates/
            apache/
            rails/
            db_server/
                postgres/

The "resources" directory is basically laid out like a standard Perl
library directory.

The "templates" directory does everything according to resource
layout/nesting.  Any files/paths within a given resource's template
directory are assumed to be relative to the resource's install location
within the camp.

In "camp.pm", we have the top-level camp resource subclass, which
ultimately contains everything.  It has a magical "localhost" container
resource by default.

What we've conventionally thought of as camp-level configuration
variables would be attributes in camp.pm.

The "config.yml" file simply specifies the subset of total resources to
install for a new camp by default.

When we need to work with a specific camp, we:
* determine the camp number (either provided explicitly or pulled from
DB sequence for a new camp)
* determine the subset of total resources included in the camp (pulled
from config.yml for a new camp, or from the camp's .camp/config.yml
local store for an existing camp)
* load up only the resource modules we need
* instantiate the camp object with the relevant camp information
(number, set of resource names to care about, etc.)
* walk the resource class types and instantiate each resource object in
turn; each resource constructor is given its container resource and the
top camp resource object, plus any configuration information specific to
the resource.
* if we're setting up a new camp or refreshing an existing one, we'll
render each resource's templates (using ASP-style templating, like with
Text::ScriptTemplate or a similar module Brian mentioned, the name of
which I forget).  Each template has the following marshalled into it:
  - $camp (the top-level camp object)
  - $resource (where "resource" is actually the name of the resource
    itself, like $postgres, or $rails, or whatever)
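
The steps above can be sketched as two passes: instantiate everything,
then render.  This is Python for brevity, all names are hypothetical,
and str.format stands in for the ASP-style templating; container
handling is left at its default to keep the example short.

```python
class Camp:
    def __init__(self, number, resource_names):
        self.number = number
        self.resource_names = list(resource_names)
        self.resources = {}


class Resource:
    def __init__(self, name, camp, container=None, config=None):
        self.name = name
        self.camp = camp            # the top-level camp object
        self.container = container  # None here means "localhost"
        self.config = config or {}

    def render(self, template):
        # The template sees the camp, plus this resource under its own name.
        return template.format(camp=self.camp, **{self.name: self})


def build_camp(number, resource_names, configs=None):
    """First pass: instantiate the camp and each resource it includes."""
    camp = Camp(number, resource_names)
    for name in resource_names:
        config = (configs or {}).get(name)
        camp.resources[name] = Resource(name, camp, config=config)
    return camp


def render_all(camp, templates):
    """Second pass: render templates only for resources in this camp."""
    return {
        name: camp.resources[name].render(text)
        for name, text in templates.items()
        if name in camp.resources
    }
```

Templates for resources the camp doesn't include are simply skipped,
which matches loading only the resource modules we need.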

Later on, I think we could have a means of allowing attribute
specification in config.yml, so values set in YAML end up doing some
metaprogramming of the resource subclasses, meaning that people wouldn't
need to have any familiarity with Perl or Moose to do basic camp
deployment setup.  However, I think it makes sense to get the pure Perl
class-based stuff in place and working first; making it easy with YAML
could be done later.

>> A master configuration file for the camp type should specify a listing
>> of resources to include by default in a new camp.  Perhaps that's done
>> in some nice declarative manner, like:
>>
>>  default_resources(
>>      qw(
>>          postgres
>>          apache
>>          memcached
>>          git
>>          django
>>      )
>>  );
>>
>> (Or something equivalent, perhaps in YAML.  Whatever).
>>
>> When operating on a camp (setting up a new one, for instance), the camp
>> system consults the command-line arguments to determine what resources
>> to include; if not specified there, it uses the default.
>>
>> From there, it loads up only the resource modules that are necessary.
>>
> 
> See Module::Pluggable::Object and autouse (5.10 only??).

I think wrapping Module::Pluggable::Object with a subclass that adds
some particulars would work pretty well.

>> All resources get instantiated as objects in memory with their
>> configuration determined as a first pass, and then templates get
>> rendered and resources installed/launched as a second pass.

Which is consistent with what I outlined above.

>> BASE RESOURCE DEFINITIONS
>>
>> It is basically implied by the above that resource types have a base
>> definition within the camp system.
>>
>> So, there would be one base "resource" module defining the stuff common
>> to all resources.
>>
>> From there, we would have subclasses of this "resource" module that
>> implement specific resource types (again, "postgres" versus "mysql"
>> versus "apache" versus "fabulous shiny pants" versus whatever else).
>> This is standard inheritance.
>>
>> We do not fret about getting too generic, at least at first.  I.e. no
>> "Resource::Database" from which "Resource::Database::Postgres" and
>> "Resource::Database::MySQL" derive.  We implement postgres and mysql
>> independently, and if common things can obviously be factored out into
>> some common ancestor, we do it.  But we shouldn't assume too many
>> similarities between competing resource types despite them fulfilling
>> the same role (if Postgres and MySQL can honestly be thought of as
>> fulfilling the same role, which is a topic for another day).
>>
>> Because the resource configuration modules in each camp type are Perl
>> modules that subclass these things, we automatically get the opportunity
>> for customization within our camps without having to hack the base type
>> classes themselves.  Joy.  That ought to be a no-brainer, but those of
>> us accustomed to working with some older appservers will likely find
>> this most liberating.
>>
>> I could go on but that's enough for now.
>>
> 
> I'll say :-). So who has some tuits?

Hopefully the blurb about file system layout and more specifics about
the resource class stuff is clearer and shows that I'm thinking about it
more.  Over the course of this week I hope to come up with plans
sufficiently specific to allow for some simple interface documentation
and rough unit tests, so the hackathon can proceed to implement the
stuff documented/tested.

Thanks.
- Ethan

-- 
Ethan Rowe
End Point Corporation
ethan at endpoint.com