[ic] Entire Catalog Static Page Generation

Kevin Old interchange-users@icdevgroup.org
Mon Apr 28 15:00:00 2003


On Mon, 2003-04-28 at 11:04, Kevin Old wrote:
> Hello everyone,
> 
> I have scoured the archives for the answer of how to actually generate
> static pages for every item in my catalog and have found tons of
> questions, but no actual answers.  Every "answer" says to use the IC
> Static Page Generation feature, but no one seems able to explain how to
> use it.
> 
> Does anyone know?  I simply want to create static pages for every item
> in my catalog.  Yes, I know this is inefficient.  I simply need to make
> the site available via static pages for various reasons (CD-ROM, Search
> Engines - (yes, I know IC URLs are formatted so that search engines can
> get to the pages)).
> 
> Can anyone provide a detailed explanation of how to use the Static Page
> Generator in IC?
> 
> My catalog.cfg Static* variables are:
> 
> Static        __CATALOG_STATIC__
> StaticLogged  __LOGGED_STATIC__
> StaticAll     Yes
> StaticDBM     static
> StaticDepth   2
> StaticDir     __SAMPLEHTML__/pages
> StaticFly     Yes
> StaticPath    __SAMPLEURL__/pages
> 
> Any help is appreciated!

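Well, if anyone is interested, I put together a spider script that will
crawl your site and save every page it can reach under the starting URL
as a static file.  Pass the starting URL as the only argument, and run it
from the directory where the copies should be written.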


#!/usr/bin/perl
# Mirroring a Document Tree
#
# Mirrors the requested document and all
# subdocuments, using the LWP HTML::LinkExtor module to extract all
# the HTML links.
#
# ----------------------Script I.3.2 mirrorTree.pl--------------------
 use LWP::UserAgent;
 use HTML::LinkExtor;
 use URI::URL;
 use File::Path;
 use File::Basename;
 %DONE    = ();

 my $URL = shift;

 $UA     = LWP::UserAgent->new;
 $PARSER = HTML::LinkExtor->new();
 $TOP    = $UA->request(HTTP::Request->new(HEAD => $URL));
 $BASE   = $TOP->base;

 mirror(URI::URL->new($TOP->request->url));

 sub mirror {
     my $url = shift;

     # get rid of query string "?" and fragments "#"
     my $path = $url->path;
     my $fixed_url = URI::URL->new($url->scheme . '://' . $url->netloc . $path);

     # make the URL relative
     my $rel = $fixed_url->rel($BASE);
     $rel .= 'index.html' if $rel=~m!/$! || length($rel) == 0;

     # skip it if we've already done it
     return if $DONE{$rel}++;

     # create the directory if it doesn't exist already
     my $dir = dirname($rel);
     mkpath([$dir]) unless -d $dir;

     # mirror the document
     my $doc = $UA->mirror($fixed_url,$rel);
     print STDERR "$rel: ",$doc->message,"\n";
     return if $doc->is_error;

     # Follow HTML documents
     return unless $rel=~/\.html?$/i;
     my $base = $doc->base;

     # pull out the links and call us recursively
     my @links = $PARSER->parse_file("$rel")->links;
     my @hrefs = map { url($_->[2],$base)->abs } @links;

     foreach (@hrefs) {
        next unless is_child($BASE,$_);
        mirror($_);
     }

 }

 sub is_child {
     my ($base,$url) = @_;
     my $rel = $url->rel($base);
     return ($rel ne $url) && ($rel !~ m!^[/.]!);
 }
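
To see which links the spider will follow, the is_child test can be tried
on its own. This is only a sketch: the example.com and example.org URLs are
made-up placeholders, not anything from a real catalog. It feeds the same
routine one URL under the base, one on the same host but outside the base
directory, and one on a different host.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI::URL;

# Same test the spider uses: a link is followed only if it can be written
# relative to the base without climbing out of it ("../") or starting at
# the server root ("/").
sub is_child {
    my ($base, $url) = @_;
    my $rel = $url->rel($base);
    return ($rel ne $url) && ($rel !~ m!^[/.]!);
}

# Hypothetical starting point, for illustration only.
my $base = URI::URL->new('http://www.example.com/catalog/');

my @candidates = (
    'http://www.example.com/catalog/item1.html',        # under the base
    'http://www.example.com/other/page.html',           # same host, outside the base
    'http://elsewhere.example.org/catalog/item1.html',  # different host
);

foreach my $candidate (@candidates) {
    my $url = URI::URL->new($candidate);
    printf "%-50s %s\n", $candidate, is_child($base, $url) ? 'follow' : 'skip';
}
```

Only links that resolve to something below the starting URL get mirrored, so
starting the spider at the top of the catalog keeps it from wandering off to
the rest of the server or to other sites.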

HTH,
Kevin
-- 
Kevin Old <kold@carolina.rr.com>