[ic] Entire Catalog Static Page Generation
Kevin Old
interchange-users@icdevgroup.org
Mon Apr 28 15:00:00 2003
On Mon, 2003-04-28 at 11:04, Kevin Old wrote:
> Hello everyone,
>
> I have scoured the archives for how to actually generate static pages
> for every item in my catalog and have found tons of questions, but no
> actual answers. Every "answer" says to use the IC Static Page
> Generation feature, but no one seems able to explain how to use it.
>
> Does anyone know? I simply want to create static pages for every item
> in my catalog. Yes, I know this is inefficient. I simply need to make
> the site available via static pages for various reasons (CD-ROM, search
> engines; yes, I know IC URLs are formatted so that search engines can
> get to the pages).
>
> Can anyone provide a detailed explanation of how to use the Static Page
> Generator in IC?
>
> My catalog.cfg Static* variables are:
>
> Static __CATALOG_STATIC__
> StaticLogged __LOGGED_STATIC__
> StaticAll Yes
> StaticDBM static
> StaticDepth 2
> StaticDir __SAMPLEHTML__/pages
> StaticFly Yes
> StaticPath __SAMPLEURL__/pages
>
> Any help is appreciated!
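Well, if anyone is interested, I put together a spider script that will
crawl your site and save each page it finds as a static file. It takes
the starting URL as its only argument and follows only links that live
underneath that URL.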
#!/usr/bin/perl
# Mirroring a Document Tree
#
# Mirrors the requested document and all of its subdocuments, using
# LWP to fetch pages and HTML::LinkExtor to extract the links.
#
# ----------------------Script I.3.2 mirrorTree.pl--------------------
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;
use File::Path;
use File::Basename;
%DONE = ();
my $URL = shift;
$UA = new LWP::UserAgent;
$PARSER = HTML::LinkExtor->new();
$TOP = $UA->request(HTTP::Request->new(HEAD => $URL));
$BASE = $TOP->base;
mirror(URI::URL->new($TOP->request->url));
sub mirror {
    my $url = shift;

    # get rid of query string "?" and fragments "#"
    my $path = $url->path;
    my $fixed_url = URI::URL->new($url->scheme . '://' . $url->netloc . $path);

    # make the URL relative
    my $rel = $fixed_url->rel($BASE);
    $rel .= 'index.html' if $rel=~m!/$! || length($rel) == 0;

    # skip it if we've already done it
    return if $DONE{$rel}++;

    # create the directory if it doesn't exist already
    my $dir = dirname($rel);
    mkpath([$dir]) unless -d $dir;

    # mirror the document
    my $doc = $UA->mirror($fixed_url,$rel);
    print STDERR "$rel: ",$doc->message,"\n";
    return if $doc->is_error;

    # Follow HTML documents
    return unless $rel=~/\.html?$/i;
    my $base = $doc->base;

    # pull out the links and call us recursively
    my @links = $PARSER->parse_file("$rel")->links;
    my @hrefs = map { url($_->[2],$base)->abs } @links;
    foreach (@hrefs) {
        next unless is_child($BASE,$_);
        mirror($_);
    }
}
# true only if $url is on the same server as $base and underneath it
sub is_child {
    my ($base,$url) = @_;
    my $rel = $url->rel($base);
    return ($rel ne $url) && ($rel !~ m!^[/.]!);
}
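
To make the two decisions in the script easier to see (which local file
name a URL maps to, and which links count as "children" worth
following), here is a minimal standalone sketch. It is an illustration
only, not part of the script above: it uses the plain URI module instead
of URI::URL, the helper name local_path is made up for this example, and
the example.com URLs are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Hypothetical helper (not in the script above): map an absolute URL to
# the relative file name the mirror would write, after dropping the
# query string and fragment.
sub local_path {
    my ($url, $base) = @_;
    my $u = URI->new($url);
    $u->query(undef);
    $u->fragment(undef);
    my $rel = $u->rel($base)->as_string;
    $rel .= 'index.html' if $rel =~ m{/$} || $rel eq '';
    return $rel;
}

# Same test as is_child in the script, with the arguments as plain
# strings: the URL must become relative (same scheme and host) and must
# not climb out of the base directory.
sub is_child {
    my ($base, $url) = @_;
    my $abs = URI->new($url);
    my $rel = $abs->rel($base);
    return ("$rel" ne "$abs") && ("$rel" !~ m{^[/.]});
}

# Placeholder URLs, assumed for illustration only.
my $base = 'http://example.com/shop/';
for my $url (
    'http://example.com/shop/item/123.html?id=5#top',
    'http://example.com/other/page.html',
    'http://other.example.org/shop/item.html',
) {
    printf "%-50s child=%-3s file=%s\n",
        $url,
        (is_child($base, $url) ? 'yes' : 'no'),
        local_path($url, $base);
}
```

Note that because the query string is dropped, two URLs that differ only
in their parameters end up in the same local file, so this approach
suits pages whose identity is in the path rather than the query string.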
HTH,
Kevin
--
Kevin Old <kold@carolina.rr.com>