[ic] removing new line characters
Kevin Walsh
kevin at cursor.biz
Fri Sep 26 22:22:28 EDT 2003
Paul Jordan [paul at gishnetwork.com] wrote:
> > I need to strip out the new line characters from a chunk of HTML. Can
> > anyone hook me up with a little Perl for that?
> >
> > - Grant
> You can make a filter:
>
> CodeDef nonl Filter
> CodeDef nonl Routine <<EOR
> sub {
> my $val = shift;
> $val =~ s/\n//g;
> $val;
> }
> EOR
>
> [filter op="nonl"] junk with newlines [/filter]
>
That may produce unexpected results. Take the following block of
text as an example:
<p>
This is
a
test
</p>
The above filter will make "<p>This isatest</p>", which may or may
not be what's wanted, depending upon your requirements.
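To see the problem outside Interchange, here is a minimal sketch that runs the quoted routine as a plain anonymous sub. The variable names ($nonl, $html) are made up for illustration and are not part of the filter definition itself.

```perl
use strict;
use warnings;

# Stand-alone copy of the quoted "nonl" routine (name assumed for this test).
my $nonl = sub {
    my $val = shift;
    $val =~ s/\n//g;    # deletes every newline, leaving nothing in its place
    return $val;
};

# The example block from above, written as a single string.
my $html = "<p>\nThis is\na\ntest\n</p>\n";

print $nonl->($html), "\n";    # <p>This isatest</p>
```

The words on either side of each deleted newline run together, because nothing is put in its place.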
I suggest the following, which will replace spans of CR and LF
characters with a single space:
    sub {
        my $val = shift;
        # Replace each run of CR and/or LF characters with one space.
        $val =~ s/[\r\n]+/ /g;
        $val;
    }
You may want to further enhance that by replacing multiple spaces
with a single space, so you'd end up with this:
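    sub {
        my $val = shift;
        # Replace each run of CR and/or LF characters with one space.
        $val =~ s/[\r\n]+/ /g;
        # Collapse runs of two or more spaces into a single space.
        $val =~ s/ {2,}/ /g;
        $val;
    }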
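As a quick check of the two-step version, the sketch below feeds it a made-up sample with CRLF line endings and indented lines. The names $squash and $html are assumptions for this test only.

```perl
use strict;
use warnings;

# Stand-alone copy of the two-step routine, for testing outside Interchange.
my $squash = sub {
    my $val = shift;
    $val =~ s/[\r\n]+/ /g;    # each run of CR/LF becomes one space
    $val =~ s/ {2,}/ /g;      # runs of spaces collapse to one
    return $val;
};

# Made-up sample: CRLF line endings and two-space indentation.
my $html = "<p>\r\n  This is\r\n  a\r\n  test\r\n</p>\r\n";

print '[', $squash->($html), "]\n";    # [<p> This is a test </p> ]
```

Note that a single space is left after "<p>", before "</p>" and at the very end. If that matters for your page, you could trim the result as well.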
Of course, the two substitutions can be simplified to the following
single one, if you don't mind tabs (and any other whitespace that \s
matches) being converted into single spaces as well:
    sub {
        my $val = shift;
        # Replace each run of whitespace (spaces, tabs, CR, LF) with one space.
        $val =~ s/\s+/ /g;
        $val;
    }
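The difference between the two-step and the \s+ versions only shows up when the text contains tabs. Here is a small sketch comparing them on a made-up string; $two_step, $one_step and $text are hypothetical names used for the comparison.

```perl
use strict;
use warnings;

# Two-step version: only CR, LF and spaces are touched.
my $two_step = sub {
    my $val = shift;
    $val =~ s/[\r\n]+/ /g;
    $val =~ s/ {2,}/ /g;
    return $val;
};

# One-step version: any run of whitespace becomes one space.
my $one_step = sub {
    my $val = shift;
    $val =~ s/\s+/ /g;
    return $val;
};

# Made-up sample containing a tab, a blank line and leading spaces.
my $text = "a\tb\n\n  c";

printf "two-step keeps the tab: %s\n",
    $two_step->($text) =~ /\t/ ? 'yes' : 'no';    # yes
printf "one-step keeps the tab: %s\n",
    $one_step->($text) =~ /\t/ ? 'yes' : 'no';    # no
```

So pick the \s+ form if tabs should be flattened too, and the two-step form if they need to survive.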
--
Kevin Walsh
kevin at cursor.biz