[ic] problem with new filter

Fri May 7 16:30:18 EDT 2004

At 08:55 PM 5/7/2004 +0100, you wrote:

>At 20:22 07/05/2004, you wrote:
>>At 07:22 PM 5/7/2004 +0100, you wrote:
>>
>>
>>>Hi everyone,
>>>
>>>I'm having a spot of bother whilst trying to write a new [filter] op 
>>>that's supposed to return anything before and including the first "." 
>>>(dot) in the string passed to it.
>>>
>>>eg.
>>>
>>>[filter op=sentence]This is sentence one. This is sentence two.[/filter]
>>>
>>>returns:  "This is sentence one."
>>>
>>>
>>>Following the working example in the online docs, I've put this in my 
>>>interchange.cfg and tried to restart IC.
>>>
>>>GlobalSub <<EOR
>>>sub new_filter {
>>>     BEGIN {
>>>         package Vend::Interpolate;
>>>         $Filter{sentence} = sub {
>>>                             my $val = shift;
>>>                             $val =~ m/^*\.//o;
>>>                             return $val;
>>>                             };
>>>     }
>>>}
>>>EOR
>>>
>>>Once the above is added IC refuses to start up giving the following error:
>>>
>>>Starting Interchange: Bad GlobalSub 'new_filter': Bareword "o" not 
>>>allowed while "strict subs" in use at (eval 124) line 6, <GLOBAL> line 17.
>>>BEGIN not safe after errors--compilation aborted at (eval 124) line 9, 
>>><GLOBAL> line 17.
>>>In line 17 of the configuration file '/etc/interchange.cfg':
>>>GlobalSub <<EOR
>>>
>>>The original example for reverse I got from 
>>>http://www.icdevgroup.org/i/dev/docfly.html?mv_arg=ictags04%2e28 is fine 
>>>and IC loads up, so it must be my code that's at fault.
>>>
>>>Sadly my limited Perl knowledge doesn't permit me to understand the 
>>>error to know what I'm doing wrong here.
>>>
>>>I'd be very grateful if someone could point out my silly mistake(s) :)
>>>
>>>Many thanks
>>>
>>>Mark
>>
>>I'm no regex guru, but this may work better:
>>
>>GlobalSub <<EOR
>>sub new_filter {
>>     BEGIN {
>>         package Vend::Interpolate;
>>         $Filter{sentence} = sub {
>>                             my $val = shift;
>>                             $val =~ s/\..*$/\./;
>>                             return $val;
>>                             };
>>     }
>>}
>>EOR
>>
>>Also, if you wrote this on a Windows box and uploaded it, you'll want to 
>>strip out any carriage returns from the file, as they could cause a problem:
>>
>>         perl -i -p -e 's/\r//g' interchange.cfg
>>
>>- Ed
>
>Hi Ed,
>
>Many thanks for your reply.
>
>It's a definite improvement, but it's still not quite right......
>
>The following appears to be matching words that have dots in them instead 
>of matching the dots at the end of a sentence:
>
>$val =~ s/\..*$/\./;
>
>eg. "This handy 1.5ml single use sachet is......."  gets truncated/matched 
>to: "This handy 1."
>
>I've tried changing the pattern match to the following to allow for a 
>space but this seems to return multiple matches and makes the situation 
>worse :(
>
>$val =~ s/\. .*$/\./;
>
>I feel I should be matching from the start of the string and dumping 
>anything after the first ". " (dot<space>). Perhaps with split() ?
>
>eg.   ($val, $crap) = split (/\. /, $val, 2);
>
>Many thanks
>
>Mark

Mark -

It's a tough call. You're assuming throughout that you'll have clean 
data, properly formatted with a '. ' between each sentence. Unless you 
control all the data entry, it's possible that the space will be omitted, 
or that there will be two spaces, a newline, etc.
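
To make those cases concrete, here is a rough sketch in plain Perl, run 
outside of Interchange. The sub name first_sentence and the sample strings 
are made up for illustration. It assumes a dot only ends a sentence when 
whitespace or the end of the string follows it, so it should cope with two 
spaces or a newline, but it cannot split where the space was omitted:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Interchange: return everything up to
# and including the first dot that is followed by whitespace or by the
# end of the string. If no such dot exists, return the input unchanged.
sub first_sentence {
    my ($val) = @_;
    return $val unless defined $val;
    # /s lets "." match newlines; .*? is non-greedy, so the match stops
    # at the first qualifying dot rather than the last one.
    return $val =~ /^(.*?\.)(?=\s|\z)/s ? $1 : $val;
}

# Made-up sample data covering the cases mentioned above.
my @samples = (
    "One space. Next sentence.",
    "Two spaces.  Next sentence.",
    "Newline.\nNext sentence.",
    "This handy 1.5ml sachet is small. It is single use.",
    "Missing space.Next sentence.",
);

for my $s (@samples) {
    # Show embedded newlines as \n so each result stays on one line.
    (my $shown = first_sentence($s)) =~ s/\n/\\n/g;
    print "$shown\n";
}
```

The last sample comes back whole, which is the "space omitted" problem: 
a bare dot can't be told apart from the one in "1.5ml".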

I guess something like:

         $val =~ s/^(.+?)\.\s+.*$/$1./s;

...might work in most cases, but again I'm not a regex guru. You might 
want to test-drive some regexes with sample data using a [calc] in a page 
first, since it is easier to do interactive tweaking that way, then 
incorporate the best regex into your filter when you are satisfied.
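
If a test page is inconvenient, much the same test drive can be done from 
the command line. A rough sketch, assuming a standalone Perl script rather 
than anything Interchange-specific: the labels in %candidates and the 
sample strings are made up, and the three substitutions are the "first 
dot", "dot then space" and "dot then any whitespace" ideas from this 
thread, the last one written with a non-greedy match and /s.

```perl
use strict;
use warnings;

# Candidate substitutions, keyed by a made-up label. Each takes a string
# and returns the (possibly shortened) result.
my %candidates = (
    first_dot => sub { my $v = shift; $v =~ s/\..*$/./;             $v },
    dot_space => sub { my $v = shift; $v =~ s/\. .*$/./;            $v },
    dot_anyws => sub { my $v = shift; $v =~ s/^(.+?)\.\s+.*$/$1./s; $v },
);

# Made-up sample data; swap in real product descriptions here.
my @samples = (
    "This is sentence one. This is sentence two.",
    "This handy 1.5ml sachet is small. Next one.",
    "Line one.\nLine two. More.",
);

for my $name (sort keys %candidates) {
    print "== $name\n";
    for my $s (@samples) {
        my $out = $candidates{$name}->($s);
        $out =~ s/\n/\\n/g;    # show embedded newlines on one line
        print "  $out\n";
    }
}
```

Comparing the three sections of output side by side should make it 
obvious which pattern misbehaves on which kind of data.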

- Ed

===============================================================
New Media E.M.S.              Technology Solutions for Business
11630 Fair Oaks Blvd., #250   eCommerce | Consulting | Hosting
Fair Oaks, CA  95628          edl at newmediaems.com
(916) 961-0446                http://www.newmediaems.com
(866) 519-4680 Toll-Free      (916) 961-0447 Fax
===============================================================