[ic] Filters with UTF-8 body

David Christensen david at endpoint.com
Thu Mar 12 22:15:24 UTC 2009


On Mar 12, 2009, at 4:45 PM, Peter wrote:

> On 03/12/2009 02:26 PM, David Christensen wrote:
>> On Mar 12, 2009, at 4:10 PM, Peter wrote:
>>
>>> On 03/12/2009 12:28 PM, David Christensen wrote:
>>>> I have a commit queued to fix all instances of explicit ranges;
>>>> however, I found something that may or may not be a wart.  From
>>>> dist/lib/UI/Primitive.pm, line 45:
>>>>
>>>> $DECODE_CHARS = qq{&[<"\000-\037\177-\377};
>>> Provided we think it may still be needed, I think the best way to deal
>>> with this one is:
>>> $DECODE_CHARS = qq{&[<"[[:^print:]]};
>>
>> Does [[:print:]] include only traditional ASCII, or do Unicode code
>> points fall in this class as well?  I'm under the impression that
>> extended Unicode characters would fall into the printable class, and
>> hence not be decoded, as implied by the character class, but without
>> knowing the calling context of any code that uses this variable, I
>> don't know how to verify this.  Also, this threw me off because it was
>> a literal string and not a regex (at least not directly).
>
> You're quite correct; it does include non-ASCII characters once Perl
> treats the string as decoded text (here via -Mutf8):
>
> peter@peter-desktop:~/interchange-utf8$ perl -Mutf8 -le 'print $1 if "fooäbar" =~ /([[:print:]]*)/'
> fooäbar
>
> OTOH:
>
> peter@peter-desktop:~/interchange-utf8$ perl -le 'print $1 if "fooäbar" =~ /([[:print:]]*)/'
> foo
>
> So this may actually be desirable: if utf8 is set (as it is by
> MV_UTF8), we can assume it's OK to output those characters directly to
> pages.  On the other hand, if we want to make sure they always get
> escaped, then:
>
> $DECODE_CHARS = qq{&[<"[^\040-\176]};
>
> But then it may not work when inserted into a regex character class
> ([]); I don't know.  Neither may the above, for that matter.
>
> Maybe it's a good thing it isn't used anywhere.  (You did grep the
> entire source, right?)

Yep.  I looked at the history, and this code appears in Mike's initial
commit of Interchange to CVS.  Other changes over time defined their
own $DECODE_CHARS, but always in their own lexical scope, and never
imported it.  There are currently no references to $DECODE_CHARS other
than the declaration itself and the module's package variable
declaration list.  I suspect this was just a pre-version-control wart
that never came up before.  Is there any old, old, old core module
that looks at a specifically named package variable on import?
Otherwise, I suspect it could just be removed.  Mike?
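
Peter's question about how these strings behave inside a regex character class can be checked with a small script.  This is a hedged sketch, not code from the Interchange tree: since nothing uses $DECODE_CHARS, it assumes a caller would write /[$DECODE_CHARS]/, and both the needs_decode() helper and the fourth "variant" value are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: nothing in the tree uses $DECODE_CHARS, so this sketch
# guesses that a caller would interpolate it into a bracketed class.
# needs_decode() is a hypothetical helper, not an Interchange function.
sub needs_decode {
    my ($chars, $text) = @_;
    return $text =~ /[$chars]/ ? 1 : 0;
}

my %candidates = (
    # Original: &, [, <, " plus the ranges \000-\037 and \177-\377
    # (qq{} turns the octal escapes into the actual characters).
    '1 original' => qq{&[<"\000-\037\177-\377},

    # Proposed in the thread.  Interpolated, the pattern becomes
    # /[&[<"[[:^print:]]]/: the class closes after [:^print:], and the
    # final "]" is a literal that must follow the matched character.
    '2 proposed' => qq{&[<"[[:^print:]]},

    # Also proposed.  Becomes /[&[<"[^ -~]]/: "^" is literal here, the
    # range covers all printable ASCII, and again a "]" must follow.
    '3 negated'  => qq{&[<"[^\040-\176]},

    # Hypothetical variant: no outer brackets, so [:^print:] sits
    # directly inside the caller's class as /[&[<"[:^print:]]/.
    '4 variant'  => qq{&[<"[:^print:]},
);

my @samples = (['TAB', "\t"], ['<', '<'], ['a', 'a'], ['a]', 'a]']);

for my $name (sort keys %candidates) {
    my @results;
    for my $sample (@samples) {
        my ($label, $text) = @$sample;
        push @results, "$label=" . needs_decode($candidates{$name}, $text);
    }
    print "$name: @results\n";
}
```

Under that assumption, both proposed values leave a stray literal "]" after the class, so only the original string and the hypothetical variant flag a lone tab or "<"; the negated-range form even matches the ordinary text "a]".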

Regards,

David
--
David Christensen
End Point Corporation
david at endpoint.com
212-929-6923
http://www.endpoint.com/

More information about the interchange-users mailing list