[ic] Filters with UTF-8 body
David Christensen
david at endpoint.com
Thu Mar 12 22:15:24 UTC 2009
On Mar 12, 2009, at 4:45 PM, Peter wrote:
> On 03/12/2009 02:26 PM, David Christensen wrote:
>> On Mar 12, 2009, at 4:10 PM, Peter wrote:
>>
>>> On 03/12/2009 12:28 PM, David Christensen wrote:
>>>> I have a commit queued to fix all instances of explicit ranges;
>>>> however, I found something that I'm not sure is a wart or not.
>>>> From dist/lib/UI/Primitive.pm, line 45:
>>>>
>>>> $DECODE_CHARS = qq{&[<"\000-\037\177-\377};
>>> Provided we think it may still be needed, I think the best way to
>>> deal with this one is:
>>> $DECODE_CHARS = qq{&[<"[[:^print:]]};
>>
>> Does [[:print:]] include only traditional ASCII, or would Unicode
>> code points fall in this range as well? I'm under the impression that
>> extended Unicode characters would fall into the printable class, and
>> hence not be decoded, as implied by the character class, but without
>> knowing the calling context of any code which uses these arguments, I
>> don't know how to verify this. Also, this threw me off because it was
>> a literal string and not a regex (at least not directly).
>
> You're quite correct; it does include UTF-8 characters:
>
> peter at peter-desktop:~/interchange-utf8$ perl -Mutf8 -le 'print $1 if "fooäbar" =~ /([[:print:]]*)/'
> fooäbar
>
> OTOH, without -Mutf8:
>
> peter at peter-desktop:~/interchange-utf8$ perl -le 'print $1 if "fooäbar" =~ /([[:print:]]*)/'
> foo
>
> So this may actually be desirable, as it could be assumed that if utf8
> is set (as set by MV_UTF8), then it's OK to output those characters
> directly to pages. OTOH, if we want to make sure they get escaped in
> any case, then:
>
> $DECODE_CHARS = qq{&[<"[^\040-\176]};
>
> But then it may not work when inserted into a regex character class
> ([]); I don't know. Neither may the above, for that matter.
>
> Maybe it's a good thing that it isn't used anywhere. (You did grep the
> entire source, right?)
Yep. I looked at the history, and this code appears in Mike's initial
commit of Interchange to CVS. Later changes defined their own
$DECODE_CHARS, but always in their own lexical scope, and never imported
it. There are currently no references to $DECODE_CHARS other than its
declaration and its entry in the module's package variable declaration
list. I suspect this was just a pre-version-control wart that never came
up before. Is there any old, old, old core module that looks at a
specifically named package variable on import? Otherwise, I suspect it
could just be removed.

Mike?
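On Peter's question of whether these strings survive being spliced into a regex character class, here is a quick standalone check. It assumes, hypothetically, that $DECODE_CHARS was meant to be used as /[$DECODE_CHARS]/; no caller exists to confirm that. The print_bare variant and the escapes() helper are my own illustrations, not Interchange code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Candidate values for $DECODE_CHARS. The first three come from this
# thread; print_bare is a hypothetical variant added for comparison.
# Assumption: the value is spliced into a bracketed character class,
# i.e. /[$DECODE_CHARS]/. No caller in the tree confirms this.
my %candidate = (
    original      => qq{&[<"\000-\037\177-\377},
    print_bracket => qq{&[<"[[:^print:]]},
    range_bracket => qq{&[<"[^\040-\176]},
    print_bare    => qq{&[<"[:^print:]},
);

# Hypothetical helper: returns 1 if the single character $c matches
# the class built from $chars, 0 otherwise.
sub escapes {
    my ($chars, $c) = @_;
    return $c =~ /\A[$chars]\z/ ? 1 : 0;
}

for my $name (sort keys %candidate) {
    my $chars = $candidate{$name};
    printf "%-14s NUL=%d  '<'=%d  'a'=%d\n", $name,
        escapes($chars, "\0"), escapes($chars, '<'), escapes($chars, 'a');
}

# Peter's observation: under Perl's default rules, [:print:] treats
# U+00E4 as printable only when the string is stored as UTF-8.
my $native = "\xe4";
my $wide   = "\xe4";
utf8::upgrade($wide);
printf "native e4: %d, upgraded e4: %d\n",
    escapes($candidate{print_bare}, $native),
    escapes($candidate{print_bare}, $wide);
```

Both bracketed candidates from the thread leave a stray ] outside the outer class, so the pattern demands a literal ] after the matched character and a lone character never matches; in the range version the ^ is also no longer a negation, since it isn't first in the class. The bare [:^print:] form keeps the POSIX class inside the outer brackets. The last line shows the -Mutf8 difference: whether ä counts as printable depends on whether the string is stored as UTF-8.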
Regards,
David
--
David Christensen
End Point Corporation
david at endpoint.com
212-929-6923
http://www.endpoint.com/
More information about the interchange-users mailing list