[ic] Filters with UTF-8 body

Thu Mar 12 21:04:21 UTC 2009

On Mar 12, 2009, at 3:26 PM, Stefan Hornburg wrote:

> David Christensen wrote:
>> On Mar 12, 2009, at 2:10 PM, Peter wrote:
>>
>>> On 03/12/2009 06:12 AM, David Christensen wrote:
>>>> On Mar 12, 2009, at 5:31 AM, Peter wrote:
>>>>
>>>>> On 03/12/2009 03:17 AM, Peter wrote:
>>>>>> On 03/12/2009 03:04 AM, Stefan Hornburg wrote:
>>>>>>> Peter Ajamian suggested that the following code in Interpolate.pm
>>>>>>> causes the problem:
>>>>>>>
>>>>>>> '_filter'               => qr($T{_filter}\s+($Some)\]($Some)),
>>>>>>> my $Some = '[\000-\377]*?';
>>>>>> More specifically $Some, $All, $XSome and $XAll will only parse 8 bit
>>>>>> characters in the range \000-\377.  Not positive about this, but I think
>>>>>> that changing them to the following will work:
>>>>>> my $All = '(?:(?s).*)';
>>>>>> my $Some = '(?:(?s).*?)';
>>>>>> my $XAll = qr{(?:(?s).*)};
>>>>>> my $XSome = qr{(?:(?s).*?)};
>>>>> On further reflection this would probably work just as well and is less
>>>>> complex looking:
>>>>> my $All = '[.\n]*';
>>>>> my $Some = '[.\n]*?';
>>>>> my $XAll = qr{[.\n]*};
>>>>> my $XSome = qr{[.\n]*?};
>>>>
>>>> Heh, one problem:
>>>>
>>>> $ perl -e 'print "matches!" if "foo" =~ /[.\n]/'
>>>> $ perl -e 'print "matches!" if "foo" =~ /(.|[\n])/'
>>>> matches!
>>> Strange.  So may as well go with the (?s) solution as Jon says.  To add
>>> a few more tests:
>>> peter@peter-desktop:~$ perl -le 'print $1 if "foo\nbar" =~ /((?:(?s).*))/'
>>> foo
>>> bar
>>> peter@peter-desktop:~$ perl -le 'print $1 if "foo\nbar" =~ /((?:.|\n)*)/'
>>> foo
>>> bar
>>> peter@peter-desktop:~$ perl -le 'print $1 if "foo\nbar" =~ /(.*)/'
>>> foo
>>> peter@peter-desktop:~$
>>
>>
>> Okay, pushed commit to interchange-utf8; this is a patch that could be
>> applied to CVS separately if you guys want.  Racke, can you test this
>> against your issue?
>>
>
> It looks like it works now, but I'll have to test that further.
>
> On the other hand, data from the database isn't shown in UTF-8 any more.
> But this happened before applying this patch.

Does it display invalid characters or just nothing?  Just to confirm,
this is using the latest interchange-utf8, right?  Have you also
enabled the corresponding database-specific variable to set the
encoding?  And what encoding is your local database in?
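
On the regex side of the thread: inside a bracketed character class
"." is a literal dot, so [.\n] only matches dots and newlines, which is
why the "foo" test above printed nothing.  Here is a minimal sketch
comparing the three candidate patterns.  It is not the actual
Interpolate.pm code; the hash keys, the matches_whole helper and the
sample body are made up for illustration.

```perl
use strict;
use warnings;

# Candidate definitions discussed in the thread.  The key names are
# illustrative only and do not exist in Interpolate.pm.
my %candidate = (
    byte_class => qr{[\000-\377]*?},   # original $Some: code points 0..255 only
    dot_class  => qr{[.\n]*?},         # "." is a literal dot inside [...]
    dotall     => qr{(?:(?s).*?)},     # (?s) lets "." match newlines too
);

# Assumed sample body: a Latin-1 range character, a newline, and a
# character above U+00FF (the euro sign).
my $body = "caf\x{e9}\n\x{20ac}5";

# Returns 1 if the pattern can consume the whole string, 0 otherwise.
sub matches_whole {
    my ($re, $text) = @_;
    return $text =~ /\A(?:$re)\z/ ? 1 : 0;
}

for my $name (sort keys %candidate) {
    printf "%-10s %s\n", $name,
        matches_whole($candidate{$name}, $body) ? 'matches' : 'fails';
}
```

With this sample only the (?s) variant should get past the euro sign,
which is consistent with going with the (?s) definitions.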

Regards,

David
--
David Christensen
End Point Corporation
david at endpoint.com
212-929-6923
http://www.endpoint.com/