[ic] Call for testers

Peter peter at pajamian.dhs.org
Fri Mar 13 22:56:01 UTC 2009


On 03/13/2009 06:09 AM, David Christensen wrote:
> On Mar 13, 2009, at 4:29 AM, Peter wrote:
> 
>>> and if it's enabled, see any invalid UTF-8 bytes converted to ?
>>> characters. That's simple, nonfatal at runtime, and yet gently
>>> encourages developers to get their sources in the proper UTF-8
>>> encoding.
>> I'm fine with that, and that was the original proposal.  One problem,
>> though, is that while I thought that the Encode module could do that,
>> apparently it can only barf when decoding unicode input, so we would
>> have to find another way to find the invalid chars and change them
>> over.
> 
> 
> There is a third param to Encode::decode which specifies the behavior
> of invalid decodes, which by default is to die, but can warn, ignore
> or silently substitute IIRC.  So I think this could be made to
> substitute the invalid character marker without much problem.

Yes, you're referring to the CHECK parameter which, unfortunately, is
not honored by every encoding.  According to the docs, Encode::Unicode
is one that ignores it:

http://search.cpan.org/~dankogai/Encode-2.32/Encode.pm#Handling_Malformed_Data

NOTE: Not all encoding support this feature

    Some encodings ignore CHECK argument. For example, Encode::Unicode
    ignores CHECK and it always croaks on error.
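
Whether that caveat actually bites for UTF-8 input is easy enough to check
against whatever Encode version is installed.  A minimal sketch, assuming
the 'UTF-8' encoding name and Encode::FB_DEFAULT as the CHECK value;
try_lenient_decode is a hypothetical helper name, not something in
Interchange or Encode:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: try to decode bytes as UTF-8 with CHECK set to
# FB_DEFAULT.  Returns the decoded string, or undef if decode() croaked.
sub try_lenient_decode {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() can modify its input for some CHECK values
    my $text = eval { Encode::decode('UTF-8', $copy, Encode::FB_DEFAULT) };
    return $text;
}

my $text = try_lenient_decode("caf\xE9 ok");    # a lone \xE9 is not valid UTF-8
if (defined $text) {
    (my $shown = $text) =~ s/\x{FFFD}/?/g;      # show U+FFFD as "?"
    print "decoded without dying: $shown\n";
} else {
    print "decode croaked: $@\n";
}
```

If the first branch runs, the invalid byte came back as U+FFFD and can be
mapped to ? as proposed; if the second runs, we need another approach.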


Peter




