[ic] Call for testers
Peter
peter at pajamian.dhs.org
Fri Mar 13 22:56:01 UTC 2009
On 03/13/2009 06:09 AM, David Christensen wrote:
> On Mar 13, 2009, at 4:29 AM, Peter wrote:
>
>>> and if it's enabled, see any invalid UTF-8 bytes converted to ?
>>> characters. That's simple, nonfatal at runtime, and yet gently
>>> encourages developers to get their sources in the proper UTF-8
>>> encoding.
>> I'm fine with that, and that was the original proposal. One problem,
>> though, is that while I thought that the Encode module could do that,
>> apparently it can only barf when decoding unicode input, so we would
>> have to find another way to find the invalid chars and change them
>> over.
>
> There is a third param to Encode::decode which specifies the behavior
> on invalid decodes, which by default is to die, but can warn, ignore,
> or silently substitute, IIRC. So I think this could be made to
> substitute the invalid character marker without much problem.
Yes, you're referring to the CHECK parameter which, unfortunately, works
for every encoding type *except* unicode. From the Encode documentation:

http://search.cpan.org/~dankogai/Encode-2.32/Encode.pm#Handling_Malformed_Data

  NOTE: Not all encoding support this feature

  Some encodings ignore CHECK argument. For example, Encode::Unicode
  ignores CHECK and it always croaks on error.
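
As a rough sketch of "another way" that does not rely on CHECK at all, the raw bytes could be scanned with a regex and anything that is not part of a well-formed UTF-8 sequence replaced with "?". The names replace_invalid_utf8 and $VALID_UTF8 are made up for illustration and are not part of Interchange or Encode. The pattern follows the well-formed byte ranges commonly cited from the Unicode standard, and the sketch assumes the input is a plain byte string that has not been decoded yet.

```perl
use strict;
use warnings;

# One well-formed UTF-8 encoded character, expressed as byte ranges.
# Excludes overlong forms, surrogates, and code points above U+10FFFF.
my $VALID_UTF8 = qr/
      [\x00-\x7F]                         # ASCII
    | [\xC2-\xDF][\x80-\xBF]              # 2-byte sequences
    | \xE0[\xA0-\xBF][\x80-\xBF]          # 3-byte, no overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}   # 3-byte, general
    | \xED[\x80-\x9F][\x80-\xBF]          # 3-byte, no surrogates
    | \xF0[\x90-\xBF][\x80-\xBF]{2}       # 4-byte, no overlongs
    | [\xF1-\xF3][\x80-\xBF]{3}           # 4-byte, general
    | \xF4[\x80-\x8F][\x80-\xBF]{2}       # 4-byte, up to U+10FFFF
/x;

# Hypothetical helper: returns a copy of $bytes in which every byte that
# is not part of a well-formed UTF-8 sequence is replaced by "?".
sub replace_invalid_utf8 {
    my ($bytes) = @_;
    # At each position, keep a run of valid characters if there is one;
    # otherwise consume exactly one byte and substitute "?".
    $bytes =~ s/($VALID_UTF8+)|./defined $1 ? $1 : '?'/gse;
    return $bytes;
}

print replace_invalid_utf8("abc\xFFdef"), "\n";
```

The result is still a byte string, so it would presumably still need to go through the normal decode step afterwards; at that point it should no longer contain anything for the decoder to croak on.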
Peter