From ethan at endpoint.com Mon Oct 11 15:08:10 2004
From: ethan at endpoint.com (Ethan Rowe)
Date: Mon, 11 Oct 2004 11:08:10 -0400
Subject: [interchange-i18n] HTML::Entities and Unicode
Message-ID: <416AA1DA.6020601@endpoint.com>

Hi.

Back in November of 2003, Chen Naor posted a patch for the HTML::Entities Perl module that made the HTML encoding routines multi-byte safe. I'm wondering whether anybody knows if it works with Unicode UTF-8 encoding (specifically, UTF-8 encoded traditional Chinese characters, Farsi, or anything else well outside the Latin1 subset). Forgive me if this question is silly for some reason, but I'm quite new to the world of multi-language systems, character encoding, etc.

Is this patch still the way to go if you want to get outside of the latin1 character set using Interchange?

Thanks very much in advance.

-- 
Ethan Rowe
End Point Corporation
ethan at endpoint.com

From racke at linuxia.de Mon Oct 11 15:18:47 2004
From: racke at linuxia.de (Stefan Hornburg)
Date: Mon, 11 Oct 2004 17:18:47 +0200
Subject: [interchange-i18n] HTML::Entities and Unicode
In-Reply-To: <416AA1DA.6020601@endpoint.com>
References: <416AA1DA.6020601@endpoint.com>
Message-ID: <20041011171847.6828ddac.racke@linuxia.de>

On Mon, 11 Oct 2004 11:08:10 -0400
Ethan Rowe wrote:

> Is this patch still the way to go if you want to get outside of the
> latin1 character set using Interchange?

IMHO this needs thorough investigation and testing.
If you happen to have Perl 5.6 running, that makes things even more difficult. But it really should be done.

Bye
    Racke

-- 
LinuXia Systems => http://www.linuxia.de/
Expert Interchange Consulting and System Administration
ICDEVGROUP => http://www.icdevgroup.org/
Interchange Development Team

From chen at lilux.co.il Mon Oct 11 21:17:05 2004
From: chen at lilux.co.il (Chen Naor)
Date: Mon, 11 Oct 2004 23:17:05 +0200
Subject: [interchange-i18n] HTML::Entities and Unicode
References: <416AA1DA.6020601@endpoint.com> <20041011171847.6828ddac.racke@linuxia.de>
Message-ID: <001201c4afd7$aaa84380$6432a8c0@iceman>

Stefan Hornburg wrote:
> IMHO this needs thorough investigation and testing. If you happen
> to have Perl 5.6 running, that makes things even more difficult.
> But it really should be done.

Hi,

If you use Perl 5.8.x, compiling HTML::Entities asks whether you want to enable UTF-8/Unicode support. I never tried it, but logically it has to work :). In that case you should not use my patch.

Good luck,
Chen
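The whole thread turns on what "multi-byte safe" means for an entity encoder. The sketch below is written in Python rather than Perl, and it is neither Chen Naor's patch nor the HTML::Entities API; the function names `encode_bytes_naive` and `encode_chars` are hypothetical. Assuming an encoder that turns every non-ASCII unit into a numeric character reference, it shows why working on raw UTF-8 bytes corrupts text while decoding to characters first does not.

```python
import html

# Hypothetical helpers for illustration only; not part of HTML::Entities
# or of the patch discussed in this thread.


def encode_bytes_naive(data: bytes) -> str:
    """Escape every non-ASCII *byte* as &#N; (one byte == one character)."""
    out = []
    for b in data:
        if b < 0x80:
            # Plain ASCII: escape the HTML-special characters & < > " '
            out.append(html.escape(chr(b)))
        else:
            # Each byte of a multi-byte UTF-8 sequence becomes its own entity,
            # so one CJK character turns into three unrelated Latin-1 glyphs.
            out.append(f"&#{b};")
    return "".join(out)


def encode_chars(data: bytes, encoding: str = "utf-8") -> str:
    """Decode to characters first, then escape every non-ASCII *character*."""
    text = data.decode(encoding)
    return "".join(
        html.escape(ch) if ord(ch) < 0x80 else f"&#{ord(ch)};" for ch in text
    )


if __name__ == "__main__":
    sample = "中文".encode("utf-8")  # two CJK characters, three bytes each
    print(encode_bytes_naive(sample))  # six byte-level entities (mojibake)
    print(encode_chars(sample))        # two correct character references
```

The byte-wise version is only correct for a single-byte charset such as Latin-1, where each byte is one character. The character-wise version has to know the input encoding before it can do anything, which is why enabling UTF-8 support matters for the question Ethan raised.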