[ic] URL encoding bug in Vend::Interpolate::esc

Stefan Hornburg (Racke) racke at linuxia.de
Sun May 16 14:34:29 UTC 2010


On 05/16/2010 08:42 AM, Rok Ruzic wrote:
> I have run across a bug in our URL encoding code. I actually found it
> because it escaped more than it had to, but looking at it I saw that
> it's broken both ways.
>
> It uses this match regex for substitution: \W
>
> This was probably OK 15 years ago, but now that we are trying to
> support UTF-8, \W will also let all the wide word characters through
> unescaped.
>
> So I suggest we narrow this to the old ASCII [^a-zA-Z0-9_]
>
> While we're at it, we might also *not* escape the characters
> Berners-Lee put in his alphanum2 character set, i.e. [\-_.+], and if you
> guys agree, I would also add the stuff he put in his "safe" class, i.e.
> [\$\@\&]. See http://www.w3.org/Addressing/URL/url-spec.txt for details.

This reference is outdated. Please look at the set of unreserved
characters defined in RFC 3986:

http://labs.apache.org/webarch/uri/rfc/rfc3986.html#unreserved

Please adjust your patch accordingly.
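
To illustrate the rule, here is a minimal sketch in Python, not the
Interchange Perl code and not the eventual patch. It assumes the input is
encoded as UTF-8 first and that every byte outside the RFC 3986 unreserved
set (ALPHA, DIGIT, "-", ".", "_", "~") gets percent-encoded. The function
name esc is borrowed from the subject line for illustration only.

```python
import re

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
# The pattern works on bytes, so it is ASCII-only by construction and
# cannot be affected by Unicode-aware character classes such as \W.
_NOT_UNRESERVED = re.compile(rb"[^A-Za-z0-9\-._~]")


def esc(text: str) -> str:
    """Percent-encode every byte outside the RFC 3986 unreserved set.

    Assumption: the caller wants UTF-8 as the byte encoding.
    """
    data = text.encode("utf-8")
    # Each offending byte becomes %XX with uppercase hex digits.
    encoded = _NOT_UNRESERVED.sub(lambda m: b"%%%02X" % m.group(0)[0], data)
    return encoded.decode("ascii")
```

With this rule "+", "$", "@" and "&" are escaped, because RFC 3986 lists
them as reserved (sub-delims or gen-delims) rather than unreserved, and
wide characters are escaped byte by byte.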

Regards
           Racke



-- 
LinuXia Systems => http://www.linuxia.de/
Expert Interchange Consulting and System Administration
ICDEVGROUP => http://www.icdevgroup.org/
Interchange Development Team



