[ic] interchange html4 validation problems

Mike Heins interchange-users@lists.akopia.com
Fri Jun 8 11:45:01 2001


Quoting Jason Kohles (jkohles@redhat.com):
> On Fri, Jun 08, 2001 at 10:07:41AM -0400, Mike Heins wrote:
> > Quoting Jason Kohles (jkohles@redhat.com):
> > 
> > > Personally I avoid GET requests whenever possible just because of the
> > > problems I have had in the past with browsers mangling certain variable
> > > names when they were separated by &.
> > > 
> > 
> > I don't know about that. It is sort of difficult to maintain state without
> > parameters in URLs... 8-)
> > 
> That's why I said whenever possible  =), you can still use ; in links, as
> long as you avoid GET method forms, I'm just overly paranoid after dealing
> with a project where several versions of netscape were mangling a query
> string that included &copy_form, trying to turn the &copy into a nice little
> 8-bit copyright symbol when submitting the URL.
> 
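
The mangling Jason describes can be reproduced with any HTML parser that
accepts legacy entity names without a trailing semicolon. The following is
a minimal sketch using Python's standard html module. The URL and its
parameter names are made up for illustration, and it says nothing about
how any particular Netscape version actually parsed links.

```python
from html import escape, unescape

# Hypothetical link target containing a parameter that starts with "copy".
raw_query = "page.html?id=7&copy_form=1"

# An entity-aware parser reads the bare "&copy" as the copyright entity.
mangled = unescape(raw_query)

# Escaping the ampersand as &amp; in the HTML attribute avoids the problem:
# the parser decodes &amp; back to a plain "&" and leaves "copy_form" alone.
safe_attr = escape(raw_query, quote=True)
round_trip = unescape(safe_attr)

# A semicolon separator never starts an entity, so nothing is rewritten.
semicolon_query = "page.html?id=7;copy_form=1"
untouched = unescape(semicolon_query)

print(mangled)     # the "&copy" has become a copyright sign
print(safe_attr)   # what would be written into the href
print(round_trip)  # what the browser should request
```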

If you want the form to be cacheable, you can't use POST. I know people
would say "you don't want to cache forms", but many people like buttons
instead of links to do things, and the <button ...> relying on JS is
not a compatible option. In addition, if you want the "BACK" button
to return to a form result, you have to use GET these days. I loved the fact that
Netscape 4 would let you use BACK to see a previous form post without
resubmitting the form. I personally feel you ought to be able to choose
whether BACK/FORWARD ignores cache constraints or not.

In any case, I will bite the bullet and allow for compatibility.
If CGI.pm is doing it by default, we will have to accept that as more
and more programs will use it that way. Heaven knows what they will do
to generate URLs compatible with other programs, but so be it.

    1. The UrlSepChar global config directive allows setting the
    separator to something besides &. If it is not one of the
    three characters [&;:], a warning will be issued. A fatal error
    will be issued if it is longer than one character or if it
    matches the character class [\w%].

    2. If you keep it as "&", and set the global variable
    MV_HTML4_COMPLIANT, [page ...] and [area ...] will join URL
    strings they generate with &amp;.

    This is probably the best mode to operate in.

    3. All incoming URLs will work either with & or the UrlSepChar, or
    a combination. So if you wrote a chunk of embedded Perl or a UserTag
    that custom-constructed URLs, it should still work.
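
The three rules above can be sketched roughly as follows. This is an
illustration in Python, not the actual Interchange code (which is Perl);
the function names, the html4_compliant flag, and the sample parameter
names are all hypothetical. Rejecting an empty separator is my own
assumption, since the rules above only mention values that are too long.

```python
import re
import warnings
from urllib.parse import quote, unquote

RECOMMENDED_SEPARATORS = "&;:"

def check_url_sep_char(sep):
    """Rule 1: fatal error for a bad separator, warning for an unusual one."""
    # Assumption: an empty value is treated like an over-long one.
    if len(sep) != 1 or re.fullmatch(r"[\w%]", sep, re.ASCII):
        raise ValueError("invalid UrlSepChar: %r" % sep)
    if sep not in RECOMMENDED_SEPARATORS:
        warnings.warn("unusual UrlSepChar: %r" % sep)
    return sep

def join_query(pairs, sep="&", html4_compliant=False):
    """Rule 2: join key=value pairs, writing & as &amp; when asked to."""
    joiner = "&amp;" if (sep == "&" and html4_compliant) else sep
    return joiner.join(
        "%s=%s" % (quote(k, safe=""), quote(v, safe="")) for k, v in pairs
    )

def split_query(query, sep="&"):
    """Rule 3: accept &, the configured separator, or a mix of both."""
    parts = re.split("[" + re.escape("&" + sep) + "]", query)
    result = []
    for part in parts:
        if not part:
            continue  # skip empty segments such as a trailing separator
        key, _, value = part.partition("=")
        result.append((unquote(key), unquote(value)))
    return result

pairs = [("mv_pc", "12"), ("id", "abc")]
print(join_query(pairs, "&", html4_compliant=True))
print(join_query(pairs, check_url_sep_char(";")))
print(split_query("a=1;b=2&c=3", ";"))
```

Note that the incoming side never needs to handle &amp; itself: a browser
decodes it to a plain & before making the request.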

I do wish it had been brought up with more investigation up front, and in
a less accusatory and combative fashion. Yet we must try to be willing
to find the grain of truth in criticism, chew it, and swallow it. If
it needs to be done, it needs to be done.

FAQ:

1. Why is the directive global?

It can't be on a catalog-by-catalog basis -- this all happens before
catalog config time.

2. Will you put it in past versions?

This will only be in 4.7.x, not 4.6.

3. Should I use this?

Probably not. If you use Interchange routines to generate URLs for other
programs, you had better know what you are doing. In general, you will be
best off not employing this capability. If you want to pass a validation
suite, consider using the MV_HTML4_COMPLIANT variable.

4. Why don't you change the default?

Since every browser in the world tolerates &, my opinion is that this is an
artificial tempest in a teapot, caused by the failure of the validation
suite writer to make this check part of an optional "pedantic" mode.

If browsers didn't accept this construct, 98% of the web would break.  So the
check is pedantic, and the construct should certainly not be flagged under
the HTML 4.01 Transitional doctype. Shame on you, W3.

-- 
Red Hat, Inc., 3005 Nichols Rd., Hamilton, OH  45013
phone +1.513.523.7621      <mheins@redhat.com>

I am a great believer in luck, and I find that the harder I work
the more luck I have. -- Thomas Jefferson