Annoying HTML
Created 2006-12-18T15:01:52.098Z, last edited 2007-07-15T12:46:43.829Z
Writing the parser for the content management module for FOST.3™ was a very complex affair, and it's fair to say that it still isn't without its rough edges. Its best feature though is that it can turn fairly unstructured Mediawiki style mark-up into correct, validating XHTML which is also semantic.
Here are just three of the more complex issues to do with generating correct HTML from the perspective of writing a general content management system like FOST.3™.
I'm going to list some ideas for making these easier to deal with in no particular order. They're numbered not because I think some solutions are better than others, but to make them easier to refer back to.
You can't embed block level elements within <p> elements
The humble paragraph marker must be one of the most used elements around. Much early HTML used the tag as a paragraph separator rather than the more correct paragraph surround, but all of the paragraphs in this page are correctly surrounded by <p> elements and this should be the case for every web site that cares about semantic mark up.
The difficulty comes when we allow arbitrary inclusion of other content. The footnotes used on this page[1] are placed in line within the paragraph where they occur, but because the paragraph is surrounded by <p> tags we cannot use any block level tags within the footnote text, at least if we want the page to validate.
FOST.3™ uses a complicated system that relies on co-operation between in-line elements and the CSS to style them as block level elements.
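To make the constraint concrete, here is a minimal sketch of a checker that reports block level tags opened while a <p> is still open. It is not how FOST.3™ works; the class name, the function name and the (deliberately partial) set of block level tags are all assumptions made for illustration, and it expects XHTML-style input where every element is explicitly closed.

```python
from html.parser import HTMLParser

# Assumption: a partial list of block level elements, enough for illustration.
BLOCK_TAGS = {
    "address", "blockquote", "div", "dl", "fieldset", "form",
    "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ol", "p", "pre",
    "table", "ul",
}


class BlockInParagraphChecker(HTMLParser):
    """Records block level tags that start while a <p> is still open."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.problems = []   # block level tags found inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS and "p" in self.open_tags:
            self.problems.append(tag)
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def block_tags_inside_p(markup):
    """Return the block level tags that appear inside a <p> element."""
    checker = BlockInParagraphChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

A footnote written as `<p>Text<span class="footnote">note</span></p>` passes, while the same footnote wrapped in a <div> is reported, which is why the footnote has to be built from in-line elements and styled afterwards.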
Solutions
1. The most obvious solution is to allow a <p> tag to include block level tags. There are probably a lot of good reasons for assuming this to be a bad idea.
2. The simplest way to solve this with the existing HTML standard is to not use <p> tags, but to style, for example, a <div class="paragraph"> tag to look like the <p> tag. It isn't semantic though.
3. Another way to handle this would be to have a new HTML tag, maybe called <aside>. This would allow out-of-band HTML to be embedded at any location. <aside> would be an in-line tag that would be able to contain block-level tags[2]. User agents could either:
   - float the content off to one side; or
   - draw it in a pop-up window; or
   - overlay it on the content (like the way many user agents handle title attributes).
Nested lists must be within the <li> tag
Lists are extremely hard to generate properly. FOST.3™ translates the Mediawiki mark up into an internal representation which is then translated back out to Mediawiki when a page is edited[3].
This is the legal way to create a nested bullet list:
<ol><li>First bullet
<ol><li>Nested bullet</li></ol></li>
<li>Second bullet</li></ol>
Note that the nested list must be contained within the <li> element. This means that different logic must be used to generate the outermost list and the nested lists because the <p> elements cannot contain the outermost <ul>/<ol> element.
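The two placements can be sketched as follows. This is an illustration only, not FOST.3™'s internal representation: the tuple-based data shape and the names render_list and render_blocks are assumptions. The outermost list is written as a sibling of the surrounding <p> elements, while each nested list is written before its parent <li> is closed.

```python
from html import escape


def render_list(items, tag="ul"):
    """Render a list; each item is (text, children), children has the same shape.

    A nested list is emitted before the parent's closing </li>, so it ends up
    inside the <li> element rather than directly inside the outer list.
    """
    parts = [f"<{tag}>"]
    for text, children in items:
        parts.append(f"<li>{escape(text)}")
        if children:
            parts.append(render_list(children, tag))
        parts.append("</li>")
    parts.append(f"</{tag}>")
    return "".join(parts)


def render_blocks(blocks, tag="ul"):
    """Render top-level blocks, each either ("para", text) or ("list", items).

    The outermost list is emitted after the paragraph has been closed, never
    inside it, which is the second code path the text describes.
    """
    out = []
    for kind, content in blocks:
        if kind == "para":
            out.append(f"<p>{escape(content)}</p>")
        elif kind == "list":
            out.append(render_list(content, tag))
        else:
            raise ValueError(f"unknown block kind: {kind}")
    return "".join(out)
```

Calling `render_list([("First bullet", [("Nested bullet", [])]), ("Second bullet", [])], tag="ol")` produces the same structure as the legal example above, without the line breaks.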
Solutions
1. Allow the <ul> and <ol> tags to be included within the previous <p> tag. This has the advantage that now the list always starts within the logically outer content carrying tag.
2. Simply allow <ul> and <ol> tags to contain child <ul> and <ol> tags as well as <li> tags. Most user agents already deal properly with this situation because so many web sites contain it already. It should be a relatively simple affair to codify this in the standard:
<ol><li>First bullet</li>
<ol><li>Nested bullet</li></ol>
<li>Second bullet</li></ol>
Placement of <input type="hidden" … /> is the same as other <input> sub-types
One of the advantages of using a framework is that it should be able to handle a lot of the common functions needed in the interaction between HTTP and HTML. One of these is preserving state across requests and making sure that interactions that are idempotent from the user's point of view actually are idempotent. This is most often handled through hidden fields within forms.
The problem is of course that the part of the code that deals with automating the output of these hidden fields is rarely in the same place as the code that deals with the other fields that go into the form. For example, hidden fields used to guard against double submission will often be generated by the framework at the point where the <form> element is generated.
The locations where a <form> element is legal aren't the same as the locations where <input type="hidden" … /> elements are allowed. This means that the framework must remember the hidden fields and place them when it places other <input> elements. This in turn makes the location of all of the <input> elements unpredictable, which means that using <input> nested inside <label> elements, for example, is liable to break.
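The deferral described above might look something like the sketch below. The FormBuilder class and its methods are hypothetical names invented for this example and are not part of FOST.3™. Hidden fields are queued when the framework registers them and only written out once the page code emits a container that may legally hold an <input>.

```python
from html import escape


class FormBuilder:
    """Hypothetical helper that defers hidden fields until an input is placed."""

    def __init__(self, action):
        self.pending_hidden = []  # (name, value) pairs waiting for a legal spot
        self.parts = [f'<form action="{escape(action)}" method="post">']

    def hidden(self, name, value):
        # Registered by the framework when the form is opened, but not
        # written yet: it may not be a direct child of <form>.
        self.pending_hidden.append((name, value))

    def _flush_hidden(self):
        markup = "".join(
            f'<input type="hidden" name="{escape(n)}" value="{escape(v)}" />'
            for n, v in self.pending_hidden
        )
        self.pending_hidden.clear()
        return markup

    def field(self, label, name):
        # The queued hidden fields ride along with the first visible field,
        # so where they land depends on what the page code does next.
        self.parts.append(
            f"<p>{self._flush_hidden()}"
            f'<label>{escape(label)} <input type="text" name="{escape(name)}" /></label></p>'
        )

    def close(self):
        # Assumption: if no visible field was ever written, wrap the leftovers
        # in a <div> so they are still not direct children of <form>.
        if self.pending_hidden:
            self.parts.append(f"<div>{self._flush_hidden()}</div>")
        self.parts.append("</form>")
        return "".join(self.parts)
```

The hidden token ends up wherever the first visible field happens to be written, which is exactly the unpredictability that makes assumptions about the position of <input> elements fragile.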
Solutions
1. Provide a special container tag which is only allowed to contain <input type="hidden" … /> elements and is not rendered.
2. Allow <input type="hidden" … /> elements to appear anywhere in the HTML stream.
3. Allow <input type="hidden" … /> elements to appear immediately after a <form> tag or before a </form> tag.
© 2002-2025 Kirit & Tai Sælensminde. All forum posts are copyright their respective authors.
Licensed under a Creative Commons License. Non-commercial use is fine so long as you provide attribution.