The “Correct” way to process forms

Created 6th February, 2006 16:57 (UTC), last edited 1st September, 2006 08:41 (UTC)

There are all sorts of reasons for having forms on your sites, one for every day of the year. A multitude of web pages describe how to craft the HTML for the forms. What is missing, though, is enough attention to how these forms are handled by servers.

The very fact that there is a form means that you want the server to do something more or less clever with the information that your web visitors provide. What you expect the server to do, though, determines some of the basics of how you should code the HTML form and how you should architect the script that processes the form submission.

The options

First of all we should take a look at the HTML form element itself. There are no significant changes between HTML 4.01, XHTML 1.0 and XHTML 1.1 that we need to worry about, other than which letter cases are allowed. Since HTML 4.01 accepts the lower case mandated by XHTML, I'll be using the XHTML notation here.

The form attributes that are of interest to us are:

  • method—The HTTP method that is to be used by the user agent when it submits the form.
  • action—The URL that the form is submitted to. This will typically be an ASP page or a CGI script or executable. What you put in here is really of no further interest in this discussion.
  • enctype—The encoding that the form should use. Not really of interest here, but we need to know about it as sometimes the choices we make will affect what we can and what we should use here.

The get method

The essential difference between get and post is that the user agent may cache the response for a get and show that to the user (there are some provisos to this spelled out in the HTTP specification that I'm not going to worry about for our purposes). The important point here is that the user agent may show the result of the last fetch again without ever talking to the server. [1: The actual situation is somewhat more complex than this makes out. There are a number of factors, including the exact HTTP headers and the HTTP protocol version. In any case, not all browsers follow the protocol exactly. In general you shouldn't assume any specific behaviour with get and queries.]

When the user agent requests anything from the server as the result of a clicked link or a typed-in URL, it uses this method. Proxy servers may provide the same result to many users, and even if a user clicks on the submit button several times, the user agent may never talk to the server again unless the user has made changes to the form.

This means that this is normally the correct method to use for a site search form. This is because the same search terms will return the same content each time they are used and the same search terms submitted by different users will return the same results (if you are customising the page based on the logged in user then you should already know what to do about this—if you don't then you shouldn't be writing the software).
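As a rough illustration of why a get search is bookmarkable, here is a small Python sketch of the URL a user agent would build from a search form. The action /search and the field names q and page are made up for the example; they are not from any particular site.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def search_url(action, fields):
    """Return the URL a user agent would request for a get form."""
    return action + "?" + urlencode(fields)


# Hypothetical search form with two fields.
url = search_url("/search", {"q": "form processing", "page": "2"})
print(url)  # /search?q=form+processing&page=2

# The script on the server recovers the fields from the query string.
fields = parse_qs(urlsplit(url).query)
print(fields)  # {'q': ['form processing'], 'page': ['2']}
```

Everything the server needs is in the URL itself, so the same URL can be bookmarked, linked to, or answered from a cache.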

The post method

The HTTP specification states that the response to a post is not cacheable unless the server explicitly marks it as such with caching headers. In practice this means that every time the user clicks on the submit button the user agent must ask the server again. A proxy must also honour this requirement.

What this means is that you should use this method where the form is actually updating something on the server, for example when a user submits a newsgroup post or a comment. This is also what you should use for refresh-style buttons on pages which have dynamic content that updates often.

The application/x-www-form-urlencoded encoding type

This form encoding takes each field (of whatever sort) and encodes the names and the values in the same way that a standard query string is encoded for a get request. This is the default form encoding and the format of the data sent is described in the HTML 4 specification.

When the form is submitted with post, though, the data is sent in the HTTP request body rather than appended to the URL as a query string.
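A minimal Python sketch of what a post with this encoding carries in its body. The field names and values are invented for the example, and the headers are only the two most relevant ones.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical comment form fields, including characters that need escaping.
fields = {"name": "Ana Souza", "comment": "Forms & scripts: 100% fun?"}

# The same encoding as a query string, but sent as the request body.
body = urlencode(fields).encode("ascii")
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body)),
}
print(body)

# The processing script decodes the body back into the original values.
decoded = {name: values[0] for name, values in parse_qs(body.decode("ascii")).items()}
print(decoded == fields)
```

Spaces become "+" and reserved characters such as "&" are percent-escaped, which is what keeps field boundaries unambiguous.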

The multipart/form-data encoding type

The multipart/form-data form encoding is more complex than application/x-www-form-urlencoded, but it has one great advantage, which is that it works with file uploads. The encoding here is MIME; according to the HTML 4 specification it follows the multipart rules of RFC 2045, and form file uploads in particular are covered by RFC 2388.

Again the data is sent in the HTTP request body. This encoding has a higher overhead than URL encoding for ordinary fields, so by using it you are forcing the browser to send more data to the server.
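To give a feel for the overhead, the following Python sketch builds a multipart/form-data body by hand for two short text fields and compares its size with the URL-encoded equivalent. The helper encode_multipart, the boundary string and the field names are all made up for illustration; a real user agent picks its own boundary and also handles file parts and escaping of field names.

```python
from urllib.parse import urlencode


def encode_multipart(fields, boundary):
    """Encode plain text fields as a multipart/form-data body (no file parts)."""
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append("--" + boundary + "--")  # closing boundary
    lines.append("")
    return "\r\n".join(lines).encode("utf-8")


fields = {"name": "Ana", "comment": "Hello"}
multipart_body = encode_multipart(fields, "example-boundary-1234")
urlencoded_body = urlencode(fields).encode("ascii")

# Each field pays for a boundary line and a header line in the multipart form.
print(len(urlencoded_body), len(multipart_body))
```

For short text fields the multipart body is several times larger, which is why it is only worth choosing when the form really needs it.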

Form processing

Most forms need to do some processing. After all, if no processing is being done then what is the form for? This means that we ought to spend a little time thinking about how this form processing should be done with respect to the pages before and after.

The single most important thing to consider is whether or not the URL that the form takes you to should be bookmarkable and cacheable. For example, by using get for a search form you make the result of that search both cacheable and bookmarkable. Many of your site's users will want to bookmark results like these.

post, on the other hand, should be used where the form submission changes the state of the server. Use it whenever submitting the form saves, changes or deletes something. In some ways the difference between get and post is like the difference between asking a question and giving a command. If many people are asking the same question at nearly the same time then it is probably the right thing to give them all the same answer. However, many people telling the server to do the same thing at nearly the same time have to be responded to individually so that they know whether their command was listened to or not. [2: There is a slightly different view of get and post over on Subbu Allamaraju's blog. He comes from a different direction but essentially reaches the same conclusion, and with diagrams!]

For the encoding type the choices are a little simpler. If the form allows file uploads to the server then you must use multipart/form-data (together with post); otherwise you don't need to specify this attribute.

One final issue with forms is whether or not the script that processes the form should generate any HTML. Generally, if the script does something whose results should be bookmarkable, then the script ought not to generate the HTML itself, but rather redirect to a page that can be fetched multiple times.
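The redirect approach can be sketched as a tiny WSGI application in Python. This is only an illustration under stated assumptions: the /comments paths, the text field and the in-memory COMMENTS list are hypothetical, and the 303 See Other status is one reasonable choice for the redirect rather than the only one. The call helper exists purely to drive the app without a network.

```python
import io
from urllib.parse import parse_qs, urlencode

COMMENTS = []  # hypothetical in-memory store standing in for a real database


def app(environ, start_response):
    """Minimal WSGI app: a post saves a comment, then redirects to a get page."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")

    if method == "POST" and path == "/comments":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        COMMENTS.append(parse_qs(body).get("text", [""])[0])
        # No HTML is generated here. 303 See Other asks the user agent to
        # fetch the bookmarkable page with get.
        location = "/comments/%d" % len(COMMENTS)
        start_response("303 See Other", [("Location", location)])
        return [b""]

    number = path[len("/comments/"):] if path.startswith("/comments/") else ""
    if method == "GET" and number.isdecimal() and 1 <= int(number) <= len(COMMENTS):
        page = "Comment %s: %s" % (number, COMMENTS[int(number) - 1])
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [page.encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def call(method, path, fields=None):
    """Drive the app without a network, for demonstration only."""
    body = urlencode(fields or {}).encode("ascii")
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    content = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], content


status, headers, _ = call("POST", "/comments", {"text": "First!"})
print(status, headers["Location"])  # 303 See Other /comments/1
print(call("GET", headers["Location"])[2])  # b'Comment 1: First!'
```

Because the page the user ends up on was fetched with get, reloading or bookmarking it does not submit the comment a second time.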

Errors should always be generated by the script though. You need a very good reason not to present the form again alongside the error and an even better reason not to have the form filled in with the input the user tried last time (with the exception of passwords).
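One way to present the form again with the user's input is sketched below in Python. The function render_signup_form, the /signup action and the email and password fields are invented for the example. The key points are that previous values are escaped before being written back into the HTML and that the password is never echoed.

```python
from html import escape


def render_signup_form(values, errors):
    """Return the form HTML again, refilled with the user's last input."""
    parts = ['<form method="post" action="/signup">']
    for message in errors:
        parts.append('<p class="error">%s</p>' % escape(message))
    # Escape the old value so quotes or markup in it cannot break the page.
    parts.append('<input type="text" name="email" value="%s" />'
                 % escape(values.get("email", ""), quote=True))
    # The password field is deliberately never refilled.
    parts.append('<input type="password" name="password" />')
    parts.append('<input type="submit" value="Sign up" />')
    parts.append("</form>")
    return "\n".join(parts)


submitted = {"email": 'a"b@example.com', "password": "secret"}
page = render_signup_form(submitted, ["Password is too short"])
print(page)
```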

Examples

Here are some examples of the uses of forms and recommendations on how they should be done.

  • Site search—For this get is probably the best choice, as seen on major search engines. It allows results to be bookmarked and it allows proxies to serve the same search results to many people.
  • Contact form—Use post for these. The script should generate HTML to tell the user that the feedback has been saved or show the reason for it not being saved.
  • Comments or bulletin post—Use post because these are saving new content on the server. The script should redirect to the page where the comment or post will appear so that the user can easily bookmark it. Alternatively you can display a confirmation screen which then links to the page where the comment/post may be viewed. Of course the script should generate any error messages.
  • Booking form—Again post is the way to go. The script should generate any errors, but the confirmation screen needs to be bookmarkable, so if the booking succeeds the script really ought to redirect to a separate confirmation page. Don't forget that this confirmation page is probably going to require securing so that only the original user can view it.
  • Validator—These utility pages which perform processing of some other web page should use get so that their results can be bookmarked.
  • Login—This should go without saying: login pages must use post so that the password is not exposed on the address bar and can't end up in referrer logs. [3: I thought this was too obvious to mention when I first wrote this article. Now I know better.]

Whenever you use a form you need to give some thought to these issues as well as the processing or task that the form is actually doing.

