Checking postcodes

Created 26th February, 2009 03:36 (UTC), last edited 26th February, 2009 04:11 (UTC)

Last time it seems that I managed to go all the way through Thursday thinking it was Wednesday. By the time I was convinced that it actually was Thursday, it was far too late to put up the problem. Whoops!

This problem is a little more practical, relevant to some things I have on this site, and as a consequence more open-ended. Free the postcode is a project to build a geo database of UK postcodes.

If you take a look at the database in Google Earth, it should become apparent that a number of data points are clearly in the wrong place, such as SK7 1NJ. Less obviously wrong is SW1A 0AA, which sits in amongst the NN codes rather than with the other SW codes.¹

¹ A very good starting point for understanding the structure of UK postcodes is their Wikipedia article.
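The NN-versus-SW comparison relies on the postcode area, the one or two letters at the start of the outward code. Here is a minimal Python sketch for pulling a postcode apart. It assumes a simplified pattern (area letters, a district digit with an optional digit or letter, then a three-character inward code), so it only checks the shape of a postcode and returns None for special cases such as GIR 0AA. The name split_postcode is hypothetical.

```python
import re

# Simplified UK postcode shape (an assumption, not the full official rules):
#   area:     one or two letters           e.g. "SW", "SK"
#   district: a digit, then an optional digit or letter   e.g. "1A", "7"
#   inward:   a digit followed by two letters            e.g. "0AA"
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})([0-9][0-9A-Z]?)\s*([0-9][A-Z]{2})$")

def split_postcode(postcode):
    """Return (area, outward code, inward code), or None if the shape doesn't match."""
    m = POSTCODE_RE.match(postcode.strip().upper())
    if m is None:
        return None
    area, district, inward = m.groups()
    return area, area + district, inward
```

For example, split_postcode("SW1A 0AA") gives ("SW", "SW1A", "0AA"), so its area is SW even though its plotted position sits among the NN codes.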

The challenge is to write some code that will process the dataset and produce a list of postcodes that are probably incorrect. The more useful you can make the program, the better, of course.
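Whatever rules end up being used, the first step is getting the data into a simple list of points. Below is a minimal sketch. It assumes the dataset has been exported as CSV with columns named postcode, latitude and longitude in decimal degrees. Those column names and the function name load_points are assumptions, and the real export format may differ.

```python
import csv

def load_points(lines):
    """Read (postcode, lat, lon) tuples from CSV text.

    Assumes hypothetical column names: postcode, latitude, longitude.
    Rows with missing or unparseable values, or coordinates outside the
    valid range, are skipped rather than aborting the whole run.
    """
    points = []
    for row in csv.DictReader(lines):
        try:
            postcode = row["postcode"].strip().upper()
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except (KeyError, AttributeError, TypeError, ValueError):
            continue
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            points.append((postcode, lat, lon))
    return points
```

Any iterable of lines works, so an open file (opened with newline="" as the csv module recommends) can be passed straight in.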

Some suggestions:

  • A simple starting point would be to work out the centre of each group of postcodes (e.g. all of the BN, BR and SW postcodes) and then list the ones furthest from the centre in each group.
  • Find some contour data for the UK and list those that are in the sea.
  • There are a large number of graph-based algorithms that are suitable. For example, postcode areas don't overlap, so you could try drawing contours around them and working out where overlaps occur.
  • If your program uses more than one rule to determine validity, it would be useful to know which rules produced which parts of the output.
  • Because the UK straddles the prime meridian, it's easy to get the sign of the longitude wrong: which of the ones that you mark as incorrect would look more correct if you flipped the sign of the longitude?
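The first, fourth and fifth suggestions combine naturally. Below is a minimal Python sketch. It assumes points is a list of (postcode, latitude, longitude) tuples in decimal degrees. The function names and the threshold parameters factor and min_km are hypothetical choices, not tuned values. Each area's centre is taken as the median latitude and longitude, and every point further away than a threshold is flagged. A second flag is added when negating the longitude would bring the point back within range. Each flag records the rule that produced it.

```python
import math
import re
from collections import defaultdict
from statistics import median

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def area_of(postcode):
    """Postcode area: the leading one or two letters, e.g. 'SK' for 'SK7 1NJ'."""
    m = re.match(r"[A-Z]{1,2}", postcode.strip().upper())
    return m.group(0) if m else None

def suspicious_postcodes(points, factor=5.0, min_km=20.0):
    """Flag points far from the centre of their postcode area.

    points: iterable of (postcode, lat, lon) in decimal degrees.
    Yields (postcode, rule, detail) so the output records which rule fired.
    """
    by_area = defaultdict(list)
    for postcode, lat, lon in points:
        by_area[area_of(postcode)].append((postcode, lat, lon))

    for area, members in by_area.items():
        # Median centre: robust against the very outliers we are hunting for.
        c_lat = median(lat for _, lat, _ in members)
        c_lon = median(lon for _, _, lon in members)
        dists = [haversine_km(lat, lon, c_lat, c_lon) for _, lat, lon in members]
        # "Far" is relative to this area's typical spread, with a floor of min_km.
        limit = max(min_km, factor * median(dists))
        for (postcode, lat, lon), d in zip(members, dists):
            if d <= limit:
                continue
            yield postcode, "far-from-area-centre", f"{d:.0f} km from the {area} centre"
            # Would negating the longitude put the point back inside its area?
            flipped = haversine_km(lat, -lon, c_lat, c_lon)
            if flipped <= limit:
                yield postcode, "longitude-sign-flip", f"{flipped:.0f} km away if longitude is negated"
```

Using the median rather than the mean for the centre is deliberate. A handful of badly placed points, which are exactly what we are looking for, would otherwise drag a mean-based centre towards them. Scaling the threshold by the median distance lets a spread-out rural area tolerate more scatter than a compact urban one.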
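For the sea test, once some coastline data has been found and converted to land polygons, an even-odd (ray casting) point-in-polygon test is enough to flag points that fall outside all land. This is a sketch under stated assumptions. land_polygons is a hypothetical input, a list of polygons, each a list of (longitude, latitude) vertices. Treating degrees as flat x/y coordinates is a reasonable approximation when the coastline vertices are closely spaced. Lakes and other holes inside a polygon are not handled.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd (ray casting) test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices; the last vertex joins back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray heading east from the point.
            if lon < x_cross:
                inside = not inside
    return inside

def in_the_sea(points, land_polygons):
    """Yield (postcode, rule, detail) for points not inside any land polygon."""
    for postcode, lat, lon in points:
        if not any(point_in_polygon(lon, lat, poly) for poly in land_polygons):
            yield postcode, "in-the-sea", f"({lat}, {lon}) is outside every land polygon"
```

Emitting the same (postcode, rule, detail) shape as any other rule makes it easy to merge the output of several checks while still knowing which rule flagged each postcode.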

If this seems too much like hard work, then simply having some programs that do statistical analysis of the results so far would be helpful, for example one that lets us estimate how many postcodes are still outstanding in different areas.
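As a starting point for the statistics, here is a sketch that counts the distinct postcodes collected in each area and compares them with expected totals. The expected_totals mapping is a hypothetical input. The dataset itself cannot tell us how many postcodes exist, so those figures would have to come from another source. The function names are illustrative.

```python
import re
from collections import Counter

def area_of(postcode):
    """Postcode area: the leading one or two letters, e.g. 'SW' for 'SW1A 0AA'."""
    m = re.match(r"[A-Z]{1,2}", postcode.strip().upper())
    return m.group(0) if m else None

def coverage_report(postcodes, expected_totals):
    """Compare collected postcodes per area against expected totals.

    expected_totals maps area -> how many postcodes are believed to exist
    there (hypothetical input; the dataset alone cannot supply it).
    Returns (area, collected, expected, outstanding) rows, largest gap first.
    """
    # Normalise so "sk7 1nj" and "SK71NJ" count as the same postcode.
    unique = {pc.strip().upper().replace(" ", "") for pc in postcodes}
    collected = Counter(area_of(pc) for pc in unique)
    rows = []
    for area, expected in expected_totals.items():
        got = collected.get(area, 0)
        rows.append((area, got, expected, max(expected - got, 0)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

The same approach works at district level (SK7, SW1A and so on) by grouping on the outward code instead of the area, provided expected totals are available at that level.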