Created 21st October, 2006 06:18 (UTC), last edited 9th November, 2006 06:39 (UTC)

Like all good systems, this one needs a design. Because FOST.3™ features a UML compiler, UML is the natural medium both for drawing up the design and for thinking it through in the first place.

Experience has taught me that single large databases aren't good for large data volumes. It has also taught me that, although storing partial and final results that can be recalculated from the data is generally a bad idea, it is often a good idea when the calculations are expensive.

Looking at it, I decided to use three different database schemas that would all co-operate in order to handle not only the data analysis but also some of the project/operations management tasks¹ [¹ This project is not quite like the business systems that FOST.3™ normally targets, in that developing and running the software is the project itself rather than an enabler for a business operation. Normally the software supports operations management and the project management is there to support software development, but here the two terms amount to the same thing.].

  1. A static database that will hold the data supplied by Netflix. I'm not going to try to optimise this heavily to start with.
  2. A core database which will hold descriptions of the algorithms tried, which parts of the data they've been run on and any parameters that they take as inputs (but not the actual scores).
  3. A number of results databases holding individual scores for algorithm runs.

The final two databases together will allow me to go back and double-check results and look for mistakes, of which there are likely to be many.
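To make the split concrete, here is a rough sketch of how the three databases might relate. It is written in Python with SQLite purely for illustration and is not the FOST.3™ UML design. Every table, column and function name in it (`rating`, `run`, `score`, `record_run`, `rmse`) is hypothetical, as is the idea of storing parameters as JSON text, and the root mean squared error is only an example of the kind of check the stored scores make possible.

```python
import json
import math
import sqlite3


def open_static():
    """Static database: the supplied ratings, loaded once and then only read."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE rating ("
        " movie_id INTEGER, customer_id INTEGER, stars INTEGER,"
        " PRIMARY KEY (movie_id, customer_id))"
    )
    return db


def open_core():
    """Core database: one row per algorithm run, but no individual scores."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE run ("
        " run_id INTEGER PRIMARY KEY,"
        " algorithm TEXT NOT NULL,"     # which algorithm was tried
        " data_subset TEXT NOT NULL,"   # which part of the data it ran on
        " parameters TEXT NOT NULL,"    # input parameters, as JSON (assumption)
        " results_db TEXT NOT NULL)"    # which results database holds the scores
    )
    return db


def open_results():
    """One of several results databases: individual scores per run."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE score ("
        " run_id INTEGER, movie_id INTEGER, customer_id INTEGER, predicted REAL,"
        " PRIMARY KEY (run_id, movie_id, customer_id))"
    )
    return db


def record_run(core, results, algorithm, data_subset, parameters,
               results_db, predictions):
    """Describe the run in the core database, then store its scores separately."""
    cursor = core.execute(
        "INSERT INTO run (algorithm, data_subset, parameters, results_db)"
        " VALUES (?, ?, ?, ?)",
        (algorithm, data_subset, json.dumps(parameters, sort_keys=True), results_db),
    )
    run_id = cursor.lastrowid
    results.executemany(
        "INSERT INTO score VALUES (?, ?, ?, ?)",
        [(run_id, movie, customer, value) for movie, customer, value in predictions],
    )
    return run_id


def rmse(static, results, run_id):
    """Re-check a stored run against the static data (example check only)."""
    rows = results.execute(
        "SELECT movie_id, customer_id, predicted FROM score WHERE run_id = ?",
        (run_id,),
    ).fetchall()
    squared_errors = []
    for movie_id, customer_id, predicted in rows:
        (stars,) = static.execute(
            "SELECT stars FROM rating WHERE movie_id = ? AND customer_id = ?",
            (movie_id, customer_id),
        ).fetchone()
        squared_errors.append((predicted - stars) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sample data, only to show the three databases working together.
static, core, results = open_static(), open_core(), open_results()
static.executemany("INSERT INTO rating VALUES (?, ?, ?)", [(1, 10, 4), (1, 11, 5)])
run_id = record_run(core, results, "global-mean", "all", {"shrink": 0},
                    "results-0", [(1, 10, 3.0), (1, 11, 5.0)])
print(run_id, rmse(static, results, run_id))
```

Keeping the run descriptions apart from the scores means the core database stays small enough to browse, while a results database can be dropped and regenerated from the recorded algorithm and parameters if a mistake turns up.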


  1. Static data
    1. Class diagram