Design

Created 2006-10-21T06:18:06.571Z, last edited 2006-11-09T06:39:25.803Z

Like all good systems, this one needs a design. Because FOST.3™ features a UML compiler, UML is the natural medium both for drawing up the design and for thinking it through in the first place.

Experience has taught me two things. First, a single large database isn't good for large data volumes. Second, although storing partial and final results that can be recalculated from the data is generally a bad idea, it is often a good idea when the calculations are expensive.

Looking at it, I decided to use three different database schemas that co-operate to handle not only the data analysis but also some of the project/operations management tasks.

  1. A static database that will hold the data supplied by Netflix. I'm not going to try to optimise this heavily to start with.
  2. A core database which will hold descriptions of the algorithms tried, which parts of the data they've been run on and any parameters that they take as inputs (but not the actual scores).
  3. A number of results databases holding individual scores for algorithm runs.

Together, the last two databases will let me go back, double-check results and look for mistakes, of which there are likely to be many.
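To make the split between the three databases more concrete, here is a minimal sketch in Python using SQLite. It is not the FOST.3 UML design itself, only an illustration under stated assumptions: the schema names (`static_db`, `core_db`, `results_db`), every table and column name, and the helper functions `open_databases`, `record_run` and `recheck_rmse` are all hypothetical. The root mean squared error is used purely as an example of a figure that can be recomputed from the stored scores.

```python
import math
import sqlite3


def open_databases():
    """Attach three separate in-memory databases to one connection.

    All schema, table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    for schema in ("static_db", "core_db", "results_db"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.executescript("""
        -- 1. Static database: the supplied ratings, never modified.
        CREATE TABLE static_db.rating (
            movie_id INTEGER NOT NULL,
            customer_id INTEGER NOT NULL,
            rating INTEGER NOT NULL,
            PRIMARY KEY (movie_id, customer_id)
        );
        -- 2. Core database: what was run, on which data, with which inputs.
        CREATE TABLE core_db.algorithm (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE core_db.run (
            id INTEGER PRIMARY KEY,
            algorithm_id INTEGER NOT NULL,
            data_subset TEXT NOT NULL
        );
        CREATE TABLE core_db.run_parameter (
            run_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            value TEXT NOT NULL,
            PRIMARY KEY (run_id, name)
        );
        -- 3. Results database: the individual scores, keyed by run.
        CREATE TABLE results_db.score (
            run_id INTEGER NOT NULL,
            movie_id INTEGER NOT NULL,
            customer_id INTEGER NOT NULL,
            predicted REAL NOT NULL,
            PRIMARY KEY (run_id, movie_id, customer_id)
        );
    """)
    return conn


def record_run(conn, algorithm, data_subset, params, predictions):
    """Describe a run in the core database and store its scores separately."""
    conn.execute(
        "INSERT OR IGNORE INTO core_db.algorithm (name) VALUES (?)", (algorithm,)
    )
    (algorithm_id,) = conn.execute(
        "SELECT id FROM core_db.algorithm WHERE name = ?", (algorithm,)
    ).fetchone()
    run_id = conn.execute(
        "INSERT INTO core_db.run (algorithm_id, data_subset) VALUES (?, ?)",
        (algorithm_id, data_subset),
    ).lastrowid
    conn.executemany(
        "INSERT INTO core_db.run_parameter VALUES (?, ?, ?)",
        [(run_id, name, str(value)) for name, value in params.items()],
    )
    conn.executemany(
        "INSERT INTO results_db.score VALUES (?, ?, ?, ?)",
        [(run_id, m, c, p) for (m, c), p in predictions.items()],
    )
    conn.commit()
    return run_id


def recheck_rmse(conn, run_id):
    """Recompute a run's error from stored scores and the static ratings."""
    (mean_sq,) = conn.execute(
        """
        SELECT AVG((s.predicted - r.rating) * (s.predicted - r.rating))
        FROM results_db.score AS s
        JOIN static_db.rating AS r
          ON r.movie_id = s.movie_id AND r.customer_id = s.customer_id
        WHERE s.run_id = ?
        """,
        (run_id,),
    ).fetchone()
    return None if mean_sq is None else math.sqrt(mean_sq)


# Tiny made-up example: two ratings and one run that predicts 3.0 for both.
conn = open_databases()
conn.executemany(
    "INSERT INTO static_db.rating VALUES (?, ?, ?)", [(1, 10, 4), (1, 11, 2)]
)
run_id = record_run(
    conn, "constant_guess", "all", {"guess": 3.0}, {(1, 10): 3.0, (1, 11): 3.0}
)
print(recheck_rmse(conn, run_id))
```

Because the scores sit apart from the run descriptions, a check like `recheck_rmse` can be repeated at any time without re-running the algorithm. In this sketch the three databases share one connection for brevity; nothing in the design requires that.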
