Lots of shiny spheres are a staple of ray traced images :)
One of the features of AnimRay's templated code approach is that it gives you complete control over the memory layout of the scene. This image and the multi-coloured spheres image below are a test to see how memory layout affects performance.
For the coloured spheres version the memory layout for each sphere roughly corresponds to:
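A sketch of such a layout, assuming the first four doubles hold the geometry (taken here to be a centre and a radius) and the last four hold the surface colour. The struct and field names are illustrative, not AnimRay's own:

```cpp
#include <cstddef>

// Hypothetical layout: names and the exact meaning of each field are assumptions.
struct ColouredSphere {
    // Geometry: the only part the intersection test needs to read.
    double cx, cy, cz, radius;
    // Surface: only read for the one sphere that is actually hit.
    double red, green, blue, extra;
};

// Eight doubles with no padding between them.
static_assert(sizeof(ColouredSphere) == 8 * sizeof(double), "unexpected padding");
```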
This is a total of 64 bytes per sphere (eight doubles at eight bytes each). There's no complex spatial optimisation, so the ray tracer simply goes through the spheres in order, tracking which is the first one hit by any given ray.
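The in-order scan can be sketched roughly like this. It is not AnimRay's code: `Sphere`, `Ray`, `intersect` and `nearest_hit` are made-up names, the geometry is assumed to be a centre plus radius, and the ray direction is assumed to be unit length.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical 64-byte sphere: four geometry doubles, four surface doubles.
struct Sphere { double cx, cy, cz, radius; double red, green, blue, extra; };
// Ray origin and direction; the direction is assumed to be normalised.
struct Ray { double ox, oy, oz; double dx, dy, dz; };

// Distance along the ray to the nearest intersection in front of the origin, if any.
std::optional<double> intersect(const Ray &r, const Sphere &s) {
    assert(s.radius > 0.0);
    const double lx = r.ox - s.cx, ly = r.oy - s.cy, lz = r.oz - s.cz;
    const double b = lx * r.dx + ly * r.dy + lz * r.dz;
    const double c = lx * lx + ly * ly + lz * lz - s.radius * s.radius;
    const double disc = b * b - c; // discriminant of t^2 + 2bt + c = 0
    if (disc < 0.0) return std::nullopt; // the ray misses the sphere
    const double root = std::sqrt(disc);
    const double t = (-b - root > 0.0) ? (-b - root) : (-b + root);
    if (t <= 0.0) return std::nullopt; // the sphere is behind the ray
    return t;
}

// Walk every sphere in memory order and keep the index of the closest hit.
std::optional<std::size_t> nearest_hit(const Ray &r, const std::vector<Sphere> &spheres) {
    std::optional<std::size_t> best;
    double best_t = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < spheres.size(); ++i) {
        if (const auto t = intersect(r, spheres[i]); t && *t < best_t) {
            best_t = *t;
            best = i;
        }
    }
    return best;
}
```

Note that only the first four doubles of each `Sphere` are touched inside the loop; the colour fields are along for the ride.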
The question I had was whether this was memory or CPU constrained. The spheres are all held in a contiguous memory block, which means that the data should be pulled from RAM at the maximum memory bandwidth. If the intersection calculation is fast enough, this would make the code memory bound rather than CPU bound.
[Image: Coloured spheres]
This can be tested, because it turns out that to check the intersection we only care about half of the data we're pulling. The last four doubles are used for the surface colour calculations, and we only do the surface calculation on the one sphere that is hit, not on all of the misses. The white spheres version was a quick and dirty way to reduce the footprint of the data needed for the ray/sphere intersection tests. Instead of each sphere having its own surface data, the spheres are all grouped together and the group shares the same colour information.
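As a rough illustration of the grouped layout (again with made-up names, and assuming the geometry is a centre plus radius), the surface data moves out of the per-sphere record and is stored once for the whole group:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types; AnimRay's real templates will look different.
struct SphereGeometry { double cx, cy, cz, radius; };  // 32 bytes, read by every test
struct Surface { double red, green, blue, extra; };    // stored once per group

struct SphereGroup {
    Surface shared;                       // one colour for every sphere in the group
    std::vector<SphereGeometry> spheres;  // contiguous block of geometry only
};

// Bytes that one full pass of intersection tests has to pull through the cache.
std::size_t scan_bytes(const SphereGroup &g) {
    return g.spheres.size() * sizeof(SphereGeometry);
}

// Whichever sphere is hit, the surface lookup lands on the shared data.
const Surface &surface_of(const SphereGroup &g, std::size_t index) {
    assert(index < g.spheres.size());
    return g.shared;
}
```

With this layout a scan over N spheres reads 32 × N bytes instead of 64 × N.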
This means that the data needed is halved, so it should run faster if memory bandwidth is the problem. This assumes that the CPU isn't smart enough to skip the embedded colour information, a question I have no idea how to answer. In any case, there is no difference in performance between the two versions, which means that either the CPU is smart enough to skip data it knows the code won't read or (far more likely to my mind) the ray/sphere intersection calculation is simply too slow and we are in fact CPU bound.