June 2007

Created 30th June, 2007 07:20 (UTC), last edited 30th June, 2007 09:14 (UTC)

I've been doing a lot more thinking about Mahlee™ during this month, but not much in the way of writing code for it.

There is one final thing that Mahlee™ needs. There are a number of complications in scheduling tasks to be done by a group of worker objects. The hardest part is making the manager lock-free, but a trick borrowed from Erlang sorts that aspect out. Erlang doesn't use futures; instead, an object has to send a message back containing the results of any computation that it carries out.
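
As a rough illustration of the message-back idea, here is a minimal sketch in plain JavaScript that runs without Mahlee™. The Mailbox object and its send and run methods are made up for this example; they are not part of Mahlee™ or Erlang. The point is only that the manager never waits on a returned value, because the worker posts its result back as another message.

```javascript
// Hypothetical single-threaded mailbox: messages are queued and handled one at a time
function Mailbox() {
	var queue = [];
	// Queue a message for target[ method ]( argument )
	this.send = function( target, method, argument ) {
		queue.push( { target: target, method: method, argument: argument } );
	};
	// Handle queued messages until none are left
	this.run = function() {
		while ( queue.length ) {
			var m = queue.shift();
			m.target[ m.method ]( m.argument );
		}
	};
}

var mailbox = new Mailbox();
var log = [];

var manager = {
	// Hand out the work and return straight away, no future is inspected
	start: function( n ) {
		log.push( "manager asks for square of " + n );
		mailbox.send( worker, "square", n );
	},
	// The worker's reply arrives as just another message
	result: function( value ) {
		log.push( "manager received " + value );
	}
};

var worker = {
	square: function( n ) {
		// Send the result back rather than returning it through a future
		mailbox.send( manager, "result", n * n );
	}
};

mailbox.send( manager, "start", 7 );
mailbox.run();
```

After run() returns, log holds the request followed by the reply, and at no point did the manager have to block waiting for the worker.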

The same trick can be used in a manager implementation to avoid it having to look inside the futures normally used in Mahlee™. Because the manager object then never makes any blocking calls, it can't deadlock.

Here is one suitable implementation:

var Manager = function( workers, message ) {
	var self = this;

	// Hand waiting tasks to idle workers, keeping the future from each call
	var issue = function() {
		var job, worker;
		while ( self.tasks.length && self.workers.length ) {
			worker = self.workers.pop();
			job = self.tasks.pop();
			self.futures.push( worker[ message ]( job, worker ) );
		}
	};

	this.workers = [];
	for ( var i = 0; i < workers.length ;++i ) {
		this.workers.push( Mahlee.bind( workers[ i ] ) );
	}
	this.tasks = [];
	this.futures = [];

	// Register a task with the manager
	this.task = function( job ) {
		this.tasks.push( job );
		issue();
	}
	// Call back from a worker when it has finished its task
	this.completed = function( job, worker ) {
		this.workers.push( Mahlee.bind( worker ) );
		issue();
	}
	// Return the progress so far
	this.progress = function() {
		var f = this.futures;
		this.futures = [];
		return f;
	}
	// Condition that tells us if the current work has all been done
	this.finished = function() {
		return this.tasks.length == 0 && this.workers.length == workers.length;
	}

}
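
The bookkeeping in the manager can be followed without Mahlee™ at all. The following is a cut-down sketch, not the manager itself: Pool is a hypothetical name, workers are plain strings, there are no futures, and a worker's reply is simulated by calling completed() directly.

```javascript
// Cut-down stand-in for the manager's bookkeeping, without Mahlee or futures
function Pool( workerNames ) {
	var self = this;
	this.idle = workerNames.slice();   // workers with nothing to do
	this.tasks = [];                   // jobs not yet handed out
	this.running = [];                 // [ worker, job ] pairs in flight

	// Pair idle workers with waiting jobs until one list runs out
	var issue = function() {
		while ( self.tasks.length && self.idle.length ) {
			self.running.push( [ self.idle.pop(), self.tasks.pop() ] );
		}
	};
	this.task = function( job ) {
		this.tasks.push( job );
		issue();
	};
	// Stands in for the message a worker sends back when it is done
	this.completed = function( worker ) {
		for ( var i = 0; i < this.running.length; ++i ) {
			if ( this.running[ i ][ 0 ] == worker ) {
				this.running.splice( i, 1 );
				break;
			}
		}
		this.idle.push( worker );
		issue();
	};
	this.finished = function() {
		return this.tasks.length == 0 && this.idle.length == workerNames.length;
	};
}

var pool = new Pool( [ "w1", "w2" ] );
pool.task( "a.jpg" );
pool.task( "b.jpg" );
pool.task( "c.jpg" );                        // both workers are busy, so this one waits
var waitingAfterThree = pool.tasks.length;
pool.completed( "w1" );                      // w1 is free again and picks up c.jpg
var waitingAfterReply = pool.tasks.length;
pool.completed( "w2" );
pool.completed( "w1" );
var allDone = pool.finished();
```

With two workers and three jobs, one job waits until the first reply arrives, and finished() only becomes true once the task list is empty and every worker has reported back.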

The manager sends a message to an available worker giving it a task to work on, and the worker sends a message back when it has completed the task. Here is a worker for resizing images:

function Resize() {
	this.resize = function( job, reference ) {
		try {
			var pathname = FSO.BuildPath( job.source, job.file );
			FHost.echo( pathname );

			GFL.LoadBitmap( pathname );
			GFL.SaveJPEGProgressive = true;
			GFL.SaveJPEGQuality = 85;
			GFL.SaveKeepMetadata = true;

			var ow = GFL.Width, oh = GFL.Height;

			var scales = Math.sqrt( parseFloat( job.megapixels ) * 1000000 / ( ow * oh ) );
			if ( scales < 1 ) {
				var sw = Math.floor( ow * scales + 0.5);
				var sh = Math.floor( oh * scales + 0.5 );
				GFL.Resize( sw, sh );
				GFL.SaveBitmap( FSO.BuildPath( job.destination, FSO.GetBaseName( job.file ) +
					" (" + job.megapixels + " megapixels).jpeg" ) );

				return true;
			} else {
				return false;
			}
		} catch ( e ) {
			FHost.echo( pathname + " : " + e.description );
			return false;
		} finally {
			Mahlee.bind( job.notify ).completed( job, reference );
		}
	}
}

The important part is the finally clause. It makes sure that the manager is told the worker has finished, whether the resize succeeded, was skipped or threw an exception, so that the manager can assign the worker a new task.
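
For anybody unsure about how finally interacts with return, here is a small stand-alone sketch. The notify and work functions are invented for the example, with notify standing in for the call to completed() on the manager.

```javascript
var notified = [];

// Hypothetical stand-in for telling the manager that a worker is done
function notify( name ) {
	notified.push( name );
}

function work( name, fail ) {
	try {
		if ( fail ) throw new Error( "could not load " + name );
		return true;
	} catch ( e ) {
		return false;
	} finally {
		// Runs after either return above, before the caller sees the value
		notify( name );
	}
}

var ok = work( "good.jpg", false );
var bad = work( "broken.jpg", true );
```

Both calls end up in notified, the first returning true and the second false, so a failed job still hands its worker back.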

Here is a controlling program that uses the manager to resize a directory full of images. It creates two worker objects, so on a dual core computer it can keep both cores busy and process the images in half the time:

function main( source, size, destination ) {
	FHost.echo( "Source files: " + source );
	FHost.echo( "Resizing to " + size + " megapixels" );
	FHost.echo( "Saving images to " + ( destination || source ) );
	var workers = [
		Mahlee.create( Resize ),
		Mahlee.create( Resize )
	];
	var manager = Mahlee.create( "Manager", workers, "resize" );
	var folder = FSO.GetFolder( source );
	for ( var file = new Enumerator( folder.Files ); !file.atEnd(); file.moveNext() ) {
		manager.task( {
			"notify": manager,
			"source": source, 
			"file": file.item().Name,
			"megapixels": size,
			"destination": destination || source
		} );
	}

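	// Collect futures as the manager issues them and count the successful resizes
	var done = 0;
	var results = [];
	do {
		results = results.concat( manager.progress().result() );
		// progress() may have nothing new, so only pop when there is a future to look at
		if ( results.length && Mahlee.token( results.pop() ).result() ) ++done;
	} while ( results.length || !manager.finished().result() );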
	FHost.echo( "Processed " + done + " images" );
}

The problem here is the complexity of the loop that waits for the work to complete. We can simplify it somewhat at the cost of introducing a busy wait:

	FHost.echo( "Start busy wait" );
	while ( !manager.finished().result() );
	FHost.echo( "End busy wait" );

	var done = 0, results = manager.progress().result();
	for ( var i = 0; i < results.length; ++i )
		if ( Mahlee.token( results[ i ] ).result() ) ++done;
	FHost.echo( "Processed " + done + " images" );

This is obviously far from ideal. What Mahlee™ needs, and what I haven't had time to write yet, is a condition on the future that blocks until the returned value is what we want. To do this, Mahlee™ needs to recheck the condition after each message that the manager object handles, until the condition is met. The busy wait loop could then be replaced with something like this:

	manager.finished().until();

What exactly until() will take as its arguments is still an open matter. It should probably be some sort of predicate function.
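
To make the idea a bit more concrete, here is one way the rechecking could look, written as plain JavaScript rather than as part of Mahlee™. Everything here is an assumption: the Actor object, its send method and the shape of the predicate are all invented for the sketch. Because plain JavaScript is single threaded, this version drains the queue itself and gives up when the queue is empty, where a real until() would block the caller until more messages arrived.

```javascript
// Hypothetical object with a message queue, not Mahlee's real implementation
function Actor( state ) {
	var queue = [];
	this.state = state;
	this.handled = 0;
	// Queue a message, here simply a function applied to the state
	this.send = function( message ) {
		queue.push( message );
	};
	// Handle messages one at a time, rechecking the predicate after each.
	// Returns true once the predicate holds, false if the queue runs dry first.
	this.until = function( predicate ) {
		while ( !predicate( this.state ) ) {
			if ( !queue.length ) return false;
			queue.shift()( this.state );
			++this.handled;
		}
		return true;
	};
}

var actor = new Actor( { outstanding: 3 } );
for ( var i = 0; i < 5; ++i ) {
	actor.send( function( s ) { --s.outstanding; } );
}
var met = actor.until( function( s ) { return s.outstanding == 0; } );
```

Only three of the five queued messages are handled before the predicate is satisfied, which is the behaviour wanted from until(): stop waiting as soon as the condition holds rather than polling for it.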