Steve Loughran: "Looking at the other areas of work, I think scheduling will get
the most interest from different people. Why? Because it's where
people like Platform Computing deliver value. It's not the APIs for
grid computing, it's in distributing work to chosen machines. The
current Job Scheduler works, but it is very simple. Every task
worker node has a number of 'slots' -work is assigned to workers
with spare slots. The scheduler is location aware, looking for the
closest open slot to data, but there is no real examination of how
much work a node is really doing, what the expected workload of the
new job is (based on past experience), or anything resembling
balanced scheduling between users. Over time, that's where there is
going to be fun. Watch that space."
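
To make the slot model in that quote concrete, here is a minimal sketch of a slot-based, location-aware assignment. It is not Hadoop's scheduler code. The names (`Worker`, `Task`, `distance`, `assign`) and the three locality levels are assumptions made up for illustration. Like the scheduler described above, it only looks for the closest worker with a spare slot; it does not examine real node load, expected job cost, or fairness between users.

```python
from dataclasses import dataclass

# Hypothetical locality levels (an assumption): lower means closer to the data.
NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2


@dataclass
class Worker:
    name: str
    rack: str
    slots: int        # total task slots on this worker
    running: int = 0  # slots currently in use

    def free_slots(self) -> int:
        return self.slots - self.running


@dataclass
class Task:
    task_id: str
    data_hosts: set   # names of the workers that hold this task's input


def distance(worker, task, rack_of):
    """Rough distance from a worker to the task's data."""
    if worker.name in task.data_hosts:
        return NODE_LOCAL
    if any(rack_of.get(host) == worker.rack for host in task.data_hosts):
        return RACK_LOCAL
    return OFF_RACK


def assign(task, workers):
    """Give the task to the closest worker with a spare slot, or return None."""
    rack_of = {w.name: w.rack for w in workers}
    candidates = [w for w in workers if w.free_slots() > 0]
    if not candidates:
        return None  # every slot is taken; the task has to wait
    best = min(candidates, key=lambda w: distance(w, task, rack_of))
    best.running += 1  # occupy one slot on the chosen worker
    return best


workers = [Worker("a", "r1", 2), Worker("b", "r1", 1), Worker("c", "r2", 1)]
task = Task("t1", {"b"})
placements = []
for _ in range(5):
    chosen = assign(task, workers)
    placements.append(chosen.name if chosen else None)
```

Each call to `assign` scans every worker, and in this sketch all of that happens in a single place on the master.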
I think scheduling is interesting for another reason. Scheduling seems like a natural bottleneck in a master/worker system. I was looking at Hadoop for a project at work a while back (and to see if we could use it for general async/batch work), and while it's easy to get hung up on something like the namenode or reducers, or even the "it takes getting used to" programming model, I kept coming back to the code that decides when to put work into the jobservers, worried that it would dominate the system.