Concurrency Benchmarking, Actors, and sbt tricks
by Erik Engbrecht
Have you ever noticed that other people's microbenchmarks tend to be hard to run, often impossible to duplicate, and frequently caveated to the hilt? When you get down to it, a benchmark is really an experiment, and ideally a scientific one. That means all factors relevant to the results should be clearly recorded, and the tests should be easy for others to duplicate.
Custom sbt actions for benchmarks
In order to test and benchmark the work I'm doing on a managed variant of the JSR-166y ForkJoinPool, along with supporting infrastructure for use with Scala Actors, I'm creating a test harness that captures a host of environmental factors about how it was run, and writing sbt actions to make it easy to run the benchmarks and automatically permute the variables.
It still needs a lot of work, but I had some trouble figuring out a really basic task, so I thought I'd share it. Basically, I wanted to build a Task object that consists of several tasks based on information in the project definition and permuted parameters. It's actually pretty easy, as you can see in the snippet below from my project definition:
/** this task executes the PingPong benchmark using each available scheduler */
lazy val pingpongbench = pingpongTaskList

/** produces a sequence of run tasks using all the available schedulers */
def pingpongTaskList = {
  val pairs = 100
  val messagesPerPair = 10000
  val tasks = for (sched <- schedulers) yield pingpongTask(sched, pairs, messagesPerPair)
  tasks.reduceLeft((a, b) => a && b)
}
You can see the whole file here. Task has an && operator that lets you concatenate one task with another, so you can build up a whole chain of tasks. In the example above, I'm having it run the benchmark once for each scheduler configuration. Soon I'm going to make it permute other parameters as well, but right now my test harness isn't playing nicely with the schedulers included in the Scala distribution, so first things first.
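To give a sense of where that permutation is headed, here's a rough sketch (mine, not from the actual project file) of how a second parameter could be folded in. It assumes the same schedulers and pingpongTask helpers shown above, and the pair counts are made up:

def pingpongPermutedTaskList = {
  // permute both the scheduler and the number of actor pairs;
  // the nested for-comprehension yields one run task per combination
  val pairCounts = List(10, 100, 1000)
  val messagesPerPair = 10000
  val tasks = for {
    sched <- schedulers
    pairs <- pairCounts
  } yield pingpongTask(sched, pairs, messagesPerPair)
  // chain all the run tasks together with Task's && operator
  tasks.reduceLeft((a, b) => a && b)
}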
There's also one other little customization, which is documented, but I think it's important for benchmarking. By default, sbt runs your code inside its own process (the same JVM sbt itself runs in). This can cause problems with multithreaded code, especially if it doesn't terminate properly. It also means the next benchmark to run has to contend with any junk the previous benchmark left around. So I configured sbt to fork a new process for each run. It just required one line:
override def fork = forkRun
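For context, here's a minimal sketch of where that override lives in an sbt 0.7-style project definition; MyBenchmarkProject is just a made-up name and the rest of the project's settings are omitted:

class MyBenchmarkProject(info: ProjectInfo) extends DefaultProject(info) {
  // fork a fresh JVM for each run instead of running inside sbt's own process
  override def fork = forkRun
}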
Important variables
Here's what I'm capturing for each run right now, so that the results can all be dumped into a big spreadsheet for analysis (a rough sketch of the capture code follows the list). I'd like to capture more information about the host machine, such as details about the CPUs and the load while the benchmark is running, but I haven't gotten that far yet. Currently these are all captured from within the benchmark process, mostly using system properties and the Runtime object.
Test Name - obviously needed so that results from multiple benchmarks can be stored in the same file
Scheduler - this is my primary variable right now; I want to run each benchmark with each scheduler while holding everything else constant
# of Cores/Processors - essential so that anyone looking at the results has an idea about the hardware used
Java VM Name - different VMs can perform quite differently
Java VM Version - performance characteristics change from version to version (usually getting better)
Java Version - same reason as above, but this is probably the more publicly known version number
Scala Version - this could be important in the future, as it becomes more common for different projects to be on different versions of Scala
OS Name and version - again, it can affect performance
Processor Architecture
Approximate Concurrency (number of simultaneously alive actors) - this allows us to examine concurrency levels versus resource consumption; more concurrency does not necessarily mean that more cores or threads would be helpful
Approximate Parallelism (number of simultaneously runnable actors) - this measures how many cores/threads the benchmark can really keep busy
Approximate Total Messages - this estimates the amount of activity that takes place during the benchmark; generally the benchmarks I'm looking at contain very little logic because they are intended to measure overhead introduced by the framework
Total Wall Clock Time (seconds) - as measured using nanoTime within the benchmark process
Initial Thread and Maximum Observed Thread Count - used to examine automatic expansion of the thread pool
Initial Free Memory and Minimum Observed Free Memory - threads use a fair amount of memory, so performance impacts may show up as pressure on the GC as well as contention for the CPU
Initial and Maximum Observed Total Memory - threads use a lot of memory, so it's important to track usage
Verbose - debugging output pretty much invalidates any of these tests
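For concreteness, here's a rough sketch (my own naming, not the actual harness) of how most of the static factors above can be grabbed from inside the benchmark process using system properties and the Runtime object; the message counts, wall clock time, and observed maxima and minima would come from the benchmark itself:

object EnvCapture {
  def capture(testName: String, scheduler: String): Map[String, String] = {
    val rt = Runtime.getRuntime
    Map(
      "test.name"     -> testName,
      "scheduler"     -> scheduler,
      "processors"    -> rt.availableProcessors.toString,
      "jvm.name"      -> System.getProperty("java.vm.name"),
      "jvm.version"   -> System.getProperty("java.vm.version"),
      "java.version"  -> System.getProperty("java.version"),
      "scala.version" -> scala.util.Properties.versionString,
      "os"            -> (System.getProperty("os.name") + " " + System.getProperty("os.version")),
      "arch"          -> System.getProperty("os.arch"),
      "threads"       -> Thread.activeCount.toString,
      "free.memory"   -> rt.freeMemory.toString,
      "total.memory"  -> rt.totalMemory.toString
    )
  }
}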