Monday, 21 January 2013

Using Caliper for writing Micro Benchmarks

Summary: Caliper is a micro-benchmarking framework. Here's how to get going with it using Ant + Ivy, with a before/after of an existing benchmark and a workaround for the broken dependency/option for measuring memory allocation.
Writing micro-benchmarks in Java is famously hard/tricky/complicated. If you don't think so, read these articles:
Caliper is an open source project which aims to simplify the issue by... giving you a framework for writing benchmarks! It's still a bit rough around the edges, but already very useful.


Get started

I'll assume you've got Ant sorted and nothing more, so you'll need to add Ivy support to your build. You can use the following, which I cannibalised from the Ivy sample:
And you will need an ivy.xml file to bring in Caliper:
You'll notice the allocation dependency is excluded and that the build has a task in it to download the jar directly from the website... there's good reason for that. Ivy uses Maven repositories to get its dependencies, and the java-allocation-instrumenter jar published to Maven is sadly broken. You can fix it by downloading it manually from here. There is probably a cleaner way to handle this with Ivy using resolvers and so on, but this is not a post about Ivy, so I won't bother.
You can use an Eclipse plugin to support Ivy integration and bring the jars into your project. You'll still need to get the allocation.jar and sort it out as described below.
Now that we got through the boring bits, let's see why we bothered.

UTF-8 Encoding benchmarks: Before and after

To give context to this tool you need to review how hand-rolled benchmarks often look. In this case I'll just revisit a benchmark I did for a previous post measuring different methods of encoding UTF-8 Strings into a byte buffer. The full code base is here, but here's the original code used for benchmarking and comparing (written by Evan Jones):
This is quite typical, actually better than most benchmarking code as things go. But it's quite a lot of code to basically compare a few ways of achieving the same thing. If you are going to do any amount of benchmarking you will soon grow tired of this boilerplate and come up with some framework or other... So how about you don't bother? Here's how the benchmark looks when Caliper-ized, including the code actually under test:
Note there's hardly any code concerned with the benchmarking functionality, and that in fewer lines of code we also fit in 3 flavours of the code we wanted to compare. Joy!
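For a rough idea of the shape (this is a sketch, not the benchmark from this post): as I understand the 0.5-era Caliper API, a benchmark class extends `com.google.caliper.SimpleBenchmark`, exposes methods named `timeXxx(int reps)`, and hands itself to `Runner.main(...)`. To keep the example runnable without the Caliper jar, the class below is plain Java with a tiny stand-in `main`. The class name, the input string and the two flavours shown are assumptions made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a Caliper benchmark class. With Caliper on the
// classpath this would extend com.google.caliper.SimpleBenchmark (assumption
// based on the 0.5-era API) and the runner would call the time* methods.
class Utf8EncodingSketch {
    // Illustrative input; the post's real test data is not reproduced here.
    private final String input = "Hello, UTF-8: \u00e9\u00e8\u4e2d\u6587";
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private final ByteBuffer out = ByteBuffer.allocate(64);

    // Flavour 1: String.getBytes returns a new byte[] on every call.
    public int timeStringGetBytes(int reps) {
        int total = 0;
        for (int i = 0; i < reps; i++) {
            total += input.getBytes(StandardCharsets.UTF_8).length;
        }
        return total; // returning a result keeps the JIT from discarding the loop
    }

    // Flavour 2: a reused CharsetEncoder writing into a reused ByteBuffer.
    public int timeCharsetEncoder(int reps) {
        int total = 0;
        for (int i = 0; i < reps; i++) {
            out.clear();
            encoder.reset();
            encoder.encode(CharBuffer.wrap(input), out, true);
            encoder.flush(out);
            total += out.position();
        }
        return total;
    }

    // Tiny hypothetical driver so the sketch runs on its own; a real runner
    // takes care of warm-up, trials and reporting.
    public static void main(String[] args) {
        Utf8EncodingSketch b = new Utf8EncodingSketch();
        System.out.println("getBytes total bytes: " + b.timeStringGetBytes(1000));
        System.out.println("encoder total bytes:  " + b.timeCharsetEncoder(1000));
    }
}
```

Both methods do the same work, so they should report the same byte total; only the cost of getting there differs, which is what the framework measures.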

Running the main gives you the following output:
Now, isn't that nice? You get this lovely little ASCII bar on the right, and the result units are sorted. Yummy! Here are some command line options to play with:
--trials <n> : runs several trials of your benchmark. Very important! You can't rely on a single measurement to draw conclusions.
--debug : runs the benchmark in the same process instead of spawning a new one, so that you can hit your breakpoints easily.
--warmupMillis <millis> : how long to warm up your code for.
--runMillis <millis> : how long a trial run should take.
--measureMemory : measures and compares allocations.
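To make the point about single measurements concrete, here's a minimal sketch of what several trials buy you: a mean and a spread rather than one number. The class and method names are hypothetical, and the timings are made-up values, not Caliper output.

```java
import java.util.Arrays;

// Hypothetical helper: summarises per-trial timings so run-to-run noise is visible.
class TrialStats {
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Sample standard deviation (n - 1 in the denominator).
    static double stdDev(double[] xs) {
        if (xs.length < 2) {
            return 0.0; // no spread can be estimated from a single trial
        }
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    public static void main(String[] args) {
        double[] nanosPerOp = {100.0, 110.0, 90.0}; // made-up values for illustration
        System.out.printf("mean=%.1f ns, stddev=%.1f ns%n", mean(nanosPerOp), stdDev(nanosPerOp));
    }
}
```

If the spread is of the same order as the difference between two candidates, the comparison is telling you nothing yet.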
Isn't that great? Sadly, the last one (measureMemory) is a bit annoying to get working because:
  1. The dependency jar does not work
  2. Just getting the right jar is not enough because...
  3. You need to set up a magical environment variable: ALLOCATION_JAR
  4. Don't rename allocation.jar: the name is in the manifest and is required for the Java agent to work.
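If you want to fail fast on points 3 and 4, something like the following pre-flight check could be run before the benchmark. It's a sketch only: the class and method names are hypothetical, and it checks just the two things mentioned above (the variable is set, and the file is still called allocation.jar), not whether the jar itself is the working one.

```java
import java.io.File;

// Hypothetical pre-flight check: ALLOCATION_JAR must be set and must point
// at an existing file that is still named allocation.jar.
class AllocationJarCheck {
    // Returns null when the path looks usable, otherwise a description of the problem.
    static String problemWith(String path) {
        if (path == null || path.isEmpty()) {
            return "ALLOCATION_JAR is not set";
        }
        File jar = new File(path);
        if (!jar.getName().equals("allocation.jar")) {
            return "jar must be named allocation.jar, got " + jar.getName();
        }
        if (!jar.isFile()) {
            return "no file found at " + path;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = problemWith(System.getenv("ALLOCATION_JAR"));
        System.out.println(problem == null ? "ALLOCATION_JAR looks fine" : problem);
    }
}
```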
Here's an Ant task which runs the UTF-8 benchmark with measureMemory:
And the output illustrates how using String.getBytes will cause a lot of extra allocations compared to the other methods:
That took a bit of poking around the internet to sort out, but now you don't have to. And now that it's so easy to write micro benchmarks, there's less of an excuse to not measure before implementing a particular optimization.
Finally, to their eternal credit, the writers of Caliper include a page on the project site which highlights some of the pitfalls and considerations around micro-benchmarks, so please "Beware the Jabberwock" :P


  1. Thanks for the stuff about the allocation.jar. That was annoying. I went to Maven Central for my artifact but the manifests are entirely different and it does not work.

    If you're using Maven you must download the allocation.jar manually and set up a system scope dependency.

  2. Would you be able to send me the pom.xml or post it as a gist to add to the completeness of the above HowTo?