POTD: ExceptionInInitializerError logger

It’s been a while I’ve done a project of the day, but here it is, the fruit of my yak-shaving today.

The problem I was trying to solve today was java.lang.NoClassDefFoundError: Could not initialize class Xyz. When a Java class fails to initialize, the first attempt to do that causes ExceptionInInitializerError, but subsequent attempts to use that class results in this rather unhelpful java.lang.NoClassDefFoundError: Could not initialize class Xyz without the chained exception.

This problem has been rpeorted to Java for years, but probably JavaSE people doesn’t understand how painful this is in a large modular system, where the initial exception can be reported in so many places — such as one of the 1000 builds you’ve done today, or in an HTTP response to somebody, stderr, logging, or getting swallowed by empty catch block.

So I wrote a little Java agent that uses java.util.logging to log every ExceptionInInitializerError at the point of instantiation. In this way, even on a server, you have one place you can go to check for all errors of this kind. Through j.u.l, you can write a custom Handler to report errors elsewhere, if you want to.

The number of people who will find this tool useful would be probably small, but I hope they’ll really appreciate this little gem. May Google let them find this page.

POTD: random but meaningful name generator

I’m working on the automated blackbox acceptance tests for Jenkins, where I often need to generate random unique names. The code has been using random number generator to generate such names, but as I was debugging test failures, it became painful to remember those random names.

For example, a test might create two new jenkins jobs “random_name_155230″ and “random_name_137204″. Now which one was supposed to be the upstream and which one is downstream? Aside from a few exceptions, humans are generally not good at remembering those random numbers.

So I thought it’d be a lot better if these names are more meaningful, like “constructive_carrot” or “flexible_designer”. That is, if I have a decent sized corpus of N English adjectives and M nouns, I can generate NxM unique names (and induce a few chuckles to whoever see the generated names.)

After a bit of googling, I came across wordnet, and I took a subset of its corpus to come up with a small library that generates human-friendly random names. It has about 600 adjectives and 2400 nouns, resulting in 1.5 million unique names before the generator wraps around.

You’d use this library like this:

RandomNameGenerator rnd = new RandomNameGenerator(0);

while (true)

The code is under the BSD license with no advertisement clause, and the library is on Maven Central. Hope you find it useful.

POTD: Application configuration via Guice binding + Groovy

Often I write my applications with Guice. I also often want to make those applications configurable externally. For example I might inject username and password for that app to talk to another app, I might configure some timeout value, and so on. I make these configuration values available in Guice, so that I can access them wherever I need them. All of this is pretty common in many other places, I’d imagine.

Given that all I’m doing here is to pass configuration values from left to right, I thought it’d be nice if I can write configuration directly as a Guice module by using Guice binder EDSL. Then I won’t have to parse and translate these configuration any more.

And that became my project of the day.

This little library allows you to write Guice binding definitions in a text file:

timeout = 3
bind Payment named "customer" to VisaPayment

From your program, you use GroovyWiringModule to load this configuration file:

Module config = new GroovyWiringModule(new File("/etc/myapp.conf"));
Injector i = Guice.createInjector(
  Modules.override( ... my application's modules ...)

The end result is that the above script gets translated into the following binding:


Using Groovy as the host language for DSL has other benefits. If you are using system properties or environment variables to configure something, you are basically stuck with strings as the only representation of the configuration. With Groovy, I can create a relatively complex object and bind them, or even put some logic to further obtain values from elsewhere:

bind Payment toInstance new VisaPayment(
  cardNumber: "1234-5678-9012-3456",
  expiration: new Date(System.currentTimeInMillis()+TimeUnit.DAYS.toMillis(30),
  cvv: new URL("http://secret.server/cvv").text) 

With the functionality in Guice to override definitions in one module by another, I can also even override bindings defined in programs, for example to get more logging, add a filter, etc.

POTD: cucumber annotation indexer

Cucumber for Java requires that you specify the packages in which your step definitions exist. At runtime, cucumber uses some hack to try to list all the classes in this package (it’s a hack because class loaders never really support the listing operation), loads them one by one, and finds those that have step definition annotations like @When and @Then. This is both poor user experience (can’t you just find my step definitions!?) and poor performance (loading all the classes under a package is expensive.)

So I wrote a library that offers a much better alternative. It uses annotation indexer to create an index of step definitions and hooks at compile time. Thanks to JSR-269, this happens automatically on Java 6 and later. With the index in /META-INF/annotations, runtime can load all the step definitions quite efficiently.

The library contains a Backend implementation, so you should be able to just add it to your project dependency, and cucumber should automatically find this (and thus all your step definitions and hooks.)

By the way, this horrible technique of scanning jar files, listing class files, and finding annotations from there is unfortunately commonly seen in many other libraries. This was a necessary evil in the days of Java 5, but it should really die in this day and age. If you realy on the classpath scanning, please switch to annotation indexer, which provides the backbone functionality of this POTD.

POTD: no more tears

In a modular Java program or in a large Java project that has lots of dependencies, you often end up a version of library that’s different from the version used to compile the code.

This often results in LinkageError, where a method/field that was present when the code was compiled do not exist any more in the version being loaded at the runtime. This restriction applies to seemingly trivial safe changes, such as changing the return type of a method to the subtype of what it used to be.

Previously, the only way to deal with this is not to remove any signatures that matter. In other words, you count on library/module developers to be more disciplined. Over the time, Java programmers have accepted this as a way of life, but there are some notorious offenders (Guava and ASM, I’m looking at you.) Besides, it makes it difficult to evolve code.

I have dealt with this in multiple different ways in the past.

The bridge method injector is an example of static compile-time transformation. This kind of tool is non-intrusive to the users of a module, which is good, but it’s still a tool for library developers to be diligent, and processing at compile time means it has only limited information to operate on.

The bytecode compatibility transformer is an example of runtime transformation. This has a lot more information to let it do the right transformation, but modifying class files on the fly requires a custom classloader, which limits its applicability.

On the way back from my recent trip, I realized there is the 3rd way to achieve the same effect — invokedynamic. You see, invokedynamic is really just a mechanism of deferring the linking to the runtime. This allows me to combine the benefit of two approaches. I can transform class files at the compile time without really deciding how the references are linked. Then at runtime, I can decide how they actually get linked but without a need of runtime transformation. The only downside is that it requires Java7.

But in any case, I thought the idea was clever, so I implemented it as my “project of the day”. Please let me know what you think.

JavaFX needs to be a new edition of Java

Lately, there has been a number of security vulnerabilities reported in Java. The latest one is reported just after a few days of JavaSE 7u11, which by itself a response to another vulnerability. It’s so bad to the point that people are being asked to uninstall Java (yes, just in the browser, but let’s face it, it’s way easier to just uninstall it entirely than to disable the Java plugin from all the browsers.)

I think this issue needs to be addressed at a deeper level pretty quickly, or else what’s still left of the client-side Java would be dead soon (if you think it’s already dead, then there’s nothing else to see in this post, so please move on.)

In fact, one could even argue that the entire Java platform is at risk — people (especially our users and those who are studying programming) do not understand that these issues are the sandbox breaches and therefore do not affect the server-side Java, nor embedded Java.

I think the main lesson should be that the sandbox model of Java is unmaintainable. It’s not just a few isolated bugs here and there, but it’s a structural problem.

For those of you who aren’t very familiar with Java, basically Java sandbox model is that:

  • At runtime the entirety of Java core libraries are present (the exact same code you run on the server-side, including such abilities as forking processes, making network connections to anywhere, and accessing files)
  • Code gets associated with the “code source” information to indicate if it’s trusted or not
  • Library code (like one that accesses files) checks if the caller is trusted or not, and if not certain operations aren’t allowed.

The check in step #3 is implemented in a Java code (called SecurityManager), and that’s where the problem is. All the recent vulnerabilities in Java basically involves runtime reflection to replace the effective SecurityManager instance. This forces the reflection library to be inside the protection wall built by the security layer, which in turns forces many libraries in JavaSE to be inside the wall (such as JMX, which is used for the recent exploit, as well as XML stuff, which I was personally involved in.)

At this point, the protection wall is so long and windy that it’s impossible to defend. And I think it’s really all because we decided to protect reflection.

IMHO, a much simpler way to let untrusted code run is just to enforce checks on the native side, and simply stop trusting anything that runs as Java byte code. After all, we really need to just check stuff that interacts with the outside world, and every one of those has to be done by calling into native code. There’s very few of these calls. In this way, neither JMX nor XML APIs can be exploited, because they simply become just another untrusted code (that just so happen to be available all the time.) I’m no Flash expert, but IIUC, this is how Flash security model works.

In other words, create a new edition of Java, by adding one more to the ME/SE/EE mix. It is a new runtime environment that just shares the same virtual machine and the subset of core libraries. I think this is already consistent with the way “JavaFX” is marketed. While technically speaking it’s just a fancy UI library, it’s really marketed as an RIA platform along ME/SE/EE.

With the sophisticated existing IDEs, static typing, people’s familiarity, and the formidable existing ecosystems, I think Java (VM and the byte code in particular, and to somewhat less extent the language) is still incredibly useful to the large-scale client side development. Especially so if a bit more can be added, such as a better interoperability with DOM and other HTML5 APIs. While JavaScript is getting better day by day, there are still many things Java does better.

Imagine being able to define a large application into modules, with interfaces to define boundaries, mature module systems, efficient delivery and caching mechanisms. Debugger support, multi-threading, runs outside the browser for unit testing, programming in Ruby or Groovy … you get the idea. Think of it as GWT except translation, if you may.

As an avid Java fan, here’s my plea to Oracle — I want this for my Christmas present this year.

COM4J updates

It’s been a while, but I’ve posted a new version of COM4J. COM4J is a library that lets you talk to Windows COM components. Unlike similar libraries lika jacob, which makes you feel like you are working with reflection, COM4J is designed to work with type-safe annotated interfaces, which makes you feel like you are working with Java libraries. COM4J is also built on top of vtable invocation, not on IDispatch, so it can work with components without the dual interface support (boy those words bring back memories!)

I use this library in Jenkins, among other places, to provide a better native integration.

The major change in this version is that it finally has 64bit Java support. The original work was contributed in 2011, but I’ve never cut a release out of it officially. It contains a number of bug fixes, additional conversions support. The code is now on GitHub, and the website is moved to here.

Debian and Maven, a crash of culture

Tim O’Brien posted his frustration about the state of Java packaging in Debian. While I’m not affiliated with Debian nor Ubuntu, I wanted to post something in defense.

I completely understand where Tim is coming from. To the eyes of Java developers, the Java packaging in Debian looks completely Sisyphean. We got all the binaries and their dependencies captured in a machine readable form (aka POM). Can’t we just take them as-is, do a bit of metadata conversion, and make all those artifacts available to the Debian world so that we can just have a single package manager on Debian? If that’s your line of reasoning, you are in for a surprise, because Debian wouldn’t like that.

The reason they don’t do it is well summarized in the Debian Social Contract. It’s the equivalent of the U.S. Constitution for the Debian project — everything they do derive from this. Binary jars are bad for Debian because they don’t give the users the freedom to modify them and create derivative works. Debian is not just a means to let you conveniently install all the programs you need. It’s a pursuit of certain kinds of freedom.

In that sense, it’s somewhat like the “Free Software” movement. They both have some pretty strong guiding principles, and at times, for outsiders they look like they are “wasting” their efforts or being impractical. But the thing is, it’s those guiding principles that attract so many people to the effort, and that’s what keeps the project going and produce all the incredible good stuff that we use everyday. Criticizing them for their principles while you enjoy the benefits of the very same principles feel bit single-handed to me.

I think a better way forward is to write a little program that takes the source jar (which most jars in the Maven central should already have) and the POM, then generate a build script that simply compiles the source jar into the binary jar. The said program should also inspect the jar file to figure out any resource files, and treat them as source files. That way, we can machine-generate Debian source packages. Granted, not all source packages produced that way would pass the requirements of the Debian Freesoftware Guideline, but I bet substantial number of Maven artifacts are simple enough that this will be actually completely satisfactory. And then humans can concentrate on harder ones.

Anyone interested in giving that a shot?

@Override and interface

Jim Leary, my colleague at CloudBees, got me into digging into this.

The question is around putting the @Override annoation on a method that implements an interface method, like this:

public class Foo implements Runnable {
    public void run() {}

As you can see in the javadoc, when @Override was originally introduced, such use was not allowed. javac 1.5 rejects this, too (I verified this in 1.5.0_22.)

Sun intended to change this in 1.6. Javac 1.6 indeed changed the behaviour to allow it (verified this in 1.6.0_26), but someone forgot to update the documentation, as you can see in the Java 6 API reference.

The interesting thing is, if you use Javac 1.6 with “-source 1.5″ and/or “-target 1.5″. In all the possible 3 combinations, the above code compiles. Is this a bug, or is this correct? The interesting thing is that the semantics of @Override is defined in the library, not in the Java language spec. So an argument can be made that this is as it should be — JLS, which governs the -source/-target switches, have nothing to do with this annotation. It’s akin to your code relying on newly introduced types in Java 6. If you compile them with Javac 1.6 with -source 1.5, it won’t raise an error.

But IDEs do seem to tie this with the language level. Jim said Eclipse, when set to language level 1.5, it will flag the above code as an error. I verified that IntelliJ does the same (but only in the editor, as the actual compilation happens via javac so the build will succeed.)

So the end result is ugly. If you open the project in your IDE, you see all these errors, but your build (nor test nor any actual execution, for that matter) will not catch this problem. Even if this was a bug in javac, I don’t see it getting “fixed” — the last thing you want is your security update relese to Java6 break all your builds.

I guess the right thing to do for projects (like Jenkins) is to try to avoid putting @Override on interfaces and as we discover them, remove them. So that people who open the source tree in IDE won’t see those false positive errors. This is a bummer because it’s actually useful to have @Override on interfaces (that’s why the behaviour was changed in 1.6 in the first place!) Does anyone know of a FindBugs rule or some refactoring tool to check this? Or should these be filed as bugs against IDEs? For enforcing something that’s not in JLS?

Quiz answer: memory leak in Java

I posted a little quiz yesterday, and here is the answer.

The short answer is that InputStream needs to be closed. It’s easy to see why if it’s FileInputStream because you know the file handle needs to be released. But in this case, it’s just ByteArrayInputStream. We can just let GC recycle all the memory, right?

Turns out GZIPInputStream (or more precisely Deflater that it uses internally) uses native zlib code to perform decompression, so it’s actually occupying more memory (about 32K-64K depending on the compression level, I believe) on the native side, while its Java heap footprint is small. So if you allocate enough of those, you can end up eating a lot of native memory, while Java heap is still mostly idle. Even though those GZipInputStreams are no longer referenced, it just doesn’t create enough heap pressure to cause the GC to run.

And eventually you eat up all the native memory, and zlib’s malloc fails, and you get OutOfMemoryError (or your system starts to swap like crazy and your system effectively becomes unusable first.)

The other interesting thing to note is that -XX:HeapDumpOnOutOfMemoryError doesn’t do anything in this case. I read the JVM source code and I learned that heap dump only happens when OOME is caused during 3 or 4 specific memory allocation operations, like allocating a Java object, array, GC saturation, and a few other things. There are many other code passes in JVM that reports OOME, like this zlib malloc failure, that doesn’t trigger heap dump. There’s no question HeapDumpOnOutOfMemoryError is useful, but just beware that in some cases it doesn’t get created.

I knew that GZipInputStream is using native code internally, but I didn’t think about it too much when I was putting this original code together. Humans can’t think about all the transitive object graph and its implications.

The other lesson is that now I know why ps sometimes report such a big memory footprint for JVM while jmap reports only a modest usage. The difference is native memory outside Java heap, although unfortunately I don’t think there’s any easy way to check what’s eating the native memory.

My colleague and friend Paul Sandoz pointed out that if GZipInputStream was nice enough to free them up at EOF, it would have saved a lot of hassle, and I think he’s right — one still needs to consider the case where IOException causes the processing to abort before hitting EOF, but it would have helped, because those abnormal cases would be rare. I mean, there’s no harm in doing so, and anything that makes the library more robust in the face of abuse is a good thing, especially when the failure mode is this cryptic.