Archive for the ‘Uncategorized’ Category

Groovy SecureASTCustomizer is harmful

April 27th, 2012

I was looking at Groovy DSL slides from Guillaume Laforge when I noticed SecureASTCustomizer, which led me to what appears to be the original introduction post from Cedric.

Being able to lock Groovy execution down would enable me to use Groovy in more places, so I did a bit of experimenting. But I regrettably have to conclude that this feature is practically unusable. In fact I’d argue that it is actively harmful, as it gives a programmer a false sense of security.

The fundamental problem is that Groovy is a dynamic language, yet SecureASTCustomizer works by looking at the Groovy AST statically. So it’s very easy for Maloney, a malicious attacker, to bypass many of the checks. For example, Cedric’s post talks about how it can let you blacklist/whitelist classes that can be imported. Well, the actual goal of the programmer is to prevent the class from being used, not merely from being imported. And sure enough, even if I whitelist java.lang.Math as the only importable class, Maloney can still do Math.class.forName('some.secret.class') to get a reference to a Class, which renders the import restrictions pointless.
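The same trick is easy to reproduce in plain Java. Here is a minimal sketch (the class name ImportBypassDemo is made up for illustration, and nothing in it is part of Groovy or SecureASTCustomizer): the source names only java.lang.Math, yet it ends up holding a Class object for a class that was never imported.

```java
// Hypothetical demo class; not part of Groovy or SecureASTCustomizer.
final class ImportBypassDemo {
    // Looks up an arbitrary class by name, starting from a "harmless" class literal.
    static Class<?> lookup(String className) {
        try {
            // forName is a static method of java.lang.Class, so it is reachable
            // through any Class reference, including Math.class.
            return Math.class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Only Math is named as a type here, but we still obtain java.lang.Runtime.
        System.out.println(lookup("java.lang.Runtime").getName());
    }
}
```

An import whitelist never sees the string "java.lang.Runtime" as a class reference, which is why a purely static check on imports cannot stop this.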

Then I thought about disabling access to the getClass() method. But this doesn’t work well either because Groovy allows 5."class" and 5["class"] to access properties. To statically prevent this, you’d have to prohibit array access and string literals, but that doesn’t leave much of a language!

Many other checks offered by SecureASTCustomizer are equally useless. For example, there’s receiversClassesWhiteList, which is supposed to let you restrict the methods the script can invoke by whitelisting the declaring class of the method. But once again, this is a static check! The Groovy compiler doesn’t work very hard to infer types, so much so that it can’t even guess that x=="foo" is of boolean type. Therefore, if you actually try using receiver whitelisting, pretty quickly you’ll discover that you have to allow Object as a receiver (because Groovy assigns this type to every expression whose type it couldn’t infer), which basically renders the whitelisting moot, as you can now invoke any method by simply casting the expression to Object.

If you go the other route and disallow Object as a receiver, that will reject almost all non-trivial scripts. Or I suppose you can prohibit method calls altogether, but that doesn’t leave much of a language, does it?

Like I said, I think this is fundamentally a futile approach. You just can’t perform any meaningful static sandboxing on a dynamic language.

Instead, what I think is more fruitful is dynamic checking. For example, what if the compile-time AST transformation intercepts every method call and property access? That is, transform z=x.y into z=checkedGet(x,"y"), transform x.y=5 into checkedSet(x,"y",5), and finally transform o.foo(a,b,c) into checkedCall(o,"foo",[a,b,c]). This does make execution a whole lot slower, but I can now perform meaningful checks. And unlike the Java SecurityManager, this is a lot more friendly to libraries and web applications, which cannot take over the entire JVM.
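To make the runtime half of this idea concrete, here is a rough sketch in Java. Everything in it is an assumption for illustration: the class name CheckedCallSketch, the whitelist-of-receiver-classes policy, and the reflective dispatch are mine, and the AST transformation that would rewrite o.foo(a,b,c) into a checkedCall is not shown.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a runtime call checker; not an existing Groovy API.
final class CheckedCallSketch {
    // Receiver classes the script may call methods on (an assumed policy).
    private final Set<Class<?>> allowedReceivers;

    CheckedCallSketch(Class<?>... allowed) {
        this.allowedReceivers = new HashSet<>(Arrays.asList(allowed));
    }

    Object checkedCall(Object receiver, String name, Object... args) {
        // The check uses the actual runtime class, not a statically inferred type.
        Class<?> type = receiver.getClass();
        if (!allowedReceivers.contains(type)) {
            throw new SecurityException("call to " + type.getName() + "." + name + " rejected");
        }
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == args.length) {
                try {
                    return m.invoke(receiver, args);
                } catch (IllegalArgumentException e) {
                    // Argument types did not fit this overload; try the next one.
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("no matching method: " + name);
    }
}
```

With only String whitelisted, checkedCall("hello", "getClass") still succeeds, but the Class object it returns cannot be used as a receiver for a further call, because java.lang.Class is not on the list. That is exactly the kind of check the static approach cannot express.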

I haven’t actually put together such an AST transformer, but this doesn’t look too hard.

What do people think?

Uncategorized

COM4J updates

April 27th, 2012

It’s been a while, but I’ve posted a new version of COM4J. COM4J is a library that lets you talk to Windows COM components. Unlike similar libraries like JACOB, which make you feel like you are working with reflection, COM4J is designed to work with type-safe annotated interfaces, which makes you feel like you are working with Java libraries. COM4J is also built on top of vtable invocation, not on IDispatch, so it can work with components without the dual interface support (boy those words bring back memories!)

I use this library in Jenkins, among other places, to provide a better native integration.

The major change in this version is that it finally has 64-bit Java support. The original work was contributed in 2011, but I never officially cut a release out of it. This version also contains a number of bug fixes and additional conversion support. The code is now on GitHub, and the website has moved to here.

Uncategorized

Debian and Maven, a clash of cultures

March 16th, 2012

Tim O’Brien posted his frustration about the state of Java packaging in Debian. While I’m not affiliated with either Debian or Ubuntu, I wanted to post something in their defense.

I completely understand where Tim is coming from. In the eyes of Java developers, the Java packaging in Debian looks completely Sisyphean. We have all the binaries and their dependencies captured in a machine-readable form (aka POM). Can’t we just take them as-is, do a bit of metadata conversion, and make all those artifacts available to the Debian world so that we can just have a single package manager on Debian? If that’s your line of reasoning, you are in for a surprise, because Debian wouldn’t like that.

The reason they don’t do it is well summarized in the Debian Social Contract. It’s the equivalent of the U.S. Constitution for the Debian project: everything they do derives from this. Binary jars are bad for Debian because they don’t give the users the freedom to modify them and create derivative works. Debian is not just a means to let you conveniently install all the programs you need. It’s a pursuit of certain kinds of freedom.

In that sense, it’s somewhat like the “Free Software” movement. They both have some pretty strong guiding principles, and at times, to outsiders they look like they are “wasting” their efforts or being impractical. But the thing is, it’s those guiding principles that attract so many people to the effort, and that’s what keeps the project going and produces all the incredible good stuff that we use every day. Criticizing them for their principles while you enjoy the benefits of the very same principles feels a bit one-sided to me.

I think a better way forward is to write a little program that takes the source jar (which most jars in Maven Central should already have) and the POM, then generates a build script that simply compiles the source jar into the binary jar. The said program should also inspect the jar file to figure out any resource files, and treat them as source files. That way, we can machine-generate Debian source packages. Granted, not all source packages produced that way would pass the requirements of the Debian Free Software Guidelines, but I bet a substantial number of Maven artifacts are simple enough that this will actually be completely satisfactory. And then humans can concentrate on the harder ones.
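As a very rough sketch of what the core of such a program might look like, assuming the entry names have already been read from the source jar (for example via java.util.jar.JarFile.entries()): the class name SourceJarPlanner, the generated shell commands, and the choice to skip META-INF are all my own illustrative assumptions, and dependencies from the POM are not handled at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and the generated commands are illustrative only.
final class SourceJarPlanner {
    // Splits source-jar entry names into Java sources and resources, then
    // emits a naive shell script that compiles and packages them.
    static String buildScript(List<String> entryNames, String artifactId) {
        List<String> sources = new ArrayList<>();
        List<String> resources = new ArrayList<>();
        for (String name : entryNames) {
            if (name.endsWith("/") || name.startsWith("META-INF/")) {
                continue; // skip directory entries and jar metadata
            }
            if (name.endsWith(".java")) {
                sources.add(name);
            } else {
                resources.add(name); // anything else is treated as a resource
            }
        }
        StringBuilder sh = new StringBuilder("#!/bin/sh\nset -e\nmkdir -p classes\n");
        if (!sources.isEmpty()) {
            sh.append("javac -d classes ").append(String.join(" ", sources)).append('\n');
        }
        for (String r : resources) {
            // cp --parents (GNU coreutils) keeps the package directory layout.
            sh.append("cp --parents ").append(r).append(" classes\n");
        }
        sh.append("jar cf ").append(artifactId).append(".jar -C classes .\n");
        return sh.toString();
    }
}
```

A real tool would also need to turn the POM's dependencies into a classpath and Debian build dependencies, which is where most of the actual work would be.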

Anyone interested in giving that a shot?

Uncategorized

POTD: Package renamed ASM

March 3rd, 2012

Today’s project of the day is a package-renamed ASM library.

I previously wrote about a problem in the otherwise quite useful ObjectWeb ASM library. Namely, it breaks backward compatibility in such a way that badly breaks apps/libraries that use it. In that post, I wrote about two proposals to fix the pain point. One is to include debug information, which has been fixed starting with 3.x. But the other, package renaming, hasn’t been addressed in the last 2 years.

This has been in the back of my head, but it never became a high enough priority until recently, when I had another NoSuchMethodError caused by ASM3. One of the servlet containers shipped ASM3 and it broke Jenkins, which bundles ASM2. Between this and the ASM4 release for Java SE 7, which will likely gain popularity over time, I finally decided to fix this problem once and for all, in a way that everyone else can reuse.

The solution, as explained in the original post, is to put each major ASM version in its own unique package name. I package-renamed ASM2 to org.kohsuke.asm2, ASM3 to org.kohsuke.asm3, and ASM4 to org.kohsuke.asm4. The package name only contains the major version because I trust the ASM developers to maintain compatibility between minor releases (and I believe they’ve maintained this thus far.)

These artifacts are available as org.kohsuke:asm2:2.2.3, org.kohsuke:asm3:3.3.0, and org.kohsuke:asm4:4.0. They are package-renamed by jarjar, and I tested them somewhat to make sure they’re not downright broken.

If library A depends on asm2 and library B depends on asm3, and someone else uses both A and B, everything will work fine because asm2 and asm3 are in different packages. If A depends on one version of asm3 and B depends on another version of asm3, then the transitive dependency resolution will pick up the newer version and both will work (or you end up implicitly picking up one version over another, and you don’t enjoy the latest bug fixes, but at least it won’t die with LinkageError.)

When you search “asm3” in Maven Central, you see a large number of renamed ASM3 copies in various projects. Hopefully that madness will stop now!

The other interesting thing about this effort is that I’ve used Gradle to package-rename them. Lately I’ve been using Gradle for publishing transformed artifacts like these to a Maven repository, and I like it a lot. But more about that in another post.

potd, Uncategorized

@Override and interface

January 27th, 2012

Jim Leary, my colleague at CloudBees, got me into digging into this.

The question is about putting the @Override annotation on a method that implements an interface method, like this:

public class Foo implements Runnable {
    @Override
    public void run() {}
}

As you can see in the javadoc, when @Override was originally introduced, such use was not allowed. javac 1.5 rejects this, too (I verified this in 1.5.0_22.)

Sun intended to change this in 1.6. Javac 1.6 indeed changed the behaviour to allow it (verified this in 1.6.0_26), but someone forgot to update the documentation, as you can see in the Java 6 API reference.

The interesting thing is what happens if you use javac 1.6 with “-source 1.5” and/or “-target 1.5”. In all 3 possible combinations, the above code compiles. Is this a bug, or is this correct? The semantics of @Override are defined in the library, not in the Java language spec. So an argument can be made that this is as it should be: the JLS, which governs the -source/-target switches, has nothing to do with this annotation. It’s akin to your code relying on types newly introduced in Java 6. If you compile it with javac 1.6 with -source 1.5, it won’t raise an error.

But IDEs do seem to tie this to the language level. Jim said that Eclipse, when set to language level 1.5, will flag the above code as an error. I verified that IntelliJ does the same (but only in the editor, as the actual compilation happens via javac so the build will succeed.)

So the end result is ugly. If you open the project in your IDE, you see all these errors, but your build (or tests, or any actual execution, for that matter) will not catch this problem. Even if this were a bug in javac, I don’t see it getting “fixed”: the last thing you want is for a security update release of Java 6 to break all your builds.

I guess the right thing to do for projects (like Jenkins) is to try to avoid putting @Override on methods that implement interface methods and, as we discover them, remove them, so that people who open the source tree in an IDE won’t see those false positive errors. This is a bummer because it’s actually useful to have @Override on such methods (that’s why the behaviour was changed in 1.6 in the first place!) Does anyone know of a FindBugs rule or some refactoring tool to check this? Or should these be filed as bugs against IDEs, for enforcing something that’s not in the JLS?

Uncategorized

DNS outage with jenkins-ci.org

December 28th, 2011

As Tyler summarized in this e-mail thread, there’s currently a DNS outage going on with jenkins-ci.org that makes all name resolutions fail.

The current ETA is right around the new year, but in the meantime, you can add our temporary DNS server to your /etc/resolv.conf via “nameserver 140.211.15.121”.

Once again our apologies for this outage.

Uncategorized

Quiz answer: memory leak in Java

November 4th, 2011

I posted a little quiz yesterday, and here is the answer.

The short answer is that InputStream needs to be closed. It’s easy to see why if it’s FileInputStream because you know the file handle needs to be released. But in this case, it’s just ByteArrayInputStream. We can just let GC recycle all the memory, right?

Turns out GZIPInputStream (or more precisely the Inflater that it uses internally) uses native zlib code to perform decompression, so it’s actually occupying more memory (about 32K-64K depending on the compression level, I believe) on the native side, while its Java heap footprint is small. So if you allocate enough of those, you can end up eating a lot of native memory, while the Java heap is still mostly idle. Even though those GZIPInputStream instances are no longer referenced, they just don’t create enough heap pressure to cause the GC to run.

And eventually you eat up all the native memory, and zlib’s malloc fails, and you get OutOfMemoryError (or your system starts to swap like crazy and your system effectively becomes unusable first.)
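For illustration, here is one way the read could be written so that the stream is always closed. This is a sketch, assuming Java 7 or later for try-with-resources; the class name GzipObjectCodec and the write helper are mine, added only so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
final class GzipObjectCodec {
    // Serializes and gzips an object; closing the streams flushes the gzip trailer.
    static byte[] write(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes);
             ObjectOutputStream out = new ObjectOutputStream(gz)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Reads the object back. Declaring the GZIPInputStream as its own resource
    // means it gets closed (releasing its native zlib state) even if the
    // ObjectInputStream constructor or readObject() throws.
    static Object read(byte[] buf) throws IOException, ClassNotFoundException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(buf));
             ObjectInputStream in = new ObjectInputStream(gz)) {
            return in.readObject();
        }
    }
}
```

On older Java versions the same effect can be had with an explicit close() in a finally block.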

The other interesting thing to note is that -XX:+HeapDumpOnOutOfMemoryError doesn’t do anything in this case. I read the JVM source code and I learned that a heap dump only happens when the OOME is caused during 3 or 4 specific memory allocation operations, like allocating a Java object, array, GC saturation, and a few other things. There are many other code paths in the JVM that report OOME, like this zlib malloc failure, that don’t trigger a heap dump. There’s no question HeapDumpOnOutOfMemoryError is useful, but just beware that in some cases the dump doesn’t get created.

I knew that GZIPInputStream uses native code internally, but I didn’t think about it too much when I was putting this original code together. Humans can’t think about the entire transitive object graph and its implications.

The other lesson is that now I know why ps sometimes reports such a big memory footprint for the JVM while jmap reports only a modest usage. The difference is native memory outside the Java heap, although unfortunately I don’t think there’s any easy way to check what’s eating the native memory.

My colleague and friend Paul Sandoz pointed out that if GZIPInputStream were nice enough to free its native resources at EOF, it would have saved a lot of hassle, and I think he’s right. One still needs to consider the case where an IOException causes the processing to abort before hitting EOF, but it would have helped, because those abnormal cases would be rare. I mean, there’s no harm in doing so, and anything that makes the library more robust in the face of abuse is a good thing, especially when the failure mode is this cryptic.

Uncategorized

Quiz time: memory leak in Java

November 3rd, 2011

Today I had an interesting debugging exercise, and I felt like I learned a new lesson that’s worth sharing with the rest of the world.

I had the following code, which takes a small-ish byte array and deserializes it into an object (let’s say someNotTooBigData is something like new byte[]{1,5,4, ... some data... }.) Seems innocent enough, no?

Object foo() throws IOException, ClassNotFoundException {
	byte[] buf = someNotTooBigData();
	// decompress and deserialize in one go
	return new ObjectInputStream(new GZIPInputStream(
	    new ByteArrayInputStream(buf))).readObject();
}

But when this is executed frequently enough, like while(true) { foo(); }, it causes an OutOfMemoryError. Can you tell why? I’ll post the answer tomorrow.

Uncategorized

Ken Cavanaugh has passed away

August 25th, 2011

I’ve just learned that Ken Cavanaugh has passed away. He was my colleague back at Sun, and we had worked on a few small projects together.

When I joined Sun, he was already THE CORBA guy AFAIK, and when I left Sun, AFAIK he was still THE CORBA guy. And I was at Sun for, like, 8 years. Not many people have a passion that lasts that long for any given field of technology, and that left a rather strong impression on me. I always hoped that I could be like that when I get to his age.

Certain people emit an aura of confidence/reassurance. You can tell right away that they know what they’re doing/saying when they do/say something. Ken was one of those people for me. He thus obviously commanded the respect he deserved, and you can see in the guest posts that his colleagues left on his website that it’s not just me. I can really only use English well enough for dry technical matters, so I can’t really describe the feeling very well. I’m just very sorry to hear the news.

Uncategorized

My epic battle with findbugs-maven-plugin and how I utterly lost

August 23rd, 2011

It started quite innocently. I was looking at this thread in the Jenkins dev list and thought it’d be a good idea to get some critical FindBugs errors to fail a build. My goal was simple: I wanted to run some high-priority FindBugs checkers during the build, and if they reported any error, I wanted the build to fail. I wanted this to be in a profile, so that I don’t need to wait for FindBugs to finish if I just want to build.

Should be simple enough, you’d think. Nope.

I’ve spent the entire afternoon getting this going. There were several issues in the plugin and relevant places that blocked my progress. In the hope that other people won’t suffer the same loss, here are those:

  • Maven 2.x and Maven 3.x site plugins are totally incompatible, breaking the setup that used to work. AFAIK there’s still no reasonable approach to enable your project to build with both Maven 2 and Maven 3 when it comes to site stuff, and that pretty much includes all the code analysis plugins. Maven 3 breaks backward compatibility with the site configuration of Maven 2, so the POM setup that used to work will no longer work with Maven 3. (So if someone tells you that Maven 3 is compatible with Maven 2, don’t let them fool you.) It silently ignores all you have in the reporting section and does nothing. So you’ll have to move the reporting configuration into a new location. This used to make it impossible to share the site configuration between Maven 2 and Maven 3, but I was told that the latest Maven site plugin version 3.0 no longer has this problem.

  • Findbugs plugin documentation seems to offer two mojos to generate reports — findbugs:check and findbugs:findbugs. But the check mojo actually isn’t capable of generating any reports. Two dozen or so configuration options to tweak the report generation you see in the doc are totally bogus. They are unused and ignored. (correction 8/24: what I missed is that the check plugin designates findbugs:findbugs as a pre-requisite.)

  • Some people tell you that you can invoke mvn findbugs:findbugs directly to generate a report, but this is rather problematic if you actually try it. Firstly, it will generate XML but not HTML, so it’s useless for human beings. It does tell you how many bugs it found, but it doesn’t tell you anything that actually points you to where the offending code is. One is supposed to be able to work around that by running the findbugs:gui mojo, but AFAICT this mojo is utterly broken. Secondly, if you invoke the findbugs:findbugs mojo directly, it doesn’t pick up the same configuration that it uses during the site generation (one picks up build/plugins/plugin, the other looks at reporting/plugins/plugin). Again, AFAIK there’s no way to have those two modes of invocation use the same configuration.

  • You need to make sure that Maven has at least compiled your source code before running site. The FindBugs mojo will happily skip itself if there are no class files to work on, and unless you are smart enough to figure out what the mysterious “canGenerate=false” line means, you’ll waste your time trying to figure out why the mojo isn’t working, like I did.

  • Remember my use case of making the build fail in case of serious FindBugs issues? The documentation might make you believe that the findbugs:check mojo is able to do this, but there are two large pitfalls. One is what I’ve already described, namely that this doesn’t actually run FindBugs, and instead it expects that you’ve already run it. The other is that if it doesn’t find any trace of FindBugs running, it happily skips itself. The consequence is that mvn clean install will always complete successfully, even if your code has FindBugs violations. I still haven’t figured out how to make this whole thing work. As I mentioned, FindBugs report generation itself requires that the source code be compiled, so I guess you’d have to invoke Maven like mvn clean compile site install or something. This is just ridiculous.

  • In FindBugs, you can specify what rules you want to enforce and what rules you want to ignore. You describe this in a filter file. In a multi-module project, it tends to be more convenient to have just one filter file that all your modules use, rather than having many similar filter definitions. But this seemingly typical use case just doesn’t work with the Maven FindBugs plugin, because the path you specify in the filter file configuration is always interpreted relative to the current Maven module, and there seems to be no way to have it point to the base directory of the project (the ${project.basedir} macro also expands to the current module’s base directory, which is useless.) The documentation does talk about this and gives you a workaround. As a Maven plugin developer myself, I understand where they are coming from, but as a Git user, I find the assumption that requires such a cumbersome workaround (of being able to check out and build modules individually, like you can in Subversion) unnecessary, yet I still have to pay the whole price. This doesn’t make sense.

I’m sorry to say this, but this is a disaster. Integrating FindBugs into an Ant project, generating an HTML report, and failing a build in case of a significant error is fairly straightforward, and takes maybe 10 or 20 lines at most. But here in Maven, it takes more lines in your POM, not to mention one whole Maven module just for the filter file, plus all these pitfalls. And it still doesn’t attain my original goal of making critical FindBugs issues fail the build.

Experiences like this made me really want to switch to Gradle, but alas, it’s no longer my call alone to make changes like that. So for the time being, I think I’m going back to my good trusted Maven antrun extended plugin. At least it works. And Stephen, this is why an Ant fragment is actually more maintainable than the magical combination of Maven hacks.

Uncategorized