POTD: random but meaningful name generator

April 6th, 2014

I’m working on the automated blackbox acceptance tests for Jenkins, where I often need to generate unique random names. The code has been using a random number generator to produce such names, but as I was debugging test failures, it became painful to keep track of those random names.

For example, a test might create two new Jenkins jobs, “random_name_155230” and “random_name_137204”. Now, which one was supposed to be the upstream and which one the downstream? Aside from a few exceptions, humans are generally not good at remembering random numbers like these.

So I thought it’d be a lot better if these names were more meaningful, like “constructive_carrot” or “flexible_designer”. That is, with a decent-sized corpus of N English adjectives and M nouns, I can generate N×M unique names (and induce a few chuckles in whoever sees the generated names.)

After a bit of googling, I came across WordNet, and I took a subset of its corpus to come up with a small library that generates human-friendly random names. It has about 600 adjectives and 2400 nouns, resulting in roughly 1.5 million unique names before the generator wraps around.

You’d use this library like this:

RandomNameGenerator rnd = new RandomNameGenerator(0);

while (true)
    System.out.println(rnd.next());
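
Under the hood, the idea is simply to walk the N×M combination space in an order that visits every adjective-noun pair exactly once before repeating. Here is a minimal sketch of that idea (my own illustration, not the library’s actual code; the class name, word lists, and stepping scheme are all assumptions):

import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

public class MeaningfulNameGenerator {
    private final List<String> adjectives;
    private final List<String> nouns;
    private final int step;   // co-prime to N*M, so the walk covers every pair
    private int pos;

    public MeaningfulNameGenerator(List<String> adjectives, List<String> nouns, int seed) {
        this.adjectives = adjectives;
        this.nouns = nouns;
        int n = adjectives.size() * nouns.size();
        this.pos = seed % n;
        this.step = coprimeStep(n);
    }

    public String next() {
        int n = adjectives.size() * nouns.size();
        pos = (pos + step) % n;   // wraps around after n calls
        return adjectives.get(pos / nouns.size()) + "_" + nouns.get(pos % nouns.size());
    }

    // any step co-prime to n visits every index exactly once per cycle
    private static int coprimeStep(int n) {
        int s = n / 2 + 1;
        while (BigInteger.valueOf(s).gcd(BigInteger.valueOf(n)).intValue() != 1)
            s++;
        return s;
    }

    public static void main(String[] args) {
        MeaningfulNameGenerator rnd = new MeaningfulNameGenerator(
                Arrays.asList("constructive", "flexible", "happy"),
                Arrays.asList("carrot", "designer"), 0);
        for (int i = 0; i < 6; i++)
            System.out.println(rnd.next());   // six unique names, then it repeats
    }
}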

The code is under the BSD license with no advertisement clause, and the library is on Maven Central. Hope you find it useful.


POTD: Application configuration via Guice binding + Groovy

March 1st, 2014

I often write my applications with Guice, and I often want to make those applications configurable externally. For example, I might inject the username and password the app uses to talk to another app, configure some timeout value, and so on. I make these configuration values available in Guice, so that I can access them wherever I need them. All of this is pretty common in many other applications, I’d imagine.

Given that all I’m doing here is passing configuration values from left to right, I thought it’d be nice if I could write the configuration directly as a Guice module, using the Guice binder EDSL. Then I wouldn’t have to parse and translate the configuration myself any more.

And that became my project of the day.

This little library allows you to write Guice binding definitions in a text file:

timeout = 3
bind Payment named "customer" to VisaPayment

From your program, you use GroovyWiringModule to load this configuration file:

Module config = new GroovyWiringModule(new File("/etc/myapp.conf"));
Injector i = Guice.createInjector(
  Modules.override( ... my application's modules ... )
    .with(config));

The end result is that the above script gets translated into the following binding:

bind(int.class).annotatedWith(Names.named("timeout")).toInstance(3);
bind(Payment.class).annotatedWith(Names.named("customer")).to(VisaPayment.class);

Using Groovy as the host language for the DSL has other benefits. If you are using system properties or environment variables to configure something, you are basically stuck with strings as the only representation of the configuration. With Groovy, I can create a relatively complex object and bind it, or even add some logic to obtain values from elsewhere:

bind Payment toInstance new VisaPayment(
  cardNumber: "1234-5678-9012-3456",
  expiration: new Date(System.currentTimeMillis() + TimeUnit.DAYS.toMillis(30)),
  cvv: new URL("http://secret.server/cvv").text)

With Guice’s ability to override definitions in one module with another, I can even override bindings defined in the program itself, for example to get more logging, add a filter, and so on; a sketch of that follows.
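
For instance, here is a minimal sketch of that override flow. The Payment classes and the Demo wrapper are hypothetical, and the import for GroovyWiringModule is omitted; only Modules.override and the GroovyWiringModule constructor come from the snippets above:

import com.google.inject.*;
import com.google.inject.util.Modules;
import java.io.File;

public class Demo {
    interface Payment {}
    static class PaypalPayment implements Payment {}
    static class VisaPayment implements Payment {}

    public static void main(String[] args) {
        // The program's own default wiring:
        Module app = new AbstractModule() {
            @Override protected void configure() {
                bind(Payment.class).to(PaypalPayment.class);
            }
        };

        // Suppose /etc/myapp.conf contains just:  bind Payment to VisaPayment
        Module config = new GroovyWiringModule(new File("/etc/myapp.conf"));

        // Wherever the two disagree, the configuration file wins:
        Injector i = Guice.createInjector(Modules.override(app).with(config));
        System.out.println(i.getInstance(Payment.class).getClass().getSimpleName());
        // prints "VisaPayment"
    }
}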


POTD: cucumber annotation indexer

February 27th, 2014

Cucumber for Java requires that you specify the packages in which your step definitions exist. At runtime, Cucumber uses a hack to try to list all the classes in those packages (it’s a hack because class loaders never really support the listing operation), loads them one by one, and finds those that have step definition annotations like @When and @Then. This is both a poor user experience (can’t you just find my step definitions!?) and poor performance (loading all the classes under a package is expensive.)

So I wrote a library that offers a much better alternative. It uses annotation indexer to create an index of step definitions and hooks at compile time. Thanks to JSR-269, this happens automatically on Java 6 and later. With the index in /META-INF/annotations, the runtime can load all the step definitions quite efficiently.
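
To give a feel for the mechanism, here is a minimal JSR-269 processor of my own. It is an illustration of the approach, not the library’s actual code, and the annotation name and index file format are assumptions:

import java.io.IOException;
import java.io.Writer;
import java.util.Set;
import java.util.TreeSet;
import javax.annotation.processing.*;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import javax.tools.FileObject;
import javax.tools.StandardLocation;

@SupportedAnnotationTypes("cucumber.api.java.en.When")
@SupportedSourceVersion(SourceVersion.RELEASE_6)
public class StepIndexer extends AbstractProcessor {
    private final Set<String> classes = new TreeSet<String>();

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment round) {
        // Collect the classes that declare annotated step definition methods.
        for (TypeElement ann : annotations)
            for (Element e : round.getElementsAnnotatedWith(ann))
                classes.add(e.getEnclosingElement().toString());

        if (round.processingOver()) {   // last round: write out the index
            try {
                FileObject f = processingEnv.getFiler().createResource(
                        StandardLocation.CLASS_OUTPUT, "",
                        "META-INF/annotations/cucumber.api.java.en.When");
                Writer w = f.openWriter();
                for (String c : classes) w.write(c + "\n");
                w.close();
            } catch (IOException x) {
                processingEnv.getMessager().printMessage(Diagnostic.Kind.ERROR, x.toString());
            }
        }
        return false;   // let other processors see the annotations too
    }
}

A processor like this gets registered in META-INF/services/javax.annotation.processing.Processor, so javac picks it up automatically. At runtime, finding step definitions then becomes a plain resource lookup via ClassLoader.getResources() instead of a classpath scan.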

The library contains a Backend implementation, so you should be able to just add it to your project’s dependencies, and Cucumber should automatically find it (and thus all your step definitions and hooks.)

By the way, this horrible technique of scanning jar files, listing class files, and finding annotations in them is unfortunately common in many other libraries. It was a necessary evil in the days of Java 5, but it should really die in this day and age. If you rely on classpath scanning, please switch to annotation indexer, which provides the backbone functionality of this POTD.


POTD: no more tears

December 14th, 2013

In a modular Java program or in a large Java project with lots of dependencies, you often end up with a version of a library at runtime that’s different from the version the code was compiled against.

This often results in a LinkageError, where a method or field that was present when the code was compiled no longer exists in the version loaded at runtime. This applies even to seemingly trivial, safe changes, such as changing the return type of a method to a subtype of what it used to be.

Previously, the only way to deal with this was to never remove any signatures that matter; in other words, you count on library/module developers to be disciplined. Over time, Java programmers have accepted this as a way of life, but there are some notorious offenders (Guava and ASM, I’m looking at you.) Besides, it makes it difficult to evolve code.

I have dealt with this in multiple different ways in the past.

The bridge method injector is an example of static compile-time transformation. This kind of tool is non-intrusive to the users of a module, which is good, but it still requires library developers to be diligent, and processing at compile time means it has only limited information to operate on.

The bytecode compatibility transformer is an example of runtime transformation. It has a lot more information available to do the right transformation, but modifying class files on the fly requires a custom classloader, which limits its applicability.

On the way back from my recent trip, I realized there is a third way to achieve the same effect — invokedynamic. You see, invokedynamic is really just a mechanism for deferring linking to runtime. This lets me combine the benefits of the two approaches: I can transform class files at compile time without actually deciding how the references are linked, and then at runtime decide how they get linked, without any need for runtime transformation. The only downside is that it requires Java 7.
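
To illustrate, here is a rough sketch of the runtime half (my own, not the actual POTD code; the extra ownerName bootstrap argument and the fallback logic are assumptions). The compile-time half would rewrite each cross-module call into an invokedynamic instruction pointing at a bootstrap method like this:

import java.lang.invoke.*;

public class LazyLinker {
    // Invoked by the JVM the first time each rewritten call site runs.
    public static CallSite bootstrap(MethodHandles.Lookup lookup, String name,
                                     MethodType type, String ownerName) throws Throwable {
        Class<?> owner = Class.forName(ownerName, false, lookup.lookupClass().getClassLoader());
        MethodHandle target;
        try {
            // First try the exact signature recorded at compile time.
            target = lookup.findVirtual(owner, name, type.dropParameterTypes(0, 1));
        } catch (NoSuchMethodException e) {
            // The shape changed between compile time and runtime (say, the
            // return type was narrowed to a subtype): look the method up by
            // name and parameter types only, then adapt. Real matching logic
            // would have to be fuzzier than this.
            java.lang.reflect.Method m = owner.getMethod(name,
                    type.dropParameterTypes(0, 1).parameterArray());
            target = lookup.unreflect(m);
        }
        // Link the call site permanently; subsequent calls pay no extra cost.
        return new ConstantCallSite(target.asType(type));
    }
}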

But in any case, I thought the idea was clever, so I implemented it as my “project of the day”. Please let me know what you think.


Why I hate Zendesk

August 30th, 2013

In the Jenkins project, I use JIRA; for CloudBees customers, we use Zendesk. And I positively and passionately hate Zendesk. It’s quite painful to use, and all the more disappointing because many of the things that make me suffer could be fixed so easily. So I decided to write this to make myself feel better.

The way we use Zendesk is really not that different from the way the Jenkins JIRA is used, which probably applies to just about any software project. People report strange behaviours in software, stack traces, log files, etc. We look at those, sometimes ask for more information, ask them to try a new binary, get new data, and repeat this process until it runs its course. I don’t think we are that unique.

This kind of support work is a lot like detective work. You look at various log files and stack traces, you connect the dots, and you derive a hypothesis. Sometimes additional facts are discovered later, or a hypothesis is proven wrong, and you start digging in another direction.

When I’m doing this, it’s very important that I “cite” my sources correctly. I’d like to be able to say “the stack trace in comment #7 indicates that X is doing Y when Z. This is consistent with ticket #150 comment #9.” But I can’t do this in Zendesk, because comments have no sequence number or ID. So the best I can say is “the comment made by John at 11:59 on Apr 13th 2013 in ticket #150”, and if somebody (like me, 6 months from now) wants to see the referenced comment, he has to find it manually. This is like writing a research paper without a bibliography section, or how a sloppy program manager refers to tickets (“what happened to that ticket — you know, the stability issue from Acme Corp a few weeks ago?”). This is particularly maddening because they obviously have a unique ID in their database; I can even see it in the HTML source. Making comments referenceable would take only minor additional work. And yes, JIRA has had permalinks for comments for a long time.

There are a lot of useful features to be built in this direction — for example, I’d love to see log files viewable online, in such a way that each line is separately addressable as a URL, and to let agents leave comments on them, much like how you review code. That way, I could leave an audit trail of my detective work so that others can retrace it later.

Another thing that bugs me is the inability to edit comments that have already been posted. Zendesk supports Markdown, which is good; it helps you organize information a lot better than plain text. But some customers don’t know that, and file tickets in plain text. Similarly, when a ticket comes in through the e-mail gateway, line wrapping and the like often mess up the formatting.

In JIRA I often edit the ticket description to fix this up — convert a section of the text to a fixed-width font, remove line wraps, delete the pointless e-mail footers, and so on. Have you ever had someone post 1500 lines of log records as a comment? Move that off to an attachment, and edit it out! In other situations, I want to go back to an earlier comment and strike something out, for example when a hypothesis of mine turns out to be wrong. But I can’t do that in Zendesk either. (And Zendesk doesn’t let you collapse a comment, either.)

I can go on forever, but I should be working, not whining, so I’ll finish up with just one more problem: the inability to edit a closed ticket.

Since CloudBees uses Zendesk to support paying customers, developers are expected to bring tickets to completion, to the “closed” state. I’m pretty sure we aren’t unique in that. The problem is that once a ticket is closed, one cannot make any further updates.

Say a seemingly related ticket is filed later by somebody else? Can’t add a comment linking to it. Your detective work led you to an old “unresolved” case and you came up with a new hypothesis that might explain the behaviour? Can’t put it there. Adding a new tag? Nope.

My colleague Ben Walding made the observation that Zendesk probably has a different kind of audience in mind — one where relatively unsophisticated “level 1” support people sitting in an office somewhere in Asia go through hundreds of tickets every hour as fast as possible. In other words, Comcast or AT&T kind of support. The prominence given to features like templated responses backs up this assumption. I can only assume Zendesk works well for those use cases.

But if you are considering Zendesk and your use case is more like ours, sophisticated software support, then I hope you learn from our mistake and stay away from Zendesk.


LEGO Earth project

August 13th, 2013

I play LEGO a lot with my daughter, and I’ve been obsessed with making spheres. My last sphere-building attempt was a mini LEGO Earth in 2009. While I enjoyed the project, it was also clear to me that there was room for improvement.

The main issue is that the “pixel resolution” of LEGO is so low that many of the distinctive coastline shapes are just not visible. I also later discovered an off-by-one bug in the program I used to compute the color assignment of tiles, which skewed the projection around the seams.

So around March last year, I decided to build a bigger version, at a whopping 3× the scale of the earlier project. That would surely give me enough resolution!

The construction starts with building a 36×36×36 stud cube from LEGO Technic beams. The studs face all six directions.

The lumps at the corners are for changing the stud direction as well as for supporting the structure.

The idea is to basically build six peels and put them on the surface of this cube, completely covering the cube:

Each peel is big enough that it needs its own support structure. When attached to the cube, the center of each peel is not connected to anything, so it needs to be fairly sturdy. It would be fun to write another program that does the structural analysis and figures out how much support is adequate. Perhaps a TODO for the next version. The clipboard you see on the left holds the instructions computed by my program.

The support structure is mostly made of 2×4 bricks. Just for this purpose, I bought several thousand of them:

I then built up the arch over this support structure. One section is almost complete:

One thing I wasn’t too careful about in the planning phase was the seasonal change of the Earth’s surface.

The image I based the texture mapping on didn’t have any sea ice, so I used a different source to add it in. Unfortunately, the two sources did not agree on the season. If you look carefully, the sea ice is at its winter level, but the land ice is at its summer level:

On the positive side, the higher resolution really paid off. Here are two Earths side by side, showing the same side.

We are used to seeing the Earth as a two-dimensional map projection, so our mental image of the Earth is quite skewed. Building an actual sphere was quite an enlightening experience, as I got to really look at the actual proportions of things. For example, I used to think that when I fly from the US to Europe, I’m going about one third of the way around the Earth, but it turns out it’s more like one fourth. Africa is really, really large, and the mouth of the Amazon river is so vast that the brown area of river-carried dirt stretches a hundred kilometers. For that I gave it a distinctive brown tile:

This version of Earth is about one meter across. It’s quite heavy, but I can still carry it by myself.

Once I assembled it, I had some more fun calculating various numbers based on this scale.

For example, the highest point of the Earth, Mount Everest in the Himalayas, rises only about 0.3mm above sea level at this scale. That is, it’s not even one tenth of a LEGO plate high. In fact, the whole Earth’s crust is only about 30km deep, which comes to only about half a LEGO plate. If I wanted to correctly model the interior of the LEGO Earth, I would basically have to build a red sphere all the way up, except for the top-most plates. It’s almost like we are living on an eggshell; no wonder the continents move over time!

At the same scale, the Sun would be about 50m (200ft) in diameter, 6km (4 miles) away. That’s big enough to comfortably fit the Statue of Liberty inside. Someone please build that in LEGO, and we will put it and my Earth together at the accurate distance!

Finally, if you put this Earth at my home in San Jose, the farthest man-made object, Voyager 1, is past San Diego and into Mexico, flying at a speed of 1.5m/h. Picture a garden snail crawling past the LEGO Earth, slow it down 100 times, and that’s about the speed of Voyager 1. No wonder it took 35 years to get there — go Voyager!

Anyway, that’s it. Now that I’m finished with this project, what am I going to build next?


Support API freedom

April 5th, 2013

I was reading this article from Steve and Sacha about API copyrightability, and found myself in violent agreement. If you haven’t read it, I highly recommend it.

For those of you who haven’t been following the tech news, the issue at hand is Android — Google neatly side-stepped Java’s compatibility requirements by introducing a new runtime/VM and saying that Android is not Java. Oracle sued Google, claiming that the Java API is copyrightable material and that Google can’t just create a whole new implementation that’s API-compatible with Java. Oracle lost the case, but it is now appealing, and it is gathering its legacy vendor friends to argue that APIs not being copyrightable is bad for the economy.

But wait: surely more competition is bad for those vendors, but what about the instant gain we developers got when Android came along, becoming instantly productive on an entirely new platform?

Looking at the comment section of the article, I was a bit disappointed that some people saw this only as a storm in a teacup, or as an issue only about Java, with their favorite programming ecosystem (C#, Ruby, …) being safe. But it’s quite the contrary: if the appeal is successful, it has broad implications for all sorts of APIs. As they say, first they come for the communists, and you think you are safe, but by the time they come for you, it might be too late!

Take Mono for example. Sure, C# and the CLI are under the Microsoft Community Promise. But what about the vast APIs in the .NET Framework, which are necessary for writing any meaningful application? What about all the Win32 APIs that Wine implements? Or how about Eucalyptus implementing the Amazon Web Services API? Sure, they might be on good terms now, but what if IBM acquired Eucalyptus and started a cloud offering with the same API?

As a developer I benefit every day from compatibility, from being able to migrate from one vendor to another without losing everything. And when I look back at the PC/AT, the x86 instruction set, the Java EE APIs, and so on, I truly believe that this openness is good not just for us developers but for the broader economy as a whole.

So after reading the article, I felt like I wanted to help the cause and voice my support, but I wasn’t sure how — I’m just a developer, not a lawyer. So I created a White House petition. Not so much because I expect the White House to do something about it, but because it’s a good enough neutral petition site that hopefully people feel safe joining. If you agree with the cause, please join the petition and help spread the word, so that our voices get heard.


Jenkins commit activities in 2012

January 25th, 2013

Stephen is having fun with Git and R, and I saw another person creating a similar comparison (although I’m not sure where the credit should go on that one.)

Both charts are great, but I noticed that their Y-axes aren’t consistent, so if people aren’t careful they might not see Jenkins as favorably as the data warrants. So here is a graph comparing the number of commits in the year 2012. Both counts include only the core, on the master branch:

On average, we have 41.2 commits per week, compared to 9.7 on the other side. If you want the raw data, here is the Excel file.


JavaFX needs to be a new edition of Java

January 20th, 2013

Lately, there have been a number of security vulnerabilities reported in Java. The latest one was reported just a few days after Java SE 7u11, which was itself a response to another vulnerability. It has gotten to the point that people are being asked to uninstall Java (yes, just in the browser, but let’s face it, it’s way easier to uninstall it entirely than to disable the Java plugin in every browser.)

I think this issue needs to be addressed at a deeper level pretty quickly, or else what’s left of client-side Java will soon be dead (if you think it’s already dead, then there’s nothing else to see in this post, so please move on.)

In fact, one could even argue that the entire Java platform is at risk — people (especially our users and those who are studying programming) do not understand that these issues are sandbox breaches and therefore affect neither server-side Java nor embedded Java.

I think the main lesson should be that the sandbox model of Java is unmaintainable. It’s not just a few isolated bugs here and there, but it’s a structural problem.

For those of you who aren’t very familiar with Java, the Java sandbox model basically works like this:

  1. At runtime the entirety of the Java core libraries is present (the exact same code you run on the server side, including such abilities as forking processes, making network connections to anywhere, and accessing files).
  2. Code gets associated with “code source” information indicating whether or not it is trusted.
  3. Library code (like the code that accesses files) checks whether the caller is trusted, and if not, certain operations aren’t allowed (sketched below).
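
That last check typically looks like this inside the JDK libraries (a simplified sketch of the pattern, not actual JDK source):

import java.io.FileInputStream;
import java.io.IOException;

public class SomeLibrary {
    // The pattern used throughout the core libraries: every sensitive
    // operation consults the current SecurityManager before proceeding.
    public FileInputStream openFile(String name) throws IOException {
        SecurityManager sm = System.getSecurityManager();
        if (sm != null)
            sm.checkRead(name);   // throws SecurityException for untrusted callers
        return new FileInputStream(name);
    }
}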

The check in step 3 is implemented in Java code (the SecurityManager), and that’s where the problem is. All the recent vulnerabilities in Java basically involve runtime reflection to replace the effective SecurityManager instance. This forces the reflection library to be inside the protection wall built by the security layer, which in turn forces many libraries in Java SE to be inside the wall (such as JMX, which is used in the recent exploit, as well as the XML stuff, which I was personally involved in.)

At this point, the protection wall is so long and windy that it’s impossible to defend. And I think it’s really all because we decided to protect reflection.

IMHO, a much simpler way to let untrusted code run is to enforce checks on the native side, and simply stop trusting anything that runs as Java byte code. After all, we really only need to check the stuff that interacts with the outside world, every bit of which has to be done by calling into native code, and there are very few of these calls. That way, neither JMX nor the XML APIs can be exploited, because they simply become just more untrusted code (that happens to be available all the time.) I’m no Flash expert, but IIUC, this is how the Flash security model works.

In other words, create a new edition of Java, adding one more to the ME/SE/EE mix. It would be a new runtime environment that shares the same virtual machine and a subset of the core libraries. I think this is already consistent with the way “JavaFX” is marketed. While technically speaking it’s just a fancy UI library, it’s really marketed as an RIA platform alongside ME/SE/EE.

With the sophisticated existing IDEs, static typing, people’s familiarity, and the formidable existing ecosystem, I think Java (the VM and the byte code in particular, and to a somewhat lesser extent the language) is still incredibly useful for large-scale client-side development, especially if a bit more can be added, such as better interoperability with the DOM and other HTML5 APIs. While JavaScript is getting better day by day, there are still many things Java does better.

Imagine being able to split a large application into modules, with interfaces to define boundaries, mature module systems, and efficient delivery and caching mechanisms. Debugger support, multi-threading, running outside the browser for unit testing, programming in Ruby or Groovy … you get the idea. Think of it as GWT minus the translation step, if you like.

As an avid Java fan, here’s my plea to Oracle — I want this for my Christmas present this year.


On the road for the rest of the month

January 10th, 2013

I’m really excited to kick off my 2013 with a tour around the world for Jenkins.

The first stop will be Tel Aviv, where I’ll be doing a training with AlphaCSP. This has sold out, but AlphaCSP will be delivering the training again in the future. I’ll then head to London for the Jenkins User Event, with James Nord (of m2 release plugin fame) and Andrew Phillips from XebiaLabs. This one has also sold out, but I’ll be recording the talks and posting them online. After that, I’m in Munich to deliver another Jenkins training with ObjectBay. Same deal here — it’s sold out, but they’ll be scheduling repeats later.

I’ll then be back home for a few days before heading off to Seoul for the first-ever Jenkins meet-up in Korea. I’m really excited about this; I’m trying to kick-start a local community, which is invaluable in this region of the world. If you are in the area, please come join us and get to know other people in the Jenkins community.

Then I head to Tokyo for a user meet-up (no registration page is up yet; please check our website later), and a couple more trainings (these still have seats available.)

From there I’m flying to Brussels for FOSDEM. They gave me a keynote slot (yay!), and Tyler is organizing the testing and automation devroom. A number of Jenkins people will be there, and we’ll have a table for two days. This will be exciting!

And on the last leg of the journey, I’ll be at JFokus in Stockholm. They kindly gave me a long three-and-a-half-hour slot to talk about Jenkins in depth, in addition to the more typical 50-minute talk slot. I’m looking forward to this new format.

And when I finally get back home, I suspect I’ll need some downtime to recover.

If you are somewhere nearby and want to grab some stickers or say hello, please let me know! And my apologies in advance for delays in responding to e-mail and other disruptions.
