There Can Be Only One: Dealing with bugs in open source libraries

One thing you have to do when dealing with open source libraries a lot is sorting out buggy libraries. Or libraries where you hit edge cases that you need to get fixed but you can't get it done in a timely fashion. Sometimes you hit fundamental technical problems where you know your fork won't be included in the upstream version and you just have to make a decision.

As +Robert Watkins points out in the comments, at least you can deal with these bugs. In closed source libraries (yes, some still use them), the problem is much harder to deal with, but I don't discuss that here.

This post is not about how to use Github to contribute back. There are lots of posts that do that. Its about what to do when you have to fix the problem right now.

This problem has particularly hit us in our Grails libraries - people scratch their own itch and then abandon the projects, they never wrote any tests for it and out of its basic functionality its painful or they just don't get updated. The Grails Resources plugin had a problem where it didn't detect the scheme (http vs https) of an incoming request and re-write external javascript resources properly. We had to fix that, and years later it still hadn't been resolved.

But just as often, we hit problems in other libraries - and they take time to resolve. People hit problems in libraries that I make ( +Peter Cummuskey for example) and although he makes a pull request, I may not have time to deal with it or potentially prefer to fix it another way (which also takes time). But generally the people on the ground just want it fixed. The software is usually free, so people are not under any obligation to take your fix or spend any time on it. Thats not much comfort - when you need the fix.

The mechanism we have evolved for dealing with this is helped because we use Apache Maven. Most of our third party (not written in house) libraries are locked up what are called "composite poms". If they have problems we need to resolve, they immediately go into a composite and the problem is isolated at that point.

The tl;dr of this is simply:

put your third party dependencies in a composite-pom that you control
fork the dependency
fix the dependency, rename the groupId
release it to your own version to your own hosted Nexus or equivalent
update the composite-pom to point to your version and release it
version ranges will automatically update your projects, problem solved
put your fork in technical debt,
when the developer maintaining the main artifact solves the problem, try using their new version, if it works, change your composite, remove technical debt
profit (ok, too many steps for Gnomes perhaps)

Composite Maven Projects

Simply put, a Composite Maven Project (or composite for short) is simply a Maven Artefact that is one file - a pom.xml file that contains a listing of dependencies. We don't use multi-module builds with parent dependencies, we put them in composites and use version ranges.

We usually lock down versions of third party libraries [2.4] for commons-io for example - it allows us to ensure that no other version leaks in without the build breaking (which can tend to happen). And yes, we want the build to break if someone starts fiddling versions.

Composites are bought into projects via version ranges, so changes in them automatically flow down. We upgrade the version of Jackson in composite-jackson from 1.9.8 to 1.9.9, everyone gets it automatically. If we change to 2.x of Jackson (a big change), we change the major version number of the pom so it doesn't automatically flow down.

And no, this isn't my idea - as far as I'm aware, I'm 3rd hand for this. It works really well. If you use version ranges extensively, like we do - check out my release-pom plugin (which at release tells you exactly what you were using) and the bounds plugin (which brings up the lower bounds of the minor version ranges, Maven resolves much more quickly when you do this).

Forking

Many of the plugins or libraries we have traditionally used have been hosted in Subversion. This had made forking and keeping track of the changes relatively difficult. You have to check them out, check them in locally and then apply your set of patches on top. If a new version comes out you have to deal with that situation again. Submitting a patch is time-consuming and a painful process with Subversion.

Git, Github and Bitbucket have made this situation easier - forking is encouraged, if something is a patch that should fix a bug, it can be submitted easily, you can track your changes against the other repository and rebase against it. Even working with subversion libraries can be easier - being able to create a new Github or Bitbucket repository from a Subversion one is easy - contributing the changes back are more difficult however.

However, timeliness is a problem - you need to be able to get your code into a tracked, versioned environment that allows you to reproduce repeatable builds and still deliver your software. And you need to do it now. Or within a reasonable period of time. It creates technical debt, but that can go on the technical debt register.

Maven Repository

Having a hosted Maven repository (we use Sonatype's Nexus) solves many of these issues. It allows us to fork these libraries, we typically rename the groupId so put it into our thirdparty heirarchy, and then re-release it to our Nexus. This can be done by hand (for a one off - effectively duplicating the Release plugin) or for a longer term solution, taking our top level parent (which contains Release information) and pushing it into the pom.xml so it tags and pushes to our Nexus third party repository directly.

Tying it all together

Having forked the source, released the new artifact to our "third party" repository on our hosted Nexus, we then just have to change the composite. The composite points to the new, fixed artifact and the fixed artifact will, of course, automatically flow into all downstream projects that depend on it.

Attributions

+Michael McCallum for introducing me to composites and version range use and making Maven wonderful to use again
Post documenting this because I asked +Irina Benediktovich to do it for the logstash logback logger - which uses Jackson 2.x and our Grails apps use 1.9.9. She had to get rid of all that OSGi crap.
Photo from Jon Watson @ Flickr

There Can Be Only One

Saturday, May 17, 2014

Dealing with bugs in open source libraries

by Richard Vowles

Composite Maven Projects

Forking

Maven Repository

Tying it all together

Attributions

No comments: