Monday, October 23, 2017

Maven Release and Sub-Directories

by Richard Vowles

Revisiting Repository Management for Java Artifacts

On a single repository per Java artifact

On the Illegal Argument Podcast, which I started with Mark Derricutt several years ago (and later left because I had "run out of things to say"), I argued vehemently that for Java projects to have the correct contracts and to be releasable as proper binary artifacts, each had to have its own repository in Git.

Git is different from Subversion, which we were all shifting from, in one pretty fundamental way: tags are not tree-relative. You don't tag from where you are, you tag the whole repository. This meant that if you were releasing and had other artifacts in your repository, they would all come along with that tag. If you had to do a patch fix for production and you were on a release cycle, you had to switch back to that tag, branch again and do the patch fix. With multiple artifacts in one repository, they would all switch back to earlier versions, so you would have to clone a repository specifically for that fix. That essentially gave you one repository per artifact, so you might as well start as you mean to go on and not muddy the waters.

Further, traditional build systems want to build a particular repository. Best practice is, and remains, that you should only build what changed and rely on the declared dependencies, so that your build system creates a cascading build only when you actually need one.

The downside to this of course is an explosion in repositories.

So a few things have turned up that have changed my mind on this, and I'd like to detail one of them here.

Mono-repos, and Maven Release

If you are doing CD, you operate on snapshots and there is no need to do anything else, unless you have third parties relying on your artifacts. Then you are back to the single repo vs mono-repo problem. The problem was that when you release, the whole of your Git repository is checked out from your tag into your target folder and compiled again. Up until 2.5, you couldn't actually release subfolders.

I realized as part of the Connect Project that I was releasing from sub-folders successfully. So I went to talk to Mark about it, who told me I should be able to - which led to this blog post.

Now you can release from sub-folders, and this changes the game somewhat. In my case, I have a bunch of repositories that I'm totally fine chucking into a single mono-repo and just releasing forward. I can branch and do a patch release if I want, but they are all largely unrelated: they don't depend on each other and should never be released using a Multi-Module Build.
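As a rough sketch of what one of these unrelated artifacts might look like when it lives in a sub-folder of a mono-repo, here is a minimal pom. The coordinates, the repository URL and the version numbers are all made up for illustration. The `tagNameFormat` shown is, as far as I know, also the release plugin's default; it is spelled out here because Git tags cover the whole repository, so each artifact's tags need the artifact name in them to stay distinguishable.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates for a standalone artifact in its own sub-folder -->
  <groupId>com.example.connect</groupId>
  <artifactId>connect-widget</artifactId>
  <version>1.3-SNAPSHOT</version>

  <scm>
    <!-- Assumption: this is the URL of the whole mono-repo, not of the sub-folder -->
    <developerConnection>scm:git:git@example.com:example/connect-java.git</developerConnection>
    <tag>HEAD</tag>
  </scm>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-release-plugin</artifactId>
        <version>2.5.3</version>
        <configuration>
          <!-- Tags are repository-wide, so keep the artifact name in the tag -->
          <tagNameFormat>@{project.artifactId}-@{project.version}</tagNameFormat>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```

Running `mvn release:prepare release:perform` from that artifact's own folder would then be expected to create a tag such as `connect-widget-1.3` on the repository and release only this artifact, leaving the neighbouring folders at whatever versions they happen to be.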

One of the real pains of managing finely grained repositories is having to manage so many. This ability to release from a single mono-repo has tipped me into that camp. I can still do patches - I'd never have merged them back anyway - but this is going to make my life considerably easier.

Why I hate Multi-Module Builds

While we are here, let's have a rant about Multi-Module Builds.

A Multi-Module build is one where you have a pom that only has module references in it - and these reference artifacts that are in subdirectories. They are not in themselves evil, they work well for CD as long as you tell them what to build. They do not work well for open source projects.
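For anyone who has not seen one, a Multi-Module pom of the kind described above looks roughly like this. The group, artifact and module names are hypothetical; each `<module>` entry is a sub-directory holding its own pom.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates; packaging "pom" means this project produces no jar -->
  <groupId>org.example.library</groupId>
  <artifactId>library-parent</artifactId>
  <version>2.4-SNAPSHOT</version>
  <packaging>pom</packaging>

  <!-- Nothing but references to sub-directories, each containing its own pom.xml -->
  <modules>
    <module>library-core</module>
    <module>library-http</module>
    <module>library-json</module>
  </modules>
</project>
```

Telling it what to build is then a matter of something like `mvn -pl library-http -am install`, which builds just the named module plus the modules it depends on.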

Typically these are used in open source projects to have all artifacts released together. They all have the same version number, and in released projects they tend to have a slow cadence. A bug fix takes forever to get released because everything gets released, even when it doesn't need to. 99% of artifacts in these kinds of projects experience no change; they get released again simply because the build process is silly.

This kind of project really annoys me. Spring (although the level of stupid in Spring's build system beggars belief) and CXF (as much as I appreciate the work they do to make me not have to deal with the vagaries of WebServices) release like this, and it means they batch their bug fixes. You can wait weeks for a fixed bug to be released, because they batch them.

What should they do? They should individually release their artifacts as soon as they have been verified to be correct, and have a single artifact that represents the project as a whole: a simple pom that just lists the project modules in their specific versions as dependencies. This allows bug fixes to just release day after day, with no change in any of the other artifacts, and then they can batch them in a pom-only release if they want. And people who need the fixes can just override that released pom with the new artifact. Simple.
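A minimal sketch of such a project-level pom, with hypothetical names and version numbers, might look like the following. Each module carries its own version and is released on its own schedule; this pom only records which combination belongs together.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical artifact representing the project as a whole -->
  <groupId>org.example.library</groupId>
  <artifactId>library-all</artifactId>
  <version>2.4</version>
  <packaging>pom</packaging>

  <!-- Each module is pinned at its own, independently released version -->
  <dependencies>
    <dependency>
      <groupId>org.example.library</groupId>
      <artifactId>library-core</artifactId>
      <version>2.1</version>
    </dependency>
    <dependency>
      <groupId>org.example.library</groupId>
      <artifactId>library-http</artifactId>
      <version>1.7</version>
    </dependency>
  </dependencies>
</project>
```

A consumer would depend on `library-all` with `<type>pom</type>`. If a fixed `library-http` 1.7.1 then came out on its own, the consumer could declare that version directly in their own pom, and Maven's nearest-wins dependency mediation would pick it over the 1.7 that arrives transitively through `library-all`.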

I distinguish Multi-Module builds from Reactor Builds. Reactor Builds can take in a whole bunch of artifacts and are never intended for release; they just make a developer's life easier when picking up the Application in their IDE or building a complete installation of an application. Reactor Builds are used in Applications; Multi-Module builds are abused in Libraries.

I'll be shifting the Connect project Java repositories to a single repository soon.