Monday, October 23, 2017

Maven Release and Sub-Directories

by Richard Vowles

Revisiting Repository Management for Java Artifacts

On a single repository per Java artifact

On the Illegal Argument Podcast - which I formed with +Mark Derricutt several years ago, and which I left when I had "run out of things to say" - I argued vehemently that for Java projects to have the correct contracts and to be releasable as proper binary artifacts, each had to have its own repository in Git.

Git is different from Subversion - which we were all shifting from at the time - in one pretty fundamental way: tags are not tree-relative. You don't tag from where you are; you tag the whole repository. This means that if you release with other artifacts in your repository, they all come along with that tag. If you had to do a patch fix for production while on a release cycle, you had to swap back to that tag, branch again and do the patch fix - and with multiple artifacts in one repository, they would all swap back to their earlier versions. That forced you to clone out a repository specifically for that fix, which essentially gave you one repository per artifact anyway - so you might as well start as you mean to go on and not muddy the waters.

Further, traditional build systems want to build a particular repository, and since best practice is, and remains, to build only what changed and rely on the declared dependencies, your build system can create a cascading build for when you actually need one.

The downside to this of course is an explosion in repositories.

So a few things have turned up that have changed my mind on this, and I'd like to detail one of them here.

Mono-repos and Maven Release

If you are doing CD, you operate on snapshots - there is no need to do anything else, unless you have third parties relying on your artifacts. Then you are back into the single-repo vs mono-repo problem. The problem was that when you released, the Maven Release Plugin checked the whole of your Git repository out into your target folder from the tag and compiled it again. Up until version 2.5 of the plugin, you couldn't actually release from subfolders.

I realized as part of the Connect Project that I was releasing from sub-folders successfully. So I went and talked to Mark about it, and he told me I should be able to - which led to this blog post.

Now you can, and this changes the game somewhat. In my case, I have a bunch of repositories that I'm totally fine chucking into a single mono-repo and just releasing forward. I can still branch and do a patch release if I want, but they are all largely unrelated - they don't depend on each other and should never be released using a Multi-Module Build.
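By way of a sketch - the version number here is illustrative, anything from 2.5 up should do - you pin the release plugin in the sub-directory's pom, then run mvn release:prepare release:perform from inside that sub-directory:

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-release-plugin</artifactId>
        <!-- 2.5+ understands releasing a module that lives in a sub-directory of the Git repo -->
        <version>2.5.3</version>
      </plugin>
    </plugins>
  </build>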

One of the real pains of finely grained repositories is having to manage so many of them. This ability to release from a single mono-repo has tipped me into that camp. I can still do patches - I'd never have merged them back anyway - and this is going to make my life considerably easier.

Why I hate Multi-Module Builds

While we are here, let's have a rant about Multi-Module Builds.

A Multi-Module build is one where you have a pom that contains only module references - and these reference artifacts that live in subdirectories. They are not in themselves evil: they work well for CD as long as you tell them what to build. They do not work well for open source projects.
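For concreteness, a minimal multi-module pom looks something like this (the coordinates and module names are invented):

  <project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>widget-parent</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>pom</packaging>

    <!-- each module is a subdirectory containing its own pom.xml -->
    <modules>
      <module>widget-core</module>
      <module>widget-api</module>
      <module>widget-web</module>
    </modules>
  </project>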

Typically these are used in open source projects to release all artifacts together: they all have the same version number, and in released projects they tend to have a slow cadence. A bug fix takes forever to get released, because everything gets released, even when it doesn't need to be. 99% of the artifacts in these kinds of projects experience no change; they get re-released simply because the build process is silly.

This kind of project really annoys me. Projects like Spring (although the level of stupid in Spring's build system beggars belief) and CXF (as much as I appreciate the work they do so that I don't have to deal with the vagaries of WebServices) release this way, and it means they batch their bug fixes. You can wait weeks for a fixed bug to be released, because they batch them.

What should they do? They should release each artifact individually, as soon as it has been verified to be correct, and have a single artifact that represents the project as a whole - a simple pom that just lists the project modules, at their specific versions, as dependencies. This allows bug fixes to be released day after day, with no change to any of the other artifacts, and they can still batch them in the pom-only release if they want. And people who need a fix can just override the released pom with the new artifact. Simple.
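A sketch of such a pom-only release (coordinates invented) - each module has already been released on its own cadence, and this artifact just pins a set that is known to work together:

  <project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>widget-project</artifactId>
    <version>7</version>
    <packaging>pom</packaging>

    <dependencies>
      <!-- each module at its own, independently released version -->
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>widget-core</artifactId>
        <version>2.4.1</version>
      </dependency>
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>widget-web</artifactId>
        <version>1.8.0</version>
      </dependency>
    </dependencies>
  </project>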

I distinguish Multi-Module builds from Reactor Builds. A Reactor Build can take in a whole bunch of artifacts and is never intended for release - it exists to make a developer's life easier, whether picking up the whole application in an IDE or building a complete installation of an application. Reactor Builds are used in Applications; Multi-Module builds are abused in Libraries.

I'll be shifting the Connect project Java repositories to a single repository soon.

Monday, April 25, 2016

Docker Registry and running mini-code-camps

by Richard Vowles

Most of my interest in the last six months has been consumed by electronics - it happens each year around Christmas, when I start having a bit of time to myself again. Now that I have started back into the technology deep dive, having found something I'm actually interested in learning and pushing into the +Code Lounge sessions, I am hitting the same Docker problem I had before - bandwidth. I simply do not have enough bandwidth on ADSL to support even two or three people downloading Docker images.

The last time we tried it, it wasted huge amounts of time and people failed to pull images. This time, however, there appears to be a Docker Registry project that I can run locally, which I am hoping will solve my problem.

My next Code Lounge is on Kubernetes, and we will be using Docker, CoreOS, flanneld and all that stuff - hopefully to create a distributed Docker cluster across people's machines. As such, I am upgrading my old Mac mini to ensure I can actually run it. The last time I tried to install Docker Machine on my Mac, it didn't have the architecture to run and kept crashing until I learnt you had to install it via Homebrew.

I'll report back how successful I have been.

Sunday, November 30, 2014

Fast Delivery

by Richard Vowles

So I wanted to give myself an exercise in creativity last week - scratch an itch and see just how fast I could deliver a project. It took me 2.5 days.

My local high school (secondary school) runs a website that pupils and their parents can log into. It lets you see the usual grades and so forth, but also each pupil's timetable.

That can be pretty important, as the school runs a 6-day cycle - so after a long weekend or a holiday, or even in a particularly busy week, it can be difficult for some people (particularly my son) to remember what "day" it is at school.

In my case, the project was to allow students (including my son) to synchronize that school-provided timetable (which is customized per pupil) with their Google Calendar - which is in turn synchronized to their phone. Many of the kids turn off their data plan, as they are only allowed one device on the school network.

So I wanted to let them log in with their Google Account, give me permission to access their Calendar, and tell me their school username and password (no OAuth there - I encrypt and store them), and then I go off, create a calendar and push their timetable into it. And I do that every week, automatically.

Now this had a few challenges:


  • I had never worked with OAuth before (or OAuth2 in this case). I had read the book, felt I understood it and then forgot it. +Mark Derricutt and I had done a short stint with the Google-provided libraries and found some problems, but I couldn't even find that project.
  • I hadn't worked with the Calendar API, and that turns out to have some significant quirks.
  • I wanted to make a mobile-first application - it had to look good and work - so I wanted to use Material Design components.
  • I wanted to run it against an https server.
  • I needed to see which parts of the start-up framework were "quirky" and needed their rough edges cleaned off.

Some lessons I learnt from the experience - stuff that will make it into the next +Code Lounge I run with +Irina Benediktovich.

  • Busy logic - tracking that there are one or more in-flight XHR requests - really needs to be pulled out into a separate module and included in every project. It is such a useful structural pattern that it simply has to be a module.
  • The OAuth2 stuff behaved in an unexpected fashion: every single time you go through the "login" process, you get a new token. And if you are starting and stopping your service, the session you get passed is different from the session that request.getSession().getId() provides - so you have to make special concessions to try and track the user properly.
  • I still haven't found a way to track a user in an opaque fashion - if they change their primary email address, their account is hosed. I was pretty sure there was a way to do this, but I haven't worried about it too much yet.
  • JAWR really hates CSS with embedded SVG images. That took me a while to figure out.
  • Make sure your server is running on the correct time zone, or ensure you provide a time zone to every DateTime you construct for the Google APIs.
  • It turns out that Chrome on some phones can only search - it ignores any URLs that you type in. It is bizarre and very frustrating!
I may add more to this post as I remember things!

Saturday, November 22, 2014

Polymer without Dependency Injection is dead to me

by Richard Vowles



I have been watching quite a bit of the content from the recent Chrome Dev Summit - after the fact, which is the only way to watch it, because there is so much fluff in the talks. I understand why this has to be done, but I would appreciate it if they got off the fluff and into the useful how-tos. The Service Worker stuff, I am afraid to say, was more cleanly and clearly covered in Jake Archibald's DevBytes with the Train Spotters example, and for a feature that still isn't readily available, it seemed fairly heavily hyped. I have yet to check whether Safari will support it on iOS, but if it does, I think we are going to start to be able to claw back some territory from native phone apps.

The Material Design / Web Components / Polymer focus, however, was more interesting - this has been going on for some time and I have been avidly following it. I like the format of Web Components much more than Angular Directives; I like the encapsulation of essentially everything you need.

What I don't like, however, is the lack of dependency injection - or even an interface for adding your own.

Whenever I see a "new XXXX" in someone's code - particularly for a service of some kind - I outwardly cringe. With Angular and DI, we have been able to significantly improve the way we build our web applications, focusing on what the customer needs and wants to see in terms of interactivity, workflow, look and feel, and overall user experience. We attribute this not just to the superior nature of developing in Angular, but particularly to the DI capability. We can mock out our services, and even when we eventually replace them, we are still able to use them for our unit tests with Karma.

There is no such capability with +Polymer - it is listed as part of the "future roadmap", but really, it is critical. The ability to inject your actual services makes for a much better structured application.

The only workaround at the moment is to reverse it, so that every object pulls its services from the global context - no Hollywood Principle for us!

I would really like to use and recommend Polymer, but I simply cannot at the moment and won't be able to until I see at least some activity in this area.

Tuesday, May 27, 2014

Groovy fun with @CompileStatic

by Richard Vowles

Groovy is one of the few languages that allow both static and dynamic compilation and, among the "big" languages on the JVM (Java, Groovy, Scala, JRuby and Clojure), it is the only one that does, as far as I am aware.

Unless I am doing Grails (we stopped at 2.1.13 because of the crazy bugginess of that platform, its dependence on Hibernate and just the random magic insanity that happens), I compile statically. Using Groovy in a basic Spring + AngularJS web framework has been empowering - I've only really been swapping back to non-type-safe code for tests. I still look at Mockito with pain.

One of the things that has bugged me, however, is closures. If I wanted a callback to work, I lost the type safety. Consider this method:

  void listProfiles(User user, boolean favourite, String filter, Closure callback)

So I am now not telling the compiler what types are being passed. I thought about this problem tonight and remembered that a closure can be coerced to an interface with a single method - so I thought I'd try it:

    interface ListProfileCallback {
       void profile(Profile profile, boolean favourite)
    }

    void listProfiles(User user, boolean favourite, String filter, ListProfileCallback callback)
and sure enough - the type system kicks in and tells me I'm missing a parameter!
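Here is a self-contained sketch of the idea (the class and method names are invented for illustration) - with the interface in place, the static compiler checks the closure you pass:

    import groovy.transform.CompileStatic

    @CompileStatic
    class Example {
      static class User {}
      static class Profile {}

      interface ListProfileCallback {
        void profile(Profile profile, boolean favourite)
      }

      void listProfiles(User user, boolean favourite, String filter, ListProfileCallback callback) {
        callback.profile(new Profile(), favourite)  // stand-in for the real lookup
      }

      void run() {
        // the closure is coerced to the single-method interface, so the static
        // compiler checks its parameter list against profile(Profile, boolean)
        listProfiles(new User(), true, '') { Profile p, boolean fav ->
          println "profile $p favourite: $fav"
        }
        // listProfiles(new User(), true, '') { Profile p -> }  // fails to compile: missing a parameter
      }
    }

    new Example().run()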


(image taken from http://www.luvyababes.co.uk/)

Friday, May 23, 2014

Gerrit, +2 Looks Good To Me and self plussing

by Richard Vowles

One of the weird things about Gerrit is that it appears to operate by default in nonsense mode.

What do I mean? Well, Gerrit is a code review tool. But the default setup of the tool allows you to accept your own reviews and submit them directly - you can push for review, +2 yourself, and then submit.

I mean, this blows my mind. Why on earth would this be allowed, by default, in a code review tool? I know some projects might see a good reason for it - I found a few discussions about it on Issue 308, which covers this very topic. Two people saying "I do this" seems to have made it the default behaviour.

In my case, it was important that I could set this on the permissions project, so that it applies across all our projects. The example from the Gerrit Cookbook - which uses Prolog (which hurt my brain the last time I tried to learn it, at University) - only allows you to do it on a per-project basis, so I needed a submit_filter rather than a submit_rule.

So, to put this somewhere I won't forget or lose it, here is my series of commands:

git clone ....
# fetch the ref that holds the project's config and rules
git fetch origin refs/meta/config
# branch from what was just fetched, not from the current HEAD
git checkout -b meta-config FETCH_HEAD
vi rules.pl
git add rules.pl
git commit -m "your message"
git push origin HEAD:refs/meta/config

in rules.pl you put:

% rewrite the submit rule: unpack the labels, add ours, pack them up again
submit_filter(In, Out) :-
  In =.. [submit | Ls],
  add_non_author_approval(Ls, R),
  Out =.. [submit | R].

% satisfied when a Code-Review +2 exists from someone other than the commit author
add_non_author_approval(S1, S2) :-
  gerrit:commit_author(A),
  gerrit:commit_label(label('Code-Review', 2), R),
  R \= A, !,
  S2 = [label('Non-Author-Code-Review', ok(R)) | S1].

% otherwise the change still needs a non-author review before it can be submitted
add_non_author_approval(S1, [label('Non-Author-Code-Review', need(_)) | S1]).

Enjoy.

PS: this post is about preventing Gerrit users from moderating their own commits or reviews and then submitting them. Just in case this sentence makes it easier for Google Search to find it.



Saturday, May 17, 2014

Dealing with bugs in open source libraries

by Richard Vowles


One thing you have to do when dealing with open source libraries a lot is sort out buggy ones - or libraries where you hit edge cases that you need fixed but can't get fixed in a timely fashion. Sometimes you hit fundamental technical problems where you know your fork won't be included in the upstream version, and you just have to make a decision.

As +Robert Watkins points out in the comments, at least you can deal with these bugs. With closed source libraries (yes, some still use them), the problem is much harder to deal with, but I don't discuss that here.

This post is not about how to use Github to contribute back - there are lots of posts that cover that. It's about what to do when you have to fix the problem right now.

This problem has particularly hit us with our Grails libraries - people scratch their own itch and then abandon the projects; they never wrote any tests, so outside the basic functionality they are painful, or they just don't get updated. The Grails Resources plugin had a problem where it didn't detect the scheme (http vs https) of an incoming request and so didn't re-write external javascript resources properly. We had to fix that, and years later it still hadn't been resolved upstream.

Just as often, we hit problems in other libraries - and they take time to resolve. People hit problems in libraries that I make (+Peter Cummuskey, for example), and although he makes a pull request, I may not have time to deal with it, or I may prefer to fix it another way (which also takes time). But the people on the ground just want it fixed. The software is usually free, so people are not under any obligation to take your fix or spend any time on it. That's not much comfort when you need the fix.

The mechanism we have evolved for dealing with this is helped by the fact that we use Apache Maven. Most of our third party (not written in-house) libraries are locked up in what are called "composite poms". If a library has a problem we need to resolve, it immediately goes into a composite, and the problem is isolated at that point.

The tl;dr of this is simply:

  • put your third party dependencies in a composite pom that you control
  • fork the dependency
  • fix the dependency and rename the groupId
  • release your version to your own hosted Nexus or equivalent
  • update the composite pom to point to your version and release it
  • version ranges will automatically update your projects - problem solved
  • record your fork as technical debt
  • when the developer maintaining the upstream artifact solves the problem, try their new version; if it works, change your composite and remove the technical debt
  • profit (OK, too many steps for Gnomes perhaps)


Composite Maven Projects

Simply put, a Composite Maven Project (or composite for short) is a Maven artifact consisting of just one file - a pom.xml that contains a list of dependencies. We don't use multi-module builds with parent dependencies; we put the dependencies in composites and use version ranges.

We usually lock down the versions of third party libraries - [2.4] for commons-io, for example - which ensures that no other version can leak in without the build breaking (something that can otherwise tend to happen). And yes, we want the build to break if someone starts fiddling with versions.

Composites are brought into projects via version ranges, so changes to them automatically flow down. If we upgrade the version of Jackson in composite-jackson from 1.9.8 to 1.9.9, everyone gets it automatically. If we change to Jackson 2.x (a big change), we change the major version number of the composite pom so it doesn't automatically flow down.
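As a sketch (the groupIds here are invented), a composite is nothing more than a pom-packaged artifact listing locked-down dependencies:

  <!-- composite-jackson/pom.xml -->
  <project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example.composite</groupId>
    <artifactId>composite-jackson</artifactId>
    <version>1.3</version>
    <packaging>pom</packaging>

    <dependencies>
      <!-- locked down: only exactly 1.9.9 is acceptable -->
      <dependency>
        <groupId>org.codehaus.jackson</groupId>
        <artifactId>jackson-mapper-asl</artifactId>
        <version>[1.9.9]</version>
      </dependency>
    </dependencies>
  </project>

and consuming projects bring it in with a version range:

  <!-- any 1.x release of the composite flows down automatically; 2.x will not -->
  <dependency>
    <groupId>com.example.composite</groupId>
    <artifactId>composite-jackson</artifactId>
    <version>[1,2)</version>
    <type>pom</type>
  </dependency>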

And no, this isn't my idea - as far as I'm aware, I'm third hand with this. It works really well. If you use version ranges extensively, like we do, check out my release-pom plugin (which at release time tells you exactly what you were using) and the bounds plugin (which brings up the lower bounds of your minor version ranges; Maven resolves much more quickly when you do this).

Forking

Many of the plugins and libraries we have traditionally used were hosted in Subversion. That made forking, and keeping track of the changes, relatively difficult. You had to check them out, check them in locally, and then apply your set of patches on top; if a new version came out, you had to deal with that situation all over again. Submitting a patch is a time-consuming and painful process with Subversion.

Git, Github and Bitbucket have made this situation easier - forking is encouraged; if a change is a patch that should fix a bug, it can be submitted easily, and you can track your changes against the other repository and rebase against it. Even working with Subversion-hosted libraries is easier - creating a new Github or Bitbucket repository from a Subversion one is easy, although contributing the changes back is more difficult.

However, timeliness is a problem - you need to get your code into a tracked, versioned environment that gives you reproducible builds and still lets you deliver your software. And you need to do it now, or within a reasonable period of time. It creates technical debt, but that can go on the technical debt register.

Maven Repository

Having a hosted Maven repository (we use Sonatype's Nexus) solves many of these issues. It allows us to fork these libraries - we typically rename the groupId to put the fork into our thirdparty hierarchy - and re-release them to our Nexus. This can be done by hand (for a one-off, effectively duplicating what the Release plugin does) or, for a longer term solution, by taking our top level parent (which contains the release information) and pushing it into the fork's pom.xml so that it tags and pushes to our Nexus third party repository directly.
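Roughly, the fork's pom ends up carrying something like this (every coordinate and URL here is a placeholder for your own):

  <!-- in the forked library's pom.xml -->
  <groupId>com.example.thirdparty</groupId>  <!-- renamed from the upstream groupId -->

  <scm>
    <connection>scm:git:git@github.com:example/forked-library.git</connection>
    <developerConnection>scm:git:git@github.com:example/forked-library.git</developerConnection>
  </scm>

  <distributionManagement>
    <repository>
      <id>thirdparty</id>
      <url>https://nexus.example.com/content/repositories/thirdparty</url>
    </repository>
  </distributionManagement>

With that in place, mvn release:prepare release:perform tags the fork and deploys it to the third party repository.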

Tying it all together

Having forked the source and released the new artifact to the "third party" repository on our hosted Nexus, we then just have to change the composite. The composite points to the new, fixed artifact, which will, of course, automatically flow into all downstream projects that depend on it.

Attributions


  • +Michael McCallum for introducing me to composites and version range use and making Maven wonderful to use again
  • +Irina Benediktovich - this post documents the process because I asked her to follow it for the logstash logback logger, which uses Jackson 2.x while our Grails apps use 1.9.9. She had to get rid of all that OSGi crap.
  • Photo from Jon Watson @ Flickr