Tuesday, January 3, 2012

Cleaning Up Maven Dependencies

I'm currently working on a rather large Java code base which is built with Maven. It has a million lines of Java, 190 Maven modules and 12 web applications. There is a lot of history in a code base like this. For a while I have had a feeling that we probably have some extra Maven dependencies lying around. This happens very easily over time as code is modified and moved around. Dependency issues are also usually solved by adding more dependencies. I have very rarely seen anybody remove dependencies from a working build.

I had some spare time so I decided to check how many extra dependencies we have and maybe try to remove them. The end result of this journey was quite surprising to me: of the more than 3000 dependencies that we had, I could remove over 1500 without breaking anything. Removing extra dependencies has a positive effect on many levels (build time, IDE development, complexity, ...) so if you have a big Maven build somewhere you might want to do the same. I wrote some scripts on the way and decided to open source them (https://github.com/siivonen/maven-cleanup). I also decided to write the following instructions for anyone who wants to clean up their Maven dependencies.

0. You can get the scripts to your machine with the following command:

    git clone https://github.com/siivonen/maven-cleanup.git

1. Make sure that the build works on your machine

For starters you need a green baseline. Figure out a build command that includes all the necessary tests etc. in your project. This Maven command is used to determine whether everything is still OK after a dependency removal. You save yourself time by selecting a command that runs all the tests and activates all the relevant profiles. The command you choose should have 'install' in it. The default selection is 'mvn clean install'.

Once you know which Maven command you want to use, run a full build with it on your machine. This build needs to succeed, so do whatever is required to get it green. In big Maven builds this can sometimes be hard. It is important that every Maven project builds successfully locally before you continue. This way your local repository will have all the needed artifacts, and we know that the build was green at least before any dependency removals.

2. Systematically remove dependencies not needed directly

This is a rather time-consuming and mechanical task so I wrote a script for it. The script takes a root pom.xml file and a Maven command as parameters. It does the following:
  • Finds all sub modules of the given pom.xml file
  • For every pom.xml found, removes one dependency at a time and builds the project with the given Maven command
  • If the build is successful, leaves the dependency out
The full build time in my project is 1.5 hours and I only wanted to eliminate unneeded direct dependencies, so I decided to make the builds non-recursive.

    ./remove-extra-dependencies.rb pom.xml 'mvn clean install -N'
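The loop described above can be sketched roughly as follows. This is not the code of remove-extra-dependencies.rb; it is a minimal Python illustration under several assumptions: the names module_poms, maven_build and prune are made up for this example, every <module> entry is assumed to be a directory containing a pom.xml, only the top-level <dependencies> section is considered, and rewriting the file this way loses comments and original formatting.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

POM_NS = "http://maven.apache.org/POM/4.0.0"
M = "{" + POM_NS + "}"
ET.register_namespace("", POM_NS)  # write tags back without an ns0: prefix


def module_poms(pom_path):
    """Yield pom_path, then recursively the pom.xml of each <module>."""
    pom_path = Path(pom_path)
    yield pom_path
    root = ET.parse(pom_path).getroot()
    for module in root.findall(M + "modules/" + M + "module"):
        # Assumption: each <module> names a directory that holds a pom.xml.
        yield from module_poms(pom_path.parent / module.text.strip() / "pom.xml")


def maven_build(pom_path, command=("mvn", "clean", "install", "-N")):
    """Run the build for one pom.xml; True when Maven exits with status 0."""
    result = subprocess.run([*command, "-f", str(pom_path)], capture_output=True)
    return result.returncode == 0


def prune(pom_path, build_ok=maven_build):
    """Drop each direct dependency in turn; keep the removal if the build passes."""
    tree = ET.parse(pom_path)
    deps = tree.getroot().find(M + "dependencies")
    removed = []
    for dep in ([] if deps is None else deps.findall(M + "dependency")):
        index = list(deps).index(dep)
        deps.remove(dep)
        tree.write(pom_path, encoding="utf-8", xml_declaration=True)
        if build_ok(pom_path):
            removed.append(dep.findtext(M + "artifactId"))
        else:
            deps.insert(index, dep)  # the module needs it: put it back
            tree.write(pom_path, encoding="utf-8", xml_declaration=True)
    return removed
```

Calling prune(pom) for every pom yielded by module_poms("pom.xml") would walk the whole build. The build check is passed in as a function so that the loop can be tried out without invoking Maven.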

The script took 14 hours to loop through all our 3000+ dependencies in 190 pom.xml files. After the script execution there were 1600 dependency removals to commit. Unfortunately committing those as such would have resulted in a build failure. Removing a dependency from a project causes a build failure in dependent projects that used to get the removed dependency transitively. That's why we often need step 3.
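To see why a removal that is harmless for one module can break another, here is a toy model of transitive resolution. The module and artifact names (web-app, core, commons-lang) are hypothetical, and the model deliberately ignores scopes, exclusions and Maven's version mediation; it only follows declared dependencies.

```python
def classpath(module, declared):
    """All artifacts reachable from `module` by following declared dependencies."""
    seen = set()
    stack = list(declared.get(module, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(declared.get(dep, ()))
    return seen


# Hypothetical build: web-app declares only core, core declares commons-lang.
declared = {"web-app": ["core"], "core": ["commons-lang"]}
before = classpath("web-app", declared)

# Step 2 found that core itself builds fine without commons-lang.
declared["core"] = []
after = classpath("web-app", declared)

# What web-app silently lost and now has to declare itself.
missing = before - after  # {"commons-lang"}
```

If web-app's own code uses classes from commons-lang, it compiled before only because core happened to pull the library in.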

3. Add the missing dependencies

The missing dependencies are something that a project used to get transitively but not anymore. Scripting this task seemed too complex so I decided to do this manually. The process was pretty much the following:
  • Run full build command (the one you ran in step 1)
  • If the build fails because of missing classes:
    • search for the missing dependency in your local repository
      • grep -r --include='*.jar' my.missing.Class ~/.m2/repository
    • add the dependency to the failing Maven module (if it is the test compilation/execution that fails, the test scope is enough for that dependency)
    • resume the build
      • mvn clean install -rf :failing-module
  • Repeat this until you get a green build
After this step you should have fewer dependencies but still a green build and working software.
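The search for a missing class can also be done by reading the jar contents directly instead of grepping the raw files. The following is a small sketch, with the function name jars_containing chosen just for this example; it assumes the default local repository location ~/.m2/repository unless another directory is given, and it does not handle nested (inner) class names.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, repo=None):
    """Return the jars under `repo` that contain the given fully qualified class."""
    repo = Path(repo) if repo else Path.home() / ".m2" / "repository"
    # my.missing.Class is stored in a jar as my/missing/Class.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(repo.rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            continue  # skip truncated or partially downloaded jars
    return hits
```

For example, jars_containing("my.missing.Class") lists the candidate jars. In the local repository layout the path of each jar encodes its groupId, artifactId and version, which is what you need for the new dependency element.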

4. Remove extra dependencyManagement entries

After removing a lot of dependencies you probably have several extra dependency management entries. Removing them is again rather mechanical so I scripted that too. The script takes a pom.xml path as a parameter and:
  • Searches all dependencies of the given module and all its sub modules
  • Loops through all the dependency management entries and removes the entries that are not found in the group of all dependencies
You need to run the script separately for every pom.xml file that has a dependencyManagement section.

    ./remove-extra-dependency-management.rb pom.xml
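The idea behind the two bullets above can be sketched like this. It is an illustration rather than the actual remove-extra-dependency-management.rb: the function names are invented for the example, entries are matched on groupId and artifactId only, and only the top-level <dependencies> section of each module is inspected (dependencies declared in profiles or plugins are not).

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
M = "{" + POM_NS + "}"


def key(dep):
    """Identify a dependency by its (groupId, artifactId) pair."""
    return (dep.findtext(M + "groupId"), dep.findtext(M + "artifactId"))


def used_dependencies(module_roots):
    """Collect the keys of all directly declared dependencies in the modules."""
    used = set()
    for root in module_roots:
        for dep in root.findall(M + "dependencies/" + M + "dependency"):
            used.add(key(dep))
    return used


def prune_management(parent_root, used):
    """Remove dependencyManagement entries that no module declares directly."""
    managed = parent_root.find(M + "dependencyManagement/" + M + "dependencies")
    if managed is None:
        return []
    dropped = []
    for dep in managed.findall(M + "dependency"):
        if key(dep) not in used:
            managed.remove(dep)
            dropped.append(key(dep))
    return dropped
```

Here module_roots would be the parsed root elements of the given pom.xml and all of its sub modules, and parent_root the pom.xml whose dependencyManagement section is being cleaned.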

The remove-extra-dependency-management.rb script doesn't need to run any Maven builds, so it executes rather fast. You might still get a build failure after running it. The script removes every dependency management entry that is not referenced directly, but dependency management can also be used to control transitive dependencies of 3rd party libraries that you never reference directly, and those entries are removed as well. Your build might also rely on dependency management exclusions that are now gone. If you run into these build problems you can solve them case by case in a similar fashion as in step 3. Once you get a green build after this step you are done!