
Maven: building only what has changed

When working on a multi-module Maven project, you often have to change several related modules at once. If you then want to build only the affected modules, Maven unfortunately offers nothing automatic. A quick search turns up a simple one-line solution on Stack Overflow:


mvn install -amd -pl $(svn st | colrm 1 8 | sed 's%/.*%%' | xargs echo | sed 's- -,:-g' | sed 's-^-:-')

That could have been the end of it. But I wanted more; what exactly, and how I got it, is described below.


For those new to bash, mvn or svn, a short explanation of the script:


  1. The main part is mvn install -amd -pl project_list. It tells Maven to build the projects in project_list together with the projects that depend on them.
  2. Everything inside $(...) takes the local changes from svn and extracts the project names from them.
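
As a rough illustration, the text processing inside $(...) can be sketched in Python. This is a minimal sketch, not the author's code: the function names are made up, and it assumes, like the one-liner, that every module is a top-level directory whose name equals its artifactId.

```python
def modules_from_svn_status(status_output):
    """Return the top-level directory of every changed path, in order, without repeats."""
    modules = []
    for line in status_output.splitlines():
        path = line[8:].strip()                 # drop the status columns, like `colrm 1 8`
        if not path:
            continue
        module = path.replace("\\", "/").split("/", 1)[0]  # like sed 's%/.*%%'
        if module not in modules:
            modules.append(module)
    return modules


def project_list_argument(modules):
    """Format module names for `mvn -pl`, prefixing each with ':' (artifactId syntax)."""
    return ",".join(":" + module for module in modules)
```

The result of project_list_argument can be passed directly as the value of -pl. Unlike the one-liner, the sketch removes duplicates, so a module with several changed files appears only once.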

Why this is needed


Consider the following simple scenario:


  1. Add a new method to the Parentable interface in module A
  2. Implement this method in the Child class (implements Parentable) in module B

To check the build, you have to build A and then B separately. If only the implementation in Child changes, rebuilding B is enough. But if the method signature changes, both modules have to be rebuilt.


Working this way, you constantly have to think about whether the build of module B is still up to date. That is fine with 2-3 projects, but with more of them, and with IDE refactoring in the mix, it is very easy to miss the moment a build becomes stale.


What's wrong with the script above


The script above solves this crudely: it always builds both A and B. That is not very efficient in terms of time, but it is relatively safe, because nothing gets forgotten. I used this approach for about a month, and then decided that I would rather occasionally forget to build something than keep wasting that extra time.


Besides this problem, there are several others, related to Maven's quirks, in which the script simply does not work. They are discussed below.


What was done


I took the idea behind this script and filed it into shape (with a rasp rather than a fine file). The goal was a script that builds everything we have changed but not yet built, and nothing more.


Extending a bash script is a thankless task for several reasons, so the development was done in Python. The benefits:


  1. Cross-platform
  2. Modularity
  3. Almost all Linux users and many Windows users already have it
  4. Simplicity

Of course, people want to just grab the script and use it, so the only dependencies are standard libraries.


So, on to the main part of the article: the problems we ran into and the solutions used.


What Maven can do


To do as little as possible ourselves, it makes sense to find out how much can be squeezed out of Maven. Not very much, as it turns out. Maven can build the following scopes:


  1. One project
  2. One project and all its modules (recursively)
  3. List of modules within one project.

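Option 3 looks the most suitable, and it is what the introductory bash script uses. A useful extra is that this option accepts additional parameters:

  1. -am (--also-make): also build the projects that the listed modules depend on
  2. -amd (--also-make-dependents): also build the projects that depend on the listed modules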

For a local build these options are not very useful: building dependents (-amd) can be left to CI, and the IDE points out compilation errors without a Maven build anyway.


Building the projects we depend on (-am) is redundant as well, because unmodified projects can be pulled from the Maven repository.


Unfortunately, that is the end of the list of what Maven can help with. To be fair, there are dedicated third-party plugins, including incremental-build ones, but here we have to work with what is already there; changing anything in the project for this is not an option.


Problem 1: building only the relevant changes


To find out what has changed, the easiest way is to use the file status in the version control system. However, the VCS tells us nothing about when the changes were made or whether they are already in the build. Still, the list of all changes in the VCS gives us our active projects and is the starting point for further analysis.


To find out whether the build reflects our changes, we compare the file modification date with the date of the last build (target/artifact_id-version.jar).


As a first approximation this works, but there are nuances to take into account:


  1. If other changes arrive from the VCS, the project has to be rebuilt.
  2. target/artifact_id-version.jar is only the default; the actual path may be completely different.
  3. Modified files may have nothing to do with the build (for example, an IDE project file).

Point 1 is solved by analyzing not only the locally modified files but all files in the active projects.
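
A minimal sketch of such a check, assuming plain file modification times are a good enough signal. The function names and the list of skipped directories are illustrative assumptions, not taken from the actual script.

```python
import os


def newest_mtime(root, skip_dirs=("target", ".svn", ".idea")):
    """Return the latest modification time among files under root, ignoring skip_dirs."""
    newest = 0.0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that do not affect the build (build output, VCS and IDE data).
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            newest = max(newest, os.path.getmtime(os.path.join(dirpath, name)))
    return newest


def needs_rebuild(module_dir, artifact_path):
    """True if the artifact is missing or any file in the module is newer than it."""
    if not os.path.exists(artifact_path):
        return True
    return newest_mtime(module_dir) > os.path.getmtime(artifact_path)
```

Scanning every file in an active project also catches changes that arrived through a VCS update, because updated files get fresh modification times.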


The other two are solved simply by reading the necessary information from the pom file and taking it into account.
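
Reading the artifact location from a pom file could look roughly like this. It is a simplified sketch under stated assumptions: the helper name artifact_path is made up, ${...} properties are not resolved, and the packaging value is used directly as the file extension, which does not hold for every packaging type.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def artifact_path(pom_xml):
    """Guess the built artifact path (relative to the module) from pom.xml text."""
    root = ET.fromstring(pom_xml)

    def text(path):
        node = root.find(path, POM_NS)
        return node.text.strip() if node is not None and node.text else None

    artifact_id = text("m:artifactId")
    # A module may inherit its version from the parent declaration.
    version = text("m:version") or text("m:parent/m:version")
    packaging = text("m:packaging") or "jar"
    build_dir = text("m:build/m:directory") or "target"
    final_name = text("m:build/m:finalName") or "{}-{}".format(artifact_id, version)
    return "{}/{}.{}".format(build_dir, final_name, packaging)
```

Parsing the pom directly avoids a Maven call for every module, at the price of missing anything that only Maven's own model resolution would know (inherited build settings, profiles, properties).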


Problem 2: rolled-back changes


If there were changes and we built an artifact with them, then after rolling those changes back the artifact has to be rebuilt. But the VCS status gives no information about such files, so we have to store it ourselves. After analyzing the currently changed projects, we save the list somewhere, and on the next build we merge the current changes with the previous ones from the saved file.
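
The save-and-merge step can be sketched as follows. The JSON state file and the function names are assumptions made for illustration; the real script may store this differently.

```python
import json
import os


def load_previous(state_file):
    """Return the set of projects recorded as changed by the previous run."""
    if not os.path.exists(state_file):
        return set()
    with open(state_file, encoding="utf-8") as fh:
        return set(json.load(fh))


def projects_to_check(current, state_file):
    """Merge current and previously changed projects, then remember the current ones."""
    candidates = set(current) | load_previous(state_file)
    with open(state_file, "w", encoding="utf-8") as fh:
        json.dump(sorted(current), fh)
    return candidates
```

A project whose changes were reverted no longer shows up in the VCS status, but it is still in the saved list, so it gets one more check before it drops out on the following run.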


Problem 3: modules not included in the main project


A properly structured Maven project is one in which every project is declared as a module of the parent project. That is, running mvn install in the root should cover all nested projects.


Things would be simpler if everything were done properly, but that is not always the case. Here there are detached projects. Maven does not like this: pass such a project to -pl and it fails with an error, because -pl only works with projects listed as submodules of the parent project.


To solve this, such projects have to be built separately. That is, Maven is launched as many times as there are unrelated root projects.


To complicate matters, these projects may depend on each other, so they have to be built in a specific order.
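
One way to plan these separate runs is to group the changed modules by root project and order the roots topologically. The sketch below assumes the module-to-root mapping and the dependencies between roots are already known; build_plan and its parameters are hypothetical names, and graphlib requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter


def build_plan(changed, root_of, root_deps):
    """Return [(root, modules)] in an order where dependencies are built first.

    changed:   changed module names
    root_of:   module name -> root project it belongs to
    root_deps: root project -> set of root projects it depends on
    """
    by_root = {}
    for module in changed:
        by_root.setdefault(root_of[module], []).append(module)
    # Only roots that actually have changes take part in the ordering.
    graph = {root: set(root_deps.get(root, ())) & set(by_root) for root in by_root}
    return [(root, by_root[root]) for root in TopologicalSorter(graph).static_order()]
```

Each entry of the plan corresponds to one Maven launch in the given root with its own -pl list.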


Problem 4: local maven repository synchronization


Maven always builds against the artifacts in the local repository. From time to time it may synchronize the local repository and download updates from the remote repository.


So a situation can arise where you have built your changes and no longer touch the files, yet sooner or later your local build is overwritten by external changes. One solution is to check the modification date of the artifact in the local repository rather than of the file in target. If the two differ, a new build is needed.


However, the files already built in target remain valid, and nothing removes them from there. So, to avoid doing the work twice, they can simply be copied into the local repository instead of running a full build.
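
A minimal sketch of that copy step, assuming the default local repository location (~/.m2/repository) and the standard groupId/artifactId/version directory layout. The function names are illustrative, and the sketch copies only the artifact file; it does not touch the pom or any repository metadata files.

```python
import os
import shutil


def local_repo_path(group_id, artifact_id, version, packaging="jar", repo=None):
    """Path of an artifact inside the local Maven repository (default layout)."""
    repo = repo or os.path.join(os.path.expanduser("~"), ".m2", "repository")
    file_name = "{}-{}.{}".format(artifact_id, version, packaging)
    return os.path.join(repo, *group_id.split("."), artifact_id, version, file_name)


def sync_to_local_repo(built, installed):
    """Copy the built artifact over the installed one if their dates differ."""
    if os.path.exists(installed) and os.path.getmtime(installed) == os.path.getmtime(built):
        return False
    os.makedirs(os.path.dirname(installed), exist_ok=True)
    shutil.copy2(built, installed)  # copy2 keeps the modification time
    return True
```

Because copy2 preserves the modification time, the two files compare as equal afterwards and the next run skips the copy.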


Problem 5: verbose Maven


This is not really a problem, but I have always wondered why Maven prints so many logs. Perhaps so that one can sit and meditate on the scrolling lines. If everything goes well, these logs are of no interest to me; if something goes wrong, they are really useful. That is the principle I follow when printing logs.


A nice bonus of the clean output is a noticeable gain in build speed, because IO is not the cheapest operation.
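
The "show logs only on failure" principle can be sketched with the standard subprocess module. The function name run_quietly is an assumption for illustration.

```python
import subprocess
import sys


def run_quietly(command):
    """Run a command, printing its combined output only if it fails."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout to keep line order
        text=True,
    )
    if result.returncode != 0:
        sys.stdout.write(result.stdout)
    return result.returncode
```

For example, run_quietly(["mvn", "install", "-pl", ":module-a"]) would stay silent on success and print the whole Maven log on failure.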


Problem 6: slow maven versus fast workarounds


Many of the problems above would be easier to solve with additional information from Maven. But Maven is very slow, and any command takes a long time. Calling it in a loop is a sure way to add heavy overhead.


Therefore all the problems are solved, as far as possible, with the script's own means, and Maven is used only once: to run the build itself.


However, every attempt to work around Maven is incomplete and may miss some additional factors. The workarounds are enough for my current project configuration, but more sophisticated configurations are not covered.


Bonus 1: script for build server


Everything described above was written and used for local builds, where we make the changes ourselves.


But after that successful experience, I decided the same approach could be used on our continuous integration server, to build only what has changed between revisions. Of course, most of the problems described for the local build have to be taken into account there too. So both versions of the script go hand in hand.


It is important that such an incremental build gives exactly the same result as a full one. This is easily achieved with the Maven -amd parameter described above. That is, everything that was changed and/or affected gets built.
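
The difference between the local and the CI invocation then comes down to one flag. A small sketch, with a hypothetical maven_command helper:

```python
def maven_command(modules, also_make_dependents=False):
    """Build the mvn argument list for the given module artifactIds."""
    command = ["mvn", "install", "-pl", ",".join(":" + m for m in sorted(modules))]
    if also_make_dependents:
        # On CI, also rebuild everything that depends on the changed modules.
        command.append("-amd")
    return command
```

Locally the flag stays off to keep the build short; on the CI server it is switched on so that the incremental build covers every affected module.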


As for the gain: in our case the average incremental build is 10 times faster than a full one.


TeamCity has this feature out of the box. Bamboo does not. I do not know about other CI servers.


Bonus 2: minifying Python projects


This section has nothing to do with Maven or Java and may interest only Python developers.
The script described here is written as a project with several modules, because the code is easier for me to work with that way.


But sharing a script that consists of several files is not very convenient.


Alternatively, the scripts can be packed into a single binary and distributed that way. But then you lose the ability to patch things on the fly and to understand what is going on inside.


So another script was quickly written that takes a Python file as input and merges it with all the modules it depends on. The output is a single file, with unnecessary blank lines removed along the way.
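
The idea can be sketched in a few lines. This is a deliberately naive illustration, not the author's tool: it works on an in-memory mapping of module names to source text, and it assumes local modules are imported only with single-line "from module import ..." statements. It also drops blank lines inside multi-line strings.

```python
import re

LOCAL_IMPORT = re.compile(r"^from\s+(\w+)\s+import\s+.+$")


def bundle(entry, sources, _seen=None):
    """Return the lines of entry with its local dependencies inlined before it.

    sources maps module names to their source text.
    """
    seen = set() if _seen is None else _seen
    if entry in seen:
        return []          # each module is inlined only once
    seen.add(entry)
    deps, body = [], []
    for line in sources[entry].splitlines():
        match = LOCAL_IMPORT.match(line)
        if match and match.group(1) in sources:
            deps.extend(bundle(match.group(1), sources, seen))  # inline instead of importing
        elif line.strip():
            body.append(line)                                   # keep non-blank lines only
    return deps + body
```

Joining the returned lines with newlines gives the single-file script. Imports of standard library modules are left untouched because they are not keys of sources.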


Epilogue


The script described here has been used by me and several colleagues for a year now. The CI script is currently in trial operation.


The performance is quite tolerable: on our projects I do not notice the analysis and optimization steps (they take less than a second).


So far it works only with svn, since there was no need to support other VCSs. Others can easily be added if desired, because very little is required from the VCS.


Link to project


I will be very glad if my approach and the script help someone else save time and energy on Maven builds.


Questions, corrections and additions are extremely welcome.



Source: https://habr.com/ru/post/323228/

