Modifying a Shared Dependency

· danra's blog

#design #refactoring #C++

I've recently been working on improving CMake integration at Sound Radix. Working with the JUCE framework, we use Projucer as the primary build-system generator for most of our projects, with JUCE's newer CMake integration being gradually added in parallel with the intention of eventually replacing Projucer.

With both Projucer and CMake, we run a Python script prior to building a project to generate BuildDefs.cpp:

1// generated by gen_build_commit.py @2024-03-10 10:25:32
2extern const char* const git_rev = "f478d05m";

which we use to display the build commit's short SHA1 along with a dirty-status (marked as an m suffix) in the about screen:

Muteomatic About Screen

In our Projucer-based builds, we have a pre-build action configured to run the Python script. In our CMake-based builds, we run the script at project configuration time via execute_process:

1target_sources(Muteomatic PRIVATE generated/BuildDefs.cpp)
2
3find_package(Python3 COMPONENTS Interpreter)
4file(MAKE_DIRECTORY "${CMAKE_SOURCE_DIR}/generated")
5execute_process(COMMAND "${Python3_EXECUTABLE}" "${CMAKE_SOURCE_DIR}/scripts/gen_build_commit.py" "--out=${CMAKE_SOURCE_DIR}/generated/BuildDefs.cpp")

The latter works OK most of the time, but not always: The author of this blog post explains why and outlines a more robust method for generating the information and making it available for use in code. I was happily surprised to find out one of my teammates had actually already followed the outline and made GitHash a couple of years ago, and I set out to add it to our CMake integration and replace our current method.

# A Tight Coupling

Here's an example of a GitHash-generated file:

 1// GitHash.cpp
 2namespace GitHash {
 3    extern const char * const branch;
 4    extern const char * const sha1;
 5    extern const char * const shortSha1;
 6    extern const bool dirty;
 7    extern const char * const branch = "main";
 8    extern const char * const sha1 = "f478d05f13d7e5949e6d565462b7a0b7cf20d55f";
 9    extern const char * const shortSha1 = "f478d05f";
10    extern const bool dirty = true;
11}

The same logical information previously provided in git_rev is now split among GitHash::shortSha1 and GitHash::dirty (along with an extra short SHA1 hex digit).

Inspecting current git_rev usage, its main consumer is the AboutScreen class, part of a shared utilities repo included as a submodule in multiple client projects repos:

1// AboutScreen.cpp
2extern const char* const git_rev;
3string AboutScreen::getBuildString() const {
4    return m_pluginName + string (git_rev);
5}

Apparently, rather than receiving the git revision via its public interface (for example as a constructor parameter), the class's implementation is hard-coded and tightly-coupled to receive the information in the git_rev symbol.

Now that we've gathered the relevant details, let's look at alternatives for how to introduce the change.

# Exploring Options

The most straightforward option is to just directly modify the shared class's implementation to work with the new information provider, effectively replacing the existing tight coupling with a new one:

 1// GitHashUtil.hpp
 2#include "GitHash.hpp"
 3namespace GitHash {
 4inline string shortSha1WithDirtyStatus() {
 5    return string (shortSha1) + string (dirty ? "m" : "");
 6}}
 7
 8// AboutScreen.cpp
 9#include "GitHashUtil.hpp"
10string AboutScreen::getBuildString() const {
11    return m_pluginName + GitHash::shortSha1WithDirtyStatus();
12}

However, there are multiple projects already using the shared class; making such a change means that any client project that updates its reference to the shared utilities submodule would fail to build until the necessary adjustments are made.

The main advantage of applying such a solution is that it's really simple! It can work in cases where:

In the first case, we can just commit the breaking change to the shared repo, update our client project to fetch the updated reference to the shared repo, and adjust to the breaking change, leaving it to future developers to make the required simple adjustments in other client projects when updating the shared repo reference. To make everyone's life easier, it's best to also document the breaking change, e.g., under a BREAKING_CHANGES.md file in the shared repository.

In the second case, we might be able to just update all of the client projects ourselves to accommodate the change, keeping everything working properly without requiring anyone else's attention or stretching out the migration process.

But what if there are many client projects, and it would take us too much time to maintain them all right now? Or, what if some of the projects haven't been maintained in a while, and getting them to build properly would require some extra work that there's no time for, e.g., due to breaking changes in other dependencies? What if some of the projects are code-frozen towards an upcoming release? (it's possible to apply the change in a develop branch, but if that's not already the workflow then we'd be introducing new complexity). If we only update some of the client projects, it's just a matter of time until someone (possibly our future selves) maintains a non-adapted project and stumbles into a broken build, this time with less context, increasing the time required to make the needed changes as well as the chance of introducing a bug (recall we are discussing the case where the required change isn't small, easy to understand and hard to get wrong).

In such cases, instead of replacing the tight coupling, we can reuse the existing one. In our case, AboutScreen is tightly coupled to what gen_build_commit.py generates, but since that script is also shared, the change to AboutScreen.cpp can be made non-breaking by also replacing the contents of gen_build_commit.py to run GitHash.cmake in CMake script mode, with the output filename set to BuildDefs.cpp instead of the default GitHash.cpp:

1cmake -DRUN_UPDATE_GIT_HASH=1 -DGitHash_SourceDir=. -DGitHash_OutputDir=./generated -DGitHash_CppFilename=BuildDefs.cpp -P ./GitHash/CMakeLists.txt

The composite change wouldn't be breaking because all client projects are already set up to run gen_build_commit.py and compile BuildDefs.cpp. With the non-breaking change applied to the shared repo, we'll have made the first step towards using GitHash. CMake-based builds of client projects could now be gradually adapted to directly integrate GitHash instead of execute_process-ing the Python script, and Projucer-based builds would just keep using the gen_build_commit.py script. When all CMake-based builds have been migrated and when Projucer-based builds are completely phased out, gen_build_commit.py can be deleted.

Another option in case we don't want to or can't modify the shared class, or if it's too costly or impossible to modify other references to the existing hard-coded symbol, is to only modify gen_build_commit.py; as before, it would run GitHash.cmake in script mode, but we would also customize GitHash to generate the old symbol. For example:

1// <Default GitHash-generated cpp file contents>
2// ...
3// Additional content:
4#include "GitHashUtil.hpp"
5
6static const string dirtysha = GitHash::shortSha1WithDirtyStatus();
7extern const char* const git_rev = dirtysha.data();

Or we could just tweak the CMake code to generate git_rev directly. In any case, we'll need to maintain a separate fork of GitHash for the customization. (Note we can't instead just have gen_build_commit.py append this content, because the end goal is to migrate away from that script in our CMake-based builds!)

None of the options discussed so far is improving anything design-wise; they all just embrace the pre-existing tight coupling in different ways. Is tight coupling really so bad, though? Sometimes it is, sometimes it isn't a big deal ("it depends"); but something I realized while writing this post is how the build-system generator of choice is an objective factor in determining how bad it is.

Considering our case, the tight coupling involves AboutScreen, BuildDefs.cpp and gen_build_commit.py. Since they are all shared, why can't the tight coupling be contained in the shared library rather than requiring the client project to care about it? Ideally, clients should be blissfully unaware of any extra steps that need to be made in order to use a shared class, in our case running gen_build_commit.py and compiling the generated BuildDefs.cpp.

With Projucer-based builds, this ideal seems unachievable. Every project using AboutScreen needs to also be set up to compile BuildDefs.cpp and to run gen_build_commit.py as a pre-build action, making the tight coupling between these elements visible and forcing the developer's awareness of it. This may not sound like much overhead, but consider having more such implicit inter-dependencies, multiple projects and build configurations, and most importantly team members who are each probably aware of only some of the tight couplings, and it adds up.

In contrast, this is definitely possible with CMake-based builds, where the misleadingly-named target_link_libraries links not only libraries but also target usage requirements, such as running extra scripts and adding include paths. It turns out CMake is great for encapsulation!

In the next post, we'll consider other options that do remove the tight coupling.