Much of the work of a build process is embarrassingly
parallel, or at least modestly so.
Consider compiling dozens of source files into a binary. For
C/C++ you first compile the source files into object files, and then you
link the object files together to create a binary. While you can’t link
the binary until the object files exist, you can build all or
most of the object files at the same time.
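A minimal sketch of this compile-then-link pattern, assuming Python and hypothetical `compile_one` and `link` callables that stand in for real compiler and linker invocations: the object files are built concurrently in a thread pool, and the link step runs only after every compile job has finished.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def build_binary(
    sources: Sequence[str],
    compile_one: Callable[[str], str],
    link: Callable[[list[str]], str],
    max_workers: int = 4,
) -> str:
    """Compile every source concurrently, then link once all objects exist."""
    # Each compile job is independent, so they can all run at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so objects line up with sources.
        objects = list(pool.map(compile_one, sources))
    # Linking needs every object file, so it runs after the pool is drained.
    return link(objects)


if __name__ == "__main__":
    # Hypothetical stand-ins for real compiler and linker invocations.
    def fake_compile(src: str) -> str:
        return src.rsplit(".", 1)[0] + ".o"

    def fake_link(objs: list[str]) -> str:
        return "app <- " + " ".join(objs)

    print(build_binary(["main.c", "util.c", "io.c"], fake_compile, fake_link))
    # prints: app <- main.o util.o io.o
```

Threads are sufficient here because, in a real build, each job would spend its time waiting on a child compiler process rather than running Python code.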
There are two basic approaches to implementing a concurrent build
system (and a large number of variations), but all approaches require
breaking apart the entire build process into many smaller sequences of
logical “jobs” or sub-tasks. Smaller units of work make it possible to
construct concurrent build systems that are capable of parallel
execution. To model a build system, either:
- Take all granular tasks and specify dependency information. The
build tool will assemble a dependency graph (directed acyclic
graph, or DAG) and then traverse the graph to determine the
execution order and potential for parallelism of each task.
- Split the process into a series of sequences and stages, where a
stage refers to a group of tasks with no dependencies on each other, and
sequences refer to a specific ordering of tasks that depend upon
each other. Build processes with multiple stages are themselves a
sequence of stages.
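The graph approach in the first bullet can be sketched with Kahn's algorithm, a standard topological sort. This is an illustrative sketch, not how any particular build tool works; the `deps` format (each task name mapped to the names of the tasks it depends on) is an assumption.

```python
from collections import deque


def execution_order(deps: dict[str, list[str]]) -> list[str]:
    """Return a valid build order for a task graph (Kahn's algorithm).

    Raises ValueError if the graph has a cycle, since then no order exists.
    """
    # Collect every task, including ones that only appear as dependencies.
    tasks = set(deps)
    for reqs in deps.values():
        tasks.update(reqs)

    # Count unmet dependencies and record which tasks wait on each task.
    pending = {t: 0 for t in tasks}
    dependents: dict[str, list[str]] = {t: [] for t in tasks}
    for task, reqs in deps.items():
        for req in reqs:
            pending[task] += 1
            dependents[req].append(task)

    # Tasks with no unmet dependencies are ready and could run in parallel.
    ready = deque(sorted(t for t in tasks if pending[t] == 0))
    order: list[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(dependents[task]):
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    deps = {"app": ["main.o", "util.o"], "main.o": ["main.c"], "util.o": ["util.c"]}
    print(execution_order(deps))
    # prints: ['main.c', 'util.c', 'main.o', 'util.o', 'app']
```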
The graph method is the more generally applicable approach, and makes it
possible to add new kinds of tasks with new sets of dependencies
without having a global view of the build system. However, stage-based
approaches make the possibilities for parallelism more explicit
and may be easier to maintain for some kinds of projects.
Even if you use a build system that supports graph analysis, you can
use a stage-based metaphor to think about the overall
architecture of the build system.
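One way to apply the stage-based metaphor on top of a dependency graph is to group tasks into levels, where each stage holds the tasks whose dependencies all finished in earlier stages. This sketch assumes the same hypothetical `deps` format (task mapped to its dependencies) and a hypothetical `run` callable that executes one task.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def stages(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into stages; tasks within a stage never depend on each other."""
    tasks = set(deps)
    for reqs in deps.values():
        tasks.update(reqs)

    done: set[str] = set()
    plan: list[list[str]] = []
    while len(done) < len(tasks):
        # Ready = not yet scheduled and every dependency is in an earlier stage.
        stage = sorted(
            t for t in tasks - done if all(r in done for r in deps.get(t, []))
        )
        if not stage:
            raise ValueError("dependency cycle detected")
        plan.append(stage)
        done.update(stage)
    return plan


def run_stages(
    plan: list[list[str]], run: Callable[[str], object], max_workers: int = 4
) -> None:
    """Run each stage's tasks in parallel and the stages themselves in sequence."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in plan:
            # Consuming the iterator waits for the whole stage and re-raises errors.
            list(pool.map(run, stage))


if __name__ == "__main__":
    deps = {"app": ["main.o", "util.o"], "main.o": ["main.c"], "util.o": ["util.c"]}
    print(stages(deps))
    # prints: [['main.c', 'util.c'], ['main.o', 'util.o'], ['app']]
```

This grouping is more conservative than full graph scheduling: a task waits for the entire previous stage even if its own dependencies finished early. That is the price of making the parallelism explicit.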
There is a fixed cost to running any program, including commands
run in shell instances. There is a trade-off between breaking the build
process into smaller components, which requires the build tool to create
more processes, and having larger “steps” that amortize the
process creation costs.
In general, if a task requires non-trivial disk I/O and
CPU use, then process creation is probably worth the cost; however,
creating hundreds, or even several dozen, processes per second will
eventually impede performance.
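One way to act on this trade-off is to batch many small tasks into a single process. The sketch below assumes Python and uses a hypothetical inline worker script (it upper-cases its arguments) in place of a real tool that accepts many input files per invocation.

```python
import subprocess
import sys


def chunks(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical worker: stands in for a tool that handles many inputs per call.
WORKER = "import sys; print('\\n'.join(a.upper() for a in sys.argv[1:]))"


def run_batched(items: list[str], size: int) -> tuple[list[str], int]:
    """Start one child process per batch instead of one per item.

    Returns the combined output lines and the number of processes started.
    """
    outputs: list[str] = []
    batches = chunks(items, size)
    for batch in batches:
        proc = subprocess.run(
            [sys.executable, "-c", WORKER, *batch],
            capture_output=True, text=True, check=True,
        )
        outputs.extend(proc.stdout.splitlines())
    return outputs, len(batches)
```

With a batch size of 1 this degenerates to one process per task; larger batches pay the startup cost fewer times but give the scheduler fewer units to spread across cores.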
Performance Analysis and Rebuilding
When testing a build system, measure both the total run time of an
operation and the percentage of CPU utilization. Test these
aspects of the build process as you develop your build system to
measure progress and performance.
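A minimal way to collect both measures for any build command, assuming Python on a Unix-like system: wall-clock time comes from `time.perf_counter()`, and CPU time of child processes comes from `os.times()`. The utilization formula used here (child CPU time divided by wall time times core count) is one reasonable definition, not a standard.

```python
import os
import subprocess
import sys
import time


def measure(cmd: list[str]) -> tuple[float, float]:
    """Run one build command and return (wall seconds, CPU utilization).

    Utilization is 1.0 when every core was busy for the whole run.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    # children_* counts CPU time of waited-for child processes (and their
    # waited-for descendants); on Windows these fields are always zero.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    cores = os.cpu_count() or 1
    return wall, cpu / (wall * cores)


if __name__ == "__main__":
    # Replace with the real build command, for example ["make", "-j8"].
    wall, util = measure([sys.executable, "-c", "sum(range(10**6))"])
    print(f"wall={wall:.2f}s cpu={util:.0%}")
```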
There are three general tests of build performance:
Build From Scratch
These tests measure the total time to build all dependencies.
In general this measures the largest amount of time that a build can
take. This time should be reasonably stable between different builds.
No-Op Rebuilds
These tests measure the overhead of the build tool itself. Collect
no-op build times by re-running a build immediately after a successful
build.
Incremental Rebuilds
These tests measure the amount of time a common rebuild takes. This
measures the more typical amount of work required to test small to
moderate changes to the project.
These times can vary a lot depending on the shape of the dependency
graph and the way that the build system handles rebuilding. If
these measures equal or approach “from scratch” times, then either the
build process itself is not very incremental or there’s another
inefficiency in the dependency graph.
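The three tests can be scripted so they run the same way every time. In this sketch the `clean`, `build`, and `touch` hooks are hypothetical placeholders for removing all build outputs, running the build, and modifying one representative source file.

```python
import time
from typing import Callable

Hook = Callable[[], None]


def timed(step: Hook) -> float:
    """Return the wall-clock seconds taken by one call to `step`."""
    start = time.perf_counter()
    step()
    return time.perf_counter() - start


def benchmark(clean: Hook, build: Hook, touch: Hook) -> dict[str, float]:
    """Time a from-scratch build, a no-op rebuild, and an incremental rebuild."""
    clean()
    results = {"from_scratch": timed(build)}
    # Rebuild immediately: nothing changed, so this measures tool overhead.
    results["no_op"] = timed(build)
    # Change one file and rebuild to approximate a typical edit cycle.
    touch()
    results["incremental"] = timed(build)
    return results
```

Comparing `incremental` against `from_scratch` gives a quick signal: if the two are close, the build is not very incremental.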
Incremental rebuilds are great for development productivity because
they minimize unnecessary work and allow quick feedback cycles during
the development process.
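Incremental rebuilds depend on deciding which targets are out of date. A common approach, and the one `make` uses, compares modification times: a target is rebuilt if it is missing or older than any of its inputs. A minimal Python sketch of that check, with hypothetical file names in the demo:

```python
import os
import tempfile


def is_stale(target: str, sources: list[str]) -> bool:
    """Return True if `target` must be rebuilt from `sources`."""
    if not os.path.exists(target):
        return True
    built = os.path.getmtime(target)
    # Any input modified after the target was produced invalidates it.
    return any(os.path.getmtime(src) > built for src in sources)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, obj = os.path.join(d, "main.c"), os.path.join(d, "main.o")
        open(src, "w").close()
        print(is_stale(obj, [src]))  # True: main.o does not exist yet
        open(obj, "w").close()
        os.utime(src, (100, 100))
        os.utime(obj, (200, 200))
        print(is_stale(obj, [src]))  # False: main.o is newer than main.c
```

A no-op build still performs this check for every target, which is part of the overhead the no-op test measures.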
Maintenance Costs and Discovery
As projects develop and grow, build systems necessarily grow and gain
complexity. The best build systems account for the potential for
growth and provide ways to add new components to the software with
no or minimal build system changes. Realistically, small build system
changes are always needed, but build systems should minimize the
amount of specialized knowledge of the build process or the system
architecture that developers need to make those changes.
Ideally, the build system or the meta-build tool can generate the
build system based on the names of files, or other
information. Nevertheless, it’s inevitable that developers will need to
add new build targets and change build processes throughout the course
of development. The best build systems will be able to ameliorate
these costs and make the build process as extensible as possible.
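As an illustration of generating build targets from file names, the sketch below assumes a hypothetical layout where every `*.c` file in a source directory becomes one object file and all object files link into a single program called `app`. Under that assumption, adding a new source file adds a target with no build-system change.

```python
import tempfile
from pathlib import Path


def rules_from_sources(src_dir: Path, build_dir: Path) -> dict[str, list[str]]:
    """Map each target path to the paths it depends on, derived from file names."""
    rules: dict[str, list[str]] = {}
    for src in sorted(src_dir.glob("*.c")):
        # src/foo.c -> build/foo.o
        rules[str(build_dir / (src.stem + ".o"))] = [str(src)]
    objects = sorted(rules)
    rules[str(build_dir / "app")] = objects
    return rules


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = Path(d, "src")
        src.mkdir()
        for name in ("main.c", "util.c", "README"):
            (src / name).touch()
        for target, deps in rules_from_sources(src, Path(d, "build")).items():
            print(Path(target).name, "<-", [Path(p).name for p in deps])
```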
A factor in the complexity and difficulty of building a project is
that most build operations, depending on the project, are not “rebuild
everything” operations. Often a developer or user will need to build
only a single component. The build system must provide easy-to-use
methods that allow developers to build only the parts that they need
without building more than necessary.
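Building a single component usually means walking the dependency graph from the requested target and building only what it reaches. A minimal sketch, assuming a hypothetical `deps` mapping from each target to its direct dependencies:

```python
def required_tasks(deps: dict[str, list[str]], target: str) -> set[str]:
    """Return `target` plus everything it transitively depends on."""
    needed: set[str] = set()
    stack = [target]
    while stack:
        task = stack.pop()
        if task in needed:
            continue  # already visited via another path
        needed.add(task)
        stack.extend(deps.get(task, []))
    return needed


if __name__ == "__main__":
    deps = {
        "app": ["main.o", "util.o"],
        "tests": ["test.o", "util.o"],
        "main.o": ["main.c"],
        "util.o": ["util.c"],
        "test.o": ["test.c"],
    }
    # Building "tests" never touches main.c, main.o, or app.
    print(sorted(required_tasks(deps, "tests")))
```

Only the returned set needs to be ordered and executed, so asking for one component does not rebuild unrelated parts of the project.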