Concurrency
Much of the work of a build process is embarrassingly
parallel, or at least modestly so.
Example
Consider compiling dozens of source files into a binary. For
C/C++ you first compile the source files into object files, and then
you link the object files together to create a binary. Although you
can’t link the binary until the object files exist, you can build all
or most of the object files at the same time.
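A minimal sketch of this compile-then-link shape follows. The
compile_source and link functions are hypothetical stand-ins that only
compute names; a real build would invoke a compiler and a linker in
their place.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_source(source):
    # Stand-in for a compiler invocation: returns the object file name.
    return source.rsplit(".", 1)[0] + ".o"


def link(objects, output):
    # Stand-in for a linker invocation: returns a description of the step.
    return f"{output} <- {' '.join(objects)}"


def build(sources, output, jobs=4):
    # The compile steps do not depend on each other, so they can run
    # concurrently. pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        objects = list(pool.map(compile_source, sources))
    # Linking starts only after every object file is available.
    return link(objects, output)


result = build(["main.c", "util.c", "io.c"], "app")
print(result)
```

Threads are likely sufficient for a sketch like this, because a real
compile step would spend most of its time waiting on an external
process.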
There are two basic approaches to implementing a concurrent build
system (and a large number of variations), but all approaches require
breaking apart the entire build process into many smaller logical
“jobs” or sub-tasks. Smaller units of work make it possible to
construct concurrent build systems that are capable of parallel
execution. To model a build system, either:
- Take all granular tasks and specify dependency information. The
build tool will assemble a dependency graph (a directed acyclic
graph, or DAG) and then traverse the graph to determine the
execution order and potential for parallelism of each task.
- Split the process into a series of sequences and stages where a
stage refers to a group of tasks with no dependencies, and
sequences refer to a specific ordering of tasks that depend upon
each other. Build processes with multiple stages are themselves a
sequence.
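The graph approach can be sketched with the standard library graphlib
module (Python 3.9 or later). The task names and dependencies here are
invented for illustration. Each batch the function returns holds tasks
whose dependencies are already complete, so the tasks within a batch
could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks, each mapped to the set of tasks it depends on.
dependencies = {
    "main.o": set(),
    "util.o": set(),
    "app": {"main.o", "util.o"},
    "docs": set(),
    "package": {"app", "docs"},
}


def parallel_batches(graph):
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        # Every task returned here has all of its dependencies finished.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # Mark the whole batch finished so dependent tasks become ready.
        sorter.done(*ready)
    return batches


batches = parallel_batches(dependencies)
for number, tasks in enumerate(batches, start=1):
    print(number, tasks)
```

A real build tool would probably mark each task done as soon as it
finishes rather than waiting for the whole batch, which lets dependent
tasks start earlier.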
The graph method is the more generally applicable case, and makes it
possible to add new kinds of tasks with new sets of dependencies
without having a global view of the build system. However,
stage-based approaches make the possibilities for parallelism more
explicit and may be easier to maintain for some kinds of projects.
Even if you use a build system that supports graph analysis, you can
use a stage-based metaphor to think about the overall
architecture of the build system.
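The stage-based model can be sketched in a few lines. Here a stage is a
list of independent callables and a sequence is an ordered list of
stages. The make_task helper and the task names are assumptions made
for the example, not part of any particular build tool.

```python
from concurrent.futures import ThreadPoolExecutor


def make_task(name):
    # Hypothetical task factory: the task just reports its own name.
    def run():
        return name
    return run


def run_stage(tasks, jobs=4):
    # A stage: its tasks do not depend on each other, so run them together.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda task: task(), tasks))


def run_sequence(stages):
    # A sequence: each stage starts only after the previous stage is done.
    return [run_stage(stage) for stage in stages]


build_plan = [
    [make_task("main.o"), make_task("util.o")],  # compile stage
    [make_task("app")],                          # link stage
]
results = run_sequence(build_plan)
print(results)
```

The ordering is visible directly in the shape of build_plan, which is
the explicitness that the stage-based approach trades for flexibility.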
Process Creation
There is a certain fixed cost to running a program, including commands
in shell instances. There is a trade-off between breaking the build
process into smaller components, which requires the build tool to
create larger numbers of processes, and having larger “steps” that
spread the process creation cost across more work.
In general, if a task requires non-trivial disk I/O and CPU use, then
creating a process for it is probably worth the cost; however,
creating hundreds or even several dozen processes per second will
impede performance at some point.
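One way to explore this trade-off is to group small tasks so that each
process handles several of them. The sketch below only constructs
command lines and does not run anything. The "cc -c" invocation and the
file names are illustrative assumptions.

```python
def batch(items, size):
    # Group items so that one process handles several of them.
    return [items[i:i + size] for i in range(0, len(items), size)]


def compile_commands(sources, per_process):
    # One command line per batch; "cc -c" is only an illustrative compiler.
    return [["cc", "-c", *group] for group in batch(sources, per_process)]


sources = [f"src{i}.c" for i in range(10)]

one_per_file = compile_commands(sources, 1)   # ten processes
four_per_call = compile_commands(sources, 4)  # three processes

print(len(one_per_file), len(four_per_call))
```

Larger batches mean fewer processes, but also fewer independent units
to spread across cores, so the best batch size depends on the project
and the machine.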
Maintenance Costs and Discovery
As projects develop and grow, build systems necessarily grow and gain
complexity. The best build systems account for the potential for
growth and provide ways to add new components to the software with
minimal or no build system changes. Realistically, small build system
changes are always needed, but the implementation of a build system
should attempt to minimize the kind and amount of specialized
knowledge of the build process or the architecture of the system that
those changes require.
Ideally, the build tool or a meta-build tool can generate the
build system based on the names of files or other
information. Nevertheless, it’s inevitable that developers will need to
add new build targets and change build processes throughout the course
of development. The best build systems will be able to ameliorate
these costs and make the build process as extensible as possible.
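As a small illustration of generating targets from file names, the
sketch below maps source paths to object-file targets. The "build"
output directory, the recognized suffixes, and the example paths are
all assumptions. Under such a scheme, adding a new source file would
need no change to the build definition.

```python
from pathlib import PurePosixPath

SOURCE_SUFFIXES = {".c", ".cpp"}  # assumed set of compilable file types


def discover_targets(paths, build_dir="build"):
    # Map each object-file target to the source file that produces it.
    targets = {}
    for path in paths:
        source = PurePosixPath(path)
        if source.suffix in SOURCE_SUFFIXES:
            obj = PurePosixPath(build_dir) / source.with_suffix(".o").name
            targets[str(obj)] = str(source)
    return targets


targets = discover_targets(["src/main.c", "src/util.cpp", "README.txt"])
for target, source in targets.items():
    print(target, "from", source)
```

In a real project the path list would probably come from walking the
source tree. Note that this naming scheme would collide if two
directories contained source files with the same base name.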
User Interaction
One factor in the complexity and difficulty of building a project is
that most build operations, depending on the project, are not “rebuild
everything” operations. Often a developer or user will need to build
only a single component. The build system must provide easy-to-use
methods that allow developers to build only the parts that they need
without overbuilding.
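Building only what is needed amounts to selecting a target together
with everything it depends on, directly or indirectly. A minimal
sketch, using an invented dependency graph:

```python
# Hypothetical tasks, each mapped to the set of tasks it depends on.
graph = {
    "main.o": set(),
    "util.o": set(),
    "app": {"main.o", "util.o"},
    "docs": set(),
    "package": {"app", "docs"},
}


def needed_tasks(graph, target):
    # Collect the target and all of its transitive dependencies.
    needed = set()
    stack = [target]
    while stack:
        task = stack.pop()
        if task not in needed:
            needed.add(task)
            stack.extend(graph.get(task, ()))
    return needed


app_only = needed_tasks(graph, "app")
print(sorted(app_only))
```

Requesting "app" leaves "docs" and "package" untouched. A real tool
would also typically skip tasks whose outputs are already up to date,
which this sketch does not attempt.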