Previously we explored a way to have some degree of dynamic dependencies in Bazel (input "subsetting") using TreeArtifacts.
The particular use case modeled in the previous gist involved:
- a language with somewhat coarse `library` and `binary` rules (i.e. each rule describes a collection of files and their collective dependencies; the default for most Bazel rulesets)
- a monolithic (and slow!) compiler whose compilation unit is the entire binary (rather than something smaller like modules or source files)
  - i.e. exacerbating the pain of not having "perfect" file-level dependency information
- source files that can be easily (and quickly) scanned to determine which dependencies are unused
To increase cache hit rates (a.k.a. to keep binaries from being rebuilt as a result of changes to inputs that are not actually used to produce the binary being built), the previous gist employs a "scan-deps" action that runs before the compiler is invoked and "winnows" the set of inputs, producing a subset in a TreeArtifact.
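The winnowing step can be sketched as a transitive include scan. This is a Python model, not the gist's actual tool. It assumes a hypothetical toy syntax where a header pulls in another header with an `include(<name>)` line, as in the `echo "include(d)" >> a.header` step of the transcript. It also assumes the root header of the binary is known. `scan_deps` and the header graph are illustrative names and data only.

```python
import re

# Assumed toy syntax: a header pulls in another header with a line like `include(d)`.
INCLUDE_RE = re.compile(r"^\s*include\((\w+)\)\s*$")


def scan_deps(headers: dict[str, str], roots: set[str]) -> set[str]:
    """Return the headers transitively reachable from `roots`.

    Every declared header that is not returned is an unused input: the compile
    step does not need to see it, and should not be invalidated by it.
    """
    used: set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in used:
            continue
        if name not in headers:
            raise ValueError(f"{name!r} is included but not declared as an input")
        used.add(name)
        for line in headers[name].splitlines():
            match = INCLUDE_RE.match(line)
            if match:
                stack.append(match.group(1))
    return used


# Hypothetical header graph: a -> b -> c, and d -> e (which nothing reaches from a).
headers = {
    "a": "include(b)\n",
    "b": "include(c)\n",
    "c": "",
    "d": "include(e)\n",
    "e": "",
}
used = scan_deps(headers, {"a"})
print(f"{len(headers)} header(s) available")  # 5 header(s) available
print(f"{len(used)} header(s) used")  # 3 header(s) used
```

Only the surviving set (`a`, `b`, `c` here) needs to reach the compiler. Appending `include(d)` to `a` would make all five headers used.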
**Important:** As mentioned in the previous gist, normally the way to model this in Bazel is to use `unused_inputs_list`, i.e.:
- the compiler action produces a list of the input files it did not actually need
- subsequent builds (at least those sharing the same Skyframe in-memory state or persistent action cache) will not rerun the compiler action when only inputs it declared as unused have changed
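A toy model of this behavior helps show where the pruning information lives. It makes a simplifying assumption: a cache entry records the inputs the tool reported as unused plus a digest of the remaining inputs, and only the daemon that wrote the entry consults it. `LocalActionCache` and `run_action` are hypothetical names, not Bazel's actual cache format.

```python
import hashlib


def digest(files: dict[str, str], names) -> str:
    """Content hash over a subset of the inputs (names and contents)."""
    h = hashlib.sha256()
    for name in sorted(names):
        h.update(f"{name}\0{files[name]}\0".encode())
    return h.hexdigest()


class LocalActionCache:
    """Hypothetical per-daemon cache honoring a tool-reported unused-inputs list."""

    def __init__(self) -> None:
        # action key -> (inputs reported unused, digest of the remaining inputs)
        self.entries: dict[str, tuple[frozenset[str], str]] = {}
        self.executions = 0

    def run_action(self, key, files, declared, tool) -> None:
        entry = self.entries.get(key)
        if entry is not None:
            unused, old_digest = entry
            if digest(files, set(declared) - unused) == old_digest:
                return  # cache hit: only inputs reported as unused (if any) changed
        self.executions += 1
        unused = frozenset(tool(files))  # the tool's unused-inputs list
        self.entries[key] = (unused, digest(files, set(declared) - unused))


def report_e_unused(files):
    """Hypothetical tool that reports e.header as unused."""
    return {"e.header"}


files = {"a.header": "include(b)", "b.header": "", "e.header": ""}
declared = set(files)
laptop = LocalActionCache()
laptop.run_action("compile a", files, declared, report_e_unused)
files["e.header"] += "hey"
laptop.run_action("compile a", files, declared, report_e_unused)  # cache hit
ci = LocalActionCache()  # a fresh daemon, e.g. on a CI machine
ci.run_action("compile a", files, declared, report_e_unused)  # executes again
print(laptop.executions, ci.executions)  # 1 1
```

The unused list exists only inside `laptop`, so the fresh `ci` instance cannot benefit from it. The same model also shows the correctness hazard: if `tool` over-reports a used input as unused, later changes to that input are silently ignored.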
There are a couple of reasons to prefer a priori (explicit) input pruning to the `unused_inputs_list` approach:
- incremental build correctness:
  - `unused_inputs_list` hinges on the tool accurately describing which inputs it does not need
    - if the tool incorrectly lists an input as unused:
      - changes to that input will not result in rebuilds of the action
      - clean builds (where the action is rerun) will return different results (i.e. a correctness issue)
  - in contrast, with a priori input pruning the compiler action is executed (and sandboxed) such that the unused inputs are not available:
    - if the analysis of which inputs are unused was incorrect, the action will fail to execute
- CI/remote caching:
  - in Bazel (and `buck2`), `unused_inputs_list` is only able to eliminate rebuilds if an action has already executed on a particular daemon
    - i.e. information about which inputs have been "pruned" from the action cannot make it into the remote cache; it lives on a particular machine, either in memory (Skyframe) or in the persistent action cache (on disk, in the output base)
    - this means that for CI setups without a persistent Bazel daemon, CI will be unable to leverage this unused-dependency information and will need to rebuild
    - this is where `buck2` is a little different; see "remote dep files"
  - in contrast, with a priori input pruning, the "scan-deps" action will have to rerun when unused inputs change but the subsequent compiler action will not (it will hit in the cache; i.e. early cutoff, "ECO")
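For contrast, here is a sketch of the a priori approach under the same toy assumptions. The compile step is keyed only on the contents of the inputs that survived pruning, so any machine sharing a content-addressed cache (modeled here as a plain dict) can reuse the result. The compile step also only ever sees those inputs, so a wrong pruning decision surfaces as an error rather than a stale result. `fake_compile` and `compile_pruned` are hypothetical stand-ins for the real compiler and action.

```python
import hashlib
import re

INCLUDE_RE = re.compile(r"^\s*include\((\w+)\)\s*$")  # assumed toy include syntax


def digest(files: dict[str, str], names) -> str:
    h = hashlib.sha256()
    for name in sorted(names):
        h.update(f"{name}\0{files[name]}\0".encode())
    return h.hexdigest()


def fake_compile(sandbox: dict[str, str], root: str) -> str:
    """Hypothetical compiler: concatenates `root` and everything it includes.

    It can only read what is in `sandbox`, mirroring a sandboxed action that
    fails loudly when it needs an input that was pruned away.
    """
    seen: set[str] = set()
    out: list[str] = []
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        if name not in sandbox:
            raise FileNotFoundError(f"{name} is not in the sandbox")
        seen.add(name)
        out.append(sandbox[name])
        for line in sandbox[name].splitlines():
            match = INCLUDE_RE.match(line)
            if match:
                stack.append(match.group(1))
    return "".join(out)


def compile_pruned(files, used, root, cache) -> tuple[str, bool]:
    """Run (or reuse) the compile step; returns (output, was_cache_hit)."""
    key = digest(files, used)  # unused inputs never contribute to the key
    if key in cache:
        return cache[key], True
    output = fake_compile({name: files[name] for name in used}, root)
    cache[key] = output
    return output, False


files = {"a": "include(b)\n", "b": "include(c)\n", "c": "", "d": "include(e)\n", "e": ""}
shared_cache: dict[str, str] = {}  # stands in for a cache shared by all machines
first, _ = compile_pruned(files, {"a", "b", "c"}, "a", shared_cache)  # executes
files["e"] += "hey\n"  # change an input that was pruned away
second, hit = compile_pruned(files, {"a", "b", "c"}, "a", shared_cache)  # cache hit
```

The "scan-deps" step would still rerun after the edit to `e` (it declares every header). But it produces the same pruned set, so the compile step hits regardless of which machine asks.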
The downsides to a priori input pruning (with TreeArtifacts) are:
- longer critical path; in the case where you do have to rerun the actual action (i.e. run the compiler) you're doing work twice
  - reading in the files once to scan for deps and again when the actual action runs
  - just running the compiler would have been faster
  - (the assumption is that we're dealing with a tool where )
- more complexity
  - you have to produce a "copy" (or at least a symlink-tree subset) of the inputs and feed it to the actual action, which can lead to some annoying issues
    - e.g. paths in diagnostics looking "wrong"
In this gist we model the same use case as the original gist, except using `shadowed_action` instead of subsetting inputs via TreeArtifact symlinks.
The upsides here are mostly ergonomics/reduced complexity:
- no need to adjust inputs, or paths in flags/diagnostics output, for the actual action
- lets us sidestep `TreeArtifact` weirdness
**Tip:** The "double" aspect is our way of smuggling the `ScanDeps` action (which we want to produce as part of an aspect) into the rule that wishes to use it as a `shadowed_action`.
(You can get the `Action` directly within the confines of one rule using `_skylark_testable` and `ctx.created_actions()`, but that's not what that API is intended for...)
**Warning:** An artificial dependency edge between the `ScanDeps` action and the `Compile` action (i.e. `Compile` depending on `ScanDeps`) is required to force the `ScanDeps` action to be built, and its inputs to be pruned, before the `Compile` action runs.
Otherwise, the `Compile` action will just run with the full (unpruned) set of inputs.
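A toy illustration of why the edge matters follows. It assumes, as the warning describes, that the list of inputs `ScanDeps` pruned only exists once `ScanDeps` has executed. Actions are ordered with Python's `graphlib.TopologicalSorter`. The graph, the header names, and `compile_inputs` are hypothetical.

```python
from graphlib import TopologicalSorter

ALL_HEADERS = frozenset({"a", "b", "c", "d", "e"})  # hypothetical declared inputs
UNUSED = frozenset({"d", "e"})  # what ScanDeps would report as unused


def compile_inputs(graph: dict[str, set[str]]) -> frozenset[str]:
    """Execute actions in dependency order and return what Compile inherited."""
    known_unused = None  # unknown until ScanDeps has executed
    seen_by_compile = ALL_HEADERS
    for action in TopologicalSorter(graph).static_order():
        if action == "ScanDeps":
            known_unused = UNUSED
        elif action == "Compile":
            # Without a pruning result yet, the full declared set is inherited.
            seen_by_compile = ALL_HEADERS - (known_unused or frozenset())
    return seen_by_compile


# With the artificial edge, Compile consumes an output of ScanDeps, so ScanDeps
# must run first and Compile only sees the used headers.
with_edge = {"Compile": {"ScanDeps"}, "ScanDeps": set()}
# Without it, the two actions are independent and may run in either order.
without_edge = {"Compile": set(), "ScanDeps": set()}
```

The order `without_edge` produces is an implementation detail of the sorter. The point is that nothing forces `ScanDeps` to go first.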
**Important:** This patch is needed to get Bazel to have `Compile` inherit from the shadowed action only the inputs that survived pruning.
Without this patch, `Compile` inherits all the inputs from `ScanDeps`, including the pruned inputs.
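Bazel documents `shadowed_action` (on `ctx.actions.run`) as adding the shadowed action's inputs to the action's own inputs. Here is a rough model of the difference the patch makes. It assumes `ScanDeps` records its pruned headers via `unused_inputs_list`, which the `a.inner.unused_hdrs` output name in the transcript suggests but this sketch does not confirm. Unpatched, `Compile` inherits the full declared input set of `ScanDeps`; patched, it inherits only the inputs not listed as unused. The `Action` class below is a hypothetical stand-in, not Bazel's `Action` type.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical stand-in for a registered action."""
    mnemonic: str
    inputs: frozenset[str]
    # Contents of the action's unused-inputs list once it has executed.
    unused_inputs: frozenset[str] = frozenset()


def effective_inputs(action: Action, shadowed: Action, patched: bool) -> frozenset[str]:
    """Inputs the shadowing action runs with: its own plus the inherited ones."""
    inherited = shadowed.inputs
    if patched:
        inherited = inherited - shadowed.unused_inputs  # only the surviving inputs
    return action.inputs | inherited


headers = frozenset({"a.header", "b.header", "c.header", "d.header", "e.header"})
scan = Action("ScanDeps", headers, unused_inputs=frozenset({"d.header", "e.header"}))
# Assumed: Compile's own input is the ScanDeps output (the artificial dependency).
compile_ = Action("Compile", frozenset({"a.inner.unused_hdrs"}))

print(len(effective_inputs(compile_, scan, patched=False) & headers))  # 5
print(len(effective_inputs(compile_, scan, patched=True) & headers))  # 3
```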
When running with the patch linked above you should see:
```console
❯ bazel build //:a
INFO: From ScanDeps a.inner.unused_hdrs:
5 header(s) available
3 header(s) used
INFO: From Compile a:
3 header(s) available
Target //:a up-to-date:
bazel-bin/a
```

```console
❯ echo "hey" >> e.header
❯ bazel build //:a
Target //:a up-to-date:
bazel-bin/a
INFO: Elapsed time: 0.047s, Critical Path: 0.00s
INFO: 1 process: 2 action cache hit, 1 internal.
# no rebuilds
```

```console
❯ echo "hey" >> c.header
❯ bazel build //:a
INFO: From ScanDeps a.inner.unused_hdrs:
5 header(s) available
3 header(s) used
INFO: From Compile a:
3 header(s) available
INFO: Found 1 target...
Target //:a up-to-date:
bazel-bin/a
INFO: Elapsed time: 0.072s, Critical Path: 0.04s
INFO: 3 processes: 1 internal, 2 linux-sandbox.
# `a` *is* rebuilt
```

```console
❯ echo "include(d)" >> a.header
❯ bazel build //:a
INFO: From ScanDeps a.inner.unused_hdrs:
5 header(s) available
5 header(s) used
INFO: From Compile a:
5 header(s) available
INFO: Found 1 target...
Target //:a up-to-date:
bazel-bin/a
INFO: Elapsed time: 0.120s, Critical Path: 0.06s
INFO: 3 processes: 1 internal, 2 linux-sandbox.
# `a` is rebuilt; d and e were now made available to the `Compile` action
```
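The transcript can be replayed against a toy model that combines the two mechanisms this gist relies on. First, `ScanDeps` prunes via a local (per-daemon) unused-inputs list. Second, `Compile`, with the patch, inherits only the headers `ScanDeps` found to be used. The header graph (a -> b -> c and d -> e, rooted at `a`) is a guess chosen to be consistent with the counts shown, not taken from the gist. `Daemon` is a hypothetical name.

```python
import hashlib
import re

INCLUDE_RE = re.compile(r"^\s*include\((\w+)\)\s*$")  # assumed toy include syntax


def digest(files: dict[str, str], names) -> str:
    h = hashlib.sha256()
    for name in sorted(names):
        h.update(f"{name}\0{files[name]}\0".encode())
    return h.hexdigest()


def reachable(files: dict[str, str], root: str) -> frozenset[str]:
    used, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name not in used:
            used.add(name)
            for line in files[name].splitlines():
                match = INCLUDE_RE.match(line)
                if match:
                    stack.append(match.group(1))
    return frozenset(used)


class Daemon:
    """Hypothetical model of one Bazel server's local action cache for this gist."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.scan_entry = None  # (headers reported used, digest of those headers)
        self.compile_keys: set[str] = set()  # input digests Compile has run with

    def build(self, files: dict[str, str]) -> tuple[list[str], int, int]:
        ran = []
        # ScanDeps declares every header, but its unused-inputs list means it only
        # reruns when a header it previously reported as used changes.
        if self.scan_entry is None or digest(files, self.scan_entry[0]) != self.scan_entry[1]:
            used = reachable(files, self.root)
            self.scan_entry = (used, digest(files, used))
            ran.append("ScanDeps")
        used = self.scan_entry[0]
        # Compile shadows ScanDeps and (with the patch) inherits only `used`.
        key = digest(files, used)
        if key not in self.compile_keys:
            self.compile_keys.add(key)
            ran.append("Compile")
        return ran, len(files), len(used)


files = {"a": "include(b)\n", "b": "include(c)\n", "c": "", "d": "include(e)\n", "e": ""}
daemon = Daemon("a")
print(daemon.build(files))  # (['ScanDeps', 'Compile'], 5, 3)
files["e"] += "hey\n"
print(daemon.build(files))  # ([], 5, 3): nothing reruns
files["c"] += "hey\n"
print(daemon.build(files))  # (['ScanDeps', 'Compile'], 5, 3)
files["a"] += "include(d)\n"
print(daemon.build(files))  # (['ScanDeps', 'Compile'], 5, 5)
```

Because all of this state lives in one `Daemon` instance, a fresh instance (e.g. a CI run without a persistent server) would rerun `ScanDeps` on its first build.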