Optimize representation of runfiles in compact execution log #23321

fmeum · 2024-08-16T21:05:34Z

Runfiles trees are now represented with a custom RunfilesTree message in the compact execution log. This allows using InputSets to represent all artifacts staged at canonical locations, with only symlinks and root symlinks stored flattened and with explicit runfiles paths.

Since runfile paths can collide, this change makes it necessary to preserve the order of elements in an InputSet. The previous representation as repeated ID fields for each type (file, symlink, directory) made this impossible, so the representation has been modified to reference all direct entry IDs in a single repeated field. Since this also reduces the potential for type mismatches between the ID field type and the referenced message type, all other typed IDs are replaced with untyped ID fields. By slightly tweaking the way IDs are generated for nested entries and not emitting IDs for entries that are never referenced (e.g. Spawns), IDs are now consecutive, which simplifies the (possibly concurrent) bookkeeping for consumers by allowing them to use an array to store the entries.
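To illustrate the bookkeeping simplification for consumers, here is a minimal sketch of an array-backed entry table. The `Entry` and `EntryTable` names are hypothetical and do not mirror the real proto messages, and the sketch assumes that IDs start at 1 and arrive in increasing order, which this description does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a decoded log entry (not the real proto)."""
    id: int
    kind: str  # e.g. "file", "directory", "input_set"
    path: str = ""
    # Single ordered list of untyped entry IDs, as described above.
    input_ids: List[int] = field(default_factory=list)


class EntryTable:
    """Stores entries in a list so that an entry's ID is its index."""

    def __init__(self) -> None:
        # Assumption: IDs start at 1, so slot 0 is only a placeholder.
        self._entries: List[Optional[Entry]] = [None]

    def add(self, entry: Entry) -> None:
        # Consecutive IDs mean the next entry always lands at the end.
        if entry.id != len(self._entries):
            raise ValueError(f"expected ID {len(self._entries)}, got {entry.id}")
        self._entries.append(entry)

    def get(self, entry_id: int) -> Entry:
        entry = self._entries[entry_id]
        if entry is None:
            raise KeyError(entry_id)
        return entry

    def expand(self, input_set_id: int) -> List[Entry]:
        # Resolve direct entries in the order in which they were recorded.
        return [self.get(i) for i in self.get(input_set_id).input_ids]


table = EntryTable()
table.add(Entry(1, "directory", "pkg/tree"))
table.add(Entry(2, "file", "pkg/a.txt"))
table.add(Entry(3, "input_set", input_ids=[1, 2]))
kinds = [e.kind for e in table.expand(3)]
```

Because lookups are plain index accesses, no hash map keyed by ID is needed, and the single `input_ids` list keeps the relative order of entries of different types.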

Progress on #18643.

RELNOTES: The compact execution log now stores runfiles in a more compact representation that should reduce the memory overhead and log output size, in particular for test spawns. This change required breaking changes to the (experimental) log format.


fmeum · 2024-08-22T15:07:14Z

I resolved the conflict and subsequently fixed the existing tests after the merge of the SymlinkAction PR.


fmeum · 2024-08-30T14:26:34Z

While adding more tests, I noticed that there is a pathological case that the new representation currently doesn't handle: If multiple artifacts at canonical locations collide, then the last in nested set order wins. Runfiles always use postorder, which I can almost emulate, but the distinction by type in InputSet loses some information about the order. For example, a directory might have come before a file, but InputSet doesn't allow me to see that.

@tjgq Since entry IDs are globally unique anyway, what do you think of merging file_ids, directory_ids and unresolved_symlink_ids into a single field that preserves the order?
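The ordering problem can be sketched with a simplified model. `Artifact`, `NestedSet`, `postorder`, and `resolve` below are hypothetical helpers, not Bazel APIs, and the traversal is only an approximation of nested set postorder (transitive members first, then direct ones).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional, Set


@dataclass(frozen=True)
class Artifact:
    kind: str           # "file" or "directory"
    runfiles_path: str  # canonical location in the runfiles tree
    exec_path: str


@dataclass
class NestedSet:
    direct: List[Artifact]
    transitive: List["NestedSet"] = field(default_factory=list)


def postorder(node: NestedSet, seen: Optional[Set[int]] = None) -> Iterator[Artifact]:
    """Yields transitive members before direct ones, visiting each node once."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return
    seen.add(id(node))
    for child in node.transitive:
        yield from postorder(child, seen)
    yield from node.direct


def resolve(artifacts: Iterable[Artifact]) -> Dict[str, Artifact]:
    """On a path collision, the artifact that comes last wins."""
    mapping: Dict[str, Artifact] = {}
    for artifact in artifacts:
        mapping[artifact.runfiles_path] = artifact
    return mapping


tree_dir = Artifact("directory", "pkg/data", "bazel-out/a/pkg/data")
plain = Artifact("file", "pkg/data", "bazel-out/b/pkg/data")
outer = NestedSet(direct=[plain], transitive=[NestedSet(direct=[tree_dir])])

ordered = list(postorder(outer))
winner_ordered = resolve(ordered)["pkg/data"]

# With per-type fields, a consumer has to pick some replay order, for
# example all files first and then all directories.
files = [a for a in ordered if a.kind == "file"]
directories = [a for a in ordered if a.kind == "directory"]
winner_typed = resolve(files + directories)["pkg/data"]
```

In this model the ordered list makes the file win, while the typed replay makes the directory win, so the typed split cannot reproduce the original collision outcome.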

tjgq · 2024-08-30T14:37:41Z

> While adding more tests, I noticed that there is a pathological case that the new representation currently doesn't handle: If multiple artifacts at canonical locations collide, then the last in nested set order wins. Runfiles always use postorder, which I can almost emulate, but the distinction by type in InputSet loses some information about the order. For example, a directory might have come before a file, but InputSet doesn't allow me to see that.
>
> @tjgq Since entry IDs are globally unique anyway, what do you think of merging file_ids, directory_ids and unresolved_symlink_ids into a single field that preserves the order?

Oof, it's a disruptive change for any tools already consuming the compact format, but that's exactly why I wanted the format to remain experimental until Bazel 8 is released. Let's do it.

fmeum · 2024-09-02T12:56:37Z

@tjgq I added a bunch of tests and addressed your comments. Since we were already making breaking changes to the format, I slightly tweaked how we represent IDs also in other parts of the protocol. I can revert that part of the change if you prefer.


tjgq

Thanks for this amazing contribution. I'll get started on the import.

fmeum · 2024-09-19T11:29:19Z

@bazel-io fork 7.4.0

tjgq · 2024-09-19T12:47:21Z

I'm making the following changes on import:

* Use new field ID for `InputSet.input_ids` and `Output.output_id` instead of reusing an existing ID (makes it easier to write a polyglot tool, as discussed out of band)
* Replace occurrences of `bazel` in tests with `TestConstants.PRODUCT_NAME` (which is `blaze` internally)

Runfiles trees are now represented with a custom `RunfilesTree` message in the compact execution log. This allows using `InputSet`s to represent all artifacts staged at canonical locations, with only symlinks and root symlinks stored flattened and with explicit runfiles paths.

Since runfile paths can collide, this change makes it necessary to preserve the order of elements in an `InputSet`. The previous representation as repeated ID fields for each type (file, symlink, directory) made this impossible, so the representation has been modified to reference all direct entry IDs in a single repeated field. Since this also reduces the potential for type mismatches between the ID field type and the referenced message type, all other typed IDs are replaced with untyped ID fields. By slightly tweaking the way IDs are generated for nested entries and not emitting IDs for entries that are never referenced (e.g. `Spawn`s), IDs are now consecutive, which simplifies the (possibly concurrent) bookkeeping for consumers by allowing them to use an array to store the entries.

Progress on #18643.

RELNOTES: The compact execution log now stores runfiles in a more compact representation that should reduce the memory overhead and log output size, in particular for test spawns. This change required breaking changes to the (experimental) log format.

Closes #23321.

PiperOrigin-RevId: 676773599
Change-Id: I010653681ffa44557142bf25009e9178b5d68515

Runfiles trees are now represented with a custom `RunfilesTree` message in the compact execution log. This allows using `InputSet`s to represent all artifacts staged at canonical locations, with only symlinks and root symlinks stored flattened and with explicit runfiles paths.

Since runfile paths can collide, this change makes it necessary to preserve the order of elements in an `InputSet`. The previous representation as repeated ID fields for each type (file, symlink, directory) made this impossible, so the representation has been modified to reference all direct entry IDs in a single repeated field. Since this also reduces the potential for type mismatches between the ID field type and the referenced message type, all other typed IDs are replaced with untyped ID fields. By slightly tweaking the way IDs are generated for nested entries and not emitting IDs for entries that are never referenced (e.g. `Spawn`s), IDs are now consecutive, which simplifies the (possibly concurrent) bookkeeping for consumers by allowing them to use an array to store the entries.

Progress on bazelbuild#18643.

RELNOTES: The compact execution log now stores runfiles in a more compact representation that should reduce the memory overhead and log output size, in particular for test spawns. This change required breaking changes to the (experimental) log format.

Closes bazelbuild#23321.

PiperOrigin-RevId: 676773599
Change-Id: I010653681ffa44557142bf25009e9178b5d68515

(cherry picked from commit c2f539c)

Cherry-picks the following changes:

* Optimize representation of runfiles in compact execution log (bazelbuild#23321)
* Keep runfiles tree IDs in memory for multiple test attempts (bazelbuild#23703)
* Fix naming inconsistency in `spawn.proto` (bazelbuild#23706)
* Mark tool runfiles as such in expanded execution log (bazelbuild#23702)

The cherry-picks required introducing a `Map<Artifact, RunfilesTree>` shim to `RunfilesSupplier` that matches the Bazel 8 way of obtaining a `RunfilesTree` from a runfiles middleman via `InputMetadataProvider`.

Closes bazelbuild#23683
Closes bazelbuild#23710
Closes bazelbuild#23711
Closes bazelbuild#23734

iancha1992 · 2024-10-11T20:52:55Z

The changes in this PR have been included in Bazel 7.4.0 RC1. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.4.0rc1. Thanks!

fmeum force-pushed the runfiles-spawn-log branch from 1dfb461 to 0275370 Compare August 18, 2024 11:41

fmeum changed the title ~~Runfiles spawn log~~ Optimize representation of runfiles in compact execution log Aug 19, 2024

fmeum commented Aug 19, 2024


fmeum force-pushed the runfiles-spawn-log branch from 0275370 to f71f8ad Compare August 21, 2024 10:43

fmeum marked this pull request as ready for review August 21, 2024 10:50

fmeum requested review from a team as code owners August 21, 2024 10:50

fmeum requested review from aranguyen and removed request for a team and aranguyen August 21, 2024 10:50

github-actions bot added the team-Performance, team-Configurability, team-Remote-Exec, and awaiting-review labels Aug 21, 2024

fmeum removed the team-Configurability and team-Remote-Exec labels Aug 21, 2024

fmeum requested a review from tjgq August 21, 2024 10:59

fmeum force-pushed the runfiles-spawn-log branch from 4304e70 to 7a3e2dd Compare August 22, 2024 15:06

tjgq requested changes Aug 27, 2024


fmeum force-pushed the runfiles-spawn-log branch 2 times, most recently from 94f75fc to dfe1deb Compare August 30, 2024 13:12

fmeum force-pushed the runfiles-spawn-log branch 2 times, most recently from 7cb21e0 to 147ffc2 Compare September 2, 2024 08:45

fmeum requested a review from tjgq September 2, 2024 12:55

fmeum commented Sep 2, 2024


fmeum added 10 commits September 18, 2024 00:46

Add more docs

8637fe7

Inline var

b5efe8a

Do not flatten symlinks

dab1a5a

Better empty files test

2d87ea9

Update docs

cc18347

Simplify InputMetadataProvider

56b6861

Cleanup

818f3a0

Add docs and test

8eadcde

Simplify supporting both versions

8eb2cef

Remove unused methods

786440e

fmeum force-pushed the runfiles-spawn-log branch from a962ffa to 786440e Compare September 17, 2024 22:46

fmeum commented Sep 17, 2024


fmeum requested a review from tjgq September 17, 2024 22:49

Fix build

baa6b38

tjgq requested changes Sep 18, 2024


Address comments

f8ea5b2

tjgq approved these changes Sep 19, 2024


bazel-io mentioned this pull request Sep 19, 2024

[7.4.0] Optimize representation of runfiles in compact execution log #23683

Closed

copybara-service bot closed this in c2f539c Sep 20, 2024

fmeum deleted the runfiles-spawn-log branch September 20, 2024 10:26

fmeum mentioned this pull request Sep 27, 2024

[7.4.0] Compact execution log improvements #23713

Merged
