Optimize representation of runfiles in compact execution log #23321
I resolved the conflict and subsequently fixed the existing tests after the merge of the |
While adding more tests, I noticed that there is a pathological case that the new representation currently doesn't handle: If multiple artifacts at canonical locations collide, then the last in nested set order wins. Runfiles always use postorder, which I can almost emulate, but the distinction by type in @tjgq Since entry IDs are globally unique anyway, what do you think of merging |
Oof, it's a disruptive change for any tools already consuming the compact format, but that's exactly why I wanted the format to remain experimental until Bazel 8 is released. Let's do it. |
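The collision rule discussed above (the last entry in nested set order wins, with runfiles using postorder) can be illustrated with a minimal sketch. The `Entry` and `Node` classes and the `resolve` method are hypothetical names invented for this illustration and are not Bazel's actual `NestedSet` or `Runfiles` API. The traversal is a simplified postorder that visits transitive sets before direct entries and does not deduplicate shared subsets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RunfilesCollisionSketch {
  /** Hypothetical pairing of a runfiles path with the artifact staged there. */
  static final class Entry {
    final String runfilesPath;
    final String artifact;

    Entry(String runfilesPath, String artifact) {
      this.runfilesPath = runfilesPath;
      this.artifact = artifact;
    }
  }

  /** Hypothetical stand-in for a nested set: transitive sets plus direct entries. */
  static final class Node {
    final List<Node> transitive;
    final List<Entry> direct;

    Node(List<Node> transitive, List<Entry> direct) {
      this.transitive = transitive;
      this.direct = direct;
    }
  }

  /** Simplified postorder: all transitive sets first, then this node's direct entries. */
  static void flatten(Node node, List<Entry> out) {
    for (Node child : node.transitive) {
      flatten(child, out);
    }
    out.addAll(node.direct);
  }

  /** Maps each runfiles path to the artifact that wins, i.e. the last one in iteration order. */
  static Map<String, String> resolve(Node root) {
    List<Entry> ordered = new ArrayList<>();
    flatten(root, ordered);
    Map<String, String> staged = new LinkedHashMap<>();
    for (Entry e : ordered) {
      staged.put(e.runfilesPath, e.artifact); // later entries overwrite earlier ones
    }
    return staged;
  }

  public static void main(String[] args) {
    Node dep = new Node(List.of(), List.of(new Entry("pkg/data.txt", "dep/data.txt")));
    Node root = new Node(List.of(dep), List.of(new Entry("pkg/data.txt", "root/data.txt")));
    System.out.println(resolve(root));
  }
}
```

In this sketch the root's direct entry is visited after the entry from `dep`, so it is the one staged at `pkg/data.txt`. A consumer can only reproduce that outcome if the log preserves element order across entry types, which is the motivation for the question about merging the typed fields.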
@tjgq I added a bunch of tests and addressed your comments. Since we were already making breaking changes to the format, I also slightly tweaked how we represent IDs in other parts of the protocol. I can revert that part of the change if you prefer.
Thanks for this amazing contribution. I'll get started on the import.
@bazel-io fork 7.4.0 |
I'm making the following changes on import:
|
Runfiles trees are now represented with a custom `RunfilesTree` message in the compact execution log. This allows using `InputSet`s to represent all artifacts staged at canonical locations, with only symlinks and root symlinks stored flattened and with explicit runfiles paths.

Since runfiles paths can collide, this change makes it necessary to preserve the order of elements in an `InputSet`. The previous representation as repeated ID fields for each type (file, symlink, directory) made this impossible, so the representation has been modified to reference all direct entry IDs in a single repeated field. Since this also reduces the potential for type mismatches between the ID field type and the referenced message type, all other typed IDs are replaced with untyped ID fields.

By slightly tweaking the way IDs are generated for nested entries and not emitting IDs for entries that are never referenced (e.g. `Spawn`s), IDs are now consecutive, which simplifies the (possibly concurrent) bookkeeping for consumers by allowing them to use an array to store the entries.

Progress on #18643.

RELNOTES: The compact execution log now stores runfiles in a more compact representation that should reduce the memory overhead and log output size, in particular for test spawns. This change required breaking changes to the (experimental) log format.

Closes #23321.

PiperOrigin-RevId: 676773599
Change-Id: I010653681ffa44557142bf25009e9178b5d68515
Runfiles trees are now represented with a custom `RunfilesTree` message in the compact execution log. This allows using `InputSet`s to represent all artifacts staged at canonical locations, with only symlinks and root symlinks stored flattened and with explicit runfiles paths.

Since runfiles paths can collide, this change makes it necessary to preserve the order of elements in an `InputSet`. The previous representation as repeated ID fields for each type (file, symlink, directory) made this impossible, so the representation has been modified to reference all direct entry IDs in a single repeated field. Since this also reduces the potential for type mismatches between the ID field type and the referenced message type, all other typed IDs are replaced with untyped ID fields.

By slightly tweaking the way IDs are generated for nested entries and not emitting IDs for entries that are never referenced (e.g. `Spawn`s), IDs are now consecutive, which simplifies the (possibly concurrent) bookkeeping for consumers by allowing them to use an array to store the entries.

Progress on bazelbuild#18643.

RELNOTES: The compact execution log now stores runfiles in a more compact representation that should reduce the memory overhead and log output size, in particular for test spawns. This change required breaking changes to the (experimental) log format.

Closes bazelbuild#23321.

PiperOrigin-RevId: 676773599
Change-Id: I010653681ffa44557142bf25009e9178b5d68515

(cherry picked from commit c2f539c)
Cherry-picks the following changes:

* Optimize representation of runfiles in compact execution log (bazelbuild#23321)
* Keep runfiles tree IDs in memory for multiple test attempts (bazelbuild#23703)
* Fix naming inconsistency in `spawn.proto` (bazelbuild#23706)
* Mark tool runfiles as such in expanded execution log (bazelbuild#23702)

The cherry-picks required introducing a `Map<Artifact, RunfilesTree>` shim to `RunfilesSupplier` that matches the Bazel 8 way of obtaining a `RunfilesTree` from a runfiles middleman via `InputMetadataProvider`.

Closes bazelbuild#23683
Closes bazelbuild#23710
Closes bazelbuild#23711
Closes bazelbuild#23734
The changes in this PR have been included in Bazel 7.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. |
Runfiles trees are now represented with a custom `RunfilesTree` message in the compact execution log. This allows using `InputSet`s to represent all artifacts staged at canonical locations, with only symlinks and root symlinks stored flattened and with explicit runfiles paths.

Since runfiles paths can collide, this change makes it necessary to preserve the order of elements in an `InputSet`. The previous representation as repeated ID fields for each type (file, symlink, directory) made this impossible, so the representation has been modified to reference all direct entry IDs in a single repeated field. Since this also reduces the potential for type mismatches between the ID field type and the referenced message type, all other typed IDs are replaced with untyped ID fields.

By slightly tweaking the way IDs are generated for nested entries and not emitting IDs for entries that are never referenced (e.g. `Spawn`s), IDs are now consecutive, which simplifies the (possibly concurrent) bookkeeping for consumers by allowing them to use an array to store the entries.

Progress on #18643.

RELNOTES: The compact execution log now stores runfiles in a more compact representation that should reduce the memory overhead and log output size, in particular for test spawns. This change required breaking changes to the (experimental) log format.
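As an illustration of why consecutive IDs simplify consumer bookkeeping, here is a minimal sketch of an array-backed entry table. The class `EntryTableSketch` and its methods are hypothetical and not part of Bazel. The sketch assumes that IDs start at 1 and that entries arrive in ID order on a single thread; a concurrent consumer would need additional synchronization.

```java
import java.util.ArrayList;
import java.util.List;

final class EntryTableSketch<T> {
  // Index i holds the entry with ID i + 1 (assumption: IDs start at 1).
  private final List<T> entries = new ArrayList<>();

  /** Stores an entry, checking that its ID is the next consecutive one. */
  void put(int id, T entry) {
    int expected = entries.size() + 1;
    if (id != expected) {
      throw new IllegalArgumentException("expected ID " + expected + " but got " + id);
    }
    entries.add(entry);
  }

  /** Looks up a previously stored entry by ID in constant time. */
  T get(int id) {
    if (id < 1 || id > entries.size()) {
      throw new IllegalArgumentException("unknown ID " + id);
    }
    return entries.get(id - 1);
  }

  int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    EntryTableSketch<String> table = new EntryTableSketch<>();
    table.put(1, "file: pkg/a.txt");
    table.put(2, "input set: [1]");
    System.out.println(table.get(2));
  }
}
```

With gaps in the ID sequence, a consumer would typically fall back to a hash map keyed by ID. With consecutive IDs, a growable array is enough and a lookup is a plain index access.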