gh-129201: Use prefetch in GC mark alive phase. #129203
Draft
This PR implements prefetching as suggested in the issue. The benchmarks that are part of pyperformance generally don't show much difference since they don't use enough memory to make prefetch effective.
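The following is a rough Python model of the general idea, not the C code in this PR. The mark traversal does not visit each newly discovered object immediately. It issues a prefetch hint for the object and parks it in a small FIFO buffer, so that by the time the object is popped and visited, its memory has hopefully arrived in cache. `prefetch` is a hypothetical hook that stands in for the CPU prefetch instruction. It is a no-op here, so `use_prefetch=False` corresponds to "prefetch off" and gives identical results. `Obj`, `mark_alive`, and `BUFFER_SIZE` are illustrative names, and the buffer size is an assumption.

```python
from collections import deque

# Hypothetical buffer size; the real implementation's tuning may differ.
BUFFER_SIZE = 16


class Obj:
    """Toy heap object: a list of references plus a mark bit."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def prefetch(obj):
    """Stand-in for a CPU prefetch hint; does nothing in this model."""


def mark_alive(roots, use_prefetch=True):
    """Mark everything reachable from roots; return names in visit order."""
    stack = list(roots)   # objects discovered but not yet prefetched
    buffer = deque()      # prefetched objects waiting to be visited
    visited = []
    while stack or buffer:
        # Keep the buffer topped up so each object waits in it for a while
        # between its prefetch hint and its actual visit.
        while stack and len(buffer) < BUFFER_SIZE:
            obj = stack.pop()
            if use_prefetch:
                prefetch(obj)
            buffer.append(obj)
        obj = buffer.popleft()
        if obj.marked:
            continue
        obj.marked = True
        visited.append(obj.name)
        # Newly discovered references go onto the stack, not straight
        # into the buffer.
        for ref in obj.refs:
            if not ref.marked:
                stack.append(ref)
    return visited


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs = [b, c]
b.refs = [c]
print(mark_alive([a]))  # ['a', 'c', 'b']
print(d.marked)         # False: d is unreachable
```

The buffer adds distance between the hint and the use, which is what gives the hardware time to fetch the object. In the real collector, that distance only pays off when the object graph is much larger than the CPU caches.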
The benchmark results below come from my own workstation (AMD Ryzen 5 7600X, Linux, GCC 12.2.0). I'm using --enable-optimizations, but LTO is turned off. "Prefetch off" means the prefetch CPU instruction is omitted; otherwise the code is the same.
Something bad happens with the default build's GC if gc.disable() is not used while creating a big data structure. When the GC is left enabled during construction, the "gc big" benchmark takes 5x as long. My guess is that the generation 0 collections are shuffling the objects in the GC next/prev linked list, but that's only a guess.
The two "big" benchmarks, "gc big tree" and "gc big", create a fairly large object graph and then call gc.collect() to time it.
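The benchmarks' original source is not included here. The following is a hedged sketch of what a "gc big tree"-style benchmark might look like. It builds a large tree of nested lists, optionally wraps construction in gc.disable()/gc.enable() so that no automatic collections run while allocating, and then times one full gc.collect(). `make_tree`, `time_full_collect`, and the default depth and fan-out are assumptions, and nothing here claims to reproduce the 5x slowdown measured above.

```python
import gc
import time


def make_tree(depth, fanout):
    """Build a nested-list tree; every node is a GC-tracked container."""
    if depth == 0:
        return []
    return [make_tree(depth - 1, fanout) for _ in range(fanout)]


def time_full_collect(depth=6, fanout=8, disable_during_build=True):
    """Build a large object graph, then return seconds spent in gc.collect()."""
    was_enabled = gc.isenabled()
    if disable_during_build:
        gc.disable()  # no automatic (generation 0) collections while allocating
    try:
        tree = make_tree(depth, fanout)
    finally:
        if was_enabled:
            gc.enable()  # restore the caller's GC state
    t0 = time.perf_counter()
    gc.collect()  # full collection; every node of `tree` is still reachable
    elapsed = time.perf_counter() - t0
    del tree
    return elapsed


if __name__ == "__main__":
    for disabled in (True, False):
        secs = time_full_collect(disable_during_build=disabled)
        print(f"gc disabled during build={disabled}: {secs:.4f}s")
```

With the defaults, the tree holds about 300,000 lists. That is probably smaller than the PR's benchmarks, so depth and fan-out would need raising before cache misses dominate.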
The "bm_gc_collect" benchmark was taken from pyperformance with the constants adjusted to CYCLES = 100_000 and LINKS = 40.
The "prefetch (7f756eb0)" code branch is essentially the same as this PR (1b4e8c3); I just rebased it on the current main and removed some dead code.
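For readers without pyperformance at hand, the following is a hedged approximation of what a bm_gc_collect-style run does with those constants, not a copy of pyperformance's source. It creates CYCLES reference cycles of LINKS + 1 nodes each, drops the only references to them, and times the gc.collect() that reclaims them. `Node`, `create_cycle`, `create_gc_cycles`, and `benchmark_collection` are illustrative names, and time.perf_counter() stands in for pyperf's timer.

```python
import gc
import time

# Constants as adjusted for the runs described above.
CYCLES = 100_000
LINKS = 40


class Node:
    __slots__ = ("edges",)

    def __init__(self):
        self.edges = []


def create_cycle(head, n_links):
    """Chain n_links new nodes after head and link the last one back to head."""
    current = head
    for _ in range(n_links):
        nxt = Node()
        current.edges.append(nxt)
        current = nxt
    current.edges.append(head)


def create_gc_cycles(n_cycles, n_links):
    """Create n_cycles cycles of n_links + 1 nodes each; return their heads."""
    heads = []
    for _ in range(n_cycles):
        head = Node()
        create_cycle(head, n_links)
        heads.append(head)
    return heads


def benchmark_collection(n_cycles=CYCLES, n_links=LINKS):
    """Return (seconds, objects found unreachable) for one collection."""
    gc.collect()  # start from a clean heap
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from reclaiming cycles early
    try:
        heads = create_gc_cycles(n_cycles, n_links)
        del heads  # every cycle is now unreachable garbage
        t0 = time.perf_counter()
        collected = gc.collect()
        elapsed = time.perf_counter() - t0
    finally:
        if was_enabled:
            gc.enable()
    return elapsed, collected


if __name__ == "__main__":
    print(benchmark_collection(1_000, LINKS))
```

With the full constants, this creates CYCLES × (LINKS + 1) = 4.1 million nodes, each with its own edge list, so it needs a lot of memory. Smaller values like the ones in the `__main__` block are enough to try it out.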