gh-129201: Use prefetch in GC mark alive phase. #129203

Draft: wants to merge 5 commits into main
@nascheme nascheme commented Jan 22, 2025

This PR implements prefetching as suggested in the issue. The benchmarks that are part of pyperformance generally don't show much difference since they don't use enough memory to make prefetch effective.

The benchmark results below come from my own workstation (AMD Ryzen 5 7600X, Linux, GCC 12.2.0). I'm using --enable-optimizations but LTO is turned off. "Prefetch off" means the prefetch CPU instruction is omitted; otherwise the code is the same.

Something bad is happening with the default build GC if gc.disable() is not used when creating a big data structure. When the GC is enabled, the "gc big" benchmark takes 5x as long. My guess is that the generation 0 collections are shuffling the objects in the GC next/prev linked list, but that's only a guess.
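One way to observe the effect described above is to build the same structure twice, once with the automatic collector enabled and once under gc.disable(), and time only the final gc.collect(). The following is a minimal sketch, not the benchmark used for the table; `build_graph`, `time_full_collect`, and the node count are assumptions chosen for illustration, and the size of any difference will depend on the build and the machine.

```python
import gc
import time


def build_graph(n):
    """Build n small lists, each referencing its predecessor (hypothetical shape)."""
    nodes = [[] for _ in range(n)]
    for i in range(1, n):
        nodes[i].append(nodes[i - 1])
    return nodes


def time_full_collect(n, disable_during_build):
    """Return seconds spent in one full gc.collect() after building the graph."""
    was_enabled = gc.isenabled()
    if disable_during_build:
        gc.disable()
    else:
        gc.enable()
    try:
        graph = build_graph(n)
        t0 = time.perf_counter()
        gc.collect()
        elapsed = time.perf_counter() - t0
    finally:
        # Restore whatever collector state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    del graph
    return elapsed


if __name__ == "__main__":
    n = 200_000  # illustrative size, far smaller than a memory-bound benchmark
    print("gc enabled during build: ", time_full_collect(n, False))
    print("gc disabled during build:", time_full_collect(n, True))
```

Only the final collection is timed, so any difference between the two runs reflects the state the heap was left in by the build phase rather than the cost of the build itself.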

Source code for the two "big" benchmarks, which create a fairly large object graph and then call gc.collect() to time it:

gc big tree
gc big

The "bm_gc_collect" benchmark was taken from pyperformance and the constants adjusted: CYCLES = 100_000 and LINKS = 40.
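For readers unfamiliar with that benchmark, the general shape of a cycle-collection benchmark parameterized by CYCLES and LINKS can be sketched as follows. This is an illustrative approximation rather than the pyperformance source; the `Node` class and the helper names are assumptions.

```python
import gc
import time

CYCLES = 100_000  # adjusted value quoted above
LINKS = 40        # adjusted value quoted above


class Node:
    """A doubly linked node; next/prev references form reference cycles."""

    def __init__(self):
        self.next = None
        self.prev = None


def create_cycle(n_links):
    """Create a ring of n_links + 1 nodes and return its first node."""
    first = Node()
    current = first
    for _ in range(n_links):
        nxt = Node()
        current.next = nxt
        nxt.prev = current
        current = nxt
    # Close the ring so the nodes keep each other alive.
    current.next = first
    first.prev = current
    return first


def time_collection(n_cycles, n_links):
    """Build garbage rings, drop them, and time the gc.collect() that frees them."""
    gc.collect()  # start from a clean state
    cycles = [create_cycle(n_links) for _ in range(n_cycles)]
    del cycles  # the rings are now unreachable but kept alive by their cycles
    t0 = time.perf_counter()
    collected = gc.collect()
    elapsed = time.perf_counter() - t0
    return collected, elapsed
```

Calling `time_collection(CYCLES, LINKS)` would allocate about 4.1 million `Node` objects (100,000 rings of 41 nodes each), which is what pushes the working set well beyond the CPU caches.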

The "prefetch (7f756eb0)" code branch is essentially the same as this PR (1b4e8c3). I just rebased it on the current main and removed some dead code.

| code branch | build | prefetch on | benchmark | time [ms] |
|---|---|---|---|---:|
| prefetch (7f756eb0) | nogil | yes | bm_async_tree_io_tg | 6,752 |
| prefetch (7f756eb0) | nogil | no | bm_async_tree_io_tg | 6,964 |
| main (3829104) | nogil | - | bm_async_tree_io_tg | 6,960 |
| main (3829104) | default | - | bm_async_tree_io_tg | 7,323 |
| prefetch (7f756eb0) | nogil | yes | bm_gc_collect | 1,418 |
| prefetch (7f756eb0) | nogil | no | bm_gc_collect | 1,517 |
| main (3829104) | nogil | - | bm_gc_collect | 1,473 |
| main (3829104) | default | - | bm_gc_collect | 20,966 |
| prefetch (7f756eb0) | nogil | yes | gc big | 587 |
| prefetch (7f756eb0) | nogil | no | gc big | 1,064 |
| main (3829104) | nogil | - | gc big | 963 |
| main (3829104) | default | - | gc big | 5,785 |
| prefetch (7f756eb0) | nogil | yes | gc big tree | 363 |
| prefetch (7f756eb0) | nogil | no | gc big tree | 519 |
| main (3829104) | nogil | - | gc big tree | 723 |
| main (3829104) | default | - | gc big tree | 3,443 |
| main (3829104) | default | - | gc big (gc disabled) | 641 |
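To make the prefetch effect easier to read off the results, the ratio between the "prefetch off" and "prefetch on" rows can be computed directly. The snippet below only does arithmetic on the numbers reported above; the dictionary layout and the function name are illustrative.

```python
# Times in milliseconds, copied from the nogil "prefetch (7f756eb0)" rows above.
TIMES_MS = {
    "bm_async_tree_io_tg": {"on": 6752, "off": 6964},
    "bm_gc_collect": {"on": 1418, "off": 1517},
    "gc big": {"on": 587, "off": 1064},
    "gc big tree": {"on": 363, "off": 519},
}


def speedup(times):
    """Return how many times faster the prefetch-on run is than the prefetch-off run."""
    return times["off"] / times["on"]


for name, times in TIMES_MS.items():
    print(f"{name}: {speedup(times):.2f}x")
# bm_async_tree_io_tg: 1.03x
# bm_gc_collect: 1.07x
# gc big: 1.81x
# gc big tree: 1.43x
```

The two memory-heavy "big" benchmarks show the largest ratios, which matches the observation that the pyperformance-sized workloads use too little memory for prefetching to matter much.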

When traversing a list or tuple, use a "span" if the buffer can't hold
all the items from the collection.  This reduces the size of the object
stack needed if large collections are encountered.  It also helps
keep the buffer size optimal for prefetching.
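The span idea can be modelled in a few lines of Python. This is a conceptual sketch only, not the C code in this PR: `mark_alive`, the `(seq, start)` tuple representation, and the default buffer size are all assumptions, and the model only follows references held by lists and tuples. A bounded FIFO stands in for the prefetch buffer, and a sequence that does not fit is pushed onto the stack as one span entry instead of one entry per item.

```python
from collections import deque


def mark_alive(roots, buffer_size=4):
    """Mark everything reachable from roots through lists and tuples.

    Returns (set of ids of marked objects, peak length of the overflow stack).
    """
    marked = set()
    buffer = deque()      # bounded FIFO; the C code would prefetch on entry
    stack = [(roots, 0)]  # each entry is a span: (sequence, next index to read)
    peak = len(stack)

    while buffer or stack:
        # Refill the buffer from spans until it is full or the stack is empty.
        while stack and len(buffer) < buffer_size:
            seq, start = stack.pop()
            stop = min(len(seq), start + buffer_size - len(buffer))
            buffer.extend(seq[start:stop])
            if stop < len(seq):
                # The unread tail goes back as a single span entry.
                stack.append((seq, stop))
        if not buffer:
            continue

        obj = buffer.popleft()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        if isinstance(obj, (list, tuple)):
            room = buffer_size - len(buffer)
            if len(obj) <= room:
                buffer.extend(obj)      # all items fit directly in the buffer
            else:
                stack.append((obj, 0))  # too large: defer it as one span
                peak = max(peak, len(stack))
    return marked, peak
```

In this model a list with a thousand items costs one stack entry rather than a thousand, and the buffer never grows past `buffer_size`, which is the property the commit message describes.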
nascheme force-pushed the gh-129201-gc-mark-prefetch branch from fe9898a to 7f51104 on January 24, 2025 07:49
It's possible for lists or tuples to have a NULL item.  Handle that
in the case that all item elements fit into the buffer.
Labels: performance, topic-free-threading, type-feature