Cache-unfriendly filesystem usage, memory fragmentation and ARC #16978
Comments
Have you tried to set |
I have not, but will try that out and report back after it's had some time to bake on a host with extended uptime. Thanks for the pointer!
Hi, I have been suffering from the exact same issue ever since upgrading to zfs 2.2.0. I'd like to add a few things:
I just reproduced it with a watch on both arc_summary (notice the nice "Available memory size") and /proc/slabinfo running in parallel. Since the system was no longer responsive, I copied the frozen terminal output into a text file: arc_summary.txt. I hope it contains the relevant information you might be looking for. Otherwise I can try to extract some information on a partially bricked system, where the indexing has been killed before the system becomes unresponsive.
zfs_arc_shrinker_limit is already 0 on my setup (I double checked), as I am currently running zfs 2.3.0 on kernel 6.12.8.
The issue is reproducible for me moments after the system has rebooted, so no prior "memory fragmentation" is required at all. If I can provide or try anything else, let me know.
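For anyone else who wants to double check the same tunable, here is a minimal sketch that reads a ZFS module parameter from sysfs. It assumes a Linux host with the zfs module loaded and parameters exposed under /sys/module/zfs/parameters; the helper name `read_zfs_param` is made up for this example.

```python
from pathlib import Path

# Usual sysfs location of ZFS module parameters on Linux (assumed here).
ZFS_PARAM_DIR = Path("/sys/module/zfs/parameters")


def read_zfs_param(name, param_dir=ZFS_PARAM_DIR):
    """Return a module parameter's value as a string, or None if unreadable."""
    try:
        return (Path(param_dir) / name).read_text().strip()
    except OSError:
        # Module not loaded, parameter absent in this version, or no permission.
        return None


if __name__ == "__main__":
    value = read_zfs_param("zfs_arc_shrinker_limit")
    print("zfs_arc_shrinker_limit =", value if value is not None else "(not available)")
```

A result of `None` only means the file could not be read; it says nothing about the value the module is actually using.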
@XyFreak In the
"Since upgrading to zfs 2.2.0" covers a long time, about a year. Since this is not a widely noticed issue, there must be something specific to your case that triggers it. It would help to find out what exactly your "indexing" is, how it accesses the files, and how to reproduce it in a minimal environment.
System information
Describe the problem you're observing
After a moderate uptime of a few weeks, when a program tries to read or index the whole filesystem or a large chunk of it, the system seizes up and becomes unresponsive to input and network traffic for 15-20 minutes. Eventually it recovers to a sluggish but usable state (with the offending process still running, consuming core time and disk I/O), in which a tool like atop shows lingering heavy free page scan activity despite up to 10GiB of free/avail memory! (The Linux page cache has been zeroed by this time.)
ARC is maxed out at 97% (almost 50% of system RAM according to the default settings).
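A rough way to track that percentage over time without running arc_summary is to read the ARC kstats directly. This is a minimal sketch, assuming the Linux kstat file /proc/spl/kstat/zfs/arcstats with its usual `name type data` rows, and assuming the figure above is current ARC size relative to `c_max`. The function names are made up for this example.

```python
ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"  # Linux OpenZFS kstat file


def parse_arcstats(text):
    """Parse 'name type data' rows into {name: int}, skipping header lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three fields and an integer value.
        if len(fields) == 3 and fields[2].lstrip("-").isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_fill_percent(stats):
    """Current ARC size as a percentage of the configured maximum (c_max)."""
    return 100.0 * stats["size"] / stats["c_max"]


if __name__ == "__main__":
    try:
        with open(ARCSTATS_PATH) as f:
            stats = parse_arcstats(f.read())
        print(f"ARC size: {stats['size']} bytes, {arc_fill_percent(stats):.1f}% of c_max")
    except OSError as exc:
        print(f"cannot read {ARCSTATS_PATH}: {exc}")
```

Sampling this in a loop alongside atop would show whether the ARC shrinks at all while the free page scanning is going on.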
/proc/buddyinfo shows no free pages >= 1MiB in the steady state, and it can be even worse right after the "seizure", with no free pages >= 128KiB.
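The buddyinfo check can be scripted. This is a minimal sketch that sums the free memory held in blocks at or above a given size, assuming the usual /proc/buddyinfo layout (one row per node and zone, one column per allocation order starting at order 0) and a 4KiB page size. The function names are made up for this example.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected row shape: Node 0, zone Normal <count per order...>
        if len(fields) < 5 or fields[0] != "Node":
            continue
        node = int(fields[1].rstrip(","))
        zones[(node, fields[3])] = [int(n) for n in fields[4:]]
    return zones


def free_bytes_at_or_above(counts, min_block_bytes, page_size=4096):
    """Sum free memory held in blocks of at least min_block_bytes."""
    total = 0
    for order, count in enumerate(counts):
        block_bytes = page_size << order  # an order-n block spans 2**n pages
        if block_bytes >= min_block_bytes:
            total += count * block_bytes
    return total


if __name__ == "__main__":
    try:
        with open("/proc/buddyinfo") as f:
            zones = parse_buddyinfo(f.read())
    except OSError as exc:
        print(f"cannot read /proc/buddyinfo: {exc}")
    else:
        for (node, zone), counts in zones.items():
            for label, size in (("128KiB", 128 << 10), ("1MiB", 1 << 20)):
                free = free_bytes_at_or_above(counts, size)
                print(f"node {node} zone {zone}: {free} bytes free in blocks >= {label}")
```

A zone that reports 0 bytes at both thresholds while still having plenty of order-0 pages matches the fragmentation pattern described above.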
I suspect the partial recovery is thanks to kcompactd activity. I am thinking that ZFS should drop cached file blocks from ARC not just when the kernel low watermark is reached, but also when higher order free pages become exhausted.
Describe how to reproduce the problem
Simulate normal memory fragmentation on a host, including multiple hibernate/resume cycles, then run duplicity, tracker3-miner, or similar programs which ingest the whole filesystem in a cache-unfriendly and ZFS-unfriendly way while monitoring the situation with atop.
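For a more minimal workload than a full backup or indexer run, a plain sequential reader can stand in for the "ingest everything once" access pattern. This is only a sketch of that pattern: it is an assumption that it stresses the ARC in the same way as duplicity or tracker3-miner, and it may or may not trigger the stall. The function name `read_tree` is made up for this example.

```python
import os


def read_tree(root, chunk_size=1 << 20):
    """Read every regular file under root once; return (files_read, bytes_read)."""
    files_read = 0
    bytes_read = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, FIFOs and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
            except OSError:
                # Unreadable or vanished file: move on to the next one.
                continue
            files_read += 1
    return files_read, bytes_read
```

Calling `read_tree()` on a dataset mountpoint while atop is running gives a workload with no indexing or backup logic involved, which can help separate the access pattern from anything specific to those tools.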
Include any warning/errors/backtraces from the system logs