zfs 2.3.0 wont boot / hanging #16966

Open
ChristophSchmidpeter opened this issue Jan 20, 2025 · 4 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments


ChristophSchmidpeter commented Jan 20, 2025

System information

Type Version/Name
Distribution Name ArchLinux
Distribution Version a few months old / latest
Kernel Version 6.10.9.zen1-2 / 6.12.10.zen1-1
Architecture x64
OpenZFS Version 2.3.0

Describe the problem you're observing

When I update only zfs-dkms and zfs-utils to 2.3.0, the ZFS root system won't boot (it hangs before systemd prints any boot messages).
When I update ZFS to 2.3.0 and the kernel to the latest version (6.12.10.zen1-1), it boots, but the system is unusable: it hangs every few seconds. Even typing into kitty is hardly possible because the system stutters so much. When I update the whole Arch system, I get the same behaviour.

Describe how to reproduce the problem

Update zfs-dkms zfs-utils to 2.3.0

Include any warning/errors/backtraces from the system logs

dmesg.txt

journalctl.txt

ChristophSchmidpeter added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jan 20, 2025
@justinpryzby

What was the prior version of zfs that you were running ?
How much RAM does the server have ?
How large is your ARC ? (c in /sbin/arcstat)
Could you show if the system is swapping? (Check in "vmstat 1").
What does this show ? cat /sys/kernel/mm/ksm/run
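
The ARC and KSM questions above can also be answered by reading the kernel interfaces directly. The following is a minimal sketch, assuming the usual Linux OpenZFS kstat layout of /proc/spl/kstat/zfs/arcstats (header lines followed by `name type data` rows); the helper names `parse_arcstats`, `gib` and `read_diagnostics` are made up for illustration and are not part of any ZFS tooling.

```python
from pathlib import Path

# Assumed locations on a Linux system with the zfs module loaded.
ARCSTATS_PATH = Path("/proc/spl/kstat/zfs/arcstats")
KSM_RUN_PATH = Path("/sys/kernel/mm/ksm/run")  # 0 means KSM is not running


def parse_arcstats(text):
    """Return {name: int} for every 'name type data' row in arcstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three fields with a numeric value; this
        # skips the kstat header line and the 'name type data' title row.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


def read_diagnostics():
    """Collect the ARC target size ('c'), current ARC size and KSM flag."""
    arc = parse_arcstats(ARCSTATS_PATH.read_text())
    return {
        "arc_target_gib": round(gib(arc["c"]), 1),
        "arc_size_gib": round(gib(arc["size"]), 1),
        "ksm_run": int(KSM_RUN_PATH.read_text().strip()),
    }
```

Calling `read_diagnostics()` on the affected machine, once per ZFS/kernel combination, would give values that can be compared side by side.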


ChristophSchmidpeter commented Jan 21, 2025

@justinpryzby

What was the prior version of zfs that you were running ?

2.2.6

How much RAM does the server have ?

It is a Framework 16:
64GB DDR5-5600 (2x32GB)
AMD Ryzen™ 7 7840HS
AMD Radeon™ 780M
No dGPU
Primary: 1 x M.2 2280 NVMe (Root pool) (Sandisk Corp PC SN740 NVMe SSD)
Secondary: 1 x M.2 2230 NVMe (Sabrent Q4 2230 M.2 NVMe Gen 4 2 TB)

How large is your ARC ? (c in /sbin/arcstat)

2.2.6/6.10.9: 2.3G
2.3.0/6.12.10: 1.8G

Could you show if the system is swapping? (Check in "vmstat 1").

2.2.6/6.10.9: ~
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st gu
0 0 0 52261840 4044 1313936 0 0 39 182 1948 5 1 1 98 0 0 0
0 0 0 52260036 4044 1313992 0 0 0 0 5368 7312 1 1 97 1 0 0
1 0 0 52260596 4044 1314008 0 0 0 0 5857 7559 1 2 96 1 0 0
0 0 0 52271432 4044 1314008 0 0 0 3368 6950 9364 1 2 96 1 0 0
0 1 0 52278972 4044 1314008 0 0 0 0 6812 8775 1 2 96 1 0 0

2.3.0/6.12.10
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st gu
0 0 0 53025556 4044 1152576 0 0 8507 2214 13725 9 4 2 92 1 0 0
0 0 0 53057084 4044 1152632 0 0 0 0 3923 4692 1 0 98 0 0 0
0 0 0 53089288 4044 1152632 0 0 80 4320 5345 5013 0 1 99 0 0 0
0 0 0 53099712 4044 1152632 0 0 0 0 4662 4540 0 1 99 0 0 0
0 0 0 53109264 4044 1152636 0 0 0 0 4023 3963 0 0 99 0 0 0
0 0 0 53112936 4044 1155772 0 0 0 480 3489 3517 0 0 99 0 0 0
0 0 0 53121732 4044 1155756 0 0 0 0 3696 3634 0 0 99 0 0 0
0 0 0 53120168 4044 1155764 0 0 0 0 3715 3727 0 0 99 0 0 0
0 0 0 53122612 4044 1152692 0 0 0 4224 4870 5177 0 1 98 1 0 0
0 0 0 53136528 4044 1149056 0 0 0 0 4815 4337 2 0 98 0 0 0
0 0 0 53138548 4044 1149056 0 0 0 0 3318 3381 0 0 99 0 0 0
0 0 0 53141612 4044 1149056 0 0 0 0 3437 3422 0 0 99 0 0 0
0 0 0 53145084 4044 1149084 0 0 0 0 3748 3682 0 0 99 0 0 0
0 1 0 53160464 4044 1149056 0 0 0 2992 5084 5648 0 1 99 0 0 0

htop shows in both cases :
Mem <10G/57.9G
Swp 0K/48G
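
To check the swap columns without reading them by eye, the `vmstat 1` output can be parsed. This is only a sketch: the `SAMPLE` text is the first two samples from the 2.2.6/6.10.9 run above, and the function names are illustrative. Note that the first sample line of `vmstat` reports averages since boot rather than the last second.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st gu
0 0 0 52261840 4044 1313936 0 0 39 182 1948 5 1 1 98 0 0 0
0 0 0 52260036 4044 1313992 0 0 0 0 5368 7312 1 1 97 1 0 0
"""


def parse_vmstat(text):
    """Parse `vmstat 1` output into a list of {column: int} rows."""
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line or the group banner
        if fields[0] == "r":
            columns = fields  # column-name row; vmstat repeats it periodically
            continue
        if columns and len(fields) == len(columns):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def is_swapping(rows):
    """True if any sample shows pages swapped in (si) or out (so)."""
    return any(row["si"] > 0 or row["so"] > 0 for row in rows)


rows = parse_vmstat(SAMPLE)
print(is_swapping(rows))
```

For the samples posted here, `si`, `so` and `swpd` are zero in every row, which matches the htop reading of 0K swap in use.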

What does this show ? cat /sys/kernel/mm/ksm/run

2.2.6/6.10.9:
0

2.3.0/6.12.10:
0

@terencejferraro

Possibly not related, as I'm able to boot fine, but after upgrading to 2.3.0 from 2.2.6, I have seen significant system instability across all of my machines that were upgraded.

2 different laptops that are stationary pseudo-desktops (one is 12th Gen Intel(R) Core(TM) i7-12700H 64GB RAM, other is 13th Gen Intel(R) Core(TM) i7-13700HX 192GB RAM), 2 desktop servers (Intel(R) Core(TM) i9-14900K, 192GB RAM), 1 desktop server (Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz 256GB RAM). Various NVMe drives in each that are using zfs.

Was running kernel 6.12.6 with zfs 2.2.6 without issue for about a month. After upgrading zfs to 2.3.0, the best uptime I've had across any of the machines was about 36 hours, averaging about 12-24 hours before I get some sort of kernel hang and have to physically power cycle the machine. I tried upgrading kernel to 6.12.10 in case there was a kernel related issue, but the issues continue to persist.

Most of the time, it hits with something like this:
kernel: [322438.659931] BUG: kernel NULL pointer dereference, address: 0000000000000080

Sometimes I'm seeing this:
0010:arc_free_data_impl.constprop.0+0x5c/0x160

Or this:
kernel: [ 2617.687834] note: z_rd_int_2[3220] exited with irqs disabled
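
When comparing several machines, it can help to count how often each of these three signatures appears in a saved kernel log. A rough sketch, assuming the log has been exported to text (for example from `journalctl -k`); the signature names and the `classify` helper are invented for this example, and the patterns only match the exact messages quoted above.

```python
import re

# Patterns taken from the three log excerpts quoted in this comment.
SIGNATURES = {
    "null_deref": re.compile(r"BUG: kernel NULL pointer dereference"),
    "arc_free_data": re.compile(r"arc_free_data_impl"),
    "irqs_disabled_exit": re.compile(
        r"note: (\S+)\[\d+\] exited with irqs disabled"
    ),
}

SAMPLE_LINES = [
    "kernel: [322438.659931] BUG: kernel NULL pointer dereference, address: 0000000000000080",
    "0010:arc_free_data_impl.constprop.0+0x5c/0x160",
    "kernel: [ 2617.687834] note: z_rd_int_2[3220] exited with irqs disabled",
]


def classify(lines):
    """Count how many lines match each known signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


print(classify(SAMPLE_LINES))
```

The capture group in the last pattern holds the name of the thread that exited (`z_rd_int_2` in the excerpt), which may be useful when grouping reports.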

Sadly, I upgraded my pools when I upgraded to zfs 2.3.0, otherwise I would have downgraded back to 2.2.6.

Not sure just how big the performance impact will be, but my next step is going to be to try disabling the primarycache/secondarycache and see if that makes any difference.
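
For reference, the experiment described above comes down to two `zfs set` calls per dataset. The sketch below only builds the command lines and does not run them; the dataset name `rpool/example` is hypothetical, and this illustrates the planned test rather than a confirmed fix.

```python
# Values accepted by the primarycache and secondarycache dataset properties.
VALID_CACHE_VALUES = {"all", "none", "metadata"}


def cache_commands(dataset, value="none"):
    """Build `zfs set` argv lists for both cache properties on a dataset."""
    if value not in VALID_CACHE_VALUES:
        raise ValueError(f"unsupported cache value: {value!r}")
    return [
        ["zfs", "set", f"{prop}={value}", dataset]
        for prop in ("primarycache", "secondarycache")
    ]


# Hypothetical dataset name, used only for illustration.
for argv in cache_commands("rpool/example"):
    print(" ".join(argv))
```

Passing `value="all"` produces the commands that restore the default caching behaviour after the test.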


ChristophSchmidpeter commented Jan 22, 2025

Added dmesg.txt and journalctl.txt to description
