Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEHException with new Exception behavior in .NET 9 #111242

Open
kevin-montrose opened this issue Jan 9, 2025 · 7 comments · May be fixed by #111259
Open

SEHException with new Exception behavior in .NET 9 #111242

kevin-montrose opened this issue Jan 9, 2025 · 7 comments · May be fixed by #111259
Assignees
Labels
area-ExceptionHandling-coreclr in-pr There is an active PR which will close this issue when it is merged regression-from-last-release untriaged New issue has not been triaged by the area owner

Comments

@kevin-montrose
Copy link

kevin-montrose commented Jan 9, 2025

Description

Starting in .NET 9, pinvoking Lua's luaL_error can raise a SEHException.

I believe this is due to luaL_error being a relatively thin wrapper over longjmp.

I narrowed this down to the newly defaulted on improved exception handling in .NET 9.

Note that while I use KeraLua in my repro, the same thing happens if you pinvoke Lua directly - KeraLua is just a convenient way to package up a pre-built Lua 5.4 binary, and eliminate some of my own pinvoke definition from the repro.

Reproduction Steps

csproj

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net8.0;net9.0</TargetFrameworks>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="KeraLua" Version="1.4.1" />
  </ItemGroup>

</Project>

Program.cs

using KeraLua;

namespace LongJumpRepro
{
    public class Program
    {
        private const string ErrorMsg = "!!RAISED!!";

        private static Lua? lua;

        public static void Main(string[] args)
        {
            // This works as expected in .NET 8
            // It raises a SEHException in .NET 9
            // The exception in .NET 9 goes away if you set environment variable DOTNET_LegacyExceptionHandling=1

            Console.WriteLine($".NET version: {Environment.Version}");
            Console.WriteLine($"DOTNET_LegacyExceptionHandling: {Environment.GetEnvironmentVariable("DOTNET_LegacyExceptionHandling")}");

            try
            {
                lua = new Lua();
                lua.PushCFunction(RaiseLuaError);
                var status = lua.PCall(0, 0, 0);
                Console.WriteLine($"Status (Expected {LuaStatus.ErrRun}): {status}");

                var errMsg = lua.CheckString(1);
                Console.WriteLine($"Error Message (Expected '{ErrorMsg}'): {ErrorMsg}");
            }
            finally
            {
                lua?.Dispose();
            }
        }

        private static int RaiseLuaError(nint luaState)
        => lua!.Error(ErrorMsg);
    }
}

Expected behavior

Expected behavior is either the .NET 8:
Image
or .NET 9 with DOTNET_LegacyExceptionHandling=1 behavior:
Image

Where luaL_error works when pinvoked.

Actual behavior

A SEHException is raised:
Image

Regression?

Yes, this worked in earlier versions of .NET. I have confirmed it worked in .NET 8, and this was reported when first running the code on .NET 9.

Known Workarounds

Setting DOTNET_LegacyExceptionHandling=1 fixes the issue in .NET 9.

Discussion on the PR that introduced it suggests it is temporary however, if it is removed in a later .NET release we will no longer have a workaround.

Configuration

  • .NET 8 and .NET 9
  • Windows 11
  • x64
  • I have not tested on other configurations
  • I am not using Blazor

Other information

While I admit longjmp is weird, embedding Lua is pretty popular.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 9, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 9, 2025
@vcsjones vcsjones added area-ExceptionHandling-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 9, 2025
@nazar554
Copy link

nazar554 commented Jan 9, 2025

Rust 1.24.1 release notes describe a similar issue
https://blog.rust-lang.org/2018/03/01/Rust-1.24.1.html#do-not-abort-when-unwinding-through-ffi
rust-lang/rust#48572

@janvorli
Copy link
Member

janvorli commented Jan 9, 2025

The problem is interesting. At the point .NET gets notified about the error raised by Lua, the call stack looks as shown below. The key point is that there is a block of Lua frames, then a block of .NET managed frames, then again a block of Lua frames and then again .NET managed frames.
.NET gets notified (the ProcessCLRException is called) when the SEH stack walk enters the frame 16. The EXCEPTION_RECORD that it gets as an argument has exception code 0x80000026, which is what Lua passed in and that code means STATUS_LONGJUMP. Since that's not .NET exception, we create a .NET exception, we create the System.Runtime.InteropServices.SEHException for it and propagate it through the .NET frames 16, 17, 18 and 19. Since there is nothing to catch that exception in those frames, the stack walk reaches the lua frame (1a). When we reach a native frame and we know there are some more .NET managed frames towards the bottom of the stack, we rethrow the exception we were propagating through the .NET frames, which is the SEHException, from the context of the frame 1a. But its exception code is obviously not the STATUS_LONGJUMP and so it is not caught by the Lua code and ends up being unhandled.

I need to figure out how to preserve the exception code to rethrow and then rethrow the original SEH exception.

 # Child-SP          RetAddr               Call Site
0d 0000007e`67f8b9f0 00007ffb`1e003edf     coreclr!ProcessCLRException+0x6e [D:\git\runtime7\src\coreclr\vm\exceptionhandling.cpp @ 998] 
0e 0000007e`67f8c390 00007ffb`1dec0502     ntdll!RtlpExecuteHandlerForUnwind+0xf [minkernel\ntos\rtl\amd64\xcptmisc.asm @ 254] 
0f 0000007e`67f8c3c0 00007ffb`1debe53c     ntdll!RtlUnwindEx+0x352 [minkernel\ntos\rtl\amd64\exdsptch.c @ 1538] 
10 0000007e`67f8caf0 00007ffa`9588d789     ntdll!RtlUnwind+0xfc [minkernel\ntos\rtl\amd64\exdsptch.c @ 1077] 
11 0000007e`67f8d060 00007ffa`9583e199     lua54!luaopen_utf8+0x34e19
12 0000007e`67f8d0b0 00007ffa`9583c1d9     lua54!lua_setlocal+0x13d9
13 0000007e`67f8d0e0 00007ffa`958316e0     lua54!luaopen_debug+0xaf9
14 0000007e`67f8d110 00007ffa`95833d6c     lua54!lua_error+0x30
15 0000007e`67f8d140 00007ff9`d0a9ff47     lua54!luaL_error+0x4c
16 0000007e`67f8d180 00007ff9`d0a9fe12     KeraLua!ILStubClass.IL_STUB_PInvoke(IntPtr, System.String)+0x117
17 0000007e`67f8d270 00007ff9`d0a9f9a3     KeraLua!KeraLua.Lua.Error+0x42
18 0000007e`67f8d2b0 00007ff9`d0a9f8d4     repro!LongJumpRepro.Program.RaiseLuaError+0x73
19 0000007e`67f8d310 00007ffa`9583e7ce     KeraLua!ILStubClass.IL_STUB_ReversePInvoke(Int64)+0x44
1a 0000007e`67f8d370 00007ffa`9583d8f4     lua54!lua_yieldk+0x31e
1b 0000007e`67f8d3c0 00007ffa`9583d02f     lua54!lua_setlocal+0xb34
1c 0000007e`67f8d410 00007ffa`9583de63     lua54!lua_setlocal+0x26f
1d 0000007e`67f8d440 00007ffa`9583d6d0     lua54!lua_setlocal+0x10a3
1e 0000007e`67f8d5c0 00007ffa`9583203d     lua54!lua_setlocal+0x910
1f 0000007e`67f8d600 00007ff9`d0a9f82c     lua54!lua_pcallk+0x10d
20 0000007e`67f8d650 00007ff9`d0a9de44     KeraLua!KeraLua.Lua.PCall+0x9c
21 0000007e`67f8d720 00007ffa`2f823753     repro!LongJumpRepro.Program.Main+0x204 [D:\issues\111242\Program.cs @ 24] 
22 0000007e`67f8d840 00007ffa`2f2a81a0     coreclr!CallDescrWorkerInternal+0x83 [D:\git\runtime7\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 74] 
23 0000007e`67f8d880 00007ffa`2f2a8d43     coreclr!CallDescrWorkerWithHandler+0x130 [D:\git\runtime7\src\coreclr\vm\callhelpers.cpp @ 63] 
24 0000007e`67f8d8e0 00007ffa`2ee1e714     coreclr!MethodDescCallSite::CallTargetWorker+0xb93 [D:\git\runtime7\src\coreclr\vm\callhelpers.cpp @ 585] 
25 0000007e`67f8e0d0 00007ffa`2ee6f747     coreclr!MethodDescCallSite::Call+0x24 [D:\git\runtime7\src\coreclr\vm\callhelpers.h @ 465] 
26 0000007e`67f8e100 00007ffa`2ee6f10a     coreclr!RunMainInternal+0x287 [D:\git\runtime7\src\coreclr\vm\assembly.cpp @ 1234] 
27 0000007e`67f8e300 00007ffa`2ee6f213     coreclr!``RunMain'::`30'::__Body::Run'::`5'::__Body::Run+0x5a [D:\git\runtime7\src\coreclr\vm\assembly.cpp @ 1306] 
28 0000007e`67f8e350 00007ffa`2ee6f445     coreclr!`RunMain'::`30'::__Body::Run+0xa3 [D:\git\runtime7\src\coreclr\vm\assembly.cpp @ 1308] 
29 0000007e`67f8e3f0 00007ffa`2ee67872     coreclr!RunMain+0x1c5 [D:\git\runtime7\src\coreclr\vm\assembly.cpp @ 1308] 
2a 0000007e`67f8e4f0 00007ffa`2ef5aa93     coreclr!Assembly::ExecuteMainMethod+0x542 [D:\git\runtime7\src\coreclr\vm\assembly.cpp @ 1434] 
2b 0000007e`67f8eb20 00007ffa`2fb2db78     coreclr!CorHost2::ExecuteAssembly+0x5b3 [D:\git\runtime7\src\coreclr\vm\corhost.cpp @ 349] 
2c 0000007e`67f8f010 00007ffa`958ded6b     coreclr!coreclr_execute_assembly+0x138 [D:\git\runtime7\src\coreclr\dlls\mscoree\exports.cpp @ 494] 
2d (Inline Function) --------`--------     hostpolicy!coreclr_t::execute_assembly+0x2d [D:\git\runtime7\src\native\corehost\hostpolicy\coreclr.cpp @ 108] 
2e 0000007e`67f8f100 00007ffa`958df04c     hostpolicy!run_app_for_context+0x68b [D:\git\runtime7\src\native\corehost\hostpolicy\hostpolicy.cpp @ 256] 
2f 0000007e`67f8f230 00007ffa`958df983     hostpolicy!run_app+0x3c [D:\git\runtime7\src\native\corehost\hostpolicy\hostpolicy.cpp @ 285] 
30 0000007e`67f8f270 00007ffa`f4d8d896     hostpolicy!corehost_main+0x163 [D:\git\runtime7\src\native\corehost\hostpolicy\hostpolicy.cpp @ 426] 
31 0000007e`67f8f370 00007ffa`f4d8fea5     hostfxr!execute_app+0x2e6 [D:\git\runtime7\src\native\corehost\fxr\fx_muxer.cpp @ 145] 
32 0000007e`67f8f400 00007ffa`f4d91f52     hostfxr!`anonymous namespace'::read_config_and_execute+0xa5 [D:\git\runtime7\src\native\corehost\fxr\fx_muxer.cpp @ 532] 
33 0000007e`67f8f4f0 00007ffa`f4d90472     hostfxr!fx_muxer_t::handle_exec_host_command+0x142 [D:\git\runtime7\src\native\corehost\fxr\fx_muxer.cpp @ 1007] 
34 0000007e`67f8f590 00007ffa`f4d88440     hostfxr!fx_muxer_t::execute+0x482 [D:\git\runtime7\src\native\corehost\fxr\fx_muxer.cpp @ 578] 
35 0000007e`67f8f6d0 00007ff6`63bd8bb2     hostfxr!hostfxr_main_startupinfo+0xa0 [D:\git\runtime7\src\native\corehost\fxr\hostfxr.cpp @ 63] 
36 0000007e`67f8f7d0 00007ff6`63bd904e     dotnet!exe_start+0x7e2 [D:\git\runtime7\src\native\corehost\corehost.cpp @ 253] 
37 0000007e`67f8f9b0 00007ff6`63bda3a8     dotnet!wmain+0x11e [D:\git\runtime7\src\native\corehost\corehost.cpp @ 324] 
38 (Inline Function) --------`--------     dotnet!invoke_main+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90] 
39 0000007e`67f8fa20 00007ffb`1cece8d7     dotnet!__scrt_common_main_seh+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
3a 0000007e`67f8fa60 00007ffb`1df7fbcc     KERNEL32!BaseThreadInitThunk+0x17 [clientcore\base\win32\client\thread.c @ 77] 
3b 0000007e`67f8fa90 00000000`00000000     ntdll!RtlUserThreadStart+0x2c [minkernel\ldr\rtlstrt.c @ 1184] 

@janvorli
Copy link
Member

janvorli commented Jan 9, 2025

It is even more interesting. Re-raising the same SEH exception as .NET got notified about doesn't help for this case, because longjmp doesn't actually use RaiseException. So we will need to re-issue the longjmp, extracting the arguments from the EXCEPTION_RECORD. Or alternatively, maybe ignore the STATUS_LONGJUMP in the new EH. I have to check how the old EH behaves w.r.t. processing the STATUS_LONGJUMP that skips over managed frames.

@janvorli
Copy link
Member

janvorli commented Jan 9, 2025

So we cannot ignore it, the old EH also doesn't ignore it, so there can be e.g. finally handlers invoked while the longjmp jumps over managed frames. I've experimented with re-running longjmp instead of doing RaiseException when the exception escapes managed frames and is propagated into native ones and it seems to work correctly.

@jkotas
Copy link
Member

jkotas commented Jan 9, 2025

Using longjump to skip over managed frames had all sort of issues, even in the old EH scheme. We have it explicitly documented as unsupported in https://learn.microsoft.com/en-us/dotnet/standard/native-interop/exceptions-interoperability#setjmplongjmp-behaviors

@janvorli
Copy link
Member

Ok, then I guess that fixing this specific issue by making it just skip over the managed frames without invoking finallys is the right way to do it. That would be also aligned with Unix where we could not even see there is a longjmp over managed frames.

janvorli added a commit to janvorli/runtime that referenced this issue Jan 10, 2025
The new exception handling has broken hosting Lua on Windows when
a Lua code throws an exception that ends up crossing managed frames.
Lua uses longjmp for its exception handling mechanism and the new
exception handling reported the longjmp as an unhandled `SEHException`.

This change makes the new EH ignore the `STATUS_LONGJUMP` exception
when it passes through managed frames, so it can reach the target
location. It is a best effort fix though, as finallys in the skipped
frames won't be called.

Close dotnet#111242
@janvorli janvorli linked a pull request Jan 10, 2025 that will close this issue
@dotnet-policy-service dotnet-policy-service bot added in-pr There is an active PR which will close this issue when it is merged labels Jan 10, 2025
@jkotas
Copy link
Member

jkotas commented Jan 10, 2025

On Unix, longjump does not work at all since we are not able to unwind transition frames. It means that the key runtime data structures get corrupted and you are very likely going to crash soon after. #1445 has an example of the crash that it leads to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-ExceptionHandling-coreclr in-pr There is an active PR which will close this issue when it is merged regression-from-last-release untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants