
Hang during shutdown with Native AOT on IReferenceTrackerHost::ReleaseDisconnectedReferenceSources waiting for finalizers #109538

Closed
@Sergio0694

Description

We're hitting a hang during application shutdown that reproduces 100% of the time, but only on Native AOT; CoreCLR seems to work fine. The finalizer thread and the UI thread (ASTA) appear to be deadlocked, so the application process stays alive after the main window is closed. After a few seconds, Windows kills the process, which shows up in WER as a hang (as expected).

Reproduction Steps

I don't have a minimal repro. Please ping me on Teams for instructions on how to deploy the Store locally to repro. Alternatively, I can also share an MSIX package for sideloading, with instructions on how to install it for testing (and how to restore the retail Store after that).

Here is a memory dump on the process during the hang (process was paused from WinDbg on the presumed deadlock).

Expected behavior

The application should shut down correctly when the window is closed.

Actual behavior

Here are the two relevant stack traces I see in WinDbg.

Finalizer thread (!FinalizerStart):

    [0x0] ntdll!ZwWaitForMultipleObjects+0x14
    [0x1] KERNELBASE!WaitForMultipleObjectsEx+0xe9
    [0x2] combase!MTAThreadWaitForCall+0xfb
    [0x3] combase!MTAThreadDispatchCrossApartmentCall+0x2bc
    [0x4] combase!CSyncClientCall::SwitchAptAndDispatchCall+0x707 (Inline Function) (Inline Function)
    [0x5] combase!CSyncClientCall::SendReceive2+0x825
    [0x6] combase!SyncClientCallRetryContext::SendReceiveWithRetry+0x2f (Inline Function) (Inline Function)
    [0x7] combase!CSyncClientCall::SendReceiveInRetryContext+0x2f (Inline Function) (Inline Function)
    [0x8] combase!DefaultSendReceive+0x6e
    [0x9] combase!CSyncClientCall::SendReceive+0x300
    [0xa] combase!CClientChannel::SendReceive+0x98
    [0xb] combase!NdrExtpProxySendReceive+0x58
    [0xc] RPCRT4!Ndr64pSendReceive+0x39 (Inline Function) (Inline Function)
    [0xd] RPCRT4!NdrpClientCall3+0x3de
    [0xe] combase!ObjectStublessClient+0x14c
    [0xf] combase!ObjectStubless+0x42
    [0x10] combase!CObjectContext::InternalContextCallback+0x2fd
    [0x11] combase!CObjectContext::ContextCallback+0x902
    [0x12] <MICROSOFT_STORE>!WinRT_Runtime_ABI_WinRT_Interop_IContextCallbackVftbl__ContextCallback+0x102
    [0x13] <MICROSOFT_STORE>!WinRT_Runtime_WinRT_Context__CallInContext+0x87
    [0x14] <MICROSOFT_STORE>!WinRT_Runtime_WinRT_ObjectReferenceWithContext_1<WinRT_Runtime_WinRT_Interop_IUnknownVftbl>__Release+0x64
    [0x15] <MICROSOFT_STORE>!WinRT_Runtime_WinRT_IObjectReference__Dispose+0x5b
    [0x16] <MICROSOFT_STORE>!WinRT_Runtime_WinRT_IObjectReference__Finalize+0x17
    [0x17] <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime___Finalizer__DrainQueue+0x7a
    [0x18] <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime___Finalizer__ProcessFinalizers+0x47
    [0x19] <MICROSOFT_STORE>!FinalizerStart+0x56
    [0x1a] KERNEL32!BaseThreadInitThunk+0x1d
    [0x1b] ntdll!RtlUserThreadStart+0x28

UI thread (shcore!_WrapperThreadProc ApplicationView ASTA):

    [0x0] win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14
    [...]
    [0x5] combase!CoWaitForMultipleHandles+0xc2
    [0x6] <MICROSOFT_STORE>!PalCompatibleWaitAny+0x63
    [0x7] <MICROSOFT_STORE>!CLREventStatic::Wait+0xc6
    [0x8] <MICROSOFT_STORE>!RhWaitForPendingFinalizers+0x90
    [0x9] <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime_RuntimeImports__RhWaitForPendingFinalizers+0x32
    [0xa] <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime_RuntimeImports__RhWaitForPendingFinalizers_0+0x21
    [0xb] <MICROSOFT_STORE>!S_P_CoreLib_System_GC__WaitForPendingFinalizers+0x1b
    [0xc] <MICROSOFT_STORE>!S_P_CoreLib_System_Runtime_InteropServices_ComWrappers__IReferenceTrackerHost_ReleaseDisconnectedReferenceSources+0x24
    [0xd] Windows_UI_Xaml!DirectUI::ReferenceTrackerManager::TriggerFinalization+0x34
    [...]

It seems that:

  • The window is closed
  • The lifecycle manager starts suspending the app
  • XAML decides it should finalize objects
  • ComWrappers::IReferenceTrackerHost::ReleaseDisconnectedReferenceSources is called
  • That in turn blocks on GC::WaitForPendingFinalizers
  • The finalizer thread gets to finalizing some ObjectReferenceWithContext<T> object
  • That object is from another context so it calls CallInContext (here)
  • That passes the callback function and invokes it in the target context (here)
  • That call never returns: the finalizer thread stays blocked in WaitForMultipleObjectsEx (via MTAThreadWaitForCall), waiting for the cross-apartment call to complete
  • The whole app hangs until the OS forcibly kills it
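
To make the suspected cycle easier to reason about, here is a minimal, portable C++ model of it. This is only a sketch under stated assumptions, not the runtime's or COM's actual behavior: `simulateShutdownWait`, the `uiCalls` queue, and the timeout are all hypothetical names invented for illustration, and the model deliberately says nothing about how an ASTA really dispatches incoming calls. It only shows the general shape: if thread A waits for thread B, and B needs A to service a call before it can finish, the wait completes only if A keeps servicing calls while it waits.

```cpp
#include <atomic>
#include <chrono>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical model: returns true if the "UI thread" saw finalization finish
// before the timeout, false if the wait would have hung.
bool simulateShutdownWait(bool servicesCallsWhileWaiting,
                          std::chrono::milliseconds timeout = std::chrono::milliseconds(500)) {
    using namespace std::chrono;
    std::mutex queueLock;
    std::deque<std::function<void()>> uiCalls;  // calls marshaled to the "UI thread"
    std::atomic<bool> released{false};          // set once the UI thread runs the release call
    std::atomic<bool> finalizersDone{false};    // stands in for the "finalizer done" event
    std::atomic<bool> giveUp{false};            // lets the model terminate instead of hanging

    // "Finalizer thread": releasing the object requires a call that only the
    // UI thread can run, so it queues the call and blocks until it has run.
    std::thread finalizer([&] {
        {
            std::lock_guard<std::mutex> lock(queueLock);
            uiCalls.push_back([&released] { released = true; });
        }
        while (!released && !giveUp) std::this_thread::sleep_for(milliseconds(1));
        if (released) finalizersDone = true;
    });

    // "UI thread" (the calling thread): waits for pending finalizers.
    const auto deadline = steady_clock::now() + timeout;
    while (!finalizersDone && steady_clock::now() < deadline) {
        if (servicesCallsWhileWaiting) {
            std::function<void()> call;
            {
                std::lock_guard<std::mutex> lock(queueLock);
                if (!uiCalls.empty()) {
                    call = std::move(uiCalls.front());
                    uiCalls.pop_front();
                }
            }
            if (call) call();  // run the marshaled call outside the lock
        }
        std::this_thread::sleep_for(milliseconds(1));
    }

    const bool completed = finalizersDone;
    giveUp = true;  // unblock the finalizer so the model can clean up
    finalizer.join();
    return completed;
}
```

Calling `simulateShutdownWait(true)` models a wait that keeps dispatching incoming calls, and `simulateShutdownWait(false)` models a wait that does not; in the second case the model only terminates because of the artificial timeout.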

Some potentially relevant differences we noticed in the finalizer logic across CoreCLR (which works fine) and NativeAOT:

CoreCLR:

void FinalizerThread::FinalizerThreadWait()
{
    ASSERT(hEventFinalizerDone->IsValid());
    ASSERT(hEventFinalizer->IsValid());
    ASSERT(GetFinalizerThread());

    // Can't call this from within a finalized method.
    if (!IsCurrentThreadFinalizer())
    {
        // We may see a completion of finalization cycle that might not see objects that became
        // F-reachable in recent GCs. In such case we want to wait for a completion of another cycle.
        // However, since an object cannot be prevented from promoting, one can only rely on Full GCs
        // to collect unreferenced objects deterministically. Thus we only care about Full GCs here.
        int desiredFullGcCount =
            GCHeapUtilities::GetGCHeap()->CollectionCount(GCHeapUtilities::GetGCHeap()->GetMaxGeneration());

        GCX_PREEMP();

#ifdef FEATURE_COMINTEROP
        // To help combat finalizer thread starvation, we check to see if there are any wrappers
        // scheduled to be cleaned up for our context. If so, we'll do them here to avoid making
        // the finalizer thread do a transition.
        if (g_pRCWCleanupList != NULL)
            g_pRCWCleanupList->CleanupWrappersInCurrentCtxThread();
#endif // FEATURE_COMINTEROP

    tryAgain:
        hEventFinalizerDone->Reset();
        EnableFinalization();

        // Under GC stress the finalizer queue may never go empty as frequent
        // GCs will keep filling up the queue with items.
        // We will disable GC stress to make sure the current thread is not permanently blocked on that.
        GCStressPolicy::InhibitHolder iholder;

        //----------------------------------------------------
        // Do appropriate wait and pump messages if necessary
        //----------------------------------------------------

        DWORD status;
        status = hEventFinalizerDone->Wait(INFINITE, TRUE);

        // we use unsigned math here as the collection counts, which are size_t internally,
        // can in theory overflow an int and wrap around.
        // unsigned math would have more defined/portable behavior in such case
        if ((int)((unsigned int)desiredFullGcCount - (unsigned int)g_fullGcCountSeenByFinalization) > 0)
        {
            // There were some Full GCs happening before we started waiting and possibly not seen by the
            // last finalization cycle. This is rare, but we need to be sure we have seen those,
            // so we try one more time.
            goto tryAgain;
        }

        _ASSERTE(status == WAIT_OBJECT_0);
    }
}

Native AOT:

EXTERN_C void QCALLTYPE RhWaitForPendingFinalizers(UInt32_BOOL allowReentrantWait)
{
    // This must be called via p/invoke rather than RuntimeImport since it blocks and could starve the GC if
    // called in cooperative mode.
    ASSERT(!ThreadStore::GetCurrentThread()->IsCurrentThreadInCooperativeMode());

    // Can't call this from the finalizer thread itself.
    if (ThreadStore::GetCurrentThread() != g_pFinalizerThread)
    {
        // We may see a completion of finalization cycle that might not see objects that became
        // F-reachable in recent GCs. In such case we want to wait for a completion of another cycle.
        // However, since an object cannot be prevented from promoting, one can only rely on Full GCs
        // to collect unreferenced objects deterministically. Thus we only care about Full GCs here.
        int desiredFullGcCount =
            GCHeapUtilities::GetGCHeap()->CollectionCount(GCHeapUtilities::GetGCHeap()->GetMaxGeneration());

    tryAgain:
        // Clear any current indication that a finalization pass is finished and wake the finalizer thread up
        // (if there's no work to do it'll set the done event immediately).
        g_FinalizerDoneEvent.Reset();
        g_FinalizerEvent.Set();

        // Wait for the finalizer thread to get back to us.
        g_FinalizerDoneEvent.Wait(INFINITE, false, allowReentrantWait);

        // we use unsigned math here as the collection counts, which are size_t internally,
        // can in theory overflow an int and wrap around.
        // unsigned math would have more defined/portable behavior in such case
        if ((int)((unsigned int)desiredFullGcCount - (unsigned int)g_fullGcCountSeenByFinalization) > 0)
        {
            // There were some Full GCs happening before we started waiting and possibly not seen by the
            // last finalization cycle. This is rare, but we need to be sure we have seen those,
            // so we try one more time.
            goto tryAgain;
        }
    }
}
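
As a side note, both listings end with the same wrap-around-safe comparison of full GC counts, which is identical across the two runtimes and so is probably not where the behavior diverges. The sketch below isolates that check so it can be verified on its own. The helper name `needsAnotherFinalizationCycle` is hypothetical and is not part of either runtime; the expression inside it is copied from the listings.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: true when the waiter's desired full GC count is "ahead of"
// the count last seen by finalization, even if the int counter has wrapped around.
// The subtraction is done in unsigned arithmetic (well defined on overflow) and the
// difference is then reinterpreted as signed. Converting an out-of-range unsigned
// value back to int is modular on mainstream compilers (and guaranteed since C++20).
bool needsAnotherFinalizationCycle(int desiredFullGcCount, int fullGcCountSeenByFinalization) {
    return (int)((unsigned int)desiredFullGcCount - (unsigned int)fullGcCountSeenByFinalization) > 0;
}
```

The interesting case is a counter that has just wrapped from `INT_MAX` to `INT_MIN`: a plain `desired > seen` comparison would report that no retry is needed, whereas this form still sees a difference of one and asks for another cycle.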

The two implementations look similar, with these differences:

  • CoreCLR passes alertable: TRUE
  • Native AOT passes alertable: FALSE
  • Native AOT also passes allowReentrantWait: TRUE, which makes it call CoWaitForMultipleHandles (here)

We're not sure whether that's intentional (and if so, why) or whether it's related to the issue; it's just something we noticed.

Regression?

No.

Known Workarounds

None, this is a blocker 😅

Configuration

  • VS 17.12 P5
  • .NET 9 RC2
