Opened 4 months ago

Closed 3 months ago

#18769 closed bug (fixed)

ICU upgrade boot failure - hung mutex

Reported by: kallisti5 Owned by: nobody
Priority: blocker Milestone: R1/beta5
Component: - General Version: R1/beta4
Keywords: Cc:
Blocked By: Blocking: #18565
Platform: All

Description (last modified by kallisti5)

Upgrading ICU from 66 to 74 results in an infinitely hanging mutex from launch_daemon (via libroot via libroot-addon-icu.so)

Maybe related #17468

Screenshots attached. See https://review.haiku-os.org/c/haiku/+/7346 for patches to test.

Flagging high since this is an R1/beta5 blocker

Attachments (3)

ICU_hang.png (171.0 KB ) - added by kallisti5 4 months ago.
backtrace
icu_mutex.png (12.1 KB ) - added by kallisti5 4 months ago.
user_mutex info
icu_mutex_info.png (7.4 KB ) - added by kallisti5 4 months ago.
mutex info

Download all attachments as: .zip

Change History (13)

by kallisti5, 4 months ago

Attachment: ICU_hang.png added

backtrace

by kallisti5, 4 months ago

Attachment: icu_mutex.png added

user_mutex info

by kallisti5, 4 months ago

Attachment: icu_mutex_info.png added

mutex info

comment:1 by kallisti5, 4 months ago

Description: modified (diff)

comment:2 by pulkomandy, 4 months ago

So it seems pretty clear:

  • libbe construction initializes locale kit
  • this eventually calls TimeZone::createDefault()
  • this calls tzset via localtime_r
  • In our case, tzset is implemented using ICU
  • So, tzset ends up calling TimeZone::createDefault() again
  • Deadlock because ICU didn't expect that it would end up calling itself.

So, one option is to patch ICU and delete the call to detectHostTimeZone. Since we use ICU to implement the C locale support, that function is not needed: it's up to ICU to decide what the host timezone is.

We can do that either by patching TimeZone::createDefault in ICU, or by calling TimeZone::setDefault somewhere in initialization (before we get to using TimeZone::createDefault) to avoid ICU defaulting to an "host" timezone.

ICU says in its sourcecode that this recursion shouldn't be a problem: https://github.com/unicode-org/icu/blob/main/icu4c/source/i18n/timezone.cpp#L549

Note that we already had this problem and we had patched ICU buildsystem to not call tzset: https://github.com/haikuports/haikuports/blob/master/dev-libs/icu/patches/icu74-74.1.patchset#L62

but here, instead of calling tzset, it calls localtime_r, which happens to be implemented using tzset. So, similarly patching ICU to not call localtime_r could also be an option.

Looking at how this ends up calling localtime_r, it happens here: https://github.com/unicode-org/icu/blob/main/icu4c/source/common/putil.cpp#L1255 as a last chance to figure out a timezone after several increasingly desperate attempts to identify the current timezone.

Looking at this function, it would be enough to make sure U_TZNAME is not set during ICU build, and then it would always return an empty string, and we would be fine. Our patch to ICU should already do that. So, is U_TZNAME set to 0 instead of being unset, or something like that? Or is our patch to remove it not done as it should?

comment:3 by kallisti5, 4 months ago

Thanks for the explanation. That all makes sense (I did briefly check our patches, and HAS_TZNAME / HAS_TZSET is still set to 0 to disable this feature)

I raised https://unicode-org.atlassian.net/browse/ICU-22648 because the behavior we're seeing is counter to what the source code comments say.

Let me boot up Haiku, and I'll validate the patching-out of these features in ICU.

comment:4 by pulkomandy, 4 months ago

The code in putil.cpp uses #ifdef U_TZNAME at line 1234, I think by reading our patch, U_TZNAME should not be set, and we should then exit the function at line 1272.

I'm not sure why they don't use HAS_TZNAME for the preprocessor check, and I'm not sure if U_TZNAME is undefined or defined to 0.

Given our patches to ICU buildsystem, it may very well be our fault that this isn't working as it should. At least we should link the patch in that ICU bugreport so they don't waste time trying to understand a problem that's actually on our side...

comment:5 by pulkomandy, 4 months ago

Blocking: 18565 added

comment:6 by kallisti5, 3 months ago

Priority: highblocker

comment:8 by kallisti5, 3 months ago

currently testing with this additional ICU patchset..

From f3c4a38b820c4a042240829644ae3c6798d87b6b Mon Sep 17 00:00:00 2001
From: Alexander von Gluck IV <kallisti5@unixzen.com>
Date: Tue, 6 Feb 2024 15:04:41 -0600
Subject: [PATCH] common: prevent potential deadlocks back to ICU

---
 source/common/putilimp.h | 2 ++
 source/configure.ac      | 1 +
 2 files changed, 3 insertions(+)

diff --git a/source/common/putilimp.h b/source/common/putilimp.h
index 50ff4b5..29dcf0e 100644
--- a/source/common/putilimp.h
+++ b/source/common/putilimp.h
@@ -145,6 +145,8 @@ typedef size_t uintptr_t;
 #endif
 #elif U_PLATFORM == U_PF_OS400
    /* not defined */
+#elif U_PLATFORM == U_HAIKU
+   /* not defined, (well it is but a loop back to icu) */
 #else
 #   define U_TZNAME tzname
 #endif
diff --git a/source/configure.ac b/source/configure.ac
index 6b33e74..41a357d 100644
--- a/source/configure.ac
+++ b/source/configure.ac
@@ -809,6 +809,7 @@ AC_SUBST(U_HAVE_POPEN)
 U_HAVE_TZSET=0
 AC_SUBST(U_HAVE_TZSET)
 AC_SUBST(U_TZSET)
+CONFIG_CPPFLAGS="${CONFIG_CPPFLAGS} -DU_HAVE_TZSET=0"
 
 U_HAVE_TZNAME=0
 AC_SUBST(U_HAVE_TZNAME)
-- 
2.42.1

comment:9 by kallisti5, 3 months ago

This tested successful re-fixed in https://github.com/haikuports/haikuports/commit/ab2c87f72ccd38a32f51a39735bdbe674b26d893 (mostly because it follows the style a previous commit mentioned upstream wants)

Working on a rebuild of dependent packages now for a new set of build-packages.

comment:10 by kallisti5, 3 months ago

Resolution: fixed
Status: newclosed

merged in hrev57575!

Note: See TracTickets for help on using tickets.