Opened 2 years ago
Last modified 22 months ago
#17896 new bug
static (hidden) thread_local values do not work in some cases
Reported by: | waddlesplash | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | System/runtime_loader | Version: | R1/beta3 |
Keywords: | Cc: | korli | |
Blocked By: | Blocking: | ||
Platform: | All |
Description (last modified by )
gnutls makes use of this, and it had to be disabled to fix crashes: https://github.com/haikuports/haikuports/commit/cda76a62712b91647892ba01f1a5d0dcb38cadc5
Attachments (2)
Change History (23)
comment:1 by , 2 years ago
Cc: | added |
---|
comment:2 by , 2 years ago
Hmm, I guess this counts as "static TLS", which would be #13294. But we should get an error and the program shouldn't load at all in that case...
comment:3 by , 2 years ago
Ah, right, since this is a shared object, it does not count as "static TLS", I don't think, so this should in theory work. But it appears at least in this test application that putting the "_Thread_local" variable outside the function and deleting the "static" doesn't make it work either, in fact in that case I get all 5 having the same pointer value!
comment:4 by , 2 years ago
A bit more testing shows that __tls_get_addr
in libroot is indeed returning bad values:
__tls_get_addr: thread 2224, module 0, offset (nil) -> 0x48bc539990 __tls_get_addr: thread 2221, module 0, offset (nil) -> 0x48bc5398f0 __tls_get_addr: thread 2225, module 0, offset (nil) -> 0x48bc5398f0 __tls_get_addr: thread 2222, module 0, offset (nil) -> 0x48bc5398f0 __tls_get_addr: thread 2223, module 0, offset (nil) -> 0x48bc5398f0
That pushes the problem up into runtime_loader indeed.
comment:5 by , 2 years ago
This does not seem to be a race condition. Adding snooze(100000)
into both for
loops actually has the effect that we always seem to get the same pointer for all threads!
comment:6 by , 2 years ago
Got it: the TLS_DYNAMIC_THREAD_VECTOR has different pointers for each thread (good, that would mean base TLS is broken otherwise) but they all point to the same place. Now how did that happen...
comment:7 by , 2 years ago
Ah, so what is happening is that the threads are all exiting before the next one even starts, and so the TLS memory gets freed when the thread exits and then reused when the next thread starts. Adding snooze(10000);
to the end of the thread function and removing the snoozes from elsewhere shows all variables have different values indeed.
I then readded the static
and things still worked just fine. So I guess I am not sure what the problem with gnutls is that it works when the thread-local variables are non-static...
comment:8 by , 2 years ago
<X512> I found that after writing to TLS variable it read as zero. <X512> Removing static fixed that. <X512> I tried to make mores simple case, but it worked fine... <X512> Write value to static TLS variable and immediately after read it. It will become zero. <X512> I suspect that GCC use some optimization for static TLS variable that may use TLS model unsupported in Haiku.
comment:9 by , 2 years ago
More likely it seems to me is that somehow the generation-accounting logic in runtime_loader's elf_tls.cpp is not correct, and blocks are getting spuriously deleted and recreated. Comparing it to FreeBSD I note a bunch of differences, but based on commit messages some of these may be intentional.
by , 2 years ago
Attachment: | static-vs-non.diff added |
---|
comment:10 by , 2 years ago
Here's a difference of the gcc -S output when compiling random.c from GNUTLS with static
vs no static
on the variable declarations. Note the .value 0x6666
: what's that about?
comment:11 by , 2 years ago
I also did find what I am pretty sure is an issue in the TLS DTV handling, but I think it's not necessarily related to this: https://review.haiku-os.org/c/haiku/+/5604
comment:12 by , 2 years ago
Description: | modified (diff) |
---|---|
Summary: | static (hidden) thread_local values do not actually get unique pointers → static (hidden) thread_local values do not work in some cases |
comment:13 by , 2 years ago
This actually sounds like our gcc config is broken in some way.
For example, in https://github.com/haiku/buildtools/blob/master/binutils/bfd/elf64-x86-64.c, searching for 0x6666
shows several comments where the code is meant to be transformed from the 0x6666
into proper TLS calls.
The fact that 0x6666
remaining in the generated output indicates mis-compilation, and perhaps that at some point, gcc thinks the static TLS model is available, and then this code fails to do the transform into static TLS.
It definitely needs deeper investigation, but it certainly looks like the fault is not in runtime_loader, unless we actually go and implement static TLS.
comment:14 by , 2 years ago
Here's the full command invocation being used to compile the file:
gcc -DHAVE_CONFIG_H -I. -I.. -DLOCALEDIR=\"/packages/gnutls-3.7.4-1/.self/data/locale\" -I./../gl -I./../gl -I./includes -I./x509 -I./includes -I./includes -I./x509 -I/packages/libtasn1-4.18.0-1/.self/develop/headers -I/packages/p11_kit-0.23.18.1-2/.self/develop/headers/p11-kit-1 -Wtype-limits -fanalyzer -fno-common -Wall -Wbad-function-cast -Wcast-align=strict -Wdate-time -Wdisabled-optimization -Wdouble-promotion -Wduplicated-branches -Wduplicated-cond -Wextra -Winit-self -Winvalid-pch -Wlogical-op -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes -Wnested-externs -Wnull-dereference -Wold-style-definition -Wopenmp-simd -Wpacked -Wpointer-arith -Wshadow -Wstrict-prototypes -Wsuggest-attribute=cold -Wsuggest-attribute=format -Wsuggest-attribute=malloc -Wsuggest-final-methods -Wsuggest-final-types -Wsync-nand -Wtrampolines -Wuninitialized -Wunknown-pragmas -Wunused-macros -Wvariadic-macros -Wvector-operation-performance -Wwrite-strings -Warray-bounds=2 -Wattribute-alias=2 -Wformat-overflow=2 -Wformat=2 -Wformat-truncation=2 -Wimplicit-fallthrough=5 -Wshift-overflow=2 -Wunused-const-variable=2 -Wvla-larger-than=4031 -Wno-analyzer-double-free -Wno-analyzer-malloc-leak -Wno-analyzer-null-dereference -Wno-analyzer-use-after-free -Wno-missing-field-initializers -Wno-unused-parameter -Wno-format-truncation -Wimplicit-fallthrough=2 -Wabi=11 -fdiagnostics-show-option -fno-builtin-strcmp -I/packages/nettle-3.7.3-1/.self/develop/headers -I/packages/libtasn1-4.18.0-1/.self/develop/headers -I/packages/libidn2-2.0.5-2/.self/develop/headers -I/packages/p11_kit-0.23.18.1-2/.self/develop/headers/p11-kit-1 -g -O2 -MT random.lo -MD -MP -MF .Tpo -c random.c -DPIC -o .libs/random.o
All warnings, includes, defines, etc. excluded, there is only:
-fanalyzer -fno-common -fdiagnostics-show-option -fno-builtin-strcmp -g -O2
None of which should actually affect TLS behavior to my knowledge.
comment:15 by , 2 years ago
The fact that 0x6666 remaining in the generated output indicates mis-compilation
I don't think so, I just tried compiling this same testcase on Linux and I see the 0x6666 in the assembly output.
comment:16 by , 2 years ago
jessicah found this document which explains the different kinds of thread-local storage: https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter8-20.html
<jessicah> can probably identify the TLS accesses being used from the disassembly <jessicah> from what I can tell, all accesses were LD changed to GD <jessicah> so that'd imply that our TLS accesses, which I guess runtime_loader is responsible for, are wrong
comment:17 by , 2 years ago
lmms has another example of disabling TLS: https://github.com/haikuports/haikuports/blob/14c2cab5428145b93232cb69683a67bbe68a9f06/media-sound/lmms/patches/lmms-1.2.2.patchset#L140
comment:18 by , 2 years ago
I learned from someone trying to port an application to Haiku that their TLS variable read as 0 after fork
. I haven't yet investigated that myself, but it sounds like a place we haven't looked yet.
comment:19 by , 2 years ago
From JustineTunney, this is the workaround added:
extern _Thread_local struct Machine *g_machine;
pid = fork(); #ifdef __HAIKU__ g_machine = m; #endif
I am told that g_machine
was only 0 in the child process; it wasn't unset in the parent.
by , 22 months ago
Attachment: | test-static-thread_local.c added |
---|
CC korli, perhaps you have some idea about what to do here?