#4782 closed bug (fixed)
gcc4 builds fail to boot
Reported by: | augiedoggie | Owned by: | axeld |
---|---|---|---|
Priority: | blocker | Milestone: | R1 |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | x86 |
Description
Something has changed between hrev33544 and hrev33575 which causes gcc4 builds to stop at the boot screen before any icons are lit. This was observed with my own builds on real hardware and by testing nightly images from haiku-files via virtualbox.
I tested:
haiku-nightly-hrev33544-x86gcc4hybrid-raw.zip -- works
haiku-nightly-hrev33575-x86gcc4hybrid-raw.zip -- doesn't boot
Attachments (1)
Change History (23)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
Can confirm that a hrev33575 x86gcc4hybrid fails to launch in VirtualBox, while an x86gcc2hybrid boots fine. (Both re-configured and jam -qaj2'ed)
I don't have the serial log from VirtualBox at hand, but it was something like: Kernel running on CPU 0 "" .... It does not light up any icons. Perhaps related to the boot_loader bfs changes?
Haven't dared to test on real hw, yet.
comment:3 by , 15 years ago
Same issue on real hw too, with a hrev33575. I may post serial log from bootloader, but not before the next 8 hours...
comment:4 by , 15 years ago
I have the same issue on real hardware, so I tested haiku-nightly-hrev33575-x86gcc4-vmware.zip in VirtualBox. Log added.
comment:6 by , 15 years ago
Component: | - General → System/Kernel |
---|---|
Owner: | changed from | to
Platform: | All → x86 |
Version: | R1/alpha1 → R1/Development |
comment:7 by , 15 years ago
Priority: | normal → blocker |
---|
comment:8 by , 15 years ago
Could anyone look into this with a GCC4 build at hand? I mean not just reverting; I have no idea why GCC4 would interpret this incorrectly.
comment:9 by , 15 years ago
comment:10 by , 15 years ago
It seems that since hrev33547 the (mutex), (recursive_lock) and (rw_lock) casts are interpreted by gcc4 as the ones typedef'ed in the new headers/private/shared/locks.h, not the usual headers/private/kernel/lock.h ones. Maybe the include order of gcc2.95.3 and gcc4 is not exactly the same. Or maybe I'm just blind or stupid, dunno yet.
Anyway, having two different typedef'ed mutex structs doesn't sound like a good idea, unless we take some measure to guarantee that they can't be used in the wrong context (kernel vs. userland). That doesn't seem to be the case currently.
follow-up: 13 comment:11 by , 15 years ago
locks.h is included by exactly one file in the whole tree: src/system/libroot/posix/malloc_debug/heap.cpp. The two headers also don't have the same name (lock.h vs. locks.h). If it is indeed included and used, then something's very wrong. There are a few shared headers between libroot and the kernel, but this isn't the case for this one and I wouldn't know how it could happen. Whether the header is included is easily verified by adding an #error directive to it and checking whether the kernel still builds with that.
comment:12 by , 15 years ago
That's what I've found:
It doesn't boot because it runs into a "PANIC: _mutex_lock(): double lock of..." which is caused by an uninitialized mutex (sObjectCacheListLock in Slab.cpp). The contents of that mutex are all 0. Taking it further apart, it is all 0 because the variable ends up in the .bss section. This was not the case before, where it would end up in the .data section instead and be properly initialized (the includes are fine btw, the preprocessed output looks exactly as expected).
Research shows that what we do now is called a compound literal: http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Compound-Literals.html
As described there, support for this in initializers of static variables is only a GCC extension. Either they changed their mind since then, or it is a bug/regression in the GCC4 we are using. I've looked over their bugzilla, but nothing really pointing in that direction turned up. I didn't look too closely, though. I also did not find any workarounds to achieve the desired effect.
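To make the distinction concrete, here is a minimal sketch of the two initializer styles. The `demo_mutex` struct, the `DEMO_MUTEX_INITIALIZER` macro and the helper functions are made up for illustration and are not the real kernel definitions; it is also not meant to describe what any particular revision actually changed. The compound-literal form is only shown in a comment, and the active macro uses a plain aggregate initializer, which is a constant initializer and so does not depend on any init code being run.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a kernel lock; the real struct in
// headers/private/kernel/lock.h has different fields.
struct demo_mutex {
	const char*	name;
	void*		waiters;
	int			holder;
};

// Compound-literal form (a GCC extension), shown only as a comment:
//   #define DEMO_MUTEX_INITIALIZER(n) (demo_mutex){ n, nullptr, -1 }
// Plain aggregate form: a constant initializer that needs no init code.
#define DEMO_MUTEX_INITIALIZER(n) { n, nullptr, -1 }

static demo_mutex sDemoLock = DEMO_MUTEX_INITIALIZER("demo lock");

// Returns true if the lock still carries the values from its initializer,
// i.e. it is not the all-zero object described in comment:12.
bool
demo_lock_is_initialized()
{
	return sDemoLock.name != nullptr
		&& std::strcmp(sDemoLock.name, "demo lock") == 0
		&& sDemoLock.waiters == nullptr
		&& sDemoLock.holder == -1;
}

// Sanity check in the spirit of the kernel's own lock checks.
void
demo_lock_check()
{
	assert(demo_lock_is_initialized());
}
```

Calling `demo_lock_check()` before any other code has touched the lock is enough to tell a properly initialized object from a zeroed one.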
comment:13 by , 15 years ago
Replying to mmlr:
locks.h is included by exactly one file in the whole tree: src/system/libroot/posix/malloc_debug/heap.cpp. The two headers also don't have the same name (lock.h vs. locks.h). If it is indeed included and used, then something's very wrong. There are a few shared headers between libroot and the kernel, but this isn't the case for this one and I wouldn't know how it could happen. Whether the header is included is easily verified by adding an #error directive to it and checking whether the kernel still builds with that.
You're right, obviously. Unfortunately, I didn't have time to dive deeper late at night, nor the skills to dive deeper efficiently ;-)
comment:14 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed in hrev33592. Thanks for the investigation, Michael! This has to be a regression, as otherwise the compiler should warn you about what is happening.
comment:15 by , 15 years ago
I was about to file a bug report at the GCC bugzilla, but after trying to distill out a simpler testcase I now know what is actually going on. The initializer is still there, but the initialization of the locks at hand isn't done by putting them in the data section anymore. Instead the init is moved to the static_initialization_and_destruction function. We don't call this init function in the kernel. If it is safe to call it, then we should probably just do it. I am not sure how safe it is to call in early kernel startup, but it should basically just be initializing variables to constant values, so it should be unproblematic.
The problem is that there is not a single init function, for obvious reasons. Looking at a kernel disassembly, there are currently 14 instances of such a function spread across various files. Judging simply by the fact that there are such functions, we probably miss some static init in GCC4 right now.
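As a rough sketch of what "just calling them" could look like: every per-file init function ends up as an entry in a table of function pointers, and startup code walks that table once. Everything below is hypothetical (the table, the two stand-in init functions and `call_static_initializers()`); in a real kernel the table bounds would come from the linker rather than from a hand-written array.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*init_func)();

// Starts out zeroed, as a variable living in .bss would.
static int sLockHolder = 0;
static int sInitCalls = 0;

// Stand-ins for the compiler-generated per-file init functions.
static void init_file_a() { sLockHolder = -1; sInitCalls++; }
static void init_file_b() { sInitCalls++; }

// Hypothetical table; a kernel would get begin/end from linker symbols.
static init_func sInitTable[] = { init_file_a, init_file_b };

// Calls every non-null entry once, in table order, and returns how many
// functions were invoked.
std::size_t
call_static_initializers(init_func* begin, init_func* end)
{
	std::size_t count = 0;
	for (init_func* entry = begin; entry != end; entry++) {
		if (*entry != nullptr) {
			(*entry)();
			count++;
		}
	}
	return count;
}

// Runs the demo table; the zeroed variable must hold its real value after.
std::size_t
run_demo_initializers()
{
	std::size_t count = call_static_initializers(sInitTable,
		sInitTable + sizeof(sInitTable) / sizeof(sInitTable[0]));
	assert(sLockHolder == -1);
	return count;
}
```

The loop itself is trivial; the open questions are when it is safe to run it during boot and what the individual functions do.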
comment:16 by , 15 years ago
Calling the global static initialization should be OK in principle. We should check whether there are static kernel variables (with complex constructors) for which the constructors are expected not to be called automatically, though. Also, we should call the module constructors and destructors.
follow-up: 18 comment:17 by , 15 years ago
How to check for complex constructors, though? And when to call them? Modules are easy, though, and that should definitely be done.
follow-up: 19 comment:18 by , 15 years ago
Replying to axeld:
How to check for complex constructors, though?
How about using objdump to find out which ones are (should be called) and reading their source code?
And when to call them?
At some point early in the boot process. I suppose normally static constructors won't do much besides initializing some data structures, so that shouldn't matter all that much. Stuff like memory allocations aren't suitable for constructors anyway.
follow-up: 20 comment:19 by , 15 years ago
Replying to bonefish:
How about using objdump to find out which ones are (should be called) and reading their source code?
Hm, I would prefer a more clear/descriptive mechanism, like putting them into a different ELF section, and then skip them when calling the constructors.
And when to call them?
At some point early in the boot process. I suppose normally static constructors won't do much besides initializing some data structures, so that shouldn't matter all that much. Stuff like memory allocations aren't suitable for constructors anyway.
True enough.
follow-up: 21 comment:20 by , 15 years ago
Replying to axeld:
Replying to bonefish:
How about using objdump to find out which ones are (should be called) and reading their source code?
Hm, I would prefer a more clear/descriptive mechanism, like putting them into a different ELF section, and then skip them when calling the constructors.
What "them" do you mean? I thought we're talking about constructors, but then your sentence doesn't make sense.
Other than that, we could put the static constructors into a separate segment and free it after kernel initialization.
follow-up: 22 comment:21 by , 15 years ago
Replying to bonefish:
Hm, I would prefer a more clear/descriptive mechanism, like putting them into a different ELF section, and then skip them when calling the constructors.
What "them" do you mean? I thought we're talking about constructors, but then your sentence doesn't make sense.
It did to me ;-) When we move constructors into a separate segment, the constructor calling code could ignore them, giving you the opportunity to initialize them manually.
Other than that, we could put the static constructors into a separate segment and free it after kernel initialization.
Yes, even though that would probably only be a few KB, Linux does that as well, btw.
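For illustration only, a sketch of collecting init entries in their own section. It assumes GCC or Clang targeting ELF with a GNU-style linker, which provides `__start_<name>` and `__stop_<name>` symbols for sections whose names are valid C identifiers. The section name `demo_init`, the `DEMO_REGISTER_INIT` macro and the init functions are made up. Note that this places a table of pointers in the section, not the constructor code itself, and it does not show freeing the section afterwards.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*init_func)();

// Linker-provided bounds of the "demo_init" section (assumes a GNU-style
// ELF linker; these symbols do not exist on other object formats).
extern "C" init_func __start_demo_init[];
extern "C" init_func __stop_demo_init[];

// Hypothetical macro: records a pointer to fn in the "demo_init" section.
#define DEMO_REGISTER_INIT(fn) \
	static init_func sInitEntry_##fn \
		__attribute__((used, section("demo_init"))) = fn

static int sCacheListLockHolder = 0;
static int sOtherState = 0;

static void init_cache_list_lock() { sCacheListLockHolder = -1; }
static void init_other_state() { sOtherState = 42; }

DEMO_REGISTER_INIT(init_cache_list_lock);
DEMO_REGISTER_INIT(init_other_state);

// Walks the section once and calls each registered function; returns the
// number of entries that were called.
std::size_t
run_demo_init_section()
{
	std::size_t count = 0;
	for (init_func* entry = __start_demo_init; entry != __stop_demo_init;
			entry++) {
		(*entry)();
		count++;
	}
	return count;
}
```

Because the entries sit in one contiguous section, code that walks them can also decide to skip that section entirely, and the memory could in principle be released once initialization is done.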
comment:22 by , 15 years ago
Replying to axeld:
Replying to bonefish:
Hm, I would prefer a more clear/descriptive mechanism, like putting them into a different ELF section, and then skip them when calling the constructors.
What "them" do you mean? I thought we're talking about constructors, but then your sentence doesn't make sense.
It did to me ;-) When we move constructors into a separate segment, the constructor calling code could ignore them, giving you the opportunity to initialize them manually.
The constructor calling code -- I assume you mean crtbegin.o, which we don't even link against ATM -- shall ignore the constructors, so we can invoke them manually? Sorry, I'm still at a loss.
Other than that, we could put the static constructors into a separate segment and free it after kernel initialization.
Yes, even though that would probably only be a few KB, Linux does that as well, btw.
I wouldn't only put the constructors in there, but all initialization code. That's probably more than only a few KB.
PS: Let's take further discussion to a more suitable place.
damn formatting!
This one works:
haiku-nightly-hrev33544-x86gcc4hybrid-raw.zip
This one doesn't
haiku-nightly-hrev33575-x86gcc4hybrid-raw.zip