Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#4782 closed bug (fixed)

gcc4 builds fail to boot

Reported by: augiedoggie
Owned by: axeld
Priority: blocker
Milestone: R1
Component: System/Kernel
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: x86

Description

Something has changed between hrev33544 and hrev33575 that causes gcc4 builds to stop at the boot screen before any icons are lit. This was observed with my own builds on real hardware and by testing nightly images from haiku-files in VirtualBox.

I tested:
haiku-nightly-hrev33544-x86gcc4hybrid-raw.zip -- works
haiku-nightly-hrev33575-x86gcc4hybrid-raw.zip -- doesn't boot

Attachments (1)

log.txt (1.1 KB) - added by mt 10 years ago.
haiku-nightly-hrev33575-x86gcc4-vmware virtualbox log


Change History (23)

comment:1 Changed 10 years ago by augiedoggie

damn formatting!

This one works:

haiku-nightly-hrev33544-x86gcc4hybrid-raw.zip

This one doesn't

haiku-nightly-hrev33575-x86gcc4hybrid-raw.zip

comment:2 Changed 10 years ago by michael.weirauch

Can confirm that an hrev33575 x86gcc4hybrid build fails to boot in VirtualBox, while an x86gcc2hybrid build boots fine. (Both reconfigured and built with jam -qaj2.)

I don't have the serial log from VirtualBox at hand, but it was something like: Kernel running on CPU 0 "" .... It does not light up any icons. Perhaps related to the boot_loader bfs changes?

Haven't dared to test on real hardware yet.

comment:3 Changed 10 years ago by phoudoin

Same issue on real hardware too, with hrev33575. I may post a serial log from the boot loader, but not within the next 8 hours...

comment:4 Changed 10 years ago by mt

I have the same issue on real hardware, so I tested haiku-nightly-hrev33575-x86gcc4-vmware.zip in VirtualBox. Log added.

Changed 10 years ago by mt

Attachment: log.txt added

haiku-nightly-hrev33575-x86gcc4-vmware virtualbox log

comment:5 Changed 10 years ago by phoudoin

My bet is on hrev33547.

comment:6 in reply to:  5 Changed 10 years ago by michael.weirauch

Component: changed from - General to System/Kernel
Owner: changed from nobody to axeld
Platform: changed from All to x86
Version: changed from R1/alpha1 to R1/Development

Replying to phoudoin:

> My bet is on hrev33547.

Confirmed. hrev33546 boots, hrev33547 does not in VirtualBox.

comment:7 Changed 10 years ago by phoudoin

Priority: changed from normal to blocker

comment:8 Changed 10 years ago by axeld

Could anyone look into this with a GCC4 build at hand? I mean not just reverting; I have no idea why GCC4 would interpret this incorrectly.

comment:9 Changed 10 years ago by phoudoin

I'm rebuilding gcc4hybrid with hrev33547 and hrev33548 reverted right now. I'll look into it, but any help is welcome: I'm not a gcc4 expert, far from it.

comment:10 Changed 10 years ago by phoudoin

It seems that since hrev33547, the (mutex), (recursive_lock) and (rw_lock) casts are interpreted by gcc4 as the types typedef'ed in the new headers/private/shared/locks.h, not as the usual headers/private/kernel/lock.h ones. Maybe gcc 2.95.3 and gcc4 don't use exactly the same include order. Or maybe I'm just being blind and stupid, dunno yet.

Anyway, having two different typedef'ed mutex structs doesn't sound like a good idea, unless we take some measure to guarantee that they can't be used in the wrong context (kernel vs. userland). That doesn't seem to be the case currently.

comment:11 Changed 10 years ago by mmlr

locks.h is included by exactly one file in the whole tree: src/system/libroot/posix/malloc_debug/heap.cpp. The two headers also don't have the same name (lock.h vs. locks.h). If it is indeed included and used, then something's very wrong. There are a few headers shared between libroot and the kernel, but this isn't one of them, and I wouldn't know how it could happen. Whether the header is included is easily verified by adding an #error directive to it and checking whether the kernel still builds.
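One way to act on the concern from comment 10 is to make a userland-only header refuse to compile inside the kernel, so a wrong include fails loudly instead of silently picking up the wrong lock types. The sketch below assumes kernel sources are compiled with `_KERNEL_MODE` defined (verify against the actual build flags); the `userland_mutex` type and `make_userland_mutex()` helper are hypothetical and do not reflect the real contents of headers/private/shared/locks.h. Making the `#error` unconditional is the quick one-off check suggested in comment 11.

```cpp
// Hypothetical sketch of a userland-only locks header; not the actual
// contents of headers/private/shared/locks.h.
#ifndef SHARED_LOCKS_SKETCH_H
#define SHARED_LOCKS_SKETCH_H

// Assumption: kernel sources are compiled with _KERNEL_MODE defined.
// Any kernel file that pulls in this header then fails to build instead
// of silently using the userland lock types.
#ifdef _KERNEL_MODE
#	error "userland locks header included in a kernel build"
#endif

#include <cassert>
#include <cstddef>
#include <stdint.h>

// Hypothetical userland lock type, named differently from the kernel's
// `mutex` so the two cannot be confused.
typedef struct userland_mutex {
	const char*	name;
	int32_t		count;
} userland_mutex;

static inline userland_mutex
make_userland_mutex(const char* name)
{
	assert(name != NULL);
	userland_mutex lock = { name, 0 };
	return lock;
}

#endif	// SHARED_LOCKS_SKETCH_H
```

In a userland build the header compiles normally; in a build that defines `_KERNEL_MODE` the include itself becomes a compile error.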

comment:12 Changed 10 years ago by mmlr

That's what I've found:

It doesn't boot because it runs into a "PANIC: _mutex_lock(): double lock of..." which is caused by an uninitialized mutex (sObjectCacheListLock in Slab.cpp). The contents of that mutex are all 0. Digging further, it is all 0 because the variable ends up in the .bss section. Previously, it ended up in the .data section instead and was properly initialized (the includes are fine, btw; the preprocessed output looks exactly as expected).

Research shows that what we do now is called a compound literal: http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Compound-Literals.html

As described there, supporting this for static variables is only a GCC extension. Either they changed their mind since then, or it is a bug/regression in the GCC4 we are using. I've looked through their bugzilla, but nothing really pointing in that direction turned up. I didn't look too closely, though. I also didn't find any workaround to achieve the desired effect.
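To make the difference concrete, here is a minimal sketch using a hypothetical, simplified mutex struct (the real kernel `mutex` has different fields, and `-1` meaning "unlocked" is an assumption of this sketch). The first macro uses a compound literal, the pattern identified above; in C++ that is only a GCC extension, and per the later analysis in this ticket the object can end up being filled in at run time rather than emitted pre-initialized. The second macro is a plain aggregate initializer. This is an illustration, not necessarily the change that went into hrev33592.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for the kernel's mutex type.
typedef struct sketch_mutex {
	const char*	name;
	void*		waiters;
	int			holder;		// assumption: -1 means "not held"
	unsigned	flags;
} sketch_mutex;

// Compound-literal form, the pattern discussed in this ticket. In C++ this
// is a GCC extension, and the object may be filled in at run time by a
// static initialization function. Shown for contrast only; unused below.
#define SKETCH_MUTEX_INITIALIZER_COMPOUND(name) \
	((sketch_mutex){ name, NULL, -1, 0 })

// Plain aggregate initializer: every member is a constant expression, so a
// static object using it is statically initialized (typically emitted
// pre-filled in .data) and needs no run-time initialization code.
#define SKETCH_MUTEX_INITIALIZER(name) { name, NULL, -1, 0 }

static sketch_mutex sObjectCacheListLockSketch
	= SKETCH_MUTEX_INITIALIZER("object cache list");

static bool
sketch_mutex_is_unlocked(const sketch_mutex* lock)
{
	assert(lock != NULL);
	return lock->holder == -1;
}
```

Because the aggregate form needs no constructor call, it stays correct even in an environment that never runs static initialization functions, which is exactly the situation the kernel is in here.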

comment:13 in reply to:  11 Changed 10 years ago by phoudoin

Replying to mmlr:

> locks.h is included by exactly one file in the whole tree: src/system/libroot/posix/malloc_debug/heap.cpp. The two headers also don't have the same name (lock.h vs. locks.h). If it is indeed included and used, then something's very wrong. There are a few headers shared between libroot and the kernel, but this isn't one of them, and I wouldn't know how it could happen. Whether the header is included is easily verified by adding an #error directive to it and checking whether the kernel still builds.

You're right, obviously. Unfortunately, I didn't have time to dig deeper late last night, nor to get skilled enough to dig deeper efficiently ;-)

comment:14 Changed 10 years ago by axeld

Resolution: fixed
Status: changed from new to closed

Fixed in hrev33592. Thanks for the investigation, Michael! This has to be a regression, as otherwise the compiler should warn you about what is happening.

comment:15 Changed 10 years ago by mmlr

I was about to file a bug report at the GCC bugzilla, but after trying to distill a simpler test case I now know what is actually going on. The initializer is still there, but the locks at hand are no longer initialized by placing them in the data section. Instead, the initialization is moved into the static_initialization_and_destruction function, and we don't call this init function in the kernel. If it is safe to call it, then we should probably just do so. I am not sure how safe it is to call it during early kernel startup, but here it basically just initializes variables to constant values, so it should be unproblematic.

The problem is that, for obvious reasons, there is not a single init function. Looking at a kernel disassembly, there are currently 14 instances of such a function spread across various files. Judging simply by the fact that these functions exist, we are probably missing some static initialization in GCC4 builds right now.
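The missing step can be sketched as walking a table of per-file initialization function pointers and calling each one once, early in kernel startup. In a real kernel the table bounds would come from linker-script symbols around `.ctors`/`.init_array`; the names `gKernelCtorsStart`, `gKernelCtorsEnd` and `kernel_init_static_data()` are hypothetical, and the table is simulated with a local array so the sketch runs in user space.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*init_func)(void);

// Calls every initializer in [start, end) front to back, the order used for
// .init_array. (A legacy .ctors table is conventionally walked in reverse.)
static void
call_static_initializers(const init_func* start, const init_func* end)
{
	assert(start <= end);
	for (const init_func* it = start; it != end; it++)
		(*it)();
}

// Simulated per-file initializers standing in for the compiler-generated
// static initialization functions found in the kernel disassembly.
static int sInitCount = 0;
static int sSlabInitOrder = 0;
static void init_slab_unit(void) { sSlabInitOrder = ++sInitCount; }
static void init_vm_unit(void) { ++sInitCount; }

// Hypothetical stand-ins for linker-provided table bounds.
static const init_func gKernelCtors[] = { init_slab_unit, init_vm_unit };
static const init_func* const gKernelCtorsStart = gKernelCtors;
static const init_func* const gKernelCtorsEnd
	= gKernelCtors + sizeof(gKernelCtors) / sizeof(gKernelCtors[0]);

// Hypothetical hook, to be called once early during kernel startup.
static void
kernel_init_static_data(void)
{
	call_static_initializers(gKernelCtorsStart, gKernelCtorsEnd);
}
```

Calling a single hook like this covers all per-file init functions at once, which sidesteps the problem of there being no single function to call.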

comment:16 Changed 10 years ago by bonefish

Calling the global static initialization should be OK in principle. We should check whether there are static kernel variables (with complex constructors) for which the constructors are expected not to be called automatically, though. Also, we should call the module constructors and destructors.
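For modules, the same idea applies in both directions: run a module image's constructors after it is loaded and its destructors, last to first, before it is unloaded. A minimal sketch; the `module_image_sketch` type and the two hooks are hypothetical and not Haiku's module API, and it assumes `dtors[i]` undoes what `ctors[i]` set up.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*image_func)(void);

// Hypothetical description of a loaded module image's init/fini tables.
struct module_image_sketch {
	const image_func*	ctors;
	size_t				ctorCount;
	const image_func*	dtors;
	size_t				dtorCount;
};

static void
module_image_loaded(const module_image_sketch& image)
{
	for (size_t i = 0; i < image.ctorCount; i++)
		image.ctors[i]();
}

// Destructors run last to first, so teardown happens in the reverse order
// of construction (assuming dtors[i] pairs with ctors[i]).
static void
module_image_unloading(const module_image_sketch& image)
{
	for (size_t i = image.dtorCount; i > 0; i--)
		image.dtors[i - 1]();
}

// Simulated module whose constructors/destructors record their call order.
static char sTrace[8];
static size_t sTraceLength = 0;
static void record(char c)
{
	assert(sTraceLength < sizeof(sTrace) - 1);
	sTrace[sTraceLength++] = c;
}
static void ctor_a(void) { record('A'); }
static void ctor_b(void) { record('B'); }
static void dtor_a(void) { record('a'); }
static void dtor_b(void) { record('b'); }

static const image_func kCtors[] = { ctor_a, ctor_b };
static const image_func kDtors[] = { dtor_a, dtor_b };
static const module_image_sketch kModule = { kCtors, 2, kDtors, 2 };
```

Loading and then unloading `kModule` records the sequence `A`, `B`, `b`, `a`.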

comment:17 Changed 10 years ago by axeld

How to check for complex constructors, though? And when to call them? Modules are easy, though, and that should definitely be done.

comment:18 in reply to:  17 ; Changed 10 years ago by bonefish

Replying to axeld:

> How to check for complex constructors, though?

How about using objdump to find out which ones there are (i.e. which should be called) and reading their source code?

> And when to call them?

At some point early in the boot process. I suppose normally static constructors won't do much besides initializing some data structures, so that shouldn't matter all that much. Stuff like memory allocations aren't suitable for constructors anyway.

comment:19 in reply to:  18 ; Changed 10 years ago by axeld

Replying to bonefish:

> How about using objdump to find out which ones there are (i.e. which should be called) and reading their source code?

Hm, I would prefer a clearer, more descriptive mechanism, like putting them into a different ELF section and then skipping them when calling the constructors.

>> And when to call them?

> At some point early in the boot process. I suppose normally static constructors won't do much besides initializing some data structures, so that shouldn't matter all that much. Stuff like memory allocations aren't suitable for constructors anyway.

True enough.

comment:20 in reply to:  19 ; Changed 10 years ago by bonefish

Replying to axeld:

> Replying to bonefish:

>> How about using objdump to find out which ones there are (i.e. which should be called) and reading their source code?

> Hm, I would prefer a clearer, more descriptive mechanism, like putting them into a different ELF section and then skipping them when calling the constructors.

What "them" do you mean? I thought we're talking about constructors, but then your sentence doesn't make sense.

Other than that, we could put the static constructors into a separate segment and free it after kernel initialization.

comment:21 in reply to:  20 ; Changed 10 years ago by axeld

Replying to bonefish:

>> Hm, I would prefer a clearer, more descriptive mechanism, like putting them into a different ELF section and then skipping them when calling the constructors.

> What "them" do you mean? I thought we're talking about constructors, but then your sentence doesn't make sense.

It did to me ;-) When we move constructors into a separate segment, the constructor calling code could ignore them, giving you the opportunity to initialize them manually.

> Other than that, we could put the static constructors into a separate segment and free it after kernel initialization.

Yes, even though that would probably only be a few KB, Linux does that as well, btw.
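The idea from comments 19 and 21 (keep selected initializers out of the generic constructor pass and call them explicitly) can be sketched with a dedicated section that the normal constructor loop never looks at. This relies on GNU ld (and compatible linkers such as lld) synthesizing `__start_<name>`/`__stop_<name>` symbols for sections whose names are valid C identifiers; the section name `deferred_init`, the `DEFERRED_INIT` macro and `run_deferred_initializers()` are hypothetical, and a kernel linker script would more likely define its own bounds symbols.

```cpp
#include <cassert>

typedef void (*deferred_init_func)(void);

// Places a pointer to `func` into the "deferred_init" section. `used` keeps
// the otherwise unreferenced variable from being discarded by the compiler.
#define DEFERRED_INIT(id, func) \
	static const deferred_init_func id \
		__attribute__((section("deferred_init"), used)) = func

// Section bounds synthesized by the linker (GNU ld / lld behavior).
extern "C" const deferred_init_func __start_deferred_init[];
extern "C" const deferred_init_func __stop_deferred_init[];

static int sDeferredRuns = 0;
static void init_object_caches(void) { sDeferredRuns++; }
static void init_debug_hooks(void) { sDeferredRuns++; }

DEFERRED_INIT(sInitObjectCaches, init_object_caches);
DEFERRED_INIT(sInitDebugHooks, init_debug_hooks);

// Called explicitly at whatever point during boot these are safe to run;
// the generic constructor loop never touches this section.
static void
run_deferred_initializers(void)
{
	for (const deferred_init_func* it = __start_deferred_init;
			it != __stop_deferred_init; it++)
		(*it)();
}
```

The order of entries within the section is not guaranteed by the language, so initializers registered this way should not depend on each other.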

comment:22 in reply to:  21 Changed 10 years ago by bonefish

Replying to axeld:

> Replying to bonefish:

>>> Hm, I would prefer a clearer, more descriptive mechanism, like putting them into a different ELF section and then skipping them when calling the constructors.

>> What "them" do you mean? I thought we're talking about constructors, but then your sentence doesn't make sense.

> It did to me ;-) When we move constructors into a separate segment, the constructor calling code could ignore them, giving you the opportunity to initialize them manually.

The constructor calling code -- I assume you mean crtbegin.o, which we don't even link against ATM -- shall ignore the constructors, so we can invoke them manually? Sorry, I'm still at a loss.

>> Other than that, we could put the static constructors into a separate segment and free it after kernel initialization.

> Yes, even though that would probably only be a few KB, Linux does that as well, btw.

I wouldn't only put the constructors in there, but all initialization code. That's probably more than only a few KB.

PS: Let's take further discussion to a more suitable place.
