Opened 18 years ago
Closed 18 years ago
#1018 closed bug (fixed)
Booting on Athlon 64 X2 fails with both cpus enabled (vmware)
Reported by: | ekdahl | Owned by: | marcusoverhagen |
---|---|---|---|
Priority: | blocker | Milestone: | R1 |
Component: | System/Kernel | Version: | R1/pre-alpha1 |
Keywords: | | Cc: | axeld, geist |
Blocked By: | | Blocking: | |
Platform: | x86 | | |
Description
Booting stops at the boot screen when enabling both cpus. Using only one cpu works. I'm attaching serial logs for one cpu and both cpus. Tested in vmware workstation with hrev20122.
Attachments (11)
Change History (36)
by , 18 years ago
Attachment: | serial_output_1_cpu.txt added |
---|
by , 18 years ago
Attachment: | serial_output_2_cpus.txt added |
---|
comment:1 by , 18 years ago
follow-up: 3 comment:2 by , 18 years ago
I remember hearing that APM was dangerous on SMP... Still, I used to load the APM driver in R5 on a dual Celeron (BP6) and it worked fine for powering down. But since it's the last thing showing up in the log... did you try disabling it?
follow-up: 4 comment:3 by , 18 years ago
Replying to mmu_man:
I remember hearing that APM was dangerous on SMP... Still, I used to load the APM driver in R5 on a dual Celeron (BP6) and it worked fine for powering down. But since it's the last thing showing up in the log... did you try disabling it?
It makes no difference, debug output is exactly the same, so I'm wondering if it really gets disabled. I tried disabling it both in kernel settings file and boot menu.
comment:4 by , 18 years ago
Replying to ekdahl:
It makes no difference, debug output is exactly the same, so I'm wondering if it really gets disabled. I tried disabling it both in kernel settings file and boot menu.
Does your boot menu show "Disable Hyper-Threading"? My Core 2 Duo machine did (the C2D does not support HT). I disabled supports_hyper_threading() in boot/platform/bios_ia32/smp.cpp (set it to always return false), then chose "disable smp" from the boot menu, and now it runs well.
comment:5 by , 18 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Priority: | normal → blocker |
I have the same problem here with Core 2 Duo E6600. The regression occurred in hrev20072.
I'm going to debug this.
comment:6 by , 18 years ago
Status: | new → assigned |
---|
comment:7 by , 18 years ago
I think I fixed it with change 20154. The new cpuid code was writing to the current CPU structure before it was set up on non-boot CPUs. The solution was to change the ordering of initialization a bit on non-boot CPUs, which isn't a great solution in general but should work for now. See if it still repros on your machine.
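The ordering problem described in this comment can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `cpu_info`, `gCpus`, `setup_cpu_struct()`, `detect_cpu()` and `init_cpu()` are stand-ins invented for illustration, not the actual Haiku kernel structures or the code from change 20154. Here the detection step simply refuses to run on a slot that is not set up yet, where a real kernel would write into memory that is not ready.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CPUS 2

/* Hypothetical per-CPU record; not the real Haiku layout. */
typedef struct {
    bool initialized;   /* true once the slot is ready for use */
    char vendor[13];    /* 12-character vendor string plus terminator */
} cpu_info;

cpu_info gCpus[MAX_CPUS];

/* Step 1: prepare the per-CPU slot. Must run before anything writes to it. */
void setup_cpu_struct(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    memset(&gCpus[cpu], 0, sizeof(gCpus[cpu]));
    gCpus[cpu].initialized = true;
}

/* Step 2: stand-in for the cpuid detection code. It reports failure when
   the slot has not been set up instead of writing into it. */
bool detect_cpu(int cpu, const char *vendor)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    if (!gCpus[cpu].initialized)
        return false;
    strncpy(gCpus[cpu].vendor, vendor, sizeof(gCpus[cpu].vendor) - 1);
    return true;
}

/* Ordering used for every CPU, boot or non-boot: set up first, then detect. */
bool init_cpu(int cpu, const char *vendor)
{
    setup_cpu_struct(cpu);
    return detect_cpu(cpu, vendor);
}
```

Calling `detect_cpu(1, "AuthenticAMD")` before `setup_cpu_struct(1)` returns false in this sketch, while `init_cpu(1, "AuthenticAMD")` succeeds because it enforces the order.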
comment:8 by , 18 years ago
It gets a little bit further now. I've attached the new serial debug output.
by , 18 years ago
Attachment: | new_serial_output_2_cpus.txt added |
---|
comment:9 by , 18 years ago
Cc: | added |
---|
comment:10 by , 18 years ago
by , 18 years ago
Attachment: | newer_serial_output_2_cpus.txt added |
---|
by , 18 years ago
Attachment: | r20157.txt added |
---|
by , 18 years ago
Attachment: | r20159.txt added |
---|
by , 18 years ago
Attachment: | r20162_serial_output.txt added |
---|
comment:12 by , 18 years ago
Replying to ekdahl:
Booting stops at the boot screen when enabling both cpus. Using only one cpu works. I'm attaching serial logs for one cpu and both cpus. Tested in vmware workstation with hrev20122.
Happens here on real hardware. ECS mobo, Athlon 64x2 4200+. I have no serial debug capability at the moment. With the 20070218 build, booting stops after displaying features of CPU 1. With multiprocessor support disabled from boot menu, system boots normally.
follow-ups: 14 15 comment:13 by , 18 years ago
I fixed it after the 0218 build, try it with a newer one.
comment:14 by , 18 years ago
comment:15 by , 18 years ago
Replying to geist:
I fixed it after the 0218 build, try it with a newer one.
I tried hrev20182 from haikuhost.com on real hardware. Still fails unless I disable SMP during boot. Console output (copied by hand, no serial debug here):
code32 0xf000, 0x80bc, length 0xc9ea
code16 0xf000, length 0x418c
data 0xfdf0, length 0x0
CPU1: type 0 family 15 model 11 stepping 1 string AuthenticAMD
CPU1: vendor 'AMD' model name "AMD Athlon(tm) 64 x2 Dual Core Processor 4200+'
CPU1: features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh mmx fxsr sse sse2 ntt sse3 syscall mx mmxest ffxsr long 3dnowext 3dnow
I hope this helps.
comment:16 by , 18 years ago
http://svn.berlios.de/viewcvs/haiku/haiku/trunk/src/system/kernel/smp.c?rev=20200&r1=20160&r2=20200
After that fix, and with scheduler tracing enabled, it booted all the way until it panicked because no boot volume was found (I'm still working on AHCI support)
but without scheduler tracing, it stops pretty early, but later than before the fix, see http://pastebin.ca/368034
comment:17 by , 18 years ago
still seeing the same behavior with hrev20208, downloaded on 2007-02-22.
comment:18 by , 18 years ago
Sorry, I'm having a terrible time reproducing this. I recently fired up an old dual Athlon MP I had lying around to try to reproduce it, but I'm getting something else that's blocking debugging it. I'm pretty sure some of the SMP code has rotted a bit, and I've made it worse before it gets better.
If anyone can reproduce it and figure out precisely what the problem is you'll be my hero. I just can't do it here.
by , 18 years ago
Attachment: | r20231_scheduler_trace.txt added |
---|
comment:19 by , 18 years ago
I applied a small cleanup to arch_smp.c and also added volatile to apic access, but this doesn't help.
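For readers unfamiliar with why `volatile` matters for memory-mapped registers such as the APIC's, here is a minimal sketch. The helper names `reg_read32()` and `reg_write32()` are hypothetical and are not the functions in arch_smp.c; the base pointer and offsets are whatever the caller supplies. Accessing the register through a `volatile` pointer tells the compiler to perform every load and store as written rather than caching or eliminating them.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit register at byte_offset from base. The volatile qualifier
   forces a real load on every call. */
uint32_t reg_read32(volatile uint32_t *base, uint32_t byte_offset)
{
    assert(byte_offset % sizeof(uint32_t) == 0);  /* 32-bit aligned access */
    return base[byte_offset / sizeof(uint32_t)];
}

/* Write a 32-bit register at byte_offset from base. The volatile qualifier
   keeps the compiler from dropping or merging the store. */
void reg_write32(volatile uint32_t *base, uint32_t byte_offset, uint32_t value)
{
    assert(byte_offset % sizeof(uint32_t) == 0);  /* 32-bit aligned access */
    base[byte_offset / sizeof(uint32_t)] = value;
}
```

In a user-space test an ordinary `uint32_t` array can stand in for the register window. Note that `volatile` only constrains the compiler; it does not provide atomicity or ordering between CPUs, which fits the observation that the change alone did not help.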
I enabled TRACE in main.c, scheduler.cpp, and arch_smp.c. I did not enable TRACE_TIMER in arch_smp.c. Please have a look at r20231_scheduler_trace.txt.
- when not enabling tracing in scheduler.cpp, the system stops at "INIT : main: done... begin idle loop on cpu 1"
- reschedule is never executed on cpu 1
- "inter-cpu interrupt on cpu 1" appears frequently. What does it do?
- when enabling TRACE_TIMER in arch_smp.c, reschedule is executed on both cpus
- the apic time function only disables interrupts, is this enough?
follow-up: 21 comment:20 by , 18 years ago
This appears to be the same bug: http://axeld.blogspot.com/2005/10/not-yet.html
comment:21 by , 18 years ago
Replying to marcusoverhagen:
This appears to be the same bug: http://axeld.blogspot.com/2005/10/not-yet.html
Maybe yes, maybe no. The real puzzle is that sometime in the recent past (within the last 60 days or so) Haiku did boot correctly and run on both CPUs of my Athlon 64. Then I stopped downloading nightly builds for a while; now it doesn't work. I'll try to research and find out when things stopped by loading older images if I can find them.
comment:22 by , 18 years ago
Seems to work now, after the recent changes made by geist.
However, I can only test the boot process up to the point where the root partition is supposed to be mounted.
Can anyone else confirm?
comment:23 by , 18 years ago
Tested here using 1 March image from BuildFactory on real hardware (Athlon64x2 3800+.) Boot no longer stops at the point indicated in my previous comment. Booting dies after appserver starts (thread 47 caused segment violation) but this may be unrelated.
comment:24 by , 18 years ago
That's what I'm seeing too. Later on the system dies because the first couple of processes get clobbered somehow. I don't think it's SMP related; it may be something we're not doing right on newer CPUs, or it could just be a fast machine problem. I vote to mark this one closed and track the app_server/sh/whatever failures with another bug.
comment:25 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
(I don't know why the font is set to bold)
Closing this bug.
by , 18 years ago
Attachment: | serial_output_2_cpus_r20359_occasional.txt added |
---|
Gets to the point where I can see the desktop background color and the mouse cursor, but only gets this far sometimes.
by , 18 years ago
Attachment: | serial_output_2_cpus_r20359_regular.txt added |
---|
This is where it most often stops
by , 18 years ago
Attachment: | serial_output_1_cpu_r20359.txt added |
---|
As can be seen in this log, booting on one cpu works
This happens on real hardware, as well, for me.