Opened 9 years ago

Closed 9 years ago

#6955 closed bug (fixed)

KDL at desktop after installation

Reported by: drcouzelis
Owned by: nobody
Priority: normal
Milestone: R1
Component: - General
Version: R1/Development
Keywords: KDL
Cc:
Blocked By:
Blocking: #6954
Has a Patch: no
Platform: x86

Description

I used a Haiku nightly hrev39707 gcc2-hybrid anyboot CD. Haiku ran very well from the live CD. Then, I installed Haiku onto my hard drive and restarted my computer.

Problem:

Even though Haiku booted correctly from the live CD, it entered KDL after I installed it and restarted my computer. It looks like it enters KDL just after the blue desktop wallpaper appears.

My computer hardware was well supported by Haiku in the past. I have a Gigabyte motherboard with working sound and networking, and a Radeon card that works well in Haiku. I have installed Haiku many times onto my hard drive without a problem, so this problem must have appeared sometime within the past couple of months.

I will attach a photo of the backtrace from KDL.

Please let me know if you would like me to do more commands in KDL, to give you more information about my hardware, or to give you more specific dates.

Thank you.

Attachments (4)

drcouzelis-kdl.jpg (196.0 KB) - added by drcouzelis 9 years ago.
drcouzelis-r39173-kdl.jpg (346.3 KB) - added by drcouzelis 9 years ago.
drcouzelis-r39173-bt.jpg (408.8 KB) - added by drcouzelis 9 years ago.
default_servers.png (28.9 KB) - added by luroh 9 years ago.

Change History (31)

by drcouzelis, 9 years ago

Attachment: drcouzelis-kdl.jpg added

comment:1 by humdinger, 9 years ago

Blocking: 6954 added

comment:2 by luroh, 9 years ago

Version: R1/alpha2 → R1/Development

I suppose our only chance here is to identify the commit that broke things for you. By testing old nightly builds, you could probably find a nice and narrow regression range.

by drcouzelis, 9 years ago

Attachment: drcouzelis-r39173-kdl.jpg added

by drcouzelis, 9 years ago

Attachment: drcouzelis-r39173-bt.jpg added

comment:3 by drcouzelis, 9 years ago

Thank you for your response and suggestion. I found the build.

Build hrev39021 (2010-10-20) works.

Build hrev39173 (2010-10-28) enters KDL.

I attached two screenshots from hrev39173, one from as soon as it enters KDL, and one after typing the "bt" command.

Please let me know if I can be of further assistance. Thank you.

comment:4 by luroh, 9 years ago

Ok, that's certainly a good start. I can help you drill down further by putting some intermediate image revisions up for download if you'd like. I'd suggest we start by confirming that my gcc2 version of hrev39021 works for you while my hrev39173 enters KDL. How does that sound?
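The drill-down proposed here amounts to a binary search over the revision range. A minimal Python sketch of the idea follows. The function name `bisect_revisions` and the `is_good` callback are hypothetical; the callback stands in for a person booting the image of a given revision and reporting the result. It assumes that every revision in the range can be built and that the breakage, once introduced, stays present in all later revisions.

```python
from typing import Callable, Tuple


def bisect_revisions(good: int, bad: int,
                     is_good: Callable[[int], bool]) -> Tuple[int, int]:
    """Narrow (good, bad) until the two revisions are adjacent.

    `is_good` is a hypothetical callback: in practice someone boots the
    image for that revision and reports whether it reached the desktop.
    """
    if good >= bad:
        raise ValueError("the known-good revision must precede the bad one")
    while bad - good > 1:
        mid = (good + bad) // 2   # next image to test
        if is_good(mid):
            good = mid            # the regression came in after mid
        else:
            bad = mid             # the regression came in at or before mid
    return good, bad


if __name__ == "__main__":
    # Range taken from this ticket; the breaking revision below is made up
    # purely to exercise the search.
    pretend_first_bad = 39100
    print(bisect_revisions(39021, 39173, lambda rev: rev < pretend_first_bad))
```

Because each test halves the range, the 152 revisions between hrev39021 and hrev39173 would need at most eight test boots, provided each image behaves consistently.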

comment:5 by drcouzelis, 9 years ago

That sounds great! I'm ready to download them and try them out.

I will wait for you to give me a link to any builds you would like me to try.

Thank you.

comment:6 by luroh, 9 years ago

Below you will find links to three rather bare-bones gcc2 anyboot images. Copies of these files have all been booted successfully in VMware.

Please first verify that the following two images behave just like their nightly build counterparts: 39021 and 39173

If they do, then please go ahead and try our first iteration: 39079

comment:7 by drcouzelis, 9 years ago

hrev39021 - After the rocket but before the blue desktop it froze. The top half looked like white and black "snow", and the bottom half was black.

hrev39097 - Nothing happened after the rocket. My monitor went black and stopped receiving any input. There was no CD or drive activity.

hrev39173 (or hrev39137) - The same as hrev39097.

I will try any other commands or builds that you recommend. Thank you.

comment:8 by drcouzelis, 9 years ago

Just to clarify, I didn't install any of those three builds. They all failed to load while being used as a live CD.

comment:9 by anevilyak, 9 years ago

One point of concern here: for some range of dates, Build-O-Matic was uploading builds that weren't actually updated properly due to the UTF-8 keymap issue, i.e. they reported being a newer revision than they in fact were. Off the top of my head I'm unfortunately not sure if 2010-10-20 falls within that range or not, but if it does, it might help explain the difference between luroh's build of hrev39021 vs the one downloaded from haiku-files, or at least I assume the ones you previously tried were from haiku-files.

comment:10 by drcouzelis, 9 years ago

Yes, that's correct, I was using the builds from haiku-files.

comment:11 by luroh, 9 years ago

Ouch, sorry to hear the plan didn't work out. Just to let you know, to double check, I burned and tested the gcc2 hrev39021 image today and it initially KDL'ed but started booting fine after I had my network interfaces disabled in BIOS (something I don't have to do on a HDD install). Not that I think it would help in your case, but the image itself should be ok.

comment:12 by drcouzelis, 9 years ago

I disabled the LAN in my BIOS and tried installing the three special builds again.

hrev39021 - Installed. While booting from the hard drive, it entered KDL after the rocket.

hrev39097 - The same as hrev39021.

hrev39173 (or hrev39137) - Installed. While booting from the hard drive, the screen went blank after the rocket.

comment:13 by luroh, 9 years ago

It's at least interesting to note that we've both made the same observation - booting these slightly older revisions from CD seems to work if network interfaces are disabled.

Back to the KDLs, have you tried booting in safe mode? Pressing <Shift> during boot allows you to access the boot menu. If that doesn't work, try the other safe mode options, one at a time.

comment:14 by drcouzelis, 9 years ago

I downloaded and installed the latest nightly build, hrev39771 (2010-12-08). After installing and rebooting, it entered KDL after the rocket, as we expected.

I tried booting in safe mode. It worked, and finished booting all the way to the desktop.

I tried setting and unsetting a few of the other "safe mode" options, but they either caused KDL or a lock-up on the "hardware" (I think) boot icon. If you think we can get useful information from trying more safe mode options, then I'll do it. Otherwise, whatever. :P

I'm a little surprised that the stack trace from KDL doesn't help. Can KDL tell us exactly which line of code is crashing? Would that information be useful?

Thank you.

comment:15 by anevilyak, 9 years ago

The line/function at which something crashes is almost never the line that actually caused the issue. Typically it's something else that went wrong previously which put the system in an incorrect state or corrupted a data structure, which ultimately winds up crashing in some place that winds up making use of said structure, but the latter isn't the actual culprit, only a victim. As such, in all but the most trivial cases, the backtrace only really gives you a starting point for figuring out what other pieces of code could potentially have affected the things the offending function is dealing with. In this case even that is unfortunately less than obvious since it's a crash in the VFS, which is used by just about everything in some manner or other. Therefore, narrowing things down more specifically to either a revision or option that makes the problem go away is about the only real route to go in order to start tracking down the actual underlying cause.

comment:16 by drcouzelis, 9 years ago

Thank you for explaining that to me. I think debugging an operating system is a lot different than I thought it was!

Anyway, it's unfortunate that the development builds that luroh made didn't work like the nightly snapshots on haiku-files. I'm ready to try any other builds or steps to help narrow down the problem. :)

comment:17 by luroh, 9 years ago

The ticket is becoming quite big, but that's ok for now; once we have found enough substantial clues to work with, we can close this one and summarize the findings in a new ticket. We're not yet out of ideas. In fact, having only 'safe mode' working often points to some driver or server causing the problem. My suggestion would be to go back to testing the latest nightly, trying the following:

  1. In the boot menu, instead of 'safe mode', select 'fail-safe video mode' (VESA) and pick a suitable resolution. The video corruption issues reported earlier could indicate that something is wrong with the video driver.
  2. 'safe mode' prevents a bunch of servers from starting at boot. Boot into 'safe mode' and then try starting them manually, one after the other (you'll find them in /boot/system/servers). I have attached a picture of the default running servers.
  3. Try disabling everything you don't need in BIOS (network interfaces, USB?, CDs, HDDs...) and see if you can boot without 'safe mode'.

by luroh, 9 years ago

Attachment: default_servers.png added

in reply to:  14 comment:18 by phoudoin, 9 years ago

> I tried booting in safe mode. It worked, and finished booting all the way to the desktop.

Sounds like some drivers not loaded in safe mode are behind the KDL. As it seems closely related to whether the network interfaces are available, could you tell us which network adapter(s) you have, LAN and/or WAN?

As the KDL triggers at the rocket icon or early in desktop startup, maybe it happens when the net_server starts to configure your interface, indirectly accessing the underlying kernel driver, which may then do something wrong that leads to a freeze or to memory corruption somewhere in kernel space.

comment:19 by drcouzelis, 9 years ago

@phoudoin: My Ethernet port (LAN) is on my motherboard:

GIGABYTE GA-MA770-DS3 AM2+/AM2 AMD 770 ATX All Solid Capacitor AMD Motherboard

@luroh:

1) I selected "fail-safe video mode" and chose a resolution. Haiku started normally, without crashing. Then I tried again with a different resolution and Haiku started normally, without crashing. That surprised me. My video card is Radeon X850 XT PE (R420).

2) When I boot into safe mode, Haiku starts normally (for safe mode). Then, I can start all servers without Haiku crashing. That surprised me.

3) We already know that when I boot with LAN disabled in the BIOS Haiku starts normally. I will try disabling other hardware in the BIOS later tonight.

Thank you.

comment:20 by phoudoin, 9 years ago

Hmm, this could be an IRQ conflict between the network adapter and your Radeon X850 XT PE. What you could try, while booted in safe mode, is to disable the radeon driver at /boot/system/add-ons/kernel/drivers/bin/radeon: move it to your desktop or into a "disabled" subfolder there, then reboot in normal mode. The radeon driver won't load anymore (you will fall back to VESA, sorry), and it would be interesting to see what happens when only the network adapter is driven...
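For anyone who prefers to script the move rather than drag the file in Tracker, here is a minimal Python sketch. It assumes Python is available on the installation, which this ticket does not confirm. The helper name `disable_driver` and the default subfolder name are illustrative; only the driver path comes from the comment above.

```python
import shutil
from pathlib import Path


def disable_driver(driver: Path, subfolder: str = "disabled") -> Path:
    """Move a driver binary into a sibling subfolder and return its new path."""
    target_dir = driver.parent / subfolder
    target_dir.mkdir(exist_ok=True)          # create the subfolder once
    target = target_dir / driver.name
    shutil.move(str(driver), str(target))    # moving the file back re-enables it
    return target


if __name__ == "__main__":
    # Path quoted in this comment; run this while booted in safe mode.
    radeon = Path("/boot/system/add-ons/kernel/drivers/bin/radeon")
    if radeon.exists():
        print("moved to", disable_driver(radeon))
    else:
        print("driver not found at", radeon)
```

Moving rather than deleting keeps the experiment reversible: moving the file back to its original location restores the original setup.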

Could you check (under Linux or Windows, if you can multiboot) which IRQs your BIOS assigns to your network adapter and the Radeon card?

It's starting to smell like an IRQ routing issue...

comment:21 by drcouzelis, 9 years ago

I think it behaved as you expected: I removed the 'radeon' driver and then did a normal boot. Haiku was able to start normally, although in VESA video mode.

The Linux 'lsdev' command says 'radeon' is IRQ 40 and 'eth0' is IRQ 41.

comment:22 by anevilyak, 9 years ago

Blocked By: 5 added

Those are IO-APIC IRQs. Adding ticket #5 as a blocker.
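One way to read the numbers reported in comment:21 is sketched below in Python. The helper names are hypothetical. The only outside fact the sketch relies on is that the legacy pair of cascaded 8259 PICs delivers IRQ 0 through 15, so higher numbers must come from another interrupt controller. It describes the assignment Linux reported and says nothing about how the lines are routed when the IO-APIC is not in use.

```python
from collections import defaultdict
from typing import Dict, List

LEGACY_PIC_MAX_IRQ = 15  # the two cascaded 8259 PICs cover IRQ 0-15 only


def shared_irqs(assignments: Dict[str, int]) -> Dict[int, List[str]]:
    """Return the IRQ lines claimed by more than one device."""
    by_irq = defaultdict(list)
    for device, irq in assignments.items():
        by_irq[irq].append(device)
    return {irq: sorted(devs) for irq, devs in by_irq.items() if len(devs) > 1}


def beyond_legacy_pic(assignments: Dict[str, int]) -> List[str]:
    """Return devices whose IRQ number the legacy PIC pair cannot deliver."""
    return sorted(dev for dev, irq in assignments.items()
                  if irq > LEGACY_PIC_MAX_IRQ)


if __name__ == "__main__":
    reported = {"radeon": 40, "eth0": 41}   # values from comment:21
    print(shared_irqs(reported))            # no line is shared under Linux
    print(beyond_legacy_pic(reported))      # both lie above IRQ 15
```

With the values from comment:21, neither device shares a line under Linux, and both numbers fall outside the legacy range.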

comment:23 by drcouzelis, 9 years ago

Since ticket #5 is a blocker, does that mean I should wait until #5 is fixed before seeing if this ticket is fixed? (even though Haiku was working on my hardware in the beginning of October 2010)

Is there a setting in my BIOS that I should look for that would allow Haiku to boot correctly?

Thank you all for your help.

comment:24 by luroh, 9 years ago

Basically yes, bug #5 is blocking your Haiku installation from working correctly on your computer. However, once in a blue moon, one can get things going by disabling some hardware, assigning IRQs manually in BIOS, selecting "Plug & Play OS = no" or the like. There are no guarantees, nor are there any detailed instructions to be found, but it might be worth trying if you have the time.

comment:25 by drcouzelis, 9 years ago

For some reason, this bug doesn't happen anymore. hrev39873 (2010-12-16) worked correctly, and I just tested it again with hrev39915 (2010-12-21).

I toggled a couple of things in the BIOS, but I don't think I changed anything permanently. Namely:

- I changed the video card priority (I think it was called) from "PEG" to "PCI" and eventually back to "PEG"

- I changed the PCIE Delay Time (found after pressing Ctrl + F1) from "0" to "100" and eventually back to "0"

Anyway, I consider this bug to be resolved for now. Thank you very much for your assistance and hard work! :)

comment:26 by korli, 9 years ago

Blocked By: 5 removed

Thanks for the feedback!

comment:27 by korli, 9 years ago

Resolution: fixed
Status: new → closed