#2197 closed bug (fixed)
KDL when starting Firefox
Reported by: | cebif | Owned by: | axeld |
---|---|---|---|
Priority: | blocker | Milestone: | R1 |
Component: | Network & Internet/TCP | Version: | R1/pre-alpha1 |
Keywords: | Cc: | fredrik.holmqvist@… | |
Blocked By: | Blocking: | ||
Platform: | All |
Description
When starting Firefox from firefox.bin or a link to it, it almost always results in KDL. If started in Terminal from the firefox script, it starts OK. I have the "UserSetupEnvironment" file in /home/config/boot; I had to create the /boot folder. I am using Firefox 2.0.0.12 and Haiku hrev25268 on its own partition. I will attach screen images of the KDL with backtrace and my "UserSetupEnvironment" file. I don't think there is anything wrong with this file: it is a copy of the one I use in BeOSR5.05 Bone, where it causes no problems. I have Firefox in the same folder in /apps as in BeOSR5.05 Bone.
Attachments (5)
Change History (57)
by , 17 years ago
Attachment: | KDL Weh Starting Firefox 1.jpg added |
---|
by , 17 years ago
Attachment: | KDL When Starting Firefox 2.jpg added |
---|
by , 17 years ago
Attachment: | UserSetupEnvironment added |
---|
comment:1 by , 17 years ago
Component: | Applications → Network & Internet/TCP |
---|---|
Milestone: | R1 → R1/alpha1 |
Owner: | changed from | to
Priority: | normal → blocker |
comment:2 by , 17 years ago
Owner: | changed from | to
---|
follow-up: 4 comment:3 by , 17 years ago
Cc: | added |
---|
That UserSetupEnvironment seems to be broken. Did it break when uploading, or is it like that on your system? Look at the end of the second line.
comment:4 by , 17 years ago
Replying to tqh:
That UserSetupEnvironment seems to be broken. Did it break when uploading, or is it like that on your system? Look at the end of the second line.
Yes, looking at it now, I think it is broken. The uploaded file is the same as the one on my system in both Haiku and BeOSR5.05 Bone. I don't know why Haiku was sensitive to the error (not every time) and BeOSR5.05 Bone was not. It looks like a ":" and an "A" were missing, in that order.
follow-up: 11 comment:7 by , 17 years ago
Replying to tqh:
So when you corrected that it works correctly?
Yes, I have retested with corrections to UserSetupEnvironment and it has not crashed at startup.
comment:9 by , 17 years ago
Actually, this hints at a problem in the runtime loader, if all it takes is a corrupt PATH variable to trigger a kernel panic under certain conditions. So I would leave this bug open.
comment:11 by , 17 years ago
Replying to cebif:
Replying to tqh:
So when you corrected that it works correctly?
Yes, I have retested with corrections to UserSetupEnvironment and it has not crashed at startup.
I spoke too soon. It is still crashing. It just needed more testing to find out: it crashed when I started Firefox after first booting Haiku this morning.
comment:12 by , 17 years ago
It is still crashing (not every time) even with the UserSetupEnvironment from Bebits directly copied here without any alteration, to show the literal path.
comment:14 by , 17 years ago
I have tested again in hrev24577, with many repeated tests, and it is not crashing.
comment:15 by , 16 years ago
I must have made an error with the haiku version number I reported testing with in my last comment. It cannot have been hrev24577 because that is an earlier version than the version I first reported the bug with. The actual version must have been hrev25477 because I keep the last few images that I have used in a folder and that is the closest match, with the same numbers but the 4 and 5 reversed. It is also about the same time as I made that comment. In any case I am now testing with hrev26493 and cannot reproduce the bug anymore.
comment:16 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
So I'm cautiously closing this bug. Please reopen if it happens again.
comment:17 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Rev 28144, fresh build, booted fine in Vmware Player, started Firefox -> panic.
This has happened to me occasionally in the past as well, but I've never managed to catch the serial output before.
serial.txt attached.
comment:18 by , 16 years ago
Ugh, Trac's acting up. File available at http://81.26.233.149/files/28144_serial.txt
comment:19 by , 16 years ago
Seems to be a deadbeef on freeing a socket here:
12 92007d24 (+ 80) 800b0fdf <kernel_x86>:list_remove_link + 0x000b
13 92007d74 (+ 48) 800b1122 <kernel_x86>:list_remove_head_item + 0x001a
14 92007da4 (+ 48) 80555021 </boot/beos/system/add-ons/kernel/network/stack> delete_children(list*: 0x90c11960) + 0x0021
15 92007dd4 (+ 64) 80555c22 </boot/beos/system/add-ons/kernel/network/stack> socket_delete(net_socket*: 0x90c117f8) + 0x008e
It seems that keeps popping up once in a while.
Here is the Firefox side of things: http://mxr.mozilla.org/mozilla1.8.0/source/nsprpub/pr/src/io/prpolevt.c#413 I think I'll rewrite that as it may be optimized AF_UNIX(?) for Haiku and may be fragile under BeOS.
comment:21 by , 16 years ago
Yes, and no. We also talked about the issue of the runtime-loader and broken lib-paths. It probably should spawn a new bug though.
comment:23 by , 16 years ago
Milestone: | R1/alpha1 → R1 |
---|
I have never seen this happen, definitely not in recent times, so this may even have been fixed by the recent TCP buffer management fixes. Removing this ticket from R1/alpha (since something so rare, or possibly even fixed already, shouldn't hold up the alpha). Also, #2029, possibly the same issue, was not in the R1/alpha milestone before. Someone more confident could mark both tickets as fixed, or move both tickets into R1/alpha if there is reason to believe these were not caused by the TCP buffer list issues.
comment:24 by , 16 years ago
While I haven't seen this bug either (and therefore agree it shouldn't hold up the alpha), I don't think it has been fixed yet, judging from the stack trace, at least.
by , 16 years ago
Attachment: | backtrace-firefox.jpg added |
---|
Firefox 2.0.0.18 KDL + bt on an Asus EEE 901 (hrev30165)
comment:25 by , 16 years ago
In addition to the screenshot above, I can reproduce that crash 100% of the time while using both CPUs on my EEE 901. It does not happen with only one CPU enabled if that helps...
comment:26 by , 16 years ago
Seeing as there are *many* versions of Firefox for BeOS floating around, could you provide some more specific information about that build? Ideally a URL. On a side note, if you search for "BeZilla" in our bugtracker, you'll find some newer built-in-Haiku builds based on Mozilla's source code.
comment:27 by , 16 years ago
The firefox build used is: http://www.bebits.com/bob/21982/firefox-2.0.0.18.en-US.BeOS-bone.zip
comment:28 by , 16 years ago
Please try to reproduce it with a new Haiku build. Axel has done some fixing in the network code, and from the backtrace it looks like a problem with releasing sockets.
comment:29 by , 16 years ago
It would indeed be interesting if that is fixed already. Looks like the socket to be released has already been deleted.
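For readers unfamiliar with the pattern, here is a minimal sketch of why removing an already deleted item from a linked list tends to surface as a "deadbeef" fault. It assumes the 0xdeadbeef in the trace is the usual fill pattern for freed memory; the `Link` and `IntrusiveList` classes and the `delete_twice` helper are hypothetical stand-ins, not the kernel's actual list code. The real kernel would follow the poisoned pointer and panic, whereas this toy version checks for it and raises an error.

```python
POISON = 0xDEADBEEF  # illustrative marker written into the links of a freed node


class Link:
    """One node of a doubly linked list."""

    def __init__(self, name):
        self.name = name
        self.prev = None
        self.next = None


class IntrusiveList:
    """Circular doubly linked list with a sentinel head node."""

    def __init__(self):
        self.head = Link("<head>")
        self.head.prev = self.head
        self.head.next = self.head

    def add(self, link):
        last = self.head.prev
        link.prev, link.next = last, self.head
        last.next = link
        self.head.prev = link

    def remove(self, link):
        # A poisoned link means this node was already removed and freed.
        if link.prev == POISON or link.next == POISON:
            raise RuntimeError(f"{link.name}: link already freed (0xdeadbeef)")
        link.prev.next = link.next
        link.next.prev = link.prev
        # Poison the links, similar to a debug allocator filling freed memory.
        link.prev = link.next = POISON

    def names(self):
        result, node = [], self.head.next
        while node is not self.head:
            result.append(node.name)
            node = node.next
        return result


def delete_twice():
    """Remove the same child twice; return the list state and the error text."""
    children = IntrusiveList()
    first, second = Link("child-a"), Link("child-b")
    children.add(first)
    children.add(second)
    children.remove(first)
    try:
        children.remove(first)  # second deletion of the same object
    except RuntimeError as error:
        return children.names(), str(error)
    return children.names(), None
```

Calling `delete_twice()` leaves only `child-b` in the list and reports the second removal as an access to an already freed link.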
follow-up: 31 comment:30 by , 16 years ago
I tried again, same setup as before, EEE PC 901, firefox-2.0.0.18.en-US.BeOS-bone.zip, this time using Haiku hrev30230
1st attempt (only CPU0 active) - firefox loads successfully
2nd attempt (both CPU0 and CPU1 active) - firefox loads successfully
3rd attempt (both CPU0 and CPU1 active) - KDL again
follow-up: 32 comment:31 by , 16 years ago
Replying to philcostin:
I tried again, same setup as before, EEE PC 901, firefox-2.0.0.18.en-US.BeOS-bone.zip...
Please, use the Firefox builds for Haiku instead, which you can get from here:
http://bezilla.beuser.de/builds/BeZillaBrowser-gcc2-20090218.zip
comment:32 by , 16 years ago
Replying to koki:
Please, use the Firefox builds for Haiku instead, which you can get from here:
I realize you mean well, but in this instance it's irrelevant if he's using the older builds, they shouldn't be able to KDL Haiku regardless, so if they're the builds able to reproduce the problem then they're of interest either way :)
comment:33 by , 16 years ago
koki, I believe anevilyak is correct: if an R5 BONE app is causing a problem in the network stack, I'd rather not skip over it, since it exposes a condition that should not be occurring. I will try those browsers too, however.
I took "a new Haiku build" (tqh) to mean a rebuild of Haiku rather than a newer browser.
Thanks for the links though koki, it's probably right to try it too as the more information the better.
comment:34 by , 16 years ago
@anevilyak & philcostin
Please disregard my comment. I obviously misread tqh's suggestion to use "a new Haiku build".
comment:35 by , 16 years ago
Trying hrev30230 again, using BeZillaBrowser-gcc2-20090218
I have launched the BeZilla browser above 11 or 12 times with both CPUs enabled and I can't reproduce the KDL I get when testing firefox-2.0.0.18.en-US.BeOS-bone
I did not test the GCC4 version of Bezilla.
Could this be something to do with the BONE emulation?
comment:36 by , 16 years ago
Something more to add, thunderbird also KDLs in the same way: http://bebits.com/bob/21959/thunderbird-2.0.0.17.en-US.BeOS-bone.zip
comment:37 by , 16 years ago
Someone who has debugged semaphore issues before might find this useful...
This is the far more useful info I found after playing with the debugger and switching it to CPU 1.
kdebug> cpu 1
Welcome to Kernel Debugging Land...
Thread 538 "firefox-bin" running on CPU 1
kdebug> teams
team id parent name
0x80fa5000 1 0x00000000 kernel_team
0x8105c330 532 0x8105c198 sh
0x80fa5cc0 98 0x80fa5000 Tracker
0x80fa54c8 67 0x80fa5000 debug_server
0x8105c000 99 0x80fa5000 Deskbar
0x80fa5660 68 0x80fa5000 net_server
0x80fa57f8 69 0x80fa5000 app_server
0x8105c660 103 0x80fa5000 media_server
0x8105cb28 538 0x8105c330 firefox-bin
0x8105c7f8 104 0x80fa5000 midi_server
0x8105c990 105 0x80fa5000 print_server
0x80fa5b28 419 0x80fa5cc0 Terminal
0x8105c4c8 423 0x80fa5b28 sh
0x80fa5990 83 0x80fa5000 syslog_daemon
0x80fa5e58 87 0x80fa57f8 input_server
0x80fa5198 119 0x8105c660 media_addon_server
0x8105c198 525 0x8105c4c8 sh
0x80fa5330 60 0x80fa5000 registrar
kdebug> threads 538
thread id state wait for object cpu pri stack team name
0x855b9800 538 running - 1 10 0x85935000 538 firefox-bin
0x855ab000 539 waiting sem 6468 - 10 0x85939000 538 Mozilla XUL BApplication
kdebug> sem 6468
SEM: 0x842b91d0
id: 6468 (0x1944)
name: ´AppLooperPort´
owner: -1
count: -1
queue: 539
last acquired by: 539, count: 1
last released by: 539, count: 1
comment:38 by , 16 years ago
Actually, ignore the above; my screenshot still applies. Thanks to anevilyak and tqh for clarifying.
comment:39 by , 16 years ago
Cc: | removed |
---|
It might be the BONE compatibility layer, but note that the BezillaBrowser is a far better build than the CVS-built BONE builds, with many issues fixed. It probably doesn't even use the same code-path as the BONE-builds. So it is probably necessary to compare the Firefox / BezillaBrowser code and see what is different and if this is a problem with the Firefox CVS code.
The NSPR (runtime layer) of BezillaBrowser has a >300kB patch...
comment:40 by , 16 years ago
Cc: | added |
---|
comment:41 by , 16 years ago
I've added a test app "firefox_crash" in hrev30304. Unfortunately, it doesn't reproduce the crash over here (I don't manage to reproduce it with Firefox either, though). tqh, could you have a look at the test app to see if you spot any differences from how Firefox did it back then?
And could someone who is able to reproduce the problem check whether that test app can also reproduce it?
comment:42 by , 16 years ago
I believe the crashing I am getting (per my original photo) is either #2029 but with a newer codebase... or related to #2029.
This ticket most likely still applies however, as I haven't proven that yet.
I saw your test app get committed while trying to work on the issue myself but it got too late in the night by then for me to do anything more... I'll take a look at the test later to see whether I can reproduce the crash on my EEE PC 901.
Only thing is - I'm not sure how to run a test like that yet.. I'll probably need to re-check out the SVN from within the OS rather than just copying something to the machine... any advice on that axeld?
--Thanks
comment:43 by , 16 years ago
You can build the test app anywhere via:
jam firefox_crash
And then copy it to the machine manually. When updating the system from another OS on the same machine, you can also add it to your UserBuildConfig:
AddFilesToHaikuImage home config bin : firefox_crash ;
comment:44 by , 16 years ago
axeld:
I ran firefox_crash on this machine 20 or 30 times (with both CPUs enabled) and I could not cause a KDL.
comment:46 by , 16 years ago
Thanks philcostin! Looks like I need to have another look at the Firefox sources. At least this doesn't mean that it just doesn't happen on my machine :-)
comment:47 by , 16 years ago
Status: | reopened → new |
---|
I can perfectly reproduce this with Thunderbird!
comment:48 by , 16 years ago
Status: | new → assigned |
---|
comment:49 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Fixed in hrev30363 - finally!
follow-up: 51 comment:50 by , 16 years ago
Nice. I was going to say it might be the event port handling in CVS firefox. Although the stacktrace was a bit weird for that.
(It's completely broken and uses the memory address instead of a unique id to detect / create / delete its port).
comment:51 by , 16 years ago
Replying to tqh:
(It's completely broken and uses the memory address instead of a unique id to detect / create / delete its port).
Ouch, I wonder if that's why CVS Firefox seemed to sometimes kill the message loopers of other apps (it used to happen remarkably often that running Vision and CVS Firefox simultaneously would result in Vision's window thread simply disappearing for no obvious reason. This would perfectly explain that behavior if it's buggy and nukes the wrong port).
TCP still had Hugo as owner, but I'm afraid that's another component for me now :-)
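To illustrate the event port issue discussed in the last two comments, here is a minimal sketch of how identifying a port by a memory address rather than a unique id can end up deleting someone else's port. Everything in it is hypothetical: `FakeHeap` is a toy allocator that hands a freed address straight back out, and `run` only models the bookkeeping, not the actual Firefox or NSPR code.

```python
import itertools


class FakeHeap:
    """Toy allocator that reuses the most recently freed address first."""

    def __init__(self):
        self._free = []
        self._next = 0x1000

    def alloc(self):
        if self._free:
            return self._free.pop()
        address = self._next
        self._next += 0x10
        return address

    def free(self, address):
        self._free.append(address)


def run(key_by_address):
    """Return the names of the ports still alive after a stale delete."""
    heap = FakeHeap()
    unique_ids = itertools.count(1)
    ports = {}  # key -> port name

    def create(name):
        address = heap.alloc()
        key = address if key_by_address else next(unique_ids)
        ports[key] = name
        return address, key

    # App A creates its event port and later deletes it; its memory is freed.
    a_address, a_key = create("app-a-port")
    del ports[a_key]
    heap.free(a_address)

    # App B's object lands on the recycled address.
    create("app-b-port")

    # A late cleanup path still holds App A's old key and deletes "its" port.
    ports.pop(a_key, None)
    return sorted(ports.values())
```

With `run(True)` (address as key) the stale cleanup removes App B's port, so nothing is left; with `run(False)` (unique id as key) the stale key matches nothing and `app-b-port` survives.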