Opened 12 years ago

Closed 11 years ago

Last modified 11 years ago

#2197 closed bug (fixed)

KDL when starting Firefox

Reported by: cebif Owned by: axeld
Priority: blocker Milestone: R1
Component: Network & Internet/TCP Version: R1/pre-alpha1
Keywords: Cc: fredrik.holmqvist@…
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

When starting Firefox from firefox.bin or a link to it, it almost always results in a KDL. If I start it in Terminal from the firefox script, it starts OK. I have the "UserSetupEnvironment" file in /home/config/boot; I had to create the boot folder there. I am using Firefox 2.0.0.12 and Haiku hrev25268 on its own partition. I will attach screenshots of the KDL with backtrace, and my "UserSetupEnvironment" file. I don't think there is anything wrong with this file: it is a copy of the file I use in BeOS R5.05 BONE, where it causes no problems. I have Firefox in the same folder in /apps as in BeOS R5.05 BONE.

Attachments (5)

KDL Weh Starting Firefox 1.jpg (142.4 KB ) - added by cebif 12 years ago.
KDL When Starting Firefox 2.jpg (197.3 KB ) - added by cebif 12 years ago.
UserSetupEnvironment (226 bytes ) - added by cebif 12 years ago.
backtrace-firefox.jpg (160.2 KB ) - added by philcostin 11 years ago.
Firefox 2.0.0.18 KDL + bt on an Asus EEE 901 (hrev30165)
out.txt (8.6 KB ) - added by philcostin 11 years ago.
strace of test: firefox_crash


Change History (57)

by cebif, 12 years ago

by cebif, 12 years ago

by cebif, 12 years ago

Attachment: UserSetupEnvironment added

comment:1 by axeld, 12 years ago

Component: changed from Applications to Network & Internet/TCP
Milestone: changed from R1 to R1/alpha1
Owner: changed from axeld to hugosantos
Priority: changed from normal to blocker

comment:2 by axeld, 12 years ago

Owner: changed from hugosantos to axeld

TCP still had Hugo as owner, but I'm afraid that's another component for me now :-)

comment:3 by tqh, 12 years ago

Cc: tqh added

That UserSetupEnvironment file seems to be broken. Did it break when uploading, or is it like that on your system? Look at the end of the second line.

in reply to:  3 comment:4 by cebif, 12 years ago

Replying to tqh:

That UserSetupEnvironment file seems to be broken. Did it break when uploading, or is it like that on your system? Look at the end of the second line.

Yes, looking at it now, I think it is broken. The uploaded file is the same as the one on my system, on both Haiku and BeOS R5.05 BONE. I don't know why Haiku was sensitive to the error (though not every time) while BeOS R5.05 BONE was not. It looks like a ":" and an "A" were missing, in that order.

comment:5 by tqh, 12 years ago

So after you corrected that, does it work correctly?

comment:6 by kaliber, 12 years ago

Is it a duplicate of bug #2189?

in reply to:  5 ; comment:7 by cebif, 12 years ago

Replying to tqh:

So after you corrected that, does it work correctly?

Yes, I have retested with corrections to UserSetupEnvironment and it has not crashed at startup.

comment:8 by tqh, 12 years ago

This bug can be closed.

comment:9 by stippi, 12 years ago

Actually, this hints at a problem in the runtime loader, if all it takes is a corrupt PATH variable to trigger a kernel panic under certain conditions. So I would leave this bug open.

comment:10 by tqh, 12 years ago

Yes, although I saw it is a different bug.

in reply to:  7 comment:11 by cebif, 12 years ago

Replying to cebif:

Replying to tqh:

So after you corrected that, does it work correctly?

Yes, I have retested with corrections to UserSetupEnvironment and it has not crashed at startup.

I spoke too soon; it is still crashing. It just took more tests to find out: it crashed when I started Firefox right after booting Haiku this morning.

comment:12 by cebif, 12 years ago

It is still crashing (not every time), even with the UserSetupEnvironment from BeBits copied here directly without any alteration, to show the literal path.

comment:13 by tqh, 12 years ago

Same type of backtrace?

comment:14 by cebif, 12 years ago

I have tested again in hrev24577, with many repeated tests, and it is not crashing.

comment:15 by cebif, 11 years ago

I must have made an error with the haiku version number I reported testing with in my last comment. It cannot have been hrev24577 because that is an earlier version than the version I first reported the bug with. The actual version must have been hrev25477 because I keep the last few images that I have used in a folder and that is the closest match, with the same numbers but the 4 and 5 reversed. It is also about the same time as I made that comment. In any case I am now testing with hrev26493 and cannot reproduce the bug anymore.

comment:16 by axeld, 11 years ago

Resolution: fixed
Status: changed from new to closed

So I'm cautiously closing this bug. Please reopen if it happens again.

comment:17 by luroh, 11 years ago

Resolution: fixed
Status: changed from closed to reopened

Rev 28144, fresh build, booted fine in VMware Player, started Firefox -> panic.
This has happened to me occasionally in the past as well, but I've never managed to catch the serial output before.
serial.txt attached.

comment:18 by luroh, 11 years ago

Ugh, Trac's acting up. File available at http://81.26.233.149/files/28144_serial.txt

comment:19 by tqh, 11 years ago

It seems to be a deadbeef (freed memory) access while freeing a socket here:

12 92007d24 (+ 80) 800b0fdf <kernel_x86>:list_remove_link + 0x000b
13 92007d74 (+ 48) 800b1122 <kernel_x86>:list_remove_head_item + 0x001a
14 92007da4 (+ 48) 80555021 </boot/beos/system/add-ons/kernel/network/stack> delete_children(list*: 0x90c11960) + 0x0021
15 92007dd4 (+ 64) 80555c22 </boot/beos/system/add-ons/kernel/network/stack> socket_delete(net_socket*: 0x90c117f8) + 0x008e

It seems that this keeps popping up once in a while.

Here is the Firefox side of things: http://mxr.mozilla.org/mozilla1.8.0/source/nsprpub/pr/src/io/prpolevt.c#413
I think I'll rewrite that; it could perhaps be optimized for Haiku with AF_UNIX(?), and it may be fragile under BeOS.
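The "deadbeef" in this backtrace is the kind of marker a debugging kernel allocator typically writes over freed memory, so list_remove_link() appears to be walking the child list of a socket that was already freed (axeld reaches the same conclusion further down). The sketch below only illustrates that failure mode and is not Haiku's network stack code: struct list_link, poison_as_freed(), looks_freed(), remove_link_checked() and remove_after_free() are hypothetical names, and the simulated "free" merely overwrites the block instead of releasing it, so the demo stays well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical intrusive list link, loosely modeled on a kernel list entry. */
struct list_link {
    struct list_link *next;
    struct list_link *prev;
};

#define FREED_MARKER 0xdeadbeefu

/* Mimics a debugging allocator's free(): overwrite the block with a marker.
   The memory is NOT released here, so later reads remain defined. */
static void poison_as_freed(void *block, size_t size)
{
    uint32_t marker = FREED_MARKER;
    unsigned char *bytes = block;

    assert(block != NULL);
    for (size_t i = 0; i < size; i++)
        bytes[i] = ((const unsigned char *)&marker)[i % sizeof(marker)];
}

/* Nonzero if the block starts with the freed marker. */
static int looks_freed(const void *block)
{
    uint32_t word;
    memcpy(&word, block, sizeof(word));
    return word == FREED_MARKER;
}

/* Unlinks a node from a circular list. Returns -1 instead of following
   poisoned next/prev pointers when the node was already "freed". */
static int remove_link_checked(struct list_link *link)
{
    if (looks_freed(link))
        return -1;
    link->prev->next = link->next;
    link->next->prev = link->prev;
    link->next = link->prev = link;
    return 0;
}

/* Removes the same child twice, "freeing" it in between, as a double
   delete would. Returns the result of the second removal. */
static int remove_after_free(void)
{
    struct list_link head = { &head, &head };
    struct list_link child;

    /* Insert child right after head. */
    child.next = head.next;
    child.prev = &head;
    head.next->prev = &child;
    head.next = &child;

    int first = remove_link_checked(&child);
    poison_as_freed(&child, sizeof(child));
    int second = remove_link_checked(&child);
    return first == 0 ? second : -2;
}
```

A real kernel list has no such check: the poisoned pointers are presumably dereferenced directly, which would fault and drop into KDL as in the backtrace.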

comment:20 by stippi, 11 years ago

This may actually be a duplicate of #2706 then, no?

comment:21 by tqh, 11 years ago

Yes and no. We also discussed the issue of the runtime loader and broken library paths, although that should probably get a new bug.

comment:22 by axeld, 11 years ago

This is probably a duplicate of bug #2029.

comment:23 by stippi, 11 years ago

Milestone: changed from R1/alpha1 to R1

I have never seen this happen, certainly not recently, so it may even have been fixed by the recent TCP buffer management fixes. I'm removing this ticket from R1/alpha, since something this rare, or possibly already fixed, shouldn't hold up the alpha. Also, #2029, possibly the same issue, was not in the R1/alpha milestone before. Someone more confident could mark both tickets as fixed, or move both into R1/alpha if there is reason to believe they were not caused by the TCP buffer list issues.

comment:24 by axeld, 11 years ago

While I haven't seen this bug either (and therefore agree it shouldn't hold up the alpha), I don't think it has been fixed yet, judging from the stack trace, at least.

by philcostin, 11 years ago

Attachment: backtrace-firefox.jpg added

Firefox 2.0.0.18 KDL + bt on an Asus EEE 901 (hrev30165)

comment:25 by philcostin, 11 years ago

In addition to the screenshot above, I can reproduce that crash 100% of the time while using both CPUs on my EEE 901. It does not happen with only one CPU enabled, if that helps.

comment:26 by mmadia, 11 years ago

Seeing as there are *many* versions of Firefox for BeOS floating around, could you provide more specific information about that build, ideally a URL? On a side note, if you search for "BeZilla" in our bug tracker, you'll find some newer builds, built in Haiku from Mozilla's source code.

comment:28 by tqh, 11 years ago

Please try to reproduce it with a new Haiku build. Axel has done some fixing in the network code, and from the backtrace it looks like a problem with releasing sockets.

comment:29 by axeld, 11 years ago

It would indeed be interesting if that is fixed already. Looks like the socket to be released has already been deleted.

comment:30 by philcostin, 11 years ago

I tried again with the same setup as before (EEE PC 901, firefox-2.0.0.18.en-US.BeOS-bone.zip), this time using Haiku hrev30230:

1st attempt (only CPU0 active): Firefox loads successfully
2nd attempt (both CPU0 and CPU1 active): Firefox loads successfully
3rd attempt (both CPU0 and CPU1 active): KDL again

in reply to:  30 ; comment:31 by koki, 11 years ago

Replying to philcostin:

I tried again, same setup as before, EEE PC 901, firefox-2.0.0.18.en-US.BeOS-bone.zip...

Please, use the Firefox builds for Haiku instead, which you can get from here:

http://bezilla.beuser.de/builds/BeZillaBrowser-gcc2-20090218.zip

http://haiku-files.org/files/optional-packages/BeZillaBrowser-2.0.0.21pre-haiku-gcc4-x86-09-03-01.zip

in reply to:  31 comment:32 by anevilyak, 11 years ago

Replying to koki:

Please, use the Firefox builds for Haiku instead, which you can get from here:

I realize you mean well, but in this instance it's irrelevant if he's using the older builds, they shouldn't be able to KDL Haiku regardless, so if they're the builds able to reproduce the problem then they're of interest either way :)

comment:33 by philcostin, 11 years ago

koki, I believe anevilyak is correct: if an R5 BONE app is causing a problem in the network stack, I'd rather not skip over it, since it exposes a condition that should not be occurring. I will try those browsers too, however.

I took "a new Haiku build" (tqh) to mean a rebuild of Haiku rather than a newer browser.

Thanks for the links though, koki; it's probably right to try them too, as the more information the better.

comment:34 by koki, 11 years ago

@anevilyak & philcostin

Please disregard my comment. I obviously misread tqh's suggestion to use "a new Haiku build".

comment:35 by philcostin, 11 years ago

Trying hrev30230 again, using BeZillaBrowser-gcc2-20090218

I have launched the BeZilla browser above 11 or 12 times with both CPUs enabled, and I can't reproduce the KDL I get when testing firefox-2.0.0.18.en-US.BeOS-bone.

I did not test the GCC4 version of BeZilla.

Could this be something to do with the BONE emulation?

comment:36 by philcostin, 11 years ago

Something more to add: Thunderbird also KDLs in the same way: http://bebits.com/bob/21959/thunderbird-2.0.0.17.en-US.BeOS-bone.zip

comment:37 by philcostin, 11 years ago

Someone who has debugged semaphore issues before might find this useful...

This is the far more useful info I found after playing with the debugger and switching it to CPU 1.

kdebug> cpu 1
Welcome to Kernel Debugging Land...
Thread 538 "firefox-bin" running on CPU 1
kdebug> teams
team            id      parent          name
0x80fa5000      1       0x00000000      kernel_team
0x8105c330      532     0x8105c198      sh
0x80fa5cc0      98      0x80fa5000      Tracker
0x80fa54c8      67      0x80fa5000      debug_server
0x8105c000      99      0x80fa5000      Deskbar
0x80fa5660      68      0x80fa5000      net_server
0x80fa57f8      69      0x80fa5000      app_server
0x8105c660      103     0x80fa5000      media_server
0x8105cb28      538     0x8105c330      firefox-bin
0x8105c7f8      104     0x80fa5000      midi_server
0x8105c990      105     0x80fa5000      print_server
0x80fa5b28      419     0x80fa5cc0      Terminal
0x8105c4c8      423     0x80fa5b28      sh
0x80fa5990      83      0x80fa5000      syslog_daemon
0x80fa5e58      87      0x80fa57f8      input_server
0x80fa5198      119     0x8105c660      media_addon_server
0x8105c198      525     0x8105c4c8      sh
0x80fa5330      60      0x80fa5000      registrar
kdebug> threads 538
thread          id      state   wait for        object  cpu pri  stack          team    name
0x855b9800      538     running         -               1   10   0x85935000     538     firefox-bin
0x855ab000      539     waiting sem             6468    -   10   0x85939000     538     Mozilla XUL BApplication
kdebug> sem 6468
SEM: 0x842b91d0
id:     6468 (0x1944)
name:   'AppLooperPort'
owner:  -1
count:  -1
queue:  539
last acquired by: 539, count: 1
last released by: 539, count: 1

comment:38 by philcostin, 11 years ago

Actually, ignore the above; my screenshot still applies. Thanks to anevilyak and tqh for clarifying.

comment:39 by tqh, 11 years ago

Cc: tqh removed

It might be the BONE compatibility layer, but note that BeZillaBrowser is a far better build than the CVS-built BONE builds, with many issues fixed. It probably doesn't even use the same code path as the BONE builds. So it is probably necessary to compare the Firefox and BeZillaBrowser code to see what is different, and whether this is a problem in the Firefox CVS code.

The NSPR (runtime layer) of BezillaBrowser has a >300kB patch...

comment:40 by tqh, 11 years ago

Cc: fredrik.holmqvist@… added

comment:41 by axeld, 11 years ago

I've added a test app "firefox_crash" in hrev30304. Unfortunately, it doesn't reproduce the crash over here (though I don't manage to reproduce it with Firefox either). tqh, could you have a look at the test app and see if you spot any differences from how Firefox did it back then?

And could someone who is able to reproduce the problem check whether that test app also reproduces it?

comment:42 by philcostin, 11 years ago

I believe the crashing I am getting (per my original photo) is either #2029 with a newer codebase, or related to #2029.

This ticket most likely still applies however, as I haven't proven that yet.

I saw your test app get committed while I was trying to work on the issue myself, but by then it was too late at night to do anything more. I'll take a look at the test later to see whether I can reproduce the crash on my EEE PC 901.

The only thing is, I'm not sure how to run a test like that yet. I'll probably need to check out the SVN tree again from within the OS rather than just copying something to the machine. Any advice on that, axeld?

--Thanks

comment:43 by axeld, 11 years ago

You can build the test app anywhere via:

jam firefox_crash

And then copy it to the machine manually. When updating the system from another OS on the same machine, you can also add it to your UserBuildConfig:

AddFilesToHaikuImage home config bin : firefox_crash ;

comment:44 by philcostin, 11 years ago

axeld:

I ran firefox_crash on this machine 20 or 30 times (with both CPUs enabled) and I could not cause a KDL.

by philcostin, 11 years ago

Attachment: out.txt added

strace of test: firefox_crash

comment:45 by philcostin, 11 years ago

revision used: 30305

comment:46 by axeld, 11 years ago

Thanks philcostin! Looks like I need to have another look at the Firefox sources. At least this doesn't mean that it just doesn't happen on my machine :-)

comment:47 by axeld, 11 years ago

Status: changed from reopened to new

I can perfectly reproduce this with Thunderbird!

comment:48 by axeld, 11 years ago

Status: changed from new to assigned

comment:49 by axeld, 11 years ago

Resolution: fixed
Status: changed from assigned to closed

Fixed in hrev30363 - finally!

comment:50 by tqh, 11 years ago

Nice. I was going to say it might be the event port handling in CVS Firefox, although the stack trace was a bit weird for that.

(It's completely broken and uses the memory address instead of a unique id to detect / create / delete its port.)

in reply to:  50 comment:51 by anevilyak, 11 years ago

Replying to tqh:

(It's completely broken and uses the memory address instead of a unique id to detect / create / delete its port.)

Ouch, I wonder if that's why CVS Firefox seemed to sometimes kill the message loopers of other apps (it used to happen remarkably often that running Vision and CVS Firefox simultaneously would result in Vision's window thread simply disappearing for no obvious reason. This would perfectly explain that behavior if it's buggy and nukes the wrong port).
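To illustrate why keying a port on a memory address is fragile, here is a minimal sketch under stated assumptions: an object's port name is derived either from its address or from a counter that never repeats. Once a block is freed, an allocator may hand the same address to a new object, which can then end up finding or deleting the old object's port. struct event_source, name_by_address(), name_by_unique_id() and names_collide() are hypothetical and do not model the actual NSPR code; on BeOS and Haiku the natural unique handle is presumably the port_id returned by create_port().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical object that owns a named event port. */
struct event_source {
    unsigned long id;     /* unique id, assigned once per object lifetime */
    char port_name[32];   /* name used to find/create/delete the port */
};

static atomic_ulong next_source_id = 1;

/* Fragile: the name depends only on where the object lives in memory. */
static void name_by_address(struct event_source *source)
{
    snprintf(source->port_name, sizeof(source->port_name), "evt %p",
             (void *)source);
}

/* Sturdier: the name comes from an id that is never handed out twice
   within this process. */
static void name_by_unique_id(struct event_source *source)
{
    source->id = atomic_fetch_add(&next_source_id, 1);
    snprintf(source->port_name, sizeof(source->port_name), "evt %lu",
             source->id);
}

/* Gives one static slot two successive "lifetimes", mimicking an allocator
   that reuses a freed block, and reports whether both got the same name. */
static int names_collide(void (*assign_name)(struct event_source *))
{
    static struct event_source slot;
    char first[sizeof(slot.port_name)];

    assert(assign_name != NULL);

    memset(&slot, 0, sizeof(slot));   /* first object */
    assign_name(&slot);
    strcpy(first, slot.port_name);

    memset(&slot, 0, sizeof(slot));   /* "freed", then reused for a second object */
    assign_name(&slot);

    return strcmp(first, slot.port_name) == 0;
}
```

Here names_collide(name_by_address) reports a collision because both lifetimes share one address, while names_collide(name_by_unique_id) does not. Note that a per-process counter is only unique within one team.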

comment:52 by philcostin, 11 years ago

Great job! Works for me now too :D hrev30383
