Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#2197 closed bug (fixed)

KDL when starting Firefox

Reported by: cebif Owned by: axeld
Priority: blocker Milestone: R1
Component: Network & Internet/TCP Version: R1/pre-alpha1
Keywords: Cc: fredrik.holmqvist@…
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

Starting Firefox from firefox.bin, or from a link to it, almost always results in KDL. Starting it in Terminal from the firefox script works fine. I have the "UserSetupEnvironment" file in /home/config/boot; I had to create the /boot folder. I am using Firefox 2.0.0.12 and Haiku hrev25268 on its own partition. I will attach screen images of the KDL with backtrace and my "UserSetupEnvironment" file. I don't think there is anything wrong with this file: it is a copy of the one I use in BeOSR5.05 Bone, where it causes no problems. I have Firefox in the same folder in /apps as in BeOSR5.05 Bone.

Attachments (5)

KDL Weh Starting Firefox 1.jpg (142.4 KB) - added by cebif 11 years ago.
KDL When Starting Firefox 2.jpg (197.3 KB) - added by cebif 11 years ago.
UserSetupEnvironment (226 bytes) - added by cebif 11 years ago.
backtrace-firefox.jpg (160.2 KB) - added by philcostin 10 years ago.
Firefox 2.0.0.18 KDL + bt on an Asus EEE 901 (hrev30165)
out.txt (8.6 KB) - added by philcostin 10 years ago.
strace of test: firefox_crash

Change History (57)

Changed 11 years ago by cebif

Changed 11 years ago by cebif

Changed 11 years ago by cebif

Attachment: UserSetupEnvironment added

comment:1 Changed 11 years ago by axeld

Component: changed from Applications to Network & Internet/TCP
Milestone: changed from R1 to R1/alpha1
Owner: changed from axeld to hugosantos
Priority: changed from normal to blocker

comment:2 Changed 11 years ago by axeld

Owner: changed from hugosantos to axeld

TCP still had Hugo as owner, but I'm afraid that's another component for me now :-)

comment:3 Changed 11 years ago by tqh

Cc: tqh added

That UserSetupEnvironment seems to be broken. Did it break during upload, or is it like that on your system? Look at the end of the second line.

comment:4 in reply to:  3 Changed 11 years ago by cebif

Replying to tqh:

That UserSetupEnvironment seems to be broken. Did it break during upload, or is it like that on your system? Look at the end of the second line.

Yes, looking at it now, I think it is broken. The uploaded file is the same as the one on my system, in both Haiku and BeOSR5.05 Bone. I don't know why Haiku was sensitive to the error (though not every time) and BeOSR5.05 Bone was not. It looks like a ":" and an "A" were missing, in that order.

comment:5 Changed 11 years ago by tqh

So when you corrected that it works correctly?

comment:6 Changed 11 years ago by kaliber

Is it a duplicate of bug #2189?

comment:7 in reply to:  5 ; Changed 11 years ago by cebif

Replying to tqh:

So when you corrected that it works correctly?

Yes, I have retested with corrections to UserSetupEnvironment and it has not crashed at startup.

comment:8 Changed 11 years ago by tqh

This bug can be closed.

comment:9 Changed 11 years ago by stippi

Actually, this hints at a problem in the runtime loader, if all it takes is a corrupt PATH variable to trigger a kernel panic under certain conditions. So I would leave this bug open.
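
As a rough illustration of the kind of sanity check that would flag a damaged entry in a colon-separated path variable, here is a minimal sketch in C. It does not reflect what Haiku's runtime loader actually does. The function name demo_count_suspect_entries is hypothetical, and accepting a leading "%A" placeholder is an assumption based on BeOS-style library path values; the actual contents of the attached file are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count entries in a colon-separated path list that
 * look suspicious. An entry is counted when it is empty, or when it starts
 * with neither '/' nor the assumed "%A" placeholder.
 */
size_t demo_count_suspect_entries(const char *list)
{
    size_t suspect = 0;
    const char *start = list;

    assert(list != NULL);

    for (;;) {
        /* Find the end of the current entry (next ':' or end of string). */
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len == 0)
            suspect++;                      /* empty entry, e.g. "a::b" */
        else if (start[0] != '/' && strncmp(start, "%A", 2) != 0)
            suspect++;                      /* neither absolute nor "%A" */

        if (end == NULL)
            break;
        start = end + 1;
    }
    return suspect;
}
```

A caller could run such a check on the variable before exporting it and refuse to continue when the count is non-zero. Whether the loader itself should reject, skip, or tolerate such entries is a separate design question that this sketch does not answer.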

comment:10 Changed 11 years ago by tqh

Yes, although I saw it is a different bug.

comment:11 in reply to:  7 Changed 11 years ago by cebif

Replying to cebif:

Replying to tqh:

So when you corrected that it works correctly?

Yes, I have retested with corrections to UserSetupEnvironment and it has not crashed at startup.

I spoke too soon. It is still crashing; it just took more testing to find out. It crashed when I started Firefox after first booting Haiku this morning.

comment:12 Changed 11 years ago by cebif

It is still crashing (not every time), even with the UserSetupEnvironment from Bebits copied here directly without any alteration, to show the literal path.

comment:13 Changed 11 years ago by tqh

Same type of backtrace?

comment:14 Changed 11 years ago by cebif

I have tested again in hrev24577, with many repeated runs, and it is not crashing.

comment:15 Changed 11 years ago by cebif

I must have made an error with the Haiku version number I reported testing with in my last comment. It cannot have been hrev24577, because that is earlier than the version I first reported the bug with. The actual version must have been hrev25477: I keep the last few images I have used in a folder, and that is the closest match, with the same digits but the 4 and 5 swapped. It also dates from about the time I made that comment. In any case, I am now testing with hrev26493 and cannot reproduce the bug anymore.

comment:16 Changed 11 years ago by axeld

Resolution: fixed
Status: changed from new to closed

So I'm cautiously closing this bug. Please reopen if it happens again.

comment:17 Changed 11 years ago by luroh

Resolution: fixed
Status: changed from closed to reopened

Rev 28144, fresh build, booted fine in VMware Player, started Firefox -> panic.
This has happened to me occasionally in the past as well, but I've never managed to catch the serial output before.
serial.txt attached.

comment:18 Changed 11 years ago by luroh

Ugh, Trac's acting up. File available at http://81.26.233.149/files/28144_serial.txt

comment:19 Changed 11 years ago by tqh

Seems to be a deadbeef on freeing a socket here:

12 92007d24 (+ 80) 800b0fdf <kernel_x86>:list_remove_link + 0x000b
13 92007d74 (+ 48) 800b1122 <kernel_x86>:list_remove_head_item + 0x001a
14 92007da4 (+ 48) 80555021 </boot/beos/system/add-ons/kernel/network/stack> delete_children(list*: 0x90c11960) + 0x0021
15 92007dd4 (+ 64) 80555c22 </boot/beos/system/add-ons/kernel/network/stack> socket_delete(net_socket*: 0x90c117f8) + 0x008e

That seems to keep popping up once in a while.

Here is the Firefox side of things: http://mxr.mozilla.org/mozilla1.8.0/source/nsprpub/pr/src/io/prpolevt.c#413 I think I'll rewrite that as it may be optimized AF_UNIX(?) for Haiku and may be fragile under BeOS.

comment:20 Changed 11 years ago by stippi

This may actually be a duplicate of #2706 then, no?

comment:21 Changed 11 years ago by tqh

Yes, and no. We also talked about the issue of the runtime-loader and broken lib-paths. It probably should spawn a new bug though.

comment:22 Changed 10 years ago by axeld

This is probably a duplicate of bug #2029.

comment:23 Changed 10 years ago by stippi

Milestone: changed from R1/alpha1 to R1

I have never seen this happen, definitely not in recent times, so this may even have been fixed by the recent TCP buffer management fixes. I am removing this ticket from R1/alpha, since something so rare, or possibly even fixed already, shouldn't hold up the alpha. Also, #2029, possibly the same issue, was not in the R1/alpha milestone before. Someone more confident could mark both tickets as fixed, or move both tickets into R1/alpha if there is reason to believe these were not caused by the TCP buffer list issues.

comment:24 Changed 10 years ago by axeld

While I haven't seen this bug either (and therefore agree it shouldn't hold up the alpha), I don't think it has been fixed yet, judging from the stack trace, at least.

Changed 10 years ago by philcostin

Attachment: backtrace-firefox.jpg added

Firefox 2.0.0.18 KDL + bt on an Asus EEE 901 (hrev30165)

comment:25 Changed 10 years ago by philcostin

In addition to the screenshot above, I can reproduce that crash 100% of the time while using both CPUs on my EEE 901. It does not happen with only one CPU enabled if that helps...

comment:26 Changed 10 years ago by mmadia

Seeing as there are *many* versions of Firefox for BeOS floating around, could you provide some more specific information about that build? Ideally a URL. On a side note, if you search for "BeZilla" in our bugtracker, you'll find some newer built-in-Haiku builds based on Mozilla's source code.

comment:28 Changed 10 years ago by tqh

Please try to reproduce it with a new Haiku build. Axel has done some fixing in the network code, and from the backtrace it looks like a problem with releasing sockets.

comment:29 Changed 10 years ago by axeld

It would indeed be interesting if that is fixed already. Looks like the socket to be released has already been deleted.
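
To make the "already deleted" hypothesis concrete, here is a minimal sketch in C of how removing the same element from a doubly linked list twice can be detected when the links of a removed element are overwritten with a poison value. It assumes, as the "deadbeef" remark earlier in this ticket suggests, that the stale element carries a 0xdeadbeef pattern. All demo_ names are hypothetical; this is not the kernel's list implementation and not the network stack's socket_delete().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical intrusive list link, standing in for a kernel list link. */
struct demo_link {
    struct demo_link *next;
    struct demo_link *prev;
};

/* Poison value written into the links of a removed element (assumption). */
#define DEMO_POISON ((struct demo_link *)(uintptr_t)0xdeadbeef)

/* Initialize an empty circular list; the head acts as a sentinel. */
void demo_list_init(struct demo_link *head)
{
    head->next = head;
    head->prev = head;
}

/* Append a link at the tail of the list. */
void demo_list_add(struct demo_link *head, struct demo_link *link)
{
    link->prev = head->prev;
    link->next = head;
    head->prev->next = link;
    head->prev = link;
}

/*
 * Remove a link. Returns 0 on success, or -1 if the link carries the
 * poison value, which means it was already removed. Without this check,
 * the second removal would dereference the poison pointer.
 */
int demo_list_remove(struct demo_link *link)
{
    if (link->next == DEMO_POISON || link->prev == DEMO_POISON)
        return -1;

    link->next->prev = link->prev;
    link->prev->next = link->next;
    link->next = DEMO_POISON;
    link->prev = DEMO_POISON;
    return 0;
}
```

In a kernel the unchecked second removal would fault on the poison address, which matches the shape of the backtrace above (a fault inside a list removal called from a delete path). With two CPUs, two threads can reach the delete path for the same object at nearly the same time, so a check like this only reports the symptom; the real fix has to make sure only one path deletes the object.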

comment:30 Changed 10 years ago by philcostin

I tried again, same setup as before, EEE PC 901, firefox-2.0.0.18.en-US.BeOS-bone.zip, this time using Haiku hrev30230

1st attempt (only CPU0 active) - firefox loads successfully
2nd attempt (both CPU0 and CPU1 active) - firefox loads successfully
3rd attempt (both CPU0 and CPU1 active) - KDL again

comment:31 in reply to:  30 ; Changed 10 years ago by koki

Replying to philcostin:

I tried again, same setup as before, EEE PC 901, firefox-2.0.0.18.en-US.BeOS-bone.zip...

Please, use the Firefox builds for Haiku instead, which you can get from here:

http://bezilla.beuser.de/builds/BeZillaBrowser-gcc2-20090218.zip

http://haiku-files.org/files/optional-packages/BeZillaBrowser-2.0.0.21pre-haiku-gcc4-x86-09-03-01.zip

comment:32 in reply to:  31 Changed 10 years ago by anevilyak

Replying to koki:

Please, use the Firefox builds for Haiku instead, which you can get from here:

I realize you mean well, but in this instance it's irrelevant if he's using the older builds, they shouldn't be able to KDL Haiku regardless, so if they're the builds able to reproduce the problem then they're of interest either way :)

comment:33 Changed 10 years ago by philcostin

koki, I believe anevilyak is correct: if an R5 BONE app is causing a problem in the network stack, I'd rather not skip over it, since it exposes a condition that should not be occurring. I will try those browsers too, however.

I took "a new Haiku build" (tqh) to mean a rebuild of Haiku rather than a newer browser.

Thanks for the links though koki, it's probably right to try it too as the more information the better.

comment:34 Changed 10 years ago by koki

@anevilyak & philcostin

Please disregard my comment. I obviously misread tqh's suggestion to use "a new Haiku build".

comment:35 Changed 10 years ago by philcostin

Trying hrev30230 again, using BeZillaBrowser-gcc2-20090218

I have launched the BeZilla browser above 11 or 12 times with both CPUs enabled, and I can't reproduce the KDL I get when testing firefox-2.0.0.18.en-US.BeOS-bone.

I did not test the GCC4 version of BeZilla.

Could this be something to do with the BONE emulation?

comment:36 Changed 10 years ago by philcostin

Something more to add, thunderbird also KDLs in the same way: http://bebits.com/bob/21959/thunderbird-2.0.0.17.en-US.BeOS-bone.zip

comment:37 Changed 10 years ago by philcostin

Someone who has debugged semaphore issues before might find this useful...

This is the far more useful info I found after playing with the debugger and switching it to CPU 1.

kdebug> cpu 1
Welcome to Kernel Debugging Land...
Thread 538 "firefox-bin" running on CPU 1
kdebug> teams
team            id      parent          name
0x80fa5000      1       0x00000000      kernel_team
0x8105c330      532     0x8105c198      sh
0x80fa5cc0      98      0x80fa5000      Tracker
0x80fa54c8      67      0x80fa5000      debug_server
0x8105c000      99      0x80fa5000      Deskbar
0x80fa5660      68      0x80fa5000      net_server
0x80fa57f8      69      0x80fa5000      app_server
0x8105c660      103     0x80fa5000      media_server
0x8105cb28      538     0x8105c330      firefox-bin
0x8105c7f8      104     0x80fa5000      midi_server
0x8105c990      105     0x80fa5000      print_server
0x80fa5b28      419     0x80fa5cc0      Terminal
0x8105c4c8      423     0x80fa5b28      sh
0x80fa5990      83      0x80fa5000      syslog_daemon
0x80fa5e58      87      0x80fa57f8      input_server
0x80fa5198      119     0x8105c660      media_addon_server
0x8105c198      525     0x8105c4c8      sh
0x80fa5330      60      0x80fa5000      registrar
kdebug> threads 538
thread          id      state   wait for        object  cpu pri  stack          team    name
0x855b9800      538     running         -               1   10   0x85935000     538     firefox-bin
0x855ab000      539     waiting sem             6468    -   10   0x85939000     538     Mozilla XUL BApplication
kdebug> sem 6468
SEM: 0x842b91d0
id:     6468 (0x1944)
name:   ´AppLooperPort´
owner:  -1
count:  -1
queue:  539
last acquired by: 539, count: 1
last released by: 539, count: 1

comment:38 Changed 10 years ago by philcostin

Actually, ignore the above; my screenshot still applies. Thanks to anevilyak and tqh for clarifying.

comment:39 Changed 10 years ago by tqh

Cc: tqh removed

It might be the BONE compatibility layer, but note that the BezillaBrowser is a far better build than the CVS-built BONE builds, with many issues fixed. It probably doesn't even use the same code path as the BONE builds. So it is probably necessary to compare the Firefox / BezillaBrowser code to see what is different, and whether this is a problem with the Firefox CVS code.

The NSPR (runtime layer) of BezillaBrowser has a >300kB patch...

comment:40 Changed 10 years ago by tqh

Cc: fredrik.holmqvist@… added

comment:41 Changed 10 years ago by axeld

I've added a test app "firefox_crash" in hrev30304. Unfortunately, it doesn't reproduce the crash over here (I don't manage to reproduce it with Firefox either, though). tqh, could you have a look at the test app to see if you spot any differences from how Firefox did it back then?

And could someone who is able to reproduce the problem check whether that test app can also reproduce it?

comment:42 Changed 10 years ago by philcostin

I believe the crashing I am getting (per my original photo) is either #2029 but with a newer codebase... or related to #2029.

This ticket most likely still applies however, as I haven't proven that yet.

I saw your test app get committed while trying to work on the issue myself, but by then it was too late at night for me to do anything more. I'll take a look at the test later to see whether I can reproduce the crash on my EEE PC 901.

The only thing is, I'm not sure how to run a test like that yet. I'll probably need to check out the SVN tree again from within the OS rather than just copying something to the machine. Any advice on that, axeld?

--Thanks

comment:43 Changed 10 years ago by axeld

You can build the test app anywhere via:

jam firefox_crash

And then copy it to the machine manually. When updating the system from another OS on the same machine, you can also add it to your UserBuildConfig:

AddFilesToHaikuImage home config bin : firefox_crash ;

comment:44 Changed 10 years ago by philcostin

axeld:

I ran firefox_crash on this machine 20 or 30 times (with both CPUs enabled) and I could not cause a KDL.

Changed 10 years ago by philcostin

Attachment: out.txt added

strace of test: firefox_crash

comment:45 Changed 10 years ago by philcostin

revision used: 30305

comment:46 Changed 10 years ago by axeld

Thanks philcostin! Looks like I need to have another look at the Firefox sources. At least this doesn't mean that it just doesn't happen on my machine :-)

comment:47 Changed 10 years ago by axeld

Status: changed from reopened to new

I can perfectly reproduce this with Thunderbird!

comment:48 Changed 10 years ago by axeld

Status: changed from new to assigned

comment:49 Changed 10 years ago by axeld

Resolution: fixed
Status: changed from assigned to closed

Fixed in hrev30363 - finally!

comment:50 Changed 10 years ago by tqh

Nice. I was going to say it might be the event port handling in CVS Firefox, although the stack trace was a bit weird for that.

(It's completely broken and uses the memory address instead of a unique id to detect / create / delete its port).

comment:51 in reply to:  50 Changed 10 years ago by anevilyak

Replying to tqh:

(It's completely broken and uses the memory address instead of a unique id to detect / create / delete its port).

Ouch, I wonder if that's why CVS Firefox seemed to sometimes kill the message loopers of other apps (it used to happen remarkably often that running Vision and CVS Firefox simultaneously would result in Vision's window thread simply disappearing for no obvious reason. This would perfectly explain that behavior if it's buggy and nukes the wrong port).
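
The hazard discussed above can be sketched in a few lines of C. This is not the NSPR event port code from CVS Firefox, and it does not use Haiku's port API; the demo_port type, the two-slot pool, and all demo_ functions are hypothetical stand-ins. The point is only that a memory address can be reused for a different object, while a unique id that is never reused cannot be confused with a later object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a port object; not a real kernel structure. */
typedef struct {
    int in_use;
    long id;            /* unique id, handed out once and never reused */
    const char *owner;
} demo_port;

#define DEMO_SLOTS 2
static demo_port demo_slots[DEMO_SLOTS];
static long demo_next_id = 1;

/* Create a port in the first free slot, so freed addresses get reused. */
demo_port *demo_create_port(const char *owner)
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (!demo_slots[i].in_use) {
            demo_slots[i].in_use = 1;
            demo_slots[i].id = demo_next_id++;
            demo_slots[i].owner = owner;
            return &demo_slots[i];
        }
    }
    return NULL;
}

/* Regular deletion by the owner. */
void demo_delete_port(demo_port *port)
{
    assert(port != NULL);
    port->in_use = 0;
}

/* Fragile: trusts a possibly stale address. Returns 1 if it deleted. */
int demo_delete_by_address(demo_port *stale)
{
    if (!stale->in_use)
        return 0;
    stale->in_use = 0;  /* may destroy a port owned by someone else */
    return 1;
}

/* Safer: also requires the unique id to match. Returns 1 if it deleted. */
int demo_delete_by_id(demo_port *slot, long expected_id)
{
    if (!slot->in_use || slot->id != expected_id)
        return 0;
    slot->in_use = 0;
    return 1;
}
```

If one owner creates and deletes a port, and a second owner then creates a port that lands at the same address, a late demo_delete_by_address() call with the first owner's stale pointer destroys the second owner's port, while demo_delete_by_id() with the first owner's id is rejected. That is the kind of cross-application damage speculated about above, under the stated assumptions.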

comment:52 Changed 10 years ago by philcostin

Great job! Works for me now too :D hrev30383