Opened 10 years ago

Closed 10 years ago

#5485 closed bug (invalid)

103: DEBUGGER: _numBlocks > 0

Reported by: mmadia
Owned by: axeld
Priority: normal
Milestone: R1
Component: Servers/app_server
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

hrev35618 gcc2hybrid

Frequently, my AMD X2 machine drops into a white gdb session. The serial log captures 103: DEBUGGER: _numBlocks > 0 and some output from debug_server. KDL is never launched at that point. Attached are several snippets that were captured over serial debugging.

Useful lines: 113, 138, 176.

Next time this occurs, what other commands would be useful to have?

Attachments (2)

numBlock-gt-0.serial_log (12.9 KB ) - added by mmadia 10 years ago.
numBlocks-gt-0-with-libroot_debug.log (3.9 KB ) - added by mmadia 10 years ago.


Change History (15)

by mmadia, 10 years ago

Attachment: numBlock-gt-0.serial_log added

comment:1 by stippi, 10 years ago

These are random heap corruptions. But I wonder why it drops into the white GDB session. When did these crashes start?

comment:2 by stippi, 10 years ago

BTW, Ingo changed the debug interface. If you didn't make a fresh install, the debug_server may not be able to work properly.

in reply to:  2 comment:3 by bonefish, 10 years ago

Replying to stippi:

BTW, Ingo changed the debug interface. If you didn't make a fresh install, the debug_server may not be able to work properly.

No, the last change to the debugger interface was in hrev31682, and that wasn't even something that should affect the debug server.

Just to clarify, Matt, you're getting crashes ("_numBlocks > 0" debugger calls) from various teams, but only when it happens in the app server you get the white gdb screen, right? If so, then this is probably a bug in libroot or libbe.

You could try to use the debug libroot globally (e.g. "export LD_PRELOAD=libroot_debug.so" at the beginning of the BootScript, or rename libroot.so and symlink to libroot_debug.so). Maybe that will turn up something earlier.
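The first option (an export line at the beginning of the BootScript) could be scripted roughly as follows. This is a minimal sketch, not something taken from the ticket: the helper name add_debug_preload is hypothetical, the path to the BootScript is left to the caller, and the script is assumed to start with a shebang line that has to stay on line 1.

```shell
# Hypothetical helper: insert the LD_PRELOAD export right after the first line
# (assumed to be the shebang) of a boot script, unless it is already present.
add_debug_preload() {
    script="$1"
    line='export LD_PRELOAD=libroot_debug.so'

    # Do nothing if the script already contains exactly this export line.
    if grep -qxF "$line" "$script"; then
        return 0
    fi

    tmp="$script.tmp.$$"
    # Keep line 1 in place, then the export, then the rest of the script.
    { head -n 1 "$script"; printf '%s\n' "$line"; tail -n +2 "$script"; } > "$tmp" &&
        # Write back through the original file so its permissions are kept.
        cat "$tmp" > "$script" &&
        rm -f "$tmp"
}
```

It would be called with the script to modify as its only argument, on a copy first if in doubt. Running it twice leaves a single export line, so it can be re-applied after an update without stacking duplicates.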

in reply to:  3 comment:4 by stippi, 10 years ago

Replying to bonefish:

Replying to stippi:

BTW, Ingo changed the debug interface. If you didn't make a fresh install, the debug_server may not be able to work properly.

No, the last change to the debugger interface was in hrev31682, and that wasn't even something that should affect the debug server.

Hm, I was almost sure, from reading the commit messages, that you changed message constants when introducing the single step into/out of kernel feature. If that's the case, and he is not running a completely fresh image, would the debug_server still work?

comment:5 by stippi, 10 years ago

Looked it up, you changed the B_THREAD_DEBUG_NUB_THREAD flag in hrev35620. Don't know if that can affect anything, but it's what I was thinking of.

in reply to:  5 comment:6 by bonefish, 10 years ago

Replying to stippi:

Looked it up, you changed the B_THREAD_DEBUG_NUB_THREAD flag in hrev35620. Don't know if that can affect anything, but it's what I was thinking of.

The constant is kernel private. Only the kernel itself is affected.

in reply to:  3 comment:7 by mmadia, 10 years ago

Replying to stippi:

These are random heap corruptions. But I wonder why it drops into the white GDB session. When did these crashes start?

I'm not sure. After upgrading to 35500 or newer, vm_page_faults started occurring. I never reported them, as they looked to be fixed in a newer revision. Over a few updates, I've lost track of the first revision in which the _numBlocks > 0 occurred.

Replying to bonefish:

Just to clarify, Matt, you're getting crashes ("_numBlocks > 0" debugger calls) from various teams, but only when it happens in the app server you get the white gdb screen, right? If so, then this is probably a bug in libroot or libbe.

Right. Other times the application itself will crash, raising a debug alert window. More often than not, it's in app_server. To note, vm_page_faults still show up on the serial log at times. Later today, I'll be updating a second partition to trunk.

You could try to use the debug libroot globally (e.g. "export LD_PRELOAD=libroot_debug.so" at the beginning of the BootScript, or rename libroot.so and symlink to libroot_debug.so). Maybe that will turn up something earlier.

Attached is the output with the export statement in use.

comment:8 by stippi, 10 years ago

The log would point to Axel as the one to blame. He rewrote bitmap reference handling in the app_server lately.

in reply to:  8 comment:9 by bonefish, 10 years ago

Replying to stippi:

The log would point to Axel as the one to blame. He rewrote bitmap reference handling in the app_server lately.

But would that also affect the client side like this?

comment:10 by axeld, 10 years ago

This is definitely a strange one, as I haven't seen any generic problems like this yet (besides the slab). Since he can reproduce them that easily, it might as well be bad RAM or something.

in reply to:  10 comment:11 by mmadia, 10 years ago

Replying to axeld:

This is definitely a strange one, as I haven't seen any generic problems like this yet (besides the slab). Since he can reproduce them that easily, it might as well be bad RAM or something.

I hope not, but overnight I'll run memtest just to make sure.

comment:12 by mmadia, 10 years ago

3 hours of MemTest86+ v4.00 completed almost 2 full passes with no errors. Considering the errors occur more frequently than that, I'm relieved to doubt hardware issues. ;) One thing came to mind: lately, I've been running off USB sticks for the sheer joy of it. Could the problem be somehow caused by that?

~/Desktop> listusb   
2222:3061 /dev/bus/usb/0/2 "Kingsis Peripherals" "Evoluent VerticalMouse 2" ver. 0110
0000:0000 /dev/bus/usb/0/hub "HAIKU Inc." "OHCI RootHub" ver. 0110
090c:1000 /dev/bus/usb/1/1 "SMI Corporation" "USB DISK" ver. 1100
05f3:0007 /dev/bus/usb/1/3/2/1 "" "" ver. 0320
05f3:0081 /dev/bus/usb/1/3/2/hub "" "" ver. 0320
05e3:0608 /dev/bus/usb/1/3/hub "" "" ver. 0901
0000:0000 /dev/bus/usb/1/hub "HAIKU Inc." "EHCI RootHub" ver. 0200

comment:13 by mmadia, 10 years ago

Resolution: invalid
Status: new → closed

Earlier today, I downgraded to hrev34464, which was a very stable version for me. After experiencing vm related crashes while compiling Haiku, I swapped the PSU out with a spare. Lo and behold, stability was regained.

By the way, can I get a cup of Earl Grey to go with my humble pie? ;)
