Opened 9 years ago

Closed 9 years ago

#6751 closed bug (fixed)

gdb won't single step

Reported by: grahamh
Owned by: bonefish
Priority: normal
Milestone: R1
Component: Applications/Command Line Tools
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: x86

Description

The normally installed version of GDB doesn't seem to be able to single step through a C program. I half suspect this is "just me", as it's a pretty glaringly obvious problem. But I have reproduced it on physical hardware (Pentium III) and on VMWare, with GCC2 and GCC4, and with R1a2, an R1 nightly and a gdb built locally from source.

How to reproduce

/data/j> uname -a
Haiku shredder 1 r39138 Oct 25 2010 07:43:53 BePC Haiku

# compile a simple test program

/data/j> cat test.c
#include <stdio.h>
int main(int argc, char** argv) {
        int a, b, c;
        a = 1;
        b = a * 2;
        c = b * a * 4;
        printf("c = %d\n", c);
        return 0;
}
/data/j> gcc -g -O0 -o test test.c

# this is using the default compiler on a GCC2 hybrid, but
# the same thing happens if we switch to GCC4 (or even
# my experimental build of tcc)

/data/j> setgcc
Current GCC: x86/gcc2

# feed it to the debugger

/data/j> gdb test
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-pc-haiku"...

# run to the first line

(gdb) start
Breakpoint 1 at 0x857: file test.c, line 4.
Starting program: /data/j/test 
Breakpoint 1 at 0x200857: file test.c, line 4.
main (argc=1, argv=0x7ffef538) at test.c:4
4               a = 1;

# now try to single step

(gdb) step
0x0029acca in fstat@@LIBROOT_1_ALPHA1 () from /boot/system/lib/libroot.so

# we have skipped several lines and stopped somewhere deep in printf()

(gdb) bt
#0  0x0029acca in fstat@@LIBROOT_1_ALPHA1 () from /boot/system/lib/libroot.so
#1  0x002546e6 in _IO_file_stat () from /boot/system/lib/libroot.so
#2  0x00253761 in _IO_file_doallocate () from /boot/system/lib/libroot.so
#3  0x00255cc7 in _IO_doallocbuf () from /boot/system/lib/libroot.so
#4  0x00253f6b in _IO_new_file_overflow () from /boot/system/lib/libroot.so
#5  0x00254884 in _IO_new_file_xsputn () from /boot/system/lib/libroot.so
#6  0x0026ceb4 in vfprintf () from /boot/system/lib/libroot.so
#7  0x0026890c in printf () from /boot/system/lib/libroot.so
#8  0x0020088f in main (argc=1, argv=0x7ffef538) at test.c:7
(gdb) 

For reference, here's the same thing done on Mac OS X 10.4

Breakpoint 1, main (argc=1, argv=0xbffffc70) at test.c:4
4         a = 1;
(gdb) step
5         b = a * 2;
(gdb) step
6         c = b * a * 4;
(gdb) step
7         printf("c = %d", c);
(gdb) step
8         return 0;

Back on Haiku, another favourite response to "step" or "next" is:

0xffff0114 in ?? ()

Change History (12)

comment:1 Changed 9 years ago by bonefish

Owner: changed from nobody to bonefish
Status: changed from new to assigned
Version: changed from R1/alpha2 to R1/Development

Assigning to myself, though I don't promise to look into it anytime soon.

At BeGeistert anevilyak demonstrated a similar problem with Debugger. I'm quite certain that problem didn't exist a year ago.

If you feel adventurous, you could enable debug output in src/system/kernel/debug/user_debugger.cpp and src/system/kernel/debug/BreakpointManager.cpp. The output goes to /var/log/syslog.

comment:2 in reply to:  1 ; Changed 9 years ago by anevilyak

Replying to bonefish:

At BeGeistert anevilyak demonstrated a similar problem with Debugger. I'm quite certain that problem didn't exist a year ago.

IIRC the behavior we observed was as if Step Over and Step Into had had their meaning reversed, no? In case it helps tracking it down at all.

comment:3 in reply to:  2 ; Changed 9 years ago by bonefish

Replying to anevilyak:

Replying to bonefish:

At BeGeistert anevilyak demonstrated a similar problem with Debugger. I'm quite certain that problem didn't exist a year ago.

IIRC the behavior we observed was as if Step Over and Step Into had had their meaning reversed, no? In case it helps tracking it down at all.

I might recall that incorrectly, but I thought you had a case where a single-step would run the program to completion, unless a breakpoint had been set before (in which case it would properly single-step). At that time I assumed that to be a bug in Debugger itself, but this single-step problem in gdb sounds similar enough to suggest a kernel bug.

comment:4 in reply to:  3 Changed 9 years ago by anevilyak

Replying to bonefish:

I might recall that incorrectly, but I thought you had a case where a single-step would run the program to completion, unless a breakpoint had been set before (in which case it would properly single-step). At that time I assumed that to be a bug in Debugger itself, but this single-step problem in gdb sounds similar enough to suggest a kernel bug.

Come to think of it, there might be both cases. Will try and find some time to do more detailed tests this week and post back here when I figure it out for certain. The behavior with the breakpoint would make sense if the meanings of the ops were reversed as well though, since in theory it'd normally try to step over all of main(), but would get stopped by the breakpoint instead.

comment:5 Changed 9 years ago by grahamh

Some additional digging...

On single step, GDB is correctly getting back a B_DEBUGGER_MESSAGE_SINGLE_STEP. So that part works.

Next thing it does is call read_pc_pid(), which calls haiku_child_fetch_inferior_registers for register 8. This is returning 0xffff0114 - which explains why that value sometimes crops up in the stack trace. GDB then decides, correctly, that this PC value isn't anywhere near where it ought to have stopped, and so just carries on.

haiku_child_fetch_inferior_registers is supposed to send a B_DEBUG_MESSAGE_GET_CPU_STATE to fetch the registers. The next thing to do is to figure out if it's sending the wrong message (a gdb problem) or if it's getting the wrong reply (maybe a kernel problem).

comment:6 Changed 9 years ago by grahamh

OK, the gdb end of the system is in the clear. It's doing the single step as requested, and the B_DEBUGGER_MESSAGE_SINGLE_STEP message it gets back has cpu_state.eip == 0xffff0114, which is somewhere either in the commpage or just off the end of it. So back to comment 1 and enabling logging in parts of my kernel...

comment:7 in reply to:  6 Changed 9 years ago by bonefish

Replying to grahamh:

OK, the gdb end of the system is in the clear. It's doing the single step as requested, and the B_DEBUGGER_MESSAGE_SINGLE_STEP message it gets back has cpu_state.eip == 0xffff0114, which is somewhere either in the commpage or just off the end of it.

Yep, that address is the syscall trampoline in the commpage. BTW, the kernel debugger knows it and prints the stack trace accordingly. If gdb delivers garbage, it's usually a good idea to check from KDL.

comment:8 Changed 9 years ago by grahamh

After some more investigation, here is what I think is happening.

Asking for a single-step in gdb sends a message to the in-kernel debug facility, which sets the CPU's single-step flag in the context of the debugged thread. When the thread is resumed, it executes one instruction, then traps with an INT1 debug exception. The INT1 handler passes control to the generic trap handler, which calls gInterruptHandlerTable[1], which is x86_handle_debug_exception. So far, so good.
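To illustrate the first step only: on x86 the single-step flag is the trap flag (TF, bit 8 of EFLAGS). The following is a minimal sketch, not the Haiku kernel code; toy_frame and both function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Trap flag (TF), bit 8 of EFLAGS on x86. */
#define X86_EFLAGS_TF (1u << 8)

/* Hypothetical, heavily reduced saved register frame of a user thread. */
typedef struct {
    uint32_t eip;
    uint32_t eflags;
} toy_frame;

/* Arm or disarm single-stepping for the next return to user mode. */
void toy_set_single_step(toy_frame *frame, bool enable)
{
    assert(frame != NULL);
    if (enable)
        frame->eflags |= X86_EFLAGS_TF;
    else
        frame->eflags &= ~X86_EFLAGS_TF;
}

/* Report whether the frame will trap after the next instruction. */
bool toy_single_step_armed(const toy_frame *frame)
{
    assert(frame != NULL);
    return (frame->eflags & X86_EFLAGS_TF) != 0;
}
```

With TF set in the saved flags, the CPU raises the debug exception after the next user-mode instruction, which is the INT1 trap described above.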

The first thing that x86_handle_debug_exception() does is look at DR6 and DR7 to work out what kind of debug event this is. However, if the exception was from user space ( IFRAME_IS_USER(frame) ), it uses supposedly cached copies of the DRs from the thread's cpu info struct. The comment above that line says that they should have been saved by x86_exit_user_debug_at_kernel_entry - but as we have seen, that function isn't in the execution chain here.

So the exception handler sees a bogus DR6=0, DR7=0, it can't understand the trap type, and it just returns. Future instructions are then all trapped in the same way, up until the process makes a system call (or possibly a context switch). At that point, x86_exit_user_debug_at_kernel_entry happens and DR6 and DR7 are stored in the thread struct. Single stepping is resumed when the system call ends and the thread's user mode context is restored. Now having correct DR values, the trap handler stops the process and returns to GDB after the next instruction - which is in the syscall trampoline in the commpage.

Changing the top of x86_handle_debug_exception to always use DR6 and DR7 directly, rather than sometimes using thread->cpu->arch.dr6, fixes the problem with gdb single stepping. However, the behaviour looks like it was put in to fix a debugger crash, so it probably needs a more subtle fix.
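A toy model of the difference, for anyone following along. This is a sketch under assumptions, not the kernel source: dr_cache_thread, classify_dr6 and handle_debug_exception_model are invented names. Only the DR6 bit positions (B0-B3 in bits 0-3, BS in bit 14) are taken from the x86 architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* DR6 status bits on x86: B0-B3 say which hardware breakpoint fired,
 * BS (bit 14) says the trap came from single-stepping. */
#define DR6_B_MASK 0x0000000fu
#define DR6_BS     (1u << 14)

typedef enum {
    EVENT_UNKNOWN,
    EVENT_BREAKPOINT,
    EVENT_SINGLE_STEP
} debug_event;

/* Hypothetical stand-in for the cached per-thread copy of DR6. */
typedef struct {
    uint32_t cached_dr6;
} dr_cache_thread;

/* Work out the event type from a DR6 value. */
debug_event classify_dr6(uint32_t dr6)
{
    if (dr6 & DR6_BS)
        return EVENT_SINGLE_STEP;
    if (dr6 & DR6_B_MASK)
        return EVENT_BREAKPOINT;
    return EVENT_UNKNOWN; /* nothing recognisable: the handler just returns */
}

/* Model of the handler's first decision. With trust_cache set, user-mode
 * traps use the cached copy; otherwise the "hardware" value is used. */
debug_event handle_debug_exception_model(const dr_cache_thread *thread,
    uint32_t hardware_dr6, bool from_user, bool trust_cache)
{
    assert(thread != NULL);
    uint32_t dr6 = (from_user && trust_cache)
        ? thread->cached_dr6 : hardware_dr6;
    return classify_dr6(dr6);
}
```

With a stale cached value of 0 and the hardware reporting BS, the cached path yields EVENT_UNKNOWN while the direct path yields EVENT_SINGLE_STEP, which matches the behaviour seen with and without the change.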

Just to finish, I noticed that this issue was also mentioned in bug #5742, and looks to have been introduced with changeset 36340, which was the fix for that one.

comment:9 Changed 9 years ago by bonefish

Status: changed from assigned to in-progress

grahamh, thanks for the investigation. Will look into it.

comment:10 in reply to:  8 Changed 9 years ago by bonefish

Replying to grahamh:

After some more investigation, here is what I think is happening.

Asking for a single-step in gdb sends a message to the in-kernel debug facility, which sets the CPU's single-step flag in the context of the debugged thread. When the thread is resumed, it executes one instruction, then traps with an INT1 debug exception. The INT1 handler passes control to the generic trap handler, which calls gInterruptHandlerTable[1], which is x86_handle_debug_exception. So far, so good.

The first thing that x86_handle_debug_exception() does is look at DR6 and DR7 to work out what kind of debug event this is. However, if the exception was from user space ( IFRAME_IS_USER(frame) ), it uses supposedly cached copies of the DRs from the thread's cpu info struct. The comment above that line says that they should have been saved by x86_exit_user_debug_at_kernel_entry - but as we have seen, that function isn't in the execution chain here.

You have either omitted them intentionally or you're missing a few details here. If you have a look at src/system/kernel/arch/x86/arch_interrupts.S, this is where the magic is supposed to happen. On any exception (save the double fault) int_bottom is entered. There we check whether we came from userland and, if so, continue in int_bottom_user. The DISABLE_BREAKPOINTS() macro is where we call x86_exit_user_debug_at_kernel_entry(), when the THREAD_FLAGS_BREAKPOINTS_INSTALLED thread flag is set. I haven't checked the details yet, but I guess the problem is that this flag is only set, when there's indeed at least one breakpoint installed, i.e. not when single-stepping without any breakpoint (this would also perfectly explain the problem Rene showed me). I'll think about how to solve this without breaking what hrev35951/hrev36340 were trying to fix. Though probably not before tomorrow.

comment:11 Changed 9 years ago by grahamh

Sorry, yes, you are right. I didn't mention DISABLE_BREAKPOINTS, because it doesn't do anything unless there are breakpoints set - so I didn't realise its relevance. As you correctly say, single step doesn't set the thread's breakpoint flag, because on x86 single stepping doesn't work via breakpoints, and that is what's preventing x86_exit_user_debug_at_kernel_entry() from being called. Presumably setting the breakpoint flag for single step might be a solution (when getting a B_DEBUG_MESSAGE_CONTINUE_THREAD with single_step set), but I don't know what other things that might break.
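To make the flag gating concrete, here is a small model of it, including the "set the flag on single step" idea floated above. It is only a sketch: gated_thread, the TOY_ flag value and all three functions are hypothetical, and it says nothing about how the real fix was eventually done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DR6_BS (1u << 14)                    /* x86 single-step status bit */
#define TOY_FLAG_BREAKPOINTS_INSTALLED 0x1u  /* hypothetical flag value */

/* Hypothetical, reduced thread state: flags plus the cached DR6 copy. */
typedef struct {
    uint32_t flags;
    uint32_t cached_dr6;
} gated_thread;

/* Models the gated save on kernel entry: the debug register is only
 * copied into the thread when the breakpoints-installed flag is set. */
void toy_kernel_entry(gated_thread *thread, uint32_t hardware_dr6)
{
    assert(thread != NULL);
    if (thread->flags & TOY_FLAG_BREAKPOINTS_INSTALLED)
        thread->cached_dr6 = hardware_dr6;
}

/* Models continuing a thread. mark_on_single_step switches on the idea
 * of also setting the flag when a single step is requested. */
void toy_continue_thread(gated_thread *thread, bool single_step,
    bool mark_on_single_step)
{
    assert(thread != NULL);
    if (single_step && mark_on_single_step)
        thread->flags |= TOY_FLAG_BREAKPOINTS_INSTALLED;
}

/* True if, after one stepped instruction, the handler would find the
 * single-step bit in the cached copy. */
bool toy_step_is_recognised(gated_thread *thread)
{
    toy_kernel_entry(thread, DR6_BS);
    return (thread->cached_dr6 & DR6_BS) != 0;
}
```

In this model a step with no breakpoint installed goes unrecognised, a step with a breakpoint installed is recognised, and marking the thread on single step also makes it recognised. Whether marking the thread that way has side effects elsewhere is exactly the open question.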

(This is the first time I have looked inside the Haiku kernel, so I don't have a good feel for how everything interconnects yet...)

comment:12 Changed 9 years ago by bonefish

Resolution: fixed
Status: changed from in-progress to closed

Fixed in hrev39201.
