Opened 2 years ago

Closed 19 months ago

#18133 closed bug (fixed)

page fault in _kern_write

Reported by: jessicah
Owned by: nobody
Priority: normal
Milestone: R1/beta5
Component: System/Kernel
Version: R1/beta4
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

Running yarn --verbose for vscode reliably triggers a page fault in make (this needs an updated version of nodejs, which I still need to publish):

vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x0, ip 0x0, write 0, user 0, exec 1, thread 0x331
PANIC: vm_page_fault: unhandled page fault in kernel space at 0x0, ip 0x0

Welcome to Kernel Debugging Land...
Thread 817 "make" running on CPU 2
stack trace for thread 817 "make"
    kernel stack: 0xffffffff936b8000 to 0xffffffff936bd000
      user stack: 0x00007fe16ee68000 to 0x00007fe16fe68000
frame                       caller             <image>:function + offset
 0 ffffffff936bc728 (+  24) ffffffff8014559c   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
 1 ffffffff936bc740 (+  80) ffffffff800aec28   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff936bc790 (+  96) ffffffff800b0243   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf3
 3 ffffffff936bc7f0 (+  80) ffffffff800b05de   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff936bc840 (+ 240) ffffffff800b0937   <kernel_x86_64> panic + 0xb7
 5 ffffffff936bc930 (+ 256) ffffffff8012eaf8   <kernel_x86_64> vm_page_fault + 0x258
 6 ffffffff936bca30 (+  64) ffffffff80151458   <kernel_x86_64> x86_page_fault_exception + 0x168
 7 ffffffff936bca70 (+ 904) ffffffff80146d8c   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff936bcdf8 (end = 0xffffffff936bcec0)
 rax 0xffffffff801b9040    rbx 0x0                   rcx 0x331
 rdx 0x2                   rsi 0x0                   rdi 0xffffff81178582a8
 rbp 0xffffffff936bcf20     r8 0x1                    r9 0x0
 r10 0xffffffff936bce60    r11 0x206                 r12 0x1
 r13 0xffffffff80001301    r14 0x10def828f660        r15 0xffffffff9b2febc0
 rip 0x0                   rsp 0xffffffff936bcec8 rflags 0x10246
 vector: 0xe, error code: 0x10
 8 ffffffff936bcdf8 (+ 296) 0000000000000000   
 9 ffffffff936bcf20 (+  16) ffffffff8014708f   <kernel_x86_64> x86_64_syscall_entry + 0xfb
user iframe at 0xffffffff936bcf30 (end = 0xffffffff936bcff8)
 rax 0x91                  rbx 0x5c                  rcx 0x1518d209b6c
 rdx 0x10def828f660        rsi 0xffffffffffffffff    rdi 0x1
 rbp 0x7fe16fe65f60         r8 0x5c                   r9 0x22
 r10 0x5c                  r11 0x206                 r12 0x10def828f660
 r13 0x5c                  r14 0x1518d4e59a0         r15 0x10def821f07c
 rip 0x1518d209b6c         rsp 0x7fe16fe65f48     rflags 0x206
 vector: 0x63, error code: 0x0
10 ffffffff936bcf30 (+140608043388976) 000001518d209b6c   <libroot.so> _kern_write + 0x0c
11 00007fe16fe65f60 (+  48) 000001518d23e38a   <libroot.so> _IO_new_file_write + 0x3a
12 00007fe16fe65f90 (+  48) 000001518d23de81   <libroot.so> _IO_file_setbuf (nearest) + 0x81
13 00007fe16fe65fc0 (+  32) 000001518d23ec91   <libroot.so> _IO_do_write + 0x21
14 00007fe16fe65fe0 (+  32) 000001518d23efd5   <libroot.so> _IO_file_overflow + 0x105
15 00007fe16fe66000 (+  80) 000001518d2401bc   <libroot.so> _IO_default_xsputn + 0x8c
16 00007fe16fe66050 (+  80) 000001518d23e8a3   <libroot.so> _IO_new_file_xsputn + 0x193
17 00007fe16fe660a0 (+  48) 000001518d2418b2   <libroot.so> fputs + 0x62
18 00007fe16fe660d0 (+  64) 0000025009c0ea78   <make> child_access (nearest) + 0x1c8
19 00007fe16fe66110 (+ 288) 0000025009c0f126   <make> output_start + 0x86
20 00007fe16fe66230 (+  96) 0000025009c0a9f4   <make> construct_command_argv (nearest) + 0x314
21 00007fe16fe66290 (+  96) 0000025009c0ac78   <make> construct_command_argv (nearest) + 0x598
22 00007fe16fe662f0 (+  80) 0000025009c0b6be   <make> reap_children (nearest) + 0x83e
23 00007fe16fe66340 (+ 176) 0000025009c0bc78   <make> new_job + 0x248
24 00007fe16fe663f0 (+ 176) 0000025009c1684b   <make> notice_finished_file (nearest) + 0x136b
25 00007fe16fe664a0 (+ 128) 0000025009c16ced   <make> notice_finished_file (nearest) + 0x180d
26 00007fe16fe66520 (+ 176) 0000025009c15c8c   <make> notice_finished_file (nearest) + 0x7ac
27 00007fe16fe665d0 (+ 128) 0000025009c16ced   <make> notice_finished_file (nearest) + 0x180d
28 00007fe16fe66650 (+ 176) 0000025009c15c8c   <make> notice_finished_file (nearest) + 0x7ac
29 00007fe16fe66700 (+ 128) 0000025009c16ced   <make> notice_finished_file (nearest) + 0x180d
30 00007fe16fe66780 (+ 176) 0000025009c15c8c   <make> notice_finished_file (nearest) + 0x7ac
31 00007fe16fe66830 (+ 128) 0000025009c170eb   <make> update_goal_chain + 0x12b
32 00007fe16fe668b0 (+3264) 0000025009bfd438   <make> main + 0x15c8
33 00007fe16fe67570 (+  48) 0000025009bfddce   <make> _start + 0x3e
34 00007fe16fe675a0 (+  48) 000001d141e85ae5   </boot/system/runtime_loader@0x000001d141e76000> <unknown> + 0xfae5
35 00007fe16fe675d0 (+   0) 00007fffffd5e258   <commpage> commpage_thread_exit + 0x00

Change History (10)

comment:2 by waddlesplash, 2 years ago

Component: System/libroot.so → System/Kernel

comment:3 by korli, 2 years ago (in reply to: 1)

Replying to jessicah:

Seems to be faulting at https://github.com/haiku/haiku/blob/master/src/system/kernel/arch/x86/64/interrupts.S#L451?

This is the syscall call. This would mean the iframe is corrupted.

comment:4 by jessicah, 23 months ago (in reply to: 3)

Replying to korli:

This is the syscall call. This would mean the iframe is corrupted.

That's what I thought might be happening. Any ideas on how to debug it, or what might be causing it? It's 100% reproducible here.

comment:5 by waddlesplash, 23 months ago

One possibility is to add a huge buffer (i.e. over a page in size) at the top of _user_write and see whether this fixes the crash. That would indicate a stack overflow. You should then be able to use mprotect to make the page unwritable and catch where the overflow happens.

comment:6 by jessicah, 21 months ago

Grab http://haiku.nz/files/node.zip. My IP may change from time to time, so let me know if you can't download it.

unzip node.zip
cp -r install/* ~/config/non-packaged/
pkgman install -y c_ares yarn
git clone --depth=1 https://github.com/microsoft/vscode
cd vscode
yarn

comment:7 by waddlesplash, 19 months ago

Note that this appears to be an "execute" fault, i.e. instruction fetch:

vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x0, ip 0x0, write 0, user 0, exec 1, thread 0x331

I guess this was run on a machine/VM without SMAP/SMEP enabled, because otherwise that protection would have triggered instead.
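For reference, the "exec 1" in that message matches the "error code: 0x10" shown in the kernel iframe. Below is a minimal sketch of how that value decodes, assuming the standard x86 page-fault error code bit layout; the function and constant names are made up for illustration and are not taken from the Haiku sources.

```python
# Hypothetical helper (not Haiku code): decode an x86 page-fault error code
# using the standard architectural bit layout.
PF_PRESENT = 0x01      # 0 = page not present, 1 = protection violation
PF_WRITE = 0x02        # 0 = read access, 1 = write access
PF_USER = 0x04         # 0 = fault raised in kernel mode, 1 = in user mode
PF_RESERVED = 0x08     # reserved bit set in a paging structure
PF_INSTRUCTION = 0x10  # fault caused by an instruction fetch


def decode_page_fault_error(code: int) -> dict:
    """Return the individual flags encoded in a page-fault error code."""
    return {
        "present": bool(code & PF_PRESENT),
        "write": bool(code & PF_WRITE),
        "user": bool(code & PF_USER),
        "reserved": bool(code & PF_RESERVED),
        "exec": bool(code & PF_INSTRUCTION),
    }


if __name__ == "__main__":
    # 0x10 is the error code from the kernel iframe in this ticket.
    for name, value in decode_page_fault_error(0x10).items():
        print(f"{name}: {int(value)}")
```

With 0x10 only the instruction-fetch bit is set, which agrees with "write 0, user 0, exec 1" in the panic message: the kernel itself tried to fetch an instruction from address 0x0.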

comment:8 by waddlesplash, 19 months ago

While inspecting the code, I noticed a possible cause. I can't download the zip, so I can't be sure at the moment that this is the exact same bug, but I highly suspect it is:

PANIC: SMEP violation user-mapped address 0x0000000000000000 touched from kernel 0x0000000000000000

Welcome to Kernel Debugging Land...
Thread 2100 "tcp_connection_test" running on CPU 1
stack trace for thread 2100 "tcp_connection_test"
    kernel stack: 0xffffffff82564000 to 0xffffffff82569000
      user stack: 0x00007fa8e0660000 to 0x00007fa8e1660000
frame                       caller             <image>:function + offset
 0 ffffffff82568828 (+  24) ffffffff80146cfc   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
 1 ffffffff82568840 (+  80) ffffffff800b0418   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff82568890 (+  96) ffffffff800b1a83   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf3
 3 ffffffff825688f0 (+  80) ffffffff800b1e1e   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff82568940 (+ 240) ffffffff800b2177   <kernel_x86_64> panic + 0xb7
 5 ffffffff82568a30 (+  64) ffffffff80152c32   <kernel_x86_64> x86_page_fault_exception + 0x122
 6 ffffffff82568a70 (+ 904) ffffffff801484ec   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff82568df8 (end = 0xffffffff82568ec0)
 rax 0xffffffff801bb040    rbx 0x0                   rcx 0x834
 rdx 0x2                   rsi 0x0                   rdi 0xffffffff983dfe70
 rbp 0xffffffff82568f20     r8 0x1                    r9 0x5
 r10 0x3                   r11 0x7f                  r12 0x1
 r13 0xffffffff80001301    r14 0x134b08fe148         r15 0xffffffff9852c280
 rip 0x0                   rsp 0xffffffff82568ec8 rflags 0x10246
 vector: 0xe, error code: 0x10
 7 ffffffff82568df8 (+ 296) 0000000000000000   
 8 ffffffff82568f20 (+  16) ffffffff801487ef   <kernel_x86_64> x86_64_syscall_entry + 0xfb
user iframe at 0xffffffff82568f30 (end = 0xffffffff82568ff8)
 rax 0x91                  rbx 0x134b08fe148         rcx 0x476b5058bc
 rdx 0x134b08fe148         rsi 0xffffffffffffffff    rdi 0x4
 rbp 0x7fa8e165f7c0         r8 0x476b7e51d8           r9 0x5
 r10 0x5                   r11 0x202                 r12 0x4
 r13 0x3                   r14 0x7fa8e1660658        r15 0x0
 rip 0x476b5058bc          rsp 0x7fa8e165f7a8     rflags 0x202
 vector: 0x63, error code: 0x0
 9 ffffffff82568f30 (+140365421045904) 000000476b5058bc   <libroot.so> _kern_write + 0x0c
10 00007fa8e165f7c0 (+ 176) 00000134b08fdd1e   <_APP_> main + 0x21e
11 00007fa8e165f870 (+  48) 00000134b08fdf5f   <_APP_> _start + 0x3f
12 00007fa8e165f8a0 (+  48) 00000174bd03eae5   </boot/system/runtime_loader@0x00000174bd02f000> <unknown> + 0xfae5
13 00007fa8e165f8d0 (+   0) 00007fffff2fa258   <commpage> commpage_thread_exit + 0x00

comment:9 by waddlesplash, 19 months ago

The problem that caused the above KDL is fixed in hrev56959. If it's indeed the same as this ticket, then this ticket is also fixed. So, please test.

comment:10 by waddlesplash, 19 months ago

Milestone: Unscheduled → R1/beta5
Resolution: fixed
Status: new → closed

Indeed, yarn succeeded without any KDL after the fix.
