Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#17674 closed bug (fixed)

PANIC: ASSERT FAILED (vm.cpp:3759): !area->IsWired()

Reported by: waddlesplash Owned by: nobody
Priority: normal Milestone: R1/beta4
Component: System/Kernel Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

Got it while compiling various things in HaikuPorter.

PANIC: ASSERT FAILED (../haiku-git/src/system/kernel/vm/vm.cpp:3759): !area->IsWired()
Welcome to Kernel Debugging Land...
Thread 326673 "conftest" running on CPU 1
stack trace for thread 326673 "conftest"
    kernel stack: 0xffffffff8153d000 to 0xffffffff81542000
      user stack: 0x00007fcec8610000 to 0x00007fcec9610000
frame                       caller             <image>:function + offset
 0 ffffffff81541408 (+  24) ffffffff80144b3c   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
 1 ffffffff81541420 (+  80) ffffffff800ae208   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff81541470 (+  96) ffffffff800af823   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf3
 3 ffffffff815414d0 (+  80) ffffffff800afbbe   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff81541520 (+ 240) ffffffff800aff17   <kernel_x86_64> panic + 0xb7
 5 ffffffff81541610 (+  48) ffffffff801277ab   <kernel_x86_64> vm_delete_areas + 0x5b
 6 ffffffff81541640 (+  32) ffffffff80136d4f   <kernel_x86_64> VMAddressSpace::RemoveAndPut() + 0x2f
 7 ffffffff81541660 (+ 192) ffffffff80085346   <kernel_x86_64> team_delete_team + 0x276
 8 ffffffff81541720 (+ 544) ffffffff8008be2b   <kernel_x86_64> thread_exit + 0xd3b
 9 ffffffff81541940 (+1456) ffffffff80074a13   <kernel_x86_64> handle_signals + 0x7c3
10 ffffffff81541ef0 (+  48) ffffffff8008834e   <kernel_x86_64> thread_at_kernel_exit + 0x1e
11 ffffffff81541f20 (+  16) ffffffff8014684e   <kernel_x86_64> x86_64_syscall_entry + 0x31a
user iframe at 0xffffffff81541f30 (end = 0xffffffff81541ff8)
 rax 0x0                   rbx 0xcae787bb20          rcx 0xb831e2dc39
 rdx 0x0                   rsi 0x5e5808e8d0          rdi 0x0
 rbp 0x7fcec960fe50         r8 0xcae787aae8           r9 0x1
 r10 0x0                   r11 0x202                 r12 0x0
 r13 0x1                   r14 0x7fcec9610658        r15 0x0
 rip 0xb831e2dc39          rsp 0x7fcec960fe38     rflags 0x202
 vector: 0x63, error code: 0x0
12 ffffffff81541f30 (+140528243760928) 000000b831e2dc39   
00007fcec960fe50 -- read fault
kdebug> 

Looks a lot like #6494 from many years ago...

Change History (11)

comment:1 by waddlesplash, 2 years ago

Component: - GeneralSystem/Kernel
Version: R1/beta3R1/Development

comment:2 by waddlesplash, 2 years ago

Assuming this output is correct, remaining areas:

kdebug> areas 326673
addr                  id  base                size            protect lock  name
0xffffffffc0539c00 d16fb3  0x00007ffe4e872000  0x0000000000004000 4033    2  user area
0xfffffffffe847d80 d16fb4  0x00007fcec860c000  0x0000000001005000   3b    0  conftest_326673_stack
0xffffffffa2c81900 d16fb5  0x00007ffffed93000  0x0000000000008000 6005    2  commpage
0xffffffffa287f0c0 d16fc1  0x000011c62574c000  0x0000000000050000   33    0  heap

comment:3 by waddlesplash, 2 years ago

page stats:
total: 1310464
active: 28933 (busy: 0)
inactive: 53941 (busy: 0)
cached: 443402 (busy: 0)
unused: 262176 (busy: 0)
wired: 519001 (busy: 0)
modified: 75 (busy: 0)
free: 2936
clear: 0

It's not normal to have hundreds of thousands of wired pages, is it?

comment:4 by waddlesplash, 2 years ago

Most of these pages have usage=0,wired=0, examples:

0x01acc000 W ----- usage:  0 wired:    0
0x01acd000 W ----- usage:  0 wired:    0
0x01ace000 W ----- usage:  0 wired:    0
0x01acf000 W ----- usage:  0 wired:    0
0x01ad0000 W ----- usage:  0 wired:    0

comment:5 by waddlesplash, 2 years ago

Attempting to continue, I got another KDL:

PANIC: remove page 0xffffffff888a3d80 from cache 0xffffffffbe7c4ce0: page still has mappings!

Welcome to Kernel Debugging Land...
Thread 326673 "conftest" running on CPU 1
stack trace for thread 326673 "conftest"
    kernel stack: 0xffffffff8153d000 to 0xffffffff81542000
      user stack: 0x00007fcec8610000 to 0x00007fcec9610000
frame                       caller             <image>:function + offset
 0 ffffffff81541388 (+  24) ffffffff80144b3c   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
 1 ffffffff815413a0 (+  80) ffffffff800ae208   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff815413f0 (+  96) ffffffff800af823   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf3
 3 ffffffff81541450 (+  80) ffffffff800afbbe   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff815414a0 (+ 240) ffffffff800aff17   <kernel_x86_64> panic + 0xb7
 5 ffffffff81541590 (+  48) ffffffff8013efc2   <kernel_x86_64> VMCache::Delete() + 0x72
 6 ffffffff815415c0 (+  80) ffffffff80122cd1   <kernel_x86_64> delete_area(VMAddressSpace*, VMArea*, bool) + 0x1d1
 7 ffffffff81541610 (+  48) ffffffff801277b9   <kernel_x86_64> vm_delete_areas + 0x69
 8 ffffffff81541640 (+  32) ffffffff80136d4f   <kernel_x86_64> VMAddressSpace::RemoveAndPut() + 0x2f
 9 ffffffff81541660 (+ 192) ffffffff80085346   <kernel_x86_64> team_delete_team + 0x276
10 ffffffff81541720 (+ 544) ffffffff8008be2b   <kernel_x86_64> thread_exit + 0xd3b
11 ffffffff81541940 (+1456) ffffffff80074a13   <kernel_x86_64> handle_signals + 0x7c3
12 ffffffff81541ef0 (+  48) ffffffff8008834e   <kernel_x86_64> thread_at_kernel_exit + 0x1e
13 ffffffff81541f20 (+  16) ffffffff8014684e   <kernel_x86_64> x86_64_syscall_entry + 0x31a
user iframe at 0xffffffff81541f30 (end = 0xffffffff81541ff8)
 rax 0x0                   rbx 0xcae787bb20          rcx 0xb831e2dc39
 rdx 0x0                   rsi 0x5e5808e8d0          rdi 0x0
 rbp 0x7fcec960fe50         r8 0xcae787aae8           r9 0x1
 r10 0x0                   r11 0x202                 r12 0x0
 r13 0x1                   r14 0x7fcec9610658        r15 0x0
 rip 0xb831e2dc39          rsp 0x7fcec960fe38     rflags 0x202
 vector: 0x63, error code: 0x0
14 ffffffff81541f30 (+140528243760928) 000000b831e2dc39   
00007fcec960fe50 -- read fault
initial commands: page 0xffffffff888a3d80; cache 0xffffffffbe7c4ce0
PAGE: 0xffffffff888a3d80
queue_next,prev: 0x0000000000000000, 0xffffffff85a6d290
physical_number: 0x1354f8
cache:           0xffffffffbe7c4ce0
cache_offset:    64
cache_next:      0x0000000000000000
state:           active
wired_count:     1
usage_count:     0
busy:            0
busy_writing:    0
accessed:        0
modified:        0
accessor:        -1
area mappings:
CACHE 0xffffffffbe7c4ce0:
  ref_count:    0
  source:       0x0000000000000000
  type:         RAM
  virtual_base: 0x0
  virtual_end:  0x50000
  temporary:    1
  lock:         0xffffffffbe7c4d60
  lock.holder:  326673
  areas:
  consumers:
  pages:
	17 in cache
kdebug> 

comment:6 by waddlesplash, 2 years ago

There doesn't seem to be a way back from KDL here; all team destruction operations hereafter seem to end with one of the above KDLs, or a related one, e.g.:

PANIC: ASSERT FAILED (../haiku-git/src/system/kernel/vm/vm_page.cpp:1526): (!page->IsMapped()); page: 0xffffffff888a3d80
PANIC: to be freed page 0xffffffff888a3d80 has mappings

comment:7 by waddlesplash, 2 years ago

This is in fact very much reproducible when building recent versions of libgcrypt. Looking in Terminal, I see the last message is:

checking if mlock is broken...

I guess the answer is "yes", then? :-p

comment:8 by waddlesplash, 2 years ago

So, the problem seems to be that mlock() is never automatically balanced with an munlock() on team exit, and our lock_memory really demands that unlock_memory always be invoked. While the mlock()'ed memory is stored in a list in the team structure, and the initial bug could be fixed by just unwinding the list on team destruction, I don't think that's the way to go for a variety of reasons: (1) the same problem would occur on delete_area/munmap instead of team destruction, (2) lots of stuff in the kernel assumes memory will only be locked for a short period of time and waits for memory to be unlocked.

Haiku's "locked memory" concept is much more stringent than POSIX's specifications for mlock(), which only state the locked memory must remain resident in RAM until unlocked. I think what makes the most sense here is merely doing that: blocking pages from being swapped out, without wiring them or anything else.

comment:10 by waddlesplash, 2 years ago

Milestone: UnscheduledR1/beta4
Resolution: fixed
Status: newclosed

Fixed in hrev55986.

comment:11 by waddlesplash, 2 years ago

Blocked By: 6494 removed
Note: See TracTickets for help on using tickets.