Opened 3 months ago

Closed 3 months ago

#19168 closed bug (fixed)

ASSERT FAILED: page->IsMapped() in vm_free_unused_boot_loader_range()

Reported by: Illen Owned by: nobody
Priority: blocker Milestone: Unscheduled
Component: System/Kernel Version: R1/Development
Keywords: Cc:
Blocked By: Blocking: #19169
Platform: All

Description

This is a recent regression: hrev58212 booted without any panics on the Asus X570-Plus board.

Attachments (1)

x570-panic.log (210.4 KB) - added by Illen 3 months ago.


Change History (17)

by Illen, 3 months ago

Attachment: x570-panic.log added

comment:1 by korli, 3 months ago

Component: - General → System/Kernel

Which revision?

comment:2 by waddlesplash, 3 months ago

Blocking: 19169 added

comment:3 by waddlesplash, 3 months ago

Priority: normal → blocker
Summary: Continuable panics when using BIOS loader on Asus X570-Plus and ASRock Z170 Pro4S systems → ASSERT FAILED: page->IsMapped() in vm_free_unused_boot_loader_range()

comment:4 by waddlesplash, 3 months ago

The log says hrev58228.

Given #19009, this machine would be using the new-range logic in early page allocation. Unless one of my recent refactors to the early page allocation had a problem (I reviewed them all, they should be functionally equivalent), the only actual change here is the addition of "break" to the new-range logic.

Considering the page in question is actually in an area, I don't think the changes to the bootloader are the problem here.

comment:5 by waddlesplash, 3 months ago

Though interestingly we only get the panic for a single page. I guess the bootloader changes may have shuffled things around in such a way as to expose some other bug.

I may not have time to look into this until Monday, but given where this panic is happening, I wouldn't recommend just ignoring it and continuing. Core system state could get corrupted, and that would be bad.

comment:6 by monni, 3 months ago

Here, hrev58213 through hrev58228 don't fully boot; hrev58212+1 still boots.


comment:7 by waddlesplash, 3 months ago

Can you post a syslog from the failed boot with hrev58213?

comment:8 by monni, 3 months ago

The syslog file stops after loading usb_raw; after that it's basically just lots of "Symbol not found" errors, starting with /boot/system/add-ons/kernel/network/stack ... It doesn't get to the Desktop, as too many things fail to load. It seems like it stops reading from disk and all subsequent accesses fail.

comment:9 by waddlesplash, 3 months ago

What symbol(s) aren't found?

in reply to: 9 comment:10 by monni, 3 months ago

Replying to waddlesplash:

What symbol(s) aren't found?

mutex_lock is the first symbol that isn't found when loading add-ons like the network stack. It's almost as if, when package_daemon kicks in, it forgets where everything is...

comment:11 by waddlesplash, 3 months ago

Do you have drivers in non-packaged? Or something installed in ~? That sounds like you are trying to use an older driver/network stack against a newer kernel (as those symbols were adjusted recently.)

in reply to: 11 comment:12 by monni, 3 months ago

Replying to waddlesplash:

Do you have drivers in non-packaged? Or something installed in ~? That sounds like you are trying to use an older driver/network stack against a newer kernel (as those symbols were adjusted recently.)

Nothing in non-packaged or /boot/home. It's definitely a bad interaction with package_daemon: if I wipe activated-packages, it stops flooding the logs, but just before reaching the desktop the serial log suggests uninstalling or downgrading several packages. Obviously I have to boot into an older state to be able to edit any files on the hard disk.

comment:13 by waddlesplash, 3 months ago

I managed to reproduce this panic by booting with 3.21 GB (not 3 or 3.2 GB) of RAM in QEMU.

comment:14 by waddlesplash, 3 months ago

Something very odd is going on here. According to a debugger attached to QEMU, the virtual address we are currently trying to query and free is 0xffffffff81000000. However, the Query() method returned a physical address of 0x100000, which is the address of the very first page in the page list. And that page is currently allocated elsewhere:

kdebug> page 0xffffffff82800000
PAGE: 0xffffffff82800000
queue_next,prev: 0x0000000000000000, 0x0000000000000000
physical_number: 0x100
cache:           0xffffffff82022a28
cache_offset:    504
cache_next:      0xffffffff82800050
state:           wired
wired_count:     1
usage_count:     0
busy:            0
busy_writing:    0
accessed:        0
modified:        0
accessor:        0
area mappings:
kdebug> cache 0xffffffff82022a28
CACHE 0xffffffff82022a28:
  ref_count:    1
  source:       0x0000000000000000
  type:         RAM
  virtual_base: 0x0
  virtual_end:  0x580000
  temporary:    1
  lock:         0xffffffff82022aa8
  lock.holder:  -1
  areas:
    area 0x84, sem_table
        base_addr:  0xffffffff87020000, size: 0x580000
        protection: 0x30
        owner:      0x1
  consumers:
  pages:
        1408 in cache

comment:15 by waddlesplash, 3 months ago

It looks like the page table for this virtual address really does have that page in it, but I don't know how it got there.

comment:16 by waddlesplash, 3 months ago

Resolution: fixed
Status: new → closed

This seems to be fixed by the revert in hrev58237.
