Opened 14 months ago

Last modified 13 months ago

#14177 new bug

[KDL] in vfs_vnode_io, lots of failures writing back pages

Reported by: jessicah
Owned by: nobody
Priority: normal
Milestone: Unscheduled
Component: System/Kernel
Version: R1/Development
Keywords: vfs
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

The system KDL'd in vfs_vnode_io with a GPE (General Protection Exception). The panic looks something like:

PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0

Thread 9 "page writer" running on CPU 0
stack trace for thread 9 "page writer"
    kernel stack: 0xffffffff81cc6000 to 0xffffffff81ccb000
....
kernel iframe at 0xffffffff81ccac28 (end = 0xffffffff81ccacf0)
 rax 0xdeadbeefdeadbeef    rbx 0xffffffff92597680    rcx 0x1
 rdx 0xffffffff92e5f868    rsi 0xffffffff90765b90    rdi 0xffffffff92597680
 rbp 0xffffffff81ccad30     r8 0x1000                 r9 0x7
 r10 0xffffffff            r11 0x0                   r12 0xffffffff92e5f868
 r13 0xffffffff90765b90    r14 0x1                   r15 0x1000
 rip 0xffffffff800fdedb    rsp 0xffffffff81ccacf0 flags 0x10286
 vector: 0xd, error code: 0x0
10 ffffffff81ccac28 (+ 264) ffffffff800fdedb  <kernel_x86_64> vfs_vnode_io + 0x1a

More detail in attached screenshot.

Running in VirtualBox with 3 vCPUs, hrev51985. It happened while running haikuporter --no-source-package llvm -j3.

I also noticed that PageWriteWrapper suddenly had serious issues writing pages, e.g.:

KERN: acquire_advisory_lock(vnode = 0xffffffffa287dd80, flock = 0xffffffff80798eb0, wait = yes)
KERN: low resource pages: normal -> note
KERN: low resource pages: note -> normal
KERN: bfs: bfs_io:502: Invalid Argument
KERN: Last message repeated 251 times.
KERN: PageWriteWrapper: Failed to write page 0xffffffff82824a40: Invalid Argument
KERN: PageWriteWrapper: Failed to write page 0xffffffff82824ae0: Invalid Argument
KERN: PageWriteWrapper: Failed to write page 0xffffffff82824b80: Invalid Argument
KERN: PageWriteWrapper: Failed to write page 0xffffffff82819e60: Invalid Argument

Attachments (2)

syslog-page-writer.txt (826.3 KB) - added by jessicah 14 months ago.
general protection fault.PNG (28.5 KB) - added by jessicah 14 months ago.

Change History (10)

Changed 14 months ago by jessicah

Attachment: syslog-page-writer.txt added

Changed 14 months ago by jessicah

Attachment: general protection fault.PNG added

comment:1 Changed 14 months ago by waddlesplash

KDL message is the same as #14160. Possibly related to GCC7 upgrade then; it would be nice to figure out the root of the problem instead of just disabling that optimization for even more files. (Are we mishandling the SSE registers somewhere? Is something getting misaligned?)

comment:2 Changed 14 months ago by waddlesplash

Maybe related: #11920 (see especially Simon South's last comment) and #10509 comment 13 ("still crashing due to unaligned stack access using movdqa").

comment:3 in reply to: 1 Changed 14 months ago by korli

Replying to waddlesplash:

> KDL message is the same as #14160. Possibly related to GCC7 upgrade then; it would be nice to figure out the root of the problem instead of just disabling that optimization for even more files. (Are we mishandling the SSE registers somewhere? Is something getting misaligned?)

My comment from the other ticket still stands: "It would be nice to point to the documentation reference for the GCC flag."

comment:4 in reply to: 2 Changed 14 months ago by korli

Replying to waddlesplash:

> Maybe related: #11920 (see especially Simon South's last comment) and #10509 comment 13 ("still crashing due to unaligned stack access using movdqa").

It's a GPE because the address is not in canonical form; otherwise it would be a normal page fault.
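To illustrate the canonical-form point: on x86-64 the upper bits of a virtual address must all be copies of the highest implemented address bit, and an access through a pointer that violates this raises a general protection exception rather than a page fault. Below is a minimal sketch in Python that checks a few values copied from the register dump in the description. It assumes 48-bit virtual addresses (4-level paging), which the ticket does not state, and the helper name is_canonical is made up for illustration. Whether rax is actually the pointer being dereferenced at the faulting instruction is also an assumption; the ticket only shows the register contents.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if a 64-bit address is in x86-64 canonical form.

    Canonical means bits 63 down to (va_bits - 1) are all copies of the
    same bit. va_bits=48 assumes 4-level paging (an assumption; the
    ticket does not state the paging mode).
    """
    addr &= (1 << 64) - 1                      # treat the value as unsigned 64-bit
    upper = addr >> (va_bits - 1)              # bits 63 .. va_bits-1
    all_ones = (1 << (64 - va_bits + 1)) - 1   # the same bit range, all set
    return upper == 0 or upper == all_ones


# Values copied from the register dump in the description.
registers = {
    "rax": 0xdeadbeefdeadbeef,
    "rbx": 0xffffffff92597680,
    "rsp": 0xffffffff81ccacf0,
}

for name, value in registers.items():
    state = "canonical" if is_canonical(value) else "non-canonical"
    print(f"{name} {value:#018x} {state}")
```

Under that assumption, rax (0xdeadbeefdeadbeef) is reported as non-canonical, while rbx and rsp are canonical kernel-half addresses.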

comment:5 in reply to: 3 Changed 14 months ago by korli

Replying to korli:

> My comment from the other ticket still stands: "It would be nice to point to the documentation reference for the GCC flag."

I tried dumping the disassembly of the kernel with and without the optimization, but couldn't find a difference. Maybe I'm doing something wrong.

comment:6 Changed 14 months ago by waddlesplash

Since "rtl-stv1" is a GCC pass, not a specific optimization group that can be disabled, the only real documentation is here: https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#Developer-Options

On IRC, the GCC developers informed me that pass is a vectorization pass. If you do diffs of assembly of files with and without it, you will see that an awful lot of code now uses SSE registers.

> Maybe I'm doing something wrong.

Changing kernel flags or even ObjectC++Flags or the like does not cause jam to rebuild, apparently. So if you are trying to test with/without a certain flag, you will need to delete the kernel objects directory in between runs (something like rm -rf generated/objects/haiku/x86_64/system/kernel iirc).
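For the with/without comparison described above, one rough way to quantify the change is to count the disassembly lines that touch SSE registers in each listing. Below is a minimal sketch in Python, assuming the two listings have already been produced as text (for example with objdump -d on the kernel objects from each clean build). The function names and the two sample listings are made up for illustration and are not taken from the Haiku kernel.

```python
import re

# Matches SSE register operands such as %xmm0 (AT&T syntax) or xmm15 (Intel syntax).
XMM_RE = re.compile(r"\bxmm\d+\b")


def count_sse_lines(disassembly: str) -> int:
    """Count the lines of a disassembly listing that reference an xmm register."""
    return sum(1 for line in disassembly.splitlines() if XMM_RE.search(line))


def sse_delta(with_pass: str, without_pass: str) -> int:
    """Return how many more SSE-using lines the first listing has than the second."""
    return count_sse_lines(with_pass) - count_sse_lines(without_pass)


# Hypothetical listings, only to show the expected input shape.
WITH_PASS = """\
  movdqa %xmm0,(%rdi)
  mov    %rax,%rbx
  movq   %xmm1,%rax
"""

WITHOUT_PASS = """\
  mov    %rax,(%rdi)
  mov    %rax,%rbx
"""

print("SSE lines with pass:   ", count_sse_lines(WITH_PASS))
print("SSE lines without pass:", count_sse_lines(WITHOUT_PASS))
print("difference:            ", sse_delta(WITH_PASS, WITHOUT_PASS))
```

A difference of zero on real listings would suggest that both builds came from the same objects, which is what stale object files would produce.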

comment:7 in reply to: 6 Changed 14 months ago by korli

Replying to waddlesplash:

> On IRC, the GCC developers informed me that pass is a vectorization pass. If you do diffs of assembly of files with and without it, you will see that an awful lot of code now uses SSE registers.

AFAICT this was already the case with GCC 5.4.

> Changing kernel flags or even ObjectC++Flags or the like does not cause jam to rebuild, apparently. So if you are trying to test with/without a certain flag, you will need to delete the kernel objects directory in between runs (something like rm -rf generated/objects/haiku/x86_64/system/kernel iirc).

That's actually how I built each kernel.

comment:8 Changed 13 months ago by waddlesplash

#14202 was just reported and looks similar.