Opened 17 months ago

Closed 3 months ago

#14177 closed bug (fixed)

[KDL] in vfs_vnode_io, lots of failures writing back pages

Reported by: jessicah
Owned by: nobody
Priority: normal
Milestone: Unscheduled
Component: System/Kernel
Version: R1/Development
Keywords: vfs
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

The system dropped into KDL in vfs_vnode_io with a General Protection Exception (GPE). It looks something like:

PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0

Thread 9 "page writer" running on CPU 0
stack trace for thread 9 "page writer"
    kernel stack: 0xffffffff81cc6000 to 0xffffffff81ccb000
....
kernel iframe at 0xffffffff81ccac28 (end = 0xfffffff81ccacf0)
 rax 0xdeadbeefdeadbeef    rbx 0xffffffff92597680    rcx 0x1
 rdx 0xffffffff92e5f868    rsi 0xffffffff90765b90    rdi 0xffffffff92597680
 rbp 0xffffffff81ccad30     r8 0x1000                 r9 0x7
 r10 0xffffffff            r11 0x0                   r12 0xffffffff92e5f868
 r13 0xffffffff90765b90    r14 0x1                   r15 0x1000
 rip 0xffffffff800fdedb    rsp 0xffffffff81ccacf0 flags 0x10286
 vector: 0xd, error code: 0x0
10 ffffffff81ccac28 (+ 264) ffffffff800fdedb  <kernel_x86_64> vfs_vnode_io + 0x1a

More detail in attached screenshot.

Running in VirtualBox with 3 vCPUs, hrev51985. It happened while running haikuporter --no-source-package llvm -j3

I also noticed that PageWriteWrapper suddenly had serious issues writing pages, e.g.

KERN: acquire_advisory_lock(vnode = 0xffffffffa287dd80, flock = 0xffffffff80798eb0, wait = yes)
KERN: low resource pages: normal -> note
KERN: low resource pages: note -> normal
KERN: bfs: bfs_io:502: Invalid Argument
KERN: Last message repeated 251 times.
KERN: PageWriteWrapper: Failed to write page 0xffffffff82824a40: Invalid Argument
KERN: PageWriteWrapper: Failed to write page 0xffffffff82824ae0: Invalid Argument
KERN: PageWriteWrapper: Failed to write page 0xffffffff82824b80: Invalid Argument
KERN: PageWriteWrapper: Failed to write page 0xffffffff82819e60: Invalid Argument

Attachments (2)

syslog-page-writer.txt (826.3 KB) - added by jessicah 17 months ago.
general protection fault.PNG (28.5 KB) - added by jessicah 17 months ago.

Change History (11)

by jessicah, 17 months ago

Attachment: syslog-page-writer.txt added

by jessicah, 17 months ago

Attachment: general protection fault.PNG added

comment:1 by waddlesplash, 17 months ago

KDL message is the same as #14160. Possibly related to GCC7 upgrade then; it would be nice to figure out the root of the problem instead of just disabling that optimization for even more files. (Are we mishandling the SSE registers somewhere? Is something getting misaligned?)

comment:2 by waddlesplash, 17 months ago

Maybe related: #11920 (see especially Simon South's last comment) and #10509 comment 13 ("still crashing due to unaligned stack access using movdqa").
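
For reference, movdqa raises a general protection exception when its memory operand is not 16-byte aligned, which is why a misaligned stack could produce a crash that looks like this one. A minimal sketch in Python for checking addresses taken from a dump; the helper name is_aligned is made up for illustration, and nothing here shows which instruction actually faulted in this ticket:

```python
def is_aligned(addr: int, alignment: int = 16) -> bool:
    """Return True if addr is a multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return addr & (alignment - 1) == 0


# rsp and rbp as they appear in the register dump in the description.
rsp = 0xffffffff81ccacf0
rbp = 0xffffffff81ccad30

print(is_aligned(rsp), is_aligned(rbp))

# A hypothetical address 8 bytes past rsp, which would not be a valid
# movdqa operand.
print(is_aligned(rsp + 8))
```

Both stack registers in the dump are 16-byte aligned at the point of the fault, which by itself says nothing about locals addressed at other offsets.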

comment:3 by korli, 17 months ago (in reply to: 1)

Replying to waddlesplash:

KDL message is the same as #14160. Possibly related to GCC7 upgrade then; it would be nice to figure out the root of the problem instead of just disabling that optimization for even more files. (Are we mishandling the SSE registers somewhere? Is something getting misaligned?)

My comment from the other ticket still stands: "It would be nice to point to the documentation reference for the GCC flag."

comment:4 by korli, 17 months ago (in reply to: 2)

Replying to waddlesplash:

Maybe related: #11920 (see especially Simon South's last comment) and #10509 comment 13 ("still crashing due to unaligned stack access using movdqa").

It's a GPE because the address is not in canonical form, otherwise it would be a normal pagefault.
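
To illustrate the canonical-form rule: on x86_64 with 48-bit virtual addresses, bits 63 through 47 must all be equal, and an access through an address that violates this raises a general protection exception instead of a page fault. A rough sketch in Python, assuming 48-bit addressing and assuming the faulting access went through the value in rax from the dump (the ticket does not state which register was dereferenced); is_canonical is a made-up helper name:

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if a 64-bit address is canonical for va_bits of virtual address."""
    addr &= (1 << 64) - 1
    # Bits 63..(va_bits - 1) must be all zeros or all ones.
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


# Values taken from the register dump in the description.
for name, value in [
    ("rax", 0xdeadbeefdeadbeef),
    ("rbx", 0xffffffff92597680),
    ("rip", 0xffffffff800fdedb),
]:
    print(name, hex(value), "canonical" if is_canonical(value) else "NOT canonical")
```

Of the three, only rax is non-canonical, which fits a GPE rather than a page fault if that value was used as a pointer.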

comment:5 by korli, 17 months ago (in reply to: 3)

Replying to korli:

My comment from the other ticket still stands: "It would be nice to point to the documentation reference for the GCC flag."

I tried dumping the disassembly of the kernel built with and without the optimization, but couldn't find any difference between the two. Maybe I'm doing something wrong.

comment:6 by waddlesplash, 17 months ago

Since "rtl-stv1" is a GCC pass, not a specific optimization group that can be disabled, the only real documentation is here: https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#Developer-Options

On IRC, the GCC developers informed me that pass is a vectorization pass. If you do diffs of assembly of files with and without it, you will see that an awful lot of code now uses SSE registers.

Maybe I'm doing something wrong.

Changing kernel flags or even ObjectC++Flags or the like does not cause jam to rebuild, apparently. So if you are trying to test with/without a certain flag, you will need to delete the kernel objects directory in between runs (something like rm -rf generated/objects/haiku/x86_64/system/kernel iirc.)
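
One way to compare the two builds beyond a raw diff is to count how many disassembly lines touch SSE registers in each. A small sketch in Python, assuming the disassembly has already been dumped to text; the function name and the two sample listings are invented for illustration and are not real kernel output:

```python
import re

# Matches xmm register names in either AT&T (%xmm0) or Intel (xmm0) syntax.
XMM_RE = re.compile(r"\bxmm\d+\b")


def count_sse_lines(disassembly: str) -> int:
    """Count the lines of a disassembly listing that reference an xmm register."""
    return sum(1 for line in disassembly.splitlines() if XMM_RE.search(line))


# Hypothetical listings standing in for "without the pass" and "with the pass".
sample_without = """\
  mov    %rdi,%rbx
  mov    %rsi,0x8(%rsp)
  mov    %rdx,0x10(%rsp)
"""
sample_with = """\
  mov    %rdi,%rbx
  movq   %rsi,%xmm0
  movhps 0x10(%rsp),%xmm0
  movaps %xmm0,0x20(%rsp)
"""

print(count_sse_lines(sample_without), count_sse_lines(sample_with))
```

If both real listings give the same count, that would be consistent with the two kernels having been built with identical flags, for example because the objects were not rebuilt.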

comment:7 by korli, 17 months ago (in reply to: 6)

Replying to waddlesplash:

On IRC, the GCC developers informed me that pass is a vectorization pass. If you do diffs of assembly of files with and without it, you will see that an awful lot of code now uses SSE registers.

AFAICT this was already the case with GCC 5.4.

Changing kernel flags or even ObjectC++Flags or the like does not cause jam to rebuild, apparently. So if you are trying to test with/without a certain flag, you will need to delete the kernel objects directory in between runs (something like rm -rf generated/objects/haiku/x86_64/system/kernel iirc.)

That's actually how I built each kernel.

comment:8 by waddlesplash, 17 months ago

#14202 was just reported and looks similar.

comment:9 by waddlesplash, 3 months ago

Resolution: fixed
Status: new → closed

All the other linked GPE tickets were reported fixed following the stack alignment changes, so I'm closing this one as fixed too; nobody seems to have seen it since.
