Opened 12 months ago

Closed 5 months ago

#18692 closed bug (fixed)

BQuery with too many terms causes hard reboot

Reported by: jeremy-list Owned by: axeld
Priority: high Milestone: R1/beta5
Component: File Systems/BFS Version: R1/beta4
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

When I attempt to run a BQuery with too many terms (I'm unsure of the exact number but it's less than 4097) my computer reboots without going through the usual shutdown routine.

Attachments (1)

testquerysize.cpp (502 bytes ) - added by jeremy-list 12 months ago.
An example program which triggers this bug.

Download all attachments as: .zip

Change History (17)

by jeremy-list, 12 months ago

Attachment: testquerysize.cpp added

An example program which triggers this bug.

comment:1 by waddlesplash, 12 months ago

Milestone: UnscheduledR1/beta5
Priority: normalhigh

Reproduces for me. Wow, been a while since we've seen a triple fault!

comment:2 by waddlesplash, 12 months ago

(Or at least, a non-hardware-related triple fault.)

comment:3 by waddlesplash, 11 months ago

Seems this also reproduces on 32-bit R1/beta4, not just 64-bit nightlies.

comment:4 by waddlesplash, 6 months ago

So, the hard reboot was due to the double fault handler being broken. I fixed that in hrev57747, and now we get a much more intelligible panic:

PANIC: Fatal exception "Double Fault Exception" occurred! Error code: 0x0

Welcome to Kernel Debugging Land...
Thread 532 "bfsQuerySize" running on CPU 0
stack trace for thread 532 "bfsQuerySize"
    kernel stack: 0xffffffff811df000 to 0xffffffff811e4000
      user stack: 0x00007ffbce23c000 to 0x00007ffbcf23c000
frame                       caller             <image>:function + offset
 0 ffffffff81d72a40 (+  16) ffffffff80271255   <kernel_x86_64> arch_debug_stack_trace + 0x13
 1 ffffffff81d72a50 (+  32) ffffffff801a1abe   <kernel_x86_64> stack_trace_trampoline(void*) + 0x11
 2 ffffffff81d72a70 (+  32) ffffffff80264f50   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a
 3 ffffffff81d72a90 (+  80) ffffffff801a3d7f   <kernel_x86_64> debug_call_with_fault_handler + 0x7a
 4 ffffffff81d72ae0 (+ 144) ffffffff801a1db5   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_
tag*, int) + 0x2f4
 5 ffffffff81d72b70 (+  64) ffffffff801a2214   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_l
ist_tag*, int) + 0x76
 6 ffffffff81d72bb0 (+ 240) ffffffff801a43c1   <kernel_x86_64> panic + 0xc9
 7 ffffffff81d72ca0 (+  80) ffffffff80272a4e   <kernel_x86_64> x86_fatal_exception + 0x4f
 8 ffffffff81d72cf0 (+ 584) ffffffff8026690c   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff81d72f38 (end = 0xffffffff81d73000)
 rax 0x28                  rbx 0xffffffff811e3d98    rcx 0x1
 rdx 0x1                   rsi 0xffffffff811e3d98    rdi 0xffffffff9a26cff0
 rbp 0xffffffff811e0000     r8 0x0                    r9 0xffffffff80341308
 r10 0xffffffff81724f10    r11 0x206                 r12 0xffffffff9a26cff0
 r13 0xffffffff811e3d98    r14 0xffffffff811e3e38    r15 0xffffffff8a46bcf8
 rip 0xffffffff817225c4    rsp 0xffffffff811e0000 rflags 0x10212
 vector: 0x8, error code: 0x0
 9 ffffffff81d72f38 (+   0) ffffffff817225c4   <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x04
10 ffffffff811e0000 (+  32) ffffffff817227d0   <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110
11 ffffffff811e0020 (+  48) ffffffff81722866   <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16
12 ffffffff811e0050 (+  48) ffffffff817225d6   <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16
13 ffffffff811e0080 (+  32) ffffffff817227d0   <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110
14 ffffffff811e00a0 (+  48) ffffffff81722866   <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16
15 ffffffff811e00d0 (+  48) ffffffff817225d6   <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16
16 ffffffff811e0100 (+  32) ffffffff817227d0   <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110
...
374 ffffffff811e3ca0 (+  48) ffffffff81722866   <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16
375 ffffffff811e3cd0 (+  48) ffffffff817225d6   <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16
376 ffffffff811e3d00 (+  32) ffffffff817227d0   <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110
377 ffffffff811e3d20 (+  48) ffffffff81722866   <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16
378 ffffffff811e3d50 (+  48) ffffffff817225d6   <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16
379 ffffffff811e3d80 (+  48) ffffffff8172296e   <bfs> Expression::Expression(char*) + 0x1e
380 ffffffff811e3db0 (+  80) ffffffff81724f5d   <bfs> bfs_open_query(fs_volume*, char const*, unsigned int, int, unsigne
d int, void**) + 0x4d
381 ffffffff811e3e00 (+  80) ffffffff8020ccd0   <kernel_x86_64> query_open(int, char const*, unsigned int, int, int, boo
l) + 0xac
382 ffffffff811e3e50 (+ 208) ffffffff802137da   <kernel_x86_64> _user_open_query + 0x156
383 ffffffff811e3f20 (+  16) ffffffff80266c0f   <kernel_x86_64> x86_64_syscall_entry + 0xfb
user iframe at 0xffffffff811e3f30 (end = 0xffffffff811e3ff8)
 rax 0x92                  rbx 0x7ffbcf23b140        rcx 0x6a16d74b3c
 rdx 0xd3a6                rsi 0x10fdc59221c8        rdi 0x3
 rbp 0x7ffbcf23b100         r8 0xffffffff             r9 0x0
 r10 0x0                   r11 0x206                 r12 0x7ffbcf23b140
 r13 0x7ffbcf23b0d8        r14 0x923a25605b          r15 0x923a25605a
 rip 0x6a16d74b3c          rsp 0x7ffbcf23b0c8     rflags 0x206
 vector: 0x63, error code: 0x0
384 ffffffff811e3f30 (+140721617465808) 0000006a16d74b3c   <libroot.so> _kern_open_query + 0x0c
385 00007ffbcf23b100 (+ 288) 000000923a255e4e   <_APP_> main + 0xee
386 00007ffbcf23b220 (+  48) 000000923a255fee   <_APP_> _start + 0x3e
387 00007ffbcf23b250 (+  48) 000001d838f67c35   </boot/system/runtime_loader@0x000001d838f58000> <unknown> + 0xfc35
388 00007ffbcf23b280 (+   0) 00007ff5852592b0   <commpage> commpage_thread_exit + 0x00

comment:5 by pulkomandy, 6 months ago

kernel stack: 0xffffffff811df000 to 0xffffffff811e4000

  9 ffffffff81d72f38 (+   0) ffffffff817225c4   <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x04
 10 ffffffff811e0000 (+  32) ffffffff817227d0   <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110

Frame 10 is inside the kernel stack range, but frame 9 is outside of it.

Looks like there was a very large allocation on the stack here and we ran out of stack space? Or the stack pointer was just corrupt.

comment:6 by waddlesplash, 6 months ago

We could just have run out of stack space and then things went wrong. I think kernel stacks are 16KB, and this looks like it's close to or over that?

comment:7 by X512, 6 months ago

 9 ffffffff81d72f38 (+   0) ffffffff817225c4   <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x04
10 ffffffff811e0000 (+  32) ffffffff817227d0   <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110
11 ffffffff811e0020 (+  48) ffffffff81722866   <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16
12 ffffffff811e0050 (+  48) ffffffff817225d6   <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16
13 ffffffff811e0080 (+  32) ffffffff817227d0   <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110
14 ffffffff811e00a0 (+  48) ffffffff81722866   <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16
15 ffffffff811e00d0 (+  48) ffffffff817225d6   <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16
16 ffffffff811e0100 (+  32) ffffffff817227d0   <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110

Recursive parsing of data that come from userland can cause kernel stack overflow and be considered as denial of service attack vulnerability.

comment:8 by waddlesplash, 6 months ago

Yes, I agree that it's a problem, and I've been looking at refactoring the query parser to not need such recursion. However, before I do that, it'd be good to unify with the generic query parser as already used by RAMFS and packagefs, but that will require a bit of work.

comment:9 by waddlesplash, 6 months ago

So, I have a prototype refactor of QueryParser here which rewrites the main parse logic to use loops instead of recursion. This gets past the above KDL, but then just crashes in a stack overflow of CalculateScore() instead. The obvious solution is to just make Operator nodes handle lists of Terms instead of just two and then iterate over them.

However, it looks like a lot of the code presently depends on Operators having just two terms, because we iterate up and down (?) the expression tree inside the entry-iteration code. It looks like this was a primitive form of optimization, allowing the least-expensive term to be evaluated first...?

axeld: Do you have any comments here, before I just go rewrite the Operator logic to handle arrays of Terms and refactor the rest of the code to match?

comment:10 by waddlesplash, 6 months ago

Considering #18672 as well, how about this: instead of the "stack" model of evaluation as the parser/evaluator presently uses, we simply locate the most apt expression to use on an index (presumably the one with the smallest index), and compare with that first, and then if it succeeds we compare with the whole expression (from the top-down, hopefully skipping the one equation already compared.)

comment:11 by waddlesplash, 6 months ago

Hmm, actually the current system might handle this fine, with a few adjustments, if it didn't recurse to calculate scores...

comment:12 by waddlesplash, 6 months ago

Ah, never mind, no it will not: the evaluator just calls Match() on the highest-up AND statement it can find, so it'll wind up recursing too far there too.

comment:13 by pulkomandy, 6 months ago

I would think parsing an arbitrary expression into a syntax tree, and tree traversal, are bothlong solved problems in computer science? And as far as I know they can both be done without recursion. You replace the recursion with a separate data structure (typically a last-in-first-out stack, unless you can optimize to something simpler), and you can traverse the tree with a loop

But also, maybe if you find yourself a dozen levels down a query expression tree, maybe just give up and return B_NO_MEMORY? Are there practical uses for expressions so complicated?

comment:14 by waddlesplash, 6 months ago

Yeah, I am thinking that limiting the number of equations may be the best route to take here. We shouldn't be evaluating thousands of them inside the kernel. How about an upper limit of 32? Would anyone really have a use for more than that?

comment:15 by nephele, 6 months ago

I‘d think limiting them now is fine, if somebody does have a use case and complains we can revisit this.

comment:16 by waddlesplash, 5 months ago

Resolution: fixed
Status: newclosed

Changes merged in hrev57778.

Note: See TracTickets for help on using tickets.