Opened 13 months ago
Closed 6 months ago
#18692 closed bug (fixed)
BQuery with too many terms causes hard reboot
Reported by: | jeremy-list | Owned by: | axeld |
---|---|---|---|
Priority: | high | Milestone: | R1/beta5 |
Component: | File Systems/BFS | Version: | R1/beta4 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
When I attempt to run a BQuery with too many terms (I'm unsure of the exact number but it's less than 4097) my computer reboots without going through the usual shutdown routine.
Attachments (1)
Change History (17)
by , 13 months ago
Attachment: | testquerysize.cpp added |
---|
comment:1 by , 13 months ago
Milestone: | Unscheduled → R1/beta5 |
---|---|
Priority: | normal → high |
Reproduces for me. Wow, been a while since we've seen a triple fault!
comment:3 by , 12 months ago
Seems this also reproduces on 32-bit R1/beta4, not just 64-bit nightlies.
comment:4 by , 7 months ago
So, the hard reboot was due to the double fault handler being broken. I fixed that in hrev57747, and now we get a much more intelligible panic:
PANIC: Fatal exception "Double Fault Exception" occurred! Error code: 0x0 Welcome to Kernel Debugging Land... Thread 532 "bfsQuerySize" running on CPU 0 stack trace for thread 532 "bfsQuerySize" kernel stack: 0xffffffff811df000 to 0xffffffff811e4000 user stack: 0x00007ffbce23c000 to 0x00007ffbcf23c000 frame caller <image>:function + offset 0 ffffffff81d72a40 (+ 16) ffffffff80271255 <kernel_x86_64> arch_debug_stack_trace + 0x13 1 ffffffff81d72a50 (+ 32) ffffffff801a1abe <kernel_x86_64> stack_trace_trampoline(void*) + 0x11 2 ffffffff81d72a70 (+ 32) ffffffff80264f50 <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a 3 ffffffff81d72a90 (+ 80) ffffffff801a3d7f <kernel_x86_64> debug_call_with_fault_handler + 0x7a 4 ffffffff81d72ae0 (+ 144) ffffffff801a1db5 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_ tag*, int) + 0x2f4 5 ffffffff81d72b70 (+ 64) ffffffff801a2214 <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_l ist_tag*, int) + 0x76 6 ffffffff81d72bb0 (+ 240) ffffffff801a43c1 <kernel_x86_64> panic + 0xc9 7 ffffffff81d72ca0 (+ 80) ffffffff80272a4e <kernel_x86_64> x86_fatal_exception + 0x4f 8 ffffffff81d72cf0 (+ 584) ffffffff8026690c <kernel_x86_64> int_bottom + 0x80 kernel iframe at 0xffffffff81d72f38 (end = 0xffffffff81d73000) rax 0x28 rbx 0xffffffff811e3d98 rcx 0x1 rdx 0x1 rsi 0xffffffff811e3d98 rdi 0xffffffff9a26cff0 rbp 0xffffffff811e0000 r8 0x0 r9 0xffffffff80341308 r10 0xffffffff81724f10 r11 0x206 r12 0xffffffff9a26cff0 r13 0xffffffff811e3d98 r14 0xffffffff811e3e38 r15 0xffffffff8a46bcf8 rip 0xffffffff817225c4 rsp 0xffffffff811e0000 rflags 0x10212 vector: 0x8, error code: 0x0 9 ffffffff81d72f38 (+ 0) ffffffff817225c4 <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x04 10 ffffffff811e0000 (+ 32) ffffffff817227d0 <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110 11 ffffffff811e0020 (+ 48) ffffffff81722866 <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16 12 ffffffff811e0050 (+ 48) ffffffff817225d6 <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16 13 ffffffff811e0080 (+ 32) ffffffff817227d0 <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110 14 ffffffff811e00a0 (+ 48) ffffffff81722866 <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16 15 ffffffff811e00d0 (+ 48) ffffffff817225d6 <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16 16 ffffffff811e0100 (+ 32) ffffffff817227d0 <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110 ... 374 ffffffff811e3ca0 (+ 48) ffffffff81722866 <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16 375 ffffffff811e3cd0 (+ 48) ffffffff817225d6 <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16 376 ffffffff811e3d00 (+ 32) ffffffff817227d0 <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110 377 ffffffff811e3d20 (+ 48) ffffffff81722866 <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16 378 ffffffff811e3d50 (+ 48) ffffffff817225d6 <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16 379 ffffffff811e3d80 (+ 48) ffffffff8172296e <bfs> Expression::Expression(char*) + 0x1e 380 ffffffff811e3db0 (+ 80) ffffffff81724f5d <bfs> bfs_open_query(fs_volume*, char const*, unsigned int, int, unsigne d int, void**) + 0x4d 381 ffffffff811e3e00 (+ 80) ffffffff8020ccd0 <kernel_x86_64> query_open(int, char const*, unsigned int, int, int, boo l) + 0xac 382 ffffffff811e3e50 (+ 208) ffffffff802137da <kernel_x86_64> _user_open_query + 0x156 383 ffffffff811e3f20 (+ 16) ffffffff80266c0f <kernel_x86_64> x86_64_syscall_entry + 0xfb user iframe at 0xffffffff811e3f30 (end = 0xffffffff811e3ff8) rax 0x92 rbx 0x7ffbcf23b140 rcx 0x6a16d74b3c rdx 0xd3a6 rsi 0x10fdc59221c8 rdi 0x3 rbp 0x7ffbcf23b100 r8 0xffffffff r9 0x0 r10 0x0 r11 0x206 r12 0x7ffbcf23b140 r13 0x7ffbcf23b0d8 r14 0x923a25605b r15 0x923a25605a rip 0x6a16d74b3c rsp 0x7ffbcf23b0c8 rflags 0x206 vector: 0x63, error code: 0x0 384 ffffffff811e3f30 (+140721617465808) 0000006a16d74b3c <libroot.so> _kern_open_query + 0x0c 385 00007ffbcf23b100 (+ 288) 000000923a255e4e <_APP_> main + 0xee 386 00007ffbcf23b220 (+ 48) 000000923a255fee <_APP_> _start + 0x3e 387 00007ffbcf23b250 (+ 48) 000001d838f67c35 </boot/system/runtime_loader@0x000001d838f58000> <unknown> + 0xfc35 388 00007ffbcf23b280 (+ 0) 00007ff5852592b0 <commpage> commpage_thread_exit + 0x00
comment:5 by , 7 months ago
kernel stack: 0xffffffff811df000 to 0xffffffff811e4000
9 ffffffff81d72f38 (+ 0) ffffffff817225c4 <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x04 10 ffffffff811e0000 (+ 32) ffffffff817227d0 <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110
Frame 10 is inside the kernel stack range, but frame 9 is outside of it.
Looks like there was a very large allocation on the stack here and we ran out of stack space? Or the stack pointer was just corrupt.
comment:6 by , 7 months ago
We could just have run out of stack space and then things went wrong. I think kernel stacks are 16KB, and this looks like it's close to or over that?
comment:7 by , 7 months ago
9 ffffffff81d72f38 (+ 0) ffffffff817225c4 <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x04 10 ffffffff811e0000 (+ 32) ffffffff817227d0 <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110 11 ffffffff811e0020 (+ 48) ffffffff81722866 <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16 12 ffffffff811e0050 (+ 48) ffffffff817225d6 <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16 13 ffffffff811e0080 (+ 32) ffffffff817227d0 <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110 14 ffffffff811e00a0 (+ 48) ffffffff81722866 <bfs> Expression::ParseAnd[clone .localalias] (char**) + 0x16 15 ffffffff811e00d0 (+ 48) ffffffff817225d6 <bfs> Expression::ParseOr[clone .localalias] (char**) + 0x16 16 ffffffff811e0100 (+ 32) ffffffff817227d0 <bfs> Expression::ParseEquation[clone .localalias] (char**) + 0x110
Recursive parsing of data that come from userland can cause kernel stack overflow and be considered as denial of service attack vulnerability.
comment:8 by , 7 months ago
Yes, I agree that it's a problem, and I've been looking at refactoring the query parser to not need such recursion. However, before I do that, it'd be good to unify with the generic query parser as already used by RAMFS and packagefs, but that will require a bit of work.
comment:9 by , 7 months ago
So, I have a prototype refactor of QueryParser here which rewrites the main parse logic to use loops instead of recursion. This gets past the above KDL, but then just crashes in a stack overflow of CalculateScore() instead. The obvious solution is to just make Operator nodes handle lists of Terms instead of just two and then iterate over them.
However, it looks like a lot of the code presently depends on Operators having just two terms, because we iterate up and down (?) the expression tree inside the entry-iteration code. It looks like this was a primitive form of optimization, allowing the least-expensive term to be evaluated first...?
axeld: Do you have any comments here, before I just go rewrite the Operator logic to handle arrays of Terms and refactor the rest of the code to match?
comment:10 by , 7 months ago
Considering #18672 as well, how about this: instead of the "stack" model of evaluation as the parser/evaluator presently uses, we simply locate the most apt expression to use on an index (presumably the one with the smallest index), and compare with that first, and then if it succeeds we compare with the whole expression (from the top-down, hopefully skipping the one equation already compared.)
comment:11 by , 7 months ago
Hmm, actually the current system might handle this fine, with a few adjustments, if it didn't recurse to calculate scores...
comment:12 by , 7 months ago
Ah, never mind, no it will not: the evaluator just calls Match() on the highest-up AND statement it can find, so it'll wind up recursing too far there too.
comment:13 by , 7 months ago
I would think parsing an arbitrary expression into a syntax tree, and tree traversal, are bothlong solved problems in computer science? And as far as I know they can both be done without recursion. You replace the recursion with a separate data structure (typically a last-in-first-out stack, unless you can optimize to something simpler), and you can traverse the tree with a loop
But also, maybe if you find yourself a dozen levels down a query expression tree, maybe just give up and return B_NO_MEMORY? Are there practical uses for expressions so complicated?
comment:14 by , 7 months ago
Yeah, I am thinking that limiting the number of equations may be the best route to take here. We shouldn't be evaluating thousands of them inside the kernel. How about an upper limit of 32? Would anyone really have a use for more than that?
comment:15 by , 7 months ago
I‘d think limiting them now is fine, if somebody does have a use case and complains we can revisit this.
An example program which triggers this bug.