#15902 closed bug (fixed)
[DMAResource] PANIC: IORequest::_CopyData(): invalid range: (684032, 3584)
Reported by: | waddlesplash | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta3 |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Platform: | All | | |
Description
I've run into this while working on refactoring the nvme_disk driver to use DMAResource:
```
Offset: 684032; Length 2048
PANIC: IORequest::_CopyData(): invalid range: (684032, 3584)
Welcome to Kernel Debugging Land...
Thread 356 "sync" running on CPU 0
stack trace for thread 356 "sync"
    kernel stack: 0xffffffff814c1000 to 0xffffffff814c6000
      user stack: 0x00007f47bc2d7000 to 0x00007f47bd2d7000
frame                       caller             <image>:function + offset
 0 ffffffff814c5618 (+  24) ffffffff8014e44c   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
 1 ffffffff814c5630 (+  80) ffffffff800ad6d8   <kernel_x86_64> debug_call_with_fault_handler + 0x88
 2 ffffffff814c5680 (+  96) ffffffff800af061   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf1
 3 ffffffff814c56e0 (+  80) ffffffff800af35e   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff814c5730 (+ 240) ffffffff800af6c7   <kernel_x86_64> panic + 0xb7
 5 ffffffff814c5820 (+  96) ffffffff800db624   <kernel_x86_64> IORequest::_CopyData(void*, long, unsigned long, bool) + 0x154
 6 ffffffff814c5880 (+  96) ffffffff800dbb3c   <kernel_x86_64> IOOperation::Prepare(IORequest*) + 0x13c
 7 ffffffff814c58e0 (+ 192) ffffffff800d6ff2   <kernel_x86_64> DMAResource::TranslateNext(IORequest*, IOOperation*, unsigned long) + 0x802
 8 ffffffff814c59a0 (+ 256) ffffffff8151956e   <nvme_disk> nvme_disk_io(void*, IORequest*) + 0xde
 9 ffffffff814c5aa0 (+ 352) ffffffff8151a331   <nvme_disk> nvme_disk_write(void*, long, void const*, unsigned long*) + 0x81
10 ffffffff814c5c00 (+ 112) ffffffff800f1c18   <kernel_x86_64> _kern_writev + 0x118
11 ffffffff814c5c70 (+  32) ffffffff8016b7e3   <kernel_x86_64> writev_pos + 0x13
12 ffffffff814c5c90 (+ 384) ffffffff818b8e54   <bfs> _ZN7Journal22_WriteTransactionToLogEv.localalias.9 + 0x8f4
13 ffffffff814c5e10 (+  48) ffffffff818b9138   <bfs> _ZN7Journal9_FlushLogEbb.localalias.8 + 0x78
14 ffffffff814c5e40 (+ 224) ffffffff8010b59a   <kernel_x86_64> _kern_sync + 0x11a
```
That is, the offset/length printed at the top is what the IORequest itself contains, but by the time it gets to IOOperation::Prepare(), it is trying to copy more data than is actually in the request.
It appears this is due to DMAResource::TranslateNext aligning the length to the "alignment", which in the case of nvme_disk is B_PAGE_SIZE, and then never trimming it back down to the IORequest's own length. It looks like #14881 has the same or a similar cause.
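To make the suspected mismatch concrete, here is a minimal sketch (a hypothetical helper, not the actual TranslateNext() code) of rounding a transfer length up to the DMA alignment without clamping it back to the request:

```cpp
#include <cstdint>

// Hypothetical helper, not Haiku kernel code: round a length up to the DMA
// alignment, the way TranslateNext() appears to size the operation.
static uint64_t
align_up(uint64_t length, uint64_t alignment)
{
	return (length + alignment - 1) / alignment * alignment;
}

// With nvme_disk's restrictions the alignment is B_PAGE_SIZE (4096), so a
// 2048-byte request becomes a 4096-byte operation:
//   align_up(2048, 4096) == 4096
// If that aligned length is never clamped back to what the IORequest actually
// contains, the bounce-buffer copy later covers bytes outside the request and
// IORequest::_CopyData() panics on the out-of-range (offset, length) pair.
```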
I spent some time reading IOOperation::Prepare(), and I think the issue is in the loop that CopyData() occurs in. The offset and length are checked only against endOffset, which is the total end offset of the operation, not the IORequest end offset for this operation (i.e. fOriginalOffset + fOriginalLength). I tried adding a naive bounds check like the one in Finish() for read (out) transfers:
```diff
@@ -469,12 +470,17 @@ IOOperation::Prepare(IORequest* request)
 				vecOffset = 0;
 				if (base >= bounceBufferStart && base < bounceBufferEnd) {
 					if (offset + (off_t)length > endOffset)
 						length = endOffset - offset;
+
+					generic_size_t copyLength = length;
+					if (offset + (off_t)copyLength > originalEndOffset)
+						copyLength = originalEndOffset - offset;
+
 					status_t error = fParent->CopyData(offset,
-						bounceBuffer + (base - bounceBufferStart), length);
+						bounceBuffer + (base - bounceBufferStart), copyLength);
```
but this resulted in a second panic:
```
PANIC: vm_page_fault: unhandled page fault in kernel space at 0xffffffff00000600, ip 0xffffffff8016e373
Welcome to Kernel Debugging Land...
Thread 373 "sync" running on CPU 0
stack trace for thread 373 "sync"
    kernel stack: 0xffffffff815e9000 to 0xffffffff815ee000
      user stack: 0x00007fdb308cd000 to 0x00007fdb318cd000
frame                       caller             <image>:function + offset
 0 ffffffff815ed278 (+  24) ffffffff8014e46c   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
 1 ffffffff815ed290 (+  80) ffffffff800ad6d8   <kernel_x86_64> debug_call_with_fault_handler + 0x88
 2 ffffffff815ed2e0 (+  96) ffffffff800af061   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf1
 3 ffffffff815ed340 (+  80) ffffffff800af35e   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff815ed390 (+ 240) ffffffff800af6c7   <kernel_x86_64> panic + 0xb7
 5 ffffffff815ed480 (+ 240) ffffffff80138c80   <kernel_x86_64> vm_page_fault + 0x260
 6 ffffffff815ed570 (+  64) ffffffff80158e70   <kernel_x86_64> x86_page_fault_exception + 0x160
 7 ffffffff815ed5b0 (+ 536) ffffffff8014fd12   <kernel_x86_64> int_bottom + 0x56
kernel iframe at 0xffffffff815ed7c8 (end = 0xffffffff815ed890)
 rax 0xffffffff8040be00    rbx 0xffffffff00000600    rcx 0xcccccccccc00c6cc
 rdx 0xcccccccccc00c6cc    rsi 0xffffffff00000600    rdi 0xffffffff8040be00
 rbp 0xffffffff815ed8b0     r8 0x1                    r9 0x0
 r10 0xffffffff823dc340    r11 0x1                   r12 0xffffffff815edab0
 r13 0xffffffff8040be00    r14 0xfffffffffffffa00    r15 0xffffffff860db190
 rip 0xffffffff8016e373    rsp 0xffffffff815ed890    rflags 0x13292
 vector: 0xe, error code: 0x0
 8 ffffffff815ed7c8 (+ 232) ffffffff8016e373   <kernel_x86_64> memcpy + 0x23
 9 ffffffff815ed8b0 (+  16) ffffffff800d941d   <kernel_x86_64> IORequest::_CopySimple(void*, unsigned long, unsigned long, int, bool) + 0x5d
10 ffffffff815ed8c0 (+  96) ffffffff800db59b   <kernel_x86_64> IORequest::_CopyData(void*, long, unsigned long, bool) + 0xcb
11 ffffffff815ed920 (+ 128) ffffffff800db995   <kernel_x86_64> IOOperation::Finish() + 0x235
12 ffffffff815ed9a0 (+ 256) ffffffff81519613   <nvme_disk> nvme_disk_io(void*, IORequest*) + 0x1f3
13 ffffffff815edaa0 (+ 352) ffffffff8151a261   <nvme_disk> nvme_disk_write(void*, long, void const*, unsigned long*) + 0x81
14 ffffffff815edc00 (+ 112) ffffffff800f1c38   <kernel_x86_64> _kern_writev + 0x118
15 ffffffff815edc70 (+  32) ffffffff8016b803   <kernel_x86_64> writev_pos + 0x13
16 ffffffff815edc90 (+ 384) ffffffff818b8e54   <bfs> _ZN7Journal22_WriteTransactionToLogEv.localalias.9 + 0x8f4
17 ffffffff815ede10 (+  48) ffffffff818b9138   <bfs> _ZN7Journal9_FlushLogEbb.localalias.8 + 0x78
18 ffffffff815ede40 (+ 224) ffffffff8010b5ba   <kernel_x86_64> _kern_sync + 0x11a
```
Change History (13)
comment:1 by , 5 years ago
The second panic usually stays at that address, or one time I saw it as "0x600". It looks like the code I added is causing overflows somehow.
comment:2 by , 5 years ago
For a bit more background: NVMe is kind of an oddball here; it's designed around physically contiguous transfers. That is, the first buffer's address needs to be only 4-byte-aligned, but it must end on a page boundary, and all subsequent buffers must both start and end on a page boundary, until the last one, which can end in the middle of a page. But there is no way to convey any of that to DMAResource, so it just gets an "alignment" of B_PAGE_SIZE.
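Purely as an illustration of those constraints (this helper does not exist in the driver), a validity check over a physical vector list would look something like this:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative types, not the kernel's physical_entry.
struct physical_entry_example {
	uint64_t address;
	uint64_t length;
};

// Check mirroring the PRP-style rules described above; a single segment is
// treated as both "first" and "last" here.
static bool
prp_like_segments_valid(const physical_entry_example* vecs, size_t count,
	uint64_t pageSize)
{
	for (size_t i = 0; i < count; i++) {
		const uint64_t start = vecs[i].address;
		const uint64_t end = start + vecs[i].length;

		// first segment: only 4-byte-aligned start, but it must end on a
		// page boundary if more segments follow
		if (i == 0) {
			if (start % 4 != 0)
				return false;
			if (count > 1 && end % pageSize != 0)
				return false;
			continue;
		}

		// all later segments must start on a page boundary...
		if (start % pageSize != 0)
			return false;

		// ...and end on one too, except the last, which may end mid-page
		if (i != count - 1 && end % pageSize != 0)
			return false;
	}
	return true;
}
```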
comment:3 by , 5 years ago
Here's the change which triggers the bug: https://review.haiku-os.org/c/haiku/+/2519
comment:5 by , 5 years ago
This was caused by having an alignment greater than the block size, which DMAResource cannot handle. I've uploaded a new changeset that changes the block size of the DMAResource to B_PAGE_SIZE as well. Note that the block size for DMAResource does not need to correspond to the device block size.
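For reference, a rough sketch of what that changeset does, based on the DMAResource::Init() signature visible in the stack traces; the restriction fields set here and the buffer counts are assumptions, not the exact values from the change:

```cpp
// Sketch only, inside the driver with the kernel's dma_resources.h in scope;
// restriction values and buffer counts are illustrative.
dma_restrictions restrictions = {};
restrictions.alignment = B_PAGE_SIZE;         // NVMe wants page-granular segments
restrictions.max_segment_size = B_PAGE_SIZE;

DMAResource* resource = new DMAResource;
// The block size passed here is B_PAGE_SIZE as well; it only needs to be a
// granularity DMAResource can split transfers on, not the device's 512- or
// 4096-byte logical block size.
status_t status = resource->Init(restrictions, B_PAGE_SIZE, 32, 32);
if (status != B_OK)
	delete resource;
```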
comment:7 by , 5 years ago
Blocked By: | 15912 removed |
---|---|
Summary: | PANIC: IORequest::_CopyData(): invalid range: (684032, 3584) → [DMAResource] PANIC: IORequest::_CopyData(): invalid range: (684032, 3584) |
Nope. The problem is indeed as mmlr describes above.
comment:8 by , 5 years ago
Blocking: | 14881 removed |
---|---|
comment:9 by , 4 years ago
Probably the right fix here is simply to make DMAResource increase the block size to the alignment, if the alignment is greater than the block size.
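In code, that suggestion amounts to something like the following (a sketch of the idea, not the committed change):

```cpp
#include <cstdint>

// Sketch of the suggestion above: never let the DMA block size be smaller
// than the required alignment.
static uint64_t
effective_block_size(uint64_t blockSize, uint64_t alignment)
{
	if (alignment > blockSize)
		blockSize = alignment;
	return blockSize;   // alignment <= blockSize now holds by construction
}
```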
comment:10 by , 4 years ago
Well, I tried to add that assert, and this happened:
```
PANIC: ASSERT FAILED (../src/system/kernel/device_manager/dma_resources.cpp:161): fRestrictions.alignment <= blockSize
Welcome to Kernel Debugging Land...
Thread 15 "main2" running on CPU 0
stack trace for thread 15 "main2"
    kernel stack: 0xffffffff811b4000 to 0xffffffff811b9000
frame                       caller             <image>:function + offset
 0 ffffffff811b7fe8 (+  24) ffffffff8014f99c   <kernel_x86_64> arch_debug_call_with_fault_handler() + 0x16
 1 ffffffff811b8000 (+  80) ffffffff800ae1b8   <kernel_x86_64> debug_call_with_fault_handler() + 0x88
 2 ffffffff811b8050 (+  96) ffffffff800afb41   <kernel_x86_64> _ZL20kernel_debugger_loopPKcS0_P13__va_list_tagi() + 0xf1
 3 ffffffff811b80b0 (+  80) ffffffff800afe3e   <kernel_x86_64> _ZL24kernel_debugger_internalPKcS0_P13__va_list_tagi() + 0x6e
 4 ffffffff811b8100 (+ 240) ffffffff800b01a7   <kernel_x86_64> panic() + 0xb7
 5 ffffffff811b81f0 (+  96) ffffffff800d6f5d   <kernel_x86_64> _ZN11DMAResource4InitERK16dma_restrictionsmjj() + 0x35d
 6 ffffffff811b8250 (+ 144) ffffffff800d70e2   <kernel_x86_64> _ZN11DMAResource4InitEP11device_nodemjj() + 0x152
 7 ffffffff811b82e0 (+  48) ffffffff818d5260   <scsi_disk> _ZL16das_set_capacityP15das_driver_infomj() + 0x40
 8 ffffffff811b8310 (+  96) ffffffff818d98ff   <scsi_periph> _Z21periph_check_capacityP23scsi_periph_device_infoP8scsi_ccb() + 0xaf
 9 ffffffff811b8370 (+  48) ffffffff818d5037   <scsi_disk> _ZL15das_init_devicePvPS_() + 0x37
10 ffffffff811b83a0 (+  48) ffffffff800cd2cd   <kernel_x86_64> _ZN12_GLOBAL__N_16Device10InitDeviceEv() + 0x7d
11 ffffffff811b83d0 (+ 336) ffffffff800cef45   <kernel_x86_64> _ZL10devfs_openP9fs_volumeP8fs_vnodeiPPv() + 0x75
12 ffffffff811b8520 (+  64) ffffffff800ff19f   <kernel_x86_64> _ZL10open_vnodeP5vnodeib() + 0x3f
13 ffffffff811b8560 (+  64) ffffffff8010608c   <kernel_x86_64> _ZL9file_openiPcib() + 0x5c
14 ffffffff811b85a0 (+  80) ffffffff8010c827   <kernel_x86_64> _kern_open() + 0x57
15 ffffffff811b85f0 (+ 112) ffffffff80167116   <kernel_x86_64> open() + 0x26
16 ffffffff811b8660 (+  48) ffffffff800e6397   <kernel_x86_64> _ZN8BPrivate10DiskDevice11KDiskDevice5SetToEPKc() + 0x77
17 ffffffff811b8690 (+ 224) ffffffff800e8740   <kernel_x86_64> _ZN8BPrivate10DiskDevice18KDiskDeviceManager5_ScanEPKc() + 0x1f0
18 ffffffff811b8770 (+ 224) ffffffff800e86c2   <kernel_x86_64> _ZN8BPrivate10DiskDevice18KDiskDeviceManager5_ScanEPKc() + 0x172
19 ffffffff811b8850 (+ 224) ffffffff800e86c2   <kernel_x86_64> _ZN8BPrivate10DiskDevice18KDiskDeviceManager5_ScanEPKc() + 0x172
20 ffffffff811b8930 (+ 224) ffffffff800e86c2   <kernel_x86_64> _ZN8BPrivate10DiskDevice18KDiskDeviceManager5_ScanEPKc() + 0x172
21 ffffffff811b8a10 (+ 224) ffffffff800e86c2   <kernel_x86_64> _ZN8BPrivate10DiskDevice18KDiskDeviceManager5_ScanEPKc() + 0x172
22 ffffffff811b8af0 (+  64) ffffffff800e8897   <kernel_x86_64> _ZN8BPrivate10DiskDevice18KDiskDeviceManager17InitialDeviceScanEv() + 0xd7
23 ffffffff811b8b30 (+1072) ffffffff801119eb   <kernel_x86_64> vfs_mount_boot_file_system() + 0x41b
24 ffffffff811b8f60 (+  80) ffffffff80063029   <kernel_x86_64> _ZL5main2Pv() + 0x99
25 ffffffff811b8fb0 (+  32) ffffffff8008af17   <kernel_x86_64> _ZL19common_thread_entryPv() + 0x37
26 ffffffff811b8fd0 (+2128900144) ffffffff811b8fe0
```
So it appears something does specify an alignment greater than the block size?
comment:11 by , 4 years ago
In this case:
alignment: 14757395258967641292, block size: 512
So ... that seems kind of bogus. (14757395258967641292 is 0xcccccccccccccccc, which looks like an uninitialized-memory fill pattern.)
comment:12 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
The problem was in how I wrote the assert. Fixed in hrev54350.
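For what it's worth, here is a guess at the kind of ordering mistake an assert like that can have (purely illustrative, not the hrev54350 diff; see that revision for the real change): checking the member field before the argument has been stored into it, so the assert compares against indeterminate memory, which would fit the 0xcc pattern above.

```cpp
#include <cassert>
#include <cstdint>

// Purely illustrative types, not the kernel's.
struct Restrictions {
	uint64_t alignment;
};

struct Resource {
	Restrictions fRestrictions;

	void Init(const Restrictions& restrictions, uint64_t blockSize)
	{
		// wrong: fRestrictions.alignment is still indeterminate here
		// assert(fRestrictions.alignment <= blockSize);

		fRestrictions = restrictions;

		// right: check the value that was actually passed in
		assert(fRestrictions.alignment <= blockSize);
	}
};
```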