#9896 closed bug (not reproducible)
DVD file copy stalls at ca. 1 GB, pegs CPUs
Reported by: | ttcoder | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | File Systems/UDF | Version: | R1/Development |
Keywords: | Cc: | korli | |
Blocked By: | Blocking: | #9181 | |
Platform: | All |
Description
I've tried 4 times now: it does not matter if I use Tracker or Terminal & cp, can't copy big files from DVD. If I reboot to Windows the very same DVD works 100% so that would rule out a bad sector/faulty media I guess.
Symptoms:
- after copying ca. 1 GB (roughly 50% of the 2 GB file), the DVD drive stops, its light switches off
- CPUs peg up, at ca. 60% usage. (if I try to immediately run another test, I eventually end up with 100% CPU usage).
- can't Ctrl-C the /bin/cp process
- can't kill it in ProcessController either.
- can't attach-for-debugging to it (after running Debugger), though that might just be me being a newbie with the new debugger.
I basically 'have' to reboot at this point, though apart from pegged CPU the system seems stable (I'm writing this in W+ without having rebooted yet).
The syslog includes a couple statements made with echo > /dev/dprint for clarity.
Attachments (4)
Change History (23)
by , 11 years ago
Attachment: | syslog_STALL_with_some_dprintf_echo.txt added |
---|
comment:1 by , 11 years ago
Assuming you're able to drop into KDL at this point, it would be interesting to see the backtrace of that particular thread. Not being able to attach to it would imply it's likely blocked in the kernel somewhere.
comment:2 by , 11 years ago
Seems to be object_cache_alloc
; not sure if I have to do something other than 'bt' when the prompt comes up, or if I shoul drop into KDL multiple times to see if the backtrace changes over time?
(great that this stuff is written to the syslog BTW, saves me doing some photo grabbing :-)
KERN: Keyboard Requested Halt. KERN: Welcome to Kernel Debugging Land... KERN: Thread 765 "cp" running on CPU 0 KERN: kdebug> kdebug> kdebug> kdebug> btstack trace for thread 765 "cp" KERN: kernel stack: 0x81b06000 to 0x81b0a000 KERN: user stack: 0x624d4000 to 0x634d4000 KERN: frame caller <image>:function + offset KERN: 0 81b08fb4 (+ 48) 80095b0c <kernel_x86> invoke_command_trampoline([34m0x81b0904c[0m) + 0x1c KERN: 1 81b08fe4 (+ 12) 801257e2 <kernel_x86> arch_debug_call_with_fault_handler + 0x1b KERN: 2 81b08ff0 (+ 48) 80094a86 <kernel_x86> debug_call_with_fault_handler + 0x5e KERN: 3 81b09020 (+ 64) 80095d65 <kernel_x86> invoke_debugger_command + 0xb9 KERN: 4 81b09060 (+ 64) 80095b91 <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: [34m0xcce38030[0m, int32: [34m0[0m, [34m0x0[0m [31m"<NULL>"[0m) + 0x79 KERN: 5 81b090a0 (+ 64) 80095ed0 <kernel_x86> invoke_debugger_command_pipe + 0x9c KERN: 6 81b090e0 (+ 48) 80097b6c <kernel_x86> ExpressionParser<[32m0x81b09190[0m>::_ParseCommandPipe([34m0x81b0918c[0m) + 0x234 KERN: 7 81b09110 (+ 64) 80096d8c <kernel_x86> ExpressionParser<[32m0x81b09190[0m>::EvaluateCommand([34m0x801a1f40[0m [36m"bt"[0m, [34m0x81b0918c[0m) + 0x2d0 KERN: 8 81b09150 (+ 224) 800991b8 <kernel_x86> evaluate_debug_command + 0x80 KERN: 9 81b09230 (+ 64) 80093312 <kernel_x86> kernel_debugger_loop([34m0x80171b7e[0m [36m"Keyboard Requested Halt."[0m, [34m0x0[0m [31m"<NULL>"[0m, [34m0x0[0m [31m"<NULL>"[0m, int32: [34m0[0m) + 0x32a KERN: 10 81b09270 (+ 48) 80093567 <kernel_x86> kernel_debugger_internal([34m0x80171b7e[0m [36m"Keyboard Requested Halt."[0m, [34m0x0[0m [31m"<NULL>"[0m, [34m0x0[0m [31m"<NULL>"[0m, int32: [34m0[0m) + 0x53 KERN: 11 81b092a0 (+ 48) 80094e48 <kernel_x86> kernel_debugger + 0x28 KERN: 12 81b092d0 (+ 48) 80094986 <kernel_x86> debug_emergency_key_pressed + 0x2a KERN: 13 81b09300 (+ 64) d175fee3 </boot/system/add-ons/kernel/bus_managers/ps2> keyboard_handle_int(ps2_dev*: [34m0xd1765d3c[0m) + 0x11f KERN: 14 81b09340 (+ 64) d175f914 </boot/system/add-ons/kernel/bus_managers/ps2> ps2_dev_handle_int + 0x218 KERN: 15 81b09380 (+ 48) d175efca </boot/system/add-ons/kernel/bus_managers/ps2> ps2_interrupt(NULL) + 0xee KERN: 16 81b093b0 (+ 48) 800597aa <kernel_x86> int_io_interrupt_handler + 0x5e KERN: 17 81b093e0 (+ 64) 801342ed <kernel_x86> x86_hardware_interrupt + 0x6d KERN: 18 81b09420 (+ 12) 80128290 <kernel_x86> int_bottom + 0x30 KERN: kernel iframe at 0x81b0942c (end = 0x81b0947c) KERN: eax 0xdfdfe880 ebx 0xdfdfe880 ecx 0x60 edx 0x60 KERN: esi 0x180 edi 0x82807dc0 ebp 0x81b094a4 esp 0x81b09460 KERN: eip 0x801000bf eflags 0x210246 KERN: vector: 0x21, error code: 0x0 KERN: 19 81b0942c (+ 120) 801000bf <kernel_x86> object_cache_alloc + 0x32b KERN: 20 81b094a4 (+ 48) 800f8aa3 <kernel_x86> block_alloc(uint32: [34m0x180[0m ([34m384[0m), uint32: [34m0x0[0m ([34m0[0m), uint32: [34m0x0[0m ([34m0[0m)) + 0x73 KERN: 21 81b094d4 (+ 48) 800f8de4 <kernel_x86> realloc + 0x38 KERN: 22 81b09504 (+ 48) 8004e858 <kernel_x86> FileMap<[32m0xdf02bea0[0m>::_MakeSpace(uint32: [34m0x9[0m ([34m9[0m)) + 0x88 KERN: 23 81b09534 (+ 64) 8004e8d6 <kernel_x86> FileMap<[32m0xdf02bea0[0m>::_Add(file_io_vec*: [34m0x81b095b4[0m, uint32: [34m0x8[0m ([34m8[0m), [34m0x81b095ac[0m) + 0x36 KERN: 24 81b09574 (+ 192) 8004eccb <kernel_x86> FileMap<[32m0xdf02bea0[0m>::_Cache(int64: [34m1073676288[0m, int64: [34m65536[0m) + 0xf7 KERN: 25 81b09634 (+ 80) 8004eed8 <kernel_x86> FileMap<[32m0xdf02bea0[0m>::Translate(int64: [34m1073676288[0m, uint32: [34m0x10000[0m ([34m65536[0m), file_io_vec*: [34m0x81b09764[0m, [34m0x81b09760[0m, uint32: [34m0x800[0m ([34m2048[0m)) + 0x100 KERN: 26 81b09684 (+ 48) 8004f49a <kernel_x86> file_map_translate + 0x2e KERN: 27 81b096b4 (+ 64) 81afd6b6 </boot/system/add-ons/kernel/file_systems/udf> iterative_io_get_vecs_hook([34m0xdf00cd80[0m, IORequest*: [34m0x81b098dc[0m, int64: [34m1073676288[0m, uint32: [34m0x10000[0m ([34m65536[0m), file_io_vec*: [34m0x81b09764[0m, [34m0x81b09760[0m) + 0x3e KERN: 28 81b096f4 (+ 240) 800e8b06 <kernel_x86> do_iterative_fd_io_iterate([34m0xd26578a8[0m, IORequest*: [34m0x81b098dc[0m, [34m0x81b09813[0m) + 0x8a KERN: 29 81b097e4 (+ 64) 800e96cb <kernel_x86> do_iterative_fd_io + 0x16b KERN: 30 81b09824 (+ 64) 81afdee9 </boot/system/add-ons/kernel/file_systems/udf> udf_io(fs_volume*: [34m0xdf02ac00[0m, fs_vnode*: [34m0xd23ff590[0m, NULL, IORequest*: [34m0x81b098dc[0m) + 0x69 KERN: 31 81b09864 (+ 64) 800e9264 <kernel_x86> vfs_vnode_io + 0x2c KERN: 32 81b098a4 (+ 224) 800deda5 <kernel_x86> vfs_read_pages + 0x69 KERN: 33 81b09984 (+ 784) 8004c446 <kernel_x86> read_into_cache(file_cache_ref*: [34m0xdf02a780[0m, NULL, int64: [34m1073676288[0m, int32: [34m0[0m, uint32: [34m0x17eb000[0m, uint32: [34m0x10000[0m ([34m65536[0m), [34mtrue[0m, vm_page_reservation*: [34m0x81b09d58[0m, uint32: [34m0x0[0m ([34m0[0m)) + 0x1ee KERN: 34 81b09c94 (+ 208) 8004d949 <kernel_x86> cache_io([34m0xdf02a780[0m, NULL, int64: [34m1073737728[0m, uint32: [34m0x17fa000[0m, [34m0x81b09f2c[0m, [34mfalse[0m) + 0x701 KERN: 35 81b09d64 (+ 96) 8004e38e <kernel_x86> file_cache_read + 0x36 KERN: 36 81b09dc4 (+ 128) 81af8064 </boot/system/add-ons/kernel/file_systems/udf> Icb<[32m0xdf00cd80[0m>::Read(int64: [34m1073676288[0m, [34m0x17eb000[0m, [34m0x81b09f2c[0m, NULL) + 0x88 KERN: 37 81b09e44 (+ 64) 81afde77 </boot/system/add-ons/kernel/file_systems/udf> udf_read(fs_volume*: [34m0xdf02ac00[0m, fs_vnode*: [34m0xd23ff590[0m, [34m0xd22cc340[0m, int64: [34m1073676288[0m, [34m0x17eb000[0m, [34m0x81b09f2c[0m) + 0x33 KERN: 38 81b09e84 (+ 64) 800e0ddb <kernel_x86> file_read(file_descriptor*: [34m0xdf02a540[0m, int64: [34m1073676288[0m, [34m0x17eb000[0m, [34m0x81b09f2c[0m) + 0x67 KERN: 39 81b09ec4 (+ 80) 800ccbad <kernel_x86> common_user_io(int32: [34m3[0m, int64: [34m-1[0m, [34m0x17eb000[0m, uint32: [34m0x10000[0m ([34m65536[0m), [34mfalse[0m) + 0x181 KERN: 40 81b09f14 (+ 48) 800cd070 <kernel_x86> _user_read + 0x28 KERN: 41 81b09f44 (+ 100) 80128480 <kernel_x86> handle_syscall + 0xcd KERN: user iframe at 0x81b09fa8 (end = 0x81b0a000) KERN: eax 0x8a ebx 0x10212cc ecx 0x634d2a04 edx 0x61738114 KERN: esi 0x10000 edi 0x0 ebp 0x634d2a40 esp 0x81b09fdc KERN: eip 0x61738114 eflags 0x3202 user esp 0x634d2a04 KERN: vector: 0x63, error code: 0x0 KERN: 42 81b09fa8 (+ 0) 61738114 <commpage> commpage_syscall + 0x04 KERN: 43 634d2a40 (+ 384) 0030cd6b <_APP_> main (nearest) + 0x1737 KERN: 44 634d2bc0 (+ 480) 0030eb7b <_APP_> src_info_init (nearest) + 0x145f KERN: 45 634d2da0 (+ 80) 0030f431 <_APP_> copy + 0x89 KERN: 46 634d2df0 (+ 224) 0030b2c9 <_APP_> usage (nearest) + 0xbd5 KERN: 47 634d2ed0 (+ 240) 0030c037 <_APP_> main + 0xa03 KERN: 48 634d2fc0 (+ 48) 0030a5cf <_APP_> _start + 0x5b KERN: 49 634d2ff0 (+ 48) 00c745ae </boot/system/runtime_loader@0x00c65000> <unknown> + 0xf5ae KERN: 50 634d3020 (+ 0) 61738250 <commpage> commpage_thread_exit + 0x00 KERN: kdebug> kdebug> continue KERN: Last message repeated 2 times.
comment:3 by , 11 years ago
Component: | - General → File Systems/UDF |
---|
That could be a problem in either the object cache or in the UDF add-on itself. Assigning to the latter for now though, as similarly-sized copies work fine for other media/filesystems.
comment:4 by , 11 years ago
The infinite loop can be at any level in the stack trace. So entering KDL multiple times to do a stack trace would indeed help to possibly rule out the functions in the upper stack. Fortunately the kernel debugger has an even more helpful tool for this case: breakpoint support. The "caller" column shows the return addresses, so setting a breakpoint at such an address can conclusively determine whether the callee returns to the caller.
The only caveat of the method is that breakpoints cannot be restricted to certain teams or threads, so setting a breakpoint in a commonly used function (like in the memory allocation code) will likely not be very helpful, since it may repeatedly be triggered by other threads. The four UDF functions we see in the stack trace should be pretty good locations, though. The file map related kernel code might be OK, too. It is also used by BFS, but it shouldn't be hit very often (possibly only for the syslog writes), as long as you avoid doing stuff with files (e.g. in Tracker).
The strategy as such is rather simple:
- Enter KDL and get a stack trace of the misbehaving thread.
- Pick one of the caller addresses, set a breakpoint via
breakpoint <address>
, andcontinue
. - If the breakpoint is hit (KDL will be entered with a respective message):
- If it is in the wrong thread, just continue. If the breakpoint keeps being hit by the wrong threads, clear the breakpoint (
breakpoint <address> clear
-- use the command history to avoid address typos) and try 2. again with a different address. - If it is in the right thread, clear the breakpoint and try 2. again with an address lower in the stack trace.
- If it is in the wrong thread, just continue. If the breakpoint keeps being hit by the wrong threads, clear the breakpoint (
The goal of the exercise is to find the topmost caller that is not being returned to or at least narrow it down (i.e. find a lower boundary).
comment:5 by , 11 years ago
I'm returning with what I think is a solid pin-point on the "stuck" function, though you guys need to interpret it ("caller" means the breakpoint pertains to the line below which it is written, off-by-one, or to the same line?).
It didn't take long before I narrowed things down to this block:
KERN: 24 81b09574 (+ 192) 8004eccb <kernel_x86> FileMap<[32m0xdf02bea0[0m>::_Cache(int64: [34m1073676288[0m, int64: [34m65536[0m) + 0xf7 KERN: 25 81b09634 (+ 80) 8004eed8 <kernel_x86> FileMap<[32m0xdf02bea0[0m>::Translate(int64: [34m1073676288[0m, uint32: [34m0x10000[0m ([34m65536[0m), file_io_vec*: [34m0x81b09764[0m, [34m0x81b09760[0m, uint32: [34m0x800[0m ([34m2048[0m)) + 0x100 KERN: 26 81b09684 (+ 48) 8004f49a <kernel_x86> file_map_translate + 0x2e
So I did..
- breakpoint 0x8004eccb
Which was hit all the time (Desktop did not even have time to redraw after a "continue", I was dropped right back to KDL).
Then I cleared it and did...
- breakpoint 0x8004eed8
Which was hit every 4 seconds or so, by a different thread (page_writer), but never ever by the "cp" thread.
comment:6 by , 11 years ago
Thanks, that pinpoints it indeed. The function name matches the caller address on the same line, so FileMap::_Cache()
doesn't return. It only has a single loop that repeatedly calls vfs_get_file_map()
(which just calls the file system hook) and adds the retrieved vectors until it hits the end of the file or the maximum cache size. So either the retrieved vectors aren't handled correctly or UDF's hook doesn't work correctly -- returning B_OK
together with no vectors or an empty one. I suppose an assert could be added to catch the latter case at least.
comment:7 by , 11 years ago
Cc: | added |
---|
From looking at http://cgit.haiku-os.org/haiku/tree/src/add-ons/kernel/file_systems/udf/Icb.cpp#n498 , the empty case looks to be a distinct possibility. Perhaps korli can shed some light, as he most recently touched the UDF code.
comment:8 by , 11 years ago
An empty extent at the end of the file could definitely trigger such a behavior. FileMap::_Cache() should btw checks whether the offset has been advanced in the loop. It would interesting to see the log generated by activating it here.
by , 11 years ago
Attachment: | syslog_last UDF trace.txt added |
---|
(if needed, I can try a more "clean" capture of the 4 successive syslogs)
comment:9 by , 11 years ago
It looks like a descriptor chain is occurring in the very same disk block, which we don't seem to handle correctly. ttcoder, is it possible to dump this dvd and provide it online for testing purposes?
comment:10 by , 11 years ago
At 30 KB/s upload rate it's gonna take forever :-b... Better mail/Colissimo it to you some day.. I suppose it wouldn't do much good to send just a part of it, extracted with a clever combination of /bin/head and /bin/tail.. ?
No big deal on fixing this bug as far as I'm concerned; though that particular DVD was burnt in Windows 7 with a recent version of Nero so I suppose it could occur again for other people in the future with modern (strike that... with "recent") gear, so I'll try to help.
comment:11 by , 11 years ago
Slow upload rate :) Anyway something like "dd if=/dev/../../dvd.raw of=block.raw bs=2K skip=524553" would normally do if the block size is 2K...
by , 11 years ago
Attachment: | block_for_9896.raw added |
---|
comment:12 by , 11 years ago
You bet it is slow, but I've been on dial-up for so long I'm still grateful it's that "fast"!
Uploaded the first MB from within windows (webpositive consistently stalls/stops after a few seconds, dang).
comment:14 by , 10 years ago
Blocking: | 9181 added |
---|
comment:17 by , 6 years ago
I don't think I own that DVD any more -- suggest filing as not reproducible
comment:18 by , 6 years ago
Resolution: | → not reproducible |
---|---|
Status: | new → closed |
comment:19 by , 5 years ago
Milestone: | Unscheduled |
---|
Remove milestone for tickets with status = closed and resolution != fixed
No error messages shown after 'cp' is launched, a few at mounting time though.