Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#3856 closed bug (fixed)

page fault in LegacyDevice::UninitDevice

Reported by: marcusoverhagen Owned by: axeld
Priority: normal Milestone: R1
Component: System/Kernel Version: R1/pre-alpha1
Keywords: Cc: fredrik.holmqvist@…
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

When switching with a USB KVM switch between machines, haiku will sometimes crash when a file descriptor belonging to the usb mouse is close()ed. See attachment.

Attachments (1)

020520097499small.jpg (451.6 KB) - added by marcusoverhagen 10 years ago.
KDL screenshot

Download all attachments as: .zip

Change History (14)

Changed 10 years ago by marcusoverhagen

Attachment: 020520097499small.jpg added

KDL screenshot

comment:1 Changed 10 years ago by axeld

Unfortunately I don't have a USB KVM, and I also haven't seen this bug yet. Feel free to investigate :-)

comment:2 Changed 10 years ago by bonefish

I've seen that too in hrev30603 while unplugging my Kinesis keyboard to which a mouse was attached.

As for a possible cause: The there are several places in legacy_drivers.cpp where the device is deleted right after a devfs_unpublish_device(, true). It is not guaranteed that after vfs_disconnect_vnode() returns there won't be anymore calls using that node, though, since it only marks the FDs disconnected. Any thread that is in put_fd() past the check could cause this problem. Unless I'm mistaken even more generally, anyone who has already acquired a reference to the vnode (or even a file_descriptor) at this point can still call into the devfs with the "stale" node.

comment:3 Changed 10 years ago by axeld

Yes, indeed, the legacy device must not be deleted before the vnode has been released. This is also what makes the whole disconnect mechanism work.

comment:4 Changed 10 years ago by axeld

Status: newassigned

comment:5 Changed 10 years ago by axeld

Should be fixed in hrev30726. I'm keeping this bug open until I got a confirmation (at least for a few weeks, that is), as I can't reproduce this.

comment:6 Changed 10 years ago by bga

Hy Axel.

As I sent on an email to haiku-development, it is now crashing for me (sorry, no KDL image as I have no camera with me right now). The culprit seems to be the ps2 bus manager and the crash is in LegacyDevice::Remove(). It is obviously related to not having any ps2 devices connected to the machine as when I plug in a PS2 keyboard it works.

comment:7 Changed 10 years ago by tqh

Cc: fredrik.holmqvist@… added

That revision broke it for me as well with a kdl for ps2 service. I used auto_stack_trace but it didn't list anything except the header for stack traces. Either the computer hang, auto_stack_trace is broken or there is none. Reverting the files from hrev30726 at least proves that it is to blame. No ps2 here as well, just usb mouse and keyboard connected to a hub.

comment:8 Changed 10 years ago by anevilyak

I can reproduce this also if I boot with my ps2 keyboard unplugged, backtrace as follows:

PANIC: vm_page_fault: unhandled page fault in kernel space at 0x2c, ip 0x8007f9bc

stack trace for thread 136 "ps2 service"
    kernel stack: 0x9157c000 to 0x91580000
frame               caller     <image>:function + offset
 0 9157fa34 (+  48) 800659f1   <kernel_x86>:invoke_debugger_command + 0x00f5
 1 9157fa64 (+  64) 800657e1   <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: 0x80136b20, int32: 0, 0x0 "<NULL>") + 0x0079
 2 9157faa4 (+  64) 80065b68   <kernel_x86>:invoke_debugger_command_pipe + 0x009c
 3 9157fae4 (+  48) 80067118   <kernel_x86> ExpressionParser<0x9157fb98>::_ParseCommandPipe(0x9157fb94) + 0x0234
 4 9157fb14 (+  64) 80066552   <kernel_x86> ExpressionParser<0x9157fb98>::EvaluateCommand(0x80127200 "bt", 0x9157fb94) + 0x02ba
 5 9157fb54 (+ 224) 80068540   <kernel_x86>:evaluate_debug_command + 0x0088
 6 9157fc34 (+  64) 8006386a   <kernel_x86> kernel_debugger_loop() + 0x01ae
 7 9157fc74 (+  32) 80064771   <kernel_x86>:kernel_debugger + 0x004d
 8 9157fc94 (+ 192) 80064719   <kernel_x86>:panic + 0x0029
 9 9157fd54 (+  80) 800ca551   <kernel_x86>:vm_page_fault + 0x0139
10 9157fda4 (+  64) 800daf65   <kernel_x86>:page_fault_exception + 0x00d9
11 9157fde4 (+  12) 800de696   <kernel_x86>:int_bottom + 0x0036
kernel iframe at 0x9157fdf0 (end = 0x9157fe40)
 eax 0x1            ebx 0x2c            ecx 0x8012bd60   edx 0x0
 esi 0x8132c3b8     edi 0x8119f398      ebp 0x9157fe58   esp 0x9157fe24
 eip 0x8007f9bc  eflags 0x10286
 vector: 0xe, error code: 0x0
12 9157fdf0 (+ 104) 8007f9bc   <kernel_x86> LegacyDevice<0x8132c3b8>::Removed(0x812de780, 0x9157fe98) + 0x004c
13 9157fe58 (+  48) 80076f14   <kernel_x86> devfs_delete_vnode(devfs*: 0x8119f3c0, devfs_vnode*: 0x812de780, false) + 0x006c
14 9157fe88 (+  48) 8007807c   <kernel_x86> devfs_remove_vnode(fs_volume*: 0x8119f398, fs_vnode*: 0x81330d8c, true) + 0x005c
15 9157feb8 (+  48) 8009fea8   <kernel_x86> free_vnode(vnode*: 0x81330d8c, true) + 0x00ac
16 9157fee8 (+  48) 800a00a4   <kernel_x86> dec_vnode_ref_count(vnode*: 0x81330d8c, false, true) + 0x0140
17 9157ff18 (+  48) 800a4105   <kernel_x86>:put_vnode + 0x0069
18 9157ff48 (+  48) 80079986   <kernel_x86>:devfs_unpublish_device + 0x0086
19 9157ff78 (+  48) 91576c23   </boot/system/add-ons/kernel/bus_managers/ps2>:ps2_dev_unpublish + 0x002f
20 9157ffa8 (+  48) 915794c0   </boot/system/add-ons/kernel/bus_managers/ps2>:ps2_service_thread + 0x00d8
21 9157ffd8 (+  32) 800592ab   <kernel_x86> _create_kernel_thread_kentry() + 0x001b
22 9157fff8 (+1856503816) 80059248   <kernel_x86> thread_kthread_exit() + 0x0000

comment:9 Changed 10 years ago by anevilyak

After some further tracing, I see why it's crashing, but not what leads to this condition:

fRemovedFromParent is false, but fDriver is NULL. Thus fDriver->devices.Remove(this); causes the above mentioned panic, since this condition isn't checked. Should it be possible for this to occur at all, or is something more drastic wrong if fDriver is NULL at that point?

comment:10 Changed 10 years ago by anevilyak

Boots and seems to correctly handle device replugging/unplugging properly for me with the following diff:

Index: src/system/kernel/device_manager/legacy_drivers.cpp
===================================================================
--- src/system/kernel/device_manager/legacy_drivers.cpp (revision 30769)
+++ src/system/kernel/device_manager/legacy_drivers.cpp (working copy)
@@ -1184,7 +1184,7 @@
 {
        RecursiveLocker _(sLock);
 
-       if (!fRemovedFromParent)
+       if (!fRemovedFromParent && fDriver != NULL)
                fDriver->devices.Remove(this);
 
        delete this;

comment:11 Changed 10 years ago by axeld

Thanks Rene! I wanted to investigate this today, but didn't find the time. I'll have a look if fDriver is allowed to be NULL at that point; it sounds a bit odd indeed, at least.

comment:12 Changed 10 years ago by axeld

Resolution: fixed
Status: assignedclosed

Legacy drivers can be published without an actual driver, with the hooks only - so it's indeed perfectly okay that fDriver is NULL.

I've applied your patch, thanks!

comment:13 Changed 10 years ago by axeld

In hrev30770, btw.

Note: See TracTickets for help on using tickets.