Opened 8 years ago

Closed 2 years ago

#7252 closed bug (fixed)

KDL when removing an active USB network interface

Reported by: phoudoin
Owned by: axeld
Priority: critical
Milestone: R1
Component: Network & Internet/Stack
Version: R1/Development
Keywords:
Cc: shade, mmlr
Blocked By:
Blocking: #9447
Has a Patch: yes
Platform: All

Description (last modified by phoudoin)

Plug in a USB adapter supported by usb_pegasus, without an Ethernet cable connected.

The net_server detects it and, because the device (wrongly!) reports a link, starts auto-configuring it via DHCP. Remove the device before the DHCP attempts end.

I get this KDL:

PANIC: _mutex_lock(): double lock of 0x8102b9fc by thread 264
Welcome to Kernel Debugging Land...
Thread 264 "/dev/net/pegasus/0 reader" running on CPU 1
stack trace for thread 264 "/dev/net/pegasus/0 reader"
    kernel stack: 0x816b5000 to 0x816b9000
frame               caller     <image>:function + offset
 0 816b8bcc (+  32) 800ed4b3   <kernel_x86>:arch_debug_stack_trace + 0x000f
 1 816b8bec (+  16) 80072d1e   <kernel_x86> stack_trace_trampoline(void*: NULL) + 0x000b
 2 816b8bfc (+  12) 800f2c96   <kernel_x86>:arch_debug_call_with_fault_handler + 0x001b
 3 816b8c08 (+  48) 80073215   <kernel_x86>:debug_call_with_fault_handler + 0x0052
 4 816b8c38 (+  80) 800741c6   <kernel_x86> kernel_debugger_loop(char const*: 0x1 "<???>", char const*: 0x8015b764 "^^k^", char*: 0x816b8cc8, int32: -2147007438) + 0x0226
 5 816b8c88 (+  64) 80074451   <kernel_x86> kernel_debugger_internal(char const*: 0x1 "<???>", char const*: 0x860446b0 "", char*: 0x816b8ce8, int32: -2147006943) + 0x0112
 6 816b8cc8 (+  32) 80074634   <kernel_x86>:panic + 0x0023
 7 816b8ce8 (+  80) 8006cbd5   <kernel_x86>:_mutex_lock + 0x00a7
 8 816b8d38 (+ 512) 8101705f   </boot/system/add-ons/kernel/network/stack> get_device_interface(const char*: 0x85b6d9ac "/dev/net/pegasus/0", false) + 0x002f
 9 816b8f38 (+  64) 81017c86   </boot/system/add-ons/kernel/network/stack> device_removed(net_device*: 0x85b6d9a8) + 0x0036
10 816b8f78 (+  96) 81017e0e   </boot/system/add-ons/kernel/network/stack> device_reader_thread(void*: 0x8638b968) + 0x010e
11 816b8fd8 (+  32) 80065d0b   <kernel_x86> _create_kernel_thread_kentry() + 0x0015
12 816b8ff8 (+2123657224) 800069578   <kernel_x86> thread_kthread_exit() + 0x0000

This is with nightly gcc4hybrid hrev40527 running natively on a dual-core system.

When the device is removed after the DHCP attempts have failed, there is no KDL, but running ifconfig or clicking the NetworkStatus replicant in Deskbar makes them freeze.

One can wonder why the pegasus device reports a link when there is none yet...

Anyway, device_removed() should not lead to a KDL.
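
For illustration, the pattern the trace points to can be reproduced in user space. This is only a sketch under assumptions: it presumes that device_removed() and get_device_interface() both take the same non-recursive lock (the MutexLocker on sLock in device_removed() is visible in the patch in comment:8; that get_device_interface() locks sLock too is inferred from the "double lock" panic). CheckedMutex is a hypothetical stand-in, not a Haiku API; it throws where the kernel would panic, and the function bodies are reduced to their locking.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a non-recursive mutex that reports a second
// lock attempt by the owning thread instead of deadlocking silently.
class CheckedMutex {
public:
    void lock()
    {
        if (fOwner.load() == std::this_thread::get_id())
            throw std::logic_error("double lock by owning thread");
        fMutex.lock();
        fOwner.store(std::this_thread::get_id());
    }

    void unlock()
    {
        assert(fOwner.load() == std::this_thread::get_id());
        fOwner.store(std::thread::id());
        fMutex.unlock();
    }

private:
    std::mutex fMutex;
    std::atomic<std::thread::id> fOwner{std::thread::id()};
};

static CheckedMutex sLock;

// Assumed shape of the callee: it takes sLock on its own.
static void get_device_interface()
{
    std::lock_guard<CheckedMutex> locker(sLock);
}

// Assumed shape of the caller: it already holds sLock when it calls
// get_device_interface(). Returns false if a double lock was detected.
static bool device_removed()
{
    std::lock_guard<CheckedMutex> locker(sLock);
    try {
        get_device_interface();
    } catch (const std::logic_error&) {
        return false;
    }
    return true;
}
```

Under these assumptions device_removed() always reports the double lock, regardless of timing, because the second acquisition happens on the same thread that holds the first.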

Attachments (2)

dsc00100zo.jpg (1.4 MB) - added by mr.Noisy 8 years ago.
Fix-7252.patch (920 bytes) - added by diver 6 years ago.


Change History (20)

comment:1 Changed 8 years ago by phoudoin

Description: modified (diff)

comment:2 Changed 8 years ago by phoudoin

Description: modified (diff)

comment:3 Changed 8 years ago by phoudoin

Description: modified (diff)

comment:4 Changed 8 years ago by mr.Noisy

I get a similar problem (see the attached photo) when I plug in my Sony Ericsson W595 cell phone via USB, mount it as a flash drive, unmount it, unplug it, and plug it in again. The phone can work as a USB network adapter, and it seems that this mode is activated after the second plug-in.

Should I create another ticket, or is my issue related to this one?

This is with nightly gcc4hybrid hrev41245.

Changed 8 years ago by mr.Noisy

Attachment: dsc00100zo.jpg added

comment:5 in reply to:  4 Changed 8 years ago by phoudoin

Replying to mr.Noisy:

I get a similar problem (see the attached photo) when I plug in my Sony Ericsson W595 cell phone via USB, mount it as a flash drive, unmount it, unplug it, and plug it in again. The phone can work as a USB network adapter, and it seems that this mode is activated after the second plug-in.

More likely it publishes multiple device classes, for instance both USB mass storage *and* USB Ethernet (ECM). But while a dialog pops up when a USB storage device is detected, you get little indication when a USB network adapter is, particularly if you already have a working network interface.

Should I create another ticket, or is my issue related to this one?

Looks like it's the very same issue, indeed, just with usb_ecm instead of the pegasus driver.

Last edited 8 years ago by phoudoin

comment:6 Changed 8 years ago by diver

Version: R1/alpha2 → R1/Development

Still here in hrev42709.

comment:7 Changed 6 years ago by diver

Blocking: 9447 added

comment:8 Changed 6 years ago by diver

Removing this mutex seems to fix the KDL.

index a3030f0..3597fa7 100644
--- a/src/add-ons/kernel/network/stack/device_interfaces.cpp
+++ b/src/add-ons/kernel/network/stack/device_interfaces.cpp
@@ -761,7 +761,7 @@ device_link_changed(net_device* device)
 status_t
 device_removed(net_device* device)
 {
-       MutexLocker locker(sLock);

        // hold a reference to the device interface being removed
        // so our put_() will (eventually) do the final cleanup

Found by Evgeny Abdraimov aka Shade.

Unfortunately, even with this patch applied, the network stack deadlocks, and net_server/ifconfig/Network/NetworkStatus hang as well.

comment:9 Changed 6 years ago by diver

Cc: shade mmlr added

Changed 6 years ago by diver

Attachment: Fix-7252.patch added

comment:10 Changed 6 years ago by diver

Has a Patch: set

comment:11 Changed 6 years ago by diver

ifconfig deadlock after replug with this patch applied:

kdebug> teams
team           id  parent      name
0x828bf600      1  0x00000000  kernel_team
0x828b8a00     93  0x828bf600  registrar
0x828b7e00    100  0x828bf600  debug_server
0x828b8400    101  0x828bf600  net_server
0x828b7800    102  0x828bf600  app_server
0x828b7200    119  0x828bf600  syslog_daemon
0x828b6c00    134  0x828b7800  input_server
0x828b6000    141  0x828bf600  mount_server
0x828b5a00    154  0x828bf600  Tracker
0x828ba200    155  0x828bf600  Deskbar
0x828b9c00    156  0x828bf600  media_server
0x828ba800    161  0x828bf600  notification_server
0x828be400    162  0x828bf600  power_daemon
0x828b4800    193  0x828bf600  LaunchBox
0x828b3600    205  0x828b9c00  media_addon_server
0x828b3c00    282  0x828b4800  Terminal
0x828b9600    286  0x828b3c00  bash
0x828bae00    377  0x828b4800  Terminal
0x828b1200    381  0x828bae00  bash
0x828b0c00    397  0x828b1200  ifconfig
0x828b1800    403  0x828b9600  tail

kdebug> sems 397
sem            id count   team   last  name
0x85f31a10   3211     0    397      0  some BBlockCache lock
0x85f31a40   3212     0    397      0  Catalog
0x85f31a70   3213     0    397      0  some BLocker
0x85f31aa0   3214     0    397      0  some BLocker
0x85f31ad0   3215     0    397      0  token space
0x85f31b00   3216     0    397      0  BLooperList lock
0x85f31b30   3217     0    397      0  AppServerLink_sLock
0x85f31b60   3218     0    397      0  LocaleRosterData
0x85f31b90   3219     0    397      0  some BLocker

kdebug> bt 397
stack trace for thread 397 "ifconfig"
    kernel stack: 0x85b1f000 to 0x85b23000
      user stack: 0x7efee000 to 0x7ffee000
frame               caller     <image>:function + offset
 0 85b22ac4 (+ 112) 8008b7c6   <kernel_x86> reschedule() + 0x552
 1 85b22b34 (+  80) 8008820c   <kernel_x86> _mutex_lock + 0x1d0
 2 85b22b84 (+ 336) 8110f6fd   </boot/system/add-ons/kernel/network/stack> get_device_interface(0x85b22dd0 "loop", true) + 0x4d
 3 85b22cd4 (+  48) 8111caec   </boot/system/add-ons/kernel/network/stack> user_request_get_device_interface(0x7ffedc74, ifreq&: 0x85b22dd0, net_device_interface&: 0x85b22d38) + 0x3c
 4 85b22d04 (+ 288) 8111ce1b   </boot/system/add-ons/kernel/network/stack> link_control(net_protocol*: 0x80bf2a40, int32: 251658240, int32: 8903, 0x7ffedc74, 0x85b22e88) + 0x22b
 5 85b22e24 (+  80) 811190fd   </boot/system/add-ons/kernel/network/stack> socket_control(net_socket*: 0x82b6a000, int32: 8903, 0x7ffedc74, uint32: 0x54 (84)) + 0x141
 6 85b22e74 (+  48) 81120e7c   </boot/system/add-ons/kernel/network/stack> stack_interface_ioctl(net_socket*: 0x82b6a000, uint32: 0x22c7 (8903), 0x7ffedc74, uint32: 0x54 (84)) + 0x30
 7 85b22ea4 (+  48) 800d2c7a   <kernel_x86> socket_ioctl(file_descriptor*: 0x80bc7ed0, uint32: 0x22c7 (8903), 0x7ffedc74, uint32: 0x54 (84)) + 0x26
 8 85b22ed4 (+  48) 800ca4f3   <kernel_x86> fd_ioctl(false, int32: 3, uint32: 0x22c7 (8903), 0x7ffedc74, uint32: 0x54 (84)) + 0x5b
 9 85b22f04 (+  64) 800cb3d8   <kernel_x86> _user_ioctl + 0x58
10 85b22f44 (+ 100) 80122370   <kernel_x86> handle_syscall + 0xcd
user iframe at 0x85b22fa8 (end = 0x85b23000)
 eax 0x8e          ebx 0x66d128       ecx 0x7ffedc00  edx 0xffff0114
 esi 0x3           edi 0x3            ebp 0x7ffedc2c  esp 0x85b22fdc
 eip 0xffff0114 eflags 0x203216  user esp 0x7ffedc00
 vector: 0x63, error code: 0x0
11 85b22fa8 (+   0) ffff0114   <commpage> commpage_syscall + 0x04
12 7ffedc2c (+ 160) 0057db63   <libbnetapi.so> BNetworkInterface<0x7ffede90>::GetHardwareAddress(BNetworkAddress&: 0x7ffede08) + 0x67
13 7ffedccc (+ 512) 002037ba   <_APP_> list_interface(0x7ffedf10 "loop") + 0x92
14 7ffedecc (+ 128) 00203fae   <_APP_> list_interfaces(0x0 "<NULL>") + 0x62
15 7ffedf4c (+  48) 002052e9   <_APP_> main + 0x1d5
16 7ffedf7c (+  48) 00202507   <_APP_> _start + 0x5b
17 7ffedfac (+  48) 0010626e   </boot/system/runtime_loader@0x00100000> <unknown> + 0x626e
18 7ffedfdc (+   0) 7ffedfec   9858:ifconfig_397_stack@0x7efea000 + 0x1003fec
kdebug>
Last edited 6 years ago by diver

comment:12 Changed 5 years ago by jackburton

Changing the mutex to a recursive locker fixes the problem, but I'm not sure of the implications.
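
To make the idea concrete, here is a user-space sketch of the same call pattern with a recursive lock. It uses std::recursive_mutex as a stand-in for a kernel recursive lock; the function bodies and the sDepth counter are hypothetical and only model the nesting, not the real device_interfaces.cpp code.

```cpp
#include <cassert>
#include <mutex>

static std::recursive_mutex sLock;
static int sDepth = 0;  // how many times the current holder has locked sLock

// Takes sLock on its own; returns the nesting depth it observed.
static int get_device_interface()
{
    std::lock_guard<std::recursive_mutex> locker(sLock);
    ++sDepth;
    int depth = sDepth;
    --sDepth;
    return depth;
}

// Holds sLock and then calls a function that locks it again. With a
// recursive lock the second acquisition by the same thread succeeds.
static int device_removed()
{
    std::lock_guard<std::recursive_mutex> locker(sLock);
    ++sDepth;
    int depth = get_device_interface();
    --sDepth;
    assert(sDepth == 0);
    return depth;
}
```

One general implication of a recursive lock is that a nested call can run while the outer holder is in the middle of an update, so every function that may be entered this way has to tolerate the intermediate state. Whether that matters for the network stack is not established in this ticket.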

comment:13 in reply to:  12 Changed 4 years ago by jackburton

Replying to jackburton:

Changing the mutex to a recursive locker fixes the problem, but I'm not sure of the implications.

Is it okay if I apply this change?

comment:14 Changed 4 years ago by diver

ping

comment:15 Changed 4 years ago by jackburton

Still a problem?

comment:16 in reply to:  15 Changed 3 years ago by phoudoin

Replying to jackburton:

Still a problem?

I haven't checked yet, but switching to a recursive locker makes a lot of sense to me.

comment:17 Changed 2 years ago by axeld

Status: new → in-progress

comment:18 Changed 2 years ago by axeld

Resolution: fixed
Status: in-progress → closed

Fixed in hrev51071. If there are any follow-up issues, please create a new ticket; I don't have the hardware, so I couldn't easily reproduce the issue.
