Opened 2 years ago
Last modified 2 years ago
#17802 assigned bug
[Network stack] deadlock
Reported by: | diver | Owned by: | waddlesplash
---|---|---|---
Priority: | normal | Milestone: | Unscheduled
Component: | Network & Internet/Stack | Version: | R1/Development
Keywords: | | Cc: |
Blocked By: | | Blocking: |
Platform: | All | |
Description
hrev56182 x86_64 running in VMware Fusion.
At some point all network-related processes locked up.
Attachments (1)
Change History (4)
comment:1 by , 2 years ago
Deadlock is between:
mutex 0xffffffff8227aa80:
  name: device interface receive
  flags: 0x0
  holder: 1759
  waiting threads: 2060
mutex 0xffffffff93cda718:
  name: /dev/net/pcnet/0
  flags: 0x0
  holder: 2060
  waiting threads: 1759 2103 2123 1762
backtraces of those two threads:
KERN: stack trace for thread 1759 "/dev/net/pcnet/0 consumer"
KERN: kernel stack: 0xffffffff81781000 to 0xffffffff81786000
KERN: frame caller <image>:function + offset
KERN: 0 ffffffff81785c60 (+ 128) ffffffff80099e43 <kernel_x86_64> reschedule(int) + 0x433
KERN: 1 ffffffff81785ce0 (+ 48) ffffffff800899a6 <kernel_x86_64> thread_block + 0xc6
KERN: 2 ffffffff81785d10 (+ 80) ffffffff80095fea <kernel_x86_64> _mutex_lock + 0x21a
KERN: 3 ffffffff81785d60 (+ 32) ffffffff8009608e <kernel_x86_64> recursive_lock_lock + 0x3e
KERN: 4 ffffffff81785d80 (+ 80) ffffffff81117d16 </boot/system/add-ons/kernel/network/stack> Interface::AddressForDestination[clone .localalias] (net_domain*, sockaddr const*) + 0x26
KERN: 5 ffffffff81785dd0 (+ 80) ffffffff8111b112 </boot/system/add-ons/kernel/network/stack> get_interface_address_for_destination(net_domain*, sockaddr const*) + 0x52
KERN: 6 ffffffff81785e20 (+ 80) ffffffff81113f1a </boot/system/add-ons/kernel/network/stack> datalink_is_local_address(net_domain*, sockaddr const*, net_interface_address**, unsigned int*) + 0xba
KERN: 7 ffffffff81785e70 (+ 208) ffffffff81859070 </boot/system/add-ons/kernel/network/protocols/ipv4> ipv4_receive_data[clone .localalias] (net_buffer*) + 0x130
KERN: 8 ffffffff81785f40 (+ 112) ffffffff81114c72 </boot/system/add-ons/kernel/network/stack> device_consumer_thread(void*) + 0x152
KERN: 9 ffffffff81785fb0 (+ 32) ffffffff8008c088 <kernel_x86_64> common_thread_entry(void*) + 0x38
KERN: 10 ffffffff81785fd0 (+2122817584) ffffffff81785fe0 17526:/dev/net/pcnet/0 consumer_1759_@0xffffffff81781000 + 0x4fe0
stack trace for thread 2060 "BUrlProtocol.HTTP"
KERN: kernel stack: 0xffffffff81d5d000 to 0xffffffff81d62000
KERN: user stack: 0x00007f0001117000 to 0x00007f0001157000
KERN: frame caller <image>:function + offset
KERN: 0 ffffffff81d619b0 (+ 128) ffffffff80099e43 <kernel_x86_64> reschedule(int) + 0x433
KERN: 1 ffffffff81d61a30 (+ 48) ffffffff800899a6 <kernel_x86_64> thread_block + 0xc6
KERN: 2 ffffffff81d61a60 (+ 80) ffffffff80095fea <kernel_x86_64> _mutex_lock + 0x21a
KERN: 3 ffffffff81d61ab0 (+ 32) ffffffff8009608e <kernel_x86_64> recursive_lock_lock + 0x3e
KERN: 4 ffffffff81d61ad0 (+ 96) ffffffff811158a6 </boot/system/add-ons/kernel/network/stack> register_device_handler[clone .localalias] (net_device*, int, int (*)(void*, net_device*, net_buffer*), void*) + 0x56
KERN: 5 ffffffff81d61b30 (+ 48) ffffffff811c705b </boot/system/add-ons/kernel/network/datalink_protocols/ipv6_datagram> ipv6_datalink_init(net_interface*, net_domain*, net_datalink_protocol**) + 0x2b
KERN: 6 ffffffff81d61b60 (+ 112) ffffffff81124931 </boot/system/add-ons/kernel/network/stack> get_domain_datalink_protocols(Interface*, net_domain*) + 0x131
KERN: 7 ffffffff81d61bd0 (+ 96) ffffffff81118abc </boot/system/add-ons/kernel/network/stack> Interface::CreateDomainDatalinkIfNeeded[clone .localalias] (net_domain*) + 0x20c
KERN: 8 ffffffff81d61c30 (+ 48) ffffffff8111ada3 </boot/system/add-ons/kernel/network/stack> get_interface(net_domain*, char const*) + 0xb3
KERN: 9 ffffffff81d61c60 (+ 528) ffffffff81113950 </boot/system/add-ons/kernel/network/stack> datalink_control(net_domain*, int, void*, unsigned long*) + 0xa0
KERN: 10 ffffffff81d61e70 (+ 80) ffffffff8111d738 </boot/system/add-ons/kernel/network/stack> socket_control(net_socket*, unsigned int, void*, unsigned long) + 0xd8
KERN: 11 ffffffff81d61ec0 (+ 64) ffffffff800e7f5a <kernel_x86_64> fd_ioctl(bool, int, unsigned int, void*, unsigned long) + 0x5a
KERN: 12 ffffffff81d61f00 (+ 32) ffffffff800e8c1a <kernel_x86_64> _user_ioctl + 0x3a
KERN: 13 ffffffff81d61f20 (+ 16) ffffffff80145eff <kernel_x86_64> x86_64_syscall_entry + 0xfb
KERN: user iframe at 0xffffffff81d61f30 (end = 0xffffffff81d61ff8)
KERN: rax 0x93 rbx 0x10010074ae00 rcx 0x18dd39c
KERN: rdx 0x7f0001156650 rsi 0x22cb rdi 0xf
KERN: rbp 0x7f0001156620 r8 0x100100748f08 r9 0x2
KERN: r10 0x58 r11 0x202 r12 0x100100748ef8
KERN: r13 0x7f0001156650 r14 0xf r15 0x7f0001156708
KERN: rip 0x18dd39c rsp 0x7f0001156608 rflags 0x202
KERN: vector: 0x63, error code: 0x0
KERN: 14 ffffffff81d61f30 (+139640111580912) 00000000018dd39c <libroot.so> _kern_ioctl + 0x0c
KERN: 15 00007f0001156620 (+ 192) 0000000002fde7f2 <libnetwork.so> gethostbyaddr_r (nearest) + 0x2d2
KERN: 16 00007f00011566e0 (+ 112) 0000000002fdea4d <libnetwork.so> getifaddrs + 0x11d
KERN: 17 00007f0001156750 (+ 448) 0000000002fe5a2e <libnetwork.so> getaddrinfo + 0xc4e
KERN: 18 00007f0001156910 (+ 160) 000000000127cd8c <libbnetapi.so> BNetworkAddressResolver::SetTo(int, char const*, char const*, unsigned int) + 0x11c
KERN: 19 00007f00011569b0 (+ 64) 000000000127cf5a <libbnetapi.so> BNetworkAddressResolver::BNetworkAddressResolver(int, char const*, char const*, unsigned int) + 0x5a
KERN: 20 00007f00011569f0 (+ 128) 000000000127d416 <libbnetapi.so> BNetworkAddressResolver::Resolve(int, char const*, char const*, unsigned int) + 0x1b6
KERN: 21 00007f0001156a70 (+ 96) 000000000127d650 <libbnetapi.so> BNetworkAddressResolver::Resolve(int, char const*, unsigned short, unsigned int) + 0x60
KERN: 22 00007f0001156ad0 (+ 32) 000000000127d69d <libbnetapi.so> BNetworkAddressResolver::Resolve(char const*, unsigned short, unsigned int) + 0x1d
KERN: 23 00007f0001156af0 (+ 64) 000000000127add1 <libbnetapi.so> BNetworkAddress::SetTo(char const*, unsigned short, unsigned int) + 0x21
KERN: 24 00007f0001156b30 (+ 64) 000000000127aeda <libbnetapi.so> BNetworkAddress::BNetworkAddress(char const*, unsigned short, unsigned int) + 0x4a
KERN: 25 00007f0001156b70 (+ 224) 0000000000e00377 <libpackage.so> _ZN8BPrivate14NaturalCompareEPKcS1_ (nearest) + 0x91b7
KERN: 26 00007f0001156c50 (+ 256) 0000000000dfa20c <libpackage.so> _ZN8BPrivate14NaturalCompareEPKcS1_ (nearest) + 0x304c
KERN: 27 00007f0001156d50 (+ 32) 0000000000e01760 <libpackage.so> _ZN8BPrivate14NaturalCompareEPKcS1_ (nearest) + 0xa5a0
KERN: 28 00007f0001156d70 (+ 32) 00000000018dc629 <libroot.so> _thread_do_exit_work (nearest) + 0x89
KERN: 29 00007f0001156d90 (+ 0) 00007f0001081258 <commpage> commpage_thread_exit + 0x00
comment:2 by , 2 years ago
This is tough.
CreateDomainDatalinkIfNeeded needs to hold the interface lock while calling get_domain_datalink_protocols. (That function updates domain_datalink members without acquiring any other locks and relies on its caller to have already locked them, so unlocking the interface here is not going to be easy.) Furthermore, CreateDomainDatalinkIfNeeded has a mechanism to check whether the datalink was already created and, if it was, just returns B_OK, presuming that if the datalink is there then it initialized successfully. So if we unlock the interface before it is fully initialized, we would need to add a new mechanism for competing Create requests to wait if they had to.
device_consumer_thread needs to hold the receive lock while it iterates through the receive list and calls handlers, to protect against anything modifying the list (which the other thread wants to do.)
I guess it will be easier to modify device_consumer_thread to not hold the receive lock while actually calling handlers. That may allow for reverting hrev55940, but I think that change stands on its own, though after adjusting this, the comment may be changed.
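To make the idea concrete, here is a minimal sketch of the "do not hold the receive lock while calling handlers" pattern in portable C++. It is not the network stack code: ReceiveQueue, Buffer, Handler, Register, Deliver and CountHandlers are hypothetical stand-ins, and std::mutex takes the place of the kernel's recursive_lock. The handler list is copied under the lock and the handlers are called after the lock is released, so a handler that ends up registering another handler no longer blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the Haiku network stack types.
struct Buffer {
	int payload;
};

using Handler = std::function<void(const Buffer&)>;

class ReceiveQueue {
public:
	// Adds a handler; takes the receive lock only for the list update.
	void Register(Handler handler)
	{
		assert(handler != nullptr);
		std::lock_guard<std::mutex> locker(fReceiveLock);
		fHandlers.push_back(std::move(handler));
	}

	// Copies the handler list under the lock, then calls the handlers
	// with the lock released.
	void Deliver(const Buffer& buffer)
	{
		std::vector<Handler> snapshot;
		{
			std::lock_guard<std::mutex> locker(fReceiveLock);
			snapshot = fHandlers;
		}

		for (const Handler& handler : snapshot)
			handler(buffer);
	}

	std::size_t CountHandlers()
	{
		std::lock_guard<std::mutex> locker(fReceiveLock);
		return fHandlers.size();
	}

private:
	std::mutex fReceiveLock;
	std::vector<Handler> fHandlers;
};
```

One caveat with this approach: a handler removed after the snapshot was taken can still be called once, so a real change would probably need reference counting or some other way to keep handler state alive until delivery finishes.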
comment:3 by , 2 years ago
Maybe there is another solution: lock the addresses table of Interface separately. I'm not really sure that's a good solution, though.
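A rough sketch of what a separately locked addresses table could look like, again in portable C++ with hypothetical names (this Interface class, fAddressLock, AddAddress and SetupLock are made up for illustration, and strings stand in for sockaddr). The point is only that the address lookup takes a narrower lock, so it does not have to wait for whoever holds the main interface lock during datalink setup.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical model; the real Interface class is structured differently.
class Interface {
public:
	void AddAddress(const std::string& destination, const std::string& local)
	{
		assert(!destination.empty());
		std::lock_guard<std::mutex> locker(fAddressLock);
		fAddresses[destination] = local;
	}

	// Looks up the local address using only the address lock, never the
	// main interface lock. Returns false if there is no entry.
	bool AddressForDestination(const std::string& destination,
		std::string& local)
	{
		std::lock_guard<std::mutex> locker(fAddressLock);
		std::map<std::string, std::string>::const_iterator it
			= fAddresses.find(destination);
		if (it == fAddresses.end())
			return false;
		local = it->second;
		return true;
	}

	// Main interface lock, held during (slow) datalink setup.
	std::mutex& SetupLock() { return fLock; }

private:
	std::mutex fLock;         // guards datalink setup state
	std::mutex fAddressLock;  // guards only fAddresses
	std::map<std::string, std::string> fAddresses;
};
```

This only helps if nothing takes the two locks in opposite orders elsewhere, which is part of why it may not be a good solution.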
KDL session