Opened 2 years ago

Last modified 2 years ago

#17802 assigned bug

[Network stack] deadlock

Reported by: diver
Owned by: waddlesplash
Priority: normal
Milestone: Unscheduled
Component: Network & Internet/Stack
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

hrev56182 x86_64 running in VMware Fusion.

At some point all network related processes locked up.

Attachments (1)

syslog (411.0 KB) - added by diver 2 years ago.
KDL session


Change History (4)

by diver, 2 years ago

Attachment: syslog added

KDL session

comment:1 by waddlesplash, 2 years ago

Deadlock is between:

mutex 0xffffffff8227aa80:
  name:            device interface receive
  flags:           0x0
  holder:          1759
  waiting threads: 2060

mutex 0xffffffff93cda718:
  name:            /dev/net/pcnet/0
  flags:           0x0
  holder:          2060
  waiting threads: 1759 2103 2123 1762

backtraces of those two threads:

KERN: stack trace for thread 1759 "/dev/net/pcnet/0 consumer"
KERN:     kernel stack: 0xffffffff81781000 to 0xffffffff81786000
KERN: frame                       caller             <image>:function + offset
KERN:  0 ffffffff81785c60 (+ 128) ffffffff80099e43   <kernel_x86_64> reschedule(int) + 0x433
KERN:  1 ffffffff81785ce0 (+  48) ffffffff800899a6   <kernel_x86_64> thread_block + 0xc6
KERN:  2 ffffffff81785d10 (+  80) ffffffff80095fea   <kernel_x86_64> _mutex_lock + 0x21a
KERN:  3 ffffffff81785d60 (+  32) ffffffff8009608e   <kernel_x86_64> recursive_lock_lock + 0x3e
KERN:  4 ffffffff81785d80 (+  80) ffffffff81117d16   </boot/system/add-ons/kernel/network/stack> Interface::AddressForDestination[clone .localalias] (net_domain*, sockaddr const*) + 0x26
KERN:  5 ffffffff81785dd0 (+  80) ffffffff8111b112   </boot/system/add-ons/kernel/network/stack> get_interface_address_for_destination(net_domain*, sockaddr const*) + 0x52
KERN:  6 ffffffff81785e20 (+  80) ffffffff81113f1a   </boot/system/add-ons/kernel/network/stack> datalink_is_local_address(net_domain*, sockaddr const*, net_interface_address**, unsigned int*) + 0xba
KERN:  7 ffffffff81785e70 (+ 208) ffffffff81859070   </boot/system/add-ons/kernel/network/protocols/ipv4> ipv4_receive_data[clone .localalias] (net_buffer*) + 0x130
KERN:  8 ffffffff81785f40 (+ 112) ffffffff81114c72   </boot/system/add-ons/kernel/network/stack> device_consumer_thread(void*) + 0x152
KERN:  9 ffffffff81785fb0 (+  32) ffffffff8008c088   <kernel_x86_64> common_thread_entry(void*) + 0x38
KERN: 10 ffffffff81785fd0 (+2122817584) ffffffff81785fe0   17526:/dev/net/pcnet/0 consumer_1759_@0xffffffff81781000 + 0x4fe0
stack trace for thread 2060 "BUrlProtocol.HTTP"
KERN:     kernel stack: 0xffffffff81d5d000 to 0xffffffff81d62000
KERN:       user stack: 0x00007f0001117000 to 0x00007f0001157000
KERN: frame                       caller             <image>:function + offset
KERN:  0 ffffffff81d619b0 (+ 128) ffffffff80099e43   <kernel_x86_64> reschedule(int) + 0x433
KERN:  1 ffffffff81d61a30 (+  48) ffffffff800899a6   <kernel_x86_64> thread_block + 0xc6
KERN:  2 ffffffff81d61a60 (+  80) ffffffff80095fea   <kernel_x86_64> _mutex_lock + 0x21a
KERN:  3 ffffffff81d61ab0 (+  32) ffffffff8009608e   <kernel_x86_64> recursive_lock_lock + 0x3e
KERN:  4 ffffffff81d61ad0 (+  96) ffffffff811158a6   </boot/system/add-ons/kernel/network/stack> register_device_handler[clone .localalias] (net_device*, int, int (*)(void*, net_device*, net_buffer*), void*) + 0x56
KERN:  5 ffffffff81d61b30 (+  48) ffffffff811c705b   </boot/system/add-ons/kernel/network/datalink_protocols/ipv6_datagram> ipv6_datalink_init(net_interface*, net_domain*, net_datalink_protocol**) + 0x2b
KERN:  6 ffffffff81d61b60 (+ 112) ffffffff81124931   </boot/system/add-ons/kernel/network/stack> get_domain_datalink_protocols(Interface*, net_domain*) + 0x131
KERN:  7 ffffffff81d61bd0 (+  96) ffffffff81118abc   </boot/system/add-ons/kernel/network/stack> Interface::CreateDomainDatalinkIfNeeded[clone .localalias] (net_domain*) + 0x20c
KERN:  8 ffffffff81d61c30 (+  48) ffffffff8111ada3   </boot/system/add-ons/kernel/network/stack> get_interface(net_domain*, char const*) + 0xb3
KERN:  9 ffffffff81d61c60 (+ 528) ffffffff81113950   </boot/system/add-ons/kernel/network/stack> datalink_control(net_domain*, int, void*, unsigned long*) + 0xa0
KERN: 10 ffffffff81d61e70 (+  80) ffffffff8111d738   </boot/system/add-ons/kernel/network/stack> socket_control(net_socket*, unsigned int, void*, unsigned long) + 0xd8
KERN: 11 ffffffff81d61ec0 (+  64) ffffffff800e7f5a   <kernel_x86_64> fd_ioctl(bool, int, unsigned int, void*, unsigned long) + 0x5a
KERN: 12 ffffffff81d61f00 (+  32) ffffffff800e8c1a   <kernel_x86_64> _user_ioctl + 0x3a
KERN: 13 ffffffff81d61f20 (+  16) ffffffff80145eff   <kernel_x86_64> x86_64_syscall_entry + 0xfb
KERN: user iframe at 0xffffffff81d61f30 (end = 0xffffffff81d61ff8)
KERN:  rax 0x93                  rbx 0x10010074ae00        rcx 0x18dd39c
KERN:  rdx 0x7f0001156650        rsi 0x22cb                rdi 0xf
KERN:  rbp 0x7f0001156620         r8 0x100100748f08         r9 0x2
KERN:  r10 0x58                  r11 0x202                 r12 0x100100748ef8
KERN:  r13 0x7f0001156650        r14 0xf                   r15 0x7f0001156708
KERN:  rip 0x18dd39c             rsp 0x7f0001156608     rflags 0x202
KERN:  vector: 0x63, error code: 0x0
KERN: 14 ffffffff81d61f30 (+139640111580912) 00000000018dd39c   <libroot.so> _kern_ioctl + 0x0c
KERN: 15 00007f0001156620 (+ 192) 0000000002fde7f2   <libnetwork.so> gethostbyaddr_r (nearest) + 0x2d2
KERN: 16 00007f00011566e0 (+ 112) 0000000002fdea4d   <libnetwork.so> getifaddrs + 0x11d
KERN: 17 00007f0001156750 (+ 448) 0000000002fe5a2e   <libnetwork.so> getaddrinfo + 0xc4e
KERN: 18 00007f0001156910 (+ 160) 000000000127cd8c   <libbnetapi.so> BNetworkAddressResolver::SetTo(int, char const*, char const*, unsigned int) + 0x11c
KERN: 19 00007f00011569b0 (+  64) 000000000127cf5a   <libbnetapi.so> BNetworkAddressResolver::BNetworkAddressResolver(int, char const*, char const*, unsigned int) + 0x5a
KERN: 20 00007f00011569f0 (+ 128) 000000000127d416   <libbnetapi.so> BNetworkAddressResolver::Resolve(int, char const*, char const*, unsigned int) + 0x1b6
KERN: 21 00007f0001156a70 (+  96) 000000000127d650   <libbnetapi.so> BNetworkAddressResolver::Resolve(int, char const*, unsigned short, unsigned int) + 0x60
KERN: 22 00007f0001156ad0 (+  32) 000000000127d69d   <libbnetapi.so> BNetworkAddressResolver::Resolve(char const*, unsigned short, unsigned int) + 0x1d
KERN: 23 00007f0001156af0 (+  64) 000000000127add1   <libbnetapi.so> BNetworkAddress::SetTo(char const*, unsigned short, unsigned int) + 0x21
KERN: 24 00007f0001156b30 (+  64) 000000000127aeda   <libbnetapi.so> BNetworkAddress::BNetworkAddress(char const*, unsigned short, unsigned int) + 0x4a
KERN: 25 00007f0001156b70 (+ 224) 0000000000e00377   <libpackage.so> _ZN8BPrivate14NaturalCompareEPKcS1_ (nearest) + 0x91b7
KERN: 26 00007f0001156c50 (+ 256) 0000000000dfa20c   <libpackage.so> _ZN8BPrivate14NaturalCompareEPKcS1_ (nearest) + 0x304c
KERN: 27 00007f0001156d50 (+  32) 0000000000e01760   <libpackage.so> _ZN8BPrivate14NaturalCompareEPKcS1_ (nearest) + 0xa5a0
KERN: 28 00007f0001156d70 (+  32) 00000000018dc629   <libroot.so> _thread_do_exit_work (nearest) + 0x89
KERN: 29 00007f0001156d90 (+   0) 00007f0001081258   <commpage> commpage_thread_exit + 0x00
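
The two mutex dumps above can be read as a wait-for graph: every waiting thread waits for the holder of the mutex it is blocked on. The following is only a minimal sketch of that reading, not code from the Haiku tree. `MutexInfo`, `ticket_mutexes`, `build_wait_for` and `find_cycle` are hypothetical names; only the mutex names and thread IDs are taken from the KDL output.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One mutex as printed by the KDL output quoted in this comment.
struct MutexInfo {
    std::string name;
    int holder;
    std::vector<int> waiters;
};

// The two mutexes from this ticket (values copied from the dump).
std::vector<MutexInfo> ticket_mutexes()
{
    return {
        {"device interface receive", 1759, {2060}},
        {"/dev/net/pcnet/0", 2060, {1759, 2103, 2123, 1762}},
    };
}

// Build "thread T waits for thread H" edges: each waiter waits for the holder.
std::map<int, int> build_wait_for(const std::vector<MutexInfo>& mutexes)
{
    std::map<int, int> waitsFor;
    for (const MutexInfo& m : mutexes) {
        for (int waiter : m.waiters) {
            // A thread blocked on a mutex cannot also be its holder.
            assert(waiter != m.holder);
            waitsFor[waiter] = m.holder;
        }
    }
    return waitsFor;
}

// Follow the wait-for chain from `start`. Returns the threads on the cycle
// that contains `start`, or an empty vector if `start` is not on a cycle.
std::vector<int> find_cycle(const std::map<int, int>& waitsFor, int start)
{
    std::vector<int> path;
    int current = start;
    // Bounded walk: a chain longer than the edge count must be repeating.
    for (std::size_t steps = 0; steps <= waitsFor.size(); steps++) {
        path.push_back(current);
        std::map<int, int>::const_iterator it = waitsFor.find(current);
        if (it == waitsFor.end())
            return {};      // chain ends at a thread that is not waiting
        current = it->second;
        if (current == start)
            return path;    // came back to the start: deadlock
    }
    return {};              // stuck behind a cycle, but not part of it
}
```

With the data from this ticket, `find_cycle(build_wait_for(ticket_mutexes()), 1759)` returns the threads 1759 and 2060, i.e. the two threads whose backtraces are shown. Threads 2103, 2123 and 1762 are blocked behind that cycle but are not part of it, so the same call for them returns an empty vector.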

comment:2 by waddlesplash, 2 years ago

This is tough.

CreateDomainDatalinkIfNeeded needs to hold the interface lock while calling get_domain_datalink_protocols, because that function updates domain_datalink members without acquiring any other locks; it relies on its caller to have already locked them. So unlocking the interface here is not going to be easy. Furthermore, CreateDomainDatalinkIfNeeded has a mechanism to check whether the datalink was already created and, if it was, to just return B_OK, presuming that if the datalink is there then it initialized successfully. So if we unlock the interface before the datalink is fully initialized, we would need to add a new mechanism for competing Create requests to wait if they had to.

device_consumer_thread needs to hold the receive lock while it iterates through the receive list and calls handlers, to protect against anything modifying the list (which is exactly what the other thread wants to do, via register_device_handler).

I guess it will be easier to modify device_consumer_thread so that it does not hold the receive lock while actually calling handlers. That may allow for reverting hrev55940, but I think that change stands on its own, though the comment may need to be changed after adjusting this.
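
A minimal sketch of that idea, in portable C++ rather than the actual stack code: copy the handler list while holding the lock, release the lock, then call the handlers from the copy. `ReceiveList`, `Packet`, `Handler`, `Register`, `Count` and `Dispatch` are hypothetical names for illustration, and `std::mutex` only stands in for the kernel's receive lock.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a received buffer.
struct Packet {
    int type;
};

using Handler = std::function<void(const Packet&)>;

class ReceiveList {
public:
    // Modifies the list, so it needs the receive lock.
    void Register(Handler handler)
    {
        std::lock_guard<std::mutex> guard(fLock);
        fHandlers.push_back(std::move(handler));
    }

    std::size_t Count()
    {
        std::lock_guard<std::mutex> guard(fLock);
        return fHandlers.size();
    }

    // Calls every handler registered at the time of the snapshot and
    // returns how many were called. The lock is held only while copying.
    std::size_t Dispatch(const Packet& packet)
    {
        std::vector<Handler> snapshot;
        {
            std::lock_guard<std::mutex> guard(fLock);
            snapshot = fHandlers;
        }
        // No lock held here, so a handler may block on other locks, or
        // even register new handlers, without blocking Register() callers.
        for (const Handler& handler : snapshot)
            handler(packet);
        return snapshot.size();
    }

private:
    std::mutex fLock;
    std::vector<Handler> fHandlers;
};
```

The trade-off in this sketch is that a handler removed after the snapshot was taken can still be called once, so a real change would presumably need some way to keep the handler's data alive until the dispatch finishes (reference counting, for example). That part is not shown here.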

comment:3 by waddlesplash, 2 years ago

Maybe there is another solution: lock the addresses table of Interface separately. I'm not really sure that's a good solution, though.
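
Roughly what that could look like, as a sketch under assumptions and not a patch: the address table gets its own small lock, so a lookup on the receive path never needs the coarse interface lock that is held during datalink setup. The class below is a simplified, hypothetical model (addresses keyed by an integer family, `Lock`, `SetAddress` and `AddressForFamily` are invented names), not the real `Interface` class.

```cpp
#include <map>
#include <mutex>
#include <string>

class Interface {
public:
    // Coarse lock, standing in for the interface lock that is held
    // across slow configuration work such as datalink creation.
    std::recursive_mutex& Lock() { return fLock; }

    void SetAddress(int family, const std::string& address)
    {
        std::lock_guard<std::mutex> guard(fAddressLock);
        fAddresses[family] = address;
    }

    // Lookup used on the receive path: takes only the address lock.
    // Returns false if no address is set for the family.
    bool AddressForFamily(int family, std::string& address)
    {
        std::lock_guard<std::mutex> guard(fAddressLock);
        std::map<int, std::string>::const_iterator it = fAddresses.find(family);
        if (it == fAddresses.end())
            return false;
        address = it->second;
        return true;
    }

private:
    std::recursive_mutex fLock;     // coarse interface lock
    std::mutex fAddressLock;        // protects fAddresses only
    std::map<int, std::string> fAddresses;
};
```

For this to avoid a new lock-order problem, the address lock would have to stay a leaf lock: nothing else may be acquired while it is held. Whether the real address table can be isolated that cleanly is exactly the open question.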
