Opened 11 years ago

Closed 11 years ago

#2172 closed bug (fixed)

neon's single_read test hangs at 100% CPU

Reported by: andreasf Owned by: axeld
Priority: critical Milestone: R1/alpha1
Component: Network & Internet/TCP Version: R1/pre-alpha1
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: x86

Description

neon 0.25.x SVN branch, virtually unpatched, hangs at the single_read socket test (make check).

CPU stays at 100%, and not even kill -9 <pid> can abort the test.

neon is an HTTP/WebDAV library used by Subversion; I compiled it against the expat XML parser and (optionally) against OpenSSL (no influence). http://tools.assembla.com/BePorts/wiki/net-misc/neon

Change History (11)

comment:1 Changed 11 years ago by andreasf

In hrev25256 it fails one test earlier, peak_close, with the same symptoms.

No hints in serial output. The kernel debugger seems to indicate that neon's socket app has a single thread, in the running state.

kdebug> threads 3230                                                            
thread         id  state        sem/cv cpu pri  stack      team  name           
0x90d5c800   3230  running        -      0  10  0x90a9a000 3230  socket

comment:2 Changed 11 years ago by korli

You could try "bt 3230" or "sc 3230" to see the current stacktrace.

comment:3 in reply to:  2 Changed 11 years ago by andreasf

(the above should have read peek_close)

Replying to korli:

You could try "bt 3230" or "sc 3230" to see the current stacktrace.

Thanks for the hint.

Still at hrev25256, it occurred at addr_connect test (another two earlier) this time:

kdebug> threads 174                                                             
thread         id  state        sem/cv cpu pri  stack      team  name           
0x90db7800    174  running        -      1  10  0x90ac9000  174  socket         
kdebug> sc 174                                                                  
stack trace for thread 174 "socket"                                             
    kernel stack: 0x90ac9000 to 0x90acd000                                      
      user stack: 0x7efe7000 to 0x7ffe7000                                      
frame            caller     <image>:function + offset                           
800d053c (+2146630340) 90b183e0

and after continue, during next F12:

kdebug> threads 174                                                             
thread         id  state        sem/cv cpu pri  stack      team  name           
0x90db7800    174  running        -      1  10  0x90ac9000  174  socket         
kdebug> sc 174                                                                  
stack trace for thread 174 "socket"                                             
    kernel stack: 0x90ac9000 to 0x90acd000                                      
      user stack: 0x7efe7000 to 0x7ffe7000                                      
frame            caller     <image>:function + offset                           
00000082 -- read fault

I assume this lack of sensible output indicates some kind of stack corruption?

comment:4 Changed 11 years ago by andreasf

Better luck at hrev25274 with SMP disabled, back to peek_close:

kdebug> threads 177                                                             
thread         id  state        sem/cv cpu pri  stack      team  name           
0x90dbd800    177  running        -      0  10  0x94314000  177  socket         
kdebug> sc 177                                                                  
stack trace for thread 177 "socket"
    kernel stack: 0x94314000 to 0x94318000
      user stack: 0x7efe7000 to 0x7ffe7000
frame            caller     <image>:function + offset
94317928 (+  52) 8008ace7   <kernel>:invoke_debugger_command + 0x00cf
9431795c (+  64) 8008ba90   <kernel>:_ParseCommand__16ExpressionParserRi + 0x01f8
9431799c (+  48) 8008b482   <kernel>:EvaluateCommand__16ExpressionParserPCcRi + 0x01de
943179cc (+ 228) 8008cba4   <kernel>:evaluate_debug_command + 0x0088
94317ab0 (+  64) 80089822   <kernel>:kernel_debugger_loop__Fv + 0x017a
94317af0 (+  48) 8008a4d5   <kernel>:kernel_debugger + 0x010d
94317b20 (+ 192) 8008a3bd   <kernel>:panic + 0x0029
94317be0 (+  48) 807b7df1   </boot/beos/system/add-ons/kernel/bus_managers/ps2>:ps2_interrupt + 0x00cd
94317c10 (+  64) 80028918   <kernel>:int_io_interrupt_handler + 0x00e0
94317c50 (+  48) 80097522   <kernel>:hardware_interrupt + 0x007a
94317c80 (+  12) 8009ab6d   <kernel>:int_bottom + 0x001d (nearest)
iframe at 0x94317c8c (end = 0x94317ce4)
 eax 0x90dbd800     ebx 0x90f01a9c      ecx 0x2129d      edx 0x0
 esi 0x94317dcc     edi 0x94317dcc      ebp 0x94317cf4   esp 0x94317cc0
 eip 0x80029171  eflags 0x282      
 vector: 0x21, error code: 0x0
94317c8c (+ 104) 80029171   <kernel>:mutex_unlock + 0x000d
94317cf4 (+  48) 802f135a   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Wait__8WaitListRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12MutexLockingxb + 0x002e
94317d24 (+  64) 802f2d13   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:_WaitForEstablished__11TCPEndpointRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12Mutex7
94317d64 (+ 112) 802f1c20   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Connect__11TCPEndpointPC8sockaddr + 0x0234
94317dd4 (+  48) 802f097b   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:tcp_connect__FP12net_protocolPC8sockaddr + 0x0023
94317e04 (+  48) 802d189d   </boot/beos/system/add-ons/kernel/network/stack>:socket_connect__FP10net_socketPC8sockaddrUi + 0x0055
94317e34 (+  48) 802d76f6   </boot/beos/system/add-ons/kernel/network/stack>:stack_interface_connect__FP10net_socketPC8sockaddrUi + 0x002e
94317e64 (+  48) 8004ad85   <kernel>:common_connect__FiPC8sockaddrUib + 0x0049
94317e94 (+ 176) 8004c178   <kernel>:_user_connect + 0x0080
94317f44 (+ 100) 8009ace2   <kernel>:pre_syscall_debug_done + 0x0002 (nearest)
iframe at 0x94317fa8 (end = 0x94318000)
 eax 0x82           ebx 0x3c89f8        ecx 0x7ffe6dd0   edx 0xffff0104
 esi 0x7ffe6e4c     edi 0x20            ebp 0x7ffe6e1c   esp 0x94317fdc
 eip 0xffff0104  eflags 0x200213   
 vector: 0x63, error code: 0x0
94317fa8 (+   0) ffff0104
7ffe6e1c (+  80) 002185c1   <libneon.so.25>:ne_addr_destroy + 0x0085 (nearest)
7ffe6e6c (+  64) 00218688   <libneon.so.25>:ne_sock_connect + 0x006c
7ffe6eac (+  48) 00202742   <socket>:_term_after + 0x0122 (nearest)
7ffe6edc (+  48) 00202870   <socket>:_term_after + 0x0250 (nearest)
7ffe6f0c (+  48) 00202e84   <socket>:_term_after + 0x0864 (nearest)
7ffe6f3c (+  64) 00204e1c   <socket>:main + 0x03c8
7ffe6f7c (+  48) 0020253f   <socket>:_start + 0x005b
7ffe6fac (+  48) 0010078e   2245:runtime_loader_seg0ro@0x00100000 + 0x78e
7ffe6fdc (+   0) 7ffe6fec   2244:socket_main_stack@0x7efe7000 + 0xffffec

comment:5 Changed 11 years ago by andreasf

With SMP re-enabled, back to single_read:

stack trace for thread 177 "socket"
    kernel stack: 0x94306000 to 0x9430a000
      user stack: 0x7efe7000 to 0x7ffe7000
frame            caller     <image>:function + offset
94309be8 (+  12) 8009ab6d   <kernel>:int_bottom + 0x001d (nearest)
iframe at 0x94309bf4 (end = 0x94309c4c)
 eax 0x200          ebx 0x20c03         ecx 0x1          edx 0x246
 esi 0x0            edi 0x284f          ebp 0x94309c44   esp 0x94309c28
 eip 0x800971a4  eflags 0x246      
 vector: 0xfd, error code: 0x0
94309bf4 (+  80) 800971a4   <kernel>:arch_int_restore_interrupts + 0x0020
94309c44 (+  32) 80028aa2   <kernel>:restore_interrupts + 0x0012
94309c64 (+  96) 80030824   <kernel>:release_sem_etc + 0x049c
94309cc4 (+  48) 800291b1   <kernel>:mutex_unlock + 0x004d
94309cf4 (+  48) 802f535a   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Wait__8WaitListRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12MutexLockingxb + 0x002e
94309d24 (+  64) 802f6d13   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:_WaitForEstablished__11TCPEndpointRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12Mutex7
94309d64 (+ 112) 802f5c20   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Connect__11TCPEndpointPC8sockaddr + 0x0234
94309dd4 (+  48) 802f497b   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:tcp_connect__FP12net_protocolPC8sockaddr + 0x0023
94309e04 (+  48) 802d589d   </boot/beos/system/add-ons/kernel/network/stack>:socket_connect__FP10net_socketPC8sockaddrUi + 0x0055
94309e34 (+  48) 802db6f6   </boot/beos/system/add-ons/kernel/network/stack>:stack_interface_connect__FP10net_socketPC8sockaddrUi + 0x002e
94309e64 (+  48) 8004ad85   <kernel>:common_connect__FiPC8sockaddrUib + 0x0049
94309e94 (+ 176) 8004c178   <kernel>:_user_connect + 0x0080
94309f44 (+ 100) 8009ace2   <kernel>:pre_syscall_debug_done + 0x0002 (nearest)
iframe at 0x94309fa8 (end = 0x9430a000)
 eax 0x82           ebx 0x3c89f8        ecx 0x5          edx 0x1800e360
 esi 0x7ffe6e3c     edi 0x20            ebp 0x7ffe6e0c   esp 0x94309fdc
 eip 0xffff0102  eflags 0x200213   
 vector: 0x63, error code: 0x0
94309fa8 (+   0) ffff0102
7ffe6e0c (+  80) 002185c1   <libneon.so.25>:ne_addr_destroy + 0x0085 (nearest)
7ffe6e5c (+  64) 00218688   <libneon.so.25>:ne_sock_connect + 0x006c
7ffe6e9c (+  48) 00202742   <socket>:_term_after + 0x0122 (nearest)
7ffe6ecc (+  48) 00202870   <socket>:_term_after + 0x0250 (nearest)
7ffe6efc (+  64) 00203103   <socket>:_term_after + 0x0ae3 (nearest)
7ffe6f3c (+  64) 00204e1c   <socket>:main + 0x03c8
7ffe6f7c (+  48) 0020253f   <socket>:_start + 0x005b
7ffe6fac (+  48) 0010078e   2220:runtime_loader_seg0ro@0x00100000 + 0x78e
7ffe6fdc (+   0) 7ffe6fec   2219:socket_main_stack@0x7efe7000 + 0xffffec

comment:6 Changed 11 years ago by andreasf

At hrev25278 (with Axel's network fixes) hangs in addr_connect test, without meaningful backtrace.

comment:7 Changed 11 years ago by axeld

Component: Network & InternetNetwork & Internet/TCP
Milestone: R1R1/alpha1
Priority: normalcritical
Status: newassigned

This might be the same problem that happens with the OpenSSH forwarding test - I will look into that in the next days.

comment:8 Changed 11 years ago by axeld

How does it look now, after hrev25294? At least the OpenSSH forwarding test now works here.

comment:9 Changed 11 years ago by andreasf

At hrev25294 in peek_close:

kdebug> sc 51767
stack trace for thread 51767 "socket"
    kernel stack: 0x95449000 to 0x9544d000
      user stack: 0x7efe7000 to 0x7ffe7000
frame            caller     <image>:function + offset
9544cbf4 (+  32) 8002f112   <kernel>:context_switch__FP6threadT0 + 0x0026
9544cc14 (+  64) 8002f39c   <kernel>:scheduler_reschedule + 0x0268
9544cc54 (+  32) 8003c48c   <kernel>:thread_block_with_timeout_locked + 0x00d0
9544cc74 (+  80) 800306a0   <kernel>:switch_sem_etc + 0x0384
9544ccc4 (+  64) 800302ee   <kernel>:acquire_sem_etc + 0x0026
9544cd04 (+  64) 802f7392   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Wait__8WaitListRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12MutexLockingx + 0x006a
9544cd44 (+  48) 802f8e2a   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:_WaitForEstablished__11TCPEndpointRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12Mutexe
9544cd74 (+  96) 802f7a35   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Connect__11TCPEndpointPC8sockaddr + 0x0071
9544cdd4 (+  48) 802f697b   </boot/beos/system/add-ons/kernel/network/protocols/tcp>:tcp_connect__FP12net_protocolPC8sockaddr + 0x0023
9544ce04 (+  48) 802d7d41   </boot/beos/system/add-ons/kernel/network/stack>:socket_connect__FP10net_socketPC8sockaddrUi + 0x0055
9544ce34 (+  48) 802ddc0e   </boot/beos/system/add-ons/kernel/network/stack>:stack_interface_connect__FP10net_socketPC8sockaddrUi + 0x002e
9544ce64 (+  48) 8004b3d5   <kernel>:common_connect__FiPC8sockaddrUib + 0x0049
9544ce94 (+ 176) 8004c7c8   <kernel>:_user_connect + 0x0080
9544cf44 (+ 100) 8009b682   <kernel>:pre_syscall_debug_done + 0x0002 (nearest)
iframe at 0x9544cfa8 (end = 0x9544d000)
 eax 0x82           ebx 0x3c89f8        ecx 0x5          edx 0x1800fbe0
 esi 0x7ffe6e30     edi 0x20            ebp 0x7ffe6dfc   esp 0x9544cfdc
 eip 0xffff0102  eflags 0x200217   
 vector: 0x63, error code: 0x0
9544cfa8 (+   0) ffff0102
7ffe6dfc (+ 112) 00218582   <libneon.so.25>:ne_addr_destroy + 0x00c6 (nearest)
7ffe6e6c (+  64) 0021864f   <libneon.so.25>:ne_sock_connect + 0x0073
7ffe6eac (+  48) 00202712   <socket>:_term_after + 0x0122 (nearest)
7ffe6edc (+  48) 00202840   <socket>:_term_after + 0x0250 (nearest)
7ffe6f0c (+  48) 00203014   <socket>:_term_after + 0x0a24 (nearest)
7ffe6f3c (+  64) 00204fac   <socket>:main + 0x03c8
7ffe6f7c (+  48) 0020250f   <socket>:_start + 0x005b
7ffe6fac (+  48) 0010078e   1044226:runtime_loader_seg0ro@0x00100000 + 0x78e
7ffe6fdc (+   0) 7ffe6fec   1044225:socket_main_stack@0x7efe7000 + 0xffffec

Then after continue something new: FAIL (count not connect to server: Operation would block

I'll recheck whether #2195 had any influence on this, and whether my F12 had any influence on it or whether I was just too impatient by now.

comment:10 Changed 11 years ago by andreasf

Same revision, after restart, without Pe started, immediately passes peek_close but runs into a ca. two minute timeout in read_and_peek test, again leading to:

FAIL (could not connect to server: Operation would block)

Next try, one test earlier, small_reads, and in the same run in line_overlow and line_mingle (which both passed the run before). Still looks a little random, but CPU is no longer 100% (~0%) and it times out at all.

I noticed for some test earlier that client and server were being set up on the same (single) thread, could that be a problem? Seems like a non-standard scenario.

comment:11 Changed 11 years ago by bonefish

Resolution: fixed
Status: assignedclosed

Fixed in hrev25526.

Note: See TracTickets for help on using tickets.