Opened 17 years ago
Closed 17 years ago
#2172 closed bug (fixed)
neon's single_read test hangs at 100% CPU
Reported by: | andreasf | Owned by: | axeld |
---|---|---|---|
Priority: | critical | Milestone: | R1/alpha1 |
Component: | Network & Internet/TCP | Version: | R1/pre-alpha1 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | x86 |
Description
neon 0.25.x SVN branch, virtually unpatched, hangs at the single_read
socket test (make check
).
CPU stays at 100%, and not even kill -9 <pid>
can abort the test.
neon is an HTTP/WebDAV library used by Subversion; I compiled it against the expat XML parser and (optionally) against OpenSSL (no influence). http://tools.assembla.com/BePorts/wiki/net-misc/neon
Change History (11)
comment:1 by , 17 years ago
follow-up: 3 comment:2 by , 17 years ago
You could try "bt 3230" or "sc 3230" to see the current stacktrace.
comment:3 by , 17 years ago
(the above should have read peek_close
)
Replying to korli:
You could try "bt 3230" or "sc 3230" to see the current stacktrace.
Thanks for the hint.
Still at hrev25256, it occurred at addr_connect
test (another two earlier) this time:
kdebug> threads 174 thread id state sem/cv cpu pri stack team name 0x90db7800 174 running - 1 10 0x90ac9000 174 socket kdebug> sc 174 stack trace for thread 174 "socket" kernel stack: 0x90ac9000 to 0x90acd000 user stack: 0x7efe7000 to 0x7ffe7000 frame caller <image>:function + offset 800d053c (+2146630340) 90b183e0
and after continue
, during next F12:
kdebug> threads 174 thread id state sem/cv cpu pri stack team name 0x90db7800 174 running - 1 10 0x90ac9000 174 socket kdebug> sc 174 stack trace for thread 174 "socket" kernel stack: 0x90ac9000 to 0x90acd000 user stack: 0x7efe7000 to 0x7ffe7000 frame caller <image>:function + offset 00000082 -- read fault
I assume this lack of sensible output indicates some kind of stack corruption?
comment:4 by , 17 years ago
Better luck at hrev25274 with SMP disabled, back to peek_close
:
kdebug> threads 177 thread id state sem/cv cpu pri stack team name 0x90dbd800 177 running - 0 10 0x94314000 177 socket kdebug> sc 177 stack trace for thread 177 "socket" kernel stack: 0x94314000 to 0x94318000 user stack: 0x7efe7000 to 0x7ffe7000 frame caller <image>:function + offset 94317928 (+ 52) 8008ace7 <kernel>:invoke_debugger_command + 0x00cf 9431795c (+ 64) 8008ba90 <kernel>:_ParseCommand__16ExpressionParserRi + 0x01f8 9431799c (+ 48) 8008b482 <kernel>:EvaluateCommand__16ExpressionParserPCcRi + 0x01de 943179cc (+ 228) 8008cba4 <kernel>:evaluate_debug_command + 0x0088 94317ab0 (+ 64) 80089822 <kernel>:kernel_debugger_loop__Fv + 0x017a 94317af0 (+ 48) 8008a4d5 <kernel>:kernel_debugger + 0x010d 94317b20 (+ 192) 8008a3bd <kernel>:panic + 0x0029 94317be0 (+ 48) 807b7df1 </boot/beos/system/add-ons/kernel/bus_managers/ps2>:ps2_interrupt + 0x00cd 94317c10 (+ 64) 80028918 <kernel>:int_io_interrupt_handler + 0x00e0 94317c50 (+ 48) 80097522 <kernel>:hardware_interrupt + 0x007a 94317c80 (+ 12) 8009ab6d <kernel>:int_bottom + 0x001d (nearest) iframe at 0x94317c8c (end = 0x94317ce4) eax 0x90dbd800 ebx 0x90f01a9c ecx 0x2129d edx 0x0 esi 0x94317dcc edi 0x94317dcc ebp 0x94317cf4 esp 0x94317cc0 eip 0x80029171 eflags 0x282 vector: 0x21, error code: 0x0 94317c8c (+ 104) 80029171 <kernel>:mutex_unlock + 0x000d 94317cf4 (+ 48) 802f135a </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Wait__8WaitListRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12MutexLockingxb + 0x002e 94317d24 (+ 64) 802f2d13 </boot/beos/system/add-ons/kernel/network/protocols/tcp>:_WaitForEstablished__11TCPEndpointRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12Mutex7 94317d64 (+ 112) 802f1c20 </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Connect__11TCPEndpointPC8sockaddr + 0x0234 94317dd4 (+ 48) 802f097b </boot/beos/system/add-ons/kernel/network/protocols/tcp>:tcp_connect__FP12net_protocolPC8sockaddr + 0x0023 94317e04 (+ 48) 802d189d </boot/beos/system/add-ons/kernel/network/stack>:socket_connect__FP10net_socketPC8sockaddrUi + 0x0055 94317e34 (+ 48) 802d76f6 </boot/beos/system/add-ons/kernel/network/stack>:stack_interface_connect__FP10net_socketPC8sockaddrUi + 0x002e 94317e64 (+ 48) 8004ad85 <kernel>:common_connect__FiPC8sockaddrUib + 0x0049 94317e94 (+ 176) 8004c178 <kernel>:_user_connect + 0x0080 94317f44 (+ 100) 8009ace2 <kernel>:pre_syscall_debug_done + 0x0002 (nearest) iframe at 0x94317fa8 (end = 0x94318000) eax 0x82 ebx 0x3c89f8 ecx 0x7ffe6dd0 edx 0xffff0104 esi 0x7ffe6e4c edi 0x20 ebp 0x7ffe6e1c esp 0x94317fdc eip 0xffff0104 eflags 0x200213 vector: 0x63, error code: 0x0 94317fa8 (+ 0) ffff0104 7ffe6e1c (+ 80) 002185c1 <libneon.so.25>:ne_addr_destroy + 0x0085 (nearest) 7ffe6e6c (+ 64) 00218688 <libneon.so.25>:ne_sock_connect + 0x006c 7ffe6eac (+ 48) 00202742 <socket>:_term_after + 0x0122 (nearest) 7ffe6edc (+ 48) 00202870 <socket>:_term_after + 0x0250 (nearest) 7ffe6f0c (+ 48) 00202e84 <socket>:_term_after + 0x0864 (nearest) 7ffe6f3c (+ 64) 00204e1c <socket>:main + 0x03c8 7ffe6f7c (+ 48) 0020253f <socket>:_start + 0x005b 7ffe6fac (+ 48) 0010078e 2245:runtime_loader_seg0ro@0x00100000 + 0x78e 7ffe6fdc (+ 0) 7ffe6fec 2244:socket_main_stack@0x7efe7000 + 0xffffec
comment:5 by , 17 years ago
With SMP re-enabled, back to single_read
:
stack trace for thread 177 "socket" kernel stack: 0x94306000 to 0x9430a000 user stack: 0x7efe7000 to 0x7ffe7000 frame caller <image>:function + offset 94309be8 (+ 12) 8009ab6d <kernel>:int_bottom + 0x001d (nearest) iframe at 0x94309bf4 (end = 0x94309c4c) eax 0x200 ebx 0x20c03 ecx 0x1 edx 0x246 esi 0x0 edi 0x284f ebp 0x94309c44 esp 0x94309c28 eip 0x800971a4 eflags 0x246 vector: 0xfd, error code: 0x0 94309bf4 (+ 80) 800971a4 <kernel>:arch_int_restore_interrupts + 0x0020 94309c44 (+ 32) 80028aa2 <kernel>:restore_interrupts + 0x0012 94309c64 (+ 96) 80030824 <kernel>:release_sem_etc + 0x049c 94309cc4 (+ 48) 800291b1 <kernel>:mutex_unlock + 0x004d 94309cf4 (+ 48) 802f535a </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Wait__8WaitListRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12MutexLockingxb + 0x002e 94309d24 (+ 64) 802f6d13 </boot/beos/system/add-ons/kernel/network/protocols/tcp>:_WaitForEstablished__11TCPEndpointRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12Mutex7 94309d64 (+ 112) 802f5c20 </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Connect__11TCPEndpointPC8sockaddr + 0x0234 94309dd4 (+ 48) 802f497b </boot/beos/system/add-ons/kernel/network/protocols/tcp>:tcp_connect__FP12net_protocolPC8sockaddr + 0x0023 94309e04 (+ 48) 802d589d </boot/beos/system/add-ons/kernel/network/stack>:socket_connect__FP10net_socketPC8sockaddrUi + 0x0055 94309e34 (+ 48) 802db6f6 </boot/beos/system/add-ons/kernel/network/stack>:stack_interface_connect__FP10net_socketPC8sockaddrUi + 0x002e 94309e64 (+ 48) 8004ad85 <kernel>:common_connect__FiPC8sockaddrUib + 0x0049 94309e94 (+ 176) 8004c178 <kernel>:_user_connect + 0x0080 94309f44 (+ 100) 8009ace2 <kernel>:pre_syscall_debug_done + 0x0002 (nearest) iframe at 0x94309fa8 (end = 0x9430a000) eax 0x82 ebx 0x3c89f8 ecx 0x5 edx 0x1800e360 esi 0x7ffe6e3c edi 0x20 ebp 0x7ffe6e0c esp 0x94309fdc eip 0xffff0102 eflags 0x200213 vector: 0x63, error code: 0x0 94309fa8 (+ 0) ffff0102 7ffe6e0c (+ 80) 002185c1 <libneon.so.25>:ne_addr_destroy + 0x0085 (nearest) 7ffe6e5c (+ 64) 00218688 <libneon.so.25>:ne_sock_connect + 0x006c 7ffe6e9c (+ 48) 00202742 <socket>:_term_after + 0x0122 (nearest) 7ffe6ecc (+ 48) 00202870 <socket>:_term_after + 0x0250 (nearest) 7ffe6efc (+ 64) 00203103 <socket>:_term_after + 0x0ae3 (nearest) 7ffe6f3c (+ 64) 00204e1c <socket>:main + 0x03c8 7ffe6f7c (+ 48) 0020253f <socket>:_start + 0x005b 7ffe6fac (+ 48) 0010078e 2220:runtime_loader_seg0ro@0x00100000 + 0x78e 7ffe6fdc (+ 0) 7ffe6fec 2219:socket_main_stack@0x7efe7000 + 0xffffec
comment:6 by , 17 years ago
At hrev25278 (with Axel's network fixes) hangs in addr_connect
test, without meaningful backtrace.
comment:7 by , 17 years ago
Component: | Network & Internet → Network & Internet/TCP |
---|---|
Milestone: | R1 → R1/alpha1 |
Priority: | normal → critical |
Status: | new → assigned |
This might be the same problem that happens with the OpenSSH forwarding test - I will look into that in the next days.
comment:8 by , 17 years ago
How does it look now, after hrev25294? At least the OpenSSH forwarding test now works here.
comment:9 by , 17 years ago
At hrev25294 in peek_close
:
kdebug> sc 51767 stack trace for thread 51767 "socket" kernel stack: 0x95449000 to 0x9544d000 user stack: 0x7efe7000 to 0x7ffe7000 frame caller <image>:function + offset 9544cbf4 (+ 32) 8002f112 <kernel>:context_switch__FP6threadT0 + 0x0026 9544cc14 (+ 64) 8002f39c <kernel>:scheduler_reschedule + 0x0268 9544cc54 (+ 32) 8003c48c <kernel>:thread_block_with_timeout_locked + 0x00d0 9544cc74 (+ 80) 800306a0 <kernel>:switch_sem_etc + 0x0384 9544ccc4 (+ 64) 800302ee <kernel>:acquire_sem_etc + 0x0026 9544cd04 (+ 64) 802f7392 </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Wait__8WaitListRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12MutexLockingx + 0x006a 9544cd44 (+ 48) 802f8e2a </boot/beos/system/add-ons/kernel/network/protocols/tcp>:_WaitForEstablished__11TCPEndpointRQ28BPrivatet10AutoLocker2Z5mutexZQ28BPrivate12Mutexe 9544cd74 (+ 96) 802f7a35 </boot/beos/system/add-ons/kernel/network/protocols/tcp>:Connect__11TCPEndpointPC8sockaddr + 0x0071 9544cdd4 (+ 48) 802f697b </boot/beos/system/add-ons/kernel/network/protocols/tcp>:tcp_connect__FP12net_protocolPC8sockaddr + 0x0023 9544ce04 (+ 48) 802d7d41 </boot/beos/system/add-ons/kernel/network/stack>:socket_connect__FP10net_socketPC8sockaddrUi + 0x0055 9544ce34 (+ 48) 802ddc0e </boot/beos/system/add-ons/kernel/network/stack>:stack_interface_connect__FP10net_socketPC8sockaddrUi + 0x002e 9544ce64 (+ 48) 8004b3d5 <kernel>:common_connect__FiPC8sockaddrUib + 0x0049 9544ce94 (+ 176) 8004c7c8 <kernel>:_user_connect + 0x0080 9544cf44 (+ 100) 8009b682 <kernel>:pre_syscall_debug_done + 0x0002 (nearest) iframe at 0x9544cfa8 (end = 0x9544d000) eax 0x82 ebx 0x3c89f8 ecx 0x5 edx 0x1800fbe0 esi 0x7ffe6e30 edi 0x20 ebp 0x7ffe6dfc esp 0x9544cfdc eip 0xffff0102 eflags 0x200217 vector: 0x63, error code: 0x0 9544cfa8 (+ 0) ffff0102 7ffe6dfc (+ 112) 00218582 <libneon.so.25>:ne_addr_destroy + 0x00c6 (nearest) 7ffe6e6c (+ 64) 0021864f <libneon.so.25>:ne_sock_connect + 0x0073 7ffe6eac (+ 48) 00202712 <socket>:_term_after + 0x0122 (nearest) 7ffe6edc (+ 48) 00202840 <socket>:_term_after + 0x0250 (nearest) 7ffe6f0c (+ 48) 00203014 <socket>:_term_after + 0x0a24 (nearest) 7ffe6f3c (+ 64) 00204fac <socket>:main + 0x03c8 7ffe6f7c (+ 48) 0020250f <socket>:_start + 0x005b 7ffe6fac (+ 48) 0010078e 1044226:runtime_loader_seg0ro@0x00100000 + 0x78e 7ffe6fdc (+ 0) 7ffe6fec 1044225:socket_main_stack@0x7efe7000 + 0xffffec
Then after continue
something new:
FAIL (count not connect to server: Operation would block
I'll recheck whether #2195 had any influence on this, and whether my F12 had any influence on it or whether I was just too impatient by now.
comment:10 by , 17 years ago
Same revision, after restart, without Pe started, immediately passes peek_close
but runs into a ca. two minute timeout in read_and_peek
test, again leading to:
FAIL (could not connect to server: Operation would block)
Next try, one test earlier, small_reads
, and in the same run in line_overlow
and line_mingle
(which both passed the run before). Still looks a little random, but CPU is no longer 100% (~0%) and it times out at all.
I noticed for some test earlier that client and server were being set up on the same (single) thread, could that be a problem? Seems like a non-standard scenario.
In hrev25256 it fails one test earlier,
peak_close
, with the same symptoms.No hints in serial output. The kernel debugger seems to indicate that neon's
socket
app has a single thread, in therunning
state.