Opened 6 years ago

Closed 5 years ago

#10328 closed bug (fixed)

[Network Stack] crashes in socket_free

Reported by: diver Owned by: nobody
Priority: high Milestone: R1/beta1
Component: Network & Internet/Stack Version: R1/Development
Keywords: Cc: degea@…
Blocked By: Blocking: #10814
Has a Patch: no Platform: All

Description

This is hrev46546 gcc2 hybrid in VirtualBox.

Got this KDL a few times after firing up Web+ (right after boot) and going to https://money.yandex.ru

Attachments (1)

kdl.png (92.4 KB) - added by diver 6 years ago.


Change History (23)

Changed 6 years ago by diver

Attachment: kdl.png added

comment:1 Changed 6 years ago by ttcoder

Cc: degea@… added

Gotta follow the action on this one, in light of the other kernel problems being tracked.

Here's socket_free() for reference: http://cgit.haiku-os.org/haiku/tree/src/add-ons/kernel/network/stack/net_socket.cpp#n460

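```cpp
void
socket_free(net_socket* _socket)
{
	net_socket_private* socket = (net_socket_private*)_socket;
	// call the free() hook of the socket's first protocol module
	socket->first_info->free(socket->first_protocol);
	// drop this reference; releasing the last one destroys the socket
	socket->ReleaseReference();
}
```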

I suppose gcc inlined ReleaseReference(), so the faulting code is an inlined atomic_add() rather than a call to ReleaseReference().

comment:2 Changed 6 years ago by ttcoder

If I didn't screw up searching, the inlined atomic_add() is here and the fUseCount variable is in class BWeakReferenceable here.

I guess the KDL hints that the net_socket_private had already been deleted, so its memory was overwritten with 0xdeadbeef, including its BWeakReferenceable part and/or its WeakPointer member and that member's fUseCount. So when atomic_add() dereferences the weak pointer to reach fUseCount, it actually dereferences 0xdeadbeef plus the offset of that counter, which gives 0xdeadbef7. So this would be a heap corruption/double free() (use-after-free) scenario. Does that sound correct to any of you kernel gurus?

Maybe diver could run dis, or even dis -b20, in KDL to check how edx ended up with that value.

comment:3 Changed 6 years ago by diver

hrev46686. Turning Traffic/Photos on and off in the collapsible view in the upper right corner at http://maps.google.com (with the classic interface) reproduces this bug.

comment:4 Changed 6 years ago by anevilyak

Component: changed from Kits/Network Kit to Network & Internet/Stack

comment:5 Changed 6 years ago by diver

Summary: changed from [Network Kit] KDL when accessing https sites to [Network Stack] crashes in socket_free

comment:6 Changed 6 years ago by pulkomandy

Owner: changed from pulkomandy to nobody
Status: changed from new to assigned

Sorry, I'm afraid I can't really help here. I'll let someone else have a look.

comment:7 Changed 5 years ago by anevilyak

Blocking: 10814 added

(In #10814) Duplicate of #10328.

comment:8 Changed 5 years ago by waddlesplash

This is getting really, really bad -- I can't really browse anywhere without this happening. I haven't found any sites that cause it to happen 100% of the time, but it seems to happen repeatedly after browsing for any significant amount of time.

Perhaps a committer can look at Coverity and see if there are any open warnings in the network stack?

comment:9 Changed 5 years ago by ttcoder

Just had a similar KDL (occurs very rarely here), starting from BSecureSocket::WaitForData() instead of BSecureSocket::Disconnect() if I recall.

I say "if I recall" because I don't have the full backtrace: I expected it to be in previous_syslog, but unfortunately the interesting part is corrupted/truncated by some trailing 'color codes', followed by seemingly random binary garbage:

```
write access attempted on write-protected area 0x54 at 0xdeadb000
vm_page_fault: vm_soft_fault returned error 'Permission denied' on fault at 0xdeadbef7, ip 0x8008d6be, write 1, user 0, thread 0xb45
PANIC: vm_page_fault: unhandled page fault in kernel space at 0xdeadbef7, ip 0x8008d6be

Welcome to Kernel Debugging Land...
Thread 2885 "BUrlProtocol.HTTP" running on CPU 1
stack trace for thread 2885 "BUrlProtocol.HTTP"
    kernel stack: 0x8152a000 to 0x8152e000
      user stack: 0x7a33b000 to 0x7a37b000
frame               caller     <image>:function + offset
 0 8152dc94 (+  32) 801413b6   <kernel_x86> arch_debug_stack_trace + 0x12
 1 8152dcb4 (+  16) 800a131f   <kernel_x86> stack_trace_trampoline(NULL) + 0x0b
 2 8152dcc4 (+  12) 801330fe   <kernel_x86> arch_debug_call_with_fault_handler + 0x1b
 3 8152dcd0 (+  48) 800a2e8a   <kernel_x86> debug_call_with_fault_handler + 0x5a
 4 8152dd00 (+  64) 800a153b   <kernel_x86> kernel_debugger_loop([34m0x80183f77[0m [36m"PANIC: "[0m, [34m0x8019a920[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx
```

Also forgot to run a dis :-/

comment:10 Changed 5 years ago by kallisti5

Milestone: changed from R1 to R1/alpha5
Priority: changed from normal to high

comment:11 Changed 5 years ago by phoudoin

Rebuilding the network stack with TRACE_SOCKET defined would show whether, and when, the net_socket_private destructor is called earlier. If it is not, we could at least then look for heap corruption.

comment:12 Changed 5 years ago by mmu_man

I also get this one from time to time... In VirtualBox as well.

comment:13 Changed 5 years ago by waddlesplash

I haven't seen this for months. Anyone else?

comment:14 Changed 5 years ago by pulkomandy

It still happens.

comment:15 Changed 5 years ago by pulkomandy

Blocked By: 11098 added
Resolution: set to duplicate
Status: changed from assigned to closed

comment:16 Changed 5 years ago by mmlr

Resolution: duplicate removed
Status: changed from closed to reopened

No, this one is different. Only the 0xdeadbef7 crashes in common_{poll|select|wait_for_object} are #11098.

comment:17 Changed 5 years ago by mmlr

Blocked By: 11098 removed

comment:18 Changed 5 years ago by pulkomandy

Milestone: changed from R1/alpha5 to R1/beta1

comment:19 Changed 5 years ago by pulkomandy

I haven't seen this one in a while either now. Can anyone still reproduce it?

comment:20 Changed 5 years ago by diver

I can't reproduce it with Google Maps either.

comment:21 Changed 5 years ago by ttcoder

Haven't seen it in many weeks, though that kinda overlaps with the time YouTube videos started acting up (they stop playing after a few seconds), so I'm not completely ruling out that it comes back if/when videos work again. But yes, for now this KDL seems to be gone!

comment:22 Changed 5 years ago by pulkomandy

Resolution: set to fixed
Status: changed from reopened to closed

Ok then, let's close for now.
