Opened 12 years ago

Closed 12 years ago

#9184 closed bug (fixed)

ifconfig up/down an interface repeatedly causes a kdl

Reported by: kallisti5
Owned by: axeld
Priority: high
Milestone: R1/beta1
Component: Network & Internet/IPv6
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

gcc4 image, hrev44859

Running:

	ifconfig <device> down
	ifconfig <device> up
	ifconfig <device> down
	...

will result in a KDL. Reproduced several times.

The same thing happens on both wired and wireless interfaces.

Attachments (1)

IMG_20121118_102617.jpg (283.1 KB) - added by kallisti5 12 years ago.

Change History (13)

by kallisti5, 12 years ago

Attachment: IMG_20121118_102617.jpg added

comment:1 by anevilyak, 12 years ago

Component: Network & Internet/Stack → Network & Internet/IPv6

comment:2 by jackburton, 12 years ago

I found that, when in KDL, the output of net_interface shows that the ipv6 domain for the interface I played with has a refcount of 2, while the ipv4 domain has a refcount of 1. Moreover, the KDL doesn't happen if I remove the ipv6 module. After having a look at the code, I suspect that the culprit is the function ipv6_setsockopt() in src/add-ons/kernel/network/protocols/ipv6.cpp.

At line 1240 it calls sDatalinkModule->get_interface_with_address() but, unlike the equivalent ipv4 code, it does not call sDatalinkModule->put_interface(interface).

ipv6:

	struct net_interface* interface
		= sDatalinkModule->get_interface_with_address(
			(sockaddr*)address);
	if (interface == NULL) {
		delete address;
		return EADDRNOTAVAIL;
	}

	delete protocol->multicast_address;
	protocol->multicast_address = (struct sockaddr*)address;
	return B_OK;

cf. ipv4:

	struct net_interface* interface
		= sDatalinkModule->get_interface_with_address(
			(sockaddr*)address);
	if (interface == NULL) {
		delete address;
		return EADDRNOTAVAIL;
	}

	delete protocol->multicast_address;
	protocol->multicast_address = (struct sockaddr*)address;
	sDatalinkModule->put_interface(interface);
	return B_OK;

I don't have a development system at hand, but maybe somebody can comment on this, or even commit the fix.
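The difference between the two snippets is a missing release of a reference that the lookup acquired. The following is only a minimal, self-contained sketch of that pattern: FakeInterface, acquire() and release() are hypothetical stand-ins invented for illustration, not the real net_interface or datalink module API, and the sketch does not establish that this leak is what causes the KDL.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted interface object.
// This is NOT the real net_interface structure.
struct FakeInterface {
	int ref_count = 1;	// one reference held by the owner of the object
};

// Hypothetical counterpart of a "get" call: hands out one more reference.
static FakeInterface* acquire(FakeInterface& interface)
{
	interface.ref_count++;
	return &interface;
}

// Hypothetical counterpart of a "put" call: gives one reference back.
static void release(FakeInterface* interface)
{
	interface->ref_count--;
}

// Shaped like the ipv6 snippet: the reference is acquired, never released.
static void set_option_leaky(FakeInterface& interface)
{
	FakeInterface* found = acquire(interface);
	(void)found;	// used only as an existence check
}

// Shaped like the ipv4 snippet: every acquire is paired with a release.
static void set_option_balanced(FakeInterface& interface)
{
	FakeInterface* found = acquire(interface);
	release(found);
}

// Returns the reference count left after `calls` leaky invocations.
int refcount_after_leaky_calls(int calls)
{
	FakeInterface interface;
	for (int i = 0; i < calls; i++)
		set_option_leaky(interface);
	return interface.ref_count;
}

// Returns the reference count left after `calls` balanced invocations.
int refcount_after_balanced_calls(int calls)
{
	FakeInterface interface;
	for (int i = 0; i < calls; i++)
		set_option_balanced(interface);
	return interface.ref_count;
}
```

In this sketch each leaky call leaves the count one higher than before, so the object can never drop back to its baseline, while the balanced variant always returns to the starting count.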

Last edited 12 years ago by jackburton

comment:3 by axeld, 12 years ago

This is definitely a bug that should be fixed. Not sure if it's the culprit for this ticket, but it may be. I may have time to look into this in about a week.

comment:4 by umccullough, 12 years ago

Priority: normal → critical

comment:5 by umccullough, 12 years ago

Priority: critical → high

Sorry for ticket spam, I meant to set that High :(

in reply to:  3 comment:6 by jackburton, 12 years ago

Replying to axeld:

This is definitely a bug that should be fixed. Not sure if it's the culprit for this ticket, but it may be. I may have time to look into this in about a week.

In fact, I just tried and it doesn't fix the issue in question. Oh well, at least I found another problem.

comment:7 by axeld, 12 years ago

You could have committed it directly, though :-) I've done so now in hrev45154, thanks!

in reply to:  7 comment:8 by jackburton, 12 years ago

Replying to axeld:

You could have committed it directly, though :-) I've done so now in hrev45154, thanks!

Yeah, but when I finally got access to the development machine and saw that it didn't fix the bug in question, I lost a bit of enthusiasm. :)

comment:9 by axeld, 12 years ago

I reproduced this issue as well using the debug heap, but unfortunately, this didn't help. I did not get a nice stack trace out of this, either.

I've dug in a little, and it looks to me like the hash table has been corrupted somehow: the crash does not always happen at the same spot, but jumps around between different hash table methods. This might mean either that the multicast implementation uses the hash incorrectly, or that the hash implementation itself is at fault.

in reply to:  9 comment:10 by jackburton, 12 years ago

Replying to axeld:

I reproduced this issue as well using the debug heap, but unfortunately, this didn't help. I did not get a nice stack trace out of this, either.

I've dug in a little, and it looks to me like the hash table has been corrupted somehow: the crash does not always happen at the same spot, but jumps around between different hash table methods. This might mean either that the multicast implementation uses the hash incorrectly, or that the hash implementation itself is at fault.

Now that you said that... CID 610870 and CID 610871 are about the multicast implementation and the use of the hash table. I'm not very familiar with that code, so I haven't checked if they reference actual problems, but they might be worth checking.

comment:11 by jackburton, 12 years ago

This should be fixed in hrev45220. Could someone confirm?

comment:12 by jackburton, 12 years ago

Resolution: fixed
Status: new → closed

I'm closing this: I tested on three different systems and can't reproduce it anymore. hrev45220 indeed fixed this.
