#12208 closed bug (fixed)
[Network Kit] DHCP is broken since hrev49401
Reported by: | diver | Owned by: | pulkomandy |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta1 |
Component: | Kits/Network Kit | Version: | R1/Development |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | #11275, #12223, #12235 |
Platform: | All |
Attachments (9)
Change History (59)
follow-up: 17 comment:1 by , 9 years ago
comment:5 by , 9 years ago
DHCP is broken here on real hardware. Manually setting network preferences in terminal does not work either.
comment:6 by , 9 years ago
This is what I get when I switch from Bridged mode to NAT:
DAEMON 'DHCP': /dev/net/ipro1000/0: Send DHCP_REQUEST for 10.0.2.15 to 255.255.255.255:67
KERN: [ipro1000] (lem) Link is up 1000 Mbps Full Duplex
KERN: /dev/net/ipro1000/0: media change, media 0x900030 quality 1000 speed 1000000000
DAEMON 'DHCP': /dev/net/ipro1000/0: Received DHCP_NACK from 192.168.0.1
DAEMON 'DHCP': /dev/net/ipro1000/0: Send DHCP_DISCOVER to 255.255.255.255:67
DAEMON 'DHCP': /dev/net/ipro1000/0: Received DHCP_OFFER from 192.168.0.1
DAEMON 'DHCP': your_address: 192.168.0.6
DAEMON 'DHCP': server: 192.168.0.1
DAEMON 'DHCP': lease time: 25200 seconds
DAEMON 'DHCP': nameserver[0]: 192.168.0.1
DAEMON 'DHCP': gateway: 192.168.0.1
DAEMON 'DHCP': subnet: 255.255.255.0
DAEMON 'DHCP': UNKNOWN OPTION 252 (0xfc)
DAEMON 'DHCP': /dev/net/ipro1000/0: Send DHCP_REQUEST for 192.168.0.6 to 255.255.255.255:67
DAEMON 'DHCP': /dev/net/ipro1000/0: Timeout shift: 8 secs (try 1)
DAEMON 'DHCP': /dev/net/ipro1000/0: Send DHCP_REQUEST for 192.168.0.6 to 255.255.255.255:67
DAEMON 'DHCP': /dev/net/ipro1000/0: Timeout shift: 16 secs (try 2)
DAEMON 'DHCP': /dev/net/ipro1000/0: Send DHCP_REQUEST for 192.168.0.6 to 255.255.255.255:67
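The two "Timeout shift" entries in the log above (8 seconds on try 1, 16 seconds on try 2) are consistent with a retry delay that doubles on every attempt. A minimal sketch of such a schedule follows. The function name, the 4-second base and the 64-second cap are assumptions loosely borrowed from the retransmission guidance in RFC 2131; they were not read from Haiku's DHCPClient.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not taken from Haiku's sources): seconds to wait
// before retry number `attempt`, doubling each time and capped at a maximum.
inline uint32_t
RetryDelaySeconds(int attempt)
{
	assert(attempt >= 0);

	const uint32_t kBaseSeconds = 4;	// assumed base delay
	const uint32_t kMaxSeconds = 64;	// assumed upper bound

	// Clamp the shift so the delay never exceeds the cap (4 << 4 == 64).
	const int shift = std::min(attempt, 4);
	return std::min<uint32_t>(kBaseSeconds << shift, kMaxSeconds);
}
```

With these assumed constants, RetryDelaySeconds(1) and RetryDelaySeconds(2) give 8 and 16, matching the two values in the log; anything beyond that is extrapolation.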
comment:7 by , 9 years ago
The network connection is broken because the /boot/system/settings/network/services file is empty. I replaced it with one I copied from a working Haiku partition and restored network functionality.
comment:8 by , 9 years ago
After a reboot, I lost network again. Checked my settings - everything looks OK. As soon as I disabled the SSH server, network started working again.
comment:9 by , 9 years ago
Reporting my comments from IRC here: in bridged mode, the DHCP is handled by whatever DHCP server is installed on your real network. In NAT mode, it is handled by vbox.
I suspect there is a timing problem or something similar making DHCP not work well in certain configurations. For example, with one laptop I can connect to my home wifi network, but not with another. And this second laptop will work on some other wifi networks.
A trace of what gets sent and received on the network (you can use tcpdump in Haiku, and tcpdump or wireshark on the host) would be useful. Also, more information about your network: why is Haiku trying the IP 10.0.2.15? Apparently your DHCP server is in 192.168.0.x, we adjust to this, but then the server seems to be ignoring our DHCP_REQUESTs? Or is it that the reply of the server is lost somewhere on the way?
comment:10 by , 9 years ago
10.0.2.15 is an address VirtualBox DHCP server (10.0.2.2) is assigning to Haiku guest in NAT mode. 192.168.0.6 is an address my router (192.168.0.1) assigned to Haiku while it was still in bridged mode.
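Read together with the log in comment 6, this matches the usual DHCP pattern: the client first re-requests the address it remembers, the server on the new network rejects it with a NAK, and the client starts over with a DISCOVER. A minimal sketch of those transitions, using state names from RFC 2131. The enum and function names are made up for illustration and are not Haiku's implementation.

```cpp
// Simplified subset of the RFC 2131 client states.
enum class DhcpState { Init, Selecting, Requesting, Rebooting, Bound };

// Server replies relevant to this sketch.
enum class DhcpReply { Offer, Ack, Nak };

// Hypothetical transition function: returns the state the client moves to
// after receiving `reply` while in `state`.
inline DhcpState
NextState(DhcpState state, DhcpReply reply)
{
	switch (state) {
		case DhcpState::Selecting:
			// An OFFER lets the client send a REQUEST for the offered address.
			return reply == DhcpReply::Offer ? DhcpState::Requesting : state;

		case DhcpState::Requesting:
		case DhcpState::Rebooting:
			if (reply == DhcpReply::Ack)
				return DhcpState::Bound;
			// A NAK discards the requested address; discovery restarts.
			if (reply == DhcpReply::Nak)
				return DhcpState::Init;
			return state;

		default:
			return state;
	}
}
```

In terms of the log, the REQUEST for 10.0.2.15 corresponds to Rebooting, the NACK sends the client back to Init, and the following DISCOVER, OFFER and REQUEST walk through Selecting and Requesting. The client never reaches Bound because no ACK arrives.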
comment:11 by , 9 years ago
I did a fresh install of hrev49408_gcc2 to a hard drive partition. Attached is a syslog of the first boot after installation. Attached is a listdev of my hardware. Attached screenshot1, screenshot2 and screenshot3 are of the Network Preferences post install. Attached screenshot4 shows the contents of the /boot/system/settings/network directory. Note that "interfaces" and "resolv.conf" are both missing.
by , 9 years ago
by , 9 years ago
by , 9 years ago
Attachment: | screenshot1.png added |
---|
by , 9 years ago
Attachment: | screenshot2.png added |
---|
by , 9 years ago
Attachment: | screenshot3.png added |
---|
by , 9 years ago
Attachment: | screenshot4.png added |
---|
comment:12 by , 9 years ago
I can restore network function by copying resolv.conf from a working Haiku partition, then executing the following commands in terminal:
ifconfig /dev/net/ipro1000/0 192.168.1.67
route add /dev/net/ipro1000/0 default gw 192.168.1.254
Now, the "interfaces" file magically appears in the network settings directory (screenshot5)
by , 9 years ago
Attachment: | screenshot5.png added |
---|
comment:13 by , 9 years ago
After a reboot network is lost again. Opened tcpdump in terminal (screenshot6)
by , 9 years ago
Attachment: | screenshot6.png added |
---|
comment:14 by , 9 years ago
comment:15 by , 9 years ago
Yes, but why? The DHCP client code does not use getaddrinfo, so how can it be affected by this?
A Wireshark capture in either case (working/not working) may help in understanding the difference.
comment:17 by , 9 years ago
Replying to luroh:
Fwiw, a gcc2h hrev49404 build works fine here in VBox 5.0.0.
FWIW, on gcc2h hrev49404 in bridged mode with VirtualBox 5, I get the same problem as reported here. NAT mode works fine for me.
Furthermore, I was unable to define a static IP. I can set it, but no network connection is picked up despite that.
I also have seen similar behavior to Pulkomandy's comments, where it seems to sometimes work on WiFi, and sometimes not. This is with the same WiFi network, which is fine for all other devices and OSes.
comment:18 by , 9 years ago
I find if you try to enter a static IP address into the network preferences and hit "apply" the settings do not take. However, if you set it up from the terminal, all is well. Don't switch to DHCP or your settings will be erased again upon reboot.
follow-up: 20 comment:19 by , 9 years ago
Can you explain how to set it up from the terminal? So, what, exactly, is causing this issue, if anyone knows? How did it happen?
comment:20 by , 9 years ago
Replying to Luposian:
Can you explain how to set it up from the terminal? So, what, exactly, is causing this issue, if anyone knows? How did it happen?
See the source activity log for hrev49293, where they switched from libbind to netresolv. I believe Axel will be looking into it once the issues with the new launch daemon are sorted out. In the meantime you can use hrev49292, which was the last build before the change.
From the terminal:
ifconfig interface address e.g. ifconfig /dev/net/ipro1000/0 192.168.1.67
route add interface default gw destination address e.g. route add /dev/net/ipro1000/0 default gw 192.168.1.254
You will also need to add your DNS server address(es) to /boot/system/settings/network/resolv.conf
comment:21 by , 9 years ago
We've found the bug, and it's a problem in the DHCP client. How this ever worked we have no clue at all. Fix coming soon...
comment:22 by , 9 years ago
Blocking: | 12235 added |
---|
comment:23 by , 9 years ago
waddlesplash: did you really find the bug?
We discussed this on IRC with kallisti5, but I could not get him to upload a capture of the only useful packet: the DHCP OFFER from the server. So, everything is only speculation so far.
We need a complete capture of all the packets in the DHCP negotiation. tcpdump only shows the headers with the default settings, which is not enough. Wireshark screenshots like kallisti5 did are better, but he missed one of the packets (which comes from the server).
comment:24 by , 9 years ago
It fails on NACK on my machine, OpenWRT router.
DAEMON 'DHCP': /dev/net/iprowifi4965/0: Send DHCP_DISCOVER to 255.255.255.255:67
DAEMON 'DHCP': /dev/net/iprowifi4965/0: Received DHCP_OFFER from 192.168.9.1
DAEMON 'DHCP': your_address: 192.168.9.210
DAEMON 'DHCP': server: 192.168.9.1
DAEMON 'DHCP': lease time: 43200 seconds
DAEMON 'DHCP': renewal time: 21600 seconds
DAEMON 'DHCP': rebinding time: 37800 seconds
DAEMON 'DHCP': subnet: 255.255.255.0
DAEMON 'DHCP': broadcast: 192.168.9.255
DAEMON 'DHCP': gateway: 192.168.9.1
DAEMON 'DHCP': nameserver[0]: 192.168.9.1
DAEMON 'DHCP': domain name: "lan"
DAEMON 'DHCP': /dev/net/iprowifi4965/0: Send DHCP_REQUEST for 192.168.9.210 to 255.255.255.255:67
DAEMON 'DHCP': /dev/net/iprowifi4965/0: Received DHCP_NACK from 192.168.9.1
comment:25 by , 9 years ago
Again, a complete capture of all the packets (DISCOVER, OFFER, REQUEST, and NACK/ACK if any) would be helpful. It can't be that hard to capture 4 ethernet packets with wireshark and upload the pcap file?
comment:26 by , 9 years ago
And here is the DHCP server log:
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 available DHCP range: 192.168.9.100 -- 192.168.9.249
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 client provides name: shredder
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 DHCPDISCOVER(br-lan) c4:85:08:45:36:ca
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 tags: lan, br-lan
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 DHCPOFFER(br-lan) 192.168.9.210 c4:85:08:45:36:ca
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 requested options: 1:netmask, 3:router, 6:dns-server, 28:broadcast,
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 requested options: 15:domain-name
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 next server: 192.168.9.1
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 1 option: 53 message-type 2
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 4 option: 54 server-identifier 192.168.9.1
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 4 option: 51 lease-time 12h
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 4 option: 58 T1 6h
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 4 option: 59 T2 10h30m
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 4 option: 1 netmask 255.255.255.0
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 4 option: 28 broadcast 192.168.9.255
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 4 option: 3 router 192.168.9.1
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 4 option: 6 dns-server 192.168.9.1
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 sent size: 3 option: 15 domain-name lan
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 available DHCP range: 192.168.9.100 -- 192.168.9.249
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 client provides name: shredder
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 DHCPREQUEST(br-lan) 192.168.9.210 c4:85:08:45:36:ca
Jul 27 17:59:51 dnsmasq-dhcp[3318]: 204252940 DHCPNAK(br-lan) 192.168.9.210 c4:85:08:45:36:ca wrong server-ID
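The last entry of the server log shows dnsmasq rejecting the DHCPREQUEST with "wrong server-ID", i.e. the server identifier (option 54) carried in the request did not match the server's own address. A rough sketch of locating that option in a raw options field and comparing it follows. The function names are hypothetical, and the layout used (one code byte, one length byte, then the data, with 0 as padding and 255 as end marker) is the standard DHCP option encoding, not something taken from dnsmasq or Haiku code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Walks a DHCP options field and returns a pointer to the data of the first
// option with the given code, or nullptr if it is absent or truncated.
inline const uint8_t*
FindOption(const uint8_t* options, size_t size, uint8_t code, uint8_t& length)
{
	assert(options != nullptr || size == 0);

	size_t pos = 0;
	while (pos < size) {
		const uint8_t current = options[pos];
		if (current == 255)		// end marker
			break;
		if (current == 0) {		// pad byte, has no length field
			pos++;
			continue;
		}
		if (pos + 1 >= size)	// length byte missing
			break;
		const uint8_t optionLength = options[pos + 1];
		if (pos + 2 + optionLength > size)	// data runs past the buffer
			break;
		if (current == code) {
			length = optionLength;
			return options + pos + 2;
		}
		pos += 2 + size_t(optionLength);
	}
	return nullptr;
}

// True if option 54 is present and equals the 4 address bytes in `own`.
inline bool
ServerIdMatches(const uint8_t* options, size_t size, const uint8_t own[4])
{
	uint8_t length = 0;
	const uint8_t* data = FindOption(options, size, 54, length);
	return data != nullptr && length == 4 && std::memcmp(data, own, 4) == 0;
}
```

A request whose option 54 is missing, or holds an address other than 192.168.9.1, would fail this comparison, which is the kind of condition the log line reports.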
(Pulkomandy, you are out of the loop. Please be nice instead of just polluting bug reports. The bugs are for logging info on an issue. We are working on it in IRC.)
comment:27 by , 9 years ago
Guys, the cast is broken... here's proof:
diff --git a/src/servers/net/DHCPClient.cpp b/src/servers/net/DHCPClient.cpp
index b3d4c1a..d6d1144 100644
--- a/src/servers/net/DHCPClient.cpp
+++ b/src/servers/net/DHCPClient.cpp
@@ -766,6 +766,8 @@ DHCPClient::_ParseOptions(dhcp_message& message, BMessage& address,
 				syslog(LOG_DEBUG, " server: %s\n",
 					_AddressToString(data).String());
 				fServer.SetAddress(*(in_addr_t*)data);
+				syslog(LOG_DEBUG, " server set: %s\n",
+					fServer.ToString().String());
 				break;
 
 			case OPTION_ADDRESS_LEASE_TIME:
result attached
comment:28 by , 9 years ago
As I said on IRC, it doesn't prove anything yet. There is also BNetworkAddress involved. The only thing wrong with this cast is that there is a special place in hell reserved for people using C-style casts in C++ code ;)
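For reference, the cast under discussion reads four bytes out of the option buffer as an in_addr_t. A variant that avoids the C-style cast, and any alignment assumption about the buffer, copies the bytes instead. This is only a sketch: ReadAddress is a hypothetical helper, and uint32_t stands in for in_addr_t to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies 4 bytes from an option buffer into a 32-bit value without casting
// the pointer. The bytes stay in network order, exactly as they were stored.
inline uint32_t
ReadAddress(const uint8_t* data)
{
	assert(data != nullptr);

	uint32_t address;
	std::memcpy(&address, data, sizeof(address));
	return address;
}
```

On platforms where the original cast works, this yields the same value, so it is a style and portability change rather than a fix for the behaviour reported in this ticket.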
comment:29 by , 9 years ago
It looks like BNetworkAddress::SetTo() and/or the constructor may be the culprit. The DHCP client constructs fServer with AF_INET, a nullptr host and the DHCP port (cf. http://cgit.haiku-os.org/haiku/tree/src/servers/net/DHCPClient.cpp#n430 ). Then it expects SetAddress() never to fail. However, SetAddress() expects the address family to be set properly, which may not happen if the insides of SetTo() try to be smarter than they are expected to be. Someone with more free time will have to investigate this more closely.
comment:31 by , 9 years ago
Changing:
fServer.SetAddress(*(in_addr_t*)data);
to
fServer.SetTo(*(in_addr_t*)data, DHCP_SERVER_PORT);
works around the issue. BNetworkAddressResolver doesn't seem to work the same as before. getaddrinfo() will always return an error for a NULL host, so it forgets that the family and port have been set in the DHCPClient constructor. Other than that, BNetworkAddress and BNetworkAddressResolver seem quite complex and do duplicate InitCheck() calls.
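To illustrate the failure mode described here, the following toy model shows an address object whose SetAddress() does nothing when the family was never stored, while SetTo() with an explicit address and port always leaves it fully initialized. MockAddress and all of its members are invented for this sketch; it mirrors the behaviour described in comments 29 and 31 and is not the real BNetworkAddress.

```cpp
#include <cstdint>

// Toy stand-in for an address class; NOT Haiku's BNetworkAddress.
class MockAddress {
public:
	// Models construction where the host lookup failed and, as a side
	// effect, the requested family and port were not stored.
	MockAddress()
		: fFamilySet(false), fAddress(0), fPort(0)
	{
	}

	// Changes only the address; refuses if no family was set earlier.
	bool SetAddress(uint32_t address)
	{
		if (!fFamilySet)
			return false;
		fAddress = address;
		return true;
	}

	// Sets family, address and port together, so it cannot be affected by
	// an earlier failed initialization.
	void SetTo(uint32_t address, uint16_t port)
	{
		fFamilySet = true;
		fAddress = address;
		fPort = port;
	}

	bool IsValid() const { return fFamilySet; }
	uint32_t Address() const { return fAddress; }
	uint16_t Port() const { return fPort; }

private:
	bool		fFamilySet;
	uint32_t	fAddress;
	uint16_t	fPort;
};
```

In this model, calling SetAddress() on a freshly constructed object leaves the stored address at zero, whereas SetTo(address, 67) makes every later call succeed. That is the difference the workaround relies on.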
comment:32 by , 9 years ago
The only thing changed with hrev49401 is the BNetworkAddressResolver backend -- IOW the error (maybe just now revealed) has something to do with that, as it used to work just fine before.
BTW we do have unit tests at least for BNetworkAddress. Please, before you do anything else, enhance those tests to reproduce the bug you are experiencing.
comment:34 by , 9 years ago
The unit tests are run inside the build tree, they are not installable.
Start with a Haiku install with a git checkout of the Haiku sources.
jam -q unittests
will compile the unitTester executable and all the tests as add-ons for it. You may need to add a symlink to libcppunit.so (also built during this process) next to the unitTester executable (IIRC in generated/tests), then run for example unitTester network (the unitTester executable has a help message and can list the available tests).
comment:35 by , 9 years ago
My development environment is on Linux, so if it doesn't work there or can't be installed, I don't think I'll be able to do this.
comment:36 by , 9 years ago
How would that be possible? They are unit tests for Haiku, not Linux, so they have to be run from Haiku. They do, of course, also work in a VM, though. So if you are Linux only, you would need to run Haiku in a VM, check out the source tree, and build the unit tests there.
That's how it'll run on our buildbots, too, once we manage to integrate it.
comment:37 by , 9 years ago
They could be built on Linux and included in the generated image so you can run them. Actually it would make sense to do it that way on the buildbots.
comment:38 by , 9 years ago
My point exactly. I run the Haiku build in a VM, so why can't I cross-compile tests like any other app? I'll mess a bit with unit tests, but I'll leave this for anyone with an already working environment for this. I don't want several trees, and for ACPI and EFI my current setup is needed.
comment:39 by , 9 years ago
So this in build/jam/UserBuildConfig works for running this bug's unit test:
AddFilesToHaikuImage home tests : UnitTester ;
AddFilesToHaikuImage home tests lib : libcppunit.so libnetapitest.so ;
comment:40 by , 9 years ago
Wrote some unit tests for these cases, but they work fine. Puckipedia suggested it is because the VM has an internet connection. So: with the network disabled in QEMU (-net none), the unit tests take forever and more tests fail. With the network interface disabled in Haiku, more tests fail, but instantly.
Seems to match that some report hanging and some (me) report failed init check.
Not something I'm capable of fixing.
comment:41 by , 9 years ago
Blocking: | 11275 added |
---|
comment:42 by , 9 years ago
With hrev49476, DHCP is now busted in VirtualBox 100% of the time for me. Progress!
comment:43 by , 9 years ago
Actually, wait, no it isn't. It does stick on "Configuring" with the DHCP status page showing nothing, but after a lot of gunk from DHCP goes through the syslog, it works. Weird...?
comment:46 by , 9 years ago
I did a pkgman full-sync to hrev49477 last night (on real hardware) and I still do not have DHCP. I'll download an anyboot image tonight and do a fresh install to another hard drive partition to see if it makes a difference.
follow-up: 49 comment:47 by , 9 years ago
If you still have problems, please reopen a ticket, as the issue reported here was fixed. Please include all the relevant information:
- IP address of the server
- Syslog
- Wireshark capture of the DHCP packets (you can type "bootp" in the filter field in Wireshark and press "Apply" to get only the DHCP packets) - this can be done from any Linux/Windows/OSX system on the same network since the DHCP packets are sent in broadcast.
comment:49 by , 9 years ago
Replying to pulkomandy:
If you still have problems, please reopen a ticket, as the issue reported here was fixed. Please include all the relevant information:
- IP address of the server
- Syslog
- Wireshark capture of the DHCP packets (you can type "bootp" in the filter field in Wireshark and press "Apply" to get only the DHCP packets) - this can be done from any Linux/Windows/OSX system on the same network since the DHCP packets are sent in broadcast.
Thanks for the detailed and concise instructions on how to troubleshoot any potential network issues. I'll give the latest nightly a try. Hopefully, I won't have to report anything further.
comment:50 by , 9 years ago
Verified working with a fresh install of hrev49477 x86_gcc2. Strange that a pkgman update to the same revision did not work.