Opened 15 months ago
Closed 14 months ago
#18550 closed bug (duplicate)
dec21xxx Network driver is broken since GCC upgrade.
| Reported by: | jmairboeck | Owned by: | nobody |
|---|---|---|---|
| Priority: | normal | Milestone: | Unscheduled |
| Component: | Drivers/Network | Version: | R1/Development |
| Keywords: | | Cc: | The_Ringmaster, korli |
| Blocked By: | | Blocking: | #18593 |
| Platform: | All | | |
Description
This is hrev57208.
Since the upgrade to GCC 13, the dec21xxx network driver is broken. This driver is used as the "legacy" network adapter in Hyper-V.
It results in the following KDL on boot:
PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0
Welcome to Kernel Debugging Land...
Thread 793 "fbsd callout" running on CPU 0
stack trace for thread 793 "fbsd callout"
    kernel stack: 0xffffffff81bdc000 to 0xffffffff81be1000
frame caller <image>:function + offset
 0 ffffffff81be0928 (+ 24) ffffffff8014955c <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
 1 ffffffff81be0940 (+ 80) ffffffff800b2e08 <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff81be0990 (+ 96) ffffffff800b44b4 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4
 3 ffffffff81be09f0 (+ 80) ffffffff800b484e <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff81be0a40 (+ 240) ffffffff800b4ba7 <kernel_x86_64> panic + 0xb7
 5 ffffffff81be0b30 (+ 856) ffffffff8014ae1c <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff81be0e88 (end = 0xffffffff81be0f50)
 rax 0x2842022 rbx 0xffffffff821d3c00 rcx 0x0
 rdx 0xffffffff815d3000 rsi 0xffffffff815d3030 rdi 0xffffffff94bd3d6e
 rbp 0xffffffff81be0f70 r8 0xfffffffffffffffd r9 0xfffffffffffffffc
 r10 0xfffffffffffffff8 r11 0x85 r12 0xffffffff94bd3d10
 r13 0xffffffff9495172e r14 0x7fffffffffffffff r15 0xffffffff821d4168
 rip 0xffffffff81bb5880 rsp 0xffffffff81be0f50 rflags 0x10286
 vector: 0xd, error code: 0x0
 6 ffffffff81be0e88 (+ 232) ffffffff81bb5880 </boot/system/add-ons/kernel/drivers/dev/net/dec21xxx> tulip_txprobe.isra.0 + 0x110
 7 ffffffff81be0f70 (+ 64) ffffffff81bc1b2e </boot/system/add-ons/kernel/drivers/dev/net/dec21xxx> callout_thread(void*) + 0x11e
 8 ffffffff81be0fb0 (+ 32) ffffffff8008bd77 <kernel_x86_64> common_thread_entry(void*) + 0x37
 9 ffffffff81be0fd0 (+2118250544) ffffffff81be0fe0 12388:fbsd callout_793_kstack@0xffffffff81bdc000 + 0x4fe0
kdebug>
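For readers decoding the trace: vector 0xd is the x86 general protection fault. Two common (but by no means the only) ways to get one with error code 0 are a memory access through a non-canonical address and an aligned SSE load/store on a memory operand that is not 16-byte aligned. The following is only a sketch of those two checks. It assumes 48-bit virtual addresses, the helper names are made up for illustration, and the register values are used purely as sample inputs; nothing here establishes which operand actually faulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (4-level paging), so bits 63..47
 * must all be equal for an address to be canonical. */
bool is_canonical_48(uint64_t addr)
{
	uint64_t upper = addr >> 47;	/* the 17 bits from bit 47 to bit 63 */
	return upper == 0 || upper == 0x1FFFF;
}

/* Aligned SSE moves (e.g. movaps, movdqa) need a 16-byte aligned
 * memory operand. */
bool is_sse_aligned(uint64_t addr)
{
	return (addr & 0xF) == 0;
}

/* Sample inputs copied from the register dump above; which register (if
 * any) was used as the faulting memory operand is NOT known from the trace. */
void check_sample_registers(void)
{
	assert(is_canonical_48(0xffffffff81bb5880ULL));		/* rip */
	assert(!is_canonical_48(0x7fffffffffffffffULL));	/* r14 */
	assert(is_sse_aligned(0xffffffff815d3030ULL));		/* rsi */
	assert(!is_sse_aligned(0xffffffff94bd3d6eULL));		/* rdi */
}
```

The disassembly around the faulting instruction is still needed to tell which case, if either, applies.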
This was already reported in ticket:18541:9 by The_Ringmaster, but this should be a separate ticket. I just encountered the same problem.
Note: the dec21xxx driver is missing from the component selection in Trac.
Attachments (2)
Change History (29)
comment:1 by , 15 months ago
by , 15 months ago
Attachment: | bootlog.txt added |
---|
comment:3 by , 15 months ago
Using a self-compiled version of dec21xxx with -O0 for the driver and libfreebsd_network.a boots successfully.
I used the host compiler (configure without arguments), which is currently gcc 13.1 (2023_06_20).
comment:4 by , 15 months ago
Apparently, using -O0 only for dec21xxx is sufficient. It is not needed for freebsd_network. I suspected that already, because the same system works fine using ipro1000 in VirtualBox. (This is still my "portable" Haiku system, which was once bare metal but now is only a hard disk on a USB adapter because that laptop died a few years ago ...)
by , 15 months ago
Attachment: | bootlog-O1.txt added |
---|
comment:5 by , 15 months ago
With -O1 I get an SMEP violation and page faults, but not anywhere in dec21xxx (see attached log). I continued a few times until no new panics appeared.
I suspect that the scheduler-related panics are an expected consequence of pausing in the kernel debugger.
comment:6 by , 15 months ago
The driver apparently works with -O1 when compiled with gcc 13.2. However, the original "General Protection Exception" KDL still occurs identically.
comment:8 by , 15 months ago
(At the KDL prompt, I mean, so we can see the faulting instruction and disassembly.)
comment:9 by , 15 months ago
kdebug> dis -b 6
[*READ/WRITE FAULT (?), pc: 0xffffffff800b9bf2 *]
kdebug>
comment:11 by , 15 months ago
I just noticed that the de driver, which is apparently the part that is used in Hyper-V, has been deprecated and was removed from FreeBSD 13. See https://github.com/freebsd/fcp/blob/master/fcp-0101.md
What does this mean for Haiku?
Note that dc (which is also contained in the same Haiku driver) is not affected by this.
comment:12 by , 15 months ago
For now, FreeBSD's APIs haven't changed too much, and so it's not hard to keep around. Hopefully that won't change in the future.
Can you poke around in KDL and try and find out what (if anything) is at the fault memory address, i.e. what area it's in?
comment:13 by , 15 months ago
I tried compiling just the tulip_txprobe function with O1 or O0 (using #pragma GCC optimize). This "fixes" the General Protection Exception, but I get the SMEP violation and page faults instead in both cases.
Trying to find the area of the involved addresses (with area contains <address>) just gives the same READ/WRITE FAULT message as before.
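For reference, this is roughly what a per-function optimization override looks like in GCC. It is a minimal sketch, not the change that was actually made to if_de.c: the functions below are hypothetical stand-ins, and the GCC manual itself describes the optimize attribute as intended for debugging rather than production use, so not every option necessarily takes effect at function granularity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver routine; NOT the real tulip_txprobe. */
#pragma GCC push_options
#pragma GCC optimize ("O0")	/* applies to functions defined from here on */
uint32_t sum_words_o0(const uint32_t *words, int count)
{
	uint32_t sum = 0;

	assert(words != 0 || count == 0);
	for (int i = 0; i < count; i++)
		sum += words[i];
	return sum;
}
#pragma GCC pop_options		/* restore the command-line optimization level */

/* The same idea expressed as a function attribute instead of a pragma. */
__attribute__((optimize("O1")))
uint32_t sum_words_o1(const uint32_t *words, int count)
{
	uint32_t sum = 0;

	assert(words != 0 || count == 0);
	for (int i = 0; i < count; i++)
		sum += words[i];
	return sum;
}
```

Both forms only change how the marked functions are compiled; callers and the rest of the file keep the level given on the command line.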
comment:14 by , 15 months ago
Applying -O1 or -O0 to the whole file if_de.c doesn't help either. The outcome is the same as above.
comment:15 by , 15 months ago
It's possible that something is trying to jump to an invalid address, though I've no idea how that could be the case here.
I'll set aside some time to look through the disassembly under O1 vs O2 and see if anything jumps out.
follow-up: 19 comment:16 by , 15 months ago
> Applying -O1 or -O0 to the whole file if_de.c doesn't help either.
Where did you apply it? Before, or after the includes block? If after, please try before.
If that makes no difference, please also apply it to glue_de.c. If applying it to both still causes the SMEP violation, but the same flag specified on the command line fixes the problem, then I would suspect the flag is not getting applied correctly.
comment:17 by , 15 months ago
So, inspecting the disassembly, the primary difference between -O2 and -O1 is that the -O2 version uses SSE2 registers and instructions, probably most notably pshufd. Thus it may be interesting to try -mno-sse2 and maybe even also -mno-sse on the whole file.
comment:18 by , 15 months ago
Cc: | added |
---|
CC korli: both this and #18541 have XSAVEC enabled, perhaps that or something else related to FPU state is involved?
(It appears my VMware setup, which has an ipro1000 device, also has XSAVEC enabled but there are no problems there.)
comment:19 by , 15 months ago
Replying to waddlesplash:
> > Applying -O1 or -O0 to the whole file if_de.c doesn't help either.
>
> Where did you apply it? Before, or after the includes block? If after, please try before.
> If that makes no difference, please also apply it to glue_de.c. If applying it to both still causes the SMEP violation, but the same flag specified on the command line fixes the problem, then I would suspect the flag is not getting applied correctly.
I added CCFLAGS on [ FGristFiles if_de.o ] = -O0 ; just before the SubDirCcFlags line. I checked with jam -dx and the flag seemed to be applied correctly. I'll try adding glue_de too.
comment:20 by , 15 months ago
Now this is weird: I tried adding -O0 to SubDirCcFlags again (adding it just for the glue file didn't work), and now that doesn't work any more either. This did work before (see above). I still get the SMEP violation.
comment:21 by , 15 months ago
Now I booted it again and it works! It seems like the SMEP violation doesn't always occur (which makes debugging it quite a bit harder ...).
comment:22 by , 14 months ago
Blocking: | 18593 added |
---|
comment:25 by , 14 months ago
A freshly downloaded image of hrev57287 did boot successfully in Hyper-V (app_server started and it showed the FirstBootPrompt). However, I couldn't test anything else yet because capturing the mouse in Hyper-V doesn't work over Remote Desktop apparently.
comment:26 by , 14 months ago
I tested this again today and it works now. The network is still very slow, however.
comment:27 by , 14 months ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Well, that's a separate problem.
The_Ringmaster: I find it odd that you're still getting a KDL ... are you sure you upgraded past the fix revision? If so, and it still happens, then you can open a new ticket I suppose, or we can reopen your old one (if it's really the same.)
The_Ringmaster originally posted a screenshot [here](https://discuss.haiku-os.org/t/kdl-crash-when-installing-anything-via-pkgman/13789) on the forum, where he says that it went away. Since you tested with current nightly, it seems not.