Opened 11 years ago

Closed 6 years ago

Last modified 6 years ago

#9099 closed bug (fixed)

Unable to boot (KDL) with Atheros L1 Gigabit Ethernet (built in ASUS M51SN laptop)

Reported by: kvark Owned by: 3dEyes
Priority: normal Milestone: R1
Component: Drivers/Network/attansic_l1 Version: R1/Development
Keywords: atheros, ethernet, KDL Cc:
Blocked By: Blocking: #6694, #9179
Platform: x86

Description

Laptop: ASUS M51SN. Haiku crashes during boot inside the net_server code while initializing the Atheros L1 Ethernet card. Safe mode didn't work either: it hangs halfway through the boot (no KDL, though), so I collected the hardware info from Linux. See attachments.

Attachments (6)

Stack-page1.JPG (3.3 MB) - added by kvark 11 years ago.
Stack trace - page 1
Stack-page2.JPG (3.2 MB) - added by kvark 11 years ago.
Stack trace - page 2
lspci.txt (2.4 KB) - added by kvark 11 years ago.
Linux lspci output
lshw.txt (20.7 KB) - added by kvark 11 years ago.
Linux lshw output
listdev.txt (3.9 KB) - added by kvark 11 years ago.
listdev log
syslog.txt (494.4 KB) - added by kvark 11 years ago.

Change History (20)

by kvark, 11 years ago

Attachment: Stack-page1.JPG added

Stack trace - page 1

by kvark, 11 years ago

Attachment: Stack-page2.JPG added

Stack trace - page 2

by kvark, 11 years ago

Attachment: lspci.txt added

Linux lspci output

by kvark, 11 years ago

Attachment: lshw.txt added

Linux lshw output

comment:1 by kvark, 11 years ago

I found some suspicious code in the Atheros Ethernet driver, in /haiku/src/add-ons/kernel/drivers/network/attansic_l1/dev/age/if_age.c, line 3045 (found while reading http://code.metager.de/source/xref/haiku/src/libs/compat/freebsd_network/fbsd_busdma_x86.c#792):

bus_dma_segment_t segs[1];

The calling function (age_newbuf) allocates a one-element array of segments on the stack, while the callee (bus_dmamap_load_mbuf_sg) is free to fill in several of them. That could easily cause a page fault in the stack area (which is what happened, as far as I can see).

I don't know what size this array should be; I would prefer that the maximum array size be passed down to the lower-level functions to prevent overwriting. Unfortunately, I can't build the Haiku source at the moment to test this, because this laptop doesn't boot at all (as described in this ticket).

comment:2 by kvark, 11 years ago

Component: changed from Servers/net_server to Drivers/Network/attansic_l1
Owner: changed from axeld to 3dEyes
Summary: changed from KDL on load with Atheros L1 Gigabit Ethernet (built in ASUS M51SN laptop) to Unable to boot (KDL) with Atheros L1 Gigabit Ethernet (built in ASUS M51SN laptop)

comment:3 by kvark, 11 years ago

This seems to be a duplicate of #6694. However, this one has more info about the hardware, as well as actual ideas for fixing it.

comment:4 by diver, 11 years ago

Blocking: 6694 added

(In #6694) Marking it as a duplicate, since #9099 has more info about the hardware, as well as actual ideas for fixing it.

comment:5 by kvark, 11 years ago

Priority: changed from high to normal

Correction: it boots just fine with the Safe Mode option checked (my app_server had been broken through my own fault). Moreover, I tried building the driver myself and replacing the system one with it, and it loads fine after that. The system was hrev44584 (x86gcc2hybrid), while the source is from approximately this August (sorry, no exact revision yet).

There are some interesting lines in the syslog (full version will follow):

KERN: [attansic_l1] (age) Read request size : 512 bytes.
KERN: [attansic_l1] (age) TLP payload size : 128 bytes.
KERN: bus_dmamem_alloc failed to align memory properly.

by kvark, 11 years ago

Attachment: listdev.txt added

listdev log

by kvark, 11 years ago

Attachment: syslog.txt added

comment:6 by kvark, 11 years ago

The code is from hrev43952 (output of "git describe --tags"), which is older than the live CD I used to install Haiku (hrev44584). The build configuration is the default one (is that x86gcc2hybrid?).

As for the line bus_dma_segment_t segs[1]: this is pretty much what we see in all the other Ethernet drivers, so the issue is unlikely to be there. Still, given the weird circumstances (the driver built from source works), it's difficult to say what exactly is wrong.

comment:7 by kvark, 11 years ago

It turns out that the driver doesn't work when built from source either. It just has a higher chance of loading without crashing, but the connection still says "No Link". I tried to fix several things; none of them helped:

1) Occasionally, the callback passed to bus_dmamap_load receives the EFBIG error code. The error basically means that the requested DMA block was not found. I tried reducing the size of the buffers we allocate (from 256 elements to 128), which cures the EFBIG error but still doesn't fix the driver.

2) The driver code also assumes that the callback is called immediately, which is a wrong assumption according to the FreeBSD documentation (http://www.freebsd.org/doc/en/books/arch-handbook/isa-driver-busmem.html). I tried deferring the query for the result, which seems to work (but still doesn't fix the driver).

3) Occasionally, bus_dmamap_load reports a warning that it failed to align the memory. Changing the alignment of the SMB and CMB blocks gets rid of the message but doesn't fix the driver.

4) Finally, there were reports on the internet of people having problems with the Atheros L1 under FreeBSD 7 (same symptom: No Link), and they claim it was fixed in FreeBSD 8. So I grabbed and built the latest code from the FreeBSD repository. This didn't help either.

I'm out of ideas for now. I'm going to try a FreeBSD live CD to see what messages appear in its log and compare them with Haiku's.

comment:8 by korli, 11 years ago

This looks somewhat related to #8454. I'll try to provide a patch fixing some oddities I ran into while testing #8454; it could help with attansic_l1 too.

comment:9 by korli, 11 years ago

Aligned page allocation should be OK as of hrev44773. I'm not sure it will help with attansic_l1, given that the driver only requests 1-byte alignment.

comment:10 by luroh, 11 years ago

Blocking: 7665 added

comment:11 by diver, 11 years ago

Blocking: 9179 added

comment:12 by mmlr, 7 years ago

This might be the same as #9601 and should be checked after the change in hrev50755.

comment:13 by waddlesplash, 6 years ago

Resolution: fixed
Status: changed from new to closed

No reply in 19 months, assuming fixed.

comment:14 by waddlesplash, 6 years ago

Blocking: 7665 removed