#14282 closed bug (fixed)

atheroswifi driver causes vm address fault at startup

Reported by: v.vill
Owned by: waddlesplash
Priority: normal
Milestone: Unscheduled
Component: Drivers/Network/atheroswifi
Version: R1/Development
Keywords:
Cc: waddlesplash
Blocked By:
Blocking:
Has a Patch: no
Platform: x86-64

Description (last modified by diver)

After upgrading to hrev52099, thus taking advantage of the updated atheroswifi driver (see #14249), the boot sequence ends in KDL:

KDL panic message & initial sc

I’m running Haiku x86_64 (EFI boot) with an AR9565 chipset wifi chip (vendor/device: 168c/036).

Attachments (4)

IMG_20180715_083643.jpg (1.1 MB ) - added by v.vill 15 months ago.
KDL stack trace
IMG_20180715_161645.jpg (879.4 KB ) - added by v.vill 15 months ago.
KDL panic message & initial sc
IMG_20180715_211112.jpg (1.3 MB ) - added by v.vill 15 months ago.
KDL syslog
IMG_20180715_211044.jpg (1.7 MB ) - added by v.vill 15 months ago.
KDL panic message & initial sc

Change History (23)

comment:1 by diver, 15 months ago

Panic message and stack trace is missing.

by v.vill, 15 months ago

Attachment: IMG_20180715_083643.jpg added

KDL stack trace

by v.vill, 15 months ago

Attachment: IMG_20180715_161645.jpg added

KDL panic message & initial sc

comment:2 by diver, 15 months ago

Description: modified (diff)

in reply to:  1 comment:3 by v.vill, 15 months ago

Replying to diver:

Panic message and stack trace is missing.

Indeed. I forgot to copy/paste my comment to #14249:

The KDL panic message is

PANIC: vm_page_fault: unhandled page fault in kernel space

Besides (not sure if it’s relevant), when booting by blacklisting the atheroswifi driver, I can see that the syslog includes many lines such as:

KERN: [atheroswifi] (ath_pci) ath_edma_rxfifo_alloc: Q1: alloc failed: i=1, nbufs=128?
KERN: [atheroswifi] (ath_pci) ath_edma_rxbuf_alloc: nothing on rxbuf?!

I’ve also typeset the KDL syslog as requested by waddlesplash:

	start_wlan: wlan started.
	atheroswifi: init_driver(0xffffffff81f155d0) at 12
	loaded driver /boot/system/add-ons/kernel/drivers/dev/net/atheroswifi
	sis19x:00.16.016:init_hardware::SiS19X:init_hardware()
	slab memory manager: created area 0xffffffffa5001000 (7631)
	[net/atheroswifi/0] compat_open(0x2)
	[net/atheroswifi/0] ieee80211_init
	[net/atheroswifi/0] start running, 1 vaps running
	[net/atheroswifi/0] ieee80211_start_locked: up parent
	[net/atheroswifi/0] start running, 1 vaps running
	[net/atheroswifi/0] ieee80211_new_state_locked: INIT -> SCAN (nrunning 0 nscanning 0)
	net/atheroswifi/0: media change, media 0x200a0 quality 1000 speed 1000000000
	vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0xffffffff81f78044, ip 0xffffffff81e06fb7, write 0, user 0, thread 0x437
	kdebug>
Last edited 15 months ago by v.vill
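
The final vm_page_fault line of the syslog packs the useful numbers (faulting address, instruction pointer, access type, thread) into a single string. Below is a minimal sketch of pulling those fields out so that reports can be compared. It assumes only the line format quoted in this ticket. `FAULT_RE` and `parse_page_fault` are hypothetical names used for illustration and are not part of Haiku or its tooling.

```python
import re

# Hypothetical helper, not part of Haiku: matches the tail of a
# "vm_page_fault" syslog line in the format quoted in this ticket.
FAULT_RE = re.compile(
    r"fault at (0x[0-9a-fA-F]+), ip (0x[0-9a-fA-F]+), "
    r"write (\d+), user (\d+), thread (0x[0-9a-fA-F]+)"
)


def parse_page_fault(line):
    """Return the fields of a page-fault line as a dict, or None if no match."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    addr, ip, write, user, thread = match.groups()
    return {
        "addr": int(addr, 16),      # address whose access faulted
        "ip": int(ip, 16),          # instruction pointer at the fault
        "write": write != "0",      # True for a write, False for a read
        "user": user != "0",        # True if the fault came from userland
        "thread": int(thread, 16),  # thread ID as printed by the kernel
    }


SAMPLE = (
    "vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at "
    "0xffffffff81f78044, ip 0xffffffff81e06fb7, write 0, user 0, thread 0x437"
)

if __name__ == "__main__":
    info = parse_page_fault(SAMPLE)
    print({key: hex(val) if isinstance(val, int) and not isinstance(val, bool)
           else val for key, val in info.items()})
```

For the line in this ticket, the parsed fields describe a read (write 0) performed in kernel mode (user 0) by thread 0x437.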

comment:4 by waddlesplash, 15 months ago

@diver, putting the screenshot in the ticket description isn't amazingly helpful; actually I prefer it just as an attachment.

I’ve also typeset the KDL syslog as requested by waddlesplash:

You didn't need to do that, the screenshot alone was enough :)

Besides (not sure if it’s relevant), when booting by blacklisting the atheroswifi driver, I can see that the syslog includes many lines such as:

I'm confused. Is this previous_syslog? Or do you have an atheros driver in non-packaged? Because if you do, that would probably be the issue here.

The lines in your message are the same as korli's in #14273, and are said by him to be harmless. So that's probably unrelated.

in reply to:  4 comment:5 by diver, 15 months ago

@diver, putting the screenshot in the ticket description isn't amazingly helpful; actually I prefer it just as an attachment.

The screenshot in a description makes it easy to see if there are other tickets with the same back trace. Quite handy when one wants to close a bunch of dups.

in reply to:  4 comment:6 by v.vill, 15 months ago

Replying to waddlesplash:

I'm confused. Is this previous_syslog? Or do you have an atheros driver in non-packaged? Because if you do, that would probably be the issue here.

Oh, indeed. I had tried with the last build you attached to #14249, and forgot to remove it. But the problem was already there prior to that, and it remains now even though I’ve removed it. (See updated attachments.)

by v.vill, 15 months ago

Attachment: IMG_20180715_211112.jpg added

KDL syslog

by v.vill, 15 months ago

Attachment: IMG_20180715_211044.jpg added

KDL panic message & initial sc

comment:7 by Max-Might, 15 months ago

Is this another instance of drivers failing to access the eeprom? We have this happening with other drivers too.

comment:8 by waddlesplash, 15 months ago

No. The only drivers that fail to access the EEPROM are the 3945 and 4965 drivers; and they were crashing with NULL-dereferences. This one appears to be a buffer overflow, and it's not related to EEPROMs at all.

comment:9 by waddlesplash, 15 months ago

Hmm, actually, maybe this is related. I've had a chance to investigate somewhat, and you're right, this does go through the EEPROM code. I'm still not sure why this one has an actual pointer instead of a NULL, though.

comment:10 by waddlesplash, 15 months ago

Please retest after hrev52114.

comment:11 by v.vill, 15 months ago

No dice. :-(

comment:12 by v.vill, 15 months ago

Hello again,
what I can’t quite wrap my head around is that this driver initially *did* work, as of hrev52099 (though I did have a few problems because it appeared to conflict with the Ethernet interface, and I had to rm the /boot/system/settings/network/interfaces config file, but the driver was indeed loaded as it should be). So I’m left to wonder where exactly this faulty addressing issue comes from.

I’ve tried rebooting to various earlier builds, but without success; in a nutshell:

  • if hrev < 52090 (roughly), the driver isn’t loaded at all
  • if hrev > 52096, the driver _is_ loaded but boot ends up in KDL (with the exact same overflow as reported above).
  • I couldn’t reproduce the blissful 24 hours where it did work as intended, even with a couple of reboots.
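
The build-by-build testing summarised in the list above amounts to searching for the first hrev at which the behaviour changes. Below is a minimal sketch of that search as a bisection, assuming the behaviour flips exactly once within the range. `first_hrev_where` is a hypothetical name, and the `check` callback stands in for booting a given build and noting the result by hand.

```python
def first_hrev_where(lo, hi, check):
    """Return the smallest hrev in [lo, hi] for which check(hrev) is true.

    Assumes check() is false for every hrev below the answer, true from the
    answer onwards, and true for hi.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if check(mid):
            hi = mid        # the change happened at mid or earlier
        else:
            lo = mid + 1    # the change happened after mid
    return lo


if __name__ == "__main__":
    # Illustrative predicate only; 52093 is a made-up boundary, not a finding.
    print(first_hrev_where(52080, 52100, lambda hrev: hrev >= 52093))
```

Each call to `check` halves the remaining range, so narrowing down a window of a few dozen builds needs only a handful of test boots.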

Another factor (FWIW) might be that before this got merged onto the master branch, I tried a couple of out-of-tree builds (in /system/non-packaged/add-ons/). I removed them afterwards, though.
Could there be some persistent cache file/corrupted firmware or whatever that I could manually clean up elsewhere in the filesystem?

comment:13 by waddlesplash, 15 months ago

I couldn’t reproduce the blissful 24 hours where it did work as intended, even with a couple of reboots.

Can you try booting from a live image, i.e., not your install partition as it is now? If it is some configuration issue, but it did work at one point, then it may work in that state.

I removed them afterwards, though. Could there be some persistent cache file/corrupted firmware or whatever that I could manually clean up elsewhere in the filesystem?

No, there is no firmware for AR9xxx devices, and there should not be a cache or anything like that. There may be a settings file (but I don't know how it would have been created, if there is).

comment:14 by waddlesplash, 15 months ago

Someone else just reported this on IRC:

10:02 PM — mikeyloveslignux uploaded an image: VectorImage_2018-07-30_100158.jpg (4888KB) < https://matrix.org/_matrix/media/v1/download/matrix.org/IMPqrlOxvvhkwpQjJtcbqBJJ >

I may be able to borrow an IdeaPad S210, which according to some specs pages I found has a 956x chip in it. So I may be able to reproduce this one myself.

Last edited 15 months ago by waddlesplash

comment:15 by waddlesplash, 15 months ago

The same individual reports from IRC that it may be that Linux is initializing the WiFi device; i.e., it works from a warm reboot, but not from a cold one. @v.vill, if you could warm-reboot from whatever other OS you use and see if it works then, that would be very interesting.

in reply to:  15 comment:16 by v.vill, 15 months ago

Replying to waddlesplash:

@v.vill, if you could warm-reboot from whatever other OS you use and see if it works then, that would be very interesting.

I tried; unfortunately I still get the same KDL message, no matter how cold or warm the card may be :-)

Also from waddlesplash:

Can you try booting from a live image, i.e., not your install partition as it is now? If it is some configuration issue, but it did work at one point, then it may work in that state.

With the latest nightly anyboot image (hrev52139) on a USB drive I get stuck on the Boot disk icon. (Obviously unrelated to the problem at hand; I’ll need to investigate with other drives or alternate EFI settings.)

No, there is no firmware for AR9xxx devices, and there should not be a cache or anything like that. There may be a settings file (but I don't know how it would have been created, if there is).

Yeah, I tried looking for something like that. Haven’t found anything yet.

comment:17 by waddlesplash, 15 months ago

7:37 PM <adrian> weird
7:37 PM <adrian> ok, so print out the reg_read value
7:38 PM <adrian> when you panic
7:38 PM <adrian> i'm curious what value it tried reading
7:38 PM <adrian> maybe something is massively out of bounds leading to the page fault

I'll take a look.
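
The suggestion in the IRC excerpt above (print the offset being read and see whether it is out of bounds) can be illustrated with a small model. This is a sketch in Python rather than the driver's C, and it does not reflect the actual atheroswifi code or the eventual fix. `checked_reg_read` and the byte buffer standing in for the mapped register window are hypothetical.

```python
import struct


def checked_reg_read(window, offset, log=print):
    """Read a little-endian 32-bit value at `offset` from `window`.

    `window` is a bytes-like object modelling a mapped register range.
    An unaligned or out-of-range offset is logged and None is returned,
    instead of touching memory outside the window.
    """
    if offset < 0 or offset % 4 != 0 or offset + 4 > len(window):
        log("reg_read: bad offset 0x%x (window is 0x%x bytes)"
            % (offset, len(window)))
        return None
    return struct.unpack_from("<I", window, offset)[0]


if __name__ == "__main__":
    fake_window = bytes(range(16))              # 16-byte stand-in for MMIO
    print(hex(checked_reg_read(fake_window, 4)))
    print(checked_reg_read(fake_window, 0x40))  # logged, returns None
```

Logging the rejected offset is the point of the exercise: a value far beyond the window size would support the out-of-bounds theory, whereas a plausible offset would point elsewhere.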

comment:18 by v.vill, 14 months ago

Greetings,

now we seem to be getting somewhere: I've managed to reformat my BeFS partition and reinstall a new, clean image... and no KDL! The atheroswifi driver loads correctly, and I can even see the SSID list appear. However, I've stumbled upon a couple of issues that may or may not be related to this specific driver (I hadn't seen them happening with other devices, nor could I find these exact reports in the bug tracker, fwiw). Just in case, I've opened separate tickets for these new problems: see #14348 and #14349.

Thanks for your patience!

comment:19 by waddlesplash, 14 months ago

Resolution: fixed
Status: new → closed

Yes, those are indeed separate issues.

My guess is that this was fixed by hrev52204 (or its parent commit) then. Nice!
