Opened 6 years ago

Closed 5 years ago

Last modified 4 years ago

#14266 closed bug (fixed)

iprowifi4965 : wireless network disappears before login

Reported by: ttcoder
Owned by: waddlesplash
Priority: normal
Milestone: R1/beta2
Component: Drivers/Network/iprowifi4965
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

Once boot-up completes, the network is listed at first; I select it and enter the password when prompted a few seconds later. But then the password dialog re-appears, probably because by then the network has "disappeared".

IOW, my wifi router's network gets listed at first, then disappears in a few seconds, never to re-appear for that session.

This is with today's nightly. Could also try an "ancient" nightly if that'd help.

Attachments (2)

syslog_t410 (22.8 KB) - added by ttcoder 6 years ago.
wpa_tracing_t410.txt (14.4 KB) - added by ttcoder 5 years ago.
terminal tracing: failure to connect to livebox


Change History (18)

comment:1 by ttcoder, 6 years ago

This is on the family Lenovo T410. Probably my last wifi ticket as I've made the rounds of the wifi-capable 'puters here :-)

More:

Haiku shredder 1 hrev52094 Jul 11 2018 22:43:35 BePC x86 Haiku

device Network controller [2|80|0]
  vendor 8086: Intel Corporation
  device 4239: Centrino Advanced-N 6200

~/> ls /system/settings/network/
hostname  hosts  resolv.conf  services

PS - anyone else seeing media_addon_server crashing on boot? I'll file a ticket if this persists; or maybe this is just specific to booting from a (slow) USB thumbdrive


by ttcoder, 6 years ago

Attachment: syslog_t410 added

comment:2 by waddlesplash, 6 years ago

Yes, a newer firmware might be useful, so please do try that. But it looks like this isn't a driver problem; it seems that the "station deauth" because "sending STA is leaving/has left IBSS or ESS" may be related to your access point doing channel-hopping: https://supportforums.cisco.com/t5/small-business-wireless/wap321-sta-is-leaving-ibss-or-ess/td-p/2349232

If this is indeed the case then this isn't a driver bug, and is instead a router bug or just traffic congestion causing the router to behave strangely, and would explain #14260 and #14258 as well.

More information could be had by adding a #define IWN_DEBUG just above here and then editing line 54 of that same file to comment out the if-test (i.e. leave printf, make it occur unconditionally).

comment:3 by waddlesplash, 6 years ago

Indeed, the same "station deauth" message is in #14260:

KERN: [net/iprowifi3945/0] [d0:ae:ec:3e:a9:b0] station deauth via MLME (reason: 3 (sending STA is leaving/has left IBSS or ESS))

So unless net80211 or the driver are incorrectly reporting MLME states (unlikely, more would be broken) it seems your access point is the problem here.
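For anyone checking their own syslog for the same symptom, here is a minimal Python sketch that picks the device, BSSID and reason code out of a "station deauth" line. The regular expression is modelled only on the single line quoted above, and the reason-code table is a small paraphrased subset of the IEEE 802.11 reason codes; both are assumptions for illustration, not something taken from the driver or net80211 sources.

```python
import re

# Small subset of IEEE 802.11 reason codes (wording paraphrased).
REASON_CODES = {
    1: "unspecified reason",
    2: "previous authentication no longer valid",
    3: "sending STA is leaving/has left IBSS or ESS",
    4: "disassociated due to inactivity",
    6: "class 2 frame received from nonauthenticated STA",
    7: "class 3 frame received from nonassociated STA",
    8: "sending STA is leaving/has left BSS",
}

# Pattern modelled on the one syslog line quoted in this ticket (assumption).
DEAUTH_RE = re.compile(
    r"\[(?P<device>net/[^\]]+)\]\s+"
    r"\[(?P<bssid>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\]\s+"
    r"station deauth via MLME \(reason: (?P<code>\d+)",
    re.IGNORECASE,
)


def parse_deauth(line):
    """Return (device, bssid, code, meaning), or None if the line does not match."""
    m = DEAUTH_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    return (m.group("device"), m.group("bssid"), code,
            REASON_CODES.get(code, "unknown"))


sample = ("KERN: [net/iprowifi3945/0] [d0:ae:ec:3e:a9:b0] station deauth via MLME "
          "(reason: 3 (sending STA is leaving/has left IBSS or ESS))")
print(parse_deauth(sample))
```

Running the same function over syslogs from several tickets makes it easy to see whether they all report the same access point and reason code.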

comment:4 by ttcoder, 6 years ago

Re. firmware,

  • for a start, I tried the first of the six or so files available on that site, behaved the same though.
  • ~EDIT: note to self: if that becomes relevant, gotta bring up the fact that the behavior is the same also if the firmware is black-listed and no replacement is provided; that prompted me to look at the mapping but it looks ok.. ~

Re. the Cisco support hints,

Based on "It looks like your WAP may be set to auto mode for the channel. You will get disconnects like that when the WAP automatically changes channels", I went to my router (which is a France Telecom "LiveBox") config > Wifi > Advanced. There is indeed a channel setting there. It was set to "Auto". I've set it to a fixed channel among the 14 proposed. Will report again (probably tomorrow) about any change.


comment:5 by ttcoder, 6 years ago

No change with channel-hopping disabled (router hardcoded to channel 6): the wifi network remains listed as long as I don't connect; then it disappears when I try to connect to it (very Heisenberg'ish of him :-), together with that strange media_addon_server crash too.

Anyway no big deal to me if I don't get wifi on that particular laptop; might revisit the above to-dos later for the sake of helping with haiku's compatibility though.

comment:6 by waddlesplash, 6 years ago

If the firmware is blacklisted and no replacement is provided, then are you sure it's loading the firmware you believe it is? Most chipsets should just not work at all without firmware (from a *cold* boot anyway.)

comment:7 by ttcoder, 6 years ago (in reply to: comment:6)

Tried again from cold boot twice, and indeed, nothing gets listed in that case. So it was a case of having booted into a firmware-enabled install, then doing a warm reboot to a firmware-less one. Gotta keep that in mind in the future. So back on track with experimenting with alternative firmware. (And I might do a cold boot each time I try a new version to be sure of the result; this preserve-state-across-warm-boots business is kinda unnerving..!)

comment:8 by waddlesplash, 5 years ago

Please retest with a more recent build; it's possible wpa_supplicant changes fixed this.

comment:9 by ttcoder, 5 years ago

Cold booted into a USB stick upgraded from 520xx to 52539.

No joy. We're typically using this with wired ethernet though, so no biggie AFAIC.

Details:

  • uname -a :

Haiku shredder 1 hrev52539 Nov 16 2018 21:27:53 BePC x86 Haiku

  • changed: the wifi net no longer disappears from the list (in the Deskbar replicant).. unless I 'Cancel' the login window, at the end of test (below).
  • unchanged: can't connect; the login window comes back and back and back, at intervals of 10+ secs.

Gave up after 20-30 attempts, ending the wireless test and going wired instead to post this. Possibly same "station deauth" symptoms as before:

	KERN: wlan_control: 9234, 21
	KERN: [net/iprowifi4965/0] [d0:ae:ec:3e:a9:b0] station deauth via MLME (reason: 3 (sending STA is leaving/has left IBSS or ESS))
	KERN: [net/iprowifi4965/0] ieee80211_new_state_locked: SCAN -> INIT (nrunning 0 nscanning 0)
	KERN: [net/iprowifi4965/0] ieee80211_newstate_cb: SCAN -> INIT arg 3
  • media_addon_server crash is not related to anything here, just due to HDA not being supported on this laptop (see my other ticket)
  • early during the test, I got greeted by a KDL "attempting to clone kernel area".. It was continuable (!) so ended up logged, luckily: (was followed by the userland media srv crash that always occurs on this machine):
KERN: PANIC: attempting to clone kernel area "dpc: normal priority_14_kstack" (296)!
KERN: Welcome to Kernel Debugging Land...
KERN: Thread 771 "HD Audio control" running on CPU 2
KERN: stack trace for thread 771 "HD Audio control"
KERN:     kernel stack: 0xdde12000 to 0xdde16000
KERN:       user stack: 0x71773000 to 0x717b3000
KERN: frame               caller     <image>:function + offset
KERN:  0 dde15d48 (+  32) 8014ea8e   <kernel_x86> arch_debug_stack_trace + 0x12
KERN:  1 dde15d68 (+  16) 800a900f   <kernel_x86> stack_trace_trampoline(NULL) + 0x0b
KERN:  2 dde15d78 (+  12) 80140222   <kernel_x86> arch_debug_call_with_fault_handler + 0x1b
KERN:  3 dde15d84 (+  48) 800aab37   <kernel_x86> debug_call_with_fault_handler + 0x5b
KERN:  4 dde15db4 (+  64) 800a922b   <kernel_x86> kernel_debugger_loop(0x80192697 "PANIC: ", 0x801a7a80 "attempting to clone kernel area "%s" (%ld)!", 0xdde15e60, int32: 2) + 0x217
KERN:  5 dde15df4 (+  48) 800a95a7   <kernel_x86> kernel_debugger_internal(0x80192697 "PANIC: ", 0x801a7a80 "attempting to clone kernel area "%s" (%ld)!", 0xdde15e60, int32: 2) + 0x53
KERN:  6 dde15e24 (+  48) 800aaec2   <kernel_x86> panic + 0x3a
KERN:  7 dde15e54 (+ 128) 8012106f   <kernel_x86> vm_clone_area + 0x1cb
KERN:  8 dde15ed4 (+ 112) 80128d4b   <kernel_x86> _user_clone_area + 0xa3
KERN:  9 dde15f44 (+ 100) 80142def   <kernel_x86> handle_syscall + 0xdc
KERN: user iframe at 0xdde15fa8 (end = 0xdde16000)
KERN:  eax 0xc9          ebx 0x8a8330       ecx 0x717addfc  edx 0x608a7114
KERN:  esi 0x717adeac    edi 0x717aded8     ebp 0x717ade38  esp 0xdde15fdc
KERN:  eip 0x608a7114 eflags 0x3202    user esp 0x717addfc
KERN:  vector: 0x63, error code: 0x0
KERN: 10 dde15fa8 (+   0) 608a7114   <commpage> commpage_syscall + 0x04
KERN: 11 717ade38 (+ 160) 00be99e0   <libmedia.so> __7BBufferRC17buffer_clone_info + 0x1bc
KERN: 12 717aded8 (+ 128) 00c1d5e7   <libmedia.so> BPrivate::BufferCache<0x18375108>::GetBuffer(int32: 1) + 0xa7
KERN: 13 717adf58 (+ 640) 00bead01   <libmedia.so> BBufferConsumer<0x183995f8>::HandleMessage(int32: 774, 0x717ae218, uint32: 0xc8 (200)) + 0x1a5
KERN: 14 717ae1d8 (+16448) 00bfb273   <libmedia.so> BMediaNode<0x18399c74>::WaitForMessage(int64: 9223372036854749042, uint32: 0x0 (0), NULL) + 0x1e7
KERN: 15 717b2218 (+ 208) 00bf5ca4   <libmedia.so> BMediaEventLooper<0x183996dc>::ControlLoop() + 0x254
KERN: 16 717b22e8 (+  64) 00bf60eb   <libmedia.so> BMediaEventLooper<0x183996dc>::_ControlThreadStart(NULL) + 0x37
KERN: 17 717b2328 (+  48) 007faa7f   <libroot.so> _get_next_team_info (nearest) + 0x5f
KERN: 18 717b2358 (+   0) 608a7258   <commpage> commpage_thread_exit + 0x00
KERN: kdebug> kdebug> kdebug> co771: DEBUGGER: BufferCache::GetBuffer: IDs mismatch
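To read the call path out of a trace like the one above without scanning it by eye, a small Python sketch along these lines may help. The regular expression is inferred only from the frame lines in this trace (index, frame address, frame size, caller address, image, function + offset); it is an assumption about the format, not a documented one, and `call_chain` is a hypothetical helper name.

```python
import re

# Frame-line layout inferred from the KDL trace in this ticket (assumption).
FRAME_RE = re.compile(
    r"^KERN:\s+(?P<index>\d+)\s+"             # frame number
    r"(?P<frame>[0-9a-f]{8})\s+"              # frame address
    r"\(\+\s*\d+\)\s+"                        # frame size, e.g. (+ 128)
    r"(?P<caller>[0-9a-f]{8})\s+"             # caller address
    r"<(?P<image>[^>]+)>\s+"                  # image name, e.g. <kernel_x86>
    r"(?P<function>.+?)\s+\+\s+0x[0-9a-f]+$"  # function + offset
)


def call_chain(lines):
    """Return [(index, image, function), ...] for every stack-frame line."""
    chain = []
    for line in lines:
        m = FRAME_RE.match(line.rstrip())
        if m:
            chain.append((int(m.group("index")), m.group("image"),
                          m.group("function")))
    return chain


sample = [
    "KERN:  7 dde15e54 (+ 128) 8012106f   <kernel_x86> vm_clone_area + 0x1cb",
    "KERN:  8 dde15ed4 (+ 112) 80128d4b   <kernel_x86> _user_clone_area + 0xa3",
    "KERN:  9 dde15f44 (+ 100) 80142def   <kernel_x86> handle_syscall + 0xdc",
    "KERN: user iframe at 0xdde15fa8 (end = 0xdde16000)",
    "KERN: 10 dde15fa8 (+   0) 608a7114   <commpage> commpage_syscall + 0x04",
    "KERN: 11 717ade38 (+ 160) 00be99e0   <libmedia.so> __7BBufferRC17buffer_clone_info + 0x1bc",
]

# The trace lists the innermost frame first, so reverse it to read caller -> callee.
for index, image, function in reversed(call_chain(sample)):
    print(f"{index:2d} {image}: {function}")
```

Non-frame lines such as the register dump and the "user iframe" marker are simply skipped.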

comment:10 by waddlesplash, 5 years ago

> Gave up after 20-30 attempts, ending the wireless test and going wired instead to post this. Possibly same "station deauth" symptoms as before:

Can you kill wpa_supplicant and run it in a terminal, and paste the error messages that it gives here? That's more interesting than the syslog in this case.

> early during the test, I got greeted by a KDL "attempting to clone kernel area".. It was continuable (!) so ended up logged, luckily: (was followed by the userland media srv crash that always occurs on this machine):

Non-page-fault KDLs typically are continuable, yes.

> KERN: PANIC: attempting to clone kernel area "dpc: normal priority_14_kstack" (296)!

Um... Woah. That's the Media Kit attempting to clone a kernel stack area, which definitely should never occur. It's entirely possible this is the source of at least some of the weird memory corruption you sometimes see. I wondered if there were bugs like this that introducing that panic would uncover, and it seems there are :)

I'm not sure why it tries to clone that ID, but what's stranger is that should just return B_NOT_ALLOWED, and then BBuffer::BBuffer should exit early followed by GetBuffer returning NULL as Data() should be NULL since the clone failed, except somehow it isn't...?

by ttcoder, 5 years ago

Attachment: wpa_tracing_t410.txt added

terminal tracing: failure to connect to livebox

comment:11 by ttcoder, 5 years ago

Today's sequence: (no KDL this time):

Boot 1:

  • attempt connection (fails)
  • kill wpa_supplicant
  • launch in terminal
  • realize that the wifi net is gone, oops! reboot..

Boot 2:

  • pre-emptively launch wpa_supplicant (there was no instance, I guess it's run-on-demand)
  • select net, enter password, capture terminal output
  • control-C
  • notice that the wifi net is gone from the list (in the Deskbar replicant) again.. Fortunately this time I captured output beforehand.

See attached. Those 2 lines look ominous:

 /dev/net/iprowifi4965/0: Authentication with d0:ae:ec:3e:a9:b0 timed out.
 Added BSSID d0:ae:ec:3e:a9:b0 into blacklist
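For longer captures, a minimal Python sketch like the following can tally those two kinds of message per BSSID. The patterns are modelled only on the two lines quoted above, so treat them as assumptions about wpa_supplicant's wording rather than a complete parser; `summarize` is a hypothetical helper name.

```python
import re

MAC = r"(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}"

# Patterns modelled on the two output lines quoted in this comment (assumption).
TIMEOUT_RE = re.compile(rf"Authentication with ({MAC}) timed out")
BLACKLIST_RE = re.compile(rf"Added BSSID ({MAC}) into blacklist")


def summarize(lines):
    """Return (timeouts per BSSID, set of blacklisted BSSIDs)."""
    timeouts = {}
    blacklisted = set()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            bssid = m.group(1)
            timeouts[bssid] = timeouts.get(bssid, 0) + 1
            continue
        m = BLACKLIST_RE.search(line)
        if m:
            blacklisted.add(m.group(1))
    return timeouts, blacklisted


sample = [
    " /dev/net/iprowifi4965/0: Authentication with d0:ae:ec:3e:a9:b0 timed out.",
    " Added BSSID d0:ae:ec:3e:a9:b0 into blacklist",
]
print(summarize(sample))
```

A BSSID that shows up in both results was dropped after its authentication attempts timed out, which matches the network vanishing from the list.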

comment:12 by ttcoder, 5 years ago

The BBuffer ctor seems to also call clone_buffer() here : http://xref.plausible.coop/source/xref/haiku/src/kits/media/SharedBufferList.cpp#62

It's indirect though (done through class SharedBufferList), so not consistent with the above backtrace.. Or maybe the code is inlined by GCC?

Note to self: next time, trigger the media_addon_server crash first thing, before even looking at wpa_supplicant; that's how I got the KDL first time, IIRC.

comment:13 by waddlesplash, 5 years ago

Please retest after hrev52730.

comment:14 by ttcoder, 5 years ago

Warm booted into hrev52801, was logged in effortlessly at first try.

And couldn't get media_addon_server to crash either in a few minutes of testing, if that's significant. Between this machine and another (more recent) thinkpad with ipro4965 whose behavior has vastly improved, I'd say this is nailed.

comment:15 by waddlesplash, 5 years ago

Resolution: fixed
Status: new → closed

Refactoring the 802.11 init code seems to be the gift that keeps on giving. Awesome!

comment:16 by nielx, 4 years ago

Milestone: Unscheduled → R1/beta2

Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone
