Opened 17 months ago

Closed 17 months ago

Last modified 17 months ago

#9148 closed bug (fixed)

Enumerating interfaces hangs in kernel on first boot in safe mode

Reported by: mmadia
Owned by: anevilyak
Priority: high
Milestone: R1/beta1
Component: Network & Internet/Stack
Version: R1/alpha4
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

As mentioned in 9128#comment:7, NetworkStatus is capable of locking Deskbar when net_server is not running.

While thinking through the various combinations of isFirstBoot, isReadOnly, and isSafeMode, along with what the user may do in those scenarios (e.g., booting a CD into safe mode and installing to a writable device vs. dd'ing anyboot to a USB stick and doing a first boot into safe mode because Haiku didn't fully boot in normal mode), the most complete solution I can think of is to modify NetworkStatus to gracefully install itself into Deskbar when net_server is not present.

Perhaps a new network status icon could be made to indicate the lack of net_server? Or would that qualify as kNetworkStatusNoConnection?

Attachments (4)

bt.png (35.4 KB) - added by augiedoggie 17 months ago.
backtrace of deskbar window (requested by AnEvilYak)
bt2.png (27.3 KB) - added by augiedoggie 17 months ago.
backtrace of deskbar window right after boot
mutex_info.png (40.5 KB) - added by augiedoggie 17 months ago.
gozer.diff (559 bytes) - added by augiedoggie 17 months ago.
diff by anevilyak that seems to fix the problem


Change History (13)

comment:1 Changed 17 months ago by anevilyak

I've looked into this code and so far I don't really see how it could cause a hang. Nothing it does while installing itself in Deskbar, or in its constructor/AttachedToWindow(), talks to net_server in any way (at that point everything is done via ioctls directly to the network stack), so that doesn't appear to be the reason it's hanging. When it does communicate with net_server later, it uses BMessengers, which would fail immediately if net_server isn't actually running. I don't see any logical reason for it to block like this unless one of the ioctls is itself blocking, which would make it more of a kernel/network stack problem.

Changed 17 months ago by augiedoggie

backtrace of deskbar window (requested by AnEvilYak)

comment:2 Changed 17 months ago by anevilyak

  • Component changed from Applications/NetworkStatus to Kits/Network Kit
  • Status changed from new to assigned
  • Summary changed from Must be able to install itself in Deskbar w/o net_server running. to Enumerating interfaces hangs in kernel on first boot in safe mode

Based on augiedoggie's results in VirtualBox (as seen in attachment:bt.png), the problem does indeed seem to reside in the network stack. Switching component/summary.

comment:3 Changed 17 months ago by anevilyak

  • Component changed from Kits/Network Kit to Network & Internet/Stack

Changed 17 months ago by augiedoggie

backtrace of deskbar window right after boot

comment:4 Changed 17 months ago by augiedoggie

The first backtrace was taken after I had attempted to kill the NetworkStatus thread. I've uploaded a new one taken right after boot, although it points to the same area.

comment:5 Changed 17 months ago by bonefish

@augiedoggie: Please enter the kernel debugger, get the info for the thread (thread -s <thread ID>), get the info for the mutex it is waiting on (mutex <mutex address>) and a stack trace for the holder of the mutex.

Changed 17 months ago by augiedoggie

comment:6 follow-up: Changed 17 months ago by anevilyak

Interestingly the mutex seems to have been smashed/corrupted. The only one net_timer acquires is sTimerLock, which is initialized to the name "net timer", which doesn't appear in the mutex info.

comment:7 in reply to: ↑ 6 Changed 17 months ago by bonefish

Replying to anevilyak:

Interestingly the mutex seems to have been smashed/corrupted. The only one net_timer acquires is sTimerLock, which is initialized to the name "net timer", which doesn't appear in the mutex info.

The culprit is obviously uninit_timers(). It waits for the timer thread only after destroying the mutex, which might still be used by the former.
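
The ordering problem described in this comment can be illustrated with a minimal sketch in portable C++. This is not the code from Haiku's network stack or from the attached diff: `TimerState`, `timer_thread()`, `init_timers_sketch()` and `uninit_timers_sketch()` are hypothetical names, and `std::mutex` merely stands in for the kernel mutex. The only point it tries to show is the teardown order: ask the thread to quit, wait for it, and destroy the lock last.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the timer module's state; not Haiku's actual code.
struct TimerState {
    std::mutex lock;               // plays the role of sTimerLock
    std::condition_variable wake;  // used to wake the timer thread early
    bool quit = false;             // guarded by lock
};

static std::unique_ptr<TimerState> sState;
static std::thread sTimerThread;

// Timer thread body: keeps re-acquiring the lock until asked to quit.
static void timer_thread(TimerState* state)
{
    std::unique_lock<std::mutex> guard(state->lock);
    while (!state->quit) {
        // A real timer thread would fire due timers here.
        state->wake.wait_for(guard, std::chrono::milliseconds(1));
    }
}

// Returns false if the timers are already initialized.
static bool init_timers_sketch()
{
    if (sState)
        return false;
    sState.reset(new TimerState());
    sTimerThread = std::thread(timer_thread, sState.get());
    return true;
}

// Returns false if there is nothing to tear down.
static bool uninit_timers_sketch()
{
    if (!sState)
        return false;

    // Ask the thread to quit and wake it up.
    {
        std::lock_guard<std::mutex> guard(sState->lock);
        sState->quit = true;
    }
    sState->wake.notify_all();

    // 1. Wait for the thread, which may still be using the lock.
    sTimerThread.join();
    assert(!sTimerThread.joinable());

    // 2. Only now destroy the lock along with the rest of the state.
    sState.reset();
    return true;
}
```

Swapping steps 1 and 2 in `uninit_timers_sketch()` would recreate the hazard: the thread could wake up and touch a lock that has already been destroyed.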

Changed 17 months ago by augiedoggie

diff by anevilyak that seems to fix the problem

comment:8 Changed 17 months ago by anevilyak

  • Owner changed from axeld to anevilyak
  • Status changed from assigned to in-progress

comment:9 Changed 17 months ago by anevilyak

  • Resolution set to fixed
  • Status changed from in-progress to closed

Fixed in hrev44832. Thanks for helping track it down, augiedoggie!
