Opened 11 years ago

Closed 10 years ago

#2894 closed bug (fixed)

Input server crashes at boot on Amilo Li2735

Reported by: jackburton
Owned by: axeld
Priority: critical
Milestone: R1/alpha1
Component: Servers/input_server
Version: R1/pre-alpha1
Keywords:
Cc: fredrik.holmqvist@…
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description (last modified by stippi)

After Haiku completes the boot process, Tracker and Deskbar start up; then, after one or two seconds, the input server crashes. It is 100% reproducible, although the stack trace changed once or twice. I think this started happening after hrev28241. I'm testing with hrev28244.

Attachments (9)

Immagine.png (467.3 KB) - added by jackburton 11 years ago.
input_server_crash.jpg (382.5 KB) - added by herdemir 11 years ago.
thread_info_75(input_server).jpg (453.5 KB) - added by herdemir 11 years ago.
bt_of_thread_75(input_server).jpg (294.7 KB) - added by herdemir 11 years ago.
last_part_of_syslog_thread_82.jpg (460.9 KB) - added by herdemir 11 years ago.
last_part_of_syslog_thread_82.2.jpg (460.9 KB) - added by herdemir 11 years ago.
sc_of_thread_82(add-on_manager).jpg (450.1 KB) - added by herdemir 11 years ago.
aldeck_bt.JPG (105.4 KB) - added by aldeck 11 years ago.
syslog.txt (104.1 KB) - added by herdemir 10 years ago.
syslog of touchpad recognition

Change History (34)

Changed 11 years ago by jackburton

Attachment: Immagine.png added

comment:1 Changed 11 years ago by jackburton

Owner: changed from korli to axeld

If I add a "return B_OK" before line 483 in

MouseInputDevice::_HandleMonitor(BMessage* message),

thus skipping the device removal, I don't get the crash anymore. Obviously something is double-freed or something like that. Assigning to Axel, who might know better.

comment:2 Changed 11 years ago by tqh

Cc: fredrik.holmqvist@… added

I see this as well.

comment:3 Changed 11 years ago by tqh

On a laptop with no mouse attached, just a touchpad (Synaptics, so given the new driver and stuff). Clevo 120TN-R: Intel DC 2,5GHz Intel 965 http://www.clevo.com.tw/en/products/prodinfo_2.asp?productid=63

comment:4 Changed 11 years ago by tqh

Tested Rene Gollant's suggestion:

cd src/servers/input ; svn up -r 28240
svn up -r 28240 headers/os/add-ons/input_server/InputServerDevice.h

but that seemed to hang hard, as the terminal cursor is not blinking. (Or the mouse and keyboard don't work.)

comment:5 Changed 11 years ago by axeld

Milestone: changed from R1 to R1/alpha1
Priority: changed from normal to critical

Stefano, can you add some more debug output to the MouseDevice? I haven't yet tried on my laptop, but the other machines don't expose this problem.

It would be particularly interesting to see which paths are added and removed. At least I don't see anything particularly wrong with the code itself; it could of course also be a problem with the BPathMonitor, as this one wasn't used before.

comment:6 Changed 11 years ago by jackburton

I have some logs but they are on the other machine, which I left at home. Looking at the code I also noticed that in PathList.cpp, the path_entry constructor does not initialize the ref_count member, and ref_count is not initialized anywhere else.
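A minimal sketch of the kind of fix this points at, assuming a simplified entry type. `PathEntry`, `ReleaseEntry` and the starting count of 1 are hypothetical illustrations, not the actual PathList.cpp code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the path_entry in PathList.cpp; the real
// structure's layout is not shown in this ticket.
struct PathEntry {
    std::string path;
    int32_t     ref_count;

    // Initialize every member explicitly. Starting at 1 (one reference
    // held by whoever added the path) is an assumption of this sketch.
    explicit PathEntry(const std::string& p)
        : path(p), ref_count(1)
    {
    }
};

// Drops one reference and reports whether the entry should now be removed.
// With an uninitialized ref_count this decision would depend on garbage.
inline bool ReleaseEntry(PathEntry& entry)
{
    assert(entry.ref_count > 0);
    return --entry.ref_count == 0;
}
```

With the counter left uninitialized, whether an entry is removed too early, too late, or never depends on whatever happened to be in that memory, which would fit a crash that comes and goes between restarts.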

comment:7 Changed 11 years ago by jackburton

Btw, after typing "continue" ten times or so, I'm able to use the system normally.

comment:8 Changed 11 years ago by axeld

That's a good hint that the missing ref_count initialization is to blame here: since the input_server is restarted by the app_server when it's gone, each restart is another chance to get ref_count values that don't bring it down again.

This should be fixed in hrev28277. Thanks for the investigation, and please close this bug if you can confirm it being fixed :-)

comment:9 Changed 11 years ago by jackburton

I'll try as soon as I can. In the meantime, maybe tqh could check if it's fixed.

Changed 11 years ago by herdemir

Attachment: input_server_crash.jpg added

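Changed 11 years ago by herdemir

Attachment: thread_info_75(input_server).jpg added

Changed 11 years ago by herdemir

Attachment: bt_of_thread_75(input_server).jpg added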

comment:10 Changed 11 years ago by herdemir

Tested with hrev28277, but I'm getting a different crash now. I've attached additional images of the crash debug output.

comment:11 Changed 11 years ago by jackburton

I had that crash too, also before this change.

comment:12 Changed 11 years ago by stippi

The thread crashes when trying to retrieve the next message. This usually happens when memory was corrupted before, for example when processing the previous message or during setup in case it's the first message it's trying to process.

comment:13 in reply to:  11 Changed 11 years ago by jackburton

Replying to jackburton:

I had that crash too, also before this change.

I meant that this is the other crash I was talking about when I wrote "although the stack trace changed once or twice" in the description.

comment:14 Changed 11 years ago by herdemir

It seems the previous bug did not disappear completely either. Additional debug output follows.

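Changed 11 years ago by herdemir

Attachment: last_part_of_syslog_thread_82.jpg added

Changed 11 years ago by herdemir

Attachment: last_part_of_syslog_thread_82.2.jpg added

Changed 11 years ago by herdemir

Attachment: sc_of_thread_82(add-on_manager).jpg added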

comment:15 Changed 11 years ago by aldeck

Happens here too with hrev28289 on my dev laptop (Asus A8J) that worked nicely a week ago :) Backtrace from gdb follows.

Changed 11 years ago by aldeck

Attachment: aldeck_bt.JPG added

comment:16 Changed 11 years ago by jackburton

I suspect it's related to the touchpad. I have the laptop here, will try to supply some debug output when (if) I have time.

comment:17 Changed 10 years ago by herdemir

It is indeed related to the touchpad. I investigated how the PS/2 driver recognizes the touchpad and found a lead, I hope. I first used VirtualBox to see, from its syslog, how a normal PS/2 mouse is found: there, the PS/2 mouse is found on the first PS/2 mouse probe. On real hardware (touchpad), the mouse probe runs four times; the first three fail to find a mouse, and the last one finds it and lets you use it. The crash occurs after the dev nodes of the failed ps2/mouse probes are unpublished; input_server crashes right after that. The PS/2 mouse was found this way before as well, so the same problem already existed before hrev28241, but it wasn't crashing input_server. The change in hrev28241 just helps reveal the bug by crashing input_server.

Hope it helps.

Changed 10 years ago by herdemir

Attachment: syslog.txt added

syslog of touchpad recognition

comment:18 Changed 10 years ago by stippi

Description: modified (diff)

I am currently investigating this. Publishing my findings so far:

herdemir is correct: Somehow, the PS/2 driver publishes a mouse twice, even when no PS/2 mouse is attached at all. On the input_server side, an InputDeviceListItem is created in _RegisterDevices(). Such objects have a member "fDevice" which is constructed in such a way that its "name" member points to memory owned by the original input_device_ref provided by the MouseDevice. Later, strcmp() is called to find the device with the same pointers for the name; I don't know if that even works.

I've fixed this in my local tree, but I can still reproduce corrupted memory when I unplug my USB mouse. It always crashes in the heap management asserts the second time I re-plug the mouse (in _RegisterDevices()).

What also happens is that InputServer::_RegisterDevices() will not let you register the same device name twice. This is documented and correct behavior. But at least with the current implementation, if two devices are added with the same name, and the input_device_ref is deleted for the second instance in the MouseDevice, there will be a mix up and the InputDeviceListItem::fDevice::name member will point to freed memory. I don't know if that is what's actually happening though, because I don't see the output I added when removing devices. Here is some syslog output, stripped of unrelated messages:

KERN: loaded driver /boot/beos/system/add-ons/kernel/drivers/dev/input/ps2_hid
KERN: loaded driver /boot/beos/system/add-ons/kernel/drivers/dev/input/usb_hid

KERN: InputServer::RegisterDevices() device_ref: USB Keyboard 1

KERN: MouseInputDevice::_AddDevice(/dev/input/mouse/usb/0), name: Usb Mouse 1

KERN: InputServer::RegisterDevices() device_ref: Usb Mouse 1
KERN: InputServer::RegisterDevices() device_ref: Wacom Tablets

KERN: wacom: device_open() open: 2

KERN: ps2: devfs_publish_device input/mouse/ps2/0, status = 0x00000000
KERN: void AddOnManager::MessageReceived(BMessage *) what: NMP_
KERN: MouseInputDevice::_AddDevice(/dev/input/mouse/ps2/0), name: PS/2 Mouse 1
KERN: InputServer::RegisterDevices() device_ref: PS/2 Mouse 1

KERN: ps2: probe_mouse reset failed
KERN: ps2: probing mouse input/mouse/ps2/0 failed
KERN: void AddOnManager::MessageReceived(BMessage *) what: NMP_
KERN: MouseInputDevice::_AddDevice(/dev/input/mouse/ps2/0), name: PS/2 Mouse 1
KERN: InputServer::RegisterDevices() device_ref already exists: PS/2 Mouse 1

KERN: ps2: devfs_publish_device input/keyboard/at/0, status = 0x00000000
KERN: void AddOnManager::MessageReceived(BMessage *) what: NMP_

KERN: ps2: devfs_unpublish_device input/mouse/ps2/0, status = 0x00000000
KERN: InputServer::RegisterDevices() device_ref: AT Keyboard 1

KERN: ps2: keyboard found
KERN: void AddOnManager::MessageReceived(BMessage *) what: NMP_
KERN: InputServer::RegisterDevices() device_ref already exists: AT Keyboard 1

KERN: void AddOnManager::MessageReceived(BMessage *) what: NMP_
KERN: MouseInputDevice::_RemoveDevice(/dev/input/mouse/ps2/0), name: PS/2 Mouse 1
KERN: InputServer::UnregisterDevices() device_ref: PS/2 Mouse 1
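
The name-aliasing hazard described above can be avoided by letting each list item own a copy of its name. The following is only a sketch under that assumption: `DeviceItem` and `DeviceRegistry` are hypothetical simplifications, not the real InputDeviceListItem or InputServer classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical list item: it stores its own copy of the device name, so it
// stays valid even if the caller's input_device_ref is freed afterwards.
struct DeviceItem {
    std::string name;
    explicit DeviceItem(const char* n) : name(n) {}
};

// Hypothetical registry that refuses duplicate names, mirroring the
// "device_ref already exists" lines in the syslog above.
class DeviceRegistry {
public:
    // Returns false (and stores nothing) if the name is already registered.
    bool Register(const char* name)
    {
        if (Find(name) != nullptr)
            return false;
        fItems.push_back(std::make_unique<DeviceItem>(name));
        return true;
    }

    // Removes the item with the given name; returns false if not found.
    bool Unregister(const char* name)
    {
        for (auto it = fItems.begin(); it != fItems.end(); ++it) {
            if ((*it)->name == name) {
                fItems.erase(it);
                return true;
            }
        }
        return false;
    }

    const DeviceItem* Find(const char* name) const
    {
        for (const auto& item : fItems) {
            if (item->name == name)
                return item.get();
        }
        return nullptr;
    }

    std::size_t CountItems() const { return fItems.size(); }

private:
    std::vector<std::unique_ptr<DeviceItem>> fItems;
};
```

Because the registry compares and stores string contents rather than caller-owned pointers, a second add with the same name is rejected cleanly, and the caller can free or reuse its own buffer without leaving the registered item pointing at freed memory.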

comment:19 Changed 10 years ago by stippi

Axel, could you check if publishing a node in the devfs twice will still trigger a node monitor event B_ENTRY_CREATED the second time? Is this intended?

comment:20 Changed 10 years ago by RandomInsano

I had crashing problems at around the same time as the original bug as well, dumping me into gdb. I don't think the input server is to blame though since I was able to move my cursor (with a track point) and resize or drag the terminal window around (no UI response otherwise).

Now with build 28303 the problem still exists but I no longer get cursor movement when this happens. Also, I recently had a single bootup when the program in question didn't crash. Here's a link to my complete syslog: http://dl.getdropbox.com/u/128703/syslog

comment:21 Changed 10 years ago by stippi

Ok, I've found the sucker. Only took me all day. The problem was introduced in hrev28242 by switching to the BObjectList and configuring it to "own the contained objects". A RemoveItem() therefore already deletes the item, but the original code that deleted it was left in place. I will commit this soon after I have cleaned it all up again. I've also found a few other problems...
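
To illustrate the ownership rule behind this, here is a small sketch using a hand-written owning list. `OwningList`, `Device` and `RemoveDevice` are hypothetical; this is not the real BObjectList, it only mimics the "remove also deletes" behavior described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical device object that counts how often a destructor runs, so
// the number of deletions can be observed.
struct Device {
    explicit Device(int* destroyed) : fDestroyed(destroyed) {}
    ~Device() { ++*fDestroyed; }
    int* fDestroyed;
};

// Minimal owning list (an assumption of this sketch, not the Haiku class):
// whatever is added is deleted by the list when removed or when the list dies.
class OwningList {
public:
    OwningList() = default;
    OwningList(const OwningList&) = delete;
    OwningList& operator=(const OwningList&) = delete;

    ~OwningList()
    {
        for (Device* device : fItems)
            delete device;
    }

    void AddItem(Device* device) { fItems.push_back(device); }

    // Removes the item AND deletes it, because the list owns its contents.
    bool RemoveItem(Device* device)
    {
        auto it = std::find(fItems.begin(), fItems.end(), device);
        if (it == fItems.end())
            return false;
        fItems.erase(it);
        delete device;
        return true;
    }

    std::size_t CountItems() const { return fItems.size(); }

private:
    std::vector<Device*> fItems;
};

// Correct removal path: the list does the deleting.
inline void RemoveDevice(OwningList& list, Device* device)
{
    list.RemoveItem(device);
    // No "delete device;" here. A leftover delete from the time when the
    // list did not own its items would free the object a second time.
}
```

The point of the sketch is that once a container takes ownership, every manual `delete` on the removal path has to go; keeping one is exactly the double free described in this comment.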

comment:22 Changed 10 years ago by stippi

Please confirm if hrev28321 fixes the problems.

comment:23 Changed 10 years ago by herdemir

Thanks stippi, it works again :)

comment:24 Changed 10 years ago by aldeck

Nice work stippi :) Works perfectly here!

comment:25 Changed 10 years ago by stippi

Resolution: fixed
Status: changed from new to closed

Thanks for the feedback, guys!
