Opened 15 years ago

Closed 15 years ago

Last modified 15 years ago

#4053 closed bug (fixed)

USB MIDI not recognized dynamically

Reported by: Pete Owned by: phoudoin
Priority: normal Milestone: R1
Component: Servers/midi_server Version: R1/pre-alpha1
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

Now that usb_midi is in the distributed image, I tried plugging in my M-Audio 'Uno', but it doesn't show up -- either in PatchBay or /dev/midi. (It does appear as /dev/bus/usb/0/0)

Looking at syslog, I see the following:

KERN: usb hub 2: port 1: new device connected
KERN: usb_midi: init_hardware() Jun 30 2009 15:19:22
KERN: usb_midi: init_driver() Jun 30 2009 15:19:22
KERN: usb_midi: device_added()
KERN: usb_midi: vendor ID 0x0763, product ID 0x0150
KERN: usb_midi: interface 0, alt : 0: class 1, subclass 1, protocol 0
KERN: usb_midi: interface 1, alt : 0: class 1, subclass 3, protocol 0
KERN: usb error uhci 0: td (0x020ab5c0) error: status: 0x194007ff; token: 0xffe80369;
KERN: usb_midi: set_alt_interface() failed -2147442667
KERN: usb_midi: init_driver() OK
KERN: usb_midi: publish_devices()
KERN: usb_midi: uninit_driver()

Attachments (5)

usb_midi_fault_output (3.8 KB ) - added by Pete 15 years ago.
fragment of serial output showing KDL when usb cable is unplugged
usb_midi_fix.zip (10.7 KB ) - added by Pete 15 years ago.
updated source files to eliminate page faults
usb_midi.c (18.3 KB ) - added by Pete 15 years ago.
remove_device_info-before-publish_devices (2.0 KB ) - added by Pete 15 years ago.
Stack trace showing fault from remove_device_info() with waiting read()
usb_midi_fix.diff (16.1 KB ) - added by Pete 15 years ago.
(Working!) Patches to usb_midi and midi_server to cure page faults

Download all attachments as: .zip

Change History (52)

comment:1 by mmlr, 15 years ago

Can you please add the output of "listusb -v /dev/bus/usb/0/0" here. The device stalls the request to set the alternate interface, which is possibly ok if there aren't any alternates. I don't know the USB midi protocol, it might also be required to set the alternate interface to enable device functions (like in other USB protocols).

comment:2 by Pete, 15 years ago

Here's what I get:

[Device /dev/bus/usb/0/0]
    Class .................. 0x00
    Subclass ............... 0x00
    Protocol ............... 0x00
    Max Endpoint 0 Packet .. 64
    USB Version ............ 0x0100
    Vendor ID .............. 0x0763
    Product ID ............. 0x0150
    Product Version ........ 0x0125
    Manufacturer String .... "	"
    Product String ......... "	"
    Serial Number .......... ""
    [Configuration 0]
        Configuration String . "	"
        [Interface 0]
            [Alternate 0 active]
                Class .............. 0x01
                Subclass ........... 0x01
                Protocol ........... 0x00
                Interface String ... ""
                [Descriptor 0]
                    Type ............. 0x24
                    Data ............. 01 00 01 09 00 01 01 
        [Interface 1]
            [Alternate 0 active]
                Class .............. 0x01
                Subclass ........... 0x03
                Protocol ........... 0x00
                Interface String ... ""
                [Endpoint 0]
                    MaxPacketSize .... 64
                    Interval ......... 0
                    Type ............. Bulk
                    Direction ........ Input
                [Endpoint 1]
                    MaxPacketSize .... 64
                    Interval ......... 0
                    Type ............. Bulk
                    Direction ........ Output
                [Descriptor 0]
                    Type ............. 0x24
                    Data ............. 01 00 01 41 00 
                [Descriptor 1]
                    Type ............. 0x24
                    Data ............. 02 01 01 00 
                [Descriptor 2]
                    Type ............. 0x24
                    Data ............. 02 02 02 00 
                [Descriptor 3]
                    Type ............. 0x24
                    Data ............. 03 01 03 01 02 01 00 
                [Descriptor 4]
                    Type ............. 0x24
                    Data ............. 03 02 04 01 01 01 00 
                [Descriptor 5]
                    Type ............. 0x25
                    Data ............. 01 01 03 
                [Descriptor 6]
                    Type ............. 0x25
                    Data ............. 01 01 01 

comment:3 by Pete, 15 years ago

Update: I switched to the Midisport 2x2 that is normally attached to my Ubuntu machine, and find that this does not stall on the set-alt_interface call, and the '/dev/midi/usb' entry gets created. (Both this and the Uno work equally well under Ubuntu.)

PatchBay, however, still did not see any ports when I plugged the Midisport in. Then I finally read the midi_server docs, and saw that the server only looks for /dev/midi devices at boot-up!!(!) I guess someone wasn't thinking of USB and hotplugging when that decision was made... This needs fixing at some point, otherwise users will be very confused (like me!).

When the device was connected before booting, PatchBay finally saw the ports, but unfortunately there is still no actual MIDI input available when I attach my keyboard. Looking at syslog, I see a continual loop:

  usb_midi_callback()
  report 00 00 00 00

comment:4 by phoudoin, 15 years ago

Indeed, /dev/midi/* devices should be watched, not just scanned at startup time. src/servers/midi/DeviceWatcher.cpp|.h should be enhanced what his name imply ;-)

comment:5 by phoudoin, 15 years ago

I've reworked DeviceWatcher in hrev33118 to now dynamically detect device , not only at start up. Removing a device is not yet implemented, though, but I plan to do it soon. Meanwhile, could you please check if now your USB adapter is detected:

  • dynamically, when connected while midi_server run
  • during midi_server startup (already connected at boot time, or when killing/restarting midi_server), as before.

As I don't plan to buy another USB midi adapter soon, if ever, and until I enhanced our usb_midi driver to support my Roland UM2 too, I can't test these changes with actual hardware myself. I've done my tests by faking /dev/midi/* entries, which is not a complete one.

comment:6 by phoudoin, 15 years ago

Owner: changed from axeld to phoudoin
Status: newassigned

comment:7 by phoudoin, 15 years ago

Component: DriversServers/midi_server

Wasn't really an issue with USB midi driver (which is the distinct subject of #4463 ticket) but with midi_server devices detection.

comment:8 by phoudoin, 15 years ago

Summary: USB MIDI not recognizedUSB MIDI not recognized dynamically

in reply to:  5 ; comment:9 by Pete, 15 years ago

Replying to phoudoin:

I've reworked DeviceWatcher in hrev33118 to now dynamically detect device , not only at start up. Removing a device is not yet implemented, though, but I plan to do it soon. Meanwhile, could you please check if now your USB adapter is detected:

  • dynamically, when connected while midi_server run
  • during midi_server startup (already connected at boot time, or when killing/restarting midi_server), as before.

Unfortunately, it doesn't seem to work. (:-()

Plugging the USB in after boot-up results in two inputs and two outputs showing up in PatchBay! Neither of them actually seem to pass MIDI events.

If the USB is plugged in before boot, I get a lockup (in both PatchBay and MusicWeaver) when I try to start the app. (Or in the case of MW when I try to load a configuration that connects to the ports.) (By 'lockup' I mean that the program freezes -- the 3-finger remedy will kill it and Haiku continues.)

I'll try to do some debugging sessions and see if I can figure something out.

I should also note something that I observed since posting the original bug. If the USB is unplugged before a shutdown/restart, I invariably get a KDL page-fault panic from 'MidiPortProducer' (presumably because the disconnection is not being "watched" either!).

in reply to:  9 ; comment:10 by phoudoin, 15 years ago

Argh. Okay, it seems I'll need to buy an UNO then, in order to properly test my changes.

In hrev33159 I've added device remove support, but I'll bet it still crash. Sorry for having break midi_server, guys. I'll revert to last working version tomorrow.

in reply to:  10 ; comment:11 by Pete, 15 years ago

Replying to phoudoin:

Argh. Okay, it seems I'll need to buy an UNO then, in order to properly test my changes.

Echh. Don't feel you have to do that. [Anyway the UNO isn't the reference, and others like my 2x2 don't work either]

Let me see what I can find out in the next day or so, and I'll hopefully get back with some info.

comment:12 by phoudoin, 15 years ago

This morning I though about hacking a /dev/null or /dev/zero driver to publish entries in /dev/midi/... in order to test without having to buy anything.

Will see how it goes.

in reply to:  11 ; comment:13 by Pete, 15 years ago

Replying to Pete:

Replying to phoudoin: Let me see what I can find out in the next day or so, and I'll hopefully get back with some info.

Preliminary findings: I took DeviceWatcher.cpp, stripped all the Add/RemoveDevice stuff out and turned it into a little standalone test app (just printing out the message contents).

If I start the test up, and plug in the USB MIDI (for the first time) I get two reports! They're identical as far as I can see.

From that point on, when I remove or re-plug the USB, I only get one message, but that explains why I see two nodes in PatchBay! Sounds like a bug in PathMonitor to me...

OTOH it doesn't explain the other problems (no MIDI and lockups). I'll see what I can find.

in reply to:  13 ; comment:14 by Pete, 15 years ago

Replying to Pete:

Preliminary findings: I took DeviceWatcher.cpp, stripped all the Add/RemoveDevice stuff out and turned it into a little standalone test app (just printing out the message contents).

If I start the test up, and plug in the USB MIDI (for the first time) I get two reports! They're identical as far as I can see.

Ahah! I didn't look quite carefully enough at the contents of the two messages (via PrintToStream). The 'name' field of one is "0", and of the other is "usb". So obviously one (the second!) is from directory creation, the other is from the device node itself. As the directory is never deleted, one obviously only gets one message from then on. [But why does one see the directory creation when B_WATCH_FILES_ONLY is set? (:-/] Interestingly the 'path' field is "/dev/midi/usb/0" in both messages.

The temporary fix is obviously to check the name field against the last item in the path, but presumably this is a bug in PathMonitor.

in reply to:  14 ; comment:15 by phoudoin, 15 years ago

In hrev33202 I fix two major DeviceWatcher.cpp issues. I add an extra check to enforce that a valid path for a device should be a file, not a directory. I don't understand how a duplicate path could now be published twice, as I check against the known path -> device's endpoints map.

in reply to:  15 ; comment:16 by Pete, 15 years ago

This is an interim report, as I've been a bit delayed by a dose of flu [or something!]. It isn't fully working yet, and I haven't got all the answers, but for now:

Replying to phoudoin:

In hrev33202 I fix two major DeviceWatcher.cpp issues. I add an extra check to enforce that a valid path for a device should be a file, not a directory.

This doesn't work, because '/dev/midi/usb/0' is not recognized as a file! In my variant, I was just checking the Leaf of the full path against the 'name' in the message. However...:

I don't understand how a duplicate path could now be published twice, as I check against the known path -> device's endpoints map.

This probably avoids the dual entry anyway (you weren't doing the check in hrev33118 which is what I was looking at). I haven't got to check that yet.

Once I took out the 'IsFile' check, which prevented any port being found, _AddDevice seems to work. (The extra thread has prevented the lockups.)

However, _RemoveDevice just hangs, because the destructor for MidiPortProducer just hangs at the wait_for_thread(). As a result I still get to KDL when I quit midi_server.

I now have to find out why the thread isn't quitting, unless you have an immediate solution.

in reply to:  16 ; comment:17 by phoudoin, 15 years ago

Replying to Pete:

[...] I've been a bit delayed by a dose of flu [or something!].

I'm right there now, too ;-)

This doesn't work, because '/dev/midi/usb/0' is not recognized as a file!

Damn. I wonder why, though. Maybe a devfs entry is not a regular file, and BStatable don't report them as file. Okay, will reverse the check then, calling IsDirectory() instead.

This probably avoids the dual entry anyway (you weren't doing the check in hrev33118 which is what I was looking at). I haven't got to check that yet.

Oh. Hope it explains the dual entry indeed.

Once I took out the 'IsFile' check, which prevented any port being found, _AddDevice seems to work. (The extra thread has prevented the lockups.)

Nice to hear it.

However, _RemoveDevice just hangs, because the destructor for MidiPortProducer just hangs at the wait_for_thread().

Hum, yes, and I see why: I'm stupid. See below how much.

As a result I still get to KDL when I quit midi_server.

I now have to find out why the thread isn't quitting, unless you have an immediate solution.

MidiPortProducer reader thread is blocking on read(), and I do nothing to wake him. Before the wait_for_thread(), a send_signal(fReaderThread, SIGINT) should do it. If not, let's try the brutal kill_thread(fReaderThread) instead, but it will leak SysEx buffer eventually.

comment:18 by phoudoin, 15 years ago

I just checked, and indeed BEntry.IsFile() only check for regular file, so return False on char/block device entries. I dunno if it's the same under BeOS, BTW. That's what happend when one test his code with regular files he create/remove because he can't play with actual /dev/midi/* entries :-\

Anyway, the path should not be a directory. Symlinks are not supported by devfs currently, but maybe in order to support them in the future we should also tranverse them before checking the target is not a directory.

Will fix that tonight. I think we're close to... close. ;-)

in reply to:  17 ; comment:19 by stippi, 15 years ago

Replying to phoudoin:

MidiPortProducer reader thread is blocking on read(), and I do nothing to wake him. Before the wait_for_thread(), a send_signal(fReaderThread, SIGINT) should do it. If not, let's try the brutal kill_thread(fReaderThread) instead, but it will leak SysEx buffer eventually.

Or you could close the FD that the reader thread is read()ing from. That should also wake up the thread and seems like the cleanest solution.

in reply to:  19 ; comment:20 by phoudoin, 15 years ago

Replying to stippi:

Or you could close the FD that the reader thread is read()ing from. That should also wake up the thread and seems like the cleanest solution.

The file descriptor is open once (twice is not available, as /dev/midi/* devices will allow only one) and shared between MidiPortProducer and MidiPortConsumer. I don't like the idea that MidiPortProducer could close it behind MidiPortConsumer's back.

Anyway, on device remove, we already close the FD in DeviceWatcher::_RemoveDevice(). Seems it's not enough, thought, to wake up MidiPortProducer reader. Dunno why, but I'm getting old and have a flu. Yes, that's shame excuses ;-)

in reply to:  20 ; comment:21 by Pete, 15 years ago

Replying to phoudoin:

Replying to stippi:

Or you could close the FD that the reader thread is read()ing from. That should also wake up the thread and seems like the cleanest solution.

The file descriptor is open once (twice is not available, as /dev/midi/* devices will allow only one) and shared between MidiPortProducer and MidiPortConsumer. I don't like the idea that MidiPortProducer could close it behind MidiPortConsumer's back.

Anyway, on device remove, we already close the FD in DeviceWatcher::_RemoveDevice(). Seems it's not enough, thought, to wake up MidiPortProducer reader. Dunno why, but I'm getting old and have a flu. Yes, that's shame excuses ;-)

[Darn -- all these comments while I'm having a lie in... (:-/)]

I too quickly realized last night that the read was hanging, and this is even though the FD is "closed" before the destructor is called. The read actually hangs on a semaphore that's not affected by the close (all in usb_midi), so I tried putting a delete_sem in the close function. Had no effect (!) so I went to bed...

I'll get back to digging into causes in a moment.

(I don't think there's any harm in closing both the Producer and Consumer at once. Events might be lost, but they're hardly important at that point!)

in reply to:  18 comment:22 by Pete, 15 years ago

Replying to phoudoin:

I just checked, and indeed BEntry.IsFile() only check for regular file, so return False on char/block device entries. I dunno if it's the same under BeOS, BTW.

Anyway, the path should not be a directory. Symlinks are not supported by devfs currently, but maybe in order to support them in the future we should also tranverse them before checking the target is not a directory.

The problem is that this is not the problem... (:-/) The two messages that PathMonitor [buggily, I believe] sends when the device -- and directory -- are first created both contain the same 'path' field: "/dev/midi/usb/0". So what was happening before you put in the map check was that they were both invoking _AddDevice the same way and we got two entries.

What I did was to compare the Leaf of a BPath created from the path with the 'name' field of the message, which differed ("0" and "usb" respectively) in the two messages, and only called _AddDevice if they matched. This works, but is probably redundant as your map check now prevents duplicates anyway (checked).

in reply to:  21 ; comment:23 by Pete, 15 years ago

Replying to Pete:

I too quickly realized last night that the read was hanging, and this is even though the FD is "closed" before the destructor is called. The read actually hangs on a semaphore that's not affected by the close (all in usb_midi), so I tried putting a delete_sem in the close function. Had no effect (!)...

OK, I'm foxed... The usb_midi_close function in usb_midi.c is just not getting called! I have a print showing the close is being invoked on the FD that was opened, and printouts afterward showing it returned, but a (syslog) printout at the beginning of usb_midi_close never appears! What can possibly intervene? The read is hung on the semaphore, but there's no single threading here, is there?

Maybe I'll just try directly killing the thread, after all.

in reply to:  23 comment:24 by Pete, 15 years ago

Replying to Pete:

OK, I'm foxed... The usb_midi_close function in usb_midi.c is just not getting called!

Eventually had the brilliant idea (:-/) of checking the return value of the close! Of course it is an error -- "Bad File Descriptor".

Now the question is, why?

comment:25 by Pete, 15 years ago

... and kill_thread doesn't work either -- just KDLs because the still-running usb read then finds empty space.

comment:26 by stippi, 15 years ago

Guys, you definitely have a bug there. Maybe the MIDI driver is blocking on the next interrupt transfer? I have not looked at the code yet, but I assume you schedule a transfer, and in the transfer done hook, you release a semaphore. In your read hook, you probably block on that semaphore, but this should be done with a timeout. That way, you can unblock the read thread from time to time and bail out if the device has been closed meanwhile. Have a look at the USB HID driver, it should pose a very clean and good example of how to do this.

comment:27 by stippi, 15 years ago

Alternatively, you could also delete the semaphore when the device is closed, so the thread wakes up because it gets an appropriate error from acquire_sem_etc().

comment:28 by phoudoin, 15 years ago

I guess that the issue we have indeed, but maybe it deserve a separate "usb_midi driver hang when closed" ticket.

comment:29 by Pete, 15 years ago

Let's not open a new ticket quite yet. I think the reason the close is not working is that the node is getting removed when the device is unplugged, before the close actually gets called. I'll see if I can sort it out later today.

in reply to:  27 ; comment:30 by Pete, 15 years ago

Replying to stippi:

Alternatively, you could also delete the semaphore when the device is closed, so the thread wakes up because it gets an appropriate error from acquire_sem_etc().

This is in fact what I was trying. The problem is that at the moment the close never gets invoked!

in reply to:  30 ; comment:31 by phoudoin, 15 years ago

This is in fact what I was trying. The problem is that at the moment the close never gets invoked!

Could we have an issue related to USB remove hook & node deletion while it's still opened!?

in reply to:  31 comment:32 by Pete, 15 years ago

Replying to phoudoin:

Could we have an issue related to USB remove hook & node deletion while it's still opened!?

We actually have something of a logical gotcha here! We are watching for the '/dev/midi/usb/0' node to be removed, and then we go to close it! Ouch...

Let me think on this. I guess what has to be done is to remove the Producer and Consumer without having them try to access the USB side.

comment:33 by Pete, 15 years ago

Sorry, folks -- I'm going to have to give this a rest for a while. See if I can clear my head after spending most of four days on it.

Obviously using a 'close' triggered by DeviceWatcher won't work, but I'd have thought I could make the waiting read quit gracefully at least. (I've even tried putting midi_server aside, and just doing "cat /dev/midi/usb/0 >/dev/null. Little difference.)

However everything I try just KDLs. It looks as if the page fault is occurring in devfs_read, and not at the usb_midi level at all, so I'm completely lost. It's probably happening when I end the read-wait by deleting the semaphore or timing out (I've tried both, with essentially the same outcome).

[And I'm afraid I'm getting tired of eternal 20-minute "edit source; jam -q; copy object file; start tests; pull USB cable; KDL; reboot; repeat..."]

in reply to:  33 comment:34 by bonefish, 15 years ago

Replying to Pete:

Replying to Pete:

OK, I'm foxed... The usb_midi_close function in usb_midi.c is just not getting called!

Eventually had the brilliant idea (:-/) of checking the return value of the close! Of course it is an error -- "Bad File Descriptor".

Now the question is, why?

There are two obvious reasons: It is really the wrong FD index, or it has already been closed before (via close() or dup2()).

I can only wholeheartedly recommend getting familiar with the kernel debugger and kernel tracing. E.g. the "iocontext" command can tell you what file descriptors are currently open in a team. Syscall tracing (the kernel tracing option) will allow you to examine what happened to the FD (open()/close()/dup[2]() calls). There are even (hardware) breakpoints available in the kernel debugger.

Replying to Pete:

Replying to phoudoin:

Could we have an issue related to USB remove hook & node deletion while it's still opened!?

We actually have something of a logical gotcha here! We are watching for the '/dev/midi/usb/0' node to be removed, and then we go to close it! Ouch...

There's nothing wrong with first removing a file/unpublishing a device and then closing still open FDs to that file/device. File systems and drivers must handle this situation gracefully. If they don't, they are buggy.

Replying to Pete:

[And I'm afraid I'm getting tired of eternal 20-minute "edit source; jam -q; copy object file; start tests; pull USB cable; KDL; reboot; repeat..."]

If that cycle takes 20 minutes you either have a very slow machine or you're doing something suboptimally. Rebuilding the kernel and/or a driver should take only a minute or so (the most time being spent for the initial jam startup and header scanning).

Regarding the "waiting on a semaphore and still being interruptable by a close()" issue: If something like deleting the semaphore is not an option, the experimental wait_for_objects() API (<OS.h>) introduced in Haiku might be a good alternative to timeout polling. It allows for waiting for events on any number of FDs, semaphores, ports, or threads. I.e. one could monitor the semaphore one is interested in and a second semaphore indicating a close.

by Pete, 15 years ago

Attachment: usb_midi_fault_output added

fragment of serial output showing KDL when usb cable is unplugged

by Pete, 15 years ago

Attachment: usb_midi_fix.zip added

updated source files to eliminate page faults

comment:35 by Pete, 15 years ago

[Augghh!! I just spent half an hour composing an entry, but added a couple of attachments before I previewed it -- and lost it all!! I will try to recreate it....]

After a few more days of hell [and a pleasant interlude or two -- like a weekend of music in Golden Gate Park (:-)] I think I have the page fault problem licked.

As far as I can tell, the faults seem to have been caused by the driver and its objects going away before the system was finished accessing them. The crash was never in the actual usb_midi code at all. A fragment of serial output and stack trace is attached that shows something of what was happening. In this case it seems that legacy_drivers::UninitDevice was trying to reach now-absent data.

To prevent this, I've moved object deletion as late as possible. The device removal itself is now in usb_midi_free. Even this was apparently not enough if too many dprintf statements were also enabled, so I think there must still be a hazard somewhere. (Part of the reason I was having so much trouble tracking things down.) With printout suppressed, though, it seems to be stable.

The slightly unfortunate result (at least I think it's a result -- the crashes were preventing me seeing exactly what was happening before) is that each time one unplugs and replugs the usb cable, a new device is created in /dev/midi/usb. Originally /dev/midi/usb/0, it becomes /dev/midi/usb/1, and so on. The device nodes don't accumulate, but each is not deleted until the cable is plugged in again, and of course they become safely inactive when it's unplugged. I think this is because publish_devices is not invoked when the device is finally removed; when it is called in response to a new device things get corrected.

I think I can live with this, but the user will need to be aware of the need to adjust things if they replug the cable.

I've attached a zip of the sources (not a diff) that have changed (since hrev33202).

in reply to:  35 ; comment:36 by mmlr, 15 years ago

Replying to Pete:

To prevent this, I've moved object deletion as late as possible. The device removal itself is now in usb_midi_free. Even this was apparently not enough if too many dprintf statements were also enabled, so I think there must still be a hazard somewhere. (Part of the reason I was having so much trouble tracking things down.) With printout suppressed, though, it seems to be stable.

The free hook is supposed to be called only after all open handles have been close()d and all other calls like read/write/ioctl have completed. So if there is still an open count in the free hook or access to the device after the free hook returned that'd be a bug in the devfs code for legacy drivers.

I think this is because publish_devices is not invoked when the device is finally removed; when it is called in response to a new device things get corrected.

Any add/remove of a device that matches the support descriptors of your driver will also cause a rescan of the driver to happen. It should in that case call publish_devices also after the unplug. That requires that you have the list updated before returning from the device_removed hook you provide. Again, if this doesn't work there might be a bug in the devfs code.

I think I can live with this, but the user will need to be aware of the need to adjust things if they replug the cable.

Um, they are bugs, so living with them isn't really the way to go ;-)

I've attached a zip of the sources (not a diff) that have changed (since hrev33202).

It's generally easier if you avoid attaching zips. That is because the zips can't be viewed inline in the bug tracker.

in reply to:  36 comment:37 by Pete, 15 years ago

Replying to mmlr:

Replying to Pete: The free hook is supposed to be called only after all open handles have been close()d and all other calls like read/write/ioctl have completed. So if there is still an open count in the free hook or access to the device after the free hook returned that'd be a bug in the devfs code for legacy drivers.

I'm afraid I got fairly confused trying to track the order of occurrences, especially as the dprintfs often seemed to cause the page faults. I'm checking the open count in the free hook (as well as checking the active flag) and with printouts enabled these seemed to be correct.

I'm not sure how to dig deeper. I tried to recompile just the kernel (from the slightly older sources I currently have) with some dprintfs in legacy_drivers.cpp, but I got a (repeating!) page fault at boot. I assume I should be able to do that if everything matches (?) so I'll try again when I get the alpha release and matching sources installed.

I think this is because publish_devices is not invoked when the device is finally removed; when it is called in response to a new device things get corrected.

Any add/remove of a device that matches the support descriptors of your driver will also cause a rescan of the driver to happen. It should in that case call publish_devices also after the unplug. That requires that you have the list updated before returning from the device_removed hook you provide. Again, if this doesn't work there might be a bug in the devfs code.

Again, I'm not sure of the exact sequence, or what actually triggers the call to publish_devices. And I'm not sure what you mean by the 'device_removed hook' -- I don't find that in the set of hooks. There's a 'close', a 'free', and an 'uninit_driver'.

'remove_device_info' (in devlist.c) gets called (now) in 'usb_midi_close', and adjusts the driver's own list of names (ready to be published), but doesn't trigger any external action. Then 'remove_device' gets called in 'usb_midi_free'. So it does rather sound to me as if 'publish_devices is not getting invoked when it should be.

I think I can live with this, but the user will need to be aware of the need to adjust things if they replug the cable.

Um, they are bugs, so living with them isn't really the way to go ;-)

Well, I suppose... (:-)) but I just checked, and Linux does pretty much the same thing -- except that the device does disappear when unplugged. Here's what I got:

[Plug in the USB MIDI cable:]
pete@Lys:~$ ls /dev/usbdev*
/dev/usbdev1.1_ep00  /dev/usbdev1.2_ep81  /dev/usbdev1.4_ep00
/dev/usbdev1.1_ep81  /dev/usbdev1.3_ep00  /dev/usbdev1.4_ep04
/dev/usbdev1.2_ep00  /dev/usbdev1.3_ep81  /dev/usbdev1.4_ep86
[Unplug it...:]
pete@Lys:~$ ls /dev/usbdev*
/dev/usbdev1.1_ep00  /dev/usbdev1.2_ep00  /dev/usbdev1.3_ep00
/dev/usbdev1.1_ep81  /dev/usbdev1.2_ep81  /dev/usbdev1.3_ep81
pete@Lys:~$ ls /dev/usbdev*
[and plug it in again...:]
/dev/usbdev1.1_ep00  /dev/usbdev1.2_ep81  /dev/usbdev1.6_ep00
/dev/usbdev1.1_ep81  /dev/usbdev1.3_ep00  /dev/usbdev1.6_ep04
/dev/usbdev1.2_ep00  /dev/usbdev1.3_ep81  /dev/usbdev1.6_ep86

The other difference is that alsa itself manages to keep the same 'client number', and also lets you address the device by name.

I've attached a zip of the sources (not a diff) that have changed (since hrev33202).

It's generally easier if you avoid attaching zips. That is because the zips can't be viewed inline in the bug tracker.

Ah. OK, sorry. (I was really intending it for Philippe's use, thinking it might be easier for him to grab and look at.) I'll attach the source of usb_midi.c, as I think that's the only one that's been affecting the crash.

by Pete, 15 years ago

Attachment: usb_midi.c added

comment:38 by Pete, 15 years ago

Some more details:

I'm not sure I made it clear that, if the device is not opened by any user (e.g. midi_server), there is no problem. Plugging in the USB MIDI creates /dev/midi/usb/0, unplugging removes it again. It's only if there is a connection when the cable is unplugged that things go wrong.

Specifically, if there are no opens, the call of usb_midi_removed() (by the usb manager I guess) calls remove_device_info() and remove_device() before publish_devices() gets invoked, and all is fine.

However, if there is a connection, with a waiting read(), this sequence will cause a page fault in devfs_read() -- see attached trace. So if there are open connections both the above calls are now deferred: remove_device_info() seems to be safe in usb_midi_close(), but it seems that remove_device() can't safely be called before usb_midi_free(). In any case, remove_device_info() then happens after publish_devices(), causing the successive device stepping mentioned previously.

In addition, if debug printouts are enabled, there is invariably a page fault in legacy_drivers::UninitDevice().

I've taken the serial output with debug output enabled -- first plugging and unplugging (twice) with no open call, and then with a simple 'cat /dev/midi/usb/0 >/dev/null' -- and edited it down to just the printout, with interspersed comments in '[...]', in hopes it will show more clearly what's happening:

[First, plug in USB MIDI and unplug again without opening a MIDI port]
usb_midi: init_hardware() Oct  7 2009 15:57:16
usb_midi: init_driver() Oct  7 2009 15:57:16
usb_midi: init_driver() OK
usb_midi: publish_devices()
usb_midi: uninit_driver()
usb_midi: uninit complete
loaded driver /boot/system/add-ons/kernel/drivers/dev/midi/usb_midi
usb hub 2: port 0: device removed
usb hub 2: port 0: new device connected
usb_midi: init_hardware() Oct  7 2009 15:57:16
usb_midi: init_driver() Oct  7 2009 15:57:16
usb_midi: usb_midi_added(0x00000008, 0x80fe5808)
usb_midi: vendor ID 0x0763, product ID 0x0150
usb_midi: interface 0, alt : 0: class 1, subclass 1, protocol 0
usb_midi: interface 1, alt : 0: class 1, subclass 3, protocol 0
usb_midi: Allocated 1 write buffers
usb_midi: Created device 0x87194168
usb_midi: my_dev = 0x87194168 endpoint count = 2
usb_midi:  input endpoint = 0x80ed3ab0
usb_midi:  output endpoint = 0x80ed3ab8
usb_midi: queueing bulk xfer ep 0
usb_midi: added midi/usb/0
usb_midi: init_driver() OK
usb_midi: publish_devices()
usb_midi: find_device(midi/usb/0)
[no more messages until:]
[USB MIDI unplugged here...]
usb hub 2: port 0: device removed
usb hub 2: port 0 disabled
usb_midi: usb_midi_removed(midi/usb/0)
usb_midi: removing info & device: midi/usb/0
usb_midi: remove_device 0x87194168
usb_midi: publish_devices()
usb_midi: uninit_driver()
usb_midi: uninit complete
[Plug cable in again (without opening...]
usb hub 2: port 0: device removed
usb hub 2: port 0: new device connected
usb_midi: init_hardware() Oct  7 2009 15:57:16
usb_midi: init_driver() Oct  7 2009 15:57:16
usb_midi: usb_midi_added(0x0000000f, 0x80fe57f0)
usb_midi: vendor ID 0x0763, product ID 0x0150
usb_midi: interface 0, alt : 0: class 1, subclass 1, protocol 0
usb_midi: interface 1, alt : 0: class 1, subclass 3, protocol 0
usb_midi: Allocated 1 write buffers
usb_midi: Created device 0x871941e0
usb_midi: my_dev = 0x871941e0 endpoint count = 2
usb_midi:  input endpoint = 0x80ed3a60
usb_midi:  output endpoint = 0x80ed3a68
usb_midi: queueing bulk xfer ep 0
usb_midi: added midi/usb/0
usb_midi: init_driver() OK
usb_midi: publish_devices()
usb_midi: find_device(midi/usb/0)
usb hub 2: port 0: device removed
usb hub 2: port 0 disabled
usb_midi: usb_midi_removed(midi/usb/0)
usb_midi: removing info & device: midi/usb/0
usb_midi: remove_device 0x871941e0
usb_midi: publish_devices()
usb_midi: uninit_driver()
usb_midi: uninit complete
[---------------------------------------------]

[This time, open a connection after plugging in]
usb hub 2: port 0: device removed
usb hub 2: port 0: new device connected
usb_midi: init_hardware() Oct  7 2009 15:57:16
usb_midi: init_driver() Oct  7 2009 15:57:16
usb_midi: usb_midi_added(0x00000016, 0x80fe57e4)
usb_midi: vendor ID 0x0763, product ID 0x0150
usb_midi: interface 0, alt : 0: class 1, subclass 1, protocol 0
usb_midi: interface 1, alt : 0: class 1, subclass 3, protocol 0
usb_midi: Allocated 1 write buffers
usb_midi: Created device 0x871940f0
usb_midi: my_dev = 0x871940f0 endpoint count = 2
usb_midi:  input endpoint = 0x80ed39f0
usb_midi:  output endpoint = 0x80ed39f8
usb_midi: queueing bulk xfer ep 0
usb_midi: added midi/usb/0
usb_midi: init_driver() OK
usb_midi: publish_devices()
usb_midi: find_device(midi/usb/0)
[ Do 'cat /dev/midi/usb/0 >/dev/null']
usb_midi: open(midi/usb/0)
usb_midi: device midi/usb/0 open (1)
usb_midi: MIDI read (4096 byte buffer at 0 cookie 0x80f30828)
usb_midi: waiting on acquire_sem_etc
usb_midi: acquire_sem_etc timed out
[...last two repeat while waiting...]
usb_midi: waiting on acquire_sem_etc
[USB MIDI unplugged here...]
usb hub 2: port 0: device removed
usb hub 2: port 0 disabled
usb_midi: usb_midi_removed(midi/usb/0)
usb_midi: bus status -2147483636
usb_midi: midi/usb/0 still open
usb_midi: publish_devices()
usb_midi: acquire_sem_etc aborted
usb_midi: Device no longer active
usb_midi: close(midi/usb/0)
usb_midi: close -- removing device info midi/usb/0
usb_midi: close complete
usb_midi: free(midi/usb/0)
usb_midi: free -- removing midi/usb/0
usb_midi: remove_device 0x871940f0
usb_midi: uninit_driver()
usb_midi: uninit complete
[... and because of the 'usb_midi:' dprintfs above...:]
vm_soft_fault: va 0xdeadb000 not covered by area in address space
vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0xdeadbf0f
, ip 0x8007a7ff, write 1, user 0, thread 0x142
PANIC: vm_page_fault: unhandled page fault in kernel space at 0xdeadbf0f, ip 0x8
007a7ff

Welcome to Kernel Debugging Land...
Thread 322 "cat" running on CPU 0
kdebug> sc
stack trace for thread 322 "cat"
[......]
 8 83fe3c70 (+ 192) 800610a1   <kernel_x86>:panic + 0x0029
 9 83fe3d30 (+  80) 800c5255   <kernel_x86>:vm_page_fault + 0x0139
10 83fe3d80 (+  64) 800d537d   <kernel_x86>:page_fault_exception + 0x00d9
11 83fe3dc0 (+  12) 800d8a76   <kernel_x86>:int_bottom + 0x0036
kernel iframe at 0x83fe3dcc (end = 0x83fe3e1c)
 eax 0xdeadbeef     ebx 0x87363e70      ecx 0x1          edx 0x246
 esi 0x80f30710     edi 0x82d4b000      ebp 0x83fe3e34   esp 0x83fe3e00
 eip 0x8007a7ff  eflags 0x10282
 vector: 0xe, error code: 0x2
12 83fe3dcc (+ 104) 8007a7ff   <kernel_x86> LegacyDevice<0x87363e70>::UninitDevice(0x0, 0x870af078, 0x2bb22b) + 0x0047
13 83fe3e34 (+  48) 8007339c   <kernel_x86> devfs_free_cookie(fs_volume*: 0x80edf398, fs_vnode*: 0x87154330, 0x80f30710) + 0x0048
14 83fe3e64 (+  48) 800a1e3a   <kernel_x86> file_free_fd(file_descriptor*: 0x870af078) + 0x002e
15 83fe3e94 (+  48) 8008ff14   <kernel_x86>:put_fd + 0x0038
16 83fe3ec4 (+  64) 8008fffe   <kernel_x86>:close_fd_index + 0x0032
17 83fe3f04 (+  32) 80090d0d   <kernel_x86> common_close(int32: 3, false) + 0x0031
18 83fe3f24 (+  32) 800915e4   <kernel_x86>:_user_close + 0x0018
19 83fe3f44 (+ 100) 800d8ca2   <kernel_x86>:handle_syscall + 0x00af
user iframe at 0x83fe3fa8 (end = 0x83fe4000)
 eax 0x8a           ebx 0x2b8bb4        ecx 0x7ffeeeb0   edx 0xffff0104
 esi 0x7ffef5b5     edi 0x204177        ebp 0x7ffeeedc   esp 0x83fe3fdc
 eip 0xffff0104  eflags 0x207      user esp 0x7ffeeeb0
 vector: 0x63, error code: 0x0
20 83fe3fa8 (+   0) ffff0104   <commpage>:commpage_syscall + 0x0004
21 7ffeeedc (+ 160) 0020220f   <_APP_>:main + 0x0453
22 7ffeef7c (+  48) 002015eb   <_APP_>:_start + 0x005b
23 7ffeefac (+  48) 0010090a   </boot/system/runtime_loader@0x00100000>:unknown+ 0x090a
24 7ffeefdc (+   0) 7ffeefec   6159:cat_main_stack@0x7efef000 + 0xffffec
kdebug>

My suspicion is strongly that the way devfs currently handles things isn't adequate for this situation. I don't think my knowledge of that area is going to be sufficient to track things down, so I'd be really happy to cooperate with someone with more detailed expertise.

by Pete, 15 years ago

Stack trace showing fault from remove_device_info() with waiting read()

by Pete, 15 years ago

Attachment: usb_midi_fix.diff added

(Working!) Patches to usb_midi and midi_server to cure page faults

comment:39 by Pete, 15 years ago

Finally...(!) A slight reorganization of the objects used by usb_midi seems to have at last given stable handling of USB MIDI plugging and unplugging. It no longer steps the device number when replugged, either. I've attached a patch against (approximately) the alpha1 sources. [The reference sources are the berlios.de source snapshot of 2009/9/28, but nobody has yet been able to tell me how to find the revision number of this (:-/)]

The main fix was to make the usb device info structure and the driver cookie a little more independent of each other, and to carefully disconnect them before either is deleted.

I also (eventually) realized that one thing bedevilling me was a bug in legacy_drivers that had been fixed independently (ticket #3856), so things work a bit better under alpha1!

in reply to:  39 comment:40 by phoudoin, 15 years ago

I'm back from BeGeistert, so a little tired. But I'll check and import your patch ASAP, promise.

;-)

comment:41 by phoudoin, 15 years ago

Except for few coding style violation in DeviceWatcher.cpp and PortDrivers.cpp, the change looks fine and, for usb_midi part, making sense. Good job.

in reply to:  41 ; comment:42 by Pete, 15 years ago

Replying to phoudoin:

Except for few coding style violation in DeviceWatcher.cpp and PortDrivers.cpp, the change looks fine and, for usb_midi part, making sense. Good job.

Great! Thanks.

Would you mind emailing me (or something) the violations you found? I thought I'd been careful. (:-/)

Hope BeGeistert went well...

in reply to:  42 ; comment:43 by phoudoin, 15 years ago

Patch applied in hrev33782. I did some minor changes in midi_server part: I've kept the directory/symlink checks in !_AddDevice() and to not check path/name consistency in B_ENTRY_CREATED notification, as if it's a necessary hack we must fix it in BPathMonitor, not here.

Would you mind emailing me (or something) the violations you found? I thought I'd been careful. (:-/)

The other few minor changes were coding style:

  • no space between cast and variable
  • no space between type and asterisk
  • no space around expression
  • expand if else statement to 4 lines at least.

Hope BeGeistert went well...

Very.

in reply to:  43 comment:44 by Pete, 15 years ago

Replying to phoudoin:

Patch applied in hrev33782. I did some minor changes in midi_server part: I've kept the directory/symlink checks in !_AddDevice() and to not check path/name consistency in B_ENTRY_CREATED notification, as if it's a necessary hack we must fix it in BPathMonitor, not here.

Yes, I slightly stand corrected... On reflection, the check for a directory will be needed when BPathMonitor is fixed (as it should be) because it will then correctly announce the creation of the directory itself, rather than the device entry twice.

And somehow your addition of the 'ContainsKey' check escaped my notice. This of course fixes the multiple Endpoint problem without the need for my path/name protection.

So I've checked it and everything seems to work. (There now seems to be a separate problem of interference between USB devices -- plugging in my Wacom seems to kill the MIDI stream rather quickly! That's another ticket, though, when I have better data.)

The other few minor changes were coding style:

  • no space between cast and variable
  • no space between type and asterisk
  • no space around expression
  • expand if else statement to 4 lines at least.

I'm glad to note that all these -- aside from the if-else -- were in the code I inherited from you...! (:-))

comment:45 by phoudoin, 15 years ago

Resolution: fixed
Status: assignedclosed

I'm glad to note that all these -- aside from the if-else -- were in the code I inherited from you...! (:-))

Hum, no. My list above what not what was missing but what was espected. As in:

-   ((DeviceWatcher*)data)->_ScanDevices(kDevicesRoot);
+   ((DeviceWatcher*) data)->_ScanDevices(kDevicesRoot);

The space after cast is a coding style violation.

-   if (fDeviceEndpointsMap.ContainsKey(path)) {
+   if ( fDeviceEndpointsMap.ContainsKey(path) ) {

The space around expression too.

-   DeviceEndpoints* deviceEndpoints = [...]
+   DeviceEndpoints * deviceEndpoints = [...]

Space between type and asterisk is too.

Wasn't the case in my code. That's pure luck maybe, as I'm not known for respecting well our coding style, but at least that was't there before.

Anyway, I'm glad it's working fine with actual hardware, minus USB conflich with Wacom device (I've similar issue with USB keyboard and USB volito tablet too), but I'm closing this ticket now.

in reply to:  45 ; comment:46 by Pete, 15 years ago

Replying to phoudoin:

I'm glad to note that all these -- aside from the if-else -- were in the code I inherited from you...! (:-))

Hum, no. My list above what not what was missing but what was espected. As in:

-   ((DeviceWatcher*)data)->_ScanDevices(kDevicesRoot);
+   ((DeviceWatcher*) data)->_ScanDevices(kDevicesRoot);

The space after cast is a coding style violation. [........]

Wasn't the case in my code. That's pure luck maybe, as I'm not known for respecting well our coding style, but at least that was't there before.

Sorry to persistently disagree about this (:-/), but it was! All those sections of code were exactly what I downloaded in changeset hrev33202 (source zip form, but even looking at the diff in Trac I see one of them!). I edited nothing in that area, so however it got there, it wasn't me...!

in reply to:  46 comment:47 by phoudoin, 15 years ago

Replying to Pete:

Sorry to persistently disagree about this (:-/), but it was! All those sections of code were exactly what I downloaded in changeset hrev33202 (source zip form,

You're right, I just forgot your patch was made against hrev33202. I fixed these code style issues of mine (indeed) were fixed in hrev33265, and I though your patch was against a more recent revision. Sorry about that.

Note: See TracTickets for help on using tickets.