Opened 14 years ago
Closed 8 years ago
#7285 closed bug (fixed)
BSoundPlayer sends buffers late, with ever-increasing latency
Reported by: Pete
Owned by: jua
Priority: normal
Milestone: R1
Component: Kits/Media Kit
Version: R1/Development
Keywords:
Cc: pete.goodeve@…, mdisreali@…
Blocked By:
Blocking:
Platform: All
Description
Once I got sound working reliably on my machine (see #7284) I was able to try apps like SqueekySynth and MidiPlayer, but I immediately ran into "Buffer Late" problems there.
Often the app will start up OK, and I can drive it from a midifile or live input with marvellously low latency. Shortly, however, the whole OS will get very sluggish, and the sound usually quits altogether. This usually seems to be triggered by mouse action, such as clicking a radio button.
If I run 'top', I see the audio mixer hogging up to 80% of CPU. Looking at 'tail -F /var/log/syslog', there are continuous "Buffer late" messages with ever-increasing lag. (I've seen something like 28 million microsecs if I let it go long enough!)
Once I terminate the app, things go back to normal, but the same thing will happen the next time round.
I can prevent it happening at all by using Cortex to change the run mode of the app to 'drop frames', rather than 'increase latency' but I imagine this isn't the correct solution. MediaPlayer, which seems to have a slightly different buffer management scheme, never shows the problem, however hard I work the mouse buttons.
(I'm still looking into this, but I wanted to get it into a ticket before any possible alpha deadline.)
Attachments (2)
Change History (55)
comment:1 by , 14 years ago
Cc: added
Component: - General → Audio & Video
follow-ups: 4 28 comment:2 by , 14 years ago
comment:3 by , 14 years ago
Component: Audio & Video → Kits/Media Kit
Owner: changed
follow-up: 5 comment:4 by , 14 years ago
Replying to Pete:
What I find completely mysterious, though, is -- what may be the root of the whole problem -- that the read_port_etc in WaitForMessage (called in MediaEventLooper's ControlLoop) sometimes times out 4-8 msec late! (About the same amount the latency keeps increasing by.)
[...]
Note that the time of return is 8 msec later than requested! I don't imagine this is expected behaviour in a B_URGENT_PRIORITY task?
Unless some other thread with even higher priority is very busy, this shouldn't happen. If there isn't, disregarding kernel bugs, those latencies can only occur when interrupts are disabled for too long, for which there are two usual reasons: 1. Handling an interrupt takes too long -- which really is a bug too (likely in a driver). 2. You have serial debug output enabled. The latter could explain all the behavior you have reported. Since interrupts are disabled while writing to the serial port and writing even only a few characters takes a rather long time, this can easily cause the read_port_etc() latencies you encounter. It might also explain that the latency increases over time, since late buffers cause the mixer to print to the serial debug output, which in turn worsens the issue. At some point the adjusted downstream latency should be high enough to compensate for all of that, but the excessive serial debug output might affect the system enough to cause drivers to become unhappy and complain (to the serial debug output). The general system sluggishness would support such an assumption.
So, long story short, if you have serial debug output enabled (the default for nightlies and self-built images), try to disable it and see how things go.
Just in case serial debug output isn't the underlying cause, another common reason for ever-increasing/decreasing latencies is a mix-up/incorrect use of real time and performance time in computations, since the two usually drift apart slowly.
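To illustrate the kind of mix-up meant here, a minimal sketch (the helper functions are illustrative, not code from any node; BTimeSource::RealTimeFor() is the existing conversion API):

    #include <TimeSource.h>

    // Correct: convert the performance time into the real-time domain first
    // (the time source accounts for any drift), then the real-time latency
    // can be applied safely.
    bigtime_t CorrectWakeUpTime(BTimeSource* timeSource,
        bigtime_t performanceTime, bigtime_t totalLatency)
    {
        return timeSource->RealTimeFor(performanceTime, totalLatency);
    }

    // The mistake: treating a performance time as if it were a real time and
    // subtracting a real-time latency directly. The error grows slowly as the
    // two clocks drift apart, which shows up as creeping latencies.
    bigtime_t DriftingWakeUpTime(bigtime_t performanceTime, bigtime_t totalLatency)
    {
        return performanceTime - totalLatency;
    }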
Oh, and since you mentioned a perceivable latency, IIRC the last time I looked the multi-audio add-on played buffers a full buffer late. This isn't noticeable when just playing audio, but might be when watching a video or doing MIDI.
follow-up: 6 comment:5 by , 14 years ago
Replying to bonefish:
Replying to Pete:
What I find completely mysterious, though, is -- what may be the root of the whole problem -- that the read_port_etc in WaitForMessage (called in MediaEventLooper's ControlLoop) sometimes times out 4-8 msec late! (About the same amount the latency keeps increasing by.) [...]
[....] there are two usual reasons: 1. Handling an interrupt takes too long -- which really is a bug too (likely in a driver). 2. You have serial debug output enabled. The latter could explain all the behavior you have reported. It might also explain that the latency increases over time, since late buffers cause the mixer to print to the serial debug output, which in turn worsens the issue.
So, long story short, if you have serial debug output enabled (the default for nightlies and self-built images), try to disable it and see how things go.
Hah... and D'oh... I never suspected that -- because I wasn't aware that was the default for self-built images! Serial-debug is suppressed in the alpha distributions, right?
(I see several tickets, like #6709, that report the Mixer thread hogging the CPU. Could this be a point people are generally not aware of?)
Being brief as well... Things are much better now! There is no runaway in the Mixer thread, and latency is fairly good.
However, they aren't perfect. With default buffer_size (880 bytes for the MidiPlayer I think) latency still gradually increases, and the audio can suddenly get distorted (it "growls"). If I double the buffer size, latency still goes up (but less) and there is no growling.
If I bring up Cortex and change the Run Mode to Drop Data, things seem much better still: latency stays at 18 msec. I actually have not yet managed to hear the effect of a dropped buffer, so I'm wondering if this wouldn't be the better default. Using BSoundPlayer, there seems to be no 'official' way for an app to change the mode, so for my purposes I think dropping by default would be better, and I don't really want to keep a custom libmedia.so around! (I also notice that under BeOS the same apps don't suffer from drooping latency, so either BSoundPlayer there is dropping data, or they have a better algorithm.)
Oh, and since you mentioned a perceivable latency, IIRC the last time I looked the multi-audio add-on played buffers a full buffer late. This isn't noticeable when just playing audio, but might be when watching a video or doing MIDI.
I wonder if this is because the event queue is not woken up? (Which I fixed for the SoundPlayNode, but has to be done for every module that uses TimedEventQueue.) Or is there a totally different reason? In any case, with dropped data, latency is really good now.
comment:6 by , 14 years ago
Replying to Pete:
Replying to bonefish: Hah... and D'oh... I never suspected that -- because I wasn't aware that was the default for self-built images! Serial-debug is suppressed in the alpha distributions, right?
Yep.
However, they aren't perfect. With default buffer_size (880 bytes for the MidiPlayer I think) latency still gradually increases, and the audio can suddenly get distorted (it "growls"). If I double the buffer size, latency still goes up (but less) and there is no growling.
If I bring up Cortex and change the Run Mode to Drop Data, things seem much better still: latency stays at 18 msec. I actually have not yet managed to hear the effect of a dropped buffer, so I'm wondering if this wouldn't be the better default. Using BSoundPlayer, there seems to be no 'official' way for an app to change the mode, so for my purposes I think dropping by default would be better, and I don't really want to keep a custom libmedia.so around! (I also notice that under BeOS the same apps don't suffer from drooping latency, so either BSoundPlayer there is dropping data, or they have a better algorithm.)
To avoid misunderstandings, both modes drop buffers, if they are too late for being processed/played in time. The "increase latency" mode additionally increases the latency, if buffers are late. I believe it does so also, when the buffer arrives late, but the node decides that it can still process it in time. If you don't hear dropped buffers (or see them in the syslog), then this is likely what happens. Since the "drop buffers" mode works fine, there's likely a bug in the code related to latency increase computation or in the code of the upstream nodes handling the latency increase. I already mentioned one common mistake that can lead to things getting continuously worse: Mix-up of real and performance time in computations. Another issue could be a bug in the lateness computation somewhere in the BMediaEventLooper code. ControlLoop() looks suspicious IMO. It apparently computes the lateness after pulling the event out of the queue. I believe it should do that directly after the buffer arrives.
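As an illustration of the two orderings (simplified pseudocode, not the actual ControlLoop() source; arrivalTime is a hypothetical time stamp taken when the buffer reaches the node):

    // Deadline by which the buffer had to be available for the node to still
    // meet its performance time (kept in one time domain here, for brevity).
    bigtime_t deadline = event.event_time - EventLatency() - SchedulingLatency();

    // Current ordering: lateness is measured after the wait, so a delay in
    // this very thread (a "blackout") is reported as if the producer were late.
    bigtime_t latenessAfterWait = TimeSource()->Now() - deadline;

    // Suggested ordering: measure against the moment the buffer actually
    // arrived, so only genuinely late deliveries are reported upstream.
    bigtime_t latenessAtArrival = arrivalTime - deadline;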
Oh, and since you mentioned a perceivable latency, IIRC the last time I looked the multi-audio add-on played buffers a full buffer late. This isn't noticeable when just playing audio, but might be when watching a video or doing MIDI.
I wonder if this is because the event queue is not woken up? (Which I fixed for the SoundPlayNode, but has to be done for every module that uses TimedEventQueue.) Or is there a totally different reason?
It's a logical problem in the multi-audio add-on itself. It simply schedules the buffers for playing one full buffer late.
follow-up: 8 comment:7 by , 14 years ago
I've been hammering away at this off and on for a month or so now (with intervals when my head started spinning too fast!), and I think I have a bit better handle on it now. No clue as to the root cause though.
First, you can disregard what I said above about it never happening. It always does, but the longer the buffer the less frequent the glitch. I eventually took the 'ToneProducer' BeOS sample code, added some diagnostic printouts and switch options to change buffer size, initial latency, and so on, and used that as a basic testbed.
The fundamental bug seems to be that something turns off interrupts (I assume) for about 8-9 msec every so often. If this happens at the wrong time, the buffer in the mixer input queue is retrieved late by that much (or a partial delay, depending on relative timing).
As a result, the buffer is 'late', and the mixer notifies the producer -- thinking it is responsible -- which obligingly increases its latency. However, the latency known to the mixer is always its own latency (fixed at 12999 microsec in my system), so however big the producer's latency gets, the glitch will cause the buffer to be late.
Eventually the producer's latency gets too big for the number of buffers assigned to the group, and things go completely west. ("Major GLITCH!!..." in the mixer diagnostic output, and the sound begins to growl).
I can't seem to find any way to compensate for this without a hack at some level. I suppose I could add 9msec to the mixer's own latency -- haven't tried that yet. I can minimize obnoxiousness by having the mixer not send late notices, which prevents the runaway, but you still hear a click when the glitch occurs.
I can modify my own stuff -- and MidiPlayer, which is severely bothered -- to drop data rather than increase latency (or just use 'RECORDING' mode) but this isn't possible with SqueekySynth, which I would really like to be able to use.
Best to find the module causing the delay, but I have no clue how to do so. (I've now ensured -- and checked -- that no serial debug output is active.)
follow-up: 9 comment:8 by , 14 years ago
Replying to Pete:
The fundamental bug seems to be that something turns off interrupts (I assume) for about 8-9 msec every so often. If this happens at the wrong time, the buffer in the mixer input queue is retrieved late by that much (or a partial delay, depending on relative timing).
As a result, the buffer is 'late', and the mixer notifies the producer -- thinking it is responsible -- which obligingly increases its latency. However, the latency known to the mixer is always its own latency (fixed at 12999 microsec in my system), so however big the producer's latency gets, the glitch will cause the buffer to be late.
I don't understand your analysis. There are three latencies relevant to the mixer: upstream, downstream, and internal latency. The downstream latency determines when the mixer has to send a buffer downstream for it to arrive in time and AFAIU is not relevant to the problem at hand, since the issue happens earlier in the chain. The mixer's internal latency defines how long it maximally takes to process a buffer, so that a buffer arriving in time will be sent out in time again. The upstream latency is the one determining when a buffer must arrive at the latest for the mixer to be able to send it out in time.
The three latencies cover the whole chain of nodes from the initial producer to the final consumer seamlessly. This is an important fact. It implies that increasing (the right) latencies will eventually solve all sporadic lateness issues. This includes an 8-9 ms "blackout" (which shouldn't happen in the first place, of course). There simply isn't a "wrong time" for it to happen. If it happens before the buffer arrives, the upstream latency compensates for it. Eventually it will have been increased by a total of 9 ms and it doesn't matter anymore when the blackout occurs, since the buffer will still arrive in time.
If it happens during the mixer's processing, the internal latency should compensate for it. I believe the mixer doesn't adjust its internal latency when it was too slow processing a buffer. It certainly should, but unless the mixer drops buffers due to that, which from your description it doesn't seem to do, we can ignore this issue ATM. The worst case is that the buffer will arrive late at the consumer, which in turn will send a late notice, causing our upstream latency to be increased. So eventually the buffers wouldn't be late anymore either.
So, again, there isn't a "wrong time" for the blackout, since eventually the latencies will compensate for it in either case. That, of course, assumes seamless latency handling, which I believe is what is broken here. In my previous comment I already pointed to BMediaEventLooper::ControlLoop(), which violates that seamlessness constraint by computing the buffer lateness after pulling the buffer out of the queue (having waited for the processing start time already). A blackout occurring anytime between expected wake-up time minus blackout duration and lateness computation will result in a positive buffer lateness, regardless of how early the buffer actually arrived.
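The deadlines implied by the three latencies, as a small sketch (variable names are illustrative, not the mixer's actual members):

    // All for one buffer with performance time 'eventTime':
    bigtime_t arrivalDeadline = eventTime - internalLatency - downstreamLatency;
    bigtime_t sendDeadline    = eventTime - downstreamLatency;
    // The producer in turn starts working at
    //   eventTime - (upstreamLatency + internalLatency + downstreamLatency),
    // so the three spans tile the whole chain: whichever span a sporadic delay
    // falls into, increasing that one latency eventually absorbs it.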
Best to find the module causing the delay, but I have no clue how to do so. (I've now ensured -- and checked -- that no serial debug output is active.)
I would recommend to first look into the BMediaEventLooper issue. That will probably already fix the problem. The delay, if indeed caused by disabled interrupts, will be hard to debug. There aren't any means for debugging disabling/enabling of interrupts yet. If interrupts aren't the problem, scheduling_recorder + DebugAnalyzer (neither is in the image by default) could help. Record a stretch of time (ideally not too long -- seconds rather than minutes) with the former and load the result with the latter. If the mixer thread was held up and the cause was some wait object (semaphore, mutex, ...), that can be seen. Blackout by interrupt will probably only show as increased latency or run time.
comment:9 by , 14 years ago
Replying to bonefish:
I don't understand your analysis.
Possibly because I'm not sure I did, either... (:-)) I took my time responding because I want to make sure I'm covering all the subtle possibilities this time. There seem to be a lot of confounding factors involved. Forgive me if a lot of the following is what you already expected.
First, let's break down the original problem in a bit more detail, which in fact is basically as you described -- using the lateness computed by the ControlLoop after waking up is wrong. If the wake-up is late, due to a blackout, the loop will always report the lateness, whatever the current EventLatency (which is what the MediaEventLooper both computes the waiting time from, and checks lateness against). The mixer would always respond to that by notifying the producer, which would correspondingly increase its own latency a bit more. This would happen every time there was a blackout at the 'wrong' time, until the latency got longer than could be covered by the available buffer pool, and the sound would go to pieces.
And this is what I meant by the 'wrong time'. If the blackout occurred, say, while the loop was sleeping, it would not make anything late. It had to happen actually at the wakeup time (or between then and when lateness was calculated) to cause the problem. And changing any of the latencies in the chain (downstream, internal, or upstream) couldn't change the situation in any way, because both the 'expected' time and the wakeup time were based on the same latency.
So, your suggestion to move the control loop's test for lateness seemed rational, but in practice it doesn't help much. I put the calculation at the top of the loop, where the head of the queue gets checked, but -- for shorter buffers anyway -- the glitches still happened about as often. I realized that this is because it is a loop. If the buffer duration is longer than the latency, things actually work, because there will normally be no buffers kept in the queue. When each buffer is sent for processing, the loop goes into infinite timeout until it is woken by the next input buffer. If the input is late (or is signalled late) upstream latency will be increased as it should, and it won't be late next time.
If the buffer duration is shorter than the latency, though, buffers will spend some time in the queue, and will be accessed there by the loop at the moment the previous buffer is handled (rather than there being an idle interval). If there is a delay here, the situation is the same as before -- whatever the current latency value, the event will be determined to be 'late', and we're back to the endless increase.
Hope that makes more sense this time. And in general, yes, you're right. The "seamless" latency handling is broken. But actually I don't think it's really broken in the control loop. In fact I've ended up putting it back the way it was (excepting an early fix for a sign error in the initial lateness tests). I think the real point is that the mixer was using the information wrongly.
Taking lateness information from the control loop is never going to work, as I think I've shown. What I had to do was move the (upstream) lateness check (and NotifyLateProducer call) into the mixer's BufferReceived() method, and replace the original check in HandleInputBuffer with a limited adjustment for the mixer's own EventLatency to handle the delays happening in its control loop. The 'lateness' parameter itself is now just ignored. [I'm not sure what BeOS meant the 'lateness' in the HandleInputBuffer call to represent. It's hard because under no circumstances can I make BeOS actually generate a LateNoticeReceived()!]
With these mods, I no longer get runaway latency, and I can play live MIDI input through SqueekySynth. There is still the occasional audible hiccup -- and corresponding "MAJOR Glitch!!" reported, but it is tolerable, and overall latency stays under 30msec.
Still have no clue as to the basic cause of the blackouts. Is it possible that a task-switch could occur, even though the thread is high priority?
Anyway, I'll have to tidy up the code -- removing all the extraneous printfs etc. -- and then post the patch.
by , 14 years ago
Attachment: media_mixer.diff added
Media and Mixer patch to fix behaviour with short buffers
comment:10 by , 14 years ago
patch: 0 → 1
follow-up: 12 comment:11 by , 14 years ago
After a little more tweaking, I think I'm satisfied with the changes to AudioMixer, so the patch is attached.
In summary, the updates are as follows:
MediaEventLooper:
In the original, the latency was subtracted from "now" before checking against the first event start time, so that could never be seen as late! Latency is now added to the current time as it should be.
(The actual calculation of "lateness" was at one point moved to that point at the beginning of the loop, but it seems better that the true lateness when the buffer is returned from the loop be provided, so this check is still done after wakeup, as it was originally.)
AudioMixer:
Checking the lateness of incoming buffers -- and notifying the producer -- is now done in BufferReceived, rather than when the buffer is returned from the loop in HandleInputBuffer. There is a new test in the latter for actual lateness (wrt downstream latency -- not the total event latency) which will adjust the Mixer's internal latency. LateNoticeReceived will also adjust internal latency, but as far as I can tell is never invoked!
In all cases, lateness is only acted upon if it is an initial report, as multiple buffers following in the queue will also be flagged late. The fLastNotification variable has been redirected for this purpose, as its original use in LateNoticeReceived seemed to be pointless.
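The MediaEventLooper change, roughly (a simplified sketch, not the exact ControlLoop() source; the queue member name approximates the sources):

    bigtime_t latency = EventLatency() + SchedulingLatency();

    // before: an event only looked late once it was a further 'latency'
    // microseconds overdue, so in practice it was never flagged
    bool lateOld   = (TimeSource()->Now() - latency) >= fEventQueue.FirstEventTime();

    // after: late as soon as the intended handling time (start time minus
    // latency) has already passed
    bool lateFixed = (TimeSource()->Now() + latency) >= fEventQueue.FirstEventTime();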
follow-up: 13 comment:12 by , 14 years ago
Replying to Pete:
MediaEventLooper:
In the original, the latency was subtracted from "now" before checking against the first event start time, so that could never be seen as late!
Good catch!
Latency is now added to the current time as it should be. (The actual calculation of "lateness" was at one point moved to that point at the beginning of the loop, but it seems better that the true lateness when the buffer is returned from the loop be provided, so this check is still done after wakeup, as it was originally.)
Due to the violation of the seamless latency computation it is still incorrect, though. If the behavior is worse with the correct implementation, there's obviously something else wrong. Since with your changes the mixer completely ignores the value computed here, the issue is probably in a different node up- or downstream.
AudioMixer:
Checking the lateness of incoming buffers -- and notifying the producer -- is now done in BufferReceived, rather than when the buffer is returned from the loop in HandleInputBuffer.
Well, that is a work-around for the incorrect lateness computation in BMediaEventLooper::ControlLoop().
There is a new test in the latter for actual lateness ( wrt downstream latency -- not the total event latency) which will adjust the Mixer's internal latency.
While it doesn't harm, I don't think this is necessary either. I would rather see the lateness be computed correctly in BMediaEventLooper and handled here instead of in BufferReceived(). That only leaves the wait itself, which should be covered by the scheduling latency. Though I wouldn't bother adjusting it, but rather let the internal latency compensate. The internal latency could be checked after the mixing has happened (i.e. whether the lateness after mixing is greater than the greatest lateness of the processed input buffers (using the lateness value computed by BMediaEventLooper)). Doing that isn't strictly necessary either, since we should get a late notice when the buffer arrives late downstream.
LateNoticeReceived will also adjust internal latency, but as far as I can tell is never invoked!
That's bad and should be fixed.
In all cases, lateness is only acted upon if it is an initial report, as multiple buffers following in the queue will also be flagged late. The fLastNotification variable has been redirected for this purpose, as its original use in LateNoticeReceived seemed to be pointless.
The purpose of fLastLateNotification is to prevent a reaction to a late notice before the latency adjustment caused by the previous late notice could have taken effect. Looking at the code I'd say the value it is currently set to (TimeSource()->Now()) should be increased by the downstream and internal latency, since any buffer that already has been or is currently being processed won't benefit from the latency change. So in that your change is correct, but the respective change in HandleInputBuffer() seems superfluous at best.
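In code form, the suggested setting would be roughly (the two latency member names are assumptions, not checked against the mixer sources):

    // Ignore further late notices until a buffer that actually benefits from
    // the latency change can have reached the consumer.
    fLastLateNotification = TimeSource()->Now()
        + fDownstreamLatency + fInternalLatency;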
Regarding your comment in BufferReceived():

    // I believe this is pointless here, because the relevant run mode is the source, not the mixer!
    // if (RunMode() == B_DROP_DATA || RunMode() == B_DECREASE_PRECISION
    //         || RunMode() == B_INCREASE_LATENCY) {
Your assumption is not correct. Whether the mixer needs the buffers in time depends on the mixer's run mode, not that of its source. Whether the source chooses to ignore the notice is another matter.
Regarding coding style: The assignment operator should be treated like any other operator when wrapping the line (i.e. break before). Local variable names are camel-case.
follow-up: 15 comment:13 by , 14 years ago
Replying to bonefish:
Replying to Pete:
(The actual calculation of "lateness" was at one point moved to that point at the beginning of the loop, but it seems better that the true lateness when the buffer is returned from the loop be provided, so this check is still done after wakeup, as it was originally.)
Due to the violation of the seamless latency computation it is still incorrect, though. If the behavior is worse with the correct implementation, there's obviously something else wrong. Since with your changes the mixer completely ignores the value computed here, the issue is probably in a different node up- or downstream.
It was not "worse" -- just ineffective. My thesis is that there is no possible "correct implementation" of lateness calculation in ControlLoop()!
Let's suppose a buffer arrives from the producer "on time" based on the buffer's start time and current latencies. It gets put in the looper's input queue, and the looper is immediately woken up (by a message) to handle it. The appropriate wakeup moment is computed from the current time, the looper's (aka mixer's) event latency, and the desired start time, and the loop goes to sleep (supposedly) for that period. "Lateness" can be computed -- from the same parameters -- either at the same point as the sleep time (as I tried), or when the loop wakes up (as it was originally and is now).
Now let's suppose a blackout occurs, resulting in a late wakeup. Then using a lateness calculated after the wakeup to affect upstream latency is obviously wrong, because the delay in the loop gets reported as if it was upstream. However, the other setup is no better, because if buffer duration is short there will be other buffers in the queue; a late wakeup will mean that -- though the first buffer will not be "late" because its lateness was determined before the wait -- the next buffer is looked at late when the loop returns to the top after the wait, so computing lateness at that point has just the same result: again "lateness" will be detected and the cause will erroneously be attributed to the producer. (If buffer duration is longer than the latency, the early test does work, because the loop goes into infinite timeout after passing on a buffer, and is woken by the next one's arrival.)
As I noted before, there's no way to stop this happening for every blackout, so the producer's latency will keep on increasing until it runs out of buffers. This means that lateness reported by the looper can never be relied upon for notifying the producer, however or wherever it is determined. As it can't be used for this, it seems more logical for it to represent the actual lateness of buffers when delivered, in case this might be useful.
[Hope all that's clear... I've had great difficulty in translating my observations into words!]
AudioMixer:
Checking the lateness of incoming buffers -- and notifying the producer -- is now done in BufferReceived, rather than when the buffer is returned from the loop in HandleInputBuffer.
Well, that is a work-around for the incorrect lateness computation in BMediaEventLooper::ControlLoop().
See above... (:-)) Why do you think this is the wrong place to do the check? Upstream latency should only be changed when the producer is late (and blackouts do also happen in the producer's looper). So testing when the buffer actually arrives at the mixer seems the obvious (and correct!) place to do the check.
There is a new test in the latter for actual lateness ( wrt downstream latency -- not the total event latency) which will adjust the Mixer's internal latency.
While it doesn't harm, I don't think this is necessary either. I would rather see the lateness be computed correctly in BMediaEventLooper and handled here instead of in BufferReceived().
Again, see above... In fact, I think this would properly be dealt with by LateNoticeReceived, but as I say I didn't seem to be seeing any activity there. I felt a "belt and braces" approach was safer for the moment. As you say, it does no harm (and works in practice...).
That only leaves the wait itself, which should be covered by the scheduling latency. Though I wouldn't bother adjusting it, but rather let the internal latency compensate. The internal latency could be checked after the mixing has happened (i.e. whether the lateness after mixing is greater than the greatest lateness of the processed input buffers (using the lateness value computed by BMediaEventLooper)).
Not sure exactly what you're suggesting here. Note, though, that using the looper's lateness to calculate internal latency is no use either, because the looper only knows about (total) event latency (and scheduling latency). If you adjust internal latency and thus event latency, all calculations come out just as before (it will compute both sleep time and lateness based on the event latency, so short buffers especially can still generate spurious lateness). So each blackout will incur another bump.
LateNoticeReceived will also adjust internal latency, but as far as I can tell is never invoked!
That's bad and should be fixed.
I'm not exactly sure of the flow sequence here. I think MixerInput is the next in line? I presume MultiAudio gets the actual mixer output, but that's probably too far down the line.
In all cases, lateness is only acted upon if it is an initial report, as multiple buffers following in the queue will also be flagged late. The fLastNotification variable has been redirected for this purpose, as its original use in LateNoticeReceived seemed to be pointless.
The purpose of fLastLateNotification is to prevent a reaction to a late notice before the latency adjustment caused by the previous late notice could have taken effect. Looking at the code I'd say the value it is currently set to (TimeSource()->Now()) should be increased by the downstream and internal latency, since any buffer that already has been or is currently being processed won't benefit from the latency change. So in that your change is correct, but the respective change in HandleInputBuffer() seems superfluous at best.
("Now"+downstream+internal) should be approximately the same as the buffer start time by my calculation, so I chose that for convenience.
The HandleInputBuffer() code section is intended to compensate for queued buffers all being affected by a single blackout, but possibly should only last for the reported lateness, rather than the entire latency period. I originally added a separate variable to the class structure, but decided that the existing fLastNotification was adequate.
Regarding coding style: The assignment operator should be treated like any other operator when wrapping the line (i.e. break before). Local variable names are camel-case.
OK -- thanks. Will fix.
comment:14 by , 14 years ago
I need to correct myself on the reason for the queuing of buffers and the correct setting of fLastNotification. Of course it's not any of the actual latencies that determines how many buffers are queued, but the extra upstream latency. If everything was running like precise clockwork, buffers would arrive from the producer just at the right point to be passed on from ControlLoop(), and they'd never accumulate in the queue. In reality, a blackout will cause an increase in the producer's latency, and -- as that was transient -- henceforth buffers will arrive early, and need to be queued.
This means that using EventLatency() to calculate fLastNotification is not really correct, though as latency always seems to be greater than the maximum lateness it works in practice. I'm not sure what the best way of determining this is. Maybe it could keep a count of the (approximate) number of buffers in the queue, and multiply the buffer duration by this to get the needed inhibitory delay.
follow-ups: 16 45 comment:15 by , 14 years ago
Replying to Pete:
Replying to bonefish:
Replying to Pete:
(The actual calculation of "lateness" was at one point moved to that point at the beginning of the loop, but it seems better that the true lateness when the buffer is returned from the loop be provided, so this check is still done after wakeup, as it was originally.)
Due to the violation of the seamless latency computation it is still incorrect, though. If the behavior is worse with the correct implementation, there's obviously something else wrong. Since after with your changes the mixer completely ignores the value computed here, the issue is probably in a different node up- or downstream.
It was not "worse" -- just ineffective. My thesis is that there is no possible "correct implementation" of lateness calculation in ControlLoop()!
[...]
Now let's suppose a blackout occurs, resulting in a late wakeup. Then using a lateness calculated after the wakeup to affect upstream latency is obviously wrong, because the delay in the loop gets reported as if it was upstream. However, the other setup is no better, because if buffer duration is short there will be other buffers in the queue; a late wakeup will mean that -- though the first buffer will not be "late" because its lateness was determined before the wait -- the next buffer is looked at late when the loop returns to the top after the wait, so computing lateness at that point has just the same result: again "lateness" will be detected and the cause will erroneously be attributed to the producer. (If buffer duration is longer than the latency, the early test does work, because the loop goes into infinite timeout after passing on a buffer, and is woken by the next one's arrival.)
Agreed.
AudioMixer:
Checking the lateness of incoming buffers -- and notifying the producer -- is now done in BufferReceived, rather than when the buffer is returned from the loop in HandleInputBuffer.
Well, that is a work-around for the incorrect lateness computation in BMediaEventLooper::ControlLoop().
See above... (:-)) Why do you think this is the wrong place to do the check? Upstream latency should only be changed when the producer is late (and blackouts do also happen in the producer's looper). So testing when the buffer actually arrives at the mixer seems the obvious (and correct!) place to do the check.
I think any place outside libbe is suboptimal for the lateness computation. BMediaEventLooper already provides a lateness value. To let that be computed incorrectly and have the API user compute it themselves is not a solution.
I have nothing against basing the lateness computation on a time stamp from the receiving thread. Unfortunately the event doesn't go through the BMediaEventLooper, but is added directly to the event queue. A reasonable solution would be to add a queued_time field to media_timed_event, filled in when the event is added. That means BTimedEventQueue would need to allow for optionally specifying a BTimeSource. The actual lateness computation would still happen in BMediaEventLooper::ControlLoop(), just based on the event's queued_time.
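A rough sketch of that suggestion (the queued_time field and the stamping in AddEvent() do not exist in the current API; the names are hypothetical):

    // hypothetical addition to media_timed_event:
    //     bigtime_t queued_time;   // real time at which the event was enqueued

    // in BTimedEventQueue::AddEvent() (or a wrapper around it), stamp the event:
    event.queued_time = BTimeSource::RealTime();

    // in BMediaEventLooper::ControlLoop(), lateness then no longer depends on
    // when the looper happens to pull the event out of the queue:
    bigtime_t arrivalDeadline = TimeSource()->RealTimeFor(event.event_time,
        EventLatency() + SchedulingLatency());
    bigtime_t lateness = event.queued_time - arrivalDeadline;   // > 0 means late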
There is a new test in the latter for actual lateness (wrt downstream latency -- not the total event latency) which will adjust the Mixer's internal latency.
While it doesn't harm, I don't think this is necessary either. I would rather see the lateness be computed correctly in BMediaEventLooper and handled here instead of in BufferReceived().
Again, see above...
You too. :-)
In fact, I think this would properly be dealt with by LateNoticeReceived, but as I say I didn't seem to be seeing any activity there. I felt a "belt and braces" approach was safer for the moment.
We don't want something that works just because we have added enough work-arounds. We want it to work because it is correct. In fact adding work-arounds usually makes it more complicated to find the actual problems.
As you say, it does no harm (and works in practice...).
No-ops don't harm either. Still we don't want them in the code.
That only leaves the wait itself, which should be covered by the scheduling latency. Though I wouldn't bother adjusting it, but rather let the internal latency compensate. The internal latency could be checked after the mixing has happened (i.e. whether the lateness after mixing is greater than the greatest lateness of the processed input buffers (using the lateness value computed by BMediaEventLooper)).
Not sure exactly what you're suggesting here.
I'm suggesting that, for the time being, all that is needed is the lateness handling in HandleInputBuffer(). If you or anyone else feels motivated, one could add comparisons of the actual buffer processing times with the internal latency and increase the latter, if necessary. Due to the mixer's nature of having multiple inputs with possibly different formats and buffer durations, this might turn out to be a bit complicated. I'm also not sure whether this would actually improve things in practice.
Note, though, that using the looper's lateness to calculate internal latency is no use either, because the looper only knows about (total) event latency (and scheduling latency). If you adjust internal latency and thus event latency, all calculations come out just as before (it will compute both sleep time and lateness based on the event latency, so short buffers especially can still generate spurious lateness). So each blackout will incur another bump.
I don't follow. "All calculations" certainly don't come out the same: The wakeup time will be earlier, which is the whole point of increasing the internal latency. The lateness has nothing to do with our internal latency (other than shifting the requested buffer arrival times), so, of course, it shouldn't be affected. Regarding the blackout, I thought we agreed on that: As long as the latency/lateness computation is done seamlessly without attributing wait times (be it actual wait times or times where the thread is busy with another event) to the upstream latency the blackout time will increase the right latencies until they completely compensate for blackouts. And computing the lateness as suggested (based on a time stamp in BTimedEventQueue::AddEvent()), that will be the case:
- A blackout anytime before getting the time stamp in BTimedEventQueue::AddEvent() will increase the lateness and will cause the upstream node's latency to be increased until buffers arrive up to a full blackout time early. Another blackout will simply result in 0 lateness.
- A blackout anytime after getting the time stamp -- be it even before actually queuing the event or anywhere in the looper thread -- will not affect the computed lateness and will thus not increase the upstream node's latency. The buffer will be processed and sent downstream where it will possibly arrive late, which will cause a late notice and an increase of the internal latency. Eventually the internal latency will have been increased by a full blackout time. Hence the looper will schedule the wakeup time a full blackout time earlier. Thereafter:
- A blackout before starting to wait will simply cause the event to be pulled out of the queue without wait at wakeup time + blackout time. Since the internal latency includes a full blackout time, there's still enough time to process and send the buffer so that it arrives in time.
- A blackout while or immediately after waiting will cause the event to be pulled out of the queue at the latest at wakeup time + blackout time, so, again, in time for the downstream node.
- In case of a later blackout the event will be pulled out of the queue at wakeup time, so there's a full additional blackout time left for processing and sending the buffer, which compensates for the blackout.
So in neither case latencies will be increased unboundedly.
LateNoticeReceived will also adjust internal latency, but as far as I can tell is never invoked!
That's bad and should be fixed.
I'm not exactly sure of the flow sequence here. I think MixerInput is the next in line?
No, that's just part of the mixer. The whole mixer is a single node. Next upstream you usually already have the audio producer.
I presume MultiAudio gets the actual mixer output, but that's probably too far down the line.
Nope, that's usually already the next one downstream. Just check with Cortex.
In all cases, lateness is only acted upon if it is an initial report, as multiple buffers following in the queue will also be flagged late. The fLastNotification variable has been redirected for this purpose, as its original use in LateNoticeReceived seemed to be pointless.
The purpose of fLastLateNotification is to prevent a reaction to a late notice before the latency adjustment caused by the previous late notice could have taken effect. Looking at the code I'd say the value it is currently set to (TimeSource()->Now()) should be increased by the downstream and internal latency, since any buffer that already has been or is currently being processed won't benefit from the latency change. So in that your change is correct, but the respective change in HandleInputBuffer() seems superfluous at best.
("Now"+downstream+internal) should be approximately the same as the buffer start time by my calculation, so I chose that for convenience. The HandleInputBuffer() code section is intended to compensate for queued buffers all being affected by a single blackout,
To clarify what I mean: This code is insufficient, because only the late notice receiver can really decide whether an internal latency change due to a previous late notice happened before or after processing the buffer causing the current late notice. And since the late notice receiver has to do that anyway (haven't checked whether it actually does in this case), the code in HandleInputBuffer() is superfluous.
Apropos, latency change: I noticed that BMediaEventLooper::SetEventLatency() does not wake up the looper thread, which means that the event currently waited on will not be affected by the latency change.
follow-up: 17 comment:16 by , 14 years ago
Replying to bonefish:
Replying to Pete:
I think any place outside libbe is suboptimal for the lateness computation.
Actually libmedia, but... (:-))
I have nothing against basing the lateness computation on a time stamp from the receiving thread. Unfortunately the event doesn't go through the BMediaEventLooper, but is added directly to the event queue. A reasonable solution would be to add a queued_time field to media_timed_event, filled in when the event is added. That means BTimedEventQueue would need to allow for optionally specifying a BTimeSource. The actual lateness computation would still happen in BMediaEventLooper::ControlLoop(), just based on the event's queued_time.
This sounds as if it would work (haven't got near trying it yet).
It is in fact very annoying (to me) that BTimedEventQueue is so divorced from the BMediaEventLooper that uses it. There should have been a BMediaEventLooper::AddEvent(...) (and ::AddRealtimeEvent) that could handle things like waking up the loop -- and the above idea for a queued_time. (I long ago found that a modifiable queue of events -- MIDI in my case -- had to be intimately connected to the loop that was reading it.) It seems wrong to me that the wakeup for an event is done by SendBuffer(...).
I wonder if a separate BTimeSource is actually needed? Although "RealTime" is nominally distinct from "system_time", they have been identical up to this point; is there any reason they ever would be different? [If so, why? I've never understood Be's reasoning here.]
If there is ever such a possibility, it might be a good idea to add a "RealTimeFromSystem(bigtime_t systime)" method to BTimeSource, anyway.
If queued_time can just be from "system_time()", things would be simpler.
I don't follow. "All calculations" certainly don't come out the same:
I think we're talking at cross purposes here. I won't pursue it for now.
I'm not exactly sure of the flow sequence here. I think MixerInput is the next in line?
No, that's just part of the mixer. The whole mixer is a single node. Next upstream you usually already have the audio producer.
What I was trying to figure out is what entity would detect and report lateness from the mixer's MediaEventLooper. I'd have thought this would have to happen before actual mixing of the streams. (I'm afraid I'm completely lost looking at MixerInput.cpp.)
I presume MultiAudio gets the actual mixer output, but that's probably too far down the line.
Nope, that's usually already the next one downstream. Just check with Cortex.
(What I see in Cortex is labelled "Auich", but I inferred that to be actually MultiAudio.) Again, what I meant is that I'd think lateness would have to be noticed before that.
Apropos, latency change: I noticed that BMediaEventLooper::SetEventLatency() does not wake up the looper thread, which means that the event currently waited on will not be affected by the latency change.
Good point. Does this require yet one more port signal code? I don't see any "CONSUMER_..." code that looks appropriate. Early on, when I thought I might need to wake up the looper myself, I wondered if there should be a generic "WAKEUP" code -- perhaps just 0. Is this a bad idea?
follow-up: 18 comment:17 by , 13 years ago
Replying to Pete:
Replying to bonefish:
Replying to Pete:
I wonder if a separate BTimeSource is actually needed? Although "RealTime" is nominally distinct from "system_time", they have been identical up to this point; is there any reason they ever would be different? [If so, why? I've never understood Be's reasoning here.]
system_time() is called often and thus based on a cheap (i.e. taking few CPU cycles to read) clock. I believe Be's argument was that a more accurate and steady (but more expensive) clock might be available in the system and preferable for media use.
If there is ever such a possibility, it might be a good idea to add a "RealTimeFromSystem(bigtime_t systime)" method to BTimeSource, anyway.
I suppose the other way around is of more interest, e.g. to compute wake-up times which normally need the system time.
If queued_time can just be from "system_time()", things would be simpler.
BTimeSource::RealTime() is static anyway. I was rather thinking of setting queued_time to the performance time in case of a performance time queue, but I guess that doesn't even make that much sense, as one would need to use the real time to compute the lateness anyway. So never mind the BTimeSource idea.
I'm not exactly sure of the flow sequence here. I think MixerInput is the next in line?
No, that's just part of the mixer. The whole mixer is a single node. Next upstream you usually already have the audio producer.
What I was trying to figure out is what entity would detect and report lateness from the mixer's MediaEventLooper. I'd have thought this would have to happen before actual mixing of the streams. (I'm afraid I'm completely lost looking at MixerInput.cpp.)
I don't see why you'd want to compute another lateness value in the mixer. We compute the lateness of the buffer arrival which is used to adjust the previous node's latency and the next node will compute the lateness which is used to adjust our latency. Since there's only one latency value per node we can change, that does fully suffice.
Apropos, latency change: I noticed that BMediaEventLooper::SetEventLatency() does not wake up the looper thread, which means that the event currently waited on will not be affected by the latency change.
Good point. Does this require yet one more port signal code? I don't see any "CONSUMER_..." code that looks appropriate. Early on, when I thought I might need to wake up the looper myself, I wondered if there should be a generic "WAKEUP" code -- perhaps just 0. Is this a bad idea?
A separate code for waking up was what I was thinking of, too. Some prefix would be nice, though. Maybe NODE_WAKE_UP or EVENT_LOOPER_WAKE_UP.
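One possible shape for this, as a sketch (the constant and its value are hypothetical and not part of the current media kit; ControlPort() is the node's existing control port):

    // hypothetical wake-up code; carries no payload and triggers no handler
    enum {
        NODE_WAKE_UP = 0
    };

    // inside SetEventLatency(), after storing the new value, nudge the control
    // loop so the event currently being waited on is rescheduled:
    write_port(ControlPort(), NODE_WAKE_UP, NULL, 0);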
Another related issue I noticed is that HandleInputBuffer() should set fLastLateNotification too when it sends out a late notice. Unless the buffer is dropped completely, our downstream node might send us a late notice later and we'd otherwise adjust our own latency.
comment:18 by , 13 years ago
Replying to bonefish:
Replying to Pete:
If there is ever such a possibility, it might be a good idea to add a "RealTimeFromSystem(bigtime_t systime)" method to BTimeSource, anyway.
I suppose the other way around is of more interest, e.g. to compute wake-up times which normally need the system time.
If queued_time can just be from "system_time()", things would be simpler.
BTimeSource::RealTime() is static anyway.
Oh, duhh... Never noticed that. Makes things totally simple.
I was rather thinking of setting queued_time to the performance time in case of a performance time queue, but I guess that doesn't even make that much sense, as one would need to use the real time to compute the lateness anyway. So never mind the BTimeSource idea.
I neglected to mention that I was assuming queued_time would always be system/RealTime. Seems to make more sense to convert to performance time if/when needed.
I don't see why you'd want to compute another lateness value in the mixer. We compute the lateness of the buffer arrival which is used to adjust the previous node's latency and the next node will compute the lateness which is used to adjust our latency. Since there's only one latency value per node we can change, that does fully suffice.
I was uncertain as to the complexity when the mixer combines incoming buffers into a possibly different-sized output buffer. I thought that some further lateness calculation might be needed at that point.
Apropos, latency change: I noticed that BMediaEventLooper::SetEventLatency() does not wake up the looper thread, which means that the event currently waited on will not be affected by the latency change.
Good point. Does this require yet one more port signal code? I don't see any "CONSUMER_..." code that looks appropriate. Early on, when I thought I might need to wake up the looper myself, I wondered if there should be a generic "WAKEUP" code -- perhaps just 0. Is this a bad idea?
A separate code for waking up was what I was thinking of, too. Some prefix would be nice, though. Maybe NODE_WAKE_UP or EVENT_LOOPER_WAKE_UP.
OK. Is just using '0' as the value reasonable? I'm thinking of a code that will specifically never have any action other than ending the WaitUntil.
Another related issue I noticed is that HandleInputBuffer() should set fLastLateNotification too when it sends out a late notice. Unless the buffer is dropped completely, our downstream node might send us a late notice later and we'd otherwise adjust our own latency.
Yes, I was doing that in the various experimental code versions. I now think that the setting should depend on the actual lateness, rather than the latency, but experimentally I find that the observed value of that is sometimes a little less than needed to cover all the spurious notices. It also dawned on me though that each successive (spurious) notification is less than the one preceding, so all one has to do is ignore notifications with decreasing values.
I'll embark on these ideas, and see how it goes.
follow-up: 20 comment:19 by , 13 years ago
I think I need a little clarification on one thing before I get deep into recoding. It looks to me as if MediaEventLooper doesn't quite have the correct idea of timing -- or maybe it's me...
My understanding of "performance time" is that it could be entirely different from "real time", even though at the moment it seems to just be microseconds with its own start point. For instance a performance time "unit" might be a 44.1 kHz audio frame.
If so, one can't just adjust a performance time by subtracting a latency from a performance time (which MediaEventLooper does), as latency is always in microsecs, right? And reported lateness should also always be microsecs, too.
If my thinking is correct, I'll make the necessary corrections to the code while I'm at it.
follow-up: 21 comment:20 by , 13 years ago
Replying to Pete:
My understanding of "performance time" is that it could be entirely different from "real time", even though at the moment it seems to just be microseconds with its own start point. For instance a performance time "unit" might be a 44.1 kHz audio frame.
No, performance time is always in microseconds.
If so, one can't just adjust a performance time by subtracting a latency from a performance time (which MediaEventLooper does), as latency is always in microsecs, right? And reported lateness should also always be microsecs, too.
All times (real or performance, latencies and latenesses) are in microseconds.
Regarding adding/subtracting latencies to/from performance time values, theoretically one shouldn't mix a real time latency with a performance time, as there can be a "drift" between the two measures. In practice that usually doesn't matter all that much as the difference is commonly no more than a few percent and computations involving latencies rarely need to be that exact anyway. Still, I would avoid this kind of mixing.
follow-up: 22 comment:21 by , 13 years ago
Replying to bonefish:
Replying to Pete:
My understanding of "performance time" is that it could be entirely different from "real time", even though at the moment it seems to just be microseconds with its own start point. For instance a performance time "unit" might be a 44.1 kHz audio frame.
No, performance time is always in microseconds.
Are you sure (that that's what Be intended)? The BeBook is either pretty explicit or pretty confused (in the BTimeSource chapter):
"Real time refers to the actual passage of time, as reported by system_time() or the BTimeSource::RealTime() function. It's measured in microseconds.
Performance time runs in "time units" which aren't necessarily directly related to real time. Since your code will have to deal with both kinds of time, you need to be sure to convert between the two time systems when it's necessary to do so."
They make a clear distinction there between microseconds and "time units".
In any case, I can do it "right" for either interpretation.
follow-up: 23 comment:22 by , 13 years ago
Replying to Pete:
Replying to bonefish:
No, performance time is always in microseconds.
Are youo sure (that that's what Be intended)? The BeBook is either pretty explicit or pretty confused (in the BTimeSource chapter): ...
They make a clear distinction there between microseconds and "time units".
I'm too lazy to search whether "time units" is explained somewhere. I'd simply guess they wanted to point out that 1 performance time unit does not exactly equal 1 real time microsecond (due to a possible drift).
At any event a performance time unit must denote some fixed length of time. It cannot be some format dependent value (e.g. 1/44100s) as you suggested, since the format may be different at the beginning and the end of the media node chain (e.g. 44.1kHz CD player and 96kHz sound card), but still both ends must use the same time source. Theoretically some behind-the-scenes conversion could take place, but that would just complicate things without any benefit I see. Also, a good indicator is that bigtime_t is used to store performance time values, which is quite consistently only used for microsecond values.
follow-up: 24 comment:23 by , 13 years ago
Replying to bonefish:
Replying to Pete:
Replying to bonefish:
No, performance time is always in microseconds.
Are you sure (that that's what Be intended)? The BeBook is either pretty explicit or pretty confused (in the BTimeSource chapter): ...
They make a clear distinction there between microseconds and "time units".I'm too lazy to search whether "time units" is explained somewhere. I'd simply guess they wanted to point out that 1 performance time unit does not exactly equal 1 real time microsecond (due to a possible drift).
Well I decided not to be lazy (:-)) and did a little more digging. I think it's quite clear what Be intended...
In PublishTime():
" drift indicates the value which indicates the rate at which the performance time changes compared to real time.
The drift value makes it possible to interpolate intermediate values. For instance, if playback of a video source is progressing at normal speed, the drift value would be 1.0, indicating that performance time and real time progress at the same rate.
However, if the movie is playing at half-speed, drift would be 0.5, so that for every one unit of real time that passes, only a half-unit of performance time would pass. This information is used to compute times without having to query the time source each time an update must occur."
In SnoozeUntil():
"Because performanceTime is specified in performance time, and withLatency is specified in real time, you can't just subtract withLatency microseconds from performanceTime and snooze until that time, because performanceTime may be passing at a rate faster than or slower than real time (if, for example, the time source is running at double-speed)."
At any event a performance time unit must denote some fixed length of time. It cannot be some format dependent value (e.g. 1/44100s) as you suggested, since the format may be different at the beginning and the end of the media node chain (e.g. 44.1kHz CD player and 96kHz sound card), but still both ends must use the same time source.
My interpretation is that all nodes in a chain would use the same time source, but the basic rate can be chosen to suit a need (which might be nothing like the usual audio or video chain, if you wanted, I guess).
Also, a good indicator is that bigtime_t is used to store performance time values, which is quite consistently only used for microsecond values.
It would have to be 64-bit, and I suppose they just didn't think a separate type was justified.
Would be nice if we could collar Jon Watte and ask what he really meant (:-)), but in the meantime I'm going with my interpretation!
follow-up: 26 comment:24 by , 13 years ago
Replying to Pete:
Would be nice if we could collar Jon Watte and ask what he really meant (:-)), but in the meantime I'm going with my interpretation!
I'm not sure what your interpretation is, but I have the feeling it is not correct. It really isn't complicated and the paragraphs you quoted support what I already wrote before: The performance time unit is microseconds, just not real time microseconds but performance time microseconds (the conversion factor is the "drift").
The BeBook quotes seem to suggest that one could somehow influence the rate at which the performance time passes and use that to play audio/video faster or slower. AFAIK that is not (easily) possible, since BTimeSource does not provide an API to do something like that, nor does any other API of the media kit to my knowledge. You'd have to create your own BTimeSource subclass and would find it troublesome (impossible?) to convince audio mixer and other downstream nodes to use your time source.
comment:25 by , 13 years ago
Cc: | added |
comment:26 by , 13 years ago
Replying to bonefish:
I'm not sure what your interpretation is, but I have the feeling it is not correct. It really isn't complicated and the paragraphs you quoted support what I already wrote before: The performance time unit is microseconds, just not real time microseconds but performance time microseconds (the conversion factor is the "drift").
"My interpretation" is just that you can't -- at least in the most general case -- rely on performance time being anything close to microseconds, although it currently always happens to be. The quoted "drift of 0.5" means the time-units are each 2 microsecs long.
I fully recognize that the audio chain always works on "microseconds", even though they're not exact (the 'drift' on my machine is ~0.999975), and therefore things like the mixer don't really have to be concerned. (I was a bit silly to suggest a 44kHz rate, as all drivers really have to get as close to microsecs as they can to keep things clean.) However, if other rates are possible as the BeBook states, general-purpose modules like MediaEventLooper have to be sure they don't assume this. If they follow the Real/Performance rules in the book, they'll be correct anyway. In the current revision, I've made sure they do.
The BeBook quotes seem to suggest that one could somehow influence the rate at which the performance time passes and use that to play audio/video faster or slower. [....] You'd have to create your own BTimeSource subclass and would find it troublesome (impossible?) to convince audio mixer and other downstream nodes to use your time source.
I think this was exactly the intention. The flexibility is only needed for custom-written BTimeSource classes, for custom media-chains, (of which I guess there are none at the moment). If you wanted to feed such into the standard audio chain, you'd need to do a conversion between time sources.
by , 13 years ago
Attachment: | media_mixer.2.diff added |
2nd version of the patch to fix latency problem
comment:27 by , 13 years ago
I've attached a new patch (entirely obsoleting the previous one), incorporating the suggestions.
It seems to work as intended, except for very short buffers, which will still glitch -- I think because increasing the latency means there are no longer enough buffers in the pool. Fortunately, short buffers don't seem to be necessary in Haiku (as they are in BeOS) for apps like Csound.
Note that in normal operation the AudioMixer does not get late notices from MultiAudio, simply because the run mode for that is set to B_RECORDING. This however is a Good Thing, because the latency checking in MultiAudio still uses code like that in the old MediaEventLooper. (It does not use the lateness value from that!) If I try changing to "increase latency" with Cortex, we're in trouble again, because the mixer keeps bumping its latency until it breaks. This should be looked at some time, but at the moment blackouts at that point in the chain seem to be unnoticeable, so things work well anyway.
Any chance of getting this into alpha-3, so that I won't need to have a custom build? (:-)) [Also my USB_MIDI updates -- #4463 -- which should help others besides myself...]
follow-up: 29 comment:28 by , 13 years ago
Replying to Pete:
First, MediaEventLooper and TimedEventQueue between them have a couple of bugs. There is a sign reversal in the test for lateness in ControlLoop (which when fixed reduces the problem but doesn't remove it), and the event queue suffers from a classic glitch ('classic' because I had to avoid it myself a long time ago...) in that if an event is placed in the queue while the loop is sleeping (maybe with B_INFINITE_TIMEOUT) the loop doesn't get woken up!
I don't see a sign reversal in the lateness calculation. "Fixing" this will just make it schedule buffers later instead of (event latency)-early as it's intended.
The queue won't get stuck because events are only placed in the queue from inside BMediaNode::WaitForMessage, that means other nodes HandleMessage.
Each time after WaitForMessage is processed, the first entries in both queues are checked to account for modifications to the queue. Your patch completely removes this code. Why?
The patch only tags realtime events that have a lateness >0 as realtime. That's broken.
The patch introduces a queued_time field, set on event creation. You even base the "lateness" calculation on that. But this is very strange. It should be possible to create 10 events, give them performance time stamps for when they should be played, and place them into the queue. They are not late unless they get removed from the queue after their performance time has passed. Your lateness calculation makes them appear to be late, just because they were queued early.
In: "lateness = 0; remove any extraneous value if we get this far "
it appears your patch always sets lateness to zero before calling DispatchEvent, finally dropping the information about how late the event is played. Why?
I can't recommend applying this patch.
comment:29 by , 13 years ago
Replying to marcusoverhagen:
Replying to Pete:
I don't see a sign reversal in the lateness calculation. "Fixing" this will just make it schedule buffers later instead of (event latency)-early as it's intended.
Marcus, PLEASE LOOK AGAIN!
An event popped from the queue is late if the difference between 'Now' and the scheduled performance time is less than the latency. One can check for lateness either by subtracting the latency from the performance time (to find the latest time the event can be handled without being late) and seeing if that is greater than the current time, or adding the latency to Now and comparing to the performance time.
Your original code was:
if (fEventQueue.HasEvents() && (TimeSource()->Now() - latency) >= fEventQueue.FirstEventTime()) {
Here, latency is subtracted from Now (getting a still earlier time), which can only be greater than the performance time if not only has that time passed, but we are already latency microseconds past it!! (And many other things will have gone wrong before that could happen...)
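As I read that, the corrected test would be along these lines (sketch only, reusing the variable names from the quoted line):

if (fEventQueue.HasEvents()
        && (TimeSource()->Now() + latency) >= fEventQueue.FirstEventTime()) {
    // The event is due: we are within 'latency' of its performance time,
    // so it has to be dispatched now (or it will arrive downstream late).
}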
The queue won't get stuck because events are only placed in the queue from inside BMediaNode::WaitForMessage, that means other nodes HandleMessage.
Yes. I'm sorry about that (an earlier misunderstanding that is not actually in the patch).
Each time after WaitForMessage is processed, the first entries in both queues are checked to account for modifications to the queue. Your patch completely removes this code. Why?
Where is it removed? The order of things has been changed a bit, and I was able to compress things a bit, but the logic is the same.
The patch only tags realtime events that have a lateness >0 as realtime. That's broken.
Again -- please look again! It sets the realtime flag either when it breaks because of a late event or when it resets waituntil for realtime.
The patch introduces a queued_time field, set on event creation. You even base the "lateness" calculation on that. But this is very strange. It should be possible to create 10 events, give them performance time stamps for when they should be played, and place them into the queue. They are not late unless they get removed from the queue after their performance time has passed. Your lateness calculation makes them appear to be late, just because they were queued early.
They are late if they are removed from the queue after (performance-time - downstream-latency) has passed! And they are not classified 'late' if they are queued early! They are late if they are queued late.
The whole scheme is because WaitForMessage sometimes returns 8-10 msec later than the scheduled waituntil time. This meant that the looper thought that its upstream producer had supplied the event late when in fact it itself was the culprit. The result was that an "Increase Latency" message would be sent upstream, but it would not cure the situation -- the next WaitForMessage delay would again cause a lateness detection, and the cycle would repeat.
With the added queued_time variable, you don't get the spurious lateness episodes, but an event that arrives late downstream because of the glitch will be detected there and reported back as appropriate.
(And I should probably note that this part was Ingo's idea... (:-/) I was originally doing a modified check in the mixer code itself, but I agree that this is better.)
In: "lateness = 0; remove any extraneous value if we get this far "
it appears your patch always sets lateness to zero before calling DispatchEvent, finally dropping the information about how late the event is played. Why?
Again... No. A positive lateness will always cause a break before that statement is reached (and thus will be available to DispatchEvent). The zeroing is necessary because lateness will normally be negative (and irrelevant), so it is just removed.
I can't recommend applying this patch.
Please evaluate the code properly, and think again. Without it, Haiku is quite unusable for real-time music.
comment:31 by , 13 years ago
Fwiw, hrev43164 definitely fixes sound in VMware (using OpenSound). It used to be very stuttery before, but now the playback quality is flawless as far as my ears can tell.
comment:32 by , 13 years ago
I'm baffled. I had perfectly smooth, non-stuttering sound in VMware there for a while but now the quality is bad again after just doing a few more builds. Reverting back to hrev43164 doesn't help either, so please disregard my previous comment.
comment:33 by , 13 years ago
Many thanks to Adrien for applying the patch. However, as luroh has found, it doesn't necessarily fix things! At the bottom, I'm sure the problem is those darn 'blackouts' -- I suppose deep in the scheduler somewhere -- that can eventually cause the latency of a source to increase. The patch takes care of this in the mixer, but the blackouts can also affect the initial source, so if it's in "increase latency" mode the same thing can happen.
In the audio code that I am writing or have source for (such as Csound) I'm currently setting the mode to "recording", so even if blackouts occur, latency doesn't grow. There are certainly still 'crackles' when there is any display activity, but it's tolerable -- at least for casual use.
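For reference, a minimal sketch of forcing a producer into recording mode from outside the app, via the public roster API (the media_node handle is assumed to be obtained elsewhere; this is just an illustration, not part of any patch here):

#include <MediaRoster.h>

status_t
ForceRecordingMode(const media_node& producer)
{
    BMediaRoster* roster = BMediaRoster::Roster();
    if (roster == NULL)
        return B_ERROR;
    // In B_RECORDING mode, late notices don't make the node grow its latency.
    return roster->SetRunModeFor(producer, BMediaNode::B_RECORDING);
}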
follow-up: 35 comment:34 by , 13 years ago
Not sure if that qualifies as the same kind of blackout being discussed here, but the whole OS can be put into a long freeze by a simple trick (see #8007). If that ticket were related to this ticket, well, it would be too good to be true, I guess? Indeed, with a multi-second blackout (?) that is trivially simple to reproduce, and that affects media_server as well as app_server and even the mouse pointer, it would make it easy to debug the whole thing, by dropping into KDL to see what's going on etc., right?
Ignore me if that is entirely unrelated, but I'm posting this comment on a hunch that maybe it's not.
P.S. everybody, make sure to be in mono-processor mode when trying to reproduce #8007 (or even this ticket here for that matter)
comment:35 by , 13 years ago
Replying to ttcoder:
Not sure if that qualifies as the same kind of blackout being discussed here, but the whole OS can be put to a long freeze by a simple trick (see #8007).. If that ticket was related to this ticket, well it would be too good to be true I guess?
Thanks for that link! I've added a comment over there with a couple of observations from a remote Terminal that might be useful. Dunno if the two bugs are related, but I strongly suspect they are!
comment:36 by , 13 years ago
Blocked By: | 8007 added |
follow-up: 39 comment:37 by , 13 years ago
Ok, tried running the patch under some stress tests... I'm not too happy with the results. The mode called "increase latency" is meant to increase the latency. The idea is to get smooth sound output in the end, and it does that by increasing the latency each time something is late, until it is large enough.
The fact that there are "black outs" or other problems somewhere in the system must result in the latency eventually getting big enough; in the current situation it may reach several seconds under high load. With the patch this doesn't happen anymore:
- play some music with media player
- run some CPU intensive task - git pull, a build of haiku for example
- after some time the media decoding gets late and audio stops
Before patch:
- Latency was increased a lot and audio resumed from where it stopped, without more problems. This is the expected result: latency has increased, and now the nodes manage to process stuff properly.
After patch:
- Music will fast-forward to catch up, which is not the expected result in increase latency mode.
We need to have a closer look at things, but, if you want to keep short latency and have the sound skip to catch up on system delays, instead of increasing the latency until they aren't visible anymore, increase latency mode is not for you.
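For context, this is roughly what increase-latency mode expects a producer to do when a late notice arrives (a sketch with made-up member names such as fOutput, fInternalLatency and fDownstreamLatency; not code from the patch):

void
MyProducer::LateNoticeReceived(const media_source& what, bigtime_t howMuch,
    bigtime_t performanceTime)
{
    if (what != fOutput.source)
        return;
    if (RunMode() == B_INCREASE_LATENCY) {
        // Advertise a larger latency so later buffers are scheduled earlier,
        // instead of skipping data to catch up.
        fInternalLatency += howMuch;
        SetEventLatency(fDownstreamLatency + fInternalLatency);
    }
}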
comment:38 by , 13 years ago
Owner: | changed from | to
Status: | new → assigned |
follow-up: 41 comment:39 by , 13 years ago
Replying to pulkomandy:
Ok, tried running the patch under some stress tests...
I can't really reproduce what you're seeing.
With MediaPlayer, however hard I work the system, I can't get it to glitch, with or without the patch! Without the patch I can push the latency up to 163ms (which, I seem to recall, is a system-set limit) but I guess it has allocated enough buffer space to handle this (dunno how many buffers it allocates, but they're 40ms each). With the patch, I haven't managed to push the latency beyond ~30ms...
MediaPlayer is not quite typical, I think, because it doesn't use SoundPlayNode, and may have done some tricks of its own (haven't looked yet). Try something like MidiPlayer instead, which on my system goes mad almost immediately with an unpatched libmedia, but survives with the patch most of the time. Once it fails, though, it won't recover until it's quit and restarted.
We need to have a closer look at things, but, if you want to keep short latency and have the sound skip to catch up on system delays, instead of increasing the latency until they aren't visible anymore, increase latency mode is not for you.
You say that you heard the sound "catching up" with the patch, but I've never encountered this. As I say, once the sound gets garbled, you have to restart the app.
The problem with increase-latency mode is that unless you allocate a lot of extra buffers, it may increase (especially when unpatched!) beyond the available capacity. That's the trouble with the "ever-increasing latency" that was the original topic! A limit is set somewhere in the chain (about 1/5 sec?), but most apps have too little buffer space to reach this.
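A rough sketch of that capacity issue (made-up member names): a node usually sizes its buffer group once, from the latency and buffer duration it knows at connect time, so a latency that keeps being bumped up eventually exceeds what was allocated.

// Enough buffers to cover the advertised latency, plus one in flight.
int32 bufferCount = int32(fEventLatency / fBufferDuration) + 1;
fBufferGroup = new BBufferGroup(fOutput.format.u.raw_audio.buffer_size,
    bufferCount);
// If fEventLatency keeps growing but the group is never re-created,
// the node runs out of buffers long before any system-wide latency cap.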
You're right that with the current blackout problems, short latency and increase-latency mode are not compatible. (For my live MIDI stuff, I'm using B_RECORDING where I can, but that's not always possible to set -- for SqueekySynth for instance.) However, as MidiPlayer shows, the problem is wider.
Glad you've taken ownership. This needs a solution.
comment:40 by , 13 years ago
Maybe I forgot to tell I'm currently running a computer from 2003, which is quite a good thing for easy stress testing. I get complete system freezes by just running a Haiku build, which doesn't happen on my usual (and slightly newer) computer.
follow-up: 43 comment:41 by , 13 years ago
Replying to Pete:
You say that you heard the sound "catching up" with the patch, but I've never encountered this. As I say, once the sound gets garbled, you have to restart the app.
I don't see that here. When system load gets down, everything resumes as expected (with or without the patch). I get an occasional media_addon_server crash. Are you sure this part is not actually a sound card driver bug?
The problem with increase-latency mode is that unless you allocate a lot of extra buffers, it may increase (especially when unpatched!) beyond the available capacity. That's the trouble with the "ever-increasing latency" that was the original topic! A limit is set somewhere in the chain (about 1/5 sec?), but most apps have too little buffer space to reach this.
The latency shouldn't be ever-increasing, of course.
You're right that with the current blackout problems, short latency and increase-latency mode are not compatible. (For my live MIDI stuff, I'm using B_RECORDING where I can, but that's not always possible to set -- for SqueekySynth for instance.) However, as MidiPlayer shows, the problem is wider.
Can't you set the mode for it in Cortex? Raising the thread priority may help too (and that would point to the problem actually being in scheduler world).
comment:42 by , 13 years ago
Blocked By: | 8007 removed |
(In #8007) I don't see how #7285 is related. AFAICT it's about problems in the chain of media nodes involved and issues in the media kit. The other two tickets may or may not be related. They could as well just be caused by latencies not being guessed/computed generously enough. Without analyzing the scheduling behavior that's impossible to say.
@ttcoder: You found the right lock.h/cpp. If you're referring to the priority inheritance approach, there are a few things that need to be modified: In the Thread structure the contended locks the thread holds need to be tracked and accordingly the lock structures need to be adjusted. Obviously the locking functions themselves need to be modified. Moreover changes in the schedulers are needed as well. I haven't really thought about whether another priority field is needed in the Thread structure or anything else.
I don't want to discourage you working on that, but please note that there's no documentation (other than what you find in the sources) how things work together and I don't really have time and motivation for introductory explanations (I suspect the same holds for others). If you have questions about particulars, feel free to ask.
follow-up: 44 comment:43 by , 13 years ago
Replying to pulkomandy:
Maybe I forgot to tell I'm currently running a computer from 2003, which is quite a good thing for easy stress testing. I get complete system freezes by just running a Haiku build, which doesn't happen on my usual (and slightly newer) computer.
My machine is pretty old, too -- at least 2003 I'd guess. I haven't seen it freeze on a compile, though -- only when what seems to be the #8007 priority-inversion is occurring.
I have now managed to get MediaPlayer to freeze (together with the cursor and everything else) by reading a big web page with WebPositive.
You say that you heard the sound "catching up" with the patch, but I've never encountered this. As I say, once the sound gets garbled, you have to restart the app.
I don't see that here. When system load gets down, everything resumes as expected (with or without the patch). I get an occasional media_addon_server crash.
On further investigation I do see strange behaviour differences with and without the patch. Without the patch, things do go back to normal (aside from permanently increased latency on a running MediaServer) when the freezing ends, but the presence of the patch tends to cause some kind of state after a freeze that, for instance, causes MediaPlayer to fail to close for a while, or Cortex to crash. It does come back to normal after MediaPlayer finally closes, though.
Are you sure this part is not actually a sound card driver bug?
Can't be certain, of course, but all the symptoms -- like "buffer late" messages from the mixer -- seem to say it's earlier in the chain.
I still don't quite understand what you mean by "fast-forwarding". Do you mean that buffers are getting aborted part-way through to make way for the next? Didn't know that could happen; I thought they were always processed sequentially to completion (or else skipped completely). Certainly never heard it on my machine. (Unless the garbled sound I get on excess lateness is that. But it never "catches up" once it starts.)
(For my live MIDI stuff, I'm using B_RECORDING where I can, but that's not always possible to set...)
Can't you set the mode for it in Cortex?
Yes, I can, but you have to catch it before there's been too much display activity, and touching Cortex is display activity! A pain to have to do it each time, too.
Raising the thread priority may help too (and that would point to the problem actually being in scheduler world).
I've tried playing with priorities at various times, with essentially no improvement. I'm sure the root problem is the #8007 glitch locking things out, though. If that can be fixed, all the above might be moot.
comment:44 by , 13 years ago
I don't see that here. When system load gets down, everything resumes as expected (with or without the patch). I get an occasional media_addon_server crash.
On further investigation I do see strange behaviour differences with and without the patch. Without the patch, things do go back to normal (aside from permanently increased latency on a running MediaServer) when the freezing ends, but the presence of the patch tends to cause some kind of state after a freeze that, for instance, causes MediaPlayer to fail to close for a while, or Cortex to crash. It does come back to normal after MediaPlayer finally closes, though.
A crash shouldn't happen, so the patch needs to be fixed.
Are you sure this part is not actually a sound card driver bug?
Can't be certain, of course, but all the symptoms -- like "buffer late" messages from the mixer -- seem to say it's earlier in the chain.
I still don't quite understand what you mean by "fast-forwarding". Do you mean that buffers are getting aborted part-way through to make way for the next? Didn't know that could happen; I thought they were always processed sequentially to completion (or else skipped completely). Certainly never heard it on my machine. (Unless the garbled sound I get on excess lateness is that. But it never "catches up" once it starts.)
No, I also got the garbled sound once or twice - without a way out, as you mention. What I heard in this other case was likely a case of some buffers getting dropped somewhere, creating skips in the sound until there was some space somewhere? Anyway, I guess the same root problem can have different effects on different hardware/drivers, which makes it a bit harder to analyze.
(For my live MIDI stuff, I'm using B_RECORDING where I can, but that's not always possible to set...)
Can't you set the mode for it in Cortex?
Yes, I can, but you have to catch it before there's been too much display activity, and touching Cortex is display activity! A pain to have to do it each time, too.
Ok. I think the issue is rather on SqueekySynth's side for this anyway: the app should use the "drop sample" mode if it's meant for real time performance.
I've tried playing with priorities at various times, with essentially no improvement. I'm sure the root problem is the #8007 glitch locking things out, though. If that can be fixed, all the above might be moot.
It likely doesn't help. Let's watch it...
comment:45 by , 11 years ago
Replying to bonefish:
A reasonable solution would be to add a queued_time field to media_timed_event, filled in when the event is added.
I wonder whether saving the time of enqueue is really the right solution, considering BeOS did not have such a data field.
Considering what was also said by bonefish:
We don't want something that works just because we have added enough work-arounds. In fact adding work-arounds usually makes it more complicated to find the actual problems.
This rings very true and in the end, using queued_time is just that: a work-around. The actual problem is the "blackouts" happening at all, which is a bug in either the kernel or a driver.
If code makes a call to read_port_etc() with timeout, it is reasonable to expect that it will wake up in time and not be delayed 5-8ms by a randomly positioned blackout. So if the kernel's timing works correctly as specified by the API and drivers do not misbehave, the original version by Marcus is correct in calculating the lateness after the dequeue in BMediaEventLooper::ControlLoop. Without blackouts, simply nothing can happen to make the lateness wrong in that place.
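A quick way to make such a blackout visible, assuming you instrument the wait yourself inside a node's control loop (sketch only; waitUntil is whatever absolute real-time deadline the loop computed):

int32 code;
char buffer[4096];
ssize_t size = read_port_etc(ControlPort(), &code, buffer, sizeof(buffer),
    B_ABSOLUTE_TIMEOUT, waitUntil);
bigtime_t overshoot = system_time() - waitUntil;
if (size == B_TIMED_OUT && overshoot > 1000) {
    // Woke up more than 1 ms past the requested timeout: a "blackout".
    printf("read_port_etc() returned %lld us late\n", (long long)overshoot);
}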
I suppose in Be's version of the MediaKit it was simply assumed that timing works correctly. After all, the BeBook e.g. makes very clear in the Device Drivers section, that a driver must never disable interrupts for more than 50 microseconds.
So the question is: do we want to mask the problem of blackouts by adding queued_time, or intentionally go without it to make the problem visible? I'd root for the latter, but would very much like to hear more opinions on it.
comment:46 by , 11 years ago
I think I have a good solution, which I will implement.
While I find working around the timing inaccuracy shouldn't be necessary, there's another thing I had not thought about when writing the previous comment: virtual machines. Those were slow and not popular in Be's days, but certainly are today. When running an OS in these, timekeeping is known to be less accurate. So it makes sense to be more resilient against timing inaccuracy in MediaKit and use queued_time.
However, I wouldn't like to just hide the problem; we should generally keep an eye on keeping timing as accurate as possible, as it's one of the things Haiku can excel at in a desktop system. Imagine the case of a user who wants to use Haiku for playing software synthesizers via a MIDI keyboard. For a skilled player, 8ms of additional latency can make all the difference in the world. Now if he has a system plagued by blackouts of up to 8ms, he would see his latency being increased and wonder why that happens, and it would be hard to tell why the MediaKit decided to increase it.
The solution I want to go for is this: use the queued_time for lateness calculation; that way the seamlessness constraint is always met, and even in virtual machines or with broken drivers things do work (although with increased latency). But then, in BMediaEventLooper::ControlLoop, we also calculate the lateness the old way, the way which expects timing to be accurate. We compare the two lateness values, and if the difference is too high (say, more than 1ms), we write a message to the syslog saying e.g. "MediaKit node abc is late, timing inaccuracy xyz us". Since we always use the queued_time for the late notice, the number of syslog messages will be small and bounded, or we limit it to log the message only once per node instance. If a user asks "why is my latency high", the syslog gives the answer: timing is inaccurate, we don't have to search for a bug in MediaKit, the problem is elsewhere.
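A sketch of that dual bookkeeping (names are illustrative; queued_time is assumed to hold the performance time at which the event was enqueued):

// needs <syslog.h>
bigtime_t dispatchTime = event.event_time - fEventLatency;
// Lateness as seen from upstream: was the event already past its dispatch
// deadline when it was put into the queue?
bigtime_t latenessWhenQueued = event.queued_time - dispatchTime;
// Lateness computed "the old way": how late is it actually being dispatched?
bigtime_t latenessNow = TimeSource()->Now() - dispatchTime;

if (latenessNow - latenessWhenQueued > 1000) {
    // Queued on time but dispatched late: local timing inaccuracy,
    // not an upstream node's fault.
    syslog(LOG_WARNING, "BMediaEventLooper: timing inaccuracy of %lld us",
        (long long)(latenessNow - latenessWhenQueued));
}
// A late notice sent upstream would be based on latenessWhenQueued (if > 0).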
comment:47 by , 11 years ago
Owner: | changed from | to
Version: | R1/alpha2 → R1/Development |
comment:48 by , 11 years ago
Unfortunately I don't have the time to dive any deeper into the topic ATM. So just a few general remarks: Obviously the "blackouts" are not good and should be fixed. That may be driver issues or it may be starvation or locking issues (just like or similar to the ones you have analyzed in the port code), or something else entirely.
Debug kernel builds (the default) have serial debug output enabled. Serial output happens synchronously and with interrupts disabled, i.e. a single line of output can easily result in several ms "blackout".
Anyway, even when the immediate cause for the high latencies is fixed, I think it will always be possible to put enough pressure on the system that high latencies ensue. So the media kit should be able to deal with it. Whatever the system latencies, in B_INCREASE_LATENCY mode the involved nodes' latencies should eventually stabilize and still cushion all delays.
comment:49 by , 9 years ago
Pete, what is the current status of the machine you mentioned in the ticket description? If you could attach the BSoundPlayer debug output with latencies, I will be glad :-)
comment:50 by , 9 years ago
I'm afraid that machine [like Franco...] is still dead! Its hard drive died a couple of years ago, and I keep hoping to get around to replacing it, but...
My current laptop is HDA, and I can't reproduce the effect, even with the oldest system I could resurrect (hrev42415).
I've been trying the ToneProducer test app I used before [modified from the original -- I think just to add command options for buffer size etc.] on two more recent versions -- pre-PM 45851, which is still my day-to-day system, and 49167 from ~May. Neither shows runaway latency, though I do get buffer-late messages. Here's the sort of thing from 49167:
ToneProducer::NodeRegistered
ToneProducer: looper running
Preparing: no buffer size provided, suggesting 2048
Connect: downstream latency = 12.999 msec
Buffers size 2048 with 1 chans have 512 samples
Calc Internal Latency (buffer filling) = 33 usec
SetEventLatency 13.032 msec (incl. added 0)
TimeSource: real 152903403, perf 134741179, drift 0.999994
SetBufferDuration 11.609 msec
AllocateBuffers: latency = 12999 usec, buffer duration = 11609 usec
creating group of 3 buffers, size = 2048
SetRunMode to B_INCREASE_LATENCY(3)
RunMode actually now B_INCREASE_LATENCY
GetLatency: set total=13132 usec (Event=13032 + Sched=100)
Started with RunMode=B_INCREASE_LATENCY
Handle-buffer event for 134755720 (queued at 134742627) late by 40
Sending buffer 134767329 at 134754323 LATE by 28 (handled at 134754232)
Sending buffer 134778939 at 134765935 LATE by 28 (handled at 134765846)
Sending buffer 134790549 at 134777563 LATE by 47 (handled at 134777456)
Sending buffer 134802159 at 134789148 LATE by 22 (handled at 134789060)
Sending buffer 134813769 at 134800770 LATE by 34 (handled at 134800675)
Sending buffer 134825379 at 134812360 LATE by 14 (handled at 134812272)
Sending buffer 134836989 at 134823972 LATE by 17 (handled at 134823880)
.....
The interesting thing is that with 45851, I don't see the repeated LATE messages! Just the initial "Handle buffer..." one.
If I reduce the buffer size to 64, I get the LATE messages on both systems, but in the PM one it sounds a bit rough, while it's still perfectly smooth in the earlier one. [BTW, I do use my modified HDA driver (#9134) on both systems -- seems fine.]
So things are still a bit confusing to me.
comment:51 by , 9 years ago
I didn't test it with your driver, but will do in the next days :-). Actually the newest rev you tested is before my changes. I'm trying to compare the results so that I can figure out what to do then. I also think that in the meantime between the two revs you are testing, some bug might have been added; it would be helpful to verify this and see where the problem is.
comment:52 by , 8 years ago
This issue should be solved as of hrev50293, based on both the discussions and what I have gotten from testing the nodes. Latency looks stable now.
I'll leave this ticket open for a while; if anyone still has concerns about it, just let me know.
comment:53 by , 8 years ago
Resolution: | → fixed |
Status: | assigned → closed |
Fixed as of hrev50603. There's no ever-increasing latency under the tests I mentioned on the ML. The resulting behavior is a min/max lateness pattern that could be used to improve the code further, though I think it's better to stick with the simple one.
I've been digging into this for the past couple of weeks, and have made some progress, but I don't really think I've reached down to the root cause.
First of all, if I increase the buffer size from the 'suggested' one (which BSoundPlayer normally uses) the problem goes away. And I notice that MediaPlayer's audio buffer is larger still, so I imagine this is why it doesn't glitch. However this means I'm not going to get the best latency, which is important for real-time playing of MIDI from a keyboard. It's actually not too bad -- around 22 msec -- but Haiku ought to be able to do better! (The lag is still noticeable.)
I've been putting in (and removing...) printfs everywhere, to see if I can pinpoint the real cause of the delay, but I tend to get a headache poring through the resulting reams of output! (:-/) I have found a few things though.
First, MediaEventLooper and TimedEventQueue between them have a couple of bugs. There is a sign reversal in the test for lateness in ControlLoop (which when fixed reduces the problem but doesn't remove it), and the event queue suffers from a classic glitch ('classic' because I had to avoid it myself a long time ago...) in that if an event is placed in the queue while the loop is sleeping (maybe with B_INFINITE_TIMEOUT) the loop doesn't get woken up! As port messages do typically arrive regularly, things get handled eventually, but not always as soon as they ought to be. I'll report all this more fully later, when I get it into patch form.
What I find completely mysterious, though, is -- what may be the root of the whole problem -- that the read_port_etc in WaitForMessage (called in MediaEventLooper's ControlLoop) sometimes times out 4-8 msec late! (About the same amount the latency keeps increasing by.)
I put the following printfs around the read_port_etc in WaitForMessage (MediaNode.cpp):
and -- just occasionally -- catch something like this:
Note that the time of return is 8 msec later than requested! I don't imagine this is expected behaviour in a B_URGENT_PRIORITY task?