Ultralite AVB

Talk about your MIDI interfaces, microphones, keyboards...

Moderators: MattKingUSA, khz

ahellquist
Established Member
Posts: 62
Joined: Mon Jul 01, 2013 12:28 am
Has thanked: 4 times

Re: Ultralite AVB

Post by ahellquist »

Awesome
User avatar
wrl
Established Member
Posts: 48
Joined: Wed Nov 03, 2010 12:46 am
Been thanked: 2 times

Re: Ultralite AVB

Post by wrl »

Woke up and the channels had indeed drifted, so I don't think we're 100% of the way there right now. Regardless, it's good to see upstream taking this seriously.
tvaz
Established Member
Posts: 24
Joined: Sun Nov 10, 2019 5:55 pm
Been thanked: 1 time

Re: Ultralite AVB

Post by tvaz »

It seems to be something related to IO/IRQ. I've been using an Ultralite AVB with ESS chip without any issues for months now (Drumfix's driver). However, once I perform a high load IO task, say using dd to clone an image to the hard drive, the device will immediately turn to the distortion/channel shifting mode. Every single time.
User avatar
wrl
Established Member
Posts: 48
Joined: Wed Nov 03, 2010 12:46 am
Been thanked: 2 times

Re: Ultralite AVB

Post by wrl »

So, the behaviour I was observing with Audacity turned out to be resolved by a BIOS update, so that may have been a sketchy USB host controller. Regardless, even with the BIOS updated, I was seeing channel drifting after hours to days of operation with linux-next.

With "acpi_irq_nobalance nowatchdog nohz=off" I got several days of stable operation. It may have drifted after longer periods of time but I didn't test for more than 48 hours. With "acpi_irq_nobalance nowatchdog" I got about 18 hours before channel drifting. "nohz=off nowatchdog" got me about 48.

This seemed really odd. Regardless of scheduling situation, I don't think that default kernel behaviour should lead to complete degradation of function for a USB device, so I did some digging.

I came upon this patchset: https://patchwork.kernel.org/project/li ... edhat.com/, and the successive https://patchwork.kernel.org/project/li ... edhat.com/ from the same author. In a nutshell, most of the time URB completions are deferred to a tasklet and this increases throughput and decreases system latency across the board because less code is executing in hardirq context. However, the MOTU AVB hardware has a confluence of factors: firstly, implicit feedback means that the playback is clocked to the capture, and, secondly, I'm not sure how to-spec the MOTU actually is. My hypothesis here is that the MOTU is expecting audio data to arrive in a particular USB frame, and when it's late, the device just assumes it's receiving data for the next block of samples, and therein a rolling offset is introduced.

More digging brought up https://patchwork.kernel.org/project/li ... gmail.com/. The choice quote from this (merged) patchset is:
In the case of isochronous pipe, executing the urb complete function
in tasklet can cause late urb resubmissions due to tasklet schduling
delay. Then the td can not match the endpoint service time interval
resulting in an empty isochronous transfer ring and ring overrun/
underrun event. But this is not a problem.
My hypothesis is that this is what we're seeing. Most of the time it doesn't matter, but here, because of the above factors (implicit feedback, "creative" interpretation of the spec), it's absolutely pivotal. In Mikulas's case, the issue could be resolved by just not scheduling mplayer at such a high priority, but, in our case, we're seeing this happen even on otherwise completely idle systems.

I've adapted Mikulas's patchset, adding XHCI support, and I'm smoke testing it right now. I'm almost 96 hours in and no channel drift, and this is without any special boot arguments – this is with a stock AUR-built linux-next, which also has CONFIG_HZ_300=y set. I figure if it works here it'll work anywhere. Even tried loading down the USB bus some – ran a 1080@60p HDMI capture for a bit, also running a second JACK server on my other audio interface (babyface pro) and so far no issues.

Haven't tried jack_bufsize yet. Will do so if it makes it through the weekend without channel drift, and then I'll post the patch as well. Note that if you want to try this out, Mikulas's patchset will cause a hard-lock on USB3 systems and requires some adaptation, so... don't do that.

-w
ahellquist
Established Member
Posts: 62
Joined: Mon Jul 01, 2013 12:28 am
Has thanked: 4 times

Re: Ultralite AVB

Post by ahellquist »

wrl wrote: Fri Jan 01, 2021 1:30 pm So, the behaviour I was observing with Audacity turned out to be resolved by a BIOS update, so that may have been a sketchy USB host controller. Regardless, even with the BIOS updated, I was seeing channel drifting after hours to days of operation with linux-next.

With "acpi_irq_nobalance nowatchdog nohz=off" I got several days of stable operation. It may have drifted after longer periods of time but I didn't test for more than 48 hours. With "acpi_irq_nobalance nowatchdog" I got about 18 hours before channel drifting. "nohz=off nowatchdog" got me about 48.

This seemed really odd. Regardless of scheduling situation, I don't think that default kernel behaviour should lead to complete degradation of function for a USB device, so I did some digging.

I came upon this patchset: https://patchwork.kernel.org/project/li ... edhat.com/, and the successive https://patchwork.kernel.org/project/li ... edhat.com/ from the same author. In a nutshell, most of the time URB completions are deferred to a tasklet and this increases throughput and decreases system latency across the board because less code is executing in hardirq context. However, the MOTU AVB hardware has a confluence of factors: firstly, implicit feedback means that the playback is clocked to the capture, and, secondly, I'm not sure how to-spec the MOTU actually is. My hypothesis here is that the MOTU is expecting audio data to arrive in a particular USB frame, and when it's late, the device just assumes it's receiving data for the next block of samples, and therein a rolling offset is introduced.

More digging brought up https://patchwork.kernel.org/project/li ... gmail.com/. The choice quote from this (merged) patchset is:
In the case of isochronous pipe, executing the urb complete function
in tasklet can cause late urb resubmissions due to tasklet schduling
delay. Then the td can not match the endpoint service time interval
resulting in an empty isochronous transfer ring and ring overrun/
underrun event. But this is not a problem.
My hypothesis is that this is what we're seeing. Most of the time it doesn't matter, but here, because of the above factors (implicit feedback, "creative" interpretation of the spec), it's absolutely pivotal. In Mikulas's case, the issue could be resolved by just not scheduling mplayer at such a high priority, but, in our case, we're seeing this happen even on otherwise completely idle systems.

I've adapted Mikulas's patchset, adding XHCI support, and I'm smoke testing it right now. I'm almost 96 hours in and no channel drift, and this is without any special boot arguments – this is with a stock AUR-built linux-next, which also has CONFIG_HZ_300=y set. I figure if it works here it'll work anywhere. Even tried loading down the USB bus some – ran a 1080@60p HDMI capture for a bit, also running a second JACK server on my other audio interface (babyface pro) and so far no issues.

Haven't tried jack_bufsize yet. Will do so if it makes it through the weekend without channel drift, and then I'll post the patch as well. Note that if you want to try this out, Mikulas's patchset will cause a hard-lock on USB3 systems and requires some adaptation, so... don't do that.

-w
Happy new year !

And my only comment to your investigation is: Amazing job
This might be the big break through

/Anders
User avatar
wrl
Established Member
Posts: 48
Joined: Wed Nov 03, 2010 12:46 am
Been thanked: 2 times

Re: Ultralite AVB

Post by wrl »

Still going fine today, about 100 hours of runtime with no channel drift. As mentioned, this is with an 828es, I don't have any other devices handy. Latest firmware.

Here's a patchset, apply on top of linux-next. Otherwise no device/ALSA-specific patches necessary here, linux-next has the relevant implicit fb improvements.

I could see this one getting some pushback from the USB subsystem crew so I'm going to keep investigating. There's another URB flag, USB_ISO_ASAP, that I want to try and see if it makes a difference even without this fast completion patch. Will report back.

Curious to hear how this works for folks.

-w

EDIT/UPDATE:
Testing on a variety of machines has shown that this works perfectly on some systems and not at all on others. I'm leaving the patch uploaded in the event it works for your specific system, but unfortunately it will be no surprise if it doesn't.
Attachments
0002-usb-audio-urb-fast-completion.patch.txt
(7.46 KiB) Downloaded 103 times
Last edited by wrl on Fri Jan 08, 2021 8:02 pm, edited 1 time in total.
Drumfix
Established Member
Posts: 299
Joined: Mon Jan 26, 2009 5:15 pm
Been thanked: 11 times

Re: Ultralite AVB

Post by Drumfix »

I make it short: Doing a spin_unlock/spin_lock while inside hardirq context is the best way to create a kernel panic.
User avatar
wrl
Established Member
Posts: 48
Joined: Wed Nov 03, 2010 12:46 am
Been thanked: 2 times

Re: Ultralite AVB

Post by wrl »

Drumfix wrote: Sat Jan 02, 2021 9:51 pm I make it short: Doing a spin_unlock/spin_lock while inside hardirq context is the best way to create a kernel panic.
Noted, and thanks for looking at the patch.

The unlock/lock is necessary because of a hardlock that happens otherwise. The specific code was borrowed from the way that the URB giveback code worked before the tasklet patches were merged. Take a look: https://patchwork.kernel.org/project/li ... gmail.com/. In this sense, URB_FAST_COMPLETION is opting back in to the old way that URB giveback worked.

To be fair, this is exactly why I'm expecting pushback from the USB maintainers when I submit this upstream.
Drumfix
Established Member
Posts: 299
Joined: Mon Jan 26, 2009 5:15 pm
Been thanked: 11 times

Re: Ultralite AVB

Post by Drumfix »

The right thing to do would be to have a separate kernel thread that handles the ISO completion callbacks and is scheduled instead of the tasklet_hi_schedule.
User avatar
wrl
Established Member
Posts: 48
Joined: Wed Nov 03, 2010 12:46 am
Been thanked: 2 times

Re: Ultralite AVB

Post by wrl »

Drumfix wrote: Sat Jan 02, 2021 10:36 pm The right thing to do would be to have a separate kernel thread that handles the ISO completion callbacks and is scheduled instead of the tasklet_hi_schedule.
Is this something I could just test with "threadirqs" before going down the path of implementing a separate kernel thread?
Drumfix
Established Member
Posts: 299
Joined: Mon Jan 26, 2009 5:15 pm
Been thanked: 11 times

Re: Ultralite AVB

Post by Drumfix »

I see a solution for using the completion handler in hardirq context.
The reason for your hardlock is, that when resubmitting an urb, xhci_urb_enqueue tries to acquire the xhci_lock that is already held by the interrupt handler. So adding a flag USB_HOST_LOCK_IS_ACQUIRED to the urb's flag when an urb is enqueued from within in the completion callback, xhci_urb_enqueue could be changed like this:

Code: Select all

if (spin_trylock_irqsave(&xhci_lock, flags) && (urb->transfer_flags & USB_HOST_LOCK_IS_ACQUIRED))
{
     BUG("xhci_lock was not held but URB says so");
     spin_unlock_irqsave(&xhci_lock, flags);
    return -EINVAL;
}
...

Code: Select all

if (!(urb->transfer_flags & USB_HOST_LOCK_IS_ACQUIRED))
{
   spin_unlock_irqrestore(&xhci->lock, flags);
}
User avatar
wrl
Established Member
Posts: 48
Joined: Wed Nov 03, 2010 12:46 am
Been thanked: 2 times

Re: Ultralite AVB

Post by wrl »

Aha, very nice catch, thanks for the review! I'll make that adjustment tomorrow – might just use the URB_FAST_COMPLETION flag for that rather than adding another flag that's only used in this specific code path. Will also look into making the change in the EHCI code.

Currently smoke testing USB_ISO_ASAP on the usb-audio URBs and it made it about 30 minutes before the channels started to drift, so... that's not the answer. This completion-in-hardirq approach, for all of its faults, is the best and most stable performance I've seen from the 828es on Linux so far.
Drumfix
Established Member
Posts: 299
Joined: Mon Jan 26, 2009 5:15 pm
Been thanked: 11 times

Re: Ultralite AVB

Post by Drumfix »

You need both flags.

URB_FAST_COMPLETION must be set in all submitted ISO URBs, no matter if they are submitted from inside and outside of the completion handler.
USB_HOST_LOCK_IS_ACQUIRED must be set in ISO urbs submitted from inside the completion handler only.
User avatar
wrl
Established Member
Posts: 48
Joined: Wed Nov 03, 2010 12:46 am
Been thanked: 2 times

Re: Ultralite AVB

Post by wrl »

You're absolutely right, I somehow completely blanked on that.
Drumfix
Established Member
Posts: 299
Joined: Mon Jan 26, 2009 5:15 pm
Been thanked: 11 times

Re: Ultralite AVB

Post by Drumfix »

No i'm absolutely tired.

i think in xhci_urb_enqueue

Code: Select all

int lock_acquired = spin_trylock_irqsave(&xhci_lock, flags);
...

Code: Select all

if (lock_aquired)
   spin_unlock_irqrestore(&xhci_lock, flags);
should do the trick. No extra USB_HOST_LOCK_IS_ACQUIRED flag needed.
Post Reply