Early Lessons from a Linux Audio Newbie

angus.hendrick
Established Member
Posts: 16
Joined: Sat Sep 15, 2012 6:44 pm

Early Lessons from a Linux Audio Newbie

Post by angus.hendrick »

I have a consumer gaming desktop with an AMD64 quad-core CPU and 12GB of RAM that I recently decided to turn into a studio box. I've been partially successful, and it's been quite a bit of trial and error, so I thought I'd write down some lessons learned from my journey for others to use or provide feedback on. This includes both conceptual and technical understanding I (think I) have gained from my efforts so far.

Software Configuration:

1. I'm based in the Ubuntu distros. A relevant bit of history: prior to commencing my audio configuration efforts, I spent several months coming down from the high of several years' worth of Gnome2 addiction. Gnome3 was not for me, nor was Unity (computer != cell-phone). I tried Mint and KDE4 before settling on Xubuntu as the best option. It turns out it is also lightweight, which makes it a good choice for low-latency audio. If you go this way, at some point you will turn on your system and it will have no window decorations. You need to hit Alt-F2 and issue

Code: Select all

xfwm4 --replace
Finally, I'm using the 12.04 release because I'm into long-term usage (i.e., change is bad... again, Gnome3 and Unity) and, more importantly, because KXStudio uses 12.04 as its base (see below).

2. The next thing I did was to install the Ubuntu Studio packages. Note that these can be added without a clean install by just installing ubuntustudio-somepackage. I use wajig to wrap apt and dpkg (and you should too: sudo apt-get install wajig), so the command line to get the list of available packages is

Code: Select all

wajig list-all ubuntustudio
I then proceeded to go to battle with jack and pulseaudio to try to get things to work. The short story is that I failed and ultimately went to KXStudio, which changes the default audio stack (thanks FalkTX) so that (on my system) it works.

A digression on what newbies should understand about what I'm calling the "audio stack" (hopefully consistent with standard jargon). Think of a computer system as built of successive layers with established interfaces between the layers. The lowest layer is the hardware. Manufacturers specify the interfaces for their hardware (sometimes to standards, sometimes not to standards, sometimes not at all), and driver software is built to translate between the hardware and the pieces of software that might want to use it. The driver is a translator that outputs hardware-specific directions and accepts inputs according to a standard interface. This lets higher-tier software writers (like those of music players and digital audio workstations) ignore the hardware variations and only worry about making their output go to a (relatively consistent) driver interface. On Linux, the most common drivers for audio are provided by ALSA. (Aside: if you have audio problems, do not follow instructions you may find telling you to rebuild ALSA. As of the date of this post, that is all unnecessary and mostly harmful. Problems are almost certainly elsewhere.)
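If you want to see what the ALSA layer actually knows about, alsa-utils ships listing tools. Nothing here is KXStudio-specific; this is stock ALSA:

Code: Select all

# list the playback devices ALSA can see
aplay -l
# list the capture devices
arecord -l

If your card shows up in those lists, the driver layer is fine and any problem is almost certainly higher up the stack.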

Most audio software doesn't talk directly to the ALSA drivers in Linux to output (or record) sound. Most music players and things like flashplayer instead talk to another layer: pulseaudio. If your distro provides the ability to manage audio via a speaker on the toolbar, it is probably controlling pulseaudio. I found that these interfaces are less capable than the standard pulseaudio controller that they replace, and so I recommend

Code: Select all

wajig install pavucontrol
Pulseaudio exists for several reasons, but in summary: it takes the single connection ALSA provides to a given piece of hardware and offers a multiple-input interface with a mixer, so that multiple sources can all output sound to the same piece of hardware.
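You can inspect pulseaudio's view of the world from a terminal with the stock pactl tool:

Code: Select all

# the output devices (sinks) pulseaudio is managing
pactl list short sinks
# the applications currently playing into them
pactl list short sink-inputs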

Pulseaudio works great for lots of purposes, but it is not designed for studio applications. In particular, it does not support precise timing and has undefined latency. I was not involved in computer audio in the '90s, but I had friends who were, and the standard nightmare was that a recording made on a machine would not sync exactly with previously recorded material. Jack is designed to correct these problems and essentially stands in the place of pulseaudio by creating a multiple-input connection to ALSA that allows multiple studio applications to share audio hardware. It also provides generalized routing of audio among studio applications (analogous to a hardware patch bay) and transmission of MIDI messages. Finally, Jack provides synchronization of playback and recording among studio applications.

However, all is not roses with jack. Whereas pulseaudio keeps the hardware and audio paths at arm's length, Jack requires you to think about all the details of your hardware's capabilities and the signal paths. Also, non-studio applications (e.g., Flash) connect to pulseaudio automatically, or failing that, to ALSA. They do not connect to Jack. Since by default pulseaudio connects directly to ALSA, and ALSA only supports a single connection per hardware device, Jack and pulseaudio do not necessarily play nice together out of the box if you are trying to use the same audio hardware (e.g., soundcard) for both studio work and general use.

In principle, a sufficiently determined person could beat this system into shape. The general idea is that you can either shut off pulseaudio when you enable jack, or configure pulseaudio to talk to jack instead of ALSA. However, getting this to work right can be somewhat complicated. Pulseaudio is set up as an automatically restarting daemon by default, so even when you stop it, it comes back and messes with your attempts to connect jack to ALSA. Also, I could not figure out how to tell pulseaudio to connect to jack instead of ALSA. There's lots of information on how to do all this yourself, but in the end I just installed FalkTX's KXStudio, and all the magic happened.
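For the curious, here is a rough sketch of the do-it-yourself route. It assumes jack is already running and that your pulseaudio build includes the jack modules (not all builds do):

Code: Select all

# point pulseaudio's output and input at jack
pactl load-module module-jack-sink
pactl load-module module-jack-source
# make the jack sink the default output (the sink is named jack_out
# by default, if I understand the module correctly)
pactl set-default-sink jack_out

As far as I can tell, this is roughly what the KXStudio bridge does for you automatically, which is a good argument for just using that.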

3. FalkTX's excellent KXStudio does three things for me. First, it ends the jack-pulseaudio wars. Second, it provides a very rich set of audio tools. Third, it provides an integrated interface for studio setup and maintenance. To install KXStudio into an existing Ubuntu install, follow these instructions. Next, if you've found this post, you may have already tripped across qjackctl as a handy GUI for running jack. You've probably figured out how it works and are getting to be okay with it. Please rip that from your mind, and learn to use Cadence and Catia instead.

Both Cadence and Catia are provided by KXStudio. Cadence provides several enhanced features for controlling jack that go beyond qjackctl. Most notably, toward the bottom right of the System pane are several bridging options. One of them is the aforementioned pulseaudio-jack bridge. Cadence also manages an ALSA-MIDI-to-Jack-MIDI bridge using a2jmidid. (Briefly--unlike the rest of this post--Linux MIDI is evolving, and two standards are still in use. ALSA-MIDI is the older approach to managing MIDI and is still widely supported. It has relatively larger timing "jitter" than Jack-MIDI, which uses a higher-precision timer and is replacing ALSA-MIDI as the standard interface in Linux studio applications. For now, the ability to bridge between these two MIDI interfaces is essential to permit general connectivity.)
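If you're not using Cadence, the bridge can be run by hand. A minimal sketch, assuming a2jmidid is installed and jack is already running:

Code: Select all

# bridge ALSA-MIDI to Jack-MIDI; -e also exports hardware MIDI ports
a2jmidid -e

The bridged ports then show up in Catia under an "a2j" client.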

Catia replaces the Connect dialog in qjackctl with a much more intuitive "canvas" interface. There are other apps (e.g., patchage) that use the same approach, but Catia works well with Cadence (i.e., it understands the bridge setup there), and so is a natural first choice. A useful learning experience is to go into Catia, select Tools|Jack Server|Jack Control, and choose one of the options to fail or ignore automatic connection requests. This puts all connections in your hands. At some point, the magical autoconnects done by your favorite DAW will mystify you, and doing this will let you sort out how the connections should go.

Finally, the application Claudia manages "studios." The concept of a studio in Linux is related to the fact that several applications are typically needed to do any given job. For example, a softsynth, a MIDI sequencer, and an audio recorder might be connected together to do some composition. This modular approach is (apparently) unique to Linux compared to Windows and OSX. Since the set of applications and the connections among them will vary as a function of the job, and configuring the multiple applications can be time-consuming and error-prone, a session manager provides a shortcut to restore a common operating setup. The LADI Session Handler (LADISH) is the underlying magic for managing this, and Claudia is a front end for LADISH. In Claudia you can tell it a set of applications and connections and it will remember them... at least that's the idea. For me, it works sometimes and crashes other times. I'm going to stick with it, but you may want to consider other options. Moreover, I would defer messing with Claudia until you have played with applications and connections on your own and have started to develop some workflows of your own.

4. Latency refers to how quickly the system can take an input and produce an audio output. Jack allows you to set the parameters that determine latency. Small buffers and faster sampling frequencies minimize latency at the expense of risking buffer overruns (i.e., xruns, reported by your jack interface application) that lose data and sound bad. Larger buffers increase latency but reduce the possibility of xruns. In short, you want minimum latency when the computer is playing back a recording while you are trying to play along using the computer as the audio source (e.g., using a hardware MIDI controller to play a softsynth). This comes up in live performance and some recording situations. If the latency is too big, you will hear your playing out of sync with the tracks you are trying to play along with, and it will be unworkable. For mixing and mastering jobs where you're doing lots of audio processing but not playing along in real time, increasing the buffer size permits these activities to do their thing without data loss.
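To make the tradeoff concrete, here is how the numbers work out if you start jack from the command line. This is only a sketch--hw:0 is an assumption (check yours with aplay -l), and Cadence normally sets all of this for you:

Code: Select all

# two periods of 256 frames at 48000 Hz:
# latency = 256 frames x 2 periods / 48000 Hz = ~10.7 ms
jackd -d alsa -d hw:0 -r 48000 -p 256 -n 2

Doubling -p to 512 doubles the latency to ~21 ms but gives the system twice as long to fill each buffer, which is the knob to turn if you are fighting xruns.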

Notwithstanding the latest guidance that generic kernels are adequate for achieving low latency, I have achieved considerably lower latency (reduced from 30ms to 5ms) using the KXStudio lowlatency kernel. Note that after restarting into the lowlatency kernel, my NVidia drivers were gone, but a

Code: Select all

wajig reinstall nvidia-current-updates
fixed that (after another reboot). Hardware is the other big piece of the latency puzzle. More on that in the last section.
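(By the way, you can verify which kernel you actually booted with

Code: Select all

uname -r

which should report a version ending in -lowlatency if the new kernel took.)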

5. A couple bits of emergent weirdness have cropped up for me. Most significantly, Flash, which used to connect via pulseaudio and which continued to do so for several weeks after my initial install of KXStudio, suddenly started connecting directly via jack. It appears that what is going on is that Flash is trying to connect to ALSA directly and jack is instead creating a jack source for it that is then connected to the outputs. This is serviceable but undesirable, as some Flash windows do not have working audio controls, and the direct connection means that the pulseaudio mixer does not work on these apps. Also, it's just weird.

Applications:

If you are new to digital audio in general, or new to it on Linux in particular, a brief discussion of some of the key applications may be useful. I have surveyed several in the past few weeks, but my survey is hardly exhaustive.

1. First, begin with the idea that a recording is a specification of the variation in sound with time. There are two basic ways to specify the sound with time: audio and MIDI recording. Audio recording involves the computer writing down a digital representation of an analog waveform. Without going into details, it is an (albeit extremely) modernized version of recording as our grandparents knew it. MIDI "recording" involves establishing timing for sending MIDI commands. In a simple case, these are note-on/off and velocity (i.e., volume) commands to a softsynth that makes noise in response. They can also involve commands that tell the instrument to do other things, like change patches and so forth. The latter is different from the former because a single MIDI recording could produce a wildly different sound simply by changing the synthesizer that the MIDI is played back through. In that way, MIDI recording bears more of a resemblance to the written "recording" technology used by composers (i.e., notes written on a page). Many musicians use both. A simple example of how this immediately gets complicated: one might use MIDI to send a note-on signal to a synthesizer that causes the synthesizer to play a previously recorded audio sample.
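To make the MIDI half concrete: a MIDI note is just a few bytes. As a sketch, you can fire one at a synth by hand with ALSA's amidi tool--the hw:1,0 port here is an assumption, and amidi -l lists the real ones on your system:

Code: Select all

# note-on: middle C (0x3C) at velocity 100 (0x64) on channel 1
amidi -p hw:1,0 -S '90 3C 64'
# the matching note-off
amidi -p hw:1,0 -S '80 3C 00'

Those same three bytes could produce a piano note, a drum hit, or a siren depending entirely on what is listening, which is the sense in which MIDI is notation rather than sound.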

2. Second, as a practical matter, get used to thinking in terms of a collection of applications: typically a digital audio workstation (DAW), some set of softsynths, and possibly one or more sequencers. A simple setup might be qtractor (a DAW that supports both MIDI and audio recording) talking to Yoshimi (a softsynth) to play back its MIDI tracks. For a recording with no computerized instruments, an application like Ardour2 or Audacity might do the job by itself, but even there, the final processing of a multitrack recording would involve passing the output to a separate application like jamin for the more sophisticated processing involved in competent audio mastering.

The model I now have in my head is that every application is analogous to a hardware box. It has inputs and outputs of various kinds that are analogous to hardware I/O (e.g., 1/4", RCA, S/PDIF, XLR, MIDI, etc.). Like actual hardware boxes, sometimes you have the connections you want, sometimes not; sometimes you can adapt from one to another, and sometimes not. Jack is the patch bay that connects everything to everything else. The ALSA-MIDI-to-Jack-MIDI bridge is an example of connecting two different "kinds" of connections together in a way that works. In my mind, Jack is what's inside the box, and Catia is how it looks on the outside. I generally don't worry about the insides... so long as it works.
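You can actually see all the "jacks" on all the "boxes" from a terminal using the stock jack utilities:

Code: Select all

# list every port every jack client exposes
jack_lsp
# the same list, flagging each port as input or output
jack_lsp -p

This is exactly the set of connection points that Catia draws on its canvas.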

3. As far as particular applications, here's a nickel on lessons so far... (Google is your friend to find some really good videos on all of these).

a. Ardour is the big boy on the block for free open-source Linux DAWs, though the devs are starting to get more aggressive about their desire for a donation, albeit one as small as $1. Ardour is very flexible, but not always intuitive for newbies. Ardour2 is the current stable version. It does not include MIDI recording. Ardour3 is easily built from source and includes MIDI recording, but it is not stable (crashes hourly on my machine). Qtractor is a stable MIDI/audio DAW. It doesn't seem quite as powerful as Ardour, but it is easier for me to use, and I presently cannot point to any particular limitations that are unique to it. Renoise is a commercial application that is available for Linux and has a fairly full-featured DAW that includes MIDI and sampling capabilities. Audacity is an audio-only recorder. It is simpler and easier to use than Ardour, but less sophisticated. Rosegarden is a MIDI-only compositional tool whose main claim to fame is its ability to produce music that looks like written music. Seq24 is a sequencer that is popular for live work.

b. ZynAddSubFX is a flexible synthesizer with a bunch of great presets as well as the ability to create sounds by combining and filtering oscillators. Yoshimi is a fork of ZynAddSubFX with better jack compatibility (I don't know more about this than that I have heard it said). It can also load sounds created in ZynAddSubFX. Hydrogen is a sequencer and synth that is the standard Linux drum machine; it also provides powerful sampling capabilities. There are tons of others; this doesn't even begin to scratch the surface.

c. There are lots of different plugins that DAWs can use to provide particular processing capabilities (e.g., compression, reverb, etc.). Most of these function like built-in effects for the DAW. There's a long history of development and several different formats (LADSPA, LV2, VST, etc.), but the good news is that most DAWs support most of these (Claudia shows which apps support which plugin formats). I haven't used many of these. There are also stand-alone "effects" units, like jamin, that can process audio. Again, the analogy is to hardware boxes: plugins are like the effects built into your board, and stand-alone effects are the things you patch in using sends and returns.

d. Presently, I am using qtractor, Hydrogen, and Yoshimi for composition. I sequence all sounds except drums in qtractor, and Yoshimi plays them back. Hydrogen is my drum machine. I record audio into qtractor. Once I get my first song done, I'll have to figure out how to do the mixing and mastering. I anticipate exporting or recording into Ardour2 and mixing/mastering there using jamin.

4. For any MIDI sequencer (except Hydrogen, which includes its own synth), you will need to make a few connections. I prefer to turn off all automatic connections in Catia and make connections manually. This prevents the occasional feedback loop that results from automatic connections, and it has also caused me to better comprehend what is going on. If you want to record MIDI into your MIDI sequencer (the distinction between a MIDI recorder and a MIDI sequencer may be one I just invented; conceptually, a MIDI sequencer has playback capabilities and a MIDI recorder can record MIDI messages, but practically the terms are used interchangeably), all you really have to have connected is a MIDI controller to the recorder's MIDI inputs. Practically, you will want to hear what you are playing, so you will need the controller to send MIDI to the synth as well to tell it to play. Alternatively, you could send MIDI from the controller to the sequencer, and then send MIDI from the sequencer to the synth. Regardless of how you route the MIDI, you will then need to send audio from the synth to the system outputs to hear anything. You could also send the output to an audio channel of your DAW and record the audio. Note that the DAW doesn't really care whether there's a synth attached to its output or not; it just records the MIDI input and sends output to whatever is attached. If you were late-in-life Bach, you would just play your MIDI controller, record the input in the DAW, and never hear it except in your head. A sketch of this wiring from the command line follows below.
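Here is what that routing looks like if you make the connections from a terminal instead of Catia's canvas. The client and port names below are made up for illustration--run jack_lsp to see the real names on your system:

Code: Select all

# controller (bridged hardware MIDI) into the sequencer's MIDI input
jack_connect system:midi_capture_1 "qtractor:midi_in"
# synth audio out to the soundcard so you can hear yourself
jack_connect yoshimi:left system:playback_1
jack_connect yoshimi:right system:playback_2

Catia does the same thing by drag-and-drop; the command-line form just makes it obvious that every connection is explicit.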

Hardware:

1. I got started using the hardware I had available, which consisted of audio through the NVidia graphics card to the speakers on the 36" TV I use as a monitor. I then made several improvements. The MOST important was to upgrade my speakers. I had a pair of M-Audio Studiopro 2 speakers that I dug out. Simply plugging those into the headphone out of the TV and using them for sound was a revolution. I have had the problem reported on Amazon for some of the newer versions, where the unpowered speaker goes quiet or silent. It happens about once a week, and I can fix it by turning the speakers off and back on.

2. To reduce latency, I picked up an M-Audio Audiophile 2496 PCI card. In summary, it can matter which PCI slot you plug it into (it won't work in some, works in others). Another option that has worked for some people is booting the kernel with the option "pci=nocrs". The underlying problem appears to be interrupt conflicts. Compared to the NVidia or built-in motherboard audio, latency is reduced by half or more.
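If you want to try the boot option, here is a sketch of how it goes on Ubuntu (the usual caveats about editing boot configuration apply):

Code: Select all

# append pci=nocrs to the GRUB_CMDLINE_LINUX_DEFAULT line, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nocrs"
sudoedit /etc/default/grub
# regenerate the grub configuration, then reboot
sudo update-grub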

3. I had been using a fairly high-end USB Logitech camera as a recording device (I got it for Skype). I used it as a microphone for a while. I learned that it only supports sampling up to 32kHz, so when jack is connected to it, the sampling frequency is reduced to 32kHz no matter what you set it to be.
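You can check what rates a USB device actually supports before fighting with it. A sketch, assuming the camera shows up as card 1 (cat /proc/asound/cards shows the numbering):

Code: Select all

# USB audio devices report their supported formats and rates here
cat /proc/asound/card1/stream0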

4. I have a Zoom H4n. One of the reasons I got it was that it works as an audio interface in Linux. I can use either its built-in condenser mics or attach external mics to the two XLR inputs on the back. However, I have learned that the H4n does not work in duplex mode (i.e., it is playback or record only), and I have had severe xrun problems when I try to use it as a digital interface for recording while something else is connected as an output. The solution for now is to connect the headphone output to the RCA recording inputs on the M-Audio Audiophile 2496. This seems to work fine, though I'm not sure if I'm adding unwanted layers of A/D conversion in my signal path.

5. After installing dsmidiwifi on my box, I have turned my Android phone into a MIDI controller for my softsynths--a real keyboard is on my Christmas list. Latency is not noticeable. I have had luck using Ardroid on my phone to control Ardour3, though the controls are pretty minimal. TouchDAW and TouchOSC are applications that are supposed to let me control DAWs from my phone, but I can't seem to get either of them to work with any DAW.

If you read this far, you're either very bored or the right audience for this post. If you're the former, get some beer. If you're the latter, I hope this helped a little.
manic_b
Established Member
Posts: 48
Joined: Thu Aug 09, 2012 9:39 pm

Re: Early Lessons from a Linux Audio Newbie

Post by manic_b »

Nice post, it's always helpful to see other people's problems and solutions ;-)

I also find it helpful to think of applications as hardware boxes. Good luck with your Christmas list; a real MIDI keyboard does wonders for productivity!
steevc
Established Member
Posts: 251
Joined: Fri May 23, 2008 7:05 pm
Location: Bedfordshire, UK
Contact:

Re: Early Lessons from a Linux Audio Newbie

Post by steevc »

Wow, that's a pretty comprehensive post that taught me a few things. I'm also using KXStudio and liking it a lot.

I've got a Zoom H4 that I was using as my interface before I got the Delta 66. Now I plug the aux out into the Delta so I can use the H4 microphones, as I don't have any other good microphones. That seems to work fairly well, but I also wondered if there is some extra conversion going on. I assume there must be, as you can apply effects to the microphone signal. I'm fairly sure the H4 can do duplex USB. Did they remove that feature on the H4n?

I want to try out Ardroid some time. Last night I was trying to use my nanoKontrol with Ardour. I was able to select it as a control surface, but only the play/pause/rewind buttons seemed to do anything. Then it stopped working at all. I didn't have time to investigate further. I'm just trying to set something up so that I can record myself on guitar or vocals while out of reach of the keyboard.

Steve
Sounds - http://soundcloud.com/steevc
Debut Album - https://steevcmusic.bandcamp.com/
Blog - https://peakd.com/@steevc/posts
Recording via M-Audio FastTrack Pro and Zoom H4. Got Korg nanoKONTROL and Zoom G3X plus Roland TD-07 drums

angus.hendrick
Established Member
Posts: 16
Joined: Sat Sep 15, 2012 6:44 pm

Re: Early Lessons from a Linux Audio Newbie

Post by angus.hendrick »

In re: the H4n, I believe from looking around here that duplexing over USB on the H4n is not supported. I guess this is a downgrade from the H4.

The bigger issue is the latency and the generally massive xrun problems I was getting over USB with the H4n. I eventually gave up and got a different audio card to serve as my audio interface. Using the analog output from the H4n to send the mics into my audio card works well, though.

As for Christmas, my Axiom 61 is much goodness. Coupled with Ardour's MIDI learning, I think I'm set to make some big improvements in productivity.
tatch
Established Member
Posts: 662
Joined: Fri Nov 16, 2012 3:18 pm

Re: Early Lessons from a Linux Audio Newbie

Post by tatch »

angus.hendrick wrote:In re: the H4n, I believe from looking around here that duplexing over USB on the H4n is not supported.
I have a Zoom H4n and I duplex over it all the time, usually using the H4n as the input device and my internal soundcard as the output. Using the H4n as both input and output has also worked for me, but when I use it as an output device the latency (or something) piles up really quickly and eventually starts to sound like an echo.