Late in my teenage years, I would often record myself playing games. I did not do anything with the videos (I had a reluctance to post anything on the internet even back then). However, I wanted to keep an easy record of what happened in the game so I could refer back to important story beats and whatnot, as well as capture great moments of my own making (I am still saddened to this day that I do not have footage of me shooting a plasma grenade with a plasma pistol in Halo: Reach). As I tinkered more, I investigated the idea of splitting my audio into different audio channels in the video to make editing easier. I could dump the game audio to one channel, my microphone to another, and the output from my chat application to a third. Back then, pulseaudio was the default audio handler on major Linux distributions, and it had the flexibility to set up and route virtual speakers and microphones for those willing to learn the commands and put them in their .profile file. When I went back to Windows for a brief stint, I used VB-Audio's freemium products to set up a number of these virtual speakers.

I eventually tired of recording all my games. I wanted to be able to play games in whatever environment (Steam Deck, my 1080p laptop, etc.), and I was not willing to put up with the variations in quality, so I gave up the idea of recording everything. However, I have recently been investigating this again. I have been playing some visual-novel-type games with an out-of-state friend, where I host a game and stream it to him, and we decide together what to do. We primarily communicate using Signal, and Signal does not share desktop audio. As such, I needed to be able to send the game audio alongside my microphone output to him. On top of that, I have also taken to sending my music player output to whoever I am chatting with, creating a custom soundtrack for the game. I avoid this on public platforms like Discord, but I am willing to do this on secure platforms like Signal, or platforms that I manage, like Mumble or the Simple Voice Chat Minecraft mod. With these new developments, I began to revisit my custom audio software setup.

These days, Pipewire is the preferred audio system for Linux distros. As such, I needed to learn the new setup. Up until now, I relied on a rough bash script to create the appropriate sinks and loopback streams (recreating them as needed, since Henry Stickmin had a bad habit of closing audio streams any time audio was not playing). However, now that I am using more loopbacks, I decided to look into something more permanent.
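For reference, that old script looked something like the sketch below (reconstructed from memory; pactl, module-null-sink, and module-loopback are real PulseAudio commands that pipewire-pulse also understands, but the sink names are my own convention):

```shell
# Rough sketch of the old setup script. Creates one virtual sink per
# category and echoes each sink's monitor back to the default speakers.
sinks="music_sink application_sink chat_sink"

if command -v pactl >/dev/null 2>&1 && pactl info >/dev/null 2>&1; then
    for sink in $sinks; do
        # A virtual speaker that applications can dump audio into.
        pactl load-module module-null-sink sink_name="$sink"
        # Echo the virtual sink's monitor to the default speakers.
        pactl load-module module-loopback source="$sink.monitor"
    done
else
    echo "no PulseAudio-compatible server running; nothing created" >&2
fi
```

The downside, as noted above, is that these objects do not survive a daemon restart, so the script had to be re-run whenever streams disappeared.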

As a solution, I discovered that you can define audio objects in a configuration file, and Pipewire will create them whenever the service starts. By putting files in the $XDG_CONFIG_HOME/pipewire/pipewire.conf.d/ directory, I can have everything set up whenever I start the computer, and configure my applications as needed.

When I was setting up my scheme last night, I had the following goals:

  1. I wanted a custom speaker (or "sink") for my music. My music player would dump its audio into this sink. Anything sent to this sink is then also sent (via a "loopback") to my primary speakers, and to my microphone output (or "source"), alongside whatever my actual microphone picks up.
  2. I wanted a sink for my application audio. For cases where I am sharing a game, I need to send the application output alongside my microphone output as well. However, I do not always want to do this (I do not want to send my Minecraft game audio to chat).
  3. I wanted a sink to dump my chat audio into. It will be looped to my speakers. However, it will not be sent to the microphone.

In all of these, each category of audio (each sink) can be recorded separately by software like OBS Studio, allowing each to be dumped to a different audio channel for later mastering.

With these goals, I used a series of pipewire loopback modules, defining them in /etc/pipewire/pipewire.conf.d/10-music-sink.conf (to give all my users access to this setup).

context.modules = []

The configuration file format vaguely resembles Python lists and dictionaries, although entries are not separated by commas. All of my modules are defined within the square brackets.

To start, I create a "combined microphone." Both my hardware microphone and all relevant sinks will send their data into it. Anything sent to the combined microphone will then be readable as a microphone that chat applications like Signal and Mumble can use.

context.modules = [
    {   name = libpipewire-module-loopback
        args = {
            capture.props = {
                node.name = "combined_microphone.input"
                stream.dont-remix = true
                node.passive = true
            }
            playback.props = {
                node.name = "combined_microphone"
                node.description = "Combined Microphone"
                media.class = "Audio/Source"
            }
        }
    }
]

After restarting Pipewire with systemctl --user restart pipewire, we can now see a new input, "Combined Microphone" (you may need to enable virtual audio objects in your UI).
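You can also confirm from a terminal that the new source exists; pactl sees Pipewire objects through the pipewire-pulse compatibility layer (the grep target is just the node name we chose above):

```shell
# List all sources and look for the virtual one we just defined.
expected="combined_microphone"

if command -v pactl >/dev/null 2>&1 && pactl info >/dev/null 2>&1; then
    pactl list short sources | grep "$expected"
else
    echo "no audio server to query" >&2
fi
```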

An explanation of what each aspect does:

  1. name = libpipewire-module-loopback This defines which type of module we are making. For this project, we are using exclusively loopback modules. This module type can create sinks and sources, and can also redirect audio between existing objects.
  2. capture.props This controls how we get audio data into this module. This has two modes. By setting the media.class property to "Audio/Sink," I can create a new sink that we can have applications dump data to. However, by leaving it out, it can be configured to take its data from an existing source or sink. This is usually done by specifying the node.target property. By leaving out node.target, however, it will have the loopback module capture the default source (i.e. the hardware microphone). This capture will even keep up with changes to the default, letting us change what microphone we use without issue.
  3. playback.props This controls how we send out data. Here, we set media.class to "Audio/Source" to create a new source, a new microphone. The idea is that I can configure chat applications to listen to this microphone and get whatever I send to this object. If I decided to omit media.class, I could have instead sent the received audio data to a sink, such as my primary speakers (configurable via node.target).
  4. node.name The name of the source/sink. This is the name that will be used to reference the source/sink elsewhere in the file. As a standard, I add a ".input" suffix to the capture properties when I am making a new source, or a ".output" suffix to the playback properties when I am making a new sink. This is just a personal standard, however.
  5. node.description A human-readable name for sinks and sources that will show up in GUIs. I only define this when I am also defining the media.class property.
  6. stream.dont-remix By setting this to "true," I can ensure that the channels (i.e. left output, right output) are not changed. I may need to reconsider this if I ever get a more complex audio hardware setup. However, since all my devices use stereo audio, this works.
  7. node.passive By setting this to "true," I can reduce resources when no audio is flowing through this source or sink. I use this for objects that are purely used for connecting two objects.
  8. stream.capture.sink This is not used in the above example, but I set it to "true" when I am sending audio between sinks. I cannot find documentation on what it does, but it is needed in some of the definitions below.
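To make the two capture modes from item 2 concrete, here they are side by side (example_sink is a hypothetical name; each real module uses only one of these shapes):

```
# Mode 1: create a brand-new sink that applications can write into.
capture.props = {
    node.name = "example_sink"
    media.class = "Audio/Sink"
}

# Mode 2: capture from an existing object instead; omitting node.target
# entirely would capture the default source (the hardware microphone).
capture.props = {
    node.target = "example_sink"
    stream.capture.sink = true
}
```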

With this setup, I now have a virtual microphone alongside however many hardware microphones I have plugged in, and whatever hardware microphone I have set to "default" will be echoed in the virtual microphone. At the moment, it is not useful. We need more objects for it to do something interesting. Let us define a new sink to dump audio into.

context.modules = [
    {   name = libpipewire-module-loopback
...
    }
    {   name = libpipewire-module-loopback
        args = {
            capture.props = {
                node.name = music_sink
                node.description = "Music Sink"
                media.class = Audio/Sink
            }
            playback.props = {
                node.name = "music_sink.output"
                stream.dont-remix = true
                node.passive = true
            }
        }
    }
]

This time, we create a virtual speaker called music_sink by setting the media.class property within capture.props to "Audio/Sink." By omitting the class within the playback props, we echo anything dumped into the new sink into the default speakers.

With this setup, we now have a new audio output that will echo its data to the default speaker. By setting a music player to send its output to this new sink, we can still hear the music. However, we can record this output (i.e. via OBS) separately from the rest of the desktop.
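If the music player has no output-device setting of its own, the stream can also be moved after the fact with pactl (the stream id must be looked up first; the commands are real, the id is a placeholder):

```shell
# Find the running playback streams, then move one into the virtual sink.
target="music_sink"

if command -v pactl >/dev/null 2>&1 && pactl info >/dev/null 2>&1; then
    pactl list short sink-inputs
    # pactl move-sink-input <id> "$target"   # <id> from the listing above
else
    echo "no audio server to query" >&2
fi
```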

Let us set up a few more.

context.modules = [
    {   name = libpipewire-module-loopback
...
    }
    {   name = libpipewire-module-loopback
...
    }
    {   name = libpipewire-module-loopback
        args = {
            capture.props = {
                node.name = application_sink
                node.description = "Application Sink"
                media.class = Audio/Sink
            }
            playback.props = {
                node.name = "application_sink.output"
                stream.dont-remix = true
                node.passive = true
            }
        }
    }
    {   name = libpipewire-module-loopback
        args = {
            capture.props = {
                node.name = chat_sink
                node.description = "Chat Sink"
                media.class = Audio/Sink
            }
            playback.props = {
                node.name = "chat_sink.output"
                stream.dont-remix = true
                node.passive = true
            }
        }
    }
]

Now, we have three new sinks: one for the music player, one for applications/games, and one for chat applications. All of them will send their data to the default speaker, but will also keep their information separate.

Now, we could leave things here, and our setup would work quite well for streaming and recording. However, I want to be able to send some data (music) over chat, so we need to create more loopbacks and take advantage of the combined_microphone source we set up earlier. Let us add a new loopback that will connect two existing objects.

context.modules = [
    {   name = libpipewire-module-loopback
...
    }
    {   name = libpipewire-module-loopback
...
    }
    {   name = libpipewire-module-loopback
...
    }
    {   name = libpipewire-module-loopback
...
    }
    {   name = libpipewire-module-loopback
        args = {
            capture.props = {
                node.target = music_sink
                stream.capture.sink = true
            }
            playback.props = {
                node.target = "combined_microphone.input"
                stream.dont-remix = true
                node.passive = true
            }
        }
    }
]

This new loopback takes everything we dump into music_sink and sends it to the combined_microphone. Now, if we restart, our music player can be heard on combined_microphone. By setting a chat application to listen to combined_microphone, your friends can hear both your voice and whatever music you are playing.
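To sanity-check that the loopback is actually connected, pw-link can list the live links in the Pipewire graph (the node name is the one defined above; the output will show which ports feed which):

```shell
# pw-link -l lists every link in the graph; filter for our sink.
node="music_sink"

if command -v pw-link >/dev/null 2>&1; then
    pw-link -l | grep "$node" || echo "no links found for $node" >&2
else
    echo "pw-link not installed" >&2
fi
```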

Now, we could also set up a similar loopback for applications, letting me supplement the screen share. However, this presents a conflict. I want to share Ace Attorney over Signal. However, I do not want to share Minecraft over voice chat. That said, I still want to dump both into an isolated sink to record separately. To solve this, I will create yet another sink, along with a new loopback.

context.modules = [
...
    {   name = libpipewire-module-loopback
        args = {
            capture.props = {
                node.name = application_loopback
                node.description = "Application Loopback"
                media.class = Audio/Sink
            }
            playback.props = {
                node.target = "application_sink"
                stream.dont-remix = true
                node.passive = true
            }
        }
    }
...
    {   name = libpipewire-module-loopback
        args = {
            audio.position = [ FL FR ]
            capture.props = {
                node.target = application_loopback
                stream.capture.sink = true
            }
            playback.props = {
                node.target = "combined_microphone.input"
                stream.dont-remix = true
                node.passive = true
            }
        }
    }
]

The first module defined above, application_loopback, creates a sink that echoes not to the speakers, but to application_sink. The second module echoes application_loopback to the combined microphone as well. Thus, we have two options for applications. By sending audio to application_loopback, it will be echoed both to application_sink for recording and to combined_microphone for sharing. Alternatively, we can dump it to application_sink directly if we only want to record.

The final file can be found here. By placing it in the appropriate folder and restarting pipewire (systemctl --user restart pipewire), we now have several new audio outputs and a new input. From there, we set up our environment as such:

  1. Set the default speaker to the hardware speaker we want to hear everything out of. Set the default microphone to the hardware microphone we will be speaking into.
  2. Configure the music player to dump to the music_sink sink. If the application does not support this natively, configure it in an OS volume mixer (KDE handles this without needing extra packages).
  3. Configure your game to output to either application_loopback if you are running a screen share, or application_sink if you only need to record.
  4. Configure your chat application (Mumble, Signal, Discord, etc.) to output to chat_sink. Configure your chat application to use combined_microphone as the input.
  5. In your screen recording software (OBS, etc.), set the microphone to your default hardware microphone. Set up different audio sources to read application_sink and chat_sink (and optionally music_sink), and set each source to save to a different audio channel.
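The hardware-default half of step 1 can also be scripted; the pactl commands here are real, but the device names are placeholders you would replace with names from pactl list short sinks and sources:

```shell
# Placeholder hardware names; substitute your own devices.
hw_speaker="alsa_output.example"
hw_mic="alsa_input.example"

if command -v pactl >/dev/null 2>&1 && pactl info >/dev/null 2>&1; then
    # Set the hardware defaults; the combined_microphone loopback follows
    # the default source automatically, so this is all it needs.
    pactl set-default-sink "$hw_speaker"
    pactl set-default-source "$hw_mic"
else
    echo "no audio server to configure" >&2
fi
```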