Streamlining the Audio Workflow

October 4, 2019

As I get closer to completing all the side quests and final scenes in Sleuthhounds: Cruise, I’ve been thinking about the next steps in production. One of the big tasks I’ll be moving on to is recording the dialog. The recording itself is a lot of fun. It’s a chance to hear the characters speaking aloud and to work with other people. Less fun is integrating that audio into the game. In fact, it can be downright tedious at times. As such, anything that cuts down on the tedium and speeds up getting the audio into the game is a boon.

GoldWave displaying a typical dialog recording.

To record the audio, I use an application called GoldWave paired with a Blue Yeti microphone. In GoldWave, an empty new sound has to be created before recording can begin, and creating it requires setting how long the sound should be. For example, I could set the new sound to be 15 seconds long, or a minute, or an hour, or however long I think I need. I believe this is so that GoldWave can make sure there’s enough hard drive space for the sound ahead of time.

For recording people’s voices, I’ve found it’s easiest and most time-efficient to set a long sound time, typically an hour, and record all the lines of dialog into that one sound file (or as many lines of dialog as can be recorded in an hour). I typically record a couple of takes per line as a safety net in case one of the recordings isn’t usable for whatever reason. After the recording comes the process of breaking that one sound file into separate sound files for the individual lines that will go into the game. Because most lines have a couple of takes to pick from, this also means choosing the preferred line reading and saving only that version.

GoldWave is great at recording audio and at processing tasks such as removing low-level background hiss, eliminating breath pops (caused by puffs of air hitting the microphone), and splicing together separate recordings when part of a line was performed well in one take and part in another. And though it can be used to break out the individual lines into separate files, that particular process gets tiresome: at a minimum, it requires setting a start point and an end point in the master sound, clicking an “extract to disk” button, and typing in a file name. Whatever other cleanup a line needs, those steps stay the same, and they become a real chore across any substantial number of audio files.

With all that in mind, I set out to create not a replacement for GoldWave, but supplemental tools to help streamline the most repetitious bits of preparing the audio for incorporation into the game. This resulted in the creation of four fairly simple tools, none of which are cleverly named.

Audio Splitter

The Audio Splitter showing both clean and thought recordings.

For the new process, once the audio has been recorded and an overall dehissing performed in GoldWave, the resultant clean audio will be saved out to a file. At the same time, I’ll be using GoldWave to create a “thought” file. This takes the recorded clean audio and adds a slight echo to it, which is the effect I use in game to represent when a character is thinking to themselves instead of talking out loud (because it’s always seemed odd to me that adventure game characters would talk out loud even when no one else was around).
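For readers curious what a “slight echo” amounts to numerically, here is a minimal Python sketch of the simplest form: the signal mixed with a single delayed, quieter copy of itself. It is only an illustration. GoldWave’s own echo effect may work differently. The function name `add_echo` and the default delay (0.12 s) and gain (0.3) are hypothetical values chosen for the example. It assumes mono 16-bit samples already loaded as a list of integers.

```python
def add_echo(samples, sample_rate, delay_s=0.12, gain=0.3):
    """Mix one delayed, quieter copy of `samples` back into the signal.

    `samples` is a list of mono 16-bit integer samples. The result is
    longer than the input by the delay, so the echo tail isn't cut off.
    Delay and gain defaults are illustrative assumptions, not GoldWave's.
    """
    delay = int(round(delay_s * sample_rate))
    out = [float(s) for s in samples] + [0.0] * delay
    for n, x in enumerate(samples):
        out[n + delay] += gain * x
    # Clamp back into the 16-bit range so loud passages don't wrap around.
    return [max(-32768, min(32767, int(round(v)))) for v in out]
```

Many echo effects also feed the delayed signal back into itself to produce several fading repeats. A single tap is enough to show the idea.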

The clean file, and optionally the thought file, will be brought into my new Audio Splitter application. As the name suggests, this program breaks the overall clean and thought files into individual pieces for each of the different lines. The audio waveforms are displayed. Left-clicking on one sets the point where the extraction starts, and right-clicking sets the point where it stops. With the desired piece of audio selected, pressing a save button automatically extracts the audio between the start and end points to a numbered file based on an overall file name format such as “output####.wav” (with the “####” part replaced by an incrementing, zero-padded number).
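The core of that save step can be sketched with Python’s standard `wave` module. This is not the Audio Splitter’s actual code. It assumes uncompressed PCM WAV input, and `extract_segment` and its parameters are hypothetical names. The two click positions are taken as times in seconds and converted to frame offsets. The frames between them are then copied into a file whose name comes from the `####` pattern.

```python
import os
import wave

def extract_segment(src_path, start_s, end_s, pattern, index, out_dir="."):
    """Copy the audio between start_s and end_s (in seconds) into a numbered WAV file.

    `pattern` is a name such as "output####.wav"; the run of '#' characters
    is replaced by `index`, zero padded to the same width.
    """
    width = pattern.count("#")
    if width == 0 or "#" * width not in pattern:
        raise ValueError("pattern needs one contiguous run of '#' characters")
    name = pattern.replace("#" * width, str(index).zfill(width))
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Accept the two clicks in either order and convert seconds to frames.
        first, last = sorted((int(start_s * rate), int(end_s * rate)))
        src.setpos(first)
        frames = src.readframes(last - first)
    out_path = os.path.join(out_dir, name)
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width and rate as the source
        dst.writeframes(frames)  # the frame count in the header is fixed up on close
    return out_path
```

In a tool like the Splitter, `index` would be a counter that increments after every save, so consecutive extractions land in files such as output0000.wav, output0001.wav, and so on.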

Audio Chooser

The Audio Chooser listing all individual files to choose from.

Once the individual line readings have been separated into their own files, I’ll then be able to use the new Audio Chooser application. This program will be pointed at the directory where the Audio Splitter saved the separate audio files. While recording, I may ask the person supplying their voice to do a couple of takes of a line, or there may simply be some lines that people stumble over and need multiple takes to get through. The Audio Chooser will allow me to play each take and choose the one I want, while discarding the others.

In some cases, a line may require additional cleanup. It may have breath pops or other background noise that needs to be removed. Or the person may have put in unusually long pauses at punctuation in the line. Or maybe part of one reading sounds good and part of another sounds good, but no one recording sounds good overall. In these cases, the Audio Chooser will allow me to select one or more of the separate files and immediately send them to GoldWave to perform further work on them. If worse comes to worst, and the audio for a particular line is unsuitable, then the Audio Chooser will also allow lines to be marked for rerecording in a pickup session at a later date.
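The choosing itself happens by ear, but the bookkeeping afterwards is easy to automate. The following is a minimal sketch, and all of its names are hypothetical. `apply_choices` receives the set of takes to keep and a list of lines to flag for a pickup session. As a design choice of this sketch (not necessarily of the Audio Chooser), rejected takes are moved into a `discarded` subfolder instead of being deleted outright, so a wrong choice can be undone. Pickup lines are appended to a hypothetical `pickups.txt`.

```python
import os
import shutil

def apply_choices(directory, keep, pickup_lines, discard_dir="discarded"):
    """Move every .wav take not in `keep` aside and log lines needing pickups.

    keep: set of file names to leave in place.
    pickup_lines: identifiers of lines to re-record in a later session.
    Returns the names of the takes that were moved aside.
    """
    trash = os.path.join(directory, discard_dir)
    os.makedirs(trash, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(".wav") and name not in keep:
            # Move rather than delete, so a mistaken choice is recoverable.
            shutil.move(os.path.join(directory, name), os.path.join(trash, name))
            moved.append(name)
    if pickup_lines:
        with open(os.path.join(directory, "pickups.txt"), "a", encoding="utf-8") as f:
            for line_id in pickup_lines:
                f.write(f"{line_id}\n")
    return moved
```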

Audio Trimmer

The Audio Trimmer makes the silence at the start and end of all files consistent.

After the good versions of all the dialog lines have been chosen, the next new application I’ve prepared is the Audio Trimmer. When recording lines of dialog, people naturally put pauses between recitations of the lines. This is useful because it makes it easy to find where the starts and ends of the lines are when visualizing the recordings in something like GoldWave or the Audio Splitter. However, for the actual game I wouldn’t want a line of dialog to be followed by two or three seconds of silence, for example. This is where the Audio Trimmer comes in.

The Audio Trimmer, like the Audio Chooser, can be pointed at the folder that contains the separate, good lines of dialog. It then processes all the audio files there in bulk, trimming (or adding) silence at the beginning and end of each file so that every file has the same length of leading and trailing silence. GoldWave allows the starts and ends of audio to be adjusted at a very fine resolution, but doing this by hand is a fiddly, time-consuming process given the sheer number of dialog lines. Far better to let the computer handle this for all the dialog lines at once.
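A bulk trim like this might look roughly like the sketch below. It assumes 16-bit mono PCM WAV files and treats any sample quieter than a fixed amplitude `threshold` as silence. The names `trim_silence` and `trim_folder`, the threshold of 500, and the 0.25 s padding are all hypothetical choices. The sketch strips the existing leading and trailing silence and then adds exactly `pad_s` seconds of digital silence to each end, so every file ends up with the same margins.

```python
import os
import sys
import wave
from array import array

def trim_silence(path, out_path, pad_s=0.25, threshold=500):
    """Rewrite a 16-bit mono WAV so it has exactly pad_s seconds of silence at each end."""
    with wave.open(path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        samples = array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Indices of samples loud enough to count as part of the spoken line.
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    body = samples[loud[0]:loud[-1] + 1] if loud else array("h")
    pad = array("h", [0] * int(round(pad_s * params.framerate)))
    result = pad + body + pad
    if sys.byteorder == "big":
        result.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(result.tobytes())
    return len(result)

def trim_folder(folder, out_folder, pad_s=0.25, threshold=500):
    """Apply trim_silence to every .wav file directly inside `folder`."""
    os.makedirs(out_folder, exist_ok=True)
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".wav"):
            trim_silence(os.path.join(folder, name),
                         os.path.join(out_folder, name), pad_s, threshold)
```

Replacing the room tone with pure digital silence is the simplest option. A gentler variant would keep up to `pad_s` seconds of the original quiet audio and pad only when there isn't enough. A stray click or breath above the threshold would also count as part of the line, so the threshold would need tuning against real recordings.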

File Renumber

File Renumber removes gaps in the file numbering from deleted files.

The final new application is the File Renumber application. When using the Audio Splitter, files are all assigned an increasing sequential number. When using the Audio Chooser some of these files will be discarded, leaving gaps in the numbering. While it’s not strictly necessary that the file numbers be contiguous, it does make referring back to where in the dialog script the files came from a bit easier. So as the final automated step, all the files that will be incorporated into the game will be run through the File Renumber application, which will remove any gaps caused by unneeded, deleted line recordings.
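Closing the gaps comes down to renaming files. Renaming each file straight to its new number can collide with a file that hasn’t been moved yet (for instance, when the new numbering starts higher than the old). This sketch therefore uses two passes. It first moves every matching file to a temporary name and then assigns the final, contiguous numbers in the original order. The function `renumber`, its defaults (`output` prefix, `.wav` extension, numbering from 0), and the temporary name format are assumptions for illustration, not the File Renumber application’s actual code.

```python
import os
import re

def renumber(directory, prefix="output", ext=".wav", start=0):
    """Renumber prefix####ext files so their numbers are contiguous, keeping their order.

    Returns a list of (old_name, new_name) pairs.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(ext) + r"$")
    numbered = []
    for name in os.listdir(directory):
        m = pattern.match(name)
        if m:
            numbered.append((int(m.group(1)), len(m.group(1)), name))
    numbered.sort()
    # First pass: move everything to temporary names so no final name can
    # collide with a file that has not been renamed yet.
    temps = []
    for i, (_, width, name) in enumerate(numbered):
        tmp = f"__renumber_tmp_{i}{ext}"
        os.rename(os.path.join(directory, name), os.path.join(directory, tmp))
        temps.append((name, tmp, width))
    # Second pass: give each file its final, gap-free number (same zero padding).
    pairs = []
    for i, (old, tmp, width) in enumerate(temps):
        final = f"{prefix}{str(start + i).zfill(width)}{ext}"
        os.rename(os.path.join(directory, tmp), os.path.join(directory, final))
        pairs.append((old, final))
    return pairs
```

The returned list of old and new names could also be kept as a record for tracing a renumbered file back to its original number.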

Conclusion

For any given line of dialog, the steps covered by the above new mini applications do not take long to perform. However, when you multiply that time by the number of dialog lines, the overall time adds up quickly. As well, since these tasks are very tedious and repetitive, they’re perfect for offloading to the computer, which can do them faster, more consistently, and with fewer errors than I could by hand for every line of dialog. Now I just have to finish the dialog script and schedule some recording sessions.