Talking to a fellow sound designer about game audio, I realised that he wasn't aware of some of the differences between working on audio for linear media (film, animation, TV, etc) and for interactive media (video games).
So this post is kind of my answer to that. A brief introduction to the creative depth of video game sound design. This would be aimed to audio people who are maybe not very familiar with the possibilities this world offers or just want to see in which way is different to, say, working on film sound design.
Of course there are many differences between linear and interactive sound design, but perhaps the most fundamental, and the most important for somebody new to interactive sound design, is the concept of middleware. In this post, I’ll aim to give beginners a first look at this unfamiliar tool.
Linear Sound vs Interactive Sound
Video games are an interactive media and this is going to influence how you face the sound design work. In traditional linear media, you absolutely control what happens in your timeline. You can be sure that, once you finish your job, every time anyone presses play that person is going to have the same audio experience you originally intended, provided that their monitoring system is faithful.
Think about that. You can spend hours and hours perfecting the mix to match a scene since you know is always going to look the same and it will always be played in the same context. Far away explosion? Let's drop a distant explosion there or maybe make a closer FX sound further away. No problem.
In the case of interactive media, this won´t always be the case. Any given sound effect could be affected by game world variables, states and context. Let me elaborate on those three factors. Let's use the example of the explosion again. In the linear case, you can design the perfect explosion for the shot, because is always going to be the same. Let's see in the case of a game:
The player could be just next to the explosion or miles away. In this case, the distance would be a variable that is going to affect how the explosion is heard. Maybe the EQ, reverb or compression should be different depending on this.
At the same time, you probably don't want the sound effect to be exactly the same if it comes from an ally instead of the player. In that case, you'd prefer to use a simpler, less detailed SFX. One reason for this could be that you want to enhance the sound of what the player does so her actions feel more clear and powerful. In this case, who the effect belongs to, would be a state.
Lastly, is easier to make something sound good when you always know the context. In video games, you may not always know or control which sounds will play together. This forces you to play-test to make sure that sounds not only work in isolation but also together and in the proportions that usually the player is going to hear. Also, different play styles will alter these proportions. So, following with our example, your explosion may sound awesome but maybe at the same time dialogue is usually being played and is getting lost in the mix and you'd need to account for that.
After seeing this, linear sound design may feel more straightforward, almost easy in comparison. Well, not really. I´ll explain with an analogy. Working on linear projects, particularly movies, is like writing a book. You can really focus on developing the characters, plot and style. You can keep improving the text and making rewrites until you are completely satisfied. Once is done, your work is always going to deliver the same experience to anyone who reads the book.
Interactive media, on the other hand, is closer to being a game master preparing a D&D adventure for your friends. You may go into a lot detail with the plot, characters and setting but like any experienced GM knows, players will somewhat unpredictable. They will spend an annoying amount of time exploring some place that you didn´t give enough attention to and then they will circumnavigate the epic boss fight by some creative rule bending or a clever outside the box idea.
So, as you can see, being a book writer or working in linear sound design gives you the luxury of really focusing on the details you want, since the consumer experience and interaction with your creation is going to be closed and predictable. In both D&D and interactive media, you are not really giving the final experience to the players, you are just providing the ingredients and the rules that will create a unique experience every time.
Creating those ingredients and rules is our job. Let's explore the tools that will help us with this epic task.
Audio Middleware and the Audio Event
Games, or any software for that matter, is built from a series of instructions that we call code. This code manages and keeps track of everything that makes a game run: graphics, internal logic, connecting to other computers through the internet and of course, audio.
The simplest way of connecting a game with some audio files is just calling them from the code whenever we need them. Let's think about a FPS game. We would need a sound every time the players shoots her shotgun. So, in layman's terms, the code would say something like: "every time the players clicks her mouse to shoot, please play this shotgun.wav file that you will find in this particular folder". And we may don't even need to say please since computers don't usually care about such things.
This is how all games were done before and is still pretty much in use. This method is very straightforward but also very limited. Incorporating the audio files into the game is a process that is usually called implementation and this is its more rudimentary form. The thing is, code can be a little scary at first, specially for us audio people who are not very familiar with it. Of course, we can learn it, and is an awesome tool if you plan to work in the video game industry, but at the end of the day we want to be focusing on our craft.
Middleware was created help us with this and fill the gap between the game code and the audio. It serves as a middle man, hence the name, allowing sound designers to just focus on the sound design itself. In our previous example, the code was pointing to specific audio files that were needed in any given moment. Middleware does essentially the same thing but puts an intermediate in the middle of the process. This intermediate is what we call an audio event.
An audio event is the main functional unit that the code will call whenever it needs a sound. It could be a gunshot, a forest ambience or a line of dialogue. It could contain a single sound file or dozens of them. Anytime something makes a sound, is triggering an event. The key thing is that, once the code is pointing to an event, we have control, we can make it sound the way we want, we are in our territory.
And this is because middleware uses tools that we audio people are familiar with. We'll find tracks, faders, EQs and compressors. Keep in mind that these tools are still essentially code, middleware is just offering us the convenience of having them in an comfortable and familiar environment. Is bringing the DAW experience to the game development realm.
Audio middleware can be complex and powerful and I'd need a whole series of posts to tell you what they can do and how. So, for now. I'm just going to go through three main features that should give you an idea of what they can offer.
I - Conventional Audio Tools within Middleware
Middleware offer a familiar environment with tracks, timelines and tools similar to the ones found on your DAW. Things like EQ, dynamics, pitch shifters or flangers are common.
This gives you the ability to tweak your audio assets without needing to go back and forth between different softwares. Probably you are still going to start from your DAW and build the base sounds there using conventional plugins, but being able to also do processing within the middleware gives you flexibility and, more importantly, a great amount of power as you'll see later.
II - Dealing with Repetition and Variability
The player may perform some actions over and over again. For example, think about footsteps. You generally don't want to just play the same footstep sound every single time. Even having a set of, say 4 different footsteps, is going to feel repetitive eventually. This repetitiveness is something that older games suffer from and that generally modern games try to avoid. The original 1998 Half-Life, for example, uses a set of 4 footstep sounds per surface. Having said that, it may still be used when looking for a nostalgic or retro flavour the same way pixel art is still used.
Middleware offer us tools to make several instances of the same audio event, sound cohesive but never exactly identical. The most important of these tools are variations, layering and parameter randomization.
The simplest approach to avoid repetition is just recording or designing several variations on the same effect and let the middleware choose randomly between them every time the event is triggered. If you think about it, this imitates how reality behaves. A sword impact or an footstep are not going to sound exactly the same every single time, even if you really try to use the same amount of force and hit on the same place.
You could also break up a sound into different components or layers. For example, a gunshot could be divided in a shot impact, its tail and the bullet shell hitting the ground. Each of this layers could also have their own variations. So now, every time the player shoots, the middleware is going to randomly choose an impact, a tail and a bullet shell sound, creating a unique combination.
Another cool thing to do is to have an event with a special layer that is triggered very rarely. By default, every layer on an event has a 100% possibility to be heard but you can tweak this value to make it more infrequent. Imagine for example a power-up sound that has an exciting extra sound effect but is only played 5% of the time the event is called. This is a way to spice things up and also reward players who spend more time playing.
An additional way of adding variability would be to randomize not only which sound clip will be played, but also their parameters. For example, you could randomize volume, pitch or panorama within a certain range of your choice. So, every time an audio clip is called, a different pitch and volume value are going to be randomly picked.
Do you see the possibilities? If you combine these three techniques, you can achieve an amazing degree of variability, detail and realism while using a relative small amount of audio files.
See above the collision_ashore event that is triggered whenever a ship collides with an island. It contains 4 different layers:
A wood impact. (3 variations)
Sand & dirt impacts with debris. (3 variations)
Wooden creaks (5 variations)
A low frequency impact.
As I said, each time the event is triggered, one of this variations within each layer will be chosen. If we then combine this with some pitch, volume and EQ randomization we will assure that every instance of the event will be unique but cohesive with the rest.
III - Connecting audio tools to in-game variables and states.
This is where the real power resides.
Remember the audio tools built-in into middleware that I mentioned before? In the first section I showed you how we can use these audio tools the same way we use them on any DAW. Additionally, can also randomize their values, like I showed you in the second section. So here comes the big one.
We can also automate any parameter like volume, pitch, EQ or delay in relation to anything going on inside the game. In other words, we will have a direct connection between the language of audio and the language the game speaks, the code. Think about the power that that gives you. Here are some examples:
Apply an increasing high pass filter to the music and FX as the protagonist health gets lower.
Apply a delay to cannon shots that gets longer the further away the shot is, creating a realistic depiction of how light travels faster than sound.
Make the tempo of a song gets faster and its EQ brighter as you approach the end of the level.
As your sci-fi gun wears off, the sounds get more distorted and muffled. You feel so relieved when you can repair it and get all its power back.
Do you see the possibilities this opens? You can express ideas in the game's plot and mechanics with dynamic and interactive sound design! Isn't that exciting? The takeaway concept that I want you to grasp from this post is that you would never be able to do something this powerful with just linear audio. Working on games makes you think much harder about how sound coming from objects and creatures behaves, evolves and changes.
As I said before, you are just providing the ingredients and the rules, the sound design itself only materializes when the player starts the game.
You can see on the above screenshot how an in-game parameter, distance in this case, affects an event layers' volume, reverb send and EQs.
How to get started
If I have piqued your interest, here are some resources and information to start with.
Fmod and Wwise are currently the two main middleware used by the industry. Both are free to use and not very hard to get into. You will need to re-wire your brain a bit to get used to the way they work, though. Hopefully, reading this post gave you a solid introduction on some of the concepts and tools they use.
If I had to choose one of them, Fmod could look less intimidating at first and maybe more "DAW user friendly". Of course, there are other options, but if you just want to have a first contact, Fmod does the job.
There are loads of resources and tutorials online to learn both Fmod and Wwise, but since I think that the best way to really learn is to jump in and make something yourself, I'll leave you with something concrete to start from for each of them:
Fmod has these very nice tutorials with example projects that you can download and play with.
Wwise has official courses and certifications that you can do for free and and also include example projects.
And of course, don't hesitate to contact me if you have further questions. Thanks for reading!