Using WAAPI with UE5 and C++
Hi there!
I just published an article about WAAPI, Unreal and C++ on the Audiokinetic website. Have a look!
When users migrate from FMOD to Wwise, the scatterer instrument is usually one of the most missed features. Let's see how we can recreate its behaviour within Wwise, but first, a quick overview of the instrument in FMOD (skip ahead if you already know it).
This instrument essentially provides us with two handy features. On one hand, we can control how often sounds from a playlist are triggered, either based on regular time or on musical tempo. On the other hand, we can also control the positioning of each sound instance.
We can control the spawn interval with min and max spawn interval values. We also have a spawn rate knob, which is very handy if we want to automate the spawning rate with a game parameter. For example, we could have a scatterer instrument for birds chirping whose spawning rate is reduced as the weather gets worse.
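For reference, driving such a game parameter from C++ could look roughly like this with the FMOD Studio API. This is just a sketch: the event path and the "Weather" parameter are hypothetical names, and in FMOD Studio that parameter would be the one automating the scatterer's Spawn Rate knob.

```cpp
// Hedged sketch (FMOD Studio C++ API). Event path and parameter name are
// placeholders, not from the article.
#include "fmod_studio.hpp"

FMOD::Studio::EventInstance* StartBirdAmbience(FMOD::Studio::System* studio)
{
    FMOD::Studio::EventDescription* desc = nullptr;
    FMOD::Studio::EventInstance* instance = nullptr;
    if (studio->getEvent("event:/Ambience/Birds", &desc) == FMOD_OK &&
        desc->createInstance(&instance) == FMOD_OK)
    {
        instance->start();
    }
    return instance;
}

void OnWeatherChanged(FMOD::Studio::EventInstance* birds, float severity)
{
    // 0.0 = clear skies, 1.0 = storm. The automation curve authored in
    // FMOD Studio reduces the scatterer's spawn rate as this value rises.
    if (birds)
        birds->setParameterByName("Weather", severity);
}
```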
Something to always keep in mind with spawning is that each element in our playlist will use a voice. A fast spawning rate plus a high polyphony can result in a lot of voices, especially if you have several scatterer instruments, so keep them in check if you have a limited voice budget, which is usually the case for mobile or portable VR.
If you have a min/max distance that is not 0, FMOD will play the sounds around the emitter at a random position between the two values. Note that these positions are "in world", as if each sound had its own emitter in the game world. This is very nice for things like birds around a tree or water splashes in a lake.
Now, keep in mind that even if you don't have a spatializer on your event, having a non-zero min and max distance on a scatterer instrument will effectively make that event 3D. This is important to remember if (like me) you are playing your events through custom code and have checks for 2D/3D, as sketched below.
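For example, one way such a check could look with the FMOD Studio C++ API (a minimal sketch, not necessarily the code I actually use):

```cpp
// Hedged sketch: a scatterer with a non-zero min/max distance will make
// is3D() return true even if the master track has no spatializer.
#include "fmod_studio.hpp"

bool EventNeeds3DAttributes(FMOD::Studio::EventDescription* description)
{
    bool is3D = false;
    if (description && description->is3D(&is3D) == FMOD_OK)
        return is3D;
    return false;
}
```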
A different use case for a scatterer instrument is: what if we just want to play sounds around the listener in a 2D, "non in world" way? To do this, you firstly need to not have a spatializer on the event, which is probably already the case if you were just creating a 2D ambience. Then, you need to set the min and max distance to 0. This will result in all sounds having the same volume and pan, but that can be easily fixed by randomizing these values. In fact, the scatterer instrument already has a "Vol Rnd" knob that you can use for this. To change the panorama, you will need an extra effect on the track, or you can randomize the event panner if you want to affect everything at once.
Firstly, we need to create a random container with all the Sound SFXs or other containers that we want to use. By default, a random container just plays one of its items following whatever randomization settings you have. So each time an event tries to play the container, we would just get one sound or voice. We can change this by changing the Play Mode from “Step” to “Continuous”. Now our container will try to play the complete list of objects we have. We can then make this loop so we cycle randomly through the list without ever stopping.
To control how often the sounds play, activate "Transitions" and choose "Trigger rate". This allows us to control how often sounds are spawned, in a similar way to FMOD. You may want to randomize the duration value so sounds don't play at a predictable rate.
In the example above, you can see that sounds will spawn at a rate between 0.5 and 2 seconds. We now have an easy way to control how often we spawn sounds. Of course, we would probably need to create a stopping event if we want our sounds to stop spawning at some point. Now we just need to figure out how to play these sounds around the game world.
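From the game side, starting and stopping this looping container is just a matter of posting two events. Here is a minimal sketch using the plain Wwise SDK; the event names and game object ID are hypothetical.

```cpp
// Hedged sketch (Wwise SDK): posting hypothetical play/stop events that
// start and stop the looping random container described above.
#include <AK/SoundEngine/Common/AkSoundEngine.h>

static const AkGameObjectID kAmbienceEmitter = 100;

void StartScatterAmbience()
{
    // Assumes kAmbienceEmitter was registered with AK::SoundEngine::RegisterGameObj().
    AK::SoundEngine::PostEvent("Play_ScatterAmbience", kAmbienceEmitter);
}

void StopScatterAmbience()
{
    // The stop event would contain a Stop action targeting the container.
    AK::SoundEngine::PostEvent("Stop_ScatterAmbience", kAmbienceEmitter);
}
```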
In the positioning tab, under the 3D position section, we can find three different modes:
Emitter: This will make the position always the same (the position from which we have played the event). Use this if you don’t want any movement at all.
Emitter with Automation: This will take that same emitter position and add some movement that we can control. Use this if you want the sounds to be played in world, around the emitter. This will use whatever attenuation values you have set. You can even set paths with movement (think birds flying around). This achieves the same effect as playing a 3D FMOD event with a scatterer instrument that has non-zero min and max distances.
Listener with Automation: Same as above, but the sounds play around the listener. Make sure to tick "Hold Listener Orientation" if you want the sound to be completely 2D and thus independent of the listener's movement. This means the sounds won't feel like they come from a specific point in the world but from a predetermined position in the panorama. Your attenuation settings will still be respected, though. Use this for purely 2D ambiences. This gives the same effect as an FMOD scatterer instrument with 0 min and max distance and no spatializer.
When setting the 3D position automation, make sure you choose "Step" so that each time a sound spawns, a different path is chosen. You can also use "Random" if you want the position to change in an unpredictable way.
After using both middleware, I went from "it's a shame Wwise doesn't have a scatterer-instrument kind of container" to "Wwise can re-create all the scatterer behaviour and even improve on it!", so there's that. Let me know how it works for you and if you find any issues yourself.
This third part is all about modulation. FMOD offers very nice, easy-to-use modulation that you can apply to any knob on a track, the event master or the event macros. I have used this quite a lot. The envelopes (AHDSR) are great for creating dynamic events that, for example, ramp up in volume and pitch when you play them and ramp down when you stop them. The LFO and random modulations are great for doing creative and interactive sound design directly in FMOD.
Once I jumped to Wwise, I wanted to do similar things to what I was used to doing in FMOD. You can, but it is a bit more convoluted. Once you get used to the new workflow it's not too bad. Jump to the video below if you want to see how this works and don't care for a deeper explanation.
Adding an envelope to volume is not hard for the attack part. You can just change the Fade Time on the Play action. It lives on the event itself, not on the items in the Actor-Mixer Hierarchy. This has interesting consequences. It means that, in Wwise, this modulation is not necessarily tied to the audio objects themselves but to the actions within events. Once more, an example of how Wwise separates content and action. We will see later that this doesn't necessarily have to be the case (at least for the attack part).
I then tried to have a release time on the volume and this proved tricky. Same thing for trying to use envelopes on other things like pitch. After some looking around and many unfruitful Google searches, I finally found a way to do it. Before I explain how, I want to clarify that this can also be done using parameters and triggering them within Wwise, using event actions. In my opinion, that way is more complicated and unintuitive, but I just wanted to mention that it's possible and could be the best way for your use case.
So, this is the way I discovered I prefer. You need to create an Envelope modulator in the ShareSets section. Then, go to your desired knob or fader on your container or Sound SFX; let's say just volume for now. We add our envelope like so:
It will then appear on the RTPC tab. The graph that you see here can be confusing. It doesn't represent the envelope itself, like the envelope graphs you find in FMOD. It represents how values should evolve as the envelope progresses, and you can't even add new points to the graph or change its shape. It may as well just be two numbers. That's QUITE confusing. So how do we make sense of this? Let's see. On the attack phase, our values will change from left to right, or from 0.0 to 1.0. On the release phase, it's the other way around. What if we want different values for the attack and release? One way could be to create two different envelopes: one for attack and another for release. I will explain more about how release works below.
What if we want to change the shape of the envelope curve itself? As you can see on the envelope settings on the right hand side above, there is an “Attack Curve” value that goes from 0 to 100. Looking at the Wwise documentation we can see that 0 represents “an exponential-style envelope where the rate of change starts slow and then increases” while 100 would be “a logarithmic envelope where the rate of change starts fast, then decreases“. This would also mean that 50 (which is the default) is a linear curve.
So now the obvious question is: what about the release curve shape? Amazingly to me, there is no way to change it. From my tests, the release curve always seems to be linear. This is just a shame; a linear curve is not ideal in so many cases. This got me thinking... maybe this means that I'm not really using envelopes for their intended use and I'm doing a bit of a hack to bend them to my will? Not sure if that's the case, but I don't understand why something so useful and basic is so uncomfortable in what is basically the industry-standard audio engine.
Anyway, now that we have our envelope ready, if you play the container or Sound SFX you will see that the attack is working as we expect. But what about the release? If we stop the container/Sound SFX the release won’t work. If you look at the screenshot on the right, you will see that we are triggering the envelope when we play the event but nothing is telling Wwise that the release should be triggered when the sound is stopped. As far as I know, there is no way to do this from the Actor-Mixer Hierarchy. There is an “Auto Release” bool but this would just trigger the release after the sustain time which is not what we want.
So our only option is to change our stop event. Again, this Wwise philosophy of separation between content and action comes to mind. Before we jump to the event side of things, notice that there is a "Stop Playback After Release" bool. Let's set that to true; it will be helpful for us.
So there is an action called "Release Envelope" that we can trigger on our stop event. The target is not the envelope asset in the ShareSets, but the container or Sound SFX that we are trying to stop. This makes sense because that same envelope could be used for many different things, but it also implies that if you have multiple envelopes on there, you can't selectively release them; it's all or nothing. Which I guess is fine for the most part?
At first, I had a stop action and then the release envelope action. This won't work because you are stopping the event before the envelope has any time to do its thing. A way around this is to delay the stop by the same amount of time the release takes. This would work but it's cumbersome. A better way is to only have a release action, since the event will stop as soon as the release time ends thanks to the "Stop Playback After Release" setting that we enabled.
And now, finally, we are done. Now we can apply the same envelope to pitch or any other thing. We can also create two different envelopes if we want to change pitch and volume in different ways. By the way, to test this properly (and make tweaks on the values) you will need to use the Soundcaster view.
For illustration, here is a video of the process of setting all of this up in both FMOD and Wwise:
This seems to follow roughly the same workflow. It's easy to do in both middleware, although FMOD offers a visual representation of the LFO shapes, which is nice but not super important.
On Wwise, you need to create an LFO ShareSet and then apply it to whatever you like. And that’s pretty much it. You get again an entry on the RTPC tab where you can change the range of the modulation, while the “Depth” value will determine how far we go in that range value on each cycle.
We don't need to worry about all those release shenanigans since the LFO modulation will just always play, although we could automate or modulate its depth easily enough. Wwise offers roughly the same shapes as FMOD, although we are missing noise options. What it does offer is a random option, which functions similarly to FMOD's random modulation type. This random option doesn't have any settings that we can change, but it works well for what it's designed to do, I think.
Here are some more thoughts about how FMOD and Wwise do things differently:
This is a concept that works quite differently in Wwise, compared to FMOD. Let’s have a look:
RTPC (Real Time Parameter Control): This is a direct equivalent to FMOD parameters and they work pretty much in the same way. I believe both are floats under the hood.
Something that I feel is clearer in FMOD is parameter scope. All parameters are either local or global and you choose this when creating them. In Wwise, as far as I have seen, parameters are agnostic and you really choose the scope when you send values to them via code or Blueprints, as sketched below. I guess that's handy? I can also see how it can lead to confusion and errors, since most parameters won't change scope during runtime.
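Here is a minimal sketch of what I mean, using the plain Wwise SDK rather than the Unreal wrappers; the RTPC name and the emitter ID are hypothetical.

```cpp
// Hedged sketch (Wwise SDK): the same RTPC set with two different scopes.
#include <AK/SoundEngine/Common/AkSoundEngine.h>

void SetWindIntensity(float intensity, AkGameObjectID emitter)
{
    // Per-object (local) scope: only this emitter is affected.
    AK::SoundEngine::SetRTPCValue("Wind_Intensity", intensity, emitter);

    // Global scope: omitting the game object (AK_INVALID_GAME_OBJECT is the
    // default) applies the value globally instead.
    AK::SoundEngine::SetRTPCValue("Wind_Intensity", intensity);
}
```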
It's also interesting to note that both Wwise and FMOD have built-in parameters for things like distance, elevation, etc. Very handy.
Switches: On Wwise you can create switch groups which in turn contain individual switches. These basically work as enum-type parameters. The classic example is to have a switch for material types so you can then change footstep sounds using a switch container.
We would achieve the same in FMOD by just using labelled parameters, as sketched below. This is another key difference: FMOD offers different parameter types and each of them is useful in a different context. It's important to remember that under the hood all parameters are always a float variable, so these types are there just to make things easier for the user on the FMOD Studio side.
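As a small illustration of that difference, here is a hedged sketch of how each side could be set from C++. The "Surface" group, the "Grass" value and the footsteps event instance are hypothetical, and setParameterByNameWithLabel assumes a reasonably recent FMOD Studio API version; the two SDKs would not normally live in the same file.

```cpp
// Comparison sketch only; names are placeholders.
#include <AK/SoundEngine/Common/AkSoundEngine.h>
#include "fmod_studio.hpp"

void SetSurfaceWwise(AkGameObjectID footstepEmitter)
{
    // Switch group "Surface", switch "Grass", scoped to one game object.
    AK::SoundEngine::SetSwitch("Surface", "Grass", footstepEmitter);
}

void SetSurfaceFmod(FMOD::Studio::EventInstance* footsteps)
{
    // Labelled parameter: the label maps to a float value under the hood.
    if (footsteps)
        footsteps->setParameterByNameWithLabel("Surface", "Grass");
}
```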
States: These work in a similar way to FMOD snapshots. You can call different states from code and then change values on Sound SFXs or buses.
Something to consider is that FMOD uses the concept of overriding and blending snapshots. The former type will always "force" whatever value you have set, following the snapshot priority order, while the latter will just apply its change additively (+3 dB, for example) on top of whatever the current position of the fader or knob is. As far as I can see, Wwise states are basically always in "blending mode", since you can nudge values but not force them. As a consequence, there is no concept of state priorities in Wwise like we have in FMOD.
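If it helps, here is a minimal, hedged sketch of how each side might be triggered from C++; the "GameMode"/"Combat" state group and the "Combat" snapshot are hypothetical names, and again the two SDKs are only combined here for comparison.

```cpp
#include <AK/SoundEngine/Common/AkSoundEngine.h>
#include "fmod_studio.hpp"

void EnterCombatWwise()
{
    // State group "GameMode", state "Combat"; states are global.
    AK::SoundEngine::SetState("GameMode", "Combat");
}

void EnterCombatFmod(FMOD::Studio::System* studio)
{
    // FMOD snapshots are loaded like events, using the "snapshot:/" prefix.
    FMOD::Studio::EventDescription* desc = nullptr;
    FMOD::Studio::EventInstance* snapshot = nullptr;
    if (studio->getEvent("snapshot:/Combat", &desc) == FMOD_OK &&
        desc->createInstance(&snapshot) == FMOD_OK)
    {
        snapshot->start(); // call stop() later to remove the snapshot's influence
    }
}
```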
Triggers: I don’t know much about these because I haven’t done music work on Wwise but as far as I can see FMOD would accomplish the same with just normal parameters. I’m sure that if Wwise has its own bespoke system to play music, it must have cool features to blend and mix music but I still need to learn about this.
On FMOD all events are 2D until you add a spatializer, while on Wwise Sound SFXs (or containers, or Actor-Mixers...) usually inherit values from their parents, but I believe they are 2D by default too. What I mean is that if you create a Sound SFX in a new Work Unit, the 3D spatialization will default to "None".
Regarding other positioning options, things are organized and even named quite differently but roughly we can see that:
Wwise Speaker Panning / 3D Spatialization modes:
Direct Assignment: You would normally use this if you want your SFX to be 3D and update its position when the associated emitter moves. 3D Spatialization needs to be set to "Position" or "Position + Orientation".
On FMOD: This is the normal behaviour of the vanilla spatializer.
Balance-Fade: You usually use this for 2D SFXs that you just play on regular stereo, mono or even surround. 3D Spatialization needs to be on “None”. These settings won’t update position in any way when things move around in the game world so they don’t need an associated emitter.
On FMOD: This is basically how an event without a spatializer works. You can also have a spatializer but use the pan override function to play audio directly on specific channels.
Steering: As far as I can see, this works like Balance-Fade but also allows you to distribute the sound on a Z (vertical) coordinate. I'm not sure why this is a separate mode; maybe I haven't really understood why it's there.
On FMOD: You can’t control verticality as far as I know.
FMOD Envelopment is an interesting concept that I don’t see in Wwise, at least not directly. This gives you the ability to make an SFX source wider or narrower in terms of directionality. Let’s see how it works:
Before anything else, on FMOD every event has a min and max distance. This used to be at the spatializer level but now lives on the event which has huge advantages to build automated systems. Anyway, the min distance is the distance at which the attenuation will start taking effect while the max distance is where the attenuation will cease.
Now let’s look at the envelopment variables, which we can find on the spatializer. We have two variables to play with: sound size & min extent. As the listener is closer to the min distance, the sound will increasingly be all around us. When the distance is equal or smaller than the sound size, we would be “inside” the SFX so it would have no directionality. On the other hand, as we get further away, the SFX would be a smaller point in space, so directionality increases. Min Extent would define how small the sound can get when you are further away or, in other words, how narrowly directional.
I assume you can re-create this behaviour by hand on Wwise using a distance parameter and automating the Speaker Panning / 3D Spatialization Mix value but I haven’t tried this yet.
Now let's look at something Wwise offers but FMOD doesn't have. In Wwise, we can find a way to automate movement on the 3D position based on where the emitter or listener is. As far as I know, this is not possible with FMOD, at least not with just FMOD Studio. It can probably be done with the FMOD API (a rough idea of what that could look like is sketched below); that sounds like an interesting thing to try to build.
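To make the idea concrete, here is a rough, untested sketch with the FMOD Studio C++ API: moving an event instance around the listener by hand, once per frame. The orbit math and the names are made up purely for illustration; a real system would also feed the listener position from the game and call Studio::System::update each frame.

```cpp
// Hedged sketch: hand-rolled "listener with automation" for FMOD.
#include <cmath>
#include "fmod_studio.hpp"

void OrbitAroundListener(FMOD::Studio::EventInstance* instance,
                         const FMOD_VECTOR& listenerPos,
                         float timeSeconds, float radius)
{
    if (!instance)
        return;

    FMOD_3D_ATTRIBUTES attributes = {};
    attributes.position.x = listenerPos.x + radius * std::cos(timeSeconds);
    attributes.position.y = listenerPos.y;
    attributes.position.z = listenerPos.z + radius * std::sin(timeSeconds);
    attributes.forward = { 0.0f, 0.0f, 1.0f }; // must be unit length
    attributes.up      = { 0.0f, 1.0f, 0.0f }; // and orthogonal to forward
    instance->set3DAttributes(&attributes);
}
```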
Anyway, let’s have a quick look at what Wwise offers, this lives under the 3D Position options (Positioning tab).
Emitter: This is just normal positioning defined by the game engine itself. We would use this most of the time.
Emitter with Automation: You start with the same game-based position as above but you can add some extra movement on top of it. Remember that the game engine is completely unaware of this movement; it's not like Wwise is moving GameObjects/Actors or anything like that. The movement exists only on the Wwise audio level.
Listener with Automation: This is a similar concept to the previous option, but the movement is based on where the listener is instead of the emitter. This is useful for SFXs that we want to move in relative positions around the player or camera (depending on where the listener is). A perfect example would be ambience spots.
Wwise bases its attenuation workflow on ShareSets. These are essentially presets that can be used by any Sound SFX, container or Actor-Mixer. Each attenuation preset allows you to change different properties as distance increases.
In contrast, FMOD offers distance attenuation built into the spatializer and it only affects volume. If you want to achieve something similar to Wwise attenuations and be able to re-use the same attenuation for several events, you would need to save your spatializer as an effect preset, although that is not going to give you a lot of options, just the curve shape.
If you want a much more flexible system in FMOD, you need to turn off the distance attenuation on the spatializer and directly add a built-in distance parameter to the event. Then you can use the parameter to automate levels, EQs, reverb sends, etc. If you then save all this automation within an effect chain preset, you would have something as powerful and flexible as Wwise attenuations (although with quite a bit more work). If you want to know more about FMOD effect presets, check my article about it.
It's funny that I prefer Wwise's attenuation system but, in contrast, I think FMOD's orientation system is more powerful. In Wwise, orientation automation lives on the attenuation ShareSets, which I think makes a lot of sense. You can set how levels and filters change as the listener is in front of, to the side of or behind the source. Basically, think about this as if the audio emitter were a directional speaker: the frequency response and levels change as we move behind it.
If you know FMOD's orientation options, Wwise cone attenuation may remind you of FMOD's Event Cone Angle built-in parameter. They are basically the same. The thing is, FMOD offers two more orientation parameters, which also take the listener's angles into account. If you want to know more about how this works in FMOD, check my article here. As far as I know, there is no equivalent to these advanced orientation parameters in Wwise.
Hi there! November 2022 Javier here again. I was wrong about this! You can actually find something similar to FMOD's orientation parameters by using Wwise built-in parameters. This means that you won't find all of them on the attenuation ShareSet but on the RTPCs that you can apply to the object. Let's see how these correspond to the ones in FMOD (reading the section above first may help understanding):
Angle formed by the line between listener and emitter and the listener’s orientation. Emitter orientation is irrelevant:
FMOD: Direction
Wwise: Listener Cone
Angle formed by the line between listener and emitter and the emitter orientation. Listener orientation is irrelevant:
FMOD: Event Cone
Wwise: This is the cone attenuation that you can find on the attenuation share set.
Angle formed between the emitter orientation and the listener orientation (projected in 2D, verticality is ignored in both cases):
FMOD: Event Orientation
Wwise: Azimuth
After working with FMOD every day for a couple of years, I started a new job where we are using Wwise. This change gave me the chance to think about the differences between the two.
Let me be clear, I don’t particularly advocate for either of them but it is interesting to think about their different concepts and workflow. You can also consider these notes a companion if you are migrating from FMOD to Wwise like I did.
Events are the minimal functional unit for playing audio. This is a concept shared by both tools, but the workflow is quite different.
On FMOD, events are self-contained in the sense that the event itself has a timeline where you can add any audio that you want. I think this is more straightforward and DAWish which is very novice friendly. You also have an audio bin where all your audio assets sit, ready to be added to events. All the rules that determine how audio will play at runtime can also be found on each event, on the Event Macros section. Have a look at this article if you want to know how these work.
Wwise does this very differently, which might be confusing at first. To start with, there is no audio bin per se, but we use a “Sound SFX”. These don’t represent a single audio file necessarily, but a “channel” that plays audio. But don’t really think about Sound SFXs as channels in the sense of a mixing desk or DAW, they are more like containers with a set of rules to describe how to play audio. So things like pitch, volume, looping or insert effects live on Sound SFXs on Wwise. On FMOD, these would live on each track within an event. But Wwise Sound SFXs also contain things like positioning settings which in the case of FMOD live on the event level.
But if we want to play audio from our game with Wwise, Sound SFXs are not the thing we need to call. We need to create an event which in turn can play our Sound SFX. In other words, Wwise separates content and action, while FMOD keeps them together. Is one better than the other? I don't think so; they are just different approaches to solving the same problem and you just need to get used to each of them.
These guys are also a bit weird, especially at first. Despite their name, they are NOT the main way you want to mix your game. Actor-Mixers don't act as a bus for all the Sound SFXs that they contain, so their audio signals are not really combined at this level. That happens on buses and auxiliaries in the Master-Mixer Hierarchy, which are the best option for actually mixing your game.
So Actor-Mixers act as yet another container; they are useful for setting effects, positioning or parameter curves for a bunch of Sound SFXs at the same time. Remember when I said Sound SFXs were behaviour containers? Well, Actor-Mixers are kind of the same thing: they are basically a group of Sound SFXs that share the same behaviour.
Grouping SFXs like this is useful if you want to tweak things in the same way for a bunch of Sound SFXs, or maybe to make general mix tweaks, but they are probably not the best option for most mixing work, particularly dynamic mixing based on states.
It's hard to find an FMOD equivalent to the Actor-Mixer; the closest thing could be a mix between VCAs and effect chain presets, but it is fundamentally a very different way to organize things.
This illustrates yet again the difference in approach of each middleware. It's going to sound odd, but bear with me.
So Wwise uses this concept of "game calls", which are basically orders that the game sends to Wwise. These usually match event names, for example "Play_Jump" or "Stop_MainMusic", but if you think about it, game calls are really generic and ambiguous in the sense that they don't have to do what their names suggest.
So an approach that maybe makes more sense with Wwise is to give game calls names like "Protagonist_Jump". We know that the character jumped, so now it's our job to decide what the audio should do to reflect that action. Usually, this means playing a Sound SFX, but it could also mean stopping or pausing another Sound SFX, maybe even changing some other audio's settings. This philosophy gives you more flexibility but it takes a bit to wrap your head around it. That's why, when you set up a Wwise event, you can do all sorts of actions in there, although 90% of the time you probably just want to play or stop audio.
Contrast this with FMOD, where the game engine is always explicit in how it sends orders. When using the FMOD API, we directly and explicitly say “play this event” or “stop that event” so, for the most part, there is no room to do anything else, although you can break this workflow a bit by using command instruments.
This difference in philosophy is something that I don't usually hear people talking about. It's subtle, but I think it's one of the keys to understanding how Wwise and FMOD do things differently. The net result is that Wwise keeps a bit more logic control on the middleware side but pays the price of being a bit more confusing. FMOD, on the other hand, is more straightforward, but more control resides on the game engine side since the calls are always specific. To summarize this point (a small code sketch after the summary illustrates the contrast):
Wwise: Game calls are ambiguous, not necessarily tied to a specific action like playing audio.
FMOD: Game calls are unambiguous and very specific: tied to a specific event and action.
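In code, that contrast could look roughly like this. It is a hedged sketch: the names are hypothetical and the two APIs would not normally live in the same function; it only puts the two call styles side by side.

```cpp
#include <AK/SoundEngine/Common/AkSoundEngine.h>
#include "fmod_studio.hpp"

void OnProtagonistJump(AkGameObjectID protagonist,
                       FMOD::Studio::EventDescription* jumpEvent)
{
    // Wwise: the game only reports that the jump happened; the event decides
    // whether that means playing, stopping or tweaking audio.
    AK::SoundEngine::PostEvent("Protagonist_Jump", protagonist);

    // FMOD: the game explicitly creates and starts a specific event instance.
    FMOD::Studio::EventInstance* instance = nullptr;
    if (jumpEvent && jumpEvent->createInstance(&instance) == FMOD_OK)
    {
        instance->start();
        instance->release(); // released automatically once the one-shot finishes
    }
}
```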
In Wwise, everything resides in a container called a "Work Unit", from Sound SFXs to events and parameters. YES, another container! This is a bit weird, since it feels like the type of thing that is not usually user-facing but an internal system to organize files. Work Units are really XML files, which is funny because FMOD also uses this format, although there the files are not as visible to the user.
The idea is to organize things in these different containers, so different people can work on the same project. They won’t conflict as long as they work on different units.
This makes a lot of sense until you realize that FMOD doesn't have anything similar and it works nicely anyway when sharing a project with others, at least in my experience. I have used SVN source control, not even the official integration that FMOD has (that gave me issues), just plain sharing the FMOD project files with SVN. You won't conflict as long as you don't work on the same event or mixer bus, so FMOD is actually more granular than a Work Unit; it's like each element is its own Work Unit!
Wwise containers work in a kind of similar way to FMOD instruments. Roughly, you can make the following comparisons:
| Wwise | FMOD | Notes |
|---|---|---|
| Sound SFX | Single Instrument | They work in the same way. |
| Random Container | Multi Instrument | Similar randomization options. |
| Sequence Container | -- | FMOD allows this by allowing many instruments on an event. It's probably easier to set up too. |
| Switch Container | -- | On FMOD, you would just use a parameter sheet or trigger conditions on instruments. |
| Blend Container | -- | FMOD does this natively on its timeline since you can crossfade instruments based on time or parameter values. |
| -- | Event Instrument | Wwise doesn't allow nested events in the same way as FMOD does, but you have more flexible event actions. |
| -- | Scatterer Instrument | Can be recreated in Wwise! |
| -- | Programmer Instrument | Nothing like this in Wwise, but there is no need since game calls are ambiguous. |
| -- | Command Instrument | Same as above. |
| -- | Snapshot Instrument | In Wwise, you can always trigger states from an event and achieve the same result. |
So as you can see, between Wwise and FMOD there is no perfect option. I particularly miss the scatterer instrument in Wwise, but it's true that it has its limits and annoying quirks in FMOD anyway. For example, the fact that it always plays a sound the first time you enter the instrument, without respecting the spawning time.
On the other hand, on Wwise, you don’t have these specific instruments like the programmer or command instruments but there is no need really, since you can add as many actions as you want to an event.
I just think you need to get used to the differences and learn to think in a different way for each piece of software. There are advantages and disadvantages on both sides for sure. In engineering, every decision is a trade-off.