December 24, 2018

Figuring out: Measuring Loudness

December 24, 2018/ Javier Zumer

How loud is too loud?

There are many loudness standards nowadays, and many types of media and platforms, so making sure audio is at the correct level everywhere can be tricky. In this post, I’m going to talk about the history of measuring loudness and the standards currently in use.

The analogue days

The first step to measure loudness is to define and understand the fundamental nature of the decibel. Luckily, I wrote a post last year about this very subject so you may want to check that before diving into loudness.

So, now that you are familiar with the dB, let’s think about how we can best use it to measure how loud audio signals are.

In the analogue days, reading audio levels always implied measuring voltage or power in a signal and comparing it to a reference value. When trying to determine how loud an audio signal is, we can just measure these values across time but the problem is that levels are usually changing constantly. So how do we best represent the overall level?

A possible approach would be to just measure the highest value. This method of measuring loudness is called Peak and is handy when we want to make sure that levels stay within the system’s capacity so our signals are not saturated. But in terms of measuring the general level of a piece of audio, this approach can be very deceiving. For example, a very quiet signal with a sudden loud transient would register as loud despite being quiet as a whole.

As you are probably thinking, a much better method would be to measure an average value across a certain time window instead of the instant reading that peak meters provide. This is usually called RMS (root mean square) metering and it is much closer to how we humans perceive loudness.
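To make the difference concrete, here is a minimal sketch that compares a peak reading with an RMS reading on the kind of signal described above: a quiet tone with one loud click. The test signal, the function names and the full-scale reference of 1.0 are assumptions made for illustration.

```python
import math

def peak_level(samples):
    """Highest absolute sample value in the block."""
    return max(abs(s) for s in samples)

def rms_level(samples):
    """Root mean square: square every sample, average, then take the square root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def to_db(value, reference=1.0):
    """Express a linear level in decibels relative to a reference."""
    return 20 * math.log10(value / reference)

# One second of a quiet 100 Hz tone (amplitude 0.05) with a single loud click.
rate = 48000
quiet_with_click = [0.05 * math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
quiet_with_click[rate // 2] = 0.9

print(f"peak: {to_db(peak_level(quiet_with_click)):.1f} dB")
print(f"RMS:  {to_db(rms_level(quiet_with_click)):.1f} dB")
```

The peak reading is dominated by the single click, while the RMS reading stays close to the level of the quiet tone.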

Let’s have a look at some of the meters that were created:

**Real audio signal (grey) and how a VU meter would interpret it. (black)**

VU (Volume Unit) meters are probably the most used meters in analogue equipment. They were designed in the 1940s to measure voltage with a response time similar to how we naturally hear. The method is surprisingly simple: the needle’s own weight slows down its movement by around 300 ms on both the attack and the release, so very sudden changes are softened. The time that the meter needs to start moving is usually called the integration time. You will also hear the term “ballistics” used to describe these response times.

The PPM (peak programme meter) is a different type of meter that has been widely used in the UK and Scandinavia since the 1930s. Unlike the VU meter, the PPM uses very short attack integration times (around 10 ms for type II and 4 ms for type I) and relatively long release times (around 1.5 seconds for a 20 dB fall). Because the attack times are so short, these were often considered quasi-peak meters. The long release time helped engineers see peaks for longer and get a feel for the overall levels of a programme, since levels would fall slowly after a loud section.
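The effect of different ballistics can be sketched with a simple envelope follower that smooths the rectified signal using separate attack and release times. This is only a first-order approximation: the time constants are borrowed loosely from the figures above and do not reproduce the exact ballistics of any real VU or PPM standard. The function name and the test pulse are hypothetical.

```python
import math

def meter_reading(samples, rate, attack_s, release_s):
    """Smooth the rectified signal with separate attack and release time constants."""
    attack = math.exp(-1.0 / (attack_s * rate))
    release = math.exp(-1.0 / (release_s * rate))
    level, readings = 0.0, []
    for s in samples:
        target = abs(s)
        # Rising level uses the attack constant, falling level the release constant.
        coeff = attack if target > level else release
        level = coeff * level + (1.0 - coeff) * target
        readings.append(level)
    return readings

rate = 8000
# A 50 ms full-scale pulse surrounded by silence.
pulse = [0.0] * 800 + [1.0] * 400 + [0.0] * 2800

vu_like = meter_reading(pulse, rate, attack_s=0.3, release_s=0.3)
ppm_like = meter_reading(pulse, rate, attack_s=0.01, release_s=1.5)

print(f"VU-like maximum:  {max(vu_like):.2f}")
print(f"PPM-like maximum: {max(ppm_like):.2f}")
```

The slow meter barely reacts to the short pulse, while the fast-attack, slow-release meter nearly reaches full scale and then holds the reading for much longer.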

The Dorrough Loudness Meter is also worth mentioning. It combines an RMS and a peak meter in one unit and was very common in the 90s. As we will see, combining an RMS and a peak meter in a single unit became a trend that carries on today.

The dawn of Digital Audio

As digital audio started to become the new industry standard, new ways to measure audio levels needed to be adopted. But how do we define how much 0 is in the digital realm? In analogue audio, the value we assign to 0 is usually some meaningful measure that helps us avoid saturating the audio chain. These values used to be measured in volts or watts and would vary depending on the context and type of gear. For example, for studio equipment in the US, 0VU corresponds to +4 dBu (1.228 V) while Europe’s 0VU is +6 dBu (1.55 V). Consumer equipment uses -10 dBV (0.3162 V) as its 0VU. As you can see, the meaning of 0VU is very context dependent.
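The voltages quoted above follow directly from the definitions of dBu (referenced to about 0.7746 V RMS) and dBV (referenced to 1 V RMS). A small sketch of the conversion, with function names chosen just for this example:

```python
import math

DBU_REFERENCE_VOLTS = math.sqrt(0.6)  # about 0.7746 V RMS (1 mW into 600 ohms)
DBV_REFERENCE_VOLTS = 1.0

def dbu_to_volts(dbu):
    """Convert a level in dBu to volts RMS."""
    return DBU_REFERENCE_VOLTS * 10 ** (dbu / 20)

def dbv_to_volts(dbv):
    """Convert a level in dBV to volts RMS."""
    return DBV_REFERENCE_VOLTS * 10 ** (dbv / 20)

for label, volts in [
    ("+4 dBu", dbu_to_volts(4)),
    ("+6 dBu", dbu_to_volts(6)),
    ("-10 dBV", dbv_to_volts(-10)),
]:
    print(f"{label:>8} = {volts:.4f} V")
```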

In the case of digital audio, 0 dB is simply defined as the loudest level that can pass through the converters before clipping, that is, before the waveform is deformed and saturation is introduced. We call this definition of the decibel dBFS (decibels relative to full scale). How digital audio levels correspond with analogue levels depends on how your converters are calibrated, but usually 0VU is equated to around -20 dBFS on studio equipment.
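As a rough sketch, a dBFS value is just the ratio between a sample's magnitude and the largest value the system can represent. The second function assumes the -20 dBFS alignment mentioned above; both function names are made up for this example, and your own converters may be calibrated differently.

```python
import math

def sample_to_dbfs(sample, full_scale=1.0):
    """Level of a sample magnitude relative to digital full scale."""
    if sample == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(abs(sample) / full_scale)

def vu_to_dbfs(vu_reading, alignment_dbfs=-20.0):
    """Map a VU reading to dBFS, assuming 0 VU is aligned to alignment_dbfs."""
    return vu_reading + alignment_dbfs

print(sample_to_dbfs(32767, full_scale=32768))  # 16-bit sample just below full scale
print(sample_to_dbfs(0.1))                      # floating-point sample
print(vu_to_dbfs(+3))                           # +3 VU under a -20 dBFS alignment
```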

**Fletcher-Munson curves showing frequency sensitivity for humans. How cool would it be to see the equivalent curves for other animals, like bats?**

The platonic loudness standard

Since dBFS is only a scale in the digital world, we still need to find a way to measure loudness in a human friendly way within digital audio. As we have seen, this is usually accomplished by averaging audio levels across a certain time window. On the other hand, digital audio also needs precision when measuring peaks if we want to avoid saturation when converting audio between analogue and digital and vice versa.

Something else that we need to take into consideration for our standard is the fact that we are not sensitive to all frequencies in the same proportion, as the Fletcher–Munson curves show. As you can see, we are not very sensitive to low or very high frequencies. If we want our audio levels to be accurate, this is something that needs to be accounted for.

So, I have laid out everything that we need our loudness standard to have. Does such a thing exist?

The ITU BS.1770 standard

This document was presented by the ITU (International Telecommunication Union) in 2006 and fits all the criteria we were looking for. The ITU BS.1770 is really a collection of technologies and protocols designed to measure loudness accurately in a digital environment. It is really a set of recommendations, we could say.

Four revisions have been released at the time of this writing plus the ITU BS.1771 which also expands on the same ideas. For simplicity, I will refer to all of these documents as simply the ITU BS.1770 or just ITU.

The loudness unit defined by the ITU is the LKFS, which stands for “Loudness, K-weighted, relative to Full Scale”. This unit combines a weighting curve (named “K”) to account for frequency sensitivity along with an averaged or RMS measurement that uses a 400 ms time window. The ITU also defines a “true peak” meter as a peak meter that uses oversampling for greater accuracy.
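Here is a minimal sketch of that measurement for a single mono channel: two filters for the K curve followed by a mean-square average over a 400 ms window. The filter coefficients are the 48 kHz values listed in BS.1770; everything else (pure-Python filtering, the function names, mono only, no channel weighting or gating) is a simplification for illustration, not a compliant meter.

```python
import math

# Biquad coefficients for 48 kHz as listed in ITU-R BS.1770: (b0, b1, b2), (a1, a2).
SHELF = ((1.53512485958697, -2.69169618940638, 1.19839281085285),
         (-1.69065929318241, 0.73248077421585))
HIGHPASS = ((1.0, -2.0, 1.0),
            (-1.99004745483398, 0.99007225036621))

def biquad(samples, coeffs):
    """Direct form I second-order filter."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def k_weight(samples):
    """Apply the high-shelf stage and then the high-pass stage of the K curve."""
    return biquad(biquad(samples, SHELF), HIGHPASS)

def block_loudness(weighted):
    """Loudness of one mono block of K-weighted samples, in LKFS."""
    mean_square = sum(s * s for s in weighted) / len(weighted)
    if mean_square == 0:
        return float("-inf")
    return -0.691 + 10 * math.log10(mean_square)

rate = 48000
# One second of a full-scale 997 Hz sine; only the last 400 ms is measured.
tone = [math.sin(2 * math.pi * 997 * n / rate) for n in range(rate)]
window = int(0.4 * rate)
momentary = block_loudness(k_weight(tone)[-window:])
print(f"momentary loudness: {momentary:.2f} LKFS")
```

A full-scale 997 Hz sine in a single channel is the reference case in the recommendation and should read close to -3 LKFS, which makes it a handy sanity check for any implementation.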

Once the ITU released its recommendations, each region used them as the foundation for its own standards. As the ITU released new updates, each region would incorporate some of these ideas while expanding on them. Let’s see some regional standards.

EBU R128, Time Windows & Gates

This is the standard in use in Europe and it is published by the EBU (European Broadcasting Union).

Before I continue, a clarification. The EBU names the loudness unit LUFS (Loudness units relative to full scale) instead of LKFS as the former complies better with scientific naming conventions. So if you see LUFS, keep in mind that this is pretty much the same as LKFS. On the other hand you will also see LU (Loudness Units). This is simply a relative unit that is used when comparing two LUFS or two LKFS values.

In the R128 standard, four different time windows are defined. This is based on the ITU BS.1771 recommendation. A meter needs to have all these plus some other features (see below) to be considered capable of operating in “EBU Mode”.

True-Peak: Almost instantaneous window with sub-sample accuracy.
Momentary: 400 ms window. Useful to get an idea of how loud a particular sound is. Plugins usually offer different scale options.
Short Term: 3-second window. Gives a good feel of how loud a particular section is.
Integrated or Programme: Indicates how loud the programme is across its whole length. Sometimes it’s also called “Long Term”.

Why so many different time windows? In my opinion, they are useful when working on a mix since they give you information at different levels of resolution. True-peak tells you whether you would saturate the converters and it is good practice to always keep some headroom here. The momentary measurement is more or less similar to what VU meters would indicate, and gives you information on a particular short section. I personally don’t really look at the momentary meter much because any mix with a decent amount of dynamic range is going to fluctuate here quite a bit. Nevertheless it is useful to make sure that the mix is not very far away from the target levels on some specific sections.

Short term may be a better tool to get a solid feel of how loud a scene is. This measurement is going to fluctuate but not as much as the momentary value. In order to get a mix within the standards, you need to make sure the short term value is usually around the target level, but you don’t need to be super accurate with this. What I try to do is make a compromise between the level that feels right and my target level and, when in doubt, I favor what feels right.

Finally, the integrated or long term value has a time window with the size of the whole show. This is the value that is going to tell you the overall level and measuring it in a faithful way is tricky as you will see below.

So, I was mentioning “target levels”. Which levels? The EBU standard recommends audio to be at -23 LUFS ±0.5 LU (±1 LU for live programmes). We are talking here about the integrated measurement, so the level for the entire show. Additionally, the maximum true peak value allowed is -1 dBTP. And that would be pretty much it, although there is one more issue as I was saying. Measuring levels throughout a long length of time in a consistent way comes with some challenges.

This is because there is usually a main element that we want to make sure is always easy to hear (usually dialogue or narration) and, since audio volume is logarithmic, that main element would pretty much carry 90% of the show’s loudness weight. So we would naturally mix this element to already be at the desired loudness or slightly below. The problem comes when considering all the other elements around the dialogue. If there are too many quiet moments, that is going to make our integrated levels quite low, since everything is averaged.

The solution would be to either push the level of the whole show or re-mix the level of the dialogue louder so the integrated value is correct. Either way that would probably make the dialogue too loud and we would also risk saturating the peak meter. Not ideal.

**Nugen’s VisLM Plugin operating in EBU mode. You can see all the common EBU features including all time windows, loudness range and a gate indicator.**

In order to fix this, the R128 uses the recommendations from the revised ITU BS.1770-3. Integrated loudness is calculated using a relative gate method that effectively pauses the measurement when levels drop below a threshold of -10 LU relative to an un-gated measurement. There is also an absolute gate at -70 LUFS: nothing below this value is considered for the measurement. These gates help us get a more meaningful result since only the relevant audio in the foreground will be considered when measuring the integrated loudness.
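The two gates can be sketched as follows. The function works on a list of per-block loudness values (in a real meter these come from overlapping 400 ms blocks) and averages them in the power domain, as the recommendation does. The example programme is invented for illustration, and the function names are not taken from any library.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0
RELATIVE_GATE_LU = -10.0

def energy_mean(block_loudness):
    """Average loudness values in the power domain, not in dB."""
    powers = [10 ** (value / 10) for value in block_loudness]
    return 10 * math.log10(sum(powers) / len(powers))

def integrated_loudness(block_loudness):
    """Two-stage gated loudness from per-block loudness values (LUFS)."""
    # Stage 1: drop everything below the absolute gate.
    audible = [v for v in block_loudness if v > ABSOLUTE_GATE_LUFS]
    if not audible:
        return float("-inf")
    # Stage 2: drop blocks more than 10 LU below the level measured after stage 1.
    relative_threshold = energy_mean(audible) + RELATIVE_GATE_LU
    foreground = [v for v in audible if v > relative_threshold]
    return energy_mean(foreground)

# Hypothetical programme: dialogue around -23 LUFS, long quiet ambience, some silence.
blocks = [-23.0] * 600 + [-45.0] * 900 + [-90.0] * 300

print(f"ungated:    {energy_mean(blocks):.1f} LUFS")
print(f"integrated: {integrated_loudness(blocks):.1f} LUFS")
```

In this made-up programme the quiet ambience and the silence pull the ungated average down by several LU, while the gated result stays at the dialogue level.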

The last concept I wanted to mention is loudness range or LRA. This is measured in LU and indicates how much the overall levels change throughout the programme, in a macroscopic view. You can think of this as an indication of the dynamic range of your mix: low values would indicate that the mix has a very constant level while higher values would appear when there is a larger difference between quiet and loud moments. The EBU doesn’t recommend any given target value for the loudness range since this would depend on the nature of the show but it is for sure a nice tool to have to get an idea of your overall mix dynamics.

ATSC A/85

This is the standard used in the US and is released by the ATSC (Advanced Television Systems Committee). It uses LKFS units (remember that LKFS and LUFS are virtually equivalent) and time windows similar to the European ones. The recommended integrated value is -24 LKFS while the maximum peak value allowed is -2 dBTP.

When the first version was released in 2009, this standard recommended a different method for calculating the integrated value. As you know, the EBU system uses a relative gate in order to only consider foreground audio for its measurements but the ATSC took a different approach. Remember when I was saying before that mixes usually have some main element (often dialogue) that forms the center of the mix?

The ATSC called this main element an “anchor”. Since dialogue is usually this anchor, the system used an algorithm to detect speech and would only consider that to calculate the integrated level. I’ve done some tests with both Waves WLM and Nugen VisLM and the algorithm works pretty well: the integrated value doesn’t even budge when you are just monitoring non-dialogue content, although singing usually confuses it.

In fact, in the 2011 update, the ATSC standard started differentiating between regular programmes and commercials. Dialogue-based gating would be used for the former while all the elements in the entire mix would be considered for the latter. This was actually one of the main goals of the ITU standard initially: to avoid commercials being excessively loud in comparison to the programmes themselves.

Nevertheless, the ATSC updated the standard again in 2013 to follow the ITU BS.1770-3 directives and from then on all content would be measured using the same two-gate method Europe uses. Because of this, I was tempted to just avoid mentioning all this messy ATSC history, but I thought it was important to explain it so you can understand why some loudness plugins offer so many different ATSC options.

Here you can see the ATSC options on WLM. The first two would be pre 2013, using either dialogue detection or the whole mix to calculate the integrated level. The third, called “2013”, uses the gated method à la Europe.

TV Regional and National Standards

Now that we have a good idea of all the different characteristics standards use, let’s see how they compare.

  
| Country / Region | Standard | Units Used | Integrated Level | True Peak | Weighting | Integrated level method |
| --- | --- | --- | --- | --- | --- | --- |
| Europe | EBU R128 | LUFS | -23 LUFS | -1 dBTP | K | Relative Gate |
| US | ATSC A/85 post 2013 | LKFS | -24 LKFS | -2 dBTP | K | Relative Gate |
| US | ATSC A/85 pre 2013 (Commercials) | LKFS | -24 LKFS | -2 dBTP | K | All elements are considered |
| US | ATSC A/85 pre 2013 (Programmes) | LKFS | -24 LKFS | -2 dBTP | K | Dialogue Detection |
| Japan | TR-B32 | LUFS | -24 LUFS | -2 dBTP | K | Relative Gate |
| Australia | OP-59 | LKFS | -24 LKFS | -2 dBTP | K | Relative Gate |
  

As you can see, currently, there are only small differences between them.
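If you deliver to several regions, the targets from the table above can be kept as data and checked automatically. This is a sketch only: the `check_mix` helper is hypothetical, and the default ±0.5 LU tolerance is the one quoted earlier for EBU R128; applying it to the other standards is an assumption, so check each spec for its own tolerance.

```python
# Integrated target (LUFS/LKFS) and maximum true peak (dBTP), copied from the table above.
SPECS = {
    "EBU R128": (-23.0, -1.0),
    "ATSC A/85": (-24.0, -2.0),
    "TR-B32": (-24.0, -2.0),
    "OP-59": (-24.0, -2.0),
}

def check_mix(integrated, true_peak, spec, tolerance_lu=0.5):
    """Return a list of human-readable problems; an empty list means the mix passes."""
    target, max_true_peak = SPECS[spec]
    problems = []
    offset = integrated - target
    if abs(offset) > tolerance_lu:
        problems.append(f"integrated loudness is {offset:+.1f} LU away from {target} for {spec}")
    if true_peak > max_true_peak:
        problems.append(f"true peak {true_peak} dBTP exceeds {max_true_peak} dBTP")
    return problems

# A mix measured at -23.2 LUFS integrated with a -1.5 dBTP maximum true peak.
for spec in SPECS:
    print(spec, check_mix(-23.2, -1.5, spec) or "OK")
```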

Loudness for Digital Platforms

I have tried to find the specifications for some of the most used digital platforms but I was only able to find the latest Netflix specs. Hulu, Amazon and HBO don’t specify their requirements, or at least not publicly. If you need to deliver a mix to these platforms, make sure they send you their desired specs. In any case, using the latest EBU or ATSC recommendations is probably a good starting point.

In the case of Netflix, their specs are very curious. They ask for an integrated level of -27 LKFS and a maximum true peak of -2 dBTP. The method to measure the integrated level would be dialogue detection, like the ATSC used to recommend, which in a way is a step back. Why would Netflix recommend this if the ATSC spec moved on to gate-based measurements? Netflix basically says that when using the gated method, mixes with a large dynamic range tend to leave dialogue too low, so they propose a return to the dialogue detection algorithm.

The thing is, this algorithm is old and can be inaccurate, so this decision was controversial. A new, modern and more robust algorithm could be a possible solution for these high dynamic range mixes. Also, -27 LKFS may sound too low, but it wasn’t chosen arbitrarily: it is the level where dialogue would usually end up on these mixes. If you want to know more about this, you can check this, this and this article.

Loudness for Theatrical Releases

The case of cinema is very different from broadcast for a very simple reason: you can expect a certain homogeneity in the reproduction systems that you won’t find in home setups. For this reason there is no hard loudness standard that you have to follow.

  
| Dolby Scale | SPL (dBC) |
| --- | --- |
| 7 | 85 |
| 6.5 | 83.33 |
| 6 | 81.66 |
| 5.5 | 80 |
| 5 | 78.33 |
| 4.5 | 76.66 |
| 4 | 75 |
| 3.5 | 65 |

This lack of a general standard has resulted in a loudness war similar to the one in the music mixing world. The result is lower dynamic ranges and many complaints about cinemas being too loud. Shouldn’t cinema mixes offer a bigger dynamic range experience than TV? How are these levels determined?

Cinema screens have a Dolby box where the projectionist would set the general level. These levels are determined by the Dolby Scale and correspond to SPL measures under a C curve when using the “Dolby noise”. Remember that, in the broadcast world, the K curve is used instead which doesn’t help things when trying to translate between both.
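The values in the table above can be approximated with a simple piecewise formula. The slopes here are inferred from the table itself: 10 dB across the three scale units from 4 to 7, and a much steeper drop below 4 based on the single 3.5 row. Treat anything outside the 3.5 to 7 range as extrapolation; the function name is made up for this sketch.

```python
def dolby_fader_to_spl(setting):
    """Approximate SPL (dBC) for a Dolby Scale setting, fitted to the table values."""
    if setting >= 4.0:
        # From 4 upwards: 10 dB per 3 scale units, anchored at 7 = 85 dBC.
        return 85.0 + (setting - 7.0) * (10.0 / 3.0)
    # Below 4: 20 dB per scale unit, anchored at 4 = 75 dBC.
    return 75.0 + (setting - 4.0) * 20.0

for setting in (7, 6.5, 5, 4, 3.5):
    print(f"Dolby {setting}: {dolby_fader_to_spl(setting):.2f} dBC")
```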

Nowadays more and more cinemas are automated. This means that levels are set via software or even remotely. At first, all cinemas were using level 7, which is the one recommended by Dolby but as movies were getting louder and people complained, projectionists would start to use lower levels. 6, 5 and even 4.5 are used regularly. In turn, mixers started to work in those levels too which resulted in louder mixes overall in order to get the same feel. This, again, made cinemas lower their levels even more.

You see where this is going. To give you an idea, Eelco Grimm together with Michel Schöpping analyzed 24 movies available at Dutch cinemas and found levels that varied wildly. The integrated level went from -38 LUFS to -20 LUFS, with the maximum Short-term level varying from -29 LUFS to -8 LUFS and the maximum True-Peak level varying from -7 to +3.5 dBTP. Dialogue levels varied from -41 to -25 LUFS. That’s quite a big difference; imagine if that were the case in broadcast.

The thing is that despite these numbers being very different, we have to remember that all these movies were probably played at different levels on the Dolby Scale. Eelco says in his analysis:

The average playback level for movies mastered at '7' is -28 LUFS (-29 to -25).
The average playback level for movies mastered at '6.3' is -23 LUFS (-25 to -21). They are projected 3 dB softer, so if we corrected the average to a '7' level, it would be -26 LUFS.
The average playback level for movies mastered at '5' is -20 LUFS (all were -20). They are projected 7 dB softer, so the corrected average would be -27 LUFS.

So, as you can see, in the end dialogue level is equivalent to about -27 LUFS in all cases. The only difference is that the movies that were mixed at 7 (which is the recommended level) would have greater dynamic range, something important to be able to give a cinematic feel that TV can’t provide. The situation is quite unstable and I hope a solid solution based on the ITU recommendations is implemented at some point. If you want to know more about this whole issue and read the paper that Eelco Grimm released, check this comprehensive article.

Loudness standards for video games.

Video games are everywhere: consoles, computers, phones, tablets, etc., so there is no clear standard to use. Having said that, some companies have established some guidelines. Sony, through their ASWG-R001 document, recommends the following:

-23 LUFS and -1 dBTP for PlayStation 3 and 4 games.
-18 LUFS and -1 dBTP for PS Vita games.
The maximum loudness range recommended is 20 LU.

But how do you measure the integrated loudness in a game? Integrated loudness was designed for linear media, so Sony’s document recommends taking measurements in 30-minute sessions that are a good representation of different sections of the game.

So, despite games being so diverse in platforms and contexts, using the EBU recommendations for consoles and PC (-23 LUFS) and a louder spec for mobile and portable games (-18 LUFS) would be a great starting point.

Conclusions and some plugins.

I hope you now have a solid foundation of knowledge on the subject. Things will keep changing, so if you read this in the future, assume some of this information is outdated. Nevertheless, you will hopefully have learned the concepts you need to work with loudness now and in the future.

If you want to test loudness, many DAWs (including Pro Tools) don’t have a built-in meter that can measure LUFS/LKFS but there are plugins to solve this. I recommend that you try both Waves WLM and Nugen VisLM. If you can’t afford a loudness plugin, you can try Youlean, which has a free version and is a great one to start with.

Thanks for reading!

November 30, 2018

Exploring Sound Design Tools: Sound Particles

November 30, 2018/ Javier Zumer

Sound Particles allows you to create soundscapes and sound design using virtual particles that can be associated with audio files. The results are then rendered using virtual microphones.

If you want to check it out or follow along with this review, you can download the demo here. It has all the features of the paid version but is limited to non-commercial projects.

I won’t explain how to use the software in depth but I will give an overview and show some practical uses for everyday work in sound design. If you want to get a more in-depth explanation, you can also watch this tutorial.

Sound Particles interface. Nice, clean and responsive.

Features Overview

The particles are the heart of the program. You can basically create them in three different ways:

A Particle Group will create any number of particles at the same time in an area or shape of your choice.
A Particle Emitter creates particles over time at a particular rate.
A single point source is just a single particle.

By default, particles are created as soon as you hit play, although you can also choose to change the start time to delay their creation. Generally, they last as long as the audio file attached to them.

You can choose the coordinates used to create your particles and also move the individual particles around the scene to create different effects. Particle emitters can also be moved. The movements that you can apply to the particles stack with each other, giving you an amazing amount of options to create motion. Keyframes can also be used to match any movement to a reference video.

See the video below for an example with the three types of particles:

So in the video you can see:

A particle group (red) that generates particles in a square shaped area. These particles are not created at the same time because we have also applied a random delay. They have fireworks sounds attached.
A particle emitter (orange) is moving in a circular motion while the particles that it creates also have some small random movement. They have magical sounds attached.
A single point source (pink) with my voice PaulStretched to infinity.

You can also apply audio modifiers to each particle group. These will randomize certain parameters so you obtain more interesting and varied results. If you think about this, this is similar to how audio works in the real world. Each time you take a step, your shoe makes a slightly different sound: pitch, level and timing will be different. Sound Particles lets you randomize the audio from each particle in a similar way. The audio modifiers are:

Gain: Basically, audio level.
Delay: This determines when the particle is created. It is very useful because usually you don’t want all particles in a group to be created at the start. In the example above, the red particles are being created with a random delay.
EQ: It applies different filters and bands of EQ to each particle so they don’t sound exactly the same.
Granular: This is kind of a special modifier. It slices the audio file and then plays each slice from a certain particle. You can control how long the slice is or even leave it random. You can also control if the slices are then played in sequence or in a random order.
Pitch: It applies a different pitch shifting value to each particle.

For any single parameter that requires randomization, you can choose different probability distributions to get the result that you want. A uniform distribution (all values have the same weight) and a normal distribution (most values will be around the mean) are probably the most useful ones. You can even create a custom distribution, which is pretty awesome.

Of course, once you have the particles ready, you need a virtual microphone to capture the result. In this area, the number of options is simply amazing. Not only can you place the microphone anywhere in the scene but you can also choose between many configurations including M/S, X/Y and all sorts of surround and ambisonic configurations.

If that wasn’t enough, you can also create several microphones in the same scene and render different stems per microphone. These stems can contain different combinations of particles so you can have more control later in the mix.

Finally, the project settings page allows you to control how Sound Particles is going to manage sound propagation and attenuation over distance. You can change the speed of sound, simulate the delay of far away sounds, change how much sounds attenuate with distance or whether your scene uses the Doppler effect.
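To get a feel for what these settings do, here is a sketch of the textbook physics behind them. These are the standard formulas for propagation delay, inverse-distance attenuation and Doppler shift; how Sound Particles implements them internally is not documented here, so treat this as an assumption-laden illustration with made-up function names.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

def propagation_delay(distance_m, speed=SPEED_OF_SOUND):
    """Seconds a sound needs to travel from the particle to the microphone."""
    return distance_m / speed

def distance_gain_db(distance_m, reference_m=1.0):
    """Inverse-distance law: about -6 dB for every doubling of distance."""
    return -20 * math.log10(distance_m / reference_m)

def doppler_frequency(source_hz, approach_speed, speed=SPEED_OF_SOUND):
    """Perceived frequency for a source moving straight at a still listener.

    A negative approach_speed means the source is moving away.
    """
    return source_hz * speed / (speed - approach_speed)

print(f"delay at 100 m: {propagation_delay(100) * 1000:.0f} ms")
print(f"gain at 8 m:    {distance_gain_db(8):.1f} dB")
print(f"440 Hz approaching at 30 m/s: {doppler_frequency(440, 30):.1f} Hz")
print(f"440 Hz receding at 30 m/s:    {doppler_frequency(440, -30):.1f} Hz")
```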

Microphone configurations can follow a variety of speaker setups

Sound design examples

Enough with the theory, let’s hear some real applications. Since Sound Particles is much easier to understand when you see the particles in movement, I decided to create a video for every example instead of just audio.

Battlefield soundscape

This is very simple but could be very useful if you need to create a soundscape and don’t want to move every single sound into place by hand. As you can see, it is very easy and quick to create a randomized soundscape. Something I feel is missing here is a bit more control over which sounds are triggered. When you have different types of sounds, it would be nice to be able to trigger some sounds only occasionally, in the same way you can do this in FMOD or Wwise.

It would also be helpful to be able to eliminate a particular particle that moves too close to the mic, or at least be able to prevent particles from getting too close without using complex custom distributions.

Scifi Interface

Now let’s imagine we are building a somewhat cheesy 80s computer interface with beeps and blops and some folders flying around the screen.

As you can see, we are using two particle systems at the same time. One of them (blue) creates all the beeps in a circle around the listener while the orange is a particle emitter that throws particles horizontally to simulate things flying by.

Playing with pitch

Let’s explore how we can use the pitch randomization feature to create new, complex sounds from simple ones. In this example, I first use a uniform distribution for a more detuned and unsettling effect. We can also use a discrete distribution so the jumps in pitch are strictly within certain semitones, obtaining a more musical result.
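The difference between the two distributions is easy to see in a few lines of code. This sketch draws one pitch offset per particle, first from a continuous uniform range and then from a discrete set of intervals, and converts each offset to a playback-rate ratio. The interval set, the spread and the function names are arbitrary choices for illustration and are not taken from Sound Particles.

```python
import random

def semitones_to_ratio(semitones):
    """Playback-rate ratio for a pitch shift expressed in semitones."""
    return 2 ** (semitones / 12)

def uniform_pitches(count, spread_semitones, rng):
    """Any value within the spread is equally likely: detuned, unsettling."""
    return [rng.uniform(-spread_semitones, spread_semitones) for _ in range(count)]

def discrete_pitches(count, allowed_semitones, rng):
    """Only the listed intervals can occur: more musical."""
    return [rng.choice(allowed_semitones) for _ in range(count)]

rng = random.Random(7)  # fixed seed so the example is repeatable
allowed = [-12, -5, 0, 7, 12]  # octaves, a fourth below and a fifth above
detuned = uniform_pitches(5, 2.0, rng)
musical = discrete_pitches(5, allowed, rng)

print("uniform :", [f"{semitones_to_ratio(p):.3f}" for p in detuned])
print("discrete:", [f"{semitones_to_ratio(p):.3f}" for p in musical])
```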

As you can see, just changing the distribution can produce very different results.

We can also automate pitch to create dynamic effects, for example making all the frequencies converge on a central one. The THX Deep Note was achieved with a similar method.

Granular synthesis

This modifier offers many sound design possibilities. You can see an example below of building some sort of alien speech sound step by step.

We can also obtain a “voices in my head” effect by slicing up some speech and distributing it around the listener. As you can see, we can always re-create the particles to obtain new variations, which is very handy for video game work.

Doppler Effect

There are many plugins that recreate a Doppler effect but this one for sure offers a unique visual approach. As you can see below, we can create a Doppler effect on a single particle or on many.

Conclusion

I hope you found this software interesting. I think it is a very good tool to have in your arsenal and I feel I have barely scratched the surface of the sonic possibilities that it offers. I believe there is an update coming soon for Sound Particles and I may have another look then and write a new post covering the new features.

You can also have a look at a couple of plugins that Nuno Fonseca, the creator of Sound Particles, has released. They allow you to use the Doppler and air absorption simulations that Sound Particles has but in a convenient plugin that you can use in your DAW.

October 27, 2018

Interview on La Bobina Sonora

October 27, 2018/ Javier Zumer

I have been interviewed on the site “La Bobina Sonora”, which is dedicated to the Spanish and Latin American audio community. I thought it would be interesting to translate the interview into English in case you want to have a look. There are some insights into my career history, the way I approach sound design and mixing and the projects I was working on at the time (October 2018). So, here we go!

LA BOBINA SONORA: Before starting with the interview, I just wanted to thank you for your presence here at labobinasonora.net.

JAVIER ZUMER: Thank you for the invitation, I’ve been reading the blog for years and I’m happy to be able to contribute.

LBS: You are currently based in Ireland, where you do most of your work. It’s interesting to ask, which are the main differences in the audio industry between Ireland and Spain?

JAVIER: The main difference is that Ireland is a country that enjoys a better economic situation. This brings more stability and specialization to the profession.

Having said that, Ireland is an interesting example because it shares some similarities with Spain. Both countries went under during the economic crisis (both with a property bubble). Also, both live in the shadow of other countries like the UK, France or the US, since these have a more mature and established industry.

LBS: How are audio professionals treated by the Irish industry? Do any kind of associations or unions exist?

JAVIER: Personally, my experience has been positive. Maybe sound doesn’t get as much love and attention as other departments (that’s kind of universal since we are visual creatures) but in my environment I usually have the time and resources needed to get the job done.

About associations, I am not aware of them but if they do exist they are probably based in Dublin since the industry is mostly located there. (I’m currently in Galway).

LBS: Those who work in this amazing profession usually share an appreciation for cinema, music and even other arts. What were the main reasons you ended up building sonic worlds? Maybe your experience in music production brought you there?

JAVIER: Like many other people, the thing that made me consider and appreciate sound was music. Reason was the first audio software that I used in depth and that was when I dropped out of college to study audio.

I still think Reason is a very unique starting point since its design imitates real hardware and it gave me my first notions of how the audio signal flows.

Later, I started to be more interested in audio for cinema and games. I think they offer a great balance of artistic and technical challenges.

LBS: At the start of your career you were getting some experience with music recording and mixing at Mundo Sinfónico. How do you think this time helped you in your career?

JAVIER: Mundo Sinfónico was my first professional audio experience. Héctor Pérez, who owns the place, was kind enough to let me join on some projects during recording and mixing.

During that time I learned a lot about using microphones, Pro Tools, and other software. It was pretty much like discovering how all these things are used in the real world and in real applications. At this time, I also started to learn how to approach a mix.

LBS: So, how were your first steps as a sound designer?

JAVIER: At some point, I knew I needed to invest in my own gear in order to work on projects and I had to make a decision. I could either invest in music recording or in location audio gear. I decided to go for the latter since building a studio would lock me into a specific location but I could do location audio anywhere. Also, by that point audio for cinema interested me as much as music production.

With this gear I did many, many short films, some documentaries and TV stuff. Naturally, I would also work on the audio post for some of these projects and this was the way I went into sound design and mixing.

LBS: Is there any specific moment in time when you feel you made a big leap forward on your career?

JAVIER: Maybe the way I got my current job. By that time, I was living in Galway, which is quite far away from Dublin (impossible to commute). Since all the industry is really in Dublin, this was an issue if I wanted to get work, but in those days I was just working on freelance projects here and there.

One day, I decided that it would be cool to find people in my city interested in going out to record sound effects. I sent some emails to local audio folks and one of them was Ciarán Ó Tuairisc, who was the head of sound for Telegael, a company that was super close, like a 5-minute drive from my place.

I went there to meet him and see the place and he gave me some episodes so I could do a sound design test. Some days later, I came back with the results and I was offered a job there. I was maybe expecting that they would consider me for freelance work at best but the whole thing was kind of a job interview where I was successful with no need for a CV or a tie.

LBS: What are your main goals when facing a sound design project? Which of them are essential to your workflow?

JAVIER: When doing sound design I like to first do a basic coverage pass. Just have a sound for every obvious thing without taking much time with each. Once this is done, the real job begins when you start thinking about how the sounds you already have work together and which ones are important enough to spend more time and thought on them.

LBS: When crafting a sonic world, which are the processes (artistic or technical) that deserve the most attention and detail?

JAVIER: The elements that drive the story forward definitely deserve the most attention. It is also very important to give detail to any element that helps with world building.

If the story takes place in a special place or there is a relevant object, it is important to think about how these should sound. Of course, ideally this should work on a subconscious level for the viewer.

LBS: Talking now about all the different processes that build a sonic world (dialogue editing, ambients/fx, foley, mixing…), which is the hardest for you and which one do you enjoy the most?

JAVIER: Probably foley is where I am the least comfortable. It is a true art that requires experience, coordination and sensitivity to get it right. I don’t have a lot of experience doing it and I am not into the physical part of the job although I know that that appeals to other people.

The process I enjoy the most is mixing since this is when all elements come together to create a cohesive whole that moves towards the same artistic direction.

LBS: Do you usually think about mixing when doing sound design? Do you use sub-mixes or pre-mixes on certain elements? Or do you prefer to start the mix completely from scratch?

JAVIER: It depends on the situation. When I’m just doing sound design I try to give the mixer as much control and as many options as possible so I don’t usually do sub-mixes, although sometimes they make sense.

If I’m mixing and also doing sound design I tend to pre-mix things as I go and even apply some EQ or compression here and there on elements that I know are going to need it. For this, clip effects in Pro Tools are great.

LBS: Talking about something omnipresent and unavoidable like technology, which gear do you usually use when doing editing, sound design and mixing?

JAVIER: I use a Pro Tools Ultimate rig with an S6 M10 desk. In terms of software, I use the usual stuff; most of my plugins are either from Avid or from Waves. For dialogue editing, iZotope RX is a must.

LBS: Which was the last technological discovery that improved your workflow the most?

JAVIER: Probably Soundly, although this wasn’t that recent. It is a library management software that maybe doesn’t offer as many features as Soundminer but I think it is a great option. It is more affordable (in the short term) and also offers online libraries that are kept updated and growing. It offers more than enough metadata capabilities and good integration with Pro Tools.

LBS: A big portion of your work is focused on an area that is maybe a little unknown for some of us but very important and clearly rising in relevance. How did you get into video game sound design?

JAVIER: I grew up playing games and this was always an area that interested me when I got into sound design.

One day I saw an ad for a crowdfunding campaign from a Spanish game, Unepic. They were looking for some money to record some voice acting and I emailed them asking whether they would also be interested in some help with sound design. I really had no idea how this kind of work would go and, surprisingly, they were interested and we started to work together.

Six years later, Unepic has sold more than half a million copies between consoles and PC, being the first Spanish indie game to get into Steam. It was a project that taught me a lot and I have kept working with its developer, Francisco Téllez de Meneses, and many others since.

LBS: What are the main differences between working on video game sound design and just working on traditional media?

JAVIER: The main difference is that traditional media is linear. Once you finish a mix, it is going to be the same for all viewers. The only differentiating factor would be the reproduction system, but the mix itself would be the same forever.

On the other hand, video games are interactive so there is no mix in the traditional sense. You just give the game engine every audio asset needed and the rules that will govern how these sounds are played. So the mix is created in real time as the player interacts with the world of the game.

The real power in video game sound design comes from the fact that you can connect audio tools with parameters and states within the game world. For example, imagine that the music and dialogue are connected to a low pass filter, a reverb and a delay and they change as your health gets lower. Or a game where you build weapons that wear out as you use them so their foley and FX become darker (via an EQ) and more distorted in the process.

I have an article on my blog with more information for someone who wants to start to do video game sound design.

LBS: Let’s talk about your work on field and SFX recording. We can find some interesting libraries on your website made by you, some of them dedicated to something you call “audio explorations”.

How important is field and SFX recording for you?

JAVIER: It’s something I consider very important because once you have access to the big libraries the industry uses, you realize that there are many sounds that are overused. Once you start to hear them, they are everywhere!

So, I think it is important to bring a more unique and personal approach to sound design. Also, when you record and create your own sound effects you force yourself to be more adventurous and to experiment with techniques and ideas.

LBS: How do you usually plan a field recording session? Are they done within the context of a larger project or do you plan free sessions just to experiment and play around?

JAVIER: This is something I’ve been thinking about for a long time. On one hand, when something specific is needed, I just go out to get it. But with time I have come to think that in those cases it is not very convenient to explore and record interesting stuff since you have deadlines and many other things to work on.

As a solution, I’ve been going on what I call “explorations”. I just pick a technique, prop, place or software and I try to create interesting stuff while trying to learn how it works. I’ve been blogging about them and also releasing free mini-libraries with the results.

LBS: Any particular piece of advice to keep in mind when doing field recording?

JAVIER: At the beginning of every take, always explain what you are doing with your own voice. Take videos and pictures if you can. I guarantee you won’t remember everything you were doing later when you are editing.

LBS: What kind of gear (recorder, microphones…) and techniques do you usually use when doing field recording?

JAVIER: Nothing too special or obscure. I use a Tascam HD-P2 that works great after seven years of use and is able to record at 192 kHz although it only has two pre-amps so sometimes I need other recorders as a reinforcement. The microphones I use are a 416, Oktava 012, Rode NT4, SM57, Sanken COS-11D and some more exotic mics from JRF (hydrophone, contact mic and a coil pick up).

LBS: Which project would you consider a highlight of your career in terms of technical or artistic merit?

JAVIER: Recently, I have worked on the sound design and a good portion of the mix for a documentary series about the lighthouses of Ireland that premiered on RTE (the Irish BBC).

It was a very interesting project with beautiful helicopter footage. I needed to recreate the audio for 200 minutes of aerial shots, so loads of waves, wind, storms, seagulls and things like that. I tried to give each location and lighthouse its own personality and sound. Some of them are really astonishing and true masterpieces of engineering while others are situated in amazing natural locations.

In summary, one of the most beautiful projects I have had the chance of working on.

LBS: Is there any cool anecdote in your almost decade as a professional that you would like to share?

JAVIER: While I was trying to remember an anecdote I thought I could share something that happens to me from time to time and I wonder if it’s something that other people experience too.

Sometimes, when I’m looking for a particular sound, I bring in some audio just by chance or even by mistake and it works great just like that. I guess that when you spend many hours editing audio, these things are going to just happen from time to time but it always feels like you were touched by the gods of sound design for a moment.

LBS: Is there any project in your near future?

JAVIER: I’m about to get immersed in Drop Dead Weird, a live action comedy about three Australian teenagers who move to Ireland, where their parents turn into zombies. I am mixing the show, which is a co-production between Channel 7 (Australia) and RTE (Ireland).

It’s a cool, crazy project with a lot of action and sound design and many people in each scene, which is always a challenge in terms of dialogue editing.

LBS: To wrap things up, any advice for someone who is mad enough to be interested in this beautiful profession?

JAVIER: When I look back at my career there is a pattern that repeats itself: I was able to make a leap forward when I was in the right place at the right time. The problem is that you never know when and where this is going to happen. For each of these moments of success I’ve had many more that were just unfruitful.

So the best way to go then is to be persistent and throw as many seeds to the air as possible while always improving as a professional. Something will bloom.

LBS: Thanks again for your time, Javier. Best of luck on your future projects which we will keep an eye on here on labobinasonora.net.

JAVIER: Thanks to you, Óscar, for having me. My pleasure.

September 22, 2018

Essential Bodyfalls: Sound Library Post-Mortem

September 22, 2018/ Javier Zumer

Essential Bodyfalls is the second library that I’ve published. This is a brief account of what I learned during the process of creating it alongside fellow sound designers Grace Canavan and Pearse O' Caoimh.

Where to record

At first, we considered recording outdoors, somewhere desolate and quiet, but the Irish weather quickly encouraged us to go another route. It would be very tough to find enough days when the three of us were free and the weather was decent, so we considered finding an indoor place. After some looking around, we found that Grace’s family had a house that was under construction and there was a room in there that we might be able to use.

The place was empty and echoey but fairly quiet and mostly to ourselves on the weekends so we decided to turn it into our improvised foley studio. We couldn’t do anything permanent to the room, so we did some research to find possible solutions that would be easy to remove afterwards.

We were able to get some help from the builders working on the house and we built a wooden frame and two foley pits. The idea was to apply a poor man’s room-within-a-room concept. The frame, which spanned two thirds of the room, was then covered with old blankets and duvets, creating both a dream-like blanket castle and hopefully a recording studio.

The result, despite the low-tech approach, was pretty decent acoustically. The room was now very dry although from a frequency balance perspective there were improvements to make. Firstly, the high frequency absorption was maybe too much so we removed some of the blankets to make the room a bit brighter.

This is how one of the corners looked with all the blankets on.

The biggest issue, as always with amateur acoustical work, was the low frequencies. We had some big resonance modes in several places. To solve this (or at least to try to), we built some DIY bass traps in the corners. We got an improvement but it wasn’t very dramatic. We decided to continue anyway, knowing that we might need to do some EQ work on the resulting sounds.

Props: Building dummies

Although the idea of using your own body to record is tempting, it may not be very practical from a medical point of view. We knew we had to build some kind of dummy that we could use as an action double. Something durable, heavy enough and of course realistic sounding.

We tried several things to try to create the correct weight and sound.

Mark 1 (Fat Tony): Our first approach was to use sandbags covered with clothes. A big one would be the torso plus two smaller cylindrical ones for limbs. The resulting dummy was heavy (maybe too heavy) and it sounded quite dull.

Mark 2 (Potato Man): A different approach was to stuff some old dungarees with a mix of potatoes and foam. The result was a brighter sound that maybe needed more weight.

Mark 3 (Punching Bag): This time we bought a punching bag and stuffed it with old clothes and foam. This one sounded kind of in the middle of the two previous ones: it had a good amount of weight to it but without being too dull.

We also used other smaller props, like toys and stuffed animals, to give the sounds more variability and to interact with the different materials and surfaces we had. In the end, the best results were achieved by combining two or more props in a single action. We were usually using two of our dummies at a time.

Our collection of dummies during some initial testing.

Surfaces & Materials

Although we considered some others, the final library ended up having body falls on: dirt, gravel, sand, concrete, metal, grass and wood.

We were able to find some of the materials on construction sites where builders were kind enough to let us grab a bucket full of different types of dirt, sand and gravel.

For the concrete, we just used the bare floor of the room since it had no carpet or tiles. For metal, we used different pieces that we found around. We had a solid one and then a more hollow sounding one.

The grass was recorded using combinations of dry grass and VHS tapes to achieve both short and tall grass. Finally, the wood falls were recorded on an old door and an abandoned pallet.

We used a piece of cloth to contain the materials and easily swap them when needed. Something we quickly discovered was that to get more interesting results, it’s a good idea to combine different materials. The dirt, for example, had a bit of gravel mixed in to enhance the crunchiness.

Here you can see our buckets + the cloths we were using to contain each material + the old wooden door on the left.

Recording sessions

Something we learned while working on this project was that at first we were being too ambitious. We were planning to record several falls from each of the dummies with three different intensities on each variation of every surface. This would have taken forever.

In the end, we decided to streamline the process, focusing on getting nice sounds for each of the surfaces regardless of the prop used and mixing up intensities. The best results were probably achieved when combining the dummies and using two of them at the same time.

Since we were a team of three, two would be recording while the third was editing and checking takes on a Pro Tools rig that we set up in another room. This way, we had quick feedback on what was working best.

After we had recorded enough falls on any given surface, we would record some isolated interactions with the material like drags, impacts, debris, etc. This proved to be essential in the editing phase.

The gear used was quite simple: a Sennheiser MKH416 and a Shure SM57 into my faithful Tascam HD-P2.

Editing, mixing & Mastering

This is probably one of the most gruelling steps of the process. We needed to process and combine hundreds of sounds to get to the final product. The approach we used was to have a master Pro Tools session with every single dummy and surface combination. We then did a selection of the best sounds from each of the takes.

We then created a new session per final bodyfall type where we combined all the different layers of sounds to achieve a nice range of intensities and complexity. In some cases, we could even use a dull, neutral fall recorded on concrete and for example add a gravel impact and debris to create a gravel body fall.

iZotope RX was used to clean up takes and EQ + compression was applied all around. We were also mindful of audio levels and we applied the same mastering process to all the final sounds so they have a comfortable level of loudness to work with.

Conclusions

In my opinion, the main lesson learned from this project was that it’s important to set a realistic goal and focus on getting that done to the best of your abilities instead of planning to do something too ambitious that you probably will never finish.

Another lesson was that sometimes it’s easier to just pay for something instead of spending a lot of time trying to get it for free. Every problem can be solved with either time or money and knowing when to use each is key if you want to get things done.

If you work on any library creation project, something that you should always keep in mind is that the editing and mixing process is tough and very time consuming. Try dividing it into smaller chunks or assigning different sections to different people to make it easier.

With all this work now behind us, we are very happy with the results and with how the library is doing. We are definitely looking forward to tackling new projects and applying all the lessons learned but in the meantime, you can check out the library here:

August 24, 2018

Figuring out: Gain Staging

August 24, 2018/ Javier Zumer

What is it?

Gain staging is all about managing the audio levels of different layers within an audio system. In other words, when you need to make something louder, good gain staging is knowing where in the signal chain would be best to do this.

I will focus this article on the realm of mix & post-production work in Pro Tools, since this is what I do daily, but these concepts can be applied in any other audio related situation like recording or live sound.

Pro Tools Signal Chain

To start with, let's have a look at the signal chain in Pro Tools:

Knowing and understanding this chain is very important when setting your session up for mixing. Note that other DAWs vary in their signal chain. Cubase, for example, offers pre and post-fader inserts while on Pro Tools every insert is always pre-fader except for the ones on the master channel.

Also, I've added a Sub Mix Bus (an auxiliary) at the end of the chain because this is how mixing templates are usually set up and it is important to keep it in mind when thinking about signal flow.

So, let's dive into each of the elements of the chain and see their use and how they interact with each other.

Clip gain & Inserts

As I was saying, on Pro Tools, inserts are pre-fader. It doesn't matter how much you lower your track's volume, the audio clip is always hitting the plugins with its "original" level. This makes clip gain very handy since we can use it to control the clip levels before they hit the insert chain.

You can use clip gain to make sure you don't saturate your first insert input and to keep the level consistent between different clips on the same track. This last use is especially important when audio is going through a compressor since you want roughly the same amount of signal being compressed across all the different clips on a given channel.
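To put some numbers on this, here is a minimal Python sketch. It is not Pro Tools code: the function names, the -18 dB threshold and the 4:1 ratio are assumptions chosen only for illustration, and the compressor is a simple static hard-knee model. It shows that the fader has no effect on what the insert receives, and that clip gain is what evens out how hard each clip hits the compressor.

```python
def level_at_insert_db(clip_level_db, clip_gain_db, fader_db):
    """Level reaching the first insert. Inserts are pre-fader, so fader_db is ignored."""
    return clip_level_db + clip_gain_db


def gain_reduction_db(input_db, threshold_db=-18.0, ratio=4.0):
    """Static gain reduction of a hard-knee compressor (illustrative values)."""
    over = input_db - threshold_db
    if over <= 0:
        return 0.0
    return over - over / ratio


# Two clips on the same track with very different levels (hypothetical values).
loud_clip_db, quiet_clip_db = -10.0, -24.0

# Without clip gain, only the loud clip gets compressed.
print(gain_reduction_db(level_at_insert_db(loud_clip_db, 0.0, fader_db=-20.0)))   # 6.0
print(gain_reduction_db(level_at_insert_db(quiet_clip_db, 0.0, fader_db=-20.0)))  # 0.0

# With clip gain bringing both clips to -12 dB, the compressor treats them alike.
print(gain_reduction_db(level_at_insert_db(loud_clip_db, -2.0, fader_db=-20.0)))   # 4.5
print(gain_reduction_db(level_at_insert_db(quiet_clip_db, 12.0, fader_db=-20.0)))  # 4.5
```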

So what if you want a post-fader insert? As I said, you can't directly change an insert to post-fader but there is a workaround. If you want to affect the signal after the track's volume, you can always route that track or tracks to an auxiliary and have the inserts on that aux. In this case, these inserts would be post-fader from the audio channel's perspective but don't forget they are still pre-fader from the aux channel's own perspective.

Signal flow within the insert chain

Since the audio signal flows from the first to the last insert, when choosing the order of these plugins it is always important to think about whatever goal you want to achieve. Should you EQ first? Compress first? What if you want a flanger, should it be at the end of the chain or maybe at the beginning?

I don't think there is a definitive answer and, as I was saying, the key is to think about the goal you have in mind and whichever way makes conceptual sense to your brain. EQ and compression order is a classic example of this.

The way I usually work is that I use EQ first to reduce any annoying or problematic frequencies, having also a high pass filter most of the time to remove unnecessary low end. Once this is done, I use the compressor to control the dynamic range as desired. The idea behind this approach is that the compressor is only going to work with the desired part of the signal.

I sometimes add a second EQ after the compressor for further enhancements, usually boosting frequencies if needed. Any other special effects, like a flanger or a vocoder would go last on the chain.

Please note that, if you use the new Pro Tools clip effects (which I do use), these are applied to the clip before the fader and before the inserts.

Channel Fader

After the insert chain, the signal goes through the channel fader or track volume. This is where you usually do most of the automation and levelling work. A good gain staging job makes working with the fader much easier. You want to be working close to unity, that is, close to 0 dB.

This means that, after clip gain, clip effects and all inserts, you want the signal to be at your target level when the fader is hovering around 0. Why? This is where you have the most control, headroom and comfort. If you look closely at the fader you'll notice it has a logarithmic scale. A small movement next to unity would amount to 1 or 2 dB but the same movement further down could be a 10 dB change. Mixing close to unity makes subtle and precise fader movements easy and comfortable.
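To get a feel for how different those changes are, the standard decibel-to-amplitude conversion can be sketched in a few lines of Python. The function name is only illustrative and the actual fader taper of any given console or DAW is not modelled here, just the size of the level change itself.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


for change_db in (1, 2, 10):
    print(f"{change_db:>2} dB change -> amplitude x{db_to_gain(change_db):.2f}")
```

A 1 or 2 dB move scales the amplitude by roughly 1.12 or 1.26, while a 10 dB move more than triples it, which is why the same physical gesture is far less precise low down on the fader.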

Sends

Pro Tools sends are post-fader by default and this is the behaviour you would usually want most of the time. Sending audio to a reverb or delay is probably the most common use for a send since you want to keep 100% of the dry signal and just add some wet processed signal that will change in level as the dry also changes.

Pre-fader sends are mostly useful for recording and live mixing (sending a headphone mix is a usual example) and I don't find myself using them much in post. Nevertheless, a possible use in a post-production context could be when you want to work with 100% of the wet signal regardless of how much of the dry signal is coming through. Examples of this could be special effects and/or very distant or echoey reverbs where you don't want to keep much of the original dry signal.
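The difference between the two send types can be sketched as simple dB arithmetic. This is a simplified model with hypothetical names and values, not a description of how Pro Tools computes anything internally: a post-fader send follows the channel fader, while a pre-fader send ignores it.

```python
def send_level_db(source_db, fader_db, send_db, pre_fader=False):
    """Level arriving at the send destination (for example a reverb aux), in dB."""
    if pre_fader:
        # Tapped before the fader: the fader position is irrelevant.
        return source_db + send_db
    # Tapped after the fader: the wet feed follows the dry level.
    return source_db + fader_db + send_db


MINUS_INF = float("-inf")  # fader pulled all the way down

# Post-fader: lowering the dry signal also lowers what reaches the reverb.
print(send_level_db(-12.0, 0.0, -6.0))         # -18.0
print(send_level_db(-12.0, -6.0, -6.0))        # -24.0
print(send_level_db(-12.0, MINUS_INF, -6.0))   # -inf, no dry and no wet

# Pre-fader: the reverb still gets fed with the fader fully down (100% wet).
print(send_level_db(-12.0, MINUS_INF, -6.0, pre_fader=True))  # -18.0
```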

Channel Trim

Trim effectively gives you two volume lanes per track. Why would this be useful? I use trim when I already have an automation curve that I want to keep but I just want to make the whole thing louder or quieter in a dynamic way. Once you finish a trim pass, both curves coalesce into one. This is the default behaviour but you can change it on Preferences > Mixing > Automation.
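Conceptually, coalescing can be thought of as adding the trim lane to the volume lane in dB and then resetting the trim lane. The Python sketch below is only a simplified model under that assumption: both lanes are hypothetical lists sampled at the same points in time, and the function name is made up for illustration.

```python
def coalesce(volume_db, trim_db):
    """Fold a trim lane into a volume lane; both are dB values at the same time points."""
    new_volume = [v + t for v, t in zip(volume_db, trim_db)]
    flat_trim = [0.0] * len(trim_db)  # trim goes back to unity, ready for the next pass
    return new_volume, flat_trim


volume = [-6.0, -3.0, 0.0]   # existing automation curve
trim = [2.0, 2.0, -1.0]      # offsets written during a trim pass

print(coalesce(volume, trim))  # ([-4.0, -1.0, -1.0], [0.0, 0.0, 0.0])
```

The shape of the original curve is preserved; the trim pass only offsets it.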

VCAs

VCAs are a concept that comes from analogue consoles (Voltage Controlled Amplifier) and they allow you to control the level of several tracks with a single fader. They used to do this by controlling the voltage reaching each channel but on Pro Tools, VCAs are a special type of track that doesn't have audio, inserts, inputs or outputs. VCA tracks just have a volume lane that can be used to control the volume of any group of tracks.

So, VCAs are something that you usually use when you want to control the overall level of a section of the mix as a whole, like the dialogue or sound effects tracks. In terms of signal flow, VCAs are just changing a track level via the track's fader so you may say they just act as a third fader (the second being trim).

Why is this better than just routing the same tracks to an auxiliary and changing the volume there? Auxiliaries are also useful, as you will see in the next section, but if the goal is just level control, VCAs have a few advantages:

Coalescing: After every pass, you are able to coalesce your automation, changing the target tracks levels and leaving your VCA track flat and ready for your next pass.
More information: When using an auxiliary instead of a VCA track, there is no way to know if a child track is being affected by it. If you accidentally move that aux fader you may go crazy trying to figure out why your dialogue tracks are all slightly lower (true story). On the other hand, VCAs show you a blue outline (see picture below) with the real affected volume lane that would result after coalescing both lanes so you can always see how a VCA is affecting a track.
Post fader workflow: Another problem with using an auxiliary to control the volume of a group of tracks is that if you have post-fader sends on those tracks, you will still send that audio away regardless of the parent auxiliary's level. This is because you are sending that audio away before you send it to the auxiliary. VCAs avoid this problem by directly affecting the child track volume and thus also affecting how much is sent post-fader.
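The post-fader send issue from the last point can be sketched with a bit of dB arithmetic. This is a simplified model with made-up names and levels, not Pro Tools code: lowering an auxiliary only affects the dry path, while lowering a VCA moves the child track's effective fader and so the post-fader send follows.

```python
def with_aux(track_db, track_fader_db, send_db, aux_fader_db):
    """Group level controlled on an auxiliary that the track is routed to."""
    dry = track_db + track_fader_db + aux_fader_db
    wet_feed = track_db + track_fader_db + send_db  # the send leaves before the aux
    return dry, wet_feed


def with_vca(track_db, track_fader_db, send_db, vca_db):
    """Group level controlled by a VCA, which offsets the track's own fader."""
    effective_fader_db = track_fader_db + vca_db
    dry = track_db + effective_fader_db
    wet_feed = track_db + effective_fader_db + send_db  # the send follows the VCA
    return dry, wet_feed


# Pull the group down by 6 dB in both setups (hypothetical levels).
print(with_aux(-12.0, 0.0, -10.0, aux_fader_db=-6.0))  # (-18.0, -22.0) reverb feed unchanged
print(with_vca(-12.0, 0.0, -10.0, vca_db=-6.0))        # (-18.0, -28.0) reverb feed drops too
```

With the auxiliary, the dry signal gets quieter while the reverb feed stays where it was, so the balance between dry and wet shifts. With the VCA, both move together.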

Sub Mix buses

This is the final step of the signal chain. After all inserts, faders, trim and VCA, the resulting audio signals can be routed directly to your output or you may also consider using a sub mixing bus instead. This is an auxiliary track that sums all the signals from a specific group of channels (like Dialogue tracks) and allows you to control and process each sub mix as a whole.

These are the type of auxiliary tracks that I was talking about in the VCA section. They may not be ideal to control the levels of a sub mix, but they are useful when you want to process a group of tracks with the same plugins or when you need to print different stems.

An issue you may run into when using them is that you find yourself "fighting" for a sound to be loud enough. You feel that pushing the fader more and more doesn't really help and you barely hear the difference. When this happens, you've probably run out of headroom. Pushing the volume doesn't seem to help because a compressor or limiter further down the signal chain (that is, acting as a post-fader insert) is squashing the signal.

When this happens, you need to go back and give yourself more headroom by making sure you are not over compressing or by lowering every track volume until you are working at a manageable level. Ideally, you should be metering your mix from the start so you know where you are in terms of loudness. If you mix to any loudness standard like EBU R128, that should give you a nice and comfortable amount of headroom.
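The "fighting the fader" situation can be illustrated with a very rough Python sketch. It assumes an idealised brickwall limiter on the sub mix bus with a hypothetical -2 dB ceiling; real limiters and compressors behave in a more nuanced way, and all names and values here are made up for illustration.

```python
def limiter_out_db(input_db, ceiling_db=-2.0):
    """Idealised brickwall limiter: nothing gets past the ceiling."""
    return min(input_db, ceiling_db)


def bus_output_db(source_db, fader_db, ceiling_db=-2.0):
    """Channel fader feeding a sub mix bus that has a limiter inserted."""
    return limiter_out_db(source_db + fader_db, ceiling_db)


# Pushing the fader on a source that is already close to the ceiling.
for fader_db in (0.0, 3.0, 6.0, 9.0):
    print(f"fader {fader_db:+.0f} dB -> bus output {bus_output_db(-4.0, fader_db)} dB")

# Lowering the source by 10 dB gives the fader room to work again.
for fader_db in (0.0, 3.0, 6.0):
    print(f"fader {fader_db:+.0f} dB -> bus output {bus_output_db(-14.0, fader_db)} dB")
```

In the first loop the output is stuck at the ceiling from +3 dB onwards, so further fader moves do nothing audible in level. In the second loop every fader move translates into a real level change.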

Final Thoughts

Essentially, mixing is about making things louder or quieter to serve the story that is being told. As you can see, it is important to know where in the audio chain the best place to do this is. If you keep your chain in order, from clip gain to the sub mix buses, making sure levels are optimal every step of the way, you'll be in control and have a better idea of where to act when issues arise. Happy Mixing.
