Figuring out: Measuring Loudness
/How loud is too loud?
There are many loudness standards nowadays and many types of media and platforms so making sure audio is on the correct level everywhere can be tricky. In this post, I’m going to talk about the history of measuring loudness and the current standards that we use nowadays.
The analogue days
The first step to measure loudness is to define and understand the fundamental nature of the decibel. Luckily, I wrote a post last year about this very subject so you may want to check that before diving into loudness.
So, now that you are accustomed with the dB, let’s think about how we can best use it to measure how loud audio signals are.
In the analogue days, reading audio levels always implied measuring voltage or power in a signal and comparing it to a reference value. When trying to determine how loud an audio signal is, we can just measure these values across time but the problem is that levels are usually changing constantly. So how do we best represent the overall level?
A possible approach would be to just measure the highest value. This method of measuring loudness is called Peak and is handy when we want to make sure we are not working with levels above the system capacity to make sure our signals are not saturated. But in terms of measuring the general level of a piece of audio, this approach can be very deceiving. For example, a very quiet signal with a sudden loud transient would register as loud despite being quiet as a whole.
As you are probably thinking, a much better method would be to measure an average value across a certain time window instead of the instant reading that peak meters provide. This is usually called RMS (root mean square) metering and it is much closer to how we humans perceive loudness.
Let’s have a look at some of the meters that were created:
VU (Volume Unit) meters are probably the most used meters in analogue equipment. They were designed in the 1940s to measure voltage with a response time similar to how we naturally hear. The method is surprisingly simple: the needle’s own weight slows down its movement by around 300 ms on both the attack and the release so very sudden changes would be soften. The time that the meter needs to start moving is usually called the integration time. You will also hear the term “ballistics” to define these response times.
The PPM (peak programme meter) is a different type of meter that was widely used in the UK and Scandinavia since the 1930s. Unlike the the VU meter, PPM uses very short attack integration times (around 10ms for type II and 4ms for type I) while using relatively long times for the release (around 1.5 seconds for a 20dB fall). Since these integration times are very short, they were often consider quasi-peak meters. The long release time helped engineers see peaks for a longer time and get a feel of the overall levels of a programme since levels would fall slowly after a loud section.
The Dorrough Loudness Meter is also worth mentioning. It combines a RMS and a peak meter in one unit and was very common in the 90s. We will see that combining a RMS and peak meter in a single unit was going to be a trend that will carry on until today.
The dawn of Digital Audio
As digital audio started to become the new industry standard, new ways to measure audio levels needed to be adopted. But how do we define how much 0 is in the digital realm? In analogue audio, the value we assign to 0 is usually some meaningful measure that help us avoid saturating the audio chain. These values used to be measured in volts or watts and would vary depending on the context and type of gear. For example, for studio equipment in the US, 0VU corresponds with +4 dBu (1.228 V) while europe’s 0VU is +6 dBu (1.55 V). Consumer equipment uses -10dBV (0.3162V) as their 0VU. As you can see, the meaning of 0VU is very context dependant.
In the case of digital audio, 0dB is simply defined as the loudest level that flows through the converters before clipping, this is, before the waveform is deformed and saturation is introduced. We call this definition of the decibel, dBFS (Decibel Full Scale). How digital audio levels correspond with analogue levels depends on how your converters are calibrated but usually 0VU is equated to around -20dBFS on studio equipment.
The platonic loudness standard
Since dBFS is only a scale in the digital world, we still need to find a way to measure loudness in a human friendly way within digital audio. As we have seen, this is usually accomplished by averaging audio levels across a certain time window. On the other hand, digital audio also needs precision when measuring peaks if we want to avoid saturation when converting audio between analogue and digital and viceversa.
Something else that we need to take into consideration for our standard is the fact that we are not sensitive to all frequencies in the same proportion as the Fletcher–Munson curves show. As you can see, we are not very sensitive to low or very high frequencies, if we want our audio levels to be accurate, this is something that needs to be accounted for.
So, I have laid out everything that we need our loudness standard to have. Does such thing exist?
The ITU BS.1770 standard
This document was presented by the ITU (International Telecommunications Union) in 2006 and fits all the required criteria we were looking for. The ITU BS.1770 is really a collection of technologies and protocols designed to measure loudness accurately in a digital environment. It is really a set recommendations, we could say.
Four revisions have been released at the time of this writing plus the ITU BS.1771 which also expands on the same ideas. For simplicity, I will refer to all of these documents as simply the ITU BS.1770 or just ITU.
The loudness unit defined by the ITU is the LKFS, which stands for “Loudness K-weighted Full scale”. This unit combines a weighting curve (named “K”) to account for frequence sensitivity along with an averaged or RMS measurement that uses a 400 ms time window. The ITU also defines a “true peak” meter as a peak meter that uses oversampling for greater accuracy.
Once the ITU released their recommendations, each region used it as the foundation for their own standards. As the ITU released new updates each region would incorporate some of these ideas while expanding on them. Let´s see some regional standards.
EBU R128, Time Windows & Gates
This is the standard in use in Europe and it is released by the EBU (European Broadcast Union).
Before I continue, a clarification. The EBU names the loudness unit LUFS (Loudness units relative to full scale) instead of LKFS as the former complies better with scientific naming conventions. So if you see LUFS, keep in mind that this is pretty much the same as LKFS. On the other hand you will also see LU (Loudness Units). This is simply a relative unit that is used when comparing two LUFS or two LKFS values.
In the R128 standard, four different times windows are defined. This is based on the ITU BS.1771 recommendation. A meter needs to have all these plus some other features (see below) to be considered capable of operating in “EBU Mode”.
True-Peak: Almost instantaneous window with sub-sample accuracy.
Momentary: 400 ms window. Useful to get an idea of how loud a particular sound is. Plugins usually offer different scale options.
Short Term: 3 seconds window. Gives a good feel of how loud a particular section is.
Integrated or Programme:. Indicates how loud the whole programme is in its whole length. Sometimes it’s also called “Long Term”
Why so many different time windows? In my opinion, they are useful when working on a mix since they tell you information at different levels of resolution. True-peak tells you wether you would saturate the converters and it is good practice to always keep some headroom here. The momentary measurement is more or less similar to what VU meters would indicate, and gives you information on a particular short section. I personally don’t really look at the momentary meter much because any mix with a decent amount of dynamic range is going to fluctuate here quite a bit. Nevertheless it is useful to make sure that the mix is not very far away from the target levels on some specific sections.
Short term maybe a better tool to get a solid feel of how loud a scene is. This measurement is going to fluctuate but not as much as the momentary value. In order to get a mix within the standards, you need to make sure the short term value is usually around the target level, but you don´t need to be super accurate with this. What I try to do is make a compromise between the level that feels right and my target level and when in doubt, I favor what it feels right.
Finally, the integrated or long term value has a time window with the size of the whole show. This is the value that is going to tell you the overall level and measuring it in a faithful way is tricky as you will see below.
So, I was mentioning “target levels”. Which levels? The EBU standard recommends audio to be at -23 LUFS ±0.5 LU (±1 LU for live programmes). We are talking here about the integrated measurement, so the level for the entire show. Additionally, the maximum true peak value allowed is -1 dBTP. And that would be pretty much it, although there is one more issue as I was saying. Measuring levels throughout a long length of time in a consistent way comes with some challenges.
This is because there is usually a main element that we want to make sure is always easy to hear (usually dialogue or narration) and since audio volume is logarithmic, that main element would pretty much carry 90% of the show’s loudness weight. So we would naturally mix this element to already be at the desired loudness or slightly below. The problem comes when considering all the other elements around the dialogue. If there are too many quiet moments, that it’s going to make our integrated levels quite low, since everything is averaged.
The solution would be to either push the level of the whole show or re-mix the level of the dialogue louder so the integrated value is correct. Either way that would probably make the dialogue too loud and we would also risk saturating the peak meter. Not ideal.
In order to fix this the R128 uses the recommendations from the revisioned ITU BS.1770-3. Integrated loudness is calculated using a relative gate method that effectively pauses the measurement when levels drop below a threshold of -10 LU relative to an un-gated measurement. There is also an absolute gate at -70 LUFS, nothing below this value would be consider for the measurement. These gates help us getting a more meaningful result since only the relevant audio in the foreground will be considered when measuring the integrated time.
The last concept I wanted to mention is loudness range or LRA. This is measured in LU and indicates how much the overall levels change throughout the programme, in a macroscopic view. You can think of this as an indication of the dynamic range of your mix: low values would indicate that the mix has a very constant level while higher values would appear when there is a larger difference between quiet and loud moments. The EBU doesn’t recommend any given target value for the loudness range since this would depend on the nature of the show but it is for sure a nice tool to have to get an idea of your overall mix dynamics.
ATSC A/85
This is the standard used in the US and is released by the ATSC (Advanced Television Systems Commitee). It uses LFKS units (remember that LKFS and LUFS are virtually equivalent) and similar time windows to the europeans. The recommended integrated value is -24 LKFS while the maximum peak value allowed is -2 dBTP.
When the first version was released in 2009, this standard recommended a different method when when calculating the integrated value. As you know, the EBU system uses a relative gate in order to only consider foreground audio for its measurements but the ATSC took a different approach. Remember when I was saying before that mixes usually have some main element (often dialogue) that forms the center of the mix?
The ATSC called this main element an “anchor”. Since dialogue is usually this anchor, the system used an algorithm to detect speech and would only consider that to calculate the integrated level. I’ve done some tests with both Waves WLM and Nugen VisLM and the algorithm works pretty well, the integrated value doesn’t even budge when you are just monitoring non-dialogue content although singing usually confuses it.
In fact, on the 2011 update, the ATSC standard started to differentiating between regular programmes and commercials. Dialogue based gating would be used for the former while the all elements in the entire mix would be consider for the latter. This was actually one the main goals of the ITU standard initially: to avoid commercials being excessively loud in comparison to the programmes themselves.
Nevertheless, the ATSC updated the standard again in 2013 to follow the ITU BS.1770-3 directives and from then on all content would be measured using the same two gated method Europe uses. Because of this, I was tempted to just avoid mentioning all this ATSC history mess but I thought it was important to explain it, so yo can understand why some loudness plugins offer so many different ATSC options.
TV Regional and National Standards
Now that we have a good idea of all the different characteristics standards use, let’s see how they compare.
Country / Region | Standard | Units Used | Integrated Level | True Peak | Weighting | Integrated level method |
---|---|---|---|---|---|---|
Europe | EBU R128 | LUFS | -23 LUFS | -1 dBTP | K | Relative Gate |
US | ATSC A/85 post 2013 | LKFS | -24 LKFS | -2 dBTP | K | Relative Gate |
US | ATSC A/85 pre 2013 (Commercials) | LKFS | -24 LKFS | -2 dBTP | K | All elements are considered |
US | ATSC A/85 pre 2013 (Programmes) | LKFS | -24 LKFS | -2 dBTP | K | Dialogue Detection |
Japan | TR-B32 | LUFS | -24 LUFS | -2 dBTP | K | Relative Gate |
Australia | OP-59 | LKFS | -24 LKFS | -2 dBTP | K | Relative Gate |
As you can see, currently, there are only small differences between them.
Loudness for Digital Platforms
I have tried to find the specifications for some of the most used digital platforms but I was only able to find the latest Netflix specs. Hulu, Amazon and HBO don’t specify their requirements or at least not publicly. If you need to deliver a mix to these platform, make sure they send you their desired specs. In any case, using the latest EBU or ATSC recommendations is probably a good starting point.
In the case of Netflix, their specs are very curious. They ask for a integrated level of -27 LKFS and a maximum true peak of -2 dBTP. The method to measure the integrated level would be dialogue detection, like the ATSC used to recommend, which in a way is a step back. Why would Netflix recommend this if the ATSC spec moved on to gated based measurements? Netflix basically says that when using the gated method, mixes with a large dynamic range tend to leave dialogue too low so they propose a return to the dialogue detection algorithm.
The thing is, this algorithm is old and can be inaccurate so this decision was controversial. A new, modern and more robust algorithm could be a possible solution for this high dynamic range mixes. Also, -27 LKFS may sound too low but it wasn’t chosen arbitrarily but based on the fact that that was the level where dialogue would usually end up on these mixes. If you want to know more about this, you can check this, this and this article.
Loudness for Theatrical Releases
The case of cinema is very different from broadcast for a very simple reason: you can expect a certain homogeneity in the reproduction systems that you won’t find in home setups. For this reason there is no hard loudness standard that you have to follow.
Dolby Scale | SPL (dBC) |
---|---|
7 | 85 |
6.5 | 83.33 |
6 | 81.66 |
5.5 | 80 |
5 | 78.33 |
4.5 | 76.66 |
4 | 75 |
3.5 | 65 |
This lack of general standard has resulted in a similar loudness war to the one in the music mixing world. The result are lower dynamic ranges and many complains about cinemas being too loud. Shouldn’t cinema mixes offer a bigger dynamic range experience than TV? How are these levels determined?
Cinema screens have a Dolby box where the projectionist would set the general level. These levels are determined by the Dolby Scale and correspond to SPL measures under a C curve when using the “Dolby noise”. Remember that, in the broadcast world, the K curve is used instead which doesn’t help things when trying to translate between both.
Nowadays more and more cinemas are automated. This means that levels are set via software or even remotely. At first, all cinemas were using level 7, which is the one recommended by Dolby but as movies were getting louder and people complained, projectionists would start to use lower levels. 6, 5 and even 4.5 are used regularly. In turn, mixers started to work in those levels too which resulted in louder mixes overall in order to get the same feel. This, again, made cinemas lower their levels even more.
You see where this is going. To give you an idea, Eelco Grimm together with Michel Schöpping analyzed 24 movies available at dutch cinemas and found out levels that would vary wildly. The integrated level went from -38 LUFS to -20 LUFS, with the maximum Short-term level varying from -29 LUFS to -8 LUFS and the maximum True-Peak level varying from -7 to +3.5 dBTP. Dialogue levels varied from -41 to -25 LUFS. That’s quite a big difference, imagine if that would be the case in broadcast.
The thing is that despite these numbers being very different, we have to remember that all these movies probably were played at different levels on the dolby scale. Eelco says on his analysis:
The average playback level for movies mastered at '7' is -28 LUFS (-29 to -25).
The average playback level for movies mastered at '6.3' is -23 LUFS (-25 to -21). They are projected 3 dB softer, so if we corrected the average to a '7' level, it would be -26 LUFS.
The average playback level for movies mastered at '5' is -20 LUFS (all were -20). They are projected 7 dB softer, so the corrected average would be -27 LUFS.
So, as you can see, at the end dialogue level is equivalent to about -27 LUFS in all cases, the only difference is that the movies that were mixed at 7 (which is the recommended level) would have greater dynamic range, something important to be able to give a cinematic feel that TV can’t provide. The situation is quite unstable and I hope a solid solution based in the ITU recommendations is implemented at some point. If you want to know more about all this issue and read the paper that Eelco Grimm released, check this comprehensive article.
Loudness standards for video games.
Video games are everywhere: consoles, computers, phones, tablets, etc, so there is no clear standard to use. Having said that, some companies have stablished some guidelines. Sony, through their ASWG-R001 document recommends the following:
-23 LUFS and -1dBTP for Playstations 3 and 4 games.
-18 LUFS and -1dBTP for PSVita games.
The maximum loudness range recommended is 20 LU.
But how do you measure the integrated loudness in a game? Integrated loudness was designed for linear media so Sony’s document recommends to make measurements in 30 minutes sessions that are a good representation of different sections of the game.
So, despite games being so diverse in platforms and contexts using the EBU recommendations for consoles and PC (-23 LUFS) and a louder spec for mobile and portable games (-18 LUFS) would be a great starting point.
Conclusions and some plugins.
I hope you now have a solid foundation of knowledge for the subject. Things will keep changing so if your read this in the future, assume some of this information is outdated. Nevertheless, you would have hopefully learned the concepts you need to work with loudness now and in the future.
If you want to test loudness, many DAWS (including Pro Tools) don’t have a built-in meter that can measure LUFS/LKFS but there are plugins to solve this. I recommend that you try both Waves WLM and Nugen VisLM. If you can’t afford a loudness plugin, you can try Youlean, which has a free version and is a great one to start with.
Thanks for reading!