Figuring out: Measuring Loudness

How loud is too loud?

There are many loudness standards nowadays, and many types of media and platforms, so making sure audio is at the correct level everywhere can be tricky. In this post, I’m going to talk about the history of measuring loudness and the standards we currently use.

The analogue days

The first step in measuring loudness is to define and understand the fundamental nature of the decibel. Luckily, I wrote a post last year about this very subject, so you may want to check that out before diving into loudness.

So, now that you are familiar with the dB, let’s think about how we can best use it to measure how loud audio signals are.

In the analogue days, reading audio levels always meant measuring the voltage or power of a signal and comparing it to a reference value. When trying to determine how loud an audio signal is, we could just measure these values over time, but the problem is that levels are usually changing constantly. So how do we best represent the overall level?

A possible approach would be to just measure the highest value. This method of measuring is called peak metering and is handy when we want to make sure we are not working above the system’s capacity, so that our signals are not saturated. But in terms of measuring the general level of a piece of audio, this approach can be very deceiving. For example, a very quiet signal with a sudden loud transient would register as loud despite being quiet as a whole.

As you are probably thinking, a much better method would be to measure an average value across a certain time window instead of the instant reading that peak meters provide. This is usually called RMS (root mean square) metering and it is much closer to how we humans perceive loudness.
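To make the difference concrete, here is a small Python sketch (illustrative only, with samples normalised so that ±1.0 is full scale) comparing the two readings on a quiet signal that contains one loud transient:

```python
import math

def peak_db(samples):
    """Highest absolute sample value, in dBFS (1.0 = full scale)."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    """Root mean square of the samples, in dBFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square))

# A quiet signal with one loud transient: the peak reading says "loud",
# while the RMS reading reflects how quiet the signal is overall.
signal = [0.01] * 999 + [1.0]
print(round(peak_db(signal), 1))  # 0.0 dBFS
print(round(rms_db(signal), 1))   # around -30 dBFS
```

The peak meter sees only the single full-scale sample; the RMS value is dominated by the 999 quiet samples, which is much closer to how the signal actually sounds.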

Let’s have a look at some of the meters that were created:

Real audio signal (grey) and how a VU meter would interpret it (black).

VU (volume unit) meters are probably the most used meters in analogue equipment. They were designed in the 1940s to measure voltage with a response time similar to how we naturally hear. The method is surprisingly simple: the needle’s own weight slows its movement down by around 300 ms on both the attack and the release, so very sudden changes are softened. The time the meter needs to start moving is usually called the integration time. You will also hear the term “ballistics” for these response times.

The PPM (peak programme meter) is a different type of meter that has been widely used in the UK and Scandinavia since the 1930s. Unlike the VU meter, the PPM uses very short attack integration times (around 10 ms for type II and 4 ms for type I) with relatively long release times (around 1.5 seconds for a 20 dB fall). Since these integration times are very short, PPMs are often considered quasi-peak meters. The long release time helped engineers see peaks for longer and get a feel for the overall levels of a programme, since levels would fall slowly after a loud section.

The Dorrough Loudness Meter is also worth mentioning. It combines an RMS and a peak meter in one unit and was very common in the 90s. As we will see, combining an RMS and a peak meter in a single unit was a trend that carries on to this day.

VU meter.

PPM

The dawn of Digital Audio

As digital audio started to become the new industry standard, new ways to measure audio levels needed to be adopted. But how do we define what 0 means in the digital realm? In analogue audio, the value we assign to 0 is usually some meaningful measure that helps us avoid saturating the audio chain. These values used to be measured in volts or watts and would vary depending on the context and type of gear. For example, for studio equipment in the US, 0 VU corresponds to +4 dBu (1.228 V), while Europe’s 0 VU is +6 dBu (1.55 V). Consumer equipment uses -10 dBV (0.3162 V) as its 0 VU. As you can see, the meaning of 0 VU is very context dependent.

In the case of digital audio, 0 dB is simply defined as the loudest level that can flow through the converters before clipping, that is, before the waveform is deformed and saturation is introduced. We call this definition of the decibel dBFS (decibels full scale). How digital audio levels correspond to analogue levels depends on how your converters are calibrated, but 0 VU is usually equated to around -20 dBFS on studio equipment.
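Those analogue reference levels are just voltage ratios, so the conversions can be sketched in a few lines of Python (using the standard references of 0.775 V for dBu and 1 V for dBV):

```python
import math

DBU_REF = 0.7746   # volts: the 0 dBu reference (0.775 V, i.e. 1 mW into 600 ohms)
DBV_REF = 1.0      # volts: the 0 dBV reference

def dbu_to_volts(dbu):
    return DBU_REF * 10 ** (dbu / 20)

def dbv_to_volts(dbv):
    return DBV_REF * 10 ** (dbv / 20)

# The nominal levels mentioned above:
print(round(dbu_to_volts(4), 3))    # US studio 0 VU: 1.228 V
print(round(dbu_to_volts(6), 2))    # European 0 VU: 1.55 V
print(round(dbv_to_volts(-10), 4))  # consumer reference: 0.3162 V
```

This is why the same “0 VU” needle position can mean different voltages on different gear: only the reference changes, not the maths.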

Fletcher-Munson curves showing frequency sensitivity for humans. How cool would it be to see the equivalent curves for other animals, like bats?


The platonic loudness standard

Since dBFS is only a scale in the digital world, we still need a way to measure loudness in a human-friendly way within digital audio. As we have seen, this is usually accomplished by averaging audio levels across a certain time window. At the same time, digital audio also needs precision when measuring peaks if we want to avoid saturation when converting audio between analogue and digital and vice versa.

Something else that we need to take into consideration for our standard is the fact that we are not equally sensitive to all frequencies, as the Fletcher–Munson curves show. We are not very sensitive to low or very high frequencies, and if we want our audio levels to be accurate, this is something that needs to be accounted for.

So, I have laid out everything that we need our loudness standard to have. Does such a thing exist?


The ITU BS.1770 standard

This document was presented by the ITU (International Telecommunication Union) in 2006 and fits all the criteria we were looking for. The ITU BS.1770 is really a collection of technologies and protocols designed to measure loudness accurately in a digital environment. We could say it is really a set of recommendations.

At the time of this writing, four revisions have been released, plus the ITU BS.1771, which expands on the same ideas. For simplicity, I will refer to all of these documents as simply the ITU BS.1770, or just ITU.

The loudness unit defined by the ITU is the LKFS, which stands for “Loudness, K-weighted, relative to Full Scale”. This unit combines a weighting curve (named “K”) to account for frequency sensitivity with an averaged or RMS measurement that uses a 400 ms time window. The ITU also defines a “true peak” meter as a peak meter that uses oversampling for greater accuracy.
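As a rough sketch of that measurement (leaving out the K-weighting pre-filter itself, which needs two biquad filter stages defined in the spec), the 400 ms mean-square formula looks like this in Python:

```python
import math

def momentary_loudness(samples, rate=48000):
    """BS.1770-style 400 ms mean-square loudness for one mono channel.
    Simplified sketch: the K-weighting pre-filter is omitted, so the
    result is only indicative for signals the filter barely changes."""
    window = int(0.4 * rate)           # 400 ms integration window
    block = samples[-window:]          # most recent 400 ms of audio
    mean_square = sum(s * s for s in block) / len(block)
    return -0.691 + 10 * math.log10(mean_square)  # LKFS offset per BS.1770

# A full-scale sine wave measures about -3.7 LKFS under this formula.
rate = 48000
sine = [math.sin(2 * math.pi * 997 * n / rate) for n in range(rate)]
print(round(momentary_loudness(sine, rate), 1))
```

The -0.691 constant is the calibration offset the recommendation specifies so that a reference signal lands on the right value after K-weighting.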

Once the ITU released their recommendations, each region used them as the foundation for their own standards. As the ITU released new updates, each region would incorporate some of these ideas while expanding on them. Let’s look at some regional standards.


EBU R128, Time Windows & Gates

This is the standard in use in Europe, released by the EBU (European Broadcasting Union).

Before I continue, a clarification. The EBU names the loudness unit LUFS (Loudness Units relative to Full Scale) instead of LKFS, as the former complies better with scientific naming conventions. So if you see LUFS, keep in mind that this is pretty much the same as LKFS. You will also see LU (Loudness Units); this is simply a relative unit used when comparing two LUFS or two LKFS values.

In the R128 standard, four different time windows are defined, based on the ITU BS.1771 recommendation. A meter needs to have all of these, plus some other features (see below), to be considered capable of operating in “EBU Mode”.

  • True-Peak: Almost instantaneous window with sub-sample accuracy.

  • Momentary: 400 ms window. Useful to get an idea of how loud a particular sound is. Plugins usually offer different scale options.

  • Short Term: 3 seconds window. Gives a good feel of how loud a particular section is.

  • Integrated or Programme: Indicates how loud the whole programme is over its whole length. Sometimes it’s also called “Long Term”.

Why so many different time windows? In my opinion, they are useful when working on a mix since they give you information at different levels of resolution. True-peak tells you whether you would saturate the converters, and it is good practice to always keep some headroom here. The momentary measurement is more or less similar to what a VU meter would indicate, and gives you information on a particular short section. I personally don’t look at the momentary meter much, because any mix with a decent amount of dynamic range is going to fluctuate here quite a bit. Nevertheless, it is useful for making sure the mix is not very far from the target levels on specific sections.

Short term may be a better tool to get a solid feel of how loud a scene is. This measurement is going to fluctuate, but not as much as the momentary value. To get a mix within the standards, you need to make sure the short term value stays around the target level, but you don’t need to be super accurate with this. What I try to do is find a compromise between the level that feels right and my target level and, when in doubt, I favour what feels right.

Finally, the integrated or long term value has a time window the size of the whole show. This is the value that is going to tell you the overall level, and measuring it in a faithful way is tricky, as you will see below.

So, I was mentioning “target levels”. Which levels? The EBU standard recommends audio to be at -23 LUFS ±0.5 LU (±1 LU for live programmes). We are talking here about the integrated measurement, so the level for the entire show. Additionally, the maximum true peak value allowed is -1 dBTP. And that would be pretty much it, except for one more issue: measuring levels over a long stretch of time in a consistent way comes with some challenges.

This is because there is usually a main element that we want to make sure is always easy to hear (usually dialogue or narration), and since audio volume is logarithmic, that main element carries most of the show’s loudness weight. So we would naturally mix this element to already be at the desired loudness or slightly below. The problem comes when considering all the other elements around the dialogue. If there are too many quiet moments, that is going to make our integrated level quite low, since everything is averaged.

The solution would be to either push the level of the whole show up or re-mix the dialogue louder so that the integrated value is correct. Either way, that would probably make the dialogue too loud, and we would also risk saturating the peak meter. Not ideal.

Nugen’s VisLM plugin operating in EBU mode. You can see all the common EBU features, including all time windows, loudness range and a gate indicator.

To fix this, the R128 uses the recommendations from the revised ITU BS.1770-3. Integrated loudness is calculated using a relative gate that effectively pauses the measurement when levels drop more than 10 LU below an un-gated measurement. There is also an absolute gate at -70 LUFS; nothing below this value is considered for the measurement. These gates give us a more meaningful result, since only the relevant foreground audio is considered when measuring the integrated loudness.
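The two-stage gating can be sketched in Python. Note this is a simplification: the real spec works on 400 ms blocks with 75% overlap and averages per-channel mean-square values, but the gating logic itself is the part shown here (energies are averaged, not dB values, which is what the spec requires):

```python
import math

def integrated_loudness(block_loudnesses):
    """Two-stage gating in the spirit of ITU-R BS.1770-3.
    Input: the loudness of each 400 ms measurement block, in LUFS."""
    def mean_lufs(blocks):
        # Average in the energy domain, then convert back to LUFS.
        energies = [10 ** ((l + 0.691) / 10) for l in blocks]
        return -0.691 + 10 * math.log10(sum(energies) / len(energies))

    # Stage 1: absolute gate at -70 LUFS.
    above_abs = [l for l in block_loudnesses if l > -70.0]
    # Stage 2: relative gate, 10 LU below the stage-1 average.
    threshold = mean_lufs(above_abs) - 10.0
    gated = [l for l in above_abs if l > threshold]
    return mean_lufs(gated)

# Dialogue around -23 LUFS with long, near-silent pauses: the pauses fall
# below the relative gate, so they don't drag the integrated value down.
blocks = [-23.0] * 50 + [-60.0] * 50
print(round(integrated_loudness(blocks), 1))  # -23.0
```

Without the gates, the fifty near-silent blocks would pull the average several LU below -23, which is exactly the problem described above.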

The last concept I wanted to mention is loudness range, or LRA. This is measured in LU and indicates how much the overall levels change throughout the programme, seen macroscopically. You can think of it as an indication of the dynamic range of your mix: low values indicate that the mix has a very constant level, while higher values appear when there is a larger difference between quiet and loud moments. The EBU doesn’t recommend any target value for the loudness range, since this depends on the nature of the show, but it is certainly a nice tool for getting an idea of your overall mix dynamics.
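A crude sketch of the idea behind LRA (the actual EBU algorithm, defined in EBU Tech 3342, adds gating stages omitted here for brevity): take the spread between the 10th and 95th percentiles of the short-term loudness values.

```python
def loudness_range(short_term_values):
    """Rough LRA sketch: the spread between the 10th and 95th
    percentiles of the short-term (3 s) loudness distribution, in LU.
    The spec's absolute and relative gating stages are omitted."""
    values = sorted(short_term_values)
    def percentile(p):
        return values[int(p / 100 * (len(values) - 1))]
    return percentile(95) - percentile(10)

# A dynamic mix: quiet scenes around -30 LUFS, loud scenes around -18 LUFS.
short_term = [-30.0] * 40 + [-23.0] * 40 + [-18.0] * 20
print(round(loudness_range(short_term), 1), "LU")  # 12.0 LU
```

Using percentiles rather than the absolute minimum and maximum keeps one stray explosion or one second of silence from inflating the number.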


ATSC A/85

This is the standard used in the US, released by the ATSC (Advanced Television Systems Committee). It uses LKFS units (remember that LKFS and LUFS are virtually equivalent) and time windows similar to the European ones. The recommended integrated value is -24 LKFS, while the maximum peak value allowed is -2 dBTP.

When the first version was released in 2009, this standard recommended a different method for calculating the integrated value. As you know, the EBU system uses a relative gate in order to only consider foreground audio for its measurements, but the ATSC took a different approach. Remember when I said that mixes usually have some main element (often dialogue) that forms the center of the mix?

The ATSC called this main element an “anchor”. Since dialogue is usually this anchor, the system used an algorithm to detect speech and would only consider that when calculating the integrated level. I’ve done some tests with both Waves WLM and Nugen VisLM, and the algorithm works pretty well: the integrated value doesn’t even budge when you are monitoring non-dialogue content, although singing usually confuses it.

In fact, in the 2011 update, the ATSC standard started differentiating between regular programmes and commercials. Dialogue-based gating would be used for the former, while all the elements in the entire mix would be considered for the latter. This was actually one of the main goals of the ITU standard initially: to avoid commercials being excessively loud compared to the programmes themselves.

Nevertheless, the ATSC updated the standard again in 2013 to follow the ITU BS.1770-3 directives, and from then on all content would be measured using the same two-gate method Europe uses. Because of this, I was tempted to just skip this whole ATSC history mess, but I thought it was important to explain it so you can understand why some loudness plugins offer so many different ATSC options.

Here you can see the ATSC options on WLM. The first two would be pre-2013, using either dialogue detection or the whole mix to calculate the integrated level. The third, called “2013”, uses the gated method à la Europe.

TV Regional and National Standards

Now that we have a good idea of all the different characteristics standards use, let’s see how they compare.

Country / Region | Standard | Units Used | Integrated Level | True Peak | Weighting | Integrated level method
Europe | EBU R128 | LUFS | -23 LUFS | -1 dBTP | K | Relative gate
US | ATSC A/85 post-2013 | LKFS | -24 LKFS | -2 dBTP | K | Relative gate
US | ATSC A/85 pre-2013 (commercials) | LKFS | -24 LKFS | -2 dBTP | K | All elements considered
US | ATSC A/85 pre-2013 (programmes) | LKFS | -24 LKFS | -2 dBTP | K | Dialogue detection
Japan | TR-B32 | LUFS | -24 LUFS | -2 dBTP | K | Relative gate
Australia | OP-59 | LKFS | -24 LKFS | -2 dBTP | K | Relative gate

As you can see, currently, there are only small differences between them.

Loudness for Digital Platforms

I have tried to find the specifications for some of the most used digital platforms, but I was only able to find the latest Netflix specs. Hulu, Amazon and HBO don’t specify their requirements, or at least not publicly. If you need to deliver a mix to these platforms, make sure they send you their desired specs. In any case, using the latest EBU or ATSC recommendations is probably a good starting point.

In the case of Netflix, their specs are very curious. They ask for an integrated level of -27 LKFS and a maximum true peak of -2 dBTP. The method for measuring the integrated level is dialogue detection, like the ATSC used to recommend, which in a way is a step back. Why would Netflix recommend this if the ATSC spec moved on to gate-based measurements? Netflix basically says that when using the gated method, mixes with a large dynamic range tend to leave dialogue too low, so they propose a return to the dialogue detection algorithm.

The thing is, this algorithm is old and can be inaccurate, so this decision was controversial. A new, more robust algorithm could be a possible solution for these high dynamic range mixes. Also, -27 LKFS may sound too low, but it wasn’t chosen arbitrarily: it is based on the fact that that was the level where dialogue would usually end up on these mixes. If you want to know more about this, you can check this, this and this article.

Loudness for Theatrical Releases

The case of cinema is very different from broadcast for a very simple reason: you can expect a certain homogeneity in the reproduction systems that you won’t find in home setups. For this reason there is no hard loudness standard that you have to follow.

Dolby Scale | SPL (dBC)
7 | 85
6.5 | 83.33
6 | 81.66
5.5 | 80
5 | 78.33
4.5 | 76.66
4 | 75
3.5 | 65

This lack of a general standard has resulted in a loudness war similar to the one in the music mixing world. The results are lower dynamic ranges and many complaints about cinemas being too loud. Shouldn’t cinema mixes offer a bigger dynamic range than TV? How are these levels determined?

Cinema screens have a Dolby box where the projectionist sets the general level. These levels are determined by the Dolby scale and correspond to SPL measurements under a C curve when using the “Dolby noise”. Remember that, in the broadcast world, the K curve is used instead, which doesn’t help when trying to translate between the two.

Nowadays more and more cinemas are automated, meaning that levels are set via software or even remotely. At first, all cinemas used level 7, which is the one recommended by Dolby, but as movies got louder and people complained, projectionists started to use lower levels: 6, 5 and even 4.5 are used regularly. In turn, mixers started to work at those levels too, which resulted in louder mixes overall in order to get the same feel. This, again, made cinemas lower their levels even more.

You see where this is going. To give you an idea, Eelco Grimm, together with Michel Schöpping, analysed 24 movies available at Dutch cinemas and found levels that varied wildly. The integrated level went from -38 LUFS to -20 LUFS, with the maximum short-term level varying from -29 LUFS to -8 LUFS and the maximum true-peak level varying from -7 to +3.5 dBTP. Dialogue levels varied from -41 to -25 LUFS. That’s quite a big difference; imagine if that were the case in broadcast.

Despite these numbers being very different, we have to remember that all these movies were probably played back at different levels on the Dolby scale. Eelco says in his analysis:

  • The average playback level for movies mastered at '7' is -28 LUFS (-29 to -25).

  • The average playback level for movies mastered at '6.3' is -23 LUFS (-25 to -21). They are projected 3 dB softer, so if we corrected the average to a '7' level, it would be -26 LUFS.

  • The average playback level for movies mastered at '5' is -20 LUFS (all were -20). They are projected 7 dB softer, so the corrected average would be -27 LUFS.
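The correction in those bullet points is plain arithmetic: subtract the number of dB a movie is projected softer than Dolby ‘7’ from its measured level, and the results become comparable.

```python
# A movie projected N dB softer than Dolby '7' needs N dB subtracted
# from its measured level to compare it with a movie mastered at '7'.
def corrected_to_fader_7(measured_lufs, db_softer_than_7):
    return measured_lufs - db_softer_than_7

print(corrected_to_fader_7(-28, 0))  # mastered at '7':   -28 LUFS
print(corrected_to_fader_7(-23, 3))  # mastered at '6.3': -26 LUFS
print(corrected_to_fader_7(-20, 7))  # mastered at '5':   -27 LUFS
```

The corrected values cluster within a couple of LU of each other, which is the point of the analysis.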

So, as you can see, in the end the dialogue level is equivalent to about -27 LUFS in all cases; the only difference is that the movies mixed at 7 (the recommended level) would have greater dynamic range, which is important for delivering the cinematic feel that TV can’t provide. The situation is quite unstable, and I hope a solid solution based on the ITU recommendations is implemented at some point. If you want to know more about this issue and read the paper that Eelco Grimm released, check this comprehensive article.

Loudness standards for video games.

Video games are everywhere: consoles, computers, phones, tablets, etc., so there is no clear standard to use. Having said that, some companies have established some guidelines. Sony, through their ASWG-R001 document, recommends the following:

  • -23 LUFS and -1 dBTP for PlayStation 3 and 4 games.

  • -18 LUFS and -1 dBTP for PS Vita games.

  • The maximum loudness range recommended is 20 LU.

But how do you measure the integrated loudness of a game? Integrated loudness was designed for linear media, so Sony’s document recommends making measurements in 30-minute sessions that are a good representation of different sections of the game.

So, despite games being so diverse in platforms and contexts, using the EBU recommendations for console and PC games (-23 LUFS) and a louder spec for mobile and portable games (-18 LUFS) would be a great starting point.

Conclusions and some plugins.

I hope you now have a solid foundation on the subject. Things will keep changing, so if you read this in the future, assume some of this information is outdated. Nevertheless, you will hopefully have learned the concepts you need to work with loudness now and in the future.

If you want to measure loudness, many DAWs (including Pro Tools) don’t have a built-in meter that can read LUFS/LKFS, but there are plugins to solve this. I recommend that you try both Waves WLM and Nugen VisLM. If you can’t afford a loudness plugin, you can try Youlean Loudness Meter, which has a free version and is a great one to start with.

Thanks for reading!

Figuring out: Gain Staging

What is it?

Gain staging is all about managing the audio levels at the different stages of an audio system. In other words, when you need to make something louder, good gain staging is knowing where in the signal chain it would be best to do this.

I will focus this article on mix and post-production work in Pro Tools, since this is what I do daily, but these concepts can be applied in any other audio-related situation, like recording or live sound.

Pro Tools Signal Chain

To start with, let's have a look at the signal chain in Pro Tools:


Knowing and understanding this chain is very important when setting your session up for mixing. Note that other DAWs vary in their signal chain. Cubase, for example, offers pre- and post-fader inserts, while in Pro Tools every insert is always pre-fader except for the ones on the master channel.

Also, I've added a Sub Mix Bus (an auxiliary) at the end of the chain because this is how mixing templates are usually set up, and it is important to keep it in mind when thinking about signal flow.

So, let's dive into each of the elements of the chain and see their use and how they interact with each other.

Clip gain & Inserts

As I was saying, in Pro Tools, inserts are pre-fader. It doesn't matter how much you lower your track's volume; the audio clip always hits the plugins with its "original" level. This makes clip gain very handy, since we can use it to control clip levels before they hit the insert chain.

You can use clip gain to make sure you don't saturate your first insert's input and to keep the level consistent between different clips on the same track. This last use is especially important when audio is going through a compressor, since you want roughly the same amount of signal being compressed across all the different clips on a given channel.

So what if you want a post-fader insert? As I said, you can't directly change an insert to post-fader, but there is a workaround. If you want to affect the signal after the track's volume, you can always route that track or tracks to an auxiliary and have the inserts on that aux. In this case, these inserts would be post-fader from the audio channel's perspective, but don't forget they are still pre-fader from the aux channel's own perspective.

Signal flow within the insert chain

Since the audio signal flows from the first to the last insert, when choosing the order of these plugins it is always important to think about the goal you want to achieve. Should you EQ first? Compress first? What if you want a flanger: should it be at the end of the chain or maybe at the beginning?

I don't think there is a definitive answer and, as I was saying, the key is to think about the goal you have in mind and whichever way makes conceptual sense to you. EQ and compression order is a classic example of this.

The way I usually work is to EQ first to reduce any annoying or problematic frequencies, usually with a high-pass filter to remove unnecessary low end. Once this is done, I use the compressor to control the dynamic range as desired. The idea behind this approach is that the compressor only works on the desired part of the signal.

I sometimes add a second EQ after the compressor for further enhancements, usually boosting frequencies if needed. Any other special effects, like a flanger or a vocoder would go last on the chain.

Please note that, if you use the new Pro Tools clip effects (which I do use), these are applied to the clip before the fader and before the inserts.

Channel Fader

After the insert chain, the signal goes through the channel fader, or track volume. This is where you usually do most of the automation and levelling work. Good gain staging makes working with the fader much easier: you want to be working close to unity, that is, close to 0.

This means that, after clip gain, clip effects and all inserts, you want the signal to be at your target level when the fader is hovering around 0. Why? This is where you have the most control, headroom and comfort. If you look closely at the fader, you'll notice it has a logarithmic scale. A small movement near unity means a change of 1 or 2 dB, but the same movement further down could be a 10 dB change. Mixing close to unity makes subtle and precise fader movements easy and comfortable.

Sends

Pro Tools sends are post-fader by default, and this is the behaviour you would usually want. Sending audio to a reverb or delay is probably the most common use for a send, since you want to keep 100% of the dry signal and just add some wet processed signal that changes in level as the dry signal does.

Pre-fader sends are mostly useful for recording and live mixing (sending a headphone mix is a typical example) and I don't find myself using them much in post. Nevertheless, a possible use in a post-production context could be when you want to work with 100% of the wet signal regardless of how much of the dry signal is coming through. Examples of this could be special effects and/or very distant or echoey reverbs where you don't want to keep much of the original dry signal.

Channel Trim

Trim effectively gives you a second volume lane per track. Why would this be useful? I use trim when I already have an automation curve that I want to keep but I just want to make the whole thing louder or quieter in a dynamic way. Once you finish a trim pass, both curves coalesce into one. This is the default behaviour, but you can change it in Preferences > Mixing > Automation.

VCAs

VCAs (Voltage Controlled Amplifiers) are a concept that comes from analogue consoles, and they allow you to control the level of several tracks with a single fader. On a console they do this by controlling the voltage reaching each channel, but in Pro Tools, VCAs are a special type of track with no audio, inserts, inputs or outputs. VCA tracks just have a volume lane that can be used to control the volume of any group of tracks.

So, VCAs are something you usually use when you want to control the overall level of a section of the mix as a whole, like the dialogue or sound effects tracks. In terms of signal flow, VCAs just change a track's level via the track's fader, so you could say they act as a third fader (the second being trim).

Why is this better than just routing the same tracks to an auxiliary and changing the volume there? Auxiliaries are also useful, as you will see in the next section, but if the goal is just level control, VCAs have a few advantages:

  • Coalescing: After every pass, you are able to coalesce your automation, changing the target tracks levels and leaving your VCA track flat and ready for your next pass.

  • More information: When using an auxiliary instead of a VCA track, there is no way to know if a child track is being affected by it. If you accidentally move that aux fader, you may go crazy trying to figure out why your dialogue tracks are all slightly lower (true story). VCAs, on the other hand, show you a blue outline (see picture below) with the actual volume lane that would result after coalescing both lanes, so you can always see how a VCA is affecting a track.

  • Post-fader workflow: Another problem with using an auxiliary to control the volume of a group of tracks is that if you have post-fader sends on those tracks, you will still send that audio away regardless of the parent auxiliary's level. This is because you are sending that audio away before you send it to the auxiliary. VCAs avoid this problem by directly affecting the child track's volume, and thus also affecting how much is sent post-fader.

Sub Mix buses

This is the final step of the signal chain. After all inserts, faders, trim and VCAs, the resulting audio signals can be routed directly to your output, or you may consider using a sub mix bus instead. This is an auxiliary track that sums the signals from a specific group of channels (like the dialogue tracks) and allows you to control and process each sub mix as a whole.

These are the type of auxiliary tracks I was talking about in the VCA section. They may not be ideal for controlling the levels of a sub mix, but they are useful when you want to process a group of tracks with the same plugins or when you need to print different stems.

An issue you may run into when using them is "fighting" for a sound to be loud enough. You feel that pushing the fader more and more doesn't really help and you barely hear the difference. When this happens, you've probably run out of headroom: pushing the volume doesn't seem to help because a compressor or limiter further down the signal chain (that is, acting as a post-fader insert) is squashing the signal.

When this happens, you need to go back and give yourself more headroom by making sure you are not over-compressing, or by lowering every track's volume until you are working at a manageable level. Ideally, you should be metering your mix from the start so you know where you are in terms of loudness. If you mix to a loudness standard like EBU R128, that should give you a nice and comfortable amount of headroom.

Final Thoughts

Essentially, mixing is about making things louder or quieter to serve the story that is being told. As you can see, it is important to know where in the audio chain the best place to do this is. If you keep your chain in order, from clip gain to the sub mix buses, making sure levels are optimal every step of the way, you'll be in control and have a better idea of where to act when issues arise. Happy mixing!

Figuring out: Dolby Atmos

Figuring out: About this series
They say the best way to really learn about something is to force yourself to explain it to someone. That is the goal of this series. I will delve into a topic that I feel I don't know enough about and explain my findings. Hopefully, we will both learn something useful!


More than a gimmick?

Until a few months ago, Dolby Atmos was to me mostly about having speakers on the ceiling in the hope of attracting people back to the cinemas. After getting to know Atmos a little better, I wanted to see what it has to offer and whether it is really going to be the new standard in professional audio. Consider this a 101 introduction to Dolby Atmos.

Surround Systems

Before Atmos, let's start with something familiar. Surround systems have been used for decades to offer a more interesting audio experience for the listener. 5.1 and 7.1 are the most used formats for both cinemas and home setups.

Something important to understand about these systems is that they are channel-based. For example, a 7.1 system offers the following channels: left, center, right, LFE (low-frequency effects), left side surround, right side surround, left rear surround and right rear surround.

As you can see, these channels can be composed of just one speaker (like the center channel) or of several of them (like the left surround channel). We can send audio to any channel independently, but we have no control over how much is sent to each of the individual speakers that form a channel.

That is basically how all surround systems work; the only thing that varies is the number of channels.

Dolby Atmos brings two innovations to the table. Firstly, it uses an object-based approach on top of the previous channel-based system. Secondly, it expands the surround feel by adding speakers to the ceiling, unlocking 3D sound. Let's look at both of these features:

Object-based

Dolby Atmos allows for 128 channels in total. We can use a certain number of those for traditional channel-based stems and the rest for the new sound objects.

Think of these sound objects as individual mono sounds that you can place and move around the room. If you place a sound object in a specific location, Dolby Atmos will play the sound at that location, addressing the nearby speakers individually as needed, regardless of how big the room is or how many speakers there are.

In other words, you are telling Atmos the coordinates of the sound instead of how much the sound is feeding each of the channels. This allows you to place sounds with great precision in big rooms, but at the same time the mix will translate well to smaller rooms or even headphones, since Atmos just uses the coordinates of each sound object in 3D space.
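To make the idea concrete, here is a purely illustrative sketch of what a sound object could look like. The actual Atmos metadata format is proprietary, and all the names, fields and coordinate ranges below are my own invention:

```python
from dataclasses import dataclass

# Purely illustrative: the real Atmos metadata format is proprietary.
# The point is that an object stores *where* the sound should be, not
# which speakers to feed; the renderer decides that for each room.
@dataclass
class SoundObject:
    name: str
    x: float  # left (0.0) to right (1.0)
    y: float  # front (0.0) to back (1.0)
    z: float  # floor (0.0) to ceiling (1.0)

# A bird chirping overhead, up and to the right of the listener.
bird = SoundObject("bird_chirp", x=0.8, y=0.3, z=1.0)
```

Whether the playback system is a 64-speaker theatre or a pair of headphones, the same coordinates drive the rendering; that is what makes object-based mixes scale.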

3D Sound

The second innovation is probably the flashiest.

If you think about it, stereo is one-dimensional: sound moves along a horizontal line. Surround audio is 2D: the soundscape is around you, on a horizontal plane. 3D is the next step: sound can be anywhere within a cube or a sphere.

Before Atmos, some surround 9.1 systems tried to achieve this by placing two speakers on top of the front speakers in order to give some "height" to some elements of the mix.

Dolby Atmos goes one step further by adding speakers to the ceiling itself. Elements like ambiences, FX or music can now be placed overhead, opening up the third dimension for the listener.

In theatres, these ceiling speakers usually go in two rows. There are also some extra surround speakers on the walls to make panning smoother when transitioning sounds between onscreen and offscreen. In total, up to 64 individual speakers are allowed on a theatrical Atmos installation.

At home, usually two or four overhead speakers are used, so you'll see configurations like 5.1.2 or 7.1.4. Note how the third set of numbers denotes the number of ceiling speakers. Up to 22 speakers are allowed on home setups.

Since installing ceiling speakers may not always be very practical in a home setting, sound is sometimes "fired" at the ceiling so that it bounces back to the listener, giving the impression that it comes from above.

Crafting a soundscape with Atmos in mind

Knowing that a project will be mixed in Atmos changes the approach in terms of sound design and mixing, giving us more tools and challenges to achieve a compelling soundtrack.

For example, building ambiences now has an additional dimension. Imagine a scene inside a car while it's raining. You could have different layers for the car engine and the city exterior, and then the sound of the rain falling onto the roof featured on the overhead speakers. A forest ambience could have discrete mono birds chirping above and around you, some of them static, some of them moving through the 3D space.

It's also worth noting that Atmos setups usually include one or more extra subwoofers close to the surround and overhead speakers. Although low frequencies are not very directional, it still makes a difference in terms of sound placement to use the surround subwoofer instead of the one behind the screen.

Additionally, the Atmos standard ensures that all surround speakers offer the same sound pressure level and frequency response as the onscreen ones. This means that while designing sound objects with a wide frequency range, like a fighter jet going by overhead, we have the whole spectrum at our disposal. This wasn't the case with previous systems, where the surround speakers did not have enough power and were better suited to simple atmospheric and background sounds.

Atmos makes you think more about where you want the audio to be in 3D space rather than about which channels and speakers to feed. It turns the mix into a full-frequency canvas on which to position your elements.

Encoding for Dolby Atmos

When preparing audio for Atmos, there are two distinct uses we can give to each of the available 128 channels. We can have sound objects as discussed above, and we can also have channel-based submixes (beds). These beds can be created in any traditional channel-based configuration like 5.1 or 7.1 and are mapped to individual speakers or arrays of speakers the old-fashioned way. In contrast, objects are not mapped to any speaker but saved with metadata that describes their coordinates over time.

This double approach (beds + objects) makes Atmos backwards compatible since we are also creating a traditional channel-based version when creating the masters.

To put all this information together we use a renderer. I won't go into too much detail here, but Dolby basically offers two ways of doing this:

Dolby Mastering Suite + RMU:
This is the most advanced option, it is used for theatrical applications and Dolby certified rooms. It combines the Dolby Mastering Suite software with the Dolby Rendering and Mastering Unit (RMU), a dedicated Dell server computer that communicates with Pro Tools via MADI and processes all the Atmos information while compensating for any delays in the system. 

The RMU can be used for monitoring, authoring and recording Dolby Atmos print masters. It is also used for creating and loading room calibrations and configurations.

Note that the Dolby Mastering Suite software runs only on dedicated hardware (the RMU), while we would still need a different software package for any Pro Tools systems involved in the Atmos workflow. This would be the Dolby Production Suite, which I'm explaining below. The Dolby Mastering Suite includes three Dolby Production Suite copies but you can also buy the latter separately.

The mighty RMU

Dolby Production Suite:
This is the package that should be installed on the Pro Tools machines. It basically includes the renderer itself, a monitoring application and all the necessary Pro Tools plugins. In case you are using an RMU, this package will allow you to connect with it. If you are not, it will allow you to play, edit and record any Atmos mixes all within the same Pro Tools system.

While the Dolby Atmos Production Suite includes the ability to render Atmos objects, just like you can using the RMU, it has significant limitations. The software is an "in the box" renderer that runs on the same system as your Pro Tools session so if your project is large you may not be able to run it. Also, the software won't be able to compensate for any delays produced in the system.

Having said that, the Dolby Production Suite may be powerful enough for Blu-ray, streaming and VR projects, with a limit of up to 22 monitor outputs. For larger and/or theatrical projects an RMU is necessary, being capable of up to 64 outputs.

Dolby Atmos Everywhere

Atmos in home theatres is not rendered the same way as in cinemas because of limited bandwidth and lack of processing power. Close objects and speakers are clustered together conserving any relevant panning metadata. This simplified Atmos mix can be played through a home Atmos setup, like a 7.1.2.

Since ceiling speakers are cumbersome, home setups are becoming more accessible with the inclusion of sound bars and upward-firing speakers.

Blu-rays can carry an Atmos soundtrack, and some broadcasting and streaming companies like Sky or Netflix are starting to offer Atmos content. The 2018 Winter Olympics were the first live event offered in Atmos.

In the world of video games, Dolby Atmos could be especially promising, enhancing the player's experience with immersive and expressive 3D audio. Currently, Xbox One, PC and, to some extent, PS4 offer Dolby Atmos options via either an AV receiver or headphones (behind a paywall). There are a handful of titles ready for Atmos, like Overwatch, Battlefield 1 or Star Wars: Battlefront.

Any Atmos mix can be scaled down into a pair of headphones. You don't need surround headphones for this, the Dolby algorithms convert all the Atmos channels into a stereo binaural signal that sounds around you in 360°. Some phones and tablets are starting to support this already.

Final Thoughts

It seems like Dolby Atmos is here to stay and become the new standard the same way stereo and surround sound replaced their older counterparts.

In my opinion, the key qualities of Atmos are its object-based technology and its scalability. Overhead 3D audio is very cool, but it may not be game-changing enough, or accessible enough, for the average user. It remains to be seen whether binaural headphone technology and upward-firing speakers will be good enough to recreate the 3D feel that theatres can currently provide.

All you need to know about the decibel

Here is a bird's-eye view of the decibel and how understanding it can be useful if you work as a sound designer, a sound mixer, or anywhere else in the media industry.

I've included numbered notes that you can open to get more information. So, enter, the decibel:

The Decibel is an odd unit. There are three main reasons for this: 

1: A Logarithmic Unit

Firstly, a decibel is a logarithmic unit.¹ Our brains don't usually enjoy the concept of logarithmic units, since we are used to things like prices, distances or weights, which usually grow linearly in our everyday lives. Nevertheless, logarithmic units are very useful when we want to represent a vast range of values.

Let's see an example: if we take a value of 10 and make it 2, 3 or 5 times bigger, we'll see that the resulting value gets huge pretty fast on a logarithmic scale.²
  1. Note that I will use logarithmic units and logarithmic scales interchangeably.

  2. I'm using logarithms to base 10. It's the easiest to understand since we use the decimal system.

 
How much bigger?   Value on a linear scale   Value on a logarithmic scale
1 Time             10                        10
2 Times            20                        100
3 Times            30                        1000
4 Times            40                        10000
5 Times            50                        100000
 
The reason behind this difference is that, while the linear scale is based on multiplication, the logarithmic scale uses exponentiation.³ Here is the same table but with the maths behind it, including the generic formula:
  3. And actually, the logarithm is just the inverse operation of exponentiation; that's why you will sometimes see exponential scales or units. They are basically the same as logarithmic ones.

 
How much bigger?   Value on a linear scale   Value on a logarithmic scale
1 Time             10 (10*1)                 10 (10^1)
2 Times            20 (10*2)                 100 (10^2)
3 Times            30 (10*3)                 1000 (10^3)
4 Times            40 (10*4)                 10000 (10^4)
5 Times            50 (10*5)                 100000 (10^5)
X Times            10*X                      10^X
 

As you can see, with just a 5-times increment we reach a value of a hundred thousand. That can be very convenient when we want to visualise and work with a set of data ranging from dozens to millions.
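The table above boils down to one line of Python per scale. This is purely illustrative, just to show that the linear column is multiplication while the logarithmic column is exponentiation:

```python
# Linear vs logarithmic growth, reproducing the table above.
for x in range(1, 6):
    linear = 10 * x        # linear scale: multiplication
    logarithmic = 10 ** x  # logarithmic scale: exponentiation
    print(f"{x} times: linear {linear:>6}, logarithmic {logarithmic:>6}")
```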

Some units work fine on a linear scale because we usually move within a small range of values. For example, let's imagine we want to measure distances between cities. As you can see, most values are between 3000 and 18000 km, so they fit nicely on an old fashioned linear scale. It's easy to see how the distances compare.

Now, let's imagine we are still measuring distances between cities, but we are an advanced civilization that has founded some cities throughout the galaxy. Let's have a look:

As you can see, the result is not very easy to read. Orion is so far away that all the other distances are squashed on the chart. Of course, we could use light years instead of kilometres, which would be much better for the cities around other stars, but then we would have tiny, hard-to-use numbers for the Earth cities. Another solution would be to measure Earth cities in kilometres and galaxy cities in light years, but then we wouldn't be able to easily compare the values between them.

The logarithmic scale offers a solution to this problem, since it easily covers several orders of magnitude. Here is the same distance chart, but on a logarithmic scale: I just took the distances in kilometres and calculated their logarithms.

This is much more comfortable to use; we get a better idea of the relationships between all these distances.

Like the city examples above, some natural phenomena that span several orders of magnitude are more comfortably measured on a logarithmic scale. Some examples are pH, earthquakes and... you guessed it, sound loudness. This is because our ears are equipped to process both very quiet and very loud sounds.⁴
  4. It seems like we animals experience much of the world in a logarithmic way. This also includes sound frequency and light brightness. Here is a cool paper about it.

So the take away here is that we use a logarithmic scale for convenience and because it gives us a more accurate model of nature.

2: A Comparative Unit

Great, so we now have an easy-to-use scale to measure anything from a whisper to a jet engine; we just need to stick our sound level meter out of the window and check the number. Well, it's not that simple. When we say something is 65 dB, we are not just making a direct measurement; we are always comparing two values. This is the second reason why decibels are odd. Let me elaborate:

Decibels are really the ratio between a measured value and a reference value. In other words, they are a comparative unit. Just saying 20 dB is incomplete in the same way that just saying 20% is incomplete. We need to specify the reference value we are using. 20% of what? 20 dB with respect to what? So, what kind of reference value could we use? This brings me to the third reason:

3: A Versatile Unit

Most people associate decibels with sound, but they can be used to express ratios of values of any physical property. These properties can be related to audio (like air pressure or voltage) or have little or nothing to do with it (like light, or reflectivity on a radar). Decibels are used in all sorts of industries, not only audio; some examples are electronics, video and optics.

OK, with those three properties in mind, let's sum up what a decibel is.

A decibel is the logarithmically expressed ratio between two physical values

Let that sink in and make sure you really get those three core concepts.
Now, let's see how we can use them to measure sound loudness; that's why we are here, if I remember correctly.
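That definition maps directly onto a one-line formula. Here is a minimal Python sketch; note that the power-vs-amplitude distinction (whether the multiplier is 10 or 20) isn't covered in this post, but it is why the pressure examples later use a factor of 20:

```python
import math

def db(measured, reference, amplitude=True):
    """The logarithmically expressed ratio between two physical values.

    Power quantities (e.g. watts) use 10 * log10(ratio); amplitude
    quantities (e.g. pressure, voltage) use 20 * log10(ratio),
    because power is proportional to the square of the amplitude.
    """
    factor = 20 if amplitude else 10
    return factor * math.log10(measured / reference)

# Doubling the power adds about 3 dB; doubling the voltage about 6 dB.
print(round(db(2, 1, amplitude=False), 2))  # 3.01
print(round(db(2, 1), 2))                   # 6.02
```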

In space, nobody can hear you scream


As much as Star Wars tries to convince us of the contrary, sound's energy needs a physical medium to travel through. When sound waves disturb such a medium, there is a measurable pressure change as the atoms move back and forth. The louder the sound, the more intense this disturbance is.

Since air is the medium through which we usually experience sound, this gives us the most direct and obvious way of measuring loudness: we just need to register how pressure changes in a particular volume of air. Pressure is measured in Pascals, so we are good to go. But wait: if this is the most direct way of measuring loudness, couldn't we just say that a pair of speakers is capable of disturbing the air with a pressure of 6.32 Pascals and forget about decibels?

Well, we could, but again, it wouldn't be very convenient. While the aforementioned speakers can reach 6.32 Pascals, and that seems like a comfortable number to manage, here are some other examples, from quiet to loud:

 
Source                               Sound Pressure (Pa)   Sound Pressure (mPa)
Microsoft's Anechoic Chamber         0.0000019             0.0019
Human Threshold of Hearing @ 1 kHz   0.00002               0.02
Quiet Room                           0.0002                0.2
Normal Conversation                  0.02                  20
Speakers @ 1 meter                   6.32                  6320
Human Threshold of Pain              63.2                  63200
Jet Engine @ 1 meter                 650                   650000
Rifle Shot @ 1 meter                 7265                  7265000
 

Unless you love counting zeros, that doesn't look very convenient, does it? Note how Pascals are not very comfortable for quiet sounds, while mPa (a thousandth of a Pascal) don't work very well for loud ones. If our goal is to create a system that measures sound loudness, one key requirement is that the unit can comfortably cover a large range of values. Several orders of magnitude, actually. To me, that sounds like a job for a logarithmic unit.

Moreover, measuring naked Pascals doesn't seem like a very useful thing to do when our goal is just to get an idea of how loud stuff is. A better approach would be to compare our measured value to a reference value and get the ratio between the two. This is starting to sound an awful lot like our previous definition of a decibel! We are getting somewhere.

So, what could we use as a reference level to measure the loudness of sound waves in the air? If you have a look at the table above, you'll notice a very good candidate: the human threshold of hearing. If we use it, 0 dB becomes the minimal pressure our ears can detect, and from there the numbers go up on a comfortable scale as intensity increases. Even better, if we measure sounds that are below our ears' threshold, the resulting number will be negative, indicating not only that the sound would be imperceptible to us but also by how much. That's an elegant system right there. I'm starting to dig decibels.

Now, let's look at the previous Pascals table, adding the corresponding decibel values:

 
Source                               Sound Pressure (Pa)   dBSPL
Microsoft's Anechoic Chamber         0.0000019             -20.53
Human Threshold of Hearing @ 1 kHz   0.00002               0
Quiet Room                           0.0002                20
Normal Conversation                  0.02                  60
Speakers @ 1 meter                   6.32                  110
Human Threshold of Pain              63.2                  130
Jet Engine @ 1 meter                 650                   150
Rifle Shot @ 1 meter                 7265                  171
 

That looks like a much easier scale to use. Remember that dBs are used to measure both very quiet things like anechoic chambers and very loud stuff like space rockets. This scale does a better job across the whole range of human audition; it is fine-tuned to those microphones we carry around and call ears.
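Just to double-check the numbers in the table, here is the conversion as a short Python sketch. It uses the standard dBSPL formula, with the human threshold of hearing as the reference:

```python
import math

# Reference value for dBSPL: the human threshold of hearing.
P_REF = 2e-5  # Pascals

def pascals_to_db_spl(pressure_pa):
    # Pressure is an amplitude quantity, hence the factor of 20.
    return 20 * math.log10(pressure_pa / P_REF)

for label, pa in [("Normal Conversation", 0.02),
                  ("Speakers @ 1 meter", 6.32),
                  ("Jet Engine @ 1 meter", 650)]:
    print(f"{label}: {pascals_to_db_spl(pa):.0f} dBSPL")
```

Running it reproduces the 60, 110 and 150 dBSPL values from the table.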

Decibel Flavours

Did you notice that in the table above there is a cute subscript after dB that reads SPL? What's up with that? That subscript stands for Sound Pressure Level and marks a particular flavour of decibel. Since decibels can be based on any physical property, and since they can use any reference value, we can have many different flavours of decibel depending on which measured property and reference value are more convenient in each case.

In the case of dBSPL, this type of decibel is telling us two things. Firstly, that the physical property we are using is pressure. Secondly, that our reference value is the threshold of human hearing. This is fine for measuring the loudness of sound waves travelling through the air, but is audio information capable of travelling through other mediums?

We have learned to transform the frequency and amplitude information contained in sound waves in the air into grooves in a record or streams of electrons in a cable. That's a pretty remarkable feat that deserves its own post but for now let's just consider that we are able to "code" audio information into flows of electrons that we can measure.

Since dBs can be used with any physical property, we can use units from the realm of electronics, like watts or volts, to measure loudness in an electrical audio signal. In this sense, both Pascals and volts give us an idea of how intense a sound signal is, even though they refer to very different physical properties.

So, we need to establish which units and reference values are useful for building new decibel flavours. We also need to label each particular flavour of dB somehow. This is usually done with a subscript (dBSPL) or a suffix (dBu).

Let's have a look at some of the most used decibel flavours:

dB Unit         Property Measured (Unit)     Reference Value                  Used on
dBSPL           Pressure (Pascals)           2*10^-5 Pascals                  Acoustics.
                                             (Human Threshold of Hearing)
dBA, dBB, dBC   Pressure (Pascals)           2*10^-5 Pascals                  Acoustics, when accounting for human
                                             (Human Threshold of Hearing)     sensitivity to different frequencies.
dBV             Electric potential (Volts)   1 Volt                           Consumer audio equipment.
dBu             Electric potential (Volts)   0.7746 Volts                     Professional audio equipment.
dBm             Electric power (Watts)       1 mW                             Radio, microwave and fibre-optic
                                                                              communication networks.

As you can see, we can also use units from the electrical realm to measure how loud an audio signal is. We choose the most convenient unit depending on the context. Ideally, when using decibels, the type should be stated, although sometimes it has to be inferred from the context.

If you read dB values on a mixing desk, for example, chances are they are dBu, since this is the unit usually used in professional audio. When shopping for a pair of speakers or headphones, SPL values are usually given. Finally, when measuring things like an office space or a computer fan, you will see dBA, dBB or dBC. These units are virtually the same as dBSPL, but they apply different weighting filters that account for the fact that we are more sensitive to some frequencies than others, in order to get a more accurate result.
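To make the dBu/dBV difference concrete, here is a sketch converting one voltage into both flavours. Only the reference value changes. The +4 dBu nominal line level (about 1.228 V) used as the example isn't mentioned above; it's just a common professional reference point:

```python
import math

# Same voltage, two decibel flavours: only the reference changes.
REF_DBV = 1.0     # volts (consumer audio equipment)
REF_DBU = 0.7746  # volts (professional audio equipment)

def volts_to_db(volts, reference):
    # Voltage is an amplitude quantity, hence the factor of 20.
    return 20 * math.log10(volts / reference)

line_level = 1.228  # volts, the professional "+4 dBu" line level
print(f"{volts_to_db(line_level, REF_DBU):.1f} dBu")  # 4.0 dBu
print(f"{volts_to_db(line_level, REF_DBV):.1f} dBV")  # 1.8 dBV
```

The same signal reads about 2.2 dB higher in dBu than in dBV, simply because the dBu reference voltage is smaller.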

And that's all folks. I left several things out of this post because I wanted to keep it focused on the basics. The decibel has some more mysteries to unravel but I'll leave that for a future post. In the meantime, here are some bullet points to refresh you on what you've learned:

Takeaways

The decibel:

  • Uses the logarithmic scale which works very well when displaying a wide range of values.

  • Is a comparative unit that always uses the ratio between a measured value and a reference value.

  • Can be used with any physical property, not only sound pressure.

  • Uses handy reference values so the numbers we manage are more meaningful.

  • Comes in many different flavours depending on the property measured and the reference value.