Figuring out: Audio Pull up/down

When working with video, an audio pull up or pull down is needed when there's been a change in the picture's frame rate and you need to tweak the audio to make sure it stays in sync.

This subject always seems to be surrounded by a layer of mysticism and confusion, so this is my attempt at going through the basics and hopefully getting some clarity.

Audio Sampling Rate

First, we need to understand some basic digital audio concepts. Feel free to skip this part if it's fresh in your mind.

Whenever we are converting an audio signal from analogue to digital, all we are doing is checking where the waveform is at certain “points” in its oscillation. These “points” are usually called samples.

In order to get a faithful signal, we need to sample our waveforms many times. The number of times we do this per second is the sampling rate, and it is measured in hertz.

Keep in mind that if our sampling rate is not high enough, we won't be able to “capture” the higher frequencies, since these fluctuate faster than we can measure. So how fast do we need to be for accurate results?

The Nyquist-Shannon sampling theorem gives us the answer. It says that the sampling rate needs to be at least twice the highest frequency we want to capture. Since the highest frequency humans can hear is around 20 kHz, a sampling rate of 40 kHz should suffice. With that in mind, let's look at the most commonly used sampling rates:

Sampling Rate | Use
8 kHz | Telephones, walkie-talkies
22 kHz | Low-quality digital audio
44.1 kHz | CD quality, the music standard
48 kHz | The standard for professional video
96 kHz | DVD & Blu-ray audio
192 kHz | DVD & Blu-ray audio; usually the highest sampling rate for professional use

As you can see, most professional formats use a sampling rate higher than 40 kHz to guarantee that we capture the full frequency spectrum. Something important to remember, and that will become relevant later on, is that a piece of audio will always have the same length as long as it is played back at the same sampling rate it was recorded at.
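If it helps, here are those two ideas as a tiny Python sketch (the numbers are just examples):

```python
# A minimal sketch of the two ideas above: the Nyquist limit and the fact that
# duration is simply (number of samples) / (sampling rate).
sample_rate = 48_000          # samples per second (48 kHz, the video standard)
nyquist_limit = sample_rate / 2
print(f"Highest capturable frequency at {sample_rate} Hz: {nyquist_limit} Hz")  # 24000.0 Hz

num_samples = 48_000 * 90     # a 90-second clip recorded at 48 kHz
print(f"Duration when played back at 48 kHz: {num_samples / 48_000} s")  # 90.0 s
```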

For the sake of completeness, I just want to briefly mention audio resolution (or bit depth). This is the other parameter we need to take into consideration when converting to digital audio. It measures how many bits we use to encode the information in each of our samples. Higher values give us more dynamic range, since a bigger range of intensity values can be captured. This doesn't really affect the pull up/down process.
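For reference, a common rule of thumb (not specific to the pull up/down workflow) is roughly 6 dB of dynamic range per bit:

```python
import math

# Rule of thumb: dynamic range in dB ~= 20 * log10(2**bits) ~= 6.02 dB per bit.
for bits in (16, 24):
    print(bits, "bit ->", round(20 * math.log10(2 ** bits), 1), "dB")  # 96.3 dB, 144.5 dB
```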

Frames per second in video

Let's now jump to the realm of video. There's a lot to be said on the subject of frame rate, but I will keep it short. This value is simply how many pictures per second are put together to create our film or video. 24 frames per second (or just fps) is the standard for cinema, while TV uses 25 fps in Europe (PAL) and 29.97 fps in the US (NTSC).

Keep in mind that these frame rates differ not only on a technical level but also on a stylistic level. 24 fps “feels” cinematic and “premium”, while the higher frame rates used in TV can sometimes feel “cheap”. This is probably a cultural perception and is definitely changing. Video games, which often use high frame rates like 60 fps and beyond, are partially responsible for this shift in taste. The amount of motion also matters: higher frame rates are better at showing fast motion.

But how can these different frame rates affect audio sync? The problem usually starts when a project is filmed at a certain rate and then converted to a different one for distribution. This would happen if, for example, a movie (24 fps) is brought to European TV (25 fps), or an American TV programme (29.97 fps) is brought to India, which uses PAL (25 fps).

Let's see how this kind of conversion is done.

Sampling Rate vs Frame Rate

Some people think that audio can be set to be recorded at a certain frame rate in the same way it can be set to be recorded at a certain sampling frequency. This is not true. Audio doesn't intrinsically have a frame rate the way it has a bit depth and a sampling rate.

If I give you an audio file and nothing else, you could easily figure out the bit depth and sampling rate, but you would have no idea about the frame rate used on the associated video. Now, and here comes the nuanced but important point: any audio recorded at the same time as video will sync with the specific frame rate used when recording that video. They will sync because they were recorded together; what the camera registered as a second of video was also a second of audio in the sound recorder. Of course, machines are not perfect and their clocks may measure a second slightly differently, which is why we connect them via timecode, but that's another story.

This session is set at 24 fps, so each second is divided into 24 frames.

Maybe this confusion comes from the fact that when you create a new session or project in your DAW, you basically set three things: sampling rate, bit depth and frame rate. So it feels like the audio inside it is going to have those three intrinsic values. But that is not the case with frame rate. In the context of the session, frame rate only tells your DAW how to divide a second. Into 24 slices? That would be 24 fps. Into 60 slices? That's 60 fps.
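To make the “dividing a second” idea concrete, here is a small sketch that formats the same position in seconds as timecode at different session frame rates (it ignores drop-frame timecode to keep things simple):

```python
def to_timecode(seconds: float, fps: int) -> str:
    """Format a position in seconds as hh:mm:ss:ff for a given frame rate."""
    total_frames = int(round(seconds * fps))
    frames = total_frames % fps
    total_seconds = total_frames // fps
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}:{frames:02d}"

# The same moment in time, just divided into a different number of slices:
print(to_timecode(90.5, 24))  # 00:01:30:12
print(to_timecode(90.5, 60))  # 00:01:30:30
```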

In this manner, when you bring your video into your DAW, the video's burnt-in timecode and your DAW's timecode will be perfectly in sync, but none of this changes anything about the duration or quality of the audio within the session.

So, in summary, an audio file only has an associated frame rate in the context of the video it was recorded with (or to), but this is not an intrinsic characteristic of the audio file and cannot be determined without the corresponding video.

Changing Frame Rate

A frame rate change is usually needed when the medium (cinema, TV, digital…) or the region changes. There are two basic ways of doing this. One of them manages it without changing the final duration of the film, usually by re-distributing, duplicating or deleting frames to accommodate the new frame rate. I won't go into details on these methods, partly because they are quite complex but mostly because, if the length of the final picture is not changed, we don't need to do anything to the audio. It will be in sync anyway.

Think about this for a second. We have changed the frame rate of the video but, as long as the final length is the same, our audio is still in sync, which kind of shows you that audio has no intrinsic frame rate. Disclaimer: this is true as long as the audio and film are kept separate. If audio and picture are on the same celluloid and you start moving frames around, you are obviously going to mess up the audio, but in our current digital age we don't need to worry about this.

The second method is the one that concerns us: the one where the length of the picture actually changes. This happens because it is the easiest way to fix the frame rate difference, especially if the difference is not very big.

Telecine. How video frame rate affects audio.

Let's use the telecine case as an example. Telecine is the process of transferring old-fashioned analogue film into video. This usually (though not always) also implies a change in frame rate. As we saw earlier, films are traditionally shot at 24 fps. If we want to broadcast a film on European television, which uses the PAL system at 25 fps, we need to go from 24 to 25 fps.

The easiest way to do this is to just play the original film 4% faster. The pictures will move faster and the movie will finish earlier, but the difference is tolerable. Also, if you can show the same movie in less TV time, that gives you more room for commercials, so win-win.

What are the drawbacks? First, showing the pictures 4% faster may be tolerable, but it is not ideal and can be noticeable in quick action sequences. Second, and more importantly, our audio will now be out of sync. We can always fix this by also playing the audio 4% faster (and this would traditionally be the case, since audio and picture were embedded in the same film), but then the pitch will be raised by about 0.68 semitones.

In the digital realm, we can achieve this by simply playing the audio at a different sampling rate than the one it was recorded at. This is the digital equivalent of just cranking the projector faster. Remember earlier when I said that an audio file will always have the same length if it is played at the same sample rate it was recorded at? This is when that becomes relevant. As you can see below, if we play a 48 kHz file at 50 kHz, we get the same speed-up effect that a change from 24 to 25 fps provides.

This would solve our sync problems, but as we were saying, it would increase the final pitch of the audio by about 0.68 semitones.
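Here is the arithmetic behind that as a quick sketch. I'm using the exact 25/24 ratio rather than a rounded 4%, which is why the semitone figure comes out closer to 0.71:

```python
import math

old_fps, new_fps = 24, 25
speed_factor = new_fps / old_fps                     # ~1.0417, i.e. about 4% faster
playback_rate = 48_000 * speed_factor                # play a 48 kHz file at 50 kHz
pitch_shift = 12 * math.log2(speed_factor)           # resulting pitch change in semitones

print(f"Speed factor: {speed_factor:.4f}")                  # 1.0417
print(f"Equivalent playback rate: {playback_rate:.0f} Hz")  # 50000 Hz
print(f"Pitch shift: +{pitch_shift:.2f} semitones")         # +0.71
```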

That increase in pitch may sound small but can be quite noticeable, especially in dialogue and musical sections. So how do we solve this? For many years the simple answer was: nothing. Just leave it as it is. But nowadays we are able to re-pitch the resulting audio so it matches its original sound or, alternatively, we can directly change the length of the audio file without affecting the pitch. More on these methods later, but first let's see what happens if, instead of making the reasonable jump from film to PAL, we need to go from film to NTSC.

Bigger frame rate jumps, bigger problems (but not for us).

If a jump from 24 to 25 fps is a 4% change, a jump from 24 to 29.97 would be a whopping 24.9%. That's way too much and it would be very noticeable. Let's not even think about the audio: everybody would sound like a chipmunk. So how is this accomplished? The method used is called a “2:3 pulldown”.

Now, this method is quite involved so I'm not going to explain the whole thing here, but let's see the basics and how it affects our audio. First, let's start with 30 fps, as this was the original frame rate for TV in NTSC. This makes sense because the electrical grid works at 60 Hz in the States. But, as tends to happen with people who are for some reason happy living this way, things were bound to get messy: after colour TV was introduced, and for reasons you can see here, the frame rate had to be dropped by a factor of 1000/1001 to 29.97.

A 2:3 pulldown uses the proportion of frames and the interlaced nature of the resulting video to make 4 frames fit into 5, since a 24/30 proportion is equal to a 4/5 proportion. Again, this is complex and goes beyond the scope of this article, but if you want more details this video can help.

But wait, we don't want to end up with 30 frames, we need 29.97, and this is why the first step is to slow the film down from 24 fps to 23.976. This difference is impossible to detect but crucial to make our calculations work. Once this is done, we can do the actual pulldown, which doesn't change the length of the film any further; it only re-arranges the frames.

What does all this mean for us audio people? It means that we only need to worry about that initial change from 24 to 23.976, which is just a 0.1% change. That's small, but it will still throw your audio out of sync over the length of a movie. So we just need to adjust the speed the same way we do for the 4% change. If you look again at the picture above, you'll see that 0.1% is the change we need to go from film to NTSC.
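To get a feel for how much 0.1% matters, here is a quick back-of-the-envelope calculation for a hypothetical two-hour film:

```python
movie_seconds = 2 * 60 * 60                 # a two-hour film, as an example
slowdown = 23.976 / 24                      # the 0.1% slowdown applied to the picture
new_duration = movie_seconds / slowdown     # the picture now runs slightly longer
drift = new_duration - movie_seconds

print(f"Picture now runs for {new_duration:.1f} s")                 # ~7207.2 s
print(f"Untouched audio would drift by {drift:.1f} s by the end")   # ~7.2 s
```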

As for the change in pitch, it will be very small, but we can still correct it if needed with the methods I show you below. But before that, here is a table for your convenience with all the usual frame rate changes and the associated audio adjustments:

Frame Rate Change | Audio Speed Change | Pitch Correction (if needed)
Film to PAL | 4% up | 4% down // 96% // -0.71 semitones
Film to NTSC | 0.1% down | 0.1% up // 100.1% // +0.02 semitones
PAL to Film | 4% down | 4% up // 104% // +0.68 semitones
PAL to NTSC | 4.1% down | 4.1% up // 104.1% // +0.70 semitones
NTSC to Film | 0.1% up | 0.1% down // 99.9% // -0.02 semitones
NTSC to PAL | 4.1% up | 4.1% down // 95.9% // -0.72 semitones
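If you are curious where those numbers come from, this small script computes them straight from the frame rates. It gives the exact values rather than the rounded percentages above, so a couple of the semitone figures land a hair away from the table:

```python
import math

conversions = {
    "Film to PAL":  (24, 25),
    "Film to NTSC": (24, 23.976),   # the 2:3 pulldown itself doesn't change the length
    "PAL to Film":  (25, 24),
    "PAL to NTSC":  (25, 23.976),
    "NTSC to Film": (23.976, 24),
    "NTSC to PAL":  (23.976, 25),
}

for name, (src, dst) in conversions.items():
    speed = dst / src                   # audio speed change
    correction = 1 / speed              # pitch correction factor to restore the original pitch
    semitones = 12 * math.log2(correction)
    print(f"{name}: speed x{speed:.4f}, correction {correction * 100:.1f}% "
          f"({semitones:+.2f} semitones)")
```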

Techniques & Plugins

There are two basic methods to do a pull up or pull down. The first involves two steps: first changing the duration of the file while affecting its pitch (by using a different sample rate, as explained before), and then applying pitch correction to match the original's tone. How to do the first step depends on your DAW but in Pro Tools, for example, you'll see that when importing audio you have the option to apply SRC (Sample Rate Conversion) to the file, as pictured above.

The second method is simply doing it all at once with a plugin capable of changing the length of an audio file without affecting its pitch.
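For the curious, here is a rough offline sketch of both approaches using SciPy and librosa. This is not what Pro Tools or any of the plugins mentioned below do internally, and the quality won't match a dedicated tool; the file name and the 24-to-25 fps pull up are just example assumptions:

```python
import math
import librosa
from scipy.signal import resample_poly

y, sr = librosa.load("mix_48k.wav", sr=None, mono=True)   # hypothetical 48 kHz mix
speed = 25 / 24                                            # film-to-PAL pull up

# Method 1, step 1: resample to fewer samples but keep the 48 kHz label.
# Played back at 48 kHz, the result is ~4% shorter and ~0.71 semitones higher.
y_fast = resample_poly(y, up=24, down=25)

# Method 1, step 2: pitch-correct back down by the same amount.
y_corrected = librosa.effects.pitch_shift(y_fast, sr=sr, n_steps=-12 * math.log2(speed))

# Method 2: do it in one go, changing the length without touching the pitch.
y_one_step = librosa.effects.time_stretch(y, rate=speed)

print(len(y), len(y_corrected), len(y_one_step))  # both results ~24/25 of the original length
```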

Also, keep in mind that these techniques can be applied not only to the final stereo or surround mix file but also to the whole session itself, which gives you much more flexibility to adjust your mix for the new version. This makes sense because a 4% change in speed could be enough to put two short sounds too close together, and/or the feel of the mix could end up a bit different. Personally, I have only used this “whole session” technique with shorter material like commercials. Here is a nice blog post that goes into detail about how to accomplish this.

As for changing a mixed file as a whole, whether you use a one-step or a two-step method, you will probably find that it is easy to introduce glitches, clicks and pops in the mix. Sometimes you get dialogue that sounds metallic. Phase is also an issue, since the time/pitch processing is not always consistent between channels.

The thing is, time/pitch shifting is not an easy thing to accomplish. Some plugins offer different algorithms to choose from depending on the type of material you have. These are designed with music in mind, not dialogue, so “Polyphonic” is usually the best option for whole mixes. Another trick you can use is to bounce your mix into stems (music, dialogue, FX, ambiences, etc.) and then apply the shift to each of them independently, choosing the best plugin and algorithm for each. This can be very time consuming but will probably give you the best results.

As you can see, this whole process is kind of tricky, particularly the pitch shift step, and this is why on some occasions the audio is corrected for sync but left at the wrong pitch. Nevertheless, nowadays we have better shifting plugins to do the job. Here are some of the most commonly used, although remember that none of them works perfectly on every occasion:

-Zplane Elastique: This is, in my opinion, the best plugin and the one I personally use. It produces the least artefacts, keeps phase coherent and works great on whole mixes, even with single-step processing.
-Pro Tools Pitch Shift: This is the stock time/pitch plugin that comes with Pro Tools. It is quite fast but prone to creating artefacts.
-Pro Tools X-Form: This one is more advanced (it comes bundled with Pro Tools Ultimate) but it still suffers from issues like giving dialogue a metallic tone or messing up the phase on stereo and surround files. Also, it is slow. Veeeery slow.
-Serato Pitch 'n Time: I haven't tried this one either, but I had to mention it since it is very commonly used and people swear by it.
-Izotope Time & Pitch: It can work well sometimes and offers many customizable settings that you can adjust to avoid artefacts.
-Waves Sound Shifter: Haven't used it, but it's another option that seems to work well for some applications.

Which one should you choose? There is no clear answer; you will need to experiment with some of them to see what works for each project. Here is a good article and video comparing some of them.

Conclusions

I hope you now have a somewhat better understanding of this messy subject. It is tricky on both a theoretical and a practical level, but I believe it is worth figuring out where things come from instead of just doing what others do without really knowing why. Here are some takeaways:

  • Sampling rate and bit depth are intrinsic to an audio file.

  • At the same time, an audio file can be associated with a certain video frame rate when they are both in sync.

  • The frame rate change process is different depending on the magnitude of the change.

  • An audio pull up or pull down is needed when there is a frame rate change on the picture that affects its length.

  • The pull up/down can be done in two steps (length change first, then pitch correction) or in a single step.

  • Time/Pitch Shift is a complicated process that can produce artefacts, metallic timbres and phase issues.

  • Mixes can be processed by stems or even as whole sessions for more flexibility.

  • Try different plugins and algorithms to improve results.

Thanks for reading!

Figuring out: Gain Staging

What is it?

Gain staging is all about managing the audio levels of the different stages within an audio system. In other words, when you need to make something louder, good gain staging is knowing where in the signal chain it is best to do it.

I will focus this article on mix and post-production work in Pro Tools, since this is what I do daily, but these concepts can be applied to any other audio-related situation, like recording or live sound.

Pro Tools Signal Chain

To start with, let's have a look at the signal chain in Pro Tools:

[Diagram: the Pro Tools signal chain]

Knowing and understanding this chain is very important when setting your session up for mixing. Note that other DAWs have different signal chains. Cubase, for example, offers pre- and post-fader inserts, while in Pro Tools every insert is pre-fader except for the ones on the master channel.

Also, I've added a sub mix bus (an auxiliary) at the end of the chain because this is how mixing templates are usually set up, and it is important to keep it in mind when thinking about signal flow.

So, let's dive into each of the elements of the chain and see their use and how they interact with each other.

Clip gain & Inserts

As I was saying, in Pro Tools, inserts are pre-fader. It doesn't matter how much you lower your track's volume: the audio clip always hits the plugins at its "original" level. This makes clip gain very handy, since we can use it to control clip levels before they hit the insert chain.

You can use clip gain to make sure you don't saturate the input of your first insert and to keep the level consistent between different clips on the same track. This last use is especially important when audio is going through a compressor, since you want roughly the same amount of signal being compressed across all the clips on a given channel.
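Conceptually (and only conceptually; this is a toy sketch rather than how Pro Tools actually processes audio), the pre-fader behaviour looks like this: clip gain changes what the insert sees, the fader doesn't:

```python
import numpy as np

def db_to_linear(db: float) -> float:
    return 10 ** (db / 20)

def toy_insert(x: np.ndarray) -> np.ndarray:
    """Stand-in for a plugin: a hard clipper that saturates anything above 1.0."""
    return np.clip(x, -1.0, 1.0)

clip = np.array([0.5, 1.2, -1.4, 0.8])   # a hypothetical hot clip, already over full scale

# Pre-fader chain: clip gain -> inserts -> fader
clip_gain_db, fader_db = -6.0, -20.0
into_insert = clip * db_to_linear(clip_gain_db)     # clip gain tames the level going in
out_of_insert = toy_insert(into_insert)             # no clipping happens here
output = out_of_insert * db_to_linear(fader_db)     # the fader only acts afterwards

print(abs(out_of_insert).max())       # 0.7 -> the insert never clipped
print(abs(toy_insert(clip)).max())    # 1.0 -> without clip gain it did, and no fader move undoes that
```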

So what if you want a post-fader insert? As I said, you can't directly change an insert to post-fader, but there is a workaround: if you want to affect the signal after the track's volume, you can route that track (or tracks) to an auxiliary and put the inserts on that aux. These inserts are then post-fader from the audio channel's perspective, but don't forget they are still pre-fader from the aux channel's own perspective.

Signal flow within the insert chain

Since the audio signal flows from the first to the last insert, when choosing the order of these plugins it is always important to think about the goal you want to achieve. Should you EQ first? Compress first? If you want a flanger, should it be at the end of the chain or maybe at the beginning?

I don't think there is a definitive answer and, as I was saying, the key is to think about the goal you have in mind and whichever order makes conceptual sense to your brain. EQ and compression order is a classic example of this.

The way I usually work is to use EQ first to reduce any annoying or problematic frequencies, usually with a high-pass filter to remove unnecessary low end. Once this is done, I use the compressor to control the dynamic range as desired. The idea behind this approach is that the compressor only works on the desired part of the signal.

I sometimes add a second EQ after the compressor for further enhancements, usually boosting frequencies if needed. Any other special effects, like a flanger or a vocoder, would go last in the chain.

Please note that, if you use the new Pro Tools clip effects (which I do use), these are applied to the clip before the fader and before the inserts.

Channel Fader

After the insert chain, the signal goes through the channel fader, or track volume. This is where you usually do most of the automation and levelling work. Good gain stage management makes working with the fader much easier: you want to be working close to unity, that is, close to 0.

This means that, after clip gain, clip effects and all inserts, you want the signal to be at your target level when the fader is hovering around 0. Why? Because this is where you have the most control, headroom and comfort. If you look closely at the fader, you'll notice it has a logarithmic scale: a small movement near unity might be 1 or 2 dB, but the same movement further down could be a 10 dB change. Mixing close to unity makes subtle, precise fader movements easy and comfortable.

Sends

Pro Tools sends are post-fader by default, and this is the behaviour you want most of the time. Sending audio to a reverb or delay is probably the most common use for a send, since you want to keep 100% of the dry signal and just add some wet, processed signal that changes in level as the dry signal does.

Pre-fader sends are mostly useful for recording and live mixing (sending a headphone mix is a typical example) and I don't find myself using them much in post. Nevertheless, one possible use in a post-production context is when you want to work with 100% of the wet signal regardless of how much of the dry signal is coming through. Examples could be special effects and/or very distant or echoey reverbs where you don't want to keep much of the original dry signal.

Channel Trim

Trim is pretty much like having a second volume lane per track. Why is this useful? I use trim when I already have an automation curve that I want to keep, but I want to make the whole thing louder or quieter in a dynamic way. Once you finish a trim pass, both curves coalesce into one. This is the default behaviour, but you can change it in Preferences > Mixing > Automation.

VCAs

VCAs are a concept that comes from analogue consoles (Voltage Controlled Amplifier) and they allow you to control the level of several tracks with a single fader. On a console they do this by controlling the voltage reaching each channel, but in Pro Tools, VCAs are a special type of track that has no audio, inserts, inputs or outputs. VCA tracks just have a volume lane that can be used to control the volume of any group of tracks.

So, VCAs are typically used when you want to control the overall level of a section of the mix as a whole, like the dialogue or the sound effects tracks. In terms of signal flow, VCAs just change a track's level via the track's fader, so you could say they act as a third fader (the second being trim).

Why is this better than just routing the same tracks to an auxiliary and changing the volume there? Auxiliaries are also useful, as you will see in the next section, but if the goal is just level control, VCAs have a few advantages:

  • Coalescing: after every pass, you can coalesce your automation, changing the target tracks' levels and leaving your VCA track flat and ready for the next pass.

  • More information: when using an auxiliary instead of a VCA track, there is no way to know whether a child track is being affected by it. If you accidentally move that aux fader, you may go crazy trying to figure out why your dialogue tracks are all slightly lower (true story). VCAs, on the other hand, show you a blue outline (see picture below) with the actual volume lane that would result after coalescing both lanes, so you can always see how a VCA is affecting a track.

  • Post-fader workflow: another problem with using an auxiliary to control the volume of a group of tracks is that, if you have post-fader sends on those tracks, you will still send that audio away regardless of the parent auxiliary's level. This is because the audio is sent away before it reaches the auxiliary. VCAs avoid this problem by directly affecting the child track's volume, and thus also how much is sent post-fader.

Sub Mix buses

This is the final step of the signal chain. After all the inserts, faders, trim and VCAs, the resulting audio signals can be routed directly to your output, or you can use a sub mix bus instead. This is an auxiliary track that sums the signals from a specific group of channels (like the dialogue tracks) and allows you to control and process each sub mix as a whole.

These are the kind of auxiliary tracks I was talking about in the VCA section. They may not be ideal for controlling the levels of a sub mix, but they are useful when you want to process a group of tracks with the same plugins or when you need to print different stems.

An issue you may find when using them is that you end up "fighting" for a sound to be loud enough. You feel that pushing the fader more and more doesn't really help and you barely hear the difference. When this happens, you've probably run out of headroom: pushing the volume doesn't seem to help because a compressor or limiter further down the signal chain (that is, acting as a post-fader insert) is squashing the signal.

When this happens, you need to go back and give yourself more headroom, by making sure you are not over-compressing or by lowering every track's volume until you are working at a manageable level. Ideally, you should be metering your mix from the start so you know where you are in terms of loudness. If you mix to a loudness standard like EBU R128, that should give you a nice, comfortable amount of headroom.

Final Thoughts

Essentially, mixing is about making things louder or quieter to serve the story that is being told. As you can see, it is important to know where in the audio chain the best place to do this is. If you keep your chain in order, from clip gain to the sub mix buses, making sure levels are optimal every step of the way, you'll be in control and have a better idea of where to act when issues arise. Happy mixing.

All you need to know about the decibel

Here is a bird's-eye view of the decibel and how understanding it can be useful if you work as a sound designer, sound mixer or, really, anywhere in the media industry.

I've included numbered notes with some extra information. So, enter, the decibel:

The Decibel is an odd unit. There are three main reasons for this: 

1: A Logarithmic Unit

Firstly, a decibel is a logarithmic unit [1]. Our brains don't usually enjoy the concept of logarithmic units, since we are used to things like prices, distances or weights, which usually grow linearly in our everyday lives. Nevertheless, logarithmic units are very useful when we want to represent a vast array of different values.

Let's see an example: if we take a value of 10 and make it 2, 3 or 5 times bigger, we'll see that the resulting value gets huge pretty fast on a logarithmic scale. [2]
  1. Note that I will use logarithmic units and logarithmic scales interchangeably.

  2. I'm using logarithms to base 10. It's the easiest to understand since we use the decimal system.

 
How much bigger? | Value on a linear scale | Value on a logarithmic scale
1 time | 10 | 10
2 times | 20 | 100
3 times | 30 | 1,000
4 times | 40 | 10,000
5 times | 50 | 100,000
 
The reason behind this difference is that, while the linear scale is based on multiplication, the logarithmic scale uses exponentiation. [3] Here is the same table but with the maths behind it, including the generic formula:
  3. Actually, the logarithm is just the inverse operation of exponentiation; that's why you will sometimes see exponential scales or units. They are basically the same as logarithmic ones.

 
How much bigger? | Value on a linear scale | Value on a logarithmic scale
1 time | 10 (10*1) | 10 (10^1)
2 times | 20 (10*2) | 100 (10^2)
3 times | 30 (10*3) | 1,000 (10^3)
4 times | 40 (10*4) | 10,000 (10^4)
5 times | 50 (10*5) | 100,000 (10^5)
X times | 10*X | 10^X
 

As you can see, with just a 5-times increment we get to a value of a hundred thousand. That can be very convenient when we want to visualise and work with a data set whose values range from dozens to millions.

Some units work fine on a linear scale because we usually move within a small range of values. For example, let's imagine we want to measure distances between cities. As you can see, most values are between 3000 and 18000 km, so they fit nicely on an old fashioned linear scale. It's easy to see how the distances compare.

Now, let's imagine we are still measuring distances between cities, but we are an advanced civilization that has founded some cities throughout the galaxy. Let's have a look:

As you can see, the result is not very easy to read: Orion is so far away that all the other distances are squashed on the chart. Of course, we could use light years instead of km, and that would be much better for the cities around other stars, but then we would have very small, hard-to-use numbers for the Earth cities. Another solution would be to measure Earth cities in kilometres and galactic cities in light years, but then we wouldn't be able to easily compare the values between them.

The logarithmic scale offers a solution to this problem, since it easily covers several orders of magnitude. Here is the same distance chart on a logarithmic scale: I just took the distances in kilometres and calculated their logarithms.

This is much more comfortable to use; we can get a better idea of the relationships between all these distances.
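If you want to play with the idea, here is a tiny sketch. The distances are made-up examples, not the ones from the charts:

```python
import math

# Hypothetical distances in km, from a nearby city to a city around another star.
distances_km = {
    "Paris": 1_200,
    "Tokyo": 9_700,
    "Moon Base": 384_400,
    "Mars Colony": 225_000_000,
    "Orion Outpost": 1.2e16,
}

for city, km in distances_km.items():
    # log10 squeezes fourteen orders of magnitude into values between ~3 and ~16.
    print(f"{city:14s} {km:>12.3g} km  ->  log10 = {math.log10(km):5.1f}")
```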

Like the city examples above, natural phenomena that span several orders of magnitude are more comfortably measured with a logarithmic scale. Some examples are pH, earthquakes and... you guessed it, sound loudness. This is the case because our ears can process both very quiet and very loud sounds. [4]
  4. It seems like we animals experience much of the world in a logarithmic way. This also includes sound frequency and light brightness. Here is a cool paper about it.

So the takeaway here is that we use a logarithmic scale for convenience and because it gives us a more accurate model of nature.

2: A Comparative Unit

Great, so we now have an easy-to-use scale to measure anything from a whisper to a jet engine; we just need to stick our sound level meter out of the window and check the number. Well, it's not that simple. When we say something is 65 dB, we are not just making a direct measurement, we are always comparing two values. This is the second reason why decibels are odd. Let me elaborate:

Decibels really express the ratio between a measured value and a reference value. In other words, they are a comparative unit. Just saying 20 dB is incomplete in the same way that just saying 20% is incomplete: we need to specify the reference value we are using. 20% of what? 20 dB with respect to what? So, what kind of reference value could we use? This brings me to the third reason:

3: A Versatile Unit

Although most people associate decibels with sound, they can be used to express ratios of any physical property. These properties can be related to audio (like air pressure or voltage) or have little or nothing to do with audio (like light or reflectivity on a radar). Decibels are used in all sorts of industries, not just audio: electronics, video and optics are some examples.

OK, with those three properties in mind, let's sum up what a decibel is.

A decibel is the logarithmically expressed ratio between two physical values
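If you like formulas, the standard definition behind that sentence is, for a power-like quantity $P$ compared against a reference $P_0$,

$$ L = 10 \log_{10}\!\left(\frac{P}{P_0}\right)\ \text{dB}, $$

and for field quantities like pressure or voltage, whose square is proportional to power,

$$ L = 20 \log_{10}\!\left(\frac{p}{p_0}\right)\ \text{dB}. $$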

Let that sink in and make sure you really get those three core concepts.
Now, let's see how we can use them to measure sound loudness, which is why we're here, if I remember correctly.

In space, nobody can hear you scream


As much as Star Wars tries to convince us otherwise, sound energy needs a physical medium to travel through. When sound waves disturb such a medium, there is a measurable pressure change as the atoms move back and forth. The louder the sound, the more intense this disturbance is.

Since air is the medium through which we usually experience sound, this gives us the most direct and obvious way of measuring loudness: we just need to register how the pressure changes in a particular volume of air. Pressure is measured in Pascals, so we are good to go. But wait, if this is the most direct way of measuring loudness, couldn't we just say that a pair of speakers is capable of disturbing the air with a pressure of 6.32 Pascals and forget about decibels?

Well, we could, but again, it wouldn't be very convenient. The speakers mentioned above reach 6.32 Pascals, which seems like a comfortable number to manage, but here are some other examples, from quiet to loud:

 
Source | Sound Pressure in Pascals (Pa) | Sound Pressure (mPa)
Microsoft's Anechoic Chamber | 0.0000019 | 0.0019
Human Threshold of Hearing @ 1 kHz | 0.00002 | 0.02
Quiet Room | 0.0002 | 0.2
Normal Conversation | 0.02 | 20
Speakers @ 1 meter | 6.32 | 6,320
Human Threshold of Pain | 63.2 | 63,200
Jet Engine @ 1 meter | 650 | 650,000
Rifle Shot @ 1 meter | 7,265 | 7,265,000
 

Unless you love counting zeros, that doesn't look very convenient, does it? Note how Pascals are not very comfortable for quiet sounds, while mPa (a thousandth of a Pascal) doesn't work very well for loud ones. If our goal is to create a system that measures sound loudness, one of the key things we need is a unit that can comfortably cover a large range of values, several orders of magnitude actually. To me, that sounds like a job for a logarithmic unit.

Moreover, measuring naked Pascals doesn't seem like a very useful thing to do when our goal is just to get an idea of how loud stuff is. A better approach would be to compare our measured value to a reference value and get the ratio between the two. This is starting to sound an awful lot like our previous definition of a decibel! We are getting somewhere.

So, what could we use as a reference level to measure the loudness of sound waves in the air? If you have a look at the table above, you'll notice a very good candidate: the human threshold of hearing. If we use it, 0 dB will be the very minimum pressure our ears can detect, and from there the numbers go up on a comfortable scale as the intensity increases. Even better, if we measure sounds that are below our ears' threshold, the resulting number will be negative, indicating not only that the sound would be imperceptible to us but also by how much. That's an elegant system right there. I'm starting to dig decibels.

Now, let's look at the previous Pascals table again, this time adding the corresponding decibel values:

 
Source | Sound Pressure in Pascals (Pa) | dBSPL
Microsoft's Anechoic Chamber | 0.0000019 | -20.53
Human Threshold of Hearing @ 1 kHz | 0.00002 | 0
Quiet Room | 0.0002 | 20
Normal Conversation | 0.02 | 60
Speakers @ 1 meter | 6.32 | 110
Human Threshold of Pain | 63.2 | 130
Jet Engine @ 1 meter | 650 | 150
Rifle Shot @ 1 meter | 7,265 | 171
 

That looks like a much easier scale to use. Remember that decibels are used to measure both very quiet things, like anechoic chambers, and very loud stuff, like space rockets. This scale does a better job across the whole range of human hearing; it is fine-tuned to those microphones we carry around and call ears.
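If you want to check these numbers yourself, here is a small sketch that converts some of the pressure values from the table into dBSPL; the results match the table within rounding:

```python
import math

P0 = 2e-5  # reference: human threshold of hearing at 1 kHz, in Pascals

def pascals_to_db_spl(pressure: float) -> float:
    # Pressure is a field quantity, so we use 20 * log10(ratio).
    return 20 * math.log10(pressure / P0)

for source, pa in [("Quiet Room", 0.0002),
                   ("Normal Conversation", 0.02),
                   ("Speakers @ 1 meter", 6.32),
                   ("Jet Engine @ 1 meter", 650)]:
    print(f"{source}: {pascals_to_db_spl(pa):.0f} dBSPL")
# Quiet Room: 20, Normal Conversation: 60, Speakers: 110, Jet Engine: 150
```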

Here is a nice infographic with some more examples so you get an idea of how some daily sources of sound fit in the decibel scale.

Decibel Flavours

Did you notice that in the table above there is a cute subscript after dB that reads SPL? What's up with that? That subscript stands for Sound Pressure Level and indicates a particular flavour of decibel. Since decibels can be based on any physical property, and since they can use any reference value, we can have many different flavours of decibel depending on which measured property and reference value are more convenient in each case.

In the case of dBSPL, this type of decibel is telling us two things: firstly, that the physical property we are using is pressure; secondly, that our reference value is the threshold of human hearing. This is fine for measuring the loudness of sound waves travelling through the air but, is audio information capable of travelling through other mediums?

We have learned to transform the frequency and amplitude information contained in sound waves in the air into grooves in a record or streams of electrons in a cable. That's a pretty remarkable feat that deserves its own post but for now let's just consider that we are able to "code" audio information into flows of electrons that we can measure.

Since dBs can be used with any physical property, we can use units from the realm of electronics, like watts or volts, to measure the level of an electrical audio signal. In this sense, both Pascals and volts give us an idea of how intense an audio signal is, even though they refer to very different physical properties.

So, we need to establish which units and reference values are useful for building new decibel flavours. We also need to label each particular flavour of dB somehow; this is usually done with a subscript (dBSPL) or a suffix (dBu).

Let's have a look at some of the most used decibel flavours:

dB Unit | Property Measured (Unit) | Reference Value | Used on
dBSPL | Pressure (Pascals) | 2*10^-5 Pa (human threshold of hearing) | Acoustics.
dBA, dBB and dBC | Pressure (Pascals) | 2*10^-5 Pa (human threshold of hearing) | Acoustics, when accounting for human sensitivity to different frequencies.
dBV | Electric potential (Volts) | 1 V | Consumer audio equipment.
dBu | Electric potential (Volts) | 0.7746 V | Professional audio equipment.
dBm | Electric power (Watts) | 1 mW | Radio, microwave and fiber-optic communication networks.

As you can see, we can also use units from the electrical realm to measure how loud an audio signal is; we simply choose the most convenient unit depending on the context. Ideally, when using decibels, the type should be stated, although sometimes it has to be inferred from the context.

If you read dB values on a mixing desk, for example, chances are they are dBu, since this is the unit usually used in professional audio. When shopping for a pair of speakers or headphones, SPL values are usually given. Finally, when measuring things like an office space or a computer fan, you will see dBA, dBB or dBC. These are virtually the same as dBSPL but they apply different weighting filters that account for the fact that we are more sensitive to some frequencies than others, in order to get a more accurate result.
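As a quick illustration of how the electrical flavours work, here is the same formula with the two voltage references; the 1.228 V figure is just an example (it happens to be the classic professional "+4 dBu" line level):

```python
import math

def volts_to_dbv(v: float) -> float:
    return 20 * math.log10(v / 1.0)       # dBV: referenced to 1 volt

def volts_to_dbu(v: float) -> float:
    return 20 * math.log10(v / 0.7746)    # dBu: referenced to 0.7746 volts

signal = 1.228  # volts, a hypothetical line-level signal
print(f"{signal} V = {volts_to_dbv(signal):.2f} dBV = {volts_to_dbu(signal):.2f} dBu")
# 1.228 V = 1.78 dBV = 4.00 dBu
```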

And that's all, folks. I left several things out of this post because I wanted to keep it focused on the basics; the decibel has some more mysteries to unravel, but I'll leave those for a future post. In the meantime, here are some bullet points to refresh what you've learned:

Takeaways

The decibel:

  • Uses the logarithmic scale which works very well when displaying a wide range of values.

  • Is a comparative unit that always uses the ratio between a measured value and a reference value.

  • Can be used with any physical property, not only sound pressure.

  • Uses handy reference values so the numbers we manage are more meaningful.

  • Comes in many different flavours depending on the property measured and the reference value.

Shotgun Microphones Usage Indoors

Note: This is an entry I recovered from the old version of this blog and, although it is around 5 years old (!), I still think the information is relevant and interesting. So here is the original post with some grammar and punctuation fixes. Enter 2012 me:

So I have been researching an idea that I have been hearing for a while:

"It’s not a good idea to use a shotgun microphone indoors."

Shotgun microphones

The main goal of these devices is to enhance the on-axis signal and attenuate the sound coming from the sides. In other words, to make the microphone as directional as possible in order to avoid unwanted noise and ambience.

To achieve this, the system cancels unwanted side audio by delaying it; the operating principle is based on phase cancellation. Originally, the system consisted of a series of tubes of different lengths that allowed the on-axis signal to arrive early but forced the off-axis signals to arrive delayed. This design, created by the prolific Harry Olson, eventually evolved into the modern shotgun microphone design.

Indirect signals arrive delayed. Sketch by http://randycoppinger.com/

In Olson's original design, in order to improve directivity you had to add more and more tubes, making the microphone too big and heavy to be practical. To solve this, the design evolved into a single tube with several slots that behave in an equivalent manner to the old additional tubes. These slots make the off-axis sound waves hit the diaphragm later, so when they are combined with the direct sound, a cancellation occurs that favours the on-axis signal.

This system has its limitations: the tube needs to be long if we want it to cancel low enough frequencies. For example, a typical 30 cm (12″) microphone would start behaving like a cardioid (with a rear lobe) below 1,413 Hz. If we wanted to go lower, the microphone would need to become too big and heavy. Like this little fellow:

Electro-Voice 643, a 2-meter beast that kept its directionality as low as 700 Hz. Call for a free home demonstration!
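To get a feel for the scale involved, here are the wavelengths at those two frequencies. The exact frequency at which a given tube stops being directional depends on the design, so take this only as an order-of-magnitude illustration:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

for freq in (1_413, 700):
    wavelength = SPEED_OF_SOUND / freq
    print(f"{freq} Hz -> wavelength of about {wavelength:.2f} m")
# 1413 Hz -> ~0.24 m, 700 Hz -> ~0.49 m: lower frequencies mean longer
# wavelengths, which is why the interference tube has to grow so much.
```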

On the other hand, making the microphone longer makes the on-axis angle narrower, so the more directive the microphone is, the more important correct axis alignment becomes. The phase cancellation principle also brings consequences like comb filtering and undesirable coloration when we go off axis. This can work against us when it is hard to keep the microphone in place, which is why these microphones are usually operated by hand or on cranes or boom poles.

In this Sennheiser 416 simplified polar pattern, we can appreciate the directional high frequencies (in red) curling on the sides. The mid frequencies (in blue) show a behaviour somewhere between the highs and a typical cardioid pattern (pictured in green) with a rear lobe.

[Image: generic shotgun microphone polar pattern]

This other pattern shows an overall shotgun microphone polar pattern. The side irregularities and the rear lobe are a consequence of the interference system.

Indoor usage

The multiple reflections in a reverberant space, especially the early reflections, will alter how the microphone interprets the signals that reach it. Ideally, the microphone, depending on the incidence angle, will determine whether the sound is relevant (wanted signal) or just unwanted noise. When both the signal and the noise get reflected by nearby surfaces, they enter the microphone at "unnatural" angles (if we consider the direct sound trajectory to be the natural one). The noise is then not properly cancelled, since it does not get correctly identified as actual noise. Moreover, part of the useful signal will be cancelled, because it is identified as noise.

For that reason, shotgun microphones will work best outdoors or at least in spaces with good acoustic treatment.

Another aspect to keep in mind is the rear lobe that these microphones have. As we saw earlier, this lobe captures mostly low frequencies so, again, a bad-sounding room that reinforces certain low frequencies is something we want to avoid when using a shotgun microphone. When we have a low ceiling, we are sometimes forced to keep the microphone very close to it, so the rear lobe and the proximity effect combine and can make the microphone sound nasty. This is not a problem on a professional movie set, where you have high ceilings and good acoustics. In fact, shotgun microphones are a popular choice in these places.

Lastly, the shotgun's size can be problematic to handle in small places, especially when we need precision to stay on axis.

The alternative

So, for indoors, a better option would be a pencil hypercardioid microphone. They are quite a bit smaller and easier to handle in tight spaces, and more forgiving with axis placement. Moreover, they don't have an interference tube, so we won't get unwanted colorations from the room reflections.

It's worth noting that these microphones still have a rear lobe that affects even the mid-high frequencies, but it is not as pronounced.

So hypercardioid pencil microphones are a great choice for indoor recording. When compared to shotguns, we are basically trading off directionality for a better frequency response and a smaller size.