Pro Tools Batch Rename & Regular Expressions

Batch renaming was introduced into Pro Tools at the end of 2017, with version 12.8.2. Since then, I haven’t had much of a chance to use this feature since most of my work has been mixing and sound design. Nevertheless, after a few recent days of recording voice acting, and all the editing that comes with it, I have been looking a bit into this feature.

So this is a quick summary of what you can do with it with some tips and examples.

Operations

There are two batch rename windows in Pro Tools, one for clips and another for tracks. They are, for the most part, identical. You can open each of them with the following shortcuts:

  • Clips: CTRL + SHIFT + R

  • Tracks: OPTION + SHIFT + R

Both windows also have a preset manager which is great to have.

As you can see, there are four different operations you can do: Replace, Trim, Add and Numbering. As far as I can tell, the different operations are always executed from top to bottom, so keep that in mind when designing a preset. Let’s see each of them in more detail:

Replace (CMD + R) allows you to search for any combination of letters and/or numbers and replace with a different one. The “Clear Existing Name” checkbox allows you to completely remove any previous name the track or clip had. This option makes sense when you want to start from scratch and use any of the other operations (add and numbering) afterwards.

For example, let’s say you don’t like when Pro Tools adds that ugly “dup1” to your track name when duplicating them. You could use a formula like this:

Original names    New names

FX 1.dup1         FX 1 Copy
FX 2.dup1         FX 2 Copy
FX 3.dup1         FX 3 Copy

You may realise that this would only work with the first copy of a track. Further copies of the same track will be named “…dup2, ...dup3” so the replace won’t work. There is a way to fix that with the last checkbox, “Regular Expressions”. This allows you to create complex and advanced functions and is where the true power of batch renaming resides. More about it later.
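As a rough preview of that fix, here is a minimal sketch written with Python's `re` module rather than Pro Tools itself. The pattern `\.dup\d+$` and the helper name `rename_duplicate` are my own assumptions for illustration; the idea is simply to treat whatever number follows "dup" as "one or more digits", so every copy gets the same treatment.

```python
import re

# Hypothetical pattern: a literal ".dup" followed by one or more digits
# at the very end of the name.
DUP_SUFFIX = re.compile(r"\.dup\d+$")

def rename_duplicate(name: str) -> str:
    """Replace a trailing '.dupN' with ' Copy'; other names are left untouched."""
    return DUP_SUFFIX.sub(" Copy", name)

for name in ["FX 1.dup1", "FX 2.dup2", "FX 3.dup13", "FX 4"]:
    print(name, "->", rename_duplicate(name))
```

The regex engine inside Pro Tools may differ from Python's in some details, so it is worth testing the pattern on a couple of tracks before saving it as a preset.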

Trim (CMD + T) is useful when you want to shave off a known amount of characters from the beginning or end of the name. You can even use the range option to remove characters right in the middle. This of course makes the most sense when you have a consistent name length, since any difference in size will screw up the process.

So, for example, if you have the following structure and you want to remove the date, you can use the following operation:

Original names                   New names

Show_EP001_Line001_280819_v01    Show_EP001_Line001_v01
Show_EP001_Line002_280819_v03    Show_EP001_Line002_v03
Show_EP001_Line003_280819_v02    Show_EP001_Line003_v02
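To make the range idea concrete, here is a small Python sketch of the same operation done with string slicing. The function name `trim_range` is hypothetical, and the positions are Python's zero-based indices, which are not necessarily how the Pro Tools window counts characters.

```python
def trim_range(name: str, start: int, end: int) -> str:
    """Remove the characters from index start up to (not including) end."""
    return name[:start] + name[end:]

names = [
    "Show_EP001_Line001_280819_v01",
    "Show_EP001_Line002_280819_v03",
    "Show_EP001_Line003_280819_v02",
]

# "_280819" occupies zero-based indices 18 to 24 in every one of these names.
for name in names:
    print(trim_range(name, 18, 25))
```

Because the cut is purely positional, a name with one extra character before the date would lose the wrong characters, which is exactly why a consistent name length matters here.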

Add (CMD + D) lets you insert prefixes and suffixes, pretty much doing the opposite of Trim. You can also insert any text at a certain index in the middle of the name.

We can add to the previous example a suffix to mark the takes that are approved. It would look like this:


Original names            New names

Show_EP001_Line001_v01    Show_EP001_Line001_v01_Approved
Show_EP001_Line002_v03    Show_EP001_Line002_v03_Approved
Show_EP001_Line003_v02    Show_EP001_Line003_v02_Approved

Finally, Numbering (CMD + N) is a very useful operation that allows you to add any sequence of numbers or even letters at any index. You can choose the starting number or letter and the increment value. As far as I can tell, this increment value can’t be negative. If you want to use a sequence of letters, you need to check the box “Use A..Z” and in that case the starting number 1 will correspond with the letter “A”.

If we are dealing with different layers for a sound, we could use this function to label them like so:

Original names    New names

Plasma_Blaster    Plasma_Blaster_A
Plasma_Blaster    Plasma_Blaster_B
Plasma_Blaster    Plasma_Blaster_C

As you can see, in this case, we are using letters instead of numbers and an underscore to separate them from the name. Also, you can see that in the case of clips, you can choose whether the order comes from the timeline itself or from the clip list.
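The numbering logic can be sketched in a few lines of Python. This is only an illustration of the behaviour described above (starting value, increment, and 1 mapping to "A"); the function `number_names` and its parameters are hypothetical, and what Pro Tools does once the sequence goes past "Z" is not something this sketch tries to reproduce.

```python
from string import ascii_uppercase

def number_names(names, start=1, increment=1, use_letters=False, separator="_"):
    """Append a running number (or letter, where 1 maps to 'A') to each name."""
    result = []
    for position, name in enumerate(names):
        value = start + position * increment
        if use_letters:
            if not 1 <= value <= 26:
                raise ValueError("this sketch only covers the letters A to Z")
            label = ascii_uppercase[value - 1]
        else:
            label = str(value)
        result.append(f"{name}{separator}{label}")
    return result

print(number_names(["Plasma_Blaster"] * 3, use_letters=True))
print(number_names(["Plasma_Blaster"] * 3, start=10, increment=5))
```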

Regular Expressions

Regular expressions (or regex) are a kind of unified language or syntax used in software to search, replace and validate data. As I was saying, this is where the true power of batch renaming is. In fact, it may be a bit overkill for Pro Tools, but let’s see some formulas and tips to use regular expressions in Pro Tools.

This stuff gets tricky fast, so you can follow along trying out the examples in Pro Tools or using https://regex101.com/.

Defining searches

First off, you need to decide what you want to find in order to replace it or delete it (replace with nothing). For this, of course, you can search for any term like “Take” or “001” but obviously, you don’t need regex for that. Regex shines when you need to find more general things like any 4 digit number or the word “Mic” followed by optional numbers. Let’s see how we can do all this with some commands and syntax:

[…] Anything between brackets is a character set. You can use “-” to describe a range. For example, “[gjk]” would search for either g, j or k, while “[1-6]” means any digit from 1 to 6. We could use “Take[0-9]” to search for the word “Take” followed by any 1 digit number.

Curly brackets are used to specify how many times we want to find a certain character set. For example, “[0-9]{5}” would look for any combination of numbers that is 5 digits long. This could be useful to remove or replace a set of numbers, like a date, whose length is always constant. You can also use “[0-9]{5,8}” to search for any number which is between 5 and 8 digits long. Additionally, “[0-9]{5,}” would look for any number that is 5 digits or longer.

There are also certain special instructions to search for specific sets of characters. “\d” looks for any digit (number) type character, while “\w” would match any letter, digit or underscore character. “\s” finds any whitespace character (normal spaces or tabs).
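If you prefer to try these building blocks outside Pro Tools, here is a short sketch using Python's `re` module. The sample names are made up for the occasion, and the regex engine in Pro Tools may not behave identically in every detail, so take it as a sandbox rather than a reference.

```python
import re

names = ["Take7", "Take12", "Show_280819_v01", "Mic 2"]

# Character set after a literal: "Take" followed by one digit.
print([n for n in names if re.search(r"Take[0-9]", n)])

# Quantifier in curly brackets: an underscore plus exactly six digits
# (for example a date such as 280819), replaced with nothing.
print(re.sub(r"_[0-9]{6}", "", "Show_280819_v01"))

# Shorthand classes: \d is a digit, \s is whitespace,
# \w is a letter, digit or underscore.
print(re.findall(r"\d", "Mic 2"))
print(re.findall(r"\s", "Mic 2"))
print(re.findall(r"\w", "Mic 2"))
```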

Modifiers

When defining searches, you can use some modifiers to add extra meaning. Here are some of the most useful:

. (dot or full stop) Matches any single character. So, “Take_.” would match “Take_” plus whichever character comes right after the underscore.
+ (plus sign) Matches one or more of whatever precedes it. We could use “Take_.+” to match any number of characters coming after the underscore.
^ (caret) When used at the start of a character set, it means “everything but what comes after this character”. So “[^a-d]” would match any character that is not a, b, c or d.
? (question mark) Makes the preceding element optional. So, for example, “Mic\d?” would match the word Mic by itself and also Mic followed by a 1 digit number.
* (asterisk) Also makes the preceding element optional, but allows multiple instances of it. In a way, it is a combination of + and ?. So, for example, “Mic\d*” would match “Mic” by itself, “Mic6” but also “Mic456” and, in general, the word Mic with any number of digits after it.
| (vertical bar) Is used to express the boolean “or”. So, for example, “Approved|Aproved” would search for either of these options and apply the same processing to both if they are found.
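Each of the modifiers above can be checked quickly with Python's `re` module. This is just a sketch with made-up test strings and a small hypothetical helper called `matches`; it shows what each pattern picks up, not how Pro Tools implements it.

```python
import re

def matches(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)

# Dot: exactly one character after the underscore.
print(matches(r"Take_.", "Take_12"))
# Plus: one or more characters after the underscore.
print(matches(r"Take_.+", "Take_12"))
# Caret inside a set: anything except a, b, c or d.
print(matches(r"[^a-d]", "abcdef"))
# Question mark: zero or one digit after "Mic".
print(matches(r"Mic\d?", "Mic Mic6 Mic456"))
# Asterisk: zero or more digits after "Mic".
print(matches(r"Mic\d*", "Mic Mic6 Mic456"))
# Vertical bar: either spelling.
print(matches(r"Approved|Aproved", "Aproved take, Approved take"))
```

Note how “Mic\d?” only takes the first digit of “Mic456”, while “Mic\d*” takes all three.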

Managing multiple regex in the same preset

You sometimes want to process several sections of a name and replace them with different things, regardless of their position and the content around them. To achieve this, you could create a regex preset for each section, but it is also possible to have several regex formulas in just one. Let’s see how we can do this.

In the “Find:” section, we need to use (…) (parentheses). Each section enclosed between parentheses is called a group. A group is just a set of instructions that is processed as a separate entity. So if we want to search for “Track” and also for a 3 digit number we could use a search like this one: “(Track)(\d{3})”. Now, it is important to be careful with what we put between the two groups, depending on our goals. With nothing in between, Pro Tools would strictly search for the word Track immediately followed by a 3 digit number. We may want this, but typically what we want is to find those terms anywhere in the name and in whichever order. For this, we could use a vertical bar (|) between the two groups like so: “(Track)|(\d{3})”, which is telling Pro Tools: hey, search for this or for this and then replace either of them with whatever.

But what if you want to replace each group with a specific, different thing? This is easily done by also using groups in the “Replace” section. You need to identify each of them with “?1”, “?2” and so on. So the example on the right would search for the word “Track” anywhere in the name and replace it with “NewTrack” and then it would search for any 3 digit number and replace it with “NewNumbers”.
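The same idea can be tried in Python, with one caveat: Python's `re` module has no “?1”, “?2” replacement syntax, so this sketch swaps in a replacement function that looks up which group matched. The pattern assumes “3 digit number” means three digits in a row, and the names `REPLACEMENTS` and `replace_by_group` are hypothetical.

```python
import re

# Two alternatives, each in its own group:
# the word "Track" or three digits in a row.
PATTERN = re.compile(r"(Track)|(\d{3})")

# Replacement text per group number, playing the role of "?1" and "?2".
REPLACEMENTS = {1: "NewTrack", 2: "NewNumbers"}

def replace_by_group(name: str) -> str:
    # m.lastindex is the number of the group that actually matched.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.lastindex], name)

print(replace_by_group("Track_001"))
print(replace_by_group("001_Track"))
```

Since the groups are separated by a vertical bar, the order in which they show up in the name makes no difference.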

Here is a more complex example, involving 4 different groups. If you have a look at the original names, you will see this structure: “Show_EpisodeNumber_Character_LineNumber”. We want to change the character and show to their proper names. We are also using a “v” character after the line number to indicate that this is the take approved by the client, so it would be nice if we could transform this into the string “Approved”. Finally, Pro Tools adds a dash (-) and some numbers after you edit any clip and we want to get rid of all of this. If you have a look at our regex, you will see that we can solve all of this in one go. Also, notice how the group order is not important since we are using vertical bars to separate them. You will see that in the third group, I’m searching for anything that comes after a dash and replacing it with just nothing (i.e., deleting it), which can be very handy sometimes. So the clip names will change like so:

Original names                 New names

Show_045_Character_023-01      Treasure_Island_045_Hero_023
Show_045_Character_026v-03     Treasure_Island_045_Hero_026_Approved
Show_045_Character_045v-034    Treasure_Island_045_Hero_045_Approved
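The exact regex from the preset is not written out in the text, so the following Python sketch is my own reconstruction that happens to produce the same renames as the table. The rule list, the group order and the function name `rename_clip` are all assumptions, and it relies on a lowercase “v” never appearing anywhere else in the original names.

```python
import re

# Hypothetical rules; each pattern becomes one group in a single regex.
RULES = [
    (r"Show", "Treasure_Island"),  # show placeholder -> real show name
    (r"Character", "Hero"),        # character placeholder -> real character
    (r"-\d+", ""),                 # dash plus edit numbers -> deleted
    (r"v", "_Approved"),           # "v" marker -> "_Approved"
]

# Join the patterns with vertical bars: (Show)|(Character)|(-\d+)|(v)
PATTERN = re.compile("|".join(f"({pattern})" for pattern, _ in RULES))

def rename_clip(name: str) -> str:
    # m.lastindex tells us which group matched, so we can pick its replacement.
    return PATTERN.sub(lambda m: RULES[m.lastindex - 1][1], name)

for name in ["Show_045_Character_023-01",
             "Show_045_Character_026v-03",
             "Show_045_Character_045v-034"]:
    print(name, "->", rename_clip(name))
```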

Other regex functions that I want to learn in the future

I didn’t have time to learn or figure out everything that I think regular expressions could do, so here is a list of things I would like to research in the future. Maybe some of them are impossible for now. If you are also interested in achieving some of these things, leave a comment or send me an email and I could have a look in the future.

  • Command that adds the current date with a certain format.

  • Commands that add meta information like type of file, timecode stamp and such.

  • Syntax that allows you to search for a string of characters, process them in some way, and then use the result in the replace section.

  • Deal with case sensitivity.

  • Capitalize or uncapitalize characters.

  • Conditional syntax. (If you find some string do A, if you don´t, do B).

Regex Resources:

https://regex101.com/
https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
https://www.youtube.com/playlist?list=PL4cUxeGkcC9g6m_6Sld9Q4jzqdqHd2HiD

Conclusion

I hope you now have a better understanding of how powerful batch renaming can be. With regular expressions I just wanted to give you some basic principles to build upon and have some knowledge to start building more complex presets that can save you a lot of time.

Figuring out: Shepard Tone

The Shepard Tone is an interesting audio illusion that creates the impression of an always rising or falling pitch that really doesn’t get anywhere. Despite feeling like it is always going up or down, it is stuck in an eternal auditory fractal. How is this possible? And could it be useful for sound design?

The Penrose Stairs is a nice visual equivalent. Depending on perspective, it looks like they are always going up or down but we are really just going in circles.

History & Working Principles

Roger Shepard described this idea in his 1964 paper “Circularity in Judgements of Relative Pitch”. The original concept was conceived in a musical context with pitches jumping in discrete steps, aka notes.

He basically stated that to create an apparently always rising melody, we would need to create a circular or looping pattern consisting of sets of ascending notes that are faded in and out with a specific timing.

So we would start with just one ascending scale of notes. This scale will get to the end of the instrument range pretty soon so the trick is to sneak in a new set of notes doing the same thing but fading them in slowly as we fade out the previous set.

If we do this with the proper timing, we get the feeling of an eternally ascending scale.

You can see that working in the following example:

As you can see, the key to getting this effect is to use volume to mask the replacement of older octaves with newer octaves that always start from a lower tone, giving the illusion of an ascending tone overall, despite the average pitch staying constant.

Here is the same concept written in MIDI in Pro Tools. Volume is indicated by the velocity bars below. The lower green notes are rising in volume, while the blue higher ones are fading out. The central red notes stay at the same volume. Again, the average pitch is always the same, but the dynamic changes in the 3 scales provide the illusion of eternal ascension.

Later on, Jean-Claude Risset created a continuous version, called the Shepard-Risset glissando where the pitch glides without discrete jumps, making the overall effect more convincing and seamless.

In this case, the principles stay the same: there is always a new octave gradually fading in to replace the octave that will gradually fade out. This version can be much more useful for sound design, although to achieve it we need to use instruments or synths that are able to glide smoothly through different pitch values.
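To see the mechanics of the glissando in numbers, here is a minimal Python sketch that only computes the frequency and level of each octave layer at a given moment. The raised-cosine volume curve, the six octaves and the 27.5 Hz base are my own assumptions for illustration, not values taken from Shepard or Risset, and the function `shepard_components` is hypothetical.

```python
import math

def shepard_components(phase, octaves=6, base_hz=27.5):
    """Return (frequency_hz, amplitude) pairs for one instant of a rising glissando.

    phase runs from 0.0 to 1.0; over one full cycle every layer climbs
    exactly one octave and the top layer wraps around to the bottom.
    """
    components = []
    for i in range(octaves):
        # Position of this layer, in octaves above base_hz.
        position = (i + phase) % octaves
        frequency = base_hz * 2.0 ** position
        # Raised-cosine curve: silent at both ends of the range,
        # loudest in the middle, so layers sneak in and out unnoticed.
        amplitude = 0.5 * (1.0 - math.cos(2.0 * math.pi * position / octaves))
        components.append((frequency, amplitude))
    return components

for frequency, amplitude in shepard_components(0.25):
    print(f"{frequency:8.1f} Hz  level {amplitude:.2f}")
```

After one full cycle the set of layers is exactly the one we started with, which is why the tone can keep "rising" forever. Feeding each pair to a sine oscillator while the phase keeps advancing would give the audible version, and running the phase backwards would give a falling tone.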

Risset also tried to apply the concept to rhythm, layering different versions of a beat at proportional tempos (30, 60, 120 & 240 for example) and fading them in and out to create the illusion of an always accelerating rhythm. Check out these examples created by music researcher Dan Stowell. Below you can see how one of them looks in the spectrogram. Notice the upwards pattern and how the different versions fade in and out.

Building our own designs

Now that we have a good idea of how these effects work, let’s see if we can get creative and build a Shepard-Risset tone that could be useful in a sound design context.

I first tried using Native Instruments’ Form, since it is a sample-based synthesizer where you can use any sample as a source. I used this tutorial as a starting point.

Basically, you trigger several octaves at once and use two LFOs, one to control the pitch so it’s always rising and a second to control the oscillator level so it rises and then falls. Also, I adjusted the general envelope so sounds have a long attack and release just so they blend together as they come and go. This is the result just using a sine wave:

It basically works but the overlapping is a bit noticeable. I then tweaked the timing and started to play with different sounds:

Form only gives you 30 minutes to demo the plugin, so I decided to use the limited time to go for one of the most obvious applications of the Shepard tone: an engine ramping up or down. Here are some of the ones I came up with. Keep in mind that the advantage of generating these is that you have an infinite amount of acceleration and deceleration, which can be very handy for later editing.

In all of them the trick is quite obvious: you can tell where the sound is re-starting and new octaves are fading in. I think you could fix this by playing with the volume values (although I don’t know if an LFO is the most comfortable way of doing this) or maybe by using more octaves.

Lastly, this example is interesting because, on top of the Shepard effect, I was changing the length of the sample to enhance the feeling of acceleration: as the sample gets shorter, the engine seems to go faster. I tried to play around with the plugin, kind of driving in real time. This could also have interesting applications for video games.

After this, my demo expired and I felt I didn’t have enough time to improve the effect and play around with the settings. So I looked for an alternative and, after some failed experiments, I found “Endless Series”, a specialized Shepard plugin by Oli Larkin.

It offers two synthesiser modes plus four audio processing modes so you can create Shepard tones from scratch or using an audio sample as a base.

There are also a nice amount of variables you can tweak to customise the result. So let’s hear some of the tones I got from this plugin.

First, here are some examples just using the synthesizer built into the plugin. You can create a discrete or a continuous (glissando) tone (Example 1). In the case of a discrete or stepped tone, you can use several different musical scales. A chromatic scale will give you the classical Shepard feel (Example 2) but you can also play with other, more exotic ones. In Example 3 below I tried creating a dreamy, impressionist whole tone one. It is cool, but that last one doesn’t have much of the going up in pitch feeling.

The plugin also works nicely if you want to create engines. Here are a couple of examples.

As for the audio processing modes, there are different effects that you can apply. The simplest mode, am input, just applies the Shepard processing to the sample as far as I can tell. It works in a strange way with tonal content: the Shepard effect is not very pronounced and it adds a descending tone for some reason, which doesn’t help. Here is this mode applied to just a sine wave:

This same mode goes nuts with noisier content. Here is another example using an engine sound. As you can hear, the am input mode introduces a lot of noise and artefacts. I tried playing around with the settings and using other source material but I could not make it sound clean. I don’t know if I’m missing something.

But there are two other modes that can give better results. There is a flanger and a phaser setting. As you can hear, they sound much cleaner, although the effect is quite mild in the case of the phaser. I just wish there was a way to have a “sheparded” sound as clean as this but without the flanger effect on top.

In summary, I feel that I didn’t find the perfect “Shepard Machine” but I’m sure that there are other options out there. I was also thinking that there is probably no plugin that can do everything perfectly (like sample based and synth based and musical options, etc…) so maybe an array of different plugins may be needed for different purposes.

Use in media

Shepard tones have been used in several music and film projects, sometimes in a subtle way, other times quite explicitly.

In music, they can give a very trippy and psychedelic feel (see Pink Floyd’s “Echoes” below) or they can be used to create rising tension (Used extensively in movies like “Dunkirk” or “Flight”). As some of my examples from before showed, they can also be used to create fantasy or sci-fi vehicle engines. In “The Dark Knight”, Nolan wanted the batpod to feel like an unstoppable force that doesn’t even shift gears which sounds like a perfect use case for the Shepard tone.

But probably my favourite example, and maybe this is just nostalgia, is in Super Mario 64, which features an endless staircase that you need to overcome to get to the final boss. The game gives you the illusion of an eternal ascent but you are just running on a “virtual treadmill” and getting nowhere. Analogously, the music is using a Shepard tone to achieve the same effect, an apparent ascension that is really just circular. A great example of a Shepard tone used in an interactive environment.

Exploring Sound Design Tools: Morph 2

Morph 2, made by the German company Zynaptiq, is based on the original Morph plugin made by Prosoniq years ago. It applies a very simple but powerful concept: creating a hybrid between two different sounds by fusing together their timbre and dynamic characteristics.

Let’s see what the plugin offers plus some sound design examples.

Setup

Morph 2 works by combining two mono or stereo tracks into a new stereo or quad auxiliary track. In the screenshot there, you can see two stereo tracks being used as sources. This is the method recommended by Zynaptiq.

There is also a side-chain option but it only supports mono sources.

Features & Interface

As you can see, the interface is quite clean and simple. Let’s see which features Morph offers:

The X/Y Section

This central section combines a crossfade and a morphing control in an X/Y type interface. This may look simple at first glance but it has some interesting properties. Starting at the bottom left corner and moving vertically upwards, you would be morphing from the first sound (A) to the second (B). If you do the same on the right side you would be morphing from B to A, which gives a different result. The directionality (from A to B vs from B to A) is relevant and will affect the output.

As far as I can tell, Morph is taking the timbre profile from the first sound and applying it to the second’s timbre and dynamics. Because of this, it is good practice to experiment with all possible combinations when designing a sound, since the results are going to be quite different, as you can hear below.

On the other hand, the X axis is simply a crossfade or blend between those two asymmetrical morphings. So remember, the Y axis (vertical movements) controls the morphing while the X axis (horizontal movements) crossfades between the two results.

Here is an example using human voice and a metallic sound to create a sort of robotic, vocoder-ish sound. The first two sounds are the basic components we are using. The ones below are the morphed result with the X-Axis all the way to the right or to the left but in the middle between sources A and B in both cases.

As you can hear, the right side result is probably what we were looking for: we keep the speech dynamics but use the metallic tonality, while the timbre is a mix between both. The other result is kind of a reversed image of that: we keep the voice tonality but we hear it with a dark, metallic timbre and using the metal impact dynamics. Maybe not what we were looking for in this case but, as I said before, it is worth checking both possibilities while creating sounds.

Of course, since this is a two-dimensional pad we could also use a custom blend between these two results.

Mixing Section

This simple section lets you add some of the unaltered original sounds to the output, while also controlling the level of the morphed signal.

Solo and bypass controls are also included.

Algorithms

There are 3 basic algorithms to choose from, each offering a different behaviour.

Classic is a good starting point with the highest frequency resolution, sacrificing time resolution. So it is best to use this option when timbre shaping is the main goal.

Interweave retains more of the first sound’s character instead of creating morphed features. This may help to create more natural sounding results. So if the classic algorithm gives you a result that feels too extreme you can try this one instead.

The Tight algorithm offers the best time resolution so it works well with percussive sounds. This of course comes at the expense of frequency resolution, but this doesn’t need to be a bad thing: the result could be interesting.

Additionally you also have lower latency versions for the classic and interweave algorithms.

Processing Section

This section offers 3 additional controls to shape our design.

The Formants trackball slider applies formant shifting up or down which can be handy when doing vocoder type sounds or just any sound in general. It kind of works as a pitch up/down control.

Amp Sense will adjust the maximum level of the newly combined (morphed) audio timbres while using the classic algorithm. You can reduce this value if the resulting combined sound is too harsh or resonant. For the other two algorithms, the slider evens out the levels of the loudest and quietest components, making them more balanced.

Finally, the Complexity slider is connected to “the resolution” of the whole processing. Higher values give more detail but if both sounds are very different, reducing this may help and will introduce larger sections of the original sound in the final output.

Here is the same morphed sound but using different levels of complexity. As you can hear, it almost works as a tonality vs noise slider in this case:

Reverb

We can also find a simple reverb module in Morph with controls for Wet/Dry mix, size and damping for high frequency attenuation.

This is handy for giving designs a quick listen in a relevant acoustic context or just giving sounds some extra flavour.

Examples

Now that we know the basic inner workings of the plugin, let’s see some more examples that I created while playing around.

Blending Timbres

This is probably the most obvious use case for Morph: mixing two timbres together into a hybrid that keeps features from the parents but has a new life of its own. Here is an alien computer SFX, for example:

Or we can create a funny cartoony engine using an old car recording and a vocal sample:

Or some sort of steampunk machine malfunctioning:

Transferring Dynamics

A different use we can give Morph is to “capture” the dynamic characteristics from one sound and apply them to the other. In this case, the resulting timbre comes almost 100% from just one of the elements, although some blending can also be cool.

As you can hear in this example, we are using the helicopter’s rhythmic footprint and applying it onto the drone’s timbre to create a morphed sci-fi engine element. The Formant slider was handy to alter the “size” of the sound.

Or we can use a car’s passing-by dynamics to shape the stereo image and amplitude of a water recording and create some sort of water element for a spell, for example.

Voices & Creatures

This is another use we can give Morph. If we combine a human or animal vocal sound with any other element, we can create otherworldly voices and creatures. If the sound we use has a constant tone, the result will be similar to a vocoder.

Here is a simple example with a human voice and a metal resonance:

We could also create a rock monster morphing growls and rock sounds:

Or create a scary voice. It is impressive how much you can change the original source by playing with blending layers, formants and the complexity slider:

Conclusions

Although simple in concept and features, Morph 2 is a very good tool to have as a sound designer. Morphing two sounds together is a very intuitive way to approach audio creativity. You don’t always get something unique but when you do, it is a great feeling to “give birth” to a new sound that shares timbre or dynamic features with the parent sounds but stands on its own too.

I just gave a few examples of what you can do with it, but I’m sure much more is possible. If you are interested, you can pick up the demo from Zynaptiq’s website.

Figuring out: Audio Pull up/down

When working with video, an audio pull up or pull down is needed when there has been a change in the picture’s frame rate and you need to tweak the audio to make sure it stays in sync.

This subject is somehow always surrounded by a layer of mysticism and confusion, so this is my attempt at going through the basics and hopefully getting some clarity.

Audio Sampling Rate

First, we need to understand some basic digital audio concepts. Feel free to skip this if you have it fresh.

Whenever we are converting an audio signal from analogue to digital, all we are doing is checking where the waveform is at certain “points” in its oscillation. These “points” are usually called samples.

In order to get a faithful signal, we need to sample our waveforms many times. The number of times we do this per second is what determines the sampling rate, which is measured in Hertz.

Keep in mind that if our sampling rate is not fast enough, we won’t be able to “capture” the higher frequencies since these would fluctuate faster than we can measure. So how fast do we need to be for accurate results?

The Nyquist-Shannon sampling theorem gives us a very good estimation. It basically says that the sampling rate needs to be about twice the highest frequency we want to capture. Since the highest frequency humans can hear is around 20 kHz, a sampling rate of 40 kHz should suffice. Now that we know this, let’s see the most commonly used sampling rates:

Sampling Rate    Use
8 kHz            Telephones, walkie-talkies
22 kHz           Low quality digital audio
44.1 kHz         CD quality, the music standard
48 kHz           The standard for professional video
96 kHz           DVD & Blu-ray audio
192 kHz          DVD & Blu-ray audio. This is usually the highest sampling rate for professional use.

As you can see, most professional formats use a sampling rate higher than 40 kHz to guarantee that we capture the full frequency spectrum. Something that is important to remember, and that will become relevant later on, is that a piece of audio is always going to be the same length as long as it is played at the same sample rate it was recorded at.
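Both points are simple arithmetic, so here is a small Python sketch of them. The function names are hypothetical and the one-minute recording is just an example; the only things it relies on are the "twice the highest frequency" rule and the fact that duration is the number of samples divided by the playback rate.

```python
def minimum_sample_rate(highest_frequency_hz: float) -> float:
    """Lower bound given by the 'twice the highest frequency' rule."""
    return 2.0 * highest_frequency_hz

def duration_seconds(sample_count: int, playback_rate_hz: float) -> float:
    """Length of a recording when its samples are played back at a given rate."""
    return sample_count / playback_rate_hz

# Human hearing tops out around 20 kHz.
print(minimum_sample_rate(20_000))

# One minute of audio recorded at 48 kHz.
one_minute = 48_000 * 60

# Played at the rate it was recorded at, the length is unchanged.
print(duration_seconds(one_minute, 48_000))

# Played at a lower rate, the very same samples take longer to go by.
print(duration_seconds(one_minute, 44_100))
```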

For the sake of completeness, I just want to mention audio resolution (or bit depth) briefly. This is the other parameter that we need to take into consideration when converting to digital audio. It measures how many bits we use to encode the information of each of our samples. Higher values will give us more dynamic range, since a bigger range of intensity values will be captured. This doesn’t really affect the pull up/down process.

Frames per second in video

Let’s now jump to the realm of video. There’s a lot to be said on the subject of frame rate but I will keep it short. This value is simply how many pictures per second are put together to create our film or video. 24 frames per second (or just fps) is the standard for cinema, while TV uses 25 fps in Europe (PAL) and 29.97 fps in the US (NTSC).

Keep in mind that these frame rates are different not only on a technical level but also on a stylistic level. 24 fps “feels” cinematic and “premium” while sometimes the higher frame rates used in TV feel “cheap”. This is probably a cultural perception and is definitely changing. Videogames, which many times use high frame rates like 60 fps and beyond, are partially responsible for this taste shift. The amount of motion is also very important, higher fps will be the best at showing fast motions.

But how can these different frame rates affect audio sync? The problem usually starts when a project is filmed at a certain rate and then converted to a different one for distribution. This would happen if, for example, a movie (24 fps) is brought to European TV (25 fps) or an American TV programme (29.97 fps) is brought to India, which uses PAL (25 fps).

Let’s see how this kind of conversion is done.

Sampling Rate vs Frame Rate

Some people think that audio can be set to be recorded at a certain frame rate the same way it can be set to be recorded at a certain sampling frequency. This is not true. Audio doesn’t intrinsically have a frame rate value the same way it has a bit depth and sampling rate.

If I give you an audio file and nothing else, you could easily figure out the bit depth and sampling rate but you would have no idea about the frame rate used on the associated video. Now, and here comes the nuanced but important point, any audio recorded at the same time as video will sync with the specific frame rate used when recording that video. They will sync because they were recorded together. They will sync because what the camera registered as a second of video was also a second of audio in the sound recorder. Of course, machines are not perfect and their clocks may measure a second slightly differently, and that’s why we connect them via timecode, but that’s another story.

This session is set at 24 fps, so each second is divided into 24 frames.

Maybe this confusion comes from the fact that when you create a new session or project in your DAW, you basically set three things: sampling rate, bit depth and frame rate. So it feels like the audio that is going to be inside is going to have those three intrinsic values. But that is not the case with frame rate. In the context of the session, frame rate is only telling your DAW how to divide a second. Into 24 slices? That would be 24 fps. Into 60 slices? That’s 60 fps.

In this manner, when you bring your video into your DAW, the video’s burnt-in timecode and your DAW’s timecode will be perfectly in sync, but none of this changes anything about the duration or quality of the audio within the session.

So, in summary, an audio file only has an associated frame rate in the context of the video it was recorded with or to, but this is not an intrinsic characteristic of the audio file and cannot be determined without the corresponding video.
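To make that “slicing a second” idea concrete, here is a minimal Python sketch. The function name and the example position are my own, purely for illustration, and it only handles whole-number frame rates (no 29.97 or drop-frame timecode). It shows that the same sample position in the same file just gets a different frame label depending on the session frame rate.

```python
def samples_to_timecode(sample_index: int, sample_rate: int, fps: int) -> str:
    """Format a sample position as HH:MM:SS:FF for a whole-number frame rate."""
    total_seconds, remainder = divmod(sample_index, sample_rate)
    # The frame number is just the leftover fraction of a second cut into fps slices.
    frames = remainder * fps // sample_rate
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


# A point 90.5 seconds into a 48 kHz file (hypothetical example position).
position = 48000 * 90 + 24000
for fps in (24, 25, 60):
    print(fps, "fps ->", samples_to_timecode(position, 48000, fps))
```

At 24 fps this position reads 00:01:30:12 and at 60 fps it reads 00:01:30:30, but the hours, minutes and seconds never move: the audio itself is untouched.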

Changing Frame Rate

A frame rate change is usually needed when the medium (cinema, TV, digital…) or the region changes. There are two basic ways of doing this. One of them does it without changing the final duration of the film, usually by re-distributing, duplicating or deleting frames to accommodate the new frame rate. I won’t go into details on these methods, partly because they are quite complex but mostly because if the length of the final picture is not changed, we don’t need to do anything to the audio. It will be in sync anyway.

Think about this for a second. We have changed the frame rate of the video but, as long as the final length is the same, our audio is still in sync, which kind of shows you that audio has no intrinsic frame rate value. Disclaimer: This will be true as long as the audio and film are kept separate. If audio and picture are on the same celluloid and then you start moving frames around, obviously you are going to mess up the audio, but in our current digital age we don’t need to worry about this.

The second method is the one that concerns us. That is, when the length of the picture is actually changed. This happens because it is the easiest way to fix the frame rate difference, especially if it is not very big.

Telecine. How video frame rate affects audio.

Let’s use the Telecine case as an example. Telecine is the process of transferring old-fashioned analogue film into video. This is not always the case, but it usually also implies a change in frame rate. As we saw earlier, films are traditionally shot at 24 fps. If we want to broadcast this film on European television, which uses the PAL system at 25 fps, we would need to go from 24 to 25 fps.

The easiest way to do this is to just play the original film about 4% faster. The pictures will look faster and the movie will finish earlier but the difference would be tolerable. Also, if you can show the same movie in less time on TV, that gives you more time for commercials, so it’s a win-win.

What are the drawbacks? First, showing the pictures 4% faster may be tolerable but is not ideal and can be noticeable in quick action sequences. Second, and more importantly, our audio will now be out of sync. We can always fix this by also playing the audio 4% faster (and this would traditionally be the case since audio and picture were embedded in the same film) but in this case, the pitch will be increased by 0.68 semitones.

In the digital realm, we can achieve this by simply playing the audio at a different rate than the one it was recorded at. This would be the digital equivalent of just cranking the projector faster. Remember before when I said that an audio file will always be the same length if it is played at the same sample rate as recorded? This is where that becomes relevant. As you can see below, if we play a 48 kHz file at 50 kHz, we would get the same speed-up effect that a change from 24 to 25 fps provides.

This would solve our sync problems, but as we were saying, it would increase the final pitch of the audio by about 0.68 semitones.
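As a rough sketch of the arithmetic behind this (the function name and the 100-minute running time are just assumptions for illustration), the speed ratio is the playback rate divided by the recorded rate, and the pitch change in semitones is 12 times the base-2 logarithm of that ratio. Note that a flat 4% (a ratio of 1.04) gives about 0.68 semitones, while the exact 25/24 ratio you get from playing 48 kHz at 50 kHz gives about 0.71, which is why you will see both figures quoted.

```python
import math


def speed_change(recorded_rate_hz: float, playback_rate_hz: float, duration_s: float):
    """Return (speed ratio, new duration in seconds, pitch shift in semitones)."""
    ratio = playback_rate_hz / recorded_rate_hz
    new_duration_s = duration_s / ratio      # faster playback means a shorter file
    semitones = 12 * math.log2(ratio)        # 12 semitones per doubling of speed
    return ratio, new_duration_s, semitones


# Hypothetical 100-minute film soundtrack recorded at 48 kHz, played at 50 kHz.
ratio, new_duration_s, semitones = speed_change(48000, 50000, 100 * 60)
print(f"ratio {ratio:.4f}, new length {new_duration_s / 60:.1f} min, pitch {semitones:+.2f} semitones")
```

With these numbers the 100 minutes shrink to 96 and everything lands roughly 0.7 semitones sharp.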

That increase in pitch may sound small but can be quite noticeable, especially in dialogue or musical sections. So how do we solve this? For many years the simple answer was: we don’t. Just leave it as it is. But nowadays we are able to re-pitch the resulting audio so it matches its original sound or, alternatively, we can directly change the length of the audio file without affecting the pitch. More on these methods later, but first let’s see what happens if, instead of doing a reasonable jump from film to PAL, we need to go from film to NTSC.

Bigger frame rate jumps, bigger problems (but not for us).

If a jump from 24 to 25 is a 4% change, a jump from 24 to 29.97 would be a whopping 24.9%. That’s way too much and it would be very noticeable. Let’s not even think about the audio: everybody would sound like a chipmunk. So how is this accomplished? The method used is what is called a “2:3 pulldown”.

Now, this method is quite involved so I’m not going to explain the whole thing here, but let’s see the basics and how it will affect our audio. First, let’s start with 30 fps, as this was the original frame rate for TV in NTSC. This makes sense because the electrical grid works at 60 Hz in the States. But, as with people who for some reason are happy living this way, things were bound to get messy: after color TV was introduced, and for reasons you can see here, the frame rate had to be lowered by about 1/1000th to 29.97.

A 2:3 pulldown uses the proportion of frames and the interlaced nature of the resulting video to make 4 frames fit into 5. This is because a 24/30 proportion would be equal to a 4/5 proportion. Again, this is complex and goes beyond the scope of this article but if you want more details this video can help.
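Without going into the full details, the basic cadence can be sketched in a few lines of Python. This is only a simplified illustration with names of my own choosing: it ignores which field is upper or lower and only shows how film frames get held for 2 and 3 fields alternately, so that 4 film frames end up filling 5 interlaced video frames.

```python
def pulldown_2_3(film_frames):
    """Spread film frames over interlaced video frames using a 2:3 field cadence."""
    fields = []
    for index, frame in enumerate(film_frames):
        # Alternate between holding a film frame for 2 fields and for 3 fields.
        repeat = 2 if index % 2 == 0 else 3
        fields.extend([frame] * repeat)
    # Pair consecutive fields into interlaced video frames.
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]


video_frames = pulldown_2_3(["A", "B", "C", "D"])
print(video_frames)
```

Four film frames A, B, C and D come out as the five video frames AA, BB, BC, CD and DD, so 24 film frames fill 30 video frames without the running time changing.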

But wait, we don’t want to end up with 30 frames, we need 29.97, and this is why the first step is to slow down the film from 24 fps to 23.976. This difference is impossible to detect but crucial to make our calculations work. Once this is done, we can do the actual pulldown, which doesn’t change the length of the film any further; it only re-arranges the frames.

What does this all mean for us audio people? It means that we only need to worry about that initial change from 24 to 23.976, which is just a 0.1% change. That’s small but it will still throw your audio out of sync over the length of a movie. So we just need to adjust the speed in the same way we do for the 4% change. If you look again at the picture above, you’ll see that that 0.1% is the change we need to use to go from film to NTSC.
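To get a feel for how quickly that 0.1% adds up, here is a small sketch (function and constant names are mine, for illustration only). It uses 24000/1001 as the exact value behind “23.976” and works out how much longer the slowed-down picture runs than the untouched audio.

```python
from fractions import Fraction

FILM = Fraction(24)
NTSC_FILM = Fraction(24000, 1001)  # the exact rate usually written as 23.976


def drift_seconds(duration_s, original_fps, new_fps):
    """How much longer (positive) or shorter (negative) the picture becomes."""
    return float(duration_s * original_fps / new_fps - duration_s)


for minutes in (1, 10, 120):
    drift = drift_seconds(minutes * 60, FILM, NTSC_FILM)
    print(f"{minutes:>3} min of picture -> audio ends up {drift:.2f} s early")
```

For a two-hour movie that comes to 7.2 seconds, far more than anyone would accept as lip sync.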

As for the change in pitch, it will be very small but we can still correct it if we need with the methods I show you below. But before that, here is a table for your convenience with all the usual frame changes and the associated audio change that would be needed.

Frame Rate Change    Audio Speed Change    Pitch Correction (if needed)
Film to PAL          4% Up                 4% Down // 96% // -0.71 Semitones
Film to NTSC         0.1% Down             0.1% Up // 100.1% // +0.02 Semitones
PAL to Film          4% Down               4% Up // 104% // +0.68 Semitones
PAL to NTSC          4.1% Down             4.1% Up // 104.1% // +0.70 Semitones
NTSC to Film         0.1% Up               0.1% Down // 99.9% // -0.02 Semitones
NTSC to PAL          4.1% Up               4.1% Down // 95.9% // -0.72 Semitones
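If you want to double check these figures or work out a conversion that is not listed, the calculation can be sketched like this. The dictionary and function names are my own, and I am assuming that “NTSC” here means the 23.976 fps (24000/1001) film speed, since the pulldown step itself does not change the length. Because it uses the exact ratios rather than rounded percentages, the results can differ slightly from a table built on rounded values.

```python
import math
from fractions import Fraction

# Assumed effective picture speeds for each format (illustrative names).
RATES = {"Film": Fraction(24), "PAL": Fraction(25), "NTSC": Fraction(24000, 1001)}


def audio_change(source: str, target: str):
    """Return (speed change in percent, pitch correction in semitones)."""
    ratio = float(RATES[target] / RATES[source])
    speed_percent = (ratio - 1) * 100
    # The correction undoes the pitch shift caused by the speed change.
    correction_semitones = -12 * math.log2(ratio)
    return speed_percent, correction_semitones


for source in RATES:
    for target in RATES:
        if source != target:
            speed, correction = audio_change(source, target)
            print(f"{source} to {target}: speed {speed:+.2f}%, correction {correction:+.2f} semitones")
```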

Techniques & Plugins

There are two basic methods to do a pull up or pull down. The first involves two steps: first changing the duration of the file while affecting its pitch (using a different sample rate as explained before) and secondly applying pitch correction to match the original’s tone. The way to actually do the first step depends on your DAW but in Pro Tools, for example, you’ll see that when importing audio you have the option to apply SRC (Sample Rate Conversion) to the file as pictured above.
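Outside of a DAW, the idea behind that first step can be sketched with nothing but Python’s standard `wave` module. This is not what Pro Tools does internally, just an illustration with a function name of my own: the samples are copied untouched and only the sample rate written in the header changes, so any player will run the file faster or slower (and higher or lower in pitch).

```python
import wave


def retag_sample_rate(source_path: str, target_path: str, new_rate_hz: int) -> None:
    """Copy a WAV file, changing only the sample rate declared in its header."""
    with wave.open(source_path, "rb") as source:
        params = source.getparams()
        audio_data = source.readframes(params.nframes)
    with wave.open(target_path, "wb") as target:
        target.setparams(params)
        # Careful: the wave module calls the sampling rate "framerate".
        # It has nothing to do with video frames.
        target.setframerate(new_rate_hz)
        target.writeframes(audio_data)
```

For example, `retag_sample_rate("mix_48k.wav", "mix_pal.wav", 50000)` (hypothetical file names) would give a file that plays 25/24 times faster. You would still need a real sample rate conversion back to 48 kHz before delivery, plus the pitch correction step if you want the original tone.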

The second method is simply doing it all at once with a plugin capable of changing the length of an audio file without affecting its pitch.

Also, keep in mind that these techniques can be applied to not only the stereo or the surround final mix file but also the whole session itself, which would give you much more flexibility to adjust your mix on this new version. This makes sense because a 4% change in speed could be enough to put two short sounds too close together and/or the feel of the mix could be a bit different. Personally, I have only used this “whole session” technique with shorter material like commercials. Here is a nice blog post that goes into detail about how to accomplish this.

As for changing a mixed file as a whole, whether you use a one-step or two-step method, you will probably find that it is easy to introduce glitches, clicks and pops in the mix. Sometimes you get dialogue that sounds metallic. Phase is also an issue, since the time/pitch processing is not always consistent between channels.

The thing is, time/pitch shifting is not an easy thing to accomplish. Some plugins offer different algorithms to choose from depending on the type of material you have. These are designed with music in mind, not dialogue, so “Polyphonic” is usually the best option for whole mixes. Another trick you can use is to bounce your mix into stems (music, dialogue, FX, ambiences, etc.) and then apply the shift to each of them independently, using the best plugin and algorithm for each. This can be very time consuming but will probably give you the best results.

As you can see, this whole process is kind of tricky, particularly the pitch shift step, and this is why on some occasions the audio is corrected for sync but left at the wrong pitch. Nevertheless, nowadays we have better shifting plugins to do the job. Here are some of the most commonly used, although remember that none of these works perfectly on every occasion:

-Zplane Elastique: This is in my opinion the best plugin and the one I personally use. It produces the least artefacts, keeps phase coherent and works great on whole mixes, even with single step processing.
-Pro Tools Pitch Shift: This is the stock time/pitch plugin that comes with Pro Tools. It is quite fast but is prone to creating artefacts.
-Pro Tools X-Form: This one is more advanced (comes bundled with Pro Tools Ultimate) but it still suffers from some issues like giving dialogue a metallic tone or messing up the phase on stereo and surround. Also, it is slow. Veeeery slow.
-Serato Pitch n Time: I haven’t tried this one but I had to mention it since it is very commonly used and people swear by it.
-Izotope Time & Pitch: It can work well sometimes and offers many customizable settings that you can adjust to avoid artefacts.
-Waves Sound Shifter: Haven’t used it but it’s another option that seems to work well for some applications.

Which one should you choose? There is no clear answer; you will need to experiment with some of them to see what works for each project. Here is a good article and video comparing some of them.

Conclusions

I hope you now have a somewhat better understanding of this messy subject. It is tricky on both a theoretical and a practical level, but I believe it is worth figuring out where things come from instead of just doing what others do without really knowing why. Here are some takeaways:

  • Sampling rate and bit depth are intrinsic to an audio file.

  • At the same time, an audio file can be associated to a certain video frame rate when they are both in sync.

  • The frame rate change process is different depending on the magnitude of the change.

  • An audio pull up or pull down is needed when there is a frame rate change on the picture that affects its length.

  • The pull up/down can be done in two steps (length change first, then pitch correction) or it can be done in a single step.

  • Time/Pitch Shift is a complicated process that can produce artefacts, metallic timbres and phase issues.

  • Mixes can be processed by stems or even as whole sessions for more flexibility.

  • Try different plugins and algorithms to improve results.

Thanks for reading!