Figuring out: Shepard Tone

The Shepard Tone is an interesting audio illusion that creates the impression of an endlessly rising or falling pitch that never really gets anywhere. Despite feeling like it is always going up or down, it is stuck in an eternal auditory fractal. How is this possible? And could it be useful for sound design?

The Penrose Stairs is a nice visual equivalent. Depending on perspective, it looks like they are always going up or down but we are really just going in circles.

History & Working Principles

Roger Shepard described this idea in his 1964 paper “Circularity in Judgements of Relative Pitch”. The original concept was conceived in a musical context with pitches jumping in discrete steps, aka notes.

He basically stated that to create an apparently ever-rising melody, we would need a circular or looping pattern consisting of sets of ascending notes that are faded in and out with a specific timing.

So we would start with just one ascending scale of notes. This scale will get to the end of the instrument range pretty soon so the trick is to sneak in a new set of notes doing the same thing but fading them in slowly as we fade out the previous set.

If we do this with the proper timing, we get the feeling of an eternally ascending scale.

You can see that working in the following example:

As you can see, the key to this effect is to use volume to mask the replacement of older octaves with newer octaves that always start from a lower tone, giving the illusion of an ascending tone overall, despite the average pitch staying constant.
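To make the fading trick more concrete, here is a minimal Python sketch of one step of a discrete Shepard scale. It is not Shepard's exact procedure: the raised-cosine loudness curve, the 27.5 Hz base frequency and the function name `shepard_step` are my own assumptions for illustration.

```python
import math

def shepard_step(step, steps_per_octave=12, octaves=6, base_hz=27.5):
    """Return (frequency, gain) pairs for one step of a discrete Shepard scale.

    Assumption: gain follows a raised-cosine bell over the log-frequency
    range, so components are silent at both edges and loudest in the middle.
    """
    fraction = (step % steps_per_octave) / steps_per_octave
    components = []
    for octave in range(octaves):
        # Position of this component inside the whole range, from 0 to 1.
        position = (octave + fraction) / octaves
        frequency = base_hz * 2 ** (octave + fraction)
        gain = 0.5 - 0.5 * math.cos(2 * math.pi * position)
        components.append((frequency, gain))
    return components

# After a full octave of steps the pattern is back where it started.
first = shepard_step(0)
wrapped = shepard_step(12)
```

Every step moves all components up by one note, but the lowest one enters silently and the highest one leaves silently, which is why the loop point is so hard to hear.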

Here is the same concept written in MIDI in Pro Tools. Volume is indicated by the velocity bars below. The lower green notes are rising in volume, while the blue higher ones are fading out. The central red notes stay at the same volume. Again, the average pitch is always the same, but the dynamic changes in the 3 scales provide the illusion of eternal ascension.

Later on, Jean-Claude Risset created a continuous version, called the Shepard-Risset glissando where the pitch glides without discrete jumps, making the overall effect more convincing and seamless.

In this case, the principles stay the same: there is always a new octave gradually fading in to replace the octave that is gradually fading out. This version can be much more useful for sound design, although to achieve it we need instruments or synths that can glide smoothly through different pitch values.
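The continuous version can also be sketched in plain Python. Again, this is only a rough illustration under my own assumptions (sine partials one octave apart, a raised-cosine loudness bell, one octave of rise every `cycle_s` seconds); it is not Risset's original implementation.

```python
import math

def risset_glissando(duration_s=2.0, sample_rate=8000, cycle_s=10.0,
                     octaves=6, base_hz=27.5):
    """Render a rising Shepard-Risset glissando as a list of floats."""
    phases = [0.0] * octaves
    samples = []
    previous_rise = 0.0
    for n in range(int(duration_s * sample_rate)):
        rise = (n / sample_rate / cycle_s) % 1.0  # 0..1 within one octave
        if rise < previous_rise:
            # One octave completed: every partial takes over the role (and
            # phase) of the one below it, and a new silent partial enters.
            phases = [0.0] + phases[:-1]
        previous_rise = rise
        value = 0.0
        for k in range(octaves):
            position = (k + rise) / octaves  # 0..1 across the full range
            frequency = base_hz * 2 ** (k + rise)
            gain = 0.5 - 0.5 * math.cos(2 * math.pi * position)
            # Accumulate phase so the gliding frequency stays click-free.
            phases[k] += 2 * math.pi * frequency / sample_rate
            value += gain * math.sin(phases[k])
        samples.append(value / octaves)
    return samples
```

The returned list can be written to a WAV file with any audio library. A slower `cycle_s` gives a more subtle, tension-building rise.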

Risset also tried to apply the concept to rhythm, layering different versions of a beat at proportional tempos (30, 60, 120 & 240 BPM, for example) and fading them in and out to create the illusion of an ever-accelerating rhythm. Check out these examples created by music researcher Dan Stowell. Below you can see how one of them looks in the spectrogram. Notice the upwards pattern and how the different versions fade in and out.

Building our own designs

Now that we have a good idea of how these effects work, let’s see if we can get creative and build a Shepard-Risset tone that could be useful in a sound design context.

I first tried Native Instruments’ Form, since it is a sample-based synthesizer where you can use any sample as a source. I used this tutorial as a starting point.

Basically, you trigger several octaves at once and use two LFOs, one to control the pitch so it’s always rising and a second to control the oscillator level so it rises and then falls. Also, I adjusted the general envelope so sounds have a long attack and release just so they blend together as they come and go. This is the result just using a sine wave:

It basically works but the overlapping is a bit noticeable. I then tweaked the timing and started to play with different sounds:

Form only gives you 30 minutes to demo the plugin, so I decided to use the limited time on one of the most obvious applications of the Shepard tone: an engine ramping up or down. Here are some of the ones I came up with. Keep in mind that the advantage of generating these is that you have an infinite amount of acceleration and deceleration, which can be very handy for later editing.

All of them are quite obvious: you can tell where the sound is restarting and new octaves are fading in. I think you could fix this by playing with the volume values (although I don’t know if an LFO is the most comfortable way of doing this) or maybe by using more octaves.

Lastly, this example is interesting because, on top of the Shepard effect, I was changing the length of the sample to enhance the feeling of acceleration: as the sample gets shorter, the engine feels like it is going faster. I tried to play around with the plugin, kind of driving in real time. This could also have interesting applications for video games.

After this, my demo expired and I felt I didn’t have enough time to improve the effect and play around with the settings. So I looked for an alternative and after some failed experiments I found “Endless Series”, a specialized Shepard plugin by Oli Larkin.

It offers two synthesiser modes plus four audio processing modes so you can create Shepard tones from scratch or using an audio sample as a base.

There is also a nice number of variables you can tweak to customise the result. So let’s hear some of the tones I got from this plugin.

First, here are some examples just using the synthesizer built into the plugin. You can create a discrete or a continuous (glissando) tone (Example 1). In the case of a discrete or stepped tone, you can use several different musical scales. A chromatic scale will give you the classical Shepard feel (Example 2) but you can also play with other more exotic ones. In Example 3 below I tried creating a dreamy, impressionist whole tone one. It’s cool, but that last one doesn’t have much of the going-up-in-pitch feeling.

The plugin also works nicely if you want to create engines. Here are a couple of examples.

As for the audio processing mode, there are different effects that you can apply. The simplest mode, am input, just applies the Shepard processing to the sample as far as I can tell. It works in a strange way with tonal content: the Shepard effect is not very pronounced and, for some reason, it adds a descending tone, which doesn’t help. Here is this mode applied to just a sine wave:

This same mode goes nuts with noisier content. Here is another example using an engine sound. As you can hear, the am input mode introduces a lot of noise and artefacts. I tried playing around with the settings and using other source material but I could not make it sound clean. I don’t know if I’m missing something.

But there are two other modes that can give better results. There is a flanger and a phaser setting. As you can hear, they sound much cleaner, although the effect is quite mild in the case of the phaser. I just wish there was a way to have a “sheparded” sound as clean as this but without the flanger effect on top.

In summary, I feel that I didn’t find the perfect “Shepard Machine” but I’m sure that there are other options out there. I was also thinking that there is probably no plugin that can do everything perfectly (like sample based and synth based and musical options, etc…) so maybe an array of different plugins may be needed for different purposes.

Use in media

Shepard tones have been used in several music and film projects, sometimes in a subtle way, other times quite explicitly.

In music, they can give a very trippy and psychedelic feel (see Pink Floyd’s “Echoes” below) or they can be used to create rising tension (Used extensively in movies like “Dunkirk” or “Flight”). As some of my examples from before showed, they can also be used to create fantasy or sci-fi vehicle engines. In “The Dark Knight”, Nolan wanted the batpod to feel like an unstoppable force that doesn’t even shift gears which sounds like a perfect use case for the Shepard tone.

But probably my favourite example, and maybe this is just nostalgia, is in Super Mario 64, which features an endless staircase that you need to overcome to get to the final boss. The game gives you the illusion of an eternal ascent but you are just running on a “virtual treadmill” and getting nowhere. Analogously, the music uses a Shepard tone to achieve the same effect: an apparent ascension that is really just circular. A great example of a Shepard tone used in an interactive environment.

Exploring Sound Design Tools: Morph 2

Morph 2, made by the German company Zynaptiq, is based on the original Morph plugin made by Prosoniq years ago. It applies a very simple but powerful concept: creating a hybrid between two different sounds by fusing together their timbre and dynamic characteristics.

Let’s see what the plugin offers plus some sound design examples.

Setup

Morph 2 works by combining two mono or stereo tracks into a new stereo or quad auxiliary track. In the screenshot, you can see two stereo tracks being used as sources. This is the method recommended by Zynaptiq.

There is also a side-chain option but it only supports mono sources.

Features & Interface

As you can see, the interface is quite clean and simple. Let’s see which features Morph offers:

The X/Y Section

This central section combines a crossfade and a morphing control in an X/Y type interface. This may look simple at first glance but it has some interesting properties. Starting at the bottom left corner and moving vertically upwards, you would be morphing from the first sound (A) to the second (B). If you do the same on the right side, you would be morphing from B to A, which gives a different result. The directionality (from A to B vs from B to A) is relevant and will affect the output.

As far as I can tell, Morph is taking the timbre profile from the first sound and applying it to the second’s timbre and dynamics. Because of this, it is good practice to experiment with all possible combinations when designing a sound, since the results are going to be quite different, as you can hear below.

On the other hand, the X axis is simply a crossfade or blend between those two asymmetrical morphings. So remember, the Y axis (vertical movements) controls the morphing while the X axis (horizontal movements) crossfades between the two results.

Here is an example using human voice and a metallic sound to create a sort of robotic, vocoder-ish sound. The first two sounds are the basic components we are using. The ones below are the morphed result with the X-Axis all the way to the right or to the left but in the middle between sources A and B in both cases.

As you can hear, the right side result is probably what we were looking for: we keep the speech dynamics but use the metallic tonality, while the timbre is a mix between both. The other result is kind of a reversed image of that, we keep the voice tonality but we hear it with a dark, metallic timbre and using the metal impact dynamics. Maybe not what we were looking for in this case, but as I said before worth checking both possibilities while creating sounds.

Of course, since this is a two-dimensional pad we could also use a custom blend between these two results.

Mixing Section

This simple section lets you add some of the unaltered original sounds to the output, while also controlling the level of the morphed signal.

Solo and bypass controls are also included.

Algorithms

There are 3 basic algorithms to choose from, each of them offers a different behaviour.

Classic is a good starting point with the highest frequency resolution, sacrificing time resolution. So it is best to use this option when timbre shaping is the main goal.

Interweave retains more of the first sound’s character instead of creating morphed features. This may help to create more natural sounding results. So if the classic algorithm gives you a result that feels too extreme, you can try this one instead.

The Tight algorithm offers the best time resolution so it works well with percussive sounds. This of course comes at the expense of frequency resolution, but this doesn’t need to be a bad thing: the result could be interesting.

Additionally you also have lower latency versions for the classic and interweave algorithms.

Processing Section

This section offers 3 additional controls to shape our design.

The Formants trackball slider applies formant shifting up or down which can be handy when doing vocoder type sounds or just any sound in general. It kind of works as a pitch up/down control.

Amp Sense will adjust the maximum level of the newly combined (morphed) audio timbres while using the classic algorithm. You can reduce this value if the resulting combined sound is too harsh or resonant. For the other two algorithms, the slider evens out the levels of the loudest and quietest components, making them more balanced.

Finally, the Complexity slider is connected to “the resolution” of the whole processing. Higher values give more detail but if both sounds are very different, reducing this may help and will introduce larger sections of the original sound in the final output.

Here is the same morphed sound but using different levels of complexity. As you can hear, it almost works as a tonality vs noise slider in this case:

Reverb

We can also find a simple reverb module in Morph with controls for Wet/Dry mix, size and damping for high frequency attenuation.

This is handy for giving designs a quick listen in a relevant acoustic context or just giving sounds some extra flavour.

Examples

Now that we know the basic inner workings of the plugin, let’s see some more examples that I created while playing around.

Blending Timbres

This is probably the most obvious use case for Morph: mixing two timbres together into a hybrid that keeps features from the parents but has a new life of its own. Here is an alien computer SFX, for example:

Or we can create a funny cartoony engine using an old car recording and a vocal sample:

Or some sort of steampunk machine malfunctioning:

Transferring Dynamics

A different use we can give Morph is to “capture” the dynamic characteristics of one sound and apply them to the other. In this case, the resulting timbre comes almost 100% from one of the elements only, although some blending can also be cool.

As you can hear in this example, we are using the helicopter’s rhythmic footprint and applying it onto the drone’s timbre to create a morphed sci-fi engine element. The Formant slider was handy to alter the “size” of the sound.

Or we can use a car’s passing-by dynamics to shape the stereo image and amplitude of a water recording and create some sort of water element for a spell, for example.
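I don’t know how Morph does this internally, but the general idea of transferring dynamics can be illustrated with a simple envelope follower. The sketch below is a swapped-in, much cruder technique, and the function names and toy data are made up for the example.

```python
def envelope(signal, window=4):
    """Crude amplitude envelope: moving average of the rectified signal."""
    rectified = [abs(x) for x in signal]
    out = []
    for i in range(len(rectified)):
        start = max(0, i - window + 1)
        chunk = rectified[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def transfer_dynamics(dynamics_source, timbre_source, window=4):
    """Shape `timbre_source` with the loudness contour of `dynamics_source`."""
    env = envelope(dynamics_source, window)
    return [e * x for e, x in zip(env, timbre_source)]

# Toy data: a pulsing "helicopter" contour applied to a steady "drone".
helicopter = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
drone = [0.5] * 8
shaped = transfer_dynamics(helicopter, drone, window=1)
```

With real audio you would use a much longer window (a few milliseconds of samples) so the envelope follows the rhythm instead of the individual waveform cycles.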

Voices & Creatures

This is another use we can give Morph. If we combine a human or animal vocal sound with any other element, we can create otherworldly voices and creatures. If the sound we use has a constant tone, the result will be similar to a vocoder.

Here is a simple example with a human voice and a metal resonance:

We could also create a rock monster morphing growls and rock sounds:

Or create a scary voice. It is impressive how much you can change the original source by playing with blending layers, formants and the complexity slider:

Conclusions

Although simple in concept and features, Morph 2 is a very good tool to have as a sound designer. Morphing two sounds together is a very intuitive way to approach audio creativity. You don’t always get something unique, but when you do, it is a great feeling to “give birth” to a new sound that shares timbre or dynamic features with the parent sounds but stands on its own too.

I just gave a few examples of what you can do with it, but I’m sure much more is possible. If you are interested, you can pick up the demo on Zynaptiq’s website.

Figuring out: Audio Pull up/down

When working with video, an audio pull up or pull down is needed when there’s been a change in the picture’s frame rate and you need to tweak the audio to make sure it stays in sync.

This subject always seems to be surrounded by a layer of mysticism and confusion, so this is my attempt at going through the basics and hopefully getting some clarity.

Audio Sampling Rate

First, we need to understand some basic digital audio concepts. Feel free to skip this if you have it fresh.

Whenever we are converting an audio signal from analogue to digital, all we are doing is checking where the waveform is at certain “points” in its oscillation. These “points” are usually called samples.

In order to get a faithful signal, we need to sample our waveforms many times. The number of times we do this per second is the sampling rate, which is measured in hertz.

Keep in mind that if our sampling rate is not fast enough, we won’t be able to “capture” the higher frequencies since these would fluctuate faster than we can measure. So how fast do we need to be for accurate results?

The Nyquist-Shannon sampling theorem gives us the answer. It basically says that the sampling rate needs to be at least twice the highest frequency we want to capture. Since the highest frequency humans can hear is around 20 kHz, a sampling rate of 40 kHz should suffice. Once we know this, let’s see the most commonly used sampling rates:
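To see what happens when we break that rule, here is a small Python sketch. The function names are mine, and it only models an ideal pure tone sampled with no anti-aliasing filter.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency that a given sampling rate can represent."""
    return sample_rate_hz / 2

def aliased_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone shows up after sampling."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# A 15 kHz tone survives at 40 kHz, but a 30 kHz tone folds back to 10 kHz.
print(aliased_frequency(15000, 40000))
print(aliased_frequency(30000, 40000))
```

Anything above half the sampling rate doesn’t just disappear, it comes back as a wrong, lower frequency, which is why converters filter the signal before sampling.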

Sampling Rate | Use
8 kHz | Telephones, walkie-talkies
22 kHz | Low quality digital audio
44.1 kHz | CD quality, the music standard
48 kHz | The standard for professional video
96 kHz | DVD & Blu-ray audio
192 kHz | DVD & Blu-ray audio. This is usually the highest sampling rate for professional use.

As you can see, most professional formats use a sampling rate higher than 40 kHz to guarantee that we capture the full frequency spectrum. Something that is important to remember, and that will become relevant later on, is that a piece of audio is always going to be the same length as long as it is played at the same sample rate it was recorded at.

For the sake of completeness, I just want to mention audio resolution (or bit depth) briefly. This is the other parameter that we need to take into consideration when converting to digital audio. It measures how many bits we use to encode the information of each of our samples. Higher values will give us more dynamic range, since a bigger range of intensity values will be captured. This doesn’t really affect the pull up/down process.
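As a rough illustration of how bit depth relates to dynamic range, this Python snippet computes the theoretical figure for ideal linear PCM (roughly 6 dB per bit). Real converters never quite reach these numbers, and the function name is just mine.

```python
import math

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of ideal linear PCM, in decibels."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
```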

Frames per second in video

Let’s now jump to the realm of video. There’s a lot to be said on the subject of frame rate but I will keep it short. This value is simply how many pictures per second are put together to create our film or video. 24 frames per second (or just fps) is the standard for cinema, while TV uses 25 fps in Europe (PAL) and 29.97 fps in the US (NTSC).

Keep in mind that these frame rates are different not only on a technical level but also on a stylistic level. 24 fps “feels” cinematic and “premium” while sometimes the higher frame rates used in TV feel “cheap”. This is probably a cultural perception and is definitely changing. Videogames, which many times use high frame rates like 60 fps and beyond, are partially responsible for this taste shift. The amount of motion is also very important, higher fps will be the best at showing fast motions.

But how can these different frame rates affect audio sync? The problem usually starts when a project is filmed at a certain rate and then converted to a different one for distribution. This would happen if, for example, a movie (24 fps) is brought to European TV (25 fps) or an American TV programme (29.97 fps) is brought to India, which uses PAL (25 fps).

Let’s see how this kind of conversion is done.

Sampling Rate vs Frame Rate

Some people think that audio can be set to be recorded at a certain frame rate the same way it can be set to be recorded at a certain sampling frequency. This is not true. Audio doesn’t intrinsically have a frame rate value the same way it has a bit depth and sampling rate.

If I give you an audio file and nothing else, you could easily figure out the bit depth and sampling rate but you would have no idea about the frame rate used on the associated video. Now, and here comes the nuanced but important point, any audio recorded at the same time as video will sync with the specific frame rate used when recording that video. They will sync because they were recorded together. They will sync because what the camera registered as a second of video was also a second of audio in the sound recorder. Of course, machines are not perfect and their clocks may measure a second slightly differently, and that’s why we connect them via timecode, but that’s another story.

This session is set at 24 fps, so each second is divided into 24 frames.

Maybe this confusion comes from the fact that when you create a new session or project in your DAW, you basically set three things: sampling rate, bit depth and frame rate. So it feels like the audio that is going to be inside is going to have those three intrinsic values. But that is not the case with frame rate. In the context of the session, frame rate is only telling your DAW how to divide a second. Into 24 slices? That would be 24 fps. Into 60 slices? That’s 60 fps.
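Here is a small Python sketch of that “slicing” idea. It is a simplification of what a DAW does: the function is hypothetical, and it only handles whole-number frame rates (no 29.97 or drop-frame timecode).

```python
def timecode(sample_index, sample_rate=48000, fps=24):
    """Format an audio position as HH:MM:SS:FF for a whole-number frame rate."""
    total_seconds, remainder = divmod(sample_index, sample_rate)
    # The frame rate only decides how the leftover fraction of a second is sliced.
    frames = remainder * fps // sample_rate
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

# The same sample, 1.5 seconds into the file, labelled at two frame rates.
print(timecode(72000, fps=24))  # 00:00:01:12
print(timecode(72000, fps=60))  # 00:00:01:30
```

The sample index never changes, only the label we put on it, which is the whole point: the frame rate is a ruler, not a property of the audio.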

In this manner, when you bring your video into your DAW, the video’s burnt-in timecode and your DAW’s timecode will be perfectly in sync, but all of this changes nothing about the duration or quality of the audio within the session.

So, in summary, an audio file only has an associated frame rate in the context of the video it was recorded with or to, but this is not an intrinsic characteristic of the audio file and cannot be determined without the corresponding video.

Changing Frame Rate

A frame rate change is usually needed when the medium (cinema, TV, digital…) or the region changes. There are two basic ways of doing this. One of them is able to do it without changing the final duration of the film, usually by redistributing, duplicating or deleting frames to accommodate the new frame rate. I won’t go into details on these methods, partly because they are quite complex but mostly because if the length of the final picture is not changed, we don’t need to do anything to the audio. It will be in sync anyway.

Think about this for a second. We have changed the frame rate of the video but, as long as the final length is the same, our audio is still in sync, which kind of shows you that audio has no intrinsic frame rate value. Disclaimer: this will be true as long as the audio and film are kept separate. If audio and picture are on the same celluloid and then you start moving frames around, obviously you are going to mess up the audio, but in our current digital age we don’t need to worry about this.

The second method is the one that concerns us: the case where the length of the picture is actually changed. This happens because it is the easiest way to fix the frame rate difference, especially if it is not very big.

Telecine. How video frame rate affects audio.

Let’s use the telecine case as an example. Telecine is the process of transferring old-fashioned analogue film into video. This is not always the case, but it usually also implies a change in frame rate. As we saw earlier, films are traditionally shot at 24 fps. If we want to broadcast this film on European television, which uses the PAL system at 25 fps, we would need to go from 24 to 25 fps.

The easiest way to do this is to just play the original film about 4% faster. The pictures will look faster and the movie will finish earlier but the difference is tolerable. Also, if you can show the same movie in less time on TV, that gives you more time for commercials, so win-win.

What are the drawbacks? First, showing the pictures 4% faster may be tolerable but is not ideal and can be noticeable in quick action sequences. Second and more importantly, our audio will now be out of sync. We can always fix this by also playing the audio 4% faster (and this would traditionally be the case since audio and picture were embedded in the same film) but in this case, the pitch will be increased by about 0.71 semitones.

In the digital realm, we can achieve this by simply playing the audio at a different rate than it was recorded at. This would be the digital equivalent of just cranking the projector faster. Remember when I said that an audio file will always be the same length if it is played at the same sample rate as recorded? This is when this becomes relevant. As you can see below, if we play a 48 kHz file at 50 kHz, we would get the same speed up effect that a change from 24 to 25 fps provides.
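The arithmetic behind that 48 kHz to 50 kHz trick is simple enough to check in Python. The function names are only for illustration, and the 100 minute film is a made-up example.

```python
def playback_rate_for(original_fps, target_fps, recorded_rate_hz=48000):
    """Sample rate to play a file at so it follows the picture speed change."""
    return recorded_rate_hz * target_fps / original_fps

def duration_seconds(num_samples, playback_rate_hz):
    """How long a fixed number of samples lasts at a given playback rate."""
    return num_samples / playback_rate_hz

rate = playback_rate_for(24, 25)        # 50000.0 Hz
film_samples = 48000 * 100 * 60         # a 100 minute film recorded at 48 kHz
print(rate, duration_seconds(film_samples, rate) / 60)  # 50000.0 96.0
```

Same samples, faster clock: the 100 minute film now lasts 96 minutes, exactly like the sped-up picture.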

This would solve our sync problems but, as we were saying, it would increase the final pitch of the audio by about 0.71 semitones.

That increase in pitch may sound small but can be quite noticeable, especially in dialogue and musical sections. So how do we solve this? For many years the simple answer was nothing. Just leave it as it is. But nowadays we are able to re-pitch the resulting audio so it matches its original sound or, alternatively, we can directly change the length of the audio file without affecting the pitch. More on these methods later, but first let’s see what happens if, instead of doing a reasonable jump from film to PAL, we need to go from film to NTSC.

Bigger frame rate jumps, bigger problems (but not for us).

If a jump from 24 to 25 is a 4% change, a jump from 24 to 29.97 would be a whopping 24.9%. That’s way too much and it would be very noticeable. Let’s not even think about the audio, everybody would sound like a chipmunk. So how is this accomplished? The method used is what is called a “2:3 pulldown”.

Now, this method is quite involved so I’m not going to explain the whole thing here, but let’s see the basics and how it will affect our audio. First, let’s start with 30 fps, as this was the original frame rate for TV in NTSC. This makes sense because the electrical grid works at 60 Hz in the States. But since these are people who, for some reason, are happy living this way, things were bound to get messy: after color TV was introduced, and for reasons you can see here, the frame rate had to be lowered by about 0.1% to 29.97 fps.

A 2:3 pulldown uses the proportion of frames and the interlaced nature of the resulting video to make 4 frames fit into 5. This is because a 24/30 proportion is equal to a 4/5 proportion. Again, this is complex and goes beyond the scope of this article, but if you want more details this video can help.

But wait, we don’t want to end up with 30 frames, we need 29.97, and this is why the first step is to slow down the film from 24 fps to 23.976. This difference is impossible to detect but crucial to make our calculations work. Once this is done, we can do the actual pulldown, which doesn’t change the length of the film any further. It only rearranges the frames.
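To get a feel for how 4 frames become 5, here is a simplified Python sketch of the 2:3 cadence. It is only a toy model under my own assumptions: it ignores which field is the top or bottom one and just shows how film frames are spread over pairs of fields.

```python
def pulldown_2_3(frames):
    """Spread groups of 4 film frames over 5 interlaced video frames."""
    fields = []
    for index, frame in enumerate(frames):
        # Frames alternately contribute 2 and 3 fields.
        fields.extend([frame] * (2 if index % 2 == 0 else 3))
    # Pair consecutive fields into interlaced video frames.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]

print(pulldown_2_3("ABCD"))
# [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
```

Notice that no frame is played faster or slower. One second of film (24 frames) becomes 30 video frames that still last one second, which is why this step doesn’t touch the audio.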

What does this all mean for us, audio people? It means that we only need to worry about that initial change from 24 to 23.976, which is just a 0.1% change. That’s small but it will still throw your audio out of sync over the length of a movie. So we just need to adjust the speed in the same way we do for the 4% change. If you look again at the picture above, you’ll see that 0.1% is the change we need to use to go from film to NTSC.
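How much drift are we talking about? A quick Python check, assuming the slowed-down rate is exactly 24000/1001 fps (the precise value behind “23.976”) and using a function name I made up:

```python
def sync_drift_seconds(runtime_minutes, original_fps=24.0, new_fps=24000 / 1001):
    """Extra running time after slowing the picture from one rate to another."""
    runtime_s = runtime_minutes * 60
    new_runtime_s = runtime_s * original_fps / new_fps
    return new_runtime_s - runtime_s

print(round(sync_drift_seconds(120), 2))  # 7.2
```

So over a two hour movie, untreated audio would end up about 7.2 seconds ahead of the picture, way more than anybody would tolerate.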

As for the change in pitch, it will be very small but we can still correct it if we need with the methods I show you below. But before that, here is a table for your convenience with all the usual frame changes and the associated audio change that would be needed.

Frame Rate Change | Audio Speed Change | Pitch Correction (if needed)
Film to PAL | 4.2% up | 4% down // 96% // -0.71 semitones
Film to NTSC | 0.1% down | 0.1% up // 100.1% // +0.02 semitones
PAL to Film | 4% down | 4.2% up // 104.2% // +0.71 semitones
PAL to NTSC | 4.1% down | 4.3% up // 104.3% // +0.72 semitones
NTSC to Film | 0.1% up | 0.1% down // 99.9% // -0.02 semitones
NTSC to PAL | 4.3% up | 4.1% down // 95.9% // -0.72 semitones
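If you would rather derive these figures than trust a table, they all come from the ratio between the two frame rates. Below is a minimal Python sketch. I’m assuming that “NTSC” here means the 23.976 fps (24000/1001) rate that the picture is conformed to before the pulldown, and the names are only for illustration.

```python
import math

# Assumption: "NTSC" is the 23.976 fps rate used before the 2:3 pulldown.
FPS = {"Film": 24.0, "PAL": 25.0, "NTSC": 24000 / 1001}

def pull_values(source, target):
    """Return (speed change %, pitch correction ratio %, correction in semitones)."""
    speed = FPS[target] / FPS[source]
    correction = 1 / speed  # ratio that brings the pitch back to the original
    return (
        round((speed - 1) * 100, 1),
        round(correction * 100, 1),
        round(12 * math.log2(correction), 2),
    )

print(pull_values("Film", "PAL"))   # (4.2, 96.0, -0.71)
print(pull_values("Film", "NTSC"))  # (-0.1, 100.1, 0.02)
```

The semitone value is just 12 times the base-2 logarithm of the ratio, since an octave (a ratio of 2) contains 12 semitones.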

Techniques & Plugins

There are two basic methods to do a pull up or pull down. The first involves two steps: first changing the duration of the file while affecting its pitch (using a different sample rate as explained before) and secondly applying pitch correction to match the original’s tone. The way to actually do the first step depends on your DAW but in Pro Tools, for example, you’ll see that when importing audio you have the option to apply SRC (Sample Rate Conversion) to the file as pictured above.

The second method is simply doing it all at once with a plugin capable of changing the length of an audio file without affecting its pitch.

Also, keep in mind that these techniques can be applied to not only the stereo or the surround final mix file but also the whole session itself, which would give you much more flexibility to adjust your mix on this new version. This makes sense because a 4% change in speed could be enough to put two short sounds too close together and/or the feel of the mix could be a bit different. Personally, I have only used this “whole session” technique with shorter material like commercials. Here is a nice blog post that goes into detail about how to accomplish this.

As for changing a mixed file as a whole, whether you use a one step or two step method, you will probably find that it is easy to introduce glitches, clicks and pops in the mix. Sometimes you get dialogue that sounds metallic. Phase is also an issue, since the time/pitch processing is not always consistent between channels.

The thing is, time/pitch shifting is not an easy thing to accomplish. Some plugins offer different algorithms to choose from depending on the type of material you have. These are designed with music in mind, not dialogue, so “Polyphonic” is usually the best option for whole mixes. Another trick you can use is to bounce your mix into stems (music, dialogue, FX, ambiences, etc.) and then apply the shift to each of them independently, using the best plugin and algorithm for each. This can be very time consuming but will probably give you the best results.

As you can see, this whole process is kind of tricky, particularly the pitch shift step, and this is why on some occasions the audio is corrected for sync but left at the wrong pitch. Nevertheless, nowadays we have better shifting plugins to do the job. Here are some of the most commonly used, although remember that none of these works perfectly on every occasion:

-Zplane Elastique: This is in my opinion the best plugin and the one I personally use. It produces the fewest artefacts, keeps phase coherent and works great on whole mixes, even with single step processing.
-Pro Tools Pitch Shift: This is the stock time/pitch plugin that comes with Pro Tools. It is quite fast but is prone to creating artefacts.
-Pro Tools X-Form: This one is more advanced (comes bundled with Pro Tools Ultimate) but it still suffers from some issues like giving dialogue a metallic tone or messing up the phase on stereo and surround. Also, it is slow. Veeeery slow.
-Serato Pitch n Time: I haven’t tried this one but I had to mention it since it is very commonly used and people swear by it.
-iZotope Time & Pitch: It can work well sometimes and offers many customizable settings that you can adjust to avoid artefacts.
-Waves Sound Shifter: I haven’t used it either but it’s another option that seems to work well for some applications.

Which one should you choose? There is no clear answer, you will need to experiment with some of them to see what works for each project. Here is a good article and video comparing some of them.

Conclusions

I hope you now have a somewhat better understanding of this messy subject. It is tricky on both a theoretical and a practical level, but I believe it is worth figuring out where things come from instead of just doing what others do without really knowing why. Here are some takeaways:

  • Sampling rate and bit depth are intrinsic to an audio file.

  • At the same time, an audio file can be associated to a certain video frame rate when they are both in sync.

  • The frame rate change process is different depending on the magnitude of the change.

  • An audio pull up or pull down is needed when there is a frame rate change on the picture that affects its length.

  • The pull up/down can be done in two steps (length change first, then pitch correction) or it can be done in a single step.

  • Time/Pitch Shift is a complicated process that can produce artefacts, metallic timbres and phase issues.

  • Mixes can be processed by stems or even as whole sessions for more flexibility.

  • Try different plugins and algorithms to improve results.
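
To put some numbers on the two-step pull up, here is a minimal Python sketch. The 24 to 25 fps change and the 90 minute running time are just illustrative values I picked, and `pull_up_figures` is a made-up helper name, not something any plugin exposes. It only does the arithmetic: how much shorter the audio gets and how much pitch correction the second step would need.

```python
import math

def pull_up_figures(src_fps, dst_fps, duration_s):
    """Return (speed ratio, new duration in seconds, pitch change in semitones)."""
    ratio = dst_fps / src_fps             # > 1 means the picture plays faster
    new_duration = duration_s / ratio     # step 1: the audio length changes
    semitones = 12 * math.log2(ratio)     # pitch drift if nothing is corrected
    return ratio, new_duration, semitones

# Illustrative values only: a 90 minute mix sped up from 24 to 25 fps.
ratio, new_duration, semitones = pull_up_figures(24, 25, 90 * 60)
print(f"speed ratio: {ratio:.4f}")
print(f"new length: {new_duration / 60:.1f} min")
print(f"pitch rises by {semitones:.2f} semitones, so step 2 shifts by {-semitones:.2f}")
```

With these values the mix ends up 86.4 minutes long and about 0.71 semitones sharp, which is the amount the pitch correction step has to undo.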

Thanks for reading!

Exploring Sound Design Tools: Igniter

Igniter is Krotos’ new engine sound design plugin. They have been kind enough to send me a license to have a look and see what it can offer. Igniter allows you to virtualize vehicle engines (real, sci-fi or fantastic) by combining a granular section, a set of synthesizers and two sample managers. It includes performance controls so you can automate the vehicle RPM, engine load and many FX (including doppler) in order to get a realistic sounding engine. It comes with a large number of presets including sport and utilitarian cars, planes, helicopters, trucks, motorbikes and sci-fi vehicles.

So here is my in-depth look at the plugin features with some examples here and there. I encourage you to follow along in your own DAW. You can find a full featured demo here.

Interface / UI

The interface is clean and easy to read and you can resize the window, which is very nice. A main section (left side) occupies most of the screen and includes all the audio sources we can use. These sources are divided into four different tabs (Granular, Synth, One Shot and Loop) and the section also includes a file browser.

On the right hand side we find the engine master on/off switch and the main revs knob in the middle. This revs knob acts as a gas pedal for the whole plugin. At the top, we find the Mod system, where Igniter’s true power resides, since it allows you to dynamically link any parameter within the plugin to the revs knob using envelopes and LFOs. Lastly, at the bottom right side, we find the FX and mixer sections. Let’s see all these in more detail.

If you need more info, most features are well covered in the manual and in Krotos’ videos. What follows is my own take on the plugin’s capabilities, plus some wish list features that I would love to see in the future.

Granular Section

This is probably the most complex and important generator. It combines granular synthesis with real recordings to re-create a virtual engine with a revolutions or RPM knob that you can “drive”. Each vehicle includes two mic perspectives, engine and exhaust, and we can easily mix between both with a slider.

When I saw this, it occurred to me that it would have been nice to also include an interior perspective as this would be very useful for vehicle scenes like chases. After some looking around, I discovered that all vehicles have an “In-car” preset which solves the problem. This is not a true recording of the car interior, though, but a recreation of it using EQ and convolution reverb. Would this be too different or inauthentic compared to a true inside recording? To be honest, I don’t know since I don’t have a huge amount of experience doing car sound design, but I suspect these presets will suffice pretty well for most applications and, of course, you can always tweak them to suit your needs or even do your own “in-car” processing outside of Igniter.

In terms of how the granular engine actually works, we can’t see what’s going on under the hood (see what I did there?) but I suppose the plugin is using recordings at different steady RPMs and blending them together as you act on the engine. This is similar to the approach used in middleware like FMOD for use in video games. The result is pretty natural and smooth and driving the RPM feels responsive and clean.
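
To make that guess a bit more concrete, here is a small Python sketch of the general “blend between steady RPM recordings” idea. To be clear, this is my own illustration of the technique, not Igniter’s actual code: the `loop_gains` helper, the RPM values and the equal-power curve are all assumptions.

```python
import math

def loop_gains(rpm, loop_rpms):
    """Equal-power gains for the two steady loops that bracket `rpm`.

    `loop_rpms` is the (ascending) list of RPMs the loops were recorded at.
    """
    gains = [0.0] * len(loop_rpms)
    if rpm <= loop_rpms[0]:
        gains[0] = 1.0            # below the lowest recording: play it alone
        return gains
    if rpm >= loop_rpms[-1]:
        gains[-1] = 1.0           # above the highest recording: play it alone
        return gains
    for i in range(len(loop_rpms) - 1):
        lo, hi = loop_rpms[i], loop_rpms[i + 1]
        if lo <= rpm <= hi:
            t = (rpm - lo) / (hi - lo)              # 0 at lo, 1 at hi
            gains[i] = math.cos(t * math.pi / 2)    # lower loop fades out
            gains[i + 1] = math.sin(t * math.pi / 2)  # upper loop fades in
            return gains
    return gains

# Hypothetical recordings at 1000, 2000 and 4000 RPM, engine currently at 1500.
print(loop_gains(1500, [1000, 2000, 4000]))
```

The equal-power curve keeps the perceived loudness roughly constant through the transition, which is one plausible reason why a system like this can feel smooth while you drive it.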

For now, you can’t add your own sounds to the granular section, as they would probably need to be edited in a very specific way for them to work here. On release day, Krotos offers 13 different vehicles that use this granular option but I’m sure more will be added with time or maybe made available for individual purchase in the future.

Driving Modes

As you can see on the interface above, there are basically two ways to control the engine simulation: manual and auto. At the same time, every car comes with a set of three presets: two of them are manual and the third uses auto. So, using the example pictured on the right hand side with the Dacia 1310:

-Dacia 1310: “Free” mode that uses manual driving.
-Dacia 1310 Manual Gears: Uses manual driving but with pre-determined gear shifts on the Revs progression.
-Dacia 1310 Auto Gearbox: Uses the auto mode.

Let’s see what the difference is between these three:

In general, manual mode allows you to freely change the engine’s RPM and also gives you a “load” knob. This parameter simulates whether you are putting pressure on the engine or, in other words, whether you are pressing the gas pedal or not, and allows us to create more realistic sounding gear shifts and decelerations.

The difference between the two manual presets is that in “free” mode the relationship between the granular RPM and the master rev knob is completely linear by default, so you have to play with the RPM value yourself to imitate the act of shifting gears. Here is a video of me doing just that, with a Revs pass first, followed by a Load pass. As you can see, to achieve a natural result, you need to drive the parameters in a realistic way. It would take a bit of practice to follow onscreen action like this but it feels very easy and responsive.

On the other hand, the other preset type, “Manual Gears”, has the gear shifting already soft coded into the mod section, including on load and off load changes. Of course, you can tweak this as you please but the preset gives you a nice starting point. As you can see, in this mode you don’t need to imitate the engine revs with your automation and you can just use curves to describe how hard you want to accelerate or decelerate.

For the most part, this works quite well when going up on the revs, but going down forces you to go through the whole set of gears, which doesn’t always feel natural, although sometimes you may want this (Formula 1 cars kind of do this sometimes). I tried different ways to avoid this, like staying within the boundaries of the same gear or jumping fast from a higher to a lower point on the envelope, although this needs to be carefully drawn as automation. A potential solution would be for the RPM ramps to occur only on our way up on the revs knob, not when we decelerate.

You can also notice how the load drops are already coded into the revs progression, which is pretty handy and also shows that I was too subtle with it on my free test.
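
The difference between the “free” and “Manual Gears” mappings is easier to see with numbers. Below is a rough Python sketch of both: a straight line versus a sawtooth with one ramp per gear. The gear count, the RPM values and the function names are all invented for the example, and I am not claiming this is how the factory presets are actually built.

```python
def free_rpm(revs, idle_rpm=900, max_rpm=6000):
    """'Free' style mapping: the master knob (0..1) maps linearly to RPM."""
    revs = min(max(revs, 0.0), 1.0)
    return idle_rpm + revs * (max_rpm - idle_rpm)

def geared_rpm(revs, gears=4, idle_rpm=900, low_rpm=2500, high_rpm=6000):
    """'Manual Gears' style mapping: one RPM ramp per gear along the knob.

    Returns (gear number, rpm). Each gear takes an equal slice of the knob.
    """
    revs = min(max(revs, 0.0), 1.0)
    position = revs * gears              # e.g. 2.3 = 30% into third gear
    gear = min(int(position), gears - 1)
    progress = position - gear           # 0..1 inside the current gear
    start = idle_rpm if gear == 0 else low_rpm   # RPM drops after each shift
    return gear + 1, start + progress * (high_rpm - start)

# Sweep the knob upwards and compare both mappings.
for revs in (0.0, 0.125, 0.25, 0.5, 1.0):
    print(revs, free_rpm(revs), geared_rpm(revs))
```

Because the sawtooth is a fixed function of the knob position, sweeping the knob back down replays every ramp in reverse, which matches the “going through the whole set of gears” behaviour I described when decelerating.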

The third preset, Auto Gearbox, uses the auto option, which doesn’t allow you to directly control the granular RPM or load and simply gives you a slider called “Power” that we can use to accelerate or brake, while the gear shifting is hard coded and can’t be tweaked. This would be similar to driving an automatic car.

Here is an example of me using this mode. Compared to the others, it feels a bit unresponsive at the start but once you get speed it works well, although the gear shifting doesn’t always feel “in the right place”. As long as you don’t need very precise and fast changes in RPM, this mode can be useful to get natural results quickly.

By the way, you may hear some clicks and pops in my examples above. I am not 100% sure if this is coming from Igniter or was an internal audio recording problem, but the Audi R8 definitely seems to be a bit more “clicky” on the exhaust than other cars I tried later.

Granular Advanced Controls

Lastly, the granular section also includes some other advanced controls:

-Shuffle Depth: It controls how thin or wide the slice is that the granular engine uses to select the samples. Higher values can help make the sound more natural and varied. Using the mods, you can, for example, make this value go up as the RPM goes up.

-RPM Smoothing: It slows down the response time to the changes in RPM. You can try increasing this if the engine feels too wild or decreasing it for a faster response, which could be useful on auto mode.

-Idle Fade: Use this to adjust the fade between the engine on idle and low revs.

-Crossfade: It controls the blending between different grains or audio slices making it more abrupt or smooth.

-Lim Threshold & Kick: The documentation doesn’t cover these two but I suppose they are related to an internal limiter.
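
As an illustration of what a control like RPM Smoothing typically does, here is a tiny one-pole smoothing sketch in Python. I don’t know which filter Igniter really uses, so treat the `smooth_rpm` function and its 0 to 1 `smoothing` range as my assumptions about the general idea.

```python
def smooth_rpm(targets, smoothing):
    """One-pole smoothing of a list of RPM targets.

    `smoothing` is in [0, 1): 0 follows the target instantly,
    values closer to 1 make the engine respond more slowly.
    """
    if not targets:
        return []
    out = []
    current = targets[0]
    for target in targets:
        # Move only part of the way towards the target on every step.
        current = smoothing * current + (1.0 - smoothing) * target
        out.append(current)
    return out

# A sudden jump from 1000 to 2000 RPM, with and without smoothing.
print(smooth_rpm([1000, 2000, 2000, 2000], 0.0))
print(smooth_rpm([1000, 2000, 2000, 2000], 0.5))
```

With smoothing at 0.5 the jump becomes a gradual climb towards 2000 instead of an instant step, which is the “less wild” feel the control is after.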

Synth Section

It includes 5 oscillators with two different waveforms each that you can blend together. You can also control the frequency and gain of each of the oscillators. There is frequency and amplitude modulation available for each oscillator plus a vibrato option.

And that’s pretty much it. Sounds basic but it is indeed powerful as you are able to link any of these parameters to the master rev knob creating dynamic designs that will grow in intensity and speed as the revs go up. You can also combine synth layers with real engines to create hybrid engines that combine real recordings and synths.

Here are some examples of sci-fi designs I did from scratch. Something that I missed is more options for the noise generator. It would be great to have more noise colours to create textures with, or maybe a filter to shape it. The ability to apply separate FX to different oscillators would also be amazing.

One Shot section

This tab allows you to trigger certain individual sounds at specific moments on the rev progression curve. Maybe the most obvious use for this section is to trigger tire skids when we go up on the revs or screeching braking sounds when we go down. In any case, this section is great to add sweeteners and flavour to the design.

There are four slots where you can drag and drop sounds. Unlike the granular engine, you can use your own sounds here and drag and drop them from Finder. Each slot can be monitored independently and there are individual knobs to control volume and pitch. Both of these can also be controlled with an envelope instead of a knob, which offers interesting possibilities.

On top of the sample area there are four “timelines”, each of them corresponding to one of the slots. Here is where you can choose when you want the samples to be triggered, but the horizontal axis doesn’t represent time but rev progression. In other words, you get to decide where in the acceleration curve you want some samples to be triggered.

Directionality is also accounted for. You can trigger samples as the revs go up or as the revs go down depending on which way the triangle is pointing. You can also have a sample that will be triggered both ways (diamond shape) and stop currently playing samples on the slot (square shape).
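
If you think of each marker as a threshold on the revs axis, the triggering logic can be sketched in a few lines of Python. This is only my reading of the behaviour: `fired_triggers`, the tuple layout and the sample names are hypothetical, and I left out the square “stop” marker to keep it short.

```python
def fired_triggers(prev_revs, new_revs, triggers):
    """Return the names of triggers crossed when revs move from prev to new.

    Each trigger is (position, direction, name), where direction is
    "up" (triangle one way), "down" (triangle the other way) or
    "both" (diamond).
    """
    fired = []
    for position, direction, name in triggers:
        crossed_up = prev_revs < position <= new_revs
        crossed_down = new_revs <= position < prev_revs
        if crossed_up and direction in ("up", "both"):
            fired.append(name)
        elif crossed_down and direction in ("down", "both"):
            fired.append(name)
    return fired

# Hypothetical slot layout on a 0..1 revs axis.
markers = [(0.3, "up", "skid"), (0.6, "down", "brake"), (0.5, "both", "pop")]
print(fired_triggers(0.2, 0.55, markers))   # accelerating
print(fired_triggers(0.7, 0.4, markers))    # decelerating
```

In the accelerating pass only the skid and the pop fire, while the decelerating pass fires the brake and the pop, since the brake marker only listens for downward crossings.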

In general, the system is clever and nice to use but I feel that you’d really need some playlist and randomisation controls to make it really powerful. My idea would be to basically turn each of the slots into something like an FMOD event. This way, you could add a playlist of sounds and control how to cycle through them or randomly jump between them.

This will give you a much richer system, where you can use sets of skids, terrain or engine pop sounds to choose from each time the event is triggered. For this to work well, you should be able to choose how deterministic the system is, in case you need predictability. Being able to tweak or re-shuffle the samples that were triggered after a pass would be also a good approach. I know Krotos is working on a run-time, middleware version of Igniter, so maybe something like this is already in mind.
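
Here is roughly what I have in mind, as a Python sketch. Everything in it (the `OneShotPlaylist` class, the two modes, the `seed` argument, the file names) is my own invention for the wish list and does not exist in Igniter. The seed is what would give you the predictability: the same seed always produces the same sequence of samples.

```python
import random

class OneShotPlaylist:
    """A slot that holds several samples instead of a single one."""

    def __init__(self, samples, mode="shuffle", seed=None):
        self.samples = list(samples)
        self.mode = mode                  # "cycle" or "shuffle"
        self.rng = random.Random(seed)    # fixed seed = repeatable passes
        self.index = 0
        self.last = None

    def next_sample(self):
        """Pick the sample to play the next time the trigger fires."""
        if self.mode == "cycle":
            sample = self.samples[self.index % len(self.samples)]
            self.index += 1
        else:
            # Avoid playing the same sample twice in a row when possible.
            choices = [s for s in self.samples if s != self.last] or self.samples
            sample = self.rng.choice(choices)
        self.last = sample
        return sample

skids = OneShotPlaylist(["skid_01.wav", "skid_02.wav", "skid_03.wav"], seed=7)
print([skids.next_sample() for _ in range(5)])
```

Re-shuffling after a pass would then just be a matter of changing the seed and playing the automation again.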

Loop Section

Although the one shot section includes an option to loop its samples, this tab gives us much more power and control of sounds that need to be looped. It can be used in conjunction with the granular system or just by itself to create a completely new vehicle system.

This is pretty powerful. It allows you to have your own responsive car design, provided that you have recordings of steady RPMs to use. You can also use the loop section to add texture or detail to the granular generator. You can add things like gravel, dirt, snow, clattering, squeaking or engine pops and link their intensity to the master revs knob.

You have four slots for loops and you can control their volume and pitch. The interesting and very handy thing is the section on the upper side. It allows you to customise how you want to blend your four loops together giving you the tools to smooth out both the crossfades and pitch changes between the transitions.

To obtain a good result, you need to make sure you have audio clips that loop cleanly. The Amp section helps when determining the boundaries between the clips but I miss more control over the actual volume of each of the sounds when I need to balance them out. I’ve noticed that some of the factory presets actually use the mod section to control this by using the general gain of the whole looping section, but this strikes me as a bit left field. Shouldn’t I be able to control the gain of each sample with the Amp section? A gain parameter independent of the crossfades is needed here, I think.

On the other hand, the Pitch section is very nice to have and it works well. It would be amazing to be able to analyse the pitch of each of the samples and get a “suggested pitch curve”. This could be just a starting point so you can then tweak them by ear later.

The workflow in the loop section is a bit odd. You can’t hear anything unless the main engine switch is on, but as soon as you switch it on the first loop triggers, so you can’t hear what you want in isolation unless you manually mute the first slot. Additionally, when building the loop progression, sometimes a slot doesn’t emit sound and you need to manually hit its play button. Kind of annoying.

So here is an example where I’ve built a Peugeot 307 engine from library recordings. For sure, the result is not as smooth as the granular presets and it sounds a bit “processed”, it’s like you can hear the artificial pitch bending too much. There are also many dropouts in the audio level and I don’t know if this is my fault or if there is a way to remedy that. The factory presets that use the loop system are cleaner than this but I can still hear some dropouts on those so maybe this is a bug?

As for the sound in general, it depends on how you drive the RPM and I assume creating a robust and good sounding vehicle system takes more sample preparation and tinkering than my quick test took. I was also thinking that maybe I chose the incorrect range of RPM loops and I missed having more slots so I can use more RPM states and make the progression smoother.

Browser

It is used to choose and monitor samples for the granular, one shot and looping sections. The tagging system is very nice. Igniter includes a nice selection of different engines and sweeteners, and many cars also include recordings of doors, horns or wipers ready to use. I’ve noticed that you can’t drag and drop these sounds from Igniter to Pro Tools, which would probably be my first instinct if I just wanted a car door sound on the DAW’s timeline. The alternative would be to have the sound on the One Shot section and either trigger it via Pro Tools automation or via the timeline system.

Other than the factory sounds, you can also use the “Files” tab to browse around your own computer files, including external drives, which is very nice.

Something I’ve noticed that is a bit counter-intuitive is that, in order to preview a sound on the browser, the engine button needs to be on. Maybe that’s the case because that button just mutes the whole plugin internally, but it took me a minute to figure it out.

Mods

As I have mentioned before, this is a very powerful and important section of Igniter and probably the one that I liked the most. It reminds me of Propellerhead’s Reason where you can flip the rack and apply envelopes and LFOs to any parameter in the system.

Basically the mod section allows you to link any parameter within Igniter to the master revs knob. You just need to drag the name of the desired parameter and drop it on the mod area. Then, you can edit the envelope that will govern this behaviour and also use an LFO to add some randomness or movement to any of these relations. The range or scale of the change can be adjusted with the sliders that appear to the right of each parameter. There are 8 mod slots so you can create very different envelopes and very complex systems.
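
Conceptually, each mod slot is a curve from the revs knob to a parameter, scaled by a range and optionally wobbled by an LFO. The Python sketch below shows that idea in its simplest form. The breakpoint format, the function names and the sine LFO are assumptions of mine for illustration, not a description of Igniter’s internals.

```python
import math

def envelope_value(revs, points):
    """Linearly interpolate (revs, value) breakpoints sorted by revs."""
    if revs <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if revs <= x1:
            return y0 + (y1 - y0) * (revs - x0) / (x1 - x0)
    return points[-1][1]

def modulated(revs, points, lo, hi, lfo_depth=0.0, lfo_hz=0.0, t=0.0):
    """Scale the 0..1 envelope output into [lo, hi], plus an optional LFO."""
    shape = envelope_value(revs, points)
    shape += lfo_depth * math.sin(2 * math.pi * lfo_hz * t)  # add movement
    shape = min(max(shape, 0.0), 1.0)                        # stay in range
    return lo + shape * (hi - lo)

# Hypothetical curve: a filter cutoff that opens slowly, then fast near the top.
curve = [(0.0, 0.0), (0.5, 0.2), (1.0, 1.0)]
print(modulated(0.75, curve, lo=200.0, hi=2000.0))
```

In this picture the `lo` and `hi` arguments play the role of the range sliders, and the breakpoints play the role of the envelope you draw.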

By default, the RPM within the Granular section is linked linearly to the master revs and from there, you can link all sorts of other stuff, including FX, to make the engine more dynamic and responsive. Have a look at the presets to get some ideas of what you can do with this. It really allows you to get creative.

I was also thinking that it would be very nice to be able to use the mod section on other things than the RPM. As an experiment, I tried to turn the master revs knob into a distance knob, decoupling it from the granular RPM and linking it in several ways to volume, reverb and EQ.

Why would I want to do this? Because controlling the distance and perspective between shots is probably one of the most time consuming things to do in a vehicle scene. My experiment kind of works, although when you do this, you lose the ability to link other stuff to the vehicle RPM. So, for a really powerful, all in one, vehicle design tool, I would love to have 3 master parameters: Revs, Distance and maybe a third custom one. This is maybe outside of the scope or workflow that Krotos had in mind but that is at least how I would try to design it. Of course, you can create a similar effect just in your DAW, but using this method you are able to link many things at once to the “distance knob” like engine/exhaust mix, granular FX, reverb sends, etc., speeding up workflow massively.
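
As a sketch of what such a distance knob could drive, here is a small Python function that turns a distance in metres into a few linked settings. The gain follows the usual inverse distance law (about 6 dB less per doubling of distance); the reverb send and low-pass curves are arbitrary choices of mine, and none of these parameter names come from Igniter.

```python
import math

def distance_settings(distance_m, ref_m=1.0, max_m=200.0):
    """Map a 'distance knob' to gain, reverb send and a low-pass cutoff."""
    distance_m = min(max(distance_m, ref_m), max_m)
    # Inverse distance law: roughly -6 dB every time the distance doubles.
    gain_db = -20.0 * math.log10(distance_m / ref_m)
    # 0 when close, 1 at max distance, on a log scale (illustrative choice).
    far = math.log(distance_m / ref_m) / math.log(max_m / ref_m)
    return {
        "gain_db": gain_db,
        "reverb_send": far,                                 # more reverb far away
        "lowpass_hz": 20000.0 - far * (20000.0 - 4000.0),   # duller far away
    }

for metres in (1, 2, 50, 200):
    print(metres, distance_settings(metres))
```

A single automation lane for the distance would then move all of these together, which is the time saver I am after.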

FX & Mixer

This section is pretty straightforward, nice to use and clean. You can control the level for each of your audio generators, plus you have an FX send and a Pan pot. While the sends and FX are pre-fader, the Pan is post-fader. Each section has a rack with 5 slots where you can hook up FX. The FX that we can use are:

-EQ: Very nice parametric EQ with everything you need. Works great.
-Compressor: Very good too, with a gain reduction meter and a limiter mode.
-Limiter: Simple and clean dedicated limiter, useful to make sure you don’t saturate the output at high RPMs.
-Saturation: Good for adding some extra nastiness to an engine with extensive controls and colour presets.
-Transient Shaper: An unusual addition to a plugin like this since engine sounds don’t have many transients but it could be cool to use to add or remove dynamics to the granular section or on sweeteners.
-Flanger: Nice for sci-fi designs.
-Noise Gate: I suppose it could be useful if you have a noisy recording on your one-shot section.
-Ring Mod: Pretty cool and alien sounding and a nice addition for creating sci-fi stuff.
-Convolution reverb: Very good to have to recreate distance or an “in-car” sound. The controls are quite simple but you probably don’t need much more. I miss having more outdoor IRs in the factory library.
-Doppler: Very nice if you need to quickly cover passbys. You can control it independently or attach it to the main Revs knob. Passby presets are already created for each vehicle which is very handy.
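
As a side note on the Doppler effect, the pitch change in a passby follows a simple textbook formula for a source moving straight towards or away from a still listener. The Python sketch below only shows that physics; the function name and the 343 m/s speed of sound are my assumptions and it says nothing about how Igniter’s Doppler FX is implemented.

```python
import math

def doppler_ratio(speed_ms, approaching, sound_speed_ms=343.0):
    """Pitch ratio heard by a still listener for a source moving
    directly towards (approaching=True) or away from them."""
    if approaching:
        return sound_speed_ms / (sound_speed_ms - speed_ms)
    return sound_speed_ms / (sound_speed_ms + speed_ms)

# A car doing 30 m/s (108 km/h): pitch before and after it passes by.
for approaching in (True, False):
    ratio = doppler_ratio(30.0, approaching)
    print(approaching, round(ratio, 3), round(12 * math.log2(ratio), 2), "semitones")
```

The pitch is raised while the car approaches and lowered once it has passed, and the drop between the two is what we hear as the classic passby.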

General Workflow

In terms of workflow, Igniter allows you to create the engine RPM movements in a very quick and flexible way and of course, you can always come back and tweak the automation to make it work better. Additional passes controlling other parameters (like load) can add extra realism and detail.

The loops are nice to have since you can, for example, make any car go on gravel or dirt just by adding a loop layer to the granular. The one shots are not that useful, in my opinion, since you can only have four individual sounds and you can’t assign probability or playlists to the triggers, so every time you pass through them on the RPM curve, you would hear the same exact sound. The way it works right now, I think you would be better off just editing sweeteners like skids manually in your DAW the old-fashioned way and using Igniter for the engine itself, but I’m open to being wrong about this.

You would probably need two instances of Igniter, one for exterior shots and one for interiors, unless you want to do the interior treatment outside. Once you have the basic RPM behaviour down, you would then need to mix it into the scene with fader work, pan and distance attenuation. That’s why I was thinking that it would be cool to have a dedicated master distance knob so you can tweak this in one go once you find a reverb that works with the scene. With the system I’m imagining, you would do an RPM pass, a distance pass, some tweaks here and there and you would be done for that car. Rinse and repeat.

Lastly, it’s also important to mention that Igniter offers a multi output so you can get an individual signal from each layer and mix them in any way you want in your DAW. This is very much appreciated.

Is Full Tank worth it?

Krotos offers an expanded version called “Igniter Full Tank” which includes all the unprocessed and processed recordings used to build all the presets. You get a lot of coverage for every vehicle in Igniter plus loads of foley and sweeteners. The recordings are a great library just by themselves (75 GB of additional audio) and in combination with Igniter will allow you to cover every single detail and sound you may need. To clarify, these extra sounds come as separate audio that you can then browse within Igniter, but they don’t include new presets or vehicles.

Conclusion

I hope both you and I now have a good understanding of how Igniter works and what it can offer. I had a lot of fun testing the plugin. Krotos keeps giving us innovative tools to create custom, unique soundscapes and I feel that with them we can offer much more value to our clients because the result is unique and personal.

Above all, the granular system sounds great and I know how hard it is to make interactive engines sound good. I’m sure more content will come for the plugin in the future and maybe some workflow quirks will be fixed with time. As for the features I’ve been suggesting, they are just my own take on how I would improve the software’s workflow and capabilities and, since I’m sure some concepts and perspectives have escaped me, I will remain open to new and better ways of using Igniter as it spreads across studios worldwide.

Thanks for reading!