Time alignment

Discussion in 'Location Recording' started by Thomas_Vingtrinier, Jan 17, 2005.

  1. Hi,

    This is a subject which has been quite rarely addressed in pro recording forums and I guess we have the right audience here to gather valuable experiences and opinions on the matter. As far as I am concerned, I think the time alignment technique - when done properly - closes the gap between the old recording school approach (single stereo pair, be it A-B, ORTF, Blumlein, XY, Jecklin, you name it) and large multitrack projects, in the sense that one (almost) gets the best of both worlds: the precision, detail and impact of multitrack close-micing techniques, and the spaciousness, naturalness and depth of stereo pair techniques.

    Although I am very much in favour of the good old-fashioned techniques, I believe there are many situations where spot micing is required to capture the full depth and emotion of a musical event. This is particularly true in the case of a symphony orchestra, for which a good balance between all sections is quite difficult to achieve with a 2-mic setup. Now comes the question of gently blending these spot microphones with the reference stereo pair. I have seen very different approaches from respectable engineers and corporations, which led me to the following classification (feel free to extend it for the sake of the discussion):

    1/ Simple time alignment: if an instrument (or a section) needs to be reinforced in the mix, a spot microphone (usually one with some directivity – but this could be discussed as well) will be placed quite close to the instrument. In this setup, the wavefront will hit the spot mic first then the main stereo pair. It is then quite easy to make an almost sample-accurate calculation of this delay just by looking at the waveforms on your favourite DAW (I personally clap my hands just at the place where the instrument would play). Joe, am I wrong to believe this is the approach you are using? I think I have also read on the Recpit forum that Nika Aldrich is doing something similar. I may well be wrong but it seems like the Deutsche Grammophon 4D process is also doing something comparable.

    2/ First reflections reinforcement: this is a route that has been chosen by the Danish Broadcasting Corporation. I hope DPA will not mind me quoting this extract from their excellent Microphone University section on their website: “Time alignment is very much dependent on the room or concert hall in which the recording is taking place. If each microphone blindly is delayed according to the distance and the speed of sound, there will be severe phasing problems if the musicians move while playing. To overcome the phasing problems while still preserving the timbre of the individual instrument the time delay has to be approximately 25% longer than the first coming sound (at the main stereo pair), calculated relatively to the first reflection (often the floor reflection).”

    3/ No time alignment: I guess most classical recordings are done without any kind of time alignment calibration whilst still sounding extremely good. This is for instance the case with those done at Radio France which has deliberately chosen not to introduce any time delay processing in their mixes. Generally, the spot mics are blended at quite a low level (between -20dB and -15dB compared to the main stereo pair) and positioned laterally with the panpot.
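
    Approach 1/ lends itself well to automation. As a purely illustrative sketch (the function name and toy signals below are mine, not part of anyone's actual workflow), the sample-accurate measurement amounts to locating the clap spike on the spot track and on a main channel, then subtracting:

```python
import numpy as np

def clap_delay_samples(spot, main, sr=44100):
    """Estimate the spot-to-main delay of a recorded hand clap by
    locating the impulse peak in each track. The clap reaches the
    spot mic first, so the result is positive."""
    delay = int(np.argmax(np.abs(main)) - np.argmax(np.abs(spot)))
    return delay, delay / sr * 1000.0  # (samples, milliseconds)

# Toy example: a unit impulse at sample 100 (spot) and 982 (main)
sr = 44100
spot = np.zeros(2000); spot[100] = 1.0
main = np.zeros(2000); main[982] = 1.0
samples, ms = clap_delay_samples(spot, main, sr)
print(samples, ms)  # 882 samples, about 20 ms at 44.1 kHz
```

    In a real hall the clap is smeared by reverberation, so in practice you would look for the first strong transient rather than the absolute maximum.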

    Well, before going into further details, I am now very impatient to read your first comments on the subject.

  2. DavidSpearritt

    DavidSpearritt Well-Known Member

    We have had limited experience with this requirement as we do very little orchestral recording where it would be important enough to get it really right, ie CD sessions. For most of our live orchestral gigs destined only for FM broadcast there are so many other things (sources) to stuff up the recording, that we do not bother ... but

    We tend to go with method 3, and keep spots so low as to be just perceptible, making delays an unnecessary complexity.

    When there is a profound need for delay, and I can think of a "studio" session with vocal soloists behind the orchestra, but with significant spot reinforcement, we do a combination of 1 and 2, but start with simple delay calculated from the distances and then listen carefully around that delay setting to find the best actual delay, and this is usually just in excess of the calculated value, ie tending toward method 2. So we start with 1 and adjust toward 2 but I cannot recall ever concluding that it needs to be adjusted by 25% more.

    Also, the other factor for us is that we do very little multitrack recording, and when mixing live to stereo it is difficult to arrange spot delays. In the past, when we have used it, we were recording to DA88 and dialled in sample delay on the way out, or more recently a GX8000, where we adjust in the Wavelab montage; but we use our ears in the end to determine the final value.

    As I said, not much experience to draw on here. Others will surely have more to offer.

    Great topic though, and I am acutely interested in what others use as well.

    FWIW I find a lot of DG's 4D orchestral recordings to be so flat front to back that much is lost. I am thinking of the Abbado Beethoven cycle (~2000) and the Rach Symphony Nr 2 with Pletnev and the Russian National Orchestra. Both very fine multitrack recordings, but with very little front-back depth. Don't get me started on the Zimerman again.

  3. FifthCircle

    FifthCircle Well-Known Member

    For me, it really depends on the kind of music, the kind of ensemble and the situation involved as to how I time align.

    For film mixes where I am dealing with many, many tracks (like the 24-60 track mixes), I will almost never time align. Film music is often supposed to sound broad and overly present. That is all down to the presentation, where the music often must compete with dialog, effects, etc...

    For my mainstream classical stuff, I will almost always fully time align everything to a central plane in the recording. For an orchestra, that plane is based on the front array of microphones (the center pair and flanks). If the front array is not in a plane, I may or may not time align to the furthest-out main mic (i.e. a Decca tree, or if my flanks are closer in to the orchestra than the main pair). All spots inside the orchestra will be time aligned to that plane - woodwinds, harps, piano, celeste are the usual spots for my orchestral work. The levels are kept to the point where I get the feel or articulation of the instrument, but not where I can hear the microphone. In non-delayed recordings, I cannot stand the sound of woodwinds that sound closer than the violins. I find that you can also get a lot more gain out of a spot microphone when it is delayed than when it is not.

    To figure out my delay times in such large-form recordings, I will do one of two things: I will either tote a measuring tape out with me and sit down with a pencil and paper and work out the times (gee, that high school math is finally good for something :D ), or I will stand under the main pair of mics and clap while recording, to give an impulse from which to measure out the delays on the workstation. Both work just fine - it is a question of preference.

    In chamber recordings, if I use spots I will often use delays, too, but not always. Here it is really program dependent. For a lot of contemporary music I don't find myself delaying, but if I need a spot on a piano quartet (for the piano usually), I will almost always delay that.

    For jazz work, I almost never use delays for ensemble use, but I use them all the time on specific solos. I find, especially with vocalists, that the sound of a vocalist needs to be "pushed back in the mix". There are two ways to do this: more reverb, or more time for that signal to reach the speaker. The delay is of course more transparent, but sometimes I use a combination of the two to get the sound I am looking for. I also find that sometimes the sound of a solo mic competes with the main mic in these shows and creates a sound that is almost like comb filtering (a delay of 3-5 ms between solo spot and mains can do this). Using a delay on the solo mic puts it in time, and the comb-filtering sound will then disappear...

    Anyways, I'm not sure which categories to put this in, but that is the way I operate. :?

  4. JoeH

    JoeH Well-Known Member

    Ben has pretty much covered what I'd say - probably better at that. I can't speak for him, but I think Jeremy has some thoughts on all this as well. (Didn't we have a thread going about this before? Oh, perhaps one of the earlier ones that got "dumped" accidentally...)

    Years and years ago, I used to think it was overkill, and since (at the time) I was doing mostly "live to 2 track" classical/jazz recordings, it was out of reach anyway, and never something we had time to do, esp. if it was going to be on the air almost immediately on FM broadcasts, etc. (We also used the same approach as Dave mentions: using spot mics sparingly, for detail or touch-up (same as nowadays) but without the time align.) It has a sound of its own, and not always that bad.

    With today's ubiquitous multitracking - available even at OUR "indie" level of operation - it's easier to think about getting the best of both worlds. For large multitrack recordings, when the hall is quiet (empty, actually) I do a "slap" at the conductor's podium, or hit two items together (wood blocks, etc.) for a fast-attack impulse. Even a leather belt held at either end, with a good "whip-crack" pull, works... but you may get some odd looks if they think you're about to remove your pants. 8)

    As Ben mentioned, timing other sources to the front array is obviously a good way to go.

    But of course, you CAN overdo things; like everything else in the digital world, you can go too far, and make it all too homogenous. While I certainly don't want the winds or soloists jumping in EARLY on the time line, I also don't want things flattened out to the point of no details or spaciousness at all.

    With any good DAW (esp. with a visual display of the waveforms relative to each other, down to the sample level), you can easily check visually as WELL as listen to the results. Pretty good deal, IMHO. In the end, it's listening to it all in context that will decide for you.
  5. Exsultavit

    Exsultavit Active Member

    Ben & Joe-

    I'm enjoying this discussion VERY much! A deep subject, but my question on this post is a simple procedural one: Ben if I got your method correctly, you do a handclap/ woodblock pulse at the conductor's podium, see that waveform on the DAW, and adjust accordingly. I'm not quite sure of what's next-- can you go over your time align procedure in a bit more detail?

    ONE SCENARIO: I guess one of the problems with spot micing in a multitrack orchestral session is that when you bring up, say, the woodwinds without delay the listener will hear the spot mic BEFORE the winds get into the mains, making them seem unnaturally close to the listener. So the delay may help with this- and the alignment method might be...

    On the DAW screen, I'll see the clap first on the mains, then on the spot mics that are further out. Obviously, if I line up these impulses now, I'll just be bringing the spots even MORE forward in time (to match the impulse that entered the mains first). Wouldn't this just be exacerbating the problem? So is your method, Ben, to measure/see the delay between the spot and the mains and then to delay the spots AGAIN by that amount, doubling the delay? Please assist.


  6. FifthCircle

    FifthCircle Well-Known Member

    By finding the impulse in the waveform, you can measure the time it took for the sound to reach those microphones. Delay the tracks by that amount. So, for instance, you'll see your woodwinds at about an 18-20 ms difference in your standard orchestral setup. Delay the tracks by that amount so that the sound all reaches the reference point at the same time. Otherwise, the sound in this example hits your woodwind mics before your mains, thereby making your woodwinds sound closer to you than, for example, the violins. Delay them by the specified amount and now the sounds are all referenced to that point, and you have a much better sense of depth in your recording.

    If you use the measuring tape, sound travels at roughly 1080 feet per second at sea level at 70 degrees Fahrenheit, or a touch less than 1 ms per foot of distance.
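
    The measuring-tape method translates directly into a couple of lines of code. A small sketch using the 1080 ft/s figure quoted above (the function name is mine; swap in a different speed of sound if you prefer):

```python
SPEED_OF_SOUND_FT_S = 1080.0  # the figure quoted above; adjust for temperature

def distance_to_delay(feet, sr=44100, c=SPEED_OF_SOUND_FT_S):
    """Convert a taped distance (in feet) into a delay in ms and samples."""
    seconds = feet / c
    return seconds * 1000.0, round(seconds * sr)

# e.g. woodwind spots about 20 ft behind the reference plane
ms, samples = distance_to_delay(20.0)
print(round(ms, 2), samples)  # about 18.5 ms / 817 samples at 44.1 kHz
```

    Note the 20 ft example lands right in the 18-20 ms range mentioned above for woodwind spots.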

  7. Exsultavit

    Exsultavit Active Member

    Thanks, Ben! That does seem obvious upon reading your explanation...

  8. FifthCircle

    FifthCircle Well-Known Member

    Forgive me if some of the stuff I say isn't clear at the beginning when I write it... So many of these techniques are just second nature for me by now that I make assumptions as to what people know coming into it.

    If anybody isn't clear with anything I write, please flag it- I'm happy to explain further...

  9. Yes indeed. I for sure agree these recordings do not give much credit to the time alignment technique.

    That is also a very valid point. When the right delay is applied, the spot mic and the main pair contributions are 'in phase' which explains the gain boost effect.

    Yes, I believe both approaches (handclap and measuring tape) will give you a rather good estimate of the delay to be applied, but I think we need to be more accurate to get the best out of this technique. Let me substantiate my point in the following post.

  10. I would like first to demonstrate the necessity to be VERY accurate when it comes to identifying this delay.

    Imagine we calculate an overall delay of 20 ms, with a 0.5 millisecond error compared to the exact delay we are aiming for. The worst case for a 0.5 ms error corresponds to a waveform whose frequency is 1 kHz (a 1 kHz sine wave delayed by 0.5 ms is exactly 180° out of phase with its un-delayed counterpart). Intuitively, one can understand that any sound from the instrument we are trying to reinforce whose frequency content is well below 1 kHz will benefit from the gain boost effect mentioned in the previous post. On the contrary, all frequency content around and above 1 kHz will be messed up despite the time delay correction, the worst case (180° out of phase) occurring at 1 kHz but also at 3 kHz, 5 kHz, 7 kHz, etc.
    The formula for the critical frequencies is: f(n) = 1/(2·Te) + n/Te = (2n+1)/(2·Te), where Te is the timing error and n is a non-negative integer. Well, that's the typical and unwanted comb-filtering effect.

    The thing is that a 0.5 ms error is not much - that's only a 2.5% error on the overall 20 ms delay - but it can have a dramatic effect on the mix in a frequency band where the human ear is very sensitive (1 kHz). From the above formula, one can also notice that the smaller the error, the higher the critical frequency (which is a good thing). So, we must do better and try to further reduce the uncertainty on the timing error. This is where an improved version of the time alignment approach should come into play.
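
    The critical-frequency formula is easy to tabulate if you want to see how much a given timing error costs you; a throwaway Python helper (mine, purely illustrative):

```python
def notch_frequencies(timing_error_ms, count=4):
    """Frequencies (Hz) at which a residual timing error Te leaves the two
    signals 180 degrees out of phase: f(n) = (2n + 1) / (2 * Te)."""
    te = timing_error_ms / 1000.0  # Te in seconds
    return [(2 * n + 1) / (2 * te) for n in range(count)]

print(notch_frequencies(0.5))  # about 1000, 3000, 5000, 7000 Hz
print(notch_frequencies(0.1))  # smaller error: first notch climbs to about 5 kHz
```

    A 0.5 ms error reproduces the 1/3/5/7 kHz series above; shrink the error to 0.1 ms and the first notch moves up to 5 kHz, out of the ear's most sensitive band.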

    When I first addressed the problem, I realized that clapping my hands just in front of the main stereo pair does not give the same result as clapping my hands in front of the spot mic. The same order of magnitude for sure, but not exactly the same. As a matter of fact, you would get exactly the same result only if the instrument's emitting point, the spot mic, and the main pair were perfectly aligned. So the only way to do this properly is to clap my hands in front of the spot mic, exactly where the instrument would play, and NOT in front of the main pair.

    Now the second element that comes into play - which I consider to be the 'secret' of proper time delay compensation - is the fact that the distance between the two main microphones (the stereo pair) should not be considered negligible. If the instrument (and therefore the spot mic) is even slightly off-centre to the right, the sound wave will first hit the right microphone of the stereo pair, then the left (obviously the argument is not valid for coincident techniques). On a first approach, one could consider the time difference very small and negligible, but it is a different story when you realise these time differences are the very basis of the stereo effect in any non-coincident technique.

    Putting it all together:
    Back home, in my favourite DAW (Sequoia), I can easily identify the spikes on the waveforms due to my hand clapping. I can therefore calculate with almost single sample accuracy both the delays (in samples) between the spot mic and the main left microphone, and between the spot mic and the main right microphone. Then I use a very simple tool from AnalogX called Sample Slide (and it’s free…), which allows me to delay separately the left and right channels of a track, in my case the spot mic track. A little bit of panning will also enhance the lateral positioning of the spot mic. Et voila!

    My opinion could be biased, but the trick of introducing the delay between the left and right channels made a big difference in my mixes, especially regarding the flattening effect of the stereo image that I also noticed (ref. David Spearritt comments) when I was only using the simple time alignment technique (that’s option 1/ in my first post).

    There are other small refinements and extensions of this technique that I would be happy to share, but first I would love to hear further comments and contributions on the subject. Even more interesting, I am very eager to read your impressions if you try to implement this technique in a forthcoming recording session.

  11. FifthCircle

    FifthCircle Well-Known Member

    Interesting post Thomas...

    The only issue that I see is that with so many different delays on stage, you can go positively batty trying to figure out which to use. By that point, the exactness of measuring an impulse is lost. If you take a more purist approach in the recording, it may work better, but I usually take at least 6 mics to record an orchestra - rarely fewer (main pair, flanks, woodwind spots).

    Phase, I find, matters most over short distances (i.e. your concerto/solo spot and your mains), but over the longer distances I use the delay to create a sense of depth in the recording that otherwise wouldn't be there - phase doesn't even enter into it. As long as the measurements are within a couple of milliseconds of where they need to be for those more distant mics, they work fine. For the close mics, I agree it is quite important to be as exact as possible.

    When I first started doing this, I was mixing on a digital console (Yamaha 03D and 01V) and all of my recordings were going direct to stereo. No multitrack involved. For this, an impulse was useless and it was all done with the measuring tape. The differences between delayed and non-delayed recordings were night and day. I'd find that to get the woodwind mics to disappear in the non-delayed recording, I'd run them at 10-12 dB down (digital attenuation in the board). However, once the delays were entered in, the gain would often rise to 6-8 dB down, as I wouldn't have the time/depth issues anymore.

    Oh well... Just a few thoughts.

  12. David French

    David French Well-Known Member

    Fascinating discussion, guys! I'd like to add a few small points. First, when a delayed copy is combined with an original signal, the delayed copy must be within about 20 dB of the original's level for any audible comb filtering to occur. Also, with the use of different microphones and whatever may happen to the sound in the air on the way to the far array, the sources should no longer be perfectly correlated. Second, I don't get this 25% stuff; 1-35 msec is the approximate range of the fusion zone where comb filtering is possible. Third, I'm getting 1128.6 ft/sec for that temperature and pressure, Ben. How are you calculating?
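
    For what it's worth, the standard ideal-gas approximation for the speed of sound in dry air lands very close to David's number; a quick check (constants are the textbook ones, not anything from this thread):

```python
import math

def speed_of_sound_ft_s(temp_f):
    """Speed of sound in dry air, c = 331.4 * sqrt(1 + T_C / 273.15) m/s,
    converted from Fahrenheit input to feet per second."""
    temp_c = (temp_f - 32.0) * 5.0 / 9.0
    c_ms = 331.4 * math.sqrt(1.0 + temp_c / 273.15)
    return c_ms * 3.28084

print(round(speed_of_sound_ft_s(70), 1))  # about 1128.5 ft/s at 70 F
print(round(speed_of_sound_ft_s(32), 1))  # about 1087.3 ft/s at freezing
```

    So 1080 ft/s corresponds to air well below room temperature; at a 70 F stage, Ben's "a touch less than 1 ms per foot" is closer to 0.886 ms per foot.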
  13. DavidSpearritt

    DavidSpearritt Well-Known Member

    Thomas, great post.

    This implies fully phase-coherent addition of the two signals, which then surely is true only at low frequencies for the distances we are talking about, ie the distance between a main pair and a pair of woodwind spots, which might be some 25 feet back.

    Comments? At what frequency do you think the cutoff occurs for the transition between phase-coherent and incoherent addition at these distances?
    This is just a question of methodology, Ben. From what I have read on this forum, I am sure someone like you would double-check EVERY mic before starting a recording session. So you or your assistant would make some noise in front of every single mic to check the signal is indeed passing through the whole rig. Well, I do exactly the same, except that I press the record button before starting the check (so that I do not even need to take notes!), and I clap my hands in front of each spot mic. Back in the studio, it takes me less than a minute per spot mic to calculate the left and right delays just by looking at the waveforms. Sample Slide is then set once and for all on the corresponding track inserts.
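
    If the clap spikes are hard to eyeball (soft hands, a reverberant hall), cross-correlation finds the same offset automatically. A hypothetical helper, not part of Thomas's actual routine:

```python
import numpy as np

def xcorr_delay(spot, main):
    """Delay (in samples) of `main` relative to `spot`, taken from the
    peak of their cross-correlation; positive when the sound reaches
    the spot mic first."""
    corr = np.correlate(main, spot, mode="full")
    return int(np.argmax(corr) - (len(spot) - 1))

# Toy clap: an impulse at sample 100 (spot) and 982 (main)
spot = np.zeros(2000); spot[100] = 1.0
main = np.zeros(2000); main[982] = 1.0
print(xcorr_delay(spot, main))  # 882
```

    Cross-correlation uses the whole waveform rather than a single peak, so it tends to be more robust when the impulse is smeared by room reflections.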

    For large orchestral works, six mics is also the minimum I would consider to be on the safe side, although I also had some good results with only 4 mics.

    The comment is very pertinent. Indeed, the SNR for 'closer instruments' (relative to the main pair) is generally much higher than for distant ones, which explains the higher sensitivity to phasing effects. However, I would maintain that from a pure mathematical standpoint the comb-filtering effect is still there for distant mics, but has a less dramatic effect as it is somewhat masked by other strong 'noise sources', like other sections of the orchestra and the room acoustics captured by the main pair. The listening tests I have conducted so far have led me to believe there is still something to be gained from accurate time alignment even for distant mics. The effect is particularly noticeable (and pleasing!) when the brass section or the timpani take over: when the right delay values are keyed in, it seems to me we suddenly get rid of that curtain covering the back of the orchestra whilst still keeping all instruments in the right place on the front-back axis.
    Depending on the music and the room acoustics, the resulting effect could be more or less impressive, but at least I have not been in situations where it did harm the mix.

    Yes, phase coherence! You are very much on the ball here and you are paving the way for some refinements of the technique.
    I think it is important to keep in mind that the waveforms you get for both the main pair and the spot mic are ALREADY the products of the complex treatment chain you are describing, ie air absorption + room acoustics + mic + preamp. So when you do the time alignment process in your DAW, you are already working on top of the incoherence you are referring to. In other words, there is no need to create a complicated model to simulate the whole process (filtering + non-linear treatments), as you are already working with the end results (I hope this point is clear; I am not sure my English is good enough to properly convey the idea).

    I guess you have also understood between the lines that I have no idea at which frequency the coherence starts to fall apart. I believe it would be very difficult, if not impossible, to find a suitable model for that. I dare say this is where the maths should step back and let your ears be your sole guides. A good way to avoid the incoherence issues is to apply a lowpass filter on the spot mic and play with the cutoff frequency until you are satisfied with the results. Such a process also has the strong advantage of making your alignment process more robust to calculation errors, at the cost of losing some precious high-frequency information. More on that later if you wish.
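
    As an illustration of that lowpass trick, here is the crudest possible version: a one-pole (6 dB/octave) filter in Python. This is purely a sketch of the idea, far simpler than anything you would actually patch onto a spot mic:

```python
import math

def one_pole_lowpass(x, cutoff_hz, sr=44100):
    """First-order lowpass: smooths each sample toward the input, so low
    frequencies pass while content above the cutoff is attenuated."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sr)
    y, prev = [], 0.0
    for s in x:
        prev = (1.0 - a) * s + a * prev
        y.append(prev)
    return y

# A constant (DC) input passes through essentially unchanged:
y = one_pole_lowpass([1.0] * 500, cutoff_hz=1000.0)
print(round(y[-1], 3))
```

    Sweeping `cutoff_hz` down mimics Thomas's procedure: the spot mic keeps its low-frequency reinforcement (where the alignment is robust) while the error-sensitive highs are progressively removed.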

  15. Cucco

    Cucco Distinguished Member

    Math. Isn't it wonderful!

    The exact (or really close) delay per foot is 0.88 ms/foot - of course, that is at 70 degrees F. At 85 degrees, those numbers go right out the window - and since most stages run at varying temperatures, it's real tough to get exact.

    However, using 0.88 ms/foot is a good starting point. As are both methods of clapping - as long as you understand which way the audio has to slide.

    Ultimately, it is plainly clear when spots are correctly time-aligned and when they are not. I think Ben's point about safely getting a few more dB out of the spots is well taken. I don't think he is suggesting they automatically get louder - rather, it is easier to bring their relative volumes up because it now sounds more natural to do so.

    There was (and still is) a lengthy post on the recording studio forum titled "Misconceptions about Phasing," in which I brought up a lot of this. (And came under a lot of fire since I made the assertion that most waveform editors don't display the correct frequency information, simply a graphical representation of the summation of the frequencies on an amplitude plot.) I don't want to rehash that whole discussion here, but ultimately, I urged people to use math and their ears.

    Just some thoughts...

  16. JoeH

    JoeH Well-Known Member

    I'm reading and absorbing everything in this discussion, and having a good time putting all this stuff into action this week, recording a "Concert" version of an Opera (L'amico Fritz - Tues., Thursday and Friday - tonight). It'll be aired next week.

    I've got 14 mics: three across the front of the stage on short stands at the lip (for the singers), two out in the house (applause/ambience), and two omni outriggers as the main orchestral pair (can't fly a center pair due to supertitle projections, and can't put a pair on a stand in front of the conductor either). So to avoid a big hole in the middle of the orchestra mix, I'm going with a lot of spot mics: violins left, low strings right (4 celli, 3 basses on the right), and various other touch-up mics for harp, offstage choir, and even a pair of SM-81s on the 8 wind players.

    The harp and wind mics REALLY do sound unnatural and "in your face" before being delayed, compared to the others, no matter what I do level-wise, ditto for the offstage choir. The choir mic is probably the worst of the lot, because the choir is SUPPOSED to sound far off and away in the distance. (And if I don't mic them, I lose them altogether...)

    I'm still only working with temp mixes (they're expecting to use tonight's performance as the core/main part of the broadcast), but I'm already hearing an improvement as I move them BACK in time vs. the others. I hope to do a few "slaps/impulse" tests tonight, but with it being a union hall, and musicians showing up to warm up (it's a fairly large orch. for this one) I probably won't get any serious "quiet" time to pull it off. Fortunately, there's some percussion "slaps" and effects going on, so I should have what I need, regardless.

    It'll be broadcast next week, and if it goes well, I'll post the URL for the webcast, if anyone's interested.
  17. I cannot speak for Ben, but my experience with the technique is that you get both effects. 1/ It gets louder because you add 2 signals with a strong inter-correlation and 2/ you can push the faders further up without creating the horrible comb-filter effect.

    Without trying to revive the thread you are referring to, I am not sure I understand this way of thinking. As far as I am concerned, waveforms on a DAW are just amplitude versus time graphs. I probably need to reread the thread to get your point…

    Ben: I am very eager to read what you have to say following this week’s recording. Too bad you do not have a centre pair though. Instinctively it would have been a more appropriate ‘time reference’ pair for the technique to work properly.

  18. Cucco

    Cucco Distinguished Member

    Then I think you and I would agree.

    Despite the fact that many wave editors attempt to show frequency in their graphs, they cannot do a complete and accurate job. So, as you say, it is an amplitude versus time graph and nothing more.

  19. JoeH

    JoeH Well-Known Member

    <<Instinctively it would have been a more appropriate ‘time reference’ pair for the technique to work properly. >>

    Well, I do have three cardioids on short stands on the floor, across the front for the singers: L, C & R. The center "C" vocal mic is literally a few feet directly behind the conductor's back. The two omni "outriggers" are close enough as well, on booms protruding in from the 2nd tier balcony over the orchestra; it's not that wide a space.

    These five mics will comprise the "Front line" of L&R soundfield, and I'll work from there. Tonight's the keeper, we'll build from this performance. Tomorrow (Saturday) looks like a blizzard/cancellation, and a different cast anyway. Everyone's glad we got tonight in the can.
  20. FifthCircle

    FifthCircle Well-Known Member

    Don't want this to seem like a Sequoia ad, but have you looked at the Comparisonics waveform view? That adds a pitch component to the waveform...


    Do you have access to Collette tubes for Schoeps mics? If you do, you could use those to have a very low-profile pair near the conductor to fill up the center part of the image.

    I'd also strongly consider bringing your flanks in a bit closer than you normally would, to lessen the hole in the center.
