
The Science of Sample Rates (When Higher Is Better — And When It Isn’t)

Discussion in 'Recording' started by Sean G, Jan 31, 2016.

  1. Sean G

    Sean G Well-Known Member

    The title says it all really...

  2. thatjeffguy

    thatjeffguy Active Member

    Excellent article... thanks for posting this!
  3. Sean G

    Sean G Well-Known Member

    I'm glad you found it informative Jeff @thatjeffguy

    This coincides with what I think Bos @Boswell stated in another thread here on sampling rates.

    Sometimes it's hard to get your head around this topic, but I found this helped clarify some of the questions I had, so I thought I'd share it with those who may not have read it already.

    I also found the white paper by Dan Lavry referenced and linked in this article a very interesting read indeed.

    That can be found here http://www.lavryengineering.com/pdfs/lavry-white-paper-the_optimal_sample_rate_for_quality_audio.pdf

    More information can be found below

    Sampling, Oversampling, Imaging and Aliasing - A Basic Tutorial by Dan Lavry http://lavryengineering.com/pdfs/lavry-sampling-oversampling-imaging-aliasing.pdf

    There is also this white paper by Dan Lavry on sampling theory for audio here http://lavryengineering.com/pdfs/lavry-sampling-theory.pdf
  4. thatjeffguy

    thatjeffguy Active Member

    Many years ago I came across a white paper about the Nyquist theorem. It was very thorough and complex, especially the mathematical formulas. But I got the gist of it all: it is possible to EXACTLY duplicate a sound wave as long as every frequency in it is less than one-half the sample rate. It didn't say "approximate"; Nyquist's theorem proved the wave can be exactly replicated from just over two samples per cycle (assuming a band-limited signal).
    This was enough for me to resist the hype surrounding higher sample rates. I have stuck with 44.1 for nearly everything, the exception being that I go to 48 for recordings destined for video.
    Since the mathematical basis for conversion rests on Nyquist's work, it makes me wonder whether the quality difference between various converters isn't due strictly to filter design and component quality. Or have manufacturers experimented with variations on Nyquist's formula?
    Anyway, a very interesting topic, and very important concepts that every engineer should have a grasp on.
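    Jeff's point about exact (not approximate) reconstruction is easy to check numerically. Here is a minimal sketch of my own (assuming NumPy; the figures are arbitrary, not from any paper): sample a 1 kHz sine at 44.1 kHz, then reconstruct it at four times the density by zero-padding its spectrum, which is the discrete, periodic analogue of Shannon's sinc reconstruction. The recovered waveform matches the true sine essentially to machine precision.

```python
import numpy as np

fs, f0, N = 44100, 1000, 441          # exactly 10 full cycles in 441 samples
n = np.arange(N)
x = np.sin(2 * np.pi * f0 * n / fs)   # the "coarse" samples, well below fs/2

# Reconstruct at 4x density by zero-padding the spectrum (exact for a
# band-limited periodic signal -- the discrete form of Shannon's proof)
up = 4
X = np.fft.fft(x)
Xpad = np.zeros(up * N, dtype=complex)
half = (N + 1) // 2                    # N is odd, so no Nyquist bin to split
Xpad[:half] = X[:half]                 # positive frequencies
Xpad[-(N - half):] = X[half:]          # negative frequencies
y = np.fft.ifft(Xpad).real * up        # dense reconstruction

t_dense = np.arange(up * N) / (up * fs)
err = np.max(np.abs(y - np.sin(2 * np.pi * f0 * t_dense)))
print(err)   # tiny: the in-between points are recovered, not guessed
```

    The same idea works for any mix of frequencies, provided every component stays below half the sample rate.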
  5. Boswell

    Boswell Moderator Distinguished Member

    Thanks for linking that article, Sean. It's an interesting read, but there are some inconsistencies and omissions that serve to dilute its authority.

    The article over-emphasises the notion that human hearing extends only up to around 20KHz for youngsters (falling off with age), but this figure comes from steady-state hearing tests. Steady-state tests not only reduce hearing to a single upper frequency figure, they also ignore transient response and make no phase comparisons. As I have mentioned in previous threads, I've done various experiments comparing recordings made at higher sampling rates with lower ones (usually 96KHz and 44.1KHz), and I find there are two main points of difference: transients and top-octave phase distortion.

    Although the range of my hearing as measured by a clinical hearing test extends only to about 15KHz these days, I have no problem in distinguishing a 7KHz sinewave from a 7KHz squarewave (adjusted to have the same medium to high level fundamental amplitudes), where the only differences between the two start at 21KHz. At lower levels, I can't easily distinguish between them. This squarewave test demonstrates that the upper end of the human hearing range is not a sudden cut-off but a sloping curve with a shape that changes with received level.
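    Boswell's square-vs-sine test checks out on paper: an ideal square wave contains only odd harmonics, so the first component where it differs from the sine sits at three times the fundamental. A quick sketch of my own (not Boswell's test setup):

```python
import numpy as np

f0 = 7000.0  # fundamental of the test square wave, in Hz

# Fourier series of an ideal square wave: odd harmonics only,
# with amplitude falling off as 1/k (4/pi for the fundamental)
harmonics = [(k * f0, 4 / (np.pi * k)) for k in (1, 3, 5, 7)]
for freq, amp in harmonics:
    print(f"{freq / 1000:4.0f} kHz   relative amplitude {amp:.3f}")

# Everything distinguishing the square wave from a pure 7 kHz sine
# lies at 21 kHz and above -- outside a steady-state hearing test range
```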

    When it comes to transient sounds, there is a big difference for me between a 44.1KHz and a 96KHz recording of things like tingsha bells, which have significant energy in the 20-30KHz region. Incidentally, to make this comparison, you have to use a microphone and other equipment capable of operating to 30KHz or higher. It's very likely that to process transients the ear uses additional mechanisms that not only extend to a higher frequency than the steady-state sound detection mechanisms but do not degrade as much with age.

    For recording and mixing, I have given a local public demonstration of recording, analogue mixing and then replay of sources recorded simultaneously at 44.1KHz and at 96KHz. I was careful to use no EQ - the analogue mixer was flat to over 30KHz. After mixing, I put the stereo bus through a 44.1KHz A-D-A loop. There was a demonstrable difference between the results from the same sources at the two source sampling rates, but much less difference when the source was recorded with only a single stereo microphone rather than being a stereo mix of 8 - 10 mono microphones.

    The demonstration provoked a wide-ranging discussion from the audience about what was going on. The most convincing explanation we arrived at is what I call top-octave phase response, that is, the phase response of the sampling gear being non-linear in the top octave of the sampling rate in use, and the ear being sensitive to this. When the sampling rate is 96KHz, the top octave (20 - 40 KHz) is much less audible than when it is at half that rate. For multiple channels, each channel carries its own phase non-linearity. Note that I deliberately put the test signals through 44.1KHz digital sampling and replay, so there was some phase non-linearity there. Incidentally, the best results were (not surprisingly) from the 96KHz recordings, analogue mixed and then sent straight to the monitors, by-passing the 44.1KHz double conversion.

    The tests I outlined illustrate that it is simply incorrect to say that human hearing does not respond to sonic components above a figure such as 20KHz, and therefore incorrect to say there is no point in recorded audio bandwidth being greater than this figure. This view is not supported by the linked article.
    kmetal and Sean G like this.
  6. Sean G

    Sean G Well-Known Member

    Thanks for your input on this topic Boz, it's great to have a perspective from someone who is far more knowledgeable on this topic than most of us.

    It's also a great benefit to latecomers to RO like myself to gain an understanding of this topic from someone who has put these theories to the test and shares his views and findings.

    This is a topic which I'm sure has been discussed here on RO and other forums in great detail before. For most of us who don't share your depth of knowledge on the subject, it can be quite confusing to say the least when it comes to sampling rates, how changes in these rates affect the audio signal, and how that signal is processed in the digital domain.

    It was not my intention to use this forum to propagate one theory over another, or to promote anything which may be seen as misinformation (there is enough of that already out there on the web....;)), merely to share what I felt was information that contributed to the debate on sampling rates, from someone who is respected in the audio community and has conducted much research on the topic, to help in my own quest for knowledge on the subject.

    IMHO this is a topic worthy of much further discussion in a world of ever-evolving audio technology, and valuable contributions from those such as yourself here on RO, with far more knowledge and personal experience than most, are greatly appreciated by those like myself who desire a greater understanding of the subject.
    kmetal likes this.
  7. kmetal

    kmetal Kyle P. Gushue Well-Known Member

    I've always looked at it in two ways. If you want something to work well, you usually need more firepower than what you expect to use on average. A lawn mower engine could probably power an average compact car, but it would be struggling; pop in a small four-cylinder and things get more effortless. Basically, if you want to run at 10, make sure your max is 15 or 20. This stems from me generally being hard on things and using them to the max, so I've learned over the years to compensate for that.

    The other way is just as un-technical. The sound of tape/analog gear has always had a 'smooth' character in general, while digital has been known to be 'harsh' in the wrong hands.
    The thing with tape is it's linear, and analog is often 'continuously variable' (correct me if I'm wrong, I'm half speculating). Digital has spaces in between samples, which are essentially nothing but placeholders. My feeling was that higher sample rates always got you one step closer to a linear recording, where there is less and less 'blank space'. I think a lot of the time what you hear when things get gritty in digital, say with EQ plug-ins or compression, is an exaggeration of the blank space or 'steps'.
    Now I know headroom actually uses less than the maximum bits, so I'm not sure my theory even makes sense, as more headroom sounds better and fuller. Perhaps this is back to my first thought, where you want a 10lb bag for 5lbs of stuff.

    This isn't to say one is better than the other at all, just my way of trying to suss out the differences between the two. To me, the higher the sample rate, the more of the source is actually being captured.

    Other reasons are future compatibility, as sample rates are likely to only increase. And it seems that higher sample rates lower latency for some reason.

    @Boswell do you feel that the same thinking you described for 44.1 vs 96 applies when comparing 96 vs 192? Is there a point of diminishing returns? I believe the ear is an analog device, but I've also seen some pretty good arguments that humans and the universe are in fact binary. Lol, sorry, got off topic there in the last couple of sentences.

    Good topic Sean, I think we can all benefit from any knowledge, as the sample rate issue isn't likely going away soon. Especially when 384+ starts hitting the advertising market.
  8. Sean G

    Sean G Well-Known Member

    @Boswell I would like to know your thoughts on Lavry's viewpoint that the "optimum" sample rate is around 60kHz...

    - If this is in fact correct, or even proven given the data, why are today's manufacturers not using this as a benchmark and building converters to this spec, as opposed to higher sampling rates such as 192kHz?

    - BTW, I am in no way saying that there are no advantages to sampling at higher rates, but this is a question I found myself asking after reading the above white paper.

    I'd love to know your thoughts Boz, or that of anyone else with more knowledge and experience than myself on the topic.
  9. kmetal

    kmetal Kyle P. Gushue Well-Known Member

    I was looking at videos and came across this. DSD is not something I understand, but apparently, from the way another video explained it, it uses very, very high sample rates. Perhaps this is relevant in some way to the discussion.

  10. Sean G

    Sean G Well-Known Member

    I'm sensing conflict between Ted's beard and his shirt...
    kmetal likes this.
  11. Chris Perra

    Chris Perra Active Member

    This is a cool video... some stuff on samples and band-limiting at the end... a great real-world demo of what happens from start to finish of A/D and D/A...

  12. thatjeffguy

    thatjeffguy Active Member

    Excellent video! I especially liked the discussion on dither and how he was able to visually demonstrate each of his points. Thanks for posting this!
  13. Boswell

    Boswell Moderator Distinguished Member

    It's not quite like that for digital audio. The "blank space" you refer to comes out of a representation of the samples on a conventional value-time plot. If you think of a series of thermometer readings, writing these down every 5 minutes or every hour gives a series of numbers that we are used to seeing and interpreting, and does not mean that there was no temperature between the readings. As long as you sample often enough, you can represent any continuously-varying quantity to any required accuracy using numbers. However, see below for when you take DSD into account.
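    Boswell's thermometer analogy can be made concrete: for a band-limited signal, the value between samples is not missing, it is implied by the samples around it. A small sketch of my own (assuming NumPy; truncating the sum to a finite window leaves a tiny error that an infinite sum would not):

```python
import numpy as np

fs = 44100.0                          # sample rate
f0 = 3000.0                           # a tone well below fs/2
n = np.arange(-2000, 2001)            # sample indices around t = 0
x = np.sin(2 * np.pi * f0 * n / fs)   # the stored "readings"

def value_between_samples(t):
    """Whittaker-Shannon interpolation: recover the waveform at any
    instant t (in seconds) from the samples alone."""
    return float(np.sum(x * np.sinc(t * fs - n)))

t = 0.5 / fs                          # exactly halfway between two samples
recovered = value_between_samples(t)
exact = np.sin(2 * np.pi * f0 * t)
print(recovered, exact)               # the two agree closely:
                                      # no "blank space" to fill in
```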

    When it comes to A-D and D-A converters, there are very few equipment manufacturers who design their own converter chips or sub-circuits. Most use standard production parts from the main three or four semiconductor manufacturers that make devices suitable for the audio industry. It should be noted that the semiconductor design effort needed to make parts that will run well at 192KHz probably pays off with improved performance at lower rates. However, there's no great advantage in taking these parts and then running them in your designs at the non-standard rate of 60KHz, at least when considering just the ADC. It's very possible that Dan Lavry was referring to the whole input chain, not just the ADC, and from the point of view of a clean analogue audio path, there is certainly more merit in restricting the input chain's analogue bandwidth to 30KHz (which may be where the maximum conversion rate of 60KHz came from). Doing this would mean there would be an advantage to setting a standard sample rate of 96KHz but nothing extra to be gained by selecting 192KHz.

    Yes, I think it is relevant. DSD does not use sample rates as such, and in reality, the internal clock rates involved are much the same as are used in conventional ADCs. What DSD does is make decisions at discrete points in time as to whether the current estimate of the value of a waveform is more or less than the actual value. If it's less, it increases the estimate by a given amount; if it's more, it decreases the estimate. This action continuously tracks an input, and the more/less decision can be represented as a 0 or a 1 at each time point. It's this string of 0s and 1s that form the output. The D-A process is similar in reverse. Note that you need a start point; a short extract of the string will not get you a good representation of the signal, nor can you chop a bit out of a string and still expect the signal to be useful. This is one of the things that makes writing a DSD editor so problematic.
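    The tracking behaviour Boswell describes can be sketched as a toy 1-bit delta modulator. To be clear about the assumptions: real DSD uses sigma-delta modulation with aggressive noise shaping, so this is only an illustration of the track-up/track-down idea, with a made-up step size:

```python
import numpy as np

fs_clock = 2_822_400       # DSD64-style bit clock, 64 x 44.1 kHz
f0 = 1000.0                # input tone
t = np.arange(20000) / fs_clock
signal = 0.5 * np.sin(2 * np.pi * f0 * t)

step = 0.002               # fixed up/down step per clock tick
estimate = 0.0
bits = []                  # the 1-bit output stream
track = []                 # the running estimate, for checking
for sample in signal:
    if estimate < sample:  # estimate too low: step up, emit 1
        estimate += step
        bits.append(1)
    else:                  # estimate too high: step down, emit 0
        estimate -= step
        bits.append(0)
    track.append(estimate)

# After a brief settling period the estimate hugs the input; the bit
# stream alone is enough to rebuild it, but only by replaying it from
# the start -- which is Boswell's point about needing the start point
err = np.max(np.abs(np.array(track[1000:]) - signal[1000:]))
print(err)                 # within a few steps of the true waveform
```

    Note how each bit only means "up" or "down" relative to the previous estimate, which is why a short extract of the stream, taken out of context, tells you almost nothing about the waveform.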
    kmetal and Sean G like this.
  14. Chris Perra

    Chris Perra Active Member

    I like that video... it explains a lot to me regarding dither and the stair-steps/resolution paradigm. I found it interesting that with only 3 samples per 20kHz sine wave it reproduced it no problem.

    I wonder if that would be the same for full-range audio. I suppose it's like film to a degree: once you hit a certain number of frames per second (I always thought it was 32, but googling it, it seems like it may be higher), your eyes can't tell that it's individual frames. The quality would be determined by the resolution and quality of the frames, not the speed they run at over whatever the actual threshold is. I personally do notice better gameplay in games that run at 60fps rather than 30, although I'm not sure if the refresh rate on a TV is the same as the frame rate from a film camera, because TVs go black in between the images and film transitions right from one image to the next.

    It seems that for audio, as long as you have the full range of frequencies you want to hear in the sample, and the next sample is close enough in time to create a smooth transition between them, you're good to go. If the D/A is creating a transition between the 2 samples, what would be the difference if the samples were closer together or not? Both would be creating the same audio waves... as long as you have enough samples to create an audio wave, it shouldn't make any difference; the extra samples would just be redundant... they wouldn't help you create a more accurate audio wave.

    I suppose that the extra samples might be useful with plug-ins though. My UAD card has versions of EQs and compressors that give you the option of upsampled or regular, and the upsampled always sound better to me. I normally use only 44.1; I wonder if recording at 96 would make a difference, as the plug-ins are already upsampling for me at 44.1. Or, because it already has enough information to create an audio wave, maybe the upsampling plug-in fills in the dots on the wave 'grid' so to speak, does its thing, and then downsamples the finished sound with enough resolution to create an audio wave.
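    One common reason upsampled plug-ins can sound cleaner is aliasing of the distortion products, and that is easy to demonstrate. This is a sketch of my own (assuming NumPy), not how UAD's processing actually works: hard-clip a 5 kHz tone at 44.1 kHz and its 35 kHz harmonic folds back to an inharmonic 9.1 kHz; do the same clip at 4x the rate, band-limit, and decimate, and that aliased product is gone.

```python
import numpy as np

fs, N, f0 = 44100, 4410, 5000           # f0 lands exactly on an FFT bin
x = np.sin(2 * np.pi * f0 * np.arange(N) / fs)

def clip(v):                            # a deliberately harsh nonlinearity
    return np.clip(v, -0.5, 0.5)

# 1) Clip at the base rate: clipping creates odd harmonics at
#    15, 25, 35 kHz...; the 35 kHz one folds back to 44.1 - 35 = 9.1 kHz
spec_base = np.abs(np.fft.rfft(clip(x))) / (N / 2)

# 2) Oversample 4x by spectral zero-padding, clip, then band-limit
#    and decimate back to 44.1 kHz in one step
X = np.fft.rfft(x)
Xup = np.zeros(4 * N // 2 + 1, dtype=complex)
Xup[:len(X)] = X
x_up = np.fft.irfft(Xup, 4 * N) * 4     # the tone at a 176.4 kHz rate
Y_up = np.fft.rfft(clip(x_up))
y = np.fft.irfft(Y_up[:N // 2 + 1], N) / 4
spec_os = np.abs(np.fft.rfft(y)) / (N / 2)

alias_bin = 910                          # 9.1 kHz at this FFT length
print(spec_base[alias_bin], spec_os[alias_bin])
# the aliased product is clearly present at the base rate and
# essentially absent after the oversampled version of the same clip
```

    The point is that the nonlinearity, not the recording itself, generates content above Nyquist; processing at a higher rate gives that content somewhere legal to live before it is filtered away.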

    Lots of things to think about from that video haha..

    Oh, to someone who might know... the square wave example... was that to show the differences in sample-rate frequency cut-offs? I'm not sure I understood that example.
    kmetal likes this.
  15. kmetal

    kmetal Kyle P. Gushue Well-Known Member

    Thanks. So, to stick with your analogy: I was under the impression that the temps between the recorded ones were estimated or rounded; in other words, that in between samples was an estimate or average of the two points. So my (mis)thinking was that more samples would eventually mean no space between samples, or one long sample: one continuous, but variable, measurement of temperature over a period of time.

    I think I got to this via a (mis)understanding of lossless data compression. I thought that the space in between samples was disregarded. Maybe that happens on a frequency basis? Without getting way off topic.

    Back on topic: if the space in between samples is in fact an estimate, is its accuracy based on the clock, and the resulting amount of inter-sample modulation relative to this? Is jitter related to inter-sample modulation? I'm thinking that (assuming a 'perfectly accurate' clock/AD-DA) more samples would cut down on errors, modulation etc., because more of the actual signal would be there, as opposed to the converter/clock 'filling in the blanks'.

    Is sampling a direct function/necessity of digital audio? Is something that's 'continuously variable', like say a squiggly line, possible? Is that the designers' end goal?

    Sorry, I feel like I'm asking a karate master white-belt questions. I'll gladly read any links so you don't have to reiterate things a ton.
