I was doing my usual browse through the web and nearly fell asleep until I found this rather nice dissertation by two graduates of Detmold University here in Germany. It confirms what I have always thought, or at least strongly suspected, about the industry's hype!

Unfortunately, the dissertation is in German, but for those who speak and understand it, here is the link:

http://www.hfm-detmold.de/texts/de/hfm/eti/index.html

It basically says that there was no big difference between 24/48 and 24/96 when each was compared to the original analog signal. The two 48 kHz converters they tested even beat the two 96 kHz ones. Besides the technical measurements, there was a listening test. Most people chose the 48 kHz signal as the closest to the original, and a couple of people even took the 48 kHz signal for the original. The two graduates claim that the quality of a converter depends on a high SNR and a good analog circuit design. Note that this is just a very short and hence imprecise summary of the graduates' dissertation.

Comments

TeddyG Sun, 03/06/2005 - 14:44

It is my understanding that higher rates, after a certain point(Professionals I know say 24/48 is that point?), are, indeed, inaudible in any human hearing test. No surprise here, we cannot "hear" 48k, let alone 96. We also cannot "hear" any "faults" at 16/44.1(Or less). So, why don't we just use 44.1?(Or less)

The reason one uses higher rates is supposed to be that starting with the highest possible rate - assuming all equipment is completely capable of retaining all quality - assures a higher level of quality --- later...

I cannot speak of digital rate quality, I don't have the proper equipment, or the ears.

I can speak of analog quality, as regards, say, reel-to-reel audio tape recording.

At speeds at least as low as 1-7/8 ips(Cassette machine quality), "original" recordings can sound pretty "fine" - again, assuming good quality equipment. But if one tries to "copy" such tapes, quality diminishes quickly. After several generations(Copy to copy to copy, etc.), which might be done, for instance, if a cassette tape was a "master" recording, to add compression, or EQ or reverb, to the copy that was not on the master, then to, say, make a "final" copy for duplication(Any number of copies may be done, each to "add" its own changes), quality really suffers.

On the other hand, if one records the original master at, say 30ips(Rare, but done), then keeps each succeeding "generation" in the process at 30ips, then makes a "final" master tape, at say, 15ips, for duplication, with the ultimate goal of producing 1-7/8ips audio cassettes, the quality of the cassette one bought at the store could be pretty darned close to "sounding" like it was the original master tape -- there was loss all the way through the process, but very little loss that would have translated to the cassette - the cassette was never capable of hearing what may have been on the astoundingly high-quality 30ips tape anyway...

Thus, your statement is no surprise here.

In any event, if this is true, one would always try to "bump up" the master "speed" -sample rate/bit rate -at least a step or two ABOVE the "final" rate, in an attempt to avoid loss of quality by "final" format. Thus, a master recording with the ultimate goal of CD, recorded at 24/48 should be good, though at 24/96(Or above) it could(Depending on number of "copies" from master to final) be better.

TG

anonymous Sun, 03/06/2005 - 15:46

That's what happens when you try to shorten a dissertation... Well, basically I do agree with you. I forgot to mention that there was a BIG difference in quality between 16/44.1/48 and 24/48/96, but not a significant difference between 24/48 and 24/96! Even those two guys mention that it COULD(!) be better to choose the highest sample rate available when you have a long chain. (BTW, a long chain is never any good.) But they sort of WARN that a very high rate could even generate noise, and that the converter might not be capable of turning inputs and outputs on and off quickly and adequately according to the incoming analog signal. :D

anonymous Sun, 03/06/2005 - 16:34

Doesn't the high end get all messed up at the lower sample rates??
For instance, a 15 kHz waveform at 44.1 kHz gets only about 3 samples to map the wavelength (upper and lower limits), and about 6 samples at 88.2 kHz. In those three samples at 44.1 kHz the wave starts at zero, peaks positive, hits zero, peaks negative and returns to zero. Even if the waveform somehow started exactly on a sample point, it would be impossible to map correctly in three sample points, because the middle zero crossing would fall exactly halfway between two sample points. 96 kHz is twice as good at the higher frequencies but is still far from perfect. If you record a 15 kHz tone generated by Pro Tools into Pro Tools (eliminating any converter issues) and zoom into the waveform, it looks a lot more sine-wave-ish at 88.2/96 than at 44.1/48. Doesn't this make 192 kHz (12 sample points at 15 kHz) far more accurate at the high frequencies? Or am I missing something??

Randyman... Sun, 03/06/2005 - 20:24

What you see in your Sample Editor's window IS NOT what the signal looks like when coming out of the D/A. Even IF a waveform starts 1/3 of the way in between sample points, the waveform will STILL be replicated ACCURATELY on the D/A output. If you analyze the OUTPUT of a good DA, you'd likely see that there is no difference between a 15 kHz sine at 44.1 kHz fs and a 15 kHz sine at 192 kHz fs (No, I have not done this myself, but I have read and seen quite a bit of documentation on this). Both will be reflected as they were captured (even if the waveform starts IN BETWEEN the sample points). Both will be a smooth sine wave at either sample rate (on the ANALOG OUTPUT). 44.1 kHz will capture anything below 22.05 kHz with every bit of accuracy that is possible - including phase.
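To make the claim above concrete, here is a minimal sketch (not anyone's converter design) of the textbook sinc interpolation applied to a 15 kHz sine sampled at 44.1 kHz with an arbitrary start phase. The function names, the phase value and the window length are assumptions chosen for illustration. Because the sum uses a finite window of samples, the result is approximate, but the gap to the true sine between sample points is tiny compared with the signal.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Evaluate the sinc-interpolated waveform at time t (seconds).

    Uses a finite window of samples, so the result is only approximate.
    """
    return sum(s * sinc(fs * t - n) for n, s in enumerate(samples))

def worst_error(fs=44100.0, f=15000.0, phase=1.0, n_samples=2001):
    """Largest gap between the true sine and its reconstruction, checked
    at 100 points spread between sample instants near the window centre."""
    samples = [math.sin(2 * math.pi * f * n / fs + phase)
               for n in range(n_samples)]
    centre = n_samples // 2
    worst = 0.0
    for k in range(100):
        t = (centre + k * 0.05) / fs  # steps of 1/20 of a sample period
        true_value = math.sin(2 * math.pi * f * t + phase)
        worst = max(worst, abs(reconstruct(samples, fs, t) - true_value))
    return worst

# Worst-case error for a full-scale (amplitude 1.0) 15 kHz sine at 44.1 kHz.
print(worst_error())
```

The "connect the dots" picture in an editor window is a display choice; the sketch only shows that the samples themselves pin down the values in between once the band limit is taken into account.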

This digital stuff is truly mind-boggling. I'm still wet behind the ears, but there are TONS of killer info here, at PSW, and on the web in general. Nika Aldrich and George Massenburg (sp) are 2 who surely know their stuff!

The math involved is just plain crazy IMO. Nyquist Theorem is the final word in Digital IMO (I'm surely NO expert). How it is implemented, and how the analog circuitry interfaces into the digital realm is where the problems occur. High sample rates are NOT the fix IMO.

Just think - Have you EVER heard a 44.1K CD that sounded good? I have heard quite a few. When it is done right - 44.1K is a beautiful thing.

:cool:

JoeH Sun, 03/06/2005 - 20:31

aggghhhhhhh, not this again........

It's really pretty simple now; the industry standard has all but settled into a couple of very simple choices:

1. If $$$ is no object (Big budget movie soundtracks/scores, Audiophile recordings, PTs HD, etc) Use 24bit /192k sample rate. All comparative arguments are moot at that point.

2. For everything else, use 24/96 if your system (hardware) supports it. Edit and process at this rate, don't go down to the lower rates until your final mix. Then SRC and dither down with the best converters & algorithms possible when making a 16/44 CD, or 16/48 Video Soundtrack.

Additionally:

3. If your project is going to end up as a video track, start at 24/48 (or 24/96) and then go to 16/48 when finishing up the final video track.

4. If your project is going to end up as a CD Audio track, start at 24/44 and stay there till the end (you'll lose any subjective "better sound" in the gearboxing process from 48 to 44.) Or again, start at 24/96 and stay there until your mix/bounce/dither to 16/44 at the very end of the project.

With the speed and power of most DAWs these days, you can start at 24/96 and stay there until you make your DVD, DVD-A or CD final renders.
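As a rough illustration of the "dither down" step mentioned in the list above (the sample rate conversion is not covered here), the following sketch quantizes floating-point samples to 16 bits with and without TPDF dither. The function names and the choice of a test level of a quarter of one 16-bit step are assumptions made for the example, not part of any particular DAW.

```python
import random

def to_16bit(x, rng=None):
    """Quantize a float in [-1.0, 1.0] to a 16-bit integer sample.

    If rng is given, add TPDF dither (difference of two uniform values,
    peak +/-1 LSB) before rounding; if rng is None, just round.
    """
    scaled = x * 32768.0
    if rng is not None:
        scaled += rng.random() - rng.random()
    return max(-32768, min(32767, round(scaled)))

def mean_level(x, trials, rng=None):
    """Average quantized value (in LSBs) over many conversions of x."""
    return sum(to_16bit(x, rng) for _ in range(trials)) / trials

quiet = 0.25 / 32768.0  # a level of one quarter of a 16-bit step

print(mean_level(quiet, 20000))                    # undithered
print(mean_level(quiet, 20000, random.Random(0)))  # dithered
```

Without dither the quiet level rounds to zero every time and is simply lost; with dither the average of the output tracks the input level, at the cost of a little added noise.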

anonymous Mon, 03/07/2005 - 06:51

Randyman... wrote: Even IF a waveform starts 1/3 of the way in between sample points, the waveform will STILL be replicated ACCURATELY on the D/A output. If you analyze the OUTPUT of a good DA, you'd likely see that there is no difference between a 15 kHz sine at 44.1 kHz fs and a 15 kHz sine at 192 kHz fs (No, I have not done this myself, but I have read and seen quite a bit of documentation on this).

Yes, it does interpolate and smooth waveforms, but in a complex music waveform, just how accurate can that process be? It's still mathematically filling in the holes. Interpolation and smoothing aside, a high-frequency complex wave has to be more accurate at the higher sample frequency because it has had twice or four times as many actual sample points as at 44.1. Also, the single biggest stated reason for going to 96 or 192 is that the high end does not sound as brittle. If what Randyman says is true, there should be absolutely no audible difference in high-end smoothness between 44.1 and 192, yet IMO there is.

Can anyone in this forum actually check the output distortion (Thd) of a 15khz sine wave at 44.1 versus 88.2 or 176.4 ?? I am very curious about this and I don't have the tools to check.

TeddyG Mon, 03/07/2005 - 09:15

Dave62 - All of this information/testing/comparison, one would think, someone has done, somewhere - maybe on the web? Maybe not with WL, specifically..?

Anyway...

Yes, starting high(Initial recording) and ending low(The final format) should be good, but, also yes, the best thing is to do as few "saves", "copies", etc, as possible from first record to finished product. Pre-planning so one does as much of the editing/processing as possible with, say, the first "copy", would be great - then to make your next copy "the(Lower) right speed" and the "final product" would be excellent.

What rates are proper is fun to talk about, where it all ends is a good topic, but it's only going to get more confusing as the "final product" rates get higher! Indeed, like it or not, many of us should at least be looking at 24/192 mastering, just to stay ahead of the curve! Ick! At least hard drives are getting cheaper...

Whether we can hear the difference between 44.1 and 192 is nearly irrelevant. One doesn't often, if ever, "use" the 200+ mile per hour speed of a fancy sports car, but they get bought just the same.

We must keep in mind though, with our project planning, that it isn't the original we need to plan for(We'll just do that at the highest possible rate we can do well), it's the "final rate", the final format needed that will determine how high we have to start.

24/48 might be great to start with, for a CD or radio commercial, which will ultimately be 16/44.1 .wav(Or even .mp3, as mine are), but if the final format is 24/96..! it's not great enough... Where are my equipment catalogs!

TG

anonymous Mon, 03/07/2005 - 10:57

Something in English - a research paper, a doctoral thesis
(masters may do as well) on the posted subject.

Thanks,
Costy.

P.S. Your example with the circle and three points is not complete
in a sense. You should give three points AND the errors for each
coordinate. Then we can talk about how exact the circle is.

anonymous Mon, 03/07/2005 - 11:04

48khz vs 96 khz

it all depends on the project. The increased resolution will give more accurate high frequencies and should give tighter bass response.

Whenever a project involves vocals or drums - anytime I use my C414 mics - I'll record at 96khz.

I claim to be able to hear the difference. Especially with a vocal group a-capella.

Bottom line is: if you can afford the extra disk space and CPU overhead that 96khz requires, then by all means use 96khz.

The difference is subtle but fine. It can't hurt.

Most software runs at 32 bit. So it's already stepping up your 24 bit audio to 32 bit. Why not at least give it 96khz instead of 48khz to chew on as well?

anonymous Mon, 03/07/2005 - 11:07

Costy wrote: Something in English - a research paper, a doctoral thesis
(masters may do as well) on the posted subject.

Thanks,
Costy.

P.S. Your example with the circle and three points is not complete
in a sense. You should give three points AND the errors for each
coordinate. Then we can talk about how exact the circle is.

I'm afraid I do not have a master's or doctoral thesis, though I do have a 400 page primer on digital audio available if that might help? The best place to look for validation of Nyquist's theorem would probably be:

Shannon, Claude E. "Communication in the Presence of Noise." Proceedings of the IRE Vol. 37 (January 1949): 10-21.

You could also look at the original proof offered by Nyquist at:

Nyquist, Harry. "Certain Topics in Telegraph Transmission Theory." Transactions of the AIEE Vol 47 (April 1928): 617-644.

Each of these is a fairly short read and is (in my opinion) a lot easier to get through than doctoral dissertations on the subject. Most dissertations I've read that relate to the audio industry take Nyquist as an assumption, so it is difficult to find something that rehashes the eloquent 11 page proof by Shannon.

As far as the circle analogy and where you find a flaw, I do not understand. The Nyquist frequency merely tells us what information is required in order to reconstruct the shape. The accuracy with which we acquire it will certainly change the results, and thus the quality of the conversion process is essential. Error in the audio industry as far as A/D conversion is concerned comes primarily from two places: the "vertical" error of bit depth quantization, which manifests itself as low level noise and won't be affected by sample rate changes, and the "horizontal" error of sample timing inaccuracies (jitter) which won't be improved by sampling at a faster rate.
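A small sketch of the "vertical" error described above: it rounds a full-scale sine to a given bit depth and measures the resulting signal-to-noise ratio, which can be compared against the standard rule of thumb of about 6.02 dB per bit plus 1.76 dB (roughly 98 dB for 16 bits). The function name, the 997 Hz test tone and the half-second duration are assumptions for illustration. Note that this measures the total error power across the whole band, which is the quantity that stays put when only the sample rate changes.

```python
import math

def quantization_snr_db(bits, fs, f=997.0, seconds=0.5):
    """Measured SNR (dB) of a full-scale sine rounded to `bits`-bit steps."""
    step = 2.0 / (2 ** bits)  # step size for a signal spanning -1.0 .. +1.0
    n = int(fs * seconds)
    signal_power = 0.0
    error_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * f * i / fs)
        q = round(x / step) * step  # nearest quantization level
        signal_power += x * x
        error_power += (q - x) ** 2
    return 10 * math.log10(signal_power / error_power)

for bits in (16, 24):
    for fs in (44100, 96000):
        rule_of_thumb = 6.02 * bits + 1.76
        print(bits, fs, round(quantization_snr_db(bits, fs), 1), rule_of_thumb)
```

Changing `bits` moves the result by about 6 dB per bit, while changing `fs` at a fixed bit depth leaves it essentially unchanged.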

Please explain more about the breakdown in the correlation between this mathematical analogy. Each explains merely the minimum amount of data necessary with which to represent a particular shape, no?

Nika

JoeH Mon, 03/07/2005 - 11:42

Nika, I can vouch for you and tell the others you're generally a nice fellow and you've always got something interesting to say. 8-)

I'm NOT, however, interested in more convoluted, overwrought agonizing and arguments over things I cannot hear anyway. (And I freely admit it here as well; I don't care who knows it :lol: ) I just happened to stumble onto this thread, and believe me, I'm not looking to get into any more subjective back and forth about it all.

I'll put my hearing next to anyone's, and state that the variations of things we're talking about here are for the most part so small and negligible as to be inconsequential in any real-world listening environments.

Let's all go record some MUSIC, with real musicians, and then we'll talk about it, ok? 8)

anonymous Mon, 03/07/2005 - 12:16

Happens... We all have attitudes (and bad days as well).

I'll check your references, Nika, but I may look for something more
recent, some experimental research results in a journal, maybe.

I'll explain myself. Number of necessary points to represent a
shape: the sound signal can only be approximated to a shape
with a certain error: from an infinite number of harmonics to a limited
number of harmonics (mathematically - a chopped-off Fourier series).
So, the shape itself is not perfectly known. To sum it up - to capture
more harmonics accurately, go up with the sampling frequency.
Whether we hear it or not is another matter (see below).

I agree, bit depth and jitter are some of the error sources.
There's also the AD non-linearity effect (chip quality). All these create
distortion of the original source, right? My question is:
How do the distortion parameters in the audible range compare for 24/44.1
and 24/192 for the real chips we buy (not theoretically)?
Unfortunately I don't read German...

Assuming the industry doesn't sell us crap, I agree with JoeH.
The overall distortion of the final product is defined by the worst
piece of equipment (bad mic, pre, cables etc.). If you have a fine
setup all the way - go with higher bit depth and sampling
frequency. Just an opinion,

Costy.

anonymous Mon, 03/07/2005 - 17:33

OK, I'm done being an ass.

Sorry, guys. Just had a really awful day and was feeling particularly sardonic. I wrote an essay on another subject and just re-read it and eh gad!?

Anyway,

Nyquist was an engineer at Bell Labs back in the day - a pretty bright guy - invented the fax machine and all. Studying Fourier and Laplace and Whittaker and the rest of the math geniuses of the century before, he devised a theory (that being an unproven hypothesis). Nyquist's theory was that there was a both necessary and sufficient amount of data needed in order to fully represent a waveform. In the beginning he (or actually his predecessors) thought that it could be done with only one sample per cycle, but this proved to be untrue. Nyquist proposed that if one were to record the amplitude of the waveform at even intervals of time, and if the frequency of those recordings was at twice the frequency of the highest frequency represented, then this would provide enough information about the waveform with which to fully reconstruct it.

According to Fourier, a waveform can be broken into (analyzed as) simple sine waves of various frequencies, each with a particular phase and each with a particular amplitude. We can do what is called a "Fourier Transform" on any waveform and reveal its frequency content and the subsequent phase and amplitude of each frequency component. What Nyquist proposes is that recording the waveform (the term "sampling" came later from N. Erd - what a name, eh?!) in these even intervals of time provides enough information such that doing a Fourier Transform on the results will equal EXACTLY the same Fourier Transform of the original waveform. In other words, the waveforms will match perfectly - they will contain exactly the same frequency content and at the exact same phase and amplitudes.

Now this is quite a theory, no? It doesn't "look" to make sense, does it? It looks like the resultant waveform from this "sampling" process has all kinds of errors - after all, some people have pointed out in this thread the apparent problem of a waveform close to half the sample frequency and how infrequently it will be sampled. At a CD's sample rate and recording a 15kHz sine wave we will have only roughly 3 samples per cycle. This couldn't possibly be enough with which to represent the waveform, can it?

Well, it is indeed quite a theory, and this is why it took so long before a mathematician could provide a proof for it. But it did happen - fully 21 years after Nyquist's original paper on the subject. Claude Shannon, an engineer at AT&T (IIRC), provided a mathematical proof for Nyquist's theory in the Proceedings of the Institute of Radio Engineers in 1949. The proof states:

"Theorem 1: If a function f(t) contains no frequencies higher than W cps, it is completely determined by giving its ordinates at a series of points spaced 1/2W seconds apart"

Notice that he says the word "completely." This mathematician of international repute, who has offered hundreds of papers and mathematical proofs and has work that has been regarded by many as "Nobel-worthy", used the key word "completely," implying that it is possible to recreate the original waveform with only the samples at those even intervals.
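For reference, the reconstruction that goes with Theorem 1 can be written out as follows. This is the interpolation formula as it appears in Shannon's 1949 paper, with W the highest frequency present and x_n the sample taken at time n/2W; it is stated here as a reminder of what "completely determined" means, not as a description of how any particular converter implements it.

```latex
f(t) = \sum_{n=-\infty}^{\infty} x_n \,
       \frac{\sin \pi (2Wt - n)}{\pi (2Wt - n)},
\qquad x_n = f\!\left(\frac{n}{2W}\right)
```

Each sample contributes one shifted sinc-shaped pulse, and the sum of those pulses is the original band-limited function at every instant, not just at the sample times.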

Time for the requisite bad analogy:

I put you in a car on the streets of Manhattan and I put a governor on the car, locking the speed to exactly 30mph. New York is a beautiful town in that the city blocks are all pretty much the same size. I cut you loose to drive the streets of New York. In the meantime, I am going to record your movements for posterity, flying overhead in a helicopter taking photographs. How many photographs do I need to take in order to completely accurately retrace your path on the streets of New York? Do I need to take a video camera or can I save some data and take pictures less frequently? Indeed I only need to take one picture per city block. I could take more, but it is unnecessary. If I calculate how fast your car can cover a city block going its 30mph then I know that that is the fastest I need to take pictures - so long as I take them in even intervals of time. Right? Why does this work? Because there are TWO crucial pieces of information that I have available to me when I go to retrace the route. One piece of information is the photographs. They tell me where you were at what times, etc. The other piece of information, however, is the knowledge of the fact that the car can go no slower nor faster than 30mph. THAT piece of information is critical.

Another analogy. If I have a circle on a piece of graph paper I can fully represent the circle by giving three coordinates on the circle's circumference. That is all I need to give you in order to completely recreate the circle. This is because there are actually TWO crucial pieces of information. One is the three coordinates. The other is the knowledge that this represents a circle.
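The circle analogy can be tried directly. The sketch below uses the standard circumcenter formula to recover a circle's center and radius from three points on its circumference; the function name and the example points are illustrative assumptions.

```python
import math

def circle_from_points(p1, p2, p3):
    """Return ((cx, cy), radius) of the circle through three points.

    Raises ValueError if the points lie on one line (no unique circle).
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if d == 0:
        raise ValueError("points are collinear")
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return (cx, cy), math.hypot(x1 - cx, y1 - cy)

# Three points on the unit circle recover center (0, 0) and radius 1.
print(circle_from_points((1, 0), (0, 1), (-1, 0)))

# Nudging one coordinate changes the recovered circle, which is the
# point about measurement error raised earlier in the thread.
print(circle_from_points((1, 0), (0, 1.01), (-1, 0)))
```

The three points are enough only because the shape is known to be a circle; the accuracy of the result then depends on the accuracy of the coordinates, not on how many more points are supplied.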

In the world of waveform theory there are two critical pieces of information that are at work. One is the samples themselves, but the other is the crucial piece of information that these samples represent a waveform that contains no frequencies greater than half the sample rate. That piece of information GREATLY (in fact "infinitely") refines the characteristics of any resulting waveform.

In order to recreate the waveform certainly some work needs to be done. We must filter all of the information above the Nyquist frequency - this is a given. We cannot produce the same waveform if it contains frequencies that the original waveform did not contain. So we filter to only contain the same frequencies. Filtering is a rather complex mathematical process involving (in this case) "look ahead" and "look behind" analysis of the samples such that only one possible result remains. Sure, those three samples on a 15kHz waveform don't tell us very much, but we use filters that look well into the future and well into the past in order to get a complete picture of what is happening such that the waveform that passes through those three sample points becomes infinitely clear: it is the same shape, amplitude and size as the original waveform that was captured.

Take the pathological case that the waveform only contains a single cycle and there is no information ahead or behind at which to look. It's just a 15kHz sine wave floating in space. This is not, actually, a 15kHz sine wave. It is actually a complex waveform that indeed contains a lot of 15kHz energy, but contains an infinite supply of other frequencies as well - for no true sine wave can "start" or "stop" without the implied presence of other frequencies. Therefore, this is not a 15kHz sine wave and can't actually be sampled with a 44.1kS/s sample rate. We would first have to filter it to contain nothing above ~20kHz before we could sample it, and guess what happens when we do that? When we filter it we make the waveform much, much longer, such that suddenly we DO have look ahead and look behind information. This filtering works in much the same way as the human ear works when it filters. I know this is getting complicated and all, but that's part of the reason it took 21 years for the theory to be proven correct in the first place! This is not lightweight, sophomore in college level mathematics. It is pretty heavy stuff that involves lots of math with squiggly lines and Greek letters and stuff that I still don't understand.

In the end, however, one thing that is not disputable is that Nyquist was correct in his theory and Shannon, having proven it mathematically, made it into a "theorem." And it is correct. And if it weren't, satellites would come crashing from the sky, the internet wouldn't work, planes would never take off, and telephones would still require a "dial." The entire telecommunications industry is based on the factuality of Shannon and Nyquist, not just our piddly little industry.

I hope that helps. I'll now remove my other posts so nobody else knows what an ass I can be when I'm having a bad day.

Cheers!
Nika

anonymous Mon, 03/07/2005 - 17:47

Costy wrote: I'll explain myself. Number of necessary points to represent a
shape: the sound signal can only be approximated to a shape
with a certain error: from an infinite number of harmonics to a limited
number of harmonics (mathematically - a chopped-off Fourier series).
So, the shape itself is not perfectly known. To sum it up - to capture
more harmonics accurately, go up with the sampling frequency.
Whether we hear it or not is another matter (see below).

Costy,

This is true - a certain amount of error is engendered in the process simply from the error of quantizing the waveform to specific amplitude values. We further engender error because of the induction of noise in the signal due to thermal noise in analog components, etc.

The harmonics issue, however, is unrelated. If you do a Fourier Transform of the waveform in question you will find frequencies present from around 0Hz to perhaps as high as 100kHz (depending on the instrumentation). The ear, however, is basically a Fourier Transform box, and it has limitations. The ear cannot hear above around 20kHz and it cannot hear below something like 20Hz, so the first thing we can do is lop all of those frequencies off the top and bottom of the scale. This is, after all, the first thing the ear will do before encoding the waveform (digitally) to send it to the brain. Therefore, additional frequencies over 20kHz present in the waveform do not affect the way in which the ear hears the data, and thus they can be removed without consequence.

I agree, bit depth and jitter are some of the error sources.
There's also the AD non-linearity effect (chip quality). All these create
distortion of the original source, right? My question is:
How do the distortion parameters in the audible range compare for 24/44.1
and 24/192 for the real chips we buy (not theoretically)?

The distortion is the same, and it is the same for a very simple reason: they are the same chips. The samples are not actually taken at 44.1kS/s or 96kS/s, etc. The samples are originally taken at around 2.8224MS/s and then downsampled (and filtered) down to the desired rate. This means that all sample rates start with the same data and the difference is just in the filtering used. Distortion is the same. The difference between sample rates lies in the filters used. If poor filters are used for the 44.1kS/s downsampling process then the higher sample rates may sound better as the filters operate much higher than the audible range.
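The 2.8224 MS/s figure quoted above is exactly 64 times 44.1 kHz, which is why the step down to the 44.1 kHz family is a whole-number decimation. The sketch below just does that arithmetic. The 3.072 MHz rate used for the 48 kHz family is an assumption added for illustration (64 times 48 kHz); the post itself only mentions 2.8224 MS/s.

```python
from fractions import Fraction

def decimation_ratio(modulator_rate_hz, output_rate_hz):
    """Exact ratio between the internal sampling rate and the output rate."""
    return Fraction(modulator_rate_hz, output_rate_hz)

# Rate quoted in the post: whole-number ratios for the 44.1 kHz family.
for out in (44100, 88200, 176400):
    print(out, decimation_ratio(2822400, out))

# The same internal rate does not divide evenly into 96 kHz.
print(96000, decimation_ratio(2822400, 96000))

# Hypothetical 3.072 MHz internal rate for the 48 kHz family (assumption).
for out in (48000, 96000, 192000):
    print(out, decimation_ratio(3072000, out))
```

Either way the output rates share one oversampled capture, and what differs between them is the filtering and decimation applied afterwards.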

Having said this, there are potential problems that can result because of allowing all of that high frequency stuff through: your speakers weren't designed to reproduce it linearly. As such, you can create audible distortion by shoving inaudible material into your speakers.

Nika

anonymous Mon, 03/07/2005 - 18:38

i understand what you're saying about sampling the root and all the harmonic freqs from a composite waveform and then rebuilding the wave from the "basic freqs", but a composite wave can have many, many, many waves that make up that specific wave at that specific moment, yielding a very, very complicated waveform with many harmonics and overtones. think about vocals being recorded in a room and just the complexity of the voice and how many base waves make up that final composite wave that makes the voice, then sum in reverb from the room, which in itself is a complex composite wave. You also have to consider that these waves all started out as "analog vibrations" and in themselves they have an analog algorithm that they follow to sum up the final composite wave. you're talking about a very complicated structure that you want to break up. you would still need the analog "smoothness" of each root wave to properly sum up the final composite wave. which to me, i can't understand how you would sample the smoothness of those root analog waves one sample at a time? Maybe there's something there you're talking about that i'm not getting?

anonymous Mon, 03/07/2005 - 19:41

Nika,

Thanks for taking time to reply. I get your points. However, I still
have a doubt about 20 kHz lop-off.

Thanks for answering on the distortion. Nice to know it's the same.
And yes, I agree, good speakers are a must.

Does anyone have good enough equipment to actually hear the
difference between, let's say, 24/44.1 and 24/96? Are there any audiophiles
out there?

Cheers,
Costy.

anonymous Mon, 03/07/2005 - 19:47

Costy wrote: Nika,

Thanks for taking time to reply. I get your points. But...
Since waves interfere,

Waves don't "interfere." They "sum." Put a 10kHz sine wave into an acoustical space with some 1kHz sine wave and you don't get anything other than 1kHz and 10kHz. Add some reverb and such and you still only get 1kHz and 10kHz.

The only way in which you get "interference" (or what we call "distortion" - or the creation of additional frequency content that is not in the original) is when you do this in a non-linear environment. Non-linear environments include amplifiers, speakers, tape, tubes, etc. Air and acoustical environments are not non-linear (OK, they are, but throughout the amplitude range we're describing, air is notably linear - enough so that audible distortion is simply not created).
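The distinction between summing and distortion can be checked numerically. In this sketch a 1 kHz and a 10 kHz sine are added, then the same mix is passed through a made-up mild nonlinearity (y = x + 0.2x², chosen purely for illustration and not a model of any real amplifier or speaker). A single-frequency correlation measures how much signal appears at 9 kHz, one of the sum-and-difference frequencies.

```python
import math

def tone_level(signal, fs, freq):
    """Amplitude of the component at `freq`, by direct correlation
    with a cosine and a sine at that frequency."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / fs)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

fs = 48000
n = 4800  # 0.1 s, a whole number of cycles for every tone checked below

# Linear case: the two tones simply add.
linear = [0.5 * math.sin(2 * math.pi * 1000 * i / fs)
          + 0.5 * math.sin(2 * math.pi * 10000 * i / fs) for i in range(n)]

# Hypothetical nonlinear stage applied to the same mix.
nonlinear = [x + 0.2 * x * x for x in linear]

for name, sig in (("linear", linear), ("nonlinear", nonlinear)):
    print(name, [round(tone_level(sig, fs, f), 4)
                 for f in (1000, 9000, 10000, 11000)])
```

In the linear mix only 1 kHz and 10 kHz carry any level; after the nonlinear stage, components also show up at 9 kHz and 11 kHz, which is the creation of new frequency content described above.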

I suspect that 20K lop-off may affect the
overtones (as Perfectwave noted above).

If the overtones are above 20kHz then they are inaudible and can be lopped off with no effect on the audible range.

Nika

anonymous Mon, 03/07/2005 - 19:52

perfectwave wrote: i understand what you're saying about sampling the root and all the harmonic freqs from a composite waveform and then rebuilding the wave from the "basic freqs", but a composite wave can have many, many, many waves that make up that specific wave at that specific moment, yielding a very, very complicated waveform with many harmonics and overtones.

Yes, of course. So take all of that complex stuff going on and in the end what you have is still a waveform. It is a 5 minute waveform that is a pop song or whatever, but it is still a succinct and discrete waveform. As such, it has a Fourier Transform, and as such it adheres to the Nyquist theorem just like any other waveform.

think about vocals being recorded in a room and just the complexity of the voice and how many base waves make up that final composite wave that makes the voice, then sum in reverb from the room, which in itself is a complex composite wave. You also have to consider that these waves all started out as "analog vibrations" and in themselves they have an analog algorithm that they follow to sum up the final composite wave. you're talking about a very complicated structure that you want to break up.

No, we're not "breaking it up." We're simply going to represent it with the least amount of data necessary to accurately reproduce it later.

you would still need the analog "smoothness" of each root wave to properly sum up the final composite wave.

No, it is not a "root" wave and then other waves. It is a single, lengthy waveform, containing frequencies at various amplitudes and phase.

Nika

kingfrog Mon, 03/07/2005 - 22:55

My question is why when we only had 16/44 did anyone wish we could have 24/192?

Just because it's there...does not mean we can hear the difference.
In 2 years it will be 64/284 and we'll wonder how we ever recorded with 24/96....... :?

Yeah, you can look at the differences on an O-scope all day long, but there is a point of diminishing returns aurally and I believe we are there.

anonymous Tue, 03/08/2005 - 02:10

nika:

first you wrote:

We can do what is called a "Fourier Transform" on any waveform and REVEAL ITS FREQUENCY CONTENT and the subsequent phase and amplitude of each frequency component

then you responded to me:

No, we're not "breaking it up." We're simply going to represent it with the least amount of data necessary to accurately reproduce it later.

?????????

another question is what is the benefit of your theory? to sample a waveform, add more degrading mathematics (fourier transformation)
bear in mind that you are degrading the information because the bitrate is still there even though there is "less" information after the transformation.
so the final outcome has less digital information...i don't know if i'm in for the trade off. i'll stick with filling up my hard drives keeping the information as "unprocessed" as possible, besides unlike our california real estate, they seem to be going down in price.

anonymous Tue, 03/08/2005 - 06:36

Hey Nika,

I think I understand now where my doubt came from. The theory
you've described (and the theorem) deals only with sine waveforms.
Plainly speaking, they can represent EXACTLY only smooth
functions (the derivative exists everywhere). Nature is usually
more complicated. I'll use examples similar to yours:

Example-1. 3 sound sources have the same frequency and sine
waveforms. They add up - the frequency is the same, a sine form of
larger amplitude. Using your analogy of the circle - each wave needed
3 points for definition, and the resulting wave also needs only 3 points
for definition (still a circle). A nice theory case.

Example-2: 3 sound sources have the same frequency, but the
shapes are different: sine, square and triangular (saw) waveforms.
They add up - the frequency of the resulting wave is the same, but the
amplitude will look funny. This would be a "circle" with bumps and
peaks. How many points do we need now to define it accurately?
More than three, right? The Fourier Transform can still be used,
but only by splitting the resulting wave into smooth subintervals
(between peaks, Dirichlet's theorem). And I'm not sure that any
commercial AD converter does it.

My point - AD conversion even at a low sampling rate is quite
accurate on sine-wave sound sources (triangle, chimes, bells, etc.).
For AD conversion of a sound source that has non-sine
components (buzzing string, cymbal crash, distorted guitar, etc.)
more info is required, hence the higher sampling rate.

Do we hear it? I think some of us do, with the right equipment
of course. The current industry standards are defined not only
by the Nyquist Theorem, but also by the resolution of the cheap
solid-state lasers and ADC chips available back in the 80's.
Cost-quality balance, business as usual. Now they may as well
change it to 24/48 to everybody's benefit (and we need to dither no more).
Just what I think,

Costy.

anonymous Tue, 03/08/2005 - 08:35

Costy wrote: Hey Nika,

I think I understand now where my doubt came from. The theory
you've described (and the theorem) deals only with sine waveforms.

...

My point - AD conversion even at a low sampling rate is quite
accurate on sine-wave sound sources (triangle, chimes, bells, etc.).
For AD conversion of a sound source that has non-sine
components (buzzing string, cymbal crash, distorted guitar, etc.)
more info is required, hence the higher sampling rate.

Costy,

Go back and read the theorem again. It does not refer to sine wave sources. It does not refer to sources that seem "pretty sinusoidal." It does not say that it "completely represents the waveform (except for ones that aren't sinusoidal)" or "(except for cymbals)" or "(except for it adds distortion)" or "(except for the fact that it doesn't)." The theorem states that if you have a waveform that contains no frequencies above x that you can COMPLETELY represent it (that means 100% accurate frequency, amplitude, and phase) with samples given at greater than 2x. COMPLETELY! You are trying to argue that this theorem is not correct - that it has flaws - that there are exceptions to it - that Nyquist and Shannon didn't know what they were talking about, etc. Before you go down that path I would highly advise reading the theorem and understanding the math involved (if you doubt my explanation fits the "complex waveform" application).

Again, it took 21 years for a mathematician to prove this theorem correct. If you now want to dispute it I think you have a lot of work cut out for yourself. The theorem is correct.
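
One way to see why wave shape is not an exception: a square or sawtooth wave is itself a sum of sine components, so "non-sine" content just means more frequencies. A minimal sketch, assuming an ideal unit square wave and a 20 kHz band limit (the function name and the 1 kHz fundamental are illustrative, not taken from this thread):

```python
import math

def square_wave_harmonics(fundamental_hz, limit_hz):
    """Return (frequency, amplitude) pairs from the Fourier series of an
    ideal unit square wave, keeping only components below limit_hz."""
    harmonics = []
    k = 1
    while k * fundamental_hz < limit_hz:
        # Component k of the series has amplitude 4 / (pi * k).
        harmonics.append((k * fundamental_hz, 4.0 / (math.pi * k)))
        k += 2  # an ideal square wave contains odd harmonics only
    return harmonics

# Components of a 1 kHz square wave that fall below 20 kHz.
audible = square_wave_harmonics(1000, 20000)
for freq, amp in audible:
    print(f"{freq:6d} Hz  amplitude {amp:.4f}")
```

Everything above the band limit is removed by the converter's filter before sampling, and what remains is a frequency-bounded waveform of the kind the theorem covers.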

Nika

P.S. Just a mental exercise for you. There are papers written on this: If you have a waveform that contains no frequencies lower than, say, 1 kHz and none higher than 1100 Hz, what is the minimum sample frequency with which you can accurately represent this waveform?

anonymous Tue, 03/08/2005 - 08:38

perfectwave wrote: nika:

first you wrote:

We can do what is called a "Fourier Transform" on any waveform and REVEAL ITS FREQUENCY CONTENT and the subsequent phase and amplitude of each frequency component

then you responded to me:

No, we're not "breaking it up." We're simply going to represent it with the least amount of data necessary to accurately reproduce it later.

I only explained that you can do the Fourier Transform as a proof that the original waveform and the resultant waveform are the same. We don't actually do the Transform during the conversion process. We simply filter, then sample the material. Then we convert it back and filter it and we end up with the same thing. IF we did an FT on it at the beginning and the end we'd find out that they are indeed the same.

Another question is what is the benefit of your theory? To sample a waveform, then add more degrading mathematics (Fourier transformation)?
Bear in mind that you are degrading the information, because the bitrate is still there even though there is "less" information after the transformation.

I didn't follow this part. Sorry.

Nika

anonymous Tue, 03/08/2005 - 08:41

kingfrog wrote: My question is why when we only had 16/44 did anyone wish we could have 24/192?

There are good and complete answers to this question and I have explained them in my book.

The short version is that there were a lot of problems with converters in the old days. Increasing the sample rate was (on its surface) the easiest way of fixing those problems. Thus began the push for the higher sample rates. In the meantime, the brains at the converter companies fixed the problems in other ways, negating the need for the higher rates. By the time the higher rates were available they had almost become obsolete.

Nika

anonymous Tue, 03/08/2005 - 08:50

Costy,

I have one more way to think of the Nyquist Theorem. What he is essentially saying is the following:

If you have a series of sample data spaced at even intervals, there is only ONE way in which you could draw a line through them that yields a waveform that is frequency bounded to the legal range. ANY other way you draw the line through those sample points has frequency content above the legal range.
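
A rough sketch of that idea, assuming the standard Whittaker-Shannon (sinc) interpolation formula and an arbitrary three-sine test signal. The names and values are illustrative, and the sum is truncated to a finite number of samples, so the match is close rather than exact:

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def signal(t, fs):
    """Band-limited test signal: three sines, all below fs / 2."""
    return (1.00 * math.sin(2 * math.pi * 0.05 * fs * t)
            + 0.50 * math.sin(2 * math.pi * 0.12 * fs * t)
            + 0.25 * math.sin(2 * math.pi * 0.30 * fs * t))

def sinc_reconstruct(samples, n_start, fs, t):
    """Whittaker-Shannon interpolation, truncated to the samples given."""
    u = t * fs
    return sum(x * sinc(u - (n_start + i)) for i, x in enumerate(samples))

def linear_reconstruct(samples, n_start, fs, t):
    """Straight lines between neighbouring samples (a different 'line')."""
    u = t * fs - n_start
    i = int(math.floor(u))
    frac = u - i
    return samples[i] * (1 - frac) + samples[i + 1] * frac

fs = 44100.0
N = 1000  # samples kept on each side of t = 0
samples = [signal(n / fs, fs) for n in range(-N, N + 1)]

# Compare both reconstructions halfway BETWEEN sample instants.
times = [(m + 0.5) / fs for m in range(20)]
sinc_err = max(abs(sinc_reconstruct(samples, -N, fs, t) - signal(t, fs)) for t in times)
lin_err = max(abs(linear_reconstruct(samples, -N, fs, t) - signal(t, fs)) for t in times)
print(f"sinc error {sinc_err:.6f}, linear error {lin_err:.6f}")
```

The sinc reconstruction lands back on the original between the sample points, while the straight-line version through the same points does not. The corners of the straight-line version are where its out-of-band content comes from.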

Nika

anonymous Tue, 03/08/2005 - 12:38

Nika (guys, skip it if you find it boring),

Here you go - there're some "IFs" finally. You surely meant "IF
frequency bounded to legal range"? But here are some more IFs...

Quotes from Wikipedia on Nyquist-Shannon Theorem:
--------------------------------------------------------------------

Q1. IMPORTANT NOTE: This theorem is commonly misstated /
misunderstood (or even mistaught). The sampling rate must be
greater than twice the signal bandwidth, not the maximum /
highest frequency.

Comment: note the "greater".

Q2. Mathematically, the theorem is formulated as a statement
about the Fourier transformation.

Comment: means here = continuous Fourier transformation.

Q3. Theorem: If a function s(x) has a Fourier transform
F[s(x)] = S(f) = 0 for |f| >= W, then it is completely determined
by giving the value of the function at a series of points spaced
1/(2W) apart. The values sn = s(n/(2W)) are called the samples
of s(x).

Comment: note the "If" here.

Q4. If S(f) = 0 for |f| > W, then s(x) can be recovered from its
samples by the Nyquist-Shannon interpolation formula.

Comment: note the "If" again. Also the interpolation formula
looks to me very Fourier-like (obviously).

Q5. The proof starts with: To prove the theorem, consider two
continuous signals: f(t) and d(t) (Dirac comb)...

Comment: note "continuous" and the ideal sampling function
based on the Dirac's Delta.

Every theorem holds only in its domain based on its axioms or
definitions (read all the "ifs" above). The Nyquist-Shannon
Theorem has plenty of them. That's what I tried to illustrate
using your "circle" examples (see post above).

Nika, I don't argue that the theorem is incorrect (and I'm very sorry
for Shannon that it took him so long to prove it).
My opinion is this:
After all the IFs were satisfied (read: reality idealized), the
theorem helped to estimate the optimal sampling rate
for the industrial CD standard 16/44.1 (acceptable accuracy of sound
reproduction for the general customer). Now, if you idealize reality
"better" and apply the same theorem, you get a different standard
(which may be too expensive for the general customer).

Costy.

anonymous Tue, 03/08/2005 - 13:01

Costy wrote: Q1. IMPORTANT NOTE: This theorem is commonly misstated /
misunderstood (or even mistaught). The sampling rate must be
greater than twice the signal bandwidth, not the maximum /
highest frequency.

This is not correct. It only needs to be at double; however, if it is only at double, then the samples must be taken for an infinitely long period of time. For practical applications the sampling rate should indeed be greater than double the highest frequency present. A 20 kHz sine wave can be recorded with a 40.00000000001 kS/s sample rate. Fortunately, in audio, we use 44.1 kS/s, so recording 20 kHz should not be problematic - pending converter implementation.
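
A quick illustration of why exactly double is a borderline case, assuming a 20 kHz sine that starts at zero phase (an unlucky but legal choice; the helper name is hypothetical):

```python
import math

def sample_sine(freq_hz, rate_hz, count):
    """Sample sin(2*pi*f*t) at t = n / rate for n = 0 .. count-1."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

# Exactly 2x: every sample lands on a zero crossing of the sine.
at_double = sample_sine(20000, 40000, 100)
# Above 2x (the CD rate): the samples trace out the wave.
at_cd_rate = sample_sine(20000, 44100, 100)

peak_at_double = max(abs(x) for x in at_double)
peak_at_cd_rate = max(abs(x) for x in at_cd_rate)
print(peak_at_double, peak_at_cd_rate)
```

With this phase, the samples at exactly 40 kS/s are all zero to within rounding error, so a finite run of them says nothing about the tone, while the 44.1 kS/s samples clearly do.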

Q2. Mathematically, the theorem is formulated as a statement
about the Fourier transformation.

Comment: means here = continuous Fourier transformation.

Not correct. The formula is often stated in terms of the Fourier Transform - just as I did above. "In effect, take a waveform (which has a discrete and unique FT), convert it and the DFT of the sampled version will equal the FT of the original. Reconstruct the sampled version and the FT of the result will be the same as the FT of the original and will overlay on top of the DFT of the sampled version. Since each waveform has a unique and discrete FT this means that the waveforms are the same." This is an example of discussing the theorem in terms of the FT, but means absolutely nothing about "continuous Fourier transformation." There is no Fourier Transform done in the sampling process - only in the analysis of why it works.

Q3. Theorem: If a function s(x) has a Fourier transform
F[s(x)] = S(f) = 0 for |f| >= W, then it is completely determined
by giving the value of the function at a series of points spaced
1/(2W) apart. The values sn = s(n/(2W)) are called the samples
of s(x).

Not all waveforms have Fourier Transforms (think outside of the box - imaginary numbers, waveforms with multiple y values for a given x value, etc). Thus the "if" in this statement. All AUDIO waveforms have FTs, however. This is just mathematical phraseology and should not obfuscate the point. Take a waveform with bounded frequency range and sample it at twice the highest frequency. It will be COMPLETELY DETERMINED by that sample data.

Q5. The proof starts with: To prove the theorem, consider two
continuous signals: f(t) and d(t) (Dirac comb)...

Comment: note "continuous" and the ideal sampling function
based on the Dirac's Delta.

Not sure where you're going with this one at all. Yes, the waveforms have to be continuous - there has to be one and only one y value per x value (like in audio waveforms) for this to work.

Every theorem holds only in its domain based on its axioms or
definitions (read all the "ifs" above). The Nyquist-Shannon
Theorem has plenty of them.

Yes, of course, and audio waveforms meet all of them.

Nika

anonymous Tue, 03/08/2005 - 13:34

Yes, absolutely. Audio waveforms are continuous, they don't have multiple values of y per value of x, they don't have imaginary numbers, etc. All audio waveforms have Fourier Transforms. Ergo, all audio waveforms can be represented by appropriate sampling with Complete Determinability.

BTW, did you figure out the problem I gave you above? The hint is in your previous post to me - with all of the "ifs."

Nika

anonymous Tue, 03/08/2005 - 14:53

Costy wrote:

did you figure out the problem I gave you above?

Do you mean 2.2 kHz ?

No. The hint was in a reference you gave to wikipedia:

"This theorem is commonly misstated/misunderstood (or even mistaught). The sampling rate must be greater than twice the signal bandwidth, not the maximum/highest frequency."

If you want to represent a waveform that contains no frequencies below 1 kHz and none above 1100 Hz, you can completely represent it with a 200 Hz sample rate (not 2200 Hz). We only need to sample at twice the bandwidth, not twice the highest frequency.

This principle is used in other scientific fields fairly frequently.
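
A small sketch of why the 200 Hz figure works for the 1000-1100 Hz band, assuming ideal sampling and using a hypothetical helper that folds a frequency into the 0 to rate/2 range. It ignores the band-edge subtlety (the "greater than" question discussed earlier) and simply checks that no two test tones in the band land on the same alias:

```python
def alias_frequency(freq_hz, rate_hz):
    """Frequency in [0, rate/2] at which freq_hz appears after sampling at rate_hz."""
    return abs(freq_hz - rate_hz * round(freq_hz / rate_hz))

band = range(1000, 1101, 10)  # test tones from 1000 Hz to 1100 Hz

at_200 = [alias_frequency(f, 200) for f in band]
at_150 = [alias_frequency(f, 150) for f in band]

# True when every tone in the band maps to its own alias frequency.
unique_at_200 = len(set(at_200)) == len(at_200)
unique_at_150 = len(set(at_150)) == len(at_150)

print(at_200, unique_at_200)
print(at_150, unique_at_150)
```

At 200 Hz the band shifts down to 0-100 Hz with each tone keeping its own slot, so it can be shifted back up afterwards. At 150 Hz, 1000 Hz and 1100 Hz both land on 50 Hz and can no longer be told apart.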

Nika