
Fixed v. Floating

David,

It seems that the other thread was shut down right as I joined. I didn't read the whole thing, but after seven pages I suppose it makes sense that the perception was that it wasn't going anywhere.

Having said this, I consider the topic we just ventured into, fixed/floating point operations, to be a significant one. If you (or a lurker at large) are interested in exploring it, please let me know and I will be happy to discuss it further.

Cheers!
Nika

Comments

Pro Audio Guest Sat, 02/12/2005 - 14:52
Hmm. I find it unfortunate that people are so offended by a discussion of such matters. As what's-his-name said in the play, 1776, "I can't think of any topic of conversation so taboo that we can't at least TALK about it!" In any event, I understand. Contact me if you have questions. I'd be happy to help any way I can.

BTW, I was heavily involved in the AWESOME, DAW-SUM test you referenced. It was a good test to check how mixers perform at unity, but the issues we were discussing occur when you get away from unity - like when you mix audio.

Cheers! It's nice to be here.

Nika

FifthCircle Sun, 02/13/2005 - 00:11
It isn't being offended by the conversation... It is the fact that the arguments are bordering on personal and the math and logic are being used to show why the other side is full of it.

This is a very hot topic throughout the internet and I have absolutely no problems with informed fact. However, many of the discussions go from fact to "it is better because I say so." That is something I cannot take anymore. If you must discuss that way, take it elsewhere. If you can say for certain how a specific DAW sums and why it is better, then have at it.

I haven't had permission to post it here, but for the Samplitude users out there, a mention on this issue was posted on the users forum and it was very well written. As I said in the last thread, it basically said that both sides of the argument were correct to a certain extent.

Nika- I'm sorry I shut the last discussion down right after you arrived, but after 7 pages, enough was enough. If you have something useful here to say for the benefit of the community, please don't hesitate to say it. If this thread gets even close to the last one in the tone of it, I will shut it down for good. However, until that happens, feel free to post.

--Ben

Cucco Mon, 02/14/2005 - 10:57
Nika,

I don't want you to think I was offended by your remarks on the SACD post. I merely like to make it abundantly clear that, while numbers on paper play a key part in making sound and making it sound good, sometimes anomalies on paper or "the truth" don't bear out in a recording.

Trust me, the only way you could offend me would be to insult my wife or my dog.

Thanks,

J...

DavidSpearritt Tue, 02/15/2005 - 04:03
Nika

I have read your excellent paper. It explains things very well. I understand your basic premise that in a floating point representation quantisation error is manifest as correlated distortion whereas in a fixed point system it is uncorrelated noise and this must be an improvement in sound quality. But is this significant, except at very, very low volumes? Your statement that a fixed point system has infinite dynamic range but nearly all of it is below the noise floor was quite funny to me.

This seems a strong argument for fixed point representation, but I am disturbed by your examples and the significance of them.

I would be very obliged if you could address the following issues:

1. 32bit floating point applies dynamic range accuracy where it is needed, ie it splits up whatever the FS signal amplitude is into 2^24 steps. If the FS amplitude is tiny say 1mV, then it splits 1mV to zero up into 16777216 steps. If the FS is 10V, it splits 10V to zero up into the same number of steps. The quantisation error is equally insignificant despite the huge change in level.

2. You indicate that the problem arises when you try and add a tiny signal to a big signal, that the tiny signal precision is lost in FP, but this is what happens in real life, the big signal totally swamps the tiny one and masks it out completely.

3. A well engineered recording is always using very healthy signal levels, I am never adding or mixing stuff -120dB down with a -10dB down signal. I am mixing -10dB with -10dB and so the accuracy of FP is essentially the same as fixed point. Most of my mixing is nearly always done at unity gain.

4. Float gives you headroom without unnecessary intermediate gain changes. What happens if you add a 0dB fixed point signal to a 0dB fixed point signal? You have to reduce the gain of both first before the addition to avoid >0dB overload in the result. In a floating point system, you simply add them and get +3dB. Then you can continue all your processing and gain reduce once, and dither right at the end. This is user friendliness built in.

5. In classical recording, the lowest level signals rarely drop below -50dBFS and self noise of mics and preamps is around -70dBFS, we are never down near the digital noise floor so I find your examples not in tune with reality. These noise sources provide some natural dither that is uncorrelated with the music, making the use of FP system much more immune to the problems you mention. One of your main conclusions about whether the ear can perceive these tiny errors is very telling.

6. Floating point has been chosen by most of the big name DAW's, it must be because most people are operating at or near the top of the digital FS range all the time and need to master >0dB in mixing and plugin calcs. You can do all this easily and fast with a FP system. Surely, greater and more complex DSP is required to get this right with a fixed point system.

7. When 64bit PC's become common, 64bit double float would be used without reservation, I suspect you would not be arguing any of this. It would be used in preference to a fixed point system, due to the operation and calculation simplicity for >0dB results.

8. I have never thought that dithering back to fixed point in the middle of a mastering chain is ever a good idea. As soon as possible it's straight into 32bit float and there it stays until dithering at the end. Going out to analog devices or dithering back to ints for outboard digital processors before coming back in again and back to float before the final master stages seems very silly to me. This seems to be where some of your emphasis on errors in the floating point system arises.

Nika, this paper is very well written and I learned a lot. But I am a practical person and believe that your points are referring to the exception rather than the rule in good mastering, I have healthy signals always near FS, good natural dither, unity or near unity mix gains and keep everything always throughout in 32bit float from beginning to end.

I am very interested in your comments and am learning a lot and for that I am grateful.

Pro Audio Guest Tue, 02/15/2005 - 07:56
David,

That is certainly a healthy reply. Allow me to tackle some of the issues:

DavidSpearritt wrote: Your statement that a fixed point system has infinite dynamic range but nearly all of it is below the noise floor was quite funny to me.

I believe it is a significant point that many people don't understand. Any activity that happens - even less than the LSB - still gets captured and represented in the remaining bits of the system. This is part of what I told you the other day - if you create a 24 bit fixed point system and record a signal at -160dB then truncate to 24 bits, that signal will still be present - in fact relatively obvious upon normalizing and listening. Nothing that occurs at any level in a fixed point system is ever lost, and if those low level signals can have an effect on later processing they can and do. This is why I was telling you that fixed point systems actually have much better precision than floating point systems - because the data that extends an infinite number of bits to the right is still maintained even at, say, 2 bits. This is not true with floating point. You have instantaneous precision of x number of bits out to the right at any one time, but this is the extent of the precision. And while that precision exists sans noise, the behavior of this system creates distortion that erodes its precision.

1. 32bit floating point applies dynamic range accuracy where it is needed, ie it splits up whatever the FS signal amplitude is into 2^24 steps. If the FS amplitude is tiny say 1mV, then it splits 1mV to zero up into 16777216 steps. If the FS is 10V, it splits 10V to zero up into the same number of steps. The quantisation error is equally insignificant despite the huge change in level.

You write that the accuracy is "where it is needed." Why does that low level signal need that accuracy? And why does it need this accuracy at the risk of distortion? If we determine that the audio signal needs x bits of statistical precision (dB of dynamic range) then why would the larger signal NOT need that accuracy? If we determine that we don't actually need that dynamic range then why do we apply the extra data to the smaller value, negating our ability to properly dither the results, thereby increasing the distortion of the system? You are drawing a logical deduction, here, that more bits to the right is beneficial somehow, and that greater instantaneous precision is beneficial, even if that means different words have different dynamic ranges. I would challenge that this logical deduction may not be completely accurate.
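To put numbers on the "steps" being argued about here, the following is a minimal sketch that compares the step size of an IEEE 754 single-precision float at several signal levels with the single, constant step of a 24 bit fixed point word. It assumes full scale is +/-1.0 and that "32bit float" means IEEE 754 single precision; the helper name `float32_step` is hypothetical and not taken from any DAW.

```python
import struct

def float32_step(x: float) -> float:
    """Gap between x (rounded to float32) and the next larger float32 value.

    Only meant for positive, finite x.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here

# Assumption: 24 bit fixed point with full scale at +/-1.0, so one LSB is 2^-23.
FIXED24_STEP = 2.0 ** -23

for level in (1.0, 0.5, 1e-3, 1e-6):
    step = float32_step(level)
    print(f"level {level:g}: float32 step {step:.3e} "
          f"(relative {step / level:.3e}), fixed step {FIXED24_STEP:.3e}")
```

The float step shrinks along with the value, so the relative step stays roughly constant, while the fixed point step is the same absolute size at every level. Whether the extra resolution on small values is a benefit is exactly what the two posters disagree about.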

2. You indicate that the problem arises when you try and add a tiny signal to a big signal, that the tiny signal precision is lost in FP, but this is what happens in real life, the big signal totally swamps the tiny one and masks it out completely.

Absolutely not! The small signal has some effect on the big signal. In waveforms that "some effect" would manifest itself over time. If I add a number with an amplitude 1/100th the value of the LSB to a larger number it will, 1 time out of 100 (statistically speaking), have an effect on the value of the larger number, thereby maintaining the integral of the smaller value in the system, which, when averaged out by the time-function of our hearing, manifests itself as the presence of the smaller signal.
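The "1 time out of 100" statement can be checked with a small simulation. This is only a sketch under stated assumptions: values are expressed in units of one LSB, the dither is rectangular with a width of one LSB (practical systems often use triangular dither instead), and the function name `quantize_rpdf` is made up for the illustration.

```python
import random

def quantize_rpdf(value_in_lsb, rng):
    """Add rectangular dither of +/-0.5 LSB, then round to a whole LSB."""
    dither = rng.random() - 0.5
    return round(value_in_lsb + dither)

rng = random.Random(0)      # fixed seed so the run is repeatable
big, small = 1000.0, 0.01   # both in units of one LSB
n = 200_000

# Without dither the small value is rounded away on every sample.
undithered = [round(big + small) for _ in range(n)]

# With dither the result occasionally lands one LSB higher.
dithered = [quantize_rpdf(big + small, rng) for _ in range(n)]

fraction_bumped = sum(1 for q in dithered if q == 1001) / n
mean_offset = sum(dithered) / n - big

print("undithered values seen:", set(undithered))
print("fraction of samples bumped up:", fraction_bumped)
print("average offset in LSB:", mean_offset)
```

With these assumptions the bumped fraction and the average offset both come out statistically close to 0.01, which is the sense in which the smaller value survives as an average even though no single sample can represent it.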

The degree to which the larger signal "totally swamps" the smaller signal is a factor of the ear and our hearing, not the signals themselves. Just because the ear has the ability to mask signals that meet certain requirements does not mean that the signal was not still there. With digital signal processing we could filter and reveal for you the smaller signal, mimicking the filtering of the ear if the ear's filters are not fine enough.

With fixed point processing, as with in the real, natural, acoustic world, when smaller signals are summed with larger signals in the presence of noise the smaller signals still have an effect. The degree to which this is significant depends on future processing done to the signal and the limitations of the receiver (in this case, the ear).

Floating point systems do not allow this to happen, making them UNnatural and NOT reflective of what happens in the acoustical world (real life).

3. A well engineered recording is always using very healthy signal levels, I am never adding or mixing stuff -120dB down with a -10dB down signal. I am mixing -10dB with -10dB and so the accuracy of FP is essentially the same as fixed point. Most of my mixing is nearly always done at unity gain.

While you may not be mixing stuff -120dB down, any signal you put into your system will have values that get down to that level, no? The zero crossing represents -infinity dB. Does your signal never cross zero? At some point you are mixing smaller values with larger values, even if the RMS value or the peak values of the waveform represented by your sample values is fairly high.

4. Float gives you headroom without unnecessary intermediate gain changes. What happens if you add a 0dB fixed point signal to a 0dB fixed point signal? You have to reduce the gain of both first before the addition to avoid >0dB overload in the result.

Not at all. In Protools TDM, for example, the internal architecture provides something like 48dB of internal headroom in the mixing bus. This means you can sum something like 64 full scale tracks together, each hitting full scale at the same time without A. clipping the system, and B. reducing the faders for these tracks. Then you gain-reduce once, at the end, just like in your floating point system.
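The headroom arithmetic can be sketched as follows. This is a worst-case calculation only, assuming every track hits positive full scale on the same sample and that the accumulator is a plain two's complement integer; the 32 bit accumulator width is chosen purely for illustration and is not a description of the actual Protools TDM design.

```python
import math

FULL_SCALE_24 = 2 ** 23 - 1  # largest positive 24 bit sample value

def headroom_db(extra_bits):
    """Headroom gained by adding extra_bits above a 24 bit word."""
    return 20 * math.log10(2 ** extra_bits)

def fits(num_tracks, accumulator_bits):
    """True if num_tracks coincident full-scale peaks fit in the accumulator."""
    worst_case_sum = num_tracks * FULL_SCALE_24
    return worst_case_sum <= 2 ** (accumulator_bits - 1) - 1

print(round(headroom_db(8), 2))   # 48.16
print(fits(2, 24))                # False: two full-scale tracks overflow 24 bits
print(fits(64, 32))               # True
print(fits(256, 32))              # True
print(fits(257, 32))              # False
```

By this arithmetic, 48dB of headroom corresponds to 8 extra bits, which leaves room for up to 256 coincident full-scale peaks, so the "something like 64" figure in the post sits comfortably inside it.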

5. In classical recording, the lowest level signals rarely drop below -50dBFS and self noise of mics and preamps is around -70dBFS, we are never down near the digital noise floor so I find your examples not in tune with reality. These noise sources provide some natural dither that is uncorrelated with the music, making the use of FP system much more immune to the problems you mention. One of your main conclusions about whether the ear can perceive these tiny errors is very telling.

Two answers: 1. If this is your situation, the one you describe, and you only really deal with around 70dB of dynamic range, then what, exactly, is the benefit of floating point systems? It sounds like you're saying, "well floating point is, in practical terms, not actually worse, is it?"

2. The problem is certainly minimal with simple summing like this. The problem becomes more consequential during processing stages, where coefficients are calculated, etc.

6. Floating point has been chosen by most of the big name DAW's, it must be because most people are operating at or near the top of the digital FS range all the time and need to master >0dB in mixing and plugin calcs.

Cum hoc ergo propter hoc (http://en.wikipedia…)

You can do all this easily and fast with a FP system. Surely, greater and more complex DSP is required to get this right with a fixed point system.

Not at all. See my post above regarding headroom in fixed point systems.

7. When 64bit PC's become common, 64bit double float would be used without reservation, I suspect you would not be arguing any of this. It would be used in preference to a fixed point system, due to the operation and calculation simplicity for >0dB results.

No. See my post above on headroom in fixed point systems.

8. I have never thought that dithering back to fixed point in the middle of a mastering chain is ever a good idea. As soon as possible it's straight into 32bit float and there it stays until dithering at the end. Going out to analog devices or dithering back to ints for outboard digital processors before coming back in again and back to float before the final master stages seems very silly to me. This seems to be where some of your emphasis on errors in the floating point system arises.

My emphasis is split. There is a problem merely with dithering after every calc, independently. Then there is also a problem with dithering to the final 24 bit stage. They are two separate but related issues.

I hope this helps?

Nika

DavidSpearritt Tue, 02/15/2005 - 13:08
I believe it is a significant point that many people don't understand. Any activity that happens - even less than the LSB - still gets captured and represented in the remaining bits of the system. This is part of what I told you the other day - if you create a 24 bit fixed point system and record a signal at -160dB then truncate to 24 bits, that signal will still be present - in fact relatively obvious upon normalizing and listening. Nothing that occurs at any level in a fixed point system is ever lost, and if those low level signals can have an effect on later processing they can and do. This is why I was telling you that fixed point systems actually have much better precision than floating point systems - because the data that extends an infinite number of bits to the right is still maintained even at, say, 2 bits.

Nika, thanks for more great info. But this just floors me. This is almost magic at work, ie the fact that you can record something outside the scope of resolution, then normalise and hear it. I am trying to think how I can do this test, you obviously have. What does it sound like, I assume you have listened to sine waves or something. What about a Mozart Symphony as a test signal??? Surely this will not be recorded with any sort of accuracy at -160dB. Please tell me it is so. :shock:

Pro Audio Guest Tue, 02/15/2005 - 13:36
David,

Try this. Take a beautiful piece of classical music. Put it up at full scale. Now take a noise generator and mix noise in at a level of -6dBFS. Listen to your music. You're hearing a lot of music, deep down into the noisefloor, aren't you?

OK, now drop your master fader to -132dB and bounce this to disk as a 24 bit fixed point file. You now have exactly two bits worth of data in that file, yes?

Open that up in a new session. Normalize the result. Look at it. You should have very few quantization steps, no? Perhaps - 4? or 5? Play the results. How does it sound?

It will probably not sound as good as the original pass at this, before you bounced to disk, etc., because of the specific problems we're addressing, but you get the picture. You are now representing all of that material with only 2 bits. Stuff that happened well, well below the noisefloor is still there and still having an effect but it is masked by noise and perhaps some distortion at a certain level. In this case you are probably hearing stuff at least 15 dB below the noisefloor.
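The "two bits" and "4? or 5?" figures in this experiment follow from simple level arithmetic, sketched below. It assumes roughly 6.02dB per bit and considers only the full-scale music peak after the fader drop, ignoring the added noise and any dither; the function names are hypothetical.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit

def bits_left(word_bits, attenuation_db):
    """Bits of the word still exercised after attenuating a full-scale signal."""
    return word_bits - attenuation_db / DB_PER_BIT

def peak_in_lsb(word_bits, attenuation_db):
    """Peak of a formerly full-scale signal, in units of one LSB."""
    full_scale = 2 ** (word_bits - 1)
    return full_scale * 10 ** (-attenuation_db / 20)

remaining = bits_left(24, 132)
peak = peak_in_lsb(24, 132)
levels = 2 * round(peak) + 1  # codes from -round(peak) to +round(peak)

print(f"bits left: {remaining:.2f}")
print(f"peak: {peak:.2f} LSB")
print(f"distinct quantization levels: {levels}")
```

Under these assumptions the peak sits at about 2.1 LSB, so the file only uses the codes from -2 to +2, which is five quantization steps.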

Take this a step further and you can figure out where DSD comes from.

Cheers!

Nika

DavidSpearritt Thu, 02/17/2005 - 03:00
Nika, I did your experiment and was amazed, but I also did this experiment too.

1. Take a string quartet recording, adjust master gain to -96dB, render to 24bit fixed file, called A.WAV. Brought A.WAV into new session, normalised, listened, sounded pretty bad. Dropouts, lots of distortion.

2. Take the same string quartet recording, adjust master gain to -96dB, render to 32bit float file, called B.WAV. Brought B.WAV into new session, normalised, listened, sounded pretty damn fine. Very like the original. Is this why the big DAWS are using 32bit float?

Explanation? Absence of dither?

Pro Audio Guest Thu, 02/17/2005 - 08:22
DavidSpearritt wrote: Nika, I did your experiment and was amazed, but I also did this experiment too.

1. Take a string quartet recording, adjust master gain to -96dB, render to 24bit fixed file, called A.WAV. Brought A.WAV into new session, normalised, listened, sounded pretty bad. Dropouts lots of distortion.

Yes, you are hearing the distortion due to the lack of dither in the floating point system. If there was proper dither you would hear absolutely zero distortion, but the noise level would be fairly high (-48dB FS). What you are hearing as distortion does not happen in a properly designed fixed point environment.

2. Take the same string quartet recording, adjust master gain to -96dB, render to 32bit float file, called B.WAV. Brought B.WAV into new session, normalised, listened, sounded pretty damn fine.

Yes, of course. This is because you avoided the damage inflicted by dithering to 24 bits by keeping the file in the 32 bit floating point format. This wasn't really much of a "bounce" at all. The damage you gave in example 1 happens every time you bounce a file to 24 bits, though most of the time that distortion is low enough you aren't hearing it. The same damage actually happened to this file - you distorted the file at -144dB FS or so, but you aren't hearing it because it is pretty well masked.
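David's A.WAV / B.WAV comparison can be approximated numerically. The sketch below is a simplified model, not a reproduction of any DAW: the "recording" is a 997 Hz sine at full scale (+/-1.0), the 24 bit render is plain rounding with no dither, the 32 bit float render is an IEEE 754 single-precision round trip, and "normalising" is just dividing by the same gain again. All names are made up for the illustration.

```python
import math
import struct

def to_fixed24(x):
    """Round to the nearest 24 bit code (full scale +/-1.0), no dither."""
    return round(x * 2 ** 23) / 2 ** 23

def to_float32(x):
    """Round-trip a value through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def snr_db(reference, test):
    """Signal-to-error ratio of test against reference, in dB."""
    signal = sum(r * r for r in reference)
    error = sum((r - t) ** 2 for r, t in zip(reference, test))
    return 10 * math.log10(signal / error)

gain = 10 ** (-96 / 20)  # master fader at -96dB
source = [math.sin(2 * math.pi * 997 * n / 48000) for n in range(48000)]

# Attenuate, store in each format, then normalise back up by the same amount.
fixed_result = [to_fixed24(s * gain) / gain for s in source]
float_result = [to_float32(s * gain) / gain for s in source]

fixed_snr = snr_db(source, fixed_result)
float_snr = snr_db(source, float_result)
print(f"24 bit fixed render: {fixed_snr:.1f} dB")
print(f"32 bit float render: {float_snr:.1f} dB")
```

In this model the fixed point render keeps only about 8 bits of the signal, which lines up with the roughly -48dB FS noise level mentioned above, while the float render keeps its full mantissa because only the exponent changes. The sketch says nothing about whether the fixed point error is heard as noise or as distortion; that depends on dither, which is not modelled here.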

Is this why the big DAWS are using 32bit float?

No. They probably use floating point because on a typical PC computer it is easier to program for and does not rob the computer of cycles needed for other activities by using the CPU. As Steve Powell speculated yesterday, a computer simply couldn't do what it does with audio if we did it fixed point. They have to use the FPU which is essentially just sitting there, unused. Protools uses fixed point, but they had to add all kinds of CPU power (much like the Powercore card) in order to be able to do fixed point processing throughout.

Pro Audio Guest Thu, 02/17/2005 - 09:47
Nika wrote:
DavidSpearritt wrote: Is this why the big DAWS are using 32bit float?

No. They probably use floating point because on a typical PC computer it is easier to program for and does not rob the computer of cycles needed for other activities by using the CPU. As Steve Powell speculated yesterday, a computer simply couldn't do what it does with audio if we did it fixed point. They have to use the FPU which is essentially just sitting there, unused. Protools uses fixed point, but they had to add all kinds of CPU power (much like the Powercore card) in order to be able to do fixed point processing throughout.

That's interesting. That might explain why the discontinued Emu/Ensoniq Paris system was able to use fixed point (56 bit, I think). It runs mainly on DSP cards, and only uses the host CPU to run the GUI and any native-based VST and DX effects.

Paris does have a very nice sound. It's no longer the main part of my setup (mainly due to lack of Midi support and easy integration with other DAW's), but I still keep a Paris system for tracking and overdubs, since it's a near-zero latency system.

Mike Barrs

DavidSpearritt Thu, 02/17/2005 - 12:38
Nika wrote: Yes, you are hearing the distortion due to the lack of dither in the floating point system. If there was proper dither you would hear absolutely zero distortion, but the noise level would be fairly high (-48dB FS). What you are hearing as distortion does not happen in a properly designed fixed point environment.

I'll have to take your word for it, since I cannot do the gain change in fixed point math on my DAW.

Yes, of course. This is because you avoided the damage inflicted by dithering to 24 bits by keeping the file in the 32 bit floating point format. This wasn't really much of a "bounce" at all. The damage you gave in example 1 happens every time you bounce a file to 24 bits, though most of the time that distortion is low enough you aren't hearing it. The same damage actually happened to this file - you distorted the file at -144dB FS or so, but you aren't hearing it because it is pretty well masked.

But I am happy with that scenario. You see I do not bounce to anything fixed point ever in my mastering. If at the end I have to go back to CD or DVD, then I can tolerate -144dB of distortion because it's not significant.

It seems to me that 32float does a damn fine job with "stupid" gain calculations. I will have to find a fixed point DAW somewhere, God knows where and do these tests again.

Pro Audio Guest Thu, 02/17/2005 - 12:45
DavidSpearritt wrote: I did my experiment again this morning, and dithered the 24bit fixed point file UV22HR to 24bits. It still sounds terrible when normalised again. The 32float sounds wonderful.

What am I doing wrong, if anything?

It sounds terrible because it is magnifying the distortion you are getting because of a lack of dithering. Even dithering with UV22 isn't properly dithering the signal (as the paper suggests) and you're still getting distortion. This happens at very low levels every time you process, and it also happens when you bounce to 24 bits.

Nika

Pro Audio Guest Thu, 02/17/2005 - 12:51
DavidSpearritt wrote: I'll have to take your word for it, since I cannot do the gain change in fixed point math on my DAW.

Right. If you get your hands on a Protools TDM system you can try it. It's the difference between:

1. Take fullscale symphonic music. Add noise at -6dB FS. Listen. (This is what fixed point sounds like at the very low levels).

2. Take fullscale symphonic music. Drop signal 138dB. Bounce to disk as 24 bit file. Normalize. Listen. (This is what floating point sounds like at the very low levels).

But I am happy with that scenario. You see I do not bounce to anything fixed point ever in my mastering.

This happens every time you do a calculation (EQ, reverb, auto-tune, gain change, compress, limit, sum, bus, pan, etc) - just at very low levels. It also happens when you bounce the final result to 24 bits - though again at very low levels. I'm just trying to show you in amplified levels what is happening at your lowest bits. The problem is obviously what happens when this compounds.

Cheers!
Nika
