
My wife has a small business selling spoken word recordings.
I'm her tech guy, photographer, and financier.

She does her recording in Pro Tools and we sell mp3 files.
I've been using iTunes to convert .WAV to .mp3 but I want to have more control over the quality and size of the mp3 file.
I've just downloaded an open-source conversion program that offers a number of encoders for the conversion.

I'm looking for recommendations/advice on which algorithm to use
Considerations:

  • Small package - this title will be downloaded to customers and often they are in hotels with limited WiFi bandwidth and sometimes even using cellular data for downloads
  • Quality - this isn't music, it's just voice (currently, no background music at all) but we want her voice to sound as good as possible, given the size constraints
  • Budget - we want to spend as little as possible. It shouldn't make a difference to this specific question unless you are proposing an alternate solution like hiring a sound engineer (which we did for a time, but found it was too expensive for us. sorry)

Thank you in advance!

Comments

dvdhawk Mon, 10/13/2014 - 14:19

Hi,
Since you're already committed to ProTools, why not drop $20 on the ProTools mp3 import/export add-on (https://shop.avid.c…) and keep it in-house, so to speak? The thought process being, the fewer algorithms and conversions, the better the end product. I haven't used it in a while, but it used to do dirty lo-res mp3s all the way up to 320 CBR.

Good luck.

newmarket2 Tue, 10/14/2014 - 06:54

dvdhawk, post: 420163, member: 36047 wrote: Hi,
Since you're already committed to ProTools, why not drop $20 on the ProTools mp3 import/export add-on (https://shop.avid.c…) and keep it in-house, so to speak? The thought process being, the fewer algorithms and conversions, the better the end product. I haven't used it in a while, but it used to do dirty lo-res mp3s all the way up to 320 CBR.

Good luck.

Thanks for that.
We'll definitely look into that, but today I've got to convert WAV to MP3 and really would like to achieve the best quality (for spoken word) with the smallest file (to ease/speed downloading)
Any thoughts there?

newmarket2 Tue, 10/14/2014 - 08:10

Thanks for that. I'll check it out.

But today I'm on a deadline to convert WAV to MP3 and want to select the encoder that will give me a package that is quickly downloadable (i.e. smaller) with the best possible sound quality, given that this is spoken word and not a symphony.
The default encoder is LAME MP3 v3.99.5
Others available are:
BONK
FAAC
FLAC
Ogg Vorbis
Windows Media encoder

newmarket2 Tue, 10/14/2014 - 09:02

Well, for now, too late... I couldn't figure out the new converter app I was trying to use (fre:ac) and went back to iTunes, which I have used before.
But, still interested in hearing from someone who really understands the ways in which spoken word differs from music - from the perspective of the sound engineer...
It would help us make a better product!

dvdhawk Tue, 10/14/2014 - 15:43

Your senses rely heavily on your brain to extrapolate limited information and fill in the gaps. The mp3 formats are all based on the concept that humans can only actively listen to a limited range of things at any particular moment. If you put on your photographer hat for a minute and think about depth-of-field, it's a similar idea. mp3s give you a smattering of whatever is in the foreground and clamoring for the most attention. They use an algorithm to mask everything else that's not in the foreground and deemed unimportant, or unnoticeable (by most), to get the file size down. But much like photography, the depth of the background (even if it's out of focus) still provides a lot of the atmosphere of a photo. To get detail, depth-of-field, and clarity beyond one focal length is what separates the men from the boys.

You haven't indicated what version of ProTools you have, or what the end-product might be, but most spoken word files wouldn't necessarily need to be exported in stereo. That can cut your file size roughly in half, right there, if the purchaser doesn't need it to be in stereo (a mono file needs about half the bit-rate of a stereo one for comparable quality). The human voice doesn't need the full 20 Hz to 20 kHz frequency range that is the standard for music. Limiting your frequency range would have to be done carefully. Indiscriminately cutting the highs and lows can lead to a perceived loss of quality - even before it's converted and compressed to mp3. (Compression here refers to file compression, not audio compression.)

Without all those other audio distractions, an mp3 converter should function to its highest potential doing just a single voice. Uncompressed wav files are going to give you the best possible quality. mp3 formats ALL sacrifice quality to varying degrees (which you can select). The higher the bit-rate you choose for your mp3 export, the larger the files are going to be. Broadly, sound quality and file size rise together. Selecting Variable Bit-Rate can reduce file size, but you're introducing more potential for error, and usually a loss of quality. Constant Bit-Rate will give you more consistent results and larger files.
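To put rough numbers on the bit-rate versus file-size trade-off for a constant-bit-rate file, here is a minimal sketch. The function name `cbr_size_bytes` is made up for illustration, and the estimate ignores ID3 tags and frame padding, so real files will differ slightly.

```python
def cbr_size_bytes(bitrate_kbps: float, duration_s: float) -> int:
    """Approximate size of a constant-bit-rate MP3, ignoring tags and padding."""
    # "kbps" means 1000 bits per second, and there are 8 bits in a byte.
    return int(bitrate_kbps * 1000 / 8 * duration_s)


if __name__ == "__main__":
    # Size of one minute of audio at a few common bit-rates.
    for kbps in (32, 64, 128, 192, 320):
        size_mb = cbr_size_bytes(kbps, 60) / 1_000_000
        print(f"{kbps:>3} kbps: {size_mb:.2f} MB per minute")
```

At a fixed bit-rate the size depends only on the running time, so the bit-rate setting is the main lever for download size.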

If you have a version of ProTools that fits the System Requirements, you can download the mp3 codec immediately after you purchase it online - and you should be shooting mp3s straight out of ProTools in 15 minutes or less. Saving a step and skipping the middleman. Again, by avoiding some third-party freeware or shareware converter of questionable quality / accuracy - you at least theoretically avoid error and degradation from an extra pass through more codecs. It's been a couple years since I've even used my ProTools system, but I could choose Constant or Variable bit-rate, a bit-rate from at least 128 to 320 kbps, and mono/stereo. If you want to make a project of it, export a variety of formats and listen to them on a variety of playback systems and see whether you can hear the difference. (Have your wife listen too; chances are good her hearing sensitivity is better - or at least different than yours in the upper frequencies.)

If that doesn't suit you, I don't know what else to tell you but visit tucows (http://www.tucows.c…) and find whichever program has the best cost / cow ratio for your operating system.

Best of luck.

newmarket2 Fri, 10/17/2014 - 07:48

Thanks for taking the time to share! Here's some quick items to help the context:
First, we're too far along on this title to do anything more.
I used iTunes to do the WAV to mp3 conversion at 192 kbps per Audible's requirements.
For our own website download, I want the package much smaller.
Using our free, downloaded converter on one random track, I created output files of 1.9 MB, 700 KB and 600 KB and gave my wife a blind test... she thought the 700 KB file was muddy but couldn't distinguish between the 600 KB and the 1.9 MB. My conclusion is that while you might be able to tell the difference, she couldn't (in a quiet room) and neither will our customers standing on a noisy street in Rome.
Another consideration is that we do production of audio on 3 separate PCs and our ProTools version is Lite, which requires a box. [I think it's PT v 6]. It's impractical to move the box around so that I can do the conversion on my machine. I'm not rejecting the add-on, but I'd rather find a separate tool.
Finally, I want to go back to my original question: which conversion algorithm is best for spoken word?
I'm planning to do another blind comparison because I'm guessing there's no noticeable difference...
Remember, our clients are not maestros listening to music on their $20,000 stereo system - they are tourists visiting art sites in churches and on the street.
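For a blind comparison like the one described in this post, one way to keep the listener from guessing by position is to randomize the playback order and repeat each file a few times. The sketch below is only an illustration: the function names, the file names, and the 1-to-5 rating scale are all assumptions, not anything used in the thread.

```python
import random
from collections import defaultdict


def blind_order(files, repeats=3, seed=None):
    """Return a shuffled playback list with each file repeated `repeats` times."""
    rng = random.Random(seed)  # pass a seed only if you want a repeatable order
    order = [name for name in files for _ in range(repeats)]
    rng.shuffle(order)
    return order


def mean_ratings(order, ratings):
    """Average the listener's ratings per file, given the order actually played."""
    by_file = defaultdict(list)
    for name, rating in zip(order, ratings):
        by_file[name].append(rating)
    return {name: sum(vals) / len(vals) for name, vals in by_file.items()}


if __name__ == "__main__":
    # Hypothetical file names; the listener never sees this list.
    order = blind_order(["track_600k.mp3", "track_700k.mp3", "track_1900k.mp3"])
    print(order)
```

The person running the test plays the files in the printed order, writes down a rating for each, and only then passes both lists to `mean_ratings`.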

anonymous Sat, 10/18/2014 - 05:11

dvdhawk, post: 420184, member: 36047 wrote: mp3s give you a smattering of whatever is in the foreground and clamoring for the most attention. They use an algorithm to mask everything else that's not in the foreground and deemed unimportant, or unnoticeable (by most), to get the file size down.

Great explanation, Hawk. And, exactly why I don't like MP3's. There's a lot of cool nuances, textures, depth and colors that get lost in the translation between .wav's and the average MP3 format.

newmarket2, post: 420264, member: 48529 wrote: My conclusion is that while you might be able to tell the difference, she couldn't (in a quiet room) and neither will our customers standing on a noisy street in Rome

I understand that, and I also understand that in this day and age, they are a necessary evil; the general population of music listeners have their iPods and other MP3 playback devices, and they either can't hear the difference - or, they hear it but don't care, choosing quantity over quality. And, from a production point of view, it's just not feasible to be sending multiple hi-res .wav files to clients via email, as it takes too much time (some email programs won't even let you attach a file that large). The answer is, of course, an FTP transfer... but, that still takes time, too ... and, it's not (yet) an efficient way of sending multiple tracks for a project transfer - at least not without zipping them up.

Speeds and transfer methods are improving, but I don't think that MP3's are going away anytime soon. They're just too convenient... although most of us here on RO know that "convenience" rarely equals quality.

IMHO of course. ;)

d.

newmarket2 Sat, 10/18/2014 - 08:38

While I do appreciate your commitment to quality, frankly, it is getting in the way of your being able to help.
My very specific question has not yet been addressed. Yes, I know mp3 is inferior, but it is reality that some are better at distinguishing differences than others and some situations call for and benefit from "better".
So, given someone listening to mp3's using earbuds on a busy street - "quality" is something entirely different than sitting in your baffled music room listening to a very high quality symphonic recording.

I really need to know if one compression/conversion algorithm might produce an mp3 file that under these specific circumstances would make Jane's spoken words easier to understand. If you don't know, please just tell me so. And, if you know of a better resource, like a forum for spoken word vs broader sound engineering, please pass it along.

anonymous Sat, 10/18/2014 - 09:32

Frankly, no one is "getting in your way". This is a forum where comments are welcome, as long as they aren't personally offensive. You always have the right to skip over or disregard any posts you don't like, or feel are irrelevant. No one has the right to tell anyone to shut up around here - no matter how politely you ask.

The thread is open to other members, along with anyone doing a search engine query, who may feel that some of the info you determined as "getting in your way" may indeed be of importance to them.
When you post a thread on a public forum such as this, you are allowing others to view the thread, and, to ask further questions, or comment if they wish.

To address your SPECIFIC QUESTION, I know of no specific MP3 algorithm that will produce a more intelligible recording than what the source material holds. If you choose a hi-resolution MP3 format - 320 kbps at 48 kHz - you will get the best MP3 that is possible. It won't make it sound any better than the original 44.1/48 kHz, 24/32-bit .wav file. But it will give you the highest quality that is possible in terms of MP3's.
And, because it's MP3, there will be losses, but as you say, under normal listening circumstances, these losses will likely not be audible to your "average" listener.

This is a forum made up of audio professionals, and our idea of quality is probably not the same as yours. Years of studio training have given the engineers here finely tuned ears. You can't ask us to not hear that which we do.

Accordingly, no one here knows what your own specific idea of "quality" is.

I suggest that you take a section of her original vocal narrative, in its original .wav format, and export it as - or convert it to - several different MP3 settings. Then listen for yourself, through ear buds, and decide which one you feel delivers the best quality at the lowest bit-rate. 320 may be overkill. You might find that 128 will suffice. Perhaps it will. This is something only you can determine for yourself.
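As a rough sketch of that experiment, the snippet below builds one command line per bit-rate for the LAME command-line encoder mentioned earlier in the thread. It assumes a `lame` executable is installed and on the PATH, that mono output is acceptable, and the list of bit-rates and the output naming scheme are arbitrary choices for illustration.

```python
from pathlib import Path


def lame_commands(wav_path, bitrates_kbps=(32, 48, 64, 96, 128)):
    """Build one LAME command per bit-rate for the same source WAV file."""
    src = Path(wav_path)
    commands = []
    for kbps in bitrates_kbps:
        # Output sits next to the source, e.g. clip_64k_mono.mp3 (naming is an assumption).
        out = src.with_name(f"{src.stem}_{kbps}k_mono.mp3")
        # -m m encodes as mono; -b sets the constant bit-rate in kbps.
        commands.append(["lame", "-m", "m", "-b", str(kbps), str(src), str(out)])
    return commands


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing is encoded by accident.
    for cmd in lame_commands("narration_sample.wav"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right, and the resulting files can then be compared by ear through ear buds.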

d.
