The Dynamics of Streaming Audio
by Robert Auld
This is an adaptation of an article that originally appeared in the September, 2001 issue of Recording Magazine. To hear some MP3 files demonstrating how compression of the dynamic range affects web audio encoding, go to the examples below.
The conventional wisdom is that you should heavily compress the dynamics of your program material. Does that make sense?
Recently I was reading an article in a respected magazine regarding webcasting -- streaming sound on the internet -- when the following passage jumped out at me:
"Because there are still plenty of 28.8 and 56k modems out there, compression of audio files is a necessary evil. All of the encoding algorithms in use today achieve the high compression rates necessary...by concentrating on the loud sounds and ignoring the soft ones. Therefore, to get the best results, you want a signal with almost no dynamic range."
Now, I've done a lot of encoding of classical symphonic music, which has a very wide dynamic range, using Windows Media Audio, Real Audio, and various MP3 codecs. My experience with that program material has been that the soft passages tended to sound better than the loud ones, especially with Windows Media.
So I decided to do some tests aimed directly at this issue: I would take a good quality music file and make compressed and uncompressed versions. I would then encode them for streaming and see which version sounded better.
My test file was "Picture", a composition featuring Ara Dinkjian on the ud (a kind of Armenian lute), accompanied by percussion, bass, and electric piano. I chose this music because it was beautifully recorded by engineer David Baker and preserved the microdynamics of the instruments exceptionally well. (Microdynamics are those constant little note-to-note variations in level and intensity that give acoustic instruments their liveliness and expression. It is the subtle use of microdynamics that separates really good performers from mediocre ones. And, when you squash a musical track with a compressor, microdynamics are among the first things to be compromised.)
I created four versions of "Picture": test file 1 was a straight dub from the CD with about 6 dB of headroom above the loudest peaks. Test file 2 was normalized to -1 dB for the loudest peaks, with no other changes. Figure 1 shows a segment of the waveform of this file displayed in Sound Forge 5.0. There is some soft percussion in the right channel for the first few seconds, then the ud and the other instruments enter. Most of the louder peaks are from the ud, which stops playing about one minute into the piece.
Figure 1: Audio Before Compression
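To make the normalization step concrete, here is a minimal sketch of peak normalization, assuming floating-point samples where 1.0 represents digital full scale. The function names and the sample values are hypothetical; this is not how Sound Forge implements it, only the arithmetic behind "normalized to -1 dB".

```python
import math

def peak_dbfs(samples):
    """Return the loudest peak in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return float("-inf")  # pure silence has no defined peak level
    return 20.0 * math.log10(peak)

def normalize_to_peak(samples, target_dbfs=-1.0):
    """Apply one constant gain so the loudest peak lands on target_dbfs."""
    current = peak_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # nothing to scale
    gain = 10.0 ** ((target_dbfs - current) / 20.0)
    return [s * gain for s in samples]

# Hypothetical excerpt whose loudest peak sits about 6 dB below full scale.
excerpt = [0.0, 0.25, -0.5, 0.1, -0.05]
louder = normalize_to_peak(excerpt, target_dbfs=-1.0)

print(round(peak_dbfs(excerpt), 2))  # level before normalization
print(round(peak_dbfs(louder), 2))   # level after normalization
```

Because every sample is multiplied by the same gain, the ratios between soft and loud notes are untouched, which is why a normalized file keeps its microdynamics.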
Test files 3 and 4 were processed with Sonic Foundry's Wave Hammer plug-in, a process similar to the Waves L1 maximizer. Test file 3 was moderately compressed, test file 4 was heavily compressed, and both files were limited to peaks of -1 dB of full scale. (I wanted to be sure to avoid any digital clipping issues.)
Figure 2 shows the waveform of test file 4. The peak energy of the initial percussion part is now almost equal to the ensemble section, and the dynamic range during much of the ud solo is no more than about 6 dB. This track certainly sounds compressed, but is still tolerable to listen to. There are many pop CDs out there that are worse than this.
Figure 2: Audio With Heavy Compression
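As a rough illustration of how a compressor shrinks dynamic range, the sketch below applies a textbook static compression curve to levels expressed in dB. It is not a model of Wave Hammer, and the threshold, ratio, and example levels are assumptions chosen only to show the arithmetic.

```python
def compress_level_db(level_db, threshold_db=-30.0, ratio=4.0, makeup_db=0.0):
    """Static compression curve: above the threshold, every `ratio` dB of
    input level produces only 1 dB of output level."""
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db

# Hypothetical soft and loud passages, 20 dB apart before processing.
soft_in, loud_in = -25.0, -5.0
soft_out = compress_level_db(soft_in)
loud_out = compress_level_db(loud_in)

spread_before = loud_in - soft_in
spread_after = loud_out - soft_out
print(spread_before, spread_after)
```

With both passages above the threshold, a 4:1 ratio turns the 20 dB spread into 5 dB. A real compressor also has attack and release times, which this sketch leaves out.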
I encoded all these files with the Windows Media Audio version 7 codec included with Sound Forge 5.0, and with the Real Producer RA8 codec. The Windows Media files were encoded at 64 kbps (kilobits per second), typical of what might be done for a broadband stream; the Real Audio files were encoded at 32 kbps, suitable for dial-up modem streaming.
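To put those streaming rates in perspective, the following sketch compares them with the data rate of uncompressed CD audio (44.1 kHz, 16 bits, two channels). It assumes kbps means 1000 bits per second; the function names are hypothetical.

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_BITS_PER_SECOND = 44_100 * 16 * 2  # 1,411,200 bits per second

def bytes_per_second(kbps):
    """Convert a rate in kilobits per second to bytes per second."""
    return kbps * 1000 / 8

def reduction_vs_cd(kbps):
    """How many times smaller the stream is than uncompressed CD audio."""
    return CD_BITS_PER_SECOND / (kbps * 1000)

for rate in (64, 32):
    print(rate, "kbps:", bytes_per_second(rate), "bytes/s,",
          round(reduction_vs_cd(rate), 2), "to 1 versus CD")
```

At 64 kbps the codec must discard roughly 21 of every 22 bits of the CD signal, and at 32 kbps roughly 43 of every 44, which is why artifacts are expected at both rates.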
There were coding artifacts audible with all the test files put through Windows Media. In particular, transient attacks of the ud produced metallic echoes and there was some "swirl" audible on the decay of notes. Test file 2 (normalized but uncompressed) sounded similar to test file 1. Test file 3 sounded about the same as 1 and 2, allowing for the moderate compression. Test file 4 sounded hideous. The combination of coding artifacts and heavy compression was distinctly ugly.
Microsoft claims that Windows Media Audio 7 provides "near CD quality" at the 64 kbps streaming rate. While this is obviously not true, at least the microdynamics of the uncompressed test files were preserved pretty well. Removing those dynamics just made everything worse.
Real Audio does not sound very good at a 32 kbps stereo streaming rate, but I don't know of any codec that does; the data compression is just too severe. Given that, the RA8 codec is pretty well behaved: there is some loss of top end response, and what highs remain are softened and homogenized--transient detail tends to become a blurred hiss.
As with Windows Media, compressing the program material simply added one ugliness on top of another. The uncompressed files at least had a certain openness and life; the heavily compressed file was just plain bad. Believe me, you do not want your web stream to sound like that.
What were they thinking?
With evidence like this, why are various journalists and engineers claiming that you need to heavily compress audio files intended for web streaming? I think it is partly a matter of confusing the requirements of radio broadcasting with the requirements of webcasting.
Radio stations heavily compress their signals because people are listening in cars, because it helps to ensure greater coverage by their broadcast signal (and therefore, more listeners), and because the next station over on the dial is doing it too, so they feel they must compete in loudness or lose out to the competition. (I question that last rationale, but there is no doubt that many broadcasters believe it.)
It is true that many radio stations are putting their signals on the web, but there the resemblance ends. People do not listen to webcasts in cars. Instead, they listen through a computer, either in the office or at home (sometimes the same place...), usually using near-field loudspeakers or headphones. It is difficult to "channel surf" on the web the way you can with cable TV or broadcast radio because the players used for playback take several seconds to buffer the program stream on your computer. This, combined with the vast range of program choices available on the web, makes the relative average loudness of various sites much less of an issue. It's a good bet that the typical listener is at site Y because they like the content, not because it's louder than site X.
Finally, the idea that the encoding algorithms used today concentrate on the loud sounds while ignoring the soft ones is a gross misrepresentation of how an audio compression codec actually works. It is true that a loud signal will mask other softer signals which may then be discarded. However, making the input signal softer also allows the signal to be encoded with fewer bits, because the codec interprets the dynamic range of the original signal as smaller.
Indeed, at really low bit rates codec designers will often reduce the level of the entire signal as a way of reducing their encoding bit requirements. Pumping up the loudness of the input signal can actually work against this, forcing the codec to allocate scarce bits in a less desirable way.
Just say no
Windows Media Audio, Real Audio and MPEG-1 Layer 3 are all encoding schemes that output a 16-bit signal for playback. At their best (at higher bit rates) they can deliver a dynamic range equivalent to the compact disc. At lower bit rates this dynamic range is compromised somewhat, but is still far better than what you can expect from broadcast radio.
Getting the best out of any of these encoders means starting with the best signal available and then, if necessary, doing some careful modification of dynamics and equalization based on what you actually hear coming out the other end of the codec. The idea that preparing material for webcasting means degrading it with heavy compression is just plain wrong, so do your listeners a favor: don't do it.
Copyright 2001, Robert Auld
Examples of MP3 files processed with varying degrees of compression before encoding
So that you may judge for yourself the effects of compression applied to audio files intended for web encoding, here are four versions of the same file compressed to varying degrees and then encoded in the MP3 format.
The composition is Hot Sand, written by Thiago de Mello and Claudio Roditi. The original recording used no compression or limiting of any kind. The "as is" file reproduces it exactly, encoded in the MP3 format at 80 kbps (kilobits per second). (To hear this file encoded at a higher bit rate for better quality, go to the AuldWorks Audio Page.)
The "normalized" file differs from the "as is" file only in the overall level, which is 2.6 dB hotter.
The compressed files were processed with Sonic Foundry's Wave Hammer plug-in, which allows precise control of limiting and compression. I would characterize the "moderately compressed" file as being typical of good CD mastering practice for pop music. The "heavily compressed" file is an attempt to make the music as loud as possible without trashing the sound. (Prior to encoding it sounded quite acceptable, if somewhat compressed.)
The peak and average levels refer to measurements made before encoding. The encoding process may have changed those values slightly. Each file is about 449 kB in size after encoding.
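For readers who want to take similar measurements on their own files, here is a minimal sketch that reports a peak level and an average level in dB relative to full scale. It assumes floating-point samples with 1.0 as full scale and uses RMS as the "average"; the measurement method behind the figures quoted for these files is not stated, so treat RMS as an assumption.

```python
import math

def to_db(value):
    """Convert a linear amplitude to dB relative to full scale (1.0)."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")

def peak_and_average_dbfs(samples):
    """Return (peak level, RMS level), both in dBFS."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_db(peak), to_db(rms)

# Test signal: ten full cycles of a full-scale sine wave.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
peak_db, average_db = peak_and_average_dbfs(sine)

# The gap between peak and average is often called the crest factor.
crest_db = peak_db - average_db
print(round(peak_db, 2), round(average_db, 2), round(crest_db, 2))
```

A full-scale sine wave has a crest factor of about 3 dB. Uncompressed acoustic music typically shows a much larger gap between peak and average, and heavy compression narrows it.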
I recommend saving all four files to your hard drive (right click for Windows users) so that you may easily compare them with each other. It is also important that you adjust your playback levels so that each file sounds about equally loud. With big differences in playback level, judging relative sound quality becomes difficult.
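One simple way to get close to equal loudness is to turn each file down by the amount its average level exceeds that of the quietest file. The sketch below computes those playback trims. Only the 2.6 dB difference between the "as is" and "normalized" files comes from the description above; every other level is a hypothetical placeholder, and matching average levels only approximates perceived loudness, so a final adjustment by ear is still worthwhile.

```python
def playback_trims_db(average_levels_db):
    """Return the gain change (in dB) that brings every file down to the
    average level of the quietest file."""
    reference = min(average_levels_db.values())
    return {name: reference - level
            for name, level in average_levels_db.items()}

# Placeholder average levels in dBFS (not the measured values).
levels = {
    "as is": -24.0,
    "normalized": -21.4,             # 2.6 dB hotter than "as is"
    "moderately compressed": -17.0,  # hypothetical
    "heavily compressed": -12.0,     # hypothetical
}

trims = playback_trims_db(levels)
for name, trim in trims.items():
    print(f"{name}: {trim:+.1f} dB")
```

Trimming the louder files down, rather than boosting the quiet one, avoids any risk of clipping during playback.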
In my experience, encoding in the Real Audio format or the Windows Media Audio format gives the same kind of results as heard with these MP3 files.
Copyright 2002, Robert Auld