Before we unravel the digital audio story, it helps to define exactly what digital audio is. In this edition of Primed we're looking at the most common method for digitizing audio -- pulse-code modulation (PCM). While this isn't the only way (Sony, for example, developed "Direct Stream Digital" / DSD for the Super Audio CD format), it's by far the most prevalent. Likewise, it seems logical to also define what digital audio isn't -- and that would be sound converted into, and remaining, electrical signals that vary over time, also known as garden-variety "analog audio."
Pulse-code modulation is simply a way of representing an analog electrical signal in digital form. The signal is "sampled" at regular intervals, and each sample is in turn converted to a digital value. This is not to be confused with musical sampling, where snippets are recorded for composition and manipulation. In this case, it refers to taking a series of tiny snapshots of the analog audio, each converted into a digital number. If the signal falls somewhere between two available values, it's rounded up or down to the nearest one (a process known as quantization). The result is that the erratic, undulating form of the analog audio is broken down into a series of discrete samples, one after the other. In many ways this is analogous to a flip-book, where a collection of stills, when played (or "flipped") in quick succession, creates a fluid moving image. We'll go into these component parts, and beyond, over the course of the article, but for now, this simple description will help us understand the basic concept.
While it's always hard to pin down the exact origins of an idea -- especially one like pulse-code modulation, where a number of parties were involved -- sifting out the large and significant contributions from the gold pan of history tends to bring up the same names time and again. When it comes to digital audio, one such large-nugget name is Harry Nyquist. Working for Bell Laboratories, Nyquist (among other things) set out theories regarding telegraph speed and transmission that would pave the way for future digital audio developments. His work in this area would outline many of the fundamental principles required for digital communications, especially in regard to required sample rates and bandwidth limitations, and build upon previous ideas, like Time Division Multiplexing and early facsimile machines.
Some years later, Claude Shannon (also of Bell Labs) would consolidate many of the prevailing ideas on the subject in his 1948 paper "A Mathematical Theory of Communication," which referenced Nyquist's work significantly. Shannon would also contribute to the further development of pulse-code modulation (along with Bell alumni Bernard M. Oliver and John R. Pierce) at around the same time, using technology that eliminated some of the issues (such as complex circuitry) from the earlier, but conceptually similar, implementation by Briton Alec Reeves in 1938.
With the cornerstone ideas in place, digital audio would start to slowly move from the realm of telecoms out into the broader academic and commercial fields. In terms of digital recording, according to the Association for Recorded Sound Collections, Nippon Columbia of Japan (also known as Denon) would pioneer its use in the late '60s and early '70s, releasing the first commercial digital recording: the LP "Something" by Steve Marcus. Another push in the right direction came from American Thomas Stockham, who championed 16-bit recording in the US. It wasn't until the start of the '80s, when Philips (after working with Sony) demonstrated what would become the standard Compact Disc -- finally going into production in 1982 -- that digital would really turn a corner and gain traction. More formats would follow, including DAT, MiniDisc and DCC, but no physical medium would ever truly topple the ubiquitous CD. Of course, we all know the next chapter. MP3 (and other) file formats would combine with increasingly affordable home computers to gently, but definitively, chip away at the CD's monopoly, bringing us to the present day. CDs are still around, of course, but as a diminishing stock of record shops around the globe will attest, the MP3 is the new king.
Now that we know more about the background, it's time to get out the scalpel and start dissecting. What makes it tick? What are the key factors in terms of quality? What do all the different numbers mean? Well, let's jump right in and find out.
If you remember, earlier on we compared PCM to a flip-book. Well, in this analogy, sample rate would be the number of pages per second. A one-page-per-second flip-book won't show you much detail, whereas one with 1,000 pages per second will bring it to you with much more fluid fidelity. This, however, is a somewhat simplistic metaphor. The sample rate also determines the top audio frequency that can be reproduced. That's to say, the higher the sample rate, the higher the frequency range that can be recorded. The standard sample rate for commercial CDs is 44,100 times per second, or 44.1KHz, thanks in no small part to the bandwidth limitations of NTSC VCRs, which were used as early digital tape recorders. The highest frequency that can theoretically be recorded is actually half of the sample rate, so in the case of the CD, that's 22,050Hz (or about 22KHz). As human hearing typically spans -- at best -- 12Hz to 20KHz (give or take), this arguably makes 44.1KHz ample for general consumption (though some like to debate the merits of higher sample rates). By contrast, if you were to lower the sample rate to, for example, 10KHz, the maximum frequency you could capture would be 5KHz, which would sound notably "duller," with a lot of the top-end / higher-pitched sounds removed.
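To make the half-the-sample-rate relationship concrete, here's a minimal Python sketch (the function name and example frequencies are ours, purely for illustration). It computes the Nyquist limit for a given sample rate, and demonstrates aliasing: a tone above that limit produces exactly the same samples as a lower-frequency "alias."

```python
import math

def nyquist_limit(sample_rate_hz):
    """Highest audio frequency (Hz) a given sample rate can capture."""
    return sample_rate_hz / 2

# CD audio: 44,100 samples per second captures frequencies up to 22,050Hz.
assert nyquist_limit(44100) == 22050.0

# Aliasing: a 7,000Hz cosine sampled at only 10,000Hz yields the exact
# same sample values as a 3,000Hz cosine (10,000 - 7,000 = 3,000).
fs = 10000
for n in range(1000):
    above_limit = math.cos(2 * math.pi * 7000 * n / fs)
    alias = math.cos(2 * math.pi * 3000 * n / fs)
    assert abs(above_limit - alias) < 1e-9
```

This is why out-of-range frequencies don't simply vanish -- they masquerade as audible lower ones, and why converters filter out everything above half the sample rate before sampling.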
Bits, digital data has them, and their application in relation to audio is similar to that of imaging. Remember those crunchy-looking images from the early internet? That's a direct result of low bit depth. If sample rate is the number of "slices" over time, bit depth could be considered the number of levels, or vertical increments, available to each sample. A higher bit depth means more distinct numerical values are available to represent the signal's voltage. With more values, the amount of quantization, or rounding up and down, is decreased, which in turn lowers the level of noise introduced. The CD standard has a bit depth of 16 bits, which offers 65,536 discrete values, while 24 bits brings this number up to well over 16 million.
A lower bit depth image
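Bit depth is easy to sketch in code, too. Below is an illustrative Python snippet (the quantize helper is our own, not from any audio library): it rounds an idealized sample in the range -1.0 to 1.0 to the nearest integer code a 16-bit file could store.

```python
def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits levels,
    returning the signed integer code that would be stored on disc."""
    max_code = 2 ** (bits - 1) - 1   # 32,767 for 16-bit audio
    min_code = -(2 ** (bits - 1))    # -32,768 for 16-bit audio
    return max(min_code, min(max_code, round(sample * max_code)))

# 16 bits gives 65,536 possible values; 24 bits gives 16,777,216.
assert 2 ** 16 == 65_536
assert 2 ** 24 == 16_777_216

# Rounding to the nearest available level introduces a tiny error
# (quantization noise) -- smaller steps mean quieter noise.
stored = quantize(0.3, 16)   # 0.3 * 32,767 = 9,830.1, rounds to 9,830
assert abs(stored / 32767 - 0.3) < 1 / 65536
```

Each extra bit doubles the number of levels, roughly halving the quantization error -- commonly summarized as about 6dB of extra dynamic range per bit.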
Not to be confused with bit depth, bit rate simply describes the total amount of data used every second. Again, using the CD standard, which has a bit depth of 16 and a sample rate of 44.1KHz, the bit rate is the result of multiplying these two numbers. So, 16 x 44,100 = 705,600 bits per second for a mono recording, or 1,411.2Kbps for stereo. Our friend, the MP3, can be encoded with a maximum bit rate of 320kbps (more on this later), and in this case the rate refers to compressed bits, and thus implies the compression ratio. Bit rate can also be used to work out the size of an audio file if you know how long it is. For example, a 320kbps MP3 uses 320,000 bits of data a second, which is 40KBps (note the capital B for bytes). A five-minute MP3 runs 300 seconds, which would mean 300 x 40KB, to give us a 12,000KB (aka 12MB) file. This applies when the bit rate is fixed, but you may also have heard of variable bit rate, or VBR, which -- as the name suggests -- varies the bit rate used, with the aim of producing smaller files with minimal impact on overall quality (by lowering the bit rate during simpler passages or silence, for example).
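The arithmetic above fits in a few lines of Python (the function names are illustrative, not from any standard library):

```python
def pcm_bit_rate(bit_depth, sample_rate_hz, channels):
    """Uncompressed PCM bit rate, in bits per second."""
    return bit_depth * sample_rate_hz * channels

def stream_size_bytes(bit_rate_bps, seconds):
    """Size of any fixed-bit-rate stream: bits per second times
    seconds, divided by 8 to get bytes."""
    return bit_rate_bps * seconds // 8

# CD audio: 16 x 44,100 = 705,600bps mono, or 1,411,200bps in stereo.
assert pcm_bit_rate(16, 44100, 1) == 705_600
assert pcm_bit_rate(16, 44100, 2) == 1_411_200

# A five-minute 320kbps MP3: 320,000bps is 40,000 bytes per second,
# so 300 seconds comes to 12,000,000 bytes -- the 12MB from above.
assert stream_size_bytes(320_000, 5 * 60) == 12_000_000
```

The same two functions cover any fixed-rate format; VBR files, by definition, need the actual per-second rates to be summed instead.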
Okay, so this Primed is about digital audio -- why, then, are we looking at converting it back to analog again? Well, simply put, you might score a song entirely with software, and export it in a digital format, but we're guessing you'll eventually want to listen to it. While technology is advancing, we humans are largely analog devices, and so to hear the actual sound, it needs to be converted back at some point, to make those speakers or headphones vibrate sweet, sweet musical air on to our ear drums -- and this is where the digital-to-analog converter (DAC) comes in. We've already covered what these actually do on a technical level above, but it's important to remember their existence. By converting those 16- or 24-bit samples back into voltage, we get sound, and the process goes both ways (i.e. analog-to-digital when recording from a microphone). Most computer sound cards will do both (A-to-D and D-to-A), and in the majority of cases you won't need to worry too much about it. At least now, when you're reading the back of the box for a swanky new audio interface, and it says A/D and D/A conversion at 24-bit / 192KHz, you should know what that means.
So you have your newfound understanding for how audio goes digital, but what about all those file formats? WAV, MP3, AAC, FLAC? Why so many? Good question, and one that's not easy to answer. What we can do, however, is look at some of the more popular ones and understand their individual advantages (and drawbacks).
WAV / Wave
The WAV or Wave file is one of the most common uncompressed formats. Originally a Windows format, it has since become almost universally supported, be it on Linux, OS X or any mobile platform and beyond. The main advantage, beyond versatility, is that it's a "lossless" format: the audio is typically stored as uncompressed PCM, very similar (though not identical) to CD audio. The downside, of course, is that file sizes tend to be large. Using the calculations mentioned above, a five-minute stereo recording at 16-bit, 44.1KHz would create a roughly 53MB file.
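Conveniently, Python's standard library includes a `wave` module, so you can inspect those format numbers in a real WAV header yourself. A minimal sketch: it writes one second of 16-bit, 44.1KHz mono silence to a file (the filename is arbitrary), then reads the details back.

```python
import wave

# Write one second of 16-bit, 44.1KHz mono silence as a WAV file.
with wave.open("silence.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 2 bytes = 16 bits per sample
    w.setframerate(44100)   # 44.1KHz sample rate
    w.writeframes(b"\x00\x00" * 44100)

# Read the header back: sample rate, bit depth and length are all there.
with wave.open("silence.wav", "rb") as w:
    assert w.getframerate() == 44100
    assert w.getsampwidth() * 8 == 16
    assert w.getnframes() == 44100
    # Raw audio payload: 44,100 frames x 2 bytes x 1 channel = 88,200 bytes.
    assert w.getnframes() * w.getsampwidth() * w.getnchannels() == 88200
```

Note how the size adds up exactly as the bit-rate math predicts -- 88,200 bytes of audio data for one mono second, plus a small header describing the format.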
Short for Audio Interchange File Format, AIFF is similar to WAV in that it is uncompressed and lossless. Whereas WAV started on Windows, AIFF found favor on the Macintosh platform, with its origins in the Interchange File Format from Electronic Arts. AIFF also stores samples in big-endian byte order, the native format of the Motorola and PowerPC processors in classic Macs, but ultimately lost out to WAV, thanks to the popularity of Windows. Again, like WAV, support is wide, and file sizes are large. There is also an AIFF-C / AIFC variant, which is "lossy" / compressed.
MP3s, you have them. We're pretty sure you're aware of what these are: a compressed, or "lossy," format developed by the Moving Picture Experts Group. Clever psychoacoustic algorithms maintain apparent fidelity while reducing file sizes, by discarding the parts of the signal deemed hardest for the ear to pick out. When encoded at a bit rate of 128Kbps, the resulting file will be approximately 1/11th the size of the uncompressed original. This ratio obviously changes when a different rate is employed. For example, the maximum bit rate of an MP3 is 320Kbps, which gives a much more favorable compression ratio, nearer to 4:1. Should you wish to know more about the origins of the ubiquitous specification, there are lots more details about the MPEG-1 and MPEG-2 Layer III standards online, which make excellent restroom reading, but are beyond the scope of this article. The positives of the MP3 format are well-known -- small file sizes with minimal impact on audible quality. As such, it has been adopted as the standard format for many media players and devices. The downside is that, no matter how good the final result is, you are losing some data along the way, which for purists and audiophiles isn't ideal. Many arguments have been had about the noticeable differences between a high-bit-rate MP3 and a WAV file which, to date, have never been truly settled, and likely never will be.
If you've ever used iTunes, then there's a good chance you've met the AAC. Apple's preferred compressed format is still the default encoding choice in its ubiquitous software. AAC was hoped to be the successor to MP3, as it offers equivalent quality at a lower bit rate (this can depend on the encoder, of course), but as is often the case, the general public had other ideas. While the format is widely supported across a variety of platforms, it never quite received the hardware, and user, adoption of the MP3, despite some clear technical improvements, such as support for higher sample rates (96KHz, compared to MP3's 48KHz maximum).
FLAC and beyond
So far, the benefits of each format have been almost directly proportional to its drawbacks -- i.e. a see-saw with audio quality on one end, and file size on the other. The Free Lossless Audio Codec tries to take both of these qualities (high fidelity and smaller file sizes) and squeeze them into the same pot. It does so with some success, with the official site for the standard claiming an average compression of 53 percent in its tests. As FLAC is open source, and therefore non-proprietary, it has gained wider support than some competing lossless formats, such as Apple Lossless and WavPack. This balance of abilities has earned FLAC a dedicated following, but file sizes are still larger than those offered by MP3 and AAC, which can make these skinnier formats more appealing to less demanding consumers.
There are, of course, many more digital formats, such as WMA, OGG (Vorbis), MP2 and so on. To exhaustively list them here would require a few more pages, or perhaps a Primed of its own. The main distinction, however, is whether they are lossless or not, and then whether compression is involved. While support for some is greater than others, widely available and popular software media players can usually support all of them natively, or if not, then at least by expansion with downloadable codec packs.
Knowing what makes up an audio file is only part of the story; there's also a swathe of other contributing factors that can have a bearing on the sound. First of all, your ears. Sadly, these aren't digital, and can decrease in reliability over time. Secondly, there's the audio itself. Digital will only reproduce what you feed into it in the first place. If the recording is bad, poorly mixed, or taken from an inferior source, the result will only be as good as the weakest link in the chain. So, theoretically, a 128Kbps MP3 could be better quality than a WAV file, if some of the above factors affect one and not the other. Another way of thinking of it: you might convert an MP3 to a WAV, but it won't gain anything in the process (other than extra file size). That said, it's an interesting test to use on that friend who swears they can tell the difference. Comparisons between formats are only really meaningful when both files come from the same source. It may seem obvious, but the temptation can be to fixate on the pure numbers and not on some of the equally vital external factors.
Go on to any audio forum, and it won't take long before you find a thread about which format is best, or whether you can hear the compression in MP3s, and so on. In fact, don't ever do that, unless you want to read pages and pages of heated debate between parties each utterly convinced of the other's incorrectness. Each format has its merits, and the most important factor (particularly where music is concerned) is that it sounds good to you, and that you enjoy it. However, knowing that low bit depths or sample rates can affect your end result will arm you with the tools to get the most out of your audio the next time you're encoding it.
[Image credit: IEEE History Center]