This post is meant to be a very high-level overview of digital audio processing standards and terminology, as well as some bottlenecks, misconceptions, and peculiarities. It is meant to serve as a checklist of topics to dive into further for someone who is just starting out with this hobby.

So please don’t judge me, veteran audiophiles 😓


Digital Audio

Digital audio is composed of 0s and 1s (bits). It is usually stored after encoding and compression with a codec, which can be lossy or lossless.

Lossy - MP3, AAC, Dolby Digital (AC-3), DTS (Digital Theater Systems)
Lossless - FLAC, ALAC, WAV (raw, uncompressed PCM)

Wireless (over Bluetooth) - aptX (Qualcomm), LDAC (Sony)

Quality Parameters

Bit Rate: the amount of data processed per unit of time in a digital audio file, usually measured in kilobits per second (kbps).

Sample Rate: the number of samples of audio captured per second, measured in Hertz (Hz).

Bit Depth: the number of bits used to represent each audio sample.

Channels: individual streams of audio information in a recording or playback system. Each channel typically corresponds to a separate speaker or output device.

Mono: 1 channel
Stereo: 2 channels; Left (L) and Right (R)
5.1 Surround Sound: 6 channels (left, center, right, left surround, right surround, subwoofer)
7.1 Surround Sound: 8 channels (same as 5.1 plus two additional rear channels)
Binaural: 2 channel audio recorded using two microphones placed at the ear positions of a dummy head or human. Used for ASMR and VR.

Dynamic Range: the difference between the quietest and loudest parts of an audio signal, measured in decibels (dB).

Signal-to-Noise Ratio (SNR): the level of the audio signal compared to the level of background noise, measured in decibels (dB).
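
Bit depth and dynamic range are tied together: as a rule of thumb, each extra bit of depth buys roughly 6 dB of theoretical dynamic range. A quick back-of-the-envelope sketch in Python:

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    # Each bit doubles the number of quantization levels,
    # and each doubling adds 20 * log10(2) ≈ 6.02 dB.
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    print(f"{bits}-bit PCM: ~{dynamic_range_db(bits):.0f} dB of dynamic range")
# 16-bit PCM: ~96 dB of dynamic range
# 24-bit PCM: ~144 dB of dynamic range
```

This is why 16-bit audio is usually quoted as having ~96 dB of dynamic range and 24-bit as ~144 dB.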

CBR vs VBR vs ABR: with VBR (variable bit rate), the bit rate varies across different time segments of the same song, spending more bits where the audio is complex; ABR (average bit rate) is a middle ground that lets the rate vary while targeting a fixed average; CBR (constant bit rate) forces the data rate to a fixed value throughout the song using bit stuffing and trimming, which makes it predictable for transmission over a network. Quality-wise, at the same file size: VBR > ABR > CBR.
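
A handy sanity check: a track's average bit rate is just its size in bits divided by its duration. For CBR this matches the nominal rate exactly; for VBR/ABR it gives the mean over the whole track. A small sketch, with made-up numbers:

```python
def avg_bitrate_kbps(file_size_bytes: int, duration_seconds: float) -> float:
    # bits = bytes * 8; rate = bits / seconds; kbps = rate / 1000
    return file_size_bytes * 8 / duration_seconds / 1000

# Hypothetical example: a 9.6 MB MP3 that is 4 minutes long
print(avg_bitrate_kbps(9_600_000, 240))  # 320.0 kbps
```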

The audio properties of a digital audio file can be written as follows:

Audio Codec: MP3 (MPEG-1 Audio Layer III)
Bit Depth: 16-bit
Sample Rate: 44.1 kHz (44,100 samples per second)
Channels: 2 (Stereo)
Bit Rate: 320 kbps (kilobits per second)
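
For uncompressed PCM (unlike the compressed MP3 above), these numbers multiply out directly: bit rate = sample rate × bit depth × channels. A quick sketch showing where the familiar 1411 kbps figure for CD-quality WAV comes from:

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    # Every second, each channel stores sample_rate samples of bit_depth bits.
    return sample_rate_hz * bit_depth * channels / 1000

# CD quality: 44.1 kHz, 16-bit, stereo
print(pcm_bitrate_kbps(44_100, 16, 2))  # 1411.2 kbps
```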

Named audio standards are often used instead of listing out all the specs:

CD Quality: 2 channels, 16-bit, 44.1 kHz
DVD Quality: 2 channels, 16-bit, 48 kHz
Studio Quality: 2 channels, 24-bit, 48 kHz

Dolby Atmos
DTS
Opus

Hi-Res Audio: anything higher than 16-bit, 44.1 kHz audio (aka HD Audio)
LDAC: up to 990 kbps at 32-bit / 96 kHz

Where do these numbers come from? See the next section.

Audio Specifications

Pulse Code Modulation (PCM) is the method used to digitally represent analog signals. Basically, measure (sample) the analog waveform at regular intervals and quantize each measurement into one of a fixed number of discrete levels (set by the bit depth); those levels become the bits of the digital audio output.
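
Here is a minimal PCM sketch using only Python's standard library: sample a 440 Hz sine wave at 44.1 kHz, quantize each sample to a 16-bit signed integer, and write the result as a mono WAV file (the output filename is made up):

```python
import math
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second (Hz)
BIT_DEPTH = 16        # bits per sample
FREQ_HZ = 440.0       # the tone we are encoding
DURATION_S = 1.0

max_amplitude = 2 ** (BIT_DEPTH - 1) - 1  # 32767 for 16-bit signed PCM

frames = bytearray()
for n in range(int(SAMPLE_RATE * DURATION_S)):
    t = n / SAMPLE_RATE                      # sampling: measure at discrete instants
    sample = math.sin(2 * math.pi * FREQ_HZ * t)
    quantized = int(sample * max_amplitude)  # quantization: snap to an integer level
    frames += struct.pack("<h", quantized)   # pack as little-endian 16-bit PCM

with wave.open("sine_440.wav", "wb") as f:   # "sine_440.wav" is a made-up filename
    f.setnchannels(1)               # mono
    f.setsampwidth(BIT_DEPTH // 8)  # bytes per sample
    f.setframerate(SAMPLE_RATE)
    f.writeframes(bytes(frames))
```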

Nyquist Theorem (Nyquist-Shannon Sampling Theorem): states that a continuous analog signal can be completely represented by its samples and fully reconstructed from those samples if it is sampled at a rate that is at least twice the highest frequency present in the signal.

In simple terms, for a faithful conversion from analog to digital, the sampling rate should be at least 2x the highest frequency present in the audio.

If the audio file is sampled at 44.1 kHz (CD Quality), the highest frequency it can represent is approx. 22 kHz, which is enough to cover the maximum human hearing frequency limit of 20 kHz!
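
And if a frequency above half the sample rate does sneak in, it doesn't just vanish: it folds back (aliases) into the representable band. A small numeric sketch of that folding:

```python
def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency actually observed when tone_hz is sampled at sample_rate_hz."""
    f = tone_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)  # fold around the Nyquist frequency fs/2

fs = 44_100
print(alias_frequency(20_000, fs))  # 20000.0 -> below fs/2, reproduced faithfully
print(alias_frequency(25_000, fs))  # 19100.0 -> aliases back into the audible range
```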

Analysis

Most good audio players show metadata about the audio file. Also, most audio streaming apps will have the audio quality settings listed.

A lot of the time, audio is re-encoded at a higher bit rate or sample rate, and the file properties will show those new, higher numbers, but the actual frequency content will never reach the maximum possible frequency for that sample rate, because information lost in an earlier lossy encode cannot be restored.

To identify such files we use an acoustic spectrum analyser - Spek.

It shows a frequency vs. time graph (a spectrogram), so we can clearly see the maximum frequency the track actually contains.

A lossless WAV file (1411 kbps):

Same song but in a lossy MP3 file (180 kbps):
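
If you want to roll your own Spek-style view, here is a rough sketch, assuming numpy and matplotlib are installed and a 16-bit PCM WAV named track.wav (a placeholder filename) is on disk:

```python
import wave
import numpy as np
import matplotlib.pyplot as plt

with wave.open("track.wav", "rb") as f:   # "track.wav" is a placeholder filename
    fs = f.getframerate()
    channels = f.getnchannels()
    raw = f.readframes(f.getnframes())

samples = np.frombuffer(raw, dtype=np.int16)  # assumes 16-bit PCM
if channels == 2:
    samples = samples[::2]  # stereo frames interleave L,R,L,R,... keep the left channel

plt.specgram(samples, Fs=fs, NFFT=2048, noverlap=1024)
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Lossy encodes show a hard cutoff well below fs/2")
plt.show()
```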

Hardware

Digital-to-Analog Converter (DAC): an electronic chip that converts digital audio data into analog signals. Wherever you see a 3.5mm audio jack, there is a DAC chip behind it converting bits into an audio signal.

USB Type-C outputs raw digital data (bits), and we plug an external DAC into it, which in turn outputs analog audio signals. It may look like an adapter, but it isn't one.

Be careful! A Type-C port can also output analog signals directly. When an external DAC is plugged into such a device, it detects the external DAC and bypasses any internal DAC behind the Type-C port (OnePlus devices do this). This is the reason why some devices only need a passive adapter while others require an external DAC.

Amplifier (AMP): an electronic chip that can modify analog signal values to adjust physical signal parameters. Most external portable DACs have a built-in pre-tuned AMP chip too.

Earphones / Headphones / IEMs / Speakers: the most important hardware, as they are the ones directly plugged into your ears. Maybe I will make a different post on them as it's a vast topic on its own - Driver, Impedance, etc.

Bottlenecks

There are bottlenecks at every step of the way until the music reaches your ears (which themselves are capped at roughly 20 kHz, btw).

All devices: check the onboard chip and DAC specs to see whether they are limiting the audio output.

Streaming Apps: most apps, like Spotify, don't even support anything more than 320 kbps VBR (and only with Spotify Premium on the Web Player).

Android: apps couldn't output high-quality audio since Android resampled all audio to 48 kHz. This has changed since Android 14, and Android devs can now output high-quality audio directly, but many popular streaming apps haven't implemented it yet (untested).

This Android limit can be circumvented using an app called USB Audio Player PRO.

Tuning

Parametric EQ

Harman Curve

Bit Perfect Audio