Ginger Audio - Immersive Audio: The Definitive Glossary

Struggling to make sense of immersive audio or just want to brush up on your terminology? Check out this definitive glossary on the subject.

Immersive audio and the various jargon surrounding it can be a complicated subject. Understanding the key differences between immersive audio, surround sound, binaural audio and the related lingo is vital if you want to build a strong understanding of the topic.

That’s why we’ve put together a definitive glossary on immersive audio, and all the terms you’re likely to come across when exploring the subject. You can pin this page, add it to your bookmarks or even print it out - you never know when you might need to double-check a definition.

3D Audio

3D audio refers to any audio experience or technology that enables someone to listen to audio in a three-dimensional environment, thus enhancing the sense of audio immersion. Immersive audio and spatial audio are both examples of 3D audio, and so 3D audio can be used as a general term to describe any technology or experience that gives the impression that sound is coming from different positions within a three-dimensional space.

Ambisonics

The term Ambisonics refers to a full-sphere surround sound format, in which sound can be positioned around the listener as well as on the vertical axis, both above and below them. Recordings are typically made using specialist ambisonics-capable microphones with a minimum of four microphone capsules. Unlike stereo or surround sound formats, ambisonics recordings aren’t channel-based. Instead, ambisonics recordings can be decoded and output to any speaker configuration.
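To illustrate why four signals are enough to describe a full sphere, here is a minimal Python sketch that encodes a single mono sample into first-order ambisonics. It assumes the traditional B-format convention (W, X, Y, Z with W scaled by 1/sqrt(2)); other conventions use different channel orders and scaling, and the function name is hypothetical.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: azimuth is measured anticlockwise from straight
    ahead, elevation upwards from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                 # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front/back component
    y = sample * math.sin(az) * math.cos(el)  # left/right component
    z = sample * math.sin(el)                 # up/down component
    return w, x, y, z

front = encode_first_order(1.0, azimuth_deg=0, elevation_deg=0)
above = encode_first_order(1.0, azimuth_deg=0, elevation_deg=90)
```

A source straight ahead ends up almost entirely in X, while a source directly overhead ends up in Z. A decoder can later combine these four signals into feeds for whatever speakers are available.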

Audio Immersion

Put simply, audio immersion describes how immersive and enveloping an audio experience is. For example, playing a mono audio recording out of a mobile phone is likely to provide low levels of audio immersion. A specialist 9.1.4 immersive audio configuration with high-fidelity monitor speakers will offer much higher levels of audio immersion.

Bass Management

Bass management refers to the process of using a crossover to split an audio signal at approximately 120Hz, redirecting the frequency content below the crossover from the main speaker channels to a subwoofer, where it is typically combined with the dedicated LFE channel. The level, timing and other characteristics of the subwoofer feed can then be independently adjusted before playback. Bass management can be achieved using a monitor controller such as Ginger Audio Sphere.

Binaural Audio

Binaural audio is used to describe the process of recording and reproducing audio in order to emulate the way humans perceive audio in the real world. This is achieved by taking binaural recordings with two specialist microphones, sometimes located within a dummy head. The use of a dummy head more accurately simulates the way audio interacts with the human head, ears and auditory system.

Channel-Based Audio

Traditional monophonic and stereo audio formats, as well as 5.1 and 7.1 surround sound formats, all use channel-based audio. This means that each sound source is assigned to a dedicated channel, so the playback configuration must match the predetermined audio format. If 7.1 surround sound audio is played back on a 5.1 surround sound system without being downmixed, the content of the two side channels will be absent.

Dolby Atmos

Dolby Laboratories’ Dolby Atmos is a widely used immersive audio format that adds a third dimension to surround sound formats by adding speakers along the vertical axis. The term Dolby Atmos refers to the mixing process and format, as opposed to the playback hardware. The playback hardware is more commonly described as Dolby Atmos-capable, for example, Apple AirPods or some models of soundbar.

Head-Tracking

Within the context of binaural audio, head-tracking refers to the technology which allows a playback device to dynamically alter the spatiality of an audio signal according to the orientation of the listener’s head. For example, if a sound is positioned directly in front of you, turning your head to the left shifts it towards the right headphone, and vice versa. This gives the perception that the sound stays fixed at a specific position, thus increasing the listener’s audio immersion.
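The idea can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular product works: it assumes only left/right head rotation (yaw), and it stands in for full binaural processing with a basic constant-power pan. The function names and the sign convention (positive angles to the left) are assumptions.

```python
import math

def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Angle of the source relative to the listener's nose, wrapped to [-180, 180).

    Positive angles are to the listener's left (assumed convention).
    """
    return (source_azimuth_deg - head_yaw_deg + 180) % 360 - 180

def pan_gains(rel_azimuth_deg):
    """Constant-power (left, right) gains for a source between -90 (right) and +90 (left)."""
    clamped = max(-90.0, min(90.0, rel_azimuth_deg))
    angle = math.radians((90.0 - clamped) / 2)  # 0 = hard left, 90 degrees = hard right
    return math.cos(angle), math.sin(angle)

# A source straight ahead, with the head turned 30 degrees to the left.
left_gain, right_gain = pan_gains(relative_azimuth(0, 30))
```

With the head turned to the left, the source now sits to the listener's right, so the right gain is larger than the left gain.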

Head-Related Transfer Function (HRTF)

HRTF refers to the mathematical function which represents how audio travels from an audio source to a given listener’s ears. HRTF data is acquired by scanning or measuring a listener’s ear, head and body shape, and is therefore individual to each listener. HRTF data is then used to simulate and replicate how someone perceives audio in a real-world environment.
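In practice, HRTF data is often applied in its time-domain form, a pair of head-related impulse responses (HRIRs), by convolving a mono signal with one impulse response per ear. The Python sketch below shows that step using made-up impulse responses; real HRIRs are measured and contain hundreds of samples, and the function names here are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binauralise(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for a mono source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Invented impulse responses for a source on the listener's left:
# the right ear hears the sound later and quieter than the left ear.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

left, right = binauralise([1.0, 0.5], hrir_left, hrir_right)
```

The delay and level difference between the two outputs are the kind of cues the auditory system uses to localise a sound.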

Immersive Audio

Widely considered to be the ultimate audio immersion experience, immersive audio relates to creating a three-dimensional sound environment. Unlike surround sound, immersive audio also allows the user to position audio along a vertical axis. Immersive audio is achieved through the use of several advanced audio recording and processing techniques, including ambisonic recordings, mixing sound objects and multichannel mixing.

Low-Frequency Effects (LFE)

LFE refers to the band-limited channel in a surround sound or immersive audio setup, designed to handle low-frequency information below approximately 120Hz. Low-frequency effects are generally assigned to a subwoofer, which is denoted by the second number in a surround sound or immersive audio setup configuration. For example, a 5.1 system has one LFE channel feeding a single subwoofer, while the .2 in a 7.2.4 system indicates two subwoofers.
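The dotted notation used for speaker configurations can be read mechanically. As a small illustration, this Python sketch splits a layout string into its parts; the function name and the dictionary keys are hypothetical, and it assumes the common "main.sub" or "main.sub.height" pattern.

```python
def parse_layout(layout):
    """Split a layout such as "5.1" or "7.2.4" into its speaker counts."""
    parts = [int(p) for p in layout.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected layout: {layout!r}")
    return {
        "main": parts[0],                               # ear-level speakers
        "subwoofers": parts[1],                         # the second number
        "height": parts[2] if len(parts) == 3 else 0,   # overhead speakers
    }

surround = parse_layout("5.1")
immersive = parse_layout("7.2.4")
```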

Multichannel

The term multichannel is used to describe audio formats which contain more than two channels, such as 5.1 or 7.1 surround sound configurations. Multichannel is often used as an umbrella term to describe all immersive audio formats. This is slightly contradictory, as immersive audio formats such as Dolby Atmos and Sony 360 Reality Audio are object-based rather than channel-based.

Renderer

Within the context of immersive audio, a renderer refers to software that works within or alongside your DAW, formatting your mix ready for playback on the appropriate playback system. Some DAWs have integrated Dolby Atmos and Apple Spatial Audio renderers. There are also external renderers such as Dolby Atmos’ external renderer, and Ginger Audio Sphere, which boasts inbuilt Apple Spatial Audio rendering.

Object-Based Audio

Unlike channel-based audio, object-based audio doesn’t use specific channels. Instead, three-dimensional metadata such as level, position, direction and size is assigned to each sound source. This means that object-based audio can be rendered dynamically, according to the playback speaker configuration. Object-based audio offers more control and spatial resolution than is possible with channel-based audio formats.
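The difference from channel-based audio is easiest to see in code. The Python sketch below is a deliberately simplified illustration, not how Dolby Atmos or any commercial renderer works: it assumes a flat ring of at least two speakers, uses only azimuth and level metadata, and pans each object between its two nearest speakers. All names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    name: str
    azimuth_deg: float  # 0 = front, positive = to the listener's left
    level: float = 1.0

def render_gains(obj, speakers):
    """Return a gain per speaker for one object.

    `speakers` maps a speaker name to its azimuth in degrees.
    """
    ring = sorted(speakers.items(), key=lambda item: item[1] % 360)
    target = obj.azimuth_deg % 360
    gains = {name: 0.0 for name in speakers}
    for i, (lower_name, lower_az) in enumerate(ring):
        upper_name, upper_az = ring[(i + 1) % len(ring)]
        span = (upper_az - lower_az) % 360 or 360
        offset = (target - lower_az) % 360
        if offset <= span:  # the object sits between these two speakers
            frac = offset / span
            gains[lower_name] = obj.level * math.cos(frac * math.pi / 2)
            gains[upper_name] = obj.level * math.sin(frac * math.pi / 2)
            break
    return gains

vocal = AudioObject("vocal", azimuth_deg=0)
stereo = render_gains(vocal, {"L": 30, "R": -30})
quad = render_gains(vocal, {"FL": 45, "FR": -45, "RL": 135, "RR": -135})
```

The same object, with the same metadata, produces a different set of speaker gains for each layout. That is the sense in which object-based audio is rendered according to the playback configuration.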

Spatial Audio

Frequently referred to as virtual surround sound, spatial audio relates to technology that gives the impression of listening to several speakers in a surround sound or immersive audio configuration. Headphones, TVs and soundbars make use of spatial audio processing in order to create a heightened sense of audio immersion.

Spatial Resolution

The term spatial resolution refers to how precisely a speaker configuration is able to convey spatiality to listeners. Generally, configurations with fewer speakers have a lower spatial resolution, while more speakers are likely to achieve a greater sense of spatial resolution. For example, a channel-based stereo playback system has a lower spatial resolution than an object-based 9.1.2 immersive audio system.

Speaker Crossover

When using surround sound or immersive audio speaker configurations, speaker crossovers are used to split an audio signal’s frequency information and send each part to a different speaker. For example, a speaker crossover may send information above 120Hz to traditional broadband speaker monitors, and all information below 120Hz to a subwoofer.
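As a rough illustration of the 120Hz example, the Python sketch below splits a signal into a low band and a high band. It assumes a very gentle one-pole low-pass filter for simplicity; real crossovers generally use steeper filters, and the function name and default values are hypothetical.

```python
import math

def split_at_crossover(samples, crossover_hz=120.0, sample_rate=48000):
    """Split samples into (low, high) bands around the crossover frequency.

    The high band is the input minus the low band, so the two bands
    always sum back to the original signal.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)  # one-pole low-pass
        low.append(state)             # would feed the subwoofer
        high.append(x - state)        # would feed the main speakers
    return low, high
```

Feeding in a 30Hz tone leaves most of the energy in the low band, while a 3kHz tone ends up almost entirely in the high band.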

Stereo Sound

The most commonly consumed audio format is stereo sound, which refers to audio consisting of two channels. Also known as stereophonic sound, stereo sound contains a left and right channel, and allows users to position audio along the horizontal axis. A pair of headphones or a minimum of two speakers is required to reproduce stereo sound accurately.

Studio Monitor Controller

A studio monitor controller is a device that allows for the independent configuration of speaker level, timing and other parameters. Unlike an audio interface, a studio monitor controller is concerned with outputting audio from a computer as opposed to capturing audio into a computer. Studio monitor controllers are available in both hardware and software forms, and each has its own strengths and weaknesses.

Subwoofer

A subwoofer is a type of speaker which is specifically designed and optimised to accurately reproduce low frequencies, usually below 120Hz. In surround sound and immersive audio configurations, the subwoofer is tasked with outputting low-frequency effects (LFE), while traditional broadband speakers output audio above 120Hz.

Surround Sound

Surround sound refers to a multichannel audio format that allows for the positioning of audio along two dimensions, depth and width, or front to back and left to right. Common surround sound configurations include 5.1 (front left, centre, front right, rear left, rear right and subwoofer), and 7.1 (5.1 with added left side and right side speakers). Unlike immersive audio configurations, surround sound speaker setups do not include speakers along the vertical axis.

About the author

Jake Gill is a journalist, content writer and music producer based in Bristol, UK. Having studied marketing as well as music production, he's gone on to write for some of the industry's leading software developers, instrument manufacturers and publications.