What is 3D Audio?

3D Audio is a generic name for the techniques that render audio as it is heard in reality, and in particular its spatial dimension: the capability to produce and perceive sounds in any direction and at any distance. At 3D Sound Labs we prefer to call this technology Virtual Reality Audio, or VR Audio.

3D Audio can be reproduced over many loudspeakers positioned all around the listener. A good example of this application is the Dolby Atmos system designed for movie theaters. For headphones, the technology used is called binaural synthesis. It consists of simulating, at the listener’s eardrums, the same sound field that a real audio scene would have produced.

Binaural synthesis, that is, the capability to produce 3D Audio over headphones, is fast becoming a key technology building block for content producers and device/application developers active in the fields of Virtual and Augmented Reality.

Why 3D Audio?

3D Audio is key to developers because it provides several important benefits over traditional multichannel audio (5.1, 7.1):

Immersion

This is the most obvious one.
Unlike vision, audio is a 360° sensory experience.
In order to fully immerse a user in the most realistic sound scene, one needs to be able to reproduce sound coming from all directions!

Realism

In many applications leveraging 3D audio, the level of realism of a virtual sound scene is absolutely key, as it makes or breaks the overall experience of virtual presence (this is particularly true in Virtual Reality). To achieve this “feels real” perception, the audio experience has to be consistent with the user’s visual and physical perception of the virtual world. It requires accurate reverberation models, low-latency processing of user movements, natural sound coloration, and precise 3D sound reproduction capabilities.

Localization

The informational nature of 3D audio is very useful in many use cases. It can be used to draw attention to something that is not visible, or to offload object localization from the visual system to the auditory system, which lowers cognitive load and shortens reaction time. Typically this is used by 360° film directors to draw the viewer’s attention in content where there is no “camera window”, by user interface developers for mission-critical applications (e.g. a fighter pilot’s radar alarm), and by game developers (most video game players know that being able to detect the direction of an enemy shot can be life-saving).

Intelligibility

Binaural reproduction of 3D audio content makes it possible to leverage the “cocktail party effect”, a phenomenon of the human auditory system that lets a listener focus auditory attention on one particular sound source while filtering out other sounds and surrounding noise. This is why you can understand somebody talking to you in a noisy environment with many people speaking, like a cocktail party! It is an extremely useful feature of 3D audio for dialog enhancement and for multi-user applications with a social component, such as teleconferencing.

3D Audio Engines

A 3D audio engine is a set of software tools that convincingly reproduces spatial audio content, whether it is recorded (player mode) or created in real time in interactive applications (VR audio synthesis mode).
3D Sound Labs has developed a 3D audio engine named “VR Audio Kit”. It is designed for professionals developing cutting-edge applications under tight CPU constraints, and for device/content creators working on the next generation of entertainment (VR/AR) who want to provide a best-in-class user experience across multiple distribution platforms.

Object vs Soundfield

Traditionally, 3D audio engines use an object-based sound representation, where the resulting 3D sound of a scene is the sum of the renderings of each individual sound source (or sound object) in the scene. It is a very precise way to synthesize 3D sound, but the complexity of the rendering grows quickly with the number of sources and the level of realism of the scene.
At 3D Sound Labs, we have also developed an engine based on a soundfield representation rather than an object-based representation. It uses the mathematical concept of spherical harmonics (aka Higher Order Ambisonics or HOA) and synthesizes a binaural signal directly from the HOA representation of the content.
This engine can natively process scene-based audio content, recorded or mixed on a Digital Audio Workstation, in any HOA format (e.g. 360° video). It can also process object-based content, as it includes an “Object to HOA” conversion stage which brings the benefits of the Ambisonics representation to the traditional object-based paradigm if needed.
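
To make the idea of an “Object to HOA” stage more concrete, here is a minimal sketch of encoding point sources into a first-order sound field. It is not the VR Audio Kit implementation: the function names are hypothetical, and it assumes ACN channel order (W, Y, Z, X), SN3D normalization, and azimuth measured counter-clockwise from the front (90° is to the left). Note that the output stays at four channels however many objects are mixed in.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample as a first-order frame [W, Y, Z, X] (ACN, SN3D)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)    # left-right component
    z = sample * math.sin(el)                   # up-down component
    x = sample * math.cos(az) * math.cos(el)    # front-back component
    return [w, y, z, x]

def mix_objects(objects):
    """Sum the encoded contribution of every object into one 4-channel frame."""
    frame = [0.0, 0.0, 0.0, 0.0]
    for sample, azimuth_deg, elevation_deg in objects:
        encoded = encode_first_order(sample, azimuth_deg, elevation_deg)
        frame = [acc + ch for acc, ch in zip(frame, encoded)]
    return frame

# Three hypothetical sources: in front, to the left, and directly above.
scene = [(1.0, 0.0, 0.0), (0.5, 90.0, 0.0), (0.25, 0.0, 90.0)]
frame = mix_objects(scene)
```

A real engine would process blocks of samples rather than single values, and would use higher orders for better spatial precision.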


Benefits of Higher Order Ambisonics based Engines

 

The mathematical concept of spherical harmonics is often used to represent three-dimensional signals in spherical coordinates (distance, elevation, azimuth). It makes rotations in this space easy to implement, and it describes the global signal as the sum of simple individual signals which, in the case of sound pressure, are plane waves coming from precise directions in space. This is the 3D spatial equivalent of the well-known 1D Fourier transform for audio signals, which represents any signal as a sum of individual tones (sines) of different frequencies. It is well suited to representing a 3D sound field in what the audio community calls the Higher Order Ambisonics representation.
The HOA representation is an “approximation” of the real signal, and the level of spatial precision depends on the order (that is, the number of coefficients) of the spherical harmonics decomposition. The bad news: low-order content such as B-Format (1st order) lacks spatial precision, and you can definitely hear it. But this caveat is largely compensated by the benefits of the hierarchically organised model of Higher Order Ambisonics (HOA):

Benefit #1: Scalability and Low CPU% capabilities

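Because the representation is hierarchical, an HOA sound field can be truncated to a lower order to trade spatial precision for CPU load, so the same content can scale from high-end to constrained platforms. And whereas the cost of object-based rendering grows with the number of sources, the cost of rendering an HOA sound field is set by its order, not by the number of sources in the scene.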
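
As a rough illustration of this scalability, the sketch below counts the channels of a full 3D sound field of order N, which is (N + 1)², and reduces a frame to a lower order by dropping its higher-order channels. The helper names are hypothetical, and the truncation assumes ACN channel ordering, where the channels of each order follow those of the orders below it.

```python
def hoa_channel_count(order):
    """Number of spherical harmonic channels for a full 3D field of this order."""
    return (order + 1) ** 2

def truncate_order(frame, target_order):
    """Keep only the channels up to target_order (assumes ACN ordering)."""
    keep = hoa_channel_count(target_order)
    if keep > len(frame):
        raise ValueError("frame has fewer channels than the target order needs")
    return frame[:keep]

# Channel counts for orders 1 to 5.
counts = {order: hoa_channel_count(order) for order in range(1, 6)}

# A hypothetical third-order frame (16 channels) reduced to first order (4 channels).
third_order_frame = [0.1 * i for i in range(16)]
first_order_frame = truncate_order(third_order_frame, 1)
```

The lower-order frame costs less to render but, as noted earlier, it also carries less spatial precision.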

Benefit #2: Head-tracking friendly

As noted above, rotations are easy to implement in the spherical harmonics domain. Compensating for the listener’s head movements therefore comes down to rotating the whole sound field once, whatever the number of sources it contains, which helps keep head-tracking latency low.
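
The following sketch shows what such a rotation can look like in the simplest case: a first-order frame and a head turn around the vertical axis (yaw) only. It assumes ACN channel order (W, Y, Z, X) and counter-clockwise positive angles; the function names are hypothetical, and pitch, roll, and higher orders need larger rotation matrices that are not shown here.

```python
import math

def rotate_yaw(frame, yaw_deg):
    """Rotate a first-order frame [W, Y, Z, X] counter-clockwise about the vertical axis."""
    w, y, z, x = frame
    t = math.radians(yaw_deg)
    x_rot = x * math.cos(t) - y * math.sin(t)
    y_rot = x * math.sin(t) + y * math.cos(t)
    # W (omnidirectional) and Z (vertical) are unaffected by a yaw rotation.
    return [w, y_rot, z, x_rot]

def compensate_head_yaw(frame, head_yaw_deg):
    """Counter-rotate the sound field so sources stay fixed in the world."""
    return rotate_yaw(frame, -head_yaw_deg)

# A unit source on the listener's left (azimuth 90 degrees): Y = 1, X = 0.
source_on_left = [1.0, 1.0, 0.0, 0.0]

# The listener turns the head 90 degrees to the left: the source is now in front.
after_turn = compensate_head_yaw(source_on_left, 90.0)
```

The same four multiplications apply whether the frame contains one source or a hundred, which is the point of rotating the sound field rather than each object.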

Benefit #3: Recorded 3D audio friendly

Scene-based audio content, whether recorded or mixed on a Digital Audio Workstation in an HOA format (e.g. the soundtrack of a 360° video), is already a sound field. An HOA-based engine can therefore process it natively, without first having to break it down into individual sound objects.

3D Sound Labs’ solution to 3D Audio

 

Because we believe the HOA paradigm is key to the future of 3D audio, we have developed a novel way to render HOA-based content over headphones that provides less sound coloration and uses much less CPU than traditional virtual-speaker-based implementations.

3D Sound Labs Hybrid 3D Audio Engine: the best of both worlds

VR Audio Kit, the 3D Sound Labs engine, combines an object-based and an HOA-based 3D audio engine, so that application developers can leverage the benefits of both worlds and create content that mixes the following:

• Native HOA content at any order and in any format (ACN/SN3D, AmbiX, FuMa, …)

• Object-based content with various levels of spatial precision, to optimize the CPU load, to target multiple platforms with the same content, and to deliberately introduce different spatial resolutions in the 3D content to help manage the end user’s attention in the narration.
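
The formats listed above differ mainly in channel order and normalization. As a small illustration, and not a description of how VR Audio Kit handles formats, the sketch below converts a first-order frame between FuMa (channel order W, X, Y, Z, with W attenuated by 1/√2) and AmbiX (ACN order W, Y, Z, X, with SN3D normalization). The function names are hypothetical, and higher orders need additional per-channel gains that are not shown.

```python
import math

def fuma_to_ambix_first_order(frame):
    """Convert a first-order FuMa frame [W, X, Y, Z] to AmbiX [W, Y, Z, X]."""
    w, x, y, z = frame
    # FuMa stores W at -3 dB; SN3D stores it at full scale.
    return [w * math.sqrt(2.0), y, z, x]

def ambix_to_fuma_first_order(frame):
    """Convert a first-order AmbiX frame [W, Y, Z, X] to FuMa [W, X, Y, Z]."""
    w, y, z, x = frame
    return [w / math.sqrt(2.0), x, y, z]

# A hypothetical FuMa frame, converted to AmbiX and back again.
fuma_frame = [0.5, 0.1, 0.2, 0.3]
ambix_frame = fuma_to_ambix_first_order(fuma_frame)
round_trip = ambix_to_fuma_first_order(ambix_frame)
```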
