SPATIAL AUDIO FORMATS & METHODS

Spatial audio formats and methods define how spatial information is described, carried, translated or reproduced.

They shape how a spatial idea travels from studio to site: whether it is fixed to a known playback setup, kept flexible until rendering, prepared for headphones, carried with metadata or translated into a larger loudspeaker environment. In a project, these choices affect production, rendering, delivery, playback and the way spatial intention survives between studio, system and site.

For KLING KLANG KLONG, formats and methods are not abstract technical categories. They are part of how a spatial score becomes audible: how sound can move, remain local, surround, guide, respond, scale across systems or belong to a specific room.

This Technical Note is the companion to Spatial Audio Systems & Renderers. That article explains the playback environment. This one explains the formats and methods that may feed into, describe or shape that environment.


IN BRIEF


A spatial audio format describes how spatial sound information is stored, exchanged or delivered. It may carry channel signals, object positions, scene-based information, metadata or binaural audio.

A spatial audio method describes how sound is organised or reproduced in space. Some methods are built around a known loudspeaker layout, some keep spatial decisions flexible until rendering, some are made primarily for headphones, and others depend on soundfield representation or loudspeaker-array reproduction.

These formats and methods only become meaningful inside a workflow. A format can describe spatial information, a renderer can translate it, and a playback system can make it audible in a specific listening condition.

The useful question is therefore not which format is most advanced, but which choice keeps the spatial intention clear from composition to playback.


Formats, Methods And Delivery Layers


The terms in this article do not all describe the same kind of thing. They sit on different levels of the spatial audio workflow: composition, representation, rendering, delivery and playback.

At this point, the important thing is not to understand every term in detail, but to see the difference between those levels. One approach may be a way of composing for fixed loudspeakers. Another may describe space before it is rendered. Another may support headphone listening. Others may deal with metadata delivery or loudspeaker-array reproduction.

In real projects, these levels often overlap. A spatial installation may combine a stable multichannel composition with flexible movement, headphone previews, spatial recordings, localised loudspeaker feeds and show-control data within one production process.


Why Formats And Methods Matter


Formats and methods matter because they shape the connection between spatial intention and technical delivery.

A flexible workflow can make it easier to adapt a project to different playback systems, while a more fixed approach can offer precision when the loudspeaker layout is known. A headphone experience, a room-based installation, a loudspeaker array and a multi-zone exhibition each ask for different forms of spatial translation. Some workflows preserve movement and metadata until rendering; others favour stability, predictability and long-term reliability.

In exhibitions, museums and pavilions, these choices affect more than production. They influence how sound can be updated, synchronised, monitored, archived, toured or maintained. They also influence how easily a spatial composition can move from studio preview to the real acoustic conditions of the site.

A format does not guarantee a spatial experience. A binaural file, an Ambisonics scene, an object-based mix or a multichannel playback system can all be spatially meaningful or spatially weak. The decisive question is how the method supports the dramaturgy, architecture and visitor movement of the project.


Channel-Based Multichannel Audio


Some spatial projects do not need a format that remains flexible until the last rendering stage. When the room, loudspeaker layout and visitor journey are known, it can be more useful to compose directly for the installed playback system.

Channel-based multichannel audio works in this way. Sound is assigned to defined output channels or loudspeaker feeds. Stereo, 5.1, 7.1 and many exhibition-specific multichannel setups belong to this broad family.

In spatial installations, this approach can offer direct control over level, timing, movement, transitions and acoustic spill. The composition is shaped for the system that will actually be heard, rather than translated later into an unknown layout. Its limitation appears when conditions change. If the loudspeaker layout, room or playback setup is altered, the spatial composition may need to be adapted or remixed.
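As a minimal sketch of what "assigned to defined output channels" can mean in practice, the following Python example sums named stems into fixed loudspeaker feeds. The stem names, the four-channel room and the `Route` structure are assumptions made for illustration, not a description of any particular playback system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One fixed assignment of a source signal to a loudspeaker feed."""
    source: str      # hypothetical stem name
    channel: int     # zero-based output channel index
    gain_db: float   # static trim for this feed


def db_to_linear(gain_db: float) -> float:
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_to_channels(stems, routes, num_channels):
    """Sum each stem into its assigned output channels, sample by sample."""
    length = max(len(samples) for samples in stems.values())
    outputs = [[0.0] * length for _ in range(num_channels)]
    for route in routes:
        gain = db_to_linear(route.gain_db)
        for i, sample in enumerate(stems[route.source]):
            outputs[route.channel][i] += gain * sample
    return outputs


# Hypothetical four-loudspeaker room: a voice fixed to channel 0,
# an atmosphere sent at lower level to channels 2 and 3.
stems = {"voice": [1.0, 0.5], "atmo": [0.2, 0.2]}
routes = [
    Route("voice", channel=0, gain_db=0.0),
    Route("atmo", channel=2, gain_db=-6.0),
    Route("atmo", channel=3, gain_db=-6.0),
]
feeds = mix_to_channels(stems, routes, num_channels=4)
```

Every spatial decision here lives in the routing table. That is what makes the result predictable, and also why a changed loudspeaker layout means the table has to be rewritten.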

This is why channel-based work remains important in exhibitions. Many long-running installations do not need a complex live format; they need a spatial composition that is tuned carefully to a known room and remains stable every day.


Ambisonics And Scene-Based Audio


When the final playback setup may change, it can be useful to describe spatial sound without tying it immediately to fixed loudspeakers.

Ambisonics does this by representing sound as a soundfield. Instead of saying “this sound belongs to this loudspeaker,” it carries directional information that can later be decoded for loudspeakers or rendered binaurally for headphones.

This makes Ambisonics useful for 360-degree video, virtual reality, soundfield recording, immersive media and workflows where the final playback setup may vary. First-order Ambisonics describes a basic spherical soundfield with four channels, while higher-order Ambisonics adds channels, (N+1)² for order N, and can offer greater spatial resolution when the production and playback chain support it.

Its strength is portability. The same material can travel through different production and playback contexts more easily than a mix fixed to one loudspeaker layout.
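To make the idea of "carrying directional information" concrete, here is a small sketch of first-order encoding in Python. It assumes the widely used AmbiX convention (ACN channel order, SN3D normalisation). The function name is hypothetical and the example handles a single sample only.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order Ambisonics (ACN order, SN3D).

    Azimuth is measured counter-clockwise from straight ahead (left = +90),
    elevation upwards from the horizontal plane.
    Returns the four channels in ACN order: (W, Y, Z, X).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                 # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)   # left-right axis
    z = sample * math.sin(el)                  # up-down axis
    x = sample * math.cos(az) * math.cos(el)   # front-back axis
    return (w, y, z, x)


front = encode_first_order(1.0, azimuth_deg=0.0)
left = encode_first_order(1.0, azimuth_deg=90.0)
```

Nothing in these four values refers to a loudspeaker. The direction is stored as a ratio between the components, which is why the same signal can later be decoded for different layouts or for headphones.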

The final experience, however, still depends on decoding, loudspeaker setup, binaural rendering and the actual listening condition. Ambisonics can carry spatial information flexibly, but it does not by itself make a room feel spatially meaningful.
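The dependence on decoding can be illustrated with a deliberately simple case: a basic first-order decode to a regular horizontal ring of loudspeakers. This is a sketch under stated assumptions (SN3D-normalised W, X and Y, evenly spaced loudspeakers, at least three of them), and the function name is hypothetical. Decoders used in practice often apply additional weighting and are designed for the actual layout.

```python
import math


def decode_horizontal_ring(w, x, y, num_speakers):
    """Basic first-order decode to a regular horizontal loudspeaker ring.

    Assumes SN3D-normalised W, X, Y and loudspeakers spaced evenly,
    starting straight ahead (0 degrees) and proceeding counter-clockwise.
    Returns one feed value per loudspeaker.
    """
    if num_speakers < 3:
        raise ValueError("a first-order ring decode needs at least 3 loudspeakers")
    feeds = []
    for i in range(num_speakers):
        phi = 2.0 * math.pi * i / num_speakers   # loudspeaker direction
        directional = x * math.cos(phi) + y * math.sin(phi)
        feeds.append((w + 2.0 * directional) / num_speakers)
    return feeds


# A unit source encoded straight ahead has W = 1, X = 1, Y = 0.
square = decode_horizontal_ring(1.0, 1.0, 0.0, num_speakers=4)
hexagon = decode_horizontal_ring(1.0, 1.0, 0.0, num_speakers=6)
```

The same three input values produce different loudspeaker feeds for the square and the hexagon, and in both cases the loudspeaker opposite the source receives an inverted signal. How such a result sounds depends on the room and the listening position, which is the point made above.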


Binaural Audio And HRTF-Based Rendering


Not every spatial experience is shared through loudspeakers in a room. Some are designed for headphones, individual listening, VR, audio walks, web experiences or production previews.

Binaural audio creates a spatial impression over headphones by reproducing the differences in timing, level and tonal colouration with which a sound reaches each ear. It can be produced through dummy-head recording, binaural synthesis or rendering from another spatial format.

HRTFs, or head-related transfer functions, describe how sound from different directions is filtered by the listener’s head, ears and body before reaching the ears. They are one reason binaural rendering can make sound appear around, above or behind the listener.
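One of the cues contained in an HRTF is the small time difference between the two ears. The sketch below estimates it with the classic Woodworth spherical-head approximation. The head radius and speed of sound are assumed typical values, and the function name is hypothetical. A measured HRTF also captures level and spectral differences that this formula ignores.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, air at roughly 20 degrees Celsius
HEAD_RADIUS = 0.0875     # m, a commonly used average value (assumption)


def interaural_time_difference(azimuth_deg):
    """Woodworth spherical-head estimate of the interaural time difference.

    azimuth_deg: source direction in the horizontal plane, 0 = straight
    ahead, valid from -90 to 90 degrees.
    Returns the signed delay in seconds between the two ears; the sign
    follows the sign of the azimuth.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("this approximation covers -90 to 90 degrees only")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


itd_front = interaural_time_difference(0.0)
itd_side = interaural_time_difference(90.0)
```

For a source directly to one side, the estimate comes out at roughly two thirds of a millisecond, which gives a sense of the scale on which binaural rendering operates.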

During production, binaural rendering can also support headphone-based previews of spatial sketches or early mixes, especially in VR/XR, real-time and simulation-based workflows. It remains a preview method, not a substitute for the final playback condition.

The main challenge is translation. A binaural mix is heard through headphones and does not behave like sound in a shared physical room. For exhibitions and installations, it should therefore be understood as one playback and preview method, not as a replacement for spatial planning in architecture.

The listening condition remains decisive: a person alone in headphones, a visitor moving through VR, someone walking through a route and a group sharing a room all require different spatial decisions.


Object-Based Audio


Some projects need sound to remain active as behaviour rather than become fixed too early to channels. A voice, sound effect, musical element or moving source may need to change position, respond to interaction or adapt to different playback systems.

Object-based audio keeps these elements as objects with metadata. Position, level, movement or behaviour can travel with the sound element, and a renderer translates that information into the available playback setup.
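A minimal sketch of this separation between object and renderer, in Python: the object carries only a direction and a gain, and a simple pairwise constant-power panner translates it into whichever horizontal loudspeaker ring is available. The class, function and layouts are hypothetical. Production renderers use more elaborate panning laws and handle height, distance and spread.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A sound element plus the metadata that travels with it."""
    name: str
    azimuth_deg: float   # direction, counter-clockwise from front
    gain: float = 1.0


def render_to_ring(obj, speaker_azimuths_deg):
    """Pan an object between the two loudspeakers that enclose its direction.

    speaker_azimuths_deg must list at least two distinct loudspeakers of a
    horizontal ring in ascending order within [0, 360). Returns one gain
    per loudspeaker, using a constant-power (sine/cosine) crossfade.
    """
    n = len(speaker_azimuths_deg)
    gains = [0.0] * n
    target = obj.azimuth_deg % 360.0
    for i in range(n):
        start = speaker_azimuths_deg[i]
        end = speaker_azimuths_deg[(i + 1) % n]
        span = (end - start) % 360.0        # arc between the two neighbours
        offset = (target - start) % 360.0   # object position along that arc
        if offset <= span:
            fraction = offset / span
            gains[i] = obj.gain * math.cos(fraction * math.pi / 2.0)
            gains[(i + 1) % n] = obj.gain * math.sin(fraction * math.pi / 2.0)
            break
    return gains


voice = AudioObject("voice", azimuth_deg=45.0)
quad = render_to_ring(voice, [0.0, 90.0, 180.0, 270.0])
octagon = render_to_ring(voice, [0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0])
```

The object itself is unchanged between the two calls. In the four-loudspeaker ring it is shared equally between two loudspeakers, while in the eight-loudspeaker ring it lands on a single one.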

This can be useful when a project needs movement, adaptation, interactivity or translation across different loudspeaker systems. Instead of committing every spatial decision to fixed channels, object-based workflows can keep some spatial information active until rendering.

The value lies in adaptability, but the result depends on the renderer, metadata workflow and playback system. Objects are not meaningful because they are technically movable; they become meaningful when their behaviour supports the spatial score.

In scenographic terms, object-based audio becomes interesting when sound needs to appear, move, respond, change proximity or connect different areas through a spatial logic.


ADM, BW64 And Delivery Metadata


In some workflows, audio files need to carry more than sound signals. They also need to carry information about what those signals are, how they are organised and how they should be interpreted by production systems or renderers.

The Audio Definition Model, or ADM, standardised as ITU-R BS.2076, is one way to describe audio content, including channel-based, object-based and scene-based material. In delivery workflows, ADM metadata can be embedded with audio in file formats such as BW64 (ITU-R BS.2088), so that spatial information can travel with the audio file.
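To give a rough impression of what such metadata looks like, the sketch below builds a heavily simplified, ADM-style XML description of one object using Python's standard library. The element names follow the ADM vocabulary, but the fragment is illustrative only: the identifiers are example values, the pack, stream and track elements of a complete description are omitted, and nothing here has been validated against the standard or written into a BW64 file.

```python
import xml.etree.ElementTree as ET


def describe_object(name, azimuth, elevation, distance):
    """Build a simplified, ADM-style description of one audio object.

    Illustrative only: not a complete or validated ADM document.
    """
    root = ET.Element("audioFormatExtended")

    # The object: what the element is called and which pack it refers to.
    obj = ET.SubElement(root, "audioObject",
                        audioObjectID="AO_1001", audioObjectName=name)
    ET.SubElement(obj, "audioPackFormatIDRef").text = "AP_00031001"

    # The channel format: how the element behaves over time.
    channel = ET.SubElement(root, "audioChannelFormat",
                            audioChannelFormatID="AC_00031001",
                            audioChannelFormatName=name,
                            typeDefinition="Objects")
    block = ET.SubElement(channel, "audioBlockFormat",
                          audioBlockFormatID="AB_00031001_00000001")
    for coordinate, value in (("azimuth", azimuth),
                              ("elevation", elevation),
                              ("distance", distance)):
        position = ET.SubElement(block, "position", coordinate=coordinate)
        position.text = f"{value:.1f}"
    return root


metadata = describe_object("voice", azimuth=30.0, elevation=0.0, distance=1.0)
xml_text = ET.tostring(metadata, encoding="unicode")
```

Even in this reduced form, the description can be read without the audio itself. That readability is what makes such metadata useful for handover and archiving.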

This layer is usually not the visible part of an exhibition or installation. It becomes relevant in archiving, handover, immersive media delivery, broadcast-related workflows or translation between production systems.

When long-term compatibility and documentation matter, metadata can become part of the project’s technical reliability. It helps ensure that spatial decisions are not only heard during production, but can also be understood, transferred and maintained later.


Wave Field Synthesis


Some spatial reproduction methods focus less on moving sound between loudspeakers and more on creating the impression of virtual sound sources in space.

Wave Field Synthesis uses loudspeaker arrays to support this kind of reproduction. Instead of relying only on phantom images between loudspeakers, it aims to reconstruct wavefronts so that sounds can appear to come from defined positions.
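The core intuition can be sketched with geometry alone: each loudspeaker in the array plays the same signal, delayed and attenuated according to its distance from the virtual source, so that the individual contributions add up to an approximately curved wavefront. The Python example below computes those delays and gains for a straight array. It is a simplification under stated assumptions (a virtual point source behind the array, a plain distance weighting, hypothetical function name) and leaves out the filtering and edge tapering that a real Wave Field Synthesis system applies.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s


def wfs_delays_and_gains(source_xy, speaker_xs, array_y=0.0):
    """Per-loudspeaker delay and gain for a virtual point source.

    source_xy: (x, y) of the virtual source in metres, placed behind a
    straight loudspeaker line that runs along y = array_y.
    speaker_xs: x positions of the loudspeakers in metres.
    Returns a list of (delay_seconds, gain) pairs, one per loudspeaker.
    """
    sx, sy = source_xy
    distances = [math.hypot(x - sx, array_y - sy) for x in speaker_xs]
    nearest = min(distances)
    result = []
    for d in distances:
        delay = (d - nearest) / SPEED_OF_SOUND   # relative to first arrival
        gain = math.sqrt(nearest / d)            # simplified distance weighting
        result.append((delay, gain))
    return result


# Five loudspeakers, half a metre apart, virtual source 2 m behind the centre.
array = wfs_delays_and_gains((0.0, -2.0), [-1.0, -0.5, 0.0, 0.5, 1.0])
```

The centre loudspeaker fires first and the outer ones follow within less than a millisecond. Timing at this scale is one reason why calibration and precise loudspeaker positions matter so much in such systems.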

This can be relevant for installations, research contexts and specialised spatial environments where a large loudspeaker array and precise system design are possible. It can support spatial experiences where localisation should remain stable across a broader listening area.

The practical complexity is significant. Wave Field Synthesis usually requires many loudspeakers, careful calibration, processing and architectural integration. Its usefulness therefore depends strongly on whether the spatial idea, room and production conditions justify that level of system design.


Hybrid Workflows


Many real projects do not use one pure format or method. The spatial idea may require different approaches at different moments of the production and playback chain.

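In practice, this can mean a stable multichannel composition for the room, object-based rendering for the elements that need to move, binaural renders for headphone previews, Ambisonics recordings as source material, localised loudspeaker feeds for individual exhibits and show-control data that ties playback together.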

This is especially common in exhibitions, museums and pavilions, where the final experience has to respond to architecture, visitor movement, neighbouring media and long-term operation.

Here, each method is chosen for its role in the experience. Some parts of a project may need flexibility. Others may need reliability. Some sounds may need to move through space, while others need to stay fixed, clear or local.

Hybrid workflows are not a compromise by default. They are often the most realistic way to keep the spatial idea intact across studio production, system integration and the actual listening condition.


Choosing The Right Format Or Method


The right format or method depends on what has to remain flexible and what has to remain fixed.

When the loudspeaker layout and room are known, channel-based or pre-rendered multichannel playback may offer the most direct control. When the project needs to adapt to different playback systems, Ambisonics or object-based workflows may be more useful. When the experience is individual or headphone-based, binaural rendering becomes central. When the goal is a specialised loudspeaker environment with stable virtual sources, Wave Field Synthesis may become relevant.

The decision also depends on production, delivery and maintenance. A format or method that works well in the studio may not be the best choice for a museum system that runs every day. A flexible format may require more careful rendering, documentation and testing. A simple playback format may be easier to maintain, but harder to adapt later.

The useful question is not which format or method is technically most advanced. It is which choice protects the spatial idea through production, integration, playback and long-term operation.


In Exhibitions, Museums And Pavilions


In exhibitions, museums and pavilions, spatial audio formats and methods have to work under open listening conditions. Visitors enter at different moments, move at different speeds and rarely listen from one fixed ideal position.

This changes the role of format choice. The method has to support the way sound belongs to the room: whether it should remain local to an object, move through a route, support speech, surround a projection, create atmosphere, guide attention or connect several spaces into one spatial score.

Long-term operation also matters. The chosen format or method has to be documented, serviceable and understandable for the people who will maintain the installation after opening. This affects whether a project should use fixed multichannel playback, object-based rendering, binaural delivery, hybrid workflows or another approach.


For KLING KLANG KLONG, these decisions are part of sound scenography. A method is chosen because it helps sound behave meaningfully in architecture, not because it carries a fashionable technical label.


How We Work

We begin by defining what the sound has to do in space. Does it need to appear from a specific location, surround, guide, move, respond, remain fixed, travel with the visitor or connect several rooms?

From there, we choose the format or method that can carry that spatial intention through production and playback. This may involve channel-based multichannel composition, Ambisonics, binaural rendering, object-based workflows, Wave Field Synthesis or a hybrid approach.

The workflow is then tested against the real conditions of the project: architecture, acoustics, loudspeaker layout, headphones, media playback, show control, interaction, maintenance and documentation.

The final method is the one that keeps the spatial idea audible, stable and adaptable where it needs to be — from studio to site.


RELATED PRACTICES


Spatial audio formats and methods connect technical workflow with spatial composition, sound scenography and the design of listening experiences.


Spatial Audio Systems & Renderers


Spatial Audio Production Workflows


Spatial Audio


Sound Scenography


Spatial Score


Exhibition Sound Design


Immersive Sound


Interactive Sound


Generative Sound


FAQ


What is a spatial audio format?

A spatial audio format describes how spatial sound information is stored, exchanged or delivered. It may include channels, objects, scene-based information, metadata or binaural audio.


What is the difference between a spatial audio format and a spatial audio method?

A format describes how spatial sound information is stored, exchanged or delivered. A method describes how sound is organised, rendered or reproduced in space. In practice, projects often combine several formats and methods in one workflow.


Is Ambisonics the same as spatial audio?

No. Ambisonics is one way of representing spatial audio. Spatial audio is the broader practice of designing sound in relation to space, movement and listening position.


What is object-based audio?

Object-based audio treats sounds as elements with metadata such as position, level, movement or behaviour. A renderer translates this information into the chosen playback setup.


What is binaural audio used for?

Binaural audio is mainly used for headphone listening, VR, audio walks, web experiences and spatial previews. It can create a spatial impression around the listener, but it does not behave like loudspeaker sound in a shared room.


What is Wave Field Synthesis?

Wave Field Synthesis is a spatial reproduction method that uses loudspeaker arrays to create the impression of virtual sound sources in space.


Which spatial audio format is best?

There is no universal best format. The right choice depends on the spatial intention, content, playback system, visitor movement, production workflow and long-term operation.


Can different spatial audio methods be combined?

Yes. Many exhibitions, installations and immersive environments use hybrid workflows that combine multichannel playback, object-based movement, binaural previews, Ambisonics recordings or other methods.


Planning a spatial audio installation, exhibition or immersive environment?

We help choose the right spatial audio format, method and workflow for the project — from spatial concept and production setup to rendering, integration and on-site testing.
