What is a spatial audio production workflow?

A spatial audio production workflow is the process that connects concept, composition, preview, technical planning, system integration and on-site tuning for a spatial listening experience.

Why are previews important in spatial audio?

Previews help teams hear and discuss spatial behaviour before the final system is available. They support early decisions, communication and testing, but they do not replace the final playback condition.

Can binaural previews replace a loudspeaker installation?

No. Binaural previews are useful over headphones, especially for early review or VR/XR workflows, but they cannot fully predict how sound behaves in a shared physical space with loudspeakers, acoustics and visitor movement.

How are real-time engines used in spatial audio production?

Real-time engines can test sound behaviours that respond to movement, position, interaction, data or changing visual scenes. They are especially useful for interactive installations, VR, XR and simulation-based workflows.

Why is on-site tuning necessary?

On-site tuning is necessary because architecture, materials, loudspeaker placement, reflections, neighbouring media and visitor movement change how spatial audio is perceived.

When should spatial audio workflow planning begin?

It should begin early, together with architecture, exhibition design, media planning and interaction design. This makes it possible to treat sound as part of the spatial concept rather than as a late production layer.

What do games have to do with spatial audio workflows?

Games have trained many audiences to read sound as spatial information: direction, distance, danger, proximity, atmosphere and response. For exhibitions and installations, this makes it useful to test not only how a system plays back sound, but how visitors understand and act on sonic behaviour.

Spatial Audio Formats & Methods

Spatial audio formats and methods define how spatial information is described, carried, translated or reproduced.

They determine how a listening concept moves through production: whether it is fixed to a known playback setup, kept flexible until rendering, prepared for headphones, carried with metadata or translated into a larger loudspeaker environment. These choices affect production, delivery, playback and the way a spatial composition survives between studio, system and site.

For KLING KLANG KLONG, formats and methods are part of how a spatial score becomes audible: how sound can move, remain local, surround, guide, respond, scale across systems or belong to a specific room.

This Technical Note is the companion to Spatial Audio Systems & Renderers. That article explains the playback environment. This one explains the formats and methods that may feed into, describe or shape that environment.

IN BRIEF

A spatial audio format describes how spatial sound information is stored, exchanged or delivered. It may carry channel signals, object positions, scene-based information, metadata or binaural audio.

A spatial audio method describes how sound is organised or reproduced in space. Some methods are built around a known loudspeaker layout, some keep spatial decisions flexible until rendering, some are made primarily for headphones, and others depend on soundfield representation or loudspeaker-array reproduction.

These formats and methods only become meaningful inside a workflow. A format can describe spatial information, a renderer can translate it, and a playback system can make it audible in a specific listening condition.

The important question is therefore not which format sounds most advanced, but which one keeps the spatial logic clear from composition to playback.

Formats, Methods And Delivery Layers

The terms in this article do not all describe the same kind of thing. They sit on different levels of the spatial audio workflow: composition, representation, rendering, delivery and playback.

At this point, the important thing is not to understand every term in detail, but to see the difference between those levels. One approach may be a way of composing for fixed loudspeakers. Another may describe space before it is rendered. Another may support headphone listening. Others may deal with metadata delivery or loudspeaker-array reproduction.

These levels often overlap. A spatial installation may combine a stable multichannel composition with flexible movement, headphone previews, spatial recordings, localised loudspeaker feeds and show-control data within one production process.

Why Formats And Methods Matter

Formats and methods matter because they shape the bridge between composition and technical delivery.

A flexible workflow can make it easier to adapt a project to different playback systems, while a fixed approach can offer precision when the loudspeaker layout is known. A headphone experience, a room-based installation, a loudspeaker array and a multi-zone exhibition each ask for different forms of spatial translation. Some workflows preserve movement and metadata until rendering; others favour stability, predictability and long-term reliability.

These choices also affect how sound can be updated, synchronised, monitored, archived, toured or maintained. A format that works well in a studio may not be the best choice for an installation that runs every day.

A format does not guarantee a spatial experience. A binaural file, an Ambisonics scene, an object-based mix or a multichannel playback system can all be spatially meaningful or spatially weak. What matters is whether the method supports the dramaturgy, architecture and movement pattern of the project.

Channel-Based Multichannel Audio

Some spatial projects do not need a format that remains flexible until the last rendering stage. When the room, loudspeaker layout and visitor journey are known, it can be more useful to compose directly for the installed playback system.

In exhibitions, museums and pavilions, this decision affects more than production. It influences how sound can be updated, synchronised, monitored, archived, toured or maintained, and it determines how easily a spatial composition can move from studio preview into the acoustic reality of the site.

Channel-based multichannel audio works in this way. Sound is assigned to defined output channels or loudspeaker feeds. Stereo, 5.1, 7.1 and many exhibition-specific multichannel setups belong to this broad family.

In spatial installations, this approach can offer direct control over level, timing, movement, transitions and acoustic spill. The composition is shaped for the system that will actually be heard, rather than translated later into an unknown layout. Its limitation appears when conditions change. If the loudspeaker layout, room or playback setup is altered, the spatial composition may need to be adapted or remixed.

This is why channel-based work remains important in exhibitions. Many long-running installations do not need a complex live format; they need a spatial composition that is tuned carefully to a known room and remains stable every day.
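
As a rough illustration of what "assigned to defined output channels" can mean, the following Python sketch mixes named sources into a fixed set of loudspeaker feeds through a routing table with one level and one delay per route. The function name `mix_to_channels`, the route fields and the three-channel layout are assumptions made for this example; they do not describe any particular playback system.

```python
def db_to_linear(gain_db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def mix_to_channels(sources, routes, num_channels, length):
    """Mix mono sources into fixed output channels.

    sources: dict mapping a source name to a list of samples
    routes:  list of dicts with 'source', 'channel', 'gain_db', 'delay_samples'
    """
    outputs = [[0.0] * length for _ in range(num_channels)]
    for route in routes:
        signal = sources[route["source"]]
        gain = db_to_linear(route["gain_db"])
        for i, sample in enumerate(signal):
            j = i + route["delay_samples"]
            if j < length:  # samples pushed past the buffer end are dropped
                outputs[route["channel"]][j] += gain * sample
    return outputs


# Hypothetical cue: one voice sent to channel 0 at full level and,
# one sample later and 20 dB quieter, to channel 2.
sources = {"voice": [1.0, 1.0]}
routes = [
    {"source": "voice", "channel": 0, "gain_db": 0.0, "delay_samples": 0},
    {"source": "voice", "channel": 2, "gain_db": -20.0, "delay_samples": 1},
]
feeds = mix_to_channels(sources, routes, num_channels=3, length=4)
```

Because every route names a concrete channel, a change to the loudspeaker layout means editing the routing table itself, which mirrors the limitation described above.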

Ambisonics And Scene-Based Audio

When the final playback setup may change, it can be useful to describe spatial sound without tying it immediately to fixed loudspeakers.

Ambisonics does this by representing sound as a soundfield. Instead of saying “this sound belongs to this loudspeaker,” it carries directional information that can later be decoded for loudspeakers or rendered binaurally for headphones.

This makes Ambisonics useful for 360-degree video, virtual reality, soundfield recording, immersive media and workflows where the final playback setup may vary. First-order Ambisonics describes a basic spherical soundfield with four channels, while higher-order Ambisonics uses more channels, (N+1)² for order N, to offer greater spatial resolution when the production and playback chain support it. Its strength is portability. The same material can travel through different production and playback contexts more easily than a mix fixed to one loudspeaker layout.

The final experience, however, still depends on decoding, loudspeaker setup, binaural rendering and the actual listening condition. Ambisonics can carry spatial information flexibly, but it does not by itself make a room feel spatially meaningful.
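
The separation between carrying directional information and decoding it can be sketched in a few lines of Python. The example assumes first-order Ambisonics in the AmbiX convention (channel order W, Y, Z, X with SN3D normalisation) and uses a deliberately simple decode in which each loudspeaker feed behaves like a virtual cardioid microphone pointed at that loudspeaker. Production decoders are considerably more refined; the function names and the square loudspeaker layout are assumptions for illustration only.

```python
import math


def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order Ambisonics (W, Y, Z, X).

    Azimuth is measured counter-clockwise from the front, so +90 is left.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    x = sample * math.cos(az) * math.cos(el)
    return (w, y, z, x)


def decode_virtual_cardioid(bformat, speaker_az_deg, speaker_el_deg=0.0):
    """Derive one loudspeaker feed as a cardioid aimed at the loudspeaker."""
    w, y, z, x = bformat
    az = math.radians(speaker_az_deg)
    el = math.radians(speaker_el_deg)
    directional = (
        y * math.sin(az) * math.cos(el)
        + z * math.sin(el)
        + x * math.cos(az) * math.cos(el)
    )
    return 0.5 * (w + directional)


# The same encoded sample, decoded for an assumed square layout:
# front-left, rear-left, rear-right, front-right.
square_layout = [45.0, 135.0, -135.0, -45.0]
encoded = encode_foa(1.0, azimuth_deg=90.0)  # source directly to the left
feeds = [decode_virtual_cardioid(encoded, az) for az in square_layout]
```

The encoded four-channel signal never mentions a loudspeaker. Only the decode step knows the layout, which is why the same material can be decoded again for a different setup.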

Binaural Audio And HRTF-Based Rendering

Not every spatial experience is shared through loudspeakers in a room. Some are designed for headphones, individual listening, VR, audio walks, web experiences or production previews.

Binaural audio creates a spatial impression over headphones by reproducing the differences in timing, level and tonal colour with which sound reaches the two ears. It can be produced through dummy-head recording, binaural synthesis or rendering from another spatial format.

HRTFs, or head-related transfer functions, describe how sound from different directions is filtered by the listener’s head, ears and body before reaching the ears. They are one reason binaural rendering can make sound appear around, above or behind the listener.
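
In the time domain, an HRTF pair appears as two short impulse responses (HRIRs), and basic binaural rendering amounts to convolving a mono signal with one response per ear. The Python sketch below shows that step. The two impulse responses are invented placeholders that only imitate a source on the listener's left (earlier and louder at the left ear); they are not measured HRTF data, and the function names are assumptions for this example.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left-ear and a right-ear response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Placeholder responses (assumption, not measured data):
# the left ear hears the source immediately, the right ear
# three samples later and at half the level.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.5]

mono = [1.0, 0.5]
left, right = render_binaural(mono, hrir_left, hrir_right)
```

A real renderer would select or interpolate measured responses for each source direction and usually update them as the listener's head moves.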

During production, binaural rendering can also support headphone-based previews of spatial sketches or early mixes, especially in VR/XR, real-time and simulation-based workflows. It remains a preview method, not a substitute for the final playback condition.

The main challenge is translation. A binaural mix is heard through headphones and does not behave like sound in a shared physical room. For exhibitions and installations, it should therefore be understood as one playback and preview method, not as a replacement for spatial planning in architecture.

The listening condition remains decisive: a person alone in headphones, a visitor moving through VR, someone walking through a route and a group sharing a room all require different spatial decisions.

Object-Based Audio

Some projects need sound to remain active as behaviour rather than become fixed too early to channels. A voice, sound effect, musical element or moving source may need to change position, respond to interaction or adapt to different playback systems.

Object-based audio keeps these elements as objects with metadata. Position, level, movement or behaviour can travel with the sound element, and a renderer translates that information into the available playback setup.

This can be useful when a project needs movement, adaptation, interactivity or translation across different loudspeaker systems. Instead of committing every spatial decision to fixed channels, object-based workflows can keep some spatial information active until rendering.

The value lies in adaptability, but the result depends on the renderer, metadata workflow and playback system. Objects are not meaningful because they are technically movable; they become meaningful when their behaviour supports the spatial score.

In scenographic terms, object-based audio becomes interesting when sound needs to appear, move, respond, change proximity or connect different areas through a spatial logic.
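
The idea that position travels with the sound and is only turned into loudspeaker gains at rendering time can be sketched as follows. This Python example assumes a horizontal ring of at least two loudspeakers and uses simple constant-power panning between the two loudspeakers that enclose the object's direction. `AudioObject`, `render_gains` and both layouts are hypothetical names and values chosen for illustration, not the behaviour of a specific renderer.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    azimuth_deg: float  # 0 = front, positive = counter-clockwise (left)
    gain: float = 1.0


def render_gains(obj, speaker_az_deg):
    """Return one gain per loudspeaker for a ring of two or more speakers."""
    n = len(speaker_az_deg)
    # Walk around the ring in order of increasing azimuth.
    order = sorted(range(n), key=lambda i: speaker_az_deg[i] % 360.0)
    target = obj.azimuth_deg % 360.0
    gains = [0.0] * n
    for k in range(n):
        a = order[k]
        b = order[(k + 1) % n]
        span = (speaker_az_deg[b] - speaker_az_deg[a]) % 360.0
        offset = (target - speaker_az_deg[a]) % 360.0
        if span > 0 and offset <= span:
            frac = offset / span  # 0 at speaker a, 1 at speaker b
            gains[a] = obj.gain * math.cos(frac * math.pi / 2)
            gains[b] = obj.gain * math.sin(frac * math.pi / 2)
            break
    return gains


# One object, two assumed playback layouts.
bird = AudioObject(name="bird", azimuth_deg=90.0)
stereo_gains = render_gains(bird, [30.0, -30.0])
quad_gains = render_gains(bird, [45.0, 135.0, -135.0, -45.0])
```

The object itself is unchanged between the two calls; only the layout handed to the renderer differs. On the stereo pair, a source at 90 degrees falls outside the frontal arc, so this naive sketch pans it along the long way around the ring, a reminder that the result depends on the renderer and the playback system.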

ADM, BW64 And Delivery Metadata

In some workflows, audio files need to carry more than sound signals. They also need to carry information about what those signals are, how they are organised and how they should be interpreted by production systems or renderers.

The Audio Definition Model, or ADM, standardised as Recommendation ITU-R BS.2076, is one way to describe audio content, including channel-based, object-based and scene-based material. In delivery workflows, ADM metadata can be embedded with audio in file formats such as BW64 (Recommendation ITU-R BS.2088), so that spatial information can travel with the audio file.

This layer is usually not the visible part of an exhibition or installation. It becomes relevant in archiving, handover, immersive media delivery, broadcast-related workflows or translation between production systems.

When long-term compatibility and documentation matter, metadata can become part of the project’s technical reliability. It helps ensure that spatial decisions are not only heard during production, but can also be understood, transferred and maintained later.

Wave Field Synthesis

Some spatial reproduction methods focus less on moving sound between loudspeakers and more on creating the impression of virtual sound sources in space.

Wave Field Synthesis uses loudspeaker arrays to support this kind of reproduction. Instead of relying only on phantom images between loudspeakers, it aims to reconstruct wavefronts so that sounds can appear to come from defined positions.

This can be relevant for installations, research contexts and specialised spatial environments where a large loudspeaker array and precise system design are possible. It can support spatial experiences where localisation should remain stable across a broader listening area.

The practical complexity is significant. Wave Field Synthesis usually requires many loudspeakers, careful calibration, processing and architectural integration. Its usefulness therefore depends strongly on whether the spatial idea, room and production conditions justify that level of system design.
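
The wavefront idea can be illustrated with a heavily simplified Python sketch: for a virtual point source behind a line of loudspeakers, each loudspeaker plays the same signal with a delay that follows its distance from the virtual source, so that the individual contributions add up to a curved wavefront. The 1/sqrt(r) level weighting is an assumption chosen for illustration, and a real Wave Field Synthesis system also applies filtering, array tapering and calibration that are omitted here. All names and positions are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, roughly at 20 degrees Celsius


def wfs_delays_and_gains(source_xy, speaker_positions):
    """Return (delay_seconds, gain) per loudspeaker for a virtual point source.

    Delays and gains are expressed relative to the loudspeaker closest
    to the virtual source, which gets delay 0 and gain 1.
    """
    distances = [math.dist(source_xy, pos) for pos in speaker_positions]
    nearest = min(distances)
    if nearest <= 0.0:
        raise ValueError("virtual source must not sit on a loudspeaker")
    result = []
    for r in distances:
        delay_s = (r - nearest) / SPEED_OF_SOUND
        gain = math.sqrt(nearest / r)  # assumed simple distance weighting
        result.append((delay_s, gain))
    return result


# Virtual source two metres behind the middle of a three-speaker line.
speakers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
driving = wfs_delays_and_gains((0.0, -2.0), speakers)
```

Even in this reduced form, every loudspeaker needs its own delay and level, which hints at why the method calls for many loudspeakers and careful calibration.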

Hybrid Workflows

Many real projects do not use one pure format or method. A spatial installation may combine a stable multichannel composition with flexible movement, headphone previews, spatial recordings, localised loudspeaker feeds and show-control data in one workflow.

This is especially common in exhibitions, museums and pavilions, where the final experience has to respond to architecture, visitor movement, neighbouring media and long-term operation.

Hybrid workflows are not a compromise by default. They are often the most realistic way to keep the listening concept intact across studio production, system integration and the installed situation. Format and method choices are most useful when they serve a clear listening intention. In practice, they are often developed together with system design and rendering, and with spatial audio production workflows.

Choosing The Right Format Or Method

The right choice depends on what has to remain flexible and what has to remain fixed.

When the loudspeaker layout and room are known, channel-based or pre-rendered multichannel playback may offer the most direct control. When the project needs to adapt to different playback systems, Ambisonics or object-based workflows may be more useful. When the experience is individual or headphone-based, binaural rendering becomes central. When the goal is a specialised loudspeaker environment with stable virtual sources, Wave Field Synthesis may become relevant.

The decision also depends on production, delivery and maintenance. A flexible format may require more careful rendering, documentation and testing. A simple playback format may be easier to maintain, but harder to adapt later.

The strongest choice is the one that carries the spatial logic through production, integration, playback and long-term operation.
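
The rules of thumb in this section can be written down as a small checklist function. The Python sketch below is only a restatement of the paragraph above in code form; the function name, the flag names and the wording of the suggestions are assumptions for illustration, and a real decision would also weigh production, delivery and maintenance.

```python
def suggest_methods(layout_known=False, must_adapt=False,
                    headphone_based=False, stable_virtual_sources=False):
    """Map project conditions to candidate formats or methods."""
    suggestions = []
    if layout_known:
        suggestions.append("channel-based or pre-rendered multichannel playback")
    if must_adapt:
        suggestions.append("Ambisonics or object-based workflow")
    if headphone_based:
        suggestions.append("binaural rendering")
    if stable_virtual_sources:
        suggestions.append("Wave Field Synthesis")
    if len(suggestions) > 1:
        # Several conditions at once usually point to a hybrid workflow.
        suggestions.append("hybrid workflow combining the above")
    return suggestions


# Hypothetical project: a known room with an additional headphone station.
candidates = suggest_methods(layout_known=True, headphone_based=True)
for candidate in candidates:
    print(candidate)
```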

In Exhibitions, Museums And Pavilions

Exhibitions, museums and pavilions rarely offer one ideal listening position. Visitors enter at different moments, move at different speeds and listen while sharing the space with other people.

Format choice therefore has to support how sound belongs to the room: whether it should remain local to an object, move through a route, support speech, surround a projection, create atmosphere, guide attention or connect several spaces into one spatial score.

Long-term operation also matters. The chosen format or method has to be documented, serviceable and understandable for the people who will maintain the installation after opening.

For KLING KLANG KLONG, these decisions belong to sound scenography. A method is chosen because it helps sound take a meaningful role in architecture, not because it carries a fashionable technical label.

How We Work

We begin by defining the role of sound in the space. Does it need to appear from a specific location, surround, guide, move, respond, remain fixed, travel with the visitor or connect several rooms?

From there, we choose the format or method that can carry that role through production and playback. This may involve channel-based multichannel composition, Ambisonics, binaural rendering, object-based workflows, Wave Field Synthesis or a hybrid approach.

The workflow is tested against the project itself: architecture, acoustics, loudspeaker layout, headphones, media playback, show control, interaction, maintenance and documentation.

The final choice is the one that keeps the spatial composition audible, stable and adaptable where it needs to be.

RELATED PRACTICES

Spatial audio formats and methods connect technical workflow with spatial composition, sound scenography and the design of listening experiences.

Spatial Audio Systems & Renderers

Spatial Audio Production Workflows

Spatial Audio

Sound Scenography

Spatial Score

Exhibition Sound Design

Immersive Sound

Interactive Sound

Generative Sound

FAQ

What is a spatial audio format?

A spatial audio format describes how spatial sound information is stored, exchanged or delivered. It may include channels, objects, scene-based information, metadata or binaural audio.

What is the difference between a spatial audio format and a spatial audio method?

A format describes how spatial sound information is stored, exchanged or delivered. A method describes how sound is organised, rendered or reproduced in space. In practice, projects often combine several formats and methods in one workflow.

Is Ambisonics the same as spatial audio?

No. Ambisonics is one way of representing spatial audio. Spatial audio is the broader practice of designing sound in relation to space, movement and listening position.

What is object-based audio?

Object-based audio treats sounds as elements with metadata such as position, level, movement or behaviour. A renderer translates this information into the chosen playback setup.

What is binaural audio used for?

Binaural audio is mainly used for headphone listening, VR, audio walks, web experiences and spatial previews. It can create a spatial impression around the listener, but it does not behave like loudspeaker sound in a shared room.

What is Wave Field Synthesis?

Wave Field Synthesis is a spatial reproduction method that uses loudspeaker arrays to create the impression of virtual sound sources in space.

Which spatial audio format is best?

There is no universal best format. The right choice depends on the spatial intention, content, playback system, visitor movement, production workflow and long-term operation.

Can different spatial audio methods be combined?

Yes. Many exhibitions, installations and immersive environments use hybrid workflows that combine multichannel playback, object-based movement, binaural previews, Ambisonics recordings or other methods.

Planning a spatial audio installation, exhibition or immersive environment?

We help choose the right spatial audio format, method and workflow for the project — from spatial concept and production setup to rendering, integration and on-site testing.
