SPATIAL MASTERING - A NEW CONCEPT FOR SPATIAL SOUND DESIGN IN OBJECT-BASED AUDIO SCENES

Frank Melchior, Uwe Michaelis, Robert Steffens
IOSONO GmbH, Erfurt, Germany
[email protected]
ABSTRACT

The current work flow of audio production is based on the channel-based paradigm: audio signals are produced to be played back over discrete loudspeakers at several more or less defined positions. If such a position cannot be realized on the reproduction side, the auditory scene is distorted. Driven by the development of new audio reproduction methods such as wave field synthesis (WFS), the object-based paradigm is a well-known alternative. This concept can also be applied to standard and future surround formats to deliver a clear format definition. During the production process, the current work flow of audio production has to be applied to object-based content as well. While tools for mixing and recording are available, the step of mastering has not been covered yet. This paper describes the concept of object-based mastering, also termed spatial mastering. Besides the concept, a prototype and new interaction methods for controlling the process are presented.
1. AUDIO PRODUCTION

The production process of audio content consists of three important steps: recording, mixing, and mastering. During the recording of the musicians a large number of separate audio files are generated. In order to obtain a format that can be distributed, these audio data are mixed to standard formats like stereo or 5.1 multichannel surround sound. During the mixing process a large number of processing devices are involved to generate the desired signals, which are played back over a given loudspeaker layout. After mixing, the signals can no longer be separated or processed individually. The last step is the mastering of the final audio data format. In this step the overall impression is adjusted, and if several tracks are compiled for a single medium (e.g., a CD), their characteristics are matched. In the context of a channel-based audio representation, mastering means editing the final audio signal for each loudspeaker. In comparison, in the previous production step (mixing) a large number of audio signals are processed and edited in order to achieve a speaker-based representation, e.g., left and right. In the mastering stage, only the two signals left and right are processed. This process is important to adjust the overall balance or frequency distribution of the content.

2. OBJECT-BASED AUDIO PRODUCTION

In the context of an object-based scene representation, the speaker signals are generated on the reproduction side. This means a master in terms of speaker audio signals does not exist. In comparison to the process described above, the object-based paradigm changes the mixing process of the audio content. In an object-based mix, each audio source is treated as a separate audio object with associated properties such as position and source type, to mention only a few. The architecture and the basic building blocks of an object-based mixing process are depicted in Figure 1.

Figure 1. Block diagram of the object-based production.

The general system design is based on a stratified approach for sound spatialization. For a better technical understanding, in this section the production process is divided into spatial authoring and audio authoring. From a user point of view
this separation does not hold, because of the strong interaction between the spatial layout of an auditory scene and the properties of the underlying audio signals. The audio authoring part provides all commonly known audio editing and arranging functionalities, which are part of modern digital audio workstations. The audio scenes consist of a multitude of single audio clips or audio tracks, which involve specific automation data. Audio authoring is split into the sound recording, editing, and mixing process. Spatial authoring enhances the audio data with additional spatial properties defined as sound sources. A
sound source consists of the audio data from the audio authoring and the spatial properties from the spatial authoring tool, termed Spatial Audio Workstation (SAW). The spatial authoring module also provides functionalities to ease the spatial mixing process:

· Visualization of the sound source positions.
· Automation and editing of spatial motion paths.
· Spatial audio effects based on source properties.

The visualization of the sound source positions allows the user to get an overview of the entire scene. The spatial authoring module provides functionality to individually record, play back, and edit spatial motion paths for each sound source. To create complex audio scenes, spatial authoring enables the building of complex hierarchies by grouping sound sources. Nevertheless, the final production step of mastering is required to adapt and optimize the content. A method for mastering object-based audio content without generating the reproduction signals for dedicated speaker layouts has been developed. While adapting the process of mastering to object-based content, it can also be used to generate new spatial audio effects.

3. SPATIAL MASTERING

The process of object-based spatial mastering converts a given auditory scene description, consisting of audio signals and corresponding metadata, into a new set of audio signals corresponding to the same or a different metadata set. In this process, arbitrary audio processing is used to transform the signals. The processing devices are controlled by a parameter controller. Figure 2 presents a block diagram of the process in general. The key element of the process is the transformation of the audio signals by an audio processing device using a metadata-dependent parameter controller. In an object-based scene description, specific parameters of audio objects are stored. Such parameters consist, for example, of the position or direction of an audio source. This data can be either dynamic or static during the scene. These data are processed by the metadata-dependent parameter weighting (MDDPW), which extracts a specific set of metadata and generates a modified set as well as a control value for an audio processing unit. Figure 3 depicts a detailed block diagram of the MDDPW. The MDDPW receives the scene description and extracts a single parameter using the parameter selector. This selection can be done by a user or is given by a specific fixed configuration of the MDDPW; often this might be the azimuth angle. A directional function is provided by the directional control, which is scaled or adapted by the adaptation factor and used for the generation of a control value by the parameter weighting. The control value can be used to control a specific audio processing step, as described in the following section, and to change parameters in the scene description
using the parameter exchange. This can result in a modified scene description. An example of such a modification can be given considering the volume value of an audio source. In this case, the azimuth angle of a source is used to scale the stored volume value of the scene description depending on the directional function. In this scenario the audio processing is done on the rendering side. An alternative implementation can use an audio processing unit, e.g., a digital audio workstation plug-in, to modify the audio data directly depending on the required volume; the volume value in the scene description is then not changed. In that case, in a first step a parameter of the scene description is selected, an illustrative example being the azimuth angle of the source. Using an adaptation factor, the parameter is combined with a directional function, and a control value for a signal processing unit is generated.

Figure 2. Block diagram of the spatial mastering process.

Figure 3. Block diagram of the parameter weighting.

The weighting controller as given in Figure 4 enables the user to specify the direction-dependent control values used in the MDDPW. In the case of a two-dimensional scene description, this can be visualized using a circle; in a three-dimensional system a sphere is more adequate. Without loss of generality, the description in this paper is limited to the two-dimensional version.

Figure 4. Example of a directional controller with interaction elements for rotation (2), direction emphasis (3), and direction-dependent values (1).

The knobs (1) in Figure 4 are used to define specific values for a given direction. The rotation knob (2) is used to rotate all knobs simultaneously. The center knob (3) is used to emphasize a specific direction; it can be controlled by clicking the mouse inside the knob and dragging it to the desired direction. While the knobs deliver specific values defined by the user, all in-between values are calculated by interpolation. An example with linear interpolation and four directions in the controller is depicted in Figure 5; the corresponding representation in the user interface can be found in Figure 6.

Figure 5. Example of an interpolation to get the directional function r(φ).

The rotation knob (2) is
used to specify an offset φ_rot. This offset is applied to the azimuth angles φ_1 to φ_6 by

    φ'_i = φ_i + φ_rot

where i indicates the azimuth angle index. The center knob controls the values r_1 to r_6 of the knobs. Depending on the displacement vector (x_d, y_d), a scaling value r_scal is calculated using

    r_scal = 1 + (x_d + y_d)/2

and applied to the values for the specific point by

    r'_i = r_i · r_scal.

The output of the directional controller is a continuous parameter function r(φ) generated by a specific interpolation function based on the values of the knobs (1), defined by

    r(φ) = interpol(r'_1, . . . , r'_N)

where N indicates the number of knobs used in the controller. The controller is independent of the spatial resolution of the underlying processing.
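As an illustration, the directional controller and a direction-dependent level change it can drive might be sketched as follows. This is a minimal Python sketch, not the paper's implementation: the function names are invented, linear interpolation between adjacent knobs is assumed, and the center-knob scaling rule r_scal = 1 + (x_d + y_d)/2 is a reconstruction of the garbled formula above.

```python
def directional_function(knob_angles, knob_values, rot=0.0, center=(0.0, 0.0)):
    """Directional controller sketch: N knobs (1) at azimuths `knob_angles`
    (degrees) holding `knob_values`, a rotation offset `rot` from knob (2),
    and a center-knob displacement (x_d, y_d) from knob (3).
    Returns r(phi), linearly interpolated between adjacent knobs."""
    xd, yd = center
    r_scal = 1.0 + (xd + yd) / 2.0                  # assumed scaling rule
    # apply rotation offset and scaling, then sort knobs by azimuth
    pairs = sorted(((a + rot) % 360.0, v * r_scal)
                   for a, v in zip(knob_angles, knob_values))
    angles = [p[0] for p in pairs]
    values = [p[1] for p in pairs]

    def r(phi):
        phi %= 360.0
        n = len(angles)
        for i in range(n):
            a0 = angles[i]
            span = (angles[(i + 1) % n] - a0) % 360.0 or 360.0
            d = (phi - a0) % 360.0
            if d <= span:                           # phi lies between knob i and i+1
                t = d / span
                return (1.0 - t) * values[i] + t * values[(i + 1) % n]
        return values[0]
    return r

def apply_direction_gain(samples, azimuth, r):
    """Direction-dependent level change: every sample of a source is
    multiplied by the control value r(azimuth)."""
    g = r(azimuth)
    return [s * g for s in samples]
```

For example, four knobs at 0°, 90°, 180°, and 270° with values 1.0, 0.5, 0.0, and 0.5 attenuate rear sources fully and leave frontal sources untouched; `apply_direction_gain` then realizes the level-change example of the following paragraph.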
All available or future audio processing algorithms can be used in the context of object-based spatial mastering. The only constraint is the availability of at least one parameter that can be changed in real time. Two simple examples are considered here. First, a direction-dependent change of level: the samples of the digital audio signal are multiplied by a factor derived from the direction, i.e., the azimuth angle, of an audio source. For each sample, the value of the azimuth angle is extracted from the scene description, the directional function is applied to the azimuth angle, and the samples are multiplied by the resulting value. As a result, the level of an audio source changes depending on its spatial position. In another example, the frequency of a filter is controlled by the directional function, so that the frequency distribution of a given audio object is modified depending on its direction.

4. PROTOTYPE SYSTEM
Based on the Spatial Audio Workstation, the concept was implemented as a core plug-in for a digital audio workstation. The Spatial Audio Workstation enables the reproduction-system-independent production of multichannel content with an object-based scene description. Each audio track or audio event is represented as a sound source in a top view of the scene. A detailed description of this authoring tool can be found in . The prototype, as shown in Figure 6, enables the selection of an arbitrary parameter of a VST plug-in in the channel of the audio object and controls this parameter depending on the position of the source. In order to control the angular distribution, a weighting controller was implemented as well. In the given example, the gain of a single high-shelving equalizer is controlled by the spatial mastering. The result of the current setting is that sources in the frontal area
Figure 6. Screen shot of a prototype implementation. The gain of the high-shelving equalizer of the selected source is controlled by the weighting controller shown in the lower left corner.
get a high-frequency attenuation. One can think of several applications: it is possible to process the ambience or the direct sound of a mix depending on a directional separation; moving sources can be modified by a position-dependent audio effect; and a complete audio scene can be modified by the directional controller in order to enhance certain directions using a simple gain modification.

5. CONCLUSION

A new concept to adapt the mastering process to object-based audio was presented. A detailed view was given of the implementation of the required controller and its integration into the audio and metadata processing. The new directional controller enables an intuitive modification of parameters depending on the source position. A prototype system has been shown. Besides the emulation of mastering in object-based production, this is a new kind of spatial audio effect, which can also be applied to conventional multichannel productions. First experiments by professional users have shown a high potential for new spatial sound design applications.

6. REFERENCES

A. J. Berkhout and D. de Vries, "Acoustic holography for sound control," convention paper presented at the 86th AES Convention, Hamburg, March 1989.
B. Katz, Mastering Audio. Focal Press, 2002.

F. Melchior, "Wave field synthesis and object-based mixing for motion picture sound," SMPTE Motion Imaging Journal, vol. 119, no. 3, pp. 53-57, April 2011.

F. Melchior and T. Sporer, "3D Audio - Just add another channel? Object-based audio for motion picture sound," presented at the 26th VDT International Convention, November 2010.

S. Meltzer, L. Altmann, A. Gräfe, and J.-O. Fischer, "An object oriented mixing approach for the design of spatial audio scenes," presented at the 25th VDT Convention, 2008.

B. Owsinski, The Mixing Engineer's Handbook. Thomson Course Technology, 2006.

B. Owsinski, The Recording Engineer's Handbook. Thomson Course Technology, 2009.

N. Peters, T. Lossius, J. Schacher, P. Baltazar, C. Bascou, and T. Place, "A stratified approach for sound spatialization," presented at the SMC 2009, July 2009.