
Audio mixing in Unity

Discussion in 'Scripting' started by neonleif, Mar 30, 2011.

  1. neonleif

    Joined:
    Feb 25, 2009
    Posts:
    29
    Just found a nice question on Unity Answers which I think would do a better job as an actual forum discussion.

    The question was What are some ways to prevent audio flanging and phase cancellation when using a lot of voices?


    Funny... I was prototyping a system like this with a colleague at last week's NinjaCampIII. It still needs a lot of work to be merged into trunk and released as a built-in feature in Unity, but I'm surprised how easy it was to make the system work in Unity.

    Our first proof of concept was a simple implementation of what DICE is using in their Frostbite engine for the Battlefield games. You can find [several][1] [slides][2] on [SlideShare][3] explaining all the neat details. Then all that's left is to implement and optimize it :)

    The basic idea is that only audio sources within a limited dB range (≈50dB) can be played at once. So we added a loudness property to all 3D sounds in the project and did a rough mix based on real world dB values (e.g. wind≈30, speech≈60, gun≈160, nuke≈195+ etc.).

    Horizontally, you should also limit the number of sources playing within the 50 dB window, which we didn't get to. But the idea would be the same: just limit the range in which sounds of the same loudness are audible (something like the rough sketch below).
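
    To make that concrete, here's a rough C# sketch of how such a culling pass might look. Everything in it (HdrVoice, the 10 dB bands, the caps) is invented for this post to illustrate the idea, not lifted from our NinjaCamp prototype:

    ```csharp
    using System.Collections.Generic;
    using UnityEngine;

    // Hypothetical voice record: an AudioSource tagged with its perceived
    // loudness (authored dB after distance attenuation).
    public class HdrVoice
    {
        public AudioSource source;
        public float perceivedDb;
    }

    public static class HdrWindow
    {
        // Mute everything more than windowDb (e.g. 50) below the loudest voice,
        // and cap how many voices of similar loudness (10 dB bands) play at once.
        public static void Apply(List<HdrVoice> voices, float windowDb, int maxPerBand)
        {
            if (voices.Count == 0) return;

            float loudest = float.MinValue;
            foreach (var v in voices)
                if (v.perceivedDb > loudest) loudest = v.perceivedDb;

            var bandCounts = new Dictionary<int, int>();
            foreach (var v in voices)
            {
                bool inWindow = v.perceivedDb >= loudest - windowDb;
                int band = Mathf.FloorToInt(v.perceivedDb / 10f);
                int count;
                bandCounts.TryGetValue(band, out count);
                bool underCap = count < maxPerBand;
                v.source.mute = !(inWindow && underCap);
                if (inWindow && underCap) bandCounts[band] = count + 1;
            }
        }
    }
    ```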

    It'll be interesting to hear about how you accomplish it and what you need from a system like this.


    [1]: http://www.slideshare.net/DICEStudio/adaptive-mixing-in-frostbite
    [2]: http://www.slideshare.net/DICEStudio/automatic-audio-in-frostbite
    [3]: http://www.slideshare.net/DICEStudi...tudies-from-battlefield-bad-company-frostbite
     
  2. invicticide

    Joined:
    Nov 15, 2009
    Posts:
    109
    I posted that question. And you just became my favorite person today. :)

    Checking out presentations now!
     
  3. invicticide

    Joined:
    Nov 15, 2009
    Posts:
    109
    SlideShare is cool but not having a transcript of the original speaker presentation can sometimes make the slides a bit tough to decipher. :p

    So the "HDR audio" thing... you talk about only playing sounds within this ~50db loudness window, but it seems like that needs a reference point. What defines that range? Are you sampling the loudness property of all currently-audible sounds, then grabbing the middle ~50db and calling that the currently-audible range, then scaling the playback volume of individual sounds accordingly? Also, if I'm not mistaken, that loudness property is a manually defined thing, yes? Several of the slides noted that they're not sampling the waveform, so it sounds like you're just tagging each sound with some perceived loudness value that describes it?

    A thing occurred to me... My approach so far has been to limit the number of simultaneous voices by AudioClip -- so, 16 rifle shots because they're common, 4 building explosions because they're not so common, etc. -- and set a minimum delay before a given AudioClip can be retriggered. In most cases right now I've set that delay to effectively "next frame" just to prevent triggering the same sound on top of itself thus making it extra loud. But "next frame" at 30 FPS = 33ms delay, and IIRC the ear requires something like 50ms to separate transients, so a bunch of rifle shots playing on successive frames could be bleeding together into what I perceive as one long, sustained sound. Accounting for that in the retrigger delay seems like it should help, I think?
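
    For reference, here's roughly what my limiter boils down to, as a sketch. The class and numbers are simplified for the post (and the voice release is naive), so treat it as illustration rather than my actual code:

    ```csharp
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;

    public class ClipVoiceLimiter : MonoBehaviour
    {
        public int maxVoices = 16;           // e.g. 16 for rifle shots, 4 for explosions
        public float retriggerDelay = 0.05f; // ~50 ms so successive transients stay distinct

        readonly Dictionary<AudioClip, float> lastPlayed = new Dictionary<AudioClip, float>();
        readonly Dictionary<AudioClip, int> activeVoices = new Dictionary<AudioClip, int>();

        public bool TryPlay(AudioSource source, AudioClip clip)
        {
            float last;
            if (lastPlayed.TryGetValue(clip, out last) && Time.time - last < retriggerDelay)
                return false; // too soon; would smear into the previous transient

            int voices;
            activeVoices.TryGetValue(clip, out voices);
            if (voices >= maxVoices)
                return false; // per-clip voice budget exhausted

            lastPlayed[clip] = Time.time;
            activeVoices[clip] = voices + 1;
            source.clip = clip;
            source.Play();
            StartCoroutine(ReleaseVoice(clip, clip.length));
            return true;
        }

        IEnumerator ReleaseVoice(AudioClip clip, float delay)
        {
            yield return new WaitForSeconds(delay);
            activeVoices[clip] = activeVoices[clip] - 1;
        }
    }
    ```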
     
  4. neonleif

    Joined:
    Feb 25, 2009
    Posts:
    29
    Agreed. We ended up with a bunch of questions about some of the higher level problems in the design, but the algorithms are all there.


    That's how we built it as well: set the loudness manually and let the algorithm derive the volume and aggregated (attenuated) loudness from the distance to the listener or an arbitrary point of interest (e.g. an avatar in 3rd person). I guess the loudness window can be any dB range that fits your soundscape.
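
    In sketch form, the attenuation step is just something like this (the -6 dB per doubling of distance falloff is an assumption for the example, not necessarily what our prototype does):

    ```csharp
    using UnityEngine;

    public static class HdrAttenuation
    {
        // Perceived loudness = authored loudness minus distance falloff
        // relative to the listener (or any point of interest).
        public static float PerceivedLoudnessDb(float authoredDb, Vector3 sourcePos, Vector3 earPos)
        {
            float d = Mathf.Max(Vector3.Distance(sourcePos, earPos), 1f);
            return authoredDb - 20f * Mathf.Log10(d); // inverse-square law
        }
    }
    ```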


    We haven't gotten to the point of limiting the number of voices yet (we may be able to get to it over a couple of FAFFs), but I would guess it would be easy to add to the HDRAudio system, as it already handles limiting in the loudness dimension.
    With your approach, it sounds like an easy fix to set the delay until the next instance of the sound is played to a time value instead of "next frame" - just a matter of tweaking, I guess. But it would be nice to simply have a global limit that could produce a sane delay, e.g. based on the length of the clip and the projected frequency of instantiations... Meh... all talk :) it should probably be tried first.
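
    Pure speculation, but such a global heuristic could look something like this (every number in here is a guess):

    ```csharp
    using UnityEngine;

    public static class RetriggerHeuristic
    {
        // Derive a retrigger delay from the clip length and the expected
        // trigger rate, floored at ~50 ms so the ear can separate transients.
        public static float SaneDelay(AudioClip clip, float expectedTriggersPerSecond)
        {
            float byLength = clip.length * 0.1f; // 10% of the clip, arbitrary
            float byRate = 1f / Mathf.Max(expectedTriggersPerSecond, 1f);
            return Mathf.Max(0.05f, Mathf.Min(byLength, byRate));
        }
    }
    ```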