Thursday, January 6, 2011

The Sounds of Silence

If you think your Dolby SRS sounds amazing, you’re sadly mistaken.

 

[Image: Aeonix headphones]

 

When faced with the task of creating a fully immersive virtual environment, I have often found the results to be lackluster at best. In SecondLife specifically, it is not the textures or the detail of the models (prim or mesh) that concerns me, but something more important.

 

Let’s Set The Stage

 

While we can always say that higher-resolution textures better represent the image involved, this static approach carries limits that will forever prevent a truly exponential increase in fidelity over time. Resolutions of 512x512 or 1024x1024 are the sorts of limitations inherent to a graphics card's texture cache, and so we remain stuck until those numbers increase.

 

The same can be said about the detail levels of mesh and prim creations: the more polygons (vertices, etc.) an item has, the more calculation it takes to render it in three dimensions. Thus, in an environment such as SecondLife (I use SL as an example quite often), we're stuck with lower-end meshes and prims at orders of magnitude lower fidelity.

 

However, this is not entirely true, in that we aren't completely limited by the hardware in either case. Instead, we are limited by how those hardware components are utilized. In the case of textures, the real limitation is not a 1024x1024 ceiling on resolution, but rather that the GPU can only comfortably (in most cases) hold that much texture in cache at any given moment. We might assume the two statements are the same, but the difference, while subtle, is distinct.

 

In the area of procedural textures, a much higher fidelity of texture can be synthesized with algorithmic methods, creating the ability to scale resolution upward dynamically without losing fidelity. While the texture itself may be (in theory) 32,000x32,000 in resolution yet only a couple of kilobytes of code, the idea is that we don't actually need to see the entire resolution at any given moment, and therefore should stream through the GPU only what we need. Procedural methods handle this concept of perceived hyper-texturing far better than any static texture could ever hope to.
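To make that concrete, here is a minimal sketch (in Python, purely illustrative) of why procedural textures scale this way: the "texture" is just a function of (u, v) coordinates, so any given resolution is merely a choice of how densely you sample that function. The hash constants and the noise itself are my own stand-ins, not any particular engine's implementation.

```python
import math

def hash2(x: int, y: int) -> float:
    """Deterministic pseudo-random value in [0, 1) for a lattice point."""
    n = (x * 374761393 + y * 668265263) & 0xFFFFFFFF
    n = ((n ^ (n >> 13)) * 1274126177) & 0xFFFFFFFF
    return (n & 0xFFFF) / 65536.0

def value_noise(u: float, v: float) -> float:
    """Bilinearly interpolated lattice noise; smooth at any sample density."""
    xi, yi = math.floor(u), math.floor(v)
    xf, yf = u - xi, v - yi
    # Smoothstep easing keeps the interpolation free of grid artifacts.
    sx, sy = xf * xf * (3 - 2 * xf), yf * yf * (3 - 2 * yf)
    a, b = hash2(xi, yi), hash2(xi + 1, yi)
    c, d = hash2(xi, yi + 1), hash2(xi + 1, yi + 1)
    top = a + (b - a) * sx
    bottom = c + (d - c) * sx
    return top + (bottom - top) * sy

def sample_texture(width: int, height: int, scale: float = 8.0):
    """'Resolution' is just sampling density; no bitmap is stored anywhere."""
    return [[value_noise(x / width * scale, y / height * scale)
             for x in range(width)] for y in range(height)]

thumbnail = sample_texture(64, 64)     # cheap distant preview
closeup = sample_texture(1024, 1024)   # same texture, far more detail
```

The thumbnail and the close-up come from the same few lines of code; a static bitmap would need a separate, enormous file to deliver the same close-up detail.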

 

[Image: red barn door texture]

 

Similar developments occur when we scrutinize mesh- and prim-based three-dimensional constructs (3D models), in that the detail level of a model imported into a scene is not necessarily representative of its capacity for fidelity. In our case, a 3D mesh is merely a base fidelity which can be heavily augmented for further clarity and detail through dynamic and algorithmic means such as Tessellation, Parallax Extrusion, Bump Mapping, Screen Space Ambient Occlusion, and more.

 

In terms of mesh detail, Tessellation comes to mind as a way to give much more detail to models (and textures) through GPU algorithms. Take, for instance, the tessellation routines built into DirectX 11. Essentially, this sort of algorithm intelligently subdivides quads for further detail based on camera proximity, and oftentimes the results can be quite amazing.

 

For those of you who were just derailed by that bit of geek speak, simply put: the closer you are to something, the more detail the GPU gives the model. In combination with Parallax Extrusion algorithms, we can take flat textures and make them appear 3D while adding further detail to models in virtual space where they did not originally have such a level of detail.
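As a rough illustration of the scheduling idea (not the actual DirectX 11 hull-shader API; the distance bands and factors below are made-up tuning values), here is how a distance-based tessellation factor might be chosen:

```python
def tessellation_factor(camera_pos, patch_center,
                        near=5.0, far=100.0,
                        max_factor=64, min_factor=1):
    """More subdivisions for nearby patches, fewer for distant ones."""
    offsets = [p - c for p, c in zip(patch_center, camera_pos)]
    distance = sum(d * d for d in offsets) ** 0.5
    # Normalize distance into [0, 1] between the near and far bands.
    t = min(max((distance - near) / (far - near), 0.0), 1.0)
    # Interpolate from heavy subdivision (near) down to a flat quad (far).
    return round(max_factor + t * (min_factor - max_factor))

print(tessellation_factor((0, 0, 0), (0, 0, 6)))   # close to camera: high factor
print(tessellation_factor((0, 0, 0), (0, 0, 90)))  # far away: low factor
```

On real hardware this decision happens per patch on the GPU every frame; the point is simply that the polygon budget is spent where the camera can actually see the difference.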

 

Again, we’re utilizing algorithmic methods (which are dynamic and not static) to augment and greatly enhance the detail of our virtual worlds.

 

 

[Video: Hardware Tessellation with DirectX 11 (UniEngine)]

 

 

Of course, with Tessellation, we're not going to suddenly find infinitely more detail where there was none. But it is at least the beginning of what I believe will be a future trend for virtual worlds: dynamic and procedural methods by which to automatically augment fidelity going forward. Let us not forget that these same sorts of algorithms can be deployed for things such as dynamic hair and clothing using a physics processing API. In SecondLife, the idea of amazing hair and clothing must surely appeal to a wide audience (mostly because I assume half of SL is in the fashion industry and the other half are partying in nightclubs).

 

 

[Video: Physically simulated clothing by CCP using NVIDIA APEX]

 

 

Is It Real, Or Is It Memorex?

 

Now that I've outlined the two basic visual components of an environment, and how they can be made drastically better through different approaches to their use and creation, I'd like to focus on the heart of this article and the reason you are reading.

 

It has been my contention that the reason most virtual environment spaces in SecondLife blast music at their locations is to avoid having to create a proper soundscape for their environment.

 

Audio is by far the most underutilized aspect of a proper environment. Without it, we simply move through spaces that are disturbingly quiet (like a silent movie), or we are bombarded by loud music in order to cover the fact that, in reality, we (as designers) are piss poor at making a convincing environment capable of truly immersing our participants.

 

There are always exceptions to the rule, and there have occasionally been fantastic, breathtaking implementations of complete environmental immersion in SecondLife. However, it still needs to be said that the nature of SecondLife will always limit the ability to make an environment truly convincing. Ten-second, single-channel audio clips at a low bitrate just aren't going to cut it for creating the virtual realities of the future.

 

The obvious conclusion, then, would be that higher-bitrate, stereo audio files are the solution, and we would be only partially correct.

 

I truly believe that in order to create a completely immersive soundscape, we need to implement an audio solution that goes beyond stereo and high bitrates alone. Surely, then, we must be talking about Dolby Surround Sound, right? Not exactly.

 

For virtual environments, I believe it is safe to say at this point that the use of headphones is commonplace. More often than not, we use our computers with a pair of headphones covering our ears, and most of the time that potential is horribly wasted on single-channel sound effects or stereo music streams.

 

The human ear simply does not hear like that, and to make matters worse, the bitrate (quality) of the source files is absolute crap to begin with. The ear is a wonderful thing, but hearing itself happens in our head. Your mind is possibly the most powerful audio processor ever known, thanks in large part to its ability to derive spatial awareness from stereo input.

 

Cetera, as this amazing ability has been dubbed, is the mind's capacity to perceive the minute difference in a sound's arrival time between the left and right ears (the interaural time difference, or ITD) and use that information to tell you where the sound is coming from in full spatial awareness. This is also something that can be reproduced in standard stereo audio using a technique known as binaural audio.
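The cue involved is small enough to be worth quantifying. Here is a minimal sketch using the classic Woodworth spherical-head approximation; the head radius is an assumed typical value, not a measured one:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature
HEAD_RADIUS = 0.0875    # m, a typical adult head radius (assumption)

def itd_seconds(azimuth_deg: float) -> float:
    """Interaural time difference for a distant source (Woodworth model).

    Azimuth 0 = straight ahead, 90 = directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"{az:>2} deg  ITD = {itd_seconds(az) * 1e6:.0f} microseconds")
```

Even at its maximum, with a source directly to one side, the difference is only around 650 microseconds, and yet the brain resolves it effortlessly into a direction.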

 

Depending on how the audio is to be created (or processed), you can simply record the source file in a manner that produces a binaural stereo recording, or you can translate the audio of a game environment on the fly using an API like Ghost, which incorporates the Cetera algorithm (among other techniques). Binaural alone isn't the end of the line when it comes to capturing spatial fidelity, either. If we take into account the proposed findings of Hugo Zuccarelli, the ear may act as an auditory interferometer, producing a sort of ambient reference sound that allows even higher clarity of interpretation in the mind.

 

In that case, we're talking about Holophonic recording, which is a trademarked method of binaural recording with additional methodologies intended to greatly enhance the output and spatial accuracy. Of course, the claims of Holophonic audio are disputed by pretty much half the world, while the other half swear by it. Whether or not Hugo Zuccarelli really managed to create the equivalent of an audio hologram will more than likely remain in dispute for quite some time. What we can say for sure is that this is the same man who also managed to create distortion-free speakers and microphones that produce zero feedback when held up to a speaker.

 

For now, we’ll simply focus on the undisputed claims of binaural audio.

 

Fiber Optic Head Trip

[Image: Aeonix headphones]

 

For this portion of the post, I’d like to direct your attention to the nearest pair of headphones. This is, after all, an interactive sort of blog and today we’re taking a field trip of the mind. In order for this to work, you need to be wearing a pair of headphones.

 

As we've discussed so far, Cetera is the mind taking into account the subtle difference in a sound's arrival time between your ears. This cue is actually quite precise and plays a massively important part in recreating spatial audio.

 

Got those headphones on yet?

 

Time for a Virtual Haircut

 

Alright, so that was a quick demonstration of binaural audio in action. For many of you, Luigi the barber just made you cringe with a pair of clippers while also showing you how and why binaural audio works. As I said before, in order to make a more believable environment we need to explore methods by which we can enhance every aspect. At the moment, audio is the lowest-hanging fruit to tackle, as it doesn't require novel techniques or incredibly expensive equipment to produce.

 

The problem is, binaural audio is recorded on a dummy head like the one below:

 

 

[Image: Neumann KU 100 binaural dummy head microphone]

 

Well, it's not entirely a problem if you are recording audio intended to be heard from a single point of reference. But what happens when we try to recreate spatial audio in a dynamic environment? SecondLife does have "spatial audio," but I'm not convinced it is entirely useful. As far as I can tell, it essentially pans the left and right channels (at best) with volume, and it doesn't really convey spatial depth the way binaural audio does.
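For contrast, here is roughly what pan-plus-volume spatialization amounts to. This is a standard constant-power pan, written as my own illustrative sketch rather than SecondLife's actual implementation:

```python
import math

def pan_stereo(mono, pan: float):
    """pan in [-1, 1]: -1 = hard left, 0 = center, +1 = hard right."""
    angle = (pan + 1.0) * math.pi / 4.0  # map pan to [0, pi/2]
    gain_l, gain_r = math.cos(angle), math.sin(angle)
    left = [s * gain_l for s in mono]
    right = [s * gain_r for s in mono]
    return left, right
```

Both channels carry the very same signal with zero time offset between them, which is why panned audio tends to sit "inside your head" rather than somewhere out in the room.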

 

So what we need is a method of dynamic Cetera calculation for three-dimensional environments, so that sounds are conveyed properly from a constantly changing vantage point.
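Here is a minimal sketch of what such a dynamic calculation might look like, assuming a flat two-dimensional scene, a fixed listener orientation, and ears a fixed offset apart. Real systems (Ghost among them, presumably) add head-shadow and pinna filtering via HRTFs; this toy version models only per-ear delay and distance attenuation:

```python
import math

SAMPLE_RATE = 44100      # samples per second
SPEED_OF_SOUND = 343.0   # m/s
EAR_OFFSET = 0.0875      # m; ears sit at +/- this on the listener's x axis

def render_binaural(mono, source_pos, listener_pos=(0.0, 0.0)):
    """Place a mono clip around a listener via per-ear delay and gain."""
    lx, ly = listener_pos
    sx, sy = source_pos
    ears = {"L": (lx - EAR_OFFSET, ly), "R": (lx + EAR_OFFSET, ly)}
    channels = {}
    for side, (ex, ey) in ears.items():
        dist = math.hypot(sx - ex, sy - ey)
        # Each ear gets its own arrival time: this difference IS the ITD.
        delay = int(round(dist / SPEED_OF_SOUND * SAMPLE_RATE))
        gain = 1.0 / max(dist, 0.1)  # crude distance attenuation
        channels[side] = [0.0] * delay + [s * gain for s in mono]
    # Pad both channels to equal length for playback.
    n = max(len(channels["L"]), len(channels["R"]))
    left = channels["L"] + [0.0] * (n - len(channels["L"]))
    right = channels["R"] + [0.0] * (n - len(channels["R"]))
    return left, right

# A click placed ahead and to the listener's right arrives at the right
# ear about twenty samples earlier, and slightly louder, than at the left.
click = [1.0] + [0.0] * 99
left, right = render_binaural(click, source_pos=(2.0, 1.0))
```

Because each ear gets its own arrival time and level, recomputing the delays every frame as the avatar or the source moves gives exactly the changing vantage point that simple panning cannot.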

 

Luckily, such algorithms exist and are being refined for better accuracy in real time. You can check out the recent launch of Ghost Dynamic Binaural Audio to get a better understanding of how this sort of technology is useful overall.

 

In the meantime, whether or not you have access to dual-channel binaural microphones for recording, keep in mind that creating a believable environment in the virtual world requires attention to detail with your audio. If visitors cannot close their eyes and become immersed in your sim, then it's time to go back to the drawing board and make things better.
