Immersive MR and VR with Depth Sensing – The Under-appreciated Future

For this piece, I asked two of Occipital’s computer vision engineers, Brandon Minor and Jack Morrison, to give a more technical dig into the world of MR and VR.

A seamless blend of the virtual and the real, Mixed Reality (MR) is undergoing a surge of interest and engagement among consumers. More than a concept or a gimmick, MR has become both more immersive and more affordable with new HMD (Head-Mounted Display) releases like the Microsoft HoloLens and the Occipital Bridge. Loosely defined as the integration of virtual characters, objects, and scenes into real-world environments, Mixed Reality has huge potential to improve fields from business to education to entertainment. MR is often differentiated from Augmented Reality by the complexity of its virtual and real-world interactions. In Mixed Reality, virtual renderings do more than float in space or act as a heads-up display: virtual characters hop on your couch, pyramids spring up from your coffee table, and it’s difficult to tell where the real world ends and the virtual one begins.

Both HoloLens and Bridge offer integrated, untethered headset positional tracking, which lets the user move freely, unencumbered by cabling and not limited by external sensors’ ranges. This freedom of movement allows users to experience a deeper sense of immersion and is a significant feat of engineering on its own. An underappreciated aspect of both these systems, however, is the depth sensor, a critical piece of each system’s sensor suite. Bridge includes an Occipital Structure Sensor mounted above the screen, while HoloLens integrates a custom depth sensor above the bridge of the nose within the headset. These depth sensors are responsible for many of the unseen technical aspects of a user’s immersive Mixed Reality experience.

The Power of Depth

First introduced to many consumers with the Microsoft Kinect, depth sensors (also referred to here as depth cameras) are often a combination of an infrared projector and one or more infrared cameras that output “depth images,” describing distances to physical surfaces. These spatial measurements can be used for many things, including aiding in positional tracking, 3D scanning, object or body recognition, and scene understanding. Our company, Occipital, launched the Structure Sensor in 2013 as the first mobile depth sensor. It is a self-powered, mountable 3D sensor that gives iOS devices spatial awareness. Our Structure SDK, which gives developers the tools to build spatially-aware applications, has powered dozens of different applications in a range of industries including gaming, plastic surgery, fashion, and interior design.
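To make “depth images” concrete, here is a minimal sketch of how one is typically turned into 3D points using a standard pinhole camera model. The intrinsics (`fx`, `fy`, `cx`, `cy`) are illustrative values, not the calibration of any particular sensor:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Unproject a depth image (meters) into 3D points in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # shape (h, w, 3)

# Toy example: a flat wall 2 m away filling the whole frame.
depth = np.full((480, 640), 2.0)
points = depth_to_points(depth, fx=570.0, fy=570.0, cx=319.5, cy=239.5)
```

Each pixel becomes a spatial measurement: the center pixel maps to a point roughly straight ahead at 2 m, while pixels near the edges fan out according to the camera’s field of view.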

There are a few attributes which make depth sensing (as opposed to normal monocular or stereo cameras) ideal for HMD-based scene understanding and mapping. Depth sensors provide high-frequency, accurate spatial measurements of a user’s surroundings, which are difficult to obtain otherwise, and they do so without requiring input or special actions by the user to map the environment. Additionally, depth cameras are relatively inexpensive and compact when compared with other 3D sensing approaches such as laser-based LIDAR.

Mixing Worlds with Depth Sensing

Depth sensing provides the scene understanding and spatial awareness that are required for convincing and effective Mixed Reality. Without knowing the physical structure of a user’s environment:

  • Mixed Reality objects would float in space because there is no knowledge of the surfaces on which they can rest
  • Rendering cannot imitate real-world lighting conditions without a model on which to cast shadows or on which virtual reflections can appear
  • Dynamic physical interactions with the real world are not possible
  • It’s not possible to transform real-world elements, for instance by changing their colors or shapes

A dense model of the environment unlocks the true potential of mixed reality. Games can bounce items off real-world tables and chairs, virtual characters can navigate intelligently through a scene, and the real world can appear to reflect the light coming from a virtual source like a torch or flashlight. Instead of falling through one’s couch, a virtual character can hop up and sit on it instead, snuggling up with a real cat on the armrest. These interactions are only possible with a truly dense model of the world.
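As a toy sketch of the first point above — resting virtual objects on real surfaces — suppose the dense model has been flattened into a height map (an assumed intermediate representation, not the output of any specific SDK). An object then rests on the tallest real surface under its footprint:

```python
import numpy as np

def rest_on_surface(heightmap, cell_size, obj_x, obj_z, obj_half_extent):
    """Find the height at which a virtual object rests on the reconstructed scene.

    heightmap: 2D grid of surface heights (meters) derived from the dense model.
    cell_size: grid resolution in meters per cell.
    """
    # Convert the object's square footprint from world coordinates to grid cells.
    i0 = max(0, int((obj_x - obj_half_extent) / cell_size))
    i1 = min(heightmap.shape[0], int((obj_x + obj_half_extent) / cell_size) + 1)
    j0 = max(0, int((obj_z - obj_half_extent) / cell_size))
    j1 = min(heightmap.shape[1], int((obj_z + obj_half_extent) / cell_size) + 1)
    # The object rests on the tallest surface under its footprint.
    return float(heightmap[i0:i1, j0:j1].max())

# Toy scene: a 4 x 4 m floor at y=0 with a 0.45 m-high table in the middle.
hm = np.zeros((40, 40))
hm[10:20, 10:20] = 0.45
```

With this model, an object dropped over the table settles at 0.45 m instead of falling through to the floor — the difference between a character sitting on your couch and sinking into it.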


The Future of VR is Inside-Out and Depth-Enabled

High-end VR headsets, like the HTC Vive, Oculus Rift and Sony PlayStation VR, currently rely on external sensors for positional tracking. At the other end of the price range, current mobile VR solutions can only track stationary head rotation, a highly limiting and unnatural motion restriction. To enable fully immersive and unbounded VR experiences, untethered positional tracking in an HMD is a necessity, and many VR companies are beginning to create their devices with this in mind. These new headsets use so-called “inside-out” tracking, relying solely on sensors embedded in the HMD itself, e.g. one or two visible-spectrum cameras and an Inertial Measurement Unit (IMU). These new prototypes can track their user’s movement through the world without first setting up an external tracking system.

While positional tracking solutions are one step that will greatly increase the versatility of Virtual Reality headsets, the use of depth sensing and scene mapping remains wholly untapped for VR experiences. Even though Virtual Reality seeks to teleport users entirely out of their current space into completely new environments, these systems will still benefit from integrating depth cameras and real-world spatial understanding. As a matter of safety, current PC-VR users must clear out an entire room to use their headset. There are no safety mechanisms in any of the major shipping headsets that notify users of immediate impending collisions, except for a user-specified bounding box around the area designated for VR use. Thus, even if VR headsets gain self-contained positional tracking abilities, their full potential is still hamstrung by their inability to provide users with safe VR experiences in cluttered environments. With depth sensing and an understanding of the entire play area, derived from a dense scene model, it becomes possible to effectively notify users of impending collisions without disrupting their virtual experience.
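A depth-based collision warning can be surprisingly simple in principle. This is a minimal sketch, not any headset’s actual safety system; the distance and pixel-count thresholds are illustrative tuning values:

```python
import numpy as np

def collision_warning(depth, min_distance=0.5, min_pixels=200):
    """Flag an impending collision when enough depth pixels fall below a threshold.

    depth: depth image in meters; zeros mark invalid (no-return) pixels.
    """
    valid = depth > 0
    too_close = valid & (depth < min_distance)
    # Requiring a minimum pixel count rejects isolated noisy readings.
    return int(too_close.sum()) >= min_pixels

depth = np.full((480, 640), 3.0)   # empty room, walls 3 m away: no warning
depth[200:280, 300:380] = 0.3      # an obstacle 30 cm ahead: warning
```

A real system would combine this with the persistent scene model, so it could warn about obstacles currently outside the sensor’s field of view as well.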

Understanding the user’s environment can also help improve presence and immersion. By learning an area before launching into a first-person game, for instance, the system can automatically detect a user’s real height in ways a position-tracking system cannot; it knows where the real-world ground is, not just how the camera is moving through an arbitrary world. As a result, the game is more natural and user-friendly, versus assuming an average height or requiring the user to enter their height.
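One way to sketch this height estimation, under the assumption that the reconstructed points live in a gravity-aligned frame (y up): take the floor as a low percentile of the point heights — a crude stand-in for proper plane fitting such as RANSAC — and measure the camera’s elevation above it. The `eye_to_top` offset is a hypothetical constant for the distance from eye level to the top of the head:

```python
import numpy as np

def user_height(points_world, camera_y, eye_to_top=0.12):
    """Estimate the user's height from a mapped scene.

    points_world: (N, 3) reconstructed points, gravity-aligned, y up.
    camera_y: the headset's current height in the same frame.
    """
    # A low percentile is robust to a few spurious below-floor points.
    floor_y = np.percentile(points_world[:, 1], 2)
    return (camera_y - floor_y) + eye_to_top

# Synthetic scene: floor points near y=0 plus some furniture up to 1 m high.
rng = np.random.default_rng(0)
floor = np.column_stack([rng.uniform(-2, 2, 5000),
                         rng.normal(0.0, 0.005, 5000),
                         rng.uniform(-2, 2, 5000)])
furniture = np.column_stack([rng.uniform(-2, 2, 1000),
                             rng.uniform(0.3, 1.0, 1000),
                             rng.uniform(-2, 2, 1000)])
scene = np.vstack([floor, furniture])
height = user_height(scene, camera_y=1.63)
```

Because the floor is found in the map rather than assumed, the estimate works regardless of where the user started or how they are standing.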

VR helps people escape their own reality and enter another but still remains technologically tethered to the real world. The more information gathered from one’s surroundings, the better the virtual reality experience becomes. With a scene model generated through depth sensing and 3D reconstruction, a completely virtual world can incorporate tactile real-world structures in ways that further enhance the virtual immersion. By reskinning the real world to take the appearance of a virtual one, objects in the real world become not only obstacles to be avoided, but objects to interact with: a wall becomes a large fence to peek over and a staircase becomes a rocky hillside to climb. Imagine a virtual RPG (role playing game) in an abandoned mall. The mall escalators could become virtual terrain, the storefronts could become caves to hide in, and the food court could become a virtual bazaar to explore. Suddenly the ordinary can be transformed.

Go forth and build with depth

The benefits of including depth sensing in both Mixed and Virtual Reality headsets are immense. Systems that incorporate depth sensing and spatial awareness, along with inside-out positional tracking, will truly leverage the full potential of the medium. Making depth and spatial understanding a top priority for both VR and MR will be a breakthrough moment for these devices. The headsets and platforms best able to make spatially aware development possible within their ecosystems will have an outsized advantage over the competition. With the combination of depth sensing hardware and spatially aware software, VR and MR will come into their own as cornerstones of entertainment and education, transforming facets of life unimaginable today.


This post is part of our contributor series. It is written and published independently of TNW.
