While most recent phones come with dual rear cameras to capture depth and deliver Portrait Mode-style shots with the background separated from the foreground, Google’s Pixel phones achieve the same effect with a single lens. That applies to the new Pixel 3 as well, and the company’s now explained how this functionality works – and why it used a rig consisting of five phones to perfect it.
For Portrait Mode-style shots on the Pixel 2, Google combined a neural network with the camera’s Phase Detection Autofocus (PDAF) system. PDAF relies on parallax: the sensor captures two views of the scene from slightly different angles, and the apparent shift of objects between those views is used to perceive depth. But that sometimes doesn’t work, because the two viewpoints sit so close together that the shift can be too small to measure reliably.
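To make the parallax idea concrete, here is a minimal sketch of standard stereo triangulation, in which depth is inversely proportional to the pixel shift (disparity) between two views. It is not Google's implementation. The function name `depth_from_disparity` and the focal length and baseline values are illustrative assumptions, not Pixel specifications.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Estimate depth in metres from the pixel shift between two views.

    Uses the textbook pinhole stereo relation:
        depth = focal_length * baseline / disparity
    """
    if disparity_px <= 0:
        # No measurable shift means depth cannot be resolved.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Assumed, illustrative values (NOT real Pixel camera parameters).
FOCAL_LENGTH_PX = 3000.0  # focal length expressed in pixels
BASELINE_M = 0.001        # 1 mm between the two viewpoints

near = depth_from_disparity(3.0, FOCAL_LENGTH_PX, BASELINE_M)
far = depth_from_disparity(1.5, FOCAL_LENGTH_PX, BASELINE_M)
# The same far object with a quarter-pixel measurement error.
far_noisy = depth_from_disparity(1.25, FOCAL_LENGTH_PX, BASELINE_M)

print(near, far, far_noisy)
```

With these assumed numbers, a quarter-pixel error in the measured shift moves the estimate for the far object from about 2 m to about 2.4 m, which illustrates why a very small distance between viewpoints makes depth hard to pin down.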
To fix that, Google made a learning algorithm that corrects the depth perceived by the PDAF system. “Specifically, we train a convolutional neural network, written in TensorFlow, that takes as input the PDAF pixels and learns to predict depth. This new and improved ML-based method of depth estimation is what powers Portrait Mode on the Pixel 3,” Google’s research scientist Rahul Garg said.
The most fascinating thing about these improvements is the method Google used to build the learning algorithm. It made a monster “Frankenphone” rig with five Pixel 3 phones, and used a Wi-Fi-based software tool to make sure all of them captured images simultaneously. This was necessary to capture an object from different angles to teach the algorithm.
The company’s engineers and researchers took photos with this rig as if they were using just one device. This helped them train the algorithm to predict the depth and separate objects better.
With five viewpoints, the team was able to ensure that there was parallax in multiple directions when it shot test photos. Plus, the arrangement of the cameras allowed for more accurate depth estimation than the Pixel 3’s PDAF system, so it was easier to train the algorithm with the data collected using the rig.
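One way to see why parallax in more than one direction helps is the toy sketch below. It is an illustration under stated assumptions, not the method Google used. In a scene made only of horizontal stripes, a sideways shift between two views is invisible, while a vertical shift is easy to recover. The helper names `shift` and `matching_shifts` and the stripe values are hypothetical.

```python
def shift(image, dx, dy):
    """Return the image cyclically shifted by dx columns and dy rows."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


def sum_abs_diff(a, b):
    """Matching cost: total absolute pixel difference between two images."""
    return sum(abs(p - q)
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


def matching_shifts(reference, other, axis, max_shift=3):
    """List every shift along one axis that gives the lowest matching cost.

    More than one entry means the shift, and so the depth, is ambiguous.
    """
    costs = {}
    for s in range(-max_shift, max_shift + 1):
        dx, dy = (s, 0) if axis == "horizontal" else (0, s)
        costs[s] = sum_abs_diff(shift(reference, dx, dy), other)
    best = min(costs.values())
    return [s for s, cost in costs.items() if cost == best]


# Toy scene: horizontal stripes, so brightness changes only from row to row.
stripes = [[value] * 8 for value in [0, 10, 50, 20, 90, 30, 70, 40]]

side_view = shift(stripes, 2, 0)      # second camera offset sideways
vertical_view = shift(stripes, 0, 2)  # second camera offset vertically

print(matching_shifts(stripes, side_view, "horizontal"))    # every candidate ties
print(matching_shifts(stripes, vertical_view, "vertical"))  # a single clear match
```

In this toy case, a camera pair offset only sideways cannot tell how far the stripes have moved, while a pair offset vertically can, so viewpoints spread in several directions cover more kinds of scene content.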
Google uses its TensorFlow Lite machine learning library to run the depth-estimation model on data from the PDAF system using the device’s own processing power, rather than relying on cloud-based services. And as we’ve seen in the above photos, the results are quite pleasant to look at.
It’s interesting to see what goes on behind the scenes with computational photography techniques, especially because we’ll need to continually build on them to improve phone camera performance within the constraints of these devices’ compact form factors.
Google has released a photo album comparing its old and new techniques, showing how portrait shots have improved from one generation of the algorithm to the next, which you can check out here.