How did Google create the Portrait Light effect on the Pixel 5?



We have heard the term "computational photography" countless times over the past two years.

When it comes to computational photography, people naturally think of Google's Pixel phones. The Pixel series arguably pioneered the field, and it is the series that revealed just how powerful and appealing computational photography can be.

It is precisely because computational photography is so powerful that other phone makers, who have only gradually come to their senses over the past two years, finally plunged in. By that point, Google was already off playing with even fancier tricks.

The "Portrait Light Effect" was originally launched with Google's release of Pixel 4a & Pixel 5 in October this year, a feature exclusive to this generation of Pixel. But a few days ago, Google made an update to the camera and photo album applications, delegating this function to users after Pixel 2.


Inspired by the off-camera lights portrait photographers use, Portrait Light models a repositionable light source and adds it to the photo scene. It can also detect the direction and intensity of the existing lighting in a shot and automatically fill in the light accordingly.

Such a powerful computational photography feature naturally depends on machine learning with neural networks. Trained on a database of portrait photos taken with the phone, Portrait Light is built on two new algorithms:

  • Automatic placement of a synthetic light source: for a given portrait, the algorithm synthesizes and places a directional light in a way that is consistent with how a photographer would light the subject in reality.
  • Synthetic post-capture relighting: given a lighting direction and a portrait, synthetic light is added in as natural-looking a way as possible.

Let's start with the first problem: determining where the existing light is and where the new light should go. In reality, photographers work empirically and perceptually, observing the intensity and position of the light falling on the subject's face and then deciding how to light the shot. For an algorithm, however, determining the direction and position of the existing light source is not so easy.

To this end, Google adopted a new machine learning model that estimates an omnidirectional illumination profile. This model treats the human face as a light probe, inferring the direction, relative intensity and color of all the light falling on it, while a separate face algorithm estimates the head pose in the photo.

Although this sounds rather grand, the way the model visualizes its estimate is actually quite charming: it renders the estimated lighting onto three silver-looking spheres. The top sphere has the roughest "texture" and simulates diffuse reflection of light. The middle sphere is also matte and captures a more concentrated reflection of the light source. The bottom sphere has a mirror "material" and simulates smooth specular reflection.

In addition, each sphere reflects the color, intensity and directionality of the environment lighting according to its own material properties.
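
To get a feel for how a reflective probe encodes lighting, here is a minimal Python sketch, not Google's code, that uses an idealized diffuse sphere instead of a face: brightness at each pixel is roughly albedo * max(0, n·l), so a dominant light direction can be recovered from the shading by least squares.

```python
# A minimal sketch (not Google's code) of the light-probe idea, using an
# idealized diffuse (Lambertian) sphere instead of a face.
import numpy as np

def estimate_light_direction(intensity: np.ndarray) -> np.ndarray:
    """intensity: HxW grayscale render of a unit sphere centred in the frame."""
    h, w = intensity.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Map pixels to sphere surface normals (x, y in [-1, 1], z from the sphere equation).
    x = (xs - w / 2) / (w / 2)
    y = (h / 2 - ys) / (h / 2)                      # flip so +y points up
    inside = x**2 + y**2 < 0.95                     # pixels inside the silhouette
    z = np.sqrt(np.clip(1.0 - x**2 - y**2, 0.0, None))
    normals = np.stack([x[inside], y[inside], z[inside]], axis=1)   # N x 3
    values = intensity[inside]
    lit = values > values.mean()                    # crude mask of clearly lit pixels
    # Solve normals @ l ≈ values for the scaled light vector, then normalise.
    l, *_ = np.linalg.lstsq(normals[lit], values[lit], rcond=None)
    return l / np.linalg.norm(l)
```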

From this, Google can work out where the synthetic light source should go. For example, the classic portrait key light sits about 30° above the subject's eyeline and between 30° and 60° off the camera axis, and Google follows this classic rule.
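
As a small illustration of that rule, the hedged sketch below converts the two angles into a 3D light direction. The camera-space convention (x right, y up, z from camera toward subject) is an assumption made for this example only.

```python
# Turn "30° above the eyeline, 30°-60° off the camera axis" into a direction.
import math

def key_light_direction(elevation_deg: float = 30.0, azimuth_deg: float = 45.0):
    """Unit vector pointing from the subject toward the key light."""
    el = math.radians(elevation_deg)    # angle above the eyeline
    az = math.radians(azimuth_deg)      # angle off the camera axis
    x = math.cos(el) * math.sin(az)     # off to one side of the camera
    y = math.sin(el)                    # raised above the eyeline
    z = -math.cos(el) * math.cos(az)    # leaning back toward the camera side
    return (x, y, z)

print(key_light_direction())            # azimuth 45° sits inside the 30°-60° range
```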

Once the model knows where to add a light source to a portrait, the next task is to make the added light look natural.

The first problem is a bit like learning the "Dugu Nine Swords" (a legendary sword art from Jin Yong's wuxia novels): once you have learned it, you can handle a fixed set of problems. Solving the second problem means taking the Dugu Nine Swords into as much real combat as possible, absorbing all kinds of real-world situations, until it can counter every martial art under heaven.


To solve this problem, Google developed another new model, one that adds illumination from a given directional light source to the original photo. Under normal circumstances it is impossible to train such a model with existing data, because no dataset covers the nearly infinite range of possible lighting, with each example perfectly matched to the same human face.

For this reason, Google built a very special rig for gathering training data: a spherical "cage" housing 64 cameras at different viewpoints and 331 individually programmable LED light sources.

If you have ever been to a Dolby Cinema, its pre-show demo includes a segment in which sound travels around a hemispherical dome to simulate the nearly infinite directions of the real world. Google's rig works on a similar principle.


By constantly changing the direction and intensity of the illumination, and by combining individual lights to simulate complex light sources, Google can capture how light reflects off human hair, skin and clothing, and thus learn what a subject should look like under complex lighting.
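
The physical idea behind this kind of light-stage data is that light adds linearly: a photo under any complex environment can be approximated as a weighted sum of captures each lit by a single LED. The sketch below is a simplified illustration of that principle, not Google's pipeline; the array shapes and placeholder data are assumptions.

```python
# Simplified "one light at a time" (OLAT) relighting by linear combination.
import numpy as np

def relight(olat_images: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """
    olat_images: (num_leds, H, W, 3) photos, one per individually lit LED.
    weights:     (num_leds, 3) RGB strength of the target environment toward each LED.
    Returns an (H, W, 3) image approximating the subject under that environment.
    """
    return np.einsum('nc,nhwc->hwc', weights, olat_images)

# With the rig described above there would be 331 OLAT images per camera view.
olat = np.random.rand(331, 32, 32, 3)       # placeholder captures
env = np.random.rand(331, 3)                # placeholder environment weights
relit = relight(olat, env)
```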

Google invited 70 different people, with a range of face shapes, hairstyles, skin tones, clothing, accessories and other characteristics, to train this model. This helps ensure the synthesized light matches reality as closely as possible.

In addition, rather than having the neural network output the final image directly, Google has it output a lower-resolution quotient image.

So what is a quotient image? A picture can be decomposed into two layers: a base layer and a detail layer. The base layer contains the image's low-frequency information, the large-scale changes in intensity; the detail layer contains the high-frequency information, the fine, small-scale detail. The base layer multiplied by the detail layer gives back the source image, which is why the detail layer is also called the quotient image: it is the source image divided by the base layer.
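
Here is a rough sketch of that decomposition. Using a Gaussian blur as the low-frequency base layer is an assumption made purely for illustration; the key relation is simply base * quotient == image.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decompose(image: np.ndarray, sigma: float = 15.0):
    """image: HxW grayscale (apply per channel for colour). Returns (base, quotient)."""
    base = gaussian_filter(image, sigma=sigma) + 1e-6   # low frequencies; avoid divide-by-zero
    quotient = image / base                             # high-frequency detail layer
    return base, quotient

def recombine(base: np.ndarray, quotient: np.ndarray) -> np.ndarray:
    """Multiplying the two layers reconstructs the source image."""
    return base * quotient
```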

Then, taking the base layer of the original image and the network's quotient image, which has the additional light baked in, upsampling the quotient and multiplying the two together yields the final output image.

The final pipeline looks like this: given a photo, compute the surface normals of the subject, estimate the visible lighting in the scene, have the neural network simulate the additional light source and output a low-resolution quotient image, then use that quotient image as the detail layer and multiply it by the base layer of the original photo. The result is a portrait with the additional light source applied.
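
For a concrete feel of that last step only, here is a hedged sketch that upsamples a low-resolution quotient image and multiplies it by the base layer of the photo. The random arrays are placeholders; a real quotient image would come from the relighting network described above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def apply_quotient(photo: np.ndarray, quotient_lowres: np.ndarray) -> np.ndarray:
    """photo: HxW grayscale; quotient_lowres: hxw network output."""
    scale = (photo.shape[0] / quotient_lowres.shape[0],
             photo.shape[1] / quotient_lowres.shape[1])
    quotient = zoom(quotient_lowres, scale, order=1)     # bilinear-style upsampling
    base = gaussian_filter(photo, sigma=15.0)            # low-frequency base layer
    return base * quotient                               # relit portrait

photo = np.random.rand(512, 512)            # placeholder full-resolution photo
quotient_lowres = np.random.rand(128, 128)  # placeholder low-resolution quotient
relit = apply_quotient(photo, quotient_lowres)
```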

Google has also heavily optimized the pipeline so that the simulated light can be adjusted interactively in real time on the phone, while the entire model weighs in at only about 10 MB.


The Portrait Light effect on the Pixel 5 is a textbook example of Google's computational photography: through continuous training of neural network models, the phone can simulate the lighting of a real portrait shoot, opening up yet another new application scenario for computational photography.

Some say photography is an art, and that computational photography is fundamentally an insult to it. But since the Frenchman Daguerre built the first practical camera in 1839, the camera has gone through more than 180 years of history, moving from a niche tool to the masses, until the birth of the phone camera gave nearly everyone an equal opportunity to take pictures. And people's personal expression has, in turn, gradually enriched the art of photography.

That's right, computational photography is as much "computing" as it is "photography", but algorithms have long been an inseparable part of phone photography. What it pursues is still a simulation of effects that could be achieved in reality; after all, nobody calls heavy-handed "magic" retouching computational photography.

As Apple and Google push further and further into computational photography, we are discovering that algorithms are actually a stronger moat than hardware.
