Human-centric AI news and analysis

Google’s new AI can intelligently crop videos for any screen size

How many times have you seen a video being badly cropped when you watch it on a mobile device? It’s quite frustrating and annoying, and most of the time, there’s not much you can do about it.

To address this problem, Google’s AI team has developed an open-source solution, Autoflip, that reframes the video that suits the target device or dimension (landscape, square, portrait, etc.). 

Left: Original video (16:9). Middle: Reframed using a standard central crop (9:16). Right: Reframed with AutoFlip (9:16). By detecting the subjects of interest, AutoFlip is able to avoid cropping off important visual content.
Left: Original video (16:9). Middle: Reframed using a standard central crop (9:16). Right: Reframed with AutoFlip (9:16). By detecting the subjects of interest, AutoFlip is able to avoid cropping off important visual content.

Autoflip works in three stages: Shot (scene) detection, video content analysis, and reframing. The first part is scene detection, in which the machine learning model needs to detect the point before a cut or a jump from one scene to another. So it compares one frame with the previous one before to detect the change of colors and elements.

Three stages of Google Autoflip AI
Three stages of Google Autoflip AI

Once the model determines a shot, it moves on to the video content analysis to determine important objects in a scene. It uses a deep learning neural network to determine not just people or animals, but motion and moving balls in sports, and logos in commercials.

Three stages of Autoflip

For the final stage, the AI model determines if it will use stationary mode for scenes that take place in a single space, or tracking mode for when objects of interest are constantly moving. Based on that, and the target dimensions in which the video needs to be displayed, Autoflip will crop frames while reducing jitter and retaining the content of interest.

[Read: This AI can perfectly dub videos in Indic languages — and correct lip syncing]

Google researchers said that Autoflip can be used to convert videos to many formats and screens without much effort. For the next stage, the team wants to improve object tracking in interviews and animation films. It wants to use text detection and image inpainting techniques to better place foreground and background objects in one frame.

Video conversion in different aspect ratios by Autoflip

You can checkout Autoflip’s code here.


You’re here because you want to learn more about artificial intelligence. So do we. So this summer, we’re bringing Neural to TNW Conference 2020, where we will host a vibrant program dedicated exclusively to AI. With keynotes by experts from companies like Spotify, RSA, and Medium, our Neural track will take a deep dive into new innovations, ethical problems, and how AI can transform businesses. Get your early bird ticket and check out the full Neural track.

Published February 14, 2020 — 13:43 UTC

Corona coverage

Read our daily coverage on how the tech industry is responding to the coronavirus and subscribe to our weekly newsletter Coronavirus in Context.

For tips and tricks on working remotely, check out our Growth Quarters articles here or follow us on Twitter.