Can AI solve developers’ "image" problems?

This is a guest post for the Computer Weekly Developer Network written by Tal Lev-Ami in his capacity as co-founder and CTO of Cloudinary.

California- and Israel-based Cloudinary provides cloud-based image and video management technology that allows users to upload, store, manage, manipulate and ‘deliver’ images and video for websites and applications.

Lev-Ami laments the fact that not everything on the web is as beautiful as it could be, and looks to AI-driven routes to a prettier (and, consequently, an altogether more functional) Internet. He writes as follows…

Visitors to Las Vegas’ famous ‘strip’ are dazzled by a profusion of brightly coloured images and videos. Some of the old neon signs haven’t aged very well, with burnt-out letters and fading intensity. On other structures and facades, modern high-definition videos and images seem to come at you in 3D.

It’s the perfect metaphor for today’s online experience — however, a web visitor’s ecstasy can be a developer’s agony.

To meet consumers’ insatiable appetite for dazzling and immersive user experiences (UX), web developers and designers manually solve the same image and video management problems ad infinitum. Delivering thousands of images, or running sites that let users upload their own, means constantly having to remove backgrounds, crop and resize images, change colours and apply effects.

But aren’t these manual, time-consuming tasks the perfect things to outsource to AI?

Answer: ‘yes’.

Deep learning has completely revolutionised computer vision over recent years — and media asset management is no exception.

Here are two examples where AI is proving particularly useful:

Background removal

AI-based background removal combines a variety of deep-learning algorithms. Together, they recognise the primary foreground subject of a photo and then accurately remove the background in seconds.

This seemingly simple task hides a lot going on behind the scenes. The AI engine must first recognise the salient object(s) in the image, then accurately segment them and, finally, separate the foreground into an alpha layer.

The AI engine must determine which objects to classify as foreground versus background. This classification depends on a scene’s context and composition and must be near-perfect to produce the expected quality. In the case of a picture of a woman wearing a fur coat, for example, the distinction between the fur and hair pixels and the background pixels must be flawless.
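To make the last of those steps concrete, here is a minimal sketch of how a segmentation result might become an alpha layer. It assumes a hypothetical segmentation model has already produced a per-pixel foreground probability (the `mask`); recognition and segmentation themselves are not shown. Keeping the mask soft rather than strictly 0 or 1 is what lets fine detail such as fur or hair come out partially transparent instead of jagged. The function name `apply_alpha_mask` is illustrative only, not Cloudinary’s API.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def apply_alpha_mask(
    pixels: List[List[RGB]],
    mask: List[List[float]],
) -> List[List[RGBA]]:
    """Attach a foreground-probability mask to an image as its alpha channel.

    `mask` is assumed to come from a (hypothetical) segmentation model:
    1.0 = certainly foreground, 0.0 = certainly background, and values in
    between encode soft edges such as fur or hair.
    """
    if len(pixels) != len(mask) or any(
        len(row_px) != len(row_m) for row_px, row_m in zip(pixels, mask)
    ):
        raise ValueError("image and mask must have the same dimensions")

    out: List[List[RGBA]] = []
    for row_px, row_mask in zip(pixels, mask):
        out_row: List[RGBA] = []
        for (r, g, b), p in zip(row_px, row_mask):
            # Clamp the probability to [0, 1], then scale it to 8-bit alpha.
            alpha = round(max(0.0, min(1.0, p)) * 255)
            out_row.append((r, g, b, alpha))
        out.append(out_row)
    return out


# A one-row image: a solid subject pixel, a soft "hair" edge, and background.
rgba = apply_alpha_mask([[(200, 150, 100)] * 3], [[1.0, 0.5, 0.0]])
print([px[3] for px in rgba[0]])
```

The background pixel is not deleted; it simply becomes fully transparent, so the result can be composited over any new backdrop.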

Image auto-cropping & resizing

Site visitors expect top-quality, quick-loading images that display properly regardless of which device they’re using. This involves delivering the same image in many different aspect ratios and potentially cropping closer or wider on your main subject, depending on size. Where hundreds of images are involved, cropping and resizing becomes incredibly tedious and fiddly. Deep-learning algorithms automate this by detecting the image subject, then resizing and cropping the image to the desired delivery size and aspect ratio.

Again, there’s a lot going on behind the scenes.

To decide where to crop, algorithms first analyse an image’s pixels and prioritise the most salient areas on the fly. Auto-cropping algorithms generally give priority to faces, but they use different techniques to determine other salient areas of an image. Our software, for example, uses a neural network to predict where people will look in an image.
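Once a saliency map exists, choosing the crop can be framed as a search: among all windows of the target aspect ratio, pick the one covering the most salience. The sketch below assumes the per-pixel saliency scores are already available (how they are produced, whether by face detection or a gaze-prediction network, is out of scope), and that the crop should be the largest window of that ratio that fits. The name `best_crop` is hypothetical, and this is one plausible approach rather than a description of Cloudinary’s implementation.

```python
import math
from typing import List, Tuple


def best_crop(
    saliency: List[List[float]],
    aspect_w: int,
    aspect_h: int,
) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the largest aspect_w:aspect_h crop
    that captures the most total saliency. Ties go to the top-left window."""
    img_h, img_w = len(saliency), len(saliency[0])

    # Reduce the ratio (e.g. 32:18 -> 16:9), then take the largest
    # integer multiple of it that still fits inside the image.
    g = math.gcd(aspect_w, aspect_h)
    aspect_w, aspect_h = aspect_w // g, aspect_h // g
    scale = min(img_w // aspect_w, img_h // aspect_h)
    if scale == 0:
        raise ValueError("image too small for this aspect ratio")
    crop_w, crop_h = aspect_w * scale, aspect_h * scale

    # Summed-area table: sat[y][x] = sum of saliency over rows < y, cols < x,
    # so any rectangle's total can then be read off in constant time.
    sat = [[0.0] * (img_w + 1) for _ in range(img_h + 1)]
    for y in range(img_h):
        row_sum = 0.0
        for x in range(img_w):
            row_sum += saliency[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum

    # Slide the window over every valid position and keep the best one.
    best_pos, best_score = (0, 0), float("-inf")
    for y0 in range(img_h - crop_h + 1):
        for x0 in range(img_w - crop_w + 1):
            score = (sat[y0 + crop_h][x0 + crop_w] - sat[y0][x0 + crop_w]
                     - sat[y0 + crop_h][x0] + sat[y0][x0])
            if score > best_score:
                best_pos, best_score = (x0, y0), score
    return best_pos[0], best_pos[1], crop_w, crop_h


# An 8x4 image whose only salient pixel (say, a face) sits near the right edge.
sal = [[0.0] * 8 for _ in range(4)]
sal[1][6] = 1.0
print(best_crop(sal, 1, 1))  # a square crop that keeps the face in frame
```

The summed-area table is the key design choice: without it, scoring each candidate window would mean re-adding every pixel inside it, which quickly becomes expensive across hundreds of images and several delivery sizes.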

These are but two examples; AI is applied in lots of other useful ways.

For example, it can help automatically convert landscape-mode video into mobile-optimised portrait mode. In this case, machine learning automatically determines the optimal focus point, such as faces, subjects, products or moving objects.
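As a rough illustration of the reframing step, the sketch below assumes some detector has already supplied one horizontal focus point per frame, and computes where a 9:16 window should sit in each landscape frame. The exponential-moving-average `smoothing` factor is an assumption added so the virtual camera pans steadily rather than jittering as detections shift from frame to frame; the function name and parameters are hypothetical.

```python
from typing import List, Optional


def portrait_crop_offsets(
    focus_xs: List[float],
    frame_w: int,
    frame_h: int,
    smoothing: float = 0.8,
) -> List[int]:
    """Left edge, in pixels, of a 9:16 crop window for each frame.

    `focus_xs` holds one horizontal focus point per frame, assumed to come
    from a (hypothetical) detector for faces, products or moving objects.
    `smoothing` is an assumed exponential-moving-average factor: higher
    values make the virtual camera steadier but slower to follow.
    """
    crop_w = frame_h * 9 // 16  # full frame height, 9:16 width
    if crop_w > frame_w:
        raise ValueError("frame is not wide enough for a 9:16 crop")

    offsets: List[int] = []
    smoothed: Optional[float] = None
    for fx in focus_xs:
        # Blend the new focus point into the running estimate.
        if smoothed is None:
            smoothed = fx
        else:
            smoothed = smoothing * smoothed + (1 - smoothing) * fx
        # Centre the window on the smoothed focus, then keep it inside the frame.
        left = round(smoothed - crop_w / 2)
        offsets.append(max(0, min(frame_w - crop_w, left)))
    return offsets


# A subject drifting right across three frames of 1920x1080 footage;
# the crop is 1080 * 9 // 16 = 607 pixels wide.
print(portrait_crop_offsets([900.0, 1100.0, 1300.0], 1920, 1080))
```

The smoothing trades responsiveness for stability: a heavily smoothed window lags behind a fast-moving subject, so a production system would likely tune or adapt this per shot.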

Deep-learning algorithms for this kind of ‘content-aware’ cropping and scaling make it easy to fit responsive layouts, change product colours and apply effects. Automatic tagging and transcription capabilities use AI to organise and manage images and videos quickly and at scale. Last but not least, auto-tagging algorithms help developers to better manage, reuse and analyse incoming user-generated content.

Why do I know all this?

Because before my fellow co-founders and I established Cloudinary in 2011, we were working as consulting engineers and found ourselves manually repeating the same image-related tasks time and time again. Fortunately, AI evolved at just the right time to significantly help solve our own ‘image problems’.

Today, we help brands spend less time doing grunt work related to images and more time delivering the kinds of online experiences that boost their businesses.