Invented by Lidiia Bogdanovych, William Brendel, Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang, Snap Inc

The market for generating an image mask using machine learning has seen significant growth in recent years. With increasing demand for image editing and manipulation in industries such as advertising, e-commerce, and entertainment, efficient and accurate image masking has become crucial. Image masking is the process of separating an object or subject from its background in an image. It is a fundamental step in many image editing tasks, including background removal, object extraction, and image compositing. Traditionally, image masking has been performed manually by graphic designers using tools like Photoshop, but this manual approach is time-consuming, labor-intensive, and often imprecise. Machine learning algorithms have revolutionized the image masking process by automating it and improving its accuracy: deep learning models analyze the visual features of an image and generate accurate, detailed image masks. Growth in this market has been driven by the increasing availability of large-scale annotated datasets and the development of powerful deep learning models.

One key driver of market growth is demand for image editing and manipulation in the advertising industry. Advertisers need visually appealing, attention-grabbing images for their campaigns. Image masking with machine learning lets them remove backgrounds, extract objects, and create composite images far faster than manual editing, which has driven adoption of machine learning-based masking solutions by advertising agencies and marketing departments.

The e-commerce industry is another major market. E-commerce platforms require high-quality product images with clean, consistent backgrounds to attract customers. Machine learning-based masking solutions automatically remove backgrounds from product images, producing professional-looking visuals that enhance the shopping experience, and have accordingly been adopted by e-commerce platforms and online retailers.

The entertainment industry has also embraced machine learning-based masking techniques. Film and video production often requires complex visual effects and seamless compositing of different elements. Machine learning algorithms can accurately extract objects from video footage, letting filmmakers integrate computer-generated imagery (CGI) and create realistic visual effects. Visual effects studios and post-production houses have adopted these solutions in turn.

In terms of market players, several companies have emerged as leaders in this field, offering software solutions and APIs that use deep learning to automate the masking process. They provide user-friendly interfaces for uploading images and generating accurate masks in a few clicks, and some offer cloud-based solutions for processing large volumes of images quickly and efficiently.

In conclusion, the market for generating image masks using machine learning has grown significantly with the demand for image editing and manipulation across industries. Machine learning has made background removal, object extraction, and image compositing accurate and efficient, and the market is expected to continue growing as more industries recognize the time savings, cost-effectiveness, and improved visual quality these solutions provide.

The Snap Inc. invention works as follows

A machine-learning system can generate a pixel mask: an image mask that contains per-pixel class assignments. Pixels can be classified into classes such as face, clothes, or body skin. The machine learning system can be implemented with convolutional neural networks on devices with limited resources, such as mobile phones. The pixel mask allows video effects to be displayed more accurately when they interact with the user or subject in an image.
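A pixel mask of this kind can be represented as an array with the same height and width as the image, where each entry is a class ID. The sketch below illustrates the data structure only; the class names and sizes are illustrative assumptions, not details from the patent.

```python
import numpy as np

# Hypothetical class labels for a segmentation mask (illustrative names,
# not taken from the patent).
CLASSES = {0: "background", 1: "face", 2: "clothes", 3: "body_skin"}

# A pixel mask has the same height and width as the source image; each
# entry is the class ID assigned to that pixel.
height, width = 4, 6
mask = np.zeros((height, width), dtype=np.uint8)
mask[1:3, 2:5] = 1  # mark a small rectangular region as "face"

# Summarize how many pixels were assigned to each class.
ids, counts = np.unique(mask, return_counts=True)
summary = {CLASSES[i]: int(c) for i, c in zip(ids, counts)}
print(summary)  # → {'background': 18, 'face': 6}
```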

Background for Generating an Image Mask using Machine Learning

In recent years, mobile computing device users have been able to apply various image effects to images captured via client devices (e.g., image overlays, video effects). Image effects can also be applied to specific regions of an image (e.g., recoloring pixels on a person's face while leaving the pixels of that person's hair unaltered). However, labeling different areas of an image can be computationally expensive, especially for mobile devices with limited processing power or memory.

The following description includes systems, techniques, instruction sequences, and computer program products that embody illustrative embodiments. For purposes of explanation, numerous specific details are set forth in the following description to provide an understanding of various embodiments of the inventive subject matter. It will be evident, however, to those skilled in the art that embodiments may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques are not necessarily shown in detail.

In recent years, mobile computing has allowed users to apply various image effects to images captured by a client device. Image masks can be used to apply image effects to specific regions of an image. An image mask is a representation that labels specific areas of an image. Image masks, for instance, can be used to change the color of pixels on a person's face while leaving pixels on their hair untouched. However, creating image masks that cover different areas of an image can be computationally demanding, especially for mobile devices that have limited memory and processing power.

To this end, an image mask can be used to label different parts of an image with different values. A face mask, for example, may leave pixel values inside the face unmodified while setting values outside the face to zero. The labels are mask data and can be stored in another image file: an image mask file can be an image with the same dimensions as the original, but where each pixel is set to a value indicating the area or label to which it belongs. In some embodiments, the mask data (e.g., per-pixel label values) can be stored in channel data for each pixel (for example, instead of an RGB value with three channels, a pixel could have an RGBM value with four channels, the fourth holding the mask value). Image mask data can also be stored as polygon metadata: each polygon is a boundary enclosing a particular type of masked region, and its vertices may be stored in metadata for the image. Image masks allow video and image effects to be applied more precisely.
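The four-channel "RGBM" storage idea can be sketched as follows. The array names and sizes are illustrative; the point is simply that the mask plane travels as an extra channel alongside the RGB data and can be sliced back out later.

```python
import numpy as np

# Illustrative sketch of storing mask data as a fourth channel (RGBM).
h, w = 2, 3
rgb = np.random.randint(0, 256, size=(h, w, 3), dtype=np.uint8)
mask = np.array([[0, 1, 1],
                 [0, 0, 1]], dtype=np.uint8)  # per-pixel labels

# Stack the label plane as an extra channel, giving each pixel an RGBM value.
rgbm = np.concatenate([rgb, mask[..., None]], axis=-1)
assert rgbm.shape == (h, w, 4)

# Recover the mask later by slicing off the fourth channel.
recovered = rgbm[..., 3]
assert np.array_equal(recovered, mask)
```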

In some embodiments, training images have polygons labeled around each area (e.g., a face region, a hair region, a clothes region). The polygons can be created by humans (e.g., by clicking and dragging over different areas, or highlighting different areas, to create the polygons). The vertices of the polygons can be stored as label metadata.
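Storing polygon vertices as label metadata might look like the following minimal sketch. The dictionary structure, region names, and coordinates are all assumptions for illustration; the patent does not specify a serialization format.

```python
import json

# Illustrative label metadata: each region type maps to a list of (x, y)
# polygon vertices stored alongside the image (structure is assumed).
label_metadata = {
    "face":    [(10, 5), (30, 5), (30, 25), (10, 25)],
    "clothes": [(0, 30), (40, 30), (40, 60), (0, 60)],
}

# Serialize the vertex lists so they can travel with the image file.
serialized = json.dumps(label_metadata)
restored = json.loads(serialized)

# JSON represents tuples as lists, so compare element-wise on round-trip.
assert [tuple(v) for v in restored["face"]] == label_metadata["face"]
```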

The labeled training images (e.g., image masks) can be resized to create sets of training images of different sizes. As part of multi-scale training, a segmentation engine that implements a neural network can train its model using the different-sized sets of training images. The neural network is configured to accept a given image as input and produce an image mask with different areas labeled. After training, client devices can run the neural network on images of different resolutions, as discussed further below. In some embodiments, the generated image masks can be further refined during a post-processing stage to remove noise and refine the borders of labeled areas.
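Producing the different-sized sets of labeled images can be sketched as below. This uses nearest-neighbour resizing because interpolating label values would invent class IDs that do not exist; the scale factors and mask contents are illustrative assumptions.

```python
import numpy as np

def resize_nearest(mask, new_h, new_w):
    """Nearest-neighbour resize, appropriate for label masks because it
    never blends class IDs the way interpolation would."""
    h, w = mask.shape
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source col for each output col
    return mask[rows[:, None], cols]

# Build several training scales from one labeled mask, in the spirit of
# the multi-scale training described above (sizes are illustrative).
base = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [2, 2, 1, 1],
                 [2, 2, 1, 1]], dtype=np.uint8)
scales = {s: resize_nearest(base, s, s) for s in (2, 4, 8)}
```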

At runtime, a user can use a client device to capture one or more images (e.g., a photo, a video). The image mask system can then detect and label the different areas as segments of an image mask. Different visual effects can be applied to different labeled regions: if the mask has a hat segment, for example, the depicted hat can be replaced by a cartoon hat. According to certain embodiments, the modified images (e.g., an image of a person wearing a zany cap) can be posted as an ephemeral message on a social network directly from the client device.
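Applying an effect only to a labeled region reduces to indexing the image with the mask. A minimal sketch, assuming a hypothetical "hat" label ID and a simple recoloring effect in place of the cartoon-hat replacement:

```python
import numpy as np

# Hypothetical label ID for the "hat" region (illustrative, not from
# the patent).
HAT = 3

image = np.zeros((4, 4, 3), dtype=np.uint8)  # stand-in for a captured frame
mask = np.zeros((4, 4), dtype=np.uint8)
mask[0, 1:3] = HAT  # two pixels labeled as hat

# Recolor just the hat pixels, leaving every other pixel untouched.
effect_color = np.array([255, 0, 0], dtype=np.uint8)
edited = image.copy()
edited[mask == HAT] = effect_color
```

A real effect would composite artwork rather than recolor, but the region selection via `mask == HAT` is the same.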

FIG. 1 is a block diagram showing an example messaging system 100 for exchanging data (e.g., messages and associated content) over a network. The messaging system 100 includes multiple client devices 102, each of which hosts a number of applications including a messaging client application 104.

Accordingly, each messaging client application 104 can communicate and exchange data over the network with other messaging client applications 104 and with the messaging server system 108. Data exchanged between messaging client applications 104, and between a messaging client application 104 and the messaging server system 108, includes both functions (e.g., commands to invoke functions) and payload data (e.g., text, audio, or video data).

The messaging server system 108 provides server-side functionality via the network 106 to a particular messaging client application 104. Although certain functions of the messaging system 100 are described herein as being performed either by the messaging client application 104 or by the messaging server system 108, the location of certain functionality within either is a design choice. For example, it may be technically preferable to initially deploy certain technology and functionality within the messaging server system 108, and later migrate it to the messaging client application 104 when a client device 102 has sufficient processing capacity.

The messaging server system 108 supports various services and operations provided to the messaging client application 104, including sending data to, receiving data from, and processing data generated by it. This data may include, for example, message content, client device information, geolocation information, media annotations and overlays, message persistence conditions, social network information, and live event information. Data exchanges within the messaging system 100 are invoked and controlled through functions available via user interfaces of the messaging client application 104.

Turning now to the messaging server system 108, an API server 110 is coupled to, and provides a programmatic interface to, an application server 112. The application server 112 is communicatively coupled to a database server 118, which facilitates access to a database 120 in which is stored data associated with messages processed by the application server 112.

The API server 110 receives and transmits message data (e.g., commands and message payloads) between the client devices 102 and the application server 112. Specifically, the API server 110 provides a set of interfaces (e.g., routines and protocols) that can be called by the messaging client application 104 to invoke functionality of the application server 112. The API server 110 exposes various functions supported by the application server 112, including account registration; login functionality; the sending of messages from one messaging client application 104 to another; the sending of media files (e.g., images, video) from a messaging client application 104 to the messaging server application 114, for possible access by another messaging client application 104; and the setting up and retrieval of collections of media data (e.g., stories).

The application server 112 hosts a number of applications and subsystems, including a messaging server application 114, an image processing system 116, and a social network system 122. The messaging server application 114 implements various message processing technologies and functions, particularly related to the aggregation of textual and multimedia content included in messages received from multiple instances of the messaging client application 104. Text and media content from multiple sources may be aggregated into collections of content (e.g., called stories or galleries), as described in further detail below. The messaging server application 114 makes these collections available to the messaging client application 104. The messaging server application 114 may also perform other processor- and memory-intensive data processing, in view of the hardware requirements for such processing.

The application server 112 also includes an image processing system 116, which performs various image processing operations, typically with respect to images or video received within the payload of a message at the messaging server application 114.

The social network system 122 supports various social networking functions and services, and makes these available to the messaging server application 114. To this end, the social network system 122 maintains and accesses an entity graph (e.g., entity graph 304) within the database 120. Examples of functions and services supported by the social network system 122 include identifying other users of the messaging system 100 with whom a particular user has relationships or whom that user is "following."
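The relationship and "following" queries described above amount to lookups over an entity graph. A minimal sketch, where the adjacency structure and user names are assumptions for illustration rather than the patent's actual data model:

```python
# Illustrative entity graph: each user maps to the set of users they
# follow (structure and names are assumed, not from the patent).
entity_graph = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": set(),
}

def following(user):
    """Users that the given user follows."""
    return entity_graph.get(user, set())

def followers(user):
    """Users whose follow sets include the given user."""
    return {u for u, follows in entity_graph.items() if user in follows}
```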

The application server 112 is communicatively coupled to a database server 118, which facilitates access to a database 120 in which is stored data associated with messages processed by the messaging server application 114.

FIG. 2 is a block diagram illustrating further details regarding the messaging system 100, according to example embodiments. Specifically, the messaging system 100 is shown to comprise the messaging client application 104 and the application server 112, which in turn embody a number of subsystems, namely an ephemeral timer system 202, a collection management system 204, an annotation system 206, and an image masking system 210.

The ephemeral timer system 202 is responsible for enforcing the temporary access to content permitted by the messaging client application 104 and the messaging server application 114. To that end, the ephemeral timer system 202 incorporates a number of timers that, based on duration and display parameters associated with a message (e.g., a SNAPCHAT story), selectively display and enable access to messages and associated content via the messaging client application 104. Further details regarding the operation of the ephemeral timer system 202 are provided below.
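The core timer check can be sketched as a comparison of elapsed time against a display duration. This is a minimal sketch of the idea, assuming a single per-message duration parameter; the function name and timestamps are illustrative, not from the patent.

```python
import time

def is_accessible(posted_at, display_duration, now=None):
    """Return True while the message is still within its display window.

    posted_at and now are epoch seconds; display_duration is in seconds.
    """
    now = time.time() if now is None else now
    return (now - posted_at) < display_duration

# A message posted at t=1000 with a 10-second window:
posted = 1_000.0
assert is_accessible(posted, display_duration=10.0, now=1_005.0)      # visible
assert not is_accessible(posted, display_duration=10.0, now=1_020.0)  # expired
```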

The collection management system 204 is responsible for managing collections of media (e.g., collections of image, text, audio, and video data). In some examples, a collection of content (e.g., messages, including images, video, text, and audio) may be organized into an "event gallery" or an "event story." Such a collection may be made available for a specified time period, such as the duration of an event to which the content relates; for example, content relating to a music concert may be made available as a "story" for the duration of that concert. The collection management system 204 may also be responsible for publishing an icon that provides notification of the existence of a particular collection to the user interface of the messaging client application 104.

The collection management system 204 furthermore includes a curation interface 208 that allows a collection manager or curator to manage and curate a particular collection of content. For example, the curation interface 208 enables an event organizer to curate a collection of content relating to a specific event (e.g., to delete inappropriate content or redundant messages). Additionally, the collection management system 204 employs machine vision (or content rules) to curate content automatically. In some embodiments, a user may receive compensation for the inclusion of their user-generated content in a collection; in such cases, the curation interface 208 operates to automatically make payments to users who have provided content.
