Artificial Intelligence – Yu-Han Chang, Rajiv Maheswaran, Jeffrey Wayne Su, Noel Hollingsworth, Genius Sports SS LLC

Abstract for “Methods and systems of spatiotemporal pattern detection for video content creation”

“Interaction with a broadcast video stream is provided by a machine learning facility that processes a broadcast video feed through a spatiotemporal pattern recognition algorithm, which applies machine learning to at least one event in the video feed to develop an understanding of that event. Developing the understanding involves identifying context information relating to the event and identifying an entry in a relationship library that details a relationship between at least two features visible in the video feed. A touch screen interface allows at least one broadcaster to access a portion of the video feed’s content through interaction options that are based on the context information. Remote viewers can also interact with the content through an interface.”

Background for “Methods and systems of spatiotemporal pattern detection for video content creation”

“Field of Invention”

The present application relates generally to a system for performing analysis of events appearing in live and recorded video feeds, such as sporting events. More particularly, it relates to systems and methods for spatiotemporal analysis and extraction of components and elements of events in a video feed, including systems for discovering and learning about such events and related metrics, as well as systems and methods for visualization of, and interaction with, such events and metrics.

“Description of Related Art”

Live events such as sports continue to gain popularity and generate enormous revenue for colleges and franchises. Quantitative methods such as sabermetrics have become more popular and widely accepted as an enhancement to traditional scouting methods, as they can provide valuable insights and a competitive edge in these endeavors. However, because of the sheer volume of sporting information generated every day, it is impossible to store and evaluate all of the data, and adequate tools for extracting and analyzing such information are not available.

“Systems now exist that capture and encode live event information, such as X, Y, Z motion data for sporting events, captured using imaging cameras installed in National Basketball Association (NBA) arenas. These systems have many limitations, including difficulty handling the data, difficulty transforming X, Y, Z data into meaningful and current sports terminology, difficulty identifying meaningful insights in the data, and difficulty visualizing the results. There are also untapped opportunities to extract new insights from the data. A need therefore exists for systems and methods that can analyze video feeds to find relevant events and present them as metrics and insights.”

“In accordance with various exemplary and other non-limiting embodiments, the methods and systems described herein allow the exploration of event data from video feeds, the discovery and presentation of relevant events (such as within a live sporting event video feed), and the presentation and analysis of new insights, analytic results, and visual displays that enhance decision making, provide better entertainment, and provide additional benefits.”

“In embodiments, data taken from a video feed enables an automated machine understanding of a game, and the alignment of video sources to this understanding allows for the automatic delivery of highlights to an end user. In embodiments, using machine learning to develop an understanding of an event includes using a plurality of events in position tracking data obtained from at least one of a video feed and a chip-based tracking system, where the events are based on at least two of spatial configuration, relative movement, and projected motion. In embodiments, using machine learning to develop the understanding includes aligning multiple unsynchronized input feeds relating to the event (e.g., tracking video, broadcast video/audio, and play-by-play data) using at least one of an algorithm and a hierarchy of human operators. The multiple unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the event, one or more feeds of tracking video for the event, and one or more feeds of play-by-play data for the event, and may include feeds of two or more different types relating to the event. Embodiments may also include at least one of validating, refining, and modifying the understanding or the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds.”
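By way of illustration only, and not as the application's own implementation, the following minimal Python sketch shows how two of the cues just named (relative movement and projected motion) might flag a candidate event in position tracking data. The frame format, thresholds, basket location, and the "drive" interpretation are all assumptions.

```python
# Hypothetical sketch: flag spans where a tracked player moves quickly
# toward the basket, combining relative movement and projected motion.
import math

def detect_drive(frames, speed_thresh=2.0, basket=(5.25, 25.0)):
    """frames: list of dicts {"t": seconds, "x": feet, "y": feet} for one
    player. Returns (start_t, end_t) spans of fast motion toward the hoop."""
    spans, start = [], None
    for prev, cur in zip(frames, frames[1:]):
        dt = cur["t"] - prev["t"]
        if dt <= 0:
            continue
        vx = (cur["x"] - prev["x"]) / dt
        vy = (cur["y"] - prev["y"]) / dt
        speed = math.hypot(vx, vy)
        # Projected motion: does the velocity vector point at the basket?
        to_basket = (basket[0] - cur["x"], basket[1] - cur["y"])
        norm = math.hypot(*to_basket) or 1e-9
        toward = (vx * to_basket[0] + vy * to_basket[1]) / (speed * norm or 1e-9)
        if speed > speed_thresh and toward > 0.8:   # fast and aimed at hoop
            start = start if start is not None else prev["t"]
        elif start is not None:
            spans.append((start, prev["t"]))
            start = None
    if start is not None:
        spans.append((start, frames[-1]["t"]))
    return spans

frames = [{"t": 0.0, "x": 40.0, "y": 25.0},
          {"t": 0.5, "x": 35.0, "y": 25.0},
          {"t": 1.0, "x": 30.0, "y": 25.0}]
print(detect_drive(frames))  # [(0.0, 1.0)]
```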

“In embodiments, the content that displays an event is automatically extracted from a video feed, based on the machine understanding of the event. Extracting the content may involve automatically extracting a video cut using a combination of the machine understanding of the event and a machine understanding, developed with machine learning, of another input feed relating to the event, such as a broadcast video feed, an audio feed, or a closed caption feed. The machine understanding of the other input feed may include understanding at least one of a portion of the broadcast commentary and a change of camera view within that input feed. In embodiments, the extracted video cut may be edited and combined with other content.”

“Embodiments may also include automatically creating a semantic index of a video feed based on the machine understanding of at least one event, indicating the game time and the location of the display of the event within the video feed. The location may be expressed as a location on a display screen, such as a location in pixels, a location in voxels, or other similar information. The semantic index may be used to augment the video feed by adding content based on the identified location, and to enable at least one of a touch interface feature and a mouse interface feature based on the identified location.”
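As a non-authoritative illustration of what a semantic index entry might hold, the sketch below assumes the fields named above (event type, game time, and an on-screen pixel location); all field names are hypothetical, not the application's.

```python
# One possible shape for a semantic index entry; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class SemanticIndexEntry:
    event_type: str              # e.g. "dunk" or "pick-and-roll"
    game_time: float             # seconds of game clock
    video_time: float            # seconds into the video feed
    pixel_xy: tuple              # (x, y) screen location of the event
    players: list = field(default_factory=list)  # participant identifiers

index = [
    SemanticIndexEntry("dunk", 512.4, 1733.0, (640, 380), ["player_23"]),
]
# An overlay layer could anchor graphics or touch targets at pixel_xy.
dunks = [e for e in index if e.event_type == "dunk"]
print(dunks)
```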

“In accordance with further exemplary and non-limiting embodiments, a method includes receiving a configuration of a sport playing field along with at least one image, and determining a camera pose based at least in part on the playing field configuration and the at least one image.”

“According to further exemplary and non-limiting embodiments, a method includes performing the recognition automatically and augmenting the video input with at least one of additional imagery and graphics rendered within the reconstructed 3D space of the scene.”
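One plausible way to realize camera pose recovery from a known playing-field configuration is a perspective-n-point solve, sketched below with OpenCV. The court landmarks, their detected pixel positions, and the intrinsic matrix are placeholder values for a minimal sketch, not the application's method.

```python
# Hedged sketch: recover camera pose from known court landmarks (PnP).
import numpy as np
import cv2

# 3D court landmarks in feet (z = 0 on the floor plane): four corners.
court_pts = np.array([[0, 0, 0], [94, 0, 0], [94, 50, 0], [0, 50, 0]],
                     dtype=np.float64)
# Where those landmarks were detected in the image (pixels; placeholders).
image_pts = np.array([[102, 560], [1180, 548], [1010, 230], [250, 238]],
                     dtype=np.float64)
# Approximate intrinsics for a 1280x720 broadcast frame (assumed).
K = np.array([[1000, 0, 640], [0, 1000, 360], [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(court_pts, image_pts, K, None)
if ok:
    print("rotation:", rvec.ravel(), "translation:", tvec.ravel())
```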

“Methods and systems described herein include taking a live video feed of an event; using machine learning to develop an understanding of the event; automatically, under computer control, aligning the video feed with that understanding; and producing a transformed video feed that includes at least one highlight based on the machine learning. In some embodiments the event may be a sporting event; in others it may be an entertainment event, a television event, or a movie event. The event may be a playground pick-up game or another amateur sporting game, or any human activity, motion, or movement in a home, a commercial establishment, or another location. In embodiments the transformed video feed provides a highlight video feed for a defined set of players, which may include a group of fantasy players. Embodiments may include delivering the transformed video feed to at least one of an inbox, a mobile device, a tablet, an application, a scoreboard, and a Jumbotron display.”

The methods and systems described herein include taking a source feed relating to an event; using machine learning to develop an understanding of the event; automatically aligning the source feed with that understanding; and producing a transformed feed that contains at least one highlight based on the machine learning. In some embodiments the event may be a sporting event; in others it may be an entertainment event, a television event, or a movie event. The source feed may be at least one of an audio feed, a text feed, and a speech feed.

Methods and systems described in this document may include taking a data set associated with a live video feed of an event; extracting spatiotemporal characteristics of the event; applying machine learning to determine at least one spatiotemporal pattern of the event; and using a human validation process for at least one of validating and teaching the machine learning of that spatiotemporal pattern. In some embodiments the event may be a sporting event.

“Methods and systems described herein may include taking at least one of an image feed and a video feed of a venue; taking data relating to the venue’s known configuration; and automatically, under computer control, recognizing a camera pose based on the feed and the known configuration. The venue may be a venue for a sporting event.”

“Methods and systems described herein may include taking at least one feed selected from the group consisting of a video feed and an image feed of a scene; taking data relating to a known configuration of a venue; automatically, under computer control, recognizing a camera pose based on the feed and the known configuration; and augmenting the at least one feed with at least one of an image and a graphic within the space of the scene. Human input may be used to aid and validate the automatic recognition of the camera pose. Methods and systems may also include presenting at least one metric within the augmented feed, and may enable a user to interact with at least one of the video feed and a frame of the video feed through a 3D user interface. These methods and systems may include augmenting the at least one feed to create a transformed feed, and the transformed feed may provide a highlight video feed for a defined set of users.”
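Continuing the assumptions of the earlier pose sketch, once a pose (rvec, tvec) is available, a graphic can be rendered into the reconstructed 3D space of the scene by projecting its 3D coordinates into the frame. The marker location and drawing style below are purely illustrative.

```python
# Hedged sketch: project a 3D floor point into the frame and draw it,
# reusing the rvec/tvec/K placeholders from the PnP sketch above.
import numpy as np
import cv2

def draw_floor_marker(frame, rvec, tvec, K, spot=(47.0, 25.0)):
    """Project a 3D floor point (feet, z = 0) into pixels and draw a circle."""
    pt3d = np.array([[spot[0], spot[1], 0.0]], dtype=np.float64)
    px, _ = cv2.projectPoints(pt3d, rvec, tvec, K, None)
    x, y = px.ravel().astype(int)
    cv2.circle(frame, (x, y), 12, (0, 255, 0), 2)  # draws in place
    return frame
```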

Methods and systems described in this document may include taking a video feed of an event; extracting spatiotemporal characteristics of the event; applying machine learning to determine at least one spatiotemporal pattern; and calculating a metric based on that pattern. In embodiments the metric may include at least one of a shot quality (SEFG) metric, an EFG+ metric, a rebound positioning metric, a rebound attack metric, a rebound conversion metric, and a count of events of an event type.
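The SEFG and EFG+ metrics named above are the application's own. As a grounding example of a metric computed from indexed events, the sketch below computes the classic effective field goal percentage, eFG% = (FGM + 0.5 × 3PM) / FGA, from assumed event fields.

```python
# Sketch: effective field goal percentage from indexed shot events.
def effective_fg_pct(shot_events):
    """shot_events: iterable of dicts {"made": bool, "three": bool}."""
    attempts = 0
    made = 0.0
    for s in shot_events:
        attempts += 1
        if s["made"]:
            made += 1.5 if s["three"] else 1.0  # threes count 1.5x
    return made / attempts if attempts else 0.0

print(effective_fg_pct([{"made": True, "three": True},
                        {"made": False, "three": False}]))  # 0.75
```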

“Methods and systems described herein may include an interactive graphical user interface that allows exploration of machine learning data from a live video feed and enables the exploration and analysis of events. In embodiments the graphical user interface may be an interface of a mobile device, a tablet, a laptop, a large-format touch screen, or a personal computer. The data may be structured to show at least one of a breakdown, a ranking, a field-based comparison, and a statistical comparison. Exploration may be enabled by at least one of a touch interaction, a gesture interaction, a voice interaction, and a motion-based interaction.”

“Methods and systems described herein may include taking a data set associated with a live video feed of an event; automatically, under computer control, recognizing a camera pose for the video; tracking at least one of a player and an object in the video feed; and placing the tracked player or object at a spatial location corresponding to its spatial coordinates.”
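A minimal sketch of the last step, placing tracked objects at spatial coordinates: assuming a player's floor-contact point is known in pixels and four floor landmarks have been matched, a homography maps pixels to court coordinates. The correspondences below are placeholders.

```python
# Sketch: map a floor-contact pixel to court coordinates via homography.
import numpy as np
import cv2

image_pts = np.array([[102, 560], [1180, 548], [1010, 230], [250, 238]],
                     dtype=np.float32)                 # detected (placeholder)
court_pts = np.array([[0, 0], [94, 0], [94, 50], [0, 50]], dtype=np.float32)
H = cv2.getPerspectiveTransform(image_pts, court_pts)  # pixel -> feet

def pixel_to_court(px, py):
    """Map a pixel at a player's feet to (x, y) in court feet."""
    p = cv2.perspectiveTransform(np.array([[[px, py]]], dtype=np.float32), H)
    return tuple(p.ravel())

print(pixel_to_court(640, 400))
```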

The methods and systems described herein include taking a data set associated with a live video feed of an event; extracting spatiotemporal characteristics of the event; applying machine learning to determine at least one spatiotemporal pattern; and providing contextualized information during the event. The contextualized information may include at least one of a replay, a visualization, a highlight, and a compilation of highlights. The information may be delivered to at least one of a mobile device, a tablet, and a broadcast video feed. Methods and systems may include a touch screen interaction that displays at least one item of the contextualized information.

“In some embodiments, the methods and systems described herein include taking a live video feed of an event; identifying the point of view of a participant; and automatically selecting from the video feed a plurality of video frames showing at least one view taken from the viewpoint of the participant. These methods and systems may also include rendering a 3D movie using the selected plurality of video frames, and may include an interface that allows a user to select a participant from among a number of participants. In embodiments the event may be a sporting event, where the participant is a player. Embodiments may include basketball, and the video feed may be linked to 3D motion capture data captured by the cameras that capture the video feed.”

“In embodiments, a method of providing enhanced video content includes processing at least one video feed through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of a plurality of events and to determine at least one event type for each event within the at least one video feed, wherein the at least one event type is an entry in a relationship library that describes a relationship between at least two features visible within the at least one video feed. The method includes extracting a plurality of video cuts from the at least one video feed, and indexing the extracted plurality of video cuts based on the at least one event type, determined by the machine learning, that corresponds to an event detectable in each of the plurality of video cuts. The method further includes automatically, under computer control, generating an enhanced video content data structure from the extracted plurality of video cuts based on the indexing.”

“In embodiments, the at least one spatiotemporal pattern recognition algorithm uses at least one pattern selected from the group consisting of: relative motion of at least two visible features toward each other for at least a duration threshold; acceleration of motion of at least two visible features toward each other of at least an acceleration threshold; a rate of motion of at least two visible features toward each other; a projected point of intersection of at least two visible features; and a separation distance between at least two visible features that is less than a separation threshold. In embodiments, the automatic generation of the enhanced video content data structure uses a combination of the understanding of the plurality of events and an understanding, developed with machine learning, of at least one of a broadcast video event and a broadcast audio event. In embodiments, generation of the enhanced video content data structure is based at least in part on at least one of a preference and a profile of the user for whom the enhanced video content data structure is generated.”
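The listed patterns lend themselves to simple geometric predicates over tracked positions. The sketch below illustrates three of them (closing rate, separation threshold, and sustained approach for a duration threshold) over two features given as (t, x, y) series; all thresholds are assumptions for illustration.

```python
# Sketches of pattern predicates over two tracked features.
import math

def separation(a, b):
    """Distance between two samples (t, x, y)."""
    return math.hypot(a[1] - b[1], a[2] - b[2])

def closing_speed(a0, a1, b0, b1):
    """Rate at which two features move toward each other (units/s)."""
    dt = a1[0] - a0[0]
    return (separation(a0, b0) - separation(a1, b1)) / dt if dt else 0.0

def within_separation(a, b, limit=6.0):
    """Separation distance below a threshold, e.g. a screen forming."""
    return separation(a, b) < limit

def sustained_approach(series_a, series_b, min_dur=0.5):
    """Relative motion toward each other sustained for a duration threshold."""
    run = 0.0
    for a0, a1, b0, b1 in zip(series_a, series_a[1:], series_b, series_b[1:]):
        if closing_speed(a0, a1, b0, b1) > 0:
            run += a1[0] - a0[0]
            if run >= min_dur:
                return True
        else:
            run = 0.0
    return False
```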

“In some embodiments, the method provides a user interface that can be displayed on a mobile device, in which at least one of a search option and a filtering option allows a user to specify and choose a description of a type of event, and the enhanced video content data structure is matched to the description. In various embodiments, using machine learning to develop the understanding of the events includes using position tracking data for a plurality of events over time, obtained from at least one of a video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game.”

In some embodiments, using machine learning to develop the understanding of the plurality of events involves aligning multiple unsynchronized input feeds relating to one event of the plurality using at least one of an algorithm and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the event, one or more feeds of tracking video for the event, and one or more play-by-play data feeds of the event, and may include at least three feeds of at least two different types relating to the event. The method may also include at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds, and, in some embodiments, at least one of validating and modifying the understanding developed by the machine learning using such a hierarchy. The method, in some embodiments, includes automatically creating a semantic index of the at least one video feed based on the understanding of at least one of the plurality of events in the feed, the index indicating the game time and the location of the display of the event in the feed.
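As a toy illustration of aligning unsynchronized feeds, the sketch below estimates the clock offset between two feeds by matching event times observed in both (say, made baskets in a play-by-play feed versus in tracking data). A production system would layer on the hierarchy of algorithms and human review described above; the tolerance and brute-force search here are assumptions.

```python
# Sketch: estimate the clock offset (b = a + offset) matching most events.
def estimate_offset(times_a, times_b, tol=1.0):
    best, best_hits = 0.0, -1
    for ta in times_a:                      # try offsets implied by each pair
        for tb in times_b:
            off = tb - ta
            hits = sum(any(abs(x + off - y) < tol for y in times_b)
                       for x in times_a)
            if hits > best_hits:
                best, best_hits = off, hits
    return best

# Play-by-play clock vs. tracking clock; all three events line up at ~122.1 s.
print(estimate_offset([10.0, 55.2, 90.1], [132.1, 177.3, 212.2]))
```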

“In embodiments, the location of the display of the at least one event within the video feed includes at least one of a pixel location, a voxel location, and a raster location. The method, in embodiments, includes providing the semantic index with the video feed, with the video feed configured to enable semantically based augmentation. Embodiments include augmenting the video feed by adding content based on the identified location, and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”

“In embodiments, extracting the plurality of video cuts involves automatically extracting a video cut from the feed using a combination of the machine-learning-developed understanding of the plurality of events and an understanding, developed with machine learning, of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. In embodiments, the understanding of the other input feed includes understanding at least one of a portion of a broadcast commentary and a change of camera view in the other input feed.”

“In some embodiments, a method includes processing at least one video feed through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of a plurality of events within the video feed and to determine at least one event type for each, wherein the at least one event type is an entry in a relationship library that describes a relationship between at least two features visible in the video feed. The method includes extracting a plurality of video cuts from the at least one video feed, and indexing the extracted plurality of video cuts based on the at least one event type determined by the machine learning. A mobile application is also provided as part of the method, allowing users to search the extracted plurality of video cuts using the indexing.”
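One minimal shape for such an index and its search backend, with purely illustrative names, might look like the following:

```python
# Sketch: index extracted cuts by event type and serve a search query,
# as a mobile app backend might. All names are hypothetical.
from collections import defaultdict

class CutIndex:
    def __init__(self):
        self._by_type = defaultdict(list)

    def add(self, event_type, clip_uri, game_time):
        self._by_type[event_type].append({"uri": clip_uri, "t": game_time})

    def search(self, event_type):
        """Return clips of the requested type in game-time order."""
        return sorted(self._by_type.get(event_type, []), key=lambda c: c["t"])

idx = CutIndex()
idx.add("block", "clips/0017.mp4", 312.0)
idx.add("dunk", "clips/0021.mp4", 455.5)
print(idx.search("dunk"))
```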

“In embodiments, the at least one spatiotemporal pattern recognition algorithm is based on at least one pattern selected from the group consisting of: relative motion of at least two visible features toward each other for at least a duration threshold; acceleration of motion of at least two visible features toward each other of at least an acceleration threshold; a rate of motion of at least two visible features toward each other; a projected point of intersection of at least two visible features; and a separation distance between at least two visible features that is less than a separation threshold. In embodiments, the machine learning generates at least one metric for each event of the plurality, and a user interface of the mobile application allows the user to select a metric to be included in an edited video. In certain embodiments, the user interface of the mobile application allows the user to edit a video and share it via the mobile application. In embodiments, using machine learning to develop the understanding of the plurality of events includes using position tracking data for the plurality of events over time, obtained from at least one of a video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game.”

“In some embodiments, using machine learning to develop the understanding of the plurality of events involves aligning multiple unsynchronized input feeds relating to one event of the plurality using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the event, one or more feeds of tracking video for the event, and one or more play-by-play data feeds of the event. In some embodiments, the multiple unsynchronized input feeds include at least three feeds of at least two different types relating to the event.”

“In some embodiments, the method involves at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds, where at least one algorithm in the hierarchy is based on a nature of the input feed. The method, in some embodiments, includes at least one of validating and modifying the understanding using such a hierarchy, where at least one algorithm of the hierarchy used for validating is based on the nature of the input feed. In embodiments, extracting the plurality of video cuts from the at least one feed involves automatically extracting a video cut using a combination of the understanding of the plurality of events and an understanding of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. The understanding of the other input feed is, in some instances, an understanding developed with machine learning of at least one of a portion of a broadcast commentary and a change of camera view in the input feed.”

“In certain embodiments, a method of providing enhanced video content includes processing at least one video feed through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of a plurality of events within the at least one video feed and to determine at least one event type for each. The method involves extracting a plurality of video cuts from the at least one video feed, and indexing the extracted plurality of video cuts based on the at least one event type determined by the machine learning. It also involves determining at least one pattern relating to the extracted plurality of video cuts, and indexing at least a portion of the extracted plurality of video cuts with an indicator of the pattern.”

In embodiments, the at least one pattern is developed using machine learning. The machine learning may identify at least one participant in an event, and indexing the extracted plurality of video cuts may involve identifying at least one player in each of the video cuts of the plurality. In certain embodiments, the at least one pattern is a series of similar event types involving the same player over time, and the plurality of video cuts may show the player participating in a plurality of similar events over time.

“In some embodiments, the method provides an enhanced video feed that displays a player during a plurality of events at varying times. The enhanced video feed may include at least one of a simultaneous, superimposed video of the player involved in multiple similar event types and a sequential video of the player involved in that event type. In certain embodiments, determining the at least one pattern involves identifying sequences that are likely to predict an action that follows, or identifying similar sequences across multiple video feeds. The method, in some embodiments, includes a user interface that allows a user at least one of to view and to interact with the pattern.”

“In embodiments, the at least one pattern is personalized, and interaction options are customized, based on at least one of a user preference and a user profile. In embodiments, the at least one pattern relates to the expected outcome of at least one of a game and an event within a game. The method, in embodiments, includes providing the user with at least one of trend information, a statistic, and a prediction based on the at least one pattern; the statistic, trend information, or prediction may itself be based on at least one of a user preference and a user profile. The method can include at least one pattern that relates to the play of an athlete. The method, in some embodiments, includes comparing the play of one athlete with that of another based on the similarity of at least one of the extracted plurality of video cuts and the at least one pattern. In certain embodiments, the comparison is made between a professional athlete and a non-professional athlete; for example, embodiments compare a professional athlete’s playing style to that of a non-professional user based on the machine learning analysis of at least one event and at least one pattern.”

“In some embodiments, using machine learning to develop the understanding of the plurality of events also includes using a plurality of events in position tracking data over time from at least one of the at least one video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game. Using machine learning to develop the understanding of the multiple events may include aligning multiple unsynchronized input feeds relating to one event using at least one of an algorithm and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the event, one or more feeds of tracking video for the event, and one or more play-by-play data feeds of the event.”

“In embodiments, the multiple unsynchronized input feeds include at least three feeds of at least two different types relating to the event. The method in embodiments includes at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds, and, in some embodiments, at least one of validating and modifying the understanding developed using the machine learning with such a hierarchy. In embodiments, extracting the plurality of video cuts from the at least one feed involves automatically extracting a video cut using a combination of the understanding of the plurality of events and an understanding of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. The understanding of the other input feed is, in some embodiments, an understanding developed using machine learning of at least one of a portion of the broadcast commentary and a change of camera view in the input feed.”

“In embodiments, a method of providing enhanced video content includes processing at least one video feed through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of a plurality of events within the at least one video feed and to determine at least one event type for each. The method involves extracting a plurality of video cuts from the at least one video feed, and indexing the extracted plurality of video cuts based on the at least one event type determined using the machine learning. The method also includes automatically, under computer control, delivering the extracted plurality of video cuts to at least one user based on at least one of a user profile and a user preference.”

“In embodiments, the at least one of the user preference and the user profile is continuously updated based on the user’s indication of liking or disliking at least one video cut of the extracted plurality of video cuts. In embodiments, the understanding developed using machine learning is based on human-identified video alignment labels that identify semantic events. In embodiments, the at least one spatiotemporal pattern recognition algorithm uses time-aligned content derived from multiple input sources to develop the understanding with machine learning. The method, in some embodiments, includes the at least one spatiotemporal pattern recognition algorithm using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds to handle the multiple input sources.”
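The update rule for the continuously refreshed preference is not specified in the text. As one hedged possibility, the sketch below keeps an exponential moving average per event type over like/dislike signals; the class, the smoothing factor, and the score range are assumptions.

```python
# Sketch: continuously update per-user preference weights from feedback.
class PreferenceProfile:
    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.weights = {}          # event_type -> score in [-1, 1]

    def feedback(self, event_type, liked):
        """Fold one like/dislike signal into the running weight."""
        signal = 1.0 if liked else -1.0
        w = self.weights.get(event_type, 0.0)
        self.weights[event_type] = (1 - self.alpha) * w + self.alpha * signal

    def top(self, n=3):
        return sorted(self.weights, key=self.weights.get, reverse=True)[:n]

p = PreferenceProfile()
p.feedback("dunk", liked=True)
p.feedback("free-throw", liked=False)
print(p.top())  # dunks now rank above free throws for this user
```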

“In some embodiments, using machine learning to develop the understanding of the plurality of events includes using position tracking data over time from at least one of a video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game. In some embodiments, it involves aligning multiple unsynchronized input feeds relating to an event of the plurality using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the event, one or more feeds of tracking video for the event, and one or more play-by-play data feeds of the event. In some embodiments, the multiple unsynchronized input feeds include at least three feeds of at least two different types relating to the event.”

“In some embodiments, the method involves at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds. The method, in at least one embodiment, includes at least one of validating and modifying the understanding developed using machine learning with such a hierarchy, where at least one algorithm in the hierarchy is based on the nature of the input feed.”

“In embodiments, extracting the plurality of video cuts from the at least one video feed includes automatically extracting a video cut using a combination of the understanding of the plurality of events and an understanding of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. In embodiments, the understanding of the other input feed, developed through machine learning, includes understanding at least one of a portion of a broadcast commentary and a change of camera view in the input feed.”

“In embodiments, a method of allowing a user to express preferences relating to the display of video content includes processing at least one video feed through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of at least one event in the at least one video feed and to determine at least one type of event, wherein the at least one event type is an entry in a relationship library that describes a relationship between at least two features of the at least one video feed. The method involves automatically extracting video content that displays the at least one event, and associating the understanding developed with the machine learning with the video content in at least one video content data structure. A user interface is provided that allows the user to indicate a preference for at least one type of event. Upon receiving an indication of the user’s preference, the method involves retrieving at least one video content data structure that was determined, by the machine learning, to be associated with at least one of the event types indicated by the user. Further, the method provides the user with a video feed that contains the video content of the at least one video content data structure.”

“In some embodiments, using machine learning to develop the understanding of the at least one event includes using position tracking data over time obtained from at least one of a video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game. Using machine learning to develop the understanding may involve aligning multiple unsynchronized input feeds relating to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the at least one event, one or more feeds of tracking video, and one or more play-by-play data feeds, and may include at least three feeds of at least two different types relating to the event. The method, in at least one embodiment, includes at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds, and, in at least some embodiments, at least one of validating and modifying the understanding produced by the machine learning with such a hierarchy, where at least one item in the hierarchy used to validate the understanding is based on the nature of one or more of the input feeds.”

“In embodiments, the user interface comprises at least one of a mobile application, a browser, a desktop program, a remote control device, a tablet, a touch screen device, a virtual reality headset, an augmented reality headset, and a smart phone. The method, in embodiments, includes the user interface allowing the user to indicate a preference for how content is to be presented. The understanding that the machine learning provides includes a context of the at least one event, which is stored in the at least one video content data structure, and the user interface may also include an element that allows the user to indicate a preference for a context.”

“In embodiments, the method includes retrieving a portion of video content that corresponds to the context and displaying that portion to the user after receiving an indication of preference for the context. The context may include at least one of the following: a preference for a player in the at least one video feed; a preferred matchup between players in the at least one video feed; a preferred team in the at least one video feed; and a preferred matchup between teams in the at least one video feed. The user interface in embodiments allows the user to choose at least one of a metric and a graphic element to be displayed on the video feed, where the at least one metric is determined at least in part by the machine learning understanding. In embodiments, extracting the content that displays the event involves automatically extracting a portion of the video feed using both the understanding of the event derived from machine learning and an understanding, derived from machine learning, of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. The understanding of the other input feed may include at least one of a portion of a broadcast commentary and a change of camera view in the input feed.”

“In embodiments, a method of enabling a mobile application that allows user interaction with video content includes taking a video feed of a live event and processing it through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of an event in the video feed, the understanding including identification of context information relating to the event as well as an entry in a relationship library that details a relationship between at least two features visible in the video feed. The method involves automatically extracting the content that displays the event and associating it with the context information. The method also includes creating a video content data structure that includes the context information. The method further includes automatically, under computer control, producing a story using the video content data structure, the story being based at least in part on a user preference, the context information, and the video content data structure.”

“In embodiments, extracting the content displaying an event involves automatically extracting a portion of the video feed using both the understanding of the event developed with machine learning and an understanding, developed with machine learning, of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. The understanding of the other input feed may include at least one of a portion of a broadcast commentary and a change of camera view in the input feed. In embodiments, the method uses a combination of the understanding from the machine learning of the event in the video feed and the understanding from the machine learning of the other input feed to edit the video cut and combine it with other content.”

“In some embodiments, the method involves automatically creating a semantic index of the video feed based on the machine learning understanding of at least one event in the feed, the index indicating the game time and the location of the display of the event in the feed. The location of the display of the event in the video feed may include at least one of a pixel location, a voxel location, and a raster location. The method, in embodiments, includes combining the semantic index of the video feed with the video feed to allow for augmentation of the feed. Embodiments include augmenting the video feed by adding content based on the display location, and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”

“In some embodiments, using machine learning to develop the understanding of the event includes using a plurality of events in position tracking data over time obtained from at least one of a video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game. The method, according to some embodiments, also involves aligning multiple unsynchronized input feeds relating to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the event, one or more feeds of tracking video for the event, and one or more play-by-play data feeds of the event.”

“In embodiments, the multiple unsynchronized input feeds include at least three feeds of at least two different types of event-related feeds.”

“In some embodiments, the method involves at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds, where at least one algorithm in the hierarchy is based on a nature of the input feeds. The method, in at least one embodiment, includes at least one of validating and modifying the understanding using such a hierarchy, where at least one algorithm of the hierarchy used for validation is based on the nature of the input feed. In embodiments, a preference for a content type is determined by at least one of a user-expressed preference and the user’s interaction with an item of content.”

“In embodiments, a system that allows a user to express preferences regarding the display of video content includes a machine learning facility that uses at least one spatiotemporal pattern recognition algorithm to develop an understanding of at least one event in at least one video feed and to determine at least one type of event, where the understanding includes an entry in a relationship library that describes a relationship between at least two features of the at least one video feed. The system also includes a video production facility that automatically extracts video content displaying the at least one event and associates the understanding with the video content in at least one video content data structure. A server serves data to a user interface, retrieves at least one video content data structure determined by the machine learning to correspond to the event type preferred by the user, and provides the user with a video feed that contains the preferred event type.”

“In embodiments, the system includes a user interface that allows the user to indicate a preference for at least one event type; the interface may be at least one of a mobile application, a browser, a desktop application, a remote control device, a tablet, and a smart phone.”

“In embodiments, the user interface is designed to allow the user to indicate a preference for at least one type of event and contains an element that allows the user to specify a preference for how content will be presented. The understanding developed by the machine learning facility includes a context of the at least one event, which is stored with the at least one video content data structure, and the user interface may also include an element that allows the user to indicate a preference for a context. The server that serves data to the interface retrieves video content matching the preference for the at least one context and, after receiving the preference, displays the video content to the user.”

“In embodiments, the at least one context includes at least one of the following: a preference for a player in the live video feed; a preferred matchup between players in the video feed; a preferred team in the video feed; and a preferred matchup between teams in the video feed.”

“In certain embodiments, the user interface allows a user to choose at least one of a metric and a graphic element to be displayed on the video feed, where the metric is determined at least in part using the machine learning facility’s understanding. In embodiments, the machine learning facility uses position tracking data over time, obtained from at least one of a video feed and a chip-based player tracking system, to develop the understanding of the at least one event, the understanding being based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game. The machine learning facility may develop the understanding of the at least one event by aligning multiple unsynchronized input feeds relating to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the event, one or more feeds of tracking video for the event, and one or more play-by-play data feeds, and may include at least three feeds of at least two different types relating to the at least one event. In embodiments, the video production facility at least one of validates and modifies the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds.”

“In embodiments, the video production facility at least one of validates and modifies the understanding developed by the machine learning facility using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds, where at least one algorithm in the hierarchy used for validation is based on the nature of the input feed. In embodiments, the video production facility automatically, under computer control, extracts the video content that displays the at least one event by combining the understanding of the event developed with machine learning and an understanding, developed with machine learning, of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. The understanding of the other input feed is, in certain embodiments, an understanding of at least one of a portion of a broadcast commentary and a change of camera view in the at least one video feed.”

“In embodiments, a method of delivering personalized video content includes processing at least one video feed through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of at least one event in the at least one video feed and to determine at least one event type, wherein the at least one event type is an entry in a relationship library that describes a relationship between at least two features of the at least one video feed. The method involves automatically extracting video content that displays the at least one event, and associating the understanding developed with the machine learning with the video content in at least one video content data structure. The method creates a user profile based on at least one of an expressed preference of the user, information collected about the user, and information about the user’s actions with respect to at least one event type. Upon receiving an indication of the user’s profile, the method includes retrieving at least one video content data structure that the machine learning determined to contain an event type most likely to be preferred by the user.”

“In embodiments, using machine learning to develop the understanding of the at least one event further includes using a plurality of events in position tracking data over time obtained from at least one of the at least one video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game.”

“In some embodiments, the method involves using machine learning to develop the understanding of the at least one event by aligning multiple unsynchronized input feeds relating to the at least one event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the at least one event, one or more feeds of tracking video, and one or more play-by-play data feeds of the at least one event, and may include at least three feeds of at least two different types of event-related feeds. The method, in at least one embodiment, includes at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds.”

“In some embodiments, the method involves at least one of validating and modifying the understanding developed using machine learning with a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds. In embodiments, extracting the video content that displays the at least one event involves automatically extracting a video cut from the at least one video feed using a combination of the understanding of the at least one event from the machine learning and an understanding of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. The understanding of the other input feed, developed with machine learning, includes understanding at least one of a portion of a broadcast commentary and a change of camera view in the input feed.”

“In embodiments, a method of delivering personalized video content includes processing at least one video feed of a professional game through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of at least one event in the at least one professional game video feed, the understanding including an entry in a relationship library that details a relationship between at least two features visible within the video feed. The method involves developing, through machine learning, an understanding of at least one event in a data feed relating to the motion of a non-professional player. The method then automatically, under computer control, provides an enhanced video feed that depicts the non-professional player playing in a professional context, based on the understanding of the at least one event from the at least one video feed of the professional game and of the data feed relating to the player’s motion.”

“In embodiments, the method includes providing a facility with cameras that capture 3D motion data and video to provide the feed of the professional player. In embodiments, the non-professional player is represented by mixing at least one video of the professional game with video of the player; in other embodiments, the non-professional player is represented by an animation based on the data feed relating to the player’s motion. In embodiments, using machine learning to develop the understanding of the at least one event includes using position tracking data over time from at least one of a video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game.”

In some embodiments, using machine learning to develop the understanding of the event involves aligning multiple unsynchronized input feeds relating to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the event, one or more feeds of tracking video for the event, and one or more play-by-play data feeds, and may include at least three feeds of at least two different types of event-related feeds. The method in embodiments includes at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds, and, in at least one embodiment, at least one of validating and modifying the understanding using such a hierarchy.

“In embodiments, a method includes taking a video feed of a live event and processing it through a spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of an event in the feed, the understanding including identification of context information relating to the event as well as an entry in a relationship library that details a relationship between at least two features visible in the video feed. The method involves automatically extracting the content that displays the event and associating it with the context information. Further, the method includes creating a video content data structure that includes the context information.”

“In some embodiments, the method includes determining a plurality of semantic categories for the context information and filtering a plurality of video content data structures based on those semantic categories, where each video content data structure of the plurality includes context information relating to an event. The method, in some embodiments, includes matching events in a first video feed with events in a second video feed, and cutting and filtering the second video feed using a semantic understanding of the first video feed, so that a separate second video feed is created based on the matched events in the first and second video feeds. The method, in some embodiments, includes identifying a pattern relating to multiple events and creating a content data structure based on the pattern. The pattern may include a number of important plays identified by comparison to video feeds of previous events, or a number of plays in a sporting event that are unusual according to comparison to video feeds of other events.”
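A toy sketch of the feed-matching step: once a time offset between two feeds is known (for instance, estimated as in the earlier alignment sketch), semantic labels from the first feed can be carried onto time-matched events in the second. The structures and the one-second tolerance are assumptions.

```python
# Sketch: transfer semantic labels from a labeled feed to a second feed
# by matching event times after alignment.
def transfer_labels(labeled, unlabeled, offset, tol=1.0):
    """labeled: [{"t": s, "type": str}]; unlabeled: [{"t": s}] on its own clock."""
    out = []
    for ev in unlabeled:
        for ref in labeled:
            if abs(ref["t"] + offset - ev["t"]) < tol:
                out.append({**ev, "type": ref["type"]})  # carry the label over
                break
    return out

labeled = [{"t": 10.0, "type": "dunk"}]
print(transfer_labels(labeled, [{"t": 132.3}], offset=122.1))
```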

“In embodiments, the method includes extracting semantic events over time to draw a comparison of at least one of a player and a team over time. To illustrate the comparison, embodiments include superimposing video from two different times from multiple video feeds. The method may allow a user to interact with the video content data structure to create an edited video stream that includes the video content data structure, where the interaction includes at least one of cutting, editing, and sharing a video clip that contains the video content data structure.”

“In some embodiments, the method allows users to interact with the video content data structure via a user interface, which allows them to enhance the video content data structure with at least one graphic element from a list of options and to share the enhanced content. The method, in some embodiments, allows a user to search for similar video clips using a semantic context identified in the clips. The method may use the video content data structure and the context information to create modified video content for a second screen, where the modified video content on the second screen corresponds to the timing of an event and may include a metric determined using the machine understanding, with the context information used to determine the metric. Embodiments include using machine learning to develop the understanding of the event, including using position tracking data over time from at least one of a video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in a game.”

In some embodiments, using machine learning to develop the understanding of the event involves aligning multiple unsynchronized input feeds relating to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of one or more broadcast video feeds of the event, one or more feeds of tracking video for the event, and one or more play-by-play data feeds of the event.

“In embodiments, the multiple unsynchronized input feeds include at least three feeds of at least two different types of event-related feeds. The method in embodiments includes at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of one or more algorithms, one or more human operators, and one or more input feeds, and, in at least one embodiment, at least one of validating and modifying the understanding developed using machine learning with such a hierarchy.”

“In some embodiments, the method involves automatically creating a semantic index of the video feed based on the machine learning understanding of an event in the feed, the index indicating the time and location of the display of the event within the video feed. In embodiments, the location of the display of the event in the video feed includes at least one of a pixel location, a voxel location, and a raster location. The method, in embodiments, includes integrating the semantic index of the video feed with the video feed to allow for augmentation. Embodiments include augmenting the video feed by adding content based on the display location, and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”

“In embodiments, automatically, under computer control, extracting the content displaying an event involves extracting a cut of the video feed using a combination of the understanding of the event from machine learning and an understanding, obtained with machine learning, of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. The understanding of the other input feed may include identifying at least one of a portion of a broadcast commentary and a change of camera view in the input feed.”

“A system may include an ingestion facility for ingesting a plurality of video feeds; a machine learning facility that processes the video feeds using a spatiotemporal pattern recognition algorithm, applying machine learning to a series of events within the plurality of video feeds to develop an understanding of those events, the understanding including identification of context information relating to the series of events as well as an entry in a relationship library that details a relationship between at least two features visible in the plurality of video feeds; an extraction facility that automatically extracts content displaying the series of events and associates the extracted content with the context information; and a video publishing facility that creates a video content data structure including the context information.”
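Read as software, the named facilities suggest a linear pipeline: ingestion, recognition, extraction, publishing. The sketch below wires such stages together; every class, method, and stage name is illustrative, not the application's architecture.

```python
# Hypothetical sketch: the facilities composed as a simple pipeline.
class Pipeline:
    def __init__(self, ingest, recognize, extract, publish):
        self.stages = [ingest, recognize, extract, publish]

    def run(self, feeds):
        data = feeds
        for stage in self.stages:
            data = stage(data)      # each facility transforms the stream
        return data

pipeline = Pipeline(
    ingest=lambda feeds: [{"feed": f} for f in feeds],
    recognize=lambda items: [{**i, "events": ["dunk"]} for i in items],
    extract=lambda items: [{**i, "cuts": ["clips/0001.mp4"]} for i in items],
    publish=lambda items: [{**i, "published": True} for i in items],
)
print(pipeline.run(["camera_1"]))
```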

“In embodiments, the system contains an analytic facility that determines a plurality of semantic categories for the context information and filters a plurality of video content data structures based on the semantic categories. The system may include a matching engine that matches events in a first video feed and a second video feed, using a semantic understanding developed from the first feed to determine how the second feed is filtered and cut. The system may also include a pattern recognition facility that determines a pattern related to the series of events and creates a content data structure based on that pattern. Embodiments use machine learning to develop the understanding of the series of events based on position tracking data over time, obtained from at least one of the video feeds and a chip-based player tracking system, where the understanding is based on at least two of spatial configuration, relative motion, projected motion, and movement of at least one of a player and an item used in the game.
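
The matching engine is described only functionally; as a non-authoritative sketch, events detected in one feed could be paired with events of the same semantic type in a second, clock-aligned feed whenever their timestamps fall within a tolerance. The tuple layout and the two-second tolerance are assumptions.

```python
def match_events(feed_a, feed_b, tol_s=2.0):
    """Pair events across two feeds with already-aligned clocks.

    feed_a, feed_b: lists of (event_type, time_s) tuples.
    Returns (event_type, time_in_a, time_in_b) for each match found.
    """
    matches = []
    for etype, ta in feed_a:
        for etype_b, tb in feed_b:
            if etype == etype_b and abs(ta - tb) <= tol_s:
                matches.append((etype, ta, tb))
                break
    return matches

print(match_events([("dunk", 31.0)], [("dunk", 32.1), ("steal", 40.0)]))
```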

In some embodiments, the understanding of the series of events developed using machine learning involves aligning multiple unsynchronized input feeds related to the events using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of broadcast video feeds, feeds of tracking video, and play-by-play data feeds. In some embodiments, the multiple unsynchronized input feeds include at least three feeds chosen from at least two types of feeds related to the event.

“In some embodiments, the system includes at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of two or more algorithms, one or more human operators, and one or more input feeds, where at least one algorithm in the hierarchy for validating the alignment is selected based on a nature of the input feeds. The system may include at least one of validating and modifying the understanding developed using machine learning using a similar hierarchy, where the nature of the input feed determines which algorithm in the hierarchy is used.

“In some embodiments, the system automatically develops a semantic index for a video feed based on the machine understanding of at least one event from the series of events in the feed, the index indicating the time and location at which the event is displayed in the feed. The location of the display of the event in the video feed can include at least one of a pixel location, a voxel location, and a location within a raster image. In embodiments, the system integrates the semantic index with the video feed to enable augmentation, including adding content based on the location of the display and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.

“In embodiments, a system that allows interaction with a broadcast video content stream includes a machine learning facility for processing at least one video feed through a spatiotemporal pattern recognition algorithm that applies machine learning to at least one event in the at least one video feed to develop an understanding of that event, where the at least one video feed is used in a video broadcast. The understanding includes identifying context information related to the at least one event and an entry from a relationship library that details at least a relationship between at least two features visible in the at least one video feed. The system has a touch screen interface that allows at least one broadcaster to interact with the at least one video feed, with the options for interaction presented in the touch screen interface based on the context information. The touch screen interface is configured to control at least a portion of the content of the video broadcast. Alternatively, an interface may allow remote viewers to control a portion of the video feed content for the broadcast, with the options for that interface likewise based on the context information.

“In embodiments, the touch screen interface is a large screen visible to viewers of the video broadcast while the broadcaster uses it. The touch screen interface may allow the broadcaster to choose from among a plurality of context-relevant metrics to be displayed on the large screen, or to display a plurality of video feeds that have similar contexts. In embodiments, the similarity of contexts is determined by comparing events from the plurality of video feeds. The touch screen interface may also allow the broadcaster to display a superimposed view of at least two video feeds, enabling comparison of events from the plurality of video feeds.
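
A minimal sketch of the superimposed comparison view, assuming two time-aligned frames of equal resolution and using OpenCV's alpha blending; the 50/50 blend is an arbitrary choice, not a detail from the source.

```python
import cv2

def superimpose(frame_a, frame_b, alpha=0.5):
    """Blend two frames so two plays can be compared in one image.

    Both frames must share the same resolution and channel count.
    """
    return cv2.addWeighted(frame_a, alpha, frame_b, 1.0 - alpha, 0)

# Usage (hypothetical files):
# blended = superimpose(cv2.imread("play_a.png"), cv2.imread("play_b.png"))
```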

“In embodiments, the machine learning develops an understanding of the similarity of players based on the characteristics of the players during different time periods.”
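
The text does not specify how player similarity is computed; one simple hedged sketch is cosine similarity between feature vectors summarizing each player's characteristics in different time periods. The three features in the example are invented for illustration.

```python
import numpy as np

def similarity(stats_a, stats_b):
    """Cosine similarity between two player feature vectors."""
    a, b = np.asarray(stats_a, float), np.asarray(stats_b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# e.g. (avg speed m/s, shots per game, assists per game) from two eras
print(similarity([6.1, 18.0, 4.5], [5.9, 17.2, 5.0]))  # -> ~0.999
```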

“In embodiments, the touch screen interface allows the broadcaster to display a plurality of highlights determined automatically using the machine understanding of a live sporting event that is the subject of the at least one video feed. In embodiments, the plurality of highlights is determined based on similarity to highlights of other events. Embodiments further include using machine learning to develop the understanding of the at least one event based on position tracking data over time, obtained from at least one of the video feeds and a chip-based player tracking system, where the understanding is based on at least two of spatial configuration, relative motion, projected motion, and movement of at least one of a player and an item used in the game.

In some embodiments, the understanding of the at least one event developed using machine learning involves aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of broadcast video feeds, feeds of tracking video, and play-by-play data feeds, and may include at least three feeds chosen from at least two types of feeds related to the event. The system may include at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of two or more algorithms, one or more human operators, and one or more input feeds, where at least one algorithm in the hierarchy is selected based on the nature of the input feed. The system may likewise include at least one of validating and modifying the understanding developed using machine learning using such a hierarchy, with the nature of the input feed determining which algorithm in the hierarchy is used. In certain embodiments, the system automatically develops a semantic index for the at least one video feed based on the machine understanding of at least one event in the feed, the index indicating at least one time and location at which the event is displayed in that feed. The location of the display may include at least one of a pixel location, a voxel location, and a location within a raster image. In embodiments, the system integrates the semantic index with the at least one video feed to enable augmentation of the feed, such as by adding content based on the location of the display; the touch screen interface may be configured to enable a touch interface feature, as well as a mouse interface feature, based on the identified location.

“In embodiments, a method for enabling interaction with a broadcast video content stream involves processing a video feed through a spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of an event in the video feed, where the video feed is used in a video broadcast. The understanding includes identifying context information related to the event and an entry from a relationship library that details at least a relationship between two features visible in the video feed. The method provides a touch screen interface that allows a broadcaster to interact with the video feed, with the options presented in the interface based on the context information, and the interface configured to control content for at least a portion of the video broadcast. In embodiments, the touch screen interface is a large screen visible to viewers of the video broadcast while the broadcaster uses it. The touch screen interface may allow the broadcaster to choose from among a plurality of metrics relevant to the context of the event and display them on the large screen, or to display a plurality of video feeds within the video broadcast. In embodiments, the machine understanding identifies similar context information across the plurality of video feeds used in the video broadcast, such as by comparing events across those feeds.

“In embodiments, the touch screen interface allows the broadcaster to display a superimposed view of at least two video feeds, enabling comparison of events from the plurality of video feeds.

“The machine learning can be used to determine the similarity of players based on the characteristics of players from different time periods. The machine learning can also determine a plurality of highlights based on the understanding of the live sporting event that is the subject of the video broadcast, and the broadcaster can display the plurality of highlights via the touch screen interface. In certain embodiments, the plurality of highlights is determined based on similarity to highlights of other events. Embodiments use machine learning to develop the understanding of the event based on position tracking data over time, obtained from at least one of the video feed and a chip-based player tracking system, where the understanding is based on at least two of spatial configuration, relative motion, projected motion, and movement of at least one of a player and an item used in the game.

“In some embodiments, the understanding of the event developed using machine learning includes aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of broadcast video feeds of the at least one event, feeds of tracking video, and play-by-play data feeds.

“In some embodiments, the multiple unsynchronized input feeds include at least three feeds chosen from at least two types of feeds related to the event. In embodiments, the method includes at least one of validating and modifying the alignment of the multiple unsynchronized input feeds using a hierarchy involving at least one of two or more algorithms, one or more human operators, and one or more input feeds. In some embodiments, the method includes at least one of validating and modifying the understanding using such a hierarchy. In some embodiments, the method includes automatically developing a semantic index for the video feed using the understanding obtained from the machine learning of an event in the feed, the index indicating the time and location at which the event is displayed.

“In embodiments, the location of the display of an event in the video feed includes at least one of a pixel location, a voxel location, and a location within a raster image. Embodiments include integrating the semantic index with the video feed to enable enhancement of the feed, such as by adding content based on the location of the display and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.

“In some embodiments, a system that allows user interaction with video content may include an ingestion facility, executing on at least one processor, configured to access at least one video feed, and a machine learning facility configured to process the at least one video feed through a spatiotemporal pattern recognition algorithm that applies machine learning to an event in the feed to develop an understanding of that event. The understanding includes identifying context information related to the event and an entry from a relationship library that details at least a relationship between at least two features visible within the at least one video feed. The system also includes an extraction facility configured to automatically extract the content displaying the event and associate the extracted content with the context information, and a video publishing facility configured to create a video content data structure that includes the context information. An application having a user interface allows a user to interact with the video content data structure, with the options for interaction in the user interface based on the context information.

“In embodiments, the application is a mobile application. The application may also be at least one of a smart television application, a virtual reality headset application, and an augmented reality application. In embodiments, the user interface is a touch screen interface. In embodiments, the user interface allows the user to add a content element to the video feed.

“In embodiments, the content element is at least one of a graphic element and a metric based on the machine understanding. In embodiments, the user interface allows the user to select content relevant to a specific player in a sporting event, or content relating to a particular context, such as a matchup between two players in a sporting event. In embodiments, the machine learning facility determines a similarity between at least one of a player and a play in two different video feeds, and the user interface allows the user to select at least one of the player and the play to view a video feed illustrating the comparison. The user interface may also be configured to allow the user to at least one of edit, cut, and share a video clip containing the video content data structure. In embodiments, the at least one video feed includes 3D motion capture data from cameras at a live sporting venue. In embodiments, the machine learning facility increases its understanding by ingesting a plurality of events that have been previously identified. Embodiments include using machine learning to develop the understanding of the event based on position tracking data collected from at least one of the video feed and a chip-based player tracking system, where the understanding is based on at least two of spatial configuration, relative motion, projected motion, and movement of at least one of a player and an item used in the game.

In some embodiments, the understanding of the event developed using machine learning involves aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of broadcast video feeds, feeds of tracking video for the event, and play-by-play data feeds, and may include at least three feeds chosen from at least two types of feeds related to the event. The system may include at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of two or more algorithms, one or more human operators, and one or more input feeds, and may likewise include at least one of validating and modifying the understanding using such a hierarchy.

“In some embodiments, the system automatically develops a semantic index for the video feed based on the machine understanding of an event in the feed, the index indicating a time and location at which the event is displayed in the feed. In embodiments, the location of the display of the event in the video feed includes at least one of a pixel location, a voxel location, and a location within a raster image. The system may provide the semantic index together with the video feed to enable enhancement of the feed.

“Enhancement of the video feed in embodiments includes adding content based on the location of the display and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”

“In embodiments, extracting the content displaying the event includes automatically extracting a cut of the at least one video feed using a combination of the understanding developed with machine learning and an understanding, also developed with machine learning, of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. In embodiments, the machine understanding of the other input feed includes identifying at least one of a portion of broadcast commentary and a change of camera view in that feed.

“In embodiments, a method of enabling a mobile application that allows user interaction with video content involves taking at least one video feed and processing it through a spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of an event in the at least one video feed. The understanding includes identifying context information related to the event and an entry from a relationship library that details at least a relationship between at least two features visible within the at least one video feed. The method involves automatically extracting the content displaying the event, associating the extracted content with the context information, and creating a video content data structure that includes the context information. A mobile application having a user interface is also provided that allows a user to interact with the video content data structure, with the options for interaction based on the context information.

“In embodiments, the user interface is a touch screen interface. In embodiments, the user interface allows the user to add a content element to the video feed, where the content element is at least one of a graphic element and a metric based on the machine understanding. In embodiments, the user interface is designed to allow the user to select content pertaining to a specific player in a sporting event, or content relating to a context involving a matchup between two players in a sporting event.

“In embodiments, the method includes taking at least two video feeds from different time periods. The machine learning facility determines a context that includes a similarity of at least one of a player and a play between the at least two feeds, and the user interface allows the user to select at least one of the play and the player to view a video feed illustrating the comparison. The user interface may also allow the user to at least one of edit, cut, and share a video clip containing the video content data structure.

“In embodiments, the video feed includes 3D motion capture data from cameras at a live sporting venue. In embodiments, the machine learning facility increases the understanding by ingesting a plurality of events that have been previously identified. The understanding of the event is developed using machine learning based on position tracking data collected from at least one of the video feed and a chip-based player tracking system, and is based on at least two of spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game.

Summary for “Methods, systems and methods of spatiotemporal pattern detection for video content creation”

“In embodiments, the content that displays an event is automatically extracted from a video feed based on the machine understanding of the event. Extracting the content may involve automatically extracting a cut of the video feed using a combination of the machine understanding of the event and a machine understanding of another input feed related to the event, such as a broadcast video feed, an audio feed, or a closed caption feed, where the machine understanding of the other input feed includes understanding at least one of a portion of broadcast commentary and a change of camera view within that feed. Embodiments may also include editing the video cut and combining it with other content.

“Embodiments may also include automatically developing a semantic index for a video feed based on the machine understanding of at least one event, indicating the time and location of the event within the video feed; the location may be expressed, for example, as a location in pixels, a location in voxels, or other similar display information. To augment the video feed, the semantic index may be used to add content based on the identified location, and may enable at least one of a touch interface feature and a mouse interface feature based on the identified location.

“In accordance with further exemplary and non-limiting embodiments, a method includes receiving a configuration of a sports playing field along with at least one image, and determining a camera pose based at least in part on the playing field configuration and the at least one image.

“According to further exemplary and non-limiting embodiments, a method includes performing automatic recognition of a scene and augmenting video input with at least one of additional imagery and graphics rendered within the reconstructed 3D space of the scene.”

“Methods and systems described herein include taking a video feed of a live event; using machine learning to develop an understanding of the event; automatically, under computer control, aligning the video feed with that understanding; and producing a transformed video feed that includes at least one highlight identified by the machine learning. The event may be a sporting event, an entertainment event, a television or movie event, a playground pick-up game or other amateur sporting event, or any human activity, motion, or movement in a home, commercial establishment, or other location. In embodiments, the transformed video feed is a highlight video feed for a defined set of players, which may include a group of players on a fantasy sports roster. Embodiments may include delivering the transformed video feed to at least one of an email inbox, a mobile device, a tablet, an application, a scoreboard, and a Jumbotron display board.

The methods and systems described herein include: taking a source feed relating to an event; using machine learning to develop an understanding of the event; automatically aligning the source feed with that understanding; and producing a transformed feed that contains at least one highlight identified by the machine learning. The event may be a sporting event, an entertainment event, or a television or movie event. The source feed may be at least one of an audio feed, a text feed, and a speech feed.

Methods and systems described in this document may include taking a data set associated with a video feed of a live event; taking spatiotemporal characteristics of the live event; applying machine learning to determine at least one spatiotemporal pattern of the event; and using a human validation process for at least one of validating and teaching the machine learning of that spatiotemporal pattern. The event may be a sporting event in some embodiments.

“Methods and systems described herein can include taking at least one of an image feed and a video feed of a venue; taking data relating to the known configuration of the venue; and automatically, under computer control, recognizing a camera pose based on the video feed. The venue may be a venue for a sporting event.

“Methods and systems described herein can include: taking at least one feed from the group consisting of a video feed and an image feed of a scene; taking data relating to a known configuration of a venue; automatically, under computer control, recognizing a camera pose based on the feed and the known configuration; and augmenting the at least one feed with at least one of an image and a graphic within the recovered 3D space. Human input may be used to assist and validate the automatic recognition of the camera pose. Methods and systems may also include presenting at least one metric within the augmented feed, and may enable a user to interact with at least one of the video feed and a frame of the video feed through a 3D user interface. These methods and systems can include augmenting the at least one feed to create a transformed feed, which may provide a highlight video feed visible to a set of users.
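
Camera pose recovery from a known venue configuration can be sketched with OpenCV's solvePnP, given 3D venue landmarks and their detected pixel positions in a frame. The court coordinates, pixel detections, and camera intrinsics below are made-up values, and this is only one way to realize the step described above.

```python
import numpy as np
import cv2

# Known venue configuration: four landmarks on the court plane (meters, z=0).
court_pts = np.array([[0, 0, 0], [15, 0, 0], [15, 28, 0], [0, 28, 0]],
                     dtype=np.float32)
# Where those landmarks were detected in one frame (pixels; illustrative).
image_pts = np.array([[210, 610], [1700, 640], [1380, 260], [470, 250]],
                     dtype=np.float32)
# Assumed pinhole intrinsics for a 1920x1080 broadcast camera.
K = np.array([[1400, 0, 960], [0, 1400, 540], [0, 0, 1]], dtype=np.float32)

# solvePnP returns the camera rotation (rvec) and translation (tvec)
# relative to the court coordinate frame.
ok, rvec, tvec = cv2.solvePnP(court_pts, image_pts, K, None)
print(ok, rvec.ravel(), tvec.ravel())
```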

Methods and systems described in this document may include: taking a video feed of an event; taking spatiotemporal characteristics of the event; applying machine learning to determine at least one spatiotemporal pattern; and calculating a metric based on that pattern. In embodiments, the metric may include at least one of a shot quality (SEFG) metric, an EFG+ metric, a rebound positioning metric, a rebounding attack metric, a rebounding conversion metric, and an event-count metric.
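
As a worked example of one named metric, effective field goal percentage (eFG%) weights made three-pointers by 1.5; the shot-quality (SEFG) and other variants named in the text are model-based and not reproduced here.

```python
def efg(fgm, fga, threes_made):
    """Effective field goal percentage: (FGM + 0.5 * 3PM) / FGA."""
    return (fgm + 0.5 * threes_made) / fga if fga else 0.0

print(round(efg(fgm=10, fga=20, threes_made=4), 3))  # -> 0.6
```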

“Methods and systems described herein can include an interactive graphical user interface that allows exploration of data developed through machine learning from live video, including the exploration and analysis of events. In embodiments, the graphical user interface can be presented on at least one of a mobile device, a tablet, a laptop, a large-format touch screen, and a personal computer. The data can be structured to show at least one of a breakdown, a ranking, a field-based comparison, and a statistical comparison. Exploration may enable at least one of a touch interaction, a gesture interaction, a voice interaction, and a motion-based interaction.

“Methods and systems described herein could include: taking a data set associated with a live video feed of an event; automatically, under computer control, recognizing a camera pose for the video; tracking at least one of a player and an object in the video feed; and placing the tracked player or object at a spatial location corresponding to its spatial coordinates.”

The methods and systems described herein include: taking a data set associated with a video feed of a live event; taking spatiotemporal characteristics of the live event; applying machine learning to determine at least one spatiotemporal pattern; and providing contextualized information during the event. The contextualized information may include at least one of a replay, a visualization, a highlight, and a compilation of highlights. The information can be delivered to at least one of a mobile device, a tablet, and a broadcast video feed. Methods and systems can include a touch screen interaction that displays at least one item of the contextualized information.

“In some embodiments, the methods and systems described herein can include taking a video feed of a live event; identifying the point of view of a participant; and automatically selecting from the video feed a plurality of video frames showing at least one view from the viewpoint of the participant. These methods and systems can also include rendering a 3D movie using the selected plurality of video frames, and can include an interface that allows a user to select a participant from among a number of participants. In embodiments, the event may be a sporting event and the participant a player. Embodiments may include basketball, and the video feed may be linked to 3D motion capture data captured by the cameras that capture the video feed.

“In embodiments, a method of providing enhanced video content involves processing at least one video feed using at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of a plurality of events and to determine at least one event type for each event within the at least one video feed, where the at least one event type includes an entry from a relationship library that describes a relationship between at least two features visible within the at least one video feed. The method includes extracting a plurality of video cuts from the at least one video feed and indexing the extracted plurality of video cuts based on the at least one event type determined by the machine learning, corresponding to an event of the plurality of events detectable in each of the video cuts. The method then automatically, under computer control, generates an enhanced video content data structure from the extracted plurality of video cuts based on the indexing.

“In embodiments, the at least one spatiotemporal pattern recognition algorithm uses at least one pattern from the group consisting of: relative motion of at least two visible features toward each other for at least a duration threshold; acceleration of motion of at least two visible features toward each other of at least an acceleration threshold; a rate of motion of at least two visible features toward each other; a projected point of intersection of at least two visible features; and a separation distance between at least two visible features being less than a separation threshold. In embodiments, the automatic generation of the enhanced video content data structure uses a combination of the understanding of the plurality of events and an understanding, developed with machine learning, of at least one of a broadcast video event and a broadcast audio event. In embodiments, the generation of the enhanced video content data structure is based at least in part on at least one of a user preference and a profile of the user for whom the enhanced video content data structure is generated.
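
Two of the listed pattern primitives, closing rate (relative motion of two features toward each other) and separation distance, can be sketched directly on tracked positions and velocities; the duration, acceleration, and projected-intersection tests would follow the same vector form. The thresholds below are assumptions for illustration.

```python
import numpy as np

def closing_rate(pa, va, pb, vb):
    """Rate of change of the distance between two tracked features.

    Negative when the features are moving toward each other.
    pa/pb are positions, va/vb velocities (2D, e.g. court coordinates).
    """
    rel_p, rel_v = np.subtract(pb, pa), np.subtract(vb, va)
    d = np.linalg.norm(rel_p)
    return float(rel_p @ rel_v / d) if d else 0.0

def pattern_fires(pa, va, pb, vb, sep_limit=2.0, rate_limit=-0.5):
    """Fire when features are within sep_limit meters or closing fast."""
    sep = float(np.linalg.norm(np.subtract(pb, pa)))
    return sep < sep_limit or closing_rate(pa, va, pb, vb) < rate_limit

# Two players 4 m apart, closing at 2 m/s: the motion primitive fires.
print(pattern_fires([0, 0], [1, 0], [4, 0], [-1, 0]))  # -> True
```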

“In some embodiments, the method provides a user interface that can be displayed on a mobile device, in which at least one of a search option and a filtering option allows a user to specify a description of an event type, and the enhanced video content data structure is matched to the description. In embodiments, the understanding of the plurality of events is developed using machine learning based on position tracking data for the events over time, obtained from at least one of the video feed and a chip-based player tracking system, where the understanding is based on at least two of spatial configuration, relative motion, projected motion, and movement of at least one of a player and an item used in the game.

In some embodiments, the understanding of the plurality of events developed using machine learning involves aligning multiple unsynchronized input feeds related to at least one event of the plurality using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of broadcast video feeds, feeds of tracking video, and play-by-play data feeds, and may include at least three feeds chosen from at least two types of feeds related to the event. The method may also include at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of two or more algorithms, one or more human operators, and one or more input feeds, and, in some embodiments, at least one of validating and modifying the understanding developed by the machine learning using such a hierarchy. In some embodiments, the method includes automatically developing a semantic index for the at least one video feed using the understanding of at least one of the plurality of events in the feed, the index indicating the game time and the location at which the event is displayed in the feed.

“In embodiments, the location of the display of the at least one event within the video feed includes at least one of a pixel location, a voxel location, and a location within a raster image. In embodiments, the method includes providing the semantic index together with the video feed, configured to enable semantically based augmentation of the feed, such as by adding content based on the identified location and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”

“Extracting the plurality of video cuts in embodiments involves automatically extracting a video cut from the feed using a combination of the machine-learning-developed understanding of the plurality of events and an understanding, developed with machine learning, of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. In embodiments, the understanding of the other input feed includes understanding at least one of a portion of broadcast commentary and a change of camera view in that feed.

“In some embodiments, a method includes processing at least one video feed using at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of a plurality of events within the video feed and to determine at least one event type for each, where the at least one event type includes an entry from a relationship library that describes a relationship between at least two features visible within the video feed. The method includes extracting a plurality of video cuts from the at least one video feed and indexing the plurality of video cuts based on the at least one event type determined by the machine learning. A mobile application is also provided as part of the method, allowing users to search the extracted plurality of video cuts using the indexing.

“In embodiments, the at least one spatiotemporal pattern recognition algorithm is based on at least one pattern from the group consisting of: relative motion of at least two visible features toward each other for at least a duration threshold; acceleration of motion of at least two visible features toward each other of at least an acceleration threshold; a rate of motion of at least two visible features toward each other; a projected point of intersection of at least two visible features; and a separation distance between at least two visible features being less than a separation threshold. In embodiments, the machine learning generates at least one metric for each event of the plurality, and the user interface of the mobile application allows the user to select a metric to be included in an edited video. In certain embodiments, the user interface of the mobile application allows the user to edit a video and share it via the mobile application. In embodiments, the understanding of the plurality of events is developed using machine learning based on position tracking data over time, obtained from at least one of the video feed and a chip-based player tracking system, where the understanding is based on at least two of spatial configuration, relative motion, projected motion, and movement of at least one of a player and an item used in the game.

“In some embodiments, the understanding of the plurality of events developed using machine learning involves aligning multiple unsynchronized input feeds related to at least one event of the plurality using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds may be selected from the group consisting of broadcast video feeds, feeds of tracking video, and play-by-play data feeds. In some embodiments, the multiple unsynchronized input feeds include at least three feeds chosen from at least two types of feeds related to the event.

“In some embodiments, the method involves at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy that includes at least one of two or more algorithms, one or more human operators, and one or more input feeds, where at least one algorithm in the hierarchy is selected based on the nature of the input feed. In some embodiments, the method includes at least one of validating and modifying the understanding using such a hierarchy, where at least one algorithm of the hierarchy used for validating is selected based on the nature of the input feed. In embodiments, extracting the plurality of video cuts from the at least one feed involves automatically extracting at least one of the video cuts using a combination of the understanding of the plurality of events and an understanding of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. In some instances, the understanding of the other input feed includes an understanding, developed with machine learning, of at least one of a portion of broadcast commentary and a change of camera view in that feed.

“In certain embodiments, a method of providing enhanced video content involves processing at least one video feed using at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of a plurality of events within the at least one video feed and to determine at least one event type for each. The method involves extracting a plurality of video cuts from the at least one video feed and indexing the extracted plurality of video cuts based on the at least one event type determined using machine learning. It also involves determining at least one pattern relating to the extracted plurality of video cuts, and further includes indexing at least a portion of the extracted plurality of video cuts with an indicator of the pattern.

In embodiments, the at least one pattern is developed using machine learning. The machine learning can identify at least one participant in an event, and indexing the extracted plurality of video cuts may involve identifying at least one player in each of the video cuts. In certain embodiments, the at least one pattern is a series of similar event types involving the same player over time, and the plurality of video cuts can be used to show the player participating in multiple similar events over time.

“In some embodiments, the method provides an enhanced video feed that displays a player during a plurality of similar events over varying times. The enhanced video feed may include at least one of a simultaneous, superimposed video of the player involved in multiple similar event types and a sequential video of the player involved in each such event type. In certain embodiments, determining the at least one pattern involves identifying sequences of events that are likely to predict a following action. In embodiments, determining the at least one pattern involves identifying similar sequences across multiple video feeds. In some embodiments, the method includes a user interface that allows a user at least one of to view and to interact with the pattern.
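
As a hedged sketch of assembling such sequential playback, indexed cuts can be grouped by player and event type so that one player's repeated instances of an event type form a single playlist; the tuple layout is an illustrative assumption.

```python
from collections import defaultdict

def group_cuts(cuts):
    """Group video cuts by (player_id, event_type) for sequential playback.

    cuts: iterable of (player_id, event_type, clip_path) tuples.
    """
    groups = defaultdict(list)
    for player, etype, clip in cuts:
        groups[(player, etype)].append(clip)
    return groups

cuts = [("23", "post-up", "c1.mp4"), ("23", "post-up", "c7.mp4"),
        ("11", "drive", "c3.mp4")]
print(group_cuts(cuts)[("23", "post-up")])  # -> ['c1.mp4', 'c7.mp4']
```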

“In embodiments, the at least one pattern is personalized and the interaction options are customized based on at least one of a user preference and a profile. In embodiments, the at least one pattern relates to an expected outcome of at least one of a game and an event within a game. In embodiments, the method includes providing the user with at least one of trend information, a statistic, and a prediction based on the at least one pattern, where the statistic, trend information, or prediction may likewise be based on at least one of a user preference and a profile. The method can include at least one pattern that relates to the play of an athlete. In some embodiments, the method includes comparing the play of one athlete with that of another based on the similarity of at least one of the extracted plurality of video cuts and the at least one pattern. In certain embodiments, the comparison is made between a professional athlete and a non-professional athlete; for example, embodiments compare a professional athlete's playing style to that of a non-professional user based on the machine learning analysis of at least one event and at least one pattern.

“In some embodiments, the understanding of the plurality of events developed using machine learning also includes using position tracking data for the events over time, obtained from at least one of the video feeds and a chip-based player tracking system, where the understanding is based on at least two of spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game. The machine learning of the plurality of events may involve aligning multiple unsynchronized input feeds related to at least one event using at least one of a hierarchy of algorithms and a hierarchy of human operators, where the unsynchronized input feeds are selected from the group consisting of one or more broadcast video feeds, one or more feeds of tracking video for the event, and one or more play-by-play data feeds of the event.

“In embodiments, the multiple unsynchronized input feeds include at least three feeds chosen from at least two types of feeds related to the event. In embodiments, the method includes at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of two or more algorithms, one or more human operators, and one or more input feeds, and, in some embodiments, at least one of validating and modifying the understanding developed using machine learning using such a hierarchy. In embodiments, extracting the plurality of video cuts from the at least one feed involves automatically extracting at least one of the video cuts using a combination of the understanding of the plurality of events and an understanding of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed, where the understanding of the other input feed may include an understanding, developed with machine learning, of at least one of a portion of broadcast commentary and a change of camera view in that feed.

“In embodiments, a method of providing enhanced video content includes processing at least one video feed through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of a plurality of events within the at least one video feed and to determine at least one event type for each. The method involves extracting a plurality of video cuts from the at least one video feed and indexing the extracted plurality of video cuts based on the at least one event type determined using machine learning. The method also includes automatically, under computer control, delivering the extracted plurality of video cuts to at least one user based on at least one of a user profile and a user preference.

“In embodiments, the at least one of the user preference and the user profile is continuously updated based on the user's indication of liking or disliking at least one video cut of the extracted plurality of video cuts. In embodiments, the understanding developed using machine learning is based on human-identified video alignment labels that identify semantic events. In embodiments, the at least one spatiotemporal pattern recognition algorithm uses time-aligned content derived from multiple input sources to develop the understanding with machine learning, and may use a hierarchy involving at least one of two or more algorithms, one or more human operators, and one or more input feeds to handle the multiple input sources.
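
The continuous update is not detailed in the source; one minimal sketch keeps an exponential moving average per event type, nudged toward 1 on a like and toward 0 on a dislike. The learning rate is an assumption.

```python
def update_preferences(prefs, event_type, liked, lr=0.2):
    """Nudge the preference score for an event type toward the feedback.

    prefs: dict mapping event_type -> score in [0, 1]; unseen types
    start at a neutral 0.5. Mutates and returns prefs.
    """
    target = 1.0 if liked else 0.0
    prefs[event_type] = (1 - lr) * prefs.get(event_type, 0.5) + lr * target
    return prefs

prefs = {}
update_preferences(prefs, "dunk", liked=True)
update_preferences(prefs, "dunk", liked=True)
print(prefs)  # -> {'dunk': ~0.68}
```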

“In some embodiments, the understanding of the plurality of events developed using machine learning includes using position tracking data over time, obtained from at least one of the video feeds and a chip-based player tracking system, where the understanding is based on at least two of spatial configuration, relative motion, projected motion, and movement of at least one of a player and an item used in the game. In some embodiments, the machine learning involves aligning multiple unsynchronized input feeds related to an event of the plurality using at least one of a hierarchy of algorithms and a hierarchy of human operators, where the unsynchronized input feeds are selected from the group consisting of broadcast video feeds, feeds of tracking video, and play-by-play data feeds. In some embodiments, the multiple unsynchronized input feeds include at least three feeds chosen from at least two types of feeds related to the event.

“In some embodiments, the method involves at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy that includes at least one of two or more algorithms, one or more human operators, and one or more input feeds. In at least one embodiment, the method includes at least one of validating and modifying the understanding developed using machine learning using such a hierarchy, where at least one algorithm in the hierarchy is selected based on the nature of the input feed.

“In embodiments, extracting the plurality of video cuts from the at least one video feed includes automatically extracting at least one of the video cuts using a combination of the understanding of the plurality of events and an understanding of another input feed chosen from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed. In embodiments, the understanding of the other input feed, developed through machine learning, includes understanding at least one of a portion of broadcast commentary and a change of camera view in that feed.

“In embodiments, a method of allowing a user to express preferences related to the display of video content involves processing at least one video feed through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of at least one event in the at least one video feed and to determine at least one event type, where the at least one event type includes an entry from a relationship library that describes at least a relationship between at least two features visible in the at least one video feed. The method involves automatically extracting video content that displays the at least one event and associating the understanding developed with machine learning with the video content in at least one video content data structure. A user interface is provided that allows the user to indicate a preference for at least one event type. Upon receiving an indication of the user's preference, the method involves retrieving at least one video content data structure determined by the machine learning to be associated with the at least one preferred event type. Further, the method provides the user with a video feed that contains the video content of the at least one video content data structure.

“In some embodiments, the understanding of the at least one event is developed using machine learning based on position tracking data over time, obtained from at least one of the video feeds and a chip-based player tracking system, where the understanding is based on at least two of spatial configuration, relative motion, projected motion, and movement of at least one of a player and an item used in the game. The machine learning may involve aligning multiple unsynchronized input feeds related to the at least one event using at least one of a hierarchy of algorithms and a hierarchy of human operators, where the unsynchronized input feeds are selected from the group consisting of one or more broadcast video feeds of the at least one event, one or more feeds of tracking video, and one or more play-by-play data feeds. The multiple unsynchronized input feeds may include at least three feeds chosen from at least two types of feeds related to the event. In at least one embodiment, the method includes at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy involving at least one of two or more algorithms, one or more human operators, and one or more input feeds. In at least some embodiments, the method includes at least one of validating and modifying the understanding produced by the machine learning using such a hierarchy, where at least one item in the hierarchy used to validate the understanding is selected based on the nature of one or more of the input feeds.

“In embodiments, the user interface comprises at least one of a mobile application, a browser, a desktop program, a remote control device, a tablet, a touch screen device, a virtual reality headset, an augmented reality headset, and a smart phone. In embodiments, the user interface allows the user to select the type of content to be presented. In embodiments, the understanding provided by the machine learning includes a context of the at least one event, which is stored in the at least one video content data structure, and the user interface includes an element that allows the user to indicate a preference for a context.

“In embodiments, the method includes retrieving a portion of video content that corresponds to the context and displaying that portion to the user after receiving an indication of preference for the context. The context can include at least one of a preference for a player in the at least one video feed, a preferred matchup between players in the at least one video feed, a preferred team in the at least one video feed, and a preferred matchup among teams in the at least one video feed. In embodiments, the user interface allows the user to choose at least one of a metric and a graphic element to be displayed on the video feed, where the metric is based at least in part on the machine learning understanding. In embodiments, extracting the content that displays the event involves automatically extracting a portion of the video feed using both the understanding of the event derived from machine learning and an understanding, derived from machine learning, of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed; the understanding of the other input feed may include at least one of a portion of broadcast commentary and a change of camera view in that feed.

“In embodiments, a method of enabling a mobile application that allows user interaction with video content includes taking a video feed of a live event and processing it through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of an event in the video feed. The understanding includes identifying context information related to the event and an entry from a relationship library that details at least a relationship between two features visible in the video feed. The method involves automatically extracting the content displaying the event, associating it with the context information, and creating a video content data structure that includes the context information. The method then automatically, under computer control, produces a story using the video content data structure, where the story is based in part on a preference of the user, the context information, and the video content data structure.
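
Story generation is described only at a high level; as one hedged sketch, indexed cuts could be ranked by the user's preference scores for their event types and assembled into a playlist. All field names are hypothetical.

```python
def build_story(cuts, prefs, max_clips=5):
    """Assemble a story as an ordered playlist of clips.

    cuts: list of dicts with 'event_type' and 'clip' keys.
    prefs: dict mapping event_type -> preference score.
    """
    scored = sorted(cuts, key=lambda c: prefs.get(c["event_type"], 0.0),
                    reverse=True)
    return [c["clip"] for c in scored[:max_clips]]

cuts = [{"event_type": "dunk", "clip": "c1.mp4"},
        {"event_type": "free-throw", "clip": "c2.mp4"}]
print(build_story(cuts, {"dunk": 0.9, "free-throw": 0.1}))
```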

“In embodiments, extracting the content displaying an event involves automatically extracting a portion of the video feed using both the understanding of the event developed with machine learning and an understanding, developed with machine learning, of another input feed selected from the group consisting of a broadcast video feed, an audio feed, and a closed caption feed, where the understanding of the other input feed may include at least one of a portion of broadcast commentary and a change of camera view in that feed. In embodiments, the method uses a combination of the understanding from the machine learning of the event in the video feed and the understanding from the machine learning of the other input feed to edit the video cut and combine it with other content.

“In some embodiments, the method involves automatically creating a semantic index for the video feed based on the machine learning understanding of at least one event in the feed, the index indicating the time and the location at which the event is displayed. The location of the event in the video feed is expressed as at least one of a pixel location, a voxel location, and a raster image location. In embodiments, the method includes delivering the semantic index together with the video feed to allow augmentation of the feed, such as adding content based on the display location and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”
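The semantic index described here pairs an event label with a time and an on-screen location, so that a touch or click can be resolved back to an event. A hypothetical sketch (the class names, the time tolerance, and the pixel radius are all assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    label: str    # e.g. "pick-and-roll"
    t_s: float    # when the event is on screen
    x_px: int     # where it is on screen -- a pixel location here,
    y_px: int     # though a voxel or raster-image location would also fit

class SemanticIndex:
    """Resolve a touch/click at (time, x, y) back to the indexed events."""
    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def hit_test(self, t_s, x_px, y_px, dt=1.0, radius=50):
        return [e for e in self.entries
                if abs(e.t_s - t_s) <= dt
                and (e.x_px - x_px) ** 2 + (e.y_px - y_px) ** 2 <= radius ** 2]

idx = SemanticIndex()
idx.add(IndexEntry("pick-and-roll", t_s=83.0, x_px=640, y_px=360))
print(idx.hit_test(83.4, 655, 350))  # -> [IndexEntry('pick-and-roll', ...)]
```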

“In some embodiments, using machine learning to develop the understanding of the event includes using events in position tracking data over time, obtained from at least one of the video feed and a chip-based player tracking system, where the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game. In some embodiments, developing the understanding also involves aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and one or more human operators. The unsynchronized input feeds can be selected from broadcast video feeds, tracking video feeds, and play-by-play data feeds.”
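The spatial-configuration / relative-motion / projected-motion vocabulary can be grounded with a small example over (t, x, y) tracking samples. The constant-velocity projection below is an assumption; the patent does not commit to any particular motion model:

```python
import math

def relative_motion(track_a, track_b):
    """Closing speed between two tracked points; each track is two
    (t, x, y) samples taken at the same instants."""
    (t0, x0, y0), (t1, x1, y1) = track_a
    (_, u0, v0), (_, u1, v1) = track_b
    d0 = math.hypot(x0 - u0, y0 - v0)
    d1 = math.hypot(x1 - u1, y1 - v1)
    return (d1 - d0) / (t1 - t0)          # negative => converging

def projected_position(track, horizon_s):
    """Constant-velocity projection of a track `horizon_s` seconds ahead."""
    (t0, x0, y0), (t1, x1, y1) = track
    dt = t1 - t0
    return (x1 + (x1 - x0) / dt * horizon_s,
            y1 + (y1 - y0) / dt * horizon_s)

ball     = ((0.0, 0.0, 0.0), (0.5, 2.0, 1.0))
defender = ((0.0, 5.0, 0.0), (0.5, 4.0, 0.5))
print(relative_motion(ball, defender))          # < 0: the two are converging
print(projected_position(ball, horizon_s=1.0))  # (6.0, 3.0)
```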

“In embodiments, the multiple unsynchronized input feeds include at least three feeds of at least two different types related to the event.”

“In some embodiments, the method involves at least one of validating and modifying the alignment of the unsynchronized input feeds using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds, where at least one algorithm in the hierarchy is selected based on the nature of the input feeds. In embodiments, the method likewise includes validating or modifying the understanding itself using such a hierarchy, with at least one validation algorithm selected based on the nature of the input. In embodiments, the preference for a content type is determined from at least one of a preference expressed by the user and the user’s interactions with content items.”
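One plausible (not patent-specified) reading of such a validation hierarchy: an algorithm chosen by the nature of the feed scores the alignment first, and low confidence escalates to a human operator. The threshold and the scorer functions below are placeholders:

```python
def validate_alignment(feed_nature, alignment, validators, ask_human,
                       threshold=0.8):
    """validators: maps a feed's nature (e.g. 'broadcast', 'tracking') to a
    scoring function returning confidence in [0, 1]. Low confidence escalates
    up the hierarchy to a human operator. The threshold is an assumption."""
    if validators[feed_nature](alignment) >= threshold:
        return alignment
    return ask_human(alignment)   # operator confirms or corrects

validators = {
    "broadcast": lambda a: 0.9,   # placeholder scorers; real ones would
    "tracking":  lambda a: 0.4,   # inspect the aligned feeds themselves
}
result = validate_alignment("tracking", {"offset_s": -2.1}, validators,
                            ask_human=lambda a: a)  # escalated to a human here
```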

“In embodiments, a system that allows a user to express preferences regarding the display of video content includes a machine learning facility that uses at least one spatiotemporal pattern recognition algorithm to develop an understanding of at least one event in at least one video feed, in order to determine at least one type of the event. The understanding includes an entry from a relationship library that details at least one relationship between at least two features of the video feed. The system also includes a video production facility that automatically extracts the video content showing the at least one event and associates the understanding with that content in at least one video content data structure, and a server that serves data to a user interface, retrieves at least one video content data structure that machine learning has determined to contain the user’s preferred event type, and provides the user with a video feed containing that event type.”

“In embodiments, the user interface that allows the user to indicate a preference for at least one event type may be a mobile application, a browser, a desktop application, a remote control device, a tablet, or a smart phone.”

“In embodiments, the user interface that allows the user to select at least one event type also contains an element that allows the user to specify a preference for how content is presented. The understanding developed by the machine learning facility includes a context of the at least one event, and that context is stored with the at least one video content data structure. The user interface may further include an element that allows the user to indicate a preference for a context; upon receiving that preference, the server retrieves video content matching the preferred context and displays it to the user.”

“In embodiments, the at least one context includes at least one of: a preference for a player in the video feed; a preferred matchup between players; a preferred team; and a preferred matchup between teams.”

“In certain embodiments, the user interface allows a user to choose at least one of a metric and a graphic element to be displayed on the video feed, where the metric is determined at least in part from the machine learning facility’s understanding. In embodiments, the machine learning facility develops its understanding of the at least one event using position tracking data over time, obtained from at least one of the video feed and a chip-based player tracking system; the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game. The facility may also align multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and one or more human operators, the feeds being selected from broadcast video feeds, event tracking video feeds, and play-by-play data feeds; the multiple unsynchronized feeds can include at least three feeds of at least two different types. The video production facility validates or modifies the alignment of the unsynchronized input feeds using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds.”

“In embodiments, the video production facility validates or modifies the understanding created by the machine learning facility using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds, with at least one algorithm in the hierarchy selected based on the nature of the input feed. Under computer control, the video production facility automatically extracts the video content displaying the at least one event by combining the machine learning understanding of the event with a machine learning understanding of another input feed selected from a broadcast video feed, an audio feed, and a closed caption feed. In certain embodiments, the understanding of the other input feed includes at least one of a portion of broadcast commentary and a change of camera view in the at least one video feed.”

“In embodiments, a method of delivering personalized video content involves processing at least one video feed through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of at least one event in the feed and to determine at least one event type. The understanding includes an entry from a relationship library that details at least one relationship between at least two features of the video feed. The method involves automatically extracting video content that displays the at least one event and associating the machine learning understanding with that content in a video content data structure. The method creates a user profile based on at least one of the user’s expressed preferences, information collected about the user, and information about the user’s actions with respect to each event type. Upon receiving an indication of the user’s profile, the method retrieves at least one video content data structure that machine learning has determined to contain the event type most likely to be preferred.”
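A sketch of how such a user profile might blend expressed preferences with observed interactions per event type; the weighting scheme below is entirely an assumption:

```python
from collections import Counter

class UserProfile:
    """Blend expressed preferences with observed per-event-type interactions.
    The 2x starting weight for expressed preferences is an assumption."""
    def __init__(self, expressed=()):
        self.scores = Counter({etype: 2.0 for etype in expressed})

    def record_interaction(self, event_type, weight=1.0):
        self.scores[event_type] += weight      # watched, replayed, shared...

    def most_preferred(self):
        return self.scores.most_common(1)[0][0]

profile = UserProfile(expressed=["3pt"])
for _ in range(3):
    profile.record_interaction("dunk")
print(profile.most_preferred())  # "dunk" -- interactions outweigh the expressed tag
```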

“Using machine learning to develop the understanding of the at least one event further includes using events in position tracking data over time, obtained from at least one of the video feed and a chip-based player tracking system. The understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game.”

“In some embodiments, developing the understanding of the at least one event also involves aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and one or more human operators. The unsynchronized input feeds are selected from broadcast video feeds, tracking video feeds, and play-by-play data feeds for the at least one event, and may include at least three feeds of at least two different types. In embodiments, the method includes validating or modifying the alignment of the unsynchronized input feeds using one or more algorithms and one or more human operators.”
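Alignment of unsynchronized feeds reduces, at its simplest, to estimating a clock offset between them. The brute-force search below, which matches detected event timestamps across two feeds, is an illustrative stand-in for whatever hierarchy of algorithms a real system would use:

```python
def estimate_offset(events_a, events_b, candidates, tol=0.2):
    """Find the clock shift of feed B that best matches feed A.
    events_a / events_b: timestamps (s) of the same kinds of detected events
    (e.g. made shots) in each unsynchronized feed."""
    def matches(shift):
        shifted = [t + shift for t in events_b]
        return sum(any(abs(a - b) <= tol for b in shifted) for a in events_a)
    return max(candidates, key=matches)

offset = estimate_offset(
    events_a=[10.0, 55.2, 90.4],                  # e.g. broadcast feed
    events_b=[12.1, 57.3, 92.5],                  # e.g. tracking feed
    candidates=[x / 10 for x in range(-50, 51)],  # search -5 s .. +5 s
)
print(offset)  # a shift near -2.1 s aligns the two feeds
```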

“In some embodiments, the method involves at least one of validating and modifying the understanding developed with machine learning using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds. Extracting the video content that displays the at least one event involves automatically extracting a clip from the video feed using a combination of the machine learning understanding of the event and a machine learning understanding of another input feed selected from a broadcast video feed, an audio feed, and a closed caption feed; the understanding of the other input feed may include at least one of a portion of broadcast commentary and a change of camera view in that feed.”

“In embodiments, a method of delivering personalized video content involves processing at least one video feed of a professional game through at least one spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of at least one event in the feed, including an entry from a relationship library that details at least one relationship between at least two features visible in the feed. The method also develops, through machine learning, an understanding of at least one event in a data feed relating to a non-professional player. Under computer control, the method automatically produces an enhanced video feed depicting the non-professional player playing in a professional context, based on the understanding of the at least one event from the professional game video feed and on a data feed relating to the non-professional player’s motion.”

“In embodiments, the method includes providing a facility with cameras that capture 3D motion data and video of a professional player to supply the professional player’s feed. In embodiments, the non-professional player is represented by mixing at least one professional video with video of the player, or by an animation of the player driven by the data feed relating to the player’s motion. The method uses machine learning to develop the understanding of the at least one event, including position tracking data over time from at least one of the video feed and a chip-based player tracking system; the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game.”

In some embodiments, machine learning is used to develop the understanding of the event by aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and one or more human operators. The unsynchronized input feeds are selected from broadcast video feeds, event tracking video feeds, and play-by-play data feeds, and may include at least three feeds of at least two different types. In embodiments, the method includes validating or modifying the alignment of the unsynchronized input feeds, and likewise the understanding itself, using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds.

“In embodiments, a method includes taking a live video feed and processing it through a spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of an event in the feed. The understanding includes identifying context information for the event and an entry from a relationship library that details at least one relationship between two features visible in the video feed. The method involves automatically extracting the content that displays the event, associating it with the context information, and creating a video content data structure that includes the context information.”

“In some embodiments, the method includes determining a plurality of semantic categories for the context information and filtering a plurality of video content data structures based on those categories, where each data structure includes context information for an event. In some embodiments, the method includes matching events in a first video feed with events in a second video feed, cutting the second feed based on a semantic understanding taken from the first, and producing a separately cut second feed based on the matched events. In some embodiments, the method includes identifying a pattern relating to multiple events and creating a content data structure based on that pattern; the pattern may comprise plays identified as important, or as unusual, by comparison to the video feeds of other events.”
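For the “unusual plays” pattern, one simple stand-in is a z-score test of a play feature against the distribution seen in other events’ feeds; the feature, threshold, and helper names are assumptions:

```python
import statistics

def unusual_plays(plays, history, key, z_thresh=3.0):
    """Flag plays whose feature sits far outside the distribution seen in
    other events' feeds. The z-score test and threshold are assumptions."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return [p for p in plays if abs((key(p) - mu) / sigma) >= z_thresh]

history = [5, 8, 10, 12, 15, 9, 11]                    # past shot distances (ft)
plays = [{"id": 1, "dist": 40}, {"id": 2, "dist": 10}]
print(unusual_plays(plays, history, key=lambda p: p["dist"]))  # flags play 1
```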

“In embodiments, the method includes extracting semantic events over time to draw a comparison of at least one of a player and a team across time periods, and may superimpose video from two different time periods, taken from multiple video feeds, to illustrate the comparison. The method allows a user to interact with the video content data structure to create an edited video stream containing it; the interaction can include at least one of cutting, editing, and sharing a video clip that contains the video content data structure.”
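Superimposing video from two time periods is, at the frame level, an alpha blend of aligned frames. A sketch assuming numpy arrays and frames already registered to the same court view:

```python
import numpy as np

def superimpose(frame_then, frame_now, alpha=0.5):
    """Alpha-blend two H x W x 3 uint8 frames from different time periods,
    assumed already registered to the same court view."""
    mix = (alpha * frame_then.astype(np.float32)
           + (1 - alpha) * frame_now.astype(np.float32))
    return mix.astype(np.uint8)

then = np.zeros((720, 1280, 3), dtype=np.uint8)    # stand-ins for real frames
now = np.full((720, 1280, 3), 200, dtype=np.uint8)
blended = superimpose(then, now)                   # uniform value 100
```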

“In some embodiments, the method allows users to interact with the video content data structure via a user interface through which they can enhance it with at least one graphic element selected from a list of options, and then share the enhanced content. In some embodiments, the method allows a user to search for similar video clips using a semantic context identified in the clips. The method can also use the video content data structure and the context information to create modified video content for a second screen, synchronized to the time of an event; the modified second-screen content may include a metric determined using the machine understanding and the context information. Embodiments include using machine learning to develop the understanding of the event, including position tracking data over time from at least one of the video feed and a chip-based player tracking system; the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game.”

In some embodiments, machine learning is used to develop the understanding of the event by aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds are selected from broadcast video feeds, tracking video feeds, and play-by-play data feeds.

“In embodiments, the multiple unsynchronized input feeds include at least three feeds of at least two different types related to the event. In embodiments, the method includes validating or modifying the alignment of the unsynchronized input feeds, and likewise the understanding developed with machine learning, using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds.”

“In some embodiments, the method involves automatically creating a semantic index for the video feed based on the machine learning understanding of an event in the feed, indicating the time and the location at which the event is displayed. The location can include at least one of a pixel location, a voxel location, and a raster image location. In embodiments, the method includes delivering the semantic index together with the video feed to allow augmentation of the feed, such as adding content based on the display location and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”

“In embodiments, automatically extracting, under computer control, the content displaying an event involves extracting a cut of the video feed using a combination of the machine learning understanding of the event and a machine learning understanding of another input feed selected from a broadcast video feed, an audio feed, and a closed caption feed; the understanding of the other input feed may include identifying at least one of a portion of broadcast commentary and a camera change in that feed.”
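A common proxy for the camera-change cue mentioned here is a jump in the mean absolute difference between consecutive frames; the threshold below is an assumption to be tuned per feed:

```python
import numpy as np

def camera_changes(frames, threshold=30.0):
    """Indices where the mean absolute difference between consecutive frames
    jumps -- a standard proxy for a broadcast camera cut. The threshold (in
    8-bit intensity units) is an assumption."""
    cuts = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.int16)
                      - frames[i - 1].astype(np.int16)).mean()
        if diff > threshold:
            cuts.append(i)
    return cuts

frames = [np.zeros((4, 4), np.uint8), np.full((4, 4), 255, np.uint8)]
print(camera_changes(frames))  # -> [1]
```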

“A system may include an ingestion facility for ingesting a plurality of video feeds, and a machine learning system that processes the feeds through a spatiotemporal pattern recognition algorithm, applying machine learning to a series of events within the feeds to develop an understanding of those events. The understanding includes identifying context information for the series of events and an entry from a relationship library that details at least one relationship between two features of the plurality of video feeds. The system has an extraction facility that automatically extracts content displaying the series of events and associates the extracted content with the context information, and a video publishing facility that creates a video content data structure including the context information.”
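The ingestion, understanding, extraction, and publishing facilities can be pictured as a simple staged pipeline. Everything below is a placeholder skeleton, not the patented facilities themselves:

```python
class Pipeline:
    """Staged skeleton of the facilities named above; each stage is a
    callable, and each stage body below is a stand-in."""
    def __init__(self, *stages):
        self.stages = stages

    def run(self, feeds):
        artifact = feeds
        for stage in self.stages:
            artifact = stage(artifact)   # each facility hands off to the next
        return artifact

published = Pipeline(
    lambda feeds: [f for f in feeds if f],                           # ingestion
    lambda feeds: [{"feed": f, "events": ["stub"]} for f in feeds],  # understanding
    lambda notes: [n for n in notes if n["events"]],                 # extraction
    lambda clips: {"video_content": clips, "context": {}},           # publishing
).run(["broadcast.ts", "", "tracking.xyz"])
```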

“In embodiments, the system contains an analytic facility that determines a plurality of semantic categories for the context information and filters the plurality of video content data structures based on those categories. The system may include a matching engine that matches events in a first video feed with events in a second video feed, using a semantic understanding from the first feed to determine how to filter and cut the second. The system may also include a pattern recognition facility that determines a pattern related to the series of events and creates a content data structure based on that pattern. Embodiments use machine learning to develop the understanding of the series of events, including position tracking data over time from at least one of the video feeds and a chip-based player tracking system; the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game.”

In some embodiments, machine learning is used to develop the understanding of the series of events by aligning multiple unsynchronized input feeds related to the events using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds are selected from broadcast video feeds, tracking video feeds, and play-by-play data feeds, and in some embodiments include at least three feeds of at least two different types related to the events.

“In some embodiments, the system validates or modifies the alignment of the unsynchronized input feeds using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds, where at least one algorithm in the hierarchy for validating the alignment is selected based on the nature of the input feeds. The system may likewise validate or modify the machine learning understanding using such a hierarchy, with the nature of the input feed determining which algorithm is used.”

“In some embodiments, the system automatically develops a semantic index for a video feed based on the machine learning understanding of at least one event from the series of events in the feed, the index indicating the time and the location at which the event is displayed. The location can be at least one of a pixel location, a voxel location, and a raster image location. In embodiments, the system delivers the semantic index together with the video feed to allow augmentation of the feed, such as adding content based on the display location and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”

“In embodiments, a system that allows interaction with a broadcast video content stream includes a machine learning facility that processes at least one video feed through a spatiotemporal pattern recognition algorithm, applying machine learning to at least one event in the feed to develop an understanding of that event, where at least one of the feeds is the feed of a video broadcast. The understanding includes identifying context information for the at least one event and an entry from a relationship library that details at least one relationship between at least two features visible in the plurality of feeds. The system has a touch screen interface that allows at least one broadcaster to interact with at least one of the video feeds, where the interaction options presented in the interface are based on the context information and the interface controls at least a portion of the video feed content for the broadcast. Alternatively, remote viewers can control a portion of the video feed content for the broadcast through an interface whose options are likewise based on the context information.”

“In embodiments, the touch screen interface is a large screen visible to viewers of the broadcast as the broadcaster uses it. The interface allows the broadcaster to choose among context-relevant metrics to display on the large screen, and to display multiple video feeds that have similar contexts, where the similarity of contexts is determined by comparing events across the plurality of video feeds. The interface also allows the broadcaster to show a superimposed view of at least two video feeds to allow a comparison of events from the different feeds.”

“In embodiments, the machine learning provides detail about the similarity of players based on the players’ characteristics during different time periods.”

“In embodiments, the touch screen interface allows the broadcaster to display a plurality of highlights that are automatically determined using the machine learning understanding of a live sporting event that is the subject of the at least one video feed. In embodiments, the plurality of highlights is determined based on similarity to highlights from other events. Embodiments further use machine learning to develop the understanding of the at least one event using events in position tracking data over time, obtained from at least one of the video feeds and a chip-based player tracking system; the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game.”

In some embodiments, machine learning is used to develop the understanding of the at least one event by aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and one or more human operators. The unsynchronized input feeds are selected from broadcast video feeds, tracking video feeds, and play-by-play data feeds, and may include at least three feeds of at least two different types. The system may validate or modify the alignment of the unsynchronized input feeds, and likewise the machine learning understanding, using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds, with at least one algorithm in the hierarchy selected based on the nature of the input feed. In certain embodiments, the system automatically develops a semantic index for the at least one video feed based on the machine learning understanding of at least one event in the feed, indicating the time and the location at which the event is displayed; the location is at least one of a pixel location, a voxel location, and a raster image location. In embodiments, the system delivers the semantic index together with the video feed to allow augmentation of the feed, such as adding content based on the display location, and the touch screen interface can enable a touch interface feature as well as a mouse interface feature based on that location.

“In embodiments, a method for enabling interaction with a broadcast video content stream involves processing a video feed through a spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of an event in the feed. The understanding includes identifying context information for the event and an entry from a relationship library that details at least one relationship between two features visible in the video feed. The method provides a touch screen interface that allows a broadcaster to interact with the video feed, with interaction options based on the context information, and the interface controls the content of at least a portion of the broadcast. In embodiments, the touch screen interface is a large screen visible to viewers of the broadcast while the broadcaster uses it; it allows the broadcaster to choose among metrics relevant to the event context and display them on the large screen, and to display multiple video feeds during the broadcast. Machine learning identifies similar context information across the plurality of video feeds used in the broadcast, in certain embodiments by comparing events across those feeds.”

“In embodiments, the touch screen interface allows the broadcaster to display a superimposed view of at least two video feeds to allow a comparison of events from multiple feeds.”

“The machine learning can determine the similarity of players using characteristics of players from different time periods, and can create a plurality of highlights based on the live sporting event that is the subject of the broadcast; the broadcaster can display those highlights via the touch screen interface. In certain embodiments, the highlights are determined based on similarity to highlights of other events. Embodiments use machine learning to develop the understanding of the event, including position tracking data over time from at least one of the video feed and a chip-based player tracking system; the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game.”

“In some embodiments, the system uses machine learning to develop the understanding of the event by aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds are selected from broadcast video feeds, tracking video feeds, and play-by-play data feeds for the at least one event.”

“In some embodiments, the multiple unsynchronized feeds include at least three feeds of at least two different types related to the event. In embodiments, the method includes validating or modifying the alignment of the multiple unsynchronized feeds, and likewise the understanding, using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds. In some embodiments, the method includes automatically creating a semantic index for the video feed from the machine learning understanding of an event in the feed, indicating the time and location of the event.”

“In embodiments, the location of the display of an event in the video feed includes at least one of a pixel location, a voxel location, and a raster image location. Embodiments include delivering the semantic index together with the video feed to allow enhancement of the feed, such as adding content based on the display location and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”

“In some embodiments, a system that allows user interaction with video content includes an ingestion facility, executing on at least one processor, that is configured to access at least one video feed, and a machine learning system configured to process the at least one video feed through a spatiotemporal pattern recognition algorithm that applies machine learning to an event in each feed to develop an understanding of that event. The understanding includes identifying context information for the event and an entry from a relationship library that details at least one relationship between at least two features visible in the feed. The system also includes an extraction facility configured to automatically extract the content displaying the event and associate the extracted content with the context information, and a video production facility configured to create a video content data structure that includes the context information. A user interface allows users to interact with the video content data structure, and may be configured with context-based options for that interaction.”

“In embodiments, the user interface is provided in an application such as a mobile application, a smart TV application, a virtual reality headset application, or an augmented reality application. In embodiments, the user interface is a touch screen interface, and it allows the user to add a content element to the video feed.”

“In embodiments, the content element is at least one of a graphic element and a metric based on the machine learning understanding. In embodiments, the user interface allows the user to select content relevant to a specific player in a sporting event, or content relating to a particular context, such as a matchup between two players. In embodiments, the machine learning facility determines a context that includes a similarity between at least one player and at least one play in at least two video feeds, and the user interface allows the user to select at least one of the players and plays to view a video feed illustrating the comparison. The user interface can also be used to edit, cut, and share a video clip containing the video content data structure. In embodiments, the at least one video feed includes 3D motion camera data from a live sporting venue, and the machine learning facility increases its understanding by ingesting a plurality of previously identified events. The understanding of the event also uses events in position tracking data collected from at least one of the video feed and a chip-based player tracking system; it is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game.”
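Player/play similarity across two feeds could, as one illustrative reading, be a nearest-neighbour search over hand-picked per-play features; the feature choice and cosine metric are assumptions:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

def most_similar_play(query_vec, candidate_vecs):
    """Index of the candidate play (from another feed) most similar to the
    query play."""
    return max(range(len(candidate_vecs)),
               key=lambda i: cosine(query_vec, candidate_vecs[i]))

plays_then = [[6.1, 0.80, 1.0], [3.2, 0.10, 0.0]]       # e.g. (speed, arc, made?)
print(most_similar_play([5.9, 0.75, 1.0], plays_then))  # -> 0
```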

In some embodiments, machine learning is used to develop the understanding of the event by aligning multiple unsynchronized input feeds related to the event using at least one of a hierarchy of algorithms and a hierarchy of human operators. The unsynchronized input feeds are selected from broadcast video feeds, event tracking video feeds, and play-by-play data feeds, and may include at least three feeds of at least two different types. The system may validate or modify the alignment of the unsynchronized input feeds, and likewise the understanding, using a hierarchy that involves at least two of: one or more algorithms, one or more human operators, and one or more input feeds.

“In some embodiments, the system automatically develops a semantic index for the video feed based on the machine understanding of an event in the feed, the index indicating the time and location at which the event is displayed. The location can include at least one of a pixel location, a voxel location, and a raster image location. The system may deliver the semantic index together with the video feed to allow enhancement of the feed.”

“Enhancement of the video feed in embodiments includes adding content based on the display location and enabling at least one of a touch interface feature and a mouse interface feature based on the identified location.”

“In embodiments, extracting the content displaying the event includes automatically extracting a cut of the at least one video feed using a combination of the machine learning understanding of the event and a machine learning understanding of another input feed selected from a broadcast video feed, an audio feed, and a closed caption feed; the understanding of the other input feed includes at least one of a portion of broadcast commentary and a change of camera view in that feed.”

“In embodiments, a method of enabling a mobile application for user interaction with video content involves taking at least one video feed and processing it through a spatiotemporal pattern recognition algorithm that uses machine learning to develop an understanding of an event in the feed. The understanding includes identifying context information for the event and an entry from a relationship library that details at least one relationship between at least two features visible in the feed. The method involves automatically extracting the content that displays the event, associating it with the context information, and creating a video content data structure that contains the context information. A mobile application is provided that allows a user to interact with the video content data structure, with interaction options configured based on the context information.”

“In embodiments, the user interface is a touch screen interface that allows a user to add a content element to the video feed, where the content element is at least one of a metric based on the machine understanding and a graphic element. In embodiments, the user interface allows the user to select content pertaining to a specific player in a sporting event, or content relating to a context such as a matchup between two players.”

“In embodiments, the method includes taking at least two video feeds from different time periods, where the machine learning facility determines a context that includes a similarity between at least one player and a plurality of plays in the at least two feeds. The user interface allows the user to select among the plays and players to view a video feed illustrating the comparison, and can be used to edit, cut, and share a video clip containing the video content data structure.”

“In embodiments, the video feed includes 3D motion camera data from a live sporting venue, and the machine learning facility increases the method’s ability to understand by ingesting multiple previously identified events. The machine learning understanding of the event also uses events in position tracking data collected from at least one of the video feed and a chip-based player tracking system; the understanding is based on at least two of the spatial configuration, relative motion, and projected motion of at least one of a player and an item used in the game.”
