Invented by Asim Roy, Arizona Board of Regents of ASU

The market for Method and Apparatus for Machine Learning at Large Scale is experiencing significant growth and is poised to revolutionize various industries. Machine learning, a subset of artificial intelligence, has gained immense popularity in recent years due to its ability to analyze large datasets and extract valuable insights. This technology has the potential to transform industries such as healthcare, finance, retail, and manufacturing, among others.

Machine learning at large scale refers to the application of machine learning algorithms on massive datasets. Traditional machine learning techniques often struggle to handle large volumes of data, resulting in slower processing times and limited scalability. However, with the advent of advanced computing technologies and cloud computing, the market for methods and apparatus for machine learning at large scale has witnessed a surge.

One of the key drivers of this market is the increasing availability of big data. Organizations are generating massive amounts of data every day, and they are realizing the potential of leveraging this data to gain a competitive edge. Machine learning at large scale enables businesses to process and analyze these vast datasets, uncovering patterns, trends, and insights that were previously hidden. This, in turn, allows companies to make data-driven decisions, optimize operations, and enhance customer experiences.

The healthcare industry is one sector that stands to benefit greatly from machine learning at large scale. With the growing adoption of electronic health records and the digitization of medical data, there is an abundance of patient information available. Machine learning algorithms can analyze this data to identify patterns and predict disease outcomes, enabling early detection and personalized treatment plans. Additionally, large-scale machine learning can assist in drug discovery, clinical trials, and medical imaging analysis, revolutionizing the field of healthcare.

Another industry that can leverage machine learning at large scale is finance. Financial institutions deal with vast amounts of data, including transaction records, customer information, and market data. By applying machine learning algorithms at large scale, banks and investment firms can detect fraudulent activities, predict market trends, and optimize trading strategies. This technology can also help in credit scoring, risk assessment, and portfolio management, leading to more accurate and efficient financial decision-making.

Retail is yet another sector that can benefit from machine learning at large scale. With the rise of e-commerce and online shopping, retailers have access to an enormous amount of customer data. By analyzing this data using large-scale machine learning, retailers can personalize marketing campaigns, recommend products, and optimize pricing strategies. This technology can also help in inventory management, supply chain optimization, and fraud detection, ultimately enhancing the overall customer experience.

Manufacturing is also witnessing the impact of machine learning at large scale. By analyzing sensor data from machines and production lines, manufacturers can identify patterns and anomalies, leading to predictive maintenance and reducing downtime. Large-scale machine learning can also optimize production processes, improve quality control, and enable autonomous robots for tasks such as assembly and inspection.
In conclusion, the market for Method and Apparatus for Machine Learning at Large Scale is experiencing rapid growth, driven by the increasing availability of big data and advancements in computing technologies. This technology has the potential to transform various industries, including healthcare, finance, retail, and manufacturing. As organizations realize the value of leveraging large datasets, machine learning at large scale will continue to play a crucial role in driving innovation and improving business outcomes.

The Arizona Board of Regents of ASU invention works as follows

The process of analyzing patterns in data streams and taking action on them involves clustering training data to create training examples, then selecting features from the examples that are predictive of the different classes of patterns. A number of artificial neural networks ("ANNs") are then trained in parallel, and the active nodes are extracted from them. The process continues by adding a class label to each extracted active node, classifying patterns in the data based on the class-labeled active nodes, and taking an action based on the classified patterns.

Background for Method and apparatus for machine learning at large scale

With the advent of high-dimensional big data, in both stored and streaming form, machine learning at large scale is required. Such machine learning would benefit from being extremely fast, scalable in both volume and dimensionality, able to learn from streaming data, able to automatically perform dimensionality reduction on high-dimensional data, and deployable on massively parallel hardware. Artificial neural networks (ANNs), which are well positioned to tackle these challenges, can be used for large-scale machine learning.

Embodiments of the invention provide a way to analyze patterns in a stream of data and take action based on that analysis. Training examples are created from the data. Using the training examples, features are selected that are predictive of the different classes of patterns found in the data. The data are then used to train a set of Kohonen networks based on the selected features, and active nodes that represent a particular class of patterns are extracted from the Kohonen nets. The extracted active nodes are assigned class labels, patterns in the data are classified using these class-labeled nodes, and an action is taken based on the classified patterns.

Embodiments provide a method which can handle large-scale, high-dimensional information. Embodiments offer an online method for streaming data as well as for large amounts of stored big data. Embodiments train a large number of Kohonen networks simultaneously, both during the feature selection phase and the classifier construction phase. In the end, however, embodiments retain only a small number of neurons (nodes) from the Kohonen networks built during the classifier construction phase; the Kohonen networks themselves are discarded after training. Embodiments use Kohonen networks to reduce dimensionality through feature selection, and also to create an ensemble of classifiers based on single Kohonen neurons. The embodiments are designed to take advantage of massive parallelism, and they should be easy to deploy on hardware that implements Kohonen networks. Other embodiments provide a method for handling imbalanced data.

A Kohonen network is also known as a Kohonen map. These artificial neural networks, introduced in the 1980s by Teuvo Kohonen, a Finnish professor, are sometimes referred to as Kohonen maps. A Kohonen map is a self-organizing map (SOM), or self-organizing feature map (SOFM), which is a type of artificial neural network trained by unsupervised learning. It produces a low-dimensional, discretized representation (typically 2-D) of the input space of the training samples. Self-organizing maps are distinct from other artificial neural networks because they employ competitive learning instead of error correction (such as gradient descent with back-propagation), and because they use a neighborhood function to preserve the topological properties of the input space. SOMs are useful for visualizing low-dimensional views of high-dimensional data, similar to multidimensional scaling. Kohonen nets do not require outputs to be assigned for every input vector. Instead, the inputs are connected to a two-dimensional grid of nodes, or neurons, so that multi-dimensional data can be mapped onto a two-dimensional surface.
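To make this competitive-learning scheme concrete, the following is a minimal sketch of training a Kohonen map in Python. The grid size, linear learning-rate decay, and Gaussian neighborhood function are common illustrative choices and are assumptions here, not details taken from the patent.

```python
import numpy as np

def train_som(data, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=3.0):
    """Minimal self-organizing map: competitive learning with a
    Gaussian neighborhood, no error back-propagation."""
    n_features = data.shape[1]
    weights = np.random.rand(rows, cols, n_features)
    # Grid coordinates of each node, used by the neighborhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            # Decay the learning rate and neighborhood radius over time.
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            # Competitive step: find the best-matching (winning) node.
            dists = np.linalg.norm(weights - x, axis=-1)
            winner = np.unravel_index(np.argmin(dists), dists.shape)
            # Cooperative step: pull the winner and its grid neighbors
            # toward the input, which preserves topology.
            grid_dist = np.linalg.norm(grid - np.array(winner), axis=-1)
            h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights
```

Each input vector adjusts the winning node and its grid neighbors, which is how the map preserves the topological properties of the input space on a two-dimensional grid.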

"Streaming data" is data generated continuously from many data sources. These data are typically transmitted simultaneously and in small sizes, usually on the order of kilobytes. Data streams include a variety of information, such as log files created by users of mobile and web applications, e-commerce data, data from financial trading floors, data from geospatial services, gene expression datasets, and telemetry or sensor data collected from data center devices. These streaming data are often received in real time or near-real time and processed on a record-by-record basis or over sliding time windows; they can be used for many kinds of analytics, including correlations and aggregations. In most cases, streaming data processing is beneficial when new data are continuously generated, which applies to a wide range of industries and big data use cases. Companies usually conduct data analysis on such streams, which includes applying machine learning algorithms and extracting deeper insights from the data. Stored data is different: it has been collected and saved in permanent memory(s) and storage devices for retrieval and later processing by a computing platform with access to those permanent memory(s).

1. INTRODUCTION

The influx of streaming and big data has caused major changes in the machine learning industry. Machine learning systems are under more pressure than ever before, with the need to quickly learn from large amounts of data, to automate machine learning in order to reduce the involvement of (human) experts, and to deploy machine learning on highly parallel hardware. Traditional artificial neural network (ANN) algorithms ("neural network algorithms" or "neural net algorithms") have many features that meet these demands of big data, and they can therefore play a major role in the transformations taking place. The learning mode for many neural network algorithms, for example, is incremental online learning, a mode which does not require simultaneous access to all of the data. This learning mode not only solves many of the computational problems associated with learning from big data, but also eliminates the headaches of accurately sampling large volumes of data. This makes neural network algorithms highly scalable; they can learn from data of any size. This type of learning can also be used with streaming data: very little or no data is stored, and the learning system takes only a quick look at each record as it passes through the system.
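As a simple illustration of this one-pass, incremental mode (an assumed example for exposition, not the invention's algorithm), a statistic can be updated from each record as it arrives and the record then discarded, so the full dataset never needs to be accessible at once:

```python
def online_mean_variance(stream):
    """Welford's one-pass update: each record is seen once and then
    discarded, so no simultaneous access to the data is needed."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0
    return mean, variance
```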

Neural network algorithms also have an advantage in that they use very simple computations which can be highly parallelized. They can exploit massively parallel computing facilities to achieve very rapid learning and response rates. Parallel computation is used in many neural network implementations on graphics processing units (GPUs), and neuromorphic hardware designed specifically for neural network implementation is also becoming available. In embodiments of this invention, Kohonen networks, or Kohonen nets, are single-layer nets that have their own hardware implementations. In general, embodiments of the present invention admit hardware implementations of neural network algorithms that handle high-speed streaming data; these hardware implementations are also capable of processing stored big data very quickly. All of these properties give neural network algorithms the potential to be the backbone of machine learning in an era of streaming and big data.

Embodiments of the invention provide a new and novel neural network learning method that (1) can be parallelized at different levels of granularity, (2) addresses the issue of high-dimensional data through class-based feature selection, and (3) learns an ensemble of classifiers using selected Kohonen neurons (or Kohonen nodes).

Referring to FIG. 5, the method 500 takes a large volume of data (for example, from a high-dimensional stream at 505, or from a historical store) and uses it to create training examples. In one embodiment, for example, the method 500 trains a plurality of Kohonen networks in parallel on streaming information to create representative data points (also referred to by other names, such as training samples or training examples) at 510. It should be noted that stored data may also be received from a permanent memory or storage accessible by a computing platform, whether a software-based platform that executes code to access the permanent memory or storage, or a hardware platform with access to it. In one embodiment, class-based features are selected at 515 using Kohonen networks. The basic criteria for selecting features for each class are (1) that they make the class compact and (2) that they maximize the average distance to the other classes. Once the class-specific features are selected, the method discards all the Kohonen networks that were used to create the training samples. At 520, in a second phase of the method, it constructs multiple new Kohonen networks in parallel, in different feature spaces, based on the selected class-based features. After these Kohonen networks are trained at 525, the method extracts only the active neurons. The method adds a class label to each active neuron, creating a group of class-labeled Kohonen nodes for classification; it discards all the Kohonen networks and keeps only a few active Kohonen nodes from the different Kohonen nets. The set of active nodes with class labels can be used at 530 to classify patterns within the data, and at 535 some action is taken based on these classified patterns.
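Pulling steps 510 through 530 together, a minimal end-to-end sketch might look like the following. It reuses the illustrative `train_som` function sketched earlier; the grid size, the number of features per class `k`, the simple scoring rule, and all helper names are assumptions for exposition rather than the patented implementation, and class labels are assumed to be small non-negative integers so that `np.bincount` applies.

```python
import numpy as np

def select_class_features(X, y, c, k=4):
    """Step 515 (illustrative): favor features that keep class c compact
    while separating its mean from the other classes."""
    in_c, rest = X[y == c], X[y != c]
    score = np.abs(in_c.mean(axis=0) - rest.mean(axis=0)) / (in_c.std(axis=0) + 1e-9)
    return np.argsort(score)[-k:]

def build_hypersphere_classifier(X, y, k=4, rows=6, cols=6):
    """Steps 510-525: train one Kohonen net per class-specific feature
    space, keep only class-labeled active neurons, discard the nets."""
    nodes = []
    for c in np.unique(y):
        feats = select_class_features(X, y, c, k)
        som = train_som(X[:, feats], rows, cols).reshape(-1, k)
        # Winning neuron for every training point in this feature space.
        winners = np.argmin(
            np.linalg.norm(X[:, feats][:, None, :] - som[None, :, :], axis=2),
            axis=1)
        for j in np.unique(winners):
            hit_labels = y[winners == j]
            top = np.bincount(hit_labels).argmax()   # majority class wins
            mask = (winners == j) & (y == top)
            # Radius of activation: farthest winning point of that class.
            radius = np.linalg.norm(X[mask][:, feats] - som[j], axis=1).max()
            nodes.append((som[j], feats, int(top), radius))
        # The Kohonen net itself is discarded; only `nodes` survive.
    return nodes

def classify(nodes, x):
    """Step 530: pick the class of the hypersphere whose center is
    nearest to x relative to its radius of activation."""
    scores = [np.linalg.norm(x[f] - w) / (r + 1e-9) for w, f, c, r in nodes]
    return nodes[int(np.argmin(scores))][2]
```

Note that after `build_hypersphere_classifier` returns, nothing of the Kohonen nets survives except the labeled active nodes, mirroring the discard steps described above.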

In imbalanced data problems, such as fraud detection, very few data points are available for some classes while many are available for other classes. Classification algorithms have always had a difficult time dealing with imbalanced problems, and dealing with streaming versions of imbalanced data is especially challenging. A second embodiment of the invention is a method for handling imbalanced data by creating a layer of Kohonen nets. This is described below in greater detail.

Section 2 of this description provides a brief overview of concepts that are used by embodiments of the present invention, including hypersphere nets and class-specific feature selection. Section 3 describes an algorithm, according to embodiments, which uses Kohonen networks in parallel to select class-specific features from streaming data. Sections 4 and 5 describe how an ensemble hypersphere net is constructed, according to different embodiments of the invention, using neurons from various Kohonen networks. Section 6 presents computational results for a number of high-dimensional problems. Section 7 presents an embodiment of the invention comprising a method for dealing with imbalanced data problems, along with some computational results. Section 8 discusses the hardware implementation of embodiments, and Section 9 presents conclusions.

2. OVERVIEW OF CONCEPTS

Embodiments use a method to create hypersphere classification networks, as shown in embodiment 100 of FIG. 1, by constructing Kohonen networks from streaming data within reduced feature spaces. The general architecture of a Kohonen map, or self-organizing map (SOM), is shown in FIG. 2 at 200 and 205. For clarity, only three connections are shown in the network at 200. In some embodiments, all Kohonen networks are discarded at the end of the process, and only selected Kohonen neurons from these nets remain as hyperspheres. In one embodiment, all Kohonen networks can be built in parallel in two phases: the feature selection phase, followed by the classifier construction phase.

2.1 Hypersphere Classification Nets

As shown in FIG. 1, a hypersphere classification network 100 has a hidden layer as well as an output layer. This shallow architecture is much faster than nets with multiple hidden layers, particularly in terms of computing speed.

A prior art technique constructs hypersphere networks in an offline mode. Embodiments of the invention use Kohonen networks to construct hypersphere nets online. One embodiment builds Kohonen networks in a reduced feature space with streaming data after selecting class-specific features. After training the Kohonen nets, an embodiment 300 adds a class label to each neuron (or "node"), as shown in FIG. 3. The embodiment assigns a neuron to a class if the majority of the streaming data points for which it is the best, or winning, neuron belong to that class. The radius of activation for such a neuron equals the distance to the farthest data point that belongs to the class to which the neuron is assigned and for which the neuron wins. Neurons that have not been assigned to a class are discarded. The main concepts of the process of creating hypersphere nets using Kohonen networks have been described so far; this description provides further details on embodiments of the present invention below.
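As a toy illustration of this labeling rule (all values assumed, not taken from the patent), consider a neuron whose winning data points all belong to one class:

```python
import numpy as np

# Toy values, purely illustrative. A neuron at `center` wins for three
# class-1 points, so it is labeled class 1; its radius of activation is
# the distance to the farthest of those points.
center = np.array([0.0, 0.0])
class1_points = np.array([[0.1, 0.0], [0.0, 0.3], [-0.2, 0.1]])
radius = np.linalg.norm(class1_points - center, axis=1).max()  # 0.3

# A query point inside the hypersphere is classified as class 1.
query = np.array([0.05, 0.2])
print(np.linalg.norm(query - center) <= radius)  # True
```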

2.2 Class-Specific Feature Selection and Dimensionality Reduction

Dimensionality reduction is one of the major challenges that machine learning systems face when dealing with high-dimensional streaming data. Many of the methods used in the past to extract features, such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), do not work well with high-dimensional data. In recent years, a number of prior art methods for online feature selection, as well as feature extraction, from high-dimensional streaming datasets have been developed. In one prior art method, the online learning problem is treated as a sequential learning problem; however, the number of features that can be used by the learning system is fixed. In other prior art methods, features are streamed one by one and the system must select the best ones, but all the training examples are available prior to the start of training. Another approach proposes two methods of dimensionality reduction using the orthogonal centroid algorithm: an online, incremental method for feature extraction and an offline method for feature selection. Another prior art proposal extends the Maximum Margin Criterion to streaming data. One prior art approach proposes an online version of Isometric Feature Mapping, a nonlinear dimensionality reduction method.

One prior-art approach presents a distributed, parallel feature selection algorithm that preserves the variance within the data. It is capable of performing both supervised feature selection (based on user input) and automatic feature selection (based on machine learning), and it uses data partitioning for large datasets. Another approach proposes a highly scalable algorithm for feature selection in logistic regression, which can be parallelized over both records and features within a MapReduce framework; this approach ranks features based on the logistic regression coefficients.

However, none of these approaches is a class-specific feature-selection method for streaming data, although the idea of projected subspaces (i.e., class-specific extracted features) has been around for a while. Prior art methods have recently used the concept of class-specific feature selection in ensemble learning; in one such method, a subset of class-specific features was used in the class-specific classifiers. However, none of the prior art methods is appropriate for streaming data.

Embodiments use class-specific feature selection for dimensionality reduction. It is advantageous to preserve the original features of a problem because they often have meaning or interpretation; this meaning is lost when features are extracted or derived. The algorithm selects class-specific features by finding separate feature sets that are best suited to distinguishing each class from the other classes. The criteria for identifying class-specific features are similar to those used in LDA or the Maximum Margin Criterion (MMC), which are feature extraction methods. LDA, MMC, and similar feature extraction methods maximize the scatter between classes and minimize the scatter within classes; that is, they aim to minimize the scatter within a class while maximizing the distance between class centers. Embodiments of the invention are not based upon a feature extraction method, but they share a similar idea. The feature selection criterion is also similar to a prior art method that preserves the variance within the data.
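To make this criterion concrete, here is a small illustrative scoring function (an assumed sketch, not the patent's algorithm): for a given class, each original feature is scored by the average per-feature distance from that class's center to the other class centers, divided by the feature's within-class scatter, and the top-scoring features are selected.

```python
import numpy as np

def class_specific_feature_scores(X, y, c):
    """Illustrative per-feature score for class c: large average
    distance to the other class centers (separation) divided by
    within-class variance (compactness). Higher is better."""
    centers = {k: X[y == k].mean(axis=0) for k in np.unique(y)}
    within = X[y == c].var(axis=0) + 1e-9
    between = np.mean([np.abs(centers[c] - centers[k])
                       for k in centers if k != c], axis=0)
    return between / within

# Example: pick the 4 best features for class 0 (sizes assumed).
X = np.random.rand(200, 50)
y = np.random.randint(0, 3, size=200)
top4 = np.argsort(class_specific_feature_scores(X, y, 0))[-4:]
```

Because whole original features are selected rather than combined into derived ones, the interpretability mentioned above is preserved.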
