Invented by Mohammed Shoaib, Jie Liu, Swagath Venkataramani, Microsoft Technology Licensing LLC
The Microsoft Technology Licensing LLC invention works as follows
Scalable-effort machine learning can automatically and dynamically adapt the amount of computation applied to input data depending on the data's complexity. Fixed-effort machine learning, by contrast, applies a single classifier algorithm uniformly to both simple and complex data. Scalable-effort machine learning includes, among other things, classifiers that can be arranged as a sequence of classifier stages of increasing complexity (and accuracy). A first stage may use relatively simple machine learning models to classify relatively simple data. Subsequent classifier stages use increasingly complex machine learning models to classify more complex data. Scalable-effort machine learning includes algorithms that can differentiate among data based on its complexity.
Background for Scalable effort classifiers for energy efficient machine learning
Data-driven or supervised machine learning algorithms are becoming important tools for analyzing information on portable devices, in the cloud, and on other computing devices. Machine learning involves various algorithms that can learn automatically over time. These algorithms are grounded in mathematics and statistics and can be used to diagnose problems, classify entities, and predict events. They are used in a variety of applications, including semantic text analysis, web search, and speech and object recognition. Supervised machine learning algorithms usually operate in two phases: training and testing. In the training phase, input examples that are typical of the data are used to build decision models. In the testing phase, the learned model is applied to new data instances to infer different properties, such as similarity and relevance.
This disclosure describes techniques and architectures for a scalable-effort (SE) machine learning system that can dynamically and automatically adjust the amount of effort applied to input data depending on the data's complexity. An amount of effort corresponds, for example, to an amount of computing time, energy, or resources, such as hardware area (e.g., footprint). A one-size-fits-all approach of applying a single classifier algorithm to both simple and complex data can thus be avoided. SE machine learning uses biased classifiers and cascaded classifiers. Cascaded classifiers can be arranged as a series of multiple classifier stages of increasing complexity (and accuracy). A first classifier stage, for example, uses the simplest machine learning models and can classify input data that is relatively simple. Subsequent classifier stages use increasingly complex machine learning models to classify more complex input data. This approach offers a number of benefits, such as faster computation and reduced energy consumption, compared with fixed-effort machine learning.
This Summary is intended to introduce a number of concepts that are further explained in the detailed description. It is not meant to identify key features or essential elements of the claimed subject matter, nor is it meant to be used to determine the scope of the claimed subject matter. The term “techniques” is used throughout this document.
Complexity is a variable factor in the input data of computing systems. The complexity of data can be measured, for example, by the computing time (e.g., effort) or cost required to process it. In an 8-bit multiplier, for instance, multiplying 23 by 114 should take more effort than computing the product of 2 and 1. Similarly, compressing images that contain mostly blue sky should be easier than compressing images of crowded streets. Typical computational systems do not dynamically adjust to the complexity of input data. The same algorithm is used to process both the image of a mostly blue sky and the image of a busy street. In such cases, the algorithm is configured to work optimally with either high-complexity or average-complexity data. In the first configuration, computing effort (e.g., cost) is “wasted” on all input data except the most complex. In the second configuration, computing effort can be wasted on input data of low complexity, while high levels of error or uncertainty in the computation may occur for input data of above-average complexity.
In various embodiments, machine learning techniques and architectures use scalable effort (SE), which, among other things, automatically and dynamically adjusts the amount of computational effort applied to input data depending on the data's complexity. In this context, effort refers to the time or energy spent by a computing device, the area needed to implement a computing function in hardware, and so on. Fixed-effort machine learning, by contrast, applies a single classifier algorithm uniformly to both simple and complex data. SE machine learning includes, among other things, biased classifiers and cascaded classifiers. Cascaded classifiers can be arranged as a series of multiple classifier stages of increasing complexity and accuracy. A first classifier stage, for example, may use relatively simple machine learning models that are able to classify relatively simple data. Subsequent classifier stages use increasingly complex machine learning models that can classify more complex data. The complexity of a classifier stage can be proportional to its computing cost, for instance.
SE machine learning includes algorithms that can differentiate among data based on the data's complexity. In this way, SE machine learning can expend computational effort (e.g., computing time and energy) in proportion to the difficulty of the data. This approach offers a number of benefits, such as faster computation and reduced energy consumption, compared with fixed-effort machine learning.
In general, fixed-effort machine learning operates in a training phase, where data examples typical of the domain are used to build a single decision model that characterizes the data. In its training phase, SE machine learning instead builds a number of relatively simple decision models using subsets of the training data. At test time, SE machine learning can apply one or several of these decision models, depending on the difficulty of the input data.
To illustrate the benefits of SE machine learning, consider a traditional approach that uses fixed-effort machine learning. For example, a binary support vector machine (SVM) classifier may apply a specific learning algorithm to input training data (hereafter called training instances) to build a model (a decision boundary). The decision boundary divides the data into two classes or categories in a feature space. At test time, after training, test instances are classified into one of the two classes depending on their location and distance relative to the decision boundary in the feature space. The computational effort required to process each test instance (in energy and time) depends on the complexity of the decision boundary and on the location and distance of the test instance relative to it. Non-linear decision boundaries typically cost more to evaluate than linear ones. In general, a single non-linear decision boundary is used to accommodate test instances of all complexity levels, which can lead to a relatively high computational cost for both complex and simple test instances.
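The cost difference between linear and non-linear boundaries can be sketched as follows. This is a minimal illustration rather than the method of the disclosure: it assumes an RBF kernel for the non-linear case, and every parameter value (`w`, `svs`, `alphas`, `gamma`) is hand-picked and hypothetical, not trained. The linear decision function needs one dot product per test instance, while the kernel decision function needs one kernel evaluation per support vector.

```python
import math

def linear_decision(w, b, x):
    """Linear boundary: a single dot product, so cost grows with feature count only."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def rbf_decision(support_vectors, coeffs, b, gamma, x):
    """Non-linear (RBF kernel) boundary: one kernel evaluation per support vector."""
    total = b
    for sv, coeff in zip(support_vectors, coeffs):
        sq_dist = sum((si - xi) ** 2 for si, xi in zip(sv, x))
        total += coeff * math.exp(-gamma * sq_dist)
    return total

def classify(score):
    """The sign of the decision value selects one of the two classes."""
    return 1 if score >= 0 else -1

# Hypothetical, hand-picked parameters (assumptions for illustration only).
w, b_lin = [1.0, -1.0], 0.0
svs = [[1.0, 0.0], [0.0, 1.0]]
alphas = [1.0, -1.0]          # signed coefficients, one per support vector
b_rbf, gamma = 0.0, 0.5

x = [2.0, 0.5]                # a test instance far from both boundaries
linear_score = linear_decision(w, b_lin, x)
linear_label = classify(linear_score)
rbf_label = classify(rbf_decision(svs, alphas, b_rbf, gamma, x))
```

In this toy setup both models assign the same class to `x`, so the cheaper linear model would have sufficed for this instance. The magnitude of the decision value can serve as a rough proxy for distance from the boundary.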
SE machine learning generates multiple decision models by selecting training instances of different levels of complexity. This type of selection is called model partitioning. It can reduce computing costs because not all data instances need to be processed by the same non-linear decision model.
The amount of time and energy saved by model partitioning depends on the particular application. In many applications, most test instances are relatively simple. When detecting motion with a surveillance camera, for example, the majority of video frames contain only relatively static objects. In another example, about two thirds of handwriting recognition data may lie far from a decision boundary (and therefore be relatively simple to classify).
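To make the dependence on the application concrete, the average effort per test instance can be estimated from the share of instances that finish at each stage. The sketch below is a back-of-the-envelope calculation under stated assumptions: the function name `expected_effort` and the stage costs (1 unit and 10 units) are hypothetical, and only the two-thirds share comes from the handwriting example above.

```python
def expected_effort(stage_costs, exit_fractions):
    """Average per-instance cost of a multi-stage classifier.

    stage_costs[i]    : cost of running stage i on one instance
    exit_fractions[i] : fraction of all instances that get their final label at stage i
    An instance that exits at stage i has paid for stages 0 through i.
    """
    if len(stage_costs) != len(exit_fractions):
        raise ValueError("need one exit fraction per stage")
    if abs(sum(exit_fractions) - 1.0) > 1e-9:
        raise ValueError("exit fractions must sum to 1")
    total = 0.0
    cumulative_cost = 0.0
    for cost, fraction in zip(stage_costs, exit_fractions):
        cumulative_cost += cost
        total += fraction * cumulative_cost
    return total

# Hypothetical costs: a simple stage (1 unit) followed by a complex stage (10 units).
# Two thirds of the instances are assumed to exit after the simple stage.
partitioned = expected_effort([1.0, 10.0], [2 / 3, 1 / 3])
fixed = 10.0  # every instance goes through the complex model
```

Under these assumed numbers the partitioned classifier averages about 4.33 cost units per instance versus 10 for the fixed-effort model. If nearly all instances were complex, the extra simple stage would add cost instead of saving it.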
In some embodiments, the complexity of a test instance may be determined implicitly at runtime. Test instances can be processed by a sequence of decision models, starting with the simplest and progressing to more complex models. After each model in the sequence is applied, the confidence level of the resulting class label (e.g., the class probability) can be determined. If the confidence level exceeds a certain threshold value, the class label produced by the current model is considered final, and the test instance is not processed by any of the subsequent models. Relatively simple test instances are therefore processed by only the first few (least complex) models, while complex test instances are processed by a larger number of (increasingly complex) models. This approach is a resource-management technique that allows computational effort to be scaled at runtime.
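The runtime procedure described above can be sketched as a short loop with early exit. This is a minimal sketch, not the implementation of the disclosure: the names `classify_scalable`, `simple_stage`, and `complex_stage` are hypothetical, the two stages are toy stand-ins for trained models, and accepting the last stage's label regardless of confidence is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# A stage maps a feature vector to (class_label, confidence in [0, 1]).
Stage = Callable[[Sequence[float]], Tuple[int, float]]

def classify_scalable(x: Sequence[float], stages: List[Stage],
                      threshold: float = 0.9) -> Tuple[int, int]:
    """Apply stages from simplest to most complex; stop at the first confident one.

    Returns (label, number_of_stages_used). Assumption: if no stage exceeds
    the threshold, the label from the final stage is used.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    label, used = 0, 0
    for used, stage in enumerate(stages, start=1):
        label, confidence = stage(x)
        if confidence > threshold:
            break  # confident enough: skip all remaining stages
    return label, used

def simple_stage(x):
    # Toy stand-in for a cheap model: confident only when x[0] is far from 0.
    score = x[0]
    confidence = min(1.0, 0.5 + abs(score) / 2)
    return (1 if score >= 0 else -1), confidence

def complex_stage(x):
    # Toy stand-in for a costlier model that uses both features.
    score = x[0] + x[1]
    return (1 if score >= 0 else -1), 1.0

stages = [simple_stage, complex_stage]
easy = classify_scalable([3.0, 0.0], stages)    # far from the boundary
hard = classify_scalable([0.1, -0.5], stages)   # close to the boundary
```

Here `easy` is labeled by the first stage alone, while `hard` falls through to the second stage. The threshold controls the trade-off: a higher value sends more instances to the complex stages, which raises effort and, in a real system, would be expected to raise accuracy.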
The following is a description of the various embodiments with respect to FIGS. 1-14.
The environment described below is only an example and does not limit the claims to any particular operating environment. Other environments can be used without departing from the spirit and scope of the claimed subject matter.
FIG. 1 illustrates an example environment 100 in which embodiments of SE machine learning described herein may operate. In some embodiments, environment 100 may include computing devices 102. Computing devices 102 may include, for example, devices 102a to 102e, although they are not limited to the device types shown. Computing devices 102 may include any device that has one or more processors 104 connected, e.g., via a bus, to memory 108 and an input/output interface 106. Computing devices 102 can include personal computers such as desktop computers 102a, laptops 102b, tablets 102c, telecommunications devices 102d, PDAs 102e, electronic book readers, wearable computers, and automotive computers. Computing devices 102 may also include business- or retail-oriented devices, such as server computers, thin clients, terminals, and/or workstations. Computing devices 102 may also include components for integration into a computing device or appliance. In some embodiments, some or all of the functionality described as being performed by computing devices 102 may be implemented by a server, a peer device, or cloud computing resources. In some embodiments, a computing device 102 can include an input port for receiving an input value having a level of complexity and a memory device that stores a plurality of machine learning models. The machine learning models have different abilities to classify input values. The computing device 102 can further include a processor that applies one or more of the machine learning models based, at least in part, on the complexity of the input value.
In certain embodiments, such as the one shown for device 102d, memory 108 can store instructions executable by the processors 104, including an operating system 112, a machine learning module 114, and programs or applications 116 that are loadable and executable by processors 104. The one or more processors 104 can include central processing units, graphics processing units, video buffer processors, and so on. In some implementations, the machine learning module 114 comprises executable code stored in memory 108 and executable by processors 104 to collect information, locally or remotely, via input/output interface 106. The information may be associated with one or more of applications 116. The machine learning module 114 can selectively apply any of a number of machine learning decision models stored in memory 108 (or, more specifically, in machine learning module 114) to input data. The selection may be based, at least in part, on the complexity and size of the input data.
Although certain modules are described as performing particular operations, they are only examples. The same or similar functionality can be performed by more or fewer modules. The functions performed by the modules shown do not have to be performed locally by a single device. Some operations may be performed remotely (e.g., by a server, a peer device, or the cloud).
Hardware logic components can perform some or all of the functionality described herein. Examples of hardware logic components include, but are not limited to, Field-programmable Gate Arrays, Program-specific Integrated Circuits, Program-specific Standard Products, System-on-a-chip systems, and Complex Programmable Logic Devices.
In some embodiments, the computing device 102 may be equipped with a camera or microphone that can capture images, video, or audio. For example, the input/output interface 106 can include such a camera or microphone. Memory 108 can include one or more computer-readable media.
Computer-readable media can include computer storage media or communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, phase change memory, static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory, read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD), or other optical storage.
In contrast, communication media embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or another transmission mechanism. Computer storage media, as defined herein, do not include communication media. In various embodiments, memory 108 is an example of computer storage media that stores computer-executable instructions. When executed by the processor(s) 104, the computer-executable instructions configure the processor(s) to, among other things, execute an application and gather information associated with the application. The information may be collected locally by computing device 102. The computer-executable instructions can also configure the processor(s) to normalize the feature output of a machine learning model accessible by the application, based at least in part on the local information collected by the client device.
In various embodiments, an input device of input/output (I/O) interfaces 106 may be a direct-touch device (e.g., a touch screen), an indirect-touch device (e.g., a touch pad), or an indirect input device (e.g., a keyboard, camera, camera array, or mouse).