Invented by Harish Doddi, Jerry Xu, Datatron Technologies Inc

The market for systems and methods for modeling machine learning and data analytics has grown significantly in recent years. With the increasing adoption of artificial intelligence (AI) and big data analytics across industries, demand for advanced modeling tools and techniques has surged. Machine learning and data analytics have become crucial for businesses seeking to extract insights from vast amounts of data and make informed decisions. These technologies let organizations uncover patterns, trends, and correlations that were previously hidden, improving operational efficiency, customer satisfaction, and competitive advantage.

Several factors drive this market. First, the exponential growth of data generated by businesses and individuals has created a need for powerful tools that can process and analyze it effectively; traditional methods of data analysis can no longer handle the volume, variety, and velocity of data being generated. Second, advances in AI and machine learning algorithms have made it possible to build sophisticated models that learn from data and make accurate predictions or recommendations. Such models can be applied across domains such as finance, healthcare, marketing, and manufacturing to solve complex problems and optimize processes. Third, the growing availability of cloud computing and storage has made it easier for businesses to access machine learning and data analytics tools. Cloud-based platforms offer scalability, flexibility, and cost-effectiveness, allowing organizations of all sizes to adopt these technologies without significant upfront investment.

The market is highly competitive, with numerous vendors offering solutions that fall into three main categories: software platforms, open-source frameworks, and specialized tools. Software platforms provide end-to-end support for data modeling, from data preprocessing and feature engineering through model training and deployment; they often include user-friendly interfaces, drag-and-drop functionality, and automated workflows that help non-technical users build and deploy models. Open-source frameworks such as TensorFlow and PyTorch have gained popularity for their flexibility, extensibility, and large developer communities; they provide libraries and tools for building and training machine learning models, letting developers customize and experiment with different algorithms and architectures. Specialized tools focus on particular aspects of machine learning and data analytics, such as data visualization, anomaly detection, or natural language processing, offering advanced features and algorithms tailored to specific use cases.

The market is expected to keep growing in the coming years. As more businesses recognize the value of data-driven decision-making and invest in AI and analytics capabilities, demand for advanced modeling tools will rise. Ongoing research and development will also yield new algorithms, techniques, and tools, further driving market growth and enabling businesses to extract even more value from their data.

In short, the market for systems and methods for modeling machine learning and data analytics is expanding rapidly on the back of AI and big data adoption. Demand is driven by the exponential growth of data, advances in AI algorithms, and the availability of cloud computing. As businesses continue to invest in data-driven decision-making, the market should expand further, offering more innovative and specialized solutions.

The Datatron Technologies Inc invention works as follows

Systems and Methods for Implementing and Using a Data Modeling and Machine Learning Lifecycle Management Platform that Facilitates Collaboration among Data Engineering, Development, and Operations Teams and Provides Capabilities to Experiment with Different Models in a Production Environment to Accelerate the Innovation Cycle. The platform is instantiated by stored computer instructions executed on one or more processors. Its modules include a user interface, a collector for accessing data from various sources, and a workflow module for processing the data received from those sources. A training module executes stored computer instructions to train one or more data analytics models using the processed data.

Background for Systems and Methods for Modeling Machine Learning and Data Analytics

Many organizations and individuals use digital data to improve operations or assist decision-making. Businesses, for example, use data management technology to improve the efficiency of processes such as transaction processing, input and output tracking, product pricing, and marketing, and they use operational data to assess the performance of those processes and determine how to adjust them.

The sheer volume of data available from transactional logs, social media, web traffic, and other sources gives organizations many opportunities to become data-driven. The ability to learn from and model data allows organizations to adapt to ever-changing environments and situations. Capitalizing on this data, however, is not simple: it often takes highly skilled scientists to create and test models, and the process of analyzing large, dynamic, and diverse datasets is time-consuming, expensive, and tedious. The overhead of consuming and transforming data can significantly affect development and implementation timelines. Without highly specialized expertise, many business entities cannot model a business problem, let alone build and test models that solve it.

What is needed is a systematic analytical platform for developing, deploying, and managing models tailored to the problem to be solved.

Data analysts can build predictive models using analytic techniques, computational infrastructure, and electronic data, including operational and evaluation data. The invention provides a machine learning lifecycle management platform that supports model development from feature engineering through production deployment. The platform facilitates collaboration among data engineering, development, and operations teams, and it provides capabilities for experimenting with different models in a production environment to accelerate the innovation cycle.

According to a first aspect of the present disclosure, a system for executing data analytics comprises one or more processors and a memory coupled to them, with the processors executing a plurality of modules stored in the memory. The modules include a user interface module, a collector module, a workflow module, a training module, a predictor module, and a challenger module. The user interface module is used to model and manage a data analytics plan and displays various processing nodes: one or more collector nodes, workflow manager nodes, training nodes, predictor nodes, and challenger nodes. The collector module provides access to data sources, each of which supplies data for use in training and executing the data analytics plan; each instantiation of a collector module is presented as a collector node in the user interface. The workflow module processes the data before it is used to train and execute the plan; each instantiation of a workflow module is presented as a workflow manager node. The training module executes stored computer instructions to train one or more data analytics models using the processed data; each instantiation of a training module is presented as a training manager node. The predictor module produces predictive datasets based on the data analytics models; each instantiation of a predictor module is presented as a predictor node. The challenger module executes multi-sample hypothesis testing of the data analytics models; each instantiation of a challenger module is presented as a challenger node.
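
To make the module-to-node relationship concrete, here is a minimal Python sketch in which each module instantiation surfaces as a node of the matching kind in the user interface. All class, method, and node names are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A processing node rendered in the user interface."""
    kind: str    # "collector", "workflow", "training", "predictor", "challenger"
    label: str

class AnalyticsPlan:
    """Each module instantiation surfaces as a node of the matching kind."""
    def __init__(self):
        self.nodes: list[Node] = []

    def instantiate(self, kind: str, label: str) -> Node:
        node = Node(kind, label)
        self.nodes.append(node)
        return node

plan = AnalyticsPlan()
plan.instantiate("collector", "web-logs")          # collector node
plan.instantiate("workflow", "clean-and-join")     # workflow manager node
plan.instantiate("training", "churn-train")        # training manager node
plan.instantiate("predictor", "churn-serve")       # predictor node
plan.instantiate("challenger", "churn-ab-test")    # challenger node
```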

In some embodiments, the system includes a publication module that publishes results calculated by the predictor module; each instantiation is displayed as a publisher node in the user interface. The data sources can be static, accessed via an API to an external computer system, or streaming, providing real-time transactional data. In some cases, both static and real-time data sources feed the data analytics model.

In some embodiments, a workflow module is implemented as a set of tasks, each performing a discrete data operation such as filtering, aggregating, selecting, parsing, or normalizing data. In some versions, a subordinate user interface constructs and presents a directed graph in which the tasks are the vertices and the possible paths for the data are the edges. In some cases, an instance of a workflow module can be assigned to a particular processor so that the tasks associated with that instance are performed on that processor. These assignments can be made by the user or selected automatically based on an estimate of the data processing load for a series of tasks.
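
A workflow of this kind can be pictured as a small directed graph, as in the following Python sketch, where tasks are vertices and data paths are edges. The class and its API are illustrative assumptions, not the patent's implementation.

```python
from typing import Callable

class Workflow:
    """Tasks are the vertices of a directed graph; edges are data paths."""
    def __init__(self):
        self.tasks: dict[str, Callable] = {}   # vertices
        self.edges: dict[str, list[str]] = {}  # parent task -> downstream tasks

    def add_task(self, name: str, fn: Callable, after=()):
        self.tasks[name] = fn
        self.edges.setdefault(name, [])
        for parent in after:
            self.edges[parent].append(name)

    def run(self, data):
        # Insertion order happens to be a valid topological order in this
        # simple sketch; a real engine would sort the graph and route data
        # along its edges.
        for fn in self.tasks.values():
            data = fn(data)
        return data

wf = Workflow()
wf.add_task("filter", lambda rows: [r for r in rows if r is not None])
wf.add_task("normalize", lambda rows: [str(r).strip().lower() for r in rows],
            after=["filter"])
print(wf.run(["  Alice ", None, "BOB"]))   # -> ['alice', 'bob']
```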

The allocation of data may be based on designating a percentage of the data to each model; in some cases, one subset is allocated to one model and another subset to another. For example, data can be split among the models by fixed percentages, routed largely to one model and redirected to another if the first underperforms, or sent in substantial portions to two models in parallel.
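
As a rough illustration of percentage-based allocation, the following Python sketch routes each record to one of several models according to a traffic split. The function, its arguments, and the model names are assumptions made for illustration, not taken from the patent.

```python
import random

def route(record, allocations):
    """Send a record to one of several models according to percentage splits.

    `allocations` maps model name -> fraction of traffic, e.g.
    {"champion": 0.9, "challenger": 0.1}. Purely illustrative.
    """
    r, cumulative = random.random(), 0.0
    for model, share in allocations.items():
        cumulative += share
        if r < cumulative:
            return model
    return next(iter(allocations))  # guard against rounding drift

# 90% of the data goes to the incumbent model, 10% to the experimental one.
print(route({"user": 42}, {"champion": 0.9, "challenger": 0.1}))
```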

Another aspect of the invention is a method for performing data analytics. The method displays, on a user interface for modeling and managing a data analytics plan, various processing nodes, including training nodes. It includes accessing data sources, each providing data for use in training and executing the data analytics plan, with each data access step presented as a collector node in the user interface.

In some embodiments, the results of the predictor module's calculations are published and displayed as publisher nodes in the user interface. The data sources can be static sources accessed via an API to an external computer system, or real-time streaming sources; in some cases, both feed the data analytics model.

In some embodiments, tasks represent discrete data operations such as filtering, aggregating, selecting, parsing, and/or normalizing data. In some versions, a subordinate interface is provided for constructing and presenting a directed graph in which tasks are the vertices and data paths are the edges. In some cases, the tasks can be assigned to a processor so that the entire series of tasks for the workflow module runs on that processor; these assignments can be made by the user or selected automatically based on the estimated data processing load of the series of tasks.

In another aspect, the subject matter described herein relates to an article. The article includes non-transitory computer-readable media with instructions that, when executed, cause one or more processors to perform various operations: displaying, on a user interface for modeling and managing an analytics plan, different processing nodes, including collector nodes, where each data source provides data for use in both training and executing the analytics plan; presenting each processing step in the interface as a workflow manager node; and training data analytics models using the processed data, presenting each training step in the interface as a training manager node.

Elements from embodiments described in relation to one aspect of the invention can be used in various embodiments of another aspect of the same invention. It is envisaged, for example, that features of a dependent claim derived from one independent claim may be used in the apparatus, systems, and/or methods of any other independent claim.

Referring to FIG. 1, in some embodiments a platform and system 100 for building, managing, and analyzing different predictive data comprises, generally, a collector service 104, an interface layer 108, an execution layer 112, a dataset layer 116, and a storage layer 120. The collector service performs data collection tasks through connections to one or more data sources 122. Data sources 122 can be static historical data, such as web logs or transaction logs; traditional relational databases, such as Oracle and MySQL; schemaless databases, like MongoDB; or cloud services that provide access to data through an API (application programming interface), such as Salesforce, Marketo, and StubHub. The collector service can connect with data sources in either a push or a pull connection. In push mode, the client application sends data to the collector service via an API; in pull mode, the collector service connects to the data source at a predefined interval.
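
As a rough sketch of pull mode, a collector might poll a source on a fixed interval, as below. Here `fetch` and `sink` are hypothetical stand-ins for a data-source connection and the platform's ingest path; none of these names come from the patent.

```python
import time

def pull_collector(fetch, sink, interval_seconds=60):
    """Pull mode: poll a data source at a predefined interval.

    `fetch` and `sink` are hypothetical callables standing in for a
    data-source connection and the platform's ingest path. In push mode,
    the client application would instead call an ingest API directly.
    """
    while True:                       # runs until the collector is stopped
        for record in fetch():        # pull a batch from the source
            sink(record)              # hand each record to the platform
        time.sleep(interval_seconds)  # wait out the predefined interval
```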

The interface layer 108 provides the platform's user-facing features. It includes a dashboard 124, which facilitates the organization and creation of projects in the platform 100 and allows a user to view log files and monitor the progress of tasks for troubleshooting; the dashboard 124 can also display machine health and utilization information. The interface layer 108 further includes a publisher module 128 for publishing and challenging data generated by the execution of models.

The execution layer 112 consists of a reporting service 132, a publishing service 136, a scheduling service 140, a job server 144, and a worker pool 148. The reporting service 132 supports the dashboard 124, connecting in read-only mode to the dataset layer to collect and deliver data. The publishing service 136 receives input from client machines and data sources and returns results as the models are executed. It operates in two modes: direct serving and caching. In direct serving mode, the training node is paired with the publisher node. In caching mode, the training node is paired with a predictor node, which periodically calculates system state and output and caches them; the cached results are then provided to the publisher node on request. The caching mode is also suited to models that incorporate a streaming data service, which provides near-real-time data to the model and therefore requires frequent processing or recalculation.
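
The two serving modes might be contrasted roughly as follows. This is a minimal sketch under assumed names, not the platform's actual API.

```python
class Publisher:
    """Illustrative contrast of the two serving modes described above.

    In direct mode each request invokes the model; in caching mode a
    predictor refreshes results periodically and requests read the cache.
    All names here are hypothetical.
    """
    def __init__(self, model, mode="direct"):
        self.model, self.mode, self._cache = model, mode, None

    def refresh(self, inputs):
        # Called on a schedule in caching mode (e.g., as streaming data arrives).
        self._cache = self.model(inputs)

    def serve(self, inputs):
        if self.mode == "direct":
            return self.model(inputs)   # compute per request
        return self._cache              # return the last cached result
```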

Referring again to FIG. 1, the scheduling service 140 schedules tasks (such as importing data or executing the software programs that implement models) to be performed by either the worker pool 148 or the Spark job server 144. The Spark job server 144 coordinates the assignment of tasks to Spark machines 152. The worker pool 148 can include different types of hardware, operating systems, and functional software, and a task may be assigned to a specific computational element in the pool 148 depending on its parameters, estimated workload, and so on. A task involving computationally intensive deep learning or machine learning models, for example, may be assigned to machines with graphics processing units (GPUs) designed for such work.
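
Task placement of this sort might look like the following sketch, which routes GPU-hungry model tasks to GPU machines and other tasks to the least-loaded general-purpose worker. The function and field names are assumptions made for illustration.

```python
def assign(task, workers):
    """Pick a compute element for a task (illustrative only).

    GPU-hungry model tasks go to GPU machines; everything else goes to the
    least-loaded general-purpose worker.
    """
    kind = "gpu" if task.get("needs_gpu") else "cpu"
    candidates = [w for w in workers if w["kind"] == kind]
    return min(candidates, key=lambda w: w["load"])

workers = [{"kind": "gpu", "load": 0.2}, {"kind": "cpu", "load": 0.5},
           {"kind": "cpu", "load": 0.1}]
print(assign({"needs_gpu": False, "estimated_load": 0.3}, workers))
# -> {'kind': 'cpu', 'load': 0.1}
```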

The dataset layer 116 includes a dataset service 156, which keeps track of the different datasets used as sources in tasks. The dataset service 156 maintains, for example, the schema, revision history, and parameters (location, size, owner, source, etc.) of the different data sources 122 that the system 100 uses. The data itself is stored in one or more datasets 160 in the storage layer 120.
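
The metadata the dataset service tracks might be modeled roughly as follows; the record shape is an assumption based on the fields listed above, not the patent's schema.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """Hypothetical shape of the metadata a dataset service might track."""
    name: str
    schema: dict
    location: str
    size_bytes: int
    owner: str
    source: str
    revisions: list = field(default_factory=list)  # revision history
```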

In one example deployment, each computing device has four or more processor cores and sufficient memory and storage (e.g., 8 GB of memory and a 60 GB hard disk). A production deployment may use approximately 25 nodes running CentOS or Red Hat as the operating system. One arrangement calls for five nodes as a utility cluster, used for boot-up, system instantiation, and management; two nodes to serve the application portal (described further below); two for the reporting service 132; two for the scheduling service 140; four or more for the worker pool 148, which executes code implementing the different data models; four for the dataset service 156, which uploads and downloads content to and from various cloud-based and local data sources; two for the connector service 104; four for the publishing service 136; and one for the Spark job server 144, if Spark jobs are used. If the system is implemented in a third-party hosted environment (e.g., AWS or Azure), no Spark job server is needed; instead, each machine is equipped with the credentials necessary to access virtual machines hosted by those services. If data is stored locally (using Hadoop or a similar data storage system), each machine is given access credentials for the storage layer 120. Some implementations split the machines into private and public clusters using DC/OS terminology and methods; in these cases, the dashboard 124 and the publisher 128 are set up to be public and accessible via an internal cloud, while the rest of the machines remain private and inaccessible.
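
For reference, the node allocation described above can be summarized as a simple mapping. The counts are taken from the text; the structure itself is only an illustration.

```python
# One possible reading of the example production layout (roughly 25 nodes).
production_layout = {
    "utility_cluster": 5,     # boot-up, system instantiation, management
    "application_portal": 2,
    "reporting_service": 2,   # reporting service 132
    "scheduling_service": 2,  # scheduling service 140
    "worker_pool": 4,         # "four or more", worker pool 148
    "dataset_service": 4,     # dataset service 156
    "connector_service": 2,   # connector service 104
    "publishing_service": 4,  # publishing service 136
    "spark_job_server": 1,    # omitted in hosted (AWS/Azure) deployments
}
```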

FIG. 2 shows the dashboard, an initial workspace 200 that allows users to view and manage projects within the system 100. Each line 204 represents a different project and shows parameters such as the project name, creation date, owner, most recent modification date and user, and other options, such as data export or deletion. Selecting a project on the dashboard opens that project's workspace; each workspace contains one project. The model code for a new project can be uploaded and assigned a unique URL in a code repository such as Git.
