Invented by Pavan Kumar Reddy Bedadala, Praveen VEERAMACHANENI, Commvault Systems Inc

The market for machine learning-based data object storage is rapidly expanding as businesses recognize the value of leveraging artificial intelligence (AI) to manage and analyze their data. This innovative technology combines the power of machine learning algorithms with the scalability and flexibility of object storage systems, revolutionizing the way organizations store, organize, and retrieve their data.

Traditional data storage methods often struggle to handle the massive amounts of data generated by businesses today. As data volumes continue to grow exponentially, organizations are seeking more efficient and intelligent ways to manage their data. Machine learning-based data object storage offers a solution to this challenge by automating data management tasks and providing advanced analytics capabilities.

One of the key advantages of machine learning-based data object storage is its ability to automatically classify and categorize data. With the help of machine learning algorithms, data can be organized based on its content, context, and relevance. This enables businesses to quickly locate and retrieve specific data objects, saving valuable time and resources. Moreover, machine learning algorithms can continuously learn and adapt to changing data patterns, ensuring that data is always organized and up to date.

Another significant benefit of machine learning-based data object storage is its advanced analytics capabilities. By leveraging machine learning algorithms, businesses can gain valuable insights from their data, uncover hidden patterns, and make data-driven decisions. These insights can be used to optimize business processes, improve customer experiences, and identify new revenue opportunities. Machine learning-based data object storage also enables businesses to perform predictive analytics, forecasting future trends and behaviors based on historical data.

Furthermore, machine learning-based data object storage offers enhanced security and data protection. Machine learning algorithms can detect and prevent security threats, such as unauthorized access or data breaches, in real time. By continuously monitoring data access patterns and user behavior, machine learning-based data object storage systems can identify suspicious activities and take immediate action to mitigate risks. This ensures that sensitive data remains secure and protected from potential cyber threats.

The market for machine learning-based data object storage is witnessing significant growth, driven by the increasing demand for efficient data management and analytics solutions. Businesses across various industries, including healthcare, finance, retail, and manufacturing, are adopting this technology to gain a competitive edge and improve their operational efficiency. Moreover, advancements in AI and machine learning technologies are further fueling the market growth, enabling more sophisticated and intelligent data object storage solutions.

In conclusion, the market for machine learning-based data object storage is expanding rapidly as businesses recognize the benefits of leveraging AI to manage and analyze their data. This technology offers automated data management, advanced analytics capabilities, enhanced security, and data protection. As organizations continue to generate massive amounts of data, machine learning-based data object storage provides an efficient and intelligent solution to store, organize, and retrieve data, enabling businesses to make data-driven decisions and gain a competitive advantage in today’s data-driven world.

The Commvault Systems Inc invention works as follows

An information management system is described herein that uses machine learning to predict which data to store on a secondary storage device and/or when to store it. A client computing device, for example, can initially be configured to store data on a secondary storage device in accordance with one or more storage policies. Media agents in the information management system can monitor the data usage of the client computing device and use that usage data to train a machine learning (ML) data storage model. The data storage model can be trained to predict what data should be stored on a secondary storage device and/or when the data should be stored. The client computing device may then be configured to use the trained data storage model, instead of the storage policies, to determine what data to store on a secondary storage device and/or when to store it.
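The hand-off from policy to model described above can be sketched as follows. This is only an illustration under stated assumptions: `DataObject`, `ClientStorageSelector`, and `install_model` are hypothetical names, and both the policy and the "trained model" are stand-in functions rather than anything taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DataObject:
    """Hypothetical description of one data object on the client."""
    name: str
    days_since_access: int


# A decision function answers: should this object go to secondary storage?
Decision = Callable[[DataObject], bool]


class ClientStorageSelector:
    """Uses a storage policy until a trained model is delivered, then the model."""

    def __init__(self, policy: Decision) -> None:
        self._policy = policy
        self._model: Optional[Decision] = None

    def install_model(self, model: Decision) -> None:
        # Called when a trained model is transmitted to the client.
        self._model = model

    def should_store(self, obj: DataObject) -> bool:
        decide = self._model if self._model is not None else self._policy
        return decide(obj)


# Initially the client follows a policy (here: idle for 90 days or more).
selector = ClientStorageSelector(policy=lambda obj: obj.days_since_access >= 90)
report = DataObject("q3_report.docx", days_since_access=120)
before = selector.should_store(report)  # decided by the policy

# Stand-in for a trained model; a real one would be learned from usage data.
selector.install_model(lambda obj: obj.name.endswith(".tmp"))
after = selector.should_store(report)   # decided by the model
```

Keeping the policy as a fallback means the client still has a decision rule before any model has been delivered.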

Background for Machine learning-based Data Object Storage

Businesses recognize the commercial value of their data and look for cost-effective, reliable ways to safeguard the data stored on their computer networks while minimizing the impact on productivity. As part of a daily, weekly, or monthly maintenance schedule, a company may back up important computing systems such as databases, file servers, or web servers. A company might also protect the computing systems used by its employees, such as those in marketing, accounting, and engineering departments. Given the ever-growing volume of data under their control, companies continue to look for innovative ways to manage data growth, including migrating data to cheaper storage over time, reducing redundant data, and pruning lower-priority data. Companies also increasingly see their stored data as an asset and seek solutions to leverage it, so capabilities such as data analysis, improved data presentation and access, and enhanced information management are in growing demand.

This invention describes an information management system that uses machine learning to predict which data should be stored on a secondary storage device, when that data should be stored, and/or which data needs to be retrieved from the secondary storage device. A client computing device, for example, can initially be configured to store data on a secondary storage device according to one or more storage policies. Media agents in the information management system can monitor the data usage of the client computing device and use that usage data to train a machine learning data storage model. The data storage model can be trained so that, given inputs such as a time and data object metadata or an identifier, it predicts what data should be stored on a secondary storage device. The client computing device may then be configured to use the trained data storage model, instead of storage policies, to determine what data to store and when.
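To make the training step concrete, here is a minimal sketch that fits a small logistic regression on usage data and uses it to decide what to move to secondary storage. The source does not say which learning algorithm, features, or labels are used, so everything here is an assumption for illustration: the two features (scaled idle days and scaled recent access count), the label (1 if the object was accessed again soon), and the function names.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(model: Model, x: Sequence[float]) -> float:
    """Predicted probability that the object will be accessed again soon."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_storage_model(rows: Sequence[Sequence[float]],
                        labels: Sequence[int],
                        lr: float = 0.5,
                        epochs: int = 2000) -> Model:
    """Batch gradient descent on the logistic loss, starting from zero weights."""
    n, dim = len(rows), len(rows[0])
    model: Model = ([0.0] * dim, 0.0)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(rows, labels):
            err = score(model, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights, bias = model
        model = ([w - lr * g / n for w, g in zip(weights, grad_w)],
                 bias - lr * grad_b / n)
    return model


def should_archive(model: Model, x: Sequence[float]) -> bool:
    """Move the object to secondary storage if re-access looks unlikely."""
    return score(model, x) < 0.5


# Hypothetical usage data: (idle_days / 100, accesses_last_30_days / 10).
USAGE_ROWS = [(0.02, 0.9), (0.05, 0.7), (0.10, 0.5),
              (1.20, 0.0), (0.95, 0.1), (2.00, 0.0)]
REACCESSED = [1, 1, 1, 0, 0, 0]  # 1 = object was accessed again soon

storage_model = train_storage_model(USAGE_ROWS, REACCESSED)
decisions = [should_archive(storage_model, row) for row in USAGE_ROWS]
```

In this toy data set the recently used objects are kept on primary storage and the idle ones are selected for secondary storage. A production system would need far more data and a held-out evaluation set.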

The media agent may also generate and store context data when data retrieval requests are received from a client computing device. This context data and/or the monitored data usage data can be used to train a machine learning recall model. The recall model can be trained so that, given inputs such as a time and metadata of the data object to be recalled, it predicts what data should be retrieved and/or when the data should be retrieved. The media agent or client computing device may then be configured to use the trained recall model to determine what data to recall and/or when to perform the recall.
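As an illustration of the recall idea, the sketch below learns from logged retrieval requests which objects tend to be requested at a given hour, so they could be recalled ahead of time. A simple count-based estimator stands in for the recall model, since the source does not name an algorithm. The event format (hour of day, object identifier), the function names, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

RecallModel = Dict[int, Counter]  # hour of day -> request counts per object


def train_recall_model(events: Iterable[Tuple[int, str]]) -> RecallModel:
    """Count how often each object was requested at each hour of the day."""
    by_hour: Dict[int, Counter] = defaultdict(Counter)
    for hour, object_id in events:
        by_hour[hour][object_id] += 1
    return dict(by_hour)


def predict_recalls(model: RecallModel, hour: int, min_count: int = 2) -> List[str]:
    """Objects requested at least `min_count` times at this hour, sorted by name."""
    counts = model.get(hour, Counter())
    return sorted(obj for obj, seen in counts.items() if seen >= min_count)


# Hypothetical context data recorded when retrieval requests arrived.
RETRIEVAL_EVENTS = [
    (9, "ledger.db"), (9, "ledger.db"), (9, "notes.txt"),
    (17, "report.pdf"), (17, "report.pdf"), (17, "ledger.db"),
]

recall_model = train_recall_model(RETRIEVAL_EVENTS)
morning = predict_recalls(recall_model, hour=9)
evening = predict_recalls(recall_model, hour=17)
```

Recalling the predicted objects shortly before the hour in question is one way such a prediction could hide retrieval latency from the user.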

The disclosure includes a networked information management system that comprises a client computing device with one or more first hardware processors. The client computing device is configured with first computer-executable instructions that, when executed, cause it to store one or more data objects in secondary storage at a first time according to a storage policy. The networked information management system also comprises one or more computing devices in communication with the client computing device, each of which has one or more second hardware processors. These computing devices are configured with second computer-executable instructions that, when executed, cause them to retrieve data object usage data associated with the client computing device; train a machine learning data storage model using this data object usage data; and transmit that model to the client computing device so that it uses the model, rather than the storage policy, in determining which of the data objects is to be stored in secondary storage at a second after

The disclosure also provides a computer-implemented method comprising: retrieving data object usage data associated with a client computing device, where the client computing device has one or more first hardware processors and is configured with instructions that, when executed, cause it to first store one or more data objects in secondary storage according to a storage policy; training a machine learning (ML) data storage model with the data object usage data; and transmitting the ML data storage model back to the computing

The networked information management system further comprises one or more computing devices in communication with the client computing device, each of which has one or more second hardware processors. The client computing device is configured with computer-executable instructions that, when executed, cause it to first store one or more data objects according to a storage policy. The computing devices in communication with it are configured with second computer-executable instructions that, when executed, cause them to retrieve data object usage data associated with the client computing device.

The networked information management system described in the preceding paragraph may include any combination of the following features. For example, the second computer-executable instructions can cause the one or more computing devices to retrieve second data object usage data associated with the client computing device, retrieve the ML data storage model, retrain it using the second data object usage data, and then transmit the retrained ML model to the client computing device. This causes the client computing device to use at least one of either the retrained ML-model or the

The disclosure also provides a networked information management system that includes a client computing device with one or more first hardware processors. This client computing device is configured to run a first application that generates one or more data objects. The networked information management system also comprises one or more computing devices in communication with the client computing device. Each computing device has one or more second hardware processors.

The disclosure also provides a computer-implemented method that comprises: retrieving data object usage context data from one or more computing devices that are configured to manage data transmission between a client computing device and a secondary storage device, where the client computing device is configured to run a first application that generates one or more data objects; training a machine learning (ML) model with the data object usage context data; and then transmitting the ML model to the device so that it uses the ML model for the first time to determine

The networked information management system further comprises one or more computing devices in communication with the client computing device, each of which has one or more second hardware processors.

The networked information management system described in the preceding paragraph may include any combination of the features below: the computer-executable instructions, when executed, cause the one or more computing devices to retrieve second data object usage context data associated with the client computing device, retrieve the recall ML model, retrain the recall ML model using the second data object usage context data, and identify a second data object within the one or more data objects to be recalled from the secondary storage at a first time, using the ret

As described in this document, data accessed via a client computing device can be stored on a primary storage device. Some data is accessed far less often than other data, so it can be moved off the primary storage device to free space for data that is accessed more frequently. A typical information management system uses one or more storage policies to decide which data is transferred from a primary to a secondary storage device. A client computing device, for example, can be configured with one or more storage policies. A storage policy is usually a data structure, or another information source, that defines (or helps determine) preferences or criteria for data transfer. A storage policy may include the following: 1) the data associated with the policy; 2) the destination where the data is to be stored; 3) datapath information specifying how the data is communicated to the destination; 4) the type of copy operation to be performed at the destination; and 5) retention information specifying how long the data should be retained.
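The five elements of a storage policy listed above map naturally onto a small record type. The sketch below is illustrative only: the field names and the example values (paths, device names, copy type) are assumptions, not taken from the patent or from any Commvault product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical record holding the five elements of a storage policy."""
    data_selector: str    # 1) what data the policy applies to
    destination: str      # 2) where the data is to be stored
    datapath: str         # 3) how the data is communicated to the destination
    copy_type: str        # 4) type of copy operation performed at the destination
    retention: timedelta  # 5) how long the data should be retained


# Example values are made up for illustration.
example_policy = StoragePolicy(
    data_selector="/home/*/documents",
    destination="secondary-storage-01",
    datapath="media-agent-a",
    copy_type="archive",
    retention=timedelta(days=365),
)
```

Making the record immutable (`frozen=True`) reflects the point made in the surrounding text that such policies are fixed rules rather than something that adapts to usage.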

Storage policies are static and may not adapt to the specific circumstances in which data is generated and used. For example, a storage policy might specify that data not accessed in 90 days be moved to a secondary storage device (e.g., for backup, archive, or other data protection purposes). Some data that has not been accessed in 90 days may indeed be unimportant to the user. Other data, however, may be very important, and the user would prefer that it not be moved to the secondary storage device. If data has been moved to a secondary storage device and the user then requests it, the client computing device may respond more slowly.
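The limitation of a fixed 90-day rule is easy to see in a short sketch. The function name, file names, and dates below are hypothetical. The point is only that the rule looks at idle time and nothing else, so an important file and a scratch file are treated identically.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def select_for_secondary(last_access: Dict[str, datetime],
                         now: datetime,
                         max_idle: timedelta = timedelta(days=90)) -> List[str]:
    """Static rule: select every object idle for at least `max_idle`."""
    return sorted(name for name, accessed in last_access.items()
                  if now - accessed >= max_idle)


# Hypothetical last-access times for three objects.
LAST_ACCESS = {
    "tax_2023.xlsx": datetime(2024, 1, 15),  # important, but idle
    "scratch.tmp": datetime(2024, 2, 1),     # unimportant and idle
    "todo.txt": datetime(2024, 5, 20),       # recently used
}

selected = select_for_secondary(LAST_ACCESS, now=datetime(2024, 6, 1))
# Both idle files are selected; the rule cannot tell them apart.
```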

Furthermore, the time period defined by a storage policy might not free enough space for storing new data. The client computing device may, for example, generate more data during certain time periods than others, and a greater variety of data may be accessed during certain time periods. As a result, a storage policy may clear enough space during some time periods but not others.

Typical information management systems suffer reduced performance not only because of how storage policies are implemented and enforced, but also because of how data recalls are handled. Client computing devices, for example, are generally not configured with policies that specify which stored data should be recalled and/or when the recall should occur, because it is difficult to determine when an individual user might request access to a specific set of data. Instead, once data has been moved to secondary storage, the client computing device recalls it only when a user requests access to it. Retrieving data from a secondary storage device can delay the client computing device's response, and the delay grows when the user requests large amounts of data.

Accordingly, described herein is an information management system that uses machine learning to predict which data to store on secondary storage devices, when to store it, which data to recall from secondary storage devices, and/or when to recall it. A client computing device, for example, can initially be configured to store data on a secondary storage device in accordance with one or more storage policies. Media agents in the information management system can monitor the data usage of the client computing device and use that usage data to train a machine learning data storage model. The data storage model can be trained so that, given inputs such as a time and data object metadata or an identifier, it predicts what data should be stored on a secondary storage device. The client computing device may then be configured to use the trained data storage model, instead of storage policies, to determine what data to store and when.

The media agent may also generate and store context data when data retrieval requests are received from a client computing device. This context data and/or the monitored data usage data can be used to train a machine learning recall model. The recall model can be trained so that, given inputs such as a time and metadata of the data object to be recalled, it predicts what data should be retrieved and/or when the data should be retrieved. The media agent or client computing device may then be configured to use the trained recall model to determine what data to recall and/or when to perform the recall.
