Invented by Hetalkumar N. Joshi, Chandrashekar MARANNA, Manoj Kumar Vijayan, Commvault Systems Inc

The market for scalable copy processing of auxiliary copies in a storage management system by using media agent resources is growing rapidly as businesses continue to generate and store vast amounts of data. Data is now one of an organization’s most valuable assets, which makes efficient storage management systems crucial for day-to-day operations.

Scalable copy processing refers to the ability of a storage management system to handle an increasing number of data copies without compromising performance or reliability. Auxiliary copies are additional copies made from existing backup copies, and they are essential for data protection and disaster recovery. They are created and stored on separate media so that, if data is lost or corrupted, another copy is readily available.

Traditionally, copy processing has been a time-consuming and resource-intensive task, often requiring dedicated hardware and personnel. With advances in storage management systems, however, media agents have emerged as a valuable resource for efficient copy processing. Media agents are software components within a storage management system that manage the movement of data to and from secondary storage, including the reads and writes needed to create auxiliary copies. By using media agent resources, organizations can significantly improve the scalability and performance of copy processing.

One of the key advantages of using media agent resources for copy processing is parallelism. Instead of relying on a single media agent to handle all copy tasks, organizations can deploy multiple media agents that process copies simultaneously. This parallelization reduces the time required for copy operations and lets organizations handle larger data volumes efficiently.

Another benefit is flexibility in hardware utilization. Media agents can be deployed on a variety of platforms, including physical servers, virtual machines, and cloud-based infrastructure, so organizations can leverage their existing infrastructure or choose the most cost-effective option for their backup needs. Media agent resources can also be scaled up or down as requirements change: as data volumes grow, additional media agents can be deployed to handle the increased workload, and if volumes decrease or backup requirements change, media agents can be scaled down to keep resource utilization efficient.

Demand for scalable copy processing of auxiliary copies is growing along with the need for data protection and disaster recovery. Organizations across industries, including healthcare, finance, and e-commerce, are investing in storage management systems that can handle their backup requirements efficiently. Vendors continue to enhance their storage management solutions for scalability and performance, and many are focusing on advanced media agents that leverage newer technologies, such as artificial intelligence and machine learning, to optimize copy processing and improve data protection.

In conclusion, as organizations continue to generate and store massive amounts of data, efficient backup solutions become paramount. Leveraging media agent resources allows organizations to achieve scalability, performance, and flexibility in their copy processing, helping to keep their data safe and available.

The Commvault Systems Inc invention works as follows

A scalable method is described for processing auxiliary-copy tasks in a storage system by using distributed media agents instead of a central storage manager. The enhanced media agents control and coordinate auxiliary copy jobs, and they call on the storage manager for data stream reservations and job-specific metadata. An enhanced storage manager may select one media agent as the “coordinator.” The coordinator media agent coordinates auxiliary-copy tasks with any number of other media agents, which act as “controllers.” The coordinator media agent’s main responsibility is to obtain data stream reservations from the storage manager and then assign auxiliary copy jobs to controller media agents based on the components in the reserved data streams.
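The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Commvault’s implementation: the class names (`StorageManagerStub`, `CoordinatorMediaAgent`, `StreamReservation`) are hypothetical, and the round-robin rule for picking a controller per reserved stream is an assumed policy, since the summary only says assignment is based on the components in the reserved streams.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StreamReservation:
    """A reserved data stream and the auxiliary-copy jobs whose components it carries."""
    stream_id: int
    job_ids: list


class StorageManagerStub:
    """Hypothetical stand-in: the storage manager only hands out stream reservations."""

    def __init__(self, pending):
        # pending maps stream_id -> list of auxiliary-copy job ids on that stream
        self._pending = dict(pending)

    def reserve_streams(self, count):
        granted = []
        for stream_id in sorted(self._pending)[:count]:
            granted.append(StreamReservation(stream_id, self._pending.pop(stream_id)))
        return granted


class CoordinatorMediaAgent:
    """Obtains reservations, then hands the jobs on each stream to a controller media agent."""

    def __init__(self, storage_manager, controllers):
        self.storage_manager = storage_manager
        self.controllers = controllers  # names of controller media agents

    def assign(self, stream_count):
        assignments = defaultdict(list)
        reservations = self.storage_manager.reserve_streams(stream_count)
        for i, reservation in enumerate(reservations):
            # Hypothetical policy: one controller per reserved stream, round-robin.
            controller = self.controllers[i % len(self.controllers)]
            assignments[controller].extend(reservation.job_ids)
        return dict(assignments)
```

For example, with streams `{1: ["aux-A"], 2: ["aux-B", "aux-C"], 3: ["aux-D"]}` and controllers `["ma2", "ma3"]`, `assign(3)` returns `{"ma2": ["aux-A", "aux-D"], "ma3": ["aux-B", "aux-C"]}`. Note that the storage manager never runs a per-job process here; it only answers reservation requests.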

Background for Scalable copy processing of auxiliary copies in a storage management system by using media agent resources

Businesses worldwide recognize the commercial value of their data and seek cost-effective, reliable ways to protect it while minimizing the impact on productivity. As part of a daily, weekly, or monthly maintenance schedule, a company may back up important computing systems such as web servers, file servers, and databases. A company might also protect the computing systems used by each of its employees, such as those in accounting, marketing, or engineering departments.

Backups alone may not be enough to protect certain important data. Consequently, data that has been protected by backups can be copied onto additional storage media or moved to a lower-cost location for longer-term storage. One or more copies can be created from a copy of the primary production data. Herein, a copy of a copy is referred to as an “auxiliary copy,” and a job that creates an auxiliary copy is called an “auxiliary-copy job.”
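The “copy of a copy” definition can be made concrete by tracking each copy’s source. The sketch below is illustrative only; `DataCopy` and `auxiliary_copy_job` are hypothetical names, and counting generations from the primary data is simply one way to express the definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCopy:
    name: str
    source: Optional["DataCopy"] = None  # None = primary production data

    def generation(self) -> int:
        """0 = primary data, 1 = copy of primary data, 2+ = copy of a copy."""
        return 0 if self.source is None else self.source.generation() + 1

    def is_auxiliary(self) -> bool:
        # Using the definition above: any copy made from another copy.
        return self.generation() >= 2


def auxiliary_copy_job(source: DataCopy, target_name: str) -> DataCopy:
    """Hypothetical helper for an auxiliary-copy job: create a copy of an existing copy."""
    if source.generation() < 1:
        raise ValueError("an auxiliary copy must be made from a copy, not from primary data")
    return DataCopy(target_name, source=source)
```

A backup made from production data is generation 1; running `auxiliary_copy_job` on that backup yields a generation-2 copy for which `is_auxiliary()` is true.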

Traditionally, auxiliary-copy management was centralized in a storage manager. The storage manager runs one auxiliary-copy-manager process for each auxiliary-copy job and sends the job information to various media agents, which then access the source copy, process it, and create and store the auxiliary copies. This centralized approach can create bottlenecks in the storage manager. On larger systems, hundreds or even thousands of auxiliary copy jobs may be scheduled to run at convenient times of the day, and running that many processes on the same platform can degrade the storage manager’s performance. The load can also slow other operations, such as backup jobs, and delay some auxiliary copy jobs. A different, more streamlined approach is needed.

One example solution uses media agents to perform certain control and coordination tasks. This approach tends to offload the storage manager, which remains responsible for managing the storage management system as a whole. The media agents control and coordinate auxiliary-copy tasks and tap into the storage manager, on demand, for data stream reservations and job-specific metadata. The approach also improves the way the storage manager prioritizes auxiliary-copy tasks. Unlike the prior art, the enhanced storage manager does not launch a new auxiliary-copy process for each job. Instead, it analyzes upcoming auxiliary copy jobs along with other pending tasks (e.g., backup jobs, snapshots, and replication) and prioritizes all of them in a job priority queue. By integrating auxiliary copy jobs with the other pending jobs in the queue, the enhanced storage manager can more accurately reserve and allocate resources across the storage management system.
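A single job priority queue of the kind described above can be sketched with Python’s `heapq`. The relative ranks in `PRIORITY` are purely an assumption for illustration; the summary does not say how auxiliary copy jobs rank against backups, snapshots, or replication. What the sketch shows is the structural point: every job type shares one queue, so resources are reserved in one consistent order.

```python
import heapq
import itertools

# Hypothetical ranks (lower number = dispatched sooner); not specified by the source.
PRIORITY = {"backup": 0, "snapshot": 1, "replication": 2, "auxiliary_copy": 3}


class JobPriorityQueue:
    """One queue for all pending job types instead of one process per aux-copy job."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a rank

    def submit(self, job_type, job_id):
        heapq.heappush(self._heap, (PRIORITY[job_type], next(self._seq), job_type, job_id))

    def next_job(self):
        _, _, job_type, job_id = heapq.heappop(self._heap)
        return job_type, job_id

    def __len__(self):
        return len(self._heap)
```

Submitting `aux-1`, `bk-1`, `aux-2`, and `snap-1` in that order drains as `bk-1`, `snap-1`, `aux-1`, `aux-2` under these assumed ranks. The sequence counter keeps two jobs of the same rank in arrival order and prevents the heap from ever comparing job ids.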

The illustrative approach uses enhanced media agents that include coordination and control logic. The enhanced storage manager initially selects a media agent to act as the “coordinator,” which coordinates auxiliary copy jobs with any other media agents acting as controllers. A coordinator media agent runs a local coordinator process that is triggered by the storage manager. The coordinator media agent’s main responsibility is to obtain data stream reservations from the storage manager and then assign auxiliary copy jobs to controller media agents based on the components in the reserved data streams. The coordinator media agent also receives job status from its associated controller media agents and reports it directly to the storage manager. The coordinator media agent may request additional data streams from the storage manager for auxiliary copy jobs that are in progress, e.g., to increase data transfer bandwidth. The storage manager’s response to such a request may include information about new pending auxiliary-copy jobs that the storage manager has identified (e.g., using the job priority queue). The coordinator then assigns these upcoming jobs to one or more controller media agents. In this way, a coordinator process started on one media agent can coordinate any number of auxiliary copy jobs based on the new data streams it obtains from the storage manager.
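The request-more-streams exchange and the status relay described above can be sketched as follows. Everything here is a hypothetical model: `StreamGrantingStorageManager`, `ElasticCoordinator`, the `max_new_jobs` limit on how many upcoming jobs ride on a response, and the round-robin placement are assumptions made for illustration, not details from the source.

```python
from collections import deque


class StreamGrantingStorageManager:
    """Hypothetical storage manager with a pool of free streams and queued aux-copy jobs."""

    def __init__(self, free_streams, pending_aux_jobs, max_new_jobs=2):
        self.free_streams = free_streams
        self.pending = deque(pending_aux_jobs)  # assumed to be in priority order already
        self.max_new_jobs = max_new_jobs
        self.status_log = []

    def request_streams(self, running_job, extra):
        """Grant up to `extra` streams and piggyback upcoming jobs on the response."""
        granted = min(extra, self.free_streams)
        self.free_streams -= granted
        count = min(self.max_new_jobs, len(self.pending))
        new_jobs = [self.pending.popleft() for _ in range(count)]
        return {"job": running_job, "granted": granted, "new_jobs": new_jobs}

    def report_status(self, job_id, status):
        self.status_log.append((job_id, status))


class ElasticCoordinator:
    def __init__(self, storage_manager, controllers):
        self.storage_manager = storage_manager
        self.controllers = controllers
        self.assigned = {name: [] for name in controllers}
        self._turn = 0

    def request_more_streams(self, running_job, extra):
        response = self.storage_manager.request_streams(running_job, extra)
        for job_id in response["new_jobs"]:
            # Hypothetical round-robin placement of upcoming jobs on controllers.
            controller = self.controllers[self._turn % len(self.controllers)]
            self._turn += 1
            self.assigned[controller].append(job_id)
        return response["granted"]

    def on_controller_status(self, job_id, status):
        # Controllers report to the coordinator, which forwards to the storage manager.
        self.storage_manager.report_status(job_id, status)
```

With three free streams and pending jobs `aux-7`, `aux-8`, `aux-9`, asking for five extra streams for a running job `aux-1` grants three, places `aux-7` on `ma2` and `aux-8` on `ma3`, and leaves `aux-9` queued. Folding new work into the stream-request response means the storage manager does not need a separate push channel to each coordinator.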

A controller media agent may run a local controller process that is triggered when the coordinator media agent receives data stream reservation metadata for an auxiliary-copy task that the controller media agent is to execute. In certain configurations, the coordinator media agent can trigger a local controller process on the same media agent that is executing the coordinator process. More typically, the coordinator media agent triggers a controller process on another media agent, which then executes the auxiliary copy jobs. Before starting an auxiliary copy job, a controller media agent retrieves the job-specific metadata directly from the storage manager, usually from the storage manager’s management database. Because this retrieval happens on demand, the coordinator media agent does not have to relay the metadata, which offloads it and keeps it from becoming a bottleneck. Once its controller process has started, a controller media agent can process any number of auxiliary copy jobs assigned by the coordinator media agent. The approach is scalable because it distributes coordination and control across multiple components of the storage management system and taps into the storage manager only when needed.
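The on-demand metadata lookup described above can be sketched with an in-memory SQLite database standing in for the storage manager’s management database. The table name `aux_jobs`, its columns, and the `ControllerMediaAgent` class are hypothetical; the point illustrated is that the coordinator passes only job ids, and each controller fetches the job-specific details itself.

```python
import sqlite3


def make_management_db():
    """Hypothetical stand-in for the storage manager's management database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE aux_jobs (job_id TEXT PRIMARY KEY, source_copy TEXT, target_copy TEXT)")
    db.executemany(
        "INSERT INTO aux_jobs VALUES (?, ?, ?)",
        [("aux-1", "primary-copy", "tape-copy"), ("aux-2", "primary-copy", "cloud-copy")],
    )
    db.commit()
    return db


class ControllerMediaAgent:
    def __init__(self, name, management_db):
        self.name = name
        self.db = management_db

    def fetch_metadata(self, job_id):
        """Pull job-specific metadata directly from the storage manager, on demand."""
        row = self.db.execute(
            "SELECT source_copy, target_copy FROM aux_jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        if row is None:
            raise KeyError(job_id)
        return {"job_id": job_id, "source": row[0], "target": row[1]}

    def run(self, assigned_job_ids):
        """Process every auxiliary-copy job the coordinator assigned to this controller."""
        results = []
        for job_id in assigned_job_ids:
            meta = self.fetch_metadata(job_id)
            # Placeholder for the actual copy: read from the source copy, write the target.
            results.append(f"{self.name}: {meta['source']} -> {meta['target']} ({job_id})")
        return results
```

Calling `ControllerMediaAgent("ma3", make_management_db()).run(["aux-1", "aux-2"])` returns one line per job, such as `"ma3: primary-copy -> tape-copy (aux-1)"`. Only the job ids travel from the coordinator to the controller.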

Systems and methods are disclosed for scalable auxiliary-copy processing using media agent resources. Examples of such systems and methods are described in this document with reference to FIGS. 3A to 5. The components and functionality of scalable auxiliary-copy processing using media agent resources may be configured and/or incorporated into information management systems such as those described herein in FIGS. 2.

Information Management System Overview

With the increasing importance of protecting and leveraging data, organizations simply cannot afford to lose critical data. Runaway data growth and other modern realities also make protecting and managing data increasingly difficult. There is therefore a need for efficient, powerful, and user-friendly solutions for protecting and managing data.

Depending on the size of the organization, there may be many data production sources under the purview of tens, hundreds, or even thousands of employees. In the past, individual employees were sometimes responsible for protecting and managing their own data. In other cases, organizations relied on a patchwork of hardware and software point solutions, often supplied by different vendors and with little or no interoperability.

Certain embodiments described herein address these and other shortcomings of prior approaches by implementing unified, organization-wide information management. FIG. 1A shows one such information management system 100, which generally includes combinations of hardware and software configured to protect and manage data and metadata generated by the various computing devices in information management system 100. The organization using information management system 100 may be a company or other business entity, an educational institution, a household, a governmental agency, or the like.

Generally, the systems described herein may be compatible with and/or provide some or all of the functionality of the systems described in one or more U.S. patents and patent application publications assigned to CommVault Systems, Inc., each of which is hereby incorporated by reference in its entirety herein.

Information management system 100 can include a wide variety of computing devices. For example, as described in more detail below, information management system 100 can include one or more client computing devices 102 and one or more secondary storage computing devices 106.

Computing devices can include, without limitation, one or more of the following: personal computers, workstations, desktop computers, or other types of generally fixed computing systems such as mainframe computers and minicomputers. Other computing devices include portable or mobile computing devices such as laptops, tablet computers, personal data assistants, mobile phones (such as smartphones), and other mobile or portable computing devices such as embedded computers, set-top boxes, and vehicle-mounted devices. Computing devices can also include servers, such as mail servers, file servers, database servers, and web servers.

In some cases, a computing device includes virtualized and/or cloud computing resources. For instance, a third-party cloud service provider may provide one or more virtual machines to an organization. In some cases, computing devices include one or more virtual machines running on a physical host computing device (or “host machine”). For example, the organization might use one virtual machine as a database server and another virtual machine as a mail server, both running on the same host machine.

A virtual machine includes an operating system and associated virtual resources and is hosted on a physical host computer (or host machine). A hypervisor, which is typically software and is also known as a virtual machine monitor or virtual machine manager (“VMM”), sits between the virtual machine and the hardware of the physical host machine. One example of a hypervisor used for virtualization is ESX Server, by VMware, Inc. of Palo Alto, Calif. Other examples include Microsoft Virtual Server and Microsoft Windows Server Hyper-V, both by Microsoft Corporation of Redmond, Wash., and Sun xVM by Oracle America Inc. of Santa Clara, Calif. In some embodiments, the hypervisor may be hardware or firmware.

The hypervisor provides each virtual operating system with virtual resources, such as a virtual processor, virtual memory, a virtual network device, and one or more virtual disks. The hypervisor stores the data of the virtual disks in files on the file system of the physical host machine, called virtual hard disk image files (in the case of Microsoft virtual servers) or virtual machine disk files (in the case of VMware virtual servers). For example, VMware’s ESX Server provides the Virtual Machine File System (VMFS) for storing virtual machine files. A virtual machine reads data from and writes data to its virtual disks in much the same way that a physical machine reads from and writes to a physical disk.

Examples of techniques for implementing information management in a cloud computing environment are described in U.S. Pat. No. 8,285,681, which is incorporated by reference herein. Examples of techniques for implementing information management in a virtualized computing environment are described in U.S. Pat. No. 8,307,177, also incorporated by reference herein.

Information management system 100 can also include a variety of storage devices, such as one or more primary storage devices 104 and one or more secondary storage devices 108. Storage devices can generally be of any suitable type, including hard-disk arrays, semiconductor memory (e.g., solid-state storage devices), network-attached storage (NAS) devices, tape libraries or other magnetic, non-tape storage devices, optical media storage devices, DNA/RNA-based memory technology, and combinations thereof. In some cases, storage devices form part of a distributed storage system. Some storage devices can be provided in a cloud, such as a private cloud or one operated by a third-party vendor. In some cases, a storage device is a disk array or a portion thereof.

The illustrated information management system 100 includes one or more client computing devices 102 that execute at least one application 110, and one or more primary storage devices 104 that store primary data 112. In some cases, client computing device(s) 102 and primary storage devices 104 may be referred to together as primary storage subsystem 117. A computing device in information management system 100 that has a data agent 42 installed and running on it is called a client computing device 102 (or, in the context of a component of information management system 100, simply a “client”).
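The client definition above amounts to a simple rule: a computing device counts as a client once a data agent runs on it, and the primary storage subsystem groups those clients with the primary storage devices. The sketch below models that rule; the class and field names are hypothetical, and rejecting agent-less devices in the subsystem is an assumed validation added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingDevice:
    hostname: str
    applications: list = field(default_factory=list)  # e.g., applications 110
    data_agent_installed: bool = False

    @property
    def is_client(self) -> bool:
        # Per the definition above: a device is a "client" once a data agent runs on it.
        return self.data_agent_installed


@dataclass
class PrimaryStorageSubsystem:
    """Client computing devices 102 plus the primary storage devices 104 holding primary data 112."""
    clients: list
    primary_storage_devices: list

    def __post_init__(self):
        # Hypothetical check: only devices with a data agent belong in this subsystem.
        non_clients = [d.hostname for d in self.clients if not d.is_client]
        if non_clients:
            raise ValueError(f"devices without a data agent: {non_clients}")
```

For instance, a web server with a data agent installed can be grouped with a disk array as a primary storage subsystem, while a laptop without a data agent would be rejected.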

Depending on the context, the term “information management system” can refer to generally all of the illustrated hardware and software components or, in other cases, to only a subset of those components.
