Software – Anand Vibhor, Bhavyan Bharatkumar Mehta, Amey Vijaykumar Karandikar, Commvault Systems Inc

Abstract for “Work flow management in an information management system”

Disclosed are methods and systems for managing information management operations. The system may be configured to use a work flow queue to reduce network traffic and manage server processing resources. The system may also forecast or estimate failures of information management operations using throughput estimates between the computing devices scheduled to execute one or more jobs. The system may further determine whether system alert recipients are available and automatically escalate or reassign a notification. Other embodiments are also described in this document.

Background for “Work flow management in an information management system”

Global businesses recognize the commercial value of their data and seek cost-effective, reliable ways to protect their information while minimizing the impact on productivity. Information protection is often part of a routine organizational process.

A company may back up important computing systems such as web servers, file servers, etc. as part of its daily, weekly, or monthly maintenance plan. A company might also protect the computing systems used by each of its employees, such as those used by an accounting, marketing, or engineering department.

Companies continue to look for innovative ways to manage data growth and protect data, given the ever-growing volume of data under their control. Companies often use migration techniques to move data to cheaper storage and data reduction techniques to reduce redundant data, prune lower priority data, and so forth.

Enterprises increasingly see their stored data as an asset. Customers are seeking solutions that not only manage and protect their data but also allow them to leverage it. Solutions that provide data analysis capabilities, information management, and improved data presentation and accessibility features are increasingly in demand.

“Overview”

With the growing importance of protecting and leveraging data, organizations simply cannot afford to lose critical data. Runaway data growth and other modern realities make protecting and managing data increasingly difficult. Efficient, powerful, and user-friendly data protection and management solutions are therefore essential.

Depending on the organization’s size, there may be many data production sources under the control of tens, hundreds, or even thousands of employees, students, or other individuals. Nearly all of these individuals now use, or are assigned, a computer that is essential for daily tasks. To provide information management and other services for clients, organizations deploy servers in various hierarchical configurations.

A storage manager can be used to manage server jobs that are not covered by a data retention or data storage policy, which can increase the productivity of computing devices such as servers within an information management system. The present disclosure provides methods for managing non-storage policy and non-retention policy jobs on servers based on server statuses such as idle and available. Although described with respect to servers, the methods can be applied to any computing device. The storage manager can be configured to issue jobs to servers using push queue techniques, which reduces query traffic to the storage manager. Reducing server-originated queries can lower the storage manager’s load, increase network bandwidth availability, and allow server processing resources to be used for jobs already assigned to the servers.

In addition to managing jobs, such as by queuing or issuing them, a storage manager can notify system administrators or other users of jobs that seem unlikely to be completed within a specified time limit. The present disclosure describes systems and methods for forecasting and estimating job failures in a timely fashion using throughput estimations between a transmitting and a receiving computing device. Instead of comparing the number or amount of jobs copied to a threshold to determine failure, the system compares data throughput (the amount of data processed per unit of time) to a threshold. A storage manager can generate an alert to notify users of a predicted failure. The user can then either address the problem or reschedule the job. By receiving an alert and taking corrective action, a user can also prevent network congestion that would otherwise affect other users.

Even though a storage manager may generate and transmit a system alert to a recipient, the recipient may not respond or may be unable to address a job failure or network congestion. The present disclosure describes systems and methods for escalating alerts when an alert recipient is not available. Whether an alert recipient is available can be determined using system directory tools or inferred from the recipient’s failure to acknowledge the alert. An alert can be escalated by sending it to other members of an information technology (“IT”) team, to supervisors, or to a team of information management administrators. By escalating alerts, a storage manager can decrease the chance that system failures go unaddressed and reduce an organization’s risk of unprotected information systems or computing devices. The storage manager might be configured to raise alerts if a storage device, storage computing device, or network bandwidth drops below a certain threshold, or if a job is expected to fail within a specified time limit.
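The throughput comparison described above can be illustrated with a minimal sketch. It assumes that the threshold is the throughput required to move the remaining data before the window closes; that choice, the function names, and the sample numbers are illustrative assumptions and are not taken from the disclosure.

```python
def required_throughput(remaining_bytes: float, seconds_left: float) -> float:
    """Throughput (bytes/second) needed to finish the job inside its window."""
    if remaining_bytes <= 0:
        return 0.0  # nothing left to transfer
    if seconds_left <= 0:
        return float("inf")  # window already closed with data outstanding
    return remaining_bytes / seconds_left


def predicts_failure(remaining_bytes: float, seconds_left: float,
                     estimated_throughput: float) -> bool:
    """True when the estimated throughput is below the required threshold."""
    return estimated_throughput < required_throughput(remaining_bytes, seconds_left)


# Hypothetical job: 360 GB remaining, 2 hours left, estimated 40 MB/s.
GB, MB = 10**9, 10**6
alert = predicts_failure(360 * GB, 2 * 3600, 40 * MB)
```

In this hypothetical case the job needs 50 MB/s to finish on time, so an estimate of 40 MB/s would trigger an alert.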
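The escalation behavior described above can be sketched as a walk along an escalation chain. The chain order, the `is_available` lookup (standing in for a directory tool), and the `acknowledges` callback are hypothetical placeholders, not interfaces defined by the disclosure.

```python
from typing import Callable, List, Sequence


def escalate(chain: Sequence[str],
             is_available: Callable[[str], bool],
             acknowledges: Callable[[str], bool]) -> List[str]:
    """Send an alert along the chain until someone acknowledges it.

    Unavailable recipients are skipped without being notified. Returns the
    recipients that were actually notified, in order.
    """
    notified: List[str] = []
    for recipient in chain:
        if not is_available(recipient):
            continue  # e.g. the directory shows the recipient as out of office
        notified.append(recipient)
        if acknowledges(recipient):
            break  # alert handled; no further escalation needed
    return notified


# Hypothetical data: the administrator is away and the IT team does not respond.
chain = ["admin", "it_team", "supervisor"]
available = {"admin": False, "it_team": True, "supervisor": True}
acks = {"it_team": False, "supervisor": True}
notified = escalate(chain,
                    lambda r: available.get(r, False),
                    lambda r: acks.get(r, False))
```

Here `notified` ends up as `["it_team", "supervisor"]`: the unavailable administrator is skipped, and the alert is escalated once the IT team fails to acknowledge it.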

“Brief Information Management System Overview.”

FIG. 1 illustrates work queue management within an information management system 100 according to one embodiment. The information management system 100 can include many different computing devices. As will be explained in detail, the information management system 100 can include a primary storage subsystem 102, a secondary storage subsystem 104, and a storage manager 106. These components and systems allow users to create, store, and otherwise manage the data objects associated with them.

The primary storage subsystem 102 comprises one or more client computing devices 108 communicatively connected to one or more primary storage devices 110. The client computing device 108 can be any of a number of electronic computing devices, such as a laptop, tablet, or smart phone.

As shown, the client computing devices 108 may contain one or more data agents 112, which are designed to manage information generated by or through the use of one or more applications 114 installed on the client computing devices 108. To facilitate manipulation and retention of primary data 116, the data agent 112 communicates with the storage manager 106 and the primary storage device 110.

According to some embodiments, primary data 116 is production data or other “live” data generated by the operating system and other applications 114 residing on a client computing device 108. Primary data 116 is usually stored on the primary storage device(s) 110 and organized using a file system supported by the client computing device 108. The client computing device 108 and the corresponding applications 114 can create, modify, write, delete, and otherwise use primary data 116. Some or all of the primary data 116 may be stored in cloud storage resources.

Primary data 116 is usually in the native format of the source application 114. According to some aspects, primary data 116 is the initial or first stored copy (e.g., created before any other copies, or before at least one other copy) of data generated by the source application 114. In some cases, primary data 116 is created substantially directly from the data generated by the source application 114.

Primary data 116 is sometimes called a “primary copy” in the sense that it is a discrete set of data. However, use of this term does not necessarily imply that the “primary copy” is a copy in the sense that it was copied or otherwise derived from another stored version.

The primary storage device 110 can serve the storage requirements of the client computing devices 108 using any of a variety of storage device implementations. For example, the primary storage device 110 could be a mechanical or solid-state hard drive, a network-accessible storage (“NAS”) device, or the like.

Although the primary storage subsystem 102 is shown with a single client computing device 108 and a single primary storage device 110, the primary storage subsystem 102 can contain dozens, hundreds, or even thousands of client computing devices 108 and primary storage devices 110. The primary storage subsystem 102 can include any or all computing devices that are used to support the productivity of a company, educational institution, or other organization that values the preservation, retention, and maintenance of electronically generated data.

Additional details about various exemplary embodiments of the components of the primary storage subsystem 102 are provided in the discussion of FIGS. 9A-9H.

It may be helpful to create copies of the primary data 116 for recovery purposes and/or regulatory compliance. The secondary storage subsystem 104 contains one or more secondary storage computing devices 118 and one or more secondary storage devices 120 that are configured to create and store one or more secondary copies 124 (inclusive of copies 124a-124n) of the primary data 116.

Creating secondary copies 124 can aid in search and analysis efforts. It also helps meet other information management objectives, such as: restoring data or metadata if an original version (e.g., of primary data 116) is deleted, corrupted, or lost; allowing point-in-time recovery; complying with regulatory data retention and electronic discovery requirements; reducing utilized storage capacity; facilitating organization and search; improving user access to data files across multiple computing devices; and implementing data retention policies.

The client computing devices 108 access or receive primary data 116 and communicate the data, e.g., over the communication pathways 126, for storage in the secondary storage device(s) 120. The communication pathways 126 can include one or more private and/or public networks, such as campus area networks and metropolitan area networks.

A secondary copy 124 can be a separate stored copy of application data that is derived from one or more earlier-created, stored copies (e.g., derived from primary data 116 or another secondary copy 124). Secondary copies 124 can include point-in-time data and can be stored for relatively long periods (e.g., weeks, months, or years) before some or all of the data is moved to other storage or discarded. Types of secondary copies 124 include full backups, incremental backups, auxiliary copies, and the like.

The secondary storage computing device 118 provides an intermediary interface between the secondary storage devices 120 and other components of the information management system 100. To facilitate inter-component communication within the information management system 100, each secondary storage computing device 118 can be associated with or may include a media agent 122. The media agent 122 is configured to communicate with both the storage manager 106 and the data agent 112 of the client computing devices 108. The media agent 122 also interfaces with the secondary storage devices 120 to copy, read, analyze, transfer, or otherwise manipulate secondary copies 124.

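Additional details about various exemplary embodiments of the components of the secondary storage subsystem 104 are provided in the discussion of FIGS. 9A-9H.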

The storage manager 106 is a centralized storage and/or information manager that can be configured to perform specific control functions. The storage manager 106 is connected to the primary storage subsystem 102 and the secondary storage subsystem 104 through the communication channel(s) 126. The storage manager 106 facilitates data transfer between the primary storage subsystem 102 and the secondary storage subsystem 104. For example, the storage manager 106 might instruct the data agent 112 to retrieve all or part of the primary data 116. The storage manager 106 can then initiate communications between the data agent 112 and one or more media agents 122 to transfer some or all of the primary data 116 to one or more secondary storage devices 120. In some embodiments, the storage manager 106 may use a software module such as a jobs agent 128 to initiate, facilitate, schedule, and otherwise manage communications between the data agents 112 and the media agents 122.

The storage manager 106 can be configured to support additional information management operations. The storage manager 106 could include an index 130 or interface with the index 130. The index 130 could be a database or another data structure that can be used for scheduling and tracking information management policies. The storage manager 106 can update the index 130 to reflect an operation such as a transfer of information between the primary storage subsystem 102 and the secondary storage subsystem 104. More generally, the storage manager 106 can update the index 130 to reflect any information management operations that have been performed or are scheduled to occur in the information management system 100. For example, in accordance with data retention policies, the jobs agent 128 can reference the index 130 before transferring a secondary copy 124 from one secondary storage device 120 to another, slower and less expensive secondary storage device 120.

The information management system 100 may be a single information management system cell among multiple information management system cells operated by a business, educational institution, or other organization. A management agent 132 may be used by the storage manager 106 to communicate with the storage managers of other information management system cells. The storage manager 106 may query the other storage managers and other information management system cells to obtain information that satisfies the criteria of the queries. Upon receipt of information from other information management system cells, the storage manager 106 can update any or all of its databases, tables, data structures, or the like.

While it may be desirable in some situations to distribute functionality across multiple computing devices, in other situations it may be advantageous to consolidate functionality on one computing device. In various other embodiments, any or all of the components shown in FIG. 1 as being implemented on separate computing devices can be implemented on the same computing device. In one configuration, a storage manager 106, one or more data agents 112, and one or more media agents 122 are all implemented on the same computing device. In another configuration, one or more data agents 112 and one or more media agents 122 are implemented on the same computing device while the storage manager 106 is on a separate computing device.

“Work Queues”

The storage manager 106 can be set up to manage different types of jobs within the information management system 100 by using different resources. The storage manager 106, for example, can group all jobs and tasks in the information management system 100 into one or more types or categories. The storage manager 106 can then allocate specific storage manager resources, such as processes, to certain types of jobs. The storage manager 106 may allocate a first group of processes 134 to jobs that execute a data storage or retention policy, and a second group of processes 136 to other jobs in the information management system 100. The first group of processes 134 handles tasks such as backing up, restoring, and analyzing data. The second group of processes 136 handles jobs associated with maintaining the information management system 100 (e.g., software updates), security maintenance (e.g., security patches, virus scanners, etc.), and information management system policy synchronization (e.g., changes to job preemption policies and job priorities, updates of alert definitions, etc.). It may not be possible to manage the jobs associated with the first group 134 from components of the information management system 100 other than the storage manager 106. These jobs are managed by the storage manager 106 in accordance with data storage and retention policies, which are stored and maintained at the storage manager 106. The first group 134 can be interchangeably called information management operation or system processes 134. The second group 136, although different from the first group 134, can likewise be interchangeably called information management system or operation processes 136.

In contrast, tasks or jobs associated with the second group 136 have traditionally been initiated and managed by servers, clients, or other components within the information management system 100 other than the storage manager 106. This approach to task management has some disadvantages. If the storage manager 106 manages many client computing devices 108 and secondary storage computing devices 118, then requests for updates or task authorizations from all of these devices can bombard the storage manager 106. Each client computing device 108 and secondary storage computing device 118 must also dedicate processing resources, such as CPU cycles and memory, to requesting information from the storage manager 106, and while so allocated these resources are at least partially unavailable for backup, restoration, and retention operations. Each request also consumes bandwidth on the communication channels 126 that connect the components of the information management system 100.

The storage manager 106 can be configured to manage the jobs associated with the second group of processes 136 by taking advantage of its holistic awareness of the status of each computing device within the primary and secondary storage subsystems 102 and 104. For example, since the storage manager 106 already uses the jobs agent 128 to track the status of various jobs within the primary storage subsystem 102 and the secondary storage subsystem 104, the storage manager 106 is positioned to efficiently issue non-storage/retention policy jobs to the computing devices of the primary storage subsystem 102 and the secondary storage subsystem 104 based on the operational statuses of those computing devices. For example, the storage manager 106 could issue a management job to a secondary storage computing device 118 if that secondary storage computing device 118 is online and available.

The storage manager 106 may use different work queues to manage (e.g., track and schedule) jobs within the information management system 100. The storage manager 106 might manage the first group 134 by using a first work queue 138, and the second group of processes 136 by using a second work queue 140.

The first work queue 138 can be any type of data structure, such as a table containing a number of columns that identify aspects of a job, such as the job ID, device ID, media agent identifier, job type, job status, etc. Although not shown, the first work queue 138 may also contain additional columns, such as an error identifier, a data agent identifier, or a numerical indication of job progress. The first work queue 138 may include a number of rows 142, each associated with a job or task.

The second work queue 140 might contain only jobs that are related to the second group of processes 136. The second work queue 140 could include columns such as job ID, device ID, device status, job type, and job status. The second work queue 140 can include one or more rows 144 of tasks. The second group of processes 136 could include jobs or tasks that are related to the information management system 100 but are not directly connected to backing up, restoring, or retaining data. In other words, the second group of processes 136 can be used to manage jobs that do not execute data storage or retention policies, are not related to them, or are only tangentially connected to the information management system’s data storage and retention policies. Jobs in the second work queue 140 may include installing security patches, synchronizing information management system policies, and applying other software updates.
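As a rough illustration of the two queues described above, the sketch below models each queue as a list of rows and routes a job by its type. The column subset, the set of policy job types, and the class names are assumptions made for the example; the disclosure only states that each queue can be a table with columns such as job ID, device ID, job type, and job status.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical job types that execute a data storage or retention policy.
POLICY_JOB_TYPES = {"backup", "restore", "archive"}


@dataclass
class QueueRow:
    job_id: int
    device_id: str
    job_type: str
    job_status: str = "queued"


@dataclass
class WorkQueues:
    first: List[QueueRow] = field(default_factory=list)   # stands in for queue 138
    second: List[QueueRow] = field(default_factory=list)  # stands in for queue 140

    def add(self, row: QueueRow) -> None:
        """Place policy jobs in the first queue and all other jobs in the second."""
        target = self.first if row.job_type in POLICY_JOB_TYPES else self.second
        target.append(row)


queues = WorkQueues()
queues.add(QueueRow(1, "client-7", "backup"))
queues.add(QueueRow(2, "media-agent-1", "security_patch"))
queues.add(QueueRow(3, "client-7", "policy_sync"))
```

After these calls the backup job sits in the first queue, while the security patch and policy synchronization jobs sit in the second.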

Based on the status of the devices under their control, the storage manager 106 can issue jobs to media agents 122. The storage manager 106 may wait to distribute the job in the first row 144 of the second work queue 140 based on the status of the computing device controlled by the media agent 122. For example, the storage manager 106 can hold a job in the queue until the status of the computing device becomes available or idle. In one embodiment, computing device 1 shown in the second work queue 140 is the secondary storage computing device 118a or a client computing device 108. The storage manager 106 can also preempt jobs in the first work queue 138 in favor of more urgent jobs. For example, the storage manager 106 might wait until only one, two, or a few jobs remain for a device in the first work queue 138 before deciding whether to preemptively issue or prioritize a job in the second work queue 140 over the job(s) still in the first work queue 138 for that device.

The storage manager 106 can suspend unidirectional communication to computing devices while they are offline. By issuing and distributing jobs unidirectionally from the second work queue 140, for example to media agents 122 or data agents 112, the storage manager 106 reduces incoming traffic. To further reduce network traffic, the storage manager 106 can be configured to stop issuing jobs when the computing device of a media agent 122 or data agent 112 is offline. To determine when a computing device comes back online, the storage manager 106 could periodically or continuously ping or transmit messages to the device. A more network-efficient implementation of the information management system 100 is to configure each computing device to notify the storage manager 106 when its status changes from offline to online. The storage manager 106 can then update the second work queue 140 to reflect the current status of the device and resume distributing jobs from the second work queue 140.

Some or all of the second group of processes 136 may be executed by other devices in the information management system 100. For example, information management software may be installed on, and executed by, the computing devices that make up the information management system 100. Accordingly, the first and second groups of processes 134, 136 illustrated in FIG. 1 may be included in the client computing devices 108 and the secondary storage computing devices 118, as well as in the storage manager 106.

The information management system 100 can run more efficiently and with less trouble if the storage manager 106 is configured to manage jobs that would otherwise be initiated or managed by the client computing devices 108 or secondary storage computing devices 118. The storage manager 106 is protected against being bombarded by task or job requests, and network traffic on the communication channels 126 may be reduced. The client computing devices 108 and secondary storage computing devices 118 can then focus their processing resources on jobs related to data storage and retention policies. The embodiments above refer to a work queue for administrative tasks that is managed by the storage manager 106. However, one or more secondary storage computing devices 118 can execute a similar work queue to reduce bombardment of those computing devices by subordinate devices, e.g., client computing devices 108. Instead of pinging the secondary storage computing device 118 for job requests or updates, the client computing devices 108 can wait for the secondary storage computing device 118 to start jobs or tasks. This configuration can further reduce network traffic and protect servers from being bombarded by requests. It also allows the client computing devices 108 to allocate processing resources to non-managerial tasks or jobs.
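The push-style issuing of jobs described above can be sketched as follows. Devices never poll; they only report status changes, and the manager holds each job until its target device is idle or available. The class, the status strings, and the device names are illustrative assumptions rather than the disclosed implementation.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

READY_STATUSES = {"idle", "available"}


class PushDispatcher:
    """Holds second-queue jobs and pushes them out when a device is ready."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}                 # device_id -> last status
        self.pending: Deque[Tuple[str, str]] = deque()   # (device_id, job)
        self.issued: List[Tuple[str, str]] = []          # jobs pushed to devices

    def enqueue(self, device_id: str, job: str) -> None:
        self.pending.append((device_id, job))

    def report_status(self, device_id: str, status: str) -> None:
        """Called when a device notifies the manager of a status change."""
        self.status[device_id] = status
        self.dispatch()

    def dispatch(self) -> None:
        """Issue every pending job whose device is ready; keep the rest queued."""
        still_waiting: Deque[Tuple[str, str]] = deque()
        while self.pending:
            device_id, job = self.pending.popleft()
            # Devices that have never reported are treated as offline.
            if self.status.get(device_id, "offline") in READY_STATUSES:
                self.issued.append((device_id, job))
                self.status[device_id] = "busy"
            else:
                still_waiting.append((device_id, job))
        self.pending = still_waiting


manager = PushDispatcher()
manager.enqueue("media-agent-1", "security_patch")
manager.enqueue("client-7", "software_update")
manager.report_status("media-agent-1", "idle")     # patch is issued
manager.report_status("client-7", "offline")       # update stays queued
manager.report_status("client-7", "available")     # update is issued
```

Marking a device as busy after a job is issued keeps a second job for the same device queued until the device reports in again, which is one simple way to respect device status without any polling traffic.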

FIG. 2 illustrates a method 200 for managing a work queue of jobs in an information management system that are different from the jobs of a data storage or retention policy. In accordance with one embodiment, the method 200 could be executed in an information management system similar to the information management system 100.

At block 202, a storage manager receives jobs from the Internet or from software program administrators. These jobs are performed in addition to the jobs defined by the data retention or storage policies. According to different embodiments, the jobs could include security patches, software updates, or synchronizing configuration changes across the information management system.

At block 204, the storage manager updates at least two work queues of jobs using the received jobs. The first work queue schedules or organizes tasks or jobs that are related to executing a data storage policy or data retention policy. The second work queue includes jobs and tasks of a different nature than those in the first work queue. The jobs in the second work queue may be unrelated, or only tangentially related, to the jobs specified by the data storage policy or data retention policy. However, execution of the jobs in the second work queue is required for the information management system to function.

At block 206, the storage manager updates the work queues with the statuses of the computing devices to which jobs are scheduled for assignment. Statuses for the computing devices in the information management system can include offline, online, available, busy, processing a job, job recently completed, job paused, job canceled, job lost, job finished, job stopped, or the like.

At block 208, the storage manager issues jobs from the two or more work queues according to priority settings for the jobs and based at least in part on the current status of the computing devices to which the jobs are distributed. Configuring the storage manager to distribute jobs other than those defined by data storage and retention policies allows the storage manager to manage network traffic more efficiently and protects it from being bombarded with requests.

“Throughput Failure Forecasting.”

If a computing device cannot complete one or more scheduled, queued, or issued jobs within a given timeframe, it may be helpful to notify or alert a system administrator or other users of the deficiency. There are many reasons why jobs may not be completed in a timely fashion. For example, the amount of data in a job can increase unexpectedly, making it impossible to transfer the data within the given time frame. As another example, network throughput, i.e., the rate or amount of data transferred over time, can decrease unexpectedly and/or dramatically enough to make an allotted window of time insufficient to transfer a certain amount of data between computing devices. As yet another example, a computing device used in transmitting or receiving data may fail to function properly. Notifying the appropriate personnel about potential problems may allow a system administrator or IT administrator to fix hardware or software issues that might be preventing an individual job from being executed. Information management operations that fail to take place during their regular windows of time can cause delays that spread to other parts of the system, similar to a traffic jam. An organization’s information can also be at risk if it is not being fully or partially backed up. Systems and methods for addressing these problems are described in accordance with various embodiments of this disclosure.

FIG. 3 shows an information management system 300, which can be configured to provide an operation failure forecast interface 302. Through this interface, a user may set parameters for forecasting information management operation failures and for generating alerts about those failures. The operation failure forecast interface 302 can be a web-based interface hosted by the storage manager. It may be accessible from any computing device, whether internal or external, that is connected to the information management system 300. One implementation of the operation failure forecast interface 302 includes an operation window definition 304, an operation menu 306, a throughput estimation menu 308, an alert notification time menu, an alert selection menu 312, a default action menu 314, and a stop menu 316. The operation failure forecast interface 302 allows users, such as system administrators, to customize the failure forecast feature according to their preferences. The illustrated operation failure forecast interface 302 is only one example; many other implementations are possible.

System administrators schedule resource-intensive information management operations in accordance with their convenience and, most importantly, the availability of network throughput. Network throughput, as used herein, refers to the rate of data transfer from one computing device to another. Network throughput can account for both the bandwidth available on network communication channels and the processing speed and/or availability of the computing devices involved in the data transfer. Network throughput can be measured from a source device to a destination or target device, and may span any number of networks used in the data transfer. Network throughput thus measures the speed at which data is processed by the source computing device for transfer over the networks, transferred from the source computing device to the target computing device, and/or stored temporarily by the target computing device after it is received over the networks.

A particular data transfer or operation might have a start time that depends on the completion of other jobs, on heavy network bandwidth usage, and/or on the availability of other components. Information management operations may also have stop-time limitations that are determined by other scheduled information management operations, scheduled maintenance, or an otherwise upcoming demand for network resources. A specific operation window 304 can be defined using the operation failure forecast interface 302. The operation window 304 can specify a day for an operation as well as an end time and a duration. The day of the operation can be defined in terms of days of the week (e.g., Sunday through Saturday), days of the month, or days of the year. The duration can be defined using any of several duration units, such as seconds, minutes, hours, days, or the like. The time can be displayed in either 12-hour or 24-hour format. In the illustrated example, the operation window 304 is defined by an end time and a duration, so there is no need to define a start time. In some embodiments, however, an operation window 304 may include a start time definition, either in lieu of or in addition to any of the parameters illustrated.
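Because the operation window is described as an end time plus a duration, the start time can be derived rather than entered. The short sketch below shows that derivation with Python's standard `datetime` module; the function names and the sample date are hypothetical.

```python
from datetime import datetime, timedelta


def window_start(end: datetime, duration: timedelta) -> datetime:
    """Derive the start of an operation window from its end time and duration."""
    return end - duration


def in_window(moment: datetime, end: datetime, duration: timedelta) -> bool:
    """True when `moment` falls inside the window [start, end)."""
    return window_start(end, duration) <= moment < end


# Hypothetical window: ends Sunday 2024-01-07 at 06:00 and lasts four hours.
end = datetime(2024, 1, 7, 6, 0)
duration = timedelta(hours=4)
start = window_start(end, duration)
```

With these values the derived start is 02:00 on the same Sunday.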

“In the operation menu 306. The operation failure forecast interface 302 allows the user to select the information management operation that the alert will be applied to. You can choose from a drop-down or other options selection interfaces in the operation menu 306 You can populate the operation menu 306 with backup copies, disaster recovery copies, compliance copies, auxiliary copies, archive and other options. A user may be able to choose to perform more specific operations, such as full backup or incremental backup, synthetic back-up, and the like.

“In some embodiments, the operation failure forecast interface 302 displays the recommended time for an operation window based upon previous information management operations. If the user has allocated one hour for a full backup of 10 TB of computing systems, and a previous similar operation took 10 hours, then the operation failure forecast interface 302 may inform the user about the times of similar operations that were based on operation history timetables. The index 130 may store tables of operation history in some embodiments.

“The operation fail forecast interface 302 contains the throughput estimate menu 308 that allows users to choose from several throughput estimation methods. Although the throughput estimation menu 308 can be displayed as a drop-down, it can also be used as a textbox, multiple check boxes, radio buttons or other graphic interface elements. The throughput estimation menu 308 shows at least three methods that a storage manager can use to estimate the throughput for an information management operation. These techniques include a window technique for previous jobs, a window technique of time, and a graphic correlation technique. Below are descriptions of each of these techniques.

According to one embodiment, the storage manager 106 estimates throughput for a job using throughput data from one or more previous jobs. The user can select the one or more previous jobs used in the estimation to provide a sample set, which allows the user to choose operations having varying degrees of correspondence with the operation selected in the operation menu 306. For example, to provide an indication of current throughput within the information management system 300, one or more jobs performed immediately before the job selected in the operation menu 306 can be used. As another example, the sample set can be matched more closely to the job in the operation menu 306 by using one or more jobs executed on the same computing device as that job. As yet another example, throughput can be estimated from previous jobs of the same type (e.g., the average throughput of previous full backups or of previous incremental backups). Previous jobs may be selected by the time of day, the date, and the day of the week on which the job was executed, as well as by the name of the previous job.

Different mathematical functions can also be applied to the job or jobs selected for throughput estimation. For example, the storage manager 106 can use the average throughput of the selected jobs. To make a conservative estimate, the lowest or slowest throughput among the previous jobs can be used. To make an optimistic estimate of the throughput for the job in the operation menu 306, the fastest of the previous jobs can be used.
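The choice among these functions can be sketched in a few lines of Python. This is a minimal illustration rather than the described implementation: the function name, the mode labels, and the use of gigabytes per second are assumptions made for the example.

```python
from statistics import mean

def estimate_from_previous_jobs(throughputs_gb_per_s, mode="average"):
    """Estimate throughput from a sample set of previous jobs.

    mode (hypothetical labels):
      "average"      -> mean throughput of the selected jobs
      "conservative" -> slowest previous job
      "optimistic"   -> fastest previous job
    """
    if not throughputs_gb_per_s:
        raise ValueError("at least one previous job is required")
    if mode == "average":
        return mean(throughputs_gb_per_s)
    if mode == "conservative":
        return min(throughputs_gb_per_s)
    if mode == "optimistic":
        return max(throughputs_gb_per_s)
    raise ValueError(f"unknown mode: {mode}")

# Sample set: throughput of three previous full backups (illustrative values).
previous_full_backups = [0.5, 1.0, 1.5]
estimate = estimate_from_previous_jobs(previous_full_backups, "conservative")
```

Filtering the sample set (same job type, same computing device, most recent jobs) would happen before this call, so the function only sees the jobs already chosen.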

According to another embodiment, the storage manager 106 estimates throughput for the job selected in the operation menu 306 using throughput measurements taken within a specific window of time. The user can choose to include the past day's throughput measurements, several days of throughput measurements, or an entire year of throughput measurements. A drawback of the window of time technique is that, by averaging throughput measurements, it may not accurately represent positive and negative spikes or extremes in throughput rates. For example, throughput rates on a Sunday may be much higher than the rates at the close of business on a Thursday or Friday, when employees are more likely to use network bandwidth while surfing the Internet. A window of throughput measurements that spans a week therefore may not correspond well with the time at which a job is to be executed. One embodiment uses the average of throughput measurements taken within the same time frame in which the operation in the operation menu 306 is scheduled to execute. Other embodiments use statistical functions to estimate throughput rates for a given time window. For example, the storage manager 106 can divide the throughput rates into quartiles: the first and second quartiles contain throughput rates lower than the median throughput rate for the chosen window of time, while the third and fourth quartiles contain throughput rates greater than the median throughput rate within the window. The storage manager 106 can use the average of the first or second quartile throughput measurements to estimate throughput more conservatively, or the average of the third or fourth quartile throughput measurements to make a more optimistic estimate. Alternatively, the storage manager 106 can use the lowest throughput rate achieved within the time window, which could provide system administrators with a “worst-case scenario” estimate of the time it would take to complete a job.
Additional statistical operations can also be used. For example, the applied statistical functions may include determining a throughput rate that is one or more standard deviations below the mean of the throughput measurements and using that rate as the estimate.
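The window of time statistics can be sketched as follows. This is only an illustration under stated assumptions: the function names and result keys are hypothetical, the measurements are split into four roughly equal-sized groups after sorting, and the conservative and optimistic figures average the lower two and upper two groups respectively, which is one of several readings the text allows.

```python
from statistics import mean, stdev

def quartile_groups(measurements):
    """Split sorted measurements into four roughly equal groups, lowest first."""
    ordered = sorted(measurements)
    n = len(ordered)
    if n < 4:
        raise ValueError("need at least four measurements")
    bounds = [round(i * n / 4) for i in range(5)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(4)]

def window_estimates(measurements, k=1.0):
    """Return several throughput estimates for one window of time."""
    q1, q2, q3, q4 = quartile_groups(measurements)
    avg = mean(measurements)
    return {
        "average": avg,
        "conservative": mean(q1 + q2),   # below-median measurements
        "optimistic": mean(q3 + q4),     # above-median measurements
        "worst_case": min(measurements), # lowest rate seen in the window
        "k_sigma_below_mean": avg - k * stdev(measurements),
    }

# Eight throughput samples from the chosen window (illustrative values).
estimates = window_estimates([1, 2, 3, 4, 5, 6, 7, 8])
```

Restricting the input to measurements taken at the same time of day or day of week as the scheduled job addresses the averaging drawback described above without changing the functions.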

According to another embodiment, the storage manager 106 may use a graphical correlation technique to estimate throughput for a specific job. A historical graph can show cyclic variations in throughput over a longer period of time. For example, the storage manager 106 can be configured to determine cyclic patterns graphically or mathematically based on days of the week, days of the month, times of the month, and other factors.

“FIG. 4 illustrates a historic pattern correlation graph 400. The historic pattern correlation graph 400 may include a y-axis 402 that represents network throughput as a rate per unit of time (e.g., gigabytes or megabytes per second). The historic pattern correlation graph 400 also includes an x-axis, which may contain more than one reference for the average throughput. For example, the x-axis can identify the days 406 in a monthly cycle. This representation or calculation can provide better estimates of future throughput, because the end of a month cycle, the beginning of a month cycle, and the middle portion of a month cycle could each show similar throughput averages over time. The graph 400 can contain average data points for monthly, semi-annual, annual, and other cycles. Some times in a monthly cycle (e.g., the time window 412, which is approximately seven days) may show relatively higher throughput averages or measurements. The storage manager 106 may notify a system administrator of the relatively higher throughput rates through the operation failure forecast interface 302.

“The storage manager 106 may compare historical measurements with a snapshot of throughput measurements 414 in order to predict future trends. For example, the storage manager 106 might compare the current throughput measurements 414 to the average throughput measurements 410 for a specific time period 416, such as seven days. If the throughput measurements 414 are strongly correlated with the average throughput measurements 410, the storage manager 106 may use the plotted throughput trends derived from the average throughput measurements 410 as a forecast or estimate. If the correlation between the throughput measurements 414 and the average throughput measurements 410 is weak, the storage manager 106 can indicate the weakness through the operation failure forecast interface 302 and recommend alternative throughput estimation techniques. The storage manager 106 can use various mathematical operations to determine the correlation between the throughput measurements 414 and the average throughput measurements 410. An advantage of historical pattern correlation is that it captures cyclical patterns over time, such as throughput associated with weekend days or with the middle of the month (versus the beginning or end of the month). This may give a more reliable indicator of throughput.
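One of the mathematical operations that could serve here is the Pearson correlation coefficient. The sketch below is an assumption for illustration only: the text does not name a specific correlation measure, and the function names, the returned labels, and the 0.8 threshold are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat series carries no pattern to correlate with
    return cov / (sx * sy)

def choose_forecast(recent, historical_avg, threshold=0.8):
    """Use the historical pattern only when the recent snapshot tracks it."""
    r = pearson(recent, historical_avg)
    if r >= threshold:
        return "use_historical_pattern", r
    return "recommend_alternative_technique", r

# Recent daily throughput vs. the historical average for the same days.
decision, r = choose_forecast([1, 2, 3, 4], [2, 4, 6, 8])
```

A strong positive coefficient means the recent snapshot rises and falls with the historical cycle, so the plotted historical trend is a reasonable forecast; a weak or negative coefficient is the case in which another estimation technique would be recommended.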

“Returning to FIG. 3, the alert threshold 310 lets a user set a threshold for initiating an alert based upon the estimated completion time of a job. For example, if the user wishes to be notified when the job is estimated to take more than 30 minutes longer than allowed, the user can enter 30 minutes into the alert threshold 310. The alert selection menu 312 lets the user select one or more alert types, including SMS, email, page, and voicemail. The alert selection menu 312 shows check boxes and text boxes that allow users to enter email addresses, telephone numbers, and pager numbers. Other selection interfaces, such as drop-down menus, may also be used. The default action menu 314 lets a user choose the default action that the storage manager 106 will take if a job is not expected to complete by the deadline or the alert threshold. In one embodiment, the storage manager 106 can be configured to stop a job if it determines that the job will not complete by the deadline. In other embodiments, the storage manager 106 continues processing the job even after an alert is sent. The stop menu 316 can be used to instruct the storage manager 106 to stop a job at a specific time relative to the end time. The storage manager 106 can be configured to stop a job before, at, or after the specified end time, depending on various factors. The priority of the job and the availability of network resources are some of the factors that the user might consider.

The storage manager 106 may also be configured to send alerts based upon live throughput measurements between a transmitting device and a receiving device. For example, if the storage manager 106 performs a backup operation of primary data 116 to the secondary storage device 120n, the storage manager 106 may measure the speed at which a portion of the data transfer takes place by timing the delivery of, for example, one tenth of the total size of the data to be delivered. If the primary data to be backed up is 1 terabyte in size, the storage manager 106 may calculate the throughput between the primary storage device 110 and the secondary storage device 120n using the rate at which one or more preceding gigabytes were successfully transferred. Alternatively, the storage manager 106 may be configured to send a pilot packet to establish a current estimate of throughput before beginning an information management operation. Some embodiments measure throughput based on data transferred between a primary storage device 110 and a secondary storage device 120. In other embodiments, however, throughput can be measured using data transferred from a primary storage device 110 to a second storage device 118, or between a client computing device 108 and a second storage device 118.
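A live measurement of this kind reduces to timing a known portion of the transfer and extrapolating. The sketch below is illustrative only: the function names are hypothetical, and it assumes the measured rate holds for the remainder of the transfer.

```python
def live_throughput(gb_transferred, elapsed_seconds):
    """Throughput observed so far, in gigabytes per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return gb_transferred / elapsed_seconds

def projected_remaining_seconds(total_gb, gb_transferred, elapsed_seconds):
    """Extrapolate the time needed for the data not yet transferred."""
    rate = live_throughput(gb_transferred, elapsed_seconds)
    return (total_gb - gb_transferred) / rate

# 1 TB (1000 GB) backup; the first tenth (100 GB) took 400 seconds.
rate = live_throughput(100, 400)                          # 0.25 GB/s
remaining = projected_remaining_seconds(1000, 100, 400)   # 3600.0 s
```

Repeating the calculation over the most recently transferred gigabytes, rather than over the whole transfer so far, would make the projection more responsive to changing network conditions.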

The ability to predict, forecast, or estimate that an information management operation will fail to complete in a timely fashion may allow a user to proactively troubleshoot and manage the information management system 300. For example, a system administrator can use a failure prediction to reschedule previous or preceding operations. A failure prediction can also help the administrator justify upgrading network hardware, identify bottlenecks in the information management system 300, and/or protect information more confidently.

“FIG. 5 illustrates operation of the failure forecast features of the information management system. The ability to predict, forecast, or estimate that information management operations will fail to complete in a timely fashion can be a useful tool for system administrators or other users of the information management system.

“At block 502, a computing system receives a threshold from a user, such as a time-related threshold. This threshold is used to evaluate the completion of one or more information management operations. The time-related threshold can be defined in various ways. It may be defined in terms of days of the week, days of the month, or days of the year. It could also be defined in terms of the start time, end time, and/or duration of the information management operation. The threshold could also define the time frame within which the operation should be completed.

“At block 504, a computing device estimates data throughput for a specific information management operation. A variety of techniques can be used to estimate or measure data throughput, including the use of previous jobs, a window of time, and/or cyclic patterns based on historical throughput measurements.

“The units of Equation 1 are seconds and gigabytes. However, other units, such as minutes, hours, or days, and megabytes or terabytes, can be used.”

“At block 508, a computing device alerts the user if it estimates that the information management operation will not complete by the time-related threshold specified by the user (for example, Sunday, December 29, 2013, at 11:00 p.m.). The computing device may send the alert to the user by email, text message, or page.
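The forecast and alert decision can be sketched as below. The sketch assumes that the estimated duration is the data size divided by the estimated throughput, in seconds and gigabytes; this relation and all function names are assumptions for illustration and are not taken from Equation 1 itself.

```python
from datetime import datetime, timedelta

def estimated_completion(start, data_gb, throughput_gb_per_s):
    """Estimated finish time, assuming duration = size / throughput."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return start + timedelta(seconds=data_gb / throughput_gb_per_s)

def should_alert(start, data_gb, throughput_gb_per_s, deadline):
    """True when the job is forecast to finish after the user's threshold."""
    return estimated_completion(start, data_gb, throughput_gb_per_s) > deadline

start = datetime(2013, 12, 29, 21, 0)      # job begins at 9:00 p.m.
deadline = datetime(2013, 12, 29, 23, 0)   # threshold: 11:00 p.m.

# 10,000 GB at 1 GB/s needs about 2 h 47 min, so the deadline is missed.
alert_needed = should_alert(start, 10_000, 1.0, deadline)
```

Any of the throughput estimation techniques described earlier could supply the throughput argument; the comparison against the threshold stays the same.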

“Escalating Alerts”

“The information management system 300 can generate an alert when information management operations, such as jobs in the first and second work queues 138 and 140 (shown in FIG. 1), are expected to be completed by a certain time or are estimated to be incomplete. The system 300 can also be configured to generate additional alerts about the information management system 300, including alerts about application management, automatic upgrades, configuration, and job management. For example, the information management system 300 can generate alerts when a Microsoft Exchange mailbox exceeds a certain limit; when software updates, downloads, or upgrades become available; when a storage manager, client, media agent, or data agent configuration has changed; when a data aging, data classification, protection, recovery, or verification operation stalls or fails; when one or more media drives or media libraries go offline unexpectedly; or upon any other event relating to data management.

“Some alerts are common in an information management system and can be ignored or addressed at the discretion of the system administrator. Other alerts, however, can have a significant impact on an information management system’s ability to manage and protect an organization’s information. For example, alerts regarding online-to-offline status changes of secondary storage devices or storage libraries can be problematic, because such changes may prevent the execution of important storage and/or retention operations. An information management system can generate alerts in response to different alert-generating events. Alerts may go unanswered if they are sent to employees who are on vacation, no longer working for the organization, absent from the office, sick, on bereavement leave, or involved in personal matters that could hinder or stop the recipient from responding to the alert. In one embodiment, an information management system (e.g., information management system 300) can be configured to automatically escalate unacknowledged alerts up a hierarchy of management until the alert has been acknowledged and/or someone takes corrective action to resolve the alert-causing incident.

“FIG. 6 illustrates an employee hierarchy chart 600. Some of the alerts may be linked to events that can hinder or prevent an information management system’s ability to protect an organization’s information. The employee hierarchy chart 600 represents a hierarchy of people who are responsible for maintaining an information management system. The lowest level of employees in the chart 600 is responsible for acknowledging, addressing, and/or resolving alert-related events. However, the ultimate responsibility for resolving an event-driven alert rests with the person at the top of the hierarchy, e.g., an IT division director.”

“According to one embodiment, the employee hierarchy chart 600 could include task-specific teams 602 and layers of management 604. The task-specific teams 602 could include IT administrators and personnel responsible for maintaining and updating the organization’s information technology infrastructure. The task-specific teams 602 could include an information management team 606, a software support team 608, a network support team 610, a team 612, and an administrative support team 614. The information management team 606 could be responsible for ensuring that data storage and retention policies are properly executed. Other IT-related tasks may include updating and maintaining communication networks, installing new software and operating systems, purchasing and setting up new computers/clients, and creating usernames and passwords for clients.

The storage manager 106 can be configured to escalate specific alerts based on a hierarchy, a priority, or a set of rules. For example, an alert priority rule or an alert escalation policy related to an information management operation can be configured so that the alert is sent to different members of the information management team 606, then to team supervisors, then to division managers, and finally to the director or a client. The storage manager 106 can be configured to transmit an alert first to a team member A of the information management team 606. In some embodiments, a member of the information management team 606 is designated as the first team member to receive alerts. Different types of alerts can be assigned to different team members within the information management team 606 in order to share responsibility for high priority alerts. The storage manager 106 can be configured to wait a certain amount of time (e.g., 30 minutes) for acknowledgment of the alert. If the alert is not acknowledged before that time expires, the storage manager 106 can escalate it to team member B. The storage manager 106 can escalate the alert through the members of the information management team 606 in turn, giving each team member a predetermined time limit to acknowledge the alert. The storage manager 106 can escalate the alert to a higher level of management if all members of a team fail to acknowledge the alert promptly. For example, if the information management team 606 comprises team members A, B, C, and D, the storage manager 106 can be configured to escalate the unacknowledged alert to team supervisor E, who is the supervisor of the information management team 606. The storage manager 106 can also be configured to escalate the alert to team supervisor F or team supervisor G before taking the matter to the next level. In some cases, the storage manager 106 may be configured to escalate unacknowledged alerts to H, and then to I and/or the client J.

In some embodiments, the employee hierarchy chart 600 represents the team responsible for IT support within the organization for which the alert was generated. In other embodiments, the employee hierarchy chart 600 represents an outside IT services firm or group that has been hired to manage information management operations and/or alerts for another company, such as client J.

The employee hierarchy chart 600 shows one example of an alert escalation path. However, the storage manager 106 can be configured to transmit and escalate alerts using other priority routes or other escalation rules. According to different embodiments, the time allowed between generating an alert and escalating the alert can be lengthened or shortened. Alerts can also be sent to everyone at one level of management 604 before being escalated to a higher management level. In some embodiments, the storage manager 106 transmits an alert to the first level of management, waits a predetermined time, such as 30 minutes, and then escalates the alert to the next higher level of management 604. Additional options for setting and managing escalation priority rules are discussed below in connection with FIG. 8.
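One possible escalation path through such a hierarchy can be sketched as a schedule of recipients and delays. The sketch is hypothetical: it assumes recipients are notified one at a time, lowest level first, with the same wait (30 minutes here) before each escalation, and it reuses the letters A through J from the example above purely as labels.

```python
from datetime import timedelta

def escalation_schedule(levels, wait=timedelta(minutes=30)):
    """Return (recipient, delay after the alert) pairs, lowest level first.

    Each recipient is given `wait` to acknowledge before the alert moves on.
    """
    schedule = []
    delay = timedelta(0)
    for level in levels:
        for recipient in level:
            schedule.append((recipient, delay))
            delay += wait
    return schedule

# Team members, then supervisors, then higher levels (labels only).
levels = [["A", "B", "C", "D"], ["E"], ["F", "G"], ["H"], ["I", "J"]]
schedule = escalation_schedule(levels)
# Supervisor E is reached only after A, B, C, and D each had 30 minutes.
```

Notifying everyone at one level simultaneously, as some embodiments allow, would correspond to advancing the delay once per level instead of once per recipient.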

“FIG. 7 illustrates a flow chart of a method 700 that can be used by a storage manager, or any other computing device in an information management system, for escalating information management operation alerts. An advantage of escalating alerts within the hierarchy of a team that is responsible for information management is a reduction in the time between an alert-causing incident and its acknowledgment (and remedy).

“At block 702, a computing device receives an alert of a system problem, system slowdown, or other alert-causing information management system event. For example, the computing device may be notified that the status of a secondary storage device has changed unexpectedly from online to offline. Such a change could impair an information management system’s ability to execute backup operations and may expose an organization’s information to a greater risk of data loss than the organization is willing to accept.

“At block 704, the computing device determines the point of contact to receive the alert. The alert may indicate a system failure, slowdown, or other system event, as received at block 702. The computing device may determine the first point of contact using a set of rules, a service team hierarchy, or an employee hierarchy chart. Alternatively, it could use a list of manually entered contacts.

“At block 706, the computing device determines whether the point of contact is available using directory services. The computing device may use directory services such as Microsoft’s Active Directory or Lync, Novell’s eDirectory, Apache’s ApacheDS, Oracle’s Oracle Internet Directory, OpenDS, or the like. Many directory services have specific application programming interfaces or can be used with a general directory access protocol such as LDAP (Lightweight Directory Access Protocol). The computing device can query various attributes of the directory services, such as organizationStatus, meetingStartTime, meetingEndTime, and meetingScope, to determine whether the point of contact is still employed, out of the office, or on a call, or is otherwise unavailable to receive and acknowledge the alert. The computing device can also, for example, call a mobile or home telephone number to determine whether the point of contact can be reached. In some implementations, the computing device makes several calls to the point of contact, such as three calls in 60 minutes, before it determines that the point of contact is not available. If the point of contact is not available, the method 700 proceeds to block 708. If the point of contact is available, the method 700 proceeds to block 710.
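The availability decision at block 706 might be sketched as follows. No real directory API is used: the directory record is modeled as a plain dictionary, the attribute values (such as "active") and the function name are assumptions for illustration, and a real deployment would obtain these attributes through the directory service's own interface or LDAP.

```python
from datetime import datetime

def is_available(record, now, failed_calls=0, max_calls=3):
    """Decide availability from directory attributes and call attempts.

    `record` is a dict of directory attributes (hypothetical values).
    """
    # Assumed value: "active" marks someone still with the organization.
    if record.get("organizationStatus") != "active":
        return False
    start = record.get("meetingStartTime")
    end = record.get("meetingEndTime")
    if start is not None and end is not None and start <= now < end:
        return False  # currently in a scheduled meeting
    # Unreachable after repeated calls (e.g., three calls in 60 minutes).
    return failed_calls < max_calls

record = {
    "organizationStatus": "active",
    "meetingStartTime": datetime(2013, 12, 29, 10, 0),
    "meetingEndTime": datetime(2013, 12, 29, 11, 0),
}
in_meeting = is_available(record, datetime(2013, 12, 29, 10, 30))
after_meeting = is_available(record, datetime(2013, 12, 29, 12, 0))
```

Which statuses count as unavailable is a policy choice; the meeting check above could be relaxed or tightened without changing the overall flow.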

“At block 708, the computing device determines the next point of contact to receive the alert. The computing device can identify the next point of contact by reference to one or more organizational tables or charts, or by walking through an automatically or manually generated list. These lists may be prioritized based on seniority or job function within an organization. For example, if an IT administrator is the primary point of contact, the IT administrator’s manager or supervisor may be the next point of contact. Alternatively, the computing device could determine that the next point of contact is someone who has a peer relationship with the primary point of contact. After exhausting the list of peer contacts, the computing device can be configured to escalate the alert to contacts who have supervisory roles or relationships with respect to the primary point of contact. From block 708, the method 700 returns to block 706, where the computing device determines the availability of the next point of contact. The method 700 may alternate between blocks 706 and 708 until the computing device has identified an available point of contact.

“At block 710, the computing device notifies the point of contact of the alert-generating event. According to different embodiments, the computing device can alert the point of contact using any combination of electronic resources. For example, the computing device could alert the point of contact by pager, cell phone (e.g., a text message and/or an electronic recording), email, home telephone, RSS feed, or other electronic resources. In some cases, the computing device may alert more than one person at once. For example, the computing device could be configured to notify a point of contact and his or her supervisor, such as by copying the supervisor on the email sent to the point of contact. This duplicate notification alerts the supervisor that an alert may be escalating, so that the supervisor is not taken by surprise when the alert reaches them.

After sending one or more alerts, the computing device can be configured to escalate and/or transmit the alert to another point of contact if the first point of contact fails to acknowledge, respond to, and/or correct the original event. For example, the computing device can be configured to host a web interface through which employees in the employee hierarchy chart 600 can log in and acknowledge receipt of the alert. If the alert is not acknowledged within the predetermined time, the method 700 returns to block 708. If the point of contact acknowledges the alert within the time limit, the method 700 terminates at block 712.
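The overall flow of the method 700 can be summarized as a loop over a prioritized contact list. The sketch below is a simplified illustration, not the described implementation: the function and parameter names are hypothetical, and the availability check, notification, and acknowledgment wait are passed in as callables so the control flow stays visible. The `acknowledged` callable is assumed to return only after the acknowledgment time limit has elapsed or the contact has responded.

```python
def escalate_alert(alert, contacts, is_available, notify, acknowledged):
    """Walk contacts in priority order until one acknowledges the alert.

    Returns the acknowledging contact, or None if the list is exhausted.
    """
    for contact in contacts:                  # blocks 704 and 708
        if not is_available(contact):         # block 706
            continue
        notify(contact, alert)                # block 710
        if acknowledged(contact, alert):      # acknowledged within the limit
            return contact                    # block 712
    return None

# Illustrative run: A is unavailable, B ignores the alert, C acknowledges.
notified = []
result = escalate_alert(
    alert="secondary storage device offline",
    contacts=["A", "B", "C"],
    is_available=lambda c: c != "A",
    notify=lambda c, a: notified.append(c),
    acknowledged=lambda c, a: c == "C",
)
```

Returning None when the list is exhausted leaves the caller free to apply a default action, such as restarting the walk or notifying every contact at once.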

“FIG. 8 shows an alert escalation interface 800 that is hosted or provided by one or more computing devices within the information management system 100, 300, in accordance with various embodiments. The storage manager 106 can be configured to host the alert escalation interface 800, which allows a user to set or adjust alert priorities and rules from any one or more computing devices in the information management system. The alert escalation interface 800 may include multiple windows, such as an events for escalation window 802, a devices for alert window 804, an availability tracking window 806, a location tracking window 808, and a point-of-contact priority window 810. FIG. 8 is just one example of the alert escalation interface; many others are possible.

“The events for escalation window 802 allows a system administrator to choose one or more events that are related to the information management system 100, 300. Based on the selections in the events for escalation window 802, a computing device can be configured to notify or alert one or more people about equipment failures, job or task problems, performance changes, and the like. Examples of events that can generate an alert include: a client device not being backed up for a certain number of days; a data agent reaching a maximum number of files or a maximum data size; a failed backup or restore job; low disk space on a client; a failure to access or mount media; and/or low space for a software module such as a media agent. These are just a few examples of the hundreds of system events that a user can select to trigger alert escalation.

“The devices for alert window 804 allows users to choose a method for sending an alert. Electronic modes of notification include email, cellular phone messages, home telephone calls, network feed updates, and pages. Although not shown, alerts can also be sent via various social networking applications. For example, the alert escalation interface 800 may allow the storage manager 106 to connect to various social media platforms, such as Facebook, Twitter, and Google Circles, to distribute alerts, provided that the individual points of contact in the contact list authorize such distribution. The alert escalation interface 800 may include additional features for connecting to social networking applications, as described in commonly assigned U.S. Patent Application Publication 2013/0263289, with attorney docket number 60692-8093.US1, entitled “INFORMATION AND MANAGEMENT DATA ASSOCIATED WITH MULTIPLE CLOUDS,” which is hereby incorporated by reference in its entirety.

“The availability tracking window 806 may be used by the system administrator to set the conditions for determining the availability or unavailability of a point of contact. The storage manager 106 can be configured to track or determine the unavailability of a specific point of contact in several ways. For example, some internet protocol (“IP”) telephones and private branch exchanges (“PBX”) can be connected to directory services to indicate that a point of contact is using the telephone. The storage manager 106 can also check whether a point of contact is in a meeting, out of the office, or engaged in a scheduled event.

The storage manager 106 can be configured to connect to various social networking applications used by people on the contact list and to use the APIs associated with those applications to determine the status of a user. For example, a point of contact might use Facebook’s location feature to indicate the location of a post, such as a park, movie theater, restaurant, or other attraction. The location information may be accompanied by a map or coordinate-based information, which the storage manager 106 can use to locate the point of contact. The storage manager 106 may also use other social networking applications, such as Foursquare and Google Circles, to locate a point of contact.

“The availability tracking window 806 allows a system administrator or other user to determine which statuses should be treated as unavailable. For example, a point of contact might be considered unavailable if they are out of the office or attending a meeting, but available if they are on a telephone call or if an Outlook calendar item is not a meeting or conference but merely an informational reminder or entry.

“The location tracking window 808 allows a system administrator or other user to authorize the tracking of one or more people on a contact list. Tracking options may include the location of a cell phone and the IP address of a laptop. Companies and other organizations often issue communication devices, such as smart phones, to employees so that the employees can respond more quickly to company needs. Many communication devices now have location services that are based either on GPS or on wireless service provider-based tracking, such as triangulation. According to certain embodiments, an organization can install a program on a work-assigned communication device and configure it to: 1) acquire the location of the communication device; and 2) update a directory service with the location of the communication device using, for example, web-based services. If enabled, the storage manager 106 can determine that a point of contact is not available if the point of contact is located outside of a predefined radius of the organization’s location.
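The predefined-radius test can be illustrated with a great-circle distance calculation. The haversine formula used below is an assumption chosen for the sketch, since the text does not specify how distance is computed; the function names and the coordinates are likewise hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def within_radius(office, device, radius_km):
    """True if the reported device location lies inside the allowed radius."""
    return haversine_km(*office, *device) <= radius_km

office = (0.0, 0.0)   # organization's location (illustrative coordinates)
device = (0.0, 1.0)   # one degree of longitude away, roughly 111 km
reachable = within_radius(office, device, radius_km=50)
```

A point of contact whose device reports a position outside the radius would be treated as unavailable, and the alert would move on to the next contact.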

“In other embodiments, the storage manager 106 may track an IP address of a laptop or other electronic device that is assigned to a point of contact in an organization. The laptop or other electronic device can have one or more programs installed to collect its IP address and transmit the IP address to a directory service or database. The program may be run by the operating system to determine the current IP address of the laptop. The storage manager 106 may use one or more reverse lookup resources to determine the general location of the mobile device. Websites such as whois.net and ipaddress.com are examples of reverse lookup resources. Related techniques are described in U.S. patent application Ser. No. 13/728,386, with attorney docket number 60692-8107.US, titled “APPLICATION OF INFORMATION MANAGEMENT POLICIES BASED ON OPERATION WITH A GEOGRAPHIC ENTITY,” which is herein incorporated by reference in its entirety.

The location tracking window 808 can also be used by system administrators or other users to allow the information management system 100, 300 to track an employee using a building security system. Many organizations use electronic access methods, such as RFID cards, biometric scanners, and swipe cards, to track employees’ movements on their premises. Some electronic access methods control access to buildings, while others control access to the parking areas associated with an organization’s building. The storage manager 106 can query the building security system’s computing system or data structures to determine whether a point of contact is in a building or parking area associated with the organization. Based on that determination, the storage manager 106 can send an alert to that point of contact or escalate the alert beyond that point of contact.

“The point-of-contact priority window 810 allows a user to select from a variety of methods for determining the primary and secondary contacts to whom alerts will be sent when error-related or failure-related events occur in an information management system. A user may choose or define an alert escalation policy using manual parameters 812, team-based parameters 814, or graphically assigned parameters 816. The manual parameters 812 can include the name, username, telephone number, or email address of one or more points of contact for the storage manager 106 to reach incrementally. The team-based parameters 814 allow a user to prioritize the IT teams that are contacted in response to an alert-causing system event. The graphically assigned parameters 816 allow a user to graphically determine the order of alert escalation, as discussed in connection with FIG. 6.”

The information displayed in the point-of-contact priority window 810 can be based upon information obtained from one or more system directories. For example, the team-based parameters 814 can be populated by running an Active Directory query for each subgroup within the IT department and displaying the results so that the user can prioritize them. A similar query can be used to populate the graphically assigned parameters 816. The alert escalation interface 800 thus allows users to select and prioritize alert delivery to various members of an IT support group or other groups that are responsible for supporting information management operations.

This section describes various examples of systems that can be used to implement the methods and systems illustrated in FIGS. 1-8. The systems illustrated in FIGS. 9A-9H, and the related discussion, further explain the features of each component introduced in information management systems 100 and 300. Together with the preceding figures, the systems described in FIGS. 9A-9H also support work queue management, forecasting or estimating information management operation failures, and escalating information management system alerts to resolve system errors, failures, and performance issues.

“Information Management System Overview”

Depending on the organization’s size, there may be many data production sources under the control of hundreds or even thousands of employees. In the past, individual employees were sometimes responsible for protecting and managing their own data. In other cases, a patchwork of software and hardware point solutions was used. These solutions were often offered by different vendors and sometimes had little or no interoperability.

Certain embodiments discussed herein provide systems and methods that address these and other shortcomings of previous approaches by implementing unified information management across the organization. FIG. 9A illustrates one such information management system 900. It generally includes hardware and software that can be used to manage metadata and data generated by the various computing devices within the information management system 900.

The organization that uses the information management system 900 may be a company or other business entity, a non-profit organization, an educational institution, a household, a governmental agency, or the like.

Generally, the systems described herein may be compatible with and/or provide some of the functionality of one or more U.S. patents or patent application publications assigned to CommVault Systems, Inc., each of which is hereby incorporated in its entirety by reference herein.

The information management system 900 can contain a wide range of computing devices. For example, the information management system 900 can include one or more client computing devices 902 and secondary storage computing devices 906.

Summary for “Work flow management in an information management system”

“Overview”

Organizations simply cannot afford to lose critical data. This is because of the growing importance of protecting and leveraging data. Protecting and managing data is becoming more difficult due to runaway data growth and other modern realities. It is therefore essential to have efficient, powerful and user-friendly data protection and management solutions.

Depending on the organization’s size, there may be several data production sources under the control of many, possibly thousands of, employees, students, or other individuals. Nearly all employees, students, and other people now have access to a computer (or are assigned one), and it is essential for daily tasks. To provide information management and other services for clients, organizations deploy servers in various hierarchical configurations.

A storage manager can be used to manage server jobs that are not covered by either a data retention or a data storage policy, which increases the productivity of computing devices such as servers within an information management system. The present disclosure provides methods for managing these non-storage-policy and non-retention-policy jobs on servers based on server statuses, such as idle and available. Although described with respect to servers, the methods can be applied to any computing device. The storage manager can be configured to issue jobs to servers using push queue techniques, which reduces query traffic to the storage manager. Reducing server-originating queries can lower the storage manager’s load, increase available network bandwidth, and allow server processing resources to be spent on jobs already assigned to the servers.

In addition to managing jobs, such as queuing or issuing them, a storage manager can notify system administrators or other users of jobs that seem unlikely to be completed within a specified time limit. The present disclosure describes systems and methods for forecasting and estimating job failures in a timely fashion, based on throughput estimates between a transmitting and a receiving computing device. Instead of comparing the number or amount of jobs copied to a threshold to determine failure, the system compares data throughput (the amount of data processed per unit of time) to a threshold. A storage manager can generate an alert to notify users of a predicted failure. The user can then either address the problem or reschedule the job. By receiving an alert and taking corrective action, a user can also prevent network congestion, which may benefit other users.
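The throughput comparison described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `JobProgress` fields and the choice of threshold (the throughput needed to finish the remaining data in the remaining time) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class JobProgress:
    # All field names are hypothetical, chosen for this sketch.
    total_bytes: int         # size of the whole job
    bytes_done: int          # data already transferred
    seconds_elapsed: float   # time since the job started
    seconds_allowed: float   # length of the allotted time window


def required_throughput(job: JobProgress) -> float:
    """Bytes per second needed to move the remaining data in the remaining time."""
    remaining_bytes = job.total_bytes - job.bytes_done
    remaining_seconds = job.seconds_allowed - job.seconds_elapsed
    if remaining_bytes <= 0:
        return 0.0                # nothing left to transfer
    if remaining_seconds <= 0:
        return float("inf")       # window already exhausted
    return remaining_bytes / remaining_seconds


def predicts_failure(job: JobProgress, estimated_throughput: float) -> bool:
    """Forecast a failure when the estimated throughput is below the threshold."""
    return estimated_throughput < required_throughput(job)
```

A storage manager following this sketch would raise an alert whenever `predicts_failure` returns `True`, before the time window actually expires.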

Even though a storage manager may generate or transmit a system alert to a recipient, the recipient may not respond or may be unable to address a job failure or network congestion. The present disclosure describes systems and methods for escalating alerts when an alert recipient is not available. Whether an alert recipient is available can be determined using system directory tools or inferred from the recipient’s failure to acknowledge the alert. An alert can be escalated by sending it to other members of an information technology (“IT”) team, to supervisors, or to a team of information management administrators. By escalating alerts, a storage manager can decrease the chance that system failures go unaddressed and reduce an organization’s risk of unprotected information systems or computing devices. The storage manager might be configured to raise alerts when a storage device, storage computing device, or network bandwidth drops below a certain threshold, or when a job is expected to fail within a specified time limit.
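The escalation logic described above can be sketched as a walk along an ordered chain of contacts. This is a hedged illustration only: the `Contact` fields, and the idea of modeling acknowledgement as a boolean, are assumptions for the example rather than details taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Contact:
    # Hypothetical fields for this sketch.
    name: str
    available: bool      # e.g. result of a system directory lookup
    acknowledges: bool   # whether this contact acknowledges the alert in time


def escalate(chain: Sequence[Contact]) -> Tuple[List[str], Optional[str]]:
    """Walk the escalation chain in priority order.

    Returns the names that were notified and the name of the contact who
    acknowledged the alert (None if nobody did).
    """
    notified: List[str] = []
    for contact in chain:
        if not contact.available:
            continue                      # skip unavailable recipients
        notified.append(contact.name)     # "send" the alert
        if contact.acknowledges:
            return notified, contact.name
        # no acknowledgement: fall through to the next contact
    return notified, None
```

In this sketch an unavailable primary contact is skipped without being notified, and an available contact who does not acknowledge causes the alert to move on to the next person in the chain.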

“Brief Information Management System Overview.”

FIG. 1 illustrates work queue management within an information management system 100 according to one embodiment. Many computing devices can be used in the information management system 100. As explained in detail below, the information management system 100 can include a primary storage subsystem 102, a secondary storage subsystem 104, and a storage manager 106. These components and systems allow users to create, store, and manage data objects that are associated with them.

The primary storage subsystem 102 comprises one or more client computing devices 108 that are communicatively connected to one or several primary storage devices 110. The client computing device 108 can be any of a number of electronic computing devices, such as a laptop, tablet, or smartphone.

As shown, the client computing device 108 may contain one or more data agents 112, which are designed to manage information generated by or through the use of one or more applications 114 installed on the client computing device 108. To facilitate manipulation and retention of primary data 116, the data agent 112 communicates with the storage manager 106 and the primary storage device 110.

According to some embodiments, primary data 116 is production data or other “live” data generated by the operating system and other applications 114 residing on a client computing device 108. Primary data 116 is usually stored on the primary storage device(s) 110 and is organized using a file system supported by the client computing device 108. The client computing device 108 and the corresponding applications 114 can create, modify, delete, write, or otherwise use primary data 116. Some or all of the primary data 116 may be stored in cloud storage resources.

Primary data 116 is usually in the native format of the source application 114. According to some aspects, primary data 116 is the initial or first stored copy (e.g. created before any other copies, or before at least one other copy) of data generated by the source application 114. In some cases, primary data 116 is created substantially directly from the data generated by the source application 114.

Primary data 116 is sometimes called a “primary copy” in the sense that it is a discrete set of data. The term does not necessarily imply that the “primary copy” is a copy in the sense that it was copied or otherwise derived from another stored version.

The primary storage device 110 can serve the storage requirements of the client computing device 108 using any of a variety of storage device implementations. For example, the primary storage device 110 could be a solid-state or mechanical hard drive, a network-attached storage (“NAS”) device, or the like.

Although the primary storage subsystem 102 is shown with a single client computing device 108 and a single primary storage device 110, it can contain dozens, hundreds, or even thousands of client computing devices 108 and primary storage devices 110. The primary storage subsystem 102 can include any or all of the computing devices used to support the productivity of a company, educational institution, or other organization that values the preservation, retention, and maintenance of electronically generated data.

Additional details about various exemplary embodiments of the components of the primary storage subsystem 102 appear in the discussion of FIGS. 9A-9H below.

It may be helpful to create copies of the primary data 116 for recovery purposes and/or regulatory compliance. The secondary storage subsystem 104 contains one or more secondary storage computing devices 118 and one or more secondary storage devices 120 that are configured to create and store one or more secondary copies 124 (inclusive of copies 124a-124n) of primary data 116.

Creating secondary copies 124 can aid in search and analysis efforts. It also helps meet other information management objectives, such as: restoring data or metadata if an initial version (e.g. of primary data 116) is deleted, corrupted, or lost; allowing point-in-time recovery; complying with regulatory data retention and electronic discovery requirements; reducing the storage capacity used; facilitating organization and search; increasing user access to data files across multiple computing devices; and implementing data retention policies.

The client computing devices 108 access or receive primary data 116 and communicate the data, e.g. over the communication pathways 126, to the secondary storage device(s) 120. The communication pathways 126 can include one or more private and/or public networks, such as campus area networks and metropolitan area networks.

A secondary copy 124 may be a separate, stored copy of application data that is derived from one or several earlier-created, stored copies (e.g. primary data 116 or another secondary copy 124). Secondary copies 124 may contain point-in-time data and can be retained for relatively long periods (e.g. weeks, months, or years) before some or all of the data is moved to other storage or discarded. Secondary copies 124 can take forms such as full backups, incremental backups, or auxiliary copies.

The secondary storage computing device 118 provides an intermediary interface between the secondary storage devices 120 and other components of the information management system 100. To facilitate inter-component communication within the information management system 100, each secondary storage computing device 118 can be associated with or contain a media agent 122. The media agent 122 is configured to communicate with both the storage manager 106 and the data agent 112 of the client computing device 108. The media agent 122 interfaces with the secondary storage devices 120 to copy, read, analyze, transfer, or otherwise manipulate secondary copies 124.

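Additional details about various exemplary embodiments of the components of the secondary storage subsystem 104 appear in the discussion of FIGS. 9A-9H below.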

The storage manager 106 is a centralized storage and/or information manager that can be configured to perform specific control functions. The communication channel(s) 126 connect the storage manager 106 to the primary storage subsystem 102 and the secondary storage subsystem 104, and the storage manager 106 facilitates data transfer between the two subsystems. For example, the storage manager 106 might instruct the data agent 112 to retrieve all or part of the primary data 116. The storage manager 106 can then initiate communications between the data agent 112 and one or several media agents 122 in order to transfer some or all of the primary data 116 to one or multiple secondary storage devices 120. In some embodiments, the storage manager 106 may use a software module such as a jobs agent 128 to initiate, facilitate, schedule, and otherwise manage communications between the data agents 112 and the media agents 122.

The storage manager 106 can be configured for additional information management operations. The storage manager 106 could include an index 130 or interface with the index 130. The index 130 could be a database or another data structure that can be used for scheduling and tracking information management policies. The storage manager 106 can update the index 130 to reflect an operation, such as a transfer of information between the primary storage subsystem 102 and the secondary storage subsystem 104, or more generally any information management operations that have been performed or are scheduled to occur in the information management system 100. In accordance with data retention policies, the jobs agent 128 can reference the index 130 before transferring a secondary copy 124 from one secondary storage device 120 to another, slower and less expensive secondary storage device 120.

The information management system 100 could be one information management system cell among multiple information management system cells belonging to a business, educational institution, or other organization. The storage manager 106 may use a management agent 132 to communicate with the storage managers of other information management system cells. The storage manager 106 may query those other storage managers and cells to obtain information that meets the requirements of its queries. Upon receipt of information from other information management system cells, the storage manager 106 can update any or all of its databases, tables, data structures, or the like.

While it is possible to distribute functionality across multiple computing devices, there are other situations where it may be advantageous to consolidate functionality on one computing device. In various other embodiments, any or all of the components shown in FIG. 1 as implemented on separate computing devices can be implemented on the same computing device. In one configuration, a storage manager 106, one or more data agents 112, and one or more media agents 122 are all implemented on one computing device. In another configuration, one or more data agents 112 and one or several media agents 122 are implemented on the same computing device while the storage manager 106 is on a separate computing device.

“Work Queues”

The storage manager 106 can be set up to manage different types of jobs within the information management system 100 using different resources. For example, the storage manager 106 can group all jobs and tasks in the information management system 100 into one or several types or categories, and then allocate certain types of jobs to specific storage manager resources, such as processes. The storage manager 106 may allocate a first group of processes 134 to jobs that execute a data storage or retention policy, and a second group of processes 136 to other jobs in the information management system 100. The first group of processes 134 handles tasks such as backing up, restoring, and analyzing data. The second group of processes 136 handles jobs associated with maintaining the information management system 100 (e.g. software updates), security maintenance (e.g. security patches, virus scanners, etc.), and information management system policy synchronizations (e.g. changes to job preemption policies and job priorities, updates of alert definitions, etc.). The jobs associated with the first group of processes 134 may not be manageable from components other than the storage manager 106, because the storage manager 106 manages these jobs in accordance with data storage and retention policies that are stored and maintained at the storage manager 106. The first group of processes 134 can be interchangeably called information management operation processes or system processes 134. The second group of processes 136, while distinct from the first group of processes 134, can likewise be interchangeably called information management system processes or operation processes.

In contrast, tasks or jobs associated with the second group of processes 136 have traditionally been initiated and managed by servers, clients, or other components of the information management system 100 besides the storage manager 106. This approach to task management has disadvantages. If the storage manager 106 manages many client computing devices 108 and secondary storage computing devices 118, then requests for updates or task authorizations from all of these devices can bombard the storage manager 106. In addition, each client computing device 108 and secondary storage computing device 118 must dedicate processing resources, such as CPU cycles and memory, to requesting information from the storage manager 106, leaving those resources at least partially unavailable for backup, restoration, and retention operations. Each request also consumes bandwidth on the communication channels 126 that connect the components of the information management system 100.

The storage manager 106 can be configured to manage the jobs associated with the second group of processes 136 by taking advantage of its holistic awareness of the status of each computing device within the primary and secondary storage subsystems 102 and 104. For example, since the storage manager 106 already uses the jobs agent 128 to track the status of various jobs within the primary storage subsystem 102 and the secondary storage subsystem 104, the storage manager 106 is positioned to efficiently issue non-storage/retention policy jobs to the computing devices of both subsystems based on the operational statuses of those computing devices. For example, the storage manager 106 could issue a management job to a secondary storage computing device 118 if that secondary storage computing device 118 is available and online.

The storage manager 106 may use different work queues to manage (e.g. track and schedule) jobs within the information management system 100. The storage manager 106 might manage the first group of processes 134 using a first work queue 138 and the second group of processes 136 using a second work queue 140.

The first work queue 138 can be any type of data structure, such as a table, that contains a number of columns identifying aspects of a job, such as the job ID, device ID, media agent identifier, job type, job status, etc. Although not shown, the first work queue 138 may also contain additional columns, such as an error identification, a data agent identification, or a numerical indication of job progression. The first work queue 138 may include a number of rows 142, each associated with a job or task.
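A work queue of this kind can be sketched as a list of row records, with a routing step that places each job in the first or second work queue. This is a minimal illustration: the `QueueRow` field names follow the columns listed above, while the `POLICY_JOB_TYPES` set and the `route` helper are hypothetical names introduced only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueRow:
    # One row of a work queue; columns mirror those named in the text.
    job_id: int
    device_id: str
    media_agent_id: str
    job_type: str
    job_status: str


# Assumption for this sketch: these job types execute a data storage or
# retention policy and therefore belong in the first work queue.
POLICY_JOB_TYPES = {"backup", "restore", "archive"}


def route(row: QueueRow, first_queue: List[QueueRow], second_queue: List[QueueRow]) -> None:
    """Append the row to the first queue (policy jobs) or the second queue (other jobs)."""
    if row.job_type in POLICY_JOB_TYPES:
        first_queue.append(row)
    else:
        second_queue.append(row)
```

Keeping the two queues as separate structures lets policy-driven jobs and maintenance jobs be scheduled and tracked independently.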

The second work queue 140 might contain only jobs that are related to the second group of processes 136. The second work queue 140 could include columns such as job ID, device ID, device status, job type, and job status, and can be divided into one or more rows 144 of tasks. The second group of processes 136 could include jobs or tasks that are related to the information management system 100 but are not directly connected to backing up, restoring, or retaining data. The second group of processes 136 can also be used to manage jobs that do not execute data storage or retention policies, are unrelated to them, or are only tangentially connected to the information management system’s data retention policies. For example, the second work queue 140 may hold jobs such as installing security patches, synchronizing information management system policies, and applying other software updates.

The storage manager 106 can issue jobs to the media agents 122 based on the status of the computing devices that the media agents 122 control. For example, the storage manager 106 may hold the job in the first row 144 of the second work queue 140, keeping it queued until the status of the corresponding computing device becomes available or idle. In one embodiment, computing device 1 shown in the second work queue 140 is a secondary storage computing device 118a or a client computing device 108. The storage manager 106 can also preempt jobs in the first work queue 138 in favor of more urgent jobs. For example, the storage manager 106 might wait until only one, two, or a few jobs remain for a device in the first work queue 138 before deciding whether to preemptively issue or prioritize a job in the second work queue 140 over the job(s) still in the first work queue 138 for that device.

The storage manager 106 can suspend unidirectional communication to computing devices while they are offline. By issuing and distributing jobs unidirectionally from the second work queue 140, for example to media agents 122 or data agents 112, the storage manager 106 reduces its incoming traffic. To reduce network traffic further, the storage manager 106 can be configured to stop issuing jobs when the computing device of a media agent 122 or data agent 112 is offline. To determine when a computing device comes back online, the storage manager 106 could periodically or continuously ping or transmit messages to it. A more network-efficient implementation of the information management system 100 could configure all computing devices to notify the storage manager 106 when their status changes from offline to online. The storage manager 106 can then update the second work queue 140 to reflect the current status of the device and resume distributing jobs from the second work queue 140.
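The push-style distribution described above can be sketched as a queue that issues a job only when the target device is idle and otherwise leaves the job queued. This is an illustrative sketch under stated assumptions: the `PushQueue` class, its method names, and the three status strings are hypothetical, and a device reporting that it is back online is modeled as a call to `set_status`.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class PushQueue:
    """Hypothetical second work queue that pushes jobs to idle devices."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[int, str]] = deque()  # (job_id, device_id)
        self.status: Dict[str, str] = {}                # "idle", "busy" or "offline"

    def enqueue(self, job_id: int, device_id: str) -> None:
        self.pending.append((job_id, device_id))

    def set_status(self, device_id: str, status: str) -> None:
        # Called, for example, when a device reports it is back online.
        self.status[device_id] = status

    def dispatch(self) -> List[Tuple[int, str]]:
        """Issue every pending job whose device is idle; keep the rest queued."""
        issued: List[Tuple[int, str]] = []
        still_pending: Deque[Tuple[int, str]] = deque()
        for job_id, device_id in self.pending:
            if self.status.get(device_id) == "idle":
                issued.append((job_id, device_id))
                self.status[device_id] = "busy"   # one job at a time per device
            else:
                still_pending.append((job_id, device_id))
        self.pending = still_pending
        return issued
```

Because devices never poll in this sketch, the only inbound traffic is the status notification, and jobs for an offline device simply wait until that notification arrives.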

Some or all of the second group of processes 136 may be executed by other devices in the information management system 100. The information management software may be installed on, and executed by, the computing devices that make up the information management system 100. Accordingly, FIG. 1 shows the first and second groups of processes 134, 136 in the client computing devices 108 and the secondary storage computing devices 118, as well as in the storage manager 106.

The information management system 100 can run more efficiently and with less trouble if the storage manager 106 is configured to manage jobs that would otherwise be initiated or managed by the client computing devices 108 or secondary storage computing devices 118. The storage manager 106 can protect itself against being bombarded by task or job requests, and network traffic on the communication channels 126 may also be reduced. The client computing devices 108 and secondary storage computing devices 118 can then focus their processing resources on jobs related to data storage and retention policies. The embodiments above refer to a work queue for administrative tasks that is managed by the storage manager 106. However, one or more secondary storage computing devices 118 can execute a similar work queue to reduce their own bombardment by subordinate devices, e.g. client computing devices 108. Instead of pinging a secondary storage computing device 118 for job request updates, the client computing devices 108 can wait for the secondary storage computing device 118 to start jobs or tasks. This configuration can further reduce network traffic and protect servers from being bombarded by requests. It also allows the client computing devices 108 to allocate processing resources to non-managerial tasks or jobs.

FIG. 2 illustrates a method 200 for managing, in an information management system, a work queue of jobs that are distinct from the jobs of a data storage or retention policy. In accordance with one embodiment, the method 200 could be executed in an information management system similar to the information management system 100.

At block 202, a storage manager receives jobs from the Internet or from software program administrators. These jobs are performed in addition to the jobs defined by the data retention or storage policy. According to different embodiments, the jobs could include security patches, software updates, or synchronizing configuration changes across the information management system.

At block 204, the storage manager updates at least two work queues of jobs using the received jobs. The first work queue schedules or organizes tasks or jobs that are related to executing a data storage policy or data retention policy. The second work queue includes jobs and tasks of a different nature than those in the first work queue. The jobs in the second work queue may be unrelated, or only tangentially related, to the jobs specified by the data storage policy or data retention policy. However, execution of the jobs in the second work queue is required for the information management system to function.

At block 206, the storage manager updates the work queues with the statuses of the computing devices to which jobs are scheduled for assignment. Statuses for the computing devices in the information management system could include offline, online, available, busy, processing job, job paused, job canceled, job lost, job finished, job recently completed, job stopped, or the like.

At block 208, the storage manager issues jobs from the two or more work queues according to the priority settings for the jobs and based at least in part on the current status of the computing devices to which they are distributed. Configuring the storage manager to distribute jobs beyond those defined by data storage and retention policies allows it to manage network traffic more efficiently and protects it from being bombarded with requests.
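The issuing step at block 208 can be sketched with a priority heap that is filtered by device status. This is an assumption-laden illustration, not the claimed method: the tuple layout `(priority, job_id, device_id)`, the convention that a lower number means higher priority, and the status strings are all choices made for the example.

```python
import heapq
from typing import Dict, List, Tuple

Job = Tuple[int, str, str]  # (priority, job_id, device_id); lower priority value first


def issue_jobs(queue: List[Job], statuses: Dict[str, str]) -> Tuple[List[str], List[Job]]:
    """Issue jobs in priority order to devices that are idle or available.

    Returns the job IDs that were issued and the jobs that remain queued.
    """
    heap = list(queue)
    heapq.heapify(heap)
    issued: List[str] = []
    deferred: List[Job] = []
    occupied = set()  # devices that received a job during this pass
    while heap:
        priority, job_id, device_id = heapq.heappop(heap)
        ready = statuses.get(device_id) in ("idle", "available")
        if ready and device_id not in occupied:
            issued.append(job_id)
            occupied.add(device_id)
        else:
            deferred.append((priority, job_id, device_id))
    return issued, deferred
```

Deferred jobs would be reconsidered on a later pass, once block 206 has refreshed the device statuses.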

“Throughput Failure Forecasting.”

If a computing device cannot complete one or more scheduled, queued, or issued jobs within a given timeframe, it may be helpful to notify or alert a system administrator or other users of the deficiency. There are many reasons why jobs may not be completed in a timely fashion. For example, the amount of data in a job can change unexpectedly, making it impossible to transfer the data within the given time frame. As another example, network throughput, i.e. the rate or amount of data transferred over time, can decrease unexpectedly and/or dramatically enough to make an allotted window of time insufficient to transfer a certain amount of data between computing devices. As yet another example, a computing device used in transmitting or receiving data may malfunction. Notifying the appropriate personnel about potential problems may allow a system administrator or IT administrator to fix hardware or software issues that are preventing an individual job from being executed. Information management operations that fail to take place during their regular windows of time can cause delays that spread to other parts of the system, much like a traffic jam. An organization’s information is also at risk when it is not being backed up, fully or partially. Systems and methods that address these problems are described herein in accordance with various embodiments of this disclosure.

FIG. 3 shows an information management system 300, which can be configured to provide an operation failure forecast interface 302. Through it, a user can set parameters for forecasting information management operation failures and for generating alerts about those failures. The operation failure forecast interface 302 can be a web-based interface hosted by the storage manager 106 and may be accessible from any computing device, internal or external, connected to the information management system 300. One implementation of the operation failure forecast interface 302 includes an operation window 304, an operation menu 306, an alert notification time 312, a default action menu 314, a stop menu 316, and an alert selection menu 312. The operation failure forecast interface 302 allows users or system administrators to adjust these features according to their preferences. The illustrated operation failure forecast interface 302 is only one example; many other implementations are possible.

System administrators schedule resource-intensive information management operations in accordance with their convenience and, most importantly, the availability of network throughput. Network throughput, as used herein, refers to the data transfer rate from one computing device to another. It reflects both the bandwidth available on the network communication channels and the processing speed and/or availability of the computing devices involved in the data transfer. Network throughput can be measured from a source computing device to a destination or target computing device, and may span any number of networks used in the data transfer. It measures the speed at which data is processed by the source computing device for transfer over the networks, transferred from the source computing device to the target computing device, and/or stored temporarily by the target computing device after receipt over the networks.

A particular data transfer or operation might have a start time that depends on other jobs being completed, on heavy network bandwidth usage, and/or on the availability of other components. Information management operations may also have stop-time limitations that are determined by other scheduled information management operations, scheduled maintenance, or an otherwise upcoming demand for network resources. A specific operation window 304 can be defined using the operation failure forecast interface 302. The operation window 304 can specify the day of an operation as well as a time and a duration. The day of operation can be defined in terms of days of the week (such as Sunday through Saturday), days of the month, or days of the year. The duration can be defined using any of several duration parameters, such as seconds, minutes, hours, days, or the like. The time can be displayed in either 24-hour or 12-hour format. As illustrated, the operation window 304 defines an end time and a duration, so there is no need to define a start time. In some embodiments, however, an operation window 304 may include a start time, either in lieu of or in addition to any of the parameters illustrated.

The operation menu 306 of the operation failure forecast interface 302 allows the user to select the information management operation to which the alert will apply. The operation menu 306 can be a drop-down menu or another option selection interface. It can be populated with operations such as backup copy, disaster recovery copy, compliance copy, auxiliary copy, archive copy, and other options. A user may also be able to choose more specific operations, such as full backup, incremental backup, synthetic backup, and the like.

In some embodiments, the operation failure forecast interface 302 displays a recommended duration for the operation window based upon previous information management operations. For example, if the user has allocated one hour for a full backup of 10 TB of computing systems, and a previous similar operation took 10 hours, then the operation failure forecast interface 302 may inform the user of the durations of similar operations, as recorded in operation history tables. In some embodiments, the index 130 may store the operation history tables.

The operation failure forecast interface 302 contains the throughput estimation menu 308, which allows users to choose from several throughput estimation techniques. Although the throughput estimation menu 308 is shown as a drop-down menu, it can also be implemented as a text box, multiple check boxes, radio buttons, or other graphical interface elements. The throughput estimation menu 308 shows at least three techniques that the storage manager 106 can use to estimate the throughput for an information management operation: a previous jobs technique, a window of time technique, and a graphical correlation technique. Each of these techniques is described below.

According to one embodiment, the storage manager 106 estimates throughput for a job using throughput data from one or more previous jobs. A user can select the one or more previous jobs used in the estimation to provide a sample set. This allows the user to choose operations that have varying degrees of correspondence with the operation selected in the operation menu 306. For example, one or more jobs performed immediately before the job selected in the operation menu 306 can be used to indicate current throughput within the information management system 300. As another example, the previous jobs can be filtered to correlate more closely with the job in the operation menu 306 by using one or more jobs that were executed on the same computing device. As yet another example, throughput can be estimated by averaging the throughput of previous jobs of the same type (e.g., the average of full backups or the average of incremental backups). Previous jobs may be selected by the time at which the job was performed, the date, the day on which the job was executed, or the name of the previous job.

Different mathematical functions can also be applied to the job or jobs selected for throughput estimation. For example, the storage manager 106 can use the average throughput of the selected jobs. For a conservative estimate, it can use the job with the lowest (slowest) throughput. For an optimistic estimate of the throughput for the job in the operation menu 306, it can use the job with the highest (fastest) throughput.
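The average, conservative, and optimistic selections described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the `JobRecord` type, its field names, the `mode` strings, and the sample history are all assumptions introduced here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRecord:
    """One previous job; the field names are illustrative assumptions."""
    name: str
    job_type: str      # e.g. "full" or "incremental"
    gigabytes: float   # amount of data transferred
    seconds: float     # elapsed transfer time

    @property
    def throughput(self) -> float:
        """Throughput in gigabytes per second."""
        return self.gigabytes / self.seconds


def estimate_from_previous_jobs(jobs, job_type, mode="average"):
    """Estimate throughput (GB/s) from previous jobs of the same type."""
    rates = [job.throughput for job in jobs if job.job_type == job_type]
    if not rates:
        raise ValueError(f"no previous jobs of type {job_type!r}")
    if mode == "average":
        return mean(rates)
    if mode == "conservative":   # slowest previous job
        return min(rates)
    if mode == "optimistic":     # fastest previous job
        return max(rates)
    raise ValueError(f"unknown mode {mode!r}")


# Hypothetical job history used only to exercise the function.
history = [
    JobRecord("full-mon", "full", 1000.0, 10000.0),
    JobRecord("full-tue", "full", 1000.0, 5000.0),
    JobRecord("incr-wed", "incremental", 50.0, 100.0),
]
conservative = estimate_from_previous_jobs(history, "full", "conservative")
```

Filtering by `job_type` mirrors the idea of averaging only jobs of the same type; filtering by computing device or by recency could be added in the same way.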

According to another embodiment, the storage manager 106 estimates throughput for the job selected in the operation menu 306 using throughput measurements taken within a specific window of time. The window can include the past day’s throughput measurements, several days of throughput measurements, or an entire year of throughput measurements. A downside of the window of time technique is that, by averaging throughput measurements, it may not accurately represent positive and negative spikes (i.e., extremes) in throughput rates. For example, the throughput rate on a Sunday may be much higher than the rates at the close of business on a Thursday or Friday, when employees are more likely to consume network bandwidth by surfing the Internet. A window of throughput measurements that spans a week therefore may not correspond well with the time at which a job is to be executed. One embodiment uses the average of throughput measurements taken within the same time frame as that in which the operation selected in the operation menu 306 is scheduled to execute. Other embodiments use statistical functions to estimate throughput rates for a given window of time. For example, the storage manager 106 can divide the throughput rates into quartiles: the first and second quartiles are throughput rates lower than the median throughput for the chosen window of time, while the third and fourth quartiles are throughput rates greater than the median. To estimate throughput more conservatively, the storage manager 106 can use the average of the first or second quartile throughput measurements. To make a more optimistic estimate, the storage manager 106 can use the average of the third or fourth quartile throughput measurements. Alternatively, the storage manager 106 can use the lowest throughput rate achieved within the window of time, which could provide system administrators with a ‘worst-case scenario’ estimate of the time it would take to complete a job. Additional statistical operations can also be used. For example, the applied statistical functions may include determining, and using, a throughput that is one or more standard deviations below the mean of the throughput measurements.
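The statistical options for the window of time technique can be illustrated with a short Python sketch. It rests on assumptions: the function name and dictionary keys are hypothetical, and averaging the lower and upper halves of the samples is only one reading of "the first or second quartile" and "the third or fourth quartile".

```python
from statistics import mean, median, pstdev


def window_estimates(samples):
    """Summarise throughput samples (e.g. GB/s) from one window of time."""
    if len(samples) < 2:
        raise ValueError("need at least two throughput samples")
    mid = median(samples)
    # Samples at or below the median stand in for the first and second
    # quartiles; samples at or above it stand in for the third and fourth.
    lower_half = [s for s in samples if s <= mid]
    upper_half = [s for s in samples if s >= mid]
    return {
        "average": mean(samples),
        "conservative": mean(lower_half),
        "optimistic": mean(upper_half),
        "worst_case": min(samples),
        "one_sigma_below": mean(samples) - pstdev(samples),
    }


# Hypothetical measurements used only to exercise the function.
estimates = window_estimates([10.0, 20.0, 30.0, 40.0])
```

A scheduler could pick the "conservative" or "worst_case" value when a missed deadline is costly and the "average" value otherwise.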

According to another embodiment, the storage manager 106 may use graphical correlation techniques to estimate throughput for a specific job. A historic graph can show cyclic variations in throughput over a longer period of time. The storage manager 106, for example, can be set up to determine cyclic patterns graphically or mathematically based on days of the week, days of the month, times of the month, and other factors.

FIG. 4 shows a historic pattern correlation graph 400. The historic pattern correlation graph 400 may include a y-axis 402 that represents network throughput as a rate per unit of time (e.g., gigabytes or megabytes per second). The historic pattern correlation graph 400 also contains an x-axis, which may carry more than one reference for average throughput. For example, the x-axis can identify the days 406 of a monthly cycle. This representation or calculation may provide more accurate estimates of future throughput, because the end of a monthly cycle, the beginning of a monthly cycle, and the middle portion of a monthly cycle may each show similar throughput averages over time. The graph 400 can contain average data points for monthly, semi-annual, annual, and other cycles. Some times in a monthly cycle (e.g., the time window 412, which is approximately seven days) may show relatively higher throughput averages or measurements. The storage manager 106 may notify a system administrator of the relatively higher throughput rates through the operation failure forecast interface 302.

The storage manager 106 may compare historical measurements with a snapshot of current throughput measurements 414 in order to predict future trends. For example, the storage manager 106 might compare the current throughput measurements 414 to the average throughput measurements for a specific time period 416, e.g., seven days. If the throughput measurements 414 within the time window 412 are strongly correlated with the average throughput measurements 410, the storage manager 106 may use the throughput trend plotted from the average throughput measurements 410 as a forecast or estimate. If the correlation between the throughput measurements 414 and the average throughput measurements 410 is weak, the storage manager 106 can use the operation failure forecast interface 302 to indicate the weakness and recommend alternative throughput estimation techniques. The storage manager 106 can use various mathematical operations to determine the correlation between the throughput measurements 414 and the average throughput measurements 410. An advantage of historic pattern correlation is that it captures cyclical patterns over time, such as throughput associated with weekend days or with the middle of the month (versus the beginning or end of the month). This may give a more reliable indicator of throughput.
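The text leaves the correlation measure open ("various mathematical operations"). As one possible instance, the sketch below uses the Pearson correlation coefficient to compare recent measurements with the historic averages for the same days. The function names, the 0.8 cut-off, and the sample values are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0   # a flat series carries no pattern to correlate with
    return cov / sqrt(var_x * var_y)


def choose_forecast(recent, historic_average, threshold=0.8):
    """Decide whether the historic pattern is a usable forecast.

    `threshold` is an assumed cut-off, not a value taken from the text.
    """
    r = pearson(recent, historic_average)
    if r >= threshold:
        return "use_historic_pattern", r
    return "recommend_other_technique", r


# Hypothetical daily throughput values for the same days of a cycle.
decision, r = choose_forecast([5.0, 7.0, 9.0, 6.0], [50.0, 70.0, 90.0, 60.0])
```

When the decision is "recommend_other_technique", an interface could surface the weak correlation and suggest the previous jobs or window of time techniques instead.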

Returning to FIG. 3, the alert threshold 310 lets a user set a threshold for initiating an alert based upon the estimated completion time of a job. For example, if the user wishes to be notified when the job takes longer than 30 minutes, the user can enter 30 minutes into the alert threshold 310. The alert selection menu 312 lets the user select one or more alert types, including SMS, email, page, and voicemail. The alert selection menu 312 shows check boxes and text boxes that allow users to enter email addresses, telephone numbers, and pager numbers. Other selection interfaces, such as drop-down menus, may be used. The default action menu 314 lets a user choose the default action that the storage manager 106 will take if a job will not be completed by the deadline or the alert threshold. In one embodiment, the storage manager 106 can be set up to stop a job if it is determined that the job will not complete by the deadline. In other embodiments, the storage manager 106 continues processing the job even after an alert is sent. The stop menu 316 can be used to instruct the storage manager 106 to stop a job at a specific time relative to the end time. The storage manager 106 can be set up to stop a job before, at, or after the specified end time, depending on various factors that the user might consider, such as the priority of the job and the availability of network resources.

The storage manager 106 may also be set up to send alerts based upon live throughput measurements between a transmitting device and a receiving device. For example, if the storage manager 106 performs a backup operation of primary data 116 to the secondary storage device 120n, the storage manager 106 may measure the speed of the data transfer by timing the delivery of a portion of the data, say, one tenth of the total size of the data to be delivered. If the primary data to be backed up is 1 terabyte in size, the storage manager 106 may calculate the throughput between the primary storage device 110 and the secondary storage device 120n using the rate at which one or more preceding gigabytes were successfully transferred. Alternatively, the storage manager 106 may be set up to send a pilot packet to establish a current estimate of throughput before beginning an information management operation. Some embodiments measure throughput based on data transferred between a primary storage device 110 and a secondary storage device 120. In other embodiments, however, throughput can be measured using data transferred from a primary storage device 110 to a secondary storage device 118, or between a client computing device 108 and a secondary storage device 118.
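The live measurement described above amounts to timing a known portion of the transfer and projecting the remainder. A minimal Python sketch follows; the function names and the timing figures are hypothetical, and a decimal terabyte (10^12 bytes) is assumed.

```python
def live_throughput(bytes_done, elapsed_seconds):
    """Observed throughput in bytes per second for the portion sent so far."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_done / elapsed_seconds


def projected_remaining_seconds(total_bytes, bytes_done, elapsed_seconds):
    """Project the time left, assuming the observed rate holds."""
    rate = live_throughput(bytes_done, elapsed_seconds)
    return (total_bytes - bytes_done) / rate


# Example: a 1 TB backup whose first tenth took a hypothetical 1,000 seconds.
TOTAL = 10 ** 12
remaining = projected_remaining_seconds(TOTAL, TOTAL // 10, 1000)
```

Repeating the projection as each further portion completes would let an alert fire as soon as the projected finish slips past the operation window.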

The ability to predict, forecast, or estimate the failure of an information management operation to complete in a timely fashion may allow a user to proactively troubleshoot and manage the information management system 300. For example, a system administrator can use a failure forecast to reschedule previous or preceding operations, to justify upgrading network hardware, to identify bottlenecks in the information management system 300, and/or to protect information more confidently.

FIG. 5 illustrates a method of operating the failure forecast features of the information management system. The ability to predict, forecast, or estimate whether information management operations will fail to complete in a timely fashion can be a useful tool for system administrators or other users of the information management system.

At block 502, a computing device receives from a user a threshold, such as a time-related threshold, that is used to determine the completion of one or more information management operations. The time-related threshold can be defined in various ways. It may be defined in terms of days of the week, days of the month, or days of the year. It could also be defined in terms of the start time, end time, and/or duration of the information management operation. The threshold could also define the time frame within which the operation should be completed.

At block 504, a computing device estimates data throughput for a specific information management operation. A variety of techniques can be used to estimate or measure data throughput, including the use of previous jobs, a window of time, and/or cyclic patterns that are based on historical throughput measurements.

Equation 1 uses units of seconds and gigabytes. However, other units, such as minutes, hours, and days, or megabytes and terabytes, can be used.

At block 508, a computing device alerts the user if it estimates that the information management operation will not complete prior to or by the time-related threshold specified by the user, for example, Sunday, December 29, 2013, at 11:00 p.m. The computing device may use any of the following methods to send the alert to the user: email, text message, or a page.
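Equation 1 itself is not reproduced in this text. Given its stated units of seconds and gigabytes, a plausible reading is that the estimated transfer time equals the data size divided by the estimated throughput. The Python sketch below uses that assumed relation to decide whether an alert is due; the function names, start time, data size, and throughput values are hypothetical.

```python
from datetime import datetime, timedelta


def estimated_completion(start, gigabytes, gb_per_second):
    """Assumed form of the estimate: duration (s) = size (GB) / rate (GB/s)."""
    if gb_per_second <= 0:
        raise ValueError("throughput must be positive")
    return start + timedelta(seconds=gigabytes / gb_per_second)


def should_alert(start, gigabytes, gb_per_second, deadline):
    """True when the job is forecast to finish after the user's threshold."""
    return estimated_completion(start, gigabytes, gb_per_second) > deadline


# The threshold from the example above: Sunday, December 29, 2013, 11:00 p.m.
deadline = datetime(2013, 12, 29, 23, 0)
start = datetime(2013, 12, 29, 13, 0)              # hypothetical start time
slow = should_alert(start, 10000.0, 0.25, deadline)  # 40,000 s of transfer
fast = should_alert(start, 10000.0, 0.5, deadline)   # 20,000 s of transfer
```

With the slower throughput the job is forecast to run past the deadline and an alert would be raised; with the faster one it finishes inside the window.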

“Escalating Alerts”

The information management system 300 can generate an alert when information management operations, such as jobs in the first and second work queues 138 and 140 (shown in FIG. 1), are forecast or estimated to be incomplete by a certain time. The system 300 can also be configured to generate additional alerts about the information management system 300, including alerts about application management, automatic upgrades, configuration, and job management. For example, the information management system 300 can generate alerts when a Microsoft Exchange mailbox exceeds a certain limit; when software updates, downloads, or upgrades become available; when a storage manager, client, media agent, or data agent configuration has changed; when a data aging, data classification, data protection, data recovery, or data verification operation stalls or fails; when one or more media drives or media libraries go offline unexpectedly; or upon any other event relating to data management.

Some alerts are routine in an information management system and can be ignored or addressed at the discretion of the system administrator. Other alerts, however, can have a significant impact on an information management system’s ability to manage and protect an organization’s information. For example, alerts regarding online-to-offline status changes of secondary storage devices or storage libraries can indicate problems that may prevent the execution of important storage and/or retention operations. An information management system can generate alerts in response to different alert-generating events, but alerts may go unanswered if they are sent to employees who are on vacation, no longer working for the organization, absent from the office, sick, on bereavement leave, or involved in personal matters that could hinder or prevent the recipient from responding to the alert. In one embodiment, an information management system (e.g., information management system 300) can be set up to automatically escalate unacknowledged alerts up a management hierarchy until the alert has been acknowledged and/or someone takes corrective action to resolve the alert-causing event.

FIG. 6 shows an employee hierarchy chart 600 for alert escalation. Some of the alerts may be linked to events that can hinder or prevent an information management system’s ability to protect an organization’s information. The employee hierarchy chart 600 represents a hierarchy of people who are responsible for maintaining an information management system. The employees at the lowest level of the chart 600 may be directly responsible for acknowledging, addressing, and/or resolving alert-related events. However, the ultimate responsibility for resolving an alert-generating event rests with the person at the top of the hierarchy, e.g., an IT division director.

According to one embodiment, the employee hierarchy chart 600 could include task-specific teams 602 and layers of management 604. The task-specific teams 602 could include IT administrators and other personnel responsible for maintaining and updating the organization’s information technology infrastructure. For example, the task-specific teams 602 could include an information management team 606, a software support team 608, a network support team 610, a team 612, and an administrative support team 614. The information management team 606 could be responsible for ensuring that data storage and retention policies are properly executed. Other IT-related tasks may include updating and maintaining communication networks, installing new software and operating systems, purchasing and setting up new computers/clients, and creating usernames and passwords for clients.

The storage manager 106 can be set up to escalate specific alerts based on a hierarchy, a priority, or a set of rules. For example, under an alert priority rule or alert escalation policy, an alert related to an information management operation can be sent to different members of the information management team 606, then to team supervisors, division managers, and finally to the director or a client. The storage manager 106 can be set up to transmit an alert first to team member A of the information management team 606. In some embodiments, a member of the information management team 606 is designated as the first team member to be contacted. Different types of alerts can be assigned to different team members within the information management team 606 in order to share responsibility for high-priority alerts. The storage manager 106 can be set up to wait a certain amount of time (e.g., 30 minutes) for acknowledgement of the alert. If the alert is not acknowledged before that time expires, the storage manager 106 can escalate it to team member B. The storage manager 106 can continue to escalate the alert through the information management team 606, giving each team member a predetermined time limit to acknowledge the alert. If all members of a team fail to acknowledge the alert in time, the storage manager 106 can escalate the alert to a higher level of management. For example, if the information management team 606 comprises team members A, B, C, and D, then the storage manager 106 can be configured to escalate the unacknowledged alert to E, who is the supervisor of the information management team 606. The storage manager 106 can be configured to escalate the alert to team supervisor F or team supervisor G before taking the matter to the next level. In some cases, the storage manager 106 may be set up to escalate unacknowledged alerts to H, and then to I and/or the client J. In some embodiments, the employee hierarchy chart 600 represents the team responsible for IT support within the organization for which the alert was generated. In other embodiments, the employee hierarchy chart 600 represents an outside IT services firm or group that has been hired to manage information management operations and/or alerts for another company, such as client J.

The employee hierarchy chart 600 shows one example of an alert escalation path. However, the storage manager 106 can be configured to issue and escalate alerts using other priority routes or other escalation rules. According to different embodiments, the time between generating an alert and escalating it can be lengthened or shortened. An alert can also be sent to all members of a level of management 604 before being escalated to a higher level of management. In some embodiments, the storage manager 106 escalates an alert to a first level of management 604, waits a predetermined time, such as 30 minutes, and then escalates the alert to the next higher level of management 604. Additional options for setting and managing escalation priority rules are provided in the discussion of FIG. 8 below.
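One way to picture an escalation path through the hierarchy is as a flat schedule: each contact is notified in turn, and each is given the same fixed acknowledgement wait before the next is tried. The Python sketch below makes that assumption; the function name, the nested-list representation, and the uniform 30-minute wait are illustrative choices, while the contact labels follow the example in the text.

```python
def escalation_schedule(levels, wait_minutes=30):
    """Flatten a hierarchy into (contact, minutes_after_alert) pairs.

    `levels` lists contacts from the lowest level to the highest; contacts
    within a level are tried in order before the alert moves up a level.
    """
    schedule = []
    offset = 0
    for level in levels:
        for contact in level:
            schedule.append((contact, offset))
            offset += wait_minutes
    return schedule


# Team members A-D, supervisor E, supervisors F and G, then H, I, client J.
levels = [["A", "B", "C", "D"], ["E"], ["F", "G"], ["H"], ["I"], ["J"]]
schedule = escalation_schedule(levels)
```

Under these assumptions an alert that nobody acknowledges reaches supervisor E two hours after it was raised and client J after four and a half hours; shortening `wait_minutes` compresses the whole path.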

FIG. 7 illustrates a flow chart of a method 700 that can be used by a storage manager, or any other computing device in an information management system, to escalate information management operation alerts. An advantage of escalating alerts within the hierarchy of a team that is responsible for information management is that it can reduce the time between an alert-causing event and its acknowledgment (and remedy).

At block 702, a computing device receives an alert of a system problem, system slowdown, or other alert-causing information management system event. For example, the computing device may be notified that the status of a secondary storage device has changed unexpectedly from online to offline. Such a change could impair an information management system’s ability to execute backup operations and may expose an organization’s information to a greater risk of data loss than the organization is willing to accept.

At block 704, the computing device determines the point of contact to receive the alert. The alert may indicate the system failure, slowdown, or other system event received at block 702. To determine the first point of contact, the computing device may use a set of rules, a service team hierarchy, or an employee hierarchy chart. Alternatively, it could use a list of manually entered contacts.

At block 706, the computing device determines whether the point of contact is available, for example using directory services. The computing device may use directory services such as Microsoft’s Active Directory or Lync, Novell’s eDirectory, Apache’s ApacheDS, Oracle’s Oracle Internet Directory, OpenDS, or the like. Many directory services have specific application programming interfaces or can be used with a general directory access protocol such as LDAP (Lightweight Directory Access Protocol). The computing device can query various attributes of the directory services, such as organizationStatus, meetingStartTime, meetingEndTime, and meetingScope, to determine whether the point of contact is still employed, out of the office, on a call, or otherwise unavailable to receive and acknowledge the alert. The computing device can also, for example, call a mobile or home telephone number to determine whether the point of contact can be reached. In some implementations, the computing device makes several calls to the point of contact, such as three calls in 60 minutes, before it determines that the point of contact is not available. If the point of contact is not available, the method 700 proceeds to block 708. If the point of contact is available, the method 700 proceeds to block 710.
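The availability check at block 706 can be sketched without committing to any particular directory product. In the Python example below, a directory entry is modelled as a plain dictionary keyed by the attribute names mentioned above; no real directory API is called, and the value formats (the string "active", `datetime` objects for meeting times) are assumptions made for illustration.

```python
from datetime import datetime


def is_available(entry, now):
    """Decide availability from directory-style attributes.

    `entry` is a plain dict standing in for a directory record.
    """
    if entry.get("organizationStatus") != "active":
        return False                      # e.g. no longer employed
    start = entry.get("meetingStartTime")
    end = entry.get("meetingEndTime")
    if start is not None and end is not None and start <= now < end:
        return False                      # currently in a meeting
    return True


# Hypothetical record and query time.
now = datetime(2013, 12, 29, 14, 30)
in_meeting = {
    "organizationStatus": "active",
    "meetingStartTime": datetime(2013, 12, 29, 14, 0),
    "meetingEndTime": datetime(2013, 12, 29, 15, 0),
}
available_now = is_available(in_meeting, now)
```

A deployment would replace the dictionary lookup with queries against whichever directory service is in use, and could fold in the telephone-based checks described above.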

At block 708, the computing device determines the next point of contact to receive the alert. The computing device can identify the next point of contact by reference to one or more organizational tables or charts, or by walking through an automatically or manually generated list. These lists may be prioritized based on seniority or job function within an organization. For example, if an IT administrator is the primary point of contact, then the IT administrator’s manager or supervisor may be the next point of contact. Alternatively, the computing device could determine that the next point of contact is someone who has a peer relationship to the primary point of contact. The computing device can then be configured to escalate the alert to contacts who have supervisory roles or relationships with respect to the primary point of contact after exhausting the list of peer contacts. From block 708, the method 700 returns to block 706, where the computing device determines the availability of the next point of contact. The method 700 may alternate between blocks 706 and 708 until an available point of contact is found.

At block 710, the computing device notifies or alerts the point of contact of the alert-generating event. According to different embodiments, the computing device can alert the point of contact using any combination of electronic resources. For example, the computing device could alert the point of contact by using a pager, a cell phone (e.g., a text message and/or an electronic recording), an email, a home telephone, an RSS feed, or other electronic resources. In some cases, the computing device may alert more than one person at once. For example, the computing device could be set up to notify a point of contact and his or her supervisor, such as by copying the supervisor on an email to the point of contact. This duplicative notification can be used to warn the point of contact’s supervisor about escalating alerts so that the supervisor is not taken by surprise when an alert is escalated to him or her.

After sending one or more alerts, the computing device can be set up to escalate and/or transmit the alert to another point of contact if the first point of contact fails to acknowledge, respond to, and/or correct the original event. For example, the computing device can be configured to host a web interface through which employees in the hierarchy chart 600 can log in and acknowledge receipt of the alert. If the alert is not acknowledged within the predetermined time, the method 700 returns to block 708. If the point of contact acknowledges the alert within the time limit, the method 700 terminates at block 712.
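The overall flow of the method 700 can be summarised as a loop over an ordered contact list. The Python sketch below is a simplification under stated assumptions: the three callables (`is_available`, `notify`, `wait_for_ack`) are hypothetical hooks that a real system would back with directory queries, messaging, and a timed acknowledgement check.

```python
def escalate(contacts, is_available, notify, wait_for_ack):
    """Walk an ordered contact list until someone acknowledges the alert.

    Returns the acknowledging contact, or None if the list is exhausted.
    """
    for contact in contacts:              # blocks 704 and 708: pick a contact
        if not is_available(contact):     # block 706: availability check
            continue
        notify(contact)                   # block 710: send the alert
        if wait_for_ack(contact):         # acknowledged within the time limit
            return contact                # block 712: done
    return None


# Hypothetical stand-ins: A is unavailable, B ignores the alert, C responds.
notified = []
winner = escalate(
    ["A", "B", "C"],
    is_available=lambda c: c != "A",
    notify=notified.append,
    wait_for_ack=lambda c: c == "C",
)
```

In this run only B and C are notified, and the loop stops at C, the first contact to acknowledge.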

FIG. 8 shows an alert escalation interface 800 that is hosted or provided by one or more computing devices within the information management system 100, 300, in accordance with various embodiments. The storage manager 106 can be configured to host the alert escalation interface 800, which allows a user to set or adjust alert priorities and rules from any one or more computing devices in the information management system. The alert escalation interface 800 may include multiple windows, such as an events for alert escalation window 802, a devices for alert window 804, an availability tracking window 806, a location tracking window 808, and a point-of-contact priority window 810. FIG. 8 shows just one example of the alert escalation interface; many others are possible.

The events for alert escalation window 802 allows a system administrator to choose one or more events that are related to the information management system 100, 300. Based on the selections in the events for alert escalation window 802, a computing device can be set up to notify or alert one or more people about equipment failures, job or task problems, performance changes, and the like. Examples of events that can generate an alert include: a client device not being backed up for a certain number of days; a data agent reaching a maximum number of files or data size; failure of a job or restore operation; low disk space on a client; failure to access or mount media; and/or low disk space for a software module such as a media agent. These are just a few examples of the hundreds of system events that a user can select to trigger alert escalation.

The devices for alert window 804 allows users to choose one or more modes for sending an alert. Electronic modes of notification include email, cellular phone messages, home telephone calls, updates to a network feed, and pages. Although not shown, alerts can also be sent via various social networking applications. For example, the alert escalation interface 800 may allow the storage manager 106 to connect to various social media platforms, such as Facebook, Twitter, and Google Circles, to distribute alerts, if the list of contacts or the individual points of contact within the list authorize it. The alert escalation interface 800 may include additional features for connecting to social networking applications, as described in commonly assigned U.S. Patent Application Publication 2013/0263289, with attorney docket number 60692-8093.US1, entitled “INFORMATION MANAGEMENT OF DATA ASSOCIATED WITH MULTIPLE CLOUD SERVICES,” which is hereby incorporated by reference in its entirety.

The availability tracking window 806 may be used by the system administrator to define the conditions for determining the availability or unavailability of a point of contact. The storage manager 106 can be set up to track or determine the unavailability of a specific point of contact. For example, some Internet Protocol (“IP”) telephones and private branch exchanges (“PBX”) can be connected to directory services to indicate that a point of contact is using the telephone. The storage manager 106 can also check whether a point of contact is in a meeting, out of the office, or engaged in a scheduled event.

The storage manager 106 can also be set up to connect to the various social networking applications of the contacts in the list and to use the APIs associated with those applications to determine the status of a user. For example, a point of contact might use Facebook’s location feature to indicate the location from which he or she is posting, such as a park, movie theater, restaurant, or other attraction. The location information may be accompanied by map or coordinate-based information, which the storage manager 106 can use to locate the point of contact. The storage manager 106 may also use other social networking applications, such as Foursquare and Google Circles, to locate a point of contact.

The availability tracking window 806 allows a system administrator or other user to determine which statuses should be considered unavailable. For example, a point of contact might be considered unavailable if he or she is out of the office or attending a meeting, but might be considered available if he or she is on a telephone call or if an Outlook calendar item is not a meeting or conference but merely an informational reminder or entry.

The location tracking window 808 allows a system administrator or user to authorize the tracking of one or more people on a contact list. Tracking can be based, for example, on the location of a cell phone or the IP address of a laptop. Companies and other organizations often issue communication devices, such as smartphones, to employees in order to allow them to respond more quickly to company needs. Many communication devices now have location services that are based either on GPS or on wireless service provider-based tracking, e.g., triangulation. According to certain embodiments, an organization can install a program on a work-assigned communication device and configure it to: 1) acquire the location of the communication device; and 2) update a directory service with the location of the communication device using, for example, web-based services. If enabled, the storage manager 106 can determine that a point of contact is not available if the device is located outside of a predefined radius of the company’s or organization’s location.
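The radius test described above reduces to a distance calculation between two coordinates. The text does not name a formula; the Python sketch below uses the haversine great-circle distance as one common choice. The function names, the mean Earth radius constant, and the sample coordinates are assumptions introduced for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0   # mean Earth radius, an approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, by the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def within_radius(office, device, radius_km):
    """True if the device's (lat, lon) lies within radius_km of the office."""
    return distance_km(*office, *device) <= radius_km


# Hypothetical coordinates: a device one degree of latitude from the office.
office = (0.0, 0.0)
device = (1.0, 0.0)
reachable = within_radius(office, device, 50.0)
```

One degree of latitude is roughly 111 km, so with a 50 km radius this device would be treated as outside the area and the point of contact as unavailable.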

In other embodiments, the storage manager 106 may track the IP address of a laptop or other electronic device that is assigned to a point of contact in an organization. A laptop or other electronic device can have one or more programs installed to collect its IP address and transmit it to a directory service or database. The program may be run by the operating system to determine the current IP address of the laptop. The storage manager 106 may use one or more reverse lookup resources to determine the general location of the device. Examples of reverse lookup resources include websites such as whois.net, ipaddress.com, and others. Related techniques are described in U.S. patent application Ser. No. 13/728,386, with attorney docket number 60692-8107.US, entitled “APPLICATION OF INFORMATION MANAGEMENT POLICIES BASED ON OPERATION WITH A GEOGRAPHIC ENTITY,” which is hereby incorporated by reference in its entirety.

The location tracking window 808 can also be used by system administrators or other users to allow the information management system 100, 300 to track an employee using a building security system. Many organizations use electronic access methods, such as RFID cards, biometric scanners, and swipe cards, to track employees’ movements on their premises. Some electronic access methods control access to buildings, while others control access to the parking areas associated with an organization’s building. The storage manager 106 can query the building security system’s computing system or data structures to determine whether a point of contact is in a building or parking area associated with the organization. This allows the storage manager 106 to decide whether to send an alert to that point of contact or to escalate the alert beyond that point of contact.

“The point-of-contact priority window 810 offers several methods for determining the primary and secondary contacts to whom alerts will be sent when error-related or failure-related events occur in an information management system. A user can define an alert escalation policy using manual parameters 812, team-based parameters 814, or graphically assigned parameters 816. The manual parameters 812 can include the name, username, telephone number, or email address of one or more points-of-contact that the storage manager 106 contacts incrementally. The team-based parameters 814 allow a user to prioritize the IT department teams that are contacted in response to an alert-causing system event. As shown in FIG. 8, the graphically assigned parameters 816 allow a user to determine the order of alert escalation graphically.”

The information displayed in the point-of-contact priority window 810 can be based on information obtained from one or more system directories. For example, the team-based parameters 814 can be populated by running an Active Directory query for each subgroup within the IT department and displaying the results so that the user can prioritize them. A similar query can be used to populate the graphically assigned parameters 816. The alert escalation interface 800 thus allows users to select and prioritize alert delivery to various members of an IT support group or of other groups that are responsible for supporting information management operations.
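One way to turn directory query results into a team-based escalation order is sketched below. The list of dictionaries stands in for the results of a directory query, and the `team` key, the priority mapping, and the function name are all hypothetical. No real Active Directory API is used here.

```python
def build_escalation_order(directory_entries, team_priority):
    """Order contacts by the user-assigned priority of their team.

    directory_entries: list of dicts with "name" and "team" keys
        (a stand-in for directory query results).
    team_priority: dict mapping team name to rank; lower rank is contacted first.
    Entries whose team has no assigned priority are left out.
    """
    known = [e for e in directory_entries if e["team"] in team_priority]
    # sorted() is stable, so contacts within a team keep their directory order.
    return sorted(known, key=lambda e: team_priority[e["team"]])
```

Dropping teams without a priority is a design choice made for this sketch. An implementation could equally append them at the end of the order.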

“This section describes various example systems that can be used to implement the methods and systems described in FIGS. 1-8. The systems illustrated in FIGS. 9A-9H, and the related discussion, further explain the features of each component introduced in information management systems 100 and 300. Together with FIGS. 1-8, the systems described in FIGS. 9A-9H also enable work queue management, forecasting or estimating information management failures, and escalating information management system alerts to resolve system errors, failures, and performance issues.

“Information Management System Overview”

“Depending on the organization’s size, there may be many data production sources under the control of tens, hundreds, or even thousands of employees. In the past, individual employees were sometimes responsible for protecting and managing their own data. In other cases, a patchwork of hardware and software point solutions was used. These solutions were often provided by different vendors and had little or no interoperability.

“Certain embodiments described herein provide systems and methods capable of addressing these and other shortcomings of previous approaches by implementing unified, organization-wide information management. FIG. 9A illustrates one such information management system 900, which generally includes hardware and software that can be used to manage data and metadata generated by the various computing devices within the information management system 900.

“The organization that uses the information management system 900 may be a company or other business entity, a non-profit organization, an educational institution, a household, a governmental agency, or the like.”

“Generally, the systems described herein may be compatible with and/or provide some of the functionality of the systems described in one or more U.S. patents or patent application publications assigned to CommVault Systems, Inc., each of which is hereby incorporated by reference in its entirety.

“The information management system 900 can include a wide range of computing devices. For example, the information management system 900 can include one or more client computing devices 902 and secondary storage computing devices 906.
