Software – Ashwin Gautamchand Sancheti, Commvault Systems Inc

Abstract for “Snapshots and backup copies for individual virtual machines”

Systems and techniques are disclosed for performing snapshot and backup copy operations on individual virtual machines within shared storage. A system may include a hypervisor and one or more shared physical storage devices that are communicatively coupled to the hypervisor and store a plurality of virtual machines. The one or more shared physical storage devices can store a plurality of storage volumes, each volume uniquely corresponding to one of the virtual machines. The system can issue a command to the hypervisor to perform a snapshot or backup copy operation according to a specific information management policy.

Background for “Snapshots and backup copies for individual virtual machines”

Businesses recognize the commercial value of their data and look for cost-effective, reliable ways to protect the information stored on their computer networks while minimizing the impact on productivity. A company may back up important computing systems, such as databases, file servers, or web servers, as part of a daily, weekly, or monthly maintenance schedule. A company may likewise protect the computing systems used by its employees, such as those in its marketing, accounting, and engineering departments. Given the ever-growing volume of data under their control, companies also continue to look for innovative ways to manage data growth, for example by migrating data to less expensive storage over time, reducing redundant data, and pruning lower-priority data. Companies increasingly view their stored data as an asset and seek solutions that leverage it, so capabilities such as data analysis, improved information management, and enhanced data presentation and access are increasingly in demand.

In some embodiments, the system includes: a hypervisor that creates and operates a plurality of virtual machines; one or more shared physical computer storage devices, communicatively coupled to the hypervisor, that store the plurality of virtual machines; and a virtual server agent that issues a command to the hypervisor to perform a snapshot copy operation on a selected virtual machine without performing the snapshot copy operation on any other virtual machine stored in the one or more shared physical computer storage devices.

In some embodiments, the method includes: creating and operating a plurality of virtual machines using a hypervisor; storing the plurality of virtual machines in one or more shared physical computer storage devices communicatively coupled to the hypervisor; and issuing, by a virtual server agent, a command for the hypervisor to perform a snapshot copy operation on a selected virtual machine without performing the snapshot copy operation on any other virtual machine in the one or more shared physical computer storage devices.

In some embodiments, a virtual server agent comprises a memory storing instructions for executing a method that includes issuing a command for a hypervisor to perform a snapshot copy operation on a selected one of a plurality of virtual machines operated by the hypervisor. The hypervisor is communicatively coupled to one or more shared physical computer storage devices that store the plurality of virtual machines, and a plurality of storage volumes are stored in the one or more shared physical computer storage devices, each volume uniquely corresponding to one of the virtual machines.

In some embodiments, a virtual server agent method includes issuing, by the virtual server agent, a command for a hypervisor to perform a snapshot copy operation on a selected one of a plurality of virtual machines. The hypervisor is communicatively coupled to one or more shared physical computer storage devices that store the plurality of virtual machines. A plurality of storage volumes are stored in the one or more shared physical computer storage devices, each volume uniquely corresponding to one of the virtual machines. The virtual server agent then receives, from the hypervisor, metadata corresponding to the completed snapshot copy operation.
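The volume-per-VM arrangement described above can be illustrated with a small in-memory Python model. This is a sketch under stated assumptions, not Commvault's implementation or any hypervisor's API: the classes `SharedDatastore` and `Hypervisor`, the method `snapshot_vm`, and the metadata fields are all hypothetical. Because each volume corresponds to exactly one virtual machine, copying that single volume captures the selected VM and no other, and the returned dictionary stands in for the metadata the hypervisor sends back to the virtual server agent.

```python
import copy
import itertools
from dataclasses import dataclass, field


@dataclass
class SharedDatastore:
    # Hypothetical model: each VM name maps to its own storage volume (disk name -> bytes).
    volumes: dict = field(default_factory=dict)
    # snapshot_id -> frozen copy of exactly one VM's volume
    snapshots: dict = field(default_factory=dict)


class Hypervisor:
    """Toy hypervisor that snapshots a single VM's volume on a shared datastore."""

    def __init__(self, datastore: SharedDatastore):
        self.datastore = datastore
        self._ids = itertools.count(1)

    def snapshot_vm(self, vm_name: str) -> dict:
        # Only the selected VM's volume is copied; other volumes are left untouched.
        volume = self.datastore.volumes[vm_name]
        snapshot_id = f"snap-{next(self._ids)}"
        self.datastore.snapshots[snapshot_id] = copy.deepcopy(volume)
        # Metadata for the completed snapshot, as returned to the virtual server agent.
        return {"snapshot_id": snapshot_id, "vm": vm_name, "disk_count": len(volume)}


store = SharedDatastore(volumes={"vm-db": {"disk0": b"db"}, "vm-mail": {"disk0": b"mail"}})
hypervisor = Hypervisor(store)
meta = hypervisor.snapshot_vm("vm-db")
print(meta)  # {'snapshot_id': 'snap-1', 'vm': 'vm-db', 'disk_count': 1}
```

A deep copy stands in for a real snapshot mechanism here; an actual system would rely on the hypervisor's or storage array's own snapshot facility.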

In some embodiments, a non-transitory computer-readable medium stores instructions that, when executed by at least one computing device, perform a virtual server agent method. The method includes issuing a command for a hypervisor to perform a snapshot copy operation on a selected virtual machine, the hypervisor being communicatively coupled to one or more shared physical computer storage devices that store the plurality of virtual machines. The virtual server agent also receives, from the hypervisor, metadata corresponding to the completed snapshot copy operation and stores the metadata in a database.

In some embodiments, the virtual server agent includes a memory storing instructions for carrying out a process that involves: commanding an external system to create a snapshot copy of any one of a plurality of virtual machines stored in the external system; receiving metadata about the completed snapshot copy from the external system; storing the metadata in a database; selecting one or more of the completed snapshot copies to use in creating a backup copy of any one of the plurality of virtual machines; and instructing the external system to create a backup copy of a selected virtual machine using the metadata.

In some embodiments, a virtual server agent method includes: commanding, by the virtual server agent, an external system to create a snapshot copy of any one of a plurality of virtual machines stored in the external system; receiving, by the virtual server agent, metadata about the completed snapshot copy from the external system; storing the metadata in a database; receiving, by the virtual server agent, a selection of one or more completed snapshot copies to use in creating a backup copy of any of the plurality of virtual machines; and commanding, by the virtual server agent, the external system to create a backup copy of a selected virtual machine.

In some embodiments, a non-transitory computer-readable medium stores instructions that, when executed by at least one computing device, cause the following to be performed: the virtual server agent commands an external system to create a snapshot copy of any one of a plurality of virtual machines stored in the external system; the virtual server agent receives metadata about the completed snapshot copy from the external system; the metadata is stored in a database; the virtual server agent selects one or more completed snapshot copies to be used in creating a backup copy of any one of the plurality of virtual machines; and the virtual server agent commands the external system to create a backup copy of a selected virtual machine using the metadata.
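The snapshot-then-backup workflow in the embodiments above can be sketched in Python as follows. The `FakeExternalSystem` class stands in for the external system (for example, a hypervisor or storage array) and, like the `VirtualServerAgent` methods and the single-table SQLite schema, is a hypothetical illustration rather than a real product API. The agent commands a snapshot, stores the returned metadata in a database, and later uses the stored metadata of a selected snapshot to command a backup copy.

```python
import sqlite3


class FakeExternalSystem:
    """Hypothetical stand-in for the external system that owns the VMs."""

    def __init__(self):
        self._counter = 0
        self.backups = []  # (vm, backup_id) pairs created so far

    def create_snapshot(self, vm: str) -> dict:
        self._counter += 1
        return {"snapshot_id": f"snap-{self._counter}", "vm": vm, "seq": self._counter}

    def create_backup(self, vm: str, snapshot_meta: dict) -> str:
        # The metadata tells the external system which snapshot to copy from.
        backup_id = f"backup-of-{snapshot_meta['snapshot_id']}"
        self.backups.append((vm, backup_id))
        return backup_id


class VirtualServerAgent:
    def __init__(self, external: FakeExternalSystem):
        self.external = external
        self.db = sqlite3.connect(":memory:")  # metadata database (in-memory for the sketch)
        self.db.execute("CREATE TABLE snapshots (snapshot_id TEXT PRIMARY KEY, vm TEXT, seq INTEGER)")

    def snapshot(self, vm: str) -> str:
        # Command the snapshot, receive its metadata, and persist the metadata.
        meta = self.external.create_snapshot(vm)
        with self.db:  # commits the insert on success
            self.db.execute(
                "INSERT INTO snapshots VALUES (?, ?, ?)",
                (meta["snapshot_id"], meta["vm"], meta["seq"]),
            )
        return meta["snapshot_id"]

    def snapshots_for(self, vm: str) -> list:
        rows = self.db.execute("SELECT snapshot_id FROM snapshots WHERE vm = ? ORDER BY seq", (vm,))
        return [row[0] for row in rows]

    def backup_from(self, snapshot_id: str) -> str:
        # Look up the selected snapshot's stored metadata and command a backup from it.
        row = self.db.execute(
            "SELECT snapshot_id, vm, seq FROM snapshots WHERE snapshot_id = ?", (snapshot_id,)
        ).fetchone()
        if row is None:
            raise KeyError(snapshot_id)
        meta = {"snapshot_id": row[0], "vm": row[1], "seq": row[2]}
        return self.external.create_backup(meta["vm"], meta)


agent = VirtualServerAgent(FakeExternalSystem())
agent.snapshot("vm-db")
agent.snapshot("vm-mail")
agent.snapshot("vm-db")
print(agent.snapshots_for("vm-db"))  # ['snap-1', 'snap-3']
print(agent.backup_from("snap-3"))   # backup-of-snap-3
```

Keeping the metadata in the agent's own database means the selection step can be made without querying the external system again.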

Many companies use virtual machines to maximize the use of their computing resources. Many independent virtual machines can be stored in a shared datastore; a single Logical Unit Number (LUN), for example, can hold hundreds to thousands of virtual machines. Enterprises must create backup copies and snapshots of their virtual machine data just as they do for any other data. However, conventional virtual machine systems may be unable to create snapshots or backup copies of individual virtual machine disks within a shared datastore. Some systems can take snapshots only at the LUN level, so a snapshot captures not just a specific virtual machine but every virtual machine stored on the LUN. This makes snapshot and backup copy operations inefficient for virtual machine systems. The systems and techniques described herein address this problem by allowing snapshot and backup copy operations to be performed on specific virtual machines within shared storage.
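The difference between a LUN-level snapshot and a per-VM snapshot is ultimately one of scope. The toy Python model below, with a hypothetical `Lun` record and two illustrative scope functions, shows which virtual machines each approach would capture when only one VM is requested; it does not model how either snapshot is actually taken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lun:
    # Hypothetical model: one LUN backing a shared datastore that holds many VMs.
    lun_id: str
    vms: tuple


def _check(lun: Lun, vm: str) -> None:
    if vm not in lun.vms:
        raise ValueError(f"{vm} is not stored on {lun.lun_id}")


def lun_level_scope(lun: Lun, requested_vm: str) -> set:
    # A LUN-level snapshot cannot be narrowed: every VM on the LUN is captured.
    _check(lun, requested_vm)
    return set(lun.vms)


def per_vm_scope(lun: Lun, requested_vm: str) -> set:
    # With one volume per VM, only the requested VM's volume is captured.
    _check(lun, requested_vm)
    return {requested_vm}


lun = Lun("lun-0", tuple(f"vm-{i}" for i in range(500)))
print(len(lun_level_scope(lun, "vm-42")))  # 500
print(len(per_vm_scope(lun, "vm-42")))     # 1
```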

Detailed descriptions and examples of systems and methods according to one or more illustrative embodiments are available in the section entitled Selective Snapshot and Backup Copy Operations for Individual Virtual Machines within a Shared Store, in the section entitled Example Embodiments, and in FIGS. 4-6 herein. Additionally, components and functionality that enable selective snapshots and backup copies of individual virtual machines within shared storage can be configured and/or integrated into information management systems like those shown in FIGS. 1A-1H and 2A-2C.

Many of the embodiments described herein are inextricably tied to, enabled by, or would not exist without computer technology. The systems and techniques described herein for performing selective snapshots and backup copies of virtual machines within shared storage cannot be carried out by humans without the supporting computer technology.

Information Management System Overview

Given the importance of protecting and leveraging their data, organizations cannot afford to lose critical data. Runaway data growth and other modern realities make protecting and managing data increasingly difficult, so there is a pressing need for efficient, powerful, and user-friendly solutions for protecting and managing data and for smart, efficient data storage management. Depending on the size of the organization, there may be many data production sources under the control of many individuals. In the past, individuals were responsible for managing and protecting their own data, and a variety of hardware and software solutions may have been used within a single organization. These solutions were often provided by different vendors and lacked interoperability. Certain embodiments described herein address these and other limitations by implementing scalable, unified, organization-wide information management, including data storage management.

FIG. … depicts information management system 100 (“system 100”), which generally comprises combinations of hardware and software configured to protect and manage data and metadata generated and used by computing devices in system 100. In some embodiments, system 100 may be referred to as “storage management software” or a “storage management system.” System 100 performs information management operations, some of which may also be called “storage operations” or “data storage operations,” to protect and manage data stored in or managed by system 100. System 100 can be used by any organization, such as a company or other business entity, a non-profit organization, an educational institution, or a household.

Generally, the systems described herein may be compatible with and/or provide some or all of the functionality of one or more U.S. patents, patent publications, or patent applications assigned to Commvault Systems, Inc., each of which is hereby incorporated by reference in its entirety herein.

System 100 can include a variety of computing devices and computing technologies. For example, system 100 can include one or more client computing devices 102 and secondary storage computing devices 106, as well as storage manager 140 and a host computing device. Computing devices can include, without limitation, one or more of the following: personal computers, workstations, desktop computers, and other types of generally fixed computing systems, such as mainframe computers and servers. Other computing devices include portable or mobile computing devices, such as laptops, tablet computers, personal digital assistants, and mobile phones (e.g., smartphones), and other mobile or portable computing devices, such as embedded computers, set-top boxes, and vehicle-mounted devices. Servers can include mail servers, file servers, database servers, and so on. A computing device can include one or more processors (e.g., CPUs, which may be single-core or multi-core) and non-transitory computer memory (e.g., random-access memory, RAM) for storing computer programs to be executed by the processors. Other computer memory for mass storage may be included in the computing device's package or configuration (e.g., an internal hard disk) or may be accessible externally (e.g., network-attached storage or a storage array). In some cases, a computing device includes cloud computing resources, which can be used to create virtual machines; for example, a third-party cloud service provider may provide one or more virtual machines to an organization.

In some cases, a computing device includes one or more virtual machines running on a physical host computing device (or “host machine”), which the organization may use. For example, an organization might use one virtual machine as a database server and another as a mail server, with both virtual machines running on the same host machine. A virtual machine (“VM”) is a software implementation of a computer that does not exist physically; instead, it is instantiated in the operating system of a physical computer (or host machine) so that applications can run within the VM's environment. A VM includes an operating system and associated virtual resources, such as computer memory and processors. A hypervisor creates and runs VMs and sits between the VM and the hardware of the physical host machine. Hypervisors are also known in the art as virtual machine monitors, virtual machine managers, or “VMMs,” and can be implemented in software, firmware, and/or specialized hardware on the host machine. Examples of hypervisors include ESX Server by VMware, Inc. of Palo Alto, Calif.; Microsoft Virtual Server and Microsoft Windows Server Hyper-V, both by Microsoft Corporation of Redmond, Wash.; Sun xVM by Oracle America Inc. of Santa Clara, Calif.; and Xen by Citrix Systems, Santa Clara, Calif. The hypervisor provides each virtual operating system with resources such as a virtual processor and virtual memory. Each virtual machine has one or more associated virtual disks, and the hypervisor stores the data of those virtual disks in files on the file system of the physical host machine. These files are called virtual machine disk files (“VMDK” files, in VMware terminology) or virtual hard disk image files (in Microsoft terminology). For example, VMware's ESX Server provides the Virtual Machine File System (VMFS) for storing and managing virtual machine disk files.
Virtual machines read data from and write data to their virtual disks in the same way that physical machines read from and write to physical disks. Some techniques for information management in cloud computing environments are described in U.S. Pat. No. 8,285,681, and some techniques for information management in virtualized computing environments are described in U.S. Pat. No. 8,307,177.

Information management system 100 can also include electronic data storage devices, generally used for mass storage of data, such as primary storage devices 104 and secondary storage devices 108. Storage devices can be of any suitable type, including disk drives, storage arrays, network-attached storage (NAS) technology, semiconductor memory (e.g., solid state storage devices), tape libraries or other magnetic, non-tape storage devices, optical media storage devices, and DNA/RNA-based memory technology. Some storage devices form part of a distributed data system. In some cases, storage devices are provided in a cloud storage environment (e.g., a private cloud or one operated by a third-party vendor), whether for primary data, secondary copies, or both.

Depending on the context, the term “information management system” can refer to all of the hardware and software components shown in FIG. 1C, or to a subset of them. For example, in some cases system 100 refers to a group of components that protect, manage, manipulate, and/or process data generated by client computing devices 102. System 100 does not necessarily include the components that generate and/or store primary data 112, such as client computing devices 102 and primary storage devices 104, and may likewise exclude secondary storage devices 108, such as a third-party cloud storage environment. Thus, an “information management system” or “storage management system” may sometimes refer to one or more of the following components, which are described further below: storage manager, data agent, and media agent.

System 100 may include one or more client computing devices 102. Each client computing device 102 has an operating system, at least one application 110, and one or more accompanying data agents, and is associated with one or more primary storage devices 104 that store primary data 112. Client computing device(s) 102 and primary storage device(s) 104 may together be referred to as primary storage subsystem 117.

Client Computing Devices, Clients, and Subclients

Within an organization, data must be protected and managed from a variety of sources. For example, a corporate environment can include employee workstations as well as company servers, such as a mail server, web server, database server, transaction server, and the like. In system 100, the data generation sources are one or more client computing devices 102. A computing device with a data agent 142 installed and running on it is referred to as a client computing device 102 and can be any type of computing device. One or more client computing devices 102 can be associated with users and/or user accounts.

A “client” is a logical component of information management system 100, which may represent a logical grouping of one or more data agents installed on a client computing device 102. Storage manager 140 recognizes a client as a component of system 100 and, in some embodiments, may automatically create a client component when a data agent 142 is installed on a client computing device 102. Because data generated by executable components 110 is tracked by the associated data agent 142 so that it can be properly protected in system 100, a client may be said to generate data and to store the generated data to primary storage, such as primary storage device 104. The terms “client” and “client computing device” are not interchangeable. Neither term implies that a client computing device 102 is necessarily configured in a client/server relationship with other computing devices, such as a mail server, or that a client computing device 102 cannot itself be a server. For example, a client computing device 102 can be a file server, a database server, or a virtual machine server.

Each client computing device 102 may run one or more applications 110 that generate and manipulate the data to be protected from loss and managed by system 100. Applications 110 generally support the operations of an organization and can include, for example, file system applications, mail server applications (e.g., Microsoft Exchange Server), mail client applications (e.g., Microsoft Exchange Client), and other applications 110 that generate and manipulate data. Each application 110 may be accompanied by an application-specific data agent 142, although not all data agents 142 are application-specific. An application 110 may include a file system, such as Microsoft Windows Explorer, and may be accompanied by its own data agent 142. Client computing devices 102 can have at least one operating system (e.g., Microsoft Windows, Mac OS X, iOS, IBM z/OS, Linux, or another Unix-based operating system) installed on them, and may have one or more file system or other applications 110 installed. In some embodiments, a virtual machine running on a client computing device 102 may be considered an application 110 and may be accompanied by a data agent 142 (e.g., a virtual server data agent).

Client computing devices 102 and other components of system 100 can be connected to one another via one or more electronic communication pathways 114. For example, a first communication pathway 114 can communicatively couple client computing device 102 and secondary storage computing device 106; a second communication pathway 114 can communicatively couple storage manager 140 and client computing device 102; and a third communication pathway 114 can communicatively couple storage manager 140 and secondary storage computing device 106 (see, e.g., FIG. 1A and FIG. 1C). In some cases, communication pathways 114 also include application programming interfaces (APIs), such as cloud service provider APIs and virtual machine management APIs. The infrastructure underlying communication pathways 114 can be wired and/or wireless, analog and/or digital, or any combination of these facilities.

A “subclient” is a logical grouping of all or part of a client's primary data 112. A subclient may be defined according to how the subclient data is to be protected in system 100; for example, a subclient may be associated with a particular storage policy. A client may have several subclients, each associated with a different storage policy. For example, some files may form a first subclient that requires compression and deduplication and is associated with a first storage policy, while other files form a second subclient that requires a different retention schedule and encryption and is associated with a different storage policy. Thus, although the primary data may be generated by the same application 110 and belong to a single client, different subclients can be treated differently. Subclients and storage policies are described in more detail below.
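To make the subclient idea concrete, the Python sketch below groups a client's files into subclients by path pattern and attaches a storage policy to each. The `StoragePolicy` fields, the glob-pattern membership rule, and the first-match assignment are assumptions chosen for illustration; the text does not specify how subclient membership is defined.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class StoragePolicy:
    # Hypothetical policy attributes; real storage policies carry much more.
    name: str
    compress: bool = False
    deduplicate: bool = False
    encrypt: bool = False
    retention_days: int = 30


@dataclass(frozen=True)
class Subclient:
    name: str
    patterns: tuple  # glob patterns selecting part of the client's primary data
    policy: StoragePolicy


def assign_subclients(paths, subclients) -> dict:
    """Map each path to the first subclient whose pattern matches (assumed rule)."""
    assignment = {}
    for path in paths:
        for subclient in subclients:
            if any(fnmatchcase(path, pattern) for pattern in subclient.patterns):
                assignment[path] = subclient.name
                break
    return assignment


first = StoragePolicy("policy-1", compress=True, deduplicate=True)
second = StoragePolicy("policy-2", encrypt=True, retention_days=365)
subclients = [
    Subclient("documents", ("*.docx",), first),
    Subclient("finance", ("/finance/*",), second),
]
paths = ["/home/a/report.docx", "/finance/q1.xlsx", "/tmp/scratch.log"]
print(assign_subclients(paths, subclients))
# {'/home/a/report.docx': 'documents', '/finance/q1.xlsx': 'finance'}
```

Both subclients belong to the same client, yet their files would be protected under different compression, encryption, and retention settings.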

Primary Data and Exemplary Primary Storage Devices

Primary data 112 is generally production data or “live” data generated by the operating system and/or applications 110 running on client computing device 102. Primary data 112 is generally stored on primary storage device(s) 104 and organized via a file system running on client computing device 102. Client computing device(s) 102 and their applications 110 can create, access, modify, delete, or otherwise use primary data 112. Primary data 112 is generally in the native format of the source application 110; it is the initial or first stored body of data generated by the source application 110, and in some cases is created substantially directly from data generated by that application. It can be useful in performing certain tasks to organize primary data 112 into units of different granularity. Primary data 112 can include files, directories, file system volumes, data blocks, extents, or any other hierarchies or organizations of data objects. As used herein, a “data object” can refer to (i) any file that is currently addressable or was previously addressable (e.g., an archive file), and/or (ii) a subset of such a file (e.g., a data block, an extent, etc.). Primary data 112 can include structured data (e.g., database files), unstructured data (e.g., documents), and/or semi-structured data. See, e.g., FIG. 1B.

It can also be useful in performing certain functions of system 100 to access and modify metadata within primary data 112. Metadata generally includes information about data objects and/or characteristics associated with the data objects. References to primary data 112 herein generally also include its associated metadata, but references to metadata generally do not include the primary data. Metadata can include, for example: the name of the data owner (e.g., the client or user that generated the data object); the last-modified time (e.g., the date and time at which the data object was last modified); the data object size (e.g., the number of bytes of data); information about the content (e.g., an indication of whether a particular search term appears); user-supplied tags; to/from information for email (e.g., an email sender, recipient, etc.); the creation date; the file type (e.g., format or application type); the last-accessed time; the application type (e.g., the type of application that generated the data object); location/network information (e.g., a current, past, or future location of the data object and network pathways to/from it); permissions, users, groups, and access control lists (ACLs); system metadata; registry information; and so on. Some applications 110 and/or other components of system 100 also maintain indices of metadata for data objects, e.g., metadata associated with individual email messages. The use of metadata to perform classification and other functions is described in greater detail below.
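The metadata fields and metadata indices described above can be modeled with a small record type and an inverted index. The field names, the `MetadataIndex` class, and the tag-based lookup are hypothetical and cover only a subset of the listed metadata; the point is that the index answers queries without reading the data objects themselves.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ObjectMetadata:
    # Illustrative subset of the metadata fields listed in the text.
    name: str
    owner: str
    size_bytes: int
    file_type: str
    last_modified: datetime
    tags: set = field(default_factory=set)  # user-supplied tags


class MetadataIndex:
    """Inverted index from tag to object names, kept apart from the data objects."""

    def __init__(self):
        self._objects = {}
        self._by_tag = defaultdict(set)

    def add(self, meta: ObjectMetadata) -> None:
        self._objects[meta.name] = meta
        for tag in meta.tags:
            self._by_tag[tag].add(meta.name)

    def find_by_tag(self, tag: str) -> list:
        return sorted(self._by_tag.get(tag, set()))

    def total_size(self, owner: str) -> int:
        return sum(m.size_bytes for m in self._objects.values() if m.owner == owner)


index = MetadataIndex()
index.add(ObjectMetadata("contract.pdf", "alice", 52_000, "pdf", datetime(2024, 1, 5), {"legal"}))
index.add(ObjectMetadata("q1.xlsx", "bob", 8_000, "xlsx", datetime(2024, 3, 1), {"finance", "legal"}))
print(index.find_by_tag("legal"))  # ['contract.pdf', 'q1.xlsx']
print(index.total_size("alice"))   # 52000
```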

Primary storage devices 104 storing primary data 112 may use relatively fast and/or expensive technology (e.g., flash storage, a hard-disk array, solid state memory, or other disk storage), typically to support high-performance live production environments. Primary data 112 can be highly changeable and/or intended for relatively short-term retention (e.g., hours, days, or weeks). Client computing device 102 can access primary data 112 stored on primary storage device 104 by making conventional file system calls via the operating system. Each client computing device 102 is generally associated with and/or in communication with one or more primary storage devices 104 that store the corresponding primary data 112. A client computing device 102 is said to be associated with or in communication with a particular primary storage device 104 if it is capable of one or more of the following: retrieving data from primary storage device 104, coordinating the retrieval of data from primary storage device 104, routing and/or storing data to primary storage device 104, and modifying or deleting data in primary storage device 104. Thus, a client computing device 102 may be said to access data stored in an associated primary storage device 104.

Primary storage devices 104 can be dedicated or shared. In some cases, each primary storage device 104 is dedicated to an associated client computing device 102. In other cases, one or more primary storage devices 104 are shared by multiple client computing devices 102, e.g., via a local network, in a cloud storage implementation, etc. As one example, primary storage device 104 can be a storage array, such as EMC Clariion or EMC Symmetrix, shared by a group of client computing devices 102.

System 100 may also include hosted services (not shown), which may be hosted by an entity other than the organization that employs the other components of system 100. For example, the hosted services may be provided by online service providers. Such service providers can provide social networking services, hosted email services, hosted productivity applications, or other hosted applications, such as software-as-a-service (SaaS), platform-as-a-service (PaaS), application service providers (ASPs), cloud services, or other mechanisms for delivering functionality via a network. As it serves users, each hosted service may generate additional data and metadata, which system 100 may manage, e.g., as primary data 112. In some cases, the hosted services may be accessed using one of the applications 110; for example, a hosted mail service may be accessed via a web browser running on a client computing device 102.

Secondary Copies and Exemplary Secondary Storage Devices

In some cases, primary data 112 stored on primary storage devices 104 may be compromised, for example when an employee deletes or accidentally overwrites primary data 112. Primary storage devices 104 may also be lost, damaged, or corrupted. It is therefore important to keep copies of primary data 112 for recovery and/or to comply with regulatory requirements. Accordingly, system 100 includes one or more secondary storage computing devices 106 and one or more secondary storage devices 108 configured to create and store secondary copies 116 of primary data 112, including its associated metadata. Secondary storage computing devices 106 and secondary storage devices 108 may together be referred to as secondary storage subsystem 118.

Secondary copies 116 can support analysis and search efforts and other information management goals, such as: restoring data and/or metadata after a disaster, deletion, corruption, or loss; allowing point-in-time recovery; complying with regulatory data retention and electronic discovery (e-discovery) requirements; facilitating the organization and search of data; improving user access to data files across multiple computing devices and/or hosted services; and implementing data retention and pruning policies.

A secondary copy 116 can be a separate stored copy of data derived from one or more earlier-created copies (e.g., derived from primary data 112 or from another secondary copy 116). Secondary copies 116 can include point-in-time data and may be kept for a relatively long period before the data is moved to other storage or discarded. A secondary copy 116 may be stored on a separate storage device from other previously stored copies. In some cases, a secondary copy 116 is stored on the same storage device as primary data 112; for example, a disk array capable of performing hardware snapshots can store primary data 112 and also create and store hardware snapshots of it as secondary copies 116. Secondary copies 116 may also be stored on slower and/or less expensive storage (e.g., magnetic tape). A secondary copy 116 may be kept in a backup or archive format, or in another format that differs from the native application format of primary data 112.

Secondary storage computing devices 106 may index secondary copies 116 (e.g., using a media agent 144), enabling users to browse and restore them later and facilitating lifecycle management of the indexed data. After a secondary copy 116 representing certain primary data 112 is created, a pointer or other location indicator (e.g., a stub) may be placed in primary data 112 to indicate the current location of that secondary copy 116. Because a data object in primary data 112 can change over time, system 100 may create and manage multiple secondary copies 116 of a particular data object or its metadata, each copy representing the state of the data object at a particular point in time. Moreover, system 100 may continue to manage point-in-time representations of a data object even after the instance of that data object in primary data 112 has been deleted from primary storage device 104 and its file system. For virtual machines, the operating system and other applications 110 of client computing devices 102 may execute under virtualization software, and primary storage device(s) 104 may include a virtual disk created on a physical storage device. System 100 may create secondary copies 116 of the files or other data objects within a virtual disk file and/or secondary copies 116 of the entire virtual disk file itself (e.g., of an entire .vmdk file).
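The interplay between stubs and point-in-time copies can be sketched in a few lines of Python. The `Stub` record, the `SecondaryStore` class, the `path@timestamp` copy identifiers, and the assumption that copies are recorded in increasing timestamp order are all hypothetical; the sketch only shows that each copy preserves one point-in-time state and that those states remain available after the primary instance is replaced by a stub and later deleted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stub:
    # Hypothetical location indicator left in primary data after a copy is made.
    copy_id: str


class SecondaryStore:
    """Keeps every point-in-time copy of a path, independent of primary data."""

    def __init__(self):
        self.copies = {}    # copy_id -> content
        self.versions = {}  # path -> list of (timestamp, copy_id), oldest first

    def copy(self, path: str, content: bytes, ts: int) -> str:
        # Assumes copies of a given path are recorded in increasing ts order.
        copy_id = f"{path}@{ts}"
        self.copies[copy_id] = content
        self.versions.setdefault(path, []).append((ts, copy_id))
        return copy_id

    def as_of(self, path: str, ts: int):
        # Return the latest copy taken at or before ts, or None if there is none.
        earlier = [cid for t, cid in self.versions.get(path, []) if t <= ts]
        return self.copies[earlier[-1]] if earlier else None


primary = {"/docs/a.txt": b"v1"}  # primary data: path -> content (or a Stub)
secondary = SecondaryStore()
secondary.copy("/docs/a.txt", primary["/docs/a.txt"], ts=1)
primary["/docs/a.txt"] = b"v2"
latest = secondary.copy("/docs/a.txt", primary["/docs/a.txt"], ts=2)
primary["/docs/a.txt"] = Stub(latest)  # replace content with a pointer to the copy
del primary["/docs/a.txt"]             # later deleted from primary storage
print(secondary.as_of("/docs/a.txt", 1))  # b'v1'
print(secondary.as_of("/docs/a.txt", 2))  # b'v2'
```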

Secondary copies 116 are distinguishable from the corresponding primary data 112. First, applications 110 and client computing device 102 may be unable to use secondary copies 116 directly (e.g., because they are kept in a backup or archive format) without processing or other intervention by system 100, such as a restore operation. Secondary copies 116 may have been processed by data agent 142 and/or media agent 144 during their creation (e.g., compression, deduplication, encryption, integrity markers, indexing, formatting, application-aware metadata, etc.), so a secondary copy 116 may represent source primary data 112 without necessarily being identical to it.
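The processing described above is why a secondary copy 116 is not byte-for-byte identical to its source and must be restored before use. The Python sketch below applies two of the listed steps, compression and an integrity marker, using only the standard library; the header layout and the function names `make_secondary_copy` and `restore` are hypothetical, not an actual backup format.

```python
import hashlib
import json
import zlib


def make_secondary_copy(data: bytes) -> bytes:
    # Compress the data and prefix a small header carrying an integrity marker (SHA-256).
    header = json.dumps({"codec": "zlib", "sha256": hashlib.sha256(data).hexdigest()}).encode()
    return len(header).to_bytes(4, "big") + header + zlib.compress(data)


def restore(secondary_copy: bytes) -> bytes:
    # Undo the processing and verify the integrity marker before returning the data.
    header_len = int.from_bytes(secondary_copy[:4], "big")
    header = json.loads(secondary_copy[4 : 4 + header_len])
    data = zlib.decompress(secondary_copy[4 + header_len :])
    if hashlib.sha256(data).hexdigest() != header["sha256"]:
        raise ValueError("integrity check failed")
    return data


primary = b"quarterly report " * 100
copy116 = make_secondary_copy(primary)
print(copy116 == primary, len(copy116) < len(primary))  # False True
print(restore(copy116) == primary)                      # True
```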

Second, secondary copies 116 may be stored on a secondary storage device 108 that is inaccessible to application 110 running on client computing device 102 and/or to a hosted service. Some secondary copies 116 may be “offline copies,” in that they are not readily available (e.g., not mounted to a tape or disk). Offline copies can include copies of data that system 100 can access without human intervention (e.g., tapes within an automated tape library but not yet mounted in a drive) and copies that system 100 can access only with some human intervention (e.g., tapes stored at an offsite location).
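The distinction between the two kinds of offline copies can be expressed as a simple classification. In this Python sketch, the `Availability` enum, the two boolean inputs, and the `classify` rule are hypothetical simplifications of the examples in the text (a mounted medium, a tape in an automated library, and a tape stored offsite).

```python
from enum import Enum


class Availability(Enum):
    ONLINE = "online"                         # e.g., mounted disk
    OFFLINE_AUTOMATED = "offline, automated"  # e.g., tape in a library, not yet mounted
    OFFLINE_MANUAL = "offline, human needed"  # e.g., tape at an offsite location


def classify(mounted: bool, in_automated_library: bool) -> Availability:
    # Hypothetical rule mirroring the distinction drawn in the text.
    if mounted:
        return Availability.ONLINE
    if in_automated_library:
        return Availability.OFFLINE_AUTOMATED
    return Availability.OFFLINE_MANUAL


def requires_human(availability: Availability) -> bool:
    return availability is Availability.OFFLINE_MANUAL


print(classify(mounted=False, in_automated_library=True).value)             # offline, automated
print(requires_human(classify(mounted=False, in_automated_library=False)))  # True
```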

Creating Secondary Copies Using Intermediate Devices: Secondary Storage Computing Devices

Creating secondary copies 116 can be challenging when hundreds of client computing devices 102 continually generate large volumes of primary data 112 that must be protected. Creating secondary copies 116 also involves significant overhead, and accessing secondary storage devices 108 requires specialized programmed intelligence and/or hardware capabilities. Client computing devices 102 could interact directly with secondary storage devices 108 to create secondary copies 116, but given the factors described above, this approach can degrade the ability of client computing devices 102 to serve applications 110 and produce primary data 112. Further, client computing devices 102 may not be optimized for interacting with certain secondary storage devices 108.

System 100 may therefore include one or more software and/or hardware components that act as intermediaries between client computing devices 102 (which generate primary data 112) and secondary storage devices 108 (which store secondary copies 116). Besides offloading certain responsibilities from client computing devices 102, these intermediate components offer other benefits. For example, as discussed below with respect to FIG. 1D, distributing some of the work involved in creating secondary copies 116 can improve system performance and scalability. Using specialized secondary storage computing devices 106 and media agents 144 to interface with secondary storage devices 108 and/or to perform certain data processing operations can greatly increase the speed at which system 100 performs information management operations and its capacity to handle large numbers of them, while reducing the computational load on client computing devices 102. The intermediate components can include one or more secondary storage computing devices 106, as shown in FIG. 1A, and/or one or more media agents 144. Media agents are discussed further below (e.g., with respect to FIGS. 1C-1E). These components comprise specialized programmed intelligence and/or hardware that enable system 100 to write to, read from, instruct, communicate with, or otherwise interact with secondary storage devices 108.

Secondary storage computing device(s) 106 can include any of the computing devices described above. In some cases, secondary storage computing device(s) 106 also include specialized hardware components and/or programmed intelligence (e.g., specialized interfaces) for interacting with certain secondary storage device(s) 108 with which they may be specially associated.

To create secondary copies 116, which involves copying data from primary storage subsystem 117 to secondary storage subsystem 118, client computing device 102 may transmit primary data 112 (or a processed version of it generated by data agent 142) over communication pathway 114 to the designated secondary storage computing device 106. Secondary storage computing device 106 may then further process the data and transmit it to secondary storage device 108. One or more secondary copies 116 can also be created from an existing secondary copy 116; this is known as an auxiliary copy operation.
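The data path just described can be sketched in Python. This is a minimal, hypothetical illustration rather than the system's implementation: the names `SecondaryStorageDevice`, `SecondaryStorageComputingDevice`, `create_secondary_copy`, `auxiliary_copy`, and `restore` are invented for the example, in-memory dictionaries stand in for storage devices, and zlib compression stands in for whatever processing the intermediary actually performs.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class SecondaryStorageDevice:
    """Hypothetical stand-in for a secondary storage device 108."""
    name: str
    copies: dict = field(default_factory=dict)  # copy_id -> stored bytes


@dataclass
class SecondaryStorageComputingDevice:
    """Hypothetical stand-in for device 106: receives, processes, and writes data."""
    device: SecondaryStorageDevice

    def store(self, copy_id: str, payload: bytes) -> None:
        # Processing (here, compression) runs on the intermediary,
        # not on the client computing device.
        self.device.copies[copy_id] = zlib.compress(payload)


def create_secondary_copy(primary_data: bytes,
                          target: SecondaryStorageComputingDevice,
                          copy_id: str) -> None:
    """Client side: send primary data to the designated intermediary."""
    target.store(copy_id, primary_data)


def auxiliary_copy(source: SecondaryStorageDevice, dest: SecondaryStorageDevice,
                   copy_id: str, new_id: str) -> None:
    """Make a further secondary copy from an existing one; primary data is not reread."""
    dest.copies[new_id] = source.copies[copy_id]


def restore(device: SecondaryStorageDevice, copy_id: str) -> bytes:
    """Undo the intermediary's processing to recover the original bytes."""
    return zlib.decompress(device.copies[copy_id])


disk = SecondaryStorageDevice("disk-library")
tape = SecondaryStorageDevice("tape-library")
intermediary = SecondaryStorageComputingDevice(disk)
create_secondary_copy(b"db pages" * 100, intermediary, "job-1")
auxiliary_copy(disk, tape, "job-1", "job-1-aux")
```

Because the auxiliary copy is made from the existing secondary copy, the client computing device does no further work for it.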

“Exemplary Secondary Data and Exemplary Primary Data”

In the exemplary arrangement of primary and secondary data illustrated in the figures, the primary data 112 objects stored on primary storage device(s) 104 include word processing documents 119A-B, spreadsheets 120, presentation documents 122, video files 124, image files 126, email mailboxes 128 (and corresponding email messages 129A-C), HTML/XML files 130, and databases 132 with corresponding tables or other data structures 133A-133C. Some or all primary data 112 objects may be associated with corresponding metadata (e.g., “Meta1-11”), which may include file system metadata and/or application-specific metadata. Secondary copy 116 data objects 134A-C are stored on secondary storage device(s) 108. These secondary data objects 134A-C may contain copies of, or otherwise represent, corresponding primary data 112.

Secondary copy data objects 134A-C can each represent more than one primary data object. For example, secondary copy data object 134A represents three separate primary data objects 133C, 122, and 129C (represented as 133C′, 122′, and 129C′, respectively, and accompanied by corresponding metadata Meta11, Meta3, and Meta8, respectively). As indicated by the prime mark (′), secondary storage computing devices 106 or other components of secondary storage subsystem 118 may process data received from primary storage subsystem 117 and store a secondary copy that includes a transformed and/or supplemented representation of primary data objects and/or metadata, in a format that differs from the original, such as a compressed, encrypted, or deduplicated format. For example, secondary storage computing devices 106 can generate new metadata or other information based on this processing and store it along with the secondary copies. Secondary copy data object 134B represents primary data objects 120, 133B, and 119A as 120′, 133B′, and 119A′, respectively, accompanied by corresponding metadata Meta2, Meta10, and Meta1, respectively. Secondary copy data object 134C represents primary data objects 133A and 119B as 133A′ and 119B′, respectively, accompanied by corresponding metadata Meta9 and Meta5, respectively.
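The relationship between a secondary copy data object and the primary objects it represents can be modeled as a small data structure. The sketch below is hypothetical: `PrimaryObject`, `SecondaryCopyObject`, and `build_secondary_object` are illustrative names, zlib compression stands in for the transformation the prime mark denotes, and an ASCII apostrophe stands in for the prime mark in the dictionary keys.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryObject:
    object_id: str   # e.g. "133C" for a database table
    content: bytes
    metadata: str    # e.g. "Meta11"


@dataclass
class SecondaryCopyObject:
    object_id: str   # e.g. "134A"
    members: dict    # primed id -> (transformed bytes, metadata)


def build_secondary_object(object_id, primaries):
    """Pack several primary objects, each transformed, into one secondary object."""
    members = {}
    for p in primaries:
        # "'" marks a transformed representation (here: compressed bytes),
        # and each member keeps its own metadata alongside it.
        members[p.object_id + "'"] = (zlib.compress(p.content), p.metadata)
    return SecondaryCopyObject(object_id, members)


obj_134a = build_secondary_object("134A", [
    PrimaryObject("133C", b"table rows", "Meta11"),
    PrimaryObject("122", b"slides", "Meta3"),
    PrimaryObject("129C", b"email body", "Meta8"),
])
```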

“Exemplary Information Management System Architecture”

System 100 can include a wide variety of hardware and software components, which can be organized in different ways depending on the embodiment. Design choices about the functional responsibilities and roles of the components in system 100 matter, because they can significantly affect how well system 100 adapts to data growth and other changing circumstances. FIG. 1C illustrates a system 100 designed according to these considerations. It includes storage manager 140; one or more data agents 142, which execute on client computing devices 102 and are configured to process primary data 112; and one or more media agents 144, which execute on one or more secondary storage computing devices 106 to perform tasks involving secondary storage devices 108.

“Storage Manager”

Storage manager 140 is a centralized storage and/or information manager that can perform control functions and store important information about system 100. Because system 100 may include a large number of components and manage a large amount of data, managing those components and that data is a complex task, and it becomes more difficult as the organization grows in size and in the amount of data it holds. According to some embodiments, storage manager 140 is therefore responsible for managing system 100, or at least a substantial portion of it. Storage manager 140 can be modified independently to adapt to changing conditions, without replacing or redesigning the rest of the system. The computing device that hosts and/or operates as storage manager 140 can be selected to best serve its functions and networking requirements. These and other benefits are described further below with reference to FIG. 1D.

Storage manager 140 may be a software module or other application hosted on a suitable computing device. In some embodiments, storage manager 140 is itself a computing device that performs some of the functions described in this document. Depending on the configuration, storage manager 140 operates in conjunction with one or more associated data structures, such as management database 146. Storage manager 140 initiates, performs, coordinates, and/or controls storage operations and other information management operations, including those that protect and control primary data 112 and secondary copies 116. In general, storage manager 140 manages system 100, which includes communicating with, instructing, and, under certain circumstances, controlling components such as data agents 142 and media agents 144.

As indicated by the dashed arrowed lines 114 in FIG. 1C, storage manager 140 can communicate with, instruct, and/or control some or all elements of system 100, such as data agents 142 and media agents 144, and thereby manages the operation of the various hardware and software components in system 100. In some embodiments, control information originates from storage manager 140, while the managed components report status and index information back to storage manager 140. Payload data and metadata, by contrast, are generally communicated between data agents 142 and media agents 144 (or, more generally, between client computing device(s) 102 and secondary storage computing device(s) 106), as directed by and under the supervision of storage manager 140. Control information can include parameters and instructions for carrying out information management operations, such as timing information specifying when to begin an operation, instructions specifying how to perform a task, and data path information specifying which components to access or communicate with to complete an operation. In other embodiments, some information management operations are controlled or initiated by other components of system 100, such as media agents 144 or data agents 142, instead of or in combination with storage manager 140.
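The split between control information and payload data can be made concrete with two message types. This is a hypothetical sketch, not a documented message format: `ControlInfo`, `StatusReport`, `control_messages`, `job_complete`, and the component names are assumptions chosen to mirror the fields the paragraph lists (timing, instructions, data path) and the status reports flowing back to the storage manager.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ControlInfo:
    """Hypothetical control message issued by storage manager 140."""
    operation: str       # what to do, e.g. "backup"
    start_at: datetime   # timing information: when to begin
    instructions: tuple  # how to perform the task, e.g. ("compress",)
    data_path: tuple     # which components take part, in order


@dataclass(frozen=True)
class StatusReport:
    """Hypothetical status/index report sent back by a managed component."""
    component: str
    job_id: str
    state: str
    items_indexed: int = 0


def control_messages(job_id, control):
    """One control message per component on the data path.

    No payload bytes appear here: payload flows directly along
    control.data_path (data agent -> media agent -> storage device).
    """
    return [(component, job_id, control) for component in control.data_path]


def job_complete(reports, control):
    """Treat a job as done once every component on the path reports success."""
    done = {r.component for r in reports if r.state == "succeeded"}
    return done == set(control.data_path)


ctl = ControlInfo("backup", datetime(2024, 1, 1, 1, 0), ("compress",),
                  ("data-agent@client-a", "media-agent-1"))
msgs = control_messages("job-7", ctl)
reports = [StatusReport("data-agent@client-a", "job-7", "succeeded"),
           StatusReport("media-agent-1", "job-7", "succeeded", items_indexed=42)]
```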

According to certain embodiments, storage manager 140 provides one or more of the functions described below.

Storage manager 140 can maintain an associated database 146 (or “storage manager database 146” or “management database 146”) of management-related data and information management policies 148. Database 146 is stored in computer memory accessible by storage manager 140. Database 146 may include a management index 150 (or “index 150”) or other data structures that store data about management tasks, media containerization, other useful data, and/or any combination thereof. For example, storage manager 140 may use index 150 to track logical associations between media agents 144 and secondary storage devices 108 and/or the movement of data to and from secondary storage devices 108. Index 150 may, for instance, store data associating a client computing device 102 with a particular media agent 144 and/or secondary storage device 108, as specified in an information management policy 148.

Administrators and other users may be able to configure and initiate certain information management operations individually. While this may be acceptable for some recovery operations or other infrequently performed tasks, it is often impractical for ongoing, organization-wide data protection and management. System 100 can therefore use information management policies 148 to automate information management operations. An information management policy 148 can include a stored data structure or other information source that specifies parameters (e.g., criteria and rules) for storage management or other information management operations. Storage manager 140 can process information management policies 148 and/or index 150 and, based on the results, identify an information management operation to perform, identify the components of system 100 involved in it, establish connections to and/or between those components, and/or instruct and control those components to carry out the operation. In this way, system 100 translates stored information into coordinated activity among its computing devices.
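Policy-driven component selection can be pictured as a lookup from a client, through the management index, to a policy that names the participating components. The sketch below is hypothetical and heavily simplified: `InformationManagementPolicy`, `INDEX_150`, `POLICIES`, `resolve_operation`, and every client, media agent, and device name are invented for illustration; a real policy 148 carries many more parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationManagementPolicy:
    """Hypothetical, simplified policy 148: where data goes and how long it is kept."""
    name: str
    media_agent: str
    storage_device: str
    retention_days: int


# Hypothetical management index 150: which policy governs each client.
INDEX_150 = {"client-a": "db-daily", "client-b": "files-weekly"}

POLICIES = {
    "db-daily": InformationManagementPolicy("db-daily", "ma-1", "disk-lib-1", 30),
    "files-weekly": InformationManagementPolicy("files-weekly", "ma-2", "tape-lib-1", 365),
}


def resolve_operation(client):
    """Identify the components involved in protecting `client`, per its policy."""
    policy = POLICIES[INDEX_150[client]]
    return {
        "data_agent_host": client,
        "media_agent": policy.media_agent,
        "secondary_storage_device": policy.storage_device,
        "retention_days": policy.retention_days,
    }
```

Keeping the association in data rather than code is the point the paragraph makes: changing a policy entry reroutes future operations without touching any component.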

Management database 146 may maintain information management policies 148 and associated data, although information management policies 148 may also be stored elsewhere in computer storage. For example, depending on the embodiment, an information management policy 148 such as a storage policy may be stored as metadata in a media agent database 152 or on a secondary storage device 108 (e.g., as an archive copy). Information management policies 148 are described further below. In certain embodiments, management database 146 comprises a relational database (e.g., an SQL database) for tracking metadata, such as metadata associated with secondary copy operations (e.g., which client computing devices 102 and subclients were protected, where the secondary copies are stored, and who performed the storage operation). This and other metadata may also be stored elsewhere, such as at secondary storage computing devices 106 or on secondary storage devices 108, which in some cases allows data recovery without storage manager 140. Management database 146 may also contain data needed to initiate secondary copy operations (e.g., storage policies, schedule policies, etc.), status and reporting information about completed jobs (e.g., status and error reports on yesterday's backup jobs), and additional information sufficient to enable restore and disaster recovery operations (e.g., media agent associations, location indexing, content indexing, etc.).
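A relational table is a natural fit for the job metadata just described. The schema below is an assumption made for illustration, not the product's actual schema, and Python's built-in `sqlite3` module stands in for an SQL database server. It shows the two uses the paragraph mentions: an error report for a run of jobs, and a lookup of where a subclient's copy lives for a restore.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        job_id      INTEGER PRIMARY KEY,
        client      TEXT NOT NULL,
        subclient   TEXT NOT NULL,
        media_agent TEXT NOT NULL,
        location    TEXT NOT NULL,  -- where the secondary copy is stored
        status      TEXT NOT NULL   -- e.g. 'completed', 'failed'
    )
""")
conn.executemany(
    "INSERT INTO jobs (client, subclient, media_agent, location, status) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("client-a", "sql-db", "ma-1", "disk-lib-1/vol3", "completed"),
        ("client-b", "home-dirs", "ma-2", "tape-lib-1/T0042", "failed"),
    ],
)

# Status/error report: which protected data failed in this run.
failed = conn.execute(
    "SELECT client, subclient FROM jobs WHERE status = 'failed'"
).fetchall()


def locate(subclient):
    """Restore lookup: which media agent and location hold a completed copy."""
    return conn.execute(
        "SELECT media_agent, location FROM jobs "
        "WHERE subclient = ? AND status = 'completed'",
        (subclient,),
    ).fetchone()
```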

Storage manager 140 may include a jobs agent 156, a user interface 158, and a management agent 154, all of which may be implemented as interconnected software modules or application programs. These components are described below.

In some embodiments, jobs agent 156 initiates, controls, and/or monitors some or all information management operations that have been performed, are being performed, or are scheduled to be performed in system 100. A job is a logical grouping of information management operations, such as daily storage operations scheduled for a certain set of subclients (e.g., generating incremental block-level backup copies 116 at a certain time every day for certain database files in a particular geographical location). Jobs agent 156 can access information management policies 148 (e.g., in management database 146) to determine when, where, and how to initiate and control jobs in system 100.
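A jobs agent deciding which scheduled jobs are due can be sketched as a check of daily start times against the interval since its last check. This is a hypothetical simplification: `SchedulePolicy`, `due_jobs`, and the subclient names are invented, and the sketch assumes the agent checks at least once a day, since it only considers start times on the current date.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class SchedulePolicy:
    """Hypothetical schedule entry: run `operation` daily at `run_at` for `subclients`."""
    operation: str
    run_at: time
    subclients: tuple


def due_jobs(policies, last_check, now):
    """Return (operation, subclient) pairs whose start time falls in (last_check, now].

    Assumes checks happen at least daily: only today's start times are considered.
    """
    jobs = []
    for p in policies:
        start = datetime.combine(now.date(), p.run_at)
        if last_check < start <= now:
            jobs.extend((p.operation, s) for s in p.subclients)
    return jobs


policies = [
    SchedulePolicy("incremental-block-backup", time(1, 0), ("db-files-chicago",)),
    SchedulePolicy("full-backup", time(23, 0), ("mailboxes",)),
]
```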

“Storage Manager User Interfaces”

User interface 158 may include information processing and display software, such as a graphical user interface (GUI), an application programming interface (API), and/or other interactive interface(s), through which users and system processes can retrieve information about the status of information management operations or issue instructions to storage manager 140 and other components. Via user interface 158, users can issue instructions to the components of system 100 regarding secondary copy and/or recovery operations. For example, a user may modify a schedule concerning the number of pending secondary copy operations. As another example, a user may use the GUI to view the status of secondary copy jobs or to monitor certain components of system 100 (e.g., the amount of remaining storage capacity). Storage manager 140 can also track information that allows it to select, designate, or otherwise identify content indices, deduplication databases, or similar resources or data sets within its information management cell (or another cell) to be searched in response to certain queries, which users can enter through user interface 158.

Various embodiments of system 100 can generate user interface data usable for rendering various interactive user interfaces. System 100, or any other system, device, or software program (e.g., a browser program), may use that user interface data to render the interactive user interfaces, which can be displayed on electronic displays (including touch-enabled displays), consoles, and other devices, whether directly connected to storage manager 140 or communicatively coupled to it remotely, e.g., via an internet connection. This disclosure describes several embodiments of dynamic and interactive user interfaces, some of which may be generated by user interface 158 and which reflect significant technological development. These user interfaces can improve human-computer interaction, providing cognitive and ergonomic efficiencies and advantages over other systems, such as reduced mental workload and better decision-making. User interface 158 may operate in a single integrated view or console (not shown). The console may offer a reporting capability for generating a variety of reports, which can be tailored to particular aspects of information management.

User interfaces are not exclusive to storage manager 140; in some embodiments, a user can access information locally from a computing device component of system 100. For example, information about installed data agents 142 and their associated data streams may be available from client computing device 102, and information about media agents 144 and their associated data streams may be available from secondary storage computing device 106.

“Storage Manager Management Agent”

Management agent 154 provides storage manager 140 with the ability to communicate with other components of system 100 and/or with other information management cells via network protocols and application programming interfaces (APIs), including, e.g., HTTP, HTTPS, FTP, REST, virtualization software APIs, cloud service provider APIs, and hosted service provider APIs. Management agent 154 also allows multiple information management cells to communicate with one another. For example, system 100 may be one information management cell in a network of multiple cells that are logically related to one another, e.g., in a WAN or LAN, and the cells can communicate through their respective management agents 154. Inter-cell communication and hierarchy are described in greater detail in, e.g., U.S. Pat. No. 7,343,453.

“Information Management Cell”

An “information management cell” (or “storage operation cell”) may generally include a logical and/or physical grouping of hardware and/or software components used to perform information management operations on electronic data, including at least one storage manager 140, at least one data agent 142 (executing on a client computing device 102), and at least one media agent 144 (executing on a secondary storage computing device 106). For example, the components shown in FIG. 1C may together form an information management cell. Thus, in some configurations, system 100 may be referred to as an information management cell or a storage operation cell. A given cell may be identified by the identity of the storage manager 140 responsible for managing it.

Multiple cells may be organized hierarchically, so that cells inherit properties from hierarchically superior cells or are controlled, automatically or otherwise, by other cells in the hierarchy. In some embodiments, cells may inherit or otherwise be associated with information management policies, preferences, or operational parameters from other cells. Cells may also be organized hierarchically according to geography, architectural considerations, or other factors useful or desirable for information management operations. For example, a first cell may represent a geographic segment of an enterprise, such as a Chicago office, and a second cell may represent another geographic segment, such as a New York City office. Other cells may represent departments within an office, such as human resources, finance, and engineering. A first cell may perform one or more first types of information management operations at a certain frequency, while a second cell performs one or more second types at a different frequency and with different retention rules. The hierarchical information is generally maintained by one or more storage managers 140 that manage the respective cells (e.g., in the corresponding management database(s) 146).
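Property inheritance in a cell hierarchy resembles scoped lookup: a cell uses its own setting if it has one and otherwise defers to its superior. The sketch below is a hypothetical model only; `Cell`, `effective`, the setting keys, and the retention values are illustrative assumptions, not parameters defined by the system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cell:
    """Hypothetical information management cell with an optional superior cell."""
    name: str
    parent: Optional["Cell"] = None
    settings: dict = field(default_factory=dict)

    def effective(self, key):
        """Use this cell's setting if present, else the nearest superior cell's."""
        cell = self
        while cell is not None:
            if key in cell.settings:
                return cell.settings[key]
            cell = cell.parent
        raise KeyError(key)


enterprise = Cell("enterprise",
                  settings={"retention_days": 90, "backup_frequency": "daily"})
chicago = Cell("chicago-office", enterprise)
nyc = Cell("nyc-office", enterprise)
finance = Cell("nyc-finance", nyc, {"retention_days": 2555})
```

A department cell such as `finance` overrides only what differs (its retention rule) and inherits everything else from the office and enterprise cells above it.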

“Data Agents”

A variety of applications 110 may run on a client computing device 102, including operating systems, file systems, database applications, e-mail applications, and virtual machines. Client computing device 102 may also be responsible for processing and preparing the primary data 112 generated by these applications 110. The nature of this processing can differ across application types, e.g., because of inherent structural, state, and formatting differences among applications 110 and/or the operating system of client computing device 102. In some embodiments, each data agent 142 is therefore configured to assist in information management operations according to the type of data being protected.

Data agent 142 is a component of information management system 100 and is generally directed by storage manager 140 to create or restore secondary copies 116. Data agent 142 may be a software program, such as a set of executable binary files, that executes on the same client computing device 102 as the associated application 110 that data agent 142 protects. Data agent 142 is responsible for initiating and managing information management operations for its associated application(s) 110 and the corresponding primary data 112 generated and/or accessed by those application(s) 110. For example, data agent 142 may take part in copying, archiving, and migrating certain primary data 112 stored on primary storage device(s) 104. Data agent 142 may receive control information from storage manager 140, including commands to transfer copies of data objects and/or metadata to one or more media agents 144. Data agent 142 may also compress, deduplicate, and encrypt certain primary data 112, and capture application-related metadata, before transmitting the processed data to media agent 144. Storage manager 140 may also instruct data agent 142 to restore (or assist in restoring) a secondary copy 116 from secondary storage device 108 to primary storage 104, so that application 110 can access the restored data in a suitable format.

Each data agent 142 can be configured to access data and/or metadata stored on the primary storage device(s) 104 associated with its host client computing device 102 and to process that data accordingly. For example, during a secondary copy operation, data agent 142 may arrange or assemble the data and metadata into one or more files before transferring them to a media agent 144 or another component. The file(s) may include a list of files and other metadata. A data agent 142 can be distributed between client computing device 102 and storage manager 140 (and any other intermediate components), or it may be deployed from a remote location, or its functions may be approximated by a remote process that performs some or all of the functions of data agent 142. A data agent 142 may also perform some functions provided by media agent 144. Some embodiments use one or more generic data agents 142 that can handle data from two or more applications 110 or process multiple data types, instead of or in addition to specialized data agents 142. For example, one generic data agent 142 could be used to back up, migrate, and restore Microsoft Exchange Mailbox data and Microsoft Exchange Database data, while another generic data agent handles Microsoft Exchange Public Folder data and Microsoft Windows File System data.
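The data agent's preparation step, deduplicating and compressing data and bundling it with a list of files before handing it to a media agent, can be sketched as follows. This is a hypothetical illustration: `prepare_for_media_agent` is an invented name, SHA-256 content hashing stands in for the deduplication technique and zlib for compression, and encryption is omitted because it would require a third-party library.

```python
import hashlib
import json
import zlib


def prepare_for_media_agent(files, seen_hashes):
    """Hypothetical data-agent step: deduplicate, compress, and package files.

    files: dict of path -> bytes read from primary storage.
    seen_hashes: set of SHA-256 hex digests already sent (updated in place).
    Returns (manifest_json, chunks) to hand to a media agent.
    """
    manifest = []  # the list of files and metadata carried with the package
    chunks = {}    # digest -> compressed bytes; each distinct content sent once
    for path, data in sorted(files.items()):
        digest = hashlib.sha256(data).hexdigest()
        manifest.append({"path": path, "size": len(data), "sha256": digest})
        if digest not in seen_hashes:
            chunks[digest] = zlib.compress(data)
            seen_hashes.add(digest)
    return json.dumps(manifest), chunks


seen = set()
files = {"a.txt": b"same", "b.txt": b"same", "c.txt": b"other"}
manifest, chunks = prepare_for_media_agent(files, seen)
# A second run over unchanged data sends a manifest but no new chunks.
manifest2, chunks2 = prepare_for_media_agent(files, seen)
```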

“Media Agents”

As noted, shifting certain responsibilities from client computing devices 102 to intermediate components such as secondary storage computing device(s) 106 and the corresponding media agent(s) 144 can bring a variety of benefits, including faster and more reliable information management operations and improved scalability. For example, media agent 144 can act as a local cache of data and/or metadata recently copied to secondary storage device(s) 108, which improves restore capabilities and performance.

Media agent 144 is a component of system 100 and is generally directed by storage manager 140 to create and restore secondary copies 116. Whereas storage manager 140 manages system 100 as a whole, media agent 144 provides a portal to secondary storage devices 108, with specialized features for accessing and communicating with certain secondary storage devices 108. Media agent 144 may be a program that runs on a secondary storage computing device 106. Media agent 144 coordinates and facilitates the transmission of data between a data agent 142 (executing on client computing device 102) and the secondary storage device(s) 108 associated with media agent 144. Other components of the system can interact with media agent 144 to gain access to data stored on secondary storage devices 108. Media agents 144 can also generate and store information about the characteristics of the stored data and/or metadata, as well as other information that provides insight into the contents of secondary storage devices 108; this is commonly referred to as indexing the secondary copies 116. A single media agent 144 may operate on a secondary storage computing device 106, or multiple media agents 144 may operate on the same secondary storage computing device 106.

A media agent 144 may be associated with a particular secondary storage device 108 if that media agent 144 is capable of one or more of the following: routing and/or storing data to that secondary storage device 108, retrieving data from it, and coordinating the retrieval of data from it. In certain embodiments, media agent 144 is physically separate from its associated secondary storage device 108. For example, a media agent 144 may operate on a secondary storage computing device 106 in a different housing, package, or location from the associated secondary storage device 108. In one example, a media agent 144 operates on a first server computer and communicates with secondary storage devices 108 housed in a separate rack-mounted RAID-based system.

A media agent 144 associated with a particular secondary storage device 108 may instruct that secondary storage device 108 to perform an information management task. For example, a media agent 144 might instruct a tape library to load or eject certain media and then archive, migrate, or retrieve data on that media, e.g., to restore data to a client computing device 102. As another example, a secondary storage device 108 may include an array of hard disk drives or solid state drives, and media agent 144 may forward a logical unit number (LUN) and other relevant information to the array, which uses that information to perform the secondary copy operation. Media agent 144 can communicate with secondary storage device 108 via a suitable communication link, such as a SCSI or Fibre Channel link.

Each media agent 144 may maintain an associated media agent database 152. Media agent database 152 can be stored on a disk or other storage device (not shown) that is local to the secondary storage computing device 106 on which media agent 144 executes. In other cases, media agent database 152 is stored separately from secondary storage computing device 106. Media agent database 152 can include, among other things, a media agent index 153 (see, e.g., FIG. 1C). In some cases, media agent index 153 is not part of media agent database 152.

Media agent index 153 (or “index 153”) provides a fast and efficient way to locate and browse secondary copies 116 and other data stored on secondary storage devices 108, without having to access secondary storage device 108 to retrieve that information. For instance, for each secondary copy 116, index 153 may include metadata such as a list of the data objects (e.g., files/subdirectories, database objects, mailbox objects, etc.), a path to the secondary copy 116 on the corresponding secondary storage device 108, location information (e.g., offsets) indicating where the data objects are stored on secondary storage device 108, and when the data objects were created or modified. Index 153 may also include other metadata associated with secondary copies 116 that is usable by media agent 144. In some embodiments, index 153 is stored along with secondary copies 116 on secondary storage device 108. In some embodiments, a secondary storage device 108 may contain sufficient information to enable a “bare metal restore.”
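The kind of per-item metadata index 153 holds (object list, device path, offsets, modification times) can be modeled as below, which also shows why browsing needs no access to the storage device. The sketch is hypothetical: `IndexEntry`, `MediaAgentIndex`, and the paths and offsets are invented, and a real index would be persisted rather than kept in memory.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index 153 entry for one data object in a secondary copy."""
    copy_id: str
    item: str          # file, database object, mailbox object, ...
    device_path: str   # path to the secondary copy on the storage device
    offset: int        # where the item's data starts within that copy
    length: int
    modified: datetime


class MediaAgentIndex:
    def __init__(self):
        self._by_item = {}  # item -> list of IndexEntry, one per copy

    def add(self, entry):
        self._by_item.setdefault(entry.item, []).append(entry)

    def browse(self, prefix):
        """List indexed items under a path prefix without reading the device."""
        return sorted(i for i in self._by_item if i.startswith(prefix))

    def latest(self, item):
        """Most recent entry for `item`: tells a restore exactly where to read."""
        return max(self._by_item[item], key=lambda e: e.modified)


idx = MediaAgentIndex()
idx.add(IndexEntry("copy-1", "/home/a/report.doc", "/lib1/vol3/copy-1",
                   0, 4096, datetime(2024, 1, 1)))
idx.add(IndexEntry("copy-2", "/home/a/report.doc", "/lib1/vol4/copy-2",
                   8192, 4100, datetime(2024, 1, 2)))
idx.add(IndexEntry("copy-2", "/home/b/notes.txt", "/lib1/vol4/copy-2",
                   0, 512, datetime(2024, 1, 2)))
```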

Index 153 can operate as a cache, also known as an “index cache.” Index cache 153 typically contains data reflecting details of recent secondary copy operations. After a triggering event, such as index cache 153 reaching a certain size, portions of it may be copied or moved to secondary storage device 108 or another location. This information can later be retrieved and uploaded back into index cache 153, or otherwise restored to media agent 144, to facilitate retrieval of data from secondary storage device(s) 108. In some embodiments, the cached information includes format or containerization information related to archives or other files stored on storage device(s) 108.
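The size-triggered offload and later recovery of index cache contents can be sketched as a bounded cache that spills its oldest entries. This is a hypothetical model: `IndexCache` is an invented name, the trigger is an entry count rather than a byte size, a plain dictionary stands in for secondary storage, and "oldest" means first inserted.

```python
class IndexCache:
    """Hypothetical index cache: recent entries stay local; when the cache grows
    past max_entries, the oldest entries are offloaded to secondary storage."""

    def __init__(self, max_entries, offload_store):
        self.max_entries = max_entries
        self.entries = {}                   # copy_id -> index data, insertion-ordered
        self.offload_store = offload_store  # dict standing in for secondary storage

    def put(self, copy_id, index_data):
        self.entries[copy_id] = index_data
        # Triggering event: the cache exceeds its size threshold.
        while len(self.entries) > self.max_entries:
            oldest = next(iter(self.entries))
            self.offload_store[oldest] = self.entries.pop(oldest)

    def get(self, copy_id):
        if copy_id not in self.entries:
            # Retrieve offloaded index data and upload it back into the cache.
            self.put(copy_id, self.offload_store.pop(copy_id))
        return self.entries[copy_id]


store = {}
cache = IndexCache(2, store)
for cid in ("c1", "c2", "c3"):
    cache.put(cid, "index-" + cid)
```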

In some embodiments, media agent 144 acts as a coordinator or facilitator of secondary copy operations between client computing devices 102 and secondary storage devices 108 but does not itself write data to secondary storage device 108. Instead, storage manager 140 (or media agent 144) can instruct client computing device 102 and secondary storage device 108 to communicate directly, and client computing device 102 transmits data to secondary storage device 108, either directly or via intermediary components, according to the instructions received. Media agent 144 can still receive, process, and/or maintain metadata related to the secondary copy operations; for example, it may continue to maintain index 153. In these embodiments, payload data can flow through media agent 144 to populate index 153, but not to be written to secondary storage device 108. In some cases, media agent 144 or other components, such as storage manager 140, provide additional functionality such as data classification and content indexing. These and other functions are described below.

“Distributed, Scalable Architecture”

As described, the functions of system 100 can be divided among various physical and/or logical components. For example, one or more of storage manager 140, data agents 142, and media agents 144 can operate on separate computing devices. This architecture offers many benefits, such as the ability to tailor hardware and software choices to each component's specific function. The secondary storage computing devices 106 on which media agents 144 operate can be customized for interaction with secondary storage devices 108 and for index cache operation, among other specific tasks. Similarly, client computing devices 102 can be chosen to effectively service applications 110 in order to produce and store primary data 112.

Moreover, information management system 100 may be distributed across several computing devices in certain cases. For example, in large file systems where the amount of data in database 146 is relatively large, database 146 can be moved to or stored on a separate server, such as an SQL server. This distributed configuration provides added protection, because database 146 can be protected with standard database utilities, such as SQL log shipping and database replication, independently of the other functions of storage manager 140. Database 146 can be replicated to a remote location for use in case of a disaster at the primary site. Database 146 can also be replicated to another computing device at the same site, for example to a faster machine if the storage manager host computing device can no longer support a growing system 100.

The distributed architecture also provides efficiency and scalability. FIG. 1D illustrates an information management system 100 that includes a plurality of client computing devices 102 and associated data agents 142, as well as a plurality of secondary storage computing devices 106 and associated media agents 144. Components can be added or removed as the needs of system 100 change. For example, administrators can add client computing devices 102 and secondary storage computing devices 106 depending on where bottlenecks occur. Where multiple fungible components are available, load balancing can also be used to address bottlenecks. For example, storage manager 140 could dynamically select which media agents 144 and/or secondary storage devices 108 to use for storage operations based on a processing load analysis.

Where system 100 includes multiple media agents 144 (see, e.g., FIG. 1D), a first media agent 144 can provide failover functionality for a second, failed media agent 144. Media agents 144 can also be dynamically selected to provide load balancing. Client computing devices 102 can communicate with any of the media agents 144, e.g., as directed by storage manager 140, and each media agent 144 can communicate with any of the secondary storage devices 108, e.g., as directed by storage manager 140. Operations can therefore be routed to secondary storage devices 108 dynamically and in highly flexible ways to provide failover, load balancing, and other functions. Additional examples of scalable systems capable of dynamic storage operations, load balancing, and failover are provided in U.S. Pat. No. 7,246,207.
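A load-based choice of media agent that also skips failed agents can be sketched in a few lines. This is a hypothetical illustration, not the selection logic of storage manager 140: `MediaAgentStatus`, `choose_media_agent`, and the use of active streams over maximum streams as the load measure are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MediaAgentStatus:
    """Hypothetical load snapshot for one media agent."""
    name: str
    active_streams: int
    max_streams: int
    online: bool = True

    @property
    def load(self):
        return self.active_streams / self.max_streams


def choose_media_agent(agents):
    """Pick the least-loaded online agent with spare capacity.

    Offline agents are never chosen, so work fails over to the remaining ones.
    """
    candidates = [a for a in agents
                  if a.online and a.active_streams < a.max_streams]
    if not candidates:
        raise RuntimeError("no media agent available")
    return min(candidates, key=lambda a: a.load)


agents = [
    MediaAgentStatus("ma-1", 8, 10),
    MediaAgentStatus("ma-2", 2, 10),
    MediaAgentStatus("ma-3", 0, 10, online=False),
]
first_choice = choose_media_agent(agents).name
agents[1].online = False  # simulate failure of ma-2
failover_choice = choose_media_agent(agents).name
```

Treating failover as "exclude unavailable agents, then balance among the rest" lets one selection routine serve both purposes the paragraph describes.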

Distributing functionality across multiple computing devices has advantages in some situations; in others, consolidating functionality can be more beneficial. In alternative configurations, certain components may reside and execute on the same computing device. Accordingly, in other embodiments, one or more of the components shown in FIG. 1C can be implemented on the same computing device. In one configuration, storage manager 140 and one or more data agents 142 are implemented on the same computing device. In other embodiments, one or more data agents 142, one or more media agents 144, and storage manager 140 are all implemented on the same computing device. These configurations are examples, not limitations.

Exemplary Types of Information Management Operations, Including Storage Operations

To protect stored data and maximize its value, system 100 can be configured to perform a range of information management operations, which may also be called storage management operations or storage operations. These operations include: (i) data movement operations; (ii) processing and data manipulation operations; and (iii) analysis, reporting, and management operations.

Data Movement Operations, Including Secondary Copy Operations

Data movement operations are storage operations that involve copying or migrating data between different locations within system 100. They include copying, migrating, or otherwise transferring data from one or more storage devices to one or more other storage devices.

Data movement operations include backup operations, archive operations, information lifecycle management operations such as hierarchical storage management, replication operations (e.g., continuous data replication), snapshot operations, deduplication or single-instancing operations, auxiliary copy operations, and disaster-recovery copy operations. As will be explained, some of these operations do not necessarily create distinct copies. For simplicity, however, they are referred to as "secondary copy operations," because they involve secondary copies. Data movement also includes restoring secondary copies.

Backup Operations

A backup operation creates a copy of primary data 112 (e.g., one or more files or other data units) as it existed at a specific point in time. The backup copy 116, which is a type of secondary copy 116, may be maintained independently of the original. A backup generally involves maintaining both a version of the copied primary data 112 and the backup copies 116. In some embodiments, a backup copy is stored in a format different from the native format of the corresponding primary data 112, which may be stored in a format native to the source application(s) 110. Backup copies can be stored in a variety of formats; for example, a backup copy can be saved in a compressed backup format to facilitate long-term storage. Backup copies 116 can have relatively long retention periods compared to primary data 112, which is generally highly changeable. Backup copies 116 may be stored on media with slower retrieval times than primary storage device 104. Some backup copies may be retained for shorter periods than some other types of secondary copies 116. Backups can be stored offsite.
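To make the idea of a compressed, non-native backup format concrete, here is a minimal Python sketch that wraps primary data in a small container with a point-in-time header and compresses the whole thing with gzip. The container layout (a 4-byte header length, a JSON header, then the raw data) is a hypothetical format chosen for illustration, not the format any particular product uses.

```python
import gzip
import hashlib
import json
from datetime import datetime, timezone


def make_backup_copy(primary: bytes, taken_at: datetime) -> bytes:
    """Package primary data into a compressed backup container (hypothetical layout)."""
    header = {
        "taken_at": taken_at.isoformat(),            # point in time of the copy
        "sha256": hashlib.sha256(primary).hexdigest(),
        "size": len(primary),
    }
    header_bytes = json.dumps(header).encode()
    # Layout: 4-byte big-endian header length, JSON header, raw data; all gzip-compressed.
    return gzip.compress(len(header_bytes).to_bytes(4, "big") + header_bytes + primary)


def restore_backup_copy(blob: bytes) -> tuple[dict, bytes]:
    """Decompress a container, verify its checksum, and return (header, data)."""
    raw = gzip.decompress(blob)
    n = int.from_bytes(raw[:4], "big")
    header = json.loads(raw[4:4 + n])
    data = raw[4 + n:]
    if hashlib.sha256(data).hexdigest() != header["sha256"]:
        raise ValueError("backup copy is corrupt")
    return header, data


t = datetime(2024, 1, 1, tzinfo=timezone.utc)
blob = make_backup_copy(b"hello " * 100, t)
header, data = restore_backup_copy(blob)
print(header["taken_at"], len(blob) < len(data))  # 2024-01-01T00:00:00+00:00 True
```

Storing the timestamp and checksum alongside the data keeps each backup copy self-describing, which is one reason backup formats often differ from the application's native format.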

Backup operations include full backups, differential backups, incremental backups, synthetic full backups, and/or the creation of a "reference backup." In some cases, a full backup (or "standard full backup") is a complete copy of the data to be protected. Because full backup copies can consume a large amount of storage, it can be useful to keep a full backup copy as a baseline and afterwards store only the changes relative to it.

A differential backup operation (or cumulative incremental backup operation) tracks and stores all changes made since the last full backup. Although differential backups can grow quickly in size, restores can be relatively fast, because a restore can be completed using only the full backup copy and the most recent differential copy.

An incremental backup operation stores and tracks changes made since the last backup copy. This can help reduce storage usage. However, in some cases, the process of restoring can take longer than restoring from full or differential backups. This is because accessing multiple incremental backups and a full backup may be required to complete a restore operation.
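The practical difference between differential and incremental backups shows up in which copies a restore has to read. The following Python sketch computes that "restore chain" for a simple linear backup history. The `Backup` type and its `seq`/`kind` fields are hypothetical, and the sketch assumes the history is sorted and contains at least one full backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    seq: int    # position in the backup timeline (hypothetical ordering key)
    kind: str   # "full", "differential", or "incremental"


def restore_chain(history: list[Backup]) -> list[Backup]:
    """Return the backups needed to restore to the newest point in `history`.

    Assumes `history` is sorted by seq and contains at least one full backup.
    """
    # Everything before the most recent full backup is irrelevant to the restore.
    last_full = max(i for i, b in enumerate(history) if b.kind == "full")
    chain = [history[last_full]]
    after = history[last_full + 1:]
    # The newest differential already holds every change since that full backup.
    diffs = [i for i, b in enumerate(after) if b.kind == "differential"]
    start = 0
    if diffs:
        chain.append(after[diffs[-1]])
        start = diffs[-1] + 1
    # Each incremental after that point holds only changes since the previous
    # backup, so every one of them must be applied, in order.
    chain.extend(b for b in after[start:] if b.kind == "incremental")
    return chain


incremental_only = [Backup(1, "full")] + [Backup(s, "incremental") for s in (2, 3, 4)]
differential_only = [Backup(1, "full")] + [Backup(s, "differential") for s in (2, 3, 4)]
print([b.seq for b in restore_chain(incremental_only)])   # [1, 2, 3, 4]
print([b.seq for b in restore_chain(differential_only)])  # [1, 4]
```

The incremental history needs every copy since the full backup, while the differential history needs only two, which is the restore-time trade-off described above.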

Synthetic full backups consolidate data without directly backing up client data. A synthetic full backup is created from the most recent full backup (either synthetic or standard) combined with any subsequent incremental and/or differential backups. The resulting synthetic full backup is identical to what would have been created had the last backup for the subclient been a standard full backup. Unlike standard full, incremental, and differential backups, a synthetic full backup does not transfer data from primary storage to the backup media, because it operates as a backup consolidator. A synthetic full backup operation extracts the index data of each participating subclient and uses that index data, together with the previously backed-up user data images, to build new full backup images (e.g., bitmaps) for each subclient. The new backup images consolidate the index and user data stored in the previous full backup and in the subsequent differential and incremental backups into a synthetic backup file that fully represents each subclient (e.g., via pointers) but does not necessarily contain all of its constituent data.
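The consolidation step can be sketched as an overlay of backup indexes: start from the last full backup's index and apply each later index in chronological order, so that newer pointers win and only pointers, never user data, are combined. In this minimal Python sketch the index format (file path mapped to a `(backup_id, offset)` pointer, with `None` marking a deletion) is a hypothetical simplification.

```python
from typing import Optional

# Hypothetical index format: file path -> pointer to where that version's data
# already lives on backup media, e.g. ("inc-2", 512). None marks a deletion.
Index = dict[str, Optional[tuple[str, int]]]


def synthesize_full(full_index: Index, later_indexes: list[Index]) -> dict[str, tuple[str, int]]:
    """Build a synthetic full index by overlaying later backups onto the last full.

    Only index entries (pointers) are combined; no user data is read or copied.
    """
    merged: Index = dict(full_index)  # copy so the original full index is untouched
    for idx in later_indexes:         # oldest to newest, so newer pointers win
        merged.update(idx)
    return {path: ptr for path, ptr in merged.items() if ptr is not None}


full = {"/a.txt": ("full-1", 0), "/b.txt": ("full-1", 4096)}
inc_2 = {"/b.txt": ("inc-2", 0), "/c.txt": ("inc-2", 512)}
inc_3 = {"/a.txt": None}  # /a.txt was deleted before backup 3
print(synthesize_full(full, [inc_2, inc_3]))
# {'/b.txt': ('inc-2', 0), '/c.txt': ('inc-2', 512)}
```

Because the result holds pointers into existing backup images, it represents a full backup without re-reading the client, consistent with the consolidator role described above.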

Summary for “Snapshots and backup copies for individual virtual machines”


In some embodiments, the system includes: a hypervisor that creates and operates a plurality of virtual machines; one or more shared physical computer storage devices, communicatively coupled to the hypervisor, that store the plurality of virtual machines; and a virtual server agent that issues a command to the hypervisor to perform a snapshot copy operation on a selected virtual machine without performing the snapshot copy operation on any other virtual machine stored in the one or more shared physical computer storage devices.

In some embodiments, the method includes: creating and operating a plurality of virtual machines using a hypervisor; storing the plurality of virtual machines in one or more shared physical computer storage devices communicatively coupled to the hypervisor; and issuing, by a virtual server agent, a command for the hypervisor to perform a snapshot copy operation on a selected virtual machine without performing the snapshot copy operation on any other virtual machine in the one or more shared physical computer storage devices.

In some embodiments, a virtual server agent comprises a memory storing instructions for executing a method that includes issuing a command for a hypervisor to perform a snapshot copy operation on a selected one of a plurality of virtual machines operated by the hypervisor. The hypervisor is communicatively coupled to one or more shared physical computer storage devices that store the plurality of virtual machines, wherein a plurality of storage volumes are stored in the one or more shared physical computer storage devices, each volume uniquely corresponding to one of the virtual machines.

In some embodiments, a virtual server agent method includes issuing, by the virtual server agent, a command for a hypervisor to perform a snapshot copy operation on a selected one of a plurality of virtual machines. The hypervisor is communicatively coupled to one or more shared physical computer storage devices that store the plurality of virtual machines, and a plurality of storage volumes are stored in the one or more shared physical computer storage devices, each volume uniquely corresponding to one of the virtual machines. The virtual server agent then receives, from the hypervisor, metadata corresponding to the completed snapshot copy operation.

In some embodiments, a non-transitory computer-readable medium stores instructions that, when executed by at least one computing device, perform a virtual server agent method. The method includes issuing a command for a hypervisor to perform a snapshot copy operation on a selected one of a plurality of virtual machines, the hypervisor being communicatively coupled to one or more shared physical computer storage devices that store the plurality of virtual machines. The virtual server agent also receives, from the hypervisor, metadata corresponding to the completed snapshot copy operation, and then stores the metadata in a database.

In some embodiments, the virtual server agent includes a memory storing instructions for carrying out a process that involves: commanding an external system to create a snapshot copy of any one of a plurality of virtual machines stored in the external system; receiving metadata about the completed snapshot copy from the external system; storing the metadata in a database; selecting one or more of the completed snapshot copies to use in creating a backup copy of any one of the plurality of virtual machines; and instructing the external system to create a backup copy of a selected virtual machine using the metadata.

In some embodiments, a virtual server agent method includes: commanding, by the virtual server agent, an external system to create a snapshot copy of any one of a plurality of virtual machines stored in the external system; receiving, by the virtual server agent, metadata about the completed snapshot copy from the external system; storing the metadata in a database; receiving, by the virtual server agent, a selection of one or more of the completed snapshot copies to use in creating a backup copy of any one of the plurality of virtual machines; and commanding, by the virtual server agent, the external system to create a backup copy of a selected virtual machine.

In some embodiments, a non-transitory computer-readable medium stores instructions that, when executed by at least one computing device, perform the following: a virtual server agent commands an external system to create a snapshot copy of any one of a plurality of virtual machines stored in the external system; the virtual server agent receives metadata about the completed snapshot copy from the external system; the metadata is stored in a database; the virtual server agent selects one or more of the completed snapshot copies to use in creating a backup copy of any one of the plurality of virtual machines; and the virtual server agent commands the external system to create a backup copy of a selected virtual machine using the metadata.
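The virtual server agent workflow in these embodiments (command a snapshot of one virtual machine, receive and store the returned metadata, then command a backup of a selected snapshot using that metadata) can be sketched in Python. This is a minimal sketch under stated assumptions: `FakeHypervisor`, `snapshot_vm`, and `backup_from_snapshot` are hypothetical stand-ins for the hypervisor or external system and are not any real hypervisor or storage-array API, and an in-memory SQLite table plays the role of the metadata database.

```python
import sqlite3
import uuid
from dataclasses import dataclass, field


@dataclass
class FakeHypervisor:
    """Hypothetical stand-in for the hypervisor / external system."""
    vms: set[str]
    backups: list[tuple[str, str]] = field(default_factory=list)

    def snapshot_vm(self, vm: str) -> dict:
        """Snapshot only this VM's volume and return metadata describing it."""
        if vm not in self.vms:
            raise KeyError(vm)
        return {"snapshot_id": str(uuid.uuid4()), "vm": vm, "volume": f"vol-{vm}"}

    def backup_from_snapshot(self, vm: str, snapshot_id: str) -> None:
        self.backups.append((vm, snapshot_id))


class VirtualServerAgent:
    def __init__(self, hypervisor: FakeHypervisor) -> None:
        self.hv = hypervisor
        self.db = sqlite3.connect(":memory:")  # metadata database
        self.db.execute(
            "CREATE TABLE snapshots (snapshot_id TEXT PRIMARY KEY, vm TEXT, volume TEXT)")

    def snapshot(self, vm: str) -> str:
        meta = self.hv.snapshot_vm(vm)  # command the hypervisor for one VM only
        self.db.execute(                # store the returned metadata
            "INSERT INTO snapshots VALUES (:snapshot_id, :vm, :volume)", meta)
        self.db.commit()
        return meta["snapshot_id"]

    def snapshots_for(self, vm: str) -> list[str]:
        rows = self.db.execute(
            "SELECT snapshot_id FROM snapshots WHERE vm = ? ORDER BY rowid", (vm,))
        return [r[0] for r in rows]

    def backup(self, snapshot_id: str) -> None:
        """Command a backup of the VM behind a selected snapshot, using stored metadata."""
        row = self.db.execute(
            "SELECT vm FROM snapshots WHERE snapshot_id = ?", (snapshot_id,)).fetchone()
        if row is None:
            raise KeyError(snapshot_id)
        self.hv.backup_from_snapshot(row[0], snapshot_id)


hv = FakeHypervisor(vms={"vm-a", "vm-b"})
vsa = VirtualServerAgent(hv)
snap = vsa.snapshot("vm-a")
vsa.backup(snap)
print(hv.backups == [("vm-a", snap)], vsa.snapshots_for("vm-b"))  # True []
```

Keeping the snapshot metadata in the agent's own database is what lets the later backup step refer to a specific completed snapshot without asking the external system to enumerate them again.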

Many companies use virtual machines to maximize the use of computing resources. Many independent virtual machines can be stored in a shared datastore; for example, a single Logical Unit Number (LUN) in a shared datastore can hold hundreds to thousands of virtual machines. Enterprises must create backup copies and snapshots of their virtual machine data just as they do for any other data. However, conventional virtual machine systems may be unable to make snapshots or backup copies of individual virtual machine disks within the shared datastore. Some systems can take snapshots only at the LUN level, so a snapshot captures not just a specific virtual machine but every virtual machine stored on the LUN. This makes backup and snapshot copy operations inefficient for such virtual machine systems. The systems and techniques described herein overcome this problem by allowing snapshot and backup copy operations to be performed on specific virtual machines within a shared storage.
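The difference in snapshot scope can be made concrete with a toy model of a shared datastore in which each virtual machine has its own storage volume on a LUN. The Python sketch below is hypothetical throughout (the `datastore` layout and names are illustrative); it only shows why a LUN-level snapshot sweeps in unrelated VMs while a per-volume snapshot does not.

```python
# Hypothetical shared datastore: one LUN holding one storage volume per VM.
datastore: dict[str, dict[str, str]] = {
    "lun-0": {"vm-a": "vol-a", "vm-b": "vol-b", "vm-c": "vol-c"},
}


def lun_level_snapshot(lun: str) -> set[str]:
    """LUN-granular snapshot: every volume (and so every VM) on the LUN is captured."""
    return set(datastore[lun].values())


def per_vm_snapshot(lun: str, vm: str) -> set[str]:
    """Per-VM snapshot: only the volume that uniquely corresponds to `vm` is captured."""
    return {datastore[lun][vm]}


def collateral_vms(lun: str, captured: set[str], target: str) -> set[str]:
    """VMs swept into a snapshot even though only `target` was requested."""
    return {vm for vm, vol in datastore[lun].items() if vol in captured and vm != target}


print(sorted(collateral_vms("lun-0", lun_level_snapshot("lun-0"), "vm-b")))       # ['vm-a', 'vm-c']
print(sorted(collateral_vms("lun-0", per_vm_snapshot("lun-0", "vm-b"), "vm-b")))  # []
```

With hundreds or thousands of VMs on one LUN, the collateral set in the LUN-level case grows accordingly, which is the inefficiency the per-VM approach avoids.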

Detailed descriptions and examples of systems and methods according to one or more illustrative embodiments are provided in the section entitled Selective Snapshot and Backup Copy Operations for Individual Virtual Machines within a Shared Store, in the section entitled Example Embodiments, and in FIGS. 4-6 herein. In addition, the components and functionality that enable selective snapshots and backup copies of individual virtual machines within a shared storage can be configured into and/or integrated with information management systems such as those shown in FIGS. 1A-1H and 2A-2C.

Many of the embodiments described herein are inextricably tied to, enabled by, or would not exist without computer technology. The systems and techniques described herein for performing selective snapshots and backup copies of individual virtual machines within a shared storage cannot be carried out by humans without the technology that supports them.

Information Management System Overview

Given the importance of protecting and leveraging their data, organizations cannot afford to lose critical data. Runaway data growth and other modern realities make protecting and managing data increasingly difficult, so there is a need for efficient, powerful, and user-friendly ways to protect and manage data, as well as for smart and efficient storage management. Depending on the size of the organization, there may be many data production sources under the control of many individuals. In the past, individuals were sometimes responsible for managing and protecting their own data, and an organization might use a variety of hardware and software solutions, often from different vendors and lacking interoperability. Certain embodiments address these and other limitations by implementing scalable, unified, organization-wide information management, including data storage management.

The information management system 100 depicted in the figures generally comprises combinations of hardware and software that protect and manage data and metadata generated by computing devices in system 100. In some embodiments, system 100 may be referred to as "storage management software" or a "storage management system." System 100 performs information management operations, some of which may be called "storage operations" or "data storage operations," to protect and manage the data stored in or managed by system 100. System 100 can be used by any organization, such as a company or other business entity, a non-profit organization, an educational institution, or a household.

Generally, the systems described herein may be compatible with and/or provide some or all of the functionality of one or more U.S. patents, patent publications, or patent applications assigned to Commvault Systems, Inc., each of which is hereby incorporated by reference in its entirety herein.

System 100 can include computing devices and computing technologies. For example, system 100 can include one or more client computing devices 102 and secondary storage computing devices 106, as well as storage manager 140 or a computing device that hosts it. Computing devices can include, without limitation, personal computers, workstations, desktop computers, and other types of generally fixed computing systems, such as mainframe computers and servers. They can also include portable or mobile computing devices, such as laptops, tablet computers, personal digital assistants, mobile phones (e.g., smartphones), and other mobile or portable computing devices such as embedded computers, set-top boxes, and vehicle-mounted devices. Servers can include mail servers, file servers, and database servers. A computing device includes one or more processors (e.g., CPUs with single-core or multi-core processors) and non-transitory computer memory (e.g., random-access memory (RAM)) that stores computer programs to be executed by the one or more processors. Other computer memory for mass storage may be included within the computing device (e.g., an internal hard disk drive) or may be externally accessible (e.g., network-attached storage or a storage array). In some cases, a computing device includes cloud computing resources, which may be implemented as virtual machines; for example, a third-party cloud service provider may provide one or more virtual machines to an organization.

In some instances, computing devices may include one or more virtual machines running on a physical host computing device (or "host machine") operated by the organization. For example, an organization might use one virtual machine as a database server and another as a mail server, with both virtual machines operating on the same host machine. A virtual machine ("VM") is a software implementation of a computer that is not physically present; instead, it is instantiated in the operating system of a physical host computer so that applications can run within the VM's environment. A VM includes an operating system and associated virtual resources, such as computer memory and processors. A hypervisor creates and runs VMs and sits between each VM and the hardware of the physical host machine. Hypervisors are also known in the art as virtual machine monitors, virtual machine managers, or "VMMs," and can be implemented in software, firmware, or specialized hardware installed on the host machine. Examples of hypervisors include ESX Server by VMware, Inc. of Palo Alto, Calif.; Microsoft Virtual Server and Microsoft Windows Server Hyper-V, both by Microsoft Corporation of Redmond, Wash.; Sun xVM by Oracle America Inc. of Santa Clara, Calif.; and Xen by Citrix Systems, Santa Clara, Calif. The hypervisor provides each virtual operating system with resources such as a virtual processor and virtual memory. Each virtual machine is associated with one or more virtual disks, and the hypervisor stores the data of those virtual disks in files on the file system of the physical host machine. These files are called virtual machine disk files ("VMDK" in VMware terminology) or virtual hard disk image files (in Microsoft terminology). For example, VMware's ESX Server provides the Virtual Machine File System (VMFS) for storing and managing virtual machine disk files.
Virtual machines read data from and write data to their virtual disks in the same way that physical machines do with physical disks. U.S. Pat. No. 8,285,681 describes some techniques for information management in cloud computing environments, and U.S. Pat. No. 8,307,177 describes some techniques for information management in virtualized computing environments.

Information management system 100 may also include electronic data storage devices, generally used for mass storage of data, such as primary storage devices 104 and secondary storage devices 108. Storage devices can be of any type, including disk drives, storage arrays, network-attached storage (NAS) technology, semiconductor memory (e.g., solid state storage devices), tape libraries or other magnetic, non-tape storage devices, optical media storage devices, and DNA/RNA-based memory technology. In some embodiments, storage devices are part of a distributed data system. Storage devices can also be provided in a cloud storage environment (e.g., a private cloud or one operated by a third-party vendor), whether for primary data, secondary copies, or both.

Depending on the context, the term "information management system" can refer to all of the hardware and software components shown in FIG. 1C, or to a subset of them. For example, in some cases system 100 refers to a group of components used to protect, manage, manipulate, and/or process data generated by client computing devices 102. In that sense, system 100 does not necessarily include the components that create and/or store primary data 112, such as client computing devices 102 and primary storage devices 104, and may also exclude secondary storage devices 108, such as a third-party cloud storage environment. Likewise, the terms "information management system" and "storage management system" may sometimes refer to one or more of the following components, which are described further below: storage manager, data agent, and media agent.

One or more client computing devices 102 may be part of system 100. Each client computing device 102 has an operating system, at least one application 110, and one or more accompanying data agents, and is associated with one or more primary storage devices 104 that store primary data 112. In some cases, client computing device(s) 102 and primary storage device(s) 104 may be referred to collectively as primary storage subsystem 117.

Client Computing Devices, Clients, and Subclients

Data must be managed and protected from a variety of sources within an organization. For example, a corporate environment can include employee workstations as well as company servers, such as a mail server, web server, database server, transaction server, and the like. In system 100, the data generation sources are one or more client computing devices 102. A client computing device 102 can be any type of computing device that has a data agent 142 installed and running on it. One or more client computing devices 102 can be associated with users and/or user accounts.

A "client" is a logical component of information management system 100, which may represent a logical grouping of one or more data agents installed on a client computing device 102. Storage manager 140 recognizes a client as part of system 100 and, in certain embodiments, may automatically create a client component when a data agent 142 is installed on a client computing device 102. Because the associated data agent 142 tracks data generated by executable components 110 (e.g., applications 110) so that it can be properly protected in system 100, a client may be said to generate data and store the generated data to primary storage, such as primary storage device 104. The terms "client" and "client computing device" are not interchangeable. Neither term implies that a client computing device 102 is necessarily configured in the client/server sense relative to another computing device, such as a mail server, or that a client computing device 102 cannot itself be a server. For example, a client computing device 102 can be or include a file server, a database server, or a virtual machine server.

Each client computing device 102 may have one or more applications 110 running on it that generate and manipulate the data to be protected from loss and managed by system 100. Applications 110 generally support the operations of an organization and can include mail server applications (e.g., Microsoft Exchange Server), file system applications, mail client applications (e.g., Microsoft Exchange Client), and other applications 110 that generate and manipulate data. Each application 110 may be accompanied by an application-specific data agent 142, although not all data agents 142 are specific to an application. A file system, such as Microsoft Windows Explorer, may be considered an application 110 and may be accompanied by its own data agent 142. Client computing devices 102 can have at least one operating system installed (e.g., Microsoft Windows, Mac OS X, iOS, IBM z/OS, Linux, or other Unix-based operating systems), which may support one or more file systems or other applications 110. In some embodiments, a virtual machine running on a client computing device 102 may be considered an application 110 and may be accompanied by a specific data agent 142 (e.g., a virtual server data agent).

Client computing devices 102 can be connected to other components of system 100 via one or more electronic communication pathways 114. For example, a first communication pathway 114 can communicatively couple a client computing device 102 and a secondary storage computing device 106; a second communication pathway 114 can communicatively couple storage manager 140 and a client computing device 102; and a third communication pathway 114 can communicatively couple storage manager 140 and a secondary storage computing device 106 (see, e.g., FIG. 1A and FIG. 1C). In some cases, communication pathways 114 also include application programming interfaces (APIs), such as cloud service provider APIs and virtual machine management APIs. The underlying infrastructure of communication pathways 114 can be wired or wireless, analog or digital, or any combination of these.

A "subclient" is a logical grouping of all or part of a client's primary data 112. In general, a subclient is defined according to how the subclient data is to be protected as a unit in system 100. For example, a subclient may be associated with a particular storage policy, and one client may have several subclients, each associated with a different storage policy. Some files might form a first subclient that requires compression and deduplication and is associated with a first storage policy. Other files might form a second subclient that requires a different retention schedule as well as encryption and is associated with a second, different storage policy. Thus, although the primary data may be generated by the same application 110 and may belong to a single client, portions of it may be assigned to different subclients for different treatment by system 100. Subclients and storage policies are described in more detail below.
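One way to picture subclients is as named groups of paths, each bound to its own storage policy. This Python sketch is illustrative only: the `StoragePolicy` and `Subclient` types, the path-prefix matching rule, and the retention values are hypothetical and not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    name: str
    compress: bool
    deduplicate: bool
    encrypt: bool
    retention_days: int  # hypothetical retention parameter


@dataclass
class Subclient:
    name: str
    paths: list[str]  # path prefixes; ending them with "/" avoids partial-name matches
    policy: StoragePolicy


def policy_for(path: str, subclients: list[Subclient]) -> StoragePolicy:
    """Return the policy of the first subclient whose path prefix matches `path`."""
    for sc in subclients:
        if any(path.startswith(prefix) for prefix in sc.paths):
            return sc.policy
    raise LookupError(f"{path} is not in any subclient")


first = StoragePolicy("policy-1", compress=True, deduplicate=True, encrypt=False, retention_days=30)
second = StoragePolicy("policy-2", compress=False, deduplicate=False, encrypt=True, retention_days=365)
client = [Subclient("sc-docs", ["/docs/"], first), Subclient("sc-finance", ["/finance/"], second)]
print(policy_for("/finance/q3.xlsx", client).name)  # policy-2
```

Both subclients belong to the same client, yet files under `/docs/` are compressed and deduplicated while files under `/finance/` are encrypted and kept longer, mirroring the two-subclient example above.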

Primary Data and Exemplary Primary Storage Devices

Primary data 112 is generally production data or "live" data generated by the operating system and/or applications 110 running on client computing device 102. Primary data 112 is generally stored on primary storage device(s) 104 and organized via a file system running on client computing device 102. Client computing device(s) 102 and the corresponding applications 110 can create, access, modify, write, delete, and otherwise use primary data 112. Primary data 112 is generally in the native format of the source application 110 and is an initial or first stored body of data generated by the source application 110. In some cases, primary data 112 is created substantially directly from data generated by the corresponding source application 110. For certain tasks it can be useful to organize primary data 112 into units of different granularity. Primary data 112 can include files, directories, file system volumes, data blocks, extents, or any other hierarchies or organizations of data objects. As used herein, a "data object" can refer to (i) any file that is currently addressable by a file system or that was previously addressable by it (e.g., an archive file), and/or (ii) a subset of such a file (e.g., a data block or an extent). Primary data 112 can include structured data (e.g., database files), unstructured data (e.g., documents), and/or semi-structured data. See, e.g., FIG. 1B.

It can also be useful, in performing certain functions of system 100, to access and modify metadata within primary data 112. Metadata is information about data objects and/or the characteristics associated with them. For simplicity, unless expressly stated otherwise, any reference herein to primary data 112 also includes its associated metadata, but references to metadata generally do not include the primary data. Metadata can include, without limitation: the data owner (e.g., the client or user that generated the data object); the last modified time; the data object size (e.g., a number of bytes of data); information about the content (e.g., an indication of whether a particular search term is present); user-supplied tags; to/from information for email (e.g., an email sender and recipients); the creation date; the file type (e.g., format or application type); the last accessed time; the application type (e.g., the type of application that generated the data object); location/network information (e.g., a current, past, or future location of the data object and network pathways to and from it); permissions; users; groups; access control lists (ACLs); system metadata; registry information; and the like. Some applications 110 and/or other components of system 100 also maintain indices of metadata for data objects, e.g., metadata associated with individual email messages. The use of metadata for classification and other functions is described in more detail below.
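A metadata index of the kind mentioned above can be as simple as a map from metadata values to the objects that carry them. The following Python sketch models a few of the listed fields (owner, size, last modified time, file type, user-supplied tags) and indexes objects by owner and tag; the `ObjectMetadata` type and the `owner:`/`tag:` key scheme are hypothetical illustrations, not a described data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ObjectMetadata:
    """A few of the metadata fields listed above; field names are illustrative."""
    object_id: str
    owner: str
    size_bytes: int
    modified: datetime
    file_type: str
    tags: list[str] = field(default_factory=list)


def build_index(records: list[ObjectMetadata]) -> dict[str, list[str]]:
    """Map each owner and each user-supplied tag to the IDs of matching objects."""
    index: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        index[f"owner:{rec.owner}"].append(rec.object_id)
        for tag in rec.tags:
            index[f"tag:{tag}"].append(rec.object_id)
    return dict(index)


records = [
    ObjectMetadata("msg-1", "alice", 2048, datetime(2024, 3, 1), "email", ["legal-hold"]),
    ObjectMetadata("doc-7", "bob", 10240, datetime(2024, 3, 2), "docx"),
    ObjectMetadata("msg-2", "alice", 512, datetime(2024, 3, 3), "email"),
]
index = build_index(records)
print(index["owner:alice"], index["tag:legal-hold"])  # ['msg-1', 'msg-2'] ['msg-1']
```

Because the index holds only identifiers, it can answer questions such as "which objects does this user own?" without reading the primary data itself.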

Primary storage devices 104 storing primary data 112 can be relatively fast and/or expensive technology (e.g., flash storage, a hard-disk array, solid state memory, or other disk storage), typically used to support high-performance live production environments. Primary data 112 can be highly changeable and/or intended for relatively short-term retention (e.g., hours, days, or weeks). Client computing device 102 can access primary data 112 stored on primary storage device 104 by making conventional file system calls through the operating system. Each client computing device 102 is generally associated with, or in communication with, one or more primary storage devices 104 that store the corresponding primary data 112. A client computing device 102 is considered to be associated with, or in communication with, a particular primary storage device 104 if it can do one or more of the following: retrieve data from the primary storage device 104, coordinate the retrieval of data from it, route and/or store data to it, or modify or delete data in it. Thus, a client computing device 102 may be said to access data stored in an associated primary storage device 104.

Primary storage devices 104 can be shared or dedicated. In some cases, each primary storage device 104 is dedicated to an associated client computing device 102. In other cases, one or more primary storage devices 104 are shared among multiple client computing devices 102, e.g., via a local network or in a cloud storage implementation. For example, primary storage device 104 can be a storage array shared by a group of client computing devices 102, such as EMC Clariion or EMC Symmetrix.

System 100 may also include hosted services (not shown), which may be hosted by an entity other than the one that employs the other components of system 100. For example, the hosted services may be offered by online service providers. Such providers can offer social networking services, hosted email services, hosted productivity applications, or other hosted applications, such as software-as-a-service (SaaS), platform-as-a-service (PaaS), application service providers (ASPs), cloud services, or other mechanisms for delivering functionality over a network. As it serves users, each hosted service may generate additional data and metadata, which system 100 may manage, e.g., as primary data 112. In some cases, the hosted services are accessed via one of the applications 110; for example, a hosted mail service can be accessed through a browser running on a client computing device 102.

Secondary Copies and Exemplary Secondary Storage Devices

In some instances, primary data 112 stored on primary storage devices 104 may be compromised, for example when an employee deliberately or accidentally deletes or overwrites it. Primary storage devices 104 may also be lost, damaged, or corrupted. For recovery and/or regulatory compliance purposes, it is therefore important to create and keep copies of primary data 112. Accordingly, system 100 includes one or more secondary storage computing devices 106 and one or more secondary storage devices 108 configured to create and store secondary copies 116 of primary data 112, including its associated metadata. Secondary storage computing devices 106 and secondary storage devices 108 may be referred to collectively as secondary storage subsystem 118.

Secondary copies 116 can help in search and analysis efforts and can meet other information management goals as well, such as: restoring data and/or metadata in the event of a disaster, deletion, corruption, or loss; allowing point-in-time recovery; complying with regulatory data retention and electronic discovery (e-discovery) requirements; facilitating the organization and search of data; improving user access to data files across multiple computing devices and/or hosted services; and implementing data retention and pruning policies.

A secondary copy 116 can comprise a separate stored copy of data that is derived from one or more earlier-created stored copies (e.g., derived from primary data 112 or from another secondary copy 116). Secondary copies 116 can include point-in-time data and may be intended for relatively long-term retention before some or all of the data is moved to other storage or discarded. A secondary copy 116 may be stored on a different storage device than other previously stored copies, and/or may be stored on the same storage device as primary data 112. For example, a disk array capable of performing hardware snapshots may store primary data 112 and also create and store hardware snapshots of that data as secondary copies 116. Secondary copies 116 may be stored in relatively slow and/or lower-cost storage (e.g., magnetic tape). A secondary copy 116 may be stored in a backup or archive format, or in some other format different from the native application format or other format of primary data 112.

Secondary storage computing devices 106 may index secondary copies 116 (e.g., using a media agent 144), enabling users to browse and restore at a later time and further enabling the lifecycle management of the indexed data. After creation of a secondary copy 116 that represents certain primary data 112, a pointer or other location indicator (e.g., a stub) may be placed in primary data 112 to indicate the current location of that secondary copy 116. System 100 may create and manage multiple secondary copies 116 of a particular data object or metadata in primary data 112, each copy representing the state of the data object at a particular point in time. Moreover, even after an instance of a data object in primary data 112 is deleted from primary storage device 104 and the file system, system 100 may continue to manage point-in-time representations of that data object. For virtualized computing devices, the operating system and other applications 110 of client computing devices 102 may execute under the management of virtualization software, and primary storage device(s) 104 may comprise a virtual disk created on a physical storage device. System 100 may create secondary copies 116 of the files or other data objects within a virtual disk and/or secondary copies 116 of the entire virtual disk file itself (e.g., of an entire .vmdk file).
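The stub and point-in-time behavior described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `Stub` and `SecondaryStore` classes, the `archive_and_stub` helper, and the `media://` location format are not part of the described system. The sketch assumes point-in-time versions are recorded in increasing timestamp order.

```python
from dataclasses import dataclass, field


@dataclass
class Stub:
    """Hypothetical location indicator left behind in primary data."""
    location: str


@dataclass
class SecondaryStore:
    # object name -> list of (timestamp, data), appended in increasing time order
    versions: dict = field(default_factory=dict)

    def copy(self, name: str, data: bytes, timestamp: int) -> str:
        """Store one point-in-time version and return its (made-up) location."""
        self.versions.setdefault(name, []).append((timestamp, data))
        return f"media://secondary/{name}@{timestamp}"

    def version_at(self, name: str, timestamp: int) -> bytes:
        """Return the newest stored version taken at or before `timestamp`."""
        older = [data for ts, data in self.versions.get(name, []) if ts <= timestamp]
        if not older:
            raise KeyError(f"no copy of {name} at or before {timestamp}")
        return older[-1]


def archive_and_stub(primary: dict, secondary: SecondaryStore, name: str, ts: int) -> None:
    """Copy an object to secondary storage and replace it with a stub."""
    primary[name] = Stub(secondary.copy(name, primary[name], ts))


primary = {"report.doc": b"v1"}
secondary = SecondaryStore()
secondary.copy("report.doc", primary["report.doc"], 100)  # earlier point-in-time copy
primary["report.doc"] = b"v2"                              # application modifies the object
archive_and_stub(primary, secondary, "report.doc", 200)
print(primary["report.doc"])  # Stub(location='media://secondary/report.doc@200')
del primary["report.doc"]     # the primary instance is deleted
print(secondary.version_at("report.doc", 150))  # b'v1'
```

Keeping every version in the secondary store, rather than overwriting, is what lets the earlier state remain recoverable after the primary instance is gone.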

Secondary copies 116 are distinguishable from the corresponding primary data 112. First, for various reasons (e.g., because they may be stored in a backup, archive, or other non-native format), secondary copies 116 may not be directly usable by applications 110 or client computing device 102 without processing by system 100, such as restore operations. Secondary copies 116 may have been processed by data agent 142 and/or media agent 144 in the course of being created (e.g., compression, deduplication, encryption, integrity markers, indexing, formatting, application-aware metadata, etc.), so a secondary copy 116 may represent source primary data 112 without necessarily being identical to the source.

Second, secondary copies 116 may be stored on a secondary storage device 108 that is not accessible to application 110 running on client computing device 102 and/or to a hosted service. Some secondary copies 116 may be “offline copies,” in that they are not readily available (e.g., not mounted to tape or disk). Offline copies can include copies of data that system 100 can access without human intervention (e.g., tapes within an automated tape library but not yet mounted in a drive) and copies that system 100 can access only with some human intervention (e.g., tapes located at an offsite storage site).

Creating Secondary Copies with Intermediate Devices: Secondary Storage Computing Devices

Creating secondary copies 116 can be challenging when hundreds of client computing devices 102 continually generate large volumes of primary data 112 to be protected. There can also be significant overhead involved in creating secondary copies 116. Moreover, specialized programmed intelligence and/or hardware capability is generally needed for accessing and interacting with secondary storage devices 108. Client computing devices 102 may interact directly with secondary storage devices 108 to create secondary copies 116, but in view of the factors described above, this approach can negatively impact the ability of client computing devices 102 to serve applications 110 and produce primary data 112. Further, client computing devices 102 may not be optimized for interaction with certain secondary storage devices 108.

Thus, system 100 may include one or more software and/or hardware components that generally act as intermediaries between client computing devices 102 (which generate primary data 112) and secondary storage devices 108 (which store secondary copies 116). In addition to offloading certain responsibilities from client computing devices 102, these intermediate components provide other benefits. For instance, as discussed with respect to FIG. 1D, distributing some of the work involved in creating secondary copies 116 can enhance scalability and improve system performance. Using specialized secondary storage computing devices 106 and media agents 144 to interface with secondary storage devices 108 and/or to perform certain data processing operations can greatly improve the speed with which system 100 performs information management operations and can also increase the capacity of the system to handle large numbers of such operations, while reducing the computational load on client computing devices 102. The intermediate components can include one or more secondary storage computing devices 106, as shown in FIG. 1A, and/or one or more media agents 144. Media agents are discussed further below (e.g., with respect to FIGS. 1C-1E). These special-purpose components comprise specialized programmed intelligence and/or hardware capability for writing to, reading from, instructing, communicating with, or otherwise interacting with secondary storage devices 108.

Secondary storage computing device(s) 106 can comprise any of the computing devices described above. In some cases, secondary storage computing device(s) 106 also include specialized hardware componentry and/or software intelligence (e.g., specialized interfaces) for interacting with certain secondary storage device(s) 108 with which they may be specially associated.

To create a secondary copy 116, which involves copying data from primary storage subsystem 117 to secondary storage subsystem 118, client computing device 102 may communicate the primary data 112 to be copied (or a processed version thereof generated by a data agent 142) to the designated secondary storage computing device 106 via a communication pathway 114. Secondary storage computing device 106 may in turn further process the data and convey it to secondary storage device 108. One or more secondary copies 116 may also be created from existing secondary copies 116, as in the case of an auxiliary copy operation.
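As a rough illustration of this data path and of an auxiliary copy, the Python sketch below models a secondary storage computing device that processes data before writing it to a secondary storage device, and then makes a further copy from the existing secondary copy rather than from primary data. All class and function names are hypothetical, and compression stands in for "processing" purely as an example.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class SecondaryStorageDevice:
    """Hypothetical stand-in for a secondary storage device 108."""
    name: str
    copies: dict = field(default_factory=dict)  # copy id -> stored bytes


@dataclass
class SecondaryStorageComputingDevice:
    """Hypothetical intermediary between client devices and secondary storage."""
    target: SecondaryStorageDevice

    def receive(self, copy_id: str, payload: bytes) -> None:
        # Example processing step before writing: compress the payload.
        self.target.copies[copy_id] = zlib.compress(payload)


def auxiliary_copy(source: SecondaryStorageDevice, dest: SecondaryStorageDevice,
                   copy_id: str) -> None:
    """Create another secondary copy from an existing one, not from primary data."""
    dest.copies[copy_id] = source.copies[copy_id]


disk = SecondaryStorageDevice("disk-library")
tape = SecondaryStorageDevice("tape-library")
intermediary = SecondaryStorageComputingDevice(disk)

primary_data = b"rows from a production database" * 10
intermediary.receive("job-1", primary_data)  # client sends data via the intermediary
auxiliary_copy(disk, tape, "job-1")          # auxiliary copy: disk copy to tape
print(zlib.decompress(tape.copies["job-1"]) == primary_data)  # True
```

Because the auxiliary copy reads only from the existing secondary copy, the client computing device is not involved a second time.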

Exemplary Secondary Data and Exemplary Primary Data

FIG. … illustrates examples of primary data 112 stored on primary storage device(s) 104 and of secondary copy data stored on secondary storage device(s) 108. The primary data 112 objects stored on primary storage device(s) 104 include word processing documents 119A-B, spreadsheets 120, presentation documents 122, video files 124, image files 126, email mailboxes 128 (and corresponding email messages 129A-C), HTML/XML files 130, databases 132, and corresponding tables or other data structures 133A-133C. Some or all primary data 112 objects are associated with corresponding metadata (e.g., Meta1-11), which may include file system metadata and/or application-specific metadata. Stored on secondary storage device(s) 108 are secondary copy data objects 134A-C, which may include copies of, or may otherwise represent, corresponding primary data 112.

Secondary copy data objects 134A-C can individually represent more than one primary data object. For example, secondary copy data object 134A represents three separate primary data objects 133C, 122, and 129C (represented as 133C′, 122′, and 129C′, respectively, and accompanied by corresponding metadata Meta11, Meta3, and Meta8, respectively). Moreover, as indicated by the prime mark (′), secondary storage computing devices 106 or other components in secondary storage subsystem 118 may process the data received from primary storage subsystem 117 and store a secondary copy that includes a transformed and/or supplemented representation of a primary data object and/or metadata, differing from the original format, e.g., in a compressed, encrypted, deduplicated, or other modified format. For instance, secondary storage computing devices 106 can generate new metadata or other information based on such processing and store the newly generated information along with the secondary copies. Secondary copy data object 134B represents primary data objects 120, 133B, and 119A as 120′, 133B′, and 119A′, respectively, accompanied by corresponding metadata Meta2, Meta10, and Meta1, respectively. Secondary copy data object 134C represents primary data objects 133A and 119B as 133A′ and 119B′, respectively, accompanied by corresponding metadata Meta9 and Meta5, respectively.
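To make the idea of one secondary copy data object representing several primary objects in a transformed format more concrete, here is a minimal Python sketch. The JSON-plus-zlib container, the function names, and the use of an ASCII apostrophe for the prime mark are all assumptions for illustration, not the format used by the described system.

```python
import json
import zlib


def build_secondary_object(members: dict) -> bytes:
    """Bundle several primary objects and their metadata into one compressed blob.

    `members` maps a primary object id (e.g. "133C") to a (data, metadata) pair.
    Keys are stored with a trailing apostrophe to mirror the prime mark.
    """
    payload = {
        f"{obj_id}'": {"data": data.hex(), "meta": meta}
        for obj_id, (data, meta) in members.items()
    }
    return zlib.compress(json.dumps(payload, sort_keys=True).encode("utf-8"))


def read_secondary_object(blob: bytes) -> dict:
    """Reverse the transformation, as a simplified restore would."""
    payload = json.loads(zlib.decompress(blob).decode("utf-8"))
    return {key: (bytes.fromhex(v["data"]), v["meta"]) for key, v in payload.items()}


obj_134a = build_secondary_object({
    "133C": (b"table rows", "Meta11"),
    "122": (b"slides", "Meta3"),
    "129C": (b"email body", "Meta8"),
})
restored = read_secondary_object(obj_134a)
print(sorted(restored))  # ["122'", "129C'", "133C'"]
```

The stored blob is not byte-identical to any of the source objects, which mirrors the point that a secondary copy can represent primary data without being an exact duplicate of it.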

Exemplary Information Management System Architecture

System 100 can incorporate a variety of different hardware and software components, which can in turn be organized with respect to one another in many different configurations, depending on the embodiment. There are critical design choices involved in specifying the functional responsibilities of the components and the role of each component in system 100, and these choices can have a significant impact on how system 100 adapts to data growth and other changing circumstances. FIG. 1C shows a system 100 designed according to these considerations, which includes: storage manager 140; one or more data agents 142 executing on client computing device(s) 102 and configured to process primary data 112; and one or more media agents 144 executing on one or more secondary storage computing devices 106 for performing tasks involving secondary storage devices 108.

Storage Manager

Storage manager 140 is a centralized storage and/or information manager that is configured to perform certain control functions and to store certain critical information about system 100. Because system 100 may include a large number of components and manage a large amount of data, managing those components and that data is a complex task, and one that can become more difficult as the organization and its data grow. According to certain embodiments, responsibility for controlling system 100, or at least a significant portion of that responsibility, is allocated to storage manager 140. Storage manager 140 can be adapted independently according to changing circumstances, without having to replace or redesign the remainder of the system. Moreover, a computing device for hosting and/or operating as storage manager 140 can be selected to best suit the functions and networking needs of storage manager 140. These and other advantages are described in further detail below with respect to FIG. 1D.

Storage manager 140 may be a software module or other application hosted by a suitable computing device. In some embodiments, storage manager 140 is itself a computing device that performs the functions described herein. Depending on the configuration, storage manager 140 comprises or operates in conjunction with one or more associated data structures (e.g., management database 146). Storage manager 140 generally initiates, performs, coordinates, and/or controls storage and other information management operations, e.g., to protect and control primary data 112 and secondary copies 116. In general, storage manager 140 is said to manage system 100, which includes communicating with, instructing, and, in some circumstances, controlling components such as data agents 142 and media agents 144.

As shown by the dashed arrowed lines 114 in FIG. 1C, storage manager 140 may communicate with, instruct, and/or control some or all elements of system 100, such as data agents 142 and media agents 144. In this manner, storage manager 140 manages the operation of various hardware and software components in system 100. In certain embodiments, control information originates from storage manager 140, and status and index reporting is transmitted to storage manager 140 by the managed components, whereas payload data and metadata are generally communicated between data agents 142 and media agents 144 (or otherwise between client computing device(s) 102 and secondary storage computing device(s) 106), at the direction of and under the management of storage manager 140. Control information can generally include parameters and instructions for carrying out information management operations, such as instructions to perform a task associated with an operation, timing information specifying when to initiate a task, and data path information specifying what components to communicate with or access in carrying out an operation. In other embodiments, some information management operations are controlled or initiated by other components of system 100 (e.g., by media agents 144 or data agents 142), instead of or in combination with storage manager 140.

According to certain embodiments, storage manager 140 provides one or more of the following functions:

Storage manager 140 may maintain an associated database 146 (or “storage manager database 146” or “management database 146”) of management-related data and information management policies 148. Database 146 is stored in computer memory accessible by storage manager 140. Database 146 may include a management index 150 (or “index 150”) or other data structure(s) that may store, e.g., management tasks, media containerization information, other useful data, and/or any combination thereof. For example, storage manager 140 may use index 150 to track logical associations between media agents 144 and secondary storage devices 108 and/or movement of data to/from secondary storage devices 108. For instance, index 150 may store data associating a client computing device 102 with a particular media agent 144 and/or secondary storage device 108, as specified in an information management policy 148.
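The associations tracked in index 150 can be pictured as a pair of simple lookup structures. The following Python sketch is hypothetical throughout (the `ManagementIndex` class, its methods, and the component names are invented for illustration); it only shows one plausible way to record client-to-media-agent-to-device associations and data movements.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementIndex:
    """Toy model of management index 150; the structure is an assumption."""
    # client computing device -> (media agent, secondary storage device)
    routes: dict = field(default_factory=dict)
    # (secondary storage device, job id, "to" or "from") records of data movement
    movements: list = field(default_factory=list)

    def associate(self, client: str, media_agent: str, device: str) -> None:
        """Record the association specified by an information management policy."""
        self.routes[client] = (media_agent, device)

    def route_for(self, client: str) -> tuple:
        return self.routes[client]

    def record_movement(self, device: str, job_id: str, direction: str) -> None:
        if direction not in ("to", "from"):
            raise ValueError("direction must be 'to' or 'from'")
        self.movements.append((device, job_id, direction))

    def jobs_touching(self, device: str) -> list:
        """Job ids that moved data to or from `device`."""
        return [job for dev, job, _ in self.movements if dev == device]


index = ManagementIndex()
index.associate("client-102A", "media-agent-144A", "tape-library-108A")
index.record_movement("tape-library-108A", "job-7", "to")
print(index.route_for("client-102A"))  # ('media-agent-144A', 'tape-library-108A')
```

In a real deployment this information would live in management database 146 (e.g., relational tables); in-memory dictionaries are used here only to keep the sketch self-contained.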

Administrators and other users may configure and initiate certain information management operations on an individual basis. While this may be acceptable for some recovery operations or other infrequently performed tasks, it is often impractical for implementing ongoing, organization-wide data protection and management. Thus, system 100 may use information management policies 148 to specify and execute information management operations on an automated basis. Generally, an information management policy 148 can include a stored data structure or other information source that specifies parameters (e.g., criteria and rules) associated with storage management or other information management operations. Storage manager 140 can process information management policies 148 and/or index 150 and, based on the results, identify an information management operation to perform, identify the components of system 100 to be involved in the operation, establish connections to and/or between those components, and/or instruct and control those components to carry out the operation. In this manner, system 100 can translate stored information into coordinated activity among the various computing devices of system 100.
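The policy-driven flow described above (read stored policies, then identify the operation and the components involved) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Policy` fields, the `plan_operations` function, and the `placement` mapping are hypothetical simplifications, not the structure of an actual information management policy 148.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, heavily simplified information management policy 148."""
    name: str
    subclients: tuple   # subclients (data sets) the policy covers
    operation: str      # e.g. "backup" or "archive"
    media_agent: str    # media agent that should handle the data
    destination: str    # secondary storage device to write to


def plan_operations(policies, placement):
    """Translate stored policies into operations plus the components involved.

    `placement` maps each subclient to the (client device, data agent) hosting it.
    """
    plans = []
    for policy in policies:
        for subclient in policy.subclients:
            client, data_agent = placement[subclient]
            plans.append({
                "policy": policy.name,
                "operation": policy.operation,
                "subclient": subclient,
                "components": (client, data_agent, policy.media_agent, policy.destination),
            })
    return plans


policies = [Policy("nightly-db", ("sales-db",), "backup", "media-agent-1", "disk-library")]
placement = {"sales-db": ("client-102", "database-data-agent")}
for plan in plan_operations(policies, placement):
    print(plan["operation"], plan["components"])
# backup ('client-102', 'database-data-agent', 'media-agent-1', 'disk-library')
```

A storage manager would then go on to contact each listed component; that step is omitted here.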

Management database 146 may maintain information management policies 148 and associated data, although information management policies 148 can be stored in computer memory at any appropriate location outside management database 146. For instance, depending on the embodiment, an information management policy 148 such as a storage policy may be stored as metadata in a media agent database 152 or in a secondary storage device 108 (e.g., as an archive copy). Information management policies 148 are described further below. According to certain embodiments, management database 146 comprises a relational database (e.g., an SQL database) for tracking metadata, such as metadata associated with secondary copy operations (e.g., which client computing devices 102 and corresponding subclient data were protected, where the secondary copies are stored, and which component performed the storage operation). This and other metadata may additionally be stored in other locations, such as at secondary storage computing devices 106 or on secondary storage devices 108, allowing data recovery without the use of storage manager 140 in some cases. Thus, management database 146 may comprise data needed to initiate secondary copy operations (e.g., storage policies, schedule policies, etc.), status and reporting information about completed jobs (e.g., status and error reports on yesterday's backup jobs), and additional information sufficient to enable restore and disaster recovery operations (e.g., media agent associations, location indexing, content indexing, etc.).

Storage manager 140 may include a jobs agent 156, a user interface 158, and a management agent 154, all of which may be implemented as interconnected software modules or application programs. These are described further below.

According to certain embodiments, jobs agent 156 initiates, controls, and/or monitors some or all information management operations previously performed, currently being performed, or scheduled to be performed by system 100. A job is a logical grouping of information management operations, such as daily storage operations scheduled for a certain set of subclients (e.g., generating incremental block-level backup copies 116 at a certain time every day for database files in a certain geographical location). Jobs agent 156 may access information management policies 148 (e.g., in management database 146) to determine when, where, and how to initiate and control jobs in system 100.
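The daily job example above can be made concrete with a small scheduling helper. The following Python sketch is an assumption-laden illustration: the `Job` fields, `next_run`, `due_jobs`, and the one-minute dispatch window are all hypothetical and are not taken from jobs agent 156 itself.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class Job:
    """A logical grouping of operations for a set of subclients (simplified)."""
    name: str
    subclients: tuple
    operation: str
    daily_start: time


def next_run(job: Job, now: datetime) -> datetime:
    """Next daily start time for `job` at or after `now`."""
    candidate = datetime.combine(now.date(), job.daily_start)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


def due_jobs(jobs, now: datetime, window: timedelta = timedelta(minutes=1)):
    """Jobs whose next start falls within `window` after `now`."""
    return [job for job in jobs if next_run(job, now) - now < window]


nightly = Job("db-incremental", ("sales-db-files",), "incremental block-level backup", time(1, 0))
print(next_run(nightly, datetime(2024, 5, 1, 2, 30)))  # 2024-05-02 01:00:00
print(due_jobs([nightly], datetime(2024, 5, 1, 1, 0)) == [nightly])  # True
```

A polling loop that calls `due_jobs` periodically is one simple design; an event-driven scheduler would serve equally well.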

Storage Manager User Interfaces

User interface 158 may include information processing and display software, such as a graphical user interface (GUI), an application program interface (API), and/or other interactive interface(s), through which users and system processes can retrieve information about the status of information management operations or issue instructions to storage manager 140 and other components. Via user interface 158, users may issue instructions to the components of system 100 regarding performance of secondary copy and recovery operations. For example, a user may modify a schedule concerning the number of pending secondary copy operations. As another example, a user may use the GUI to view the status of pending secondary copy jobs or to monitor the status of certain components of system 100 (e.g., the amount of storage capacity remaining in a storage device). Storage manager 140 may track information that permits it to select, designate, or otherwise identify content indices, deduplication databases, or similar resources or data sets within its information management cell (or another cell) to be searched in response to certain queries. Such queries may be entered by users via user interface 158.

Various embodiments of system 100 may be configured to generate user interface data usable for rendering various interactive user interfaces. The user interface data may be used by system 100 and/or by another system, device, or software program (for example, a browser program) to render the interactive user interfaces. The interactive user interfaces may be displayed on electronic displays (including touch-enabled displays), consoles, and other devices, whether directly connected to storage manager 140 or communicatively coupled remotely, e.g., via an internet connection. This disclosure describes various embodiments of interactive and dynamic user interfaces, some of which may be generated by user interface 158 and which are the result of significant technological development. These user interfaces may provide improved human-computer interactions, allowing for significant cognitive and ergonomic efficiencies and advantages over other systems, including reduced mental workloads and improved decision-making. User interface 158 may operate in a single integrated view or console (not shown). The console may support a reporting capability for generating a variety of reports, which may be tailored to a particular aspect of information management.

User interfaces are not exclusive to storage manager 140; in some embodiments, a user may access information locally from a computing device component of system 100. For example, some information pertaining to installed data agents 142 and associated data streams may be available from client computing device 102. Likewise, some information pertaining to media agents 144 and associated data streams may be available from secondary storage computing device 106.

Storage Manager Management Agent

Management agent 154 can provide storage manager 140 with the ability to communicate with other components within system 100 and/or with other information management cells via network protocols and application programming interfaces (APIs), including, e.g., HTTP, HTTPS, FTP, REST, virtualization software APIs, cloud service provider APIs, and hosted service provider APIs. Management agent 154 also allows multiple information management cells to communicate with one another. For example, system 100 may in some cases be one information management cell in a network of multiple cells that are adjacent to one another or otherwise logically related, e.g., in a WAN or LAN. With this arrangement, the cells may communicate with one another through their respective management agents 154. Inter-cell communication and hierarchy are described in greater detail in, e.g., U.S. Pat. No. 7,343,453.

Information Management Cell

An “information management cell” (or “storage operation cell”) may generally include a logical and/or physical grouping of a combination of hardware and software components associated with performing information management operations on electronic data, typically including at least one storage manager 140, at least one data agent 142 (executing on a client computing device 102), and at least one media agent 144 (executing on a secondary storage computing device 106). For instance, the components shown in FIG. 1C may together form an information management cell. Thus, in some configurations, system 100 may be referred to as an information management cell or a storage operation cell. A given cell may be identified by the identity of its storage manager 140, which is generally responsible for managing the cell.

Multiple cells may be organized hierarchically, so that cells may inherit properties from hierarchically superior cells or be controlled by other cells in the hierarchy (automatically or otherwise). In some embodiments, cells may inherit or otherwise be associated with information management policies, preferences, or operational parameters according to their relative position in the hierarchy. Cells may also be organized hierarchically according to geography, architectural considerations, or other factors useful or desirable in performing information management operations. For example, a first cell may represent a geographic segment of an enterprise, such as a Chicago office, and a second cell may represent a different geographic segment, such as a New York City office. Other cells may represent departments within a particular office, such as human resources, finance, and engineering. A first cell may perform one or more first types of information management operations at a certain frequency, and a second cell may perform one or more second types of information management operations at a different frequency and under different retention rules. In general, the hierarchical information is maintained by one or more storage managers 140 that manage the respective cells (e.g., in the corresponding management database(s) 146).
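Property inheritance across a cell hierarchy can be illustrated with a parent-linked structure in which each cell overrides only what it sets itself. The Python sketch below is hypothetical: the `Cell` class, the setting names such as `retention_days`, and their values are invented for illustration and do not come from the described system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cell:
    """Hypothetical information management cell with an optional parent cell."""
    name: str
    parent: Optional["Cell"] = None
    settings: dict = field(default_factory=dict)

    def effective_settings(self) -> dict:
        """Inherit the parent's settings; values set on this cell take precedence."""
        inherited = self.parent.effective_settings() if self.parent else {}
        return {**inherited, **self.settings}


enterprise = Cell("enterprise", settings={"retention_days": 30, "copy_frequency": "daily"})
chicago = Cell("chicago-office", parent=enterprise)
finance = Cell("chicago-finance", parent=chicago, settings={"retention_days": 365})
print(finance.effective_settings())  # {'retention_days': 365, 'copy_frequency': 'daily'}
```

Resolving settings at lookup time, rather than copying them into each cell, means a change made at a superior cell is picked up immediately by every cell beneath it.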

Data Agents

A variety of different applications 110 can operate on a given client computing device 102, including operating systems, file systems, database applications, email applications, and virtual machines. Client computing device 102 may be tasked with processing and preparing the primary data 112 generated by these various applications 110. Moreover, the nature of the processing/preparation can differ across application types, e.g., due to inherent structural, state, and formatting differences among applications 110 and/or the operating system of client computing device 102. In some embodiments, each data agent 142 is therefore advantageously configured to assist in the performance of information management operations based on the type of data being protected.

Data agent 142 is a component of information management system 100 and is generally directed by storage manager 140 in creating or restoring secondary copies 116. Data agent 142 may be a software program (e.g., a set of executable binary files) that executes on the same client computing device 102 as the associated application 110 that data agent 142 is configured to protect. Data agent 142 is generally responsible for managing, initiating, or otherwise assisting in the performance of information management operations with respect to its associated application(s) 110 and the corresponding primary data 112 that is generated and/or accessed by those application(s) 110. For instance, data agent 142 may take part in copying, archiving, and migrating certain primary data 112 stored in primary storage device(s) 104. Data agent 142 may receive control information from storage manager 140, such as commands to transfer copies of data objects and/or metadata to one or more media agents 144. Data agent 142 also may compress, deduplicate, and encrypt certain primary data 112, as well as capture application-related metadata, before transmitting the processed data to media agent 144. Data agent 142 also may receive instructions from storage manager 140 to restore (or assist in restoring) a secondary copy 116 from secondary storage device 108 to primary storage device 104, so that application 110 can access the restored data in a suitable format.
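The compress-and-deduplicate step a data agent may perform before handing data to a media agent can be sketched with fixed-size chunking and content hashes. This is a simplified Python illustration under stated assumptions: the function name, the 4096-byte chunk size, and the record format are hypothetical, fixed-size chunking is only one possible deduplication technique, and encryption is omitted.

```python
import hashlib
import zlib


def prepare_for_media_agent(data: bytes, known_hashes: set, chunk_size: int = 4096) -> list:
    """Deduplicate and compress data before sending it on (simplified sketch).

    Splits `data` into fixed-size chunks and returns (sha256 digest, payload)
    records. Payload is the zlib-compressed chunk, or None when the chunk's
    digest is already in `known_hashes` and only a reference needs to be sent.
    `known_hashes` is updated in place.
    """
    records = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in known_hashes:
            records.append((digest, None))  # duplicate: send a reference only
        else:
            known_hashes.add(digest)
            records.append((digest, zlib.compress(chunk)))
    return records


seen = set()
records = prepare_for_media_agent(b"A" * 8192 + b"B" * 100, seen)
print([payload is None for _, payload in records])  # [False, True, False]
```

Sharing `known_hashes` across calls lets later backups skip chunks already sent by earlier ones; production systems typically keep this in a persistent deduplication database rather than an in-memory set.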

Each data agent 142 can be configured to access data and/or metadata stored in the primary storage device(s) 104 associated with its host client computing device 102 and to process the data appropriately. For example, during a secondary copy operation, data agent 142 may arrange or assemble the data and metadata into one or more files before transferring the file(s) to a media agent 144 or other component; the file(s) may include a list of files or other metadata. In some embodiments, a data agent 142 may be distributed between client computing device 102 and storage manager 140 (and any other intermediate components), may be deployed from a remote location, or may have its functions approximated by a remote process that performs some or all of the functions of data agent 142. A data agent 142 may also perform some functions provided by media agent 144. Some embodiments may employ one or more generic data agents 142 that can handle and process data from two or more different applications 110, or that can handle and process multiple data types, instead of or in addition to using specialized data agents 142. For example, one generic data agent 142 may be used to back up, migrate, and restore Microsoft Exchange Mailbox data and Microsoft Exchange Database data, while another generic data agent may handle Microsoft Exchange Public Folder data and Microsoft Windows File System data.

Media Agents

As noted, offloading certain responsibilities from client computing devices 102 to intermediate components such as secondary storage computing device(s) 106 and corresponding media agent(s) 144 can provide a number of benefits, including faster and more reliable information management operations and improved scalability. For example, media agent 144 can act as a local cache of recently copied data and/or metadata that it has stored to secondary storage device(s) 108, thus improving restore capabilities and performance for the cached data.

Media agent 144 is a component of system 100 and is generally directed by storage manager 140 in creating and restoring secondary copies 116. Whereas storage manager 140 generally manages system 100 as a whole, media agent 144 provides a portal to certain secondary storage devices 108, such as by having specialized features for communicating with and accessing certain associated secondary storage devices 108. Media agent 144 may be a software program that executes on a secondary storage computing device 106. Media agent 144 generally manages, coordinates, and facilitates the transmission of data between a data agent 142 (executing on client computing device 102) and the secondary storage device(s) 108 associated with media agent 144. Other components of the system may interact with media agent 144 to gain access to data stored on the associated secondary storage device(s) 108. Moreover, media agents 144 can generate and store information relating to characteristics of the stored data and/or metadata, or other types of information that generally provide insight into the contents of secondary storage devices 108; this is generally referred to as indexing the stored secondary copies 116. In some cases, a single media agent 144 operates on a given secondary storage computing device 106, while in other cases multiple media agents 144 operate on the same secondary storage computing device 106.

A media agent 144 may be deemed to be “associated with” a particular secondary storage device 108 if that media agent 144 is capable of one or more of: routing and/or storing data to that secondary storage device 108; retrieving data from that secondary storage device 108; and coordinating the retrieval of data from that secondary storage device 108. In certain embodiments, a media agent 144 is physically separate from its associated secondary storage device 108. For instance, a media agent 144 may operate on a secondary storage computing device 106 in a different housing, package, and/or location from the associated secondary storage device 108. In one example, a media agent 144 operates on a server computer and interacts with secondary storage devices 108 in separate rack-mounted RAID-based systems.

A media agent 144 associated with a particular secondary storage device 108 may instruct that secondary storage device 108 to perform an information management task. For instance, a media agent 144 may instruct a tape library to load or eject certain storage media and then to archive, migrate, or retrieve data to or from that media, e.g., for the purpose of restoring data to a client computing device 102. As another example, a secondary storage device 108 may include an array of hard disk drives or solid state drives, and media agent 144 may forward a logical unit number (LUN) and other relevant information to the array, which uses the received information to carry out the secondary copy operation. Media agent 144 may communicate with secondary storage device 108 via a suitable communication link, such as a SCSI or Fibre Channel link.

Each media agent 144 may maintain an associated media agent database 152. Media agent database 152 may be stored on a disk or other storage device (not shown) that is local to the secondary storage computing device 106 on which media agent 144 executes. In other cases, media agent database 152 is stored separately from the host secondary storage computing device 106. Media agent database 152 can include, among other things, a media agent index 153 (see, e.g., FIG. 1C). In some cases, media agent index 153 does not form part of media agent database 152.

Media agent index 153 (or “index 153”) provides a fast and efficient mechanism for locating/browsing secondary copies 116 or other data stored in secondary storage devices 108 without having to access secondary storage device 108 to retrieve the information from there. For instance, for each secondary copy 116, index 153 may include metadata such as a list of the data objects (e.g., files/subdirectories, database objects, mailbox objects, etc.), a path to the secondary copy 116 on the corresponding secondary storage device 108, location information (e.g., offsets) indicating where the data objects are stored in secondary storage device 108, and when the data objects were created or modified. Index 153 thus includes metadata associated with secondary copies 116 that is readily available for use by media agent 144. In some embodiments, some or all of the information in index 153 may instead or additionally be stored along with secondary copies 116 in secondary storage device 108. In some embodiments, a secondary storage device 108 can include sufficient information to enable a “bare metal restore.”
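The kind of per-object metadata listed above (object names, copy paths, offsets, modification times) is enough to browse and locate data without reading secondary storage. The Python sketch below models this under stated assumptions: the `IndexEntry` layout, the `MediaAgentIndex` class, and all paths and offsets are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IndexEntry:
    """One record of a simplified media agent index 153 (layout is assumed)."""
    object_name: str   # file, database object, mailbox object, etc.
    copy_path: str     # path to the secondary copy on its secondary storage device
    offset: int        # byte offset of the object within that copy
    length: int
    modified: datetime


class MediaAgentIndex:
    def __init__(self) -> None:
        self._entries: dict = {}  # object name -> list of IndexEntry

    def add(self, entry: IndexEntry) -> None:
        self._entries.setdefault(entry.object_name, []).append(entry)

    def browse(self, prefix: str) -> list:
        """List indexed object names; no secondary storage access needed."""
        return sorted(name for name in self._entries if name.startswith(prefix))

    def locate(self, name: str) -> IndexEntry:
        """Return the entry with the latest modification time for `name`."""
        return max(self._entries[name], key=lambda e: e.modified)


idx = MediaAgentIndex()
idx.add(IndexEntry("/finance/q1.xlsx", "/lib1/copy-0001", 0, 2048, datetime(2024, 4, 1)))
idx.add(IndexEntry("/finance/q1.xlsx", "/lib1/copy-0002", 4096, 2100, datetime(2024, 4, 2)))
idx.add(IndexEntry("/finance/q2.xlsx", "/lib1/copy-0002", 6196, 1800, datetime(2024, 4, 2)))
print(idx.browse("/finance/"))  # ['/finance/q1.xlsx', '/finance/q2.xlsx']
hit = idx.locate("/finance/q1.xlsx")
print(hit.copy_path, hit.offset)  # /lib1/copy-0002 4096
```

With the path and offset in hand, a restore can seek directly to the object inside the secondary copy instead of scanning the media.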

“Index 153 may operate as a cache, in which case it is also known as an ‘index cache.’ Index cache 153 typically contains data reflecting recent secondary copy operations. After a triggering event, such as index cache 153 reaching a certain size, portions of index cache 153 may be copied or migrated to secondary storage device 108 or to another location. This information can later be retrieved and uploaded back into index cache 153, or otherwise restored to media agent 144, to facilitate retrieval of data from secondary storage device(s) 108. In some embodiments, the cached information may include format or containerization information related to archives or other files stored on storage device(s) 108.
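The size-triggered spill and later reload described above can be sketched as follows. This is an illustration under stated assumptions: the "size" is an entry count, the least recently recorded entries spill first, and `spill_store` is a plain dict standing in for secondary storage device 108. None of these details come from the source.

```python
from collections import OrderedDict


class IndexCache:
    """Index cache that migrates its oldest entries once it exceeds max_entries.

    `spill_store` is a hypothetical stand-in for secondary storage; any
    dict-like object works for this sketch.
    """

    def __init__(self, max_entries: int, spill_store: dict) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self.spill_store = spill_store
        self._entries: OrderedDict = OrderedDict()

    def record(self, copy_id: str, metadata: dict) -> None:
        """Record metadata for a recent secondary copy operation."""
        self._entries[copy_id] = metadata
        self._entries.move_to_end(copy_id)  # mark as most recently recorded
        # Triggering event: the cache has grown past its size limit.
        while len(self._entries) > self.max_entries:
            old_id, old_meta = self._entries.popitem(last=False)
            self.spill_store[old_id] = old_meta

    def lookup(self, copy_id: str) -> dict:
        """Return metadata, uploading it back from the spill store if it was migrated."""
        if copy_id not in self._entries:
            if copy_id not in self.spill_store:
                raise KeyError(copy_id)
            self.record(copy_id, self.spill_store.pop(copy_id))
        return self._entries[copy_id]

    def __contains__(self, copy_id: str) -> bool:
        return copy_id in self._entries
```

Reloading an entry can itself push a different entry out, which keeps the cache bounded while still letting older metadata be recovered on demand.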

Media agent 144 may also act as a facilitator or coordinator of secondary copy operations between client computing device 102 and secondary storage device 108 without actually writing the data to secondary storage device 108. For instance, storage manager 140 (or media agent 144) can instruct client computing device 102 and secondary storage device 108 to communicate directly with each other. In that case, client computing device 102 transmits data to secondary storage device 108, either directly or via one or more intermediary components, according to the received instructions. Media agent 144 can still receive, process, and/or maintain metadata related to the secondary copy operations; for example, it may continue to build and maintain index 153. In these embodiments, payload data can flow through media agent 144 for the purpose of populating index 153, but not for writing to secondary storage device 108. In some cases, media agent 144 and/or other components, such as storage manager 140, may include additional functionality such as data classification and content indexing. These and other functions are described in more detail below.
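The division of labor in this coordinator mode can be sketched in a few lines: the client writes each chunk directly to the device, and the media agent only sees the chunk (and where it landed) so it can update its index. The classes and the block-number addressing are hypothetical and for illustration only.

```python
class StorageDevice:
    """Hypothetical in-memory stand-in for a secondary storage device."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []

    def write(self, chunk: bytes) -> int:
        self.blocks.append(chunk)
        return len(self.blocks) - 1  # block number where the chunk was stored


class CoordinatingMediaAgent:
    """Media agent that indexes payload it observes but never writes to the device."""

    def __init__(self) -> None:
        self.index: dict[str, list[tuple[int, int]]] = {}

    def observe(self, object_name: str, block_no: int, data: bytes) -> None:
        # Payload passes through only to populate the index: (block, size) per chunk.
        self.index.setdefault(object_name, []).append((block_no, len(data)))


def client_direct_copy(chunks, device: StorageDevice, agent: CoordinatingMediaAgent) -> None:
    """Client sends each (name, data) chunk straight to the device, then informs the agent."""
    for name, data in chunks:
        block_no = device.write(data)      # direct client-to-device data path
        agent.observe(name, block_no, data)  # media agent maintains metadata only
```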

“Distributed, Scalable Architecture”

“System 100’s functions can be divided among various physical and/or logical components, as described. In some cases, one or more of storage manager 140, data agents 142, and media agents 144 may reside on different computing devices. This architecture can provide many benefits, such as the ability to tailor hardware and software choices to each component’s specific function. For example, the secondary storage computing devices 106 on which media agents 144 operate can be customized for interaction with secondary storage devices 108 and for specific tasks such as index cache operations. Likewise, client computing devices 102 can be selected to effectively service applications 110 that produce and store primary data 112.

“Moreover, in certain cases, information management system 100 may be distributed across several computing devices. For instance, in large file systems where the amount of data is large, database 146 can be migrated to or stored on a separate database server, such as an SQL server. This distributed configuration provides additional protection, because database 146 can be protected with standard database utilities, such as SQL log shipping or database replication, independently of other storage manager 140 functions. Database 146 can also be readily replicated to a remote site for use in the event of a disaster at the primary site. Alternatively, database 146 can be replicated to another computing device at the same site, for example a faster machine, if the host computing device of storage manager 140 can no longer support the growing system 100.

“The distributed architecture also provides efficiency and scalability. FIG. 1D illustrates an information management system 100 that includes a plurality of client computing devices 102 and associated data agents 142, as well as a plurality of secondary storage computing devices 106 and associated media agents 144. Components can be added or removed as the needs of system 100 change. For example, administrators can add client computing devices 102 and secondary storage computing devices 106 depending on where bottlenecks are identified. Where multiple fungible components are available, load balancing can also be used to address bottlenecks. For instance, storage manager 140 could dynamically select which media agents 144 and/or secondary storage devices 108 to use for storage operations based on a processing load analysis.
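One simple way to picture a "processing load analysis" is to route each new job to the media agent with the lowest ratio of running jobs to capacity. The sketch below assumes that metric and a per-agent `max_jobs` limit purely for illustration; the source does not say how storage manager 140 measures load.

```python
from dataclasses import dataclass


@dataclass
class MediaAgentStatus:
    name: str
    active_jobs: int  # jobs currently running on this media agent
    max_jobs: int     # hypothetical capacity limit used for the load ratio


def pick_media_agent(agents: list[MediaAgentStatus]) -> MediaAgentStatus:
    """Return the agent with the lowest load ratio; ties go to the earlier agent in the list."""
    available = [a for a in agents if a.active_jobs < a.max_jobs]
    if not available:
        raise RuntimeError("all media agents are at capacity")
    return min(available, key=lambda a: a.active_jobs / a.max_jobs)


def assign_job(agents: list[MediaAgentStatus]) -> str:
    """Choose an agent for one new job and account for the added load."""
    chosen = pick_media_agent(agents)
    chosen.active_jobs += 1
    return chosen.name
```

Recomputing the choice for every job, rather than fixing assignments in advance, is what lets the load spread out as agents fill up.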

“Where system 100 contains multiple media agents 144 (see, for example, FIG. 1D), one media agent 144 can provide failover functionality for another media agent 144 that has failed. Media agents 144 can also be selected dynamically to provide load balancing. Client computing devices 102 can communicate with any of the media agents 144, e.g., as directed by storage manager 140, and each media agent 144 can communicate with any of the secondary storage devices 108, e.g., as directed by storage manager 140. Operations can therefore be routed to secondary storage devices 108 dynamically and in highly flexible ways to provide failover, load balancing, and other functions. Additional examples of scalable systems capable of dynamic storage operations, load balancing, and failover are provided in U.S. Pat. No. 7,246,207.”
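Failover routing of the kind described above can be sketched as "try each candidate media agent in order until one accepts the job." The `send` callable and the `MediaAgentUnavailable` exception are hypothetical placeholders for whatever transport and error signaling a real system uses.

```python
from typing import Callable


class MediaAgentUnavailable(Exception):
    """Raised by a (hypothetical) send function when a media agent cannot take a job."""


def route_with_failover(job: str, agents: list[str], send: Callable[[str, str], str]) -> str:
    """Try each media agent in turn; the first one that accepts the job handles it."""
    errors = []
    for agent in agents:
        try:
            return send(agent, job)
        except MediaAgentUnavailable as exc:
            errors.append(f"{agent}: {exc}")  # remember why this agent was skipped
    raise RuntimeError(f"no media agent could take {job!r}: " + "; ".join(errors))
```

In practice the candidate order could itself come from a load-based ranking, so failover and load balancing share the same routing step.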

While distributing functionality across multiple computing devices has certain advantages, in other situations it can be more beneficial to consolidate functionality. In alternative configurations, certain components may reside and execute on the same computing device. In some embodiments, any one or more of the components shown in FIG. 1C can be implemented on the same computing device. For example, one configuration may implement storage manager 140 and one or more data agents 142 on the same computing device, while other embodiments implement one or more data agents 142, one or more media agents 144, and storage manager 140 all on the same computing device. These configurations are examples, not limitations.

“Exemplary Types of Information Management Operations, including Storage Operations”

“In order to protect stored data and maximize its potential value, system 100 can be configured to perform a variety of information management operations, which may also be called storage management operations or storage operations. These operations can generally include (i) data movement operations; (ii) processing and data manipulation operations; and (iii) analysis, reporting, and management operations.

“Data Movement Operations, including Secondary Copy Operations”

Data movement operations are storage operations that involve copying or migrating data between different locations within system 100. For example, data movement operations include operations in which data is copied, migrated, or otherwise transferred from one or more storage devices to one or more other storage devices.

Data movement operations include backup operations, archive operations, information lifecycle management operations such as hierarchical storage management, replication operations (e.g., continuous data replication), snapshot operations, deduplication or single-instancing operations, auxiliary copy operations, and disaster-recovery copy operations. As will be explained, some of these operations do not necessarily create separate copies. Nonetheless, because they involve secondary copies, they are referred to for simplicity as “secondary copy operations.” Data movement also includes restoring secondary copies.

“Backup Operations”

A backup operation creates a copy of primary data 112 (e.g., one or more files or other data units) as it existed at a specific point in time. Each backup copy 116, which is a type of secondary copy 116, may be maintained independently of the original. Backups generally involve maintaining a version of the copied primary data 112 as well as backup copies 116. In some embodiments, a backup copy is stored in a backup format that differs from the native format of primary data 112, which is generally stored in a format native to the source application(s) 110. Backup copies may be stored in various formats; for example, a backup copy can be stored in a compressed backup format to facilitate long-term storage. Backup copies 116 can have relatively long retention periods compared to primary data 112, which is generally highly changeable. Backup copies 116 may be stored on media with slower retrieval times than primary storage device 104. Some backup copies may have shorter retention periods than other types of secondary copies 116, and some backups may be stored offsite.
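As a rough illustration of a point-in-time copy kept in a compressed, non-native backup format, the sketch below packs a set of files into a gzip-compressed JSON container and converts it back on restore. The container layout (JSON with hex-encoded file contents) is an arbitrary assumption chosen to keep the example short; it is not the format any particular product uses.

```python
import gzip
import json
from datetime import datetime, timezone


def make_backup(files: dict[str, bytes]) -> bytes:
    """Capture a point-in-time copy of `files` in a compressed, non-native container."""
    payload = {
        "taken_at": datetime.now(timezone.utc).isoformat(),  # point-in-time marker
        "files": {name: data.hex() for name, data in files.items()},
    }
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def restore_backup(blob: bytes) -> dict[str, bytes]:
    """Convert the backup container back into native per-file bytes."""
    payload = json.loads(gzip.decompress(blob).decode("utf-8"))
    return {name: bytes.fromhex(h) for name, h in payload["files"].items()}
```

Because the backup is an independent object, later changes to the primary data do not affect what a restore returns.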

Backup operations can include full backups, differential backups, incremental backups, synthetic full backups, and/or the creation of a ‘reference backup.’ In some cases, a full backup (or ‘standard full backup’) is a complete copy of the data to be protected. Because full backup copies can consume a large amount of storage, it can be useful to keep a full backup copy as a baseline and afterwards store only changes relative to it.

“A differential backup operation (or cumulative incremental backup operation) tracks and stores all changes made since the last full backup. Although differential backups can grow quickly in size, they can be restored relatively efficiently, because a restore can be completed using only the full backup copy and the most recent differential copy.
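The sketch below models data as a simple name-to-version mapping (an assumption made for illustration) to show why a differential restore needs only two copies: each differential is computed against the last full backup, so the newest one already contains every change since then.

```python
from typing import Optional


def differential(current: dict[str, str], last_full: dict[str, str]) -> dict[str, Optional[str]]:
    """Everything that differs from the last full backup; None marks a deletion."""
    changes: dict[str, Optional[str]] = {
        k: v for k, v in current.items() if last_full.get(k) != v
    }
    changes.update({k: None for k in last_full if k not in current})
    return changes


def restore_from_differential(full: dict[str, str], latest_diff: dict[str, Optional[str]]) -> dict[str, str]:
    """A restore needs only the full copy plus the most recent differential copy."""
    state = dict(full)
    for name, version in latest_diff.items():
        if version is None:
            state.pop(name, None)  # item deleted since the full backup
        else:
            state[name] = version  # item added or changed since the full backup
    return state
```

Note that a change made early in the cycle is carried again in every later differential, which is why differential copies grow over time.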

An incremental backup operation tracks and stores only the changes made since the last backup copy of any type, which can reduce storage usage. However, restoring from incremental backups can take longer than restoring from full or differential backups, because completing a restore may require accessing a full backup copy and multiple incremental backup copies.
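Using the same simplified name-to-version model (an illustrative assumption), the only difference for an incremental backup is the baseline: changes are computed against the previous backup rather than the last full one. The restore then has to replay the whole chain in order.

```python
from typing import Optional

Changes = dict[str, Optional[str]]  # name -> new version, or None for a deletion


def changes_since(current: dict[str, str], previous: dict[str, str]) -> Changes:
    """Incremental backup: only what changed since the previous backup of any type."""
    changes: Changes = {k: v for k, v in current.items() if previous.get(k) != v}
    changes.update({k: None for k in previous if k not in current})
    return changes


def restore_chain(full: dict[str, str], incrementals: list[Changes]) -> dict[str, str]:
    """Apply the full copy, then every incremental since it, oldest first."""
    state = dict(full)
    for inc in incrementals:
        for name, version in inc.items():
            if version is None:
                state.pop(name, None)
            else:
                state[name] = version
    return state
```

Each incremental is small, but skipping any link in the chain yields the wrong result, which is the restore-time cost the paragraph above describes.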

A synthetic full backup is a consolidation operation that does not directly back up client data. It starts from the most recent full backup, either synthetic or standard, which is then combined with any subsequent incremental and/or differential backups. The resulting synthetic full backup is identical to what would have been created had the last backup for the subclient been a full backup. Unlike standard full, incremental, and differential backups, however, a synthetic full backup does not transfer client data to the backup media, because it acts as a backup consolidator. A synthetic full backup extracts the index data of each subclient and uses this index data, together with the previously backed-up user data images, to build new full backup images (e.g., bitmaps) for each subclient. The new backup images consolidate the index and user data from the previous full backups and the previously backed-up differential and incremental backups into a synthetic backup file. This backup file fully represents each subclient (e.g., via pointers) but does not contain all of its constituent data.
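A pointer-based consolidation like the one described above can be sketched by merging only index data. Here each backup's index is assumed (hypothetically) to map a file name to a `(backup_id, offset)` pointer into already-stored media, with `None` recording a deletion; the real index and image formats are not specified in the source.

```python
from typing import Optional

Pointer = tuple[str, int]             # hypothetical (backup_id, offset) into stored media
Index = dict[str, Optional[Pointer]]  # None records that the file was deleted


def synthesize_full(last_full: Index, later_backups: list[Index]) -> Index:
    """Build a synthetic full index by consolidating indexes only.

    No user data is read or copied: every pointer in the result refers to data
    already stored by the last full backup or a later incremental/differential.
    """
    merged: Index = dict(last_full)
    for index in later_backups:            # apply oldest to newest
        for name, pointer in index.items():
            if pointer is None:
                merged.pop(name, None)     # file deleted after the earlier backup
            else:
                merged[name] = pointer     # newer version supersedes the older one
    return merged
```

The result describes the complete current state of the subclient, yet it only references existing backup data, which mirrors the "represents via pointers but does not contain all constituent data" property.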


How to Search for Patents

A patent search is the first step toward obtaining a patent. You can search Google Patents or the USPTO databases. “Patent pending” describes a product covered by a patent application that has been filed but not yet granted. You can search Public PAIR to find a patent application. After the patent office approves an application, you can look up the issued patent by its patent number, and your product is then patented. You can also use the USPTO search engine, as described below, and you can get help from a patent lawyer. In the United States, patents are granted by the United States Patent and Trademark Office (USPTO), which also reviews trademark applications.

Are you interested in similar patents? These are the steps to follow:

1. Brainstorm terms to describe your invention, based on its purpose, composition, or use.

Write down a brief but precise description of the invention. Avoid generic terms such as “device,” “process,” or “system.” Consider synonyms for the terms you initially chose. Next, note important technical terms and keywords.

Use the questions below to help you identify keywords or concepts.

  • What is the purpose of the invention? Is it a utilitarian device or an ornamental design?
  • Is the invention a way to make something or perform a function? Is it a product?
  • What is the invention made of? What is its physical composition?
  • What is the invention used for?
  • What technical terms and keywords describe the nature of the invention? A technical dictionary can help you find the right terms.

2. Use these terms to search for relevant Cooperative Patent Classifications with the Classification Search Tool. If you cannot find the right classification for your invention, scan through the classification’s class schemes (class schedules) and try again. If the Classification Text Search returns no results, consider replacing the words you used to describe your invention with synonyms.

3. Check the CPC classification definition to confirm the CPC classification you found. If the selected classification title has a blue box with a “D” at its left, the hyperlink will take you to a CPC classification definition. CPC classification definitions help you determine the scope of a classification so that you can choose the most relevant one. These definitions may also include search tips or other suggestions that could be helpful for further research.

4. The Patents Full-Text Database and the Image Database allow you to retrieve patent documents that include the CPC classification. By focusing on the abstracts and representative drawings, you can narrow down your search for the most relevant patent publications.

5. Review this selection of the most relevant patent publications closely for similarities to your invention, paying attention to the claims and specification. References cited by the applicant and the patent examiner may lead you to additional relevant patents.

6. You can retrieve published patent applications that match the CPC classification you chose in Step 3. You can also use the same search strategy that you used in Step 4 to narrow your search results to only the most relevant patent applications by reviewing the abstracts and representative drawings for each page. Next, examine all published patent applications carefully, paying special attention to the claims, and other drawings.

7. You can search for additional US patent publications by keyword searching in AppFT or PatFT databases, as well as classification searching of patents not from the United States per below. Also, you can use web search engines to search non-patent literature disclosures about inventions. Here are some examples:

  • Add keywords to your search. Keyword searches may turn up documents that are poorly categorized or that were missed by the classification search in Step 2. For example, US patent examiners often supplement their classification searches with keyword searches. Use technical engineering terminology rather than everyday words.
  • Search for foreign patents using the CPC classification, then re-run the search using international patent office search engines such as Espacenet, the European Patent Office’s worldwide database of over 130 million patent publications. Other national patent office databases can also be searched.
  • Search non-patent literature. Inventions can be made public in many non-patent publications. It is recommended that you search journals, books, websites, technical catalogs, conference proceedings, and other print and electronic publications.

You can hire a registered patent attorney to help review your search. A preliminary search will help you prepare to discuss your invention and related inventions with a patent attorney, and the attorney will not need to spend as much time or money on patenting basics.
