Patent for “Systems and methods to protect and govern genomic and other information”

Digital Healthcare – W. Knox CAREY, David P. Maher, Michael G. Manente, Jarl Nilsson, Talal G. Shamoon, Intertrust Technologies Corp


Abstract for “Systems and methods to protect and govern genomic and other information”

“Trusted, privacy-protected systems and methods are disclosed for processing, handling, and performing tests on human genomic and other information. In some embodiments, a system is disclosed that uses cloud technology to store and analyze genetic and other data. The system could include some or all of the following: authenticated and certified data sources; certified diagnostic tests; and policy-based access.”

Background for “Systems and methods to protect and govern genomic and other information”

“Genetic testing has moved from detecting single nucleotide polymorphisms (SNPs), which are individual chemical differences in the genetic code, to Whole Genome Sequencing (WGS), which records every base pair in a sequence. Companies are currently focusing their efforts on developing devices that can produce whole genome sequences at a reasonable cost for individuals. Within the next three years, devices that can sequence an entire human genome in less than a day are expected to become commercially available. Today's primary focus is on the development of sequencing technology, biochemistry, and first-stage genomic data processing (raw data and base-calling statistical processing).”

“Some embodiments describe a method for performing trusted computations on human genomic or other data. The method includes receiving a set of genomic or other data and an executable diagnostic program designed to operate on such data; evaluating the authenticity of the diagnostic program; evaluating the authenticity of at least a portion of the data; and, once the authenticity evaluations have been satisfied, executing the program on at least a portion of the data. In some embodiments, the program produces diagnostic results that can be used in medical diagnosis, and the method may also be used to verify the authenticity of the results. The authenticity of the diagnostic program can be verified using a digital signature packaged with it. Likewise, the authenticity of the genomic or other data can be verified using a digital signature packaged with the data. In some embodiments, the method includes maintaining the privacy of the data set based on one or more privacy policies.”
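The authenticate-then-execute flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: HMAC-SHA256 stands in for a real public-key digital signature, and the names `sign`, `is_authentic`, `run_trusted`, and the convention that a program defines `diagnose(data)` are all assumptions made for the example.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    """Stand-in for a digital signature: HMAC-SHA256 over the payload.

    Assumption: a real system would use public-key signatures and certificates.
    """
    return hmac.new(key, payload, hashlib.sha256).digest()


def is_authentic(key: bytes, payload: bytes, signature: bytes) -> bool:
    """Compare the expected and presented signatures in constant time."""
    return hmac.compare_digest(sign(key, payload), signature)


def run_trusted(program_src, program_sig, data, data_sig, program_key, data_key):
    """Run the diagnostic program only after both authenticity checks pass."""
    if not is_authentic(program_key, program_src.encode(), program_sig):
        raise PermissionError("diagnostic program failed authentication")
    if not is_authentic(data_key, data, data_sig):
        raise PermissionError("data set failed authentication")
    namespace = {}
    # Hypothetical convention: the program source defines diagnose(data).
    exec(program_src, namespace)
    return namespace["diagnose"](data)
```

If either signature does not verify, the program never runs, which mirrors the ordering of steps in the described method.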

“According to some embodiments, a trusted computer system is described. It includes a secure storage device configured to store at least a part of a set of data and a computer program for operating on that data, and a secure processing system programmed and configured to evaluate the authenticity of the program, to evaluate the authenticity of at least a portion of the set of data, and, when the authenticity evaluations are satisfied, to run the program on at least a portion of the set of data.”

“Some embodiments describe an executable diagnostic computer software program that can be used to generate diagnostic results from a data set. This includes a diagnostic algorithm that executes on at least one portion of the data set and produces diagnostic results. A digital signature is also described to verify the program’s authenticity. Some embodiments allow the computer program to be packaged with metadata. This metadata can include information about the diagnostic algorithm, its intended use, and any precautions. It also includes technical descriptions of inputs that are expected to produce the useful diagnostic results.

“Some embodiments describe a method for generating packaged genomic data. The method includes receiving genomic data from a DNA sequencer; encrypting the data; creating a digital signature that allows the data to be verified; and packaging the digital signature with the encrypted genomic data. The digital signature can be generated using either a private key associated with the DNA-sequencing instrument or a private key associated with the sequencing facility.”
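A rough sketch of the encrypt, sign, and package steps is shown below. The cipher is a toy keystream built from SHA-256 and is not secure, and HMAC again stands in for a private-key signature; both are assumptions chosen so the example runs with the Python standard library alone. The function names and the package layout are hypothetical.

```python
import hashlib
import hmac


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only, not secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def package_sequence(sequence: bytes, enc_key: bytes, signing_key: bytes, signer: str) -> dict:
    """Encrypt the sequence, sign the ciphertext, and bundle both together."""
    ciphertext = keystream_xor(enc_key, sequence)
    # Stand-in for a signature made with the instrument's or facility's private key.
    signature = hmac.new(signing_key, ciphertext, hashlib.sha256).hexdigest()
    return {"signer": signer, "ciphertext": ciphertext.hex(), "signature": signature}


def verify_package(package: dict, signing_key: bytes) -> bool:
    """Check that the packaged signature matches the packaged ciphertext."""
    ciphertext = bytes.fromhex(package["ciphertext"])
    expected = hmac.new(signing_key, ciphertext, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, package["signature"])
```

The `signer` field records whether the instrument or the facility signed, matching the two key choices mentioned in the text.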

“Some embodiments describe a method for operating on one or more sets of genomic data. The method includes securely receiving one or more sets of genomic data; associating permission information with each set, the permission information having been specified by an owner of the data; receiving an algorithm to operate on genomic data; receiving a request to run the algorithm on one or more of the received sets; authenticating the request; checking the permissions associated with each requested set of genomic data; and allowing the algorithm to access or use a set only if the permissions allow it.”
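The permission check in this method might look like the following sketch. The `GenomeRecord` type, the `allowed_requesters` field, and the `run_if_permitted` function are hypothetical names, and the set of authenticated users is a simplifying assumption standing in for a real authentication step.

```python
from dataclasses import dataclass, field


@dataclass
class GenomeRecord:
    owner: str
    sequence: str
    # Permission information specified by the owner (assumed to be a simple allow-list).
    allowed_requesters: set = field(default_factory=set)


def run_if_permitted(records: dict, algorithm, requester: str, authenticated_users: set) -> dict:
    """Run the algorithm only on the data sets whose permissions allow this requester."""
    if requester not in authenticated_users:
        raise PermissionError("request could not be authenticated")
    results = {}
    for genome_id, record in records.items():
        if requester in record.allowed_requesters:
            results[genome_id] = algorithm(record.sequence)
    return results
```

Data sets whose owners have not granted access are skipped silently here; a real system might instead report the denial to the requester.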

“As used herein, the term ‘genomic data’ generally refers to data that expresses, represents, or derives from all or part of a genome or genome sequence. These data could include information encoded in chemical structures such as DNA, mRNA, and proteins, as well as regulatory information such as methylation status.”

“As used herein, the term ‘genome’ refers to an organism's genetic information. A genome can be encoded in DNA or RNA, and can be represented as protein sequences derived from these nucleic acid sequences. The term ‘genome’ can include both genes and non-coding sequences. When applied to a specific organism, the term can refer to genomic data taken from normal cells, including mitochondrial DNA, and also to data from related cells such as tumors and the microbiome.”

Below is a detailed description of the inventive body of work. Although several embodiments are described, it should be understood that the inventive body of work is not limited to any one embodiment; it also encompasses many alternatives, modifications, and equivalents. While many details are provided in this description to give a clear understanding of the inventive body of work, certain embodiments can be practiced without some or all of them. To avoid obscuring the inventive body of work, technical material that is known in the related art is not described in detail.

“Systems and methods for trusted handling of genomic and/or other information are presented. These systems and methods are novel, as are many of the components, systems, and methods employed in them.”

“Genomic data may be the most personally identifiable data in health. With many traditional medical tests, once a sample has been taken and tested it is discarded, and no further tests can be done on it. With Whole Genome Sequencing (WGS), by contrast, the data from a sample can be kept indefinitely. As new genes are discovered, new tests can be performed on the stored data without additional laboratory work.”

“If the data are not adequately protected, the patient is essentially consenting to all current tests and any future ones. Revealing genetic information can have far-reaching consequences, such as effects on spousal selection and desirability, employment screening and employability, and profiling and discrimination, to name just a few examples. An individual's genetic information may also reveal information about their family members (e.g., siblings, children, and twins).”

“FIGS. 1A, 1B, and 1C contrast testing that is tightly coupled to analysis with sequencing that is decoupled from analysis. The analytical modules used in the decoupled approach are referred to herein as Virtual Diagnostic Tests (VDTs).”

“FIG. 1A shows how testing is done today, where testing and analysis are closely coupled. The patient's sample 110 is directly analysed using a genomic analysis tool 112, such as a microarray, which yields a result 114.”

“FIG. 1B shows a patient's sample 110 being analysed by a sequencer 120, which produces a sequence output 122. The sequence output 122 can be used immediately for analysis, and it may also be saved in a computer-readable format. FIG. 1C shows a sequence stored on file 130 being processed, according to certain embodiments, in a trusted execution environment 140 with one or more VDTs 142 to produce a diagnostic result 150. In the processes shown in FIGS. 1B and 1C, the tests that are run need not have been available at the time the sequencer 120 was used. According to certain embodiments, the testing and diagnostic apparatuses should be independently certified to carry out their respective tasks safely and accurately, and the interface between them should be known and trusted. As new tests are added to the system, they should be properly certified so that they can be authenticated by others.”

“Illustrative Design”

“According to some embodiments, a system is provided that addresses privacy and security concerns associated with sensitive information such as genetic data. Some embodiments can include some or all of the following features:

“(1) Privacy-Protected Collection of Genomic Data”

“In preferred embodiments, the individual's privacy is protected from the point of collection or genesis of the data. The service receives encrypted data from devices. The service securely and privately associates patient information with the data in a manner that is not easily inferred by laboratory personnel or observers.”

“(2) Data is Anonymously Protected at Rest”

Preferred embodiments store genomic data in encrypted form within the system, decoupled from information that could reveal the identity of the person to whom it belongs. The linking information is accessible only to authorized personnel and is protected according to permissions.

“(3) Distributed trust model”

It is important to ensure that the diagnostic system produces a reliable result. A distributed trust model allows each party to be responsible for a specific part of the process. Doctors and patients can then trust that the final result was assembled from trusted, independently-created components.

“(4) Healthcare Certifications”

“In a rapidly changing field like genomics, it is unreasonable to expect that doctors will be able to follow every new discovery and translate that research into easily ordered diagnostic tests. Codifying tests and associating them with descriptions and recommendations for use lets doctors specify tests easily. Doctors can also trust that the tests they order have been peer reviewed by industry and regulatory agencies and will yield medically relevant results.”

“(5) Virtual Lab Programming Tools”

Researchers can codify their discoveries using standardized functions in a genomic programming language. DIFF (returns difference between two segments of a genome), IF/THEN statements and Boolean logic simplify programming required to commercialize discoveries.
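As an illustration of how such standardized functions might be combined, the sketch below implements a DIFF over two aligned segments and uses it inside an IF/THEN rule with Boolean logic. The source does not define the genomic programming language, so the function names, the tuple format, and the equal-length restriction are all assumptions.

```python
def diff(segment_a: str, segment_b: str) -> list:
    """Return (position, base_a, base_b) for each position where the segments differ."""
    if len(segment_a) != len(segment_b):
        raise ValueError("this sketch compares equal-length, aligned segments only")
    return [(i, a, b) for i, (a, b) in enumerate(zip(segment_a, segment_b)) if a != b]


def screen(sample: str, reference: str, risk_position: int, risk_base: str) -> str:
    """IF the sample differs from the reference at the risk position AND carries the risk base THEN flag it."""
    carries_variant = any(
        position == risk_position and sample_base == risk_base
        for position, _reference_base, sample_base in diff(reference, sample)
    )
    if carries_variant:
        return "variant present"
    return "variant absent"
```

For example, `diff("GATTACA", "GACTACA")` reports a single difference at position 2, where `T` was replaced by `C`.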

“(6) Market for IP”

It takes significant amounts of capital, time, and other resources to identify gene sequences and their relationship with phenotypes or disease. The systems and methods described in this document provide a way for those who make such discoveries to be compensated.

“(7) Trusted Systems for Collaboration”

“In certain embodiments, a standard method for creating and distributing codified search algorithms is provided, which allows discoveries to be shared between researchers. Multiple types of tests can be linked together to create reusable building blocks that can be shared between organizations, for free or for an exchange of value; and/or

“(8) Privacy by Design”

“In some embodiments, privacy protections are designed in from the beginning to protect clients' privacy. Because the protections are built in at the outset, the system can support both anonymous and private analyses without compromising either type of use.”

“Illustrative Gene Cloud Ecosystem”

“According to certain embodiments, a system is provided for the secure storage and analysis of genetic and/or other data. This system is sometimes referred to herein as a ‘Gene Cloud’. Preferred embodiments of the Gene Cloud provide long-term trusted storage and processing of genomic (and/or other) data in accordance with privacy and usage policies set by the stakeholders. It will be appreciated that any suitable configuration of servers and storage media can be used, including a single server or cluster of servers, or a distributed collection of heterogeneous computers connected via a variety of networks (e.g., the Internet, private and/or public networks, and/or the like).”

“Some embodiments of the Gene Cloud might include or support the following: (1) Virtual Diagnostic Tests; (2) protected personal genomic data; (3) authenticated, certified data sources; (4) authenticated, certified diagnostic tests; (5) access to genomic data governed by rules; (6) patient-owned genomic data that can be used to make medical diagnoses; (7) the ability for a person to authorize research access and to specify the privacy level required; and (8) the ability for a person to authorize specific tests of his or her genome and to specify who may have access.”

“FIG. 2 shows potential stakeholders of Gene Cloud system 200, which include certification agencies 201, researchers 202, payers 203, labs 204, clients 205, healthcare providers 206, and tool providers 207. Each stakeholder may have its own concerns and proprietary interests in the management or use of genetic data. The term ‘client’ is used in FIG. 2; however, the terms ‘client’ and ‘consumer’ are generally used interchangeably within this description. Many of the potential stakeholders shown in FIG. 2 can play a part in ensuring data security and integrity in the chain of handling, as shown in FIG. 3.”

“FIG. 3 illustrates this chain of handling. A trusted result 209 can be assured by labs 204 certifying that proper procedures for sample collection and processing were followed; sequencer manufacturers 220 certifying that proper sequence data is extracted from a given specimen; the trusted Gene Cloud environment 200 certifying that diagnostic tests are performed in a controlled environment and that rules have been observed; and tool providers 207 certifying that a test results in a valid medical diagnosis. Table 1 shows in more detail the roles of each stakeholder in the operation and maintenance of a Gene Cloud ecosystem.”

TABLE 1
Stakeholder Involvement in Operation of an Illustrative Gene Cloud Ecosystem

Certification Agencies:

Actor: Medical Trust Authority
Role: Confirms that medical research supports medical claims associated with gene identification and fitness of a virtual diagnostic test for a particular use or diagnosis. Healthcare providers may regard this assurance as a minimal criterion for use in their daily practice.
Examples: FDA; American Medical Association; Society of Genetic Counselors; World Health Organization (WHO); Center for Disease Control (CDC); National Cancer Institute; National Institute of Health (NIH/NHGRI)

Actor: Private Trust Authorities
Role: Confirms that tests that have been published by their researchers have been peer-reviewed, are indeed authentic, and have not been recalled.
Examples: American Journal of Medical Genetics; The Lancet; Nature Genomics; Whitehead Institute; New England Journal of Medicine; JAMA

Tool Providers:

Actor: Tool Providers
Role: Tool providers create Virtual Diagnostic Tests (VDTs) and other bioinformatics tools for use within the Gene Cloud. The VDTs may, for example, be tests that help doctors determine dosing for a particular drug, or they may be components that are used in a research tool chain. The tool provider will often be required to digitally sign each tool to indicate its source and protect its integrity; these facts will be validated when the tools are executed in the Gene Cloud's VDT execution environment.
Examples: Pharmaceutical researchers; Academic researchers; Bioinformatics tools providers

Clients/Consumers:

Actor: Clients/Consumers
Role: Ultimate owner of their genetic information. Sets the privacy permissions associated with their data. Approves the use of their data. Periodically examines the record of accesses to their data.
Examples: Any person; Parents of babies (tested at birth), while they are minors; Guardians who manage the privacy of others' genomic information, including fetal genomic information acquired before birth

Labs:

Actor: Certified labs
Role: Labs are responsible for ensuring that sample collection, handling and sequencing are performed according to certified procedures. E.g., a university may have a research lab that provides genome sequences for research study; the university's hospital may have an approved medical testing lab. Both can sign and upload data to the cloud for later testing. However, in some embodiments only the latter may be used by doctors seeking to make a diagnosis.
Examples: Private research labs; Academic labs; CLIA-certified labs; Other medically-certified labs

Sequencing Devices:

Actor: Sequencer Device
Role: The sequencing device is the actual lab equipment that tests the sample and identifies the genomic sequence. In one embodiment, each device that is certified to operate in the ecosystem is given a digital certificate. Data signed with this certificate authenticates that it came from a device that will properly format the data for use in latter parts of the system.
Examples: Any sequencing device manufacturer

Researchers & Pharmaceuticals:

Actor: Pharmaceutical Company (Customer Role)
Role: In a customer role, a pharmaceutical company may pay for access to the consumer data that is retained and managed in the Gene Cloud. For example, it may wish to:
a) Identify certain populations
b) Run ‘research bots’ within the cloud, with willing participants, to map patient history to genetic factors
c) Advertise to researchers or doctors who are treating certain diseases
d) Locate and invite specific individuals to participate in controlled studies of new treatments
Examples: Any pharmaceutical company

Actor: Pharmaceutical Company (Supplier Role)
Role: In a supplier role, a pharmaceutical company may submit ‘virtual diagnostic tools’ to the system. These virtual diagnostic tools may be, e.g.:
Tools to help doctors prescribe drugs which already exist for the general population, but whose dosing varies by genetic characteristics.
Tools to help doctors identify the best possible treatment among a variety of drugs that can all be used to treat a condition.
Tools that were mandated (e.g., by the FDA) as a condition for granting approval for a drug. E.g., a drug may only be prescribed for individuals with certain characteristics because it is ineffective or has adverse side-effects for other characteristics.
Examples: Any pharmaceutical company

Actor: Academic and Research Institutions
Role: In a supplier role, research institutions may submit ‘virtual diagnostic tools’ to the system. These virtual diagnostic tools can be tools to diagnose genetic sequences that have been identified to be indicators of particular diseases. In one embodiment, if there is a cost associated with performing a test, the Gene Cloud can process the payment, possibly retain a portion as compensation, and remit the remainder to the submitting institution to help compensate/reward them for their research.
Examples: Universities; Research Hospitals; National Cancer Institute (NCI)


“Gene Cloud Use Cases”

“Table 2 presents use cases that describe some of the capabilities of certain implementations of a Gene Cloud system, focusing on the trust and security aspects of each case. These use cases are intended to be illustrative, not exhaustive, of the Gene Cloud functions that can be found in certain embodiments of the inventive body of work.”

TABLE 2
Example Use Cases

Use Case: Prescription Assistant
Description: A doctor is prescribing a medication for a patient. The doctor selects the appropriate test and applies it to the patient's genome of record. The test result is returned immediately.
Trust and security aspects: The doctor needs to trust that the patient's genetic record was produced by an accredited laboratory, that the test for the pharmaceutical has not been revoked, and that it can be authenticated to the pharmaceutical manufacturer and/or a reputable certifying authority (e.g., a private medical association or governmental health authority). The pharmaceutical company may request some anonymous feedback data to help improve dosing guidelines. Regulatory agencies may require use of the tool as a condition for approving the drug (e.g., the tool must be used to prescribe and/or choose the appropriate dose).

Use Case: Cancer Regimen
Description: A doctor wants to treat a patient with recently diagnosed cancer. The doctor orders a biopsy taken of the tumor and orders a sequencing of its DNA. The doctor orders that a ‘virtual lab test’ be performed. The National Cancer Institute might have assembled a ‘meta-test’ that runs three tools provided by three different cancer drug manufacturers to determine the treatment with the best chance of success.
Trust and security aspects: The doctor also wants to compare the results to previous cancer tests. For tests in the patient record that were performed years ago, by different institutions, he wants to determine that these tests were done using trusted procedures (e.g., to determine the inputs to the diagnostic tool).

Use Case: Pre-natal Assessment vs. ‘Designer Babies’
Description: A woman is pregnant and the child is at risk for a specific genetic condition. An amniocentesis test can be done on her, and a sample of the baby's DNA can be taken and analyzed at the lab. The practice of genomic screening before birth has begun to emerge.
Trust and security aspects: Although whole genome sequencing can be performed on fetuses, limitations can be placed on the tests that may be run on the sequence. The system can provide a technical solution to enforce whatever the practice or any societal norms (and laws) dictate. Trust/privacy controls: individuals who have a guardian role, or who are designated by governments as custodians, may have restricted access to data and tests. E.g., the default for the unborn could be ‘no testing’.

Description: As a part of routine health assessment (and records for future use over the child's lifetime), pediatricians swab the baby for a DNA sample. The doctor requests the standard set of genetic tests that is currently recommended by the AMA and the American Board of Pediatric Medicine, which has created and certified a meta-test bundle to aid doctors.
Trust and security aspects: She does not want to be seen as negligent by failing to test. She is assured that (a) the tests she has requested are approved by the medical community, and (b) the complete set of tests that she requests is the one that is the current standard of medical care.

Description: In this example, a researcher may request a tool that searches for specific information.
Trust and security aspects: Personal data is not accessible to the researcher. The aggregate results are all that the researcher has access to.

Description: Two people are curious and grant permission to each other. Since they don't know if they will get married, they don't want to know about the other's genome, just the risk factors that might be presented to their children. They want to run a test that they can believe in, but don't want to pay. They select a ‘free’ test that was co-signed by the peer-reviewed journal GeneticsToday, rather than the AMA-signed version that doctors use.
Trust and security aspects: Any test should clearly indicate who sees the results and what level of detail they should see (e.g., only the risk factors, not the source of the risk).

Use Case: Familial/Ancestry
Description: A consumer runs an ‘ancestry request’. In the example, the request results in three sequences.
Trust and security aspects: Access to identity information is closely controlled, since the search itself could be a privacy violation. The identities behind the sequences should not be revealed to either side. The matched individuals may not wish to reveal their identity to the originator of the request. However, the originator has the option of reaching out through a double-blind request to exchange information, to determine whether both sides are willing to identify themselves or to cooperate in a research study. Either side may choose to remain anonymous.

“EXAMPLE USE CASES”

Below are additional examples of systems and methods that implement various aspects of the inventive work.

“Example: Prescription assistant”

“A pharmaceutical company has developed an anti-cancer drug that was shown to be effective in treating a subset of Alzheimer's patients. The patients for whom the treatment is effective share certain genotypical characteristics; that is, they are genetically similar in certain ways that have been shown to correlate with effectiveness. The appropriate dosage of the drug also depends on the exact genotype. For patients with a specific genotype, overdosing can cause dangerous long-term side effects.”

The FDA approved the drug. However, because the drug is only effective in certain patients and is potentially dangerous if administered at incorrect dosages, the FDA requires that a genetic screening test be performed to determine the likely effectiveness of the drug.

“The pharmaceutical company creates a program to evaluate these factors and packages it as a Gene Cloud VDT. To prove its authorship, the company tests the VDT and digitally signs it. The signature is made using a certified key issued by the Gene Cloud for this purpose.”

Once the VDT has been signed by the pharmaceutical company, it goes to the FDA. After reviewing the program, the FDA tests it in the Gene Cloud using its own data. The FDA then digitally signs the VDT with its own certified key. The certificate for this key derives from a separate root certificate authority (CA), and the VDT includes the certificate chain necessary to validate the signature. The root CA from which FDA certificates are derived is also recorded in the Gene Cloud, so that users may rely on it when validating signatures.
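The co-signing arrangement above depends on each signer's certificate chaining to a root CA that the Gene Cloud has recorded. The sketch below checks only the issuer linkage of such a chain; it is an assumption-laden simplification that omits the cryptographic signature checks, validity periods, and revocation checks a real validator would perform. The `Certificate` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str
    issuer: str


def chain_is_trusted(chain: list, trusted_roots: set) -> bool:
    """chain[0] is the signer's certificate; each later entry must have issued the one before it."""
    if not chain:
        return False
    for certificate, parent in zip(chain, chain[1:]):
        if certificate.issuer != parent.subject:
            return False
    root = chain[-1]
    # The chain must end in a self-issued root that is recorded as trusted.
    return root.issuer == root.subject and root.subject in trusted_roots


def vdt_is_approved(chains_by_signer: dict, required_signers: list, trusted_roots: set) -> bool:
    """A VDT counts as approved only if every required signer presents a trusted chain."""
    return all(
        signer in chains_by_signer and chain_is_trusted(chains_by_signer[signer], trusted_roots)
        for signer in required_signers
    )
```

Because the pharmaceutical company and the FDA may chain to different roots, `trusted_roots` is a set rather than a single value.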

Once the VDT has been approved, and all signatures have been attached, it can be uploaded to the Gene Cloud and made available to any potential prescribing physicians. The Gene Cloud allows a clinician to search for the VDT and apply it directly to a specific person’s genome.

“A patient presents to a specialist in cancer evaluation. The doctor informs her that he would like to run a genetic test to determine which treatment is best. The doctor performs the following:

“The lab extracts DNA from the sample, sequences it, and uploads the result. A secure module integrated into the sequencing machine handles the upload of the sample data to the Gene Cloud. This module also provides an interface for the technician who is responsible for uploading the sample.”

“The lab technician prepares the sample for sequencing and presents a badge to a sensor near the machine. He then enters a PIN code. This acts as a proof of identity and authenticates the technician.

“The technician scans the barcode that contains the temporary sequence ID. This associates the sequencing run with the sample.”

After the sequencing run has finished, the technician enters any metadata that is relevant to the run, for example, that the run went as planned and that there were no machine errors.

“The lab technician approves the upload of the sample.”

“The secure module embedded in the sequencing machine encrypts the data with an ephemeral encryption key that was generated specifically for this purpose.”

“The secure module adds metadata such as the lab technician's ID number, the sample ID number, the technician's notes, environmental parameters, and so on, and signs the package using a certified key issued by the manufacturer. The manufacturer's certificate was issued by a trust authority managed under the Gene Cloud.”

“The ephemeral encryption key is encrypted using the public key of a Gene Cloud ingestion point, which is known to the secure module within the sequencer.”

“The sequence package is uploaded to the Gene Cloud along with the encrypted ephemeral key.”

“The Gene Cloud accepts the package and immediately verifies its source and integrity. The package is signed by the Gene Cloud. For future reference, the integrity status and signer list are recorded.

The private key of the Gene Cloud ingestion point is used to decrypt the ephemeral encryption key, which in turn is used to decrypt the data. The Gene Cloud archives the ephemeral encryption key for future auditing. The data are then pre-processed to ensure correct formatting.
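The ephemeral-key pattern in these upload steps is a form of hybrid (envelope) encryption: the data is encrypted under a fresh key, and that key is wrapped with the recipient's public key. The sketch below shows the shape of the exchange using a textbook RSA key pair with tiny numbers and a toy SHA-256 keystream. Both are assumptions chosen to keep the example self-contained; neither is secure, and the function names are hypothetical.

```python
import hashlib
import secrets

# Toy RSA key pair for a hypothetical ingestion point (N = 61 * 53).
# Far too small for real use; shown only to illustrate key wrapping.
N, E, D = 3233, 17, 2753


def xor_stream(key_int: int, data: bytes) -> bytes:
    """Toy keystream cipher derived from SHA-256. Illustration only, not secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(f"{key_int}:{offset}".encode()).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def seal(data: bytes) -> tuple:
    """Sequencer side: encrypt under a fresh ephemeral key, then wrap that key."""
    ephemeral = secrets.randbelow(N - 2) + 2
    wrapped_key = pow(ephemeral, E, N)  # encrypted with the ingestion point's public key
    return xor_stream(ephemeral, data), wrapped_key


def open_sealed(ciphertext: bytes, wrapped_key: int) -> bytes:
    """Ingestion side: unwrap the ephemeral key with the private key, then decrypt."""
    ephemeral = pow(wrapped_key, D, N)
    return xor_stream(ephemeral, ciphertext)
```

Only the holder of the private exponent can recover the ephemeral key, so the sequencer never needs a long-lived shared secret with the Gene Cloud.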

“The Gene Cloud determines which patient the sample corresponds to by looking up to whom the temporary sample ID was assigned.”

“The Gene Cloud assigns the entire sample a new ID; the old ID is preserved for forensic purposes.”
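The ID handling in the last two steps can be sketched as a small registry: the temporary ID resolves to a patient, a new ID is assigned on ingestion, and the old ID is kept alongside the record. The class name, field names, and ID lengths below are assumptions for illustration.

```python
import secrets


class SampleRegistry:
    def __init__(self):
        self._patient_by_temp_id = {}
        self.samples = {}

    def assign_temp_id(self, patient_id: str) -> str:
        """Issue a temporary sample ID when the sample is taken."""
        temp_id = secrets.token_hex(4)
        self._patient_by_temp_id[temp_id] = patient_id
        return temp_id

    def ingest(self, temp_id: str, sequence: bytes) -> str:
        """Resolve the patient, assign a new ID, and preserve the old ID for forensics."""
        patient_id = self._patient_by_temp_id[temp_id]  # KeyError if the ID was never issued
        permanent_id = secrets.token_hex(16)
        self.samples[permanent_id] = {
            "patient": patient_id,
            "former_id": temp_id,
            "sequence": sequence,
        }
        return permanent_id
```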

“The Gene Cloud notifies both the prescribing physician and the patient that the sample was received. On receiving this notification, the doctor uses the Gene Cloud search tool to find the VDT and requests that it be applied to the patient's genome. The doctor may also request that the results be visible to the patient.”

“The Gene Cloud generates an approval request asking the patient (or their designated caregiver) to approve the test. The approval request states, in layman's terms approved by the FDA, the purpose of the test as well as the identity of the person who requested it. Alternatively, the patient might already have given the doctor permission to perform such tests by indicating her relationship with him.”

The VDT is executed once the patient's approval has been granted. Execution involves verifying that the VDT has been approved by the appropriate authorities, decrypting the data, and running the VDT program.

The VDT results are returned to the doctor who requested them, and an audit record is created and stored. A notification is sent to the patient that a test was performed, including who ordered it and what the test was. The notification may or may not include the test results, depending on how the doctor set up the VDT request.
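The approval, audit, and notification steps might be combined as in the sketch below. The function signature, the record fields, and the use of plain lists for the audit log and the patient's notifications are assumptions made to keep the example small.

```python
from datetime import datetime, timezone


def execute_vdt_request(vdt, genome, requester, patient_approved, share_results, audit_log, notifications):
    """Run a VDT only with patient approval, then record and report the run."""
    if not patient_approved:
        raise PermissionError("patient approval has not been granted")
    result = vdt(genome)
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "requester": requester,
        "test": vdt.__name__,
    })
    notice = {"ordered_by": requester, "test": vdt.__name__}
    if share_results:
        # Results are included only if the doctor set up the request that way.
        notice["result"] = result
    notifications.append(notice)
    return result
```

The doctor always receives the result as the return value; `share_results` controls only what the patient's notification contains.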

“The VDT results are evaluated by the doctor and the prescription is made.”

“Example: Tumor Classification & Treatment”

“This is a two-part example. The first part is where a research group attempts to classify breast carcinoma tumors into different classes according to their response to different pharmaceuticals. This research aims to determine the classes based upon genotypes and information about how the various treatments have affected them.

“In the second part, a doctor treats a patient who has recently been diagnosed with cancer. The doctor orders a biopsy of the tumor to be taken and a sequencing of its DNA. A ‘virtual lab test’ is ordered by the doctor. This test compares the tumor DNA with the patient's normal DNA and with other cancers. Based on these comparisons, the doctor then prescribes a treatment plan that is appropriate for the patient's genetic makeup.”

“Turning to the first part, a research group attempts to classify breast carcinoma tumors. The researchers have a hypothesis that seventy-five genes may be involved in the biology of the disease. They want to assess as many patients as possible to gather information that will allow them to classify the tumors into different treatment groups.”

“The researchers create a number of bioinformatics software programs to run in the Gene Cloud:

The researchers upload these programs to the Gene Cloud as a Secure Research Request (SRR), which is a type of VDT request. The Workflow specifies the Start Date for the research experiment.

The Selector is run in a trusted execution environment, which ensures that it has access only to relevant phenotypical data and not to genome data. The Selector identifies a group of 1200 patients who meet its criteria.

“As each possible cohort member is identified and added to the study, the Gene Cloud uses the member's user ID (or medical record ID) to look up the unique genome sequence identifiers associated with the patient. The Gene Cloud itself performs the mapping from user ID to genome ID, which prevents the workflow as a whole from associating personal identifiers with genomes.”

“The Gene Cloud verifies that the policies of potential cohort members are compatible with the research use of their genome data. In particular, it verifies that patients have given permission for their genome data to be used for research purposes. While some patients might be open to any research, others may require that the researcher be affiliated with an academic or public health institution. Still others may want to be explicitly invited to approve each research use, and may expect to receive compensation if their data is used in a study.
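The policy check described above can be sketched roughly as follows. This is a minimal illustration, not the patent's method: the names `ConsentPolicy`, `ResearchRequest`, and `is_compatible`, and the three-field policy layout, are all assumptions made for the example. Compensation terms are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ConsentPolicy:
    """Hypothetical per-patient policy; field names are illustrative only."""
    allows_research: bool = False                        # no permission unless granted
    allowed_affiliations: FrozenSet[str] = frozenset()   # empty set means "any affiliation"
    requires_explicit_approval: bool = False             # patient must approve each study


@dataclass(frozen=True)
class ResearchRequest:
    """Hypothetical description of the study asking for access."""
    researcher_affiliation: str                  # e.g. "academic", "public_health", "commercial"
    approved_by: FrozenSet[str] = frozenset()    # patient IDs that approved this specific study


def is_compatible(patient_id: str, policy: ConsentPolicy, request: ResearchRequest) -> bool:
    """Return True only if every condition in the patient's policy is satisfied."""
    if not policy.allows_research:
        return False
    if policy.allowed_affiliations and request.researcher_affiliation not in policy.allowed_affiliations:
        return False
    if policy.requires_explicit_approval and patient_id not in request.approved_by:
        return False
    return True
```

In this sketch a patient with no recorded policy is excluded, because `allows_research` defaults to `False`.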

“The Gene Cloud creates a Gene Profiler instance for each cohort member whose policy allows them to participate. The normal and tumor genomes are made available as input to this instance.

The Gene Cloud assigns each Gene Profiler instance a randomly generated ID. This random ID can be used to refer to the cohort member without disclosing any personal information.
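One way to picture this indirection is a small registry that lives inside the Gene Cloud's trusted boundary. The sketch below is an assumption-laden illustration: `PseudonymRegistry` and its methods are hypothetical names, and each user is simplified to a single genome ID even though the text describes several sequence identifiers per patient.

```python
import secrets
from typing import Dict


class PseudonymRegistry:
    """Hypothetical Gene Cloud component; research programs never see its tables."""

    def __init__(self, genome_id_by_user: Dict[str, str]) -> None:
        self._genome_id_by_user = genome_id_by_user      # user ID -> genome ID
        self._genome_id_by_instance: Dict[str, str] = {}  # random ID -> genome ID

    def create_instance_id(self, user_id: str) -> str:
        """Map the user to a genome internally and return only a random ID."""
        genome_id = self._genome_id_by_user[user_id]
        instance_id = secrets.token_hex(16)  # 128 random bits, unrelated to either ID
        self._genome_id_by_instance[instance_id] = genome_id
        return instance_id

    def resolve_genome(self, instance_id: str) -> str:
        """Internal lookup, intended for the Gene Cloud itself."""
        return self._genome_id_by_instance[instance_id]
```

Because the random ID is drawn independently of the user ID and genome ID, a program holding only the random ID learns nothing about either.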

“Just like the Selector, Gene Profilers run in a trusted execution environment that restricts access to resources such as databases and storage. For example, a Gene Profiler might be prohibited from making HTTP requests, so that it cannot post genome data to third-party sites. It may also be blocked from accessing phenotypical data, and from accessing any genome data not specifically assigned to it by the Gene Cloud.

The Gene Profiler program can obtain its inputs in several ways. In this example, the Gene Profiler is told that it has two genomes as arguments: one for normal cells and one for tumor cells. Using reference identifiers supplied by the Gene Cloud, the Gene Profiler requests the sequence data for the seventy-five genes in question. The sequence data is supplied without revealing the genome IDs to the Gene Profiler, which protects against leakage of genome ID information.

“As the Gene Profiler requests data, each request is audited. A user might, for example, have indicated that she does not want the status of her BRCA2 gene revealed to anyone for any reason. A Gene Profiler that requests this data would be denied and must then decide how to respond, whether by terminating or by producing a best-effort result without the requested information.
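The audited, policy-filtered access described here might look something like the following sketch. `GeneAccessBroker` and `build_profile` are hypothetical names, the sequences are toy strings, and the audit log is a plain in-memory list. None of this is specified by the patent.

```python
from typing import Dict, List, Optional, Set, Tuple


class GeneAccessBroker:
    """Hypothetical mediator between a Gene Profiler and stored sequence data."""

    def __init__(self, sequences: Dict[str, str], withheld_genes: Set[str]) -> None:
        self._sequences = sequences          # gene name -> sequence for one assigned genome
        self._withheld = withheld_genes      # genes the owner's policy refuses to reveal
        self.audit_log: List[Tuple[str, str]] = []

    def request_gene(self, gene: str) -> Optional[str]:
        """Log every request; return None when policy (or absence of data) denies it."""
        if gene in self._withheld or gene not in self._sequences:
            self.audit_log.append((gene, "denied"))
            return None
        self.audit_log.append((gene, "granted"))
        return self._sequences[gene]


def build_profile(broker: GeneAccessBroker, genes: List[str],
                  best_effort: bool = True) -> Tuple[Dict[str, str], List[str]]:
    """Collect the requested genes; on denial either stop or continue without them."""
    profile: Dict[str, str] = {}
    missing: List[str] = []
    for gene in genes:
        sequence = broker.request_gene(gene)
        if sequence is None:
            if not best_effort:
                raise PermissionError(f"access to {gene} denied")
            missing.append(gene)
        else:
            profile[gene] = sequence
    return profile, missing
```

Returning the list of missing genes alongside the profile lets a downstream consumer tell a best-effort result from a complete one.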

These data are validated in the same way as the inputs to any typical VDT. This validation could include constraints on the source or quality of the input data and on the data format.
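A validation step of this kind, covering source, quality, and format, could be sketched as below. The descriptor fields, the quality measure, and the names `InputDescriptor`, `InputConstraints`, and `validate_input` are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class InputDescriptor:
    """Metadata accompanying one input; all field names are hypothetical."""
    source: str          # who produced the data, e.g. a sequencing lab identifier
    data_format: str     # declared file or record format
    mean_quality: float  # some summary quality score for the data


@dataclass(frozen=True)
class InputConstraints:
    """Constraints a VDT author might declare for its inputs."""
    trusted_sources: FrozenSet[str]
    accepted_formats: FrozenSet[str]
    min_mean_quality: float


def validate_input(desc: InputDescriptor, rules: InputConstraints) -> List[str]:
    """Return a list of violations; an empty list means the input is accepted."""
    problems: List[str] = []
    if desc.source not in rules.trusted_sources:
        problems.append("untrusted source")
    if desc.data_format not in rules.accepted_formats:
        problems.append("unsupported format")
    if desc.mean_quality < rules.min_mean_quality:
        problems.append("quality below threshold")
    return problems
```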

“The Gene Profiler runs on the data it was assigned, produces an answer, and returns that answer to the Gene Cloud together with the randomly generated identifier. The answer is then passed on to the Classification Learner.

“The Classification Learner also runs in a trusted execution environment, and it begins to receive results from the various Gene Profiler instances.

“The Classification Learner does not necessarily know how many results it should expect. Even when the number of cohort members is known, errors in Gene Profiler instances or policy violations may yield fewer results than expected. Until the Classification Learner decides to run its algorithm, it simply collects inputs. In this example, the researcher’s Workflow specification states that the Classification Learner should run once it has more than 1000 samples and has not received any new data for at least one hour.
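The trigger rule in this example (more than 1000 samples, and no new data for at least one hour) is simple enough to state as a predicate. The function and constant names below are hypothetical, and how the Workflow specification actually encodes the rule is not described in the text.

```python
from datetime import datetime, timedelta

MIN_SAMPLES = 1000                   # must be exceeded, per the example
QUIET_PERIOD = timedelta(hours=1)    # no new results for at least this long


def should_run(sample_count: int, last_result_at: datetime, now: datetime) -> bool:
    """True when enough results have arrived and the input stream has gone quiet."""
    enough_samples = sample_count > MIN_SAMPLES
    quiet_long_enough = (now - last_result_at) >= QUIET_PERIOD
    return enough_samples and quiet_long_enough
```

With a cohort of 1200, this rule tolerates up to 199 failed or denied Gene Profiler instances before the learner would never start.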

“The Classification Learner requires not only the results of the Gene Profiler instances it has collected, but also information about each cohort member and how that member responded to particular treatments. The Gene Cloud provides APIs that allow the Classification Learner to query non-personally-identifiable phenotypical properties, using the random identifier assigned to each Gene Profiler instance as a proxy for the cohort member’s ID. This indirect mechanism allows the Classification Learner to correlate genotypical and phenotypical information without access to personally identifiable information such as names, addresses, or medical record numbers. It has access only to the properties that are relevant to learning the classification.
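A query interface of this shape, keyed by the random identifier and limited to an allowlist of properties, might be sketched as follows. `PhenotypeAPI`, its constructor arguments, and the property names are illustrative assumptions, not an API defined by the patent.

```python
from typing import Any, Dict, FrozenSet


class PhenotypeAPI:
    """Hypothetical Gene Cloud API exposed to the Classification Learner."""

    def __init__(self, records_by_random_id: Dict[str, Dict[str, Any]],
                 allowed_properties: FrozenSet[str]) -> None:
        self._records = records_by_random_id   # keyed by random ID, never by user ID
        self._allowed = allowed_properties     # properties relevant to this study

    def query(self, random_id: str, prop: str) -> Any:
        """Return one permitted property for the member behind the random ID."""
        if prop not in self._allowed:
            raise PermissionError(f"property {prop!r} is not available to this study")
        return self._records[random_id].get(prop)
```

The allowlist is checked before any record is touched, so a request for an identifying field fails whether or not the field exists.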

“The Classification Learner generates an output result for the researchers. This result contains data structures that can be used to classify new genomes outside the training set.

“Applying the classifier is similar to the ‘Prescription Assistant’ shown in an earlier example. To test and apply the classifier learned above, the researchers create a new VDT, the Classifier program, which incorporates the previously learned classification information. The Classifier program operates on the genomes of a single patient and her tumor, extracts the profile of the seventy-five genes, and applies the learned classification.

“As with the ‘Prescription Assistant’, third-party authorities may certify the VDT (here, the Classifier program). After the Classifier has been tested and its results are deemed acceptable, an entity such as the FDA or the National Cancer Institute can digitally sign the VDT to indicate compliance with its policies.
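The sign-then-verify flow can be illustrated with a short sketch. Note the substitution: a real certification would use a public-key digital signature, but Python's standard library does not provide one, so this example uses a keyed HMAC over SHA-256 as a stand-in. The key, the function names, and the program bytes are all hypothetical.

```python
import hashlib
import hmac


def sign_program(program_bytes: bytes, authority_key: bytes) -> str:
    """Stand-in for a certifying authority's signature over the VDT's bytes.

    Assumption: HMAC-SHA256 with a shared key replaces a public-key signature
    purely for illustration; it does not give third parties independent
    verifiability the way a real digital signature would.
    """
    return hmac.new(authority_key, program_bytes, hashlib.sha256).hexdigest()


def verify_program(program_bytes: bytes, signature: str, authority_key: bytes) -> bool:
    """Recompute the tag and compare in constant time before allowing execution."""
    expected = sign_program(program_bytes, authority_key)
    return hmac.compare_digest(expected, signature)
```

Any change to the program after signing, even a single byte, causes verification to fail, which is the property the certification step relies on.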

“Example: Blind Pharmaceutical Screening.”

Experts believe that the era of blockbuster drugs is over. The future of pharmaceuticals will be based on therapies targeted at particular patients rather than on universally applicable drugs. In many cases, a patient’s genotype can be used to determine whether a therapy will be effective. Pharmaceutical companies are keen to find potential candidates for clinical trials or direct marketing, but this must be done in a way that respects the privacy of patients.

“In this example, a pharmaceutical company creates a screening program to determine whether the owner of a genome is a candidate for an anti-psychotic drug. The company has found that individuals with a particular genotype respond well to the drug.

“The pharmaceutical company creates a series of bioinformatics programs:

“The pharmaceutical company creates a request, signs it, and uploads the programs to the Gene Cloud, where they begin to run. The Selector continues to run, identifying new cohort members for further study as they become available.

“Initially, the Selector finds no matches, because nobody knows about this trial and nobody has chosen to allow their genome data to be mined by pharmaceutical companies. The policies of the genome owners in the Gene Cloud (or, more precisely, the absence of policies that would allow this use) prevent any matches from happening.
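The effect of missing permissions on a continuously running Selector can be shown with a very small sketch. `run_selector_pass` is a hypothetical name, and reducing the Selector to a set intersection is a deliberate simplification for illustration.

```python
from typing import List, Set


def run_selector_pass(meets_criteria: Set[str], opted_in: Set[str]) -> List[str]:
    """Return only members who satisfy the criteria AND have permitted this use.

    Members with no applicable policy are simply absent from `opted_in`,
    so they can never be selected.
    """
    return sorted(meets_criteria & opted_in)
```

Each later pass can be run with an updated `opted_in` set, so newly consenting members are picked up without any change to the Selector itself.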

“The pharmaceutical company posts a notification to a patient group (hosted in the Gene Cloud system) that includes a link allowing interested parties to sign up for this free screening.

“The invitation to take part in the screening explains the purpose of the test and the benefits it could bring to the person being tested. The invitation clearly states that the pharmaceutical company cannot learn the identities of participants, and that it is up to participants themselves to follow up if they may be a good match for the therapy.

Summary for “Systems and methods to protect and govern genomic and other information”