Digital Healthcare – Richard J. Davies, Rick Batye, MD DATACOR, MDdatacor Inc

Abstract for “System, method and apparatus for storing and retrieving clinical, diagnostic, genomic and therapeutic data.”

A method, system, and computer program product are disclosed for storing and retrieving patient data in a database that is connected to a network. The method, system, and computer program product include storing clinical data in the database, extracting data from the clinical data, querying the database with a taxonomy that includes inclusive or exclusive search criteria, and receiving a result set. The method, system, and computer program product also include creating a taxonomy with at least one search criterion, querying the database using the taxonomy, and receiving a result set that includes at least one result record. The method, system, or computer program product can also involve a user, such as a clinical researcher, a treating physician, or a consulting physician, who analyzes the result set.

Background for “System, method and apparatus for storing and retrieving clinical, diagnostic, genomic and therapeutic data.”

The U.S. healthcare sector has the highest stable growth rate of any sector. Demand for healthcare services increases with age: the average person over 65 uses four times more healthcare dollars than someone under 65. The sector’s growth rate will therefore likely increase as the share of the U.S. population over 65 grows from 12% to 18% by 2020.

A data warehouse is a collection of data that supports both clinical and patient management decisions. It brings together many kinds of data to provide a clear picture of business and clinical conditions at a single point in time. Developing a data warehouse involves building systems that extract data from operational systems and installing a warehouse database system that gives clinicians and managers flexible access to the data. The term “data warehousing” refers to the process of combining multiple databases across an enterprise. In contrast, a “data mart” is a database, or collection of databases, that helps clinicians and managers make clinical and strategic business decisions about their patients. Whereas a data warehouse combines databases across an entire enterprise, data marts are typically smaller and focus on one subject or department. Some data marts, called dependent data marts, are subsets of larger data warehouses.

The vast amount of medical technology and information is opening up new avenues for drug and device therapies, diagnosis, and disease prevention strategies for many diseases, including heart disease, diabetes, hypertension, mental illness, allergic reactions, cancer, and infectious disease. Many diseases are linked to specific contributing factors such as genetic factors, family history, and dietary issues. Identifying these contributing factors is important for improving the accuracy of diagnosis and treatment. Furthermore, the future of healthcare will emphasize disease prevention in addition to diagnosis and treatment, so it will also be important to identify people at high risk of developing a disease.

Genetic information has become a powerful tool for clinicians and researchers in medicine. Genomic studies will lead to the development of many targeted therapies. Researchers and clinicians will soon be able to identify variations in deoxyribonucleic acid (DNA) and predict a patient’s response to a specific medicine. It is crucial for physicians to identify whether a patient has a genetically based reaction to a drug. Approximately 7% of all patients have severe adverse reactions to prescribed medications, with drug side effects being the 5th leading cause of death in the United States in 1997 (Pharmacogenomics-Offering a Wealth of Targets for the Pharma Prospector; IMS Health Web Site). Clinical intelligence is needed to allow a doctor to identify when a patient’s clinical profile, family history, or symptoms suggest a genetically based reaction to a specific therapy. A patient identified in this way is a candidate for genetic screening to determine whether a genetic anomaly will result in an adverse side effect. This information will allow a physician to prescribe better medicines and treatments.

Aside from identifying therapeutic strategies, the healthcare industry recognizes the value of a database system containing electronic medical records (EMRs). Such a system would allow for better patient care and improve the efficiency of the doctor’s practice. A well-functioning EMR system could provide valuable information to a wide range of applications, including, but not limited to, diagnostic, therapeutic, and clinical applications, marketing research (i.e. passive recruitment of a population), and marketing services (i.e. active recruitment of research populations). EMR companies have been marketing the benefits of EMR systems for over a decade. However, adoption of the technology is slow because of the complexity of integration and the need to modify workflows. Automation in physicians’ offices is largely restricted to small-scale, client-server-based billing and scheduling. EMR software and other database management capabilities are not available to all physician practices, and even fewer practices have IT support. EMR management is becoming more important because of the complex regulatory environment that clinicians face. It is almost impossible to comply with the new healthcare regulations and practice guidelines using a paper-based system.

PCT Patent Application Serial Number WO 00/51053 describes a medical and diagnostic database that includes patient records with genotype, phenotype, and sample information. However, the database system described in the PCT application relies on genotype or stored sample information to generate correlations between genotype and phenotype.

Moreover, the medical databases in the prior art force physicians to alter their normal process of collecting information because they rely on physicians to complete a questionnaire and impose other restrictions on data entry that are inconvenient for the physician. One example of such a medical database is the epidemiological database disclosed in U.S. Pat. No. 6,182,029.

A successful product or service within the healthcare industry will improve the quality of life of a large number of patients. It will focus on the physician’s tasks while offering a cost-effective solution to a problem. Automating the collection and processing of clinical documentation will give the healthcare industry access to the clinical and economic value in patient medical records.

FIG. 1 illustrates the prior art clinical documentation process. Patient 100 first visits physician 110 for a clinical reason. The visit with physician 110 can take place in any clinical setting, such as a private practice, hospital, or health clinic, and may be for an annual physical or to treat a medical condition. Following the visit, physician 110 compiles a clinical note, which may include historical medical information, vital signs, symptomatic descriptions, prescriptions for pharmaceuticals, or other diagnostic findings. Physician 110 uses the public switched telephone network (PSTN) 120 to connect to transcription service 130 and dictate the clinical note for patient 100. Transcription service 130 stores the dictated clinical note in an audio format on storage device 131. A transcriptionist at transcription service 130 retrieves the dictated clinical note from storage device 131, transcribes it into electronic medical record 135, and stores electronic medical record 135 in digital format on storage device 131. Physician 110 reviews electronic medical record 135 and keeps a printed copy in paper-based chart 140.

After the visit with patient 100, physician 110 may recommend that clinician 115 conduct a clinical test on patient 100. Physician 110 reviews the results, discusses them with patient 100, and then stores them in the paper-based chart 140 associated with patient 100.

The process shown in FIG. 1 lacks the ability to search efficiently for data that is not associated with a particular patient. What is needed is a system, method, and apparatus that automates clinical documentation and allows clinical, diagnostic, and treatment data to be stored and retrieved in natural language format. Software tools help define clinical term or disease taxonomies and group the parsed data, and search criteria then allow intelligent searching of the data warehouse. The disclosed system, method, and apparatus automates clinical documentation and provides search tools and an engine for a data warehouse that unlocks the clinical and economic value in patient medical records.

A method, system, and computer program product are disclosed for retrieving data from a database. The method, system, and computer program product include creating a taxonomy with at least one search criterion, sending a query to the database, receiving in response to the query a result set that includes at least one result record, and displaying the result set. The method, system, or computer program product can also involve a user, such as a consulting physician, a treating physician, or a clinical researcher.

Creating the taxonomy may also include adding to it at least one search rule that includes at least one search characteristic, storing the taxonomy, and validating the taxonomy. In one case, each search rule contains an inclusion rule that defines at least one inclusion search characteristic. Running the inclusion rule against the database generates at least one inclusion result record, and each inclusion result record includes the at least one inclusion search characteristic. Alternatively, each search rule contains an exclusion rule that defines at least one exclusion search characteristic. Running the exclusion rule against the database generates at least one exclusion result record, with each exclusion result record excluding the at least one exclusion search characteristic. Alternatively, each search rule contains both an inclusion rule that defines at least one inclusion search characteristic and an exclusion rule that defines at least one exclusion search characteristic. In that case, running the inclusion rule against the database generates at least one inclusion result record that includes the at least one inclusion search characteristic, whereas running the exclusion rule against the database generates at least one exclusion result record that excludes the at least one exclusion search characteristic. The search characteristic can include a clinical diagnosis phrase, a drug prescription, an illness, and demographic data. Demographic data may include a geographic location, gender, or age. A clinical diagnosis phrase may include myocardial infarction, LDL, or heart attack.

Validating the taxonomy may also include running it against the database, then receiving and displaying the result set. Running the taxonomy can also include notifying the database. Receiving the result set may include receiving an inclusion result set when at least one search rule includes an inclusion rule and running the inclusion rule against the database generates the inclusion result set. Each record in the inclusion result set contains at least one inclusion search characteristic. Receiving the result set can also include receiving an exclusion result set when at least one search rule includes an exclusion rule and running the exclusion rule against the database generates the exclusion result set. Each record in the exclusion result set contains at least one exclusion search characteristic. Another option is to receive both an inclusion result set and an exclusion result set when at least one search rule includes both an inclusion rule and an exclusion rule.

Creating the taxonomy may also include analyzing the result set and updating the taxonomy based on that analysis. Updating the taxonomy may further include flagging or unflagging an inclusion or exclusion result record.

In one embodiment, the analysis of the result set determines a disease susceptibility type or risk for at least one patient. Genetic testing of the at least one patient could identify a modifier gene or detect cancer. Somatic testing of a sample, such as a tumor or tissue sample, could determine whether the disease is present in the patient, predict a drug response, and provide information that helps predict the likelihood of the disease. Proteomic testing of the at least one patient could provide prognostic information or indicate a propensity to develop the disease. In another embodiment, the analysis of the result set identifies at least one patient with a drug response polymorphism, such as a hypertension drug response polymorphism. In other embodiments, the analysis generates a treatment recommendation for the at least one patient, identifies at least one clinical trial for which the at least one patient is eligible, models a virtual clinical trial protocol, or generates data for market research or marketing services.

In one embodiment, the data may include diagnostic data such as past diagnosis and treatment data or biochemical data. It could also include family history, genetic data, drug response history, diet and exercise data, and physiologic data. The data may also include genotype data and haplotype data, such as a chromosome arrangement, a DNA sequence or length, a gene expression, or a nucleotide polymorphism. In another embodiment, the data is related to a genetic-based disorder and includes oncology, cardiology, gastroenterology, orthopedic, gene expression, genotype, or haplotype data. In another embodiment, the database may contain an archive database, an audit log, or an error log.

A method, system, and computer program product are disclosed for storing patient data in a database connected to a network. The method, system, and computer program product include receiving clinical data for a patient, storing the clinical data in an archive database connected to the network, extracting data from the clinical data, and storing the data in the database. The method, system, and computer program product can also include storing the structured data file in the database. Alternatively, the method, system, and computer program product may include creating a patient record in the database and populating it with the data.

Receiving the clinical data can also include establishing a network connection to a server computer that holds the clinical data and then requesting the clinical data from that server. After the clinical data has been received, the receiving process can also include closing the network connection to the server computer.

Extracting the data may also include creating a structured data file, parsing the clinical data, and copying the data into the structured data file. The structured data file must contain a tag for each segment of the clinical data. Parsing the clinical data may include locating at least one segment within the clinical data. It may also include converting the data from the at least one segment to another format in order to improve the database’s performance when searching or adding records. Parsing may further include linking data from the segment to relevant clinical data for another patient. Parsing can also include recognizing a known error in the clinical data, in which case the parsing corrects the error before the clinical data is copied. Alternatively, parsing the clinical data may include storing an unknown error in an error database. In another embodiment, the tag in the structured data file is an Extensible Markup Language (XML) tag, a Hypertext Markup Language (HTML) tag, or a Health Level Seven (HL7) tag.

In one embodiment, the data may include diagnostic data such as past diagnosis and treatment, medical history, biochemical, physiologic, proteomic, family history, diet, exercise, demographic, drug response history, or other data. The data may also include genotype data and haplotype data, such as a chromosome arrangement, a DNA sequence or length, a gene expression, or a nucleotide polymorphism. In another embodiment, the clinical data includes a medical record that contains a clinical note, a laboratory report, or a laboratory result. In another embodiment, the data is related to a genetic-based disorder and includes oncology, cardiology, gastroenterology, orthopedic, immunology, neurology, rheumatology, pulmonology, family practice medicine, demographic, and internal medicine data. In another embodiment, the database may contain an archive database, an audit log, or an error log.

Another embodiment is a system, method, and apparatus for storing and retrieving clinical, diagnostic, and treatment data. The system, method, and apparatus parse a transcription data stream, an electronic medical record, or a historical third-party database, store the parsed data in the data warehouse, and provide software tools to define disease or clinical taxonomies and search criteria. These search criteria allow for intelligent searching of the data warehouse.

The present invention is a general-purpose computer system and method that includes a database containing information useful for diagnostic and clinical purposes. The system allows users to input clinical information about patients from any source, including laboratory reports, physician’s dictated notes, and EKG or other instrument reports. It creates an electronic medical record that includes the patient information and then correlates that patient information with information stored in the data warehouse. Users can also receive suggestions for treatment, diagnostics, or genetic testing. The invention also addresses methods for extracting and storing clinical data, and it provides methods for searching, correlating, and identifying patient groups that share similar attributes.

The present invention also relates to a general-purpose computer system, method, and apparatus that contains a plurality of electronic medical records. Each record contains clinical information about an individual patient, including phenotype, medical, familial, biochemical, and proteomic information as well as diet, exercise, demographic, and drug response history. Further, the present invention relates to systems that include genotype and/or haplotype information. These electronic medical records and methods can be used for many purposes, including clinical, diagnostic, market research, and clinical trial applications.

The present invention also relates to a method of determining a patient’s disease risk and susceptibility type. The method involves extracting clinical data from any clinical source to create an electronic medical record, correlating the patient’s clinical information with information in the system and/or information accessed via one or more public or private domain databases, and generating a result set that contains a suggestion for genetic, proteomic, and/or another type of diagnostic testing.

“The invention also pertains to the display of identified correlations and/or the calculation of statistical significance for the identified correlation.”

“The invention also relates to the entry or transmission of the results from the genetic, proteomic, and/or other diagnostic tests into the data warehouse and the generation of a result set that includes suggestions for treatment based on the patient’s records.”

The invention also relates to a method of identifying a patient who has a drug response polymorphism. The method involves creating an electronic medical record by extracting the patient’s clinical information and drug reaction information from any source, correlating this information with information in the system and/or information accessed via one or more public or private databases relating to single nucleotide polymorphisms (SNPs), and producing a result set that includes a suggestion for genetic testing for possible SNPs associated with the drug.

“The invention also relates to the entry of the results of the genetic test into a system. After that, the system generates an alternative therapy suggestion based on the patient’s record.”

“The present invention also addresses a method of identifying subjects for clinical trials. This involves extracting clinical data to create an electronic medical record, correlating the patient’s clinical information with other patient information in the system, identifying populations or sub-populations of patients with similar phenotypes, genotypes, or clinical characteristics, and identifying clinical trials in which it would be appropriate for the patient to participate.”

“The invention also relates to the general-purpose computer system, method, and apparatus described herein as applied to a wide variety of diseases, including but not limited to cancer, heart disease, diabetes, hypertension, mental illness, allergies, and infectious, neurological, and immunological disorders.”

FIG. 2 shows an embodiment of a system that integrates a data warehouse for storing and retrieving clinical, diagnostic, and treatment data into the prior art clinical documentation process shown in FIG. 1. Another embodiment integrates a data warehouse into the prior art clinical documentation process to determine the disease susceptibility type or risk for a patient. In either embodiment, the system shown in FIG. 2 retains the prior art clinical documentation process and adds features that correct the deficiencies in that process.

In FIG. 2, physician 110 can connect to transcription service 230 via either the public switched telephone network (PSTN) 120 or network 220. PSTN 120 covers traditional landline telephone networks, mobile and cellular telephone networks, and satellite-based telephone networks. Network 220 can be the public Internet, a wide area network, or a local area network using a transmission protocol such as transmission control protocol/Internet protocol (TCP/IP) or file transfer protocol (FTP), or a personal area network such as a Bluetooth network. Physician 110 can input clinical, diagnostic, or treatment data into the system illustrated in FIG. 2 using a variety of audio and digital input formats. Audio input formats can include both traditional audio over a PSTN and digital audio over wireless networks. Digital input formats include digital audio, voice recognition technology, digital video, audio/video, digital documents such as word processing documents and portable document format (PDF) documents, and digital image files.

In addition to receiving input from physician 110 and clinician 115, the system shown in FIG. 2 may also receive input from third-party database 215. Third-party database 215 may include pharmacogenomics databases, laboratory databases, instrumentation databases, and other publicly accessible medical databases. Third-party database 215 can communicate with the system shown in FIG. 2 via PSTN 120 or network 220, and the system retrieves the relevant information.

In FIG. 2, because physician 110 can input data in many formats, storage device 231 of transcription service 230 stores both audio and digital input formats. Transcription service 230 transcribes physician 110’s input data into electronic medical record 135 and then forwards it to physician 110 via network 220 or PSTN 120. Transcription service 230 also transcribes physician 110’s input data into structured electronic medical record 235. Structured electronic medical record 235 enhances the contents of electronic medical record 135 by segmenting it into fields and associating a “tag” with each field. One technology that may be used for field tagging is the Extensible Markup Language (XML), a tagging system related to the Hypertext Markup Language (HTML) and derived from the Standard Generalized Markup Language (SGML); healthcare industry tagging standards may also be used. A subset of the functions performed by transcription service 230 is available from the Speech Machines DictationNet service and from similar offerings by Vianeta, MedRemote, and Total eMed. FIGS. 4A through 4C show an example electronic medical record for a fictional patient. FIGS. 5A through 5I show the electronic medical record of FIGS. 4A through 4C as an example structured electronic medical record that includes XML field tagging.
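The field tagging described above can be illustrated with a minimal Python sketch. It is not the format shown in FIGS. 5A through 5I: the element names (`medical_record`, `patient_id`, `visit_date`, `diagnosis`) and the function `build_structured_record` are hypothetical, chosen only to show how segmented fields might each receive an XML tag.

```python
import xml.etree.ElementTree as ET


def build_structured_record(fields):
    """Wrap each field of a transcribed note in its own XML tag.

    `fields` maps a hypothetical tag name to the text of that segment.
    """
    root = ET.Element("medical_record")  # assumed root element name
    for tag, text in fields.items():
        ET.SubElement(root, tag).text = text
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


# Example: three segments taken from a fictional clinical note.
xml_text = build_structured_record({
    "patient_id": "100",
    "visit_date": "Mar. 28, 2001",
    "diagnosis": "myocardial infarction",
})
print(xml_text)
```

Because every segment carries a tag, a downstream parser can locate fields by name instead of interpreting the free text.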

FIG. 2 also shows the interaction between industry customer 260, transcription service 230, and data warehouse 250. Data warehouse 250 receives electronic medical record 135 and structured electronic medical record 235 as input data from transcription service 230. Data warehouse 250 stores the input data and provides search tools that industry customer 260 can use to search data warehouse 250. Industry customer 260 includes physician 110, medical marketing agencies, manufacturers of medical devices, Medicare, clinical research organizations, and companies that focus on pharmacology or genetics.

FIG. 3 shows data warehouse 250 of FIG. 2 in more detail. Batch download module 310 receives the input data for data warehouse 250, which is derived from electronic medical record 135 and structured electronic medical record 235. Archive data 325 stores an archive or backup copy of the input data. Parser module 330 processes the input data and stores the resulting data in clinical, diagnostic, and treatment data 332. Search 340 contains taxonomy definition module 342, taxonomy validation module 344, and query builder module 346, which perform search functions on clinical, diagnostic, and treatment data 332 to produce query results for output 350. Output 350 contains web distribution module 352, report generation module 354, and download module 356 for distributing query results from search 340 to industry customer 260. Web distribution module 352, report generation module 354, and download module 356 record their activity in audit log 336.

FIG. 3 shows archive data 325, clinical, diagnostic, and treatment data 332, error log 334, and audit log 336 as separate databases, but the present invention contemplates consolidating these databases as well as distributing them to meet efficiency and performance requirements. In one embodiment, these databases use a relational database management system such as Oracle 8i (version 8.1.7). In another embodiment, they use an object-oriented database management system architecture.

FIG. 6 is a flow diagram of the process performed by batch download module 310 shown in FIG. 3. At step 610, the process determines whether batch download module 310 is performing a bulk data load. If the answer is “no”, batch download module 310 performs a periodic retrieval and the process moves to step 612. If the answer is “yes”, batch download module 310 performs a bulk download and the process moves to step 626.

Referring to FIGS. 2, 3, and 6, the periodic retrieval of input data begins at step 612 when batch download module 310 queries transcription service 230 for input data. If batch download module 310 determines at step 614 that no data is available, the process proceeds to step 624 and sleeps until the next retrieval period. If data is available at transcription service 230, the process retrieves electronic medical record 135 at step 616 and structured electronic medical record 235 at step 618. At step 620, batch download module 310 stores electronic medical record 135 and structured electronic medical record 235 in archive data 325. At step 622, batch download module 310 parses structured electronic medical record 235. FIG. 7 describes the parsing process in more detail. The process then continues to step 624 and sleeps until the next retrieval period.
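The periodic retrieval loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `fetch_records`, `archive`, and `parse` are hypothetical stand-ins for the query to transcription service 230, archive data 325, and parser module 330, and the step numbers in the comments follow the description above.

```python
import time


def retrieval_cycle(fetch_records, archive, parse):
    """Run one periodic retrieval pass and return the number of records handled.

    fetch_records: callable returning a list of (record, structured_record) pairs
    archive:       list standing in for archive data 325
    parse:         callable standing in for parser module 330
    """
    batch = fetch_records()                    # step 612: query for input data
    if not batch:                              # step 614: no data available
        return 0
    for record, structured in batch:           # steps 616 and 618: retrieve both forms
        archive.append((record, structured))   # step 620: store in the archive
        parse(structured)                      # step 622: parse the structured record
    return len(batch)


def run_periodically(cycle, period_seconds, max_cycles):
    """Repeat a retrieval cycle, sleeping between passes (step 624)."""
    for _ in range(max_cycles):
        cycle()
        time.sleep(period_seconds)
```

A caller would bind the three arguments once, for example with `functools.partial`, and pass the resulting callable to `run_periodically`.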

Referring to FIGS. 2, 3, and 6, the bulk download of input data begins at step 626 when batch download module 310 connects to a data load server. The data load server is a general-purpose computer with direct access to the bulk data. In one embodiment, a network connection facilitates communication between data warehouse 250 and the data load server. In another embodiment, the data load server and data warehouse 250 are integrated into one general-purpose computer platform. At step 628, batch download module 310 begins an iterative process to load the data by retrieving an electronically stored record, such as electronic medical record 135. Batch download module 310 then converts the electronic record into a structured electronic record, such as structured electronic medical record 235. This conversion is similar to the one that transcription service 230 performs to create structured electronic medical record 235. At step 632, batch download module 310 stores both the electronic record and the structured electronic record in archive data 325. At step 634, batch download module 310 parses the structured electronic record. FIG. 7 describes the parsing process in more detail. If batch download module 310 determines at step 636 that more bulk data is available, the process continues from step 628. If all bulk data has been loaded, batch download module 310 disconnects from the data load server at step 638. The process then sleeps until the next retrieval period.

FIG. 7 is a flow diagram of the parsing process performed by parser module 330 shown in FIG. 3. At step 710, the process creates an empty database record. At step 712, parser module 330 begins an iterative process by locating a tagged field in structured electronic medical record 235 shown in FIG. 2. Because parser module 330 locates the tagged fields in structured electronic medical record 235, it does not process every word to determine the meaning of a phrase within the document in the context of a particular domain or canonical grammar. If parser module 330 determines at step 714 that structured electronic medical record 235 contains no additional tagged fields, the process stores the record in clinical, diagnostic, and treatment data 332. If parser module 330 finds a tagged field but does not recognize it, it attempts to correct known data errors at step 718. If the error is not a known data error, the process writes the data to an exception log at step 716. A system operator periodically analyzes the exception log and attempts to correct or reprocess any incorrect data. If the tagged field is recognized, the process converts the field data at step 722 to improve the efficiency of database searches. For example, if the field contains the date of the patient’s visit, the field data in structured electronic medical record 235 consists of “Mar. 28, 2001” stored in a text field with a length of 10 characters. Because searching text data is inefficient for a database, step 722 converts the field data to a “date and time” datatype. The Oracle DATE datatype is an example of a “date and time” datatype that is extremely efficient because it uses only 7 bytes to store the century, year, month, day, hour, minute, and second. If the field data uniquely identifies another record within the database, the process links the field data to that record at step 724 after converting it. The process then returns to step 712 and repeats for the next tagged field.
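The tag-driven parsing loop can be sketched in Python. The sketch makes several assumptions that are not in the description: the tag names, the `KNOWN_TAG_ERRORS` table of correctable misspellings, and the use of a plain list as the exception log are all hypothetical. It covers creating the record, locating tagged fields, correcting known errors, logging unknown ones, and converting field data, and it omits the linking step.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical recognized tags, each with a converter to an efficient datatype.
KNOWN_FIELDS = {
    "patient_id": int,
    "visit_date": lambda s: datetime.strptime(s, "%b. %d, %Y").date(),
    "diagnosis": str.strip,
}

# Hypothetical table of known, correctable tag errors.
KNOWN_TAG_ERRORS = {"visitdate": "visit_date"}


def parse_record(xml_text, exception_log):
    """Build a database record (a dict here) from a tagged XML document."""
    record = {}                                    # step 710: empty record
    for field in ET.fromstring(xml_text):          # steps 712 and 714: each tagged field
        tag = field.tag
        if tag not in KNOWN_FIELDS:
            tag = KNOWN_TAG_ERRORS.get(tag)        # step 718: try a known correction
            if tag is None:                        # step 716: unknown error, log it
                exception_log.append((field.tag, field.text))
                continue
        try:
            # step 722: convert text to a datatype that searches efficiently
            record[tag] = KNOWN_FIELDS[tag](field.text or "")
        except ValueError:
            exception_log.append((field.tag, field.text))
    return record
```

Converting “Mar. 28, 2001” to a date object mirrors the text-to-DATE conversion described above; the `%b` format code assumes English month abbreviations in the active locale.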

FIG. 8 is a flow diagram of the taxonomy definition process performed by taxonomy definition module 342 shown in FIG. 3. A taxonomy describes the clinical, diagnostic, or treatment data that a database query will return. A taxonomy can include a description of the illness, drug prescriptions, and medical coverage, as well as demographic data such as geographic location, gender, and age. It can also contain clinical diagnostic terms such as myocardial infarction, LDL, or heart attack. At step 810, the taxonomy definition process creates the inclusion rules, which specify the characteristics that must be included in every record of the taxonomy definition result set. At step 812, the process creates the exclusion rules, which specify characteristics that cannot be found in any record of the query result set. Once a user has created the inclusion and exclusion rules that make up the taxonomy, the taxonomy definition process stores them in clinical, diagnostic, and treatment data 332 at step 814.
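One way to picture a taxonomy built from inclusion and exclusion rules is the Python sketch below. It rests on an assumption the description does not state: each rule is modeled as a set of alternative terms, and a record satisfies a rule when it carries at least one of them. The names `Taxonomy`, `matches`, and `run_taxonomy` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """A named set of inclusion and exclusion rules (each rule is a set of terms)."""
    name: str
    inclusion_rules: list = field(default_factory=list)
    exclusion_rules: list = field(default_factory=list)


def matches(rule, record):
    """A rule matches when the record carries at least one of its terms."""
    return bool(rule & record["terms"])


def run_taxonomy(taxonomy, records):
    """Keep records that satisfy every inclusion rule and no exclusion rule."""
    return [
        r for r in records
        if all(matches(rule, r) for rule in taxonomy.inclusion_rules)
        and not any(matches(rule, r) for rule in taxonomy.exclusion_rules)
    ]


# Fictional records, each reduced to a set of already-parsed terms.
records = [
    {"id": 1, "terms": {"heart attack", "statin", "male"}},
    {"id": 2, "terms": {"myocardial infarction", "statin", "under 18"}},
    {"id": 3, "terms": {"LDL", "statin"}},
]
taxonomy = Taxonomy(
    name="adult MI on statins",
    inclusion_rules=[{"myocardial infarction", "heart attack"}, {"statin"}],
    exclusion_rules=[{"under 18"}],
)
result = run_taxonomy(taxonomy, records)  # only record 1 remains
```

Treating synonyms such as myocardial infarction and heart attack as alternatives within one rule keeps the taxonomy readable; a production system would more likely translate the rules into a database query.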

FIG. 9 shows the taxonomy validation process for the data warehouse of FIG. 3. At step 910, a validator selects a taxonomy description from the database to validate. At step 914, the database runs the taxonomy’s inclusion rules to generate an inclusion result set. At step 916, the database runs the taxonomy’s exclusion rules to generate an exclusion result set. At step 918, any database row that appears in both the inclusion and exclusion result sets is flagged in the inclusion result set. At step 920, the validator analyzes the inclusion result set presented by the database. This analysis involves row-by-row inspection. To correct an error, the validator can remove the exclusion flag from a row and update the taxonomy description so that the row no longer falls in the exclusion result set. The validator can also update the taxonomy to bring additional rows into the inclusion result set. After the analysis is completed, the validator saves the updated taxonomy in the database at step 922 and can optionally repeat the process starting at step 912.
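The flagging at step 918 amounts to a set intersection. The following minimal sketch assumes rules are predicates over dictionary rows and that each row carries a unique `id`; the function name `validate_taxonomy` and the sample rows are hypothetical.

```python
def validate_taxonomy(records, inclusion_rules, exclusion_rules, key="id"):
    """Return (inclusion_ids, exclusion_ids, flagged_ids) for a taxonomy."""
    # Step 914: rows satisfying every inclusion rule.
    inclusion_ids = {r[key] for r in records
                     if all(rule(r) for rule in inclusion_rules)}
    # Step 916: rows satisfying at least one exclusion rule.
    exclusion_ids = {r[key] for r in records
                     if any(rule(r) for rule in exclusion_rules)}
    # Step 918: rows present in both sets need the validator's attention.
    flagged_ids = inclusion_ids & exclusion_ids
    return inclusion_ids, exclusion_ids, flagged_ids


# Illustrative rows only; field names and values are made up.
rows = [
    {"id": 1, "ldl": 180, "on_statin": False},
    {"id": 2, "ldl": 190, "on_statin": True},
    {"id": 3, "ldl": 120, "on_statin": False},
]
included, excluded, flagged = validate_taxonomy(
    rows,
    inclusion_rules=[lambda r: r["ldl"] > 175],
    exclusion_rules=[lambda r: r["on_statin"]],
)
```

The flagged rows are the ones a human validator would inspect at step 920 before deciding whether to adjust the inclusion or exclusion rules.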

Referring to FIG. 3, the query builder module 346 lets a user, such as a clinician researcher, treating physician, or consulting physician, pose a clinical question and receive a result set that answers the question. The query builder module 346 merges multiple taxonomy definitions to create a single result set.
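The text does not say how the query builder combines taxonomy definitions. As one possible reading, the sketch below assumes each taxonomy has already produced a set of record identifiers and merges them either by intersection (a record must satisfy every taxonomy) or by union (any taxonomy suffices). The function name and the `mode` parameter are assumptions made for illustration.

```python
from functools import reduce


def merge_result_sets(result_sets, mode="all"):
    """Merge per-taxonomy sets of record ids into one result set."""
    sets = [set(s) for s in result_sets]
    if not sets:
        return set()
    if mode == "all":    # record must appear in every taxonomy's result set
        return reduce(set.intersection, sets)
    if mode == "any":    # record may appear in any taxonomy's result set
        return reduce(set.union, sets)
    raise ValueError(f"unknown mode: {mode!r}")


diabetic_ids = {1, 2, 3}     # made-up result set of one taxonomy
high_ldl_ids = {2, 3, 4}     # made-up result set of another taxonomy
both = merge_result_sets([diabetic_ids, high_ldl_ids])
either = merge_result_sets([diabetic_ids, high_ldl_ids], mode="any")
```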

The invention concerns a database system that contains information useful for medical marketing, clinical diagnostics, clinical trial recruitment, and other purposes. The invention’s database system has two main advantages over existing medical database systems:

First, the system provides a new data entry approach that extracts relevant clinical information from almost any data source, including physicians’ dictated notes, laboratory reports, EKGs, EEGs, and other instrument reports. Once the results exist in an electronic medium, the database system tags the data to allow for search and correlation functions. This approach is particularly useful because it allows large amounts of clinical information to be entered without requiring clinicians to change how they routinely collect such information, for example by limiting them to questionnaire formats or other fixed data entry methods.

Second, the system allows a clinician to access valuable, current information and to receive suggestions for diagnostic testing. These suggestions can be based on the patient’s clinical data and attributes without the need to obtain specific genotype information. The database system of the invention correlates a patient’s clinical information, including specific attributes and demographic information, with data in the warehouse and generates recommendations for genetic, proteomic, or other diagnostic tests based on the patient’s phenotypic characteristics. Further, the invention relates to the entry of genetic testing results into the system. The system then generates treatment suggestions and/or alternate therapies based on those results.

One embodiment of the database system includes a number of electronic medical records. Each record contains clinical information taken from any clinical source relevant to a patient. The electronic medical records of this invention are particularly valuable because they can be easily segmented and searched using a wide range of criteria. They include relevant clinical information such as phenotypic, medical, family, biochemical, physiologic, and proteomic data, diet, exercise, demographics, drug reaction history, drug prescriptions, and laboratory results. Past diagnoses and treatments can also be included. The database may optionally include information such as medication taken, occupational history, hobbies, diet, family history, and normal exercise routines. More specific information includes whether an individual is receiving hormone replacement therapy, whether they smoke, and whether they are a regular user of a sun-tanning device. The record also contains information about the geographical region where the patient lives. One embodiment collects the individual’s phenotypic and chemical information simultaneously to ensure that the chemical information is as relevant as possible to the patient’s phenotype.

Another embodiment of the invention is a database system in which the electronic medical record contains the patient’s genotype information and/or haplotype information. Genotype and haplotype information includes, for example, information about chromosome structure, DNA sequence, length of specific genes or regions, gene expression levels, identification of single nucleotide polymorphisms (SNPs), and/or other information related to a patient’s genetic makeup. Alternatively, or in addition, the genotype data can include a record of the actual or inferred DNA base sequences at one or several regions within the genome. The genotype information may also include a record of variation in a specific chromosomal sequence compared to a reference sequence, indicating whether or not there is variation at the corresponding positions within the sequence. A record of the length or variants of a sequence can also be included in the genotype information. This information is useful for determining whether there is a correlation between genetic variation and phenotypic variation.

In many applications of this invention, an individual’s genotype information (such as SNP information) will not be known at the time the individual is examined by a doctor. According to the invention, the doctor would input the patient’s medical data, including demographics, medical history, and laboratory test results, into the database. The system would then correlate the patient’s clinical data with data in the database and/or data accessed from other public or private databases to generate a recommendation for a specific genetic test. The patient’s medical information can be compared to other records in the database in order to identify attributes that are common in the population sharing a particular SNP. The physician would then be informed that the patient shares attributes with others who have that SNP. This method can also be used to identify patients who are good candidates for clinical trials.
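The comparison described above can be sketched as an attribute-overlap check. The scoring rule here is an assumption, not the patent’s method: an attribute counts as “common” if at least 80% of known carriers have it, and a test is suggested when the patient shows at least 75% of the common attributes. All names, thresholds, and data are hypothetical placeholders.

```python
from collections import Counter


def common_attributes(carrier_records, min_fraction=0.8):
    """Attributes shared by at least min_fraction of the known carriers."""
    counts = Counter(attr for rec in carrier_records for attr in set(rec))
    threshold = min_fraction * len(carrier_records)
    return {attr for attr, n in counts.items() if n >= threshold}


def suggest_test(patient_attrs, carrier_records, snp_name, min_overlap=0.75):
    """Return snp_name if the patient resembles known carriers, else None."""
    shared = common_attributes(carrier_records)
    if not shared:
        return None
    overlap = len(shared & set(patient_attrs)) / len(shared)
    return snp_name if overlap >= min_overlap else None


# Placeholder attribute labels; they carry no clinical meaning.
carriers = [
    {"attr_a", "attr_b", "attr_c"},
    {"attr_a", "attr_b", "attr_d"},
    {"attr_a", "attr_b", "attr_c"},
    {"attr_a", "attr_b"},
    {"attr_a", "attr_b", "attr_c"},
]
suggestion = suggest_test({"attr_a", "attr_b", "attr_x"}, carriers, "SNP-1")
no_suggestion = suggest_test({"attr_a"}, carriers, "SNP-1")
```

A production system would need a statistically grounded measure of association rather than fixed thresholds; the sketch only shows the shape of the lookup.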

In another embodiment, this invention relates to a method of determining a patient’s disease susceptibility and risk. Future healthcare strategies will focus more on disease prevention in areas such as congestive heart failure, cancer, neurological diseases, and other degenerative conditions. The method involves extracting clinical data from any source, creating an electronic patient record, and correlating the patient’s clinical information with information in the system and/or information accessed via one or more public or private domain databases such as the SNP Consortium. The result set may include a suggestion for genetic or proteomic testing.

In another embodiment, the present invention also pertains to displaying an identified correlation to aid in determining the statistical significance of that correlation.

In another embodiment, this invention also relates to entering the results of the genetic, proteomic, and/or other diagnostic tests into the system and creating a result set that includes a treatment suggestion based on the test results and the patient’s record.

Another embodiment of the invention is a method of identifying a patient with drug response polymorphisms. The method involves creating a patient record that includes the patient’s clinical information and drug response information, correlating this information with information in the system and/or information accessed via one or more public or private databases relating to single nucleotide polymorphisms (SNPs), and producing a result set that includes suggestions for genetic testing for possible SNPs associated with the drug response.

In another embodiment, the present invention further includes the step of entering the results of the genetic test into the system, which then generates an alternative therapy suggestion based on the patient’s record.

Many SNPs have been identified and, although the significance of many of them is unknown, they are being investigated. Patients can be tested inexpensively using PCR, restriction fragment length polymorphism, microchip array technology, or other well-known methods. The missing link, however, is access to clinical information that can be used to identify patients for genetic testing. The present invention provides this link by allowing a clinician to correlate phenotypic information with specific genotype information, so that accurate genetic testing can be performed where clinical and demographic information indicates it.

Another embodiment of the invention relates to a method of identifying subjects for clinical trials. The method involves extracting clinical data to create an electronic patient record, correlating the patient’s clinical information with other patient information in the system, identifying sub-populations of patients with similar phenotypes or clinical characteristics, and identifying clinical studies in which their participation would be appropriate.

Approximately 65% of clinical trials fail to finish on schedule due to delays in recruiting patients. Trial sponsors spend $1.3 million each day to pay for the delay in recruitment, which averages more than three months. Sponsors rely almost entirely on treating physicians and their research staff to screen and enroll patients for clinical trials. Attempts to “recruit” clinical trial candidates through media such as radio, TV, and the Internet are not always successful, particularly for candidates with a chronic condition, because patients trust their physician to inform them about all possible treatment options.

Under current practice, the sponsor of a clinical trial awards the trial to a doctor or group of physicians who have participated in clinical studies in the past and whose practice has a large number of patients. The vast majority of these practices do not have the capability to search any type of database to conduct a suitability check, commonly known as screening, for patients based on detailed, multi-dimensional “inclusion/exclusion” criteria. For example, a patient’s use of multiple drugs may or may not allow inclusion, and past medical history may or may not exclude the patient. Because the medical records are paper-based, searching them manually would be difficult and expensive. Therefore, doctors and their research staff wait until patients are seen in the office before initiating the screening and recruitment process. This inefficient process costs sponsors hundreds of millions in sales revenue.

The present invention solves this problem by using the data warehouse and search function to screen large numbers of patients automatically and with greater accuracy, using the inclusion/exclusion and validation functions described herein. For example, a patient may qualify for a clinical study if he or she is a Type II, insulin-dependent diabetic or takes a cholesterol-lowering medication. The invention allows the user to include or exclude subjects based on such detailed information. The system also speeds up clinical trial screening and enrollment, with lower administrative burden and cost for the researchers and physicians.

This invention also provides a method for identifying individuals and sub-populations that share similar phenotypic and genetic characteristics. These individuals or sub-populations can be used to provide valuable information for diagnostic, therapeutic, or research purposes. One embodiment of the invention allows for the identification of sub-populations of individuals with common phenotypic characteristics, using shared attributes from the database. The sub-population of individuals may be further evaluated to determine whether they share a common genotype or have a unique response to drug treatment. This is especially useful in identifying appropriate matched control populations and test populations for the clinical evaluation of drug treatments.

A further embodiment allows physicians to identify individuals from the database based on common characteristics. These individuals could be candidates for further diagnostic testing, such as genetic testing or screening for specific mutations.

In another embodiment, information relevant to making specific treatment decisions may be provided according to this invention by identifying common attributes within a sub-population in the database and communicating pertinent information to a doctor about a patient who has attributes in common with others in the sub-population.

The system can also be used for market research. Companies often have to make complex marketing and development decisions using purchased, sub-optimal information that does not provide a good clinical representation of the targeted patient populations.

For example, prescription information obtained from a pharmacy reflects only prescriptions that have been filled, on a brand- and physician-specific basis (e.g., the pharmacy filled four brand-name cholesterol-lowering prescriptions, two generic cholesterol-lowering prescriptions, and one brand-name arthritis medication prescription written by a particular physician for five patients). This data has two shortcomings. First, it does not distinguish prescriptions that were written from those that were filled, which leaves a gap in monitoring patient compliance. Second, it contains no longitudinal patient data, such as age, sex, and past medical history. Because the data captures only what can be identified through prescriptions “filled,” it does not accurately reflect physicians’ overall “treatable” patient populations. Companies that use insurance claims data to identify the patient and physician populations where their products are most needed face the same problem.

The invention provides a method and system that aggregates and imports prospectively digitized patient data from the network into an information warehouse. The system searches for patient populations using characteristics such as age, sex, past medical history, family history, past surgeries, lab values, past medications, and referring physician.

The invention offers many benefits. The first is that users can focus their queries and efforts on specific patient populations using the validated, rich clinical criteria found in the electronic medical records. For example, an electronic medical record could describe a 54-year-old, sedentary Hispanic woman who has had a cardiac catheterization and no other interventional procedures, who is currently taking drug “X” for hypertension and drug “Y” for her cholesterol, and whose LDL levels were greater than 175 over a one-year period. Access to all or part of this de-identified data (i.e., data that has been cleaned to remove personal information such as name, address, and social security number) is critical for planning a clinical research strategy or the marketing launch of a new therapeutic approach.

In addition to having access to more robust clinical data, users and companies can direct their energies towards targeted patient cohorts. This will not only give them a historical view of patients’ past clinical profiles, but will also create scenarios in which treatment plans and products can be targeted and tracked in order to validate clinical and marketing claims. With a larger data set, companies can target their marketing messages and focus on the clinical community. Another embodiment of the invention allows de-identified aggregate patient data to be used to create and test “virtual” clinical trial protocols using rich, segmented information.

In another embodiment, the invention can also be used to provide marketing services. It is crucial that marketers identify the target population and the type of conventional therapy they want to replace. Field marketing teams are not equipped or trained to recruit patients in physician offices for Phase IV studies. While pharmaceutical companies encourage doctors to accept the results of their clinical studies, they also try to increase marketing of the drug through market-focused Phase IV studies.

However, the data that companies buy generally does not accurately reflect market conditions (e.g., the data covers the “number” of prescriptions for name-brand drugs that a doctor may have written, but not for “whom” they were written). As a result, the companies do not know who the potential patients for new drugs are. Additionally, many physicians use paper-based charts and cannot easily identify which patients have been prescribed which drugs without performing a manual audit of their charts. Given the time constraints and diminishing resources at physicians’ offices, this task can be daunting. It is also costly and time-consuming for companies, and it makes recruiting physicians for Phase IV initiatives a burden.

The present invention provides a method and system for importing historical data as well as continuing to populate a data warehouse with prospective data. The system can then segment all patients by date seen, location, physician, and who prescribed a given drug to a patient with a particular clinical profile. With the consent of both the physician and the patient, the data can be shared with companies that are developing alternative therapies. This allows companies to target patients who might benefit from a switching strategy and increases awareness of the product’s benefits and its market acceptance. The same technology can also generate practice-based reports, which allow users or companies to track and audit compliance and improve physician-patient communication.
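Segmentation of this kind is essentially a group-by over visit records. The sketch below assumes each visit is a dictionary with `patient_id`, `physician`, `drug`, and `location` keys; those field names, the `segment` helper, and the sample data are hypothetical.

```python
from collections import defaultdict


def segment(visits, *keys):
    """Group patient ids by the values of the chosen keys."""
    groups = defaultdict(list)
    for visit in visits:
        groups[tuple(visit[k] for k in keys)].append(visit["patient_id"])
    return dict(groups)


# Illustrative visits only; names and values are made up.
visits = [
    {"patient_id": 1, "physician": "Dr. A", "drug": "X", "location": "North"},
    {"patient_id": 2, "physician": "Dr. B", "drug": "X", "location": "North"},
    {"patient_id": 3, "physician": "Dr. A", "drug": "X", "location": "South"},
]
by_physician_and_drug = segment(visits, "physician", "drug")
by_location = segment(visits, "location")
```

In a relational data warehouse the same grouping would normally be expressed as a SQL `GROUP BY` over the visit table; the in-memory version is shown only because it is self-contained.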

The present invention applies the system described herein to a wide range of diseases, including but not limited to cancer, heart disease, diabetes, hypertension, mental illness, allergies, and arthritis, as well as neurological, immunological, and infectious diseases. Any disease for which the database of the invention identifies a common constellation of phenotypic or genetic features can be diagnosed or treated according to the present invention. The system and methods described herein are also applicable to any other application that might require the data.

Referring back to FIG. 3, the output modules for data warehouse 250 are the web distribution module 352, the report generation module 354, and the download module 356. Each module produces output by retrieving data from archive 325 or by obtaining a result set using query builder module 346. Access to the data and reports conforms with the Health Insurance Portability and Accountability Act. Each module determines authorization and authentication at the customer level. Web distribution module 352 provides a web-based graphical user interface that allows customers to view and print clinical notes, request reports or clinical trial reports, and update their data warehouse service. Report generation module 354 lets customers create and save custom reports. Download module 356 lets customers transfer their output data to a local storage device.

FIG. 10 is a functional block diagram that shows the hardware and software components of data warehouse 250. Bus 1012 connects central processor 1016 to archive data 325, clinical, diagnostic, and treatment data 332, error log 324, audit log 336, transmission control protocol/internet protocol (TCP/IP) adapter 1014, and memory 1010. TCP/IP adapter 1014 is also coupled to network 220 and is the mechanism that allows network traffic to flow between data warehouse 250 and network 220. Central processor 1016 executes the operations described herein by running the sequences of instructions of each computer program that is resident or operative in memory 1010.

FIG. 10 also shows the functional components of data warehouse 250 arranged as an object model. The object model groups the object-oriented programs into components that perform the major functions and applications of data warehouse 250. The object model shown in FIG. 10 may use the Enterprise JavaBeans specification. Paul J. Perrone et al., “Building Java Enterprise Systems with J2EE” (Sams Publishing, June 2000), describes a Java enterprise application developed using the Enterprise JavaBeans specification. Matthew Reynolds, “Beginning E-Commerce” (Wrox Press Inc., 2000), describes the use of an object model in the design and development of a Web server to support electronic commerce applications.

The object model of memory 1010 in data warehouse 250 uses a three-tier architecture that includes presentation tier 1020, infrastructure objects partition 1030, and business logic tier 1040. The object model further divides business logic tier 1040 into two partitions: application service objects partition 1050 and data objects partition 1060.

Presentation tier 1020 retains the programs that manage the graphical user interface to data warehouse 250 for industry customer 260. As shown in FIG. 10, presentation tier 1020 includes TCP/IP interface 1022, web distribution 1024, and report generation 1026. Presentation tier 1020 can use Java servlets to communicate with industry customer 260 via a network transmission protocol such as the Hypertext Transfer Protocol (HTTP) or Secure HTTP (S-HTTP). A Java servlet is a Java program that runs in a Web server environment inside a request/response service: it receives a request from industry customer 260 as input, parses it, performs logic operations, and returns a response to industry customer 260. To service multiple requests simultaneously, the Java runtime platform pools Java servlets. TCP/IP interface 1022 uses Java servlets to function as a Web server that communicates with industry customer 260 via a network transmission protocol such as HTTP or S-HTTP. TCP/IP interface 1022 accepts HTTP requests from industry customer 260 and passes each request to visit object 1042 in business logic tier 1040. Visit object 1042 returns the result information from business logic tier 1040 to TCP/IP interface 1022, which sends the results to industry customer 260 in an HTTP response. TCP/IP interface 1022 uses TCP/IP adapter 1014 to exchange data over network 220.

Infrastructure objects partition 1030 retains the programs that perform administrative and system functions for business logic tier 1040. It includes operating system 1032 and object-oriented program components for system administrator interface 1034, database management system interface 1036, and Java runtime platform 1038.

Business logic tier 1040 retains the programs that are essential to the system’s operation for storing, retrieving, and analyzing clinical, diagnostic, and treatment data. Business logic tier 1040 in FIG. 10 includes multiple instances of visit object 1042. Each client session initiated through web distribution 1024 or report generation 1026 via TCP/IP interface 1022 creates a separate instance of visit object 1042. Each visit object 1042 is a stateful session bean that includes a persistent storage area available from the initiation to the termination of the client session, rather than only for the duration of a single interaction or method call. The persistent storage area is associated with industry customer 260 shown in FIG. 2. It also stores data exchanged via TCP/IP interface 1022 between data warehouse 250 and transcription services 230, physician 110, clinical provider 115, or third party databases 215.

When industry customer 260 accesses a program in application service objects partition 1050, TCP/IP interface 1022 sends a message that invokes a method to create visit object 1042 and store connection information in the visit object 1042 state. Visit object 1042, in turn, invokes a method within the program. Although FIG. 10 depicts central processor 1016 controlling each program in application service objects partition 1050, it should be understood that each function can be distributed to another system similar to data warehouse 250.
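The per-session state that the visit object holds can be illustrated without an EJB container. The sketch below is a plain-Python analogue, not Enterprise JavaBeans code: the `Visit` and `VisitRegistry` classes and their methods are hypothetical names chosen to show one state object per client session, kept alive across calls until the session ends.

```python
import uuid


class Visit:
    """Per-client session state, kept from session start to termination."""

    def __init__(self, customer_id):
        self.customer_id = customer_id
        self.state = {}   # survives across individual requests in the session


class VisitRegistry:
    """Creates one Visit per client session and looks it up on later calls."""

    def __init__(self):
        self._visits = {}

    def open(self, customer_id):
        session_id = uuid.uuid4().hex
        self._visits[session_id] = Visit(customer_id)
        return session_id

    def get(self, session_id):
        return self._visits[session_id]

    def close(self, session_id):
        self._visits.pop(session_id, None)


registry = VisitRegistry()
first = registry.open("customer-260")
second = registry.open("customer-260")          # same customer, separate session
registry.get(first).state["last_query"] = "high LDL"
```

Each session gets its own state, so a query stored during one session is not visible in another session opened by the same customer.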

The object model splits business logic tier 1040 into two partitions: application service objects partition 1050 and data objects partition 1060. The programs that reside in application service objects partition 1050 include batch download 1051, archiver 1052, parser 1053, taxonomy definer and validator 1054, and query builder 1055. The programs in application service objects partition 1050 include C++ programs, Java Server Pages, and Oracle scripts. Data objects partition 1060 contains download data 1061, archiver data 1062, parser data 1063, and taxonomy definer and validator data 1064. Every program in application service objects partition 1050 has a counterpart in data objects partition 1060 that stores the program’s input, intermediate, and output data. The batch download 1051 and archiver 1052 processes are shown in FIG. 6, as discussed above. The parser 1053 process is shown in FIG. 7, as discussed above. The taxonomy definer and validator 1054 processes are shown in FIG. 8 and FIG. 9, as discussed above. The query builder 1055 process is also described above.

FIG. 11 is a diagram of the structure of clinical, diagnostic, and treatment data 332 shown in FIG. 3. Clinical, diagnostic, and treatment data 332 is a data warehouse that supports clinical and management decision making. The data is grouped into the following logical components of the data warehouse: specialty and demographics 1110, oncology 1120, urology 1130, cardiology 1140, gastroenterology 1150, and orthopedics 1160. In one embodiment, specialty and demographics 1110 provides external access to oncology 1120, urology 1130, cardiology 1140, gastroenterology 1150, and orthopedics 1160, and only specialty and demographics 1110 is externally accessible. In another embodiment, each logical component is separate, is not linked to the others, and is externally accessible.

The embodiments described herein are fully functional systems, methods, and apparatus for storing, retrieving, and analyzing clinical, diagnostic, or treatment data in natural human language formats. However, it is important to understand that other, similar embodiments exist. The disclosure is open to many modifications and variations. The system, method, and apparatus for storing and retrieving clinical, diagnostic, or treatment data is not limited to the specific construction and operation shown. This disclosure covers all modifications and equivalents that fall within the scope of the claims.

“EXAMPLES”

“Example 1”

“Long QT Syndrome”

A physician inputs the following clinical data into the system to determine a patient’s disease susceptibility or risk and/or drug response polymorphism:

The system informs the doctor that the patient may have partially penetrant Long QT Syndrome and recommends genetic testing of the patient, including testing for any of the five genes associated with Long QT Syndrome. A mutation is found in LQT2, which affects potassium channels. The system advises avoiding all drugs that prolong cardiac repolarization, such as antiarrhythmics, gastrokinetics, antipsychotics, and antihistamines, and recommends a substitute drug for seasonal allergies. The system also recommends further testing for the patient’s relatives. One sister and one daughter are found to have the same LQT2 mutation. The physician advises the patient and family to avoid these drugs in order to prevent sudden cardiac death.

“Example 2”

“Arthritis and Anemia: Thiopurine S-Methyltransferase Mutation”

A physician inputs the following clinical data into the system to determine a patient’s disease susceptibility or risk and/or drug response polymorphism:

The system generates a result set that includes a suggestion to the doctor to test the patient for a mutation at the thiopurine S-methyltransferase (TPMT) gene locus. The patient is found to be heterozygous for mutant TPMT, which causes severe hematopoietic toxicity as well as anemia. The system then produces a result set indicating to the doctor that the patient may have a genetic polymorphism that makes the patient intolerant to thiopurine medication. It also suggests alternative anti-arthritic medications that are not metabolized by TPMT.

“Example 3”

“Colonic Neoplasia, Rapid Metabolic Phenotype for Acetyltransferase and Cytochrome P4501A2”

A physician inputs the following clinical data into the system to determine a patient’s disease susceptibility or risk and/or drug response polymorphism:

There is growing evidence that fast acetylators who eat red meat cooked to a high degree may be more at risk of developing colon cancer. This type of susceptibility testing is becoming more important. This system will encourage physicians to conduct genetic testing as needed. It is unlikely that the average physician will know what the current recommendations are, especially since most doctors don’t follow the latest developments in genetic/molecular medicine and clinical medicine.

“Example 4”

“Breast Cancer BRCA1/2 Mutations, and Estrogen Metabolism.”

A physician inputs the following clinical data into the system to determine a patient’s disease susceptibility or risk and/or drug response polymorphism:

Summary for “System, method and apparatus for storing and retrieving clinical, diagnostic, genomic and therapeutic data.”

The vast amount of medical technology and information is opening up new avenues for drug and device therapies, diagnosis, and disease prevention strategies for many diseases. These include heart disease, diabetes, hypertension, mental illness, allergic reactions, cancer, and infectious disease. Many diseases are linked to specific contributing factors such as genetic factors, family history, and dietary issues. To improve the accuracy of diagnosis and treatment, it is important to identify these contributing factors. Furthermore, the future of healthcare will emphasize disease prevention in addition to diagnosis and treatment, so it will also be important to identify people at high risk for developing a disease.

Genetic information has become a powerful tool for clinicians and researchers in medicine. Genomic studies will lead to the development of many targeted therapies. Researchers and clinicians will soon be able to identify variations in deoxyribonucleic acid (DNA) and predict a patient’s response to a specific medicine. It is crucial for physicians to identify whether a patient has a genetically based reaction to a drug. Approximately 7% of all patients have severe adverse reactions to prescribed medications, with drug side effects being the 5th leading cause of death in the United States in 1997 (Pharmacogenomics-Offering a Wealth of Targets for the Pharma Prospector; IMS Health Web Site). Clinical intelligence is needed to allow a doctor to recognize when a patient’s clinical profile, family history, or symptoms suggest a genetically based reaction to a specific therapy. A patient identified this way will be eligible for genetic screening to determine whether they have a genetic anomaly that will result in an adverse side effect. This information will allow a physician to prescribe better medicines and treatments.

Aside from identifying therapeutic strategies, the healthcare industry recognizes the value of a database system containing electronic medical records (EMRs). Such a system would allow for better patient care and improve the efficiency of the doctor’s practice. A well-functioning EMR system could provide valuable information to a wide range of applications, including, but not limited to, diagnostic, therapeutic, clinical, marketing research (i.e., passive recruitment of a population), and marketing services (i.e., active recruitment of research populations) applications. EMR companies have been marketing the benefits of EMR systems for over a decade. However, adoption of the technology is slow because of the complexity of integration and the need to modify workflows. Automation in physicians’ offices is restricted to small-scale, client-server based billing and scheduling. Few physician practices have EMR software or other database management capabilities, and even fewer have IT support. EMR management is becoming more important because of the complex regulatory environment that clinicians face. It is almost impossible to comply with the new healthcare regulations and practice guidelines using a paper-based system.

PCT Patent Application Serial Number WO 00/51053 describes a medical and diagnostic database that includes patient records containing genotype, phenotype, and sample information. However, the database system described in the PCT application relies on genotype or stored sample information to generate correlations between genotype and phenotype.

Moreover, the medical databases in the prior art force physicians to alter their normal process of collecting information, because they rely on physicians to complete a questionnaire and impose other restrictions on data entry that are inconvenient for the physician. One example of such a medical database is the epidemiological database disclosed in U.S. Patent Ser. No. 6,182,029.

A successful product or service within the healthcare industry will improve the quality of life of a large number of patients. It will focus on the physician’s tasks while offering a cost-effective solution to a problem. Automating the collection and processing of clinical documentation will give the healthcare industry access to the clinical and economic value in patient medical records.

FIG. 1 shows the prior art clinical documentation process. Patient 100 first visits physician 110 for a clinical reason. The visit may take place in any clinical setting, such as a private practice, hospital, or health clinic, and may be for an annual physical or to treat a medical condition. Following the visit, physician 110 compiles a clinical note. The note may include historical medical information, vital signs, and symptomatic descriptions, as well as prescriptions for pharmaceuticals or other diagnostic findings. Physician 110 uses the public switched telephone network (PSTN) 120 to connect to transcription service 130 and dictate the clinical note for patient 100. Transcription service 130 stores the dictated clinical note in an audio format on storage device 131. Transcription service 130 then retrieves the dictated clinical note from storage device 131, transcribes it into electronic medical record 135, and stores electronic medical record 135 in digital format on storage device 131. Physician 110 reviews electronic medical record 135 and keeps a printed copy in paper-based charting 140.

After the visit with patient 100, physician 110 may recommend that clinician 115 conduct a clinical test on patient 100. Physician 110 reviews the results, discusses them with patient 100, and then stores the results in the paper-based charting 140 associated with patient 100.

The prior art process of FIG. 1 lacks the ability to efficiently search for data that is not associated with a particular patient. What is needed is a system, method, and apparatus that automates clinical documentation and allows clinical, diagnostic, and treatment data to be stored and retrieved in a natural language format. Software tools help define clinical term or disease taxonomies and group the parsed data, and search criteria then allow intelligent searching of the data warehouse. The disclosed system, method, and apparatus automates clinical documentation and provides search tools and an engine for a data warehouse that unlocks the clinical and economic value in patient medical records.

A method, system, and computer program product are disclosed for retrieving data from a database. The method, system, and computer program product include creating a taxonomy with at least one search criterion, sending a query to the database, receiving in response to the query a result set that includes at least one result record, and displaying the result set. The method, system, and computer program product can also involve a user, such as a consulting physician, a treating physician, or a clinical researcher.

Creating a taxonomy may also include adding to it at least one search rule that includes at least one search characteristic, storing the taxonomy, and validating the taxonomy. Each search rule may contain an inclusion rule that defines at least one inclusion search characteristic, such that running the inclusion rule against the database generates at least one inclusion result record, and each inclusion result record includes the at least one inclusion search characteristic. Alternatively, each search rule may contain an exclusion rule that defines at least one exclusion search characteristic, such that running the exclusion rule against the database generates at least one exclusion result record, and each exclusion result record excludes the at least one exclusion search characteristic. As a further alternative, each search rule may contain both an inclusion rule that defines at least one inclusion search characteristic and an exclusion rule that defines at least one exclusion search characteristic. In that case, running the inclusion rule against the database generates at least one inclusion result record, each of which includes the at least one inclusion search characteristic, whereas running the exclusion rule against the database generates at least one exclusion result record, each of which excludes the at least one exclusion search characteristic. A search characteristic can include a clinical diagnosis phrase, a drug prescription, an illness, or demographic data. Demographic data may include a geographical location, gender, or age. A clinical diagnosis phrase may include myocardial infarction, LDL, or heart attack.
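The inclusion and exclusion rules described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the `SearchRule` class, the `run_rule` function, the field names, and the sample records are all hypothetical, and an in-memory list stands in for the database. The sketch assumes that a record belongs in the result when it has every inclusion characteristic and none of the exclusion characteristics.

```python
from dataclasses import dataclass, field


@dataclass
class SearchRule:
    """Hypothetical search rule holding inclusion and exclusion characteristics."""
    include: dict = field(default_factory=dict)
    exclude: dict = field(default_factory=dict)


def has_all(record, characteristics):
    """True when the record carries every listed characteristic."""
    return all(record.get(key) == value for key, value in characteristics.items())


def has_any(record, characteristics):
    """True when the record carries at least one listed characteristic."""
    return any(record.get(key) == value for key, value in characteristics.items())


def run_rule(rule, records):
    """Keep records that match the inclusion rule and avoid the exclusion rule."""
    return [
        record for record in records
        if has_all(record, rule.include) and not has_any(record, rule.exclude)
    ]


# Fictional sample records; the field names are assumptions for illustration.
PATIENTS = [
    {"id": 1, "diagnosis": "myocardial infarction", "gender": "F", "drug": "statin"},
    {"id": 2, "diagnosis": "myocardial infarction", "gender": "M", "drug": "none"},
    {"id": 3, "diagnosis": "hypertension", "gender": "F", "drug": "statin"},
]

rule = SearchRule(
    include={"diagnosis": "myocardial infarction"},
    exclude={"drug": "statin"},
)
result_set = run_rule(rule, PATIENTS)
print([record["id"] for record in result_set])  # [2]
```

A rule with no characteristics matches every record, which makes it easy to start from a broad taxonomy and narrow it one characteristic at a time.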

Validating the taxonomy may also include running the taxonomy against the database, receiving the result set, and displaying the result set. Running the taxonomy can also include notifying the database. Receiving the result set may include receiving an inclusion result set, where at least one search rule includes an inclusion rule and running the inclusion rule against the database generates the inclusion result set; each record in the inclusion result set includes at least one inclusion search characteristic. Alternatively, receiving the result set may include receiving an exclusion result set, where at least one search rule includes an exclusion rule and running the exclusion rule against the database generates the exclusion result set; each record in the exclusion result set excludes at least one exclusion search characteristic. As a further alternative, receiving the result set may include receiving both an inclusion result set and an exclusion result set, generated by running an inclusion rule and an exclusion rule against the database, respectively.

Creating the taxonomy may also include analyzing the result set and updating the taxonomy based on that analysis. Updating the taxonomy may further include flagging or unflagging an inclusion or exclusion result record.

In one embodiment, the analysis of the result set determines a disease susceptibility type or risk for at least one patient. A genetic test of the at least one patient could identify a modifier gene or detect cancer. Somatic testing of a sample, such as a tumor or tissue sample, could determine whether the disease is present, predict whether a drug response will occur, and provide information that helps in predicting the likelihood of the disease. Proteomic testing of the at least one patient could provide prognostic information or indicate a propensity to develop the disease. In another embodiment, the analysis of the result set identifies at least one patient having a drug response polymorphism, such as a hypertension drug response polymorphism. In other embodiments, the analysis generates a treatment recommendation for at least one patient, identifies at least one clinical trial for which the at least one patient is eligible, models a virtual clinical trial protocol, or generates data for market research or market services.

In one embodiment, the data may include diagnostic data such as past diagnoses and treatments, biochemical data, family history, genetic data, drug response history, diet and exercise data, and physiologic data. The data may also include genotype data and haplotype data, such as a chromosome structure, a DNA sequence or length, a gene expression, or a nucleotide polymorphism. In another embodiment, the data relates to a genetic-based disorder and includes oncology, cardiology, gastroenterology, orthopedic, gene expression, genotype, or haplotype data. In another embodiment, the database may contain an archive database, an audit log, or an error log.

A method, system, and computer program product are disclosed for storing patient data in a database connected to a network. The method, system, and computer program product include receiving clinical data for the patient, storing the clinical data in an archive database connected to the network, extracting the data from the clinical data, and storing the data in the database. The method, system, and computer program product can also include storing the structured file in the database. Alternatively, the method, system, and computer program product may include creating a patient record in the database and populating it with the data.

Receiving the clinical data can also include establishing a network connection to a server computer that holds the clinical data and requesting the clinical data from the server computer. After the clinical data is received, the receiving process can also include closing the network connection to that server computer.

Extracting the data may also include creating a structured file, parsing the clinical data, and copying the clinical data into the structured file. The structured file contains a tag for each segment of the clinical data. Parsing the clinical data may include locating at least one segment within the clinical data. It may also include converting data from the at least one segment to another format in order to improve the database’s performance when searching or adding records. Parsing may also include linking data from the segment to relevant clinical data for another patient. Parsing can further include recognizing a known error in the clinical data, in which case the parsing corrects the error before the clinical data is copied. Alternatively, parsing the clinical data may include storing an unknown error in an error database. In another embodiment, the tag in the structured file is an Extensible Markup Language (XML) tag, a Hypertext Markup Language (HTML) tag, or a Health Level Seven (HL7) tag.
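As a hedged illustration of a structured file that carries one tag per segment, the Python sketch below wraps each segment of a clinical note in its own XML element using the standard library. The element names (`medical_record`, `patient_name`, `visit_date`, `diagnosis`) and the sample note are assumptions made for the example and are not taken from the disclosure.

```python
import xml.etree.ElementTree as ET


def build_structured_record(segments):
    """Wrap each named segment of a clinical note in its own XML tag."""
    root = ET.Element("medical_record")  # hypothetical root element name
    for tag, text in segments.items():
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")


# Fictional note, already split into segments by an upstream step.
note = {
    "patient_name": "Jane Doe",
    "visit_date": "Mar. 28, 2001",
    "diagnosis": "hypertension",
}
xml_text = build_structured_record(note)
print(xml_text)
```

Because every segment has its own tag, a later parsing step can locate a field by name instead of interpreting the free text of the note.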

In one embodiment, the data may include diagnostic data such as past diagnoses and treatments, medical history, biochemical, physiologic, proteonomic, family history, diet, exercise, demographic, drug response history, or other data. The data may also include genotype data and haplotype data, such as a chromosome structure, a DNA sequence or length, a gene expression, or a nucleotide polymorphism. In another embodiment, the clinical data includes a medical record that contains a clinical note, a laboratory report, or a laboratory result. In another embodiment, the data relates to a genetic-based disorder and includes oncology, cardiology, gastroenterology, orthopedic, immunology, neurology, rheumatology, pulmonology, family practice medicine, demographic, and internal medicine data. In another embodiment, the database may contain an archive database, an audit log, or an error log.

Another embodiment is a system, method, and apparatus for storing and retrieving clinical, diagnostic, and treatment data. The system, method, and apparatus parse a transcription data stream, an electronic medical record, or a historical third-party database, store the parsed data in the data warehouse, and provide software tools to define disease or clinical taxonomies and search criteria that allow intelligent searching of the data warehouse.

The present invention is a general-purpose computer system and method that includes a database containing information useful for diagnostic and clinical purposes. The system allows users to input clinical information about patients from any source, including laboratory reports, physicians’ dictated notes, and EKG or other instrument reports. It creates an electronic medical record containing the patient information and then correlates that patient information with information stored in the data warehouse. Users can also receive suggestions for treatment, diagnostics, or genetic testing. The invention also addresses methods for extracting and storing clinical data, and provides methods for searching, correlating, and identifying patient groups that share similar attributes.

The present invention also relates to a general-purpose computer system, method, and apparatus that contains a plurality of electronic medical records. Each record contains clinical information about an individual patient, including phenotypic, medical, familial, biochemical, and proteonomic information, as well as diet, exercise, demographic, and drug response history. Further, the present invention relates to systems that include genotype and/or haplotype information. These electronic medical records and methods can be used for many purposes, including clinical, diagnostic, market research, and clinical trial applications.

The present invention also relates to a method of determining a patient’s disease risk and susceptibility type. The method involves extracting clinical data from any clinical source to create an electronic medical record, correlating the patient’s clinical information with information in the system and/or accessed via one or more public or private domain databases, and generating a result set that contains a suggestion for genetic, proteonomic, and/or another type of diagnostic testing.

The invention also pertains to the display of identified correlations and/or the calculation of statistical significance for the identified correlations.

The invention also relates to the entry or transmission of the results from the genetic, proteonomic, and/or other diagnostic tests into the data warehouse and the generation of a result set that includes suggestions for treatment based on the patient’s records.

The invention also relates to a method of identifying a patient having a drug reaction polymorphism. The method involves creating an electronic medical record by extracting the patient’s clinical information and drug reaction information from any source, correlating this information with information in the system and/or accessed via one or more public or private databases relating to single nucleotide polymorphisms (SNPs), and producing a result set that includes a suggestion to test genetically for possible SNPs associated with the drug.

The invention also relates to the entry of the results of the genetic test into the system. The system then generates an alternative therapy suggestion based on the patient’s record.

The present invention also addresses a method of identifying subjects for clinical trials. The method involves extracting clinical data to create an electronic medical record, correlating the patient’s clinical information with other patient information in the system, identifying populations or sub-populations of patients with similar phenotypes, genotypes, or clinical characteristics, and identifying clinical trials in which it would be appropriate for the patient to participate.

The invention also relates to the general-purpose computer system, method, and apparatus described herein as applied to a wide variety of diseases, including but not limited to cancer, heart disease, diabetes, hypertension, mental illness, allergies, and infectious, neurological, and immunological disorders.

FIG. 2 shows an example of a system that integrates a data warehouse for storing and retrieving clinical, diagnostic, and treatment data into the prior art clinical documentation process of FIG. 1. Another embodiment integrates a data warehouse into the prior art clinical documentation process to determine the disease susceptibility type or risk for a patient. In either embodiment, the system of FIG. 2 retains the prior art clinical documentation process and adds features that correct the deficiencies in that process.

In FIG. 2, physician 110 can connect to transcription service 230 via either the public switched telephone network (PSTN) 120 or network 220. PSTN 120 covers traditional landline telephone networks as well as mobile, cellular, and satellite-based telephone networks. Network 220 may be the public Internet, a wide area network, or a local area network using a transmission protocol such as transmission control protocol/Internet protocol (TCP/IP) or file transfer protocol (FTP), or a personal area network such as a Bluetooth network. Physician 110 can input clinical, diagnostic, or treatment data into the system of FIG. 2 using a variety of audio and digital input formats. Audio input formats include both traditional audio over a PSTN and digital audio over wireless networks. Digital input formats include digital audio, voice recognition technology, digital video, audio/video, digital documents such as word processing documents and portable document format (PDF) documents, and digital image files.

In addition to receiving input from physician 110 and clinical provider 115, the system shown in FIG. 2 may also receive input from third-party database 215. Third-party database 215 may include pharmacogenomic databases, laboratory databases, instrumentation databases, and other publicly accessible medical databases. Third-party database 215 communicates with the system of FIG. 2 via PSTN 120 or network 220, and the system retrieves the relevant information.

In FIG. 2, because physician 110 can input data in many formats, storage device 231 of transcription service 230 stores both audio and digital input formats. Transcription service 230 transcribes physician 110’s input data into electronic medical record 135 and forwards it to physician 110 via network 220 or PSTN 120. Transcription service 230 also transcribes physician 110’s input data into structured electronic medical record 235. Structured electronic medical record 235 enhances the contents of electronic medical record 135 by segmenting it into fields and associating a “tag” with each field. One technology that may be used for field tagging is the Extensible Markup Language (XML), a tagging system derived from the Standard Generalized Markup Language (SGML); the Hypertext Markup Language (HTML) or a healthcare industry tagging standard may also be used. A subset of the functions performed by transcription service 230 is available from the Speech Machines DictationNet service and from similar offerings by Vianeta, MedRemote, and Total eMed. FIGS. 4A through 4C show an exemplary electronic medical record for a fictional patient. FIGS. 5A through 5I show the exemplary electronic medical record of FIGS. 4A through 4C as a structured electronic medical record that includes XML field tagging.

FIG. 2 also shows the interaction between industry customer 260, transcription service 230, and data warehouse 250. Data warehouse 250 receives electronic medical record 135 and structured electronic medical record 235 from transcription service 230 as input data. Data warehouse 250 stores the input data and provides search tools that industry customer 260 can use to search data warehouse 250. Industry customer 260 includes physician 110, medical marketing agencies, manufacturers of medical devices, Medicare, clinical research organizations, and companies that focus on pharmacology or genetics.

FIG. 3 shows data warehouse 250 of FIG. 2 in greater detail. Batch download module 310 receives the input data for data warehouse 250, which is derived from electronic medical record 135 and structured electronic medical record 235. Archive data 325 stores an archive or backup copy of the input data. Parser module 330 processes the input data and stores the resulting data in clinical, diagnostic, and treatment data 332. Search 340 contains taxonomy definition module 342, taxonomy validation module 344, and query builder 346, which perform search functions on clinical, diagnostic, and treatment data 332 to produce query results for output 350. Output 350 contains web distribution module 352, report generation module 354, and download module 356 for distributing query results from search 340 to industry customer 260. Web distribution module 352, report generation module 354, and download module 356 store the result data in audit log 336.

FIG. 3 shows archive data 325, clinical, diagnostic, and treatment data 332, error log 334, and audit log 336 as separate databases, but the present invention also contemplates consolidating them, or distributing them, to meet efficiency and performance requirements. These databases may use relational database management software such as Oracle 8i (version 8.1.7). In one embodiment, these databases may instead use an object-oriented database management software architecture.

FIG. 6 shows the process performed by batch download module 310 of FIG. 3. Step 610 determines whether batch download module 310 is performing a bulk data load. If the answer is “no”, batch download module 310 performs a periodic retrieval and the process moves to step 612. If the answer is “yes”, batch download module 310 performs a bulk download and the process proceeds to step 626.

Referring to FIGS. 2, 3 and 6, the periodic retrieval of input data begins at step 612 when batch download module 310 queries transcription service 230 for input data. If batch download module 310 determines at step 614 that no data is available, the process proceeds to step 624 and sleeps until the next retrieval period. If data is available at transcription service 230, the process retrieves electronic medical record 135 at step 616 and structured electronic medical record 235 at step 618. At step 620, batch download module 310 stores electronic medical record 135 and structured electronic medical record 235 in archive 325. At step 622, batch download module 310 parses structured electronic medical record 235. FIG. 7 describes the parsing process more fully. The process then continues to step 624 and sleeps until the next retrieval period.

Referring to FIGS. 2, 3 and 6, the bulk download of input data begins at step 626 when batch download module 310 connects to a data load server. A data load server is a general-purpose computer with direct access to the bulk data. In one embodiment, a network connection carries communication between data warehouse 250 and the data load server. Another embodiment integrates the data load server and data warehouse 250 into one general-purpose computer platform. At step 628, batch download module 310 starts an iterative process of loading the data by retrieving an electronic record, such as electronic medical record 135. Batch download module 310 then converts the electronic record into a structured electronic record, such as structured electronic medical record 235. This conversion is similar to the one transcription service 230 performs to create structured electronic medical record 235. At step 632, batch download module 310 stores both the electronic record and the structured electronic record in archive 325. At step 634, batch download module 310 parses the structured electronic record. FIG. 7 describes the parsing process more fully. If batch download module 310 determines at step 636 that more bulk data is available, the process continues from step 628. If all bulk data has been loaded, batch download module 310 disconnects from the data load server at step 638. The process then sleeps until the next retrieval period.
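The two download paths can be sketched in Python as follows. This is only an illustration under stated assumptions: in-memory lists stand in for transcription service 230, the data load server, and archive 325; `to_structured` is a hypothetical placeholder for the conversion to a structured record; and the caller is assumed to handle sleeping between retrieval periods. The step numbers in the comments refer to the steps named in the text.

```python
def to_structured(plain_record):
    """Hypothetical stand-in for converting a record to structured form."""
    return {"text": plain_record}


def periodic_retrieval(service_queue, archive, parse):
    """Fetch one pending record pair from the transcription service, if any."""
    if not service_queue:                      # step 614: no data available
        return 0                               # caller sleeps (step 624)
    plain, structured = service_queue.pop(0)   # steps 616 and 618
    archive.append((plain, structured))        # step 620
    parse(structured)                          # step 622
    return 1


def bulk_download(bulk_records, archive, parse):
    """Convert, archive, and parse every record held by the bulk source."""
    handled = 0
    for plain in bulk_records:                 # loop while bulk data remains (step 636)
        structured = to_structured(plain)
        archive.append((plain, structured))    # step 632
        parse(structured)                      # step 634
        handled += 1
    return handled


def batch_download(is_bulk_load, source, archive, parse):
    """Step 610: choose between a bulk download and a periodic retrieval."""
    if is_bulk_load:
        return bulk_download(source, archive, parse)
    return periodic_retrieval(source, archive, parse)


archive, parsed = [], []
handled = batch_download(True, ["note A", "note B"], archive, parsed.append)
print(handled, len(archive))  # 2 2
```

Passing the parser in as a callable keeps the download logic independent of how parsing is implemented, which mirrors the separation between the download and parsing steps in the description.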

FIG. 7 shows the process performed by parser module 330 of FIG. 3. At step 710, the process creates an empty database record. At step 712, parser module 330 starts the iterative process of locating a tagged field in structured electronic medical record 235 shown in FIG. 2. Parser module 330 only locates the tagged fields in structured electronic medical record 235; it does not process every word to determine the meaning of a phrase within the document in the context of a particular domain or canonical grammar. If parser module 330 determines at step 714 that structured electronic medical record 235 contains no additional tagged fields, the process stores the record in clinical, diagnostic, and treatment data 332 at step 724. If parser module 330 finds a tagged field but fails to recognize it, the process attempts to correct known data errors at step 718. If the error is not a known data error, the process writes the data to an exception log at step 716. A system operator periodically analyzes the exception log and attempts to correct and reprocess any incorrect data. If the tagged field is recognized, the process converts the field data at step 722 to improve the efficiency of database searches. For example, if the field contains the date of the patient’s visit, the field data in structured electronic medical record 235 consists of “Mar. 28, 2001” stored in a text field with a length of 10 characters. Because searching text data is inefficient for a database, step 722 converts the field data to a “date and time” datatype. The Oracle DATE datatype is an example of a “date and time” datatype; it is efficient because it uses only 7 bytes to store the century, year, month, day, hour, minute, and second. If the field data uniquely identifies another record within the database, the process links it to that record at step 724 after the conversion. The process then continues from step 712 with the next tagged field.
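A minimal Python sketch of the parsing loop is given here for illustration. It assumes the structured record is XML, and the tag names, the table of known tag errors, and the `parse_structured_record` function are hypothetical. A Python `date` object stands in for a database date datatype, and a list stands in for the exception log.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

KNOWN_FIELDS = {"patient_id", "visit_date", "diagnosis"}  # hypothetical tag names
KNOWN_TAG_ERRORS = {"visitdate": "visit_date"}            # hypothetical known errors


def parse_structured_record(xml_text, exception_log):
    """Build one database record from the tagged fields of a structured record."""
    record = {}                                            # step 710: empty record
    for tagged_field in ET.fromstring(xml_text):           # step 712: next tagged field
        tag = KNOWN_TAG_ERRORS.get(tagged_field.tag, tagged_field.tag)  # step 718
        if tag not in KNOWN_FIELDS:
            # Unknown error: keep it for an operator to review (step 716).
            exception_log.append((tagged_field.tag, tagged_field.text))
            continue
        value = tagged_field.text
        if tag == "visit_date":
            # Step 722: convert the text to a date type for efficient searching.
            value = datetime.strptime(value, "%b. %d, %Y").date()
        record[tag] = value
    return record


xml_text = (
    "<record><patient_id>100</patient_id>"
    "<visitdate>Mar. 28, 2001</visitdate>"
    "<shoe_size>9</shoe_size></record>"
)
log = []
record = parse_structured_record(xml_text, log)
print(record["visit_date"].isoformat())  # 2001-03-28
```

The misspelled `visitdate` tag is corrected from the table of known errors, while the unrecognized `shoe_size` field goes to the exception log instead of stopping the parse.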

FIG. 8 shows the process performed by taxonomy definition module 342 of FIG. 3. A taxonomy defines the collection of clinical, diagnostic, or treatment data that a database query will return. A taxonomy includes a description of the illness, drug prescriptions, and medical coverage, as well as demographic data such as geographic location, gender, and age. It also contains clinical diagnostic terms such as myocardial infarction, LDL, or heart attack. At step 810, the taxonomy definition process creates the inclusion rules, which specify the characteristics that must be present in every record of the taxonomy definition result set. At step 812, the process creates the exclusion rules, which specify characteristics that cannot be found in any record of the query result set. Once a user has created the inclusion and exclusion rules that make up the taxonomy, the taxonomy definition process stores them in clinical, diagnostic, and treatment data 332 at step 804.

FIG. 9 shows the process performed by taxonomy validation module 344 of FIG. 3. At step 910, the validator selects a taxonomy definition from the database to validate. At step 914, the database runs the inclusion rules of the taxonomy to generate an inclusion result set. At step 916, the database runs the exclusion rules of the taxonomy to generate an exclusion result set. At step 918, rows that appear in both the inclusion and exclusion result sets are flagged in the inclusion result set. At step 920, the database notifies the validator that the inclusion result set is ready for analysis, which involves row-by-row inspection. To correct an error, the validator can remove the exclusion flag from a row and update the taxonomy definition so that the row is deleted from the exclusion result set. The validator can also update the taxonomy to bring additional rows into the inclusion result set. After the analysis is completed, the validator saves the updated taxonomy in the database at step 922 and can optionally repeat the process starting at step 912.
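The flagging of rows that fall in both result sets can be illustrated with a short Python sketch. The `validate_taxonomy` function, the use of predicate functions as rules, and the sample rows are assumptions made for the example; a real system would run the rules as database queries.

```python
def validate_taxonomy(inclusion_rule, exclusion_rule, rows):
    """Run both rules and flag inclusion rows that also match the exclusion rule."""
    inclusion_set = [row for row in rows if inclusion_rule(row)]        # step 914
    exclusion_ids = {row["id"] for row in rows if exclusion_rule(row)}  # step 916
    return [                                                            # step 918
        {**row, "flagged": row["id"] in exclusion_ids}
        for row in inclusion_set
    ]


# Fictional rows; the column names are assumptions for illustration.
rows = [
    {"id": 1, "diagnosis": "heart attack", "age": 70},
    {"id": 2, "diagnosis": "heart attack", "age": 30},
    {"id": 3, "diagnosis": "asthma", "age": 30},
]

review = validate_taxonomy(
    inclusion_rule=lambda row: row["diagnosis"] == "heart attack",
    exclusion_rule=lambda row: row["age"] < 40,
    rows=rows,
)
for row in review:
    print(row["id"], row["flagged"])
```

The flagged rows are the ones a validator would inspect by hand before deciding whether to adjust the inclusion or exclusion rules.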

Referring to FIG. 3, query builder module 346 lets a user, such as a clinical researcher, treating physician, or consulting physician, pose a clinical question and receive a result set that answers the question. Query builder module 346 merges multiple taxonomy definitions to create a single result set.
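The text does not say how the results of several taxonomy definitions are merged, so the Python sketch below is an assumption: it combines lists of patient identifiers either by intersection (patients matching every taxonomy) or by union (patients matching any taxonomy). The `merge_result_sets` function and its `mode` parameter are hypothetical.

```python
def merge_result_sets(result_sets, mode="all"):
    """Combine patient-id lists from several taxonomies into one sorted result set."""
    id_sets = [set(ids) for ids in result_sets]
    if not id_sets:
        return []
    if mode == "all":
        merged = set.intersection(*id_sets)   # patients found by every taxonomy
    else:
        merged = set.union(*id_sets)          # patients found by any taxonomy
    return sorted(merged)


heart_disease_ids = [1, 2, 3]
statin_ids = [2, 3, 4]
print(merge_result_sets([heart_disease_ids, statin_ids]))              # [2, 3]
print(merge_result_sets([heart_disease_ids, statin_ids], mode="any"))  # [1, 2, 3, 4]
```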

The invention concerns a database system that contains information useful for medical marketing, clinical diagnostics, clinical trial recruitment, and other purposes. The database system of the invention has two main advantages over existing medical database systems.

First, the system uses a new data entry method that extracts relevant clinical information from almost any data source, including physicians’ dictated notes, laboratory reports, and EKG, EEG, or other instrument reports. Once the results exist in an electronic medium, the database system tags the data to allow for search and correlation functions. This method is particularly useful because it allows large amounts of clinical information to be entered without requiring clinicians to change their routine collection of such information, for example by limiting them to questionnaire formats or other fixed data entry methods.

Second, the system gives a clinician access to valuable, current information and makes suggestions for diagnostic testing. These suggestions can be based on the patient’s clinical data and attributes without the need to obtain specific genotype information. The database system of the invention correlates a patient’s clinical information, including specific attributes and demographic information, with data in the warehouse and generates recommendations for genetic, proteonomic, or other diagnostic tests based on the patient’s phenotypic characteristics. Further, the invention relates to the entry of genetic testing results into the system. The system then generates treatment suggestions and/or alternative therapies based on those results.

One embodiment of the database system includes a number of electronic medical records. Each record contains clinical information about a patient taken from any relevant clinical source. The electronic medical records of this invention are particularly valuable because they can be easily segmented and searched using a wide range of criteria. They include relevant clinical information such as phenotypic, medical, family, and biochemical data, physiologic and proteonomic data, diet, exercise, demographics, drug reaction history, drug prescriptions, and laboratory results. Past diagnoses and treatments can also be included. The database may optionally include information such as medication taken, occupational history, hobbies, diet, family history, and normal exercise routines. More specific information includes whether an individual is receiving hormone replacement therapy, whether they smoke, and whether they regularly use a sun-tanning device, as well as the geographical region where the patient lives. One embodiment collects the individual’s phenotypic and biochemical information at the same time to ensure that the information is as relevant as possible to the patient’s phenotype.

Another embodiment of the invention is a database system in which the electronic medical record contains the patient’s genotype and/or haplotype information. Genotype and haplotype information includes, for example, information about chromosome structure, DNA sequence, length of specific genes or regions, gene expression levels, identification of single nucleotide polymorphisms (SNPs) and/or other information related to a patient’s genetic makeup. Alternatively, or in addition, the genotype data can include a record of the actual or inferred DNA base sequences at one or several regions within the genome. The genotype information may also include a record of variation in a specific chromosomal sequence compared to a reference sequence, indicating whether or not there is variation at the same positions within the sequence. A record of the length or variants of a sequence can also be included in the genotype information. This information is useful for determining whether there is a correlation between genetic variation and phenotypic variation.

In many applications of this invention, an individual’s genotype information (such as SNP information) will not be known at the time they are examined by their doctor. According to the invention, the doctor would input the patient’s medical data, including demographics, medical history, and laboratory test results, into the database. The system would then correlate the patient’s clinical data with data in the database and/or data accessed from other public or private databases to generate a recommendation for a specific genetic test. The patient’s medical information can be compared to other records in the database in order to identify attributes that are common in the population sharing a particular SNP. The physician would then be informed that the patient shares attributes with others who have that SNP. This method can also be used to identify patients who are good candidates for clinical trials.
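The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented method: the record layout (`attrs`, `snps`), the function name `suggest_snp_tests` and the `min_overlap` threshold are all hypothetical. For each SNP, the sketch finds the attributes shared by every known carrier and suggests a test when the new patient matches enough of them.

```python
def suggest_snp_tests(patient_attrs, records, min_overlap=0.5):
    """Suggest SNP tests for a patient whose genotype is unknown.

    patient_attrs: set of attribute labels for the new patient.
    records: list of dicts with hypothetical keys "attrs" (set of str)
             and "snps" (set of str) for patients already in the database.
    min_overlap: assumed threshold; not specified in the source text.
    """
    # Group the attribute sets of known carriers by SNP identifier.
    carriers = {}
    for rec in records:
        for snp in rec["snps"]:
            carriers.setdefault(snp, []).append(rec["attrs"])

    suggestions = []
    for snp, attr_sets in carriers.items():
        # Attributes present in every carrier of this SNP.
        common = set.intersection(*attr_sets)
        if not common:
            continue
        # Fraction of the carriers' common attributes the patient shares.
        overlap = len(common & patient_attrs) / len(common)
        if overlap >= min_overlap:
            suggestions.append((snp, overlap))
    # Strongest match first, ties broken by SNP identifier.
    return sorted(suggestions, key=lambda s: (-s[1], s[0]))
```

A real system would need a statistical measure of association rather than a simple overlap fraction; the sketch only shows the shape of the lookup.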

In another embodiment, this invention relates to a method of determining a patient’s disease susceptibility and risk. Future healthcare strategies will focus more on disease prevention in areas such as congestive heart failure, cancer, neurological diseases and other degenerative conditions. The method involves extracting clinical data from any source, creating an electronic patient record, and correlating the patient’s clinical information with information in the system and/or accessed via one or more public or private domain databases, such as the SNP Consortium. The result set may include a suggestion for genetic or proteomic testing.

“In another embodiment, the present invention also pertains to the display of identified correlation to aid with determining statistical significance of that identified correlation.”

“In another embodiment, this invention also relates to entering the results of the genetic, proteomic and/or other diagnostic tests into the system and creating a result set that includes a treatment suggestion based on the test result and the patient’s record.”

Another embodiment of the invention is a method of identifying a patient with drug response polymorphisms. The method involves creating a patient record that includes the patient’s clinical information and drug response information, correlating this information with information in the system and/or accessed via one or more public or private databases relating to single nucleotide polymorphisms (SNPs), and producing a result set that includes suggestions for genetic testing for possible SNPs associated with the drug reaction.

“In another embodiment, the present invention further refers to the step in which the results of the genetic test are entered and the system generates an alternative therapy suggestion based on the patient’s record.

Many SNPs have been identified, and although the significance of many of them is still unknown, they are being investigated. Patients can be tested cheaply using PCR, restriction fragment length polymorphism, microchip array technology or other well-known methods. The missing link, however, is access to clinical information that can be used to identify patients for genetic testing. The present invention provides this link by allowing a clinician to combine phenotypic and specific genotype information. Accurate genetic testing is crucial when it is indicated by clinical and demographic information.

Another embodiment of the invention relates to a method of identifying subjects for clinical trials. The method involves extracting clinical data to create an electronic patient record, correlating the patient’s clinical information with other patient information in the system, identifying sub-populations of patients with similar phenotypes or clinical characteristics, and identifying clinical studies that would be appropriate for their participation.

Approximately 65% of clinical trials fail to finish on schedule due to delays in recruiting patients. The recruitment delay averages more than three months, and each day of delay costs trial sponsors $1.3 million. Sponsors rely almost entirely on the treating physician and his research staff to screen and enroll patients for clinical trials. Many media options, such as radio, TV and the internet, are available to “recruit” clinical trial candidates, but they are not always successful, particularly when patients are dealing with a chronic condition. Patients trust their physician to inform them about all possible treatment options.

Under current practice, the sponsor of a clinical trial awards the trial to a physician or group of physicians who have participated in clinical studies in the past and who have a large number of patients in their practice. This is because the vast majority of these practices do not have the capability to search any type of database to conduct a suitability check, commonly known as screening, for patients based on detailed, multi-dimensional “inclusion/exclusion” criteria. For example, a patient taking multiple drugs may or may not be eligible for inclusion, and past medical history may or may not exclude the patient. Because their medical records are paper-based, searching them manually would be difficult and expensive. Therefore, physicians and their research staff wait until patients are seen in the office before initiating the screening and recruitment process. This inefficient process costs sponsors hundreds of millions in lost sales revenue.

The present invention solves this problem by using the data warehouse and search function to screen large numbers of patients automatically, and with greater accuracy, using the inclusion/exclusion and validation functions described herein. For example, a patient may qualify for a clinical study if he is a Type II, insulin-dependent diabetic or takes a cholesterol-lowering medication. The invention allows the user to include or exclude subjects based on detailed information. The system also speeds up clinical trial screening and enrollment, with lower administrative costs for the researchers and physicians.
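A minimal Python sketch of inclusion/exclusion screening follows. It is an illustration only: the `Patient` fields, the `screen` function and the specific criteria are hypothetical and are not taken from the patent. Each criterion is a predicate, and a patient qualifies only when every inclusion criterion holds and no exclusion criterion does.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical record layout used only for this sketch.
    patient_id: str
    diagnoses: set = field(default_factory=set)
    medications: set = field(default_factory=set)


def screen(patients, include, exclude):
    """Return ids of patients meeting every inclusion and no exclusion criterion."""
    eligible = []
    for p in patients:
        meets_all = all(rule(p) for rule in include)
        hits_any_exclusion = any(rule(p) for rule in exclude)
        if meets_all and not hits_any_exclusion:
            eligible.append(p.patient_id)
    return eligible


# Example criteria (assumed for illustration).
include = [
    lambda p: "type_2_diabetes" in p.diagnoses,
    lambda p: "cholesterol_lowering" in p.medications,
]
exclude = [lambda p: "anticoagulant" in p.medications]

patients = [
    Patient("p1", {"type_2_diabetes"}, {"cholesterol_lowering"}),
    Patient("p2", {"type_2_diabetes"}, {"cholesterol_lowering", "anticoagulant"}),
    Patient("p3", {"hypertension"}, {"cholesterol_lowering"}),
]
eligible_ids = screen(patients, include, exclude)
```

Expressing criteria as predicates keeps the screening loop independent of any particular trial protocol, so the same function can be reused across studies.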

“This invention also provides a method for identifying individuals and sub-populations that share similar phenotypic and genetic characteristics. These individuals or sub-populations can be used to provide valuable information for diagnostic, therapeutic or research purposes. One embodiment of the invention allows for the identification of sub-populations of individuals with common phenotypic characteristics, using shared attributes from the database. The sub-population of individuals may be further evaluated to determine whether they share a common genotype or if they have a unique response to drug treatment. This is especially useful in identifying appropriate matched control populations and test populations for the clinical evaluation of drug treatments.

“A further embodiment allows physicians to identify individuals from the database based on common characteristics. These individuals could be candidates for further diagnostic testing such as genetic testing or screening for specific mutations.

“In another embodiment, information that is relevant to making specific treatment decisions may be provided to individuals according to this invention by identifying common attributes within a sub-population in the database and communicating pertinent information to a doctor concerning a patient with attributes in common to others in the subpopulation.”

The system can also be used for market research. Companies often have to make complex marketing and development decisions using purchased, sub-optimal information that does not provide a good clinical representation of the targeted patient populations.

For example, prescription information obtained from a pharmacy covers only prescriptions that have been filled, reported on a brand- and physician-specific basis (e.g., the pharmacy filled four brand-name cholesterol-lowering prescriptions, two generic cholesterol-lowering prescriptions and one brand-name arthritis medication prescription written by a particular physician for five patients). This data has two limitations. First, it does not show which prescriptions were written but not filled, which leaves a gap in monitoring patient compliance. Second, it contains no longitudinal patient data, such as age, sex and past medical history. Because it captures only what can be identified through prescriptions “filled,” it does not accurately reflect physicians’ overall “treatable” patient populations. Companies that use insurance claims data to determine the patient and physician populations where their products are most needed have the same problem.

“The invention provides a method and system that aggregates and imports prospectively digitized patient data from the network into an information warehouse. The system searches for patient populations using characteristics such as age and sex. It also analyzes past medical histories, family history and past surgeries. Lab values, past medications, and referring physicians are all included in the data warehouse.

The invention offers many benefits. The first is that users can focus their queries and efforts on specific patient populations using the validated, rich clinical criteria found in the electronic medical records. For example, an electronic medical record could describe a 54-year-old, sedentary Hispanic woman who has had a cardiac catheterization and no other interventional procedures, who is currently taking drug “X” for hypertension and drug “Y” for her cholesterol, and whose LDL levels were greater than 175 over a one-year period. Access to all or part of this de-identified data (i.e., data that has been cleaned to remove personal information such as name, address and social security number) is critical for planning a clinical research strategy or the marketing launch of a new therapeutic approach.
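De-identification as defined above (removing name, address and social security number) can be sketched as a simple field filter. The field names and the `deidentify` function below are assumptions for illustration; a production system would need a far more thorough treatment of identifying data, including identifiers embedded in free-text notes.

```python
import copy

# Assumed field names for directly identifying data; not from the source.
IDENTIFYING_FIELDS = {"name", "address", "ssn"}


def deidentify(record):
    """Return a copy of the record without directly identifying fields."""
    return {
        key: copy.deepcopy(value)
        for key, value in record.items()
        if key not in IDENTIFYING_FIELDS
    }


record = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "ssn": "000-00-0000",
    "age": 54,
    "sex": "F",
    "medications": ["X", "Y"],
    "ldl_max_1yr": 180,
}
clean = deidentify(record)
```

The deep copy ensures that later changes to the de-identified copy, such as editing the medication list, do not alter the original record.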

In addition to having access to more robust clinical data, users and companies can direct their energies towards targeted patient cohorts. This not only gives them a historical view of patients’ past clinical profiles, but also creates scenarios in which treatment plans and products can be targeted and tracked in order to validate clinical and marketing claims. With a larger data set, companies can target their marketing messages and focus on the clinical community. Another embodiment of the invention allows de-identified aggregate patient data to be used to create and test “virtual” clinical trial protocols using rich, segmented information.

“In another embodiment, the invention can also be used to provide marketing services. It is crucial that marketers identify the target population and the type of conventional therapy they want to replace. The field marketing teams are not equipped or trained to recruit patients in physician offices for Phase IV studies. While pharmaceutical companies encourage doctors to accept the results of their clinical studies, they still try to increase the marketing of the drug’s Phase IV market-focused studies.

However, the data that companies buy generally do not accurately reflect market conditions (e.g., the data cover the “number” of prescriptions for name-brand drugs that a doctor may have written, but not for “whom”). Companies therefore do not know who the potential patients for new drugs are, including patients for whom no prescription may have been written. Additionally, many physicians use paper-based charts and cannot easily identify which patients have been prescribed which drugs without performing a manual audit of their charts. Given the time constraints and diminishing resources at physicians’ offices, this task can be daunting. It can also be very costly and time-consuming for companies, and it makes recruiting physicians for Phase IV initiatives a burden.

The present invention provides a method and system for importing historical data as well as continuing to populate a data warehouse with prospective data. The system can then segment all patients by date seen, location, physician, and which physician prescribed the drug to a patient with a particular clinical profile. With the consent of both the physician and the patient, the data can be shared with companies that are developing alternative therapies. This allows companies to target patients who might benefit from a switching strategy and increases awareness of the product’s benefits and its market acceptance. The same technology can also generate practice-based reports, which allow users or companies to track and audit compliance and improve physician-patient communication.
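The segmentation step can be illustrated with a small grouping function. This is a sketch under assumptions: the visit fields (`patient_id`, `physician`, `drug`) and the function name `segment` are hypothetical, and a real warehouse would run the equivalent grouping as a database query.

```python
from collections import defaultdict


def segment(visits, keys):
    """Group patient ids by the values of the chosen segmentation keys.

    visits: list of dicts, each with a "patient_id" plus the fields in keys.
    keys: ordered list of field names to segment by (assumed names).
    """
    groups = defaultdict(set)
    for visit in visits:
        # The tuple of key values identifies one segment.
        groups[tuple(visit[k] for k in keys)].add(visit["patient_id"])
    return dict(groups)


visits = [
    {"patient_id": "p1", "physician": "dr_a", "drug": "X"},
    {"patient_id": "p2", "physician": "dr_a", "drug": "X"},
    {"patient_id": "p3", "physician": "dr_b", "drug": "Y"},
]
by_physician_and_drug = segment(visits, ["physician", "drug"])
```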

The present invention relates to the application of the system described herein to a wide range of diseases, including but not limited to cancer, heart disease, diabetes, hypertension, mental illness, allergies, arthritis, and neurological, immunological and infectious diseases. Any disease for which the database of the invention identifies a common constellation of phenotypic or genetic features can be diagnosed or treated according to the present invention. The system and methods described herein are also applicable to any other application that might require the data.

As described earlier, the output modules for data warehouse 250 are web distribution module 352, report generation module 354 and download module 356. Each module produces output by retrieving data from archive 325 or by obtaining a result set using query builder module 346. Access to the data and reports conforms with the Health Insurance Portability and Accountability Act. Each module performs authentication and authorization at the customer level. Web distribution module 352 provides a web-based graphical user interface that allows customers to view and print clinical notes, request reports or clinical trial reports, and update their data warehouse service. Report generation module 354 lets customers create and save custom reports. Download module 356 lets customers transfer their output data to a local storage device.

FIG. 10 is a functional block diagram that shows the hardware and software components of data warehouse 250. Bus 1012 connects central processor 1016 to archive data 325, clinical, diagnostic, and treatment data 332, error log 324, audit log 336, transmission control protocol/internet protocol (TCP/IP) adapter 1014 and memory 1010. TCP/IP adapter 1014 is also coupled to network 220 and is the mechanism that allows network traffic to flow between data warehouse 250 and network 220. Central processor 1016 performs the operations described herein by executing the sequences of instructions of each computer program that is resident or operative in memory 1010.

FIG. 10 shows the functional components of data warehouse 250 as an object model. The object model organizes the object-oriented programs into the components that perform the main functions and applications of data warehouse 250. The object model of FIG. 10 may use the Enterprise JavaBeans specification. The book “Building Java Enterprise Systems with J2EE” by Paul J. Perrone et al. (Sams Publishing, June 2000) describes a Java enterprise application developed using the Enterprise JavaBeans specification. The book “Beginning E-Commerce” by Matthew Reynolds (Wrox Press Inc., 2000) describes how an object model is used in the design and development of a Web server that supports electronic commerce applications.

The object model of memory 1010 in data warehouse 250 uses a three-tier architecture that includes presentation tier 1020, infrastructure objects partition 1030 and business logic tier 1040. The object model also divides business logic tier 1040 into two parts, application service objects partition 1050 and data objects partition 1060.

Presentation tier 1020 holds the programs that manage the graphical user interface to data warehouse 250 for industry customer 260. In FIG. 10, presentation tier 1020 includes TCP/IP interface 1022, web distribution 1024, and report generation 1026. Java servlets can be used to communicate with industry customer 260 via a network transmission protocol such as the Hypertext Transfer Protocol (HTTP) or Secure HTTP (S-HTTP). A Java servlet is a Java program that runs in a Web server environment, inside a request/response service that receives request messages from industry customer 260 and returns responses to industry customer 260. The servlet receives a request as input, parses it, performs logic operations and returns a response to industry customer 260. To service multiple requests simultaneously, the Java runtime platform pools Java servlets. TCP/IP interface 1022 uses Java servlets to function as a Web server that communicates with industry customer 260 using a network transmission protocol such as HTTP or S-HTTP. TCP/IP interface 1022 accepts HTTP requests from industry customer 260 and passes each request to visit object 1042 in business logic tier 1040. Visit object 1042 passes the result information from business logic tier 1040 back to TCP/IP interface 1022, which sends the results to industry customer 260 in an HTTP response. TCP/IP interface 1022 uses TCP/IP adapter 1014 to exchange data over network 220.

Infrastructure objects partition 1030 contains the programs that perform administrative or system functions for business logic tier 1040. Infrastructure objects partition 1030 includes operating system 1032 and object-oriented program components for system administrator interface 1034, database management system interface 1036 and Java runtime platform 1038.

Business logic tier 1040 holds the programs that perform the essential functions of the system for storing, retrieving, and analyzing clinical, diagnostic, or treatment data. In FIG. 10, business logic tier 1040 includes multiple instances of visit object 1042. Each client session initiated via web distribution 1024 or report generation 1026 through TCP/IP interface 1022 creates a separate instance of visit object 1042. Each visit object 1042 is a stateful session bean that includes a persistent storage area available from the initiation to the termination of the client session, not just for the duration of a single interaction or method call. The persistent storage area is associated with industry customer 260 of FIG. 2. The persistent storage area also stores data exchanged via TCP/IP interface 1022 between data warehouse 250 and transcription services 230, physician 110, clinical provider 115 or third party databases 215.
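The per-session behavior of the visit object can be sketched in Python. This is an analogy only and does not use the Enterprise JavaBeans API: the class names `Visit` and `VisitRegistry` and their methods are hypothetical. The sketch shows the two properties described above: each client session gets its own instance, and that instance’s state survives across calls until the session ends.

```python
import uuid


class Visit:
    """Per-session state holder, loosely analogous to a stateful session bean."""

    def __init__(self, customer_id):
        self.session_id = uuid.uuid4().hex
        self.customer_id = customer_id
        # Storage area that persists across calls within this session.
        self.state = {}


class VisitRegistry:
    """Creates one Visit per client session and discards it on termination."""

    def __init__(self):
        self._visits = {}

    def open(self, customer_id):
        visit = Visit(customer_id)
        self._visits[visit.session_id] = visit
        return visit.session_id

    def get(self, session_id):
        # Raises KeyError if the session was never opened or has ended.
        return self._visits[session_id]

    def close(self, session_id):
        self._visits.pop(session_id, None)


registry = VisitRegistry()
first = registry.open("customer_260")
second = registry.open("customer_260")
registry.get(first).state["last_query"] = "cohort search"
```

Two sessions opened by the same customer keep separate state, which matches the description of a separate visit object instance per client session.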

When industry customer 260 accesses a program in application service objects partition 1050, TCP/IP interface 1022 sends a message that invokes a method to create visit object 1042 and store connection information in the state of visit object 1042. Visit object 1042 in turn invokes a method within the program. Although FIG. 10 depicts central processor 1016 controlling each program in application service objects partition 1050, it should be understood that each function can be distributed to another system similar to data warehouse 250.

The object model splits business logic tier 1040 into two parts: application service objects partition 1050 and data objects partition 1060. The programs that reside in application service objects partition 1050 include batch download 1051, archiver 1052, parser 1053, taxonomy definer and validator 1054, and query builder 1055. The programs in application service objects partition 1050 may be written in C++, Java Server Pages and Oracle scripts, among others. Data objects partition 1060 contains download data 1061, archiver data 1062, parser data 1063, and taxonomy definer and validator data 1064. Every program in application service objects partition 1050 has a counterpart in data objects partition 1060, which stores the program’s input, intermediate and output data. FIG. 6 and FIG. 7, as discussed above, show the batch download 1051 and archiver 1052 processes. FIG. 8 and FIG. 9, as discussed above, show the process of taxonomy definer and validator 1054. The query builder 1055 process is also described above.

FIG. 11 shows a diagram of the structure of clinical, diagnostic and treatment data 332 shown in FIG. 3. As a data warehouse, clinical, diagnostic, or treatment data 332 supports clinical and management decision making. The data in clinical, diagnostic, or treatment data 332 are grouped into logical components of the data warehouse: specialty and demographics 1110, oncology 1120, urology 1130, cardiology 1140, gastroenterology 1150, and orthopedics 1160. In one embodiment, specialty and demographics 1110 provides external access to oncology 1120, urology 1130, cardiology 1140, gastroenterology 1150 and orthopedics 1160, and only specialty and demographics 1110 is externally accessible. In another embodiment, each logical component is separate, is not linked to the others, and is externally accessible.

The embodiments described herein are fully functioning systems, methods, and apparatus for storing, retrieving, and analyzing clinical, diagnostic, or treatment data in natural human language formats. However, it is important to understand that there are other similar embodiments. The disclosure is open to many modifications and variations. This system, method and apparatus for storing and retrieving clinical, diagnostic, or treatment data does not have to be limited to the specific construction and operation shown. This disclosure covers all modifications and equivalents that are possible within the scope of the claims.

“EXAMPLES”

“Example 1”

“Long QT Syndrome”

“A physician inputs the following clinical data into a system to determine a patient’s risk of developing a disease or susceptibility type, and/or drug reaction polymorphism:

The system informs the doctor that the patient may have partially penetrant Long QT Syndrome and recommends genetic testing, including testing for any of the five genes associated with Long QT Syndrome. A mutation is found in LQT2, which affects potassium channels. The system advises avoiding all drugs that prolong cardiac repolarization, such as antiarrhythmics, gastrokinetics, antipsychotics and antihistamines, and recommends a substitute drug for seasonal allergies. The system also recommends further testing of the patient’s relatives. One sister and one daughter are found to have the same LQT2 mutation. The physician makes recommendations to the patient and the family about avoiding these drugs in order to prevent sudden cardiac death.

“Example 2”

“Arthritis and Anemia - Thiopurine S-Methyltransferase Mutation”

“A physician inputs the following clinical data into a system to determine a patient’s risk of developing a disease or susceptibility type, and/or drug reaction polymorphism:

The system generates a result set that includes a suggestion to the doctor to test the patient for a mutation at the thiopurine S-methyltransferase (TPMT) gene locus. The patient is found to be heterozygous for mutant TPMT, which causes severe hematopoietic toxicity as well as anemia. The system then produces a result set indicating to the doctor that the patient may have a genetic polymorphism that makes the patient intolerant to thiopurine medication. It also suggests alternative anti-arthritic medications that are not metabolized by TPMT.

“Example 3”

“Colonic Neoplasia, Rapid Metabolic Phenotype For Acetyltransferase And Cytochrome P4501A2”.

“A physician inputs the following clinical data into a system to determine a patient’s risk of developing a disease or susceptibility type, and/or drug reaction polymorphism:

There is growing evidence that fast acetylators who eat well-done red meat may be at greater risk of developing colon cancer, and this type of susceptibility testing is becoming more important. The system prompts physicians to order genetic testing when it is indicated. The average physician is unlikely to know the current recommendations, especially since most doctors do not follow the latest developments in both genetic/molecular medicine and clinical medicine.

“Example 4”

“Breast Cancer BRCA1/2 Mutations, and Estrogen Metabolism.”

“A physician inputs the following clinical data into a system to determine a patient’s risk of developing a disease or susceptibility type, and/or drug reaction polymorphism:
