Chemical Products – Reshma P. Shetty, Curt R. Fischer, Ginkgo Bioworks Inc

Abstract for “Methods, systems and methods for methylotrophic organic compound production”

The present disclosure describes pathways, mechanisms, systems, and methods for producing carbon-based products of interest, such as sugars, alcohols, chemicals, amino acids, fatty acids and their derivatives, hydrocarbons, and isoprenoids. The production organisms are engineered or evolved methylotrophs that efficiently convert C1 compounds, such as formate and formaldehyde, into the organic carbon-based products of interest. The disclosure includes the use of these organisms for the commercial production of various carbon-based products of interest.

Background for “Methods, systems and methods for methylotrophic organic compound production”

Heterotrophs are organisms that use energy from organic compounds to grow and reproduce. Commercial production of many carbon-based products depends largely on heterotrophic organisms that ferment sugar from crop biomass such as corn or sugarcane [Bai 2008]. An alternative to fermentation-based bio-production is the production of carbon-based products by photosynthetic organisms such as plants, algae, and cyanobacteria, which use energy from sunlight and carbon dioxide to support growth [U.S. Pat. No. 7,981,647]. However, production of carbon-based products using algae depends on photosynthesis, which is a slow process for fixing carbon dioxide [Larkum 2010]. Photobioreactor design also remains a challenging technical problem, because commercial production of carbon-based products requires consistent and reliable exposure to light.

“Methylotrophs” are organisms that use energy and/or carbon from C1 compounds containing no carbon-carbon bonds, such as formate, formic acid, formaldehyde, methanol, and methane, to make all the multi-carbon organic compounds required for growth and reproduction. Most naturally occurring methylotrophs have not been shown to be commercially viable for industrial bio-processing. Because of their long doubling times, these organisms are less productive than industrialized heterotrophic organisms such as Escherichia coli. Techniques for their genetic manipulation (homologous recombination, transformation or transfection of nucleic acid molecules, and recombinant gene expression) are slow, tedious, laborious, or non-existent.

There is a need for engineered or evolved methylotrophs that can be used for industrial purposes. Methylotrophs with biosynthetic capabilities for making carbon-based products would allow more efficient and more energy-efficient production. Such a methylotroph could also be equipped with additional pathways for energy conversion, methylotrophy, and/or carbon fixation that increase its ability to produce the carbon-based products of interest.

The present invention provides for the efficient production of carbon-based products of interest (e.g., fuels, sugars, chemicals) using C1 compounds. It also allows the replacement of conventional methods of producing chemicals such as olefins (e.g., ethylene, propylene), which are traditionally derived from petroleum in a process that generates toxic byproducts. Those byproducts are harmful to the environment and are considered hazardous waste pollutants. The present invention avoids the use of petroleum and the generation of such toxic byproducts. It thereby materially improves the quality of the environment by helping to maintain basic elements such as air, water, and soil.

In accordance with certain aspects, the invention provides a methylotroph that can be used to produce various carbon-based products from C1 compounds. One or more engineered carbon product biosynthetic pathways are used to convert central metabolites into the desired products. The carbon-based products of interest include, but are not limited to, alcohols, fatty acids and their derivatives, fatty alcohols, fatty acid esters, wax esters, hydrocarbons, fuels, commodity chemicals, specialty chemicals, carotenoids, sugars, sugar phosphates, central metabolites, pharmaceuticals, and pharmaceutical intermediates. For example, the carbon-based products of interest can include one or more of a sugar (for example, glucose, fructose, sucrose, xylose, lactose, maltose, pentose, rhamnose, galactose or arabinose), sugar phosphate (for example, glucose-6-phosphate or fructose-6-phosphate), sugar alcohol (for example, sorbitol), sugar derivative (for example, ascorbate), alcohol (for example, ethanol, propanol, isopropanol or butanol), fermentative product (for example, ethanol, butanol, lactic acid, lactose or acetate), ethylene, propylene, 1-butene, 1,3-butadiene, acrylic acid, fatty acid (for example, ω-cyclic fatty acid), fatty acid intermediate or derivative (for example, fatty acid alcohol, fatty acid ester, alkane, olefin or halogenated fatty acid), amino acid or intermediate (for example, lysine, glutamate, aspartate, shikimate, chorismate, phenylalanine, tyrosine, tryptophan), phenylpropanoid, isoprenoid (for example, hemiterpene, monoterpene, sesquiterpene, triterpene, tetraterpene, polyterpene, isoprene, bisabolene, myrcene, amorpha-4,11-diene, farnesene, taxadiene, squalene, lanosterol, α-carotene, β-carotene, lycopene, phytoene, limonene, or polyisoprene), glycerol, 1,3-propanediol, 1,4-butanediol, 1,3-butadiene, polyhydroxyalkanoate, polyhydroxybutyrate, lysine, γ-valerolactone, and acrylate. In some embodiments, the carbon-based products can also include carbon-based central metabolites.

The resulting engineered methylotroph is capable of efficiently synthesizing carbon-based products from C1 compounds. The invention provides carbon product biosynthetic pathways that confer biosynthetic production of the carbon-based product on host organisms that lack the ability to produce it efficiently from C1 compounds. The invention provides methods for introducing the carbon product biosynthetic pathway into the methylotroph. The invention also provides media compositions and methods for cultivating the engineered or evolved methylotrophs to support efficient methylotrophic production of carbon-based products.

In various embodiments, the invention allows the C1 compound to be used as both an energy source and a carbon source. The C1 compound can be soluble in water or made to dissolve in water. The C1 compound can be formate, formic acid, methanol, and/or formaldehyde. Because of the efficiency of mass transfer and uptake, C1 compounds that dissolve at high concentrations or are easily dispersed in water are preferred to immiscible or less soluble chemical species such as methane. Similarly, soluble C1 compounds are preferred over molecular hydrogen and carbon dioxide, which can be used in the autotrophic production of carbon-based chemicals (see Example 7). The medium used to grow the organism may include solvents other than water in which the C1 compound dissolves, or other components that enhance the solubility of the C1 compound in the medium. In some instances, the C1 compound is obtained from electrolysis.

In certain embodiments, one of the following carbon product biosynthetic routes can be used:

In some embodiments, the 1-deoxy-D-xylulose-5-phosphate synthase can be encoded by SEQ ID NO:1, or a homolog thereof having at least 80% sequence identity; or the isopentenyl pyrophosphate isomerase can be encoded by SEQ ID NO:2, or a homolog thereof having at least 80% sequence identity. In one embodiment, the carbon product biosynthetic pathway further includes isoprene synthase. The isoprene synthase can be encoded by SEQ ID NO:3, or a homolog thereof having at least 80% sequence identity. In another embodiment, the carbon product biosynthetic pathway includes E-alpha-bisabolene synthase. The E-alpha-bisabolene synthase can be encoded by SEQ ID NO:4, or a homolog thereof having at least 80% sequence identity.

In certain embodiments, the methylotrophic organism can be selected from the class Alphaproteobacteria. The methylotrophic organism may also be selected from the genus Paracoccus. For example, the methylotrophic organism can be Paracoccus denitrificans or Paracoccus versutus.

In some embodiments, the engineered cell can be further evolved to have an improved growth rate on electrolytically generated C1 compounds relative to the non-evolved methylotrophic organism, or a substantially similar or improved growth rate on electrolytically generated C1 compounds relative to non-electrolytically generated C1 compounds.

In another aspect, an evolved methylotrophic organism is provided. It has an improved growth rate on electrolytically generated C1 compounds relative to the non-evolved organism, or a substantially similar or improved growth rate on electrolytically generated C1 compounds relative to non-electrolytically generated C1 compounds.

In a further aspect of the invention, a method of selecting an evolved methylotrophic organism with improved growth on C1 compounds is provided. The method involves continuously monitoring the concentration of biomass in the culture chamber, and setting the flow rate of the C1 compound into the chamber so as to maintain a constant environment that promotes a higher growth rate. The method may also include adjusting the medium inflow to make the environment more conducive, or less inhibitory, to growth. This keeps the cells fit and healthy. In certain cases, the C1 compound is formate. The C1 compound can be electrolytically generated. The C1 compound may also be soluble in water.
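The selection method described above (monitor biomass, then set the C1 feed rate to hold the environment steady) has the shape of a feedback loop. The following is a minimal sketch under stated assumptions, not the disclosed control scheme: it assumes a simple proportional rule, and the function name, gain, set point, and flow limits are all hypothetical.

```python
def next_c1_flow_rate(biomass, target_biomass, current_flow,
                      gain=0.5, min_flow=0.0, max_flow=10.0):
    """Return an updated C1 feed flow rate (arbitrary units).

    Assumed proportional rule (not taken from the source): raise the
    inflow when measured biomass is above the set point and lower it
    when biomass is below the set point.
    """
    error = biomass - target_biomass
    proposed = current_flow + gain * error
    # Clamp to an assumed pump operating range.
    return max(min_flow, min(max_flow, proposed))


if __name__ == "__main__":
    flow = 2.0
    for reading in [1.0, 1.2, 1.5, 1.1]:  # made-up biomass readings
        flow = next_c1_flow_rate(reading, target_biomass=1.0, current_flow=flow)
        print(f"biomass={reading:.1f} -> flow={flow:.2f}")
```

In a real apparatus the gain and limits would have to be tuned to the culture volume and the sensor used; the sketch only illustrates the monitor-then-adjust structure.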

Another aspect of the invention is a composition for bacterial culture, formulated to provide formate as the only C1 compound and to enhance the growth of methylotrophic bacteria. The composition may contain between 0 and 160 mM sodium bicarbonate, between 16 and 100 mg sodium chloride, between 30 and 60 mM nitrate, and between 5 and 100 mg of sodium thiosulfate. The composition could contain 100 mM of sodium bicarbonate and 6 mM of sodium chloride, 6 mg sodium nitrate and 11 mM respectively. It can also contain 26 mM ammonium formate or sodium formate. The composition can further include a basal minimal medium. In some embodiments, the basal minimal medium is M9 minimal medium or MOPS minimal medium. R medium or M63 medium may also be used.

In another aspect, a composition for bacterial culture is provided that is formulated to provide formate as the only C1 compound and to enhance the growth of methylotrophic bacteria in a fed-batch bioreactor. The composition may include R medium, initially charged in the fed-batch bioreactor, supplemented with between 1 and 100 micromolar sodium molybdate, between 10 and 1000 nanomolar sodium selenite, between 0.01 and 1 mg/L cobalamin, and between 0.001 and 1 mg/L thiamine. For example, the medium may contain 5 to 20 micromolar sodium molybdate, 50 to 200 nanomolar sodium selenite, 0.01 to 0.2 mg/L cobalamin, and 0.05 to 2 mg/L thiamine. The composition can also include a feed composition that is supplied to the fed-batch bioreactor and that contains a formate salt at supramolar concentration. The formate salt can be ammonium formate or sodium formate. The feed composition can also contain a nitrate salt, such as sodium nitrate, at supramolar concentration. The molar ratio of nitrate salt to formate salt can be 3.2:8, 3.0:8, or lower.

A method for cultivating methylotrophic bacteria is also provided. It involves incubating the methylotrophic bacteria in any of these compositions. In certain embodiments, the incubating can be done aerobically. The incubating can be done in a fed-batch bioreactor. Some embodiments allow a C1 feedstock consumption rate exceeding 1.5 g·L⁻¹·hr⁻¹. In some embodiments, the incubating takes place with a C1 feedstock as electron donor and a nitrate salt as electron acceptor. The C1 feedstock can be a formate salt such as ammonium formate or sodium formate. The molar ratio of nitrate salt to formate salt can be maintained below 3.2:8 during incubation. The fed-batch bioreactor can be supplied with the nitrate salt and the formate salt in a feed composition at supramolar concentrations, with a molar ratio of 3.0:8. In some embodiments, the formate salt is ammonium formate or sodium formate. The nitrate salt may be sodium nitrate.
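The nitrate-to-formate limits quoted above reduce to simple arithmetic: 3.2:8 is a molar ratio of 0.4 and 3.0:8 is 0.375. The helper below is an illustrative sketch only; the function names are hypothetical, and the only values taken from the text are the two ratios.

```python
MAX_NITRATE_TO_FORMATE = 3.2 / 8   # upper bound during incubation (from the text)
FEED_NITRATE_TO_FORMATE = 3.0 / 8  # ratio quoted for the feed composition


def nitrate_to_formate_ratio(nitrate_mol, formate_mol):
    """Molar ratio of nitrate salt to formate salt."""
    if formate_mol <= 0:
        raise ValueError("formate amount must be positive")
    return nitrate_mol / formate_mol


def ratio_within_limit(nitrate_mol, formate_mol, limit=MAX_NITRATE_TO_FORMATE):
    """True if the ratio is strictly below the limit, as the text requires."""
    return nitrate_to_formate_ratio(nitrate_mol, formate_mol) < limit


def nitrate_for_feed(formate_mol, ratio=FEED_NITRATE_TO_FORMATE):
    """Moles of nitrate salt to pair with a given amount of formate salt."""
    return formate_mol * ratio
```

For instance, a feed holding 8 mol of formate salt would carry 3 mol of nitrate salt at the quoted 3.0:8 ratio, which sits below the 3.2:8 bound.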

In some aspects, growth of the methylotroph on C1 compounds can also be enhanced by additional or alternative pathways for energy conversion, methylotrophy, and/or carbon fixation. Examples of energy conversion pathways and carbon fixation pathways are described in U.S. Pat. No. 8,349,587, the entirety of which is hereby incorporated by reference.

The invention concerns the development and use of engineered and/or evolved methylotrophs that can utilize C1 compounds to make desired products. The invention allows for the engineering of a methylotroph, such as Paracoccus denitrificans or Paracoccus versutus, or any other organism suitable for large-scale commercial production of chemicals and fuels, that can efficiently utilize C1 compounds for growth and for chemical production. This provides cost-effective processes for making carbon-based products. These organisms can be optimized and tested quickly and at low cost. Further, the invention allows for engineering a methylotroph that includes one or more alternative pathways for utilizing C1 compounds to make central metabolites and/or other products.

C1 compounds are an alternative to sugar, or to light plus carbon dioxide, for the production of carbon-based products. Non-biological methods exist to convert C1 compounds into chemicals or fuels. The Fischer-Tropsch process, for example, uses carbon monoxide and hydrogen gas from the gasification of biomass or coal to make methanol and mixed hydrocarbons [U.S. Pat. No. 1,746,464]. Fischer-Tropsch processes have several drawbacks: they are considered an expensive and environmentally harmful way of generating liquid fuels. Alternative processes use naturally occurring microbes to convert synthesis gas (syngas), a mixture primarily of molecular hydrogen and carbon monoxide that can be obtained by gasification of any organic material such as coal, oil, or biomass, into products such as ethanol, acetate, or molecular hydrogen [Henstra 2007]. However, these naturally occurring microbes produce only a narrow range of products and lack established tools for genetic manipulation. They are also sensitive to high levels of their end products. Work is underway to introduce syngas utilization into industrial microbial hosts [U.S. Pat. No. 7,803,589].

The invention allows for the engineering or evolution of methylotrophic organisms that are useful and/or suitable for industrial applications. The invention also relates to renewable energy. In some embodiments, the invention uses a C1 compound such as formate, formic acid, formaldehyde, or methanol, or any combination thereof. In one embodiment, the C1 compound is obtained from electrolysis. Commercial interest in renewable and/or carbon-neutral energy is high. Such energy sources include solar photovoltaic, geothermal, nuclear, and hydroelectric power. Because these technologies produce electricity, their use is limited to the electrical grid. Some renewable energy sources, such as wind and solar, are also intermittent and unreliable. There is a lack of practical, large-scale electricity storage technologies, which limits how much electricity demand can be shifted to renewable sources. Storing electricity in chemical form, such as in carbon-based products of interest, would allow large-scale storage and would also enable renewable electricity to meet the energy needs of the transportation sector. In one aspect of this invention, the combination of renewable electricity with electrolysis, such as the electrochemical formation of formate or formic acid from carbon dioxide (for example, see WO/2007/041872) or of formaldehyde or methanol from carbon dioxide (for example, WO/2012/015909; WO/2012/015905), opens up the possibility of a sustainable, reliable supply of C1 compounds.

Some embodiments allow for the use of C1 compounds, such as formaldehyde or methanol, that are derived from waste streams. Formaldehyde, for example, is an oxidation product of methanol and methane. Methanol can be made from synthesis gas (the main product of the gasification of coal, oil, natural gas, and carbonaceous materials such as biomass, agricultural crops and residues, and waste organic matter), or by chemical synthesis processes that combine carbon dioxide and hydrogen. Methane is a major component of natural gas and can also be obtained from renewable biomass.

The invention allows for the expression of exogenous proteins and enzymes in the host cell, thereby conferring biosynthetic pathways that use central metabolites to make reduced organic compounds. An engineered cell may also have one or more carbon product biosynthetic pathways that convert central metabolites into desired products.

The invention is described with general reference to a metabolic reaction, or a reactant or product thereof, or with specific reference to one or more nucleic acids or genes encoding an enzyme that catalyzes, or a protein associated with, the referenced reaction, reactant, or product. Unless otherwise expressly stated, a reference to a reaction is also a reference to the reactants and products of that reaction. Similarly, a reference to a reactant or product is also a reference to the reaction. Likewise, a reference to any of these metabolic constituents is also a reference to the gene or genes that encode the enzymes or proteins involved in the referenced reaction, reactant, or product. Given the well-known fields of metabolic biochemistry, enzymology, and genomics, a reference to a gene or encoding nucleic acid is also a reference to the corresponding encoded enzyme and the reaction it catalyzes, or to a protein associated with the reaction, as well as to the reactants and products of the reaction.

“Definitions”

As used herein, the terms “nucleic acid,” “nucleic acid molecule,” and “polynucleotide” can be used interchangeably and include single-stranded and double-stranded RNA, DNA, and RNA:DNA hybrids. The terms “nucleic acid,” “nucleic acid molecule,” “polynucleotide,” “oligonucleotide,” “oligomer,” and “oligo” are also used interchangeably herein. Oligos can be, for example, from 5 to 200 nucleotides, up to about 100 nucleotides, or from 30 to 50 nucleotides long. Shorter or longer oligonucleotides can also be used. Oligos can be designed for use in the present invention. A nucleic acid molecule can encode a full-length polypeptide or a fragment of a polypeptide, or it may be non-coding.

“Nucleic acid” can refer to either naturally occurring or synthetic polymeric forms of nucleotides. The oligos and nucleic acid molecules of the present invention may be made from naturally occurring nucleotides, forming deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) molecules. Alternatively, naturally occurring oligonucleotides can include structural modifications that alter their properties, as in peptide nucleic acids (PNA) or locked nucleic acids (LNA). The terms should also be understood to include analogs of RNA or DNA made from nucleotide analogs. The embodiments described here may apply to single-stranded or double-stranded polynucleotides. Nucleotides useful in the invention include, for example, naturally occurring nucleotides (for example, ribonucleotides or deoxyribonucleotides), natural or synthetic modifications of nucleotides, and artificial bases. Modifications can include phosphorothioated bases for increased stability.

“Complementary” nucleic acid sequences are those capable of base-pairing according to the standard Watson-Crick complementarity rules. As used herein, the term “complementary sequences” refers to nucleic acid sequences that are substantially complementary, as can be assessed by the nucleotide comparison methods and algorithms set forth below, or that are capable of hybridizing to the polynucleotides that encode the protein sequences.
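As a concrete illustration of the Watson-Crick rule (A pairs with T, C pairs with G, strands antiparallel), the following is a minimal sketch for DNA strings. It checks only perfect complementarity, not the "substantially complementary" case mentioned in the definition, and the function names are hypothetical.

```python
# Watson-Crick pairing for DNA: A<->T and C<->G.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def are_complementary(strand_a, strand_b):
    """True if the two strands pair perfectly in antiparallel orientation."""
    return strand_a.upper() == reverse_complement(strand_b).upper()
```

For example, `reverse_complement("ATGC")` returns `"GCAT"`, so `are_complementary("ATGC", "GCAT")` is true.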

As used herein, the term “gene” refers to any nucleic acid that contains the information required for expression of a protein, polypeptide, or untranslated RNA (e.g., rRNA, tRNA, and antisense RNA). A gene that encodes a protein includes the promoter, the structural gene open reading frame (ORF) sequence, and other sequences involved in expression of the protein. A gene that encodes an untranslated RNA includes the promoter and the nucleic acid that encodes the untranslated RNA.

As used herein, the term “genome” refers to the whole hereditary information of an organism that is encoded in the DNA (or RNA for some viral species), including both coding and non-coding sequences. The term can include chromosomal and/or non-coding DNA. A “native gene” or “endogenous gene” refers to a gene that is native to the host cell and has its own regulatory sequences, whereas an “exogenous gene” or “heterologous gene” refers to a gene that is not a native gene and contains regulatory and/or coding sequences not native to the host cell. A heterologous gene can contain mutated sequences and/or part of the regulatory or coding sequences. Regulatory sequences can be homologous or heterologous to a particular gene. A heterologous regulatory sequence does not, in nature, regulate the same gene that it regulates in the transformed host cell. “Coding sequence” refers to a DNA sequence that codes for a particular amino acid sequence. As used herein, “regulatory sequences” refers to nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence. These regulatory sequences include promoters, ribosome binding sites, translation leader sequences, RNA processing sites, effector binding sites (e.g., activator, regulator), stem-loop structures, and so on.

A genetic element can be any coding or non-coding nucleic acid sequence, as described herein. In some embodiments, a genetic element is a nucleic acid that codes for an amino acid, a peptide, or a protein. Genetic elements can be genes, gene fragments, or operons. They may also include promoters, exons, and introns. Genetic elements may be as short as one or two codons, or they may include functional components (e.g., encoding proteins) and/or regulatory elements. A genetic element may include the entire open reading frame of a protein, or the entire open reading frame together with one or more regulatory sequences. Those skilled in the art will appreciate that genetic elements can also be viewed as modular genetic elements, or genetic modules. A genetic module may include a regulatory sequence, a promoter, a coding sequence, or any combination thereof. In some embodiments, the genetic element may include at least two genetic modules and at most two recombination sites. In eukaryotes, the genetic element can include at least three modules. A genetic module could be, for example, a regulatory sequence, a promoter, a coding sequence, a polyadenylation tail, or any combination thereof. The nucleic acid sequence can also include the coding sequences and the promoter. A leader sequence is operably linked to the 5′ terminus of the coding nucleic acid sequence. A signal peptide sequence encodes an amino acid sequence linked to the amino terminus of the polypeptide, which directs the polypeptide into the cell’s secretory pathway.

A codon, as commonly understood, is a sequence of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or terminates translation (stop codons). There are 64 codons: 61 that encode amino acids and 3 stop codons. However, there are only 20 translated amino acids. Because of this redundancy, many amino acids are encoded by more than one codon. Different organisms and organelles may have different preferences, or biases, for which codon is used to encode a given amino acid. The frequency with which a codon is used therefore varies by organism and organelle. The sequence of a gene can be altered to make it more compatible with the codon usage frequencies of a host. To ensure reliable expression, it is preferable to use codons that correspond to the host’s tRNA levels, particularly tRNAs that remain charged during starvation. Codons with rare cognate tRNAs can affect protein folding and translation rates, and may also be used. Genes designed according to codon usage bias or relative tRNA abundance in the host are often called “optimized.” Optimizing codon usage can increase the expression level. Using optimal codons can give higher accuracy and faster translation rates. Codon optimization introduces silent mutations, which do not alter the amino acid sequence of the protein.
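The codon arithmetic above (64 codons, 61 sense codons, 3 stop codons, 20 amino acids) and the idea of a silent recoding can be checked with a short sketch. The standard genetic code is built from the usual compact T, C, A, G ordering. The "preferred" codon map is a placeholder assumption (the alphabetically first synonymous codon), not a real host usage table, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Placeholder preference: first synonymous codon in sorted order.
# A real optimization would use the host's measured codon usage instead.
PLACEHOLDER_PREFERRED = {}
for _codon in sorted(CODON_TABLE):
    PLACEHOLDER_PREFERRED.setdefault(CODON_TABLE[_codon], _codon)


def translate(dna):
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def recode(dna, preferred=PLACEHOLDER_PREFERRED):
    """Swap each codon for the preferred synonymous codon (silent changes only)."""
    return "".join(preferred[aa] for aa in translate(dna))
```

Recoding changes the nucleotide sequence but leaves `translate(recode(seq))` equal to `translate(seq)`, which is the sense in which the mutations are silent.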

Genetic elements and genetic modules can be derived from natural or synthetic polynucleotides, or a combination of both. In some instances, the genetic elements and modules are derived from different organisms. Genetic elements and modules useful in the methods described can be obtained from many sources, including DNA libraries, BAC (bacterial artificial chromosome) libraries, de novo chemical synthesis, and excision or modification of a genome segment. These sequences can then be modified using standard molecular biology or recombinant DNA technology to create polynucleotide constructs that carry the desired modifications for reintroduction into, or construction of, a large product nucleic acid such as a modified, partially synthetic, or fully synthetic genome. Several methods are available to modify polynucleotide sequences obtained from a library or genome, including site-directed mutagenesis; PCR mutagenesis; inserting, deleting, or swapping sections of a sequence using restriction enzymes, optionally combined with ligation; in vitro and in vivo homologous recombination; site-specific recombination; or any combination thereof. Other embodiments may use synthetic oligonucleotides and polynucleotides as the genetic sequences. Many methods can be used to synthesize such oligonucleotides and polynucleotides.

In some instances, the genetic elements share less than 99%, less than 95%, less than 90%, less than 80%, or less than 70% sequence identity. Sequence identity can be determined by comparing a position in each sequence, which may be aligned for purposes of comparison. If the same base or amino acid occupies the equivalent position in the compared sequences, the molecules are identical at that position. If the equivalent site is occupied by the same or a similar amino acid residue (e.g., similar in steric and/or electronic nature), the molecules can be referred to as homologous (similar) at that position. The percentage of homology, similarity, or identity is a function of the number of identical or similar amino acids at positions shared by the compared sequences. FASTA, BLAST, and ENTREZ are among the alignment algorithms and programs that can be used. ENTREZ is available through the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Md. In one embodiment, the percent identity of two sequences can be determined with the GCG program using a gap weight of 1, e.g., each amino acid gap is weighted as if it were a single amino acid or nucleotide mismatch between the two sequences. Other alignment techniques are also described [Doolittle 1996]. Preferably, an alignment program that permits gaps in the sequence is used to align the sequences. The Smith-Waterman algorithm permits gaps in sequence alignments [Shpaer 1997]. The GAP program, which uses the Needleman and Wunsch alignment method, can also be used to align sequences. An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer.
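To make the percent-identity calculation concrete, the sketch below scores two sequences that have already been aligned (for example by one of the programs named above). Following the "gap weight of 1" description, a gap in one sequence is counted like a single mismatch. This is an illustrative assumption about the bookkeeping, not the GCG implementation, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    A column with a gap in exactly one sequence counts as a mismatch;
    a column with gaps in both sequences is ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared
```

For example, `percent_identity("ACGT", "ACGA")` and `percent_identity("AC-T", "ACGT")` both give 75.0, because one of four compared positions differs in each case.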

As used herein, an “ortholog” is a gene (or set of genes) that is related by vertical descent and performs substantially the same or identical functions in different organisms. For example, epoxide hydrolases from different species can be considered orthologs for the biological function of hydrolysis of epoxides. Genes are related by vertical descent when, for example, they share enough sequence similarity to indicate that they are homologous, or related by evolution from a common ancestor. Genes can also be considered orthologs if they share a three-dimensional structure, but not necessarily sequence similarity, to a degree that indicates they evolved from a common ancestor. Orthologous genes can encode proteins with a sequence similarity of about 25% to 100%. Genes encoding proteins with an amino acid similarity of less than 25% can also be considered to have arisen by vertical descent if their three-dimensional structures also show similarities. Members of the serine protease family of enzymes, including tissue plasminogen activator and elastase, are considered to have arisen by vertical descent from a common ancestor. Orthologs include genes, or their encoded gene products, that have diverged in structure or overall activity through evolution. For example, if one species encodes a gene product that has two functions, and those functions have been separated into distinct genes in a second species, then the three genes and their respective products are considered orthologs. For the production of a biochemical product, the orthologous gene that harbors the relevant metabolic activity is the one chosen for construction of the non-naturally occurring microorganism. An example of orthologs with separable activities is where distinct activities have been separated into distinct gene products between two or more species, or within a single species. One specific example is the separation of elastase proteolysis and plasminogen proteolysis, two types of serine protease activity, into distinct molecules as plasminogen activator and elastase. Another example is the separation of mycoplasma 5′-3′ exonuclease and Drosophila DNA polymerase III activity. The DNA polymerase of the first species can be considered an ortholog of either or both of the exonuclease and the polymerase of the second species, and vice versa.

In contrast, as used herein, “paralogs” are homologs that are related through, for instance, duplication followed by evolutionary divergence. They have similar or common, but not identical, functions. Paralogs can originate from the same species or from a different species. For example, microsomal epoxide hydrolase (epoxide hydrolase I) and soluble epoxide hydrolase (epoxide hydrolase II) can be considered paralogs because they are two distinct enzymes, co-evolved from a common ancestor, that catalyze different reactions and perform distinct functions within the same species. Paralogs are proteins from the same species with significant sequence similarity to one another, suggesting that they are homologous, or related through co-evolution from a common ancestor. Groups of paralogous protein families include HipA homologs, luciferase genes, peptidases, and others.

As used herein, a “nonorthologous gene displacement” is a nonorthologous gene from one species that can substitute for a referenced gene function in a different species. Substitution includes, for example, being able to perform substantially the same or a similar function in the species of origin compared with the referenced function in the different species. Generally, a nonorthologous gene displacement can be identified as structurally related to a known gene encoding the referenced function. However, less structurally related but functionally similar genes and their corresponding gene products still fall within the meaning of the term as used in this document. Functional similarity requires, for instance, at least some structural similarity in the active site or binding region of the nonorthologous gene product compared with the gene encoding the function to be substituted. A nonorthologous gene therefore includes, for example, a paralog or an unrelated gene.

Orthologs, paralogs and nonorthologous gene displacements can be determined by methods well known to those skilled in the art. For example, inspection of the nucleic acid or amino acid sequences of two polypeptides will reveal sequence identity and similarities between the compared sequences. Based on such similarities, one skilled in the art can determine whether the similarity is sufficiently high to indicate that the proteins are related through evolution from a common ancestor. Algorithms such as Align, BLAST, Clustal W and others compare and determine a raw sequence similarity or identity, and also determine the presence or significance of gaps in the sequence, which can be assigned a weight or score. Such algorithms are similarly applicable for determining nucleotide sequence similarity or identity. Parameters for sufficient similarity to determine relatedness are computed based on well-known methods for calculating statistical similarity, that is, the chance of finding a similar match in a random polypeptide, and the significance of the match is then determined. If desired, a computer comparison of two or more sequences can also be optimized visually by those skilled in the art. Related gene products or proteins can be expected to have a high similarity, for example, 25% to 100% sequence identity. Proteins that are unrelated can have an identity that is essentially the same as would be expected to occur by chance if a database of sufficient size is scanned (about 5%). Sequences with between 5% and 24% identity may or may not represent sufficient homology to conclude that the compared sequences are related. Additional statistical analysis to determine the significance of such matches, given the size of the data set, can be carried out to determine the relevance of these sequences. Exemplary parameters for determining the relatedness of two or more sequences using the BLAST algorithm are set forth below. Amino acid sequence alignments can be performed using BLASTP version 2.0.8 (Jan. 5, 1999) and the following parameters: Matrix: 0 BLOSUM62; gap open: 11; gap extension: 1; x_dropoff: 50; expect: 10.0; wordsize: 3; filter: on. Nucleic acid sequence alignments can be performed using BLASTN version 2.0.6 (Sept. 16, 1998) and the following parameters: Match: 1; mismatch: −2; gap open: 5; gap extension: 2; x_dropoff: 50; expect: 10.0; wordsize: 11; filter: off. Those skilled in the art will know what modifications can be made to the above parameters to increase or decrease the stringency of the comparison and to determine the relatedness of two or more sequences.
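
The identity thresholds described above can be illustrated with a minimal Python sketch. It is not part of the disclosure: the function names, the treatment of gap columns (skipped), and the exact cut-offs at 5% and 25% are assumptions made for illustration, and the two dictionaries simply record the BLASTP and BLASTN parameter values quoted in the text as plain data rather than as options of any particular BLAST program.

```python
# Illustrative sketch only; names and cut-offs are assumptions, not the disclosed method.

# Parameter values quoted in the text, stored as plain data.
BLASTP_PARAMS = {"matrix": "BLOSUM62", "gap_open": 11, "gap_extension": 1,
                 "x_dropoff": 50, "expect": 10.0, "wordsize": 3, "filter": True}
BLASTN_PARAMS = {"match": 1, "mismatch": -2, "gap_open": 5, "gap_extension": 2,
                 "x_dropoff": 50, "expect": 10.0, "wordsize": 11, "filter": False}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def relatedness_band(identity_pct: float) -> str:
    """Map a percent identity onto the three ranges described in the text."""
    if identity_pct >= 25.0:
        return "likely related"
    if identity_pct >= 5.0:
        return "ambiguous; needs further statistical analysis"
    return "consistent with chance"


if __name__ == "__main__":
    pid = percent_identity("MKT-AYIA", "MKTQAYLA")
    print(round(pid, 1), relatedness_band(pid))
```

A real analysis would take the alignment and its statistical significance from an alignment program; this sketch only shows how a percent identity maps onto the ranges discussed.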

As used herein, the term “homolog” refers to any ortholog, paralog, nonorthologous gene, or similar gene encoding an enzyme that catalyzes a similar or substantially similar metabolic reaction, whether from the same or a different species.

As used herein, the term “homologous recombination” refers to the process in which nucleic acid molecules with similar nucleotide sequences associate and exchange nucleotide strands. A nucleotide sequence of a first nucleic acid molecule that is effective for engaging in homologous recombination at a predefined position of a second nucleic acid molecule can therefore have a nucleotide sequence that facilitates the exchange of nucleotide strands between the first nucleic acid molecule and the defined position of the second nucleic acid molecule. The first nucleic acid can have a nucleotide sequence that is sufficiently complementary to a portion of the second nucleic acid molecule to promote nucleotide base pairing. Homologous recombination requires homologous sequences in the two recombining partner nucleic acids but does not require any specific sequences. Homologous recombination can be used to introduce a heterologous nucleic acid and/or mutations into the host genome. Such systems typically rely on the sequence flanking the heterologous nucleic acid to be expressed having enough homology with a target sequence within the host cell genome that recombination between the vector nucleic acid and the target nucleic acid takes place, causing the delivered nucleic acid to be integrated into the host genome. These systems and the methods needed to promote homologous recombination are known to those skilled in the art.

It should be appreciated that the nucleic acid sequence or gene of interest can be derived from the genome of a natural organism. In certain embodiments, genes of interest may be excised from the genome of a natural organism or from the host genome. Large genomic fragments can be excised and amplified using in vitro enzymatic excision or in vivo excision and amplification. For example, the FLP/FRT site-specific recombination system and the Cre/loxP site-specific recombination system have been used to efficiently excise large genomic fragments for the purpose of sequencing [Yoon 1998]. In some embodiments, excision and amplification techniques can be used to facilitate artificial genome or chromosome assembly. Genomic fragments may be excised from the chromosome of a methylotroph and altered before being inserted into the artificial genome or chromosome of the host cell. In some embodiments, the excised genomic fragments can be assembled with engineered promoters and/or other gene expression elements and then inserted into the genome of the host cell.

As used herein, the term “polypeptide” refers to a sequence of contiguous amino acids of any length. The terms “peptide,” “oligopeptide,” “protein” and “enzyme” may be used interchangeably herein with the term “polypeptide.” In certain instances, “enzyme” refers to a protein having catalytic activity.

As used herein, a “proteome” refers to the entire set of proteins expressed by a genome, cell, tissue or organism. More specifically, it is the entire set of proteins expressed in a given type of cell or organism at a given time under defined conditions. “Transcriptome” refers to the set of all RNA molecules, including mRNA and rRNA. “Metabolome” refers to the complete set of small-molecule metabolites (such as metabolic intermediates, hormones and other signaling molecules) to be found within a biological sample, such as a single organism.

As used herein, the terms “fuse,” “fused” and “link” refer to the covalent linkage between two polypeptides in a fusion protein. The polypeptides are typically joined via a peptide bond, either directly to each other or via an amino acid linker. Optionally, the peptides can be joined via non-peptide covalent linkages known to those skilled in the art.

As used herein, unless otherwise stated, the term “transcription” refers to the synthesis of RNA from a DNA template, and the term “translation” refers to the synthesis of a polypeptide from an mRNA template. Translation is regulated by the sequence and structure of the 5′ untranslated region (5′-UTR) of the mRNA transcript. One regulatory sequence is the ribosome binding site (RBS), which promotes efficient and accurate translation of mRNA. The prokaryotic RBS is the Shine-Dalgarno sequence, a purine-rich sequence of the 5′-UTR that is complementary to the UCCU core sequence at the 3′-end of 16S rRNA (located within the 30S small ribosomal subunit). Various Shine-Dalgarno sequences have been found in prokaryotic mRNAs, and they generally lie about 10 nucleotides upstream of the AUG start codon. The activity of an RBS can be influenced by the length and nucleotide composition of the spacer separating the RBS and the initiator AUG. In eukaryotes, the Kozak sequence A/GCCACCAUGG, which lies within a short 5′ untranslated region, directs translation of mRNA. An mRNA lacking the Kozak consensus sequence may still be translated efficiently in an in vitro system if it possesses a moderately long 5′-UTR that lacks stable secondary structure. While the E. coli ribosome preferentially recognizes the Shine-Dalgarno sequence, eukaryotic ribosomes (such as those found in retic lysate) can efficiently use either the Shine-Dalgarno or the Kozak ribosomal binding site.
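
The relationship between a Shine-Dalgarno core, the spacer, and the AUG start codon can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the choice of "AGGA" as the core motif (the complement of the UCCU core mentioned above), and the 15-nucleotide spacer limit are assumptions, not values taken from the disclosure.

```python
import re

# Illustrative sketch only; motif choice and spacer limit are assumptions.


def find_sd_sites(mrna: str, core: str = "AGGA", max_spacer: int = 15):
    """Return (sd_start, aug_start, spacer_length) for each AUG that has the
    core motif within max_spacer nucleotides upstream (0-based indices)."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input
    hits = []
    aug = mrna.find("AUG")
    while aug != -1:
        window_start = max(0, aug - max_spacer - len(core))
        # Closest core motif that ends at or before the start codon
        sd = mrna.rfind(core, window_start, aug)
        if sd != -1:
            hits.append((sd, aug, aug - (sd + len(core))))
        aug = mrna.find("AUG", aug + 1)
    return hits


def has_kozak(mrna: str) -> bool:
    """True if the sequence contains the Kozak sequence A/GCCACCAUGG."""
    mrna = mrna.upper().replace("T", "U")
    return re.search(r"[AG]CCACCAUGG", mrna) is not None


if __name__ == "__main__":
    print(find_sd_sites("GCUAAGGAGGUAUACAUAUGGCU"))
    print(has_kozak("GGGACCACCAUGGCU"))
```

The spacer length returned for each hit is the quantity the text identifies as influencing RBS activity; a fuller model would also score spacer composition and mRNA secondary structure.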

As used herein, the terms “promoter,” “promoter element” and “promoter sequence” refer to a DNA sequence that, when ligated to a nucleotide sequence of interest, is capable of controlling the transcription of that nucleotide sequence into mRNA. A promoter is typically located 5′ (i.e., upstream) of the nucleotide sequence of interest whose transcription into mRNA it controls, and it provides a site for specific binding by RNA polymerase and other transcription factors for the initiation of transcription.

It should be appreciated that promoters have a modular architecture and that this modular architecture may be altered. Bacterial promoters typically include a core promoter element and additional promoter elements. The core promoter is the minimal portion of the promoter required to initiate transcription. A core promoter includes a transcription start site, a binding site for RNA polymerases and general transcription factor binding sites. The “transcription start site” refers to the first nucleotide to be transcribed and is designated +1. Nucleotides downstream of the start site are numbered +1, +2, etc., and nucleotides upstream of the start site are numbered −1, −2, etc. Additional promoter elements are located 5′ of the core promoter and regulate the frequency of transcription. The proximal and distal promoter elements constitute specific transcription factor sites. In prokaryotes, a core promoter usually includes two consensus sequences, a −10 sequence and a −35 sequence, which are recognized by sigma factors (see, for example, [Hawley 1983]). The −10 sequence (about 10 bp upstream of the first transcribed nucleotide) is typically about 6 nucleotides long and is typically made up of the nucleotides adenosine and thymidine (it is also known as the Pribnow box). In some embodiments, the nucleotide sequence of the −10 sequence is 5′-TATAAT, or it may comprise 3 to 6 base pairs of the consensus sequence. The presence of this box is essential for the start of transcription. The −35 sequence of a core promoter is typically about 6 nucleotides long and is typically made up of each of the four nucleosides. The presence of this sequence allows a very high transcription rate. In some embodiments, the nucleotide sequence of the −35 sequence is 5′-TTGACA, or it may comprise 3 to 6 base pairs of the consensus sequence. In some embodiments, the −10 and −35 sequences are separated by about 17 nucleotides.
Eukaryotic promoters are more diverse than prokaryotic promoters and may be located several kilobases upstream of the transcription start site. Some eukaryotic promoters contain a TATA box (e.g., containing the consensus sequence TATAAA or part thereof), which is typically located within 40 to 120 bases of the transcriptional start site. One or more upstream activation sequences (UAS), which are recognized by specific binding proteins, can act as activators of transcription. These UAS sequences are typically found upstream of the transcription initiation site. The distance between the UAS sequences and the TATA box is highly variable and may be up to 1 kb.
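
The prokaryotic core promoter layout described above (a −35 hexamer, a spacer of about 17 nucleotides, and a −10 hexamer) and the convention of numbering positions with no zero can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the fixed 17-nucleotide spacer, and the one-mismatch tolerance per hexamer are choices made for the example, not parameters from the disclosure.

```python
# Illustrative sketch only; spacer length and mismatch tolerance are assumptions.

MINUS_35 = "TTGACA"  # -35 consensus quoted in the text
MINUS_10 = "TATAAT"  # -10 consensus (Pribnow box) quoted in the text


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_core_promoters(dna: str, spacer: int = 17, max_mismatch: int = 1):
    """Return (minus35_start, minus10_start) pairs (0-based) where both
    hexamers match their consensus within max_mismatch substitutions."""
    dna = dna.upper()
    span = len(MINUS_35) + spacer + len(MINUS_10)
    hits = []
    for i in range(len(dna) - span + 1):
        box35 = dna[i:i + len(MINUS_35)]
        j = i + len(MINUS_35) + spacer
        box10 = dna[j:j + len(MINUS_10)]
        if (mismatches(box35, MINUS_35) <= max_mismatch
                and mismatches(box10, MINUS_10) <= max_mismatch):
            hits.append((i, j))
    return hits


def relative_position(index: int, tss_index: int) -> int:
    """Position relative to the transcription start site: the start site is +1
    and the nucleotide immediately upstream is -1 (there is no position 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


if __name__ == "__main__":
    seq = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GCGCAT"
    print(find_core_promoters(seq))
    print(relative_position(9, 10), relative_position(10, 10))
```

Allowing a mismatch reflects the statement that a promoter may comprise only part of each consensus; real promoter prediction typically uses position weight matrices and a range of spacer lengths rather than a fixed spacer.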

As used herein, the term “vector” refers to any genetic element, such as a plasmid, phage, transposon or cosmid. The vector may contain a marker suitable for use in identifying transformed or transfected cells. For example, markers may provide antibiotic resistance, fluorescence, enzymatic activity or other traits. As a second example, markers may complement auxotrophic deficiencies or supply critical nutrients not present in the culture medium. Types of vectors include cloning vectors and expression vectors. As used herein, the term “cloning vector” refers to a plasmid, phage DNA or other DNA sequence that is able to replicate autonomously in a host cell and that is characterized by one or a small number of restriction endonuclease recognition sites and/or sites for site-specific recombination. A foreign DNA fragment may be spliced into the vector at these sites so that it becomes part of the vector. The term “expression vector” refers to a vector that is capable of expressing a gene that has been cloned into it. Such expression can occur after transformation into a host cell or in an IVPS system. The cloned DNA is usually operably linked to one or more regulatory sequences, such as promoters, activator/repressor binding sites, terminators, enhancers and the like. The promoter sequences can be constitutive, inducible and/or repressible.

As used herein, the term “host” or “host cell” refers to any prokaryotic or eukaryotic (e.g., mammalian, insect, yeast, plant, bacterial, archaeal, avian, animal, etc.) cell or organism. The host cell can be a recipient of a replicable expression vector or cloning vector. The host cell can be a methylotroph that is naturally occurring, genetically engineered or metabolically evolved. Host cells may be prokaryotic cells, such as species of the genus Paracoccus or Escherichia, or eukaryotic cells, such as yeast, insect, amphibian or mammalian cells. Cell lines are specific cells that can grow indefinitely given the appropriate medium and conditions. Cell lines can be mammalian, insect or plant cell lines. Exemplary cell lines include tumor cell lines and stem cell lines. A heterologous nucleic acid molecule may contain, but is not limited to, a sequence of interest, a transcriptional regulatory sequence (such as a promoter, enhancer, repressor and the like) and/or an origin of replication. As used herein, the terms “host,” “host cell,” “recombinant host” and “recombinant cell” may be used interchangeably. For examples of such hosts, see [Sambrook 2001].

One or more nucleic acid sequences can be targeted for delivery to prokaryotic or eukaryotic target cells via conventional transformation or transfection techniques. As used herein, the terms “transformation” and “transfection” are intended to refer to a variety of art-recognized techniques for introducing an exogenous nucleic acid sequence (e.g., DNA) into a target cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, conjugation, electroporation, optoporation, injection and the like. Suitable transformation or transfection media include water, CaCl2, cationic polymers, lipids and the like. Suitable materials and methods for transforming or transfecting target cells can be found in [Sambrook 2001] and other laboratory manuals. In certain instances, particular oligo concentrations (per oligo) can be used for transformation or transfection.

As used herein, the term “marker” or “reporter” refers to a gene or protein that can be attached to a regulatory sequence of another gene or protein of interest so that, upon expression in a host cell or organism, the reporter confers characteristics that can be readily selected, identified and/or measured. Reporter genes are often used to indicate whether a certain gene has been introduced into or expressed in the host cell or organism. Examples of commonly used reporters include antibiotic resistance genes, auxotrophic markers, β-galactosidase (encoded by the bacterial gene lacZ), luciferase, chloramphenicol acetyltransferase (CAT; from bacteria), GUS (commonly used in plants) and green fluorescent protein (GFP; from jellyfish). Reporters or markers can be selectable or screenable. A selectable marker (e.g., an antibiotic resistance gene or an auxotrophic marker) is a gene that confers a trait suitable for artificial selection. Typically, host cells expressing the selectable marker are protected from a selective agent that is toxic or inhibitory to cell growth. A screenable marker (e.g., gfp, lacZ) generally allows researchers to distinguish between wanted cells (expressing the marker) and unwanted cells (not expressing the marker or expressing it at insufficient levels).

As used herein, the term “methylotroph” or “methylotrophic organism” refers to organisms that produce complex organic compounds from compounds that contain no carbon-carbon bonds, such as formate, formic acid, formaldehyde and methane. Methylotrophs typically use C1 compounds as a source of both energy and carbon. Examples of methylotrophic metabolic pathways for the production of central metabolites from C1 compounds include the ribulose monophosphate cycle (FIG. 1) and the serine cycle (FIG. 2). The term “autotroph” or “autotrophic organism” refers to organisms that use simple inorganic carbon molecules, such as carbon dioxide, as their primary carbon source for growth. Some, but not all, methylotrophs assimilate C1 compounds via carbon dioxide and are therefore also autotrophs. These organisms oxidize C1 compounds such as methanol, formaldehyde and methylamine to carbon dioxide (see FIG. 3) and then reduce the carbon dioxide to central metabolites through a carbon fixation cycle, for example, the Calvin-Benson-Bassham cycle (FIG. 4) or the reductive tricarboxylic acid cycle (FIG. 5). In contrast, the term “heterotroph” or “heterotrophic organism” refers to organisms that must use reduced organic carbon compounds containing carbon-carbon bonds for growth because they cannot use inorganic carbon, such as carbon dioxide, as their primary carbon source. Instead, heterotrophs obtain energy by breaking down the organic molecules they consume. “Mixotrophs” or “mixotrophic organisms” can use a mix of different sources of energy and carbon, for example, alternating between heterotrophy and autotrophy, between heterotrophy and methylotrophy, between phototrophy and chemotrophy, or any combination thereof, depending on environmental conditions.

As used herein, the term “reduced cofactor” refers to intracellular redox and energy carriers, such as NADH and NADPH, that can donate high-energy electrons in reduction-oxidation reactions. The terms “reduced cofactor,” “reducing cofactor” and “redox cofactor” can be used interchangeably.

As used herein, the term “C1 compound” or “1C compound” refers to reduced chemical species that contain no carbon-carbon bonds. C1 compounds can contain one carbon atom (e.g., formate, formic acid, formamide and formaldehyde) or multiple carbon atoms (e.g., dimethyl ether, dimethylamine and dimethyl sulfide). C1 compounds can also be organic (e.g., formate, formic acid) or inorganic (e.g., methane and methanol). Methylotrophs typically use C1 compounds as a source of both energy and carbon.

As used herein, the term “central metabolite” refers to organic carbon compounds, such as acetyl-CoA, pyruvate, pyruvic acid, 3-hydroxypropionate, 3-hydroxypropionic acid, glycolate, glycolic acid, glyoxylate, glyoxylic acid, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, malate, malic acid, lactate, lactic acid, acetate, acetic acid, citrate and/or citric acid, that can be converted into carbon-based products of interest by a host cell or organism. Central metabolites are generally restricted to those reduced organic compounds from which all or most cell mass components can be derived in a given host cell or organism. In certain embodiments, the central metabolite can itself be the carbon-based product of interest, in which case no further chemical conversion is required.

References to a particular chemical species include not only that species but also its water-solvated forms. Carbon dioxide, for example, includes both the gaseous form (CO2) and water-solvated forms (e.g., bicarbonate ion).

As used herein, the term “biosynthetic pathway” or “metabolic pathway” refers to a set of anabolic or catabolic biochemical reactions that convert (transmute) one chemical species into another. Anabolic pathways build a larger molecule from smaller molecules, a process that requires energy. Catabolic pathways break down larger molecules, often releasing energy. As used herein, the term “energy conversion pathway” refers to a metabolic pathway that transfers energy from a C1 compound to a reducing cofactor. The term “carbon fixation pathway” refers to a biosynthetic pathway that converts inorganic carbon (e.g., carbon dioxide, bicarbonate or formate) to reduced organic carbon, such as one or more carbon product precursors. The term “methylotrophic pathway” refers to a biosynthetic pathway that converts C1 compounds into compounds containing carbon-carbon bonds, such as one or more carbon product precursors. The term “carbon product biosynthetic pathway” refers to a biosynthetic pathway that converts one or more carbon product precursors into one or more carbon-based products of interest.

As used herein, the term “engineered methylotroph” or “engineered methylotrophic organism” refers to organisms that have been genetically engineered to convert C1 compounds, such as formate or formaldehyde, to organic carbon compounds. As used herein, an engineered methylotroph need not derive its organic carbon compounds solely from C1 compounds. The term can also refer to an originally methylotrophic or mixotrophic organism that has been genetically engineered to include energy conversion, carbon fixation and/or carbon product biosynthetic pathways in addition to its endogenous methylotrophic capability. The terms “engineer,” “engineering” and “engineered,” as used herein, refer to genetic manipulation or modification of biomolecules such as DNA, RNA and/or proteins, or like techniques commonly known in the biotechnology art.

As used herein, the term “carbon-based products of interest” refers to desired products containing carbon atoms and includes, but is not limited to, alcohols such as ethanol, propanol, isopropanol, butanol, octanol, fatty alcohols, fatty acid esters, wax esters; hydrocarbons and alkanes such as propane, octane, diesel, Jet Propellant 8; polymers such as terephthalate, 1,3-propanediol, 1,4-butanediol, polyols, polyhydroxyalkanoates (PHAs), polyhydroxybutyrates (PHBs), acrylate, adipic acid, epsilon-caprolactone, isoprene, caprolactam, rubber; commodity chemicals such as lactate, docosahexaenoic acid (DHA), 3-hydroxypropionate, γ-valerolactone, lysine, serine, aspartate, aspartic acid, sorbitol, ascorbate, ascorbic acid, isopentenol, lanosterol, omega-3 DHA, lycopene, itaconate, 1,3-butadiene, ethylene, propylene, succinate, citrate, citric acid, glutamate, malate, 3-hydroxypropionic acid (HPA), lactic acid, THF, gamma butyrolactone, pyrrolidones, hydroxybutyrate, glutamic acid, levulinic acid, acrylic acid, malonic acid; specialty chemicals such as carotenoids, isoprenoids, itaconic acid; biological sugars such as glucose, fructose, lactose, sucrose, starch, cellulose, hemicellulose, glycogen, xylose, dextrose, galactose, uronic acid, maltose, polyketides, or glycerol; central metabolites, such as acetyl-CoA, pyruvate, pyruvic acid, 3-hydroxypropionate, 3-hydroxypropionic acid, glycolate, glycolic acid, glyoxylate, glyoxylic acid, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, malate, malic acid, lactate, lactic acid, acetate, acetic acid, citrate and/or citric acid, from which other carbon products can be made; pharmaceuticals and pharmaceutical intermediates such as 7-aminodesacetoxycephalosporonic acid, cephalosporin, erythromycin, polyketides, statins, paclitaxel, docetaxel, terpenes, peptides, steroids, omega fatty acids and other such suitable products of interest. These products are useful as biofuels, industrial and specialty chemicals, and as intermediates used to make additional products, such as pharmaceuticals.

As used herein, the term “hydrocarbon” refers to a chemical compound that consists of the elements carbon, hydrogen and, optionally, oxygen. “Surfactants” are substances capable of reducing the surface tension of a liquid in which they are dissolved. They are typically composed of a water-soluble head and a hydrocarbon chain or tail. The water-soluble group is hydrophilic and can be either ionic or nonionic, and the hydrocarbon chain is hydrophobic. The term “biofuel” refers to any fuel that is derived from a biological source.

The accession numbers provided throughout this description are derived from the NCBI (National Center for Biotechnology Information) database maintained by the National Institutes of Health, U.S.A. The accession numbers are as provided in the database on August 1, 2011. The Enzyme Classification Numbers (E.C.) provided throughout this description are derived from the KEGG Ligand database, maintained by the Kyoto Encyclopedia of Genes and Genomes and sponsored in part by the University of Tokyo.

Other terms used in the fields of recombinant DNA technology, microbiology and metabolic engineering, as used herein, will generally be understood by one of ordinary skill in the applicable arts.

Source of C1 Compounds

Some embodiments use suitable C1 compounds such as formate, formic acid, methanol and/or formaldehyde. Formate, formic acid, formaldehyde and methanol can be produced by the electrochemical reduction of CO2 [see, e.g., Hori 2008].

In certain cases, liquid feedstocks such as formate, formic acid or formaldehyde can be preferred over gaseous feedstocks such as methane or synthesis gas. Methane, a commonly used feedstock for methylotrophs, is a gas with low solubility in water, whereas biological systems are aqueous. The same is true for synthesis gas, which is composed of molecular hydrogen and carbon monoxide, both of which also have low solubility in water. Achieving high rates of mass transfer between the gas and liquid phases can therefore be difficult in large-scale reactors or fermentors. In contrast, formate, formaldehyde and methanol do not have this problem because of their higher solubility or miscibility in water. Formate, formaldehyde or methanol can therefore be advantageous when water is the solvent in the growth medium.

The energy efficiency of electrochemical carbon dioxide conversion affects the overall energy efficiency of a bio-manufacturing process that uses an engineered or evolved methylotroph of this invention. Commercial electrolyzers have overall energy efficiencies of 56-73% at current densities of 110-300 mA/cm2 (alkaline electrolyzers) or 800-1600 mA/cm2 (PEM electrolyzers) [Whipple 2010]. However, while electrochemical systems for carbon dioxide reduction have achieved either high current density or high energy efficiency, they have not yet achieved both at the same time. Further technology improvements are therefore required for the electrochemical production of formate, formic acid, formaldehyde and methanol.
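
The way the electrochemical step limits the overall process can be shown with simple arithmetic, assuming the two stages run in series so that their efficiencies multiply. The sketch below is illustrative only: the function name and the biological conversion efficiency of 0.5 are hypothetical placeholders, and the 0.56 and 0.73 values are the electrolyzer efficiencies quoted above, used here merely as stand-ins for an electrochemical stage.

```python
# Illustrative arithmetic only; the series model and the 0.5 value are assumptions.


def overall_efficiency(electrochemical_eff: float, biological_eff: float) -> float:
    """Overall energy efficiency of two stages in series (fractions in [0, 1])."""
    for value in (electrochemical_eff, biological_eff):
        if not 0.0 <= value <= 1.0:
            raise ValueError("efficiencies must be fractions between 0 and 1")
    return electrochemical_eff * biological_eff


if __name__ == "__main__":
    hypothetical_biological_eff = 0.5  # placeholder, not a measured value
    for electro in (0.56, 0.73):
        total = overall_efficiency(electro, hypothetical_biological_eff)
        print(f"electrochemical {electro:.2f} -> overall {total:.3f}")
```

Under this series assumption, any gain in the electrochemical stage carries through proportionally to the whole process, which is why the text treats improvements in that stage as necessary.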

Organisms or host cells for engineering or evolution

The host cell or organism, as disclosed herein, may be chosen from methylotrophic eukaryotic or prokaryotic systems, such as bacterial cells (Gram-negative (e.g., Alphaproteobacterium) or Gram-positive), archaea and yeast cells. Suitable cells and cell lines include those commonly used in laboratory and industrial settings. In some embodiments, host cells/organisms can be selected from Bacillus species including Bacillus methanolicus, Bilophila wadsworthia, Burkholderia species including Burkholderia phymatum, Candida species including Candida boidinii, Candida sonorensis, Cupriavidus necator (formerly Alcaligenes eutrophus and Ralstonia eutropha), Hyphomicrobium species including Hyphomicrobium methylovorum, Hyphomicrobium zavarzinii, Methanococcus maripaludis, Methanomonas methanooxidans, Methanosarcina species, Methylibium petroleiphilum, Methylobacillus flagellatus, Methylobacillus fructoseoxidans, Methylobacillus glycogenes, Methylobacillus viscogenes, Methylobacter bovis, Methylobacter capsulatus, Methylobacter vinelandii, Methylobacterium species including Methylobacterium dichloromethanicum, Methylobacterium extorquens, Methylobacterium mesophilicum, Methylobacterium organophilum, Methylobacterium rhodesianum, Methylococcus capsulatus, Methylococcus minimus, Methylocystis species including Methylocystis parvus, Methylomicrobium alcaliphilum, Methylomonas species including Methylomonas agile, Methylomonas albus, Methylomonas clara, Methylomonas methanica (formerly Bacillus methanicus and Pseudomonas methanica), Methylomonas methanolica, Methylomonas rosaceous, Methylomonas rubrum, Methylomonas streptobacterium, Methylophilus methylotrophus, Methylosinus species including Methylosinus sporium, Methylosinus trichosporium, Methylosporovibrio methanica, Methyloversatilis universalis, Methylovorus mays, Mycobacterium vaccae, Nautilia sp. strain AmN, Nautilia profundicola, Nautilia lithotrophica, Paracoccus species including Paracoccus denitrificans, Paracoccus versutus and Paracoccus zeaxanthinifaciens, Mycobacterium species, Verrucomicrobia species and Xanthomicrobia species.
The genetic modifications and metabolic alterations are described herein with reference to a suitable host organism such as Paracoccus denitrificans, Paracoccus versutus or Paracoccus zeaxanthinifaciens. However, given the complete genome sequencing of a wide variety of organisms and the high level of skill in the area of genomics, those skilled in the art will readily be able to apply the teachings and guidance provided herein to essentially all other methylotrophic host cells and organisms. For example, the Paracoccus denitrificans metabolic alterations exemplified herein can readily be applied to other species by incorporating the same or an analogous encoding nucleic acid from species other than the referenced species. Such genetic alterations include, for example, genetic alterations of species homologs in general and, in particular, orthologs, paralogs or nonorthologous gene displacements.

In various aspects of the invention, the cells are genetically engineered or metabolically evolved, for example, for optimized energy conversion and/or carbon fixation. The terms “metabolically evolved” and “metabolic evolution” refer to growth-based selection of host cells that show improved growth (cell yield).

Exemplary genomes and nucleic acids include full and partial genomes of a number of organisms for which genome sequences are publicly available and can be used with the disclosed methods, such as, but not limited to, Aeropyrum pernix; Agrobacterium tumefaciens; Anabaena; Anopheles gambiae; Apis mellifera; Aquifex aeolicus; Arabidopsis thaliana; Archaeoglobus fulgidus; Ashbya gossypii; Bacillus anthracis; Bacillus cereus; Bacillus halodurans; Bacillus licheniformis; Bacillus subtilis; Bacteroides fragilis; Bacteroides thetaiotaomicron; Bartonella henselae; Bartonella quintana; Bdellovibrio bacteriovorus; Bifidobacterium longum; Blochmannia floridanus; Bordetella bronchiseptica; Bordetella parapertussis; Bordetella pertussis; Borrelia burgdorferi; Bradyrhizobium japonicum; Brucella melitensis; Brucella suis; Buchnera aphidicola; Burkholderia mallei; Burkholderia pseudomallei; Caenorhabditis briggsae; Caenorhabditis elegans; Campylobacter jejuni; Candida glabrata; Canis familiaris; Caulobacter crescentus; Chlamydia muridarum; Chlamydia trachomatis; Chlamydophila caviae; Chlamydophila pneumoniae; Chlorobium tepidum; Chromobacterium violaceum; Ciona intestinalis; Clostridium acetobutylicum; Clostridium perfringens; Clostridium tetani; Corynebacterium diphtheriae; Corynebacterium efficiens; Coxiella burnetii; Cryptosporidium hominis; Cryptosporidium parvum; Cyanidioschyzon merolae; Debaryomyces hansenii; Deinococcus radiodurans; Desulfotalea psychrophila; Desulfovibrio vulgaris; Drosophila melanogaster; Encephalitozoon cuniculi; Enterococcus faecalis; Erwinia carotovora; Escherichia coli; Fusobacterium nucleatum; Gallus gallus; Geobacter sulfurreducens; Gloeobacter violaceus; Guillardia theta; Haemophilus ducreyi; Haemophilus influenzae; Halobacterium; Helicobacter hepaticus; Helicobacter pylori; Homo sapiens; Kluyveromyces waltii; Lactobacillus johnsonii; Lactobacillus plantarum; Legionella pneumophila; Leifsonia xyli; Lactococcus lactis; Leptospira interrogans; Listeria innocua; Listeria monocytogenes; Magnaporthe grisea; Mannheimia succiniciproducens; Mesoplasma florum; Mesorhizobium loti; Methanobacterium thermoautotrophicum; Methanococcoides burtonii; Methanococcus jannaschii; Methanococcus maripaludis; Methanogenium frigidum; Methanopyrus kandleri; Methanosarcina acetivorans; Methanosarcina mazei; Methylococcus capsulatus; Mus musculus; Mycobacterium bovis; Mycobacterium leprae; Mycobacterium paratuberculosis; Mycobacterium tuberculosis; Mycoplasma gallisepticum; Mycoplasma genitalium; Mycoplasma mycoides; Mycoplasma penetrans; Mycoplasma pneumoniae; Mycoplasma pulmonis; Mycoplasma mobile; Nanoarchaeum equitans; Neisseria meningitidis; Neurospora crassa; Nitrosomonas europaea; Nocardia farcinica; Oceanobacillus iheyensis; Onion yellows phytoplasma; Oryza sativa; Pan troglodytes; Paracoccus denitrificans; Paracoccus versutus; Paracoccus zeaxanthinifaciens; Pasteurella multocida; Phanerochaete chrysosporium; Photorhabdus luminescens; Picrophilus torridus; Plasmodium falciparum; Plasmodium yoelii yoelii; Populus trichocarpa; Porphyromonas gingivalis; Prochlorococcus marinus; Propionibacterium acnes; Protochlamydia amoebophila; Pseudomonas aeruginosa; Pseudomonas putida; Pseudomonas syringae; Pyrobaculum aerophilum; Pyrococcus abyssi; Pyrococcus furiosus; Pyrococcus horikoshii; Pyrolobus fumarii; Ralstonia solanacearum; Rattus norvegicus; Rhodopirellula baltica; Rhodopseudomonas palustris; Rickettsia conorii; Rickettsia typhi; Rickettsia prowazekii; Rickettsia sibirica; Saccharomyces cerevisiae; Saccharomyces bayanus; Saccharomyces boulardii; Saccharopolyspora erythraea; Schizosaccharomyces pombe; Salmonella enterica; Salmonella typhimurium; Shewanella oneidensis; Shigella flexneri; Sinorhizobium meliloti; Staphylococcus aureus; Staphylococcus epidermidis; Streptococcus agalactiae; Streptococcus mutans; Streptococcus pneumoniae; Streptococcus pyogenes; Streptococcus thermophilus; Streptomyces avermitilis; Streptomyces coelicolor; Sulfolobus solfataricus; Sulfolobus tokodaii; Synechococcus; Synechococcus elongatus; Synechocystis; Takifugu rubripes; Tetraodon nigroviridis; Thalassiosira pseudonana; Thermoanaerobacter tengcongensis; Thermoplasma acidophilum; Thermoplasma volcanium; Thermosynechococcus elongatus; Thermotoga maritima; Thermus thermophilus; Treponema denticola; Treponema pallidum; Tropheryma whipplei; Ureaplasma urealyticum; Vibrio cholerae; Vibrio parahaemolyticus; Vibrio vulnificus; Wigglesworthia glossinidia; Wolbachia pipientis; Wolinella succinogenes; Xanthomonas axonopodis; Xanthomonas campestris; Xylella fastidiosa; Yarrowia lipolytica; Yersinia pseudotuberculosis; and Yersinia pestis nucleic acids.

“In some embodiments, sources of encoding nucleic acids for enzymes of the biosynthetic pathways can include, for instance, any species in which the encoded gene product is capable of catalyzing the referenced reaction. Exemplary species for such sources include, for example, Aeropyrum pernix; Aquifex aeolicus; Aquifex pyrophilus; Candidatus Arcobacter sulfidicus; Candidatus Endoriftia persephone; Candidatus Nitrospira defluvii; Chlorobium limicola; Chlorobium tepidum; Clostridium pasteurianum; Desulfobacter hydrogenophilus; Desulfurobacterium thermolithotrophum; Geobacter metallireducens; Halobacterium sp. NRC-1; Hydrogenimonas thermophila; Hydrogenivirga 128-5R1; Hydrogenobacter thermophilus; Hydrogenobaculum sp. Y04AAS1; Lebetimonas acidiphila Pd55T; Leptospirillum ferriphilum; Leptospirillum ferrodiazotrophum; Leptospirillum rubarum; Magnetococcus marinus; Magnetospirillum magneticum; Mycobacterium bovis; Mycobacterium tuberculosis; Methylobacterium nodulans; Nautilia lithotrophica; Nautilia profundicola; Nautilia sp. strain AMN; Nitratifractor salsuginis; Nitratiruptor sp. strain SB155-2; Paracoccus denitrificans; Paracoccus versutus; Paracoccus zeaxanthinifaciens; Persephonella marina; Rimicaris exoculata episymbiont; Streptomyces avermitilis; Streptomyces coelicolor; Sulfolobus avermitilis; Sulfolobus solfataricus; Sulfolobus tokodaii; Sulfurihydrogenibium azorense; Sulfurihydrogenibium sp. Y03AOP1; Sulfurihydrogenibium yellowstonense; Sulfurihydrogenibium subterraneum; Sulfurimonas autotrophica; Sulfurimonas denitrificans; Sulfurimonas paralvinella; Sulfurovum lithotrophicum; Sulfurovum sp. strain NBC37-1 PCC 7120; Acidithiobacillus ferrooxidans; Allochromatium vinosum; Aphanothece halophytica; Oscillatoria limnetica; Rhodobacter capsulatus; Thiobacillus denitrificans; Cupriavidus necator (formerly Ralstonia eutropha); Methanosarcina barkeri; Methanosarcina mazei; Methanococcus maripaludis; Mycobacterium smegmatis; Burkholderia stabilis; Candida boidinii; Candida methylica; Pseudomonas sp. 101; Methylococcus capsulatus; Mycobacterium gastri; Cenarchaeum symbiosum; Chloroflexus aurantiacus; and Erythrobacter sp. Complete genome sequences are now publicly available for more than 4400 species (including a variety of yeast, fungal and mammalian genomes). This allows genes that encode the required energy conversion, carbon fixation, or carbon product biosynthetic activities to be identified in related or distant species. The metabolic modifications described herein for Paracoccus denitrificans, which enable methylotrophic growth or production of carbon-based products, are readily applicable to other methylotrophic microorganisms. Given the teachings and guidance herein, those skilled in the art will recognize that a metabolic modification demonstrated in one organism can be applied equally to other organisms.

“In certain instances, such as when an alternative energy conversion, carbon fixation or carbon product biosynthetic pathway exists in an unrelated species, enhanced methylotrophy and production of carbon-based products can be conferred onto a host species by exogenous expression of a paralog (or paralogs) from the unrelated species that catalyzes a similar, yet non-identical, metabolic reaction to replace the referenced reaction. Because metabolic networks differ between organisms, those skilled in the art will understand that the actual gene usage may differ between organisms. Given the teachings and guidance herein, those skilled in the art will also understand that the invention’s methods can be applied to any microbial organism using cognate metabolic modifications, in order to create a microbial species that produces carbon-based products from C1 compounds.

“It is important to note that you can use various engineered strains, mutations, and/or combinations of the organisms or cells discussed herein.”

“Methods to Identify and Select Candidate Enzymes For a Metabolic Activity Of Interest”

“In one aspect, this invention provides a method to identify candidate proteins or enzymes capable of performing a desired metabolic activity. Bayer and colleagues took advantage of the rapid growth in gene and genome sequence databases, and the affordability of commercial gene synthesis, to develop a synthetic metagenomics strategy. They used a bioinformatic search to identify homologous or related enzymes in sequence databases, optimized the encoding genes for heterologous expression, synthesized and cloned each sequence into an expression vector, and screened for the desired function in E. coli and yeast. Depending on the protein or metabolic activity of interest, there may be thousands of homologs in publicly accessible sequence databases, so it can be difficult or impossible to synthesize all of them in a reasonable amount of time and at a reasonable cost. This invention addresses the problem by providing an alternative method to identify and select candidate protein sequences for a metabolic activity. The method comprises the following steps. First, the enzyme(s) of interest for the desired metabolic activity are identified; this could be an enzyme catalyzing a reaction in an energy conversion or methylotrophic carbon fixation pathway. The enzyme(s) of interest have typically been experimentally confirmed to perform the desired activity, for example in the published scientific literature. In some embodiments, one or more enzymes of interest have been heterologously expressed and shown to be functional. Second, a bioinformatic search of protein classification and grouping databases, such as Entrez Protein Clusters [Klimke 2009], Clusters of Orthologous Groups (COGs) [Tatusov 1997; Tatusov 2003] and InterPro [Zdobnov 2001], is used to find protein groups that contain the enzyme(s) of interest or closely related enzymes. For the purposes of the bioinformatic analysis, if the enzyme(s) of interest contain multiple subunits, the protein corresponding to the catalytic or largest subunit is chosen. Third, an expert-guided, systematic search is performed to determine which database groups are likely to contain a majority of members whose metabolic activity is similar to that of the enzyme(s) of interest. Fourth, a list of the NCBI Protein accession numbers corresponding to each member of each selected grouping is compiled, and the corresponding protein sequences are downloaded from the sequence database. This set may also include sequences from sources other than the public databases. Fifth, and optionally, one or more outgroup proteins are identified and added to the set. Outgroup proteins are proteins that may share some functional, structural, and/or sequence similarity with the model enzyme(s) but lack an essential feature or the desired metabolic activity. For example, flavocytochrome c (E.C. 1.8.2.3) is similar to sulfide-quinone oxidoreductase (E.C. 1.8.5.4) and can serve as an outgroup protein for it. Sixth, all protein sequences are aligned using a sequence alignment program such as MUSCLE [Edgar 2004a; Edgar 2004b]. Seventh, the MUSCLE alignment is used to build a tree, using methods well known to those skilled in the art such as neighbor joining or UPGMA [Sokal 1958; Murtagh 1984]. Eighth, different clades are chosen from the tree so that there are enough proteins for screening. Finally, one protein is chosen from each clade for gene synthesis and functional screening using the following heuristics:

“Therefore, when constructing the engineered or evolved methylotroph of the invention, it would be apparent to those skilled in the art, using the teachings and guidance provided herein, that genes in a metabolic pathway, such as an energy conversion pathway, a carbon fixation pathway or a methylotrophic pathway, can be replaced or supplemented with homologs identified by the methods described here whose gene products catalyze similar or substantially identical metabolic reactions. These modifications can be used to improve the kinetic properties of, and/or otherwise optimize, the engineered or evolved methylotroph.
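The tree-building and clade-selection steps of the method above can be illustrated with a minimal Python sketch. It assumes the protein sequences have already been aligned (for example with MUSCLE) and are supplied as equal-length strings. The distance measure (fraction of differing non-gap columns), the function names, and the rule for picking one protein per clade (the member closest on average to the rest of its clade) are all illustrative assumptions and are not taken from the disclosure.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, non-gap columns at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma_clades(aligned, n_clades):
    """Average-linkage (UPGMA) agglomeration, stopped when n_clades remain.

    Stopping early is equivalent to cutting the UPGMA tree into n_clades clades.
    """
    dist = {frozenset(p): p_distance(aligned[p[0]], aligned[p[1]])
            for p in combinations(aligned, 2)}
    clades = [[name] for name in aligned]

    def average_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clades) > n_clades:
        # Merge the two clades with the smallest average pairwise distance.
        i, j = min(combinations(range(len(clades)), 2),
                   key=lambda ij: average_distance(clades[ij[0]], clades[ij[1]]))
        merged = clades[i] + clades[j]
        clades = [c for k, c in enumerate(clades) if k not in (i, j)] + [merged]
    return clades


def pick_representatives(aligned, clades):
    """Assumed heuristic: choose the member closest to the rest of its clade."""
    reps = []
    for clade in clades:
        def spread(member):
            return sum(p_distance(aligned[member], aligned[o])
                       for o in clade if o != member)
        reps.append(min(clade, key=lambda m: (spread(m), m)))
    return reps


# Hypothetical aligned sequences, for illustration only.
aligned = {
    "protA": "MKTAYIAK",
    "protB": "MKTAYIAR",
    "protC": "MQSGYIAK",
    "protD": "MQSGYIAR",
}
clades = upgma_clades(aligned, n_clades=2)
candidates = pick_representatives(aligned, clades)
```

In practice the number of clades would be set by how many genes can be synthesized and screened, and the selection rule would be replaced by whatever heuristics the screening campaign calls for.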

“Methods to Design Nucleic Acids Encoding Enzymes for Heterologous Expression”

“The present invention, in one aspect, provides a computer program product for designing a nucleic acid that encodes a protein or enzyme of interest and is codon-optimized for the host cell or organism (the target species). The program can be stored on a computer-readable medium and contains a set of instructions that, when executed by a processor, cause the processor to perform the operations described here. At each codon position of the protein of interest, the program selects the codon whose rank-order usage frequency in the target species equals the rank-order usage frequency of the original codon in the source species gene. To select the desired codon at each position, the program needs both the genetic code (the mapping of codons to amino acids [Jukes 1993]) and the codon frequency table (the frequency with which each synonymous codon occurs in a genome [Grantham 1980]) for the source and target species. For a source species with a completely sequenced genome, the usage frequency of each codon can be calculated by summing the number of instances of that codon across all annotated coding sequences, dividing by the total number of codons in those sequences, and multiplying by 1000. For a source species without a completely sequenced genome, the usage frequency can be calculated from any available coding sequences, or taken from the codon frequency table of a closely related organism. The program then standardizes the start codon to ATG and the stop codon to TAA. It also allows the second and third last codons to be converted to one of twenty possible codons (one for each amino acid). To improve the probability that the codon-optimized nucleic acid sequence can be synthesized using commercial gene synthesis [Sambrook 2001] or DNA assembly methods [WO/2010/070295], the program runs a series of checks. It checks for key restriction enzyme recognition sites used in DNA assembly methods or standards, for sequence repeats, for G or C homopolymers longer than 5 nucleotides, and for sequence motifs that could give rise to spurious transposon sites. If the codon-optimized nucleic acid sequence fails any of these checks, the program makes synonymous mutations to create a new sequence that passes all the checks while minimizing differences in codon frequencies between the original sequence and the new sequence.
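The rank-order codon selection described above can be sketched in Python as follows. This is a minimal illustration, not the program of the disclosure: it assumes the standard genetic code for both species, breaks frequency ties alphabetically, treats codons missing from a usage table as having zero frequency, and implements only the homopolymer check. All function names and the example usage tables are hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(codon): aa
                for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split a coding sequence into complete codons, ignoring any remainder."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def codon_usage_per_thousand(coding_sequences):
    """Codon count divided by total codons, multiplied by 1000."""
    counts = Counter(c for cds in coding_sequences
                     for c in split_codons(cds.upper()))
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


def ranked_synonyms(amino_acid, usage):
    """Synonymous codons, most used first (ties broken alphabetically)."""
    synonyms = [c for c, aa in GENETIC_CODE.items() if aa == amino_acid]
    return sorted(synonyms, key=lambda c: (-usage.get(c, 0.0), c))


def rank_order_recode(cds, source_usage, target_usage):
    """Replace each codon with the target codon of the same usage rank."""
    codons = split_codons(cds.upper())
    recoded = []
    for codon in codons:
        amino_acid = GENETIC_CODE[codon]
        rank = ranked_synonyms(amino_acid, source_usage).index(codon)
        recoded.append(ranked_synonyms(amino_acid, target_usage)[rank])
    recoded[0] = "ATG"                      # standardize the start codon
    if GENETIC_CODE[codons[-1]] == "*":
        recoded[-1] = "TAA"                 # standardize the stop codon
    return "".join(recoded)


def has_long_gc_homopolymer(sequence, max_run=5):
    """True if a run of G or of C longer than max_run nucleotides is present."""
    pattern = "G{%d,}|C{%d,}" % (max_run + 1, max_run + 1)
    return re.search(pattern, sequence.upper()) is not None


# Hypothetical usage tables (per thousand codons), for illustration only.
source_usage = {"CTG": 50.0, "AAA": 30.0, "AAG": 10.0}
target_usage = {"TTA": 40.0, "AAG": 40.0, "AAA": 5.0}
optimized = rank_order_recode("ATGCTGAAATGA", source_usage, target_usage)
```

Here the most-used leucine codon in the source (CTG) is replaced by the most-used leucine codon in the target (TTA), and likewise for lysine. A fuller implementation would also check restriction sites and repeats, and would search over synonymous mutations when a check fails.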

“Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may include one or more computer programs that are executable and/or interpretable on a programmable system including at least one processor. This processor can be either special-purpose or general-purpose, and is coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device and at least one output device. These computer programs (also called software, software applications, code, or programs) can include machine instructions for the programmable processor. They may be implemented in any programming language, including high-level procedural or object-oriented programming languages and/or assembly/machine languages. Computer programs can be deployed in many forms, including standalone programs, modules, components, subroutines, and other units that are suitable for use within a computing environment. A computer program can be deployed to be executed or interpreted on one computer, on multiple computers at the same site, or spread across multiple sites connected by a network.

A computer program can, in one embodiment, be stored on a computer-readable storage medium. Computer-readable storage media store computer data, which may include computer program code that can be executed by a processor or computer system. A computer-readable medium can include computer-readable storage media, for tangible or fixed storage of data, or communication media, for transient interpretation of code-containing signals. Computer-readable storage media refers to tangible or physical storage, as opposed to signals. It may include volatile and nonvolatile, removable and non-removable media implemented in any technology or method for tangible storage of information, such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media can include, but are not limited to: RAM, EPROM, EEPROM, flash memory or another solid-state memory technology, CD-ROM, DVD, other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or any other physical medium or material that can be used to tangibly store information, data, or instructions and that can be accessed by a computer or processor.

Program 210 could be a computer program that performs the functions and processes described above. Program 210 could contain various instructions and subroutines that, when loaded into memory 206 and executed by processor 204, cause processor 204 to perform various operations. Some or all of these operations may implement the methods, processes and/or functions described herein.

Computer processing device 200 could include various forms of input and output, even though they are not shown. The I/O could include network adapters, USB adapters, Bluetooth radios, mice, keyboards, touchpads, displays, touch screens, speakers, microphones or sensors.

“Methods to Express Heterologous Enzymes”

“Composite nucleic acids can be constructed to include one or more energy conversion, methylotrophic and/or carbon fixation pathway encoding nucleic acids. These composite nucleic acids can then be introduced into a host organism for expression of one or more proteins of interest. Composite nucleic acids can be built by operably linking nucleic acids that encode one or more standardized genetic parts with nucleic acids that encode the protein(s) of interest. Standardized genetic parts are nucleic acid sequences that have been modified to conform to a set of technical standards such as an assembly standard [Knight 2003; Shetty 2008; Shetty 2011]. Standardized genetic parts may encode transcriptional initiation and termination elements, translational elements, or other elements. They can also include protein affinity tags, protein degradation tags, protein localization tags, selectable markers and replication elements. The advantage of standardized genetic parts is that they can be independently validated and characterized, and then easily combined with other parts to create functional nucleic acids [Canton 2008]. Mixing and matching standardized genetic parts that encode different expression control elements with nucleic acids encoding proteins of interest can speed up the process of achieving soluble expression and validating function. The set of standardized parts could include constitutive promoters of varying strengths [Davis 2011], ribosome binding sites of varying strengths [Anderson 2007] and protein degradation tags of varying strengths [Andersen 1998].
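The mix-and-match idea above can be illustrated with a short Python sketch that enumerates every promoter, ribosome binding site and coding sequence combination and joins each into one composite sequence. The part names and sequences below are made-up placeholders, not real characterized parts, and simple concatenation is an assumption: real assembly standards typically add defined junction sequences between parts.

```python
from itertools import product


def composite_designs(promoters, rbs_sites, coding_sequences, terminator):
    """Build one composite sequence per promoter/RBS/coding-sequence combination.

    Each argument except `terminator` maps a part name to its sequence.
    Parts are joined in the order promoter, RBS, coding sequence, terminator.
    """
    designs = {}
    for (p_name, p_seq), (r_name, r_seq), (c_name, c_seq) in product(
            promoters.items(), rbs_sites.items(), coding_sequences.items()):
        designs[f"{p_name}-{r_name}-{c_name}"] = p_seq + r_seq + c_seq + terminator
    return designs


# Placeholder parts, for illustration only.
promoters = {"pStrong": "TTGACA", "pWeak": "TTGTCA"}
rbs_sites = {"rbsA": "AGGAGG", "rbsB": "AGGA"}
coding_sequences = {"enzyme": "ATGAAATAA"}
library = composite_designs(promoters, rbs_sites, coding_sequences, "TTTTTT")
```

Two promoters and two ribosome binding sites give four expression variants of the same enzyme, which could then be screened for soluble expression.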

“For exogenous expression in Paracoccus and other prokaryotic cell types, the nucleic acid encoding a protein of interest can be modified to add a solubility tag to the protein so that it is expressed in soluble form. For example, the addition of the maltose binding protein to a protein of interest has been shown to increase soluble expression in E. coli [Greene 1999; Kapust 1999; Sachdev 2000]. Chaperone proteins such as DnaK, DnaJ, GroES and GroEL may also be co-expressed or overexpressed with the proteins of interest to promote correct folding and assembly [Martinez Alonso 2009; Martinez Alonso 2010].

“For exogenous expression in Paracoccus and other prokaryotic cell types, some nucleic acid sequences within the genes or cDNAs can encode targeting signals, such as an N-terminal mitochondrial signal or another targeting signal. These signals can be removed prior to transformation into the prokaryotic host cell. For example, expression in E. coli was increased when a mitochondrial leader sequence was removed [Hoffmeister 2005]. For exogenous expression in yeast and other eukaryotic cell types, genes can be expressed in the cytosol without the addition of a leader or targeting sequence. A nucleic acid sequence can therefore be modified to remove or include a targeting sequence and then incorporated into an exogenous nucleic acid sequence to impart the desired properties.

“Example 2 shows how to introduce exogenous nucleic acid into the methylotrophic bacterium Paracoccus versutus or Paracoccus denitrificans by conjugative plasmid transfer.

“Production of Central Metabolites for the Carbon-Based Products Of Interest”

“In some embodiments, the engineered or evolved methylotroph according to the present invention produces central metabolites such as citrate, succinate, fumarate, dihydroxyacetone (DHA), dihydroxyacetone phosphate, 3-hydroxypropionate, and pyruvate. The engineered or evolved methylotroph produces these central metabolites either as intermediates or products of a carbon fixation or methylotrophic pathway, or as intermediates or products of host metabolism. One or more transporters can be expressed in the engineered or evolved methylotroph to allow the cell to export the central metabolite. For example, one or more members of the C4-dicarboxylate carrier family can export succinate from cells into the media [Janausch 2002; Kim 2007]. These central metabolites are easily converted into other products (FIG. 7).”

“In some instances, the engineered or evolved methylotroph might interconvert between central metabolites to create alternate carbon-based products. In one embodiment, the engineered or evolved methylotroph can produce aspartate by expressing one or more aspartate aminotransferases (E.C. 2.6.1.1) to convert L-glutamate and oxaloacetate to L-aspartate and 2-oxoglutarate.

“In another embodiment, an engineered and/or evolved methylotroph produces dihydroxyacetone phosphate by expressing one or more dihydroxyacetone kinases (E.C. 2.7.1.29), such as C. freundii DhaK, to convert dihydroxyacetone and ATP to dihydroxyacetone phosphate.

“In another embodiment, an engineered or evolved methylotroph produces serine as the carbon-based product. The metabolic reactions required for serine biosynthesis are catalyzed by phosphoglycerate dehydrogenase (E.C. 1.1.1.95), phosphoserine transaminase (E.C. 2.6.1.52) and phosphoserine phosphatase (E.C. 3.1.3.3). Phosphoglycerate dehydrogenase, such as E. coli SerA, converts 3-phospho-D-glycerate and NAD+ to 3-phosphonooxypyruvate and NADH. Phosphoserine transaminase, such as E. coli SerC, interconverts 3-phosphonooxypyruvate + L-glutamate and O-phospho-L-serine + 2-oxoglutarate. Phosphoserine phosphatase, such as E. coli SerB, converts O-phospho-L-serine into L-serine.

“In another embodiment, an engineered or evolved methylotroph produces glutamate as the carbon-based product. The metabolic reactions required for glutamate biosynthesis include glutamate dehydrogenase (E.C. 1.4.1.4; e.g., E. coli GdhA), which converts α-ketoglutarate, NH3 and NADPH into glutamate. Glutamate can subsequently be converted into other carbon-based products (FIG. 8).”

“In another embodiment, an engineered or evolved methylotroph produces itaconate as the carbon-based product. Itaconate biosynthesis requires aconitate decarboxylase (E.C. 4.1.1; such as the one from A. terreus), which converts cis-aconitate into itaconate and CO2. Itaconate can subsequently be converted into other carbon-based products (FIG. 8).”

“Production of Sugars as Carbon-Based Products Of Interest”

“Industrial production of chemical products from biological organisms is often achieved using a sugar such as glucose or fructose as the feedstock. Hence, in certain embodiments, the engineered and/or evolved methylotroph of the present invention produces sugars including glucose and fructose or sugar phosphates including triose phosphates (such as 3-phosphoglyceraldehyde and dihydroxyacetone-phosphate) as the carbon-based products of interest. Sugars and sugar phosphates may be interconverted. For example, glucose-6-phosphate isomerase (E.C. 5.3.1.9; e.g., E. coli Pgi) may interconvert D-fructose-6-phosphate and D-glucose-6-phosphate. Phosphoglucomutase (E.C. 5.4.2.2; e.g., E. coli Pgm) converts D-α-glucose-6-P to D-α-glucose-1-P. Glucose-1-phosphatase (E.C. 3.1.3.10; e.g., E. coli Agp) converts D-α-glucose-1-P to D-α-glucose. Aldose 1-epimerase (E.C. 5.1.3.3; e.g., E. coli GalM) interconverts D-β-glucose and D-α-glucose. Optionally, the sugars and sugar phosphates can be exported from the engineered or evolved methylotroph into the culture medium.

“Sugar phosphates can be converted to sugars by dephosphorylation, which can occur intracellularly or extracellularly. For example, phosphatases such as a glucose-6-phosphatase (E.C. 3.1.3.9) or a glucose-1-phosphatase (E.C. 3.1.3.10) can be used. Exemplary phosphatases include Homo sapiens glucose-6-phosphatase G6PC (P35575), Escherichia coli glucose-1-phosphatase Agp (P19926), E. cloacae glucose-1-phosphatase AgpE (Q6EV19) and Escherichia coli acid phosphatase YihX (P0A8Y3).”

“Sugar phosphates can also be exported via transporters from engineered or evolved methylotrophs into culture media. Sugar phosphate transporters generally act as antiporters with inorganic phosphate. An example of a triose phosphate transporter is A. thaliana triose phosphate transporter APE2 (Genbank accession AT5G46110.4). Examples of glucose-6-phosphate transporters include E. coli sugar phosphate transporter UhpT (NP_418122.1), A. thaliana glucose-6-phosphate transporter GPT1 (AT5G54800.1) and A. thaliana GPT2, or homologs thereof. Dephosphorylation of glucose-6-phosphate can also be coupled to glucose transport, as with Genbank accession numbers AAA16222, AAD19898 and O43826.

Permeases allow sugars to diffuse from engineered or evolved methylotrophs into culture media. Exemplary permeases include H. sapiens glucose transporter GLUT-1, -3 or -7 (P11166; P11169; Q6PXP3), S. cerevisiae glucose transporter HXT-1 (P32465; P32467; P39003), the transporter with accession P21906, Synechocystis sp. 1148 glucose/fructose:H+ symporter GlcP (T.C. 2.A.1.1.32; P15729) [Zhang 1989], Streptomyces lividans major glucose (or 2-deoxyglucose) uptake transporter GlcP (T.C. 2.A.1.1.35; Q7BEC4) [van Wezel 2005], and Plasmodium falciparum hexose transporter PfHT1 (T.C. 2.A). One or more active transporters can also be introduced into the cell to allow active efflux of sugars from engineered and/or evolved methylotrophs. Examples of such transporters are the mouse glucose transporter (GLUT 1) or its homologs.

Summary for “Methods, systems and methods for methylotrophic organic compound production”

“There is a need to create engineered or evolved methylotrophs that can be used for industrial purposes. The ability to create methylotrophs with biosynthetic capabilities to produce carbon-based products would allow for more efficient and energy-efficient production. The methylotroph could also be equipped with additional pathways for energy conversion, methylotrophy, and/or carbon fixation that would increase its ability to produce the carbon-based products of interest.

The present invention provides efficient production of carbon-based products of interest (e.g. fuels, sugars, chemicals) using C1 compounds. The present invention also allows for the replacement of conventional methods of producing chemicals, such as olefins (e.g. ethylene, propylene), that are traditionally derived from petroleum. The conventional process generates toxic byproducts, which are harmful to the environment and are considered hazardous waste pollutants. The present invention avoids the use of petroleum and the generation of such toxic byproducts. It also materially improves the quality of the environment by helping to maintain basic elements like air, water, and soil.

“In accordance with certain aspects, the invention provides a methylotroph that can be used to produce various carbon-based products from C1 compounds. One or more engineered carbon product biosynthetic pathways are used to convert central metabolites to desired products. The carbon-based products of interest include, but are not limited to, alcohols, fatty acids and their derivatives, fatty alcohols, fatty acid esters, wax esters, hydrocarbons, fuels, commodity chemicals, specialty chemicals, carotenoids, sugars, sugar phosphates, central metabolites, pharmaceuticals and intermediates. For example, the carbon-based products of interest can include one or more of a sugar (for example, glucose, fructose, sucrose, xylose, lactose, maltose, pentose, rhamnose, galactose or arabinose), sugar phosphate (for example, glucose-6-phosphate or fructose-6-phosphate), sugar alcohol (for example, sorbitol), sugar derivative (for example, ascorbate), alcohol (for example, ethanol, propanol, isopropanol or butanol), fermentative product (for example, ethanol, butanol, lactic acid, lactose or acetate), ethylene, propylene, 1-butene, 1,3-butadiene, acrylic acid, fatty acid (for example, ω-cyclic fatty acid), fatty acid intermediate or derivative (for example, fatty acid alcohol, fatty acid ester, alkane, olefin or halogenated fatty acid), amino acid or intermediate (for example, lysine, glutamate, aspartate, shikimate, chorismate, phenylalanine, tyrosine, tryptophan), phenylpropanoid, isoprenoid (for example, hemiterpene, monoterpene, sesquiterpene, triterpene, tetraterpene, polyterpene, isoprene, bisabolene, myrcene, amorpha-4,11-diene, farnesene, taxadiene, squalene, lanosterol, α-carotene, β-carotene, lycopene, phytoene, limonene, or polyisoprene), glycerol, 1,3-propanediol, 1,4-butanediol, 1,3-butadiene, polyhydroxyalkanoate, polyhydroxybutyrate, lysine, γ-valerolactone, and acrylate. In some embodiments, the carbon-based products can also include carbon-based central metabolites.

The resulting engineered methylotroph is capable of efficiently synthesizing carbon-based products from C1 compounds. The invention provides carbon product biosynthetic pathways for conferring biosynthetic production of the carbon-based product on host organisms that lack the ability to produce it efficiently from C1 compounds. The invention provides methods to introduce the carbon product biosynthetic pathway into the methylotroph. The invention also provides media compositions and methods for cultivating the engineered or evolved methylotrophs to facilitate efficient methylotrophic production of carbon-based products.

“In various embodiments, the invention allows the C1 compound to be used as both an energy source and a carbon source. The C1 compound can be soluble in water or made to dissolve in water. The C1 compound could be formate, formic acid, methanol, and/or formaldehyde. Because of the efficiency of mass transfer and uptake, C1 compounds that dissolve at high concentrations or disperse easily in water are preferred over immiscible or less soluble chemical species like methane. Similarly, soluble C1 compounds are preferred over molecular hydrogen and carbon dioxide, which can be used in the autotrophic production of carbon-based chemicals (see Example 7). The composition of the media used to grow the organism may allow the C1 compound to dissolve in solvents other than water, and other media components may enhance the solubility of the C1 compound. In some instances, the C1 compound may be obtained by electrolysis.

“In certain embodiments, one of the following carbon product biosynthetic routes can be used:

“In some embodiments, the 1-deoxy-D-xylulose-5-phosphate synthase can be encoded by SEQ ID NO: 1, or a homolog thereof having at least 80% sequence identity; or the isopentenyl pyrophosphate isomerase can be encoded by SEQ ID NO: 2, or a homolog thereof having at least 80% sequence identity. In one embodiment, said carbon product biosynthetic pathway includes an isoprene synthase. The isoprene synthase can be encoded by SEQ ID NO: 3, or a homolog thereof having at least 80% sequence identity. In another embodiment, the carbon product biosynthetic pathway includes an E-alpha-bisabolene synthase. The E-alpha-bisabolene synthase can be encoded by SEQ ID NO: 4, or a homolog thereof having at least 80% sequence identity.
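As a rough illustration of the 80% sequence identity threshold, the Python sketch below computes percent identity between two sequences that have already been aligned to equal length. The convention used here (identical positions divided by the number of columns in which neither sequence has a gap) is one of several in common use and is an assumption, as are the function names and example sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free aligned columns at which the two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b)
                if x != "-" and y != "-"]
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def meets_identity_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the pair is at or above the identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned fragments, for illustration only.
reference = "MKTAYIAKQR"
close_homolog = "MKTAYIARQR"    # one mismatch in ten columns
distant_homolog = "MQSGYIAKQR"  # three mismatches in ten columns
```

Under this convention the first pair would qualify as having at least 80% identity and the second would not; a different denominator (for example, full alignment length including gaps) can change the outcome for gapped alignments.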

“In certain embodiments, the methylotrophic organism can be selected from the class Alphaproteobacteria. The methylotrophic organism can also be selected from the genus Paracoccus. For example, the methylotrophic organism can be Paracoccus denitrificans or Paracoccus versutus.

“In some embodiments, an engineered cell can have a lower growth rate on electrolytically generated C1 compound relative to a non-evolved methylotrophic organism, or a substantially comparable or increased growth rate on electrolytically generated C1 compound relative to non-electrolytically generated C1 compound. The engineered cell may be further modified to reduce the growth rate on electrolytically generated C1 compound relative to a non-evolved methylotrophic organism, or to increase the growth rate on electrolytically generated C1 compound relative to non-electrolytically generated C1 compound.

“In another aspect, an evolved methylotrophic organism is provided. It has reduced growth inhibition by electrolytically generated C1 compound relative to a non-evolved methylotrophic organism, or a substantially similar or enhanced growth rate on electrolytically generated C1 compound relative to non-electrolytically generated C1 compound.

“In a further aspect, a method of selecting an evolved methylotrophic organism with improved growth on C1 compounds is provided. The method involves continuously monitoring the concentration of biomass in the culture chamber, and setting the flow rate of the C1 compound into the chamber so as to maintain a constant environment that favors a higher growth rate. The method may also include adjusting the medium inflow rate so that the environment selects for cells with a higher growth rate or less growth inhibition. In certain cases, the C1 compound is formate. The C1 compound can be electrolytically generated. The C1 compound may also be soluble in water.
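The monitoring and flow-setting steps of the selection method can be pictured as a simple feedback loop. The sketch below is illustrative only: the proportional control rule, the function name `next_flow_rate`, the gain, and the flow limits are all assumptions introduced here and are not taken from the disclosure.

```python
def next_flow_rate(biomass, setpoint, current_flow,
                   gain=0.5, min_flow=0.0, max_flow=10.0):
    """Return an updated C1 feed flow rate from one biomass reading.

    Hypothetical proportional rule: when measured biomass rises above
    the setpoint, the inflow is increased (diluting the culture and
    favoring faster growers); when it falls below, the inflow is
    reduced. Units are arbitrary but must be consistent.
    """
    error = biomass - setpoint
    proposed = current_flow + gain * error
    # Clamp to the physical limits of the (assumed) feed pump.
    return max(min_flow, min(max_flow, proposed))


# One control step: biomass slightly above the setpoint.
flow = next_flow_rate(biomass=1.2, setpoint=1.0, current_flow=2.0)
```

In practice the reading would come from a continuous sensor such as an optical density probe, and the rule would be applied repeatedly over the course of the selection.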

“Another aspect of the invention is a composition for bacterial culture, designed to provide formate as the sole C1 compound and to enhance the growth of methylotrophic bacteria. The composition may contain between 0 and 160 mM bicarbonate salt, between 16 and 100 mg sodium chloride, between 30 and 60 mM nitrate, and between 5 and 100 mg of sodium thiosulfate. The composition could contain 100 mM of sodium bicarbonate and 6 mM of sodium chloride, 6 mg sodium nitrate and 11 mM respectively. It can also contain 26 mM ammonium formate or sodium formate. A basal minimal medium can also be included in the composition. In some embodiments, the basal minimal medium can be M9 minimal medium, MOPS minimal medium, R medium, or M63 medium.

“In another aspect, a composition for bacterial culture is provided that is designed to provide formate as the sole C1 compound and to enhance the growth of methylotrophic bacteria in a fed-batch bioreactor. The composition may include R medium, initially charged in the fed-batch bioreactor, supplemented with between 1 and 100 micromolar sodium molybdate, between 10 and 1000 nanomolar sodium selenite, between 0.01 and 1 mg/L cobalamin, and between 0.001 and 1 mg/L thiamine. The medium may contain, for example, 5 to 20 micromolar sodium molybdate, 50 to 200 nanomolar sodium selenite, 0.01 to 0.2 mg/L cobalamin, and between 0.05 and 2 mg/L thiamine. A feed composition that is supplied to the fed-batch bioreactor can also be included. It may contain a formate salt at supramolar concentration. The formate salt can be ammonium formate or sodium formate. The feed composition can also contain a supramolar concentration of a nitrate salt, such as sodium nitrate. The molar ratio of the nitrate salt to the formate salt can be 3.2:8, 3.0:8, or lower.

“A method for cultivating methylotrophic bacteria is also provided herein. It involves incubating the methylotrophic bacteria in any of these compositions. In certain embodiments, incubating can be done aerobically. Incubating can be done in a fed-batch bioreactor. Some embodiments allow for a C1 feedstock consumption rate exceeding 1.5 g·L⁻¹·hr⁻¹. In some embodiments, incubating can take place with a C1 feedstock as electron donor and a nitrate salt as electron acceptor. The C1 feedstock can be a formate salt such as ammonium formate or sodium formate. The molar ratio of the nitrate salt to the formate salt can be maintained below 3.2:8 during incubation. The fed-batch bioreactor can be supplied with the nitrate salt and the formate salt in a feed composition at supramolar concentrations, with a molar ratio of 3.0:8. In some embodiments, the formate salt is ammonium formate or sodium formate. The nitrate salt can be sodium nitrate.
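The nitrate-to-formate ratios quoted above (3.2:8 as a ceiling, 3.0:8 for the feed) lend themselves to a small arithmetic check. The following is a minimal sketch; the helper names `nitrate_to_formate_ratio`, `is_within_limit`, and `nitrate_for_formate` are hypothetical and introduced here only for illustration. Exact fractions are used so that 3.2:8 is not distorted by floating-point rounding.

```python
from fractions import Fraction

MAX_RATIO = Fraction(32, 80)   # 3.2:8 nitrate:formate (mol/mol)
FEED_RATIO = Fraction(30, 80)  # 3.0:8 nitrate:formate (mol/mol)


def _exact(value):
    # Convert via str so that 3.2 becomes exactly 16/5, not a binary float.
    return Fraction(str(value))


def nitrate_to_formate_ratio(nitrate_mol, formate_mol):
    """Molar ratio of nitrate salt to formate salt."""
    if _exact(formate_mol) <= 0:
        raise ValueError("formate amount must be positive")
    return _exact(nitrate_mol) / _exact(formate_mol)


def is_within_limit(nitrate_mol, formate_mol, limit=MAX_RATIO):
    """True if the ratio is at or below the stated ceiling."""
    return nitrate_to_formate_ratio(nitrate_mol, formate_mol) <= limit


def nitrate_for_formate(formate_mol, ratio=FEED_RATIO):
    """Moles of nitrate salt to pair with a given amount of formate."""
    return _exact(formate_mol) * ratio
```

For example, a feed holding 8 mol of formate salt would be paired with 3 mol of nitrate salt at the 3.0:8 ratio, which sits below the 3.2:8 ceiling.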

“In some aspects, growth of the methylotroph on C1 compounds can also be enhanced by adding additional or alternative pathways for energy conversion, methylotrophy, and/or carbon fixation. Examples of energy conversion pathways and carbon fixation pathways are described in U.S. Pat. No. 8,349,587, the entirety of which is hereby incorporated by reference.”

The invention concerns the development and use of engineered and/or evolved methylotrophs that can utilize C1 compounds to make desired products. The invention allows for the engineering of a methylotroph, such as Paracoccus denitrificans or Paracoccus versutus, or any other organism suitable for large-scale commercial production of chemicals and fuels, so that it can efficiently utilize C1 compounds for growth and chemical production. This provides cost-effective processes for making carbon-based products. These organisms can be optimized and tested quickly and at low cost. Further, the invention allows for engineering a methylotroph that includes one or more alternative pathways for utilizing C1 compounds to create central metabolites and/or other products.

C1 compounds are an alternative to sugar or light plus carbon dioxide for the production of carbon-based products. Non-biological methods exist to convert C1 compounds into chemicals or fuels. The Fischer-Tropsch process, for example, uses carbon monoxide and hydrogen gas from the gasification of biomass or coal to make methanol and mixed hydrocarbons [U.S. Pat. No. 1,746,464]. Fischer-Tropsch processes have several drawbacks, however, and are considered an expensive and environmentally harmful way of generating liquid fuels. Alternative processes use naturally occurring microbes to convert synthesis gas (syngas), a mixture primarily of molecular hydrogen and carbon monoxide that can be obtained by gasification of any organic material such as coal, oil, or biomass, into products such as ethanol, acetate, or molecular hydrogen [Henstra 2007]. These naturally occurring microbes cannot produce a wide range of products and lack the tools necessary for genetic manipulation. They are also sensitive to high levels of their end products. Work is underway to introduce syngas utilization into industrial microbial hosts [U.S. Pat. No. 7,803,589].

“The invention allows for the engineering or evolution of methylotrophic organisms that are useful and/or suitable for industrial applications. The invention also provides a way to use renewable energy. In some embodiments, the invention allows for the use of a C1 compound such as formate, formic acid, formaldehyde, methanol, or any combination thereof. In one embodiment, the C1 compound is obtained from electrolysis. Commercial interest in renewable and/or carbon-neutral energy sources is high. These include solar photovoltaic, geothermal, nuclear, and hydroelectric sources. Because these technologies produce electricity, their use is limited to the electrical grid. Some renewable energy sources, such as wind and solar, are also intermittent and unreliable. The lack of practical, large-scale electricity storage technologies limits how much electricity demand can be shifted to renewable sources. Storing electricity in chemical form, such as in carbon-based products of interest, would allow large-scale storage and would also enable renewable electricity to meet the energy needs of the transportation sector. In one aspect of this invention, the combination of renewable electricity with electrolysis, such as the electrochemical formation of formate/formic acid from carbon dioxide (for example, see WO/2007/041872) and of formaldehyde/methanol from carbon dioxide (for example, see WO/2012/015909; WO/2012/015905), opens up the possibility of a sustainable, reliable supply of C1 compounds.

“Some embodiments allow for the use of C1 compounds, such as formaldehyde or methanol, that are derived from waste streams. Formaldehyde, for example, is an oxidation product of methanol and methane. Methanol can be made from synthesis gas (the main product of gasification of coal, oil, natural gas, and carbonaceous materials such as biomass, agricultural crops and residues, and waste organic matter), or by chemical synthesis processes that reduce carbon dioxide with hydrogen. Methane is a major component of natural gas and can also be obtained from renewable biomass.

“The invention allows for the expression of exogenous proteins and enzymes in the host cell, thereby conferring biosynthetic pathways that use central metabolites to make reduced organic compounds. An engineered cell may also have one or more carbon product biosynthetic pathways that convert central metabolites into desired products.

“The invention is described herein with general reference to a metabolic reaction, reactant, or product thereof, or with specific reference to one or more nucleic acids or genes encoding an enzyme that catalyzes, or a protein associated with, the referenced metabolic reaction, reactant, or product. Unless otherwise stated herein, reference to a reaction also constitutes reference to the reactants and products of the reaction. Similarly, reference to a reactant or product also references the reaction, and reference to any of these metabolic constituents also references the gene or genes encoding the enzymes or proteins involved in the referenced reaction, reactant, or product. Likewise, given the well-known fields of metabolic biochemistry, enzymology, and genomics, reference herein to a gene or encoding nucleic acid also constitutes a reference to the corresponding encoded enzyme and the reaction it catalyzes, or to a protein associated with the reaction, as well as to the reactants and products of the reaction.

“Definitions”

“As used herein, the terms “nucleic acid,” “nucleic acid molecule,” and “polynucleotide” can be used interchangeably and include single-stranded and double-stranded RNA, DNA, and RNA:DNA hybrids. The terms “nucleic acid,” “nucleic acid molecule,” “polynucleotide,” “oligonucleotide,” “oligomer,” and “oligo” are likewise used interchangeably herein. Oligos can be, for example, 5 to 200 nucleotides, about 100 nucleotides, or 30 to 50 nucleotides long. Shorter or longer oligonucleotides can also be used. Oligos for use in the present invention can be designed as needed. A nucleic acid molecule may encode a full-length polypeptide or a fragment thereof, or may be non-coding.

“Nucleic acid” can refer to either naturally occurring or synthetic polymeric forms of nucleotides. The oligos and nucleic acid molecules of the present invention may be made from naturally occurring nucleotides, forming deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) molecules. Alternatively, the naturally occurring oligonucleotides can include structural modifications that alter their properties, such as in peptide nucleic acids (PNA) or locked nucleic acids (LNA). The terms should also be understood to include analogs of RNA or DNA made from nucleotide analogs. The embodiments described here may apply to single-stranded or double-stranded polynucleotides. Nucleotides useful in the invention include, for example, naturally occurring nucleotides (for example, ribonucleotides or deoxyribonucleotides), natural or synthetic modifications of nucleotides, and artificial bases. Modifications can include phosphorothioated bases for increased stability.

“Complementary nucleic acid sequences” are those capable of base-pairing according to the standard Watson-Crick complementarity rules. As used herein, the term “complementary sequences” means nucleic acid sequences that are substantially complementary, as may be assessed by the nucleotide comparison methods and algorithms set forth below, or as defined by being capable of hybridizing to the polynucleotides that encode the protein sequences.

“As used herein, the term “gene” refers to a nucleic acid that contains the information required for the expression of a protein, polypeptide, or untranslated RNA (e.g., rRNA, tRNA, and anti-sense RNA). A gene that encodes a protein includes the promoter, the structural gene open reading frame (ORF) sequence, and other sequences involved in the expression of the protein. A gene that encodes an untranslated RNA includes the promoter and the nucleic acid that encodes the untranslated RNA.

“As used herein, the term “genome” refers to the whole hereditary information of an organism that is encoded in the DNA (or RNA for certain viral species), including both coding and non-coding sequences. The term can include chromosomal DNA and/or non-coding DNA. A “native gene” or “endogenous gene” refers to a gene that is native to the host cell and has its own regulatory sequences, whereas an “exogenous gene” or “heterologous gene” refers to a gene that is not a native gene and that contains regulatory and/or coding sequences not native to the host cell. A heterologous gene can contain mutated sequences and/or part of regulatory or coding sequences. The regulatory sequences can be homologous or heterologous to a particular gene. A heterologous regulatory sequence does not function in nature to regulate the same gene whose expression it regulates in the transformed host cell. “Coding sequence” refers to a DNA sequence that codes for a particular amino acid sequence. As used herein, “regulatory sequences” refers to nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence. These regulatory sequences include promoters, ribosome binding sequences, translation leader sequences, RNA processing sites, effector binding sites (e.g., activator, regulator), stem-loop structures, and so on.

“As described herein, a genetic element can be any coding or non-coding nucleic acid sequence. In some embodiments, a genetic element is a nucleic acid that codes for an amino acid, a peptide, or a protein. Genetic elements can be genes, gene fragments, operons, promoters, exons, or introns. Genetic elements can be as short as one or a few codons, or can be longer and include functional components (e.g., encoding proteins) and/or regulatory elements. A genetic element may include the entire open reading frame of a protein, or the entire open reading frame together with one or more regulatory sequences. Those skilled in the art will appreciate that genetic elements can also be viewed as modular genetic elements or genetic modules. A genetic module may include a regulatory sequence, a promoter, a coding sequence, or any combination thereof. In some embodiments, the genetic element may include at least two genetic modules and at most two recombination sites. In eukaryotes, the genetic element can include at least three modules. A genetic module could be, for example, a regulatory sequence, a promoter, a coding sequence, a polyadenylation tail, or any combination thereof. In addition to the promoter and the coding sequences, the nucleic acid sequence can include leader and signal peptide sequences. The leader sequence is operably linked to the 5′ terminus of the nucleic acid sequence encoding the polypeptide. The signal peptide sequence encodes an amino acid sequence linked to the amino terminus of the polypeptide, which directs the polypeptide into the cell’s secretory pathway.

A codon, as commonly understood, is a sequence of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation (stop codons). There are 64 codons, 61 encoding amino acids and 3 stop codons, but only 20 translated amino acids. Because of this redundancy, many amino acids are encoded by more than one codon. Different organisms and organelles often show a bias toward one of the several codons that encode the same amino acid, so the frequency with which a codon is used varies with the organism and organelle. The sequence of a gene can be altered to make it more compatible with the codon usage frequencies of a host. To ensure reliable expression, it is preferable to use codons that correlate with the host’s tRNA levels, particularly tRNAs that remain charged during starvation. Codons with rare cognate tRNAs can affect protein folding and translation rate, and so may also be used deliberately. Genes whose codon usage has been matched to the codon bias or relative tRNA abundance of the host are often called “optimized.” Optimizing codon usage can increase the expression level. Using optimal codons can give faster translation rates and higher accuracy. Codon optimization introduces silent mutations that do not alter the amino acid sequence of the protein.
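The codon arithmetic and the idea of a silent, usage-driven recoding can be checked with a short script. This is a minimal sketch using the standard genetic code; the `usage` frequencies passed to `recode` are hypothetical placeholders, not measured values for any host, and the "pick the most frequent synonym" rule is only one simple way to read the optimization described above.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA sequence ('*' marks a stop codon)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def recode(dna, usage):
    """Swap each codon for its most frequently used synonym.

    `usage` maps codons to (hypothetical) host usage frequencies.
    Codons without an entry count as 0; ties keep the original codon.
    """
    out = []
    usable = len(dna) - len(dna) % 3
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items()
                    if aa == CODON_TABLE[codon]]
        out.append(max(synonyms,
                       key=lambda c: (usage.get(c, 0.0), c == codon)))
    return "".join(out)


stops = [c for c, aa in CODON_TABLE.items() if aa == "*"]
sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]

# Hypothetical usage values favoring CTG over CTA for leucine.
optimized = recode("ATGCTATAA", {"CTG": 0.5, "CTA": 0.04})
```

The recoded sequence translates to the same protein as the input, which is what makes the change silent.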

“Genetic elements and genetic modules can be derived from natural or synthetic polynucleotides, or a combination of both. In some instances, the genetic elements and modules may be derived from different organisms. Genetic elements and modules useful in the methods described can be obtained from many sources, including DNA libraries, BAC (bacterial artificial chromosome) libraries, de novo chemical synthesis, or excision and modification of a genomic segment. These sequences can then be modified with standard molecular biology or recombinant DNA technology to create polynucleotide constructs that have the desired modifications for reintroduction into, or construction of, large product nucleic acids, such as modified, partially synthetic, or fully synthetic genomes. Several methods are available to modify polynucleotide sequences obtained from a library or genome, including site-directed mutagenesis; PCR mutagenesis; inserting, deleting, or swapping sections of a sequence using restriction enzymes, optionally combined with ligation; in vitro and in vivo homologous recombination; site-specific recombination; or any combination thereof. In other embodiments, the genetic sequences may be synthetic oligonucleotides and polynucleotides, which can be synthesized by any of the many methods available.

“In some instances, genetic elements share less than 99%, less than 95%, less than 90%, less than 80%, or less than 70% sequence identity. Identity can be determined by comparing a position in each sequence, with the sequences aligned for the purpose of comparison. When the equivalent position in the compared sequences is occupied by the same base or amino acid, the molecules are identical at that position. When the equivalent site is occupied by the same or a similar amino acid residue (e.g., similar in steric and/or electronic nature), the molecules can be called homologous (similar) at that position. Expression as a percentage of homology, similarity, or identity is a function of the number of identical or similar amino acids at the positions shared by the compared sequences. Various alignment algorithms and programs can be used, including FASTA, BLAST, and ENTREZ. ENTREZ is available through the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Md. In one embodiment, the percent identity of two sequences can be determined by the GCG program with a gap weight of 1, i.e., each amino acid gap is weighted as if it were a single amino acid or nucleotide mismatch between the two sequences. Other alignment methods exist [Doolittle 1996]. It is preferable to align sequences with an alignment program that permits gaps in the sequence. The Smith-Waterman algorithm is one that permits gaps in sequence alignments [Shpaer 1997]. The GAP program, which uses the Needleman and Wunsch alignment method, can also be used to align sequences. An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH scores sequences using a Smith-Waterman algorithm on a massively parallel computer.
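To make the percent-identity calculation concrete, the sketch below scores two sequences that have already been aligned to equal length, counting each gap column as a mismatch in the spirit of the gap-weight-1 convention mentioned above. The function names and the use of `-` as the gap symbol are assumptions made here for illustration; this is not the GCG, BLAST, or FASTA implementation.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pairwise alignment.

    A column with a gap in exactly one sequence counts as a mismatch.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the pair reaches a stated cutoff, e.g. 'at least 80%'."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A call such as `percent_identity("AC-T", "ACGT")` treats the single gap as one mismatch in four columns.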

“As used herein, an “ortholog” is a gene or set of genes related by vertical descent and responsible for substantially the same or identical functions in different organisms. Genes from different organisms can, for example, be considered orthologs for a biological function such as the hydrolysis of epoxides. Genes are related by vertical descent when, for example, they share enough sequence similarity to indicate that they are homologous, or related by evolution from a common ancestor. Genes can also be considered orthologs if they share three-dimensional structure, but not necessarily sequence similarity, to an extent that indicates they evolved from a common ancestor. Orthologous genes can encode proteins with sequence similarity of about 25% to 100% amino acid sequence identity. Genes encoding proteins with an amino acid similarity of less than 25% can also be considered to have arisen by vertical descent if their three-dimensional structures also show similarities. Members of the serine protease family of enzymes, including tissue plasminogen activator and elastase, are considered to have arisen by vertical descent from a common ancestor. Orthologs include genes or their encoded gene products that have, through evolution for example, diverged in structure or overall activity. For example, where one species encodes a gene product exhibiting two functions, and those functions have been separated into distinct genes in a second species, the three genes and their corresponding products are considered orthologs. For the production of a biochemical product, the orthologous gene harboring the metabolic activity of interest is the one to be chosen for construction of the non-naturally occurring microorganism. An example of orthologs exhibiting separable activities is where distinct activities have been separated into distinct gene products between two or more species or within a single species. A specific example is the separation of elastase proteolysis and plasminogen proteolysis, two types of serine protease activity, into distinct molecules as plasminogen activator and elastase.
Another example is the separation of mycoplasma 5′-3′ exonuclease and Drosophila DNA polymerase III activity. The DNA polymerase of the first species can be considered an ortholog of either or both of the exonuclease and the polymerase of the second species, and vice versa.

“In contrast, as used herein, “paralogs” are homologs related by, for example, duplication followed by evolutionary divergence. They have similar or common, but not identical, functions. Paralogs can originate from the same species or from a different species. For example, microsomal epoxide hydrolase (epoxide hydrolase I) and soluble epoxide hydrolase (epoxide hydrolase II) can be considered paralogs because they are two distinct enzymes, co-evolved from a common ancestor, that catalyze distinct reactions and have distinct functions in the same species. Paralogs are proteins from the same species with significant sequence similarity to each other, suggesting that they are homologous or related through co-evolution from a common ancestor. Groups of paralogous protein families include HipA homologs, luciferase genes, peptidases, and others.

“As used herein, a “nonorthologous gene displacement” is a nonorthologous gene from one species that can substitute for a referenced gene function in a different species. Substitution includes, for example, being able to perform substantially the same or a similar function in the species of origin compared to the referenced function in the different species. Although a nonorthologous gene displacement will generally be identifiable as structurally related to a known gene encoding the referenced function, less structurally related but functionally similar genes and their corresponding gene products still fall within the meaning of the term as used herein. Functional similarity requires, for example, at least some structural similarity in the active site or binding region of a nonorthologous gene product compared to the gene encoding the function to be substituted. A nonorthologous gene therefore includes, for example, a paralog or an unrelated gene.

Orthologs, paralogs, and nonorthologous gene displacements can be determined by methods well known to those skilled in the art. For example, inspection of the nucleic acid or amino acid sequences of two polypeptides will reveal sequence identity and similarities between the compared sequences. Based on such similarities, one skilled in the art can determine whether the similarity is high enough to indicate that the proteins are related through evolution from a common ancestor. Algorithms such as Align, BLAST, Clustal W, and others compare and determine raw sequence similarity or identity, and also determine the presence or significance of gaps in the sequence, which can be assigned a weight or score. Such algorithms are similarly applicable for determining nucleotide sequence similarity or identity. Parameters for sufficient similarity to determine relatedness are computed using well-known methods for calculating statistical similarity, that is, the chance of finding a similar match in a random polypeptide, and the significance of the match is then determined. If desired, a computer comparison of two or more sequences can also be optimized visually by those skilled in the art. Related gene products or proteins can be expected to have high similarity, for example, 25% to 100% sequence identity. Unrelated proteins can have an identity that is essentially the same as would be expected to occur by chance if a database of sufficient size is scanned (about 5%). Sequences with between 5% and 24% identity may or may not represent sufficient homology to conclude that the compared sequences are related. Additional statistical analysis of the significance of such matches, given the size of the data set, can be carried out to determine the relevance of these sequences. Exemplary parameters for determining the relatedness of two or more sequences using the BLAST algorithm are set forth below. Amino acid sequence alignments can be performed using BLASTP version 2.0.8 (Jan. 5, 1999) with the following parameters: Matrix: 0 BLOSUM62; gap open: 11; gap extension: 1; x_dropoff: 50; expect: 10.0; wordsize: 3; filter: on. Nucleic acid sequence alignments can be performed using BLASTN version 2.0.6 (Sept. 16, 1998) with the following parameters: Match: 1; mismatch: −2; gap open: 5; gap extension: 2; x_dropoff: 50; expect: 10.0; wordsize: 11; filter: off. Those skilled in the art will know how to modify the above parameters to increase or decrease the stringency of the comparison, for example, and to determine the relatedness of two or more sequences.

“As used herein, the term “homolog” refers to any ortholog, paralog, nonorthologous gene, or similar gene from a different species that encodes an enzyme catalyzing a similar or substantially identical metabolic reaction.

“As used herein, the term “homologous recombination” refers to the process in which nucleic acid molecules with similar nucleotide sequences associate and exchange nucleotide strands. A nucleotide sequence of a first nucleic acid molecule that is capable of engaging in homologous recombination at a predefined position of a second nucleic acid molecule can therefore have a nucleotide sequence that facilitates the exchange of nucleotide strands between the first nucleic acid molecule and the defined position of the second nucleic acid molecule. The nucleotide sequence of the first nucleic acid can be sufficiently complementary to a portion of the second nucleic acid molecule to promote nucleotide base pairing. Homologous recombination requires homologous sequences in the two recombining partner nucleic acids but does not require any specific sequences. Homologous recombination can be used to introduce a heterologous nucleic acid and/or mutations into the host genome. Such systems typically rely on sequence flanking the heterologous nucleic acid to be expressed that has enough homology with a target sequence within the host cell genome. Recombination between the vector nucleic acid and the target nucleic acid then takes place, causing the delivered nucleic acid to be integrated into the host genome. Systems and methods for carrying out homologous recombination are well known to those skilled in the art.

“It should be appreciated that the nucleic acid sequence or gene of interest can be derived from the genome of a natural organism. In certain embodiments, genes of interest may be excised from the genome of the host or from the genome of a naturally occurring organism. In vitro enzymatic and in vivo excision and amplification can be used to excise large genomic fragments. The FLP/FRT site-specific recombination system and the Cre/loxP site-specific recombination system have been used to efficiently excise large genomic fragments for the purpose of sequencing [Yoon 1998]. In some embodiments, excision and amplification are used to aid in artificial genome or chromosome assembly. Genomic fragments may be excised from the chromosome of a methylotroph, altered, and inserted into the artificial genome or chromosome of the host cell. The excised genomic fragments can be assembled with engineered promoters or other gene expression elements and then inserted into the host cell’s genome.

“As used herein, the term “polypeptide” refers to a sequence of contiguous amino acids of any length. The terms “peptide,” “oligopeptide,” “protein,” and “enzyme” may be used interchangeably herein with the term “polypeptide.” In certain instances, “enzyme” refers to a protein with catalytic activity.

“As used herein, a “proteome” is the entire set of proteins expressed by a genome, cell, tissue, or organism. More specifically, it is the set of proteins expressed in a given type of cell or organism at a given time under defined conditions. The “transcriptome” is the set of all RNA molecules, including mRNA and rRNA. The “metabolome” is the complete set of small-molecule metabolites, such as metabolic intermediates, hormones, and other signaling molecules, found within a biological sample such as a single organism.

“As used herein, the terms “fuse,” “fused,” and “link” refer to the covalent linkage of two polypeptides within a fusion protein. The polypeptides can be joined by a peptide bond, either directly to one another or via an amino acid linker. The polypeptides may also be joined by non-peptide covalent linkages, which are well known to those skilled in the art.

“As used herein, unless otherwise noted, the term “transcription” refers to the synthesis of RNA from a DNA template, and the term “translation” refers to the synthesis of a polypeptide from an mRNA template. Translation is regulated by the sequence and structure of the 5′ untranslated region (5′-UTR) of the mRNA transcript. One regulatory sequence is the ribosome binding site (RBS), which promotes accurate and efficient translation of mRNA. The prokaryotic RBS is the Shine-Dalgarno sequence, a purine-rich sequence of the 5′-UTR that is complementary to the UCCU core sequence at the 3′ end of 16S rRNA (located within the 30S small ribosomal subunit). Many Shine-Dalgarno sequences have been identified in prokaryotic mRNAs. They generally lie about 10 nucleotides upstream of the AUG start codon. The activity of an RBS can be influenced by the length and nucleotide composition of the spacer that separates the RBS from the initiator AUG. In eukaryotes, the Kozak sequence A/GCCACCAUGG, which lies within the 5′ untranslated region, directs translation of mRNA. An mRNA lacking the Kozak consensus sequence may still be translated in vitro if it has a moderately long 5′-UTR without stable secondary structure. E. coli ribosomes preferentially recognize the Shine-Dalgarno sequence, whereas eukaryotic ribosomes (such as those found in reticulocyte lysate) can efficiently use either the Shine-Dalgarno or the Kozak ribosomal binding site.
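The relationship between the Shine-Dalgarno sequence, the 16S rRNA core, and the spacer to the start codon can be illustrated with a small scanner. This is a rough sketch only: it derives a four-base motif as the reverse complement of the UCCU core and reports every AUG that has that motif a short distance upstream. The spacer window of 4 to 12 nucleotides and the function names are assumptions chosen for illustration, not values from the disclosure.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# Motif on the mRNA that base-pairs with the UCCU core of 16S rRNA.
SD_CORE = reverse_complement_rna("UCCU")


def find_rbs_start_pairs(mrna, min_spacer=4, max_spacer=12):
    """Find (motif_index, start_codon_index, spacer_length) triples.

    The spacer is the number of nucleotides between the end of the
    motif and the A of the AUG. The window limits are hypothetical.
    """
    hits = []
    start = mrna.find("AUG")
    while start != -1:
        for spacer in range(min_spacer, max_spacer + 1):
            motif_start = start - spacer - len(SD_CORE)
            if motif_start < 0:
                continue
            if mrna[motif_start:motif_start + len(SD_CORE)] == SD_CORE:
                hits.append((motif_start, start, spacer))
        start = mrna.find("AUG", start + 1)
    return hits


# Toy transcript: leader, motif, 8-nt spacer, start codon, one codon.
example = "GGC" + "AGGA" + "GGUAACAU" + "AUG" + "GCU"
pairs = find_rbs_start_pairs(example)
```

A real RBS predictor would score partial matches and secondary structure as well; exact matching is used here to keep the spacer bookkeeping easy to follow.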

“As used herein, the terms “promoter,” “promoter element,” and “promoter sequence” refer to a DNA sequence that, when ligated to a nucleotide sequence of interest, is capable of controlling the transcription of that nucleotide sequence into mRNA. A promoter is usually located 5′ (i.e., upstream) of the nucleotide sequence of interest whose transcription into mRNA it controls, and it provides a site for specific binding by RNA polymerase and other transcription factors to initiate transcription.

“It should be appreciated that promoters have a modular architecture and that this architecture can be altered. Bacterial promoters typically include a core promoter element and additional promoter elements. The core promoter is the minimal portion of the promoter required to initiate transcription. A core promoter includes a transcription start site, a binding site for RNA polymerase, and general transcription factor binding sites. The “transcription start site” refers to the first nucleotide to be transcribed and is designated +1. Nucleotides downstream of the start site are numbered +1, +2, etc., while nucleotides upstream of the start site are numbered -1, -2, etc. Additional promoter elements are located 5′ (i.e., upstream) of the core promoter and regulate the frequency of transcription. The proximal and distal promoter elements constitute specific transcription factor sites. In prokaryotes, a core promoter usually includes two consensus sequences, a -10 sequence and a -35 sequence, which are recognized by sigma factors (see [Hawley 1983]). The -10 sequence is located 10 bp upstream of the first transcribed nucleotide. It is typically about 6 nucleotides long and is typically made up of the nucleotides adenosine and thymidine (it is also known as the Pribnow box). In some embodiments, the nucleotide sequence of the -10 sequence is 5′-TATAAT, or it may comprise 3 to 6 base pairs of that consensus sequence. The presence of this box is essential for transcription to begin. The -35 sequence of a core promoter is typically about 6 nucleotides long and is typically made up of each of the four nucleosides. The presence of this sequence allows a high transcription rate. In some embodiments, the nucleotide sequence of the -35 sequence is 5′-TTGACA, or it may comprise 3 to 6 base pairs of that consensus sequence. In some embodiments, the -10 and -35 sequences are separated by about 17 nucleotides.
Eukaryotic promoters are more diverse than prokaryotic promoters and may be located several kilobases upstream of the transcription start site. Some eukaryotic promoters contain a TATA box (e.g., containing the consensus sequence TATAAA or part of it), typically located within 40 to 120 bases of the transcriptional start site. One or more upstream activation sequences (UAS), which are recognized by specific binding proteins, can act as activators of transcription. These UAS sequences are typically found upstream of the transcription initiation site. The distance between the UAS sequences and the TATA box is highly variable and may be as much as 1 kb.
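The prokaryotic core promoter layout described above (a -35 hexamer, a spacer of about 17 nucleotides, then a -10 hexamer) lends itself to a simple consensus scan. The sketch below is illustrative only: the function names, the default `min_match` threshold and the 16 to 18 nucleotide spacer range are assumptions, and a naive match count like this is far less reliable than a trained promoter model.

```python
MINUS_35 = "TTGACA"  # -35 consensus given in the text
MINUS_10 = "TATAAT"  # -10 consensus (Pribnow box) given in the text


def count_matches(site, consensus):
    """Number of positions at which a candidate hexamer matches the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def scan_promoters(dna, min_match=4, spacers=range(16, 19)):
    """Return (pos_35, pos_10, spacer, score) for each candidate core promoter.

    min_match and the spacer range are assumed values for illustration; the
    text only says elements may share 3 to 6 bases with the consensus and are
    separated by about 17 nucleotides.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - len(MINUS_35) + 1):
        m35 = count_matches(dna[i:i + 6], MINUS_35)
        if m35 < min_match:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(dna):
                break
            m10 = count_matches(dna[j:j + 6], MINUS_10)
            if m10 >= min_match:
                hits.append((i, j, spacer, m35 + m10))
    return hits
```

Lowering `min_match` toward 3 mirrors the looser definition in the text but will return many more spurious hits on real genomic sequence, so any practical use would need an additional scoring or filtering step.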

“As used herein, the term “vector” refers to any genetic element, such as a plasmid, phage, transposon or cosmid. The vector may contain a marker suitable for identifying transformed or transfected cells. For example, markers may confer antibiotic resistance, fluorescence, enzymatic activity, or other traits. As a second example, markers may complement auxotrophic deficiencies or supply critical nutrients not present in the culture media. Types of vectors include cloning vectors and expression vectors. As used herein, the term “cloning vector” refers to a plasmid, phage DNA, or other DNA sequence that is able to replicate autonomously in a host cell and that is characterized by one or a small number of restriction endonuclease recognition sites and/or sites for site-specific recombination. A foreign DNA fragment may be spliced into the vector at these sites in order to bring about the replication and cloning of the fragment. The term “expression vector” refers to a vector that is capable of expressing a gene that has been cloned into it. Such expression can occur after transformation into a host cell or in an in vitro protein synthesis (IVPS) system. The cloned DNA is usually operably linked to one or more regulatory sequences, such as promoters, activator/repressor binding sites, terminators, enhancers and the like. The promoter sequences can be constitutive, inducible, and/or repressible.

“As used herein, the terms “host” and “host cell” refer to any prokaryotic or eukaryotic (e.g., mammalian, insect, yeast, plant, bacterial, archaeal, avian, animal, etc.) cell or organism. The host cell can be the recipient of a replicable expression vector, a cloning vector, or any heterologous nucleic acid molecule. The host cell may be a methylotroph that is naturally occurring, genetically engineered, or metabolically evolved. Host cells may be prokaryotic cells, such as species of the genus Paracoccus or Escherichia, or eukaryotic cells, such as yeast, insect, amphibian or mammalian cells or cell lines. Cell lines are specific cells that can grow indefinitely given the appropriate medium and conditions. Cell lines can be mammalian, insect, or plant cell lines. Exemplary cell lines include tumor cell lines and stem cell lines. The heterologous nucleic acid molecule may contain, among other things, a sequence of interest, a transcriptional regulatory sequence (such as a promoter, enhancer, repressor and the like), and/or an origin of replication. As used herein, the terms “host,” “host cell,” “recombinant host,” and “recombinant host cell” may be used interchangeably. For examples of such hosts, see [Sambrook 2001].

“One or more nucleic acid sequences can be targeted for delivery to prokaryotic or eukaryotic target cells using conventional transformation or transfection techniques. As used herein, the terms “transformation” and “transfection” are intended to refer to a variety of art-recognized techniques for introducing an exogenous nucleic acid sequence (e.g., DNA) into a target cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, conjugation, electroporation, optoporation, injection and the like. Suitable transformation or transfection media include water, CaCl2, cationic polymers, lipids, and the like. Suitable materials and methods for transforming or transfecting target cells can be found in [Sambrook 2001] and other laboratory manuals. In certain cases, particular oligo concentrations (per oligo) can be used for transformation or transfection.

“As used herein, the terms “marker” and “reporter” refer to a gene or protein that can be attached to a regulatory sequence or to another gene or protein of interest so that, when expressed in a host cell, the reporter confers certain characteristics that can be relatively easily identified, measured, and/or selected. Reporter genes are often used to indicate whether a particular gene has been introduced into, or expressed in, the host cell or organism. Examples of commonly used reporters include auxotrophic markers, antibiotic resistance genes, β-galactosidase (encoded by the bacterial gene lacZ), luciferase, chloramphenicol acetyltransferase (CAT; from bacteria), β-glucuronidase (GUS; commonly used in plants), and green fluorescent protein (GFP; from jellyfish). Markers or reporters can be selectable or screenable. A selectable marker, such as an antibiotic resistance gene or an auxotrophic marker, confers a trait that can be artificially selected. Typically, host cells expressing the selectable marker are protected from a selective agent that is toxic or inhibitory to cell growth. A screenable marker (e.g., lacZ, gfp) generally allows researchers to distinguish cells that express the marker from those that do not (or that do not express it at sufficient levels).

“As used herein, the terms “methylotroph” and “methylotrophic organism” refer to organisms that produce complex organic compounds from compounds that contain no carbon-carbon bonds, such as formate, formic acid, formaldehyde and methane. Methylotrophs typically use C1 compounds as a source of both energy and carbon. The ribulose monophosphate cycle (FIG. 1) and the serine cycle (FIG. 2) are examples of methylotrophic metabolic pathways for producing central metabolites from C1 compounds. “Autotrophs” or “autotrophic organisms” are organisms that use simple inorganic carbon molecules, such as carbon dioxide, as their primary carbon source for growth. Some, but not all, methylotrophs assimilate C1 compounds via carbon dioxide and are therefore also autotrophs. These organisms oxidize C1 compounds such as methanol, formaldehyde and methylamine to carbon dioxide (see FIG. 3) and then reduce the carbon dioxide to central metabolites using carbon fixation cycles such as the Calvin-Benson-Bassham cycle (FIG. 4) or the reductive tricarboxylic acid cycle (FIG. 5). In contrast, “heterotrophs” or “heterotrophic organisms” are organisms that must use reduced organic compounds containing carbon-carbon bonds as their carbon source for growth; they cannot use inorganic carbon such as CO2 as their primary carbon source. Heterotrophs instead obtain energy from the breakdown of the organic molecules they consume. “Mixotrophs” or “mixotrophic organisms” can use a mix of different sources of energy and carbon; depending on environmental conditions, they can alternate between heterotrophy and autotrophy, between heterotrophy and methylotrophy, between phototrophy and chemotrophy, or any combination thereof.

“As used herein, the term “reduced cofactor” refers to intracellular redox and energy carriers, such as NADH and NADPH, that can donate high-energy electrons in reduction-oxidation reactions. The terms “reduced cofactor,” “reducing cofactor,” and “redox cofactor” can be used interchangeably.

“As used herein, the term “C1 compound” or “1C compound” refers to any chemical species that contains reduced carbon but no carbon-carbon bonds. C1 compounds can contain one carbon atom (e.g., formate, formic acid, formamide and formaldehyde) or multiple carbon atoms (e.g., dimethyl ether, dimethylamine and dimethyl sulfide). C1 compounds can be organic (e.g., methane and methanol) or inorganic (e.g., formate and formic acid). Methylotrophs typically use C1 compounds as a source of both energy and carbon.

“As used herein, the term “central metabolite” refers to organic carbon compounds, such as acetyl-CoA, pyruvate, pyruvic acid, 3-hydroxypropionate, 3-hydroxypropionic acid, glycolate, glycolic acid, glyoxylate, glyoxylic acid, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, malate, malic acid, lactate, lactic acid, acetate, acetic acid, citrate and/or citric acid, that can be converted into carbon-based products of interest by a host cell or organism. Central metabolites are generally restricted to the reduced organic compounds available in a given host cell or organism. In certain embodiments, the central metabolite is itself the carbon-based product of interest, in which case no further chemical conversion is required.

“References to a particular chemical species include not only that species but also its water-solvated forms. Carbon dioxide, for example, includes both the gaseous form (CO2) and water-solvated forms (e.g., bicarbonate ion).

“As used herein, the terms “biosynthetic pathway” and “metabolic pathway” refer to a set of biochemical reactions that convert (transmute) one chemical species into another. Anabolic pathways synthesize a larger molecule from smaller molecules, a process that requires energy. Catabolic pathways break down larger molecules, often releasing energy. The term “energy conversion pathway” refers to a metabolic pathway that transfers energy from a C1 compound to a reducing cofactor. The term “carbon fixation pathway” refers to a biosynthetic pathway that converts inorganic carbon (e.g., carbon dioxide, bicarbonate or formate) into reduced organic carbon, such as one or more carbon product precursors. The term “methylotrophic pathway” refers to a biosynthetic pathway that converts C1 compounds into compounds containing carbon-carbon bonds, such as one or more carbon product precursors. The term “carbon product biosynthetic pathway” refers to a biosynthetic pathway that converts one or more carbon product precursors into one or more carbon-based products of interest.

“As used herein, the terms “engineered methylotroph” and “engineered methylotrophic organism” refer to organisms that have been genetically engineered to convert C1 compounds, such as formate or formaldehyde, into organic carbon compounds. As used herein, an engineered methylotroph need not derive its organic carbon compounds exclusively from C1 compounds. An engineered methylotroph can also be an originally methylotrophic or mixotrophic organism that has been genetically engineered to include energy conversion, carbon fixation and/or carbon product biosynthetic pathways in addition to its endogenous methylotrophic capability. The terms “engineer,” “engineering,” and “engineered,” as used herein, refer to the genetic manipulation of biomolecules such as DNA, RNA and/or proteins, or to any similar technique commonly known in the biotechnology art.”

“As used herein, the term “carbon-based products of interest” refers to a desired product containing carbon atoms and includes, but is not limited to, alcohols such as ethanol, propanol, isopropanol, butanol, octanol, fatty alcohols, fatty acid esters, wax esters; hydrocarbons and alkanes such as propane, octane, diesel, Jet Propellant 8; polymers such as terephthalate, 1,3-propanediol, 1,4-butanediol, polyols, polyhydroxyalkanoates (PHAs), polyhydroxybutyrates (PHBs), acrylate, adipic acid, epsilon-caprolactone, isoprene, caprolactam, rubber; commodity chemicals such as lactate, docosahexaenoic acid (DHA), 3-hydroxypropionate, γ-valerolactone, lysine, serine, aspartate, aspartic acid, sorbitol, ascorbate, ascorbic acid, isopentenol, lanosterol, omega-3 DHA, lycopene, itaconate, 1,3-butadiene, ethylene, propylene, succinate, citrate, citric acid, glutamate, malate, 3-hydroxypropionic acid (HPA), lactic acid, THF, gamma-butyrolactone, pyrrolidones, hydroxybutyrate, glutamic acid, levulinic acid, acrylic acid, malonic acid; specialty chemicals such as carotenoids, isoprenoids, itaconic acid; biological sugars such as glucose, fructose, lactose, sucrose, starch, cellulose, hemicellulose, glycogen, xylose, dextrose, galactose, uronic acid, maltose, polyketides, or glycerol; central metabolites, such as acetyl-CoA, pyruvate, pyruvic acid, 3-hydroxypropionate, 3-hydroxypropionic acid, glycolate, glycolic acid, glyoxylate, glyoxylic acid, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, malate, malic acid, lactate, lactic acid, acetate, acetic acid, citrate and/or citric acid, from which other carbon products can be made; pharmaceuticals and pharmaceutical intermediates such as 7-aminodesacetoxycephalosporanic acid, cephalosporin, erythromycin, polyketides, statins, paclitaxel, docetaxel, terpenes, peptides, steroids, omega fatty acids; and other such suitable products of interest.
These products can be used as intermediates in the production of other products such as pharmaceuticals, biofuels, industrial chemicals and specialty chemicals.

“As used herein, the term “hydrocarbon” refers to a chemical compound made up of the elements carbon, hydrogen, and optionally oxygen. “Surfactants” are substances capable of reducing the surface tension of a liquid in which they are dissolved. They typically consist of a water-soluble head and a hydrocarbon chain or tail. The water-soluble group is hydrophilic and can be either ionic or nonionic, and the hydrocarbon chain is hydrophobic. The term “biofuel” refers to any fuel that derives from a biological source.

“The accession numbers in this description are derived from the NCBI (National Center for Biotechnology Information) database maintained by the National Institutes of Health, USA. The accession numbers are as provided in the database on August 1, 2011. The Enzyme Classification Numbers (E.C.) in this description are derived from the KEGG Ligand database, which is maintained by the Kyoto Encyclopedia of Genes and Genomes and sponsored in part by the University of Tokyo.

“Other terms used herein from the fields of recombinant DNA technology, microbiology and metabolic engineering will generally be understood by one of ordinary skill in the applicable arts.”

“Source of C1 Compounds.”

“Some embodiments use suitable C1 compounds such as formate, formic acid, methanol, and/or formaldehyde. Formate, formic acid, formaldehyde, and methanol can be produced by the electrochemical reduction of CO2 [see, e.g., Hori 2008].

“In certain cases, liquid feedstocks such as formate, formic acid, or formaldehyde are preferred over gaseous feedstocks such as methane or synthesis gas. Methane, which is commonly used as a feedstock for methylotrophs, is a gas with low solubility in water, whereas biological systems are aqueous. The same is true of synthesis gas, which is composed of molecular hydrogen and carbon monoxide and also has low water solubility. High rates of mass transfer between the gas and liquid phases can be difficult to achieve in large-scale reactors or fermentors. In contrast, formate, formaldehyde and methanol do not have this problem, owing to their higher solubility or miscibility in water. Formate, formaldehyde, or methanol can therefore be advantageous when water is the solvent in the growth medium.

The energy efficiency of electrochemical carbon dioxide conversion affects the overall energy efficiency of a bio-manufacturing process that uses an engineered or evolved methylotroph of this invention. Electrolyzers achieve overall energy efficiencies of 56-73% at current densities of 110-300 mA/cm2, or 800-1600 mA/cm2 for PEM electrolyzers [Whipple 2010]. However, although electrochemical systems for carbon dioxide conversion have achieved either high current densities or high energy efficiencies, they have not yet achieved both simultaneously. Further technology improvements are therefore required for the electrochemical production of formate, formic acid, and formaldehyde.

“Organisms or host cells for engineering or evolution”

“The host cell or organism, as disclosed herein, may be chosen from methylotrophic eukaryotic or prokaryotic systems, such as bacterial cells (Gram-negative (e.g., Alphaproteobacteria) or Gram-positive), archaea and yeast cells. Cells or cell lines commonly used in laboratory or industrial settings may be used. In some embodiments, host cells or organisms can be selected from Bacillus species including Bacillus methanolicus, Bilophila wadsworthia, Burkholderia species including Burkholderia phymatum, Candida species including Candida boidinii, Candida sonorensis, Cupriavidus necator (formerly Alcaligenes eutrophus and Ralstonia eutropha), Hyphomicrobium species including Hyphomicrobium methylovorum, Hyphomicrobium zavarzinii, Methanococcus maripaludis, Methanomonas methanooxidans, Methanosarcina species, Methylibium petroleiphilum, Methylobacillus flagellatus, Methylobacillus flagellatum, Methylobacillus fructoseoxidans, Methylobacillus glycogenes, Methylobacillus viscogenes, Methylobacter bovis, Methylobacter capsulatus, Methylobacter vinelandii, Methylobacterium species including Methylobacterium dichloromethanicum, Methylobacterium extorquens, Methylobacterium mesophilicum, Methylobacterium organophilum, Methylobacterium rhodesianum, Methylococcus capsulatus, Methylococcus minimus, Methylocystis species including Methylocystis parvus, Methylomicrobium alcaliphilum, Methylomonas species including Methylomonas agile, Methylomonas albus, Methylomonas clara, Methylomonas methanica (formerly Bacillus methanicus and Pseudomonas methanica), Methylomonas methanolica, Methylomonas rosaceous, Methylomonas rubrum, Methylomonas streptobacterium, Methylophilus methylotrophus, Methylosinus species including Methylosinus sporium, Methylosinus trichosporium, Methylosporovibrio methanica, Methyloversatilis universalis, Methylovorus mays, Mycobacterium vaccae, Nautilia species including Nautilia sp. strain AmN, Nautilia profundicola and Nautilia lithotrophica, Paracoccus species including Paracoccus denitrificans, Paracoccus versutus and Paracoccus zeaxanthinifaciens, Mycobacterium species, Verrucomicrobia species and Xanthomicrobia species.
The genetic modifications and metabolic alterations described herein are described with reference to a Paracoccus species, such as Paracoccus denitrificans, Paracoccus versutus or Paracoccus zeaxanthinifaciens, as the host organism. Given the complete genome sequences available for many organisms and the high level of skill in genomics, those skilled in the art can readily apply the teachings and guidance provided herein to essentially all other methylotrophic host cells and organisms. For example, the metabolic modifications described for Paracoccus denitrificans can readily be applied to other species by incorporating the same or an analogous encoding nucleic acid from a species other than the referenced species. Such genetic modifications include genetic alterations of species homologs in general and, in particular, of orthologs, paralogs or nonorthologous gene displacements.

“In various aspects of the invention, cells are genetically engineered or metabolically evolved for optimized energy conversion and/or carbon fixation. The terms “metabolically evolved” and “metabolic evolution” refer to the growth-based selection of host cells that show improved growth (cell yield).

“Exemplary genomes and nucleic acids include full and partial genomes of a number of organisms for which genome sequences are publicly available and can be used with the disclosed methods, such as, but not limited to, Aeropyrum pernix; Agrobacterium tumefaciens; Anabaena; Anopheles gambiae; Apis mellifera; Aquifex aeolicus; Arabidopsis thaliana; Archaeoglobus fulgidus; Ashbya gossypii; Bacillus anthracis; Bacillus cereus; Bacillus halodurans; Bacillus licheniformis; Bacillus subtilis; Bacteroides fragilis; Bacteroides thetaiotaomicron; Bartonella henselae; Bartonella quintana; Bdellovibrio bacteriovorus; Bifidobacterium longum; Blochmannia floridanus; Bordetella bronchiseptica; Bordetella parapertussis; Bordetella pertussis; Borrelia burgdorferi; Bradyrhizobium japonicum; Brucella melitensis; Brucella suis; Buchnera aphidicola; Burkholderia mallei; Burkholderia pseudomallei; Caenorhabditis briggsae; Caenorhabditis elegans; Campylobacter jejuni; Candida glabrata; Canis familiaris; Caulobacter crescentus; Chlamydia muridarum; Chlamydia trachomatis; Chlamydophila caviae; Chlamydophila pneumoniae; Chlorobium tepidum; Chromobacterium violaceum; Ciona intestinalis; Clostridium acetobutylicum; Clostridium perfringens; Clostridium tetani; Corynebacterium diphtheriae; Corynebacterium efficiens; Coxiella burnetii; Cryptosporidium hominis; Cryptosporidium parvum; Cyanidioschyzon merolae; Debaryomyces hansenii; Deinococcus radiodurans; Desulfotalea psychrophila; Desulfovibrio vulgaris; Drosophila melanogaster; Encephalitozoon cuniculi; Enterococcus faecalis; Erwinia carotovora; Escherichia coli; Fusobacterium nucleatum; Gallus gallus; Geobacter sulfurreducens; Gloeobacter violaceus; Guillardia theta; Haemophilus ducreyi; Haemophilus influenzae; Halobacterium; Helicobacter hepaticus; Helicobacter pylori; Homo sapiens; Kluyveromyces waltii; Lactobacillus johnsonii; Lactobacillus plantarum; Legionella pneumophila; Leifsonia xyli; Lactococcus lactis; Leptospira interrogans;
Listeria innocua; Listeria monocytogenes; Magnaporthe grisea; Mannheimia succiniciproducens; Mesoplasma florum; Mesorhizobium loti; Methanobacterium thermoautotrophicum; Methanococcoides burtonii; Methanococcus jannaschii; Methanococcus maripaludis; Methanogenium frigidum; Methanopyrus kandleri; Methanosarcina acetivorans; Methanosarcina mazei; Methylococcus capsulatus; Mus musculus; Mycobacterium bovis; Mycobacterium leprae; Mycobacterium paratuberculosis; Mycobacterium tuberculosis; Mycoplasma gallisepticum; Mycoplasma genitalium; Mycoplasma mycoides; Mycoplasma penetrans; Mycoplasma pneumoniae; Mycoplasma pulmonis; Mycoplasma mobile; Nanoarchaeum equitans; Neisseria meningitidis; Neurospora crassa; Nitrosomonas europaea; Nocardia farcinica; Oceanobacillus iheyensis; Onion yellows phytoplasma; Oryza sativa; Pan troglodytes; Paracoccus denitrificans; Paracoccus versutus; Paracoccus zeaxanthinifaciens; Pasteurella multocida; Phanerochaete chrysosporium; Photorhabdus luminescens; Picrophilus torridus; Plasmodium falciparum; Plasmodium yoelii yoelii; Populus trichocarpa; Porphyromonas gingivalis; Prochlorococcus marinus; Propionibacterium acnes; Protochlamydia amoebophila; Pseudomonas aeruginosa; Pseudomonas putida; Pseudomonas syringae; Pyrobaculum aerophilum; Pyrococcus abyssi; Pyrococcus furiosus; Pyrococcus horikoshii; Pyrolobus fumarii; Ralstonia solanacearum; Rattus norvegicus; Rhodopirellula baltica; Rhodopseudomonas palustris; Rickettsia conorii; Rickettsia typhi; Rickettsia prowazekii; Rickettsia sibirica; Saccharomyces cerevisiae; Saccharomyces bayanus; Saccharomyces boulardii; Saccharopolyspora erythraea; Salmonella enterica; Salmonella typhimurium; Schizosaccharomyces pombe; Shewanella oneidensis; Shigella flexneri; Sinorhizobium meliloti; Staphylococcus aureus; Staphylococcus epidermidis; Streptococcus agalactiae; Streptococcus mutans; Streptococcus pneumoniae; Streptococcus pyogenes; Streptococcus thermophilus;
Streptomyces avermitilis; Streptomyces coelicolor; Sulfolobus solfataricus; Sulfolobus tokodaii; Synechococcus; Synechococcus elongatus; Synechocystis; Takifugu rubripes; Tetraodon nigroviridis; Thalassiosira pseudonana; Thermoanaerobacter tengcongensis; Thermoplasma acidophilum; Thermoplasma volcanium; Thermosynechococcus elongatus; Thermotoga maritima; Thermus thermophilus; Treponema denticola; Treponema pallidum; Tropheryma whipplei; Ureaplasma urealyticum; Vibrio cholerae; Vibrio parahaemolyticus; Vibrio vulnificus; Wigglesworthia glossinidia; Wolbachia pipientis; Wolinella succinogenes; Xanthomonas axonopodis; Xanthomonas campestris; Xylella fastidiosa; Yarrowia lipolytica; Yersinia pseudotuberculosis; and Yersinia pestis nucleic acids.”

“In some embodiments, sources of encoding nucleic acids for enzymes of a biosynthetic pathway can include, for example, any species in which the encoded gene product is capable of catalyzing the referenced reaction. Exemplary species for such sources include, for example, Aeropyrum pernix; Aquifex aeolicus; Aquifex pyrophilus; Candidatus Arcobacter sulfidicus; Candidatus Endoriftia persephone; Candidatus Nitrospira defluvii; Chlorobium limicola; Chlorobium tepidum; Clostridium pasteurianum; Desulfobacter hydrogenophilus; Desulfurobacterium thermolithotrophum; Geobacter metallireducens; Halobacterium sp. NRC-1; Hydrogenimonas thermophila; Hydrogenivirga sp. 128-5R1; Hydrogenobacter thermophilus; Hydrogenobaculum sp. Y04AAS1; Lebetimonas acidiphila Pd55T; Leptospirillum ferriphilum; Leptospirillum ferrodiazotrophum; Leptospirillum rubarum; Magnetococcus marinus; Magnetospirillum magneticum; Mycobacterium bovis; Mycobacterium tuberculosis; Methylobacterium nodulans; Nautilia lithotrophica; Nautilia profundicola; Nautilia sp. strain AmN; Nitratifractor salsuginis; Nitratiruptor sp. strain SB155-2; Paracoccus denitrificans; Paracoccus versutus; Paracoccus zeaxanthinifaciens; Persephonella marina; Rimicaris exoculata episymbiont; Streptomyces avermitilis; Streptomyces coelicolor; Sulfolobus avermitilis; Sulfolobus solfataricus; Sulfolobus tokodaii; Sulfurihydrogenibium azorense; Sulfurihydrogenibium sp. Y03AOP1; Sulfurihydrogenibium yellowstonense; Sulfurihydrogenibium subterraneum; Sulfurimonas autotrophica; Sulfurimonas denitrificans; Sulfurimonas paralvinella; Sulfurovum lithotrophicum; Sulfurovum sp. strain NBC37-1; PCC 7120; Acidithiobacillus ferrooxidans; Allochromatium vinosum; Aphanothece halophytica; Oscillatoria limnetica; Rhodobacter capsulatus; Thiobacillus denitrificans; Cupriavidus necator (formerly Ralstonia eutropha); Methanosarcina barkeri; Methanosarcina mazei; Methanococcus maripaludis; Mycobacterium smegmatis; Burkholderia stabilis; Candida boidinii; Candida methylica; Pseudomonas sp. 101; Methylococcus capsulatus; Mycobacterium gastri; Cenarchaeum symbiosum; Chloroflexus aurantiacus; and Erythrobacter sp.
Complete genome sequences are now publicly available for more than 4400 species, including a variety of yeast, fungal and mammalian genomes. This makes it routine to identify genes that encode the requisite energy conversion, carbon fixation, or carbon product biosynthetic activity for one or more genes in related or distant species. Accordingly, the metabolic modifications that enable methylotrophic growth or the production of carbon-based products, described herein with reference to Paracoccus denitrificans, can readily be applied to other methylotrophic microorganisms. Given the teachings and guidance provided herein, those skilled in the art will understand that a metabolic modification demonstrated in one organism can be applied equally to other organisms.

“In certain instances, such as when an alternative energy conversion, carbon fixation or carbon product biosynthetic pathway exists in an unrelated species, enhanced methylotrophy and production of carbon-based products can be conferred on the host species by, for example, exogenous expression of a paralog or paralogs from the unrelated species that catalyzes a similar, yet non-identical, metabolic reaction to replace the referenced reaction. Because metabolic networks differ between organisms, those skilled in the art will understand that the actual gene usage may differ from one organism to another. However, given the teachings and guidance provided herein, those skilled in the art will also understand that the methods of the invention can be applied to any suitable microbial organism, using cognate metabolic modifications, to construct a microbial organism that produces carbon-based products from C1 compounds.

“It should be noted that various engineered strains, mutations, and/or combinations of the organisms or cell lines discussed herein can also be used.”

“Methods to Identify and Select Candidate Enzymes For a Metabolic Activity Of Interest”

“In one aspect, this invention provides a method for identifying candidate proteins or enzymes capable of performing a desired metabolic activity. Bayer and colleagues took advantage of the rapid growth of gene and genome sequence databases and the affordability of commercial gene synthesis to develop a synthetic metagenomics strategy. They used a bioinformatic search to identify homologous or related enzymes in sequence databases, optimized the encoding genes for heterologous expression, synthesized and cloned each sequence into an expression vector, and screened for the desired function in E. coli or yeast. Depending on the protein or metabolic activity of interest, however, there may be thousands of homologs in publicly accessible sequence databases, and it can be impractical or infeasible to synthesize and screen all of them in a reasonable amount of time and at a reasonable cost. This invention addresses this problem by providing an alternative method for identifying and selecting candidate protein sequences for a metabolic activity of interest. The method comprises the following steps. First, the enzyme(s) of interest for the desired metabolic activity are identified, for example an enzyme that catalyzes a step in an energy conversion, carbon fixation or methylotrophic pathway. The enzyme(s) of interest have typically been experimentally confirmed to perform the desired activity, for example in the published scientific literature. In some embodiments, one or more of the enzymes of interest have been heterologously expressed and shown to be functional. Second, a bioinformatic search of protein classification and grouping databases, such as Entrez Protein Clusters [Klimke 2009], Clusters of Orthologous Groups (COGs) [Tatusov 1997; Tatusov 2003] and InterPro [Zdobnov 2001], is performed to find protein groupings that contain the enzyme(s) of interest or closely related enzymes.
For the purposes of the bioinformatic analysis, if the enzyme(s) of interest contain multiple subunits, the protein corresponding to the catalytic subunit or the largest subunit is chosen. Third, an expert-guided, systematic search is performed to determine which database groupings are likely to have a majority of members whose metabolic activity is similar to that of the protein(s) of interest. Fourth, a list of the NCBI Protein accession numbers corresponding to each member of each selected grouping is compiled, and the corresponding protein sequences are downloaded from the sequence database. This set may also include sequences from sources other than the public databases. Fifth, and optionally, one or more outgroup proteins are identified and added to the set. Outgroup proteins are proteins that may share some functional, structural, and/or sequence similarity with the model enzyme(s) but lack an essential feature or the desired metabolic activity. For example, flavocytochrome c (E.C. 1.8.2.3) is similar to sulfide-quinone oxidoreductase (E.C. 1.8.5.4) and can serve as an outgroup for it. Sixth, all of the protein sequences are aligned using a sequence alignment program such as MUSCLE [Edgar 2004a; Edgar 2004b]. Seventh, a tree is built from the MUSCLE alignment using methods well known to those skilled in the art, such as neighbor joining or UPGMA [Sokal 1958; Murtagh 1984]. Eighth, distinct clades are selected from the tree so that enough proteins are available for screening. Finally, one protein is chosen from each clade for gene synthesis and functional screening using the following heuristics:

“Therefore, when constructing the engineered and/or evolved methylotroph of the invention, those skilled in the art will understand, by applying the teachings and guidance provided herein, that genes in a metabolic pathway, such as an energy conversion pathway, a carbon fixation pathway or a methylotrophic pathway, can be replaced or supplemented with homologs identified by the methods described here, whose gene products catalyze similar or substantially identical metabolic reactions. Such modifications can be used to improve the kinetic properties of the pathway and/or to otherwise optimize the engineered or evolved methylotroph.
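As a rough illustration of the tree-building and clade-selection steps in the method above, the following Python sketch clusters a toy set of already aligned sequences with UPGMA-style average linkage, stops when the desired number of clades remains, and picks one member per clade. Several things here are assumptions made only for illustration: the sequence names and sequences are invented, the distance is a simple fraction of mismatched columns rather than anything produced by MUSCLE, and the "pick the medoid" rule is a stand-in, since the selection heuristics themselves are not reproduced in this text.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned columns that differ (columns gapped in both are skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma_clades(aligned, k):
    """Merge the two closest clusters (average linkage) until k clades remain."""
    names = list(aligned)
    dist = {frozenset(p): p_distance(aligned[p[0]], aligned[p[1]])
            for p in combinations(names, 2)}
    clusters = [(n,) for n in names]

    def avg(c1, c2):
        # UPGMA distance: mean of all pairwise distances between the two clusters.
        return sum(dist[frozenset((i, j))] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return clusters, dist

def medoid(clade, dist):
    """Hypothetical selection rule: member closest, in total, to the rest of its clade."""
    return min(clade, key=lambda m: sum(dist[frozenset((m, o))] for o in clade if o != m))

# Invented, pre-aligned toy sequences; "outF" plays the role of an outgroup.
aligned = {
    "enzA": "MKTAYIAKQR",
    "enzB": "MKTAYIAKQK",
    "enzC": "MKSAYLAKQR",
    "homD": "MRTGFIVRQN",
    "homE": "MRTGFIVRQD",
    "outF": "LPWGHCEDSN",
}

clades, dist = upgma_clades(aligned, k=3)
picks = [medoid(clade, dist) for clade in clades]
print(clades)
print(picks)
```

Setting `k` to the number of proteins one can afford to synthesize mirrors the idea of choosing enough clades for screening. A real workflow would take its distances from the alignment program and tree-building method actually used.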

“Methods to Design Nucleic Acids Encoding Enzymes for Heterologous Expression”

“The present invention, in one aspect, provides a computer program product for designing a nucleic acid that encodes a protein or enzyme of interest and is codon optimized for the host cell or organism (the target species). The program can reside on a computer-readable medium, such as a hard drive, and contains a set of instructions that, when executed by a processor, cause the processor to perform the operations described here. At each position of the protein of interest, the program selects the codon whose rank-order usage frequency in the target species equals the rank-order usage frequency of the original codon in the source species gene. Selecting the codon at each position requires both the genetic code (the mapping of codons to amino acids [Jukes 1993]) and the codon frequency table (the frequency at which each synonymous codon occurs within a genome [Grantham 1980]) for the source and target species. For source species with a complete genome sequence, the usage frequency of each codon can be calculated by summing the number of instances of that codon in all annotated coding sequences, dividing by the total number of codons, and multiplying by 1000. For source species without a complete genome sequence, the usage frequency can be calculated from any available coding sequences or taken from the codon frequency table of a closely related organism. The program then standardizes the start codon to ATG and the stop codon to TAA. It also converts the second and third last codons to one of twenty possible codons (one for each amino acid). To improve the probability that the codon-optimized nucleic acid sequence can be synthesized using commercial gene synthesis [Sambrook 2001] or DNA assembly methods [WO/2010/070295], the program runs a series of checks. It checks for key restriction enzyme recognition sites used in DNA assembly methods or standards, sequence repeats, G or C homopolymers greater than 5 nucleotides, and any sequence motifs that could give rise to spurious transposon insertion sites. If the codon-optimized nucleic acid sequence fails any of the checks, the program iterates through the possible synonymous mutations to create a new sequence that passes all the checks and minimizes the difference in codon frequencies between the original sequence and the new sequence.
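The rank-order codon selection and some of the sequence checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program of the disclosure: the function names are hypothetical, the usage values are invented rather than taken from any real organism, both species are assumed to use the standard genetic code, and only two of the checks (restriction sites on the given strand, and G or C homopolymers longer than 5 nucleotides) are shown. The final redesign step, which searches synonymous mutations for a passing sequence, is not implemented.

```python
import re
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" marks a stop codon).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def per_thousand(counts):
    """Codon usage per 1000 codons from raw codon counts."""
    total = sum(counts.values())
    return {codon: 1000 * n / total for codon, n in counts.items()}

def ranked_synonyms(codon, usage):
    """Synonymous codons ordered from most to least used (ties broken alphabetically)."""
    synonyms = [c for c in CODE if CODE[c] == CODE[codon]]
    return sorted(synonyms, key=lambda c: (-usage.get(c, 0.0), c))

def rank_order_recode(cds, source_usage, target_usage):
    """Swap each codon for the target-species codon of the same usage rank."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out = []
    for codon in codons:
        rank = ranked_synonyms(codon, source_usage).index(codon)
        out.append(ranked_synonyms(codon, target_usage)[rank])
    out[0], out[-1] = "ATG", "TAA"  # standardize start and stop codons
    return "".join(out)

def failed_checks(seq, sites):
    """Names of the checks that the sequence fails (given strand only)."""
    problems = [name for name, site in sites.items() if site in seq]
    if re.search(r"G{6,}|C{6,}", seq):
        problems.append("G/C homopolymer > 5 nt")
    return problems

# Invented usage tables (per thousand codons) covering only Leu and Lys.
source_usage = {"CTG": 50, "TTA": 14, "TTG": 13, "CTT": 11, "CTC": 10, "CTA": 4,
                "AAA": 34, "AAG": 11}
target_usage = {"CTC": 40, "CTG": 30, "TTG": 12, "CTT": 8, "TTA": 3, "CTA": 2,
                "AAG": 35, "AAA": 5}

recoded = rank_order_recode("ATGCTGAAACTTTAA", source_usage, target_usage)
print(recoded, failed_checks(recoded, {"HindIII": "AAGCTT"}))
```

In this toy case the recoded gene happens to contain a HindIII site, which is exactly the kind of failure that would trigger the redesign step. A fuller version would also scan the reverse complement for non-palindromic sites.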

“Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may include one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device and at least one output device. Such computer programs (also called programs, software, software applications or code) can include machine instructions for a programmable processor and can be implemented in any programming language, including high-level procedural and/or object-oriented programming languages and/or assembly/machine languages. A computer program can be deployed in any form, including as a stand-alone program or as a module, component, subroutine or other unit suitable for use in a computing environment. A computer program can be deployed to be executed or interpreted on one computer, on multiple computers at one site, or distributed across multiple sites and interconnected by a network.

A computer program can, in one embodiment, be stored on a computer-readable storage medium. A computer-readable storage medium stores computer data, which may include computer program code executable by a processor or computer system. A computer-readable medium can include computer-readable storage media, for tangible or fixed storage of data, or communication media, for transient interpretation of code-containing signals. Computer-readable storage media refers to physical or tangible storage, as opposed to signals, and includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the tangible storage of information such as computer-readable instructions, data structures, program modules or other data. Computer-readable storage media include, but are not limited to, RAM, EPROM, EEPROM, flash memory or other solid-state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or any other physical or material medium that can be used to tangibly store the desired information, data or instructions and that can be accessed by a computer or processor.

Program 210 could be a computer program that performs the functions and processes described above. Program 210 could contain various instructions and subroutines that, when loaded into memory 206 and executed by processor 204, cause processor 204 to perform various operations, some or all of which may carry out the methods, processes and/or functions described herein.

Although not shown, computer processing device 200 could include various forms of input and output. The I/O could include network adapters, USB adapters, Bluetooth radios, mice, keyboards, touchpads, displays, touch screens, speakers, microphones or sensors.

“Methods to Express Heterologous Enzymes”

“Composite nucleic acids can be constructed to include one or more nucleic acids encoding the energy conversion, methylotrophic and/or carbon fixation pathways described herein. The composite nucleic acids can then be introduced into a host organism so that it expresses one or more of the desired proteins. Composite nucleic acids can be made by operably linking nucleic acids encoding the protein(s) of interest to nucleic acid sequences that encode one or more standardized genetic parts. Standardized genetic parts are nucleic acid sequences that have been refined to conform to a set of technical standards, such as an assembly standard [Knight 2003; Shetty 2008; Shetty 2011]. Standardized genetic parts may encode transcriptional initiation and termination elements, translational elements, protein affinity tags, protein degradation tags, protein localization tags, selectable markers, replication elements, or other elements. The advantage of standardized genetic parts is that they can be independently validated and characterized and then readily combined with other parts to create functional nucleic acids [Canton 2008]. Mixing and matching standardized genetic parts that encode different expression control elements with nucleic acids encoding the proteins of interest can speed up the process of achieving soluble expression of those proteins and validating their function. The set of standardized parts could include constitutive promoters of varying strengths [Davis 2011], ribosome binding sites of varying strengths [Anderson 2007] and protein degradation tags of varying strengths [Andersen 1998].
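The mix-and-match idea can be pictured as enumerating every combination of one part from each category. The short Python sketch below does only that, under loud assumptions: the part names are hypothetical, the sequences are arbitrary placeholder strings rather than real promoters, ribosome binding sites or tags, and parts are joined by plain concatenation, whereas real assembly standards typically leave short junction sequences between parts.

```python
from itertools import product

# Hypothetical part libraries; sequences are arbitrary placeholders, not real parts.
PROMOTERS = {"pWeak": "TTGACTAT", "pStrong": "TTGACAAT"}
RBS = {"rbsLow": "AGGA", "rbsHigh": "AGGAGG"}
CDS = {"enzymeX": "ATGAAACTG"}                  # coding sequence without a stop codon
TAGS = {"noTag": "TAA", "degTag": "GCTGCTTAA"}  # C-terminal tag ending in a stop codon
TERMINATORS = {"term1": "TTTTTT"}

def build_library(*part_sets):
    """Join one part from each set, in the given order, for every combination."""
    library = {}
    for combo in product(*(parts.items() for parts in part_sets)):
        name = "-".join(part_name for part_name, _ in combo)
        library[name] = "".join(seq for _, seq in combo)
    return library

library = build_library(PROMOTERS, RBS, CDS, TAGS, TERMINATORS)
for name in sorted(library):
    print(name, library[name])
```

With two promoters, two ribosome binding sites and two tag options, a single coding sequence yields eight candidate constructs to screen for soluble expression.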

“For exogenous expression in Paracoccus and other prokaryotic cells, nucleic acids encoding the proteins of interest can be modified to add a solubility tag to the protein of interest, which helps ensure that the protein is soluble. For example, the addition of maltose binding protein to a protein has been shown to increase soluble expression in E. coli. Chaperone proteins such as DnaK, DnaJ, GroES and GroEL may also be co-expressed or overexpressed with the proteins of interest [Greene 1999; Kapust 1999; Sachdev 2000] to promote correct folding and assembly [Martinez Alonso 2009; Martinez Alonso 2010].

“For exogenous expression in Paracoccus and other prokaryotic cells, some nucleic acid sequences within the genes or cDNAs can encode targeting signals, such as an N-terminal mitochondrial or other targeting signal, which can be removed before transformation into the prokaryotic host cell. For example, removal of a mitochondrial leader sequence led to increased expression in E. coli [Hoffmeister 2005]. For exogenous expression in yeast and other eukaryotic cells, genes can be expressed in the cytosol without the addition of a leader or targeting sequence. Thus, a nucleic acid sequence can be modified to remove or include a targeting sequence and then incorporated into an exogenous nucleic acid sequence to impart the desired properties.

“Example 2 shows how to introduce exogenous nucleic acids into the methylotrophic bacterium Paracoccus versutus or Paracoccus denitrificans by conjugative plasmid transfer.

“Production of Central Metabolites for the Carbon-Based Products Of Interest”

“In some embodiments, the engineered or evolved methylotroph of the present invention produces central metabolites such as citrate, succinate, fumarate, dihydroxyacetone (DHA), dihydroxyacetone phosphate, 3-hydroxypropionate and pyruvate. The engineered or evolved methylotroph produces central metabolites as intermediates or products of the carbon fixation or methylotrophic pathway, or as intermediates or products of host metabolism. One or more transporters can be expressed in the engineered or evolved methylotroph so that the cell exports the central metabolite. For example, one or more members of the C4-dicarboxylate carrier family can export succinate from cells into the media [Janausch 2002; Kim 2007]. These central metabolites are readily converted into other products (FIG. 7).”

“In some instances, the engineered or evolved methylotroph interconverts central metabolites to produce alternative carbon-based products. In one embodiment, the engineered or evolved methylotroph produces aspartate by expressing one or more aspartate aminotransferases (E.C. 2.6.1.1) to convert oxaloacetate and L-glutamate to L-aspartate and 2-oxoglutarate.

“In another embodiment, the engineered and/or evolved methylotroph produces dihydroxyacetone phosphate by expressing one or more dihydroxyacetone kinases (E.C. 2.7.1.29), such as C. freundii DhaK, to convert dihydroxyacetone and ATP to dihydroxyacetone phosphate.

“In another embodiment, the engineered or evolved methylotroph produces serine as the carbon-based product. The metabolic reactions required for serine biosynthesis are phosphoglycerate dehydrogenase (E.C. 1.1.1.95), phosphoserine transaminase (E.C. 2.6.1.52) and phosphoserine phosphatase (E.C. 3.1.3.3). Phosphoglycerate dehydrogenase, such as E. coli SerA, converts 3-phospho-D-glycerate and NAD+ to 3-phosphonooxypyruvate and NADH. Phosphoserine transaminase, such as E. coli SerC, interconverts 3-phosphonooxypyruvate + L-glutamate and O-phospho-L-serine + 2-oxoglutarate. Phosphoserine phosphatase, such as E. coli SerB, converts O-phospho-L-serine to L-serine.
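One way to sanity-check a pathway like this is to write each reaction as substrates and products and confirm that the end product is reachable from the starting metabolites. The Python sketch below does so for the three serine biosynthesis steps, using the standard enzyme names for the three E.C. numbers. The data layout and the `reachable` helper are illustrative assumptions, water and inorganic phosphate are omitted as in the text, and reversibility, stoichiometry and thermodynamics are all ignored.

```python
# Each step: (enzyme, E.C. number, example gene product, substrates, products).
# Water and inorganic phosphate are left out, as in the text.
SERINE_PATHWAY = [
    ("phosphoglycerate dehydrogenase", "1.1.1.95", "E. coli SerA",
     {"3-phospho-D-glycerate", "NAD+"},
     {"3-phosphonooxypyruvate", "NADH"}),
    ("phosphoserine transaminase", "2.6.1.52", "E. coli SerC",
     {"3-phosphonooxypyruvate", "L-glutamate"},
     {"O-phospho-L-serine", "2-oxoglutarate"}),
    ("phosphoserine phosphatase", "3.1.3.3", "E. coli SerB",
     {"O-phospho-L-serine"},
     {"L-serine"}),
]

def reachable(start, pathway):
    """Return every metabolite that can be formed from the starting set."""
    pool = set(start)
    changed = True
    while changed:
        changed = False
        for _, _, _, substrates, products in pathway:
            # A step fires once all of its substrates are available.
            if substrates <= pool and not products <= pool:
                pool |= products
                changed = True
    return pool

pool = reachable({"3-phospho-D-glycerate", "NAD+", "L-glutamate"}, SERINE_PATHWAY)
print("L-serine" in pool)
```

Removing L-glutamate from the starting set stalls the pathway at the transaminase step, which makes the dependence on an amino-group donor explicit.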

“In another embodiment, the engineered or evolved methylotroph produces glutamate as the carbon-based product. The metabolic reactions required for glutamate biosynthesis include glutamate dehydrogenase (E.C. 1.4.1.4; e.g., E. coli GdhA), which converts α-ketoglutarate, NH3 and NADPH to glutamate. Glutamate can then be converted into other carbon-based products, as shown in FIG. 8.”

“In another embodiment, the engineered or evolved methylotroph produces itaconate as the carbon-based product. Itaconate biosynthesis requires aconitate decarboxylase (E.C. 4.1.1; such as the one from A. terreus), which converts cis-aconitate to itaconate and CO2. Itaconate can then be converted into other carbon-based products, as shown in FIG. 8.”

“Production of Sugars as Carbon-Based Products Of Interest”

“Industrial production of chemical products by biological organisms is often achieved using a sugar source, such as glucose or fructose, as the feedstock. Hence, in certain embodiments, the engineered and/or evolved methylotroph of the present invention produces sugars including glucose and fructose, or sugar phosphates including triose phosphates (such as 3-phosphoglyceraldehyde and dihydroxyacetone phosphate), as the carbon-based products of interest. Sugars and sugar phosphates may also be interconverted. For example, glucose-6-phosphate isomerase (E.C. 5.3.1.9; e.g., E. coli Pgi) may interconvert D-fructose-6-phosphate and D-glucose-6-phosphate. Phosphoglucomutase (E.C. 5.4.2.2; e.g., E. coli Pgm) converts D-α-glucose-6-P to D-α-glucose-1-P. Glucose-1-phosphatase (E.C. 3.1.3.10; e.g., E. coli Agp) converts D-α-glucose-1-P to D-α-glucose. Aldose 1-epimerase (E.C. 5.1.3.3; e.g., E. coli GalM) interconverts D-α-glucose and D-β-glucose. Optionally, the sugars and sugar phosphates can be exported from the engineered or evolved methylotroph into the culture medium.

“Sugar phosphates can be converted to sugars by dephosphorylation, which can occur intracellularly or extracellularly. For example, phosphatases such as a glucose-6-phosphatase (E.C. 3.1.3.9) or a glucose-1-phosphatase (E.C. 3.1.3.10) can be used. Exemplary phosphatases include Homo sapiens glucose-6-phosphatase G6PC (P35575), Escherichia coli glucose-1-phosphatase Agp (P19926), E. cloacae glucose-1-phosphatase AgpE (Q6EV19) and Escherichia coli acid phosphatase YihX (P0A8Y3).”

“Sugar phosphates can also be exported via transporters from the engineered or evolved methylotroph into the culture media. Sugar phosphate transporters generally act as anti-porters with inorganic phosphate. An example of a triose phosphate transporter is A. thaliana triose phosphate transporter APE2 (Genbank accession AT5G46110.4). Examples of glucose-6-phosphate transporters include E. coli sugar phosphate transporter UhpT (NP_418122.1), A. thaliana glucose-6-phosphate transporter GPT1 (AT5G54800.1) and A. thaliana GPT2, or homologs thereof. Dephosphorylation of glucose-6-phosphate can also be coupled to glucose transport, for example using the proteins with Genbank accession numbers AAA16222, AAD19898 and O43826.

Permeases can be expressed to allow diffusive efflux of sugars from the engineered or evolved methylotroph into the culture media. Exemplary sugar transporters include H. sapiens glucose transporter GLUT-1, -3 or -7 (P11166, P11169, Q6PXP3), S. cerevisiae glucose transporter HXT-1 (P32465, P32467, P39003), the transporter with accession P21906, Synechocystis sp. 1148 glucose/fructose:H+ symporter GlcP (T.C. 2.A.1.1.32; P15729) [Zhang 1989], Streptomyces lividans major glucose (or 2-deoxyglucose) uptake transporter GlcP (T.C. 2.A.1.1.35; Q7BEC4) [van Wezel 2005] and Plasmodium falciparum hexose transporter PfHT1 (T.C. 2.A.). Alternatively, one or more active transporters can be introduced into the cell to allow active efflux of sugars from the engineered and/or evolved methylotroph. Examples of transporters include the mouse glucose transporter (GLUT 1) and its homologs.
