WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Metabolic Networks and Their Evolution

Tags: Metabolic Network, Evolution

This note is on Wagner, A. (2012). Metabolic Networks and Their Evolution. In O. S. Soyer (Ed.), Evolutionary Systems Biology (Vol. 751, pp. 29–52). Springer New York.


Metabolic Networks are large systems of chemical reactions that serve two main purposes:

  1. convert sources of energy in the environment into forms of energy useful to an organism
  2. synthesize small molecules (the 20 amino acids in proteins, DNA/RNA nucleotides, lipids, and several enzyme cofactors) needed for cell growth from sources of chemical elements (nutrients) in the environment.

Most reactions are catalyzed by enzymes, which are encoded by genes.

The structure, function, and evolution of metabolic networks have attracted a great amount of research interest for many decades.

Older work focused primarily on small networks comprising a handful of reactions, or on linear sequences of reactions.

Experimental analysis of such small-scale systems involves classical biochemistry, which includes measurements of:

  1. enzyme activities
  2. reaction rate constants
  3. metabolic fluxes (the rates at which enzymes convert substrates into products)

Quantitative models of such small systems are kinetic models that use ordinary differential equations, whose parameters are the measured quantities listed above.
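A kinetic model of this kind can be sketched as follows. This is only a minimal illustration and is not taken from Wagner's chapter: it assumes a single irreversible step S → P with Michaelis-Menten kinetics and integrates it with a fixed-step Euler scheme. The function names and the parameter values (`vmax`, `km`, `dt`) are made up for the example.

```python
def michaelis_menten_flux(s, vmax, km):
    """Reaction rate (flux) of one enzyme at substrate concentration s."""
    return vmax * s / (km + s)


def simulate(s0, p0, vmax, km, dt=0.01, steps=1000):
    """Integrate dS/dt = -v and dP/dt = +v with the explicit Euler method."""
    s, p = s0, p0
    for _ in range(steps):
        v = michaelis_menten_flux(s, vmax, km)
        # Never consume more substrate than is left in this time step.
        delta = min(v * dt, s)
        s -= delta
        p += delta
    return s, p


# Hypothetical parameter values, chosen only for illustration.
s_end, p_end = simulate(s0=10.0, p0=0.0, vmax=1.0, km=0.5)
print(f"substrate left: {s_end:.3f}, product formed: {p_end:.3f}")
```

Even this one-reaction model needs two measured parameters, which is why the approach becomes hard to apply once a network has hundreds of enzymes.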

With the rise to prominence of systems biology, increasing attention started to focus on genome-scale metabolic systems. Such systems comprise not just a few but hundreds or even thousands of reactions.

Two technological and methodological advances made the analysis of such large metabolic networks feasible:

  1. the availability of complete genome sequences
  2. the ability to identify the complete or nearly complete set of chemical reactions that proceed in an organism’s metabolism.
  • Difficulties of a quantitative understanding of genome-scale metabolic networks:
    • it is difficult to estimate kinetic rate constants for hundreds of enzymes and
    • it is difficult to measure all metabolic fluxes in a large metabolic network (methods using isotopic tracers and other tools can measure the metabolic flux through many but not all reactions)

    Therefore, many approaches to understanding the function of genome-scale metabolic networks focus on coarser-grained representations of such networks.

    • An especially prominent and fruitful approach in this area is flux balance analysis (FBA), which requires only stoichiometric information about individual reactions and can predict the biosynthetic abilities of a network under some general assumptions.
  • An important goal of systems biology: predict a metabolic phenotype (the identity of the molecules that a metabolic network can synthesize), as well as their rate of synthesis, from a metabolic genotype (the set of enzymes encoded by a genome and their regulation)
    1. experimental techniques alone fall short at this scale, so computational approaches are needed.
    2. FBA proceeds in two steps:
      • uses constraints given by reaction stoichiometry, reversibility, and maximal nutrient uptake rates of an organism to predict the metabolic fluxes that are allowed in a metabolic steady state (attained by a cell population that is exposed to the same environment over extended periods of time, such as in a chemostat), for all network reactions.
      • then uses linear programming to identify those allowed metabolic fluxes that maximize certain desired phenotypic properties, such as ATP or NADPH production, or the rate at which biomass with a known chemical composition is produced.
    3. FBA is only one among several constraint-based techniques. Others:
      • MOMA: minimization of metabolic adjustment, aims to predict how metabolic networks react to loss of individual chemical reactions.
      • extreme pathway analysis, elementary mode analysis, and the minimal metabolic behavior (MMB): decompose allowable fluxes into minimal sets analogous to basis vectors
    4. main limitation of most constraint-based methods: they do not account for the regulation of enzymes, such as through transcriptional regulation. However, microbial laboratory evolution experiments have shown that within a few hundred generations, a microbial strain’s growth phenotype in a given environment can approach the FBA-predicted phenotype. This means that regulatory constraints can be overcome on short evolutionary time scales.
    5. to use constraint-based modeling for any one organism, the reactions in its metabolic network have to be known, as do its biomass composition, and nutrient uptake constraints.
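The two steps of FBA described above are usually written as a linear program. The symbols here are the conventional ones and are not taken from the note: $S$ is the stoichiometric matrix (one row per metabolite, one column per reaction), $v$ is the vector of fluxes, $c$ picks out the quantity to maximize (for example, a 1 in the position of the biomass reaction), and $v_i^{\min}, v_i^{\max}$ encode reversibility and maximal uptake rates.

$$
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& v_i^{\min} \le v_i \le v_i^{\max} \quad \text{for every reaction } i .
\end{aligned}
$$

The constraint $S v = 0$ is the steady-state condition: every metabolite is produced as fast as it is consumed. An irreversible reaction has $v_i^{\min} = 0$. MOMA, in its usual formulation, keeps the same constraints, sets the flux of the deleted reaction to zero, and replaces the linear objective with a quadratic one that stays as close as possible to the wild-type flux vector $v^{\mathrm{wt}}$:

$$
\min_{v} \; \lVert v - v^{\mathrm{wt}} \rVert^{2} \quad \text{subject to} \quad S v = 0, \quad v_i^{\min} \le v_i \le v_i^{\max} .
$$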

The reasons for focusing on genome-scale metabolic networks:

  1. a substantial amount has been learned about their structure and their evolution in recent years.
  2. they are the first systems that allow a comprehensive understanding of the relationship between a metabolic genotype (the DNA that encodes all metabolic enzymes an organism harbors) and a metabolic phenotype (the biosynthetic and energetic abilities of a metabolic network in a given environment.)

genome-scale metabolic networks are the first class of systems for which we can build a bridge between genotype and phenotype on the scale of entire organisms.

A metabolic network is a whole comprised of many enzyme parts. Its structure and function interact in two directions:

  1. the whole network constrains how its parts change over time, that is, natural selection on the function of the whole imposes constraints on the parts.
  2. the parts and their change influence the function of the whole.

two complementary perspectives:

  1. different aspects of the evolution of network parts, and how the whole network constrains this evolution.
  2. changes in these parts that can change the function of the whole. (MORE IMPORTANT, because it can teach us about how evolutionary change in metabolic networks can lead to new biosynthetic abilities.)

A whole constraining its parts

constrained evolution of network enzymes

three principal processes are relevant to evolution of a metabolic network’s parts (the enzymes that catalyze its reactions):

  1. the accumulation of changes–point mutations–in the DNA sequence of the genes encoding these enzymes.
  2. the duplication of enzyme-coding genes
  3. changes in the regulation of enzyme activities, for example through changes in the regulatory DNA sequences that help regulate the transcription of enzyme-coding genes.

Point mutation

two classes of point mutations

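  1. synonymous or silent mutations, which do not change the encoded amino acid; their rate is measured by $K_s$, the number of synonymous substitutions per synonymous site
  2. non-synonymous or amino acid replacement mutations; their rate is measured by $K_a$, the number of non-synonymous substitutions per non-synonymous site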

The ratio $K_a / K_s$ is less than one for most enzyme-coding genes. The smaller this ratio is, the fewer amino acid replacement changes have been tolerated in the evolutionary history of a gene. In other words, a gene with a very small ratio $K_a / K_s$ has experienced stronger purifying selection (tighter constraint) in its history than a gene with a large ratio $K_a / K_s$.

Evolutionary constraints can depend on an enzyme’s location in a genome-scale metabolic network, and on the metabolic flux through the enzyme.

Enzyme graph: two enzymes are connected if they share at least one metabolite as a substrate or as a product.

An enzyme’s connectivity can be viewed as a measure of its position in the network, and of how central a role it might play in the network.
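The enzyme graph and the connectivity measure can be sketched in a few lines. The toy network below is hypothetical (the enzyme and metabolite names are placeholders), and connectivity is taken to be the plain degree of an enzyme in the graph, which is one reasonable reading of the definition above.

```python
from itertools import combinations

# Hypothetical toy network: enzyme -> metabolites it uses or produces.
enzyme_metabolites = {
    "E1": {"A", "B"},
    "E2": {"B", "C"},
    "E3": {"B", "D"},
    "E4": {"D", "E"},
    "E5": {"F", "G"},
}


def enzyme_graph(enzyme_metabolites):
    """Connect two enzymes if they share at least one metabolite."""
    neighbors = {enzyme: set() for enzyme in enzyme_metabolites}
    for e1, e2 in combinations(enzyme_metabolites, 2):
        if enzyme_metabolites[e1] & enzyme_metabolites[e2]:
            neighbors[e1].add(e2)
            neighbors[e2].add(e1)
    return neighbors


def connectivity(neighbors):
    """Connectivity (degree) of each enzyme: its number of neighbors."""
    return {enzyme: len(adjacent) for enzyme, adjacent in neighbors.items()}


graph = enzyme_graph(enzyme_metabolites)
degrees = connectivity(graph)
print(degrees)
```

In this toy network `E3` is the most highly connected enzyme because metabolite `B` is shared with two other enzymes and `D` with a third, while `E5` is isolated.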

The connectivity of an enzyme can influence its rate of evolution. In the metabolic network of the yeast S. cerevisiae, more highly connected enzymes evolve more slowly, that is, their ratio $K_a / K_s$ is lower than for less connected enzymes. The likely reason lies in the effects of perturbations on the rate at which a highly connected enzyme catalyzes formation of its reaction product: products of highly connected enzymes may be substrates for many other reactions.

the association between enzyme connectivity and constraint is not strong and may even be absent in some groups of organisms, such as mammals.

analogous observations hold for enzymes with high metabolic flux. these are enzymes that turn over many molecules of substrate per unit time, and they are often involved in central metabolic processes. Specifically, enzymes with high flux tend to evolve more slowly. They can tolerate fewer amino acid changes than enzymes with low flux. the reason becomes clear if one considers that most amino acid substitutions will reduce rather than increase an enzyme’s activity, and thus reduce the metabolic flux that the enzyme can support.

In addition to the relationship between enzyme connectivity, flux, and constraints on enzyme evolution, several other observations have been made about the constrained evolution of metabolic genes. E.g., metabolic genes can be more constrained in their evolution than non-metabolic genes (what is a metabolic gene?). In addition, different classes of enzymes are constrained to a different degree.

In a minority of genes, the incidence of amino acid changing substitutions may actually exceed that of silent substitutions. In these genes, the ratio $K_a / K_s$ may exceed 1. Patterns like this indicate the action of positive selection, that is, one or more amino acid changes were favored by selection and have swept through an evolving population, which can explain the elevated rate of amino acid change. A ratio of $K_a / K_s$ that exceeds 1 indicates beneficial functional changes in a protein. Unfortunately, without detailed and laborious biochemical analyses it can be difficult to understand why a change is beneficial.
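The reading of the ratio used throughout this section can be summarized in a short sketch. The gene names and the values of $K_a$ and $K_s$ are invented for illustration, and the tolerance band around 1 is an arbitrary choice of this example; real analyses rely on statistical tests rather than a fixed cutoff.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of non-synonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks


def interpret(ratio, tolerance=0.05):
    """Coarse reading of the ratio; the tolerance band is an arbitrary choice."""
    if ratio < 1 - tolerance:
        return "purifying selection (constrained)"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "approximately neutral"


# Hypothetical (Ka, Ks) values for three genes, for illustration only.
genes = {"geneA": (0.02, 0.40), "geneB": (0.30, 0.31), "geneC": (0.50, 0.25)}
for name, (ka, ks) in genes.items():
    r = ka_ks_ratio(ka, ks)
    print(f"{name}: Ka/Ks = {r:.2f} -> {interpret(r)}")
```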

Gene Duplication

Gene duplication is a ubiquitous process in the evolution of most genomes. For example, as many as half of the genes in the human genome have a duplicate.

Gene duplications arise as by-products of DNA recombination and DNA repair processes that sometimes duplicate stretches of an organism’s DNA. The duplicated stretches can be very short, comprising only a few nucleotides, or they can be very long, comprising large segments of a chromosome, entire chromosomes, or even the entire genome. If any duplicated stretch of DNA includes at least one gene, a gene duplication has occurred. Most duplicate genes are eliminated from a genome shortly after the duplication. However, a small fraction of duplicates is usually preserved, indicating that their duplication either did no harm or was favored by selection. Over time duplicates may preserve a similar function, they may acquire specialized functions, or they may evolve completely new functions.

the metabolic significance of gene duplications is that they can increase the level of an enzyme’s expression. Enzymes that are products of duplicated genes may occur in higher concentrations in the cell, and they may therefore support greater metabolic flux through them. One might therefore predict that enzymes with high metabolic flux should often be the product of duplicate genes.
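The link between expression level and flux can be made explicit under a simple assumption that is not part of the note: the enzyme follows Michaelis-Menten kinetics, with turnover number $k_{\mathrm{cat}}$, Michaelis constant $K_M$, enzyme concentration $[E]$, and substrate concentration $[S]$.

$$
v = \frac{k_{\mathrm{cat}}\,[E]\,[S]}{K_M + [S]}, \qquad v_{\max} = k_{\mathrm{cat}}\,[E] .
$$

The flux $v$ is proportional to $[E]$ at any fixed $[S]$. If a duplication doubles the enzyme concentration from $[E]$ to $2[E]$, the maximal flux doubles from $k_{\mathrm{cat}}[E]$ to $2 k_{\mathrm{cat}}[E]$. Whether expression really doubles with gene copy number is a further assumption that need not hold for a given gene.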

the preservation of gene duplications is favored in enzyme-coding genes whose protein products catalyze high-flux reactions. Many such genes occur in central metabolism.

An extreme form of duplication is the duplication of an entire genome. Most duplicated genes typically get lost over time, and only a small fraction of them remain. It has been shown that the enzyme-coding genes preserved in duplicate after an ancient genome duplication in S. cerevisiae preferentially encode glycolytic enzymes. This preferential preservation allows a higher flux through glycolysis relative to other parts of yeast’s metabolism, because it increases the total amount of glycolytic enzymes relative to other enzymes.

The constraints that a whole metabolic network imposes on the duplication of its parts arise through the increased enzyme expression that such duplications cause.

Gene Regulation

Enzymes can be regulated at the level of their RNA expression, their protein expression, their biochemical activity (for example through phosphorylation), and in many other ways.

regulation is extremely malleable and can change on short evolutionary time-scales for many enzymes.

Parts Transforming the Whole

it is often useful to think of a metabolism as being partitioned into two major parts, a core and a periphery. core metabolism comprises processes central to life, while the periphery includes reactions that are needed to metabolize specific sources of chemical elements. The periphery also includes secondary metabolism.

Core metabolism is held to be highly optimized in different ways. E.g., it has been suggested that among a number of alternative “designs” of the TCA cycle, the structure of the cycle realized in nature uses the smallest number of chemical transformations and produces the highest yield in ATP.

although changes in core metabolism do occur, variation in the reaction complement of a metabolic network tends to be more frequent in the periphery of metabolism.

Reaction Deletions

The elimination of reactions can occur through loss-of-function mutations in enzyme-coding genes.

FBA can predict the spectrum of molecules that can be synthesized by a given metabolic network from a set of nutrients in the environment.

FBA is also useful to reconstruct the evolutionary trajectory that can transform a complex metabolic network like that of E. coli into the much simpler network of its relative Buchnera through a sequence of mutations that eliminate enzyme-coding genes and reactions from a metabolic network.
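The idea of shrinking a network by successive reaction deletions can be illustrated without a full FBA model. The sketch below swaps FBA for a much cruder test, plain reachability: a network counts as viable if every biomass molecule can be reached from the nutrients by repeatedly firing reactions whose substrates are available. The reactions, metabolite names, and function names are all hypothetical.

```python
def producible(reactions, nutrients):
    """Metabolites reachable from the nutrients by repeatedly firing reactions.

    Each reaction is a (substrates, products) pair of sets; a reaction can
    fire once all of its substrates are available.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def essential_reactions(reactions, nutrients, biomass):
    """Reactions whose deletion makes some biomass molecule unproducible."""
    essential = []
    for name in reactions:
        remaining = {k: v for k, v in reactions.items() if k != name}
        if not biomass <= producible(remaining, nutrients):
            essential.append(name)
    return essential


def minimize(reactions, nutrients, biomass, order):
    """Delete reactions in the given order whenever viability is preserved."""
    current = dict(reactions)
    for name in order:
        trial = {k: v for k, v in current.items() if k != name}
        if biomass <= producible(trial, nutrients):
            current = trial
    return set(current)


# Hypothetical toy network with two alternative routes from B to D.
reactions = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B"}, {"C"}),
    "r3": ({"C"}, {"D"}),
    "r4": ({"B"}, {"D"}),
}
print(essential_reactions(reactions, nutrients={"A"}, biomass={"D"}))
print(minimize(reactions, {"A"}, {"D"}, order=["r4", "r3", "r2", "r1"]))
print(minimize(reactions, {"A"}, {"D"}, order=["r2", "r3", "r4", "r1"]))
```

In this toy case only `r1` is essential in the full network, and the reduced network that remains depends on the order in which deletions are tried.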

Reaction Additions

Reactions can be added to a network by several mechanisms, including:

  1. duplication
  2. horizontal gene transfer

Horizontal gene transfer occurs both in prokaryotes and eukaryotes, but it is much more prevalent in prokaryotes. It can change genome organization on short evolutionary time-scales.

metabolic genes that are preserved after horizontal transfer are often responsible for metabolic reactions that transport and metabolize nutrients.

A Systematic Analysis of Metabolic Innovation

Beyond the well-worn idea that innovations require a combination of mutation and natural selection, we know little about the principles underlying their origins. To identify such principles requires that one can study the relationship between genotype and phenotype systematically, not just for one genotype and one phenotype, but for many genotypes and many phenotypes.

Systems where one can predict phenotype from genotype are currently the best starting points for understanding principles of innovation.

An organism’s metabolic genotype is the part of the organism’s genome that encodes metabolic enzymes. It is often more expedient to represent this genotype more compactly, such as through the presence or absence of specific enzyme-catalyzed reactions in the network.

The currently known universe of metabolic reactions comprises more than 5000 reactions, each of which can be present or absent in the metabolic network of any one organism. The set of all possible metabolic networks forms a vast collection, a space of metabolic genotypes. The number of networks in this space is much larger than the number of metabolic networks that could have existed on earth since life’s origin.

define a distance between metabolic genotypes as the fraction of metabolic reactions in which these genotypes differ. Two genotypes (metabolic networks) would differ maximally if they did not share a single reaction.
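One way to turn this definition into code is shown below. It represents a genotype as the set of reactions present in the network and normalizes the number of differing reactions by the number of reactions present in either network (a Jaccard distance). That normalization is an assumption of this sketch; it is chosen because it gives the maximal distance of 1 exactly when two networks share no reaction. The reaction identifiers are placeholders.

```python
def genotype_distance(g1, g2):
    """Fraction of reactions, among those present in either network,
    that the two networks do not share (a Jaccard distance)."""
    union = g1 | g2
    if not union:
        return 0.0  # two empty networks are identical
    return len(g1 ^ g2) / len(union)


# Hypothetical reaction identifiers.
net_a = {"r1", "r2", "r3", "r4"}
net_b = {"r3", "r4", "r5"}
net_c = {"r6", "r7"}

print(genotype_distance(net_a, net_b))  # 3 differing reactions out of 5
print(genotype_distance(net_a, net_c))  # no shared reaction: maximal distance
```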

metabolic genotype space is a high dimensional space with many counterintuitive properties, whose structure is akin to that of hypercubes–cubes in multidimensional spaces.
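The size of this space is easy to check with exact integer arithmetic. The sketch assumes a universe of exactly 5000 reactions, the round figure quoted above, and treats each genotype as a corner of a 5000-dimensional hypercube.

```python
n_reactions = 5000  # round figure quoted in the note

# Each reaction is either present or absent, so there are 2**N genotypes.
n_genotypes = 2 ** n_reactions
n_digits = len(str(n_genotypes))
print(f"2^{n_reactions} has {n_digits} decimal digits")

# Each corner of an N-dimensional hypercube has N adjacent corners:
# every genotype has N neighbors that differ in exactly one reaction.
n_neighbors = n_reactions
print(f"each genotype has {n_neighbors} single-reaction neighbors")
```

A number with more than 1500 decimal digits is the sense in which the note calls such counts hyperastronomical.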

To classify metabolic phenotypes, it is expedient to focus on metabolism’s central task, the ability to sustain life (to synthesize all biomass molecules) in different chemical environments. For example, if one focuses on carbon metabolism, one can ask which molecules can serve as sole carbon and energy sources for a metabolic network.

Consider a population of organisms, each member of which may have a different metabolic genotype, as a collection of points in metabolic genotype space. Such a population explores metabolic genotype space through mutation (changes in enzyme-coding genes that add or delete reactions from a network) and natural selection that preserves well-adapted phenotypes. Suppose that individuals in this population have a metabolic phenotype that is well adapted to the population’s current environment.

Finding novel and superior metabolic phenotypes through a blind evolutionary search, conducted by a population in the vast metabolic genotype space, faces two major difficulties:

  1. if only one or a few metabolic genotypes in this space have the superior phenotype, then, because the space is so large, it would be difficult or impossible to find these genotypes in realistic amounts of time.
  2. during this search, individuals in a population have to preserve their old phenotype, which allows them to survive on existing nutrients.

two major features of the space can help overcome them

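  1. there are not few but hyperastronomically many genotypes with a given metabolic phenotype. These genotypes are connected in metabolic genotype space in the following sense: one can move from one to another through a series of single-reaction changes, each of which preserves the phenotype. Such a connected set is called a genotype network.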
  2. the spectrum of new phenotypes in the neighborhood of one metabolic genotype is typically not identical to that in the neighborhood of another genotype. Different neighborhoods of metabolic networks contain different novel phenotypes.


In summary:

  1. genotypes with the same phenotype form large and far-reaching genotype networks.
  2. the neighborhoods of different genotypes on the same genotype network typically contain different metabolic phenotypes.
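Both features can be seen in a toy genotype space small enough to explore exhaustively. Everything below is hypothetical: a universe of six made-up reactions, three carbon sources (`glc`, `ace`, `eth`), one biomass molecule (`bio`), and a reachability test standing in for FBA as the viability criterion. A phenotype is the set of carbon sources on which biomass can be made.

```python
from collections import deque

# Hypothetical universe of reactions: name -> (substrates, products).
UNIVERSE = {
    "r1": ({"glc"}, {"X"}),
    "r2": ({"X"}, {"bio"}),
    "r3": ({"glc"}, {"Y"}),
    "r4": ({"Y"}, {"bio"}),
    "r5": ({"ace"}, {"X"}),
    "r6": ({"eth"}, {"Y"}),
}
CARBON_SOURCES = ("glc", "ace", "eth")


def viable(genotype, source):
    """True if 'bio' is reachable from a single carbon source."""
    available, changed = {source}, True
    while changed:
        changed = False
        for name in genotype:
            substrates, products = UNIVERSE[name]
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return "bio" in available


def phenotype(genotype):
    """Set of carbon sources on which the network can make biomass."""
    return frozenset(s for s in CARBON_SOURCES if viable(genotype, s))


def neighbors(genotype):
    """All genotypes that differ by the presence of exactly one reaction."""
    return [genotype ^ {name} for name in UNIVERSE]


def genotype_network(start):
    """Genotypes reachable from start by phenotype-preserving single changes."""
    target, seen, queue = phenotype(start), {start}, deque([start])
    while queue:
        for nb in neighbors(queue.popleft()):
            if nb not in seen and phenotype(nb) == target:
                seen.add(nb)
                queue.append(nb)
    return seen


def novel_phenotypes(genotype):
    """Phenotypes in the one-mutant neighborhood other than the genotype's own."""
    own = phenotype(genotype)
    return {phenotype(nb) for nb in neighbors(genotype)} - {own}


start = frozenset({"r1", "r2"})
network = genotype_network(start)
print(len(network), "genotypes share phenotype", set(phenotype(start)))
for g in (start, frozenset({"r3", "r4"})):
    print(sorted(g), "->", [sorted(p) for p in novel_phenotypes(g)])
```

In this toy space the genotypes `{r1, r2}` and `{r3, r4}` share no reaction, yet they lie on the same genotype network for growth on `glc` alone. Their neighborhoods differ: a single added reaction gives growth on `glc` and `ace` next to the first genotype, and growth on `glc` and `eth` next to the second.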

the features of metabolic genotype space occur in systems whose genotype-phenotype relationship is such that more genotypes than phenotypes exist, and where phenotypes are to some extent robust to changes in genotype.

Conclusions and Future Challenges

many of the studies discussed are based on comparative analyses of metabolic networks, aided by computational predictions of metabolic phenotypes.

The ability to predict metabolic phenotype from metabolic genotype has opened completely new avenues for a systematic understanding of metabolic innovation. It allows us to study metabolic innovations not one by one, as case studies in natural history, but systematically, as part of a metabolic genotype space that encapsulates all possible metabolisms.
