Evolution of Mitochondrial Genomes and the Mitochondrial Genetic Code
Paul Higgs
Dept. of Physics and Astronomy, McMaster University, Hamilton, Ontario, Canada.
http://physwww.mcmaster.ca/~higgsp/Home.htm
Mitochondria are components of cells that possess their own genome, distinct from the main genome in the nucleus. In animal species, the mitochondrial genome is approximately 16,000 bases long and contains only 37 genes. We have built a relational database called OGRe (Jameson et al. 2003) that contains the genomes of over 700 species. It is available online at http://ogre.mcmaster.ca and allows comparison of gene sequences, codon usage and the order of genes across species.
The genetic code is the set of assignments between the 64 possible codons in DNA and the 20 possible amino acids in proteins (Figure 1). The code controls the process of translation (i.e. protein synthesis). The genetic code contains eight blocks of four codons in which the base at the third position can mutate synonymously, without changing the encoded amino acid. The frequencies of bases at these four-fold degenerate (FFD) sites respond directly to mutation pressure, i.e. the frequency of each base can increase or decrease according to the relative rates of mutation to and from that base. Due to the asymmetric replication mechanism of the mitochondrial genome, the two strands of DNA are not equivalent (Reyes et al. 1998). The frequency of a given base differs between the two strands, and the four bases are not equally frequent within one strand (e.g. %G on one strand is equal to %C on the complementary strand, but %G and %C on the same strand are not equal). Base frequencies at FFD sites vary greatly between species, indicating that the relative rates of the different types of mutation differ between species.
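As an illustration of how base frequencies at FFD sites can be measured, the following is a minimal sketch rather than the procedure used in OGRe. It assumes a single in-frame coding sequence written in the DNA alphabet, and the function name `ffd_base_frequencies` is hypothetical. The eight FFD blocks are identified by their first two bases (CUN, GUN, UCN, CCN, ACN, GCN, CGN and GGN, written with T for U).

```python
from collections import Counter

# First two bases of the eight four-fold degenerate codon blocks
# (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FFD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def ffd_base_frequencies(coding_seq):
    """Return the frequency of each base at the third position of FFD codons.

    Assumes coding_seq is in frame (position 0 is a first codon position).
    """
    seq = coding_seq.upper().replace("U", "T")
    counts = Counter()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Skip codons outside the FFD blocks and ambiguous third bases.
        if codon[:2] in FFD_PREFIXES and codon[2] in "ACGT":
            counts[codon[2]] += 1
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}


# Codons: CTA, GTA, CCC, ATG (not FFD), GGG
print(ffd_base_frequencies("CTAGTACCCATGGGG"))
# {'A': 0.5, 'C': 0.25, 'G': 0.25, 'T': 0.0}
```

Applying the same function to the genes of each species, one strand at a time, would give the kind of between-species comparison described above.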
At first and second positions in the codon, mutations in the DNA usually cause changes at the amino acid level. Many amino acid changes will be prevented by natural selection if they disrupt the protein function. Therefore, selection on the amino acid sequence reduces the degree of base frequency variation at first and second positions relative to the FFD sites. Nevertheless, mutation pressure on the DNA is sufficient to cause some variation in amino acid frequencies too (Bharanidharan et al. 2004; Singer & Hickey, 2000).
Using a simple evolutionary model, we show that first position sites in mitochondria are less constrained by selection than second position sites, and therefore that the frequencies of bases at first position are more responsive to mutation pressure than those at second position (Urbina et al. 2006). We define a measure of distance between amino acids that depends on eight measured physical properties, and a similarity measure that is the inverse of this distance. Columns 1, 2, 3 and 4 of the genetic code correspond to codons with U, C, A and G in their second position, respectively. The similarity of amino acids within a column decreases systematically from column 1 to 2 to 3 to 4. We then show that the responsiveness of first position bases to mutation pressure depends on the second position base, and follows the same decreasing trend through the four columns. This shows that the more different two amino acids are in their physical properties, the more strongly selection acts against the amino acid change, and the less frequently the change is seen in the data. The same argument can be used to predict the responsiveness of individual amino acid frequencies to mutation pressure: the amino acids whose frequencies vary most are those whose neighbouring amino acids in the code structure are most similar to them.
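One way such a distance and similarity could be computed is sketched below. The abstract does not specify the eight properties or the exact metric, so the sketch assumes that each property is standardised to zero mean and unit variance and that the distance is Euclidean. The labels `aa1`, `aa2`, `aa3` and their numbers are placeholders, not measured values.

```python
import math


def standardise(props):
    """Rescale each property to zero mean and unit variance across amino acids."""
    names = list(props)
    n_props = len(props[names[0]])
    columns = []
    for k in range(n_props):
        vals = [props[a][k] for a in names]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        columns.append([(v - mean) / sd for v in vals])
    return {a: [columns[k][i] for k in range(n_props)]
            for i, a in enumerate(names)}


def distance(z, a, b):
    """Euclidean distance between two amino acids in standardised property space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(z[a], z[b])))


def similarity(z, a, b):
    """Inverse of the distance (undefined when a and b have identical properties)."""
    return 1.0 / distance(z, a, b)


# Placeholder property vectors (two made-up properties per entry).
toy = {"aa1": [1.0, 10.0], "aa2": [2.0, 10.0], "aa3": [5.0, 30.0]}
z = standardise(toy)
print(similarity(z, "aa1", "aa2") > similarity(z, "aa1", "aa3"))  # True
```

With real data, `props` would hold eight values per amino acid, and the mean similarity between amino acids within one column of the code could then be compared across the four columns.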
These results are linked to the observation that the genetic code appears to be optimized (Freeland & Hurst, 1998). Codons differing by a single mutation usually code for amino acids with similar properties. For this reason, the canonical code minimizes the effect of mutational and translational errors. The real code is better than almost all randomly reshuffled codes in this respect. This shows that natural selection influenced the evolution of the canonical code very early in its history.
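The comparison with reshuffled codes can be sketched as follows. This is a simplified illustration, not a reproduction of Freeland & Hurst's calculation. It assumes the Kyte-Doolittle hydropathy scale as the single amino acid property, weights all single-base changes equally, and generates random codes by permuting the 20 amino acids among the existing synonymous codon blocks with stop codons held fixed. The function names are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Canonical code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' is Stop.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CANONICAL = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AAS)}

# Kyte-Doolittle hydropathy values (an assumed stand-in property).
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
      "R": -4.5}


def code_cost(code):
    """Mean squared hydropathy change over all single-base sense-to-sense changes."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                other = code[codon[:pos] + b + codon[pos + 1:]]
                if other == "*":
                    continue
                total += (KD[aa] - KD[other]) ** 2
                n += 1
    return total / n


def shuffled_code(code, rng):
    """Permute the 20 amino acids among the synonymous blocks; keep stops fixed."""
    aas = sorted(set(code.values()) - {"*"})
    perm = aas[:]
    rng.shuffle(perm)
    mapping = dict(zip(aas, perm))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in code.items()}


def fraction_better(code, trials=1000, seed=0):
    """Fraction of shuffled codes with a lower cost than the given code."""
    rng = random.Random(seed)
    real = code_cost(code)
    better = sum(code_cost(shuffled_code(code, rng)) < real
                 for _ in range(trials))
    return better / trials


print(code_cost(CANONICAL), fraction_better(CANONICAL))
```

A small value of `fraction_better` would indicate that few reshuffled codes outperform the real one under this particular cost. The number obtained depends on the property and weighting chosen.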
Most organisms use the canonical genetic code, which evolved prior to the last common ancestor of all current life. However, many modified genetic codes are found in specific genomes in which one or more codons have been reassigned to a different amino acid (Knight et al. 2001). The majority of codon reassignments occur in mitochondrial genomes. Figure 1 shows the three codon reassignments found in the vertebrate mitochondrial genome.
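A modified code can be represented as the canonical code plus a short list of reassignments. The sketch below builds the vertebrate mitochondrial code in this way: UGA is read as Trp, AUA as Met, and AGA and AGG as Stop. Codons are written in the DNA alphabet (T for U), and the names `VERT_MITO` and `translate` are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Canonical code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' is Stop.
CANONICAL_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CANONICAL = {"".join(c): a
             for c, a in zip(product(BASES, repeat=3), CANONICAL_AAS)}

# Reassignments in the vertebrate mitochondrial code.
REASSIGNMENTS = {"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}
VERT_MITO = {**CANONICAL, **REASSIGNMENTS}


def translate(seq, code):
    """Translate an in-frame sequence, stopping at the first stop codon."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = code[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


gene = "ATGATATGAAGA"
print(translate(gene, CANONICAL))  # MI
print(translate(gene, VERT_MITO))  # MMW
```

The same sequence gives different proteins under the two codes, which is why the correct translation table must be chosen when analysing mitochondrial genes.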
The molecular mechanisms that give rise to codon reassignments are well understood in many cases. Usually the changes occur in tRNAs, such as a mutation in the anticodon or the modification of a base to a non-standard base. The puzzle is to understand how such a change became fixed in a population. It should be a difficult and disruptive process for an organism to go through because of the negative selective effects that occur during the change-over period. I will discuss a new theory for codon reassignment that incorporates four possible mechanisms of codon reassignment (Sengupta and Higgs, 2005). I will then discuss the data from mitochondrial genome sequences that can be used to find the phylogenetic locations of reassignments and to determine which of the mechanisms has occurred in real cases.
References
Bharanidharan D, Bhargavi GR, Uthanumallian K, Gautham N (2004) Correlations between nucleotide frequencies and amino acid composition in 115 bacterial species. Biochem. Biophys. Res. Comm. 315: 1097-1103.
Freeland SJ, Hurst LD (1998) The genetic code is one in a million. J. Mol. Evol. 47: 238-248.
Jameson D, Gibson AP, Hudelot C, Higgs PG (2003) OGRe: a relational database for comparative analysis of mitochondrial genomes. Nucl. Acids Res. 31: 202-206.
Knight RJ, Freeland SJ, Landweber LF (2001) Rewiring the keyboard: Evolvability of the genetic code. Nature Reviews Genetics 2: 49-58.
Reyes A, Gissi C, Pesole G, Saccone C (1998) Asymmetrical directional mutation pressure in the mitochondrial genome of mammals. Mol. Biol. Evol. 15: 957-966.
Sengupta S, Higgs PG (2005) A Unified Model of Codon Reassignment in Alternative Genetic Codes. Genetics 170: 831-840.
Singer GAC, Hickey DA (2000) Nucleotide bias causes a genome wide bias in the amino acid composition of proteins. Mol. Biol. Evol. 17: 1581-1588.
Urbina D, Tang B, Higgs PG (2006) The response of amino acid frequencies to directional mutational pressure in mitochondrial genome sequences is related to the physical properties of the amino acids and to the structure of the genetic code. J. Mol. Evol. (in press). Online at http://dx.doi.org/10.1007/s00239-005-0051-1
| First position | Second position U | Second position C | Second position A | Second position G | Third position |
|---|---|---|---|---|---|
| U | Phe | Ser | Tyr | Cys | U |
| U | Phe | Ser | Tyr | Cys | C |
| U | Leu | Ser | Stop | Trp | A |
| U | Leu | Ser | Stop | Trp | G |
| C | Leu | Pro | His | Arg | U |
| C | Leu | Pro | His | Arg | C |
| C | Leu | Pro | Gln | Arg | A |
| C | Leu | Pro | Gln | Arg | G |
| A | Ile | Thr | Asn | Ser | U |
| A | Ile | Thr | Asn | Ser | C |
| A | Met | Thr | Lys | Stop | A |
| A | Met | Thr | Lys | Stop | G |
| G | Val | Ala | Asp | Gly | U |
| G | Val | Ala | Asp | Gly | C |
| G | Val | Ala | Glu | Gly | A |
| G | Val | Ala | Glu | Gly | G |
Figure 1 - The vertebrate mitochondrial genetic code. This differs from the canonical code in three ways: UGA is Trp instead of Stop; AUA is Met instead of Ile; AGA and AGG are Stop instead of Arg.