Proteins, evolution of

Proteins are large organic molecules that are involved in all aspects of cell structure and function. They are made up of polypeptide chains, each constructed from a basic set of 20 amino acids, covalently linked in specific sequences. Each amino acid is coded by three successive nucleotide residues in deoxyribonucleic acid (DNA); the sequence of amino acids in a polypeptide chain, which determines the structure and function of the protein molecule, is thus specified by a sequence of nucleotide residues in DNA. See Gene; Protein.

Sequence analyses of polypeptides which are shared by diverse taxonomic groups have provided considerable information regarding the genetic events that have accompanied speciation. Interspecies comparison of the amino acid sequences of functionally similar proteins has been used to estimate the amount of genetic similarity between species; species that are genetically more similar to each other are considered to be evolutionarily more closely related than those that are genetically less similar.
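
As an illustration of such an interspecies comparison, the following minimal sketch computes the fraction of identical residues between two sequences. It assumes the sequences have already been aligned to equal length, with "-" marking a gap; the function name and the example sequences are hypothetical and are not taken from any real protein.

```python
def fraction_identical(aligned_a: str, aligned_b: str) -> float:
    """Fraction of aligned, non-gap positions with the same amino acid.

    Assumes both strings come from the same alignment (equal length)
    and use one-letter amino acid codes, with '-' marking a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # positions with an insertion or deletion are skipped
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return identical / compared


# Hypothetical five-residue fragments differing at one position.
similarity = fraction_identical("MKFLV", "MRFLV")
```

Skipping gapped positions is only one possible convention; counting each gap as a difference would give a lower similarity for sequences with insertions or deletions.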

The study of functionally related proteins from different animal species has suggested that single amino acid substitutions are the predominant type of change during evolution of such proteins. Insertions or deletions of one or more amino acids have also been reported. In proteins that serve the same function in dissimilar species, small differences in the amino acid sequence often do not affect the overall functioning of the protein molecule.

In taxonomic protein sequence analysis, the amino acid sequence of a protein from one species is compared with the amino acid sequence of the protein from another species, and the minimum number of nucleotide replacements (in DNA) required to shift from one amino acid to another is calculated. Peptide “genealogies” can be constructed from many such comparisons in a related group of organisms.
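
The calculation of the minimum number of nucleotide replacements can be sketched as follows. This is an illustrative sketch only: it assumes the standard genetic code, DNA codons written with T rather than U, and two amino acid sequences that are already aligned without gaps. The function names and the three-residue example are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# with the third position varying fastest ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(amino_acid: str) -> list[str]:
    """All codons that code for the given one-letter amino acid."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return codons


def min_base_changes(aa1: str, aa2: str) -> int:
    """Fewest nucleotide replacements (0 to 3) that turn some codon
    for aa1 into some codon for aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


def min_replacements(protein_a: str, protein_b: str) -> int:
    """Sum of per-position minimum base changes for two aligned,
    gap-free amino acid sequences of equal length."""
    if len(protein_a) != len(protein_b):
        raise ValueError("sequences must have the same length")
    return sum(min_base_changes(a, b) for a, b in zip(protein_a, protein_b))


# Hypothetical example: Met-Lys-Phe compared with Met-Arg-Leu.
total = min_replacements("MKF", "MRL")
```

In this example lysine to arginine needs at least one change (AAA to AGA) and phenylalanine to leucine needs at least one (TTT to TTA), so the total is 2. The count is a lower bound: the actual number of nucleotide changes separating the two genes may be larger.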

Classical versus protein-derived phylogenies

It has long been recognized that the amino acid sequence of a given protein is species-specific. Protein sequencing has been used widely since the mid-1960s to examine taxonomic relationships. Results indicate that, in general, genealogical relationships (phylogenies) based on sequence analyses correspond fairly well with the phylogeny of organisms as deduced from more classical methods involving morphological and paleontological data.

Evolutionary biologists are turning increasingly to the new nucleic acid sequencing technology as an alternative to determining the amino acid sequence of proteins. Knowing the actual nucleotide sequences of genes rather than having to infer them from protein sequence data allows more accurate data to be used in determining genealogical relationships of organisms. For example, silent nucleotide substitutions (that is, base changes in DNA codons that do not result in amino acid changes) can be detected.
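
To illustrate why silent substitutions are visible only at the nucleotide level, the sketch below compares two coding sequences codon by codon and separates differences that leave the amino acid unchanged from those that alter it. It assumes the standard genetic code, sequences aligned in the same reading frame with no gaps, and a length that is a multiple of three; the function name and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# with the third position varying fastest ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(dna_a: str, dna_b: str) -> tuple[int, int]:
    """Return (silent, replacement) counts of differing codons.

    A differing codon pair is 'silent' if both codons code for the same
    amino acid and 'replacement' otherwise. Counts are per codon, not
    per nucleotide.
    """
    dna_a, dna_b = dna_a.upper(), dna_b.upper()
    if len(dna_a) != len(dna_b) or len(dna_a) % 3 != 0:
        raise ValueError("sequences must be equal length, a multiple of 3")

    silent = replacement = 0
    for i in range(0, len(dna_a), 3):
        codon_a, codon_b = dna_a[i:i + 3], dna_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1
        else:
            replacement += 1
    return silent, replacement


# Hypothetical example: AAA and AAG both code for lysine (silent),
# whereas TTT (phenylalanine) to CTT (leucine) changes the amino acid.
counts = classify_codon_differences("ATGAAATTT", "ATGAAGCTT")
```

A comparison of the two translated proteins would register only the phenylalanine to leucine difference; the lysine codon change is lost once the sequences are read as amino acids.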

Orthologous and paralogous sequences

The reconstruction of phylogenies from analysis of protein sequences is based on the assumption that the genes coding for the proteins are homologous, that is, descendants of a common ancestor. Those sequences whose evolutionary history reflects that of the species in which they are found are referred to as orthologous. The cytochrome c molecules (present in all eukaryotes) are an example of an orthologous gene family. Organisms as diverse as humans and yeast have a large proportion of the amino acids in these molecules in common; they derive from a single ancestral gene present in a species ancestral to both these organisms and to numerous others.

Sequences which are descendants of an ancestral gene that has duplicated are referred to as paralogous. Paralogous genes evolve independently within each species. The genes coding for the human α, β, γ, δ, ε, and ζ hemoglobin chains are paralogous. Their evolution reflects the changes that have accumulated since these genes duplicated. Analysis of paralogous genes in a species serves to construct gene phylogenies, that is, the evolutionary history of duplicated genes within a given lineage.

Rate of evolution

Sequence data from numerous proteins have shown that different proteins evolve at different rates. Some proteins show fewer amino acid substitutions, that is, more conservation, than others. Proteins such as immunoglobulins, snake venom toxins, and albumins have changed extensively. Their functions apparently require relatively less specificity of structure, so these proteins tolerate relatively more sequence variation. By contrast, certain proteins, such as various histones, have changed relatively little over long periods of time. Histone H4 shows extreme conservation; it has essentially the same sequence in all eukaryotes examined. Such extensive sequence conservation is generally interpreted to indicate that the functions of H4 are extremely dependent on its entire structure; thus little or no change is tolerated in its structure. The rates at which different proteins evolve are therefore thought to reflect different functional constraints on the structure of the proteins: the more stringent the conditions that determine the function of a protein molecule, the smaller the chance that a random change will be tolerated in its structure.

Each protein generally has a nearly constant evolutionary rate (the rate of acceptance of mutations) in each line of descent. Exceptions to this rule have been reported, however, and much effort has been spent on determining whether these anomalies are genuine. Some anomalies have been shown to be due to comparison of nonhomologous proteins, and others to sequencing errors. Other deviations from a constant rate of sequence evolution remain to be explained; once understood, these may provide useful information about the mechanisms of evolution at the molecular level.
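
One common way to put a number on such a rate, shown here only as a sketch, is to divide the estimated number of substitutions per site by twice the time since two lineages diverged, because changes accumulate along both lines of descent. The Poisson correction for multiple changes at the same site is an assumption introduced for this example and is not part of the discussion above; the function names and the input values are hypothetical and do not describe any real protein.

```python
import math


def substitutions_per_site(fraction_different: float) -> float:
    """Estimated substitutions per site from the observed fraction of
    differing sites, using a Poisson correction for repeated changes
    at the same site (an assumption of this sketch)."""
    if not 0.0 <= fraction_different < 1.0:
        raise ValueError("fraction_different must be in [0, 1)")
    return -math.log1p(-fraction_different)  # equals -ln(1 - p)


def rate_per_site_per_year(fraction_different: float,
                           divergence_years: float) -> float:
    """Substitutions per site per year in a single line of descent.

    The factor of 2 reflects that both lineages have been accumulating
    changes since their common ancestor.
    """
    if divergence_years <= 0:
        raise ValueError("divergence_years must be positive")
    return substitutions_per_site(fraction_different) / (2.0 * divergence_years)


# Hypothetical inputs: 50% of sites differ, divergence 100 million years ago.
rate = rate_per_site_per_year(0.5, 1.0e8)
```

Under a constant rate, the same calculation applied to different pairs of species should give similar values for a given protein; a markedly different value for one pair would be the kind of anomaly described above.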