Table 1. Likely origins of plastids inferred under monophyly or different models of polyphyly.
unique               unique               monophyletic    monophyletic
closely related      closely related      monophyletic    monophyletic
closely related      distantly related    polyphyletic    monophyletic
distantly related    closely related      monophyletic    polyphyletic
distantly related    distantly related    polyphyletic    polyphyletic
cult to transport across the plastid envelope. It would therefore be difficult to relocate the genes for such proteins to the nucleus, with consequent synthesis of the proteins in the cytosol and post-translational import into the organelle. However, several studies have shown that individual plastid genes can be artificially introduced into the nucleus and, if the genes have been modified by the fusion of a region encoding a plastid-targeting sequence to the coding sequence for the mature protein, the resulting protein can be re-imported effectively into the organelle (Cheung et al. 1988; Kanevski & Maliga 1994). Although such studies show that these proteins can, in principle, be re-imported into the organelle, they do not exclude the possibility that the need for import may cause a minor reduction in the fitness of the organism. A second suggestion for the retention of genes by the plastid is that it allows a rapid regulation of expression in response to the redox state of the organelle (Allen 1993; Pfannschmidt et al. 1999). We will discuss the much-reduced dinoflagellate plastid genome, and argue that its residual gene content is consistent with Allen’s proposal.

One of the notable features of plastid genomes is their high AT-content, both in coding regions and in non-coding regions. The high AT-content is not restricted to green plastids; rather it is seen across the whole spectrum of pigment types, and is higher than generally seen in cyanobacteria. For example, the plastid genomes of Nicotiana tabacum, Porphyra purpurea and Odontella sinensis are ca. 62%, 67% and 68% AT, respectively (Reith & Munholland 1995; Kowallik et al. 1995). The genome of Synechocystis sp. PCC6803 is ca. 52% AT (Nakamura et al. 1998). Although one cannot exclude the possibility that the plastid originated from a bacterial species with a more AT-rich genome, it seems more likely that the plastid genome has become AT-rich since endosymbiosis. This shift in nucleotide composition could be due to the nature of DNA damage occurring in the plastid, or to a tendency of the plastid DNA polymerase to mis-incorporate A and T rather than G and C in replication, or to a bias in the DNA repair machinery. (Interestingly, mitochondrial genomes are also rather AT-rich (Lang et al. 1999).) It is worth noting that such a bias in nucleotide composition causes serious problems for phylogenetic inference based on plastid genes. Many of the techniques used assume that nucleotide composition remains constant over time, and violation of this can result in organisms with similar nucleotide compositions being grouped artefactually closely, with implications for attempts to discriminate between monophyly and polyphyly of plastid origins (Lockhart et al. 1992).

Notwithstanding the degeneracy of the genetic code, the biased nucleotide composition affects the amino-acid composition of the proteins encoded. This can be seen particularly clearly by looking at proteins that are plastid encoded in some species, and nuclear encoded in others. A good example is the plastid SecA, which is involved in translocation of lumenal proteins across the thylakoid membrane. SecA is encoded in the nucleus of green plants and algae, and in the plastid of non-green algae. Table 2 shows the predicted content of several amino acids for SecA proteins from different sources (Barbrook et al. 1998). The codons for the amino acids alanine, glycine and proline are GC-rich, in that they must contain at least two G or C residues (and may contain three), and the codons for phenylalanine, isoleucine, lysine, asparagine and tyrosine are AT-rich, in that they must contain at least two A or T residues (and may contain three). The SecA protein from the red and brown algae can be seen to be relatively depleted in the amino acids with GC-rich codons (A,G,P) and enriched in those with AT-rich codons (F,I,K,N,Y). Thus, if a protein remains encoded in the plastid, there will be a shift in its amino-acid composition. For many proteins, this may be detrimental to their function, and there may therefore be a selective advantage in transfer of the gene to the nucleus.

Genes that have moved to the nucleus have been described as ‘molecular refugees’ moving to a less oppressive coding environment (Howe et al. 2000). This potential driving force for the transfer of genes to the nucleus may not act independently of the other driving forces proposed, such as Muller’s ratchet, but it may act synergistically with them. (Although this argument has been presented in terms of a relatively GC-rich organelle genome becoming AT-rich, if the plastid originated from an AT-rich bacterium and genes have the possibility of becoming more GC-rich by transposition to the nucleus, the same argument applies.)

If some proteins were particularly seriously affected by the shift in amino-acid composition resulting from an increased AT-content of their genes, there might be a greater advantage in the transfer of those genes to the nucleus. (Conversely, the shift towards AT-richness might even be advantageous for some genes and proteins, making it less favourable for the genes to be transposed to the nucleus.) Differing effects of a shift in amino-acid composition on different proteins might therefore offer an explanation of why some genes have been retained by the plastid, some have been transferred to the nucleus, and
Table 2. Content (%) of GC-rich (A,G,P) and GC-poor (F,I,K,N,Y) codons of the secA gene from green plants (Pisum sativum, Spinacia oleracea), oxygenic photosynthetic bacteria (Anacystis nidulans R2, Anabaena variabilis, Phormidium laminosum, Synechocystis sp. PCC6803, Prochloron didemni, Prochlorothrix hollandica) and plastid genomes of algae (Antithamnion sp., Porphyra purpurea, Heterosigma carterae, Odontella sinensis and Pavlova lutherii).
(Averages shown in italics for a group of organisms are above the overall average. Those underlined are below the overall average. Modified from Barbrook et al. (1998).)

                              GC-rich (A,G,P)     GC-poor (F,I,K,N,Y)
nuclear or bacterial
  Pisum sativum               5.0   4.0   2.0     4.0   7.9   7.9   3.0   0.0
  Spinacia oleracea           5.9   3.0   2.0     4.0   8.9   7.9   4.0   0.0
  Anacystis nidulans R2       6.9   5.9   3.0     3.0   5.9   5.9   4.0   2.0
  Anabaena variabilis         6.9   4.0   2.0     2.0   8.9   5.9   3.0   2.0
  Phormidium laminosum        5.9   1.0   3.0     2.0   5.0   5.9   5.9   3.0
  Synechocystis sp.           4.0   2.0   2.0     2.0   6.9   6.9   5.9   2.0
  Prochloron didemni          6.9   0.0   3.0     2.0   4.0   6.9   5.9   3.0
  Prochlorothrix hollandica   5.9   2.0   4.0     2.0   6.9   5.0   4.0   3.0
  average                     5.9   2.7   2.6     2.6   6.8   6.5   4.2   1.9
plastid
  Antithamnion sp.            4.0   0.0   0.0     1.0  13.9  15.8  10.9   5.0
  Porphyra purpurea           4.0   0.0   2.0     2.0  11.9   7.9   5.0   5.9
  Heterosigma carterae        7.9   0.0   2.0     7.9  13.9  12.9   7.9   4.0
  Odontella sinensis          5.0   0.0   4.0     4.0  11.9   6.9   7.9   3.0
  Pavlova lutherii            3.0   2.0   0.0     4.0  10.9  12.9   7.9   5.9
  average                     4.8   0.4   1.6     3.8  12.5  11.3   7.9   4.8
average overall               5.5   1.8   2.2     3.0   9.0   8.4   5.6   3.0
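
The codon argument made in the text, and the contrast summarized in Table 2, can be checked with a short Python sketch. It assumes the standard genetic code (plastid and bacterial genes use the same amino-acid assignments) and uses only the two group averages as printed in Table 2, taking the first three values as the GC-rich columns and the last five as the GC-poor columns. The names `min_count` and `totals` are illustrative only and do not come from Barbrook et al. (1998).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def min_count(amino_acid, bases):
    """Smallest number of the given bases found in any codon for one amino acid."""
    return min(
        sum(b in bases for b in codon)
        for codon, aa in CODONS.items()
        if aa == amino_acid
    )


# Every codon for A, G, P should contain at least two G or C residues,
# and every codon for F, I, K, N, Y at least two A or T residues.
gc_rich = {aa: min_count(aa, "GC") for aa in "AGP"}
at_rich = {aa: min_count(aa, "AT") for aa in "FIKNY"}

# Group averages copied from Table 2 as printed (assumption: columns 1-3 are
# the GC-rich amino acids, columns 4-8 the GC-poor ones).
nuclear_or_bacterial = [5.9, 2.7, 2.6, 2.6, 6.8, 6.5, 4.2, 1.9]
plastid = [4.8, 0.4, 1.6, 3.8, 12.5, 11.3, 7.9, 4.8]


def totals(averages):
    """Return (summed GC-rich content, summed GC-poor content) in percent."""
    return round(sum(averages[:3]), 1), round(sum(averages[3:]), 1)


print(gc_rich)   # minimum G+C count per codon for A, G, P
print(at_rich)   # minimum A+T count per codon for F, I, K, N, Y
print("nuclear or bacterial:", totals(nuclear_or_bacterial))
print("plastid:", totals(plastid))
```

On these printed averages, the summed GC-rich content falls from 11.2% to 6.8% and the summed GC-poor content rises from 22.0% to 40.3% when secA is plastid encoded, which is the shift the text describes.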
individual positions within the proteins encoded may be particularly affected, and that this was masked in the analysis of B. burgdorferi sequences by considering the coding region as a whole. Recognizing this would require the identification of residues that are generally conserved and seeing to what extent these have resisted the effects of biased nucleotide composition in genomes such as B. burgdorferi.) It is possible that considerations such as the need for redox control may be responsible for determining which genes are retained in the organelle.

3. A REDUCED PLASTID GENOME IN DINOFLAGELLATES

The general pattern of plastid genome organization is for the complement of 100–200 genes to be located on a large circular molecule (e.g. Sugiura 1992; Glöckner et al. 2000). However, a striking exception to this has been shown in several species of peridinin-containing dinoflagellate algae. The organization of plastid genes has been best characterized in Heterocapsa triquetra, Amphidinium operculatum and A. carterae (Zhang et al. 1999; Barbrook & Howe 2000; Barbrook et al. 2001; Hiller 2001). These species appear to lack a conventional plastid genome and have instead several small circular DNA molecules, typically about 2–3 kbp in size, which generally contain a single gene. Although plasmid-like DNAs have been reported from plastids of some green algae, they seem to be in addition to the ‘main’ chloroplast genome and may not encode functional genes (e.g. La Claire & Wang 2000).

The difficulty of isolating intact dinoflagellate plastids means that these minicircles have not yet been shown directly to be located in the plastid. However, the indirect evidence for this location appears strong. The minicircle genes encode products that, in all other species, are plastid-encoded, and these include rRNA (see below). Furthermore, the predicted protein products do not include organellar targeting sequences, and no other copies of the minicircle genes have been detected.

Remarkably, only a few genes have been identified so far on the putative plastid minicircles. The following have been reported on minicircles from one or more species: atpA, atpB, petB, petD, psaA, psaB, psbA, psbB, psbC, psbD, psbE, 16S rRNA and 23S rRNA (Zhang et al. 1999; Barbrook & Howe 2000; Barbrook et al. 2001; Hiller 2001). It is remarkable that no evidence has yet been found of RNA polymerase subunit genes, ribosomal protein genes or tRNA genes. As discussed above, one proposal for the retention of a plastid genome is to allow rapid regulation of important genes in response to redox processes in the plastid (Pfannschmidt et al. 1999). The fact that all the protein genes identified so far encode major subunits of the complexes of the light reactions of oxygenic photosynthesis seems to be consistent with this. Several features of the dinoflagellate plastid minicircles are worthy of comment.

(i) Minicircles contain a conserved ‘core’ region. This region is similar between minicircles of a given species carrying different genes. There are sections within the core that are essentially completely conserved across all minicircles of a given species, and others that are moderately well conserved: in many cases across some of the minicircles of a species but not all of them. The sequences of core regions of closely related species are very different from each other. Strikingly, the coding regions of minicircles with different genes are always in the same orientation with respect to the core region. Given the number of minicircles that have now been studied, it seems unlikely that this conservation of orientation is by chance. We return to this observation later.

(ii) Some minicircles contain more than one gene. For example, the petB and atpA genes are on a single minicircle in both A. carterae and A. operculatum (Barbrook et al. 2001; Hiller 2001). The same is true for psbD and psbE (Hiller 2001; R. E. R. Nisbet, unpublished data). These arrangements are not seen across all species. For example, the petB and atpA genes of H. triquetra are on different minicircles (Zhang et al. 1999). The arrangements also do not reflect the general pattern in conventional plastid genomes, where petB, atpA, psbD and psbE genes are located at different positions, so it is unlikely that these two-gene minicircles can have been derived simply by fragmentation of a conventional genomic circle. Northern analysis of RNA from A. operculatum indicates that the atpA and petB genes are either not co-transcribed or form part of an unstable dicistronic transcript that is very rapidly cleaved into monocistronic ones (Barbrook et al. 2001).

(iii) Some minicircles contain gene fragments, or no genes at all. Several minicircles have been reported from A. operculatum and A. carterae that contain fragments of coding regions, or no identifiable coding regions at all, although they retain a recognizable minicircle core (Barbrook et al. 2001; Hiller 2001; V. L. Koumandou and R. E. R. Nisbet, unpublished data). More complex minicircles have been reported from H. triquetra that contain fragments of more than one gene, and it has been proposed that these originated by fusion of two separate minicircles followed by deletion (Zhang et al. 2001).

(iv) The coding regions of the minicircles show unusual features. One of the most striking features revealed by inspection of the coding regions is the apparent use of anomalous initiation codons. For example, GTA has been proposed as an initiation codon for the psaA and psbB genes, and possibly also psbC of A. operculatum (Barbrook & Howe 2000; Barbrook et al. 2001). In the case of psbB, the predicted N-terminus of the protein aligns closely with well-conserved sequences from other plastids (figure 2). This makes the assignment of GTA as initiation codon reasonably convincing, although there are as yet no direct protein sequence data to confirm it. A limited number of studies using RT–PCR have so far failed to detect editing of dinoflagellate plastid transcripts, although the possible existence either of a low level of edited transcripts or of heavily modified transcripts (which therefore escaped amplification in RT–PCR) cannot be excluded. If GTA is indeed used as an initiation codon, this would be very unusual for organelle genomes generally (Edqvist et al. 2000).
Figure 2. Aligned predicted N-termini of psbB from a range of plants and algae.
Figure 3. Codon preferences for the psaA, psbA,B,C, atpA and petB genes of Amphidinium operculatum and (in parentheses)
Heterocapsa triquetra.
Figure 3 shows the codon preference for A. operculatum and H. triquetra over a set of genes, psaA, psbA, psbB, psbC, petB and atpA, which have been characterized from both. The codon preference is heavily biased, although there does not appear to be a consistent pattern, such as a preference for A or T at the third codon position. So, for example, there is a strong preference in A. operculatum for GGT (Gly) over GGC/A/G and TCT (Ser) over TCC/A/G or AGT/C, yet TTC (Phe) is much preferred over TTT. There are also clear differences in the bias between the species. For example, although H. triquetra has the same preference for TTC and GGT, TCA (which was only rarely used in A. operculatum) is much preferred as a serine codon. This pattern of codon preferences is rather different from that seen in other plastids, where there is a consistent preference for A or T at the third codon position. The pattern of dinoflagellate preferences is arguably rather more similar to cyanobacteria, where the preferred nucleotide at the third position differs among different codon families (compare, for example, TTT/C with GGT/C/A/G) as shown in figure 4. It will be interesting to see how patterns of codon preference vary across a broader range of dinoflagellates. If any are found to have a ‘conventional’ plastid genome organization, it will be particularly interesting to see if they also have the conventional plastid preference for third position A or T.

Why should the dinoflagellate plastid genome be organized in this way? The limited number of genes identified makes it tempting to suggest that this represents a plastid genome in the final stages of gene transfer to the nucleus, with only those genes left that are essential for effective regulation in response to redox or other requirements. It is of course possible that additional genes will be discovered. However, the results of PCR with primers for genes that are generally located in the chloroplast, together with sequencing of randomly selected clones, indicate that the number of additional genes found will be low. In support of this, the gene for the large subunit of ribulose bisphosphate carboxylase has been shown to be located in the nucleus in the dinoflagellate Gonyaulax polyedra (although the gene encodes a different form of the enzyme from that usually found in plastids (Morse et al. 1995)). The ribulose bisphosphate carboxylase–oxygenase large subunit gene is plastid located in other algae and plants. Why the dinoflagellates should be in such an advanced state of gene loss is not clear. It is also not clear how the minicircles are generated. They have a superficial resemblance to the small circular DNA species found in plant mitochondria, which are derived by fragmentation of a ‘master’ chromosome by recombination across repeated sequences (Lonsdale et al. 1984). However, there is as yet no evidence of a master chromosome in dinoflagellate plastids. In addition, in the rare examples where there are two genes on the same minicircle, these genes are not generally adjacent in other plastid genomes,
V GTT 111  (54)    A GCT 170  (89)    D GAT  85  (57)    G GGT 155 (145)
V GTC   3  (40)    A GCC   6 (144)    D GAC  15  (52)    G GGC  12  (75)
V GTA  80  (45)    A GCA  78   (8)    E GAA 120  (89)    G GGA  97  (28)
V GTG   7  (59)    A GCG   5  (23)    E GAG   9  (26)    G GGG  12  (37)
Figure 4. Codon preferences for the psaA, psbA,B,C, atpA and petB genes in the plastid of the liverwort Marchantia polymorpha
and (in parentheses) the cyanobacterium Synechocystis sp. PCC6803.
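
The third-position preference discussed in the text can be illustrated with the few codon counts that survive above the Figure 4 caption. The Python sketch below assumes that those four rows (the Val, Ala, Asp, Glu and Gly codons) belong to Figure 4, with the first number for Marchantia polymorpha and the parenthesized number for Synechocystis sp. PCC6803. The function name `third_position_at_fraction` is illustrative only, and the result covers only these 16 codons, not the full codon table.

```python
# Codon counts copied from the rows printed with Figure 4:
# codon -> (Marchantia polymorpha plastid, Synechocystis sp. PCC6803).
COUNTS = {
    "GTT": (111, 54), "GTC": (3, 40),   "GTA": (80, 45),  "GTG": (7, 59),
    "GCT": (170, 89), "GCC": (6, 144),  "GCA": (78, 8),   "GCG": (5, 23),
    "GAT": (85, 57),  "GAC": (15, 52),  "GAA": (120, 89), "GAG": (9, 26),
    "GGT": (155, 145), "GGC": (12, 75), "GGA": (97, 28),  "GGG": (12, 37),
}


def third_position_at_fraction(counts, index):
    """Fraction of codon occurrences whose third base is A or T.

    index selects the organism: 0 for the first count, 1 for the second.
    """
    at_ending = sum(n[index] for codon, n in counts.items() if codon[2] in "AT")
    total = sum(n[index] for n in counts.values())
    return at_ending / total


marchantia = third_position_at_fraction(COUNTS, 0)
synechocystis = third_position_at_fraction(COUNTS, 1)

print(f"Marchantia plastid: {marchantia:.2f}")
print(f"Synechocystis:      {synechocystis:.2f}")
```

For these rows the fraction is roughly 0.93 for the liverwort plastid and roughly 0.53 for the cyanobacterium, in line with the consistent third-position A or T preference described for conventional plastids and its absence in cyanobacteria.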
Discussion

R. Fray (Plant Science Division, University of Nottingham, Nottingham, UK). Have you looked at the transcripts from these minicircles? Do you get a defined-length transcript or does the RNA polymerase just keep going round these circles many times?

C. J. Howe. That is a good question. We do find defined-length transcripts, and they correspond essentially with the size that one would expect from the coding region itself, with a little bit added on at each end. They do not correspond to the size of the whole minicircle, and in the cases where we have two genes in the minicircle we seem to find separate transcripts for each gene. Now that does not rule out the possibility of a much larger transcript that is processed very rapidly, so we do not pick up the intermediate dicistronic transcript, but we seem to find just single-sized transcripts.

C. J. Leaver (Department of Plant Sciences, University of Oxford, Oxford, UK). Does the copy number of these circles or those genes vary relative to each other, or when you culture these, or is it constant?

C. J. Howe. We have looked at that to some extent. It is curious that one of the genes that you seem to pick up quite often when you are cloning these minicircles is the psbA minicircle. We thought everything needs lots of psbA because it is turning over very rapidly, so maybe the dinoflagellates have many copies of the psbA compared with the other genes, so they make lots and lots of protein. But like all the best hypotheses, that turned out to be wrong. Copy number experiments that we have done suggest that the minicircles all seem to occur in fairly similar numbers of copies. There is not a specific overrepresentation of the

moved into the nucleus, and he has had to attach an extremely long polar import sequence in order to achieve that.

C. J. Howe. John Allen has, of course, theories on the retention of genes. I have a theory on the retention of one gene, which is the glutamate tRNA that is used to activate glutamate for haem biosynthesis. If you are not really able to import tRNAs into the organelle, but at least the early stages of haem biosynthesis are carried out in the organelle, then you will at least have to retain that glutamate tRNA gene. Obviously, that leaves a lot of others to be explained.

J. C. Gray (Department of Plant Sciences, University of Cambridge, Cambridge, UK). Integrons have a relatively short recognition sequence, and not the 500 base pairs that you have as your common region. Surely that would suggest your common region is doing something else rather than providing an integration-recognition site?

C. J. Howe. Yes, it may well be doing something else. In classic integrons, they talk about a 59 bp element (which in fact can be anything from 59 to about 150 bp). Within the minicircle core region there are regions of greater conservation and lesser conservation, so it is possible that one part of the core might be functioning as that kind of attachment site while the rest could be doing other things, such as acting as a promoter or indeed supplying a replication origin, allowing the whole thing to replicate independently; this is just speculation. There are quite a lot of short inverted repeats in these core regions, which is interesting because one of the features of the 59 bp elements in the integron model is that they also have inverted repeats bounding them, so there is an extension of that similarity.