Protein Interaction Maps Using Yeast Two-Hybrid Assay: Shayantani Mukherjee, Sampali Bal and Partha Saha
REVIEW ARTICLE
stream reporter gene. Based on this initial experiment, the
cDNA of a protein of interest is cloned in an appropriate
plasmid, which will express it as a BD fusion protein in
yeast cells in a two-hybrid screening. This BD fusion protein is known as bait. Similarly, a prey is prepared by
cloning the cDNA of another protein of interest in an
appropriate plasmid, which will express it as an AD
fusion protein in yeast cells. The two plasmids are then either
co-transformed into yeast cells or transformed separately into
haploid strains of opposite mating types, which are subsequently
mated to give rise to diploid yeast cells in which prey and bait
may interact. The positive clones are identified by assaying the
expression of the reporter genes (Figure 1).
Although initially designed to detect single protein–protein
interactions, the Y2H system soon emerged as a dependable and
consistent technique for detecting protein interactions on a
large scale. In fact, the approach
is more extensively used to screen a prey cDNA library
with single bait, yielding a method to identify multiple
interacting preys of a single bait protein. Recently, the
method has been further extended to examine interactions
among two separate pools of bait fusion proteins and prey
fusion proteins6 (discussed later). Hence, the most important fact about this methodology is its ability to scan a
large number of protein interactions at a time, which is
perhaps impossible to do with other classical biochemical
or physical methods.
NLS of SV40 large T antigen is fused in frame with lexA
to ensure its nuclear localization. The technique may not
be suitable for interactions involving membrane proteins,
because the membrane proteins may not be folded properly in the cytoplasm and nucleus due to the presence of a
large number of hydrophobic patches.
An important feature of the system, however, is that functional
information is not lost during the assay, unlike in other
large-scale approaches such as 2D gel electrophoresis. The assay
may therefore provide a functional clue about an interacting
protein if at least one of its interacting partners has a known
function in a well-understood pathway. In some cases, a screen
yields many new hypotheses and preliminary ideas about
undiscovered pathways, which can be validated by other methods.
The most convincing argument in favour of the two-hybrid
interaction assay is the number of biochemical pathways that can
be resolved in molecular detail and the speed with which this can
be done11. Owing to this advantage, the two-hybrid interaction
assay is now used extensively to generate genome-wide PIMs of
various organisms.
two types18: the matrix approach and the library screening approach.
Matrix approach
The matrix method was first described by Russel et al.19
while exploring the interactions among Drosophila cell-cycle
regulators. In their work, a number of individual
protein–protein interactions among cyclin-dependent
kinases and potential partners were tested. The results of
these tests were then displayed as two-dimensional arrays,
which were called interaction matrices. After this first
small-scale array experiment, the method was applied at a
genome-wide scale to determine protein–protein interaction
networks for yeast and some prokaryotic and
eukaryotic viruses6,12–17.
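
The idea of an interaction matrix can be illustrated with a minimal sketch. The bait names, prey names and positive pairs below are hypothetical and are not taken from any of the cited studies; the sketch only shows how pairwise test results might be laid out as a two-dimensional array with one row per bait and one column per prey.

```python
# Hypothetical bait and prey names, used only for illustration.
baits = ["bait_A", "bait_B"]
preys = ["prey_1", "prey_2", "prey_3"]

# Hypothetical reporter-positive pairs from pairwise two-hybrid tests.
positives = {("bait_A", "prey_2"), ("bait_B", "prey_1"), ("bait_B", "prey_3")}


def interaction_matrix(baits, preys, positives):
    """Return one row per bait and one column per prey (1 = positive, 0 = negative)."""
    return [[1 if (b, p) in positives else 0 for p in preys] for b in baits]


matrix = interaction_matrix(baits, preys, positives)
for bait, row in zip(baits, matrix):
    print(bait, row)
```

Reading across a row gives all preys scored positive with one bait; reading down a column gives all baits scored positive with one prey.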
The first genome-wide two-hybrid study was carried
out on bacteriophage T7. Bartel et al.12 screened a library
of BD fusions of T7 protein fragments against a random
library of T7 AD fusions. Among the 55 phage proteins,
they found 25 interactions, including four interactions that
had been described previously.
Recently, two publications by Ito et al.6 and Uetz
et al.14 have described Y2H-based large-scale approaches
to detect genome-wide protein–protein interactions in S.
cerevisiae. In the approach followed by Ito et al.6, about
6000 possible yeast open reading frames (ORFs) were
cloned individually as BD fusion baits in one yeast strain
and as AD fusion preys in another yeast strain of opposite
mating type. Subsequently, these two collections of BD
and AD fusion clones were clustered separately into 65
pools, each containing 96 clones. Out of a possible 4225
(65 × 65) mating reactions, the authors conducted 430,
covering about 10% of all possible combinations.
They selected the positive clones using the expression of
four different reporter genes to maintain a high selection
pressure. Through this procedure, about 4 × 10^6 different
combinations of interaction were examined, which
account for about 10% of the total proteome analysis
(4 × 10^7). As a result of this screening, 866 positive
clones were obtained. Their analyses resulted in the
identification of 183 independent interactions, 12 of which were
previously known. Furthermore, the results of these systematic
Y2H screens enabled the authors to characterize
many complex interaction networks, including one
that explained a previously unsolved mechanism for the
connection between distinct steps of vesicular transport6.
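
The pool arithmetic quoted above can be checked with a short sketch. It uses only the figures given in the text (65 pools of 96 clones, 430 matings, about 6000 ORFs); the variable names are illustrative, and the total of 6000 × 6000 combinations is an assumption about how the figure of roughly 4 × 10^7 was obtained.

```python
POOLS = 65             # pools per collection (BD and AD), from the text
CLONES_PER_POOL = 96   # clones in each pool, from the text
MATINGS_DONE = 430     # pool-by-pool matings actually performed
ORFS = 6000            # approximate number of yeast ORFs

possible_matings = POOLS * POOLS                   # 65 x 65
fraction_matings = MATINGS_DONE / possible_matings

pairs_per_mating = CLONES_PER_POOL ** 2            # bait-prey pairs in one pool mating
pairs_examined = MATINGS_DONE * pairs_per_mating   # roughly 4 x 10^6
all_pairs = ORFS ** 2                              # assumed total; 3.6 x 10^7, rounded to 4 x 10^7 in the text

print(possible_matings)
print(f"{fraction_matings:.1%} of pool matings performed")
print(f"{pairs_examined:.2e} pairs examined, {pairs_examined / all_pairs:.1%} of all pairs")
```

Under these assumptions, both ways of counting (matings performed and bait-prey pairs examined) come out at roughly one tenth of the possible total, in line with the figure of about 10% given above.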
The work done by Uetz et al.14 is the most extensive
and systematic study of protein–protein interactions in
yeast published so far. The authors used two strategies:
a time- and labour-intensive one-by-one
array approach and a high-throughput one.
One-by-one array approach: In this method, about 6000
yeast transformants were generated, each containing one
CURRENT SCIENCE, VOL. 81, NO. 5, 10 SEPTEMBER 2001
possible ORF cloned into GAL4 activation domain vector
to express the hybrid protein. The transformants were
distributed among 16 micro-assay plates, each containing
384 colonies. Thus a total of 6144 (384 × 16) types
of ORFs were expressed as AD fusion proteins (prey).
Simultaneously, a set of 192 GAL4 BD hybrids (bait) was
prepared in yeast of opposite mating types following similar steps.
To screen for positive interactions, each individual
BD hybrid was mated with all the transformants of the
array (Figure 2). The diploids obtained after mating were
selected by reporter gene expression. With this
array technique, no further characterization of
the interactor proteins was required, because the
position of a positive colony in the array directly identifies
the prey and bait engaged in the interaction.
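
How a colony position identifies a prey can be sketched as a simple index calculation. The 16 plates of 384 positions come from the text; the 16 × 24 layout of each plate, the zero-based numbering and the function names are assumptions made for illustration only.

```python
PLATES = 16            # plates in the array, from the text
ROWS, COLS = 16, 24    # assumed layout of one 384-position plate
PER_PLATE = ROWS * COLS


def position_to_index(plate, row, col):
    """Map a zero-based (plate, row, col) position to a zero-based prey index."""
    if not (0 <= plate < PLATES and 0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("position outside the array")
    return plate * PER_PLATE + row * COLS + col


def index_to_position(index):
    """Inverse of position_to_index: recover (plate, row, col) from a prey index."""
    if not 0 <= index < PLATES * PER_PLATE:
        raise ValueError("index outside the array")
    plate, rest = divmod(index, PER_PLATE)
    row, col = divmod(rest, COLS)
    return plate, row, col


print(PLATES * PER_PLATE)             # total positions in the array
print(index_to_position(385))         # second plate, first row, second column
```

Because the mapping is one-to-one, a positive colony at a given position, together with the bait used in that mating, names both partners without any sequencing step.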
The number of positives in each mating ranged from 1 to 30.
However, only 87 of the 192 baits used gave
any reproducible positive interactions (i.e. interactions that
were also reproduced in a second screening). Among all the
positives tested, 281 interactions were found to be discrete,
suggesting a mean value of 3 interactions per protein.
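
The summary figures for the array screen follow from the counts above, as the short calculation below shows. It assumes that the mean is taken over the 87 baits that gave reproducible interactions; the text does not state the denominator explicitly.

```python
BAITS_TESTED = 192            # baits screened against the array
BAITS_WITH_HITS = 87          # baits giving reproducible positives
DISTINCT_INTERACTIONS = 281   # discrete interactions found

productive_fraction = BAITS_WITH_HITS / BAITS_TESTED
# Assumption: the mean is computed per productive bait.
mean_per_productive_bait = DISTINCT_INTERACTIONS / BAITS_WITH_HITS

print(f"{productive_fraction:.0%} of baits gave reproducible interactions")
print(f"{mean_per_productive_bait:.1f} interactions per productive bait")
```

On this assumption the mean is about 3.2, consistent with the rounded value of 3 interactions per protein.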
High-throughput approach: The array approach described above yielded definite results, but was very laborious
and time-consuming. Hence, as an alternative, the authors
also described a high-throughput approach. In this
Figure 2.
et al.14.
Comparison of the two approaches: When the high-throughput approach was compared with the array method,
the two methods were found to give different
experimental results. In the array method, 45% of the 192 proteins
used as bait yielded interactions, but only 8% of the
5345 potential ORFs yielded interactions in the high-throughput
method (Figure 2). Some of the difference in the
number of interactions may have resulted from the non-random
choice of proteins for the array approach. Some
categories of proteins, such as membrane proteins and metabolic
enzymes, were omitted during bait selection in the
array approach, since they are less likely to yield interactors
than proteins of signalling pathways. In addition,
the stringent selection pressure applied
in the high-throughput method may partially
account for the smaller number of positives. Even after
considering these points, however, the array method
yields many more candidate interactors than the high-throughput
method: it gave an average of 3.3
interacting partners per bait protein, compared to 1.8 for
the high-throughput approach. Nevertheless, the
most important advantage of the high-throughput method
over the array method is that it requires less time,
money and labour. The array method is systematic but
time-consuming and laborious, which
severely limits the number of baits that can be tested at a time (192
baits in the array method compared to 5345 baits
in the high-throughput method). Thus, one can use
the high-throughput approach to obtain a rough idea
of probable interactions, after which more rigorous studies
can be done by the array method using selected
baits/targets.
Altogether, the two approaches yielded 957 interaction
pairs, of which 109 were known before14. The results of the
screen placed functionally unclassified proteins into
appropriate pathways in the cellular context. Examples include
the identification of proteins involved in arginine metabolism
and vacuolar protein transport.
Insight into biological processes such as RNA splicing was
also gained from the generated PIM.
Library screening approach
The Y2H assay technique was primarily designed to detect a
physical association between two known proteins in a
cell. With time, however, the technique rapidly became the most
widely used method to screen libraries for identification
of interacting protein pairs. Usually, a cDNA library is
prepared in an AD fusion vector and screened with a single
BD-fused bait in the interaction assay. This concept of library
screening through the Y2H assay technique is now
applied to obtain genome-wide PIMs of prokaryotes and
lower eukaryotes. In this case, a genomic DNA library is used
instead of a cDNA library. Since the library is generated
by random fragmentation of genomic DNA, some of
the identified prey interactors may contain intergenic,
non-coding regions. To generate the final PIM, these nonsense
interactors and other possible false positives are
eliminated by adopting suitable statistical approaches.
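
One simple way to flag prey fragments that fall entirely in intergenic regions is to compare fragment coordinates with annotated ORF coordinates. The sketch below is only a coordinate filter under stated assumptions, not the statistical approach referred to above; the ORF names, fragment names and coordinates are hypothetical, and strand and reading frame are ignored.

```python
# Hypothetical ORF annotation: name -> (start, end), half-open coordinates.
orfs = {"ORF1": (100, 400), "ORF2": (900, 1500)}

# Hypothetical sequenced prey fragments: name -> (start, end).
fragments = {"frag_a": (150, 350), "frag_b": (450, 800), "frag_c": (380, 950)}


def overlapping_orfs(fragment, orfs):
    """Return the sorted names of ORFs that share at least one base with the fragment."""
    start, end = fragment
    return sorted(name for name, (s, e) in orfs.items() if start < e and s < end)


coding = {}
for name, frag in fragments.items():
    hits = overlapping_orfs(frag, orfs)
    if hits:
        coding[name] = hits

# Fragments that overlap no annotated ORF are treated as intergenic.
intergenic = sorted(set(fragments) - set(coding))

print(coding)
print(intergenic)
```

A real pipeline would also have to check that the fragment is fused in frame with the activation domain, which this sketch does not attempt.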
This type of library approach was first adopted a few
years ago by Fromont-Racine et al.15 to generate a PIM of
S. cerevisiae. The authors generated a highly complex
library of random yeast genomic fragments consisting of
about 5 × 10^6 clones. The fragments were fused to AD vectors
to prepare a library of prey proteins, which was transformed into
yeast haploid cells. In parallel, selected baits were
chosen and, following the same procedure, were
transformed into yeast haploid cells of the opposite mating
type. An efficient mating strategy between the prey library
and each of the baits was followed. Positive
diploid clones were then selected and the prey fragments were
characterized by sequencing. All positive prey candidates
were classified according to distinct heuristic values, and
the prey candidates having higher heuristic values were
chosen as baits for a second round of screening. This procedure
of screening the library with potential preys of the previous
screens chosen as baits was
repeated several times. In this particular attempt, the
authors were able to characterize new interactions
between known splicing factors, identify new yeast
splicing factors and reveal novel potential functional links
between cellular pathways.
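
The iterative strategy, in which high-scoring preys from one round become baits in the next, can be sketched as a round-by-round traversal of the interaction network. The scores, the threshold, the toy data and the function names below are hypothetical; they stand in for the heuristic values mentioned above and are not the scoring scheme of Fromont-Racine et al.

```python
def iterative_screen(initial_baits, screen, min_score, max_rounds):
    """Screen baits round by round; preys scoring at least min_score become new baits.

    `screen` is any callable that maps a bait name to a dict {prey: score}.
    """
    interactions = set()
    used = set()
    current = list(initial_baits)
    for _ in range(max_rounds):
        next_round = []
        for bait in current:
            if bait in used:
                continue  # never screen the same bait twice
            used.add(bait)
            for prey, score in screen(bait).items():
                interactions.add((bait, prey))
                if score >= min_score and prey not in used and prey not in next_round:
                    next_round.append(prey)
        if not next_round:
            break
        current = next_round
    return interactions


# Hypothetical screening results used in place of a real library screen.
toy_results = {
    "A": {"B": 0.9, "C": 0.2},
    "B": {"D": 0.8},
    "C": {"E": 0.9},
    "D": {},
}

found = iterative_screen(["A"], lambda bait: toy_results.get(bait, {}),
                         min_score=0.5, max_rounds=3)
print(sorted(found))
```

In this toy run, prey C is recorded as an interactor of A but is never used as a bait because its score falls below the threshold, so its own partner E is not reached.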
The library approach was further examined in C. elegans,
starting with only 27 proteins involved in vulval
development17. Although C. elegans is less complex
than higher eukaryotes, this system was used to check the validity
of the library screening approach in determining a precise
PIM. Most of the
protein interactions involved in vulval development of the
organism had been previously reported and were well studied.
Hence, the results obtained from the library screening
method could be verified against already known interaction
pathways. Apart from this verification, the resulting map
also revealed some new interactions
that were unlikely to be false positives. With the
help of this approach, functional annotation of approximately
100 uncharacterized gene products was possible.
Figure 3. Outline of the strategy for building H. pylori proteomewide interaction map.
done confidently. Thus, in total, 46.6% of the H. pylori proteome
was connected through PIMs. The resulting PIM
revealed many biological pathways and helped in predicting
the functions of many proteins. A few examples include
the components involved in chemotaxis, the urease complex
and DNA replication. The results also identified complexes
in H. pylori that had been shown or postulated to
be present in other bacteria such as E. coli, validating the
strategy adopted in the study.
Conclusion
The technology based on the Y2H assay system for revealing
protein–protein interactions has already made a huge
impact on basic and applied biological research. Its highest
achievement, however, is the production of
genome-wide PIMs. As whole genome sequences of
organisms ranging from prokaryotes to human are being
added to the databases, opportunities for the Y2H assay system
are increasing by leaps and bounds. To date, attempts
have been made to generate exhaustive PIMs of prokaryotes
(e.g. H. pylori) and lower eukaryotes (e.g. S. cerevisiae).
The comprehensive version of the Y2H assay
technique is also being applied to generate interaction
maps in multicellular organisms such as C. elegans20. Certain
aspects of higher eukaryotic genomes, such as the huge number of
gene products and the abundance of introns between
the coding sequences, make the experiments more difficult.
However, if analyses are done with cDNA libraries prepared
from mRNA pools of specific tissues or organs using
random primers, the number of expressed gene products
is reduced considerably. Such libraries can be used to
perform extensive Y2H screens following the library
screening method. Also, in view of the progress
made in preparing intact ORFs and in long-insert cDNA
cloning technologies, it is quite possible that the Y2H assay
technique may serve as a prototype for studies of protein
interactions in cells with much larger genomes. A recent
announcement by a consortium of four biotechnology
companies of plans to generate extensive PIMs of the human proteome
using both Y2H and mass spectrometry methods supports
this view.
Recently, the Y2H system has been modified by several
groups to enable genetic selection against a specific
protein–protein interaction21,22. This modified system is
called the reverse two-hybrid technique and it makes use of
yeast strains in which the interaction of hybrid proteins
increases the expression of a counterselectable marker
that is toxic under a particular condition. In this situation,
the dissociation of interacting partners provides a selective
advantage, facilitating detection. This system is useful for
identifying mutant gene products that affect
a particular protein–protein interaction, by screening
pools of randomly generated mutants. Most importantly,
it enables the identification of proteins or small molecules
responsible for dissociating a protein complex. As protein–protein
interactions are becoming prospective drug
targets, the reverse two-hybrid assay can be applied to identify
compounds inhibiting these interactions23.
For various reasons, large-scale Y2H screens sometimes
produce false positives. A few strategies have been adopted
to avoid them. In their exhaustive analyses of
the yeast proteome, Uetz et al.14 carried out each screen
in duplicate and considered only the reproducible interactors
for further analyses. To reduce the occurrence of
non-specific interactors, Serebriiskii et al.24 developed
a dual-bait Y2H system in which two unrelated baits
are used simultaneously. A specific interaction with one bait
gives rise to the activation of one set of reporter genes,
whereas non-specific interactors, which usually give rise to
false positives, are more likely to bind
both baits, resulting in the activation of both sets
of reporter genes. Use of such methods helps in avoiding
false positives in Y2H screens.
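
The two filtering ideas described above can be expressed as a minimal sketch: keeping only interactions seen in both duplicate screens, and classifying a prey in a dual-bait experiment by which reporter sets it activates. The function names, labels and example pairs are hypothetical and do not reproduce the actual procedures of Uetz et al. or Serebriiskii et al.

```python
def reproducible(screen_1, screen_2):
    """Keep only the bait-prey pairs that were positive in both duplicate screens."""
    return set(screen_1) & set(screen_2)


def classify_dual_bait(activates_bait1_reporters, activates_bait2_reporters):
    """Classify a prey by the reporter sets it activates in a dual-bait assay."""
    if activates_bait1_reporters and activates_bait2_reporters:
        return "likely non-specific"
    if activates_bait1_reporters:
        return "specific to bait 1"
    if activates_bait2_reporters:
        return "specific to bait 2"
    return "no interaction"


# Hypothetical positives from two independent runs of the same screen.
run_1 = [("bait_A", "prey_1"), ("bait_A", "prey_2"), ("bait_B", "prey_3")]
run_2 = [("bait_A", "prey_1"), ("bait_B", "prey_3"), ("bait_B", "prey_4")]

print(sorted(reproducible(run_1, run_2)))
print(classify_dual_bait(True, False))
print(classify_dual_bait(True, True))
```

Both filters trade sensitivity for specificity: a genuine but weak interaction that appears in only one run would be discarded along with the false positives.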
With the development of high-throughput approaches
for analysing proteomes, such as pull-down assays and mass
spectrometry, it is reasonable to question the strength of the
Y2H assay technique compared to the others. Mass spectrometry,
with its recent innovations and extensions, can
now analyse and characterize a large number of proteins
in a short time with great precision. This technology has
also been found to be useful in the identification of
proteins in a complex and in estimating the expression of a
gene25. However, in spite of the development of this outstanding
method to analyse the proteome, the Y2H assay
technique remains the method of choice to interpret
protein networks functionally3. This is primarily because
the Y2H assay is an in vivo technique and is a direct method
to screen binary and ternary protein complexes and even
weak interactions occurring in a living cell. The recent
development of the three-hybrid system to look for molecules
mediating a particular protein–protein interaction can be
mentioned in this context26. In contrast, mass spectrometry
methods generally involve purification of protein
complexes from living cells and subsequent identification
of their components. For this reason, mass spectrometry
studies often require significant amounts of purified
proteins. Hence, the studies involving mass
spectrometry have concentrated on relatively abundant
complexes such as ribosomes or spliceosomes27,28. In addition,
the pairwise connectivity of protein components remains
unclear in this analysis. Hence two-hybrid studies, which
identify both weak and strong pairwise interactions in a
living cell, can allow the reconstruction of higher-resolution
PIMs. Moreover, the Y2H technique is less expensive
and technically simpler to handle than mass spectrometry.
Using a proteomics workstation consisting of two-dimensional gel electrophoresis followed by mass spectrometry, one can separate and characterize specific
proteins responsible for a particular pathological state of a
tissue or for an infective stage of a pathogen. However,
because gel electrophoresis is run under denaturing conditions,
information about the interacting partners is lost in
this analysis. This information can easily be acquired by
carrying out Y2H screens using the particular proteins as
baits. Hence, the techniques can be combined effectively
to provide proteome-wide information on a wide variety of
organisms. In this context, it must be borne in mind
that not every ORF is suitable for Y2H analysis. Hence,
other proteomics approaches, such as large-scale
immunoprecipitation or glutathione S-transferase pull-down to purify
protein complexes together with mass spectrometry to
identify the interacting partners, should be implemented
as alternative approaches to generate near-complete
interaction maps. Furthermore, the
Y2H assay system needs to be supported to a large extent
by bioinformatics tools, which are essential
for handling huge data sets and for extracting biologically
relevant information from them. All things considered,
the Y2H assay system, with all its advantages and new
developments, is likely to reveal interesting new
frontiers of research in coordination with other functional
and structural genomics approaches.
1. Bai, C. and Elledge, S. J., Methods Enzymol., 1996, 273, 331–347.
2. Fields, S. and Song, O., Nature, 1989, 340, 245–246.
3. Uetz, P. and Hughes, R. E., Curr. Opin. Microbiol., 2000, 3, 303–308.
4. Criekinge, W. V. and Beyaert, R., Biological Procedures Online, (www.science.uwaterloo.ca/bpo/), 1999, 2.
5. Mendelsohn, A. R. and Brent, R., Curr. Opin. Biotechnol., 1994, 5, 482–486.
6. Ito, T. et al., Proc. Natl. Acad. Sci. USA, 2000, 97, 1143–1147.