pipsqueak: Biological Overview | Evolutionary Homologs | Regulation | Developmental Biology | Effects of Mutation | References
Gene name - pipsqueak

Synonyms -

Cytological map position - 47A13--B1

Function - transcription factor

Keywords - Polycomb group, transcriptional activation and silencing of homeotic genes, posterior group, eye

Symbol - psq

FlyBase ID: FBgn0004399

Genetic map position - 2R

Classification - Psq motif, BTB/POZ domain

Cellular location - nuclear


Pipsqueak is a sequence-specific DNA binding protein that targets a Polycomb group protein complex to Polycomb response elements (PREs). The Polycomb (Pc) group (Pc-G) of repressors is essential for transcriptional silencing of homeotic genes that determine the axial development of metazoan animals. It is generally believed that the multimeric complexes formed by these proteins nucleate certain chromatin structures to silence promoter activity upon binding to PREs. Little is known, however, about the molecular mechanism involved in sequence-specific binding of these complexes. An immunoaffinity-purified Pc protein complex has been shown to contain a DNA binding activity specific to the (GA)n motif in a PRE from the bithoraxoid region of Ultrabithorax. This activity can be attributed primarily to the large protein isoform encoded by pipsqueak (psq) rather than to the well-characterized GAGA factor Trithorax-like (Trl). The functional relevance of psq to the silencing mechanism is strongly supported by its synergistic interactions with a subset of Pc-G genes, which cause misexpression of homeotic genes (Huang, 2002).

The biological properties of Pipsqueak are not, however, confined to targeting of Pc-G repressors. Pipsqueak can bind directly to Trl and is associated with Trl in vivo. Genetic interaction studies provide evidence that Psq and Trl act together in the transcriptional activation as well as the transcriptional silencing of homeotic genes. A complete colocalization of Psq and Trl on polytene interphase chromosomes and mitotic chromosomes suggests that the two proteins cooperate as general partners not only at homeotic loci, but also at hundreds of other chromosomal sites (Schwendemann, 2002).

Several salient features have been noted about PREs. For example, PREs can silence a distant marker gene. PREs can also exhibit a pairing-sensitive silencing effect, resulting in much stronger silencing on the marker gene when the PRE is present on the homologous chromosome. A high incidence of PRE insertion occurs at sites that contain preexisting PRE or PRE-like sequences. In general, PRE insertion creates a new chromosomal binding site for many Pc-G proteins. Further, PREs can confer transcription repression on Ultrabithorax (Ubx) in a Pc-dependent manner in cultured cells. Thus, PREs appear to act as the core sequences upon which Pc-G proteins assemble into large functional silencing complexes. It has been speculated that PREs at different chromosomal sites, when spatially juxtaposed, might cooperate and become more effective (Huang, 2002 and references).

How Pc-G can accomplish these tasks remains largely unclear. To date, fewer than half a dozen Pc-G genes have been thoroughly studied. Some Pc-G proteins contain domains that are capable of homophilic or heterophilic interaction, potentially facilitating formation and/or interaction of multimeric protein complexes. Consistent with this, large protein complexes containing Pc-G proteins have been identified. For example, Pc, Polyhomeotic (Ph), and Posterior sex combs (Psc) are found in the Pc repression complex 1 of approximately 2 MDa. A smaller protein complex containing Enhancer of Zeste [E(Z)] and Extra sex combs (Esc) has also been reported. Since some Pc-G proteins have not been shown to copurify with these complexes, additional complexes might be expected. Germ line clones of many Pc-G mutations display similar but distinct patterns of embryonic defects, suggesting partially overlapping functions. Chromatin immunoprecipitation has also revealed substantial variation in the composition of the Pc-G complexes at different sites. Surprisingly, some of these sites are found in actively expressed genes. Thus, multiple Pc-G complexes might function in different contexts during development (Huang, 2002 and references therein).

A fundamental question yet to be addressed fully is how the Pc-G protein complexes recognize specific sequences in PRE. With the exception of pleiohomeotic (pho), which encodes the homolog of mammalian YY1, no existing Pc-G protein has been shown to bind specific DNA sequences. The Pho binding site is a functional constituent of PRE; however, the inability of a LexA-Pho fusion protein to silence a linked reporter gene, as other Pc-G fusion proteins do, suggests that Pho alone may not be sufficient to target functional Pc-G complexes. The (GA)n motif present in PRE has been suggested to be critical for homeotic gene silencing. It has been further suggested that the GAGA factor (Trithorax-like), a well-characterized DNA binding protein for the GAGA motif, is involved in PRE binding. Contrary to the expected silencing effect, Trl has also been shown to act either as an antirepressor to alleviate the negative effects of histone H1 or as a transactivator in vitro, in cultured cells, and in stress response. In addition, Trl has been formally classified as a member of the trithorax group of genes (trx-G) that antagonize Pc-G. Therefore, the role of the GAGA factor remains unresolved (Huang, 2002 and references therein).

An ~440-bp DNA fragment from the bithoraxoid region of Ubx can recapitulate both the positive and negative effects of trx and Pc, respectively. Tagged Pc-G complexes were purified by immunoaffinity chromatography and then assayed for DNA binding activity. The (GA)n motif in this fragment was found to be the primary binding site for the Pc-G complexes. Several lines of evidence are presented to show that the DNA binding protein for the Ubx PRE is encoded by pipsqueak (Huang, 2002).

Several lines of evidence are provided to show that a novel DNA binding factor encoded by psq is a constituent of CHRASCH (chromatin-associated silencing complex for homeotics), a previously characterized major Pc-G protein complex (Chang, 2001). Since CHRASCH also contains a histone modification factor, HDAC1, it is suggested that this complex may represent a fully functional entity that can nucleate certain chromatin structures at and around specific sequences (i.e., PRE) of homeotic genes (Huang, 2002).

Biochemical purification of Pc-G protein complexes has been limited by their apparent instability. Thus, a balance between biochemical purity and functional integrity might be considered. Different approaches are required subsequently to substantiate the physiological relevance of copurified proteins. To meet these criteria, the strategy was adopted of purifying Pc-G protein complexes to sufficient homogeneity mainly by immunoaffinity chromatography under moderate conditions, then examining the biochemical functions potentially relevant to these complexes, followed by identifying the functional constituents of the complex and corresponding genes, and finally validating their roles with genetic studies (Huang, 2002).

The bxd region has been extensively examined for Polycomb response elements. Although different fragments ranging from ~400 bp to ~1 kb have been studied, they share a common region represented almost entirely by the B-151 fragment analyzed in this study. Among the three binding motifs of this fragment, it was found that the (GA)n motif represents the most prominent binding site for CHRASCH. In recent studies, the role of this motif in silencing has been demonstrated in transgenic flies. Thus, it is believed that this motif plays a critical role in anchoring one of the major Pc-G complexes (i.e., CHRASCH). These results, however, do not exclude the possibility that other motifs may be required for different functional aspects of PRE (Huang, 2002).

One of the most critical issues concerning the specific targeting of the Pc-G complex appears to reside in the identity of the DNA binding factor. These results support the conclusion that Psq-A plays a primary role in such a function for the following reasons: (1) Psq-A, but not Trithorax-like, is copurified with CHRASCH; (2) UV cross-linking studies strongly indicate that Psq-A binds directly to the (GA)n motif. Additional proteins, however, were also evident in these studies. At present, it is not possible to distinguish between the possibilities that these proteins represent degradation products of Psq-A, other novel binding proteins, or spurious cross-linking to sterically adjacent proteins in the complex. Nevertheless, it is clear that Psq-A is involved in the binding of the (GA)n motif in vitro. (3) Psq is colocalized with Pc-G protein at both ANTP-C and BX-C sites on polytene chromosomes. (4) There is a remarkably strong genetic interaction between Pc-G and psq that gives rise to leg transformation and ectopic Ubx expression. (5) It has been shown that the lack of Psq-A in one mutant (i.e., psqDelta18) is sufficient to account for genetic interaction with Pc (Huang, 2002).

Recent studies have indicated that Trithorax-like (Horard, 2000) or a combination of novel forms of Trithorax-like and Psq (Hodgson, 2001) is responsible for the binding of the Pc-G complex to the (GA)n motif. In one study, embryonic nuclear extracts were used to form the DNA-protein complex, followed by immunodetection with Trithorax-like antibody. Since multiple (GA)n motifs are present in the probes, it is difficult to exclude the possibility that Trithorax-like and Pc-G complexes might bind these motifs independently. Similar problems also arise from subsequent studies in which fusion proteins of LexA and Pc-G have been used to bind probes containing LexA binding sites, since the minimal Trithorax-like binding site, the GAG trinucleotide, is also present in the LexA probe. Although more purified fractions were used for DNA binding analysis in the other study (Hodgson, 2001), a combination of Bio-Rex 70 and Q-Sepharose may not provide sufficient resolving power to exclude the possibility that a large number of unrelated proteins are copurified. In addition, the final fractions appear to be enriched for a GAGA factor of ~54 kDa and to exclusively contain Psq (~70 kDa). Both proteins appear substantially smaller than the smallest forms detected in the original extracts (~67 kDa for Trithorax-like and ~95 kDa for Psq). Since both Trithorax-like and Psq antisera have nonspecific cross-reactivities (see Horowitz, 1996 for Psq), the identities of these proteins remain obscure. Nonetheless, despite these uncertainties, it is possible that Trithorax-like may play a role in certain aspects of the silencing mechanism as suggested by genetic studies (Huang, 2002 and references therein).

Other sequence-specific DNA binding factors have also been implicated for Pc-G targeting by genetic and/or biochemical studies. Pho is the only one that has been formally categorized as a Pc-G. Its binding sites are present in many PRE. In addition, mutations of the Pho binding site compromise the ability of PRE to silence reporter genes in larval tissues. However, Pho does not appear to be directly associated with many Pc-G proteins. Thus, despite its important role in homeotic gene silencing, it is not clear whether Pho is directly involved in the targeting of Pc-G complexes (Huang, 2002 and references therein).

Another potential candidate involved in the binding of the Pc-G complex is the Zeste protein, because it copurifies with Pc repression complex 1. It has been speculated that Zeste proteins act as a scaffold, via self-multimerization, to bring together regulatory sequences situated on the same chromosome or on different chromosomes. Its binding site has also been found in several PRE. In contrast to the proposed role in silencing, however, previous molecular and genetic studies have shown that the Zeste protein is most likely an activator. For example, it stimulates transcription of the Ubx promoter in vitro. Expression of a Ubx-LacZ transgene is completely abolished by a zeste mutation. Because of its transactivating effect, zeste has been considered a trx-G gene. Consistent with this notion, direct physical interaction has recently been demonstrated between the Zeste protein and two trx-G proteins, Moira and Osa, of the Brahma nucleosome remodeling complex. Genetically, zeste has also been defined as a transactivator involved in transvection of several genes, including Ubx. In addition, several Pc-G genes have been identified as suppressors of zeste. These observations cast some doubt on the physiological relevance of the Zeste protein in homeotic gene silencing. It is important to note that the two best characterized PRE (i.e., bxd and Fab7) also respond to trx-G. Thus, the mere existence of binding sites in PRE may not necessarily provide an unambiguous indication of their functions. While the manuscript was under review, however, a study showed that zeste mutations result in an extended expression of a Ubx transgene containing a replacement of the proximal promoter with a combination of multiple Zeste and NTF-1 binding sites (Hur, 2002), suggesting a role for zeste in Ubx silencing. However, since extended expression was also observed for a Ubx transgene containing multiple NTF-1 binding sites at the proximal promoter region, the exact role of zeste may need to be more thoroughly examined (Huang, 2002 and references therein).

In conclusion, these results provide direct evidence that a specific Psq isoform is critically involved in the targeting of a major Pc-G protein complex CHRASCH to the (GA)n motifs that are commonly found in PRE. Earlier studies have demonstrated that a functional HDAC1 is associated with CHRASCH and is required for the silencing in vivo (Chang, 2001). A simple model is suggested for homeotic gene silencing that involves the assembly of multimeric complexes by known Pc-G proteins and other novel proteins yet to be identified, direct binding to specific sequences of PRE, and subsequent modification of N-terminal tails of core histones to establish a silencing code for stable maintenance of an inactive state (Huang, 2002).

It is also relevant to note that the functions of Pc-G silencing complexes may not be fully revealed by previous genetic or biochemical approaches because of the lack of suitable mutations, easily tractable phenotypes, or sufficient stability of the protein complexes. In the case of psq, a grandchildless class of mutations, sufficient amounts of Psq remain detectable in most homozygous mutant adults, yet embryos produced by these adults become severely defective before the manifestation of homeotic genes (Horowitz, 1996). In addition, the presence of more Psq sites than Pc-G sites on polytene chromosomes suggests a much wider spectrum of target genes for Psq. These effects altogether could conceivably obscure the homeotic effect caused by psq mutations, unless a more-sensitized genetic background (e.g., Pc mutations) is provided. The roles of MI-2 and HDAC1 in homeotic gene silencing also become apparent with similar approaches. It is speculated that some novel functions of the silencing complex may be defined by more-systematic studies (Huang, 2002).


cDNA clone length - 5162 (primary ovarian transcript coding for PsqA)

Bases in 5' UTR - 733

Exons - 10

Bases in 3' UTR - 1230


Amino Acids - 1065 (PsqA), 1085 and other smaller splice variants

Structural Domains

At the amino terminus, PsqA contains a BTB domain (Godt, 1993; also referred to as a POZ domain by Bardwell, 1994), a motif that has been shown to function in protein-protein interactions. Although BTB domains are often found near the N terminus of Cys2-His2 zinc finger proteins, PsqA does not appear to contain a zinc finger. Downstream of the BTB domain, PsqA contains 34 alternating histidine residues, (HX)n, a motif that is present in a number of other Drosophila proteins, primarily transcription factors. It has been proposed that these histidine repeats could mediate protein-protein interactions by coordinating metal ions to form a 'histidine-metal zipper' between two proteins containing the repeats. The presence of two potential protein-protein interaction domains suggests that PsqA monomers may interact with each other or with heterologous protein species. Additionally, PsqA contains four tandem copies of a conserved sequence of unknown function at its carboxy terminus, termed the psq motif (Horowitz, 1996).
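The alternating histidine pattern, (HX)n, described above is simple enough to search for with a regular expression. The following is a minimal sketch in Python, not a method used in the cited work. The function name `find_hx_repeats`, the `min_repeats` threshold, and the toy peptide are all hypothetical and chosen only for illustration; the toy peptide is not the PsqA sequence.

```python
import re


def find_hx_repeats(protein, min_repeats=3):
    """Return (start, end, n_repeats) for each run of alternating
    histidines, i.e. 'H' followed by any single residue, repeated
    at least `min_repeats` times. Coordinates are 0-based, end-exclusive.
    The threshold is an illustrative assumption, not a published cutoff."""
    pattern = re.compile(r"(?:H[A-Z]){%d,}" % min_repeats)
    hits = []
    for m in pattern.finditer(protein.upper()):
        hits.append((m.start(), m.end(), (m.end() - m.start()) // 2))
    return hits


# Hypothetical toy peptide containing four consecutive HX pairs.
toy_peptide = "MAHQHTHSHAGG"
print(find_hx_repeats(toy_peptide))
```

Because X can be any residue, a plain polyhistidine stretch would also be reported by this pattern; a stricter search would need to exclude histidine from the X position.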

Pipsqueak (Psq) belongs to a family of proteins defined by a phylogenetically old protein-protein interaction motif. Like the GAGA factor and other members of this family, Psq is an important developmental regulator in Drosophila, having pleiotropic functions during oogenesis, embryonic pattern formation, and adult development. The GAGA factor controls the transcriptional activation of homeotic genes and other genes by binding to control elements containing the GAGAG consensus motif. Binding is associated with formation of an open chromatin structure that makes the control regions accessible to transcriptional activators. Psq contains a novel DNA-binding domain, which binds, like the GAGA factor zinc finger DNA-binding domain, to target sites containing the GAGAG consensus motif. Binding is suppressed, as in the GAGA factor and other proteins of the family, by the associated protein-protein interaction motif. The DNA-binding domain, which is called the Psq domain, is identical with a previously identified region consisting of four tandem repeats of a conserved 50-amino acid sequence, the Psq motif. The Psq domain seems to be structurally related to known DNA-binding domains, both in its repetitive character and in the putative three-alpha-helix structure of the Psq motif, but it lacks the conserved sequence signatures of the classical eukaryotic DNA-binding motifs. Psq may thus represent the prototype of a new family of DNA-binding proteins (Lehmann, 1998).
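To make the notion of a target site containing the GAGAG consensus concrete, the sketch below scans a DNA string for exact matches on either strand. It is only an illustration under simplifying assumptions: real binding sites tolerate mismatches, and neither the function names (`reverse_complement`, `find_motif_sites`) nor the toy sequence come from the cited studies.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_motif_sites(dna, motif="GAGAG"):
    """Return (position, strand) for every exact occurrence of `motif`.
    '+' means the motif reads 5'->3' on the given strand; '-' means its
    reverse complement (CTCTC for GAGAG) appears on the given strand.
    Positions are 0-based and overlapping matches are reported."""
    dna = dna.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(dna) - len(motif) + 1):
        window = dna[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif:
            hits.append((i, "-"))
    return hits


# Hypothetical toy sequence, not a real PRE or promoter fragment.
toy_dna = "TTGAGAGAGTTCTCTCAA"
print(find_motif_sites(toy_dna))
```

Overlapping hits are kept deliberately, since a (GA)n tract contains several GAGAG windows in a row.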

To test whether the Psq domain of D. melanogaster exhibits DNA binding specificity, a 0.8-kilobase polymerase chain reaction fragment encoding the Drosophila Psq domain was cloned and the polypeptide was expressed by in vitro translation. When this polypeptide is incubated with the hspGAGA2 oligonucleotide, a strong complex is formed. Formation of this complex is inhibited by increasing amounts of unlabeled hspGAGA2 but not by unrelated oligonucleotides shown to be ineffective in competing for binding of the A. mellifera Psq domain. Psq is thus the second GAGA-binding protein, other than the GAGA factor, that has been identified in D. melanogaster (Lehmann, 1998).

The similarity of the target sites recognized by Psq and GAGA factor suggests that binding of full-length Psq to GAGA sites in vivo might require the help of the GAGA factor. Binding of full-length Psq and Psq Delta240 to hspGAGA2 was tested in the absence and presence of the full-length GAGA-519 isoform of D. melanogaster. GAGA-519 in vitro translation products proved to be able to bind to hspGAGA2 with high affinity. Full-length Psq showed no binding and Psq Delta240 showed strong binding to this oligonucleotide. When either of these two proteins is mixed with the in vitro translated GAGA-519 isoform, the resulting pattern of DNA-protein complexes is the sum of the complex patterns observed in the presence of only the single proteins. Thus, the GAGA-519 isoform does not seem to be able to promote DNA-binding of full-length Psq in vitro. It remains to be shown if Psq isoforms containing both the BTB/POZ and Psq domains in fact bind to GAGA sites or other DNA-binding sites in vivo or if they exert their functions independent of DNA binding. Since isoforms also lacking the BTB/POZ domain seem to be expressed in vivo, binding to GAGA sites or related target sites may be reserved to these isoforms (Lehmann, 1998).

Psq cannot be easily assigned to any of the known families of eukaryotic DNA-binding proteins. Repeats with homology to the Psq motif are present in at least one additional Drosophila protein, the TKR protein (Haller, 1987), suggesting that this protein is also able to bind to DNA. Interestingly, a Drosophila BTB/POZ domain-encoding gene (BTB-III) has been identified (Zollman, 1994) that has an embryonic RNA distribution pattern very similar to that of Tkr and that maps to the same chromosomal position. It is thus interesting to speculate that the Tkr locus is more complex than previously supposed, encoding several protein isoforms, one of which contains a BTB/POZ domain in addition to the Psq domain. Beyond Drosophila, homology to the Psq motif is found in a polypeptide predicted by an open reading frame of Caenorhabditis elegans cosmid T01C1. The Psq domain may thus define a new class of DNA-binding domains. At present, an extensive search of the protein sequence databases reveals no other eukaryotic proteins with clear-cut homology to the Psq motif. However, searching the Blocks database with a multiple alignment of the eight Psq repeats reveals significant sequence similarities to the DNA-binding domain of prokaryotic recombinases and thereby provides a link between the Psq domain and the homeodomain, for which such similarities have been described as well. Cocrystal structures with DNA of two recombinases, Hin recombinase and gammadelta-resolvase, show that their DNA-binding domains consist of three alpha-helices flanked by extended arms, which make contacts to the minor groove. The highest similarity between the Psq motif and the recombinase DNA-binding domain is observed within the C-terminal recognition helix, which forms a helix-turn-helix motif with helix 2 and inserts into the major groove. Remarkably, the recognition helix of members of the Hin recombinase family makes specific major groove contacts to a sequence that is clearly related to the GAGA motif. The Psq motif has the same size of about 50 amino acid residues as the recombinase DNA-binding domains, and secondary structure predictions for the Psq motif are compatible with the triple-helix structure of these domains. A similar triple-helix structure is formed by the homeodomain and the Myb DNA-binding domain. It is interesting to note that, like the Psq domain, the Myb DNA-binding domain also consists of imperfect tandem repeats of a conserved sequence motif. The Psq domain thus seems to be structurally related, both in its conformation and in its repetitive structure, to known DNA-binding motifs, but it eludes classification into one of the prevalent categories of eukaryotic DNA-binding domains. Identification of additional members of the Psq family and determination of the structure of the Psq domain complexed with DNA will help to better define this new class of DNA-binding domains (Lehmann, 1998).


date revised: 28 December 2003

Home page: The Interactive Fly © 1995, 1996 Thomas B. Brody, Ph.D.

The Interactive Fly resides on the
Society for Developmental Biology's Web server.