Gene name - Enhancer of Polycomb
Synonyms -
Cytological map position - 48A2--48A2
Function - Pc group protein and modifier of PEV
Keywords - Polycomb group protein and
Symbol - E(Pc)
FlyBase ID: FBgn0000581
Genetic map position - 2-61.9
Classification - novel chromatin protein
Cellular location - nuclear
Enhancer of Polycomb [E(Pc)] is a gene with a dual identity, serving as a suppressor of position-effect variegation (PEV) (Sinclair, 1998a) and as a Polycomb Group (PcG) gene. E(Pc) and its relationship to PEV will be discussed first.
For many years, histologists have distinguished between two types of chromatin: the genetically active euchromatin and the genetically inactive heterochromatin. Position-effect variegation occurs when chromosomal rearrangements juxtapose a euchromatic gene next to a broken segment of heterochromatin. Expression of the transposed gene is repressed in some cells but not in others, producing a mosaic phenotype. Such heterochromatin-induced gene silencing occurs because there are differences in structure and in the regulation of gene expression between heterochromatin and euchromatin, and these changes can be transmitted to the translocated gene. Repression is probably caused by the spreading of heterochromatin into the euchromatic gene, causing inactivation, although models invoking sub-nuclear localization have been gaining support. Many modifiers of PEV have been identified (Sass, 1998). Suppressors of variegation [Su(var)s] are predicted to encode either structural components of heterochromatin, or proteins that regulate heterochromatin components. To date, several Su(var)s have been cloned. These include: Su(var) 3-7, a zinc finger protein; Su(var) 205, which encodes HP1, a protein purified from heterochromatin; modulo, a DNA-binding protein (Garzino, 1992); Su(var) 3-6, the protein phosphatase 1 catalytic subunit (Dombradi, 1992); Su(var) 3-9, a protein that contains domains found in other chromatin regulators (Tschiersch, 1994); and the gene encoding S-adenosylmethionine synthetase, which is required for spermine production (Larsson, 1996). The known Su(var)s have structures consistent with a role in heterochromatin formation (Stankunas, 1998 and references).
In addition to its role in the suppression of PEV, Enhancer of Polycomb is also a member of the Polycomb Group (PcG) gene family. Several groups have investigated whether PcG mutations modify PEV, or whether Su(var)s have homeotic phenotypes, as a test of the hypothesis that the two groups have overlapping functions. The results suggest that there is little overlap between the groups, with the exception of the PcG genes Enhancer of Polycomb and Enhancer of zeste, which act as Su(var)s (Sinclair, 1998a). However, a trithorax group protein, Additional sex combs, can serve as an enhancer of PEV, supporting the notion that there is an overlap between regulators of homeotic genes and modifiers of PEV (Sinclair, 1998b). E(Pc) is unusual among PcG mutations because it does not, by itself, possess the capacity to generate a homeotic phenotype in embryos or adults (only Abd-B shows a very modest ectopic expression in embryos), once the maternal contribution of E(Pc) protein or mRNA is removed. However, mutations in E(Pc) enhance homeotic mutations in the PcG genes Polycomb, Polycomblike, polyhomeotic, Sex combs extra, Sex comb on midleg and super sex combs, suggesting that E(Pc) is important for PcG function. It may be that, like Su(z)2, E(Pc) is partially functionally redundant and, therefore, lacks homeotic effects in embryos and adults. One other gene with no homeotic phenotype has been identified, S-adenosylmethionine synthetase: it enhances phenotypes of PcG mutants and acts as a Su(var) (Larsson, 1996). Like S-adenosylmethionine synthetase, E(Pc) may be indirectly required for PcG function. It may be that E(Pc) regulates PcG or Su(var) expression, or has other indirect effects. E(Pc) has now been cloned from Drosophila and mouse. Both gene products contain a large novel domain that is also conserved in yeast and nematode proteins.
The E(Pc) protein of Drosophila is ubiquitously expressed and binds to polytene chromosomes at about 100 sites, of which only about a third overlap with Pc-binding sites. Interestingly, E(Pc) is not detected at the heterochromatic chromocenter, supporting a model in which E(Pc) has a functional rather than a structural role in heterochromatin, and supporting the conclusion that there is less overlap between mechanisms of heterochromatin formation and PcG repression than had previously been supposed (Stankunas, 1998).
A simple model to explain how E(Pc) functions as both a PcG protein (promoting gene silencing) and a Su(var) proposes that E(Pc) protein has a structural role in heterochromatin and in PcG complexes. This model is very unlikely because, unlike HP1, another Su(var), E(Pc) is not detected in the heterochromatin of the chromocenter. Sequence analysis of E(Pc) does not reveal active sites of any known enzymes, making it unlikely that E(Pc) has an enzymatic function and arguing against the hypothesis that E(Pc) modifies the chromodomains of HP1 and Pc (Kennison, 1995). The alternative suggestion (Kennison, 1995) that E(Pc) interacts with the chromodomains of HP1 and Pc has not been ruled out. These experiments do not address the possibility that E(Pc) directly affects nuclear compartmentalization of chromatin into active and inactive compartments, or that E(Pc) is needed for the establishment of a nuclear architecture required for establishment of repression. Nevertheless, the observation that E(Pc) is a chromatin protein of limited distribution makes it probable that E(Pc) regulates genes and thus may have indirect effects on nuclear architecture. The discrete binding of E(Pc) to polytene chromosomes makes it unlikely that E(Pc) plays a general role in chromosome or chromatin structure of the kind that would only indirectly affect heterochromatin formation and repression by PcG proteins (as is proposed for S-adenosylmethionine synthetase). Rather, it is suggested that E(Pc) has an indirect effect on the formation of heterochromatin, and thus on position-effect variegation; this probably occurs via the regulation of genes required for heterochromatin formation. Consistent with this idea, E(Pc) binds to the locations of some modifiers of PEV, but so far there is no evidence for direct regulation of modifiers of PEV by E(Pc) (Stankunas, 1998).
The E(Pc) gene is located upstream of invected and is transcribed in the same direction (Stankunas, 1998).
Bases in 5' UTR - 1.1 kb
Exons - 7
Bases in 3' UTR - 1.4 kb
The protein has an estimated size of 220 kDa, making it the largest member of the PcG characterized so far. It contains many charged residues and has an estimated pI of 5.79. Two of the basic amino acid-rich sequences, amino acids 325-KKRKHK-330 and 675-KRRRLRRKK-683, are probable nuclear localization signals. E(Pc) is similar to a number of cloned PcG proteins because of the presence of multiple regions enriched in specific amino acids. E(Pc) contains 7 glutamine-rich regions, some of which are perfect repeats, the longest being 12 consecutive glutamines. Glutamine repeats are also found in Polyhomeotic (ph) and Additional sex combs (Asx) (Sinclair, 1998b). Glutamine repeats have been implicated in protein-protein interactions and in transcriptional activation. The 18 amino acid sequence from aa 835-852 contains 15 alanines. Alanine-rich motifs are found in Asx (Sinclair, 1998b), in Sex comb on midleg and in a new member of the PcG, Cramped. Alanine-rich regions have been implicated in repression by transcription factors, but their function in PcG proteins is unknown. E(Pc) contains two arginine-rich sequences at aa 780-793 and 1976-1982, a feature also found in the carboxy terminus of Su(z)2. E(Pc) contains a putative leucine zipper at amino acids 644-672. Leucine zippers form coiled coils. Analysis of E(Pc) predicts a coiled coil between amino acids 651 and 690, consistent with the leucine zipper being functional. There are two additional coiled coils predicted to occur between amino acids 208 and 258, and 1633 and 1630 (Stankunas, 1998).
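The composition features described above (perfect glutamine repeats, alanine-rich stretches, basic clusters) are the kind of thing that can be checked with a simple scan of the protein sequence. The following is a minimal sketch, not the method used by Stankunas (1998): the helper names `longest_run` and `residue_fraction` are hypothetical, the motif strings are the two putative nuclear localization signals quoted in the text, and the toy sequence is invented purely for illustration.

```python
import re


def longest_run(seq: str, residue: str) -> int:
    """Length of the longest uninterrupted run of `residue` in `seq`."""
    pattern = re.escape(residue) + "+"
    return max((len(m.group()) for m in re.finditer(pattern, seq)), default=0)


def residue_fraction(segment: str, residues: str) -> float:
    """Fraction of positions in `segment` occupied by any of `residues`."""
    if not segment:
        raise ValueError("segment must not be empty")
    return sum(ch in residues for ch in segment) / len(segment)


if __name__ == "__main__":
    # Putative nuclear localization signals quoted in the text.
    for motif in ("KKRKHK", "KRRRLRRKK"):
        share = residue_fraction(motif, "KR")
        print(f"{motif}: {share:.0%} lysine/arginine")

    # Invented toy sequence (an assumption, not E(Pc) sequence).
    toy = "MSAQQQLPQQQQQQQQQQQQAT"
    print("longest Q run in toy sequence:", longest_run(toy, "Q"))
```

The same fraction calculation applies to the alanine-rich region: 15 alanines in an 18 residue window corresponds to a fraction of 15/18, or about 83%.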
Drosophila Enhancer of Polycomb, E(Pc), is a suppressor of position-effect variegation and an enhancer of both Polycomb and trithorax mutations. A homologous yeast protein, Epl1, is a subunit of the NuA4 histone acetyltransferase complex. Epl1 depletion causes cells to accumulate in G2/M and causes a global loss of acetylated histones H4 and H2A. In relation to the Drosophila protein, mutation of Epl1 suppresses gene silencing by telomere position effect. Epl1 protein is found in the NuA4 complex and in a novel, highly active smaller complex named Piccolo NuA4 (picNuA4). The picNuA4 complex contains Esa1, Epl1, and Yng2 as subunits and strongly prefers chromatin over free histones as substrate. The conserved N-terminal domain of Epl1 bridges Esa1 and Yng2 together, stimulating Esa1 catalytic activity and enabling acetylation of chromatin substrates. A recombinant picNuA4 complex shows characteristics similar to the native complex, including strong chromatin preference. Cells expressing only the N-terminal half of Epl1 lack NuA4 HAT activity, but possess picNuA4 complex and activity. These results indicate that the essential aspect of Esa1 and Epl1 resides in picNuA4 function. It is proposed that picNuA4 represents a nontargeted histone H4/H2A acetyltransferase activity responsible for global acetylation, whereas the NuA4 complex is recruited to specific genomic loci to perturb locally the dynamic acetylation/deacetylation equilibrium (Boudreault, 2003).
There appear to be two E(Pc) paralogs in mammals, which have been named EPC1 and EPC2 in humans, and Epc1 and Epc2 in mice. A mouse EST clone containing sequences homologous to E(Pc) was used to screen an embryonic cDNA library: a 3.9 kb cDNA was recovered and termed Epc1-L. It contains 5' and 3' untranslated sequences, including a poly(A) tail, and a 2.3 kb ORF. Epc1-L encodes a protein of only 764 amino acids, about a third of the length of the Drosophila homolog. Another clone is a 1.65 kb cDNA, which contains 825 bp identical to Epc1-L, plus 109 bp not found in Epc1-L, and a downstream sequence that is identical to Epc1-L for the remainder of the clone. These divergent sequences probably correspond to alternatively spliced exons, although the genomic structure has not been determined to confirm this. The shorter splice variant has been termed Epc1-S. At the location corresponding to the insertion of the 109 bp in Epc1-S, Epc1-L contains 769 bp not found in Epc1-S. Both the 769 and 109 bp sequences are open through their entire length, but Epc1-L and Epc1-S use different reading frames downstream, even though their nucleotide sequences are identical. The result is that Epc1-S terminates much earlier than Epc1-L, to yield a protein of 344 amino acids. Another clone has also been sequenced, a short clone from Epc2 (Stankunas, 1998).
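The point that an identical nucleotide sequence can terminate at very different positions depending on the reading frame can be illustrated with a short sketch. This is an illustration only: the function name `codons_before_stop` is hypothetical, and the toy sequence is invented and is not Epc1 sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(seq: str, frame: int) -> int:
    """Count codons read from offset `frame` (0, 1 or 2) up to, but not
    including, the first in-frame stop codon (or the end of `seq`)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    count = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


if __name__ == "__main__":
    # Invented toy sequence (an assumption for illustration).
    toy = "ATGGCTGAACCGTTGGCATAA"
    for frame in (0, 2):
        n = codons_before_stop(toy, frame)
        print(f"frame {frame}: {n} codons before the first stop")
```

In the toy sequence, frame 0 reads six codons before reaching a stop, whereas frame 2 meets a TGA after a single codon; the same logic underlies the early termination described for the shorter variant.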
Comparison of E(Pc) with the Epc1-L sequence reveals three regions of sequence similarity termed EPcA, EPcB and EPcC. EPcA is an amino terminal region of 266 amino acids, which is 51% identical and 71% similar to the equivalent mouse domain. Within EPcA is a stretch of 54 amino acids in which 46 amino acids are identical to the equivalent mouse sequence and 50/54 amino acids are similar. Interestingly, this highly conserved sequence contains a consensus tyrosine phosphorylation sequence 214-RKNDEASY-221. The EPcA domain is hydrophilic and contains 34% charged residues, as opposed to 23% in the protein as a whole. The domain is also strikingly rich in methionine and tyrosine (6% and 4.15%, respectively), when compared to methionine and tyrosine in the protein as a whole (1.8% and 1.9% respectively). Structural analysis predicts that the EPcA domain in flies and mice contains three alpha helices in the regions of amino acids 5-55, 90-155 and 200-260. EPcB is an interior domain of 102 amino acids (52% identical and 64% similar in flies and mice) that contains 15% arginine and 10% serine residues. EPcC contains just 24 amino acids (54% identical and 75% similar) and is unremarkable except for its conservation over a long evolutionary distance. Epc1-L contains a short glutamine repeat, but it does not contain the alanine or the arginine repeats found in E(Pc). Interestingly, Epc1-S contains an alanine repeat but does not contain EPcB or EPcC. Perhaps Epc1-L and Epc1-S have acquired different functions. Epc2 initiates at the same methionine used in E(Pc), unlike Epc1, which initiates at a methionine internal to that of E(Pc). Neither Epc1-L nor Epc1-S contains a putative leucine zipper (Stankunas, 1998).
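The identity and similarity figures quoted above are simple ratios over aligned positions, and the counts given for the 54 amino acid core of EPcA can be converted to percentages directly. The sketch below is illustrative only: `percent` and `percent_identity` are hypothetical helper names, the scoring scheme used to call residues "similar" in the original analysis is not stated here, and the second aligned string is an invented variant of the RKNDEASY motif.

```python
def percent(matches: int, length: int) -> float:
    """Matches as a percentage of aligned length, rounded to one decimal."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100.0 * matches / length, 1)


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.
    Gap characters ('-') never count as identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(x == y and x != "-" for x, y in zip(a, b))
    return percent(same, len(a))


if __name__ == "__main__":
    # Counts taken from the text: 46 identical and 50 similar out of 54.
    print("core identity:", percent(46, 54), "%")
    print("core similarity:", percent(50, 54), "%")

    # Invented one-residue variant of the quoted motif (an assumption).
    print("toy identity:", percent_identity("RKNDEASY", "RKNEEASY"), "%")
```

On these counts the 54 residue core is about 85% identical and 93% similar, well above the 51% and 71% reported for EPcA as a whole.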
To determine if EPC1 or EPC2 co-map with known human mutations that affect growth or differentiation, the cytological location of EPC1 and EPC2 was determined using fluorescence in situ hybridization (FISH). EPC1 maps to 10p11-12 and EPC2 to 22q13.3. EPC1 also localizes to region 10p11.2-12 on the human transcript map. No transcripts match the EPC2 sequence in the same database. These data show that EPC1 and EPC2 are distinct genes. However, no mutations that map to these regions have phenotypes affecting cell growth or determination (Stankunas, 1998).
E(Pc) homologs were sought in yeast and Caenorhabditis elegans databases, and matches were found in both organisms. The EPcA domain is conserved [26% identity, 46% similarity when compared to E(Pc)] in YFL024C, a hypothetical yeast gene proposed to encode a 96.7 kDa protein of no known function, which has been renamed EPL1, for Enhancer of Polycomb-like. EPL1, like E(Pc), contains glutamine repeats and an alanine repeat. Some of the EPcA domain is conserved (42% identity, 52% similarity over a 156 amino acid sequence) in a C. elegans EST, referred to as cEPc. The cEPc protein also contains the EPcB domain. When the EPcA domain was used in a BLAST search to rescreen the databases, another human protein, called BR140 (and also termed peregrin), was recovered (Thompson, 1994). BR140 contains only some of the EPcA domain. The conserved region shows 33% identity over a 64 amino acid region. Interestingly, BR140 contains a bromodomain, found in the trithorax Group gene brahma, as well as in other highly conserved proteins required for transcriptional activation, and a PHD domain, a cysteine cluster found in many proteins including trithorax and Polycomblike. Therefore, BR140 contains three separate domains shared with different Drosophila proteins implicated in gene regulation and chromatin. The structure of BR140, which shows significant conservation of a short sequence within EPcA, suggests that EPcA is modular, and that different parts of the sequence have different functions. A modular organization for EPcA is also supported by the different spacing between conserved regions of EPL1 and those of higher eukaryotes (Stankunas, 1998).
The full-length murine Epc1 cDNA, which should hybridize to both splice variants, was used to probe Northern blots prepared from adult mouse tissues and from embryonic stages. A complex pattern of hybridization was seen. In adult tissues, there appear to be two mRNAs of 4.0 and 2.6 kb, plus one other smaller mRNA that varies in size from 1.3-2.0 kb, depending on the tissue. The 4.0 kb transcript is the major transcript in most tissues, but the 4.0 and 2.6 kb transcripts appear to be regulated independently, as can be seen by comparing amounts of these transcripts in liver or skeletal muscle with kidney. Epc1 is expressed in all tissues tested except spleen, and is present throughout embryonic development, although there are additional mRNAs of higher molecular weight expressed in embryogenesis. While the 4.0 and 2.6 kb transcripts likely represent Epc1-L and Epc1-S, respectively, the possibility cannot be ruled out that Epc2 transcripts or transcripts from uncharacterized loci are also detected (Stankunas, 1998).
Home page: The Interactive Fly © 1995, 1996 Thomas B. Brody, Ph.D.
The Interactive Fly resides on the Society for Developmental Biology's Web server.