modifier of mdg4


REGULATION

Targets of Activity

The 5'-untranslated region of the Drosophila gypsy retrotransposon contains an "insulator," which disrupts interactions between distally located enhancers and proximal promoter elements. The insulator effect depends on the suppressor of Hairy-wing (su[Hw]) protein, which binds to reiterated sites within the 350 base pairs of the gypsy insulator and also acts as a transcriptional activator of gypsy. This study shows that the 350-base pair gypsy insulator, which contains the su(Hw) binding sites, behaves as a matrix/scaffold attachment region (MAR/SAR) that interacts with the nuclear matrix. In vitro experiments using nuclear matrices from Drosophila, murine, and human cells demonstrate specific binding of the gypsy insulator, which is not observed with any other sequence within the retrotransposon. Moreover, the gypsy insulator, like previously characterized MAR/SARs, specifically interacts with topoisomerase II and histone H1, two essential components of the nuclear matrix. Experiments in cultured cells show that the gypsy MAR sequence has different effects on reporter genes depending on the assay: no effect under conditions of transient transfection, but a repressing effect in stable transformants, as expected for a sequence involved in chromatin structure and organization (Nabirochkin, 1998).

The presence of a MAR/SAR within gypsy is not entirely unexpected, since "boundary" elements are generally regions that contain not only enhancer and insulating elements but also matrix attachment domains. The unusual feature of the gypsy sequence is that all three domains, which are generally sufficiently "dispersed" to allow isolation of "pure" enhancers, MAR/SARs, or insulators, are here "gathered" within a single, relatively short (350 bp) sequence. This uncommon situation might reflect the pressure for compactness within retroviral sequences, since retroviruses can package only a limited amount of genetic information. A consequence of this compaction is that the gypsy insulator and its associated components most probably interact, in vivo, with elements of the nuclear matrix. Accordingly, proteins of the nuclear matrix might play a role in the insulation process, and conversely the su(Hw) protein (which is essential for insulation) might interact with proteins of the matrix. Such interactions could account for the data on gypsy insulation and fit with previously proposed models for the gypsy effects (Nabirochkin, 1998).

A first series of data strongly suggested that the gypsy insulator, like all previously characterized insulators, essentially prevents interactions between a distal enhancer and a promoter without directly repressing the enhancer itself. This directional effect is most easily accounted for by the "looping model," in which attachment of boundary sequences (MAR/SARs) to the nuclear matrix generates structural domains isolated from one another. Alternatively, another series of data on gypsy insulation (essentially from mod(mdg4) mutants) reveals bidirectional repressing effects, which can be accounted for by a model involving heterochromatinization. The present data, showing that the gypsy insulator behaves as a MAR/SAR, clearly agree with the structural looping model but also support the heterochromatinization model. Indeed, the gypsy MAR/SAR DNA per se, in the absence of su(Hw) protein, is involved in histone H1 nucleation (as shown in this paper), and histone H1 nucleation has been shown to be associated with both DNA compaction and transcriptional silencing. Additionally, Laemmli and co-workers have found that histone H1 can be removed from MAR/SAR domains by distamycin and distamycin-like proteins (D-like proteins, such as the high mobility group proteins); this led to the proposal that MAR/SARs can activate or repress transcription of adjacent genes depending on whether histone H1 is nucleated on them or depleted from them. The gypsy MAR/SAR could then be responsible for the repressing effect observed in mod(mdg4) mutants, as well as in the present assay in heterologous cells (assuming further that appropriate D-like proteins are absent in those cells).
Taking into account, in addition, that mutations in the mod(mdg4) or su(Hw) genes modify position-effect variegation, it could further be hypothesized that the Su(Hw)/Mod(mdg4) complex acts like the D-like proteins and modifies the nucleation process, allowing a switch from a repressed to an active state. Accordingly, a model in which the su(Hw) binding sites and the associated Su(Hw)/Mod(mdg4) complex modulate the effects of the MAR/SAR DNA sequence could simply account for the biological effects of the gypsy insulator in both wild-type flies and su(Hw)/mod(mdg4) mutants. The proposed model would thus reconcile the two previous models for gypsy insulation, i.e. the heterochromatinization and looping models (Nabirochkin, 1998 and references).

Trans-splicing of mod(mdg4)

The Drosophila BTB domain-containing gene mod(mdg4) produces a large number of protein isoforms that combine a common N-terminal region of 402 aa with different C termini. The genomic structure of this complex locus has been deduced, and at least seven of the mod(mdg4) isoforms were found to be encoded on both of its antiparallel DNA strands, suggesting that mature mRNAs are generated by trans-splicing. Drosophila can produce mod(mdg4) mRNAs by trans-splicing of pre-mRNAs generated from transgenes inserted at distant chromosomal positions. Evidence is presented for the occurrence of trans-splicing of mod(mdg4)-specific exons encoded by the parallel DNA strand. The mod(mdg4) locus represents a new type of complex gene structure in which genetic complexity is resolved by extensive trans-splicing, which has important implications for genome sequencing projects. The demonstration of naturally occurring trans-splicing in the model organism Drosophila opens new experimental approaches for analyzing the underlying mechanisms (Dorn, 2001).

During the molecular analysis of the mod(mdg4) locus, 26 different classes of transcripts were identified, all containing a common 5' sequence (exons 1-4) but different 3' regions. The deduced proteins contain an N-terminal BTB domain. Furthermore, most of the isoforms contain a conserved C-terminal protein motif with C2H2 residues. Two new cDNA clones representing isoforms mod(mdg4)-52.2 and mod(mdg4)-54.1 were isolated by screening an embryonic cDNA library. Two additional putative isoforms, mod(mdg4)-53.6 and mod(mdg4)-54.7, were detected by searching the genomic region of mod(mdg4) for ORFs containing the C-terminal C2H2 consensus sequence. Mod(mdg4)-58.8 represents another new isoform, which had so far been identified only by a 3'-truncated cDNA clone. RT-PCR experiments reveal the existence of mod(mdg4)-53.6, -54.7, and -58.8 transcripts in early embryos, and the putative proteins were deduced by sequencing the resulting PCR products. All of them contain the conserved C-terminal protein consensus sequence. Altogether, 20 of the 26 identified mod(mdg4) isoforms combine the N-terminal BTB domain with the C-terminal consensus sequence, which might be of functional significance (Dorn, 2001).

On the basis of extensive cDNA sequence data and the available sequence of the mod(mdg4) region, the exon/intron structure of mod(mdg4) has been deduced. Interestingly, seven of the transcripts are not colinearly located within the locus: relative to the common exons 1-4, the specific exon 5 of each of the isoforms mod(mdg4)-53.1, -62.3, -55.6, -53.6, -54.7, -57.4, and -67.2 is encoded by the antiparallel DNA strand. Besides this finding, the exon/intron structure of mod(mdg4) suggests differential splicing within isoform-specific exons (isoforms mod(mdg4)-55.7 and -52.2; mod(mdg4)-54.6, -56.3, -54.2, and -46.3; mod(mdg4)-58.6 and -54.1). Judging from its genomic structure, the genetic density at the mod(mdg4) complex is unusually high: altogether, 26 independent transcripts with an average size of 2 kb are encoded in a genomic region of 28 kb (Dorn, 2001).

To demonstrate trans-splicing, specific exons have been separated from the common exons 1-4 onto different chromosomes via transgene insertions. For this assay, the specific exons mod(mdg4)-55.1 and mod(mdg4)-53.1, which are encoded by antiparallel DNA strands, were chosen. Notably, the 3'-untranslated regions of the two isoforms are antisense to each other over a region of 149 nt. The experiment also tests for putative trans-splicing of isoform mod(mdg4)-55.1, which is colinearly located with respect to the common exons 1-4. In the transgenes, both specific exons were sequence-tagged via PCR, and the resulting fragments, containing 2.0 kb and 1.2 kb of genomic sequence upstream of the splice sites of mod(mdg4)-55.1 and mod(mdg4)-53.1, respectively, were cloned in both orientations into the Drosophila transformation vector pUAST. Expression of the inserted sequences is induced by the yeast transcriptional activator GAL4, which is expressed from an independent driver element (Dorn, 2001).

One of the important criteria for efficient mRNA trans-splicing is the presence of a 3' splice site in the absence of a functional 5' splice site, as found in outrons. To meet this criterion, independent transcripts containing one or several of the endogenous specific mod(mdg4) exons would be expected to be produced. A search for putative promoter elements within the mod(mdg4) complex revealed several TATA-box-containing elements. One of these is located upstream of the specific exon mod(mdg4)-55.1 and is contained within the transgene construct used for the trans-splicing assay. To test its function in vivo, all transgenic lines were tested for expression of the transgene in the absence of any GAL4 driver element. Independently of the insertion site of the transgene and its orientation relative to the UAS sequence, trans-splicing of the tagged specific exon mod(mdg4)-55.1 could be demonstrated. The forward primer E4-F and the primer 55.1-tag1-back were again used for PCR. However, the level of expression and/or the efficiency of trans-splicing varies between transformants, possibly because of chromosomal position effects that depend on the insertion site of the transgene. It is concluded from these results that independent mRNAs are produced endogenously: mRNAs containing the common exons 1-4 in one case, and mRNAs containing the specific exon mod(mdg4)-55.1 in the other (Dorn, 2001).

The mod(mdg4) locus represents an unusual type of gene structure. Both DNA strands within the locus are used to encode a large number of protein isoforms. A transgenic approach clearly demonstrates that both the colinearly located specific exons [demonstrated for exon mod(mdg4)-55.1] and those encoded by the antiparallel DNA strand [shown for exon mod(mdg4)-53.1] are substrates for trans-splicing. This result also suggests that all other mod(mdg4) isoforms might be generated by trans-splicing, implying that independent pre-mRNAs are initiated at several promoter elements within the mod(mdg4) complex. Multiple TATA-box-containing elements were found throughout the locus. One of these is located upstream of the mod(mdg4)-55.1 isoform and is contained in the transgene. Expression of the transgene independent of the inducible promoter element in six independent transgenic lines indicates a putative promoter function. However, further experiments are needed to demonstrate the existence of multiple promoters at the mod(mdg4) locus. Moreover, these results demonstrate that trans-splicing occurs within mod(mdg4) independently of the chromosomal context of the common exons 1-4 and the 3'-specific exons. This also raises the question of the special requirements for initiating trans-splicing at mod(mdg4). Further experiments should clarify whether RNA recognition, nuclear compartmentalization, or both play a role in the initiation of trans-splicing (Dorn, 2001).

The data suggest that trans-splicing is a general property of the mod(mdg4) locus. Three possible types of trans-splicing events have been envisioned. The transcript containing the common exons 1-4 is produced in large quantities and contains putative interaction sites with upstream regions of pre-mRNAs containing the specific mod(mdg4) exon(s). The specific exons are transcribed as mono-exonic, di-exonic, or polyexonic mRNAs from both DNA strands. Depending on the site of trans-splicing, three different protein isoforms (A, B, and C) are produced. The expression of the alternatively spliced mod(mdg4) isoforms could be regulated at several levels: (1) differential spatial and temporal expression of pre-mRNAs containing one (or a group of) alternatively spliced specific exons; (2) differences in the selectivity and efficiency of trans-splicing, generating different quantities of mature mRNAs; and (3) variable stability of the isoform-specific transcripts. As a result, more than 20 different mod(mdg4) protein isoforms are produced; all contain a common region of 402 aa, including the N-terminal BTB domain implicated in dimerization/oligomerization, and variable C termini with the conserved C2H2 motif. The variable C termini are thought to specify the function of individual isoforms in different processes such as chromatin insulator function, programmed cell death, or modification of gene silencing (Dorn, 2001).

Two mutant alleles of the same gene, each located on one of the two homologous chromosomes, may in some instances restore the wild-type function of the gene. This is the case for certain combinations of mutant alleles of the mod(mdg4) gene. This gene encodes several different proteins, including Mod(mdg4)2.2, a component of the gypsy insulator. This protein is encoded by two separate transcription units that can be combined in a trans-splicing reaction to form the mature Mod(mdg4)2.2-encoding RNA. Molecular characterization of complementing alleles shows that they affect the two different transcription units. Flies homozygous for either allele lack the Mod(mdg4)2.2 protein, whereas trans-heterozygotes are able to synthesize almost normal levels of the Mod(mdg4)2.2 product. This protein is functional, as judged by its ability to form a functional insulator complex. The results suggest that interallelic complementation in the mod(mdg4) gene is a consequence of trans-splicing between two different mutant transcripts. A conclusion from this observation is that the trans-splicing reaction between transcripts produced on two different mutant chromosomes ensures wild-type levels of functional protein (Mongelard, 2002).
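The complementation argument can be made concrete with a deliberately simplified Boolean sketch. It assumes that each chromosome carries two units, a 5' unit with the common exons (splice donor side) and a 3' unit with the 2.2-specific exon (acceptor side), and that a functional mRNA needs an intact donor joined to an intact acceptor. In both modes the two units are separate pre-mRNAs joined by trans-splicing; the only question is whether partners may come from different homologs. The class, function, and allele names (`Allele`, `can_make_mrna`, `m5`, `m3`) are hypothetical illustrations, and the model ignores quantities, so it says nothing about the "almost normal levels" reported.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Allele:
    """Toy allele (hypothetical): records whether each transcription unit is intact.

    five_prime_ok:  unit carrying the common 5' exons (splice donor side)
    three_prime_ok: unit carrying the 2.2-specific 3' exon (acceptor side)
    """
    name: str
    five_prime_ok: bool
    three_prime_ok: bool


def can_make_mrna(chrom1: Allele, chrom2: Allele, between_homologs: bool) -> bool:
    """Return True if some intact 5' unit can be joined to some intact 3' unit.

    between_homologs=False: donor and acceptor must come from the same chromosome.
    between_homologs=True:  donor and acceptor may also come from the paired homolog.
    """
    chromosomes = (chrom1, chrom2)
    if between_homologs:
        pairs = product(chromosomes, repeat=2)  # all donor/acceptor combinations
    else:
        pairs = ((c, c) for c in chromosomes)   # same-chromosome pairs only
    return any(donor.five_prime_ok and acceptor.three_prime_ok
               for donor, acceptor in pairs)


WILD_TYPE = Allele("+", True, True)
MUT_5 = Allele("m5", False, True)  # hypothetical allele disrupting the 5' unit
MUT_3 = Allele("m3", True, False)  # hypothetical allele disrupting the 3' unit

for genotype in [(MUT_5, MUT_5), (MUT_3, MUT_3), (MUT_5, MUT_3), (WILD_TYPE, MUT_5)]:
    label = "/".join(a.name for a in genotype)
    print(label,
          "same chromosome only:", can_make_mrna(*genotype, between_homologs=False),
          "| between homologs:", can_make_mrna(*genotype, between_homologs=True))
```

In this sketch, both homozygotes fail in either mode, while the m5/m3 trans-heterozygote succeeds only when partners from the two homologs can be joined, which is the situation the interallelic complementation data point to.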

The interallelic complementation mechanism reported here differs from those previously described. Interallelic complementation between mutations affecting the coding region of a gene has been observed when each allele is affected in only one of two separate functional domains of a multifunctional protein; two alleles, each deficient in a different domain, may then complement one another. Such a complementation mechanism requires the production of abnormal proteins by the mutant loci. This is not the case for mod(mdg4), as assessed by Western blots and in situ immunodetection. The complementation observed here has its molecular origin in the cell's ability to produce a wild-type RNA by combining information present in two mutant transcripts. A trans-splicing event is most probably involved in the production of the final Mod(mdg4)2.2 mRNA. How trans-splicing is integrated with transcription and pre-mRNA processing remains to be addressed. Abundant evidence suggests that transcription and processing of the mRNA are coordinated nuclear events. For example, splicing factors are recruited to sites of transcription by RNA polymerase II. Similarly, mRNA capping and polyadenylation seem to occur right at the transcription site. Finally, even the packaging of the mature mRNA into heterogeneous ribonucleoparticles prior to cytoplasmic export could be coupled to other pre-mRNA processes. One may therefore argue that, to enter a trans-splicing reaction, two pre-mRNAs need to be in physical proximity already during their transcription, before they engage in cis-splicing and other processing events. At a wild-type mod(mdg4) locus, this condition is always met thanks to the configuration of the gene: the two transcription units are in the same locus.
In the case of flies undergoing interallelic complementation, this proximity condition may be met because of the extensive somatic pairing that exists between homologous chromosomes in both polytene and diploid cells (Mongelard, 2002).

It remains an open question whether a physiologically significant number of trans-splicing events are possible when the transcripts involved are produced at distant nuclear locations. It has been recently shown that two transcripts, one produced by the normal mod(mdg4) gene and a second one by a transgene inserted elsewhere in the genome, may be combined, presumably by trans-splicing. This latter study used a nonquantitative RT-PCR-based assay to detect the trans-spliced mRNA. It is therefore difficult to assess whether the increased distance between sites of transcription of both pre-mRNAs diminishes the efficiency of trans-splicing. If the level of trans-splicing is low when the mutant alleles are physically far away in the genome, the levels of protein synthesized might not be sufficient to restore the wild-type function of the mod(mdg4) gene. If this is the case, interallelic complementation at the mod(mdg4) locus will exhibit properties similar to transvection, in which phenotypic complementation is sensitive to the pairing of the two complementing alleles. The combined analysis of trans-splicing and somatic pairing of homologous chromosomes may constitute a powerful tool to study the intricate succession of events involved in the transcription and processing of the RNA in higher eukaryotes (Mongelard, 2002).

The modifier of mdg4, mod(mdg4), locus in Drosophila melanogaster represents a new type of complex gene in which functional diversity is resolved by mRNA trans-splicing. A protein family of more than 30 transcriptional regulators, thought to be involved in higher-order chromatin structure, is encoded by both DNA strands of this locus. Mutations in mod(mdg4) have been identified independently in a number of genetic screens involving position-effect variegation, modulation of chromatin insulators, apoptosis, pathfinding of nerve cells, and chromosome pairing, indicating pleiotropic effects. The unusual gene structure and mRNA trans-splicing are evolutionarily conserved in the distantly related species Drosophila virilis. Chimeric mod(mdg4) transcripts encoded from nonhomologous chromosomes, containing the splice donor from D. virilis and the acceptor from D. melanogaster, are produced in transgenic flies. A significant amount of protein can be produced from these chimeric mRNAs. The evolutionary and functional conservation of mod(mdg4) and of mRNA trans-splicing in both Drosophila species is further demonstrated by the ability of D. virilis mod(mdg4) transgenes to rescue the recessive lethality of mod(mdg4) mutant alleles in D. melanogaster (Gabler, 2005).

The majority of genes in higher eukaryotes represent monocistronic units in which noncoding intron regions interrupt the protein-coding exon sequences. The resulting mature mRNA usually encodes a unique polypeptide. Recent advances in the genome analysis of several model organisms and the molecular characterization of a large number of genes have revealed that alternative pre-mRNA splicing is one of the main mechanisms generating a highly expanded proteome diversity. Thus, protein families with slightly different isoforms, or even proteins with unrelated functions, can be produced from single or multiple promoter elements within one gene. Regulatory integration of different transcriptional units is found in gene complexes such as the Hox, hemoglobin, or immunoglobulin genes. This organization reflects the clustering of genes with related functions. With mod(mdg4), a new type of functional clustering has been discovered in Drosophila. This complex locus encodes more than 30 isoforms generated by mRNA trans-splicing. Protein isoforms produced by mod(mdg4) contain a common 402-amino-acid N-terminal region encoded by the four 5'-exons but differ in their C-terminal region, which is encoded by alternative 3'-exons. This kind of trans-splicing clearly differs from the spliced leader trans-splicing that predominates in Caenorhabditis and trypanosomes, where polycistronic transcripts are resolved by the addition of noncoding leader sequences. Mutational dissection and differential binding of Mod(mdg4) isoforms on polytene chromosomes suggest that the variable C-terminal regions encoded by the alternative 3'-exons determine functional specificity. Specific Mod(mdg4) isoforms are thought to be involved in the control of heterochromatic gene silencing, regulation of homeotic genes, function of chromatin insulators, nerve cell pathfinding, induction of apoptosis, and control of meiotic processes.
Genomic structure and transgene analysis demonstrate the specific functional organization of the complex mod(mdg4) locus. Mature mod(mdg4) transcripts are generated by a trans-splicing mechanism that combines one primary transcript comprising the four common 5'-exons with another transcription unit contributing one of the alternative 3'-exons. A comparably complex gene structure has also been described for a number of other genes in Drosophila, including Broad, tramtrack, GAGA-factor/Trl, and lola, all of which encode numerous protein isoforms with alternative C termini. Interestingly, in addition to mod(mdg4), mRNA trans-splicing was recently reported for the lola locus. Another unique characteristic of these genes is that they all encode BTB/POZ domain proteins, which frequently contain Cys2His2 zinc-finger motifs within the variable C-terminal region (Gabler, 2005).

The limited knowledge of the functional significance of the large number of mod(mdg4) isoforms and of the unusual gene structure in D. melanogaster prompted an analysis of the orthologous locus from the distantly related species D. virilis. D. virilis is an evolutionarily distant species that separated ~40–60 million years ago from the subgenus Sophophora, which includes D. melanogaster. This period of time has allowed for the selection of functionally essential genes. A number of orthologous genes have been studied in detail, and their functional conservation in D. virilis was demonstrated by mutant rescue experiments. The degree of overall conservation within coding regions is variable and can reach 98% similarity. The results demonstrate a strong evolutionary conservation of all Mod(mdg4) isoforms identified in D. virilis, indicating the functional significance of the multiple isoforms. Evidence has been presented for a functional differentiation of at least two isoforms, Mod(mdg4)-58.0 and Mod(mdg4)-67.2, in D. melanogaster. The high degree of sequence conservation of both isoforms in D. virilis is in good agreement with their binding to corresponding sites on polytene chromosomes, as shown for isoform Mod(mdg4)-58.0. This binding to corresponding subdivisions on polytene chromosomes suggests an involvement in the regulation of a subset of orthologous genes in D. melanogaster and D. virilis (Gabler, 2005).

The common N-terminal region, which is part of all isoforms and is therefore thought to provide general functions, shows extended identity beyond the BTB/POZ domain. This common protein region represents about two-thirds of each Mod(mdg4) protein. The ubiquitously expressed protein Chip interacts with the common region of Mod(mdg4) in D. melanogaster. Chip is thought to facilitate enhancer-promoter interactions in a large number of genes and interacts genetically and physically with several LIM- and homeodomain-containing transcription factors. These data, together with the pleiotropic effects observed for most mod(mdg4) mutants, indicate a putative link between the several hundred binding sites of Mod(mdg4) on polytene chromosomes and their involvement in the transcriptional regulation of a large number of genes. The strong conservation of the common protein region in both Drosophila species might be a consequence of the evolutionarily conserved interaction with Chip and other putative interacting proteins. The N-terminal BTB/POZ domain is almost identical in both species. This domain has been shown to mediate homo- and/or heterodimerization. A similar degree of conservation between D. melanogaster and D. virilis was found for the BTB/POZ domain-containing gene GAGA/Trl. In this case, too, at least two alternatively spliced isoforms containing a common N-terminal region of 400 amino acids but variable C termini have been described. However, in contrast to mod(mdg4), no significant functional differentiation between the two GAGA isoforms has been described (Gabler, 2005).

When the specific C termini of orthologous Mod(mdg4) isoforms are compared, a remarkable degree of identity is found within the FLYWCH domain, a protein domain containing a Cys2His2 motif. This domain is thought to be involved in protein-protein interactions. The strong conservation of most amino acid positions within this motif between orthologous isoforms implies their functional importance for isoform-specific interactions with other proteins. The unique C-terminal region of isoform Mod(mdg4)-67.2 has been shown to interact with Su(Hw) to create a functional gypsy insulator element, whereas the unique C terminus of isoform Mod(mdg4)-56.3/Doom interacts with the baculovirus inhibitor of apoptosis protein/IAP. The high degree of sequence identity suggests that these interactions are conserved in D. virilis. When the orthologous D. virilis isoforms Mod(mdg4)-64.2, Mod(mdg4)-60.1, and Mod(mdg4)-67.2 are compared with their counterparts in D. melanogaster, it becomes evident that additional amino acid positions flanking the FLYWCH motif are highly conserved. However, the extent and location of the identity beyond the FLYWCH motif are isoform dependent. In the case of Mod(mdg4)-67.2, an additional strongly conserved sequence motif of 22 amino acids is located at the C terminus. Pull-down experiments with a C-terminally truncated Mod(mdg4)-67.2 protein (deletion of 43 amino acids), together with the phenotype associated with the corresponding mutant protein (Mod(mdg4)-67.2T6), indicate that the FLYWCH domain itself is not sufficient for interaction with Su(Hw), pointing to the functional importance of the strongly conserved 22 C-terminal amino acids. Isoforms lacking the FLYWCH motif are also conserved, as shown for Mod(mdg4)-58.0 (51% identity within the unique C terminus).
Recently, an evolutionary analysis of several Dipteran orthologous mod(mdg4) loci revealed a significant conservation of most isoforms, including Mod(mdg4)-58.0, Mod(mdg4)-60.1, Mod(mdg4)-64.2, and Mod(mdg4)-67.2 (Gabler, 2005).

Two conclusions can be drawn from the evolutionary conservation of Mod(mdg4) proteins: first, the large number of isoforms is functionally important in both Drosophila species, and second, the conservation of the unique C-terminal regions clearly points to a functional differentiation between individual isoforms (Gabler, 2005).

The present study demonstrated that, along with the evolutionary conservation of the unusual gene structure of mod(mdg4) in D. virilis, mRNA trans-splicing is also conserved in both species. Three different assays were performed to prove the existence of chimeric transcripts in vivo. The identification of chimeric mod(mdg4) isoforms in transgenic flies clearly indicates that the mechanism of mRNA trans-splicing is conserved between these distantly related Drosophila species. Quantitative RT-PCR experiments reveal that, in the case of isoform Mod(mdg4)-67.2, the chimeric D. virilis/D. melanogaster transcript in transgenic flies carrying two copies of the second-chromosome P(w+ Dv mod(mdg4) 6.8kb NotI-XbaI) transgene represents ~12% of the corresponding endogenous D. melanogaster transcript. The Mod(mdg4)-67.2 protein can be clearly detected on polytene chromosomes of 2-P(w+ Dv mod(mdg4) 11.5kb NotI)/+; mod(mdg4)02/mod(mdg4)02 larvae but not in mod(mdg4)02 homozygous larvae. Because the specific mod(mdg4)-67.2 exons are not encoded by the D. virilis transgene, this result strongly suggests that the cytologically detected protein represents the chimeric D. virilis/D. melanogaster Mod(mdg4)-67.2 protein, which is produced in significant amounts. Indeed, the presence of considerable amounts of the full-length Mod(mdg4)-67.2 protein was demonstrated by Western blot analysis. The maintenance of the binding pattern of the chimeric Mod(mdg4)-67.2 isoform on polytene chromosomes, compared with that of D. melanogaster Mod(mdg4)-67.2, also implies functional conservation of the D. virilis N-terminal region (Gabler, 2005).

Interallelic complementation is facilitated by mRNA trans-splicing when two mutations disrupting independent mod(mdg4) mRNAs are combined in trans. It is assumed that the close proximity of donor and acceptor mRNAs within the mod(mdg4) locus is a prerequisite for the generation of significant amounts of wild-type Mod(mdg4)-67.2 protein. The lola locus of D. melanogaster represents a second complex gene in which mRNA trans-splicing has been demonstrated. Mutations interfering with the pairing of the lola locus reduce the in vivo trans-splicing of isoform T from 44% to 1%; however, the consequences at the protein level were not examined. The transgene assay for mod(mdg4) clearly demonstrates that even underrepresented chimeric transcripts produced from mRNAs encoded on nonhomologous chromosomes can yield considerable levels of the corresponding protein. Mutant rescue experiments with two different D. virilis mod(mdg4) transgenes indicate the functional conservation of Mod(mdg4) protein isoforms. Both the P(w+ Dv mod(mdg4) 11.5kb NotI) transgene, which encodes the five proximal isoforms, and the P(w+ Dv mod(mdg4) 6.8kb NotI-XbaI) transgene, which encodes exclusively the common exons 1–4, rescue the recessive lethality of mod(mdg4) mutant alleles. The rescue ability of the short transgene is thought to depend mainly on its capacity to produce sufficient chimeric transcripts consisting of the D. virilis common exons and the endogenous D. melanogaster specific exons, as demonstrated at least for isoform Mod(mdg4)-67.2. However, the significantly reduced rescue ability of the shorter transgene indicates that all or some isoforms must exceed a critical threshold to restore viability completely. The P(w+ Dv mod(mdg4) 11.5kb NotI) transgene, which produces five orthologous D. virilis isoforms, significantly improves rescue ability. Position effects influencing the expression level of the transgene cannot be excluded.
Further experiments with a series of independent insertions of the short transgene scattered throughout the genome should provide insight into a putative correlation between genomic transgene position and the efficiency of trans-splicing (Gabler, 2005).

The observed frequency of chimeric transcripts, although significantly lower than that of the corresponding endogenous transcript, can be interpreted in two ways. First, the splice donor containing the D. virilis mod(mdg4) common exons is produced at a high level, enabling it to spread through the nucleus. Thus, a significant number of donor molecules are in close proximity to mod(mdg4) acceptor mRNAs, even when these are transcribed from a nonhomologous chromosome. The much higher expression of the common exons compared with the specific isoform mod(mdg4)-67.2 in w1118 females (116-fold) is in agreement with this hypothesis. A second explanation proposes that both precursor mRNAs are transcribed within the same nuclear compartment, thereby increasing the frequency of chimeric mRNAs (Gabler, 2005).

Insulators form gene loops by interacting with promoters in Drosophila

Chromatin insulators are regulatory elements involved in the modulation of enhancer-promoter communication. The 1A2 and Wari insulators are located immediately downstream of the Drosophila yellow and white genes, respectively. Using an assay based on the yeast GAL4 activator, it was found that both insulators are able to interact with their target promoters in transgenic lines, forming gene loops. The existence of an insulator-promoter loop is confirmed by the fact that insulator proteins could be detected on the promoter only in the presence of an insulator in the transgene. The upstream promoter regions, which are required for long-distance stimulation by enhancers, are not essential for promoter-insulator interactions. Both insulators support basal activity of the yellow and white promoters in eyes. Thus, the ability of insulators to interact with promoters might play an important role in the regulation of basal gene transcription (Erokhin, 2011).

Insulators regulate gene activity in a variety of organisms. The defining feature of insulators as a class of regulatory elements is their ability to block enhancer-promoter interactions only when positioned between them (Erokhin, 2011).

Two mutually non-exclusive but rather complementary mechanisms can account for the ability of insulators to block enhancers and support long-distance interactions. Experiments with transgenic lines suggest that the interaction between insulators can result in the formation of chromatin loops that either block or facilitate long-distance enhancer-promoter communication depending on the nature of the interacting insulators as well as on the distances between all the elements involved (enhancers, insulators and promoters) and their relative 'strength'. Alternatively, insulator action can be explained by the ability of insulators to form direct contacts with either an enhancer (the decoy model) or a promoter, thereby inactivating them. For example, the insulator protein CTCF binds to the unmethylated maternal allele of the imprinting control region (ICR) in the Igf2/H19 imprinting domain and blocks enhancer-promoter communication by directly interacting with Igf2 promoters. Insulators of the Drosophila Abd-B gene can establish contact with a region upstream of the promoter that is required for proper enhancer-promoter communication. Several Drosophila insulators [scs, scs', IdefixU3 and Faswb] have been shown to contain promoters, which, according to the decoy model, may tether enhancers in nonproductive interactions. The stalled promoters of the bithorax complex display insulator activity in embryos. Many insulator proteins, such as CTCF, CP190, Mod(mdg4)-67.2 [Mod(mdg4) - FlyBase] and BEAF (BEAF-32), are frequently found bound to the promoters (Erokhin, 2011 and references therein).

Previously, two well-studied tissue-specific Drosophila genes, yellow and white, were shown to contain insulators immediately downstream of their coding regions. The yellow gene is responsible for dark pigmentation of the larval and adult cuticle and its derivatives, whereas the white locus determines eye pigmentation. The 1A2 insulator located on the 3' side of the yellow gene contains two binding sites for the Su(Hw) protein. Additional proteins, Mod(mdg4)-67.2, CP190 and E(y)2, interact with Su(Hw) and are required for the activity of Su(Hw)-dependent insulators. None of the known DNA-binding insulator proteins binds to the Wari insulator located on the 3' side of the white gene. However, stage-specific binding of CP190 and E(y)2 to the Wari insulator has been observed (Erokhin, 2010), which was indicative of its relationship to Su(Hw) insulators (Erokhin, 2011).

This study presents evidence that the 1A2 and Wari insulators interact with their target promoters and that this facilitates the formation of a gene loop between the promoter and terminator regions (Erokhin, 2011).

These insulators can support a gene loop that brings together a promoter and a terminator. The results obtained by ChIP assay suggest that insulator-promoter interactions are transcription dependent. To date, transcription-dependent gene looping has been demonstrated in yeast and in the HIV provirus. In yeast, loop formation was reported to be organized by TFIIB and the Ssu72 and Pta1 components of the 3'-end processing machinery. It is possible that this mechanism is conserved between eukaryotes and that the interaction between an insulator and a promoter is required to facilitate the formation of a gene loop and/or its stabilization (Erokhin, 2011).

It has been suggested that gene loop formation might be a common feature of gene activation that serves to promote efficient transcriptional elongation and transcription reinitiation by facilitating RNAP II recycling from the terminator to the promoter, reinforcing the coupling of transcription with mRNA export and enhancing terminator function. This study has found that the interaction of insulators with promoters is required for the basal activity of the white and yellow promoters in the eye. In addition to the possible role of a gene loop in the enhancement of RNAP II recycling and mRNA export, insulators might serve to bring to the promoter the remodeling and histone modification complexes that improve the binding and stabilization of the TFIID complex (Erokhin, 2011).

Chopra (2009) recently found that the enhancer-blocking activity of several promoters and insulators depends on general transcription factors that inhibit RNAP II elongation. That study suggests that insulators interact with components of the RNAP II complex at stalled promoters and that the resulting chromatin loops can prevent the inappropriate activation of stalled genes by enhancers associated with the neighboring locus. This study found that the upstream promoter regions required for interactions with enhancers are not necessary for insulator-promoter interactions, which provides evidence that insulator proteins can interact with general transcription factors or proteins involved in the organization of promoter architecture. Certain types of insulators [the Su(Hw)-dependent 1A2, the Zw5-dependent scs, and Wari] can effectively interact with the yellow promoter, whereas others appear not to (the GAF-dependent Fab-7 and CTCF-dependent Mcp). GAF and CTCF are frequently found bound to promoter regions (Smith, 2009; Bartkuhn, 2009; Bushey, 2009; Nègre, 2010), which indicates that insulators that utilize these proteins are also involved in long-distance interactions with some promoters. For example, it is speculated that the Fab-7 insulator can interact with stalled promoters, such as the Abd-B promoter (Erokhin, 2011).

This study has shown that the GAL4 activator is unable to stimulate the promoter when GAL4 binding sites are placed downstream of the insulator. It appears likely that a loop is also formed between the insulator and promoter in this case, but that GAL4 is left outside the loop and blocked by the insulator. Thus, a chromatin loop formed by the promoter and insulator can prevent undesirable interactions with downstream regulatory elements. This provides evidence that the promoter-binding capacity of at least some insulators might contribute to their enhancer-blocking activity (Erokhin, 2011).

The genome-wide analysis of binding sites for insulator proteins has shown that they are present at the 3' and 5' UTRs of many Drosophila genes (Nègre, 2010). The 1A2 and Wari insulators at the 3' end of the yellow and white genes were identified only as a result of the extensive use of these genes in insulator assays. Thus, it appears that insulators are likely to be located at the 3' UTRs of many genes. Further experiments are required to resolve this issue and to elucidate the mechanisms and functional role of insulator-promoter interactions in transcriptional regulation (Erokhin, 2011).

Protein Interactions

suppressor of Hairy wing physically interacts with Modifier of mdg4 (Mod[mdg4]). su(Hw) protein was applied to a glutathione-Sepharose 4B column in the presence or absence of a glutathione S-transferase-Mod(mdg4) fusion protein, and the proteins retained in the column were eluted with glutathione and subjected to Western blot analysis. The su(Hw) protein is retained in the column only when previously incubated with the GST-Mod(mdg4) fusion protein, indicating that the two proteins physically interact (Gerasimova, 1995).

The gypsy insulator is thought to play a role in nuclear organization and the establishment of higher order chromatin domains by bringing together several individual insulator sites to form rosette-like structures in the interphase nucleus. The Su(Hw) and Mod(mdg4) proteins are components of the gypsy insulator required for its effect on enhancer-promoter interactions. Using the yeast two-hybrid system, it has been shown that the Mod(mdg4) protein can form homodimers, which can then interact with Su(Hw). The BTB domain of Mod(mdg4) is involved in homodimerization, whereas the C-terminal region of the protein is involved in interactions with the leucine zipper and adjacent regions of the Su(Hw) protein. Analyses using immunolocalization on polytene chromosomes confirm the involvement of these domains in mediating the interactions between these proteins. Studies using diploid interphase cells further suggest the contribution of these domains to the formation of rosette-like structures in the nucleus. The results provide a biochemical basis for the aggregation of multiple insulator sites and support the role of the gypsy insulator in nuclear organization (Ghosh, 2001).

The formation of loops or higher order domains of chromatin structure requires the individual insulator sites from different chromosomal locations to come together in the nucleus. This organization must be mediated by interactions among protein components of the insulator. These interactions are indeed possible and take place in vivo in the case of the gypsy insulator of Drosophila. Mapping the domains of the Su(Hw) and Mod(mdg4) proteins involved in this interaction might shed light on how insulators could be involved in the establishment of higher order chromatin organization. Disruption of the leucine zipper and regions B and C of Su(Hw) renders the gypsy insulator unable to interfere with enhancer-promoter interactions. Results presented here indicate that disruption of this region of Su(Hw) also abolishes its interaction with Mod(mdg4) and eliminates the punctate nuclear staining pattern, suggesting that interaction between the two proteins is required for establishing domains in the nucleus, and that the establishment of these domains correlates with the functionality of the insulator (Ghosh, 2001).

Mod(mdg4) has at least 21 different isoforms generated by alternative splicing. All the proteins contain a common N-terminus of 402 amino acids that includes a BTB/POZ domain, whereas the C-terminus of the protein is variable. Most of these Mod(mdg4) proteins are present at only a few sites on polytene chromosomes, and only the Mod(mdg4) 2.2 protein (the product of a splice variant, the 2.2 kb transcript that is the major form in the wild-type Canton S strain) appears to be a general component of the gypsy insulator. The Su(Hw) protein interacts with Mod(mdg4) 2.2 through the C-terminal domain of the Mod(mdg4) 2.2 protein. Since this domain is specific to this form of the protein and is not present in any of the other variants, this result supports the idea that Mod(mdg4) 2.2 is the component of the gypsy insulator, whereas other mod(mdg4)-encoded proteins might have more specific roles in the cell. Deletion of the BTB domain eliminates homodimeric interactions between Mod(mdg4) 2.2 molecules and weakens interactions between Su(Hw) and Mod(mdg4) 2.2. This result could be interpreted as suggesting that Su(Hw) and Mod(mdg4) 2.2 interact through the BTB domain. However, this domain by itself is not able to interact with the full-length Su(Hw) protein or with the LZ-B-C region; this is not due to incorrect folding of the protein, since the BTB domain by itself is able to fold properly and mediate interaction with full-length Mod(mdg4) or another BTB domain. These results are interpreted to suggest that the BTB domain mediates the formation of Mod(mdg4) 2.2 dimers, which in turn are required to mediate the interaction with Su(Hw) (Ghosh, 2001).

BTB domain-containing proteins frequently have zinc fingers involved in DNA binding. The Mod(mdg4) 2.2 protein is unusual in the sense that it does not possess any such DNA-binding domain at the C-terminus. However, the presence of a domain that mediates interactions with Su(Hw), which binds DNA through its zinc fingers, might serve the purpose of recruiting this protein to chromatin. The BTB domain is responsible for self-oligomerization of proteins such as GAGA, promyelocytic leukemia zinc finger protein (PLZF) and ZID in vitro. Interestingly, although the BTB domain-containing promyelocytic leukemia zinc finger protein appears to form only dimers in solution, a short four-stranded antiparallel β-sheet between two symmetry-related dimers can be observed in the crystal. This interaction involves four different peptide chains and, therefore, can give rise to the formation of tetramers and oligomers of higher stoichiometry, suggesting that BTB-containing proteins can form large multimers. This observation is especially significant in the context of proposed models for insulator function, which require multiple insulator sites to come together in one large aggregate. It might be possible for Mod(mdg4) 2.2 to interact with several Mod(mdg4) 2.2 molecules, thus helping to bring together several Mod(mdg4) binding sites to form insulator aggregates as observed in interphase diploid cells. Alternatively, Mod(mdg4) 2.2 might interact with other BTB domain-containing proteins, which might be an integral part of the gypsy insulator complex. The BTB domain forms an extensive dimer interface that is a possible binding site for other proteins. Since the presence of the BTB domain is only partially required for binding of Su(Hw), there might be other as yet unidentified partners of Mod(mdg4) that interact with the BTB domain. Alternatively, the BTB dimer interface might stabilize the interaction of Su(Hw) with the C-terminal region of Mod(mdg4) (Ghosh, 2001).

The finding of specific domains of the Su(Hw) and Mod(mdg4) proteins that mediate intermolecular interactions provides a strong biochemical foundation for the involvement of these proteins in the establishment of chromosomal loops. These loops are the basis for the proposed role of insulators in the formation of higher order chromatin domains and nuclear organization of the chromosomes during interphase. These studies also provide support for the involvement of other proteins in insulator function. The identification of these proteins will provide additional evidence to understand the mechanisms by which these important sequences control eukaryotic gene expression (Ghosh, 2001).

A family of baculovirus inhibitor-of-apoptosis (IAP) genes is present in mammals, insects, and baculoviruses, but the mechanism by which these IAPs block apoptosis is currently unknown. A protein encoded by the Drosophila mod(mdg4) gene, named Doom, binds to the baculovirus IAPs and induces rapid apoptosis in insect cells. Baculovirus IAPs and P35, an inhibitor of aspartate-specific cysteine proteases, block Doom-induced apoptosis. The carboxyl terminus encoded by the 3' exon of the doom cDNA, which distinguishes it from other mod(mdg4) cDNAs, is responsible for induction of apoptosis and engagement of the IAPs. Doom localizes to the nucleus, while the IAPs localize to the cytoplasm, but when expressed together, Doom and the IAPs all localize in the nucleus. Thus, IAPs might block apoptosis by interacting with and modifying the behavior of Doom-like proteins that reside in cellular apoptotic pathways (Harvey, 1997).

It is thought that su(Hw) protein forms discrete domains of gene activity by segregating promoters from enhancer elements through a change in chromatin organization. Functional domains of the su(Hw) protein have been characterized that mediate the silencing effect of mod(mdg4) mutations. Two of three regions of su(Hw), regions B and C, located between the leucine zipper motif and the C-terminal acidic domain, are conserved across Drosophila species and are necessary for both the unidirectional and bidirectional repression of transcription by su(Hw). These domains are implicated in an interaction with Mod(mdg4), which is thought to mediate the unidirectional repression due to insulator function. In contrast, two acidic domains, the N-terminal acidic domain and the C-terminal acidic domain, both dispensable for the unidirectional repression of enhancer elements, are critical for the bidirectional silencing of enhancer activity observed in mutants lacking functional Mod(mdg4) protein. Bidirectional repression is thought to be due to changes in large blocks of chromatin structure (Gdula, 1997).

A first series of data strongly suggested that the gypsy insulator, like all previously characterized insulators, essentially prevents interactions between distal enhancer and promoter, without any direct repressing effect on the enhancer itself. This directional effect can most easily be accounted for by the "looping model" involving generation of structural domains isolated one from the other by attachment of boundary sequences (MAR/SAR) to the nuclear matrix. Alternatively, a series of data on gypsy insulation (essentially in mod(mdg4) mutants) reveals bidirectional repressing effects, which can be accounted for by a model involving heterochromatinization. The present data (showing that the gypsy insulator behaves as a MAR/SAR) are clearly in agreement with the structural looping model, but also support the heterochromatinization model. Indeed, the gypsy MAR/SAR DNA per se, in the absence of su(Hw) protein, is involved in histone H1 nucleation (as shown in this paper), and it has been demonstrated that histone H1 nucleation is associated with both DNA compaction and transcriptional silencing. Additionally, Laemmli and co-workers have found that histone H1 can be removed from MAR/SAR domains by distamycin and distamycin-like proteins (D-like proteins, such as the high mobility group proteins); this has led to the proposal that MAR/SARs can activate or repress transcription of adjacent genes depending on the nucleation or depletion of histone H1. The gypsy MAR/SAR could then be responsible for the repressing effect observed in the mod(mdg4) mutants, as well as in the present assay within heterologous cells (assuming further that appropriate D-like proteins are absent in those cells). Taking into account, in addition, that mutations in the mod(mdg4) or the su(Hw) genes modify position-effect variegation, it could be further hypothesized that the su(Hw)/Mod(mdg4) complex acts as a D-like protein and modifies the nucleation processes to allow the switch from a repressing to an active state. Accordingly, a model in which the su(Hw) binding sites and the associated su(Hw)/Mod(mdg4) complex modulate the effects of the MAR/SAR DNA sequence could rather simply account for the biological effects of the gypsy insulator in both the wild type and su(Hw)/mod(mdg4) mutants. The proposed model would then reconcile the two previous models for gypsy insulation, i.e. the heterochromatinization and the looping models (Nabirochkin, 1998 and references).

Germ line transformation of white- Drosophila embryos with P-element vectors containing white expression cassettes results in flies with different eye color phenotypes due to position effects at the sites of transgene insertion. These position effects can be cured by specific DNA elements, such as the Drosophila scs and scs' elements and gypsy elements, that have insulator activity in vivo. Matrix attachment regions (MARs) are DNA elements that are identified and defined by their ability to bind to DNA- and histone-depleted nuclei, which are generally termed nuclear matrices. MARs are typically AT-rich elements that contain consensus cleavage sites for topoisomerase II, and they may contain one or more loosely defined short sequence motifs, but, in general, their structures are not highly homologous. MARs are dispersed throughout eukaryotic genomes, having been found in centromeric DNA, within genes, and in intergenic regions. Especially interesting is the observation that the gypsy insulator of Drosophila has been identified as a MAR. This is a retroviral sequence that binds Suppressor of Hairy wing and the su(Hw)-associated protein Mod(mdg4) (Nabirochkin, 1998). The matrix-binding activities of MARs have been conserved throughout eukaryotic evolution. The functions of MARs in vivo are largely unknown, but one commonly held view is that MARs anchor individual chromatin loops to a proteinaceous matrix or scaffold in both interphase nuclei and mitotic chromosomes (Namciu, 1998 and references).

The ability of human MARs to insulate white from position effects was tested. Two different human MARs, from the apolipoprotein B and alpha1-antitrypsin loci, insulate white transgene expression from position effects in Drosophila. Both elements reduce variability in transgene expression without enhancing levels of white gene expression. In contrast, expression of white transgenes containing human DNA segments without matrix-binding activity is highly variable in Drosophila transformants. These data indicate that human MARs can function as insulator elements in vivo in Drosophila (Namciu, 1998).

Insulation of enhancer-promoter communication by a gypsy transposon insert in the Drosophila cut gene: Cooperation between Suppressor of Hairy-wing and Modifier of mdg4 proteins

The Drosophila mod(mdg4) gene products counteract heterochromatin-mediated silencing of the white gene and help activate genes of the bithorax complex. They also regulate the insulator activity of the gypsy transposon when gypsy inserts between an enhancer and promoter. The Su(Hw) protein is required for gypsy-mediated insulation, and the Mod(mdg4)-67.2 protein binds to Su(Hw). The aim of this study was to determine whether Mod(mdg4)-67.2 is a coinsulator that helps Su(Hw) block enhancers or a facilitator of activation that is inhibited by Su(Hw). Evidence is provided that Mod(mdg4)-67.2 acts as a coinsulator by showing that some loss-of-function mod(mdg4) mutations decrease enhancer blocking by a gypsy insert in the cut gene. The C terminus of Mod(mdg4)-67.2 binds in vitro to a region of Su(Hw) that is required for insulation, while the N terminus mediates self-association. The N terminus of Mod(mdg4)-67.2 also interacts with the Chip protein, which facilitates activation of cut. Mod(mdg4)-67.2 truncated in the C terminus interferes in a dominant-negative fashion with insulation in cut but does not significantly affect heterochromatin-mediated silencing of white. It is inferred that multiple contacts between Su(Hw) and a Mod(mdg4)-67.2 multimer are required for insulation. It is theorized that Mod(mdg4)-67.2 usually aids gene activation but can also act as a coinsulator by helping Su(Hw) trap facilitators of activation, such as the Chip protein (Gause, 2001).

This study found that certain loss-of-function alleles of mod(mdg4) reduce insulation by the Su(Hw) protein in the cut gene. This is evidence that mod(mdg4) products are not simply targets of Su(Hw) insulator activity but contribute to it. Wild-type Mod(mdg4)-67.2, the major protein product of mod(mdg4), interacts with a region of Su(Hw) that has been shown to be required for insulation in vivo, but the truncated Mod(mdg4)-67.2 proteins produced by the viable mod(mdg4)u1 and mod(mdg4)T6 alleles do not. This is consistent with the observation that binding of Mod(mdg4) proteins to Su(Hw) binding sites on salivary gland polytene chromosomes is greatly reduced in mod(mdg4)u1 mutants. mod(mdg4)u1 and mod(mdg4)T6 reduce insulator activity more strongly than do null alleles of mod(mdg4), and this antimorphic nature of mod(mdg4)u1 may stem from the ability of the mutant protein to interact with wild-type Mod(mdg4)-67.2 protein. To explain these observations, a model is proposed in which a multimer of Mod(mdg4)-67.2 interacts with more than one Su(Hw) molecule to form the active insulator complex, and the truncated Mod(mdg4)-67.2 proteins produced by mod(mdg4)u1 and mod(mdg4)T6 destabilize this complex (Gause, 2001).

The evidence that Mod(mdg4)-67.2 is an active component of the gypsy insulator that blocks gene activation appears at first glance to contradict the evidence indicating that the mod(mdg4) gene is a member of the trxG of genes that activate genes in the bithorax complex. Another trxG protein, however, also appears to have insulator activity. The GAGA factor encoded by the Trithorax-like (Trl) gene is similar to Mod(mdg4)-67.2 in that it contains a BTB/POZ motif at the N terminus, self-interacts, and supports activation of the bithorax complex. GAGA factor is also required for enhancer blocking by the insulator associated with the even-skipped promoter. This insulator activity requires GAGA binding sites just proximal to the transcription start site and is diminished by Trl mutations. Potential GAGA binding sites are found just proximal to many promoters in Drosophila, including sequences associated with insulator activity in the alpha1 tubulin gene promoter. The GAGA-dependent insulator just proximal to the eve promoter does not prevent activation of the eve promoter by upstream enhancers even though it is positioned between them. Indeed, GAGA binding sites just proximal to the engrailed gene promoter potentiate activation by an upstream enhancer. To resolve the paradoxical insulator and activator activities of the GAGA and Mod(mdg4)-67.2 BTB/POZ proteins, it is therefore proposed that promoter-proximal insulators aid activation of the promoters that contain them by helping to capture and anchor distal activator or facilitator proteins near the promoter. If so, the Mod(mdg4)-67.2 protein may have a promoter-anchoring function in the bithorax complex, but when bound to Su(Hw), it may anchor activator or facilitator proteins far from the promoter, thereby preventing activation (Gause, 2001).

The centrosomal protein CP190 is a component of the gypsy chromatin insulator

Chromatin insulators, or boundary elements, affect promoter-enhancer interactions and buffer transgenes from position effects. The gypsy insulator of Drosophila is bound by a protein complex with two characterized components, the zinc finger protein Suppressor of Hairy-wing [Su(Hw)] and Mod(mdg4)2.2, which is one of the multiple spliced variants encoded by the modifier of mdg4 [mod(mdg4)] gene. A genetic screen for dominant enhancers of the mod(mdg4) phenotype identified the Centrosomal Protein 190 (CP190) as an essential constituent of the gypsy insulator. The function of the centrosome is not affected in CP190 mutants whereas gypsy insulator activity is impaired. CP190 associates physically with both Su(Hw) and Mod(mdg4)2.2 and colocalizes with both proteins on polytene chromosomes. CP190 does not interact directly with insulator sequences present in the gypsy retrotransposon but binds to a previously characterized endogenous insulator, and it is necessary for the formation of insulator bodies. The results suggest that endogenous gypsy insulators contain binding sites for CP190, which is essential for insulator function, and may or may not contain binding sites for Su(Hw) and Mod(mdg4)2.2 (Pai, 2004).

A genetic screen for dominant enhancers of mod(mdg4) has resulted in the identification of CP190 as a third component of the gypsy insulator. CP190 is present at gypsy retrotransposon insulator sites and overlaps extensively with Su(Hw) and Mod(mdg4)2.2 at presumed endogenous insulators. CP190 displays a specific distribution pattern on polytene chromosomes, showing significant overlap with Su(Hw) and Mod(mdg4)2.2 at the junctions between transcriptionally inert bands and transcriptionally active interbands. Similar localization patterns have been reported for other insulators. For example, the faswb insulator at the Notch locus and the BEAF-32 protein of the scs' insulator are also present at the boundaries between bands and interbands. Results suggest that CP190 can bind DNA on its own or can be tethered to the chromosome through interactions with Su(Hw). Mutations in the CP190 gene impair the function of the insulator present in the gypsy retrotransposon without affecting the presence of Su(Hw) and Mod(mdg4)2.2, suggesting an essential task for CP190 in the activity of this insulator. In addition, the lethality of CP190 mutants suggests a critical role for the CP190 protein in the function of gypsy endogenous insulators. This essential role may be a consequence of the requirement of CP190 for the formation of insulator bodies in the nuclei of diploid cells (Pai, 2004).

The insulator present in the gypsy retrotransposon contains only Su(Hw) binding sites, and CP190 is present in this insulator through direct interactions with Su(Hw). The gypsy insulator contains 12 Su(Hw) binding sites, and at least four are needed for insulator activity. However, clusters of three or more Su(Hw) binding sites are rare in the genome. Therefore, a critical question is whether the sites of Su(Hw) and Mod(mdg4)2.2 localization present throughout the genome truly function as insulators. The presence of CP190 at these sites and its ability to bind DNA might explain this apparent paradox. For example, the endogenous insulator present in the yellow-achaete region has only two binding sites for Su(Hw). Nevertheless, the y454 fragment containing this insulator is able to bind CP190, suggesting that this protein might act in concert with Su(Hw) to confer insulator activity. It is therefore possible that endogenous gypsy insulators are composed of binding sites for Su(Hw) and/or for CP190 and, together with Mod(mdg4)2.2, form a complex. Endogenous gypsy insulators may have few or no Su(Hw) binding sites, and they may rely on CP190 to bind DNA and tether other insulator components such as Mod(mdg4)2.2 via protein-protein interactions (Pai, 2004).

Previous studies have suggested that gypsy insulators separated at a distance in the genome may come together and form large insulator bodies in the nucleus during interphase. These aggregates represent higher order structures of chromatin and are implicated in the regulation of gene expression by compartmentalizing the genome into transcriptionally independent domains. The formation of these aggregates appears to require Mod(mdg4) function because the large aggregates are missing in mod(mdg4) mutants. The formation of gypsy insulator bodies is also severely impaired in CP190 mutants, suggesting that CP190 plays an essential role in the formation of these bodies and in the establishment of the chromatin domain organization mediated by gypsy endogenous insulators. It is possible that the BTB/POZ protein-protein interaction domains of both CP190 and Mod(mdg4)2.2 are required for and contribute to the stability of the interactions among insulator sites. In vitro-expressed CP190 lacking the BTB/POZ domain is soluble, whereas the wild-type protein is not, further suggesting that CP190 might exist as a complex with itself or other proteins in vivo, and the formation of this complex is likely mediated by the BTB/POZ domain. However, because CP190 is present at the gypsy insulator in the absence of Mod(mdg4)2.2 protein, the interaction between these two proteins may not be crucial for CP190 recruitment to the insulator (Pai, 2004).

Previous studies have identified CP190 as a centrosome-specific protein during mitosis that also associates with chromatin during interphase. Although many of these studies have focused on the possible role of CP190 during cell division, the current results suggest that centrosomal function and cell division are not affected in CP190 mutants. This conclusion is supported by independent studies of CP190 function during the cell cycle. The main function of CP190 might then be to regulate chromosome-related processes during interphase. Several lines of evidence suggest that this role is related to the function of the gypsy insulator: mutations in CP190 alter gypsy-induced phenotypes; CP190 colocalizes with Su(Hw) and Mod(mdg4)2.2 on polytene chromosomes and in diploid cell nuclei; and CP190 associates physically with gypsy insulator components in vitro and in vivo. However, the centrosomal localization of CP190 might still be important for its role in the gypsy insulator, even though it is unnecessary for cell cycle progression. The centrosome could be either a temporary storage site for CP190 during mitosis or a site for a mitosis-specific modification that could be important for CP190 reassociation with chromosomes later in the cell cycle. The presence of CP190 in the centrosome could also be related to regulation of the cellular level of this protein; indeed, some chromatin-binding proteins have been shown to be targeted to the centrosome for degradation. Alternatively, the presence of CP190 at the centrosome might reflect a role in the ubiquitin modification pathway. Recent findings have linked BTB/POZ domain proteins to ubiquitin E3 ligase function, some of which are known to be present at the centrosome. CP190 may engage in similar interactions as an adaptor for ubiquitin E3 ligases and might target associated insulator proteins to the centrosome during mitosis for ubiquitination and/or degradation, which in turn may be required for properly reestablishing chromosome domain boundaries after mitosis (Pai, 2004).

The ubiquitin ligase dTopors directs the nuclear organization of a chromatin insulator

Chromatin insulators are gene regulatory elements implicated in the establishment of independent chromatin domains. The gypsy insulator of D. melanogaster confers its activity through a protein complex that consists of three known components, Su(Hw), Mod(mdg4)2.2 (a spliced variant encoded by the modifier of mdg4), and CP190. Drosophila Topoisomerase I-interacting RS protein (dTopors) interacts with the insulator protein complex and is required for gypsy insulator function. In the absence of Mod(mdg4)2.2, nuclear clustering of insulator complexes is disrupted and insulator activity is compromised. Overexpression of dTopors in the mod(mdg4)2.2 null mutant rescues insulator activity and restores the formation of nuclear insulator bodies. dTopors associates with the nuclear lamina, and mutations in lamin disrupt dTopors localization as well as nuclear organization and activity of the gypsy insulator. Thus, dTopors appears to be involved in the establishment of chromatin organization through its ability to mediate the association of insulator complexes with a fixed nuclear substrate (Capelson, 2005).

A yeast two-hybrid screen for proteins that interact with Mod(mdg4)2.2 identified dTopors as a factor involved in the activity of the gypsy insulator. dTopors was found to interact with the three known insulator components, Su(Hw), Mod(mdg4)2.2, and CP190, and to associate with the gypsy insulator complex on chromosomes and in diploid nuclei. Additionally, dTopors appears to physically associate with the nuclear lamina. Genetically, dTopors behaves as a positive factor in gypsy insulator activity. Consistent with this, reducing dTopors levels, either with a deletion spanning dTopors or with an inducible dTopors RNAi construct, disrupts insulator activity. The effects of elevated dTopors levels are particularly dramatic, as they restore the activity of a compromised gypsy insulator on multiple levels. The enhancer-blocking function of the insulator, the binding of Su(Hw) to chromatin, and the formation of insulator bodies in cell nuclei, all of which are compromised in mod(mdg4)u1 mutants, are rescued by overexpression of dTopors (Capelson, 2005).

These effects can be explained by a model in which dTopors acts as a nuclear lamina-associated factor that serves to tether the gypsy insulator complexes to a fixed substrate. In the wild-type situation, Mod(mdg4)2.2 mediates the coalescence of distant insulator sites and the subsequent establishment of chromatin compartments, whereas dTopors may be involved in further organization of insulator bodies at specific nuclear attachment points through its direct interaction with both Mod(mdg4)2.2 and Su(Hw). The absence of Mod(mdg4)2.2 leads to the breakdown of nuclear organization and the destabilization of Su(Hw)-chromatin association. Through tethering distant insulator sites to a nuclear substrate, dTopors, when present at elevated levels, may be able to compensate for the loss of a component such as Mod(mdg4)2.2. By stabilizing the nuclear organization of insulator complexes, dTopors may also promote the binding of Su(Hw) to chromatin. This explanation is further reinforced by the observed disruptive effects of a lamin mutation on the nuclear organization and the enhancer blocking activity of the gypsy insulator (Capelson, 2005).

Evidence for a connection between gypsy insulator activity and nuclear insulator bodies has come predominantly from the effects of mutations in Mod(mdg4)2.2 and CP190 on both enhancer-blocking function and insulator body integrity. The activity of dTopors provides further evidence for a functional relationship between insulators and their nuclear localization, since rescue of insulator phenotypes by dTopors is accompanied by the recovery of insulator bodies. Establishment of independent chromatin domains, which has been proposed as the main function of insulators, is thought to rely on structural partitioning of chromatin through physical interactions between distant loci or through interactions with a fixed nuclear substrate. It has previously been suggested that gypsy insulators may employ both types of structural organization to ensure the establishment of domain autonomy. This work suggests that the gypsy insulator may undergo physical clustering through the BTB domains of Mod(mdg4)2.2 and CP190 and may be attached to the nuclear lamina via dTopors. The interaction of the insulator with a nuclear substrate is further supported by a recent report that gypsy insulator proteins associate with the nuclear matrix, of which lamin is a principal component. Tethering to a subnuclear surface has also been implicated in the activity of the chicken β-globin insulator, where β-globin insulator loci were observed to interact with the nucleolar surface, perhaps via a direct association between the insulator protein CTCF and the nucleolar component nucleophosmin (Capelson, 2005).

The E3 ubiquitin ligase activity of dTopors was not found to act directly on the known insulator proteins, yet the RING domain of dTopors appears to be essential for its positive effect on the gypsy insulator. It thus remains possible that an unknown factor involved in insulator activity is a substrate for dTopors-mediated ubiquitination. A connection between the gypsy insulator complex and the ubiquitin conjugation pathway is also suggested by the presence of BTB domains in Mod(mdg4)2.2 and CP190, since BTB domain proteins have been proposed to act as substrate adaptors for RING-type ubiquitin E3 ligases. It is therefore plausible that the BTB-containing insulator proteins and the RING-containing dTopors participate in ubiquitin conjugation with functional consequences for the insulator (Capelson, 2005).

The association of dTopors with a subset of insulator binding sites on polytene chromosomes implies that its presence is not required by all insulator complexes. This may be a consequence of the proposed function of dTopors as a tethering factor, such that the interaction between distant insulator loci may alleviate the need for dTopors at every binding site of the insulator complex. Alternatively, it may suggest that endogenous insulator complexes are not all functionally equivalent, and that the enzymatic properties of dTopors may be important for specific insulator complexes. The ubiquitin ligase activity of dTopors may be involved in regulation of insulator complexes, such that modification of a yet uncharacterized component by ubiquitin can lead to variation in function of endogenous insulators (Capelson, 2005).

SUMO conjugation attenuates the activity of the gypsy chromatin insulator

Chromatin insulators have been implicated in the establishment of independent gene expression domains and in the nuclear organization of chromatin. Post-translational modification of proteins by Small Ubiquitin-like Modifier (SUMO) has been reported to regulate their activity and subnuclear localization. Evidence is presented suggesting that two protein components of the gypsy chromatin insulator of Drosophila melanogaster, Mod(mdg4)2.2 and CP190, are sumoylated, and that SUMO is associated with a subset of genomic insulator sites. Disruption of the SUMO conjugation pathway improves the enhancer-blocking function of a partially active insulator, indicating that SUMO modification negatively regulates the activity of the gypsy insulator. Sumoylation does not affect the ability of CP190 and Mod(mdg4)2.2 to bind chromatin, but instead appears to regulate the nuclear organization of gypsy insulator complexes. The results suggest that long-range interactions of insulator proteins are inhibited by sumoylation and that the establishment of chromatin domains can be regulated by SUMO conjugation (Capelson, 2006).

Two protein components of the gypsy chromatin insulator, Mod(mdg4)2.2 and CP190, were found to be modified by SUMO in vitro and in vivo. dTopors was observed to interfere with their sumoylation by possibly disrupting the contacts between the SUMO E2 enzyme Ubc9 and substrate insulator proteins. The inhibitory effect of dTopors, although relatively subtle, is consistent across the various assays utilized such that any time dTopors was introduced at higher levels, either by direct addition in vitro or by increasing expression in vivo, it was found to result in reduced sumoylation of Mod(mdg4)2.2 and CP190. Disruption of SUMO conjugation by mutations in genes coding for Ubc9 and SUMO exerts a positive effect on gypsy insulator activity, suggesting that the normal role of SUMO modification is to antagonize insulator function. A fraction of chromatin-bound insulator proteins appears to be associated with SUMO, yet mutations in the SUMO pathway are not seen to affect the chromatin-binding properties of CP190 or Mod(mdg4)2.2. Instead, sumoylation interferes with the formation of nuclear insulator bodies, such that overexpression of Ubc9 leads to breakdown of nuclear insulator structures, whereas lower levels of Ubc9 and sumoylation result in a partial recovery of coalescence lost in the absence of Mod(mdg4)2.2 (Capelson, 2006).

These findings suggest that modification of CP190 and Mod(mdg4)2.2 by SUMO may prevent self-association and thus interfere with long-range interactions between distant insulator complexes required to form insulator bodies. Thereby, sumoylation may preclude formation of closed chromatin loops and the consequent establishment of autonomous gene expression domains (Capelson, 2006).

Multiple lines of evidence point to a role for SUMO modification in transcriptional repression. Sumoylation of histones has been characterized as a mark of repressed chromatin, whereas SUMO conjugation to certain transcriptional regulators leads to their association with histone deacetylases, which remove the active acetylation marks from histones. SUMO modification of the Polycomb group (PcG) protein SOP-2 is required for its function in stable repression of Hox genes, and another PcG repressor, Pc2, acts as a SUMO E3 ligase. Modification of gypsy insulator proteins by SUMO does not seem to associate them exclusively with transcriptional repression, as reduction of sumoylation in lwr/smt3 mutants results in the upregulation of expression from the ombP1-D1 locus, but in the downregulation of transcription at y2 and ct6. In these cases, transcriptional output appears to correlate only with the enhancer-blocking activity of the insulator. Nevertheless, it is possible that one of the roles of sumoylation involves association of selected insulator sites in the genome with transcriptional repression. Sumoylated insulator complexes may not participate in the formation of expression domains, but instead, could target silencing factors to the surrounding chromatin (Capelson, 2006).

In mammalian nuclei, the homolog of dTopors localizes to PML bodies, which are enriched in the SUMO conjugation machinery. If inhibition of sumoylation is also a property of mammalian Topors, it may help prevent further sumoylation of factors that are targeted to these nuclear compartments. Similarly, the herpes simplex virus protein ICP0 localizes to PML bodies, where it causes desumoylation of two primary components, PML and SP100. It has been reported that Topors may function as a SUMO E3 ligase for the tumor suppressor p53. This apparent contradiction with the current results may have several explanations. Topors and dTopors may have diverged in function with respect to the SUMO pathway, such that Topors acts as a SUMO E3 ligase while dTopors interferes with SUMO addition through its conserved interaction with Ubc9. Alternatively, the involvement of dTopors in the SUMO pathway may be substrate-specific, since it may bind Ubc9 in ways that either allow or prevent interaction with a given target protein. In the context of the gypsy insulator, the interference of dTopors with sumoylation is consistent with previous observations that dTopors promotes insulator activity, whereas sumoylation appears to disrupt it (Capelson, 2006).

It has been suggested that SUMO conjugation may affect the function of the modified protein even after the SUMO tag itself has been removed, creating a cellular memory for protein regulation. This idea arose partly to explain the commonly observed discrepancy between the small percentage of a given protein that is modified by SUMO and the dramatic consequences of the modification for the protein's cellular function. Sumoylation may be needed for proteins to enter stable complexes or functional states, but the SUMO modification may not need to persist after the initial establishment. Thus, the actual effect of sumoylation may far exceed that of the detectable sumoylated population, since the function of a much larger proportion of molecules has been altered by SUMO conjugation and subsequent deconjugation. As in other reported cases, the sumoylated forms of Mod(mdg4)2.2 and CP190 represent a small fraction of the total pool of these insulator proteins, yet the phenotypic effects of the loss of these forms are quite striking. It is possible that SUMO attachment regulates the initial organization of chromatin domains, perhaps in earlier development or following mitosis, yet once established, the domains may be stably maintained without SUMO. Additionally, the rapid cycle of SUMO conjugation and deconjugation implies that sumoylation may be used by processes that require reassembly upon signal. In that sense, SUMO modification seems particularly suitable for the regulation of gene expression domains, as it can result in 'remembered' yet flexible states (Capelson, 2006).

Genome-wide studies of the multi-zinc finger Drosophila Suppressor of Hairy-wing protein in the ovary

The Drosophila Suppressor of Hairy-wing [Su(Hw)] protein is a globally expressed, multi-zinc finger (ZnF) DNA-binding protein. Su(Hw) forms a classic insulator when bound to the gypsy retrotransposon and is essential for female germline development. These functions are genetically separable, as exemplified by Su(Hw)(f), which carries a defective ZnF10 that causes a loss of insulator but not germline function. This study completed the first genome-wide analysis of Su(Hw)-binding sites (SBSs) in the ovary, showing that tissue-specific binding is not responsible for the restricted developmental requirements for Su(Hw). Mapping of ovary Su(Hw)(f) SBSs revealed that female fertility requires binding to only one third of the wild-type sites. It was demonstrated that Su(Hw)(f) retention correlates with binding site affinity and partnership with the Modifier of (mdg4) 67.2 protein. Finally, clusters of co-regulated ovary genes flanked by Su(Hw)(f)-bound sites were identified, and loss of Su(Hw) was shown to have limited effects on transcription of these genes. These data imply that the fertility function of Su(Hw) may not depend upon the demarcation of transcriptional domains. These studies establish a framework for understanding germline Su(Hw) function and provide insights into how chromatin occupancy is achieved by multi-ZnF proteins, the most common transcription factor class in metazoans (Soshnev, 2012).

Su(Hw) is a broadly expressed transcription factor that is required for oogenesis. Much of the understanding of Su(Hw) function has been obtained through investigation of the gypsy insulator. These studies have led to the concept that Su(Hw) is an architectural protein involved in establishing higher order chromosomal structure critical for regulation of gene expression. However, emerging evidence suggests that the function of Su(Hw) extends beyond that of an insulator protein, including the recent demonstration that 1A-2, a cluster of two SBSs, is required for activation of yar, a non-coding RNA gene (Soshnev, 2008). These data suggest that Su(Hw) has multiple functions in the genome (Soshnev, 2012).

Previous studies estimate that between five and eighteen percent of SBSs are cell type-specific, with evidence that 1%-3% of SBSs are developmentally regulated. This study used ChIP-seq coupled with extensive ChIP-qPCR validation to show that Su(Hw) chromosome occupancy is largely constitutive throughout development. While a small set of 'ovary-specific' SBSs was identified among the low fold-enrichment SBSs, these sites were shown to be occupied in non-ovary tissues as well. The data are consistent with a previous analysis of SBSs in the three-megabase alcohol dehydrogenase region, in which Su(Hw) binding was conserved between different tissues. These studies provide a cautionary note for investigations relying solely on computational evaluation of high-throughput genomic datasets, as extensive validation was found to be required to establish the confident binding thresholds needed for data interpretation (Soshnev, 2012).

The ovary-specific developmental requirement for Su(Hw) may be explained based on its function at the gypsy insulator. The insulator properties of Su(Hw) suggest that oogenesis may require establishment of domain boundaries that permit appropriate gene expression in the ovary. To test this postulate, genome-wide binding sites were defined for Su(Hw)f, a mutant isoform that lacks insulator activity, but retains fertility. These studies revealed that Su(Hw)f was retained at only one third of wild-type sites. Ostensibly, these observations are surprising for an architectural protein, as two-thirds of SBSs can be lost without effects on essential functions needed for fertility. These global analyses were extended through direct studies of co-regulated gene clusters delimited by f-retained SBSs. Loss of Su(Hw) was shown to have limited, if any, effects on expression of these genes in the ovary. Based on these observations, it is suggested that the essential ovary function of Su(Hw) may not be related to establishment of boundaries of transcriptional domains, a conclusion supported by recent findings that null and nearly null alleles of mod(mdg4) and Cp190 do not affect oogenesis. It is suggested that Su(Hw) may act locally to change gene expression. Recent studies demonstrate that Su(Hw) is associated with repressed chromatin domains and is enriched in lamin-associated domains. These observations, together with findings that enhancer blocking activity of the gypsy insulator is disrupted by a lamin mutation, suggest that Su(Hw)-dependent regulation may involve gene silencing that requires Su(Hw) targeting to the nuclear periphery (Soshnev, 2012).

The availability of a high-quality dataset of SBSs provided the opportunity to investigate the genome-wide association of Su(Hw) with its partner proteins, Mod67.2 and CP190. These analyses showed that SBS-O sites (bound by Su(Hw) only) represented the largest class. Further, SBS-O and SBS-C (Su(Hw) with CP190) sites displayed sequence conservation that extended beyond the Su(Hw)-binding motif, which was not observed for the SBS-CM class (Su(Hw) with both CP190 and Mod67.2). These data suggest that Mod67.2 confers greater flexibility to Su(Hw) association, a postulate supported by the demonstration that Mod67.2 facilitates Su(Hw) occupancy. These findings also imply that the structurally related BTB-domain protein CP190 cannot replace Mod67.2 in facilitating Su(Hw) occupancy of SBSs. Although SBSs collectively display no enrichment at genic features, SBS-CM sites showed a skewed localization to the 5' and 3' ends of genes and to coding exons. Taken together, these data indicate that different classes of SBSs may make distinct regulatory contributions in the genome (Soshnev, 2012).

Su(Hw) has 12 ZnFs, with ten corresponding to C2H2 fingers and two corresponding to C2HC. Previous studies suggest that the major mode for Su(Hw) chromosome association is DNA binding, as loss of ZnF7 causes complete loss of in vivo localization to chromosomes that correlates with defective in vitro binding. This study has demonstrated that loss of ZnF10 eliminates Su(Hw)f occupancy at two-thirds of SBSs, with binding site selection of Su(Hw)f showing greater constraints than Su(Hw)+. While Su(Hw)f is lost at many genomic sites, this protein binds f-lost SBSs in vitro, although with reduced affinity relative to Su(Hw)+. Yet, this reduced Su(Hw)f-binding affinity cannot account for all f-lost sites, as there is an absence of a strict correlation between in vitro DNA binding and in vivo chromosome Su(Hw)f occupancy. Further investigation revealed that some SBSs showed tissue-specific Su(Hw)f retention and that Su(Hw)f retention was optimal at SBSs that associate with Mod67.2, a protein partner associated with enhanced occupancy of Su(Hw). Taken together, these data suggest that Su(Hw)f retention is affected by multiple factors, including DNA sequence, tissue-specific effects that may depend on local chromatin structure and a protein partner of the gypsy insulator complex (Soshnev, 2012).

Multi-ZnF domains are the most common DNA-binding motif among transcription factors in metazoan genomes. The data are relevant to understanding how mutation of a single ZnF within a large multi-ZnF DNA-binding domain affects chromatin occupancy of this class of transcription factors. It was shown that individual fingers may make distinct contributions to chromosome association without altering the recognition sequence of the binding site. Interestingly, a second well-characterized insulator protein, the vertebrate CCCTC-binding factor (CTCF), is an eleven-ZnF DNA-binding protein. Mutations in the gene encoding CTCF have been found in several human tumor samples, including breast, prostate and kidney tumors. These tumor-associated alleles carried missense mutations that changed specific CTCF ZnFs, with none producing a truncated protein. In vitro studies demonstrated that these CTCF ZnF mutants had altered DNA-binding properties, reminiscent of Su(Hw)f. However, no in vivo binding studies were completed. Data obtained from the analysis of Su(Hw)f predict that the cancer-associated CTCF mutations may alter the in vivo landscape of CTCF occupancy genome-wide. These effects may in turn lead to complex changes in gene expression that could promote tumorigenesis (Soshnev, 2012).

Tissue-specific regulation of chromatin insulator function

Chromatin insulators organize the genome into distinct transcriptional domains and contribute to cell type-specific chromatin organization. However, factors regulating tissue-specific insulator function have not yet been discovered. This study identified the RNA recognition motif-containing protein Shep as a direct interactor of two individual components of the gypsy insulator complex in Drosophila. Mutation of shep improves gypsy-dependent enhancer blocking, indicating a role as a negative regulator of insulator activity. Unlike ubiquitously expressed core gypsy insulator proteins, Shep is highly expressed in the central nervous system (CNS) with lower expression in other tissues. A novel, quantitative tissue-specific barrier assay was developed to demonstrate that Shep functions as a negative regulator of insulator activity in the CNS but not in muscle tissue. Additionally, mutation of shep alters insulator complex nuclear localization in the CNS but has no effect in other tissues. Consistent with negative regulatory activity, ChIP-seq analysis of Shep in a CNS-derived cell line indicates substantial genome-wide colocalization with a single gypsy insulator component but limited overlap with intact insulator complexes. Taken together, these data reveal a novel, tissue-specific mode of regulation of a chromatin insulator (Matzat, 2012).

Chromatin insulators are DNA-protein complexes that influence eukaryotic gene expression by organizing the genome into distinct transcriptional domains. Functionally conserved from Drosophila to humans, insulators regulate interactions between regulatory elements such as enhancers and promoters and demarcate silent and active chromatin regions. Chromatin insulators are thought to exert effects on gene expression by constraining the topology of chromatin and facilitating the formation of intra- and inter-chromosomal looping. These higher order interactions can vary between cell types, thereby facilitating tissue-specific transcriptional output (Matzat, 2012).

Drosophila harbor several distinct classes of chromatin insulators, including the well-studied gypsy insulator, also known as the Suppressor of Hairy-wing (Su(Hw)) insulator. The zinc-finger DNA-binding protein, Su(Hw), recognizes a particular motif, imparting specificity to the gypsy insulator. In addition to Su(Hw), the core gypsy insulator complex contains Centrosomal protein 190 (CP190), which also harbors a zinc finger domain, and the non-DNA-binding protein, Modifier of mdg4 2.2 (Mod(mdg4)2.2). These core proteins are required for gypsy insulator activity. Both CP190 and Mod(mdg4)2.2 contain broad complex, tramtrack, bric-a-brac (BTB) dimerization domains that have been suggested to mediate insulator-insulator interactions and facilitate the formation of long-range insulator-mediated loops along the chromatin fiber (Matzat, 2012).

Specialized nuclear arrangement of gypsy insulator complexes correlates tightly with insulator function. The gypsy insulator proteins bind to thousands of sites throughout the genome with more than half of Su(Hw) binding sites occurring in intergenic regions and a large number of sites located within introns. Consistent with a role in boundary formation, Su(Hw) sites are positively correlated with both Lamin-associated domains and boundaries between transcriptionally active and silent chromatin. It has been shown that gypsy insulator proteins coalesce at a small number of foci in diploid nuclei, termed insulator bodies, which have been proposed to act either as hubs of higher order chromatin domains or storage sites for insulator proteins. Importantly, mutation of certain insulator components results in impaired insulator activity coincident with diffuse or smaller, more numerous insulator bodies. However, formation of insulator bodies is not sufficient for gypsy insulator activity, and a detailed mechanistic understanding of insulator bodies is still lacking. Nevertheless, the tight correlation between gypsy insulator function and insulator body localization suggests an important role for these structures. Finally, in addition to a variety of accessory proteins, a role for RNA in insulator function and insulator body organization was suggested based on RNA-dependent protein interaction with insulator complexes (Matzat, 2012).

Genome-wide studies indicate that the locations of insulator protein binding sites are mainly consistent across different cell types but that insulator-dependent looping configurations may dictate differences in gene expression. In Drosophila, it has been shown that external stimuli can alter chromatin association of CP190, possibly leading to a change in chromatin looping. Recent large-scale chromatin conformation capture (3C)-based studies have implicated insulator protein binding sites as key contact points mediating looping throughout the genome. In several studies across species, specific chromatin conformations are observed at loci that produce tissue- or cell type-specific transcripts. Whether insulators establish tissue-specific chromatin organization or instead maintain configurations established via transcription is unclear. Furthermore, the factors that control tissue-specific, insulator-dependent chromatin organization remain unknown (Matzat, 2012).

This study identifies a CNS-enriched, RNA recognition motif (RRM)-containing protein, Alan Shepard (Shep), as the first tissue-specific regulator of gypsy insulator activity and insulator body localization. Shep interacts directly with Mod(mdg4)2.2 and Su(Hw) and also associates with gypsy insulator proteins in vivo. Using a novel quantitative, tissue-specific insulator assay, it was found that Shep negatively regulates gypsy insulator activity in the CNS. In addition, mutation of shep improves compromised insulator function and insulator body formation. Finally, genome-wide localization in the CNS-derived BG3 cell line reveals that Shep overlaps with Mod(mdg4)2.2 more often than expected, but overlaps with Su(Hw) and Mod(mdg4)2.2 together less often than expected. These data suggest that gypsy chromatin insulator function can be regulated in a tissue-specific manner (Matzat, 2012).

Two lines of evidence indicate that Shep affects insulator activity in a tissue-specific manner. First, insulator body localization is altered in the CNS but not in other tissues of shep mutants. Second, barrier activity is improved in the CNS but not in muscle tissue when Shep levels are reduced. In addition, genome-wide mapping of Shep and gypsy insulator proteins in BG3 cells reveals substantial overlap with individual insulator proteins but a lack of three-way overlap, further supporting a role for Shep in negative regulation of insulator activity in certain tissues (Matzat, 2012).

Shep acts as a tissue-specific negative regulator of gypsy insulator function and insulator body localization. Shep is most abundant in the CNS at both embryonic and larval stages; however, it is also expressed at lower levels in other tissues. Although this study has demonstrated that Shep functions in the CNS, Shep can also repress enhancer-blocking activity in the wing and could possibly affect insulator activity in other tissues. For example, ubiquitous reduction of Shep levels strongly improves overall barrier activity, suggesting that tissues outside of the CNS may also harbor Shep activity. Nonetheless, Shep does not appear to function in all tissues: knockdown of Shep does not affect barrier activity in muscle tissue, no changes in insulator body localization are observed in eye or leg tissue of shep mutants, and no effect on y2 enhancer blocking is observed in pigment cells of shep mutants. Interestingly, overexpression of Shep in muscle tissue reduces barrier activity, suggesting that a certain threshold of Shep protein is needed to repress insulator activity. Since Shep protein can be detected at least at low levels in all tissues tested thus far, it is unlikely that the mere presence of Shep protein is sufficient to disrupt gypsy insulator activity. It remains to be determined what other cofactors, such as proteins or RNAs, may contribute to Shep activity (Matzat, 2012).

Shep may negatively regulate insulator activity by interfering with the insulator protein interactions required for their activity. ChIP-seq analysis shows that the genome-wide binding profile of Shep in CNS-derived BG3 cells overlaps substantially with that of Mod(mdg4)2.2, but not extensively with that of Su(Hw) and Mod(mdg4)2.2 combined. Lack of three-way overlap is not entirely unexpected, given that Shep is a negative regulator of gypsy insulator activities. Shep coimmunoprecipitation experiments copurify only a small fraction of the total insulator proteins present in nuclear extracts, suggesting that Shep-insulator complexes are either not abundant or not stable in vivo. Since Shep can bind either Mod(mdg4)2.2 or Su(Hw) in vitro at a 1:1 ratio, Shep binding could compete with the direct interaction between Mod(mdg4)2.2 and Su(Hw), or with their interactions with other factors such as CP190. Moreover, the finding that mod(mdg4) mutants are highly sensitive to Shep dosage suggests an antagonistic functional relationship between Mod(mdg4)2.2 and Shep. Specifically, Shep may negatively regulate higher order insulator-insulator complex interactions, which appear to be mediated by direct interaction between Mod(mdg4)2.2 and CP190. In larval brains of shep, mod(mdg4)u1 double mutants, insulator body localization reverts to a wild-type pattern, whereas it is compromised in mod(mdg4)u1 single mutants, perhaps indicating that the normal function of Shep may be to prevent larger insulator complexes from forming in these cell types (Matzat, 2012).

The results are consistent with the possibility that Shep promotes tissue-specific chromatin configurations by modulating insulator complexes. While differential occupancy of insulator proteins at their respective binding sites may play a role in regulating certain loci, occupancy throughout the genome does not differ extensively between cell types. Therefore, alternate mechanisms to control insulator activity likely exist. Shep activity could prevent insulator-insulator contacts that are otherwise present in tissues that do not express shep, resulting in relief of enhancer blocking or of repression by silencers. Interestingly, shep was identified as a regulator of complex behavioral traits in screens for altered sensory-motor responsiveness to gravity and for aggressive behavior, suggesting that regulation of an insulator-based mechanism could exist to effect changes in neurological function (Matzat, 2012 and references therein).

Given that Shep is an RRM-containing protein, RNA binding may contribute to the ability of Shep to associate with insulator complexes in vivo. Shep RRMs are highly conserved, and the lethality caused by Shep overexpression in the mod(mdg4) mutant background is not observed when the RRMs are mutated. This result suggests that Shep RRMs may be functional with respect to insulator activity. One possibility is that the specific RNA bound by Shep could affect targeting of Shep to insulator sites. Another, not mutually exclusive, possibility is that Shep is recruited to chromatin cotranscriptionally by binding nascent transcripts. It will be important to determine in future studies whether Shep binds RNA while in complex with gypsy insulator proteins, as well as the identities of Shep- and insulator-associated RNAs. These results point to a novel role for Shep, and possibly RNA, in regulating insulator activity in a tissue-specific manner (Matzat, 2012).

EAST organizes Drosophila insulator proteins in the interchromosomal nuclear compartment and modulates CP190 binding to chromatin

Recent data suggest that insulators organize chromatin architecture in the nucleus. The best-studied Drosophila insulator proteins, dCTCF (a homolog of the vertebrate insulator protein CTCF) and Su(Hw), are DNA-binding zinc finger proteins. Different isoforms of the BTB-containing protein Mod(mdg4) interact with Su(Hw) and dCTCF. The CP190 protein is a cofactor for the dCTCF and Su(Hw) insulators. CP190 is required for the functional activity of insulator proteins and is involved in the aggregation of insulator proteins into specific structures named nuclear speckles. This study has shown that the nuclear distribution of CP190 depends on the level of EAST protein, an essential component of the interchromatin compartment. EAST interacts with the CP190 and Mod(mdg4)-67.2 proteins in vitro and in vivo. Overexpression of EAST in S2 cells leads to extrusion of CP190 from the insulator bodies containing Su(Hw), Mod(mdg4)-67.2, and dCTCF. Consistent with the role of insulator bodies in the assembly of protein complexes, EAST overexpression led to a striking decrease in CP190 binding to dCTCF- and Su(Hw)-dependent insulators and promoters. These results suggest that EAST is involved in the regulation of CP190 nuclear localization (Golovnin, 2015).

Insulators belong to the class of regulatory elements that organize the architecture of chromatin compartments. Insulators, or chromatin boundaries, are characterized by two properties: they interfere with enhancer-promoter interactions when located between them, and they buffer transgenes from chromosomal position effects. To date, chromatin insulators have been characterized in a variety of species, indicative of their involvement in the global regulation of gene expression (Golovnin, 2015).

The well-studied Drosophila insulator proteins dCTCF (a homolog of the vertebrate insulator protein CTCF) and Su(Hw) are DNA-binding zinc finger proteins. The Su(Hw) protein, encoded by the suppressor of Hairy wing [su(Hw)] gene, was one of the first insulator proteins identified in Drosophila. The best-studied Drosophila insulator, found within the 5'-untranslated region of the gypsy retrovirus, consists of 12 directly repeated copies of the Su(Hw) binding site. Genetic and molecular approaches have led to the identification and characterization of three proteins that are recruited to chromatin by Su(Hw) and are required for the activity of Su(Hw)-dependent insulators: Mod(mdg4)-67.2, CP190, and E(y)2/Sus1. The mod(mdg4) gene, also known as E(var)3-93D, encodes a large set of BTB/POZ protein isoforms. One of these isoforms, Mod(mdg4)-67.2, interacts through its specific C-terminal domain with the enhancer-blocking domain of the Su(Hw) protein. The BTB domain is located at the N-terminus of Mod(mdg4)-67.2 and mediates homo-multimerization (Golovnin, 2015).

Su(Hw), dCTCF, and most other identified insulator proteins interact with Centrosomal Protein 190 kDa (CP190). This protein (1096 amino acids) contains an N-terminal BTB/POZ domain, an aspartic-acid-rich D-region, four C2H2 zinc finger motifs, and a C-terminal E-rich domain. The BTB domain of CP190 forms stable homodimers that may be involved in protein-protein interactions. In addition to these motifs, CP190 also contains a centrosomal targeting domain (M) responsible for its localization to centrosomes during mitosis. It has been shown that, in the interphase nucleus, CP190 is recruited to chromatin via its interaction with DNA-binding insulator proteins (Golovnin, 2015).

The Su(Hw), dCTCF, Mod(mdg4)-67.2, and CP190 proteins colocalize in discrete foci, named insulator bodies, in the Drosophila interphase cell nucleus. Contradictory reports have been published in which the insulator bodies are described either as protein-based bodies in the interchromatin compartment or as chromatin domains. As shown recently, insulator proteins rapidly coalesce from diffusely distributed speckles into large punctate insulator bodies in response to osmotic stress (Golovnin, 2015).

Exposing cells to hypertonic treatment, which enhances molecular crowding, makes it possible to discriminate between nucleoplasmic bodies formed mainly of RNA and proteins (such as PML bodies) and chromatin compartments such as Polycomb bodies, which form through the interaction of distantly located chromatin regions bound by Polycomb proteins. Nucleoplasmic bodies disappear under less crowded conditions and reassemble under normally crowded conditions, which can be interpreted as a consequence of increased intermolecular interactions between components of nucleoplasmic bodies. Similar to PML bodies, insulator bodies are preserved under hypertonic treatment, in contrast to chromatin-based structures, which disappear as proteins dissociate from chromatin. The CP190 protein is suggested to be critical for the activity of insulators and to regulate the entry of other insulator proteins into the speckles. At the same time, CP190 associates with centrosomes throughout the nuclear division cycle in syncytial Drosophila embryos. Nuclear localization of CP190 is also sensitive to various kinds of stress, suggesting that this process is highly regulated. However, the mechanisms and proteins responsible for localization of CP190 to different nuclear compartments are unknown. This study has shown that the nuclear distribution of CP190 depends on the level of EAST, which is located mainly in the interchromatin compartment of the nucleus. EAST is a nuclear protein of 2362 amino acids which, except for 9 potential nuclear localization sequences and 12 potential PEST sites, contains no previously characterized motifs or functional domains. Together with the Skeletor, Chromator, and Megator proteins, EAST forms the spindle matrix during mitosis. In interphase nuclei, EAST localizes to the extrachromosomal compartment of the nucleus and is essential for the spatial organization of chromosomes (Golovnin, 2015).

Although the bulk of interphase EAST resides in the interchromosomal domain, the current model assumes that EAST can transiently interact with chromosomes. EAST physically interacts with Megator, a 260-kDa protein with a large N-terminal coiled-coil domain capable of self-assembly. It has been speculated that Megator can form polymers that, together with EAST, may serve as a structural basis for the nuclear extrachromosomal compartment. The results show that EAST interacts with the CP190 and Mod(mdg4)-67.2 proteins and modulates their aggregation into nuclear speckles. When EAST is overexpressed, CP190 binding to chromatin is reduced; consequently, the binding of Mod(mdg4)-67.2 and Su(Hw) is reduced as well, since CP190 is required for their chromatin binding. On the basis of these results, it is hypothesized that EAST regulates the localization of CP190 and insulator protein complexes in the interchromatin compartment, with these complexes subsequently determining the organization of chromatin insulators (Golovnin, 2015).

The results suggest that insulator bodies are sensitive to the concentration of EAST in interphase cells. The properties of insulator bodies described previously and in this study suggest that they are formed by multiple interactions between proteins and resemble nuclear bodies composed of aggregated proteins and RNAs. As shown previously, the CP190 and Mod(mdg4) proteins interact with Su(Hw) and dCTCF and help the latter to enter the insulator bodies (Golovnin, 2015).

Taking into account the high level of dCTCF and Mod(mdg4) co-binding to chromosomes, it appears that dCTCF interacts with an as yet unidentified Mod(mdg4) isoform. Mod(mdg4)-67.2 and CP190 are conjugated to the small ubiquitin-like modifier protein (SUMO). Specific interactions mediated by SUMO, the ability of the Mod(mdg4) BTB domain to form oligomers, and the interaction between the BTB domain of Mod(mdg4)-67.2 and CP190 contribute to the specific aggregation of the Su(Hw)/Mod(mdg4)-67.2/CP190 and dCTCF/CP190 complexes into insulator bodies (Golovnin, 2015).

According to current views, the Megator protein can form polymers that, together with EAST, may serve as a structural basis for the nuclear extrachromosomal compartment. Overexpression of EAST leads to an expansion of the EAST-Megator compartment, with a consequent reduction in the effective volume available to insulator proteins in the cell. As a result, the concentration of the insulator proteins increases, contributing to stabilization of the compact protein conformations visualized as insulator bodies. By interacting with Mod(mdg4)-67.2 and CP190, EAST may also be directly involved in nucleation of insulator bodies. It is possible that the truncated version of EAST (amino acids 933 to 2362) interacts more easily with the insulator proteins, which would account for the noticeable enlargement of insulator bodies in S2 cells expressing EAST933-2362. Overexpression of EAST leads to segregation of the CP190 protein into independent speckles. The results suggest that EAST interacts with the CP190 region that includes the BTB, D, and M domains. These domains are also required for CP190 interactions with other insulator proteins (Golovnin et al., in preparation). Thus, an increase in EAST concentration may lead to displacement of the insulator proteins from the complex with CP190 (Golovnin, 2015).

The results do not exclude the possibility that EAST overexpression directly leads to dissociation of CP190 from chromatin. During mitosis, CP190 colocalizes with EAST in the spindle matrix, and the increase in the amount of EAST may well be responsible for dissociation of CP190 prior to chromosome condensation (Golovnin, 2015).

According to the current model, insulator bodies help to form protein complexes that subsequently bind to regulatory elements such as insulators and promoters. In view of this hypothesis, it is likely that disturbances in the insulator bodies caused by EAST overexpression are responsible for the decrease in CP190 binding to regulatory regions such as dCTCF- and Su(Hw)-dependent insulators and promoters. As shown recently, CP190 is required for recruiting Su(Hw) and Mod(mdg4)-67.2, but not dCTCF, to chromatin. Accordingly, it was observed that EAST overexpression affects the chromosomal binding of Su(Hw) but not of dCTCF. CP190 specifically interacts with the Mod(mdg4)-67.2 isoform, and Mod(mdg4)-67.2 colocalizes with CP190 at all Su(Hw) binding sites. Thus, CP190 may be essential for recruiting the specific Mod(mdg4)-67.2 isoform to Su(Hw) binding sites, and a decrease in the amount of CP190 at these sites may lead to the substitution of Mod(mdg4)-67.2 by other Mod(mdg4) isoforms, as was observed in this study (Golovnin, 2015).

Strong inactivation of EAST in S2 cells reduces the entry of the Mod(mdg4)-67.2/Su(Hw) complex, but not of CP190, into the nucleus. It appears that EAST is involved in regulating the nuclear localization of Mod(mdg4)-67.2, whose BTB domain can form multimeric complexes. Further study is required to elucidate this issue (Golovnin, 2015).



Home page: The Interactive Fly © 1997 Thomas B. Brody, Ph.D.

The Interactive Fly resides on the
Society for Developmental Biology's Web server.