InteractiveFly: GeneBrief

CTCF: Biological Overview | References

Gene name - CTCF

Synonyms -

Cytological map position - 65F6-65F6

Function - transcription factor

Keywords - enhancer blocking, chromatin, boundary elements

Symbol - CTCF

FlyBase ID: FBgn0035769

Genetic map position - 3L: 7,346,677..7,349,796 [-]

Classification - zinc finger transcription factor

Cellular location - nucleus

NCBI links: EntrezGene

CTCF orthologs: Biolitmine

Recent literature
Fresán, U., Cuartero, S., O'Connor, M.B. and Espinás, M.L. (2015). The insulator protein CTCF regulates Drosophila steroidogenesis. Biol Open 4(7):852-7. PubMed ID: 25979705
The steroid hormone ecdysone is a central regulator of insect development. This report shows that CTCF expression in the prothoracic gland is required for full transcriptional activation of the Halloween genes spookier, shadow and noppera-bo, which encode ecdysone biosynthetic enzymes, and for proper timing of ecdysone-responsive gene expression. Loss of CTCF resulted in delayed and less synchronized larval development that could only be rescued by feeding larvae with both the steroid hormone 20-hydroxyecdysone and cholesterol. Moreover, CTCF knockdown in prothoracic gland cells led to increased lipid accumulation. In conclusion, the insulator protein CTCF is required for Halloween gene expression and cholesterol homeostasis in ecdysone-producing cells controlling steroidogenesis.

Shen, W., Wang, D., Ye, B., Shi, M., Zhang, Y. and Zhao, Z. (2015). A possible role of Drosophila CTCF in mitotic bookmarking and maintaining chromatin domains during the cell cycle. Biol Res 48: 27. PubMed ID: 26013116.
The CCCTC-binding factor (CTCF) is a highly conserved insulator protein that plays various roles in many cellular processes. CTCF is one of the main architecture proteins in higher eukaryotes, and in combination with other architecture proteins and regulators, also shapes the three-dimensional organization of a genome. Experiments show CTCF partially remains associated with chromatin during mitosis. However, the role of CTCF in the maintenance and propagation of genome architectures throughout the cell cycle remains elusive. This study performed a comprehensive bioinformatics analysis on public datasets of Drosophila CTCF (dCTCF). dCTCF-binding sites were characterized according to their occupancy status during the cell cycle, and three classes were identified: interphase-mitosis-common (IM), interphase-only (IO) and mitosis-only (MO) sites. Integrated function analysis showed dCTCF-binding sites of different classes might be involved in different biological processes, and IM sites were more conserved and more intensely bound. dCTCF-binding sites of the same class preferentially localized closer to each other, and were highly enriched at chromatin syntenic and topologically associating domains boundaries. These results revealed different functions of dCTCF during the cell cycle and suggested that dCTCF might contribute to the establishment of the three-dimensional architecture of the Drosophila genome by maintaining local chromatin compartments throughout the whole cell cycle.
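The three occupancy classes described above can be illustrated with a minimal sketch. It assumes binding sites from the interphase and mitosis datasets have already been merged into shared identifiers; the site names and the function `classify_sites` are hypothetical and are not taken from the study, which worked with genomic intervals rather than labels.

```python
def classify_sites(interphase_sites, mitosis_sites):
    """Split binding sites into IM, IO and MO classes by occupancy.

    Both arguments are iterables of site identifiers (hypothetical labels;
    a real analysis would first merge overlapping peaks into shared sites).
    """
    interphase = set(interphase_sites)
    mitosis = set(mitosis_sites)
    return {
        "IM": interphase & mitosis,   # interphase-mitosis-common
        "IO": interphase - mitosis,   # interphase-only
        "MO": mitosis - interphase,   # mitosis-only
    }


# Toy input: four made-up sites, two of which are bound in both phases.
classes = classify_sites(["s1", "s2", "s3"], ["s2", "s3", "s4"])
```

With real peak calls, the set operations would be replaced by an interval-overlap step, but the class definitions stay the same.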

Bonchuk, A., Maksimenko, O., Kyrchanova, O., Ivlieva, T., Mogila, V., Deshpande, G., Wolle, D., Schedl, P. and Georgiev, P. (2015). Functional role of dimerization and CP190 interacting domains of CTCF protein in Drosophila melanogaster. BMC Biol 13: 63. PubMed ID: 26248466
This study analyzed CTCF, one of the few insulator proteins conserved from flies to man. The study focused on the identification and characterization of two CTCF protein interaction modules. The first mediates CTCF multimerization, while the second mediates CTCF-CP190 interactions. The multimerization domain maps to the N-terminus of the CTCF protein and likely mediates the formation of tetrameric complexes. The CP190 interaction module encompasses a sequence ~200 amino acids long that spans the C-terminal region and mediates interactions with the N-terminal BTB domain of the CP190 protein. CTCF protein lacking sequences critical for CP190 interactions is almost as effective as wild type in rescuing the phenotypic effects of a CTCF null allele. The mutation does, however, affect CP190 recruitment to specific insulator elements and has a modest effect on CTCF chromatin association. A protein lacking the N-terminal CTCF multimerization domain incompletely rescues the zygotic and maternal effect lethality of the null and does not rescue the defects in Abd-B regulation evident in surviving adult CTCF mutant flies. Elimination of maternally contributed CTCF at the onset of embryogenesis has quite different effects on development and Abd-B regulation than are observed when the homozygous mutant animals develop in the presence of maternally derived CTCF activity. These results indicate that CTCF-CP190 interactions are less critical for the in vivo functions of the CTCF protein than the N-terminal CTCF-CTCF interaction domain, and that the phenotypic consequences of CTCF mutations differ depending upon when and how CTCF activity is lost.

Mourad, R., Li, L. and Cuvier, O. (2017). Uncovering direct and indirect molecular determinants of chromatin loops using a computational integrative approach. PLoS Comput Biol 13(5): e1005538. PubMed ID: 28542178
Chromosomal organization in 3D plays a central role in regulating cell-type specific transcriptional and DNA replication timing programs. Yet it remains unclear to what extent the resulting long-range contacts depend on specific molecular drivers. This study proposes a model that comprehensively assesses the influence on contacts of DNA-binding proteins, cis-regulatory elements and DNA consensus motifs. Using real data, a large number of predictions for long-range contacts involving known architectural proteins and DNA motifs is validated. The model outperforms existing approaches including enrichment test, random forests and correlation, and it uncovers numerous novel long-range contacts in Drosophila and human. The model uncovers the orientation-dependent specificity for long-range contacts between CTCF motifs in Drosophila, highlighting its conserved property in 3D organization of metazoan genomes. The model further unravels long-range contacts depending on co-factors recruited to DNA indirectly, as illustrated by the influence of cohesin in stabilizing long-range contacts between CTCF sites. It also reveals asymmetric contacts such as enhancer-promoter contacts that highlight opposite influences of the transcription factors EBF1, EGR1 or MEF2C depending on RNA Polymerase II pausing.
Rowley, M. J., Nichols, M. H., Lyu, X., Ando-Kuri, M., Rivera, I. S. M., Hermetz, K., Wang, P., Ruan, Y. and Corces, V. G. (2017). Evolutionarily conserved principles predict 3D chromatin organization. Mol Cell 67(5): 837-852.e837. PubMed ID: 28826674
Topologically associating domains (TADs), CTCF loop domains, and A/B compartments have been identified as important structural and functional components of 3D chromatin organization, yet the relationship between these features is not well understood. Using high-resolution Hi-C and HiChIP, this study shows that Drosophila chromatin is organized into domains that are termed compartmental domains that correspond precisely with A/B compartments at high resolution. Transcriptional state is a major predictor of Hi-C contact maps in several eukaryotes tested, including C. elegans and A. thaliana. Architectural proteins insulate compartmental domains by reducing interaction frequencies between neighboring regions in Drosophila, but CTCF loops do not play a distinct role in this organism. In mammals, compartmental domains exist alongside CTCF loop domains to form topological domains. The results suggest that compartmental domains are responsible for domain structure in all eukaryotes, with CTCF playing an important role in domain formation in mammals.
Pokholkova, G. V., Demakov, S. A., Andreenkov, O. V., Andreenkova, N. G., Volkova, E. I., Belyaeva, E. S. and Zhimulev, I. F. (2018). Tethering of CHROMATOR and dCTCF proteins results in decompaction of condensed bands in the Drosophila melanogaster polytene chromosomes but does not affect their transcription and replication timing. PLoS One 13(4): e0192634. PubMed ID: 29608600
Insulator proteins are central to domain organization and gene regulation in the genome. This study used ectopic tethering of CHROMATOR (CHRIZ/CHRO) and dCTCF to pre-defined regions of the genome to dissect the influence of these proteins on local chromatin organization, to analyze their interaction with other key chromatin proteins and to evaluate the effects on transcription and replication. Specifically, using the UAS-GAL4DBD system, CHRO and dCTCF were artificially recruited into highly compacted polytene chromosome bands that share the features of the silent chromatin type known as intercalary heterochromatin (IH). This led to local chromatin decondensation, formation of novel DHSes and recruitment of several "open chromatin" proteins. CHRO tethering resulted in the recruitment of CP190 and Z4 (Putzig), whereas dCTCF tethering attracted CHRO, CP190, and Z4. Importantly, formation of a local stretch of open chromatin did not result in the reactivation of the silent marker genes yellow and mini-white immediately adjacent to the targeting region (UAS), nor was RNA polII recruited into this chromatin. The decompacted region remained late replicating, similar to the wild-type untargeted region.
Gambetta, M. C. and Furlong, E. E. M. (2018). The insulator protein CTCF is required for correct hox gene expression, but not for embryonic development in Drosophila. Genetics. PubMed ID: 30021792
Among Drosophila insulator binding proteins (IBPs) only CCCTC-binding factor (CTCF) has an obvious ortholog in mammals. CTCF is essential for mammalian cell viability and is an important regulator of genome architecture. In flies, CTCF is both maternally deposited and zygotically expressed. Flies lacking zygotic CTCF die as young adults with homeotic defects, suggesting that specific Hox genes are misexpressed in inappropriate body segments. The lack of any major embryonic defects was assumed to be due to the maternal supply of CTCF protein, as maternally contributed factors are often sufficient to progress through much of embryogenesis. This study determined the requirement of CTCF for developmental progression in Drosophila. Animals were generated that completely lack both maternal and zygotic CTCF and it was found that, contrary to expectation, these mutants progress through embryogenesis and larval life. They develop to pharate adults, which fail to eclose from their pupal case. These mutants show exacerbated homeotic defects compared to zygotic mutants, misexpressing the Hox gene Abdominal-B outside of its normal expression domain early in development. These results indicate that loss of Drosophila CTCF is not accompanied by widespread effects on gene expression, which may be due to redundant functions with other IBPs. Rather, CTCF is required for correct Hox gene expression patterns and for the viability of adult Drosophila.
Chathoth, K. T. and Zabet, N. R. (2019). Chromatin architecture reorganisation during neuronal cell differentiation in Drosophila genome. Genome Res. PubMed ID: 30709849
The organization of the genome into topologically associating domains (TADs) was shown to have a regulatory role in development and cellular functioning, but the mechanism involved in TAD establishment is still unclear. This study presented the first high-resolution contact map of Drosophila neuronal cells (BG3) and identified different classes of TADs by comparing this to genome organization in embryonic cells (Kc167). Only some TADs were found to be conserved in both cell lines, whereas the rest are cell-specific TADs. This is supported by a change in the enrichment of architectural proteins at TAD borders, with BEAF-32 present in embryonic cells and CTCF in neuronal cells. Furthermore, strong divergent transcription was observed, together with RNA Polymerase II occupancy, and an increase in DNA accessibility at the TAD borders. TAD borders that are specific to neuronal cells are enriched in enhancers controlled by neuronal-specific transcription factors. These results suggest that TADs are dynamic across developmental stages and reflect the interplay between insulators, transcriptional states and enhancer activities.
Bonchuk, A., Kamalyan, S., Mariasina, S., Boyko, K., Popov, V., Maksimenko, O. and Georgiev, P. (2020). N-terminal domain of the architectural protein CTCF has similar structural organization and ability to self-association in bilaterian organisms. Sci Rep 10(1): 2677. PubMed ID: 32060375
CTCF is the main architectural protein found in most of the examined bilaterian organisms. The cluster of C2H2 zinc-finger domains involved in recognition of the long DNA-binding motif is the only part of the protein that is evolutionarily conserved, while the N-terminal domain (NTD) has different sequences. Biophysical characterization was carried out of CTCF NTDs from various species representing all major phylogenetic clades of higher metazoans. With the exception of Drosophilides, the N-terminal domains of CTCFs show an unstructured organization and absence of folded regions in vitro. In contrast, NTDs of Drosophila melanogaster and virilis CTCFs contain unstructured folded regions that form tetramers and dimers, respectively, in vitro. Unexpectedly, most NTDs are able to self-associate in the yeast two-hybrid and co-immunoprecipitation assays. These results suggest that NTDs of CTCFs might contribute to the organization of CTCF-mediated long-distance interactions and chromosomal architecture.
Chen, X., Ke, Y., Wu, K., Zhao, H., Sun, Y., Gao, L., Liu, Z., Zhang, J., Tao, W., Hou, Z., Liu, H., Liu, J. and Chen, Z. J. (2019). Key role for CTCF in establishing chromatin structure in human embryos. Nature 576(7786): 306-310. PubMed ID: 31801998
In the interphase of the cell cycle, chromatin is arranged in a hierarchical structure within the nucleus, which has an important role in regulating gene expression. However, the dynamics of 3D chromatin structure during human embryogenesis remains unknown. This study reports that, unlike mouse sperm, human sperm cells do not express the chromatin regulator CTCF and their chromatin does not contain topologically associating domains (TADs). Following human fertilization, TAD structure is gradually established during embryonic development. In addition, A/B compartmentalization is lost in human embryos at the 2-cell stage and is re-established during embryogenesis. Notably, blocking zygotic genome activation (ZGA) can inhibit TAD establishment in human embryos but not in mouse or Drosophila. Of note, CTCF is expressed at very low levels before ZGA, and is then highly expressed at the ZGA stage when TADs are observed. TAD organization is significantly reduced in CTCF knockdown embryos, suggesting that TAD establishment during ZGA in human embryos requires CTCF expression. These results indicate that CTCF has a key role in the establishment of 3D chromatin structure during human embryogenesis.
Kyrchanova, O., Maksimenko, O., Ibragimov, A., Sokolov, V., Postika, N., Lukyanova, M., Schedl, P. and Georgiev, P. (2020). The insulator functions of the Drosophila polydactyl C2H2 zinc finger protein CTCF: Necessity versus sufficiency. Sci Adv 6(13): eaaz3152. PubMed ID: 32232161
In mammals, a C2H2 zinc finger (C2H2) protein, CTCF, acts as the master regulator of chromosomal architecture and of the expression of Hox gene clusters. Like mammalian CTCF, the Drosophila homolog, dCTCF, localizes to boundaries in the bithorax complex (BX-C). This study has determined the minimal requirements for the assembly of a functional boundary by dCTCF and two other C2H2 zinc finger proteins, Pita and Su(Hw). Although binding sites for these proteins are essential for the insulator activity of BX-C boundaries, these binding sites alone are insufficient to create a functional boundary. dCTCF cannot effectively bind to a single recognition sequence in chromatin or generate a functional insulator without the help of additional proteins. In addition, for boundary elements in BX-C at least four binding sites for dCTCF or the presence of additional DNA binding factors is required to generate a functional insulator.
Kaushal, A., Mohana, G., Dorier, J., Ozdemir, I., Omer, A., Cousin, P., Semenova, A., Taschner, M., Dergai, O., Marzetta, F., Iseli, C., Eliaz, Y., Weisz, D., Shamim, M. S., Guex, N., Lieberman Aiden, E. and Gambetta, M. C. (2021). CTCF loss has limited effects on global genome architecture in Drosophila despite critical regulatory functions. Nat Commun 12(1): 1011. PubMed ID: 33579945
Vertebrate genomes are partitioned into contact domains defined by enhanced internal contact frequency and formed by two principal mechanisms: compartmentalization of transcriptionally active and inactive domains, and stalling of chromosomal loop-extruding cohesin by CTCF bound at domain boundaries. While Drosophila has widespread contact domains and CTCF, it is currently unclear whether CTCF-dependent domains exist in flies. CTCF was genetically ablated in Drosophila, and impacts on genome folding and transcriptional regulation were examined in the central nervous system. CTCF was found to be required to form a small fraction of all domain boundaries, while critically controlling expression patterns of certain genes and supporting nervous system function. It was also found that CTCF recruits the pervasive boundary-associated factor Cp190 to CTCF-occupied boundaries and co-regulates a subset of genes near boundaries together with Cp190. These results highlight a profound difference in CTCF-requirement for genome folding in flies and vertebrates, in which a large fraction of boundaries are CTCF-dependent, and suggest that CTCF has played mutable roles in genome architecture and direct gene expression control during metazoan evolution.
Kyrchanova, O. V., Postika, N. Y., Sokolov, V. V. and Georgiev, P. G. (2022). Fragments of the Fab-3 and Fab-4 Boundaries of the Drosophila melanogaster Bithorax Complex That Include CTCF Sites Are not Effective Insulators. Dokl Biochem Biophys 502(1): 21-24. PubMed ID: 35275301
The segment-specific regulatory domains of the Bithorax complex (BX-C), which consists of three homeotic genes Ubx, abd-A and Abd-B, are separated by boundaries that function as insulators. Most of the boundaries contain binding sites for the architectural protein CTCF, which is conserved in higher eukaryotes. Previous work has shown that the CTCF sites determine the insulator activity of the boundaries of the Abd-B regulatory region. In this study, it was shown that fragments of the Fab-3 and Fab-4 boundaries of the abd-A regulatory region, containing CTCF binding sites, are not effective insulators.
Cavalheiro, G. R., Girardot, C., Viales, R. R., Pollex, T., Cao, T. B. N., Lacour, P., Feng, S., Rabinowitz, A. and Furlong, E. E. M. (2023). CTCF, BEAF-32, and CP190 are not required for the establishment of TADs in early Drosophila embryos but have locus-specific roles. Sci Adv 9(5): eade1085. PubMed ID: 36735786
The boundaries of topologically associating domains (TADs) are delimited by insulators and/or active promoters; however, how they are initially established during embryogenesis remains unclear. This was examined during the first hours of Drosophila embryogenesis. DNA-FISH confirms that intra-TAD pairwise proximity is established during zygotic genome activation (ZGA) but with extensive cell-to-cell heterogeneity. Most newly formed boundaries are occupied by combinations of CTCF, BEAF-32, and/or CP190. Depleting each insulator individually from chromatin revealed that TADs can still establish, although with lower insulation, with a subset of boundaries (~10%) being more dependent on specific insulators. Some weakened boundaries have aberrant gene expression due to unconstrained enhancer activity. However, the majority of misexpressed genes have no obvious direct relationship to changes in domain-boundary insulation. Deletion of an active promoter (thereby blocking transcription) at one boundary had a greater impact than deleting the insulator-bound region itself. This suggests that cross-talk between insulators and active promoters and/or transcription might reinforce domain boundary insulation during embryogenesis.
Kahn, T. G., Savitsky, M., Kuong, C., Jacquer, C., Cavalli, G., Chang, J. M. and Schwartz, Y. B. (2023). Topological screen identifies hundreds of Cp190- and CTCF-dependent Drosophila chromatin insulator elements. Sci Adv 9(5): eade0090. PubMed ID: 36735780
Drosophila insulators were the first DNA elements found to regulate gene expression by delimiting chromatin contacts. It is still not known how many of them exist and what impact they have on the Drosophila genome folding. Contrary to vertebrates, there is no evidence that fly insulators block cohesin-mediated chromatin loop extrusion. Therefore, their mechanism of action remains uncertain. To bridge these gaps, this study mapped chromatin contacts in Drosophila cells lacking the key insulator proteins CTCF and Cp190. With this approach, hundreds of insulator elements were found. Their study indicates that Drosophila insulators play a minor role in the overall genome folding but affect chromatin contacts locally at many loci. These observations argue that Cp190 promotes cobinding of other insulator proteins and that the model, where Drosophila insulators block chromatin contacts by forming loops, needs revision. This insulator catalog provides an important resource to study mechanisms of genome folding.

Eukaryotic transcriptional regulation often involves regulatory elements separated from the cognate genes by long distances, whereas appropriately positioned insulator or enhancer-blocking elements shield promoters from illegitimate enhancer action. Four proteins have been identified in Drosophila mediating enhancer blocking: Su(Hw), Zw5, BEAF-32 and GAGA factor. In vertebrates, the single protein CTCF (CCCTC-binding factor), with 11 highly conserved zinc fingers, confers enhancer blocking in all known chromatin insulators. This study characterized an orthologous CTCF factor in Drosophila with a similar domain structure, binding site specificity and transcriptional repression activity as in vertebrates. In addition, this study demonstrates that one of the insulators (Fab-8) in the Drosophila Abdominal-B locus mediates enhancer blocking by CTCF. Therefore, the enhancer-blocking protein CTCF and, most probably, the mechanism of enhancer blocking mediated by this remarkably versatile factor are conserved from Drosophila to humans (Moon, 2005).

Expression of the eukaryotic genome is controlled by enhancer and silencer elements, both of which can mediate their function from a distance. Insulator elements with enhancer-blocking activity curb enhancer activity, such that only appropriate promoters are activated. The proteins that mediate insulator function have been identified for only a few Drosophila insulator sequences. These are Zw5, BEAF-32, GAGA factor and Su(Hw) (Moon, 2005).

Another perspective on the requirement of insulators comes from the fact that many genes are controlled by several regulatory elements needed for tissue- and cell-specific expression. For example, the Drosophila gene Abdominal-B (Abd-B) contains an extended 3' regulatory region that is functionally subdivided into distinct enhancer domains. Functional separation of the enhancer sequences is achieved by intervening insulators such as Frontabdominal (Fab)-7 and Fab-8. Although both elements have been shown to mediate enhancer-blocking function, the protein involved in this activity has not been described (Moon, 2005).

In sharp contrast to Drosophila, the genome of vertebrates is much more expanded, due primarily to larger distances between genes. Therefore, the need for insulators to separate genes may not seem as pronounced as it is in Drosophila. Indeed, until now, only a single protein, CTCF, has been identified to mediate enhancer-blocking activity (Ohlsson, 2001). Binding sites for CTCF have been shown to be involved in gene activation (Vostrov, 1997), gene repression (Baniahmad, 1990; Lobanenkov, 1990) and enhancer blocking (Bell, 1999; Hark, 2000; Kanduri, 2000; Szabo, 2000; Filippova, 2001; Lutz, 2003; Tanimoto, 2003). Furthermore, vertebrate- and mammalian-specific functions, such as X-chromosome inactivation and control of the epigenetic DNA methylation state, seem to involve CTCF (Lee, 2003; Moon, 2005 and references therein).

Obviously, the function of enhancer blocking has developed during evolution such that Drosophila uses several proteins and mechanisms for enhancer blocking and insulation (Kuhn, 2003). However, none of the known Drosophila insulator proteins has a counterpart found to be conserved in vertebrates. Rather, vertebrates use CTCF, which has not previously been found in Drosophila. This study characterizes a Drosophila orthologue of CTCF with similarities to many of the features identified for vertebrate CTCF. Furthermore, a previously characterized Drosophila insulator, Fab-8, mediates enhancer blocking by CTCF in Drosophila as well as in vertebrate cells. Thus, the enhancer-blocking protein CTCF and, probably, the mechanisms of CTCF-driven enhancer blocking are both conserved from Drosophila to humans (Moon, 2005).

FlyBase data entries and cDNA sequence analysis revealed an open reading frame (ORF) coding for a protein similar to vertebrate CTCF with respect to the overall structure. dCTCF contains all of the expected 11 zinc fingers (Zn-fingers), separated by both standard and noncanonical inter-finger linkers. Furthermore, most of the crucial DNA base recognition residues at positions −1, 2, 3 and 6 are identical. Variation in position 6 for fingers #6 and #9 generates a change from alanine or serine to methionine; this is of no consequence for the DNA-binding specificity, as the recognition code is not changed (Moon, 2005).

Similarities in Zn-fingers do not necessarily imply similarities in function. Therefore, whether dCTCF can act as a transcriptional repressor, as has been demonstrated previously for vertebrate CTCF (Burcin, 1997), has been examined. The strongest repressive function has been shown to reside within the combined carboxy-terminal plus Zn-finger domains (Lutz, 2000). Equivalent regions of Drosophila and chicken CTCF (chCTCF) were fused to the yeast GAL4 transcription factor DNA-binding domain. Both Drosophila and chicken GAL4-CTCF fusions repressed reporter gene activity to a similar extent in two different cell lines and in a way comparable with the previously characterized strong repressor GAL4-v-erbA362. These results clearly indicate that dCTCF, like its vertebrate counterpart, has transcriptional repressor activity (Moon, 2005).

In vertebrates, CTCF is ubiquitously expressed (Burke, 2002), apparently functioning as a global transcriptional regulator in all cell types (Ohlsson, 2001). In comparison, dCTCF RNA expression levels were monitored at various stages of fly development. Using in situ hybridization, it was found that dCTCF RNA is present in the cytoplasm of the nurse cells within the fly egg chamber, is transported into and distributed uniformly in the developing oocyte, and is present in 0-24 h embryos as a maternal transcript. Later stages show expression in all tissues, revealing that dCTCF is a ubiquitous factor as in vertebrates. The dCTCF protein is clearly nuclear, as exemplified by the nuclear staining of syncytial blastoderm embryos with dCTCF-specific antibodies (Moon, 2005).

To extend the comparison of vertebrate and Drosophila CTCF, in vitro-translated Drosophila and vertebrate CTCF were tested for binding to several previously characterized vertebrate CTCF targets (CTS). The sequences tested included the CTS of the FII insulator element of the β-globin gene, the APP gene, the myc genes and the mouse ARF promoter. With the exception of the two myc FPV and A sites, all the other sequences bound chicken and Drosophila CTCF similarly (Moon, 2005).

A methylation-interference assay was used to determine whether both proteins contact the same guanine nucleotides on a given target DNA site. Both Drosophila and human CTCF were found to contact the same nucleotides on the β-globin FII insulator fragment. These results indicate that, despite considerable overall sequence divergence, fly and human CTCF show a striking degree of functional conservation with respect to DNA binding (Moon, 2005).

To identify potential Drosophila CTCF regulatory targets, an in vitro screen was performed for CTCF-binding sites, and the Fab-8 element, for which enhancer-blocking and boundary function have been shown, was found. This sequence is situated in the Abd-B locus, separating and insulating enhancer domains infraabdominal-7 (iab-7) from iab-8. Since the protein involved in this mechanism was unknown, and since vertebrate CTCF mediates enhancer-blocking activity (Ohlsson, 2001), whether dCTCF might have a similar role in the context of the Fab-8 element was tested. In vitro binding of dCTCF to Fab-8, as determined by methylation interference, suggested two binding sites for CTCF. Binding site mutations resulting in single-site mutations (mut1 or mut2) and in a double-site mutation (mut1+2) were used for electrophoretic mobility shift assay (EMSA). The wild-type Fab-8 element generates two retarded bands corresponding to a different mobility of the same DNA molecule occupied by CTCF at one of the two closely spaced CTS sequences. These different mobilities are probably caused by a site-specific DNA bending, which has also been observed on other dual binding sites, such as the H19 locus. Excess protein generated a slow mobility complex only resolved after a long run of the gel, reflecting binding of CTCF to both sites (Moon, 2005).

To test in vivo dCTCF binding to this important element, crosslinked chromatin was prepared from Drosophila embryos and CTCF-occupied sites were precipitated with the anti-dCTCF-C antibody. PCR primers for the Fab-8 sequence identified specifically precipitated chromatin, whereas primers against a different non-dCTCF-binding site and mock-precipitated chromatin resulted in no signal (Moon, 2005).

To test the functional similarity between dCTCF and vertebrate CTCF, enhancer blocking of Fab-8 was analyzed in vertebrate K562 cells. In comparison to the known enhancer-blocking effect mediated by the FII sequence, a similar reduction in colony numbers mediated by Fab-8 was seen. More importantly, specific abrogation of CTCF binding by the double mutation, mut1+2, resulted in loss of enhancer blocking (Moon, 2005).

The crucial test for enhancer-blocking activity of dCTCF had to be carried out in flies. Therefore, a vector was used with two regulatory regions containing the iab-5 enhancer from the Abd-B locus and two copies of the minimal twist enhancer, PE, directing an additive pattern of expression when placed between divergently transcribed white and lacZ reporter genes. The iab-5 enhancer directs expression in the posterior one-third of the blastoderm stage embryo, whereas the 2 × PE enhancer activates transcription in the ventral-most region where twist is normally expressed. In the control vector, both enhancers activate both the white gene and the lacZ gene. Altered patterns of transcription were observed when the 1 kb spacer sequence was replaced by the 680 bp Fab-8 element. On the white promoter, the iab-5 activity was completely abolished (shown as the lack of staining in the iab-5 activity region), while the 2 × PE enhancer was still activating the white gene. The lacZ promoter, conversely, could be activated only by the proximal iab-5 but not by the distal 2 × PE. This result suggests that the Fab-8 fragment blocks the respective distal enhancer for both the white and the lacZ promoters. When the CTCF sites were mutagenized (mut1+2), the iab-5 activity on white was partly restored. Similarly, the 2 × PE element again directed the transcription of the lacZ gene. Chromatin/CTCF immunoprecipitation revealed specific CTCF binding to the Fab-8 element of the enhancer-blocking vector, whereas binding to the Fab-8 mut element was clearly reduced. This correlation between strong CTCF binding and full enhancer-blocking function indicates that the activity of Fab-8 is at least partly mediated by CTCF and that dCTCF, similar to vertebrate CTCF, confers enhancer blocking (Moon, 2005).
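The positional logic of this reporter assay can be sketched as a toy model. The element order used here (white, 2 × PE, test fragment, iab-5, lacZ) is inferred from the outcomes described above rather than stated in the source, and the function `reachable_enhancers` and all labels are hypothetical. The model is binary, so it cannot represent the partial restoration seen with mut1+2.

```python
ENHANCERS = {"2xPE", "iab-5"}


def reachable_enhancers(layout, promoter, blocker="Fab-8"):
    """Return enhancers with no blocking element between them and the promoter.

    layout is an ordered list of element labels along the vector (assumed order).
    """
    p = layout.index(promoter)
    reached = []
    for i, element in enumerate(layout):
        if element not in ENHANCERS:
            continue
        lo, hi = sorted((i, p))
        # Elements lying strictly between the enhancer and the promoter.
        between = layout[lo + 1:hi]
        if blocker not in between:
            reached.append(element)
    return reached


# Assumed layouts: the test fragment sits between the two enhancers.
spacer_vector = ["white", "2xPE", "spacer", "iab-5", "lacZ"]
fab8_vector = ["white", "2xPE", "Fab-8", "iab-5", "lacZ"]
mutant_vector = ["white", "2xPE", "Fab-8 mut1+2", "iab-5", "lacZ"]

white_with_fab8 = reachable_enhancers(fab8_vector, "white")
lacz_with_fab8 = reachable_enhancers(fab8_vector, "lacZ")
```

In this simplification each promoter keeps only its proximal enhancer when intact Fab-8 is present, which matches the qualitative pattern reported for white and lacZ.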

Thus, at least one enhancer-blocking protein (CTCF) in Drosophila and vertebrates is conserved with a similar enhancer-blocking function. In addition to enhancer blocking, mammalian CTCF has gained functions involving the control of epigenetic states in the context of imprinted genes and X-chromosome inactivation (Lee, 2003; Lewis, 2004; Moon, 2005 and references therein).

Polycomb-mediated chromatin loops revealed by a subkilobase-resolution chromatin interaction map

The locations of chromatin loops in Drosophila were determined by Hi-C (chemical cross-linking, restriction digestion, ligation, and high-throughput DNA sequencing). Whereas most loop boundaries or "anchors" are associated with CTCF protein in mammals, loop anchors in Drosophila were found most often in association with the polycomb group (PcG) protein Polycomb (Pc), a subunit of polycomb repressive complex 1 (PRC1). Loops were frequently located within domains of PcG-repressed chromatin. Promoters located at PRC1 loop anchors regulate some of the most important developmental genes and are less likely to be expressed than those not at PRC1 loop anchors. Although DNA looping has most commonly been associated with enhancer-promoter communication, the results indicate that loops are also associated with gene repression (Eagen, 2017).

The locations of loop anchors in Drosophila determined in this study are notable both for correlations with ChIP-seq data and for the lack thereof. The lack of correlation with locations of CTCF protein was unexpected, inasmuch as most loop anchors in mammals are associated with CTCF protein, apparently bound to CTCF sequence motifs in a convergent orientation. There are evidently multiple patterns of protein association with loop anchors in metazoans. The association of loop anchors in Drosophila with Pc protein is noteworthy because it points to a role of looping not only in gene activation, as widely observed in the past, but in gene repression as well (Eagen, 2017).

Regions of PcG-repressed chromatin ('PcG domains') that are separated by hundreds of kilobases to megabases are known to be in enhanced spatial proximity, but details of their internal organization have only been investigated by averaging over many PcG domains. The high resolution of Hi-C contact maps in this study revealed chromatin loops within individual PcG domains, giving insight into their internal organization. PRC1 is known to compact nucleosome arrays in vitro. Knockdown of the PRC1 subunit, Polyhomeotic (Ph), in vivo decompacts PcG-repressed chromatin, and Ph that is unable to polymerize impairs the ability of PRC1 to form clusters. Together with these findings, the current results suggest that PRC1-bound chromatin loops within PcG-repressed domains either establish or maintain a condensed state (Eagen, 2017).

Previous analyses by 3C have pointed to associations of PcG proteins with chromatin loops for the Bithorax complex (BX-C) in S2 cells; for inv and en in BG3 and Sg4 cells; and for a transgenic reporter system in embryos, pupae, and adults. The current Hi-C data are, however, at higher resolution and genome-wide. Higher resolution allowed more comprehensive analysis, such as the unambiguous identification of loops and the segmentation of ANT-C into a series of TADs with one or two homeotic gene promoters per TAD. Genome-wide analysis revealed both the pervasive nature of Pc protein association and the absence of significant CTCF protein association, despite conservation of CTCF from Drosophila to humans (Eagen, 2017).

A report by Cubeñas-Potts (2017) on Drosophila chromatin loops in Kc cells appeared while this manuscript was in preparation. Cubeñas-Potts noted an enrichment of cohesin but a lack of Drosophila CTCF at loop anchors, consistent with the current observations. Cubeñas-Potts did not mention Pc, but a current analysis of their data revealed an enrichment of loop anchors at Pc ChIP peaks and an enrichment of Pc ChIP peaks at loop anchors. This study found a likelihood of repression of promoters at Pc-bound loop anchors, especially for developmental genes; Cubeñas-Potts observed an enrichment of active developmental enhancers at loop anchors, possibly because these are among the many loop anchors not bound by Pc, or because the chromatin at these loop anchors is bivalent, bound by nonhistone proteins and histone posttranslational modifications associated with both gene activation and repression (Eagen, 2017).

The occurrence of PRC1 at loop anchors could reflect a role in loop formation similar to that proposed for CTCF in mammals, wherein cohesin complexes extrude loops, in a process halted upon reaching bound CTCF. Consistent with this model, a large majority (72.8%) of Drosophila loop anchors are bound by the Rad21 subunit of cohesin. Regardless of whether PRC1 performs such a role, additional proteins must be involved, because PRC1 is present at only 26% of Drosophila loop anchors (Eagen, 2017).
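Figures such as "72.8% of loop anchors are bound by Rad21" come down to counting how many anchor intervals overlap at least one ChIP peak. The following is a minimal sketch of that calculation, not the pipeline used by Eagen (2017); the interval coordinates, the function names and the half-open coordinate convention are all assumptions made for illustration.

```python
# Toy calculation: fraction of loop anchors overlapping at least one ChIP peak.
# All intervals are hypothetical (chrom, start, end) tuples, half-open.

def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(anchors, peaks):
    """Fraction of anchors that overlap one or more peaks (0.0 if no anchors)."""
    if not anchors:
        return 0.0
    hits = sum(1 for a in anchors if any(overlaps(a, p) for p in peaks))
    return hits / len(anchors)

# Invented example data, not taken from the study.
anchors = [("3L", 100, 200), ("3L", 500, 600), ("2R", 100, 200), ("X", 50, 80)]
pc_peaks = [("3L", 150, 250), ("2R", 300, 400)]

print(fraction_overlapping(anchors, pc_peaks))  # 0.25
```

The brute-force comparison is quadratic and is only meant to make the definition explicit; genome-scale data would normally call for sorted intervals or an interval tree.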

High-resolution TADs reveal DNA sequences underlying genome organization in flies

Despite an abundance of new studies about topologically associating domains (TADs), the role of genetic information in TAD formation is still not fully understood. This study used HiCExplorer to annotate >2800 high-resolution (570 bp) TAD boundaries in Drosophila melanogaster. Eight DNA motifs enriched at boundaries were identified, including a motif bound by the M1BP protein, and two new boundary motifs. In contrast to mammals, the CTCF motif is only enriched on a small fraction of boundaries flanking inactive chromatin while most active boundaries contain the motifs bound by the M1BP or Beaf-32 proteins. Boundaries can be accurately predicted using only the motif sequences at open chromatin sites. It is proposed that DNA sequence guides the genome architecture by allocation of boundary proteins in the genome. Finally, an interactive online database is presented to access and explore the spatial organization of fly, mouse and human genomes (Ramirez, 2018).

How the DNA packs into the nucleus and coordinates functional activities is a long-standing question in biology. Recent studies have shown that the genome of different organisms is partitioned into chromatin domains, usually called topologically associating domains (TADs), which are invariant between cell types and evolutionarily conserved in related species (Ramirez, 2018).

To understand TAD formation, researchers have focused on the proteins found at TAD boundaries. In mammalian cells, the CCCTC-binding factor (CTCF) protein has been shown to be enriched at chromatin loops, which also demarcate a subset of TAD boundaries (referred to as 'loop domains'). A proposed mechanism, based on the extrusion of DNA by cohesin, suggests that the DNA-binding motif of CTCF and its orientation determine the start and end of the loop. In line with this hypothesis, deletions of the CTCF DNA-motif effectively removed or altered the loop or caused changes in gene-enhancer interactions that lead to developmental abnormalities in mouse embryos. Additionally, acute depletion of CTCF leads to loss of TAD structure at CTCF-containing boundaries. However, CTCF-cohesin loops only explain a fraction (<39%) of human TAD boundaries, while plants and bacteria lack CTCF homologs but also show TAD-like structures. Thus, it is possible that additional factors are involved in the formation of TADs (Ramirez, 2018).

In contrast to mammals, the genetic manipulation tools available in flies have allowed the characterization of several proteins that, like CTCF, are capable of inhibiting enhancer-promoter interactions. Throughout this paper these proteins are referred to as 'insulator proteins' and their binding motifs as 'insulators' or 'insulator motifs'. In flies, apart from CTCF, the following DNA-binding insulator proteins have been associated with boundaries: Boundary Element Associated Factor-32 (Beaf-32), Suppressor of Hairy-wing (Su(Hw)), and GAGA factor (GAF). Also, Zest white 5 (Zw5) has been proposed to bind boundaries. These insulator proteins recruit co-factors critical for their function, such as Centrosomal Protein-190 (CP190) and Mod(mdg4). Recently, novel insulator proteins have been described as binding partners of CP190: the zinc finger protein interacting with CP190 (ZIPIC); Pita, which appears to have human homologs and localizes to TAD boundaries; and the Insulator binding factors 1 and 2 (Ibf1 and Ibf2). Except for CP190 and Mod(mdg4), all previously characterized boundary-associated proteins bind to specific DNA motifs, suggesting that the 3D conformation of chromatin can be encoded by these motifs (Ramirez, 2018).

This study sought to identify the DNA encoding behind TAD boundaries in flies. First, software (HiCExplorer) was developed to obtain boundary positions at 0.5 kilobase resolution based on published Hi-C sequencing data from the Drosophila melanogaster Kc167 cell line. Using these high-resolution TAD boundaries, eight significantly enriched DNA-motifs were identified. Five of these motifs are known to be bound by the insulator proteins: Beaf-32, CTCF, the heterodimer Ibf1 and Ibf2, Su(Hw) and ZIPIC. A large fraction of boundaries contain the motif bound by the motif-1 binding protein (M1BP), a protein associated with constitutively expressed genes. This motif has recently been found at boundaries. The two remaining DNA-motifs have not been associated with boundaries before. Surprisingly, it was found that depletion of Beaf-32 has no major effect on chromosome organisation, while the depletion of M1BP leads to cell arrest in M-phase and dramatically affects the Hi-C results. Using machine learning methods based on the acquired DNA-motif information, boundaries were accurately distinguished from non-boundaries, and TAD boundaries that were missed when using only Hi-C data were identified. The results suggest that the genome architecture of flies can be explained predominantly by the genetic information. The methods for Hi-C data processing, TAD calling and visualization were implemented into an easy to use tool called HiCExplorer. To facilitate exploration of available Hi-C data, an interactive online database was provided containing processed high-resolution Hi-C data sets from fly, mouse and human genomes (Ramirez, 2018).
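To make the idea of calling boundaries from Hi-C data concrete, the sketch below scores each bin of a toy contact matrix by the mean contact frequency between the bins just upstream and just downstream of it, and reports local minima as candidate boundaries. This is a generic insulation-style heuristic written for illustration and is not claimed to be the HiCExplorer algorithm; the function names, the window size w and the block-structured matrix are all assumptions.

```python
# Toy insulation-style boundary caller on a symmetric contact matrix
# (list of lists). Illustrative only; not the HiCExplorer implementation.

def block_matrix(sizes, inside=10, outside=1):
    """Build a toy contact matrix with dense square blocks (idealized TADs)."""
    labels = [k for k, size in enumerate(sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]

def separation_score(matrix, i, w):
    """Mean contacts between the w bins before bin i and the w bins from i on."""
    n = len(matrix)
    left = range(max(0, i - w), i)
    right = range(i, min(n, i + w))
    values = [matrix[a][b] for a in left for b in right]
    # With no upstream bins the score is undefined; treat it as "not a minimum".
    return sum(values) / len(values) if values else float("inf")

def call_boundaries(matrix, w=2):
    """Return bin indices whose score is a strict local minimum."""
    n = len(matrix)
    scores = [separation_score(matrix, i, w) for i in range(n)]
    return [i for i in range(1, n - 1)
            if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]]

toy = block_matrix([4, 4, 4])   # three idealized TADs of four bins each
print(call_boundaries(toy))     # [4, 8]
```

Real Hi-C matrices are noisy and distance-dependent, so practical tools normalize the matrix, combine several window sizes and apply statistical thresholds before reporting a boundary.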

This study used high resolution (DpnII restriction enzyme) and deeply sequenced (~246 million reads) Hi-C data to map the genomic positions of TAD boundaries within ~600 bp in D. melanogaster. This analysis revealed a larger number of TADs, including many small active TADs (23 kb mean length), that were absent in previous reports. TAD size, boundary strength, chromatin marks, gene orientation, and transcription at the TADs were characterized. Motif calling was performed at boundaries, validating the presence of known insulators, along with the M1BP motif, which recently has also been shown to be associated with boundaries, and core promoter motif 6 and motif 8, which have not been associated with boundaries before. Using different machine learning methods, this study found that DNA motifs and open chromatin are sufficient to accurately predict a major fraction of fly boundaries. Finally, a set of useful tools and a resource is presented for visualization and annotation of TADs in different organisms (Ramirez, 2018).

This study verifies various properties of fly boundaries indicated in previous publications. Most boundaries associate with promoters and active chromatin (Hou, 2012) and known insulator proteins are enriched at boundaries. A comprehensive set of core promoter motifs are detected at boundaries, including the newly discovered M1BP motif, and motifs which have been associated with housekeeping gene expression. However, some of the results contradict previous observations. For example, it was found that genes at boundaries have higher expression and lower variability of expression throughout fly development. This is in line with Hug (2017) and Ulianov (2016) but in contrast with Hou (2012), who suggest that gene density and not the transcriptional state is important for boundary formation. Unlike Hou, this study found that genes at boundaries tend to be divergently transcribed. In contrast to various earlier studies, CTCF does not appear to be a major boundary-associated insulator in flies. This study also shows that the number of insulator motifs at boundaries correlates very little with boundary strength (Ramirez, 2018).

Most of these differences are due to the increased resolution of detected boundaries and the combined analysis of DNA motifs with ChIP-seq data, rather than ChIP-Seq peaks alone. This study shows that correlating boundaries with ChIP-Seq peaks alone is not a good measure when it comes to determinants of boundary formation. Many DNA-binding proteins show co-localization in ChIP-Seq data without presence of the corresponding DNA motifs. This is possibly due to cross-linking artifacts and indirect binding, which are, in fact, aggravated at boundaries, which tend to contact each other in 3D space (Ramirez, 2018).

Another argument for considering motifs is the contradicting case of CTCF at boundaries. In contrast to earlier studies based on CTCF ChIP-seq, this study found that the CTCF motif is rarely associated with boundaries. This difference is caused by the quality of the ChIP-seq data that can produce spurious peaks. For example, a significant enrichment of CTCF at boundaries was observed in the ChIP-Seq data from another study. On the other hand, ChIP-seq data sets show significant enrichment if only those ChIP-seq peaks that contain the CTCF motif are considered. For CTCF, and in general for ChIP-seq experiments in flies, 'phantom peaks' are known to occur at active promoters. Thus, to avoid misleading results, the current analyses are based on motif presence when possible, and for ChIP-Seq data sets a significance threshold together with motif binding affinity is used for analysis (instead of taking a significance cutoff alone) (Ramirez, 2018).
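The filtering strategy described here, requiring both peak significance and motif support, can be sketched as follows. The Peak fields, the threshold values and the example peaks are hypothetical and chosen only to show how a strong but motif-less 'phantom' peak would be discarded; they are not the cutoffs used by Ramirez (2018).

```python
# Sketch: keep ChIP-seq peaks only if they pass BOTH a significance cutoff
# and a motif-score cutoff. All field names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Peak:
    name: str
    neg_log10_q: float   # peak significance, larger is more significant
    motif_score: float   # best motif match score found inside the peak

def filter_peaks(peaks, min_signif=2.0, min_motif=8.0):
    """Return peaks that are both significant and supported by a motif match."""
    return [p for p in peaks
            if p.neg_log10_q >= min_signif and p.motif_score >= min_motif]

peaks = [
    Peak("site_with_motif", 5.0, 12.0),   # significant and motif-supported
    Peak("promoter_phantom", 6.0, 1.5),   # significant but no convincing motif
    Peak("weak_motif_site", 1.0, 11.0),   # motif present but peak not significant
]

print([p.name for p in filter_peaks(peaks)])  # ['site_with_motif']
```

A significance cutoff alone would have retained the second peak, which is the kind of spurious promoter signal the paragraph above warns about.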

This study observed that boundary strength is associated with the chromatin states of flanking TADs and particular motif combinations, but is not associated with the number of co-occurring boundary motifs. Boundary strength is higher between active and inactive/PcG TADs, while it is lower at boundaries separating two TADs within the same state (e.g., active-active, inactive-inactive). Boundaries containing Beaf-32 are stronger when present together with either motif 6, Pita, or ZIPIC motif, and weaker with motif 8. Although the mechanism by which combinations of insulators alter the boundary strength still remains unclear, an association of Nup98 with the Pita motif, motif 6, and CTCF was observed, suggesting that association with nuclear pore proteins may result in stronger boundaries. Nup98 has now been shown to be functionally important in mediating enhancer-promoter looping in the Drosophila genome (Ramirez, 2018).

The current results indicate that the two sets of boundary motifs (promoter and non-promoter) participate in the compartmentalization of different types of chromatin. Boundaries containing core promoter motifs are either flanking, or surrounded by active chromatin regions. In contrast, the boundaries containing non-promoter motifs tend to be within or at the borders of inactive or repressed chromatin. This finding is in line with previous reports showing an enrichment of CTCF at the borders of H3K27me3 domains and an enrichment of Beaf-32 in active chromatin. This indicates that insulator proteins might serve different functions guided by the DNA sequence. For example, this study observed that GAF motif, whose presence is negatively associated with TAD boundaries, is rather detected alone at 'loop domains' (Ramirez, 2018).

These analyses indicate that the depletion of the well-studied insulator protein Beaf-32, has no significant effect on the chromosome conformation. However, in Drosophila melanogaster, both the Beaf-32 and DREF proteins bind exactly the same DNA motif. Thus, the current results, as well as others point out that DREF, a protein that unlike Beaf-32 is conserved in humans, might have a more prominent role in genome organization than previously thought (Ramirez, 2018).

On the other hand, cells under M1BP knockdown grow slower in culture and get arrested in M-Phase, probably because M1BP is a transcription factor of constitutively expressed genes. Since M1BP-depleted cells show cell cycle defects, it is difficult to separate the direct role of M1BP at boundaries from the indirect effects caused by deregulation of thousands of genes. To study the direct role of M1BP at boundaries, it would be useful either to delete the M1BP motif at boundaries using CRISPR, as shown for CTCF in mammals, or to achieve acute and complete depletion of M1BP (Ramirez, 2018).

This study presents evidence that the DNA sequence contains features that can guide the formation of higher order chromosome organisation. The association of boundary types with a combination of motifs, and the fact that boundaries can be predicted using DNA sequence alone, in the absence of any information about associated proteins or histone marks, leads the authors to propose a DNA-guided chromatin assembly model. In this model, the boundary elements are recognized by their proteins, which help load TAD assembly factors onto chromatin. Promoter and non-promoter boundaries can thus have different mechanisms of formation. DNA motifs at inactive regions can attract proteins that may establish TAD domains by setting up barriers for chromatin marks. Although the overall barrier activity of insulator proteins has been controversial, it is plausible that the barrier mechanism is active only at a subset of boundaries (like those of inactive TAD domains). DNA motifs at gene promoters can associate with core-promoter proteins which then guide the assembly of the Pol-II pre-initiation complex. The pre-initiation complex can then recruit condensins. Once recruited, condensins can perform loop extrusion independent of Pol-II transcriptional activity, leading to emergence of TADs. Condensins can also remain associated with chromatin during mitosis, to re-establish TADs after cell division. In general, the results indicate that active transcription and chromosome conformation are related. Future studies investigating the association of the Pol-II pre-initiation complex and condensin activity on gene promoters would advance understanding of the mechanism of TAD formation (Ramirez, 2018).

Drosophila CTCF tandemly aligns with other insulator proteins at the borders of H3K27me3 domains

Several multiprotein DNA complexes capable of insulator activity have been identified in Drosophila melanogaster, yet only CTCF, a highly conserved zinc finger protein, and the transcription factor TFIIIC have been shown to function in mammals. CTCF is involved in diverse nuclear activities, and recent studies suggest that the proteins with which it associates and the DNA sequences that it targets may underlie these various roles. This study shows that the Drosophila homolog of CTCF (dCTCF) aligns in the genome with other Drosophila insulator proteins such as Suppressor of Hairy wing (SU(HW)) and Boundary Element Associated Factor of 32 kDa (BEAF-32) at the borders of H3K27me3 domains, which are also enriched for associated insulator proteins and additional cofactors. RNAi depletion of dCTCF and combinatorial knockdown of gene expression for other Drosophila insulator proteins lead to a reduction in H3K27me3 levels within repressed domains, suggesting that insulators are important for the maintenance of appropriate repressive chromatin structure in Polycomb (Pc) domains. These results provide new insights into the roles of insulators in chromatin domain organization and support recent models suggesting that insulators underlie interactions important for Pc-mediated repression. This study reveals an important relationship between dCTCF and other Drosophila insulator proteins and speculates that vertebrate CTCF may also align with other nuclear proteins to accomplish similar functions (Van Bortle, 2012).

Improvements in genomic strategies for mapping genome-wide interactions have allowed recent studies to probe basic genome folding principles as well as insulator-mediated chromatin interactions. Results consistently support current models proposing roles for insulator proteins in chromosome organization and challenge the basic barrier and enhancer-blocking activities that classically defined these proteins. Instead, the ability of insulators to block the spread of heterochromatin and impede enhancer-promoter interactions may simply be consequences of a more paramount role in chromosome organization. New findings in Drosophila also suggest that insulators are required to mediate long-range interactions important for Polycomb (Pc) repression, and the recent identification of CTCF in transcription factories suggests that insulators may direct the localization of specific genomic loci to discrete nuclear subcompartments for gene regulation. Nevertheless, the finding that heterochromatin does not spread into flanking chromatin domains in response to insulator knockdown is surprising based on numerous examples of insulator mediated barrier function. Though individual insulator elements may indeed serve to prevent the spread of silencing chromatin, disruption of total insulator protein levels instead significantly affected the levels of H3K27me3 within rather than outside of repressive chromatin domains. Insulator knockdown had no effect on the expression of E(z) or total H3K27me3 levels. Therefore, the loss of H3K27me3 within Pc domains genome-wide suggests insulators play a critical role necessary for the maintenance of appropriate chromatin architecture at these specific loci. 
Given the requirement for insulators in long-range Pc interactions, it is speculated that long-range interactions mediated by dCTCF and other Drosophila insulator proteins are ultimately disrupted by insulator knockdown, and that H3K27me3 depletion likely reflects a defect in Pc mediated compaction and maintenance of H3K27me3 at developmental loci (Van Bortle, 2012).

Interestingly, however, expression of genes within repressive H3K27me3 domains was not significantly affected, suggesting Pc mediated gene silencing was not abrogated, or that additional steps are required to activate these developmental genes. Future studies investigating the role of insulators in Pc-mediated repression, and the effects of insulator knockdown in nuclear organization, will provide valuable insight into the relationship between insulator proteins and chromatin architecture (Van Bortle, 2012).

The diverse activities of CTCF in gene expression and chromatin organization require exploration of the proteins with which it functions and the target sequences associated with specific functions. By combining the resolution conferred by high throughput sequencing (ChIPseq), with mapping of core target sequences, this study provides a stringent but exhaustive map of direct binding sites for Drosophila insulators, and extends previous analyses of dCTCF, SU(HW), BEAF-32, and CP190 to include the insulator protein MOD(MDG4). It was shown that dCTCF aligns with both the SU(HW) and BEAF-32 insulators, where dCTCF becomes enriched for additional insulator and insulator-associated proteins. The presence of aligned dCTCF sites at the borders of H3K27me3 domains provides an excellent system to query the importance of insulator proteins at the boundaries of discrete chromatin domains. Recently identified correlations for insulator proteins at the boundaries of physical domains mapped in Drosophila melanogaster (Sexton, 2012) provide evidence for why only a subset of aligned dCTCF localize to H3K27me3 domain borders, and clearly demonstrate that insulators are also involved in the organization of other, distinct chromatin domains. Whereas Pc-repressed domains are relatively easily identifiable in the form of H3K27me3 signatures, future characterization of discrete physical domains and domain boundaries will require genome-wide interrogation of chromosome interactions in individual cell-types of interest. Nearly 40% of aligned dCTCF sites (~355) localize to physical domain boundaries mapped in late embryos by Sexton (2012), suggesting physical domains and insulator localization may be conserved at many loci across cell-types (Van Bortle, 2012).

Interestingly, dCTCF appears to target three different sequences in D. melanogaster, including the highly conserved core motif for which dCTCF has been described as binding in both Drosophila and mammals. The secondary motif appears highly similar to the conserved core consensus (AGGNGGC) with an insertion between the first pair of guanines (AGTGTGGC), and average dCTCF levels suggest this represents a low occupancy and potentially lower affinity binding site. These novel dCTCF sites are highly enriched for insulator protein CP190 when compared to its primary target sequence. This finding, combined with previous data indicating CP190 is essential for dCTCF binding to a subset of its target sites, suggests that CP190 might facilitate dCTCF binding to these secondary sites. The absence of CP190 in vertebrates may explain why these sequences have not been identified as mammalian target sequences, raising the possibility that these binding sites are a Drosophila specific phenomenon (Van Bortle, 2012).

Analysis of dCTCF insulator alignment at the eve locus and genome-wide uncovers a tight association with BEAF-32 and SU(HW), which may provide dCTCF with numerous advantages for effectively establishing a functional insulator. First, alignment of multiple insulator DNA elements may increase the likelihood of sequence accessibility at important loci, as insulator binding sites have been characterized by reduced nucleosome density. For example, an insulator-binding protein may access its cognate sequence, thereby creating an accessible landscape for other, potentially different insulator proteins to bind their respective targets. Second, by aligning in close proximity, recruitment of essential insulator proteins [i.e., CP190 and MOD(MDG4)] by one insulator-binding protein may facilitate recruitment by others, given that CP190 and MOD(MDG4) may be recruited as multimers. Third, given that dCTCF binds secondary sites that potentially require CP190, recruitment of CP190 by a neighboring insulator (i.e., SU(HW) or BEAF-32) may preclude dCTCF binding, thereby providing a regulatory step in dCTCF recruitment to DNA. Finally, by aligning with SU(HW) and BEAF-32, dCTCF establishes a unique identity compared to independent dCTCF sites, where it becomes enriched for additional cofactors, including L(3)MBT and Chromator (Van Bortle, 2012).

Though the data shed new and valuable insight into what appears to be cooperative insulator function in Drosophila melanogaster, many questions remain. Given current models that insulators function via intra- and inter-chromosomal interactions, it is plausible that aligned dCTCF sites and their enrichment for CP190 and MOD(MDG4) allow for stable chromosomal interactions. Current locus- and genome-wide interaction assays may effectively answer this question in the near future. While BEAF-32 has been defined as lineage specific, and SU(HW) appears to lack a counterpart in mammals, the current results suggest that mammalian CTCF may align with other, unique DNA-binding proteins important for appropriate insulator function at the boundaries of Pc domains (Van Bortle, 2012).

Insulators target active genes to transcription factories and polycomb-repressed genes to polycomb bodies

Polycomb bodies are foci of Polycomb proteins in which different Polycomb target genes are thought to co-localize in the nucleus, looping out from their chromosomal context. Insulators, not Polycomb response elements (PREs), have been shown to mediate associations among Polycomb Group (PcG) targets to form Polycomb bodies. This study used live imaging and 3C interactions to show that transgenes containing PREs and endogenous PcG-regulated genes are targeted by insulator proteins to different nuclear structures depending on their state of activity. When two genes are repressed, they co-localize in Polycomb bodies. When both are active, they are targeted to transcription factories in a fashion dependent on Trithorax and enhancer specificity as well as the insulator protein CTCF. In the absence of CTCF, assembly of Polycomb bodies is essentially reduced to those representing genomic clusters of Polycomb target genes. The critical role of Trithorax suggests that stable association with a specialized transcription factory underlies the cellular memory of the active state (Li, 2013).

Active chromatin and transcription play a key role in chromosome partitioning into topologically associating domains

Recent advances enabled by the Hi-C technique have unraveled many principles of chromosomal folding that were subsequently linked to disease and gene regulation. In particular, Hi-C revealed that chromosomes of animals are organized into Topologically Associating Domains (TADs), evolutionarily conserved compact chromatin domains that influence gene expression. Mechanisms that underlie partitioning of the genome into TADs remain poorly understood. To explore principles of TAD folding in Drosophila melanogaster, Hi-C and PolyA+ RNA-seq were performed in four cell lines of various origins (S2, Kc167, DmBG3-c2, and OSC). Contrary to previous studies, this study found that regions between TADs (i.e. the inter-TADs and TAD boundaries) in Drosophila are only weakly enriched with the insulator protein dCTCF, while another insulator protein Su(Hw) is preferentially present within TADs. However, Drosophila inter-TADs harbor active chromatin and constitutively transcribed (housekeeping) genes. Accordingly, it was found that binding of insulator proteins dCTCF and Su(Hw) predicts TAD boundaries much worse than active chromatin marks do. Interestingly, inter-TADs correspond to decompacted interbands of polytene chromosomes, whereas TADs mostly correspond to densely packed bands. Collectively, these results suggest that TADs are condensed chromatin domains depleted in active chromatin marks, separated by regions of active chromatin. A mechanism of TAD self-assembly is proposed based on the ability of nucleosomes from inactive chromatin to aggregate, an ability that acetylated nucleosomal arrays lack. Finally, this hypothesis is tested by polymer simulations, and it was found that TAD partitioning may be explained by different modes of inter-nucleosomal interactions for active and inactive chromatin (Ulianov, 2015).

Recently developed 3C-based methods coupled with high-throughput sequencing have enabled genome-wide investigation of chromatin organization. Studies performed in human, mouse, Drosophila, yeasts, Arabidopsis and several other species have unraveled general principles of genome folding. Chromosomes in mammals and Drosophila are organized hierarchically. At the megabase scale, mammalian chromosomes are partitioned into active and inactive compartments. At the sub-megabase scale, these compartments are subdivided into a set of self-interacting domains called Topologically Associating Domains (TADs); TADs themselves are often hierarchical and are split into smaller domains. Similar to mammals, Drosophila chromosomes are partitioned into TADs that are interspaced with short boundaries or longer inter-TAD regions (inter-TADs) (Ulianov, 2015).

Partitioning of mammalian genomes into TADs appears to be largely cell-lineage independent and evolutionarily conserved. Disruption of certain TAD boundaries leads to developmental defects in humans and mice. TADs correlate with units of replication timing regulation in mammals and colocalize with epigenetic domains (either active or repressed) in Drosophila. The internal structure of TADs was reported to change in response to environmental stress, during cell differentiation, and embryonic development. In addition, comparative Hi-C analysis has demonstrated that genomic rearrangements between related mammalian species occur predominantly at TAD boundaries. Consequently, TADs appear to evolve primarily as constant and unsplit units. Previous studies in Drosophila embryonic nuclei and embryo-derived Kc167 cells detected TADs of various sizes roughly corresponding to epigenetic domains. Additionally, long-range genomic contacts and clustering of pericentromeric regions were revealed, and TAD boundaries were found to be enriched with active chromatin marks and insulator proteins. Both active and inactive TADs were identified, and their spatial segregation was observed (Ulianov, 2015).

Despite extensive studies, mechanisms underlying TAD formation remain obscure. Architectural proteins, including cohesin and CTCF, are often found at TAD boundaries; thus, they have been proposed to play a key role in the demarcation of TADs. However, several studies suggest that other mechanisms may be responsible for partitioning and formation of TADs. Firstly, depletion of various insulator proteins did not affect the profile of chromosome partitioning into TADs, but rather decreased intra-TAD interactions. Secondly, CTCF may mediate loops that occur between the start and the end of the so-called 'loop domains'. However, domains of similar sizes but without a loop were observed as well (so-called 'ordinary domains'). Thirdly, polymer simulations of a permanent chromatin loop yield a noticeable interaction between the loop bases on a simulated Hi-C map, but without a characteristic square shape of a TAD. Loops of this kind are thought to occur between insulator proteins such as Su(Hw) in the 'topological insulation' model. Finally, chromosomal domains similar to TADs in the bacterium Caulobacter crescentus are demarcated by actively transcribed genes, and are not affected by the knockout of SMC, a homolog of cohesin subunits (Ulianov, 2015).

This study presents evidence that questions the role of insulators in the organization of TAD boundaries in Drosophila. The results suggest that TADs are self-organized and potentially highly dynamic structures formed by numerous transient interactions between nucleosomes of inactive chromatin, while inter-TADs and TAD boundaries contain highly acetylated nucleosomes that are less prone to interactions. Finally, a polymer model of TAD formation was developed based on the two types of nucleosomes, and it was found that a polymer composed of active and inactive chromatin blocks forms TADs on a simulated Hi-C map (Ulianov, 2015).

This study and others (Hou, 2012; Sexton, 2012) revealed that boundaries and inter-TADs in Drosophila, as opposed to TADs, are strongly enriched with active chromatin and its individual marks, as well as with active transcription and with constitutively transcribed housekeeping genes. Consequently, active chromatin marks, in the simplest case only total transcription and H3K4me3 (a mark of active promoters), can predict a TAD/inter-TAD profile relatively well. The existence of long inter-TADs composed of active chromatin is per se an argument for the ability of this type of chromatin to separate TADs. Furthermore, the current observations demonstrate that the presence of active chromatin and transcribed regions within a TAD undermines its integrity, making the TAD less compact and generating weak internal boundaries. Consequently, a bona fide TAD is inactive; TADs containing active chromatin become less dense, acquire weak internal boundaries and eventually split into smaller TADs that are composed of inactive chromatin. The observation that the majority of housekeeping genes are located within inter-TADs and TAD boundaries suggests that evolutionary conservation and cell-type independence of TAD/inter-TAD profiles may be explained by conservation of positions of housekeeping genes along the chromosomes (Ulianov, 2015).
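The idea that an active-chromatin signal alone can approximate a TAD/inter-TAD profile can be illustrated with a minimal sketch. It assumes a per-bin activity signal (for example, binned transcription or H3K4me3 coverage) and a single cutoff; the function name, the bin values and the threshold are hypothetical, and this simple thresholding is a stand-in rather than the predictor used by Ulianov (2015).

```python
from itertools import groupby


def segment_by_activity(signal, threshold):
    """Label each genomic bin and merge consecutive bins with the same label.

    Bins whose activity exceeds `threshold` are called 'inter-TAD' (active),
    all others 'TAD' (inactive). Returns (start_bin, end_bin, label) tuples
    with end_bin exclusive. The threshold is an illustrative assumption.
    """
    labels = ["inter-TAD" if value > threshold else "TAD" for value in signal]
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length, label))
        start += length
    return segments


# Toy activity track: two quiet stretches separated by a short active stretch.
toy_signal = [0.1, 0.2, 0.1, 2.0, 3.0, 0.1, 0.0, 0.2, 0.1]
for segment in segment_by_activity(toy_signal, threshold=1.0):
    print(segment)
```

On real data the signal would normally be smoothed and the cutoff tuned against a Hi-C-derived segmentation before the two profiles are compared.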

It is noted that chromosomal interaction domains similar to TADs have been observed in the bacterium Caulobacter crescentus, where they are demarcated by sites of active transcription. Although the basic level of chromosomal folding is different in bacteria and eukaryotes, the model proposed by Le (2013) and the current model stem from common principles. In Caulobacter, active transcription is thought to disrupt the fiber of supercoils (plectonemes) by creating a stretch of non-packaged DNA, free of plectonemes, which spatially separates chromosomal regions flanking it. In the current model, transcription disrupts chromatin organization by introducing a 'non-sticky' region of chromatin, which is less compact and more unfolded in space, and thus spatially separates two flanking regions. Computer modeling shows that stickiness of non-acetylated (inactive) nucleosomes and the absence of stickiness for acetylated (active) nucleosomes are sufficient for chromatin partitioning into TADs and inter-TADs. Self-association of nucleosomes may be explained by the interaction of positively charged histone tails (in particular, the tail of histone H4) of one nucleosome with the acidic patch of histones H2A/H2B at an adjacent nucleosome. Acetylation of histone tails, which is typical of active chromatin, may interfere with inter-nucleosomal associations. In addition to a high level of histone acetylation, other features of active chromatin, including lower nucleosome density in inter-TADs, manifested as decreased histone H3 occupancy, might contribute to the generation of TAD profiles (Ulianov, 2015).

It should be mentioned that a significant difference between the polymer simulations and models previously suggested by the Cavalli and Vaillant groups (Jost, 2014) is the use of saturating interactions between inactive nucleosomes. In the case of volume interactions, all nucleosomes of the same type adjacent in 3D space will attract each other; in the case of saturating interactions, each molecule may attract only one neighbor. Using volume interactions leads to the formation of a single dense blob, and does not produce TADs in a simulated Hi-C map. It is noted that the saturating nature of interactions between nucleosomes is based on the known properties of nucleosomal particles. Previous studies considered a variety of mechanisms that may lead to the formation of TADs. In particular, Barbieri (2012) studied segregation of two TADs using cubic lattice simulations of a short 152-monomer chain consisting of two TADs, assuming that inter-monomer interactions could only form between monomers belonging to the same TAD. In the current model, TADs emerge without requiring such specific interactions; any two regions of sticky monomers separated by a non-sticky linker would form TADs. Another study proposed that transcription-induced supercoiling may be responsible for the formation of TADs (Benedetti, 2014). Although this model is consistent with the current observation that sites of active transcription demarcate TAD boundaries, there is limited evidence that supercoiling of chromatinized DNA exists in Drosophila and other organisms. In contrast, the current model is based on known biochemical properties of nucleosomes (Ulianov, 2015).
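The distinction between volume and saturating interactions can be made concrete with a small sketch. It assumes a static set of monomer coordinates and a distance cutoff, and it uses a greedy closest-pair-first pairing to enforce the one-partner limit; the function names, the cutoff and the greedy rule are illustrative assumptions, not the simulation scheme of Ulianov (2015).

```python
import math
from itertools import combinations


def close_pairs(positions, cutoff):
    """Return index pairs (i, j), i < j, within `cutoff`, closest pairs first."""
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        distance = math.dist(positions[i], positions[j])
        if distance <= cutoff:
            pairs.append((distance, i, j))
    pairs.sort()
    return [(i, j) for _, i, j in pairs]


def volume_bonds(positions, sticky, cutoff):
    """Volume rule: every pair of nearby sticky monomers attracts."""
    return [(i, j) for i, j in close_pairs(positions, cutoff)
            if sticky[i] and sticky[j]]


def saturating_bonds(positions, sticky, cutoff):
    """Saturating rule: each sticky monomer may hold at most one partner.

    Pairs are accepted greedily, closest first (an illustrative choice).
    """
    used = set()
    bonds = []
    for i, j in volume_bonds(positions, sticky, cutoff):
        if i not in used and j not in used:
            bonds.append((i, j))
            used.update((i, j))
    return bonds


# Four sticky (inactive) monomers on a unit square plus one non-sticky
# (active) monomer in the centre.
positions = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 0)]
sticky = [True, True, True, True, False]
print(len(volume_bonds(positions, sticky, cutoff=1.5)))      # all sticky pairs
print(len(saturating_bonds(positions, sticky, cutoff=1.5)))  # one partner each
```

Under the volume rule the number of attractive contacts grows with local density, which favours collapse into a single blob; under the saturating rule it is capped at one per monomer.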

The fact that a minor fraction of TADs is built mostly from active chromatin apparently contradicts the current model, suggesting that additional ways of chromatin self-organization could exist. One possibility is the establishment of long-range contacts between enhancers and their cognate promoters, as well as loops between pairs of insulators. Such loops formed inside active unstructured chromatin linkers (i.e., inter-TADs) could probably be sufficient to compact them and thus to fold into TADs (Ulianov, 2015).

TAD profiles of X chromosomes are almost identical in the male and female cell lines, which is in agreement with recently published observations (Ramírez, 2015). Thus, it seems that hyperacetylation of male X-chromosomes due to dosage compensation does not generate new TAD boundaries. However, it should be noted that the MOF histone acetyltransferase of the MSL complex introduces only the H4K16ac mark. Although this modification is important to prevent inter-nucleosomal interactions, acetylation at other histone positions and H2B ubiquitylation contribute as well. Additionally, H4K16 acetylation generated by the dosage compensation system occurs preferentially at regions enriched with transcribed genes and hence within inter-TADs (Ulianov, 2015).

The current analysis does not support the previously reported (Hou, 2012; Sexton, 2012) strong enrichments of insulator proteins Su(Hw) and dCTCF at TAD boundaries in Drosophila. To assess the possible reasons for this discrepancy, the dCTCF distribution was re-analyzed with respect to TAD positions in the current dataset using the raw ChIP-seq data. No strong difference was observed in the dCTCF coverage in TADs and inter-TADs. Interestingly, this study obtained the same result while analyzing dCTCF and Su(Hw) binding within TAD boundaries identified by Hou (2012). However, a strong enrichment of dCTCF at TAD boundaries was observed when the peak distribution was analyzed instead of read coverage. Additionally, the effect was much weaker when modENCODE peaks were used. Hence, the discrepancy may be caused by a different peak calling procedure in modENCODE and in Hou (2012). The biological significance of these observations remains to be determined. It is noted that disruption of the cohesin/CTCF complex in mammals, as well as depletion of the Vtd (also known as Rad21) cohesin subunit in Drosophila, did not lead to disappearance of TAD boundaries, but rather only slightly decreased interactions inside TADs (in mammals) and reduced TAD boundary strength in the Drosophila genome. These observations favor a role for the cohesin/CTCF complex, which is known to form loops, in chromatin compaction inside the TADs (Ulianov, 2015).
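Why read coverage and called peaks can give different enrichment estimates is easy to see in a toy calculation. The sketch below assumes binned coverage values, a set of boundary bins and a set of bins containing called peaks; all names and numbers are hypothetical and are not taken from the datasets discussed above.

```python
def mean(values):
    """Arithmetic mean of a non-empty iterable."""
    values = list(values)
    return sum(values) / len(values)


def coverage_enrichment(coverage, boundary_bins):
    """Ratio of mean read coverage in boundary bins to all other bins."""
    inside = [coverage[i] for i in boundary_bins]
    outside = [c for i, c in enumerate(coverage) if i not in boundary_bins]
    return mean(inside) / mean(outside)


def peak_enrichment(peak_bins, boundary_bins, n_bins):
    """Ratio of peak density (peaks per bin) in boundary bins to other bins."""
    peaks_inside = sum(1 for p in peak_bins if p in boundary_bins)
    peaks_outside = len(peak_bins) - peaks_inside
    density_inside = peaks_inside / len(boundary_bins)
    density_outside = peaks_outside / (n_bins - len(boundary_bins))
    return density_inside / density_outside


# Toy track: coverage is nearly flat, but a hard cutoff turns the slightly
# higher bins into peaks (a stand-in for a peak-calling step).
coverage = [10, 10, 12, 10, 10, 12, 10, 10, 12, 10, 10, 10]
boundary_bins = {2, 5}
peak_bins = [i for i, c in enumerate(coverage) if c >= 12]

print(coverage_enrichment(coverage, boundary_bins))
print(peak_enrichment(peak_bins, boundary_bins, len(coverage)))
```

In this toy case a modest difference in coverage becomes a roughly tenfold difference in peak density, which is one way a peak-calling procedure could shape the apparent enrichment at boundaries.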

Binding of insulator proteins might contribute to establishing TAD boundaries through introducing active chromatin marks. Indeed, when inserted into an ectopic position, a classical insulator triggers hyperacetylation of the local chromatin domain and recruits chromatin-remodeling complexes. However, absence of strong enrichment of dCTCF at TAD boundaries and preferential location of Su(Hw) inside TADs mean that at least dCTCF- and Su(Hw)-dependent insulators are not the major determinants of TAD boundaries and inter-TADs (Ulianov, 2015).

TADs are predicted based on the analysis of averaged data from a cell population. Although they are usually represented as large chromatin globules, direct experimental evidence for the existence of such globules in individual cells is controversial. Using confocal and 3D-SIM microscopy, ~1-Mb globular domains have been observed within chromosomal territories. However, STORM microscopy has shown that chromatin in individual mammalian cells is organized into 'clutches' composed of several nucleosomes, and that increased histone acetylation dramatically reduces the size of these clutches. It is thus possible that sub-megabase TADs revealed by Hi-C represent a set of nucleosome clutches separated by relatively short spacers of various sizes. These short clutches may occupy various positions within TADs in different cells and stochastically assemble to form short-lived aggregates. The stochastic nature of TADs is supported by computer simulations (Ulianov, 2015).

Topologically associating domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation

Developmental genes in metazoan genomes are surrounded by dense clusters of conserved noncoding elements (CNEs). CNEs exhibit unexplained extreme levels of sequence conservation, with many acting as developmental long-range enhancers. Clusters of CNEs define the span of regulatory inputs for many important developmental regulators and have been described previously as genomic regulatory blocks (GRBs). Their function and distribution around important regulatory genes raises the question of how they relate to 3D conformation of these loci. This study shows that clusters of CNEs strongly coincide with topological organisation, predicting the boundaries of hundreds of topologically associating domains (TADs) in human and Drosophila. The set of TADs that are associated with high levels of noncoding conservation exhibit distinct properties compared to TADs devoid of extreme noncoding conservation. The close correspondence between extreme noncoding conservation and TADs suggests that these TADs are ancient, revealing a regulatory architecture conserved over hundreds of millions of years. Metazoan genomes contain many clusters of conserved noncoding elements. This study provides evidence that these clusters coincide with distinct topologically associating domains in humans and Drosophila, revealing a conserved regulatory genomic architecture (Harmston, 2017).

This work shows that the span of clusters of CNEs, known as GRBs, is predictive of the span of a subset of topologically associating domains (TADs) in both humans and Drosophila. These sets of TADs, referred to as GRB-TADs, are some of the largest, strongest and most gene sparse in humans and Drosophila and show distinct patterns of retrotransposon density and CTCF binding. Regions containing homologous developmental genes are associated with the same type of conservation and structure in humans and Drosophila. Not only are regulatory elements in these regions under intense selective pressure, but the association between the boundaries of GRBs and TADs also suggests that the basic 3D structure of these loci has existed over hundreds of millions of years of evolution, at least back to the ancestor of chordates and arthropods. At even more distant timescales, the phenomenon of microsynteny has existed since early Metazoa and is apparent across Bilateria. The syntenic relationship between RUNX2 and SUPT3H is conserved between humans and sponges, and the GRB containing the Iroquois and Sowah genes is conserved across a wide range of bilaterians, apart from tetrapods. This correspondence suggests that the regulatory domains around developmental gene paralogues created by whole-genome duplications (WGDs) are ancestral and not the result of convergent evolution, and may have existed since the origin of Metazoa (Harmston, 2017).

The striking concordance between the distribution of CNEs and topological organisation has far-reaching implications for understanding the nature and evolution of long-range regulation at developmental loci. Combining the concepts of GRBs and TADs leads to a model of regulatory domains with stronger predictive and explanatory power than either concept alone. The GRB model provides a framework for long-range regulation, in which the majority of regulatory elements within a GRB are dedicated to the control of its target gene, with other genes not responding to long-range regulation despite being located close to regulatory elements. Target and bystander genes appear to exhibit distinct features that may help to explain this specificity. Topological organisation data provide more precise boundary estimates for these regions, including the ability to separate adjacent GRBs, and contribute information about the stability of the organisation of these domains and regulatory interactions within them across cells and tissue types. Therefore, a TAD enriched for extreme noncoding conservation is not representative of the regulatory domain of all of its constituent genes, but primarily corresponds to the regulatory domain of the target gene under long-range regulation (Harmston, 2017).

The lower density of active genes and increased likelihood of Polycomb repression at GRBs may explain some of their topological features. Since GRBs represent the regulatory domains of developmental genes, in any one tissue or developmental stage most GRBs will be inactive and marked by Polycomb/H3K27me3. The degree of intermixing of chromatin between genomic domains depends strongly on their epigenetic state, with Polycomb-repressed chromatin showing little or no spatial overlap with active or inactive chromatin. It has been suggested that regions of active chromatin within TADs interfere with the packaging of chromatin, disrupting TAD formation and leading to less compact TADs, or fragmentation into smaller TADs. The enrichment of active euchromatin at the boundaries of these regions may reflect the formation of barriers (Harmston, 2017).

The conservation and divergence of CTCF-binding sites is thought to play important roles in the evolution of regulatory domains. Constitutive CTCF sites are enriched at GRB and GRB-TAD boundaries, suggesting these regions are consistently insulated from neighbouring domains. The enrichment of CTCF at GRB and GRB-TAD boundaries, combined with the strength of interactions associated with GRBs, further suggests that these regions do not strongly interact with elements in adjacent domains. This is in addition to the association of GRBs with Polycomb-repressed chromatin and depletion of active euchromatin, which also promotes insulation from neighbouring regions, as described above. These features are reflective of the multiple mechanisms involved in insulating target genes from ectopic regulation by elements in neighbouring domains. It is highly probable that this type of restricted topological structure arose early in Metazoan evolution as a result of strong selective pressure to prevent such ectopic regulation (Harmston, 2017).

GRBs and their constituent CNEs are defined by sequence conservation alone, which is stable across all cell types, with TADs having been found to be largely invariant across cell lines and between species. Even though the current methods for estimating GRBs have limitations with respect to their precision and coverage, this study has shown that the distribution of CNEs can nevertheless serve as an excellent proxy for the extent of a functionally distinct subset of TADs in both humans and Drosophila. Therefore, clusters of CNEs identified between evolutionarily distant species could be used to infer regulatory domains and predict their topological organisation in species lacking Hi-C data. TADs are also organised into inactive and active chromatin compartments, and the assignment of TADs to compartments can vary across cell types. GRB-TADs are preferentially located within compartment B across all of the lineages investigated, although they are more likely to switch compartments than those TADs lacking extreme noncoding conservation. This is consistent with previous work showing that GRBs represent regions that encompass regulatory elements of specific developmental regulators, which are repressed during the majority of developmental stages and cell types and only expressed in a limited subset (Harmston, 2017).
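The use of CNE clusters as a proxy for regulatory domains can be sketched as a simple gap-based clustering of CNE coordinates. The function name, the `max_gap` and `min_count` parameters and the coordinates below are hypothetical, and this procedure is a simplified stand-in rather than the GRB estimation method of Harmston (2017).

```python
def cluster_cnes(positions, max_gap, min_count):
    """Group CNE coordinates into clusters and report each cluster's span.

    Consecutive CNEs closer than or equal to `max_gap` join the same cluster;
    clusters with fewer than `min_count` elements are discarded.
    Returns (start, end, n_cnes) tuples. Both parameters are illustrative.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_count:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_count:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Toy coordinates: two dense groups of CNEs and one isolated element.
toy_cnes = [100, 150, 220, 5000, 5050, 5100, 5130, 20000]
for span in cluster_cnes(toy_cnes, max_gap=500, min_count=3):
    print(span)
```

The resulting spans could then be compared with TAD boundaries where Hi-C data exist, or treated as provisional domain estimates where they do not.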

A remarkable property of TADs and GRBs is that they expand and shrink along with the entire genome, suggesting that the 3D organisation of regulatory loci is robust towards gain and loss of DNA between its constituent CNEs, even though insertion of repetitive elements is disfavoured. Previously, it was observed that GRBs are on average much more compact in Drosophila than in mammals, and that they also scale with genome size in fish genomes. TADs are likewise smaller in Drosophila, concordant with the current hypothesis. Duplicated loci containing developmental regulators are more likely to be retained after WGD. Therefore, since tetrapod lineages have undergone two WGDs, a larger number of developmental regulators and associated GRBs are expected compared to arthropods, which is confirmed by the current analysis (Harmston, 2017).

The expansion and shrinkage of TADs and the turnover of regulatory elements within them is likely to be an important mechanism in metazoan evolution and responsible for differences in organismal complexity. The disruption of TAD boundaries perturbs spatial organisation, enhancer-promoter interactions and the expression of target genes. Deletions within TADs can lead to changes in enhancer-promoter interactions, in some cases causing disease, which suggests some level of selective pressure against such re-arrangements. The depletion of retrotransposons within GRB-TADs suggests that their insertion within this type of regulatory domain is under intense negative selection. This pressure may be due to the ability for retrotransposons to create new cis-regulatory elements, potentially perturbing the organisation of interactions within a regulatory domain, leading to ectopic expression and resulting in a negative effect on fitness in the majority of cases. In future, analysis of the evolutionary dynamics of CNEs and genomic rearrangements within TADs in multiple species will help to provide insights into their evolutionary dynamics (Harmston, 2017).

Given that the reason for the extreme conservation of CNEs is unknown, that many CNEs function as enhancers, and that their distribution closely follows the span of the TADs around genes known to be involved in long-range developmental regulation, it is tempting to speculate that CNEs are somehow directly involved in the chromatin folding of TADs, precisely arranging promoters and enhancers in 3D space. Indeed, just like TADs, the spatial proximity of promoters and developmental enhancers seems to be stable across different cell types, regardless of the activity of either. Currently, there is no evidence from sequence analysis that CNEs are involved in sequence-mediated interactions, and their role in chromatin folding remains an open question (Harmston, 2017).

It is concluded that this subset of TADs, which are associated with high levels of noncoding conservation, are functionally distinct, evolutionarily ancient 3D structures which represent the regulatory domains of key genes involved in embryonic development and morphogenesis. However, the spatial correspondence of GRBs and TADs does not offer immediate suggestions for the origin of extreme noncoding conservation. Just like other potential sources of selective pressure acting on these elements, current models of genome folding do not include a mechanism that could account for this level of selective pressure on elements within TADs. The main findings of this paper may help with formulating new hypotheses by focusing on their potential roles within TADs (Harmston, 2017).

Highly conserved ENY2/Sus1 protein binds to Drosophila CTCF and is required for barrier activity

Chromatin insulators affect interactions between promoters and enhancers/silencers and function as barriers for the spreading of repressive chromatin. Drosophila insulator protein dCTCF marks active promoters and boundaries of many histone H3K27 trimethylation domains associated with repressed chromatin. In particular, dCTCF binds to such boundaries between the parasegment-specific regulatory domains of the Bithorax complex. This study demonstrates that the evolutionarily conserved protein ENY2 is recruited to the zinc-finger domain of dCTCF and is required for the barrier activity of dCTCF-dependent insulators in transgenic lines. Inactivation of ENY2 by RNAi in BG3 cells leads to the spreading of H3K27 trimethylation and Pc protein at several dCTCF boundaries. The results suggest that evolutionarily conserved ENY2 is responsible for barrier activity mediated by the dCTCF protein (Maksimenko, 2014).

EAST organizes Drosophila insulator proteins in the interchromosomal nuclear compartment and modulates CP190 binding to chromatin

Recent data suggest that insulators organize chromatin architecture in the nucleus. The best studied Drosophila insulator proteins, dCTCF (a homolog of the vertebrate insulator protein CTCF) and Su(Hw), are DNA-binding zinc finger proteins. Different isoforms of the BTB-containing protein Mod(mdg4) interact with Su(Hw) and dCTCF. The CP190 protein is a cofactor for the dCTCF and Su(Hw) insulators. CP190 is required for the functional activity of insulator proteins and is involved in the aggregation of the insulator proteins into specific structures named nuclear speckles. This study has shown that the nuclear distribution of CP190 is dependent on the level of EAST protein, an essential component of the interchromatin compartment. EAST interacts with CP190 and Mod(mdg4)-67.2 proteins in vitro and in vivo. Over-expression of EAST in S2 cells leads to extrusion of CP190 from the insulator bodies containing Su(Hw), Mod(mdg4)-67.2, and dCTCF. Consistent with the role of the insulator bodies in assembly of protein complexes, EAST over-expression led to a striking decrease in CP190 binding to the dCTCF- and Su(Hw)-dependent insulators and promoters. These results suggest that EAST is involved in the regulation of CP190 nuclear localization (Golovnin, 2015).

Insulators belong to the class of regulatory elements that organize the architecture of chromatin compartments. Insulators, or chromatin boundaries, are characterized by two properties: they interfere with enhancer-promoter interactions when located between them, and they buffer transgenes from chromosomal position effects. To date, chromatin insulators have been characterized in a variety of species, indicative of their involvement in the global regulation of gene expression (Golovnin, 2015).

The well-studied Drosophila insulator proteins, dCTCF (homolog of vertebrate insulator protein CTCF) and Su(Hw), are DNA-binding zinc finger proteins. The Su(Hw) protein, encoded by the suppressor of Hairy wing [su(Hw)] gene, was one of the first insulator proteins identified in Drosophila. The best-studied Drosophila insulator, found within the 5'-untranslated region of the gypsy retrovirus, consists of 12 directly repeated copies of Su(Hw) binding sites. Genetic and molecular approaches have led to the identification and characterization of three proteins recruited by Su(Hw) to chromatin, namely Mod(mdg4)-67.2, CP190, and E(y)2/Sus1, that are required for the activity of the Su(Hw)-dependent insulators. The mod(mdg4) gene, also known as E(var)3-93D, encodes a large set of BTB/POZ protein isoforms. One of these isoforms, Mod(mdg4)-67.2, interacts through its specific C-terminal domain with the enhancer-blocking domain of the Su(Hw) protein. The BTB domain is located at the N-terminus of Mod(mdg4)-67.2 and mediates homo-multimerization (Golovnin, 2015).

Su(Hw), dCTCF, and most of other identified insulator proteins interact with Centrosomal Protein 190 kD (CP190). This protein (1096 amino acids) contains an N-terminal BTB/POZ domain, an aspartic-acid-rich D-region, four C2H2 zinc finger motifs, and a C-terminal E-rich domain. The BTB domain of CP190 forms stable homodimers that may be involved in protein-protein interactions. In addition to these motifs, CP190 also contains a centrosomal targeting domain (M) responsible for its localization to centrosomes during mitosis. It has been shown that CP190 is recruited to chromatin via its interaction with the DNA insulator proteins in interphase nucleus (Golovnin, 2015).

The Su(Hw), dCTCF, Mod(mdg4)-67.2, and CP190 proteins colocalize in discrete foci, named insulator bodies, in the Drosophila interphase cell nucleus. Contradictory reports have been published in which the insulator bodies are described either as protein-based bodies in the interchromatin compartment or as chromatin domains. As shown recently, insulator proteins rapidly coalesce from diffusely distributed speckles into large punctate insulator bodies in response to osmotic stress (Golovnin, 2015).

Cell exposure to hypertonic treatment, which enhances molecular crowding, makes it possible to discriminate between nucleoplasmic bodies formed mainly of RNA and proteins (such as PML bodies) and chromatin compartments such as Polycomb bodies formed due to the interaction of distantly located chromatin regions bound by Polycomb proteins. Nucleoplasmic bodies disappear under less crowded conditions and reassemble under normally crowded conditions, which can be interpreted as a consequence of increased intermolecular interactions between components of nucleoplasmic bodies. Similar to PML bodies, insulator bodies are preserved under hypertonic treatment, in contrast to chromatin-based structures that disappear as proteins dissociate from chromatin. The CP190 protein is suggested to be critical for the activity of insulators and to regulate the entry of other insulator proteins into the speckles. At the same time, CP190 associates with centrosomes throughout the nuclear division cycle in syncytial Drosophila embryos. Nuclear localization of CP190 is also sensitive to various kinds of stress, suggesting that this process is highly regulated. However, the mechanisms and proteins responsible for localization of CP190 in different nucleus compartments are unknown. This study has shown that the nuclear distribution of CP190 depends on the level of EAST, which is located mainly in the interchromatin compartment of the nucleus. EAST is a nuclear protein of 2362 amino acids which, except for 9 potential nuclear localization sequences and 12 potential PEST sites, contains no previously characterized motifs or functional domains. Together with Skeletor, Chromator, and Megator proteins, EAST forms the spindle matrix during mitosis. In the interphase nuclei, EAST localizes to the extrachromosomal compartment of the nucleus and is essential for the spatial organization of chromosomes (Golovnin, 2015).

Although the bulk of interphase EAST resides in the interchromosomal domain, the current model assumes that EAST can transiently interact with chromosomes. EAST physically interacts with Megator, a 260-kDa protein with a large N-terminal coiled-coil domain capable of self-assembly. It has been speculated that Megator can form polymers that, together with EAST, may serve as a structural basis for the nuclear extrachromosomal compartment. The results show that EAST interacts with CP190 and Mod(mdg4)-67.2 proteins and modulates their aggregation into the nuclear speckles. Upon EAST overexpression, CP190 binding to chromatin is reduced; consequently, the binding of Mod(mdg4)-67.2 and Su(Hw) is reduced as well, since CP190 is essential for it. On the basis of these results, it is hypothesized that EAST regulates localization of CP190 and insulator protein complexes in the interchromatin compartment, with these complexes subsequently determining organization of chromatin insulators (Golovnin, 2015).

The results suggest that insulator bodies are sensitive to the concentration of EAST in interphase cells. The properties of insulator bodies described previously and in this study suggest that they are formed by multiple interactions between proteins and resemble nuclear bodies composed of aggregated proteins and RNAs. As shown previously, the CP190 and Mod(mdg4) proteins interact with Su(Hw) and dCTCF and help the latter to enter the insulator bodies (Golovnin, 2015).

Taking into account the high level of dCTCF and Mod(mdg4) co-binding to chromosomes, it appears that dCTCF interacts with an as yet unidentified Mod(mdg4) isoform. Mod(mdg4)-67.2 and CP190 conjugate to the small ubiquitin-like modifier protein (SUMO). Specific interactions mediated by SUMO, the ability of Mod(mdg4) BTB to form oligomers, and the interaction between the BTB domain of Mod(mdg4)-67.2 and CP190 contribute to specific aggregation of the Su(Hw)/Mod(mdg4)-67.2/CP190 and dCTCF/CP190 complexes into the insulator bodies (Golovnin, 2015).

According to current views, the Megator protein can form polymers that, together with EAST, may serve as a structural basis for the nuclear extrachromosomal compartment. The overexpression of EAST leads to an extension of the EAST-Megator compartment, with consequent reduction in the effective volume available for the insulator proteins in the cell. As a result, the concentration of the insulator proteins increases, contributing to stabilization of the compact protein conformations visualized as insulator bodies. By interacting with Mod(mdg4)-67.2 and CP190, EAST may also be directly involved in nucleation of insulator bodies. It is possible that the truncated version of EAST (from 933 to 2362 aa) can more easily interact with the insulator proteins, which leads to noticeable enlargement of insulator bodies in S2 cells expressing EAST933-2362. The overexpression of EAST leads to segregation of the CP190 protein into independent speckles. The results suggest that EAST interacts with the CP190 region that includes the BTB, D, and M domains. These domains are also required for CP190 interactions with other insulator proteins (Golovnin et al., in preparation). Thus, an increase in the EAST concentration may lead to displacement of the insulator proteins from the complex with CP190 (Golovnin, 2015).

The results do not exclude the possibility that EAST overexpression directly leads to dissociation of CP190 from chromatin. During mitosis, CP190 colocalizes with EAST in the spindle matrix, and the increase in the amount of EAST may well be responsible for dissociation of CP190 prior to chromosome condensation (Golovnin, 2015).

According to the current model, the insulator bodies help to form protein complexes that subsequently bind to regulatory elements such as insulators and promoters. In view of this hypothesis, it is likely that disturbances in the insulator bodies caused by EAST overexpression are responsible for the decrease in CP190 binding to the regulatory regions such as dCTCF- and Su(Hw)-dependent insulators and promoters. As shown recently, CP190 is required for recruiting Su(Hw) and Mod(mdg4)-67.2, but not dCTCF, to chromatin. Accordingly, it was observed that EAST overexpression affects the chromosomal binding of Su(Hw), but not of dCTCF. CP190 specifically interacts with the Mod(mdg4)-67.2 isoform, and Mod(mdg4)-67.2 at all Su(Hw) binding sites is colocalized with CP190. Thus, CP190 may be essential for recruiting the specific Mod(mdg4)-67.2 isoform to the Su(Hw) binding sites, and a decrease in the amount of CP190 at these sites leads to the substitution of Mod(mdg4)-67.2 by other Mod(mdg4) isoforms, as has been observed in this study (Golovnin, 2015).

Strong inactivation of EAST in S2 cells reduces the entry of the Mod(mdg4)-67.2/Su(Hw) complex, but not of CP190, into the nucleus. It appears that EAST is involved in the regulation of nuclear localization of Mod(mdg4)-67.2, whose BTB domain can form multimeric complexes. Further study is required to elucidate this issue (Golovnin, 2015).

H3K27 modifications define segmental regulatory domains in the Drosophila bithorax complex

The bithorax complex (BX-C) in Drosophila melanogaster is a cluster of homeotic genes that determine body segment identity. Expression of these genes is governed by cis-regulatory domains, one for each parasegment. Stable repression of these domains depends on Polycomb Group (PcG) functions, which include trimethylation of lysine 27 of histone H3 (H3K27me3). To search for parasegment-specific signatures that reflect PcG function, chromatin from single parasegments was isolated and profiled. The H3K27me3 profiles across the BX-C in successive parasegments showed a 'stairstep' pattern that revealed sharp boundaries of the BX-C regulatory domains. Acetylated H3K27 was broadly enriched across active domains, in a pattern complementary to H3K27me3. The CCCTC-binding protein (CTCF) bound the borders between H3K27 modification domains; it was retained even in parasegments where adjacent domains lack H3K27me3. These findings provide a molecular definition of the homeotic domains, and implicate precisely positioned H3K27 modifications as a central determinant of segment identity (Bowman, 2014).

The Polycomb Group repression system is often described as a cellular memory mechanism, which can impose lifelong silencing of a gene in response to a transitory signal. That view seems valid, but the concept of a PcG regulatory domain is much richer. In the PS6 domain of the BX-C, for example, there are many enhancers to drive Ubx expression in specific cells at specific developmental times, all of which are blocked in parasegments 1 through 5, but active in parasegments 6 through 12. Individual enhancers need not include a segmental address that is specified, for example, by gap and pair-rule DNA-binding factors; their function is segmentally restricted by the domain architecture. Indeed, these enhancers will drive expression in a different parasegment when inserted into a different domain (as in the Cbx transposition). Each domain has a distinctive collection of enhancers; the UBX pattern in PS5 is quite different from that in PS6. Thus, there are two developmental programs for Ubx, one in each of these parasegments, without the need for a duplication of the Ubx gene. Other loci with broad regions of H3K27 methylation may likewise be parsed into multiple domains, once histone marks are examined in specific cell types (Bowman, 2014).

The all-or-nothing H3K27me3 coverage of the BX-C parasegmental domains validates and refines the domain model. In particular, K27me3 is uniformly removed across the PS5 and PS7 domains in PS5 and PS7, even though the activated genes in those parasegments (Ubx and abd-A, respectively) are only transcribed in a subset of cells. It is interesting that both PRC1 and PRC2 components have binding patterns that do not fully reflect function (repression and K27 methylation, respectively), indicating the possibility that function of these complexes is regulated separately from binding. The challenges now are to understand how PcG regulated domains are established, differently in different parasegments, and to describe the molecular mechanisms, including changes in chromosome structure, that block gene activity in H3K27 trimethylated domains (Bowman, 2014).

A variably occupied CTCF binding site in the Ultrabithorax gene in the Drosophila Bithorax Complex

Although the majority of genomic binding sites for the insulator protein CTCF are constitutively occupied, a subset show variable occupancy. Such variable sites provide an opportunity to assess context-specific CTCF functions in gene regulation. This study has identified a variably occupied CTCF site in the Drosophila Ultrabithorax (Ubx) gene. This site is occupied in tissues where Ubx is active (third thoracic leg imaginal disc) but is not bound in tissues where the Ubx gene is repressed (first thoracic leg imaginal disc). Using chromatin conformation capture, this site was shown to preferentially interact with the Ubx promoter region in the active state. The site lies close to Ubx enhancer elements and is also close to the locations of several gypsy transposon insertions that disrupt Ubx expression, leading to the bx mutant phenotype. Gypsy insertions carry the Su(Hw)-dependent gypsy insulator and were found to affect both CTCF binding at the variable site and the chromatin topology. This suggests that insertion of the gypsy insulator in this region interferes with CTCF function and supports a model for the normal function of the variable CTCF site as a chromatin loop facilitator, promoting interaction between Ubx enhancers and the Ubx transcription start site (Magbanua, 2014).

CTCF-dependent co-localization of canonical Smad signaling factors at architectural protein binding sites in D. melanogaster

The transforming growth factor beta (TGF-beta) and bone morphogenic protein (BMP) pathways transduce extracellular signals into tissue-specific transcriptional responses. During this process, signaling effector Smad proteins translocate into the nucleus to direct changes in transcription, but how and where they localize to DNA remain important questions. This study mapped Drosophila TGF-beta signaling factors Mad, dSmad2, Medea and Schnurri genome-wide in Kc cells and found that numerous sites for these factors overlap with the architectural protein CTCF. Depletion of CTCF by RNAi results in the disappearance of a subset of Smad sites, suggesting Smad proteins localize to CTCF binding sites in a CTCF-dependent manner. Sensitive Smad binding sites are enriched at low occupancy CTCF peaks within topological domains, rather than at the physical domain boundaries where CTCF may function as an insulator. In response to Decapentaplegic, CTCF binding is not significantly altered, whereas Mad, Medea, and Schnurri are redirected from CTCF to non-CTCF binding sites. These results suggest that CTCF participates in the recruitment of Smad proteins to a subset of genomic sites and in the redistribution of these proteins in response to BMP signaling (Van Bortle, 2015).
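
The logic of identifying CTCF-dependent Smad sites can be illustrated with a minimal sketch. This is not the pipeline used in the study: the peak coordinates are invented toy values, and the names `overlaps` and `ctcf_dependent_sites` are hypothetical. The sketch assumes that a Smad site counts as CTCF-dependent when it overlaps a CTCF peak in control cells and no overlapping Smad peak is called after CTCF knockdown.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end) with a half-open interval [start, end).
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True if two peaks lie on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def ctcf_dependent_sites(
    smad_control: List[Peak],
    smad_knockdown: List[Peak],
    ctcf_control: List[Peak],
) -> List[Peak]:
    """Smad sites at CTCF peaks that are lost after CTCF knockdown.

    Assumption for illustration: 'lost' means no overlapping Smad peak
    is called in the knockdown sample.
    """
    dependent = []
    for site in smad_control:
        at_ctcf = any(overlaps(site, c) for c in ctcf_control)
        retained = any(overlaps(site, k) for k in smad_knockdown)
        if at_ctcf and not retained:
            dependent.append(site)
    return dependent


# Toy data (invented coordinates, not from the study).
ctcf = [("3L", 100, 200), ("3L", 1000, 1100)]
smad_ctrl = [("3L", 150, 250), ("3L", 1050, 1150), ("3L", 5000, 5100)]
smad_kd = [("3L", 1040, 1140), ("3L", 5000, 5100)]

print(ctcf_dependent_sites(smad_ctrl, smad_kd, ctcf))
```

In this toy example only the first Smad site is reported: the second overlaps a CTCF peak but is retained after knockdown, and the third does not overlap CTCF at all. Real analyses would also need to account for peak strength and replicate variability, which this sketch ignores.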

TGF-β effector proteins have been shown to co-localize with mammalian CTCF in a CTCF-dependent manner at just two individual loci. This observation has been extended to Drosophila using a genome-wide approach, providing evidence that architectural protein CTCF and canonical Smad signaling proteins, both highly conserved from fly to humans, co-localize on a global scale. Context-specific features were uncovered in which Smad localization is dependent or independent of CTCF binding. Interestingly, genome-wide analysis identifies Mad, dSmad2, Medea, and Schnurri binding to previously characterized response elements even in the absence of DPP ligand, in which levels of phosphorylated Mad are undetectable. This signal-independent clustering of signaling proteins suggests that the genomic TGF-β signaling response is not as simple as regulating binary 'off vs. on' states, dependent on phosphorylated Mad. However, attempts to map the genomic landscape of phosphorylated-Mad before and after DPP stimulation were unsuccessful, likely due to issues with currently available p-Mad antibodies. Though it was not possible to determine the role of phosphorylation as a determinant in Mad localization, it is conceivable that phosphorylation of Mad might play a role in regulating the residence time of DNA-binding, the recruitment of additional regulatory partners, or the ability to establish functional long-range interactions (Van Bortle, 2015).

Smad co-binding at dCTCF sites is sensitive to dCTCF depletion at low occupancy dCTCF target sequences for which Smad consensus sequences are depleted, whereas high occupancy dCTCF binding sites co-bound by additional architectural proteins remain unaffected. The dCTCF-independent recruitment of Smads to high occupancy APBSs suggests that additional architectural proteins may redundantly recruit Smads, or simply provide an accessible chromatin landscape to which Mad, Medea, and dSmad2 can associate. Nevertheless, dCTCF-dependent localization of Smad proteins to specific low occupancy elements is consistent with the CTCF-dependent nature of Smad binding at both the APP and H19 promoters in humans. It is speculated that dCTCF-dependent Smad localization to low occupancy APBSs within topological domains may represent regulatory elements involved in enhancer-promoter interactions, whereas dCTCF-independent high occupancy APBSs are involved in establishing higher-order chromosome organization. What role Smads might play in establishing or maintaining such long-range interactions relevant to chromosome architecture, or whether Smads and other transcription factors simply localize to high occupancy APBSs due to chromatin accessibility, remains difficult to address. However, it has been recently shown that high occupancy APBSs are distinct from analogous transcription factor hotspots, suggesting some level of specificity, most likely governed by protein-protein interactions, decides which factors can associate and where. Alternatively, the enrichment of ChIP-seq signal at high occupancy APBSs may, to some degree, reflect indirect association via long-range interactions with regulatory elements directly bound by Smad proteins. This possibility raises a potential explanation for why Smad ChIP signal is independent of dCTCF binding at high occupancy APBSs (Van Bortle, 2015).

Surprisingly, DPP-activated phosphorylation of Mad does not lead to significant changes in dCTCF binding, whereas Mad, Medea, and Schnurri levels increase at regulatory elements away from dCTCF. These results suggest that TGF-β signaling in Kc167 cells redirects Smad binding to genomic loci independent of architectural proteins, and that architectural proteins may facilitate binding of nuclear Smad proteins in the absence of signaling. The complete loss of Smad ChIP signal at numerous dCTCF binding sites enriched for the core dCTCF consensus sequence nevertheless provides compelling evidence that recruitment of Smad proteins is directly governed by Drosophila CTCF at a subset of binding sites. These results establish CTCF as an important determinant of Smad localization and, depending on the cell-type specific binding patterns of CTCF, suggest that CTCF might also influence the tissue-specific localization of Smad proteins analogous to master regulatory transcription factors in multi-potent stem cells (Van Bortle, 2015).

Architectural protein Pita cooperates with dCTCF in organization of functional boundaries in Bithorax Complex

Boundaries in the Bithorax Complex (BX-C) of Drosophila delimit autonomous regulatory domains that drive parasegment-specific expression of homeotic genes. BX-C boundaries have two critical functions: they must block crosstalk between adjacent regulatory domains, and at the same time facilitate boundary bypass. The C2H2 zinc finger Pita protein binds to several BX-C boundaries including Fab-7 and Mcp. To study Pita functions, a boundary replacement strategy was used by substituting modified DNAs for the Fab-7 boundary, which is located between the iab-6 and iab-7 regulatory domains. Multimerized Pita sites block iab-6<-->iab-7 crosstalk but fail to support iab-6 regulation of Abd-B (bypass). In the case of Fab-7 a novel sensitized background was used to show that the two Pita sites contribute to its boundary function. Although Mcp is from BX-C, it does not function appropriately when substituted for Fab-7; it blocks crosstalk but does not support bypass. Mutation of the Mcp Pita site disrupts blocking activity and also eliminates dCTCF binding. In contrast, mutation of the Mcp dCTCF site does not affect Pita binding, and this mutant boundary retains partial function (Kyrchanova, 2017).

Previous studies on the Pita (also known as Spotted Dick) protein suggested that it is a transcriptional activator and showed that the replication defects in pita mutants and in RNAi knockdowns were due to a reduction in the expression of the replication origin protein Orc4. The experiments presented in this study, together with previous studies (Maksimenko, 2015), indicate that pita has an additional, if not an entirely different, function, which is chromosome architecture. This paper details the evidence in favor of this conclusion, and also discusses the implications of the findings for boundary function in the context of BX-C (Kyrchanova, 2017).

Boundary replacement experiments provide compelling evidence that the zinc-finger protein Pita functions just like other insulator/architectural proteins. When placed in the context of Fab-7, multimerized Pita-binding sites block crosstalk between iab-6 and iab-7, but are not permissive for the regulatory interactions between iab-6 and the Abd-B gene. In this respect, the functioning of the multimerized Pita-binding sites is similar to that observed when multimerized sites for 'canonical' boundary factors, dCTCF and Su(Hw), are substituted for the Fab-7 boundary. In the context of Fab-7, they also block crosstalk between iab-6 and iab-7, but do not support bypass (Kyrchanova, 2017).

The boundary functions of the Pita protein are also supported by experiments testing its activity in a native context. For Fab-7, there are two Pita-binding sites in the HS2 hypersensitive region. Since previous studies have shown that HS1 is sufficient for full boundary activity, when the iab-7 PRE (HS3) is present, it is clear that Pita function is redundant. This was confirmed by introducing mutations in the two Pita-binding sites in a Fab-7 boundary, HS1+2+3, that lacks the '*' nuclease-hypersensitive site, but contains the iab-7 PRE. However, a different result was obtained in the context of a sensitized replacement, HS1+2, in which the iab-7 PRE (HS3) is deleted. In this sensitized background, the two Pita-binding sites in HS2 are essential for boundary activity (Kyrchanova, 2017).

Interestingly, the sensitized HS1+2 Fab-7 replacement has unprecedented properties. Unlike previously described Fab-7 mutations, which are dominant, the boundary defects of HS1+2 can be fully complemented by a wild-type boundary in trans. Additionally, as a homozygote, it has differential effects on the specification of dorsal and ventral tissues. The A6 (PS11) sternite is missing in HS1+2 males. This gain-of-function transformation indicates that boundary activity is disrupted in the cells that give rise to this ventral cuticular structure. By contrast, the A6 tergite is not only nearly normal in size, but is also properly specified. This finding means that boundary activity is largely retained in the PS11 cells that give rise to dorsal cuticle structures (Kyrchanova, 2017).

It is also worth noting that HS1+2 is very different from mutations that delete the iab-7 PRE (HS3) but retain the entire Fab-7 boundary. First, the vast majority of homozygous iab-7 PRE (HS3) deletion males are indistinguishable from wild type, arguing that the HS3 deletion retains full boundary function. Second, in a few of the males (~2.5%), small sections of the dorsal A6 tergite are missing. This phenotype is most readily explained by a loss of PRE silencing, and consequent gain-of-function transformation, in a subset of the cells that give rise to the dorsal cuticle. As the HS1+2 replacement differs from all of the iab-7 PRE (HS3) deletions isolated previously in that it lacks 'HS*', it would appear that this part of the boundary contains binding motifs for factors that are important for boundary function specifically in ventral tissues (Kyrchanova, 2017).

This would not be the only Fab-7 boundary factor that has 'developmentally' restricted activity. The two large complexes known to be important for Fab-7 HS1 boundary function, Elba and LBC, are active at different stages of development; the former in early embryos and the latter from mid-embryogenesis onwards. The fact that there is likely to be yet another boundary factor whose activity is developmentally restricted fits with the idea that boundary function in flies can be subject to stage- and/or tissue-specific regulation (Magbanua, 2015; Kyrchanova, 2017 and references therein).

One of the paradoxes posed by the BX-C boundaries is that six of the nine regulatory domains in the complex are separated from their homeotic target genes by one or more boundaries. Consequently, these boundaries must, on the one hand, block regulatory interactions between adjacent domains and, on the other, facilitate boundary bypass. One of the models to explain these two contradictory activities is that BX-C boundaries have unique properties, i.e. they are designed to block interactions between enhancers/silencers, but not between enhancers/silencers and promoters. This model gained currency from replacement experiments, which showed that the BX-C boundary Fab-8 can substitute for Fab-7, while two heterologous boundaries cannot. A prediction of the model is that other BX-C boundaries could also substitute for Fab-7. However, contrary to this prediction, the current experiments indicate that the Mcp340 boundary behaves like the heterologous fly boundaries: it blocks both crosstalk and bypass (Kyrchanova, 2017).

Analysis of the effects of mutations in the Pita and dCTCF sites of the Mcp340 boundary suggests that there is a complicated relationship between blocking crosstalk and blocking or enabling bypass. Although mutations in the Pita and dCTCF sites disrupt the functioning of the Mcp340 replacement, the actual consequences of each mutation are quite distinct. In the case of the Pita mutation, loss of Pita binding was found to lead to a substantial reduction in the binding of dCTCF. This means that for this particular boundary, Pita association is required to recruit dCTCF. The requirement is not, however, reciprocal: deleting the Mcp dCTCF site has no effect on Pita association (Kyrchanova, 2017).

Correlated with the differential effects on protein binding, the M340ΔPita and M340ΔCTCF mutants have quite different phenotypes. The phenotype (mixed gain and loss of function) of the former resembles a classic Fab-7 boundary deletion in which the iab-7 PRE (HS3) is retained. In contrast, the phenotype of the latter is a mixture of gain and loss of function, together with cuticle that has morphological features identical to that in A6 (PS11) of wild-type flies. The presence of cuticle that has the proper PS11 identity argues that M340ΔCTCF retains residual boundary function that, in a subset of cells, is sufficient to not only block crosstalk between iab-6 and iab-7, but is also able to facilitate iab-6 bypass (Kyrchanova, 2017).

A simple interpretation of this finding is that the Pita protein differs from dCTCF in that it blocks crosstalk but can facilitate bypass. However, this simple model is not supported by other findings. First, as noted above, just like multimerized dCTCF sites, multimerized Pita sites block both crosstalk and bypass when substituted for Fab-7. Second, the two Pita sites in Fab-7 are not in themselves sufficient for blocking crosstalk. Third, the Fab-8 boundary, which has two dCTCF sites, but no sites for Pita, has both blocking and bypass activity when substituted for Fab-7. Moreover, these dCTCF sites appear to contribute to the bypass activity of the Fab-8 replacement. Thus, a more likely hypothesis is that there are other, as yet unidentified, factors that are bound to Mcp340 and contribute to the blocking and (newly acquired) bypass activities of the M340ΔCTCF mutant, in addition to the Pita protein. Taken together with the finding that reversing Fab-8 eliminates bypass activity (Kyrchanova, 2016), the current experiments with Mcp suggest that there may not be a common mechanism for generating both blocking and bypass activity. Rather, each BX-C boundary would appear to deploy distinct mechanisms that are adapted for their specific context within the complex (Kyrchanova, 2017).

Based on RNAi knockdown experiments in S2 cells, it has been suggested that Pita is a transcriptional activator and that it could play a crucial role in coordinating S phase progression. This idea was supported by experiments showing that the replication defects induced by Pita depletion are caused by a reduction in Orc4 expression. However, only 32 genes are downregulated (and 10 upregulated) after Pita RNAi, and most appear to have nothing to do with replication. Furthermore, as there are several thousand Pita sites in the genome, the number of affected genes is surprisingly low. In this light, an obvious question is whether blocking, instead of transcriptional activation, might account for the effects of Pita depletion on Orc4 transcription. Although this study did not investigate how Pita functions in S2 cells, there are reasons to think that this is a distinct possibility. ChIP experiments have shown that Pita binds to a region upstream of the Orc4 gene in S2 cells. modENCODE ChIP experiments indicate that there is a large PcG silenced domain just beyond this Pita site. Thus, an alternative possibility is that Orc4 expression is reduced when Pita is depleted, because the gene is silenced by PcG spreading. Several of the other Pita transcriptional targets are also close to PcG domains, and could be silenced in a similar manner (Kyrchanova, 2017).

The Drosophila homolog of the mammalian imprint regulator, CTCF, maintains the maternal genomic imprint in Drosophila melanogaster

CTCF is a versatile zinc finger DNA-binding protein that functions as a highly conserved epigenetic transcriptional regulator. CTCF is known to act as a chromosomal insulator, bind promoter regions, and facilitate long-range chromatin interactions. In mammals, CTCF is active in the regulatory regions of some genes that exhibit genomic imprinting, acting as an insulator on only one parental allele to facilitate parent-specific expression. In Drosophila, CTCF acts as a chromatin insulator and is thought to be actively involved in the global organization of the genome. To determine whether CTCF regulates imprinting in Drosophila, CTCF mutant alleles were generated, and gene expression was assayed from the imprinted Dp(1;f)LJ9 mini-X chromosome in the presence of reduced CTCF expression. Disruption of the maternal imprint was observed when CTCF levels were reduced, but no effect was observed on the paternal imprint. The effect was restricted to maintenance of the imprint and was specific for the Dp(1;f)LJ9 mini-X chromosome. It is concluded that CTCF in Drosophila functions in maintaining parent-specific expression from an imprinted domain as it does in mammals. It is proposed that Drosophila CTCF maintains an insulator boundary on the maternal X chromosome, shielding genes from the imprint-induced silencing that occurs on the paternally inherited X chromosome (MacDonald, 2010).

The effect of dCTCF on maternal-specific expression is limited to the maintenance of the imprint. The presence of mutant dCTCF in either the maternal or paternal parents, when the imprint is being established, does not affect the imprint in the progeny. These results are strikingly similar to the role of CTCF in mammalian imprinting, where CTCF assists in the postfertilization formation of an imprinted region, but is dispensable for the establishment of an imprint (MacDonald, 2010).

Furthermore, the requirement for dCTCF for maintenance of the maternal Dp(1;f)LJ9 imprint is specific and does not represent a ubiquitous role for dCTCF in regulating heterochromatic silencing. Not only is the paternal Dp(1;f)LJ9 imprint unaffected by mutant dCTCF, but other variegating Drosophila reporter genes respond differently to mutant dCTCF. Thus, the association of dCTCF expression with the maintenance of the maternal Dp(1;f)LJ9 imprint boundary demonstrates a distinct function for dCTCF in imprinted gene expression (MacDonald, 2010).

In mammals, maternally imprinted regions that bind CTCF rely critically on this binding to insulate the imprinted loci and establish distinct chromatin domains. The results show that a reduction in dCTCF levels disrupts the maternal imprint boundary on the Drosophila Dp(1;f)LJ9 mini-X chromosome, and consequently the marker gene, garnet, is silenced. Variegated silencing of garnet from Dp(1;f)LJ9PAT inheritance is a consequence of heterochromatin formation, nucleated from the paternal imprint control region, spreading in cis. The absence of an effect upon the introduction of dCTCF mutant alleles to Dp(1;f)LJ9PAT suggests that dCTCF binding and boundary function occur only on the maternal chromosome. Thus, it is conceivable that a reduction in dCTCF levels enables the spreading of heterochromatin on the maternal Dp(1;f)LJ9MAT in a manner similar to that of the paternal Dp(1;f)LJ9PAT. This would suggest that dCTCF defines the boundary of a distinct maternal-specific imprinted chromatin domain required to maintain maternal-specific gene expression on the X chromosome (MacDonald, 2010).

The model organism Encyclopedia of DNA Elements (modENCODE) project provides detailed mapping of regulatory elements throughout the Drosophila genome. Large-scale profiling of dCTCF insulator sites from early embryo modENCODE data reveals several candidate dCTCF insulator sites present proximal to the predicted heterochromatic breakpoint of the Dp(1;f)LJ9 mini-X chromosome. These dCTCF insulator sites, located between the centric heterochromatic imprinting center and the imprint marker gene garnet, could account for the sensitivity of the maternal imprint to dCTCF expression. If dCTCF were bound only when the X chromosome was transmitted maternally, mutations to dCTCF would disrupt insulator function and lead to maternal silencing of the imprint marker gene. Although such binding remains to be tested, it is similar to the function of CTCF at mammalian imprinted regions (MacDonald, 2010).

That the structure of CTCF and its role as an insulator, barrier, and transcriptional regulator are conserved between mammals and insects has been well established. However, the finding that CTCF maintains its function in regulating the imprinting of diverse genes in such phylogenetically distinct organisms is remarkable. CTCF is a versatile DNA binding factor; subsets of its zinc fingers are adept at binding diverse DNA sequences, and the rest of the protein is able to maintain common regulator interactions and insulator function. This feature may explain how CTCF can regulate imprinting in organisms as diverse as insects and mammals, in which the imprinted target sequences are different (MacDonald, 2010).

Previously, the evolutionary origin of imprinting has been extrapolated from the conservation of imprinting among specific genes. Such studies have led to the proposal that mammalian imprinting is of relatively recent origin and restricted to eutherian mammals. However, studies showing that the molecular mechanism of imprinting is highly conserved have suggested a much more ancient origin. Mammalian imprint control elements inserted into transgenic Drosophila act as discrete silencing elements and can retain posttranscriptional silencing mechanisms involving noncoding RNA. Whereas these transgenic imprinting elements lose their parent-specific functions, the retention of epigenetic silencing mechanisms suggests an ancient and conserved origin of imprinting mechanisms. The finding that CTCF has a role in the maintenance of maternal imprints in insects, as it does in mammals, supports the possibility of evolutionary conservation for both CTCF function and the mechanisms of genomic imprinting (MacDonald, 2010).

Nature and function of insulator protein binding sites in the Drosophila genome

Chromatin insulator elements and associated proteins have been proposed to partition eukaryotic genomes into sets of independently regulated domains. This study tested this hypothesis by quantitative genome-wide analysis of insulator protein binding to Drosophila chromatin. Distinct combinatorial binding of insulator proteins to different classes of sites was found, and a novel type of insulator element was uncovered that binds CP190 but not any other known insulator proteins. Functional characterization of different classes of binding sites indicates that only a small fraction act as robust insulators in standard enhancer-blocking assays. Insulators restrict the spreading of the H3K27me3 mark but only at a small number of Polycomb target regions and only to prevent repressive histone methylation within adjacent genes that are already transcriptionally inactive. RNAi knockdown of insulator proteins in cultured cells does not lead to major alterations in genome expression. Taken together these observations argue against the concept of a genome partitioned by specialized boundary elements and suggest that insulators are reserved for specific regulation of selected genes (Schwartz, 2012).

The binding sites of insulator proteins are often taken to represent elements that partition the genome into independent regulatory domains and demarcate chromosomes into regions of 'active' and 'repressed' chromatin. The results presented in this study give little support to this view as a general principle of genome organization although it may be true in certain regions. Instead it is argued that: 1) insulator proteins bind to genomic sites in specific combinatorial patterns, 2) the properties of sites bound by key insulator proteins SU(HW) and CTCF are markedly different depending on whether the two co-bind with CP190, 3) many of the known insulator protein sites do not function as robust enhancer blockers, and 4) at least in cultured cells the depletion of insulator proteins has a limited impact on genome-wide gene expression (Schwartz, 2012).

Classifications of combinatorial binding of insulator proteins have been described previously. These classifications relied on the overlapping of bound regions defined according to arbitrary statistical thresholds and the position of these regions relative to TSSs. Because they did not take into account the relative strengths of binding, such classifications grouped together binding sites with very different biochemical and functional properties (Schwartz, 2012).

In contrast, this study defines the persistent co-binding patterns based on the strength of binding of the associated proteins, treating regions strongly bound by a combination of proteins differently from regions at which the same proteins are detected according to a statistical threshold but where the extent of their binding is disproportional. It is argued that this approach retains the information on biochemical interrelations between the co-bound proteins and separates the sites with different functional properties. The strongest support for this argument comes from RNAi knock-down experiments which demonstrate that the effect of the loss of one insulator protein on the binding of another insulator protein is constrained to a specific class of co-bound regions. For example, the knock-down of SU(HW) results in the loss of CP190 from class 3 (gypsy-like) sites but not from class 9 (CTCF+CP190) or class 5 (BEAF-32+CP190) sites (Schwartz, 2012).
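
The difference between threshold-based overlap and strength-aware co-binding can be illustrated with a minimal sketch. This is not the procedure used by Schwartz (2012): the function name, the enrichment units, the threshold, and the max_ratio cutoff are all hypothetical, chosen only to show how a site where two proteins both pass a threshold but bind disproportionately can be kept apart from a site strongly bound by both.

```python
def classify_cobinding(signals, threshold=1.0, max_ratio=3.0):
    """Classify one site from per-protein ChIP enrichment values.

    signals: dict mapping protein name -> enrichment (hypothetical units).
    threshold, max_ratio: illustrative cutoffs, not values from the study;
    threshold must be positive so the ratio below is well defined.
    """
    # Proteins that would count as "bound" under a plain threshold rule.
    bound = {p: s for p, s in signals.items() if s >= threshold}
    if not bound:
        return "unbound"
    if len(bound) == 1:
        return "single:" + next(iter(bound))
    strongest, weakest = max(bound.values()), min(bound.values())
    names = "+".join(sorted(bound))
    # Strength-aware step: require comparable binding, not just overlap.
    if strongest / weakest <= max_ratio:
        return "co-bound:" + names
    return "disproportionate:" + names
```

Under a pure overlap rule both of the last two outcomes would be grouped together; the ratio check is one simple way to keep them separate.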

The approach to select the sites representative of each co-binding class is conservative and inevitably excludes a fraction of binding sites from downstream analyses. For example, strong SU(HW) binding sites assigned to class 14 by initial overlap comparison were not analyzed further due to the uncertainty of their co-binding by CP190. It is therefore cautioned that the selection of representative binding sites is not a complete genomic catalogue, and readers are advised to use the ChIP-chip binding profiles, deposited to GEO and modMINE, to gauge whether their locus of interest has a strong insulator protein binding site (Schwartz, 2012).

The prevailing model in the field suggests that CP190 is recruited to different insulator elements by DNA binding proteins where it serves as a universal adapter that mediates interactions between different insulator elements. The current results present a more complex picture. First, RNAi knock-down experiments demonstrate that the binding of SU(HW) protein to class 3 (gypsy-like) sites is dependent on CP190, indicating that CP190 is not passively tethered to common sites by SU(HW) and instead plays an active role in recruitment and/or stabilization of the bound complex. Second, the sequence analysis of class 9 (CTCF+CP190) sites suggests that the binding of both proteins to these sites is likely due to the coincidence of cognate recognition sequences. Third, RNAi knock-down experiments indicate that BEAF-32 is dispensable for CP190 binding at shared sites. Clearly CP190 plays an active role in the selection of sites shared with SU(HW), CTCF or BEAF-32. It is still possible that once it co-binds, or binds sufficiently close to another insulator protein, it may mediate the trans-interactions of the bound sites. However, such interactions would have to be rather transient, at least in cultured cells, as they are not easily detected in the ChIP-chip data (Schwartz, 2012).

Three subclasses of a Drosophila insulator show distinct and cell type-specific genomic distributions

Insulators are protein-bound DNA elements that are thought to play a role in chromatin organization and the regulation of gene expression by mediating intra- and interchromosomal interactions. Suppressor of Hair-wing [Su(Hw)] and Drosophila CTCF (dCTCF) insulators are found at distinct loci throughout the Drosophila genome and function by recruiting an additional protein, Centrosomal Protein 190 (CP190). Chromatin immunoprecipitation (ChIP) and microarray analysis (ChIP-chip) experiments were performed with whole-genome tiling arrays to compare Su(Hw), dCTCF, boundary element-associated factor (BEAF), and CP190 localization on DNA in two different cell lines; evidence was found that BEAF is a third subclass of CP190-containing insulators. The DNA-binding proteins Su(Hw), dCTCF, and BEAF show unique distribution patterns with respect to the location and expression level of genes, suggesting diverse roles for these three subclasses of insulators in genome organization. Notably, cell line-specific localization sites for all three DNA-binding proteins as well as CP190 indicate multiple levels at which insulators can be regulated to affect gene expression. These findings suggest a model in which insulator subclasses may have distinct functions that together organize the genome in a cell type-specific manner, resulting in differential regulation of gene expression (Bushey, 2009).

Su(Hw), dCTCF, and BEAF have all been implicated in chromatin loop formation, and the interaction of these different DNA-binding proteins with CP190 could have functional implications for the arrangement of the chromatin fiber within the nucleus. The work presented in this study provides critical insight into the genome-wide distribution of these four insulator proteins and is a first, crucial step toward understanding the role that they play in chromatin organization and the regulation of gene expression (Bushey, 2009).

Although insulator elements containing Su(Hw), dCTCF, and BEAF could, in principle, play similar roles, it was found that they have very different distribution patterns with respect to gene location. Only 20% of Su(Hw) sites are located within 1 kb of the 5' or 3' ends of genes. In contrast, 47% of dCTCF sites and 84% of BEAF sites are found within 1 kb of gene ends, and their distributions are highly skewed toward the 5' end of highly expressed genes. dCTCF and BEAF appear to display further functional compartmentalization in their roles, since BEAF tends to be present at the 5' end of genes involved in metabolic processes and dCTCF is enriched near genes involved in developmental processes. This could indicate that BEAF plays a specific role in the regulation of gene units consisting of metabolic genes, whereas Su(Hw) may play a more general role by setting the foundation for chromatin organization. dCTCF, which shows an intermediate distribution compared with Su(Hw) and BEAF, may sometimes function in large-scale organization and sometimes work at the level of individual developmental gene units. Together, the three CP190 insulator subclasses could create a chromatin web that is part of the framework organizing DNA in the nucleus (Bushey, 2009).
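
The kind of distance-to-gene-end statistic quoted above can be sketched as follows. This is an illustrative calculation only, not the pipeline used by Bushey (2009); the function name and the input layout (site midpoints and gene start/end coordinates) are assumptions, and strand is ignored so that 5' and 3' ends are treated alike.

```python
def fraction_near_gene_ends(sites, genes, window=1000):
    """Return the fraction of sites within `window` bp of any gene end.

    sites: list of (chrom, position) tuples for binding-site midpoints.
    genes: list of (chrom, start, end) tuples; strand is ignored.
    """
    # Collect every gene boundary per chromosome.
    ends = {}
    for chrom, start, end in genes:
        ends.setdefault(chrom, []).extend([start, end])
    if not sites:
        return 0.0
    near = sum(
        1
        for chrom, pos in sites
        if any(abs(pos - e) <= window for e in ends.get(chrom, []))
    )
    return near / len(sites)
```

The linear scan over gene ends is adequate for a small example; a genome-scale analysis would normally sort the coordinates or use an interval library.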

Insulators have been typically characterized as sequences capable of regulating interactions between transcriptional regulatory sequences and/or chromatin states. This function can easily be envisioned in the case of Su(Hw) and dCTCF sites located far from genes, where these sites could flank a group of transcription units that would then represent a domain of coregulated genes. If this is the case, what is the function of the remaining dCTCF and BEAF sites located close to the 5' and 3' ends of genes? This distribution is surprising in the context of what is normally thought of as insulator function; however, when CTCF-binding sites were mapped in the human genome, a similar distribution pattern was observed (Kim, 2007; Cuddapah, 2009). This is suggestive of a wider role for insulator proteins than just the establishment of chromatin domains, and, in fact, alternative insulator protein functions have been suggested. For example, CTCF in humans is present in the Igh locus in many of the VH as well as DH and JH exons, suggesting a role in V(D)J recombination (Degner, 2009). Additionally, this study provides evidence that insulator proteins near genes play a role in the regulation of expression of specific genes and suggests that the mechanism behind this regulation differs from classic transcription factors, since the same insulator complexes were seen to both activate and repress transcription. These functions could be a consequence of the ability of these proteins to both interact with each other and mediate intra- and interchromosomal loops. Bringing together various insulator protein-binding sites could facilitate localization to either transcriptionally active or transcriptionally repressed regions of the nucleus depending on the genomic context of the sites (Bushey, 2009).

Comparison of the genome-wide distribution of the three insulator subclasses in two different cell lines has led to insights into possible mechanisms employed during cell differentiation to establish different patterns of gene expression. Overall, the analysis suggests that cells may control insulator function at multiple levels and that these forms of regulation occur throughout the genome. Regulation of insulator function seems to begin at the level of DNA binding, since differential binding was observed at 5%-37% of sites for Su(Hw), dCTCF, and BEAF between two different cell lines even with the most conservative statistical analysis. Similar percentages of cell type-specific binding sites were observed for vertebrate CTCF between different cell lines (Kim, 2007; Cuddapah, 2009). Previous analysis of Su(Hw) binding in various tissues has not revealed any significant tissue-specific binding sites, perhaps because only a small number of sites was analyzed in these studies. Alternatively, the discrepancy could be due to the use of whole tissues in previous studies that contain multiple cell types, making it difficult to detect cell type-specific sites (Bushey, 2009).

After Su(Hw), dCTCF, and BEAF bind DNA, they are thought to recruit other proteins such as CP190. Regulation at this level was observed throughout the genome, where a subset of the Su(Hw), dCTCF, and BEAF sites recruit CP190 in a cell type-specific manner. The additional Su(Hw), dCTCF, and BEAF sites that do not recruit CP190 in either Kc or Mbn2 cells may do so in other cell types or other growth conditions not tested in this study. This idea is supported by the two dCTCF sites in the bithorax region that were found to contain CP190 in third instar larvae brains (Mohan, 2007) but not in the data sets collected in Kc cells or Mbn2 cells. Although further study is needed to determine which sites of insulator protein localization participate in chromatin organization, it is expected that sites lacking CP190 do not, since mutations in CP190 are known to disrupt insulator body formation and only those sites that recruit CP190 seem to affect gene expression. Therefore, these sites may represent insulators that are poised for incorporation into chromatin loops upon recruitment of CP190. On the other hand, these sites could function through the recruitment of an alternative cofactor and in this way represent a functionally distinct subset of Su(Hw)-, dCTCF-, and BEAF-binding sites (Bushey, 2009).

An additional layer of regulation may then occur at the level of protein-protein interactions mediated by CP190. This type of regulation cannot be gleaned from ChIP-chip data, but other experiments have shown that sumoylation of insulator proteins is able to inhibit protein-protein interactions affecting Su(Hw) insulator body formation but not association of insulator proteins with DNA. Similarly, vertebrate CTCF insulator function has been linked to poly(ADP-ribosyl)ation (PARlation), and it has been suggested that PARlation facilitates CTCF self-interaction. Furthermore, the presence of RNA and RNA-binding proteins may also contribute to the formation or maintenance of insulator bodies required to create chromatin loops. Finally, insulator bypass, in which pairing of nearby insulator elements inactivates insulator activity, and specialized sequences such as the promoter targeting sequences (PTS) can allow an enhancer to bypass an insulator. These forms of regulation may alter the ability of insulator proteins to interact with one another to regulate insulator loop formation (Bushey, 2009).

It is expected that these various forms of regulation including DNA binding, CP190 recruitment, and loop formation result in changes in gene expression between different cell lines. However, transcription analysis with insulator proteins is difficult since insulator elements are thought to control regulatory elements such as enhancers and silencers that can be found far away from their target promoters. Therefore, determining which genes are controlled by an insulator site is not a trivial process. In the transcription analysis, genes were considered with a cell type-specific insulator site only within the gene or the 1 kb surrounding region. Despite this limitation, a significant enrichment was still seen for genes that change expression between cell types when they have a cell type-specific insulator site nearby, supporting the idea that insulator proteins are involved in the regulation of gene expression. Genes that did not change expression despite being located near a cell type-specific insulator protein-binding site may not be the actual target genes of the insulator sites; therefore, this analysis probably greatly underestimates the effect of insulator proteins on gene expression. Additionally, it was found that insulator protein-binding sites that localize to genes are enriched at genes with certain expression signals: high expression for dCTCF and BEAF, and low expression for Su(Hw). However, comparison between the two cell lines revealed that expression can be either positively or negatively regulated by sites with each DNA-binding protein. Therefore, although an insulator protein associates with a highly expressed gene, it may lead to either an increase or decrease in transcription of this gene. The observed level of expression may be an additive effect of many different regulatory elements, including multiple insulator sites. Different mechanisms may be used to regulate a highly transcribed gene versus a gene with low levels of transcription, and therefore the different insulator subclasses may target these different mechanisms (Bushey, 2009).
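
An enrichment of this kind, where genes with a nearby cell type-specific insulator site change expression more often than expected, can be assessed with a one-sided hypergeometric test. The sketch below is an assumption about how such a test might be set up; the source does not state which statistic was used, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, changed_genes, near_site, near_and_changed):
    """One-sided P(X >= near_and_changed) under the hypergeometric null.

    total_genes: all genes assayed.
    changed_genes: genes whose expression differs between cell types.
    near_site: genes with a cell type-specific site in the gene or within 1 kb.
    near_and_changed: genes belonging to both of the previous two groups.
    """
    upper = min(changed_genes, near_site)
    denom = comb(total_genes, near_site)
    # Sum the probability of observing the given overlap or a larger one.
    tail = sum(
        comb(changed_genes, k) * comb(total_genes - changed_genes, near_site - k)
        for k in range(near_and_changed, upper + 1)
    )
    return tail / denom
```

A small p-value would indicate that the overlap between the two gene sets is larger than expected by chance; as the text notes, restricting the analysis to sites within 1 kb of a gene probably underestimates the true effect.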

The transcription analysis in this study suggests that insulator proteins play a role in the regulation of gene expression, but has just begun to explore the depth of their effect. Numerous steps at which insulator activation can be subject to regulation allow for a vast amount of variation between different cell types and could play a major role in establishing the diverse patterns of chromatin organization necessary for cell type-specific gene expression. The different CP190 insulator subclasses might have distinct roles in this cell type-specific nuclear organization. In vertebrates, CTCF is the only insulator known thus far, and an important question to address in the future is the apparent disparity between genome complexity and insulator diversification between flies and vertebrates. It is possible that vertebrates have insulator subclasses represented by DNA-binding proteins other than CTCF that have not yet been identified. Alternatively, it is possible that vertebrate CTCF has acquired all the functions of the various Drosophila insulator subclasses. The distribution pattern of dCTCF suggests that it can play a role in both global organization and in the regulation of individual genes, making it the most likely candidate of the three Drosophila subclasses to play this overarching organizational role in vertebrates. Therefore, vertebrates may use methods other than variant DNA-binding proteins to distinguish insulator subclasses, such as recruitment of different CTCF interaction partners at different insulator sites. This would make it difficult to distinguish between the various layers of insulator control in the vertebrate genome. If this is the case, Drosophila could provide a powerful model system to dissect the various functions and levels of regulation of chromatin insulators (Bushey, 2009).

A comprehensive map of insulator elements for the Drosophila Genome

Insulators are DNA sequences that control the interactions among genomic regulatory elements and act as chromatin boundaries. A thorough understanding of their location and function is necessary to address the complexities of metazoan gene regulation. The genome-wide binding sites of 6 insulator-associated proteins (dCTCF, CP190, BEAF-32, Su(Hw), Mod(mdg4), and GAF) were studied to obtain the first comprehensive map of insulator elements in Drosophila embryos. Over 14,000 putative insulators, including all classically defined insulators, were identified. Two major classes of insulators were defined by dCTCF/CP190/BEAF-32 and Su(Hw), respectively. Distributional analyses of insulators revealed that particular sub-classes of insulator elements are excluded between cis-regulatory elements and their target promoters; divide differentially expressed, alternative, and divergent promoters; act as chromatin boundaries; are associated with chromosomal breakpoints among species; and are embedded within active chromatin domains. Together, these results provide a map demarcating the boundaries of gene regulatory units and a framework for understanding insulator function during the development and evolution of Drosophila (Négre, 2010).

This study determined the embryonic binding profiles of six factors previously known to be associated with insulator function in Drosophila. The analysis of insulator binding site distributions and protein composition suggests that there exist 2 principal categories of insulator elements (Class I and Class II). In particular, it was shown that Class I insulators, identified by the binding of CTCF, CP190 or BEAF-32, segregate differentially expressed genes and delimit the boundaries of chromatin silencing, while they are depleted between known CRMs and their target genes. No evidence was found supporting a significant distinction between CP190/BEAF and CP190/CTCF or CTCF/BEAF. In contrast, the analyses suggest that BEAF-32, CP190, and CTCF are distributed and function quite similarly, while Su(Hw) appears distinct. The Class II insulators, bound by Su(Hw), are often exceptional in these analyses. It is noted that the analysis of genome-wide mapping data, expression data, and genome annotation provides an endogenous boundary assay that demonstrates that, while Su(Hw) has been described as an insulator before, it is not systematically associated with the boundaries of the gene units (Négre, 2010).

By helping to delimit the regulatory boundaries of genes, the Class I insulator map presented in this study will aid in the identification of transcription factor target genes and the construction of transcriptional regulatory networks. As an example of this concept, the distribution of known regulatory elements and insulators across the Antennapedia Complex (ANT-C) of homeotic genes is presented. This region quite strikingly demonstrates the potential utility of insulator binding data for cis-regulatory annotation. Across approximately 500 kb, cis-regulatory elements and their target promoters are found between insulator pairs. For example, a single insulator separates the lab and Edg84A genes, with their respective cis-regulatory elements narrowly partitioned on either side. The adjacent regulatory elements and promoters of zen and bcd are similarly insulator segregated (Négre, 2010).
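
The idea of using an insulator map to delimit regulatory units can be sketched in a few lines: sorted insulator positions split a chromosome into intervals, and a cis-regulatory module (CRM) is a plausible regulator of a promoter only if both fall in the same interval. The function names and the coordinates in the usage note are hypothetical and are not taken from the ANT-C data.

```python
from bisect import bisect_right


def domain_index(position, insulators):
    """Index of the insulator-delimited interval containing `position`.

    insulators: sorted list of insulator coordinates on one chromosome.
    A position equal to an insulator coordinate is assigned to the
    interval on its right.
    """
    return bisect_right(insulators, position)


def same_domain(crm_pos, promoter_pos, insulators):
    """True if no insulator lies between the CRM and the promoter."""
    return domain_index(crm_pos, insulators) == domain_index(promoter_pos, insulators)
```

For example, with illustrative insulators at 1000 and 5000, a CRM at 1200 and a promoter at 4800 share a domain, whereas a CRM at 800 is separated from that promoter by the insulator at 1000.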

Consistent with their observed regulatory boundary functions, Class I insulators are embedded within local regions of active chromatin and are frequently associated with syntenic breakpoints between species. Previous work has demonstrated that active promoters in yeast and Drosophila are associated with reduced nucleosome occupancy and low-salt soluble and high-salt insoluble chromatin. Therefore, surprisingly, dynamic chromatin is a shared feature between promoters and most classes of insulators. It is notable however that some studies have revealed functional similarities between insulators and promoters in transgenic assays. These results have been described as paradoxical, as insulators can negatively affect promoters by blocking communication between enhancers and promoters. One proposed model for insulator function is that they act as promoter 'decoys' by recruiting away factors necessary for transcriptional initiation. Alternatively, insulators and promoters might require common chromatin features to function by mechanisms that are still unknown. One potential interpretation is that the dynamic chromatin at insulators forms a flexible chromatin joint that would affect the probability of productive contact between separated regulatory elements. In this way, the similarity between promoters and insulators would be a consequence of their common requirement for dynamic chromatin, although with very different consequences. This model may explain why promoters are so frequently scored as insulators in the classical insulator assay, when an element is placed between an enhancer and a promoter (Négre, 2010).

Adaptive evolution and the birth of CTCF binding sites in the Drosophila genome

Changes in the physical interaction between cis-regulatory DNA sequences and proteins drive the evolution of gene expression. However, it has proven difficult to accurately quantify evolutionary rates of such binding change or to estimate the relative effects of selection and drift in shaping the binding evolution. This study examined the genome-wide binding of CTCF in four species of Drosophila separated by between approximately 2.5 and 25 million years. CTCF is a highly conserved protein known to be associated with insulator sequences in the genomes of human and Drosophila. Although the binding preference for CTCF is highly conserved, it was found that CTCF binding itself is highly evolutionarily dynamic and has adaptively evolved. Between species, binding divergence increased linearly with evolutionary distance, and CTCF binding profiles are diverging rapidly at the rate of 2.22% per million years (Myr). At least 89 new CTCF binding sites have originated in the Drosophila melanogaster genome since the most recent common ancestor with Drosophila simulans. Comparing these data to genome sequence data from 37 different strains of Drosophila melanogaster, signatures of selection were detected in both newly gained and evolutionarily conserved binding sites. Newly evolved CTCF binding sites show a significantly stronger signature for positive selection than older sites. Comparative gene expression profiling revealed that expression divergence of genes adjacent to CTCF binding sites is significantly associated with the gain and loss of CTCF binding. Further, the birth of new genes is associated with the birth of new CTCF binding sites. These data indicate that binding of Drosophila CTCF protein has evolved under natural selection, and CTCF binding evolution has shaped both the evolution of gene expression and genome evolution during the birth of new genes (Ni, 2012).

Ever since the importance of gene regulation for phenotypic variation has been proposed, evolution of cis-regulatory elements has been under intensive investigation with an emphasis on enhancers and transcription factor binding sites. Insulator elements are a special class of cis-elements implicated in many fundamental biological processes including transcriptional regulation. Despite their functional importance, the origin and evolution of insulator complexes remained largely uncharted. Only very recently was the first comparative ChIP-seq study on CTCF in mammalian species published. This study has presented a formal evolutionary genetic analysis of CTCF-related insulator elements in multiple Drosophila species (Ni, 2012).

CTCF binding was found to be highly evolutionarily dynamic, with about 70% of binding events diverged between D. melanogaster and D. pseudoobscura. This high level of evolutionary divergence is consistent with a recent mammalian study, in which the CTCF binding conservation between human and mouse was estimated to be around 30%. Whereas in mammalian species CTCF binding profiles are more conserved than those of tissue-specific transcription factors, in Drosophila species higher binding divergence of CTCF was observed compared to that of the developmental transcription factor Twist. In fact, the high degree of binding divergence observed for the liver-specific transcription factors CEBPA and HNF4A has led to a proposal of neutral drift underlying binding evolution. However, the population genetic analysis of binding divergence of both the Twist data and the CTCF data indicates that both purifying and positive selection are active forces in CTCF binding evolution. Although previous studies on Drosophila noncoding DNA and DNA foot-printing-derived TFBS sequences have suggested the role of positive selection, this study presents the first genome-wide evidence in support of positive selection using protein-binding-associated DNA mapped in vivo (Ni, 2012).

The observation that young binding sites exhibit a signature of positive selection mimics the pattern observed with young genes, indicating that the origination of new binding sites is driven by positive selection. Further, the association between CTCF binding divergence and gene expression divergence indicates that change in CTCF binding has functional consequences. The fact that CTCF binding origination in multiple species coincided with new gene appearance also reinforces this functional view of binding change. The binding changes of this insulator protein may well result in regulatory rewiring through structurally redefining regulatory domains. It is predicted that this might be a universal mechanism in cis-regulatory evolution since CTCF protein is highly conserved across the metazoans. Indeed, in mammalian species, lineage-specific CTCF binding sites are observed to demarcate both chromatin and gene expression domains. Consistent also with the functional relevance of evolutionary changes in CTCF binding profiles, it was observed that old and conserved CTCF binding sites are subject to stronger purifying selection and that expression levels of genes near these conserved sites are less likely to diverge. Together these observations indicate that functional constraints maintain conserved binding. This meshes well with the study on Twist (He, 2011), in which it was found that the most developmentally important genes in early embryo development have the most conserved Twist binding. In summary, this study has provided evidence that the evolution of CTCF binding in Drosophila species is adaptive (Ni, 2012).

Analysis of chromatin boundary activity in Drosophila cells

Chromatin boundaries, also known as insulators, regulate gene activity by organizing active and repressive chromatin domains and modulate enhancer-promoter interactions. However, the mechanisms of boundary action are poorly understood, in part due to limited knowledge about insulator proteins, and a shortage of standard assays by which diverse boundaries could be compared. This paper reports the development of an enhancer-blocking assay for studying insulator activity in Drosophila cultured cells. The activities of diverse Drosophila insulators including suHw, SF1, SF1b, Fab7 and Fab8 are shown to be supported in these cells. It was further shown that double stranded RNA (dsRNA)-mediated knockdown of SuHw and dCTCF factors disrupts the enhancer-blocking function of suHw and Fab8, respectively, thereby establishing the effectiveness of using RNA interference in this cell-based assay for probing insulator function. It is concluded that the novel boundary assay provides a quantitative and efficient method for analyzing insulator mechanism and can be further exploited in genome-wide RNAi screens for insulator components. It provides a useful tool that complements the transgenic and genetic approaches for studying this important class of regulatory elements (Li, 2009).

Despite their diverse genomic origins and distinct cis- and trans- components, the Drosophila suHw, SF1, Fab7 and Fab8 elements function as potent enhancer-blockers in the Drosophila cells. This finding suggests that chromatin boundary represents a basic cell function that is shared by diverse tissues. The cell-based insulator assay was combined with RNAi-mediated gene knockdown to systematically test the requirement of SuHw and dCTCF in the function of several Drosophila insulators. RNAi-mediated knockdown of SuHw and dCTCF specifically disrupted the function of the suHw and Fab8 boundaries, respectively, thereby validating the functional specificity of the assay. The results suggest that multiple independent pathways in Drosophila mediate insulator function. This is in contrast with the pivotal role the CTCF protein plays in the enhancer-blocking activities in vertebrates (Li, 2009).

Cell culture assays have several important advantages that complement studies using in vivo systems. The homogeneous cell populations in these assays can be used in biochemical and cell biological analyses. They allow more efficient and quantitative assessment of reporter readout from a large number of individual cells. Insulator activity has previously been demonstrated in Drosophila cells; this system has improved the assay with several novel features. First is the use of a P-element-based transgene vector, which is known to mediate single to low copy number, non-tandem genomic integration of the assay transgenes. This would provide a more native genomic and regulatory environment for studying chromatin boundary function. Large numbers of stably transfected cells with randomly integrated transgenes also provide a broader sampling of the genomic environment, a feature that can be exploited to examine boundary activity in blocking chromosomal position effect. The second improvement is the use of divergently transcribed dual reporters, which provides a linked readout to control for 'off-target' effects on the non-insulator components in the assay system, such as enhancers, promoters, reporters, the state of general transcription or other cellular functions that impact the reporter readout. It should also provide an important control for the chromosomal position effect near the transgene integration site in stably transfected cells. The use of fluorescent protein reporters further allows rapid and quantitative FACS assessment of the enhancer-blocking activity, a feature particularly important in high-throughput applications. The activity of multiple Drosophila insulators has been established, along with the efficiency of RNAi-mediated gene knockdown; this should facilitate biochemical dissection of insulator function and genome-wide high-throughput RNAi screens for novel boundary components (Li, 2009).

As with most cell-based systems, the enhancer-blocking assay is limited in its application by potential tissue or developmental stage incompatibilities of the insulator and the cell. Studies have suggested that certain chromatin boundaries, such as Fab7 and SF1, are composed of distinct insulator activities that function in different tissues and/or developmental stages. Although this study has documented the functionality of several Drosophila insulators in S2 and Kc cells, both derived from embryonic cell lineages, other insulators may not function in these two cell lines. In addition, cultured cells may have, over the course of many passages, lost the physiological stoichiometry of relevant DNA or protein components, resulting in impaired function of certain insulators. Furthermore, the dynamic regulation of insulator activity in response to developmental and physiological cues would depend on the context of the whole animal. Therefore, the cell-based insulator assay presented in this study provides a useful tool that complements the transgenic and genetic approaches for studying this important class of regulatory elements (Li, 2009).

The chromatin insulator CTCF and the emergence of metazoan diversity

The great majority of metazoans belong to bilaterian phyla. They diversified during a short interval in Earth's history known as the Cambrian explosion, ~540 million years ago. However, the genetic basis of these events is poorly understood. This study argues that the vertebrate genome organizer CTCF (CCCTC-binding factor) played an important role for the evolution of bilaterian animals. Evidence is provided that the CTCF protein and a genome-wide abundance of CTCF-specific binding motifs are unique to bilaterian phyla, but absent in other eukaryotes. CTCF-binding sites within vertebrate and Drosophila Hox gene clusters have been maintained for several hundred million years, suggesting an ancient origin of the previously known interaction between Hox gene regulation and CTCF. In addition, a close correlation between the presence of CTCF and Hox gene clusters throughout the animal kingdom suggests conservation of the Hox-CTCF link across the Bilateria. On the basis of these findings, the existence of a Hox-CTCF kernel is proposed as principal organizer of bilaterian body plans. Such a kernel could explain (1) the formation of Hox clusters in Bilateria, (2) the diversity of bilaterian body plans, and (3) the uniqueness and time of onset of the Cambrian explosion (Heger, 2012).

CTCF genomic binding sites in Drosophila and the organisation of the Bithorax Complex

Insulator or enhancer-blocking elements are proposed to play an important role in the regulation of transcription by preventing inappropriate enhancer/promoter interaction. The zinc-finger protein CTCF is well studied in vertebrates as an enhancer blocking factor, but Drosophila CTCF has only been characterised recently. To date only one endogenous binding location for CTCF has been identified in the Drosophila genome, the Fab-8 insulator in the Abdominal-B locus in the Bithorax complex (BX-C). This study carried out chromatin immunopurification coupled with genomic microarray analysis to identify CTCF binding sites within representative regions of the Drosophila genome, including the 3-Mb Adh region, the BX-C, and the Antennapedia complex. Location of in vivo CTCF binding within these regions enabled construction of a robust CTCF binding-site consensus sequence (AGGNGGC, the same as for mammalian CTCF). CTCF binding sites identified in the BX-C map precisely to the known insulator elements Mcp, Fab-6, and Fab-8. Other CTCF binding sites correlate with boundaries of regulatory domains, allowing localization of three additional presumptive insulator elements: 'Fab-2', 'Fab-3', and 'Fab-4'. With the exception of Fab-7, these data indicate that CTCF is directly associated with all known or predicted insulators in the BX-C, suggesting that the functioning of these insulators involves a common CTCF-dependent mechanism. Comparison of the locations of the CTCF sites with characterised Polycomb target sites and histone modification provides support for the domain model of BX-C regulation (Holohan, 2007).

The multiple zinc-finger DNA-binding protein CTCF is known to be required for the enhancer blocking action of vertebrate insulators, and a clear role for CTCF in the regulation of endogenous gene expression has been demonstrated at the imprinted Igf2 locus. The mode of action of CTCF is, however, still unclear, although several studies have implicated CTCF in the formation of higher-order chromatin structure. CTCF molecules can interact to form clusters and thereby may mediate the formation of chromatin loop domains (Kurukuti, 2006; Splinter, 2006; Yusufzai, 2004). Partitioning of regulatory elements into independent chromatin loop domains is postulated to play a key role in the interactions between enhancers and promoters. The CTCF homolog of Drosophila is required for the insulator function of the Fab-8 element in the BX-C. This observation has opened up the prospect of utilising the wealth of genetic and molecular characterisation of BX-C transcriptional regulation for the analysis of CTCF function. This study used ChIP-array to investigate CTCF binding sites in regions of the Drosophila genome with a particular focus on the BX-C. CTCF not only associates with the Fab-8 insulator, but also with other mapped boundary elements, Fab-6 and Mcp. In addition, CTCF sites are located at other postulated boundaries within the BX-C: 'Fab-2', 'Fab-3', and 'Fab-4'. This provides a precise mapping of regulatory domain boundaries and a specific molecular foundation for the domain model of BX-C regulation (Holohan, 2007).

It is noted that the Fab-7 boundary may differ from the other characterised boundaries in the BX-C since no strong Patser match was found to the CTCF consensus in the functionally mapped Fab-7 boundary element. Although Fab-7 was not demonstrably enriched in the ChIP-array, significant CTCF association with Fab-7 was found in the more sensitive PCR-based ChIP assay. Given the lack of a strong Patser match (ChIP enrichment), this may suggest an indirect association. No CTCF site was seen between the abx/bx and the bxd/pbx regulatory elements. However, these elements are separated by a long distance, and it is not clear whether they require insulation (Holohan, 2007).

According to the domain model, the parasegment-specific regulatory domains that control the expression patterns of the Ubx, abd-A, and Abd-B genes of the BX-C are initially activated in appropriate parasegments by the early pattern-forming genes acting on initiator elements. Each regulatory domain is predicted to contain a particular initiator element, tuned to respond to a specific combination of gap and pair-rule gene products, thus activating the regulatory domain in the appropriate set of parasegments. This activation would be read by maintenance elements consisting of PREs that thereafter autonomously maintain each regulatory domain in either the OFF (silenced) or ON (active) state. Within a domain in the ON state, enhancers present in that domain would be able to engage with the relevant gene promoter and regulate expression of the gene. Boundary elements that flank each domain are proposed to restrict the effects of the initiator and maintenance elements to a single domain (Holohan, 2007).

Although boundary elements are postulated to have the common property of insulating the regulatory domains, no sequence similarity between the mapped boundary elements has been reported until now. This study shows that a set of these boundary elements contain CTCF binding sites and bind CTCF in vivo. CTCF has been shown to be required for the insulator activity of Fab-8, and it seems likely that CTCF will also be a required component at the other boundary elements. In support of this suggestion, it was found that the CTCF sites are well conserved within the sequenced insect genomes. The observation that CTCF sites flank a set of regulatory domains in the BX-C, together with the vertebrate studies that suggest that CTCF can mediate the formation of chromatin loops (Splinter, 2006; Yusufzai, 2004) supports the idea that interaction between CTCF sites may organise these domains into chromatin loops. However, how such a looping mechanism enables the autonomy of the individual regulatory domains and facilitates appropriate enhancer/promoter interactions is still unclear (Holohan, 2007).

A key feature of the domain model is the relationship between the boundary and maintenance elements. For the domains to be capable of independently being set to the ON or OFF state, the range of influence of PREs needs to be restricted by the domain boundaries. Each domain would require at least one PRE. Precise mapping of in vivo CTCF binding sites has enabled examination of their relationship with Polycomb target sites. In strong support of the domain model, it was found that the domains demarcated by CTCF sites contain Polycomb target sites. Indeed, an intimate relationship was found between CTCF and Polycomb binding sites for 'Fab-4', Mcp, Fab-6, and CTCF site 'C'. This fits with previous functional mapping indicating that boundary elements and PREs are closely associated at Fab-7, Fab-8, and Mcp. This arrangement would impose a polarity on the spread of chromatin modification from the PRE, such that modification may start at the PRE abutting one boundary and spread across the domain in one direction towards the next boundary. At the boundaries, CTCF may play many possible roles. It could participate in boundary element function allowing the independence of chromatin domains by acting as a chromatin insulator blocking the spread of chromatin modification. However, at the chicken β-globin locus, the chromatin boundary appears to be separable from the CTCF binding site. Another possibility is suggested by the fact that CTCF has been demonstrated to block the progression of RNA polymerase (Zhao, 2004). This could potentially play an important role at boundaries in the BX-C to enable the independent function of PREs in neighbouring domains. There is considerable evidence that transcription through PREs may control their state, and many noncoding RNAs have been detected in the regulatory regions of the BX-C.
One role for CTCF could be to act as a barrier to such noncoding transcription, preventing transcripts arising in one regulatory domain from crossing into the neighbouring domain and affecting the PRE state. Such a role would be consistent with the observed location of CTCF sites in this region, as a CTCF site closely abuts one side of each PRE (Holohan, 2007).

The individual regulatory domains must not only be able to act autonomously to set and maintain their activity state, but they must also be able to interact appropriately with the relevant gene promoters. Boundaries may play a role in this, and recently a long-range interaction has been demonstrated between Fab-7 and the Abd-B-RB promoter. This interaction was associated with lack of Abd-B expression, but similar interactions, bringing in appropriate enhancers, may also activate expression. The ability of CTCF to form clusters may facilitate such interactions, and it is intriguing that there are CTCF sites not only at the boundaries but also close to Abd-B promoters; the CTCF site 'B' is 300 bp upstream of the Abd-B-RB promoter. Clustering of boundaries together with Abd-B promoter sequences may enable interaction between the promoter and enhancers in domains in the ON state. The clustering may also be more selective; in S2 cells, which specifically express Abd-B-RB, several boundaries are embedded in chromatin bearing the repressive H3K27me3 modification, whereas Fab-8, CTCF site 'B', and the Abd-B-RB promoter are in the unmodified, presumably 'open', chromatin domain. It could be speculated that the expression of Abd-B-RB in these cells might be facilitated by interaction of the CTCF sites in the 'open' domain, Fab-8 and site 'B', enabling Fab-8 to bring appropriate enhancers to the Abd-B-RB promoter (Holohan, 2007).

ChIP-array analysis of CTCF genomic sites can be compared with ChIP-array analysis of binding sites for another Drosophila insulator-binding protein, Su(Hw). CTCF and Su(Hw) are both multi-zinc-finger DNA-binding proteins, and in both cases relatively long (~20 bp) consensus binding sites have been identified. In contrast to most DNA-binding proteins, it was found that strength of match to the consensus binding sites is a good predictor of in vivo occupancy. It was also investigated whether the data indicate any collaboration between CTCF and Su(Hw). This seemed an attractive possibility since removing Su(Hw) function in vivo has little effect; su(Hw) null mutant flies are female-sterile but viable. Also, the insulating activity of Fab-8 was significantly reduced, but not completely abolished, when the CTCF sites were mutated. However, this study found no evidence for general colocalisation between CTCF and Su(Hw). A total of 60 Su(Hw) sites were identified in the Adh region, and only one of the fragments covering this region contained both CTCF and Su(Hw) sites. The single CTCF site identified in the achaete-scute complex was also some distance from the two Su(Hw) sites found. Subsequent ChIP-array analysis in the BX-C led to the identification of only one Su(Hw) site within the entire BX-C region, in a location devoid of CTCF binding sites. Indeed, while the BX-C appears relatively enriched in CTCF sites compared to the Adh region, the converse is true for Su(Hw). For CTCF there are 4.7 sites/100 kb in the BX-C and 1.7 sites/100 kb in the Adh region, whereas for Su(Hw) the BX-C is depleted in sites with only 0.29/100 kb in comparison to 2.7/100 kb in the Adh region. Clearly, although CTCF and Su(Hw) both possess insulating ability, their sites of action do not correlate and there is no evidence from this analysis, covering approximately 3% of the Drosophila genome, for cooperative activity (Holohan, 2007).
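The density comparison in the preceding paragraph is simple arithmetic, and a minimal Python sketch may make it easier to reproduce. The function names `sites_per_100kb` and `fold_enrichment` are hypothetical helpers introduced here for illustration and are not part of the analysis in Holohan (2007); only the four densities (4.7, 1.7, 0.29 and 2.7 sites/100 kb) are taken from the text.

```python
def sites_per_100kb(n_sites, region_length_bp):
    """Normalise a raw site count to sites per 100 kb of sequence."""
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    return n_sites / region_length_bp * 100_000


def fold_enrichment(density_a, density_b):
    """Ratio of two site densities (region A relative to region B)."""
    return density_a / density_b


# Densities quoted in the text (sites per 100 kb).
REPORTED = {
    "CTCF": {"BX-C": 4.7, "Adh": 1.7},
    "Su(Hw)": {"BX-C": 0.29, "Adh": 2.7},
}

# BX-C relative to the Adh region, for each protein.
ctcf_ratio = fold_enrichment(REPORTED["CTCF"]["BX-C"], REPORTED["CTCF"]["Adh"])
suhw_ratio = fold_enrichment(REPORTED["Su(Hw)"]["BX-C"], REPORTED["Su(Hw)"]["Adh"])
```

With the reported values, the BX-C/Adh ratio comes to roughly 2.8 for CTCF and roughly 0.11 for Su(Hw), which restates the opposite enrichment of the two proteins described above.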

By comparing the sequences of ChIP-enriched fragments a strong Drosophila consensus CTCF binding site was identified. Analysis of vertebrate CTCF target sequences leads to a proposal that vertebrate CTCF also binds to a similar consensus sequence. These findings do not support the current view that CTCF binds to divergent DNA sequences by engaging different subsets of the zinc fingers. Indeed, the binding site revealed here has been previously noted. Bell (1999) identified a CTCF binding site in the chicken β-globin insulator, and sequence comparisons between this site and other known CTCF sites identified a conserved 3' region, the mutation of which completely abolished CTCF binding and enhancer blocking. Filippova (2001) extended this comparison to include the Dm1 sites, mouse H19 DMD4 and DMD7 and human MYC A, and again identified a conserved region within the larger approximately 50-bp DNase footprint for each site. It is this conserved region that corresponds to the vertebrate CTCF site found here. Very recently, an analysis of CTCF binding in the human genome has generated a vertebrate CTCF consensus site (Kim, 2007), and a CTCF consensus has also been derived from analysis of conserved regions in the human genome (Xie, 2007). Both these sites are very similar to the consensus identified in this study; in particular they share the strong features of the CC at positions 1 and 2, the AG at positions 6 and 7, and the GGC at positions 10, 11, and 12. Overall, these findings indicate that CTCF in both Drosophila and vertebrates binds to a single core consensus sequence (Holohan, 2007).
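To illustrate what matching a sequence against the core consensus involves, the following Python sketch scans both strands of a DNA string for the short core motif quoted earlier (AGGNGGC). It is a simplified exact-match search written for this summary, not the Patser matrix-based scoring used in the study, and the function names `reverse_complement` and `find_motif` are assumptions made for the example.

```python
import re

# Minimal IUPAC table: only the symbols needed for the core motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def reverse_complement(seq):
    """Reverse complement of a sequence over A, C, G, T and N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_motif(seq, motif="AGGNGGC"):
    """Return sorted (position, strand, matched_text) hits on both strands.

    Positions are 0-based and refer to the forward strand as given.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern_seq in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead is used so that overlapping matches are also reported.
        regex = re.compile("(?=(" + "".join(IUPAC[b] for b in pattern_seq) + "))")
        for match in regex.finditer(seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)
```

For example, `find_motif("TTAGGTGGCAA")` reports a single forward-strand hit at position 2. An exact-match search of this kind ignores the graded strength of match that the text identifies as a predictor of in vivo occupancy, so it should be read only as a first-pass filter.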

In summary, ChIP-array analysis has enabled construction of a CTCF binding site consensus. Mapping of genomic binding sites leads to a proposal that all known or predicted insulators in the BX-C (with the possible exception of Fab-7) function in a CTCF dependent manner (Holohan, 2007).

The Drosophila insulator proteins CTCF and CP190 link enhancer blocking to body patterning

Insulator sequences guide the function of distantly located enhancer elements to the appropriate target genes by blocking inappropriate interactions. In Drosophila, five different insulator binding proteins have been identified, Zw5, BEAF-32, GAGA factor, Su(Hw) and dCTCF. Only dCTCF has a known conserved counterpart in vertebrates. This study found that the structurally related factors dCTCF and Su(Hw) have distinct binding targets. In contrast, the Su(Hw) interacting factor CP190 largely overlaps with dCTCF binding sites and interacts with dCTCF. Binding of dCTCF to targets requires CP190 in many cases, whereas others are independent of CP190. Analysis of the bithorax complex revealed that six of the borders between the parasegment specific regulatory domains are bound by dCTCF and by CP190 in vivo. dCTCF null mutations affect expression of Abdominal-B, cause pharate lethality and a homeotic phenotype. A short pulse of dCTCF expression during larval development rescues the dCTCF loss of function phenotype. Overall, this study demonstrates the importance of dCTCF in fly development and in the regulation of abdominal segmentation (Mohan, 2007).

The CP190 protein contains three classical C2H2 zinc-finger motifs and an N-terminal BTB/POZ domain. Both domains could potentially be involved in chromatin binding. Alternatively, chromatin binding might be achieved by interaction with other factors, such as dCTCF. A possible interaction of dCTCF with CP190 was tested using co-immunoprecipitation. Precipitation of CP190 from Schneider cell extracts resulted in the detection of dCTCF. To confirm the interaction, a FLAG-dCTCF fusion protein was expressed in Schneider cells and precipitated with either an antibody against CP190 or an antibody against FLAG. The CP190 precipitate contained endogenous dCTCF as well as FLAG-dCTCF in the same ratio as the input, suggesting that both dCTCF proteins are similarly associated with CP190. Furthermore, the reverse experiment using FLAG precipitation demonstrated that dCTCF and CP190 interact in vivo (Mohan, 2007).

Because CP190 and dCTCF colocalize on polytene chromosomes and interact in vivo, it was asked whether the overall amount of dCTCF protein might be changed in CP190-deficient third instar larvae. A Western blot analysis of both Cp190^1 homozygotes (deficient in CP190) and wild-type larval extracts showed that the amount of dCTCF is reduced in Cp190^1 homozygotes (Mohan, 2007).

Next it was of interest to know whether the reduced amount of dCTCF caused by the loss of CP190 affects dCTCF binding on the polytene chromosomes. It was found that the total number of dCTCF labeled sites is reduced in the Cp190^1 mutant, whereas the number of CP190 sites was not affected by dCTCF mutants. The analysis of dCTCF binding in the two hypomorphic mutants CTCF^EY15833/CTCF^EY15833 and GE24185/GE24185 revealed that the number of bound sites is reduced to about 50% and 25%, respectively. By close inspection of the chromosomes it was found that the sets of dCTCF sites missing in the CP190 and in the dCTCF mutants overlap but are not identical. Thus, different sites vary in their requirement for CP190/dCTCF cooperation (Mohan, 2007).

Insulator elements with enhancer blocking activity establish independent regulatory domains. An analysis of binding sites (CTS) for the enhancer blocking factor dCTCF on salivary gland polytene chromosomes resulted in the identification of several hundred sites bound by dCTCF. All of these sites are found in interbands, and when inspected more precisely are often at the borders of interbands next to bands. Interbands harbor active housekeeping genes or regulatory regions of inactive genes, whereas bands contain the bodies of inactive genes. Interbands and bands differ in chromatin composition and modification. Thus, there is a clear border between interbands and bands. Any factors generating functional chromatin boundaries would be expected to be localized to the interband/band transition. This is not only the case for dCTCF, as a similar location has been found for Su(Hw). Also, BEAF-32 and Zw5 are located in interbands at hundreds of binding sites throughout the genome (Mohan, 2007).

The obvious question was whether dCTCF has a redundant function and therefore similar targets as the other Drosophila enhancer blocking factors. No significant colocalization of dCTCF with either BEAF-32 or with Su(Hw) on polytene chromosomes was detected. This may provide an explanation of how an organism with a small genome, such as Drosophila, can prevent promiscuous enhancer interaction with any nearby gene. Apparently, an elaborate system of different enhancer blockers and barrier factors fulfills the insulation of regulatory units (Mohan, 2007).

The biochemical composition and function of insulator complexes involving Su(Hw) have been studied in detail. The best studied binding site is the gypsy transposon with a 350-bp sequence containing 12 binding sites for Su(Hw). A functional complex of Su(Hw), Mod(mdg4)67.2, CP190, and possibly other factors has been documented (Capelson, 2005; Lei, 2006). Although there is no colocalization of Su(Hw) with dCTCF on polytene chromosomes, and only partial colocalization with Mod(mdg4), it was of interest to examine whether CP190 plays a role in dCTCF function. Vertebrate CTCF is a centrosomal factor during mitosis and a nuclear protein during interphase (Zhang, 2004), and CP190 (centrosome binding protein) is associated with centrosomes as well. CP190 is essential for viability, but is not required for cell division (Butcher, 2004). CP190 knockdown in Schneider cells has no effect, whereas a null mutation in flies leads to pharate lethality. A similar phenotype is seen for dCTCF, both after depletion in Schneider cells and in the pharate lethality of mutant flies. The centrosomal function of CP190 is not required for the insulator activity in the context of Su(Hw) bound to gypsy (Pai, 2004). The localization of CP190 on polytene chromosomes overlaps with sites bound by Su(Hw) or by Mod(mdg4)67.2. In addition, CP190 is found at loci devoid of Su(Hw) or Mod(mdg4)67.2, suggesting that other factors might recruit CP190 to these sites (Pai, 2004). There is a significant overlap between dCTCF and CP190 binding sites. A functional dependence is seen, because at many sites binding of dCTCF depends on CP190. Although there is an overall reduction in the dCTCF amount observed in the CP190 mutant, differences in dCTCF occupancy in dCTCF and CP190 mutants indicate a discrimination between CP190-independent and -dependent sites. Furthermore, the previously characterized insulator Fab-8 is impaired in the absence of dCTCF (Moon, 2005) and by the reduction of CP190 (Mohan, 2007).

Another perspective on the requirement of insulators comes from the fact that many genes are controlled by several regulatory elements that are required for tissue and cell-specific expression. A prominent example is the Drosophila BX-C. This is one of two Hox gene clusters, which contain regulator genes controlling development. The BX-C is responsible for the correct specification of the posterior thorax segment (T3) and all of the abdominal segments. Within BX-C, only three protein coding genes, Ubx, abd-A and Abd-B, are responsible for the segment-specific development of organs and tissues. On the other hand, nine separate groups of mutations affect segment-specific functions. The borders of some of these domains are genetically defined by the elements Fab-6, Fab-7, Fab-8 and Mcp. Proteins involved in such a functional separation are the GAGA factor in the case of the Fab-7 element, and dCTCF for the Fab-8 sequence. Recently, it has been demonstrated that six of the BX-C domain junctions are bound by dCTCF (Holohan, 2007). Consequently, if these sites contribute to boundary function, gene activity within this locus should be changed. Indeed, a homeotic phenotype and a reduced expression of Abd-B were found in the larval nerve cord. If dCTCF plays a central role in separating the different regulator domains in the BX-C and elsewhere in the genome, it is difficult to predict the dCTCF phenotype. The situation could be complicated as the three BX-C genes control realizator genes as well as other regulators. Furthermore, individual BX-C genes repress others, for example Abd-B as well as the miRNA iab-4 and bxd expression repress Ubx. In addition, other factors, such as CP190 and perhaps additional unknown factors, may contribute to the enhancer blocking function of dCTCF. For all of the CTS in the BX-C, dCTCF and CP190 binding was found.
Although both factors clearly interact as seen by co-immunoprecipitation, CP190 may contact other DNA-bound factors as well, or may be directly targeted to chromatin (Mohan, 2007).

Thus, dCTCF shares several biochemical and functional features with Su(Hw), but is clearly targeted to dCTCF-specific sites. Overall, this study has shown that dCTCF is important for fly development, and has important functions in the regulation of abdominal segmentation (Mohan, 2007).

CTCF is required for Fab-8 enhancer blocking activity in S2 cells

CTCF is a conserved transcriptional regulator with binding sites in DNA insulators identified in vertebrates and invertebrates. The Drosophila Abdominal-B locus contains CTCF binding sites in the Fab-8 DNA insulator. Previous reports have shown that Fab-8 has enhancer blocking activity in Drosophila transgenic assays. This study now confirms the enhancer blocking capability of the Fab-8 insulator in stably transfected Drosophila S2 cells and shows this activity depends on the Fab-8 CTCF binding sites. Furthermore, knockdown of Drosophila CTCF by RNAi in these stable cell lines demonstrates that CTCF itself is critical for Fab-8 enhancer blocking (Ciavatta, 2007).

RNAi-independent role for Argonaute2 in CTCF/CP190 chromatin insulator function

A major role of the RNAi pathway in Schizosaccharomyces pombe is to nucleate heterochromatin, but it remains unclear whether this mechanism is conserved. To address this question in Drosophila, genome-wide localization of Argonaute2 (AGO2) by chromatin immunoprecipitation (ChIP)-seq was performed in two different embryonic cell lines; AGO2 was found to localize to euchromatin but not heterochromatin. This localization pattern is further supported by immunofluorescence staining of polytene chromosomes and cell lines, and these studies also indicate that a substantial fraction of AGO2 resides in the nucleus. Intriguingly, AGO2 colocalizes extensively with CTCF/CP190 chromatin insulators but not with genomic regions corresponding to endogenous siRNA production. Moreover, AGO2, but not its catalytic activity or Dicer-2, is required for CTCF/CP190-dependent Fab-8 insulator function. AGO2 interacts physically with CTCF and CP190, and depletion of either CTCF or CP190 results in genome-wide loss of AGO2 chromatin association. Finally, mutation of CTCF, CP190, or AGO2 leads to reduction of chromosomal looping interactions, thereby altering gene expression. It is proposed that RNAi-independent recruitment of AGO2 to chromatin by insulator proteins promotes the definition of transcriptional domains throughout the genome (Moshkovich, 2011).

This study provides the first evidence for an Argonaute protein functioning directly on euchromatin to effect changes in gene expression. The genome-wide binding profile of AGO2 displays striking overlap with insulator proteins. Genetic analysis revealed that AGO2, independent of its catalytic activity, promotes Fab-8 insulator activity. Like known insulator proteins, AGO2 also associates with promoters and can oppose PcG function. Genome-wide AGO2 recruitment to chromatin is dependent on CTCF and CP190 binding and may be partially achieved via looping interactions among cis-regulatory regions and promoters. It is proposed that AGO2 may act to facilitate or stabilize looping that is needed to partition the genome into independent transcriptional domains (Moshkovich, 2011).

These results suggest that the main function of AGO2 on chromatin resides in euchromatin and not in heterochromatin. Immunofluorescence localization of AGO2 on polytene chromosomes and cell lines indicates exclusion from heterochromatic and HP1-enriched regions. Furthermore, the majority of chromatin-associated AGO2 resides in nonrepetitive euchromatic but not repeat-rich regions, as determined by genome-wide ChIP-seq. It is suggested that the role of AGO2 in RNAi-dependent silencing of TEs occurs primarily at the post-transcriptional level and that AGO2 harbors a second RNAi-independent activity to promote chromatin insulator function (Moshkovich, 2011).

Several observations suggest that AGO2 chromatin association is mainly, if not exclusively, independent of the RNAi pathway. First, AGO2 chromatin association does not correspond to regions of the genome that produce high levels of endo-siRNAs, which are dependent on Dcr-2 and AGO2. Second, AGO2, but not Dcr-2, is required for Fab-8 insulator function. Finally, a catalytically inactive AGO2 protein, which is defective for RNAi, retains the ability to associate with chromatin and is functional with respect to both TrxG function and Fab-8 insulator activity (Moshkovich, 2011).

An intriguing question raised by these findings is whether or not the functions of AGO2 in RNAi and chromatin insulator activity are completely distinct. CP190 mutants were found to remain competent for silencing, suggesting that AGO2 chromatin association is not required for RNAi. Nevertheless, it remains possible that chromatin-associated AGO2 is loaded with siRNA. Future work will address how AGO2 subcellular localization and seemingly disparate functions in RNAi and chromatin insulator activities are regulated (Moshkovich, 2011).

A unique positive role for AGO2 but not other RNA silencing factors was identified in Fab-8 insulator function. Importantly, a catalytically inactive mutant form of AGO2 expressed at wild-type levels retains insulator activity, further suggesting that the RNAi pathway is dispensable for Fab-8 insulator function. A significant fraction of AGO2 resides in the nucleus, and physical interaction is observed between AGO2 and CP190. This interaction is insensitive to RNaseA, suggesting that RNA does not mediate the interaction between AGO2 and CP190. It remains possible that AGO2 can interact with siRNA or other RNA while associated with the insulator complex, although there is no evidence to support this hypothesis (Moshkovich, 2011).

This study shows that chromosomal looping in the Abd-B locus is dependent on CTCF, CP190, and AGO2. Confirming and extending previous studies, it was found that the Abd-B RB promoter interacts frequently with Fab-7, Fab-8, and the iab-8 enhancer and, moreover, that the Fab-8 region also contacts Fab-7 as well as multiple Abd-B promoters. Currently, the significance of insulator protein promoter association is unclear, but insulators may be thus situated to control looping interactions between promoters and cis-regulatory elements. Depletion of CP190 or CTCF reduces these high-frequency looping interactions, and loss of this specialized chromatin configuration could result in disassociation of AGO2. Given this possibility, AGO2 may act to detect the insulator-dependent conformation of this locus (Moshkovich, 2011).

AGO2 is recruited to chromatin insulator sites as well as noninsulator sites in a CTCF/CP190-dependent manner. It is speculated that AGO2 chromatin association with insulator sites could result from physical interactions with CP190 complexes, while AGO2 recruitment to other sites may be achieved at least in part by chromatin looping mediated by CP190 and CTCF. In fact, it was recently shown that PcG proteins can be transferred from a PRE to a promoter as a result of intervening insulator-insulator interactions. Once recruited to chromatin, AGO2 could perform a primarily structural function to promote or stabilize the frequency of CTCF/CP190-dependent looping interactions (Moshkovich, 2011).

AGO2 appears to promote Fab-8 insulator activity independently of an effect on gypsy insulator body localization. Previous work showed that both the gypsy class and CTCF/CP190 insulators colocalize to insulator bodies, suggesting that these subnuclear structures may be important for both gypsy and Fab-8 activities. However, since Fab-8 activity is not affected by RNA silencing components that disrupt gypsy insulator body localization, this subnuclear structure appears to be dispensable for Fab-8 function. Recent work indicates that the BX-C harbors multiple redundant cis-regulatory elements that can maintain looping interactions of this locus, suggesting that the configuration of the BX-C may not require a nuclear scaffold such as the gypsy insulator body (Moshkovich, 2011).

AGO2 mutations suppress the Polycomb phenotype, indicating that AGO2 behaves similarly to trxG genes and opposes PcG function. A previous study proposed that RNA silencing factors promote long-range PRE-dependent chromosomal pairing as well as PcG body formation but did not examine AGO2. This study found that the AGO2^51B null mutation has no effect on Fab-X PRE pairing-dependent silencing on sd as assayed in that study, and genetic results suggest that AGO2 is unlikely to promote PRE-dependent interactions or PcG body formation, which are both positively correlated with PcG function. Interestingly, it has recently been shown in the case of AGO2-associated Fab-7 and Mcp boundary elements that long-range interactions are dependent on insulator sequences and not PREs. Future studies will elucidate the complex interplay between PcG and insulator organization as well as the role of AGO2 in the regulation of these structures (Moshkovich, 2011).

It remains to be seen whether Drosophila AGO2 euchromatin association and function may be conserved in other organisms. In Caenorhabditis elegans, the nuclear NRDE RNAi pathway can block transcriptional elongation of Pol II on a target transcript when treated with exogenous complementary dsRNA. Interestingly, this negative transcriptional effect is contemporaneous with an increase in H3K9me3. Whether the Argonaute protein NRDE-3/WAGO-12, which lacks Slicer activity, associates with euchromatin to effect this repression is not yet known. Furthermore, the C. elegans Argonaute Csr-1, loaded with 22G endo-siRNAs antisense to mRNAs of holocentric chromosomes, may serve as chromosomal attachment points to promote efficient chromosome segregation. Recently, it has been shown that Schizosaccharomyces pombe Ago1 participates in surveillance mechanisms to prevent readthrough transcription of mRNA. However, the majority of Ago1 associates with heterochromatic regions, and it is not clear thus far whether Ago1 directly associates with euchromatin or acts post-transcriptionally. An emerging theme from studies of RNAi in various model systems is that genome integrity and control of gene expression may be achieved by multiple yet overlapping mechanisms (Moshkovich, 2011).

Functional interaction between the Fab-7 and Fab-8 boundaries and the upstream promoter region in the Drosophila Abd-B gene

Boundary elements have been found in the regulatory region of the Drosophila Abdominal-B gene, which is subdivided into a series of iab domains. The best-studied Fab-7 and Fab-8 boundaries flank the iab-7 enhancer and isolate it from the four promoters regulating Abd-B expression. Recently binding sites for the Drosophila homolog of the vertebrate insulator protein CTCF (dCTCF) were identified in the Fab-8 boundary and upstream of Abd-B promoter A, with no binding of CTCF to the Fab-7 boundary being detected either in vivo or in vitro. Taking into account the inability of the yeast GAL4 activator to stimulate the white promoter when its binding sites are separated by a 5-kb yellow gene, a study was performed of the functional interactions between the Fab-7 and Fab-8 boundaries and between these boundaries and the upstream promoter A region containing a dCTCF binding site. It was found that dCTCF binding sites are essential for pairing between two Fab-8 insulators. However, a strong functional interaction between the Fab-7 and Fab-8 boundaries suggests that additional, as yet unidentified proteins are involved in long-distance interactions between them. Fab-7 and Fab-8 boundaries effectively interact with the upstream region of the Abd-B promoter (Kyrchanova, 2008).

Previously it was found that the relative orientation of Mcp elements defines the mode of loop formation that either allows or blocks stimulation of the white promoter by the GAL4 activator. This study has demonstrated that two PTS/F8 boundaries or Fab-8 insulators alone are also capable of orientation-dependent interaction. When these elements are located in opposite orientations, the loop configuration is favorable for communication between regulatory elements located beyond the loop. The loop formed by two insulators located in the same orientation juxtaposes two elements located within and beyond the loop, which leads to partial isolation of the GAL4 binding sites and the white promoter placed on the opposite sides of the insulators (Kyrchanova, 2008).

The orientation-dependent interaction may be accounted for by at least two proteins bound to the insulator that are involved in specific protein-protein interactions. In the case of a Fab-8 insulator, dCTCF is likely to be directly involved in pairing between two insulators. Since mutated Fab-8 insulators devoid of dCTCF binding sites proved to be incapable of interacting with each other, it is hypothesized that dCTCF facilitates the binding of a certain as yet unidentified protein (or proteins) that, in combination with dCTCF, accounts for orientation-dependent interaction between the Fab-8 insulators. Functional interactions between the Fab-7 boundary devoid of dCTCF binding sites and PTS/F8 or the upstream Abd-B A promoter region are also evidence for the existence of unidentified proteins that support organization of distance interactions in the Abd-B locus (Kyrchanova, 2008).

Recently it was shown that in the repressed state of the bithorax complex, all of its major regulatory elements binding PcG proteins, including PREs with adjacent boundaries and core promoters, interact at a distance, giving rise to a topologically complex structure (Lanzuolo, 2007). The question arises as to what proteins are important for such interactions. All PREs tested (Lanzuolo, 2007) were flanked by boundaries, suggesting that all these regulatory elements may be involved in long-distance interactions. As shown previously, the Fab-7 or Mcp boundaries including PREs can support physical association even between transposons located on different chromosomes. One relevant model proposes that PcG proteins are capable of supporting highly specific long-distance interactions between transposons (Lanzuolo, 2007). However, it is known that many PcG complexes with similar properties can bind to Drosophila chromosomes, which leaves open the question as to how such protein complexes can ensure a high specificity of interactions between distantly located transposons. Moreover, there is no experimental evidence that PREs without additional regulatory elements can support long-distance interactions. In contrast, there are many proven cases showing that insulator proteins are involved in physical association between distant chromosomal regions. For example, the interaction between gypsy insulators can support activation of the yellow promoter by enhancers separated by many megabases. The Mod(mdg4)-67.2 and Su(Hw) proteins bound to the gypsy insulator are essential for such long-distance interactions. In mammals, the interaction of the imprinting control region on chromosome 7 with the Wsb1/Nf1 locus on chromosome 11 depends on the presence of the CTCF protein. In vivo interaction between Fab-7 and the Abd-B promoter is absolutely dependent on the presence of the Fab-7 insulator.
Finally, this study has demonstrated the functional interaction between the Fab-7 and Fab-8 boundaries and the Abd-B promoter. These results support the model that transcription factors bound to boundaries can facilitate enhancer-promoter interactions in the bithorax complex. Further studies are necessary for identifying new proteins involved in long-distance interactions and for elucidating the mechanisms that allow interactions either between proper active enhancers and promoters or between only silenced enhancers and promoters (Kyrchanova, 2008).

CTCF is expressed during the transition from a nucleosome-based to a protamine-based chromatin configuration during spermiogenesis in Drosophila

In higher organisms, the chromatin of sperm is organised in a highly condensed protamine-based structure. In pre-meiotic stages and shortly after meiosis, histones carry multiple modifications. This study focused on post-meiotic stages and shows that, after meiosis as well, histone H3 shows a high overall methylation of K9 and K27; it was hypothesised that these modifications ensure maintenance of transcriptional silencing in the haploid genome. Furthermore, histones are lost during the early canoe stage, and just before this stage, hyper-acetylation of histone H4 and mono-ubiquitylation of histone H2A occur. It is believed that these histone modifications within the histone-based chromatin architecture may lead to better access of enzymes and chromatin remodellers. This notion is supported by the presence of the architectural protein CTCF, numerous DNA breaks, SUMO, UbcD6 and high content of ubiquitin, as well as testes-specific nuclear proteasomes at this time. Moreover, the first transition protein-like chromosomal protein to be found in Drosophila, Tpl94D, is reported. It is proposed that Tpl94D (an HMG box protein) and the numerous DNA breaks facilitate chromatin unwinding as a prelude to protamine and Mst77F deposition. Finally, it is shown that histone modifications and removal are independent of protamine synthesis (Rathke, 2007).

The switch between a nucleosome-based chromatin configuration and a protamine-based structure is a specialised form of chromatin remodelling in the male germline. The mammalian zinc finger protein CTCF is involved in many epigenetic processes. Furthermore, a paralogous variant of CTCF, called BORIS, is expressed exclusively in the mammalian male germline. The function of BORIS in this context is still not clear (Loukinov, 2002). Drosophila, in contrast to mammals, contains only one CTCF gene (Moon, 2005). It was therefore asked whether Drosophila CTCF is also expressed in the testes, and CTCF immunostaining and anti-histone staining were performed on testes of transgenic flies expressing protamine-eGFP. CTCF expression was observed during pre-meiotic and meiotic stages at the chromosomes, as has been shown for mitotic cell division in mammalian cell culture. Shortly after meiosis, CTCF is visible in young elongating nuclei, where it co-localises with the chromatin as indicated by the histone distribution. CTCF is also present in the early and late canoe stage spermatid heads. At the early canoe stage, CTCF is very diffusely distributed in comparison to histones. CTCF does not co-localise with the chromatin, which starts to condense at one side of the nucleus. This diffuse distribution is still visible at the late canoe stage when protamine-eGFP starts to be deposited on the chromatin. CTCF is no longer detectable after the canoe stage. The earlier chromatin-associated CTCF localisation might indicate a very early role in chromatin reorganisation at the switch between the nebenkern and canoe stage. Furthermore, CTCF might be associated primarily with the chromatin, which is not yet condensing during these stages. The late canoe stage is the only post-meiotic stage where distinct regions of RNA polymerase II are found with an antibody directed against a phosphorylated subunit of active polymerase, indicative of transcription.
At this precise stage, only a very small set of genes is thought to be transcribed. Also CTCF expression during chromatin reorganisation in the nucleus was detected in D. hydei (Rathke, 2007).

Sperm morphogenesis is characterised by an impressive degree of changes in cell architecture based on stored, translationally repressed mRNAs that are recruited at the appropriate time to the polysomes. Among these are mRNAs that encode Tpl94D and protamines. A dramatic switch in structure from the nucleosomal- to the protamine-based structure of chromatin takes place, and this remarkable chromatin reorganisation of the complete genome is a typical feature depending on stored mRNAs, e.g. for protamine synthesis. This process ultimately leads to an extremely condensed state of the haploid genome in the sperm, which is essential for male fertility in mammals. This study focused on the switch between a nucleosomal- and a protamine-based chromatin reorganisation. The major steps in chromatin organisation take place in the canoe stage of spermatid development. A candidate for a transition protein in Drosophila was identified. The corresponding gene tpl94D (CG31281) encodes a predicted basic high mobility group (HMG) protein of 18.8 kDa. In transgenic flies, Tpl94D-eGFP fusion proteins are expressed solely during the switch between histones and protamines, as is typical for mammalian transition proteins. Since a highly similar chain of events to those reported in mammals is observed, the Drosophila system is considered an excellent choice to study the mechanism of chromatin remodelling during male germ cell development (Rathke, 2007).

Generally, the bulk of histones, including their diverse modifications in the N-terminal tail, appear to be removed during the canoe stage. Furthermore, the nucleus accumulates ubiquitin at the early canoe stage, when mono-ubiquitylation of histone H2A is no longer detectable. Therefore, taking into account the known presence of proteasomes in the nucleus at this stage of chromatin reorganisation and the overlap of expression shown in this study, it is hypothesised that this ubiquitylation is targeting histones for degradation. This study investigated several male-sterile mutants that carry mutations in ubiquitin-conjugating enzymes or ubiquitin ligases and that arrest spermiogenesis during spermatid development. However, in all investigated mutants, histone removal is indistinguishable from that of wild-type flies (Rathke, 2007).

Many histone modifications were found after meiosis and were categorised into three classes (Rathke, 2007).

  1. Histone modifications that persist from pre-meiotic stages and keep the genome silent.

    The vast majority of the genome is transcriptionally silent in post-meiotic stages. This is accompanied by multiple histone modifications that persist from pre-meiotic stages and indicate silencing, such as H3K9 and H3K27 methylation. These modifications do not change significantly during post-meiotic stages, which is in agreement with the hypothesis that these modifications predominantly play a role in maintaining transcriptional silencing. Previously, phosphorylation of histones has been analysed during spermatogenesis. Phosphorylated histone H4S1 and H3S10 are present during meiotic divisions. H3S10 phosphorylation is hardly detectable after meiosis, whereas phosphorylation of H4S1 persists until chromatin compaction starts.

  2. Histone modifications that persist from pre-meiotic stages and characterise transcriptionally active chromatin.

    The primary spermatocyte phase is characterised by a high level of transcriptional activity of housekeeping genes. In addition, genes are transcribed that are needed for the subsequent steps in spermatogenesis, as the majority of transcription ceases once meiotic division starts. H4 acetylation and H3K4 and H4R3 methylation of histones were investigated. These histone modifications, which are indicative of transcriptional activity, persist until histone degradation.

  3. Increasing or de novo appearance of histone modifications that decrease the affinity between histones and DNA as a prelude to histone removal.

    It might be that H4 hyper-acetylation, as postulated for mammals, and/or other secondary modifications of histones are the first step towards histone removal. The fact that these modifications are conserved between mammals and flies adds support to this hypothesis. Indeed, histone H4 acetylation is very pronounced at the canoe stage and de novo mono-ubiquitylation of histone H2A is seen in round spermatids. Both types of histone modifications are proposed to be necessary for opening the chromatin and decreasing the contact between DNA and histones. The fact that histone H2A mono-ubiquitylation vanishes before the early canoe stage, and thus before the hyper-acetylation of histone H4, suggests a stepwise remodelling of the chromatin. This study proposes that these histone modifications open the chromatin, so that enzymes and regulators have access to histone-based chromatin and can induce and prepare the reorganisation of the genome in the male germline.

It remains to be clarified whether and how these histone modifications influence the topology of the chromatin as a prelude to histone removal as well as for Tpl94D, Mst77F and protamine deposition. A functional approach based on analysis of mutants of histone-modifying enzymes is difficult, as all characterised histone-modifying enzymes are already active during Drosophila development or at least in spermatogonia and spermatocytes. Therefore a tissue-specific knock-out mutant would most probably exhibit arrest of spermatogenesis before meiosis, rendering it useless for experimental purposes (Rathke, 2007).

At first glance, it might seem surprising that histones and all their modifications are removed. Instead of specifically reverting the differentially modified histones to their unmodified state, they are removed together with all histones. This might allow the paternal genome to form nucleosomes with unmodified histones after fertilisation and before zygote formation. Thus, the paternal genome starts embryogenesis with a nucleosomal chromatin lacking histone modifications (Rathke, 2007).

The data show that most of the histones are removed between the early and late canoe stage; such a process requires a loosening of contact between the histones and DNA, which in turn requires an unwinding of the chromatin structure. It is proposed that this unwinding process is facilitated by DNA nicks as they were widespread at this stage of chromatin reorganisation. Finally, Tpl94D, UbcD6 and SUMO were also observed to accumulate in the chromatin during this process. DNA breaks, Tpl94D, UbcD6 and SUMO were no longer detectable when protamines were fully expressed. Thus, it is proposed that all these proteins and the DNA breaks act together in an unknown manner to allow chromatin remodelling (Rathke, 2007).

The CTCF protein is present during pre-meiotic stages in the nucleus and stays associated with the chromosomes during meiosis. After meiosis, however, strong localisation to the nucleus is detected during the transition from round spermatid nuclei to the early canoe stage of spermiogenesis. It is speculated that CTCF might set borders in the chromatin for the histone modifications, which are characteristic of the canoe stage, such as acetylation and ubiquitylation. CTCF is visible for longer than histones and disappears together with active RNA polymerase II. CTCF might maintain chromatin accessibility to RNA polymerase II since a few genes are known to be transcribed at this time. In addition, transient occurrence of RNA polymerase II at the late canoe stage might require CTCF to insulate active genes from inactive ones. This idea needs to be tested in tissue-specific CTCF loss-of-function mutants; such mutants are, however, currently unavailable (Rathke, 2007).

The question of whether histone removal is dependent on a signal that monitors the start of protamine and Mst77F mRNA translation was addressed. Both histone modification and degradation are indistinguishable from the wild-type in loss-of-function mutants of Mst35Ba and Mst35Bb, the genes encoding protamine A and B, respectively. Also in nc3 mutants of Mst77F, histone removal is not disturbed. It is concluded that N-terminal tail modification of histones and histone degradation, on the one hand, and protamine deposition, on the other, are controlled by different pathways in the cell (Rathke, 2007).

In mammals, it is well known that after meiosis the nucleosomal conformation is lost. This is accompanied by the appearance of testis-specific linker histones. So far, no linker histone variants have been identified in Drosophila, but variants of H2A (H2AvD) and H3 (H3.3) are known. In mammals, histones are hyper-acetylated before being displaced from the DNA, and phosphorylation and ubiquitylation have also been proposed to occur. For Drosophila, H2A mono-ubiquitylation and a strong increase in H4 acetylation occur shortly before histone removal and degradation. In mammals, histones are replaced first by transition proteins (major types: TP1 and TP2). This study identified the high mobility group protein Tpl94D, a first probable candidate for a functional homologue of mammalian transition proteins. In mammals, transition proteins are subsequently replaced by protamines leading to chromatin with a doughnut structure. In Drosophila, it has recently been shown that the sperm nucleus also contains protamines. Protamines A and B are encoded by two closely related protamine genes, Mst35Ba and Mst35Bb. In addition, the identification of Mst77F shows that sperm nuclei contain at least one further abundant chromatin component. Moreover, in human sperm several new putative protamines have been identified by 2D gel electrophoresis and protein sequencing. In mammals, this chromatin reorganisation is essential for male fertility. Male flies carrying the deletion protDelta38.1, where both protamines as well as three additional ORFs are removed, show severely reduced fertility (Rathke, 2007).

In summary, a step-by-step scheme is proposed for chromatin reorganisation: (1) histone modifications lead to subsequent histone removal and degradation; (2) the exposed chromatin becomes nicked, resulting in DNA breaks; (3) Tpl94D deposition constitutes an intermediate stage that triggers subsequent protamine-based chromatin organisation (Rathke, 2007).

Since many features concerning spermiogenesis are conserved between Drosophila and mammals, it is proposed that Drosophila is an ideal system to gain further insight into the mechanism of chromatin reorganisation in spermatid nuclei, a process that is crucial for male fertility (Rathke, 2007).

Functions of CTCF orthologs in other species

YY1 and CTCF orchestrate a 3D chromatin looping switch during early neural lineage commitment

CTCF is an architectural protein with a critical role in connecting higher-order chromatin folding in pluripotent stem cells. Recent reports have suggested that CTCF binding is more dynamic during development than previously appreciated. This study set out to understand the extent to which shifts in genome-wide CTCF occupancy contribute to the 3D reconfiguration of fine-scale chromatin folding during early neural lineage commitment. Unexpectedly, a sharp decrease in CTCF occupancy was observed during the transition from naive/primed pluripotency to multipotent primary neural progenitor cells (NPCs). Many pluripotency gene-enhancer interactions are anchored by CTCF, and its occupancy is lost in parallel with loop decommissioning during differentiation. Conversely, CTCF binding sites in NPCs are largely preexisting in pluripotent stem cells. Only a small number of CTCF sites arise de novo in NPCs. Another zinc finger protein, Yin Yang 1 (YY1), was identified at the base of looping interactions between NPC-specific genes and enhancers. Putative NPC-specific enhancers exhibit strong YY1 signal when engaged in 3D contacts and negligible YY1 signal when not in loops. Moreover, siRNA knockdown of Yy1 specifically disrupts interactions between key NPC enhancers and their target genes. YY1-mediated interactions between NPC regulatory elements are often nested within constitutive loops anchored by CTCF. Together, these results support a model in which YY1 acts as an architectural protein to connect developmentally regulated looping interactions; the location of YY1-mediated interactions may be demarcated in development by a preexisting topological framework created by constitutive CTCF-mediated interactions (Beagan, 2017).

Transcriptional dysregulation of MYC reveals common enhancer-docking mechanism

Transcriptional dysregulation of the MYC oncogene is among the most frequent events in aggressive tumor cells, and this is generally accomplished by acquisition of a super-enhancer somewhere within the 2.8 Mb TAD where MYC resides. These diverse cancer-specific super-enhancers, differing in size and location, interact with the MYC gene through a common and conserved CTCF binding site located 2 kb upstream of the MYC promoter. Genetic perturbation of this enhancer-docking site in tumor cells reduces CTCF binding, super-enhancer interaction, MYC gene expression, and cell proliferation. CTCF binding is highly sensitive to DNA methylation, and this enhancer-docking site, which is hypomethylated in diverse cancers, can be inactivated through epigenetic editing with dCas9-DNMT. Similar enhancer-docking sites occur at other genes, including genes with prominent roles in multiple cancers, suggesting a mechanism by which tumor cell oncogenes can generally hijack enhancers. These results provide insights into mechanisms that allow a single target gene to be regulated by diverse enhancer elements in different cell types (Schuijers, 2018).

Aberrant transcriptional activation of the MYC oncogene occurs frequently in tumor cells and is associated with tumor aggression. MYC resides within a 2.8 Mb TAD and its aberrant activation is generally accomplished by acquisition of a super-enhancer somewhere within that TAD. How these diverse cancer-specific super-enhancers loop long distances to specifically interact with MYC has not been clear. This study finds that the diverse super-enhancers commonly interact with, and depend on, a conserved CTCF binding site located 2 kb upstream of the MYC promoter. Because tumor super-enhancers can encompass genomic regions as large as 200 kb, and CTCF occupies sites that occur on average every 10 kb, there is considerable opportunity for super-enhancers to adventitiously contain a CTCF-bound site, which in turn could serve to interact with the MYC CTCF site. Thus, different tumor super-enhancers have the opportunity to form through diverse mechanisms throughout this large TAD and can exploit the MYC CTCF site to interact with and activate MYC expression (Schuijers, 2018).
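The arithmetic behind this argument can be sketched as follows. The figures of 200 kb and one CTCF site per 10 kb come from the text above; the assumption that sites are scattered independently and uniformly (a Poisson model), and the function names, are illustrative only and are not taken from the cited study.

```python
import math


def expected_ctcf_sites(region_kb, mean_spacing_kb=10.0):
    """Expected number of CTCF-bound sites in a region of the given size (kb)."""
    return region_kb / mean_spacing_kb


def prob_at_least_one_site(region_kb, mean_spacing_kb=10.0):
    """P(region holds >= 1 site), ASSUMING Poisson-distributed sites (illustrative)."""
    lam = expected_ctcf_sites(region_kb, mean_spacing_kb)
    return 1.0 - math.exp(-lam)


# A 200 kb super-enhancer with CTCF sites every 10 kb on average.
expected = expected_ctcf_sites(200)
p_any = prob_at_least_one_site(200)
print(f"expected sites: {expected:.0f}, P(at least one): {p_any:.6f}")
```

Under these simplifying assumptions a large super-enhancer would be expected to span roughly 20 CTCF-bound sites, so containing at least one is close to certain; real CTCF sites are not uniformly distributed, so the numbers indicate scale only.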

The concept that enhancer-promoter interactions generally occur within larger chromosomal loop structures such as TADs, which are themselves often formed by the interaction of CTCF proteins bound to each of the TAD loop anchors, is supported by the observations described here. These larger loop structures tend to insulate enhancers and genes within the CTCF-CTCF loops from elements outside those loops. Constraining DNA interactions within CTCF-CTCF loop structures in this manner may facilitate proper enhancer-promoter contacts (Schuijers, 2018).

The evidence described in this study argues that diverse human tumor cell super-enhancers depend on the MYC CTCF site for optimal levels of enhancer-promoter looping and mRNA expression. A recent independent study in K562 cells used a tiling CRISPR screen to systematically perturb the MYC locus and also found that full MYC expression and cell proliferation are dependent on this region, although some translocated enhancers can drive MYC expression in the absence of this CTCF site (Schuijers, 2018).

There are several potential explanations for these diverse results. It is possible that the −2 kb CTCF site is important for optimal MYC expression levels in human cells, but not in mice. It is conceivable that the deletion of a region containing the CTCF site can be compensated by features of the new enhancer landscape in the deletion mutations. Furthermore, additional mechanisms normally involved in enhancer-promoter interactions, such as YY1-YY1 interactions, may mask the loss of the CTCF site in vivo; YY1 is present in the MYC promoter region and is thus likely to contribute to DNA looping and expression (Schuijers, 2018).

These studies suggest that an additional set of human genes, beyond MYC, may utilize promoter-proximal enhancer-docking sites to mediate cell-type-specific enhancer-promoter interactions. Such CTCF-mediated enhancer-promoter interactions are generally nested within larger CTCF-mediated loops that would function as insulated neighborhoods. At these genes with CTCF-mediated enhancer docking, the promoter-proximal enhancer-docking sites tend to be constitutively bound by CTCF and these binding sites tend to be highly conserved. Indeed, two studies have reported that these genes tend to lose expression upon perturbation of CTCF, consistent with a role for CTCF in enhancer-promoter looping. Among these genes are cancer-associated genes that likely employ this mechanism to engender interactions with tumor-specific enhancers. For example, at CSNK1A1, a drug target in acute myeloid leukemia (AML) tumor cells, the evidence suggests that super-enhancers in these cancer cells use a CTCF enhancer-docking mechanism to interact with the oncogene. Thus, a CTCF-dependent enhancer-docking mechanism, which presumably facilitates interaction with different cell-specific enhancers during development, is exploited by cancer cells to dysregulate expression of prominent oncogenes (Schuijers, 2018).

MYC dysregulation is a hallmark of cancer. The c-MYC TF is an attractive target for cancer therapy because of the role that excessive c-MYC levels play in a broad spectrum of aggressive cancers, but direct pharmacologic inhibition of c-MYC remains an elusive challenge in drug discovery. The MYC enhancer-docking site, and presumably those of other oncogenes, can be repressed by dCas9-DNMT-mediated DNA methylation. Oncogene enhancer-docking sites may thus represent a vulnerability in multiple human cancers (Schuijers, 2018).

The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome

Chromatin structure plays an important role in modulating the accessibility of genomic DNA to regulatory proteins in eukaryotic cells. An integrative analysis of dozens of recent datasets generated by deep-sequencing and high-density tiling arrays revealed an array of well-positioned nucleosomes flanking sites occupied by the insulator binding protein CTCF across the human genome. These nucleosomes are highly enriched for the histone variant H2A.Z and 11 histone modifications. The distances between the center positions of the neighboring nucleosomes are largely invariant, and they were estimated to be 185 bp on average. Surprisingly, subsets of nucleosomes that are enriched in different histone modifications vary greatly in the lengths of DNA protected from micrococcal nuclease cleavage (106-164 bp). The nucleosomes enriched in those histone modifications previously implicated in active transcription tend to contain less protected DNA, indicating that these modifications are correlated with greater DNA accessibility. Another striking result obtained from this analysis is that nucleosomes flanking CTCF sites are much better positioned than those downstream of transcription start sites, the only genomic feature previously known to position nucleosomes genome-wide. This nucleosome-positioning phenomenon is not observed for other transcription factors for which genome-wide binding data were available. It is suggested that binding of CTCF provides an anchor point for positioning nucleosomes, and chromatin remodeling is an important component of CTCF function (Fu, 2008).
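As a rough illustration of what such a positioned array looks like in coordinates, the sketch below lays out 20 nucleosome centres around a CTCF site using the 185 bp average spacing reported above. The even split of ten nucleosomes per side, the distance from the CTCF site to the first nucleosome (`first_offset`), and the example coordinates are hypothetical placeholders, not values from the study.

```python
def flanking_nucleosome_centers(ctcf_center, first_offset, spacing=185, per_side=10):
    """Return sorted centre coordinates of an idealised nucleosome array.

    spacing      -- centre-to-centre distance in bp (185 bp average, from the text)
    per_side     -- nucleosomes on each flank (ASSUMED symmetric: 2 x 10 = 20)
    first_offset -- distance from the CTCF site to the first nucleosome centre
                    (HYPOTHETICAL; must be supplied by the caller)
    """
    offsets = [first_offset + i * spacing for i in range(per_side)]
    upstream = [ctcf_center - o for o in reversed(offsets)]
    downstream = [ctcf_center + o for o in offsets]
    return upstream + downstream


# Example with made-up coordinates: CTCF site at position 10,000, first offset 120 bp.
centers = flanking_nucleosome_centers(ctcf_center=10_000, first_offset=120)
print(len(centers), centers[:3], centers[-3:])
```

Real arrays become progressively less regular with distance from the anchor point, so a fixed spacing is only a first approximation.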

CCCTC-binding factor and the transcription factor T-bet orchestrate T helper 1 cell-specific structure and function at the interferon-gamma locus

How cell type-specific differences in chromatin conformation are achieved, and their contribution to gene expression, are incompletely understood. This study identified a cryptic upstream orchestrator of interferon-γ (Ifng) transcription, which is embedded within the human IL26 gene, comprised of a single CTCF-binding site and retained in all mammals, even surviving near-complete deletion of IL26 in rodents. CTCF and cohesins occupy this element in vivo in a cell-type non-specific manner. This element is approximated with two other sites located within the first intron and downstream of Ifng, where CTCF, cohesins and T-bet bind in a Th1-specific manner. These interactions, approximation of other elements within the locus to each other and to Ifng, and robust Ifng expression are dependent on CTCF and T-bet. The results demonstrate that cooperation between architectural (CTCF) and transcriptional enhancing (T-bet) factors and the elements to which they bind is required for proper Th1-specific expression of Ifng (Sekimata, 2009).

Gene-specific repression of the p53 target gene PUMA via intragenic CTCF-Cohesin binding

The p53 transcriptional program orchestrates alternative responses to stress, including cell cycle arrest and apoptosis, but the mechanism of cell fate choice upon p53 activation is not fully understood. PUMA (p53 up-regulated modulator of apoptosis), a key mediator of p53-dependent cell death, is regulated by a noncanonical, gene-specific mechanism. Using chromatin immunoprecipitation assays, it was found that the first half of the PUMA locus (approximately 6 kb) is constitutively occupied by RNA polymerase II and general transcription factors regardless of p53 activity. Using various RNA analyses, it was found that this region is constitutively transcribed to generate a long unprocessed RNA with no known coding capacity. This permissive intragenic domain is constrained by sharp chromatin boundaries, as illustrated by histone marks of active transcription (histone H3 Lys4 trimethylation [H3K4me3] and H3 Lys9 acetylation [H3K9Ac]) that precipitously transition into repressive marks (H3K9me3). Interestingly, the insulator protein CTCF (CCCTC-binding factor) and the Cohesin complex occupy these intragenic chromatin boundaries. CTCF knockdown leads to increased basal expression of PUMA concomitant with a reduction in chromatin boundary signatures. Importantly, derepression of PUMA upon CTCF depletion occurs without p53 activation or activation of other p53 target genes. Therefore, CTCF plays a pivotal role in dampening the p53 apoptotic response by acting as a gene-specific repressor (Gomes, 2010).

Mediation of CTCF transcriptional insulation by DEAD-box RNA-binding protein p68 and steroid receptor RNA activator SRA

CCCTC-binding factor (CTCF) is a DNA-binding protein that plays important roles in chromatin organization, although the mechanism by which CTCF carries out these functions is not fully understood. Recent studies show that CTCF recruits the cohesin complex to insulator sites and that cohesin is required for insulator activity. This study showed that the DEAD-box RNA helicase p68 (DDX5) and its associated noncoding RNA, steroid receptor RNA activator (SRA), form a complex with CTCF that is essential for insulator function. p68 was detected at CTCF sites in the IGF2/H19 imprinted control region (ICR) as well as other genomic CTCF sites. In vivo depletion of SRA or p68 reduced CTCF-mediated insulator activity at the IGF2/H19 ICR, increased levels of IGF2 expression, and increased interactions between the endodermal enhancer and IGF2 promoter. p68/SRA also interacts with members of the cohesin complex. Depletion of either p68 or SRA does not affect CTCF binding to its genomic sites, but does reduce cohesin binding. The results suggest that p68/SRA stabilizes the interaction of cohesin with CTCF by binding to both, and is required for proper insulator function (Yao, 2010).

A number of investigations have examined the role of factors that interact with CTCF and affect its function as an insulator protein. The chromodomain helicase protein CHD8 has been shown to be important for insulation, although its mode of action is not known. Also, it has been shown that CTCF recruits the cohesin complex to its binding sites, and that the presence of cohesin is essential to insulator activity, probably because it stabilizes long-range intranuclear interactions between CTCF sites (Li, 2008; Parelho, 2008; Wendt, 2008). This study showed that the DEAD-box RNA-binding protein p68 (DDX5) interacts with CTCF both in vivo and in vitro, and that the noncoding RNA SRA, a functionally important RNA known to associate with p68, immunoprecipitates with CTCF. It was also shown that both p68 protein and SRA are necessary for the activity of CTCF as an insulator element in vivo (Yao, 2010).

The DEAD-box RNA helicase p68 and the partially homologous protein p72 are RNA-binding proteins that are involved in a wide variety of regulatory and biosynthetic functions. p68 is required for ribosome biogenesis, and its ATPase/helicase activities are important for pre-mRNA splicing and microRNA processing. Notably, p68 is an essential component of the Drosha complex. p68 also functions as a cofactor for a variety of transcriptional regulatory proteins, including ERα, the p53 tumor suppressor, and MyoD. At least some of these functions do not require the helicase activity of p68, and probably involve a distinct independent mechanism. In many of these cases, the active form of p68 may involve a complex with p72. Although this study focused on the role of p68, data suggest that p72 also plays a role in CTCF function. The p68/p72 proteins specifically bind SRA, a functionally important RNA, and it has been shown that the coactivation of MyoD by p68 depends on the presence of SRA. SRA has also been shown to bind other proteins and modulate their activity. Since some splice variants of SRA code for protein, it is necessary to distinguish the activity of the RNA from that of the protein (Yao, 2010).

The CTCF-p68 interaction is critical to CTCF function as an enhancer-blocking insulator, as demonstrated by transient expression experiments with a reporter carrying CTCF-binding sites in which p68 is depleted by shRNA. Additionally, ChIP experiments show that p68 is present at the ICR of the human IGF2/H19 locus on chromosome 11 in HeLa cells, as well as the equivalent site on mouse chromosome 7 in MEF cells. Depletion of p68 results in an increase in IGF2 expression and a decrease in H19 expression, similar to that observed in HeLa cells upon depletion of cohesin components (Wendt, 2008). Loss of p68 also results in an increase in genomic contacts, as measured by 3C, between the endodermal enhancer and sites upstream of IGF2, consistent with loss of insulator function. The loss of p68 is not accompanied by a decrease in CTCF binding. It was also found that the binding of p68 to CTCF is RNA-dependent: Depletion of ssRNA by RNase A or down-regulation of SRA inhibited the CTCF-p68 interaction. This is similar to the behavior of the interaction between p68 and p53. It is therefore not surprising that the ability of CTCF to act as an insulator also depends on SRA. The protein SRAP did not coprecipitate with CTCF either in vitro or in vivo, suggesting that it is not involved in the CTCF-p68 interaction (Yao, 2010).

It has been reported that the Drosophila DEAD-box putative RNA helicase protein Rm62, which is homologous to p68, interacts physically with the DNA-binding insulator protein CP190 in an ssRNA-dependent manner and negatively regulates gypsy insulator function (Lei, 2006). It is striking that, in the case of Drosophila, the interacting factors are different from (CP190 rather than CTCF) and the effects are the opposite of (inhibitory rather than activating) those in vertebrates. These results hint at a common mechanism of action that has diverged (Yao, 2010).

What is the role of p68 in CTCF-dependent insulator function? In addition to its interaction with CTCF, p68 also bound to a component of the cohesin complex in vitro. Cohesin interacts with CTCF and is essential to insulator function. Previous studies have shown that loss of cohesin does not affect CTCF binding at most sites (Parelho, 2008). Similarly, depletion of p68 or its associated RNA, SRA, did not affect CTCF binding to the IGF2/H19 ICR in vivo; however, depletion of either did result in loss of cohesin from those sites, showing that the interactions observed in vitro are important in vivo. These data support a model in which cohesin binding to CTCF at the IGF2/H19 locus is further stabilized by cohesin interaction with p68/SRA. It is suggested that the effects on insulator function that were observed when p68/SRA is depleted reflect, at least in part, the loss of cohesin from the site. p68 is also found at sites occupied by ERα and cohesin. It will be interesting to determine whether, at such sites, p68 plays a role in stabilizing cohesin localization (Yao, 2010).

These results show that CTCF sites, the majority of which recruit cohesin, may require additional components to establish long-range interactions and maintain an active insulator complex. It remains to be determined whether p68/SRA, known to have multiple regulatory activities, contributes in other ways to CTCF function (Yao, 2010).

Nonallelic transcriptional roles of CTCF and cohesins at imprinted loci

The cohesin complex holds sister chromatids together and is essential for chromosome segregation. Recently, cohesins have been implicated in transcriptional regulation and insulation through genome-wide colocalization with the insulator protein CTCF, including involvement at the imprinted H19/Igf2 locus. CTCF binds to multiple imprinted loci and is required for proper imprinted expression at the H19/Igf2 locus. This study reports that cohesins colocalize with CTCF at two additional imprinted loci, the Dlk1-Dio3 and the Kcnq1/Kcnq1ot1 loci. Similar to the H19/Igf2 locus, CTCF and cohesins preferentially bind to the Gtl2 differentially methylated region (DMR) on the unmethylated maternal allele. To determine the functional importance of the binding of CTCF and cohesins at the three imprinted loci, CTCF and cohesins were depleted in mouse embryonic fibroblast cells. The monoallelic expression of imprinted genes at these three loci was maintained. However, mRNA levels for these genes were typically increased; for H19 and Igf2 the increased level of expression was independent of the CTCF-binding sites in the imprinting control region. Results of these experiments demonstrate an unappreciated role for CTCF and cohesins in the repression of imprinted genes in somatic cells (Lin, 2011).

A chromatin code for alternative splicing involving a putative association between CTCF and HP1alpha proteins

Alternative splicing is primarily controlled by the activity of splicing factors and by the elongation of the RNA polymerase II (RNAPII). Recent experiments have suggested a new complex network of splicing regulation involving chromatin, transcription and multiple protein factors. In particular, the CCCTC-binding factor (CTCF), the Argonaute protein AGO1, and members of heterochromatin protein 1 (HP1) family have been implicated in the regulation of splicing associated to chromatin and the elongation of RNAPII. These results raise the question of whether these proteins may associate at the chromatin level to modulate alternative splicing. Using ChIP-Seq data for CTCF, AGO1, HP1alpha, H3K27me3, H3K9me2, H3K36me3, RNAPII, total H3 and 5metC and alternative splicing arrays from two cell lines, this study analyzed the combinatorial code of their binding to chromatin in relation to the alternative splicing patterns between two mammalian cell lines, MCF7 and MCF10. Using Machine Learning techniques, the changes were obtained in chromatin signals that are most significantly associated to splicing regulation between these two breast cancer cell lines. Moreover, a map was built of the chromatin signals on the pre-mRNA, i.e., a chromatin-based RNA-map, which can explain 606 (68.55%) of the regulated events between MCF7 and MCF10. This chromatin code involves the presence of HP1alpha, CTCF, AGO1, RNAPII and histone marks around regulated exons and can differentiate patterns of skipping and inclusion. Additionally, a significant association of HP1alpha and CTCF activities was found around the regulated exons and a putative DNA binding site for HP1alpha. These results show that a considerable number of alternative splicing events could have a chromatin-dependent regulation involving the association of HP1alpha and CTCF near regulated exons. Additionally, further evidence was found for the involvement of HP1alpha and AGO1 in chromatin-related splicing regulation (Agirre, 2015).

This work has derived a chromatin code for splicing that involves binding signals for HP1α and CTCF, as well as for AGO1, RNAPII and histone marks, around regulated exons. Feature selection and cross-validation show that this regulatory code is predictive for nearly 70% of the alternative splicing events regulated between two cell lines, MCF7 and MCF10, providing further evidence for a role of chromatin in the regulation of alternative splicing. This code also provides evidence for specific associations of various factors in relation to splicing differences between the two studied cell lines. This model shows that AGO1 activity downstream of alternative exons correlates with splicing changes in the direction of skipping in MCF7 compared to MCF10A, providing further indication that AGO1 association to chromatin could be implicated in splicing regulation. The previously described increased binding of CTCF downstream of inclusion events was also uncovered. Additionally, the density of RNAPII downstream of regulated exons, which tends to co-occur with CTCF and HP1α, is an informative attribute to predict splicing change; and a relative increase in the region flanking the exon correlates with exon skipping in MCF7 compared to MCF10A. The association of the RNAPII density related to exon definition has been observed before and there is plenty of evidence supporting a regulation of alternative splicing associated with RNAPII elongation rates. These results corroborate the importance of RNAPII occupancy in exon inclusion or skipping, and provide directionality in the relation between density changes and the pattern of differential splicing between cell lines (Agirre, 2015).

H3K36me3 also appeared as a relevant mark for splicing decisions in the current model. Several reports have described H3K36me3 as an exon marker and there is evidence of higher densities of H3K36me3 at constitutive exons compared to alternative exons. However, the opposite pattern has also been described, as for specific genes an increased density of H3K36me3 has been related to exon skipping, which agrees with the current study. Since this study only analyzed splicing events in genes that do not change expression, the results imply that the observed changes in H3K36me3 signal near exon boundaries were not a consequence of gene expression, and could indeed correspond to a role in splicing (Agirre, 2015).

Interestingly, this study found a strong association between CTCF and HP1α signals genome-wide and intragenically, and the activity of both factors correlates with exon inclusion. Besides acting as an insulator, CTCF is involved in the splicing regulation of some exons as an antagonist of DNA methylation and also works as a barrier for spreading of heterochromatin, through which it can influence RNAPII elongation. These analyses show that HP1α binding downstream of the cassette exons, with the co-localization of CTCF, affects alternative splicing. HP1α belongs to a family of non-histone chromosomal proteins and is a key player in the transcriptional gene silencing (TGS) pathway. HP1 proteins have already been linked to the regulation of splicing by chromatin. In particular, a study published during the conclusion of this work also describes a positional effect on splicing for HP1 proteins, providing further evidence of the relevance of the HP1 family in linking chromatin with RNA processing and giving support to the current model. The same study found that HP1 proteins could act as mediators between DNA methylation and splicing for a subset of the regulated events. Although there have been previous reports of a relation between DNA methylation and alternative splicing, this study did not find it to be a strong determinant of the splicing changes between MCF7 and MCF10 cells, indicating that the HP1-dependent code that this study describes is related to a DNA methylation-independent effect that may be more prevalent in the investigated cell types (Agirre, 2015).

Even though there is only limited evidence of direct DNA-binding for HP1α, this study found a consensus motif associated to the significant HP1α ChIP-Seq signals, which is highly specific to these signals and non-overlapping with the motifs for CTCF, AGO1 or H3K9me2. HP1 proteins generally consist of two highly conserved domains. While one of the domains is known to bind H3K9me, the other one acts as the interaction interface with other proteins. The two domains are separated by a hinge region of variable length, which has been related to DNA and RNA binding. The motif found may be related to a sequence-specific interaction of this protein region with DNA, which may act as a modulator of the interaction of HP1 with H3K9 methylation. Recent analyses also provide evidence of HP1 proteins interacting with RNA binding proteins, highlighting their plasticity and central role in RNA processing regulation linked to chromatin (Agirre, 2015).

This study also found a frequent overlap of AGO1 with CTCF and HP1α clusters, but not the other way around. Moreover, HP1α was found in the same downstream region as AGO1 but in the direction of inclusion, and regulating a distinct set of events. Depletion of AGO1 expression can induce splicing changes in both directions but generally decreases splicing efficiency. These analyses show that AGO1 and the co-localized CTCF and HP1α produce splicing changes in opposite directions. Despite the co-localization of AGO1 with CTCF and HP1α binding sites, this study found a weak but independent binding motif for AGO1. Recent analyses have produced candidate binding motifs for Drosophila and mouse Argonaute proteins. However, the motif from the current study does not resemble any of these motifs, suggesting a DNA-independent association of AGO1 to chromatin (Agirre, 2015).

Different models to predict the splicing outcome, also called splicing codes, have been proposed before, but these did not include chromatin marks or proteins that interact with chromatin, like HP1, AGO1, CTCF and RNAPII, as described in this study. These analyses thus complement the previous descriptions by incorporating these new determinants of alternative splicing regulation. Although motifs in the pre-mRNA sequence remain the main determinants of splicing regulation, this analysis indicates that a considerable fraction of events may be influenced by the properties of chromatin. There have been previous attempts to establish a general relation between histone marks and splicing regulation. However, a predictive model was proposed in only one case. Additionally, these approaches analyzed the relation between chromatin and splicing looking at one single condition at a time, rather than comparing two conditions, and exons were classified as constitutive or alternative based on RNA data from one single condition, rather than distinguishing those that are regulated from non-regulated ones between two conditions. The current approach has the advantage that, by comparing two conditions locally, it circumvents the caveats of comparing genomic regions with different sequence and structural properties. Moreover, the current approach relates changes of the chromatin signal between two conditions to the splicing changes of exons between the same two conditions, which provides a better descriptor of the association between chromatin changes and splicing regulation. In summary, this study has shown that a chromatin code for splicing can be defined involving HP1α, CTCF, RNAPII, various histone marks and AGO1, which can differentiate patterns of skipping, inclusion and non-regulated exons between two conditions. Additionally, the conserved motif found for HP1α and the presence of HP1α and AGO1 in the described splicing code provide further support for their involvement in chromatin-related splicing regulation (Agirre, 2015).

Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes

The pluripotent state of embryonic stem cells (ESCs) is produced by active transcription of genes that control cell identity and repression of genes encoding lineage-specifying developmental regulators. This study uses ESC cohesin ChIA-PET data to identify the local chromosomal structures at both active and repressed genes across the genome. The results produce a map of enhancer-promoter interactions and reveal that super-enhancer-driven genes generally occur within chromosome structures that are formed by the looping of two interacting CTCF sites co-occupied by cohesin (see Drosophila Cohesin). These looped structures form insulated neighborhoods whose integrity is important for proper expression of local genes. It was also found that repressed genes encoding lineage-specifying developmental regulators occur within insulated neighborhoods. These results provide insights into the relationship between transcriptional control of cell identity genes and control of local chromosome structure (Dowen, 2014).

CTCF regulates NELF, DSIF and P-TEFb recruitment during transcription

CTCF is a versatile transcription factor with well-established roles in chromatin organization and insulator function. Recent findings also implicate CTCF in the control of elongation by RNA polymerase (pol) II. This study shows that CTCF knockdown in HeLa abrogates pol II pausing at the early elongation checkpoint of c-myc by affecting recruitment of DRB-sensitivity-inducing factor (DSIF). CTCF knockdown also causes a termination defect on the U2 snRNA genes (U2), by affecting recruitment of negative elongation factor (NELF). In addition, CTCF is required for recruitment of positive elongation factor b (P-TEFb), which phosphorylates NELF, DSIF and Ser2 of the pol II CTD to activate elongation of transcription of c-myc and recognition of the snRNA gene-specific 3' box RNA processing signal. These findings implicate CTCF in a complex network of protein:protein/protein:DNA interactions and assign a key role to CTCF in controlling pol II transcription through the elongation checkpoint of the protein-coding c-myc and the termination site of the non-coding U2, by regulating the recruitment and/or activity of key players in these processes (Laitem, 2015).

Structural organization of the inactive X chromosome in the mouse

X-chromosome inactivation (XCI) involves major reorganization of the X chromosome as it becomes silent and heterochromatic. During female mammalian development, XCI is triggered by upregulation of the non-coding Xist RNA from one of the two X chromosomes. Xist coats the chromosome in cis and induces silencing of almost all genes via its A-repeat region. A role for Xist in organizing the inactive X (Xi) chromosome has been proposed. Recent chromosome conformation capture approaches have revealed global loss of local structure on the Xi chromosome and formation of large mega-domains, separated by a region containing the DXZ4 macrosatellite. This study investigated the structure, chromatin accessibility and expression status of the mouse Xi chromosome in highly polymorphic clonal neural progenitors (NPCs) and embryonic stem cells. A crucial role for Xist and the DXZ4-containing boundary was demonstrated in shaping Xi chromosome structure using allele-specific genome-wide chromosome conformation capture (Hi-C) analysis, an assay for transposase-accessible chromatin with high throughput sequencing (ATAC-seq) and RNA sequencing. Deletion of the boundary disrupts mega-domain formation, and induction of Xist RNA initiates formation of the boundary and the loss of DNA accessibility. It was also shown that in NPCs, the Xi chromosome lacks active/inactive compartments and topologically associating domains (TADs), except around genes that escape XCI. Escapee gene clusters display TAD-like structures and retain DNA accessibility at promoter-proximal and CTCF-binding sites. Furthermore, altered patterns of facultative escape genes in different neural progenitor clones are associated with the presence of different TAD-like structures after XCI. These findings suggest a key role for transcription and CTCF in the formation of TADs in the context of the Xi chromosome in neural progenitors (Giorgetti, 2016).

CTCF-mediated topological boundaries during development foster appropriate gene regulation

The genome is organized into repeating topologically associated domains (TADs) (see Drosophila chromatin organization), each of which is spatially isolated from its neighbor by poorly understood boundary elements thought to be conserved across cell types. This study shows that deletion of CTCF (CCCTC-binding factor)-binding sites at TAD and sub-TAD topological boundaries that form within the HoxA (see Drosophila lab) and HoxC (see Drosophila Dfd) clusters during differentiation of mouse embryonic stem cells not only disturbs local chromatin domain organization and regulatory interactions but also results in homeotic transformations typical of Hox gene misregulation. Moreover, CTCF-dependent boundary function can be modulated by competing forces, such as the self-assembly of polycomb domains within the nucleus. Therefore, CTCF boundaries are not merely static structural components of the genome but instead are locally dynamic regulatory structures that control gene expression during development (Narendra, 2016).

CTCF and cohesin regulate chromatin loop stability with distinct dynamics

Folding of mammalian genomes into spatial domains is critical for gene regulation. The insulator protein CTCF (see Drosophila CTCF) and cohesin (see Drosophila Cohesin) control domain location by folding domains into loop structures, which are widely thought to be stable. Combining genomic and biochemical approaches, this study shows that CTCF and cohesin co-occupy the same sites and physically interact as a biochemically stable complex. However, using single-molecule imaging it was found that CTCF binds chromatin much more dynamically than cohesin (~1-2 min vs. ~22 min residence time). Moreover, after unbinding, CTCF quickly rebinds another cognate site, unlike cohesin, for which the search process is long (~1 min vs. ~33 min). Thus, CTCF and cohesin form a rapidly exchanging 'dynamic complex' rather than a typical stable complex. Since CTCF and cohesin are required for loop domain formation, these results suggest that chromatin loops are dynamic and constantly break and reform throughout the cell cycle (Hansen, 2017).
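
The residence and search times quoted above can be turned into a rough estimate of how much of the time each factor is bound at a cognate site. The sketch below is not taken from Hansen (2017); it assumes a simple two-state cycle (bound, then searching, then bound again), and the function name bound_fraction is hypothetical. It uses only the approximate times given in the summary.

```python
def bound_fraction(residence_min, search_min):
    """Fraction of time spent bound in an assumed two-state bind/search cycle."""
    return residence_min / (residence_min + search_min)

# Approximate times (minutes) quoted in the summary above.
ctcf_low = bound_fraction(1.0, 1.0)    # CTCF: residence ~1 min, search ~1 min
ctcf_high = bound_fraction(2.0, 1.0)   # CTCF: residence ~2 min, search ~1 min
cohesin = bound_fraction(22.0, 33.0)   # cohesin: residence ~22 min, search ~33 min

print(f"CTCF bound fraction: {ctcf_low:.2f} to {ctcf_high:.2f}")
print(f"Cohesin bound fraction: {cohesin:.2f}")
```

Under these assumptions a CTCF molecule would spend roughly half to two-thirds of its time bound, compared with about 40% for cohesin. The estimate ignores nonspecific binding and differences in protein copy number, so it should be read only as an illustration of the quoted time scales.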


Search PubMed for articles about Drosophila CTCF

Agirre, E., Bellora, N., Allo, M., Pages, A., Bertucci, P., Kornblihtt, A. R. and Eyras, E. (2015). A chromatin code for alternative splicing involving a putative association between CTCF and HP1alpha proteins. BMC Biol 13: 31. PubMed ID: 25934638

Baniahmad, A., Steiner, C., Kohne, A. C. and Renkawitz, R. (1990). Modular structure of a chicken lysozyme silencer: involvement of an unusual thyroid hormone receptor binding site. Cell 61: 505-514. PubMed ID: 2159385

Barbieri, M., Chotalia, M., Fraser, J., Lavitas, L. M., Dostie, J., Pombo, A. and Nicodemi, M. (2012). Complexity of chromatin folding is captured by the strings and binders switch model. Proc Natl Acad Sci U S A 109: 16173-16178. PubMed ID: 22988072

Beagan, J. A., Duong, M. T., Titus, K. R., Zhou, L., Cao, Z., Ma, J., Lachanski, C. V., Gillis, D. R. and Phillips-Cremins, J. E. (2017). YY1 and CTCF orchestrate a 3D chromatin looping switch during early neural lineage commitment. Genome Res 27(7): 1139-1152. PubMed ID: 28536180

Bell, A. C., West, A. G. and Felsenfeld, G. (1999). The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell 98: 387-396. PubMed ID: 10458613

Benedetti, F., Dorier, J., Burnier, Y. and Stasiak, A. (2014). Models that include supercoiling of topological domains reproduce several known features of interphase chromosomes. Nucleic Acids Res 42: 2848-2855. PubMed ID: 24366878

Bowman, S. K., Deaton, A. M., Domingues, H., Wang, P. I., Sadreyev, R. I., Kingston, R. E. and Bender, W. (2014). H3K27 modifications define segmental regulatory domains in the Drosophila bithorax complex. Elife 3: e02833. PubMed ID: 25082344

Burcin, M., et al. (1997). Negative protein 1, which is required for function of the chicken lysozyme gene silencer in conjunction with hormone receptors, is identical to the multivalent zinc finger repressor CTCF. Mol Cell Biol 17: 1281-1288. PubMed ID: 9032255

Burke, L. J., Hollemann, T., Pieler, T. and Renkawitz, R. (2002). Molecular cloning and expression of the chromatin insulator protein CTCF in Xenopus laevis. Mech Dev 113: 95-98. PubMed ID: 11900981

Bushey, A. M., Ramos, E. and Corces, V. G. (2009). Three subclasses of a Drosophila insulator show distinct and cell type-specific genomic distributions. Genes Dev. 23(11): 1338-50. PubMed ID: 19443682

Butcher, R. D., et al. (2004). The Drosophila centrosome-associated protein CP190 is essential for viability but not for cell division. J. Cell Sci. 117: 1191-1199. PubMed ID: 14996941

Capelson, M. and Corces, V. G. (2005). The ubiquitin ligase dTopors directs the nuclear organization of a chromatin insulator. Mol Cell 20: 105-116. PubMed ID: 16209949

Ciavatta, D., Rogers, S. and Magnuson, T. (2007). Drosophila CTCF is required for Fab-8 enhancer blocking activity in S2 cells. J. Mol. Biol. 373(2): 233-9. PubMed ID: 17825318

Cubenas-Potts, C., Rowley, M. J., Lyu, X., Li, G., Lei, E. P. and Corces, V. G. (2017). Different enhancer classes in Drosophila bind distinct architectural proteins and mediate unique chromatin interactions and 3D architecture. Nucleic Acids Res 45(4): 1714-1730. PubMed ID: 27899590

Cuddapah, S., et al. (2009). Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains. Genome Res. 19: 24-32. PubMed ID: 19056695

Degner, S. C., et al. (2009). Cutting edge: Developmental stage-specific recruitment of cohesin to CTCF sites throughout immunoglobulin loci during B lymphocyte development. J. Immunol. 182: 44-48. PubMed ID: 19109133

Dowen, J. M., Fan, Z. P., Hnisz, D., Ren, G., Abraham, B. J., Zhang, L. N., Weintraub, A. S., Schuijers, J., Lee, T. I., Zhao, K. and Young, R. A. (2014). Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159: 374-387. PubMed ID: 25303531

Eagen, K. P., Aiden, E. L. and Kornberg, R. D. (2017). Polycomb-mediated chromatin loops revealed by a subkilobase-resolution chromatin interaction map. Proc Natl Acad Sci U S A 114(33): 8764-8769. PubMed ID: 28765367

Filippova, G. N., et al. (2001). CTCF-binding sites flank CTG/CAG repeats and form a methylation-sensitive insulator at the DM1 locus. Nat Genet 28: 335-343. PubMed ID: 11479593

Fu, Y., Sinha, M., Peterson, C. L. and Weng, Z. (2008). The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS Genet. 4(7): e1000138. PubMed ID: 18654629

Giorgetti, L., Lajoie, B. R., Carter, A. C., Attia, M., Zhan, Y., Xu, J., Chen, C. J., Kaplan, N., Chang, H. Y., Heard, E. and Dekker, J. (2016). Structural organization of the inactive X chromosome in the mouse. Nature 535: 575-579. PubMed ID: 27437574

Golovnin, A., Melnikova, L., Shapovalov, I., Kostyuchenko, M. and Georgiev, P. (2015). EAST organizes Drosophila insulator proteins in the interchromosomal nuclear compartment and modulates CP190 binding to chromatin. PLoS One 10: e0140991. PubMed ID: 26489095

Gomes, N. P. and Espinosa, J. M. (2010). Gene-specific repression of the p53 target gene PUMA via intragenic CTCF-Cohesin binding. Genes Dev. 24(10): 1022-34. PubMed ID: 20478995

Hansen, A. S., Pustova, I., Cattoglio, C., Tjian, R. and Darzacq, X. (2017). CTCF and cohesin regulate chromatin loop stability with distinct dynamics. Elife 6. PubMed ID: 28467304

Hark, A. T., Schoenherr, C. J., Katz, D. J., Ingram, R. S., Levorse, J. M. and Tilghman, S. M. (2000). CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405: 486-489. PubMed ID: 10839547

Harmston, N., Ing-Simmons, E., Tan, G., Perry, M., Merkenschlager, M. and Lenhard, B. (2017). Topologically associating domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation. Nat Commun 8(1): 441. PubMed ID: 28874668

He, Q., Bardet, A. F., Patton, B., Purvis, J., Johnston, J., Paulson, A., Gogol, M., Stark, A. and Zeitlinger, J. (2011). High conservation of transcription factor binding and evidence for combinatorial regulation across six Drosophila species. Nat Genet 43: 414-420. PubMed ID: 21478888

Heger, P., Marin, B., Bartkuhn, M., Schierenberg, E. and Wiehe, T. (2012). The chromatin insulator CTCF and the emergence of metazoan diversity. Proc Natl Acad Sci U S A 109: 17507-17512. PubMed ID: 23045651

Holohan, E. E., et al. (2007). CTCF genomic binding sites in Drosophila and the Organisation of the Bithorax Complex. PLoS Genet. 3(7): e112. PubMed ID: 17616980

Hou, L., Wang, L., Berg, A., Qian, M., Zhu, Y., Li, F. and Deng, M. (2012). Comparison and evaluation of network clustering algorithms applied to genetic interaction networks. Front Biosci (Elite Ed) 4: 2150-2161. PubMed ID: 22202027

Jost, D., Carrivain, P., Cavalli, G. and Vaillant, C. (2014). Modeling epigenome folding: formation and dynamics of topologically associated chromatin domains. Nucleic Acids Res 42: 9553-9561. PubMed ID: 25092923

Kanduri, C., et al. (2000). Functional association of CTCF with the insulator upstream of the H19 gene is parent of origin-specific and methylation-sensitive. Curr. Biol. 10: 853-856. PubMed ID: 10899010

Kim, T. H., et al. (2007). Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128: 1231-1245. PubMed ID: 17382889

Kuhn, E. J., Viering, M. M., Rhodes, K. M. and Geyer, P. K. (2003). A test of insulator interactions in Drosophila. EMBO J 22: 2463-2471. PubMed ID: 12743040

Kurukuti, S., Tiwari, V. K., Tavoosidana, G., Pugacheva, E., Murrell, A., et al. (2006). CTCF binding at the H19 imprinting control region mediates maternally inherited higher-order chromatin conformation to restrict enhancer access to Igf2. Proc. Natl. Acad. Sci. 103: 10684-10689. PubMed ID: 16815976

Kyrchanova, O., Toshchakov, S., Podstreshnaya, Y., Parshikov, A. and Georgiev, P. (2008). Functional interaction between the Fab-7 and Fab-8 boundaries and the upstream promoter region in the Drosophila Abd-B gene. Mol. Cell. Biol. 28(12): 4188-95. PubMed ID: 18426914

Kyrchanova, O., Zolotarev, N., Mogila, V., Maksimenko, O., Schedl, P. and Georgiev, P. (2017). Architectural protein Pita cooperates with dCTCF in organization of functional boundaries in Bithorax Complex. Development [Epub ahead of print]. PubMed ID: 28619827

Laitem, C., Zaborowska, J., Tellier, M., Yamaguchi, Y., Qingfu, C., Egloff, S., Handa, H. and Murphy, S. (2015). CTCF regulates NELF, DSIF and P-TEFb recruitment during transcription. Transcription 6(5):79-90. PubMed ID: 26399478

Lanzuolo, C., et al. (2007). Polycomb response elements mediate the formation of chromosome higher-order structures in the bithorax complex. Nat. Cell Biol. 9: 1167-1174. PubMed ID: 17828248

Le, T. B., Imakaev, M. V., Mirny, L. A. and Laub, M. T. (2013). High-resolution mapping of the spatial organization of a bacterial chromosome. Science 342: 731-734. PubMed ID: 24158908

Lee, J. T. (2003). Molecular links between X-inactivation and autosomal imprinting: X-inactivation as a driving force for the evolution of imprinting? Curr. Biol. 13: R242-R254. PubMed ID: 12646153

Lei, E. P. and Corces, V. G. (2006). RNA interference machinery influences the nuclear organization of a chromatin insulator. Nat. Genet. 38: 936-941. PubMed ID: 16862159

Lewis, A. and Murrell, A. (2004). Genomic imprinting: CTCF protects the boundaries. Curr. Biol. 14: R284-R286. PubMed ID: 15062124

Li, H. B., Ohno, K., Gui, H. and Pirrotta, V. (2013). Insulators target active genes to transcription factories and polycomb-repressed genes to polycomb bodies. PLoS Genet 9: e1003436. PubMed ID: 23637616

Li, M., Belozerov, V. E. and Cai, H. N. (2009). Analysis of chromatin boundary activity in Drosophila cells. BMC Mol. Biol. 9: 109. PubMed ID: 19077248

Li, T., et al. (2008). CTCF regulates allelic expression of Igf2 by orchestrating a promoter-polycomb repressive complex 2 intrachromosomal loop. Mol Cell Biol 28: 6473-6482. PubMed ID: 18662993

Lin, S., et al. (2011). Nonallelic transcriptional roles of CTCF and cohesins at imprinted loci. Mol. Cell Biol. 31(15): 3094-104. PubMed ID: 21628529

Lobanenkov, V. V., Nicolas, R. H., Adler, V. V., Paterson, H., Klenova, E. M., Polotskaja, A. V. and Goodwin, G. H. (1990). A novel sequence-specific DNA binding protein which interacts with three regularly spaced direct repeats of the CCCTC-motif in the 5'-flanking sequence of the chicken c-myc gene. Oncogene 5: 1743-1753. PubMed ID: 2284094

Loukinov, D. I., et al. (2002). BORIS, a novel male germ-line-specific protein associated with epigenetic reprogramming events, shares the same 11-zinc-finger domain with CTCF, the insulator protein involved in reading imprinting marks in the soma. Proc. Natl. Acad. Sci. 99: 6806-6811. PubMed ID: 12011441

Lutz, M., et al. (2000). Transcriptional repression by the insulator protein CTCF involves histone deacetylases. Nucleic Acids Res 28: 1707-1713. PubMed ID: 10734189

MacDonald, W. A., et al. (2010). The Drosophila homolog of the mammalian imprint regulator, CTCF, maintains the maternal genomic imprint in Drosophila melanogaster. BMC Biol. 8: 105. PubMed ID: 20673338

Magbanua, J. P., Runneburger, E., Russell, S. and White, R. (2014). A variably occupied CTCF binding site in the Ultrabithorax gene in the Drosophila Bithorax Complex. Mol Cell Biol 35(1):318-30. PubMed ID: 25368383

Maksimenko, O., Kyrchanova, O., Bonchuk, A., Stakhov, V., Parshikov, A. and Georgiev, P. (2014). Highly conserved ENY2/Sus1 protein binds to Drosophila CTCF and is required for barrier activity. Epigenetics 9(9): 1261-70. PubMed ID: 25147918

Maksimenko, O., Bartkuhn, M., Stakhov, V., Herold, M., Zolotarev, N., Jox, T., Buxa, M. K., Kirsch, R., Bonchuk, A., Fedotova, A., Kyrchanova, O., Renkawitz, R. and Georgiev, P. (2015). Two new insulator proteins, Pita and ZIPIC, target CP190 to chromatin. Genome Res 25(1): 89-99. PubMed ID: 25342723

Mohan, M., et al. (2007). The Drosophila insulator proteins CTCF and CP190 link enhancer blocking to body patterning. EMBO J. 26(19): 4203-14. PubMed ID: 17805343

Moon, H., et al. (2005). CTCF is conserved from Drosophila to humans and confers enhancer blocking of the Fab-8 insulator. EMBO Rep. 6(2): 165-70. PubMed ID: 15678159

Moshkovich, N., et al. (2011). RNAi-independent role for Argonaute2 in CTCF/CP190 chromatin insulator function. Genes Dev. 25(16): 1686-701. PubMed ID: 21852534

Nègre, N., et al. (2010). A comprehensive map of insulator elements for the Drosophila genome. PLoS Genet. 6(1): e1000814. PubMed ID: 20084099

Narendra, V., Bulajić, M., Dekker, J., Mazzoni, E.O. and Reinberg, D. (2016). CTCF-mediated topological boundaries during development foster appropriate gene regulation. Genes Dev 30: 2657-2662. PubMed ID: 28087711

Ni, X., Zhang, Y. E., Negre, N., Chen, S., Long, M. and White, K. P. (2012). Adaptive evolution and the birth of CTCF binding sites in the Drosophila genome. PLoS Biol 10: e1001420. PubMed ID: 23139640

Ohlsson, R., Renkawitz, R. and Lobanenkov, V. (2001). CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet 17: 520-527. PubMed ID: 11525835

Pai, C. Y., Lei, E. P., Ghosh, D. and Corces, V. G. (2004). The centrosomal protein CP190 is a component of the gypsy chromatin insulator. Mol. Cell 16: 737-748. PubMed ID: 15574329

Parelho, V., et al. (2008). Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell 132: 422-433. PubMed ID: 18237772

Ramírez, F., Lingg, T., Toscano, S., Lam, K. C., Georgiev, P., Chung, H. R., Lajoie, B. R., de Wit, E., Zhan, Y., de Laat, W., Dekker, J., Manke, T. and Akhtar, A. (2015). High-affinity sites form an interaction network to facilitate spreading of the MSL complex across the X chromosome in Drosophila. Mol Cell 60: 146-162. PubMed ID: 26431028

Ramirez, F., Bhardwaj, V., Arrigoni, L., Lam, K. C., Gruning, B. A., Villaveces, J., Habermann, B., Akhtar, A. and Manke, T. (2018). High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat Commun 9(1): 189. PubMed ID: 29335486

Rathke, C., et al. (2007). Transition from a nucleosome-based to a protamine-based chromatin configuration during spermiogenesis in Drosophila. J. Cell Sci. 120(Pt 9): 1689-700. PubMed ID: 17452629

Schuijers, J., Manteiga, J. C., Weintraub, A. S., Day, D. S., Zamudio, A. V., Hnisz, D., Lee, T. I. and Young, R. A. (2018). Transcriptional dysregulation of MYC reveals common enhancer-docking mechanism. Cell Rep 23(2): 349-360. PubMed ID: 29641996

Schwartz, Y. B., et al. (2012). Nature and function of insulator protein binding sites in the Drosophila genome. Genome Res 22: 2188-2198. PubMed ID: 22767387

Sekimata, M., et al. (2009). CCCTC-binding factor and the transcription factor T-bet orchestrate T helper 1 cell-specific structure and function at the interferon-gamma locus. Immunity 31(4): 551-64. PubMed ID: 19818655

Sexton, T., Yaffe, E., Kenigsberg, E., Bantignies, F., Leblanc, B., Hoichman, M., Parrinello, H., Tanay, A. and Cavalli, G. (2012). Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148: 458-472. PubMed ID: 22265598

Splinter, E., et al. (2006). CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus. Genes Dev 20: 2349-2354. PubMed ID: 16951251

Szabo, P., Tang, S. H., Rentsendorj, A., Pfeifer, G. P. and Mann, J. R. (2000). Maternal-specific footprints at putative CTCF sites in the H19 imprinting control region give evidence for insulator function. Curr Biol 10: 607-610. PubMed ID: 10837224

Tanimoto, K., Sugiura, A., Omori, A., Felsenfeld, G., Engel, J. D. and Fukamizu, A. (2003). Human β-globin locus control region HS5 contains CTCF- and developmental stage-dependent enhancer-blocking activity in erythroid cells. Mol Cell Biol 23: 8946-8952. PubMed ID: 14645507

Ulianov, S. V., Khrameeva, E. E., Gavrilov, A. A., Flyamer, I. M., Kos, P., Mikhaleva, E. A., Penin, A. A., Logacheva, M. D., Imakaev, M. V., Chertovich, A., Gelfand, M. S., Shevelyov, Y. Y. and Razin, S. V. (2015). Active chromatin and transcription play a key role in chromosome partitioning into topologically associating domains. Genome Res. PubMed ID: 26518482

Van Bortle, K., Ramos, E., Takenaka, N., Yang, J., Wahi, J. E. and Corces, V. G. (2012). Drosophila CTCF tandemly aligns with other insulator proteins at the borders of H3K27me3 domains. Genome Res. 22(11): 2176-87. PubMed ID: 22722341

Van Bortle, K., Peterson, A. J., Takenaka, N., O'Connor, M. B. and Corces, V. G. (2015). CTCF-dependent co-localization of canonical Smad signaling factors at architectural protein binding sites in D. melanogaster. Cell Cycle 14(16):2677-87. PubMed ID: 26125535

Vostrov, A. A. and Quitschke, W. W. (1997). The zinc finger protein CTCF binds to the APBβ domain of the amyloid β-protein precursor promoter. Evidence for a role in transcriptional activation. J. Biol. Chem. 272: 33353-33359. PubMed ID: 9407128

Wendt, K. S., et al. (2008). Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451: 796-801. PubMed ID: 18235444

Xie, X., et al. (2007). Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc. Natl. Acad. Sci. 104: 7145-7150. PubMed ID: 17442748

Yao, H., et al. (2010). Mediation of CTCF transcriptional insulation by DEAD-box RNA-binding protein p68 and steroid receptor RNA activator SRA. Genes Dev. 24(22): 2543-55. PubMed ID: 20966046

Yusufzai, T. M., Tagami, H., Nakatani, Y. and Felsenfeld, G. (2004). CTCF tethers an insulator to subnuclear sites, suggesting shared insulator mechanisms across species. Mol Cell 13: 291-298. PubMed ID: 14759373

Zhang, R., Burke, L. J., Rasko, J. E., Lobanenkov, V. and Renkawitz, R. (2004). Dynamic association of the mammalian insulator protein CTCF with centrosomes and the midbody. Exp. Cell Res. 294: 86-93. PubMed ID: 14980504

Zhao, H. and Dean, A. (2004). An insulator blocks spreading of histone acetylation and interferes with RNA polymerase II transfer between an enhancer and gene. Nucleic Acids Res 32: 4903-4919. PubMed ID: 15371553

date revised: 21 June 2023

Home page: The Interactive Fly © 2007 Thomas Brody, Ph.D.

The Interactive Fly resides on the Society for Developmental Biology's Web server.