The Interactive Fly

Zygotically transcribed genes

RNA polymerase and general transcription factors

Proteins involved in messenger RNA synthesis

General Transcription Factors, as the protein factors involved in messenger RNA synthesis are known, are conserved across species as diverse as Saccharomyces cerevisiae, Drosophila and humans. TF stands for transcription factor; they were named in chronological order of their discovery. The entire set of General Transcription Factors is composed of about 30 subunits. Although the model below assumes that the factors are assembled by stages, there is some reason to believe that all thirty are also found assembled in a holoenzyme (Orphanides, 1996 and references).

Note: General Transcription Factors are listed below in order of recruitment to the promoter.


TFIID is a multiprotein complex containing the TATA box binding protein (TBP) and (in Drosophila) at least seven other proteins known as TAFs or TBP-associated factors. The first protein recruited to the promoter is TBP, which serves to induce a bend in the DNA. The 240 kD subunit (TAF250kd) contains an HMG-box, bromodomains, a serine kinase, and histone acetyltransferase activity. The smaller subunits are similar in structure to histones. Drosophila TBP-associated factor 60kD (also known as dTAFII62) and TBP-associated factor 40kD (also known as dTAFII42) are homologous to human hTAFII80 and hTAFII31, respectively; dTAFII42/hTAFII31 and dTAFII62/hTAFII80 are homologous to histone H3 and histone H4, respectively. Drosophila and human TFIID also contain dTAFII30 alpha and hTAFII20, respectively, which are putative histone H2B homologues. In solution and in the crystalline state, the dTAFII42/dTAFII62 complex exists as a heterotetramer, resembling the (H3/H4)2 heterotetrameric core of the histone octamer, suggesting that TFIID contains a histone octamer-like substructure. TBP participates in TFIID function even in promoters lacking a TATA box (Xie, 1996).
     Drosophila                        FlyBase ID       Human homologs        Yeast homologs

     -----------------                 ----------       --------------------  --------------     


     TATA binding protein              FBgn0003687      TATA binding protein  TATA binding protein

     Tbp-related factor (Trf-1)        FBgn0010287      unknown

     Trf2                              FBgn0026758      TLF/TRF2

     TBP-associated factor (TAF) 250kD FBgn0010355      TAFII250              p130

     Bip2  (TAFII155)                  FBgn0026262      TAFII140               yTAFII47   

     TBP-associated factor 150kD       FBgn0011836      Not characterized     p150   

     TBP-associated factor 110kD       FBgn0010280      TAFII135              not characterized
     No hitter (testis specific)       FBgn0041103      

     TBP-associated factor 80kD        FBgn0010356      TAFII85               p90
     Cannonball (testis specific)      FBgn0011569      

     Cabeza                            FBgn0011571      TAFII68               

     TBP-associated factor 60kD        FBgn0010417      TAFII80               p60

     Taf55                             FBgn0024909      TAFII55               TAFII67

     TBP-associated factor  40kD       FBgn0011302      TAFII31               not characterized 

     TAF 30kD subunit alpha            FBgn0011290      hTAFII20              not characterized          

     TAF 30kD subunit beta             FBgn0011291      hTAFII28              p40          

     TATA binding protein associated 
               factor 24kD subunit     FBgn0028398      TAFII30

     Taf18                             FBgn0026324      TAFII18               TAFII19

     TBP-associated factor 16          FBgn0026324      TAFII60

     ENL/AF9                           FBgn0026441      TAFII60               TAFII30

TFIIB: TFIIB associates with TBP on the opposite side of the DNA helix. The TFIIB-TBP-DNA ternary complex is formed by TFIIB clamping the acidic C-terminal stirrup of TBP in its basic cleft and interacting with the phosphoribose backbone upstream and downstream of the center of the TATA element.

TFIIB physically links TFIID at the promoter with the pol II/TFIIF complex.

     Drosophila                        FlyBase ID       Human homologs

     -----------------                 ----------       ------------------         

     Transcription factor IIB          FBgn0004915      TFIIB

TFIIA: Required for activation of transcription.

     Drosophila                        FlyBase ID       Human homologs

     -----------------                 ----------       ------------------    

     Transcription factor IIA S        FBgn0013347      TFIIA gamma     

     Transcription factor IIA L        FBgn0011289      TFIIA alpha and beta     

TFIIE: TFIIE contains a zinc-binding domain and is involved in promoter melting. TFIIE recruits TFIIH to the promoter.

     Drosophila                        FlyBase ID       Human homologs

     -----------------                 ----------       ------------------         

     Transcription factor IIEalpha     FBgn0015828      TFIIEalpha (56 kD)    

     Transcription factor IIEbeta      FBgn0015829      TFIIEbeta (34 kD)     

TFIIF: TFIIF is the homolog of the bacterial sigma subunit. Polymerase II cannot stably associate with the TFIID and TFIIB assembly at the promoter and must be escorted to the promoter by TFIIF. TFIIF also stimulates elongation.

     Drosophila                        FlyBase ID       Human homologs

     -----------------                 ----------       ------------------    

     Transcription factor TFIIFalpha   FBgn0010282      TFIIF RAP74    

     Transcription factor TFIIFbeta    FBgn0010421      TFIIF RAP30     

RNA polymerase: For RNA polymerase II, the transition from initiation to elongation is accompanied by covalent modification of an unusual structure at the carboxy terminus of its largest subunit. This evolutionarily conserved structure consists of multiple tandem repeats of a heptapeptide, the RNA pol II carboxy-terminal domain (CTD). The number of times this sequence is repeated varies from 26 in yeast to 52 in humans and seems to be directly related to genome complexity. The phosphorylation of the CTD is central to the transcription mechanism of pol II. The unphosphorylated form of pol II is the form recruited to the initiation complex. During initiation of RNA synthesis, the CTD becomes extensively phosphorylated on serine and threonine residues.

     Drosophila                        FlyBase ID       Human homologs

     -----------------                 ----------       -----------------

     RNA polymerase II 215kD subunit   FBgn0003277      RNA polymerase II large subunit   

     RNA polymerase II 140kD subunit   FBgn0003276      RNA polymerase II small subunit      

TFIIH: TFIIH is a multisubunit factor with 3'-5' helicase activity. The Drosophila TFIIH consists of 8 subunits (two are listed here) similar to their human counterparts. Besides the helicase activity, TFIIH has an RNA pol II C-terminal domain kinase activity (CDK7) and a cyclin partner for the kinase (Cyclin H). Cyclin H forms a ternary complex with CDK7 and MAT1. This tripartite Cdk-activating kinase occurs in a free form and in association with 'core' TFIIH.

     Drosophila                        FlyBase ID       Human homologs

     -----------------                 ----------       ------------------          

     Transcription factor IIH          FBgn0015830      TFIIH (ERCC3)

     Cyclin-dependent kinase 7         FBgn0015617      CDK7  

P-TEFb: A dimer of Cdk9 and Cyclin T that targets the RNA polymerase II C-terminal domain. It functions to overcome promoter-proximal pausing and premature termination, promoting polymerase entry into productive elongation.

     Drosophila                        FlyBase ID       Human homologs

     -----------------                 ----------       ------------------          

     Cyclin dependent kinase 9         FBgn0019949      Cdk9

     Cyclin T                          FBgn0025455      Cyclin T  

TFIIS: Critical for efficient release of stalled RNA Pol II from intrinsic stop sites in promoter regions; promotes transcriptional elongation and decreases pausing.

Drosophila                                  FlyBase ID       Human homologs

-----------------                           ----------       ------------------          

RNA polymerase II elongation factor         FBgn0010422      TfIIS

Factors involved in function of RNA polymerase II

Factors involved in function of RNA polymerase III

Paf1 complex (coordinates histone modifications and changes in nucleosome structure with transcription activation and Pol II elongation)

How does messenger RNA synthesis take place?

The conventional model for formation of a preinitiation complex and ordered transcription by RNA polymerase II (pol II) is characterized by a distinct series of events: (1) recognition of core promoter elements by TFIID (containing TBP and several other protein subunits), (2) recognition of and binding to the TFIID-promoter complex by TFIIB, (3) recruitment of a TFIIF/pol II complex by TFIIB (TFIIF is related to bacterial sigma), (4) binding of TFIIE and TFIIH (containing a helicase required for promoter melting) to complete the preinitiation complex, (5) promoter melting and formation of an "open" initiation complex, (6) synthesis of the first phosphodiester bond of the nascent mRNA transcript, (7) release of pol II contacts with the promoter (promoter clearance), and (8) elongation of the RNA transcript. TFIIA can join the complex at any stage after TFIID binding and stabilizes the initiation complex. TFIID can remain bound to the core promoter, supporting reinitiation of transcription (Orphanides, 1996 and Nikolov, 1997).
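The ordering constraints in this stepwise model can be sketched as a small dependency check. This is only an illustration of the conventional model, not a simulation of real binding kinetics: the names `REQUIRES`, `assemble`, and `is_complete` are hypothetical, and the sketch assumes that pol II arrives together with TFIIF (as stated in the TFIIB and TFIIF entries above) and that TFIIE must be present before TFIIH (as stated in the TFIIE entry).

```python
# Prerequisites for each factor in the conventional stepwise model.
# Assumption: pol II is escorted by TFIIF, and TFIIE recruits TFIIH.
REQUIRES = {
    "TFIID": set(),                      # (1) recognizes the core promoter
    "TFIIA": {"TFIID"},                  # may join at any stage after TFIID
    "TFIIB": {"TFIID"},                  # (2) binds the TFIID-promoter complex
    "TFIIF/pol II": {"TFIID", "TFIIB"},  # (3) recruited by TFIIB
    "TFIIE": {"TFIID", "TFIIB", "TFIIF/pol II"},
    "TFIIH": {"TFIID", "TFIIB", "TFIIF/pol II", "TFIIE"},
}

# TFIIA stabilizes the complex but is treated as optional in this sketch.
CORE = set(REQUIRES) - {"TFIIA"}


def assemble(order):
    """Add factors in the given order, rejecting any that arrive too early."""
    bound = []
    for factor in order:
        missing = REQUIRES[factor] - set(bound)
        if missing:
            raise ValueError(f"{factor} cannot bind before {sorted(missing)}")
        bound.append(factor)
    return bound


def is_complete(bound):
    """True once every core factor of the preinitiation complex is present."""
    return CORE <= set(bound)


if __name__ == "__main__":
    pic = assemble(["TFIID", "TFIIA", "TFIIB", "TFIIF/pol II", "TFIIE", "TFIIH"])
    print(pic, is_complete(pic))
```

Encoding the model as prerequisites rather than a fixed list lets TFIIA join at any point after TFIID, which is the one flexibility the stepwise model allows.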

This model has been further refined to incorporate known alterations in the level of phosphorylation of the carboxy-terminal domain (CTD) of RNA polymerase II (Cho, 1999). Stable association of RNAPII with promoter sequences requires TFIID (or TBP), TFIIB, and TFIIF. However, the RNAPII transcription system is unique because, after the polymerase has stably associated with promoter sequences, two additional factors, TFIIE and TFIIH, are necessary for transcription. This requirement is likely related to a unique structure found at the carboxyl terminus of the largest subunit of RNAPII known as the carboxy-terminal domain (CTD). This conserved structure consists of multiple tandem repeats of the heptapeptide Tyr-Ser-Pro-Thr-Ser-Pro-Ser, which serves as a substrate for a number of protein kinases. At least two forms of RNAPII have been detected in cells. The most abundant form contains a phosphorylated CTD (RNAPIIO). A second form contains an unphosphorylated CTD and is known as RNAPIIA. The phosphorylation of the CTD has been correlated with function. It was found that the nonphosphorylated form of RNAPII is recruited to the initiation complex, whereas the elongating polymerase is found with a phosphorylated CTD. TFIIH contains a CTD kinase activity and this activity is efficient after RNAPII has associated with promoter sequences. A 150-kD polypeptide termed FCP1 has now been isolated. Together with RNAPII, FCP1 reconstitutes a highly specific CTD phosphatase activity. Functional analysis demonstrates that the CTD phosphatase allows recycling of RNAPII. Upon reaching termination sequences, the CTD becomes dephosphorylated by the FCP1 phosphatase within the ternary complex (consisting of DNA, polymerase and phosphatase) or immediately after the release of RNAPII from the DNA template. The phosphatase dephosphorylates the CTD, allowing efficient recycling of RNAPII into transcription initiation complexes, which results in increased transcription. The phosphatase is found to stimulate elongation by RNAPII; however, this function is independent of its catalytic activity (Cho, 1999 and references).
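To give a sense of how many potential phosphorylation sites the CTD offers, the following sketch builds an idealized CTD from the consensus heptapeptide and counts its serine and threonine residues. It assumes every repeat matches the consensus exactly, which is a simplification (real CTDs contain variant repeats); the repeat numbers 26 and 52 are those quoted above for yeast and humans, and the function names are hypothetical.

```python
from collections import Counter

# Consensus heptapeptide Tyr-Ser-Pro-Thr-Ser-Pro-Ser in one-letter code.
HEPTAD = "YSPTSPS"


def consensus_ctd(n_repeats):
    """Idealized CTD: n perfect copies of the consensus heptad (an assumption)."""
    return HEPTAD * n_repeats


def ser_thr_counts(ctd):
    """Count the serine and threonine residues in a CTD sequence."""
    counts = Counter(ctd)
    return {"Ser": counts["S"], "Thr": counts["T"]}


if __name__ == "__main__":
    for species, n in (("yeast", 26), ("human", 52)):
        ctd = consensus_ctd(n)
        print(species, len(ctd), ser_thr_counts(ctd))
    # yeast 182 {'Ser': 78, 'Thr': 26}
    # human 364 {'Ser': 156, 'Thr': 52}
```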

A model is presented detailing the role of cycling of CTD phosphorylation in the function of RNAPII. After the termination of the previous transcription cycle, TBP remains bound to the TATA motif and provides the foundation for association of TFIIB. RNAPII, through its interactions with TFIIF, recognizes the TBP-TFIIB complex associated with the TATA motif. Because TFIIF has been found to interact with both the phosphorylated and nonphosphorylated forms of RNAPII and FCP1 and to stimulate FCP1 activity, its association with RNAPII prior to association with the TBP-TFIIB complex may be important in attaining an RNAPII that is fully dephosphorylated. The association of RNAPII with promoter sequences provides the foundation for the entry of TFIIE and allows the association of TFIIH, resulting in the formation of a fully competent transcription initiation complex. During the process of initiation and prior to the formation of a fully competent elongation complex, the CTD becomes phosphorylated in a TFIIH-dependent manner. Phosphorylation of the CTD does not affect elongation efficiency, but allows RNAPII to disengage from the promoter and from transcription initiation factors. In the presence of the ribonucleoside triphosphates, the transcription initiation complex disassembles with the release of TFIIB, TFIIE, and TFIIH. CTD phosphorylation provides a foundation for the association of factors involved in RNA processing, such as the capping enzyme, splicing factors, and factors involved in 3'-end formation. Upon transcription of termination/polyadenylation signals, the elongating complex is altered, resulting in the release of RNAPII from the template by an unknown process. It is possible that RNAPII is converted to the nonphosphorylated form prior to, or concomitant with, its release from the DNA template. This possibility is supported by studies demonstrating that FCP1 is capable of dephosphorylating the CTD of RNAPII not only in solution prior to incorporation into transcription initiation complexes, but also in active ternary elongation complexes stalled as a result of nucleotide starvation. The finding that FCP1 also stimulates elongation by RNAPII, independent of its phosphatase activity, suggests that FCP1 may remain associated with RNAPII during elongation. The finding that FCP1 is active in ternary complexes has implications for the mechanism of transcription termination as well as for the down-regulation of RNA processing. Similar to the signal imposed on phosphorylation of the CTD (disengagement of RNAPII from the promoter and from interaction with initiation factors), dephosphorylation of the CTD may result in a signal that releases factors from RNAPII that are involved in RNA maturation (Cho, 1999 and references).
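The cycle described in this model can be summarized as a minimal state machine. The sketch below is a deliberate simplification with hypothetical names (`PolII`, `recruit_to_promoter`, and so on): it tracks only whether the CTD is phosphorylated and which of three coarse stages the polymerase is in, and it assumes that only the unphosphorylated form (RNAPIIA) can be recruited, that TFIIH phosphorylates the CTD during initiation, and that FCP1 dephosphorylation returns the enzyme to the recruitable pool.

```python
from dataclasses import dataclass


@dataclass
class PolII:
    """Coarse model of RNAPII: CTD state plus stage in the transcription cycle."""
    ctd_phosphorylated: bool = False
    stage: str = "free"  # one of "free", "initiation", "elongation"

    @property
    def form(self):
        return "RNAPIIO" if self.ctd_phosphorylated else "RNAPIIA"


def recruit_to_promoter(pol):
    # Only free, unphosphorylated polymerase enters the initiation complex.
    if pol.stage != "free" or pol.ctd_phosphorylated:
        raise ValueError("only free RNAPIIA can be recruited")
    pol.stage = "initiation"


def phosphorylate_by_tfiih(pol):
    # TFIIH-dependent CTD phosphorylation lets the polymerase leave the promoter.
    if pol.stage != "initiation":
        raise ValueError("TFIIH acts on promoter-associated RNAPII")
    pol.ctd_phosphorylated = True
    pol.stage = "elongation"


def dephosphorylate_by_fcp1(pol):
    # FCP1 can act in solution or on elongation complexes, but not in this
    # sketch on a polymerase that is still in the initiation complex.
    if pol.stage == "initiation":
        raise ValueError("not modeled: FCP1 acting within the initiation complex")
    pol.ctd_phosphorylated = False
    pol.stage = "free"


def run_cycle(pol):
    """Run one full cycle and record (stage, form) after each transition."""
    trace = []
    for step in (recruit_to_promoter, phosphorylate_by_tfiih, dephosphorylate_by_fcp1):
        step(pol)
        trace.append((pol.stage, pol.form))
    return trace


if __name__ == "__main__":
    print(run_cycle(PolII()))
```

Because the polymerase ends each cycle as free RNAPIIA, `run_cycle` can be called repeatedly on the same object, which mirrors the recycling role proposed for FCP1.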

Evolution of general transcription factors

How have the factors required for transcription initiation (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, and RNA polymerase II [pol II]) evolved to accommodate the elaborate transcriptional programs required for growth, differentiation, and development of multicellular organisms? Analysis of the complete Drosophila genome sequence, as well as those of C. elegans, Saccharomyces cerevisiae, and humans sheds light on this well studied question in eukaryotic biology. All four organisms encode single isoforms of RNA pol II, TFIIB, TFIIE, TFIIF, and TFIIH components, but multiple, sequence-related isoforms of TFIID components. In addition, Drosophila and humans encode multiple isoforms of TFIIA components. Current evidence indicates that tissue- and cell type-specific transcription is directed by differentially expressed TFIID and possibly TFIIA isoforms. Thus, in accord with experimental data, this analysis points to TFIIA and TFIID as the factors that help generate the broad transcriptional repertoire of multicellular organisms. The identification of the complete set of TFIIA and TFIID components in a genetically and biochemically tractable organism like Drosophila is an important step toward understanding the mechanisms governing developmentally regulated transcription not only in Drosophila but also in humans (Aoyagia, 2000 and references therein).

Biochemical fractionation of Drosophila embryos, human cells, and yeast cells has defined a set of multiprotein complexes termed general transcription factors (GTFs; TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH) required for mRNA transcription initiation in vitro. Transcription is initiated by recognition of core promoter elements by TFIID and sequential or concerted assembly of the other GTFs and RNA pol II to form the preinitiation complex (PIC). Although GTFs play essential roles during transcription initiation, it is the factors that regulate the ability of the GTFs to assemble and stably bind a core promoter that are probably major determinants of gene-specific transcription levels. For example, activators and coactivators are thought to stimulate transcription by recruiting GTFs to a promoter, thereby accelerating PIC assembly (Aoyagia, 2000 and references therein).

The GTF TFIID is composed of TATA-binding protein (TBP) and coactivator subunits termed TBP-associated factors (TAFIIs). TAFIIs not only function as 'conventional' coactivators by serving as physical links between DNA-binding activator proteins and the PIC but also possess enzymatic or promoter recognition activities that presumably enhance the efficiency of PIC assembly. TFIIA has also been described as a coactivator and displays a number of TAFII-like properties: it binds to TBP and TAFIIs; it interacts with specific transcriptional activators; it is generally required for activated transcription in vitro; and it contributes to promoter selectivity (Aoyagia, 2000 and references therein).

Inactivation of individual TAFIIs in Drosophila, mammalian, and yeast cells has demonstrated that TAFIIs are not required for the transcription of all RNA pol II genes, and in fact there is great variation in regard to the identity and number of gene targets for individual TAFIIs. Furthermore, different domains within a single TAFII can play gene-specific roles in transcription. The isolation of a human B cell-specific isoform of TAFII130 (TAFII105) raises the possibility that substoichiometric subunits of TFIID mediate tissue- or cell type-specific transcription and that additional components of TFIID may have escaped detection because of their low abundance. These possibilities have been borne out in Drosophila, where isoforms of TAFII110 and TAFII80 (No hitter [Nht] and Cannonball [Can], respectively) are expressed exclusively in testis and regulate transcription of a subset of genes required for spermatogenesis, and isoforms of TBP (TBP-related factors [TRF1 and TRF2]) are expressed in a tissue-specific manner and bind different genes in salivary gland cells. Similarly, analysis of the human TFIIA-L isoform ALF (TFIIAalpha/beta-like factor) reveals that its expression is restricted to the testis; however, it remains to be determined if it is used for the transcription of testis-specific genes. In Drosophila, TFIIA-S is expressed in a dynamic pattern during eye development and is transiently upregulated in photoreceptor precursor cells before their fate is determined. Therefore, the role of TFIIA and TFIID in transcription initiation is governed by the expression patterns and activities of their varied components (Aoyagia, 2000 and references therein).

Finally, it is critical to note that analysis of the function of TAFIIs is complicated by the fact that they are components of at least two other complexes that lack TBP: p300/CBP-associated factor (PCAF) and TBP-free TAFII-containing complex (TFTC). The human PCAF histone acetyltransferase (HAT) complex contains three TAFIIs that are shared with TFIID (TAFII31/32, TAFII20/15, and TAFII30) and three TAFII isoforms (PCAF-associated factor 65beta [PAF65beta], PAF65alpha, and SPT3) related to TAFII100, TAFII70/80, and TAFII18, respectively. Yeast possess an analogous complex, Spt-Ada-Gcn5-acetyltransferase (SAGA), containing TFIID TAFIIs and the Gcn5 HAT, and Drosophila may also, since it contains a Gcn5/PCAF homolog that interacts with TAFII24 (Aoyagia, 2000 and references therein).

Searches of the completed Drosophila, C. elegans, and yeast genomes and the partial human genome for sequence homologs of biochemically identified components of the general transcription machinery have led to the following conclusions: (1) all of the components of RNA pol II, TFIIB, TFIIE, TFIIF, and TFIIH are encoded by single copy genes in Drosophila, C. elegans, and yeast; (2) multiple isoforms of TFIID components are encoded in Drosophila, C. elegans, humans, and yeast, and multiple isoforms of TFIIA components are encoded in Drosophila and humans; (3) each organism encodes isoforms of different sets of TFIIA and TFIID components, some of which are unique to a particular organism (Aoyagia, 2000 and references therein).

Sequence comparisons uncovered Drosophila homologs of TAFIIs previously identified in yeast or humans by biochemical means but which had not been described in Drosophila (yeast TAFII67/human TAFII55, yeast TAFII30/human ENL/AF-9, and yeast TAFII19/human TAFII18). Thus, all TAFIIs present in both yeast and humans are present in Drosophila, as well as C. elegans. In contrast, yeast TAFII47 and TAFII65 are absent from Drosophila, C. elegans, and apparently from humans, suggesting that these TAFIIs perform a yeast-specific role, such as serving as coactivators for DNA-binding activators that are not present in metazoans. Finally, there are TAFIIs present in Drosophila, C. elegans, and humans that are absent from yeast (human TAFII68/Drosophila Cabeza and multiple TAFII isoforms). In addition to Can and Nht, there are alternatively spliced forms of TAFII30alpha, two genes (TAFII24 and TAFII16) that encode Drosophila homologs of human TAFII30, and TAFII60 and TAF30alpha isoforms (TAFII60-2 and TAF30alpha-2, respectively). TFIIA-S and TFIIA-L are the only other GTF components in Drosophila and humans, respectively, that are expressed in multiple isoforms. The fact that these proteins are unique to multicellular organisms suggests that they play cell-specific roles (Aoyagia, 2000 and references therein).

A number of TAFIIs contain a common structural motif called the histone fold that was originally shown to drive folding and association of each of the core histones (H2A, H2B, H3, and H4) and subsequently shown to play a similar role in association of TAFIIs. TAFII pairs, such as Drosophila TAFII40 and TAFII60, form heterotetramers, analogous to H3 and H4, and numerous other TAFII-TAFII and TAFII-nonTAFII interactions have been shown to involve histone fold motifs. The demonstrated histone fold interaction of human TAFII135 and TAFII20 predicts that Drosophila isoforms of these proteins, Nht and TAFII30alpha-2, respectively, may heterodimerize and hints at the existence of a human TAFII20 isoform that would heterodimerize with the TAFII135 isoform, TAFII105. B cell-specific expression of the hypothetical TAFII20 isoform may explain why TAFII105 associates with TFIID in B cells but not in other cell types (Aoyagia, 2000 and references therein).

In addition to the TAFIIs indicated above, other Drosophila transcription factors contain histone fold motifs, including Prodos, NF-YC-like (CG3075), CG11301, CHRAC-14 (CG13399), CHRAC-16 (CG15736), Dr1 (CG4185), NC2alpha (CG10318), and BIP2 (CG2009). It is interesting to speculate that these factors may be unidentified TAFII components of TFIID or binding partners for known TAFIIs in complexes that lack TBP (Aoyagia, 2000 and references therein).

Analysis of eukaryotic genomes has defined sets of proteins that are similar in sequence to known components of TFIIA and TFIID. Since known components of TFIIA and TFIID have been shown to play key roles in developmentally regulated transcription, it is exciting to speculate that the newly identified genes will play similar roles and that TFIIA and TFIID components have evolved to support tissue- or cell type-specific transcriptional requirements of individual eukaryotic organisms. The challenge now is to determine if TAFIIs that have been identified on the basis of their sequence are components of TBP-containing complexes or other TAFII-containing complexes, whether TAFIIs and TFIIA isoforms are differentially expressed during development, and how differentially expressed TBP, TAFII, and TFIIA isoforms function in concert with the ubiquitously expressed form of TFIID and TFIIA to regulate gene expression. The subunit composition of human PCAF complex leads to the prediction that Drosophila TAFII60-2 and Can and C. elegans Y37E11AL.c are components of PCAF/SAGA and not TFIID. However, protein isoforms that are unique to a particular organism, such as Drosophila TAFII30alpha-2 and C. elegans F54F7.1 and K10D3.3, may be tissue- or cell type-specific components of TFIID and not of PCAF/SAGA. Drosophila may be the most appropriate organism for these studies since the biochemical activities of these factors can be determined using established TFIIA and TFIID purification schemes and in vitro transcription systems, and developmental requirements for these factors can be determined using existing mutants or mutants generated by traditional mutagenesis schemes, P-element insertion, RNA interference (RNAi), or homologous recombination (Aoyagia, 2000 and references therein).

Occupancy of the Drosophila hsp70 promoter by a subset of basal transcription factors diminishes upon transcriptional activation

The presence of general transcription factors and other coactivators at the Drosophila hsp70 gene promoter in vivo has been examined by polytene chromosome immunofluorescence and chromatin immunoprecipitation at endogenous heat-shock loci or at a hsp70 promoter-containing transgene. These studies indicate that the hsp70 promoter is already occupied by TATA-binding protein (TBP) and several TBP-associated factors (TAFs), TFIIB, TFIIF (RAP30), TFIIH (XPB), TBP-free/TAF-containing complex (GCN5 and TRRAP), and the Mediator complex subunit 13 before heat shock. After heat shock, there is a significant recruitment of the heat-shock transcription factor, RNA polymerase II, XPD, GCN5, TRRAP, and Mediator complex subunit 13 to the hsp70 promoter. Surprisingly, upon heat shock, there is a marked diminution in the occupancy of TBP, six different TAFs, TFIIB, and TFIIF, whereas there is no change in the occupancy of these factors at ecdysone-induced loci under the same conditions. Hence, these findings reveal a distinct mechanism of transcriptional induction at the hsp70 promoters, and further indicate that the apparent promoter occupancy of the general transcriptional factors does not necessarily reflect the transcriptional state of a gene (Lebedeva. 2005).

An inverse correlation was observed between factor occupancy and transcriptional activation. In the absence of heat shock, it was found that TBP, TAFs, TFIIB, TFIIF, TFIIH, TFTC, and Mediator are present at the hsp70 promoter region. These results are similar to previous observations in which the basal factors have been found to be present at transcriptionally inactive promoters. Surprisingly, however, the apparent occupancy of TBP, several TAFs, TFIIB, and TFIIF significantly decreases upon transcriptional activation. These results could be due to some of the following scenarios: (1) upon activation, the undetected factors are present but adopt a conformation that renders them refractory to polytene chromosome staining and to ChIP analysis; (2) the factors that are not detected are indeed absent and do not participate in the ongoing transcription of the genes; or (3) the factors are present only transiently at the actively transcribed promoter and thus exhibit lower average occupancy upon polytene chromosome staining and ChIP analysis (Lebedeva. 2005).

The first scenario requires that TBP, several TAFs, TFIIB, and TFIIF simultaneously become essentially invisible to polytene immunostaining as well as to ChIP analysis upon transcriptional activation of hsp70 and other heat-shock genes. The observed effects are not a consequence of the heat shock treatment, because these factors are observed at ecdysone-responsive genes that have been subjected to heat shock. Moreover, for several factors (TBP, TAF1, and TAF10), the immunostaining was repeated with two different polyclonal antibodies that were raised against different epitopes, and identical results were obtained after heat-shock treatment. Furthermore, histone H3 K14 acetylation was detected at the hsp70 promoter after heat shock. Thus, the conditions allow the access of antibodies to proteins that are in close proximity to hsp70 promoter DNA. Thus, given that these experiments involve the use of many highly specific polyclonal antibodies and that the effect is observed with multiple polypeptides and is not a consequence of the heat-shock treatment, the first model appears to be unlikely (Lebedeva. 2005).

In the second scenario, TBP, several TAFs, TFIIB, and TFIIF do not participate in the ongoing transcription of heat-shock genes after heat induction. For instance, the factors required for transcription reinitiation may be a subset of those that participate in the first round of transcription. In fact, biochemical studies in yeast have shown that some, but not all, GTFs remain at the promoter after initiation and form a platform for the assembly of subsequent reinitiation complexes. This subset of factors includes TBP, TAF5, TFIIA, TFIIH, TFIIE, and Mediator, but not TFIIB or TFIIF. In accord with those results, this study found that TFIIH (XPB subunit) and Mediator (MED13), but not TFIIB or TFIIF, remain at the hsp70 promoter after heat induction. In contrast, the apparent occupancy of TFIID (TBP, TAF1, and several other TAFs) is significantly reduced upon heat shock. Thus, for the second scenario to be correct, TBP and several TAFs must be dispensable for transcription reinitiation from heat-induced hsp70 promoters (Lebedeva. 2005).

In the third scenario, the average occupancy of the basal transcription factors at the hsp70 promoters is higher in the inactive gene than in the transcriptionally induced gene. This situation could occur if the basal transcription factors are in a static complex at the inactive hsp70 promoter and in a rapid cycling state of preinitiation-complex assembly and disassembly at the transcriptionally active hsp70 promoter. More specifically, in vivo data in the context of the third scenario suggest that TBP, several TAFs, TFIIB, and TFIIF make a transition from a static state to a rapidly cycling state upon heat-shock induction (Lebedeva. 2005).
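The reasoning behind the third scenario can be made concrete with a simple time-average. In a two-state picture where a factor alternates between bound and unbound, the fraction of time the promoter is occupied is the bound time divided by the total cycle time. The durations below are invented purely for illustration (they are not measurements from the study), and `mean_occupancy` is a hypothetical helper.

```python
def mean_occupancy(bound_time, unbound_time):
    """Fraction of time a promoter is occupied in a two-state bound/unbound cycle."""
    total = bound_time + unbound_time
    if bound_time < 0 or unbound_time < 0 or total == 0:
        raise ValueError("durations must be non-negative and not both zero")
    return bound_time / total


if __name__ == "__main__":
    # Hypothetical numbers in arbitrary time units.
    # Inactive promoter: factors sit in a nearly static complex.
    static = mean_occupancy(bound_time=95.0, unbound_time=5.0)
    # Active promoter: factors bind briefly on each round, then dissociate.
    cycling = mean_occupancy(bound_time=2.0, unbound_time=8.0)
    print(f"static: {static:.2f}, cycling: {cycling:.2f}")
```

Under these assumed durations the factor participates in every round of transcription at the active promoter, yet an assay that reports average occupancy, such as ChIP, would register a lower signal than at the inactive promoter.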

It should be considered that the latter two scenarios might appear to be inconsistent with in vivo KMnO4 footprinting data, which suggest that TFIID binds to the Drosophila hsp70 promoters both before and after heat shock. In this regard, it should be noted that ChIP (as well as immunofluorescence) and footprinting experiments yield distinct types of information. ChIP provides data regarding the occupancy of a particular factor at a specific DNA sequence but does not indicate how the factor interacts with DNA or if the factor is biochemically active. Moreover, in some instances, specific DNA-bound factors may not be detectable by ChIP (although, as discussed above, it is unlikely that multiple subunits of a protein complex, such as TFIID, would be invisible in a ChIP assay with multiple polyclonal antibodies). In vivo footprinting, however, shows that a factor is bound to a specific DNA sequence but does not indicate exactly what factor is bound to that sequence. Therefore, the models and data are not necessarily contradictory. For example, it is possible that the factor that is responsible for the TATA footprint in the induced gene is not TBP or TFIID but rather another protein, such as a TBP-related factor, or a TFTC/STAGA-type complex. Alternatively, an induced hsp70 promoter might not contain the complete TFIID complex but rather only a subcomplex or TBP alone that is in a ChIP-invisible state, possibly hidden under other proteins, such as the polymerase. At the present time, however, the resolution of these issues will require the development of more sophisticated assays for the analysis of the functions of transcription factors in vivo (Lebedeva. 2005).

Thus, a model for the activation of hsp70 genes is as follows. First, the inactive gene contains many GTFs (such as TFIIB, TFIID, TFIIF, and TFIIH) as well as the downstream paused RNA Pol II. Upon heat induction, HSF binds to the promoter and recruits coactivators, such as Mediator and SAGA complexes, and these factors promote the release of the paused polymerase and the assembly of a new transcription preinitiation complex. After initiation, the transcription complex might partially disassemble, at which point factors such as TFIIB and TFIID (or many TFIID subunits) dissociate from the template DNA. (TFIIF may remain associated with the elongating polymerase and thus depart the promoter region.) Then, in subsequent rounds of initiation (i.e., reinitiation), the reassociation of TFIIB and TFIID with the template may be fleeting with a low residence time at the promoter (the third scenario described above). Alternatively, TFIIB and TFIID may be dispensable for reinitiation (the second scenario described above). TFIIH, in contrast, is needed to unwind the template DNA for every new round of transcription; thus, the average occupancy of TFIIH at the promoter increases along with the polymerase in proportion to the number of transcription reinitiation events. Thus, upon heat induction, an increase would be observed in HSF, Mediator, SAGA/TFTC, TFIIH, and RNA Pol II as well as a decrease in TFIIB, TFIID (or many TFIID subunits), and TFIIF at the promoter (Lebedeva, 2005).

The specific mechanism of transcriptional activation by HSF at heat shock genes is likely to be one of multiple mechanisms of regulation that are used in vivo. For example, in contrast to what is seen at the hsp70 promoters, the apparent occupancy of TBP, TFIIB, and several TAFs at ecdysone-responsive promoters does not decrease upon transcriptional induction, even if the cells are also subjected to heat shock (Lebedeva, 2005).

In conclusion, these results with the hsp70 promoters provide an example of a transcriptional mechanism wherein the apparent occupancy of TBP, several TAFs, TFIIB, and TFIIF decreases upon gene activation. Therefore, the extent of the apparent occupancy of these factors at a given promoter does not necessarily reflect the transcriptional activity of that promoter. The discovery and analysis of distinct transcriptional mechanisms is a key step toward the ultimate goal of understanding all of the many strategies that are used by the cell to control gene activity (Lebedeva, 2005).

The same transcriptional activator (MTF-1) requires different coactivator subunits depending on the context of the core promoter

Cells often fine-tune gene expression at the level of transcription to generate the appropriate response to a given environmental or developmental stimulus. Both positive and negative influences on gene expression must be balanced to produce the correct level of mRNA synthesis. To this end, the cell uses several classes of regulatory coactivator complexes including two central players, TFIID and Mediator (MED), in potentiating activated transcription. Both of these complexes integrate activator signals and convey them to the basal apparatus. Interestingly, many promoters require both regulatory complexes, although at first glance they may seem to be redundant. RNA interference (RNAi) was used in Drosophila cells to selectively deplete subunits of the MED and TFIID complexes to dissect the contribution of each of these complexes in modulating activated transcription. The robust response of the metallothionein genes to heavy metal was used as a model for transcriptional activation by analyzing direct factor recruitment in both heterogeneous cell populations and at the single-cell level. Intriguingly, it was found that MED and TFIID interact functionally to modulate transcriptional response to metal. The metal response element-binding transcription factor-1 (MTF-1) recruits TFIID, which then binds promoter DNA, setting up a 'checkpoint complex' for the initiation of transcription that is subsequently activated upon recruitment of the MED complex. The appropriate expression level of the endogenous metallothionein genes is achieved only when the activities of these two coactivators are balanced. Surprisingly, it was found that the same activator (MTF-1) requires different coactivator subunits depending on the context of the core promoter. Finally, the stability of multi-subunit coactivator complexes can be compromised by loss of a single subunit, underscoring the potential for combinatorial control of transcription activation (Marr, 2006).

There are four known metallothionein genes in Drosophila: MtnA, MtnB, MtnC, and MtnD. Of these, the best characterized is the MtnA gene, which produces a transcript of ~600 bases in length, bearing one intron. All of the regulatory elements required for robust response to heavy metals, including copper, lie within 500 bp of the transcription start site. The gene is controlled by a single activator, metal response element-binding transcription factor 1 (MTF-1), which binds two adjacent metal response elements (MRE) 50 bp upstream of the TATA-box (Zhang, 2001). Quantitative PCR (qPCR) analysis of the endogenous gene in Drosophila S2 cells shows that the gene is highly induced (~250-fold) after a short exposure to copper. The total amount of stable MtnA mRNA approximates the level of the abundant transcript for the ribosomal subunit Rp49. Primer extension analysis confirms that transcriptional activation of the endogenous MtnA gene originates from a unique start site overlapping the core promoter. The transcript accumulates linearly for ~12 h, thus measurements in this time window likely reflect relative levels of transcription of the MtnA gene. Importantly, induction at the endogenous chromosomal locus is easily assayed in order to measure physiologically relevant transcriptional activation in the context of native chromatin. Taken together, these properties establish the endogenous MtnA gene as a useful model for studying transcriptional mechanisms governing an inducible gene (Marr, 2006).

Using chromatin immunoprecipitation (ChIP), it was found that the sequence-specific DNA-binding protein MTF-1 is specifically recruited to the MtnA promoter region in response to copper. Curiously, when the ChIP signal at the promoter region was compared to that at a region 1 kb downstream, a significant amount of MTF-1 was found to be present on the promoter even in the absence of added copper. Under these conditions, little transcription is detected from this gene. As a preliminary experiment to investigate a potential functional interaction between TFIID and MED, it was first asked whether the two complexes are both recruited in a signal-dependent manner to the MtnA gene. Using ChIP, it was found that both TBP and the TAFs are efficiently recruited to the promoter region in response to copper. In addition, the MED17, MED24, MED26, and MED27 subunits of MED are all recruited to the promoter region in response to copper treatment. Consistent with the high level of induction, RNAPII occupancy at the MtnA promoter is also increased in response to heavy metal treatment. Thus, both core coactivator complexes and RNAPII are efficiently recruited to the promoter region upon induction and resultant binding of MTF-1 to the MREs (Marr, 2006).

Because the ChIP assay is limited to measuring response in a heterogeneous population of cells, a transgenic model system was established in Drosophila S2 cells in order to visualize the response at the single-cell level. Such an approach has proved useful in understanding transcription factor dynamics in vivo. By selecting for stably transfected MtnA firefly luciferase reporters, a concatenated transgenic locus was generated in a clonal line of S2 cells. The transgenic locus was assayed for dependence on copper using a luciferase assay. Importantly, transcription initiates at a unique site that maps to the correct start site of the MtnA core promoter. With this substantial increase in gene number (~2000) at the integrated transgenic locus, it should now be possible to visualize direct recruitment of specific transcription factors to the MtnA promoter within a single cell (Marr, 2006).

As expected, in the absence of heavy metal, MTF-1 is predominantly cytoplasmic; however, in agreement with ChIP data, some MTF-1 can be detected at the transgenic cluster even in the absence of a metal stimulus. Thus, antibody labeling of MTF-1 provides a useful marker for the subnuclear location of the transgene cluster in both induced and uninduced cells. Notably, the locus is not undergoing transcription (as detected by RNA FISH) in the absence of heavy metal induction despite the presence of some MTF-1 at the transgene cluster. Upon copper induction, MTF-1 vacates the cytoplasm and accumulates selectively at the transgenic locus. Under these same conditions, TBP is also actively recruited to this cluster. Consistent with not only TBP but holo-TFIID complex recruitment, it was found that TAF2 also accumulates at the transgene. Likewise MED components recruited to the transgene were detected using antibodies against MED26. As expected, RNAPII is recruited to the cluster in a copper-dependent manner consistent with the transcriptional induction of the transgene under these conditions. In contrast, TBP-related factor 1 (TRF1), a subunit known to be a key component of the RNA polymerase III core promoter recognition complex, is not recruited to the transgene. This negative control helps rule out the possibility that the tandemly reiterated transgene is simply nonspecifically attracting transcription factors (Marr, 2006).

Having established by two independent methods that both TFIID and MED complexes are recruited to the MtnA promoter in an activator-dependent manner, their role in potentiating transcriptional activation of the endogenous MtnA gene was investigated. The efficient technique of RNAi in Drosophila S2 cells was used to knock down expression of TFIID and MED subunits. In addition, the activator MTF-1 was knocked down to ascertain the extent of the activator’s role in induction. After treatment with copper, total RNA was purified from dsRNA-treated and untreated S2 cells and assayed by two independent methods. First, primer extension analysis was performed on equivalent amounts of total RNA. This assay revealed that accurate transcription is detected from one distinct core promoter start site. Next, qPCR normalized to the Rp49 mRNA was used to confirm that there is little or no global disturbance of RNAPII transcription (Marr, 2006).
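As a rough illustration of what "qPCR normalized to the Rp49 mRNA" can mean in practice, the sketch below computes a relative expression value by the comparative Ct (2^-ddCt) approach. The source does not state which quantification method was used, so the method itself, the function name fold_change, the assumed amplification efficiency of 2 per cycle, and all Ct values are assumptions made purely for illustration.

```python
def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control,
                efficiency=2.0):
    """Relative expression of a target gene (e.g. MtnA) in treated versus
    control cells, normalized to a reference gene (e.g. Rp49).

    Assumes the comparative Ct method and equal amplification efficiency
    for target and reference (an assumption, not taken from the source).
    """
    # Normalize the target to the reference within each sample.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # A lower Ct means more template, hence the negative exponent.
    dd_ct = d_ct_treated - d_ct_control
    return efficiency ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 8 cycles earlier
# after copper treatment, while the reference gene is unchanged.
induction = fold_change(ct_target_treated=20.0, ct_ref_treated=18.0,
                        ct_target_control=28.0, ct_ref_control=18.0)
print(induction)  # -> 256.0
```

With these made-up numbers the result is of the same order as the ~250-fold copper induction described for the endogenous MtnA gene. The same function could be applied to a knockdown versus an untreated sample to express residual activation as a fraction of the untreated level.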

Not surprisingly, depletion of MTF-1 severely reduced transcriptional activation from the MtnA promoter, confirming the central role of this activator. RNAi directed against TBP also had a dramatic inhibitory effect. The MtnA promoter is <10% as active when TBP levels are severely depleted. Surprisingly, knockdown of multiple TAFs had little apparent effect on the ability of MTF-1 to activate MtnA. Indeed, depletion of the TAFs actually stimulated (1.5- to 2-fold) production of RNA. With the exception of TAF11, a reduction of individual TAFs resulted in a remarkably uniform response. The reason for this uniformity became apparent when the stability of the TFIID complex was examined in the RNAi-treated cells. The overall stability of the holo-TFIID complex appears to be coupled to the stability of certain individual TAFs. In the most dramatic example, RNAi-targeted reduction of TAF4 leads to the concomitant loss of TAF1, TAF5, TAF6, and TAF9, as well as a detectable reduction in TBP. Interestingly, TAF2 and TAF11 are largely unaffected by depletion of TAF4. Similar results are observed for the other TAFs as well. When the transcript levels of the TAFs were measured after RNAi treatment, it was clear that the loss of stability occurs at the protein level, since the transcript levels for nontargeted TAFs are unaffected. For example, when TAF4 is targeted, only the TAF4 transcript is depleted (Marr, 2006).

In contrast to the TAFs, RNAi reduction of MED subunits gave striking but variable effects on the ability of MTF-1 to activate transcription from the MtnA promoter. Unlike TFIID, the response is far from uniform. For example, dsRNA directed against MED23 has little effect on induction of MtnA, while loss of MED17, the Drosophila SRB4 homolog, has a strong inhibitory effect. The lack of a uniform response in the MED RNAi led to a further investigation of the potential differential response upon depletion of MED subunits at related promoters activated by MTF-1. As discussed above, Drosophila has four metallothionein genes that respond to heavy metals. Three of these (MtnA, MtnB, and MtnD) are active in S2 cells. All three of these genes are specifically activated by the same factor, MTF-1. All three Mtn genes were examined in a single experiment using qPCR. First, it was confirmed that all three promoters, MtnA, MtnB, and MtnD, require MTF-1 for induction. Remarkably, distinct differential requirements were found for MED subunits depending on the promoter. For example, MED13, a subunit of the larger MED complex (ARC-L) thought to play a repressive role in transcription, is not essential for MtnA induction. In contrast, MED13 was found to be important for both MtnB and MtnD activation by MTF-1. The opposite specificity was seen with the MED26 subunit, a component of the smaller MED complex (CRSP) thought to play predominantly a coactivator role in transcription: MED26 is required for full induction of the MtnA promoter but is dispensable for MTF-1 activation of the MtnB and MtnD promoters. Thus, these experiments reveal a remarkable example of differential dependence on cofactor composition even though all three promoters tested use the same activator. Apparently, the precise role of individual MED subunits depends on the promoter context and structure, despite the absence of any evidence of direct binding of DNA by the MED complex (Marr, 2006).

To help rule out nonspecific effects on transcription such as a change in the concentration of free RNA polymerase, representative targets from TFIID and MED were tested in a transient transfection assay where the effect on a second promoter can be used for normalization. In these experiments, TAF4 and MED17 were chosen as representative targets, since TAF4 depletion compromises much of the TFIID complex and MED17 is likely a component of the core MED complex. The transient transfection data are largely consistent with the data generated at the endogenous locus and at the transgene (Marr, 2006).

The data presented above suggest that activation of the MtnA gene requires specific MED subunits, and at the same time the TAFs appear to be playing a potential negative regulatory role. Because it is clear that the TAFs are specifically recruited in S2 cells to the MtnA promoter in a copper-dependent manner by MTF-1, whether TFIID recruitment can occur in the absence of the MED complex was examined. To achieve this, RNAi directed against MED17 was used, which results in an almost complete loss of MED activity. Surprisingly, TFIID is still efficiently recruited to the MtnA gene. ChIP experiments confirmed that TBP and TAF2 are still actively (and likely directly) recruited to the endogenous MtnA gene by MTF-1 even when the gene is transcriptionally inactive as measured by qPCR analysis. The MtnA luciferase transgene system was used to investigate this relationship at the single-cell level. Without any RNAi, TBP, TAF2, and RNAPII were all recruited to the transgene. In agreement with the ChIP data above, even in the absence of MED activity, after MED17 depletion, TBP and TAF2 are nevertheless efficiently recruited to the transgene. In contrast, no RNAPII can be detected at the transgene consistent with the loss of transcription activation. Apparently, TFIID is recruited to the promoter, but the promoter is not active in supporting transcription. Importantly, recruitment of this 'inactive TFIID' is dependent on the activator MTF-1. In the absence of MTF-1, no TFIID or RNAPII is recruited to the transgene (Marr, 2006).

This perplexing result of recruiting an apparently 'inactive' TFIID prompted an examination of what happens when both TAFs and MEDs are depleted. Remarkably, when both the TAFs and MED complex are depleted and 'removed' from the MtnA promoter, MTF-1-dependent activation of transcription is restored to ~95% of the level of untreated cells, which is well above the inhibited level observed when the MEDs alone are depleted. In humans and Drosophila, TAFs can be subunits of other complexes such as TFTC and STAGA, so it is possible that the functional interaction analyzed is not TFIID-specific. To test this, specific subunits of these other complexes were targeted to determine if they would have a similar ability to rescue the MED knockdown. Unlike the TFIID subunits, RNAi against dAda2b, dGCN5, dSPT3, and dTRA1 was unable to rescue the loss of the MED subunits. These findings taken together suggest that the functional relationship revealed by these experiments with the MtnA promoter most likely does involve some regulatory transaction between TFIID and MED (Marr, 2006).

The requirement for coactivator complexes mediating transcriptional responses to activators has been well documented. However, by using an inducible Drosophila gene as a model system, a previously unknown functional interaction has been uncovered between two coactivator complexes, TFIID and MED. In the absence of TAFs, the cell responds inappropriately to a metal stimulus. The cell synthesizes 50%–200% more mRNA from the MtnA gene than it does in the presence of the TAFs. The data suggest that at this gene, TFIID is recruited in an inactive state, a state that impedes initiation of transcription. It is believed that this sets up a checkpoint early in the initiation process to meter the RNA synthesis. The MED complex must be recruited to get past this checkpoint. It is postulated that the MED complex likely modifies TFIID, converting it to an active state. This could be accomplished either through one of the known enzymatic activities of MED, phosphorylating (cdk8) or ubiquitylating (MED8) TFIID subunits, or through some, as yet undetected, chaperone-like function that remodels TFIID into an active conformation. Not surprisingly then, in the absence of MED subunits the cell cannot mount an appropriate response to environmental signals. In fact, depletions of many of the MED subunits lead to <20% of the normal amount of mRNA. Unlike the uniform response to depletion of TAFs, the response to depletion of MEDs is much less uniform. One possibility is that the MED complex is more functionally and structurally diverse than TFIID. Indeed, alternative subcomplexes of MED have been purified biochemically, whereas no such subcomplexes of TFIID have been reported (Marr, 2006).

By analysis of three different Mtn genes, all of which are dependent on the same single activator, it was found, surprisingly, that there is a differential requirement of specific MED subunits at the three Mtn promoters. This is taken as evidence that, depending on the precise arrangement of cis elements and promoter context, the same activator can require different mediator subunits or modules to transmit its signals to the basal apparatus (Marr, 2006).

Interestingly, the kinase module of the MED complex, previously linked with repression functions, is required for efficient activation at two of the promoters. This result, combined with the finding that at the MtnA promoter the TAFs have a repressive regulatory influence on transcription initiation, underscores the difficulty in assigning black and white functions to the coactivator complexes. It is likely that both TFIID and MED interpret multiple inputs from cellular signals and act either positively or negatively depending on the signals received as well as the specific promoter context. As such, the complexes may better be viewed as coregulators since they can play either a positive or negative role in the process of modulating gene expression. For example, only when both TFIID and MED are intact do Drosophila S2 cells produce the appropriate amounts of MtnA mRNA. In contrast, when either coactivator complex is disrupted, aberrant levels of transcription are seen. However, when both coactivator complexes are depleted, a significant level of metal inducible activation is actually restored. Presumably, in this 'stripped down' system, some portion of the remaining TBP pool can mediate transcription. Curiously, in the absence of TAFs but with a full complement of MEDs, there is also an aberrant level of transcription consistent with the notion that there is some finely tuned codependence between the TBP/TAF complex and the MED complex at this promoter (Marr, 2006).

The results also reinforce the notion that the activator is the primary determinant of the transcriptional response. The MTF-1 depletion experiments were the most detrimental to mRNA induction. In the absence of MTF-1, there is no detectable activation of the Mtn genes. In contrast, there is some residual transcription of MtnA even when either the MEDs or TBP are largely depleted from the Drosophila cells. This remaining activity could be due to incomplete depletion, or it could indicate alternative mechanisms of activation that are activator-dependent but can partially bypass the requirement for the coregulator complexes (Marr, 2006).

In the course of testing the requirement for TAFs in activated transcription, the codependent stability of the TFIID complex was discovered. Particularly striking is the finding that TAF4 depletion destabilizes most of the other TAFs and, to some extent, even TBP. Therefore, the TAF depletion experiments most likely reflect a loss of holo-TFIID rather than just the loss of individual subunits. It is worth noting that metazoan organisms contain multiple variants of TAF4: TAF4b in vertebrates and No-hitter in Drosophila. Both of these have been implicated in tissue-specific gene expression. It is conceivable that substitution of this keystone TAF can provide a mechanism to change the entire coregulator profile of TFIID (Marr, 2006).

One intriguing question this work raises is: Why would an activator recruit an inactive TFIID complex to the promoter? There are several previously described cases in which TFIID occupancy at a promoter does not strictly correlate with transcriptional activity. However, in most of these cases the genes being examined were either in a repressed or an unstimulated state. In contrast, the current studies were designed to specifically measure the role of coactivator complexes such as TFIID and MED in the context of an active gene MtnA upon metal stimulation. The ability to deplete MED activity under these conditions revealed the unexpected finding that although TFIID is dynamically recruited to the MtnA promoter, TFIID is mainly held in an 'inactive' state until the second cofactor complex, MED, is recruited. Perhaps this recruitment of an 'inactive' TFIID is a more common phenomenon that can only be detected in special circumstances and may represent a previously unappreciated control mechanism in transcription activation. If the activator first recruits TFIID, then subsequently recruits MED, and there is a requirement for additional factors to potentiate the secondary recruitment of coregulator assemblies, then this provides a potential checkpoint for fine-tuning the control of gene expression. Alternatively, since the cell invests a significant amount of energy in making a high level of transcript, requirement of continued stimulation (i.e., activator bound at the promoter) for mRNA production would provide the most economical use of resources (Marr, 2006).

TBP, Mot1, and NC2 establish a regulatory circuit that controls DPE-dependent versus TATA-dependent transcription

The RNA polymerase II core promoter is a structurally and functionally diverse transcriptional module. RNAi depletion and overexpression experiments revealed a genetic circuit that controls the balance of transcription from two core promoter motifs, the TATA box and the downstream core promoter element (DPE). In this circuit, TBP activates TATA-dependent transcription and represses DPE-dependent transcription, whereas Mot1 and NC2 block TBP function and thus repress TATA-dependent transcription and activate DPE-dependent transcription. This regulatory circuit is likely to be one means by which biological networks can transmit transcriptional signals, such as those from DPE-specific and TATA-specific enhancers, via distinct pathways (Hsu, 2008).

The RNA polymerase II core promoter comprises the sequences that direct the initiation of transcription. Although it has often been presumed that the core promoter is a generic entity, current evidence indicates that there is considerable diversity in core promoter structure and function. Hence, the core promoter is a regulatory element (Hsu, 2008 and references therein).

This study focuses on the relation between two core promoter motifs: the downstream core promoter element (DPE) and the TATA box. The TATA box is the most ancient core promoter motif, as it is conserved from archaebacteria to humans. It has a consensus of TATAWAAR, where the upstream T nucleotide is typically located about -31 or -30 relative to the A+1 in the Initiator (Inr) element. The DPE appears to be conserved among metazoans. It is strictly located from +28 to +33 relative to the A+1 in the Inr, and has a consensus of RGWYVT in Drosophila (Hsu, 2008).
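The consensus sequences and positions given above can be checked mechanically. The following is a minimal sketch, assuming the standard IUPAC nucleotide codes (W = A/T, R = A/G, Y = C/T, V = A/C/G) and a caller that already knows the 0-based index of the A+1 nucleotide. The function names and the synthetic promoter are hypothetical, and real motif searches usually tolerate mismatches, which this strict version does not.

```python
import re

# Subset of the IUPAC nucleotide codes needed for the two consensus motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "V": "[ACG]"}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in consensus))


TATA = consensus_to_regex("TATAWAAR")
DPE = consensus_to_regex("RGWYVT")


def has_tata(seq, inr_index):
    """True if a TATA box starts at -31 or -30 relative to A+1.

    There is no position 0, so position -n sits at inr_index - n.
    """
    for offset in (31, 30):
        start = inr_index - offset
        if start >= 0 and TATA.fullmatch(seq[start:start + 8]):
            return True
    return False


def has_dpe(seq, inr_index):
    """True if the DPE consensus occupies exactly +28 to +33.

    Position +n sits at inr_index + n - 1, so +28 is inr_index + 27.
    """
    start = inr_index + 27
    return DPE.fullmatch(seq[start:start + 6]) is not None


# Synthetic promoter (not a real sequence) carrying both motifs.
INR_INDEX = 31
promoter = "TATAAAAG" + "C" * 23 + "A" + "C" * 26 + "AGACGT"

print(has_tata(promoter, INR_INDEX), has_dpe(promoter, INR_INDEX))
```

The strict positional check for the DPE reflects the statement that it lies from +28 to +33, whereas the TATA check allows the two start positions mentioned in the text.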

Both the TATA box and DPE are binding sites for the TFIID basal transcription factor, but TFIID appears to have distinct modes of binding to the two core promoter motifs. The TBP subunit of TFIID binds to the TATA box, whereas the TAF6 and TAF9 subunits of TFIID are in close proximity to the DPE. In addition, the DNase I footprinting patterns on TATA-containing versus DPE-containing promoters are different. In particular, TFIID footprints of DPE-dependent core promoters exhibit a periodic 10-bp DNase I digestion pattern that suggests an extended, close interaction of TFIID from the Inr through the DPE (Hsu, 2008 and references therein).

There are differences in the functional properties of DPE-dependent versus TATA-dependent core promoters. For instance, an enhancer-trapping analysis in Drosophila revealed the existence of DPE-specific as well as TATA-specific transcriptional enhancers. It was also found that a set of factors (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNA polymerase II, PC4, and Sp1) that is sufficient for transcription of promoters containing both TATA and DCE (downstream core element) motifs is not able to transcribe a DPE-dependent promoter. In that case, DPE-dependent transcription was additionally found to require casein kinase II (CKII) and Mediator. In other studies, NC2 (also known as Dr1-Drap1), which was originally identified as a repressor of TATA-dependent transcription, was found to activate transcription from five different DPE-dependent core promoters in reactions performed with a nuclear extract. With a purified transcription system, however, NC2 activation of a DPE-dependent core promoter was not observed (Hsu, 2008).

To determine the nature of the factors that promote DPE-dependent versus TATA-dependent transcription, the properties of key transcription factors were investigated by RNAi depletion, overexpression, and chromatin immunoprecipitation (ChIP) analyses with multiple DPE-dependent and TATA-dependent promoters. The new findings reveal a regulatory circuit that controls the balance between DPE-dependent and TATA-dependent transcription (Hsu, 2008).

This study used cultured Drosophila cells as the experimental system to investigate DPE versus TATA function. Two sets of reporter constructs were created that contain either TATA or DPE motifs driving a luciferase reporter gene. The DPE-dependent and TATA-dependent promoters in each set were identical, except for the sequences at the positions of the DPE and TATA motifs, and had comparable transcriptional activities (Hsu, 2008).

The effects of several transcription factors on DPE versus TATA transcription were investigated by RNAi depletion analysis. The transcription factors were selected on the basis of their fundamental importance as well as their potential role in DPE-dependent transcription. First, RNAi depletion of each target factor was carried out, and then one-half of the cells were transfected with the DPE-dependent reporter construct and the other half with the TATA-dependent reporter. The resulting transcription levels were assessed by measurement of the luciferase activities relative to those in mock RNAi controls (Hsu, 2008).
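The readout described here, luciferase activity relative to a mock RNAi control for each reporter, can be sketched as follows. This is only an illustration of the normalization logic: the function name, the choice to summarize the comparison as a DPE/TATA ratio, and every number are assumptions and not values from the study.

```python
def relative_activities(dpe_kd, dpe_mock, tata_kd, tata_mock):
    """Express each reporter relative to its own mock RNAi control and
    summarize the differential effect as a DPE/TATA ratio.

    A ratio above 1 means the depletion hurts TATA-dependent transcription
    more than DPE-dependent transcription; below 1 means the reverse.
    """
    dpe_rel = dpe_kd / dpe_mock
    tata_rel = tata_kd / tata_mock
    return dpe_rel, tata_rel, dpe_rel / tata_rel


# Hypothetical luciferase readings (arbitrary light units) for a knockdown
# that leaves the DPE reporter unchanged but lowers the TATA reporter.
dpe_rel, tata_rel, ratio = relative_activities(
    dpe_kd=1000.0, dpe_mock=1000.0,
    tata_kd=250.0, tata_mock=1000.0,
)
print(dpe_rel, tata_rel, ratio)  # -> 1.0 0.25 4.0
```

Normalizing each reporter to its own mock control before comparing them keeps differences in the baseline strength of the two promoters from being mistaken for a factor-specific effect.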

Depletion of TBP sharply decreases TATA-dependent transcription, but has little effect on DPE-dependent transcription. This effect was observed with a distinct and independent set of DPE-dependent and TATA-dependent reporter constructs as well as with a different nonoverlapping dsRNA probe for TBP. Consistent with the ability of TFIIA to promote TBP binding to DNA, depletion of TFIIA reduces TATA transcription more than DPE transcription with two different sets of reporter constructs. In contrast, no differential DPE versus TATA effects were seen upon RNAi depletion of TAF4 (which is essential for the structural integrity of TFIID), TFIIB, CKIIα, a PC4-like protein, subunits of Mediator (Med17, Med24), or subunits of the SAGA/TFTC complex (Gcn5, Spt3, Ada2b) (Hsu, 2008).

Thus, these findings indicate that TBP and, to a lesser extent, TFIIA have a key role in discriminating between DPE- versus TATA-dependent transcription. The stronger effect of TBP relative to TFIIA is consistent with an auxiliary function of TFIIA, such as its ability to increase the binding of TBP to the TATA box. Because depletion of TBP did not adversely affect DPE-dependent transcription, the possibility was considered that DPE-dependent transcription might involve a factor, such as SAGA/TFTC, that lacks TBP. Therefore the effect of depletion of three SAGA/TFTC subunits (Gcn5, Spt3, and Ada2b) was tested, but no substantial decrease was seen in DPE-dependent transcription or any differential DPE versus TATA effects. Thus, it appears unlikely that SAGA/TFTC is important for DPE-dependent transcription. Lastly, upon depletion of CKII, Mediator, PC4-like, TAF4, and TFIIB, a decrease was observed in both DPE-dependent and TATA-dependent transcription. These results are consistent with a more general transcriptional function rather than a DPE-specific or TATA-specific activity for these factors (Hsu, 2008).

NC2 has been previously found to be a DPE-specific transcriptional activator. With a different biochemical system, however, NC2-mediated enhancement of DPE transcription was not observed. Therefore attempts were made to clarify these apparently contrasting results by RNAi analysis of NC2 with DPE versus TATA reporter gene systems. NC2 comprises two subunits, NC2α (Drap1) and NC2β (Dr1). Upon RNAi depletion of either NC2α or NC2β, a more substantial decrease was seen in DPE- relative to TATA-dependent transcription with two different sets of reporter genes as well as with two different dsRNAs. These results therefore indicate that NC2 promotes DPE-dependent transcription relative to TATA-dependent transcription in cultured cells (Hsu, 2008).

Next, the effects of Mot1 (also known as BTAF1 and Hel89B) on DPE versus TATA transcription were tested. Like NC2, Mot1 antagonizes TBP function. NC2 represses TATA-dependent transcription by blocking the association of TBP with other factors such as TFIIA and TFIIB. Mot1 is an ATPase that removes TBP from DNA by an ATP-dependent mechanism. Genetic studies in Saccharomyces cerevisiae suggest that NC2 and Mot1 have related functions. NC2 and Mot1 bind to overlapping regions in the yeast genome and form a complex with TBP and DNA. In addition, although NC2 and Mot1 are often thought to be repressive, a positive function for these factors has been observed in vitro and in vivo (Hsu, 2008 and references therein).

It was observed that RNAi depletion of Mot1 has a stronger detrimental effect on DPE-dependent than TATA-dependent transcription. This effect was seen with two different sets of reporter genes as well as with two independent nonoverlapping dsRNA fragments. Thus, like NC2, Mot1 promotes DPE- relative to TATA-dependent transcription (Hsu, 2008).

To investigate the relationship between TBP, NC2, and Mot1 in the regulation of core promoter activity, different combinations of these factors were codepleted and the resulting effects upon DPE versus TATA transcription were determined. Codepletion of both NC2α and Mot1 preferentially decreases DPE relative to TATA transcription to an extent that is similar to that seen upon depletion of either NC2α or Mot1 alone. These results suggest that NC2 and Mot1 promote DPE-dependent transcription via the same pathway. In contrast, when TBP + Mot1 or TBP + NC2α were codepleted, nearly the same effect on DPE versus TATA transcription was seen as that seen upon depletion of TBP alone. These findings suggest that TBP is downstream from NC2 and Mot1 in the pathway that regulates DPE versus TATA transcription. Thus, NC2 and Mot1 appear to modulate DPE versus TATA transcription by acting via TBP (Hsu, 2008).
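The codepletion reasoning above can be written out as a small sketch. It assumes each depletion is summarized by one number, the DPE/TATA transcription ratio relative to an undepleted control (1.0 means no change). The function name, the tolerance, and every numeric value in the usage lines are hypothetical placeholders chosen for illustration; none are measurements from Hsu (2008).

```python
def interpret_codepletion(effect_a, effect_b, effect_ab, tolerance=0.15):
    """Classify a double depletion by which single depletion it resembles.

    Each effect is a DPE/TATA ratio relative to control (hypothetical units).
    The tolerance is an arbitrary illustrative cutoff, not a statistical test.
    """
    like_a = abs(effect_ab - effect_a) <= tolerance
    like_b = abs(effect_ab - effect_b) <= tolerance
    if like_a and like_b:
        # Double depletion is no stronger than either single one.
        return "same pathway"
    if like_a:
        # Double depletion looks like A alone: A masks B (A is downstream).
        return "A downstream"
    if like_b:
        return "B downstream"
    return "no simple ordering"


# Placeholder values only: A = NC2alpha, B = Mot1 (both lower the ratio).
nc2_mot1 = interpret_codepletion(0.5, 0.5, 0.45)
# Placeholder values only: A = TBP (raises the ratio), B = Mot1.
tbp_mot1 = interpret_codepletion(2.0, 0.5, 1.9)
print(nc2_mot1, "|", tbp_mot1)
```

With these placeholder inputs the sketch mirrors the two inferences in the text: NC2 and Mot1 fall in one pathway, and the TBP phenotype masks that of Mot1.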

To complement the RNAi depletion studies, the effects of overexpression of TBP, Mot1, or NC2 were investigated in S2 cells. In these experiments, TBP, Mot1, or NC2 expression vectors were cotransfected along with the DPE-dependent or TATA-dependent reporter constructs. Overexpression of TBP increases TATA-dependent transcription and decreases DPE-dependent transcription. Conversely, overexpression of Mot1 increases DPE-dependent transcription and decreases TATA-dependent transcription. Overexpression of both subunits of NC2 decreases TATA-dependent transcription, but has little effect on DPE-dependent transcription. Consistent with the two NC2 subunits functioning together in a complex, overexpression of NC2α alone or NC2β alone has no effect on DPE-dependent or TATA-dependent transcription. In addition, a parallel set of overexpression experiments was carried out with TBP, Mot1, and NC2 with a different set of DPE-dependent and TATA-dependent reporter genes, and nearly identical results were obtained. These findings further demonstrate that TBP favors TATA relative to DPE transcription, whereas Mot1 and NC2 favor DPE relative to TATA transcription (Hsu, 2008).

To examine the functions of TBP, Mot1, and NC2 in a more natural context, the effects of RNAi depletion of TBP, Mot1, or NC2 upon transcription of endogenous DPE- or TATA-containing genes were tested in Drosophila Kc cells. In these experiments, secondary/late ecdysone-responsive genes, which are activated upon ecdysone induction, were employed. In this manner, it was possible to characterize the requirements for TBP, Mot1, and NC2 for transcriptional activation (Hsu, 2008).

Many genes in Drosophila are activated by the steroid hormone 20-hydroxyecdysone (20HE). A list of genes induced by 20HE in Drosophila Kc cells was obtained. From this list, secondary/late-response genes were identified with DPE + Inr motifs (CG9511, CG16876, Glut1) or TATA + Inr motifs (Obp99c, CG4500) in their core promoters. The 20HE induction of these genes in Kc cells was confirmed by using real-time RT-PCR. In addition, the transcription start site of each of these genes was verified by primer extension analysis of mRNA isolated from Kc cells (Hsu, 2008).

The RNAi analysis of the endogenous secondary/late-response genes was carried out as follows: TBP, TAF4, NC2α, and Mot1 were each individually depleted by RNAi in Kc cells for 4 d, and then the ecdysone-responsive genes were induced with 20HE for 24 h. The total RNA was isolated, and the transcript levels of the selected genes were determined by real-time RT-PCR. It was observed that depletion of TBP decreases transcription of the TATA-containing promoters and increases transcription of the DPE-containing promoters. Thus, these results suggest not only that TBP activates TATA-dependent promoters, but also that it represses DPE-dependent promoters. Conversely, it was found that depletion of Mot1 or NC2α decreases transcription of DPE-containing promoters and increases transcription of TATA-containing promoters. These findings suggest a positive function of Mot1 and NC2 at DPE-dependent promoters and a negative function at TATA-containing promoters. RNAi depletion of TAF4 causes a substantial decrease in transcription from both DPE-containing and TATA-containing promoters. These results further support the conclusion that TAF4 is required for both DPE-dependent and TATA-dependent transcription (Hsu, 2008).

The RNAi depletion analysis with the endogenous genes leads to nearly the same conclusions as the experiments with the transfected luciferase reporter genes. Both sets of experiments indicate that TBP favors TATA-dependent relative to DPE-dependent transcription, and that Mot1 and NC2 favor DPE-dependent relative to TATA-dependent transcription. However, it is useful to note two distinctions. First, TBP depletion results in an increase in transcription from endogenous DPE-containing genes, but does not alter transcription from transfected DPE-dependent reporter genes. Second, depletion of Mot1 or NC2α causes an increase in transcription from endogenous TATA-containing genes, but results in a slight decrease in transcription from transfected TATA-dependent reporter genes. The analysis of the endogenous genes is likely to provide a more accurate representation of TBP, Mot1, and NC2 activity than the studies with the transfected genes, because the endogenous genes are in their natural context at the normal copy number and the experiments with the endogenous genes do not involve the extra transfection procedure. Thus, the findings from the analysis of the endogenous genes suggest a repressive function of TBP at DPE-dependent promoters as well as a repressive function of Mot1 and NC2 at TATA-dependent promoters (Hsu, 2008).

The secondary/late ecdysone-responsive genes were further characterized by ChIP analysis with TBP and RNA polymerase II (Rpb3 subunit), for which ChIP-quality antibodies were available. With the TATA-containing CG4500 promoter, there is increased ChIP signal for both TBP and Rpb3 in the promoter region upon 20HE induction. In the control/reference TATA-containing hsp70 promoter, an increase in ChIP of TBP and Rpb3 was also observed in the promoter region. By comparison, with the DPE-containing Glut1 and CG16876 promoters, there is increased ChIP of Rpb3 in the promoter region upon 20HE induction; however, the ChIP signal for TBP does not increase under the same conditions. The absence of an increased ChIP signal for TBP with the DPE-containing promoters does not necessarily indicate that TBP is not present at the promoter; for instance, it is possible that TBP may be in an altered configuration that masks the accessibility of the antibodies. Yet, whether or not TBP is in close proximity to the DPE-containing promoters, these results show that there are differences in the nature of the interaction of TBP with TATA-containing versus DPE-containing promoters (Hsu, 2008).

It is also relevant to note that secondary/late-response genes were chosen in these studies, because secondary/late genes are more likely than primary/early-response genes to be in a naïve state prior to ecdysone induction. To test this notion, RNAi depletion analyses were carried out with two primary/early-response genes, E74A and E75B, both of which contain DPE motifs. With these genes, no change was observed in transcription upon RNAi depletion of TBP, TAF4, Mot1, or NC2α. Moreover, ChIP analysis further revealed that both TBP and RNA polymerase II (Rpb3 subunit) are present at the promoters prior to ecdysone induction. Therefore, it appears likely that these primary/early-response genes exist in a preactivated state that does not require the subsequent action of factors such as TFIID, Mot1, or NC2 (Hsu, 2008).

The RNAi depletion and overexpression data reveal a regulatory circuit with the following properties: TBP activates TATA-dependent transcription and represses DPE-dependent transcription; then, Mot1 and NC2 act to block both the activating and repressive functions of TBP. In this model, there are opposing forces that alter the balance between DPE and TATA transcription. A decrease in TBP or an increase in Mot1/NC2 favors DPE transcription, whereas an increase in TBP or a decrease in Mot1/NC2 favors TATA transcription. Importantly, the functions of Mot1 and NC2 are dependent on TBP. In addition, the proposed circuit is consistent with the known antagonistic relationship between TBP and NC2 as well as between TBP and Mot1 (Hsu, 2008).
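The direction of each arrow in this circuit can be checked with a toy calculation. The sketch below is a qualitative illustration only, not a model from Hsu (2008): the functional forms (a simple ratio for antagonism, a linear activation term, and a hyperbolic repression term), the single combined Mot1/NC2 level, and all numeric levels are assumptions made for illustration.

```python
def promoter_outputs(tbp, mot1_nc2):
    """Return (tata_output, dpe_output) in arbitrary units.

    Assumed toy forms (not from the source):
      - Mot1/NC2 antagonize TBP, lowering its effective activity.
      - Effective TBP activates TATA-dependent transcription.
      - Effective TBP represses DPE-dependent transcription.
    """
    effective_tbp = tbp / (1.0 + mot1_nc2)
    tata_output = effective_tbp
    dpe_output = 1.0 / (1.0 + effective_tbp)
    return tata_output, dpe_output


# Hypothetical relative levels; 1.0 stands for the unperturbed cell.
conditions = {
    "baseline": (1.0, 1.0),
    "TBP decreased": (0.2, 1.0),
    "Mot1/NC2 increased": (1.0, 3.0),
    "TBP increased": (3.0, 1.0),
    "Mot1/NC2 decreased": (1.0, 0.2),
}
for name, (tbp, antagonist) in conditions.items():
    tata, dpe = promoter_outputs(tbp, antagonist)
    print(f"{name:20s} TATA={tata:.2f} DPE={dpe:.2f}")
```

Under these assumptions, lowering TBP or raising Mot1/NC2 shifts the outputs toward DPE, and the opposite perturbations shift them toward TATA, which is the balance described in the text. Because Mot1/NC2 act only through the effective TBP term, their effect vanishes when TBP is absent, in line with the statement that their functions depend on TBP.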

How might TBP repress DPE-dependent transcription? Two possible explanations are suggested. (1) In the absence of a TATA box, TBP might interfere with the proper assembly of the transcription initiation complex. (2) There may be an essential DPE-directed transcription factor that is inhibited by TBP. It is possible that DPE-mediated transcription does not directly involve TBP; there is substantial evidence of RNA polymerase II-mediated transcription occurring in the absence of TBP (Hsu, 2008 and references therein).

It was also considered whether either of the TBP-related factors, TRF1 or TRF2, is used instead of TBP at DPE-containing promoters. To this end, the effect of depleting TRF1 or TRF2 was examined upon the expression of DPE-containing versus TATA-containing endogenous genes. TRF1, which is largely involved in RNA polymerase III transcription in Drosophila, has little or no effect on transcription of DPE-containing or TATA-containing genes. TRF2 is important for both DPE-mediated and TATA-mediated transcription. The effect of TRF2 is similar to that of TAF4, which appears to contribute to both DPE-dependent and TATA-dependent transcription. Neither TRF1 nor TRF2 exhibits an opposite effect on DPE-mediated versus TATA-mediated transcription as do TBP, Mot1, and NC2. In addition, a genome-wide ChIP analysis of TRF2 did not reveal an association of TRF2 with DPE-containing genes. Thus, at the present time, there is no evidence suggesting a specific link between either TRF1 or TRF2 and DPE-mediated or TATA-mediated transcription (Hsu, 2008).

In conclusion, the analysis of TBP, Mot1, and NC2 in the context of DPE-containing versus TATA-containing promoters has revealed a regulatory circuit that controls the balance between DPE-mediated versus TATA-mediated transcription. This circuit may be a key means by which DPE or TATA specificity of transcriptional enhancers is achieved. In the future, it will be interesting and important to build upon this core circuit to identify the connections and mechanisms by which biological networks use DPE and TATA specificity to increase the number of pathways by which signals can be transmitted (Hsu, 2008).

Structures of three distinct activator-TFIID complexes

Sequence-specific DNA-binding activators, key regulators of gene expression, stimulate transcription in part by targeting the core promoter recognition TFIID complex and aiding in its recruitment to promoter DNA. Although it has been established that activators can interact with multiple components of TFIID, it is unknown whether common or distinct surfaces within TFIID are targeted by activators and what changes, if any, in the structure of TFIID may occur upon binding activators. As a first step toward structurally dissecting activator/TFIID interactions, the three-dimensional structures of TFIID bound to three distinct activators (i.e., the tumor suppressor p53 protein, glutamine-rich Sp1 and the oncoprotein c-Jun) were determined by electron microscopy and single-particle reconstruction and then compared. By a combination of EM and biochemical mapping analysis, these results uncover distinct contact regions within TFIID bound by each activator. Unlike the coactivator CRSP/Mediator complex, which undergoes drastic and global structural changes upon activator binding, holo-TFIID showed only a rather confined set of local, conserved structural changes when bound by each activator. These results suggest that activator contact may induce unique structural features of TFIID, thus providing nanoscale information on activator-dependent TFIID assembly and transcription initiation (Liu, 2009).

3D density difference maps generated from reconstructions of the three independent activator/TFIID assemblies (i.e., p53-IID, Sp1-IID, and c-Jun-IID) and free holo-TFIID have served as a method to map the most likely contact sites of these activators within the native TBP-TAF complex. Remarkably, each activator contacts TFIID via select TAF interfaces within TFIID. The unique and localized arrangements of these three activators contacting different surfaces of TFIID could be indicative of the wide diversity of potential activator contact points within TFIID that would be dependent on both the specificity of activation domains as well as core promoter DNA sequences appended to target gene promoters. It is also possible, however, that these distinct activator-TFIID contacts can form a common scaffold when TFIID binds to the core promoter DNA (Liu, 2009).

It is well established that activators including p53, Sp1, and c-Jun frequently work synergistically with each other or other activators to potentiate selective gene expression programs in response to a variety of stimuli in vivo. Therefore, combinatorial mechanisms of promoter activation might favor distinct nonoverlapping activator-binding sites within TFIID, which can be achieved by specific interactions between selective TAF subunits and activators. Indeed, it was established that TAF1 and TAF4 serve as coactivators for Sp1, while TAF1, TAF6, and TAF9 mediate p53-dependent transactivation, and TAF1 and TAF7 subunits are thought to be coactivators for c-Jun. Since activators make sequence-specific contacts with the DNA template at various positions upstream of the core promoter, it is also plausible that activators bound to unique surfaces of TFIID can influence specific structures of a promoter as the DNA traverses along TFIID, resulting in distinct activator/promoter DNA structures (Liu, 2009).

Activator mapping results also complement and structurally extend the functional relevance of previous biochemical and immunomapping studies of TFIID. For example, label transfer studies show that the N-terminal activation domain of p53 contacts TAF6, confirming previous biochemical evidence showing that amino acids 1-42 of p53 contact TAF6/9. In support of this observation, the p53-IID 3D structure indicates that p53 contacts TFIID at lobes A and C where TAF6/9 are located as determined by EM immunomapping. In addition, previous studies have shown that both TBP and TAF1 can directly contact p53 in the absence of additional TFIID subunits. Interestingly, body-labeled p53 cross-linked to TAF1, TAF5, and weakly to TBP, thus extending the immunomapping studies that determined the locations of TBP and the N terminus of TAF1 at lobe C. Thus, EM activator mapping studies show a significant interface between p53 and specific TAFs located at lobes A and C of TFIID. Likewise, Sp1 label transfer results confirmed previous biochemical data showing a direct interaction between TAF4 and the N-terminal glutamine-rich domains of Sp1. In addition to TAF4, TAF6 was identified as weakly cross-linked to Sp1, suggesting that TAF6 may also be in the vicinity but perhaps more distal to the N terminus of Sp1. The largest TFIID subunit, TAF1, was cross-linked when body-labeled Sp1 was used. This result was not entirely unexpected, since previous studies found that TAF1 is required for Sp1-dependent transactivation, possibly through a direct interaction between TAF1 and Sp1 (Liu, 2009).

In comparison with p53 and Sp1, body-labeled c-Jun was shown to contact TAF1 and TAF6 in label transfer studies with no subunits contacting the N-terminal activation domain of c-Jun. This N-terminal activation domain of c-Jun may be structurally flexible or predominantly unstructured and is apparently positioned away from TFIID contacts. Indeed, successful structural studies of c-Jun thus far have been limited to the C-terminal leucine zipper DNA-binding region when bound to DNA. Previous biochemical assays have shown that the C-terminal basic leucine zipper DNA-binding region also contacts the N terminus of TAF1 (Liu, 2009).

It is worth noting that the extra density representing c-Jun and the other activator polypeptides in EM studies may not reflect the full expected size of the activators. This is due to the presence of large unstructured regions in these proteins that are averaged out during structural analysis. As activators contain multiple molten globular domains that likely interact with different partners, one would expect a high degree of structural disorder in the domains that are not in direct contact with TFIID. Thus, the extra density associated with each activator determined from the single-particle reconstructions likely represents only the most stably associated portion of the activator bound to TFIID. This common situation would invariably lead to underrepresenting the actual size of the activator in a manner not unlike crystal structures of domains with flexible loops that become 'invisible' in the crystal structure (Liu, 2009).

Based on EM immunomapping, there are two copies of TAF6 within TFIID, wherein one copy resides in lobe A and another in lobe B. Collectively, the current studies suggest that two distinct activators (p53 and c-Jun) strongly contact the two different TAF6 subunits that are each located in different lobes of TFIID. It is unknown how p53 or c-Jun discriminates between TAF6 on lobe A versus B when binding to TFIID. In the future, it will be interesting to investigate if these two activators can bind to a single TFIID molecule simultaneously and decipher 3D structures of TFIID assemblies bound to select endogenous promoter DNA sequences in the presence and absence of distinct activators that are engaged in synergistic transcriptional activation (Liu, 2009).

It is of note that unlike the radical, diverse, and global structural changes observed with CRSP/Mediator complexes upon activator binding, TFIID largely retains its overall architecture when bound by three different activators. Interestingly, this study found that two of the activator/IID structures, the p53-IID and Sp1-IID assemblies, appear to be more constricted around the central cavity with narrower ChB-D and ChA-B channels, while the third structure, c-Jun-IID, remains most similar to free holo-TFIID. In particular, the p53-IID structure more closely resembles the closed conformational state of the previous cryo-TFIID structure. To test if p53-bound TFIID mimics the most closed conformational form of holo-TFIID, 3D reconstructions were performed using either the most closed or the most open cryo-TFIID structure as an initial reference volume for refinement. Interestingly, it was found that the newly refined 3D structures generated from the closed and the open reference volumes are fairly similar, with possibly a partial occupancy of p53 on lobe A. These findings suggest that the overall p53-TFIID structure tends to move toward the closed conformation with moderate movement at the outer tips of lobes A and B, even though p53-IID is predominantly observed in an intermediate average conformational form between the most closed and open forms. Perhaps factors contacting lobe A or C can induce certain coordinated movements within lobes that lead to a closed conformation of TFIID (Liu, 2009).

Although TFIID largely retains its prototypic global architecture upon activator binding, several common localized structural changes induced upon activator binding were observed in the 3D reconstruction. For example, a prominent and consistent induced extra density protrusion located in lobe D was observed when each of the three different activators binds TFIID. Given that all these activators are represented by distinct densities with unique sizes and shapes within the bound TFIID structure, and the fact that it has been demonstrated that they each can target different subunits within TFIID by a number of independent biochemical assays, it seems reasonable to assign 'unique and significant' extra densities located at distinct sites as representing the different bound activators. In contrast, the common similarly sized extra density seen at lobe D of each activator-IID structure most likely represents a conserved conformational change induced by these three different activators. Interestingly, this protrusion in lobe D resides distal to each of the activator-binding sites, suggesting that these three activators may potentially induce a long-range internal conformational change within TFIID. It would be intriguing to identify which TAF subunits are located at the tip of lobe D and eventually determine the function, if any, of this extended lobe in activator-induced transcription initiation. However, despite the potential significance of these structural changes induced by activators, it is premature to speculate regarding their functional importance (Liu, 2009).

Architecture of an RNA polymerase II transcription pre-initiation complex

The protein density and arrangement of subunits of a complete, 32-protein, RNA polymerase II (pol II) transcription pre-initiation complex (PIC) were determined by means of cryogenic electron microscopy and a combination of chemical cross-linking and mass spectrometry. The PIC showed a marked division in two parts, one containing all the general transcription factors (GTFs) and the other pol II. Promoter DNA was associated only with the GTFs, suspended above the pol II cleft and not in contact with pol II. This structural principle of the PIC underlies its conversion to a transcriptionally active state; the PIC is poised for the formation of a transcription bubble and descent of the DNA into the pol II cleft (Murakami, 2013).

This study has revealed a central principle of the PIC: the association of promoter DNA only with the GTFs and not with pol II. Promoter DNA is suspended above the pol II cleft, contacting three GTFs -- TFIIB, TFIID (TBP subunit), and TFIIE -- at the upstream end of the cleft (TATA box) and contacting TFIIH (Ssl2 helicase subunit) at the downstream end. In between, the DNA is free and available for action of the helicase, which untwists the DNA to introduce negative superhelical strain and thereby promote melting at a distance (Murakami, 2013).

This principle of the PIC is a consequence of the rigidity of duplex DNA. The promoter duplex must follow a straight path, whereas bending through ~90° is required for binding in the pol II cleft. Only after melting can the DNA bend for entry in the cleft. Melting is thermally driven, induced by untwisting strain in the DNA above the cleft. A melted region is short-lived and must be captured by binding to pol II, which occurs rapidly enough because the DNA is positioned above the cleft. The GTFs therefore catalyze the formation of a stably melted region (transcription bubble) in two ways, by the introduction of untwisting strain (by the helicase) and by positioning promoter DNA (Murakami, 2013).

Untwisting strain is distributed throughout the DNA above the pol II cleft, so melting may occur at any point, but only a melted region adjacent to TFIIB is stabilized by binding to pol II. The reason is again the rigidity of duplex DNA, and the requirement for a sharp bend adjacent to TFIIB to penetrate the pol II cleft. A single strand of DNA must extend from the point of contact with TFIIB, ~13 bp downstream of the TATA box, through the binding site for the transcription bubble in pol II. TFIIB may also interact with the single strand to stabilize the bubble (Murakami, 2013).

These conclusions are based on results from both cryo-EM and XL-MS, which served to validate one another: Segmentation and labeling of electron density, based on fitting pol II and other known structures, was consistent with all but three of 266 cross-links observed. The PIC structure is also consistent with partial structural information from x-ray crystallography (pol II-TFIIB, pol II-TFIIS, TFIIA-TBP-TFIIB-DNA, and Tfb2-Tfb5), from nuclear magnetic resonance (Tfb1-Tfa1 and Tfa2-DNA), and from EM (core and holo TFIIH). This consistency provides cross-validation, both supporting this PIC structure and establishing the relevance of the partial structural information. Further consistency was found with the results of FeBABE cleavage mapping of complexes formed in yeast nuclear extract; the locations of proteins along the DNA in the PIC structure and those determined with FeBABE cleavage differ by no more than 5 bp. This PIC structure also agrees with results of protein-DNA cross-linking in a reconstituted human transcription system; positions of TFIIE and TFIIH differ between the two studies by ~20 and 10 bp. The location of Ssl2 in this structure, ~30 bp downstream from the TATA box, supports the proposal, made on the basis of previous DNA-protein cross-linking analysis, that helicase action torques the DNA to introduce untwisting strain and thereby to promote melting at a distance (Murakami, 2013).

Association of the winged helix motif of the TFIIEalpha subunit of TFIIE with either the TFIIEbeta subunit or TFIIB distinguishes its functions in transcription

In eukaryotes, the general transcription factor TFIIE consists of two subunits, alpha and beta, and plays essential roles in transcription. Structure-function studies indicate that TFIIE has three winged helix (WH) motifs, with one in TFIIEα and two in TFIIEβ. Recent studies suggested that, by binding to the clamp region of RNA polymerase II, TFIIEα-WH promotes the conformational change that transforms the promoter-bound inactive preinitiation complex to the active complex. To elucidate its roles in transcription, functional analyses of point-mutated human TFIIEα-WH proteins were carried out. In vitro transcription analyses identified two classes of mutants. One class was defective in transcription initiation, and the other was defective in the transition from initiation to elongation. Analyses of the binding of this motif to other general transcription factors showed that the former class was defective in binding to the basic helix-loop-helix motif of TFIIEβ and the latter class was defective in binding to the N-terminal cyclin homology region of TFIIB. Furthermore, TFIIEα-WH bound to the TFIIH XPB subunit at a third distinct region. Therefore, these results provide further insights into the mechanisms underlying RNA polymerase II activation at the initial stages of transcription (Tanaka, 2015).

dTAF10- and dTAF10b-containing complexes are required for ecdysone-driven larval-pupal morphogenesis in Drosophila melanogaster

In eukaryotes, the TFIID complex is required for preinitiation complex assembly, which positions RNA polymerase II around transcription start sites. Histone acetyltransferase complexes, including SAGA and ATAC, modulate transcription at several steps through modification of specific core histone residues. This study investigated the function of Drosophila proteins TAF10 and TAF10b, which are subunits of dTFIID and dSAGA, respectively. The simultaneous deletion of both dTaf10 genes impaired the recruitment of the dTFIID subunit dTAF5 to polytene chromosomes, while binding of other TFIID subunits, dTAF1 and RNAPII was not affected. The lack of both dTAF10 proteins resulted in failures in the larval-pupal transition during metamorphosis and in transcriptional reprogramming at this developmental stage. Importantly, the phenotype resulting from dTaf10+dTaf10b mutation could be rescued by ectopically added ecdysone, suggesting that dTAF10- and/or dTAF10b-containing complexes are involved in the expression of ecdysone biosynthetic genes. These data support the idea that the presence of dTAF10 proteins in dTFIID and/or dSAGA is required only at specific developmental steps. It is proposed that distinct forms of dTFIID and/or dSAGA exist during Drosophila metamorphosis, wherein different TAF compositions serve to target RNAPII at different developmental stages and tissues (Pahi, 2015).

Rapid dynamics of general transcription factor TFIIB binding during preinitiation complex assembly revealed by single-molecule analysis

Transcription of protein-encoding genes in eukaryotic cells requires the coordinated action of multiple general transcription factors (GTFs) and RNA polymerase II (Pol II; see Drosophila Pol II). A "step-wise" preinitiation complex (PIC) assembly model has been suggested based on conventional ensemble biochemical measurements, in which protein factors bind stably to the promoter DNA sequentially to build a functional PIC. However, recent dynamic measurements in live cells suggest that transcription factors mostly interact with chromatin DNA rather transiently. To gain a clearer dynamic picture of PIC assembly, this study established an integrated in vitro single-molecule transcription platform reconstituted from highly purified human transcription factors and complemented it by live-cell imaging. Real-time measurements were performed of the hierarchical promoter-specific binding of TFIID, TFIIA, and TFIIB. Surprisingly, it was found that while promoter binding of TFIID and TFIIA is stable, promoter binding by TFIIB is highly transient and dynamic (with an average residence time of 1.5 sec). Stable TFIIB-promoter association and progression beyond this apparent PIC assembly checkpoint occur only in the presence of Pol II-TFIIF. This transient-to-stable transition of TFIIB-binding dynamics has gone undetected previously and underscores the advantages of single-molecule assays for revealing the dynamic nature of complex biological reactions (Zhang, 2016).
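To give the 1.5 sec average residence time a kinetic reading, the sketch below converts it to an apparent dissociation rate and a survival fraction. It assumes that TFIIB dwell times follow a single-exponential distribution, which the summary above does not state; the function names are illustrative, and only the 1.5 sec value comes from the text.

```python
import math

# Average TFIIB residence time on the promoter, as quoted in the text.
RESIDENCE_TIME_S = 1.5


def apparent_off_rate(mean_residence_s):
    """Apparent dissociation rate (1/s), assuming exponential dwell times."""
    return 1.0 / mean_residence_s


def fraction_still_bound(elapsed_s, mean_residence_s):
    """Fraction of binding events lasting longer than elapsed_s (same assumption)."""
    return math.exp(-elapsed_s / mean_residence_s)


k_off = apparent_off_rate(RESIDENCE_TIME_S)
print(f"apparent k_off = {k_off:.2f} per second")
for t in (1.5, 3.0, 6.0):
    print(f"bound longer than {t} s: {fraction_still_bound(t, RESIDENCE_TIME_S):.1%}")
```

Under this assumption, only a small fraction of TFIIB binding events would outlast a few seconds, which illustrates why stable association requires an additional stabilizing step such as the arrival of Pol II-TFIIF.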

Identification of regions in the Spt5 subunit of DSIF that are involved in promoter proximal pausing

DRB sensitivity-inducing factor (DSIF, or Spt4/5) is a conserved transcription elongation factor that both inhibits and stimulates transcription elongation in metazoans. In Drosophila and vertebrates, DSIF together with negative elongation factor (NELF) associates with RNA polymerase II (Pol II) during early elongation and causes Pol II to pause in the promoter proximal region of genes. The mechanism by which DSIF establishes pausing is not known. This study constructed Spt5 mutant forms of DSIF and tested their capacity to restore promoter proximal pausing to DSIF-depleted Drosophila nuclear extracts. The C-terminal repeats (CTR) region of Spt5, which has been implicated in both inhibition and stimulation of elongation, is dispensable for promoter proximal pausing. A region encompassing KOW4 and KOW5 of Spt5 is essential for pausing, and mutations in KOW5 specifically shift the location of the pause. RNA crosslinking analysis reveals that KOW5 directly contacts the nascent transcript and deletion of KOW5 disrupts this interaction. These results suggest that KOW5 is involved in promoter proximal pausing through contact with the nascent RNA (Qiu, 2017).

Drosophila TRF2 and TAF9 regulate lipid droplet size and phospholipid fatty acid composition

The general transcription factor TBP (TATA-box binding protein) and its associated factors (TAFs) together form the TFIID complex, which directs transcription initiation. Through RNAi and mutant analysis, this study identified a specific TBP family protein, TRF2, and a set of TAFs that regulate lipid droplet (LD) size in the Drosophila larval fat body. Among the three Drosophila TBP genes, trf2, tbp and trf1, only loss of function of trf2 results in increased LD size. Moreover, TRF2 and TAF9 regulate fatty acid composition of several classes of phospholipids. Through RNA profiling, TRF2 and TAF9 were found to affect the transcription of a common set of genes, including peroxisomal fatty acid beta-oxidation-related genes that affect phospholipid fatty acid composition. Knockdown of several TRF2 and TAF9 target genes results in large LDs, a phenotype which is similar to that of trf2 mutants. Together, these findings provide new insights into the specific role of the general transcription machinery in lipid homeostasis (Fan, 2017).

This study reveals a rather specific role of TRF2 and TAFs, which are general transcription factors, in regulating LD size. In addition, TRF2 and TAF9 affect phospholipid fatty acid composition, most likely through ACOX genes which mediate peroxisomal fatty acid β-oxidation (Fan, 2017).

By binding to their responsive elements in target genes, specific transcription factors like SREBP (see Drosophila Srebp), PPARs and NHR49 play important roles in lipid metabolism. It is interesting to find that the general transcription machinery, in this case TRF2 and core TAFs, also exhibits specificity in regulating lipid metabolism. In the Drosophila late 3rd instar larval fat body, defects in trf2 cause increased LD size, whereas mutations of the other two homologous genes, tbp and trf1, have no obvious effects on lipid storage. Inactivation of taf genes causes a phenotype similar to that of trf2 mutation, suggesting that TRF2 may associate with these TAF proteins to direct transcription of specific target genes. Moreover, trf2 mutants have large LDs at both 2nd and early 3rd instar larval stages, suggesting that general transcription factors are also required at early developmental stages for LD size regulation. Interestingly, taf9 mutants have no obvious phenotype at these stages. It is possible that TAF9 acts as an accessory factor compared to promoter-binding TRF2. This is consistent with the fact that fewer genes are affected in taf9 mutants than in trf2 mutants in RNA-seq analysis. It was also found that knockdown of trf2 in larval and adult fat body leads to different LD phenotypes. This may be due to different lipid storage status or different LD size regulatory mechanisms between larval and adult stages (Fan, 2017).

The findings of this study add to the growing evidence supporting a specific role of general transcription factors in lipid homeostasis. For example, knockdown of RNA Pol II subunits such as RpII140 and RpII33 leads to small and dispersed LDs in Drosophila S2 cells. Mutation in DNA polymerase δ (POLD1) leads to lipodystrophy with a progressive loss of subcutaneous fat. Furthermore, TAF8 and TAF7L were reported to be involved in adipocyte differentiation. Moreover, previous studies showed that several subunits of the Mediator complex interact with specific transcription factors and play important roles in lipid metabolism. Taken together, these lines of evidence strongly support essential and specific roles of the core/basal transcriptional machinery components in lipid metabolism (Fan, 2017).

Using RNA-seq analysis, rescue experiments and ChIP-qPCR, this study identified several target genes regulated by TRF2 and TAF9. It is possible that other genes regulate LD size but were missed in the RNA-seq analysis and RNAi screening assay because of either insufficient alterations in gene expression (lower than the twofold threshold) or low efficiency of RNAi. Among all the verified target genes of TRF2 and TAF9, CG10315, which strongly rescues the trf2G0071 mutant phenotype when overexpressed and encodes the eukaryotic translation initiation factor eIF2B-δ, may be a good candidate for further study. Although they are best known for their molecular functions in mRNA translation regulation, eIFs have been implicated in several other processes, including cancer and metabolism. For example, in yeast, eIF2B physically interacts with the VLCFA synthesis enzyme YBR159W. In adipocytes, eIF2α activity is correlated with the anti-lipolytic and adipogenesis inhibitory effects of the AMPK activator AICAR. In addition, given the evidence that some eIFs, such as eIF4G and eIF-4a, localize on LDs and that knockdown of some eIFs, including eIF-1A, eIF-2β, eIF3ga, eIF3-S8 and eIF3-S9, results in large LDs in Drosophila S2 cells, it is important to further explore the specific mechanisms of these eIFs in LD size regulation (Fan, 2017).

Although TRF2 exists widely in metazoans and shares sequence homology in its core domain with TBP, it recognizes sequence elements distinct from the TATA-box. A previous study has investigated TRF2- and TBP-bound promoters throughout the Drosophila genome in S2 cells and revealed that some sequence elements, such as DRE, are strongly associated with TRF2 occupancy while the TATA-box is strongly associated with TBP occupancy (Isogai, 2007). This study also identified that DRE is significantly enriched in extended promoters of the 181 target genes. The distribution of TATA-boxes in the core promoters of the 181 target genes compared with all genes was further explored, and it was found that the TATA-box is not enriched in the core promoters of TRF2 target genes. The proportion of TATA-box is 0.155 (75 of 484 isoforms) for the 181 target genes while the proportion is 0.217 (7849 of 36099 isoforms) for all genes as the background. These results suggest that TRF2 and TAF9 may regulate the expression of a subset of genes by recognizing specific sequence elements such as DRE but not the TATA-box (Fan, 2017).
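The TATA-box proportions quoted above can be reproduced from the isoform counts. The following is a minimal Python sketch, not the analysis used by Fan (2017): the summary does not say which statistical test, if any, was applied, so the one-sample z approximation (treating the genome-wide proportion as a fixed background) is an assumption made here purely for illustration, and all variable and function names are hypothetical.

```python
from math import sqrt

# Isoform counts quoted in the text (Fan, 2017)
target_tata, target_total = 75, 484            # isoforms of the 181 target genes
background_tata, background_total = 7849, 36099  # isoforms of all genes

target_prop = target_tata / target_total
background_prop = background_tata / background_total

# Ratio of the two proportions; a value below 1 means the TATA-box is
# less frequent among target-gene promoters than in the background.
fold = target_prop / background_prop


def one_sample_z(hits, total, p0):
    """Normal-approximation z score for an observed proportion against a
    fixed background proportion p0 (illustrative assumption, not the
    source's stated method)."""
    observed = hits / total
    standard_error = sqrt(p0 * (1.0 - p0) / total)
    return (observed - p0) / standard_error


z = one_sample_z(target_tata, target_total, background_prop)

print(f"target proportion:     {target_prop:.3f}")
print(f"background proportion: {background_prop:.3f}")
print(f"ratio (target/background): {fold:.2f}")
print(f"z score (one-sample approximation): {z:.2f}")
```

A negative z score under this approximation points the same way as the text: the TATA-box is, if anything, depleted rather than enriched among the core promoters of TRF2 target genes.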

This study shows that expression of peroxisomal fatty acid β-oxidation pathway genes, including two acyl-CoA oxidase (ACOX) genes, CG4586 and CG9527, the β-ketoacyl-CoA thiolase gene CG9149, and the enoyl-CoA hydratase gene CG9577, is regulated by TRF2 and TAF9. Lipidomic analysis indicates that in trf2 and taf9 RNAi fat bodies, many phospholipids, such as PA, PC, PG and PI, contain more long chain fatty acids. Furthermore, knockdown of CG4586 and CG9527 in the fat body causes similar changes.

These results coincide with the function of ACOX, which is implicated in the peroxisomal fatty acid β-oxidation pathway for catabolizing very long chain fatty acids and some long chain fatty acids. Similar to these findings, a previous study found that defective peroxisomal fatty acid β-oxidation resulted in enlarged LDs in C. elegans and blocked catabolism of LCFAs, such as vaccenic acid, which probably contributed to LD expansion in mutant worms. Since overexpressing CG4586 or CG9527 only marginally rescues the enlarged LD phenotype of trf2 mutants, it remains to be determined whether the increased level of long chain fatty acid-containing phospholipids contributes to LD size. Regarding the regulation of fatty acid chain length in phospholipids, a recent study reported that there was increased acyl chain length in phospholipids of lung squamous cell carcinoma accompanied by significant changes in the expression of fatty acid elongases (ELOVLs) compared to matched normal tissues. A functional screen followed by phospholipidomic analysis revealed that ELOVL6 is mainly responsible for phospholipid acyl chain elongation in cancer cells. The current findings provide new clues about the regulation of fatty acid chain length in phospholipids. ELOVL and the peroxisomal fatty acid β-oxidation pathway may represent two opposing regulators in determining fatty acid chain length in vivo (Fan, 2017).

Previous studies have shown that TRF2 is involved in specific biological processes including embryonic development, metamorphosis, germ cell differentiation and spermiogenesis. The current results reveal a novel function of TRF2 in the regulation of specialized transcriptional programs involved in LD size control and phospholipid fatty acid composition. Since TRF2 is conserved among metazoans, its role in the regulation of lipid metabolism may be of considerable relevance to various organisms including mammals. These findings may provide new insights into both the regulation of lipid metabolism and the physiological functions of TRF2 (Fan, 2017).

The degree of enhancer or promoter activity is reflected by the levels and directionality of eRNA transcription

Gene expression is regulated by promoters, which initiate transcription, and enhancers, which control their temporal and spatial activity. However, the discovery that mammalian enhancers also initiate transcription questions the inherent differences between enhancers and promoters. This study investigated the transcriptional properties of predominantly mesodermal enhancers during Drosophila embryogenesis using characterized developmental enhancers. While the timing of enhancer transcription is generally correlated with enhancer activity, the levels and directionality of transcription are highly varied among active enhancers. To assess how this impacts function, a dual transgenic assay was developed to simultaneously measure enhancer and promoter activities from a single element in the same embryo. Extensive transgenic analysis revealed a relationship between the direction of endogenous transcription and the ability to function as an enhancer or promoter in vivo, although enhancer RNA (eRNA) production and activity are not always strictly coupled. Some enhancers (mainly bidirectional) can act as weak promoters, producing overlapping spatio-temporal expression. Conversely, bidirectional promoters often act as strong enhancers, while unidirectional promoters generally cannot. The balance between enhancer and promoter activity is generally reflected in the levels and directionality of eRNA transcription and is likely an inherent sequence property of the elements themselves (Mikhaylichenko, 2018).

Through the integration of information on transcription initiation in the noncoding genome [using deeply sequenced CAGE (Shiraki, 2003) and PRO-cap (Mahat, 2016)] with that of developmental enhancer activity (using hundreds of in vivo characterized embryonic enhancers), this study assessed the general properties of Drosophila eRNA. The results indicate that the general features of eRNA are highly conserved from flies to humans, including the level and orientation of eRNA transcription and the relative positioning of the INR motif. During the course of this study, 56 transgenic lines were generated to functionally assess regulatory elements with different eRNA properties for both enhancer and promoter activity. The results uncovered a number of intriguing features suggesting that there is a continuum of enhancer and promoter functions matching the continuum of endogenous transcription (Mikhaylichenko, 2018).

Comparing endogenous enhancer transcription with endogenous enhancer activity in transgenic embryos revealed a very strong global correlation between both the timing (developmental stage) and place (tissue) of enhancer activity. This is consistent with similar global comparisons in cell culture models and suggests a mechanistic link to TF occupancy or some other property of enhancer function. However, active enhancers were observed that have a wide range of eRNA levels, with many active enhancers having very low or undetectable eRNA at the stages when the enhancer is active. Similarly low levels of eRNA may also occur in other species; 35% of putative C. elegans enhancers do not overlap transcription initiation clusters (TICs), while 60% of intergenic putative mouse enhancers do not contain eRNA, as reported in one study. While these percentages may be overestimated due to the inclusion of elements that are not enhancers, nearly a third (20%-33%) of nontranscribed regulatory regions demonstrated enhancer activity in a luciferase assay. In the context of this study, all elements were confirmed embryonic enhancers, and this study carefully matched the stage of enhancer activity to the stage of eRNA detection. Active embryonic enhancers therefore are transcribed in a broad range, with the highly transcribing enhancers producing several orders of magnitude more transcripts than those with the weakest transcription, suggesting that eRNA production and enhancer activation can be uncoupled (at least for a subset of enhancers). For enhancers with very weak transcription, eRNAs are likely to be present only sporadically or in a minority of cells, suggesting that their continued presence is unlikely to be essential for these enhancers' function, although the act of transcription might be (Mikhaylichenko, 2018).

The presence of Pol II and the basal transcriptional machinery at enhancers and their ability to transcribe eRNAs question whether there is an inherent difference between an enhancer and a promoter, with some proposing a unified architecture between the two. To disentangle both activities, a new dual transgenic assay was developed that can measure enhancer and promoter activity at the same genomic location in the same embryos such that the timing as well as tissue specificity of both activities can be directly compared. Transgenic assays have the advantage of being able to measure regulatory activity at the endogenous levels of TFs and within a consistent chromatinized context, two properties that have a major impact on both enhancer and promoter activity. The readout (in situ hybridization) provides both spatial and temporal information at single-cell resolution, although it is difficult to derive quantitative information on activity, a clear disadvantage compared with in vitro reporter assays (Mikhaylichenko, 2018).

This study tested 27 regulatory elements (20 in both orientations) from different genomic locations and with different transcriptional properties for both enhancer and promoter activity. The results indicate that highly transcribed developmental enhancers can function as weak promoters in vivo. The spatial pattern of promoter activity was generally a subset of the tissues in which the enhancer was active, indicating that both activities can occur in the same cells from the same element. This promoter function depended largely on the orientation in which the element was inserted, matching the direction of enhancer transcription in its endogenous location: Bidirectional elements (both enhancers and promoters) can generally function as promoters in both orientations, while unidirectional elements have orientation-dependent activity. This indicates that promoter activity has intrinsic directionality and suggests the presence of directional sequence motifs within enhancer elements. In keeping with this, bidirectional mammalian promoter regions contain separate motifs that promote transcription in either direction (Core, 2014; Duttke, 2015); the current results point to a similar sequence-based determinant of enhancer directionality in Drosophila, supported by the presence of potential 'pairs' of INR motifs within bidirectional enhancers at the two points of maximal divergent transcription. Intragenic enhancers have been shown previously to act as alternative promoters, regulating unidirectional transcription in the direction of the host gene's expression to produce lncRNAs that are abundant, stable (polyadenylated), and spliced (Kowalczyk, 2012). In the case of the intergenic enhancers examined in this study, there is no evidence that they produce stable long transcripts. 
Standard strand-specific poly(A)+ RNA-seq did not detect any RNA at the vast majority of these enhancers, suggesting that Pol II elongation is fundamentally different at intergenic versus intragenic enhancers (Mikhaylichenko, 2018).

Recent high-throughput studies indicate that the same sequences can function as both promoters and enhancers in vitro, although gene promoters generated more promoter activity compared with distal elements (Nguyen, 2016; van Arensbergen, 2017). While the current results also show that the same sequences can harbor both activities, some key differences were uncovered. Tested elements that overlap a gene's main promoter, while acting as strong promoters both endogenously and in the promoter assay, do not possess enhancer activity (at least for four of the five elements examined). In contrast, some alternative gene promoters have an intriguing dual functionality, being able to act with seemingly equal strength as strong enhancers and promoters at the same stage in the same tissues. Using luciferase assays, Li (2012) found that strong and weak promoters have different enhancer activities with an inverse relationship between the two functions. In the context of embryonic development, the current results generally agree with this: Strong promoters (the main genes' promoters) generally have no detectable enhancer activity, while 'strong' (highly active) intergenic enhancers have weak (or not detectable) promoter activity (at least for the ones that were tested). This indicates that developmental enhancers and gene promoters generally have different intrinsic properties (Mikhaylichenko, 2018).

However, this study also found interesting intermediate cases between the two, which suggests a relationship between the directionality of eRNA transcription and the ability to function as an enhancer or promoter in vivo. When bidirectionally transcribed, alternative gene promoters can function as both strong promoters and enhancers in vivo in both orientations. In contrast, when unidirectionally transcribed, the element can generally function only as a promoter (and, in a few cases, as an enhancer) in an orientation-dependent manner matching its direction of transcription. One interesting example of the latter is an ~400-bp DNase 1 hypersensitivity site (DHS element) that overlaps the promoter of the twist gene and is transcribed in a unidirectional manner. This element can function as both a promoter and an enhancer but, interestingly, can perform both functions in only one orientation and is inactive in the other. These results suggest that some enhancers may have evolved to drive proximal orientation-dependent activation, possessing strong intrinsic promoter potential but lacking the ability to act more distally in an orientation-independent manner. To summarize, bidirectional cis-regulatory elements (either enhancers or promoters) can often function as both enhancers and promoters (although to different degrees) in an orientation-independent manner. In contrast, unidirectional elements generally function only in the orientation in which they are transcribed (Mikhaylichenko, 2018).

Taken together, these results suggest a continuum of functions that mirrors the continuum of eRNA directionality and levels of transcription at cis-regulatory elements. This spans from gene promoters that have high levels of unidirectional transcription and function mainly as orientation-dependent promoters (with little or no enhancer function) to elements with bidirectional (high level) transcription giving both promoter and enhancer orientation-independent activity (alternative promoters) to more distal elements with low levels of asymmetric or bidirectional transcription, which function mainly as enhancers, with a subset having weak orientation-independent promoter activity (Mikhaylichenko, 2018).

Bidirectionality has been suggested to be the ground state of transcription (Jin, 2017), and enhancer transcription may reflect this, serving as a source of evolutionary novelty. The finding that some TF-binding sites may possess the ability to initiate transcription suggests that selection for enhancer activity could allow promoter activity to arise as a by-product. If the presence of low-level promoter activity either as a consequence of selection for enhancer activity or simply due to the relative nonspecificity of the transcriptional machinery is common in enhancer elements, then eRNA could be exploited by evolution for other purposes in transcriptional regulation, including coactivator activity (e.g., activation of CBP [Bose, 2017] or TF trapping at enhancers [Sigova, 2015]). Alternatively, transcribed enhancers may have evolved from promoters, where a promoter was duplicated and became separated from its target gene over evolutionary time. Gradually, the sequence features leading to strong promoter activity would become more degenerate, while the element may gain more TF-binding sites. Although there is currently no evidence of this, it would fit with the promoter and enhancer activity that was observed in this study and with the fact that some species do not have distal enhancers but rather regulate gene expression by TF binding very close to the promoter. Although very speculative, alternative promoters may represent an intermediate state (from an evolutionary perspective) between promoters and enhancers. A previous study proposed that developmental enhancers evolve from inducible-type promoters (Arenas-Mena 2017). Of the elements that were tested in this study, main gene promoters appear to have evolved to drive proximal orientation-dependent activation, possessing strong intrinsic promoter potential. 
At the other extreme, distal enhancers possess weak promoter potential but seem to have specialized toward a distal orientation-independent mode of action, a function achieved, presumably, through acquiring binding sites for a set of factors distinct from promoters. Distal enhancers themselves represent a heterogeneous population of elements with variable transcriptional properties. The coexistence of the two functions opens many questions: How can the same regulatory element facilitate enhancer and promoter function? Can one function be perturbed independently of the other? A preliminary answer to the latter is suggested in this study: Enhancer function was unaffected by changing orientation, while some promoter activity was lost, suggesting separate directional sequence determinants for these promoters' activity (Mikhaylichenko, 2018).

Assembly of SNAPc, Bdp1, and TBP on the U6 snRNA gene promoter in Drosophila melanogaster

U6 snRNA is transcribed by RNA polymerase III (Pol III) and has an external upstream promoter that consists of a TATA sequence recognized by the TBP subunit of the Pol III basal transcription factor IIIB, and a proximal sequence element (PSE) recognized by the small nuclear RNA activating protein complex (SNAPc). Previous work found that Drosophila melanogaster SNAPc (DmSNAPc) bound to the U6 PSE can recruit the Pol III general transcription factor Bdp1 to form a stable complex with the DNA. This study shows that DmSNAPc-Bdp1 can recruit TBP to the U6 promoter, and a region of Bdp1 was identified that is sufficient for TBP recruitment. Moreover, it was found that this same region of Bdp1 cross-links to nucleotides within the U6 PSE at positions that also cross-link to DmSNAPc. Finally, cross-linking mass spectrometry reveals likely interactions of specific DmSNAPc subunits with Bdp1 and TBP. These data, together with previous findings, have allowed construction of a more comprehensive model of the DmSNAPc-Bdp1-TBP complex on the U6 promoter that includes nearly all of DmSNAPc, a portion of Bdp1, and the conserved region of TBP (Kim, 2020).

RNA polymerase III (Pol III) transcribes genes for tRNAs, 5S rRNA, and various small nuclear RNAs (snRNAs). Genes for the tRNAs and 5S rRNA have gene-internal promoters that usually are TATA-less. However, other genes, including U6 snRNA, 7SK RNA, tRNAsel, H1, and MRP RNAs, have gene-external promoters that consist of two distinct elements, a TATA sequence and a proximal sequence element (PSE) centered about 30 and 55 bp, respectively, upstream of the transcription start site. The TATA sequence is recognized by the Pol III general transcription factor TFIIIB, and the PSE is recognized by the small nuclear RNA activating protein complex (SNAPc) (Kim, 2020).

TFIIIB contains three subunits, most often TBP, Brf1, and Bdp1. These three subunits form an architectural scaffold for Pol III recruitment and together coordinate conformational changes that lead to the formation of an open complex. Interestingly, depending upon the type of gene and/or the organism, TFIIIB can exhibit subunit heterogeneity. For example, in the fruit fly Drosophila melanogaster, the TFIIIB that assembles on Pol III genes that have internal promoters contains the TBP-related factor 1 (TRF1) in place of TBP (Verma, 2013). However, U6 and U6-type genes with external promoters utilize a TFIIIB that contains the canonical TBP rather than TRF1 (Verma, 2013). In another example, human Pol III-transcribed genes with internal promoters utilize a TFIIIB that contains canonical Brf1, whereas Pol III-transcribed snRNA genes require an alternative Brf known as Brf2 (Kim, 2020).

SNAPc is a multisubunit factor that binds to the PSE (termed the PSEA in fruit flies, the subject of this paper) to activate the transcription of snRNA genes. D. melanogaster SNAPc (DmSNAPc) consists of three subunits, DmSNAP190, DmSNAP50, and DmSNAP43, that are homologs of the three essential subunits of human SNAPc. Although all three DmSNAPc subunits are required for DNA-binding activity, little is understood of the specific roles that the individual fly or human SNAPc subunits play in the recruitment of TFIIIB and the transcriptional activation of snRNA genes (Kim, 2020).

Previously, site-specific protein-DNA photo-cross-linking assays were used to identify the nucleotide positions where each of the individual DmSNAPc subunits cross-linked, as part of the complex, to U6 snRNA gene promoter DNA. Likewise, interactions of the TFIIIB subunits (in the absence of DmSNAPc) with specific nucleotides in the U6 snRNA gene promoter were reported. Those studies revealed both the linear positions (translational location along the DNA helix) and rotational positions (face of the DNA double helix) occupied by each of the DmSNAPc and TFIIIB subunits on the DNA. Furthermore, by cleaving the DmSNAPc proteins at specific sites after photo-cross-linking, it was possible to identify domains or regions of DmSNAP190, DmSNAP50, and DmSNAP43 that cross-linked to specific nucleotides within or adjacent to the PSEA (Kim, 2020).

Finally, in more recent work, it was found that DmSNAPc can recruit Bdp1 to the U6 snRNA gene promoter in the absence of TBP and Brf1 (Verma, 2018). Furthermore, an 87-amino-acid region of Bdp1 was identified that was required for Bdp1 to be recruited to the U6 snRNA gene promoter by DmSNAPc. Over the years, this work has built an increasingly comprehensive picture of the architecture of the protein-DNA complex assembled on the U6 promoter (Kim, 2020).

Given the findings from that previous work, this study has now examined the recruitment of TBP to the U6 snRNA gene promoter by the DmSNAPc-Bdp1 complex. Furthermore, site-specific protein-DNA photo-cross-linking assays were applied to map the DmSNAPc, Bdp1, and TBP interactions with specific nucleotides of the U6 promoter. Finally, the architecture of both the DmSNAPc-Bdp1-U6 promoter complex and the DmSNAPc-Bdp1-TBP-U6 promoter complex was examined by applying cross-linking mass spectrometry (CXMS). The results of these studies allowed development of a more detailed model of the Pol III transcriptional machinery assembled on the U6 snRNA gene promoter that includes nearly all of DmSNAPc and parts of the TFIIIB components Bdp1 and TBP (Kim, 2020).

The canonical pathway for the assembly of the Pol III preinitiation complex (PIC) on tRNA genes involves the binding of TFIIIC to the gene-internal promoter followed by recruitment of TFIIIB (either preassembled or assembled in a stepwise process that involves the initial recruitment of Brf1 and TBP, followed by Bdp1 in a subsequent step) and finally RNA polymerase. (PIC assembly on 5S genes is believed to be similar but requires the prior binding of TFIIIA to aid in the recruitment of TFIIIC.) In contrast, the results raise the interesting possibility that PIC assembly on Pol III genes with external promoters in D. melanogaster proceeds by an alternate pathway that involves the following initial steps: first, DmSNAPc binds to the PSEA; second, DmSNAPc recruits Bdp1; and third, the promoter-bound DmSNAPc-Bdp1 complex and the TATA box recruit TBP. Brf1 and RNA polymerase may, in turn, assemble on the promoter at a subsequent step of PIC formation (Kim, 2020).

This study has proposed a model for the DmSNAPc-Bdp1-TBP complex on the U6 promoter that is consistent with EMSA, site-specific protein-DNA photo-cross-linking, and CXMS experiments. Furthermore, the DmSNAPc model is fully consistent with coimmunoprecipitation experiments that mapped regions of the three DmSNAPc subunits that are required for their assembly with each other. The model further provides a rationale for the recruitment of Bdp1 and TBP by DmSNAPc. Bdp1 cross-links to DNA nucleotide positions that extend upstream of the TATA box into positions that are actually a part of the PSEA. These positions are also occupied by DmSNAP190 and DmSNAP43 (but not DmSNAP50), indicating that Bdp1 must lie in close proximity to DmSNAP190 and DmSNAP43. Also supporting this model, the CXMS experiments revealed cross-linking of Bdp1 with DmSNAP190 and DmSNAP43 (but not with DmSNAP50) (Kim, 2020).

Furthermore, additional evidence was generated, beyond that previously published, that residues 424 to 510 of Bdp1 are involved in the recruitment of Bdp1 by DmSNAPc. For example, an internal deletion of residues 424 to 510 resulted in the complete loss of Bdp1 recruitment by DmSNAPc. Moreover, Bdp1 residues 424 to 510 alone exhibited the same pattern as full-length Bdp1 in site-specific protein-DNA photo-cross-linking, suggesting that this region of Bdp1 extended into the U6 PSEA, where it would reside in close proximity to DmSNAP190 and DmSNAP43. Finally, the CXMS data with full-length Bdp1 showed that Bdp1 residues 424 to 510, together with nearby residues flanking that region, were responsible for the majority of the protein-protein cross-links between DmSNAPc and Bdp1 (Kim, 2020).

In work by others, the N-terminal region of human Bdp1, more so than the C-terminal region, was found to interact with DmSNAPc. Interestingly, the CXMS studies revealed that lysines within the N-terminal region of Bdp1 (lysines 203, 206, and 231) cross-link to both DmSNAP190 and DmSNAP43. Thus, it is possible that a region of fly Bdp1 N-terminal of the SANT domain, as well as residues 424 to 510 C-terminal of the SANT domain, interact with DmSNAPc. Perhaps this potential N-terminal interaction of fly Bdp1 with DmSNAPc is not stable enough to be detected in the current EMSAs (Kim, 2020).

Interestingly, by EMSA, it has not been possible to convincingly demonstrate the existence of a complex that contains both DmSNAPc and Brf1 together with Bdp1 and TBP. Essentially, either DmSNAPc-Bdp1-TBP or Brf1-Bdp1-TBP, which lacks DmSNAPc, is seen. The modeling suggests a rationale for this result. The finding that a region of DmSNAP43 lies on or near the upper surface of TBP suggests that the binding of DmSNAPc and that of Brf1 are mutually exclusive. Yeast Brf1 was modeled into the proposed SNAPc-Bdp1-TBP complex in accordance with a published cryo-EM structure. Depending upon the exact positioning of DmSNAP43, it may sterically or otherwise interfere with the binding of Brf1 along the upper surface of TBP. If this is true, it would suggest some form of regulation to govern the transition from a DmSNAPc-Bdp1-TBP complex to a Brf1-Bdp1-TBP complex (Kim, 2020).

In light of this potential regulation, it is notable that the C-terminal region of fly DmSNAP190 appears to be structurally related to the ligand-binding domains of members of the nuclear hormone receptor superfamily. The location of this domain, near DmSNAP43 and the SANT domain of Bdp1, raises the intriguing possibility that the activity of D. melanogaster SNAPc and the expression of snRNA genes are regulated by an unknown small organic molecule of intracellular or extracellular origin. This could provide an interesting avenue of future research (Kim, 2020).

The work reported in this study furthermore suggests pathways toward U6 preinitiation complex assembly in flies and humans that are analogous but different with respect to the intermediary factor that acts as a stabilizing bridge between SNAPc and TBP. In flies, PSEA-bound DmSNAPc recruits Bdp1 in a TATA box-independent manner and TBP in a TATA-dependent manner, with Bdp1 acting to stabilize DmSNAPc and TBP on the PSEA and TATA box, respectively. In humans, factor assembly appears to occur analogously but involving Brf2 instead of Bdp1: PSE-bound SNAPc interacts with Brf2 independent of a TATA box, and this complex recruits TBP only in the presence of a TATA box. One obvious explanation for the difference is that flies do not have Brf2, so different mechanisms have evolved in flies and humans for TBP recruitment to U6 gene promoters (Kim, 2020).

In a broader sense, work on snRNA genes has extended the perspective on the diversity of the TFIIIB components that can be assembled into the Pol III PIC: TBP (for snRNA genes) versus TRF1 (for tRNA and 5S RNA genes) in flies and Brf2 (for snRNA genes) versus Brf1 (for tRNA and 5S RNA genes) in humans. The only constant TFIIIB component known so far is Bdp1. The snRNA work has also revealed different pathways for TFIIIB assembly, at least in vitro, on SNAPc-dependent genes versus TFIIIC-dependent genes. The former seem to proceed initially by SNAPc-dependent recruitment of Bdp1 or Brf2, followed by TBP recruitment, whereas the latter are thought to occur by TFIIIC-dependent recruitment of TFIIIB either as a preformed complex or proceeding first through Brf1 and TBP recruitment, followed by Bdp1 in a subsequent step (Kim, 2020).

TFIID Enables RNA Polymerase II Promoter-Proximal Pausing

RNA polymerase II (RNAPII) transcription is governed by the pre-initiation complex (PIC), which contains TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNAPII, and Mediator. After initiation, RNAPII enzymes pause after transcribing less than 100 bases; precisely how RNAPII pausing is enforced and regulated remains unclear. To address specific mechanistic questions, human RNAPII promoter-proximal pausing was reconstituted in vitro, entirely with purified factors (no extracts). As expected, NELF and DSIF increased pausing, and P-TEFb promoted pause release. Unexpectedly, the PIC alone was sufficient to reconstitute pausing, suggesting RNAPII pausing is an inherent PIC function. In agreement, pausing was lost upon replacement of the TFIID complex with TATA-binding protein (TBP), and PRO-seq experiments revealed widespread disruption of RNAPII pausing upon acute depletion (t = 60 min) of TFIID subunits in human or Drosophila cells. These results establish a TFIID requirement for RNAPII pausing and suggest pause regulatory factors may function directly or indirectly through TFIID (Fant, 2020).

RNA polymerase II (RNAPII) transcribes all protein-coding and many non-coding RNAs in the human genome. RNAPII transcription initiation occurs within the pre-initiation complex (PIC), which contains TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNAPII, and Mediator. After initiation, RNAPII enzymes typically pause after transcribing 20-80 bases, and paused polymerases represent a common regulatory intermediate. Accordingly, paused RNAPII has been implicated in enhancer function, development and homeostasis, and diseases ranging from cancer to viral pathogenesis. Precisely how RNAPII promoter-proximal pausing is enforced and regulated remains unclear; however, protein complexes, such as NELF and DSIF, increase pausing, whereas the activity of CDK9 (P-TEFb complex) correlates with pause release (Fant, 2020).

Although much has been learned about RNAPII promoter-proximal pausing and its regulation, the underlying molecular mechanisms remain enigmatic. One reason for this is the complexity of the human RNAPII transcription machinery, which includes the ∼4.0 MDa PIC and many additional regulatory factors. Another underlying reason is that much current understanding derives from cell-based assays, which are indispensable but cannot reliably address mechanistic questions. For instance, factor knockdowns or knockouts cause unintended secondary effects and the factors and biochemicals present at each gene in a population of cells cannot possibly be defined. In vitro assays can overcome such limitations, but these have typically involved nuclear extracts, which contain a similarly undefined mix of proteins, nucleic acids, and biochemicals. To circumvent these issues, this study sought to reconstitute RNAPII promoter-proximal pausing entirely from purified human factors (no extracts). Success with this task enabled addressing some basic mechanistic questions and opens the door for future studies to better define the contribution of specific factors in RNAPII promoter-proximal pause regulation (Fant, 2020).

Structural data indicate that TFIID lobe C subunits TAF1 (see Drosophila Taf250) and TAF2 bind promoter DNA downstream of the TSS (Louder, 2016; Patel, 2018). Past studies revealed that insertion of 10-bp DNA at the +15 site relative to the TSS disrupted RNAPII pausing at the HSP70 gene in Drosophila S2 cells (Kwak, 2013). This led to a 'complex interaction' model for pausing, in which a promoter-bound factor(s) establishes an interaction (directly or indirectly) with the paused RNAPII complex. In agreement with this model, a TFIID requirement was observed for RNAPII promoter-proximal pausing in vitro, which is further supported by PRO-seq data in TAF-depleted human and Drosophila S2 cells. Additional evidence for TFIID-dependent regulation of RNAPII pausing derives from correlations among paused genes and DNA sequence elements bound by TFIID. Defects in TFIID function are linked to numerous diseases, including cancer and neurodegenerative disorders. Its requirement for RNAPII promoter-proximal pause regulation may underlie these and other biological functions (Fant, 2020).

Biochemical reconstitution of RNAPII promoter-proximal pausing provides a level of mechanistic control that is simply not possible with cell-based assays; consequently, it was discovered that RNAPII pausing is an inherent property of the human PIC and that TFIID is a key PIC factor that establishes pausing. The results also reveal NELF, DSIF, and P-TEFb as auxiliary factors that, although not required for pausing, enable robust regulation of this common transcriptional intermediate state. Time course experiments indicated that polymerases in the paused region remained active and generated elongated transcripts over time. Experiments with P-TEFb showed enhanced release of paused intermediates, providing further evidence that polymerases in the paused region were active and competent for elongation. However, some transcripts remained in the pause region after the 10-min reactions, even with added P-TEFb. This result is also consistent with current models that invoke alternative outcomes for promoter-proximal paused RNAPII, including premature termination, arrest, or a more stable paused intermediate. Addressing the mechanisms and factors that regulate these distinct outcomes could be explored in future studies (Fant, 2020).

Despite its advantages, the reconstituted in vitro transcription assay does not match the complexity of regulatory inputs that converge upon active promoters in a living cell. To test the TFIID requirement for promoter-proximal pausing in cells, it was possible to rapidly deplete TFIID lobe C subunits TAF1 and TAF2 using Trim-Away, and genome-wide changes in nascent transcription were assessed with PRO-seq. Consistent with the in vitro data, global transcription increased at protein-coding genes upon TAF1/2 knockdown, with evidence for enhanced pause release. PRO-seq reads increased at 5' ends and downstream of promoter-proximal pause sites at thousands of genes in TAF1/2-depleted cells. These data are consistent with increased pause release and increased re-initiation, two processes that are coupled in metazoan cells. Unexpectedly, however, increased pause release did not yield similar genome-wide increases in gene body reads. Instead, the PRO-seq data revealed a sharp reduction in reads downstream of promoter-proximal pause sites, at around +300 from the TSS in both human and Drosophila cells. These results implicate additional regulatory mechanisms, downstream of the pause site, that may terminate or arrest RNAPII. Although future studies are needed to identify the factors involved, it is noted that the Integrator complex was recently shown to cleave nascent transcripts downstream of pause sites at hundreds of genes in Drosophila cells (Tatomer, 2019). Because promoter-proximal pausing helps ensure proper capping of transcripts at their 5' ends, downstream regulatory mechanisms may become important when RNAPII promoter-proximal pausing is disrupted (Fant, 2020).

A TFIID requirement for RNAPII promoter-proximal pausing implies that other pause regulatory factors may function directly or indirectly through TFIID. Although additional mechanistic aspects remain to be addressed, it is notable that pause regulatory factors, including P-TEFb and MYC, interact (directly or indirectly) with TFIID; moreover, TFIID is conformationally flexible and likely undergoes structural reorganization during RNAPII transcription initiation and pause release. Such structural transitions may contribute to TFIID-dependent regulation of RNAPII pausing. Whereas nucleosomes likely affect promoter-proximal pausing, they are not required, based upon our results and data in Drosophila and mammalian systems. TFIID possesses multiple domains that bind chromatin marks associated with transcriptionally active loci, including H3K4me3, which suggests TFIID function is regulated in part through epigenetic mechanisms. Future studies should help establish whether specific chromatin marks contribute to TFIID-dependent regulation of RNAPII pausing, potentially by affecting TFIID promoter occupancy or by impacting TFIID structure and function (Fant, 2020).

The Integrator complex cleaves nascent mRNAs to attenuate transcription

Cellular homeostasis requires transcriptional outputs to be coordinated, and many events post-transcription initiation can dictate the levels and functions of mature transcripts. To systematically identify regulators of inducible gene expression, high-throughput RNAi screening of the Drosophila Metallothionein A (MtnA) promoter was performed. This revealed that the Integrator complex, which has a well-established role in 3' end processing of small nuclear RNAs (snRNAs), attenuates MtnA transcription during copper stress. Integrator is an evolutionarily conserved complex that contains 14 subunits and regulates RNA processing and gene transcription by associating with the C-terminal domain of RNA polymerase II large subunit. Integrator complex subunit 11 (IntS11) endonucleolytically cleaves MtnA transcripts, resulting in premature transcription termination and degradation of the nascent RNAs by the RNA exosome, a complex also identified in the screen. Using RNA-seq, >400 additional Drosophila protein-coding genes were identified whose expression increases upon Integrator depletion. This study focused on a subset of these genes and confirmed that Integrator is bound to their 5' ends and negatively regulates their transcription via IntS11 endonuclease activity. Many noncatalytic Integrator subunits, which are largely dispensable for snRNA processing, also have regulatory roles at these protein-coding genes, possibly by controlling Integrator recruitment or RNA polymerase II dynamics. Altogether, these results suggest that attenuation via Integrator cleavage limits production of many full-length mRNAs, allowing precise control of transcription outputs (Tatomer, 2019).

In response to physiological cues, environmental stress, or exposure to pathogens, specific transcriptional programs are induced. These responses are often coordinated, rapid, and robust, in part because many metazoan genes are maintained in a poised state with RNA polymerase II (RNAPII) engaged prior to induction. In addition to promoter-proximal pausing, there are many regulatory steps post transcription initiation that dictate the characteristics and fate of mature transcripts. For example, alternative splicing and/or 3' end processing events can lead to the production of multiple isoforms from a single locus, and these transcripts can have distinct stabilities, translation potential, or subcellular localization (Tatomer, 2019).

It is particularly important that genes produce full-length functional mRNAs, and mechanisms such as telescripting, involving U1 snRNP, actively suppress premature cleavage and polyadenylation events in eukaryotic cells. Nevertheless, many promoters are known to generate short unstable RNAs. This suggests that premature transcription termination may often occur, thereby limiting RNAPII elongation and production of full-length mRNAs (for review, see Kamieniarz-Gdula and Proudfoot, 2019). Moreover, this process can be regulated. For example, it was recently shown that the cleavage and polyadenylation factor PCF11 stimulates premature termination to attenuate the expression of many transcriptional regulators in human cells (Kamieniarz-Gdula, 2019). Potentially deleterious truncated transcripts generated by premature termination are often removed from cells by RNA surveillance mechanisms, including the RNA exosome. However, the full repertoire of cellular factors and cofactors that control the metabolic fate of nascent RNAs, especially during the early stages of transcription elongation, is still unknown (Tatomer, 2019).

An unbiased genome-scale RNAi screen was performed in Drosophila cells to reveal factors that control the output of a model inducible eukaryotic promoter. Transcription of Drosophila Metallothionein A (MtnA), which encodes a metal chelator, is rapidly induced when the intracellular concentration of heavy metals (e.g., copper or cadmium) is increased. This increase in transcriptional output is dependent on the MTF-1 transcription factor, which relocalizes to the nucleus upon metal stress and binds to the MtnA promoter. The RNAi screen identified MTF-1 and other known regulators of MtnA transcription, but also surprisingly identified the Integrator complex as a potent inhibitor of MtnA during copper stress. Integrator harbors an endonuclease that cleaves snRNAs and enhancer RNAs, and this study has found that Integrator can likewise cleave nascent MtnA transcripts to limit mRNA production. Using RNA-seq, hundreds of additional Drosophila protein-coding genes were found whose expression increases upon Integrator depletion. Focused studies on a subset of these genes confirmed that Integrator can cleave these nascent RNAs, thereby limiting productive transcription elongation. Altogether, it is proposed that Integrator-catalyzed premature termination can function as a widespread and potent mechanism to attenuate expression of protein-coding genes (Tatomer, 2019).

Altogether, the data indicate that the Integrator complex can attenuate the expression of protein-coding genes by catalyzing premature transcription termination. The IntS11 endonuclease cleaves a subset of nascent mRNAs, which ultimately triggers degradation of the transcripts by the RNA exosome along with RNAPII termination. It is suggested that many protein-coding genes are negatively regulated via this attenuation mechanism, and the Drosophila MtnA promoter highlights context-specific regulation by Integrator. Transcription of MtnA is induced by copper or cadmium stress, and yet this study finds that Integrator is robustly recruited to the MtnA promoter only under copper stress conditions. This is not because the Integrator complex is generally disassembled or 'poisoned' by cadmium, as Integrator continues to regulate the outputs of other protein-coding genes. It is instead proposed that context-specific regulation of this locus may be related to the fact that cadmium is a strictly toxic metal, while copper is required for the function of a subset of enzymes and must be maintained in a narrow concentration range. Therefore, homeostatic control of MtnA is required to maintain copper levels, while cells need to maximally produce MtnA in the presence of cadmium. It is thus proposed that regulation of MtnA levels by Integrator during copper stress is for fine-tuning purposes, perhaps to limit maximal transcriptional induction and/or facilitate transcriptional shut-off once copper stress has passed. The results suggest that the Integrator complex can be recruited to gene loci only when needed, thereby ensuring tight control over transcriptional output (Tatomer, 2019).

In addition to cleaving MtnA transcripts, Integrator cleaves multiple other RNA classes in metazoan cells, including enhancer RNAs (Lai, 2015), snRNAs (Baillat, 2005), telomerase RNA (Rubtsova, 2019), and some herpesvirus microRNA precursors (Cazalla, 2011; Xie, 2015). Using RNA-seq, this study has expanded this list of Integrator target loci and identified hundreds of additional protein-coding genes that are negatively regulated by Integrator. Focus was placed on a set of Integrator-dependent genes; Integrator was found to catalyze premature transcription termination of these genes, consistent with prior studies that suggested roles for Integrator in termination (Skaar, 2015; Shah, 2018; Gomez-Orte, 2019). Some of these genes (CG8620, Pepck1, and Sirup) have promoter-proximal RNAPII that rapidly turns over, which may indicate that Integrator can aid in clearing paused or stalled RNAPII. Once Integrator has cleaved the nascent mRNAs, this study finds that they are rapidly degraded from their 3' ends by the RNA exosome. This may be critical for enabling subsequent rounds of transcription (especially at the MtnA locus), perhaps because the small RNAs can form stable RNA-DNA hybrids (R-loops) that block transcription initiation or elongation (Tatomer, 2019).

Endonucleolytic cleavage is critical for Integrator regulation at snRNA and protein-coding genes, but the data indicate that these loci have different dependencies on Integrator subunits. Genetic studies indicate that Integrator subunits 4, 9, and 11 (which form the Integrator cleavage module) are most important for snRNA processing, while the non-catalytic Integrator subunits (all of which currently lack annotated molecular functions) play minor roles. In contrast, large increases in mRNA expression were observed when many of the non-catalytic subunits were depleted (especially IntS1, IntS2, IntS5, IntS6, IntS7, and IntS8). IntS13 was recently shown to be able to function independently from other Integrator subunits at enhancers (Barbieri, 2018), suggesting the existence of submodules or 'specialized' complexes that may enable the activity and function of Integrator to be distinctly regulated depending on the gene locus and cellular state. Future work will reveal the subunit requirements of Integrator complexes at distinct loci and clarify the interplay between IntS11 endonuclease activity and other Integrator subunits. For example, the non-catalytic subunits may be critical for the formation and targeting of the complex to specific loci and/or controlling RNAPII dynamics (Tatomer, 2019).

Finally, it is noted that the metazoan Integrator complex has parallels with the yeast Nrd1-Nab3-Sen1 (NNS) complex that (1) terminates transcription at both mRNA and snRNA loci and (2) interacts with the RNA exosome. Interestingly, the underlying molecular mechanisms of transcription termination carried out by these two complexes are quite distinct. NNS uses the Sen1 helicase to pull the nascent transcript out of the RNAPII active site, while Integrator likely promotes termination by taking advantage of its RNA endonuclease activity and providing an entry site for a 5'-3' exonuclease. There is currently conflicting data on whether the canonical 'torpedo' exonuclease Rat1/Xrn2 is involved in termination at snRNA genes as only subtle termination defects have been observed at these loci when Rat1/Xrn2 is depleted from cells. Notably, Cpsf73 has been shown to behave as both an endonuclease and exonuclease, raising the possibility that IntS11 could support a 'Rat1/Xrn2-like' function and mediate termination. Future studies that compare and contrast the Integrator and NNS complexes, especially how their recruitment and termination activities are controlled, will shed light on this important facet of gene regulation. In summary, transcription attenuation through premature termination was first described decades ago in bacteria, and the current work indicates that the metazoan Integrator complex can function analogously to limit expression from protein-coding genes (Tatomer, 2019).

Mediator and RNA polymerase II clusters associate in transcription-dependent condensates

Models of gene control have emerged from genetic and biochemical studies, with limited consideration of the spatial organization and dynamics of key components in living cells. This study used live-cell superresolution and light-sheet imaging to study the organization and dynamics of the Mediator coactivator and RNA polymerase II (Pol II) directly. Mediator and Pol II each form small transient and large stable clusters in living embryonic stem cells. Mediator and Pol II are colocalized in the stable clusters, which associate with chromatin, have properties of phase-separated condensates, and are sensitive to transcriptional inhibitors. It is suggested that large clusters of Mediator, recruited by transcription factors at large or clustered enhancer elements, interact with large Pol II clusters in transcriptional condensates in vivo (Cho, 2018).

A conventional view of eukaryotic gene regulation is that transcription factors, bound to enhancer DNA elements, recruit coactivators such as the Mediator complex, which is thought to interact with RNA polymerase II (Pol II) at the promoter. This model is supported by a large body of molecular genetic and biochemical evidence, yet the direct interaction of Mediator and Pol II has not been observed and characterized in living cells. Using superresolution and light-sheet imaging, the organization and dynamics of endogenous Mediator and Pol II in live mouse embryonic stem cells (mESCs) were studied. Whether Pol II and Mediator interact in a manner consistent with condensate formation was directly tested, their biophysical properties were quantitatively characterized, and the implications of these observations for transcription regulation in living mammalian cells were considered (Cho, 2018).

To visualize Mediator and Pol II in live cells, mouse embryonic stem cell lines were generated with endogenous Mediator and Pol II labeled with Dendra2, a green-to-red photoconvertible fluorescent protein. Live-cell superresolution imaging was performed and Mediator was found to form clusters with a range of dynamic temporal signatures. Mediator exists in a population of transient small (~100 nm) clusters with an average lifetime of 11.1 ± 0.9 s, comparable to that of transient Pol II clusters observed in this study and previously in differentiated cell types. In addition, it was observed that both Mediator and Pol II form a population of large (>300 nm) clusters (~14 per cell), each comprising ~200 to 400 molecules, that are temporally stable (lasting the full acquisition window of the live-cell superresolution imaging) (Cho, 2018).

The extent to which these clusters depend on the stem cell state was tested. The mESCs were subjected to a protocol to differentiate them into epiblast-like cells (EpiLCs) within 24 h. Differentiation had no apparent effect on the population of transient clusters, consistent with previous observations that transient clusters persist in differentiated cell types. However, both the size and the number of stable clusters decreased over the course of differentiation, suggesting that these stable clusters are prone to change as cells differentiate (Cho, 2018).

Focus was placed on the stable clusters of Mediator and Pol II, and whether they colocalize was investigated. mESCs were generated with endogenous Mediator and Pol II tagged with JF646-HaloTag and Dendra2, respectively. Direct imaging of both JF646-Mediator and Dendra2-Pol II showed bright spots of large accumulations in the nucleus, which corresponded to stable Pol II clusters according to subsequent superresolution imaging of Dendra2-Pol II in the same nuclei. The same observations were made with Dendra2-Mediator. Of 143 Mediator clusters imaged by dual-color light-sheet imaging, 129 (90%) had a colocalizing Pol II cluster. It was concluded that these Mediator and Pol II clusters colocalize in live mESCs (Cho, 2018).
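The colocalization count reported above (129 of 143 Mediator clusters) can be turned into a fraction with a rough uncertainty estimate. The sketch below is illustrative only: the Wilson score interval is an assumption chosen here for convenience and is not a statistic reported in the source, and the function name wilson_interval is hypothetical.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

# Counts taken from the text: 129 of 143 Mediator clusters had a
# colocalizing Pol II cluster.
colocalized, imaged = 129, 143
fraction = colocalized / imaged
low, high = wilson_interval(colocalized, imaged)
print(f"colocalized fraction: {fraction:.3f}")
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```

The interval treats each cluster as an independent observation, which is a simplification, since clusters imaged in the same nucleus may not be independent.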

Previous studies have shown that high densities of Mediator are located at enhancer clusters called super-enhancers (SEs) and that some are disrupted by loss of the BET (bromodomain and extraterminal family) protein BRD4 (Drosophila homolog: fs(1)h), which is a cofactor associated with Mediator. This study found that treatment of mESCs with JQ1, a drug that causes loss of BRD4 from enhancer chromatin, dissolved both transient and stable clusters of Mediator and Pol II (Cho, 2018).

After transcription initiation, Pol II transcribes a short distance (~100 base pairs), pauses, and is released to continue elongation when phosphorylated by CDK9. It was hypothesized that inhibition of CDK9 might selectively affect the Pol II stable clusters. It was observed that upon incubation with DRB (5,6-dichloro-1-beta-d-ribofuranosyl-benzimidazole), Pol II stable clusters dissolved but Mediator stable clusters remained. Quantification of Mediator-Pol II colocalization revealed that incubation with DRB progressively decreased the fraction of Mediator stable clusters that colocalized with Pol II. This effect could be reversed when DRB was washed out; the colocalization fraction recovered completely. These results imply that the association between Mediator and Pol II clusters may be hierarchical, with upstream enhancer recruitment controlling both clusters but downstream transcription inhibition selectively affecting Pol II clusters (Cho, 2018).

The long-term dynamics of stable clusters were characterized by using lattice light-sheet imaging in live mESCs. It was observed that clusters can merge upon contact. The time scale of coalescence was very rapid, comparable to the full volumetric acquisition frame rate (15-s time interval). The added-up intensity of the two precursor clusters was close to that of the newly merged cluster. These biophysical dynamics are reminiscent of those of biomolecular condensates in vivo (Cho, 2018).

In addition to coalescence, in vivo condensates had rapid turnover of the molecular components, as shown by fast recovery in fluorescence recovery after photobleaching (FRAP) assays, and were sensitive to a nonspecific aliphatic alcohol, 1,6-hexanediol. FRAP analyses of clusters revealed very rapid dynamics and turnover of their components: 60% of the Mediator and 90% of Pol II components were exchanged within ~10 s within clusters. Moreover, the treatment of mESCs with 1,6-hexanediol resulted in the gradual dissolution of both Mediator and Pol II clusters. Together, these results suggest that the stable clusters are in vivo condensates of Mediator and Pol II (Cho, 2018).
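To give a feel for the time scales these FRAP numbers imply, the sketch below converts "fraction exchanged within ~10 s" into a recovery time constant. It assumes a single-exponential recovery, f(t) = 1 - exp(-t/tau), with a fully mobile pool. That model is an assumption made here for illustration and is not stated in the source, and the helper names are hypothetical.

```python
import math

def tau_from_recovery(fraction_recovered, elapsed_s):
    """Time constant tau (s) for f(t) = 1 - exp(-t / tau), given f at time t."""
    if not 0 < fraction_recovered < 1:
        raise ValueError("fraction_recovered must be strictly between 0 and 1")
    return -elapsed_s / math.log(1 - fraction_recovered)

def half_time(tau_s):
    """Time (s) at which half of the pool has exchanged under the same model."""
    return tau_s * math.log(2)

# Fractions from the text; the 10 s window is approximate (~10 s).
reported = {"Mediator": 0.60, "Pol II": 0.90}
for name, frac in reported.items():
    tau = tau_from_recovery(frac, 10.0)
    print(f"{name}: tau = {tau:.1f} s, half-time = {half_time(tau):.1f} s")
```

Under this simplified model a larger exchanged fraction in the same window corresponds to a shorter time constant, so Pol II would turn over faster than Mediator. A real analysis would fit the full recovery curve and allow for an immobile fraction.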

It was hypothesized that a phase separation model with induced condensation at the recruitment step of Mediator to enhancers would qualitatively account for the observations in this study. The model implies that the condensates are chromatin associated and colocalize with enhancer-controlled active genes. Therefore these two specific implications were tested. The diffusion dynamics of Mediator clusters were tracked by computing their mean squared displacement as a function of time (n = 6 cells). On short time scales, the cluster motion was subdiffusive, with an exponent α = 0.40 ± 0.12. This is the same exponent found in the subdiffusional behavior of chromatin loci in eukaryotic cells. The same diffusional parameters were also observed when tracking a chromatin locus labeled by dCas9-based chimeric array of guide RNA oligonucleotides (CARGO) in mESCs. It is concluded that clusters diffuse like chromatin-associated domains (Cho, 2018).
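The subdiffusion exponent described above comes from the scaling MSD(t) ~ K * t^α, where α < 1 indicates subdiffusive motion. A minimal sketch of how such an exponent can be estimated from a tracked trajectory follows, assuming a time-averaged MSD and a straight-line fit in log-log space. The function names and the synthetic data are hypothetical, and the fitting procedure actually used in the study is not specified here.

```python
import math

def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of one 2D track for lags 1..max_lag (in frames)."""
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd

def fit_power_law(lags, msd):
    """Least-squares fit of log(MSD) = log(K) + alpha * log(lag)."""
    xs = [math.log(t) for t in lags]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    prefactor = math.exp(mean_y - alpha * mean_x)
    return alpha, prefactor

# Synthetic MSD values that follow an exact power law with alpha = 0.4;
# these are made-up numbers used only to check that the fit recovers alpha.
lags = [1, 2, 3, 4, 5, 6, 7, 8]
synthetic_msd = [0.5 * t ** 0.4 for t in lags]
alpha, prefactor = fit_power_law(lags, synthetic_msd)
print(f"recovered alpha = {alpha:.2f}")
```

For real tracks, mean_squared_displacement would be applied to the cluster coordinates first and only the short-lag points passed to fit_power_law, since the text reports the subdiffusive exponent on short time scales.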

It was hypothesized that clusters were in close physical proximity to actively transcribed genes that can be visualized by global run-on nascent RNA labeling with ethynyl uridine (EU). The run-on results showed that 2 min after DRB washout, virtually all Mediator clusters observed were proximal to or overlapping with nascent RNA accumulations, as imaged by Click labeling of EU in fixed cells. The MS2 endogenous RNA labeling system was employed to investigate whether active transcription could be observed at Esrrb, one of the top SE-controlled genes in mESCs. Bright foci were observed consistent with nascent MS2-labeled gene loci, and the gene loci were confirmed by dual-color RNA fluorescence in situ hybridization (FISH) targeting the MS2 sequence and intronic regions of Esrrb. Intronic FISH on 125 Esrrb loci from 82 fixed cells showed that 93% of Esrrb loci had a stable Mediator cluster nearby (within 1 µm) but only ~22% of the loci colocalized with a stable Mediator cluster, suggesting that the Mediator-bound enhancer only occasionally colocalizes with the gene. The variability in colocalization may be explained by a dynamic 'kissing' model, where a distal Mediator cluster colocalizes with the gene only at certain time points (Cho, 2018).

By dual-color three-dimensional (3D) live-cell imaging with lattice light-sheet microscopy, it was found that some Mediator clusters were up to a micrometer away from the active Esrrb gene locus but in some instances directly colocalized with the gene. In addition, the dynamic interaction between Mediator clusters and the gene locus was directly observed, supporting the dynamic kissing model. Tracking of loci in all six cells indicated that colocalization below the resolution limit of 300 nm occurred at ~30% of the time points. However, even when they were not overlapping, the Mediator cluster and the gene loci moved as a pair through the nucleus, consistent with two adjacent regions anchoring to the same underlying chromatin domain. It is proposed that Mediator clusters form at the Esrrb SE and then interact occasionally and transiently with the transcription apparatus at the Esrrb promoter (Cho, 2018).

This study has found that Mediator and Pol II form large stable clusters in living cells and has shown that these clusters have properties expected for biomolecular condensates. The condensate properties were evident through coalescence, rapid recovery in FRAP analysis, and sensitivity to hexanediol. In a model of phase separation on the basis of scaffold-client relationships, it is possible that enhancer-associated Mediator forms a condensate and provides a 'scaffold' for 'client' RNA Pol II molecules. The model proposed whereby large Mediator clusters at enhancers transiently kiss the transcription apparatus at promoters has a number of implications for gene control mechanisms. The presence of large Mediator clusters at some enhancers may allow Mediator condensates to contact the transcription apparatus at multiple gene promoters simultaneously. The large size of the Mediator clusters may also mean that the effective distance of the enhancer-promoter DNA elements can be in the same order as the size of the clusters (>300 nm), larger than the distance requirement for direct contact. It is speculated that such clusters may help explain gaps of hundreds of nanometers that are found in previous studies measuring distances between functional enhancer-promoter DNA elements. Such cluster sizes also imply that some long-range interactions could go undetected in DNA interaction assays that depend on much closer physical proximity of enhancer and promoter DNA elements (Cho, 2018).

Coactivator condensation at super-enhancers links phase separation and gene control

Super-enhancers (SEs) are clusters of enhancers that cooperatively assemble a high density of the transcriptional apparatus to drive robust expression of genes with prominent roles in cell identity. This study demonstrates that the SE-enriched transcriptional coactivators BRD4 and MED1 form nuclear puncta at SEs that exhibit properties of liquid-like condensates and are disrupted by chemicals that perturb condensates. The intrinsically disordered regions (IDRs) of BRD4 and MED1 can form phase-separated droplets, and MED1-IDR droplets can compartmentalize and concentrate the transcription apparatus from nuclear extracts. These results support the idea that coactivators form phase-separated condensates at SEs that compartmentalize and concentrate the transcription apparatus, suggest a role for coactivator IDRs in this process, and offer insights into mechanisms involved in the control of key cell-identity genes (Sabari, 2018).

Phase separation of fluids is a physicochemical process by which molecules separate into a dense phase and a dilute phase. Phase-separated biomolecular condensates, which include the nucleolus, nuclear speckles, stress granules, and others, provide a mechanism to compartmentalize and concentrate biochemical reactions within cells. Biomolecular condensates produced by liquid-liquid phase separation allow rapid movement of components into and within the dense phase and exhibit properties of liquid droplets such as fusion and fission. Dynamic and cooperative multivalent interactions among molecules, such as those produced by certain intrinsically disordered regions (IDRs) of proteins, have been implicated in liquid-liquid phase separation (Sabari, 2018).

Enhancers are gene regulatory elements bound by transcription factors (TFs) and other components of the transcription apparatus that function to regulate expression of cell type-specific genes. Super-enhancers (SEs) -- clusters of enhancers that are occupied by exceptionally high densities of transcriptional machinery -- regulate genes with especially important roles in cell identity. DNA interaction data show that enhancer elements in the clusters are in close spatial proximity with each other and the promoters of the genes that they regulate, consistent with the notion of a dense assembly of transcriptional machinery at these sites. This high-density assembly at SEs has been shown to exhibit sharp transitions of formation and dissolution, forming as the consequence of a single nucleation event and collapsing when concentrated factors are depleted from chromatin or when nucleation sites are deleted. These properties of SEs led to the proposal that the high-density assembly of biomolecules at active SEs is due to phase separation of enriched factors at these genetic elements. This study has provided experimental evidence that the transcriptional coactivators BRD4 and MED1 (a subunit of the Mediator complex) form condensates at SEs. This establishes a new framework to account for the diverse properties described for these regulatory elements and expands the known biochemical processes regulated by phase separation to include the control of cell-identity genes (Sabari, 2018).

SEs regulate genes with prominent roles in healthy and diseased cellular states. SEs and their components have been proposed to form phase-separated condensates, but with no direct evidence. This study demonstrates that two key components of SEs, BRD4 and MED1, form nuclear condensates at sites of SE-driven transcription. Within these condensates, BRD4 and MED1 exhibit apparent diffusion coefficients similar to those previously reported for other proteins in phase-separated condensates in vivo. The IDRs of both BRD4 and MED1 are sufficient to form phase-separated droplets in vitro, and the MED1-IDR facilitates phase separation in living cells. Droplets formed by MED1-IDR are capable of concentrating transcriptional machinery in a transcriptionally competent nuclear extract. These results support a model in which transcriptional coactivators form phase-separated condensates that compartmentalize and concentrate the transcription apparatus at SE-regulated genes and identify SE components that likely play a role in phase separation (Sabari, 2018).

SEs are established by the binding of master TFs to enhancer clusters. These TFs typically consist of a structured DNA-binding domain and an intrinsically disordered transcriptional activation domain. The activation domains of these TFs recruit high densities of many transcription proteins, which, as a class, are enriched for IDRs. Although the exact client-scaffold relationship between these components remains unknown, it is likely that these protein sequences mediate weak multivalent interactions, thereby facilitating condensation. It is proposed that condensation of such high-valency factors at SEs creates a reaction crucible within the separated dense phase, where high local concentrations of the transcriptional machinery ensure robust gene expression (Sabari, 2018).

The nuclear organization of chromosomes is likely influenced by condensates at SEs. DNA interaction technologies indicate that the individual enhancers within the SEs have exceptionally high interaction frequencies with one another, consistent with the idea that condensates draw these elements into close proximity in the dense phase. Several recent studies suggest that SEs can interact with one another and may also contribute in this fashion to chromosome organization. Cohesin, an SMC (structural maintenance of chromosomes) protein complex, has been implicated in constraining SE-SE interactions because its loss causes extensive fusion of SEs within the nucleus. These SE-SE interactions may be due to a tendency of liquid-phase condensates to undergo fusion (Sabari, 2018).

The model whereby phase separation of coactivators compartmentalizes and concentrates the transcription apparatus at SEs and their regulated genes raises many questions. How does condensation contribute to regulation of transcriptional output? A study of RNA Pol II clusters, which may be phase-separated condensates, suggests a positive correlation between condensate lifetime and transcriptional output. What components drive formation and dissolution of transcriptional condensates? These studies indicate that BRD4 and MED1 likely participate, but the roles of DNA-binding TFs, RNA Pol II, and regulatory RNAs require further study. Why do some proteins, such as HP1a, contribute to phase-separated heterochromatin condensates and others contribute to euchromatic condensates? The rules that govern partitioning into specific types of condensates have begun to be studied and will need to be defined for proteins involved in transcriptional condensates. Does condensate misregulation contribute to pathological processes in disease, and will new insights into condensate behaviors present new opportunities for therapy? Mutations within IDRs and misregulation of phase separation have already been implicated in a number of neurodegenerative diseases. Tumor cells have exceptionally large SEs at driver oncogenes that are not found in their cell of origin, and some of these are exceptionally sensitive to drugs that target SE components. How is it possible to take advantage of phase separation principles established in physics and chemistry to more effectively improve understanding of this form of regulatory biology? Addressing these questions at the crossroads of physics, chemistry, and biology will require collaboration across these diverse sciences (Sabari, 2018).

Transcription factors activate genes through the phase-separation capacity of their activation domains

Gene expression is controlled by transcription factors (TFs) that consist of DNA-binding domains (DBDs) and activation domains (ADs). The DBDs have been well characterized, but little is known about the mechanisms by which ADs effect gene activation. This study, carried out in murine embryonic stem cells, reports that diverse ADs form phase-separated condensates with the Mediator coactivator. For the OCT4 and GCN4 TFs, this study shows that the ability to form phase-separated droplets with Mediator in vitro and the ability to activate genes in vivo are dependent on the same amino acid residues. For the estrogen receptor (ER), a ligand-dependent activator, it was shown that estrogen enhances phase separation with Mediator, again linking phase separation with gene activation. These results suggest that diverse TFs can interact with Mediator through the phase-separating capacity of their ADs and that formation of condensates with Mediator is involved in gene activation (Boija, 2018).

Regulation of gene expression requires that the transcription apparatus be efficiently assembled at specific genomic sites. DNA-binding transcription factors (TFs) ensure this specificity by occupying specific DNA sequences at enhancers and promoter-proximal elements. TFs typically consist of one or more DNA-binding domains (DBDs) and one or more separate activation domains (ADs). While the structure and function of TF DBDs are well documented, comparatively little is understood about the structure of ADs and how these interact with coactivators to drive gene expression (Boija, 2018).

The structure of TF DBDs and their interaction with cognate DNA sequences has been described at atomic resolution for many TFs, and TFs are generally classified according to the structural features of their DBDs. For example, DBDs can be composed of zinc-coordinating, basic helix-loop-helix, basic-leucine zipper, or helix-turn-helix DNA-binding structures. These DBDs selectively bind specific DNA sequences that range from 4 to 12 bp, and the DNA binding sequences favored by hundreds of TFs have been described. Multiple TF molecules typically bind together at any one enhancer or promoter-proximal element. For example, at least eight different TF molecules bind a 50-bp core component of the interferon (IFN)-β enhancer (Boija, 2018).

Anchored in place by the DBD, the AD interacts with coactivators, which integrate signals from multiple TFs to regulate transcriptional output. In contrast to the structured DBD, the ADs of most TFs are low-complexity amino acid sequences not amenable to crystallography. These intrinsically disordered regions (IDRs) have therefore been classified by their amino acid profile as acidic, proline, serine/threonine, or glutamine rich or by their hypothetical shape as acid blobs, negative noodles, or peptide lassos. Remarkably, hundreds of TFs are thought to interact with the same small set of coactivator complexes, which include Mediator and p300. ADs that share little sequence homology are functionally interchangeable among TFs; this interchangeability is not readily explained by traditional lock-and-key models of protein-protein interaction. Thus, how the diverse ADs of hundreds of different TFs interact with a similar small set of coactivators remains a conundrum. Recent studies have shown that the AD of the yeast TF GCN4 binds to the Mediator subunit MED15 at multiple sites and in multiple orientations and conformations. The products of this type of protein-protein interaction, where the interaction interface cannot be described by a single conformation, have been termed 'fuzzy complexes'. These dynamic interactions are also typical of the IDR-IDR interactions that facilitate formation of phase-separated biomolecular condensates (Boija, 2018).

It has recently been proposed that transcriptional control may be driven by the formation of phase-separated condensates, and it was demonstrated that the coactivator proteins MED1 and BRD4 form phase-separated condensates at super-enhancers (SEs). This study reports that diverse TF ADs phase separate with the Mediator coactivator. The embryonic stem cell (ESC) pluripotency TF OCT4, the estrogen receptor (ER), and the yeast TF GCN4 form phase-separated condensates with Mediator and require the same amino acids or ligands for both activation and phase separation. It is proposed that IDR-mediated phase separation with coactivators is a mechanism by which TF ADs activate genes (Boija, 2018).

The results described in this study support a model whereby TFs interact with Mediator and activate genes by the capacity of their ADs to form phase-separated condensates with this coactivator. For both the mammalian ESC pluripotency TF OCT4 and the yeast TF GCN4, it was found that the AD amino acids required for phase separation with Mediator condensates were also required for gene activation in vivo. For ER, it was found that estrogen stimulates the formation of phase-separated ER-MED1 droplets. ADs and coactivators generally consist of low-complexity amino acid sequences that have been classified as IDRs, and IDR-IDR interactions have been implicated in facilitating the formation of phase-separated condensates. It is proposed that IDR-mediated phase separation with Mediator is a general mechanism by which TF ADs effect gene expression, and evidence is provided that this occurs in vivo at SEs. It is suggested that the ability to phase separate with Mediator, which would employ the high valency and low affinity characteristic of liquid-liquid phase-separated condensates, operates alongside an ability of some TFs to form high-affinity interactions with Mediator (Boija, 2018).

The model that TF ADs function by forming phase-separated condensates with coactivators explains several observations that are difficult to reconcile with classical lock-and-key models of protein-protein interaction. The mammalian genome encodes many hundreds of TFs with diverse ADs that must interact with a small number of coactivators, and ADs that share little sequence homology are functionally interchangeable among TFs. The common feature of ADs, the possession of low-complexity IDRs, is also a feature that is pronounced in coactivators. The model of coactivator interaction and gene activation by phase-separated condensate formation thus more readily explains how many hundreds of mammalian TFs interact with these coactivators (Boija, 2018).

Previous studies have provided important insights that prompted an investigation of the possibility that TF ADs function by forming phase-separated condensates. TF ADs have been classified by their amino acid profile as acidic, proline rich, serine/threonine rich, glutamine rich, or by their hypothetical shape as acid blobs, negative noodles, or peptide lassos. Many of these features have been described for IDRs that are capable of forming phase-separated condensates. Evidence that the GCN4 AD interacts with MED15 in multiple orientations and conformations to form a 'fuzzy complex' is consistent with the notion of dynamic low-affinity interactions characteristic of phase-separated condensates. Likewise, the low complexity domains of the FET (FUS/EWS/TAF15) RNA-binding proteins can form phase-separated hydrogels and interact with the RNA polymerase II C-terminal domain (CTD) in a CTD phosphorylation-dependent manner; this may explain the mechanism by which RNA polymerase II is recruited to active genes in its unphosphorylated state and released for elongation following phosphorylation of the CTD (Boija, 2018).

The model described in this study for TF AD function may explain the function of a class of heretofore poorly understood fusion oncoproteins. Many malignancies bear fusion-protein translocations involving portions of TFs. These abnormal gene products often fuse a DNA or chromatin-binding domain to a wide array of partners, many of which are IDRs. For example, MLL may be fused to 80 different partner genes in AML, the EWS-FLI rearrangement in Ewing's sarcoma causes malignant transformation by recruitment of a disordered domain to oncogenes, and the disordered phase-separating protein FUS is found fused to a DBD in certain sarcomas. Phase separation provides a mechanism by which such gene products result in aberrant gene expression programs; by recruiting a disordered protein to the chromatin, diverse coactivators may form phase-separated condensates to drive oncogene expression. Understanding the interactions that compose these aberrant transcriptional condensates, their structures, and behaviors may open new therapeutic avenues (Boija, 2018).

Nucleosome Positioning around Transcription Start Site Correlates with Gene Expression Only for Active Chromatin State in Drosophila Interphase Chromosomes

This study analyzed the whole-genome experimental maps of nucleosomes in Drosophila melanogaster and classified genes by their expression level in S2 cells (RPKM value, reads per kilobase of transcript per million mapped reads) as well as by the number of tissues in which a gene was expressed (breadth of expression, BoE). Chromatin in the 5'-regions of genes was classified into four states according to a hidden Markov model (4HMM). Only the Aquamarine chromatin state was considered Active, while the remaining three states were defined as Non-Active. Surprisingly, about 20% of genes with 5'-regions mapped to Active chromatin possessed minimal RPKM and BoE, while about 40% of genes with 5'-regions mapped to Non-Active chromatin possessed at least modest RPKM and BoE. Regardless of RPKM and BoE, genes of Active chromatin possessed a regular nucleosome arrangement in their 5'-regions, while genes of Non-Active chromatin did not show such specificity. Only for genes of Active chromatin did RPKM and BoE correlate positively with the number of nucleosome sites upstream of or around the TSS and negatively with the number downstream of the TSS. It is proposed that for genes of Active chromatin, regardless of RPKM value and BoE, the nucleosome arrangement in 5'-regions potentiates transcription, while for genes of Non-Active chromatin, the transcription machinery does not require substantial support from nucleosome arrangement to influence gene expression (Levitsky, 2020).
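The two gene-level measures used above can be illustrated with a minimal Python sketch. The RPKM formula is the standard definition; the function names and the expression threshold used to count a tissue toward BoE are hypothetical choices for illustration and are not taken from Levitsky (2020).

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def breadth_of_expression(rpkm_by_tissue, threshold=1.0):
    """Count the tissues in which a gene's RPKM reaches a threshold.

    The threshold is an assumed cutoff, not a value from the study.
    """
    return sum(1 for value in rpkm_by_tissue.values() if value >= threshold)


# Hypothetical gene: 500 reads on a 2,000-bp transcript, 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
example_boe = breadth_of_expression({"ovary": 0.2, "gut": 3.0, "brain": 1.0})
```

Because RPKM normalizes for both transcript length and sequencing depth, values can be compared between genes and between libraries, which is what allows genes to be binned by expression level.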

Quantitative imaging of transcription in living Drosophila embryos reveals the impact of core promoter motifs on promoter state dynamics

Genes are expressed in stochastic transcriptional bursts linked to alternating active and inactive promoter states. A major challenge in transcription is understanding how promoter composition dictates bursting, particularly in multicellular organisms. This study investigated two key Drosophila developmental promoter motifs, the TATA box (TATA) and the Initiator (INR). Using live imaging in Drosophila embryos and new computational methods, it was demonstrated that bursting occurs on multiple timescales ranging from seconds to minutes. TATA-containing promoters and INR-containing promoters exhibit distinct dynamics, with one or two separate rate-limiting steps respectively. A TATA box is associated with long active states, high rates of polymerase initiation, and short-lived, infrequent inactive states. In contrast, the INR motif leads to two inactive states, one of which relates to promoter-proximal polymerase pausing. Surprisingly, the model suggests pausing is not obligatory, but occurs stochastically for a subset of polymerases. Overall, these results provide a rationale for promoter switching during zygotic genome activation (Pimmett, 2021).
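The idea of bursting from alternating promoter states can be sketched with the standard two-state (telegraph) model, simulated here with a simple Gillespie-style loop. This is not the computational method of Pimmett (2021), which also considers promoters with two inactive states; the rate names `k_on`, `k_off` and `k_init` and their values are hypothetical.

```python
import random


def simulate_two_state_promoter(k_on, k_off, k_init, total_time, seed=0):
    """Simulate a promoter that switches between inactive and active states.

    Polymerase initiation events occur only in the active state.
    Returns (fraction of time active, number of initiation events).
    """
    rng = random.Random(seed)
    t = 0.0
    active = False
    time_active = 0.0
    initiations = 0
    while t < total_time:
        # Total rate of leaving the current configuration.
        total_rate = (k_off + k_init) if active else k_on
        wait = rng.expovariate(total_rate)
        if t + wait >= total_time:
            if active:
                time_active += total_time - t
            break
        t += wait
        if active:
            time_active += wait
            # Choose between initiating a polymerase and switching off.
            if rng.random() < k_init / total_rate:
                initiations += 1
            else:
                active = False
        else:
            active = True
    return time_active / total_time, initiations


# Hypothetical rates (per unit time): equal on and off switching, fast initiation.
fraction_active, n_initiations = simulate_two_state_promoter(
    k_on=0.5, k_off=0.5, k_init=2.0, total_time=20000.0
)
```

In this simplified model a single rate-limiting step (`k_on`) governs exit from the inactive state; a promoter with two inactive states would need an additional state and transition.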

Comparison of transcriptional initiation by RNA polymerase II across eukaryotic species

The preinitiation complex (PIC) for transcriptional initiation by RNA polymerase (Pol) II is composed of general transcription factors that are highly conserved. However, analysis of ChIP-seq datasets reveals kinetic and compositional differences in the transcriptional initiation process among eukaryotic species. In yeast, Mediator associates strongly with activator proteins bound to enhancers, but it transiently associates with promoters in a form that lacks the kinase module. In contrast, in human, mouse, and fly cells, Mediator with its kinase module stably associates with promoters, but not with activator-binding sites. This suggests that yeast and metazoans differ in the nature of the dynamic bridge of Mediator between activators and Pol II and the composition of a stable inactive PIC-like entity. As in yeast, occupancies of TATA-binding protein (TBP) and TBP-associated factors (Tafs) at mammalian promoters are not strictly correlated. This suggests that within PICs, TFIID is not a monolithic entity, and multiple forms of TBP affect initiation at different classes of genes. TFIID in flies, but not yeast and mammals, interacts strongly at regions downstream of the initiation site, consistent with the importance of downstream promoter elements in that species. Lastly, Taf7 and the mammalian-specific Med26 subunit of Mediator also interact near the Pol II pause region downstream of the PIC, but only in subsets of genes and often not together. Species-specific differences in PIC structure and function are likely to affect how activators and repressors affect transcriptional activity (Petrenko, 2021).

Transcription factor TFIIEbeta interacts with two exposed positions in helix 2 of the Antennapedia homeodomain to control homeotic function in Drosophila

Homeodomains (HDs) increase their DNA-binding specificity by interacting with additional cofactors, outlining a Hox interactome with a multiplicity of protein-protein interactions. In Drosophila, the first link of functional contact with a general transcription factor (GTF) was found between Antennapedia (Antp) and BIP2 (TFIID complex). Hox proteins also interact with other components of the Pol II machinery, such as the subunit Med19 from the Mediator (MED) complex, TFIIEbeta, and the transcription-pausing factor M1BP. This paper focused on the Antp-TFIIEbeta protein-protein interface to establish the specific contacts as well as its functional role. TFIIEbeta was found to interact with Antp through the HD independently of the YPWM motif, and the direct physical interaction is at helix 2, specifically amino acid positions I32 and H36 of Antp. These two positions in helix 2 are crucial for Antp homeotic function in head involution, and in thoracic and antenna-to-tarsus transformations. Interestingly, overexpression of Antp and TFIIEbeta in the antennal disc showed that this interaction is required for the antenna-to-tarsus transformation. These results open the possibility to more broadly analyze the role of the Antp-TFIIEbeta interaction in the transcriptional control of activation and/or repression of target genes in the Hox interactome during Drosophila development (Altamirano-Torres, 2018).

To analyze the interplay between Hox and the general transcription machinery, this study focused on Antp-TFIIEβ protein-protein interface to establish the specific contacts, as well as the functional role of this interaction. The results showed a direct physical interaction of TFIIEβ with the 32 and 36 positions of helix 2 Antp HD in cell culture and in vivo. These two positions on helix 2 HD are required for interaction with TFIIEβ, and this interaction is necessary for homeotic transformation (Altamirano-Torres, 2018).

The results demonstrate that the Antp HD was necessary for maintaining the interaction with TFIIEβ. Previous studies have confirmed that the HD is sufficient for interaction with GTFs. For example, it has been found that the AbdA HD was sufficient for TFIIEβ interaction and that when the DNA-binding of the HD is mutated, the interaction is diminished but not abolished. Another example used bimolecular fluorescence complementation (BiFC) in vivo to demonstrate that the Ubx HD and AbdA HD are sufficient for direct interaction with Med19. In addition to the conserved HD affinities for DNA and RNA, several protein-protein interactions also rely on the HD, such as dimerization of Scr, and Antp interaction with Eyeless (Altamirano-Torres, 2018).

Although this study found that the Antp-TFIIEβ interaction is YPWM-independent in BiFC assays in cell culture, and the helix 2 Antp mutant with an intact YPWM motif showed neither interaction by BiFC nor functional activity, co-expression of the YPWM mutant and TFIIEβ reduced the interaction signal in embryos. A similar result was found in an earlier study in which a YPWM Antp mutant showed a reduction, but not an abolition, of TFIIEβ interaction in Drosophila embryos, which could be attributable to the presence of helix 2 in the mutant. Altogether, this suggests that interactions of Antp with TFIIEβ could change from one tissue to another, with complex formation in different tissues using various interfaces (YPWM and/or HD), contributing to the plasticity of Hox interaction properties (Altamirano-Torres, 2018).

Deletional analysis of the Antp HD suggested that TFIIEβ interacts through helix 2 of the Antp HD. Based on the reported 3D structure of the Antp HD-DNA complex, in which helix 2 is on the opposite side from the HD-DNA binding surface, this study selected the conserved residues 32 and 36, which are exposed and physically available, as candidates for TFIIEβ interaction. To perform a molecular dissection of the Antp-TFIIEβ interaction, the residues I32 and H36 of helix 2, either individually or together, were studied by site-directed mutagenesis in cell culture. BiFC results show a drastic reduction of the interaction upon mutation of these two residues, indicating that they are directly involved in the Antp-TFIIEβ interaction. It has been demonstrated that the Antp HD is internalized to the nuclei through residues 43-58 of the third helix. Therefore, since the mutations examined in this study are present on helix 2, the Antp NLSs were not expected to be affected. To confirm this, immunostaining of Antp helix 2 mutants in cells and embryos very clearly showed the nuclear localization of the Antp helix 2 single mutants and the double mutant. These results indicated that Antp helix 2 mutants retain the NLSs required for their localization into the nucleus. Moreover, it was also demonstrated that the helix 2 mutant keeps its transactivation activity and is capable of interacting with EXD in cells and embryos, confirming that mutation of these amino acids did not alter the DNA binding affinity or the protein conformation needed to perform essential activities required for in vivo transformation (Altamirano-Torres, 2018).

Since substitutions by alanines and by structurally similar residues affected the Antp-TFIIEβ interaction in cell culture in the same manner, the I32A-H36A HD mutant was selected for the in vivo analysis in Drosophila. In concordance with the BiFC cell culture assay, the results showed no interaction in embryos or in imaginal discs with the Antp mutant I32A-H36A. Therefore, residues 32 and 36 of Antp helix 2 are crucial for the interaction with TFIIEβ in BiFC assays in Drosophila embryos and imaginal discs. This is relevant because residues 32 and 36 on Antp helix 2 are identical and highly conserved within Drosophila Hox proteins, so the finding can be extrapolated to the interaction of TFIIEβ with other homeoproteins due to the high Hox conservation (Altamirano-Torres, 2018).

Although the results very clearly show Antp-TFIIEβ interaction through positions 32 and 36 of helix 2, this does not exclude the possibility that other amino acid positions, either in helix 2 or the intervening loop, could be involved to a minor extent in the interaction. For example, positions 30 and 33, in addition to the helix 2 amino acids 32 and 36, have also been reported to be involved in the interaction of the human POU proteins Oct-1 and Oct-2 with the VP16 transactivator factor of Herpes Simplex Virus (Altamirano-Torres, 2018).

Because the precise molecular mechanisms of Antp in transcriptional regulation remain unclear, attempts were made to shed light on these by determining whether I32 and H36 are important for Antp function. When Antp is ectopically expressed in embryos, it causes inhibition of head involution and transformation of prothoracic segment T1 into T2 and of antennae into mesothoracic (T2) legs. Antp ectopic expression shows that residues 32 and 36 of HD helix 2 are essential for its function in embryo head involution and homeotic transformations of thorax and antenna. The lack of homeotic transformations upon expression of the AntpI32A-H36A double mutant indicates that residues 32 and 36 of HD helix 2 are absolutely required for the Antp ectopic homeotic function in Drosophila. Likewise, Antp mutated in the YPWM motif is not capable of transforming the antenna, and a single exposed residue on helix 1 of the Scr HD is necessary for its homeotic function, showing that besides HD DNA-binding, exposed positions on the HD are crucial for Hox functional activity (Altamirano-Torres, 2018).

To determine the functional relevance of the Antp-TFIIEβ interaction, co-expression of TFIIEβ and double mutant AntpI32A-H36A was directed to the antenna, showing a drastic reduction of the antenna transformation. These findings clearly demonstrate that Antp-TFIIEβ interaction (visualized by BiFC in live larvae) is necessary for the Antp homeotic function with a very strong transformation of the antenna into T2 mesothoracic leg. Together, these results imply that very subtle changes of two amino acids in the Antp HD helix 2 can have dramatic effects on protein-protein interaction with TFIIEβ, affecting transcriptional control and the functional properties of antenna-to-tarsus transformation (Altamirano-Torres, 2018).

These results show that the interaction between TFIIEβ and Antp HD contributes to transcriptional regulation and functional activities of Antennapedia. In Pol II PIC formation, TFIIE is a heterodimer of α and β subunits that regulates TFIIH activities such as the kinase acting on the RNA Pol II CTD, the ATPase, and the DNA helicase. TFIIEβ binds to both TFIIB and TFIIF in important activities needed for promoter melting and stabilization as well as for the transition to elongation. Thus, Antp-TFIIEβ interaction may represent a key control point for modulation of transcription factors involved in activation or repression functions. Repression activity of the Antp-TFIIEβ interaction may imply destabilization of the PIC or the inhibition of TFIIEβ functions modulating TFIIH ATPase, CTD kinase or helicase activities. For example, it has been determined by in vitro transcription and co-immunoprecipitation assays that the zinc-finger TF Kruppel (Kr), a Drosophila segmentation protein for late embryonic development, interacts in a dimeric way with TFIIEβ and that this interaction represses transcription. If it is considered that Antp dictates leg fate by repressing the activity of antenna-determining genes such as Hth and Dll in the leg imaginal discs, it is reasonable that Antp-TFIIEβ could be involved in repression. Co-expression of Antp with TFIIEβ resulted in a reduction of Luciferase expression to 47% of that seen with Antp alone; however, further experiments need to be done to evaluate the precise molecular mechanism of this interaction. It is also possible that Antp facilitates the arrival of TFIIEβ at the PIC and subsequently the recruitment and/or activation of TFIIH, allowing efficient transcription elongation. For example, mutation of Med19 in haltere imaginal discs shows that Med19 is required for Ubx target gene activation. Another example is that Kr binds to TFIIB in a monomeric way, and this interaction activates transcription in vitro. Thus, further experiments are needed to determine the fine molecular mechanism by which the interaction between Antp and TFIIEβ contributes to transcriptional regulation through activation or repression activities, or even both (Altamirano-Torres, 2018).

This study has presented a clear interaction of TFIIEβ with two amino acid positions of the Antp HD that are important for Antp homeotic function, and this interplay is essential to the Antp antenna-to-tarsus transformation. In conclusion, amino acids 32 and 36 of Antp HD helix 2 play a very important role in determining the specificity of the TFIIEβ interaction. Altogether, these results provide insights into the molecular interface of the Antp HD with TFIIEβ to evaluate the extent to which these molecular contacts translate into functional properties in activation or repression of target genes. The role of residues 32 and 36 on Antp helix 2 can be extrapolated to the interaction of TFIIEβ with other homeoproteins, for example Scr, Ubx and AbdA, due to the high Hox conservation. In addition, the Antp-TFIIEβ interaction opens the possibility to more broadly explore the interplay between Antp and additional transcription factors in the Hox interactome for the genetic control of development in Drosophila (Altamirano-Torres, 2018).

Large-scale analysis of Drosophila core promoter function using synthetic promoters

The core promoter plays a central role in setting metazoan gene expression levels, but how exactly it "computes" expression remains poorly understood. To dissect its function, a comprehensive structure-function analysis was carried out in Drosophila. First, a genome-wide bioinformatic analysis was performed, providing an improved picture of the sequence motif architecture. Then the activities of ~3,000 mutational variants of synthetic promoters were measured with and without an external stimulus (hormonal activation), at large scale and with high accuracy, using robotics and a dual luciferase reporter assay. A strong impact on activity was observed for the different types of mutations, including knockout of individual sequence motifs and motif combinations, variations of motif strength, nucleosome positioning, and flanking sequences. A linear combination of the individual motif features largely accounts for the combinatorial effects on core promoter activity. These findings shed new light on the quantitative assessment of gene expression in metazoans (Punzi, 2022).
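What a "linear combination of individual motif features" means in practice can be shown with a toy additive model. The motif weights, the 0/1 feature encoding, and the activity scale below are hypothetical and are not fitted values from Punzi (2022); the point is only that, under an additive model, the effect of a double knockout equals the sum of the single-knockout effects.

```python
def predict_activity(motif_features, weights, baseline=0.0):
    """Additive model: baseline plus a weighted sum of motif features."""
    return baseline + sum(
        weights[name] * value for name, value in motif_features.items()
    )


# Hypothetical per-motif contributions (arbitrary activity units).
weights = {"TATA": 1.5, "INR": 1.0, "DPE": 0.5}

# Feature value 1 = motif intact, 0 = motif knocked out.
wild_type = {"TATA": 1, "INR": 1, "DPE": 1}
tata_knockout = {"TATA": 0, "INR": 1, "DPE": 1}
dpe_knockout = {"TATA": 1, "INR": 1, "DPE": 0}
double_knockout = {"TATA": 0, "INR": 1, "DPE": 0}

wt = predict_activity(wild_type, weights)
effect_tata = wt - predict_activity(tata_knockout, weights)
effect_dpe = wt - predict_activity(dpe_knockout, weights)
effect_double = wt - predict_activity(double_knockout, weights)
```

Deviations of measured double-knockout effects from the sum of single effects would indicate motif interactions that a purely linear model does not capture.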

Appropriate gene expression with the correct timing is crucial for the development and diversity of all organisms. The control of gene expression occurs primarily at the level of transcription, and the core promoter, the region immediately surrounding the transcription start site (TSS), makes an essential contribution to setting the gene expression level (Punzi, 2022).

The RNA polymerase II (Pol II) core promoter is the minimal DNA sequence that is recognized by the basal transcription machinery. It comprises the TSS and approximately 150 bp of the flanking sequence. The accurate transcription initiation and basal expression level of a gene are primarily determined by differential recruitment of the transcription machinery, consisting of Pol II and general transcription factors (GTFs), to its core promoter region. Genome-wide studies have revealed various properties of native core promoters. In particular, sequence motifs that are over-represented around TSSs mostly mark the potential binding sites of GTFs or other transcription factors (TFs). A number of core promoter elements (CPEs) have been described in eukaryotic core promoters, such as the TATA box, the initiator (Inr), or the downstream promoter element (DPE). These elements, however, typically occur in only a fraction of promoters, prompting the question of how the transcription machinery finds the core promoter in the absence of such motifs. As yet unknown motifs or physical properties of the DNA within the core promoter region may contribute to an explanation. Moreover, genetic variations occurring at the motif sites alter both promoter strength and TSS position significantly. Although the genomic analysis of native sequences suggests certain causal relationships, the effects of variations in genomic sequences have been very challenging to predict. This makes it difficult to uncover the sequence attributes responsible for activity changes. Notably, Arnold (2017) showed for the main motifs Inr, TATA, and DPE that the resemblance to the canonical sequences correlates with the responsiveness of the promoter to the enhancer targeting it (i.e., how much expression changes when an enhancer is active), with increasing responsiveness observed for higher position weight matrix (PWM) match scores. Interestingly, they also found that the correlation is higher for strongly responsive sequences than for weaker ones. However, it remains difficult to ascertain the influence of specific features except by directly altering them and measuring the effect on expression levels (Punzi, 2022).
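The PWM match score mentioned above can be illustrated with a minimal sketch. The matrix, the uniform background, and the function names `pwm_score` and `best_match` are hypothetical and chosen only for illustration; they are not the matrices or the scoring scheme used by Arnold (2017) or Punzi (2022).

```python
import math

# Hypothetical 4-position probability matrix for a toy motif (consensus ATCA).
# The numbers are illustrative and are not taken from either study.
TOY_PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def pwm_score(site, ppm=TOY_PPM, background=BACKGROUND):
    """Log2-odds score of a site that has the same length as the matrix."""
    if len(site) != len(ppm):
        raise ValueError("site and matrix must have the same length")
    return sum(math.log2(col[base] / background[base])
               for col, base in zip(ppm, site))


def best_match(sequence, ppm=TOY_PPM):
    """Return (score, offset) of the best-scoring window in a sequence."""
    width = len(ppm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    windows = [(pwm_score(sequence[i:i + width], ppm), i)
               for i in range(len(sequence) - width + 1)]
    return max(windows)


score, offset = best_match("GGATCAGG")
print(f"best window starts at offset {offset} with score {score:.2f}")
```

A site closer to the consensus receives a higher score, which is the sense in which "higher PWM match scores" indicate stronger resemblance to the canonical motif.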

Facilitated by DNA synthesis technology and next-generation sequencing, high-throughput approaches such as massively parallel reporter assays (MPRAs) have been developed to test how the DNA sequence affects gene expression (transcripts) at single-molecule resolution and at large scale. A second kind of MPRA quantifies protein fluorescence as the readout of reporter gene expression, but yields only discrete expression measurements because of its 'bin' sorting design (which cannot sense subtle effects) and the intrinsically narrow dynamic range of the fluorescence readout. Moreover, most of these studies focused on enhancers, especially on single TF binding sites. Only a few MPRAs were designed for in vivo promoter analysis, such as the extensive studies on fully designed yeast proximal promoter regions and yeast core promoter sequences, or the analysis of autonomous promoter activity of random genome fragments in humans and in Drosophila melanogaster (D. melanogaster) (Arnold, 2017). Thus, despite the pivotal role of core promoters in transcription initiation, it remains poorly understood how the components and sequence features of the core promoter determine expression levels (Punzi, 2022).

This study aims to dissect the core promoter comprehensively and to elucidate the sequence determinants of promoters in D. melanogaster S2 cells. First, the motif architecture of D. melanogaster core promoters was interrogated by developing a statistical framework based on PWMs to compute the over-representation of candidate motifs in promoter sequences. Use of the state-of-the-art motif-finding tool XXmotif led to the de novo detection of all currently known motifs, as well as several previously unknown motifs that are conserved and enriched in promoter regions. Drosophila melanogaster core promoters cluster into four classes characterized by distinct motif architectures and other promoter attributes. Then promoter activity was tested using a dual luciferase assay, which is highly sensitive and has a linear and broad dynamic range. The entire experimental pipeline was integrated using automated robotic systems, including cloning and luciferase gene expression readout. By extensively measuring the activity of mutagenized core promoter sequences for 19 representative genes, the functional specificity of sequence motifs was corroborated. Their strength, as measured by the position weight matrix (PWM) score, and their precise positioning are essential features determining core promoter activity. Additionally, core promoter motifs were comprehensively mutagenized using single base-pair mutations to produce expression-based position probability matrices (PPMs). Combinatorial motif mutations that alter both the strength and the positioning of all motifs often result in strong effects on activity, which were compared with the effects of individual motif mutations: a linear combination of these individual motif features can largely account for the joint effects on core promoter activity. In addition, the influence of surrounding regions on promoter activity was investigated. By testing sequences impacting the -1 and +1 nucleosomes, their influence on the constitutive core promoter activity was shown to be relatively mild, the effect being stronger for nucleosome positioning sequences downstream of the TSS. The influence of context sequences (i.e., the background sequences surrounding the CPEs) was also tested, and their strong impact on expression was confirmed. Finally, the response upon activation was investigated through an external hormonal stimulus by the steroid hormone ecdysone (a transcriptional activator). This hormone is important for metamorphosis, molting, and development of the eye and the nervous system in insects. Its active form (20-hydroxyecdysone) constitutes, together with its receptor (the ecdysone receptor EcR), a well-studied activator system for gene expression. It was found that the responsiveness of a given promoter depends on its architecture. Notably, ecdysone can induce both developmental and constitutive core promoters, but the induction is stronger with the developmental ones. A negative correlation was found between the ecdysone inducibility and the basal expression level; this correlation is more significant for constitutive promoters (Punzi, 2022).
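As a rough illustration of how a dual luciferase readout is commonly turned into a promoter activity value, the sketch below divides the signal of the test reporter by that of a co-transfected control reporter and averages over replicates. This normalization is the generic convention for dual reporter assays and is an assumption here; the text above does not describe the exact processing used in the study, and the function names and numbers are hypothetical.

```python
import statistics


def normalized_activity(reporter_signal, control_signal):
    """Ratio of test-reporter signal to control-reporter signal (assumed scheme)."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return reporter_signal / control_signal


def summarize_replicates(pairs):
    """Mean and sample standard deviation of normalized activity.

    pairs: list of (reporter_signal, control_signal) tuples, one per replicate.
    """
    ratios = [normalized_activity(r, c) for r, c in pairs]
    spread = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
    return statistics.mean(ratios), spread


# Made-up luminescence readings for three replicates of one construct.
mean_activity, spread = summarize_replicates([(200, 100), (300, 100), (400, 100)])
print(mean_activity, spread)
```

Dividing by the control signal corrects for well-to-well differences in transfection efficiency and cell number, which is the usual motivation for using two reporters.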

These results reinforce the conclusions drawn from other smaller scale studies for the roles of core promoter motifs in determining transcriptional output, also generalizing their effects to more promoter architectures. Nevertheless, the major contribution of this work is to bring new insights into D. melanogaster core promoter function (Punzi, 2022).

First, based on the CPE classes identified by XXmotif, four core promoter architectures (Ar. 1-4) were defined, reflecting different modes of transcriptional regulation at the core promoter and different physical properties of the DNA. The co-occurrence of CPEs within the classes indicates that each motif class recruits a specific transcription initiation complex utilizing several binding sites. One such example is the TFIID complex, which assembles on the DNA through interactions with the Class 1 elements: INR, bound by the subunits TAF1 and TAF2, and DPE, bound by the subunits TAF6 and TAF9. It is proposed that the remaining Class 1 elements also contribute to the binding of TFIID. Within Class 2, TATA-boxes are known to be bound by TBP, which is another part of the TFIID complex. Since TATA-boxes are anti-correlated with DPE, the novel ATGAA motif, positioned similarly to DPE, might replace it in Class 2 promoters. A similar hypothesis can be stated for Class 4, consisting of TCT and RDPE. Genes containing the TCT are not regulated by TFIID, but by a special RNA polymerase II system for ribosomal protein genes. The clustering also suggests two distinct preferred compositions in the 3rd class (INR2 + Ohler6 pair and DRE + Ohler7 pair) (Punzi, 2022).

The well-known functional motifs like INR, TATA-Box, MTEDPE, INR2 (more widely known as Ohler1 or motif 1), DRE, and Ohler7 (Ohler, 2002) are necessary for gene expression. Their roles are unique and they cannot be replaced by positionally or functionally similar motifs from other architectures. Pairwise knockouts mostly elicit stronger negative effects on transcription, and in some cases these effects are superadditive. Conversely, most of the motif consensus sequences tend to increase core promoter activity. All these findings are consistent between different core promoters and emphasize again the importance of the sequence motifs for core promoter function (Punzi, 2022).

However, not all well-characterized motifs have a significant effect on expression in the current measurements. This is especially the case with TCT, which stands in contrast with the strong loss of transcriptional activity observed by Parry (2010) in a mutational analysis. The differences may arise from transcription originating at another location on the reporter plasmid or differences in translation efficiency, as discussed above. However, TCT is the only CA-less TSS motif and is part of a specialized TCT-based Pol II transcription system, distinct from the INR-based system (Parry, 2010). This might explain why this motif makes almost no contribution to promoter activity in the current measurements, although it exists in nearly all ribosomal protein gene promoters in D. melanogaster. By contrast, housekeeping core promoter motifs like INR2 and Ohler6 that co-occur in multiple promoters show a stronger influence in the current data. It is known that more than half of the ribosomal core promoters contain this INR2 motif. A recent study proposed that the INR2 binding protein M1BP can act as an intermediary factor to recruit TRF2 for proper transcription of ribosomal protein genes (Baumann, 2017). The current perturbation analysis of INR2 in various ribosomal promoter backgrounds supports that finding. The results obtained with Ohler6 also suggest that the unknown TF(s) that bind to it may function similarly to M1BP (Punzi, 2022).

Among the four tested novel motif candidates discovered by XXmotif, TTGTTrev and RDPE were identified as having measurable effects on expression after mutation, thereby confirming their biological relevance. TTGTTrev shares a similar function with a negative regulatory element that binds the transcriptional repressor AEF-1. The occurrence of RDPE is highly correlated with TCT, and RDPE can partially replace the function of MTEDPE in developmental architectures. However, mutations in the two other newly discovered motifs, TTGTT and CGpal, show little effect on expression, suggesting that these two computationally derived over-represented sequences lack functional importance as core promoter elements. They are therefore likely to represent binding sites of transcription factors that are not expressed in the current experiments. Due to the similarity of TTGTT with TCT, this motif may act as a redundant version of the TCT motif (Punzi, 2022).

The highly sensitive assay can also accurately capture the sometimes subtle expression changes caused by single base-pair variations of motifs. It was confirmed that the most over-represented sequence of a given motif in the genome mainly stands for its best functional form, but differences from the computationally derived matrices were also seen: the expression-based activity logos are generally less specific. The two kinds of motifs are complementary since they reflect different phenomena: in silico discovered motifs are expected to reflect binding affinities, whereas the expression measurements capture the effect on transcription initiation, which could be buffered, for example, by alternative pathways/coactivator complexes (Punzi, 2022).
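One way to picture an expression-based position probability matrix is sketched below: for each motif position, the activity measured for each of the four bases is divided by the column total, and the per-position information content indicates how specific the resulting logo is. This activity-proportional normalization is an assumption made for illustration, since the text above does not state how the measurements were converted into matrices; the measurements and function names are hypothetical.

```python
import math


def expression_ppm(activities):
    """Convert per-position activity measurements into column-normalized fractions.

    activities: list of dicts mapping each base to a non-negative activity
    measured when that base occupies the position (hypothetical input format).
    """
    ppm = []
    for column in activities:
        total = sum(column.values())
        if total <= 0:
            raise ValueError("each position needs a positive total activity")
        ppm.append({base: value / total for base, value in column.items()})
    return ppm


def information_content(column):
    """Information content of one column in bits (0 = no base preference, 2 = maximal)."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


# Made-up activities: position 0 strongly prefers A, position 1 is indifferent.
toy_measurements = [
    {"A": 8.0, "C": 1.0, "G": 0.5, "T": 0.5},
    {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0},
]
for column in expression_ppm(toy_measurements):
    print(column, round(information_content(column), 3))
```

A column with low information content corresponds to a position where base changes barely alter expression, which matches the observation that expression-based logos are generally less specific than the computationally derived ones.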

Altering motif positions overall decreases expression. This phenomenon has been observed before by Schor (2017), who showed using CAGE measurements that changing the distance between motifs could have a major impact on transcriptional initiation and overall transcript levels. More generally, Arnold (2017) demonstrated that the positional occurrence of specific 5-mers relative to the TSS is predictive of the enhancer sequences' responsiveness by a linear model, which is difficult to validate with the current measurements due to too few data points for positioning compared to a deep sequencing method. Several studies have suggested that the exact spacing is essential for synergism between the core promoter motifs to function as active pairs to recruit GTFs along with Pol II for accurate transcription initiation. The results are in line with these previous findings for strictly positioned motifs such as INR, MTEDPE, and TATA-Box. Their locations and spacings are highly restricted for the effective binding of TFIID to nucleate the PIC. Other motifs that can function over wide ranges and are not necessary for constituting the major machinery, for example, DRE, Ohler6, and Ohler7, show less stringent location requirements and smaller effects on expression, as long as they do not disrupt other sequence features (Punzi, 2022).

Importantly, this study also demonstrated that not only the core promoter motifs but also their context sequences determine expression output, giving insights into the debated role of motif flanking and context sequences of core promoters. The results reveal that sequence motifs mostly prefer their native context. Remarkably, although only INR and INR-like motifs including INR2 and Ohler7 can drive higher expression when their consensus sequences are inserted into motif-less core promoters, the motif combinations from almost all the other defined architectures can result in a substantial increase of expression level, revealing the importance of motif synergism. This study did not test motif activity in random sequence context. Nevertheless, the sequence context was seen to influence expression independently of the motifs, possibly following complicated rules whose analysis is beyond the scope of this study (Punzi, 2022).

Considering that pairwise motif disruption already suggests certain levels of synergistic effects, the higher order combinatorial effect of mutant motifs and their context on expression may be more difficult to understand. To dissect this complexity of the mutant combinations, a linear regression model was used to check how much of the core promoter activity can be correlated with individual effects. Surprisingly, it was found that the expression changes caused by single mutations of sequence motifs, joined in a linear fashion, can predict to a large extent the output of the free mutant combinations. Hence, promoter expression levels of mixed and combined motifs can largely be explained by simple linear addition of their individual contributions. The sequence features were extended from the motifs alone to larger sequence blocks that contain motifs together with their context. Here too, it was found that a linear model describes the expression of these inter-architectural block combinations well. A linear combination of individual sequence features like the motifs or wider sequence blocks including their context sequences can account for two-thirds of the variance in expression levels, as regulated by the core promoter. To unravel the nonlinear interactions, more data and more detailed models would however be necessary (Punzi, 2022).
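The idea of adding individual contributions can be sketched as follows. Each single-mutant effect is expressed as a log2 change relative to the wild-type promoter, the effects are summed to predict a combined mutant, and the coefficient of determination measures how much of the observed variance such predictions explain. Working on a log2 scale, as well as the motif names, activity values, and function names, are assumptions made for illustration; this is not the regression model fitted in the study.

```python
import math


def log_effects(wild_type, single_mutants):
    """Log2 change in activity caused by each single mutation."""
    return {name: math.log2(activity / wild_type)
            for name, activity in single_mutants.items()}


def predict_combination(wild_type, effects, mutated):
    """Predict a combined mutant by summing individual log2 effects."""
    return wild_type * 2 ** sum(effects[name] for name in mutated)


def r_squared(observed, predicted):
    """Fraction of the variance in observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up activities: wild type 100, INR knockout 25, DPE knockout 50.
effects = log_effects(100.0, {"INR": 25.0, "DPE": 50.0})
print(predict_combination(100.0, effects, ["INR", "DPE"]))  # 12.5
```

In this toy setting an `r_squared` value of about 0.67 between observed and predicted combination activities would correspond to the "two-thirds of the variance" quoted above; deviations from the additive prediction point to synergistic or buffering interactions.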

The ecdysone responsiveness highly depends on the core promoter architecture. This developmental stimulus functions more strongly on developmental core promoters. There is a generally negative correlation between the ecdysone responsiveness and the basal expression level. The strongest promoters can barely be induced by ecdysone. The higher the expression level, the more difficult it is to further boost the signal, hinting at the saturation of the promoter expression. This effect is stronger for constitutive core promoters, showing their less efficient activation. The disruption of INR in developmental core promoters can lead to a reduction in the ecdysone responsiveness, which is consistent with what was reported in a previous study in Spodoptera frugiperda. Taken together, the different sequence motifs composing distinct core promoter architectures can predict their ecdysone responsiveness: Developmental core promoters exhibit a stronger inducibility (Punzi, 2022).
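The negative relationship between basal expression and inducibility described above can be quantified with a fold-induction value per promoter and a correlation coefficient across promoters. The sketch below uses made-up activities and a plain Pearson correlation on log-transformed values; the data, the log transformation, and the function names are illustrative assumptions rather than the analysis performed in the study.

```python
import math


def fold_induction(basal, induced):
    """Ratio of activity with the stimulus to activity without it."""
    if basal <= 0:
        raise ValueError("basal activity must be positive")
    return induced / basal


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up (basal, induced) activities for four hypothetical promoters.
promoters = [(1.0, 20.0), (10.0, 80.0), (100.0, 300.0), (1000.0, 1200.0)]
log_basal = [math.log10(basal) for basal, _ in promoters]
log_fold = [math.log2(fold_induction(basal, induced)) for basal, induced in promoters]
print(pearson(log_basal, log_fold))  # negative: stronger promoters are induced less
```

A coefficient near -1 in such an analysis would reflect the saturation effect described above, where the strongest promoters can barely be induced further.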

Finally, by investigating the effect of potential nucleosome binding, moderate effects were observed on expression (compared to motif knockouts) driven by these different potential nucleosomal backgrounds. Note that although nucleosomal presence on the plasmid was tested for one construct, it is not known if the promoters have native nucleosome occupancy. Greater expression variation was found for housekeeping and ribosomal core promoters than for developmental core promoters when changing the nucleosomal sequence downstream of the TSS (block 7); this suggests the significance of the genomic +1 nucleosomal sequences for the function of constitutive core promoters (Punzi, 2022).

The current method based on the luciferase assay for assessing promoter activity has limitations, however: differences in translation efficiency due to the varying 5'UTR between transcripts are not captured, and the assay is blind to the TSS that is actually being used in the endogenous promoters. These phenomena could lead to promoter activity measurements that do not perfectly reflect the endogenous expression. An additional shortcoming of this technique is that the measurements were performed using episomal plasmids in transiently transfected cells. Although the genomic ±1 nucleosome positioning sequences were inserted from different genes surrounding the tested core promoter region, they still lack the ability to represent the endogenous chromosomal context and the higher order genomic structure, which might change the basal expression levels as well as the ecdysone inducibilities. Furthermore, the method has a moderate throughput that is lower than most sequencing-based approaches, and the cloning and colony picking procedures also limit sequence recovery from the designed oligonucleotides (Punzi, 2022).



Altamirano-Torres, C., Salinas-Hernandez, J. E., Cardenas-Chavez, D. L., Rodriguez-Padilla, C. and Resendez-Perez, D. (2018). Transcription factor TFIIEbeta interacts with two exposed positions in helix 2 of the Antennapedia homeodomain to control homeotic function in Drosophila. PLoS One 13(10): e0205905. PubMed ID: 30321227

Aoyagi, N. and Wassarman, D. A. (2000). Genes encoding Drosophila melanogaster RNA polymerase II general transcription factors: diversity in TFIIA and TFIID components contributes to gene-specific transcriptional regulation. J Cell Biol 150: F45-50. PubMed ID: 10908585

Arenas-Mena, C. (2017). The origins of developmental gene regulation. Evol Dev 19(2): 96-107. PubMed ID: 28116828

Arnold, C. D., Zabidi, M. A., Pagani, M., Rath, M., Schernhuber, K., Kazmar, T. and Stark, A. (2017). Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. Nat Biotechnol 35(2): 136-144. PubMed ID: 28024147

Barbieri, E., Trizzino, M., Welsh, S. A., Owens, T. A., Calabretta, B., Carroll, M., Sarma, K. and Gardini, A. (2018). Targeted enhancer activation by a subunit of the integrator complex. Mol Cell 71(1): 103-116 e107. PubMed ID: 30008316

Baumann, D. G. and Gilmour, D. S. (2017). A sequence-specific core promoter-binding transcription factor recruits TRF2 to coordinately transcribe ribosomal protein genes. Nucleic Acids Res 45(18): 10481-10491. PubMed ID: 28977400

Boija, A., Klein, I. A., Sabari, B. R., Dall'Agnese, A., Coffey, E. L., Zamudio, A. V., Li, C. H., Shrinivas, K., Manteiga, J. C., Hannett, N. M., Abraham, B. J., Afeyan, L. K., Guo, Y. E., Rimel, J. K., Fant, C. B., Schuijers, J., Lee, T. I., Taatjes, D. J. and Young, R. A. (2018). Transcription factors activate genes through the phase-separation capacity of their activation domains. Cell 175(7): 1842-1855. PubMed ID: 30449618

Bose, D. A., Donahue, G., Reinberg, D., Shiekhattar, R., Bonasio, R. and Berger, S. L. (2017). RNA binding to CBP stimulates histone acetylation and transcription. Cell 168(1-2): 135-149 e122. PubMed ID: 28086087

Cazalla, D., Xie, M. and Steitz, J. A. (2011). A primate herpesvirus uses the integrator complex to generate viral microRNAs. Mol Cell 43(6): 982-992. PubMed ID: 21925386

Cho, H., et al. (1999). A protein phosphatase functions to recycle RNA polymerase II. Genes Dev 13: 1540-52. PubMed ID: 10385623

Cho, W. K., Spille, J. H., Hecht, M., Lee, C., Li, C., Grube, V. and Cisse, II (2018). Mediator and RNA polymerase II clusters associate in transcription-dependent condensates. Science 361(6400): 412-415. PubMed ID: 29930094

Core, L. J., Martins, A. L., Danko, C. G., Waters, C. T., Siepel, A. and Lis, J. T. (2014). Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat Genet 46(12): 1311-1320. PubMed ID: 25383968

Duttke, S. H. C., Lacadie, S. A., Ibrahim, M. M., Glass, C. K., Corcoran, D. L., Benner, C., Heinz, S., Kadonaga, J. T. and Ohler, U. (2015). Human promoters are intrinsically directional. Mol Cell 57(4): 674-684. PubMed ID: 25639469

Elrod, N. D., Henriques, T., Huang, K. L., Tatomer, D. C., Wilusz, J. E., Wagner, E. J. and Adelman, K. (2019). Mol Cell 76(5):738-752. PubMed ID: 31809743

Fan, W., Lam, S. M., Xin, J., Yang, X., Liu, Z., Liu, Y., Wang, Y., Shui, G. and Huang, X. (2017). Drosophila TRF2 and TAF9 regulate lipid droplet size and phospholipid fatty acid composition. PLoS Genet 13(3): e1006664. PubMed ID: 28273089

Fant, C. B., Levandowski, C. B., Gupta, K., Maas, Z. L., Moir, J., Rubin, J. D., Sawyer, A., Esbin, M. N., Rimel, J. K., Luyties, O., Marr, M. T., Berger, I., Dowell, R. D. and Taatjes, D. J. (2020). TFIID enables RNA polymerase II promoter-proximal pausing. Mol Cell. PubMed ID: 32229306

Gomez-Orte, E., Saenz-Narciso, B., Zheleva, A., Ezcurra, B., de Toro, M., Lopez, R., Gastaca, I., Nilsen, H., Sacristan, M. P., Schnabel, R. and Cabello, J. (2019). Disruption of the Caenorhabditis elegans Integrator complex triggers a non-conventional transcriptional mechanism beyond snRNA genes. PLoS Genet 15(2): e1007981. PubMed ID: 30807579

Hsu, J.-Y., et al. (2008). TBP, Mot1, and NC2 establish a regulatory circuit that controls DPE-dependent versus TATA-dependent transcription. Genes Dev 22: 2353-2358. PubMed ID: 18703680

Isogai, Y, Keles S, Prestel M, Hochheimer A, Tjian R. (2007). Transcription of histone gene cluster by differential core-promoter factors. Genes Dev. 21(22): 2936-49. PubMed ID: 17978101

Jin, Y., Eser, U., Struhl, K. and Churchman, L. S. (2017). The ground state and evolution of promoter region directionality. Cell 170(5): 889-898 e810. PubMed ID: 28803729

Kamieniarz-Gdula, K., Gdula, M. R., Panser, K., Nojima, T., Monks, J., Wisniewski, J. R., Riepsaame, J., Brockdorff, N., Pauli, A. and Proudfoot, N. J. (2019). Selective roles of vertebrate PCF11 in premature and full-length transcript termination. Mol Cell 74(1): 158-172. PubMed ID: 30819644

Kim, M. K., Tranvo, A., Hurlburt, A. M., Verma, N., Phan, P., Luo, J., Ranish, J. and Stumph, W. E. (2020). Assembly of SNAPc, Bdp1, and TBP on the U6 snRNA gene promoter in Drosophila melanogaster. Mol Cell Biol. PubMed ID: 32253345

Kwak, H. and Lis, J. T. (2013). Control of transcriptional elongation. Annu Rev Genet 47: 483-508. PubMed ID: 24050178

Levitsky, V. G., Zykova, T. Y., Moshkin, Y. M. and Zhimulev, I. F. (2020). Nucleosome Positioning around Transcription Start Site Correlates with Gene Expression Only for Active Chromatin State in Drosophila Interphase Chromosomes. Int J Mol Sci 21(23). PubMed ID: 33291385

Louder, R. K., He, Y., Lopez-Blanco, J. R., Fang, J., Chacon, P. and Nogales, E. (2016). Structure of promoter-bound TFIID and model of human pre-initiation complex assembly. Nature 531(7596): 604-609. PubMed ID: 27007846

Lebedeva, L. A., et al. (2005). Occupancy of the Drosophila hsp70 promoter by a subset of basal transcription factors diminishes upon transcriptional activation. Proc Natl Acad Sci U S A 102(50): 18087-92. PubMed ID: 16330756

Lai, F., Gardini, A., Zhang, A. and Shiekhattar, R. (2015). Integrator mediates the biogenesis of enhancer RNAs. Nature 525(7569): 399-403. PubMed ID: 26308897

Xie, M., Zhang, W., Shu, M. D., Xu, A., Lenis, D. A., DiMaio, D. and Steitz, J. A. (2015). The host Integrator complex acts in transcription-independent maturation of herpesvirus microRNA 3' ends. Genes Dev 29(14): 1552-1564. PubMed ID: 26220997

Li, G., Ruan, X., Auerbach, R. K., Sandhu, K. S., Zheng, M., Wang, P., Poh, H. M., Goh, Y., Lim, J., Zhang, J., Sim, H. S., Peh, S. Q., Mulawadi, F. H., Ong, C. T., Orlov, Y. L., Hong, S., Zhang, Z., Landt, S., Raha, D., Euskirchen, G., Wei, C. L., Ge, W., Wang, H., Davis, C., Fisher-Aylor, K. I., Mortazavi, A., Gerstein, M., Gingeras, T., Wold, B., Sun, Y., Fullwood, M. J., Cheung, E., Liu, E., Sung, W. K., Snyder, M. and Ruan, Y. (2012). Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148(1-2): 84-98. PubMed ID: 22265404

Liu, W. L., et al. (2009). Structures of three distinct activator-TFIID complexes. Genes Dev 23(13): 1510-21. PubMed ID: 19571180

Mahat, D. B., Kwak, H., Booth, G. T., Jonkers, I. H., Danko, C. G., Patel, R. K., Waters, C. T., Munson, K., Core, L. J. and Lis, J. T. (2016). Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat Protoc 11(8): 1455-1476. PubMed ID: 27442863

Marr, M. T., Isogai, Y., Wright, K. J. and Tjian, R. (2006). Coactivator cross-talk specifies transcriptional output. Genes Dev 20(11): 1458-69. PubMed ID: 16751183

Mikhaylichenko, O., Bondarenko, V., Harnett, D., Schor, I. E., Males, M., Viales, R. R. and Furlong, E. E. M. (2018). The degree of enhancer or promoter activity is reflected by the levels and directionality of eRNA transcription. Genes Dev 32(1):42-57. PubMed ID: 29378788

Murakami, K., Elmlund, H., Kalisman, N., Bushnell, D. A., Adams, C. M., Azubel, M., Elmlund, D., Levi-Kalisman, Y., Liu, X., Gibbons, B. J., Levitt, M. and Kornberg, R. D. (2013). Architecture of an RNA polymerase II transcription pre-initiation complex. Science 342: 1238724.

Nguyen, T. A., Jones, R. D., Snavely, A. R., Pfenning, A. R., Kirchner, R., Hemberg, M. and Gray, J. M. (2016). High-throughput functional comparison of promoter and enhancer activities. Genome Res 26(8): 1023-1033. PubMed ID: 27311442

Nikolov, D. B. and Burley, S. K. (1997). RNA polymerase II transcription initiation: A structural view. Proc Natl Acad Sci U S A 94: 15-22. PubMed ID: 8990153

Ohler, U., Liao, G. C., Niemann, H. and Rubin, G. M. (2002). Computational analysis of core promoters in the Drosophila genome. Genome Biol 3(12): RESEARCH0087. PubMed ID: 12537576

Orphanides, G., Lagrange, T. and Reinberg, D. (1996). The general transcription factors of RNA polymerase II. Genes Dev 10: 2657-83. PubMed ID: 8946909

Pahi, Z., Kiss, Z., Komonyi, O., Borsos, B. N., Tora, L., Boros, I. M. and Pankotai, T. (2015). dTAF10- and dTAF10b-containing complexes are required for ecdysone-driven larval-pupal morphogenesis in Drosophila melanogaster. PLoS One 10: e0142226. PubMed ID: 26556600

Parry, T. J., Theisen, J. W., Hsu, J. Y., Wang, Y. L., Corcoran, D. L., Eustice, M., Ohler, U. and Kadonaga, J. T. (2010). The TCT motif, a key component of an RNA polymerase II transcription system for the translational machinery. Genes Dev 24(18): 2013-2018. PubMed ID: 20801935

Patel, A. B., Louder, R. K., Greber, B. J., Grunberg, S., Luo, J., Fang, J., Liu, Y., Ranish, J., Hahn, S. and Nogales, E. (2018). Structure of human TFIID and mechanism of TBP loading onto promoter DNA. Science 362(6421). PubMed ID: 30442764

Petrenko, N. and Struhl, K. (2021). Comparison of transcriptional initiation by RNA polymerase II across eukaryotic species. Elife 10. PubMed ID: 34515029

Pimmett, V. L., Dejean, M., Fernandez, C., Trullo, A., Bertrand, E., Radulescu, O. and Lagha, M. (2021). Quantitative imaging of transcription in living Drosophila embryos reveals the impact of core promoter motifs on promoter state dynamics. Nat Commun 12(1): 4504. PubMed ID: 34301936

Punzi, G., Ursini, G., Chen, Q., Radulescu, E., Tao, R., Huu Qi, Z., Jung, C., Bandilla, P., Ludwig, C., Heron, M., Sophie Kiesel, A., Museridze, M., Philippou-Massier, J., Nikolov, M., Renna Max Schnepf, A., Unnerstall, U., Ceolin, S., Muhlig, B., Gompel, N., Soeding, J. and Gaul, U. (2022). Large-scale analysis of Drosophila core promoter function using synthetic promoters. Mol Syst Biol 18(2): e9816. PubMed ID: 35156763

Qiu, Y. and Gilmour, D. S. (2017). Identification of regions in the Spt5 subunit of DSIF that are involved in promoter proximal pausing. J Biol Chem [Epub ahead of print]. PubMed ID: 28213523

Rubtsova, M. P., Vasilkova, D. P., Moshareva, M. A., Malyavko, A. N., Meerson, M. B., Zatsepin, T. S., Naraykina, Y. V., Beletsky, A. V., Ravin, N. V. and Dontsova, O. A. (2019). Integrator is a key component of human telomerase RNA biogenesis. Sci Rep 9(1): 1701. PubMed ID: 30737432

Sabari, B. R., Dall'Agnese, A., Boija, A., Klein, I. A., Coffey, E. L., Shrinivas, K., Abraham, B. J., Hannett, N. M., Zamudio, A. V., Manteiga, J. C., Li, C. H., Guo, Y. E., Day, D. S., Schuijers, J., Vasile, E., Malik, S., Hnisz, D., Lee, T. I., Cisse, II, Roeder, R. G., Sharp, P. A., Chakraborty, A. K. and Young, R. A. (2018). Coactivator condensation at super-enhancers links phase separation and gene control. Science 361(6400). PubMed ID: 29930091

Schor, I. E., Degner, J. F., Harnett, D., Cannavo, E., Casale, F. P., Shim, H., Garfield, D. A., Birney, E., Stephens, M., Stegle, O. and Furlong, E. E. (2017). Promoter shape varies across populations and affects promoter evolution and expression noise. Nat Genet 49(4): 550-558. PubMed ID: 28191888

Shah, N., Maqbool, M. A., Yahia, Y., El Aabidine, A. Z., Esnault, C., Forne, I., Decker, T. M., Martin, D., Schuller, R., Krebs, S., Blum, H., Imhof, A., Eick, D. and Andrau, J. C. (2018). Tyrosine-1 of RNA polymerase II CTD controls global termination of gene transcription in mammals. Mol Cell 69(1): 48-61 e46. PubMed ID: 29304333

Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., Kodzius, R., Watahiki, A., Nakamura, M., Arakawa, T., Fukuda, S., Sasaki, D., Podhajska, A., Harbers, M., Kawai, J., Carninci, P. and Hayashizaki, Y. (2003). Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A 100(26): 15776-15781. PubMed ID: 14663149

Sigova, A. A., Abraham, B. J., Ji, X., Molinie, B., Hannett, N. M., Guo, Y. E., Jangi, M., Giallourakis, C. C., Sharp, P. A. and Young, R. A. (2015). Transcription factor trapping by RNA in gene regulatory elements. Science 350(6263): 978-981. PubMed ID: 26516199

Skaar, J. R., Ferris, A. L., Wu, X., Saraf, A., Khanna, K. K., Florens, L., Washburn, M. P., Hughes, S. H. and Pagano, M. (2015). The Integrator complex controls the termination of transcription at diverse classes of gene targets. Cell Res 25(3): 288-305. PubMed ID: 25675981

Tanaka, A., Akimoto, Y., Kobayashi, S., Hisatake, K., Hanaoka, F. and Ohkuma, Y. (2015). Association of the winged helix motif of the TFIIEalpha subunit of TFIIE with either the TFIIEbeta subunit or TFIIB distinguishes its functions in transcription. Genes Cells 20: 203-216. PubMed ID: 25492609

Tatomer, D. C., Elrod, N. D., Liang, D., Xiao, M. S., Jiang, J. Z., Jonathan, M., Huang, K. L., Wagner, E. J., Cherry, S. and Wilusz, J. E. (2019). The Integrator complex cleaves nascent mRNAs to attenuate transcription. Genes Dev 33(21-22): 1525-1538. PubMed ID: 31530651

van Arensbergen, J., FitzPatrick, V. D., de Haas, M., Pagie, L., Sluimer, J., Bussemaker, H. J. and van Steensel, B. (2017). Genome-wide mapping of autonomous promoter activity in human cells. Nat Biotechnol 35(2): 145-153. PubMed ID: 28024146

Verma, N., Hung, K. H., Kang, J. J., Barakat, N. H. and Stumph, W. E. (2013). Differential utilization of TATA box-binding protein (TBP) and TBP-related factor 1 (TRF1) at different classes of RNA polymerase III promoters. J Biol Chem 288(38): 27564-27570. PubMed ID: 23955442

Verma, N., Hurlburt, A. M., Wolfe, A., Kim, M. K., Kang, Y. S., Kang, J. J. and Stumph, W. E. (2018). Bdp1 interacts with SNAPc bound to a U6, but not U1, snRNA gene promoter element to establish a stable protein-DNA complex. FEBS Lett 592(14): 2489-2498. PubMed ID: 29932462

Xie, X., et al. (1996). Structural similarity between TAFs and the heterotetrameric core of the histone octamer. Nature 380: 316-322. PubMed ID: 8598927

Zhang, Z., English, B. P., Grimm, J. B., Kazane, S. A., Hu, W., Tsai, A., Inouye, C., You, C., Piehler, J., Schultz, P. G., Lavis, L. D., Revyakin, A. and Tjian, R. (2016). Rapid dynamics of general transcription factor TFIIB binding during preinitiation complex assembly revealed by single-molecule analysis. Genes Dev 30: 2106-2118. PubMed ID: 27798851

date revised: 15 December 2022

Home page: The Interactive Fly © 1995, 1996 Thomas B. Brody, Ph.D.

The Interactive Fly resides on the Society for Developmental Biology's Web server.