Drosophila gene families: RNA polymerase, transcription, and general transcription factors

The Interactive Fly

Zygotically transcribed genes

RNA polymerase and general transcription factors

Factors involved in function of RNA polymerase II
How does messenger RNA synthesis take place?
Evolution of general transcription factors
TBP, Mot1, and NC2 establish a regulatory circuit that controls DPE-dependent versus TATA-dependent transcription
Structures of three distinct activator-TFIID complexes
Architecture of an RNA polymerase II transcription pre-initiation complex
Association of the winged helix motif of the TFIIEalpha subunit of TFIIE with either the TFIIEbeta subunit or TFIIB distinguishes its functions in transcription
dTAF10- and dTAF10b-containing complexes are required for ecdysone-driven larval-pupal morphogenesis in Drosophila melanogaster
Identification of regions in the Spt5 subunit of DSIF that are involved in promoter proximal pausing
Drosophila TRF2 and TAF9 regulate lipid droplet size and phospholipid fatty acid composition
Assembly of SNAPc, Bdp1, and TBP on the U6 snRNA gene promoter in Drosophila melanogaster
TFIID Enables RNA Polymerase II Promoter-Proximal Pausing
The Integrator complex cleaves nascent mRNAs to attenuate transcription
Mediator and RNA polymerase II clusters associate in transcription-dependent condensates
Transcription factors activate genes through the phase-separation capacity of their activation domains
Nucleosome Positioning around Transcription Start Site Correlates with Gene Expression Only for Active Chromatin State in Drosophila Interphase Chromosomes
Quantitative imaging of transcription in living Drosophila embryos reveals the impact of core promoter motifs on promoter state dynamics
Comparison of transcriptional initiation by RNA polymerase II across eukaryotic species
Transcription factor TFIIEbeta interacts with two exposed positions in helix 2 of the Antennapedia homeodomain to control homeotic function in Drosophila
Functionally distinct promoter classes initiate transcription via different mechanisms reflected in focused versus dispersed initiation patterns
Assessment of the roles of Spt5-nucleic acid contacts in promoter proximal pausing of RNA polymerase II
Distinct gene-selective roles for a network of core promoter factors in Drosophila neural stem cell identity
The catalytic-dead Pcif1 regulates gene expression and fertility in Drosophila

Promoters and Enhancers

see Enhancers and cis-regulations

Proteins involved in messenger RNA synthesis

General Transcription Factors, as the protein factors involved in messenger RNA synthesis are known, are conserved across species as diverse as Saccharomyces cerevisiae, Drosophila and humans. TF stands for transcription factor; they were named in chronological order of their discovery. The entire set of General Transcription Factors is composed of about 30 subunits. Although the model below assumes that the factors are assembled by stages, there is some reason to believe that all thirty are also found assembled in a holoenzyme (Orphanides, 1996 and references).

Note: General Transcription Factors are listed below in order of recruitment to the promoter.

TFIID: TFIID is a multiprotein complex containing the TATA box binding protein (TBP) and (in Drosophila) at least seven other proteins known as TAFs, or TBP-associated factors. The first protein recruited to the promoter is TBP, which serves to induce a bend in the DNA. The 240 kD subunit (TAF250kd) contains an HMG-box, bromodomains, a serine kinase, and histone acetyltransferase activity. The smaller subunits are similar in structure to histones. Drosophila TBP-associated factor 60kD (also known as dTAFII62) and TBP-associated factor 40kD (also known as dTAFII42) are homologous to human hTAFII80 and hTAFII31, respectively; these Drosophila and human proteins are homologous to histone H4 and histone H3, respectively. Both Drosophila and human TFIID also contain putative histone H2B homologues, dTAFII30 alpha and hTAFII20. In solution and in the crystalline state, the dTAFII42/dTAFII62 complex exists as a heterotetramer, resembling the (H3/H4)2 heterotetrameric core of the histone octamer, suggesting that TFIID contains a histone octamer-like substructure. TBP participates in TFIID function even in promoters lacking a TATA box (Xie, 1996).

     Drosophila                        FlyBase ID       Human homologs        Yeast homologs
     -----------------                 ----------       --------------------  --------------
     TATA binding protein              FBgn0003687      TATA binding protein  TATA binding protein
     Tbp-related factor (Trf-1)        FBgn0010287      unknown
     Trf2                              FBgn0026758      TLF/TRF2
     TBP-associated factor (TAF) 250kD FBgn0010355      TAFII250              p130
     Bip2 (TAF_II155)                  FBgn0026262      TAF_II140             yTAF_II47
     TBP-associated factor 150kD       FBgn0011836      Not characterized     p150
     TBP-associated factor 110kD       FBgn0010280      TAFII135              not characterized
     No hitter (testis specific)       FBgn0041103
     TBP-associated factor 80kD        FBgn0010356      TAFII85               p90
     Cannonball (testis specific)      FBgn0011569
     Cabeza                            FBgn0011571      TAFII68
     TBP-associated factor 60kD        FBgn0010417      TAFII80               p60
     Taf55                             FBgn0024909      TAFII55               TAFII67
     TBP-associated factor 40kD        FBgn0011302      TAFII31               not characterized
     TAF 30kD subunit alpha            FBgn0011290      hTAFII20              not characterized
     TAF 30kD subunit beta             FBgn0011291      hTAFII28              p40
     TATA binding protein associated
       factor 24kD subunit             FBgn0028398      TAFII30
     Taf18                             FBgn0026324      TAFII18               TAFII19
     TBP-associated factor 16          FBgn0026324      TAFII60
     ENL/AF9                           FBgn0026441      TAFII60               TAFII30

TFIIB

TFIIB associates with TBP on the opposite side of the DNA helix. The TFIIB-TBP-DNA ternary complex is formed by TFIIB clamping the acidic C-terminal stirrup of TBP in its basic cleft, and interacting with the phosphoribose backbone upstream and downstream of the center of the TATA element.

TFIIB physically links TFIID at the promoter with the pol II/TFIIF complex.

     Drosophila                        FlyBase ID       Human homologs
     -----------------                 ----------       ------------------
     Transcription factor IIB          FBgn0004915      TFIIB

TFIIA

Required for activation of transcription.

     Drosophila                        FlyBase ID       Human homologs
     -----------------                 ----------       ------------------
     Transcription factor IIA S        FBgn0013347      TFIIA gamma
     Transcription factor IIA L        FBgn0011289      TFIIA alpha and beta

TFIIE

TFIIE contains a zinc-binding domain and is involved in promoter melting. TFIIE recruits TFIIH to the promoter.

     Drosophila                        FlyBase ID       Human homologs
     -----------------                 ----------       ------------------
     Transcription factor IIEalpha     FBgn0015828      TFIIEalpha (56 kD)
     Transcription factor IIEbeta      FBgn0015829      TFIIEbeta (34 kD)

TFIIF

TFIIF is the homolog of the bacterial sigma subunit. Polymerase II cannot stably associate with the TFIID and TFIIB assembly at the promoter on its own and must be escorted to the promoter by TFIIF. TFIIF also stimulates elongation.

     Drosophila                        FlyBase ID       Human homologs
     -----------------                 ----------       ------------------
     Transcription factor TFIIFalpha   FBgn0010282      TFIIF RAP74
     Transcription factor TFIIFbeta    FBgn0010421      TFIIF RAP30

RNA polymerase

For RNA polymerase II, the transition from initiation to elongation is accompanied by covalent modification of an unusual structure at the carboxy terminus of its largest subunit. This evolutionarily conserved structure consists of multiple tandem repeats of a heptapeptide, the RNA pol II carboxy-terminal domain (CTD). The number of times this sequence is repeated varies from 26 in yeast to 52 in humans and seems to be directly related to genome complexity. The phosphorylation of the CTD is central to the transcription mechanism of pol II. The unphosphorylated form of pol II is the form recruited to the initiation complex. During initiation of RNA synthesis, the CTD becomes extensively phosphorylated on serine and threonine residues.

     Drosophila                        FlyBase ID       Human homologs
     -----------------                 ----------       -----------------
     RNA polymerase II 215kD subunit   FBgn0003277      RNA polymerase II large subunit
     RNA polymerase II 140kD subunit   FBgn0003276      RNA polymerase II small subunit

TFIIH

TFIIH is a multisubunit factor with 3'-5' helicase activity. The Drosophila TFIIH consists of 8 subunits (two listed here) similar to their human counterparts. Besides the helicase activity, TFIIH has an RNA pol II C-terminal domain kinase activity (CDK7) and a cyclin partner for the kinase (Cyclin H). Cyclin H forms a ternary complex with CDK7 and MAT1. This tripartite Cdk-activating kinase occurs in a free form and in association with 'core' TFIIH.

     Drosophila                        FlyBase ID       Human homologs
     -----------------                 ----------       ------------------
     Transcription factor IIH          FBgn0015830      TFIIH (ERCC3)
     Cyclin-dependent kinase 7         FBgn0015617      CDK7

P-TEFb

A dimer of Cdk9 and Cyclin T that targets the RNA polymerase II C-terminal domain. It functions to overcome promoter-proximal pausing and premature termination, promoting polymerase entry into productive elongation.

     Drosophila                        FlyBase ID       Human homologs
     -----------------                 ----------       ------------------
     Cyclin dependent kinase 9         FBgn0019949      Cdk9
     Cyclin T                          FBgn0025455      Cyclin T

TFIIS

Critical for efficient release of stalled RNA Pol II from intrinsic stop sites in promoter regions; promotes transcriptional elongation and decreases pausing.

     Drosophila                          FlyBase ID       Human homologs
     -----------------                   ----------       ------------------
     RNA polymerase II elongation factor FBgn0010422      TFIIS

Factors involved in function of RNA polymerase II

Heat shock protein 83
Negative elongation factor E (causes polymerase to pause in the promoter proximal region of heat shock genes)
Cdk8 (component of Mediator complex that phosphorylates PolII)

Factors involved in function of RNA polymerase III

Brf (TRF1 rather than TBP forms a complex with BRF that plays a major role in RNA pol III transcription)
Inverse regulator a (common alternative name: Pcf11) (RNA binding motif protein - dismantles elongation complexes by a Pol II C-terminal domain (CTD) dependent mechanism)
Spt5 (DSIF, composed of Spt4 and Spt5, establishes the pause in transcription by recruiting NELF to the elongation complex - physically interacts with MYC oncoprotein and is essential for efficient transcriptional activation of MYC targets in cultured cells - Integrator-bound PP2A dephosphorylates the RNA Pol II C-terminal domain and Spt5, preventing the transition to productive elongation - Pho interacts with Spt5 to facilitate transcriptional switches at the hsp70 locus - interacts directly with MSL1 and is required downstream of MSL complex for dosage compensation)
Spt6 (transcription elongation factor implicated in RNA processing and degradation of improperly processed pre-mRNA)
Su(Tpl) (common alternative name: ELL - occludin homology domain protein - a Pol II elongation factor capable of stimulating the rate of transcription)

Paf1 complex (coordinates histone modifications and changes in nucleosome structure with transcription activation and Pol II elongation)

hyrax (Cdc73 component)
paf1
Rtf1

How does messenger RNA synthesis take place?

The conventional model for formation of a preinitiation complex and ordered transcription by RNA polymerase II (pol II) is characterized by a distinct series of events: (1) recognition of core promoter elements by TFIID (containing TBP and several other protein subunits), (2) recognition of and binding to the TFIID-promoter complex by TFIIB, (3) recruitment of a TFIIF/pol II complex by TFIIB (TFIIF is related to bacterial sigma), (4) binding of TFIIE and TFIIH (containing a helicase required for promoter melting) to complete the preinitiation complex, (5) promoter melting and formation of an "open" initiation complex, (6) synthesis of the first phosphodiester bond of the nascent mRNA transcript, (7) release of pol II contacts with the promoter (promoter clearance), and (8) elongation of the RNA transcript. TFIIA can join the complex at any stage after TFIID binding and stabilizes the initiation complex. TFIID can remain bound to the core promoter, supporting reinitiation of transcription (Orphanides, 1996 and Nikolov, 1997).
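
The ordered-assembly model can be read as a set of dependencies between factors. The Python sketch below is illustrative only: the names `REQUIRES`, `COMPLETE_PIC`, `assemble_pic` and `is_complete` are hypothetical, and the assumption that each factor needs only the single predecessor listed is a simplification made for this example, not a claim from the cited reviews. It follows the recruitment order given in the factor summaries on this page (TFIID, TFIIB, pol II escorted by TFIIF, TFIIE, TFIIH), with TFIIA allowed to join at any point after TFIID.

```python
# Illustrative sketch of stepwise preinitiation complex (PIC) assembly.
# The dependency table is a simplification assumed for this example.
REQUIRES = {
    "TFIID": set(),                # recognizes core promoter elements
    "TFIIB": {"TFIID"},            # binds the TFIID-promoter complex
    "TFIIF/pol II": {"TFIIB"},     # pol II is escorted to the promoter by TFIIF
    "TFIIE": {"TFIIF/pol II"},     # enters once pol II is at the promoter
    "TFIIH": {"TFIIE"},            # recruited by TFIIE
    "TFIIA": {"TFIID"},            # may join at any stage after TFIID
}

# TFIIA stabilizes the complex but is treated here as optional.
COMPLETE_PIC = {"TFIID", "TFIIB", "TFIIF/pol II", "TFIIE", "TFIIH"}


def assemble_pic(order):
    """Bind factors in the given order; raise if one arrives before its prerequisites."""
    bound = set()
    for factor in order:
        missing = REQUIRES[factor] - bound
        if missing:
            raise ValueError(f"{factor} cannot bind before {sorted(missing)}")
        bound.add(factor)
    return bound


def is_complete(bound):
    """True when every factor required for the preinitiation complex is present."""
    return COMPLETE_PIC <= bound
```

Calling `assemble_pic(["TFIID", "TFIIA", "TFIIB", "TFIIF/pol II", "TFIIE", "TFIIH"])` yields a complete complex, whereas an order that places TFIIH directly after TFIID raises an error, mirroring the stepwise character of the model.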

This model has been further refined to incorporate known alterations in the level of phosphorylation of the carboxy-terminal domain (CTD) of RNA polymerase II (Cho, 1999). Stable association of RNAPII with promoter sequences requires TFIID (or TBP), TFIIB, and TFIIF. However, the RNAPII transcription system is unique because, after the polymerase has stably associated with promoter sequences, two additional factors, TFIIE and TFIIH, are necessary for transcription. This requirement is likely related to a unique structure found at the carboxyl terminus of the largest subunit of RNAPII known as the carboxy-terminal domain (CTD). This conserved structure consists of multiple tandem repeats of the heptapeptide Tyr-Ser-Pro-Thr-Ser-Pro-Ser, which serves as a substrate for a number of protein kinases. At least two forms of RNAPII have been detected in cells. The most abundant form contains a phosphorylated CTD (RNAPIIO). A second form contains an unphosphorylated CTD and is known as RNAPIIA. The phosphorylation of the CTD has been correlated with function. It was found that the nonphosphorylated form of RNAPII is recruited to the initiation complex, whereas the elongating polymerase is found with a phosphorylated CTD. TFIIH contains a CTD kinase activity and this activity is efficient after RNAPII has associated with promoter sequences. A 150-kD polypeptide termed FCP1 has now been isolated. Together with RNAPII, FCP1 reconstitutes a highly specific CTD phosphatase activity. Functional analysis demonstrates that the CTD phosphatase allows recycling of RNAPII. Upon reaching termination sequences, the CTD becomes dephosphorylated by the FCP1 phosphatase within the ternary complex (consisting of DNA, polymerase and phosphatase) or immediately after the release of RNAPII from the DNA template. The phosphatase dephosphorylates the CTD, allowing efficient recycling of RNAPII into transcription initiation complexes, which results in increased transcription. The phosphatase is also found to stimulate elongation by RNAPII; however, this function is independent of its catalytic activity (Cho, 1999 and references).
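
To give a sense of how many potential phosphorylation sites the CTD offers, the following Python sketch builds an idealized CTD from the consensus heptapeptide and counts its serine and threonine residues. It assumes every repeat matches the consensus exactly, which is a simplification for illustration (real CTDs include repeats that deviate from it); the names `HEPTAD`, `idealized_ctd` and `count_residues` are hypothetical. The repeat numbers 26 (yeast) and 52 (human) are those quoted on this page.

```python
# Consensus CTD heptapeptide Tyr-Ser-Pro-Thr-Ser-Pro-Ser in one-letter code.
HEPTAD = "YSPTSPS"


def idealized_ctd(n_repeats):
    """Return an idealized CTD made of n exact copies of the consensus heptad."""
    return HEPTAD * n_repeats


def count_residues(sequence, residues="ST"):
    """Count occurrences of each listed residue (default: Ser and Thr)."""
    return {residue: sequence.count(residue) for residue in residues}


for species, repeats in (("yeast", 26), ("human", 52)):
    ctd = idealized_ctd(repeats)
    print(species, len(ctd), count_residues(ctd))
```

Under this idealization each repeat contributes three serines and one threonine, so doubling the repeat number doubles the number of candidate kinase substrates.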

A model is presented detailing the role of cycling of CTD phosphorylation in the function of RNAPII. After the termination of the previous transcription cycle, TBP remains bound to the TATA motif and provides the foundation for association of TFIIB. RNAPII, through its interactions with TFIIF, recognizes the TBP-TFIIB complex associated with the TATA motif. Because TFIIF has been found to interact with both the phosphorylated and nonphosphorylated forms of RNAPII and with FCP1, and to stimulate FCP1 activity, its association with RNAPII prior to association with the TBP-TFIIB complex may be important in attaining an RNAPII that is fully dephosphorylated. The association of RNAPII with promoter sequences provides the foundation for the entry of TFIIE and allows the association of TFIIH, resulting in the formation of a fully competent transcription initiation complex. During the process of initiation and prior to the formation of a fully competent elongation complex, the CTD becomes phosphorylated in a TFIIH-dependent manner. Phosphorylation of the CTD does not affect elongation efficiency, but allows RNAPII to disengage from the promoter and from transcription initiation factors. In the presence of the ribonucleoside triphosphates, the transcription initiation complex disassembles with the release of TFIIB, TFIIE, and TFIIH. CTD phosphorylation provides a foundation for the association of factors involved in RNA processing, such as the capping enzyme, splicing factors, and factors involved in 3'-end formation. Upon transcription of termination/polyadenylation signals, the elongating complex is altered, resulting in the release of RNAPII from the template by an unknown process. It is possible that RNAPII is converted to the nonphosphorylated form prior to, or concomitant with, its release from the DNA template.
This possibility is supported by studies demonstrating that FCP1 is capable of dephosphorylating the CTD of RNAPII not only in solution prior to incorporation into transcription initiation complexes, but also in active ternary elongation complexes stalled as a result of nucleotide starvation. The finding that FCP1 also stimulates elongation by RNAPII, independent of its phosphatase activity, suggests that FCP1 may remain associated with RNAPII during elongation. The finding that FCP1 is active in ternary complexes has implications for the mechanism of transcription termination as well as for the down-regulation of RNA processing. Similar to the signal imposed on phosphorylation of the CTD (disengagement of RNAPII from the promoter and from interaction with initiation factors), dephosphorylation of the CTD may result in a signal that releases factors from RNAPII that are involved in RNA maturation (Cho, 1999 and references).
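
The phosphorylation cycle described above can be summarized as a small state machine. The Python sketch below is a deliberately minimal illustration, not a quantitative model: the class and method names (`Form`, `Polymerase`, `recruit`, `phosphorylate_by_tfiih`, `dephosphorylate_by_fcp1`) are hypothetical, and the rules encode only three statements from the text, namely that the unphosphorylated form (RNAPIIA) is the one recruited to the promoter, that TFIIH-dependent phosphorylation occurs after promoter association and lets the polymerase disengage, and that FCP1 returns the polymerase to the recruitable form.

```python
from enum import Enum


class Form(Enum):
    IIA = "RNAPIIA (unphosphorylated CTD)"
    IIO = "RNAPIIO (phosphorylated CTD)"


class Polymerase:
    """Toy model of one RNAPII molecule cycling through CTD phosphorylation states."""

    def __init__(self):
        self.form = Form.IIA
        self.at_promoter = False
        self.completed_cycles = 0

    def recruit(self):
        # Only the unphosphorylated form enters the initiation complex.
        if self.form is not Form.IIA:
            raise RuntimeError("phosphorylated RNAPII is not recruited to the promoter")
        self.at_promoter = True

    def phosphorylate_by_tfiih(self):
        # The TFIIH kinase acts after RNAPII has associated with the promoter.
        if not self.at_promoter:
            raise RuntimeError("TFIIH-dependent phosphorylation requires promoter-bound RNAPII")
        self.form = Form.IIO
        self.at_promoter = False  # phosphorylation lets RNAPII leave the promoter

    def dephosphorylate_by_fcp1(self):
        # FCP1 converts RNAPIIO back to RNAPIIA, allowing another round of initiation.
        if self.form is Form.IIO:
            self.form = Form.IIA
            self.completed_cycles += 1
```

In this sketch a polymerase that has been phosphorylated cannot be recruited again until `dephosphorylate_by_fcp1` has been called, which is the recycling role the text assigns to the phosphatase.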

Evolution of general transcription factors

How have the factors required for transcription initiation (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, and RNA polymerase II [pol II]) evolved to accommodate the elaborate transcriptional programs required for growth, differentiation, and development of multicellular organisms? Analysis of the complete Drosophila genome sequence, as well as those of C. elegans, Saccharomyces cerevisiae, and humans sheds light on this well studied question in eukaryotic biology. All four organisms encode single isoforms of RNA pol II, TFIIB, TFIIE, TFIIF, and TFIIH components, but multiple, sequence-related isoforms of TFIID components. In addition, Drosophila and humans encode multiple isoforms of TFIIA components. Current evidence indicates that tissue- and cell type-specific transcription is directed by differentially expressed TFIID and possibly TFIIA isoforms. Thus, in accord with experimental data, this analysis points to TFIIA and TFIID as the factors that help generate the broad transcriptional repertoire of multicellular organisms. The identification of the complete set of TFIIA and TFIID components in a genetically and biochemically tractable organism like Drosophila is an important step toward understanding the mechanisms governing developmentally regulated transcription not only in Drosophila but also in humans (Aoyagia, 2000 and references therein).

Biochemical fractionation of Drosophila embryos, human cells, and yeast cells has defined a set of multiprotein complexes termed general transcription factors (GTFs; TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH) required for mRNA transcription initiation in vitro. Transcription is initiated by recognition of core promoter elements by TFIID and sequential or concerted assembly of the other GTFs and RNA pol II to form the preinitiation complex (PIC). Although GTFs play essential roles during transcription initiation, it is the factors that regulate the ability of the GTFs to assemble and stably bind a core promoter that are probably major determinants of gene-specific transcription levels. For example, activators and coactivators are thought to stimulate transcription by recruiting GTFs to a promoter, thereby accelerating PIC assembly (Aoyagia, 2000 and references therein).

The GTF TFIID is composed of TATA-binding protein (TBP) and coactivator subunits termed TBP-associated factors (TAF_IIs). TAF_IIs not only function as 'conventional' coactivators by serving as physical links between DNA-binding activator proteins and the PIC but also possess enzymatic or promoter recognition activities that presumably enhance the efficiency of PIC assembly. TFIIA has also been described as a coactivator and displays a number of TAF_II-like properties: it binds to TBP and TAF_IIs; it interacts with specific transcriptional activators; it is generally required for activated transcription in vitro; and it contributes to promoter selectivity (Aoyagia, 2000 and references therein).

Inactivation of individual TAF_IIs in Drosophila, mammalian, and yeast cells has demonstrated that TAF_IIs are not required for the transcription of all RNA pol II genes, and in fact there is great variation in regard to the identity and number of gene targets for individual TAF_IIs. Furthermore, different domains within a single TAF_II can play gene-specific roles in transcription. The isolation of a human B cell-specific isoform of TAF_II130 (TAF_II105) raises the possibility that substoichiometric subunits of TFIID mediate tissue- or cell type-specific transcription and that additional components of TFIID may have escaped detection because of their low abundance. These possibilities have been borne out in Drosophila, where isoforms of TAF_II110 and TAF_II80 (No hitter [Nht] and Cannonball [Can], respectively) are expressed exclusively in testis and regulate transcription of a subset of genes required for spermatogenesis, and isoforms of TBP (TBP-related factors [TRF1 and TRF2]) are expressed in a tissue-specific manner and bind different genes in salivary gland cells. Similarly, analysis of the human TFIIA-L isoform ALF (TFIIAalpha/ß-like factor) reveals that its expression is restricted to the testis; however, it remains to be determined if it is used for the transcription of testis-specific genes. In Drosophila, TFIIA-S is expressed in a dynamic pattern during eye development and is transiently upregulated in photoreceptor precursor cells before their fate is determined. Therefore, the role of TFIIA and TFIID in transcription initiation is governed by the expression patterns and activities of their varied components (Aoyagia, 2000 and references therein).

Finally, it is critical to note that analysis of the function of TAF_IIs is complicated by the fact that they are components of at least two other complexes that lack TBP: p300/CBP-associated factor (PCAF) and TBP-free TAF_II-containing complex (TFTC). The human PCAF histone acetyltransferase (HAT) complex contains three TAF_IIs that are shared with TFIID (TAF_II31/32, TAF_II20/15, and TAF_II30) and three TAF_II isoforms (PCAF-associated factor 65ß [PAF65ß], PAF65alpha, and SPT3) related to TAF_II100, TAF_II70/80, and TAF_II18, respectively. Yeast possesses an analogous complex, Spt-Ada-Gcn5-acetyltransferase (SAGA), containing TFIID TAF_IIs and the Gcn5 HAT, and Drosophila may also, since it contains a Gcn5/PCAF homolog that interacts with TAF_II24 (Aoyagia, 2000 and references therein).

Searches of the completed Drosophila, C. elegans, and yeast genomes and the partial human genome for sequence homologs of biochemically identified components of the general transcription machinery have led to the following conclusions: (1) all of the components of RNA pol II, TFIIB, TFIIE, TFIIF, and TFIIH are encoded by single copy genes in Drosophila, C. elegans, and yeast; (2) multiple isoforms of TFIID components are encoded in Drosophila, C. elegans, humans, and yeast, and multiple isoforms of TFIIA components are encoded in Drosophila and humans; (3) each organism encodes isoforms of different sets of TFIIA and TFIID components, some of which are unique to a particular organism (Aoyagia, 2000 and references therein).

Sequence comparisons uncovered Drosophila homologs of TAF_IIs previously identified in yeast or humans by biochemical means but which had not been described in Drosophila (yeast TAF_II67/human TAF_II55, yeast TAF_II30/human ENL/AF-9, and yeast TAF_II19/human TAF_II18). Thus, all TAF_IIs present in both yeast and humans are present in Drosophila, as well as C. elegans. In contrast, yeast TAF_II47 and TAF_II65 are absent from Drosophila, C. elegans, and apparently from humans, suggesting that these TAF_IIs perform a yeast-specific role, such as serving as coactivators for DNA-binding activators that are not present in metazoans. Finally, there are TAF_IIs present in Drosophila, C. elegans, and humans that are absent from yeast (human TAF_II68/Drosophila Cabeza and multiple TAF_II isoforms). In addition to Can and Nht, there are alternatively spliced forms of TAF_II30alpha, two genes (TAF_II24 and TAF_II16) that encode Drosophila homologs of human TAF_II30, and TAF_II60 and TAF30alpha isoforms (TAF_II60-2 and TAF30alpha-2, respectively). TFIIA-S and TFIIA-L are the only other GTF components in Drosophila and humans, respectively, that are expressed in multiple isoforms. The fact that these proteins are unique to multicellular organisms suggests that they play cell-specific roles (Aoyagia, 2000 and references therein).

A number of TAF_IIs contain a common structural motif called the histone fold that was originally shown to drive folding and association of each of the core histones (H2A, H2B, H3, and H4) and subsequently shown to play a similar role in association of TAF_IIs. TAF_II pairs, such as Drosophila TAF_II40 and TAF_II60, form heterotetramers, analogous to H3 and H4, and numerous other TAF_II-TAF_II and TAF_II-nonTAF_II interactions have been shown to involve histone fold motifs. The demonstrated histone fold interaction of human TAF_II135 and TAF_II20 predicts that Drosophila isoforms of these proteins, Nht and TAF_II30alpha-2, respectively, may heterodimerize, and hints at the existence of a human TAF_II20 isoform that would heterodimerize with the TAF_II135 isoform, TAF_II105. B cell-specific expression of the hypothetical TAF_II20 isoform may explain why TAF_II105 associates with TFIID in B cells but not in other cell types (Aoyagia, 2000 and references therein).

In addition to the TAF_IIs indicated above, other Drosophila transcription factors contain histone fold motifs, including Prodos, NF-YC-like (CG3075), CG11301, CHRAC-14 (CG13399), CHRAC-16 (CG15736), Dr1 (CG4185), NC2alpha (CG10318), and BIP2 (CG2009). It is interesting to speculate that these factors may be unidentified TAF_II components of TFIID or binding partners for known TAF_IIs in complexes that lack TBP (Aoyagia, 2000 and references therein).

Analysis of eukaryotic genomes has defined sets of proteins that are similar in sequence to known components of TFIIA and TFIID. Since known components of TFIIA and TFIID have been shown to play key roles in developmentally regulated transcription, it is exciting to speculate that the newly identified genes will play similar roles and that TFIIA and TFIID components have evolved to support tissue- or cell type-specific transcriptional requirements of individual eukaryotic organisms. The challenge now is to determine if TAF_IIs that have been identified on the basis of their sequence are components of TBP-containing complexes or other TAF_II-containing complexes, whether TAF_IIs and TFIIA isoforms are differentially expressed during development, and how differentially expressed TBP, TAF_II, and TFIIA isoforms function in concert with the ubiquitously expressed form of TFIID and TFIIA to regulate gene expression. The subunit composition of human PCAF complex leads to the prediction that Drosophila TAF_II60-2 and Can and C. elegans Y37E11AL.c are components of PCAF/SAGA and not TFIID. However, protein isoforms that are unique to a particular organism, such as Drosophila TAF_II30alpha-2 and C. elegans F54F7.1 and K10D3.3, may be tissue- or cell type-specific components of TFIID and not of PCAF/SAGA. Drosophila may be the most appropriate organism for these studies since the biochemical activities of these factors can be determined using established TFIIA and TFIID purification schemes and in vitro transcription systems, and developmental requirements for these factors can be determined using existing mutants or mutants generated by traditional mutagenesis schemes, P-element insertion, RNA interference (RNAi), or homologous recombination (Aoyagia, 2000 and references therein).

Structures of three distinct activator-TFIID complexes

Sequence-specific DNA-binding activators, key regulators of gene expression, stimulate transcription in part by targeting the core promoter recognition TFIID complex and aiding in its recruitment to promoter DNA. Although it has been established that activators can interact with multiple components of TFIID, it is unknown whether common or distinct surfaces within TFIID are targeted by activators and what changes, if any, in the structure of TFIID may occur upon binding activators. As a first step toward structurally dissecting activator/TFIID interactions, the three-dimensional structures of TFIID bound to three distinct activators (i.e., the tumor suppressor p53 protein, glutamine-rich Sp1 and the oncoprotein c-Jun) were determined by electron microscopy and single-particle reconstruction and then compared. By a combination of EM and biochemical mapping analysis, these results uncover distinct contact regions within TFIID bound by each activator. Unlike the coactivator CRSP/Mediator complex, which undergoes drastic and global structural changes upon activator binding, holo-TFIID showed only a rather confined set of local conserved structural changes when each activator bound. These results suggest that activator contact may induce unique structural features of TFIID, thus providing nanoscale information on activator-dependent TFIID assembly and transcription initiation (Liu, 2009).

3D density difference maps generated from reconstructions of the three independent activator/TFIID assemblies (i.e., p53-IID, Sp1-IID, and c-Jun-IID) and free holo-TFIID have served as a method to map the most likely contact sites of these activators within the native TBP-TAF complex. Remarkably, each activator contacts TFIID via select TAF interfaces within TFIID. The unique and localized arrangements of these three activators contacting different surfaces of TFIID could be indicative of the wide diversity of potential activator contact points within TFIID that would be dependent on both the specificity of activation domains as well as core promoter DNA sequences appended to target gene promoters. It is also possible, however, that these distinct activator-TFIID contacts can form a common scaffold when TFIID binds to the core promoter DNA (Liu, 2009).

It is well established that activators including p53, Sp1, and c-Jun frequently work synergistically with each other or other activators to potentiate selective gene expression programs in response to a variety of stimuli in vivo. Therefore, combinatorial mechanisms of promoter activation might favor distinct nonoverlapping activator-binding sites within TFIID, which can be achieved by specific interactions between selective TAF subunits and activators. Indeed, it was established that TAF1 and TAF4 serve as coactivators for Sp1, while TAF1, TAF6, and TAF9 mediate p53-dependent transactivation, and TAF1 and TAF7 subunits are thought to be coactivators for c-Jun. Since activators make sequence-specific contacts with the DNA template at various positions upstream of the core promoter, it is also plausible that activators bound to unique surfaces of TFIID can influence specific structures of a promoter as the DNA traverses along TFIID, resulting in distinct activator/promoter DNA structures (Liu, 2009).

Activator mapping results also complement and structurally extend the functional relevance of previous biochemical and immunomapping studies of TFIID. For example, label transfer studies show that the N-terminal activation domain of p53 contacts TAF6, confirming previous biochemical evidence showing that amino acids 1-42 of p53 contact TAF6/9. In support of this observation, the p53-IID 3D structure indicates that p53 contacts TFIID at lobes A and C where TAF6/9 are located as determined by EM immunomapping. In addition, previous studies have shown that both TBP and TAF1 can directly contact p53 in the absence of additional TFIID subunits. Interestingly, body-labeled p53 cross-linked to TAF1, TAF5, and weakly to TBP, thus extending the immunomapping studies that determined the locations of TBP and the N terminus of TAF1 at lobe C. Thus, EM activator mapping studies show a significant interface between p53 and specific TAFs located at lobes A and C of TFIID. Likewise, Sp1 label transfer results confirmed previous biochemical data showing a direct interaction between TAF4 and the N-terminal glutamine-rich domains of Sp1. In addition to TAF4, TAF6 was identified as weakly cross-linked to Sp1, suggesting that TAF6 may also be in the vicinity but perhaps more distal to the N terminus of Sp1. The largest TFIID subunit, TAF1, was cross-linked when body-labeled Sp1 was used. This result was not entirely unexpected, since previous studies found that TAF1 is required for Sp1-dependent transactivation, possibly through a direct interaction between TAF1 and Sp1 (Liu, 2009).

In comparison with p53 and Sp1, body-labeled c-Jun was shown to contact TAF1 and TAF6 in label transfer studies with no subunits contacting the N-terminal activation domain of c-Jun. This N-terminal activation domain of c-Jun may be structurally flexible or predominantly unstructured and is apparently positioned away from TFIID contacts. Indeed, successful structural studies of c-Jun thus far have been limited to the C-terminal leucine zipper DNA-binding region when bound to DNA. Previous biochemical assays have shown that the C-terminal basic leucine zipper DNA-binding region also contacts the N terminus of TAF1 (Liu, 2009).

It is worth noting that the extra density representing c-Jun and the other activator polypeptides in EM studies may not reflect the full expected size of the activators. This is due to the presence of large unstructured regions in these proteins that are averaged out during structural analysis. As activators contain multiple molten globular domains that likely interact with different partners, one would expect a high degree of structural disorder in the domains that are not in direct contact with TFIID. Thus, the extra density associated with each activator determined from the single-particle reconstructions likely represents only the most stably associated portion of activators bound to TFIID. This common situation would invariably lead to underrepresenting the actual size of the activator in a manner not unlike crystal structures of domains with flexible loops that become 'invisible' in the crystal structure (Liu, 2009).

Based on EM immunomapping, there are two copies of TAF6 within TFIID, wherein one copy resides in lobe A and another in lobe B. Collectively, the current studies suggest that two distinct activators (p53 and c-Jun) strongly contact the two different TAF6 subunits that are each located in different lobes of TFIID. It is unknown how p53 or c-Jun discriminates between TAF6 on lobe A versus B when binding to TFIID. In the future, it will be interesting to investigate if these two activators can bind to a single TFIID molecule simultaneously and decipher 3D structures of TFIID assemblies bound to select endogenous promoter DNA sequences in the presence and absence of distinct activators that are engaged in synergistic transcriptional activation (Liu, 2009).

It is of note that unlike the radical, diverse, and global structural changes observed with CRSP/Mediator complexes upon activator binding, TFIID largely retains its overall architecture when bound by three different activators. Interestingly, this study found that two of the activator/IID structures, p53-IID and Sp1-IID assemblies appear to be more constricted around the central cavity with narrower ChB-D and ChA-B channels, while the third structure, c-Jun-IID, remains most similar to free holo-TFIID. In particular, the p53-IID structure more closely resembles the closed conformational state of the previous cryo-TFIID structure. To test if p53-bound TFIID mimics the most closed conformational form of holo-TFIID, 3D reconstructions were performed using either the most closed or 'open' cryo-TFIID structures as an initial reference volume for refinement. Interestingly, it was found that both newly refined 3D structures generated from either the closed or open reference volume are fairly similar, with possibly a partial occupancy of p53 on lobe A. These findings suggest that the overall p53-TFIID structure tends to move toward the closed conformation with moderate movement at the outer tips of lobes A and B, even though p53-IID is predominantly observed in an intermediate average conformational form between the most closed and open forms. Perhaps factors contacting lobe A or C can induce certain coordinated movements within lobes that lead to a closed conformation of TFIID (Liu, 2009).

Although TFIID largely retains its prototypic global architecture upon activator binding, several common localized structural changes induced upon activator binding were observed in the 3D reconstruction. For example, a prominent and consistent induced extra density protrusion located in lobe D was observed when each of the three different activators binds TFIID. Given that all these activators are represented by distinct densities with unique sizes and shapes within the bound TFIID structure, and the fact that it has been demonstrated that they each can target different subunits within TFIID by a number of independent biochemical assays, it seems reasonable to assign 'unique and significant' extra densities located at distinct sites as representing the different bound activators. In contrast, the common similarly sized extra density seen at lobe D of each activator-IID structure most likely represents a conserved conformational change induced by these three different activators. Interestingly, this protrusion in lobe D resides distal to each of the activator-binding sites, suggesting that these three activators may potentially induce a long-range internal conformational change within TFIID. It would be intriguing to identify which TAF subunits are located at the tip of lobe D and eventually determine the function, if any, of this extended lobe in activator-induced transcription initiation. However, despite the potential significance of these structural changes induced by activators, it is premature to speculate regarding their functional importance (Liu, 2009).

Architecture of an RNA polymerase II transcription pre-initiation complex

The protein density and arrangement of subunits of a complete, 32-protein, RNA polymerase II (pol II) transcription pre-initiation complex (PIC) were determined by means of cryogenic electron microscopy and a combination of chemical cross-linking and mass spectrometry. The PIC showed a marked division in two parts, one containing all the general transcription factors (GTFs) and the other pol II. Promoter DNA was associated only with the GTFs, suspended above the pol II cleft and not in contact with pol II. This structural principle of the PIC underlies its conversion to a transcriptionally active state; the PIC is poised for the formation of a transcription bubble and descent of the DNA into the pol II cleft (Murakami, 2013).

This study has revealed a central principle of the PIC: the association of promoter DNA only with the GTFs and not with pol II. Promoter DNA is suspended above the pol II cleft, contacting three GTFs -- TFIIB, TFIID (TBP subunit), and TFIIE -- at the upstream end of the cleft (TATA box) and contacting TFIIH (Ssl2 helicase subunit) at the downstream end. In between, the DNA is free and available for action of the helicase, which untwists the DNA to introduce negative superhelical strain and thereby promote melting at a distance (Murakami, 2013).

This principle of the PIC is a consequence of the rigidity of duplex DNA. The promoter duplex must follow a straight path, whereas bending through ~90° is required for binding in the pol II cleft. Only after melting can the DNA bend for entry in the cleft. Melting is thermally driven, induced by untwisting strain in the DNA above the cleft. A melted region is short-lived and must be captured by binding to pol II, which occurs rapidly enough because the DNA is positioned above the cleft. The GTFs therefore catalyze the formation of a stably melted region (transcription bubble) in two ways, by the introduction of untwisting strain (by the helicase) and by positioning promoter DNA (Murakami, 2013).

Untwisting strain is distributed throughout the DNA above the pol II cleft, so melting may occur at any point, but only a melted region adjacent to TFIIB is stabilized by binding to pol II. The reason is again the rigidity of duplex DNA, and the requirement for a sharp bend adjacent to TFIIB to penetrate the pol II cleft. A single strand of DNA must extend from the point of contact with TFIIB, ~13 bp downstream of the TATA box, through the binding site for the transcription bubble in pol II. TFIIB may also interact with the single strand to stabilize the bubble (Murakami, 2013).

These conclusions are based on results from both cryo-EM and XL-MS, which served to validate one another: Segmentation and labeling of electron density, based on fitting pol II and other known structures, was consistent with all but three of 266 cross-links observed. The PIC structure is also consistent with partial structural information from x-ray crystallography (pol II-TFIIB, pol II-TFIIS, TFIIA-TBP-TFIIB-DNA, and Tfb2-Tfb5), from nuclear magnetic resonance (Tfb1-Tfa1 and Tfa2-DNA), and from EM (core and holo TFIIH). This consistency provides cross-validation, both supporting this PIC structure and establishing the relevance of the partial structural information. Further consistency was found with the results of FeBABE cleavage mapping of complexes formed in yeast nuclear extract; the locations of proteins along the DNA in the PIC structure and those determined with FeBABE cleavage differ by no more than 5 bp. This PIC structure also agrees with results of protein-DNA cross-linking in a reconstituted human transcription system; positions of TFIIE and TFIIH differ between the two studies by ~20 and 10 bp. The location of Ssl2 in this structure, ~30 bp downstream from the TATA box, supports the proposal, made on the basis of previous DNA-protein cross-linking analysis, that helicase action torques the DNA to introduce untwisting strain and thereby to promote melting at a distance (Murakami, 2013).

Association of the winged helix motif of the TFIIEalpha subunit of TFIIE with either the TFIIEbeta subunit or TFIIB distinguishes its functions in transcription

In eukaryotes, the general transcription factor TFIIE consists of two subunits, alpha and beta, and plays essential roles in transcription. Structure-function studies indicate that TFIIE has three-winged helix (WH) motifs, with one in TFIIEα and two in TFIIEβ. Recent studies suggested that, by binding to the clamp region of RNA polymerase II, TFIIEα-WH promotes the conformational change that transforms the promoter-bound inactive preinitiation complex to the active complex. To elucidate its roles in transcription, functional analyses of point-mutated human TFIIEα-WH proteins were carried out. In vitro transcription analyses identified two classes of mutants. One class was defective in transcription initiation, and the other was defective in the transition from initiation to elongation. Analyses of the binding of this motif to other general transcription factors showed that the former class was defective in binding to the basic helix-loop-helix motif of TFIIEβ and the latter class was defective in binding to the N-terminal cyclin homology region of TFIIB. Furthermore, TFIIEα-WH bound to the TFIIH XPB subunit at a third distinct region. Therefore, these results provide further insights into the mechanisms underlying RNA polymerase II activation at the initial stages of transcription (Tanaka, 2015).

dTAF10- and dTAF10b-containing complexes are required for ecdysone-driven larval-pupal morphogenesis in Drosophila melanogaster

In eukaryotes the TFIID complex is required for preinitiation complex assembly, which positions RNA polymerase II around transcription start sites. Histone acetyltransferase complexes, including SAGA and ATAC, modulate transcription at several steps through modification of specific core histone residues. This study investigated the function of Drosophila proteins TAF10 and TAF10b, which are subunits of dTFIID and dSAGA, respectively. The simultaneous deletion of both dTaf10 genes impaired the recruitment of the dTFIID subunit dTAF5 to polytene chromosomes, while binding of other TFIID subunits, dTAF1 and RNAPII was not affected. The lack of both dTAF10 proteins resulted in failures in the larval-pupal transition during metamorphosis and in transcriptional reprogramming at this developmental stage. Importantly, the phenotype resulting from dTaf10+dTaf10b mutation could be rescued by ectopically added ecdysone, suggesting that dTAF10- and/or dTAF10b-containing complexes are involved in the expression of ecdysone biosynthetic genes. These data support the idea that the presence of dTAF10 proteins in dTFIID and/or dSAGA is required only at specific developmental steps. It is proposed that distinct forms of dTFIID and/or dSAGA exist during Drosophila metamorphosis, wherein different TAF compositions serve to target RNAPII at different developmental stages and tissues (Pahi, 2015).

Rapid dynamics of general transcription factor TFIIB binding during preinitiation complex assembly revealed by single-molecule analysis

Transcription of protein-encoding genes in eukaryotic cells requires the coordinated action of multiple general transcription factors (GTFs) and RNA polymerase II (Pol II; see Drosophila Pol II). A "step-wise" preinitiation complex (PIC) assembly model has been suggested based on conventional ensemble biochemical measurements, in which protein factors bind stably to the promoter DNA sequentially to build a functional PIC. However, recent dynamic measurements in live cells suggest that transcription factors mostly interact with chromatin DNA rather transiently. To gain a clearer dynamic picture of PIC assembly, this study established an integrated in vitro single-molecule transcription platform reconstituted from highly purified human transcription factors and complemented it by live-cell imaging. Real-time measurements were performed of the hierarchical promoter-specific binding of TFIID, TFIIA, and TFIIB. Surprisingly, it was found that while promoter binding of TFIID and TFIIA is stable, promoter binding by TFIIB is highly transient and dynamic (with an average residence time of 1.5 sec). Stable TFIIB-promoter association and progression beyond this apparent PIC assembly checkpoint occur only in the presence of Pol II-TFIIF. This transient-to-stable transition of TFIIB-binding dynamics has gone undetected previously and underscores the advantages of single-molecule assays for revealing the dynamic nature of complex biological reactions (Zhang, 2016).
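To give a feel for what a 1.5 sec average residence time implies, the sketch below converts it into an apparent dissociation rate, a half-life, and the fraction of TFIIB molecules still bound after a given time. It assumes single-exponential (first-order) dissociation, which is an assumption made here for illustration and is not stated in the summary above; the function names are hypothetical.

```python
import math

# Average TFIIB residence time on the promoter quoted above (Zhang, 2016).
MEAN_RESIDENCE_S = 1.5


def off_rate(mean_residence_s):
    """Apparent dissociation rate (per second), assuming first-order kinetics."""
    return 1.0 / mean_residence_s


def half_life(mean_residence_s):
    """Time after which half of the bound molecules have dissociated."""
    return mean_residence_s * math.log(2)


def fraction_bound(t_s, mean_residence_s):
    """Fraction of initially bound molecules still bound after t_s seconds."""
    return math.exp(-t_s / mean_residence_s)


print(f"k_off     = {off_rate(MEAN_RESIDENCE_S):.2f} per s")
print(f"half-life = {half_life(MEAN_RESIDENCE_S):.2f} s")
for t in (1.0, 3.0, 5.0):
    print(f"bound after {t:.0f} s: {fraction_bound(t, MEAN_RESIDENCE_S):.3f}")
```

Under this assumption very few TFIIB binding events would last longer than a few seconds, which is one way to picture why stable association requires a further stabilizing step such as the arrival of Pol II-TFIIF.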

Identification of regions in the Spt5 subunit of DSIF that are involved in promoter proximal pausing

DRB-sensitivity inducing factor (DSIF, or Spt4/5) is a conserved transcription elongation factor that both inhibits and stimulates transcription elongation in metazoans. In Drosophila and vertebrates, DSIF together with negative elongation factor (NELF) associates with RNA polymerase II (Pol II) during early elongation and causes Pol II to pause in the promoter proximal region of genes. The mechanism of how DSIF establishes pausing is not known. This study constructed Spt5 mutant forms of DSIF and tested their capacity to restore promoter proximal pausing to DSIF-depleted Drosophila nuclear extracts. The C-terminal repeats (CTR) region of Spt5, which has been implicated in both inhibition and stimulation of elongation, is dispensable for promoter proximal pausing. A region encompassing KOW4 and KOW5 of Spt5 is essential for pausing, and mutations in KOW5 specifically shift the location of the pause. RNA crosslinking analysis reveals that KOW5 directly contacts the nascent transcript and deletion of KOW5 disrupts this interaction. These results suggest that KOW5 is involved in promoter proximal pausing through contact with the nascent RNA (Qiu, 2017).

Drosophila TRF2 and TAF9 regulate lipid droplet size and phospholipid fatty acid composition

The general transcription factor TBP (TATA-box binding protein) and its associated factors (TAFs) together form the TFIID complex, which directs transcription initiation. Through RNAi and mutant analysis, this study identified a specific TBP family protein, TRF2, and a set of TAFs that regulate lipid droplet (LD) size in the Drosophila larval fat body. Among the three Drosophila TBP genes, trf2, tbp and trf1, only loss of function of trf2 results in increased LD size. Moreover, TRF2 and TAF9 regulate fatty acid composition of several classes of phospholipids. Through RNA profiling, TRF2 and TAF9 were found to affect the transcription of a common set of genes, including peroxisomal fatty acid beta-oxidation-related genes that affect phospholipid fatty acid composition. Knockdown of several TRF2 and TAF9 target genes results in large LDs, a phenotype which is similar to that of trf2 mutants. Together, these findings provide new insights into the specific role of the general transcription machinery in lipid homeostasis (Fan, 2017).

This study reveals a rather specific role of TRF2 and TAFs, which are general transcription factors, in regulating LD size. In addition, TRF2 and TAF9 affect phospholipid fatty acid composition, most likely through ACOX genes which mediate peroxisomal fatty acid β-oxidation (Fan, 2017).

By binding to their responsive elements in target genes, specific transcription factors like SREBP (see Drosophila Srebp), PPARs and NHR49 play important roles in lipid metabolism. It is interesting to find that the general transcription machinery, in this case TRF2 and core TAFs, also exhibits specificity in regulating lipid metabolism. In the Drosophila late 3rd instar larval fat body, defects in trf2 cause increased LD size, whereas mutations of the other two homologous genes, tbp and trf1, have no obvious effects on lipid storage. Inactivation of taf genes causes a similar phenotype to trf2 mutation, suggesting that TRF2 may associate with these TAF proteins to direct transcription of specific target genes. Moreover, trf2 mutants have large LDs at both 2nd and early 3rd instar larval stages, suggesting that general transcription factors are also required at early developmental stages for LD size regulation. Interestingly, taf9 mutants have no obvious phenotype at these stages. It is possible that TAF9 may act as an accessory factor compared to promoter-binding TRF2. This is consistent with the fact that fewer genes are affected in taf9 mutants than in trf2 mutants in RNA-seq analysis. It was also found that knockdown of trf2 in larval and adult fat body leads to different LD phenotypes. This may be due to different lipid storage status or different LD size regulatory mechanisms between larval and adult stages (Fan, 2017).

The findings of this study add to the growing evidence supporting a specific role of general transcription factors in lipid homeostasis. For example, knockdown of RNA Pol II subunits such as RpII140 and RpII33 leads to small and dispersed LDs in Drosophila S2 cells. Mutation in DNA polymerase δ (POLD1) leads to lipodystrophy with a progressive loss of subcutaneous fat. Furthermore, TAF8 and TAF7L were reported to be involved in adipocyte differentiation. Moreover, previous studies showed that several subunits of the Mediator complex interact with specific transcription factors and play important roles in lipid metabolism. Taken together, these lines of evidence strongly support essential and specific roles of the core/basal transcriptional machinery components in lipid metabolism (Fan, 2017).

Using RNA-seq analysis, rescue experiments and ChIP-qPCR, this study identified several target genes regulated by TRF2 and TAF9. It is possible that other genes may regulate LD size but were missed in the RNA-seq analysis and RNAi screening assay because of either insufficient alterations in gene expression (lower than the twofold threshold) or low efficiency of RNAi. Among all the verified target genes of TRF2 and TAF9, CG10315, which strongly rescues the trf2^G0071 mutant phenotype when overexpressed and encodes the eukaryotic translation initiation factor eIF2B-δ, may be a good candidate for further study. Although they are best known for their molecular functions in mRNA translation regulation, eIFs have been implicated in several other processes, including cancer and metabolism. For example, in yeast, eIF2B physically interacts with the VLCFA synthesis enzyme YBR159W. In adipocytes, eIF2α activity is correlated with the anti-lipolytic and adipogenesis inhibitory effects of the AMPK activator AICAR. In addition, given the evidence that some eIFs, such as eIF4G and eIF-4a, localize on LDs and that knockdown of some eIFs, including eIF-1A, eIF-2β, eIF3ga, eIF3-S8 and eIF3-S9, results in large LDs in Drosophila S2 cells, it is important to further explore the specific mechanisms of these eIFs in LD size regulation (Fan, 2017).

Although TRF2 exists widely in metazoans and shares sequence homology in its core domain with TBP, it recognizes sequence elements distinct from the TATA-box. A previous study has investigated TRF2- and TBP-bound promoters throughout the Drosophila genome in S2 cells and revealed that some sequence elements, such as DRE, are strongly associated with TRF2 occupancy while the TATA-box is strongly associated with TBP occupancy (Isogai, 2007). This study also identified that DRE is significantly enriched in extended promoters of the 181 target genes. The distribution of TATA-boxes in the core promoters of the 181 target genes compared with all genes was further explored, and it was found that the TATA-box is not enriched in the core promoters of TRF2 target genes. The proportion of TATA-box is 0.155 (75 of 484 isoforms) for the 181 target genes while the proportion is 0.217 (7849 of 36099 isoforms) for all genes as the background. These results suggest that TRF2 and TAF9 may regulate the expression of a subset of genes by recognizing specific sequence elements such as DRE but not the TATA-box (Fan, 2017).
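The TATA-box proportions quoted above can be reproduced directly from the isoform counts. The sketch below recomputes them and adds a rough one-sample z-score against the genome-wide proportion. The z-score is an illustrative addition, not the analysis reported by the authors; it uses a normal approximation and ignores the fact that the 181 target genes are also part of the background and that isoforms of one gene are not independent.

```python
import math

# Isoform counts quoted in the paragraph above (Fan, 2017).
TARGET_TATA, TARGET_TOTAL = 75, 484
BACKGROUND_TATA, BACKGROUND_TOTAL = 7849, 36099


def proportion(hits, total):
    """Fraction of isoforms whose core promoter carries a TATA-box."""
    return hits / total


def one_sample_z(hits, total, p0):
    """Normal-approximation z-score of an observed proportion against p0."""
    p_hat = hits / total
    standard_error = math.sqrt(p0 * (1.0 - p0) / total)
    return (p_hat - p0) / standard_error


p_target = proportion(TARGET_TATA, TARGET_TOTAL)
p_background = proportion(BACKGROUND_TATA, BACKGROUND_TOTAL)
z = one_sample_z(TARGET_TATA, TARGET_TOTAL, p_background)

print(f"target genes: {p_target:.3f}")
print(f"all genes:    {p_background:.3f}")
print(f"ratio:        {p_target / p_background:.2f}")
print(f"z-score:      {z:.2f}")
```

A negative z-score here simply restates that the TATA-box is, if anything, less frequent among TRF2 target promoters than in the genome as a whole; a proper enrichment test would need gene-level counts and a background that excludes the target set.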

This study shows that expression of peroxisomal fatty acid β-oxidation pathway genes, including two acyl-CoA oxidase (ACOX) genes, CG4586 and CG9527, the β-ketoacyl-CoA thiolase gene CG9149, and the enoyl-CoA hydratase gene CG9577, is regulated by TRF2 and TAF9. Lipidomic analysis indicates that in trf2 and taf9 RNAi fat bodies, many phospholipids, such as PA, PC, PG and PI, contain more long chain fatty acids. Furthermore, knockdown of CG4586 and CG9527 in the fat body also causes similar changes.

These results coincide with the function of ACOX, which is implicated in the peroxisomal fatty acid β-oxidation pathway for catabolizing very long chain fatty acids and some long chain fatty acids. Similar to these findings, a previous study found that defective peroxisomal fatty acid β-oxidation resulted in enlarged LDs in C. elegans and blocked catabolism of LCFAs, such as vaccenic acid, which probably contributed to LD expansion in mutant worms. Since overexpressing CG4586 or CG9527 only marginally rescues the enlarged LD phenotype of trf2 mutants, it remains to be determined whether the increased level of long chain fatty acid-containing phospholipids contributes to LD size. Regarding the regulation of fatty acid chain length in phospholipids, a recent study reported that there was increased acyl chain length in phospholipids of lung squamous cell carcinoma accompanied by significant changes in the expression of fatty acid elongases (ELOVLs) compared to matched normal tissues. A functional screen followed by phospholipidomic analysis revealed that ELOVL6 is mainly responsible for phospholipid acyl chain elongation in cancer cells. The current findings provide new clues about the regulation of fatty acid chain length in phospholipids. ELOVL and the peroxisomal fatty acid β-oxidation pathway may represent two opposing regulators in determining fatty acid chain length in vivo (Fan, 2017).

Previous studies have shown that TRF2 is involved in specific biological processes including embryonic development, metamorphosis, germ cell differentiation and spermiogenesis. The current results reveal a novel function of TRF2 in the regulation of specialized transcriptional programs involved in LD size control and phospholipid fatty acid composition. Since TRF2 is conserved among metazoans, its role in the regulation of lipid metabolism may be of considerable relevance to various organisms including mammals. These findings may provide new insights into both the regulation of lipid metabolism and the physiological functions of TRF2 (Fan, 2017).

Assembly of SNAPc, Bdp1, and TBP on the U6 snRNA gene promoter in Drosophila melanogaster

U6 snRNA is transcribed by RNA polymerase III (Pol III) and has an external upstream promoter that consists of a TATA sequence recognized by the TBP subunit of the Pol III basal transcription factor IIIB, and a proximal sequence element (PSE) recognized by the small nuclear RNA activating protein complex (SNAPc). Previous work found that Drosophila melanogaster SNAPc (DmSNAPc) bound to the U6 PSE can recruit the Pol III general transcription factor Bdp1 to form a stable complex with the DNA. This study shows that DmSNAPc-Bdp1 can recruit TBP to the U6 promoter, and a region of Bdp1 was identified that is sufficient for TBP recruitment. Moreover, it was found that this same region of Bdp1 cross-links to nucleotides within the U6 PSE at positions that also cross-link to DmSNAPc. Finally, cross-linking mass spectrometry reveals likely interactions of specific DmSNAPc subunits with Bdp1 and TBP. These data, together with previous findings, have allowed the construction of a more comprehensive model of the DmSNAPc-Bdp1-TBP complex on the U6 promoter that includes nearly all of DmSNAPc, a portion of Bdp1, and the conserved region of TBP (Kim, 2020).

RNA polymerase III (Pol III) transcribes genes for tRNAs, 5S rRNA, and various small nuclear RNAs (snRNAs). Genes for the tRNAs and 5S rRNA have gene-internal promoters that usually are TATA-less. However, other genes, including U6 snRNA, 7SK RNA, tRNAsel, H1, and MRP RNAs, have gene-external promoters that consist of two distinct elements, a TATA sequence and a proximal sequence element (PSE) centered about 30 and 55 bp, respectively, upstream of the transcription start site. The TATA sequence is recognized by the Pol III general transcription factor TFIIIB, and the PSE is recognized by the small nuclear RNA activating protein complex (SNAPc) (Kim, 2020).

TFIIIB contains three subunits, most often TBP, Brf1, and Bdp1. These three subunits form an architectural scaffold for Pol III recruitment and together coordinate conformational changes that lead to the formation of an open complex. Interestingly, depending upon the type of gene and/or the organism, TFIIIB can exhibit subunit heterogeneity. For example, in the fruit fly Drosophila melanogaster, the TFIIIB that assembles on Pol III genes that have internal promoters contains the TBP-related factor 1 (TRF1) in place of TBP (Verma, 2013). However, U6 and U6-type genes with external promoters utilize a TFIIIB that contains the canonical TBP rather than TRF1 (Verma, 2013). In another example, human Pol III-transcribed genes with internal promoters utilize a TFIIIB that contains canonical Brf1, whereas Pol III-transcribed snRNA genes require an alternative Brf known as Brf2 (Kim, 2020).

SNAPc is a multisubunit factor that binds to the PSE (termed the PSEA in fruit flies, the subject of this paper) to activate the transcription of snRNA genes. D. melanogaster SNAPc (DmSNAPc) consists of three subunits, DmSNAP190, DmSNAP50, and DmSNAP43, that are homologs of the three essential subunits of human SNAPc. Although all three DmSNAPc subunits are required for DNA-binding activity, little is understood of the specific roles that the individual fly or human SNAPc subunits play in the recruitment of TFIIIB and the transcriptional activation of snRNA genes (Kim, 2020).

Previously, by using site-specific protein-DNA photo-cross-linking assays, nucleotide positions were identified where each of the individual DmSNAPc subunits cross-linked as part of the complex to U6 snRNA gene promoter DNA. Likewise, interactions were reported of the TFIIIB subunits (in the absence of DmSNAPc) with specific nucleotides in the U6 snRNA gene promoter. Those studies revealed both the linear positions (translational location along the DNA helix) and rotational positions (face of the DNA double helix) occupied by each of the DmSNAPc and TFIIIB subunits on the DNA. Furthermore, by cleaving the DmSNAPc proteins at specific sites after photo-cross-linking, it was possible to identify domains or regions of DmSNAP190, DmSNAP50, and DmSNAP43 that cross-linked to specific nucleotides within or adjacent to the PSEA (Kim, 2020).

Finally, in more recent work, it was found that DmSNAPc can recruit Bdp1 to the U6 snRNA gene promoter in the absence of TBP and Brf1 (Verma, 2018). Furthermore, an 87-amino-acid region of Bdp1 was identified that was required for Bdp1 to be recruited to the U6 snRNA gene promoter by DmSNAPc. Over the years, this work has allowed the building of an increasingly comprehensive picture of the architecture of the protein-DNA complex assembled on the U6 promoter (Kim, 2020).

Given the findings from that previous work, this study has now examined the recruitment of TBP to the U6 snRNA gene promoter by the DmSNAPc-Bdp1 complex. Furthermore, site-specific protein-DNA photo-cross-linking assays were applied to map the DmSNAPc, Bdp1, and TBP interactions with specific nucleotides of the U6 promoter. Finally, the architecture was examined of both the DmSNAPc-Bdp1-U6 promoter complex and the DmSNAPc-Bdp1-TBP-U6 promoter complex by applying cross-linking mass spectrometry (CXMS). The results of these studies allowed development of a more detailed model of the Pol III transcriptional machinery assembled on the U6 snRNA gene promoter that includes nearly all of DmSNAPc and parts of the TFIIIB components Bdp1 and TBP (Kim, 2020).

The canonical pathway for the assembly of the Pol III preinitiation complex (PIC) on tRNA genes involves the binding of TFIIIC to the gene-internal promoter followed by recruitment of TFIIIB (either preassembled or assembled in a stepwise process that involves the initial recruitment of Brf1 and TBP, followed by Bdp1 in a subsequent step) and finally RNA polymerase. (PIC assembly on 5S genes is believed to be similar but requires the prior binding of TFIIIA to aid in the recruitment of TFIIIC.) In contrast, the results raise the interesting possibility that PIC assembly on Pol III genes with external promoters in D. melanogaster proceeds by an alternate pathway that involves the following initial steps: first, DmSNAPc binds to the PSEA; second, DmSNAPc recruits Bdp1; and third, the promoter-bound DmSNAPc-Bdp1 complex and the TATA box recruit TBP. Brf1 and RNA polymerase may, in turn, assemble on the promoter at a subsequent step of PIC formation (Kim, 2020).
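The two assembly pathways described above differ mainly in the order in which Bdp1 and TBP arrive. One way to make that contrast explicit is to write each pathway as a set of prerequisites and derive an allowed recruitment order. This is a minimal sketch under stated assumptions: the dictionaries encode only the steps named in the paragraph (the stepwise variant for tRNA genes, and the first three steps for U6), promoter elements such as the PSEA and TATA box are left implicit, and the function names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each factor maps to the factors that must already be promoter-bound.
# Proposed pathway on the U6 external promoter (first three steps only).
U6_EXTERNAL_PROMOTER = {
    "DmSNAPc": set(),             # binds the PSEA directly
    "Bdp1": {"DmSNAPc"},
    "TBP": {"DmSNAPc", "Bdp1"},   # recruitment also involves the TATA box
}

# Canonical stepwise pathway on tRNA genes with internal promoters.
TRNA_INTERNAL_PROMOTER = {
    "TFIIIC": set(),
    "Brf1": {"TFIIIC"},
    "TBP": {"TFIIIC"},
    "Bdp1": {"Brf1", "TBP"},
    "Pol III": {"Bdp1"},
}


def assembly_order(pathway):
    """Return one recruitment order compatible with the prerequisites."""
    return list(TopologicalSorter(pathway).static_order())


def is_valid_order(pathway, order):
    """Check that every factor arrives only after all of its prerequisites."""
    bound = set()
    for factor in order:
        if not pathway[factor] <= bound:
            return False
        bound.add(factor)
    return True


print("U6:  ", " -> ".join(assembly_order(U6_EXTERNAL_PROMOTER)))
print("tRNA:", " -> ".join(assembly_order(TRNA_INTERNAL_PROMOTER)))
```

In this encoding Bdp1 precedes TBP on the U6 promoter but follows it on tRNA genes, which is the reversal the results point to. Brf1 and Pol III are omitted from the U6 dictionary because the text only says they may assemble at a subsequent step.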

This study has proposed a model for the DmSNAPc-Bdp1-TBP complex on the U6 promoter that is consistent with EMSA, site-specific protein-DNA photo-cross-linking, and CXMS experiments. Furthermore, the DmSNAPc model is fully consistent with coimmunoprecipitation experiments that mapped regions of the three DmSNAPc subunits that are required for their assembly with each other. The model further provides a rationale for the recruitment of Bdp1 and TBP by DmSNAPc. Bdp1 cross-links to DNA nucleotide positions that extend upstream of the TATA box into positions that are actually a part of the PSEA. These positions are also occupied by DmSNAP190 and DmSNAP43 (but not DmSNAP50), indicating that Bdp1 must lie in close proximity to DmSNAP190 and DmSNAP43. Also supporting this model, the CXMS experiments revealed cross-linking of Bdp1 with DmSNAP190 and DmSNAP43 (but not with DmSNAP50) (Kim, 2020).

Furthermore, additional evidence was generated, beyond that previously published, that residues 424 to 510 of Bdp1 are involved in the recruitment of Bdp1 by DmSNAPc. For example, an internal deletion of residues 424 to 510 resulted in the complete loss of Bdp1 recruitment by DmSNAPc. Moreover, Bdp1 residues 424 to 510 alone exhibited the same pattern as full-length Bdp1 in site-specific protein-DNA photo-cross-linking, suggesting that this region of Bdp1 extended into the U6 PSEA, where it would reside in close proximity to DmSNAP190 and DmSNAP43. Finally, the CXMS data with full-length Bdp1 showed that Bdp1 residues 424 to 510, together with nearby residues flanking that region, were responsible for the majority of the protein-protein cross-links between DmSNAPc and Bdp1 (Kim, 2020).

In work by others, the N-terminal region of human Bdp1, more so than the C-terminal region, was found to interact with DmSNAPc. Interestingly, the CXMS studies revealed that lysines within the N-terminal region of Bdp1 (lysines 203, 206, and 231) cross-link to both DmSNAP190 and DmSNAP43. Thus, it is possible that a region of fly Bdp1 N-terminal of the SANT domain, as well as residues 424 to 510 C-terminal of the SANT domain, interact with DmSNAPc. Perhaps this potential N-terminal interaction of fly Bdp1 with DmSNAPc is not stable enough to be detected in the current EMSAs (Kim, 2020).

Interestingly, by EMSA, it has not been possible to convincingly demonstrate the existence of a complex that contains both DmSNAPc and Brf1 together with Bdp1 and TBP. Essentially, either DmSNAPc-Bdp1-TBP or Brf1-Bdp1-TBP, which lacks DmSNAPc, is seen. The modeling suggests a rationale for this result. The finding that a region of DmSNAP43 lies on or near the upper surface of TBP suggests that the binding of DmSNAPc and that of Brf1 are mutually exclusive. Yeast Brf1 was modeled into the proposed SNAPc-Bdp1-TBP complex in accordance with a published cryo-EM structure. Depending upon the exact positioning of DmSNAP43, it may sterically or otherwise interfere with the binding of Brf1 along the upper surface of TBP. If this is true, it would suggest some form of regulation to govern the transition from a DmSNAPc-Bdp1-TBP complex to a Brf1-Bdp1-TBP complex (Kim, 2020).

In light of this potential regulation, the finding cannot be ignored that the C-terminal region of fly DmSNAP190 appears to be structurally related to the ligand-binding domains of members of the nuclear hormone receptor superfamily. The location of this domain, near DmSNAP43 and the SANT domain of Bdp1, raises the intriguing possibility that the activity of D. melanogaster SNAPc and the expression of snRNA genes are regulated by an unknown small organic molecule of intracellular or extracellular origin. This could provide an interesting avenue of future research (Kim, 2020).

The work reported in this study furthermore suggests pathways toward U6 preinitiation complex assembly in flies and humans that are analogous but different with respect to the intermediary factor that acts as a stabilizing bridge between SNAPc and TBP. In flies, PSEA-bound DmSNAPc recruits Bdp1 in a TATA box-independent manner and TBP in a TATA-dependent manner, with Bdp1 acting to stabilize DmSNAPc and TBP on the PSEA and TATA box, respectively. In humans, factor assembly appears to occur analogously but involving Brf2 instead of Bdp1: PSE-bound SNAPc interacts with Brf2 independent of a TATA box, and this complex recruits TBP only in the presence of a TATA box. One obvious explanation for the difference is that flies do not have Brf2, so different mechanisms have evolved in flies and humans for TBP recruitment to U6 gene promoters (Kim, 2020).

In a broader sense, work on snRNA genes has extended the perspective on the diversity of the TFIIIB components that can be assembled into the Pol III PIC: TBP (for snRNA genes) versus TRF1 (for tRNA and 5S RNA genes) in flies and Brf2 (for snRNA genes) versus Brf1 (for tRNA and 5S RNA genes) in humans. The only constant TFIIIB component known so far is Bdp1. The snRNA work has also revealed different pathways for TFIIIB assembly, at least in vitro, on SNAPc-dependent genes versus TFIIIC-dependent genes. The former seem to proceed initially by SNAPc-dependent recruitment of Bdp1 or Brf2, followed by TBP recruitment, whereas the latter are thought to occur by TFIIIC-dependent recruitment of TFIIIB either as a preformed complex or proceeding first through Brf1 and TBP recruitment, followed by Bdp1 in a subsequent step (Kim, 2020).

TFIID Enables RNA Polymerase II Promoter-Proximal Pausing

RNA polymerase II (RNAPII) transcription is governed by the pre-initiation complex (PIC), which contains TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNAPII, and Mediator. After initiation, RNAPII enzymes pause after transcribing fewer than 100 bases; precisely how RNAPII pausing is enforced and regulated remains unclear. To address specific mechanistic questions, human RNAPII promoter-proximal pausing was reconstituted in vitro, entirely with purified factors (no extracts). As expected, NELF and DSIF increased pausing, and P-TEFb promoted pause release. Unexpectedly, the PIC alone was sufficient to reconstitute pausing, suggesting RNAPII pausing is an inherent PIC function. In agreement, pausing was lost upon replacement of the TFIID complex with TATA-binding protein (TBP), and PRO-seq experiments revealed widespread disruption of RNAPII pausing upon acute depletion (t = 60 min) of TFIID subunits in human or Drosophila cells. These results establish a TFIID requirement for RNAPII pausing and suggest pause regulatory factors may function directly or indirectly through TFIID (Fant, 2020).

RNA polymerase II (RNAPII) transcribes all protein-coding and many non-coding RNAs in the human genome. RNAPII transcription initiation occurs within the pre-initiation complex (PIC), which contains TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNAPII, and Mediator. After initiation, RNAPII enzymes typically pause after transcribing 20-80 bases, and paused polymerases represent a common regulatory intermediate. Accordingly, paused RNAPII has been implicated in enhancer function, development and homeostasis, and diseases ranging from cancer to viral pathogenesis. Precisely how RNAPII promoter-proximal pausing is enforced and regulated remains unclear; however, protein complexes, such as NELF and DSIF, increase pausing, whereas the activity of CDK9 (P-TEFb complex) correlates with pause release (Fant, 2020).

Although much has been learned about RNAPII promoter-proximal pausing and its regulation, the underlying molecular mechanisms remain enigmatic. One reason for this is the complexity of the human RNAPII transcription machinery, which includes the ∼4.0 MDa PIC and many additional regulatory factors. Another underlying reason is that much current understanding derives from cell-based assays, which are indispensable but cannot reliably address mechanistic questions. For instance, factor knockdowns or knockouts cause unintended secondary effects and the factors and biochemicals present at each gene in a population of cells cannot possibly be defined. In vitro assays can overcome such limitations, but these have typically involved nuclear extracts, which contain a similarly undefined mix of proteins, nucleic acids, and biochemicals. To circumvent these issues, this study sought to reconstitute RNAPII promoter-proximal pausing entirely from purified human factors (no extracts). Success with this task enabled addressing some basic mechanistic questions and opens the door for future studies to better define the contribution of specific factors in RNAPII promoter-proximal pause regulation (Fant, 2020).

Structural data indicate that TFIID lobe C subunits TAF1 (see Drosophila Taf250) and TAF2 bind promoter DNA downstream of the TSS (Louder, 2016; Patel, 2018). Past studies revealed that insertion of 10-bp DNA at the +15 site relative to the TSS disrupted RNAPII pausing at the HSP70 gene in Drosophila S2 cells (Kwak, 2013). This led to a 'complex interaction' model for pausing, in which a promoter-bound factor(s) establishes an interaction (directly or indirectly) with the paused RNAPII complex. In agreement with this model, a TFIID requirement was observed for RNAPII promoter-proximal pausing in vitro, which is further supported by PRO-seq data in TAF-depleted human and Drosophila S2 cells. Additional evidence for TFIID-dependent regulation of RNAPII pausing derives from correlations among paused genes and DNA sequence elements bound by TFIID. Defects in TFIID function are linked to numerous diseases, including cancer and neurodegenerative disorders. Its requirement for RNAPII promoter-proximal pause regulation may underlie these and other biological functions (Fant, 2020).

Biochemical reconstitution of RNAPII promoter-proximal pausing provides a level of mechanistic control that is simply not possible with cell-based assays; consequently, it was discovered that RNAPII pausing is an inherent property of the human PIC and that TFIID is a key PIC factor that establishes pausing. The results also reveal NELF, DSIF, and P-TEFb as auxiliary factors that, although not required for pausing, enable robust regulation of this common transcriptional intermediate state. Time course experiments indicated that polymerases in the paused region remained active and generated elongated transcripts over time. Experiments with P-TEFb showed enhanced release of paused intermediates, providing further evidence that polymerases in the paused region were active and competent for elongation. However, some transcripts remained in the pause region after the 10-min reactions, even with added P-TEFb. This result is also consistent with current models that invoke alternative outcomes for promoter-proximal paused RNAPII, including premature termination, arrest, or a more stable paused intermediate. The mechanisms and factors that regulate these distinct outcomes could be explored in future studies (Fant, 2020).

Despite its advantages, the reconstituted in vitro transcription assay does not match the complexity of regulatory inputs that converge upon active promoters in a living cell. To test the TFIID requirement for promoter-proximal pausing in cells, TFIID lobe C subunits TAF1 and TAF2 were rapidly depleted using Trim-Away, and genome-wide changes in nascent transcription were assessed with PRO-seq. Consistent with the in vitro data, global transcription increased at protein-coding genes upon TAF1/2 knockdown, with evidence for enhanced pause release. PRO-seq reads increased at 5' ends and downstream of promoter-proximal pause sites at thousands of genes in TAF1/2-depleted cells. These data are consistent with increased pause release and increased re-initiation, two processes that are coupled in metazoan cells. Unexpectedly, however, increased pause release did not yield similar genome-wide increases in gene body reads. Instead, the PRO-seq data revealed a sharp reduction in reads downstream of promoter-proximal pause sites, at around +300 from the TSS in both human and Drosophila cells. These results implicate additional regulatory mechanisms, downstream of the pause site, that may terminate or arrest RNAPII. Although future studies are needed to identify the factors involved, it is noted that the Integrator complex was recently shown to cleave nascent transcripts downstream of pause sites at hundreds of genes in Drosophila cells (Tatomer, 2019). Because promoter-proximal pausing helps ensure proper capping of transcripts at their 5' ends, downstream regulatory mechanisms may become important when RNAPII promoter-proximal pausing is disrupted (Fant, 2020).
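
Changes in promoter-proximal pausing in nascent-transcription data are often summarized with a pausing index, the ratio of read density near the pause site to read density in the gene body. The text here does not state which metric was used, so the following is only a minimal sketch of that general idea. The function names, the 20-80 pause window (taken from the typical pause position mentioned earlier), the +300 to +1300 gene-body window, and the read positions are all illustrative assumptions.

```python
def read_density(read_positions, start, end):
    """Reads per base in the half-open window [start, end), relative to the TSS."""
    count = sum(1 for p in read_positions if start <= p < end)
    return count / (end - start)


def pausing_index(read_positions, pause_window=(20, 80), body_window=(300, 1300)):
    """Ratio of pause-region read density to gene-body read density.

    Both windows are hypothetical defaults chosen for illustration only.
    """
    pause = read_density(read_positions, *pause_window)
    body = read_density(read_positions, *body_window)
    if body == 0:
        raise ValueError("no gene-body reads; pausing index is undefined")
    return pause / body


# Made-up 3'-end positions of nascent reads for a single gene.
# "control": one read per base in the pause window, one per 10 bases in the body.
control_reads = list(range(20, 80)) + list(range(300, 1300, 10))
# "depleted": half as many pause-window reads, same gene-body reads.
depleted_reads = list(range(20, 80, 2)) + list(range(300, 1300, 10))

control_index = pausing_index(control_reads)
depleted_index = pausing_index(depleted_reads)
print(control_index, depleted_index)
```

A single ratio of this kind cannot distinguish between increased pause release, increased re-initiation, and termination downstream of the pause site, which is why read profiles along the gene are examined as well.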

A TFIID requirement for RNAPII promoter-proximal pausing implies that other pause regulatory factors may function directly or indirectly through TFIID. Although additional mechanistic aspects remain to be addressed, it is notable that pause regulatory factors, including P-TEFb and MYC, interact (directly or indirectly) with TFIID; moreover, TFIID is conformationally flexible and likely undergoes structural reorganization during RNAPII transcription initiation and pause release. Such structural transitions may contribute to TFIID-dependent regulation of RNAPII pausing. Whereas nucleosomes likely affect promoter-proximal pausing, they are not required, based upon the results of this study and data in Drosophila and mammalian systems. TFIID possesses multiple domains that bind chromatin marks associated with transcriptionally active loci, including H3K4me3, which suggests TFIID function is regulated in part through epigenetic mechanisms. Future studies should help establish whether specific chromatin marks contribute to TFIID-dependent regulation of RNAPII pausing, potentially by affecting TFIID promoter occupancy or by impacting TFIID structure and function (Fant, 2020).

The Integrator complex cleaves nascent mRNAs to attenuate transcription

Cellular homeostasis requires transcriptional outputs to be coordinated, and many events post-transcription initiation can dictate the levels and functions of mature transcripts. To systematically identify regulators of inducible gene expression, high-throughput RNAi screening of the Drosophila Metallothionein A (MtnA) promoter was performed. This revealed that the Integrator complex, which has a well-established role in 3' end processing of small nuclear RNAs (snRNAs), attenuates MtnA transcription during copper stress. Integrator is an evolutionarily conserved complex that contains 14 subunits and regulates RNA processing and gene transcription by associating with the C-terminal domain of RNA polymerase II large subunit. Integrator complex subunit 11 (IntS11) endonucleolytically cleaves MtnA transcripts, resulting in premature transcription termination and degradation of the nascent RNAs by the RNA exosome, a complex also identified in the screen. Using RNA-seq, >400 additional Drosophila protein-coding genes were identified whose expression increases upon Integrator depletion. This study focused on a subset of these genes and confirmed that Integrator is bound to their 5' ends and negatively regulates their transcription via IntS11 endonuclease activity. Many noncatalytic Integrator subunits, which are largely dispensable for snRNA processing, also have regulatory roles at these protein-coding genes, possibly by controlling Integrator recruitment or RNA polymerase II dynamics. Altogether, these results suggest that attenuation via Integrator cleavage limits production of many full-length mRNAs, allowing precise control of transcription outputs (Tatomer, 2019).

In response to physiological cues, environmental stress, or exposure to pathogens, specific transcriptional programs are induced. These responses are often coordinated, rapid, and robust, in part because many metazoan genes are maintained in a poised state with RNA polymerase II (RNAPII) engaged prior to induction. In addition to promoter-proximal pausing, there are many regulatory steps post transcription initiation that dictate the characteristics and fate of mature transcripts. For example, alternative splicing and/or 3' end processing events can lead to the production of multiple isoforms from a single locus, and these transcripts can have distinct stabilities, translation potential, or subcellular localization (Tatomer, 2019).

It is particularly important that genes produce full-length functional mRNAs, and mechanisms such as telescripting, involving U1 snRNP, actively suppress premature cleavage and polyadenylation events in eukaryotic cells. Nevertheless, many promoters are known to generate short unstable RNAs. This suggests that premature transcription termination may often occur, thereby limiting RNAPII elongation and production of full-length mRNAs (for review, see Kamieniarz-Gdula and Proudfoot 2019). Moreover, this process can be regulated. For example, it was recently shown that the cleavage and polyadenylation factor PCF11 stimulates premature termination to attenuate the expression of many transcriptional regulators in human cells (Kamieniarz-Gdula, 2019). Potentially deleterious truncated transcripts generated by premature termination are often removed from cells by RNA surveillance mechanisms, including by the RNA exosome. However, the full repertoire of cellular factors and cofactors that control the metabolic fate of nascent RNAs, especially during the early stages of transcription elongation, is still unknown (Tatomer, 2019).

An unbiased genome-scale RNAi screen was performed in Drosophila cells to reveal factors that control the output of a model inducible eukaryotic promoter. Transcription of Drosophila Metallothionein A (MtnA), which encodes a metal chelator, is rapidly induced when the intracellular concentration of heavy metals (e.g., copper or cadmium) is increased. This increase in transcriptional output is dependent on the MTF-1 transcription factor, which relocalizes to the nucleus upon metal stress and binds to the MtnA promoter. The RNAi screen identified MTF-1 and other known regulators of MtnA transcription, but also surprisingly identified the Integrator complex as a potent inhibitor of MtnA during copper stress. Integrator harbors an endonuclease that cleaves snRNAs and enhancer RNAs, and this study has found that Integrator can likewise cleave nascent MtnA transcripts to limit mRNA production. Using RNA-seq, hundreds of additional Drosophila protein-coding genes were found whose expression increases upon Integrator depletion. Focused studies on a subset of these genes confirmed that Integrator can cleave these nascent RNAs, thereby limiting productive transcription elongation. Altogether, it is proposed that Integrator-catalyzed premature termination can function as a widespread and potent mechanism to attenuate expression of protein-coding genes (Tatomer, 2019).

Altogether, the data indicate that the Integrator complex can attenuate the expression of protein-coding genes by catalyzing premature transcription termination. The IntS11 endonuclease cleaves a subset of nascent mRNAs, which ultimately triggers degradation of the transcripts by the RNA exosome along with RNAPII termination. It is suggested that many protein-coding genes are negatively regulated via this attenuation mechanism, and the Drosophila MtnA promoter highlights context-specific regulation by Integrator. Transcription of MtnA is induced by copper or cadmium stress, and yet this study finds that Integrator is robustly recruited to the MtnA promoter only under copper stress conditions. This is not because the Integrator complex is generally disassembled or 'poisoned' by cadmium, as Integrator continues to regulate the outputs of other protein-coding genes. It is instead proposed that context-specific regulation of this locus may be related to the fact that cadmium is a strictly toxic metal, while copper is required for the function of a subset of enzymes and must be maintained in a narrow concentration range. Therefore, homeostatic control of MtnA is required to maintain copper levels, while cells need to maximally produce MtnA in the presence of cadmium. It is thus proposed that regulation of MtnA levels by Integrator during copper stress is for fine-tuning purposes, perhaps to limit maximal transcriptional induction and/or facilitate transcriptional shut-off once copper stress has passed. The results suggest that the Integrator complex can be recruited to gene loci only when needed, thereby ensuring tight control over transcriptional output (Tatomer, 2019).

In addition to cleaving MtnA transcripts, Integrator cleaves multiple other RNA classes in metazoan cells, including enhancer RNAs (Lai, 2015), snRNAs (Baillat, 2005), telomerase RNA (Rubtsova, 2019), and some herpesvirus microRNA precursors (Cazalla, 2011; Xie, 2015). Using RNA-seq, this study has expanded this list of Integrator target loci and identified hundreds of additional protein-coding genes that are negatively regulated by Integrator. Focus was placed on a set of Integrator-dependent genes; Integrator was found to catalyze premature transcription termination of these genes, consistent with prior studies that suggested roles for Integrator in termination (Skaar, 2015; Shah, 2018; Gomez-Orte, 2019). Some of these genes (CG8620, Pepck1, and Sirup) have promoter-proximal RNAPII that rapidly turns over, which may indicate that Integrator can aid in clearing paused or stalled RNAPII. Once Integrator has cleaved the nascent mRNAs, this study finds that they are rapidly degraded from their 3' ends by the RNA exosome. This may be critical for enabling subsequent rounds of transcription (especially at the MtnA locus), perhaps because the small RNAs can form stable RNA-DNA hybrids (R-loops) that block transcription initiation or elongation (Tatomer, 2019).

Endonucleolytic cleavage is critical for Integrator regulation at snRNA and protein-coding genes, but the data indicate that these loci have different dependencies on Integrator subunits. Genetic studies indicate that Integrator subunits 4, 9, and 11 (which form the Integrator cleavage module) are most important for snRNA processing, while the non-catalytic Integrator subunits (all of which currently lack annotated molecular functions) play minor roles. In contrast, large increases in mRNA expression were observed when many of the non-catalytic subunits were depleted (especially IntS1, IntS2, IntS5, IntS6, IntS7, and IntS8). IntS13 was recently shown to be able to function independently from other Integrator subunits at enhancers (Barbieri, 2018), suggesting the existence of submodules or 'specialized' complexes that may enable the activity and function of Integrator to be distinctly regulated depending on the gene locus and cellular state. Future work will reveal the subunit requirements of Integrator complexes at distinct loci and clarify the interplay between IntS11 endonuclease activity and other Integrator subunits. For example, the non-catalytic subunits may be critical for the formation and targeting of the complex to specific loci and/or controlling RNAPII dynamics (Tatomer, 2019).

Finally, it is noted that the metazoan Integrator complex has parallels with the yeast Nrd1-Nab3-Sen1 (NNS) complex that (1) terminates transcription at both mRNA and snRNA loci and (2) interacts with the RNA exosome. Interestingly, the underlying molecular mechanisms of transcription termination carried out by these two complexes are quite distinct. NNS uses the Sen1 helicase to pull the nascent transcript out of the RNAPII active site, while Integrator likely promotes termination by taking advantage of its RNA endonuclease activity and providing an entry site for a 5'-3' exonuclease. There are currently conflicting data on whether the canonical 'torpedo' exonuclease Rat1/Xrn2 is involved in termination at snRNA genes, as only subtle termination defects have been observed at these loci when Rat1/Xrn2 is depleted from cells. Notably, Cpsf73 has been shown to behave as both an endonuclease and exonuclease, raising the possibility that IntS11 could support a 'Rat1/Xrn2-like' function and mediate termination. Future studies that compare and contrast the Integrator and NNS complexes, especially how their recruitment and termination activities are controlled, will shed light on this important facet of gene regulation. In summary, transcription attenuation through premature termination was first described decades ago in bacteria, and the current work indicates that the metazoan Integrator complex can function analogously to limit expression from protein-coding genes (Tatomer, 2019).

Mediator and RNA polymerase II clusters associate in transcription-dependent condensates

Models of gene control have emerged from genetic and biochemical studies, with limited consideration of the spatial organization and dynamics of key components in living cells. This study used live-cell superresolution and light-sheet imaging to study the organization and dynamics of the Mediator coactivator and RNA polymerase II (Pol II) directly. Mediator and Pol II each form small transient and large stable clusters in living embryonic stem cells. Mediator and Pol II are colocalized in the stable clusters, which associate with chromatin, have properties of phase-separated condensates, and are sensitive to transcriptional inhibitors. It is suggested that large clusters of Mediator, recruited by transcription factors at large or clustered enhancer elements, interact with large Pol II clusters in transcriptional condensates in vivo (Cho, 2018).

A conventional view of eukaryotic gene regulation is that transcription factors, bound to enhancer DNA elements, recruit coactivators such as the Mediator complex, which is thought to interact with RNA polymerase II (Pol II) at the promoter. This model is supported by a large body of molecular genetic and biochemical evidence, yet the direct interaction of Mediator and Pol II has not been observed and characterized in living cells. Using superresolution and light-sheet imaging, the organization and dynamics of endogenous Mediator and Pol II in live mouse embryonic stem cells (mESCs) were studied. Whether Pol II and Mediator interact in a manner consistent with condensate formation was directly tested, their biophysical properties were quantitatively characterized, and the implications of these observations for transcription regulation in living mammalian cells were considered (Cho, 2018).

To visualize Mediator and Pol II in live cells, mouse embryonic stem cell lines were generated with endogenous Mediator and Pol II labeled with Dendra2, a green-to-red photoconvertible fluorescent protein. Live-cell superresolution imaging was performed and Mediator was found to form clusters with a range of dynamic temporal signatures. Mediator exists in a population of transient small (~100 nm) clusters with an average lifetime of 11.1 ± 0.9 s, comparable to that of transient Pol II clusters observed in this study and previously in differentiated cell types. In addition, it was observed that both Mediator and Pol II form a population of large (>300 nm) clusters (~14 per cell), each comprising ~200 to 400 molecules, that are temporally stable (lasting the full acquisition window of the live-cell superresolution imaging) (Cho, 2018).

The extent to which these clusters depend on the stem cell state was tested. The mESCs were subjected to a protocol to differentiate them into epiblastlike cells (EpiLCs) within 24 h. Differentiation had no apparent effect on the population of transient clusters, consistent with previous observations that transient clusters persist in differentiated cell types. However, both the size and the number of stable clusters decreased along the course of differentiation, suggesting that these stable clusters are prone to change as cells differentiate (Cho, 2018).

Focus was placed on the stable clusters of Mediator and Pol II, and whether they are colocalized was investigated. mESCs were generated with endogenous Mediator and Pol II tagged with JF646-HaloTag and Dendra2, respectively. Direct imaging of both JF646-Mediator and Dendra2-Pol II showed bright spots of large accumulations in the nucleus, which corresponded to stable Pol II clusters according to subsequent superresolution imaging of Dendra2-Pol II in the same nuclei. The same observations were made with Dendra2-Mediator. Of 143 Mediator clusters imaged by dual-color light-sheet imaging, 129 (90%) had a colocalizing Pol II cluster. It was concluded that these Mediator and Pol II clusters colocalize in live mESCs (Cho, 2018).

Previous studies have shown that high densities of Mediator are located at enhancer clusters called super-enhancers (SEs) and that some are disrupted by loss of the BET (bromodomain and extraterminal family) protein BRD4 (Drosophila homolog: fs(1)h), which is a cofactor associated with Mediator. This study found that treatment of mESCs with JQ1, a drug that causes loss of BRD4 from enhancer chromatin, dissolved both transient and stable clusters of Mediator and Pol II (Cho, 2018).

After transcription initiation, Pol II transcribes a short distance (~100 base pairs), pauses, and is released to continue elongation when phosphorylated by CDK9. It was hypothesized that inhibition of CDK9 might selectively affect the Pol II stable clusters. It was observed that upon incubation with DRB (5,6-dichloro-1-beta-d-ribofuranosyl-benzimidazole), Pol II stable clusters dissolved but Mediator stable clusters remained. Quantification of Mediator-Pol II colocalization revealed that incubation with DRB progressively decreased the fraction of Mediator stable clusters that colocalized with Pol II. This effect could be reversed when DRB was washed out; the colocalization fraction recovered completely. These results imply that the association between Mediator and Pol II clusters may be hierarchical, with upstream enhancer recruitment controlling both clusters but downstream transcription inhibition selectively affecting Pol II clusters (Cho, 2018).

The long-term dynamics of stable clusters were characterized by using lattice light-sheet imaging in live mESCs. It was observed that clusters can merge upon contact. The time scale of coalescence was very rapid, comparable to the full volumetric acquisition frame rate (15-s time interval). The summed intensity of the two precursor clusters was close to that of the newly merged cluster. These biophysical dynamics are reminiscent of those of biomolecular condensates in vivo (Cho, 2018).

In addition to coalescence, in vivo condensates had rapid turnover of the molecular components, as shown by fast recovery in fluorescence recovery after photobleaching (FRAP) assays, and were sensitive to a nonspecific aliphatic alcohol, 1,6-hexanediol. FRAP analyses of clusters revealed very rapid dynamics and turnover of their components: 60% of the Mediator and 90% of Pol II components were exchanged within ~10 s within clusters. Moreover, the treatment of mESCs with 1,6-hexanediol resulted in the gradual dissolution of both Mediator and Pol II clusters. Together, these results suggest that the stable clusters are in vivo condensates of Mediator and Pol II (Cho, 2018).
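
The exchanged fractions quoted above can be related to raw FRAP intensities through a standard normalization: the signal regained after bleaching divided by the signal lost during bleaching. The sketch below assumes this simple normalization, without correction for acquisition photobleaching or background. The function name and the intensity values are hypothetical, chosen only so that the results match the approximately 60% and 90% figures in the text.

```python
def frap_recovered_fraction(pre_bleach, post_bleach, intensity_at_t):
    """Fraction of the bleached signal that has recovered at time t.

    pre_bleach     : cluster intensity before bleaching
    post_bleach    : intensity immediately after bleaching
    intensity_at_t : intensity at the time point of interest
    """
    if pre_bleach <= post_bleach:
        raise ValueError("bleaching must reduce the intensity")
    return (intensity_at_t - post_bleach) / (pre_bleach - post_bleach)


# Hypothetical normalized intensities about 10 s after bleaching.
mediator_fraction = frap_recovered_fraction(1.0, 0.2, 0.68)
pol2_fraction = frap_recovered_fraction(1.0, 0.2, 0.92)
print(round(mediator_fraction, 2), round(pol2_fraction, 2))
```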

It was hypothesized that a phase separation model with induced condensation at the recruitment step of Mediator to enhancers would qualitatively account for the observations in this study. The model implies that the condensates are chromatin associated and colocalize with enhancer-controlled active genes. Therefore these two specific implications were tested. The diffusion dynamics of Mediator clusters were tracked by computing their mean squared displacement as a function of time (n = 6 cells). On short time scales, the cluster motion was subdiffusive, with an exponent α = 0.40 ± 0.12. This is the same exponent found in the subdiffusional behavior of chromatin loci in eukaryotic cells. The same diffusional parameters were also observed when tracking a chromatin locus labeled by dCas9-based chimeric array of guide RNA oligonucleotides (CARGO) in mESCs. It is concluded that clusters diffuse like chromatin-associated domains (Cho, 2018).
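
For anomalous diffusion the mean squared displacement scales as MSD(t) ∝ t^α, so α is the slope of log(MSD) against log(t), and α < 1 indicates subdiffusion. The following is a minimal sketch of that calculation, assuming a single 2D track sampled at a constant frame interval and a plain least-squares fit. The function names and the synthetic data are illustrative and do not reproduce the analysis pipeline of the study.

```python
import math


def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of one 2D track for lags 1..max_lag (in frames).

    track is a list of (x, y) positions; max_lag must be below len(track).
    """
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def fit_anomalous_exponent(lags, msd):
    """Least-squares slope of log(MSD) versus log(lag), i.e. the exponent alpha.

    All MSD values must be positive, so a completely stationary track
    cannot be fitted.
    """
    xs = [math.log(t) for t in lags]
    ys = [math.log(m) for m in msd]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic MSD curve built with alpha = 0.4; the fit should recover it.
lags = [1, 2, 3, 4, 5, 6, 7, 8]
synthetic_msd = [0.01 * t ** 0.4 for t in lags]
alpha = fit_anomalous_exponent(lags, synthetic_msd)

# Sanity check on a track moving at constant velocity (ballistic, alpha = 2).
ballistic_track = [(float(i), 0.0) for i in range(20)]
ballistic_msd = mean_squared_displacement(ballistic_track, 5)
ballistic_alpha = fit_anomalous_exponent([1, 2, 3, 4, 5], ballistic_msd)
print(round(alpha, 3), round(ballistic_alpha, 3))
```

With real tracks the fit is usually restricted to short lags, where each MSD point averages over many displacement pairs.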

It was hypothesized that clusters were in close physical proximity to actively transcribed genes that can be visualized by global run-on nascent RNA labeling with ethynyl uridine (EU). The run-on results showed that 2 min after DRB washout, virtually all Mediator clusters observed were proximal to or overlapping with nascent RNA accumulations, as imaged by Click labeling of EU in fixed cells. The MS2 endogenous RNA labeling system was employed to investigate whether active transcription could be observed at Esrrb, one of the top SE-controlled genes in mESCs. Bright foci were observed consistent with nascent MS2-labeled gene loci, and the gene loci were confirmed by dual-color RNA fluorescence in situ hybridization (FISH) targeting the MS2 sequence and intronic regions of Esrrb. Intronic FISH on 125 Esrrb loci from 82 fixed cells showed that 93% of Esrrb loci had a stable Mediator cluster nearby (within 1 µm) but only ~22% of the loci colocalized with a stable Mediator cluster, suggesting that the Mediator-bound enhancer only occasionally colocalizes with the gene. The variability in colocalization may be explained by a dynamic 'kissing' model, where a distal Mediator cluster colocalizes with the gene only at certain time points (Cho, 2018).

By dual-color three-dimensional (3D) live-cell imaging with lattice light-sheet microscopy, it was found that some Mediator clusters were up to a micrometer away from the active Esrrb gene locus but in some instances directly colocalized with the gene. In addition, the dynamic interaction between Mediator clusters and the gene locus was directly observed, supporting the dynamic kissing model. Tracking of loci in all six cells indicated that colocalization below the resolution limit of 300 nm occurred at ~30% of the time points. However, even when they were not overlapping, the Mediator cluster and the gene loci moved as a pair through the nucleus, consistent with two adjacent regions anchoring to the same underlying chromatin domain. It is proposed that Mediator clusters form at the Esrrb SE and then interact occasionally and transiently with the transcription apparatus at the Esrrb promoter (Cho, 2018).

This study has found that Mediator and Pol II form large stable clusters in living cells and has shown that these clusters have properties expected for biomolecular condensates. The condensate properties were evident through coalescence, rapid recovery in FRAP analysis, and sensitivity to hexanediol. In a model of phase separation on the basis of scaffold-client relationships, it is possible that enhancer-associated Mediator forms a condensate and provides a 'scaffold' for 'client' RNA Pol II molecules. The model proposed whereby large Mediator clusters at enhancers transiently kiss the transcription apparatus at promoters has a number of implications for gene control mechanisms. The presence of large Mediator clusters at some enhancers may allow Mediator condensates to contact the transcription apparatus at multiple gene promoters simultaneously. The large size of the Mediator clusters may also mean that the effective distance of the enhancer-promoter DNA elements can be in the same order as the size of the clusters (>300 nm), larger than the distance requirement for direct contact. It is speculated that such clusters may help explain gaps of hundreds of nanometers that are found in previous studies measuring distances between functional enhancer-promoter DNA elements. Such cluster sizes also imply that some long-range interactions could go undetected in DNA interaction assays that depend on much closer physical proximity of enhancer and promoter DNA elements (Cho, 2018).

Transcription factors activate genes through the phase-separation capacity of their activation domains

Gene expression is controlled by transcription factors (TFs) that consist of DNA-binding domains (DBDs) and activation domains (ADs). The DBDs have been well characterized, but little is known about the mechanisms by which ADs effect gene activation. This study, carried out in murine embryonic stem cells, reports that diverse ADs form phase-separated condensates with the Mediator coactivator. For the OCT4 and GCN4 TFs, this study shows that the ability to form phase-separated droplets with Mediator in vitro and the ability to activate genes in vivo are dependent on the same amino acid residues. For the estrogen receptor (ER), a ligand-dependent activator, it was shown that estrogen enhances phase separation with Mediator, again linking phase separation with gene activation. These results suggest that diverse TFs can interact with Mediator through the phase-separating capacity of their ADs and that formation of condensates with Mediator is involved in gene activation (Boija, 2018).

Regulation of gene expression requires that the transcription apparatus be efficiently assembled at specific genomic sites. DNA-binding transcription factors (TFs) ensure this specificity by occupying specific DNA sequences at enhancers and promoter-proximal elements. TFs typically consist of one or more DNA-binding domains (DBDs) and one or more separate activation domains (ADs). While the structure and function of TF DBDs are well documented, comparatively little is understood about the structure of ADs and how these interact with coactivators to drive gene expression (Boija, 2018).

The structure of TF DBDs and their interaction with cognate DNA sequences has been described at atomic resolution for many TFs, and TFs are generally classified according to the structural features of their DBDs. For example, DBDs can be composed of zinc-coordinating, basic helix-loop-helix, basic-leucine zipper, or helix-turn-helix DNA-binding structures. These DBDs selectively bind specific DNA sequences that range from 4 to 12 bp, and the DNA binding sequences favored by hundreds of TFs have been described. Multiple TF molecules typically bind together at any one enhancer or promoter-proximal element. For example, at least eight different TF molecules bind a 50-bp core component of the interferon (IFN)-β enhancer (Boija, 2018).

Anchored in place by the DBD, the AD interacts with coactivators, which integrate signals from multiple TFs to regulate transcriptional output. In contrast to the structured DBD, the ADs of most TFs are low-complexity amino acid sequences not amenable to crystallography. These intrinsically disordered regions (IDRs) have therefore been classified by their amino acid profile as acidic, proline, serine/threonine, or glutamine rich or by their hypothetical shape as acid blobs, negative noodles, or peptide lassos. Remarkably, hundreds of TFs are thought to interact with the same small set of coactivator complexes, which include Mediator and p300. ADs that share little sequence homology are functionally interchangeable among TFs; this interchangeability is not readily explained by traditional lock-and-key models of protein-protein interaction. Thus, how the diverse ADs of hundreds of different TFs interact with a similar small set of coactivators remains a conundrum. Recent studies have shown that the AD of the yeast TF GCN4 binds to the Mediator subunit MED15 at multiple sites and in multiple orientations and conformations. The products of this type of protein-protein interaction, where the interaction interface cannot be described by a single conformation, have been termed 'fuzzy complexes'. These dynamic interactions are also typical of the IDR-IDR interactions that facilitate formation of phase-separated biomolecular condensates (Boija, 2018).

It has recently been proposed that transcriptional control may be driven by the formation of phase-separated condensates, and it was demonstrated that the coactivator proteins MED1 and BRD4 form phase-separated condensates at super-enhancers (SEs). This study reports that diverse TF ADs phase separate with the Mediator coactivator. The embryonic stem cell (ESC) pluripotency TF OCT4, the estrogen receptor (ER), and the yeast TF GCN4 form phase-separated condensates with Mediator and require the same amino acids or ligands for both activation and phase separation. It is proposed that IDR-mediated phase separation with coactivators is a mechanism by which TF ADs activate genes (Boija, 2018).

The results described in this study support a model whereby TFs interact with Mediator and activate genes by the capacity of their ADs to form phase-separated condensates with this coactivator. For both the mammalian ESC pluripotency TF OCT4 and the yeast TF GCN4, it was found that the AD amino acids required for phase separation with Mediator condensates were also required for gene activation in vivo. For ER, it was found that estrogen stimulates the formation of phase-separated ER-MED1 droplets. ADs and coactivators generally consist of low-complexity amino acid sequences that have been classified as IDRs, and IDR-IDR interactions have been implicated in facilitating the formation of phase-separated condensates. It is proposed that IDR-mediated phase separation with Mediator is a general mechanism by which TF ADs effect gene expression, and evidence is provided that this occurs in vivo at SEs. It is suggested that the ability to phase separate with Mediator, which would employ the features of high valency and low affinity characteristic of liquid-liquid phase-separated condensates, operates alongside an ability of some TFs to form high-affinity interactions with Mediator (Boija, 2018).

The model that TF ADs function by forming phase-separated condensates with coactivators explains several observations that are difficult to reconcile with classical lock-and-key models of protein-protein interaction. The mammalian genome encodes many hundreds of TFs with diverse ADs that must interact with a small number of coactivators, and ADs that share little sequence homology are functionally interchangeable among TFs. The common feature of ADs, the possession of low-complexity IDRs, is also a feature that is pronounced in coactivators. The model of coactivator interaction and gene activation by phase-separated condensate formation thus more readily explains how many hundreds of mammalian TFs interact with these coactivators (Boija, 2018).

Previous studies have provided important insights that prompted an investigation of the possibility that TF ADs function by forming phase-separated condensates. TF ADs have been classified by their amino acid profile as acidic, proline rich, serine/threonine rich, glutamine rich, or by their hypothetical shape as acid blobs, negative noodles, or peptide lassos. Many of these features have been described for IDRs that are capable of forming phase-separated condensates. Evidence that the GCN4 AD interacts with MED15 in multiple orientations and conformations to form a 'fuzzy complex' is consistent with the notion of dynamic low-affinity interactions characteristic of phase-separated condensates. Likewise, the low complexity domains of the FET (FUS/EWS/TAF15) RNA-binding proteins can form phase-separated hydrogels and interact with the RNA polymerase II C-terminal domain (CTD) in a CTD phosphorylation-dependent manner; this may explain the mechanism by which RNA polymerase II is recruited to active genes in its unphosphorylated state and released for elongation following phosphorylation of the CTD (Boija, 2018).

The model described in this study for TF AD function may explain the function of a class of heretofore poorly understood fusion oncoproteins. Many malignancies bear fusion-protein translocations involving portions of TFs. These abnormal gene products often fuse a DNA or chromatin-binding domain to a wide array of partners, many of which are IDRs. For example, MLL may be fused to 80 different partner genes in AML, the EWS-FLI rearrangement in Ewing's sarcoma causes malignant transformation by recruitment of a disordered domain to oncogenes, and the disordered phase-separating protein FUS is found fused to a DBD in certain sarcomas. Phase separation provides a mechanism by which such gene products result in aberrant gene expression programs; by recruiting a disordered protein to the chromatin, diverse coactivators may form phase-separated condensates to drive oncogene expression. Understanding the interactions that compose these aberrant transcriptional condensates, their structures, and behaviors may open new therapeutic avenues (Boija, 2018).

Nucleosome Positioning around Transcription Start Site Correlates with Gene Expression Only for Active Chromatin State in Drosophila Interphase Chromosomes

This study analyzed the whole-genome experimental maps of nucleosomes in Drosophila melanogaster and classified genes by the expression level in S2 cells (RPKM value, reads per kilobase million) as well as the number of tissues in which a gene was expressed (breadth of expression, BoE). Chromatin in 5'-regions of genes was classified into four states according to the hidden Markov model (4HMM). Only the Aquamarine chromatin state was considered as Active, while the remaining three states were defined as Non-Active. Surprisingly, about 20/40% of genes with 5'-regions mapped to Active/Non-Active chromatin possessed the minimal/at least modest RPKM and BoE. Regardless of RPKM/BoE, the genes of Active chromatin possessed the regular nucleosome arrangement in 5'-regions, while genes of Non-Active chromatin did not show respective specificity. Only for genes of Active chromatin does the RPKM/BoE correlate positively with the number of nucleosome sites upstream/around the TSS and negatively with that downstream of the TSS. It is proposed that for genes of Active chromatin, regardless of RPKM value and BoE, the nucleosome arrangement in 5'-regions potentiates transcription, while for genes of Non-Active chromatin, the transcription machinery does not require the substantial support from nucleosome arrangement to influence gene expression (Levitsky, 2020).
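
For reference, RPKM normalizes a gene's read count by gene length (in kilobases) and by sequencing depth (in millions of mapped reads). The following is a minimal sketch of that standard definition; the function name, parameter names, and example numbers are hypothetical and are not taken from Levitsky (2020).

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = reads / (length_kb * mapped_reads_in_millions)
         = reads * 1e9 / (length_bp * total_mapped_reads)
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Invented example values: 500 reads on a 2,000-bp gene in a library
# of 10 million mapped reads
example_value = rpkm(500, 2000, 10_000_000)
```

Because of the two normalizations, values are comparable between genes of different lengths and between libraries of different depths.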

Quantitative imaging of transcription in living Drosophila embryos reveals the impact of core promoter motifs on promoter state dynamics

Genes are expressed in stochastic transcriptional bursts linked to alternating active and inactive promoter states. A major challenge in transcription is understanding how promoter composition dictates bursting, particularly in multicellular organisms. This study investigated two key Drosophila developmental promoter motifs, the TATA box (TATA) and the Initiator (INR). Using live imaging in Drosophila embryos and new computational methods, it was demonstrated that bursting occurs on multiple timescales ranging from seconds to minutes. TATA-containing promoters and INR-containing promoters exhibit distinct dynamics, with one or two separate rate-limiting steps respectively. A TATA box is associated with long active states, high rates of polymerase initiation, and short-lived, infrequent inactive states. In contrast, the INR motif leads to two inactive states, one of which relates to promoter-proximal polymerase pausing. Surprisingly, the model suggests pausing is not obligatory, but occurs stochastically for a subset of polymerases. Overall, these results provide a rationale for promoter switching during zygotic genome activation (Pimmett, 2021).
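
To make the idea of bursting from alternating promoter states concrete, the sketch below simulates a generic two-state ("telegraph") promoter with a Gillespie-style stochastic algorithm. This is an illustrative toy model only: the function name, the rate parameters, and the single inactive state are assumptions of the sketch, and it is not the multi-state model fitted by Pimmett (2021), in which INR-containing promoters have two inactive states.

```python
import random

def simulate_two_state_promoter(k_on, k_off, k_init, t_end, seed=0):
    """Gillespie simulation of a promoter switching between OFF and ON.

    k_on:   OFF -> ON switching rate (per unit time, must be > 0)
    k_off:  ON -> OFF switching rate
    k_init: polymerase initiation rate while ON
    Returns (fraction of time spent ON, list of initiation times).
    """
    rng = random.Random(seed)
    t, active, time_active = 0.0, False, 0.0
    initiations = []
    while True:
        # Total rate of leaving the current configuration
        total_rate = (k_off + k_init) if active else k_on
        dt = rng.expovariate(total_rate)
        if t + dt >= t_end:
            if active:
                time_active += t_end - t
            break
        t += dt
        if active:
            time_active += dt
            # Choose which event fired, in proportion to its rate
            if rng.random() < k_init / total_rate:
                initiations.append(t)
            else:
                active = False
        else:
            active = True
    return time_active / t_end, initiations

# Arbitrary example rates: the promoter is ON about half of the time
fraction_on, events = simulate_two_state_promoter(0.1, 0.1, 1.0, 20000.0, seed=1)
```

In this toy model, initiation events cluster into bursts during ON periods, and the long-run ON fraction approaches k_on / (k_on + k_off).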

Comparison of transcriptional initiation by RNA polymerase II across eukaryotic species

The preinitiation complex (PIC) for transcriptional initiation by RNA polymerase (Pol) II is composed of general transcription factors that are highly conserved. However, analysis of ChIP-seq datasets reveals kinetic and compositional differences in the transcriptional initiation process among eukaryotic species. In yeast, Mediator associates strongly with activator proteins bound to enhancers, but it transiently associates with promoters in a form that lacks the kinase module. In contrast, in human, mouse, and fly cells, Mediator with its kinase module stably associates with promoters, but not with activator-binding sites. This suggests that yeast and metazoans differ in the nature of the dynamic bridge of Mediator between activators and Pol II and the composition of a stable inactive PIC-like entity. As in yeast, occupancies of TATA-binding protein (TBP) and TBP-associated factors (Tafs) at mammalian promoters are not strictly correlated. This suggests that within PICs, TFIID is not a monolithic entity, and multiple forms of TBP affect initiation at different classes of genes. TFIID in flies, but not yeast and mammals, interacts strongly at regions downstream of the initiation site, consistent with the importance of downstream promoter elements in that species. Lastly, Taf7 and the mammalian-specific Med26 subunit of Mediator also interact near the Pol II pause region downstream of the PIC, but only in subsets of genes and often not together. Species-specific differences in PIC structure and function are likely to affect how activators and repressors affect transcriptional activity (Petrenko, 2021).

Transcription factor TFIIEbeta interacts with two exposed positions in helix 2 of the Antennapedia homeodomain to control homeotic function in Drosophila

Homeodomains (HDs) increase their DNA-binding specificity by interacting with additional cofactors outlining a Hox interactome with a multiplicity of protein-protein interactions. In Drosophila, the first link of functional contact with a general transcription factor (GTF) was found between Antennapedia (Antp) and BIP2 (TFIID complex). Hox proteins also interact with other components of Pol II machinery such as the subunit Med19 from Mediator (MED) complex, TFIIEbeta and transcription-pausing factor M1BP. This paper focused on the Antp-TFIIEbeta protein-protein interface to establish the specific contacts as well as its functional role. TFIIEbeta was found to interact with Antp through the HD independently of the YPWM motif, and the direct physical interaction is at helix 2, specifically amino acid positions I32 and H36 of Antp. These two positions in helix 2 are crucial for Antp homeotic function in head involution, and thoracic and antenna-to-tarsus transformations. Interestingly, overexpression of Antp and TFIIEbeta in the antennal disc showed that this interaction is required for the antenna-to-tarsus transformation. These results open the possibility to more broadly analyze Antp-TFIIEbeta interaction on the transcriptional control for the activation and/or repression of target genes in the Hox interactome during Drosophila development (Altamirano-Torres, 2018).

To analyze the interplay between Hox and the general transcription machinery, this study focused on Antp-TFIIEβ protein-protein interface to establish the specific contacts, as well as the functional role of this interaction. The results showed a direct physical interaction of TFIIEβ with the 32 and 36 positions of helix 2 Antp HD in cell culture and in vivo. These two positions on helix 2 HD are required for interaction with TFIIEβ, and this interaction is necessary for homeotic transformation (Altamirano-Torres, 2018).

The results demonstrate that Antp HD was necessary for maintaining the interaction with TFIIEβ. Previous studies have confirmed that the HD is sufficient for interaction with GTFs. For example, it has been found that the AbdA HD was sufficient for TFIIEβ interaction and that when the DNA-binding of the HD is mutated, the interaction is diminished but not abolished. Another example used bimolecular fluorescence complementation (BiFC) in vivo to demonstrate that the Ubx HD and AbdA HD are sufficient for direct interaction with Med19. In addition to the conserved HD affinities to DNA and RNA, several protein-protein interactions also relied on the HD, such as dimerization of Scr, and Antp interaction with Eyeless (Altamirano-Torres, 2018).

Although this study found that the Antp-TFIIEβ interaction is YPWM-independent in BiFC cell culture, and the helix 2 Antp mutant showed neither interaction by BiFC nor functional activity despite an intact YPWM motif, co-expression of the YPWM mutant and TFIIEβ reduced the interaction signal in embryos. A similar result in embryos was found in an earlier study, where the YPWM Antp mutant showed a reduction but not an abolition of TFIIEβ interaction in Drosophila embryos, which could be attributable to the presence of helix 2 in the mutant. Altogether, this suggests that interactions of Antp with TFIIEβ could change from one tissue to another, with complex formation in different tissues using various interfaces (YPWM and/or HD), contributing to the plasticity of Hox interaction properties (Altamirano-Torres, 2018).

Deletional analysis of Antp HD suggested interaction of TFIIEβ through the helix 2 of Antp HD. Based on the reported 3D-structure of Antp HD DNA complex, in which helix 2 is on the opposite side of the HD-DNA binding, this study selected the conserved residues 32 and 36, which are exposed and physically available, as candidates for TFIIEβ interaction. To perform a molecular dissection of the Antp-TFIIEβ interaction, the residues I32 and H36 of helix 2, either individually or together, were studied by site-directed mutagenesis in cell culture. BiFC results show a drastic reduction of the interaction by mutation of these two residues, indicating that they are directly involved in the Antp-TFIIEβ interaction. It has been demonstrated that AntpHD is internalized to the nuclei through the residues 43-58 of the third helix. Therefore, since the mutations examined in this study are present on helix 2, the Antp NLSs were not affected. To confirm this, immunostaining of Antp helix 2 mutants in cells and embryos very clearly showed the nuclear localization of the single and double Antp helix 2 mutants. These results indicated that Antp helix 2 mutants retain the NLSs needed for their localization into the nucleus. Moreover, it was also demonstrated that the helix 2 mutant keeps its transactivation activity and is capable of interacting with EXD in cells and embryos, confirming that mutation of these amino acids did not alter DNA binding affinity or the protein conformation needed to perform essential activities required for in vivo transformation (Altamirano-Torres, 2018).

Since both substitutions by alanines or structurally similar residues affected Antp-TFIIEβ interaction in cell culture in the same manner, the I32A-H36A HD mutant was selected for the in vivo analysis in Drosophila. In concordance with the BiFC cell culture assay, the results showed no interaction in embryos or in imaginal discs with the Antp mutant I32A-H36A. Therefore, residues 32 and 36 of Antp helix 2 are crucial for the interaction with TFIIEβ in BiFC assays in Drosophila embryos and imaginal discs. This is relevant because residues 32 and 36 on Antp helix 2 are identical and highly conserved within Drosophila Hox proteins, so the finding can be extrapolated to the interaction of TFIIEβ with other homeoproteins due to the high Hox conservation (Altamirano-Torres, 2018).

Although the results very clearly show Antp-TFIIEβ interaction through positions 32 and 36 of helix 2, this does not exclude the possibility that other amino acid positions, either at helix 2 or the intervening loop, could be involved to a minor extent in the interaction. For example, positions 30 and 33, in addition to the helix 2 amino acids 32 and 36, have also been reported in the interaction of the human POU proteins Oct-1 and Oct-2 with the VP16 transactivator factor of Herpes Simplex Virus (Altamirano-Torres, 2018).

Because the precise molecular mechanisms of Antp in transcriptional regulation remain unclear, attempts were made to shed light on these by determining whether I32 and H36 are important for Antp function. When Antp is ectopically expressed in embryos, it causes inhibition of head involution and transformation of prothoracic segment T1 into T2 and antennae into mesothoracic (T2) legs. Antp ectopic expression shows that residues 32 and 36 of HD helix 2 are essential for its function in embryo head involution and homeotic transformations of thorax and antenna. The lack of homeotic transformations upon expression of the Antp^I32A-H36A double mutant indicates that residues 32 and 36 of HD helix 2 are absolutely required for the Antp ectopic homeotic function in Drosophila. Likewise, Antp mutated in the YPWM motif is not capable of transforming the antenna, and a single exposed residue on helix 1 of Scr HD is necessary for its homeotic function, showing that besides the HD DNA-binding, exposed positions on the HD are crucial for Hox functional activity (Altamirano-Torres, 2018).

To determine the functional relevance of the Antp-TFIIEβ interaction, co-expression of TFIIEβ and double mutant Antp^I32A-H36A was directed to the antenna, showing a drastic reduction of the antenna transformation. These findings clearly demonstrate that Antp-TFIIEβ interaction (visualized by BiFC in live larvae) is necessary for the Antp homeotic function with a very strong transformation of the antenna into T2 mesothoracic leg. Together, these results imply that very subtle changes of two amino acids in the Antp HD helix 2 can have dramatic effects on protein-protein interaction with TFIIEβ, affecting transcriptional control and the functional properties of antenna-to-tarsus transformation (Altamirano-Torres, 2018).

These results show that the interaction between TFIIEβ and Antp HD contributes to transcriptional regulation and functional activities of Antennapedia. In the Pol II PIC formation, TFIIE is a heterodimer with α and β subunits, regulating TFIIH activities such as kinase on RNA Pol II CTD, ATPase and DNA helicase. TFIIEβ binds to both TFIIB and TFIIF in important activities needed for promoter melting and stabilization as well as for the transition to elongation. Thus, Antp-TFIIEβ interaction may represent a key control point for modulation of transcription factors involved in activation or repression functions. Repression activity of Antp-TFIIEβ interaction may imply destabilization of the PIC complex or the inhibition of TFIIEβ functions modulating TFIIH ATPase, CTD kinase or helicase activities. For example, it has been determined by in vitro transcription and co-immunoprecipitation assays that the zinc-finger TF Kruppel (Kr), a Drosophila segmentation protein for late embryonic development, interacts in a dimeric way with TFIIEβ and this interaction represses transcription. If it is considered that Antp dictates leg fate by repressing the activity of antenna-determining genes such as Hth and Dll in the leg imaginal discs, it could be reasonable that Antp-TFIIEβ can be involved in repression. Co-expression of Antp with TFIIEβ resulted in a reduction to 47% of the expression of Luciferase compared with that of Antp alone; however, further experiments need to be done to evaluate the precise molecular mechanism of this interaction. It could also be possible that Antp facilitates the arrival of TFIIEβ to the PIC and subsequently the recruitment and/or activation of TFIIH, allowing an efficient transcription elongation. For example, mutation of Med19 on haltere imaginal discs shows that Med19 is required for Ubx target gene activation. Another example would be that Kr binds to TFIIB in a monomeric way, and this interaction activates transcription in vitro. Thus, further experiments are needed to determine the fine molecular mechanism of how interaction between Antp and TFIIEβ contributes to transcriptional regulation by activation or repression activities, or even both (Altamirano-Torres, 2018).

This study has presented a clear interaction of TFIIEβ with two amino acid positions of Antp HD that are important for Antp homeotic function, and this interplay is essential to the Antp antenna-to-tarsus transformation. In conclusion, amino acids 32 and 36 of Antp HD helix 2 play a very important role in determining the specificity of the TFIIEβ interaction. Altogether, these results provide insights into the molecular interface of Antp HD with TFIIEβ to evaluate the extent to which these molecular contacts translate into functional properties in activation or repression of target genes. The role of residues 32 and 36 on Antp helix 2 can be extrapolated to the interaction of TFIIEβ with other homeoproteins, for example Scr, Ubx and AbdA, due to the high Hox conservation. In addition, the Antp-TFIIEβ interaction opens the possibility to more broadly explore the interplay between Antp and additional transcription factors in the Hox interactome for the genetic control of development in Drosophila (Altamirano-Torres, 2018).

Functionally distinct promoter classes initiate transcription via different mechanisms reflected in focused versus dispersed initiation patterns

Recruitment of RNA polymerase II (Pol II) to promoters is essential for transcription. Despite conflicting evidence, the Pol II preinitiation complex (PIC) is often thought to have a uniform composition and to assemble at all promoters via an identical mechanism. Using Drosophila melanogaster S2 cells as a model, this study demonstrates that different promoter classes function via distinct PICs. Promoter DNA of developmentally regulated genes readily associates with the canonical Pol II PIC, whereas housekeeping promoters do not, and instead recruit other factors such as DREF. Consistently, TBP and DREF are differentially required by distinct promoter types. TBP and its paralog TRF2 also function at different promoter types in a partially redundant manner. In contrast, TFIIA is required at all promoters, and this study identified factors that can recruit and/or stabilize TFIIA at housekeeping promoters and activate transcription. Promoter activation by tethering these factors is sufficient to induce the dispersed transcription initiation patterns characteristic of housekeeping promoters. Thus, different promoter classes utilize distinct mechanisms of transcription initiation, which translate into different focused versus dispersed initiation patterns (Serebreni, 2023).

Transcription of protein-coding genes by RNA polymerase II (Pol II) is a highly regulated process orchestrated by noncoding regulatory elements, namely enhancers and promoters. Pol II recruitment at promoters leads to transcription initiation from the core promoter region, a roughly 100 base-pair region around the transcription start site (TSS) at the 5' end of protein-coding genes. Although core promoter DNA fragments on their own are typically not sufficient for activity in vivo and support only low levels of transcription in vitro, the TATA-box core promoter is sufficient to bind the TATA-binding protein (TBP) and assemble the Pol II preinitiation complex. This finding suggests that the core promoter DNA sequence has a crucially important function for PIC assembly and transcription and made the TATA-box core promoter subtype a prominent model for studies of PIC assembly and transcription initiation. Based on multiple lines of evidence, promoters in Drosophila melanogaster can be categorized into two broad classes: (i) developmental promoters of developmentally regulated or cell-type-restricted genes that contain TATA-boxes, downstream promoter elements (DPEs), and/or Initiator (INR) motifs and (ii) housekeeping promoters of broadly or ubiquitously expressed genes that contain TCT, DRE, and Ohler1/6 motifs. These two classes of promoters exhibit distinctive regulatory properties, respond differently toward activating cues, and are activated by distinct sets of coactivators. In addition, developmental promoters typically display focused initiation at a single, dominant TSS, whereas housekeeping promoters typically display dispersed initiation at multiple TSSs (Serebreni, 2023).

The general transcription factors (GTFs: TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH) assemble the PIC hierarchically at TATA-box core promoters: the TATA-binding protein (TBP) within TFIID binds to the TATA-box motif in promoter DNA and recruits TFIIA, followed by the remaining GTFs and Pol II. TFIIA cooperates with TFIID to commit PIC assembly into an active state on promoters in vitro. However, the nature of the PIC and PIC assembly at different core promoter subtypes and whether they relate to these promoters' distinct functions, remain unknown; moreover, the distinct properties of core promoter subtypes seem incompatible with a single mechanism of PIC assembly and transcription initiation (Serebreni, 2023).

Some evidence indeed suggests that different promoters utilize different PIC components. For example, some cells do not seem to require TBP, and some promoters require only a subset of GTFs for transcription in vitro or in cells, which is in line with the existence of different stable intermediates or alternative arrangements of the PIC on promoter DNA. Further, promoter-bound multi-subunit protein complexes that are part of the PIC, such as TFIID, can exhibit different arrangements. For instance, the Taf9 subunit of TFIID regulates cell-type-specific genes in neural stem cells, whereas the Taf3 subunit of TFIID activates cell-type-specific genes in myoblasts (Serebreni, 2023).

In addition, some GTFs might not be required in all cells and/or GTF paralogs may regulate transcription in distinct cell types or at specific promoters. The TBP-related factors TBP2 (also known as TRF3) and TBPL1 (TRF2 in Drosophila) have, for example, been implicated in transcription in early steps of mouse oocyte differentiation and during spermatogenesis, respectively. In Drosophila, Trf2 has been suggested to regulate the transcription of ribosomal protein genes, histone H1, and DPE motif-containing promoters. This cumulative evidence suggests that different promoter-bound GTF assemblies may exist on different promoter types and/or in different cell types, which potentially relates to these promoters' distinct properties (Serebreni, 2023).

This study used DNA affinity purification to identify proteins that closely interact with core promoters, combined with protein depletion and PRO-seq to identify proteins that are required for the transcriptional function of core promoters. Differential use of TBP and Trf2 was found at different promoter subtypes, and distinct recruitment mechanisms of TFIIA were found: TFIIA was enriched at developmental promoters in vitro and required for their activity in vivo, suggesting a direct recruitment mechanism and compact PIC architecture at this promoter class. In contrast, TFIIA was not enriched at housekeeping promoters in vitro but still required for their activity in vivo, suggesting an indirect recruitment mechanism and/or dispersed PIC architecture at these promoters. This work suggests that direct recruitment of TFIIA at developmental promoters leads to their focused initiation pattern, whereas indirect recruitment of TFIIA at housekeeping promoters leads to their dispersed initiation pattern (Serebreni, 2023).

In contrast to a prevalent model that Pol II PIC assembly and transcription activation occur similarly at all promoters, this study found that different core promoter types recruit and activate Pol II via distinct strategies that depend on different factors. Developmental promoter DNA is sufficient to recruit and assemble a Pol II PIC from nuclear extract in vitro, by having high affinity to GTFs such as TBP. Found as part of a soluble Pol II holoenzyme in yeast, TBP in complex with TFIIA is tightly associated with chromatin in metazoans and important in directing Pol II PIC assembly on DNA and cofactor mediated transcription in vitro (Serebreni, 2023).

The data indicate that most TATA-less promoters are independent of TBP and utilize TRF2, or TBP and TRF2 in a redundant fashion. Transcription in the absence of TBP has been observed for particular promoters, potentially involving TBP paralogs such as TRF2 in flies. Even though TRF2 has been reported to be unable to bind DNA directly, it may be recruited indirectly to promoters, potentially through interactions with TFIIA and/or TFIID. This is analogous to transcription initiation during oocyte growth when the mammalian TBP paralog TBPL2 cooperates with TFIIA to initiate transcription independently of TFIID. The promoters of snRNA genes also function independently of TBP yet depend on SNAPc. At these promoters, SNAPc seems to directly bind TFIIA and/or TFIIB via an interface shared with TBP (Serebreni, 2023).

The partial redundancy of TBP and TRF2, especially when one of the two is depleted, reconciles the current results with recent structural studies of PIC assembly at non-TATA-box promoters: as TBPL1 or other TBP paralogs had not been considered during complex assembly in vitro, TBP was included in the PIC, irrespective of the promoter type. This might have been possible given the flexibility of the PIC, including TFIID, which has been reported to be sufficiently flexible to accommodate either TBP or TRF2 at different classes of promoters (Serebreni, 2023).

Interestingly, several proteins were found that had been described as insulator or architectural proteins bound to housekeeping promoters, both in vitro and in vivo. This is consistent with the observations that topological chromatin boundaries in Drosophila coincide with housekeeping genes. This could either be a coincidence or—more likely—reflect that these genomic regions and proteins mediate both functions. At least Chromator has transcription-activating activity toward housekeeping core promoters. It is interesting to speculate whether the housekeeping transcriptional program, which is inherently incompatible with cell-type-specific or developmental transcriptional regulation, can per se mediate insulation or if the respective factors have evolved both functions independently (Serebreni, 2023).

Housekeeping promoters also bind sequence-specific TFs such as DREF and M1BP, which in turn interact with cofactors such as GFZF, Chromator and Putzig that, directly or indirectly, recruit GTFs (e.g., TFIIA) and Pol II. These differences in the assembly and stability of the DNA–protein interface and protein complexes might explain the distinct transcription initiation patterns at developmental and housekeeping promoters, which generally exhibit focused and dispersed initiation patterns, respectively. Indeed, forced recruitment of housekeeping activators such as GFZF to arbitrary DNA sequences is sufficient to induce broad transcription initiation patterns, consistent with the initiation patterns observed at housekeeping promoters in vivo and with alternative PIC recruitment. This directly links the transcription-activating cofactors of developmental and housekeeping programs to the distinct initiation patterns observed for the respective promoters. It is of note that even for dispersed housekeeping promoters, TSS choice is not entirely random or arbitrary; certain positions seem to be favored, likely relating to local DNA structure and the energy barrier landscape for both DNA helix melting and phosphodiester bond formation (Serebreni, 2023).

Given that key features of the promoter types, such as their initiation patterns, sequence motifs and their enhancer responsiveness, are observed in Drosophila cell types as different as embryonic S2 cells and adult ovarian OSCs, and because GTFs are typically broadly expressed across cell types, the relative utilization of cofactors is expected to be similar in most cellular contexts. Moreover, while some of the specific TFs do not have one-to-one orthologs outside insects, focused and dispersed initiation patterns are observed across a wide range of species, including mammals. It will be exciting to see how homologous and analogous factors function at these distinct promoter types in different species (Serebreni, 2023).

The alternative mechanisms converge on TFIIA that is essential for transcription initiation at all promoter types. A central role of TFIIA recruitment for transcription initiation is consistent with the direct interaction of the TBP paralog TBPL2 with TFIIA in oocyte transcription, the direct interaction of SNAPc with TFIIA and/or TFIIB and noncanonical Pol II transcription of transposon-rich and H3K9me3-marked piRNA source loci in Drosophila germ cells through the TFIIA paralog moonshiner and TRF2. Essentiality for some or all promoter types might extend to other GTFs that could not be tested in this study, including TFIIB that is required at most promoters in human HAP1 cells (Serebreni, 2023).

Some features of Drosophila housekeeping promoters, including the dispersed patterns of transcription initiation, are similarly observed for the majority of vertebrate CpG island promoters, which comprise roughly 70% of all promoters (FANTOM Consortium and the RIKEN PMI and CLST). The functional regulatory dichotomy of these promoters, combined with the evidence of distinct PIC composition and initiation mechanisms here and in other recent studies, suggests that it is necessary to challenge the notion of a universal model of rigid and uniform PIC assembly. It will be exciting to see future functional, biochemical, and structural studies revealing more diverse transcription initiation mechanisms at the different promoter types in our genomes (Serebreni, 2023).

This study used two complementary strategies to explore the flexibility of enhancers with regard to nucleotide and motif identity at specific enhancer positions as well as the position dependence of motif activity. Even though median enhancer activity drops significantly when randomizing an 8-nt stretch at important positions, many sequence variants, including variants of the wild-type motif but also other TF motifs, can achieve strong enhancer activity. The diverse solutions at each position show that enhancers exhibit some degree of flexibility. However, as only a few hundred out of the >65,000 tested sequences work, the flexibility at any given position is constrained. Similarly, systematically pasting different motifs into hundreds of enhancer positions revealed that motif activity is strongly modulated by the enhancer sequence context. Therefore, constrained sequence flexibility and the modulation of motif function by the sequence context seem to be key features of enhancers (Serebreni, 2023).
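
The figure of more than 65,000 tested sequences is consistent with exhaustive randomization of an 8-nt window, since four possible bases at each of eight positions give 4^8 = 65,536 variants. The minimal Python sketch below works through that arithmetic. The function names and the count of 300 active variants are illustrative assumptions (the text says only "a few hundred"), not values taken from the study.

```python
# Sketch of the sequence-space arithmetic behind an 8-nt randomization.
# Assumption: every position can be any of the four DNA bases.
ALPHABET = "ACGT"


def variant_count(stretch_length: int) -> int:
    """Number of distinct sequences for a fully randomized stretch."""
    return len(ALPHABET) ** stretch_length


def active_fraction(n_active: int, stretch_length: int) -> float:
    """Fraction of the sequence space occupied by n_active working variants."""
    return n_active / variant_count(stretch_length)


if __name__ == "__main__":
    total = variant_count(8)
    # 300 is a hypothetical stand-in for "a few hundred" active variants.
    hypothetical_active = 300
    print(f"8-nt sequence space: {total} variants")
    print(f"Active fraction (hypothetical): {active_fraction(hypothetical_active, 8):.4%}")
```

Under this assumed count, well under 1% of the possible 8-mers at a position would support strong activity, which is the sense in which the flexibility is described as constrained.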

The observation that both Drosophila and human TF motifs require specific enhancer sequence contexts suggests that this is a general principle of enhancers. Even though motifs possess some intrinsic strengths, their potential to activate transcription strongly depends on the sequence context and follows certain syntax rules, including motif flanks, combinations, and distances. Although this study cannot assess the mechanistic causes for these rules, they might be related to local DNA shape or to more general enhancer DNA properties such as DNA bending. The observation that homotypic interactions of certain motifs at close distances (e.g., GATA or ETS) are negatively associated with enhancer activity is consistent with repressive homotypic interactions between pluripotency TFs found by thermodynamic modeling; the mechanisms, however, are still unclear. Intermotif distances can impact the synergy between TFs at the level of DNA binding or after binding, such as cofactor recruitment and activation, which could explain both positive and negative TF-TF interactions. Although these syntax rules seem to be stricter for some TF motifs (e.g., GATA) and more relaxed for others (e.g., P53), the results show that motifs are not simply independent modules. Instead, they interact with all enhancer features in a highly cooperative manner, which can modulate motif activity by more than 100-fold. This is an important result that supports a model where enhancer activity is encoded through a complex interdependence between motifs and context, rather than motifs acting independently and additively. Whereas tissue- or cell type–specificity can already be predicted by motif presence-absence patterns alone, the encoding of different enhancer strengths seems to depend on more complex cis-regulatory syntax rules. The functional implications of mutations in TF motifs or elsewhere within enhancer sequences can therefore only be assessed in the context of these syntax features (Serebreni, 2023).

The motif syntax rules described in this study agree well with the ones learned by DeepSTARR trained on genome-wide enhancer activity data and the BPNet model trained on endogenous TF binding and cooperativity, suggesting that these rules are important in wild-type enhancer sequences. As an ectopic reporter assay, STARR-seq measures the potential of sequences to act as enhancers, even if the sequences might be repressed endogenously at the chromatin level, making it a powerful tool to uncover the sequence determinants for enhancer activity. It will be interesting to explore the sequence rules and mechanisms by which chromatin modulates endogenous enhancer activities and gene expression using complementary methods. In addition, DeepSTARR also predicted with good accuracy the activity of all randomized sequence variants and of motifs pasted in different enhancer contexts. This supports the validity of computational models such as DeepSTARR and their use in in-silico-like experiments (e.g., motif pasting experiments with a larger set of TF motifs across many more genomic positions) to improve understanding of the regulatory information encoded in enhancer sequences and the impact of mutations (Serebreni, 2023).

This study shows that enhancer sequences are flexible enough for enhancer strength to be achieved by a small yet diverse set of sequence variants, and that mutations in information-poor positions have little impact on the enhancer activity in a single cell type. This flexibility allows many different sequences to achieve similar enhancer activities in a single cell type, which might be an important prerequisite for the evolution of developmental enhancers that operate under many additional constraints, for example, regarding the precise spatiotemporal control of enhancer activities. As the activity in a given cell can be achieved by many solutions, the specific solutions that fulfill additional requirements can be explored during evolution. Indeed, previous studies that have analyzed expression changes of enhancer mutations across different cell types in vivo have observed that the cell type–specific expression patterns of enhancers can change upon (minimal) sequence perturbations. The fact that enhancer strength in any given cell type and enhancer specificity across cell types and developmental time are subject to different yet overlapping sequence constraints highlights the complexity of the regulatory code. It is expected that the combination of quantitative enhancer-sequence-to-function models in individual cell types and qualitative predictions of enhancer activities across cell types will provide unprecedented progress in understanding of enhancer biology and the ability to read and write enhancer sequences (Serebreni, 2023).

Assessment of the roles of Spt5-nucleic acid contacts in promoter proximal pausing of RNA polymerase II

Promoter proximal pausing of RNA Polymerase II (Pol II) is a critical transcriptional regulatory mechanism in metazoans that requires the transcription factor, DSIF (DRB sensitivity-inducing factor) and the negative elongation factor NELF. DSIF, composed of Spt4 and Spt5, establishes the pause by recruiting NELF to the elongation complex. However, the role of DSIF in pausing beyond NELF recruitment remains unclear. This study used a highly purified in vitro system and Drosophila nuclear extract to investigate the role of DSIF in promoter proximal pausing. Two domains of Spt5 were identified, the KOW4 and NGN domains, that directly facilitate Pol II pausing. The KOW4 domain promotes pausing through its interaction with the nascent RNA while the NGN domain does so through a short helical motif that is in close proximity to the non-transcribed DNA template strand. Removal of this sequence in Drosophila has a male-specific dominant negative effect. The alpha helical motif is also needed to support fly viability. It was also shown that the interaction between the Spt5 KOW1 domain and the upstream DNA helix is required for DSIF association with the Pol II elongation complex. Disruption of the KOW1-DNA interaction is dominant lethal in vivo. Finally, the KOW2-3 domain of Spt5 was shown to mediate the recruitment of NELF to the elongation complex. In summary, these results reveal additional roles for DSIF in transcription regulation and identify specific domains important for facilitating Pol II pausing (Dollinger, 2023).

Eukaryotic transcription is a highly regulated process that depends on the precise spatiotemporal coordination of multiple interacting factors at each stage of the transcription cycle. Initiation, elongation, and termination have long been regarded as the primary canonical steps of this cycle. However, promoter proximal pausing of RNA polymerase II (Pol II) is now recognized as an additional critical post-initiation step in metazoan transcription. Promoter proximal pausing is characterized by an accumulation of Pol II ~30 to 60 nucleotides downstream of the transcription start site. This phenomenon was first observed as a concentration of transcriptionally engaged Pol II at the 5′ end of the beta-globin gene in nuclei from mature hen erythrocytes that were expected to be transcriptionally silent. Several subsequent studies led to the observation of similar phenomena on mammalian c-myc and HIV-1, as well as at non-induced Drosophila heat shock genes. The work by Gilmour and Lis on the Drosophila hsp70 gene established that a single Pol II molecule associates with the non-induced hsp70 gene in the region between −12 and +65, and subsequent experiments demonstrated that this Pol II is transcriptionally engaged. Since then, genomic methods have provided overwhelming evidence that promoter proximal pausing is a ubiquitous step in the transcription cycle for most Drosophila and mammalian protein-coding genes. Pausing is associated with several critical regulatory functions, including developmental control and the maintenance of a nucleosome-free, permissive chromatin architecture around promoters (Dollinger, 2023).

Promoter proximal pausing requires Transcriptional inhibitor DRB sensitivity-inducing factor (DSIF) and negative elongation factor (NELF), two factors that function cooperatively to establish the pause. DSIF is a widely conserved eukaryotic transcription factor that associates with the elongation complex after the transcription of at least 18 nucleotides. The role of DSIF in pausing was first identified as an activity that rendered Pol II transcription sensitive to inhibition by the nucleoside analog 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB). NELF was identified as an inhibitory factor that, together with DSIF, works to repress metazoan Pol II transcription. Release of the pause and the transition to productive elongation is thought to be mediated by the cyclin-dependent kinase positive transcription elongation factor b (P-TEFb; a dimer of Cyclin dependent kinase 9 and Cyclin T), which phosphorylates Pol II, DSIF, and NELF, resulting in the ejection of NELF from the elongation complex and the transformation of DSIF from a negative to a positive elongation factor (Dollinger, 2023).

A structure of the human paused elongation complex containing Pol II, DSIF, and NELF sheds light on the possible mechanisms by which NELF induces the pause. In this model, NELF stabilizes the formation of a half-translocated RNA–DNA duplex in the active site, preventing an incoming nucleotide from base pairing with the template. Furthermore, the interaction between NELF-C and the open Pol II trigger loop may interfere with trigger loop folding, which is needed to close off the active site and facilitate nucleotide addition. However, the role of DSIF in promoter proximal pausing has been less clear. DSIF is the lynchpin of the paused elongation complex because it is required to recruit NELF, but how the interactions between DSIF and the Pol II elongation complex contribute to pausing remains ambiguous. Several in vitro studies using highly purified systems indicate that on its own, DSIF either has no effect or a slight stimulatory effect on transcription. Hence, whether DSIF serves solely as an adapter that recruits regulators of elongation or itself contributes to pausing is an open question (Dollinger, 2023).

Of particular interest are the interactions between the Spt5 subunit and the nucleic acid scaffold. Spt5 has several domains, including unstructured N- and C-terminal regions, a NusG N-terminal (NGN) domain, and several Kyrpides, Ouzounis, Woese (KOW) domains. Structures of the human elongation complex revealed that the NGN and KOW1 domains form part of the upstream DNA exit tunnel and that the KOW4 and KOW5 domains form a clamp around the nascent transcript. Comparison of the Spt5 conformations between cryo-EM structures of the paused and active elongation complexes highlights a repositioning of the KOW1 and KOW4 domains upon pause release, resulting in an opening of the nucleic acid clamps. Translocation of Pol II requires the movement of the nucleic acids through their respective exit channels. For Pol II to move along the DNA, the upstream DNA must be able to exit through the upstream DNA exit channel, the mouth of which is framed by the Spt5 DNA clamp, and the nascent transcript must exit through the Spt5 RNA clamp (Dollinger, 2023).

This study hypothesized that Spt5–nucleic acid interactions facilitate promoter proximal pausing by restricting the movement of the upstream DNA and nascent RNA through their exit channels. To test this hypothesis, DSIF mutants were generated in which the charges of basic nucleic acid-interacting residues of Spt5 were reversed. To identify the pausing functions of the Spt5-nucleic acid contacts, a highly purified in vitro system was used to screen these mutants for Pol II binding and NELF recruitment. Each mutant’s ability to rescue promoter proximal pausing was then tested in Drosophila nuclear extract depleted of wild-type DSIF. It was found that the contacts between the KOW1 domain and the upstream DNA mediate the association of DSIF with the elongation complex; since DSIF binding to the elongation complex is a prerequisite for NELF recruitment, the KOW1-DNA interaction thus governs promoter proximal pausing indirectly. Furthermore, the expression of the Spt5 KOW1 mutant is lethal in Drosophila. In contrast, the interactions between the KOW4 domain and the nascent transcript directly facilitate promoter proximal pausing. A short helical motif in the NGN domain was identified that is critical to facilitating the pause. This sequence is highly conserved in eukaryotes that encode NELF but notably absent in eukaryotes that lack promoter proximal pausing and NELF. In flies, the replacement of this helical motif with homologous sequences from Saccharomyces cerevisiae and Caenorhabditis elegans results in a male-specific dominant negative effect. Spt5 NGN mutants also fail to support Drosophila viability when wild-type Spt5 has been depleted with RNAi. Taken together, these results provide a functional assessment of the various domains of Spt5 (Dollinger, 2023).

This work provides a functional assessment of the roles of various Spt5 domains in facilitating promoter proximal pausing. In addition to mediating interactions between NELF and the Pol II elongation complex, DSIF facilitates promoter proximal pausing through the KOW4 and NGN domains of the Spt5 subunit. The KOW4 domain interacts extensively with the nascent transcript; work from the Cramer group has shown that this domain switches from a 'closed' to an 'open' conformation when the elongation complex transitions from a paused state to an active state, suggesting that disengagement of the KOW4 domain from the RNA is a prerequisite for pause release. The present work supports this hypothesis. Reversing the charge of KOW4 residues anticipated to interact with the RNA results in a pausing defect in Drosophila nuclear extract. Notably, this defect is accompanied by robust Pol II binding and NELF recruitment that is comparable to that of WT DSIF, indicating an effect of the mutations on Pol II pausing. Thus, the maintenance of the promoter proximal pause is likely dependent in part on the KOW4-RNA interaction, which is likely disrupted by the opening of the Spt5 RNA clamp (Dollinger, 2023).

The KOW4 domain’s interaction with the nascent transcript may depend on the phosphorylation state of the linker region between the KOW4 and KOW5 domains. A previous study has shown that phosphorylation of this region by P-TEFb can act as a switch that determines whether Pol II enters productive elongation or prematurely terminates. Phosphorylation of the KOW4-5 linker on Ser666 by P-TEFb in human cells is associated with an increased proportion of Pol II in the gene body. This phosphorylation event may result in structuring of the flexible linker that forces the opening of the RNA clamp, allowing pause release. The KOW4–RNA interaction may also be mediated by NELF-E. The flexible NELF-E tentacle was shown to crosslink to the Spt5-KOW4 domain along the mouth of the RNA exit channel. Interaction with NELF may help stabilize the KOW4 domain in the 'closed' position, facilitating pausing (Dollinger, 2023).

Ectopic expression of the KOW4-Asp mutant in Drosophila did not have a dominant negative effect and the mutant was able to support viability in flies expressing Spt5 RNAi. This suggests that mutating the RNA-interacting residues of the KOW4 domain may not be sufficient to fully disrupt the promoter proximal pause in vivo. Additional contacts provided by the Spt5 NGN domain, NELF, and other factors such as nucleosomes present a much more complex regulatory context than the one reconstituted using Drosophila nuclear extract, which could account for the apparent discrepancy between the current in vitro and in vivo results (Dollinger, 2023).

The Spt5 NGN domain also plays a significant role in pausing. Replacement of a short helical motif in the Drosophila NGN domain with homologous unstructured loop regions from yeast or worms results in a severe pausing defect while leaving Pol II binding and NELF recruitment functions intact. This is the first report of a role for the NGN domain in transcriptional pausing in a eukaryotic system. The possible function of a conserved arginine, hR246 (dR283), that was oriented to interact with the non-transcribed template strand was explored. Though no difference was observed in pausing activity between the NGN-K.p. and NGN-K.p._R mutants in nuclear extract, re-insertion of the arginine had a dramatic effect in flies. The NGN-K.p._R mutant had a less severe dominant negative effect than its NGN-K.p. counterpart, indicating that the conserved arginine residue is critical to the NGN domain’s function. None of the NGN-S.c., NGN-K.p., or NGN-K.p._R mutants was able to support Drosophila viability when expressed in the presence of Spt5 RNAi, indicating that the full NGN alpha-helical motif is necessary for proper fly development (Dollinger, 2023).

Experiments in Bacillus subtilis previously described RNA polymerase pausing mediated by the interaction of the NGN domain of the bacterial homolog NusG with the non-transcribed DNA in the transcription bubble. However, unlike in Drosophila, this process is dependent on the presence of a DNA sequence motif and does not involve a helical motif similar to what is described in this study. Indeed, the alpha helical motif appears to be exclusive to NELF-encoding eukaryotes, though the conserved hR246 (dR283) residue also appears in archaeal species. Available structures of archaeal Spt5 indicate that this arginine is located in a beta strand rather than the alpha helix found in metazoans. This beta strand is also present in E. coli and in B. subtilis, but both these species lack the conserved arginine found in NELF-encoding eukaryotes and archaea. Notably, in archaea, the NGN domain is required for stimulation of elongation, suggesting that the function of the conserved arginine is context-dependent (Dollinger, 2023).

The NGN domain is highly conserved across all domains of life and exhibits significant structural similarity from species to species. Paradoxically, the function of this domain is varied. In some cases, such as E. coli, archaea, and S. cerevisiae, the NGN domain stimulates elongation, but in B. subtilis and Drosophila, the NGN domain promotes pausing. It is proposed that the DNA-interacting region of the NGN domain is a subdomain that has evolved to serve different functions in various species. This may explain how the highly conserved NGN domain can serve as both a stimulator and a repressor of transcription. Ectopic expression of the NGN-S.c. and NGN-K.p. mutants greatly inhibited the development of adult male flies. In Drosophila, the NGN domain may also promote dosage compensation by stimulating the upregulation of genes on the single male X chromosome. Spt5 has been shown to interact with the dosage compensation factor male-specific lethal (MSL1) through the NGN domain as well as through the KOW domains. Though the mechanisms of this interaction are unknown, it is possible that mutations of the NGN domain described in this study disrupted either the association between Spt5 and MSL1 or their joint function, resulting in the male-specific dominant negative effect that was observed. The NGN domain’s non-transcribed-DNA-interacting region is likely a hotspot for regulating Pol II processivity, making it a logical target for transcription regulation by MSL1 (Dollinger, 2023).

The NGN mutations described in this study may have also disrupted the function of RNA polymerase I (Pol I). Mass spectrometry and immunoprecipitation experiments in yeast demonstrated that Pol I is able to associate with Spt4/5 and later genetic studies demonstrated that Spt5 regulates Pol I transcription. This interaction is mediated at least in part by the Spt5 NGN domain. Thus, it is possible that replacing the NGN helical motif in vivo disrupted not only the processivity of Pol II but also the processivity of Pol I, dysregulating the synthesis of ribosomal RNA. Such a substantial disruption would account not only for the failure of the NGN mutants to support Drosophila viability but could also explain the dominant lethality of the KOW1-Asp mutant given that the KOW1 and NGN domains together form the DNA clamp (Dollinger, 2023).

This study also demonstrated that disrupting the interaction between the Spt5 KOW1 domain and the upstream DNA results in impaired binding of DSIF to the Pol II elongation complex. This is in agreement with previous work in yeast, which showed that deletion of this domain reduced the affinity of Spt5 for the elongation complex. The KOW1 domain is the only KOW domain conserved across all three domains of life, so its role in Pol II elongation complex binding is likely a conserved feature in Spt5 and Spt5 homologs. Ectopic expression of the KOW1-Asp mutant in Drosophila had a dominant lethal effect, highlighting the importance of this region. In addition to facilitating Spt5-Pol II interaction, the KOW1 domain also ensures physical separation of the upstream DNA and the transcript, potentially preventing the formation of irregular structures such as R-loops, which have been linked to genome instability (Dollinger, 2023).

Surprisingly, no elongation complex binding defect was observed in the KOW4-Asp mutant, suggesting that the interaction between the KOW4 domain and the nascent transcript is not necessary for DSIF-Pol II binding. This is in contrast to previous studies in yeast and Drosophila. In yeast, digestion of the nascent transcript with RNaseI nearly eliminated Spt5 binding to the elongation complex. Moreover, a prior study showed that DSIF failed to bind to Pol II elongation complexes that had transcripts shorter than 18 nucleotides. However, because varying the transcript length in these elongation complexes also resulted in varying the length of the upstream DNA, the decrease in DSIF binding could be attributed to reduced interaction between the DNA template and the Spt5 KOW1 domain rather than loss of the KOW4–transcript interaction. Complexes with 18 nt transcripts only have ~4 base pairs of double-stranded upstream DNA extending out of the Pol II; based on the structures of the human elongation complex, association with the KOW1 domain requires at least ~10 bp of upstream DNA (Dollinger, 2023).

The mechanism of Pol II-DSIF association may nevertheless rely on multiple contact points. While this study has shown that the KOW1 domain is necessary for initial Pol II elongation complex-DSIF binding, recent structural experiments demonstrated that Spt5 can be retained on the elongation complex despite the displacement of the KOW1 and NGN domains and Spt4. This suggests that the RNA clamp formed by the KOW4 and KOW5 domains may function to preserve the Pol II-DSIF interaction after the initial association. Furthermore, though no effect of the Spt5 KOW2-3 domain mutations on Pol II binding was seen, it is possible that this region also plays a stabilizing role that helps maintain the association of DSIF with the elongation complex during various conformational transitions (Dollinger, 2023).

Of the nine DSIF mutants described in this study, all but one were able to bind NELF to a degree that was comparable to WT DSIF. This is perhaps unsurprising since the mutations in the Spt5 NGN and KOW1 domains are not located near the modeled paths of the NELF-A and NELF-E tentacles. Moreover, no effect was observed on NELF binding by the mutations in the Spt5 KOW4 domain. The NELF-E C-terminal tentacle is thought to stretch across the mouth of the RNA exit channel between the nascent transcript and the KOW4 domain, so it seemed likely that disrupting the contact between the Spt5 domain and the RNA would also disturb the NELF-E interaction. Nevertheless, the observation is in line with that of a previous study that deleted the NELF-E tentacle and failed to see an effect on pausing in vitro. Furthermore, mutating a pocket of residues (Spt4 R79, R82, K109) in close proximity to a putative contact point between Spt4 and NELF-A that was previously identified by crosslinking mass spectrometry had no effect on NELF recruitment, suggesting that crosslinking results must be interpreted with caution and followed up with biochemical analyses, particularly with regard to intrinsically disordered regions such as the NELF-A tentacle. It is possible that NELF recruitment is mediated in part by Spt4-NELF-A interaction, but verifying this will require careful and systematic biochemical assessment of both Spt4 and the NELF-A C-terminus. Previous biochemical data suggest that deletion of the NELF-A tentacle impairs Pol II pausing in vitro, so future work to interrogate the intrinsically disordered regions of this subunit will be necessary for a complete mechanistic description of NELF recruitment to the elongation complex (Dollinger, 2023).

Mutating the KOW2-3 domain of Spt5 reduced NELF binding in the in vitro system. The KOW2-3 domain is located near the modeled path of the NELF-E N-terminal region and has the greatest number of putative NELF-E contacts. Notably, NELF binding was not completely abolished and could be restored by adding greater quantities of NELF. Moreover, the KOW2-3-S.c. mutant exhibited no dominant negative effect when expressed in flies and was able to rescue the effects of RNAi knockdown of endogenous Spt5, suggesting that while the KOW2-3 domain contributes to NELF recruitment to the elongation complex, the mutations made in this study did not interfere with development. Recent work in human cells showed that the formation of biomolecular condensates mediated by the NELF-A tentacle enhances the recruitment of NELF to promoters. It is possible that a similar phenomenon occurs at Drosophila promoters, resulting in a cellular concentration of NELF that is sufficient to overcome the defect of the KOW2-3-S.c. mutant. Interactions between NELF and Spt4, NELF and Pol II, as well as NELF and the nucleic acid scaffold likely serve as additional stabilizing contact points and may even drive the initial recruitment of the NELF complex (Dollinger, 2023).

This study performed an extensive analysis of the domains of the larger DSIF subunit, Spt5, and showed that the NGN and KOW4 domains facilitate pausing in a manner distinct from the role of DSIF as the mediator of NELF-Pol II interaction. It was also shown that the KOW1 domain facilitates DSIF binding to the Pol II elongation complex and that the KOW2-3 domain contributes to NELF recruitment (Dollinger, 2023).

Distinct gene-selective roles for a network of core promoter factors in Drosophila neural stem cell identity

The transcriptional mechanisms that allow neural stem cells (NSC) to balance self-renewal with differentiation are not well understood. Employing an in vivo RNAi screen, this study identified NSC-TAFs, a subset of nine TATA-binding protein associated factors (TAFs), as NSC identity genes in Drosophila. Depletion of NSC-TAFs results in decreased NSC clone size, reduced proliferation, defective cell polarity and increased hypersensitivity to cell cycle perturbation, without affecting NSC survival. Integrated gene expression and genomic binding analyses revealed that NSC-TAFs function with both TBP and TRF2, and that NSC-TAF-TBP and NSC-TAF-TRF2 shared target genes encode different subsets of transcription factors and RNA-binding proteins with established or emerging roles in NSC identity and brain development. Taken together, these results demonstrate that core promoter factors are selectively required for NSC identity in vivo by promoting cell cycle progression and NSC cell polarity. Because pathogenic variants in a subset of TAFs have all been linked to human neurological disorders, this work may stimulate and inform future animal models of TAF-linked neurological disorders (Neves, 2019).

In order to understand in more detail the basis for NSC identity this study carried out a focused RNAi screen in live Drosophila, testing transcriptional regulators that influence the number of NSCs and the size of NSC lineages. Unexpectedly it was found that a subset of TAFs (TATA box-binding protein-associated factors) and the TBP-related factor 2 (TRF2) are required for maintaining normal NSC numbers, NSC proliferation and NSC cell polarity, but do not appear to be required for NSC survival. It was further found that NSCs depleted for TAFs or TRF2 are hypersensitive to cell cycle manipulation. TAFs have been well characterized as subunits of the ~1 megadalton Transcription Factor IID (TFIID) complex comprised of the TATA box-binding protein (TBP) and 13 individual TAFs. The main function of TFIID is to recognize and bind to the core promoter, a segment of DNA that is sufficient to direct accurate and efficient RNA polymerase II transcription (Neves, 2019).

A number of recent studies suggest that some TFIID subunits are neither universally required for gene expression nor invariant. However, while different subsets of TFIID subunits have been shown to be required for the self-renewal of both murine and human embryonic stem cells (ESCs), the function of TAFs in stem cell populations in vivo has not been investigated. Notably, a number of recent genetic studies have identified pathogenic variants in several TFIID subunits. First, variants in TAF1, TAF2, TAF8 and TAF13 have all been linked to intellectual disability and microcephaly. Second, insertion of an SVA-type retrotransposon in a noncoding region of TAF1, which results in abnormal splicing and reduced expression of TAF1 in patient-derived NSCs, is associated with X-linked dystonia parkinsonism. Third, TBP is a candidate microcephaly and intellectual disability gene in patients with a subtelomeric 6q deletion, whereas de novo expansion of CAG repeats in TBP is thought to cause spinocerebellar ataxia 17. Finally, mutations in TAF6 have been linked to a Cornelia de Lange-like syndrome, a clinically heterogeneous disorder characterized by developmental delay and intellectual disability. This study extends the role of TAFs, TBP and TRF2 in developmental gene regulation by showing they are members of a core promoter network involved in NSC identity (Neves, 2019).

The emergence in metazoans of both TBP and TAF paralogs, and of additional core promoter elements, has been proposed to contribute to the evolution of bilaterians by supporting more complex transcriptional programs. Evidence from genetic and biochemical studies in a wide variety of model systems suggests that this diversity has indeed allowed multiple TAFs to take on cell- or tissue-specific functions. For example, the TAF9 paralog TAF9B regulates neuronal gene expression by associating with the SAGA/PCAF co-activator complex, whereas the TAF7 paralog TAF7L associates with TBP-related factor 2 (TRF2) to direct expression of a subset of post-meiotic genes during spermiogenesis. However, these examples are generally restricted to orphan TAFs, while prototypical TAFs are primarily present in TFIID and/or SAGA complexes (Neves, 2019).

In this study, using Drosophila NSCs as a model, gene-selective functions have been uncovered for a subset of TAFs, NSC-TAFs, and some of these functions are shared with TRF2 whereas others are shared with their canonical binding partner, TBP. The finding that NSC-TAFs did not regulate survival was unexpected, as deletion of Taf9 in chicken DT40 cells, of Taf4a in mouse embryos or TAF9 depletion in wing disc epithelial cells all result in increased apoptosis. However, it is unclear whether TAFs are required for survival of embryonic stem cells (ESCs) as inducible depletion of TAF8 resulted in ESC cell death in one study whereas no cell death was detected upon knockdown of either TAF5 or TAF6 in a different study. In contrast to a report using murine ESCs in which a stable TAF5 knockdown ESC line prematurely differentiated without affecting the cell cycle, this study shows that NSC-TAFs and TRF2 control stem cell identity in part through direct regulation of the cell cycle. Depletion of NSC-TAFs or TRF2 by RNAi diminished the number of NSCs that incorporated the thymidine analog EdU, lowered the mitotic index, and rendered NSCs hypersensitive to cell cycle manipulation. Moreover, DamID peaks were identified at key cell cycle genes, including E2F1, CycE and string. It was initially hypothesized that NSCs depleted for NSC-TAFs or TRF2 would both exhibit an extended G1 phase and be hypersensitive to manipulation of the G1/S transition. However, quantification of cell cycle phases using a FUCCI-based reporter showed that NSCs depleted for NSC-TAFs or TRF2 were in fact primarily in G2 and exhibited hypersensitivity to manipulation of both the G1/S and G2/M transitions. Intriguingly, a recent study showed that quiescent NSCs, which are known to extend a primary process, arrest primarily in G2 and are labeled by tribbles (trbl), which encodes a conserved pseudokinase. These phenotypes are remarkably similar to those observed upon depletion of NSC-TAFs or TRF2 and it is noted that the trbl locus is occupied by TBP, TRF2S and TAF5 (Neves, 2019).

Because NSC-TAFs and TRF2 exhibit similar loss of function phenotypes and share at least 45 target genes in addition to the type I NSC marker ase, it is proposed that they function together to direct expression of a subset of NSC-expressed genes. This hypothesis was tested by combining expression analysis using RNA-seq of FACS-purified NSCs with determination of the genomic binding sites for TBP, TRF2S and a representative NSC-TAF (TAF5), using Targeted DamID (TaDa). RNA-seq experiments revealed that many genes co-regulated by TBP and TAF9 are known or predicted to be important for NSC identity including the chromatin remodeler domino, the cell cycle genes E2F1, CycE and string and the temporal identity factors Syp and svp. However, the functional relevance of the NSC-TAF-TBP target genes remains to be determined. Similarly, TAF9-dependent genes that were unaffected upon TBP knockdown and that are bound by TAF5 are good candidates for mediating NSC-TAF's function in self-renewal, such as the transcription factors Chinmo, Klumpfuss, HmgD and the polarity protein Insc. Lastly, while the functions of the transcription factors (dati, hng3, HmgZ, mamo, Hr4, bi, jim) that are co-regulated by TRF2 and TAF9 and co-occupied by TRF2S and TAF5 are not well characterized, Dati, Jim and Mamo have recently been identified as important components of gene regulatory networks uncovered in a single-cell RNA-seq atlas of the adult Drosophila brain (Neves, 2019).
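The three candidate gene classes described above are defined by combining knockdown-dependent expression changes with factor occupancy, which amounts to simple set operations. The following Python sketch illustrates that logic only. The function name, the class labels and the set memberships are hypothetical; gene symbols are borrowed from the text, but the toy sets are not the published RNA-seq or TaDa data.

```python
# Illustrative sketch only: set membership below is a toy assumption,
# NOT the published RNA-seq or Targeted DamID results.

def classify_targets(taf9_dep, tbp_dep, trf2_dep, taf5_bound, trf2s_bound):
    """Group TAF9-dependent genes into the three classes named in the text.

    The classes are not forced to be mutually exclusive, because the text
    does not state that they are.
    """
    return {
        # Genes whose expression depends on both TAF9 and TBP.
        "TAF9+TBP co-regulated": taf9_dep & tbp_dep,
        # TAF9-dependent, unaffected by TBP knockdown, and bound by TAF5.
        "TAF9-dependent, TBP-independent, TAF5-bound":
            (taf9_dep - tbp_dep) & taf5_bound,
        # Co-regulated by TRF2 and TAF9, co-occupied by TRF2S and TAF5.
        "TAF9+TRF2 co-regulated, TRF2S+TAF5 co-occupied":
            taf9_dep & trf2_dep & trf2s_bound & taf5_bound,
    }

# Hypothetical toy inputs (gene symbols from the text, membership assumed).
taf9_dep = {"domino", "E2F1", "CycE", "string", "chinmo", "insc", "dati", "jim", "mamo"}
tbp_dep = {"domino", "E2F1", "CycE", "string"}
trf2_dep = {"dati", "jim", "mamo"}
taf5_bound = {"E2F1", "CycE", "string", "chinmo", "insc", "dati", "jim", "mamo"}
trf2s_bound = {"dati", "jim", "mamo"}

classes = classify_targets(taf9_dep, tbp_dep, trf2_dep, taf5_bound, trf2s_bound)
for label, genes in classes.items():
    print(f"{label}: {sorted(genes)}")
```

With real data, each input set would come from a differential-expression call or a peak-to-gene assignment, and the thresholds used for those calls would determine the sizes of the resulting classes.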

Because TRF2 neither binds the TATA box nor has sequence-specific DNA binding activity, how does the putative NSC-TAF-TRF2 complex recognize its target genes? Several lines of evidence suggest that DREF, which directly binds the DRE element, could be part of the DNA-targeting mechanism. First, depletion of DREF also results in fewer NSCs and smaller NSC lineages. Second, DREF is known to be part of the TRF2 complex. Third, motif analyses with HOMER revealed that the DRE was over-represented in peaks identified by TaDa for all three fusion proteins. While a TRF2 complex larger than 500 kDa has been purified from embryonic nuclear extracts, none of the identified TRF2-binding proteins were TAFs. However, a more recent identification of TRF2S-binding proteins from ovary lysates identified several TAFs, raising the possibility that NSC-TAFs and TRF2 form a complex in vivo (Neves, 2019).

While depletion of TBP did not result in loss of Ase expression, nor diminish the number of NSCs, TBP was found to be essential for NSC cell cycle progression and directly regulated expression of many cell cycle genes. Intriguingly, knockdown of the RNA Pol II subunit RpII33 resulted in a more severe cell proliferation phenotype than TBP knockdown, but it is noted that a complication of the RNAi experiments is that TBP depletion reduced expression of UAS transgenes including the RNAi transgene itself whereas TRF2 or NSC-TAF depletion increased expression of UAS transgenes. However, given the lack of NSC loss in TBP-depleted brains, it was surprising to find that more than a third of NSC-expressed genes detected by low-input RNA-seq were affected upon TBP knockdown. This is even more striking considering the fact that several NSC-TAFs (TAF1, TAF4, TAF8 and TAF9) were among the genes downregulated upon TBP depletion yet the NSC-TAF and TBP phenotypes are clearly different. Importantly, a study that sought to model the human neurological disorder SCA17 in Drosophila showed evidence that TBP is required for normal brain function, as removal of one copy of Tbp recapitulates some features of SCA17, such as impaired motility and age-dependent accumulation of vacuoles in the brain (Neves, 2019).

It was reasoned that the subset of TAFs that were identified is unique, as it is distinct from any previously described TAF complex. For example, NSC-TAFs partially overlap with a subset of TAFs that appear to co-regulate the size and composition of lipid droplets in the Drosophila fat body with TRF2 (Fan, 2017), yet that study did not identify lipid droplet functions for TAF2, TAF7 or TAF8, which are NSC-TAFs. Similarly, in mouse ESCs TFIID was proposed to be integral to the pluripotency circuitry yet depletion of two NSC-TAF orthologs (TAF7 or TAF8) did not affect ESC identity (Neves, 2019).

Recent exome sequencing studies have produced compelling evidence that pathogenic variants in TAF1, TAF2, TAF8 and TAF13 are linked to intellectual disability and microcephaly. However, none of these variants have been modeled in vivo, in part due to a gap in our understanding of the function of TAFs during brain development. By demonstrating that TAFs are required for NSC cell cycle progression, NSC cell polarity, and act to prevent premature differentiation while not affecting survival, this work provides a foundation for future studies aimed at uncovering the causative variants in these disorders (Neves, 2019).

The catalytic-dead Pcif1 regulates gene expression and fertility in Drosophila

Eukaryotic mRNAs are modified at the 5' end with a methylated guanosine (m(7)G) that is attached to the transcription start site (TSS) nucleotide. The TSS nucleotide is 2'-O-methylated (Nm) by CMTR1 in organisms ranging from insects to human. In mammals, the TSS adenosine can be further N(6)-methylated by RNA polymerase II phosphorylated CTD-interacting factor 1 (PCIF1) to create m(6)Am. Curiously, the fly ortholog of mammalian PCIF1 is demonstrated to be catalytic-dead, and its functions are not known. This study shows that Pcif1 mutant flies display reduced fertility, which is particularly marked in females. Deep sequencing analysis of Pcif1 mutant ovaries revealed transcriptome changes with a notable increase in expression of genes belonging to the mitochondrial ATP synthetase complex. Furthermore, the Pcif1 protein is distributed along euchromatic regions of polytene chromosomes, and the Pcif1 mutation behaved as a modifier of position-effect-variegation (PEV) suppressing the heterochromatin-dependent silencing of the white gene. Similar or stronger changes in the transcriptome and PEV phenotype were observed in flies that expressed a cytosolic version of Pcif1. These results point to a nuclear cotranscriptional gene regulatory role for the catalytic-dead fly Pcif1 that is probably based on its conserved ability to interact with the RNA polymerase II carboxy-terminal domain (Franco, 2023).

list of proteins involved in messenger RNA synthesis

References

Altamirano-Torres, C., Salinas-Hernandez, J. E., Cardenas-Chavez, D. L., Rodriguez-Padilla, C. and Resendez-Perez, D. (2018). Transcription factor TFIIEbeta interacts with two exposed positions in helix 2 of the Antennapedia homeodomain to control homeotic function in Drosophila. PLoS One 13(10): e0205905. PubMed ID: 30321227

Aoyagi, N. and Wassarman, D. A. (2000). Genes encoding Drosophila melanogaster RNA polymerase II general transcription factors: diversity in TFIIA and TFIID components contributes to gene-specific transcriptional regulation. J Cell Biol 150: F45-50. PubMed ID: 10908585

Arenas-Mena, C. (2017). The origins of developmental gene regulation. Evol Dev 19(2): 96-107. PubMed ID: 28116828

Arnold, C. D., Zabidi, M. A., Pagani, M., Rath, M., Schernhuber, K., Kazmar, T. and Stark, A. (2017). Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. Nat Biotechnol 35(2): 136-144. PubMed ID: 28024147

Barbieri, E., Trizzino, M., Welsh, S. A., Owens, T. A., Calabretta, B., Carroll, M., Sarma, K. and Gardini, A. (2018). Targeted enhancer activation by a subunit of the integrator complex. Mol Cell 71(1): 103-116 e107. PubMed ID: 30008316

Baumann, D. G. and Gilmour, D. S. (2017). A sequence-specific core promoter-binding transcription factor recruits TRF2 to coordinately transcribe ribosomal protein genes. Nucleic Acids Res 45(18): 10481-10491. PubMed ID: 28977400

Boija, A., Klein, I. A., Sabari, B. R., Dall'Agnese, A., Coffey, E. L., Zamudio, A. V., Li, C. H., Shrinivas, K., Manteiga, J. C., Hannett, N. M., Abraham, B. J., Afeyan, L. K., Guo, Y. E., Rimel, J. K., Fant, C. B., Schuijers, J., Lee, T. I., Taatjes, D. J. and Young, R. A. (2018). Transcription factors activate genes through the phase-separation capacity of their activation domains. Cell 175(7): 1842-1855. PubMed ID: 30449618

Bose, D. A., Donahue, G., Reinberg, D., Shiekhattar, R., Bonasio, R. and Berger, S. L. (2017). RNA binding to CBP stimulates histone acetylation and transcription. Cell 168(1-2): 135-149 e122. PubMed ID: 28086087

Cazalla, D., Xie, M. and Steitz, J. A. (2011). A primate herpesvirus uses the integrator complex to generate viral microRNAs. Mol Cell 43(6): 982-992. PubMed ID: 21925386

Cho, H., et al. (1999). A protein phosphatase functions to recycle RNA polymerase II. Genes Dev 13: 1540-52. PubMed ID: 10385623

Cho, W. K., Spille, J. H., Hecht, M., Lee, C., Li, C., Grube, V. and Cisse, I. I. (2018). Mediator and RNA polymerase II clusters associate in transcription-dependent condensates. Science 361(6400): 412-415. PubMed ID: 29930094

Core, L. J., Martins, A. L., Danko, C. G., Waters, C. T., Siepel, A. and Lis, J. T. (2014). Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat Genet 46(12): 1311-1320. PubMed ID: 25383968

Dollinger, R., Deng, E. B., Schultz, J., Wu, S., Deorio, H. R. and Gilmour, D. S. (2023). Assessment of the roles of Spt5-nucleic acid contacts in promoter proximal pausing of RNA polymerase II. J Biol Chem: 105106. PubMed ID: 37517697

Duttke, S. H. C., Lacadie, S. A., Ibrahim, M. M., Glass, C. K., Corcoran, D. L., Benner, C., Heinz, S., Kadonaga, J. T. and Ohler, U. (2015). Human promoters are intrinsically directional. Mol Cell 57(4): 674-684. PubMed ID: 25639469

Elrod, N. D., Henriques, T., Huang, K. L., Tatomer, D. C., Wilusz, J. E., Wagner, E. J. and Adelman, K. (2019). The Integrator complex attenuates promoter-proximal transcription at protein-coding genes. Mol Cell 76(5): 738-752. PubMed ID: 31809743

Fan, W., Lam, S. M., Xin, J., Yang, X., Liu, Z., Liu, Y., Wang, Y., Shui, G. and Huang, X. (2017). Drosophila TRF2 and TAF9 regulate lipid droplet size and phospholipid fatty acid composition. PLoS Genet 13(3): e1006664. PubMed ID: 28273089

Fant, C. B., Levandowski, C. B., Gupta, K., Maas, Z. L., Moir, J., Rubin, J. D., Sawyer, A., Esbin, M. N., Rimel, J. K., Luyties, O., Marr, M. T., Berger, I., Dowell, R. D. and Taatjes, D. J. (2020). TFIID enables RNA polymerase II promoter-proximal pausing. Mol Cell. PubMed ID: 32229306

Franco, G., Taillebourg, E., Delfino, E., Homolka, D., Gueguen, N., Brasset, E., Pandey, R. R., Pillai, R. S. and Fauvarque, M. O. (2023). The catalytic-dead Pcif1 regulates gene expression and fertility in Drosophila. RNA 29(5): 609-619. PubMed ID: 36754578

Gomez-Orte, E., Saenz-Narciso, B., Zheleva, A., Ezcurra, B., de Toro, M., Lopez, R., Gastaca, I., Nilsen, H., Sacristan, M. P., Schnabel, R. and Cabello, J. (2019). Disruption of the Caenorhabditis elegans Integrator complex triggers a non-conventional transcriptional mechanism beyond snRNA genes. PLoS Genet 15(2): e1007981. PubMed ID: 30807579

Isogai, Y., Keles, S., Prestel, M., Hochheimer, A. and Tjian, R. (2007). Transcription of histone gene cluster by differential core-promoter factors. Genes Dev 21(22): 2936-49. PubMed ID: 17978101

Jin, Y., Eser, U., Struhl, K. and Churchman, L. S. (2017). The ground state and evolution of promoter region directionality. Cell 170(5): 889-898 e810. PubMed ID: 28803729

Kamieniarz-Gdula, K., Gdula, M. R., Panser, K., Nojima, T., Monks, J., Wisniewski, J. R., Riepsaame, J., Brockdorff, N., Pauli, A. and Proudfoot, N. J. (2019). Selective roles of vertebrate PCF11 in premature and full-length transcript termination. Mol Cell 74(1): 158-172. PubMed ID: 30819644

Kim, M. K., Tranvo, A., Hurlburt, A. M., Verma, N., Phan, P., Luo, J., Ranish, J. and Stumph, W. E. (2020). Assembly of SNAPc, Bdp1, and TBP on the U6 snRNA gene promoter in Drosophila melanogaster. Mol Cell Biol. PubMed ID: 32253345

Kwak, H. and Lis, J. T. (2013). Control of transcriptional elongation. Annu Rev Genet 47: 483-508. PubMed ID: 24050178

Levitsky, V. G., Zykova, T. Y., Moshkin, Y. M. and Zhimulev, I. F. (2020). Nucleosome Positioning around Transcription Start Site Correlates with Gene Expression Only for Active Chromatin State in Drosophila Interphase Chromosomes. Int J Mol Sci 21(23). PubMed ID: 33291385

Louder, R. K., He, Y., Lopez-Blanco, J. R., Fang, J., Chacon, P. and Nogales, E. (2016). Structure of promoter-bound TFIID and model of human pre-initiation complex assembly. Nature 531(7596): 604-609. PubMed ID: 27007846

Lai, F., Gardini, A., Zhang, A. and Shiekhattar, R. (2015). Integrator mediates the biogenesis of enhancer RNAs. Nature 525(7569): 399-403. PubMed ID: 26308897

Xie, M., Zhang, W., Shu, M. D., Xu, A., Lenis, D. A., DiMaio, D. and Steitz, J. A. (2015). The host Integrator complex acts in transcription-independent maturation of herpesvirus microRNA 3' ends. Genes Dev 29(14): 1552-1564. PubMed ID: 26220997

Li, G., Ruan, X., Auerbach, R. K., Sandhu, K. S., Zheng, M., Wang, P., Poh, H. M., Goh, Y., Lim, J., Zhang, J., Sim, H. S., Peh, S. Q., Mulawadi, F. H., Ong, C. T., Orlov, Y. L., Hong, S., Zhang, Z., Landt, S., Raha, D., Euskirchen, G., Wei, C. L., Ge, W., Wang, H., Davis, C., Fisher-Aylor, K. I., Mortazavi, A., Gerstein, M., Gingeras, T., Wold, B., Sun, Y., Fullwood, M. J., Cheung, E., Liu, E., Sung, W. K., Snyder, M. and Ruan, Y. (2012). Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148(1-2): 84-98. PubMed ID: 22265404

Liu, W. L., et al. (2009). Structures of three distinct activator-TFIID complexes. Genes Dev 23(13): 1510-21. PubMed ID: 19571180

Mahat, D. B., Kwak, H., Booth, G. T., Jonkers, I. H., Danko, C. G., Patel, R. K., Waters, C. T., Munson, K., Core, L. J. and Lis, J. T. (2016). Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat Protoc 11(8): 1455-1476. PubMed ID: 27442863

Marr, M. T., Isogai, Y., Wright, K. J. and Tjian, R. (2006). Coactivator cross-talk specifies transcriptional output. Genes Dev 20(11): 1458-69. PubMed ID: 16751183

Murakami, K., Elmlund, H., Kalisman, N., Bushnell, D. A., Adams, C. M., Azubel, M., Elmlund, D., Levi-Kalisman, Y., Liu, X., Gibbons, B. J., Levitt, M. and Kornberg, R. D. (2013). Architecture of an RNA polymerase II transcription pre-initiation complex. Science 342: 1238724.

Neves, A. and Eisenman, R. N. (2019). Distinct gene-selective roles for a network of core promoter factors in Drosophila neural stem cell identity. Biol Open 8(4). PubMed ID: 30948355

Nguyen, T. A., Jones, R. D., Snavely, A. R., Pfenning, A. R., Kirchner, R., Hemberg, M. and Gray, J. M. (2016). High-throughput functional comparison of promoter and enhancer activities. Genome Res 26(8): 1023-1033. PubMed ID: 27311442

Nikolov, D. B. and Burley, S. K. (1997). RNA polymerase II transcription initiation: A structural view. Proc Natl Acad Sci U S A 94: 15-22. PubMed ID: 8990153

Ohler, U., Liao, G. C., Niemann, H. and Rubin, G. M. (2002). Computational analysis of core promoters in the Drosophila genome. Genome Biol 3(12): RESEARCH0087. PubMed ID: 12537576

Orphanides, G., Lagrange, T. and Reinberg, D. (1996). The general transcription factors of RNA polymerase II. Genes Dev 10: 2657-83. PubMed ID: 8946909

Pahi, Z., Kiss, Z., Komonyi, O., Borsos, B. N., Tora, L., Boros, I. M. and Pankotai, T. (2015). dTAF10- and dTAF10b-containing complexes are required for ecdysone-driven larval-pupal morphogenesis in Drosophila melanogaster. PLoS One 10: e0142226. PubMed ID: 26556600

Parry, T. J., Theisen, J. W., Hsu, J. Y., Wang, Y. L., Corcoran, D. L., Eustice, M., Ohler, U. and Kadonaga, J. T. (2010). The TCT motif, a key component of an RNA polymerase II transcription system for the translational machinery. Genes Dev 24(18): 2013-2018. PubMed ID: 20801935

Patel, A. B., Louder, R. K., Greber, B. J., Grunberg, S., Luo, J., Fang, J., Liu, Y., Ranish, J., Hahn, S. and Nogales, E. (2018). Structure of human TFIID and mechanism of TBP loading onto promoter DNA. Science 362(6421). PubMed ID: 30442764

Petrenko, N. and Struhl, K. (2021). Comparison of transcriptional initiation by RNA polymerase II across eukaryotic species. eLife 10. PubMed ID: 34515029

Pimmett, V. L., Dejean, M., Fernandez, C., Trullo, A., Bertrand, E., Radulescu, O. and Lagha, M. (2021). Quantitative imaging of transcription in living Drosophila embryos reveals the impact of core promoter motifs on promoter state dynamics. Nat Commun 12(1): 4504. PubMed ID: 34301936

Qiu, Y. and Gilmour, D. S. (2017). Identification of regions in the Spt5 subunit of DSIF that are involved in promoter proximal pausing. J Biol Chem [Epub ahead of print]. PubMed ID: 28213523

Rubtsova, M. P., Vasilkova, D. P., Moshareva, M. A., Malyavko, A. N., Meerson, M. B., Zatsepin, T. S., Naraykina, Y. V., Beletsky, A. V., Ravin, N. V. and Dontsova, O. A. (2019). Integrator is a key component of human telomerase RNA biogenesis. Sci Rep 9(1): 1701. PubMed ID: 30737432

Schor, I. E., Degner, J. F., Harnett, D., Cannavo, E., Casale, F. P., Shim, H., Garfield, D. A., Birney, E., Stephens, M., Stegle, O. and Furlong, E. E. (2017). Promoter shape varies across populations and affects promoter evolution and expression noise. Nat Genet 49(4): 550-558. PubMed ID: 28191888

Serebreni, L., Pleyer, L. M., Haberle, V., Hendy, O., Vlasova, A., Loubiere, V., Nemcko, F., Bergauer, K., Roitinger, E., Mechtler, K. and Stark, A. (2023). Functionally distinct promoter classes initiate transcription via different mechanisms reflected in focused versus dispersed initiation patterns. EMBO J: e113519. PubMed ID: 37013908

Shah, N., Maqbool, M. A., Yahia, Y., El Aabidine, A. Z., Esnault, C., Forne, I., Decker, T. M., Martin, D., Schuller, R., Krebs, S., Blum, H., Imhof, A., Eick, D. and Andrau, J. C. (2018). Tyrosine-1 of RNA polymerase II CTD controls global termination of gene transcription in mammals. Mol Cell 69(1): 48-61 e46. PubMed ID: 29304333

Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., Kodzius, R., Watahiki, A., Nakamura, M., Arakawa, T., Fukuda, S., Sasaki, D., Podhajska, A., Harbers, M., Kawai, J., Carninci, P. and Hayashizaki, Y. (2003). Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A 100(26): 15776-15781. PubMed ID: 14663149

Sigova, A. A., Abraham, B. J., Ji, X., Molinie, B., Hannett, N. M., Guo, Y. E., Jangi, M., Giallourakis, C. C., Sharp, P. A. and Young, R. A. (2015). Transcription factor trapping by RNA in gene regulatory elements. Science 350(6263): 978-981. PubMed ID: 26516199

Skaar, J. R., Ferris, A. L., Wu, X., Saraf, A., Khanna, K. K., Florens, L., Washburn, M. P., Hughes, S. H. and Pagano, M. (2015). The Integrator complex controls the termination of transcription at diverse classes of gene targets. Cell Res 25(3): 288-305. PubMed ID: 25675981

Tatomer, D. C., Elrod, N. D., Liang, D., Xiao, M. S., Jiang, J. Z., Jonathan, M., Huang, K. L., Wagner, E. J., Cherry, S. and Wilusz, J. E. (2019). The Integrator complex cleaves nascent mRNAs to attenuate transcription. Genes Dev 33(21-22): 1525-1538. PubMed ID: 31530651

Tanaka, A., Akimoto, Y., Kobayashi, S., Hisatake, K., Hanaoka, F. and Ohkuma, Y. (2015). Association of the winged helix motif of the TFIIEalpha subunit of TFIIE with either the TFIIEbeta subunit or TFIIB distinguishes its functions in transcription. Genes Cells 20: 203-216. PubMed ID: 25492609

van Arensbergen, J., FitzPatrick, V. D., de Haas, M., Pagie, L., Sluimer, J., Bussemaker, H. J. and van Steensel, B. (2017). Genome-wide mapping of autonomous promoter activity in human cells. Nat Biotechnol 35(2): 145-153. PubMed ID: 28024146

Verma, N., Hung, K. H., Kang, J. J., Barakat, N. H. and Stumph, W. E. (2013). Differential utilization of TATA box-binding protein (TBP) and TBP-related factor 1 (TRF1) at different classes of RNA polymerase III promoters. J Biol Chem 288(38): 27564-27570. PubMed ID: 23955442

Verma, N., Hurlburt, A. M., Wolfe, A., Kim, M. K., Kang, Y. S., Kang, J. J. and Stumph, W. E. (2018). Bdp1 interacts with SNAPc bound to a U6, but not U1, snRNA gene promoter element to establish a stable protein-DNA complex. FEBS Lett 592(14): 2489-2498. PubMed ID: 29932462

Xie, X., et al. (1996). Structural similarity between TAFs and the heterotetrameric core of the histone octamer. Nature 380: 316-322. PubMed ID: 8598927

Zhang, Z., English, B. P., Grimm, J. B., Kazane, S. A., Hu, W., Tsai, A., Inouye, C., You, C., Piehler, J., Schultz, P. G., Lavis, L. D., Revyakin, A. and Tjian, R. (2016). Rapid dynamics of general transcription factor TFIIB binding during preinitiation complex assembly revealed by single-molecule analysis. Genes Dev 30: 2106-2118. PubMed ID: 27798851

date revised: 15 December 2022

The Interactive Fly resides on the Society for Developmental Biology's Web server.