Giant

Gene name - giant

Synonyms -

Cytological map position - 3A2

Function - transcription factor

Keywords - gap gene

Symbol - gt

FlyBase ID:FBgn0001150

Genetic map position - 1-0.9

Classification - basic leucine zipper

Cellular location - nuclear

NCBI link: Entrez Gene

gt orthologs: Biolitmine

Recent literature

Hoermann, A., Cicin-Sain, D. and Jaeger, J. (2016). A quantitative validated model reveals two phases of transcriptional regulation for the gap gene giant in Drosophila. Dev Biol [Epub ahead of print]. PubMed ID: 26806702
Summary:
Understanding eukaryotic transcriptional regulation and its role in development and pattern formation is one of the big challenges in biology today. Most attempts at tackling this problem either focus on the molecular details of transcription factor binding, or aim at genome-wide prediction of expression patterns from sequence through bioinformatics and mathematical modelling. This study bridges the gap between these two complementary approaches by providing an integrative model of cis-regulatory elements governing the expression of the gap gene giant (gt) in the blastoderm embryo. A reverse-engineering method, where mathematical models are fit to quantitative spatio-temporal reporter gene expression data, was used to infer the regulatory mechanisms underlying gt expression in its anterior and posterior domains. These models are validated through prediction of gene expression in mutant backgrounds. A detailed analysis of the data and models reveals that gt is regulated by domain-specific CREs at early stages, while a late element drives expression in both the anterior and the posterior domains. Initial gt expression depends exclusively on inputs from maternal factors. Later, gap gene cross-repression and gt auto-activation become increasingly important. Auto-regulation creates a positive feedback, which mediates the transition from early to late stages of regulation. The existence and role of gt auto-activation was confirmed through targeted mutagenesis of Gt transcription factor binding sites. In summary, this analysis provides a comprehensive picture of spatio-temporal gene regulation by different interacting enhancer elements for an important developmental regulator.

Wu, S., Joseph, A., Hammonds, A. S., Celniker, S. E., Yu, B. and Frise, E. (2016). Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks. Proc Natl Acad Sci U S A 113: 4290-4295. PubMed ID: 27071099
Summary:
Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes it a challenge to extract biological information. This paper describes staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. This analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, the PP was used to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP, giant, hunchback, knirps, Kruppel, huckebein, and tailless, that span along the embryonic anterior-posterior axis. Using a two-tail 5% cutoff on correlation, 10 of the 11 links were reproduced in the well-studied gap gene network. The performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.

Abed, J. A., Ghotbi, E., Ye, P., Frolov, A., Benes, J. and Jones, R. S. (2018). De novo recruitment of Polycomb-group proteins in Drosophila embryos. Development. PubMed ID: 30389849
Summary:
Polycomb-group (PcG)-mediated transcriptional repression of target genes can be delineated into two phases. First, following initial repression of target genes by gene-specific transcription factors, PcG proteins recognize the repressed state and assume control of the genes' repression. Once the silenced state is established, PcG proteins may maintain repression through an indefinite number of cell cycles. Little is understood about how PcG proteins initially recognize the repressed state of target genes and the steps leading to de novo establishment of PcG-mediated repression. This study describes a genetic system in which a Drosophila PcG target gene, giant (gt), is ubiquitously repressed during early embryogenesis by a maternally expressed transcription factor, and shows the temporal recruitment of components of three PcG protein complexes: PhoRC, PRC1, and PRC2. De novo PcG recruitment follows a temporal hierarchy in which PhoRC stably localizes at the target gene at least one hour before stable recruitment of PRC2 and concurrent trimethylation of histone H3 at lysine 27 (H3K27me3). The presence of PRC2 and increased levels of H3K27me3 are found to precede stable binding by PRC1.

Zoller, B., Little, S. C. and Gregor, T. (2018). Diverse spatial expression patterns emerge from unified kinetics of transcriptional bursting. Cell 175(3): 835-847. PubMed ID: 30340044
Summary:

How transcriptional bursting relates to gene regulation is a central question that has persisted for more than a decade. This study measured nascent transcriptional activity in early Drosophila embryos and characterized the variability in absolute activity levels across expression boundaries. Boundary formation was demonstrated to follow a common transcription principle: a single control parameter determines the distribution of transcriptional activity, regardless of gene identity, boundary position, or enhancer-promoter architecture. The underlying bursting kinetics were inferred and the key regulatory parameter was identified as the fraction of time a gene is in a transcriptionally active state. Unexpectedly, both the rate of polymerase initiation and the switching rates for bcd, hb, Kr, kni and gt are tightly constrained across all expression levels, predicting synchronous patterning outcomes at all positions in the embryo. These results point to a shared simplicity underlying the apparently complex transcriptional processes of early embryonic patterning and indicate a path to general rules in transcriptional regulation (Zoller, 2018).
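The idea that a single parameter, the fraction of time a gene is active, sets the expression level can be illustrated with a minimal sketch. It assumes the standard two-state (ON/OFF) picture of bursting, which the summary above does not spell out; the names k_ini, k_on, and k_off and all numerical values are hypothetical and are not taken from the study.

```python
def active_fraction(k_on: float, k_off: float) -> float:
    """Steady-state fraction of time a two-state promoter spends in the ON state.

    k_on is the OFF-to-ON switching rate and k_off the ON-to-OFF rate
    (hypothetical names, arbitrary units).
    """
    if k_on < 0 or k_off < 0 or (k_on + k_off) == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_on / (k_on + k_off)


def mean_activity(k_ini: float, p_on: float) -> float:
    """Mean transcriptional activity: initiation rate times ON fraction."""
    if not 0.0 <= p_on <= 1.0:
        raise ValueError("p_on must lie between 0 and 1")
    return k_ini * p_on


# Hold the initiation rate fixed and vary only the ON fraction,
# as one might across an expression boundary (illustrative values).
K_INI = 10.0
for p_on in (0.1, 0.5, 0.9):
    print(f"p_on={p_on:.1f} -> mean activity={mean_activity(K_INI, p_on):.1f}")
```

In this toy picture, moving across a boundary changes only p_on while k_ini stays fixed, which is one way to read the "single control parameter" statement.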

Ghotbi, E., Lackey, K., Wong, V., Thompson, K. T., Caston, E. G., Haddadi, M., Benes, J. and Jones, R. S. (2020). Differential contributions of DNA binding proteins to Polycomb response element activity at the Drosophila giant gene. Genetics. PubMed ID: 31919108
Summary:
Polycomb-group (PcG) proteins are evolutionarily conserved epigenetic regulators whose primary function is to maintain the transcriptional repression of target genes. Recruitment of Drosophila melanogaster PcG proteins to target genes requires the presence of one or more Polycomb Response Elements (PREs). The functions or necessity for more than one PRE at a gene are not clear and individual PREs at some loci may have distinct regulatory roles. Various combinations of sequence-specific DNA binding proteins are present at a given PRE, but only Pleiohomeotic (Pho) is present at all strong PREs. The giant (gt) locus has two PREs, a proximal PRE1 and a distal PRE2. During early embryonic development, Pho binds to PRE1 approximately 90 minutes prior to stable binding to PRE2. This observation indicated a possible dependence of PRE2 on PRE1 for PcG recruitment; however, this study finds that PRE2 recruits PcG proteins and maintains transcriptional repression independently of Pho binding to PRE1. Pho-like (Phol) is partially redundant with Pho during larval development and binds to the same DNA sequences in vitro. Although binding of Pho to PRE1 is dependent on the presence of consensus Pho-Phol binding sites, Phol binding is less so and appears to play a minimal role in recruiting other PcG proteins to gt. Another PRE binding protein, Spps, is dependent on the presence of Pho for PRE1 binding. Further, this study showed that, in addition to silencing gene expression, PcG proteins dampen transcription of an active gene.

Lopez-Rivera, F., Foster Rhoades, O. K., Vincent, B. J., Pym, E. C. G., Bragdon, M. D. J., Estrada, J., DePace, A. H. and Wunderlich, Z. (2020). A Mutation in the Drosophila melanogaster eve Stripe 2 Minimal Enhancer Is Buffered by Flanking Sequences. G3 (Bethesda). PubMed ID: 33037064
Summary:
Enhancers are DNA sequences composed of transcription factor binding sites that drive complex patterns of gene expression in space and time. Until recently, studying enhancers in their genomic context was technically challenging. Therefore, minimal enhancers, the shortest pieces of DNA that can drive an expression pattern that resembles a gene's endogenous pattern, are often used to study features of enhancer function. However, evidence suggests that some enhancers require sequences outside the minimal enhancer to maintain function under environmental perturbations. It is hypothesized that these additional sequences also prevent misexpression caused by a transcription factor binding site mutation within a minimal enhancer. Using the Drosophila melanogaster even-skipped stripe 2 enhancer as a case study, the effect was examined of a Giant binding site mutation (gt-2) on the expression patterns driven by minimal and extended enhancer reporter constructs. In contrast to the misexpression caused by the gt-2 binding site deletion in the minimal enhancer, the same gt-2 binding site deletion in the extended enhancer did not have an effect on expression. The buffering of expression levels, but not expression pattern, is partially explained by an additional Giant binding site outside the minimal enhancer. Deleting the gt-2 binding site in the endogenous locus had no significant effect on stripe 2 expression. These results indicate that rules derived from mutating enhancer reporter constructs may not represent what occurs in the endogenous context.

Ghotbi, E., Ye, P., Ervin, T., Kum, A., Benes, J. and Jones, R. S. (2021). Polycomb-group recruitment to a Drosophila target gene is the default state that is inhibited by a transcriptional activator. Sci Adv 7(29). PubMed ID: 34272248
Summary:
Polycomb-group (PcG) proteins are epigenetic regulators that maintain the transcriptional repression of target genes following their initial repression by transcription factors. PcG target genes are repressed in some cells, but active in others. Therefore, a mechanism must exist by which PcG proteins distinguish between the repressed and active states and only assemble repressive chromatin environments at target genes that are repressed. This study presents experimental evidence that the repressed state of a Drosophila PcG target gene, giant (gt), is not identified by the presence of a repressor. Rather, de novo establishment of PcG-mediated silencing at gt is the default state that is prevented by the presence of an activator or coactivator, which may inhibit the catalytic activity of Polycomb-repressive complex 2 (PRC2).

Duk, M. A., Gursky, V. V., Samsonova, M. G. and Surkova, S. Y. (2021). Application of Domain- and Genotype-Specific Models to Infer Post-Transcriptional Regulation of Segmentation Gene Expression in Drosophila. Life (Basel) 11(11). PubMed ID: 34833107
Summary:
Unlike transcriptional regulation, the post-transcriptional mechanisms underlying zygotic segmentation gene expression in early Drosophila embryo have been insufficiently investigated. Condition-specific post-transcriptional regulation plays an important role in the development of many organisms. A recent study revealed the domain- and genotype-specific differences between mRNA and the protein expression of Drosophila hb, gt, and eve genes in cleavage cycle 14A. This study used this dataset and the dynamic mathematical model to recapitulate protein expression from the corresponding mRNA patterns. The condition-specific nonuniformity in parameter values is further interpreted in terms of possible post-transcriptional modifications. For hb expression in wild-type embryos, the results predict the position-specific differences in protein production. The protein synthesis rate parameter is significantly higher in hb anterior domain compared to the posterior domain. The parameter sets describing Gt protein dynamics in wild-type embryos and Kr mutants are genotype-specific. The spatial discrepancy between gt mRNA and protein posterior expression in Kr mutants is well reproduced by the whole axis model, thus rejecting the involvement of post-transcriptional mechanisms. These models fail to describe the full dynamics of eve expression, presumably due to its complex shape and the variable time delays between mRNA and protein patterns, which likely require a more complex model. Overall, this modeling approach enables the prediction of regulatory scenarios underlying the condition-specific differences between mRNA and protein expression in early embryo.
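The notion of recapitulating protein expression from mRNA patterns can be sketched with a simple production-decay equation, dP/dt = alpha * m(t) - lam * P. This is an illustrative simplification under stated assumptions, not the model used in the study above: the function name simulate_protein, the parameters alpha (synthesis rate), lam (decay rate), and dt, and every numerical value are hypothetical.

```python
from typing import List, Sequence


def simulate_protein(
    mrna: Sequence[float],
    alpha: float,
    lam: float,
    dt: float,
    p0: float = 0.0,
) -> List[float]:
    """Forward-Euler integration of dP/dt = alpha * m(t) - lam * P.

    mrna holds the mRNA level at successive time steps of width dt.
    The returned list has the same length as mrna and starts at p0.
    """
    protein = [p0]
    for m in mrna[:-1]:
        p = protein[-1]
        protein.append(p + dt * (alpha * m - lam * p))
    return protein


# Same mRNA input, two hypothetical domain-specific synthesis rates.
mrna_profile = [1.0] * 5000
anterior = simulate_protein(mrna_profile, alpha=2.0, lam=0.5, dt=0.01)
posterior = simulate_protein(mrna_profile, alpha=1.0, lam=0.5, dt=0.01)
print(round(anterior[-1], 3), round(posterior[-1], 3))
```

With a constant mRNA input the protein level approaches alpha * m / lam, so a higher synthesis rate in one domain yields more protein from the same mRNA level. Fitting such parameters separately per domain or genotype is the general kind of comparison the summary describes.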

BIOLOGICAL OVERVIEW

giant is a gap gene that codes for a transcriptional repressor. Early in development it is expressed in two broad stripes, belting the embryo rather like a bikini, leaving bare the central "midriff" as well as the anterior and posterior ends. In keeping with its developmental group, lack of giant expression produces gaps in both anterior and posterior structures, specifically the labial and labral head structures and abdominal segments A5 through A7.

The regulation of giant involves at least three other genes that act either to repress or to activate it. hunchback acts as a concentration-dependent repressor of giant, restricting its most anterior expression (Eldon, 1991b and Kraut, 1991 and Struhl, 1992). However, giant will not be expressed in the anterior domain without bicoid (Eldon, 1991) and another required gene, caudal, whose necessary involvement has only recently been identified (Schulz, 1995). Activation in the posterior domain also demands the combined activities of caudal and bicoid. When the persistent expression of hb represses the posterior domain of caudal (as is the case in nanos and oskar mutants), posterior giant expression is not apparent (Eldon, 1991 and Rivera-Pomar and Schulz, 1995).

giant expression in the head becomes increasingly complex over time, developing four stripes. Early on, one straddles the cephalic furrow and another enters the anterior stomodeal invagination to come into contact with a group of cells already strongly expressing giant inside the clypeolabrum. Mutation in both orthodenticle and empty spiracles will alter giant anterior expression. This suggests giant expression is modified by gap genes specific to the head.

Giant helps define domains of expression for the pair-rule genes even-skipped, paired and fushi tarazu. It also delimits the anterior boundary of early Antennapedia expression. giant's complex expression pattern in the head suggests it is involved in as yet undocumented processes of head morphogenesis.

Genetic experiments and a targeted misexpression approach have been combined to examine the role of giant in patterning anterior regions of the Drosophila embryo. The results suggest that gt functions in the repression of three target genes, the gap genes Kruppel (Kr) and hunchback (hb), and the pair-rule gene even-skipped (eve). The anterior border of Kr, which lies 4-5 nucleus diameters posterior to nuclei that express GT mRNA, is set by a threshold repression mechanism involving very low levels of Gt protein. The gap gene Kr is activated in a broad central region of precellular embryos. Midway through cleavage cycle 14, this domain extends from 41-59% egg length. The initial positioning of the anterior border of this domain is thought to be controlled by repression involving a combination of maternal and zygotic hunchback transcripts. To test whether gt is also involved in setting or maintaining this border, the Kr expression pattern was analyzed in embryos containing the st2-gt transgene, a modified version of the 480 bp eve stripe 2 enhancer. These embryos show no changes in the initial positioning of the Kr expression domain early in cleavage cycle 14, but slightly later there is a dramatic retraction of the anterior Kr border. The delay in the observed repressive effect on the Kr anterior border is probably due to the fact that the Kr domain is expressed earlier than the st2-gt transgene. Higher levels of ectopic gt result in a more severe retraction, suggesting that Kr transcription is very sensitive to repression by gt. To test whether gt affects Kr expression during normal development, Kr expression was examined in embryos that carry a strong hypomorphic gt allele. The initial Kr expression pattern was correctly established in these gt hypomorphic embryos. However, slightly later, a significant anterior expansion (from 59% to 65% egg length) is observed, suggesting that gt-mediated repression is essential for maintaining the position of the anterior border of the Kr domain (Wu, 1998).

giant activity is required, but not sufficient, for the formation of the anterior border of eve stripe 2, which lies adjacent to nuclei that express GT mRNA. It is proposed that gt's role in forming this border is to potentiate repressive interaction(s) mediated by other factor(s) that are also localized to anterior regions of the early embryo. It is not clear whether gt is sufficient for repression of the in vivo eve stripe 2 response. To test this, eve expression was examined in embryos containing the st2-gt transgene, which extends the gt domain so that it overlaps the position of eve stripe 2. Surprisingly, the ectopic gt causes only a weak transient reduction of the stripe early in cycle 14. Later the stripe recovers to full strength, but expands toward the posterior by about two nucleus diameters. Double in situ hybridization experiments show that the timing and the extent of the expansion correlate well with the retraction of the Kr domain, suggesting that the expansion of eve stripe 2 is indirectly caused by relief from Kr repression. Doubling the ectopic gt expression levels still does not cause a significantly stronger repression, suggesting that eve stripe 2 is quite insensitive to gt repression. To test whether the effects of ectopic gt on eve stripe 2 are controlled by the early or late regulatory elements, the expression of lacZ reporter genes was examined in embryos containing st2-gt transgenes. It is likely that the posterior expansion of endogenous eve stripe 2 caused by the st2-gt transgene is mediated through the early acting enhancer (Wu, 1998).

The recalcitrance of the eve stripe 2 response to ectopic gt expression led to a reexamination of the eve expression pattern in gt mutant embryos. Early in cycle 14, these mutants show a derepression in the interstripe region between stripes 1 and 2. However, later in cycle 14, gt mutants show a dramatic reduction in stripe 2 expression levels, suggesting a role for gt in maintaining the stripe. Since Kr has been previously implicated as the repressor that forms the stripe 2 posterior border, it is possible that the stripe 2 reduction in gt mutants is indirectly caused by Kr, which expands anteriorly to completely overlap the diminishing stripe. The repression of eve stripe 2 observed in gt mutants can be relieved by reducing Kr levels. These results suggest that a major function of the anterior gt domain is to prevent Kr from expanding anteriorly, thus permitting the expression of eve stripe 2. Furthermore, since gt repression maintains the position of the anterior Kr border in wild-type embryos, it indirectly defines the position of the posterior border of eve stripe 2 (Wu, 1998).

In principle, the preceding experiments support the hypothesis that gt acts as a concentration-dependent repressor to set the anterior borders of the Kr and eve stripe 2 expression domains in different positions. Ectopic gt is an effective repressor of Kr, but has little effect on the activation of eve stripe 2. In situ hybridization experiments indicate that endogenous gt levels are significantly higher than the ectopic gt driven by even the strongest st2-gt transgenic lines. Perhaps these higher endogenous levels are required for effectively setting the anterior border of eve stripe 2. If this is the case, the early expansion of eve stripe 2 toward stripe 1 detected in gt mutants should not be affected in embryos in which the endogenous gt gene is replaced by the st2-gt misexpression domain. To test this, eve expression was examined in gt mutants that contained the st2-gt5 transgene. Surprisingly, a sharp anterior eve stripe 2 border is formed in these embryos, with a clear interstripe between eve stripes 1 and 2. Furthermore, the st2-gt domain rescues eve stripe 2 to full strength, with a posterior expansion that is probably due to repression of the anterior Kr border. The relatively low levels of ectopic gt driven by the st2-gt construct overlap the endogenous gt domain and extend 4-5 nucleus diameters posteriorly. The fact that a sharp anterior eve stripe 2 border is formed in embryos containing only this domain argues against a simple concentration-dependent mechanism for setting this border. Rather, it is proposed that other factor(s) are involved along with gt in defining the anterior border of eve stripe 2 in vivo. Thus, gt may act as a potentiator of repression mediated by these localized factors. Since gt encodes a putative basic leucine zipper (b-ZIP) protein, one possibility is that this activity is also a b-ZIP protein that can heterodimerize with gt as part of an effective repressor complex. Repressive function in the absence of gt would be provided by a homodimer of this protein. Alternatively, since the gt site deletions tested in previous experiments removed relatively long sequences (14-43 bp), it is possible that these deletions may have removed or interrupted binding sites for other protein(s) (Wu, 1998).

gt is required for repression of zygotic hb expression in more anterior regions of the embryo. Zygotic expression of hb is initially activated by the bcd and maternal hb gradients in a broad domain that spans the anterior half of the embryo. This expression is then rapidly refined during nuclear division cycle 14, leaving a secondary pattern that includes a variable head domain, a stripe at the position of parasegment 4 (PS4), and a posterior stripe. The PS4 stripe overlaps the anterior border of the Kr domain. By examining hb expression in gt mutants, significant changes in this secondary pattern were detected. Initially, hb expression at the position of PS4 is greatly reduced, possibly because of the anterior expansion of the Kr domain in gt mutants. High levels of hb expression persist in more anterior regions of gt mutant embryos. The persistent hb expression domain appears very similar in shape to the normal gt domain, suggesting that gt may act as a repressor to clear hb expression from this part of the embryo during wild-type development. To test whether endogenous gt levels were required for this repression, hb expression was examined in gt mutants that also contained the st2-gt transgene. hb expression is repressed normally by a single copy of the st2-gt5 transgene, suggesting that relatively low levels of ectopic gt can replace this function of the endogenous gene. Since gt seems to be involved in repression of hb in anterior regions, it is possible that this repression is important for setting the anterior border of the hb PS4 stripe during wild-type development. To test this, hb expression was examined in embryos containing the st2-gt transgene. The position of the anterior border of the hb PS4 stripe appears unchanged in these embryos, suggesting that the levels of ectopic gt tested here are not sufficient to repress hb PS4 expression. However, a slight posterior expansion of this stripe could be detected in embryos with high levels of misexpression, which is probably caused by the retraction of the Kr domain. This supports the hypothesis that Kr activity is important for setting the posterior PS4 stripe border, and further demonstrates the importance of gt-mediated restriction of Kr expression to central regions of the embryo (Wu, 1998).

cis-Regulatory logic of Giant mediated short-range transcriptional repression in Drosophila melanogaster

Bioinformatics analysis of transcriptional control is guided by knowledge of the characteristics of cis-regulatory regions or enhancers. Features such as clustering of binding sites and co-occurrence of binding sites have aided enhancer identification, but quantitative predictions of enhancer function are not yet generally feasible. To facilitate the analysis of regulatory sequences in Drosophila melanogaster, quantitative parameters were identified that affect the activity of short-range transcriptional repressors, proteins that play key roles in development. In addition to the previously noted distance dependence, repression is strongly influenced by the stoichiometry, affinity, spacing, and arrangement of activator binding sites. Repression is insensitive to the type of activation domain, suggesting that short-range repression may primarily affect activators at the level of DNA binding. The activity of several short-range, but not long-range, repressors is circumscribed by the same quantitative parameters. This cis-regulatory 'grammar' may aid the identification of enhancers regulated by short-range repressors and facilitate bioinformatic prediction of the functional output of transcriptional regulatory sequences (Kulkarni, 2005).

The activity of short-range transcriptional repressors has been studied mostly in the context of complex natural enhancers. To analyze cis-acting element activity in a setting in which activator-repressor composition, stoichiometry, and spacing can be exactly defined, chromosomally integrated, compact regulatory modules were constructed containing binding sites for the endogenous short-range repressor Giant and chimeric Gal4 activators. The space between repressor and activator sites on these elements is less than 100 bp, a distance over which short-range repressors have been previously shown to be effective. The activity of the chimeric Gal4 activator is localized to the ventral regions of the embryo, where it is expressed under the control of ventral-specific enhancer elements. Strikingly, Giant was unable to repress the activity of a minimal Gal4 activator protein on a reporter gene in which Giant binding sites were located immediately 5' of five high-affinity Gal4 sites. This lack of repression activity reveals a hitherto unknown limitation of short-range repressors. Giant represses adjacent Dorsal and Twist activators on similar reporter genes, indicating that Giant can bind such a reporter gene and that the hsp70 basal promoter is not inherently resistant to repression. The close proximity of the Gal4 activators to the hsp70 basal promoter may prevent Giant from mediating repression on this reporter; therefore, a neutral 400-bp spacer sequence was introduced between the Gal4 binding sites and the transcriptional start site. However, Giant was also unable to repress in this context. The inability of Giant to repress is not due to an inherent resistance of the Gal4 activation domain, for Giant was able to repress the activity of the Gal4 activator on a gene containing a cluster of five Gal4 binding sites 5' of the eve basal promoter. The repression in anterior and posterior regions is relieved when this transgene is assayed in giant mutant embryos, confirming that the observed repression is mediated by Giant (Kulkarni, 2005).

These results indicate that the simple notion that short-range repressors block the activity of all protein complexes within 100 bp is an oversimplification. Clearly, mere proximity is not the only determinant affecting repression by Giant. This study set out to systematically define other factors that dictate repression effectiveness to uncover a potential cis-regulatory grammar of short-range repression. The repressed and nonrepressed reporter genes differ in the sequence of the activator sites, nature of the basal promoters, and repressor position with respect to the transcriptional start site. Activator binding site affinity or spacing seems likely to be a more important factor, because Giant has been previously shown to be able to repress genes with both types of basal promoter, and the relative spacing of the repressors to +1 should in fact favor repression (Kulkarni, 2005).

Studies of the hairy gene in Drosophila led to the suggestion that the overall stoichiometry (rather than the absolute number) of activators and repressors may be critical in dictating enhancer output. To test whether the stoichiometry of activators to repressors is a critical factor in determining short-range repression levels by Giant, the number of Gal4 activator binding sites on the hsp70-lacZ reporter was reduced from five to three. As anticipated, the levels of transcriptional activation by the minimal Gal4 activator were lower in the transgene containing three Gal4 sites, leading to a less robust ventral staining pattern. In this context, Giant is able to block transcription of the lacZ gene. However, the removal of two Gal4 sites also positions the repressors closer to the start of transcription, which may facilitate repression of the basal promoter ("direct repression"). Therefore, to maintain the distance between Giant binding sites and the start of transcription, a neutral spacer was placed downstream of the three Gal4 sites. Again, Giant was able to repress the minimal Gal4 activator. These results demonstrate that repression is critically dependent on the number of activator binding sites but do not explicitly differentiate between the overall level of transcriptional activation and binding site number. These results are also consistent with previous analyses of the eve stripe 2 element, where the insertion of additional Bicoid binding sites in an otherwise normal stripe 2 enhancer causes a slight anterior expansion of its expression pattern, suggesting that an excess of Bicoid activators can "overwhelm" the Giant repressor (Kulkarni, 2005).

Binding site affinity influences threshold responses to activator gradients in the embryo, and indeed, transcription factor binding sites of various affinities are typically found in many developmental enhancers that function during early Drosophila development. Such differences in activator site affinity might similarly influence responses to short-range repressors. A test was therefore made of whether maintaining the number of activator sites but weakening their affinity would change the response to repressors. The five high-affinity Gal4 binding sites in the hsp70-lacZ reporter were replaced with five copies of a site from the Saccharomyces cerevisiae Gal1-Gal10 promoter that has been characterized as a weaker Gal4 binding site. The minimal Gal4 activator drives gene expression in a weaker, striped pattern from the lower-affinity Gal4 sites. Anterior and posterior repression by Giant is evident. As expected, later in development, when Giant protein is no longer present, lacZ is expressed in a continuous swathe. The striped expression of the constructs is thought to be due to the binding of uncharacterized pair-rule repressors to spacer sequences in the reporter (Kulkarni, 2005).

In the process of weakening the Gal4 binding sites, five high-affinity binding sites for the Bicoid activator were inadvertently created, providing an additional opportunity to assay Giant repression activity. Bicoid is maternally deposited in the anterior regions of the embryo, forming an anterior-to-posterior gradient. lacZ expression from the hsp70 reporter is activated even in the absence of the Gal4 activator by the Bicoid transcription factor in anterior regions. As the embryo develops, Giant inhibits Bicoid activation of lacZ, which is thereby progressively refined into a two-stripe pattern, in regions where giant is not expressed. Analysis of the transgene in a giant mutant background in the absence of Gal4 confirms that refinement of reporter gene expression is due to repression by Giant. These results suggest that five Bicoid binding sites are more susceptible to repression than are five high-affinity Gal4 sites, indicating that stoichiometric relationships of repressors to activators in turn may depend on either distinct DNA binding domains or the type of activation domains (Kulkarni, 2005).

The differential effectiveness of Giant against five Gal4 or five Bicoid binding sites suggests that the nature of the activation domain itself or the DNA binding domain of the transcriptional activator may play a role in dictating the response to repressors. To distinguish between those two possibilities, the activities of a variety of activation domains fused to the DNA binding domain of Gal4 were tested. In addition to the Gal4 activation domain, the acidic transcriptional activation domain of the herpes simplex virus activator VP16, the glutamine-rich activation domain of the mammalian transcription factor Sp1, and the hTBP, which has been shown to function as an activator when targeted to the promoter via the Gal4 DNA binding domain, were tested. Attempts were also made to test the activity of Gal4-Bicoid activators, but unfortunately, these chimeras exhibit strong promoter specificity and are not active on the hsp70 promoter, which precluded a direct comparison. The Gal4 chimeric proteins were used to drive expression of the hsp70 lacZ reporter from the cluster of five high-affinity Gal4 sites. Giant could inhibit neither the strong Gal4 and VP16 activators nor the weak activation domains of Sp1 and hTBP. These results indicate that the ability to repress does not depend on the strength of the activation domain or the activation pathway. Only those genes in which the number or affinity of Gal4 sites was reduced showed a response to Giant, suggesting that the Gal4 DNA binding domain provides a stable platform that can resist the activity of Giant. These results are consistent with a mechanism for short-range repression that involves blocking activator access to its cognate sites (Kulkarni, 2005).

Statistical models based on motif clustering are only partially successful at finding novel cis-regulatory elements in the genome, perhaps because they consider only site density and relative site affinity. However, it is probable that specific arrangements of binding motifs also contribute to biological function. The effect of alternative arrangements of Giant repressor and Gal4 activator binding sites was tested to determine if different arrangements or combinations resulted in distinct transcriptional outputs. All reporter arrangements tested contained four Giant binding sites and five high-affinity Gal4 binding sites, the latter bound by the minimal Gal4 activator. Flanking the five Gal4 activator sites with two Giant sites on either side resulted in repression of the proximal hsp70 lacZ reporter gene. Interspersing the Giant repressor binding sites between the Gal4 activator sites also resulted in the inhibition of lacZ expression. However, placing all four Giant binding sites 5' of the five Gal4 sites prevented Giant from repressing the hsp70 lacZ expression, suggesting again that promoter response cannot be calculated simply from overall activator-to-repressor stoichiometries (Kulkarni, 2005).

The Giant binding sites in reporter genes were placed in close proximity to the basal promoter; therefore, it is possible that Giant directly represses the basal promoter. To distinguish between repressor-basal promoter and repressor-activator effects, transcription of the w gene, which is ~4.5 kbp 3' of these sites, was tested. Again, it was observed that Giant mediates repression only when flanking or interspersed with activators but not when situated 5' of the activator sites. This result suggests that Giant is acting on the activator cluster rather than only on the basal promoter element (Kulkarni, 2005).

Previous analysis of the short-range repressor Giant demonstrated that, due to the extreme distance-dependent activity of this protein, subtle changes in the spacing of Giant binding sites endowed a promoter with high or low sensitivity to repression. Whether Giant's ability to repress a smaller cluster of three Gal4 sites could be affected by small changes in spacing between the activator and repressor binding sites was tested. Moving the smaller cluster of three Gal4 sites 37 bp away from the Giant binding sites results in the loss of repression, suggesting that reducing the amount of activation potential does not guarantee repression by Giant in all cases, even when the activators are located within 100 bp of the repressor sites. To ascertain whether the spacing effects seen are specific to this particular activator protein (i.e., Gal4-Gal4 AD), the ability of Giant to block transcription mediated by the full-length Gal4 protein expressed ubiquitously throughout the embryo and by the Gal4-VP16 fusion protein was tested. As seen with the minimal Gal4 activation domain, Giant is able to repress lacZ expression mediated by the full-length Gal4 protein and Gal4-VP16 from three sites that are adjacent to the Giant binding sites. Moving the three sites 37 bp further away results in the loss of repression of both Gal4-mediated and Gal4-VP16-mediated activation by Giant (Kulkarni, 2005).

The contextual dependencies of repression described above were characterized for the Giant repressor. To determine if similar rules applied to other types of repressors, parallel evaluations of the short-range repressors Giant, Knirps, and Krüppel were carried out. To test quantitative similarities or differences between these factors, reporters were created that would compare repressor activity on genes that represented permissive or nonpermissive contexts for the Giant protein. All three of these short-range repressors were unable to inhibit lacZ expression driven by the minimal Gal4 activator from five high-affinity Gal4 sites, indicating a similar limitation of repression on even proximally bound activators. The Giant and Krüppel factors were active in the corresponding regions of the embryo when tested against three Gal4 sites. The Knirps repressor was also active in this context, although in general, the levels of repression appeared to be lower. In contrast, the long-range repressor Hairy was able to mediate repression of transgenes containing either three or five high-affinity Gal4 sites. Interestingly, as the embryo aged, repression by Hairy was first attenuated and then completely absent during germ band elongation, indicating that this type of repression, though potent, is also transient. The similarity in the activity of the short-range repressors Giant, Knirps, and Krüppel, in contrast to that of Hairy, suggests that the contextual rules for repression are governed by the functional class of repressor and likely reflects mechanistic differences (Kulkarni, 2005).

Thus, using defined synthetic enhancer elements, it was demonstrated that there is a rich set of rules or contextual grammar that influences the activity of short-range repression extending beyond the generalization that these factors block activators situated within ~100 bp. Although distance is a critical factor in dictating repression effectiveness, it is not the only one, and in some cases, close proximity alone is not sufficient to ensure regulation by these transcriptional repressors. Activators can retain function even when the binding sites are within the previously defined 100-bp effective range of short-range repression. The manipulation of these composite enhancer elements in terms of the number of activator and repressor binding sites, relative affinities, spacing, and distribution of binding sites and the type of activation domains allowed other contextual parameters to be defined that dictate repression effectiveness. First, it was found that the ratio of activators and repressors is an important factor; in the context of five high-affinity Gal4 sites, four Giant sites can mediate repression but two sites do not. Reducing the number of Gal4 binding sites from five to three allowed two Giant sites to repress the lacZ reporter gene. Second, although the effectiveness of repression depends on stoichiometry between the number of activators and repressors, Giant repression of a smaller cluster of activators can be attenuated by subtle changes (<40 bp) in the spacing between the repressor and activator binding sites, even when activator binding sites in this situation are within the previously defined 100-bp range of repression. Such subtle changes in spacing between Giant and activator sites may explain the internal reconfigurations in enhancer design that have been demonstrated to occur between functionally homologous even-skipped stripe 2 enhancers and presumably many other cis-regulatory elements. 
Third, it was found that in order to mediate repression effectively, short-range repressors need to be judiciously placed, either flanking activator sites or interspersed among them, possibly to block multiple modes of activator-promoter interactions. A fourth finding is that repression effectiveness correlates with activator site affinity, and although binding affinity influences the strength of the activating signal, repression does not depend on the chemical nature of the activation domain. Although these experiments were carried out in the context of Gal4 fusion activators, it is likely that similar principles apply for repression of other activators, since repression of native activators also shows strong context dependence. Most likely, quantitative aspects of the relationships that were identified will vary depending on the DNA binding characteristics of different factors, whose characteristics will be established by further empirical tests. Determination of such quantitative factors contributes to understanding of enhancer design and should find application in bioinformatics analysis of novel gene regulatory sequences as well as providing insights into the evolution and biochemical activity of short-range repressors (Kulkarni, 2005).

Computational approaches have focused on the identification of transcriptional regulatory regions based on patterns of binding sites and evolutionary conservation of sequences. A more ambitious objective is to identify quantitative information about enhancers, including temporal, spatial, and quantitative output of such elements. More sophisticated analytical tools might also involve identification of conserved patterns of binding site stoichiometries, arrangements, and affinities that are not readily discernible by using conventional analyses. Recently, bioinformatics analysis of number and affinity of binding sites for the Knirps and Hunchback repressors was used to successfully predict the relative sensitivity of different regulatory sequences to these factors. In addition to quantitating the number and affinity of factor sites, this study indicates that bioinformatics analysis should also take into account the stoichiometry of activators to repressors, the exact spacing involved, and the nature of the DNA binding domains involved. Clearly, these studies focus on the effects of one class of repressor protein; more comprehensive work will be required to elaborate parameters relevant for other types of repressors and for activators. It is unlikely that particular contextual grammars would apply to all transcription factors; however, it is encouraging that the short-range repressors tested so far show similar characteristics. It therefore appears possible to model the properties of groups of proteins without having to develop distinct cis-regulatory grammar rules for each one. Incremental improvements to current approaches, based on the identification of cis-regulatory grammars, will usefully enhance the power of computational tools and allow the extension of bioinformatics analysis to specific data sets (Kulkarni, 2005).

The contextual grammar defined in this study presents a phenomenological perspective to short-range repression, but the results also shed light on possible repression mechanisms. Three models have been presented for the action of short-range repressors. First, by binding overlapping sites, these repressors might directly compete with activators for binding to DNA, a situation that can be demonstrated experimentally. This mechanism has not been shown to play a role in endogenous enhancers, and where experimentally tested, the DNA binding domain of Knirps was not able to mediate repression in the embryo. It is in any event unlikely to be important in cases where the activator and repressor binding sites are separated, as is the case here. Second, repressors might 'quench' neighboring activators, inhibiting their access to the DNA or blocking their interaction with other components of the transcriptional machinery. Third, the proteins might not affect activators but directly contact the basal transcriptional machinery. The results obtained in this study and a recent study are most compatible with the second, quenching model of action. Closely spaced factors can simultaneously mediate opposite transcriptional regulatory outputs, which would be hard to rationalize in the context of basal machinery interactions but is readily explainable in light of different susceptibilities of activators to chromatin remodeling. In addition, as shown in this study, the sensitivity of activators toward repression appears to be most closely linked to the DNA binding domain and affinity of the binding site rather than the activation domain, which may reflect a limited access to the DNA template under repression conditions (Kulkarni, 2005).

The apparent lack of activator specificity demonstrated by short-range repressors also suggests that these proteins function via a general mechanism. Giant, Knirps, Krüppel, and Snail can block the activity of a number of activators such as Bicoid, Hunchback, Dorsal, Twist, and D-Stat. Many biochemical and genetic analyses suggest that at least some of these activators activate transcription via distinct pathways. This study has demonstrated that repression effectiveness does not depend on the nature of the activation domain but correlates instead with activator binding site affinity and placement. These findings are consistent with a mechanism that inhibits transcription by blocking access to DNA by transcriptional activators via local chromatin changes (Kulkarni, 2005).

This model is also consistent with biochemical properties of short-range repressors. These proteins interact with CtBP, which in turn binds chromatin-modifying factors, including histone deacetylases (HDAC1 and HDAC2) and histone methyltransferases. Knirps genetically and physically interacts with Rpd3, the Drosophila homolog of HDAC1 (P. Struffi, unpublished data cited in Kulkarni, 2005). The Rpd3 protein in yeast is known to deacetylate histones at an extremely local level, consistent with its role in short-range repression in Drosophila. Knirps, Giant, and Krüppel can repress in a CtBP-independent fashion, but this activity appears to possess similar properties to that mediated by the Drosophila CtBP-dependent activity, providing a quantitative, rather than qualitative, effect. Thus, both the Drosophila CtBP-dependent and -independent activities of the short-range repressors might work via chromatin remodeling (Kulkarni, 2005).

This study demonstrates that the Hairy repressor, in addition to working over a longer range, is also a more potent repressor on a local level, presumably because of its distinct biochemical mechanism for repression. By examining the nature of the promoter complexes and the chromatin state before and after repression, the defined transcriptional switch elements used in this study will facilitate further biochemical characterization of short- and long-range repressors (Kulkarni, 2005).

Diverse spatial expression patterns emerge from unified kinetics of transcriptional bursting

How transcriptional bursting relates to gene regulation is a central question that has persisted for more than a decade. This study measures nascent transcriptional activity in early Drosophila embryos and characterizes the variability in absolute activity levels across expression boundaries. Boundary formation follows a common transcription principle: a single control parameter determines the distribution of transcriptional activity, regardless of gene identity, boundary position, or enhancer-promoter architecture. The underlying bursting kinetics were inferred, and the key regulatory parameter was identified as the fraction of time a gene is in a transcriptionally active state. Unexpectedly, both the rate of polymerase initiation and the switching rates are tightly constrained across all expression levels, predicting synchronous patterning outcomes at all positions in the embryo. These results point to a shared simplicity underlying the apparently complex transcriptional processes of early embryonic patterning and indicate a path to general rules in transcriptional regulation (Zoller, 2018).

A multitude of processes influence eukaryotic transcription rates. It is not clear which events might be more likely than others to determine the kinetics of bursting, either globally or in a gene-specific manner, nor is it known how bursting kinetics compare across endogenous genes over a range of expression levels. Quantitative bursting measurements reveal that all gap gene (hunchback, knirps, Krüppel and giant) expression boundaries arise from the same underlying kinetics regardless of the differences in regulatory elements. Thus, from the complex combination of diverse interactions specific to each gene emerges a simple, common strategy for transcriptional regulation (Zoller, 2018).

The recognition of shared regulation surfaced only upon development of a highly precise single-molecule method of quantification. Conclusions about bursting depend heavily upon understanding sources and extent of measurement error and minimizing variability from extrinsic sources. Extrinsic processes, such as cell growth and division, DNA duplication, and mRNA transport and decay, can significantly affect the apparent variability between cells and thus also bursting rates. These effects were minimized by measuring transcription at nascent sites in an endogenous system with synchronized cell divisions. Moreover, explicit quantification of measurement error resulted in a noise model that significantly constrained the inference framework. All these approaches are generally applicable to enable precise quantification in any system (Zoller, 2018).

The fundamental mean-cumulant relationships uncovered in this study demonstrate that a single-parameter distribution globally determines transcriptional activity. Employing the telegraph model, this study found that the modulation of mean occupancy (η) predicts mean mRNA synthesis rates comparable with previous measurements and reproduces the distribution of nascent activity, whereas k_ini and τ_n (see Terminology and Parameterization of Transcription Rates) are conserved. The global behavior observed is surprising, given that bursting is generally believed to be gene and promoter specific. Multiple factors and processes, including enhancer-promoter interactions, chromatin context, nucleosome occupancy, Pol II pausing, and transcription factor interactions, all impinge on bursting rates. It remains to be determined whether the same processes are modulated in the same manner or, conversely, whether different regulatory strategies have converged to generate identical transcriptional activity across genes (Zoller, 2018).
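The two-state (telegraph) parameterization mentioned above can be illustrated with a minimal sketch. It assumes the conventional definitions for a promoter that switches ON at rate k_on and OFF at rate k_off: mean occupancy η = k_on / (k_on + k_off) and switching correlation time τ_n = 1 / (k_on + k_off). These formulas, the function names, and all numerical values are illustrative assumptions and are not taken from Zoller (2018). The sketch shows how the mean initiation rate can be tuned through η alone while k_ini (Pol II initiation rate) and τ_n stay fixed.

```python
def telegraph_summary(k_on, k_off, k_ini):
    """Summarize a two-state promoter; all rates share one time unit (e.g. 1/min)."""
    total = k_on + k_off
    eta = k_on / total                 # mean occupancy: fraction of time ON
    tau_n = 1.0 / total                # switching correlation time
    return {
        "eta": eta,
        "tau_n": tau_n,
        "mean_rate": k_ini * eta,      # mean initiation rate averaged over ON/OFF
        "occupancy_var": eta * (1.0 - eta),  # Bernoulli variance of the ON/OFF state
    }


def rates_for(eta, tau_n):
    """Return (k_on, k_off) that give occupancy eta at a fixed correlation time tau_n."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    total = 1.0 / tau_n
    return eta * total, (1.0 - eta) * total


if __name__ == "__main__":
    K_INI = 7.0   # hypothetical initiation rate (per min)
    TAU_N = 3.0   # hypothetical correlation time (min)
    for eta in (0.1, 0.5, 0.85):
        k_on, k_off = rates_for(eta, TAU_N)
        s = telegraph_summary(k_on, k_off, K_INI)
        print(f"eta={s['eta']:.2f}  k_on={k_on:.3f}  k_off={k_off:.3f}  "
              f"mean_rate={s['mean_rate']:.2f}")
```

Under these assumptions, keeping τ_n constant while changing η forces k_on and k_off to change together, which is one way to read the statement that occupancy is the single modulated parameter.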

These observations raise the question of whether the common transcriptional bursting kinetics carry a functional advantage. In early embryos, the precise positioning of cell fates requires minimizing variability between nuclei, which is achieved by a combination of long mRNA lifetimes permitting accumulation and spatial averaging through the syncytial cytoplasm. In principle, modulating k_ini (Pol II initiation rate) at a constitutive promoter would generate the theoretical minimal (Poisson) transcriptional noise at all levels. The fact that neither constitutive activity (η≤0.85) nor Pol II saturation (k_elo/k_ini ~ 215 bp >> Pol II footprint) is ever observed suggests that some constraint prohibits this system from maintaining a continuous active state and/or it is not straightforward to alter k_ini. Instead, a constant switching correlation time suggests that this value is important in facilitating robust patterning. It is proposed that both expression timing and noise minimization jointly constrain switching rates (Zoller, 2018).

The mechanistic origins of the conserved parameters are unknown. One possibility is that protein-DNA affinities have been individually selected to confer the switching rates that were observed. However, it is unclear how transient transcription factor interactions, usually on the order of seconds, could generate bursts on the order of minutes. Another possibility is that the fast transcription factor binding kinetics are masked by the slower dynamics of common general factors involved in the transcription process. In fact, recent evidence suggests that mediator and TATA-binding protein binding, as well as the core promoter and its shape, play a key role in bursting. Alternatively, processes of potentially even slower dynamics, such as long-range enhancer-promoter interactions, chromatin modification, or Pol II pausing, may determine common bursting kinetics (Zoller, 2018).

The observed constancy of τ_n (switching correlation time; see Terminology and Parameterization of Transcription Rates) will guide further modeling and identification of the molecular mechanisms. This constancy is connected to the binomial noise level. Extensions of the two-state model must provide similar filtering of the binomial noise, which will restrict the possible class of models. For example, two particular extensions of the two-state model were tested. One possibility is a three-state model consisting of a two-step reversible activation. Alternatively, a model with an additional noise term, such as input noise stemming from input transcription factor diffusion, could explain dual modulation of switching rates observed under the two-state model. However, distinguishing these models will require live imaging (Zoller, 2018).

The common transcriptional parameters of the gap genes highlight a form of complexity reduction: despite the variety of upstream regulatory elements, all expression boundaries result from similar bursting kinetics. Whether this signature results from an underlying molecular simplicity has yet to be determined. Regardless of the mechanistic means by which these similarities are achieved, the convergence suggests general constraints that limit the range of permitted bursting rates and/or minimize transcription variability. The unexpected conservation of the initiation rate and the correlation time might indicate a path to general rules in transcriptional regulation. It is now possible to inquire about the breadth of these generalities and whether they apply to the same gene expressed in different cell types, to the transcriptome as a whole, or even across organisms. Indeed, it appears plausible that other classes of genes share similarly constrained bursting kinetics. The methods utilized in this study are applicable in a variety of systems and permit the discovery of the molecular mechanism(s) conferring unified transcription kinetics (Zoller, 2018).

Transcription factor binding affinities and DNA shape readout

An essential event in gene regulation is the binding of a transcription factor (TF) to its target DNA. Models considering the interactions between the TF and the DNA geometry proved to be successful approaches to describe this binding event, while conserving data interpretability. However, a direct characterization of the DNA shape contribution to binding is still missing due to the lack of accurate and large-scale binding affinity data. This study used a recently established binding assay to measure with high sensitivity the binding specificities of 13 Drosophila TFs, including dinucleotide dependencies to capture non-independent amino acid-base interactions. Correlating the binding affinities with all DNA shape features, this study found that shape readout is widely used by these factors. A shape readout/TF-DNA complex structure analysis validates this approach while providing biological insights, for example that positively charged or highly polar amino acids often contact nucleotides that exhibit strong shape readout (Schnepf, 2020).

The binding of transcription factors (TFs) to specific DNA sequences is a key event for the regulation of gene expression. The features defining a binding site have been the focus of several decades of research starting from simple consensus motif binding sites, later replaced by probabilistic models of TF binding assuming that each base contributes independently to the overall affinity, the so-called position-specific weight matrices (PWMs). With the advent of high-throughput methods, binding specificities became available for thousands of TFs and it has become clear that more complex models for binding sites using non-independent nucleotide interactions lead to more accurate predictions than PWMs. Nucleotide correlations can originate from amino acids that contact multiple bases simultaneously or from stacking interactions that determine binding through DNA shape readout. Hence, although determining binding specificities is crucial to predict binding sites in the genome, such data alone are not sufficient to fully describe TF-DNA binding interactions as they do not provide insights about the mechanism the TF employs to bind to different DNA sequences. To elucidate how the TF 'reads' the DNA is of paramount importance not only to improve algorithms predicting binding sites but also to refine fundamental understanding of how TFs are recruited to specific DNA regulatory sequences. To date, two distinct modes of protein-DNA recognition are known: base readout, which reflects the interplay at nucleobase-amino acid contacts mainly driven by the formation of hydrogen bonds, and shape readout, dominated by van der Waals interactions and electrostatic potentials (EPs), that recognizes the 3D structure of the DNA double helix. As a consequence, one can assume that, if the TF uses the shape readout, models incorporating DNA structural information should improve prediction of TF-DNA binding specificities. 
To test this hypothesis and thereby help model development, it would thus be highly desirable to (1) determine accurately TF-DNA binding specificities, including non-independent nucleotide interactions since deviations from linear binding can carry information about the influence of DNA shape, and (2) use these data to assess the contribution of DNA shape readout to the binding interaction. Despite the availability of techniques able to measure protein-DNA interactions at high throughput such as protein binding microarray (PBM), SELEX-seq, and SMiLE-seq, the accurate measurement of binding affinities remains problematic. Moreover, these methods require a resin- or filter-based selection step that introduces bias and/or use stringent washing protocols resulting in the loss of weak binders, which can lead to erroneously over-specific binding specificities. These limitations are critical, especially to determine higher-order binding interactions, which are intrinsically weak (Schnepf, 2020).
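The independence assumption behind a PWM, described above, can be made concrete with a short sketch: the score of a site is simply the sum of one contribution per position. The matrix below is entirely hypothetical (penalty energies in arbitrary units, with 0 for the preferred base), and the function names are illustrative; none of it comes from Schnepf (2020).

```python
# Hypothetical 4-position energy matrix: 0 for the preferred base,
# positive penalties for the others. Values are invented for illustration.
PWM = [
    {"T": 0.0, "A": 1.0, "C": 1.5, "G": 2.0},
    {"A": 0.0, "C": 2.0, "G": 2.5, "T": 1.5},
    {"A": 0.0, "C": 1.5, "G": 2.0, "T": 1.0},
    {"T": 0.0, "A": 0.5, "C": 1.0, "G": 1.5},
]


def pwm_energy(site, pwm=PWM):
    """Sum independent per-position penalties for a site of the same width as the PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(column[base] for column, base in zip(pwm, site))


def best_site(sequence, pwm=PWM):
    """Slide the PWM along the sequence; return (lowest energy, start index)."""
    width = len(pwm)
    windows = [
        (pwm_energy(sequence[i:i + width], pwm), i)
        for i in range(len(sequence) - width + 1)
    ]
    return min(windows)


if __name__ == "__main__":
    print(pwm_energy("TAAT"))     # preferred site in this toy matrix
    print(best_site("GGTAATCC"))  # best-scoring window and its position
```

Because every position is scored separately, a model of this kind cannot represent the case where the penalty for one base depends on its neighbor, which is the limitation that motivates the higher-order models discussed in the text.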

Evaluating the contribution to binding of DNA shape readout also poses challenges. First, although it had been known for a long time from crystal structures that TFs read out the DNA shape, it is still not possible to determine experimentally the DNA shape features at a large scale for any given DNA sequence. However, this would be necessary to quantitatively assess DNA shape influence on TF-DNA binding. This issue has been tackled by Zhou, who introduced 'DNAShape' (Zhou, 2013), an algorithm that predicts structural DNA features from nucleotide sequences, considering at each DNA position a local 5-mer nucleotide environment. The original set of four geometric shape features was later completed by Li (2017), who made tables available to calculate an expanded repertoire of 13 DNA shape features in total. Finally, Chiu (2017) added in a comparable fashion the EP, which approximates the minor-groove EPs. The EP reflects the mean charge density of the DNA backbone sensed by positively charged amino acid residues of the binding protein. Another difficulty in analyzing the influence of DNA shape on binding is that, in spite of all the advances made possible by 'DNAShape' and the succeeding studies, it is still not clear to what degree shape readout can be described as a function of the underlying DNA sequence. It is indeed very difficult to tease apart whether a binding protein favors a given nucleotide sequence because it recognizes certain amino acids of this sequence or rather certain shape features of the DNA helix. An important step was made with homeodomain TFs by Abe (2015), who was able to specifically remove the ability of the binding proteins to read a certain structural feature of DNA and to switch between different modes of DNA shape readouts. Another approach computationally dissects TF binding specificity in terms of base and shape readout (Rube, 2018). 
Remarkably, that study determined that 92-99% of the variance in the shape features can be explained with a model considering only dinucleotide dependencies. That study also found that interactions were much stronger between neighboring nucleotides than for non-adjacent positions, indicating that these dinucleotide features are the most important for binding. Hence, determining neighboring dinucleotide dependencies should be enough to capture most of the higher-order binding interactions. Unfortunately, although these studies shed new light on the role of DNA shape in TF-DNA recognition, they were limited to the analysis of only a few factors and used only four different shape features. This was due to the lack of quantitative data on higher-order binding specificities and to the lack of tables to calculate other shape features. Thus, a more comprehensive analysis of TF-DNA binding - especially including higher-order dependencies - is urgently needed to better understand TF-DNA binding in general and to what extent DNA shape features are recognized by TFs in particular. Recently, high-performance fluorescence anisotropy (HiP-FA) (Jung, 2018; Jung, 2019) was presented as a method that determines TF-DNA binding energies directly in solution with high sensitivity and at a large scale and allows for measuring the affinity of a TF to any given DNA sequence. These features make HiP-FA an ideal tool to measure TF-DNA binding specificities, in particular the higher-order dependencies since these interactions are generally weak and their accurate measurement is both difficult and indispensable. This study used HiP-FA to measure binding energies for 13 TFs of the Drosophila segmentation gene network belonging to 8 different binding domain families. 
Their zeroth-order binding specificities were determined taking into account only independent base contributions (position weight matrices, PWMs), and their first-order binding specificities were determined accounting for dinucleotide dependencies, represented by dinucleotide position weight matrices (DPWMs). This work defines DPWMs as the scoring matrices characterizing the deviations in the dinucleotide binding energies compared to pure PWMs (Schnepf, 2020).
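
The relation between a PWM and a DPWM described above can be illustrated with a minimal sketch. It assumes that energies are additive over positions and that the DPWM stores only a correction term for each adjacent dinucleotide; all names (`PWM`, `DPWM`, `pwm_energy`, `total_energy`) and all numerical values are hypothetical and are not taken from Schnepf (2020).

```python
from typing import Dict, List, Tuple

# Hypothetical zeroth-order energies (arbitrary units) for a 3-bp site.
# 0.0 marks the consensus base at each position.
PWM: List[Dict[str, float]] = [
    {"A": 0.0, "C": 1.2, "G": 0.8, "T": 1.5},
    {"A": 1.0, "C": 0.0, "G": 1.4, "T": 0.9},
    {"A": 0.7, "C": 1.1, "G": 0.0, "T": 1.3},
]

# Hypothetical first-order corrections: deviation of the measured energy
# of an adjacent dinucleotide from the sum of the two PWM terms.
# Key = (position of first base, dinucleotide); missing keys mean no deviation.
DPWM: Dict[Tuple[int, str], float] = {
    (0, "CA"): -0.3,
    (1, "TT"): 0.4,
}


def pwm_energy(seq: str) -> float:
    """Sum of independent per-base contributions (zeroth order)."""
    if len(seq) != len(PWM):
        raise ValueError("sequence length must match the PWM length")
    return sum(PWM[i][base] for i, base in enumerate(seq))


def dpwm_correction(seq: str) -> float:
    """Sum of adjacent-dinucleotide deviations (first order)."""
    return sum(DPWM.get((i, seq[i:i + 2]), 0.0) for i in range(len(seq) - 1))


def total_energy(seq: str) -> float:
    """PWM energy plus dinucleotide corrections."""
    return pwm_energy(seq) + dpwm_correction(seq)
```

In this toy setup, `total_energy("CAG")` differs from `pwm_energy("CAG")` only by the `(0, "CA")` correction, which is the sense in which a DPWM captures deviations from a pure PWM.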

Correlating the affinity data with the 13 known DNA shape features and the EP, it was found that nearly all the factors extensively use shape readout for DNA recognition, independently of the binding domain family. For 11 TFs for which structural information is available, the correlations were examined between their nuclear magnetic resonance (NMR)/co-crystal structures or structures of analog proteins obtained by homology-based modeling and the shape attributes obtained from this analysis. Finally, a cluster analysis was run to test if certain shape features tend to co-occur in the DNA shape readout used by these TFs (Schnepf, 2020).
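
As an illustration of what correlating affinity data with a shape feature can mean in practice, the following sketch computes a Pearson correlation coefficient between binding energy changes and one shape feature. The choice of Pearson correlation is an assumption made here for illustration, and the function name `pearson_r` and the example values are hypothetical; the statistics actually used are described in Schnepf (2020).

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: binding energy change of four mutant sequences and a
# predicted shape feature (for example minor groove width) at one position.
energy_change = [0.0, 0.5, 1.0, 1.5]
shape_value = [5.0, 4.5, 4.0, 3.5]

r = pearson_r(energy_change, shape_value)
```

With these made-up values the two series are perfectly anti-correlated. Because base identity and shape covary, a strong correlation of this kind suggests, but does not by itself prove, shape readout.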

Correlation between DNA shape readout and structural information is presented for homeodomain proteins Bicoid, Goosecoid and Ocelliless, for the bZip transcription factor Giant, and for the zinc finger transcription factor GATAe (see Correlation between DNA shape readout and structural information) (Schnepf, 2020).

HiP-FA constitutes a powerful tool to quantify TF-DNA binding specificity, especially the non-independent interactions, which need to be determined with high accuracy. The throughput of the method is not sufficient to discover de novo shape motifs or to explore the large sequence space accessible to sequencing-based methods like HT-SELEX or SMiLE-seq. However, this is not a major limitation, since the prior knowledge that HiP-FA requires (some information about the TF's binding preferences) is available for many TFs, and dinucleotide mutations are sufficient to cover most of the non-independent amino acid-nucleotide interactions. It would also be straightforward to extend the measurements to the flanking regions of the core binding motif (Schnepf, 2020).

By directly combining TF-DNA binding affinities, DNA shape features, and structural information, this study gained insights into their correlation, a debated topic due to their intrinsic covariation. Importantly, the results suggest that DNA shape readout is widespread among the TFs. The extensive use of DNA shape readout by TFs has become increasingly apparent over the past years, which comes as no surprise considering that the van der Waals interactions enabling shape readout account for two-thirds of the protein-DNA interactions (Rube, 2018). The correlation analysis of the shape readout values with protein-DNA complex structures leads to a generalization of the influence of the charged amino acids on the shape readout that has been described so far only for homeodomains in the minor groove region of the DNA. This effect is attributed to other DNA secondary structures (such as α-helices) and to other binding domains. In addition, for the POU domain Nub, non-charged but polar residues are described that can also lead to a strong DNA shape readout. These effects on DNA shape readout have not been reported previously. The difficulty in detecting the effects of charged and non-charged residues, especially in the major groove, is that they are obscured by the interactions involved in the base readout. This analysis was able to resolve even subtle effects due to the high sensitivity of the binding affinity measurements, and the shape analysis was able to deconvolve, to some extent, shape from base readout. In summary, this study determined the binding specificities of 13 Drosophila TFs including first-order dependencies, provided insights into the correlation between their binding affinities to DNA and the shape features of the DNA helix, and gave structural insights into the shape readout. This method could easily be extended to more factors and to different organisms to provide a refined catalog of TF-DNA shape readout landscapes (Schnepf, 2020).

Although the HiP-FA assay allows determination of accurate binding affinities at a relatively large scale, it cannot cover the whole sequence space as high-throughput methods do. To restrict the number of measurements, this study thus focused on the core binding motif of the TFs and on all mononucleotide and dinucleotide mutations of the consensus sequence rather than all possible mutations. This should, however, cover most of the TF-DNA interactions, since it has been shown that dinucleotide models explain >92% of the variance for the MGW, ProT, Roll, and HelT shape features (Rube, 2018). In addition, this analysis, which is based on the direct correlation between binding affinities and shape features, can only indirectly and partially tease apart the respective contributions of base and DNA shape readouts. Note that how to achieve the deconvolution between base and shape readouts is a longstanding issue in the field (Schnepf, 2020).
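
To give a sense of how much this restriction reduces the number of measurements, the sketch below enumerates all single-base mutants and all double mutants at adjacent positions of a consensus sequence. It assumes that "dinucleotide mutations" means simultaneous changes at two neighboring positions; the function names and the example consensus `ACGTAC` are hypothetical and do not correspond to any TF in the study.

```python
from itertools import product
from typing import List

BASES = "ACGT"


def single_mutants(consensus: str) -> List[str]:
    """All sequences differing from the consensus at exactly one position."""
    mutants = []
    for i, ref in enumerate(consensus):
        for base in BASES:
            if base != ref:
                mutants.append(consensus[:i] + base + consensus[i + 1:])
    return mutants


def adjacent_double_mutants(consensus: str) -> List[str]:
    """All sequences in which two neighboring positions are both mutated."""
    mutants = []
    for i in range(len(consensus) - 1):
        for b1, b2 in product(BASES, repeat=2):
            if b1 != consensus[i] and b2 != consensus[i + 1]:
                mutants.append(consensus[:i] + b1 + b2 + consensus[i + 2:])
    return mutants


consensus = "ACGTAC"  # hypothetical 6-bp core motif
n_single = len(single_mutants(consensus))           # 3 per position
n_double = len(adjacent_double_mutants(consensus))  # 9 per adjacent pair
n_all_variants = 4 ** len(consensus) - 1            # every non-consensus 6-mer
```

For a motif of length L this gives 3L single mutants and 9(L-1) adjacent double mutants, that is 18 and 45 sequences for L = 6, compared with 4095 possible non-consensus variants.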

GENE STRUCTURE

cDNA clone length - 1.8 kb

Bases in 5' UTR - 112

Exons - two

Bases in 3' UTR - 324

PROTEIN STRUCTURE

Amino Acids - 448

Structural Domains

The sequence of a cDNA from the Drosophila giant gene shows a basic domain followed by a leucine zipper motif. Both features contain characteristic conserved elements of the b-ZIP family of DNA-binding proteins (Capovilla, 1992).

EVOLUTIONARY HOMOLOGS

Breakdown of abdominal patterning in the Tribolium Krüppel mutant jaws: Opposing effects of Kr and gt homologs

During Drosophila segmentation, gap genes function as short-range gradients that determine the boundaries of pair-rule stripes. A classical example is Drosophila Krüppel (Dm'Kr) which is expressed in the middle of the syncytial blastoderm embryo. Patterning defects in Dm'Kr mutants are centred symmetrically around its bell-shaped expression profile. The role of Krüppel was examined in the short-germ beetle Tribolium castaneum where the pair-rule stripes corresponding to the 10 abdominal segments arise during growth stages subsequent to the blastoderm. The previously described mutation jaws is an amorphic Tc'Kr allele. Pair-rule gene expression in the blastoderm is affected neither in the amorphic mutant nor in Tc'Kr RNAi embryos. Only during subsequent growth of the germ band does pair-rule patterning become disrupted. However, only segments arising posterior to the Tc'Kr expression domain are affected, i.e., the deletion profile is asymmetric relative to the expression domain. Moreover, stripe formation does not recover in posterior abdominal segments, i.e., the Tc'Kr^jaws phenotype does not constitute a gap in segment formation but results from a breakdown of segmentation past the 5th eve stripe. Alteration of pair-rule gene expression in Tc'Kr^jaws mutants does not suggest a direct role of Tc'Kr in defining specific stripe boundaries as in Drosophila. Together, these findings show that the segmentation function of Krüppel in this short-germ insect is fundamentally different from its role in the long-germ embryo of Drosophila. The role of Tc'Kr in Hox gene regulation, however, is in better accordance to the Drosophila paradigm (Cerny, 2005).

The most obvious differences between the phenotypes of Krüppel in Tribolium and Drosophila are the homeotic transformations in Tc'Kr^jaws and Tc'Kr RNAi larvae that are not evident in Dm'Kr mutants. Such transformations are not entirely unexpected given that in Drosophila the expression boundaries of Hox genes are also set by gap genes, including Dm'Kr. However, in Drosophila gap mutants all segments that would be transformed because of misregulation of homeotic genes usually also suffer segmentation defects and fail to develop. By contrast, Tribolium segment primordia anterior of, and within, the Krüppel expression domain do differentiate, such that homeotic transformations can manifest themselves in the differentiated larva (Cerny, 2005).

The expression of homeotic genes in Tc'Kr^jaws embryos is consistent with the morphological transformations observed. The results with Tc'Dfd, Tc'Scr, Tc'Antp and Tc'Ubx confirm and extend earlier findings for Tc'pb and Tc'UBX/Tc'ABD-A expression. Notably, the complementary double-segmental expression of Dfd and Scr in Tc'Kr^jaws embryos explains the phenotype of alternating maxillary and labial segments. These expression patterns indicate that the posterior limit of Tc'Dfd and Tc'Scr domains is set through inhibition by Tc'Kr. In this respect, Tc'Kr fulfils a function similar to Drosophila gap genes (Cerny, 2005).

The homeotic phenotype of Tc'gt RNAi embryos could suggest a similar function in Hox regulation for Tc'gt. Indeed, Tc'Antp expands anteriorly and gnathal Hox genes (Tc'Scr) are repressed in Tc'gt RNAi embryos, consistent with the expansion of thoracic fates found in differentiated Tc'gt RNAi larvae. These transformations are just the opposite of those of Tc'Kr^jaws larvae. Interestingly, in embryos that lack Tc'Kr and at the same time have reduced Tc'gt activity, the homeotic effect of Tc'Kr^jaws clearly is epistatic. This shows that the ectopic Tc'gt stripes in the Tc'Kr mutant do not contribute to the Tc'Kr phenotype. However, this experiment suggests that the homeotic transformation of gnathal segments into thorax in Tc'gt RNAi embryos is indeed an indirect effect and comes about through misregulation of Tc'Kr in these embryos. This interpretation is supported by the finding that the Tc'Kr expression domain expands anteriorly in Tc'gt RNAi embryos. Evidently, it is expansion of Tc'Kr that results in repression of gnathal Hox genes in maxilla and labium of Tc'gt RNAi embryos, not loss of gnathal Hox gene activation. Similarly, expansion of Tc'Antp in Tc'gt RNAi larvae could be due to activation by anteriorly expanded Tc'Kr. However, as Antp is not significantly reduced in Tc'Kr^jaws, it seems more likely that Tc'gt acts directly to define the anterior boundary of the Tc'Antp domain (Cerny, 2005).

Distinct mechanisms for mRNA localization during embryonic axis specification in the wasp Nasonia

mRNA localization is a powerful mechanism for targeting factors to different regions of the cell and is used in Drosophila to pattern the early embryo. The parasitoid wasp Nasonia (Hymenoptera) undergoes long germ development similar to that of Drosophila, yet is evolutionarily very distant from flies (> 200 MY) and lacks bicoid. During oogenesis of Nasonia, mRNA localization is used extensively to replace the function of the bicoid gene for the initiation of patterning along the antero-posterior axis. Nasonia localizes both caudal and nanos to the posterior pole, whereas giant mRNA is localized to the anterior pole of the oocyte; orthodenticle1 (otd1) is localized to both the anterior and posterior poles. The abundance of differentially localized mRNAs during Nasonia oogenesis provided a unique opportunity to study the different mechanisms involved in mRNA localization. Through pharmacological disruption of the microtubule network, it was found that anterior otd1 and giant mRNA localization, as well as posterior caudal mRNA localization, were microtubule-dependent. Conversely, posterior otd1 and nanos mRNA localized correctly to the posterior upon microtubule disruption. However, actin is important in anchoring these two posteriorly localized mRNAs to the oosome, the structure containing the pole plasm. Moreover, knocking down the functions of the genes tudor and Bicaudal-D mimics disruption of microtubules, suggesting that tudor’s function in Nasonia is different from that in flies, where it is involved in formation of the pole plasm (Olesnicky, 2007).

Both the Drosophila and Nasonia ovarioles are meroistic, meaning that the nurse cells and oocyte are both of germ cell descent and originate from the same primordium, but differentiate during subsequent cell divisions. As each ovarian follicle develops and is positioned more distally along the ovariole, the nurse cells remain attached to one another and to the oocyte through ring canals, which arise from incomplete cleavage during cell division. The 16 sister cells that make up each germline cyst result from four of these incomplete divisions. An egg chamber forms comprising 15 nurse cells and the oocyte, surrounded by the somatic follicle cells, which form an epithelial layer around the oocyte. Nurse cells produce metabolites and other factors that transit through the ring canals to accumulate in the oocyte (Olesnicky, 2007).

The Drosophila oocyte is specified early during oogenesis as a result of the asymmetric segregation of the fusome, an organelle that connects the 16 cells. Once the oocyte has been specified, the polarity of the oocyte microtubule network becomes extremely dynamic and undergoes a major reorganization resulting from communication between the oocyte and follicle cells. This reorganization is essential to localize maternal mRNAs that will generate the axes of the embryo. At first, microtubule minus ends extend from the nurse cells into the oocyte toward a microtubule organizing center (MTOC) localized at the posterior pole of the oocyte, near its nucleus. Later, however, the posterior MTOC disassembles while multiple MTOCs form toward the anterior of the growing oocyte. At this stage, the microtubules are therefore pointing from the plus end at the posterior of the oocyte to the minus end at the anterior. mRNAs and the oocyte nucleus utilize the polarity of the microtubules to localize to the anterior or posterior pole (Olesnicky, 2007).

Nasonia oogenesis presents striking similarities to that of Drosophila. It is divided into five morphologically distinct stages. In stage 1, the nurse cells and oocyte are indistinguishable until they begin to segregate, with the oocyte lying towards the posterior of the follicle. By stage 2, the nurse cells and a smaller oocyte are clearly distinguishable, as a constriction forms between the oocyte and its supporting nurse cells. At this stage, the oocyte nucleus is positioned in the center of the cell. The oocyte continues to grow throughout stage 3 until it becomes larger than its accompanying nurse cells. Concomitantly, the oocyte nucleus migrates to the dorsal anterior cortex of the developing oocyte, as in Drosophila. Later, during stage 4, the nurse cells begin to degenerate as they empty all their contents into the oocyte. In the final stage (5), a vitelline membrane is constructed around the embryo (Olesnicky, 2007).

This study shows that the localization of four maternal mRNAs is achieved using at least 2 distinct mechanisms. It is shown that, during Nasonia oogenesis, microtubules play a major role in oocyte polarity and in the control of anterior localization of otd1 and gt mRNA and the posterior localization of cad mRNA. In contrast, the actin cytoskeleton is important for anchoring the oosome and is therefore essential for the localization of nanos and otd1 mRNA to the posterior pole of the oocyte (Olesnicky, 2007).

It is proposed that Nasonia utilizes two basic mechanisms for the localization of mRNA, a microtubule-dependent mechanism and an actin-dependent, microtubule-independent one. Anterior localization of gt and otd1 mRNA, as well as posterior localization of cad mRNA, all rely on a similar microtubule-dependent mechanism while posterior localization of otd1 and nos mRNAs relies on actin. In wild-type follicles, cad and gt mRNAs are initially localized, while later in oogenesis this localization is relaxed to achieve a more graded mRNA distribution. otd1 anterior mRNA, although not graded, is also localized loosely in wild-type follicles. nos mRNA localization and posteriorly localized otd1 mRNA, however, are tightly localized to the posterior in a microtubule-independent manner. Interestingly, in freshly laid embryos both posterior otd1 mRNA and nos mRNA are localized to the oosome. Maintaining localization of these two posteriorly localized mRNAs relies on the actin cytoskeleton. Additionally, actin might be required to anchor the oosome to the posterior pole of the oocyte, as well as to trap mRNA to the oosome. It is therefore likely that both mRNAs are localized to structures within the germ plasm, resulting in a tight localization that is maintained throughout oogenesis and early embryogenesis and does not rely extensively on microtubules (Olesnicky, 2007).

giant is a bona fide gap gene in the intermediate germband insect, Oncopeltus fasciatus

Drosophila undergoes a form of development termed long germ segmentation, where all segments are specified nearly simultaneously so that by the blastoderm stage, the entire body plan has been determined. This mode of segmentation is evolutionarily derived. Most insects undergo short or intermediate germ segmentation, where only anterior segments are specified early, and posterior segments are sequentially specified during germband elongation. These embryological differences imply that anterior and posterior segments might rely upon different molecular mechanisms. In Drosophila, embryos mutant for giant show a gap in the anterior as well as fusions of several abdominal segments. In Tribolium, a short germ beetle, giant is required for segmental identity, but not for segment formation, in gnathal segments and also for segmentation of the entire abdomen. This raises the possibility that giant might not act as a gap gene in short and intermediate germ insects. Oncopeltus fasciatus is an intermediate germ insect that is an outgroup to the clade containing Drosophila and Tribolium. The Oncopeltus homolog of giant was cloned and its expression and function during segmentation were determined. Oncopeltus giant was found to be a canonical gap gene in the maxillary and labial segments and also plays a gap-like role in the first four abdominal segments. These results suggest that giant was a bona fide gap gene in the ancestor of these insects, with this role being lost in the lineage leading towards Tribolium. This highlights the conservation of anterior patterning and the evolutionary plasticity of the genetic regulation controlling posterior segmentation, even in short and intermediate germ insects (Liu, 2010).


date revised: 4 April 2022

The Interactive Fly resides on the
Society for Developmental Biology's Web server.