Interactive Fly, Drosophila

Goosecoid: Biological Overview | Evolutionary Homologs | Regulation | Developmental Biology | References

Gene name - Goosecoid

Synonyms - D-Gsc

Cytological map position - 21C-D

Function - Transcription factor

Keywords - Brain, stomatogastric nervous system, foregut

Symbol - Gsc

FlyBase ID:FBgn0010323

Genetic map position - 2-[107]

Classification - Homeodomain - paired-type

Cellular location - nuclear

NCBI link: Entrez Gene

Gsc orthologs: Biolitmine

BIOLOGICAL OVERVIEW

The Drosophila Goosecoid homolog is expressed in two domains during embryogenesis. During early cellularization an expression domain anterior to the cephalic furrow gives rise to cells that ultimately will be located in the brain hemispheres, while a second domain invaginates inside the stomodeum along with ectodermal cells to give rise to the stomatogastric nervous system, ring gland and foregut.

Goosecoid expression in vertebrates takes place in the organizer, the compartment associated with the dorsal lip of the blastopore which functions in the recruitment of cells for involution through the blastopore and is responsible for self-determination of cells into dorsal mesoderm. Goosecoid expressing cells induce neighboring cells to migrate toward the anterior of the embryo and enhance involution. The migrating cells contribute mainly to the most anterior involuting tissues, namely anterior endoderm and head mesoderm. Thus Goosecoid appears to mimic properties of the "organizer" (Niehrs, 1993).

What is evolutionarily conserved with respect to GSC function when comparing between vertebrates and invertebrates? Any comparison is complicated by the fact that vertebrates are Deuterostomes (the mouth forms as a secondary consequence far away from the blastopore, which gives rise to the anus), while insects are Protostomes (the mouth forms at or near the blastopore). In all Protostomes, the stomodeum is the anterior end of the blastopore. In Drosophila the cells invaginating in the stomodeum (those that express Gsc) constitute the anterior-most structure of the embryo. The stomatogastric nervous system is believed to arise from the labrum, the most anterior embryonic segment (Schmidt-Ott, 1994).

In vertebrates, Gsc is expressed in the dorsal lip of the blastopore, the region leading the invagination of the mesoderm during gastrulation that will give rise to the head process and the anterior-most tissues of the vertebrate embryo (prechordal plate mesoderm). Thus Gsc is expressed in the foregut of both flies and vertebrates. Therefore Gsc expression in Drosophila and vertebrates could be evolutionarily related (Goriely, 1996).

Are the same genes involved in regulation of vertebrate Gsc as those involved in regulating Drosophila Gsc? In vertebrates, nodal, a member of the TGF-ß superfamily, has been implicated as a signal for induction of axial mesoderm during gastrulation. Activin too has been considered as a candidate for this role, but experiments with follistatin, an activin antagonist, lead to the conclusion that activin is not a requirement for mesoderm induction (Jones, 1995 and references). In the zebrafish, nodal expression leads to trunk duplication, but no obvious head duplications. Nodal induces ectopic Gsc expression and axis duplication in zebrafish. This observation is similar to results reported in Xenopus using other inducing factors in the TGF-ß superfamily. In contrast, complete axis duplications can be achieved by injection of certain WNT mRNAs in Xenopus (Toyama, 1995 and references). Interestingly, wingless (the Drosophila wnt prototype) expression is required not as an instructive signal, but as a permissive factor that coordinates the spatial activity of morphoregulatory signals within the stomatogastric nervous system anlage (González-Gaitán, 1995). Currently, the regulation of the stomodeal expression of Drosophila Gsc remains a mystery (Goriely, 1996).

The implication of this work is that the stomatogastric nervous system is of ancient origin, possibly older than the brain itself. The reason for this conclusion is its proximity to what are potentially the remnants of the organizer, a structure common to protostomes and deuterostomes. Expression in flies of Gsc in the brain anlage, albeit occurring earlier than expression in the stomatogastric anlage, represents an evolutionary expansion of a more primitive nervous system. A corollary to this argument is that both the spinal cord of vertebrates and the ventral cord of invertebrates are also derivative structures.

Drosophila goosecoid participates in neural development but not in body axis formation

Study of goosecoid expression and the effects of gsc mutation have led to a much better understanding of the development of the stomatogastric nervous system. goosecoid expression overlaps with one of the three SNS precursor groups invaginating from the foregut anlage. At stages 14 and 15, the SNS precursor cells that have invaginated migrate as three separate groups (or pouches) following apparently independent paths; their luminal sides are well separated. At the end of migration, the gsc expressing cells derived from the anterior-most pouch are aligned as a curved array of cells along the esophagus. These cells represent the first esophageal ganglion (EG1) of the SNS, a chain of 12-15 neurons associated with the dorsal surface of the esophagus. It is possible that the gsc expressing portion of the ring gland is derived from the anterior-most SNS invagination, since the ring gland in other insects is connected to the enteric nervous system via nerve connections (Hahn, 1996).

In embryos homozygous for a lethal allele of gsc, the anterior-most and middle pouches undergo a defective fusion event at stage 13, rather than remaining distinct. The two pouches stay fused while in migration, at least until stage 14. The two fused pouches in migration contain fewer than the normal number of gsc expressing cells. Subsequently, only one of these cells reaches the wall of the esophagus. It remains in this mutant condition and may represent a residual esophageal ganglion (EG1). The fusion of the pouches is most likely due to the exclusion of gsc expressing EG1 precursors from the anterior-most pouch. It is likely that gsc is required for the birth, survival or invagination (detachment from the foregut anlage) of EG1 precursor cells. The recurrent nerve, which connects the frontal ganglion to EG1 and the second esophageal ganglion (EG2) with two connectives in wild-type, has only one connective linked to EG2 in mutant embryos. In a second gsc mutant, representing a true null allele, a late forming (stage 17) hairpin loop region (normally found at the pharynx-esophagus junction) collapses and the esophagus is swayed anteriorly. Shortly thereafter, the midgut protrudes dorsally, compressing the dorsal-most tissues of the embryo. This defect could explain the embryonic lethality of the null allele. It is likely that cells under the control of gsc in the brain are not massively eliminated in gsc mutants, but there may be subtle defects in the brain region (Hahn, 1996).

Transcriptional integration of Wnt and Nodal pathways in establishment of the Spemann organizer

Signaling inputs from multiple pathways are essential for the establishment of distinct cell and tissue types in the embryo. Therefore, multiple signals must be integrated to activate gene expression and confer cell fate, but little is known about how this occurs at the level of target gene promoters. During early embryogenesis, Wnt and Nodal signals are required for formation of the Spemann organizer, which is essential for germ layer patterning and axis formation. Signaling by both Wnt and Nodal pathways is required for the expression of multiple organizer genes, suggesting that integration of these signals is required for organizer formation. This study demonstrates transcriptional cooperation between the Wnt and Nodal pathways in the activation of the organizer genes Goosecoid (Gsc), Cerberus (Cer), and Chordin (Chd). Combined Wnt and Nodal signaling synergistically activates transcription of these organizer genes. Effectors of both pathways occupy the Gsc, Cer and Chd promoters and effector occupancy is enhanced with active Wnt and Nodal signaling. This suggests that, at organizer gene promoters, a stable transcriptional complex containing effectors of both pathways forms in response to combined Wnt and Nodal signaling. Consistent with this idea, the histone acetyltransferase p300 is recruited to organizer promoters in a Wnt and Nodal effector-dependent manner. Taken together, these results offer a mechanism for spatial and temporal restriction of organizer gene transcription by the integration of two major signaling pathways, thus establishing the Spemann organizer domain (Reid, 2012).

The Wnt and Nodal pathways cooperate to activate transcription of the organizer genes Gsc, Cer, and Chd utilizing adjacent Wnt and Nodal responsive cis-regulatory elements present in the proximal promoters close to the start site of transcription. Functional conservation of these promoters is apparent in the sequence of the response elements, the proximity of the two elements, and their distance from the start site of transcription. The Sia/Twn response is mediated by defined P3 elements present in each of the promoters. Elements mediating the FoxH1-dependent response to Nodal signals have been identified in close proximity to the Sia/Twn elements of each promoter, but are less conserved in sequence. For Gsc, Cer and Chd, the two response elements are in close proximity and are separated by no more than 43 bp. And in each case, the pair of response elements has a strikingly similar location within 250 bp of the start site of transcription. These similar features of three organizer gene promoters argue for functional conservation in mediating the transcriptional response to Wnt and Nodal signaling inputs (Reid, 2012).

At enhancer regions, multiple bound transcription factors may interact to synergistically activate a strong transcriptional output. A number of mechanisms may account for synergy, including cooperative binding to regulatory elements, cooperative recruitment of coactivators, as well as alterations in DNA conformation or nucleosome deposition. The synergy in activation of Gsc, Cer, and Chd may reflect one or several of these mechanisms. While it remains unclear whether cooperative binding is occurring among the Wnt and Nodal effectors, the data clearly demonstrate that the steady state binding of transcriptional effectors is increased when both Wnt and Nodal pathway effectors occupy these promoters. This suggests that the presence of Sia/Twn with FoxH1 and Smad2/3 at organizer gene promoters facilitates enhanced occupancy, which is suggestive of cooperative binding (Reid, 2012).

The common coactivator and lysine acetyltransferase, p300, is recruited to organizer gene promoters in response to both the Wnt and Nodal pathways. The role that p300 plays in the synergistic transcription of organizer genes in response to Wnt and Nodal is not yet understood. Overexpression of p300 alone has no apparent phenotype, suggesting that increasing p300 levels does not alter expression of target genes. The results demonstrate a requirement for p300 activity in the expression of a Gsc reporter, as well as increased occupancy of p300 at organizer promoters in the presence of Sia/Twn or Nodal signals. However, no further enhancement of p300 occupancy is observed in response to the combination of Wnt and Nodal. Perhaps p300 provides a permissive function for transcription, while other recruited coactivators provide an activating function. Similarly, p300 could be acting as a scaffolding protein, either stabilizing a transcriptional complex of both Wnt and Nodal effectors, or allowing effectors to interact with other coactivators and/or the basal transcriptional machinery. The combined effects of Wnt and Nodal inputs could enhance p300 enzymatic activity, resulting in more extensive modification of local histones or transcription factors and increased transcription. In the context of organizer gene expression, changes in histone H3K9/14 or H4K5/8/12/16 acetylation have not been observed in response to Wnt or Nodal signals. However, p300 is also known to modify other lysine residues in histone tails, such as H3K18/27, as well as transcription factors. Activated Smad2/3 is acetylated by p300, which increases transcriptional activity. Preliminary results indicate that Sia is acetylated; however, it is unclear what role acetylation might play in Sia-dependent transcription, or whether other Nodal or Wnt effectors might be acetylated in a signal-dependent manner (Reid, 2012).

It is difficult to relate the experimental induction of organizer gene expression with combinations of Sia/Twn and Nodal to the natural activation of these genes in the intact embryo. It is hypothesized that the temporal and spatial restriction of organizer gene expression is due, at least in part, to the presence of Sia, Twn and Nodal effectors in the cells of the organizer. However, the increase in organizer gene expression observed in response to Sia+Xnr1 or Twn+Xnr1 is much greater than the endogenous expression levels of Gsc, Chd or Cer in the whole embryo. Similarly, expression of the Gsc-luciferase reporter in dorsal blastomeres results in an approximately 10-fold increase in luciferase activity, which is much lower than the nearly 36 to 48-fold induction observed in response to Sia+Xnr1 or Twn+Xnr1. It is hypothesized that an increase in ectopic axis formation would be observed in response to low doses of Sia+Xnr1 or Twn+Xnr1, but consistent results were not obtained. This issue might be more clearly addressed by timed loss of function experiments to specifically inhibit Sia/Twn or Nodal activity during organizer formation. It also seems likely that a number of other transcription factors, such as specific repressors of organizer gene expression may be involved in the formation of the organizer domain (Reid, 2012).

This work has defined a molecular mechanism for the transcriptional integration of Wnt and Nodal signals at organizer gene promoters in the Xenopus gastrula. It is further proposed that this mechanism is likely utilized in multiple vertebrate species to establish the organizer transcriptional domain. Support for the conservation of this mechanism across vertebrates comes from regulatory similarities in organizer formation, organizer gene expression and organizer gene promoter structure. Wnt and Nodal signals are essential for organizer gene expression and organizer formation in Xenopus, zebrafish, chick and mouse. The functional organization of organizer gene promoters is also conserved to an extent. Most strikingly in the case of Gsc, highly conserved DE and PE elements are present in the Xenopus, zebrafish, chick, mouse, and human Gsc genes. For Cer, conserved response elements are present in Xenopus, zebrafish and mouse, but their organization differs among species. For Chd, the available genomic information is insufficient for a conclusive comparison. The effectors of Nodal signaling, FoxH1 and Smad2/3, are also utilized in the control of organizer gene transcription in these vertebrate systems (Reid, 2012).

In contrast to these many conserved features of organizer gene regulation, Sia and Twn are only found in amphibian species, and not in other vertebrates. Given that Wnt inputs and the PE element are conserved across species, it is likely that functional homologs of Sia/Twn, mediating the Wnt-dependent transcriptional activation via the PE, exist in other vertebrate species. Alternatively, Sia/Twn may serve a regulatory function that is unique to organizer gene regulation in Xenopus; if this is the case, conservation of the PE may reflect distinct regulatory requirements among species. It should be noted that Sia/Twn are not the only species-specific regulators of organizer formation. In zebrafish, the transcriptional repressor bozozok is a direct target of the Wnt pathway, is expressed early in organizer formation, and is essential for organizer gene expression and organizer formation. However, as is the case for Sia/Twn, no vertebrate orthologs of bozozok have been identified. Whether functional homologs of Sia/Twn and bozozok exist in other species or whether these factors carry out species-specific regulatory functions remains to be seen. Given the dramatically different sizes and developmental rates for vertebrate embryos, and the non-autonomous function of the organizer, temporal and spatial constraints for organizer formation may differ among species. The non-conserved regulatory components found in Xenopus and zebrafish may be necessary for the unique regulatory demands of organizer formation in distinct species (Reid, 2012).

A number of important aspects of organizer gene regulation remain undefined. The full composition and structure of the activating protein complex, which forms at organizer gene promoters, is yet to be defined. How the Wnt and Nodal pathway effectors interact physically, what modifications occur in response to cofactor recruitment, and how together these result in enhanced, yet spatially restricted transcriptional output, are important mechanistic questions to pursue. The results offer a molecular mechanism for the initiation of organizer gene expression in a spatially and temporally precise manner. However, organizer gene expression is a dynamic process with changing regulatory inputs as development proceeds. Within 60 min of the initiation of organizer gene expression it is likely that promoter occupancy and regulatory complex composition change dramatically as the initiation phase gives way to the maintenance phase or cell lineage specification. Whether the mechanism proposed in this paper for the initiation of organizer gene expression is broadly applicable to the many known organizer genes, and across species as well, will require genome-wide analyses of effector occupancy, coregulator recruitment, and chromatin modification in several vertebrate species. Ongoing studies such as these will provide profound mechanistic insight at the interface of transcriptional control and embryonic pattern formation (Reid, 2012).

Cells within the organizer domain receive Wnt and Nodal signals and integrate these signals to generate temporally and spatially specific transcriptional responses. Wnt and Nodal inputs are directly received at multiple organizer gene promoters, and functional interactions among pathway effectors result in strong transcriptional activation of organizer genes. Integration of these signals is accomplished by assembly of an activating complex, consisting of Sia, Twn, FoxH1, Smad2/3, and p300 at the Gsc, Cer, and Chd promoters. In the late blastula, cells receiving both Wnt and Nodal inputs integrate these signals at the level of organizer gene promoters, thus establishing a temporally and spatially distinct transcription domain, resulting in formation of the Spemann organizer (Reid, 2012).

Transcription factor binding affinities and DNA shape readout

An essential event in gene regulation is the binding of a transcription factor (TF) to its target DNA. Models considering the interactions between the TF and the DNA geometry have proved to be successful approaches to describe this binding event, while conserving data interpretability. However, a direct characterization of the DNA shape contribution to binding is still missing due to the lack of accurate and large-scale binding affinity data. This study used a recently established binding assay to measure with high sensitivity the binding specificities of 13 Drosophila TFs, including dinucleotide dependencies to capture non-independent amino acid-base interactions. Correlating the binding affinities with all DNA shape features, this study found that shape readout is widely used by these factors. A shape readout/TF-DNA complex structure analysis validates this approach while providing biological insights, such as the finding that positively charged or highly polar amino acids often contact nucleotides that exhibit strong shape readout (Schnepf, 2020).

The binding of transcription factors (TFs) to specific DNA sequences is a key event for the regulation of gene expression. The features defining a binding site have been the focus of several decades of research starting from simple consensus motif binding sites, later replaced by probabilistic models of TF binding assuming that each base contributes independently to the overall affinity, the so-called position-specific weight matrices (PWMs). With the advent of high-throughput methods, binding specificities became available for thousands of TFs and it has become clear that more complex models for binding sites using non-independent nucleotide interactions lead to more accurate predictions than PWMs. Nucleotide correlations can originate from amino acids that contact multiple bases simultaneously or from stacking interactions that determine binding through DNA shape readout. Hence, although determining binding specificities is crucial to predict binding sites in the genome, such data alone are not sufficient to fully describe TF-DNA binding interactions as they do not provide insights about the mechanism the TF employs to bind to different DNA sequences. To elucidate how the TF 'reads' the DNA is of paramount importance not only to improve algorithms predicting binding sites but also to refine fundamental understanding of how TFs are recruited to specific DNA regulatory sequences. To date, two distinct modes of protein-DNA recognition are known: base readout, which reflects the interplay at nucleobase-amino acid contacts mainly driven by the formation of hydrogen bonds, and shape readout, dominated by van der Waals interactions and electrostatic potentials (EPs), that recognizes the 3D structure of the DNA double helix. As a consequence, one can assume that, if the TF uses the shape readout, models incorporating DNA structural information should improve prediction of TF-DNA binding specificities. 
To test this hypothesis and thereby help model development, it would thus be highly desirable to (1) determine accurately TF-DNA binding specificities, including non-independent nucleotide interactions since deviations from linear binding can carry information about the influence of DNA shape, and (2) use these data to assess the contribution of DNA shape readout to the binding interaction. Despite the availability of techniques able to measure protein-DNA interactions at high throughput such as protein binding microarray (PBM), SELEX-seq, and SMiLE-seq, the accurate measurement of binding affinities remains problematic. Moreover, these methods require a resin- or filter-based selection step that introduces bias and/or use stringent washing protocols resulting in the loss of weak binders, which can lead to erroneously over-specific binding specificities. These limitations are critical, especially to determine higher-order binding interactions, which are intrinsically weak (Schnepf, 2020).

Evaluating the contribution of DNA shape readout to binding also poses challenges. First, although it has been known for a long time from crystal structures that TFs read out the DNA shape, it is still not possible to determine experimentally the DNA shape features at a large scale for any given DNA sequence. However, this would be necessary to quantitatively assess DNA shape influence on TF-DNA binding. This issue has been tackled by Zhou, who introduced 'DNAShape' (Zhou, 2013), an algorithm that predicts structural DNA features from nucleotide sequences, considering at each DNA position a local 5-mer nucleotide environment. The original set of four geometric shape features was later completed by Li (2017), who made tables available to calculate an expanded repertoire of 13 DNA shape features in total. Finally, Chiu (2017) added in a comparable fashion the EP, which approximates the minor-groove EPs. The EP reflects the mean charge density of the DNA backbone sensed by positively charged amino acid residues of the binding protein. Another difficulty in analyzing the influence of DNA shape on binding is that, in spite of all the advances made possible by 'DNAShape' and the succeeding studies, it is still not clear to what degree shape readout can be described as a function of the underlying DNA sequence. It is indeed very difficult to tease apart whether a binding protein favors a given nucleotide sequence because it recognizes certain bases of this sequence or rather certain shape features of the DNA helix. An important step was made with homeodomain TFs by Abe (2015), who was able to specifically remove the ability of the binding proteins to read a certain structural feature of DNA and to switch between different modes of DNA shape readouts. Another approach computationally dissects TF binding specificity in terms of base and shape readout (Rube, 2018).
Remarkably, that study determined that 92-99% of the variance in the shape features can be explained with a model considering only dinucleotide dependencies. That study also found that interactions were much stronger between neighboring nucleotides than for non-adjacent positions, indicating that these dinucleotide features are the most important for binding. Hence, determining neighboring dinucleotide dependencies should be enough to capture most of the higher-order binding interactions. Unfortunately, although these studies shed new light on the role of DNA shape in TF-DNA recognition, they were limited to the analysis of only a few factors and used only four different shape features. This was due to the lack of quantitative data on higher-order binding specificities and to the lack of tables to calculate other shape features. Thus, a more comprehensive analysis of TF-DNA binding - especially including higher-order dependencies - is urgently needed to better understand TF-DNA binding in general and to what extent DNA shape features are recognized by TFs in particular. Recently, high-performance fluorescence anisotropy (HiP-FA) (Jung, 2018; Jung, 2019) was presented as a method that determines TF-DNA binding energies directly in solution with high sensitivity and at a large scale and allows for measuring the affinity of a TF to any given DNA sequence. These features make HiP-FA an ideal tool to measure TF-DNA binding specificities, in particular the higher-order dependencies since these interactions are generally weak and their accurate measurement is both difficult and indispensable. This study used HiP-FA to measure binding energies for 13 TFs of the Drosophila segmentation gene network belonging to 8 different binding domain families.
Their 0th order of binding specificities were determined taking only into account independent base contributions (PWM) and their first order of binding specificities accounting for dinucleotide dependencies represented by the dinucleotide position weight matrices (DPWMs). This work defines DPWMs as being the scoring matrices characterizing the deviations in the dinucleotide binding energies compared to pure PWMs (Schnepf, 2020).
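
The relationship between the zeroth-order (PWM) and first-order (DPWM) descriptions can be illustrated with a small sketch. The Python below assumes a simple additive energy model: the PWM contributes one independent term per position, and the DPWM contributes a correction term for each pair of adjacent positions. The function name, the matrix layout and every numerical value are hypothetical and are not taken from Schnepf (2020).

```python
from typing import Dict, List


def binding_energy(seq: str,
                   pwm: List[Dict[str, float]],
                   dpwm: List[Dict[str, float]]) -> float:
    """Relative binding energy of seq: PWM sum plus dinucleotide corrections.

    pwm[i][base]  -> independent contribution of `base` at position i
    dpwm[i][pair] -> deviation from additivity for the dinucleotide at
                     positions i, i+1 (missing pairs count as zero)
    """
    if len(seq) != len(pwm) or len(dpwm) != len(seq) - 1:
        raise ValueError("matrix dimensions do not match sequence length")
    independent = sum(pwm[i][base] for i, base in enumerate(seq))
    correction = sum(dpwm[i].get(seq[i:i + 2], 0.0)
                     for i in range(len(seq) - 1))
    return independent + correction


# Hypothetical 3-bp site with consensus "TAA"; arbitrary energy units,
# 0.0 for the consensus base and positive penalties for mismatches.
PWM = [
    {"A": 1.0, "C": 1.5, "G": 1.5, "T": 0.0},
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 1.0},
    {"A": 0.0, "C": 1.0, "G": 0.5, "T": 1.0},
]
# Hypothetical non-additive term: the double mutant "GG" at positions 1-2
# binds 0.5 units better than the PWM alone predicts.
DPWM = [{}, {"GG": -0.5}]

print(binding_energy("TAA", PWM, DPWM))  # consensus: 0.0
print(binding_energy("TGG", PWM, DPWM))  # PWM sum 2.5, corrected to 2.0
```

In this toy model a DPWM entry of zero means the two positions behave independently, so a DPWM filled with zeros reduces to the plain PWM.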

Correlating the affinity data with the 13 known DNA shape features and the EP, it was found that nearly all the factors extensively use shape readout for DNA recognition, independently of the binding domain family. For 11 TFs for which structural information is available, the correlations were examined between their nuclear magnetic resonance (NMR)/co-crystal structures or structures of analog proteins obtained by homology-based modeling and the shape attributes obtained from this analysis. Finally, a cluster analysis was run to test if certain shape features tend to co-occur in the DNA shape readout used by these TFs (Schnepf, 2020).
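
As a rough illustration of what correlating affinities with a shape feature involves, the sketch below computes a Pearson correlation coefficient between binding energy changes and one predicted shape feature across a set of mutant sites. The choice of the Pearson coefficient is an assumption (the text does not name the statistic used), and the five data points, including the minor groove width (MGW) values, are invented for illustration only.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: binding energy change for five mutant sites and the
# predicted minor groove width (in angstroms) at the mutated position.
energy_change = [0.1, 0.4, 0.9, 1.3, 1.8]
minor_groove_width = [5.8, 5.5, 5.1, 4.6, 4.2]

r = pearson_r(energy_change, minor_groove_width)
print(f"r = {r:.3f}")  # strongly negative for this toy data set
```

A coefficient near +1 or -1 for a given feature and position would be read as a candidate shape readout signal; a value near zero would not.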

Correlation between DNA shape readout and structural information is presented for homeodomain proteins Bicoid, Goosecoid and Ocelliless, for the bZip transcription factor Giant, and for the zinc finger transcription factor GATAe (see Correlation between DNA shape readout and structural information) (Schnepf, 2020).

HiP-FA constitutes a powerful tool to quantify TF-DNA binding specificity, especially the non-independent interactions, which must be determined with high accuracy. The throughput of the method is not sufficient to discover de novo shape motifs or to explore the large sequence space possible with sequencing-based methods like HT-SELEX or SMiLE-seq. However, this is not a major limitation since the prior knowledge that HiP-FA requires (some information about the TF's binding preferences) is known for many TFs, and dinucleotide mutations are sufficient to cover most of the non-independent amino acid-nucleotide interactions. It would also be straightforward to extend the measurements into the flanking regions of the core binding motif (Schnepf, 2020).

By combining directly TF-DNA binding affinities, DNA shape features, and structural information, this study gained insights into their correlation, a debated topic due to their intrinsic covariation. Importantly, the results suggest that DNA shape readout is widespread among the TFs. The extended use of DNA shape readout by TFs has become increasingly apparent over the past years, which comes as no surprise considering that the van der Waals interactions enabling shape readout account for two-thirds of the protein-DNA interactions (Rube, 2018). The correlation analysis of the shape readout values with protein-DNA complex structures leads to a generalization of the influence of the charged amino acids on the shape readout that has been described so far only for homeodomains in the minor groove region of the DNA. This effect is attributed to other DNA secondary structures (such as α-helices) and to other binding domains. In addition, for the POU domain Nub, non-charged but polar residues are described that can also lead to a strong DNA shape readout. These effects on DNA shape readout have not been reported previously. The difficulty in detecting the effects of charged and non-charged residues, especially in the major groove, is that they are obscured by the interactions involved in the base readout. This analysis was able to resolve even subtle effects due to the high sensitivity of the binding affinity measurements, and the shape analysis was able to deconvolve, to some extent, shape from base readout. In summary, the binding specificities were determined for 13 Drosophila TFs including first-order dependencies, provided insights into the correlation between their binding affinities to DNA and the shape features of the DNA helix, and gave structural insights into the shape readout. This method could easily be extended to more factors and to different organisms to provide a refined catalog of TF-DNA shape readout landscapes (Schnepf, 2020).

Although the HiP-FA assay allows determination of accurate binding affinities at a relatively large scale, the whole sequence space cannot be covered as high-throughput methods do. To restrict the number of measurements, this study thus focused on the core binding motif of the TFs, and on all mononucleotide and dinucleotide mutations of the consensus sequence rather than all possible mutations. This should, however, cover most of the TF-DNA interactions since it has been shown that dinucleotide models explain >92% of the variance for the MGW, ProT, Roll, and HelT shape features (Rube, 2018). In addition, this analysis based on the direct correlation between binding affinities and shape features can only indirectly and partially tease apart the respective contributions of base and DNA shape readouts. Note that how to achieve the deconvolution between base and shape readouts is a longstanding issue in the field (Schnepf, 2020).
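
The saving from restricting measurements to mono- and dinucleotide mutations of a consensus can be seen by enumerating them. The sketch below is only illustrative: it assumes that "dinucleotide mutations" means both bases of an adjacent pair are changed, and the 6-bp consensus used is a hypothetical example rather than a motif reported in the text.

```python
from itertools import product
from typing import List

BASES = "ACGT"


def single_mutants(consensus: str) -> List[str]:
    """All sequences differing from the consensus at exactly one position."""
    mutants = []
    for i, ref in enumerate(consensus):
        for base in BASES:
            if base != ref:
                mutants.append(consensus[:i] + base + consensus[i + 1:])
    return mutants


def adjacent_double_mutants(consensus: str) -> List[str]:
    """All sequences in which both bases of one adjacent pair are changed."""
    mutants = []
    for i in range(len(consensus) - 1):
        for b1, b2 in product(BASES, repeat=2):
            if b1 != consensus[i] and b2 != consensus[i + 1]:
                mutants.append(consensus[:i] + b1 + b2 + consensus[i + 2:])
    return mutants


site = "TAATCC"  # hypothetical 6-bp consensus, for illustration only
singles = single_mutants(site)
doubles = adjacent_double_mutants(site)

# 3 substitutions per position, 9 per adjacent pair, versus 4**L in total
print(len(singles), len(doubles), 4 ** len(site))  # 18 45 4096
```

For a motif of length L this gives 3L single and 9(L-1) adjacent double mutants, which grows linearly with L, whereas the full sequence space grows as 4 to the power L.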

GENE STRUCTURE

cDNA clone length - 2372

Bases in 5' UTR - 464

Exons - 3

Bases in 3' UTR - 635

PROTEIN STRUCTURE

Amino Acids - 419

Structural Domains

The homeodomain at residue position 284-345 contains a lysine at position 50. This is a landmark of proteins encoded by genes like bicoid, orthodenticle and sine oculis that are involved in patterning the anterior region of the Drosophila embryo (Goriely, 1996).

The Engrailed homeoprotein is a dominantly acting, so-called 'active' transcriptional repressor, both in cultured cells and in vivo. When retargeted via a homeodomain swap to the endogenous fushi tarazu gene (ftz), Engrailed actively represses ftz, resulting in a ftz mutant phenocopy. Functional regions of Engrailed have been mapped using this in vivo repression assay. In addition to a region containing an active repression domain identified in cell culture assays, there are two evolutionarily conserved regions that contribute to activity. The one that does not flank the HD is particularly crucial to repression activity in vivo. This domain is present not only in all engrailed-class homeoproteins but also in all known members of several other classes, including goosecoid, Nk1, Nk2 (vnd) and muscle segment homeobox. The repressive domain is located in the eh1 region, known as 'region three', found several hundred amino acids N-terminal to the homeodomain. The consensus sequence, arrived at by comparing Engrailed, Msh, Gsc, Nk1 and NK2 proteins from a variety of species, consists of a 23 amino acid homologous motif found in all these proteins. Thus Engrailed's active repression function in vivo is dependent on a highly conserved interaction that was established early in the evolution of the homeobox gene superfamily. Using rescue transgenes it has been shown that the widely conserved in vivo repression domain is required for the normal function of Engrailed in the embryo (Smith, 1996).


date revised: 1 Dec 97

The Interactive Fly resides on the
Society for Developmental Biology's Web server.