Bearded
EVOLUTIONARY HOMOLOGS

Enhancer of split m4 and malpha and Tom and Bob

Many cell fate decisions in higher animals are based on intercellular communication governed by the Notch signaling pathway. Developmental signals received by the Notch receptor cause Suppressor of Hairless [Su(H)] to mediate transcription of target genes. In Drosophila, the majority of Notch target genes known so far are located in the Enhancer of split complex [E(spl)-C], encoding small basic helix-loop-helix (bHLH) proteins that presumably act as transcriptional repressors. The E(spl)-C contains three additional Notch responsive, non-bHLH genes: m4 and malpha are structurally related, whilst m2 encodes a novel protein. All three genes depend on Su(H) for initiation and/or maintenance of transcription. The two other non-bHLH genes within the locus, m1 and m6, are unrelated to the Notch pathway: m1 might code for a protease inhibitor of the Kazal family, and m6 for a novel peptide. The five genes described in this paper are arrayed between mbeta and m7, both coding for bHLH proteins. Two other bHLH genes, m3 and m5, are intermingled with the five. Bearded and M4 are 16% identical. Furthermore, in transcripts of both Brd and m4 there are three common regulatory sequence motifs within the 3' UTR. These are known as the 'Brd box', the 'GY box' and the 'K box'. As in m4, the sequence motif of the Brd box is found twice in the 3' UTR of malpha mRNA at similar positions but without a GY box. None of the other four non-bHLH E(spl)-C genes contains either a Brd box or a GY box. The K box appears to be more common. It is found twice in the 3' UTR of malpha and once each in the 3' UTRs of m2 and m6 (Wurmbach, 1999).

malpha and m4 embryonic expression patterns are nearly indistinguishable, and appear very similar to those of E(spl)-C bHLH genes, particularly m5, m7 and m8. The expression patterns suggest that both genes are under the same regulatory control as are the E(spl) bHLH genes and thus might serve a role in Notch-mediated cell differentiation. Surprisingly, m2 transcripts also accumulate in a pattern reminiscent of the transcript distribution of E(spl) bHLH genes, although there are no structural similarities with either the bHLH or the m4/malpha genes. Therefore m2 might serve as a Notch target gene. Unlike the other E(spl)-C genes, the gene is expressed within neuronal cells in the embryo. m6 mRNA accumulates in the CNS, brain and PNS, and in imaginal tissues. m1 is expressed in the digestive tract. Su(H) is shown to be the transmitter of Notch signaling to malpha, m4 and m2. Thus there are three types of Notch responsive genes. The bHLH genes are represented by m8 and others. m4 and malpha share structural similarity with Bearded. These Bearded family proteins share a presumptive basic amphipathic alpha-helical domain but differ with regard to other conserved sequence elements. m2, coding for a novel protein, represents the third class of Notch responsive genes (Wurmbach, 1999).

Brd and most genes of the E(spl)-C (including both bHLH genes and m4) are also subject to common modes of negative post-transcriptional regulation via defined sequence motifs present in their 3' UTRs. In particular, K boxes (TGTGAT) and Brd boxes (AGCTTTA), which are broadly distributed within the 3' UTRs of these genes, mediate negative regulation of transcript accumulation and translational efficiency. Two Brd boxes and two K boxes have been identified in the 3' UTR of malpha, a K box and a Brd box in the 3' UTR of Twin of m4 (Tom), and two K boxes in the 3' UTR of Brother of Bearded (Bob). Moreover, the second K box in Bob is directly adjacent to a CAAC motif, a sequence that has been implicated in augmentation of regulation by an associated K box. Bob's 3' UTR does not contain a canonical Brd box, but does contain a 7/7 match to a variant of the Brd box (TGCTTTA) found in the D. hydei ortholog of E(spl)m4. Overall, the presence of canonical K box and Brd box sequences in the 3' UTRs of Bob, Tom and malpha strongly suggests that most genes of the Brd-C and E(spl)-C are subject to the same two modes of negative post-transcriptional regulation (Lai, 2000a).
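The motif census described above can be reproduced for any 3' UTR with a simple exact-match scan. The following is a minimal Python sketch, not the method used in the cited work: the motif sequences are taken from the text, while the function name find_motifs and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

# Motif sequences as quoted in the text (DNA alphabet).
MOTIFS = {"K box": "TGTGAT", "Brd box": "AGCTTTA", "GY box": "GTCTTCC"}


def find_motifs(utr, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for exact matches."""
    # Accept RNA or DNA input by mapping U to T.
    utr = utr.upper().replace("U", "T")
    hits = {}
    for name, seq in motifs.items():
        # A lookahead is used so that overlapping occurrences are reported too.
        hits[name] = [m.start() for m in re.finditer(f"(?={seq})", utr)]
    return hits


# Hypothetical toy 3' UTR (NOT a real transcript): one Brd box, two K boxes,
# the second K box directly followed by CAAC.
toy_utr = "AAAGCTTTACCTGTGATGGTGTGATCAAC"
hits = find_motifs(toy_utr)
print(hits)

# Check whether a K box is directly adjacent to a CAAC motif.
second_k = hits["K box"][-1]
k_len = len(MOTIFS["K box"])
print(toy_utr[second_k + k_len:second_k + k_len + 4] == "CAAC")
```

An exact-match scan is the simplest choice here because the motifs are described as highly constrained; variants such as the TGCTTTA form mentioned above would need to be added to the motif table explicitly.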

A third class of conserved 3' UTR sequence motif, the GY box (GTCTTCC), is also shared by Brd and genes of the E(spl)-C. Although the precise function of this motif is poorly understood, it is possible that this motif has a role in forming RNA:RNA duplexes with a complementary sequence motif (the proneural box, AATGGAAGACAAT) found in the 3' UTRs of proneural genes, including ac, lethal of scute (l'sc) and atonal (ato). The 3' UTRs of both Bob and Tom each contain a pair of GY boxes. Closer examination of the GY boxes of Bob, Tom and Brd reveals an unexpected degree of sequence identity in the nucleotides flanking the GY box heptamer in Brd-C genes. The GY boxes of Tom are found within a 19/19 direct repeat in the Tom 3' UTR, while Bob's GY boxes fall within a 15/15 direct repeat in its 3' UTR. Moreover, an exact 16-bp sequence including a GY box is common to the 3' UTRs of Brd, Bob, and Tom, and all five GY boxes in these Brd-C genes are contained within an exact 15/15 identity. It is striking that this latter sequence is exactly complementary to a 15-nt sequence shared by proneural boxes located in the ato and l'sc 3' UTRs. That the GY boxes of all Brd-C genes should share such an exceptional relationship with the proneural boxes of divergent proneural genes located on different chromosomes (ato and l'sc) strongly suggests that these complementary sequence elements are subject to common constraint. The two GY boxes in the 3' UTR of E(spl)m4 are more related to the extended GY box consensus just described than are the GY boxes of most E(spl)-C bHLH transcripts. Thus, the constraint on m4's GY boxes similarly appears to extend well beyond the core seven nucleotides of this motif, in a way that is also evidently connected to the proneural box sequence. Finally, the 3' UTR segments containing the second GY box of Bob and the first GY box of Tom are related by an extraordinary 32-nt exact identity. That members of distinct subfamilies of the Brd gene family should share such an extended GY box-containing identity further underscores the sequence constraint associated with this motif, and may suggest the existence of a common 'partner' gene for Bob and Tom that carries a complementary sequence. The complementary 3' UTR sequence motifs found in proneural genes and Brd family genes can mediate the formation of RNA:RNA duplexes in vitro. Since transcripts of members of the proneural gene family and the Brd gene family co-accumulate in all developmental settings where neurogenesis occurs, it is suggested that these RNA:RNA duplexes also form in vivo, although the possible regulatory consequences of this association remain to be determined (Lai, 2000a).
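The complementarity between the GY box and the proneural box can be checked directly from the two core sequences quoted above. The sketch below is illustrative only; the helper names reverse_complement and complement_site are hypothetical, and only the 7-nt GY box and the 13-nt proneural box given in the text are used, not the extended 15-nt identity.

```python
# Core motif sequences as quoted in the text (DNA alphabet).
GY_BOX = "GTCTTCC"
PRONEURAL_BOX = "AATGGAAGACAAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def complement_site(motif, target):
    """Return the 0-based index in `target` where the reverse complement
    of `motif` occurs, or -1 if it does not occur."""
    return target.upper().find(reverse_complement(motif))


print(reverse_complement(GY_BOX))              # GGAAGAC
print(complement_site(GY_BOX, PRONEURAL_BOX))  # 3
```

The reverse complement of the GY box heptamer lies inside the proneural box, which is the sequence relationship that would allow the two 3' UTR motifs to base-pair in an antiparallel RNA:RNA duplex.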

During Drosophila development, transcriptional activation of genes of the Enhancer of split Complex (E(spl)-C) is a major response to cell-cell signaling via the Notch (N) receptor. Although the structure and function of the E(spl)-C have been studied intensively during the past decade, these efforts have focused heavily on seven transcription units that encode basic helix-loop-helix (bHLH) repressors; the non-bHLH members of the complex have received comparatively little attention. In this report, the structure, regulation and activity of the m1, m2 and m6 genes of the E(spl)-C are examined. E(spl)m2 and E(spl)m6 are found to encode divergent members of the Bearded (Brd) family of proteins, bringing to four (malpha, m2, m4 and m6) the number of Brd family genes in the E(spl)-C. The expression of both m2 and m6 is responsive to N receptor activity and both genes are apparently direct targets of regulation by the N-activated transcription factor Suppressor of Hairless. Consistent with this, both are expressed specifically in multiple settings where N signaling takes place. Particularly noteworthy is the finding that m6 transcripts accumulate both in adult muscle founder cells in the embryo and in a subset of adepithelial (muscle precursor) cells associated with the wing imaginal disc. Overexpression of either m2 or m6 interferes with N-dependent cell fate decisions in adult PNS development. Surprisingly, while misexpression of m6 impairs lateral inhibition, overexpression of m2 potentiates it, suggesting functional diversification within the Brd protein family. Reported here are initial studies of the structure, expression and regulation of the newest member of the Brd gene family, Ocho, which is located in the recently identified Bearded Complex (Lai, 2000b).

The predicted protein products of both m2 and m6 contain highly basic domains with amphipathic character near their N termini. The basic domain of m2 is most similar to that of Tom, in that both contain a proline residue within this region. The basic domain in m6 is found at its extreme N terminus; it is likewise predicted to form a largely alpha-helical, strongly amphipathic structure. Thus, it is clear that a defining structural characteristic of Brd family proteins is present in both m2 and m6. Classification of m2 and m6 as Brd family proteins is further bolstered by the presence of two short sequence domains that are widely shared by members of the family. It is noted that the motif NXANE(K/R)(L/M) is common to m6, malpha and m4, while Tom, Bob and Brd contain related sequences at comparable positions. A second motif, (I/L/V)P(L/V)X(F/Y)XXTXXGTFFW, is found near the C terminus of malpha, m2, m4 and Tom, while m6 contains the clearly related sequence VXXXXTXXGSFYW. The motif DRW(A/V)QA found at the extreme C termini of malpha, m4 and Tom is not present in m2 and m6. The 3' UTRs of both m2 and m6 contain single copies of the K box (TGTGAT), a negative post-transcriptional regulatory motif previously observed to be widely distributed among the 3' UTRs of genes in both the Brd-C and the E(spl)-C. The identification of E(spl)m2 and E(spl)m6 as members of the Brd family brings the number of Brd family genes in the E(spl)-C to four (Lai, 2000b).
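The protein motifs above are written in a compact notation in which X stands for any residue and (K/R) for a choice between residues. A minimal Python sketch of how such a motif could be turned into a regular expression for searching a protein sequence follows. The function motif_to_regex and the toy peptides are hypothetical and are not real Brd family sequences; only the motif strings come from the text.

```python
import re


def motif_to_regex(motif):
    """Convert the text's notation, e.g. NXANE(K/R)(L/M), into a regex.

    (A/B/C) becomes the character class [ABC]; X becomes '.', any residue.
    """
    pattern = re.sub(
        r"\(([A-Z](?:/[A-Z])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    return pattern.replace("X", ".")


N_MOTIF = motif_to_regex("NXANE(K/R)(L/M)")
C_MOTIF = motif_to_regex("(I/L/V)P(L/V)X(F/Y)XXTXXGTFFW")
print(N_MOTIF)  # N.ANE[KR][LM]
print(C_MOTIF)  # [ILV]P[LV].[FY]..T..GTFFW

# Hypothetical toy peptides, for illustration only.
toy_a = "MSNQANEKLAA"
toy_b = "LPVAFAATAAGTFFW"
print(re.search(N_MOTIF, toy_a).start())      # 2
print(re.search(C_MOTIF, toy_b) is not None)  # True

# The m6 variant quoted in the text, written with X as a placeholder residue,
# does not satisfy the strict C-terminal motif (GSFYW differs from GTFFW),
# which is why the text calls it "related" rather than identical.
print(re.search(C_MOTIF, "VXXXXTXXGSFYW") is None)  # True
```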

Uniquely among the known members of the Brd-C, Ocho has a strong concentration of predicted Su(H) binding sites in its proximal upstream region, a feature more typical of Brd family members in the E(spl)-C. Within the first 720 bp 5' to the presumed Ocho transcription start site, there are five sequences fitting the high-affinity Su(H) site consensus YGTGDGAA. In addition, a predicted high-affinity binding site (GCAGGTG) for proneural bHLH activators is found quite close to the start site at position -94. All five predicted Su(H) binding sites upstream of Ocho, as well as the single predicted proneural protein binding site, are indeed bound in vitro by the respective purified fusion proteins. These results suggest that Ocho is a direct target of regulation both by proneural bHLH activators and by Su(H) and the N pathway. By contrast to all other known Brd family genes, Ocho does not appear to include in its 3' UTR any of the known or putative post-transcriptional regulatory sequence elements (Brd box, K box, or GY box). Consistent with its possible regulation by proneural proteins and by N signaling, Ocho is expressed in external sensory organ and chordotonal organ proneural clusters in the wing, eye-antenna, haltere and leg discs. Ocho transcripts also appear in a very thin band in the vicinity of the morphogenetic furrow of the developing retina, evidently corresponding to a single column of cells. Strikingly, at most sites of its accumulation in imaginal disc epithelia, the majority of the Ocho transcript is apparently localized in very small apical 'dots', with markedly less signal in more basal positions. This same predominantly apical concentration of transcripts has not been observed for other Brd family genes, and its significance and control in the case of Ocho are under investigation (Lai, 2000b).
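The Su(H) site consensus YGTGDGAA uses IUPAC degenerate nucleotide codes (Y = C or T; D = A, G or T). A minimal sketch of how predicted sites might be counted on both strands of an upstream region is given below. It is an assumption-laden illustration rather than the procedure used in the cited study: the function names and the toy sequence are hypothetical, and only the consensus string comes from the text.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

SUH_CONSENSUS = "YGTGDGAA"  # high-affinity Su(H) site consensus from the text


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern."""
    return "".join(IUPAC[base] for base in consensus.upper())


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan_sites(seq, consensus):
    """Return sorted (start, strand) tuples; start is the 0-based leftmost
    position of the site on the forward strand."""
    seq = seq.upper()
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    n, k = len(seq), len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Sites on the reverse strand are found by scanning the reverse
    # complement and mapping coordinates back to the forward strand.
    hits += [(n - m.start() - k, "-")
             for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)


# Hypothetical toy upstream region (NOT the Ocho promoter): one site on the
# forward strand and one on the reverse strand.
toy_upstream = "AACGTGGGAATTTTCTCACAGG"
print(scan_sites(toy_upstream, SUH_CONSENSUS))  # [(2, '+'), (12, '-')]
```

The same scan_sites function could be pointed at the proneural site sequence GCAGGTG, since an unambiguous sequence is simply a consensus with no degenerate positions.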

It seems reasonable to postulate that an ancestral Brd family gene encoded a protein resembling the present-day E(spl)malpha, E(spl)m4, Tom and Ocho products, with their four shared domains. Though now significantly diverged in overall amino acid sequence, these paralogous proteins are very similar in size (138-158 aa) and have very similar domain organization. This proposal is supported by the existence of such an archetypal Brd family member (158 aa) in the silk moth Bombyx mori. The Lepidoptera and Diptera diverged approximately 200 million years ago, indicating that the Brd gene family is at least this old. The Brd and Bob proteins can be viewed as truncations of this archetypal Brd family protein, suggesting that a common ancestor of the Brd and Bob genes might have arisen by acquiring a premature termination codon. The E(spl)m2 and E(spl)m6 genes may have derived independently from an archetypal progenitor or progenitors; their predicted protein products can be seen to represent the loss of one [E(spl)m6] or two [E(spl)m2] of the four canonical domains, along with expansions or contractions in the length of non-conserved regions between the remaining domains. It is likely that these evolutionary changes in the domain composition of the Brd, Bob, m2 and m6 proteins contribute to functional diversity in this family (Lai, 2000b).

The only structural element common to all eight Brd family proteins is the N-terminal basic amphipathic domain. These domains are themselves quite diversified and are classifiable into three groups: 'very strongly' amphipathic (Brd and Bob), 'less strongly' amphipathic (malpha, m4 and m6), and proline-containing (m2, Tom and Ocho). The observation that all Brd family proteins tested, with the exception of m2, induce qualitatively (but not quantitatively) similar phenotypes in GAL4-UAS misexpression experiments (i.e., interference with N pathway activity) suggests that they may interact with a common target, though the quality of the interaction may be influenced by the type of basic amphipathic domain present. The diversity of expression patterns among Brd family genes is no less striking. In both embryonic and imaginal tissue, these genes are deployed in a myriad of locations in which N signaling is used to elicit cellular responses and/or determine cell fates, and evidence is presented that all Brd family genes are direct targets of transcriptional regulation by Su(H). Nevertheless, the precise expression pattern of each Brd family member is unique, such that different combinations of Brd family genes are active at different sites of N pathway activity. Thus, the members of this family are differentially responsive to regulation by N receptor activity. The observation that promoter-reporter constructs for all Brd family genes tested to date recapitulate the expression pattern of the corresponding endogenous gene demonstrates that the selectivity of this response is mediated largely at the transcriptional level. Thus, it is suggested that evolution of transcriptional cis-regulatory sequences has been a major mechanism for diversification of Brd family gene expression and probably function as well (Lai, 2000b).

Enhancer of split m4 and malpha and Tom and Bob: targets of post-transcriptional regulation

Micro RNAs are a large family of noncoding RNAs of 21-22 nucleotides whose functions are generally unknown. A large subset of Drosophila micro RNAs has been shown to be perfectly complementary to several classes of sequence motif previously demonstrated to mediate negative post-transcriptional regulation. These findings suggest a more general role for micro RNAs in gene regulation through the formation of RNA duplexes (Lai, 2002).

A new strategy of gene regulation was defined by the activities of Caenorhabditis elegans let-7 and lin-4. These RNA molecules of 21-22 nt are complementary to the 3' untranslated regions (UTRs) of target transcripts and mediate negative post-transcriptional regulation through RNA duplex formation. Several recent reports now reveal that a large family of RNAs of 21-22 nt, collectively termed micro RNAs (miRNAs), exists in organisms as diverse as worms, flies and humans. Although it was presumed that many of these new miRNAs would also act in post-transcriptional gene regulation, initial searches did not reveal obvious targets based on sequence complementarity (Lai, 2002).

In Drosophila, two 3' UTR sequence motifs, the K box (cUGUGAUa) and the Brd box (AGCUUUA), mediate negative post-transcriptional regulation. Although originally identified in the 3' UTRs of Notch pathway target genes encoding basic helix-loop-helix (bHLH) repressors and Bearded family proteins, modes of regulation mediated by both motifs are spatially and temporally ubiquitous. This suggests that at least some of the many other Drosophila transcripts that contain K boxes or Brd boxes in their 3' UTRs are also actively regulated by these motifs. Since RNA-binding proteins typically show relatively relaxed binding specificities, it was hypothesized that an RNA component might be involved in recognition of these highly constrained motifs. This was bolstered by the finding that another motif common to the 3' UTRs of many of the same Notch pathway target genes, the GY box (uGUCUUCC), is complementary to and mediates RNA duplex formation through the proneural box (AUGGAAGACAAU), a motif located in the 3' UTRs of transcripts encoding proneural bHLH activators (Lai, 2002).

Drosophila miRNAs encoded by 11 of 21 distinct genomic miRNA loci are complementary to the K box at their 5' end, with all but miR-11 having a perfect (8/8) antisense match to the extended K box consensus (UAUCACAG). Notably, the most 5' nucleotide of miR-11 is a cytosine residue, making it complementary to the second most common nucleotide at this position in identified K boxes. In addition, perfect antisense matches to the Brd box and GY box were found at the 5' ends of fly miR-4 and fly miR-7, respectively. The precise complementarity of these miRNAs to K box, Brd box and GY box motifs suggests that they bind these sequences in 3' UTRs and, in the case of the former two motifs, mediate negative post-transcriptional regulation. Complementarity between miRNAs and 3' UTRs extends beyond core sequence motifs in many cases, providing additional support for the existence of the proposed RNA duplexes. Examples exist of extended miRNA complementarity to 3' UTRs containing K boxes, Brd boxes and GY boxes. Complements to all three sequence motifs are located exclusively at the 5' ends of miRNA, suggesting that some aspect of regulation may be shared by these different miRNAs. For example, a common factor might be involved in the recognition or stabilization of these short miRNA-3' UTR duplexes (Lai, 2002).
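The antisense relationship described above can be made concrete by computing the sequence a miRNA 5' end would need in order to pair with each motif. The following Python sketch uses only the motif sequences quoted in the text; the helper names and the toy miRNA strings are hypothetical and are deliberately not real miRNA sequences.

```python
# 3' UTR motifs as quoted in the text (RNA alphabet; lowercase letters mark
# the extended-consensus positions).
MOTIFS = {"K box": "cUGUGAUa", "Brd box": "AGCUUUA", "GY box": "uGUCUUCC"}

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(rna):
    """Return the reverse complement of an RNA sequence (Watson-Crick only)."""
    return rna.upper().translate(_RNA_COMPLEMENT)[::-1]


def five_prime_offset(mirna, motif, max_offset=2):
    """Return the 0-based offset at which the antisense of `motif` begins,
    if it begins within `max_offset` nt of the miRNA 5' end; else None."""
    idx = mirna.upper().find(antisense(motif))
    return idx if 0 <= idx <= max_offset else None


for name, seq in MOTIFS.items():
    print(name, "->", antisense(seq))
# K box -> UAUCACAG
# Brd box -> UAAAGCU
# GY box -> GGAAGACA

# Hypothetical toy miRNAs (NOT real sequences), 21 nt each.
toy_5p = "UAUCACAGAAAAAAAAAAAAA"  # complement at the 5' end
toy_3p = "CCCCCCCCCCCCCUAUCACAG"  # same complement, but at the 3' end
print(five_prime_offset(toy_5p, MOTIFS["K box"]))  # 0
print(five_prime_offset(toy_3p, MOTIFS["K box"]))  # None
```

The computed K box antisense, UAUCACAG, is the 8-nt sequence the text gives for the perfect 8/8 match. The max_offset parameter is an arbitrary illustrative threshold for what counts as "at the 5' end".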

Several miRNAs complementary to K boxes (miR-11 and the miR-2b and miR-13 subfamilies) are broadly expressed throughout Drosophila development, consistent with their proposed involvement in temporally ubiquitous regulation mediated by K boxes; the GY box-complementary miRNA miR-7 is similarly broadly expressed during development. The expression of the single identified Brd box-complementary miRNA miR-4 is restricted to embryogenesis. However, since the search for miRNAs has not yet been saturating, other miRNAs complementary to Brd boxes that are expressed later in development might yet be found (Lai, 2002).

The regulatory role of the K box and Brd box in other organisms has not yet been tested. Nevertheless, the presence of their complements in worm and human miRNAs suggests that these modes of regulation have potentially been conserved. Notably, the complements to these motifs are also located specifically at the 5' ends of miRNA. The restricted location of complements in these different species further suggests that the regulatory targets of many other miRNAs will be determined by the sequence of their 5' ends. In agreement with this idea, most of the known lin-4 and let-7 target sequences also involve perfect complements with the 5' ends of these miRNAs. Systematic searches for the complements of other 5' miRNA ends in 3' UTRs may therefore identify new post-transcriptional regulatory sequence elements. It should be noted, however, that despite the existence of three conserved sites in the lin-14 3' UTR that include perfect complements to lin-4, normal regulation of lin-14 actually depends on variant lin-4 binding sites containing a bulged nucleotide in the 5' complementary region. Thus, this rule is probably not absolute (Lai, 2002).

Initially, miRNAs are transcribed as RNAs of approximately 70 nt containing a stem-loop structure; these are cleaved by the RNase III enzyme Dicer to generate the mature miRNA. Curiously, only a single strand of the duplex precursor stem structure is generally stable and is recovered as miRNA. The model proposed here may help to explain this phenomenon, since the strand that is complementary to these identified 3' UTR motifs is nearly exclusively the one that is isolated as miRNA. The single exception is miR-5, whose sequence contains a K box. Notably, miR-5 and the K box-complementary miRNAs miR-6-1,2,3 (whose loci are incidentally located next to each other in the genome) are complementary at 20 of 21 continuous nucleotide positions. This suggests that miR-5 might influence or possibly interfere with the ability of miR-6-1,2,3 to interact with 3' UTRs that contain K boxes (Lai, 2002).
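A figure such as "complementary at 20 of 21 positions" amounts to counting Watson-Crick pairs between two strands aligned antiparallel. A minimal sketch of that count is shown below, assuming equal-length strands, no gaps or bulges, and no G:U wobble pairing. The function name and the short toy strands are hypothetical and are not the miR-5 or miR-6 sequences.

```python
# Watson-Crick base pairs only; G:U wobble pairs are deliberately excluded.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_positions(strand_a, strand_b):
    """Count Watson-Crick pairs when strand_b (written 5'->3') is aligned
    antiparallel to strand_a (written 5'->3'), without gaps."""
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # reverse so that position i faces position i
    if len(a) != len(b):
        raise ValueError("strands must have equal length for an ungapped count")
    return sum((x, y) in WATSON_CRICK for x, y in zip(a, b))


# Hypothetical toy strands (NOT real miRNA sequences).
print(paired_positions("AUGCAUGC", "GCAUGCAU"))  # 8 of 8: fully complementary
print(paired_positions("AUGCAUGC", "GCAUGCAA"))  # 7 of 8: one mismatch
```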

Negative regulation by K box- and Brd box-complementary miRNA must differ from lin-4-mediated regulation, because K boxes and Brd boxes have significant, though distinct, effects on both transcript stability and translational efficiency, whereas lin-4 is thought to act at a step following translational initiation. The GY box does not seem to have a strong effect at the cis-regulatory level. Other miRNAs may show additional regulatory capacities; efforts are underway to understand the different molecular mechanisms of regulation mediated by miRNA-3' UTR RNA duplexes (Lai, 2002).



Home page: The Interactive Fly © 1995, 1996 Thomas B. Brody, Ph.D.

The Interactive Fly resides on the Society for Developmental Biology's Web server.