Drosophila genes listed by biochemical function

A genomewide survey of basic helix-loop-helix factors

A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors



Basic helix-loop-helix transcription factors

absent MD neurons and olfactory sensilla
proneural

achaete
proneural - achaete-scute complex

asense
proneural - achaete-scute complex

atonal
bHLH transcription factor - achaete-scute complex - functions as the proneural factor for photoreceptors
and effects the transition from progenitor cells to differentiating neurons - Olfactory receptor neurons
are specified by Atonal and pioneer the formation of the antennal lobe, the first olfactory center in the brain
bigmax
bHLH transcription factor - binding partner of Mondo, a nutrient sensor that functions in the fat body - regulates fatty acid synthesis -
facilitates use of sugar-rich nutrient sources - regulates transcription factor Cabut, coordinating energy metabolism
Clock
bHLH and PAS domains

collier (preferred name: knot)
EBF/Olf-1 homolog, HLH protein

cousin of atonal
required for sensory neuron morphology

daughterless
proneural - achaete-scute complex

dimmed (common alternative name: Mist 1-related)
Atonal family bHLH transcription factor - component of a mechanism by which diverse neuroendocrine lineages
differentiate and maintain a pro-secretory state

dmyc
bHLH - leucine zipper

dysfusion
bHLH-PAS protein expressed in tracheal fusion cells and required for tracheal fusion

Enhancer of split
neurogenic - an inhibitor of neural fate - Hairy/E(spl) class

E(spl) region transcript m7
neurogenic - an inhibitor of neural fate - Hairy/E(spl) class

extra macrochaetae
transcription factor - HLH non basic - antagonist of proneural genes - the Notch pathway
is required in combination with Emc to define the proneural cluster which gives rise to the sensory organ precursor cell
48 related 2 (common alternative name: Fer2)
bHLH Transcription Factor - required for the development of a subset of circadian pacemaker neurons and dopaminergic neurons in the protocerebral anterior medial (PAM) and
the protocerebral anterior lateral (PAL) clusters of the brain - required for the survival of the PAM cluster dopaminergic neurons in adulthood - oxidative stress response

germ cell-expressed bHLH-PAS (preferred name Methoprene-tolerant)
bHLH-Pas domain transcription factor - involved in juvenile hormone (JH) action
as a likely component of a JH receptor - dimerization partner of Methoprene-tolerant

grainyhead
a maternally expressed bHLH factor that regulates precise timing of gene expression early in development - a key regulator of epidermal
barrier formation and repair - a temporally expressed regulator of neural cell identity functioning during embryonic and larval development

hairy
Hairy/E(spl) class

Hairy/E(spl)-related with YRPW motif
bHLH-Orange protein family - expressed in a subset of newly born neurons that receive Notch signalling -
promotes one of two alternative fates adopted by sibling neurons

Hand
Member of a conserved family of bHLH transcription factors that performs a conserved function in cardiogenesis - also required for hematopoiesis

Helix loop helix protein 106 (common alternative name: Sterol regulatory element-binding protein, SREBP)
bHLH-leucine zipper transcription factor - membrane bound HLH106 is released by phosphatidylethanolamine, exerting feedback control
on the synthesis of fatty acids and phospholipids - cleavage by Drice releases an SREBP fragment to travel to the nucleus
where it mediates the increased transcription of target genes needed for lipid synthesis and uptake

HLH54F
bHLH transcription factor - earliest specific regulator of caudal visceral mesoderm development

knot (common alternative name: collier)
EBF/Olf-1 homolog, HLH protein

lethal of scute
proneural

Resistance to Juvenile Hormone (preferred name Methoprene-tolerant)
bHLH-Pas domain transcription factor - involved in juvenile hormone (JH) action
as a likely component of a JH receptor - dimerization partner of Germ cell-expressed bHLH-PAS

Mitf
bHLH transcription factor - regulator of eye development - controls transcription of all 15 vacuolar-ATPase components - modulator of
metabolism for cellular homeostasis - Mitf, vacuolar-ATPase and TORC1 form a negative regulatory loop that maintains each of these metabolic
regulators in relative balance - control of lysosomal-autophagy pathway

Mondo (alternative names: Mio and mlx indicator)
bHLH leucine zipper transcription factor - coordinates feeding behavior with nutrient availability -
controls fat accumulation in fat body

Mnt
bHLH-zipper transcription factor - Max-interacting transcriptional repressor - associates with the Sin3 corepressor

nautilus
MyoD homolog

net
potential repressor

Olig family
HLH transcription factor expressed in the brain and CNS - motor neuron identity factor - regulates axon guidance

Resistance to Juvenile Hormone (preferred name Methoprene-tolerant)
bHLH-Pas domain transcription factor - involved in juvenile hormone (JH) action
as a likely component of a JH receptor

scute
transcription factor - basic helix loop helix - proneural

similar
bHLH-PAS domain transcription factor - a key regulator of responses to hypoxia

single-minded
proneural - spitz group

spineless (also known as spineless-aristapedia)
bHLH PAS domain protein - plays a central role in defining the distal regions of both the antenna and leg

tango
Myc-type, helix loop helix and PAS family protein

target of Pox-n
bHLH, Neurogenin/NeuroD homolog

trachealess
PAS domains

twist
DV pathway

A genomewide survey of basic helix-loop-helix factors

The information contained in the recently published genomic sequence of Drosophila melanogaster was used to identify 12 additional bHLH proteins. By sequence analysis these proteins have been assigned to families defined by Atonal, Hairy-Enhancer of Split, Hand, p48, Mesp, MYC/USF, and the bHLH-Per, Arnt, Sim (PAS) domains. In addition, one single protein represents a unique family of bHLH proteins. mRNA in situ analysis demonstrates that the genes encoding these proteins are expressed in several tissue types but are particularly concentrated in the developing nervous system and mesoderm (Moore, 2000).

Two newly identified genes, CG8667 (Mistr) and CG5545 (Doli), both members of the Ato-related family, are expressed in the developing nervous system. CG5545 is closely related to the vertebrate repressor Beta 3 protein (96% sequence identity between fly and vertebrate proteins in the bHLH domain). It is suggested that this protein should be named Doli (Drosophila Olig family) -- the Olig proteins are involved in oligodendritic precursor formation. CG8667 has closest sequence identity to the vertebrate Mist1 protein, a negative regulatory factor of MyoD activity (78% identical over the entire bHLH domain and 92% identical in the basic domain alone). It is proposed that this protein should be named Mistr (Mist 1-related protein). Sequence homology between species does not always imply functional homology. For example, CG8667/Mistr is a Drosophila sequence ortholog of the mammalian Mist1 protein. It is expressed solely in the developing nervous system, whereas Mist1 is expressed not in the nervous system but in gut, pancreas, submandibular gland, lung, and skeletal muscle. In this case, differences in the expression pattern of the genes encoding these proteins argue against any conservation of developmental role (Moore, 2000).

As with the other proteins of the Ato-related family, the genes encoding these proteins are expressed in the developing Drosophila nervous system. CG5545/doli is expressed first in a subset of cells in both the ventral nerve cord (VNC) and the procephalic region at stage 9. The number of cells in these regions expressing the gene increases to a peak at stage 11. By stage 14, levels of expression have fallen such that CG5545/doli is expressed only in a few cells per hemisegment on the ventral surface of the VNC (Moore, 2000).

There is a strong maternal contribution of CG8667/mistr mRNA. Zygotic transcription is initiated at stage 14. It is expressed in bilateral domains in the cephalic region, which, as development proceeds, fuse into a U shape forming part of the ring gland. Concomitant expression of CG8667/mistr also begins in the CNS. By stage 17, CG8667/mistr is in clusters of cells at the anterior and posterior of the VNC and bilaterally in two lateral cells per hemisegment in the VNC (Moore, 2000).

CG10066 (Fer1), CG5952 (Fer2), and CG6913 (Fer3) are related to mammalian p48. These three new bHLH proteins are most closely related to the bHLH domain of the p48 subunit of PTF1, a pancreatic, exocrine cell-specific transcription factor in the mouse, and represent a new bHLH family in Drosophila. These proteins have been named Fer for 48 related. CG10066/Fer1 is 88%, CG5952/Fer2 is 76%, and CG6913/Fer3 is 62% identical to p48 in the bHLH region (Moore, 2000).
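The percent-identity figures quoted throughout this survey can be illustrated with a minimal Python sketch. It assumes the two domains have already been aligned to equal length and that gapped columns are excluded from the count; the survey does not state which convention was used, so the denominator here is an assumption. The function name `percent_identity` and the short test strings are hypothetical and are not real bHLH sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped,
    so the denominator is the number of ungapped aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy strings, not real protein sequences.
toy_identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```

Applied to a real pair of aligned bHLH domains, the same calculation would give a figure comparable to those reported above, provided the same alignment and gap convention were used.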

CG10066/Fer1 is expressed in the epidermis at the stage when the epidermis begins to secrete cuticle and, therefore, may share a common function with p48 in active exocrine cells. It is first transcribed in the epidermal pads adjacent to the posterior spiracles at stage 15. The expression of this gene quickly spreads over the entire epidermal surface of the embryo and is strongest in epidermis underlying the forming denticle belts (Moore, 2000).

CG5952/Fer2 shows a strong maternal contribution of mRNA in the early embryo. Zygotic expression of this gene begins at stage 10 in an anterior-to-posterior wave in the VNC and the brain. As development proceeds, the number of CG5952/Fer2-positive cells increases, so that by stage 12, the expression domain forms a bilateral, dorsal-posterior, crescent-shaped structure (Moore, 2000).

CG6913/Fer3 is expressed at stage 11 in part of the posterior midgut primordia and stage 12 in part of the anterior midgut primordia. At later stages, expression has been detected in several unidentified cells scattered throughout the embryo (Moore, 2000).

CG10446 (Side) and CG5927 (Her) are in the HES family. CG10446 is most closely related to Deadpan (76% identity in the basic bHLH domain and 62% in the entire bHLH domain). This protein has been named Side (similar to Deadpan). CG5927 is most closely related to the proteins of the Enhancer of split [E(spl)] complex, such as HLHmgamma (76% identity in the basic domain and 51% identity in the entire bHLH domain). CG5927 has been named Her (HES-related). Hairy, Dpn, and the proteins of the E(spl) complex have WRPW at the very C terminus to mediate interaction with Groucho. CG5927/Her and CG10446/Side also end in this motif. All members of the HES family mediate transcriptional repression via their interaction with Groucho. Because CG10446/Side and CG5927/Her have the WRPW motif required for this interaction, they are highly likely to act via the same mechanism. CG10446/side is expressed solely in the CNS at a stage at which cell differentiation is occurring. It is hypothesized that it may play a role in antagonizing the function of transcription factors involved in the later stages of CNS differentiation (Moore, 2000).

There is a strong maternal contribution of CG10446/side mRNA. Zygotic transcription of the gene begins at stage 12 in a subset of cells in the CNS. CG5927/her has a low level of maternal mRNA contribution and then is expressed ubiquitously throughout embryogenesis (Moore, 2000).

CG12952 (Sage) is distantly related to the Mesp family and is expressed in the salivary gland. CG12952 represents a protein with little sequence similarity to other known proteins. In the neighbor-joining tree, it is placed in the same family as the vertebrate Mesp proteins, which are necessary for mesoderm segmentation initiation (53% identity in the bHLH domains). CG12952 has a strong maternal mRNA contribution in early embryogenesis. Its zygotic expression begins in the salivary gland anlage at stage 10 and persists until stage 15. CG12952 has been named Sage (salivary gland-expressed bHLH) (Moore, 2000).

CG17592 (Dm Usf) is the ortholog of the mammalian USF proteins. CG17592 is the single Drosophila sequence homolog of the USF proteins that are involved in cell proliferation control (92% identical in the basic domain). This protein has been named Dm Usf. Both vertebrate and Drosophila USF are bHLH-zip proteins. Dm Usf has a loop and a second helix region, high in serines, which is greatly diverged from that of mouse and human and, hence, may have lost its ability to dimerize. There is a weak maternal contribution of Dm usf mRNA. At stage 7, Dm usf is expressed in bilateral domains in the ventral cephalic furrow. In later stages (15 onward) of development, Dm usf expression is confined to the proventriculus and a subset of cells in the CNS. This specific expression pattern differs from the ubiquitous USF expression pattern reported in vertebrates (Moore, 2000).

CG6211 (Gce) is closely related to the bHLH-PAS Rst(1)JH protein (78% identity in the bHLH, 68% in the PAS-A, and 86% in the PAS-B domains). Rst(1)JH originally was isolated in a screen to find a Drosophila protein resistant to the Juvenile Hormone Analog insecticide Methoprene. CG6211 transcript is expressed strongly as a maternally supplied message and then later in a subset of the germ cells of the developing embryo. It is suggested that this protein should be named Gce (germ cell-expressed bHLH-PAS) (Moore, 2000).

CG11450 (shout) is expressed during mesoderm formation and in myoblasts. CG11450 represents a member of a new bHLH family. It is expressed first in the dorsal and ventral cellular blastoderm. In the ventral region of the embryo, the gene is expressed continually in the presumptive mesoderm throughout gastrulation and then in a segmented pattern in the ventral mesoderm layer at the extended germ-band stage. It is expressed in the myoblast cells that then migrate dorsally from this layer. The expression pattern of CG11450 overlaps with that of the bHLH transcription factor Twist, suggesting that it may be playing a role in the same mesoderm specification and myogenic pathways; therefore, this gene has been termed shout after the song "Twist and Shout", as recorded by the Beatles (Moore, 2000).

The expression domain of CG11450/shout overlaps with that of twist. twist and CG11450/shout continue to be expressed in the presumptive mesoderm during gastrulation. At the extended germ-band stage, both twist and CG11450/shout are expressed in alternating high and low levels along the length of the mesoderm. These alternating expression levels of twist are required for the specification of muscle derived from this tissue. The pattern of CG11450/shout expression in the ventral mesoderm implies that it could have a similar role to twist in specification of mesoderm derivatives. In Drosophila, Twist activates Snail and other downstream, mesoderm-specific regulators such as Tinman, Bagpipe, and Mef2; all of these proteins have vertebrate orthologs implicated in mesoderm development. Hence, CG11450/Shout represents a good candidate for both sequence and function conservation across species (Moore, 2000).

CG18144 (Dm Hand) is the Drosophila ortholog of the vertebrate hand proteins. CG18144 is 76% identical to dHand and 69% homologous in the bHLH domain to eHand; both vertebrate proteins are involved in heart formation. Dm hand expression begins at stage 10 of embryonic development in bilateral stripes in the ventral mesoderm. It continues to be expressed in two tissues derived from this mesoderm, the dorsal vessel (heart) and the circular visceral musculature. In addition, at stage 13 Dm hand mRNA appears in a small subset of cells in the CNS (Moore, 2000).

A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors

Differences in expression, protein interactions, and DNA binding of paralogous transcription factors ('TF parameters') are thought to be important determinants of regulatory and biological specificity. However, both the extent of TF divergence and the relative contribution of individual TF parameters remain undetermined. This study comprehensively identified dimerization partners, spatiotemporal expression patterns, and DNA-binding specificities for the C. elegans bHLH family of TFs, and modeled these data into an integrated network. This network displays both specificity and promiscuity, as some bHLH proteins, DNA sequences, and tissues are highly connected, whereas others are not. Comparison of all bHLH TFs revealed extensive divergence, with all three parameters contributing equally to bHLH divergence. This approach provides a framework for examining divergence for other protein families in C. elegans and in other complex multicellular organisms, including humans. Cross-species comparisons of integrated networks may provide further insights into molecular features underlying protein family evolution. A video summary of this article is available online (Grove, 2009).

Protein-binding microarray (PBM)-derived 8-mer data span the full affinity range of sequence-specific DNA binding preferences. For each bHLH dimer that yielded sequence-specific DNA binding, enrichment scores (ESs) were calculated from the PBM signal intensities for all possible 8-mers, and position weight matrices (PWMs) were derived. A conservative threshold was imposed to identify significantly bound 8-mers. Both the dimers and the 8-mers were hierarchically clustered, and it was found that the bHLH proteins can be grouped into two clusters corresponding to different bHLH classes: cluster I contains HLH-2 (similar to Drosophila Daughterless) and its partners, HLH-1 and HLH-11, and cluster II contains class III, IV, and VI bHLH proteins (Grove, 2009).

As expected, HLH-2-containing dimers (cluster I) exhibit a strong preference for E-box sequences (CANNTG). Surprisingly, however, cluster II dimers, in addition to binding a few E-boxes, also bind multiple non-E-box sequences. These resemble E-boxes, but contain a C or A in the fifth position and a G or T in the sixth position of the binding site (CAYRMK). These 'E-box-like sequences' include the reported CACGCG binding site of Drosophila Hairy, and N-boxes (CACNAG), which are bound by Drosophila Enhancer of Split (Grove, 2009).
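The distinction between E-boxes (CANNTG) and E-box-like sequences (CAYRMK) can be made concrete with a short Python sketch that expands the IUPAC degenerate codes into a regular expression and classifies a six-nucleotide site. The helper names (`iupac_to_regex`, `classify_hexamer`) are hypothetical and only one strand is examined; this is an illustration of the two consensus patterns, not the analysis pipeline of the study.

```python
import re

# Standard IUPAC nucleotide codes needed for the two consensus patterns.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any base
    "Y": "[CT]",    # pyrimidine
    "R": "[AG]",    # purine
    "M": "[AC]",    # amino
    "K": "[GT]",    # keto
}


def iupac_to_regex(pattern):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in pattern))


E_BOX = iupac_to_regex("CANNTG")
E_BOX_LIKE = iupac_to_regex("CAYRMK")


def classify_hexamer(site):
    """Label a 6-nt site as 'E-box', 'E-box-like', or 'other'."""
    site = site.upper()
    if E_BOX.fullmatch(site):
        return "E-box"
    if E_BOX_LIKE.fullmatch(site):
        return "E-box-like"
    return "other"
```

The two patterns cannot both match the same site, because an E-box requires T at the fifth position whereas the E-box-like consensus requires C or A there. For example, `classify_hexamer("CACGTG")` returns `"E-box"`, while the Hairy site CACGCG is classified as `"E-box-like"`.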

The statistical significance of each bHLH dimer's preference for E-box and E-box-like sequences, as compared to all other 8-mers, was determined. Neither HLH-2 nor HLH-10 alone can bind significantly to any E-box or E-box-like sequence. However, when combined, they can bind five different sequences. The bHLH DNA binding network also displays degrees of specificity and promiscuity. For instance, only HLH-1 homodimers can bind CAA-containing E-boxes. Some E-boxes and E-box-like sequences are preferred by relatively few dimers, whereas others are bound by many dimers. For example, CACATG is bound by only four dimers, but CACCTG is bound by ten distinct dimers. Conversely, some bHLH dimers bind few E-boxes or E-box-like sequences whereas others bind many: HLH-30 binds only CACGTG, but HLH-2/HLH-10 binds five different E-boxes. This demonstrates that there is specificity and promiscuity in the bHLH DNA binding network, both from the view of the proteins and at the level of their DNA binding sequences (Grove, 2009).
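The notions of specificity and promiscuity in the binding network correspond to low and high connectivity (degree) of its nodes. A minimal Python sketch of that bookkeeping follows. The function name `connectivity` is hypothetical, and apart from the HLH-30 entry, which follows the text, the bindings for `dimer_X` and `dimer_Y` are invented placeholders for illustration only and are not data from the study.

```python
from collections import defaultdict


def connectivity(bindings):
    """Return (dimer_degree, site_degree) for a dimer -> set-of-sites map.

    dimer_degree[d] = number of distinct sites bound by dimer d
    site_degree[s]  = number of distinct dimers that bind site s
    """
    dimer_degree = {dimer: len(sites) for dimer, sites in bindings.items()}
    site_degree = defaultdict(int)
    for sites in bindings.values():
        for site in sites:
            site_degree[site] += 1
    return dimer_degree, dict(site_degree)


# Toy network. Only the HLH-30 entry reflects the text above;
# dimer_X and dimer_Y are hypothetical placeholders.
toy_bindings = {
    "HLH-30": {"CACGTG"},
    "dimer_X": {"CACGTG", "CACCTG", "CAGCTG"},
    "dimer_Y": {"CACCTG"},
}
dimer_degree, site_degree = connectivity(toy_bindings)
```

In this toy network HLH-30 has degree 1 (specific) and `dimer_X` has degree 3 (more promiscuous); the same counts taken over the sites give the number of dimers bound to each sequence.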

The PBM ES of a particular DNA sequence bound by a dimer is a reflection of relative DNA binding affinities. It was noticed that the ES distribution for 8-mers corresponding to a particular dimer/sequence combination varied greatly. For instance, both HLH-26 and MDL-1/MXL-1 bind CACGTG E-boxes, but HLH-26 does so with a broad ES range and MDL-1/MXL-1 with a very narrow ES range. This suggests that, in contrast to MDL-1/MXL-1, not all CACGTG E-boxes are bound equally well by HLH-26. The possibility is considered that differences may be due to effects of nucleotides flanking the core CACGTG E-box. Indeed, flanking nucleotides have been reported previously to contribute to bHLH dimer DNA binding. However, the effects of nucleotides flanking the E-box and E-box-like sequences had not been analyzed systematically for most bHLH TFs. Since each bHLH monomer may directly contact the flanking nucleotide immediately 5' of the E-box, the influence of this position on relative DNA binding preferences was examined. It was found that for the MDL-1/MXL-1 dimer each of the four possible nucleotides flanking the CACGTG core sequence is recognized approximately equally well; the enrichment score for each relevant 8-mer is between 0.49 and 0.50. However, HLH-26 exhibits a strong preference for a 5' A or G (median 8-mer ES > 0.40), and disfavors a 5' T (median 8-mer ES < 0.10) and, to a lesser extent, a 5' C (Grove, 2009).
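The flanking-nucleotide comparison described above amounts to grouping 8-mers by the base immediately 5' of the core and summarizing the enrichment scores in each group. The Python sketch below shows one way to do this, assuming a mapping from 8-mer to ES is available. The function name `median_es_by_5prime_flank` is hypothetical, the scores in `toy_scores` are invented for illustration and are not values from the study, and only one strand is considered.

```python
from statistics import median


def median_es_by_5prime_flank(es_by_8mer, core="CACGTG"):
    """Median enrichment score per nucleotide immediately 5' of the core.

    8-mers in which the core starts at position 0 have no 5' flank
    within the 8-mer and are skipped.
    """
    groups = {base: [] for base in "ACGT"}
    for kmer, es in es_by_8mer.items():
        idx = kmer.find(core)
        if idx >= 1:  # core present with at least one 5' base
            groups[kmer[idx - 1]].append(es)
    return {base: median(vals) for base, vals in groups.items() if vals}


# Hypothetical scores for illustration only.
toy_scores = {
    "ACACGTGA": 0.45,  # 5' A
    "TACACGTG": 0.42,  # 5' A (core starts at position 2)
    "GCACGTGC": 0.41,  # 5' G
    "TCACGTGA": 0.05,  # 5' T
    "CCACGTGT": 0.20,  # 5' C
    "CACGTGAA": 0.30,  # no 5' flank, skipped
}
flank_medians = median_es_by_5prime_flank(toy_scores)
```

With real PBM data, a dimer such as MDL-1/MXL-1 would be expected to show similar medians for all four bases, whereas a dimer with a flanking preference would show a clearly lower median for the disfavored base.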

Most bHLH proteins exhibit preferences at the 5' flanking nucleotide position and most dimers disfavor a 5' T; this observation is similar to what has been reported for the yeast bHLH homodimer Pho4p. However, there are exceptions: HLH-11 and MDL-1/MXL-1 heterodimer both tolerate a 5' T, and HLH-30 actually favors a 5' T (Grove, 2009).

In summary, both prominent and subtle differences in E-box or E-box-like sequence recognition and flanking site preferences were detected between different bHLH dimers, which likely contribute to target site selection and gene regulation in vivo (Grove, 2009).


REFERENCES

Grove, C. A., et al. (2009). A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138(2): 314-27. PubMed ID: 19632181

Moore, A. W., et al. (2000). A genomewide survey of basic helix-loop-helix factors in Drosophila. Proc. Natl. Acad. Sci. 97: 10436-10441. PubMed ID: 10973473




Home page: The Interactive Fly © 1995, 1996 Thomas B. Brody, Ph.D.

The Interactive Fly resides on the
Society for Developmental Biology's Web server.