Drosophila genes listed by biochemical function
Hox (Homeobox) transcription factors

Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences

Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites


abdominal A
homeodomain - Antennapedia class

Abdominal B
homeodomain - bithorax complex

homeodomain transcription factor (TGIF subclass) - required, along with homeodomain protein Vismay, for spermatogenesis

homeodomain - Antennapedia class

homeodomain - lim domain

homeodomain Pbx class

homeodomain - paired-like

LIM domains and LIM homeodomain

homeodomain - NK-2 class

BarH1 & BarH2


brain-specific homeobox
homeodomain transcription factor - confers neural identity in specific neurons of medulla and lamina of the optic lobe



homeodomain Pbx class

C15 (common alternative name: Clawless)
member of the 93E cluster of homeodomain proteins - regulates spatial patterning of the tarsus, a distal portion of the leg -
homolog of vertebrate oncogene Hox11

Chx1 and Chx2 (preferred names: Visual system homeobox 1 ortholog and Visual system homeobox 2 ortholog)
homeodomain transcription factors - markers for the brain central neuroendocrine system termed the pars intercerebralis
that expresses the hormones Drosophila insulin-like peptide (Dilp), FMRF, and myomodulin

homeodomain - cut domain

homeobox gene - contributes to the development of subsets of interneurons via cross-repressive, lineage-specific interactions
with the motoneuron-promoting factors eve and exex

defective proventriculus

homeodomain - Antennapedia class


drifter (preferred name: ventral veinless)
homeodomain - pou domain

empty spiracles

homeodomain - engrailed class - segment polarity gene

homeodomain - pair rule gene

homeodomain - Pbx class

a homeodomain transcription factor - regulates motorneuron cell fate by restricting expression of Even-skipped and Lim2

homeodomain & paired domain (paired box)

homeodomain & paired domain (paired box)

fushi tarazu
homeodomain - Antennapedia class - pair rule gene

gooseberry-proximal (common alternative name: gooseberry-neuro)
homeodomain - paired domain (paired box)

gooseberry-distal (common alternative name: gooseberry)
homeodomain - paired domain (paired box)

homeodomain - paired-like

homeobox, NK decapeptide domain transcription factor - acts within a subclass of early born neurons to link
neuronal subtype identity to neuronal morphology and connectivity

homeodomain - HM domain

intermediate neuroblasts defective
homeodomain protein

homeodomain - engrailed class

Ipou (preferred name: Abnormal chemosensory jump 6)
homeodomain and POU domain

islet (preferred name: tailup)
homeodomain and LIM domain

homeodomain - Antennapedia class

ladybird early and ladybird late
transcription factors - homeodomain proteins

lateral muscles scarcer
homeodomain transcription factor - identity factor for lateral transverse muscles

LIM domain and LIM homeodomain

homeodomain - Pbx class

muscle segment homeobox-1

muscle segment homeobox 2 (preferred name: tinman)
homeodomain - NK-2 class

NK1 (preferred name: Slouch)
homeodomain - NK-1 class

NK2 (preferred name: ventral nervous system defective)
homeodomain - NK-2 class

Nkx6 (preferred name: HGTX)
homeobox, NK decapeptide domain transcription factor - acts within a subclass of early born neurons to link
neuronal subtype identity to neuronal morphology and connectivity

homeodomain and cut domain

homeodomain and Six domain

homeodomain - paired-like

homeodomain - paired domain (paired box)

POU domain protein 1 (common alternative name: pdm-1)
homeodomain - pou domain

POU domain protein 2 (common alternative name: pdm-2)
homeodomain - pou domain

pou domain motif 3
POU domain transcription factor required for odor response in a class of olfactory receptor neurons

homeodomain - Antennapedia class

novel homeodomain

PvuII-PstI homology 13
homeodomain transcription factor expressed in the developing eye - required for rhabdomere morphogenesis and proper detection of light

reversed polarity


homeodomain transcription factor - required for regulation of genes involved in brain morphogenesis

s59 (preferred name: Slouch)
homeodomain - NK-1 class

Sex combs reduced
homeodomain - Antennapedia class

homeodomain transcription factor - confers ventral mesodermal cell fate - regulates somatic cell function during gonadogenesis

shaven (common alternative name: sparkling)
paired domain and homeodomain (partial) - Pax2, 5 and 8 homolog

sine oculis

slouch (common alternative names: S59 and NK-1)
transcription factor - homeodomain - NK-1 class - maintenance of slouch is directly involved in the control of late aspects of muscle development,
such as muscle differentiation and morphogenesis, and possibly also innervation

sparkling (preferred name: shaven)
paired domain and homeodomain (partial) - Pax2, 5 and 8 homolog

tailup (common alternative name: islet)
homeodomain and LIM domain

tinman (common alternative names: NK-4 and msh-2)
homeodomain - NK-2 class

homeodomain - Antennapedia class

homeodomain protein

ventral nervous system defective (common alternative names: vnd and NK2)
homeodomain - NK-2 class

ventral veinless (common alternative name: drifter)
homeodomain - pou domain

homeodomain transcription factor (TGIF subclass) - required, along with homeodomain protein Achintya, for spermatogenesis

Visual system homeobox 1 ortholog and Visual system homeobox 2 ortholog (common alternative names: Chx1 and Chx2)
homeodomain transcription factors - markers for the brain central neuroendocrine system termed the pars intercerebralis
that expresses the hormones Drosophila insulin-like peptide (Dilp), FMRF, and myomodulin

homeodomain - Antennapedia class - DV polarity

Zn finger homeodomain 1
zinc finger domain and homeodomain protein - mutation results in various degrees of local errors in mesodermal cell fate or positioning

Zn finger homeodomain 2
transcription factor - zinc finger domain and homeodomain - required for correct proximal wing development

Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences

Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. This study determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. A computational system was developed that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and full 8-mer binding profiles were inferred for the majority of known animal homeodomains. The results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success (Berger, 2008).

It was asked whether the homeodomain monomer binding preferences identified in vitro reflect sequences preferred in vivo. Anecdotally, the highest predicted binding sequences do correspond to known in vivo binding sites. For example, in the predicted 8-mer profile for sea urchin Otx, a previously identified in vivo binding sequence (TAATCC, from the Spec2a RSR enhancer) is contained in the top predicted 8-mer sequence, and, more strikingly, it is embedded in the fifth-highest predicted 8-mer sequence (TTAATCCT). At greater evolutionary distance, three of the four Drosophila Tinman binding sites in the minimal Hand cardiac and hematopoietic (HCH) enhancer are contained within the second (TCAAGTGG), fifth (ACCACTTA), and ninth (GCACTTAA) ranked 8-mers. The fourth site overlaps the 428th ranked 8-mer (CAATTGAG), but it also overlaps a GATA binding site and may therefore be subject to sequence constraints beyond Tinman binding (Berger, 2008).
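The containment checks described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the method used in the study: the function names are hypothetical, and the check simply asks whether a short site occurs within an 8-mer on either DNA strand.

```python
# Hypothetical helpers: test whether a known binding site is contained
# in a ranked 8-mer, allowing for either strand orientation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def contains_site(kmer: str, site: str) -> bool:
    """True if `site` occurs in `kmer` on the forward or reverse strand."""
    kmer, site = kmer.upper(), site.upper()
    return site in kmer or reverse_complement(site) in kmer


# The Otx site TAATCC lies within the 8-mer TTAATCCT quoted in the text.
print(contains_site("TTAATCCT", "TAATCC"))
```

Checking both strands matters because an 8-mer and its reverse complement describe the same double-stranded site.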

To ask more generally whether occupied sites in vivo contain sequences preferred in vitro, six ChIP-chip or ChIP-seq data sets from the literature were examined. Each involved immunoprecipitation of a homeodomain protein that was analyzed, or of a homolog sharing at least 14 of the 15 DNA-contacting amino acids with an analyzed protein. In all cases, enrichment was observed for monomer binding sites in the neighborhood of the bound fragments, with a peak at the center. Two examples are Drosophila Caudal and human Tcf1/Hnf1. For Caudal, the size of the enrichment peak increased dramatically with E score cutoff, indicating that the most preferred in vitro monomer binding sequences correspond to the most enriched in vivo binding sites (51% of bound fragments have such an 8-mer, versus 17% in randomly selected fragments). For Tcf1/Hnf1, however, the majority of sequences bound in vivo do not contain the best in vitro binding sequences, although most do contain at least one 8-mer with E > 0.45 (53%, versus 27% in random fragments), suggesting utilization of weaker binding sites. Similar results were obtained with PWMs. Thus, the requirement for highest-affinity binding sequences may vary among homeodomain proteins, among species, or under different physiological contexts. Nonetheless, a large proportion of the in vivo binding events apparently involve the monomeric homeodomain sequence preferences, which can be derived in vitro (Berger, 2008).
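The bound-versus-random comparison above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the toy fragments, and the E scores are hypothetical, and the published analysis also considered position relative to the fragment center, which this sketch ignores.

```python
# Hypothetical sketch: fraction of fragments that contain at least one
# 8-mer whose E score exceeds a cutoff, for bound vs. background sets.
from typing import Dict, Iterable

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_preferred_8mer(fragment: str, e_scores: Dict[str, float], cutoff: float) -> bool:
    """True if any 8-bp window (either strand) scores above `cutoff`."""
    frag = fragment.upper()
    missing = float("-inf")  # windows absent from the table never pass
    for i in range(len(frag) - 7):
        window = frag[i:i + 8]
        score = max(e_scores.get(window, missing),
                    e_scores.get(reverse_complement(window), missing))
        if score > cutoff:
            return True
    return False


def fraction_with_site(fragments: Iterable[str], e_scores: Dict[str, float], cutoff: float) -> float:
    """Fraction of fragments containing at least one preferred 8-mer."""
    fragments = list(fragments)
    if not fragments:
        return 0.0
    hits = sum(has_preferred_8mer(f, e_scores, cutoff) for f in fragments)
    return hits / len(fragments)


# Toy data: the E scores and fragments below are invented for illustration.
toy_scores = {"TTAATCCT": 0.49, "ACCACTTA": 0.46}
bound = ["GGTTAATCCTGG", "CCTAAGTGGTCC", "GGGGGGGGGGGG"]
background = ["GGGGGGGGGGGG", "CCCCCCCCCCCC", "AAAAAAAAAAAA", "GGTTAATCCTGG"]

bound_fraction = fraction_with_site(bound, toy_scores, cutoff=0.45)
background_fraction = fraction_with_site(background, toy_scores, cutoff=0.45)
print(bound_fraction, background_fraction)
```

Raising the cutoff restricts the count to the most preferred 8-mers, which is the comparison the text describes when it contrasts Caudal with Tcf1/Hnf1.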

Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites

The comprehensive characterization of homeodomain DNA-binding specificities is described for Drosophila melanogaster. The analysis of all 84 independent homeodomains from Drosophila reveals the breadth of DNA sequences that can be specified by the homeodomain. The majority of these factors can be organized into 11 different specificity groups, where the preferred recognition sequence between these groups can differ at up to four of the six core recognition positions. Analysis of the recognition motifs within these groups led to a catalog of common specificity determinants that may cooperate or compete to define the binding site preference. With these recognition principles, a homeodomain can be reengineered to create factors where its specificity is altered at the majority of recognition positions. This resource also allows prediction of homeodomain specificities from other organisms, which is demonstrated by the prediction and analysis of human homeodomain specificities (Noyes, 2008).

A bacterial one-hybrid (B1H) system was used to determine the specificities of DNA-binding domains. Using this system, the DNA-binding specificities were characterized for all 84 homeodomains in Drosophila that are not associated with an additional DNA-binding domain, as well as for 16 mutant homeodomains with changes in residues that contribute to DNA recognition. This analysis reveals a diverse array of DNA-binding specificities, with a minimum of 17 unique specificities in Drosophila; the majority of homeodomains can be clustered into 11 specificity groups (see Clustering of the 84 Drosophila homeodomains). Members of a given specificity group typically share common recognition residues. Combining these data with previous structural and biochemical work on the homeodomain family, a detailed set of recognition determinants for homeodomains was proposed and evaluated, and this information was used to broadly and accurately predict the specificities of homeodomains in the human genome (see Comparison of the predicted and determined recognition motifs for 6 human homeodomains; Noyes, 2008).

Remarkable diversity exists in the B1H-determined DNA-binding specificities for the entire set of homeodomains. The conservation of Asn51, which specifies Ade at binding site position 3, in combination with the ability to infer the orientation of each homeodomain on its binding site provides a basis for aligning all of these recognition sequences. Using this master alignment, hierarchical clustering of the Drosophila homeodomains was performed based on the similarity of their DNA-binding specificities. The majority of these factors can be organized into 11 different specificity groups, and the average specificity of each group was determined for the purposes of comparison. In this analysis, only the core 6 base pair element recognized by these factors was used. Consistent with the idea that many homeodomain proteins prefer similar TAAT-related motifs, slightly more than half (43) of the homeodomains fall into the Antp or En specificity groups. There are also a number of specificity groups, such as the Abd-B and NK-1 groups, which differ in sequence preference from the Antp or En groups at only one or two positions. However, other groups, such as the TGIF-Exd group, differ at four positions relative to the Antp or En groups. Outside of these specificity groupings are six factors that exhibit unique specificities. The observed diversity of specificities reveals the adaptability of the homeodomain architecture for the recognition of a variety of DNA sequences (Noyes, 2008).

The contribution of specific residues toward binding site preference for one or more group members has been demonstrated in previous studies. This study used correlations between the average group recognition motifs and the amino acid distributions at key DNA recognition positions to systematically describe the characteristics of each group that lead to differences in binding specificity (Noyes, 2008).

  • Antp and En groups: The largest groups of homeodomains provide a reference point to describe how differences in amino acid sequence correlate with DNA-binding specificity. The Antp and En groups share similar recognition motifs and amino acid distributions at the key recognition positions. However, at binding site position 5, the En group prefers Thy, whereas the Antp group tolerates either Gua or Thy. There is a corresponding difference at amino acid position 54: Ala for the En group and Met for the Antp group. In the Antp-DNA structure, the side chain of Met54 is neighboring this base pair (Noyes, 2008).
  • Bcd group: Typical homeodomains utilize Lys50 to specify Cyt at binding site positions 5 and 6 through the interaction of Lys50 with the complementary Gua at these positions. This results in a consensus sequence of TAATCC (Noyes, 2008).
  • NK-1, Bar and Ladybird groups: Many of these homeodomains are members of the NK or DL homeodomain classes and generally have Thr at position 47 or 54. Compared to the Antp and En groups, the homeodomains with Thr47 have reduced specificity at binding site positions 4 and/or 5 (Noyes, 2008).
  • NK-2 group: The members of this group prefer Gua at position 4, due to an interaction between Tyr54 and the complementary Cyt. Their specificities vary at binding site position 1, which correlates with differences at residues 6 and 7 of the N-terminal arm (Noyes, 2008).
  • Abd-B group: These factors prefer Thy over Ade at position 2. In Abd-B, this preference has been mapped to amino acid positions 3, 6 and 7 of the N-terminal arm; however, the variability within the N-terminal arm precludes a simple correlation of binding preference and amino acid sequence (Noyes, 2008).
  • Atypical homeodomains: The atypical groups generally prefer Gua at binding site position 2, and Cyt and Ade at positions 4 and 5. In CG11617, the Iroquois group and the TGIF group, the preference for Cyt and Ade at positions 4 and 5 correlates with the presence of Arg54, consistent with the structure of MATα2. The single exception to this correlation, Onecut, contains a unique residue (Met50), which may contribute to its distinct binding preference. Likewise, with the exception of the Iroquois group, homeodomains that contain Arg55 prefer Gua at position 2, consistent with the Exd and Pbx structures (Noyes, 2008).
  • TGIF-Exd group: The data are consistent with previously described specificities for individual members of the TGIF-Exd group (TGA(C/t)A).
  • Six group: All members of this group (So, Six4 and Optix) display a specificity that overlaps with the recognition motif TGATAC and share identical residues at the key DNA-recognition positions. The data are consistent with a known So motif [(T/C)GATAC]. A discrepancy between these data and a motif (TAAT) reported for an Optix homolog, Six3, was investigated in the analysis of human homeodomains (Noyes, 2008).
  • Iroquois group: The monomeric motif (ACA) reflects part of the palindromic, homodimer binding site (ACANNTGT) for a full-length Mirr protein. Homeodomains in this group have weak preferences at binding site positions 1 and 2, despite containing notable specificity determinants (Arg5 and Arg55). One striking feature of the Iroquois group is Ala at position 8. In other homeodomains, a large hydrophobic residue at this position binds in a cleft formed by the homeodomain helices and appears to position the N-terminal arm over the 5' end of the binding site. To examine the effect of residue 8 on Iroquois specificity, an Ala8Phe mutation was introduced into Caup. This mutation restores, albeit incompletely, the anticipated specificity at positions 1 and 2. The incomplete transformation suggests that additional determinants also contribute to specificity at the 5' end of the binding site (Noyes, 2008).
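Several motifs quoted in the list above can be explored with a short Python sketch. It is illustrative only: the helper names are hypothetical, the So motif (T/C)GATAC is written here with the standard IUPAC code Y (C or T), and the comparison of TAATCC with TGATAC assumes the two 6 bp cores are aligned position by position as written.

```python
# Hypothetical helpers for degenerate motif matching, palindrome testing,
# and position-by-position comparison of aligned core motifs.
import re

# Subset of IUPAC nucleotide codes needed for the motifs in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "Y": "R", "R": "Y", "N": "N"}


def reverse_complement(motif: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(motif.upper()))


def is_palindromic(motif: str) -> bool:
    """A motif is palindromic if it equals its own reverse complement."""
    return motif.upper() == reverse_complement(motif)


def find_sites(motif: str, sequence: str):
    """Return (start, strand) for every window matching the motif."""
    forward = re.compile("".join(IUPAC[b] for b in motif.upper()))
    reverse = re.compile("".join(IUPAC[b] for b in reverse_complement(motif)))
    seq, width, hits = sequence.upper(), len(motif), []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if forward.fullmatch(window):
            hits.append((start, "+"))
        elif reverse.fullmatch(window):
            hits.append((start, "-"))
    return hits


def differing_positions(motif_a: str, motif_b: str):
    """1-based positions at which two aligned, equal-length motifs differ."""
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must have equal length")
    return [i + 1 for i, (a, b) in enumerate(zip(motif_a.upper(), motif_b.upper())) if a != b]


print(is_palindromic("ACANNTGT"))             # Mirr homodimer site
print(find_sites("YGATAC", "AATGATACGG"))     # toy sequence, invented here
print(differing_positions("TAATCC", "TGATAC"))
```

The palindrome test explains why a monomeric half-site such as ACA can reflect a homodimer site: the full site reads the same on both strands, so each monomer can contact one half.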

    This assessment of the typical and atypical superclasses suggests two overlapping, but distinct sets of protein-DNA interactions. Both classes generally share Arg5 and Asn51, which typically specify Thy and Ade at binding site positions 1 and 3, as well as a common set of phosphate-contacting residues, which should result in a similar docking arrangement of all of these homeodomains with the DNA. Thus, specificity differences between these homeodomains primarily arise from distinct combinations of residues that directly interact with DNA or that influence these contact residues, rather than changes in the overall conformation of the homeodomain-DNA complex (Noyes, 2008).

    This study provides a complete analysis of homeodomain specificities in a metazoan, and it dramatically increases the number of characterized homeodomains in Drosophila, as only 18 of 84 had any binding site information in the FlyREG database. This study has found that the homeodomain family displays an extensive range of specificities in which a wide variety of bases can be preferred at most positions within the core 6 bp binding site. Overall, the majority of homeodomains (93%) in this dataset can be clustered into 11 different specificity groups, with an additional 6 homeodomains that display unique specificities. This clustering strategy allowed description of how common variations in residues at a given position in the homeodomain contribute to differences in specificity. However, even within these groups there are homeodomains that display differences in binding site preference. For example, members of the NK-2 group differ in their base preference at the 5'-most position, and Exd specificity clearly differs from other members of the TGIF group. In addition, differences outside the core 6 base pair binding site motifs lead to further diversity among homeodomain specificities. Thus, the 17 specificities described by the 11 groups and 6 unique homeodomains represent the minimum number of different specificities recognized by Drosophila homeodomains (Noyes, 2008).
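The counts in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below assumes that every homeodomain not listed as unique belongs to one of the 11 groups; the variable names are illustrative.

```python
# Cross-check of the counts quoted in the text (Noyes, 2008).
total_homeodomains = 84
specificity_groups = 11
unique_homeodomains = 6
previously_characterized = 18  # with binding site information in FlyREG

# Assumption: all homeodomains other than the 6 unique ones are grouped.
grouped = total_homeodomains - unique_homeodomains
percent_grouped = round(100 * grouped / total_homeodomains)
minimum_specificities = specificity_groups + unique_homeodomains
percent_previously_known = round(100 * previously_characterized / total_homeodomains)

print(grouped, percent_grouped, minimum_specificities, percent_previously_known)
# prints: 78 93 17 21
```

Under that assumption, 78 of 84 homeodomains fall into groups, which matches the quoted 93%, and 11 groups plus 6 unique homeodomains gives the stated minimum of 17 specificities.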

    This analysis demonstrates that the overall sequence similarity between two homeodomains is a useful, but sometimes misleading indicator of the degree of similarity in their DNA-binding specificities. Once factors are clustered into specificity groups, it is possible to compare binding specificity with their degree of sequence homology. As expected, a substantial correlation between sequence similarity and preferred recognition motif is observed. However, multiple examples were found where pairs of closely related homeodomains cluster into different specificity groups. In both naturally-occurring and engineered homeodomains, single amino acid changes at putative DNA recognition positions are sufficient to alter specificity. These observations illustrate the importance of defining the amino acid positions that contribute to variations in binding site specificity in order to make accurate specificity predictions (Noyes, 2008).


    Berger, M. F., et al. (2008). Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 133(7): 1266-76. PubMed ID: 18585359

    Noyes, M. B., et al. (2008). Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 133(7): 1277-89. PubMed ID: 18585360

    Home page: The Interactive Fly © 1995, 1996 Thomas B. Brody, Ph.D.

    The Interactive Fly resides on the
    Society for Developmental Biology's Web server.