Protein-DNA binding: complexities and multi-protein codes

SURVEY AND SUMMARY

Trevor Siggers 1,* and Raluca Gordân 2,*

1 Department of Biology, Boston University, Boston, MA 02215, USA, 2 Departments of Biostatistics and Bioinformatics, Computer Science, and Molecular Genetics and Microbiology, Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA

Received September 6, 2013; Revised October 16, 2013; Accepted October 22, 2013

*To whom correspondence should be addressed. Tel: +1 617 358 7118; Fax: +1 617 353 6340; Email: [email protected] Correspondence may also be addressed to Raluca Gordân. Tel: +919 684 9881; Fax: +919 668 0795; Email: [email protected]

Nucleic Acids Research, 2013, 1–13, doi:10.1093/nar/gkt1112. Advance Access published November 16, 2013.

© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

ABSTRACT

Binding of proteins to particular DNA sites across the genome is a primary determinant of specificity in genome maintenance and gene regulation. DNA-binding specificity is encoded at multiple levels, from the detailed biophysical interactions between proteins and DNA, to the assembly of multi-protein complexes. At each level, variation in the mechanisms used to achieve specificity has led to difficulties in constructing and applying simple models of DNA binding. We review the complexities in protein–DNA binding found at multiple levels and discuss how they confound the idea of simple recognition codes. We discuss the impact of new high-throughput technologies for the characterization of protein–DNA binding, and how these technologies are uncovering new complexities in protein–DNA recognition. Finally, we review the concept of multi-protein recognition codes in which new DNA-binding specificities are achieved by the assembly of multi-protein complexes.

INTRODUCTION

In the mid 1970s, motivated by early X-ray crystal structures of proteins and DNA, Seeman et al. (1) proposed a protein–DNA recognition code based on hydrogen bonding patterns between amino acids and bases. For example, in the DNA major groove, an arginine side-chain can make two hydrogen bonds with a guanine base, but not with any other base. Therefore, an Arg–Gua residue–base pairing provides a mechanism, or code, for proteins to preferentially select for guanines. Although aesthetically pleasing, 30 years of subsequent analyses of protein–DNA complexes and interactions have demonstrated myriad complications with such a simple recognition code. We briefly summarize findings from structural and biochemical studies that explain why simple protein–DNA recognition codes do not exist.

Side-chain flexibility and water molecules

Protein side-chains and water molecules in the binding interface create a flexible network of interactions that can readily adopt new conformations in response to changes in DNA or protein sequence (2–5). Two examples illustrate the relevant complexities. Comparing protein–DNA crystal structures for wild-type and mutant zinc finger protein Zif268/Egr1, it was demonstrated that a single amino acid mutation (Zif268 D20A) could lead to altered protein side-chain conformations and DNA-binding preferences (6). Further, the side-chain rearrangement occurred without a change in the protein docking geometry, but resulted in a new, positioned water molecule in the binding interface for one of the DNA sequences. Structural comparison of two Cre recombinase variants in complex with different DNA sequences revealed that both DNA and protein differences affect the contacts made in the binding interface (7). In complex with the same LoxM7 DNA sequence, two Cre variants, which differ at only three residue positions, had distinctly different conformations for the common side-chains mediating base contacts. Further, comparison of one of the variants bound to a different LoxP DNA site revealed that recognition of the two DNA sequences, LoxM7 and LoxP, led to altered side-chain conformations and different side-chain and water-mediated contacts. These studies highlight the complex interplay between side-chains, water molecules and DNA bases in the binding interface.

DNA structure and indirect readout

‘Direct readout’ (also known as ‘base readout’) refers to the situation where proteins discriminate different bases in a DNA sequence via direct (or water-mediated)
interactions with the DNA bases. In contrast, ‘indirect readout’ (also known as ‘shape readout’) describes a different mechanism of recognition in which the sequence-dependent deformability or structural differences between DNA molecules contribute to their discrimination (8). A recent review categorized different types of shape readout and distinguished between ‘local’ shape readout, in which deviations from B-form DNA are localized along the sequence, and ‘global’ shape readout, in which much of the DNA molecule is deformed or bent (2). In local shape readout, sequence-dependent kinks in the DNA molecule or localized deviations in DNA groove width can lead to preferential binding by the cognate protein. Examples of widely used types of local shape readout are the preferential binding of arginine and histidine residues into narrow DNA minor grooves (9,10) or binding to DNA kinks observed at YpR steps, such as seen for TATA-box binding protein (9,11,12). In global shape readout, larger deformations are involved. For example, a mechanism proposed for DNA binding of the human papillomavirus E2 protein is that DNA sequences in the unbound form adopt global conformations similar to that found in the bound protein–DNA complex (13). A particular DNA sequence that was already ‘pre-bent’ would require less energy to deform and hence be preferentially bound by a protein (2,13). What complicates prediction of the effects of indirect readout on the binding specificity is that DNA deformation is a structural property that involves multiple distributed interactions (residue–base and base–base) and is sensitive to the conformation adopted in the bound complex. Even local shape readout (e.g. the local narrowing of the DNA minor groove) involves integrating the individual structural propensities over multiple bases (2). Therefore, a recognition code for many proteins will need to include or account for the detailed structure of the bound protein–DNA complex.

Docking geometry and spatial relationships

Energetically favorable residue–base interactions, particularly hydrogen bond interactions, depend on the relative spatial orientation of the protein residue and the DNA base (1,14–16). Consequently, the docking geometry of a protein–DNA complex, defined by the orientation of the protein backbone relative to the DNA, is critical to establishing favorable amino acid–base interactions (14,15,17). Comparisons of the docking geometry for different protein–DNA complexes have revealed complications to a simple recognition code. First, the docking geometry of different protein folds can differ dramatically; therefore, favorable interactions made by different residue types will depend strongly on the protein fold (i.e. not all arginines will be able to make optimal contacts with a guanine). Second, even structurally homologous proteins can dock on the DNA in different orientations (14,15,17,18). Therefore, the distributed protein–DNA interactions that contribute to the overall docking geometry, both base-specific and non-specific (i.e. with the DNA backbone), can all potentially affect the protein–DNA interactions. In other words, effects are non-local, and residues distributed throughout the protein will affect DNA-binding specificity.

Studies have also shown that the docking geometry of the same protein can vary when bound to different DNA sequences (14,18,19). In an illustrative case, crystal structures of the nuclear factor kappa B (NF-κB)-family RelA homodimer in complex with different DNA sequences exhibited markedly different orientations for the dimer subunits. In the structure of RelA homodimer bound to the pseudo-symmetric sequence 5′-GGAA(A)TTTC-3′, one subunit makes a canonical set of hydrogen bond interactions with the 5′-GGAA half-site sequence. In contrast, the other subunit undergoes a large rotation and translation and makes no base-specific contacts with the opposing 5′-GAAA half-site, yet allows the protein to bind well to DNA (20). This contrasts starkly with the structure of RelA bound to the more symmetric 5′-GGAA(T)TTCC-3′ site, where the docking and DNA-base contacts are symmetric for the two subunits (21). Flexibility of the protein docking geometry in response to different DNA sequences presents a major difficulty in predicting the interactions and subsequent DNA-binding specificity of a protein (17,22).
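The half-site bookkeeping in the RelA example can be made concrete with a small sketch. It assumes a 9-bp site made of two 4-bp half-sites around one central base, and it reads the right-hand half-site 5′ to 3′ on the opposite strand. The function names are illustrative and do not come from any published tool.

```python
# Illustrative sketch: split a 9-bp kappa-B-like site into two 4-bp half-sites.
# The right half-site is reported as read 5'->3' on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def half_sites(site, half_len=4):
    """Return (left half-site, right half-site read on the opposite strand)."""
    left = site[:half_len]
    right = reverse_complement(site[-half_len:])
    return left, right


def is_symmetric(site, half_len=4):
    """True when both subunits would see the same half-site sequence."""
    left, right = half_sites(site, half_len)
    return left == right


if __name__ == "__main__":
    # The two RelA sites discussed in the text, central base included.
    for site in ("GGAAATTTC", "GGAATTTCC"):
        left, right = half_sites(site)
        print(site, left, right, is_symmetric(site))
```

For the pseudo-symmetric site the two half-sites come out as GGAA and GAAA, matching the description above. Sequence symmetry alone says nothing about whether the two subunits dock symmetrically, which is the point of the structural comparison.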

The complexities described here, which result from flexible protein–DNA interactions, indirect DNA readout and protein docking geometry, make it difficult to establish simple recognition rules that can provide a comprehensive description of protein–DNA binding. Structural and biochemical studies have now identified the major specificity-determining residues for many different protein structural families and subfamilies, providing a detailed picture of the basic mechanisms of DNA recognition used by different proteins (23). However, despite these structural and mechanistic descriptions, simple recognition codes that can accurately characterize the binding affinities of a protein to the vast number of potential DNA-binding site sequences have not emerged, which has spurred the development of more complex models (23–26).

In the last 10 years, newly developed high-throughput (HT) technologies have revolutionized our ability to characterize protein–DNA binding specificity. Such technologies include bacterial one-hybrid (B1H) (27,28), protein-binding microarrays (PBMs) (29–33), total internal reflectance fluorescence-PBM (34), mechanically induced trapping of molecular interactions (MITOMI) (35), Bind-n-seq (36), EMSA-seq (37), HT-SELEX/SELEX-seq (38–40), microarray evaluation of genomic aptamers by shift (MEGAshift) (41), cognate site identifier (CSI) (42) and HT sequencing fluorescent ligand interaction profiling (HiTS-FLIP) (43). The rich datasets provided by these new HT technologies are facilitating the development of improved models of protein–DNA recognition (44,45), while at the same time revealing new complications for models of binding. Here, we will briefly review the most widely used models and outline features of protein–DNA binding that complicate their development and application.

COMPLEXITIES IN PROTEIN–DNA RECOGNITION

Consensus sequences, in which base preferences are represented using letters (e.g. A = Ade, Y = Cyt or Thy), are widely used to represent protein–DNA binding specificity. These models are appropriate for high-specificity DNA-interacting proteins such as restriction enzymes. For example, the enzyme HincII will cut DNA sites that match the consensus GTYRAC (R = Ade or Gua), but the affinity for all other DNA sites is orders of magnitude lower (46). However, transcription factor (TF)–DNA binding is much more degenerate and is often characterized by a gradual transition from high- to low-affinity sites (24,29,35,43,46–48). To account for the sequence degeneracy in TF binding, the concept of position weight matrices (PWMs) was proposed and remains the most widely used representation of TF-binding specificity (46,49). A PWM is a matrix of scores (or weights), one for each possible base at each position along a binding site. PWM models can be learned from a variety of data types, from small collections of known binding sites to large datasets generated using HT technologies. PWMs also have the benefit of being easy to visualize as DNA logos, providing an intuitive feel for the TF-binding specificity (50).
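The contrast between an all-or-nothing consensus and a graded PWM can be sketched in a few lines. The example below is a minimal illustration, not a published method. It assumes a log-odds PWM built from aligned sites with a pseudocount and a uniform background, which is one common convention among several. The toy site list is invented for illustration and is not data from the studies cited here.

```python
import math
import re

# IUPAC codes used in the text (R = A or G, Y = C or T), plus N for any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate a degenerate consensus such as GTYRAC into a compiled regex."""
    return re.compile("".join(IUPAC[c] for c in consensus))


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix: one {base: score} dict per position."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def score(pwm, seq):
    """Sum per-position weights; each position contributes independently."""
    return sum(col[b] for col, b in zip(pwm, seq))


if __name__ == "__main__":
    hincii = consensus_to_regex("GTYRAC")
    for s in ("GTTAAC", "GTCGAC", "GTAAAC"):
        print(s, bool(hincii.fullmatch(s)))

    # Hypothetical aligned sites, invented for this sketch.
    toy_sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
    pwm = build_pwm(toy_sites)
    for s in ("TGACTCA", "TGAGTCA", "AAAAAAA"):
        print(s, round(score(pwm, s), 2))
```

The consensus gives a yes or no answer, whereas the PWM ranks every sequence on a continuous scale, which is why it suits the gradual transition from high- to low-affinity sites described above.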

One drawback of the PWM formalism is that it makes the implicit assumption that individual base pairs within a binding site contribute independently to the protein–DNA binding affinity. It has been shown that this assumption does not always hold (51–53), and more complex models of protein–DNA specificity have been developed to account for position dependencies within protein–DNA binding sites (52,54–56). Some studies have focused on extending traditional PWM models into ‘higher-order’ models by including contributions from dinucleotides and trinucleotides (45,57–62). A drawback of including non-independent contributions is that the number of model parameters increases significantly, which makes the models harder to learn and prone to overfitting. Thus, selecting the relevant dinucleotides and trinucleotides is critical for building higher-order models that take into account dependencies between positions in TF-binding sites (58,60). We note that other approaches have also been proposed to model DNA-binding specificity; however, a full survey of these various approaches is outside the scope of this review. Beyond inherent sequence degeneracy and positional dependencies, several other aspects of protein–DNA binding complicate the development and application of binding models and are reviewed below.
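A quick count shows how fast the parameter space grows once dependencies are added. The sketch below tallies raw matrix entries for a site of length L under three simple schemes: a mononucleotide PWM, dinucleotide terms for adjacent positions only, and dinucleotide terms for every pair of positions. These schemes are simplifying assumptions chosen for illustration; the cited models differ in which features they include, and normalization constraints reduce the number of free parameters.

```python
from math import comb


def n_params_pwm(length):
    """Four weights per position."""
    return 4 * length


def n_params_adjacent_dinuc(length):
    """Sixteen weights for each pair of neighbouring positions."""
    return 16 * (length - 1)


def n_params_all_pairs(length):
    """Sixteen weights for every pair of positions, adjacent or not."""
    return 16 * comb(length, 2)


if __name__ == "__main__":
    for length in (6, 10, 14):
        print(length,
              n_params_pwm(length),
              n_params_adjacent_dinuc(length),
              n_params_all_pairs(length))
```

For a 10-bp site the counts are 40, 144 and 720 raw entries, which illustrates why feature selection matters when the training data are limited.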

Low-affinity binding sites

Regardless of how degeneracy is represented in protein–DNA binding models, it is critically important to be able to capture the full breadth of binding sites used in vivo. Complicating this goal is the fact that TFs can specifically utilize low-affinity DNA-binding sites to regulate genes. TF binding to low-affinity DNA sites can provide a mechanism for interpreting both spatial (63–66) and temporal (67,68) TF gradients that often arise during development to control where and when genes are expressed. Analysis of genome-wide binding data has also provided evidence that low-affinity sites are under widespread evolutionary selection (69,70) and that their inclusion can greatly improve quantitative models of TF binding and gene regulation used for predicting segmentation patterns during early embryonic development in Drosophila (71). Utilization of sites selected to be lower affinity than an optimal sequence opens the door for functionally relevant sites that deviate strongly from the consensus sequence and may not be well represented by a particular binding model. For example, a comprehensive analysis of DNA binding by NF-κB dimers identified numerous lower affinity, non-traditional sites that differ significantly from the consensus sites and are not captured by the widely used PWMs (37,72). Disagreement with a PWM model may be due to (i) a protein having multiple binding modes, which will require multiple PWMs (discussed more below), or (ii) poor or biased parameterization of the PWM model. PWMs can capture low-affinity binding sites, but must be explicitly parameterized using low-affinity binding data (59). In either case, by focusing only on the highest affinity sites, protein–DNA binding models are likely to miss functionally important interactions that occur through lower affinity sites; however, making binding models too flexible or encompassing without proper parameterization runs the risk of increasing the rate of false-positive predictions.

TF-specific preferences

Another complexity highlighted by recent HT protein–DNA binding studies is that closely related TFs often exhibit both common and TF-specific binding preferences (23,24,35,37,40,72–79). In a study of 104 mouse TFs, it was found that even proteins with a high degree of similarity in their DNA-binding domains (DBDs) (as high as 67% amino acid sequence identity) can exhibit distinct DNA-binding profiles (24). In many cases, the highest affinity site is identical for several members of a DBD family, but individual proteins within the family have different preferences for lower affinity sites. For example, mouse TFs Irf4 and Irf5 share a strong preference for sequences containing the 5′-CGAAAC-3′ site, but prefer, albeit with lower affinity, 5′-TGAAAG-3′ or 5′-CGAGAC-3′, respectively. Homeodomain proteins have been extensively studied using HT approaches (23,77,78), revealing rich and diverse sequence preferences for members even within protein subfamilies. For example, the Lhx subfamily binds with high affinity to sites containing the core 5′-TAATTA-3′, but different subfamily members have different preferences for medium- and low-affinity sites: Lhx2 prefers 5′-TAATGA-3′ and 5′-TAACGA-3′, whereas Lhx4 prefers 5′-TAACGA-3′ and 5′-TAATCT-3′. These TF-specific preferences cause difficulties for DNA-binding models, as the models must accurately characterize the common binding sites while at the same time capturing the sites specific to each TF.

Flanking DNA

Even in the case of TFs with well-defined core DNA binding sites, complexities can still arise from the DNA sequence flanking the core, which can affect binding affinity. Two such examples have been highlighted in recent HT studies (45). The TF Gcn4 binds to the core motif 5′-TGACTCA-3′. Binding measurements of Gcn4 to thousands of sequences containing this core motif revealed a wide range of binding affinities, from no appreciable binding to high-affinity binding (43). Nucleotides immediately adjacent to the core 7-mer were important for binding affinity, but even the two nucleotides flanking the best 9-mer affected affinity by an order of magnitude. Another recent study examined the DNA-binding preferences of the paralogous yeast TFs Cbf1 and Tye7 to hundreds of genomic sequences containing the E-box motif 5′-CACGTG-3′ (45). The study revealed a strong dependence on flanking DNA, and computational analyses suggested that the sequence beyond the canonical E-box motif contributes to binding specificity by influencing the three-dimensional structure of the DNA-binding site. Importantly, the additional specificity coming from sequences flanking the core motif could not be described by simply extending the core (45), which is not surprising given that the influence of flanking DNA is likely to be exerted through DNA shape and not through specific DNA contacts. DNA shape is a function of the DNA sequence; however, this relationship is complex (80) and cannot be captured by simple PWM models. Thus, in order to account for the influence of flanking regions on the affinity of DNA-binding sites, future models of binding specificity will likely use DNA shape characteristics explicitly.

Multiple modes of DNA binding

Complicating a simple model of DNA binding for many proteins is that they can bind DNA using two or more distinct modes (Figure 1). These different modes of interaction can lead to fundamentally different DNA-binding preferences (i.e. motifs), complicating simple binding models that do not explicitly account for these multiple modes. HT studies are increasingly revealing unknown diversity in DNA-binding preferences of numerous proteins, many of which may be the results of different binding modes (29,72,75,81). Here, we classify variable binding modes into four categories (Figure 1) and briefly review each category.

Variable spacing

Some proteins that bind to bipartite DNA motifs (i.e. motifs composed of two half-sites) can recognize distinct classes of motifs in which the half-sites are separated by different numbers of bases (Figure 1A). This phenomenon was first observed more than 20 years ago for basic leucine zipper (bZIP) proteins, which can bind overlapping or adjacent TGAC half-sites (82). Subsequent studies have classified the bZIP family into two subfamilies based on protein sequence: (i) Activator protein 1 (AP-1) proteins, which generally prefer overlapping half-sites but bind to adjacent half-sites with almost equal affinity, and (ii) Activating transcription factor (ATF)/cAMP response element-binding protein (CREB) proteins, which generally prefer adjacent half-sites and bind very poorly to overlapping half-sites (82,83). However, a recent PBM-based study found that at least one ATF/CREB protein in yeast (Yap3) binds with high affinity to both adjacent and overlapping half-sites (74). Therefore, although some of the residues that define bZIP half-site spacing have been identified (84), additional residues are also involved. Variable half-site spacing has also been well documented for the nuclear receptors (NRs). For example, both the peroxisome proliferator activated receptors (PPARs) and retinoic acid receptors (RARs) form heterodimers with the retinoid X receptor (RXR) and can bind to response elements composed of direct repeats of the 5′-AGGTCA-3′ half-site separated by different length spacers (85). PPAR/RXR dimers bind direct repeats spaced by 1 or 2 bases (i.e. DR1 = 5′-AGGTCANAGGTCA-3′, DR2 = 5′-AGGTCANNAGGTCA-3′), while RAR/RXR dimers bind to repeats spaced by 1 (DR1) or 5 (DR5) bases. The variable spacing has even been shown to affect the in vivo function of DNA-bound NR dimers (86). RAR/RXR bound to DR5 elements can activate transcription in response to ligand, but will not activate transcription when bound to the DR1 elements (86).
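The direct-repeat nomenclature (DR1, DR2, DR5) lends itself to a simple scan. The sketch below is a minimal illustration that looks for two exact copies of the 5′-AGGTCA-3′ half-site on the forward strand, separated by a spacer of a given length. Exact matching and single-strand scanning are simplifying assumptions, since real response elements tolerate mismatches and occur in either orientation. The function name is hypothetical.

```python
import re

HALF_SITE = "AGGTCA"


def find_direct_repeats(seq, min_spacer=0, max_spacer=5):
    """Return sorted (start, spacer_length) pairs for half-site direct repeats."""
    hits = []
    for spacer in range(min_spacer, max_spacer + 1):
        # A lookahead keeps matches zero-width so overlapping hits are reported.
        pattern = re.compile(
            "(?=" + HALF_SITE + "[ACGT]{%d}" % spacer + HALF_SITE + ")"
        )
        for match in pattern.finditer(seq):
            hits.append((match.start(), spacer))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequences invented for this sketch.
    dr1 = "TT" + "AGGTCA" + "A" + "AGGTCA" + "CC"
    dr5 = "AGGTCA" + "TTTTT" + "AGGTCA"
    for name, seq in (("DR1 example", dr1), ("DR5 example", dr5)):
        print(name, find_direct_repeats(seq))
```

Each hit reports where the element starts and which DRn class it belongs to. As the RAR/RXR example shows, the spacer length can determine functional outcome as well as binding.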

Multiple DBDs

DNA-binding proteins can contain multiple independent DBDs (3,87). Multiple DBDs can allow a protein to alternatively recognize different DNA elements using different DBDs. For example, the zinc finger (ZF) protein Evi1 contains 10 ZF domains separated into two autonomously functioning DBDs (an N-terminal seven-ZF DBD and a C-terminal three-ZF DBD) that recognize completely different DNA motifs (88–90). A still more complicated scenario is seen for the mouse TF Oct-1, which has two DBDs known as the POU (Pit-Oct-Unc) homeodomain (POUHD) and the POU-specific domain (POUS) (Figure 1B) (24,91,92). Oct-1 can bind to three distinct DNA motifs using different combinations of the two DBDs: the POUHD site is recognized by the POUHD domain, the POUS site is recognized by the POUS domain and the composite POU site is recognized using both domains.

Multi-meric binding

Selective multimerization provides another means to expand or diversify the DNA-binding specificity of a protein. Proteins such as Oct-1 (93) and RXR (94) have been reported to bind DNA as either monomers or homodimers. Recent large-scale studies using HT-SELEX assays (39,95) have revealed that the dimeric mode of binding might be more common than previously appreciated, and even proteins that are known to bind DNA primarily as monomers, such as Elk1 (Figure 1C), can also form dimers with specific orientation and spacing preferences. These dimeric sites are enriched within genomic regions bound in vivo, suggesting that the dimeric profiles are biologically relevant. Furthermore, the dimer orientation and spacing preferences can sometimes distinguish individual members of the same TF family (95). For example, T-box factors share a common monomeric binding specificity but show seven distinct dimeric spacing/orientation preferences.

Alternate structural conformations

The recognition of distinct DNA motifs is expected when a protein contains multiple DBDs or binds as a multi-mer; however, some proteins with only a single DBD can also bind to multiple, distinctly different DNA sites. One such example is mouse TF sterol regulatory element-binding factor 1 (SREBF1) (Figure 1D), the ortholog of human TF sterol regulatory element-binding protein 1 (SREBP) (59,96). SREBF/SREBP proteins belong to the basic helix–loop–helix (bHLH) family of TFs that typically recognize symmetric E-boxes (5′-CAnnTG-3′). Unlike most bHLH proteins, SREBF/SREBP can bind the asymmetric sterol regulatory element 5′-ATCACnCCAC-3′. A co-crystal structure of the DNA-binding domain of human SREBP-1a bound to DNA revealed an asymmetric DNA–protein interface, with one monomer binding the E-box half-site (5′-ATCAC-3′) using protein–DNA contacts typical of bHLH proteins, and the second monomer recognizing the non-E-box half-site (5′-GTGGG-3′) using entirely different protein–DNA contacts (96,97).

Thus, in the case of SREBF/SREBP, residues in the DBD are responsible for the multiple modes of DNA binding. Regions outside the DNA-binding domain of the protein can also enable alternate binding modes. For example, a region N-terminal to the basic DBD of yeast TF Hac1 is required for dual recognition modes. Mutations in this region were shown to preferentially reduce binding to one of the modes, whereas mutational analysis showed that an arginine in the basic domain was crucial for the other mode of binding (98). This indicates that the protein can bind DNA using two distinct conformations, and individual residues (within and outside the DBD) play specific roles within each binding mode (98).

The ability of proteins to utilize different DNA-binding modes complicates, or even precludes, the construction of simple DNA-binding models for many proteins. The presence of distinct modes usually requires that multiple DNA-binding models are used to represent a protein’s specificity. This can cause difficulties when the data available to construct the DNA-binding model contain a mixture of multiple types of binding motifs, such as in genome-wide chromatin immunoprecipitation (ChIP)-seq datasets. In some situations, the DNA-binding modes can potentially be separated and independently characterized. For example, one could independently characterize the binding of individual DBDs when a protein contains multiple DBDs. However, even these situations can lead to difficulties, as illustrated by the case of Oct-1, where the autonomous DBDs can function either independently or together, in which case examining them separately will cause one to miss the binding sites recognized when both DBDs are involved (Figure 1B).

Figure 1. Multiple modes of DNA binding. Schematized are examples illustrating four mechanistic categories by which proteins recognize different DNA sites via distinct structural modes. (A) Variable spacing: Gcn4 dimers can bind to bipartite sites with half-sites separated by variable-length spacers (82); motifs from (73,74); panels show overlapping half-sites and adjacent half-sites. (B) Multiple DBDs: Oct-1 can bind to different DNA sites using different arrangements of its two DNA-binding domains (91,92); motifs from (24); panels show the POUHD site, the POUS site and the POU site. (C) Multi-meric binding: Elk1 can bind both as a monomer or as a dimer (95); panels show the monomer site and the dimer site. (D) Alternate structural conformations: SREBP can bind to different DNA sites by adopting alternate structural conformations (96,97); motifs from (44). [Sequence logo images not reproduced.]

Multi-protein recognition codes

The DNA-binding specificity of a TF is a primary determinant of where it will bind in the genome (99,100). However, transcriptional regulation often involves the assembly of multi-protein complexes on DNA (101,102), and it has been shown that multi-protein complexes can exhibit novel DNA-binding specificities not predictable from measurements of the individual proteins (40,48). Therefore, in efforts aimed at understanding the biophysical determinants of specificity in gene regulation, it is of primary importance to also consider how multi-protein complexes can lead to new or enhanced specificities. Here, we review the varied mechanisms by which novel DNA-binding specificities can arise when proteins, both DNA-binding and non-DNA-binding, assemble together. We suggest that these mechanisms represent higher-order 'multi-protein' recognition codes: mechanisms by which genomic targeting of regulatory factors is encoded in multi-protein complexes.

Cooperative binding

Cooperative binding of TFs to DNA, usually via protein–protein interactions between adjacently bound TFs, stabilizes the proteins on the DNA and can enhance their individual contributions to transcription of a gene (3,103). Cooperative binding can also affect the DNA-binding specificity and lead to recognition of new binding sites (3) (Figure 2A). One way in which cooperative binding alters specificity is by extending the binding of a TF to include lower affinity sites as a result of the enhanced overall affinity of the cooperative complex for DNA. A well-studied example is the binding of yeast TFs MATα2, MATa1 and Mcm1, which regulate mating-type genes (103,104). MATα2 binds DNA cooperatively with cofactors MATa1 or Mcm1 to repress different sets of genes. MATα2/MATa1 heterodimers bind to sites found in the promoter regions of haploid-specific genes and repress their expression, whereas MATα2/Mcm1 heterotetramers bind different sites found in mating-type a-specific gene promoters and repress gene expression. Both complexes repress gene expression via recruitment of co-repressor proteins Tup1 and Ssn6 by interactions with MATα2. Therefore, MATα2 functions to recruit co-repressors and repress gene expression, but its DNA-binding specificity, and consequently its set of target genes, is mediated by cooperative binding with cell-type-specific cofactors. In a recent study, a more subtle form of specificity alteration by cooperative binding was presented in which DNA binding of one protein can stabilize or destabilize the DNA binding of another protein via the deformation of the DNA structure (105). Notably, the cooperative binding was not mediated by direct protein contacts, but by an allosteric mechanism where DNA structural deformations propagate along the DNA helix. The cooperative allosteric effects were shown to operate up to 16 bp away.
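The way cooperativity extends binding to lower-affinity sites can be illustrated with a standard two-site equilibrium model. The sketch below is not taken from the studies cited above: the function name, the reduced concentrations (protein concentration divided by the dissociation constant of its site) and the cooperativity factor omega are assumptions chosen only to show the qualitative effect.

```python
def occupancy_of_a(q_a, q_b=0.0, omega=1.0):
    """Equilibrium probability that site A is bound in a two-site model.

    q_a, q_b : reduced concentrations [protein] / Kd for sites A and B
    omega    : cooperativity factor (1.0 means independent binding)
    """
    # Statistical weights: empty, A only, B only, A and B together.
    z = 1.0 + q_a + q_b + omega * q_a * q_b
    return (q_a + omega * q_a * q_b) / z


if __name__ == "__main__":
    weak_site = 0.1  # hypothetical weak site: concentration is 10-fold below Kd
    print("alone:              ", round(occupancy_of_a(weak_site), 3))
    print("independent partner:", round(occupancy_of_a(weak_site, 1.0, 1.0), 3))
    print("cooperative partner:", round(occupancy_of_a(weak_site, 1.0, 100.0), 3))
```

With these illustrative numbers the weak site is mostly unoccupied on its own or next to an independently binding partner, but mostly occupied when the partner binds cooperatively, which is the 'assisted binding' behaviour described for weak sites.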

A second, more active mechanism has been described where cooperative binding alters the residue–base contacts that a TF can make with the DNA, thereby altering its inherent binding specificity (3,40) (Figure 2A). One example is the binding of the TFs Ets-1 and Pax5, where Pax5 alters the DNA contacts made by Ets-1, making it more permissive to alternate DNA sequences (106). Bound alone to DNA, Ets-1 binds with high affinity to 5′-GGAA-3′ and with lower affinity to 5′-GGAG-3′. A tyrosine residue (Y395) in the Ets-1 recognition helix interacts with the fourth base position of these sites and mediates the preference for adenine over guanine. However, in complex with Pax5, the Y395 residue adopts an alternate conformation and no longer interacts with the DNA at this base position. This has the effect of making the Ets-1 DNA binding permissive for both sequences, and therefore broadens the specificity of the Pax5/Ets-1 complex to include the suboptimal 5′-GGAG-3′ Ets-1 half-site. Another example has been elucidated in a series of studies on Drosophila Hox proteins and their cofactor Extradenticle (Exd) (40,107). The Hox protein Sex combs reduced (Scr) binds with the cofactor protein Exd preferentially to a paralog-specific site (fkh250) found in the fork head (fkh) gene over a consensus site (fkh250con) recognized by other Hox/Exd complexes (107,108). The preferential binding of the Scr/Exd complex to the fkh250 site involves the stabilization of a flexible 'arm' region N-terminal to the Scr homeodomain (107). Two of the stabilized residues in the N-terminal arm region (Arg3 and His-12) insert into the fkh250 DNA minor groove, which is narrower than the same region in the fkh250con sequence and more electrostatically favorable for the Arg and His residues. Therefore, the enhanced specificity of the Scr/Exd complex for the fkh250 sequence is due to local DNA shape (i.e. minor groove width) readout by the Scr Arg and His residues when stabilized by the cooperatively bound Exd cofactor. This work highlights how the intrinsic specificity of a monomeric homeodomain (Scr) can be altered or refined through interaction with a protein cofactor (Exd), a characteristic called latent specificity. A recent comprehensive analysis of other Hox proteins extended these results and demonstrated latent specificities for all the Drosophila Hox proteins (40). An HT-SELEX/SELEX-seq assay was used to measure and compare the DNA-binding specificity of all eight Drosophila Hox proteins alone and in complex with Exd. As seen for Scr/Exd, binding with the Exd cofactor revealed latent specificity differences between the Hox proteins not observable when Hox proteins were examined as monomers.

Cooperative cofactor recruitment

The recruitment of non-DNA-binding cofactors (i.e. coactivators and corepressors) to promoters and enhancers by DNA-binding TFs is central to gene regulation (101). 'Cooperative' recruitment occurs when a cofactor can make simultaneous interactions with multiple DNA-bound TFs; the result is a synergistic enhancement in the cofactor recruitment (109,110). Cooperative cofactor recruitment integrates the contributions from multiple TFs to achieve maximal gene expression, and is a powerful mechanism for establishing an integrative 'AND-type' regulatory logic in gene transcription (111–114). At the same time, cooperative recruitment enhances the binding specificity of the cofactor. The genomic location of the cofactor is determined not by the presence of a single TF, but by the less-frequent (i.e. more specific) coincident presence of multiple TFs (Figure 2B).

A paradigm for cooperative cofactor recruitment is provided by multi-protein enhanceosome complexes, in which multiple TFs cooperatively assemble on DNA and coordinately recruit a common cofactor. A well-studied example is the enhanceosome that binds to the Interferon beta (IFNβ) gene promoter and cooperatively recruits the coactivator CREB-binding protein (CBP) (111,112). In response to viral infection, the TFs ATF-2, c-Jun, Irf3, Irf7 and NF-κB, facilitated by the architectural factor

Figure 2. Multi-protein recognition codes. Schematized are examples illustrating four mechanistic categories by which targeting of proteins to distinct DNA sites involves the assembly of multi-protein complexes. (A) Cooperative binding: enhanced complex stability due to cooperativity allows binding to lower-affinity (weak) sites (assisted binding) (103,104,106); inter-protein interactions alter or stabilize protein–DNA contacts (e.g. an altered side-chain conformation or a stabilized N-terminal tail), altering DNA-binding specificity (40,106,107). (B) Cooperative recruitment: cofactor recruitment requires multiple factors (rather than only one), allowing more specific cofactor targeting (109–114). (C) Allostery: allosteric control of cofactor recruitment limits cofactor recruitment to only a subset of the TF binding sites (recruiting versus non-recruiting sites) (116–121,124,125). (D) Cofactor-based targeting: enhanced binding of a multi-protein complex to specialized composite sites is mediated by interactions between a non-DNA-binding cofactor and an auxiliary motif (48,129).


Hmga1, bind cooperatively to the IFNβ promoter. Multiple proteins in the DNA-bound complex contribute to the cooperative recruitment of CBP and operate synergistically to enhance transcription (111). While NF-κB, the ATF-2/c-Jun dimer and Irf factors can recruit CBP individually, studies have demonstrated that removal of the activation domains from IRF or NF-κB factors resulted in complete loss of CBP recruitment, highlighting the cooperative nature of the CBP cofactor recruitment (111). A similar situation occurs in the cooperative recruitment of class II major histocompatibility complex transactivator (CIITA) to a set of conserved transcriptional control elements found in the promoters of major histocompatibility complex class II (MHC-II) genes (113,115). The control sequences contain four DNA elements, termed the W, X, X2 and Y boxes, with a conserved sequence spacing and orientation across the different promoters. The TFs X-box binding protein (RFX), X2-binding protein (X2BP) and Nuclear factor Y (NF-Y) bind to the X, X2 and Y boxes, respectively, and form a cooperative nucleoprotein enhanceosome complex. This multi-protein complex recruits CIITA to the MHC-II promoters via multiple weak interactions mediated by different components of the enhanceosome complex (113). Removal of interactions from any of these component proteins leads to significant loss of CIITA recruitment.
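The 'AND-type' logic of cooperative recruitment, in which several weak contacts together produce strong and specific cofactor targeting, can be sketched with a simple multiplicative binding model. Everything in the Python example below is hypothetical: the region names, the TF labels, the tenfold contribution per contact and the baseline weight are assumptions chosen for illustration, not measurements from the enhanceosome studies discussed above.

```python
from math import prod

# Hypothetical fold-stabilization contributed by each DNA-bound TF contact.
CONTACT_FACTOR = {"TF_A": 10.0, "TF_B": 10.0, "TF_C": 10.0}

# Hypothetical regulatory regions and the TFs bound at each of them.
REGIONS = {
    "region_1": {"TF_A"},
    "region_2": {"TF_A", "TF_B"},
    "region_3": {"TF_A", "TF_B", "TF_C"},
}


def recruitment_probability(bound_tfs, base_weight=0.01):
    """Probability that the cofactor is recruited to a region.

    The statistical weight of the recruited state is the baseline weight
    multiplied by one factor per simultaneous cofactor-TF contact.
    """
    weight = base_weight * prod(CONTACT_FACTOR[tf] for tf in bound_tfs)
    return weight / (1.0 + weight)


if __name__ == "__main__":
    for name, tfs in REGIONS.items():
        print(name, sorted(tfs), round(recruitment_probability(tfs), 3))
```

Under these assumptions, a region bound by a single TF recruits the cofactor only weakly, whereas a region where all three TFs coincide recruits it strongly, so cofactor occupancy tracks the rarer co-occurrence of TFs rather than any one TF alone.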

Allosteric effects in cofactor recruitment

DNA can act as a sequence-specific allosteric ligand that alters the function of a DNA-bound TF (116–121). A common mechanism by which this functions is by DNA sequence-dependent effects on cofactor recruitment. A TF bound to one set of DNA sites will recruit a specific cofactor, whereas the same TF bound to a different set of sites will not recruit the cofactor. Therefore, allosteric control of cofactor recruitment functionally partitions DNA binding sites of a TF and consequently refines binding specificity of the recruited cofactor (Figure 2C).

Allosteric mechanisms have been reported for both the glucocorticoid (GR) and the estrogen (ER) nuclear receptors (116,118). The DNA sequence bound by ER can influence its transcriptional activity and recruitment of LXXLL-containing peptides and p160 cofactors (SRC-1, GRIP1, ACTR) (116). The DNA sequence bound by GR, called the GR response element (GRE), was shown to influence gene expression, but the effect did not correlate with GR-binding affinity, suggesting a mechanism that involves differential recruitment of cofactor proteins (118). Furthermore, knock down of Brahma, a subunit of the SWI/SNF nucleosome remodeling complex, affected GR-dependent gene expression in a GRE sequence-dependent manner, suggesting that the composition of the DNA-bound protein complexes was allosterically regulated. Allosteric control of cofactor recruitment has also been described for different NF-κB family dimers (117,120). Single-base changes in NF-κB binding sites have been demonstrated to affect recruitment of the TF Irf3 protein by DNA-bound p65/Rel-containing dimers (117), as well as the recruitment of the cofactors histone de-acetylase 3 (HDAC3) and Tip60 to a DNA-bound ternary complex containing p50 homodimers and Bcl3 (120).

The mechanism of allosteric control for GR and NF-κB appears to be mediated by structural differences between TFs bound to different DNA sites. Structural studies of GR bound to different GREs demonstrated that the structure of a 'lever arm' loop region connecting the DNA recognition helix and the ligand binding domain was highly dependent on the GRE sequence (118). Studies of NF-κB dimers bound to different κB sites have shown that DNA sequence differences can result in structural rearrangements (rotations and translations) of one dimer subunit (19,21,122). Protease protection assays and detailed binding studies have also suggested DNA sequence-dependent changes in protein conformation (72,123).

In contrast to the situation for GR and NF-κB, where allostery is mediated by structural differences between the DNA-bound complexes, allostery involving the POU factors operates via larger, more global differences in quaternary structure when bound to different DNA sites. POU factors, such as Oct-1 described above, have two DBDs connected by a flexible linker: a POU-specific domain (POUS) and a POU homeodomain (POUHD) (121). POU factors can bind as homo- and heterodimers to a diverse set of DNA-binding sites in which the relative orientations and spacing of the POUS and POUHD can vary and subsequently lead to differential cofactor recruitment (121,124). The POU factors Oct-1 and Oct-2 can bind as homodimers to two distinct response elements, PORE and MORE, found in the promoters of B-cell-specific genes (121). When bound to the PORE sequence, Oct dimers can recruit the transcriptional coactivator Oct-binding factor 1 (OBF-1); however, dimers bound to the MORE sequence do not recruit OBF-1. Crystal structures of the DNA-bound Oct-1 dimer have revealed that in the MORE-bound dimer the OBF-1 binding site on Oct-1 is blocked due to the relative orientation of the dimer subunits, but in the PORE-bound dimer the site is exposed, allowing OBF-1 recruitment (125,126). A similar situation arises with the Pit-1 POU factor, in which binding site differences lead to cell-type-specific expression patterns and differential recruitment of the N-CoR co-repressor complex (124,127,128). Studies revealed that an extra 2 bp between the POUS and POUHD half-sites found in the growth hormone (GH) gene promoter, compared with a similar affinity site in the prolactin gene promoter, altered the quaternary structure of the DNA-bound Pit-1 and enabled the recruitment of the N-CoR co-repressor complex to the GH site (124).

Cofactor-based specificity

Targeting of non-DNA-binding cofactors to DNA is traditionally viewed as being mediated solely via protein–protein interactions with DNA-bound TFs. In other words, the cofactor is 'recruited' and interacts with DNA indirectly through the TFs. However, several studies have described a provocative alternative mechanism in which recruited cofactors interact directly with DNA when part of a larger multi-protein complex (48,129). Further, as part of this larger complex, the non-DNA-binding cofactors mediate sequence-specific interactions that preferentially stabilize the binding to composite DNA sites containing specific auxiliary motifs (Figure 2D). This represents yet another mechanism to enhance the DNA sequence specificity of multi-protein regulatory complexes.

The regulation of sulfur metabolism genes in yeast involves a multi-protein complex composed of a sequence-specific TF, Cbf1, and two non-DNA-binding cofactors, Met28 and Met4 (130,131). Cbf1 is a bHLH protein and binds as a homodimer to E-box sites with a consensus 5′-CACGTG-3′ core (131). A PBM-based analysis revealed that the Met4/Met28/Cbf1 complex binds preferentially to composite DNA sites in which the Cbf1 E-box is flanked by an adjacent 5′-RYAAT-3′ 'recruitment' motif (i.e. 5′-RYAATNNCACGTG-3′) (48). High-affinity binding to the composite sites is highly cooperative and requires the full heteromeric complex. Sequence analysis, modeling and binding assays suggest that the recruitment motif is recognized by the non-DNA-binding Met28 subunit. A highly similar situation has also been described for the multi-protein Oct-1/HCF-1/VP16 complex that regulates expression of the herpes simplex virus immediate-early genes by binding to a composite 5′-TAATGARAT sequence in the gene promoters (129). The complex is composed of the DNA-binding TF Oct-1 and two non-DNA-binding cofactors, the viral activator VP16 and the host cell factor 1 (HCF-1). Oct-1 binds to the 5′-TAAT sequence, and a DNA-binding surface of VP16 mediates interactions with the 5′-GARAT sequence. Similar to the situation in yeast, the VP16 cofactor does not interact with DNA well on its own, but when stabilized on DNA as part of a larger multi-protein complex it can adopt a configuration that mediates recognition of the 5′-GARAT subsequence. Therefore, in both situations, multi-protein complexes preferentially bind to composite sites composed of a recruitment motif (5′-RYAAT-3′ or 5′-GARAT-3′), recognized by a non-DNA-binding cofactor (Met28 or VP16, respectively), and an adjacent TF-binding motif (5′-CACGTG-3′ or 5′-TAAT-3′), recognized by the sequence-specific TF subunit (Cbf1 or Oct-1, respectively).
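A composite site such as 5′-RYAATNNCACGTG-3′ can be located in a sequence by translating its IUPAC ambiguity codes (R = A or G, Y = C or T, N = any base) into a regular expression and scanning both strands. The Python sketch below is only an illustration of this kind of consensus search; the function names and the test sequence are hypothetical, and an exact-match scan of this sort ignores the graded affinities that PBM data actually capture.

```python
import re

# IUPAC codes needed for the motifs discussed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def iupac_to_regex(motif):
    """Compile an IUPAC motif into a regex; the lookahead permits overlapping hits."""
    return re.compile("(?=(" + "".join(IUPAC[ch] for ch in motif) + "))")


def find_sites(seq, motif):
    """Return sorted (start, strand, matched site) tuples in forward-strand coordinates."""
    pattern = iupac_to_regex(motif)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        start = len(seq) - m.start() - len(motif)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    composite = "RYAATNNCACGTG"            # recruitment motif + E-box, as in the text
    example = "TTGCAATGGCACGTGAA"          # hypothetical promoter fragment
    print(find_sites(example, composite))  # composite site present
    print(find_sites("AACACGTGAA", composite))  # bare E-box only: no composite site
```

The same helper can be pointed at the E-box alone (`CACGTG`) to contrast sites that would bind Cbf1 by itself with the subset carrying the flanking recruitment motif.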

SUMMARY

Complexities in DNA binding operate at multiple levels and complicate efforts to construct simple models or recognition codes of DNA binding. Flexible protein–DNA interactions and non-local structural effects confound simple descriptions of DNA base preferences and require the use of binding models, such as PWMs, that can account for the resulting sequence degeneracy. The ability of proteins to bind DNA via multiple modes further complicates the situation and leads to the requirement of multiple binding models for many proteins. Fortunately, HT technologies are not only increasing the rate at which DNA-binding proteins are being characterized, but are providing the comprehensive binding data needed to construct models that involve multiple DNA-binding modes (23,24,27,28,35,38–40,42,43,48,59,73,74,79,132–134). An added benefit of the comprehensive nature of certain HT datasets, such as comprehensive k-mer or binding site level data, is that predictions of binding sites in the genome can be performed directly using the measured binding data (23,24,40,43,133).

Multi-protein complexes add tremendously to the DNA-binding diversity of proteins and provide a key mechanism to integrate signals in gene regulation. However, these higher-order multi-protein mechanisms cause difficulties in constructing models, primarily because DNA-binding models of proteins examined in isolation may not capture the binding sites utilized in vivo when cofactors are present. Here again, HT technologies are being used successfully to characterize binding of multi-protein complexes, revealing recognition codes mediated by cooperative binding (40,133) and cofactor-mediated targeting (48). The continued application of HT technologies to examine DNA binding of multi-protein complexes will undoubtedly provide increasingly refined binding models and provide new insights into gene targeting in vivo. Furthermore, just as the application of HT technologies has drawn our attention to the prevalence of TFs that exhibit subtle binding preferences or bind DNA via multiple modes, studies examining multi-protein complexes will likely identify widespread use of higher-order mechanisms, such as allostery, cooperative recruitment and cofactor-mediated targeting. Integrating these datasets with whole genome chromatin immunoprecipitation (ChIP) and expression datasets will lead to much more complete and sophisticated descriptions of specificity in gene regulation.
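The direct use of comprehensive k-mer data for site prediction, noted in the summary above, amounts to a table lookup rather than a fitted model. The following Python sketch shows one way this could look; the 8-mer scores, the threshold of 0.4 and the function names are hypothetical placeholders and are not values from any of the cited datasets.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an upper-case DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Strand-independent key: the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


# Hypothetical measured scores for a few 8-mers (e.g. enrichment-style scores).
KMER_SCORES = {
    canonical("TCACGTGA"): 0.49,
    canonical("GCACGTGA"): 0.48,
    canonical("ACACGTGT"): 0.45,
}


def scan(seq, scores, k=8, threshold=0.4):
    """Return (offset, k-mer, score) for every window whose measured score passes the threshold."""
    hits = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        score = scores.get(canonical(kmer))
        if score is not None and score >= threshold:
            hits.append((i, kmer, score))
    return hits


if __name__ == "__main__":
    print(scan("AAGTCACGTGACC", KMER_SCORES))
    print(scan("TTTCACGTGCTT", KMER_SCORES))  # matched via the reverse complement
```

Because every window is looked up directly, a table of this kind can hold several binding modes of a protein, or the preferences of a multi-protein complex, without committing to a single motif model.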

FUNDING

NIH [K22AI093793 to T.S.]; PhRMA Foundation Research Starter Grant (to R.G.); Basil O'Connor Research Award 5-FY13-212 from March of Dimes Foundation (to R.G.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl Acad. Sci. USA, 73, 804–808.

2. Rohs,R., Jin,X., West,S.M., Joshi,R., Honig,B. and Mann,R.S. (2010) Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem., 79, 233–269.

3. Garvie,C.W. and Wolberger,C. (2001) Recognition of specific DNA sequences. Mol. Cell, 8, 937–946.

4. Jayaram,B. and Jain,T. (2004) The role of water in protein-DNA recognition. Annu. Rev. Biophys. Biomol. Struct., 33, 343–361.

5. Reddy,C.K., Das,A. and Jayaram,B. (2001) Do water molecules mediate protein-DNA recognition? J. Mol. Biol., 314, 619–632.

6. Miller,J.C. and Pabo,C.O. (2001) Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J. Mol. Biol., 313, 309–315.

7. Baldwin,E.P., Martin,S.S., Abel,J., Gelato,K.A., Kim,H., Schultz,P.G. and Santoro,S.W. (2003) A specificity switch in selected cre recombinase variants is mediated by macromolecular plasticity and water. Chem. Biol., 10, 1085–1094.

8. Otwinowski,Z., Schevitz,R.W., Zhang,R.G., Lawson,C.L., Joachimiak,A., Marmorstein,R.Q., Luisi,B.F. and Sigler,P.B. (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature, 335, 321–329.

9. Rohs,R., West,S.M., Sosinsky,A., Liu,P., Mann,R.S. and Honig,B. (2009) The role of DNA shape in protein-DNA recognition. Nature, 461, 1248–1253.

10. Chang,Y.P., Xu,M., Machado,A.C., Yu,X.J., Rohs,R. and Chen,X.S. (2013) Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Reports, 3, 1117–1127.

11. Werner,M.H., Gronenborn,A.M. and Clore,G.M. (1996) Intercalation, DNA kinking, and the control of transcription. Science, 271, 778–784.

12. Kim,J.L., Nikolov,D.B. and Burley,S.K. (1993) Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature, 365, 520–527.

13. Rohs,R., Sklenar,H. and Shakked,Z. (2005) Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure, 13, 1499–1509.

14. Pabo,C.O. and Nekludova,L. (2000) Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J. Mol. Biol., 301, 597–624.

15. Suzuki,M., Gerstein,M. and Yagi,N. (1994) Stereochemical basis of DNA recognition by Zn fingers. Nucleic Acids Res., 22, 3397–3405.

16. Kortemme,T., Morozov,A.V. and Baker,D. (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J. Mol. Biol., 326, 1239–1259.

17. Siggers,T.W. and Honig,B. (2007) Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res., 35, 1085–1097.

18. Siggers,T.W., Silkov,A. and Honig,B. (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J. Mol. Biol., 345, 1027–1045.

19. Chen,F.E. and Ghosh,G. (1999) Regulation of DNA binding by Rel/NF-kappaB transcription factors: structural views. Oncogene, 18, 6845–6852.

20. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

21. Chen,Y.Q., Sengchanthalangsy,L.L., Hackett,A. and Ghosh,G. (2000) NF-kappaB p65 (RelA) homodimer uses distinct mechanisms to recognize DNA targets. Structure, 8, 419–428.

22. Yanover,C. and Bradley,P. (2011) Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res., 39, 4564–4576.

23. Berger,M.F., Badis,G., Gehrke,A.R., Talukder,S., Philippakis,A.A., Pena-Castillo,L., Alleyne,T.M., Mnaimneh,S., Botvinnik,O.B., Chan,E.T. et al. (2008) Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 133, 1266–1276.

24. Badis,G., Berger,M.F., Philippakis,A.A., Talukder,S., Gehrke,A.R., Jaeger,S.A., Chan,E.T., Metzler,G., Vedenko,A., Chen,X. et al. (2009) Diversity and complexity in DNA recognition by transcription factors. Science, 324, 1720–1723.

25. Ramirez,C.L., Foley,J.E., Wright,D.A., Muller-Lerch,F., Rahman,S.H., Cornu,T.I., Winfrey,R.J., Sander,J.D., Fu,F., Townsend,J.A. et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat. Methods, 5, 374–375.

26. Wei,G.H., Badis,G., Berger,M.F., Kivioja,T., Palin,K., Enge,M., Bonke,M., Jolma,A., Varjosalo,M., Gehrke,A.R. et al. (2010) Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J., 29, 2147–2160.

27. Meng,X., Smith,R.M., Giesecke,A.V., Joung,J.K. and Wolfe,S.A. (2006) Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques, 40, 179–184.

28. Noyes,M.B., Meng,X., Wakabayashi,A., Sinha,S., Brodsky,M.H. and Wolfe,S.A. (2008) A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res., 36, 2547–2560.

29. Berger,M.F., Philippakis,A.A., Qureshi,A.M., He,F.S., Estep,P.W. 3rd and Bulyk,M.L. (2006) Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol., 24, 1429–1435.

30. Bulyk,M.L., Gentalen,E., Lockhart,D.J. and Church,G.M. (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat. Biotechnol., 17, 573–577.

31. Mukherjee,S., Berger,M.F., Jona,G., Wang,X.S., Muzzey,D., Snyder,M., Young,R.A. and Bulyk,M.L. (2004) Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet., 36, 1331–1339.

32. Linnell,J., Mott,R., Field,S., Kwiatkowski,D.P., Ragoussis,J. and Udalova,I.A. (2004) Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res., 32, e44.

33. Field,S., Udalova,I. and Ragoussis,J. (2007) Accuracy and reproducibility of protein-DNA microarray technology. Adv. Biochem. Eng. Biotechnol., 104, 87–110.

34. Bonham,A.J., Neumann,T., Tirrell,M. and Reich,N.O. (2009) Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res., 37, e94.

35. Maerkl,S.J. and Quake,S.R. (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science, 315, 233–237.

36. Zykovich,A., Korf,I. and Segal,D.J. (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res., 37, e151.

37. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

38. Zhao,Y., Granas,D. and Stormo,G.D. (2009) Inferring binding energies from selected binding sites. PLoS Comput. Biol., 5, e1000590.

39. Jolma,A., Kivioja,T., Toivonen,J., Cheng,L., Wei,G., Enge,M., Taipale,M., Vaquerizas,J.M., Yan,J., Sillanpaa,M.J. et al. (2010) Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res., 20, 861–873.

40. Slattery,M., Riley,T., Liu,P., Abe,N., Gomez-Alcala,P., Dror,I., Zhou,T., Rohs,R., Honig,B., Bussemaker,H.J. et al. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell, 147, 1270–1282.

41. Tantin,D., Gemberling,M., Callister,C. and Fairbrother,W.G. (2008) High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res., 18, 631–639.

42. Warren,C.L., Kratochvil,N.C., Hauschild,K.E., Foister,S., Brezinski,M.L., Dervan,P.B., Phillips,G.N. Jr and Ansari,A.Z. (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc. Natl Acad. Sci. USA, 103, 867–872.

43. Nutiu,R., Friedman,R.C., Luo,S., Khrebtukova,I., Silva,D., Li,R., Zhang,L., Schroth,G.P. and Burge,C.B. (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol., 29, 659–664.

44. Weirauch,M.T. and Hughes,T.R. Dramatic changes in transcription factor binding over evolutionary time. Genome Biol., 11, 122.

45. Gordan,R., Shen,N., Dror,I., Zhou,T., Horton,J., Rohs,R. and Bulyk,M.L. (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports, 3, 1093–1104.

46. Stormo,G.D. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.

47. Maniatis,T., Ptashne,M., Backman,K., Kleid,D., Flashman,S., Jeffrey,A. and Maurer,R. (1975) Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda. Cell, 5, 109–113.

48. Siggers,T., Duyzend,M.H., Reddy,J., Khan,S. and Bulyk,M.L. (2011) Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol. Syst. Biol., 7, 555.

49. Staden,R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res., 12, 505–519.

50. Workman,C.T., Yin,Y., Corcoran,D.L., Ideker,T., Stormo,G.D. and Benos,P.V. (2005) enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res., 33, W389–W392.

51. Man,T.K. and Stormo,G.D. (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res., 29, 2471–2478.

52. Bulyk,M.L., Johnson,P.L. and Church,G.M. (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30, 1255–1261.

53. Tomovic,A. and Oakeley,E.J. (2007) Position dependencies in transcription factor binding sites. Bioinformatics, 23, 933–941.

54. Zhou,Q. and Liu,J.S. (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics, 20, 909–916.

55. Agius,P., Arvey,A., Chang,W., Noble,W.S. and Leslie,C. (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol., 6, pii: e1000916.

56. Ben-Gal,I., Shani,A., Gohr,A., Grau,J., Arviv,S., Shmilovici,A., Posch,S. and Grosse,I. (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics, 21, 2657–2666.

57. Gershenzon,N.I., Stormo,G.D. and Ioshikhes,I.P. (2005) Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res., 33, 2290–2301.

58. Zhao,Y., Ruan,S., Pandey,M. and Stormo,G.D. (2012) Improved models for transcription factor binding site identification using nonindependent interactions. Genetics, 191, 781–790.

59. Weirauch,M.T., Cote,A., Norel,R., Annala,M., Zhao,Y., Riley,T.R., Saez-Rodriguez,J., Cokelaer,T., Vedenko,A., Talukder,S. et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol., 31, 126–134.

60. Mordelet,F., Horton,J., Hartemink,A.J., Engelhardt,B.E. and Gordan,R. (2013) Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics, 29, i117–i125.

61. Siddharthan,R. (2010) Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One, 5, e9722.

62. Mathelier,A. and Wasserman,W.W. (2013) The next generation of transcription factor binding site prediction. PLoS Comput. Biol., 9, e1003214.

63. Jiang,J. and Levine,M. (1993) Binding affinities and cooperative interactions with bHLH activators delimit threshold responses to the dorsal gradient morphogen. Cell, 72, 741–752.

64. White,M.A., Parker,D.S., Barolo,S. and Cohen,B.A. (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol. Syst. Biol., 8, 614.

65. Scardigli,R., Baumer,N., Gruss,P., Guillemot,F. and Le Roux,I. (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development, 130, 3269–3281.

66. Struhl,G., Struhl,K. and Macdonald,P.M. (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell, 57, 1259–1273.

67 RowanS SiggersT LachkeSA YueY BulykML andMaasRL (2010) Precise temporal control of the eye regulatorygene Pax6 via enhancer-binding site affinity Genes Dev 24980ndash985

68 GaudetJ and MangoSE (2002) Regulation of organogenesis bythe Caenorhabditis elegans FoxA protein PHA-4 Science 295821ndash825

69 JaegerSA ChanET BergerMF StottmannR HughesTRand BulykML (2010) Conservation and regulatory associationsof a wide affinity range of mouse transcription factor bindingsites Genomics 95 185ndash195

70 TanayA (2006) Extensive low-affinity transcriptional interactionsin the yeast genome Genome Res 16 962ndash972

71 SegalE Raveh-SadkaT SchroederM UnnerstallU andGaulU (2008) Predicting expression patterns from regulatorysequence in Drosophila segmentation Nature 451 535ndash540

72. Siggers,T., Chang,A.B., Teixeira,A., Wong,D., Williams,K.J., Ahmed,B., Ragoussis,J., Udalova,I.A., Smale,S.T. and Bulyk,M.L. (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat. Immunol., 13, 95–102.

73. Zhu,C., Byers,K.J., McCord,R.P., Shi,Z., Berger,M.F., Newburger,D.E., Saulrieta,K., Smith,Z., Shah,M.V., Radhakrishnan,M. et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res., 19, 556–566.

74. Gordan,R., Murphy,K.F., McCord,R.P., Zhu,C., Vedenko,A. and Bulyk,M.L. (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol., 12, R125.

75. Nakagawa,S., Gisselbrecht,S.S., Rogers,J.M., Hartl,D.L. and Bulyk,M.L. (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc. Natl Acad. Sci. USA, 110, 12349–12354.

76. Bolotin,E., Chellappa,K., Hwang-Verslues,W., Schnabl,J.M., Yang,C. and Sladek,F.M. (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics, 12, 560.

77. Chu,S.W., Noyes,M.B., Christensen,R.G., Pierce,B.G., Zhu,L.J., Weng,Z., Stormo,G.D. and Wolfe,S.A. (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res., 22, 1889–1898.

78. Noyes,M.B., Christensen,R.G., Wakabayashi,A., Stormo,G.D., Brodsky,M.H. and Wolfe,S.A. (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell, 133, 1277–1289.

79. Fang,B., Mane-Padros,D., Bolotin,E., Jiang,T. and Sladek,F.M. (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res., 40, 5343–5356.

80. Zhou,T., Yang,L., Lu,Y., Dror,I., Dantas Machado,A.C., Ghane,T., Di Felice,R. and Rohs,R. (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res., 41, W56–W62.

81. Badis,G., Chan,E.T., van Bakel,H., Pena-Castillo,L., Tillo,D., Tsui,K., Carlson,C.D., Gossett,A.J., Hasinoff,M.J., Warren,C.L. et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell, 32, 878–887.

82. Kim,J. and Struhl,K. (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res., 23, 2531–2537.

83. Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. and Bonovich,M. (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol., 22, 6321–6335.

84. Kuo,D., Licon,K., Bandyopadhyay,S., Chuang,R., Luo,C., Catalana,J., Ravasi,T., Tan,K. and Ideker,T. (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res., 20, 1672–1678.

85. Khorasanizadeh,S. and Rastinejad,F. (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem. Sci., 26, 384–390.

86. Kurokawa,R., DiRenzo,J., Boehm,M., Sugarman,J., Gloss,B., Rosenfeld,M.G., Heyman,R.A. and Glass,C.K. (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature, 371, 528–531.

87. Luscombe,N.M., Austin,S.E., Berman,H.M. and Thornton,J.M. (2000) An overview of the structures of protein-DNA complexes. Genome Biol., 1, REVIEWS001.

88. Perkins,A.S., Fishel,R., Jenkins,N.A. and Copeland,N.G. (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol. Cell. Biol., 11, 2665–2674.

89. Delwel,R., Funabiki,T., Kreider,B.L., Morishita,K. and Ihle,J.N. (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol. Cell. Biol., 13, 4291–4300.

Nucleic Acids Research, 2013. Downloaded from http://nar.oxfordjournals.org/ by guest on April 20, 2016.

90. Funabiki,T., Kreider,B.L. and Ihle,J.N. (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene, 9, 1575–1581.

91. Klemm,J.D., Rould,M.A., Aurora,R., Herr,W. and Pabo,C.O. (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.

92. Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., Van Leeuwen,H.C., Strating,M.J. and van der Vliet,P.C. (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J., 11, 4993–5003.

93. Kemler,I., Schreiber,E., Muller,M.M., Matthias,P. and Schaffner,W. (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J., 8, 2001–2008.

94. Kersten,S., Gronemeyer,H. and Noy,N. (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J. Biol. Chem., 272, 12771–12777.

95. Jolma,A., Yan,J., Whitington,T., Toivonen,J., Nitta,K.R., Rastas,P., Morgunova,E., Enge,M., Taipale,M., Wei,G. et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327–339.

96. Kim,J.B., Spotts,G.D., Halvorsen,Y.D., Shih,H.M., Ellenberger,T., Towle,H.C. and Spiegelman,B.M. (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol. Cell. Biol., 15, 2582–2588.

97. Parraga,A., Bellsolell,L., Ferre-D'Amare,A.R. and Burley,S.K. (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. Structure, 6, 661–672.

98. Fordyce,P.M., Pincus,D., Kimmig,P., Nelson,C.S., El-Samad,H., Walter,P. and DeRisi,J.L. (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc. Natl Acad. Sci. USA, 109, E3084–E3093.

99. Pique-Regi,R., Degner,J.F., Pai,A.A., Gaffney,D.J., Gilad,Y. and Pritchard,J.K. (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res., 21, 447–455.

100. Bulyk,M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

101. Levine,M. and Tjian,R. (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson,A.D. (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr. Opin. Genet. Dev., 5, 552–558.

103. Wolberger,C. (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu. Rev. Biophys. Biomol. Struct., 28, 29–56.

104. Johnson,A.D. (1992) In: McKnight,S.L. and Yamamoto,K.R. (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim,S., Brostromer,E., Xing,D., Jin,J., Chong,S., Ge,H., Wang,S., Gu,C., Yang,L., Gao,Y.Q. et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 1267–1276.

107. Joshi,R., Passner,J.M., Rohs,R., Jain,R., Sosinsky,A., Crickmore,M.A., Jacob,V., Aggarwal,A.K., Honig,B. and Mann,R.S. (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo,H.D. and Mann,R.S. (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev., 13, 1704–1716.

109. Carey,M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne,M. and Gann,A. (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika,M., Williams,A.J., Chen,G., Collins,T. and Thanos,D. (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol. Cell, 1, 277–287.

112. Kim,T.K. and Maniatis,T. (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol. Cell, 1, 119–129.

113. Masternak,K., Muhlethaler-Mottet,A., Villard,J., Zufferey,M., Steimle,V. and Reith,W. (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev., 14, 1156–1166.

114. del Blanco,B., Garcia-Mariscal,A., Wiest,D.L. and Hernandez-Munain,C. (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J. Immunol., 188, 3278–3293.

115. Benoist,C. and Mathis,D. (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8, 681–715.

116. Hall,J.M., McDonnell,D.P. and Korach,K.S. (2002) Allosteric regulation of estrogen receptor structure, function, and coactivator recruitment by different estrogen response elements. Mol. Endocrinol., 16, 469–486.

117. Leung,T.H., Hoffmann,A. and Baltimore,D. (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing,S.H., Pufall,M.A., So,A.Y., Bates,D.L., Chen,L. and Yamamoto,K.R. (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal,N., Tomar,A. and Nagaraju,J. (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res., 39, 9574–9591.

120. Wang,V.Y., Huang,W., Asagiri,M., Spann,N., Hoffmann,A., Glass,C. and Ghosh,G. (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Rep., 2, 824–839.

121. Remenyi,A., Scholer,H.R. and Wilmanns,M. (2004) Combinatorial control of gene expression. Nat. Struct. Mol. Biol., 11, 812–815.

122. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

123. Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

124. Scully,K.M., Jacobson,E.M., Jepsen,K., Lunyak,V., Viadiu,H., Carriere,C., Rose,D.W., Hooshmand,F., Aggarwal,A.K. and Rosenfeld,M.G. (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi,A., Tomilin,A., Pohl,E., Lins,K., Philippsen,A., Reinbold,R., Scholer,H.R. and Wilmanns,M. (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569–580.

126. Chasman,D., Cepek,K., Sharp,P.A. and Pabo,C.O. (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev., 13, 2650–2657.

127. Ingraham,H.A., Chen,R.P., Mangalam,H.J., Elsholtz,H.P., Flynn,S.E., Lin,C.R., Simmons,D.M., Swanson,L. and Rosenfeld,M.G. (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham,H.A., Flynn,S.E., Voss,J.W., Albert,V.R., Kapiloff,M.S., Wilson,L. and Rosenfeld,M.G. (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb,R., Huang,C.C., Aufiero,D.J. and Herr,W. (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol. Cell. Biol., 21, 4700–4712.

130. Kuras,L., Cherest,H., Surdin-Kerjan,Y. and Thomas,D. (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J., 15, 2519–2529.

131. Blaiseau,P.L. and Thomas,D. (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J., 17, 6327–6336.

132. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-KappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

133. Ferraris,L., Stewart,A.P., Kang,J., DeSimone,A.M., Gemberling,M., Tantin,D. and Fairbrother,W.G. (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res., 21, 1055–1064.

134. Bolotin,E., Liao,H., Ta,T.C., Yang,C., Hwang-Verslues,W., Evans,J.R., Jiang,T. and Sladek,F.M. (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.


interactions with the DNA bases. In contrast, ‘indirect readout’ (also known as ‘shape readout’) describes a different mechanism of recognition in which the sequence-dependent deformability or structural differences between DNA molecules contribute to their discrimination (8). A recent review categorized different types of shape readout and distinguished between ‘local’ shape readout, in which deviations from B-form DNA are localized along the sequence, and ‘global’ shape readout, in which much of the DNA molecule is deformed or bent (2). In local shape readout, sequence-dependent kinks in the DNA molecule or localized deviations in DNA groove width can lead to preferential binding by the cognate protein. Examples of widely used types of local shape readout are the preferential binding of arginine and histidine residues into narrow DNA minor grooves (9,10) and binding to DNA kinks observed at YpR steps, such as seen for TATA-box binding protein (9,11,12). In global shape readout, larger deformations are involved. For example, a mechanism proposed for DNA binding of the human papillomavirus E2 protein is that DNA sequences in the unbound form adopt global conformations similar to that found in the bound protein–DNA complex (13). A particular DNA sequence that was already ‘pre-bent’ would require less energy to deform and hence be preferentially bound by a protein (2,13). What complicates prediction of the effects of indirect readout on binding specificity is that DNA deformation is a structural property that involves multiple distributed interactions (residue–base and base–base) and is sensitive to the conformation adopted in the bound complex. Even local shape readout (e.g. the local narrowing of the DNA minor groove) involves integrating the individual structural propensities over multiple bases (2). Therefore, a recognition code for many proteins will need to include or account for the detailed structure of the bound protein–DNA complex.

Docking geometry and spatial relationships

Energetically favorable residue–base interactions, particularly hydrogen bond interactions, depend on the relative spatial orientation of the protein residue and the DNA base (1,14–16). Consequently, the docking geometry of a protein–DNA complex, defined by the orientation of the protein backbone relative to the DNA, is critical to establishing favorable amino acid–base interactions (14,15,17). Comparisons of the docking geometry for different protein–DNA complexes have revealed complications to a simple recognition code. First, the docking geometry of different protein folds can differ dramatically; therefore, favorable interactions made by different residue types will depend strongly on the protein fold (i.e. not all arginines will be able to make optimal contacts with a guanine). Second, even structurally homologous proteins can dock on the DNA in different orientations (14,15,17,18). Therefore, the distributed protein–DNA interactions that contribute to the overall docking geometry, both base-specific and non-specific (i.e. with the DNA backbone), can all potentially affect the protein–DNA interactions. In other words, effects are non-local, and residues distributed throughout the protein will affect DNA-binding specificity.

Studies have also shown that the docking geometry of the same protein can vary when bound to different DNA sequences (14,18,19). In an illustrative case, crystal structures of the nuclear factor kappa B (NF-κB)-family RelA homodimer in complex with different DNA sequences exhibited markedly different orientations for the dimer subunits. In the structure of the RelA homodimer bound to the pseudo-symmetric sequence 5′-GGAA(A)TTTC-3′, one subunit makes a canonical set of hydrogen bond interactions with the 5′-GGAA half-site sequence. In contrast, the other subunit undergoes a large rotation and translation and makes no base-specific contacts with the apposing 5′-GAAA half-site, yet the protein still binds well to DNA (20). This contrasts starkly with the structure of RelA bound to the more symmetric 5′-GGAA(T)TTCC-3′ site, where the docking and DNA-base contacts are symmetric for the two subunits (21). Flexibility of the protein docking geometry in response to different DNA sequences presents a major difficulty in predicting the interactions and subsequent DNA-binding specificity of a protein (17,22).

The complexities described here, which result from flexible protein–DNA interactions, indirect DNA readout and protein docking geometry, make it difficult to establish simple recognition rules that can provide a comprehensive description of protein–DNA binding. Structural and biochemical studies have now identified the major specificity-determining residues for many different protein structural families and subfamilies, providing a detailed picture of the basic mechanisms of DNA recognition used by different proteins (23). However, despite these structural and mechanistic descriptions, simple recognition codes have not emerged that can accurately characterize the binding affinities of a protein to the vast number of potential DNA-binding site sequences. This shortfall has spurred the development of more complex models (23–26).

In the last 10 years, newly developed high-throughput (HT) technologies have revolutionized our ability to characterize protein–DNA binding specificity. Such technologies include bacterial one-hybrid (B1H) (27,28), protein-binding microarrays (PBMs) (29–33), total internal reflectance fluorescence-PBM (34), mechanically induced trapping of molecular interactions (MITOMI) (35), Bind-n-seq (36), EMSA-seq (37), HT-SELEX/SELEX-seq (38–40), microarray evaluation of genomic aptamers by shift (MEGAshift) (41), cognate site identifier (CSI) (42) and HT sequencing fluorescent ligand interaction profiling (HiTS-FLIP) (43). The rich datasets provided by these new HT technologies are facilitating the development of improved models of protein–DNA recognition (44,45), while at the same time revealing new complications for models of binding. Here, we will briefly review the most widely used models and outline features of protein–DNA binding that complicate their development and application.

COMPLEXITIES IN PROTEIN–DNA RECOGNITION

Consensus sequences, in which base preferences are represented using letters (e.g. A = Ade, Y = Cyt or Thy), are widely used to represent protein–DNA binding specificity. These models are appropriate for high-specificity DNA-interacting proteins such as restriction enzymes. For example, the enzyme HincII will cut DNA sites that match the consensus GTYRAC (R = Ade or Gua), but the affinity for all other DNA sites is orders of magnitude lower (46). However, transcription factor (TF)–DNA binding is much more degenerate and is often characterized by a gradual transition from high- to low-affinity sites (24,29,35,43,46–48). To account for the sequence degeneracy in TF binding, the concept of position weight matrices (PWMs) was proposed and remains the most widely used representation of TF-binding specificity (46,49). A PWM is a matrix of scores (or weights), one for each possible base at each position along a binding site. PWM models can be learned from a variety of data types, from small collections of known binding sites to large datasets generated using HT technologies. PWMs also have the benefit of being easy to visualize as DNA logos, providing an intuitive feel for the TF-binding specificity (50).
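The contrast between the two representations can be made concrete with a minimal sketch. The consensus GTYRAC is taken from the HincII example above; everything else here is an illustrative assumption rather than a method from the cited studies: the function names, the toy aligned sites (loosely modelled on a TGACTCA-like core), the pseudocount and the uniform background are all hypothetical. A consensus gives a yes/no match, whereas a log-odds PWM gives a graded, additive score.

```python
import math
import re

# IUPAC degenerate codes used in this sketch, mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

def consensus_to_regex(consensus):
    """Compile a degenerate consensus such as GTYRAC into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in consensus))

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length aligned sites (hypothetical helper)."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return pwm

def score(pwm, seq):
    """Sum per-position weights; this is the position-independence assumption."""
    return sum(column[base] for column, base in zip(pwm, seq))

# Consensus matching is all-or-nothing.
hincii = consensus_to_regex("GTYRAC")
print(bool(hincii.fullmatch("GTCGAC")), bool(hincii.fullmatch("GTAAAC")))

# PWM scoring is graded. These aligned sites are invented for illustration only.
toy_sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCT"]
toy_pwm = build_pwm(toy_sites)
for candidate in ("TGACTCA", "TGAGTCA", "AAAAAAA"):
    print(candidate, round(score(toy_pwm, candidate), 2))
```

In this toy setting, a site with one less-favoured base still receives an intermediate score rather than being rejected outright, which is the behaviour that makes PWMs better suited than consensus strings to degenerate TF binding.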

One drawback of the PWM formalism is that it makes the implicit assumption that individual base pairs within a binding site contribute independently to the protein–DNA binding affinity. It has been shown that this assumption does not always hold (51–53), and more complex models of protein–DNA specificity have been developed to account for position dependencies within protein–DNA binding sites (52,54–56). Some studies have focused on extending traditional PWM models into ‘higher-order’ models by including contributions from dinucleotides and trinucleotides (45,57–62). A drawback of including non-independent contributions is that the number of model parameters increases significantly, which makes the models harder to learn and prone to overfitting. Thus, selecting the relevant dinucleotides and trinucleotides is critical for building higher-order models that take into account dependencies between positions in TF-binding sites (58,60). We note that other approaches have also been proposed to model DNA-binding specificity; however, a full survey of these various approaches is outside the scope of this review. Beyond inherent sequence degeneracy and positional dependencies, several other aspects of protein–DNA binding complicate the development and application of binding models and are reviewed below.
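The growth in parameters mentioned above can be illustrated with simple counting. The sketch below assumes one weight per k-mer per position (or per combination of positions); the function name `n_weights` and the 10-bp site length are hypothetical choices for illustration, and real higher-order models typically select only a subset of these features.

```python
from math import comb

def n_weights(site_length, k, adjacent_only=True):
    """Count weights for k-mer features in a site of the given length.

    adjacent_only=True counts contiguous k-mers; False counts every
    combination of k positions. Both are illustrative conventions.
    """
    if adjacent_only:
        n_feature_positions = site_length - k + 1
    else:
        n_feature_positions = comb(site_length, k)
    return (4 ** k) * n_feature_positions

L = 10  # hypothetical binding-site length
print("mononucleotide (PWM):   ", n_weights(L, 1))
print("adjacent dinucleotides: ", n_weights(L, 2))
print("all-pairs dinucleotides:", n_weights(L, 2, adjacent_only=False))
print("adjacent trinucleotides:", n_weights(L, 3))
```

Under these assumptions a 10-bp site needs 40 mononucleotide weights but several hundred dinucleotide or trinucleotide weights, which is why feature selection matters when training data are limited.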

Low-affinity binding sites

Regardless of how degeneracy is represented in protein–DNA binding models, it is critically important to be able to capture the full breadth of binding sites used in vivo. Complicating this goal is the fact that TFs can specifically utilize low-affinity DNA-binding sites to regulate genes. TF binding to low-affinity DNA sites can provide a mechanism for interpreting both spatial (63–66) and temporal (67,68) TF gradients that often arise during development to control where and when genes are expressed. Analysis of genome-wide binding data has also provided evidence that low-affinity sites are under widespread evolutionary selection (69,70) and that their inclusion can greatly improve quantitative models of TF binding and gene regulation used for predicting segmentation patterns during early embryonic development in Drosophila (71). Utilization of sites selected to be lower affinity than an optimal sequence opens the door for functionally relevant sites to deviate strongly from the consensus sequence, and such sites may not be well represented by a particular binding model. For example, a comprehensive analysis of DNA binding by NF-κB dimers identified numerous lower affinity, non-traditional sites that differ significantly from the consensus sites and are not captured by the widely used PWMs (37,72). Disagreement with a PWM model may be due to (i) a protein having multiple binding modes, which will require multiple PWMs (discussed more below), or (ii) poor or biased parameterization of the PWM model. PWMs can capture low-affinity binding sites but must be explicitly parameterized using low-affinity binding data (59). In either case, by focusing only on the highest affinity sites, protein–DNA binding models are likely to miss functionally important interactions that occur through lower affinity sites; however, making binding models too flexible or encompassing without proper parameterization runs the risk of increasing the rate of false-positive predictions.

TF-specific preferences

Another complexity highlighted by recent HT protein–DNA binding studies is that closely related TFs often exhibit both common and TF-specific binding preferences (23,24,35,37,40,72–79). In a study of 104 mouse TFs, it was found that even proteins with a high degree of similarity in their DNA-binding domains (DBDs) (as high as 67% amino acid sequence identity) can exhibit distinct DNA-binding profiles (24). In many cases, the highest affinity site is identical for several members of a DBD family, but individual proteins within the family have different preferences for lower affinity sites. For example, mouse TFs Irf4 and Irf5 share a strong preference for sequences containing the 5′-CGAAAC-3′ site but prefer, albeit with lower affinity, 5′-TGAAAG-3′ or 5′-CGAGAC-3′, respectively. Homeodomain proteins have been extensively studied using HT approaches (23,77,78), revealing rich and diverse sequence preferences for members even within protein subfamilies. For example, the Lhx subfamily binds with high affinity to sites containing the core 5′-TAATTA-3′, but different subfamily members have different preferences for medium- and low-affinity sites: Lhx2 prefers 5′-TAATGA-3′ and 5′-TAACGA-3′, whereas Lhx4 prefers 5′-TAACGA-3′ and 5′-TAATCT-3′. These TF-specific preferences cause difficulties for DNA-binding models, as the models must accurately characterize the common binding sites while at the same time capturing the sites specific to each TF.

Flanking DNA

Even in the case of TFs with well-defined core DNA-binding sites, complexities can still arise from the DNA sequence flanking the core, which can affect binding affinity. Two such examples have been highlighted in recent HT studies (43,45). The TF Gcn4 binds to the core motif 5′-TGACTCA-3′. Binding measurements of Gcn4 to thousands of sequences containing this core motif revealed a wide range of binding affinities, from no appreciable binding to high-affinity binding (43). Nucleotides immediately adjacent to the core 7-mer were important for binding affinity, but even the two nucleotides flanking the best 9-mer affected affinity by an order of magnitude. Another recent study examined the DNA-binding preferences of the paralogous yeast TFs Cbf1 and Tye7 to hundreds of genomic sequences containing the E-box motif 5′-CACGTG-3′ (45). The study revealed a strong dependence on flanking DNA, and computational analyses suggested that the sequence beyond the canonical E-box motif contributes to binding specificity by influencing the three-dimensional structure of the DNA-binding site. Importantly, the additional specificity coming from sequences flanking the core motif could not be described by simply extending the core (45), which is not surprising given that the influence of flanking DNA is likely to be exerted through DNA shape and not through specific DNA contacts. DNA shape is a function of the DNA sequence; however, this relationship is complex (80) and cannot be captured by simple PWM models. Thus, in order to account for the influence of flanking regions on the affinity of DNA-binding sites, future models of binding specificity will likely use DNA shape characteristics explicitly.

Multiple modes of DNA binding

Complicating a simple model of DNA binding for many proteins is that they can bind DNA using two or more distinct modes (Figure 1). These different modes of interaction can lead to fundamentally different DNA-binding preferences (i.e. motifs), complicating simple binding models that do not explicitly account for these multiple modes. HT studies are increasingly revealing previously unknown diversity in the DNA-binding preferences of numerous proteins, many of which may be the result of different binding modes (29,72,75,81). Here, we classify variable binding modes into four categories (Figure 1) and briefly review each category.

Variable spacing

Some proteins that bind to bipartite DNA motifs (i.e. motifs composed of two half-sites) can recognize distinct classes of motifs in which the half-sites are separated by different numbers of bases (Figure 1A). This phenomenon was first observed more than 20 years ago for basic leucine zipper (bZIP) proteins, which can bind overlapping or adjacent TGAC half-sites (82). Subsequent studies have classified the bZIP family into two subfamilies based on protein sequence: (i) Activator protein 1 (AP-1) proteins, which generally prefer overlapping half-sites but bind to adjacent half-sites with almost equal affinity, and (ii) Activating transcription factor (ATF)/cAMP response element-binding protein (CREB) proteins, which generally prefer adjacent half-sites and bind very poorly to overlapping half-sites (82,83). However, a recent PBM-based study found that at least one ATF/CREB protein in yeast (Yap3) binds with high affinity to both adjacent and overlapping half-sites (74). Therefore, although some of the residues that define bZIP half-site spacing have been identified (84), additional residues are also involved. Variable half-site spacing has also been well documented for the nuclear receptors (NRs). For example, both the peroxisome proliferator activated receptors (PPARs) and retinoic acid receptors (RARs) form heterodimers with the retinoid X receptor (RXR) and can bind to response elements composed of direct repeats of the 5′-AGGTCA-3′ half-site separated by different length spacers (85). PPAR/RXR dimers bind direct repeats spaced by 1 or 2 bases (i.e. DR1 = 5′-AGGTCANAGGTCA-3′, DR2 = 5′-AGGTCANNAGGTCA-3′), while RAR/RXR dimers bind to repeats spaced by 1 (DR1) or 5 (DR5) bases. The variable spacing has even been shown to affect the in vivo function of DNA-bound NR dimers (86). RAR/RXR bound to DR5 elements can activate transcription in response to ligand but will not activate transcription when bound to DR1 elements (86).
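The direct-repeat nomenclature above (DR1, DR2, DR5) lends itself to a small worked example. The following sketch scans a sequence for pairs of exact 5′-AGGTCA-3′ half-sites and reports the spacer length, then looks up which heterodimers the text above associates with that spacing. Exact string matching, the function names and the test sequences are simplifying assumptions for illustration; real response elements are degenerate and would need a scoring model rather than literal matching.

```python
import re

HALF_SITE = "AGGTCA"

# Spacer preferences as stated in the text above (85,86).
DIMER_SPACERS = {"PPAR/RXR": {1, 2}, "RAR/RXR": {1, 5}}

def find_direct_repeats(seq, max_spacer=5):
    """Return (start, spacer_length) for each pair of exact half-sites
    on the same strand separated by 0..max_spacer bases."""
    starts = [m.start() for m in re.finditer(f"(?={HALF_SITE})", seq)]
    hits = []
    for i in starts:
        for j in starts:
            spacer = j - (i + len(HALF_SITE))
            if spacer in range(max_spacer + 1):
                hits.append((i, spacer))
    return hits

def dimers_for_spacer(spacer):
    """List the dimers whose documented spacing preferences include this spacer."""
    return sorted(name for name, allowed in DIMER_SPACERS.items() if spacer in allowed)

# Hypothetical test sequences: a DR1 element and a DR5 element.
for seq in ("TTAGGTCAAAGGTCATT", "AGGTCACTGACAGGTCA"):
    for start, spacer in find_direct_repeats(seq):
        print(seq, f"DR{spacer} at {start}:", dimers_for_spacer(spacer))
```

Treating the spacer length as an explicit variable, rather than folding each spacing into a separate fixed-length motif, is one simple way to represent the variable-spacing binding mode.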

Multiple DBDs

DNA-binding proteins can contain multiple independent DBDs (3,87). Multiple DBDs can allow a protein to alternatively recognize different DNA elements using different DBDs. For example, the zinc finger (ZF) protein Evi1 contains 10 ZF domains separated into two autonomously functioning DBDs (an N-terminal seven-ZF DBD and a C-terminal three-ZF DBD) that recognize completely different DNA motifs (88–90). A still more complicated scenario is seen for the mouse TF Oct-1, which has two DBDs known as the POU (Pit-Oct-Unc) homeodomain (POUHD) and the POU-specific domain (POUS) (Figure 1B) (24,91,92). Oct-1 can bind to three distinct DNA motifs using different combinations of the two DBDs: the POUHD site is recognized by the POUHD domain, the POUS site is recognized by the POUS domain and the composite POU site is recognized using both domains.

Multi-meric binding

Selective multimerization provides another means to expand or diversify the DNA-binding specificity of a protein. Proteins such as Oct-1 (93) and RXR (94) have been reported to bind DNA as either monomers or homodimers. Recent large-scale studies using HT-SELEX assays (39,95) have revealed that the dimeric mode of binding might be more common than previously appreciated, and even proteins that are known to bind DNA primarily as monomers, such as Elk1 (Figure 1C), can also form dimers with specific orientation and spacing preferences. These dimeric sites are enriched within genomic regions bound in vivo, suggesting that the dimeric profiles are biologically relevant. Furthermore, the dimer orientation and spacing preferences can sometimes distinguish individual members of the same TF family (95). For example, T-box factors share a common monomeric binding specificity but show seven distinct dimeric spacing/orientation preferences.

Alternate structural conformations

The recognition of distinct DNA motifs is expected when a protein contains multiple DBDs or binds as a multi-mer; however, some proteins with only a single DBD can also bind to multiple, distinctly different DNA sites. One such example is mouse TF sterol regulatory element-binding factor 1 (SREBF1) (Figure 1D), the ortholog of human TF sterol regulatory element-binding protein 1 (SREBP) (59,96). SREBF/SREBP proteins belong to the basic helix–loop–helix (bHLH) family of TFs that typically recognize symmetric E-boxes (5′-CAnnTG-3′). Unlike most bHLH proteins, SREBF/SREBP can bind the asymmetric sterol regulatory element 5′-ATCACnCCAC-3′. A co-crystal structure of the DNA-binding domain of human SREBP-1a bound to DNA revealed an asymmetric DNA–protein interface, with one monomer binding the E-box half-site (5′-ATCAC-3′) using protein–DNA contacts typical of bHLH proteins and the second monomer recognizing the non-E-box half-site (5′-GTGGG-3′) using entirely different protein–DNA contacts (96,97).

Thus, in the case of SREBF/SREBP, residues in the DBD are responsible for the multiple modes of DNA binding. Regions outside the DNA-binding domain of the protein can also enable alternate binding modes. For example, a region N-terminal to the basic DBD of yeast TF Hac1 is required for dual recognition modes. Mutations in this region were shown to preferentially reduce binding to one of the modes, whereas mutation of an arginine in the basic domain showed that this residue is crucial for the other mode of binding (98). This indicates that the protein can bind DNA using two distinct conformations, and that individual residues (within and outside the DBD) play specific roles within each binding mode (98).

Figure 1. Multiple modes of DNA binding. Schematized are examples illustrating four mechanistic categories by which proteins recognize different DNA sites via distinct structural modes. (A) Variable spacing: Gcn4 dimers can bind to bipartite sites with half-sites separated by variable-length spacers (overlapping or adjacent half-sites) (82); motifs from (73,74). (B) Multiple DBDs: Oct-1 can bind to different DNA sites using different arrangements of its two DNA-binding domains (91,92); motifs from (24). (C) Multi-meric binding: Elk1 can bind either as a monomer or as a dimer (95). (D) Alternate structural conformations: SREBP can bind to different DNA sites by adopting alternate structural conformations (96,97); motifs from (44).

The ability of proteins to utilize different DNA-binding modes presents difficulties for, or even precludes, the construction of simple DNA-binding models for many proteins. The presence of distinct modes usually requires that multiple DNA-binding models are used to represent a protein's specificity. This can cause difficulties when the data available to construct the DNA-binding model contain a mixture of multiple types of binding motifs, such as in genome-wide chromatin immunoprecipitation (ChIP)-seq datasets. In some situations, the DNA-binding modes can potentially be separated and independently characterized. For example, one could independently characterize the binding of individual DBDs when a protein contains multiple DBDs. However, even these situations can lead to difficulties, as illustrated by the case of Oct-1, where the autonomous DBDs can function either independently or together, in which case examining them separately will cause one to miss the binding sites recognized when both DBDs are involved (Figure 1B).
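As an illustration of representing one protein by several DNA-binding models, the following minimal sketch scores a sequence against one PWM per binding mode and reports the best-scoring mode. It is not taken from any of the cited studies: the count matrices, the mode names (`mode_1`, `mode_2`), the pseudocount and the uniform background are all assumptions chosen for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log-odds PWM (list of dicts)."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((column[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm

def best_window_score(pwm, seq):
    """Highest PWM score over all forward-strand windows of seq."""
    width = len(pwm)
    if len(seq) < width:
        return float("-inf")
    return max(sum(pwm[i][seq[start + i]] for i in range(width))
               for start in range(len(seq) - width + 1))

def multi_mode_score(pwms, seq):
    """Score seq under every binding mode; return the best mode and all scores."""
    scores = {mode: best_window_score(pwm, seq) for mode, pwm in pwms.items()}
    best_mode = max(scores, key=scores.get)
    return best_mode, scores

def counts_from_consensus(consensus, n=20):
    """Toy count matrix: n observations of the consensus base at each position."""
    return [{b: (n if b == c else 0) for b in BASES} for c in consensus]

# Two invented binding modes for a single hypothetical protein.
pwms = {
    "mode_1": log_odds_pwm(counts_from_consensus("CACGTG")),
    "mode_2": log_odds_pwm(counts_from_consensus("TCACCC")),
}
best, scores = multi_mode_score(pwms, "GGATCACCCCAT")
print(best, scores)
```

Taking the maximum over modes is only one possible design choice; a model fitted to mixed data (for example ChIP-seq peaks) would also need to assign each bound site to a mode, which is where the difficulties described above arise.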

Multi-protein recognition codes

The DNA-binding specificity of a TF is a primary determinant of where it will bind in the genome (99,100). However, transcriptional regulation often involves the assembly of multi-protein complexes on DNA (101,102), and it has been shown that multi-protein complexes can exhibit novel DNA-binding specificities not predictable from measurements of the individual proteins (40,48). Therefore, in efforts aimed at understanding the biophysical determinants of specificity in gene regulation, it is of primary importance to also consider how multi-protein complexes can lead to new or enhanced specificities. Here, we review the varied mechanisms by which novel DNA-binding specificities can arise when proteins, both DNA-binding and non-DNA-binding, assemble together. We suggest that these mechanisms represent higher-order 'multi-protein' recognition codes: mechanisms by which genomic targeting of regulatory factors is encoded in multi-protein complexes.

Cooperative binding

Cooperative binding of TFs to DNA, usually via protein–protein interactions between adjacently bound TFs, stabilizes the proteins on the DNA and can enhance their individual contributions to transcription of a gene (3,103). Cooperative binding can also affect the DNA-binding specificity and lead to recognition of new binding sites (3) (Figure 2A). One way in which cooperative binding alters specificity is by extending the binding of a TF to include lower-affinity sites as a result of the enhanced overall affinity of the cooperative complex for DNA. A well-studied example is the binding of yeast TFs MATα2, MATa1 and Mcm1, which regulate mating-type genes (103,104). MATα2 binds DNA cooperatively with cofactors MATa1 or Mcm1 to repress different sets of genes. MATα2/MATa1 heterodimers bind to sites found in the promoter regions of haploid-specific genes and repress their expression, whereas MATα2/Mcm1 heterotetramers bind different sites found in mating-type a-specific gene promoters and repress gene expression. Both complexes repress gene expression via recruitment of the co-repressor proteins Tup1 and Ssn6 through interactions with MATα2. Therefore, MATα2 functions to recruit co-repressors and repress gene expression, but its DNA-binding specificity, and consequently its target genes, are determined by cooperative binding with cell-type-specific cofactors. In a recent study, a more subtle form of specificity alteration by cooperative binding was presented, in which DNA binding of one protein can stabilize or destabilize the DNA binding of another protein via the deformation of the DNA structure (105). Notably, the cooperative binding was not mediated by direct protein contacts, but by an allosteric mechanism in which DNA structural deformations propagate along the DNA helix. The cooperative allosteric effects were shown to operate even up to 16 bp away.
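The way cooperativity extends binding to lower-affinity sites can be sketched with a standard two-site equilibrium model. This model is an assumption introduced here for illustration and is not taken from the studies cited above: the four states (empty, A bound, B bound, both bound) have statistical weights 1, a, b and ωab, where a = [A]/Kd_A, b = [B]/Kd_B and ω is a cooperativity factor. All concentrations, dissociation constants and the value of ω below are invented.

```python
def occupancy_with_partner(conc_a, kd_a, conc_b, kd_b, omega):
    """Equilibrium occupancy of site A when a partner B binds an adjacent site.

    omega = 1 means independent binding; omega > 1 means positive cooperativity.
    """
    a = conc_a / kd_a          # statistical weight of A bound alone
    b = conc_b / kd_b          # statistical weight of B bound alone
    z = 1.0 + a + b + omega * a * b   # partition function over the four states
    return (a + omega * a * b) / z

# Hypothetical numbers: a weak site for A (Kd = 1000 nM) at 10 nM protein A,
# next to a strong site for partner B (Kd = 10 nM) at 100 nM protein B.
alone = occupancy_with_partner(10.0, 1000.0, 100.0, 10.0, omega=1.0)
coop = occupancy_with_partner(10.0, 1000.0, 100.0, 10.0, omega=100.0)
print(f"independent: {alone:.4f}  cooperative: {coop:.4f}")
```

Under these assumed values the weak site is almost never occupied when binding is independent, but becomes substantially occupied once the cooperative interaction is included, which is the sense in which a cofactor can bring lower-affinity sites into the set of functional targets.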

A second, more active mechanism has been described in which cooperative binding alters the residue–base contacts that a TF can make with the DNA, thereby altering its inherent binding specificity (3,40) (Figure 2A). One example is the binding of the TFs Ets-1 and Pax5, where Pax5 alters the DNA contacts made by Ets-1, making it more permissive to alternate DNA sequences (106). Bound alone to DNA, Ets-1 binds with high affinity to 5'-GGAA-3' and with lower affinity to 5'-GGAG-3'. A tyrosine residue (Y395) in the Ets-1 recognition helix interacts with the fourth base position of this site and mediates the preference for adenine over guanine. However, in complex with Pax5, the Y395 residue adopts an alternate conformation and no longer interacts with the DNA at this base position. This has the effect of making Ets-1 DNA binding permissive for both sequences, and therefore broadens the specificity of the Pax5/Ets-1 complex to include the suboptimal 5'-GGAG-3' Ets-1 half-site. Another example has been elucidated in a series of studies on Drosophila Hox proteins and their cofactor Extradenticle (Exd) (40,107). The Hox protein Sex combs reduced (Scr) binds with the cofactor protein Exd preferentially to a paralog-specific site (fkh250) found in the fork head (fkh) gene over a consensus site (fkh250con) recognized by other Hox/Exd complexes (107,108). The preferential binding of the Scr/Exd complex to the fkh250 site involves the stabilization of a flexible 'arm' region N-terminal to the Scr homeodomain (107). Two of the stabilized residues in the N-terminal arm region (Arg3 and His-12) insert into the fkh250 DNA minor groove, which is narrower than the same region in the fkh250con sequence and more electrostatically favorable for the Arg and His residues. Therefore, the enhanced specificity of the Scr/Exd complex for the fkh250 sequence is due to local DNA shape (i.e. minor groove width) readout by the Scr Arg and His residues when stabilized by the cooperatively bound Exd cofactor. This work highlights how the intrinsic specificity of a monomeric homeodomain (Scr) can be altered or refined through interaction with a protein cofactor (Exd), a characteristic called latent specificity. A recent comprehensive analysis of other Hox proteins extended these results and demonstrated latent specificities for all the Drosophila Hox proteins (40). An HT-SELEX/SELEX-seq assay was used to measure and compare the DNA-binding specificity of all eight Drosophila Hox proteins alone and in complex with Exd. As seen for Scr/Exd, binding with the Exd cofactor revealed latent specificity differences between the Hox proteins that were not observable when Hox proteins were examined as monomers.

Cooperative cofactor recruitment

The recruitment of non-DNA-binding cofactors (i.e. coactivators and corepressors) to promoters and enhancers by DNA-binding TFs is central to gene regulation (101). 'Cooperative' recruitment occurs when a cofactor can make simultaneous interactions with multiple DNA-bound TFs; the result is a synergistic enhancement in the cofactor recruitment (109,110). Cooperative cofactor recruitment integrates the contributions from multiple TFs to achieve maximal gene expression and is a powerful mechanism for establishing an integrative 'AND-type' regulatory logic in gene transcription (111–114). At the same time, cooperative recruitment enhances the binding specificity of the cofactor. The genomic location of the cofactor is determined not by the presence of a single TF, but by the less-frequent (i.e. more specific) coincident presence of multiple TFs (Figure 2B).

Figure 2. Multi-protein recognition codes. Schematized are examples illustrating four mechanistic categories by which targeting of proteins to distinct DNA sites involves the assembly of multi-protein complexes. (A) Cooperative binding: enhanced complex stability due to cooperativity allows binding to lower-affinity (weak) sites (103,104,106); inter-protein interactions alter or stabilize protein–DNA contacts (e.g. an N-terminal tail or an altered side-chain conformation), altering DNA-binding specificity (40,106,107). (B) Cooperative recruitment: cofactor recruitment requires multiple factors (rather than only one), allowing more specific cofactor targeting (109–114). (C) Allostery: allosteric control of cofactor recruitment limits cofactor recruitment to only a subset of the TF binding sites (116–121,124,125). (D) Cofactor-based targeting: enhanced binding of a multi-protein complex to specialized composite sites is mediated by interactions between a non-DNA-binding cofactor and an auxiliary motif (48,129).

A paradigm for cooperative cofactor recruitment is the multi-protein enhanceosome complex, in which multiple TFs cooperatively assemble on DNA and coordinately recruit a common cofactor. A well-studied example is the enhanceosome that binds to the interferon beta (IFNβ) gene promoter and cooperatively recruits the coactivator CREB-binding protein (CBP) (111,112). In response to viral infection, the TFs ATF-2, c-Jun, Irf3, Irf7 and NF-κB, facilitated by the architectural factor Hmga1, bind cooperatively to the IFNβ promoter. Multiple proteins in the DNA-bound complex contribute to the cooperative recruitment of CBP and operate synergistically to enhance transcription (111). While NF-κB, the ATF-2/c-Jun dimer and Irf factors can recruit CBP individually, studies have demonstrated that removal of the activation domains from IRF or NF-κB factors resulted in complete loss of CBP recruitment, highlighting the cooperative nature of the CBP cofactor recruitment (111). A similar situation occurs in the cooperative recruitment of class II major histocompatibility complex transactivator (CIITA) to a set of conserved transcriptional control elements found in the promoters of major histocompatibility complex class II (MHC-II) genes (113,115). The control sequences contain four DNA elements, termed the W, X, X2 and Y boxes, with a conserved sequence spacing and orientation across the different promoters. The TFs X-box binding protein (RFX), X2-binding protein (X2BP) and Nuclear factor Y (NF-Y) bind to the X, X2 and Y boxes, respectively, and form a cooperative nucleoprotein enhanceosome complex. This multi-protein complex recruits CIITA to the MHC-II promoters via multiple weak interactions mediated by different components of the enhanceosome complex (113). Removal of interactions from any of these component proteins leads to significant loss of CIITA recruitment.
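The 'AND-type' logic of cooperative recruitment, in which a cofactor is targeted only where several TFs coincide, can be sketched as a simple co-occurrence filter over binding positions. The following is a minimal sketch under stated assumptions: the TF names (`TF_A`, `TF_B`, `TF_C`), the coordinates and the 50-bp window are all invented, and requiring every TF to be present is a deliberate simplification of what is, in reality, a graded synergistic effect.

```python
from bisect import bisect_left

def has_site_within(sorted_positions, center, window):
    """True if any position lies in [center - window, center + window]."""
    i = bisect_left(sorted_positions, center - window)
    return i < len(sorted_positions) and sorted_positions[i] <= center + window

def and_type_targets(sites_by_tf, anchor_tf, window=50):
    """Anchor-TF sites at which every other TF also has a site within `window` bp."""
    others = {tf: sorted(pos) for tf, pos in sites_by_tf.items() if tf != anchor_tf}
    return [pos for pos in sorted(sites_by_tf[anchor_tf])
            if all(has_site_within(p, pos, window) for p in others.values())]

# Hypothetical binding-site coordinates for three TFs on one chromosome.
sites = {
    "TF_A": [120, 900, 2500, 4100],
    "TF_B": [150, 2480, 7000],
    "TF_C": [100, 3000, 4120],
}
targets = and_type_targets(sites, "TF_A")
print(targets)
```

In this toy example only one of the four `TF_A` sites has both other factors nearby, illustrating why the coincident presence of multiple TFs is rarer, and hence more specific, than the presence of any single TF.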

Allosteric effects in cofactor recruitment

DNA can act as a sequence-specific allosteric ligand that alters the function of a DNA-bound TF (116–121). A common mechanism by which this occurs is through DNA sequence-dependent effects on cofactor recruitment. A TF bound to one set of DNA sites will recruit a specific cofactor, whereas the same TF bound to a different set of sites will not recruit the cofactor. Therefore, allosteric control of cofactor recruitment functionally partitions the DNA binding sites of a TF and consequently refines the binding specificity of the recruited cofactor (Figure 2C).

Allosteric mechanisms have been reported for both the glucocorticoid (GR) and the estrogen (ER) nuclear receptors (116,118). The DNA sequence bound by ER can influence its transcriptional activity and recruitment of LXXLL-containing peptides and p160 cofactors (SRC-1, GRIP1, ACTR) (116). The DNA sequence bound by GR, called the GR response element (GRE), was shown to influence gene expression, but the effect did not correlate with GR-binding affinity, suggesting a mechanism that involves differential recruitment of cofactor proteins (118). Furthermore, knockdown of Brahma, a subunit of the SWI/SNF nucleosome remodeling complex, affected GR-dependent gene expression in a GRE sequence-dependent manner, suggesting that the composition of the DNA-bound protein complexes was allosterically regulated. Allosteric control of cofactor recruitment has also been described for different NF-κB family dimers (117,120). Single-base changes in NF-κB binding sites have been demonstrated to affect recruitment of the TF Irf3 by DNA-bound p65/Rel-containing dimers (117), as well as the recruitment of the cofactors histone deacetylase 3 (HDAC3) and Tip60 to a DNA-bound ternary complex containing p50 homodimers and Bcl3 (120).

The mechanism of allosteric control for GR and NF-κB appears to be mediated by structural differences between TFs bound to different DNA sites. Structural studies of GR bound to different GREs demonstrated that the structure of a 'lever arm' loop region connecting the DNA recognition helix and the ligand binding domain was highly dependent on the GRE sequence (118). Studies of NF-κB dimers bound to different κB sites have shown that DNA sequence differences can result in structural rearrangements (rotations and translations) of one dimer subunit (19,21,122). Protease protection assays and detailed binding studies have also suggested DNA sequence-dependent changes in protein conformation (72,123).

In contrast to the situation for GR and NF-κB, where allostery is mediated by structural differences between the DNA-bound complexes, allostery involving the POU factors operates via larger, more global differences in quaternary structure when bound to different DNA sites. POU factors, such as Oct-1 described above, have two DBDs connected by a flexible linker: a POU-specific domain (POUS) and a POU homeodomain (POUHD) (121). POU factors can homo- and heterodimerize on a diverse set of DNA-binding sites in which the relative orientation and spacing of the POUS and POUHD domains can vary and subsequently lead to differential cofactor recruitment (121,124). The POU factors Oct-1 and Oct-2 can bind as homodimers to two distinct response elements, PORE and MORE, found in the promoters of B-cell-specific genes (121). When bound to the PORE sequence, Oct dimers can recruit the transcriptional coactivator Oct-binding factor 1 (OBF-1); however, dimers bound to the MORE sequence do not recruit OBF-1. Crystal structures of the DNA-bound Oct-1 dimer have revealed that in the MORE-bound dimer the OBF-1 binding site on Oct-1 is blocked due to the relative orientation of the dimer subunits, but in the PORE-bound dimer the site is exposed, allowing OBF-1 recruitment (125,126). A similar situation arises with the Pit-1 POU factor, in which binding site differences lead to cell-type-specific expression patterns and differential recruitment of the N-CoR co-repressor complex (124,127,128). Studies revealed that an extra 2 bp between the POUS and POUHD half-sites found in the growth hormone (GH) gene promoter, compared with a similar-affinity site in the prolactin gene promoter, altered the quaternary structure of the DNA-bound Pit-1 and enabled the recruitment of the N-CoR co-repressor complex to the GH site (124).

Cofactor-based specificity

Targeting of non-DNA-binding cofactors to DNA is traditionally viewed as being mediated solely via protein–protein interactions with DNA-bound TFs. In other words, the cofactor is 'recruited' and interacts with DNA indirectly through the TFs. However, several studies have described a provocative alternative mechanism in which recruited cofactors interact directly with DNA when part of a larger multi-protein complex (48,129). Further, as part of this larger complex, the non-DNA-binding cofactors mediate sequence-specific interactions that preferentially stabilize the binding to composite DNA sites containing specific auxiliary motifs (Figure 2D). This represents yet another mechanism to enhance the DNA sequence specificity of multi-protein regulatory complexes.

The regulation of sulfur metabolism genes in yeast involves a multi-protein complex composed of a sequence-specific TF, Cbf1, and two non-DNA-binding cofactors, Met28 and Met4 (130,131). Cbf1 is a bHLH protein and binds as a homodimer to E-box sites with a consensus 5'-CACGTG-3' core (131). A PBM-based analysis revealed that the Met4/Met28/Cbf1 complex binds preferentially to composite DNA sites in which the Cbf1 E-box is flanked by an adjacent 5'-RYAAT-3' 'recruitment' motif (i.e. 5'-RYAATNNCACGTG-3') (48). High-affinity binding to the composite sites is highly cooperative and requires the full heteromeric complex. Sequence analysis, modeling and binding assays suggest that the recruitment motif is recognized by the non-DNA-binding Met28 subunit. A highly similar situation has also been described for the multi-protein Oct-1/HCF-1/VP16 complex that regulates expression of the herpes simplex virus immediate-early genes by binding to a composite 5'-TAATGARAT-3' sequence in the gene promoters (129). The complex is composed of the DNA-binding TF Oct-1 and two non-DNA-binding cofactors: the viral activator VP16 and the host cell factor 1 (HCF-1). Oct-1 binds to the 5'-TAAT-3' sequence, and a DNA-binding surface of VP16 mediates interactions with the 5'-GARAT-3' sequence. Similar to the situation in yeast, the VP16 cofactor does not interact with DNA well on its own, but when stabilized on DNA as part of a larger multi-protein complex it can adopt a configuration that mediates recognition of the 5'-GARAT-3' subsequence. Therefore, in both situations, multi-protein complexes preferentially bind to composite binding sites composed of a recruitment motif (5'-RYAAT-3' and 5'-GARAT-3') and an adjacent TF-binding motif (5'-CACGTG-3' and 5'-TAAT-3') that are recognized by a non-DNA-binding cofactor (Met28 and VP16) and a sequence-specific TF (Cbf1 and Oct-1), respectively.
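To make the notion of a composite site concrete, the following minimal sketch scans a sequence on both strands for the E-box core and for the composite 5'-RYAATNNCACGTG-3' motif described above, using the standard IUPAC codes R (A/G), Y (C/T) and N (any base). The test sequence is invented, exact pattern matching is a simplification of the quantitative PBM data in (48), and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes needed for the motifs in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular-expression pattern."""
    return "".join(IUPAC[ch] for ch in motif)

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(motif, seq):
    """Return sorted (start, strand) hits; start is 0-based on the forward strand."""
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile("(?=(%s))" % iupac_to_regex(motif))
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    n, w = len(seq), len(motif)
    hits += [(n - m.start() - w, "-") for m in pattern.finditer(rc)]
    return sorted(hits)

# Invented test sequence containing one composite site.
seq = "TTGCAATGGCACGTGAA"
ebox_hits = find_sites("CACGTG", seq)
composite_hits = find_sites("RYAATNNCACGTG", seq)
print(ebox_hits, composite_hits)
```

Because the E-box core is palindromic, it is reported on both strands at the same coordinate, whereas the composite site is oriented: the recruitment motif lies on only one side of the E-box.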

SUMMARY

Complexities in DNA binding operate at multiple levels and complicate efforts to construct simple models or recognition codes of DNA binding. Flexible protein–DNA interactions and non-local structural effects confound simple descriptions of DNA base preferences and require the use of binding models, such as PWMs, that can account for the resulting sequence degeneracy. The ability of proteins to bind DNA via multiple modes further complicates the situation and leads to the requirement of multiple binding models for many proteins. Fortunately, HT technologies are not only increasing the rate at which DNA-binding proteins are being characterized, but are also providing the comprehensive binding data needed to construct models that involve multiple DNA-binding modes (23,24,27,28,35,38–40,42,43,48,59,73,74,79,132–134). An added benefit of the comprehensive nature of certain HT datasets, such as k-mer or binding site level data, is that predictions of binding sites in the genome can be performed directly using the measured binding data (23,24,40,43,133).

Multi-protein complexes add tremendously to the DNA-binding diversity of proteins and provide a key mechanism to integrate signals in gene regulation. However, these higher-order multi-protein mechanisms cause difficulties in constructing models, primarily because DNA-binding models of proteins examined in isolation may not capture the binding sites utilized in vivo when cofactors are present. Here again, HT technologies are being used successfully to characterize binding of multi-protein complexes, revealing recognition codes mediated by cooperative binding (40,133) and cofactor-mediated targeting (48). The continued application of HT technologies to examine DNA binding of multi-protein complexes will undoubtedly provide increasingly refined binding models and new insights into gene targeting in vivo. Furthermore, just as the application of HT technologies has drawn our attention to the prevalence of TFs that exhibit subtle binding preferences or bind DNA via multiple modes, studies examining multi-protein complexes will likely identify widespread use of higher-order mechanisms such as allostery, cooperative recruitment and cofactor-mediated targeting. Integrating these datasets with whole-genome chromatin immunoprecipitation (ChIP) and expression datasets will lead to much more complete and sophisticated descriptions of specificity in gene regulation.
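The direct use of k-mer level data for binding-site prediction, noted above, can be sketched as a lookup-table scan that needs no fitted motif model. The score table below is invented (the values merely resemble enrichment-style scores), as are the threshold and the test sequence; keying the table by the lexicographically smaller of a k-mer and its reverse complement is one common convention and is assumed here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    """Strand-independent key: the smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))

def scan_with_kmer_scores(seq, scores, k, threshold):
    """Return (start, kmer, score) for every window whose measured score passes threshold."""
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        score = scores.get(canonical(window))  # k-mers absent from the table are skipped
        if score is not None and score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical measured scores for a few 6-mers, keyed by canonical k-mer.
kmer_scores = {"CACGTG": 0.49, "ATCACG": 0.41, "ACGTGA": 0.30}
hits = scan_with_kmer_scores("GGATCACGTGACC", kmer_scores, k=6, threshold=0.4)
print(hits)
```

Since every k-mer carries its own measured value, such a scan can retain multiple binding modes and subtle preferences that a single PWM would average away, at the cost of depending entirely on the coverage of the underlying dataset.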

FUNDING

NIH [K22AI093793 to T.S.]; PhRMA Foundation Research Starter Grant (to R.G.); Basil O'Connor Research Award 5-FY13-212 from March of Dimes Foundation (to R.G.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Seeman NC, Rosenberg JM and Rich A (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc Natl Acad Sci USA 73, 804–808.

2. Rohs R, Jin X, West SM, Joshi R, Honig B and Mann RS (2010) Origins of specificity in protein-DNA recognition. Annu Rev Biochem 79, 233–269.

3. Garvie CW and Wolberger C (2001) Recognition of specific DNA sequences. Mol Cell 8, 937–946.

4. Jayaram B and Jain T (2004) The role of water in protein-DNA recognition. Annu Rev Biophys Biomol Struct 33, 343–361.

5. Reddy CK, Das A and Jayaram B (2001) Do water molecules mediate protein-DNA recognition? J Mol Biol 314, 619–632.

6. Miller JC and Pabo CO (2001) Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J Mol Biol 313, 309–315.

7. Baldwin EP, Martin SS, Abel J, Gelato KA, Kim H, Schultz PG and Santoro SW (2003) A specificity switch in selected cre recombinase variants is mediated by macromolecular plasticity and water. Chem Biol 10, 1085–1094.

8. Otwinowski Z, Schevitz RW, Zhang RG, Lawson CL, Joachimiak A, Marmorstein RQ, Luisi BF and Sigler PB (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature 335, 321–329.

9. Rohs R, West SM, Sosinsky A, Liu P, Mann RS and Honig B (2009) The role of DNA shape in protein-DNA recognition. Nature 461, 1248–1253.

10. Chang YP, Xu M, Machado AC, Yu XJ, Rohs R and Chen XS (2013) Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Reports 3, 1117–1127.

11. Werner MH, Gronenborn AM and Clore GM (1996) Intercalation, DNA kinking, and the control of transcription. Science 271, 778–784.

12. Kim JL, Nikolov DB and Burley SK (1993) Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature 365, 520–527.

13. Rohs R, Sklenar H and Shakked Z (2005) Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure 13, 1499–1509.

14. Pabo CO and Nekludova L (2000) Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J Mol Biol 301, 597–624.

15. Suzuki M, Gerstein M and Yagi N (1994) Stereochemical basis of DNA recognition by Zn fingers. Nucleic Acids Res 22, 3397–3405.

16. Kortemme T, Morozov AV and Baker D (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J Mol Biol 326, 1239–1259.

17. Siggers TW and Honig B (2007) Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res 35, 1085–1097.

18. Siggers TW, Silkov A and Honig B (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J Mol Biol 345, 1027–1045.

19. Chen FE and Ghosh G (1999) Regulation of DNA binding by Rel/NF-kappaB transcription factors: structural views. Oncogene 18, 6845–6852.

20. Chen YQ, Ghosh S and Ghosh G (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat Struct Biol 5, 67–73.

21. Chen YQ, Sengchanthalangsy LL, Hackett A and Ghosh G (2000) NF-kappaB p65 (RelA) homodimer uses distinct mechanisms to recognize DNA targets. Structure 8, 419–428.

22. Yanover C and Bradley P (2011) Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res 39, 4564–4576.

23. Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET et al. (2008) Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 133, 1266–1276.

24. Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X et al. (2009) Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723.

25. Ramirez CL, Foley JE, Wright DA, Muller-Lerch F, Rahman SH, Cornu TI, Winfrey RJ, Sander JD, Fu F, Townsend JA et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat Methods 5, 374–375.

26. Wei GH, Badis G, Berger MF, Kivioja T, Palin K, Enge M, Bonke M, Jolma A, Varjosalo M, Gehrke AR et al. (2010) Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J 29, 2147–2160.

27. Meng X, Smith RM, Giesecke AV, Joung JK and Wolfe SA (2006) Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques 40, 179–184.

28. Noyes MB, Meng X, Wakabayashi A, Sinha S, Brodsky MH and Wolfe SA (2008) A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res 36, 2547–2560.

29. Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW 3rd and Bulyk ML (2006) Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol 24, 1429–1435.

30. Bulyk ML, Gentalen E, Lockhart DJ and Church GM (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat Biotechnol 17, 573–577.

31. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA and Bulyk ML (2004) Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat Genet 36, 1331–1339.

32. Linnell J, Mott R, Field S, Kwiatkowski DP, Ragoussis J and Udalova IA (2004) Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res 32, e44.

33. Field S, Udalova I and Ragoussis J (2007) Accuracy and reproducibility of protein-DNA microarray technology. Adv Biochem Eng Biotechnol 104, 87–110.

34. Bonham AJ, Neumann T, Tirrell M and Reich NO (2009) Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res 37, e94.

35. Maerkl SJ and Quake SR (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237.

36. Zykovich A, Korf I and Segal DJ (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res 37, e151.

37. Wong D, Teixeira A, Oikonomopoulos S, Humburg P, Lone IN, Saliba D, Siggers T, Bulyk M, Angelov D, Dimitrov S et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol 12, R70.

38. Zhao Y, Granas D and Stormo GD (2009) Inferring binding energies from selected binding sites. PLoS Comput Biol 5, e1000590.

39. Jolma A, Kivioja T, Toivonen J, Cheng L, Wei G, Enge M, Taipale M, Vaquerizas JM, Yan J, Sillanpaa MJ et al. (2010) Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res 20, 861–873.

40. Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ et al. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282.

41. Tantin D, Gemberling M, Callister C and Fairbrother WG (2008) High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res 18, 631–639.

42. Warren CL, Kratochvil NC, Hauschild KE, Foister S, Brezinski ML, Dervan PB, Phillips GN Jr and Ansari AZ (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc Natl Acad Sci USA 103, 867–872.

43. Nutiu R, Friedman RC, Luo S, Khrebtukova I, Silva D, Li R, Zhang L, Schroth GP and Burge CB (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat Biotechnol 29, 659–664.

44. Weirauch MT and Hughes TR Dramatic changes in transcription factor binding over evolutionary time. Genome Biol 11, 122.

45. Gordan R, Shen N, Dror I, Zhou T, Horton J, Rohs R and Bulyk ML (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports 3, 1093–1104.

46. Stormo GD (2000) DNA binding sites: representation and discovery. Bioinformatics 16, 16–23.

47. Maniatis T, Ptashne M, Backman K, Kleid D, Flashman S, Jeffrey A and Maurer R (1975) Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda. Cell 5, 109–113.

48. Siggers T, Duyzend MH, Reddy J, Khan S and Bulyk ML (2011) Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol Syst Biol 7, 555.

49 StadenR (1984) Computer methods to locate signals in nucleicacid sequences Nucleic Acids Res 12 505ndash519

50 WorkmanCT YinY CorcoranDL IdekerT StormoGDand BenosPV (2005) enoLOGOS a versatile web tool for energynormalized sequence logos Nucleic Acids Res 33 W389ndashW392

51 ManTK and StormoGD (2001) Non-independence of Mntrepressor-operator interaction determined by a new quantitativemultiple fluorescence relative affinity (QuMFRA) assay NucleicAcids Res 29 2471ndash2478

52 BulykML JohnsonPL and ChurchGM (2002) Nucleotidesof transcription factor binding sites exert interdependent effectson the binding affinities of transcription factors Nucleic AcidsRes 30 1255ndash1261

53 TomovicA and OakeleyEJ (2007) Position dependencies intranscription factor binding sites Bioinformatics 23 933ndash941

54 ZhouQ and LiuJS (2004) Modeling within-motif dependencefor transcription factor binding site predictions Bioinformatics20 909ndash916

55 AgiusP ArveyA ChangW NobleWS and LeslieC (2010)High resolution models of transcription factor-DNA affinitiesimprove in vitro and in vivo binding predictions PLoS ComputBiol 6 pii e1000916

56 Ben-GalI ShaniA GohrA GrauJ ArvivS ShmiloviciAPoschS and GrosseI (2005) Identification of transcription factorbinding sites with variable-order Bayesian networksBioinformatics 21 2657ndash2666

57 GershenzonNI StormoGD and IoshikhesIP (2005)Computational technique for improvement of the position-weightmatrices for the DNAprotein binding sites Nucleic Acids Res33 2290ndash2301

58 ZhaoY RuanS PandeyM and StormoGD (2012) Improvedmodels for transcription factor binding site identification usingnonindependent interactions Genetics 191 781ndash790

59 WeirauchMT CoteA NorelR AnnalaM ZhaoYRileyTR Saez-RodriguezJ CokelaerT VedenkoATalukderS et al (2013) Evaluation of methods for modelingtranscription factor sequence specificity Nat Biotechnol 31126ndash134

60 MordeletF HortonJ HarteminkAJ EngelhardtBE andGordanR (2013) Stability selection for regression-based modelsof transcription factor-DNA binding specificity Bioinformatics29 i117ndashi125

61 SiddharthanR (2010) Dinucleotide weight matrices for predictingtranscription factor binding sites generalizing the position weightmatrix PLoS One 5 e9722

62 MathelierA and WassermanWW (2013) The next generation oftranscription factor binding site prediction PLoS Comput Biol9 e1003214

63 JiangJ and LevineM (1993) Binding affinities and cooperativeinteractions with bHLH activators delimit threshold responses tothe dorsal gradient morphogen Cell 72 741ndash752

64. White,M.A., Parker,D.S., Barolo,S. and Cohen,B.A. (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol. Syst. Biol., 8, 614.

65. Scardigli,R., Baumer,N., Gruss,P., Guillemot,F. and Le Roux,I. (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development, 130, 3269–3281.

66. Struhl,G., Struhl,K. and Macdonald,P.M. (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell, 57, 1259–1273.

67. Rowan,S., Siggers,T., Lachke,S.A., Yue,Y., Bulyk,M.L. and Maas,R.L. (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev., 24, 980–985.

68. Gaudet,J. and Mango,S.E. (2002) Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science, 295, 821–825.

69. Jaeger,S.A., Chan,E.T., Berger,M.F., Stottmann,R., Hughes,T.R. and Bulyk,M.L. (2010) Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics, 95, 185–195.

70. Tanay,A. (2006) Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res., 16, 962–972.

71. Segal,E., Raveh-Sadka,T., Schroeder,M., Unnerstall,U. and Gaul,U. (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 451, 535–540.

72. Siggers,T., Chang,A.B., Teixeira,A., Wong,D., Williams,K.J., Ahmed,B., Ragoussis,J., Udalova,I.A., Smale,S.T. and Bulyk,M.L. (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat. Immunol., 13, 95–102.

73. Zhu,C., Byers,K.J., McCord,R.P., Shi,Z., Berger,M.F., Newburger,D.E., Saulrieta,K., Smith,Z., Shah,M.V., Radhakrishnan,M. et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res., 19, 556–566.

74. Gordan,R., Murphy,K.F., McCord,R.P., Zhu,C., Vedenko,A. and Bulyk,M.L. (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol., 12, R125.

75. Nakagawa,S., Gisselbrecht,S.S., Rogers,J.M., Hartl,D.L. and Bulyk,M.L. (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc. Natl Acad. Sci. USA, 110, 12349–12354.

76. Bolotin,E., Chellappa,K., Hwang-Verslues,W., Schnabl,J.M., Yang,C. and Sladek,F.M. (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics, 12, 560.

77. Chu,S.W., Noyes,M.B., Christensen,R.G., Pierce,B.G., Zhu,L.J., Weng,Z., Stormo,G.D. and Wolfe,S.A. (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res., 22, 1889–1898.

78. Noyes,M.B., Christensen,R.G., Wakabayashi,A., Stormo,G.D., Brodsky,M.H. and Wolfe,S.A. (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell, 133, 1277–1289.

79. Fang,B., Mane-Padros,D., Bolotin,E., Jiang,T. and Sladek,F.M. (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res., 40, 5343–5356.

80. Zhou,T., Yang,L., Lu,Y., Dror,I., Dantas Machado,A.C., Ghane,T., Di Felice,R. and Rohs,R. (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res., 41, W56–W62.

81. Badis,G., Chan,E.T., van Bakel,H., Pena-Castillo,L., Tillo,D., Tsui,K., Carlson,C.D., Gossett,A.J., Hasinoff,M.J., Warren,C.L. et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell, 32, 878–887.

82. Kim,J. and Struhl,K. (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res., 23, 2531–2537.

83. Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. and Bonovich,M. (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol., 22, 6321–6335.

84. Kuo,D., Licon,K., Bandyopadhyay,S., Chuang,R., Luo,C., Catalana,J., Ravasi,T., Tan,K. and Ideker,T. (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res., 20, 1672–1678.

85. Khorasanizadeh,S. and Rastinejad,F. (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem. Sci., 26, 384–390.

86. Kurokawa,R., DiRenzo,J., Boehm,M., Sugarman,J., Gloss,B., Rosenfeld,M.G., Heyman,R.A. and Glass,C.K. (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature, 371, 528–531.

87. Luscombe,N.M., Austin,S.E., Berman,H.M. and Thornton,J.M. (2000) An overview of the structures of protein-DNA complexes. Genome Biol., 1, REVIEWS001.

88. Perkins,A.S., Fishel,R., Jenkins,N.A. and Copeland,N.G. (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol. Cell. Biol., 11, 2665–2674.

89. Delwel,R., Funabiki,T., Kreider,B.L., Morishita,K. and Ihle,J.N. (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol. Cell. Biol., 13, 4291–4300.

90. Funabiki,T., Kreider,B.L. and Ihle,J.N. (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene, 9, 1575–1581.

91. Klemm,J.D., Rould,M.A., Aurora,R., Herr,W. and Pabo,C.O. (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.

92. Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., Van Leeuwen,H.C., Strating,M.J. and van der Vliet,P.C. (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J., 11, 4993–5003.

93. Kemler,I., Schreiber,E., Muller,M.M., Matthias,P. and Schaffner,W. (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J., 8, 2001–2008.

94. Kersten,S., Gronemeyer,H. and Noy,N. (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J. Biol. Chem., 272, 12771–12777.

95. Jolma,A., Yan,J., Whitington,T., Toivonen,J., Nitta,K.R., Rastas,P., Morgunova,E., Enge,M., Taipale,M., Wei,G. et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327–339.

96. Kim,J.B., Spotts,G.D., Halvorsen,Y.D., Shih,H.M., Ellenberger,T., Towle,H.C. and Spiegelman,B.M. (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol. Cell. Biol., 15, 2582–2588.

97. Parraga,A., Bellsolell,L., Ferre-D'Amare,A.R. and Burley,S.K. (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. Structure, 6, 661–672.

98. Fordyce,P.M., Pincus,D., Kimmig,P., Nelson,C.S., El-Samad,H., Walter,P. and DeRisi,J.L. (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc. Natl Acad. Sci. USA, 109, E3084–E3093.

99. Pique-Regi,R., Degner,J.F., Pai,A.A., Gaffney,D.J., Gilad,Y. and Pritchard,J.K. (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res., 21, 447–455.

100. Bulyk,M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

101. Levine,M. and Tjian,R. (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson,A.D. (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr. Opin. Genet. Dev., 5, 552–558.

103. Wolberger,C. (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu. Rev. Biophys. Biomol. Struct., 28, 29–56.

104. Johnson,A.D. (1992) In: McKnight,S.L. and Yamamoto,K.R. (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim,S., Brostromer,E., Xing,D., Jin,J., Chong,S., Ge,H., Wang,S., Gu,C., Yang,L., Gao,Y.Q. et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 1267–1276.

107. Joshi,R., Passner,J.M., Rohs,R., Jain,R., Sosinsky,A., Crickmore,M.A., Jacob,V., Aggarwal,A.K., Honig,B. and Mann,R.S. (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo,H.D. and Mann,R.S. (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev., 13, 1704–1716.

109. Carey,M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne,M. and Gann,A. (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika,M., Williams,A.J., Chen,G., Collins,T. and Thanos,D. (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol. Cell, 1, 277–287.

112. Kim,T.K. and Maniatis,T. (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol. Cell, 1, 119–129.

113. Masternak,K., Muhlethaler-Mottet,A., Villard,J., Zufferey,M., Steimle,V. and Reith,W. (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev., 14, 1156–1166.

114. del Blanco,B., Garcia-Mariscal,A., Wiest,D.L. and Hernandez-Munain,C. (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J. Immunol., 188, 3278–3293.

115. Benoist,C. and Mathis,D. (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8, 681–715.

116. Hall,J.M., McDonnell,D.P. and Korach,K.S. (2002) Allosteric regulation of estrogen receptor structure, function, and coactivator recruitment by different estrogen response elements. Mol. Endocrinol., 16, 469–486.

117. Leung,T.H., Hoffmann,A. and Baltimore,D. (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing,S.H., Pufall,M.A., So,A.Y., Bates,D.L., Chen,L. and Yamamoto,K.R. (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal,N., Tomar,A. and Nagaraju,J. (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res., 39, 9574–9591.

120. Wang,V.Y., Huang,W., Asagiri,M., Spann,N., Hoffmann,A., Glass,C. and Ghosh,G. (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports, 2, 824–839.

121. Remenyi,A., Scholer,H.R. and Wilmanns,M. (2004) Combinatorial control of gene expression. Nat. Struct. Mol. Biol., 11, 812–815.

122. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

123. Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

124. Scully,K.M., Jacobson,E.M., Jepsen,K., Lunyak,V., Viadiu,H., Carriere,C., Rose,D.W., Hooshmand,F., Aggarwal,A.K. and Rosenfeld,M.G. (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi,A., Tomilin,A., Pohl,E., Lins,K., Philippsen,A., Reinbold,R., Scholer,H.R. and Wilmanns,M. (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569–580.

126. Chasman,D., Cepek,K., Sharp,P.A. and Pabo,C.O. (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev., 13, 2650–2657.

127. Ingraham,H.A., Chen,R.P., Mangalam,H.J., Elsholtz,H.P., Flynn,S.E., Lin,C.R., Simmons,D.M., Swanson,L. and Rosenfeld,M.G. (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham,H.A., Flynn,S.E., Voss,J.W., Albert,V.R., Kapiloff,M.S., Wilson,L. and Rosenfeld,M.G. (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb,R., Huang,C.C., Aufiero,D.J. and Herr,W. (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol. Cell. Biol., 21, 4700–4712.

130. Kuras,L., Cherest,H., Surdin-Kerjan,Y. and Thomas,D. (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J., 15, 2519–2529.

131. Blaiseau,P.L. and Thomas,D. (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J., 17, 6327–6336.

132. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-KappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

133. Ferraris,L., Stewart,A.P., Kang,J., DeSimone,A.M., Gemberling,M., Tantin,D. and Fairbrother,W.G. (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res., 21, 1055–1064.

134. Bolotin,E., Liao,H., Ta,T.C., Yang,C., Hwang-Verslues,W., Evans,J.R., Jiang,T. and Sladek,F.M. (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.


widely used to represent protein–DNA binding specificity. These models are appropriate for high-specificity DNA-interacting proteins such as restriction enzymes. For example, the enzyme HincII will cut DNA sites that match the consensus GTYRAC (Y = Cyt or Thy; R = Ade or Gua), but its affinity for all other DNA sites is orders of magnitude lower (46). However, transcription factor (TF)–DNA binding is much more degenerate and is often characterized by a gradual transition from high- to low-affinity sites (24,29,35,43,46–48). To account for the sequence degeneracy in TF binding, the concept of position weight matrices (PWMs) was proposed and remains the most widely used representation of TF-binding specificity (46,49). A PWM is a matrix of scores (or weights), one for each possible base pair at each position along a binding site. PWM models can be learned from a variety of data types, from small collections of known binding sites to large datasets generated using HT technologies. PWMs also have the benefit of being easy to visualize as DNA logos, providing an intuitive feel for the TF-binding specificity (50).
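As a concrete illustration of the PWM idea, the following minimal Python sketch builds a log-odds matrix from a handful of aligned sites and scores candidate sequences additively. The site list, the pseudocount, the uniform background frequency and the function names are all illustrative assumptions and are not taken from any study cited here.

```python
import math

BASES = "ACGT"

# Hypothetical aligned binding sites (illustrative only, not data from the review).
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA", "TGAGTCA", "TTACTCA"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one {base: log2-odds weight} dict per position of the aligned sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite weight.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm


def score(pwm, seq):
    """Additive PWM score: each position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, seq))


pwm = build_pwm(SITES)
for candidate in ["TGACTCA", "TGACTAA", "GCCGTTG"]:
    print(candidate, round(score(pwm, candidate), 2))
```

Because the score is a sum over positions, sequences that deviate from the consensus at one position are penalized gradually rather than rejected outright, which is the property that lets a PWM represent a range of affinities.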

One drawback of the PWM formalism is that it makes the implicit assumption that individual base pairs within a binding site contribute independently to the protein–DNA binding affinity. It has been shown that this assumption does not always hold (51–53), and more complex models of protein–DNA specificity have been developed to account for position dependencies within protein–DNA binding sites (52,54–56). Some studies have focused on extending traditional PWM models into 'higher-order' models by including contributions from dinucleotides and trinucleotides (45,57–62). A drawback of including non-independent contributions is that the number of model parameters increases significantly, which makes the models harder to learn and prone to overfitting. Thus, selecting the relevant dinucleotides and trinucleotides is critical for building higher-order models that take into account dependencies between positions in TF-binding sites (58,60). We note that other approaches have also been proposed to model DNA-binding specificity; however, a full survey of these various approaches is outside the scope of this review. Beyond inherent sequence degeneracy and positional dependencies, several other aspects of protein–DNA binding complicate the development and application of binding models, and are reviewed below.
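The growth in parameter number can be made concrete with a short counting sketch. It assumes one weight per k-mer per set of positions; the function name and the choice of a 10-bp site are illustrative assumptions, and published higher-order models differ in which features they actually retain.

```python
from math import comb


def n_features(length, k, adjacent_only=True):
    """Number of k-mer weights for a binding site of the given length.

    adjacent_only=True counts k-mers at contiguous positions only;
    adjacent_only=False counts every choice of k positions.
    """
    if length < k:
        return 0
    n_position_sets = (length - k + 1) if adjacent_only else comb(length, k)
    return n_position_sets * 4 ** k


SITE_LENGTH = 10  # illustrative site length
for k, label in [(1, "mononucleotide"), (2, "dinucleotide"), (3, "trinucleotide")]:
    print(label,
          "adjacent:", n_features(SITE_LENGTH, k),
          "all position sets:", n_features(SITE_LENGTH, k, adjacent_only=False))
```

Even when only adjacent positions are considered, the number of weights grows several-fold with each increase in k, which is why feature selection matters when training data are limited.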

Low-affinity binding sites

Regardless of how degeneracy is represented in protein–DNA binding models, it is critically important to be able to capture the full breadth of binding sites used in vivo. Complicating this goal is the fact that TFs can specifically utilize low-affinity DNA-binding sites to regulate genes. TF binding to low-affinity DNA sites can provide a mechanism for interpreting both spatial (63–66) and temporal (67,68) TF gradients that often arise during development to control where and when genes are expressed. Analysis of genome-wide binding data has also provided evidence that low-affinity sites are under widespread evolutionary selection (69,70), and that their inclusion can greatly improve quantitative models of TF binding and gene regulation used for predicting segmentation patterns during early embryonic development in Drosophila (71). Utilization of sites selected to be lower affinity than an optimal sequence opens the door for functionally relevant sites that deviate strongly from the consensus sequence and that may not be well represented by a particular binding model. For example, a comprehensive analysis of DNA binding by NF-κB dimers identified numerous lower affinity, non-traditional sites that differ significantly from the consensus sites and are not captured by the widely used PWMs (37,72). Disagreement with a PWM model may be due to (i) a protein having multiple binding modes, which will require multiple PWMs (discussed more below), or (ii) poor or biased parameterization of the PWM model. PWMs can capture low-affinity binding sites, but must be explicitly parameterized using low-affinity binding data (59). In either case, by focusing only on the highest affinity sites, protein–DNA binding models are likely to miss functionally important interactions that occur through lower affinity sites; however, making binding models too flexible or encompassing, without proper parameterization, runs the risk of increasing the rate of false-positive predictions.

TF-specific preferences

Another complexity highlighted by recent HT protein–DNA binding studies is that closely related TFs often exhibit both common and TF-specific binding preferences (23,24,35,37,40,72–79). In a study of 104 mouse TFs, it was found that even proteins with a high degree of similarity in their DNA-binding domains (DBDs) (as high as 67% amino acid sequence identity) can exhibit distinct DNA-binding profiles (24). In many cases, the highest affinity site is identical for several members of a DBD family, but individual proteins within the family have different preferences for lower affinity sites. For example, mouse TFs Irf4 and Irf5 share a strong preference for sequences containing the 5′-CGAAAC-3′ site, but prefer, albeit with lower affinity, 5′-TGAAAG-3′ or 5′-CGAGAC-3′, respectively. Homeodomain proteins have been extensively studied using HT approaches (23,77,78), revealing rich and diverse sequence preferences for members even within protein subfamilies. For example, the Lhx subfamily binds with high affinity to sites containing the core 5′-TAATTA-3′, but different subfamily members have different preferences for medium- and low-affinity sites: Lhx2 prefers 5′-TAATGA-3′ and 5′-TAACGA-3′, whereas Lhx4 prefers 5′-TAACGA-3′ and 5′-TAATCT-3′. These TF-specific preferences cause difficulties for DNA-binding models, as the models must accurately characterize the common binding sites while at the same time capturing the sites specific to each TF.

Flanking DNA

Even in the case of TFs with well-defined core DNA-binding sites, complexities can still arise from the DNA sequence flanking the core, which can affect binding affinity. Two such examples have been highlighted in recent HT studies (45). The TF Gcn4 binds to the core motif 5′-TGACTCA-3′. Binding measurements of Gcn4 to thousands of sequences containing this core motif revealed a wide range of binding affinities, from no appreciable binding to high-affinity binding (43). Nucleotides immediately adjacent to the core 7-mer were important for binding affinity, but even the two nucleotides flanking the best 9-mer affected affinity by an order of magnitude. Another recent study examined the DNA-binding preferences of the paralogous yeast TFs Cbf1 and Tye7 to hundreds of genomic sequences containing the E-box motif 5′-CACGTG-3′ (45). The study revealed a strong dependence on flanking DNA, and computational analyses suggested that the sequence beyond the canonical E-box motif contributes to binding specificity by influencing the three-dimensional structure of the DNA-binding site. Importantly, the additional specificity coming from sequences flanking the core motif could not be described by simply extending the core motif (45), which is not surprising given that the influence of flanking DNA is likely to be exerted through DNA shape and not through specific DNA contacts. DNA shape is a function of the DNA sequence; however, this relationship is complex (80) and cannot be captured by simple PWM models. Thus, in order to account for the influence of flanking regions on the affinity of DNA-binding sites, future models of binding specificity will likely use DNA shape characteristics explicitly.

Multiple modes of DNA binding

Complicating a simple model of DNA binding for many proteins is that they can bind DNA using two or more distinct modes (Figure 1). These different modes of interaction can lead to fundamentally different DNA-binding preferences (i.e. motifs), complicating simple binding models that do not explicitly account for these multiple modes. HT studies are increasingly revealing unknown diversity in the DNA-binding preferences of numerous proteins, much of which may be the result of different binding modes (29,72,75,81). Here, we classify variable binding modes into four categories (Figure 1) and briefly review each category.

Variable spacing

Some proteins that bind to bipartite DNA motifs (i.e. motifs composed of two half-sites) can recognize distinct classes of motifs in which the half-sites are separated by different numbers of bases (Figure 1A). This phenomenon was first observed more than 20 years ago for basic leucine zipper (bZIP) proteins, which can bind overlapping or adjacent TGAC half-sites (82). Subsequent studies have classified the bZIP family into two subfamilies based on protein sequence: (i) Activator protein 1 (AP-1) proteins, which generally prefer overlapping half-sites but bind to adjacent half-sites with almost equal affinity; and (ii) Activating transcription factor (ATF)/cAMP response element-binding protein (CREB) proteins, which generally prefer adjacent half-sites and bind very poorly to overlapping half-sites (82,83). However, a recent PBM-based study found that at least one ATF/CREB protein in yeast (Yap3) binds with high affinity to both adjacent and overlapping half-sites (74). Therefore, although some of the residues that define bZIP half-site spacing have been identified (84), additional residues are also involved. Variable half-site spacing has also been well documented for the nuclear receptors (NRs). For example, both the peroxisome proliferator activated receptors (PPARs) and retinoic acid receptors (RARs) form heterodimers with the retinoid X receptor (RXR), and can bind to response elements composed of direct repeats of the 5′-AGGTCA-3′ half-site separated by different length spacers (85). PPAR/RXR dimers bind direct repeats spaced by 1 or 2 bases (i.e. DR1 = 5′-AGGTCANAGGTCA-3′; DR2 = 5′-AGGTCANNAGGTCA-3′), while RAR/RXR dimers bind to repeats spaced by 1 (DR1) or 5 (DR5) bases. The variable spacing has even been shown to affect the in vivo function of DNA-bound NR dimers (86): RAR/RXR bound to DR5 elements can activate transcription in response to ligand, but will not activate transcription when bound to DR1 elements (86).
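The direct-repeat notation used above (DR1, DR2, DR5) lends itself to a simple pattern search. The sketch below is a minimal illustration that looks only for exact copies of the 5′-AGGTCA-3′ half-site on the given strand; the function name and the test sequence are hypothetical, and a realistic scan would also need to tolerate degenerate half-sites and examine the reverse complement.

```python
import re

HALF_SITE = "AGGTCA"


def find_direct_repeats(seq, spacers=(1, 2, 5)):
    """Return sorted (start, spacer_length, matched_site) tuples for DRn elements."""
    seq = seq.upper()
    hits = []
    for n in spacers:
        # A lookahead is used so that overlapping occurrences are all reported.
        pattern = re.compile(f"(?=({HALF_SITE}[ACGT]{{{n}}}{HALF_SITE}))")
        for match in pattern.finditer(seq):
            hits.append((match.start(), n, match.group(1)))
    return sorted(hits)


# Hypothetical sequence containing one DR1 and one DR5 element.
SEQ = "TT" + "AGGTCA" + "A" + "AGGTCA" + "CCGG" + "AGGTCA" + "CTGAC" + "AGGTCA" + "TT"

for start, spacer, site in find_direct_repeats(SEQ):
    print(f"DR{spacer} at position {start}: {site}")
```

Keeping the spacer length as an explicit parameter mirrors the point made in the text: the same half-site pair corresponds to a different response element, and potentially a different regulatory outcome, depending on the spacing.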

Multiple DBDs

DNA-binding proteins can contain multiple independent DBDs (3,87). Multiple DBDs can allow a protein to alternatively recognize different DNA elements using different DBDs. For example, the zinc finger (ZF) protein Evi1 contains 10 ZF domains separated into two autonomously functioning DBDs (an N-terminal seven-ZF DBD and a C-terminal three-ZF DBD) that recognize completely different DNA motifs (88–90). A still more complicated scenario is seen for the mouse TF Oct-1, which has two DBDs, known as the POU (Pit-Oct-Unc) homeodomain (POUHD) and the POU-specific domain (POUS) (Figure 1B) (24,91,92). Oct-1 can bind to three distinct DNA motifs using different combinations of the two DBDs: the POUHD site is recognized by the POUHD domain, the POUS site is recognized by the POUS domain and the composite POU site is recognized using both domains.

Multi-meric binding

Selective multimerization provides another means to expand or diversify the DNA-binding specificity of a protein. Proteins such as Oct-1 (93) and RXR (94) have been reported to bind DNA as either monomers or homodimers. Recent large-scale studies using HT-SELEX assays (39,95) have revealed that the dimeric mode of binding might be more common than previously appreciated, and even proteins that are known to bind DNA primarily as monomers, such as Elk1 (Figure 1C), can also form dimers with specific orientation and spacing preferences. These dimeric sites are enriched within genomic regions bound in vivo, suggesting that the dimeric profiles are biologically relevant. Furthermore, the dimer orientation and spacing preferences can sometimes distinguish individual members of the same TF family (95). For example, T-box factors share a common monomeric binding specificity but show seven distinct dimeric spacing/orientation preferences.

Alternate structural conformations

The recognition of distinct DNA motifs is expected when a protein contains multiple DBDs or binds as a multi-mer; however, some proteins with only a single DBD can also bind to multiple, distinctly different DNA sites. One such example is mouse TF sterol regulatory element-binding factor 1 (SREBF1) (Figure 1D), the ortholog of human TF sterol regulatory element-binding protein 1 (SREBP) (59,96). SREBF/SREBP proteins belong to the basic helix–loop–helix (bHLH) family of TFs that typically recognize symmetric E-boxes (5′-CAnnTG-3′). Unlike most bHLH proteins, SREBF/SREBP can bind the asymmetric sterol regulatory element 5′-ATCACnCCAC-3′. A co-crystal structure of the DNA-binding domain of human SREBP-1a bound to DNA revealed an asymmetric DNA–protein interface, with one monomer binding the E-box half-site (5′-ATCAC-3′) using protein–DNA contacts typical of bHLH proteins, and the second monomer recognizing the non-E-box half-site (5′-GTGGG-3′) using entirely different protein–DNA contacts (96,97).

Thus, in the case of SREBF/SREBP, residues in the DBD are responsible for the multiple modes of DNA binding. Regions outside the DNA-binding domain of the protein can also enable alternate binding modes. For example, a region N-terminal to the basic DBD of yeast TF Hac1 is required for dual recognition modes. Mutations in this region were shown to preferentially reduce binding to one of the modes, whereas mutation of an arginine in the basic domain was crucial for the other mode of binding (98). This indicates that the protein can bind DNA using two distinct conformations, and individual residues (within and outside the DBD) play specific roles within each binding mode (98).

Figure 1. Multiple modes of DNA binding. Schematized are examples illustrating four mechanistic categories by which proteins recognize different DNA sites via distinct structural modes. (A) Variable spacing: Gcn4 dimers can bind to bipartite sites with half-sites separated by variable-length spacers (82) (overlapping half-sites and adjacent half-sites); motifs from (73,74). (B) Multiple DBDs: Oct-1 can bind to different DNA sites using different arrangements of its two DNA-binding domains (91,92) (POUHD site, POUS site and POU site); motifs from (24). (C) Multi-meric binding: Elk1 can bind either as a monomer or as a dimer (95) (monomer site and dimer site). (D) Alternate structural conformations: SREBP can bind to different DNA sites by adopting alternate structural conformations (96,97); motifs from (44).

The ability of proteins to utilize different DNA-binding modes presents difficulties for, or even precludes, the construction of simple DNA-binding models for many proteins. The presence of distinct modes usually requires that multiple DNA-binding models are used to represent a protein's specificity. This can cause difficulties when the data available to construct the DNA-binding model contain a mixture of multiple types of binding motifs, such as in genome-wide chromatin immunoprecipitation (ChIP)-seq datasets. In some situations, the DNA-binding modes can potentially be separated and independently characterized. For example, one could independently characterize the binding of individual DBDs when a protein contains multiple DBDs. However, even these situations can lead to difficulties, as illustrated by the case of Oct-1, where the autonomous DBDs can function either independently or together, in which case examining them separately will cause one to miss the binding sites recognized when both DBDs are involved (Figure 1B).
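One simple way to represent a protein with several binding modes is to keep one matrix per mode and report the best match for each. The following sketch illustrates this under strong simplifying assumptions: the matrices are toy match/mismatch matrices derived from consensus strings rather than fitted PWMs, the mode names and function names are hypothetical, and the two consensus strings are the E-box (with 'CG' chosen for the two unspecified central bases) and sterol regulatory element sequences quoted in this review.

```python
BASES = "ACGT"


def consensus_pwm(consensus, match=1.0, mismatch=-1.0):
    """Toy matrix from a consensus string; 'N' positions score 0 for every base."""
    pwm = []
    for symbol in consensus.upper():
        if symbol == "N":
            pwm.append({b: 0.0 for b in BASES})
        else:
            pwm.append({b: (match if b == symbol else mismatch) for b in BASES})
    return pwm


def window_score(pwm, window):
    """Additive score of one window against one matrix."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan_modes(modes, seq):
    """Return {mode_name: (best_score, offset)} for every binding-mode matrix."""
    results = {}
    for name, pwm in modes.items():
        width = len(pwm)
        best = None
        for i in range(len(seq) - width + 1):
            s = window_score(pwm, seq[i:i + width])
            if best is None or s > best[0]:
                best = (s, i)
        results[name] = best
    return results


# Hypothetical two-mode model: a symmetric E-box and the asymmetric SRE.
MODES = {"E-box": consensus_pwm("CACGTG"), "SRE": consensus_pwm("ATCACNCCAC")}

print(scan_modes(MODES, "GGATCACGCCACTT"))
print(scan_modes(MODES, "TTCACGTGAA"))
```

Scores are reported per mode rather than pooled because matrices of different widths are not directly comparable; how to combine modes into a single prediction is a modeling decision that this sketch deliberately leaves open.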

Multi-protein recognition codes

The DNA-binding specificity of a TF is a primary determinant of where it will bind in the genome (99,100). However, transcriptional regulation often involves the assembly of multi-protein complexes on DNA (101,102), and it has been shown that multi-protein complexes can exhibit novel DNA-binding specificities not predictable from measurements of the individual proteins (40,48). Therefore, in efforts aimed at understanding the biophysical determinants of specificity in gene regulation, it is of primary importance to also consider how multi-protein complexes can lead to new or enhanced specificities. Here, we will review the varied mechanisms by which novel DNA-binding specificities can arise when proteins, both DNA-binding and non-DNA-binding, assemble together. We suggest that these mechanisms represent higher-order 'multi-protein' recognition codes: mechanisms by which genomic targeting of regulatory factors is encoded in multi-protein complexes.

Cooperative binding

Cooperative binding of TFs to DNA, usually via protein–protein interactions between adjacently bound TFs, stabilizes the proteins on the DNA and can enhance their individual contributions to transcription of a gene (3,103). Cooperative binding can also affect the DNA-binding specificity and lead to recognition of new binding sites (3) (Figure 2A). One way in which cooperative binding alters specificity is by extending the binding of a TF to include lower affinity sites, as a result of the enhanced overall affinity of the cooperative complex for DNA. A well-studied example is the binding of yeast TFs MATα2, MATa1 and Mcm1, which regulate mating-type genes (103,104). MATα2 binds DNA cooperatively with cofactors MATa1 or Mcm1 to repress different sets of genes. MATα2/MATa1 heterodimers bind to sites found in the promoter regions of haploid-specific genes and repress their expression, whereas MATα2/Mcm1 heterotetramers bind different sites found in mating-type a-specific gene promoters and repress gene expression. Both complexes repress gene expression via recruitment of co-repressor proteins Tup1 and Ssn6 by interactions with MATα2. Therefore, MATα2 functions to recruit co-repressors and repress gene expression, but its DNA-binding specificity, and subsequently its target genes, are mediated by cooperative binding with cell-type-specific cofactors. In a recent study, a more subtle form of specificity alteration by cooperative binding was presented, in which DNA binding of one protein can stabilize or destabilize the DNA binding of another protein via the deformation of the DNA structure (105). Notably, the cooperative binding was not mediated by direct protein contacts, but by an allosteric mechanism where DNA structural deformations propagate along the DNA helix. The cooperative allosteric effects were shown to operate even between sites 16 bp apart.
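The way cooperativity can extend binding to lower affinity sites can be illustrated with a generic two-site equilibrium model. This is a standard textbook formulation and is not taken from the studies cited above; the function name, the parameter names, the cooperativity factor omega and all numerical values are illustrative assumptions.

```python
def occupancy_site_a(conc_a, kd_a, conc_b, kd_b, omega=1.0):
    """Equilibrium probability that site A is bound in a two-site model.

    conc_a, conc_b: free concentrations of proteins A and B (same units as Kd).
    kd_a, kd_b:     dissociation constants of A and B for their own sites.
    omega:          cooperativity factor (1 = independent binding,
                    >1 = the doubly bound state is stabilized).
    """
    qa = conc_a / kd_a  # statistical weight of "only A bound"
    qb = conc_b / kd_b  # statistical weight of "only B bound"
    both = omega * qa * qb  # statistical weight of "A and B bound"
    return (qa + both) / (1.0 + qa + qb + both)


# Hypothetical numbers: A sits on a weak site, B on a strong neighboring site.
independent = occupancy_site_a(conc_a=50, kd_a=1000, conc_b=50, kd_b=10, omega=1)
cooperative = occupancy_site_a(conc_a=50, kd_a=1000, conc_b=50, kd_b=10, omega=100)
print(f"occupancy of weak site without cooperativity: {independent:.3f}")
print(f"occupancy of weak site with cooperativity:    {cooperative:.3f}")
```

With omega set to 1 the expression reduces to the single-site occupancy of A, so any increase seen at larger omega is attributable to the partner protein alone; this is the sense in which a cofactor can make a weak site functionally usable.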

A second, more active mechanism has been described where cooperative binding alters the residue–base contacts that a TF can make with the DNA, thereby altering its inherent binding specificity (3,40) (Figure 2A). One example is the binding of the TFs Ets-1 and Pax5, where Pax5 alters the DNA contacts made by Ets-1, making it more permissive to alternate DNA sequences (106). Bound alone to DNA, Ets-1 binds with high affinity to 5′-GGAA-3′ and with lower affinity to 5′-GGAG-3′. A tyrosine residue (Y395) in the Ets-1 recognition helix interacts with the fourth base position (the final A or G of the half-site) and mediates the preference for adenine over guanine. However, in complex with Pax5, the Y395 residue adopts an alternate conformation and no longer interacts with the DNA at this base position. This makes Ets-1 DNA binding permissive for both sequences and therefore broadens the specificity of the Pax5/Ets-1 complex to include the suboptimal 5′-GGAG-3′ Ets-1 half-site.

Another example has been elucidated in a series of studies on Drosophila Hox proteins and their cofactor Extradenticle (Exd) (40,107). The Hox protein Sex combs reduced (Scr) binds with the cofactor protein Exd preferentially to a paralog-specific site (fkh250) found in the fork head (fkh) gene over a consensus site (fkh250con) recognized by other Hox/Exd complexes (107,108). The preferential binding of the Scr/Exd complex to the fkh250 site involves the stabilization of a flexible ‘arm’ region N-terminal to the Scr homeodomain (107). Two of the stabilized residues in the N-terminal arm region (Arg3 and His-12) insert into the fkh250 DNA minor groove, which is narrower than the same region in the fkh250con sequence and more electrostatically favorable for the Arg and His residues. Therefore, the enhanced specificity of the Scr/Exd complex for the fkh250 sequence is due to local DNA shape (i.e. minor groove width) readout by the Scr Arg and His residues when stabilized by the cooperatively bound Exd cofactor. This work highlights how the intrinsic specificity of a monomeric homeodomain (Scr) can be altered or refined through interaction with a protein cofactor (Exd), a characteristic called latent specificity. A recent comprehensive analysis of other Hox proteins extended these results and demonstrated latent specificities for all the Drosophila Hox proteins (40). An HT-SELEX/SELEX-seq assay was used to measure and compare the DNA-binding specificity of all eight Drosophila Hox proteins alone and in complex with Exd. As seen for Scr/Exd, binding with the Exd cofactor revealed latent specificity differences between the Hox proteins that were not observable when Hox proteins were examined as monomers.
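The way cooperative binding can rescue a weak site may be easier to see with a small equilibrium calculation. The sketch below uses a textbook two-site statistical-thermodynamic model, not a model taken from the studies cited above; the function name, the cooperativity factor `omega` and all numerical values are illustrative assumptions.

```python
def site_a_occupancy(conc_a, conc_b, k_a, k_b, omega=1.0):
    """Fraction of DNA molecules with site A occupied in a two-site model.

    conc_a, conc_b: free concentrations of proteins A and B (M)
    k_a, k_b: association constants of A and B for their own sites (1/M)
    omega: cooperativity factor (>1 stabilizing, 1 independent, <1 destabilizing)
    """
    w_a = k_a * conc_a            # statistical weight of the state with only A bound
    w_b = k_b * conc_b            # statistical weight of the state with only B bound
    w_ab = omega * w_a * w_b      # both bound, scaled by the cooperativity factor
    z = 1.0 + w_a + w_b + w_ab    # partition function over the four states
    return (w_a + w_ab) / z


# Hypothetical numbers: a weak site for A (k_a * conc_a = 0.1) next to a
# well-occupied site for a partner B (k_b * conc_b = 10).
alone = site_a_occupancy(conc_a=1e-8, conc_b=0.0, k_a=1e7, k_b=1e8)
assisted = site_a_occupancy(conc_a=1e-8, conc_b=1e-7, k_a=1e7, k_b=1e8, omega=100.0)
print(f"weak site alone: {alone:.2f}, with cooperative partner: {assisted:.2f}")
```

With `omega` set to 1 the partner has no effect on occupancy of site A, which corresponds to independent binding; values below 1 would represent the destabilizing case.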

Cooperative cofactor recruitment

The recruitment of non-DNA-binding cofactors (i.e. coactivators and corepressors) to promoters and enhancers by DNA-binding TFs is central to gene regulation (101). ‘Cooperative’ recruitment occurs when a cofactor can make simultaneous interactions with multiple DNA-bound TFs; the result is a synergistic enhancement in the cofactor recruitment (109,110). Cooperative cofactor recruitment integrates the contributions from multiple TFs to achieve maximal gene expression and is a powerful mechanism for establishing an integrative ‘AND-type’ regulatory logic in gene transcription (111–114). At the same time, cooperative recruitment enhances the binding specificity of the cofactor. The genomic location of the cofactor is determined not by the presence of a single TF, but by the less-frequent (i.e. more specific) coincident presence of multiple TFs (Figure 2B).

Figure 2. Multi-protein recognition codes. Schematized are examples illustrating four mechanistic categories by which targeting of proteins to distinct DNA sites involves the assembly of multi-protein complexes. (A) Cooperative binding. Enhanced complex stability due to cooperativity allows binding to lower-affinity (weak) sites (assisted binding) (103,104,106). Inter-protein interactions alter or stabilize protein–DNA contacts (e.g. an altered side-chain conformation or a stabilized N-terminal tail), altering DNA-binding specificity (40,106,107). (B) Cooperative recruitment. Cofactor recruitment requires multiple factors (rather than only one), allowing more specific cofactor targeting (109–114). (C) Allostery. Allosteric control of cofactor recruitment limits cofactor recruitment to only a subset of the TF binding sites (recruiting versus non-recruiting sites) (116–121,124,125). (D) Cofactor-based targeting. Enhanced binding of a multi-protein complex to specialized composite sites is mediated by interactions between a non-DNA-binding cofactor and an auxiliary motif (48,129).

A paradigm for cooperative cofactor recruitment is provided by multi-protein enhanceosome complexes, in which multiple TFs cooperatively assemble on DNA and coordinately recruit a common cofactor. A well-studied example is the enhanceosome that binds to the interferon beta (IFNβ) gene promoter and cooperatively recruits the coactivator CREB-binding protein (CBP) (111,112). In response to viral infection, the TFs ATF-2, c-Jun, Irf3, Irf7 and NF-κB, facilitated by the architectural factor Hmga1, bind cooperatively to the IFNβ promoter. Multiple proteins in the DNA-bound complex contribute to the cooperative recruitment of CBP and operate synergistically to enhance transcription (111). While NF-κB, the ATF-2/c-Jun dimer and Irf factors can recruit CBP individually, studies have demonstrated that removal of the activation domains from Irf or NF-κB factors resulted in complete loss of CBP recruitment, highlighting the cooperative nature of the CBP cofactor recruitment (111). A similar situation occurs in the cooperative recruitment of class II major histocompatibility complex transactivator (CIITA) to a set of conserved transcriptional control elements found in the promoters of major histocompatibility complex class II (MHC-II) genes (113,115). The control sequences contain four DNA elements, termed the W, X, X2 and Y boxes, with a conserved sequence spacing and orientation across the different promoters. The TFs X-box binding protein (RFX), X2-binding protein (X2BP) and Nuclear factor Y (NF-Y) bind to the X, X2 and Y boxes, respectively, and form a cooperative nucleoprotein enhanceosome complex. This multi-protein complex recruits CIITA to the MHC-II promoters via multiple weak interactions mediated by different components of the enhanceosome complex (113). Removal of interactions from any of these component proteins leads to significant loss of CIITA recruitment.
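The synergy that arises when a cofactor contacts several DNA-bound TFs at once can be illustrated with a deliberately simplified calculation. The sketch below assumes that the free energies of the individual cofactor–TF contacts simply add (ignoring entropic costs and any coupling between contacts) and that cofactor binding follows a single-site isotherm; the function name, contact energies and concentration are hypothetical and not taken from the cited studies.

```python
import math

RT_KCAL_PER_MOL = 0.593  # approximate value of RT at 298 K


def recruitment_fraction(cofactor_conc, contact_energies):
    """Fraction of promoters with cofactor bound (1 M standard state).

    cofactor_conc: free cofactor concentration (M)
    contact_energies: free energies (kcal/mol, negative = favorable) of each
        simultaneous cofactor-TF contact; assumed here to be additive.
    """
    total_dg = sum(contact_energies)
    k_assoc = math.exp(-total_dg / RT_KCAL_PER_MOL)  # association constant (1/M)
    x = cofactor_conc * k_assoc
    return x / (1.0 + x)


# Hypothetical example: each TF alone offers one weak (-4 kcal/mol) contact.
one_tf = recruitment_fraction(1e-7, [-4.0])
three_tfs = recruitment_fraction(1e-7, [-4.0, -4.0, -4.0])
print(f"one TF: {one_tf:.5f}, three TFs together: {three_tfs:.3f}")
```

Under these assumptions a single weak contact recruits almost no cofactor, whereas three simultaneous contacts give near-complete recruitment, which is the ‘AND-type’ behavior described above.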

Allosteric effects in cofactor recruitment

DNA can act as a sequence-specific allosteric ligand that alters the function of a DNA-bound TF (116–121). This commonly occurs through DNA sequence-dependent effects on cofactor recruitment: a TF bound to one set of DNA sites will recruit a specific cofactor, whereas the same TF bound to a different set of sites will not recruit the cofactor. Therefore, allosteric control of cofactor recruitment functionally partitions the DNA binding sites of a TF and, consequently, refines the binding specificity of the recruited cofactor (Figure 2C).

Allosteric mechanisms have been reported for both the glucocorticoid (GR) and the estrogen (ER) nuclear receptors (116,118). The DNA sequence bound by ER can influence its transcriptional activity and recruitment of LXXLL-containing peptides and p160 cofactors (SRC-1, GRIP1, ACTR) (116). The DNA sequence bound by GR, called the GR response element (GRE), was shown to influence gene expression, but the effect did not correlate with GR-binding affinity, suggesting a mechanism that involves differential recruitment of cofactor proteins (118). Furthermore, knockdown of Brahma, a subunit of the SWI/SNF nucleosome remodeling complex, affected GR-dependent gene expression in a GRE sequence-dependent manner, suggesting that the composition of the DNA-bound protein complexes was allosterically regulated. Allosteric control of cofactor recruitment has also been described for different NF-κB family dimers (117,120). Single-base changes in NF-κB binding sites have been demonstrated to affect recruitment of the TF Irf3 by DNA-bound p65/Rel-containing dimers (117), as well as the recruitment of the cofactors histone deacetylase 3 (HDAC3) and Tip60 to a DNA-bound ternary complex containing p50 homodimers and Bcl3 (120).

The mechanism of allosteric control for GR and NF-κB appears to be mediated by structural differences between TFs bound to different DNA sites. Structural studies of GR bound to different GREs demonstrated that the structure of a ‘lever arm’ loop region connecting the DNA recognition helix and the ligand binding domain was highly dependent on the GRE sequence (118). Studies of NF-κB dimers bound to different κB sites have shown that DNA sequence differences can result in structural rearrangements (rotations and translations) of one dimer subunit (19,21,122). Protease protection assays and detailed binding studies have also suggested DNA sequence-dependent changes in protein conformation (72,123).

In contrast to the situation for GR and NF-κB, where allostery is mediated by structural differences between the DNA-bound complexes, allostery involving the POU factors operates via larger, more global differences in quaternary structure when bound to different DNA sites. POU factors, such as Oct-1 described above, have two DBDs connected by a flexible linker: a POUS and a POUHD (121). POU factors can homo- and heterodimerize on a diverse set of DNA-binding sites in which the relative orientations and spacing of the POUS and POUHD can vary and subsequently lead to differential cofactor recruitment (121,124). The POU factors Oct-1 and Oct-2 can bind as homodimers to two distinct response elements, PORE and MORE, found in the promoters of B-cell-specific genes (121). When bound to the PORE sequence, Oct dimers can recruit the transcriptional coactivator Oct-binding factor 1 (OBF-1); however, dimers bound to the MORE sequence do not recruit OBF-1. Crystal structures of the DNA-bound Oct-1 dimer have revealed that in the MORE-bound dimer the OBF-1 binding site on Oct-1 is blocked due to the relative orientation of the dimer subunits, but in the PORE-bound dimer the site is exposed, allowing OBF-1 recruitment (125,126). A similar situation arises with the Pit-1 POU factor, in which binding site differences lead to cell-type-specific expression patterns and differential recruitment of the N-CoR co-repressor complex (124,127,128). Studies revealed that an extra 2 bp between the POUS and POUHD half-sites found in the growth hormone (GH) gene promoter, compared with a similar affinity site in the prolactin gene promoter, altered the quaternary structure of the DNA-bound Pit-1 and enabled the recruitment of the N-CoR co-repressor complex to the GH site (124).

Cofactor-based specificity

Targeting of non-DNA-binding cofactors to DNA is traditionally viewed as being mediated solely via protein–protein interactions with DNA-bound TFs. In other words, the cofactor is ‘recruited’ and interacts with DNA indirectly through the TFs. However, several studies have described a provocative alternative mechanism in which recruited cofactors interact directly with DNA when part of a larger multi-protein complex (48,129). Further, as part of this larger complex, the non-DNA-binding cofactors mediate sequence-specific interactions that preferentially stabilize the binding to composite DNA sites containing specific auxiliary motifs (Figure 2D). This represents yet another mechanism to enhance the DNA sequence specificity of multi-protein regulatory complexes.

The regulation of sulfur metabolism genes in yeast involves a multi-protein complex composed of a sequence-specific TF, Cbf1, and two non-DNA-binding cofactors, Met28 and Met4 (130,131). Cbf1 is a bHLH protein and binds as a homodimer to E-box sites with a consensus 5′-CACGTG-3′ core (131). A PBM-based analysis revealed that the Met4/Met28/Cbf1 complex binds preferentially to composite DNA sites in which the Cbf1 E-box is flanked by an adjacent 5′-RYAAT-3′ ‘recruitment’ motif (i.e. 5′-RYAATNNCACGTG-3′) (48). High-affinity binding to the composite sites is highly cooperative and requires the full heteromeric complex. Sequence analysis, modeling and binding assays suggest that the recruitment motif is recognized by the non-DNA-binding Met28 subunit.

A highly similar situation has also been described for the multi-protein Oct-1/HCF-1/VP16 complex that regulates expression of the herpes simplex virus immediate-early genes by binding to a composite 5′-TAATGARAT sequence in the gene promoters (129). The complex is composed of the DNA-binding TF Oct-1 and two non-DNA-binding cofactors: the viral activator VP16 and the host cell factor 1 (HCF-1). Oct-1 binds to the 5′-TAAT sequence, and a DNA-binding surface of VP16 mediates interactions with the 5′-GARAT sequence. Similar to the situation in yeast, the VP16 cofactor does not interact with DNA well on its own, but when stabilized on DNA as part of a larger multi-protein complex it can adopt a configuration that mediates recognition of the 5′-GARAT subsequence. Therefore, in both situations, multi-protein complexes preferentially bind to composite binding sites composed of a recruitment motif (5′-RYAAT-3′ and 5′-GARAT-3′) and an adjacent TF-binding motif (5′-CACGTG-3′ and 5′-TAAT-3′) that are recognized by a non-DNA-binding cofactor (Met28 and VP16) and sequence-specific TF (Cbf1 and Oct-1) subunits, respectively.
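To make the notion of a composite site concrete, the following sketch scans a sequence for the E-box core (5′-CACGTG-3′) and for the composite 5′-RYAATNNCACGTG-3′ motif described above by translating IUPAC codes into a regular expression. It is a minimal illustration only: the function names and the test sequence are invented here, and exact pattern matching is a crude stand-in for the quantitative PBM-based binding data used in the cited work.

```python
import re

# IUPAC nucleotide codes used in the motifs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Return sorted (start, strand, match) tuples for motif hits on both strands.

    Start positions are 0-based coordinates on the forward strand.
    """
    seq = seq.upper()
    # A lookahead group lets overlapping matches be reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        start = len(seq) - m.start() - len(motif)  # map back to forward strand
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Hypothetical promoter fragment containing one composite site.
fragment = "TTGCAATGGCACGTGAA"
print(find_sites(fragment, "CACGTG"))         # palindromic E-box, found on both strands
print(find_sites(fragment, "RYAATNNCACGTG"))  # composite site, one orientation only
```

Because the E-box is palindromic it is reported on both strands, whereas the composite motif is orientation-specific, so a site flanked by the recruitment motif on only one side is reported once.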

SUMMARY

Complexities in DNA binding operate at multiple levels and complicate efforts to construct simple models or recognition codes of DNA binding. Flexible protein–DNA interactions and non-local structural effects confound simple descriptions of DNA base preferences and require the use of binding models, such as PWMs, that can account for the resulting sequence degeneracy. The ability of proteins to bind DNA via multiple modes further complicates the situation and leads to the requirement of multiple binding models for many proteins. Fortunately, HT technologies are not only increasing the rate at which DNA-binding proteins are being characterized, but are also providing the comprehensive binding data needed to construct models that involve multiple DNA-binding modes (23,24,27,28,35,38–40,42,43,48,59,73,74,79,132–134). An added benefit of the comprehensive nature of certain HT datasets, such as comprehensive k-mer or binding site level data, is that predictions of binding sites in the genome can be performed directly using the measured binding data (23,24,40,43,133).

Multi-protein complexes add tremendously to the DNA-binding diversity of proteins and provide a key mechanism to integrate signals in gene regulation. However, these higher-order multi-protein mechanisms cause difficulties in constructing models, primarily because DNA-binding models of proteins examined in isolation may not capture the binding sites utilized in vivo when cofactors are present. Here again, HT technologies are being used successfully to characterize binding of multi-protein complexes, revealing recognition codes mediated by cooperative binding (40,133) and cofactor-mediated targeting (48). The continued application of HT technologies to examine DNA binding of multi-protein complexes will undoubtedly provide increasingly refined binding models and new insights into gene targeting in vivo. Furthermore, just as the application of HT technologies has drawn our attention to the prevalence of TFs that exhibit subtle binding preferences or bind DNA via multiple modes, studies examining multi-protein complexes will likely identify widespread use of higher-order mechanisms, such as allostery, cooperative recruitment and cofactor-mediated targeting. Integrating these datasets with whole-genome chromatin immunoprecipitation (ChIP) and expression datasets will lead to much more complete and sophisticated descriptions of specificity in gene regulation.
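As an illustration of how comprehensive k-mer data can be used directly for binding site prediction, without first fitting a PWM, the sketch below looks up each window of a sequence in a table of measured k-mer scores. The table contents, score scale, threshold and function names are all hypothetical placeholders; real analyses would load measured scores from the relevant HT dataset.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def score_windows(seq, kmer_scores, k):
    """Yield (position, score) for every k-bp window of seq.

    kmer_scores maps k-mers to a measured binding score. A window receives the
    score stored for itself or, failing that, for its reverse complement;
    windows absent from the table receive None.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        score = kmer_scores.get(window)
        if score is None:
            score = kmer_scores.get(reverse_complement(window))
        yield i, score


def predicted_sites(seq, kmer_scores, k, threshold):
    """Return windows whose measured score meets the threshold."""
    return [(i, s) for i, s in score_windows(seq, kmer_scores, k)
            if s is not None and s >= threshold]


# Hypothetical 8-mer scores (made up for illustration only).
table = {"CACGTGAC": 0.48, "TCACGTGA": 0.49, "AAAAAAAA": 0.05}
print(predicted_sites("GGTCACGTGACC", table, k=8, threshold=0.45))
```

A lookup of this kind uses the measurements as they are, so it can capture multiple binding modes or position interdependencies that a single PWM would average away, at the cost of requiring data for every k-mer of interest.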

FUNDING

NIH [K22AI093793 to T.S.]; PhRMA Foundation Research Starter Grant (to R.G.); Basil O'Connor Research Award 5-FY13-212 from March of Dimes Foundation (to R.G.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Seeman NC, Rosenberg JM and Rich A (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc Natl Acad Sci USA 73, 804–808.

2. Rohs R, Jin X, West SM, Joshi R, Honig B and Mann RS (2010) Origins of specificity in protein-DNA recognition. Annu Rev Biochem 79, 233–269.

3. Garvie CW and Wolberger C (2001) Recognition of specific DNA sequences. Mol Cell 8, 937–946.

4. Jayaram B and Jain T (2004) The role of water in protein-DNA recognition. Annu Rev Biophys Biomol Struct 33, 343–361.

5. Reddy CK, Das A and Jayaram B (2001) Do water molecules mediate protein-DNA recognition? J Mol Biol 314, 619–632.

6. Miller JC and Pabo CO (2001) Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J Mol Biol 313, 309–315.

7. Baldwin EP, Martin SS, Abel J, Gelato KA, Kim H, Schultz PG and Santoro SW (2003) A specificity switch in selected cre recombinase variants is mediated by macromolecular plasticity and water. Chem Biol 10, 1085–1094.

8. Otwinowski Z, Schevitz RW, Zhang RG, Lawson CL, Joachimiak A, Marmorstein RQ, Luisi BF and Sigler PB (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature 335, 321–329.

9. Rohs R, West SM, Sosinsky A, Liu P, Mann RS and Honig B (2009) The role of DNA shape in protein-DNA recognition. Nature 461, 1248–1253.

10. Chang YP, Xu M, Machado AC, Yu XJ, Rohs R and Chen XS (2013) Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Reports 3, 1117–1127.

11. Werner MH, Gronenborn AM and Clore GM (1996) Intercalation, DNA kinking, and the control of transcription. Science 271, 778–784.

12. Kim JL, Nikolov DB and Burley SK (1993) Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature 365, 520–527.

13. Rohs R, Sklenar H and Shakked Z (2005) Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure 13, 1499–1509.

14. Pabo CO and Nekludova L (2000) Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J Mol Biol 301, 597–624.

15. Suzuki M, Gerstein M and Yagi N (1994) Stereochemical basis of DNA recognition by Zn fingers. Nucleic Acids Res 22, 3397–3405.

16. Kortemme T, Morozov AV and Baker D (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J Mol Biol 326, 1239–1259.

17. Siggers TW and Honig B (2007) Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res 35, 1085–1097.

18. Siggers TW, Silkov A and Honig B (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J Mol Biol 345, 1027–1045.

19. Chen FE and Ghosh G (1999) Regulation of DNA binding by Rel/NF-kappaB transcription factors: structural views. Oncogene 18, 6845–6852.

20. Chen YQ, Ghosh S and Ghosh G (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat Struct Biol 5, 67–73.

21. Chen YQ, Sengchanthalangsy LL, Hackett A and Ghosh G (2000) NF-kappaB p65 (RelA) homodimer uses distinct mechanisms to recognize DNA targets. Structure 8, 419–428.

22. Yanover C and Bradley P (2011) Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res 39, 4564–4576.

23. Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET et al. (2008) Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 133, 1266–1276.

24. Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X et al. (2009) Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723.

25. Ramirez CL, Foley JE, Wright DA, Muller-Lerch F, Rahman SH, Cornu TI, Winfrey RJ, Sander JD, Fu F, Townsend JA et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat Methods 5, 374–375.

26. Wei GH, Badis G, Berger MF, Kivioja T, Palin K, Enge M, Bonke M, Jolma A, Varjosalo M, Gehrke AR et al. (2010) Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J 29, 2147–2160.

27. Meng X, Smith RM, Giesecke AV, Joung JK and Wolfe SA (2006) Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques 40, 179–184.

28. Noyes MB, Meng X, Wakabayashi A, Sinha S, Brodsky MH and Wolfe SA (2008) A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res 36, 2547–2560.

29. Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW 3rd and Bulyk ML (2006) Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol 24, 1429–1435.

30. Bulyk ML, Gentalen E, Lockhart DJ and Church GM (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat Biotechnol 17, 573–577.

31. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA and Bulyk ML (2004) Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat Genet 36, 1331–1339.

32. Linnell J, Mott R, Field S, Kwiatkowski DP, Ragoussis J and Udalova IA (2004) Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res 32, e44.

33. Field S, Udalova I and Ragoussis J (2007) Accuracy and reproducibility of protein-DNA microarray technology. Adv Biochem Eng Biotechnol 104, 87–110.

34. Bonham AJ, Neumann T, Tirrell M and Reich NO (2009) Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res 37, e94.

35. Maerkl SJ and Quake SR (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237.

36. Zykovich A, Korf I and Segal DJ (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res 37, e151.

37. Wong D, Teixeira A, Oikonomopoulos S, Humburg P, Lone IN, Saliba D, Siggers T, Bulyk M, Angelov D, Dimitrov S et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol 12, R70.

38. Zhao Y, Granas D and Stormo GD (2009) Inferring binding energies from selected binding sites. PLoS Comput Biol 5, e1000590.

39. Jolma A, Kivioja T, Toivonen J, Cheng L, Wei G, Enge M, Taipale M, Vaquerizas JM, Yan J, Sillanpaa MJ et al. (2010) Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res 20, 861–873.

40. Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ et al. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282.

41. Tantin D, Gemberling M, Callister C and Fairbrother WG (2008) High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res 18, 631–639.

42. Warren CL, Kratochvil NC, Hauschild KE, Foister S, Brezinski ML, Dervan PB, Phillips GN Jr and Ansari AZ (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc Natl Acad Sci USA 103, 867–872.

43. Nutiu R, Friedman RC, Luo S, Khrebtukova I, Silva D, Li R, Zhang L, Schroth GP and Burge CB (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat Biotechnol 29, 659–664.

44. Weirauch MT and Hughes TR. Dramatic changes in transcription factor binding over evolutionary time. Genome Biol 11, 122.

45. Gordan R, Shen N, Dror I, Zhou T, Horton J, Rohs R and Bulyk ML (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports 3, 1093–1104.

46. Stormo GD (2000) DNA binding sites: representation and discovery. Bioinformatics 16, 16–23.

47. Maniatis T, Ptashne M, Backman K, Kield D, Flashman S, Jeffrey A and Maurer R (1975) Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda. Cell 5, 109–113.

48. Siggers T, Duyzend MH, Reddy J, Khan S and Bulyk ML (2011) Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol Syst Biol 7, 555.

49. Staden R (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res 12, 505–519.

50. Workman CT, Yin Y, Corcoran DL, Ideker T, Stormo GD and Benos PV (2005) enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res 33, W389–W392.

51. Man TK and Stormo GD (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res 29, 2471–2478.

52. Bulyk ML, Johnson PL and Church GM (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res 30, 1255–1261.

53. Tomovic A and Oakeley EJ (2007) Position dependencies in transcription factor binding sites. Bioinformatics 23, 933–941.

54. Zhou Q and Liu JS (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics 20, 909–916.

55. Agius P, Arvey A, Chang W, Noble WS and Leslie C (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput Biol 6, e1000916.

56. Ben-Gal I, Shani A, Gohr A, Grau J, Arviv S, Shmilovici A, Posch S and Grosse I (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics 21, 2657–2666.

57. Gershenzon NI, Stormo GD and Ioshikhes IP (2005) Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res 33, 2290–2301.

58. Zhao Y, Ruan S, Pandey M and Stormo GD (2012) Improved models for transcription factor binding site identification using nonindependent interactions. Genetics 191, 781–790.

59. Weirauch MT, Cote A, Norel R, Annala M, Zhao Y, Riley TR, Saez-Rodriguez J, Cokelaer T, Vedenko A, Talukder S et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat Biotechnol 31, 126–134.

60. Mordelet F, Horton J, Hartemink AJ, Engelhardt BE and Gordan R (2013) Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics 29, i117–i125.

61. Siddharthan R (2010) Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One 5, e9722.

62. Mathelier A and Wasserman WW (2013) The next generation of transcription factor binding site prediction. PLoS Comput Biol 9, e1003214.

63. Jiang J and Levine M (1993) Binding affinities and cooperative interactions with bHLH activators delimit threshold responses to the dorsal gradient morphogen. Cell 72, 741–752.

64. White MA, Parker DS, Barolo S and Cohen BA (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol Syst Biol 8, 614.

65. Scardigli R, Baumer N, Gruss P, Guillemot F and Le Roux I (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development 130, 3269–3281.

66. Struhl G, Struhl K and Macdonald PM (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell 57, 1259–1273.

67. Rowan S, Siggers T, Lachke SA, Yue Y, Bulyk ML and Maas RL (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev 24, 980–985.

68. Gaudet J and Mango SE (2002) Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science 295, 821–825.

69. Jaeger SA, Chan ET, Berger MF, Stottmann R, Hughes TR and Bulyk ML (2010) Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics 95, 185–195.

70. Tanay A (2006) Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res 16, 962–972.

71. Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U and Gaul U (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451, 535–540.

72. Siggers T, Chang AB, Teixeira A, Wong D, Williams KJ, Ahmed B, Ragoussis J, Udalova IA, Smale ST and Bulyk ML (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat Immunol 13, 95–102.

73. Zhu C, Byers KJ, McCord RP, Shi Z, Berger MF, Newburger DE, Saulrieta K, Smith Z, Shah MV, Radhakrishnan M et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res 19, 556–566.

74. Gordan R, Murphy KF, McCord RP, Zhu C, Vedenko A and Bulyk ML (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol 12, R125.

75. Nakagawa S, Gisselbrecht SS, Rogers JM, Hartl DL and Bulyk ML (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc Natl Acad Sci USA 110, 12349–12354.

76. Bolotin E, Chellappa K, Hwang-Verslues W, Schnabl JM, Yang C and Sladek FM (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics 12, 560.

77. Chu SW, Noyes MB, Christensen RG, Pierce BG, Zhu LJ, Weng Z, Stormo GD and Wolfe SA (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res 22, 1889–1898.

78. Noyes MB, Christensen RG, Wakabayashi A, Stormo GD, Brodsky MH and Wolfe SA (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 133, 1277–1289.

79. Fang B, Mane-Padros D, Bolotin E, Jiang T and Sladek FM (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res 40, 5343–5356.

80. Zhou T, Yang L, Lu Y, Dror I, Dantas Machado AC, Ghane T, Di Felice R and Rohs R (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res 41, W56–W62.

81. Badis G, Chan ET, van Bakel H, Pena-Castillo L, Tillo D, Tsui K, Carlson CD, Gossett AJ, Hasinoff MJ, Warren CL et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol Cell 32, 878–887.

82. Kim J and Struhl K (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res 23, 2531–2537.

83. Vinson C, Myakishev M, Acharya A, Mir AA, Moll JR and Bonovich M (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol Cell Biol 22, 6321–6335.

84. Kuo D, Licon K, Bandyopadhyay S, Chuang R, Luo C, Catalana J, Ravasi T, Tan K and Ideker T (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res 20, 1672–1678.

85. Khorasanizadeh S and Rastinejad F (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem Sci 26, 384–390.

86. Kurokawa R, DiRenzo J, Boehm M, Sugarman J, Gloss B, Rosenfeld MG, Heyman RA and Glass CK (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature 371, 528–531.

87. Luscombe NM, Austin SE, Berman HM and Thornton JM (2000) An overview of the structures of protein-DNA complexes. Genome Biol 1, REVIEWS001.

88. Perkins AS, Fishel R, Jenkins NA and Copeland NG (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol Cell Biol 11, 2665–2674.

89. Delwel R, Funabiki T, Kreider BL, Morishita K and Ihle JN (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol Cell Biol 13, 4291–4300.

90. Funabiki T, Kreider BL and Ihle JN (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene 9, 1575–1581.

91. Klemm JD, Rould MA, Aurora R, Herr W and Pabo CO (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell 77, 21–32.

92. Verrijzer CP, Alkema MJ, van Weperen WW, Van Leeuwen HC, Strating MJ and van der Vliet PC (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J 11, 4993–5003.

93. Kemler I, Schreiber E, Muller MM, Matthias P and Schaffner W (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J 8, 2001–2008.

94. Kersten S, Gronemeyer H and Noy N (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J Biol Chem 272, 12771–12777.

95. Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G et al. (2013) DNA-binding specificities of human transcription factors. Cell 152, 327–339.

96. Kim JB, Spotts GD, Halvorsen YD, Shih HM, Ellenberger T, Towle HC and Spiegelman BM (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol Cell Biol 15, 2582–2588.

97. Parraga A, Bellsolell L, Ferre-D'Amare AR and Burley SK (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. Structure 6, 661–672.

98. Fordyce PM, Pincus D, Kimmig P, Nelson CS, El-Samad H, Walter P and DeRisi JL (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc Natl Acad Sci USA 109, E3084–E3093.

99. Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y and Pritchard JK (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res 21, 447–455.

100. Bulyk,M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

101. Levine,M. and Tjian,R. (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson,A.D. (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr. Opin. Genet. Dev., 5, 552–558.

103. Wolberger,C. (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu. Rev. Biophys. Biomol. Struct., 28, 29–56.

104. Johnson,A.D. (1992) In: McKnight,S.L. and Yamamoto,K.R. (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim,S., Brostromer,E., Xing,D., Jin,J., Chong,S., Ge,H., Wang,S., Gu,C., Yang,L., Gao,Y.Q. et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 1267–1276.

107. Joshi,R., Passner,J.M., Rohs,R., Jain,R., Sosinsky,A., Crickmore,M.A., Jacob,V., Aggarwal,A.K., Honig,B. and Mann,R.S. (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo,H.D. and Mann,R.S. (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev., 13, 1704–1716.

109. Carey,M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne,M. and Gann,A. (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika,M., Williams,A.J., Chen,G., Collins,T. and Thanos,D. (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol. Cell, 1, 277–287.

112. Kim,T.K. and Maniatis,T. (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol. Cell, 1, 119–129.

113. Masternak,K., Muhlethaler-Mottet,A., Villard,J., Zufferey,M., Steimle,V. and Reith,W. (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev., 14, 1156–1166.

114. del Blanco,B., Garcia-Mariscal,A., Wiest,D.L. and Hernandez-Munain,C. (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J. Immunol., 188, 3278–3293.

115. Benoist,C. and Mathis,D. (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8, 681–715.

116. Hall,J.M., McDonnell,D.P. and Korach,K.S. (2002) Allosteric regulation of estrogen receptor structure, function and coactivator recruitment by different estrogen response elements. Mol. Endocrinol., 16, 469–486.

117. Leung,T.H., Hoffmann,A. and Baltimore,D. (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing,S.H., Pufall,M.A., So,A.Y., Bates,D.L., Chen,L. and Yamamoto,K.R. (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal,N., Tomar,A. and Nagaraju,J. (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res., 39, 9574–9591.

120. Wang,V.Y., Huang,W., Asagiri,M., Spann,N., Hoffmann,A., Glass,C. and Ghosh,G. (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports, 2, 824–839.

121. Remenyi,A., Scholer,H.R. and Wilmanns,M. (2004) Combinatorial control of gene expression. Nat. Struct. Mol. Biol., 11, 812–815.

122. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

123. Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

124. Scully,K.M., Jacobson,E.M., Jepsen,K., Lunyak,V., Viadiu,H., Carriere,C., Rose,D.W., Hooshmand,F., Aggarwal,A.K. and Rosenfeld,M.G. (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi,A., Tomilin,A., Pohl,E., Lins,K., Philippsen,A., Reinbold,R., Scholer,H.R. and Wilmanns,M. (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569–580.

126. Chasman,D., Cepek,K., Sharp,P.A. and Pabo,C.O. (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev., 13, 2650–2657.

127. Ingraham,H.A., Chen,R.P., Mangalam,H.J., Elsholtz,H.P., Flynn,S.E., Lin,C.R., Simmons,D.M., Swanson,L. and Rosenfeld,M.G. (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham,H.A., Flynn,S.E., Voss,J.W., Albert,V.R., Kapiloff,M.S., Wilson,L. and Rosenfeld,M.G. (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb,R., Huang,C.C., Aufiero,D.J. and Herr,W. (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol. Cell. Biol., 21, 4700–4712.

130. Kuras,L., Cherest,H., Surdin-Kerjan,Y. and Thomas,D. (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J., 15, 2519–2529.

131. Blaiseau,P.L. and Thomas,D. (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J., 17, 6327–6336.

132. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-KappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

133. Ferraris,L., Stewart,A.P., Kang,J., DeSimone,A.M., Gemberling,M., Tantin,D. and Fairbrother,W.G. (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res., 21, 1055–1064.

134. Bolotin,E., Liao,H., Ta,T.C., Yang,C., Hwang-Verslues,W., Evans,J.R., Jiang,T. and Sladek,F.M. (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.


motif 5′-TGACTCA-3′. Binding measurements of Gcn4 to thousands of sequences containing this core motif revealed a wide range of binding affinities, from no appreciable binding to high-affinity binding (43). Nucleotides immediately adjacent to the core 7-mer were important for binding affinity, but even the two nucleotides flanking the best 9-mer affected affinity by an order of magnitude. Another recent study examined the DNA-binding preferences of the paralogous yeast TFs Cbf1 and Tye7 to hundreds of genomic sequences containing the E-box motif 5′-CACGTG-3′ (45). The study revealed a strong dependence on flanking DNA, and computational analyses suggested that the sequence beyond the canonical E-box motif contributes to binding specificity by influencing the three-dimensional structure of the DNA-binding site. Importantly, the additional specificity coming from sequences flanking the core motif could not be described by simply extending the core (45), which is not surprising given that the influence of flanking DNA is likely to be exerted through DNA shape and not through specific DNA contacts. DNA shape is a function of the DNA sequence; however, this relationship is complex (80) and cannot be captured by simple PWM models. Thus, in order to account for the influence of flanking regions on the affinity of DNA-binding sites, future models of binding specificity will likely use DNA shape characteristics explicitly.
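To illustrate why an additive PWM is blind to flanking context, the following minimal Python sketch scores sites with a log-odds matrix. The 3-bp matrix, the uniform background and the function names `pwm_score` and `best_window` are illustrative assumptions, not measured data or a published model.

```python
import math

# Hypothetical position frequency matrix for a 3-bp toy motif
# (illustrative numbers only, not measured binding data).
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform base composition


def pwm_score(site, pfm=PFM, background=BACKGROUND):
    """Additive log-odds score: every position contributes independently."""
    if len(site) != len(pfm):
        raise ValueError("site length must match the matrix length")
    return sum(math.log2(col[base] / background)
               for col, base in zip(pfm, site))


def best_window(sequence, pfm=PFM):
    """Return (score, offset) of the highest-scoring window in a sequence."""
    width = len(pfm)
    windows = [(pwm_score(sequence[i:i + width], pfm), i)
               for i in range(len(sequence) - width + 1)]
    return max(windows)
```

In this sketch, `best_window("TTAGCTT")` and `best_window("GGAGCGG")` return the same score because the core is identical. Any affinity difference caused by the flanks is invisible to the model unless shape or dependency terms are added.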

Multiple modes of DNA binding

A simple model of DNA binding is complicated for many proteins by their ability to bind DNA using two or more distinct modes (Figure 1). These different modes of interaction can lead to fundamentally different DNA-binding preferences (i.e. motifs), which binding models that do not explicitly account for multiple modes cannot represent. HT studies are increasingly revealing previously unknown diversity in the DNA-binding preferences of numerous proteins, much of which may result from different binding modes (29,72,75,81). Here, we classify variable binding modes into four categories (Figure 1) and briefly review each category.

Variable spacing

Some proteins that bind to bipartite DNA motifs (i.e. motifs composed of two half-sites) can recognize distinct classes of motifs in which the half-sites are separated by different numbers of bases (Figure 1A). This phenomenon was first observed more than 20 years ago for basic leucine zipper (bZIP) proteins, which can bind overlapping or adjacent TGAC half-sites (82). Subsequent studies have classified the bZIP family into two subfamilies based on protein sequence: (i) Activator protein 1 (AP-1) proteins, which generally prefer overlapping half-sites but bind to adjacent half-sites with almost equal affinity; and (ii) Activating transcription factor (ATF)/cAMP response element-binding protein (CREB) proteins, which generally prefer adjacent half-sites and bind very poorly to overlapping half-sites (82,83). However, a recent PBM-based study found that at least one ATF/CREB protein in yeast (Yap3) binds with high affinity to both adjacent and overlapping half-sites (74). Therefore, although some of the residues that define bZIP half-site spacing have been identified (84), additional residues are also involved.

Variable half-site spacing has also been well documented for the nuclear receptors (NRs). For example, both the peroxisome proliferator activated receptors (PPARs) and retinoic acid receptors (RARs) form heterodimers with the retinoid X receptor (RXR) and can bind to response elements composed of direct repeats of the 5′-AGGTCA-3′ half-site separated by spacers of different lengths (85). PPAR/RXR dimers bind direct repeats spaced by 1 or 2 bases (i.e. DR1 = 5′-AGGTCANAGGTCA-3′, DR2 = 5′-AGGTCANNAGGTCA-3′), while RAR/RXR dimers bind to repeats spaced by 1 (DR1) or 5 (DR5) bases. The variable spacing has even been shown to affect the in vivo function of DNA-bound NR dimers (86): RAR/RXR bound to DR5 elements can activate transcription in response to ligand, but will not activate transcription when bound to DR1 elements (86).
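The direct-repeat classes described above can be located with a simple pattern search. The sketch below is a minimal illustration that assumes exact consensus half-sites and scans the forward strand only; the function name `find_direct_repeats` is hypothetical.

```python
import re

HALF_SITE = "AGGTCA"  # consensus NR half-site from the text


def find_direct_repeats(sequence, spacers=(1, 2, 5)):
    """Return sorted (start, spacer_length) pairs for AGGTCA-N{n}-AGGTCA."""
    hits = []
    for n in spacers:
        # A lookahead is used so that overlapping repeats are also reported.
        pattern = re.compile(f"(?={HALF_SITE}[ACGT]{{{n}}}{HALF_SITE})")
        hits.extend((m.start(), n) for m in pattern.finditer(sequence))
    return sorted(hits)
```

Real response elements deviate from the consensus, so a practical scan would score each half-site with a PWM or measured k-mer data rather than require exact matches.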

Multiple DBDs

DNA-binding proteins can contain multiple independent DBDs (3,87). Multiple DBDs allow a protein to recognize different DNA elements using different DBDs. For example, the zinc finger (ZF) protein Evi1 contains 10 ZF domains separated into two autonomously functioning DBDs, an N-terminal seven-ZF DBD and a C-terminal three-ZF DBD, that recognize completely different DNA motifs (88–90). A still more complicated scenario is seen for the mouse TF Oct-1, which has two DBDs known as the POU (Pit-Oct-Unc) homeodomain (POUHD) and the POU-specific domain (POUS) (Figure 1B) (24,91,92). Oct-1 can bind to three distinct DNA motifs using different combinations of the two DBDs: the POUHD site is recognized by the POUHD domain, the POUS site is recognized by the POUS domain and the composite POU site is recognized using both domains.

Multi-meric binding

Selective multimerization provides another means to expand or diversify the DNA-binding specificity of a protein. Proteins such as Oct-1 (93) and RXR (94) have been reported to bind DNA as either monomers or homodimers. Recent large-scale studies using HT-SELEX assays (39,95) have revealed that the dimeric mode of binding might be more common than previously appreciated, and even proteins that are known to bind DNA primarily as monomers, such as Elk1 (Figure 1C), can also form dimers with specific orientation and spacing preferences. These dimeric sites are enriched within genomic regions bound in vivo, suggesting that the dimeric profiles are biologically relevant. Furthermore, the dimer orientation and spacing preferences can sometimes distinguish individual members of the same TF family (95). For example, T-box factors share a common monomeric binding specificity but show seven distinct dimeric spacing/orientation preferences.

Alternate structural conformations

The recognition of distinct DNA motifs is expected when a protein contains multiple DBDs or binds as a multimer; however, some proteins with only a single DBD can also bind to multiple, distinctly different DNA sites. One such example is the mouse TF sterol regulatory element-binding factor 1 (SREBF1) (Figure 1D), the ortholog of the human TF sterol regulatory element-binding protein 1 (SREBP) (59,96). SREBF/SREBP proteins belong to the basic helix–loop–helix (bHLH) family of TFs, which typically recognize symmetric E-boxes (5′-CAnnTG-3′). Unlike most bHLH proteins, SREBF/SREBP can also bind the asymmetric sterol regulatory element 5′-ATCACnCCAC-3′. A co-crystal structure of the DNA-binding domain of human SREBP-1a bound to DNA revealed an asymmetric DNA–protein interface, with one monomer binding the E-box half-site (5′-ATCAC-3′) using protein–DNA contacts typical of bHLH proteins and the second monomer recognizing the non-E-box half-site (5′-GTGGG-3′) using entirely different protein–DNA contacts (96,97).

Thus, in the case of SREBF/SREBP, residues in the DBD are responsible for the multiple modes of DNA binding. Regions outside the DNA-binding domain of the protein can also enable alternate binding modes. For example, a region N-terminal to the basic DBD of the yeast TF Hac1 is required for dual recognition modes. Mutations in this region were shown to preferentially reduce binding to one of the modes, whereas an arginine in the basic domain proved crucial for the other mode of binding (98). This indicates that the protein can bind DNA using two distinct conformations, and that individual residues (within and outside the DBD) play specific roles within each binding mode (98).

Figure 1. Multiple modes of DNA binding. Schematized are examples illustrating four mechanistic categories by which proteins recognize different DNA sites via distinct structural modes. (A) Variable spacing: Gcn4 dimers can bind to bipartite sites with half-sites separated by variable-length spacers (overlapping half-sites, adjacent half-sites) (82); motifs from (73,74). (B) Multiple DBDs: Oct-1 can bind to different DNA sites (POUHD site, POUS site, POU site) using different arrangements of its two DNA-binding domains (91,92); motifs from (24). (C) Multi-meric binding: Elk1 can bind either as a monomer or as a dimer (monomer site, dimer site) (95). (D) Alternate structural conformations: SREBP can bind to different DNA sites by adopting alternate structural conformations (96,97); motifs from (44).

The ability of proteins to utilize different DNA-binding modes complicates, or even precludes, the construction of simple DNA-binding models for many proteins. The presence of distinct modes usually requires that multiple DNA-binding models are used to represent a protein's specificity. This can cause difficulties when the data available to construct the DNA-binding model contain a mixture of multiple types of binding motifs, such as in genome-wide chromatin immunoprecipitation (ChIP)-seq datasets. In some situations, the DNA-binding modes can potentially be separated and independently characterized. For example, one could independently characterize the binding of individual DBDs when a protein contains multiple DBDs. However, even these situations can lead to difficulties, as illustrated by the case of Oct-1, where the autonomous DBDs can function either independently or together; examining them separately will miss the binding sites recognized when both DBDs are involved (Figure 1B).
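One way to handle a mixed dataset is to partition bound sequences by binding mode before building a separate model for each mode. The sketch below is a deliberately simplified illustration using the two SREBF/SREBP consensus sites quoted in the text. Consensus matching on the forward strand, the `MODES` table and the function name `classify_sites` are assumptions made for illustration; a realistic analysis would use one PWM or k-mer table per mode.

```python
import re

# Consensus patterns for the two SREBF/SREBP modes named in the text
# (n = any base). Using consensus regexes is a simplifying assumption.
MODES = {
    "E-box": "CA[ACGT][ACGT]TG",
    "SRE": "ATCAC[ACGT]CCAC",
}


def classify_sites(sequences, modes=MODES):
    """Group sequences by the binding mode(s) whose pattern they contain."""
    assigned = {name: [] for name in modes}
    assigned["unassigned"] = []
    for seq in sequences:
        matched = [name for name, pat in modes.items() if re.search(pat, seq)]
        for name in matched:
            assigned[name].append(seq)
        if not matched:
            assigned["unassigned"].append(seq)
    return assigned
```

Sequences left in the `unassigned` group are candidates for an additional, uncharacterized mode or for indirect binding through a partner protein.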

Multi-protein recognition codes

The DNA-binding specificity of a TF is a primary determinant of where it will bind in the genome (99,100). However, transcriptional regulation often involves the assembly of multi-protein complexes on DNA (101,102), and it has been shown that multi-protein complexes can exhibit novel DNA-binding specificities not predictable from measurements of the individual proteins (40,48). Therefore, in efforts aimed at understanding the biophysical determinants of specificity in gene regulation, it is essential to also consider how multi-protein complexes can lead to new or enhanced specificities. Here, we review the varied mechanisms by which novel DNA-binding specificities can arise when proteins, both DNA-binding and non-DNA-binding, assemble together. We suggest that these mechanisms represent higher-order 'multi-protein' recognition codes: mechanisms by which genomic targeting of regulatory factors is encoded in multi-protein complexes.

Cooperative binding

Cooperative binding of TFs to DNA, usually via protein–protein interactions between adjacently bound TFs, stabilizes the proteins on the DNA and can enhance their individual contributions to transcription of a gene (3,103). Cooperative binding can also affect DNA-binding specificity and lead to recognition of new binding sites (3) (Figure 2A). One way in which cooperative binding alters specificity is by extending the binding of a TF to include lower-affinity sites, as a result of the enhanced overall affinity of the cooperative complex for DNA. A well-studied example is the binding of the yeast TFs MATα2, MATa1 and Mcm1, which regulate mating-type genes (103,104). MATα2 binds DNA cooperatively with the cofactors MATa1 or Mcm1 to repress different sets of genes. MATα2/MATa1 heterodimers bind to sites found in the promoter regions of haploid-specific genes and repress their expression, whereas MATα2/Mcm1 heterotetramers bind different sites found in mating-type a-specific gene promoters and repress gene expression. Both complexes repress gene expression via recruitment of the co-repressor proteins Tup1 and Ssn6 through interactions with MATα2. Therefore, MATα2 functions to recruit co-repressors and repress gene expression, but its DNA-binding specificity, and consequently its target genes, are determined by cooperative binding with cell-type-specific cofactors. A recent study presented a more subtle form of specificity alteration by cooperative binding, in which DNA binding of one protein can stabilize or destabilize the DNA binding of another protein via deformation of the DNA structure (105). Notably, the cooperative binding was not mediated by direct protein contacts, but by an allosteric mechanism in which DNA structural deformations propagate along the DNA helix. These cooperative allosteric effects were shown to operate over distances of up to 16 bp.
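The way cooperativity extends binding to weak sites can be sketched with a standard two-site equilibrium model. This model is a textbook-style assumption introduced here for illustration and is not taken from the studies cited above; the function name `occupancy_a` and the parameter values in the note below are hypothetical.

```python
def occupancy_a(a, b, omega=1.0):
    """Probability that site A is bound in a two-site equilibrium model.

    a, b  : dimensionless binding weights for proteins A and B
            (protein concentration multiplied by association constant)
    omega : cooperativity factor; 1.0 means the two proteins bind independently
    """
    # Statistical weights of the four states: empty, A only, B only, A and B.
    z = 1.0 + a + b + omega * a * b
    return (a + omega * a * b) / z
```

For a weak site with `a = 0.1` and a well-bound partner with `b = 10`, raising `omega` from 1 to 100 increases the occupancy of the weak site from about 0.09 to about 0.90 in this sketch, which is the 'assisted binding' behaviour described in the text.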

A second, more active mechanism has been described in which cooperative binding alters the residue–base contacts that a TF can make with the DNA, thereby altering its inherent binding specificity (3,40) (Figure 2A). One example is the binding of the TFs Ets-1 and Pax5, where Pax5 alters the DNA contacts made by Ets-1, making it more permissive to alternate DNA sequences (106). Bound alone to DNA, Ets-1 binds with high affinity to 5′-GGAA-3′ and with lower affinity to 5′-GGAG-3′. A tyrosine residue (Y395) in the Ets-1 recognition helix interacts with the fourth base position of these sites and mediates the preference for adenine over guanine. However, in complex with Pax5, the Y395 residue adopts an alternate conformation and no longer interacts with the DNA at this base position. This makes Ets-1 DNA binding permissive for both sequences and therefore broadens the specificity of the Pax5/Ets-1 complex to include the suboptimal 5′-GGAG-3′ Ets-1 half-site. Another example has been elucidated in a series of studies on Drosophila Hox proteins and their cofactor Extradenticle (Exd) (40,107). The Hox protein Sex combs reduced (Scr) binds with the cofactor protein Exd preferentially to a paralog-specific site (fkh250) found in the fork head (fkh) gene over a consensus site (fkh250con) recognized by other Hox/Exd complexes (107,108). The preferential binding of the Scr/Exd complex to the fkh250 site involves the stabilization of a flexible 'arm' region N-terminal to the Scr homeodomain (107). Two of the stabilized residues in the N-terminal arm region (Arg3 and His-12) insert into the fkh250 DNA minor groove, which is narrower than the same region in the fkh250con sequence and more electrostatically favorable for the Arg and His residues. Therefore, the enhanced specificity of the Scr/Exd complex for the fkh250 sequence is due to readout of local DNA shape (i.e. minor groove width) by the Scr Arg and His residues when stabilized by the cooperatively bound Exd cofactor. This work highlights how the intrinsic specificity of a monomeric homeodomain (Scr) can be altered or refined through interaction with a protein cofactor (Exd), a characteristic called latent specificity. A recent comprehensive analysis of other Hox proteins extended these results and demonstrated latent specificities for all the Drosophila Hox proteins (40). A HT-SELEX/SELEX-seq assay was used to measure and compare the DNA-binding specificity of all eight Drosophila Hox proteins, alone and in complex with Exd. As seen for Scr/Exd, binding with the Exd cofactor revealed latent specificity differences between the Hox proteins that were not observable when the Hox proteins were examined as monomers.

Cooperative cofactor recruitment

The recruitment of non-DNA-binding cofactors (i.e. coactivators and corepressors) to promoters and enhancers by DNA-binding TFs is central to gene regulation (101). 'Cooperative' recruitment occurs when a cofactor can make simultaneous interactions with multiple DNA-bound TFs; the result is a synergistic enhancement in cofactor recruitment (109,110). Cooperative cofactor recruitment integrates the contributions from multiple TFs to achieve maximal gene expression and is a powerful mechanism for establishing an integrative 'AND-type' regulatory logic in gene transcription (111–114). At the same time, cooperative recruitment enhances the binding specificity of the cofactor. The genomic location of the cofactor is determined not by the presence of a single TF, but by the less frequent (i.e. more specific) coincident presence of multiple TFs (Figure 2B).

Figure 2. Multi-protein recognition codes. Schematized are examples illustrating four mechanistic categories by which targeting of proteins to distinct DNA sites involves the assembly of multi-protein complexes. (A) Cooperative binding: enhanced complex stability due to cooperativity allows binding to lower-affinity (weak) sites (assisted binding) (103,104,106); inter-protein interactions alter or stabilize protein-DNA contacts (altered side-chain conformation; stabilization of contacts, e.g. an N-terminal tail), altering DNA-binding specificity (40,106,107). (B) Cooperative recruitment: cofactor recruitment requires multiple factors (rather than only one), allowing more specific cofactor targeting (weak or no recruitment versus strong cooperative recruitment) (109–114). (C) Allostery: allosteric control of cofactor recruitment limits cofactor recruitment to only a subset of the TF binding sites (recruiting site versus non-recruiting site) (116–121,124,125). (D) Cofactor-based targeting: enhanced binding of a multi-protein complex to specialized composite sites is mediated by interactions between a non-DNA-binding cofactor and an auxiliary motif (48,129).

A paradigm for cooperative cofactor recruitment is provided by multi-protein enhanceosome complexes, in which multiple TFs cooperatively assemble on DNA and coordinately recruit a common cofactor. A well-studied example is the enhanceosome that binds to the Interferon beta (IFNβ) gene promoter and cooperatively recruits the coactivator CREB-binding protein (CBP) (111,112). In response to viral infection, the TFs ATF-2, c-Jun, Irf3, Irf7 and NF-κB, facilitated by the architectural factor Hmga1, bind cooperatively to the IFNβ promoter. Multiple proteins in the DNA-bound complex contribute to the cooperative recruitment of CBP and operate synergistically to enhance transcription (111). While NF-κB, the ATF-2/c-Jun dimer and Irf factors can recruit CBP individually, studies have demonstrated that removal of the activation domains from Irf or NF-κB factors resulted in complete loss of CBP recruitment, highlighting the cooperative nature of CBP recruitment (111). A similar situation occurs in the cooperative recruitment of the class II major histocompatibility complex transactivator (CIITA) to a set of conserved transcriptional control elements found in the promoters of major histocompatibility complex class II (MHC-II) genes (113,115). The control sequences contain four DNA elements, termed the W, X, X2 and Y boxes, with a conserved sequence spacing and orientation across the different promoters. The TFs X-box binding protein (RFX), X2-binding protein (X2BP) and Nuclear factor Y (NF-Y) bind to the X, X2 and Y boxes, respectively, and form a cooperative nucleoprotein enhanceosome complex. This multi-protein complex recruits CIITA to the MHC-II promoters via multiple weak interactions mediated by different components of the enhanceosome complex (113). Removal of interactions from any of these component proteins leads to significant loss of CIITA recruitment.
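The 'AND-type' behaviour that emerges from multiple weak contacts can be sketched with a simple additive-energy model. Everything numerical here is a hypothetical illustration: the contact energies, the basal weight and the function name `recruitment_probability` are assumptions, and only the TF names come from the MHC-II example in the text.

```python
import math

# Hypothetical favourable contact energies (in units of kT) between a
# cofactor and each DNA-bound TF; the values are illustrative, not measured.
EXAMPLE_ENERGIES = {"RFX": 3.0, "X2BP": 3.0, "NF-Y": 3.0}


def recruitment_probability(bound_tfs, contact_energies, basal_weight=1e-3):
    """Probability that a cofactor is recruited, given which TFs are bound.

    bound_tfs        : names of the TFs present at the site
    contact_energies : stabilizing energy (kT) contributed by each TF
    basal_weight     : assumed statistical weight of the cofactor at the
                       site in the absence of any TF contact
    """
    total = sum(contact_energies[tf] for tf in bound_tfs)
    weight = basal_weight * math.exp(total)  # contacts add in energy
    return weight / (1.0 + weight)
```

With these assumed numbers, a single bound TF gives a recruitment probability of about 0.02, whereas all three together give about 0.89. Because the contacts add in energy, they multiply in statistical weight, so the cofactor is recruited efficiently only where all TFs coincide.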

Allosteric effects in cofactor recruitment

DNA can act as a sequence-specific allosteric ligand that alters the function of a DNA-bound TF (116–121). A common mechanism is a DNA sequence-dependent effect on cofactor recruitment: a TF bound to one set of DNA sites will recruit a specific cofactor, whereas the same TF bound to a different set of sites will not recruit the cofactor. Therefore, allosteric control of cofactor recruitment functionally partitions the DNA-binding sites of a TF and consequently refines the binding specificity of the recruited cofactor (Figure 2C).

Allosteric mechanisms have been reported for both the glucocorticoid (GR) and the estrogen (ER) nuclear receptors (116,118). The DNA sequence bound by ER can influence its transcriptional activity and its recruitment of LXXLL-containing peptides and p160 cofactors (SRC-1, GRIP1, ACTR) (116). The DNA sequence bound by GR, called the GR response element (GRE), was shown to influence gene expression, but the effect did not correlate with GR-binding affinity, suggesting a mechanism that involves differential recruitment of cofactor proteins (118). Furthermore, knockdown of Brahma, a subunit of the SWI/SNF nucleosome remodeling complex, affected GR-dependent gene expression in a GRE sequence-dependent manner, suggesting that the composition of the DNA-bound protein complexes was allosterically regulated. Allosteric control of cofactor recruitment has also been described for different NF-κB family dimers (117,120). Single-base changes in NF-κB binding sites have been demonstrated to affect recruitment of the TF Irf3 by DNA-bound p65/Rel-containing dimers (117), as well as the recruitment of the cofactors histone deacetylase 3 (HDAC3) and Tip60 to a DNA-bound ternary complex containing p50 homodimers and Bcl3 (120).

The mechanism of allosteric control for GR and NF-κB appears to be mediated by structural differences between TFs bound to different DNA sites. Structural studies of GR bound to different GREs demonstrated that the structure of a 'lever arm' loop region connecting the DNA recognition helix and the ligand-binding domain was highly dependent on the GRE sequence (118). Studies of NF-κB dimers bound to different κB sites have shown that DNA sequence differences can result in structural rearrangements (rotations and translations) of one dimer subunit (19,21,122). Protease protection assays and detailed binding studies have also suggested DNA sequence-dependent changes in protein conformation (72,123).

In contrast to the situation for GR and NF-κB, where allostery is mediated by structural differences between the DNA-bound complexes, allostery involving the POU factors operates via larger, more global differences in quaternary structure when bound to different DNA sites. POU factors, such as Oct-1 described above, have two DBDs connected by a flexible linker: a POUS and a POUHD (121). POU factors can homo- and heterodimerize on a diverse set of DNA-binding sites in which the relative orientations and spacing of the POUS and POUHD can vary and subsequently lead to differential cofactor recruitment (121,124). The POU factors Oct-1 and Oct-2 can bind as homodimers to two distinct response elements, PORE and MORE, found in the promoters of B-cell-specific genes (121). When bound to the PORE sequence, Oct dimers can recruit the transcriptional coactivator Oct-binding factor 1 (OBF-1); however, dimers bound to the MORE sequence do not recruit OBF-1. Crystal structures of the DNA-bound Oct-1 dimer have revealed that in the MORE-bound dimer the OBF-1 binding site on Oct-1 is blocked due to the relative orientation of the dimer subunits, but in the PORE-bound dimer the site is exposed, allowing OBF-1 recruitment (125,126). A similar situation arises with the Pit-1 POU factor, in which binding site differences lead to cell-type-specific expression patterns and differential recruitment of the N-CoR co-repressor complex (124,127,128). Studies revealed that an extra 2 bp between the POUS and POUHD half-sites found in the growth hormone (GH) gene promoter, compared with a similar-affinity site in the prolactin gene promoter, altered the quaternary structure of the DNA-bound Pit-1 and enabled the recruitment of the N-CoR co-repressor complex to the GH site (124).

Cofactor-based specificity

Targeting of non-DNA-binding cofactors to DNA is traditionally viewed as being mediated solely via protein–protein interactions with DNA-bound TFs. In other words, the cofactor is 'recruited' and interacts with DNA indirectly through the TFs. However, several studies have described a provocative alternative mechanism in which recruited cofactors interact directly with DNA when part of a larger multi-protein complex (48,129). Further, as part of this larger complex, the non-DNA-binding cofactors mediate sequence-specific interactions that preferentially stabilize binding to composite DNA sites containing specific auxiliary motifs (Figure 2D). This represents yet another mechanism to enhance the DNA sequence specificity of multi-protein regulatory complexes.

The regulation of sulfur metabolism genes in yeast involves a multi-protein complex composed of a sequence-specific TF, Cbf1, and two non-DNA-binding cofactors, Met28 and Met4 (130,131). Cbf1 is a bHLH protein and binds as a homodimer to E-box sites with a consensus 5′-CACGTG-3′ core (131). A PBM-based analysis revealed that the Met4/Met28/Cbf1 complex binds preferentially to composite DNA sites in which the Cbf1 E-box is flanked by an adjacent 5′-RYAAT-3′ 'recruitment' motif (i.e. 5′-RYAATNNCACGTG-3′) (48). High-affinity binding to the composite sites is highly cooperative and requires the full heteromeric complex. Sequence analysis, modeling and binding assays suggest that the recruitment motif is recognized by the non-DNA-binding Met28 subunit. A highly similar situation has also been described for the multi-protein Oct-1/HCF-1/VP16 complex, which regulates expression of the herpes simplex virus immediate-early genes by binding to a composite 5′-TAATGARAT sequence in the gene promoters (129). The complex is composed of the DNA-binding TF Oct-1 and two non-DNA-binding cofactors, the viral activator VP16 and the host cell factor 1 (HCF-1). Oct-1 binds to the 5′-TAAT sequence, and a DNA-binding surface of VP16 mediates interactions with the 5′-GARAT sequence. Similar to the situation in yeast, the VP16 cofactor does not interact well with DNA on its own, but when stabilized on DNA as part of a larger multi-protein complex it can adopt a configuration that mediates recognition of the 5′-GARAT subsequence. Therefore, in both situations, multi-protein complexes preferentially bind to composite binding sites composed of a recruitment motif (5′-RYAAT-3′ and 5′-GARAT-3′) and an adjacent TF-binding motif (5′-CACGTG-3′ and 5′-TAAT-3′) that are recognized by a non-DNA-binding cofactor (Met28 and VP16) and sequence-specific TF (Cbf1 and Oct-1) subunits, respectively.
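Composite sites written in degenerate notation, such as 5′-RYAATNNCACGTG-3′, can be located by translating the IUPAC codes into a regular expression. The sketch below is a minimal illustration restricted to the codes used in the text (R, Y, N) and to the forward strand; the helper names `iupac_to_regex` and `find_sites` are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes needed for the motifs in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def iupac_to_regex(motif):
    """Translate a degenerate motif such as 'RYAAT' into a regex string."""
    return "".join(IUPAC[symbol] for symbol in motif)


def find_sites(sequence, motif):
    """Return start offsets of all (possibly overlapping) motif matches."""
    pattern = re.compile(f"(?={iupac_to_regex(motif)})")
    return [m.start() for m in pattern.finditer(sequence)]
```

Comparing `find_sites(seq, "CACGTG")` with `find_sites(seq, "RYAATNNCACGTG")` separates E-boxes that carry the adjacent recruitment motif from those that do not; the same functions can be applied to the 5′-TAATGARAT element.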

SUMMARY

Complexities in DNA binding operate at multiple levels and complicate efforts to construct simple models or recognition codes of DNA binding. Flexible protein-DNA interactions and non-local structural effects confound simple descriptions of DNA base preferences and require the use of binding models, such as PWMs, that can account for the resulting sequence degeneracy. The ability of proteins to bind DNA via multiple modes further complicates the situation and means that many proteins require multiple binding models. Fortunately, HT technologies are not only increasing the rate at which DNA-binding proteins are being characterized, but are also providing the comprehensive binding data needed to construct models that involve multiple DNA-binding modes (23,24,27,28,35,38–40,42,43,48,59,73,74,79,132–134). An added benefit of the comprehensive nature of certain HT datasets, such as k-mer or binding site level data, is that predictions of binding sites in the genome can be performed directly using the measured binding data (23,24,40,43,133).

Multi-protein complexes add tremendously to the DNA-binding diversity of proteins and provide a key mechanism to integrate signals in gene regulation. However, these higher-order, multi-protein mechanisms cause difficulties in constructing models, primarily because DNA-binding models of proteins examined in isolation may not capture the binding sites utilized in vivo when cofactors are present. Here again, HT technologies are being used successfully to characterize binding of multi-protein complexes, revealing recognition codes mediated by cooperative binding (40,133) and cofactor-mediated targeting (48). The continued application of HT technologies to examine DNA binding of multi-protein complexes will undoubtedly provide increasingly refined binding models and new insights into gene targeting in vivo. Furthermore, just as the application of HT technologies has drawn our attention to the prevalence of TFs that exhibit subtle binding preferences or bind DNA via multiple modes, studies examining multi-protein complexes will likely identify widespread use of higher-order mechanisms such as allostery, cooperative recruitment and cofactor-mediated targeting. Integrating these datasets with whole-genome chromatin immunoprecipitation (ChIP) and expression datasets will lead to much more complete and sophisticated descriptions of specificity in gene regulation.

FUNDING

NIH [K22AI093793 to T.S.]; PhRMA Foundation Research Starter Grant (to R.G.); Basil O'Connor Research Award 5-FY13-212 from March of Dimes Foundation (to R.G.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Seeman NC, Rosenberg JM and Rich A (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc Natl Acad Sci USA, 73, 804–808.

2. Rohs R, Jin X, West SM, Joshi R, Honig B and Mann RS (2010) Origins of specificity in protein-DNA recognition. Annu Rev Biochem, 79, 233–269.

3. Garvie CW and Wolberger C (2001) Recognition of specific DNA sequences. Mol Cell, 8, 937–946.

4. Jayaram B and Jain T (2004) The role of water in protein-DNA recognition. Annu Rev Biophys Biomol Struct, 33, 343–361.

5. Reddy CK, Das A and Jayaram B (2001) Do water molecules mediate protein-DNA recognition? J Mol Biol, 314, 619–632.

6. Miller JC and Pabo CO (2001) Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J Mol Biol, 313, 309–315.

7. Baldwin EP, Martin SS, Abel J, Gelato KA, Kim H, Schultz PG and Santoro SW (2003) A specificity switch in selected cre recombinase variants is mediated by macromolecular plasticity and water. Chem Biol, 10, 1085–1094.

8. Otwinowski Z, Schevitz RW, Zhang RG, Lawson CL, Joachimiak A, Marmorstein RQ, Luisi BF and Sigler PB (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature, 335, 321–329.

9. Rohs R, West SM, Sosinsky A, Liu P, Mann RS and Honig B (2009) The role of DNA shape in protein-DNA recognition. Nature, 461, 1248–1253.

10. Chang YP, Xu M, Machado AC, Yu XJ, Rohs R and Chen XS (2013) Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Reports, 3, 1117–1127.

11. Werner MH, Gronenborn AM and Clore GM (1996) Intercalation, DNA kinking, and the control of transcription. Science, 271, 778–784.

12. Kim JL, Nikolov DB and Burley SK (1993) Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature, 365, 520–527.

13. Rohs R, Sklenar H and Shakked Z (2005) Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure, 13, 1499–1509.

14. Pabo CO and Nekludova L (2000) Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J Mol Biol, 301, 597–624.

15. Suzuki M, Gerstein M and Yagi N (1994) Stereochemical basis of DNA recognition by Zn fingers. Nucleic Acids Res, 22, 3397–3405.

16. Kortemme T, Morozov AV and Baker D (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J Mol Biol, 326, 1239–1259.

17. Siggers TW and Honig B (2007) Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res, 35, 1085–1097.

18. Siggers TW, Silkov A and Honig B (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J Mol Biol, 345, 1027–1045.

19. Chen FE and Ghosh G (1999) Regulation of DNA binding by Rel/NF-kappaB transcription factors: structural views. Oncogene, 18, 6845–6852.

20. Chen YQ, Ghosh S and Ghosh G (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat Struct Biol, 5, 67–73.

21. Chen YQ, Sengchanthalangsy LL, Hackett A and Ghosh G (2000) NF-kappaB p65 (RelA) homodimer uses distinct mechanisms to recognize DNA targets. Structure, 8, 419–428.

22. Yanover C and Bradley P (2011) Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res, 39, 4564–4576.

23. Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET et al. (2008) Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 133, 1266–1276.

24. Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X et al. (2009) Diversity and complexity in DNA recognition by transcription factors. Science, 324, 1720–1723.

25. Ramirez CL, Foley JE, Wright DA, Muller-Lerch F, Rahman SH, Cornu TI, Winfrey RJ, Sander JD, Fu F, Townsend JA et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat Methods, 5, 374–375.

26. Wei GH, Badis G, Berger MF, Kivioja T, Palin K, Enge M, Bonke M, Jolma A, Varjosalo M, Gehrke AR et al. (2010) Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J, 29, 2147–2160.

27. Meng X, Smith RM, Giesecke AV, Joung JK and Wolfe SA (2006) Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques, 40, 179–184.

28. Noyes MB, Meng X, Wakabayashi A, Sinha S, Brodsky MH and Wolfe SA (2008) A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res, 36, 2547–2560.

29. Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW 3rd and Bulyk ML (2006) Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol, 24, 1429–1435.

30. Bulyk ML, Gentalen E, Lockhart DJ and Church GM (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat Biotechnol, 17, 573–577.

31. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA and Bulyk ML (2004) Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat Genet, 36, 1331–1339.

32. Linnell J, Mott R, Field S, Kwiatkowski DP, Ragoussis J and Udalova IA (2004) Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res, 32, e44.

33. Field S, Udalova I and Ragoussis J (2007) Accuracy and reproducibility of protein-DNA microarray technology. Adv Biochem Eng Biotechnol, 104, 87–110.

34. Bonham AJ, Neumann T, Tirrell M and Reich NO (2009) Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res, 37, e94.

35. Maerkl SJ and Quake SR (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science, 315, 233–237.

36. Zykovich A, Korf I and Segal DJ (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res, 37, e151.

37. Wong D, Teixeira A, Oikonomopoulos S, Humburg P, Lone IN, Saliba D, Siggers T, Bulyk M, Angelov D, Dimitrov S et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol, 12, R70.

38. Zhao Y, Granas D and Stormo GD (2009) Inferring binding energies from selected binding sites. PLoS Comput Biol, 5, e1000590.

39. Jolma A, Kivioja T, Toivonen J, Cheng L, Wei G, Enge M, Taipale M, Vaquerizas JM, Yan J, Sillanpaa MJ et al. (2010) Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res, 20, 861–873.

40. Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ et al. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell, 147, 1270–1282.

41. Tantin D, Gemberling M, Callister C and Fairbrother WG (2008) High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res, 18, 631–639.

42. Warren CL, Kratochvil NC, Hauschild KE, Foister S, Brezinski ML, Dervan PB, Phillips GN Jr and Ansari AZ (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc Natl Acad Sci USA, 103, 867–872.

43. Nutiu R, Friedman RC, Luo S, Khrebtukova I, Silva D, Li R, Zhang L, Schroth GP and Burge CB (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat Biotechnol, 29, 659–664.

44. Weirauch MT and Hughes TR Dramatic changes in transcription factor binding over evolutionary time. Genome Biol, 11, 122.

45. Gordan R, Shen N, Dror I, Zhou T, Horton J, Rohs R and Bulyk ML (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports, 3, 1093–1104.

46. Stormo GD (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.

47. Maniatis T, Ptashne M, Backman K, Kield D, Flashman S, Jeffrey A and Maurer R (1975) Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda. Cell, 5, 109–113.

48. Siggers T, Duyzend MH, Reddy J, Khan S and Bulyk ML (2011) Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol Syst Biol, 7, 555.

49. Staden R (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res, 12, 505–519.

50. Workman CT, Yin Y, Corcoran DL, Ideker T, Stormo GD and Benos PV (2005) enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res, 33, W389–W392.

51. Man TK and Stormo GD (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res, 29, 2471–2478.

52. Bulyk ML, Johnson PL and Church GM (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res, 30, 1255–1261.

53. Tomovic A and Oakeley EJ (2007) Position dependencies in transcription factor binding sites. Bioinformatics, 23, 933–941.

54. Zhou Q and Liu JS (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics, 20, 909–916.

55. Agius P, Arvey A, Chang W, Noble WS and Leslie C (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput Biol, 6, pii: e1000916.

56. Ben-Gal I, Shani A, Gohr A, Grau J, Arviv S, Shmilovici A, Posch S and Grosse I (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics, 21, 2657–2666.

57. Gershenzon NI, Stormo GD and Ioshikhes IP (2005) Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res, 33, 2290–2301.

58. Zhao Y, Ruan S, Pandey M and Stormo GD (2012) Improved models for transcription factor binding site identification using nonindependent interactions. Genetics, 191, 781–790.

59. Weirauch MT, Cote A, Norel R, Annala M, Zhao Y, Riley TR, Saez-Rodriguez J, Cokelaer T, Vedenko A, Talukder S et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat Biotechnol, 31, 126–134.

60. Mordelet F, Horton J, Hartemink AJ, Engelhardt BE and Gordan R (2013) Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics, 29, i117–i125.

61. Siddharthan R (2010) Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One, 5, e9722.

62. Mathelier A and Wasserman WW (2013) The next generation of transcription factor binding site prediction. PLoS Comput Biol, 9, e1003214.

63. Jiang J and Levine M (1993) Binding affinities and cooperative interactions with bHLH activators delimit threshold responses to the dorsal gradient morphogen. Cell, 72, 741–752.

64. White MA, Parker DS, Barolo S and Cohen BA (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol Syst Biol, 8, 614.

65. Scardigli R, Baumer N, Gruss P, Guillemot F and Le Roux I (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development, 130, 3269–3281.

66. Struhl G, Struhl K and Macdonald PM (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell, 57, 1259–1273.

67. Rowan S, Siggers T, Lachke SA, Yue Y, Bulyk ML and Maas RL (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev, 24, 980–985.

68. Gaudet J and Mango SE (2002) Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science, 295, 821–825.

69. Jaeger SA, Chan ET, Berger MF, Stottmann R, Hughes TR and Bulyk ML (2010) Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics, 95, 185–195.

70. Tanay A (2006) Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res, 16, 962–972.

71. Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U and Gaul U (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 451, 535–540.

72. Siggers T, Chang AB, Teixeira A, Wong D, Williams KJ, Ahmed B, Ragoussis J, Udalova IA, Smale ST and Bulyk ML (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat Immunol, 13, 95–102.

73. Zhu C, Byers KJ, McCord RP, Shi Z, Berger MF, Newburger DE, Saulrieta K, Smith Z, Shah MV, Radhakrishnan M et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res, 19, 556–566.

74. Gordan R, Murphy KF, McCord RP, Zhu C, Vedenko A and Bulyk ML (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol, 12, R125.

75. Nakagawa S, Gisselbrecht SS, Rogers JM, Hartl DL and Bulyk ML (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc Natl Acad Sci USA, 110, 12349–12354.

76. Bolotin E, Chellappa K, Hwang-Verslues W, Schnabl JM, Yang C and Sladek FM (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics, 12, 560.

77. Chu SW, Noyes MB, Christensen RG, Pierce BG, Zhu LJ, Weng Z, Stormo GD and Wolfe SA (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res, 22, 1889–1898.

78. Noyes MB, Christensen RG, Wakabayashi A, Stormo GD, Brodsky MH and Wolfe SA (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell, 133, 1277–1289.

79. Fang B, Mane-Padros D, Bolotin E, Jiang T and Sladek FM (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res, 40, 5343–5356.

80. Zhou T, Yang L, Lu Y, Dror I, Dantas Machado AC, Ghane T, Di Felice R and Rohs R (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res, 41, W56–W62.

81. Badis G, Chan ET, van Bakel H, Pena-Castillo L, Tillo D, Tsui K, Carlson CD, Gossett AJ, Hasinoff MJ, Warren CL et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol Cell, 32, 878–887.

82. Kim J and Struhl K (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res, 23, 2531–2537.

83. Vinson C, Myakishev M, Acharya A, Mir AA, Moll JR and Bonovich M (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol Cell Biol, 22, 6321–6335.

84. Kuo D, Licon K, Bandyopadhyay S, Chuang R, Luo C, Catalana J, Ravasi T, Tan K and Ideker T (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res, 20, 1672–1678.

85. Khorasanizadeh S and Rastinejad F (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem Sci, 26, 384–390.

86. Kurokawa R, DiRenzo J, Boehm M, Sugarman J, Gloss B, Rosenfeld MG, Heyman RA and Glass CK (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature, 371, 528–531.

87. Luscombe NM, Austin SE, Berman HM and Thornton JM (2000) An overview of the structures of protein-DNA complexes. Genome Biol, 1, REVIEWS001.

88. Perkins AS, Fishel R, Jenkins NA and Copeland NG (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol Cell Biol, 11, 2665–2674.

89. Delwel R, Funabiki T, Kreider BL, Morishita K and Ihle JN (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol Cell Biol, 13, 4291–4300.

90. Funabiki T, Kreider BL and Ihle JN (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene, 9, 1575–1581.

91. Klemm JD, Rould MA, Aurora R, Herr W and Pabo CO (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.

92. Verrijzer CP, Alkema MJ, van Weperen WW, Van Leeuwen HC, Strating MJ and van der Vliet PC (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J, 11, 4993–5003.

93. Kemler I, Schreiber E, Muller MM, Matthias P and Schaffner W (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J, 8, 2001–2008.

94. Kersten S, Gronemeyer H and Noy N (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J Biol Chem, 272, 12771–12777.

95. Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327–339.

96. Kim JB, Spotts GD, Halvorsen YD, Shih HM, Ellenberger T, Towle HC and Spiegelman BM (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol Cell Biol, 15, 2582–2588.

97. Parraga A, Bellsolell L, Ferre-D'Amare AR and Burley SK (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 A resolution. Structure, 6, 661–672.

98. Fordyce PM, Pincus D, Kimmig P, Nelson CS, El-Samad H, Walter P and DeRisi JL (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc Natl Acad Sci USA, 109, E3084–E3093.

99. Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y and Pritchard JK (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res, 21, 447–455.

100. Bulyk ML (2003) Computational prediction of transcription-factor binding site locations. Genome Biol, 5, 201.

101. Levine M and Tjian R (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson AD (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr Opin Genet Dev, 5, 552–558.

103. Wolberger C (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu Rev Biophys Biomol Struct, 28, 29–56.

104. Johnson AD (1992) In: McKnight SL and Yamamoto KR (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim S, Brostromer E, Xing D, Jin J, Chong S, Ge H, Wang S, Gu C, Yang L, Gao YQ et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie CW, Hagman J and Wolberger C (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol Cell, 8, 1267–1276.

107. Joshi R, Passner JM, Rohs R, Jain R, Sosinsky A, Crickmore MA, Jacob V, Aggarwal AK, Honig B and Mann RS (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo HD and Mann RS (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev, 13, 1704–1716.

109. Carey M (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne M and Gann A (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika M, Williams AJ, Chen G, Collins T and Thanos D (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol Cell, 1, 277–287.

112. Kim TK and Maniatis T (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol Cell, 1, 119–129.

113. Masternak K, Muhlethaler-Mottet A, Villard J, Zufferey M, Steimle V and Reith W (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev, 14, 1156–1166.

114. del Blanco B, Garcia-Mariscal A, Wiest DL and Hernandez-Munain C (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J Immunol, 188, 3278–3293.

115. Benoist C and Mathis D (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu Rev Immunol, 8, 681–715.

116. Hall JM, McDonnell DP and Korach KS (2002) Allosteric regulation of estrogen receptor structure, function, and coactivator recruitment by different estrogen response elements. Mol Endocrinol, 16, 469–486.

117. Leung TH, Hoffmann A and Baltimore D (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing SH, Pufall MA, So AY, Bates DL, Chen L and Yamamoto KR (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal N, Tomar A and Nagaraju J (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res, 39, 9574–9591.

120. Wang VY, Huang W, Asagiri M, Spann N, Hoffmann A, Glass C and Ghosh G (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports, 2, 824–839.

121. Remenyi A, Scholer HR and Wilmanns M (2004) Combinatorial control of gene expression. Nat Struct Mol Biol, 11, 812–815.

122. Chen YQ, Ghosh S and Ghosh G (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat Struct Biol, 5, 67–73.

123. Fujita T, Nolan GP, Ghosh S and Baltimore D (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev, 6, 775–787.

124. Scully KM, Jacobson EM, Jepsen K, Lunyak V, Viadiu H, Carriere C, Rose DW, Hooshmand F, Aggarwal AK and Rosenfeld MG (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi A, Tomilin A, Pohl E, Lins K, Philippsen A, Reinbold R, Scholer HR and Wilmanns M (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol Cell, 8, 569–580.

126. Chasman D, Cepek K, Sharp PA and Pabo CO (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev, 13, 2650–2657.

127. Ingraham HA, Chen RP, Mangalam HJ, Elsholtz HP, Flynn SE, Lin CR, Simmons DM, Swanson L and Rosenfeld MG (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham HA, Flynn SE, Voss JW, Albert VR, Kapiloff MS, Wilson L and Rosenfeld MG (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb R, Huang CC, Aufiero DJ and Herr W (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol Cell Biol, 21, 4700–4712.

130. Kuras L, Cherest H, Surdin-Kerjan Y and Thomas D (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J, 15, 2519–2529.

131. Blaiseau PL and Thomas D (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J, 17, 6327–6336.

132. Wong D, Teixeira A, Oikonomopoulos S, Humburg P, Lone IN, Saliba D, Siggers T, Bulyk M, Angelov D, Dimitrov S et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol, 12, R70.

133. Ferraris L, Stewart AP, Kang J, DeSimone AM, Gemberling M, Tantin D and Fairbrother WG (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res, 21, 1055–1064.

134. Bolotin E, Liao H, Ta TC, Yang C, Hwang-Verslues W, Evans JR, Jiang T and Sladek FM (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.


however, some proteins with only a single DBD can also bind to multiple, distinctly different DNA sites. One such example is mouse TF sterol regulatory element-binding factor 1 (SREBF1) (Figure 1D), the ortholog of human TF sterol regulatory element-binding protein 1 (SREBP) (59,96). SREBF/SREBP proteins belong to the basic helix–loop–helix (bHLH) family of TFs that typically recognize symmetric E-boxes (5′-CAnnTG-3′). Unlike most bHLH proteins, SREBF/SREBP can bind the asymmetric sterol regulatory element 5′-ATCACnCCAC-3′. A co-crystal structure of the DNA-binding domain of human SREBP-1a bound to DNA revealed an asymmetric DNA–protein interface, with one monomer binding the E-box half-site (5′-ATCAC-3′) using protein–DNA contacts typical of bHLH proteins, and the second monomer recognizing the non-E-box half-site (5′-GTGGG-3′) using entirely different protein–DNA contacts (96,97).

Thus, in the case of SREBF/SREBP, residues in the DBD are responsible for the multiple modes of DNA binding. Regions outside the DNA-binding domain of the protein can also enable alternate binding modes. For example, a region N-terminal to the basic DBD of yeast TF Hac1 is required for dual recognition modes. Mutations in this region were shown to preferentially reduce binding to one of the modes, whereas an arginine in the basic domain was shown by mutation to be crucial for the other mode of binding (98). This indicates that the protein can bind DNA using two distinct conformations, and that individual residues (within and outside the DBD) play specific roles within each binding mode (98).

The ability of proteins to utilize different DNA-binding

modes presents difficulties for, or even precludes, the construction of simple DNA-binding models for many proteins. The presence of distinct modes usually requires that multiple DNA-binding models be used to represent a protein's specificity. This can cause difficulties when the data available to construct the DNA-binding model contain a mixture of multiple types of binding motifs, such as in genome-wide chromatin immunoprecipitation (ChIP)-seq datasets. In some situations, the DNA-binding modes can potentially be separated and independently characterized. For example, one could independently characterize the binding of individual DBDs when a protein contains multiple DBDs. However, even these situations can lead to difficulties, as illustrated by the case of Oct-1, where the autonomous DBDs can function either independently or together; examining them separately will cause one to miss the binding sites recognized when both DBDs are involved (Figure 1B).

Figure 1. Multiple modes of DNA binding. Schematized are examples illustrating four mechanistic categories by which proteins recognize different DNA sites via distinct structural modes. Variable spacing: Gcn4 dimers can bind to bipartite sites with half-sites separated by variable-length spacers (82), shown for overlapping and adjacent half-sites; motifs from (73,74). Multiple DBDs: Oct-1 can bind to different DNA sites using different arrangements of its two DNA-binding domains (91,92); motifs from (24). Multi-meric binding: Elk1 can bind either as a monomer or as a dimer (95), shown for a monomer site and a dimer site. Alternate structural conformations: SREBP can bind to different DNA sites by adopting alternate structural conformations (96,97); motifs from (44).
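The idea of representing one protein by several binding models can be sketched in code. The example below is an illustrative assumption rather than a method from the studies cited: it builds two toy PWMs from the E-box and sterol regulatory element consensus sequences quoted in the text, using invented counts, scores a sequence under each model on the forward strand only, and reports the best-scoring mode. Scores are divided by each model's maximum so that models of different lengths can be compared; that normalization is a design choice made here for simplicity.

```python
import math

def counts_from_consensus(consensus, n_sites=20):
    """Toy count columns from a consensus string ('n' means any base)."""
    columns = []
    for base in consensus:
        if base in "nN":
            columns.append({b: n_sites / 4 for b in "ACGT"})
        else:
            columns.append({base: n_sites})
    return columns

def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert count columns into log2-odds weights (uniform background)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm

def best_window(seq, pwm):
    """Return (normalized score, start) of the best forward-strand window."""
    max_score = sum(max(col.values()) for col in pwm)
    best = (float("-inf"), -1)
    for start in range(len(seq) - len(pwm) + 1):
        score = sum(col[seq[start + i]] for i, col in enumerate(pwm))
        best = max(best, (score / max_score, start))
    return best

def best_mode(seq, models):
    """Score seq under each binding-mode model; return (mode, score, start)."""
    scored = [(best_window(seq, pwm), name) for name, pwm in models.items()]
    (score, start), name = max(scored)
    return name, score, start

# Hypothetical two-mode description of a single protein.
models = {
    "E-box": pwm_from_counts(counts_from_consensus("CACGTG")),
    "SRE": pwm_from_counts(counts_from_consensus("ATCACnCCAC")),
}
print(best_mode("GGATCACGCCACTT", models))
print(best_mode("TTCACGTGAA", models))
```

A single averaged PWM fitted to a mixture of both site types would blur the two motifs, which is why the modes are kept as separate models here.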

Multi-protein recognition codes

The DNA-binding specificity of a TF is a primary determinant of where it will bind in the genome (99,100). However, transcriptional regulation often involves the assembly of multi-protein complexes on DNA (101,102), and it has been shown that multi-protein complexes can exhibit novel DNA-binding specificities not predictable from measurements of the individual proteins (40,48). Therefore, in efforts aimed at understanding the biophysical determinants of specificity in gene regulation, it is of primary importance to also consider how multi-protein complexes can lead to new or enhanced specificities. Here, we will review the varied mechanisms by which novel DNA-binding specificities can arise when proteins, both DNA-binding and non-DNA-binding, assemble together. We suggest that these mechanisms represent higher-order 'multi-protein' recognition codes: mechanisms by which genomic targeting of regulatory factors is encoded in multi-protein complexes.

Cooperative binding

Cooperative binding of TFs to DNA, usually via protein–protein interactions between adjacently bound TFs, stabilizes the proteins on the DNA and can enhance their individual contributions to transcription of a gene (3,103). Cooperative binding can also affect the DNA-binding specificity and lead to recognition of new binding sites (3) (Figure 2A). One way in which cooperative binding alters specificity is by extending the binding of a TF to include lower-affinity sites as a result of the enhanced overall affinity of the cooperative complex for DNA. A well-studied example is the binding of yeast TFs MATα2, MATa1 and Mcm1, which regulate mating-type genes (103,104). MATα2 binds DNA cooperatively with cofactors MATa1 or Mcm1 to repress different sets of genes. MATα2/MATa1 heterodimers bind to sites found in the promoter regions of haploid-specific genes and repress their expression, whereas MATα2/Mcm1 heterotetramers bind different sites found in mating-type a-specific gene promoters and repress gene expression. Both complexes repress gene expression via recruitment of co-repressor proteins Tup1 and Ssn6 by interactions with MATα2. Therefore, MATα2 functions to recruit co-repressors and repress gene expression, but its DNA-binding specificity, and consequently its target genes, are mediated by cooperative binding with cell-type-specific cofactors. A recent study presented a more subtle form of specificity alteration by cooperative binding, in which DNA binding of one protein can stabilize or destabilize the DNA binding of another protein via deformation of the DNA structure (105). Notably, the cooperative binding was not mediated by direct protein contacts, but by an allosteric mechanism in which DNA structural deformations propagate along the DNA helix. The cooperative allosteric effects were shown to operate over distances of up to 16 bp.
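The ‘assisted binding’ argument above can be made concrete with a standard two-site equilibrium (statistical-weight) model. The sketch below is illustrative only: the function name, the concentrations, the dissociation constants and the cooperativity factor `omega` are assumed values chosen for the example, not measurements from the studies cited here.

```python
def occupancy_with_partner(conc_a, kd_a, conc_b, kd_b, omega):
    """Equilibrium occupancy of site A when a partner protein B can bind
    an adjacent site. omega > 1 means favorable cooperativity; omega = 1
    means the two proteins bind independently.

    States and statistical weights:
      empty: 1, A only: a, B only: b, A and B together: omega * a * b
    """
    a = conc_a / kd_a
    b = conc_b / kd_b
    partition = 1.0 + a + b + omega * a * b
    return (a + omega * a * b) / partition


if __name__ == "__main__":
    # Hypothetical numbers: protein A at 10 nM on a weak site (Kd = 1000 nM),
    # partner B at 100 nM on a strong adjacent site (Kd = 10 nM).
    alone = occupancy_with_partner(10.0, 1000.0, 0.0, 10.0, omega=1.0)
    independent = occupancy_with_partner(10.0, 1000.0, 100.0, 10.0, omega=1.0)
    cooperative = occupancy_with_partner(10.0, 1000.0, 100.0, 10.0, omega=100.0)
    print(f"alone: {alone:.3f}  independent: {independent:.3f}  "
          f"cooperative: {cooperative:.3f}")
```

In this toy setting the weak site is essentially unoccupied unless the partner is present and the interaction is cooperative, which is the sense in which cooperativity extends binding to lower-affinity sites.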

A second, more active mechanism has been described in which cooperative binding alters the residue–base contacts that a TF can make with the DNA, thereby altering its inherent binding specificity (3,40) (Figure 2A). One example is the binding of the TFs Ets-1 and Pax5, where Pax5 alters the DNA contacts made by Ets-1, making it more permissive to alternate DNA sequences (106). Bound alone to DNA, Ets-1 binds with high affinity to 5′-GGAA-3′ and with lower affinity to 5′-GGAG-3′. A tyrosine residue (Y395) in the Ets-1 recognition helix interacts with the fourth base position of these sites and mediates the preference for adenine over guanine. However, in complex with Pax5, the Y395 residue adopts an alternate conformation and no longer interacts with the DNA at this base position. This makes Ets-1 DNA binding permissive for both sequences and therefore broadens the specificity of the Pax5/Ets-1 complex to include the suboptimal 5′-GGAG-3′ Ets-1 half-site. Another example has been elucidated in a series of studies on Drosophila Hox proteins and their cofactor Extradenticle (Exd) (40,107). The Hox protein Sex combs reduced (Scr) binds with the cofactor protein Exd preferentially to a paralog-specific site (fkh250) found in the fork head (fkh) gene over a consensus site (fkh250con) recognized by other Hox/Exd complexes (107,108). The preferential binding of the Scr/Exd complex to the fkh250 site involves the stabilization of a flexible ‘arm’ region N-terminal to the Scr homeodomain (107). Two of the stabilized residues in the N-terminal arm region (Arg3 and His-12) insert into the fkh250 DNA minor groove, which is narrower than the same region in the fkh250con sequence and more electrostatically favorable for the Arg and His residues. Therefore, the enhanced specificity of the Scr/Exd complex for the fkh250 sequence is due to local DNA shape (i.e. minor groove width) readout by the Scr Arg and His residues when stabilized by the cooperatively bound Exd cofactor. This work highlights how the intrinsic specificity of a monomeric homeodomain (Scr) can be altered or refined through interaction with a protein cofactor (Exd), a characteristic called latent specificity. A recent comprehensive analysis of other Hox proteins extended these results and demonstrated latent specificities for all the Drosophila Hox proteins (40). An HT-SELEX/SELEX-seq assay was used to measure and compare the DNA-binding specificity of all eight Drosophila Hox proteins alone and in complex with Exd. As seen for Scr/Exd, binding with the Exd cofactor revealed latent specificity differences between the Hox proteins not observable when Hox proteins were examined as monomers.
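The Ets-1/Pax5 case can be illustrated with a toy additive model in which each core position contributes a penalty to binding, and the cofactor simply removes the penalty at the fourth position. This is a sketch under stated assumptions: the penalty values (in units of kT), the table names and the function name are hypothetical, and mismatches at positions 1–3 are treated as non-binding purely to keep the example short.

```python
import math

# Hypothetical per-position penalties (units of kT) for the 4-bp Ets-1 core.
# Bases missing from a position's table are treated as incompatible.
ETS1_ALONE = [{"G": 0.0}, {"G": 0.0}, {"A": 0.0}, {"A": 0.0, "G": 1.5}]
# With Pax5 bound, the position-4 contact (Y395) is lost, so A and G are
# assumed to be equally tolerated.
ETS1_WITH_PAX5 = [{"G": 0.0}, {"G": 0.0}, {"A": 0.0}, {"A": 0.0, "G": 0.0}]


def relative_affinity(site, penalties):
    """Relative affinity exp(-sum of penalties) of a site under an
    additive (independent-position) model."""
    if len(site) != len(penalties):
        raise ValueError("site length must match the model length")
    total = sum(pos.get(base, math.inf)
                for pos, base in zip(penalties, site.upper()))
    return math.exp(-total)


if __name__ == "__main__":
    for label, model in (("Ets-1 alone", ETS1_ALONE),
                         ("Ets-1 with Pax5", ETS1_WITH_PAX5)):
        ratio = relative_affinity("GGAG", model) / relative_affinity("GGAA", model)
        print(f"{label}: GGAG/GGAA relative affinity = {ratio:.2f}")
```

The point of the sketch is only that a change in one residue–base contact is equivalent to editing one column of the binding model; it does not capture the structural rearrangement itself.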

Cooperative cofactor recruitment

The recruitment of non-DNA-binding cofactors (i.e. coactivators and corepressors) to promoters and enhancers by DNA-binding TFs is central to gene regulation (101). ‘Cooperative’ recruitment occurs when a cofactor can make simultaneous interactions with multiple DNA-bound TFs; the result is a synergistic enhancement in cofactor recruitment (109,110). Cooperative cofactor recruitment integrates the contributions from multiple TFs to achieve maximal gene expression and is a powerful mechanism for establishing an integrative ‘AND-type’ regulatory logic in gene transcription (111–114). At the same time, cooperative recruitment enhances the binding specificity of the cofactor. The genomic location of the cofactor is determined not by the presence of a single TF, but by the less frequent (i.e. more specific) coincident presence of multiple TFs (Figure 2B).

Figure 2. Multi-protein recognition codes. Schematized are examples illustrating four mechanistic categories by which targeting of proteins to distinct DNA sites involves the assembly of multi-protein complexes. (A) Cooperative binding. Enhanced complex stability due to cooperativity allows binding to lower-affinity (weak) sites (assisted binding) (103,104,106); inter-protein interactions alter or stabilize protein–DNA contacts (e.g. altered side-chain conformation, stabilization of an N-terminal tail), altering DNA-binding specificity (40,106,107). (B) Cooperative recruitment. Cofactor recruitment requires multiple factors (rather than only one), allowing more specific cofactor targeting (109–114). (C) Allostery. Allosteric control of cofactor recruitment limits cofactor recruitment to only a subset of the TF binding sites (recruiting versus non-recruiting sites) (116–121,124,125). (D) Cofactor-based targeting. Enhanced binding of a multi-protein complex to specialized composite sites is mediated by interactions between a non-DNA-binding cofactor and an auxiliary motif (48,129).

A paradigm for cooperative cofactor recruitment is the multi-protein enhanceosome complex, in which multiple TFs cooperatively assemble on DNA and coordinately recruit a common cofactor. A well-studied example is the enhanceosome that binds to the Interferon beta (IFNβ) gene promoter and cooperatively recruits the coactivator CREB-binding protein (CBP) (111,112). In response to viral infection, the TFs ATF-2, c-Jun, Irf3, Irf7 and NF-κB, facilitated by the architectural factor Hmga1, bind cooperatively to the IFNβ promoter. Multiple proteins in the DNA-bound complex contribute to the cooperative recruitment of CBP and operate synergistically to enhance transcription (111). While NF-κB, the ATF-2/c-Jun dimer and Irf factors can recruit CBP individually, studies have demonstrated that removal of the activation domains from Irf or NF-κB factors resulted in complete loss of CBP recruitment, highlighting the cooperative nature of the CBP cofactor recruitment (111). A similar situation occurs in the cooperative recruitment of class II major histocompatibility complex transactivator (CIITA) to a set of conserved transcriptional control elements found in the promoters of major histocompatibility complex class II (MHC-II) genes (113,115). The control sequences contain four DNA elements, termed the W, X, X2 and Y boxes, with a conserved sequence spacing and orientation across the different promoters. The TFs X-box binding protein (RFX), X2-binding protein (X2BP) and Nuclear factor Y (NF-Y) bind to the X, X2 and Y boxes, respectively, and form a cooperative nucleoprotein enhanceosome complex. This multi-protein complex recruits CIITA to the MHC-II promoters via multiple weak interactions mediated by different components of the enhanceosome complex (113). Removal of interactions from any of these component proteins leads to significant loss of CIITA recruitment.
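The ‘AND-type’ behavior of cooperative recruitment can be sketched with a simple equilibrium model in which every DNA-bound TF that contacts the cofactor multiplies its effective affinity for the site. All names and numbers below (`recruitment_probability`, the basal dissociation constant, the tenfold enhancement per contact) are assumptions for illustration and are not taken from the enhanceosome studies cited above.

```python
import math


def recruitment_probability(cofactor_conc, base_kd, contact_factors):
    """Probability that a cofactor occupies a site, given a basal
    dissociation constant and one multiplicative affinity enhancement
    per DNA-bound TF that it contacts (an empty list means no contacts)."""
    weight = (cofactor_conc / base_kd) * math.prod(contact_factors)
    return weight / (1.0 + weight)


if __name__ == "__main__":
    # Hypothetical values: each weak TF contact improves affinity tenfold.
    for n_tfs in range(4):
        p = recruitment_probability(1.0, 1000.0, [10.0] * n_tfs)
        print(f"{n_tfs} contacting TFs: recruitment probability = {p:.3f}")
```

With these assumed values, a single contact leaves the cofactor almost entirely unrecruited, whereas three simultaneous contacts give substantial occupancy; removing any one contact therefore causes a disproportionate loss, consistent with the qualitative behavior described for CBP and CIITA.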

Allosteric effects in cofactor recruitment

DNA can act as a sequence-specific allosteric ligand that alters the function of a DNA-bound TF (116–121). A common mechanism is a DNA sequence-dependent effect on cofactor recruitment: a TF bound to one set of DNA sites will recruit a specific cofactor, whereas the same TF bound to a different set of sites will not. Therefore, allosteric control of cofactor recruitment functionally partitions the DNA binding sites of a TF and, consequently, refines the binding specificity of the recruited cofactor (Figure 2C).

Allosteric mechanisms have been reported for both the glucocorticoid (GR) and the estrogen (ER) nuclear receptors (116,118). The DNA sequence bound by ER can influence its transcriptional activity and recruitment of LXXLL-containing peptides and p160 cofactors (SRC-1, GRIP1, ACTR) (116). The DNA sequence bound by GR, called the GR response element (GRE), was shown to influence gene expression, but the effect did not correlate with GR-binding affinity, suggesting a mechanism that involves differential recruitment of cofactor proteins (118). Furthermore, knockdown of Brahma, a subunit of the SWI/SNF nucleosome remodeling complex, affected GR-dependent gene expression in a GRE sequence-dependent manner, suggesting that the composition of the DNA-bound protein complexes was allosterically regulated. Allosteric control of cofactor recruitment has also been described for different NF-κB family dimers (117,120). Single-base changes in NF-κB binding sites have been demonstrated to affect recruitment of the TF Irf3 by DNA-bound p65/Rel-containing dimers (117), as well as the recruitment of the cofactors histone deacetylase 3 (HDAC3) and Tip60 to a DNA-bound ternary complex containing p50 homodimers and Bcl3 (120).

The mechanism of allosteric control for GR and NF-κB appears to be mediated by structural differences between TFs bound to different DNA sites. Structural studies of GR bound to different GREs demonstrated that the structure of a ‘lever arm’ loop region connecting the DNA recognition helix and the ligand binding domain was highly dependent on the GRE sequence (118). Studies of NF-κB dimers bound to different κB sites have shown that DNA sequence differences can result in structural rearrangements (rotations and translations) of one dimer subunit (19,21,122). Protease protection assays and detailed binding studies have also suggested DNA sequence-dependent changes in protein conformation (72,123).

In contrast to the situation for GR and NF-κB, where allostery is mediated by structural differences between the DNA-bound complexes, allostery involving the POU factors operates via larger, more global differences in quaternary structure when bound to different DNA sites. POU factors, such as Oct-1 described above, have two DBDs connected by a flexible linker: a POU-specific domain (POUS) and a POU homeodomain (POUHD) (121). POU factors can bind as homo- and heterodimers to a diverse set of DNA-binding sites in which the relative orientations and spacing of the POUS and POUHD can vary and subsequently lead to differential cofactor recruitment (121,124). The POU factors Oct-1 and Oct-2 can bind as homodimers to two distinct response elements, PORE and MORE, found in the promoters of B-cell-specific genes (121). When bound to the PORE sequence, Oct dimers can recruit the transcriptional coactivator Oct-binding factor 1 (OBF-1); however, dimers bound to the MORE sequence do not recruit OBF-1. Crystal structures of the DNA-bound Oct-1 dimer have revealed that in the MORE-bound dimer, the OBF-1 binding site on Oct-1 is blocked due to the relative orientation of the dimer subunits, but in the PORE-bound dimer, the site is exposed, allowing OBF-1 recruitment (125,126). A similar situation arises with the Pit-1 POU factor, in which binding site differences lead to cell-type-specific expression patterns and differential recruitment of the N-CoR co-repressor complex (124,127,128). Studies revealed that an extra 2 bp between the POUS and POUHD half-sites found in the growth hormone (GH) gene promoter, compared with a similar-affinity site in the prolactin gene promoter, altered the quaternary structure of the DNA-bound Pit-1 and enabled the recruitment of the N-CoR co-repressor complex to the GH site (124).

Cofactor-based specificity

Targeting of non-DNA-binding cofactors to DNA is traditionally viewed as being mediated solely via protein–protein interactions with DNA-bound TFs. In other words, the cofactor is ‘recruited’ and interacts with DNA indirectly through the TFs. However, several studies have described a provocative alternative mechanism in which recruited cofactors interact directly with DNA when part of a larger multi-protein complex (48,129). Further, as part of this larger complex, the non-DNA-binding cofactors mediate sequence-specific interactions that preferentially stabilize binding to composite DNA sites containing specific auxiliary motifs (Figure 2D). This represents yet another mechanism to enhance the DNA sequence specificity of multi-protein regulatory complexes.

The regulation of sulfur metabolism genes in yeast involves a multi-protein complex composed of a sequence-specific TF, Cbf1, and two non-DNA-binding cofactors, Met28 and Met4 (130,131). Cbf1 is a bHLH protein and binds as a homodimer to E-box sites with a consensus 5′-CACGTG-3′ core (131). A PBM-based analysis revealed that the Met4/Met28/Cbf1 complex binds preferentially to composite DNA sites in which the Cbf1 E-box is flanked by an adjacent 5′-RYAAT-3′ ‘recruitment’ motif (i.e. 5′-RYAATNNCACGTG-3′) (48). High-affinity binding to the composite sites is highly cooperative and requires the full heteromeric complex. Sequence analysis, modeling and binding assays suggest that the recruitment motif is recognized by the non-DNA-binding Met28 subunit. A highly similar situation has also been described for the multi-protein Oct-1/HCF-1/VP16 complex that regulates expression of the herpes simplex virus immediate-early genes by binding to a composite 5′-TAATGARAT sequence in the gene promoters (129). The complex is composed of the DNA-binding TF Oct-1 and two non-DNA-binding cofactors, the viral activator VP16 and the host cell factor 1 (HCF-1). Oct-1 binds to the 5′-TAAT sequence, and a DNA-binding surface of VP16 mediates interactions with the 5′-GARAT sequence. Similar to the situation in yeast, the VP16 cofactor does not interact with DNA well on its own, but when stabilized on DNA as part of a larger multi-protein complex, it can adopt a configuration that mediates recognition of the 5′-GARAT subsequence. Therefore, in both situations, multi-protein complexes preferentially bind to composite binding sites composed of a recruitment motif (5′-RYAAT-3′ and 5′-GARAT-3′) and an adjacent TF-binding motif (5′-CACGTG-3′ and 5′-TAAT-3′) that are recognized by a non-DNA-binding cofactor (Met28 and VP16) and sequence-specific TF (Cbf1 and Oct-1) subunits, respectively.
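Composite sites of this kind are written in IUPAC degenerate notation (R = A or G, Y = C or T, N = any base), so locating candidate sites in a sequence reduces to a pattern search on both strands. The following is a minimal sketch; the function names and the test sequence are invented for illustration, and a match only flags a candidate site, not measured binding.

```python
import re

# Subset of the IUPAC nucleotide codes needed for the motifs in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as RYAATNNCACGTG into a regex."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, motif):
    """Return sorted (start, strand) pairs for every motif match.
    Starts are 0-based positions on the forward strand; overlapping
    matches are reported because a lookahead is used."""
    seq = seq.upper()
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    width = len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    hits += [(len(seq) - m.start() - width, "-")
             for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)


if __name__ == "__main__":
    example = "TTGCAATGGCACGTGAA"  # invented test sequence
    print(find_sites(example, "RYAATNNCACGTG"))  # composite site
    print(find_sites(example, "CACGTG"))         # E-box core alone
```

Because the E-box core is palindromic it is found on both strands, whereas the composite site is directional; the strand label therefore indicates on which side of the E-box the recruitment motif lies.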

SUMMARY

Complexities in DNA binding operate at multiple levels and complicate efforts to construct simple models or recognition codes of DNA binding. Flexible protein–DNA interactions and non-local structural effects confound simple descriptions of DNA base preferences and require the use of binding models, such as PWMs, that can account for the resulting sequence degeneracy. The ability of proteins to bind DNA via multiple modes further complicates the situation and leads to the requirement of multiple binding models for many proteins. Fortunately, HT technologies are not only increasing the rate at which DNA-binding proteins are being characterized, but are providing the comprehensive binding data needed to construct models that involve multiple DNA-binding modes (23,24,27,28,35,38–40,42,43,48,59,73,74,79,132–134). An added benefit of the comprehensive nature of certain HT datasets, such as comprehensive k-mer or binding site-level data, is that predictions of binding sites in the genome can be performed directly using the measured binding data (23,24,40,43,133).

Multi-protein complexes add tremendously to the DNA-binding diversity of proteins and provide a key mechanism to integrate signals in gene regulation. However, these higher-order multi-protein mechanisms cause difficulties in constructing models, primarily because DNA-binding models of proteins examined in isolation may not capture the binding sites utilized in vivo when cofactors are present. Here again, HT technologies are being used successfully to characterize binding of multi-protein complexes, revealing recognition codes mediated by cooperative binding (40,133) and cofactor-mediated targeting (48). The continued application of HT technologies to examine DNA binding of multi-protein complexes will undoubtedly provide increasingly refined binding models and new insights into gene targeting in vivo. Furthermore, just as the application of HT technologies has drawn our attention to the prevalence of TFs that exhibit subtle binding preferences or bind DNA via multiple modes, studies examining multi-protein complexes will likely identify widespread use of higher-order mechanisms, such as allostery, cooperative recruitment and cofactor-mediated targeting. Integrating these datasets with whole-genome chromatin immunoprecipitation (ChIP) and expression datasets will lead to much more complete and sophisticated descriptions of specificity in gene regulation.
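The direct use of comprehensive k-mer data mentioned in this summary amounts to a table lookup along the sequence, with no fitted model in between. The sketch below assumes a dictionary mapping k-mers to a measured score (for example, a PBM-derived enrichment score); the function names, the two scores in the table and the test sequence are hypothetical placeholders rather than real measurements.

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan_with_kmer_scores(seq, kmer_scores, k, default=0.0):
    """Score every k-bp window of seq by direct lookup in a table of
    measured k-mer scores. Each window takes the larger of the scores
    for the k-mer and its reverse complement; k-mers absent from the
    table receive `default`. Returns a list of (position, score)."""
    seq = seq.upper()
    results = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        score = max(kmer_scores.get(kmer, default),
                    kmer_scores.get(reverse_complement(kmer), default))
        results.append((i, score))
    return results


if __name__ == "__main__":
    # Hypothetical 6-mer scores standing in for a measured dataset.
    toy_scores = {"CACGTG": 0.49, "CACGTT": 0.30}
    for position, score in scan_with_kmer_scores("TTCACGTGAACGTGTT", toy_scores, k=6):
        if score > 0.0:
            print(position, score)
```

A practical analysis would use the full measured table and a score threshold chosen for that dataset; the lookup itself is unchanged.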

FUNDING

NIH [K22AI093793 to T.S.]; PhRMA Foundation Research Starter Grant (to R.G.); Basil O’Connor Research Award 5-FY13-212 from March of Dimes Foundation (to R.G.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl Acad. Sci. USA, 73, 804–808.

2. Rohs,R., Jin,X., West,S.M., Joshi,R., Honig,B. and Mann,R.S. (2010) Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem., 79, 233–269.

3. Garvie,C.W. and Wolberger,C. (2001) Recognition of specific DNA sequences. Mol. Cell, 8, 937–946.

4. Jayaram,B. and Jain,T. (2004) The role of water in protein-DNA recognition. Annu. Rev. Biophys. Biomol. Struct., 33, 343–361.

5. Reddy,C.K., Das,A. and Jayaram,B. (2001) Do water molecules mediate protein-DNA recognition? J. Mol. Biol., 314, 619–632.

6. Miller,J.C. and Pabo,C.O. (2001) Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J. Mol. Biol., 313, 309–315.

7. Baldwin,E.P., Martin,S.S., Abel,J., Gelato,K.A., Kim,H., Schultz,P.G. and Santoro,S.W. (2003) A specificity switch in selected cre recombinase variants is mediated by macromolecular plasticity and water. Chem. Biol., 10, 1085–1094.

8. Otwinowski,Z., Schevitz,R.W., Zhang,R.G., Lawson,C.L., Joachimiak,A., Marmorstein,R.Q., Luisi,B.F. and Sigler,P.B. (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature, 335, 321–329.

9. Rohs,R., West,S.M., Sosinsky,A., Liu,P., Mann,R.S. and Honig,B. (2009) The role of DNA shape in protein-DNA recognition. Nature, 461, 1248–1253.

10. Chang,Y.P., Xu,M., Machado,A.C., Yu,X.J., Rohs,R. and Chen,X.S. (2013) Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Reports, 3, 1117–1127.

11. Werner,M.H., Gronenborn,A.M. and Clore,G.M. (1996) Intercalation, DNA kinking, and the control of transcription. Science, 271, 778–784.

12. Kim,J.L., Nikolov,D.B. and Burley,S.K. (1993) Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature, 365, 520–527.

13. Rohs,R., Sklenar,H. and Shakked,Z. (2005) Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure, 13, 1499–1509.

14. Pabo,C.O. and Nekludova,L. (2000) Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J. Mol. Biol., 301, 597–624.

15. Suzuki,M., Gerstein,M. and Yagi,N. (1994) Stereochemical basis of DNA recognition by Zn fingers. Nucleic Acids Res., 22, 3397–3405.

16. Kortemme,T., Morozov,A.V. and Baker,D. (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J. Mol. Biol., 326, 1239–1259.

17. Siggers,T.W. and Honig,B. (2007) Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res., 35, 1085–1097.

18. Siggers,T.W., Silkov,A. and Honig,B. (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J. Mol. Biol., 345, 1027–1045.

19. Chen,F.E. and Ghosh,G. (1999) Regulation of DNA binding by Rel/NF-kappaB transcription factors: structural views. Oncogene, 18, 6845–6852.

20. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

21. Chen,Y.Q., Sengchanthalangsy,L.L., Hackett,A. and Ghosh,G. (2000) NF-kappaB p65 (RelA) homodimer uses distinct mechanisms to recognize DNA targets. Structure, 8, 419–428.

22. Yanover,C. and Bradley,P. (2011) Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res., 39, 4564–4576.

23. Berger,M.F., Badis,G., Gehrke,A.R., Talukder,S., Philippakis,A.A., Pena-Castillo,L., Alleyne,T.M., Mnaimneh,S., Botvinnik,O.B., Chan,E.T. et al. (2008) Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 133, 1266–1276.

24. Badis,G., Berger,M.F., Philippakis,A.A., Talukder,S., Gehrke,A.R., Jaeger,S.A., Chan,E.T., Metzler,G., Vedenko,A., Chen,X. et al. (2009) Diversity and complexity in DNA recognition by transcription factors. Science, 324, 1720–1723.

25. Ramirez,C.L., Foley,J.E., Wright,D.A., Muller-Lerch,F., Rahman,S.H., Cornu,T.I., Winfrey,R.J., Sander,J.D., Fu,F., Townsend,J.A. et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat. Methods, 5, 374–375.

26. Wei,G.H., Badis,G., Berger,M.F., Kivioja,T., Palin,K., Enge,M., Bonke,M., Jolma,A., Varjosalo,M., Gehrke,A.R. et al. (2010) Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J., 29, 2147–2160.

27. Meng,X., Smith,R.M., Giesecke,A.V., Joung,J.K. and Wolfe,S.A. (2006) Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques, 40, 179–184.

28. Noyes,M.B., Meng,X., Wakabayashi,A., Sinha,S., Brodsky,M.H. and Wolfe,S.A. (2008) A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res., 36, 2547–2560.

29. Berger,M.F., Philippakis,A.A., Qureshi,A.M., He,F.S., Estep,P.W. 3rd and Bulyk,M.L. (2006) Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol., 24, 1429–1435.

30. Bulyk,M.L., Gentalen,E., Lockhart,D.J. and Church,G.M. (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat. Biotechnol., 17, 573–577.

31. Mukherjee,S., Berger,M.F., Jona,G., Wang,X.S., Muzzey,D., Snyder,M., Young,R.A. and Bulyk,M.L. (2004) Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet., 36, 1331–1339.

32. Linnell,J., Mott,R., Field,S., Kwiatkowski,D.P., Ragoussis,J. and Udalova,I.A. (2004) Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res., 32, e44.

33. Field,S., Udalova,I. and Ragoussis,J. (2007) Accuracy and reproducibility of protein-DNA microarray technology. Adv. Biochem. Eng. Biotechnol., 104, 87–110.

34. Bonham,A.J., Neumann,T., Tirrell,M. and Reich,N.O. (2009) Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res., 37, e94.

35. Maerkl,S.J. and Quake,S.R. (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science, 315, 233–237.

36. Zykovich,A., Korf,I. and Segal,D.J. (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res., 37, e151.

37. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

38. Zhao,Y., Granas,D. and Stormo,G.D. (2009) Inferring binding energies from selected binding sites. PLoS Comput. Biol., 5, e1000590.

39. Jolma,A., Kivioja,T., Toivonen,J., Cheng,L., Wei,G., Enge,M., Taipale,M., Vaquerizas,J.M., Yan,J., Sillanpaa,M.J. et al. (2010) Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res., 20, 861–873.

40. Slattery,M., Riley,T., Liu,P., Abe,N., Gomez-Alcala,P., Dror,I., Zhou,T., Rohs,R., Honig,B., Bussemaker,H.J. et al. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell, 147, 1270–1282.

41. Tantin,D., Gemberling,M., Callister,C. and Fairbrother,W.G. (2008) High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res., 18, 631–639.

42. Warren,C.L., Kratochvil,N.C., Hauschild,K.E., Foister,S., Brezinski,M.L., Dervan,P.B., Phillips,G.N. Jr and Ansari,A.Z. (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc. Natl Acad. Sci. USA, 103, 867–872.

43. Nutiu,R., Friedman,R.C., Luo,S., Khrebtukova,I., Silva,D., Li,R., Zhang,L., Schroth,G.P. and Burge,C.B. (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol., 29, 659–664.

44. Weirauch,M.T. and Hughes,T.R. Dramatic changes in transcription factor binding over evolutionary time. Genome Biol., 11, 122.

45. Gordan,R., Shen,N., Dror,I., Zhou,T., Horton,J., Rohs,R. and Bulyk,M.L. (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports, 3, 1093–1104.

46. Stormo,G.D. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.

47. Maniatis,T., Ptashne,M., Backman,K., Kleid,D., Flashman,S., Jeffrey,A. and Maurer,R. (1975) Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda. Cell, 5, 109–113.

48. Siggers,T., Duyzend,M.H., Reddy,J., Khan,S. and Bulyk,M.L. (2011) Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol. Syst. Biol., 7, 555.

49. Staden,R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res., 12, 505–519.

50. Workman,C.T., Yin,Y., Corcoran,D.L., Ideker,T., Stormo,G.D. and Benos,P.V. (2005) enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res., 33, W389–W392.

51. Man,T.K. and Stormo,G.D. (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res., 29, 2471–2478.

52. Bulyk,M.L., Johnson,P.L. and Church,G.M. (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30, 1255–1261.

53. Tomovic,A. and Oakeley,E.J. (2007) Position dependencies in transcription factor binding sites. Bioinformatics, 23, 933–941.

54. Zhou,Q. and Liu,J.S. (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics, 20, 909–916.

55. Agius,P., Arvey,A., Chang,W., Noble,W.S. and Leslie,C. (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol., 6, pii: e1000916.

56. Ben-Gal,I., Shani,A., Gohr,A., Grau,J., Arviv,S., Shmilovici,A., Posch,S. and Grosse,I. (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics, 21, 2657–2666.

57. Gershenzon,N.I., Stormo,G.D. and Ioshikhes,I.P. (2005) Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res., 33, 2290–2301.

58. Zhao,Y., Ruan,S., Pandey,M. and Stormo,G.D. (2012) Improved models for transcription factor binding site identification using nonindependent interactions. Genetics, 191, 781–790.

59. Weirauch,M.T., Cote,A., Norel,R., Annala,M., Zhao,Y., Riley,T.R., Saez-Rodriguez,J., Cokelaer,T., Vedenko,A., Talukder,S. et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol., 31, 126–134.

60. Mordelet,F., Horton,J., Hartemink,A.J., Engelhardt,B.E. and Gordan,R. (2013) Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics, 29, i117–i125.

61. Siddharthan,R. (2010) Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One, 5, e9722.

62. Mathelier,A. and Wasserman,W.W. (2013) The next generation of transcription factor binding site prediction. PLoS Comput. Biol., 9, e1003214.

63. Jiang,J. and Levine,M. (1993) Binding affinities and cooperative interactions with bHLH activators delimit threshold responses to the dorsal gradient morphogen. Cell, 72, 741–752.

64. White,M.A., Parker,D.S., Barolo,S. and Cohen,B.A. (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol. Syst. Biol., 8, 614.

65. Scardigli,R., Baumer,N., Gruss,P., Guillemot,F. and Le Roux,I. (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development, 130, 3269–3281.

66. Struhl,G., Struhl,K. and Macdonald,P.M. (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell, 57, 1259–1273.

67. Rowan,S., Siggers,T., Lachke,S.A., Yue,Y., Bulyk,M.L. and Maas,R.L. (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev., 24, 980–985.

68. Gaudet,J. and Mango,S.E. (2002) Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science, 295, 821–825.

69. Jaeger,S.A., Chan,E.T., Berger,M.F., Stottmann,R., Hughes,T.R. and Bulyk,M.L. (2010) Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics, 95, 185–195.

70. Tanay,A. (2006) Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res., 16, 962–972.

71. Segal,E., Raveh-Sadka,T., Schroeder,M., Unnerstall,U. and Gaul,U. (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 451, 535–540.

72. Siggers,T., Chang,A.B., Teixeira,A., Wong,D., Williams,K.J., Ahmed,B., Ragoussis,J., Udalova,I.A., Smale,S.T. and Bulyk,M.L. (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat. Immunol., 13, 95–102.

73. Zhu,C., Byers,K.J., McCord,R.P., Shi,Z., Berger,M.F., Newburger,D.E., Saulrieta,K., Smith,Z., Shah,M.V., Radhakrishnan,M. et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res., 19, 556–566.

74. Gordan,R., Murphy,K.F., McCord,R.P., Zhu,C., Vedenko,A. and Bulyk,M.L. (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol., 12, R125.

75. Nakagawa,S., Gisselbrecht,S.S., Rogers,J.M., Hartl,D.L. and Bulyk,M.L. (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc. Natl Acad. Sci. USA, 110, 12349–12354.

76. Bolotin,E., Chellappa,K., Hwang-Verslues,W., Schnabl,J.M., Yang,C. and Sladek,F.M. (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics, 12, 560.

77. Chu,S.W., Noyes,M.B., Christensen,R.G., Pierce,B.G., Zhu,L.J., Weng,Z., Stormo,G.D. and Wolfe,S.A. (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res., 22, 1889–1898.

78. Noyes,M.B., Christensen,R.G., Wakabayashi,A., Stormo,G.D., Brodsky,M.H. and Wolfe,S.A. (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell, 133, 1277–1289.

79. Fang,B., Mane-Padros,D., Bolotin,E., Jiang,T. and Sladek,F.M. (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res., 40, 5343–5356.

80. Zhou,T., Yang,L., Lu,Y., Dror,I., Dantas Machado,A.C., Ghane,T., Di Felice,R. and Rohs,R. (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res., 41, W56–W62.

81. Badis,G., Chan,E.T., van Bakel,H., Pena-Castillo,L., Tillo,D., Tsui,K., Carlson,C.D., Gossett,A.J., Hasinoff,M.J., Warren,C.L. et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell, 32, 878–887.

82. Kim,J. and Struhl,K. (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res., 23, 2531–2537.

83. Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. and Bonovich,M. (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol., 22, 6321–6335.

84. Kuo,D., Licon,K., Bandyopadhyay,S., Chuang,R., Luo,C., Catalana,J., Ravasi,T., Tan,K. and Ideker,T. (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res., 20, 1672–1678.

85. Khorasanizadeh,S. and Rastinejad,F. (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem. Sci., 26, 384–390.

86. Kurokawa,R., DiRenzo,J., Boehm,M., Sugarman,J., Gloss,B., Rosenfeld,M.G., Heyman,R.A. and Glass,C.K. (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature, 371, 528–531.

87. Luscombe,N.M., Austin,S.E., Berman,H.M. and Thornton,J.M. (2000) An overview of the structures of protein-DNA complexes. Genome Biol., 1, REVIEWS001.

88. Perkins,A.S., Fishel,R., Jenkins,N.A. and Copeland,N.G. (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol. Cell. Biol., 11, 2665–2674.

89. Delwel,R., Funabiki,T., Kreider,B.L., Morishita,K. and Ihle,J.N. (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol. Cell. Biol., 13, 4291–4300.

90. Funabiki,T., Kreider,B.L. and Ihle,J.N. (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene, 9, 1575–1581.

91. Klemm,J.D., Rould,M.A., Aurora,R., Herr,W. and Pabo,C.O. (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.

92. Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., Van Leeuwen,H.C., Strating,M.J. and van der Vliet,P.C. (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J., 11, 4993–5003.

93. Kemler,I., Schreiber,E., Muller,M.M., Matthias,P. and Schaffner,W. (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J., 8, 2001–2008.

94. Kersten,S., Gronemeyer,H. and Noy,N. (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J. Biol. Chem., 272, 12771–12777.

95. Jolma,A., Yan,J., Whitington,T., Toivonen,J., Nitta,K.R., Rastas,P., Morgunova,E., Enge,M., Taipale,M., Wei,G. et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327–339.

96. Kim,J.B., Spotts,G.D., Halvorsen,Y.D., Shih,H.M., Ellenberger,T., Towle,H.C. and Spiegelman,B.M. (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol. Cell. Biol., 15, 2582–2588.

97. Parraga,A., Bellsolell,L., Ferre-D'Amare,A.R. and Burley,S.K. (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. Structure, 6, 661–672.

98. Fordyce,P.M., Pincus,D., Kimmig,P., Nelson,C.S., El-Samad,H., Walter,P. and DeRisi,J.L. (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc. Natl Acad. Sci. USA, 109, E3084–E3093.

99. Pique-Regi,R., Degner,J.F., Pai,A.A., Gaffney,D.J., Gilad,Y. and Pritchard,J.K. (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res., 21, 447–455.

100. Bulyk,M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

101. Levine,M. and Tjian,R. (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson,A.D. (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr. Opin. Genet. Dev., 5, 552–558.

103. Wolberger,C. (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu. Rev. Biophys. Biomol. Struct., 28, 29–56.

104. Johnson,A.D. (1992) In: McKnight,S.L. and Yamamoto,K.R. (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim,S., Brostromer,E., Xing,D., Jin,J., Chong,S., Ge,H., Wang,S., Gu,C., Yang,L., Gao,Y.Q. et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 1267–1276.

107. Joshi,R., Passner,J.M., Rohs,R., Jain,R., Sosinsky,A., Crickmore,M.A., Jacob,V., Aggarwal,A.K., Honig,B. and Mann,R.S. (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo,H.D. and Mann,R.S. (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev., 13, 1704–1716.

109. Carey,M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne,M. and Gann,A. (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika,M., Williams,A.J., Chen,G., Collins,T. and Thanos,D. (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol. Cell, 1, 277–287.

112. Kim,T.K. and Maniatis,T. (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol. Cell, 1, 119–129.

113. Masternak,K., Muhlethaler-Mottet,A., Villard,J., Zufferey,M., Steimle,V. and Reith,W. (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev., 14, 1156–1166.

114. del Blanco,B., Garcia-Mariscal,A., Wiest,D.L. and Hernandez-Munain,C. (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J. Immunol., 188, 3278–3293.

115. Benoist,C. and Mathis,D. (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8, 681–715.

116. Hall,J.M., McDonnell,D.P. and Korach,K.S. (2002) Allosteric regulation of estrogen receptor structure, function and coactivator recruitment by different estrogen response elements. Mol. Endocrinol., 16, 469–486.

117. Leung,T.H., Hoffmann,A. and Baltimore,D. (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing,S.H., Pufall,M.A., So,A.Y., Bates,D.L., Chen,L. and Yamamoto,K.R. (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal,N., Tomar,A. and Nagaraju,J. (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res., 39, 9574–9591.

120. Wang,V.Y., Huang,W., Asagiri,M., Spann,N., Hoffmann,A., Glass,C. and Ghosh,G. (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports, 2, 824–839.

121. Remenyi,A., Scholer,H.R. and Wilmanns,M. (2004) Combinatorial control of gene expression. Nat. Struct. Mol. Biol., 11, 812–815.

122. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

123. Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

124. Scully,K.M., Jacobson,E.M., Jepsen,K., Lunyak,V., Viadiu,H., Carriere,C., Rose,D.W., Hooshmand,F., Aggarwal,A.K. and Rosenfeld,M.G. (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi,A., Tomilin,A., Pohl,E., Lins,K., Philippsen,A., Reinbold,R., Scholer,H.R. and Wilmanns,M. (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569–580.

126. Chasman,D., Cepek,K., Sharp,P.A. and Pabo,C.O. (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev., 13, 2650–2657.

127. Ingraham,H.A., Chen,R.P., Mangalam,H.J., Elsholtz,H.P., Flynn,S.E., Lin,C.R., Simmons,D.M., Swanson,L. and Rosenfeld,M.G. (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham,H.A., Flynn,S.E., Voss,J.W., Albert,V.R., Kapiloff,M.S., Wilson,L. and Rosenfeld,M.G. (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb,R., Huang,C.C., Aufiero,D.J. and Herr,W. (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol. Cell. Biol., 21, 4700–4712.

130. Kuras,L., Cherest,H., Surdin-Kerjan,Y. and Thomas,D. (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J., 15, 2519–2529.

131. Blaiseau,P.L. and Thomas,D. (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J., 17, 6327–6336.

132. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-KappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

133. Ferraris,L., Stewart,A.P., Kang,J., DeSimone,A.M., Gemberling,M., Tantin,D. and Fairbrother,W.G. (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res., 21, 1055–1064.

134. Bolotin,E., Liao,H., Ta,T.C., Yang,C., Hwang-Verslues,W., Evans,J.R., Jiang,T. and Sladek,F.M. (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.


multiple DNA-binding models are used to represent a protein’s specificity. This can cause difficulties when the data available to construct the DNA-binding model contain a mixture of multiple types of binding motifs, such as in genome-wide chromatin immunoprecipitation (ChIP)-seq datasets. In some situations, the DNA-binding modes can potentially be separated and independently characterized. For example, one could independently characterize the binding of individual DBDs when a protein contains multiple DBDs. However, even these situations can lead to difficulties, as illustrated by the case of Oct-1, where the autonomous DBDs can function either independently or together; examining them separately will cause one to miss the binding sites recognized when both DBDs are involved (Figure 1B).
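As a minimal sketch of why a protein with several binding modes needs more than one model, the following Python snippet scans a sequence with two position weight matrices and reports the best-scoring mode. The two matrices, their probabilities, the uniform background and all function names are hypothetical and chosen only for illustration; they are not taken from any measured dataset.

```python
import math

# Hypothetical per-position base probabilities for two binding modes
# (illustrative values only, not measured data).
MODE_A = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
MODE_B = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency
MODELS = {"mode_A": MODE_A, "mode_B": MODE_B}


def log_odds(site, pwm):
    """Sum of log2(p / background) over the positions of one candidate site."""
    return sum(math.log2(col[base] / BACKGROUND) for base, col in zip(site, pwm))


def best_mode(seq, models):
    """Scan seq with every model; return (score, mode name, offset) of the top hit."""
    hits = []
    for name, pwm in models.items():
        width = len(pwm)
        for i in range(len(seq) - width + 1):
            hits.append((log_odds(seq[i:i + width], pwm), name, i))
    return max(hits)


score, mode, offset = best_mode("TTATGTT", MODELS)
print(mode, offset, round(score, 2))
```

A single matrix averaged over both modes would score neither site well, which is the practical reason for keeping one model per mode.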

Multi-protein recognition codes

The DNA-binding specificity of a TF is a primary determinant of where it will bind in the genome (99,100). However, transcriptional regulation often involves the assembly of multi-protein complexes on DNA (101,102), and it has been shown that multi-protein complexes can exhibit novel DNA-binding specificities not predictable from measurements of the individual proteins (40,48). Therefore, in efforts aimed at understanding the biophysical determinants of specificity in gene regulation, it is of primary importance to also consider how multi-protein complexes can lead to new or enhanced specificities. Here, we review the varied mechanisms by which novel DNA-binding specificities can arise when proteins, both DNA-binding and non-DNA-binding, assemble together. We suggest that these mechanisms represent higher-order ‘multi-protein’ recognition codes: mechanisms by which genomic targeting of regulatory factors is encoded in multi-protein complexes.

Cooperative binding

Cooperative binding of TFs to DNA, usually via protein–protein interactions between adjacently bound TFs, stabilizes the proteins on the DNA and can enhance their individual contributions to transcription of a gene (3,103). Cooperative binding can also affect the DNA-binding specificity and lead to recognition of new binding sites (3) (Figure 2A). One way in which cooperative binding alters specificity is by extending the binding of a TF to include lower affinity sites as a result of the enhanced overall affinity of the cooperative complex for DNA. A well-studied example is the binding of yeast TFs MATα2, MATa1 and Mcm1, which regulate mating-type genes (103,104). MATα2 binds DNA cooperatively with cofactors MATa1 or Mcm1 to repress different sets of genes. MATα2/MATa1 heterodimers bind to sites found in the promoter regions of haploid-specific genes and repress their expression, whereas MATα2/Mcm1 heterotetramers bind different sites found in mating-type a-specific gene promoters and repress gene expression. Both complexes repress gene expression via recruitment of co-repressor proteins Tup1 and Ssn6 by interactions with MATα2. Therefore, MATα2 functions to recruit co-repressors and repress gene expression, but its DNA-binding specificity, and subsequently its target genes, are mediated by cooperative binding with cell-type-specific cofactors. In a recent study, a more subtle form of specificity alteration by cooperative binding was presented, in which DNA binding of one protein can stabilize or destabilize the DNA binding of another protein via the deformation of the DNA structure (105). Notably, the cooperative binding was not mediated by direct protein contacts, but by an allosteric mechanism where DNA structural deformations propagate along the DNA helix. The cooperative allosteric effects were shown to operate at distances of up to 16 bp.
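The way cooperativity extends binding to lower-affinity sites can be illustrated with a standard two-site equilibrium model. The sketch below is not drawn from the studies cited above: the statistical weights, the cooperativity factor `omega` and the numerical values are assumptions chosen only to show the qualitative effect.

```python
def occupancy_of_a(q_a, q_b, omega):
    """Probability that site A is bound in a two-site equilibrium model.

    q_a, q_b: (protein concentration x association constant) for each
              protein at its own site; small q means a weak site.
    omega:    cooperativity factor (1 = independent binding,
              >1 = favorable protein-protein interaction).
    """
    # Four states: empty, A only, B only, A and B together.
    z = 1.0 + q_a + q_b + omega * q_a * q_b
    return (q_a + omega * q_a * q_b) / z


weak_site = 0.05   # hypothetical weight for a TF on a low-affinity site
partner = 10.0     # hypothetical weight for a partner on its strong site

alone = occupancy_of_a(weak_site, 0.0, 1.0)
independent = occupancy_of_a(weak_site, partner, 1.0)
cooperative = occupancy_of_a(weak_site, partner, 100.0)
print(alone, independent, cooperative)
```

With `omega = 1` the partner has no effect on the weak site, whereas a large `omega` raises its occupancy substantially, which is the sense in which cooperative complexes can use sites that the isolated TF would rarely occupy.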

A second, more active mechanism has been described where cooperative binding alters the residue–base contacts that a TF can make with the DNA, thereby altering its inherent binding specificity (3,40) (Figure 2A). One example is the binding of the TFs Ets-1 and Pax5, where Pax5 alters the DNA contacts made by Ets-1, making it more permissive to alternate DNA sequences (106). Bound alone to DNA, Ets-1 binds with high affinity to 5′-GGAA-3′ and with lower affinity to 5′-GGAG-3′. A tyrosine residue (Y395) in the Ets-1 recognition helix interacts with the fourth base position and mediates the preference for adenine over guanine. However, in complex with Pax5, the Y395 residue adopts an alternate conformation and no longer interacts with the DNA at this base position. This has the effect of making the Ets-1 DNA binding permissive for both sequences, and therefore broadens the specificity of the Pax5/Ets-1 complex to include the suboptimal 5′-GGAG-3′ Ets-1 half-site. Another example has been elucidated in a series of studies on Drosophila Hox proteins and their cofactor Extradenticle (Exd) (40,107). The Hox protein Sex combs reduced (Scr) binds with the cofactor protein Exd preferentially to a paralog-specific site (fkh250) found in the fork head (fkh) gene over a consensus site (fkh250con) recognized by other Hox/Exd complexes (107,108). The preferential binding of the Scr/Exd complex to the fkh250 site involves the stabilization of a flexible ‘arm’ region N-terminal to the Scr homeodomain (107). Two of the stabilized residues in the N-terminal arm region (Arg3 and His-12) insert into the fkh250 DNA minor groove, which is narrower than the same region in the fkh250con sequence and more electrostatically favorable for the Arg and His residues. Therefore, the enhanced specificity of the Scr/Exd complex for the fkh250 sequence is due to local DNA shape (i.e. minor groove width) readout by the Scr Arg and His residues when stabilized by the cooperatively bound Exd cofactor. This work highlights how the intrinsic specificity of a monomeric homeodomain (Scr) can be altered or refined through interaction with a protein cofactor (Exd), a characteristic called latent specificity. A recent comprehensive analysis of other Hox proteins extended these results and demonstrated latent specificities for all the Drosophila Hox proteins (40). A HT-SELEX/SELEX-seq assay was used to measure and compare the DNA-binding specificity of all eight Drosophila Hox proteins alone and in complex with Exd. As seen for Scr/Exd, binding with the Exd cofactor revealed latent specificity differences between the Hox proteins not observable when Hox proteins were examined as monomers.

Cooperative cofactor recruitment

The recruitment of non-DNA-binding cofactors (i.e. coactivators and corepressors) to promoters and enhancers by DNA-binding TFs is central to gene regulation (101). ‘Cooperative’ recruitment occurs when a cofactor can make simultaneous interactions with multiple DNA-bound TFs; the result is a synergistic enhancement in the cofactor recruitment (109,110). Cooperative cofactor recruitment integrates the contributions from multiple TFs to achieve maximal gene expression and is a powerful mechanism for establishing an integrative ‘AND-type’ regulatory logic in gene transcription (111–114). At the same time, cooperative recruitment enhances the binding specificity of the cofactor. The genomic location of the cofactor is determined not by the presence of a single TF, but by the less-frequent (i.e. more specific) coincident presence of multiple TFs (Figure 2B).

Figure 2. Multi-protein recognition codes. Schematized are examples illustrating four mechanistic categories by which targeting of proteins to distinct DNA sites involves the assembly of multi-protein complexes. (A) Cooperative binding: enhanced complex stability due to cooperativity allows binding to lower-affinity (weak) sites (103,104,106); inter-protein interactions alter or stabilize protein–DNA contacts (e.g. altered side-chain conformation, stabilization of an N-terminal tail), altering DNA-binding specificity (40,106,107). (B) Cooperative recruitment: cofactor recruitment requires multiple factors (rather than only one), allowing more specific cofactor targeting (109–114). (C) Allostery: allosteric control of cofactor recruitment limits cofactor recruitment to only a subset of the TF binding sites (116–121,124,125). (D) Cofactor-based targeting: enhanced binding of a multi-protein complex to specialized composite sites is mediated by interactions between a non-DNA-binding cofactor and an auxiliary motif (48,129).

A paradigm for cooperative cofactor recruitment is the multi-protein enhanceosome complex, in which multiple TFs cooperatively assemble on DNA and coordinately recruit a common cofactor. A well-studied example is the enhanceosome that binds to the Interferon beta (IFNβ) gene promoter and cooperatively recruits the coactivator CREB-binding protein (CBP) (111,112). In response to viral infection, the TFs ATF-2, c-Jun, Irf3, Irf7 and NF-κB, facilitated by the architectural factor Hmga1, bind cooperatively to the IFNβ promoter. Multiple proteins in the DNA-bound complex contribute to the cooperative recruitment of CBP and operate synergistically to enhance transcription (111). While NF-κB, the ATF-2/c-Jun dimer and IRF factors can recruit CBP individually, studies have demonstrated that removal of the activation domains from IRF or NF-κB factors resulted in complete loss of CBP recruitment, highlighting the cooperative nature of the CBP cofactor recruitment (111). A similar situation occurs in the cooperative recruitment of class II major histocompatibility complex transactivator (CIITA) to a set of conserved transcriptional control elements found in the promoters of major histocompatibility complex class II (MHC-II) genes (113,115). The control sequences contain four DNA elements, termed the W, X, X2 and Y boxes, with a conserved sequence spacing and orientation across the different promoters. The TFs X-box binding protein (RFX), X2-binding protein (X2BP) and Nuclear factor Y (NF-Y) bind to the X, X2 and Y boxes, respectively, and form a cooperative nucleoprotein enhanceosome complex. This multi-protein complex recruits CIITA to the MHC-II promoters via multiple weak interactions mediated by different components of the enhanceosome complex (113). Removal of interactions from any of these component proteins leads to significant loss of CIITA recruitment.
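The ‘AND-type’ behaviour that results from multiple weak interactions can be illustrated with a simple additive-energy sketch. The model, the energy per contact (in units of kT) and the cofactor activity below are assumptions made purely for illustration; they do not correspond to measured values for CBP or CIITA.

```python
import math


def cofactor_occupancy(n_contacts, energy_per_contact_kt=3.0, activity=1e-4):
    """Fraction of promoters occupied by a cofactor that contacts n bound TFs.

    Each TF contact is assumed to contribute the same favorable energy
    (in kT), and contributions are assumed to add. Both parameters are
    hypothetical.
    """
    weight = activity * math.exp(n_contacts * energy_per_contact_kt)
    return weight / (1.0 + weight)


# Occupancy as more DNA-bound TFs become available to contact the cofactor.
occupancies = {n: cofactor_occupancy(n) for n in range(5)}
for n, occ in occupancies.items():
    print(n, round(occ, 4))
```

Because the contact energies add in the exponent, occupancy stays negligible when only one or two contacts are available and rises steeply once several are present, so removing any single contact can cause a disproportionate loss of recruitment.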

Allosteric effects in cofactor recruitment

DNA can act as a sequence-specific allosteric ligand that alters the function of a DNA-bound TF (116–121). A common mechanism is through DNA sequence-dependent effects on cofactor recruitment: a TF bound to one set of DNA sites will recruit a specific cofactor, whereas the same TF bound to a different set of sites will not recruit the cofactor. Therefore, allosteric control of cofactor recruitment functionally partitions the DNA binding sites of a TF and, consequently, refines the binding specificity of the recruited cofactor (Figure 2C).

Allosteric mechanisms have been reported for both the glucocorticoid (GR) and the estrogen (ER) nuclear receptors (116,118). The DNA sequence bound by ER can influence its transcriptional activity and recruitment of LXXLL-containing peptides and p160 cofactors (SRC-1, GRIP1, ACTR) (116). The DNA sequence bound by GR, called the GR response element (GRE), was shown to influence gene expression, but the effect did not correlate with GR-binding affinity, suggesting a mechanism that involves differential recruitment of cofactor proteins (118). Furthermore, knock down of Brahma, a subunit of the SWI/SNF nucleosome remodeling complex, affected GR-dependent gene expression in a GRE sequence-dependent manner, suggesting that the composition of the DNA-bound protein complexes was allosterically regulated. Allosteric control of cofactor recruitment has also been described for different NF-κB family dimers (117,120). Single-base changes in NF-κB binding sites have been demonstrated to affect recruitment of the TF Irf3 by DNA-bound p65/Rel-containing dimers (117), as well as the recruitment of the cofactors histone de-acetylase 3 (HDAC3) and Tip60 to a DNA-bound ternary complex containing p50 homodimers and Bcl3 (120).

The mechanism of allosteric control for GR and NF-κB appears to be mediated by structural differences between TFs bound to different DNA sites. Structural studies of GR bound to different GREs demonstrated that the structure of a ‘lever arm’ loop region, connecting the DNA recognition helix and the dimerization interface, was highly dependent on the GRE sequence (118). Studies of NF-κB dimers bound to different κB sites have shown that DNA sequence differences can result in structural rearrangements (rotations and translations) of one dimer subunit (19,21,122). Protease protection assays and detailed binding studies have also suggested DNA sequence-dependent changes in protein conformation (72,123).

In contrast to the situation for GR and NF-κB, where allostery is mediated by structural differences between the DNA-bound complexes, allostery involving the POU factors operates via larger, more global differences in quaternary structure when bound to different DNA sites. POU factors, such as Oct-1 described above, have two DBDs connected by a flexible linker: a POUS and a POUHD (121). POU factors can homo- and heterodimerize on a diverse set of DNA-binding sites in which the relative orientations and spacing of the POUS and POUHD can vary and subsequently lead to differential cofactor recruitment (121,124). The POU factors Oct-1 and Oct-2 can bind as homodimers to two distinct response elements, PORE and MORE, found in the promoters of B-cell-specific genes (121). When bound to the PORE sequence, Oct dimers can recruit the transcriptional coactivator Oct-binding factor 1 (OBF-1); however, dimers bound to the MORE sequence do not recruit OBF-1. Crystal structures of the DNA-bound Oct-1 dimer have revealed that in the MORE-bound dimer, the OBF-1 binding site on Oct-1 is blocked due to the relative orientation of the dimer subunits, but in the PORE-bound dimer, the site is exposed, allowing OBF-1 recruitment (125,126). A similar situation arises with the Pit-1 POU factor, in which binding site differences lead to cell-type-specific expression patterns and differential recruitment of the N-CoR co-repressor complex (124,127,128). Studies revealed that an extra 2 bp between the POUS and POUHD half-sites found in the growth hormone (GH) gene promoter, compared with a similar affinity site in the prolactin gene promoter, altered the quaternary structure of the DNA-bound Pit-1 and enabled the recruitment of the N-CoR co-repressor complex to the GH site (124).

Cofactor-based specificity

Targeting of non-DNA-binding cofactors to DNA is traditionally viewed as being mediated solely via protein–protein interactions with DNA-bound TFs. In other words, the cofactor is ‘recruited’ and interacts with DNA indirectly through the TFs. However, several studies have described a provocative alternative mechanism in which recruited cofactors interact directly with DNA when part of a larger multi-protein complex (48,129). Further, as part of this larger complex, the non-DNA-binding cofactors mediate sequence-specific interactions that preferentially stabilize the binding to composite DNA sites containing specific auxiliary motifs (Figure 2D). This represents yet another mechanism to enhance the DNA sequence specificity of multi-protein regulatory complexes.

The regulation of sulfur metabolism genes in yeast involves a multi-protein complex composed of a sequence-specific TF, Cbf1, and two non-DNA-binding cofactors, Met28 and Met4 (130,131). Cbf1 is a bHLH protein and binds as a homodimer to E-box sites with a consensus 5′-CACGTG-3′ core (131). A PBM-based analysis revealed that the Met4/Met28/Cbf1 complex binds preferentially to composite DNA sites in which the Cbf1 E-box is flanked by an adjacent 5′-RYAAT-3′ ‘recruitment’ motif (i.e. 5′-RYAATNNCACGTG-3′) (48). High-affinity binding to the composite sites is highly cooperative and requires the full heteromeric complex. Sequence analysis, modeling and binding assays suggest that the recruitment motif is recognized by the non-DNA-binding Met28 subunit. A highly similar situation has also been described for the multi-protein Oct-1/HCF-1/VP16 complex that regulates expression of the herpes simplex virus immediate-early genes by binding to a composite 5′-TAATGARAT sequence in the gene promoters (129). The complex is composed of the DNA-binding TF Oct-1 and two non-DNA-binding cofactors: the viral activator VP16 and the host cell factor 1 (HCF-1). Oct-1 binds to the 5′-TAAT sequence and a DNA-binding surface of VP16 mediates interactions with the 5′-GARAT sequence. Similar to the situation in yeast, the VP16 cofactor does not interact with DNA well on its own, but when stabilized on DNA as part of a larger multi-protein complex, it can adopt a configuration that mediates recognition of the 5′-GARAT subsequence. Therefore, in both situations, multi-protein complexes preferentially bind to composite binding sites composed of a recruitment motif (5′-RYAAT-3′ and 5′-GARAT-3′) and an adjacent TF-binding motif (5′-CACGTG-3′ and 5′-TAAT-3′) that are recognized by a non-DNA-binding cofactor (Met28 and VP16) and sequence-specific TF (Cbf1 and Oct-1) subunits, respectively.
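Composite sites of the kind described above can be located in a sequence by translating the IUPAC consensus into a regular expression and scanning both strands. The following is a minimal sketch; the function names and the example sequence are hypothetical, and exact consensus matching is a simplification of the quantitative PBM-derived preferences reported in (48).

```python
import re

# IUPAC codes used by the motifs in this section (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Return sorted (start, strand) hits of an IUPAC motif on both strands.

    start is the 0-based left-most coordinate on the forward strand.
    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    seq = seq.upper()
    width = len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


COMPOSITE = "RYAATNNCACGTG"   # recruitment motif + 2-bp spacer + E-box
EBOX = "CACGTG"               # Cbf1 core site alone

example = "TTGCAATGACACGTGTT"  # hypothetical promoter fragment
print(find_sites(example, COMPOSITE))
print(find_sites(example, EBOX))
```

Because the E-box is palindromic, it is reported on both strands, whereas the composite site has a defined orientation; comparing the two hit lists separates plain Cbf1 sites from those carrying the auxiliary motif.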

SUMMARY

Complexities in DNA binding operate at multiple levels and complicate efforts to construct simple models or recognition codes of DNA binding. Flexible protein–DNA interactions and non-local structural effects confound simple descriptions of DNA base preferences and require the use of binding models, such as PWMs, that can account for the resulting sequence degeneracy. The ability of proteins to bind DNA via multiple modes further complicates the situation and leads to the requirement of multiple binding models for many proteins. Fortunately, HT technologies are not only increasing the rate at which DNA-binding proteins are being characterized, but are providing the comprehensive binding data needed to construct models that involve multiple DNA-binding modes (23,24,27,28,35,38–40,42,43,48,59,73,74,79,132–134). An added benefit of the comprehensive nature of certain HT datasets, such as comprehensive k-mer or binding site level data, is that predictions of binding sites in the genome can be performed directly using the measured binding data (23,24,40,43,133).

Multi-protein complexes add tremendously to the DNA-binding diversity of proteins and provide a key mechanism to integrate signals in gene regulation. However, these higher-order multi-protein mechanisms cause difficulties in constructing models, primarily because DNA-binding models of proteins examined in isolation may not capture the binding sites utilized in vivo when cofactors are present. Here again, HT technologies are being used successfully to characterize binding of multi-protein complexes, revealing recognition codes mediated by cooperative binding (40,133) and cofactor-mediated targeting (48). The continued application of HT technologies to examine DNA binding of multi-protein complexes will undoubtedly provide increasingly refined binding models and provide new insights into gene targeting in vivo. Furthermore, just as the application of HT technologies has drawn our attention to the prevalence of TFs that exhibit subtle binding preferences or bind DNA via multiple modes, studies examining multi-protein complexes will likely identify widespread use of higher-order mechanisms such as allostery, cooperative recruitment and cofactor-mediated targeting. Integrating these datasets with whole genome chromatin immunoprecipitation (ChIP) and expression datasets will lead to much more complete and sophisticated descriptions of specificity in gene regulation.
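The remark above that comprehensive k-mer data allow binding sites to be predicted directly from the measurements, without an intermediate motif model, can be sketched as a simple table lookup. The score table, the threshold and the 4-bp word length below are hypothetical and serve only as an illustration; real PBM datasets typically report scores for all 8-mers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical k-mer binding scores (illustrative values, not measured data).
KMER_SCORES = {"GGAA": 0.48, "GGAG": 0.35, "CACG": 0.10}


def scan(seq, table, k=4, threshold=0.3):
    """Return (offset, k-mer, score) for every window scoring >= threshold.

    Each window takes the better score of the k-mer and its reverse
    complement; k-mers missing from the table are treated as unbound (0.0).
    """
    hits = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        score = max(table.get(kmer, 0.0),
                    table.get(reverse_complement(kmer), 0.0))
        if score >= threshold:
            hits.append((i, kmer, score))
    return hits


print(scan("ATGGAAGTCTCCA", KMER_SCORES))
```

Because every word is scored from the table itself, lower-affinity or multi-mode sites are retained as long as they were measured, rather than being lost to the averaging implicit in a single motif model.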

FUNDING

NIH [K22AI093793 to T.S.]; PhRMA Foundation Research Starter Grant (to R.G.); Basil O’Connor Research Award 5-FY13-212 from March of Dimes Foundation (to R.G.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl Acad. Sci. USA, 73, 804–808.

2. Rohs,R., Jin,X., West,S.M., Joshi,R., Honig,B. and Mann,R.S. (2010) Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem., 79, 233–269.

3. Garvie,C.W. and Wolberger,C. (2001) Recognition of specific DNA sequences. Mol. Cell, 8, 937–946.

4. Jayaram,B. and Jain,T. (2004) The role of water in protein-DNA recognition. Annu. Rev. Biophys. Biomol. Struct., 33, 343–361.

5. Reddy,C.K., Das,A. and Jayaram,B. (2001) Do water molecules mediate protein-DNA recognition? J. Mol. Biol., 314, 619–632.

6. Miller,J.C. and Pabo,C.O. (2001) Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J. Mol. Biol., 313, 309–315.

7. Baldwin,E.P., Martin,S.S., Abel,J., Gelato,K.A., Kim,H., Schultz,P.G. and Santoro,S.W. (2003) A specificity switch in selected cre recombinase variants is mediated by macromolecular plasticity and water. Chem. Biol., 10, 1085–1094.

8 OtwinowskiZ SchevitzRW ZhangRG LawsonCLJoachimiakA MarmorsteinRQ LuisiBF and SiglerPB(1988) Crystal structure of trp repressoroperator complex atatomic resolution Nature 335 321ndash329

9 RohsR WestSM SosinskyA LiuP MannRS andHonigB (2009) The role of DNA shape in protein-DNArecognition Nature 461 1248ndash1253

10 ChangYP XuM MachadoAC YuXJ RohsR andChenXS (2013) Mechanism of origin DNA recognition andassembly of an initiator-helicase complex by SV40 large tumorantigen Cell Reports 3 1117ndash1127

11 WernerMH GronenbornAM and CloreGM (1996)Intercalation DNA kinking and the control of transcriptionScience 271 778ndash784

12 KimJL NikolovDB and BurleySK (1993) Co-crystalstructure of TBP recognizing the minor groove of a TATAelement Nature 365 520ndash527

13 RohsR SklenarH and ShakkedZ (2005) Structural andenergetic origins of sequence-specific DNA bending Monte Carlosimulations of papillomavirus E2-DNA binding sites Structure13 1499ndash1509

14 PaboCO and NekludovaL (2000) Geometric analysis andcomparison of protein-DNA interfaces why is there no simplecode for recognition J Mol Biol 301 597ndash624

15 SuzukiM GersteinM and YagiN (1994) Stereochemical basisof DNA recognition by Zn fingers Nucleic Acids Res 223397ndash3405

16 KortemmeT MorozovAV and BakerD (2003) An orientation-dependent hydrogen bonding potential improves prediction ofspecificity and structure for proteins and protein-proteincomplexes J Mol Biol 326 1239ndash1259

17 SiggersTW and HonigB (2007) Structure-based prediction ofC2H2 zinc-finger binding specificity sensitivity to dockinggeometry Nucleic Acids Res 35 1085ndash1097

18. Siggers TW, Silkov A and Honig B (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J Mol Biol 345, 1027–1045.

19. Chen FE and Ghosh G (1999) Regulation of DNA binding by Rel/NF-kappaB transcription factors: structural views. Oncogene 18, 6845–6852.

20. Chen YQ, Ghosh S and Ghosh G (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat Struct Biol 5, 67–73.

21. Chen YQ, Sengchanthalangsy LL, Hackett A and Ghosh G (2000) NF-kappaB p65 (RelA) homodimer uses distinct mechanisms to recognize DNA targets. Structure 8, 419–428.

22. Yanover C and Bradley P (2011) Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res 39, 4564–4576.

23. Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET et al. (2008) Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 133, 1266–1276.

24. Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X et al. (2009) Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723.

25. Ramirez CL, Foley JE, Wright DA, Muller-Lerch F, Rahman SH, Cornu TI, Winfrey RJ, Sander JD, Fu F, Townsend JA et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat Methods 5, 374–375.

26. Wei GH, Badis G, Berger MF, Kivioja T, Palin K, Enge M, Bonke M, Jolma A, Varjosalo M, Gehrke AR et al. (2010) Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J 29, 2147–2160.

27. Meng X, Smith RM, Giesecke AV, Joung JK and Wolfe SA (2006) Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques 40, 179–184.

28. Noyes MB, Meng X, Wakabayashi A, Sinha S, Brodsky MH and Wolfe SA (2008) A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res 36, 2547–2560.

29. Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW 3rd and Bulyk ML (2006) Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol 24, 1429–1435.

30. Bulyk ML, Gentalen E, Lockhart DJ and Church GM (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat Biotechnol 17, 573–577.

31. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA and Bulyk ML (2004) Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat Genet 36, 1331–1339.

32. Linnell J, Mott R, Field S, Kwiatkowski DP, Ragoussis J and Udalova IA (2004) Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res 32, e44.

33. Field S, Udalova I and Ragoussis J (2007) Accuracy and reproducibility of protein-DNA microarray technology. Adv Biochem Eng Biotechnol 104, 87–110.

34. Bonham AJ, Neumann T, Tirrell M and Reich NO (2009) Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res 37, e94.

35. Maerkl SJ and Quake SR (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237.

36. Zykovich A, Korf I and Segal DJ (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res 37, e151.

37. Wong D, Teixeira A, Oikonomopoulos S, Humburg P, Lone IN, Saliba D, Siggers T, Bulyk M, Angelov D, Dimitrov S et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol 12, R70.

38. Zhao Y, Granas D and Stormo GD (2009) Inferring binding energies from selected binding sites. PLoS Comput Biol 5, e1000590.

39. Jolma A, Kivioja T, Toivonen J, Cheng L, Wei G, Enge M, Taipale M, Vaquerizas JM, Yan J, Sillanpaa MJ et al. (2010) Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res 20, 861–873.

40. Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ et al. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282.

41. Tantin D, Gemberling M, Callister C and Fairbrother WG (2008) High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res 18, 631–639.

42. Warren CL, Kratochvil NC, Hauschild KE, Foister S, Brezinski ML, Dervan PB, Phillips GN Jr and Ansari AZ (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc Natl Acad Sci USA 103, 867–872.

43. Nutiu R, Friedman RC, Luo S, Khrebtukova I, Silva D, Li R, Zhang L, Schroth GP and Burge CB (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat Biotechnol 29, 659–664.

44. Weirauch MT and Hughes TR. Dramatic changes in transcription factor binding over evolutionary time. Genome Biol 11, 122.

45. Gordan R, Shen N, Dror I, Zhou T, Horton J, Rohs R and Bulyk ML (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports 3, 1093–1104.

46. Stormo GD (2000) DNA binding sites: representation and discovery. Bioinformatics 16, 16–23.

47. Maniatis T, Ptashne M, Backman K, Kield D, Flashman S, Jeffrey A and Maurer R (1975) Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda. Cell 5, 109–113.

48. Siggers T, Duyzend MH, Reddy J, Khan S and Bulyk ML (2011) Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol Syst Biol 7, 555.

49. Staden R (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res 12, 505–519.

50. Workman CT, Yin Y, Corcoran DL, Ideker T, Stormo GD and Benos PV (2005) enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res 33, W389–W392.

51. Man TK and Stormo GD (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res 29, 2471–2478.

52. Bulyk ML, Johnson PL and Church GM (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res 30, 1255–1261.

53. Tomovic A and Oakeley EJ (2007) Position dependencies in transcription factor binding sites. Bioinformatics 23, 933–941.

54. Zhou Q and Liu JS (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics 20, 909–916.

55. Agius P, Arvey A, Chang W, Noble WS and Leslie C (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput Biol 6, pii: e1000916.

56. Ben-Gal I, Shani A, Gohr A, Grau J, Arviv S, Shmilovici A, Posch S and Grosse I (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics 21, 2657–2666.

57. Gershenzon NI, Stormo GD and Ioshikhes IP (2005) Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res 33, 2290–2301.

58. Zhao Y, Ruan S, Pandey M and Stormo GD (2012) Improved models for transcription factor binding site identification using nonindependent interactions. Genetics 191, 781–790.

59. Weirauch MT, Cote A, Norel R, Annala M, Zhao Y, Riley TR, Saez-Rodriguez J, Cokelaer T, Vedenko A, Talukder S et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat Biotechnol 31, 126–134.

60. Mordelet F, Horton J, Hartemink AJ, Engelhardt BE and Gordan R (2013) Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics 29, i117–i125.

61. Siddharthan R (2010) Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One 5, e9722.

62. Mathelier A and Wasserman WW (2013) The next generation of transcription factor binding site prediction. PLoS Comput Biol 9, e1003214.

63. Jiang J and Levine M (1993) Binding affinities and cooperative interactions with bHLH activators delimit threshold responses to the dorsal gradient morphogen. Cell 72, 741–752.

64. White MA, Parker DS, Barolo S and Cohen BA (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol Syst Biol 8, 614.

65. Scardigli R, Baumer N, Gruss P, Guillemot F and Le Roux I (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development 130, 3269–3281.

66. Struhl G, Struhl K and Macdonald PM (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell 57, 1259–1273.

67. Rowan S, Siggers T, Lachke SA, Yue Y, Bulyk ML and Maas RL (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev 24, 980–985.

68. Gaudet J and Mango SE (2002) Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science 295, 821–825.

69. Jaeger SA, Chan ET, Berger MF, Stottmann R, Hughes TR and Bulyk ML (2010) Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics 95, 185–195.

70. Tanay A (2006) Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res 16, 962–972.

71. Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U and Gaul U (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451, 535–540.

72. Siggers T, Chang AB, Teixeira A, Wong D, Williams KJ, Ahmed B, Ragoussis J, Udalova IA, Smale ST and Bulyk ML (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat Immunol 13, 95–102.

73. Zhu C, Byers KJ, McCord RP, Shi Z, Berger MF, Newburger DE, Saulrieta K, Smith Z, Shah MV, Radhakrishnan M et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res 19, 556–566.

74. Gordan R, Murphy KF, McCord RP, Zhu C, Vedenko A and Bulyk ML (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol 12, R125.

75. Nakagawa S, Gisselbrecht SS, Rogers JM, Hartl DL and Bulyk ML (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc Natl Acad Sci USA 110, 12349–12354.

76. Bolotin E, Chellappa K, Hwang-Verslues W, Schnabl JM, Yang C and Sladek FM (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics 12, 560.

77. Chu SW, Noyes MB, Christensen RG, Pierce BG, Zhu LJ, Weng Z, Stormo GD and Wolfe SA (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res 22, 1889–1898.

78. Noyes MB, Christensen RG, Wakabayashi A, Stormo GD, Brodsky MH and Wolfe SA (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 133, 1277–1289.

79. Fang B, Mane-Padros D, Bolotin E, Jiang T and Sladek FM (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res 40, 5343–5356.

80. Zhou T, Yang L, Lu Y, Dror I, Dantas Machado AC, Ghane T, Di Felice R and Rohs R (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res 41, W56–W62.

81. Badis G, Chan ET, van Bakel H, Pena-Castillo L, Tillo D, Tsui K, Carlson CD, Gossett AJ, Hasinoff MJ, Warren CL et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol Cell 32, 878–887.

82. Kim J and Struhl K (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res 23, 2531–2537.

83. Vinson C, Myakishev M, Acharya A, Mir AA, Moll JR and Bonovich M (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol Cell Biol 22, 6321–6335.

84. Kuo D, Licon K, Bandyopadhyay S, Chuang R, Luo C, Catalana J, Ravasi T, Tan K and Ideker T (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res 20, 1672–1678.

85. Khorasanizadeh S and Rastinejad F (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem Sci 26, 384–390.

86. Kurokawa R, DiRenzo J, Boehm M, Sugarman J, Gloss B, Rosenfeld MG, Heyman RA and Glass CK (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature 371, 528–531.

87. Luscombe NM, Austin SE, Berman HM and Thornton JM (2000) An overview of the structures of protein-DNA complexes. Genome Biol 1, REVIEWS001.

88. Perkins AS, Fishel R, Jenkins NA and Copeland NG (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol Cell Biol 11, 2665–2674.

89. Delwel R, Funabiki T, Kreider BL, Morishita K and Ihle JN (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol Cell Biol 13, 4291–4300.

90. Funabiki T, Kreider BL and Ihle JN (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene 9, 1575–1581.

91. Klemm JD, Rould MA, Aurora R, Herr W and Pabo CO (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell 77, 21–32.

92. Verrijzer CP, Alkema MJ, van Weperen WW, Van Leeuwen HC, Strating MJ and van der Vliet PC (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J 11, 4993–5003.

93. Kemler I, Schreiber E, Muller MM, Matthias P and Schaffner W (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J 8, 2001–2008.

94. Kersten S, Gronemeyer H and Noy N (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J Biol Chem 272, 12771–12777.

95. Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G et al. (2013) DNA-binding specificities of human transcription factors. Cell 152, 327–339.

96. Kim JB, Spotts GD, Halvorsen YD, Shih HM, Ellenberger T, Towle HC and Spiegelman BM (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol Cell Biol 15, 2582–2588.

97. Parraga A, Bellsolell L, Ferre-D’Amare AR and Burley SK (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. Structure 6, 661–672.

98. Fordyce PM, Pincus D, Kimmig P, Nelson CS, El-Samad H, Walter P and DeRisi JL (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc Natl Acad Sci USA 109, E3084–E3093.

99. Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y and Pritchard JK (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res 21, 447–455.

100. Bulyk ML (2003) Computational prediction of transcription-factor binding site locations. Genome Biol 5, 201.

101. Levine M and Tjian R (2003) Transcription regulation and animal diversity. Nature 424, 147–151.

102. Johnson AD (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr Opin Genet Dev 5, 552–558.

103. Wolberger C (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu Rev Biophys Biomol Struct 28, 29–56.

104. Johnson AD (1992) In McKnight SL and Yamamoto KR (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim S, Brostromer E, Xing D, Jin J, Chong S, Ge H, Wang S, Gu C, Yang L, Gao YQ et al. (2013) Probing allostery through DNA. Science 339, 816–819.

106. Garvie CW, Hagman J and Wolberger C (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol Cell 8, 1267–1276.

107. Joshi R, Passner JM, Rohs R, Jain R, Sosinsky A, Crickmore MA, Jacob V, Aggarwal AK, Honig B and Mann RS (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell 131, 530–543.

108. Ryoo HD and Mann RS (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev 13, 1704–1716.

109. Carey M (1998) The enhanceosome and transcriptional synergy. Cell 92, 5–8.

110. Ptashne M and Gann A (1997) Transcriptional activation by recruitment. Nature 386, 569–577.

111. Merika M, Williams AJ, Chen G, Collins T and Thanos D (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol Cell 1, 277–287.

112. Kim TK and Maniatis T (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol Cell 1, 119–129.

113. Masternak K, Muhlethaler-Mottet A, Villard J, Zufferey M, Steimle V and Reith W (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev 14, 1156–1166.

114. del Blanco B, Garcia-Mariscal A, Wiest DL and Hernandez-Munain C (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J Immunol 188, 3278–3293.

115. Benoist C and Mathis D (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu Rev Immunol 8, 681–715.

116. Hall JM, McDonnell DP and Korach KS (2002) Allosteric regulation of estrogen receptor structure, function, and coactivator recruitment by different estrogen response elements. Mol Endocrinol 16, 469–486.

117. Leung TH, Hoffmann A and Baltimore D (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell 118, 453–464.

118. Meijsing SH, Pufall MA, So AY, Bates DL, Chen L and Yamamoto KR (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science 324, 407–410.

119. Mrinal N, Tomar A and Nagaraju J (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res 39, 9574–9591.

120. Wang VY, Huang W, Asagiri M, Spann N, Hoffmann A, Glass C and Ghosh G (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports 2, 824–839.

121. Remenyi A, Scholer HR and Wilmanns M (2004) Combinatorial control of gene expression. Nat Struct Mol Biol 11, 812–815.

122. Chen YQ, Ghosh S and Ghosh G (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat Struct Biol 5, 67–73.

123. Fujita T, Nolan GP, Ghosh S and Baltimore D (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev 6, 775–787.

124. Scully KM, Jacobson EM, Jepsen K, Lunyak V, Viadiu H, Carriere C, Rose DW, Hooshmand F, Aggarwal AK and Rosenfeld MG (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science 290, 1127–1131.

125. Remenyi A, Tomilin A, Pohl E, Lins K, Philippsen A, Reinbold R, Scholer HR and Wilmanns M (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol Cell 8, 569–580.

126. Chasman D, Cepek K, Sharp PA and Pabo CO (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev 13, 2650–2657.

127. Ingraham HA, Chen RP, Mangalam HJ, Elsholtz HP, Flynn SE, Lin CR, Simmons DM, Swanson L and Rosenfeld MG (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell 55, 519–529.

128. Ingraham HA, Flynn SE, Voss JW, Albert VR, Kapiloff MS, Wilson L and Rosenfeld MG (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell 61, 1021–1033.

129. Babb R, Huang CC, Aufiero DJ and Herr W (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol Cell Biol 21, 4700–4712.

130. Kuras L, Cherest H, Surdin-Kerjan Y and Thomas D (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J 15, 2519–2529.

131. Blaiseau PL and Thomas D (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J 17, 6327–6336.

132. Wong D, Teixeira A, Oikonomopoulos S, Humburg P, Lone IN, Saliba D, Siggers T, Bulyk M, Angelov D, Dimitrov S et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol 12, R70.

133. Ferraris L, Stewart AP, Kang J, DeSimone AM, Gemberling M, Tantin D and Fairbrother WG (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res 21, 1055–1064.

134. Bolotin E, Liao H, Ta TC, Yang C, Hwang-Verslues W, Evans JR, Jiang T and Sladek FM (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology 51, 642–653.


specificity differences between the Hox proteins not observable when Hox proteins were examined as monomers.

Cooperative cofactor recruitment

The recruitment of non-DNA-binding cofactors (i.e. coactivators and corepressors) to promoters and enhancers by DNA-binding TFs is central to gene regulation (101). ‘Cooperative’ recruitment occurs when a cofactor can make simultaneous interactions with multiple DNA-bound TFs; the result is a synergistic enhancement in the cofactor recruitment (109,110). Cooperative cofactor recruitment integrates the contributions from multiple TFs to achieve maximal gene expression, and is a powerful mechanism for establishing an integrative ‘AND-type’ regulatory logic in gene transcription (111–114). At the same time, cooperative recruitment enhances the binding specificity of the cofactor. The genomic location of the cofactor is determined not by the presence of a single TF, but by the less-frequent (i.e. more specific) coincident presence of multiple TFs (Figure 2B).

Figure 2. Multi-protein recognition codes. Schematized are examples illustrating four mechanistic categories by which targeting of proteins to distinct DNA sites involves the assembly of multi-protein complexes. (A) Cooperative binding. Enhanced complex stability due to cooperativity allows binding to lower-affinity (weak) sites (103,104,106). Inter-protein interactions alter or stabilize protein–DNA contacts, altering DNA-binding specificity (40,106,107). Panel labels: strong site; weak site (no binding); weak site (assisted binding); altered side-chain conformation; stabilization of contacts (e.g. N-terminal tail). (B) Cooperative recruitment. Cofactor recruitment requires multiple factors (rather than only one), allowing more specific cofactor targeting (109–114). Panel labels: weak or no recruitment; strong cooperative recruitment. (C) Allostery. Allosteric control of cofactor recruitment limits cofactor recruitment to only a subset of the TF binding sites (116–121,124,125). Panel labels: recruiting site; non-recruiting site. (D) Cofactor-based targeting. Enhanced binding of multi-protein complex to specialized composite sites is mediated by interactions between non-DNA-binding cofactor and an auxiliary motif (48,129). Panel label: enhanced binding to composite site.

A paradigm for cooperative cofactor recruitment is the multi-protein enhanceosome complex, in which multiple TFs cooperatively assemble on DNA and coordinately recruit a common cofactor. A well-studied example is the enhanceosome that binds to the Interferon beta (IFNβ) gene promoter and cooperatively recruits the coactivator CREB-binding protein (CBP) (111,112). In response to viral infection, the TFs ATF-2, c-Jun, Irf3, Irf7 and NF-κB, facilitated by the architectural factor Hmga1, bind cooperatively to the IFNβ promoter. Multiple proteins in the DNA-bound complex contribute to the cooperative recruitment of CBP and operate synergistically to enhance transcription (111). While NF-κB, the ATF-2/c-Jun dimer and Irf factors can recruit CBP individually, studies have demonstrated that removal of the activation domains from IRF or NF-κB factors resulted in complete loss of CBP recruitment, highlighting the cooperative nature of the CBP cofactor recruitment (111). A similar situation occurs in the cooperative recruitment of class II major histocompatibility complex transactivator (CIITA) to a set of conserved transcriptional control elements found in the promoters of major histocompatibility complex class II (MHC-II) genes (113,115). The control sequences contain four DNA elements, termed the W, X, X2 and Y boxes, with a conserved sequence spacing and orientation across the different promoters. The TFs X-box binding protein (RFX), X2-binding protein (X2BP) and Nuclear factor Y (NF-Y) bind to the X, X2 and Y boxes, respectively, and form a cooperative nucleoprotein enhanceosome complex. This multi-protein complex recruits the CIITA to the MHC-II promoters via multiple weak interactions mediated by different components of the enhanceosome complex (113). Removal of interactions from any of these component proteins leads to significant loss of CIITA recruitment.
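The synergy described above, in which several individually weak TF-cofactor contacts combine to give strong recruitment, can be illustrated with a toy equilibrium calculation. The sketch below is not taken from the cited studies: it assumes the TFs are already bound, that contact free energies add, and that the cofactor has a small baseline statistical weight at the promoter. The function name `cofactor_occupancy`, the contact energy of -1.5 kcal/mol and the baseline weight of 0.01 are all hypothetical values chosen for illustration.

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K


def cofactor_occupancy(contact_energies, base_weight=0.01):
    """Fractional occupancy of a cofactor at a promoter (toy model).

    contact_energies: free energies (kcal/mol, negative = favorable) of the
        contacts the cofactor makes with each DNA-bound TF. Assumed additive.
    base_weight: hypothetical statistical weight of the cofactor at the
        promoter in the absence of any TF contact.
    """
    weight = base_weight * math.exp(-sum(contact_energies) / RT)
    return weight / (1.0 + weight)


if __name__ == "__main__":
    weak_contact = -1.5  # hypothetical energy of one weak TF-cofactor contact
    for n_tfs in range(4):
        occ = cofactor_occupancy([weak_contact] * n_tfs)
        print(f"{n_tfs} contacting TFs: occupancy = {occ:.3f}")
```

Under these assumptions a single contact leaves the cofactor mostly unbound, whereas three coincident contacts drive occupancy close to saturation, which is the 'AND-type' behavior the text describes. Real enhanceosomes involve additional effects (TF binding equilibria, DNA architecture) that this sketch ignores.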

Allosteric effects in cofactor recruitment

DNA can act as a sequence-specific allosteric ligand that alters the function of a DNA-bound TF (116–121). This commonly operates through DNA sequence-dependent effects on cofactor recruitment. A TF bound to one set of DNA sites will recruit a specific cofactor, whereas the same TF bound to a different set of sites will not recruit the cofactor. Therefore, allosteric control of cofactor recruitment functionally partitions DNA binding sites of a TF, and consequently refines binding specificity of the recruited cofactor (Figure 2C).

Allosteric mechanisms have been reported for both the glucocorticoid (GR) and the estrogen (ER) nuclear receptors (116,118). The DNA sequence bound by ER can influence its transcriptional activity and recruitment of LXXLL-containing peptides and p160 cofactors (SRC-1, GRIP1, ACTR) (116). The DNA sequence bound by GR, called the GR response element (GRE), was shown to influence gene expression, but the effect did not correlate with GR-binding affinity, suggesting a mechanism that involves differential recruitment of cofactor proteins (118). Furthermore, knockdown of Brahma, a subunit of the SWI/SNF nucleosome remodeling complex, affected GR-dependent gene expression in a GRE sequence-dependent manner, suggesting that the composition of the DNA-bound protein complexes was allosterically regulated. Allosteric control of cofactor recruitment has also been described for different NF-κB family dimers (117,120). Single-base changes in NF-κB binding sites have been demonstrated to affect recruitment of the TF Irf3 protein by DNA-bound p65/Rel-containing dimers (117), as well as the recruitment of the cofactors histone deacetylase 3 (HDAC3) and Tip60 to a DNA-bound ternary complex containing p50 homodimers and Bcl3 (120).

The mechanism of allosteric control for GR and NF-κB appears to be mediated by structural differences between TFs bound to different DNA sites. Structural studies of GR bound to different GREs demonstrated that the structure of a ‘lever arm’ loop region, connecting the DNA recognition helix and the ligand binding domain, was highly dependent on the GRE sequence (118). Studies of NF-κB dimers bound to different κB sites have shown that DNA sequence differences can result in structural rearrangements (rotations and translations) of one dimer subunit (19,21,122). Protease protection assays and detailed binding studies have also suggested DNA sequence-dependent changes in protein conformation (72,123).

In contrast to the situation for GR and NF-κB, where allostery is mediated by structural differences between the DNA-bound complexes, allostery involving the POU factors operates via larger, more global differences in quaternary structure when bound to different DNA sites. POU factors, such as Oct-1 described above, have two DBDs connected by a flexible linker: a POUS and a POUHD (121). POU factors can homo- and heterodimerize on a diverse set of DNA-binding sites in which the relative orientations and spacing of the POUS and POUHD can vary, and subsequently lead to differential cofactor recruitment (121,124). The POU factors Oct-1 and Oct-2 can bind as homodimers to two distinct response elements, PORE and MORE, found in the promoters of B-cell-specific genes (121). When bound to the PORE sequence, Oct dimers can recruit the transcriptional coactivator Oct-binding factor 1 (OBF-1); however, dimers bound to the MORE sequence do not recruit OBF-1. Crystal structures of the DNA-bound Oct-1 dimer have revealed that in the MORE-bound dimer the OBF-1 binding site on Oct-1 is blocked due to the relative orientation of the dimer subunits, but in the PORE-bound dimer the site is exposed, allowing OBF-1 recruitment (125,126). A similar situation arises with the Pit-1 POU factor, in which binding site differences lead to cell-type-specific expression patterns and differential recruitment of the N-CoR co-repressor complex (124,127,128). Studies revealed that an extra 2 bp between the POUS and POUHD half-sites found in the growth hormone (GH) gene promoter, compared with a similar affinity site in the prolactin gene promoter, altered the quaternary structure of the DNA-bound Pit-1 and enabled the recruitment of the N-CoR co-repressor complex to the GH site (124).

Cofactor-based specificity

Targeting of non-DNA-binding cofactors to DNA is traditionally viewed as being mediated solely via protein–protein interactions with DNA-bound TFs. In other words, the cofactor is ‘recruited’ and interacts with DNA indirectly through the TFs. However, several studies have described a provocative alternative mechanism in which recruited cofactors interact directly with DNA when part of a larger multi-protein complex (48,129). Further, as part of this larger complex, the non-DNA-binding cofactors mediate sequence-specific interactions that preferentially stabilize the binding to composite DNA sites containing specific auxiliary motifs (Figure 2D). This represents yet another mechanism to enhance the DNA sequence specificity of multi-protein regulatory complexes.

The regulation of sulfur metabolism genes in yeast involves a multi-protein complex composed of a sequence-specific TF, Cbf1, and two non-DNA-binding cofactors, Met28 and Met4 (130,131). Cbf1 is a bHLH protein and binds as a homodimer to E-box sites with a consensus 5′-CACGTG-3′ core (131). A PBM-based analysis revealed that the Met4/Met28/Cbf1 complex binds preferentially to composite DNA sites in which the Cbf1 E-box is flanked by an adjacent 5′-RYAAT-3′ ‘recruitment’ motif (i.e. 5′-RYAATNNCACGTG-3′) (48). High-affinity binding to the composite sites is highly cooperative and requires the full heteromeric complex. Sequence analysis, modeling and binding assays suggest that the recruitment motif is recognized by the non-DNA-binding Met28 subunit. A highly similar situation has also been described for the multi-protein Oct-1/HCF-1/VP16 complex that regulates expression of the herpes simplex virus immediate-early genes by binding to a composite 5′-TAATGARAT sequence in the gene promoters (129). The complex is composed of the DNA-binding TF Oct-1 and two non-DNA-binding cofactors: the viral activator VP16 and the host cell factor 1 (HCF-1). Oct-1 binds to the 5′-TAAT sequence, and a DNA-binding surface of VP16 mediates interactions with the 5′-GARAT sequence. Similar to the situation in yeast, the VP16 cofactor does not interact with DNA well on its own, but when stabilized on DNA as part of a larger multi-protein complex it can adopt a configuration that mediates recognition of the 5′-GARAT subsequence. Therefore, in both situations, multi-protein complexes preferentially bind to composite binding sites composed of a recruitment motif (5′-RYAAT-3′ and 5′-GARAT-3′) and an adjacent TF-binding motif (5′-CACGTG-3′ and 5′-TAAT-3′) that are recognized by a non-DNA-binding cofactor (Met28 and VP16) and sequence-specific TF (Cbf1 and Oct-1) subunits, respectively.
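As a concrete illustration of what a composite site looks like in sequence terms, the sketch below scans a DNA string on both strands for an IUPAC-coded motif such as 5′-RYAATNNCACGTG-3′. It is a simple exact-match scan, not the PBM-based analysis used in (48); the function names and the example sequence are hypothetical, and only the IUPAC symbols needed for the motifs in the text (R, Y, N) are included.

```python
import re

# IUPAC symbols used by the motifs discussed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Return sorted (start, strand, match) tuples for an IUPAC motif.

    start is 0-based on the forward strand for both strands. A lookahead
    is used so that overlapping matches are also reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        start = len(seq) - m.start() - len(motif)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    example = "TTGCAATGGCACGTGAA"  # hypothetical sequence, not a real promoter
    print(find_sites(example, "RYAATNNCACGTG"))  # composite site
    print(find_sites(example, "CACGTG"))         # E-box core alone
```

Because the E-box core is palindromic it is reported on both strands, whereas the composite motif is directional, so the strand of a composite hit indicates on which side of the E-box the recruitment motif lies.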

SUMMARY

Complexities in DNA-binding operate at multiple levels and complicate efforts to construct simple models or recognition codes of DNA-binding. Flexible protein-DNA interactions and non-local structural effects confound simple descriptions of DNA base preferences and require the use of binding models, such as PWMs, that can account for the resulting sequence degeneracy. The ability of proteins to bind DNA via multiple modes further complicates the situation and leads to the requirement of multiple binding models for many proteins. Fortunately, HT technologies are not only increasing the rate at which DNA-binding proteins are being characterized, but are providing the comprehensive binding data needed to construct models that involve multiple DNA-binding modes (23,24,27,28,35,38–40,42,43,48,59,73,74,79,132–134). An added benefit of the comprehensive nature of certain HT datasets, such as comprehensive k-mer or binding site level data, is that predictions of binding sites in the genome can be performed directly using the measured binding data (23,24,40,43,133).

Multi-protein complexes add tremendously to the DNA-binding diversity of proteins and provide a key mechanism to integrate signals in gene regulation. However, these higher-order multi-protein mechanisms cause difficulties in constructing models, primarily due to the fact that DNA-binding models of proteins examined in isolation may not capture the binding sites utilized in vivo when cofactors are present. Here again, HT technologies are being used successfully to characterize binding of multi-protein complexes, revealing recognition codes mediated by cooperative binding (40,133) and cofactor-mediated targeting (48). The continued application of HT technologies to examine DNA binding of multi-protein complexes will undoubtedly provide increasingly refined binding models and provide new insights into gene targeting in vivo. Furthermore, just as the application of HT technologies has drawn our attention to the prevalence of TFs that exhibit subtle binding preferences or bind DNA via multiple modes, studies examining multi-protein complexes will likely identify widespread use of higher-order mechanisms, such as allostery, cooperative recruitment and cofactor-mediated targeting. Integrating these datasets with whole-genome chromatin immunoprecipitation (ChIP) and expression datasets will lead to much more complete and sophisticated descriptions of specificity in gene regulation.
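To make the distinction between the two kinds of binding model concrete, the sketch below scores a sequence in two ways: with a PWM, which sums independent per-position log-odds contributions, and by direct lookup of measured k-mer scores. All numbers are invented for illustration (a toy E-box-like matrix and a toy 6-mer table), and the function names are hypothetical rather than drawn from any published tool.

```python
import math

BASES = "ACGT"


def make_toy_pfm(consensus, p_match=0.85):
    """Build a toy position frequency matrix favouring `consensus`.

    Illustrative values only; a real matrix would be estimated from binding data.
    """
    p_other = (1.0 - p_match) / 3.0
    return [{b: (p_match if b == c else p_other) for b in BASES} for c in consensus]


def pwm_score(site, pfm, background=0.25):
    """Sum of per-position log2-odds scores; positions are treated as independent."""
    return sum(math.log2(column[base] / background) for base, column in zip(site, pfm))


def scan_pwm(seq, pfm):
    """Return (start, score) for every window as wide as the matrix."""
    width = len(pfm)
    return [(i, pwm_score(seq[i:i + width], pfm)) for i in range(len(seq) - width + 1)]


def scan_kmers(seq, table, k):
    """Look up a measured score for every k-mer; unmeasured k-mers give None."""
    return [(i, table.get(seq[i:i + k])) for i in range(len(seq) - k + 1)]


TOY_PFM = make_toy_pfm("CACGTG")
# Hypothetical enrichment-style scores for a few 6-mers.
TOY_KMER_SCORES = {"CACGTG": 0.48, "CACGTT": 0.31, "AACGTG": 0.31}

seq = "TTCACGTGAA"
best_start, best_score = max(scan_pwm(seq, TOY_PFM), key=lambda hit: hit[1])
kmer_hits = scan_kmers(seq, TOY_KMER_SCORES, 6)
```

The PWM generalizes to any sequence but assumes additive positions, while the lookup table makes no such assumption and is limited to the k-mers that were actually measured.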

FUNDING

NIH [K22AI093793 to T.S.]; PhRMA Foundation Research Starter Grant (to R.G.); Basil O’Connor Research Award 5-FY13-212 from March of Dimes Foundation (to R.G.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl Acad. Sci. USA, 73, 804–808.

2. Rohs,R., Jin,X., West,S.M., Joshi,R., Honig,B. and Mann,R.S. (2010) Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem., 79, 233–269.

3. Garvie,C.W. and Wolberger,C. (2001) Recognition of specific DNA sequences. Mol. Cell, 8, 937–946.

4. Jayaram,B. and Jain,T. (2004) The role of water in protein-DNA recognition. Annu. Rev. Biophys. Biomol. Struct., 33, 343–361.

5. Reddy,C.K., Das,A. and Jayaram,B. (2001) Do water molecules mediate protein-DNA recognition? J. Mol. Biol., 314, 619–632.

6. Miller,J.C. and Pabo,C.O. (2001) Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J. Mol. Biol., 313, 309–315.

7. Baldwin,E.P., Martin,S.S., Abel,J., Gelato,K.A., Kim,H., Schultz,P.G. and Santoro,S.W. (2003) A specificity switch in selected cre recombinase variants is mediated by macromolecular plasticity and water. Chem. Biol., 10, 1085–1094.

8. Otwinowski,Z., Schevitz,R.W., Zhang,R.G., Lawson,C.L., Joachimiak,A., Marmorstein,R.Q., Luisi,B.F. and Sigler,P.B. (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature, 335, 321–329.

9. Rohs,R., West,S.M., Sosinsky,A., Liu,P., Mann,R.S. and Honig,B. (2009) The role of DNA shape in protein-DNA recognition. Nature, 461, 1248–1253.

10. Chang,Y.P., Xu,M., Machado,A.C., Yu,X.J., Rohs,R. and Chen,X.S. (2013) Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Reports, 3, 1117–1127.

11. Werner,M.H., Gronenborn,A.M. and Clore,G.M. (1996) Intercalation, DNA kinking, and the control of transcription. Science, 271, 778–784.

12. Kim,J.L., Nikolov,D.B. and Burley,S.K. (1993) Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature, 365, 520–527.

13. Rohs,R., Sklenar,H. and Shakked,Z. (2005) Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure, 13, 1499–1509.

14. Pabo,C.O. and Nekludova,L. (2000) Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J. Mol. Biol., 301, 597–624.

15. Suzuki,M., Gerstein,M. and Yagi,N. (1994) Stereochemical basis of DNA recognition by Zn fingers. Nucleic Acids Res., 22, 3397–3405.

16. Kortemme,T., Morozov,A.V. and Baker,D. (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J. Mol. Biol., 326, 1239–1259.

17. Siggers,T.W. and Honig,B. (2007) Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res., 35, 1085–1097.

18. Siggers,T.W., Silkov,A. and Honig,B. (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J. Mol. Biol., 345, 1027–1045.

19. Chen,F.E. and Ghosh,G. (1999) Regulation of DNA binding by Rel/NF-kappaB transcription factors: structural views. Oncogene, 18, 6845–6852.

20. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

21. Chen,Y.Q., Sengchanthalangsy,L.L., Hackett,A. and Ghosh,G. (2000) NF-kappaB p65 (RelA) homodimer uses distinct mechanisms to recognize DNA targets. Structure, 8, 419–428.

22. Yanover,C. and Bradley,P. (2011) Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res., 39, 4564–4576.

23. Berger,M.F., Badis,G., Gehrke,A.R., Talukder,S., Philippakis,A.A., Pena-Castillo,L., Alleyne,T.M., Mnaimneh,S., Botvinnik,O.B., Chan,E.T. et al. (2008) Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 133, 1266–1276.

24. Badis,G., Berger,M.F., Philippakis,A.A., Talukder,S., Gehrke,A.R., Jaeger,S.A., Chan,E.T., Metzler,G., Vedenko,A., Chen,X. et al. (2009) Diversity and complexity in DNA recognition by transcription factors. Science, 324, 1720–1723.

25. Ramirez,C.L., Foley,J.E., Wright,D.A., Muller-Lerch,F., Rahman,S.H., Cornu,T.I., Winfrey,R.J., Sander,J.D., Fu,F., Townsend,J.A. et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat. Methods, 5, 374–375.

26. Wei,G.H., Badis,G., Berger,M.F., Kivioja,T., Palin,K., Enge,M., Bonke,M., Jolma,A., Varjosalo,M., Gehrke,A.R. et al. (2010) Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J., 29, 2147–2160.

27. Meng,X., Smith,R.M., Giesecke,A.V., Joung,J.K. and Wolfe,S.A. (2006) Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques, 40, 179–184.

28. Noyes,M.B., Meng,X., Wakabayashi,A., Sinha,S., Brodsky,M.H. and Wolfe,S.A. (2008) A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res., 36, 2547–2560.

29. Berger,M.F., Philippakis,A.A., Qureshi,A.M., He,F.S., Estep,P.W. 3rd and Bulyk,M.L. (2006) Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol., 24, 1429–1435.

30. Bulyk,M.L., Gentalen,E., Lockhart,D.J. and Church,G.M. (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat. Biotechnol., 17, 573–577.

31. Mukherjee,S., Berger,M.F., Jona,G., Wang,X.S., Muzzey,D., Snyder,M., Young,R.A. and Bulyk,M.L. (2004) Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet., 36, 1331–1339.

32. Linnell,J., Mott,R., Field,S., Kwiatkowski,D.P., Ragoussis,J. and Udalova,I.A. (2004) Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res., 32, e44.

33. Field,S., Udalova,I. and Ragoussis,J. (2007) Accuracy and reproducibility of protein-DNA microarray technology. Adv. Biochem. Eng. Biotechnol., 104, 87–110.

34. Bonham,A.J., Neumann,T., Tirrell,M. and Reich,N.O. (2009) Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res., 37, e94.

35. Maerkl,S.J. and Quake,S.R. (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science, 315, 233–237.

36. Zykovich,A., Korf,I. and Segal,D.J. (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res., 37, e151.

37. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

38. Zhao,Y., Granas,D. and Stormo,G.D. (2009) Inferring binding energies from selected binding sites. PLoS Comput. Biol., 5, e1000590.

39. Jolma,A., Kivioja,T., Toivonen,J., Cheng,L., Wei,G., Enge,M., Taipale,M., Vaquerizas,J.M., Yan,J., Sillanpaa,M.J. et al. (2010) Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res., 20, 861–873.

40. Slattery,M., Riley,T., Liu,P., Abe,N., Gomez-Alcala,P., Dror,I., Zhou,T., Rohs,R., Honig,B., Bussemaker,H.J. et al. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell, 147, 1270–1282.

41. Tantin,D., Gemberling,M., Callister,C. and Fairbrother,W.G. (2008) High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res., 18, 631–639.

42. Warren,C.L., Kratochvil,N.C., Hauschild,K.E., Foister,S., Brezinski,M.L., Dervan,P.B., Phillips,G.N. Jr and Ansari,A.Z. (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc. Natl Acad. Sci. USA, 103, 867–872.

43. Nutiu,R., Friedman,R.C., Luo,S., Khrebtukova,I., Silva,D., Li,R., Zhang,L., Schroth,G.P. and Burge,C.B. (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol., 29, 659–664.

44. Weirauch,M.T. and Hughes,T.R. Dramatic changes in transcription factor binding over evolutionary time. Genome Biol., 11, 122.

45. Gordan,R., Shen,N., Dror,I., Zhou,T., Horton,J., Rohs,R. and Bulyk,M.L. (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports, 3, 1093–1104.

46. Stormo,G.D. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.

47. Maniatis,T., Ptashne,M., Backman,K., Kield,D., Flashman,S., Jeffrey,A. and Maurer,R. (1975) Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda. Cell, 5, 109–113.

48. Siggers,T., Duyzend,M.H., Reddy,J., Khan,S. and Bulyk,M.L. (2011) Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol. Syst. Biol., 7, 555.

49. Staden,R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res., 12, 505–519.

50. Workman,C.T., Yin,Y., Corcoran,D.L., Ideker,T., Stormo,G.D. and Benos,P.V. (2005) enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res., 33, W389–W392.

51. Man,T.K. and Stormo,G.D. (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res., 29, 2471–2478.

52. Bulyk,M.L., Johnson,P.L. and Church,G.M. (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30, 1255–1261.

53. Tomovic,A. and Oakeley,E.J. (2007) Position dependencies in transcription factor binding sites. Bioinformatics, 23, 933–941.

54. Zhou,Q. and Liu,J.S. (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics, 20, 909–916.

55. Agius,P., Arvey,A., Chang,W., Noble,W.S. and Leslie,C. (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol., 6, pii: e1000916.

56. Ben-Gal,I., Shani,A., Gohr,A., Grau,J., Arviv,S., Shmilovici,A., Posch,S. and Grosse,I. (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics, 21, 2657–2666.

57. Gershenzon,N.I., Stormo,G.D. and Ioshikhes,I.P. (2005) Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res., 33, 2290–2301.

58. Zhao,Y., Ruan,S., Pandey,M. and Stormo,G.D. (2012) Improved models for transcription factor binding site identification using nonindependent interactions. Genetics, 191, 781–790.

59. Weirauch,M.T., Cote,A., Norel,R., Annala,M., Zhao,Y., Riley,T.R., Saez-Rodriguez,J., Cokelaer,T., Vedenko,A., Talukder,S. et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol., 31, 126–134.

60. Mordelet,F., Horton,J., Hartemink,A.J., Engelhardt,B.E. and Gordan,R. (2013) Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics, 29, i117–i125.

61. Siddharthan,R. (2010) Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One, 5, e9722.

62. Mathelier,A. and Wasserman,W.W. (2013) The next generation of transcription factor binding site prediction. PLoS Comput. Biol., 9, e1003214.

63. Jiang,J. and Levine,M. (1993) Binding affinities and cooperative interactions with bHLH activators delimit threshold responses to the dorsal gradient morphogen. Cell, 72, 741–752.

64. White,M.A., Parker,D.S., Barolo,S. and Cohen,B.A. (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol. Syst. Biol., 8, 614.

65. Scardigli,R., Baumer,N., Gruss,P., Guillemot,F. and Le Roux,I. (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development, 130, 3269–3281.

66. Struhl,G., Struhl,K. and Macdonald,P.M. (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell, 57, 1259–1273.

67. Rowan,S., Siggers,T., Lachke,S.A., Yue,Y., Bulyk,M.L. and Maas,R.L. (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev., 24, 980–985.

68. Gaudet,J. and Mango,S.E. (2002) Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science, 295, 821–825.

69. Jaeger,S.A., Chan,E.T., Berger,M.F., Stottmann,R., Hughes,T.R. and Bulyk,M.L. (2010) Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics, 95, 185–195.

70. Tanay,A. (2006) Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res., 16, 962–972.

71. Segal,E., Raveh-Sadka,T., Schroeder,M., Unnerstall,U. and Gaul,U. (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 451, 535–540.

72. Siggers,T., Chang,A.B., Teixeira,A., Wong,D., Williams,K.J., Ahmed,B., Ragoussis,J., Udalova,I.A., Smale,S.T. and Bulyk,M.L. (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat. Immunol., 13, 95–102.

73. Zhu,C., Byers,K.J., McCord,R.P., Shi,Z., Berger,M.F., Newburger,D.E., Saulrieta,K., Smith,Z., Shah,M.V., Radhakrishnan,M. et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res., 19, 556–566.

74. Gordan,R., Murphy,K.F., McCord,R.P., Zhu,C., Vedenko,A. and Bulyk,M.L. (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol., 12, R125.

75. Nakagawa,S., Gisselbrecht,S.S., Rogers,J.M., Hartl,D.L. and Bulyk,M.L. (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc. Natl Acad. Sci. USA, 110, 12349–12354.

76. Bolotin,E., Chellappa,K., Hwang-Verslues,W., Schnabl,J.M., Yang,C. and Sladek,F.M. (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics, 12, 560.

77. Chu,S.W., Noyes,M.B., Christensen,R.G., Pierce,B.G., Zhu,L.J., Weng,Z., Stormo,G.D. and Wolfe,S.A. (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res., 22, 1889–1898.

78. Noyes,M.B., Christensen,R.G., Wakabayashi,A., Stormo,G.D., Brodsky,M.H. and Wolfe,S.A. (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell, 133, 1277–1289.

79. Fang,B., Mane-Padros,D., Bolotin,E., Jiang,T. and Sladek,F.M. (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res., 40, 5343–5356.

80. Zhou,T., Yang,L., Lu,Y., Dror,I., Dantas Machado,A.C., Ghane,T., Di Felice,R. and Rohs,R. (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res., 41, W56–W62.

81. Badis,G., Chan,E.T., van Bakel,H., Pena-Castillo,L., Tillo,D., Tsui,K., Carlson,C.D., Gossett,A.J., Hasinoff,M.J., Warren,C.L. et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell, 32, 878–887.

82. Kim,J. and Struhl,K. (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res., 23, 2531–2537.

83. Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. and Bonovich,M. (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol., 22, 6321–6335.

84. Kuo,D., Licon,K., Bandyopadhyay,S., Chuang,R., Luo,C., Catalana,J., Ravasi,T., Tan,K. and Ideker,T. (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res., 20, 1672–1678.

85. Khorasanizadeh,S. and Rastinejad,F. (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem. Sci., 26, 384–390.

86. Kurokawa,R., DiRenzo,J., Boehm,M., Sugarman,J., Gloss,B., Rosenfeld,M.G., Heyman,R.A. and Glass,C.K. (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature, 371, 528–531.

87. Luscombe,N.M., Austin,S.E., Berman,H.M. and Thornton,J.M. (2000) An overview of the structures of protein-DNA complexes. Genome Biol., 1, REVIEWS001.

88. Perkins,A.S., Fishel,R., Jenkins,N.A. and Copeland,N.G. (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol. Cell. Biol., 11, 2665–2674.

89. Delwel,R., Funabiki,T., Kreider,B.L., Morishita,K. and Ihle,J.N. (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol. Cell. Biol., 13, 4291–4300.

90. Funabiki,T., Kreider,B.L. and Ihle,J.N. (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene, 9, 1575–1581.

91. Klemm,J.D., Rould,M.A., Aurora,R., Herr,W. and Pabo,C.O. (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.

92. Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., Van Leeuwen,H.C., Strating,M.J. and van der Vliet,P.C. (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J., 11, 4993–5003.

93. Kemler,I., Schreiber,E., Muller,M.M., Matthias,P. and Schaffner,W. (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J., 8, 2001–2008.

94. Kersten,S., Gronemeyer,H. and Noy,N. (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J. Biol. Chem., 272, 12771–12777.

95. Jolma,A., Yan,J., Whitington,T., Toivonen,J., Nitta,K.R., Rastas,P., Morgunova,E., Enge,M., Taipale,M., Wei,G. et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327–339.

96. Kim,J.B., Spotts,G.D., Halvorsen,Y.D., Shih,H.M., Ellenberger,T., Towle,H.C. and Spiegelman,B.M. (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol. Cell. Biol., 15, 2582–2588.

97. Parraga,A., Bellsolell,L., Ferre-D’Amare,A.R. and Burley,S.K. (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. Structure, 6, 661–672.

98. Fordyce,P.M., Pincus,D., Kimmig,P., Nelson,C.S., El-Samad,H., Walter,P. and DeRisi,J.L. (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc. Natl Acad. Sci. USA, 109, E3084–E3093.

99. Pique-Regi,R., Degner,J.F., Pai,A.A., Gaffney,D.J., Gilad,Y. and Pritchard,J.K. (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res., 21, 447–455.

100. Bulyk,M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

101. Levine,M. and Tjian,R. (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson,A.D. (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr. Opin. Genet. Dev., 5, 552–558.

103. Wolberger,C. (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu. Rev. Biophys. Biomol. Struct., 28, 29–56.

104. Johnson,A.D. (1992) In: McKnight,S.L. and Yamamoto,K.R. (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim,S., Brostromer,E., Xing,D., Jin,J., Chong,S., Ge,H., Wang,S., Gu,C., Yang,L., Gao,Y.Q. et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 1267–1276.

107. Joshi,R., Passner,J.M., Rohs,R., Jain,R., Sosinsky,A., Crickmore,M.A., Jacob,V., Aggarwal,A.K., Honig,B. and Mann,R.S. (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo,H.D. and Mann,R.S. (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev., 13, 1704–1716.

109. Carey,M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne,M. and Gann,A. (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika,M., Williams,A.J., Chen,G., Collins,T. and Thanos,D. (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol. Cell, 1, 277–287.

112. Kim,T.K. and Maniatis,T. (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol. Cell, 1, 119–129.

113. Masternak,K., Muhlethaler-Mottet,A., Villard,J., Zufferey,M., Steimle,V. and Reith,W. (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev., 14, 1156–1166.

114. del Blanco,B., Garcia-Mariscal,A., Wiest,D.L. and Hernandez-Munain,C. (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J. Immunol., 188, 3278–3293.

115. Benoist,C. and Mathis,D. (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8, 681–715.

116. Hall,J.M., McDonnell,D.P. and Korach,K.S. (2002) Allosteric regulation of estrogen receptor structure, function, and coactivator recruitment by different estrogen response elements. Mol. Endocrinol., 16, 469–486.

117. Leung,T.H., Hoffmann,A. and Baltimore,D. (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing,S.H., Pufall,M.A., So,A.Y., Bates,D.L., Chen,L. and Yamamoto,K.R. (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal,N., Tomar,A. and Nagaraju,J. (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res., 39, 9574–9591.

120. Wang,V.Y., Huang,W., Asagiri,M., Spann,N., Hoffmann,A., Glass,C. and Ghosh,G. (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports, 2, 824–839.

121. Remenyi,A., Scholer,H.R. and Wilmanns,M. (2004) Combinatorial control of gene expression. Nat. Struct. Mol. Biol., 11, 812–815.

122. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

123. Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

124. Scully,K.M., Jacobson,E.M., Jepsen,K., Lunyak,V., Viadiu,H., Carriere,C., Rose,D.W., Hooshmand,F., Aggarwal,A.K. and Rosenfeld,M.G. (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi,A., Tomilin,A., Pohl,E., Lins,K., Philippsen,A., Reinbold,R., Scholer,H.R. and Wilmanns,M. (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569–580.

126. Chasman,D., Cepek,K., Sharp,P.A. and Pabo,C.O. (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev., 13, 2650–2657.

127. Ingraham,H.A., Chen,R.P., Mangalam,H.J., Elsholtz,H.P., Flynn,S.E., Lin,C.R., Simmons,D.M., Swanson,L. and Rosenfeld,M.G. (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham,H.A., Flynn,S.E., Voss,J.W., Albert,V.R., Kapiloff,M.S., Wilson,L. and Rosenfeld,M.G. (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb,R., Huang,C.C., Aufiero,D.J. and Herr,W. (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol. Cell. Biol., 21, 4700–4712.

130. Kuras,L., Cherest,H., Surdin-Kerjan,Y. and Thomas,D. (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J., 15, 2519–2529.

131. Blaiseau,P.L. and Thomas,D. (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J., 17, 6327–6336.

132. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-KappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

133. Ferraris,L., Stewart,A.P., Kang,J., DeSimone,A.M., Gemberling,M., Tantin,D. and Fairbrother,W.G. (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res., 21, 1055–1064.

134. Bolotin,E., Liao,H., Ta,T.C., Yang,C., Hwang-Verslues,W., Evans,J.R., Jiang,T. and Sladek,F.M. (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.


Hmga1 bind cooperatively to the IFNβ promoter. Multiple proteins in the DNA-bound complex contribute to the cooperative recruitment of CBP and operate synergistically to enhance transcription (111). While NF-κB, the ATF-2/c-Jun dimer and Irf factors can recruit CBP individually, studies have demonstrated that removal of the activation domains from IRF or NF-κB factors resulted in complete loss of CBP recruitment, highlighting the cooperative nature of the CBP cofactor recruitment (111). A similar situation occurs in the cooperative recruitment of class II major histocompatibility complex transactivator (CIITA) to a set of conserved transcriptional control elements found in the promoters of major histocompatibility complex class II (MHC-II) genes (113,115). The control sequences contain four DNA elements, termed the W, X, X2 and Y boxes, with a conserved sequence spacing and orientation across the different promoters. The TFs X-box binding protein (RFX), X2-binding protein (X2BP) and Nuclear factor Y (NF-Y) bind to the X, X2 and Y boxes, respectively, and form a cooperative nucleoprotein enhanceosome complex. This multi-protein complex recruits the CIITA to the MHC-II promoters via multiple weak interactions mediated by different components of the enhanceosome complex (113). Removal of interactions from any of these component proteins leads to significant loss of CIITA recruitment.

Allosteric effects in cofactor recruitment

DNA can act as a sequence-specific allosteric ligand that alters the function of a DNA-bound TF (116–121). A common mechanism by which this functions is by DNA sequence-dependent effects on cofactor recruitment. A TF bound to one set of DNA sites will recruit a specific cofactor, whereas the same TF bound to a different set of sites will not recruit the cofactor. Therefore, allosteric control of cofactor recruitment functionally partitions DNA binding sites of a TF and, consequently, refines binding specificity of the recruited cofactor (Figure 2C).

Allosteric mechanisms have been reported for both the glucocorticoid (GR) and the estrogen (ER) nuclear receptors (116,118). The DNA sequence bound by ER can influence its transcriptional activity and recruitment of LXXLL-containing peptides and p160 cofactors (SRC-1, GRIP1, ACTR) (116). The DNA sequence bound by GR, called the GR response element (GRE), was shown to influence gene expression, but the effect did not correlate with GR-binding affinity, suggesting a mechanism that involves differential recruitment of cofactor proteins (118). Furthermore, knock down of Brahma, a subunit of the SWI/SNF nucleosome remodeling complex, affected GR-dependent gene expression in a GRE sequence-dependent manner, suggesting that the composition of the DNA-bound protein complexes was allosterically regulated. Allosteric control of cofactor recruitment has also been described for different NF-κB family dimers (117,120). Single-base changes in NF-κB binding sites have been demonstrated to affect recruitment of the TF Irf3 protein by DNA-bound p65/Rel-containing dimers (117), as well as the recruitment of the cofactors histone de-acetylase 3 (HDAC3) and Tip60 to a DNA-bound ternary complex containing p50 homodimers and Bcl3 (120).

The mechanism of allosteric control for GR and NF-κB appears to be mediated by structural differences between TFs bound to different DNA sites. Structural studies of GR bound to different GREs demonstrated that the structure of a ‘lever arm’ loop region connecting the DNA recognition helix and the ligand binding domain was highly dependent on the GRE sequence (118). Studies of NF-κB dimers bound to different κB sites have shown that DNA sequence differences can result in structural rearrangements (rotations and translations) of one dimer subunit (19,21,122). Protease protection assays and detailed binding studies have also suggested DNA sequence-dependent changes in protein conformation (72,123).

In contrast to the situation for GR and NF-κB, where allostery is mediated by structural differences between the DNA-bound complexes, allostery involving the POU factors operates via larger, more global differences in quaternary structure when bound to different DNA sites. POU factors, such as Oct-1 described above, have two DBDs connected by a flexible linker: a POU-specific domain (POUS) and a POU homeodomain (POUHD) (121). POU factors can homo- and heterodimerize to a diverse set of DNA-binding sites in which the relative orientations and spacing of the POUS and POUHD can vary and subsequently lead to differential cofactor recruitment (121,124). The POU factors Oct-1 and Oct-2 can bind as homodimers to two distinct response elements, PORE and MORE, found in the promoters of B-cell-specific genes (121). When bound to the PORE sequence, Oct dimers can recruit the transcriptional coactivator Oct-binding factor 1 (OBF-1); however, dimers bound to the MORE sequence do not recruit OBF-1. Crystal structures of the DNA-bound Oct-1 dimer have revealed that in the MORE-bound dimer, the OBF-1 binding site on Oct-1 is blocked due to the relative orientation of the dimer subunits, but in the PORE-bound dimer the site is exposed, allowing OBF-1 recruitment (125,126). A similar situation arises with the Pit-1 POU factor, in which binding site differences lead to cell-type-specific expression patterns and differential recruitment of the N-CoR co-repressor complex (124,127,128). Studies revealed that an extra 2 bp between the POUS and POUHD half-sites found in the growth hormone (GH) gene promoter, compared with a similar affinity site in the prolactin gene promoter, altered the quaternary structure of the DNA-bound Pit-1 and enabled the recruitment of the N-CoR co-repressor complex to the GH site (124).

Cofactor-based specificity

Targeting of non-DNA-binding cofactors to DNA is traditionally viewed as being mediated solely via protein–protein interactions with DNA-bound TFs. In other words, the cofactor is ‘recruited’ and interacts with DNA indirectly through the TFs. However, several studies have described a provocative alternative mechanism in which recruited cofactors interact directly with DNA when part of a larger multi-protein complex (48,129). Further, as part of this larger complex, the non-DNA-binding cofactors mediate sequence-specific interactions that preferentially stabilize the binding to composite DNA sites containing specific auxiliary motifs (Figure 2D). This represents yet another mechanism to enhance the DNA sequence specificity of multi-protein regulatory complexes.

The regulation of sulfur metabolism genes in yeast involves a multi-protein complex composed of a sequence-specific TF, Cbf1, and two non-DNA-binding cofactors, Met28 and Met4 (130,131). Cbf1 is a bHLH protein and binds as a homodimer to E-box sites with a consensus 5′-CACGTG-3′ core (131). A PBM-based analysis revealed that the Met4/Met28/Cbf1 complex binds preferentially to composite DNA sites in which the Cbf1 E-box is flanked by an adjacent 5′-RYAAT-3′ ‘recruitment’ motif (i.e. 5′-RYAATNNCACGTG-3′) (48). High-affinity binding to the composite sites is highly cooperative and requires the full heteromeric complex. Sequence analysis, modeling and binding assays suggest that the recruitment motif is recognized by the non-DNA-binding Met28 subunit. A highly similar situation has also been described for the multi-protein Oct-1/HCF-1/VP16 complex that regulates expression of the herpes simplex virus immediate-early genes by binding to a composite 5′-TAATGARAT sequence in the gene promoters (129). The complex is composed of the DNA-binding TF Oct-1 and two non-DNA-binding cofactors: the viral activator VP16 and the host cell factor 1 (HCF-1). Oct-1 binds to the 5′-TAAT sequence, and a DNA-binding surface of VP16 mediates interactions with the 5′-GARAT sequence. Similar to the situation in yeast, the VP16 cofactor does not interact with DNA well on its own, but when stabilized on DNA as part of a larger multi-protein complex, it can adopt a configuration that mediates recognition of the 5′-GARAT subsequence. Therefore, in both situations, multi-protein complexes preferentially bind to composite binding sites composed of a recruitment motif (5′-RYAAT-3′ and 5′-GARAT-3′) and an adjacent TF-binding motif (5′-CACGTG-3′ and 5′-TAAT-3′) that are recognized by a non-DNA-binding cofactor (Met28 and VP16) and sequence-specific TF (Cbf1 and Oct-1) subunits, respectively.
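The composite-site idea described above can be made concrete with a small sketch. The Python below is an illustration and not code from the cited studies: it assumes the IUPAC motifs quoted in the text (R = A/G, Y = C/T, N = any base), scans both strands of a made-up sequence, and uses hypothetical helper names (`iupac_to_regex`, `find_sites`). A consensus match of this kind only flags candidate sites; it carries no information about measured affinity or cooperativity.

```python
import re

# IUPAC codes needed for the motifs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif)


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Return sorted (start, strand, match) tuples for motif hits on either strand.

    Start positions are 0-based and always given in forward-strand coordinates.
    """
    seq = seq.upper()
    # A lookahead group lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        start = len(seq) - m.start() - len(motif)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


COMPOSITE = "RYAATNNCACGTG"  # recruitment motif + 2-bp spacer + E-box, as in the text
EBOX = "CACGTG"              # Cbf1 core site

# Made-up sequence: one E-box with the flanking motif, one without.
example = "TTGCAATGACACGTGAACCACGTGTT"

composite_hits = find_sites(example, COMPOSITE)
ebox_hits = find_sites(example, EBOX)

print(composite_hits)                         # composite site at position 2 only
print(sorted({start for start, _, _ in ebox_hits}))  # E-boxes at positions 9 and 18
```

In this toy sequence both E-boxes match the Cbf1 core, but only the first is preceded by the recruitment motif, which is the distinction the composite-site model draws. Because the E-box is palindromic, each E-box is reported once per strand.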

SUMMARY

Complexities in DNA binding operate at multiple levels and complicate efforts to construct simple models or recognition codes of DNA binding. Flexible protein–DNA interactions and non-local structural effects confound simple descriptions of DNA base preferences and require the use of binding models, such as PWMs, that can account for the resulting sequence degeneracy. The ability of proteins to bind DNA via multiple modes further complicates the situation and leads to the requirement of multiple binding models for many proteins. Fortunately, HT technologies are not only increasing the rate at which DNA-binding proteins are being characterized, but are providing the comprehensive binding data needed to construct models that involve multiple DNA-binding modes (23,24,27,28,35,38–40,42,43,48,59,73,74,79,132–134). An added benefit of the comprehensive nature of certain HT datasets, such as comprehensive k-mer or binding site level data, is that predictions of binding sites in the genome can be performed directly using the measured binding data (23,24,40,43,133).

Multi-protein complexes add tremendously to the DNA-binding diversity of proteins and provide a key mechanism to integrate signals in gene regulation. However, these higher-order, multi-protein mechanisms cause difficulties in constructing models, primarily because DNA-binding models of proteins examined in isolation may not capture the binding sites utilized in vivo when cofactors are present. Here again, HT technologies are being used successfully to characterize binding of multi-protein complexes, revealing recognition codes mediated by cooperative binding (40,133) and cofactor-mediated targeting (48). The continued application of HT technologies to examine DNA binding of multi-protein complexes will undoubtedly provide increasingly refined binding models and new insights into gene targeting in vivo. Furthermore, just as the application of HT technologies has drawn our attention to the prevalence of TFs that exhibit subtle binding preferences or bind DNA via multiple modes, studies examining multi-protein complexes will likely identify widespread use of higher-order mechanisms such as allostery, cooperative recruitment and cofactor-mediated targeting. Integrating these datasets with whole-genome chromatin immunoprecipitation (ChIP) and expression datasets will lead to much more complete and sophisticated descriptions of specificity in gene regulation.
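The point made in this summary, that comprehensive k-mer data allow genomic binding sites to be predicted directly from the measurements, can be sketched as a simple table lookup. The Python below is a minimal illustration under stated assumptions: the score table is invented for the example (real values would come from a PBM or a similar HT assay), k is set to 6 only to keep the example short, and the function names `score_windows` and `predict_sites` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an A/C/G/T k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def score_windows(seq, kmer_scores, k, default=0.0):
    """Score every k-long window by direct lookup in a table of measured scores.

    Each window is looked up on both strands and the higher score is kept;
    k-mers absent from the table receive `default`.
    """
    seq = seq.upper()
    scored = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        forward = kmer_scores.get(kmer, default)
        reverse = kmer_scores.get(reverse_complement(kmer), default)
        scored.append((i, kmer, max(forward, reverse)))
    return scored


def predict_sites(seq, kmer_scores, k, threshold):
    """Keep the windows whose looked-up score reaches the threshold."""
    return [hit for hit in score_windows(seq, kmer_scores, k) if hit[2] >= threshold]


# Hypothetical scores for illustration only; not measured data.
toy_scores = {"CACGTG": 0.49, "CACGTT": 0.31, "CACATG": 0.22}

sites = predict_sites("GGCACGTGAAACGTGTT", toy_scores, k=6, threshold=0.3)
print(sites)
```

No motif model is fitted here: the prediction is only as good as the coverage of the score table, which is why the approach depends on datasets that measure all k-mers. The second reported window is found through its reverse complement, illustrating why both strands are checked.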

FUNDING

NIH [K22AI093793 to T.S.]; PhRMA Foundation Research Starter Grant (to R.G.); Basil O’Connor Research Award 5-FY13-212 from March of Dimes Foundation (to R.G.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl Acad. Sci. USA, 73, 804–808.

2. Rohs,R., Jin,X., West,S.M., Joshi,R., Honig,B. and Mann,R.S. (2010) Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem., 79, 233–269.

3. Garvie,C.W. and Wolberger,C. (2001) Recognition of specific DNA sequences. Mol. Cell, 8, 937–946.

4. Jayaram,B. and Jain,T. (2004) The role of water in protein-DNA recognition. Annu. Rev. Biophys. Biomol. Struct., 33, 343–361.

5. Reddy,C.K., Das,A. and Jayaram,B. (2001) Do water molecules mediate protein-DNA recognition? J. Mol. Biol., 314, 619–632.

6. Miller,J.C. and Pabo,C.O. (2001) Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J. Mol. Biol., 313, 309–315.

7. Baldwin,E.P., Martin,S.S., Abel,J., Gelato,K.A., Kim,H., Schultz,P.G. and Santoro,S.W. (2003) A specificity switch in selected cre recombinase variants is mediated by macromolecular plasticity and water. Chem. Biol., 10, 1085–1094.

8. Otwinowski,Z., Schevitz,R.W., Zhang,R.G., Lawson,C.L., Joachimiak,A., Marmorstein,R.Q., Luisi,B.F. and Sigler,P.B. (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature, 335, 321–329.

9. Rohs,R., West,S.M., Sosinsky,A., Liu,P., Mann,R.S. and Honig,B. (2009) The role of DNA shape in protein-DNA recognition. Nature, 461, 1248–1253.

10. Chang,Y.P., Xu,M., Machado,A.C., Yu,X.J., Rohs,R. and Chen,X.S. (2013) Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Reports, 3, 1117–1127.

11. Werner,M.H., Gronenborn,A.M. and Clore,G.M. (1996) Intercalation, DNA kinking, and the control of transcription. Science, 271, 778–784.

12. Kim,J.L., Nikolov,D.B. and Burley,S.K. (1993) Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature, 365, 520–527.

13. Rohs,R., Sklenar,H. and Shakked,Z. (2005) Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure, 13, 1499–1509.

14. Pabo,C.O. and Nekludova,L. (2000) Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J. Mol. Biol., 301, 597–624.

15. Suzuki,M., Gerstein,M. and Yagi,N. (1994) Stereochemical basis of DNA recognition by Zn fingers. Nucleic Acids Res., 22, 3397–3405.

16. Kortemme,T., Morozov,A.V. and Baker,D. (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J. Mol. Biol., 326, 1239–1259.

17. Siggers,T.W. and Honig,B. (2007) Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res., 35, 1085–1097.

18. Siggers,T.W., Silkov,A. and Honig,B. (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J. Mol. Biol., 345, 1027–1045.

19. Chen,F.E. and Ghosh,G. (1999) Regulation of DNA binding by Rel/NF-kappaB transcription factors: structural views. Oncogene, 18, 6845–6852.

20. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

21. Chen,Y.Q., Sengchanthalangsy,L.L., Hackett,A. and Ghosh,G. (2000) NF-kappaB p65 (RelA) homodimer uses distinct mechanisms to recognize DNA targets. Structure, 8, 419–428.

22. Yanover,C. and Bradley,P. (2011) Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res., 39, 4564–4576.

23. Berger,M.F., Badis,G., Gehrke,A.R., Talukder,S., Philippakis,A.A., Pena-Castillo,L., Alleyne,T.M., Mnaimneh,S., Botvinnik,O.B., Chan,E.T. et al. (2008) Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 133, 1266–1276.

24. Badis,G., Berger,M.F., Philippakis,A.A., Talukder,S., Gehrke,A.R., Jaeger,S.A., Chan,E.T., Metzler,G., Vedenko,A., Chen,X. et al. (2009) Diversity and complexity in DNA recognition by transcription factors. Science, 324, 1720–1723.

25. Ramirez,C.L., Foley,J.E., Wright,D.A., Muller-Lerch,F., Rahman,S.H., Cornu,T.I., Winfrey,R.J., Sander,J.D., Fu,F., Townsend,J.A. et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat. Methods, 5, 374–375.

26. Wei,G.H., Badis,G., Berger,M.F., Kivioja,T., Palin,K., Enge,M., Bonke,M., Jolma,A., Varjosalo,M., Gehrke,A.R. et al. (2010) Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J., 29, 2147–2160.

27. Meng,X., Smith,R.M., Giesecke,A.V., Joung,J.K. and Wolfe,S.A. (2006) Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques, 40, 179–184.

28. Noyes,M.B., Meng,X., Wakabayashi,A., Sinha,S., Brodsky,M.H. and Wolfe,S.A. (2008) A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res., 36, 2547–2560.

29. Berger,M.F., Philippakis,A.A., Qureshi,A.M., He,F.S., Estep,P.W. 3rd and Bulyk,M.L. (2006) Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol., 24, 1429–1435.

30. Bulyk,M.L., Gentalen,E., Lockhart,D.J. and Church,G.M. (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat. Biotechnol., 17, 573–577.

31. Mukherjee,S., Berger,M.F., Jona,G., Wang,X.S., Muzzey,D., Snyder,M., Young,R.A. and Bulyk,M.L. (2004) Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet., 36, 1331–1339.

32. Linnell,J., Mott,R., Field,S., Kwiatkowski,D.P., Ragoussis,J. and Udalova,I.A. (2004) Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res., 32, e44.

33. Field,S., Udalova,I. and Ragoussis,J. (2007) Accuracy and reproducibility of protein-DNA microarray technology. Adv. Biochem. Eng. Biotechnol., 104, 87–110.

34. Bonham,A.J., Neumann,T., Tirrell,M. and Reich,N.O. (2009) Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res., 37, e94.

35. Maerkl,S.J. and Quake,S.R. (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science, 315, 233–237.

36. Zykovich,A., Korf,I. and Segal,D.J. (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res., 37, e151.

37. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

38. Zhao,Y., Granas,D. and Stormo,G.D. (2009) Inferring binding energies from selected binding sites. PLoS Comput. Biol., 5, e1000590.

39. Jolma,A., Kivioja,T., Toivonen,J., Cheng,L., Wei,G., Enge,M., Taipale,M., Vaquerizas,J.M., Yan,J., Sillanpaa,M.J. et al. (2010) Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res., 20, 861–873.

40. Slattery,M., Riley,T., Liu,P., Abe,N., Gomez-Alcala,P., Dror,I., Zhou,T., Rohs,R., Honig,B., Bussemaker,H.J. et al. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell, 147, 1270–1282.

41. Tantin,D., Gemberling,M., Callister,C. and Fairbrother,W.G. (2008) High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res., 18, 631–639.

42. Warren,C.L., Kratochvil,N.C., Hauschild,K.E., Foister,S., Brezinski,M.L., Dervan,P.B., Phillips,G.N. Jr and Ansari,A.Z. (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc. Natl Acad. Sci. USA, 103, 867–872.

43. Nutiu,R., Friedman,R.C., Luo,S., Khrebtukova,I., Silva,D., Li,R., Zhang,L., Schroth,G.P. and Burge,C.B. (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol., 29, 659–664.

44. Weirauch,M.T. and Hughes,T.R. Dramatic changes in transcription factor binding over evolutionary time. Genome Biol., 11, 122.

45. Gordan,R., Shen,N., Dror,I., Zhou,T., Horton,J., Rohs,R. and Bulyk,M.L. (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports, 3, 1093–1104.

46. Stormo,G.D. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.

47. Maniatis,T., Ptashne,M., Backman,K., Kield,D., Flashman,S., Jeffrey,A. and Maurer,R. (1975) Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda. Cell, 5, 109–113.

48. Siggers,T., Duyzend,M.H., Reddy,J., Khan,S. and Bulyk,M.L. (2011) Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol. Syst. Biol., 7, 555.

49. Staden,R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res., 12, 505–519.

50. Workman,C.T., Yin,Y., Corcoran,D.L., Ideker,T., Stormo,G.D. and Benos,P.V. (2005) enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res., 33, W389–W392.

51. Man,T.K. and Stormo,G.D. (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res., 29, 2471–2478.

52. Bulyk,M.L., Johnson,P.L. and Church,G.M. (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30, 1255–1261.

53. Tomovic,A. and Oakeley,E.J. (2007) Position dependencies in transcription factor binding sites. Bioinformatics, 23, 933–941.

54. Zhou,Q. and Liu,J.S. (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics, 20, 909–916.

55. Agius,P., Arvey,A., Chang,W., Noble,W.S. and Leslie,C. (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol., 6, pii: e1000916.

56. Ben-Gal,I., Shani,A., Gohr,A., Grau,J., Arviv,S., Shmilovici,A., Posch,S. and Grosse,I. (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics, 21, 2657–2666.

57. Gershenzon,N.I., Stormo,G.D. and Ioshikhes,I.P. (2005) Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res., 33, 2290–2301.

58. Zhao,Y., Ruan,S., Pandey,M. and Stormo,G.D. (2012) Improved models for transcription factor binding site identification using nonindependent interactions. Genetics, 191, 781–790.

59. Weirauch,M.T., Cote,A., Norel,R., Annala,M., Zhao,Y., Riley,T.R., Saez-Rodriguez,J., Cokelaer,T., Vedenko,A., Talukder,S. et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol., 31, 126–134.

60. Mordelet,F., Horton,J., Hartemink,A.J., Engelhardt,B.E. and Gordan,R. (2013) Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics, 29, i117–i125.

61. Siddharthan,R. (2010) Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One, 5, e9722.

62. Mathelier,A. and Wasserman,W.W. (2013) The next generation of transcription factor binding site prediction. PLoS Comput. Biol., 9, e1003214.

63. Jiang,J. and Levine,M. (1993) Binding affinities and cooperative interactions with bHLH activators delimit threshold responses to the dorsal gradient morphogen. Cell, 72, 741–752.

64. White,M.A., Parker,D.S., Barolo,S. and Cohen,B.A. (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol. Syst. Biol., 8, 614.

65. Scardigli,R., Baumer,N., Gruss,P., Guillemot,F. and Le Roux,I. (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development, 130, 3269–3281.

66. Struhl,G., Struhl,K. and Macdonald,P.M. (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell, 57, 1259–1273.

67. Rowan,S., Siggers,T., Lachke,S.A., Yue,Y., Bulyk,M.L. and Maas,R.L. (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev., 24, 980–985.

68. Gaudet,J. and Mango,S.E. (2002) Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science, 295, 821–825.

69. Jaeger,S.A., Chan,E.T., Berger,M.F., Stottmann,R., Hughes,T.R. and Bulyk,M.L. (2010) Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics, 95, 185–195.

70. Tanay,A. (2006) Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res., 16, 962–972.

71. Segal,E., Raveh-Sadka,T., Schroeder,M., Unnerstall,U. and Gaul,U. (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 451, 535–540.

72. Siggers,T., Chang,A.B., Teixeira,A., Wong,D., Williams,K.J., Ahmed,B., Ragoussis,J., Udalova,I.A., Smale,S.T. and Bulyk,M.L. (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat. Immunol., 13, 95–102.

73. Zhu,C., Byers,K.J., McCord,R.P., Shi,Z., Berger,M.F., Newburger,D.E., Saulrieta,K., Smith,Z., Shah,M.V., Radhakrishnan,M. et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res., 19, 556–566.

74. Gordan,R., Murphy,K.F., McCord,R.P., Zhu,C., Vedenko,A. and Bulyk,M.L. (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol., 12, R125.

75. Nakagawa,S., Gisselbrecht,S.S., Rogers,J.M., Hartl,D.L. and Bulyk,M.L. (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc. Natl Acad. Sci. USA, 110, 12349–12354.

76. Bolotin,E., Chellappa,K., Hwang-Verslues,W., Schnabl,J.M., Yang,C. and Sladek,F.M. (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics, 12, 560.

77. Chu,S.W., Noyes,M.B., Christensen,R.G., Pierce,B.G., Zhu,L.J., Weng,Z., Stormo,G.D. and Wolfe,S.A. (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res., 22, 1889–1898.

78. Noyes,M.B., Christensen,R.G., Wakabayashi,A., Stormo,G.D., Brodsky,M.H. and Wolfe,S.A. (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell, 133, 1277–1289.

79. Fang,B., Mane-Padros,D., Bolotin,E., Jiang,T. and Sladek,F.M. (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res., 40, 5343–5356.

80. Zhou,T., Yang,L., Lu,Y., Dror,I., Dantas Machado,A.C., Ghane,T., Di Felice,R. and Rohs,R. (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res., 41, W56–W62.

81. Badis,G., Chan,E.T., van Bakel,H., Pena-Castillo,L., Tillo,D., Tsui,K., Carlson,C.D., Gossett,A.J., Hasinoff,M.J., Warren,C.L. et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell, 32, 878–887.

82. Kim,J. and Struhl,K. (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res., 23, 2531–2537.

83. Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. and Bonovich,M. (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol., 22, 6321–6335.

84. Kuo,D., Licon,K., Bandyopadhyay,S., Chuang,R., Luo,C., Catalana,J., Ravasi,T., Tan,K. and Ideker,T. (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res., 20, 1672–1678.

85. Khorasanizadeh,S. and Rastinejad,F. (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem. Sci., 26, 384–390.

86. Kurokawa,R., DiRenzo,J., Boehm,M., Sugarman,J., Gloss,B., Rosenfeld,M.G., Heyman,R.A. and Glass,C.K. (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature, 371, 528–531.

87. Luscombe,N.M., Austin,S.E., Berman,H.M. and Thornton,J.M. (2000) An overview of the structures of protein-DNA complexes. Genome Biol., 1, REVIEWS001.

88. Perkins,A.S., Fishel,R., Jenkins,N.A. and Copeland,N.G. (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol. Cell. Biol., 11, 2665–2674.

89. Delwel,R., Funabiki,T., Kreider,B.L., Morishita,K. and Ihle,J.N. (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol. Cell. Biol., 13, 4291–4300.

90. Funabiki,T., Kreider,B.L. and Ihle,J.N. (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene, 9, 1575–1581.

91. Klemm,J.D., Rould,M.A., Aurora,R., Herr,W. and Pabo,C.O. (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.

92. Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., Van Leeuwen,H.C., Strating,M.J. and van der Vliet,P.C. (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J., 11, 4993–5003.

93. Kemler,I., Schreiber,E., Muller,M.M., Matthias,P. and Schaffner,W. (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J., 8, 2001–2008.

94. Kersten,S., Gronemeyer,H. and Noy,N. (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J. Biol. Chem., 272, 12771–12777.

95. Jolma,A., Yan,J., Whitington,T., Toivonen,J., Nitta,K.R., Rastas,P., Morgunova,E., Enge,M., Taipale,M., Wei,G. et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327–339.

96. Kim,J.B., Spotts,G.D., Halvorsen,Y.D., Shih,H.M., Ellenberger,T., Towle,H.C. and Spiegelman,B.M. (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol. Cell. Biol., 15, 2582–2588.

97. Parraga,A., Bellsolell,L., Ferre-D’Amare,A.R. and Burley,S.K. (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. Structure, 6, 661–672.

98. Fordyce,P.M., Pincus,D., Kimmig,P., Nelson,C.S., El-Samad,H., Walter,P. and DeRisi,J.L. (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc. Natl Acad. Sci. USA, 109, E3084–E3093.

99. Pique-Regi,R., Degner,J.F., Pai,A.A., Gaffney,D.J., Gilad,Y. and Pritchard,J.K. (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res., 21, 447–455.

100. Bulyk,M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

101. Levine,M. and Tjian,R. (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson,A.D. (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr. Opin. Genet. Dev., 5, 552–558.

103. Wolberger,C. (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu. Rev. Biophys. Biomol. Struct., 28, 29–56.

104. Johnson,A.D. (1992) In: McKnight,S.L. and Yamamoto,K.R. (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim,S., Brostromer,E., Xing,D., Jin,J., Chong,S., Ge,H., Wang,S., Gu,C., Yang,L., Gao,Y.Q. et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 1267–1276.

107. Joshi,R., Passner,J.M., Rohs,R., Jain,R., Sosinsky,A., Crickmore,M.A., Jacob,V., Aggarwal,A.K., Honig,B. and Mann,R.S. (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo,H.D. and Mann,R.S. (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev., 13, 1704–1716.

109. Carey,M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne,M. and Gann,A. (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika,M., Williams,A.J., Chen,G., Collins,T. and Thanos,D. (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol. Cell, 1, 277–287.

112. Kim,T.K. and Maniatis,T. (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol. Cell, 1, 119–129.

113. Masternak,K., Muhlethaler-Mottet,A., Villard,J., Zufferey,M., Steimle,V. and Reith,W. (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev., 14, 1156–1166.

114. del Blanco,B., Garcia-Mariscal,A., Wiest,D.L. and Hernandez-Munain,C. (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J. Immunol., 188, 3278–3293.

115. Benoist,C. and Mathis,D. (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8, 681–715.

116. Hall,J.M., McDonnell,D.P. and Korach,K.S. (2002) Allosteric regulation of estrogen receptor structure, function, and coactivator recruitment by different estrogen response elements. Mol. Endocrinol., 16, 469–486.

117. Leung,T.H., Hoffmann,A. and Baltimore,D. (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing,S.H., Pufall,M.A., So,A.Y., Bates,D.L., Chen,L. and Yamamoto,K.R. (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal,N., Tomar,A. and Nagaraju,J. (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res., 39, 9574–9591.

120. Wang,V.Y., Huang,W., Asagiri,M., Spann,N., Hoffmann,A., Glass,C. and Ghosh,G. (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports, 2, 824–839.

121. Remenyi,A., Scholer,H.R. and Wilmanns,M. (2004) Combinatorial control of gene expression. Nat. Struct. Mol. Biol., 11, 812–815.

122. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

123. Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

124. Scully,K.M., Jacobson,E.M., Jepsen,K., Lunyak,V., Viadiu,H., Carriere,C., Rose,D.W., Hooshmand,F., Aggarwal,A.K. and Rosenfeld,M.G. (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi,A., Tomilin,A., Pohl,E., Lins,K., Philippsen,A., Reinbold,R., Scholer,H.R. and Wilmanns,M. (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569–580.

126. Chasman,D., Cepek,K., Sharp,P.A. and Pabo,C.O. (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev., 13, 2650–2657.

127. Ingraham,H.A., Chen,R.P., Mangalam,H.J., Elsholtz,H.P., Flynn,S.E., Lin,C.R., Simmons,D.M., Swanson,L. and Rosenfeld,M.G. (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham,H.A., Flynn,S.E., Voss,J.W., Albert,V.R., Kapiloff,M.S., Wilson,L. and Rosenfeld,M.G. (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb,R., Huang,C.C., Aufiero,D.J. and Herr,W. (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol. Cell. Biol., 21, 4700–4712.

130. Kuras,L., Cherest,H., Surdin-Kerjan,Y. and Thomas,D. (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J., 15, 2519–2529.

131. Blaiseau,P.L. and Thomas,D. (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J., 17, 6327–6336.

132. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-KappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

133. Ferraris,L., Stewart,A.P., Kang,J., DeSimone,A.M., Gemberling,M., Tantin,D. and Fairbrother,W.G. (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res., 21, 1055–1064.

134. Bolotin,E., Liao,H., Ta,T.C., Yang,C., Hwang-Verslues,W., Evans,J.R., Jiang,T. and Sladek,F.M. (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.


(48129) Further as part of this larger complex the non-DNA-binding cofactors mediate sequence-specific inter-actions that preferentially stabilize the binding to compos-ite DNA sites containing specific auxiliary motifs (Figure2D) This represents yet another mechanism to enhancethe DNA sequence specificity of multi-protein regulatorycomplexes

The regulation of sulfur metabolism genes in yeastinvolves a multi-protein complex composed of asequence-specific TF Cbf1 and two non-DNA-bindingcofactors Met28 and Met4 (130131) Cbf1 is a bHLHprotein and binds as a homodimer to E-box sites with aconsensus 50-CACGTG-30 core (131) A PBM-basedanalysis revealed that the Met4Met28Cbf1 complexbinds preferentially to composite DNA sites in which theCbf1 E-box is flanked by an adjacent 50-RYAAT-30

lsquorecruitmentrsquo motif (ie 50-RYAATNNCACGTG-30)(48) High-affinity binding to the composite sites ishighly cooperative and requires the full heteromericcomplex Sequence analysis modeling and bindingassays suggest that the recruitment motif is recognizedby the non-DNA-binding Met28 subunit A highlysimilar situation has also been described for the multi-protein Oct-1HCF-1VP16 complex that regulates expres-sion of the herpes simplex virus immediate-early genes bybinding to a composite 50-TAATGARAT sequence in thegene promoters (129) The complex is composed of theDNA-binding TF Oct-1 and two non-DNA-bindingcofactors the viral activator VP16 and the host cellfactor 1 (HCF-1) Oct-1 binds to the 50-TAAT sequenceand a DNA-binding surface of VP16 mediates interactionswith the 50-GARAT sequence Similar to the situation inyeast the VP16 cofactor does not interact with DNA wellon its own but when stabilized on DNA as part of a largermulti-protein complex it can adopt a configuration thatmediates recognition of the 50-GARAT subsequenceTherefore in both situations multi-protein complexespreferentially bind to composite binding sites composedof a recruitment motif (50-RYAAT-30 and 50-GARAT-30)and an adjacent TF-binding motif (50-CACGTG-30 and50-TAAT-30) that are recognized by a non-DNA-bindingcofactor (Met28 and VP16) and sequence-specific TF(Cbf1 and Oct-1) subunits respectively

SUMMARY

Complexities in DNA-binding operate at multiple levelsand complicate efforts to construct simple models or rec-ognition codes of DNA-binding Flexible protein-DNAinteractions and non-local structural effects confoundsimple descriptions of DNA base preferences andrequire the use of binding models such as PWMs thatcan account for the resulting sequence degeneracy Theability of proteins to bind DNA via multiple modesfurther complicates the situation and leads to the require-ment of multiple binding models for many proteinsFortunately HT technologies are not only increasingthe rate at which DNA-binding proteins are beingcharacterized but are providing the comprehensivebinding data needed to construct models that involve

multiple DNA-binding modes (232427283538ndash4042434859737479132ndash134) An added benefit of thecomprehensive nature of certain HT datasets such ascomprehensive k-mer or binding site level data is thatpredictions of binding sites in the genome can be per-formed directly using the measured binding data(23244043133)Multi-protein complexes add tremendously to the

DNA-binding diversity of proteins and provide a keymechanism to integrate signals in gene regulationHowever these higher-order multi-protein mechanismscause difficulties in constructing models primarily dueto the fact that DNA-binding models of proteinsexamined in isolation may not capture the binding sitesutilized in vivo when cofactors are present Here again HTtechnologies are being used successfully to characterizebinding of multi-protein complexes revealing recognitioncodes mediated by cooperative binding (40133) andcofactor-mediated targeting (48) The continued applica-tion of HT technologies to examine DNA binding ofmulti-protein complexes will undoubtedly provide increas-ingly refined binding models and provide new insights intogene targeting in vivo Furthermore just as the applicationof HT technologies has drawn our attention to the preva-lence of TFs that exhibit subtle binding preferences orbind DNA via multiple modes studies examining multi-protein complexes will likely identify wide spread use ofhigher-order mechanisms such as allostery cooperativerecruitment and cofactor-mediated targeting Integratingthese datasets with whole genome chromatin immunopre-cipitation (ChIP) and expression datasets will lead tomuch more complete and sophisticated descriptions ofspecificity in gene regulation

FUNDING

NIH [K22AI093793 to T.S.]; PhRMA Foundation Research Starter Grant (to R.G.); Basil O'Connor Research Award 5-FY13-212 from March of Dimes Foundation (to R.G.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl Acad. Sci. USA, 73, 804–808.

2. Rohs,R., Jin,X., West,S.M., Joshi,R., Honig,B. and Mann,R.S. (2010) Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem., 79, 233–269.

3. Garvie,C.W. and Wolberger,C. (2001) Recognition of specific DNA sequences. Mol. Cell, 8, 937–946.

4. Jayaram,B. and Jain,T. (2004) The role of water in protein-DNA recognition. Annu. Rev. Biophys. Biomol. Struct., 33, 343–361.

5. Reddy,C.K., Das,A. and Jayaram,B. (2001) Do water molecules mediate protein-DNA recognition? J. Mol. Biol., 314, 619–632.

6. Miller,J.C. and Pabo,C.O. (2001) Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J. Mol. Biol., 313, 309–315.

7. Baldwin,E.P., Martin,S.S., Abel,J., Gelato,K.A., Kim,H., Schultz,P.G. and Santoro,S.W. (2003) A specificity switch in selected cre recombinase variants is mediated by macromolecular plasticity and water. Chem. Biol., 10, 1085–1094.

8. Otwinowski,Z., Schevitz,R.W., Zhang,R.G., Lawson,C.L., Joachimiak,A., Marmorstein,R.Q., Luisi,B.F. and Sigler,P.B. (1988) Crystal structure of trp repressor/operator complex at atomic resolution. Nature, 335, 321–329.

9. Rohs,R., West,S.M., Sosinsky,A., Liu,P., Mann,R.S. and Honig,B. (2009) The role of DNA shape in protein-DNA recognition. Nature, 461, 1248–1253.

10. Chang,Y.P., Xu,M., Machado,A.C., Yu,X.J., Rohs,R. and Chen,X.S. (2013) Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen. Cell Reports, 3, 1117–1127.

11. Werner,M.H., Gronenborn,A.M. and Clore,G.M. (1996) Intercalation, DNA kinking, and the control of transcription. Science, 271, 778–784.

12. Kim,J.L., Nikolov,D.B. and Burley,S.K. (1993) Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature, 365, 520–527.

13. Rohs,R., Sklenar,H. and Shakked,Z. (2005) Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure, 13, 1499–1509.

14. Pabo,C.O. and Nekludova,L. (2000) Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition? J. Mol. Biol., 301, 597–624.

15. Suzuki,M., Gerstein,M. and Yagi,N. (1994) Stereochemical basis of DNA recognition by Zn fingers. Nucleic Acids Res., 22, 3397–3405.

16. Kortemme,T., Morozov,A.V. and Baker,D. (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J. Mol. Biol., 326, 1239–1259.

17. Siggers,T.W. and Honig,B. (2007) Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res., 35, 1085–1097.

18. Siggers,T.W., Silkov,A. and Honig,B. (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J. Mol. Biol., 345, 1027–1045.

19. Chen,F.E. and Ghosh,G. (1999) Regulation of DNA binding by Rel/NF-kappaB transcription factors: structural views. Oncogene, 18, 6845–6852.

20. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

21. Chen,Y.Q., Sengchanthalangsy,L.L., Hackett,A. and Ghosh,G. (2000) NF-kappaB p65 (RelA) homodimer uses distinct mechanisms to recognize DNA targets. Structure, 8, 419–428.

22. Yanover,C. and Bradley,P. (2011) Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res., 39, 4564–4576.

23. Berger,M.F., Badis,G., Gehrke,A.R., Talukder,S., Philippakis,A.A., Pena-Castillo,L., Alleyne,T.M., Mnaimneh,S., Botvinnik,O.B., Chan,E.T. et al. (2008) Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 133, 1266–1276.

24. Badis,G., Berger,M.F., Philippakis,A.A., Talukder,S., Gehrke,A.R., Jaeger,S.A., Chan,E.T., Metzler,G., Vedenko,A., Chen,X. et al. (2009) Diversity and complexity in DNA recognition by transcription factors. Science, 324, 1720–1723.

25. Ramirez,C.L., Foley,J.E., Wright,D.A., Muller-Lerch,F., Rahman,S.H., Cornu,T.I., Winfrey,R.J., Sander,J.D., Fu,F., Townsend,J.A. et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat. Methods, 5, 374–375.

26. Wei,G.H., Badis,G., Berger,M.F., Kivioja,T., Palin,K., Enge,M., Bonke,M., Jolma,A., Varjosalo,M., Gehrke,A.R. et al. (2010) Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J., 29, 2147–2160.

27. Meng,X., Smith,R.M., Giesecke,A.V., Joung,J.K. and Wolfe,S.A. (2006) Counter-selectable marker for bacterial-based interaction trap systems. Biotechniques, 40, 179–184.

28. Noyes,M.B., Meng,X., Wakabayashi,A., Sinha,S., Brodsky,M.H. and Wolfe,S.A. (2008) A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system. Nucleic Acids Res., 36, 2547–2560.

29. Berger,M.F., Philippakis,A.A., Qureshi,A.M., He,F.S., Estep,P.W. 3rd and Bulyk,M.L. (2006) Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol., 24, 1429–1435.

30. Bulyk,M.L., Gentalen,E., Lockhart,D.J. and Church,G.M. (1999) Quantifying DNA-protein interactions by double-stranded DNA arrays. Nat. Biotechnol., 17, 573–577.

31. Mukherjee,S., Berger,M.F., Jona,G., Wang,X.S., Muzzey,D., Snyder,M., Young,R.A. and Bulyk,M.L. (2004) Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet., 36, 1331–1339.

32. Linnell,J., Mott,R., Field,S., Kwiatkowski,D.P., Ragoussis,J. and Udalova,I.A. (2004) Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res., 32, e44.

33. Field,S., Udalova,I. and Ragoussis,J. (2007) Accuracy and reproducibility of protein-DNA microarray technology. Adv. Biochem. Eng. Biotechnol., 104, 87–110.

34. Bonham,A.J., Neumann,T., Tirrell,M. and Reich,N.O. (2009) Tracking transcription factor complexes on DNA using total internal reflectance fluorescence protein binding microarrays. Nucleic Acids Res., 37, e94.

35. Maerkl,S.J. and Quake,S.R. (2007) A systems approach to measuring the binding energy landscapes of transcription factors. Science, 315, 233–237.

36. Zykovich,A., Korf,I. and Segal,D.J. (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res., 37, e151.

37. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

38. Zhao,Y., Granas,D. and Stormo,G.D. (2009) Inferring binding energies from selected binding sites. PLoS Comput. Biol., 5, e1000590.

39. Jolma,A., Kivioja,T., Toivonen,J., Cheng,L., Wei,G., Enge,M., Taipale,M., Vaquerizas,J.M., Yan,J., Sillanpaa,M.J. et al. (2010) Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res., 20, 861–873.

40. Slattery,M., Riley,T., Liu,P., Abe,N., Gomez-Alcala,P., Dror,I., Zhou,T., Rohs,R., Honig,B., Bussemaker,H.J. et al. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell, 147, 1270–1282.

41. Tantin,D., Gemberling,M., Callister,C. and Fairbrother,W.G. (2008) High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res., 18, 631–639.

42. Warren,C.L., Kratochvil,N.C., Hauschild,K.E., Foister,S., Brezinski,M.L., Dervan,P.B., Phillips,G.N. Jr and Ansari,A.Z. (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc. Natl Acad. Sci. USA, 103, 867–872.

43. Nutiu,R., Friedman,R.C., Luo,S., Khrebtukova,I., Silva,D., Li,R., Zhang,L., Schroth,G.P. and Burge,C.B. (2011) Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol., 29, 659–664.

44. Weirauch,M.T. and Hughes,T.R. Dramatic changes in transcription factor binding over evolutionary time. Genome Biol., 11, 122.

45. Gordan,R., Shen,N., Dror,I., Zhou,T., Horton,J., Rohs,R. and Bulyk,M.L. (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Reports, 3, 1093–1104.

46. Stormo,G.D. (2000) DNA binding sites: representation and discovery. Bioinformatics, 16, 16–23.

47. Maniatis,T., Ptashne,M., Backman,K., Kield,D., Flashman,S., Jeffrey,A. and Maurer,R. (1975) Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda. Cell, 5, 109–113.

48. Siggers,T., Duyzend,M.H., Reddy,J., Khan,S. and Bulyk,M.L. (2011) Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol. Syst. Biol., 7, 555.

49. Staden,R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res., 12, 505–519.

50. Workman,C.T., Yin,Y., Corcoran,D.L., Ideker,T., Stormo,G.D. and Benos,P.V. (2005) enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res., 33, W389–W392.

51. Man,T.K. and Stormo,G.D. (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res., 29, 2471–2478.

52. Bulyk,M.L., Johnson,P.L. and Church,G.M. (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30, 1255–1261.

53. Tomovic,A. and Oakeley,E.J. (2007) Position dependencies in transcription factor binding sites. Bioinformatics, 23, 933–941.

54. Zhou,Q. and Liu,J.S. (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics, 20, 909–916.

55. Agius,P., Arvey,A., Chang,W., Noble,W.S. and Leslie,C. (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol., 6, pii: e1000916.

56. Ben-Gal,I., Shani,A., Gohr,A., Grau,J., Arviv,S., Shmilovici,A., Posch,S. and Grosse,I. (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics, 21, 2657–2666.

57. Gershenzon,N.I., Stormo,G.D. and Ioshikhes,I.P. (2005) Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res., 33, 2290–2301.

58. Zhao,Y., Ruan,S., Pandey,M. and Stormo,G.D. (2012) Improved models for transcription factor binding site identification using nonindependent interactions. Genetics, 191, 781–790.

59. Weirauch,M.T., Cote,A., Norel,R., Annala,M., Zhao,Y., Riley,T.R., Saez-Rodriguez,J., Cokelaer,T., Vedenko,A., Talukder,S. et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol., 31, 126–134.

60. Mordelet,F., Horton,J., Hartemink,A.J., Engelhardt,B.E. and Gordan,R. (2013) Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics, 29, i117–i125.

61. Siddharthan,R. (2010) Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One, 5, e9722.

62. Mathelier,A. and Wasserman,W.W. (2013) The next generation of transcription factor binding site prediction. PLoS Comput. Biol., 9, e1003214.

63. Jiang,J. and Levine,M. (1993) Binding affinities and cooperative interactions with bHLH activators delimit threshold responses to the dorsal gradient morphogen. Cell, 72, 741–752.

64. White,M.A., Parker,D.S., Barolo,S. and Cohen,B.A. (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol. Syst. Biol., 8, 614.

65. Scardigli,R., Baumer,N., Gruss,P., Guillemot,F. and Le Roux,I. (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development, 130, 3269–3281.

66. Struhl,G., Struhl,K. and Macdonald,P.M. (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell, 57, 1259–1273.

67. Rowan,S., Siggers,T., Lachke,S.A., Yue,Y., Bulyk,M.L. and Maas,R.L. (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev., 24, 980–985.

68. Gaudet,J. and Mango,S.E. (2002) Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science, 295, 821–825.

69. Jaeger,S.A., Chan,E.T., Berger,M.F., Stottmann,R., Hughes,T.R. and Bulyk,M.L. (2010) Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics, 95, 185–195.

70. Tanay,A. (2006) Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res., 16, 962–972.

71. Segal,E., Raveh-Sadka,T., Schroeder,M., Unnerstall,U. and Gaul,U. (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 451, 535–540.

72. Siggers,T., Chang,A.B., Teixeira,A., Wong,D., Williams,K.J., Ahmed,B., Ragoussis,J., Udalova,I.A., Smale,S.T. and Bulyk,M.L. (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat. Immunol., 13, 95–102.

73. Zhu,C., Byers,K.J., McCord,R.P., Shi,Z., Berger,M.F., Newburger,D.E., Saulrieta,K., Smith,Z., Shah,M.V., Radhakrishnan,M. et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res., 19, 556–566.

74. Gordan,R., Murphy,K.F., McCord,R.P., Zhu,C., Vedenko,A. and Bulyk,M.L. (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol., 12, R125.

75. Nakagawa,S., Gisselbrecht,S.S., Rogers,J.M., Hartl,D.L. and Bulyk,M.L. (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc. Natl Acad. Sci. USA, 110, 12349–12354.

76. Bolotin,E., Chellappa,K., Hwang-Verslues,W., Schnabl,J.M., Yang,C. and Sladek,F.M. (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics, 12, 560.

77. Chu,S.W., Noyes,M.B., Christensen,R.G., Pierce,B.G., Zhu,L.J., Weng,Z., Stormo,G.D. and Wolfe,S.A. (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res., 22, 1889–1898.

78. Noyes,M.B., Christensen,R.G., Wakabayashi,A., Stormo,G.D., Brodsky,M.H. and Wolfe,S.A. (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell, 133, 1277–1289.

79. Fang,B., Mane-Padros,D., Bolotin,E., Jiang,T. and Sladek,F.M. (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res., 40, 5343–5356.

80. Zhou,T., Yang,L., Lu,Y., Dror,I., Dantas Machado,A.C., Ghane,T., Di Felice,R. and Rohs,R. (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res., 41, W56–W62.

81. Badis,G., Chan,E.T., van Bakel,H., Pena-Castillo,L., Tillo,D., Tsui,K., Carlson,C.D., Gossett,A.J., Hasinoff,M.J., Warren,C.L. et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell, 32, 878–887.

82. Kim,J. and Struhl,K. (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res., 23, 2531–2537.

83. Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. and Bonovich,M. (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol., 22, 6321–6335.

84. Kuo,D., Licon,K., Bandyopadhyay,S., Chuang,R., Luo,C., Catalana,J., Ravasi,T., Tan,K. and Ideker,T. (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res., 20, 1672–1678.

85. Khorasanizadeh,S. and Rastinejad,F. (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem. Sci., 26, 384–390.

86. Kurokawa,R., DiRenzo,J., Boehm,M., Sugarman,J., Gloss,B., Rosenfeld,M.G., Heyman,R.A. and Glass,C.K. (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature, 371, 528–531.

87. Luscombe,N.M., Austin,S.E., Berman,H.M. and Thornton,J.M. (2000) An overview of the structures of protein-DNA complexes. Genome Biol., 1, REVIEWS001.

88. Perkins,A.S., Fishel,R., Jenkins,N.A. and Copeland,N.G. (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol. Cell. Biol., 11, 2665–2674.

89. Delwel,R., Funabiki,T., Kreider,B.L., Morishita,K. and Ihle,J.N. (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol. Cell. Biol., 13, 4291–4300.

90. Funabiki,T., Kreider,B.L. and Ihle,J.N. (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene, 9, 1575–1581.

91. Klemm,J.D., Rould,M.A., Aurora,R., Herr,W. and Pabo,C.O. (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.

92. Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., Van Leeuwen,H.C., Strating,M.J. and van der Vliet,P.C. (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J., 11, 4993–5003.

93. Kemler,I., Schreiber,E., Muller,M.M., Matthias,P. and Schaffner,W. (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J., 8, 2001–2008.

94. Kersten,S., Gronemeyer,H. and Noy,N. (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J. Biol. Chem., 272, 12771–12777.

95. Jolma,A., Yan,J., Whitington,T., Toivonen,J., Nitta,K.R., Rastas,P., Morgunova,E., Enge,M., Taipale,M., Wei,G. et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327–339.

96. Kim,J.B., Spotts,G.D., Halvorsen,Y.D., Shih,H.M., Ellenberger,T., Towle,H.C. and Spiegelman,B.M. (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol. Cell. Biol., 15, 2582–2588.

97. Parraga,A., Bellsolell,L., Ferre-D'Amare,A.R. and Burley,S.K. (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. Structure, 6, 661–672.

98. Fordyce,P.M., Pincus,D., Kimmig,P., Nelson,C.S., El-Samad,H., Walter,P. and DeRisi,J.L. (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc. Natl Acad. Sci. USA, 109, E3084–E3093.

99. Pique-Regi,R., Degner,J.F., Pai,A.A., Gaffney,D.J., Gilad,Y. and Pritchard,J.K. (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res., 21, 447–455.

100. Bulyk,M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

101. Levine,M. and Tjian,R. (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson,A.D. (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr. Opin. Genet. Dev., 5, 552–558.

103. Wolberger,C. (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu. Rev. Biophys. Biomol. Struct., 28, 29–56.

104. Johnson,A.D. (1992) In: McKnight,S.L. and Yamamoto,K.R. (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim,S., Brostromer,E., Xing,D., Jin,J., Chong,S., Ge,H., Wang,S., Gu,C., Yang,L., Gao,Y.Q. et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 1267–1276.

107. Joshi,R., Passner,J.M., Rohs,R., Jain,R., Sosinsky,A., Crickmore,M.A., Jacob,V., Aggarwal,A.K., Honig,B. and Mann,R.S. (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo,H.D. and Mann,R.S. (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev., 13, 1704–1716.

109. Carey,M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne,M. and Gann,A. (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika,M., Williams,A.J., Chen,G., Collins,T. and Thanos,D. (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol. Cell, 1, 277–287.

112. Kim,T.K. and Maniatis,T. (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol. Cell, 1, 119–129.

113. Masternak,K., Muhlethaler-Mottet,A., Villard,J., Zufferey,M., Steimle,V. and Reith,W. (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev., 14, 1156–1166.

114. del Blanco,B., Garcia-Mariscal,A., Wiest,D.L. and Hernandez-Munain,C. (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J. Immunol., 188, 3278–3293.

115. Benoist,C. and Mathis,D. (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8, 681–715.

116. Hall,J.M., McDonnell,D.P. and Korach,K.S. (2002) Allosteric regulation of estrogen receptor structure, function, and coactivator recruitment by different estrogen response elements. Mol. Endocrinol., 16, 469–486.

117. Leung,T.H., Hoffmann,A. and Baltimore,D. (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing,S.H., Pufall,M.A., So,A.Y., Bates,D.L., Chen,L. and Yamamoto,K.R. (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal,N., Tomar,A. and Nagaraju,J. (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res., 39, 9574–9591.

120. Wang,V.Y., Huang,W., Asagiri,M., Spann,N., Hoffmann,A., Glass,C. and Ghosh,G. (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports, 2, 824–839.

121. Remenyi,A., Scholer,H.R. and Wilmanns,M. (2004) Combinatorial control of gene expression. Nat. Struct. Mol. Biol., 11, 812–815.

122. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

123. Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

124. Scully,K.M., Jacobson,E.M., Jepsen,K., Lunyak,V., Viadiu,H., Carriere,C., Rose,D.W., Hooshmand,F., Aggarwal,A.K. and Rosenfeld,M.G. (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi,A., Tomilin,A., Pohl,E., Lins,K., Philippsen,A., Reinbold,R., Scholer,H.R. and Wilmanns,M. (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569–580.

126. Chasman,D., Cepek,K., Sharp,P.A. and Pabo,C.O. (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev., 13, 2650–2657.

127. Ingraham,H.A., Chen,R.P., Mangalam,H.J., Elsholtz,H.P., Flynn,S.E., Lin,C.R., Simmons,D.M., Swanson,L. and Rosenfeld,M.G. (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham,H.A., Flynn,S.E., Voss,J.W., Albert,V.R., Kapiloff,M.S., Wilson,L. and Rosenfeld,M.G. (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb,R., Huang,C.C., Aufiero,D.J. and Herr,W. (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol. Cell. Biol., 21, 4700–4712.

130. Kuras,L., Cherest,H., Surdin-Kerjan,Y. and Thomas,D. (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J., 15, 2519–2529.

131. Blaiseau,P.L. and Thomas,D. (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J., 17, 6327–6336.

132. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-kappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

133. Ferraris,L., Stewart,A.P., Kang,J., DeSimone,A.M., Gemberling,M., Tantin,D. and Fairbrother,W.G. (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res., 21, 1055–1064.

134. Bolotin,E., Liao,H., Ta,T.C., Yang,C., Hwang-Verslues,W., Evans,J.R., Jiang,T. and Sladek,F.M. (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.

Nucleic Acids Research 2013 13

by guest on April 20 2016

httpnaroxfordjournalsorgD

ownloaded from

selected cre recombinase variants is mediated by macromolecularplasticity and water Chem Biol 10 1085ndash1094

8 OtwinowskiZ SchevitzRW ZhangRG LawsonCLJoachimiakA MarmorsteinRQ LuisiBF and SiglerPB(1988) Crystal structure of trp repressoroperator complex atatomic resolution Nature 335 321ndash329

9 RohsR WestSM SosinskyA LiuP MannRS andHonigB (2009) The role of DNA shape in protein-DNArecognition Nature 461 1248ndash1253

10 ChangYP XuM MachadoAC YuXJ RohsR andChenXS (2013) Mechanism of origin DNA recognition andassembly of an initiator-helicase complex by SV40 large tumorantigen Cell Reports 3 1117ndash1127

11 WernerMH GronenbornAM and CloreGM (1996)Intercalation DNA kinking and the control of transcriptionScience 271 778ndash784

12 KimJL NikolovDB and BurleySK (1993) Co-crystalstructure of TBP recognizing the minor groove of a TATAelement Nature 365 520ndash527

13 RohsR SklenarH and ShakkedZ (2005) Structural andenergetic origins of sequence-specific DNA bending Monte Carlosimulations of papillomavirus E2-DNA binding sites Structure13 1499ndash1509

14 PaboCO and NekludovaL (2000) Geometric analysis andcomparison of protein-DNA interfaces why is there no simplecode for recognition J Mol Biol 301 597ndash624

15 SuzukiM GersteinM and YagiN (1994) Stereochemical basisof DNA recognition by Zn fingers Nucleic Acids Res 223397ndash3405

16 KortemmeT MorozovAV and BakerD (2003) An orientation-dependent hydrogen bonding potential improves prediction ofspecificity and structure for proteins and protein-proteincomplexes J Mol Biol 326 1239ndash1259

17 SiggersTW and HonigB (2007) Structure-based prediction ofC2H2 zinc-finger binding specificity sensitivity to dockinggeometry Nucleic Acids Res 35 1085ndash1097

18 SiggersTW SilkovA and HonigB (2005) Structural alignmentof proteinndashDNA interfaces insights into the determinants ofbinding specificity J Mol Biol 345 1027ndash1045

19 ChenFE and GhoshG (1999) Regulation of DNA binding byRelNF-kappaB transcription factors structural views Oncogene18 6845ndash6852

20 ChenYQ GhoshS and GhoshG (1998) A novel DNArecognition mode by the NF-kappa B p65 homodimer NatStruct Biol 5 67ndash73

21 ChenYQ SengchanthalangsyLL HackettA and GhoshG(2000) NF-kappaB p65 (RelA) homodimer uses distinctmechanisms to recognize DNA targets Structure 8 419ndash428

22 YanoverC and BradleyP (2011) Extensive protein and DNAbackbone sampling improves structure-based specificity predictionfor C2H2 zinc fingers Nucleic Acids Res 39 4564ndash4576

23 BergerMF BadisG GehrkeAR TalukderSPhilippakisAA Pena-CastilloL AlleyneTM MnaimnehSBotvinnikOB ChanET et al (2008) Variation inhomeodomain DNA binding revealed by high-resolution analysisof sequence preferences Cell 133 1266ndash1276

24 BadisG BergerMF PhilippakisAA TalukderSGehrkeAR JaegerSA ChanET MetzlerG VedenkoAChenX et al (2009) Diversity and complexity in DNArecognition by transcription factors Science 324 1720ndash1723

25 RamirezCL FoleyJE WrightDA Muller-LerchFRahmanSH CornuTI WinfreyRJ SanderJD FuFTownsendJA et al (2008) Unexpected failure rates formodular assembly of engineered zinc fingers Nat Methods 5374ndash375

26 WeiGH BadisG BergerMF KiviojaT PalinK EngeMBonkeM JolmaA VarjosaloM GehrkeAR et al (2010)Genome-wide analysis of ETS-family DNA-binding in vitro andin vivo EMBO J 29 2147ndash2160

27 MengX SmithRM GieseckeAV JoungJK and WolfeSA(2006) Counter-selectable marker for bacterial-based interactiontrap systems Biotechniques 40 179ndash184

28 NoyesMB MengX WakabayashiA SinhaS BrodskyMHand WolfeSA (2008) A systematic characterization of factors

that regulate Drosophila segmentation via a bacterial one-hybridsystem Nucleic Acids Res 36 2547ndash2560

29 BergerMF PhilippakisAA QureshiAM HeFSEstepPW 3rd and BulykML (2006) Compact universal DNAmicroarrays to comprehensively determine transcription-factorbinding site specificities Nat Biotechnol 24 1429ndash1435

30 BulykML GentalenE LockhartDJ and ChurchGM (1999)Quantifying DNA-protein interactions by double-stranded DNAarrays Nat Biotechnol 17 573ndash577

31 MukherjeeS BergerMF JonaG WangXS MuzzeyDSnyderM YoungRA and BulykML (2004) Rapid analysis ofthe DNA-binding specificities of transcription factors with DNAmicroarrays Nat Genet 36 1331ndash1339

32 LinnellJ MottR FieldS KwiatkowskiDP RagoussisJ andUdalovaIA (2004) Quantitative high-throughput analysis oftranscription factor binding specificities Nucleic Acids Res 32e44

33 FieldS UdalovaI and RagoussisJ (2007) Accuracy andreproducibility of protein-DNA microarray technology AdvBiochem Eng Biotechnol 104 87ndash110

34 BonhamAJ NeumannT TirrellM and ReichNO (2009)Tracking transcription factor complexes on DNA using totalinternal reflectance fluorescence protein binding microarraysNucleic Acids Res 37 e94

35 MaerklSJ and QuakeSR (2007) A systems approach tomeasuring the binding energy landscapes of transcription factorsScience 315 233ndash237

36 ZykovichA KorfI and SegalDJ (2009) Bind-n-Seq high-throughput analysis of in vitro protein-DNA interactions usingmassively parallel sequencing Nucleic Acids Res 37 e151

37 WongD TeixeiraA OikonomopoulosS HumburgPLoneIN SalibaD SiggersT BulykM AngelovDDimitrovS et al (2011) Extensive characterization of NF-kappaBbinding uncovers non-canonical motifs and advances theinterpretation of genetic functional traits Genome Biol 12 R70

38 ZhaoY GranasD and StormoGD (2009) Inferring bindingenergies from selected binding sites PLoS Comput Biol 5e1000590

39 JolmaA KiviojaT ToivonenJ ChengL WeiG EngeMTaipaleM VaquerizasJM YanJ SillanpaaMJ et al (2010)Multiplexed massively parallel SELEX for characterization ofhuman transcription factor binding specificities Genome Res 20861ndash873

40 SlatteryM RileyT LiuP AbeN Gomez-AlcalaP DrorIZhouT RohsR HonigB BussemakerHJ et al (2011)Cofactor binding evokes latent differences in DNA bindingspecificity between Hox proteins Cell 147 1270ndash1282

41 TantinD GemberlingM CallisterC and FairbrotherWG(2008) High-throughput biochemical analysis of in vivo locationdata reveals novel distinct classes of POU5F1(Oct4)DNAcomplexes Genome Res 18 631ndash639

42 WarrenCL KratochvilNC HauschildKE FoisterSBrezinskiML DervanPB PhillipsGN Jr and AnsariAZ(2006) Defining the sequence-recognition profile of DNA-bindingmolecules Proc Natl Acad Sci USA 103 867ndash872

43 NutiuR FriedmanRC LuoS KhrebtukovaI SilvaD LiRZhangL SchrothGP and BurgeCB (2011) Directmeasurement of DNA affinity landscapes on a high-throughputsequencing instrument Nat Biotechnol 29 659ndash664

44 WeirauchMT and HughesTR Dramatic changes intranscription factor binding over evolutionary time Genome Biol11 122

45 GordanR ShenN DrorI ZhouT HortonJ RohsR andBulykML (2013) Genomic regions flanking E-box binding sitesinfluence DNA binding specificity of bHLH transcription factorsthrough DNA shape Cell Reports 3 1093ndash1104

46 StormoGD (2000) DNA binding sites representation anddiscovery Bioinformatics 16 16ndash23

47 ManiatisT PtashneM BackmanK KieldD FlashmanSJeffreyA and MaurerR (1975) Recognition sequences ofrepressor and polymerase in the operators of bacteriophagelambda Cell 5 109ndash113

48 SiggersT DuyzendMH ReddyJ KhanS and BulykML(2011) Non-DNA-binding cofactors enhance DNA-binding

10 Nucleic Acids Research 2013

by guest on April 20 2016

httpnaroxfordjournalsorgD

ownloaded from

specificity of a transcriptional regulatory complex Mol SystBiol 7 555

49 StadenR (1984) Computer methods to locate signals in nucleicacid sequences Nucleic Acids Res 12 505ndash519

50 WorkmanCT YinY CorcoranDL IdekerT StormoGDand BenosPV (2005) enoLOGOS a versatile web tool for energynormalized sequence logos Nucleic Acids Res 33 W389ndashW392

51 ManTK and StormoGD (2001) Non-independence of Mntrepressor-operator interaction determined by a new quantitativemultiple fluorescence relative affinity (QuMFRA) assay NucleicAcids Res 29 2471ndash2478

52. Bulyk,M.L., Johnson,P.L. and Church,G.M. (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30, 1255–1261.

53. Tomovic,A. and Oakeley,E.J. (2007) Position dependencies in transcription factor binding sites. Bioinformatics, 23, 933–941.

54. Zhou,Q. and Liu,J.S. (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics, 20, 909–916.

55. Agius,P., Arvey,A., Chang,W., Noble,W.S. and Leslie,C. (2010) High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol., 6, pii: e1000916.

56. Ben-Gal,I., Shani,A., Gohr,A., Grau,J., Arviv,S., Shmilovici,A., Posch,S. and Grosse,I. (2005) Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics, 21, 2657–2666.

57. Gershenzon,N.I., Stormo,G.D. and Ioshikhes,I.P. (2005) Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res., 33, 2290–2301.

58. Zhao,Y., Ruan,S., Pandey,M. and Stormo,G.D. (2012) Improved models for transcription factor binding site identification using nonindependent interactions. Genetics, 191, 781–790.

59. Weirauch,M.T., Cote,A., Norel,R., Annala,M., Zhao,Y., Riley,T.R., Saez-Rodriguez,J., Cokelaer,T., Vedenko,A., Talukder,S. et al. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol., 31, 126–134.

60. Mordelet,F., Horton,J., Hartemink,A.J., Engelhardt,B.E. and Gordan,R. (2013) Stability selection for regression-based models of transcription factor-DNA binding specificity. Bioinformatics, 29, i117–i125.

61. Siddharthan,R. (2010) Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One, 5, e9722.

62. Mathelier,A. and Wasserman,W.W. (2013) The next generation of transcription factor binding site prediction. PLoS Comput. Biol., 9, e1003214.

63. Jiang,J. and Levine,M. (1993) Binding affinities and cooperative interactions with bHLH activators delimit threshold responses to the dorsal gradient morphogen. Cell, 72, 741–752.

64. White,M.A., Parker,D.S., Barolo,S. and Cohen,B.A. (2012) A model of spatially restricted transcription in opposing gradients of activators and repressors. Mol. Syst. Biol., 8, 614.

65. Scardigli,R., Baumer,N., Gruss,P., Guillemot,F. and Le Roux,I. (2003) Direct and concentration-dependent regulation of the proneural gene Neurogenin2 by Pax6. Development, 130, 3269–3281.

66. Struhl,G., Struhl,K. and Macdonald,P.M. (1989) The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell, 57, 1259–1273.

67. Rowan,S., Siggers,T., Lachke,S.A., Yue,Y., Bulyk,M.L. and Maas,R.L. (2010) Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev., 24, 980–985.

68. Gaudet,J. and Mango,S.E. (2002) Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science, 295, 821–825.

69. Jaeger,S.A., Chan,E.T., Berger,M.F., Stottmann,R., Hughes,T.R. and Bulyk,M.L. (2010) Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics, 95, 185–195.

70. Tanay,A. (2006) Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res., 16, 962–972.

71. Segal,E., Raveh-Sadka,T., Schroeder,M., Unnerstall,U. and Gaul,U. (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 451, 535–540.

72. Siggers,T., Chang,A.B., Teixeira,A., Wong,D., Williams,K.J., Ahmed,B., Ragoussis,J., Udalova,I.A., Smale,S.T. and Bulyk,M.L. (2012) Principles of dimer-specific gene regulation revealed by a comprehensive characterization of NF-kappaB family DNA binding. Nat. Immunol., 13, 95–102.

73. Zhu,C., Byers,K.J., McCord,R.P., Shi,Z., Berger,M.F., Newburger,D.E., Saulrieta,K., Smith,Z., Shah,M.V., Radhakrishnan,M. et al. (2009) High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res., 19, 556–566.

74. Gordan,R., Murphy,K.F., McCord,R.P., Zhu,C., Vedenko,A. and Bulyk,M.L. (2011) Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol., 12, R125.

75. Nakagawa,S., Gisselbrecht,S.S., Rogers,J.M., Hartl,D.L. and Bulyk,M.L. (2013) DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc. Natl Acad. Sci. USA, 110, 12349–12354.

76. Bolotin,E., Chellappa,K., Hwang-Verslues,W., Schnabl,J.M., Yang,C. and Sladek,F.M. (2011) Nuclear receptor HNF4alpha binding sequences are widespread in Alu repeats. BMC Genomics, 12, 560.

77. Chu,S.W., Noyes,M.B., Christensen,R.G., Pierce,B.G., Zhu,L.J., Weng,Z., Stormo,G.D. and Wolfe,S.A. (2012) Exploring the DNA-recognition potential of homeodomains. Genome Res., 22, 1889–1898.

78. Noyes,M.B., Christensen,R.G., Wakabayashi,A., Stormo,G.D., Brodsky,M.H. and Wolfe,S.A. (2008) Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell, 133, 1277–1289.

79. Fang,B., Mane-Padros,D., Bolotin,E., Jiang,T. and Sladek,F.M. (2012) Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors. Nucleic Acids Res., 40, 5343–5356.

80. Zhou,T., Yang,L., Lu,Y., Dror,I., Dantas Machado,A.C., Ghane,T., Di Felice,R. and Rohs,R. (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res., 41, W56–W62.

81. Badis,G., Chan,E.T., van Bakel,H., Pena-Castillo,L., Tillo,D., Tsui,K., Carlson,C.D., Gossett,A.J., Hasinoff,M.J., Warren,C.L. et al. (2008) A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell, 32, 878–887.

82. Kim,J. and Struhl,K. (1995) Determinants of half-site spacing preferences that distinguish AP-1 and ATF/CREB bZIP domains. Nucleic Acids Res., 23, 2531–2537.

83. Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. and Bonovich,M. (2002) Classification of human B-ZIP proteins based on dimerization properties. Mol. Cell. Biol., 22, 6321–6335.

84. Kuo,D., Licon,K., Bandyopadhyay,S., Chuang,R., Luo,C., Catalana,J., Ravasi,T., Tan,K. and Ideker,T. (2010) Coevolution within a transcriptional network by compensatory trans and cis mutations. Genome Res., 20, 1672–1678.

85. Khorasanizadeh,S. and Rastinejad,F. (2001) Nuclear-receptor interactions on DNA-response elements. Trends Biochem. Sci., 26, 384–390.

86. Kurokawa,R., DiRenzo,J., Boehm,M., Sugarman,J., Gloss,B., Rosenfeld,M.G., Heyman,R.A. and Glass,C.K. (1994) Regulation of retinoid signalling by receptor polarity and allosteric control of ligand binding. Nature, 371, 528–531.

87. Luscombe,N.M., Austin,S.E., Berman,H.M. and Thornton,J.M. (2000) An overview of the structures of protein-DNA complexes. Genome Biol., 1, REVIEWS001.

88. Perkins,A.S., Fishel,R., Jenkins,N.A. and Copeland,N.G. (1991) Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific DNA-binding protein. Mol. Cell. Biol., 11, 2665–2674.

89. Delwel,R., Funabiki,T., Kreider,B.L., Morishita,K. and Ihle,J.N. (1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to GA(C/T)AAGA(T/C)AAGATAA. Mol. Cell. Biol., 13, 4291–4300.

90. Funabiki,T., Kreider,B.L. and Ihle,J.N. (1994) The carboxyl domain of zinc fingers of the Evi-1 myeloid transforming gene binds a consensus sequence of GAAGATGAG. Oncogene, 9, 1575–1581.

91. Klemm,J.D., Rould,M.A., Aurora,R., Herr,W. and Pabo,C.O. (1994) Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell, 77, 21–32.

92. Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., Van Leeuwen,H.C., Strating,M.J. and van der Vliet,P.C. (1992) The DNA binding specificity of the bipartite POU domain and its subdomains. EMBO J., 11, 4993–5003.

93. Kemler,I., Schreiber,E., Muller,M.M., Matthias,P. and Schaffner,W. (1989) Octamer transcription factors bind to two different sequence motifs of the immunoglobulin heavy chain promoter. EMBO J., 8, 2001–2008.

94. Kersten,S., Gronemeyer,H. and Noy,N. (1997) The DNA binding pattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J. Biol. Chem., 272, 12771–12777.

95. Jolma,A., Yan,J., Whitington,T., Toivonen,J., Nitta,K.R., Rastas,P., Morgunova,E., Enge,M., Taipale,M., Wei,G. et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327–339.

96. Kim,J.B., Spotts,G.D., Halvorsen,Y.D., Shih,H.M., Ellenberger,T., Towle,H.C. and Spiegelman,B.M. (1995) Dual DNA binding specificity of ADD1/SREBP1 controlled by a single amino acid in the basic helix-loop-helix domain. Mol. Cell. Biol., 15, 2582–2588.

97. Parraga,A., Bellsolell,L., Ferre-D'Amare,A.R. and Burley,S.K. (1998) Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. Structure, 6, 661–672.

98. Fordyce,P.M., Pincus,D., Kimmig,P., Nelson,C.S., El-Samad,H., Walter,P. and DeRisi,J.L. (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc. Natl Acad. Sci. USA, 109, E3084–E3093.

99. Pique-Regi,R., Degner,J.F., Pai,A.A., Gaffney,D.J., Gilad,Y. and Pritchard,J.K. (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res., 21, 447–455.

100. Bulyk,M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

101. Levine,M. and Tjian,R. (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson,A.D. (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr. Opin. Genet. Dev., 5, 552–558.

103. Wolberger,C. (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu. Rev. Biophys. Biomol. Struct., 28, 29–56.

104. Johnson,A.D. (1992) In: McKnight,S.L. and Yamamoto,K.R. (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim,S., Brostromer,E., Xing,D., Jin,J., Chong,S., Ge,H., Wang,S., Gu,C., Yang,L., Gao,Y.Q. et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 1267–1276.

107. Joshi,R., Passner,J.M., Rohs,R., Jain,R., Sosinsky,A., Crickmore,M.A., Jacob,V., Aggarwal,A.K., Honig,B. and Mann,R.S. (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo,H.D. and Mann,R.S. (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev., 13, 1704–1716.

109. Carey,M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne,M. and Gann,A. (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika,M., Williams,A.J., Chen,G., Collins,T. and Thanos,D. (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol. Cell, 1, 277–287.

112. Kim,T.K. and Maniatis,T. (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol. Cell, 1, 119–129.

113. Masternak,K., Muhlethaler-Mottet,A., Villard,J., Zufferey,M., Steimle,V. and Reith,W. (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev., 14, 1156–1166.

114. del Blanco,B., Garcia-Mariscal,A., Wiest,D.L. and Hernandez-Munain,C. (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J. Immunol., 188, 3278–3293.

115. Benoist,C. and Mathis,D. (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8, 681–715.

116. Hall,J.M., McDonnell,D.P. and Korach,K.S. (2002) Allosteric regulation of estrogen receptor structure, function, and coactivator recruitment by different estrogen response elements. Mol. Endocrinol., 16, 469–486.

117. Leung,T.H., Hoffmann,A. and Baltimore,D. (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing,S.H., Pufall,M.A., So,A.Y., Bates,D.L., Chen,L. and Yamamoto,K.R. (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal,N., Tomar,A. and Nagaraju,J. (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res., 39, 9574–9591.

120. Wang,V.Y., Huang,W., Asagiri,M., Spann,N., Hoffmann,A., Glass,C. and Ghosh,G. (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports, 2, 824–839.

121. Remenyi,A., Scholer,H.R. and Wilmanns,M. (2004) Combinatorial control of gene expression. Nat. Struct. Mol. Biol., 11, 812–815.

122. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

123. Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

124. Scully,K.M., Jacobson,E.M., Jepsen,K., Lunyak,V., Viadiu,H., Carriere,C., Rose,D.W., Hooshmand,F., Aggarwal,A.K. and Rosenfeld,M.G. (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi,A., Tomilin,A., Pohl,E., Lins,K., Philippsen,A., Reinbold,R., Scholer,H.R. and Wilmanns,M. (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569–580.

126. Chasman,D., Cepek,K., Sharp,P.A. and Pabo,C.O. (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev., 13, 2650–2657.

127. Ingraham,H.A., Chen,R.P., Mangalam,H.J., Elsholtz,H.P., Flynn,S.E., Lin,C.R., Simmons,D.M., Swanson,L. and Rosenfeld,M.G. (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham,H.A., Flynn,S.E., Voss,J.W., Albert,V.R., Kapiloff,M.S., Wilson,L. and Rosenfeld,M.G. (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb,R., Huang,C.C., Aufiero,D.J. and Herr,W. (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol. Cell. Biol., 21, 4700–4712.

130. Kuras,L., Cherest,H., Surdin-Kerjan,Y. and Thomas,D. (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J., 15, 2519–2529.

131. Blaiseau,P.L. and Thomas,D. (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J., 17, 6327–6336.

132. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al. (2011) Extensive characterization of NF-KappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

133. Ferraris,L., Stewart,A.P., Kang,J., DeSimone,A.M., Gemberling,M., Tantin,D. and Fairbrother,W.G. (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res., 21, 1055–1064.

134. Bolotin,E., Liao,H., Ta,T.C., Yang,C., Hwang-Verslues,W., Evans,J.R., Jiang,T. and Sladek,F.M. (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.


specificity of a transcriptional regulatory complex Mol SystBiol 7 555

49 StadenR (1984) Computer methods to locate signals in nucleicacid sequences Nucleic Acids Res 12 505ndash519

50 WorkmanCT YinY CorcoranDL IdekerT StormoGDand BenosPV (2005) enoLOGOS a versatile web tool for energynormalized sequence logos Nucleic Acids Res 33 W389ndashW392

51 ManTK and StormoGD (2001) Non-independence of Mntrepressor-operator interaction determined by a new quantitativemultiple fluorescence relative affinity (QuMFRA) assay NucleicAcids Res 29 2471ndash2478

52 BulykML JohnsonPL and ChurchGM (2002) Nucleotidesof transcription factor binding sites exert interdependent effectson the binding affinities of transcription factors Nucleic AcidsRes 30 1255ndash1261

53 TomovicA and OakeleyEJ (2007) Position dependencies intranscription factor binding sites Bioinformatics 23 933ndash941

54 ZhouQ and LiuJS (2004) Modeling within-motif dependencefor transcription factor binding site predictions Bioinformatics20 909ndash916

55 AgiusP ArveyA ChangW NobleWS and LeslieC (2010)High resolution models of transcription factor-DNA affinitiesimprove in vitro and in vivo binding predictions PLoS ComputBiol 6 pii e1000916

56 Ben-GalI ShaniA GohrA GrauJ ArvivS ShmiloviciAPoschS and GrosseI (2005) Identification of transcription factorbinding sites with variable-order Bayesian networksBioinformatics 21 2657ndash2666

57 GershenzonNI StormoGD and IoshikhesIP (2005)Computational technique for improvement of the position-weightmatrices for the DNAprotein binding sites Nucleic Acids Res33 2290ndash2301

58 ZhaoY RuanS PandeyM and StormoGD (2012) Improvedmodels for transcription factor binding site identification usingnonindependent interactions Genetics 191 781ndash790

59 WeirauchMT CoteA NorelR AnnalaM ZhaoYRileyTR Saez-RodriguezJ CokelaerT VedenkoATalukderS et al (2013) Evaluation of methods for modelingtranscription factor sequence specificity Nat Biotechnol 31126ndash134

60 MordeletF HortonJ HarteminkAJ EngelhardtBE andGordanR (2013) Stability selection for regression-based modelsof transcription factor-DNA binding specificity Bioinformatics29 i117ndashi125

61 SiddharthanR (2010) Dinucleotide weight matrices for predictingtranscription factor binding sites generalizing the position weightmatrix PLoS One 5 e9722

62 MathelierA and WassermanWW (2013) The next generation oftranscription factor binding site prediction PLoS Comput Biol9 e1003214

63 JiangJ and LevineM (1993) Binding affinities and cooperativeinteractions with bHLH activators delimit threshold responses tothe dorsal gradient morphogen Cell 72 741ndash752

64 WhiteMA ParkerDS BaroloS and CohenBA (2012) Amodel of spatially restricted transcription in opposing gradients ofactivators and repressors Mol Syst Biol 8 614

65 ScardigliR BaumerN GrussP GuillemotF and Le RouxI(2003) Direct and concentration-dependent regulation of theproneural gene Neurogenin2 by Pax6 Development 1303269ndash3281

66 StruhlG StruhlK and MacdonaldPM (1989) The gradientmorphogen bicoid is a concentration-dependent transcriptionalactivator Cell 57 1259ndash1273

67 RowanS SiggersT LachkeSA YueY BulykML andMaasRL (2010) Precise temporal control of the eye regulatorygene Pax6 via enhancer-binding site affinity Genes Dev 24980ndash985

68 GaudetJ and MangoSE (2002) Regulation of organogenesis bythe Caenorhabditis elegans FoxA protein PHA-4 Science 295821ndash825

69 JaegerSA ChanET BergerMF StottmannR HughesTRand BulykML (2010) Conservation and regulatory associationsof a wide affinity range of mouse transcription factor bindingsites Genomics 95 185ndash195

70 TanayA (2006) Extensive low-affinity transcriptional interactionsin the yeast genome Genome Res 16 962ndash972

71 SegalE Raveh-SadkaT SchroederM UnnerstallU andGaulU (2008) Predicting expression patterns from regulatorysequence in Drosophila segmentation Nature 451 535ndash540

72 SiggersT ChangAB TeixeiraA WongD WilliamsKJAhmedB RagoussisJ UdalovaIA SmaleST and BulykML(2012) Principles of dimer-specific gene regulation revealed by acomprehensive characterization of NF-kappaB family DNAbinding Nat Immunol 13 95ndash102

73 ZhuC ByersKJ McCordRP ShiZ BergerMFNewburgerDE SaulrietaK SmithZ ShahMVRadhakrishnanM et al (2009) High-resolution DNA-bindingspecificity analysis of yeast transcription factors Genome Res 19556ndash566

74 GordanR MurphyKF McCordRP ZhuC VedenkoA andBulykML (2011) Curated collection of yeast transcription factorDNA binding specificity data reveals novel structural and generegulatory insights Genome Biol 12 R125

75 NakagawaS GisselbrechtSS RogersJM HartlDL andBulykML (2013) DNA-binding specificity changes in theevolution of forkhead transcription factors Proc Natl Acad SciUSA 110 12349ndash12354

76 BolotinE ChellappaK Hwang-VersluesW SchnablJMYangC and SladekFM (2011) Nuclear receptor HNF4alphabinding sequences are widespread in Alu repeats BMC Genomics12 560

77 ChuSW NoyesMB ChristensenRG PierceBG ZhuLJWengZ StormoGD and WolfeSA (2012) Exploring theDNA-recognition potential of homeodomains Genome Res 221889ndash1898

78 NoyesMB ChristensenRG WakabayashiA StormoGDBrodskyMH and WolfeSA (2008) Analysis of homeodomainspecificities allows the family-wide prediction of preferredrecognition sites Cell 133 1277ndash1289

79 FangB Mane-PadrosD BolotinE JiangT and SladekFM(2012) Identification of a binding motif specific to HNF4 bycomparative analysis of multiple nuclear receptors Nucleic AcidsRes 40 5343ndash5356

80 ZhouT YangL LuY DrorI Dantas MachadoACGhaneT Di FeliceR and RohsR (2013) DNAshape a methodfor the high-throughput prediction of DNA structural features ona genomic scale Nucleic Acids Res 41 W56ndashW62

81 BadisG ChanET van BakelH Pena-CastilloL TilloDTsuiK CarlsonCD GossettAJ HasinoffMJ WarrenCLet al (2008) A library of yeast transcription factor motifs revealsa widespread function for Rsc3 in targeting nucleosome exclusionat promoters Mol Cell 32 878ndash887

82 KimJ and StruhlK (1995) Determinants of half-site spacingpreferences that distinguish AP-1 and ATFCREB bZIP domainsNucleic Acids Res 23 2531ndash2537

83 VinsonC MyakishevM AcharyaA MirAA MollJR andBonovichM (2002) Classification of human B-ZIP proteins basedon dimerization properties Mol Cell Biol 22 6321ndash6335

84 KuoD LiconK BandyopadhyayS ChuangR LuoCCatalanaJ RavasiT TanK and IdekerT (2010) Coevolutionwithin a transcriptional network by compensatory trans and cismutations Genome Res 20 1672ndash1678

85 KhorasanizadehS and RastinejadF (2001) Nuclear-receptorinteractions on DNA-response elements Trends Biochem Sci 26384ndash390

86 KurokawaR DiRenzoJ BoehmM SugarmanJ GlossBRosenfeldMG HeymanRA and GlassCK (1994) Regulationof retinoid signalling by receptor polarity and allosteric control ofligand binding Nature 371 528ndash531

87 LuscombeNM AustinSE BermanHM and ThorntonJM(2000) An overview of the structures of protein-DNA complexesGenome Biol 1 REVIEWS001

88 PerkinsAS FishelR JenkinsNA and CopelandNG (1991)Evi-1 a murine zinc finger proto-oncogene encodes a sequence-specific DNA-binding protein Mol Cell Biol 11 2665ndash2674

89 DelwelR FunabikiT KreiderBL MorishitaK and IhleJN(1993) Four of the seven zinc fingers of the Evi-1 myeloid-transforming gene are required for sequence-specific binding to

Nucleic Acids Research 2013 11

by guest on April 20 2016

httpnaroxfordjournalsorgD

ownloaded from

GA(CT)AAGA(TC)AAGATAA Mol Cell Biol 134291ndash4300

90 FunabikiT KreiderBL and IhleJN (1994) The carboxyldomain of zinc fingers of the Evi-1 myeloid transforming genebinds a consensus sequence of GAAGATGAG Oncogene 91575ndash1581

91 KlemmJD RouldMA AuroraR HerrW and PaboCO(1994) Crystal structure of the Oct-1 POU domain bound to anoctamer site DNA recognition with tethered DNA-bindingmodules Cell 77 21ndash32

92 VerrijzerCP AlkemaMJ van WeperenWW VanLeeuwenHC StratingMJ and van der VlietPC (1992) TheDNA binding specificity of the bipartite POU domain and itssubdomains EMBO J 11 4993ndash5003

93 KemlerI SchreiberE MullerMM MatthiasP andSchaffnerW (1989) Octamer transcription factors bind to twodifferent sequence motifs of the immunoglobulin heavy chainpromoter EMBO J 8 2001ndash2008

94 KerstenS GronemeyerH and NoyN (1997) The DNA bindingpattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state J Biol Chem 27212771ndash12777

95 JolmaA YanJ WhitingtonT ToivonenJ NittaKRRastasP MorgunovaE EngeM TaipaleM WeiG et al(2013) DNA-binding specificities of human transcription factorsCell 152 327ndash339

96 KimJB SpottsGD HalvorsenYD ShihHMEllenbergerT TowleHC and SpiegelmanBM (1995) DualDNA binding specificity of ADD1SREBP1 controlled by asingle amino acid in the basic helix-loop-helix domain MolCell Biol 15 2582ndash2588

97 ParragaA BellsolellL Ferre-DrsquoAmareAR and BurleySK(1998) Co-crystal structure of sterol regulatory element bindingprotein 1a at 23 A resolution Structure 6 661ndash672

98 FordycePM PincusD KimmigP NelsonCS El-SamadHWalterP and DeRisiJL (2012) Basic leucine zippertranscription factor Hac1 binds DNA in two distinct modes asrevealed by microfluidic analyses Proc Natl Acad Sci USA109 E3084ndashE3093

99 Pique-RegiR DegnerJF PaiAA GaffneyDJ GiladY andPritchardJK (2011) Accurate inference of transcription factorbinding from DNA sequence and chromatin accessibility dataGenome Res 21 447ndash455

100 BulykML (2003) Computational prediction of transcription-factor binding site locations Genome Biol 5 201

101 LevineM and TjianR (2003) Transcription regulation andanimal diversity Nature 424 147ndash151

102 JohnsonAD (1995) Molecular mechanisms of cell-typedetermination in budding yeast Curr Opin Genet Dev 5552ndash558

103 WolbergerC (1999) Multiprotein-DNA complexes intranscriptional regulation Annu Rev Biophys Biomol Struct28 29ndash56

104 JohnsonAD (1992) In McKnightSL and YamamotoKR(eds) Transcriptional Regulation Cold Spring Harbor LabPress Cold Spring Harbor NY pp 975ndash1006

105 KimS BrostromerE XingD JinJ ChongS GeHWangS GuC YangL GaoYQ et al (2013) Probingallostery through DNA Science 339 816ndash819

106 GarvieCW HagmanJ and WolbergerC (2001) Structuralstudies of Ets-1Pax5 complex formation on DNA Mol Cell 81267ndash1276

107 JoshiR PassnerJM RohsR JainR SosinskyACrickmoreMA JacobV AggarwalAK HonigB andMannRS (2007) Functional specificity of a Hox proteinmediated by the recognition of minor groove structure Cell131 530ndash543

108 RyooHD and MannRS (1999) The control of trunk Hoxspecificity and activity by Extradenticle Genes Dev 131704ndash1716

109 CareyM (1998) The enhanceosome and transcriptional synergyCell 92 5ndash8

110 PtashneM and GannA (1997) Transcriptional activation byrecruitment Nature 386 569ndash577

111 MerikaM WilliamsAJ ChenG CollinsT and ThanosD(1998) Recruitment of CBPp300 by the IFN beta enhanceosomeis required for synergistic activation of transcription Mol Cell1 277ndash287

112 KimTK and ManiatisT (1997) The mechanism oftranscriptional synergy of an in vitro assembled interferon-betaenhanceosome Mol Cell 1 119ndash129

113 MasternakK Muhlethaler-MottetA VillardJ ZuffereyMSteimleV and ReithW (2000) CIITA is a transcriptionalcoactivator that is recruited to MHC class II promoters bymultiple synergistic interactions with an enhanceosome complexGenes Dev 14 1156ndash1166

114 del BlancoB Garcia-MariscalA WiestDL and Hernandez-MunainC (2012) Tcra enhancer activation by inducibletranscription factors downstream of pre-TCR signalingJ Immunol 188 3278ndash3293

115 BenoistC and MathisD (1990) Regulation of majorhistocompatibility complex class-II genes X Y and other lettersof the alphabet Annu Rev Immunol 8 681ndash715

116 HallJM McDonnellDP and KorachKS (2002) Allostericregulation of estrogen receptor structure function andcoactivator recruitment by different estrogen response elementsMol Endocrinol 16 469ndash486

117 LeungTH HoffmannA and BaltimoreD (2004) Onenucleotide in a kappaB site can determine cofactor specificity forNF-kappaB dimers Cell 118 453ndash464

118 MeijsingSH PufallMA SoAY BatesDL ChenL andYamamotoKR (2009) DNA binding site sequence directsglucocorticoid receptor structure and activity Science 324407ndash410

119 MrinalN TomarA and NagarajuJ (2011) Role of sequenceencoded kappaB DNA geometry in gene regulation by DorsalNucleic Acids Res 39 9574ndash9591

120 WangVY HuangW AsagiriM SpannN HoffmannAGlassC and GhoshG (2012) The transcriptional specificity ofNF-kappaB dimers is coded within the kappaB DNA responseelements Cell Reports 2 824ndash839

121 RemenyiA ScholerHR and WilmannsM (2004)Combinatorial control of gene expression Nat Struct MolBiol 11 812ndash815

122 ChenYQ GhoshS and GhoshG (1998) A novel DNArecognition mode by the NF-kappa B p65 homodimer NatStruct Biol 5 67ndash73

123 FujitaT NolanGP GhoshS and BaltimoreD (1992)Independent modes of transcriptional activation by the p50 andp65 subunits of NF-kappa B Genes Dev 6 775ndash787

124 ScullyKM JacobsonEM JepsenK LunyakV ViadiuHCarriereC RoseDW HooshmandF AggarwalAK andRosenfeldMG (2000) Allosteric effects of Pit-1 DNA sites onlong-term repression in cell type specification Science 2901127ndash1131

125 RemenyiA TomilinA PohlE LinsK PhilippsenAReinboldR ScholerHR and WilmannsM (2001) Differentialdimer activities of the transcription factor Oct-1 by DNA-induced interface swapping Mol Cell 8 569ndash580

126 ChasmanD CepekK SharpPA and PaboCO (1999)Crystal structure of an OCA-B peptide bound to an Oct-1 POUdomainoctamer DNA complex specific recognition of a protein-DNA interface Genes Dev 13 2650ndash2657

127 IngrahamHA ChenRP MangalamHJ ElsholtzHPFlynnSE LinCR SimmonsDM SwansonL andRosenfeldMG (1988) A tissue-specific transcription factorcontaining a homeodomain specifies a pituitary phenotype Cell55 519ndash529

128 IngrahamHA FlynnSE VossJW AlbertVRKapiloffMS WilsonL and RosenfeldMG (1990) The POU-specific domain of Pit-1 is essential for sequence-specific highaffinity DNA binding and DNA-dependent Pit-1-Pit-1interactions Cell 61 1021ndash1033

129 BabbR HuangCC AufieroDJ and HerrW (2001) DNArecognition by the herpes simplex virus transactivator VP16 anovel DNA-binding structure Mol Cell Biol 21 4700ndash4712

130 KurasL CherestH Surdin-KerjanY and ThomasD (1996) Aheteromeric complex containing the centromere binding factor 1

12 Nucleic Acids Research 2013

by guest on April 20 2016

httpnaroxfordjournalsorgD

ownloaded from

and two basic leucine zipper factors Met4 and Met28 mediatesthe transcription activation of yeast sulfur metabolism EMBOJ 15 2519ndash2529

131 BlaiseauPL and ThomasD (1998) Multiple transcriptionalactivation complexes tether the yeast activator Met4 to DNAEMBO J 17 6327ndash6336

132 WongD TeixeiraA OikonomopoulosS HumburgP LoneINSalibaD SiggersT BulykM AngelovD DimitrovS et al(2011) Extensive characterization of NF-KappaB binding uncoversnon-canonical motifs and advances the interpretation of geneticfunctional traits Genome Biol 12 R70

133 FerrarisL StewartAP KangJ DeSimoneAMGemberlingM TantinD and FairbrotherWG (2011)Combinatorial binding of transcription factors in thepluripotency control regions of the genome Genome Res 211055ndash1064

134 BolotinE LiaoH TaTC YangC Hwang-VersluesWEvansJR JiangT and SladekFM (2010) Integrated approachfor the identification of human hepatocyte nuclear factor 4alphatarget genes using protein binding microarrays Hepatology 51642ndash653

Nucleic Acids Research 2013 13

by guest on April 20 2016

httpnaroxfordjournalsorgD

ownloaded from

GA(CT)AAGA(TC)AAGATAA Mol Cell Biol 134291ndash4300

90 FunabikiT KreiderBL and IhleJN (1994) The carboxyldomain of zinc fingers of the Evi-1 myeloid transforming genebinds a consensus sequence of GAAGATGAG Oncogene 91575ndash1581

91 KlemmJD RouldMA AuroraR HerrW and PaboCO(1994) Crystal structure of the Oct-1 POU domain bound to anoctamer site DNA recognition with tethered DNA-bindingmodules Cell 77 21ndash32

92 VerrijzerCP AlkemaMJ van WeperenWW VanLeeuwenHC StratingMJ and van der VlietPC (1992) TheDNA binding specificity of the bipartite POU domain and itssubdomains EMBO J 11 4993ndash5003

93 KemlerI SchreiberE MullerMM MatthiasP andSchaffnerW (1989) Octamer transcription factors bind to twodifferent sequence motifs of the immunoglobulin heavy chainpromoter EMBO J 8 2001ndash2008

94 KerstenS GronemeyerH and NoyN (1997) The DNA bindingpattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state J Biol Chem 27212771ndash12777

95 JolmaA YanJ WhitingtonT ToivonenJ NittaKRRastasP MorgunovaE EngeM TaipaleM WeiG et al(2013) DNA-binding specificities of human transcription factorsCell 152 327ndash339

96 KimJB SpottsGD HalvorsenYD ShihHMEllenbergerT TowleHC and SpiegelmanBM (1995) DualDNA binding specificity of ADD1SREBP1 controlled by asingle amino acid in the basic helix-loop-helix domain MolCell Biol 15 2582ndash2588

97 ParragaA BellsolellL Ferre-DrsquoAmareAR and BurleySK(1998) Co-crystal structure of sterol regulatory element bindingprotein 1a at 23 A resolution Structure 6 661ndash672

98. Fordyce, P.M., Pincus, D., Kimmig, P., Nelson, C.S., El-Samad, H., Walter, P. and DeRisi, J.L. (2012) Basic leucine zipper transcription factor Hac1 binds DNA in two distinct modes as revealed by microfluidic analyses. Proc. Natl Acad. Sci. USA, 109, E3084–E3093.

99. Pique-Regi, R., Degner, J.F., Pai, A.A., Gaffney, D.J., Gilad, Y. and Pritchard, J.K. (2011) Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res., 21, 447–455.

100. Bulyk, M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

101. Levine, M. and Tjian, R. (2003) Transcription regulation and animal diversity. Nature, 424, 147–151.

102. Johnson, A.D. (1995) Molecular mechanisms of cell-type determination in budding yeast. Curr. Opin. Genet. Dev., 5, 552–558.

103. Wolberger, C. (1999) Multiprotein-DNA complexes in transcriptional regulation. Annu. Rev. Biophys. Biomol. Struct., 28, 29–56.

104. Johnson, A.D. (1992) In McKnight, S.L. and Yamamoto, K.R. (eds), Transcriptional Regulation. Cold Spring Harbor Lab Press, Cold Spring Harbor, NY, pp. 975–1006.

105. Kim, S., Brostromer, E., Xing, D., Jin, J., Chong, S., Ge, H., Wang, S., Gu, C., Yang, L., Gao, Y.Q. et al. (2013) Probing allostery through DNA. Science, 339, 816–819.

106. Garvie, C.W., Hagman, J. and Wolberger, C. (2001) Structural studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 1267–1276.

107. Joshi, R., Passner, J.M., Rohs, R., Jain, R., Sosinsky, A., Crickmore, M.A., Jacob, V., Aggarwal, A.K., Honig, B. and Mann, R.S. (2007) Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell, 131, 530–543.

108. Ryoo, H.D. and Mann, R.S. (1999) The control of trunk Hox specificity and activity by Extradenticle. Genes Dev., 13, 1704–1716.

109. Carey, M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.

110. Ptashne, M. and Gann, A. (1997) Transcriptional activation by recruitment. Nature, 386, 569–577.

111. Merika, M., Williams, A.J., Chen, G., Collins, T. and Thanos, D. (1998) Recruitment of CBP/p300 by the IFN beta enhanceosome is required for synergistic activation of transcription. Mol. Cell, 1, 277–287.

112. Kim, T.K. and Maniatis, T. (1997) The mechanism of transcriptional synergy of an in vitro assembled interferon-beta enhanceosome. Mol. Cell, 1, 119–129.

113. Masternak, K., Muhlethaler-Mottet, A., Villard, J., Zufferey, M., Steimle, V. and Reith, W. (2000) CIITA is a transcriptional coactivator that is recruited to MHC class II promoters by multiple synergistic interactions with an enhanceosome complex. Genes Dev., 14, 1156–1166.

114. del Blanco, B., Garcia-Mariscal, A., Wiest, D.L. and Hernandez-Munain, C. (2012) Tcra enhancer activation by inducible transcription factors downstream of pre-TCR signaling. J. Immunol., 188, 3278–3293.

115. Benoist, C. and Mathis, D. (1990) Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8, 681–715.

116. Hall, J.M., McDonnell, D.P. and Korach, K.S. (2002) Allosteric regulation of estrogen receptor structure, function, and coactivator recruitment by different estrogen response elements. Mol. Endocrinol., 16, 469–486.

117. Leung, T.H., Hoffmann, A. and Baltimore, D. (2004) One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 118, 453–464.

118. Meijsing, S.H., Pufall, M.A., So, A.Y., Bates, D.L., Chen, L. and Yamamoto, K.R. (2009) DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 324, 407–410.

119. Mrinal, N., Tomar, A. and Nagaraju, J. (2011) Role of sequence encoded kappaB DNA geometry in gene regulation by Dorsal. Nucleic Acids Res., 39, 9574–9591.

120. Wang, V.Y., Huang, W., Asagiri, M., Spann, N., Hoffmann, A., Glass, C. and Ghosh, G. (2012) The transcriptional specificity of NF-kappaB dimers is coded within the kappaB DNA response elements. Cell Reports, 2, 824–839.

121. Remenyi, A., Scholer, H.R. and Wilmanns, M. (2004) Combinatorial control of gene expression. Nat. Struct. Mol. Biol., 11, 812–815.

122. Chen, Y.Q., Ghosh, S. and Ghosh, G. (1998) A novel DNA recognition mode by the NF-kappa B p65 homodimer. Nat. Struct. Biol., 5, 67–73.

123. Fujita, T., Nolan, G.P., Ghosh, S. and Baltimore, D. (1992) Independent modes of transcriptional activation by the p50 and p65 subunits of NF-kappa B. Genes Dev., 6, 775–787.

124. Scully, K.M., Jacobson, E.M., Jepsen, K., Lunyak, V., Viadiu, H., Carriere, C., Rose, D.W., Hooshmand, F., Aggarwal, A.K. and Rosenfeld, M.G. (2000) Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science, 290, 1127–1131.

125. Remenyi, A., Tomilin, A., Pohl, E., Lins, K., Philippsen, A., Reinbold, R., Scholer, H.R. and Wilmanns, M. (2001) Differential dimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569–580.

126. Chasman, D., Cepek, K., Sharp, P.A. and Pabo, C.O. (1999) Crystal structure of an OCA-B peptide bound to an Oct-1 POU domain/octamer DNA complex: specific recognition of a protein-DNA interface. Genes Dev., 13, 2650–2657.

127. Ingraham, H.A., Chen, R.P., Mangalam, H.J., Elsholtz, H.P., Flynn, S.E., Lin, C.R., Simmons, D.M., Swanson, L. and Rosenfeld, M.G. (1988) A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell, 55, 519–529.

128. Ingraham, H.A., Flynn, S.E., Voss, J.W., Albert, V.R., Kapiloff, M.S., Wilson, L. and Rosenfeld, M.G. (1990) The POU-specific domain of Pit-1 is essential for sequence-specific, high affinity DNA binding and DNA-dependent Pit-1-Pit-1 interactions. Cell, 61, 1021–1033.

129. Babb, R., Huang, C.C., Aufiero, D.J. and Herr, W. (2001) DNA recognition by the herpes simplex virus transactivator VP16: a novel DNA-binding structure. Mol. Cell. Biol., 21, 4700–4712.

130. Kuras, L., Cherest, H., Surdin-Kerjan, Y. and Thomas, D. (1996) A heteromeric complex containing the centromere binding factor 1 and two basic leucine zipper factors, Met4 and Met28, mediates the transcription activation of yeast sulfur metabolism. EMBO J., 15, 2519–2529.

131. Blaiseau, P.L. and Thomas, D. (1998) Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J., 17, 6327–6336.

132. Wong, D., Teixeira, A., Oikonomopoulos, S., Humburg, P., Lone, I.N., Saliba, D., Siggers, T., Bulyk, M., Angelov, D., Dimitrov, S. et al. (2011) Extensive characterization of NF-KappaB binding uncovers non-canonical motifs and advances the interpretation of genetic functional traits. Genome Biol., 12, R70.

133. Ferraris, L., Stewart, A.P., Kang, J., DeSimone, A.M., Gemberling, M., Tantin, D. and Fairbrother, W.G. (2011) Combinatorial binding of transcription factors in the pluripotency control regions of the genome. Genome Res., 21, 1055–1064.

134. Bolotin, E., Liao, H., Ta, T.C., Yang, C., Hwang-Verslues, W., Evans, J.R., Jiang, T. and Sladek, F.M. (2010) Integrated approach for the identification of human hepatocyte nuclear factor 4alpha target genes using protein binding microarrays. Hepatology, 51, 642–653.
