Characterization of an SRY-like gene, DSox14, from Drosophila

Andrew C. Sparkes, Katherine L. Mumford, Umesh A. Patel, Sarah F. Newbury¹, Colyn Crane-Robinson*

Biophysics Laboratories, Institute of Biomedical and Biomolecular Sciences, University of Portsmouth, St. Michael's Building, White Swan Road,

Portsmouth, PO1 2DT, UK

Received 20 February 2001; received in revised form 4 April 2001; accepted 1 June 2001

Received by E. Boncinelli

Abstract

We have characterized the DSox14 gene, a new member of the family of transcription factors related to the mammalian sex determining

factor, SRY. It contains two exons and the intron is large for Drosophila at 2.8 kb. The encoded protein consists of 691 amino acids (72 kDa)

and includes an HMG box domain, which is closely related to the mouse Sox4 DNA binding domain. Expression of the DSox14 HMG box

domain in vitro shows that it binds the sequence AACAAT with a Kd of 190 nM, generating a bend angle of 48.6°. At higher protein

concentrations, a second HMG box binds at the recognition sequence, increasing the bend angle by 5°. DSox14 is variably expressed

throughout development as three alternative transcripts but not at all during the 1st and 2nd larval instars. The several mRNA transcripts are

produced primarily from different transcriptional start sites. Analysis of the expression of DSox14 mRNAs during early development shows

that they are maternally contributed at a low level and ubiquitously expressed during embryogenesis. The widespread pattern of expression

suggests that DSox14 affects a large number of target genes. © 2001 Published by Elsevier Science B.V. All rights reserved.

Keywords: Sox domain; HMG box; Transcription factor; DNA bending

1. Introduction

Sox proteins form a large family of transcription factors

that possess DNA binding domains closely related to that of

SRY, the mammalian sex determining factor (Pevny and

Lovell-Badge, 1997; Wegner, 1999). The HMG domain

encoded by Sry and Sox proteins binds DNA at AACAAT

sites, or related sequences (Denny et al., 1992b; Harley et

al., 1994) via contacts in the minor groove and induces

bends of 73–90° (Ferrari et al., 1992; Connor et al., 1994).

This DNA bending property has led to the suggestion that

Sox proteins act as architectural transcription factors by

modulating chromatin structure around transcriptional regu-

latory elements (Ferrari et al., 1992; Giese et al., 1992; Prior

and Walter, 1996; Pevny and Lovell-Badge, 1997). In addi-

tion, a number of Sox proteins can act as classical transcrip-

tion factors since they contain transactivation domains that

act on downstream reporter genes (van de Wetering et al.,

1993; Hosking et al., 1995; Wotton et al., 1995; Sudbeck et

al., 1996). However, other proteins containing sequence-

specific HMG box domains appear to modulate transcription

by binding other proteins (Yuan et al., 1995; Zappavigna et

al., 1996). In Drosophila for example, dTCF (pangolin) has

been shown to play a role in the Wingless/Wnt pathway by

binding to Armadillo (Drosophila β-catenin). Transcription

of target genes such as Ultrabithorax is blocked upon inter-

action of this complex with the transcriptional repressors

Groucho and dCBP (Drosophila CREB binding protein)

(Cavallo et al., 1997; Nollet et al., 1999).

Sox proteins have been shown to be important in a

number of developmental processes, including sex determi-

nation, limb and eye formation and nervous system organi-

zation (Goodfellow and Lovell-Badge, 1993; Kamachi et

al., 1995; Uwanogho et al., 1995; Kent et al., 1996). The

expression pattern of many Sox family members throughout

development also appears to correlate with early cell fate

Gene 272 (2001) 121–129

0378-1119/01/$ - see front matter © 2001 Published by Elsevier Science B.V. All rights reserved.

PII: S0378-1119(01)00557-1

www.elsevier.com/locate/gene

Abbreviations: BSA, bovine serum albumin; cDNA, complementary

DNA; CNS, central nervous system; DTT, dithiothreitol; HMG box, the

DNA binding domain from high mobility group proteins 1 and 2; IPTG,

isopropyl-β-D-thiogalactoside; kb, kilobase pairs; PCR, polymerase chain

reaction; RNaseA, ribonuclease A; rp49, ribosomal protein 49; RT-PCR,

reverse transcriptase PCR; SDS-PAGE, sodium dodecyl sulphate polyacrylamide

gel electrophoresis; SOX, Sry-related HMG bOX; UTR, untranslated

region

* Corresponding author. Tel.: +44-23-92842055; fax: +44-23-92842053.

E-mail address: [email protected] (C. Crane-Robinson).

¹ Present address: Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK.

decisions (Wagner et al., 1994; Pevny and Lovell-Badge,

1997). Many Sox genes in mice are expressed in overlap-

ping patterns suggesting possible redundancy in the Sox

family. For example, in the developing CNS of the mouse

and chicken, Sox1–4 are co-expressed at high levels but are

rapidly down-regulated upon differentiation of neural

precursor cells. Mice homozygous for a targeted disruption

of Sox4 display severe cardiac malformations and lack B

lymphocytes but the CNS is unaffected. This may be

because Sox1–3 can compensate for Sox4 in the CNS

(van de Wetering et al., 1993; Collignon et al., 1996; Schil-

ham et al., 1996). Drosophila Sox Neuro (SoxN) is closely

related to mammalian Sox1–3 and is also expressed in the

developing CNS and the similarity between the fly and

mammalian genes extends beyond the HMG box domain

in this case (Cremazy et al., 2000). Drosophila Dichaete

is essential for embryonic development and nervous system

organization: mutant phenotypes are variable and this may

be the result of tissue/cell-specific interactions with other

transcription factors or Sox proteins (Nambu and Nambu,

1996; Russell et al., 1996; Soriano and Russell, 1998;

Mukherjee et al., 2000). Drosophila Sox100B expression

is prominent in the developing gut, Malpighian tubules and

gonad, tissues in which the closely related vertebrate Sox9

and Sox10 are also expressed. For Sox100B the close rela-

tionship to the vertebrate homologues is restricted to the

HMG box (Hui Yong Loh and Russell, 2000).

Sox genes are typically involved in critical developmen-

tal pathways and to better understand their role in cellular

processes we have characterized a further Sox gene from

Drosophila. We report that DSox14 (DSox60B) has a

complex temporal pattern of expression and is variably

expressed throughout development as three alternative tran-

scripts. When expressed in vitro, the HMG box domain of

DSox14 binds to and bends DNA in a similar manner to

other HMG box proteins. The widespread expression of

DSox14 suggests that it may be involved in modulating a

range of target genes.

2. Materials and methods

2.1. cDNA and genomic cloning

A 204 bp fragment amplified from a 4–8 h Drosophila embryonic cDNA phage library (Denny et al., 1992a) was used to probe a 4–8 h Drosophila plasmid library. A total of 160,000 colonies were screened and four positives were checked by sequencing. The single cDNA clone that contained sequences identical to the original DSox14 amplicon was fully sequenced on both strands. The complete cDNA sequence was assembled using information from this cDNA clone, the genomic sequence, and from the sequence of a cDNA kindly provided by Dr Christine Rushlow (CR). Sequence data were analyzed using the MacVector programme and multiple sequence alignments generated using ClustalW (http://dot.imgen.bcm.tmc.edu:9332/multi-align/multi-align.html). For genomic cloning, a Drosophila Oregon R phage library in EMBL3 was screened with a 1 kb fragment from the CR DSox14 cDNA. Two non-overlapping clones of 2.7 and 3.0 kb were sequenced and the gap of ~300 bp was bridged by PCR amplification from genomic DNA. The most downstream part of the gene was obtained from cosmid 51D11 (European Drosophila project), known to be located in the region of 60A on chromosome 2 (our own in situ hybridization experiments using a cDNA probe had indicated 60A8-B8 as the location of the gene).

2.2. Northern blotting analysis

Total RNA was prepared from ten different life stages of Oregon R wild-type Drosophila and from 0–20 h embryos using standard techniques (Sambrook et al., 1989). PolyA+ mRNA was then purified from total RNA using oligo(dT)-cellulose spin columns (Pharmacia) according to the manufacturer's instructions. The mRNA was separated on agarose-formaldehyde gels, transferred to nylon membranes (Amersham) and probed with random primed DSox14 cDNA with minor modifications to standard techniques (Sambrook et al., 1989). Four different cDNA probes were used with the developmental Northern: a 1.8 kb NotI/AgeI fragment which includes most of the cDNA sequence; a 400 bp HincII/XbaI fragment of sequences 5′ to the HMG box; a 1.1 kb BamHI fragment; and a 1.2 kb SmaI/AgeI fragment. All gave identical results. The probe used for the 0–20 h embryo Northern was an 870 bp BamHI genomic fragment that included the HMG box and downstream sequences. The control rp49 probe was a 253 bp EcoRI/HindIII fragment coding for ribosomal protein 49 released from p720 as described previously (Myers et al., 1995). Three independent Northern blots were prepared and all gave identical results.

2.3. RT-PCR and primer extension analysis

Nested RT-PCR using Superscript II reverse transcriptase (Life Technologies) was performed according to the manufacturer's instructions using 2 μg of total 0–20 h embryonic RNA. The first round of RT-PCR was carried out in the presence of 30 pmol of primers ACS10 (5′-TCTTGCGCCGCTCCATCTGGC-3′) and KLM1 (5′-GCGTCGTCGCCTTCGCCAGC-3′). The second round used ACS10 and KLM7 (5′-GCCTGGTGTTCGGATCTGCACGG-3′). The resulting 150 bp fragment was cloned and checked by sequencing. Primer extension was performed by 5′ end-labelling of the appropriate primer by standard techniques. The primer (2 pmol) was then added to 20 μg of total Drosophila RNA in 80% formamide, 20 mM Tris–HCl (pH 7.5), 400 mM NaCl, and 1 mM EDTA and incubated at 85°C for 10 min, and then at 30°C for 12 h. The RNA/primer complex was ethanol-precipitated and then resuspended in 10 mM DTT, 75 mM KCl, 50 mM Tris–HCl (pH 8.3), 3 mM MgCl2, 250 μM dNTPs, 2 mg/ml actinomycin D, and 30 units RNasin ribonuclease inhibitor (Pharmacia), together with 400 units of Superscript II (Life Technologies). The reaction was terminated by the addition of 10 μl 20 mM EDTA and 10 μg sonicated salmon sperm DNA. RNA was digested by the addition of 1 mg/ml RNaseA and incubated at 37°C for 15 min. The products were phenol/chloroform-extracted, ethanol-precipitated and visualized on 8% polyacrylamide, 7 M urea sequencing gels with sizing markers alongside. The primers used in the extension experiments were: KLM20, 5′-TAGCCGGACCAGTGGCAGT-3′; KLM21, 5′-TTGCTTTAAGTGTGTTGAT-3′; and KLM23, 5′-CGAGCGAATAAACTACGCAA-3′.

2.4. In situ hybridization to whole-mount embryos

Hybridizations were performed on wild-type (Oregon R) Drosophila embryos using digoxygenin-labelled (DIG) antisense RNA probes as described previously (Myers et al., 1995). Antisense DSox14 probes were generated by linearizing plasmids containing various DSox14 cDNA fragments and transcribing antisense RNAs with T7 polymerase. Embryos were stained and mounted in JB-4 methacrylate (Polysciences). The antisense RNA probes were transcribed from the following DSox14 cDNA sequences: probe 5, the 1.4 kb AgeI/PstI fragment; probe 6, the 1.2 kb SmaI/AgeI fragment; and probe 7, the 0.4 kb BamHI/HincII fragment. These three different antisense probes all gave identical results. The sense probe was transcribed using SP6 polymerase from the 1.4 kb AgeI/PstI cDNA fragment.

2.5. Construction of DSox14 HMG box expression plasmids and protein purification

The DSox14 HMG box, encoding residues 178–265 (88 amino acids) of the DSox14 protein, was amplified by PCR from the cDNA clone with the primers BOX1 (5′-GCGGGCGGATCCACCAAGAAACATTCGCCCGGCC-3′) and BOX2 (5′-GCCGGCGAATTCGGAGCGCGTCTGCTTCTTTTGCG-3′). After amplification, the product was restricted with BamHI and EcoRI and inserted into pGEX2T, previously linearized with BamHI and EcoRI. The ligation mix was used to transform Escherichia coli HB101 and the resulting plasmids were checked by sequencing. Escherichia coli HB101 containing this plasmid was grown in LB broth, expression was induced with IPTG and the fusion protein was prepared and purified as described previously for other HMG boxes (Read et al., 1994).

2.6. Band shift assays

Protein concentrations were determined from their UV absorbance at 280 nm, using a molar extinction coefficient calculated on the basis of four tyrosines and two tryptophans (ε = 16,500 mol⁻¹ cm⁻¹). A 27 bp duplex DNA containing a Sox binding site (AACAAT) was prepared by annealing the oligonucleotides ACS-25 (5′-CTAGCACTATAACAATACAAGCCGGCC-3′) and ACS-26 (5′-GGCCGGCTTGTATTGTTATAGTGCTAG-3′). This duplex was then 5′ end-labelled using T4 polynucleotide kinase and [γ-32P]ATP. Labeled duplex DNA was purified by gel electrophoresis and DNA duplex concentrations were determined from their UV absorbance at 260 nm. Labeled duplex DNA (50 nM) was mixed with varying concentrations (50 nM to 1 μM) of DSox14 HMG box protein in a buffer containing 5 mM HEPES (pH 7.5), 30 mM KCl, 4% Ficoll, 0.05 mM PMSF, 1 mM MgCl2, 0.5 mg/ml BSA and 1 mM DTT (10 μl final volume). Binding reactions were incubated for 30 min on ice and then electrophoresed on non-denaturing 8% polyacrylamide gels (19:1 acrylamide/bis) in 0.25× TBE buffer at 150 V for 3 h at 4°C. Gels were then fixed, dried and autoradiographed at −80°C with an intensifying screen.
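
The concentration measurement described above can be illustrated with a short Beer-Lambert calculation. This is a minimal sketch under stated assumptions: the per-residue absorptivities at 280 nm (5690 M⁻¹ cm⁻¹ for tryptophan, 1280 M⁻¹ cm⁻¹ for tyrosine) are taken to be the Gill and von Hippel (1989) values, which the text does not name but which do reproduce the quoted coefficient of 16,500. The function names, the 1 cm path length and the example absorbance are hypothetical.

```python
# Assumed per-residue molar absorptivities at 280 nm (M^-1 cm^-1),
# after Gill and von Hippel (1989); the source text does not state them.
TRP_E280 = 5690
TYR_E280 = 1280


def extinction_coefficient(n_trp: int, n_tyr: int) -> int:
    """Estimate the molar extinction coefficient at 280 nm from composition."""
    return n_trp * TRP_E280 + n_tyr * TYR_E280


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law, A = epsilon * c * l, solved for c in mol/l."""
    return a280 / (epsilon * path_cm)


# Two tryptophans and four tyrosines, as stated for the HMG box peptide.
epsilon_box = extinction_coefficient(n_trp=2, n_tyr=4)

# Hypothetical reading: absorbance of 0.165 in a 1 cm cuvette.
conc_molar = molar_concentration(0.165, epsilon_box)
print(f"epsilon = {epsilon_box} M^-1 cm^-1, c = {conc_molar * 1e6:.1f} uM")
```

With these assumed residue values the composition of four tyrosines and two tryptophans gives 16,500 M⁻¹ cm⁻¹, in agreement with the figure used in the text.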

2.7. Circular permutation assay for DNA bending

The duplex obtained by annealing oligonucleotides ACS25 and ACS26 was cloned into the HpaI site of pBend4 (Zweib and Adhya, 1994; Read et al., 1994; Lnenicek-Allen et al., 1996). Circularly permutated DNA fragments were isolated by restriction of the resulting plasmid pB4S52, purified by gel electrophoresis and end-labeled. DNA bending assays were performed as described above for band shift assays, using 500 pM DNA and 100 and 150 nM HMG box protein, except that poly(dI.dC) competitor at 1 ng/ml was added to the reaction mixture.
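
To show how a circular permutation assay yields a bend angle, the sketch below converts a mobility ratio into an angle. It does not reproduce the algorithm of Ferrari et al. (1992) used in this paper, whose details are not given here. As a stand-in it uses the simpler relation commonly attributed to Thompson and Landy (1988), μM/μE = cos(α/2), where μM and μE are the mobilities of the complex with the binding site at the middle and at the end of the fragment. The function names and the example values are assumptions for illustration only.

```python
import math


def bend_angle_deg(mobility_middle: float, mobility_end: float) -> float:
    """Bend angle (degrees) from the relation mu_M / mu_E = cos(alpha / 2)."""
    ratio = mobility_middle / mobility_end
    if not 0.0 < ratio <= 1.0:
        raise ValueError("expected 0 < mobility_middle / mobility_end <= 1")
    return 2.0 * math.degrees(math.acos(ratio))


def mobility_ratio_for_angle(angle_deg: float) -> float:
    """Inverse relation: the mobility ratio expected for a given bend angle."""
    return math.cos(math.radians(angle_deg) / 2.0)


# Illustration only: the ratio this relation would predict for a 48.6 degree
# bend, then the round trip back to the angle.
ratio_for_reported_angle = mobility_ratio_for_angle(48.6)
recovered_angle = bend_angle_deg(ratio_for_reported_angle, 1.0)
print(f"ratio = {ratio_for_reported_angle:.3f}, angle = {recovered_angle:.1f} deg")
```

Under this relation a modest bend corresponds to only a small drop in relative mobility, which is why the fragments are compared on the same gel and the data are fitted rather than read from a single pair of lanes.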

3. Results and discussion

3.1. Cloning of Drosophila Sox14

The HMG domain is well conserved among Sox proteins, although surrounding sequences are frequently highly diverged (Soullier et al., 1999). To isolate a full-length cDNA encoding DSox14, a 204 bp PCR amplicon (kindly provided by Alan Ashworth) derived from a 4–8 h embryonic DNA library using degenerate primers homologous to the conserved ends of vertebrate SRY-like HMG boxes (Denny et al., 1992a) was used to screen a similar 4–8 h embryonic Drosophila cDNA library. Four different cDNA clones were isolated and one of these was found to contain sequences identical to the probe. Sequencing this 3 kb clone (ACS cDNA) showed it to contain the complete HMG box and 2 kb of 3′ in-frame sequence up to a polyA tail. However, just 5′ of the box the frame was lost. A second 1.3 kb cDNA clone (CR cDNA) kindly provided by Dr Christine Rushlow was sequenced and found to be identical to the ACS cDNA from the HMG box to the polyA tail but differed 5′ of the box; moreover, this 5′ sequence was in-frame with the remainder of the clone. We concluded that the 5′ part of the ACS cDNA clone was probably intronic, i.e. it was a partially processed product. Furthermore, the breakpoint between the clones corresponded to an acceptor splice site. Genomic sequencing later confirmed that this was indeed the case (see below).

Since the CR cDNA had an open frame right to its 5′ terminus it was possibly incomplete. We therefore cloned the DSox14 gene from an Oregon R Drosophila genomic library made in λ EMBL3 by Kim Kaiser and Steve Russell. A total of 6790 bp of sequence was obtained showing the presence of a 2.8 kb intron and continuation of the open reading frame upstream of the 5′ end of the CR cDNA up to an ATG in a context conforming well to a Drosophila consensus Kozak sequence. Assuming this to be the translational start, the DSox14 protein contains 669 amino acids (MWt 72.7 kDa). Our genomic DNA sequence is in excellent accord with that in FlyBase at FBgn0005612; however, the translation given in FlyBase prematurely stops at S530 rather than continuing to the correct C-terminal M669. The correct nucleotide sequence and its translation have been submitted to FlyBase.
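
The protein length quoted above can be cross-checked against the genomic coordinates given in the Fig. 1 legend. The sketch below is only an arithmetic check, assuming 1-based inclusive coordinates and assuming that the stop codon position refers to the first base of the TGA; the variable names are hypothetical.

```python
# Coordinates quoted in the Fig. 1 legend (assumed 1-based).
ATG_POS = 1668        # first base of the start codon
INTRON_FIRST = 2202   # first base of the intron
INTRON_LAST = 5000    # last base of the intron
STOP_POS = 6474       # assumed to be the first base of the TGA stop codon
POLYA_SITE = 6634     # polyA addition site

# Coding bases in exon 1 run from the ATG to the base before the intron.
exon1_coding_nt = INTRON_FIRST - ATG_POS
# Coding bases in exon 2 run from the base after the intron to the base
# before the stop codon.
exon2_coding_nt = STOP_POS - (INTRON_LAST + 1)

coding_nt = exon1_coding_nt + exon2_coding_nt
n_residues, leftover = divmod(coding_nt, 3)

intron_nt = INTRON_LAST - INTRON_FIRST + 1
# The 3' UTR runs from the base after the stop codon to the polyA site.
utr3_nt = POLYA_SITE - (STOP_POS + 3) + 1

print(f"coding bases: {coding_nt} ({n_residues} codons, remainder {leftover})")
print(f"intron: {intron_nt} bp, 3' UTR: {utr3_nt} bp")
```

Under these assumptions the coordinates give 669 codons, an intron of about 2.8 kb and a 3′ UTR of 158 bp, matching the values stated in the text and the figure legend.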

Comparison of the Drosophila Sox14 HMG box with that

of other Sox proteins shows that it is most similar to mouse

Sox4 and to human Sox11 and 22 (Fig. 1) (Soullier et al.,

1999) and more similar to them than to any of the other nine

Sox proteins in the Drosophila database. Within the HMG

box, DSox14 shows 76% identity with mouse Sox4 but

outside the DNA binding domain there is no significant

similarity between these proteins.

3.2. Expression of DSox14 during the Drosophila life cycle

To determine the expression of DSox14 through the

Drosophila life cycle, polyA+ mRNA was isolated at different

developmental stages and analyzed for the expression of

DSox14 transcripts by Northern blotting. Four probes from

different regions of the DSox14 gene were used and all gave

identical results (Fig. 2B and data not shown). DSox14

mRNA is expressed as three transcripts of 3.6, 3.3 and 2.5

kb. In embryos, the 3.6 kb transcript is barely visible and

only the 3.3 and 2.5 kb transcripts are clearly seen. This was

verified in a separate experiment using 0–20 h embryos (Fig.

2A). The 3.6 kb transcript becomes clearly visible only in

late pupae and adult flies. The 2.5 kb transcript is expressed

in embryonic stages, in early pupae and variably thereafter.

The expression levels of DSox14 transcripts were low

compared to the control transcript rp49. There is a striking

lack of expression in 1st and 2nd instar larvae and to verify

this, use was made of commercial 96-well plates containing

first-strand cDNA from various stages of Drosophila development

(Origene, Rockville, MD) that were probed by PCR

using a forward primer close to the ATG of the DSox14 gene

and a reverse primer within the HMG box (amplicon size

613 bp). This also showed no evidence of transcripts in 1st

and 2nd instar larvae, maximal expression in pupae and

significant expression between 8 and 24 h in embryos

(data not shown). This approach does not of course distin-

guish between the different sized transcripts.

The three DSox14 transcripts could arise from alterna-

tive splicing events, or from different start and/or termina-

tion sites. In order to determine whether the putative 2.8 kb

intron in fact included additional exons, RT-nested PCR

was used with primers in the coding sequences close to the

ends of the 2.8 kb intron, using total RNA extracted from

0±20 h embryos as a template. Fig. 3A shows that a single

product of 150 bp was produced, the expected length if

there are no exons within the 2.8 kb region: sequencing

confirmed its correct identity. Although we cannot rule out

the possibility that there are alternative polyA addition

sites, this seems unlikely since the sequences of the 3′

UTRs of the two cDNA clones were identical and an

AATAAA polyA addition signal is located 16 bp upstream

of the polyA tail (Colgan and Manley, 1997). To establish

Fig. 1. Position of the DSox14 gene within the sequenced segment of 6790 bp. The 5′ UTR is shown from position 667, corresponding to the 3.3 kb mRNA. The translational start ATG is at position 1668 and the 2.8 kb intron starts at position 2202 and finishes at position 5000. The HMG box domain starts six amino acids into the second exon and continues to position 5269. The TGA stop codon is at position 6474 and the polyA addition site is at position 6634. The 3′ UTR is thus 158 bp in length. A sequence comparison is shown of the Sox14 HMG box with mouse Sox4 (X70298), human Sox22 (U35612) and human SOX11 (AB028641). Identical amino acids are marked with an asterisk. The protein segment used in the DNA bending and binding experiments is underlined.

whether the mRNA transcripts observed resulted from

alternative start sites, a number of primer extension experi-

ments were performed using three different primers (see

Section 2) with RNA from 0–20 h embryos. The primer

furthest upstream of the open reading frame, KLM23, gave

no products, indicating that transcription starts within 1190

nt of the translation start. The KLM21 primer, located 674

nt upstream of the translation start, detected one major

product corresponding to a transcriptional start point

3167 nt from the polyA addition site (Fig. 3). Assuming

a polyA tail of 200 nt (Graber et al., 1999), this matches

the observed size of 3.3 kb detected on the Northern blots.

The primer KLM20, located 101 nt upstream of the trans-

lation start, detected a product corresponding to a transcrip-

tional start point 2483 nt upstream of the polyA addition

site (Fig. 3). This matches (within 10 nt) the start site of the

EST clone LD30105. Again, if we assume a polyA tail of

200 nt, this matches the observed 2.5 kb band observed on

the Northern blots. The 3.6 kb transcript cannot be

accounted for by the primer extension data but is anyway

very weak in embryos.
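
The size comparisons in this paragraph are simple sums, and the sketch below lays them out explicitly. All numbers are taken from the text and the Fig. 3 legend; the 200 nt polyA tail is the assumption already made in the text, and the variable names are hypothetical.

```python
# PolyA tail length assumed in the text (Graber et al., 1999).
POLYA_TAIL_NT = 200

# Distance from each mapped transcriptional start to the polyA addition
# site, keyed by the primer that detected it (values from the text).
start_to_polya_site_nt = {"KLM21": 3167, "KLM20": 2483}

# Northern band each start site is compared with in the text (kb).
northern_band_kb = {"KLM21": 3.3, "KLM20": 2.5}

predicted_nt = {
    primer: distance + POLYA_TAIL_NT
    for primer, distance in start_to_polya_site_nt.items()
}

for primer, length in predicted_nt.items():
    print(f"{primer}: predicted {length} nt vs {northern_band_kb[primer]} kb band")

# Expected nested RT-PCR product if the intron contains no further exons:
# ACS10 lies 82 bp downstream of the intron and KLM7 68 bp upstream of it.
rt_pcr_product_bp = 82 + 68
print(f"expected RT-PCR product: {rt_pcr_product_bp} bp")
```

The predicted lengths are estimates only, since the true tail length varies between transcripts and Northern sizing has limited resolution.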

Analysis of the genomic sequence in FlyBase showed that

Fig. 3. (A) Nested RT-PCR to screen for possible exon sequences within the 2.8 kb intron. Total RNA from Drosophila embryos was reverse-transcribed and then amplified using primer ACS10 (located 82 bp downstream of the 3′ border of the intron) and primer KLM1 (located 350 bp upstream of the 5′ border of the intron). A second round of amplification used ACS10 with primer KLM7 (located 68 bp upstream of the 5′ border of the intron). Only a 150 bp product was observed after the second round. (B) Primer extensions to determine transcriptional start points. Autoradiographs of extension reactions from primer KLM20 (extension product of 218 nt) and from primer KLM21 (extension product of 327 nt). (C) Positions of the primers KLM20 and KLM21 relative to the translational ATG start codon.

Fig. 2. Expression of DSox14 mRNA. (A) Northern blot of mRNA from 0–20 h embryos probed with a genomic fragment consisting of the HMG box plus 450 bp of downstream sequence. (B) Expression of DSox14 mRNA through the Drosophila life cycle. Northern blot of mRNAs probed with a 1.8 kb cDNA fragment of DSox14 that encompasses most of the coding sequence and with a 300 bp fragment from the ribosomal protein (rp49) gene. rp49 is known to be expressed at constant levels throughout development and is used as a loading control. Developmental stages are: (1) 0–4 h embryos; (2) 4–8 h embryos; (3) 8–24 h embryos; (4) 1st instar larvae; (5) 2nd instar larvae; (6) 3rd instar larvae; (7) early pupae; (8) late pupae; (9) adult males; (10) adult females.

the polyA addition site of our DSox14 cDNA was only 44 bp

away from the 3′ end of the 3′ UTR of the PHM (U7743)

gene which is orientated in the opposite direction to

DSox14. The protein product of the PHM gene

(peptidylglycine-α-amidating mono-oxygenase complex) is involved

in the production of neuropeptides. The 3.6 kb transcript

might therefore produce mRNA that is antisense to the

PHM mRNA. Similar convergent transcripts have been

found at other sites in the Drosophila genome (e.g. Spencer

et al., 1986) and it is possible that these convergent tran-

scripts are co-ordinately regulated.

3.3. Spatial and temporal expression of DSox14

To examine the spatial and temporal pattern of DSox14

expression through embryogenesis, the distribution of tran-

scripts was analyzed by in situ hybridization with four differ-

ent antisense RNA probes and one negative control sense

RNA probe, all five labelled with digoxygenin. The four

different antisense probes all gave identical results. DSox14

is expressed widely at low levels throughout embryogenesis

(Fig. 4A±D) and the sense probe (Fig. 4E) showed that this

widespread low level staining with the antisense probes was

genuine and not a general background staining. The ubiqui-

tous early low level expression of DSox14 indicates that there

is a maternal contribution. DSox14 is expressed at low levels

throughout the germ band and ubiquitously throughout the

rest of embryonic development. In order to follow develop-

mental changes in the DSox14 protein itself, a 41 kDa peptide

from the C-terminal region was expressed in E. coli and used

to generate antisera in rabbits. These recognized the expected

band of about 72 kDa in Western blots and it was found that

the DSox14 protein was present in the embryo and 3rd instar

larval stages but not in 1st instar larvae, in agreement with the

observations made by Northern analysis in Fig. 2 (data not

shown).

3.4. Analysis of the DNA binding and bending activities of

the DSox14 HMG box

HMG boxes have been shown to bind AT-rich sequences

and bend the DNA to angles of between 30 and 130° (Ferrari

et al., 1992; Giese et al., 1992; Connor et al., 1994; Read et

al., 1994). To compare the binding and bending properties

of the HMG box from DSox14 with other HMG boxes, we

expressed and purified it for analysis of its interactions with

Fig. 4. Spatial and temporal distribution of DSox14 mRNA in wild-type embryos. Whole-mount in situ hybridization was carried out using digoxygenin-

labelled antisense probes. (A) Embryo at stage 3. (B) Embryo at syncytial blastoderm, showing expression throughout the embryo. (C) Embryo at germ band

extension, stage 11, showing abundant DSox14 transcripts. (D) Embryo at germ band retraction (stage 13), showing low and ubiquitous expression of DSox14

throughout the embryo. (E) Embryo at syncytial/cellular blastoderm, i.e. of an age close to (B), probed with a DSox14 sense probe.

DNA. Comparison with other Sox proteins, especially with

mouse Sox4, led to the selection of a region of 87 amino

acids that included all the residues conserved between the

HMG boxes of DSox14 and other Sox proteins (Fig. 1). The

DNA encoding the selected region was amplified by PCR

and cloned into pGEX2T. The GST fusion protein was

expressed, the GST was removed with thrombin and the

HMG box peptide was purified using previously described

methods (Read et al., 1994).

The purified HMG box domain migrated as a single band

with the expected mobility in SDS-PAGE and acetic acid/

urea gels (data not shown). Band shift assays were used to

monitor the binding of the DSox14 HMG box to DNA using

a 27 bp duplex containing the recognition site AACAAT:

this is the recognition sequence determined by site selection

experiments with the HMG box from the mSox-5 and

human SRY proteins (Denny et al., 1992b; Harley et al.,

1994). Fig. 5 shows that the DSox14 HMG box binds to

this DNA fragment in a concentration-dependent manner,

although at high concentrations (where no free DNA

remains) a `supershifted' complex is formed. The relative

proportions of free and complexed DNA were measured for

the ®rst eight protein concentrations using a Phosphorima-

ger system and ®tted to a 1:1 binding equation, yielding a

dissociation constant of 190 nM. This value is somewhat

larger than that measured for the HMG box of mouse Sox5

at the same temperature (,35 nM; Privalov et al., 1999).

This somewhat reduced af®nity may be because the target

site was not optimal or that additional residues N- or C-

terminal to the minimum HMG box domain need to be

included to achieve a higher af®nity.
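The relationship between the dissociation constant quoted above and the association constant given in the legend to Fig. 5 can be checked with a short calculation. The sketch below is illustrative only: the exact form of the 1:1 binding equation used for the fit is not stated in the text, so the standard mass-action quadratic (which allows for depletion of free protein) is assumed, and the function names are hypothetical. The fractions it prints are predictions from Kd = 190 nM and 50 nM DNA, not the measured band intensities.

```python
import math


def association_constant(kd_molar):
    """Return Ka (M^-1), the reciprocal of Kd (M), for a 1:1 complex."""
    return 1.0 / kd_molar


def fraction_dna_bound(protein_total, dna_total, kd):
    """Fraction of DNA in complex for 1:1 binding (all values in the same units).

    Assumed model: P + D <-> PD, solved exactly from the mass-action
    quadratic so that depletion of free protein is taken into account.
    """
    if dna_total <= 0:
        raise ValueError("dna_total must be positive")
    s = protein_total + dna_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


KD_NM = 190.0   # dissociation constant reported in the text
DNA_NM = 50.0   # DNA concentration given in the legend to Fig. 5

ka = association_constant(KD_NM * 1e-9)
print(f"Ka = {ka:.3g} M^-1")

# First eight protein concentrations (nM) listed in the legend to Fig. 5
for protein_nm in (0, 50, 100, 150, 200, 250, 300, 350):
    bound = fraction_dna_bound(protein_nm, DNA_NM, KD_NM)
    print(f"{protein_nm:4d} nM protein -> predicted fraction bound {bound:.3f}")
```

Under this assumed model, a Kd of 190 nM corresponds to a Ka of about 5.26 × 10⁶ M⁻¹, in agreement with the value in the figure legend.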

To determine whether the DSox14 HMG box is able to bend DNA, a circular permutation assay was performed. The plasmid pB4552, including the Sox recognition sequence AACAAT, was restricted to give seven DNA fragments of 149 bp having this recognition site at different positions along the fragment and each was then end-labeled (Read et al., 1994; Privalov et al., 1999). The products of seven binding reactions, each containing 100 nM protein, were electrophoresed on an 8% polyacrylamide gel (Fig. 6A). The relative mobilities of the shifted bands were plotted against their flexure displacement (position of the binding site relative to the end of the fragment) and a parabola centred at the recognition sequence (a flexure displacement of 0.5) was observed (Fig. 6B). The bend angle derived using the algorithm of Ferrari et al. (1992) was 48.6°. Circular permutation assays carried out at a higher protein concentration (150 nM) gave rise to additional supershifted bands (Fig. 6C) due to the binding of additional molecules of protein. Since the 149 bp target duplex is much longer than a typical HMG box footprint (14–16 bp), this could be due to the HMG box binding to other sites of lower affinity and inducing additional bends. The relative mobilities of the upper supershifted bands were therefore plotted against flexure displacement (Fig. 6D): the minimum of the parabola was at the same position as for the lower shifted bands and the bend angle was calculated to be 54°. This indicates that the second protein molecule binds at the same position in the 149 bp duplex as the first protein, rather than at a second site elsewhere, and makes only a small difference to the bend angle generated. We conclude that the second protein molecule binds directly to the first by protein/protein interactions. Piggy-backing of a second HMG box onto one already bound to DNA explains the supershifted band observed in the band shift experiment of Fig. 5 that used a DNA duplex of only 27 bp, a length insufficient to accommodate two HMG boxes side by side.
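The statement that the mobility parabola is centred near a flexure displacement of 0.5 can be checked from the fitted coefficients given in the legend to Fig. 6B. The sketch below is a minimal illustration: the helper names are hypothetical, and flexure displacement is assumed to be the position of the binding-site centre expressed as a fraction of the fragment length, which is consistent with the value of 0.5 for a centrally placed site. It locates the minimum of the parabola only; it does not implement the bend-angle algorithm of Ferrari et al. (1992).

```python
def parabola_vertex(a, b, c):
    """Return (x, y) at the vertex of y = a*x**2 + b*x + c."""
    if a == 0:
        raise ValueError("a must be non-zero for a parabola")
    x = -b / (2.0 * a)
    y = c - (b * b) / (4.0 * a)
    return x, y


def flexure_displacement(site_centre_bp, fragment_length_bp):
    """Assumed definition: binding-site centre as a fraction of fragment length."""
    return site_centre_bp / fragment_length_bp


# Coefficients of the fit quoted in the legend to Fig. 6B
A, B, C = 0.6112, -0.6496, 0.9302

x_min, y_min = parabola_vertex(A, B, C)
print(f"minimum relative mobility {y_min:.3f} at flexure displacement {x_min:.3f}")

# A site centred on a 149 bp fragment has a flexure displacement of 0.5
print(flexure_displacement(74.5, 149))
```

With these coefficients the minimum falls at a flexure displacement of about 0.53, close to the centre of the fragment, as expected if the bend is located at the recognition sequence.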

These data demonstrate that the DSox14 HMG box can bind and bend DNA. It is important to note, however, that the exact bend angle in vivo may differ from that observed here due to additional protein contacts made to flanking parts of the protein, for example as shown for LEF-1 (Lnenicek-Allen et al., 1996), and/or the presence of other protein factors. Furthermore, since the cellular targets for DSox14 are not yet known, the precise in vivo DNA recognition sequence may not have been used in the bending assay and this could also influence the bend angle generated.

4. Conclusions

1. We have identified and sequenced the Drosophila Sox14 gene which encodes a protein containing an Sry-like HMG box domain.

2. Analyses of cDNA clones indicate that the gene contains two exons spaced by a 2.8 kb intron. The resulting protein consists of 691 amino acids with a molecular weight of 72 kDa.

3. DSox14 mRNA is expressed in a complex pattern throughout the Drosophila life cycle and is ubiquitously expressed during embryonic development. It is absent during 1st and 2nd instar larval development. This widespread pattern of expression suggests that DSox14 may affect a large number of target genes.

4. The DSox14 mRNA is expressed as three different transcripts of 3.6, 3.3 and 2.5 kb, due primarily to variations in the transcriptional start site, rather than alternative splicing or differences in termination sites.

5. The polyA addition site of the DSox14 cDNA is only 44 bp from the 3′ UTR of the convergent PHM1 gene.

6. The HMG box of the DSox14 protein binds the sequence AACAAT with a Kd of 190 nM, generating a bend angle of 48.6°. Both the binding and bending analyses indicate that two HMG boxes can piggy-back at the DNA recognition sequence.

Fig. 5. Band shift assays of the binding of the DSox14 Sox HMG box to a 27 bp duplex containing the Sox binding site AACAAT. All reactions contained 50 nM DNA and protein concentrations of (1) 0 nM, (2) 50 nM, (3) 100 nM, (4) 150 nM, (5) 200 nM, (6) 250 nM, (7) 300 nM, (8) 350 nM, (9) 400 nM, (10) 450 nM, (11) 500 nM, and (12) 1 μM. Products were visualized on an 8% polyacrylamide, 0.25× TBE gel electrophoresed at 150 V for 4 h. The initial complex forms with a Ka of 5.26 × 10⁶ M⁻¹ (Kd = 190 nM).

Acknowledgements

We are grateful to Dr N. Brown (Wellcome CRC Institute, Cambridge, UK) for the Drosophila cDNA libraries, Dr S. Russell (University of Cambridge, UK) for the Drosophila genomic libraries, Dr C. Rushlow (University of Koln, Germany) for the CR cDNA clone and Dr A. Ashworth (Institute of Cancer Research, London, UK) for the DSox14 PCR fragment. We would also like to thank Dr C. Read for technical advice. A.C.S. acknowledges the award of a Wellcome Trust Prize Studentship and S.F.N. acknowledges the support of a Royal Society University Research Fellowship.

Fig. 6. Circular permutation analysis of the DNA bending induced by the DSox14 HMG box domain. Plasmid pB4S52 containing the binding site used in the DNA band shift experiment (Fig. 5) was restricted to give seven 149 bp fragments, each with the binding site in a different position relative to one end (the flexure displacement). (A) The result of a circular permutation assay obtained using 100 nM protein, ~500 pM DNA, 1× binding buffer, 500 mg/ml of BSA and poly(dI.dC).poly(dI.dC) duplex competitor DNA at 0.1 ng/ml. The gel is an 8% polyacrylamide 0.25× TBE native gel electrophoresed in 0.25× TBE running buffer at 150 V at 4°C. (B) The graph plotted with the data from the gel in (A). The parabola equation which fits these results is $y = 0.6112x^2 - 0.6496x + 0.9302$, $R^2 = 0.9813$. (C) The result of a circular permutation assay obtained using identical reaction conditions to (A) except that 150 nM protein was used. The two complexes obtained are shown. (D) The graph plotted with the data from the gel in (C). The parabola equation which fits these results is $y = 0.69192x^2 - 0.6496x + 0.9302$, $R^2 = 0.9813$.

References

Cavallo, R., Rubenstein, D., Pfeifer, M., 1997. Armadillo and dTCF: a marriage made in the nucleus. Curr. Opin. Genet. Dev. 7, 459–466.

Colgan, D.F., Manley, J.L., 1997. Mechanism and regulation of mRNA polyadenylation. Genes Dev. 11, 2755–2766.

Collignon, J., Sockanathan, S., Hacker, A., Cohen-Tannoudji, M., Norris, D., Rastan, S., Stevanovic, M., Goodfellow, P.N., Lovell-Badge, R., 1996. A comparison of the properties of Sox-3 with Sry and two related genes, Sox-1 and Sox-2. Development 122, 509–520.

Connor, F., Cary, P.D., Read, C.M., Preston, N.S., Driscoll, P.C., Denny, P., Crane-Robinson, C., Ashworth, A., 1994. DNA binding and bending properties of the post-meiotically expressed Sry-related protein Sox-5. Nucleic Acids Res. 22, 3339–3346.

Cremazy, F., Berta, P., Girard, F., 2000. Sox neuro, a new drosophila sox gene expressed in the developing central nervous system. Mech. Dev. 93, 215–219.

Denny, P., Swift, S., Brand, N., Dabhade, N., Barton, P., Ashworth, A., 1992a. A conserved family of genes related to the testis determining gene, SRY. Nucleic Acids Res. 20, 2887.

Denny, P., Swift, S., Connor, F., Ashworth, A., 1992b. An SRY-related gene expressed during spermatogenesis in the mouse encodes a sequence-specific DNA-binding protein. EMBO J. 11, 3705–3712.

Ferrari, S., Harley, V.R., Pontiggia, A., Goodfellow, P.N., Lovell-Badge, R., Bianchi, M.E., 1992. Sry, like HMG1, recognizes sharp angles in DNA. EMBO J. 11, 4497–4506.

Giese, K., Cox, J., Grosschedl, R., 1992. The HMG domain of lymphoid enhancer factor-1 bends DNA and facilitates assembly of functional nucleoprotein structures. Cell 69, 185–195.

Goodfellow, P.N., Lovell-Badge, R., 1993. SRY and sex determination in mammals. Annu. Rev. Genet. 27, 71–92.

Graber, J.H., Cantor, C.R., Mohr, S.C., Smith, T.F., 1999. In silico detection of control signals: mRNA 3′-end-processing sequences in diverse species. Proc. Natl. Acad. Sci. USA 96, 14055–14060.

Harley, V.R., Lovell-Badge, R., Goodfellow, P.N., 1994. Definition of a DNA consensus binding site for SRY. Nucleic Acids Res. 22, 453–456.

Hosking, B.M., Muscat, G.E.O., Koopman, P.A., Dowhan, D.H., Dunn, T.L., 1995. Trans-activation and DNA-binding properties of the transcription factor, Sox-18. Nucleic Acids Res. 23, 2626–2628.

Hui Yong Loh, S., Russell, S., 2000. A Drosophila group E sox gene is dynamically expressed in the embryonic alimentary canal. Mech. Dev. 93, 185–188.

Kamachi, Y., Sockanathan, S., Liu, Q., Brieman, M., Lovell-Badge, R., Kondoh, H., 1995. Involvement of SOX proteins in lens-specific activation of crystallin genes. EMBO J. 14, 3510–3519.

Kent, J., Theatley, S.C., Andrews, J.E., Sinclair, A.H., Koopman, P.A., 1996. A male specific role for SOX9 in vertebrate sex determination. Development 122, 2813–2822.

Lnenicek-Allen, M., Read, C.M., Crane-Robinson, C., 1996. The DNA bend angle and binding affinity of an HMG box increased by the presence of short terminal arms. Nucleic Acids Res. 24, 1047–1051.

Mukherjee, A., Shan, X., Mutsuddi, M., Ma, Y., Nambu, J.R., 2000. The Drosophila Sox gene, fish-hook, is required for postembryonic development. Dev. Biol. 217, 91–106.

Myers, F.A., Francis-Lang, H., Newbury, S.F., 1995. Degradation of maternal string mRNA is controlled by proteins encoded on maternally contributed transcripts. Mech. Dev. 51, 217–226.

Nambu, P.A., Nambu, J.R., 1996. The Drosophila fish-hook gene encodes a HMG domain protein essential for segmentation and CNS development. Development 122, 3467–3475.

Nollet, F., Berx, G., van Roy, F., 1999. The role of the E-Cadherin/Catenin adhesion complex in the development and progression of cancer. Mol. Cell Biol. Res. Commun. 2, 77–85.

Pevny, L.H., Lovell-Badge, R., 1997. Sox genes find their feet. Curr. Opin. Genet. Dev. 7, 338–344.

Prior, H.M., Walter, M.A., 1996. SOX genes: architects of development. Mol. Med. 2, 405–412.

Privalov, P.L., Jelesarov, I., Read, C.M., Dragan, A., Crane-Robinson, C., 1999. The energetics of HMG box interactions with DNA: thermodynamics of the DNA binding of the HMG box from mouse sox-5. J. Mol. Biol. 294, 997–1013.

Read, C.M., Cary, P.D., Preston, N.S., Lnenicek-Allen, M., Crane-Robinson, C., 1994. The DNA sequence specificity of HMG boxes lies in the minor wing of the structure. EMBO J. 13, 5639–5646.

Russell, S.R.H., Sanchez-Soriano, N., Wright, C.R., Ashburner, M., 1996. The dichaete gene of Drosophila melanogaster encodes a SOX-domain protein required for embryonic segmentation. Development 122, 3669–3676.

Sambrook, J., Fritsch, E.F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual, 2nd Edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Schilham, M.W., Oosterwegel, M.A., Moerer, P., Ya, J., de Boer, P.A.J., van de Wetering, M., Verbeek, S., Lamers, W.H., Kruisbeej, A.M., Cumano, A., Clevers, H., 1996. Defects in cardiac outflow tract formation and pro-B-lymphocyte expansion in mice lacking Sox-4. Nature 380, 711–714.

Soriano, N.S., Russell, S., 1998. The Drosophila SOX-domain protein dichaete is required for the development of the central nervous system midline. Development 125, 3989–3996.

Soullier, S., Jay, P., Poulat, F., Vanacker, J.-M., Berta, P., Laudet, V., 1999. Diversification pattern of the HMG and SOX family members during evolution. J. Mol. Evol. 48, 517–527.

Spencer, C.A., Gietz, R.D., Hodgetts, R.B., 1986. Overlapping transcription units in the dopa decarboxylase region of Drosophila. Nature 322, 279–281.

Sudbeck, P., Lienhard-Schmitz, M., Baeuerle, P.A., Scherer, G., 1996. Sex-reversal by loss of the C-terminal transactivation domain of human SOX9. Nat. Genet. 13, 230–232.

Uwanogho, D., Ree, M., Cartwright, E.J., Pearl, G., Healy, C., Scotting, P.J., Sharpe, P.T., 1995. Embryonic expression of the chicken Sox2, Sox3 and Sox11 genes suggests an interactive role in neuronal development. Mech. Dev. 49, 23–36.

van de Wetering, M., Oosterwegel, M., Van Norren, K., Clevers, H., 1993. Sox-4, an Sry like HMG box protein, is a transcriptional activator in lymphocytes. EMBO J. 12, 3847–3854.

Wagner, T., Wirth, J., Meyer, J., Zabel, B., Held, M., Zimmer, J., Pasantes, J., Dagna Bricarelli, F., Keutel, J., Hustert, E., Wolf, U., Tommerup, N., Schempp, W., Scherer, G., 1994. Autosomal sex reversal and campomelic dysplasia are caused by mutations in and around the SRY related gene SOX9. Cell 79, 1111–1120.

Wegner, M., 1999. From head to toes: the multiple facets of Sox proteins. Nucleic Acids Res. 27, 1409–1420.

Wotton, D., Lake, R.A., Farr, C.J., Owen, M.J., 1995. The high-mobility group transcription factor, SOX4, transactivates the human CD2 enhancer. J. Biol. Chem. 270, 7515–7522.

Yuan, H., Corbi, N., Basilico, C., Dailey, L., 1995. Developmental specific activity of the FGF-4 enhancer requires the synergistic action of Sox2 and Oct-3. Genes Dev. 9, 2635–2645.

Zappavigna, V., Faliola, L., Citterich, M.H., Mavilio, F., Bianchi, M.E., 1996. HMG1 interacts with HOX proteins and enhances their DNA-binding and transcriptional activation. EMBO J. 15, 4981–4991.

Zweib, C., Adhya, S., 1994. Improved plasmid vectors for the analysis of protein-induced DNA bending. In: Kneale, G.G. (Ed.). Methods in Molecular Biology. Humana Press. Totowa, NJ, pp. 281–294.

A.C. Sparkes et al. / Gene 272 (2001) 121–129