
TECHNICAL ADVANCE

Development of Arabidopsis whole-genome microarrays and their application to the discovery of binding sites for the TGA2 transcription factor in salicylic acid-treated plants

Françoise Thibaud-Nissen1,*, Hank Wu1, Todd Richmond2, Julia C. Redman1, Christopher Johnson3,†, Roland Green2, Jonathan Arias3,‡ and Christopher D. Town1

1The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA, 2NimbleGen Systems Inc., Madison, WI 53711, USA, and 3University of Maryland, Baltimore County, Baltimore, MD 21250, USA

Received 5 January 2006; revised 8 March 2006; accepted 15 March 2006.

*For correspondence (fax +1 301 838 0208; e-mail [email protected]).

†Present address: Cellular Neurophysiology Section, Cellular Neurobiology Research Branch, IPR/NIDA/NIH/DHHS, Baltimore, MD 21224, USA.

‡Present address: Center for Scientific Review, National Institutes of Health, Bethesda, MD 20892, USA.

Summary

We have developed two long-oligonucleotide microarrays for the analysis of genome features in Arabidopsis

thaliana, in particular for the high-throughput identification of transcription factor-binding sites. The first

platform contains 190 000 probes representing the 2-kb regions upstream of all annotated genes at a density of

seven probes per promoter. The second platform is divided into three chips, each of over 390 000 features, and

represents the entire Arabidopsis genome at a density of one probe per 90 bases.

Protein–DNA complexes resulting from the formaldehyde fixation of leaves of plants 2 h after exposure to

1 mM salicylic acid (SA) were immunoprecipitated using antibodies against the TGA2 transcription factor.

After reversal of the cross-links and amplification, the resulting ChIP sample was hybridized to both platforms.

High signal ratios of the ChIP sample versus raw chromatin for clusters of neighboring probes provided

evidence for 51 putative binding sites for TGA2, including the only previously confirmed site in the promoter of

PR-1 (At2g14610). Enrichment of several regions was confirmed by quantitative real-time PCR. Motif search

revealed that the palindromic octamer TGACGTCA was found in 55% of the enriched regions. Interestingly, 15

of the putative binding sites for TGA2 lie outside the presumptive promoter regions. The effect of the 2-h SA

treatment on gene expression was measured using Affymetrix ATH1 arrays, and SA-induced genes were found

to be significantly over-represented among genes neighboring putative TGA2-binding sites.

Keywords: ChIP-chip, immunoprecipitation, microarray, TGA, transcription factor, Arabidopsis thaliana.

Introduction

Microarrays of complementary DNA, oligonucleotides or

amplicons have been developed for expression analysis in

Arabidopsis thaliana by many entities, including the Arabidopsis

Functional Genomics Consortium (Wisman and

Ohlrogge, 2000); Affymetrix (Redman et al., 2004; Zhu and

Wang, 2000); Operon (Zanetti et al., 2005); a European consortium

(Crowe et al., 2003); and The Institute for Genomic

Research (TIGR) (Kim et al., 2003). These arrays, which focus

on representing exons and are biased to the genes’ 3′ ends,

have provided valuable information on the timing and

location of expression of most Arabidopsis genes. Gaining

an understanding of the orchestration of patterns of

expression will, however, require a different set of tools.

Recently, arrays representing entire genomes or large

genomic regions have been deployed to determine gene

structure empirically in Arabidopsis (Stolc et al., 2005;

Yamada et al., 2003) and to analyze genome features such as

chromatin structure (Schubeler et al., 2004); sites of DNA

© 2006 The Authors. Journal compilation © 2006 Blackwell Publishing Ltd

The Plant Journal (2006) 47, 152–162 doi: 10.1111/j.1365-313X.2006.02770.x

modifications (Kurdistani et al., 2004); and DNA–protein

binding sites (Ren et al., 2000).

Chromatin immunoprecipitation (ChIP) is a technology

that allows the enrichment of DNA targets to which specific

proteins (e.g. transcription factors) are bound in vivo. On a

small scale and with a priori knowledge, these binding sites

can be confirmed and quantified by sequence-specific PCR

(Johnson et al., 2001; Orlando, 2000). However, at the

discovery level, finding all genomic targets of a particular

transcription factor requires a more open-ended approach.

Identification of the binding sites of over 100 yeast transcription

factors in vivo has been successfully accomplished

by hybridization of immunoprecipitated chromatin to whole-genome

arrays (Iyer et al., 2001; Lee et al., 2002; Ren et al.,

2000). This approach (termed ChIP-chip) has also been used

for the discovery of the binding sites of the transcription

factors Sp1, cMyc and p53 on human chromosomes 21 and

22 (Cawley et al., 2004); the profiling of chromatin and of

DNA modifications in an Arabidopsis heterochromatic knob

(Lippman et al., 2004); and, on a smaller scale, the identification

of acetylated histones in the vicinity of 88 tobacco

genes (Chua et al., 2004). To address similar questions at the

genome-wide level in Arabidopsis, we have developed two

long-oligonucleotide microarrays representing large fractions

of the Arabidopsis genome. The ATH1_P1 array

represents exclusively promoter regions, defined as the 2-kb

region upstream of a gene’s start codon, of all 27 166

Arabidopsis genes annotated in TIGR release 4; while the

ATH1_WG1 array is a whole-genome array. We report the

utility of these arrays for the discovery of transcription

factor-binding sites, based on results obtained using antibodies

to the TGA2 transcription factor.

The TGA factors were first characterized in tobacco by

their ability to bind the as-1 element of the CaMV 35S

promoter, a 20-bp element containing two TGACG boxes,

and to promote transcription (Katagiri et al., 1989; Lam et al.,

1989). The as-1-like elements have been identified in plant

promoters of genes responding to salicylic acid (SA) or

auxin. In vitro, the TGACG motif is sufficient for TGA factor

binding (Lam et al., 1989). However, demonstration of DNA

binding by TGA factors in vivo has been limited to GNT35

and GNT1 in tobacco, and to PR-1 in Arabidopsis (Johnson

et al., 2001, 2003). In Arabidopsis, the TGA family comprises

10 members. In the presence of SA, and upon activation by

NPR1 (Fan and Dong, 2002; Zhou et al., 2000), TGA2 and

TGA3 bind to the pathogenesis-related (PR-1) promoter, as

demonstrated in planta by chromatin immunoprecipitation

(Johnson et al., 2003). In the tga2-tga5-tga6 triple mutant,

PR-1 expression is severely reduced and systemic acquired

resistance (SAR) is abolished (Zhang et al., 2003), indicating

the essential role of at least some TGA factors in the

establishment of SAR.

We performed ChIP-chip experiments using an antibody

specific to TGA2 (Johnson et al., 2003). A total of 51 regions

of the genome were found enriched in the ChIP sample.

Fifty-five per cent of these contain the palindromic motif

TGACGTCA, which includes the TGACG box characteristic of

the TGA factor family. In all cases the motif coincides

precisely with the peak of enrichment detected on the array.

These results indicate that our arrays permit the discovery of

many putative binding sites for the TGA2 transcription

factor. Furthermore, 15 of these enriched regions lie outside

the presumptive promoter regions, an observation that

highlights the advantage of using the whole-genome array

over the promoter-only array. Finally, a significant proportion

of genes neighboring putative TGA2-binding sites were

induced by SA, as determined by microarray analysis. Our

findings demonstrate the validity of the two designs and

their utility for high-throughput discovery of transcription

factor-binding sites in Arabidopsis.

Results

Design of ATH1_P1: a whole-genome promoter array

The first array design targeted exclusively the promoter regions

of annotated genes, as these are the most probable

binding sites of transcription factors. This design was

developed within the limitations of 190 546 features per

microarray and 27 166 genes in the Version 4 annotation of

the Arabidopsis genome, and therefore allowed seven

probes within the presumptive promoter region of each

gene (2 kb upstream of the ATG start codon), except for 30

genes for which 20 probes per promoter were designed.

Constraints were placed on oligonucleotide sequence and

position, so that the probes would both have similar

hybridization properties and be regularly spaced across the

promoter regions. The length of the probes was allowed to

vary between 54 and 65 bases, and the targeted Tm was set

to 76°C as this temperature was shown to be optimal for the

hybridization of DNA to whole-genome arrays (R.G., unpublished

results). In our hands, the size of the sheared

chromatin used for hybridization ranges between 0.4 and

2.5 kb, with most of the DNA around 0.8 kb in size. With this

in mind, and to allow detection of an enriched region by a

minimum of two probes, the maximum distance between

two adjacent probes was set to 600 nt. In the first pass of the

design algorithm, this distance constraint was fulfilled for

23 736 promoters. For 3400 promoters, however, some

adjacent probes were separated by over 600 bases and an

alternative design strategy was used. For these promoter

regions, the probes were designed within seven given

intervals of 150 bases spaced regularly throughout the

promoter to ensure a maximum distance of 450 bases

between two adjacent probes. A similar strategy, but with

different spacing parameters, was used for a subset of 30

promoters represented by 20 probes. The final design is

characterized by an average gap between two adjacent


probes of 206 bases, a mean probe length of 62 bases, and a

Tm of 76.4 ± 2.4°C.
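The fallback layout used for the 3400 promoters can be illustrated with a short sketch. This is not the design algorithm itself: it assumes that the seven 150-base intervals alternate with 150-base gaps (the text states only that the intervals are regularly spaced), and the function names are hypothetical. Under that assumption, the distance from the start of one interval to the end of the next is 450 bases, which agrees with the stated maximum.

```python
def window_layout(n_windows=7, window=150, gap=150):
    """Return (start, end) offsets, in bases, of each probe-design interval.

    The 150-base gap between intervals is an assumption, not a value
    given in the text.
    """
    step = window + gap
    return [(i * step, i * step + window) for i in range(n_windows)]


def worst_case_span(windows):
    """Largest distance from the start of one interval to the end of the next."""
    return max(
        next_end - cur_start
        for (cur_start, _), (_, next_end) in zip(windows, windows[1:])
    )


windows = window_layout()
print(windows[-1][1])            # 1950: footprint of the seven intervals
print(worst_case_span(windows))  # 450: upper bound on adjacent-probe distance
```

With these settings the seven intervals occupy 1950 of the 2000 promoter bases, so the layout fits within the presumptive promoter region.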

Design of ATH1_WG1: a whole-genome array

Reports of binding of transcription factors outside promoter

regions (Cawley et al., 2004) and improvements in the

microarray technology prompted the design of a whole-genome

array, contained on three chips, which took

advantage of the doubling of the feature density on any

chip (to 390 000) and the implementation of a two-color

hybridization protocol.

First, repetitive regions totaling 15% of the genome were

excluded from the pseudomolecules (TIGR release 5). The

algorithm described in Experimental procedures was then

run three times with different parameters on the resulting

unmasked 101 091 579 bases. A summary of the parameters

used for each design iteration, and the statistics used to

evaluate the designs generated, is presented in Table 1. As

the minimum start-to-start distance between probes is

smaller than the maximum probe length, the probes are

allowed to overlap by as much as 35, 30 and 15 bases in

designs 1–3, respectively. The largest interval between two

probes (3352 bases in designs 1 and 2; 3392 bases in design

3), as well as the average Tm (76.7–77.0°C), was approximately

the same across the three designs. As shown in

Table 1, differences resided in the total number of probes

(and therefore the average probe density); in the proportion

of the genome represented by at least one probe; and in the

median size of the interval between two non-overlapping

probes. The average probe density was one probe per 89

and 90 bases, respectively, in designs 1 and 3 versus one

probe per 97 bases in design 2. Nearly 4 Mb (4%) more

unmasked bases were covered by at least one probe in

design 3 than in design 1, as also reflected in the lower

percentage of overlapping probes; and the median interval

size between two non-overlapping probes was 57 bases in

design 3 versus 68 in design 1. Therefore design 3 was

chosen for fabrication. On average, each gene and 2-kb

upstream promoter region is represented by 45 probes. Only

189 gene regions are represented by five probes or fewer.

One advantage of the maskless synthesis process is that an

individual probe can be replaced or modified in the design at

any time, so this shortcoming could be corrected if

necessary.
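The relationships among the design parameters can be checked with simple arithmetic. The sketch below uses only values reported in Table 1 and in the text (the dictionary layout and function name are illustrative): the maximum overlap is the maximum probe length minus the minimum start-to-start distance, and the average density is the number of unmasked bases divided by the number of probes.

```python
UNMASKED_BASES = 101_091_579  # unmasked bases remaining after repeat exclusion

# Per design: (N max probe length, D min start-to-start distance,
#              total probes, bases covered, bases not covered), from Table 1
DESIGNS = {
    1: (85, 50, 1_134_430, 60_440_571, 40_651_008),
    2: (85, 55, 1_045_715, 59_492_385, 41_599_194),
    3: (70, 55, 1_126_096, 64_303_579, 36_788_000),
}


def summarize(n_max, d_min, probes, covered, uncovered):
    """Derive the quantities quoted in the text from the Table 1 entries."""
    return {
        "max_overlap_nt": n_max - d_min,
        "bases_per_probe": round(UNMASKED_BASES / probes),
        "coverage_adds_up": covered + uncovered == UNMASKED_BASES,
    }


for design, values in DESIGNS.items():
    print(design, summarize(*values))

# Extra bases covered by design 3 relative to design 1 ("nearly 4 Mb")
print(DESIGNS[3][3] - DESIGNS[1][3])
```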

Hybridization reproducibility

Cross-linked chromatin was extracted from leaves of plants

2 h after exposure to 1 mM SA, a delay sufficient for the

induction of PR-1, and immunoprecipitated with antibodies

specific to the TGA2 transcription factor. Biotin-labeled DNA

from immunoprecipitated and control samples (raw chromatin)

were hybridized separately to the ATH1_P1 promoter

arrays, while cyanine-labeled samples of immunoprecipitated

and control material were mixed and hybridized together

to the ATH1_WG1 whole-genome arrays (see Experimental

procedures). Technical reproducibility was high for both

array-hybridization modalities, with correlation coefficients

of log intensity of 0.97 for ATH1_P1, and in the range 0.92–0.95

for ATH1_WG1.
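A reproducibility measure of this kind can be sketched as follows. The text does not specify how the coefficients were computed, so the sketch assumes a Pearson correlation of log2-transformed probe intensities between two replicate hybridizations; the intensities shown are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def log_intensity_correlation(rep_a, rep_b):
    """Correlate log2 intensities of the same probes on two replicate arrays."""
    return pearson([math.log2(v) for v in rep_a], [math.log2(v) for v in rep_b])


# Hypothetical raw intensities for five probes on two replicate arrays
rep_a = [120.0, 850.0, 4300.0, 95.0, 15000.0]
rep_b = [135.0, 790.0, 4100.0, 110.0, 16200.0]
print(round(log_intensity_correlation(rep_a, rep_b), 3))
```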

Discovery of putative TGA2-binding sites with ATH1_P1 and

ATH1_WG1 arrays

For ATH1_P1, the standard deviation (SD) of all normalized

log intensity ratios around the mean was 0.62, while for

ATH1_WG1 chips 1–3 the SD was 0.99, 0.88 and 1.03,

respectively. In general, the intensity ratios in enriched regions

were also slightly lower in the ATH1_P1 than in the

ATH1_WG1 hybridization.

For the purposes of this analysis, we define an enriched

region as two or more adjacent (or next-to-adjacent) probes

exhibiting log intensity ratios between the TGA2 ChIP

sample and raw chromatin that are >3 SD from the mean.

This somewhat arbitrary threshold corresponds to log2

Table 1 Summary statistics of three designs considered for the whole-genome array (ATH1_WG1)

Statistic                                                               Design 1      Design 2      Design 3
N, maximum probe length (nt)                                            85            85            70
D, minimum start-to-start distance between probes (nt)                  50            55            55
Total number of probes                                                  1 134 430     1 045 715     1 126 096
Average probe density                                                   1 per 89 nt   1 per 97 nt   1 per 90 nt
Mean melting temperature ± SD (°C)                                      77.0 ± 1.9    77.0 ± 1.9    76.7 ± 2.1
Largest interval between two adjacent probes in unmasked regions (nt)   3352          3392          3392
Percentage overlapping probes                                           72            67            67
Median interval size between two adjacent non-overlapping probes (nt)   68            66            57
Number of bases not covered by any oligonucleotide                      40 651 008    41 599 194    36 788 000
Number of bases covered by at least one oligonucleotide                 60 440 571    59 492 385    64 303 579


ratios of 1.86 for the ATH1_P1 array, and 2.98, 2.64 and 3.09

for chips 1, 2 and 3, respectively, of the ATH1_WG1 array, and

is towards the high end of the typical two- to eightfold

enrichment range reported for ChIP-chip (Buck and Lieb,

2004). Totals of 250, 87, 100 and 60 probes showing ratios

above the respective cut-offs for the ATH1_P1 array and

chips 1, 2 and 3 of the ATH1_WG1 array clustered into 51

regions (Table S1). Fourteen regions (group 1 in Table S2)

were found enriched with both versions of the array; 11

regions were found enriched with the ATH1_P1 array only

(group 2); and 26 were found enriched with the ATH1_WG1

array only (groups 3 and 4), of which 15 fell outside promoter

regions and the range of ATH1_P1 probes (group 4). The

false-positive rate associated with each cluster was estimated

as 6 and 21% for the ATH1_P1 and ATH1_WG1 arrays,

respectively.
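The definition of an enriched region given above can be expressed as a small routine. This is a minimal sketch, not the analysis code used in the study: it assumes the probes are listed in genomic order, interprets "next-to-adjacent" as an index difference of two, and uses the population SD of the log ratios; the function and parameter names are hypothetical.

```python
from statistics import mean, pstdev


def enriched_regions(log_ratios, n_sd=3.0, max_index_gap=2, min_probes=2):
    """Return (first, last) probe indices of each enriched region.

    A probe is a hit if its log ratio exceeds mean + n_sd * SD. Hits are
    grouped when they are adjacent or next-to-adjacent, and groups with
    fewer than min_probes hits are discarded.
    """
    cutoff = mean(log_ratios) + n_sd * pstdev(log_ratios)
    hits = [i for i, ratio in enumerate(log_ratios) if ratio > cutoff]

    regions, current = [], []
    for i in hits:
        if current and i - current[-1] > max_index_gap:
            if len(current) >= min_probes:
                regions.append((current[0], current[-1]))
            current = []
        current.append(i)
    if len(current) >= min_probes:
        regions.append((current[0], current[-1]))
    return regions


# Hypothetical log2 ratios: flat background with five strongly enriched probes
ratios = [0.0] * 100
for index in (10, 12, 50, 80, 81):
    ratios[index] = 5.0
print(enriched_regions(ratios))  # [(10, 12), (80, 81)]; probe 50 stands alone
```

The isolated hit at index 50 is dropped because a region requires at least two probes, which is the property that makes the clustering step a guard against single-probe artifacts.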

A majority of enriched regions contain extended TGACG

motifs

The cis-element characteristic of the TGA transcription factor

family is the TGACG box (Lam and Lam, 1995). To test

whether this motif, or any other 1- to 10-mer, was more

frequent than expected by chance in the 51 enriched regions,

the 1-kb sequences flanking the enriched probes on both

sides were searched for over-represented motifs using the

program SIFT (Hudson and Quail, 2003). The resulting frequency

of each motif was compared with the entire Arabidopsis

genome. This analysis revealed that the most

significantly over-represented sequences (P < 10⁻¹⁰) all

contain the TGACG motif at their core (Figure 1). The most

significant motif was TGACGTCATC, present in 15 enriched

regions (P = 2.88 × 10⁻³⁵). However, the shorter TGACGTCA

motif was found in 28 (55%) of the enriched regions identified

on either array (P = 8.89 × 10⁻³⁴). It is composed of two

overlapping TGACG boxes on opposite strands and, when

present, is invariably in the center of the enriched region.

The close match between the co-ordinates of the peak in

enrichment and the location of the TGACGTCA motif

strongly supports the authenticity of these regions, and

establishes the center of the enriched region as the putative

binding site for the TGA2 transcription factor. Interestingly,

only three enriched regions contain an as-1-like TGACG

tandem, with distances of up to 30 bases between the two

elements.
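The structure of the TGACGTCA octamer described above can be verified directly. The following sketch (helper names are illustrative) shows that the octamer equals its own reverse complement, and that it contains one TGACG box on each strand, overlapping by two bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_on_both_strands(motif, seq):
    """Return sorted (position, strand) hits; positions are 0-based on the
    forward strand. A palindromic motif is reported once per strand."""
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


octamer = "TGACGTCA"
print(reverse_complement(octamer) == octamer)     # True: the octamer is palindromic
print(find_on_both_strands("TGACG", octamer))     # [(0, '+'), (3, '-')]
```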

Genic context and enrichment profiles of TGA-binding

regions

Of the 14 regions reported to be enriched by both ATH1_P1

and ATH1_WG1 arrays (group 1 in Table S1), 11 contain the

TGACGTCA sequence, and one (the PR-1 promoter) has an

as-1-like element in its center. A 600-bp region of the PR-1

promoter, centered 690 bases upstream of the gene’s start

codon, was determined to be enriched approximately 20-fold

by the ATH1_P1 array and 40-fold by the ATH1_WG1

array (Figure 2a). It should be noted that the PR-1 promoter

is one of 30 promoters represented by 20 probes in the

ATH1_P1 array, while the other promoters are represented

by seven probes. Enrichment in this region was confirmed

by real-time PCR using multiple pairs of primers (Figure 2b),

and was shown to coincide with the location of the as-1

motif (−676 to −698 bases), an element that is essential for

the activation of PR-1 transcription, and binds Arabidopsis

TGA2 (Johnson et al., 2003). Examples of other enriched

promoter regions reported by both arrays are presented in

Figure 3; further examples can be found in Figure S1. Up to

15-fold enrichment was detected by the array in the 1-kb

region upstream of the WRKY51 transcription factor

At5g64810 (Figure 3a), and confirmed by real-time PCR

reactions across this region (Figure 3b). Four probes located

between 800 and 1200 bases upstream of the scarecrow

transcription factor At5g66770 start site exhibited about

eightfold enrichment in the ChIP sample on ATH1_WG1

arrays and fourfold enrichment in the ATH1_P1 array

(Figure 3c). An eightfold enrichment was also measured by

real-time PCR for this promoter (Figure 3d). In addition,

11 promoter regions, each represented by, on average,

Figure 1. Over-represented motifs in the 51 regions found enriched on at

least one of the arrays.

All motifs with associated P-values <10⁻¹⁰ are shown. The number of enriched

regions containing at least one motif is shown on the right: black highlight,

sub-motifs found in all over-represented motifs; gray highlight, sub-motifs

found in 75–99% of over-represented motifs.


2.45 probes 3 SD above the mean, were four- to fivefold

enriched on the ATH1_P1 array, but not reported by the

ATH1_WG1 array using the same cut-off (group 2 in

Table S2). Among these, eight (72%) contain a TGACGTCA

motif and one an as-1-like element. On further inspection of

the ATH1_WG1 array, below-threshold peaks of enrichment

formed by several probes ranging from 5.5- to 6.5-fold

(approximately 2.5 SD) were also detected for all these

regions on the ATH1_WG1 array.

Similarly, 11 promoters identified as enriched on the

ATH1_WG1 array were not called enriched on the promoter-

only array (group 3), probably due to the smaller size of

these regions and the lower probe density on the ATH1_P1

array. These enriched regions are detected by, on average,

2.7 probes above threshold on the ATH1_WG1 array, compared

with 4.7 probes for group 1 regions. For example,

three probes spanning a 300-base enriched region 1 kb

upstream of the start site of At3g18530, an expressed

protein, and four probes in a 600-base region located 500

bases upstream of the protein phosphatase 2C gene

At1g79630, were enriched approximately eightfold on the

ATH1_WG1 array (Figure S2). For both these examples, only

one probe in this region was enriched on the ATH1_P1 array.

Three of the 11 regions (27%) in this category contain a

TGACGTCA motif, compared with 79% of the enriched

regions above threshold on both arrays, and 72% above

threshold on ATH1_P1.

Enriched regions located outside presumptive promoter

regions

Hybridizations to the ATH1_WG1 chips revealed 15 enriched

regions outside presumptive promoter regions (see group 4,

Table S2), six (40%) of which contain a TGACGTCA motif in

their centers, and one a GATGACG motif. Location of these

[Figure 2: plots for At2g14610. (a) y-axis, Log2 (intensity ratio); (b) y-axis, Log2 (relative abundance); x-axis, chromosome coordinate (in kb).]

Figure 2. Enrichment of the PR-1 (At2g14610) promoter in the TGA2 ChIP

sample compared with raw chromatin on the array and by real-time PCR.

(a) Log base 2 of the intensity ratio of TGA2 ChIP versus raw chromatin

detected on the promoter-only ATH1_P1 array (circles, dashed line) and the

whole-genome ATH1_WG1 array (triangles, solid line). Probes with ratios >3

SD are represented by larger filled symbols. Horizontal arrow, beginning of

the PR-1 coding region and direction of transcription; vertical arrow, location

of the as-1-like element.

(b) Log base 2 of the relative abundance of ten 120–200-bp segments of the PR-1

promoter in the TGA2 ChIP sample versus raw chromatin, as detected by

real-time PCR. Co-ordinates on chromosome 2 are on the x-axis.

[Figure 3: plots for At5g64810 (a, b) and At5g66770 (c, d). (a, c) y-axis, Log2 (intensity ratio); (b, d) y-axis, Log2 (relative abundance); x-axis, chromosome coordinate (in kb).]

Figure 3. Enrichment of the At5g64810 and At5g66770 promoter in the TGA2

ChIP sample compared with raw chromatin on the array and by real-time PCR.

(a, c) Enrichment detected on the array: (a) At5g64810; (c) At5g66770.

Symbols as for Figure 2(a), except that vertical arrows indicate the location

of TGACGTCA motifs.

(b, d) Enrichment detected by real-time PCR: (b) At5g64810 promoter region;

(d) At5g66770 promoter region. Co-ordinates on chromosome 5 are indicated

on the x-axis.


regions was examined in relation to gene structures

annotated within 3 kb. Three regions occur downstream of

the nearest gene; four are more than 2 kb upstream of the

nearest gene; two are located within annotated genes; and

six belong to two of these categories. For example, a 1-kb

region located in the last exon of the zinc-finger protein gene

At3g13810 and 3–4 kb upstream of the F-box protein gene

At3g13820 is enriched sevenfold in the TGA2 ChIP sample

(Figure 4a). A ninefold enrichment was detected in the last

exon and 3′ UTR of the pseudo-response regulator 2 gene

At4g18020 (Figure 4b), a region located 8960 bases

upstream of At4g18010 and 8981 bases downstream of

At4g18030, and was confirmed by real-time PCR (data not

shown). The 3′ end of the single-exon gene At2g28650 is

enriched 14-fold and was detected by nine probes in

ATH1_WG1 (Figure 4c). This gene encodes a member of the

EXO70 exocyst subunit family, as does At2g28640, a gene

starting 2.7 kb downstream of the enriched region.

Functions of the genes downstream of enriched regions

In order to gain insight into the biological processes controlled

by TGA2, the functions of genes in the proximity of

the putative TGA2-binding regions were examined. Forty-four

of the 51 regions lie between −2 and 0.5 kb (for groups

1–3) or between −3 and 0.5 kb (for group 4) of the translation

start of 62 genes, as measured from the center of the

enrichment (Table S2). GOslim annotations, available for 22

of these 62 genes, were retrieved from The Arabidopsis

Information Resource (TAIR) database, and the representation

of each category in this set was compared with the entire

genome with the GoStats utility of GOTOOLBOX (Martin

et al., 2004). We found that genes with kinase activity, or

genes involved in response to stress or external stimulus,

were over-represented in the subset (P < 0.05).

Correlation between enriched regions and SA induction of

neighboring genes

We measured by microarray analysis the changes in

expression of the genes close to enriched regions 2 h after

treatment with 1 mM SA. We found that seven genes out of

the 65 genes close to a putative TGA2-binding site showed a

significant increase in transcript levels, using an adjusted

P-value threshold of 0.05 (Table 2 and Table S2). This is

significantly greater than the number expected by chance

(P = 0.04), considering that 1265 genes out of the 22 243

represented on the array are up-regulated by SA and that 17

of the 65 genes neighboring enriched regions (21%) are not

represented on the ATH1 array and thus lack expression

data. Several of the significantly regulated genes play a role

in disease resistance, including the NPR1-interacting protein

NIMIN1 and PR1; two are kinases. These seven genes are

located close to six putative TGA2-binding sites (At1g02240

and At1g02450 are associated with the same binding

region). All of the six regions close to SA-induced genes are

between −2 and 0.5 kb from the gene start codon, and five

were predicted by both types of array. Three of the six

regions contain a TGACGTCA motif and one an as-1-like

element.
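An over-representation test of this kind can be sketched with a hypergeometric tail probability. The text does not state which test produced P = 0.04, so the sketch below is an assumption about the general approach and is not expected to reproduce that value exactly; the function name is hypothetical, and the inputs are the counts quoted above.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are sampled without replacement
    from `population` items, of which `successes` carry the property."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Counts from the text: 22 243 genes on the array, 1265 of them SA-induced;
# 65 - 17 = 48 neighboring genes with expression data, 7 of them induced.
p_value = hypergeom_tail(22_243, 1_265, 48, 7)
print(f"{p_value:.3g}")
```

Restricting the sample to the 48 genes that are represented on the ATH1 array is a choice made here for illustration; using a different background or sample size changes the result.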

Discussion

Performance of the arrays

We have designed two sets of long oligonucleotides representing

different parts of the Arabidopsis genome. For the

first array, probes were designed in the regions 2 kb upstream

of all annotated genes’ start codons, at an average

density of one probe per 286 bases. The possibility of transcription

factor-binding sites outside presumptive promoter

regions prompted us to generate a second design, in which

the entire genome (with the exclusion of repetitive regions)

is represented. The characteristics of the second set of

probes are similar to those of the first set, but the probes are,

on average, 90 bases apart start to start. The proportion of

probes mapping exclusively to their intended location with a

75% identity cut-off is very high in both designs (87 and 82%

in the ATH1_P1 and ATH1_WG1 sets, respectively), which

should limit issues of cross-hybridization.

In order to evaluate the utility of the two designs for the

discovery of transcription factor-binding sites, we compared

a ChIP sample immunoprecipitated with TGA2 antibodies to

non-immunoprecipitated chromatin on both versions of the

[Figure 4: plots for (a) At3g13810 and At3g13820, (b) At4g18020, (c) At2g28640 and At2g28650. y-axis, Log2 (intensity ratio); x-axis, chromosome coordinate (in kb).]

Figure 4. Regions outside presumptive promoters detected as enriched in the TGA2 ChIP sample compared with raw chromatin.

(a) At3g13810/20; (b) At4g18020; (c) At2g28640/50. Symbols as for Figure 2(a), except that vertical arrows indicate the location of TGACGTCA motifs. Note that the promoter array does not interrogate these enriched regions. Co-ordinates of probes on their respective chromosome are indicated on the x-axis.


array. Technical reproducibility was high for both types of

array. The location of the probes showing the highest ratios

between ChIP and raw chromatin signal was examined. In an

attempt to minimize the number of false-positive calls, only

probes showing signal ratios above 3 SD were selected. A

total of 51 clusters of high-ranking probes were found

enriched with one platform or the other. Apart from 15

enriched regions lying outside promoter regions and detect-

able only with ATH1_WG1 arrays, 22 promoters were found

enriched on only one of the two platforms. Reasons for the

different abilities of the two arrays to detect identical sets of

enriched regions include the sequence, location and density

of the oligonucleotides themselves, and the different levels

of noise (reflected in the signal ratio SD) on the two types of

array.

Enrichment of PR-1 promoter and of promoters of genes involved in stress response and over-representation of SA-induced genes in proximity of putative binding sites support the validity of the putative TGA2-binding sites

Both generations of array demonstrated the ability to detect genuine transcription factor-binding sites. The strong enrichment of the PR-1 promoter found on both platforms is in agreement with the findings of Johnson et al. (2003), who showed by semi-quantitative PCR the binding of TGA2 to the PR-1 promoter. Several lines of evidence support the authenticity of the additional 50 putative binding sites for the TGA2 transcription factor identified by the ChIP-chip study presented here. First, enrichment of seven candidate regions out of seven tested was confirmed by quantitative real-time PCR (data are shown for three). Second, we found highly significant over-representation of TGACG-containing motifs, well documented as TGA factor-binding sites (Lam and Lam, 1995; Lam et al., 1989), in enriched regions, and coincidence of the location of the motifs with that of the most enriched probes. Third, frequency analysis of the functions of the genes located in the vicinity of the enriched presumptive promoter regions showed significant over-representation (P < 0.05) of genes with kinase activity and genes involved in stress response, including WRKY51 (At5g64810), a gene known to be induced by pathogen or SA treatment (Dong et al., 2003).

Finally, there is significant over-representation of genes that are induced by SA in the neighborhood of genomic regions found to be enriched by TGA2-ChIP. In our ATH1 GeneChip experiments, seven of the 65 genes close to enriched regions were significantly induced by SA, compared with 1265 of 22 243 in the entire genome (this difference is significant at P = 0.04). The same analysis applied to the data of Kliebenstein et al. (2006) identified nine (P = 0.01) of the genes close to our enriched regions as upregulated by SA, including five identified in our own data. PR-1 and GBF3, induced by SA on our arrays, have log ratios of 1.44 and 0.55 in the Kliebenstein data set but P values above threshold. The four genes identified in the Kliebenstein data set, but not in ours, are a disease-resistance protein (At1g56510, log R = 0.63); an XET (At2g14620, log R = 2.95); the expressed protein At2g4000 (log R = 1.84); and a putative homologue to the transcription regulator SNF2 (At2g44980, log R = 0.62). Far fewer genes neighboring enriched regions are significantly repressed by SA, which could indicate that TGA2 is more often involved in positive regulation of gene expression. Downregulation by SA is observed for two adjacent genes on opposite strands, At3g52070 and At3g52060. The co-regulation also observed for At2g14620 (an XET) and At2g14610 (PR-1) (our data; Johnson et al., 2003) supports the validity of binding sites identified by ChIP-chip and suggests that the activity of TGA2 is not directional.

Table 2 Genes within 3 kb of a putative TGA2-binding site and upregulated by salicylic acid (SA)

Gene       Annotation                                    Distance of enriched   Enriched region   Log2 expression   Adjusted Pᶜ
                                                         region from ATGᵃ       groupᵇ            ratio SA:mock
At1g02450  NPR1/NIM1-interacting protein 1 (NIMIN-1)     −361                   1                 5.15              0.001
At1g02440  ADP-ribosylation factor, putative             −1362                  1                 1.74              0.001
At5g51830  pfkB-type carbohydrate kinase family protein  −65ᵈ                   1                 2.99              0.002
At1g76600  Expressed protein                             −189ᵈ                  1                 1.52              0.013
At2g14610  PR1                                           −627ᵉ                  1                 3.02              0.016
At2g46270  G-box binding factor 3 (GBF3)                 −1620                  1                 0.73              0.021
At2g05940  Protein kinase, putative                      370ᵈ                   4                 0.47              0.042

ᵃ ATG is at position 1. A negative number indicates that the enriched region is upstream of the start codon, a positive number that it is downstream of the start codon.
ᵇ Group 1, region enriched on both platforms; group 4, region outside the presumptive promoter regions, unrepresented on the ATH1_P1 array and enriched on ATH1_WG1.
ᶜ P-values for differences in expression were adjusted for false discovery rate.
ᵈ Contains a TGACGTCA motif.
ᵉ Contains an as-1-like element.

It has been shown that, in human cells, approximately 85% of genes downstream of characterized promoters bound by the RNA polymerase II pre-initiation complex factor IID are expressed (Kim et al., 2005). However, a lower correlation is expected in the case of whole plants or plant organs, for several reasons. TGA2 might be present, active or bound only in particular cell types, while expression analysis on whole leaves measures transcript levels in all cells regardless of the presence of a TGA factor, thus potentially ‘diluting’ the changes in expression caused by TGA2. It is also possible that TGA2 binding to a promoter region is not sufficient for the recruitment of the RNA polymerase and for transcription, which would explain why only a fraction of genes close to putative TGA2-binding sites are induced or repressed by SA. This study is an attempt at unraveling the role of the TGA factors in triggering SAR. It is likely that other transcription factors or protein-binding proteins are also involved.

Extending the TGA factor recognition motif

Based on their high frequency in the enriched regions, we propose that GATGACGTCA or TGACGTCA might be higher-affinity cis-elements of the TGA2 transcription factor than TGACG alone. This is supported by in vitro observations that the C-box ATGACGTCAT binds TGA1 with higher affinity than T- or G-boxes, and with the same affinity as an as-1 tetramer (Izawa et al., 1993; de Pater et al., 1996).

Significance of enriched regions outside presumptive promoter regions

Fifteen enriched regions are located outside the 2-kb promoter regions and, for most, within genes and/or beyond 3 kb from the start codon of a gene. The high incidence of the TGACGTCA motif in these enriched regions in Arabidopsis argues for the authenticity of these sites, as do similar findings in the human genome. In human cell cultures, 13% of the Transcription Factor IID (TFIID) binding sites identified are more than 2.5 kb away from any 5′ ends (Kim et al., 2005), and on chromosomes 21 and 22 only 22% of the binding sites of Sp1, cMyc and p53 lie in the promoter regions, while 36% reside in 3′ ends and non-coding RNA (Cawley et al., 2004).

Which array to use?

Based on this study, the ATH1_WG1 array offers two advantages over the ATH1_P1 design described here. Higher probe density allows a more robust detection of enriched regions, which might be represented by only one or two increased-intensity oligonucleotides on the ATH1_P1 array. The distribution of probes throughout the genome not only provides an unbiased genome-wide scan for binding sites, but also provides information about local background hybridization around enriched regions, which may help to better define and detect these regions.

An important advantage of the maskless in situ synthesis system is that probes can be added or removed from the array at the time of fabrication. It is therefore possible, for example, to fabricate a smaller array containing only probes designed in the putative target regions of a given transcription factor for confirmation of initial results. Alternatively, as approximately 85% of the putative TGA2-binding sites are between −3 and +1 kb of the start of each gene, it is probable that a compromise 390 000-probe array, comprising probes only in this window, would allow the discovery of most binding sites for Arabidopsis transcription factors while significantly decreasing the cost per experiment.

Experimental procedures

Probe design

ATH1_P1 array. DNA sequences 0 to −2 kb of all Arabidopsis gene translation start sites, as annotated in TIGR release 4, were extracted from the pseudomolecules, regardless of their proximity to neighboring genes or the existence of annotated untranslated regions. Probes varying in size from 54 to 65 bp were selected in these regions using a scoring algorithm that weighted the deviation from a target melting temperature of 76°C; the position of the probe with regard to other selected probes in the same region; a sliding window average of 24-mer frequency within the probe; and a Boolean measure of whether the probe complied with a number of simple base-pair composition rules. The target melting temperature for each probe was determined using a calculation by Bolton and McCarthy (1962) as described by Sambrook et al. (1989). After probe selection, the distribution of probe positions was evaluated for each target region. For promoters where there was an interval >600 bp between any two probes, probes were reselected using the same criteria, but forcing selection within one of seven 150-bp non-overlapping windows within the target region.

ATH1_WG1 array. The latest set of Arabidopsis pseudomolecules from TIGR (release 5) was masked for Arabidopsis repeat sequences present in REPBASE (Jurka, 2000); for Escherichia coli sequences retrieved from the National Center for Biotechnology Information (NCBI); and for vector sequences present in the TIGR UniVec database using REPEATMASKER (http://www.repeatmasker.org). The probe design algorithm proceeded on the unmasked sequences as follows. Starting from position 1, a probe of the next N residues is selected and then trimmed from the 3′ end until the targeted Tm of 76°C, or a lower cut-off length of 55 bases, is reached. If no probe can be designed that satisfies these Tm and length constraints, or if part of the probe is masked or contains an ambiguous base, the probe is discarded and the design process repeated one base downstream. The probe is also rejected if it exceeds the limitations in the number of cycles required for its synthesis. When a probe is found in this region that satisfies all the design constraints, it is selected for synthesis. The design process is then repeated D nucleotides downstream (where D is the minimum start-to-start probe spacing). The algorithm was run several times with different values for the parameters of maximum length (N) and distance (D).
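The tiling procedure for the whole-genome design can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to design the arrays: the Tm expression is the salt-adjusted form commonly attributed to Bolton and McCarthy (1962), and the sodium concentration (`na_molar`), the acceptance tolerance (`tol`), the default values of N (`max_len`) and D (`spacing`), the treatment of any non-ACGT character as masked, and all function names are hypothetical choices made here. The limit on synthesis cycles is not modeled.

```python
import math

def melting_temp(seq, na_molar=0.3):
    """Salt-adjusted Tm in deg C; na_molar is an assumed value, not from the paper."""
    gc_pct = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_pct - 600.0 / len(seq)

def design_probes(seq, max_len=65, min_len=55, spacing=90,
                  target_tm=76.0, tol=2.0):
    """Return (start, probe) pairs tiled along seq (0-based starts)."""
    probes = []
    pos = 0
    while pos + min_len <= len(seq):
        cand = seq[pos:pos + max_len]
        # Trim from the 3' end until the target Tm or the length floor is reached.
        while len(cand) > min_len and melting_temp(cand) > target_tm:
            cand = cand[:-1]
        unmasked = set(cand) <= set("ACGT")   # 'N' or lowercase = masked/ambiguous
        tm_ok = abs(melting_temp(cand) - target_tm) <= tol
        if unmasked and tm_ok:
            probes.append((pos, cand))
            pos += spacing                    # next probe D bases downstream
        else:
            pos += 1                          # retry one base downstream
    return probes

# Hypothetical input: a 300-base repeat with about 30% GC.
example = "AAGTCATTGA" * 30
starts = [start for start, _ in design_probes(example)]
```

Running the design several times with different `max_len` and `spacing` values, as the text describes, would only require changing the two keyword arguments.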

Sequences of the probes and mapping information are available at http://www.tigr.org/tdb/e2k1/ath1/TGA_factors/project_summary.shtml. Both oligo sets were synthesized on glass slides using maskless synthesis technology (Nuwaysir et al., 2002), and are available from NimbleGen, Inc.


Amplification of ChIP samples

Plant treatment, isolation of raw chromatin and immunoprecipitation were performed as described by Johnson et al. (2003). Immunoprecipitated samples were amplified according to Wang et al. (2002), with the following modifications: Sequenase (USB) and Primer A (5′-GTTTCCCAGTCACGATC NNNNNNNNN) were used instead of reverse transcriptase and Primer D, and amplification was carried out with Primer B (5′-GTTTCCCAGTCACGATC) for 15 and 30 cycles.

Labeling and hybridization to the ATH1_P1 array

Digestion of samples down to 100–200 bases was done by incubating 6 µg amplified DNA for 3 min at 37°C in the presence of 0.05 U DNAse I and 1× One-Phor-All buffer (Amersham, Piscataway, NJ, USA). End-labeling of the digested DNA was performed for 90 min at 37°C in the presence of 1× buffer provided with the terminal transferase, 1 µl biotin-N6-ddATP (Perkin-Elmer, Wellesley, MA, USA) and 2 µl terminal transferase (Promega, Madison, WI, USA) in a 20-µl volume. The terminal transferase was heat-inactivated at 95°C for 15 min.

Before application to the microarray, the labeled sample was dried down, resuspended in 40% formamide, 8 mM Tris, 0.8 mM EDTA, 5× saline sodium citrate (SSC), 0.08% sodium dodecyl sulfate (SDS), denatured for 5 min at 95°C, spun down and cooled to 42°C. After overnight hybridization at 42°C, the arrays were washed briefly at 42°C in wash solution WS1 (0.2% SDS and 0.2× SSC), transferred to wash solution WS2 (0.2× SSC) for 1 min, and placed in stain solution (1 ng µl⁻¹ Cy3-streptavidin (Pierce Chemical Company, Rockford, IL, USA), 100 mM 2-morpholinoethanesulfonic acid (MES) salts, 1 M NaCl, 0.05% Tween-20) for 25 min. After a rinse in WS2, the slides were placed in antibody solution [100 mM MES salts, 1 M NaCl, 0.05% Tween-20, 0.2 mg ml⁻¹ goat IgG, 50 mg ml⁻¹ bovine serum albumin, 250 ng ml⁻¹ anti-streptavidin (Vector Laboratories, Burlingame, CA, USA)] for 25 min. Following rinsing in WS2, the slides were stained once more, rinsed for 1 min in WS2 and for 30 sec in WS3 (0.05× SSC), and spun dry.

Labeling and hybridization to the ATH1_WG1 array

Amplified DNA (1 µg) was labeled by random priming in the presence of Cy3- or Cy5-labeled random nonamers, 10 nM dNTP and 100 U Klenow polymerase. Following isopropanol precipitation, 12 µg of each Cy3- and Cy5-labeled probe were resuspended together in hybridization buffer (NimbleGen, Inc.), denatured for 5 min at 95°C, cooled to 42°C and applied to the array. After overnight hybridization at 42°C, the slides were incubated successively in WS1 and 0.1 mM dithiothreitol (DTT) for 2 min at 45°C, in WS2 buffer with 0.1 mM DTT for 1 min, and in WS3 with 0.1 mM DTT for 15 sec. The slides were then dipped in 70% ethanol before spin-drying.

Analysis of the ChIP arrays

The arrays were scanned using a Genepix 4000 scanner (Axon Instruments). Correlation coefficients of log2 signals between technical replicates were calculated using two replicates of the TGA2 ChIP sample on ATH1_P1 array or, in the case of the ATH1_WG1 array, 15 000 probes present on more than one chip of the set.

The signal intensities were normalized pairwise using the Q-spline method described by Workman et al. (2002), as implemented by the Bioconductor project. Probes showing log2 ratio 3 SD above the mean were considered enriched. We verified that all these exhibited medium-to-high signal intensity in the TGA2 ChIP hybridization. For each chip, enriched probes were sorted according to their location on the genome. An enriched region was defined by a cluster of at least two adjacent or next-to-adjacent enriched probes. The number of false-positive clusters predicted for each array was calculated as 0.00135 × 0.00135 × 4 × n (with 0.00135 the probability of an oligo having a ratio >3 SD; n = number of oligos on the array), and was compared with the number of enriched regions identified for estimation of the false-positive rate.
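The enrichment call, the clustering rule and the false-positive estimate can be illustrated with a short sketch. It assumes that the log2 ratios are already ordered by genomic position on a single chromosome, that the sample standard deviation is used, and that the factor of 4 counts the two adjacent and two next-to-adjacent positions around an enriched probe; the last point is our reading of the formula rather than something the text states. All function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def enriched_indices(log_ratios, n_sd=3.0):
    """Indices of probes whose log2 ratio exceeds mean + n_sd * SD."""
    cutoff = mean(log_ratios) + n_sd * stdev(log_ratios)
    return [i for i, ratio in enumerate(log_ratios) if ratio > cutoff]

def cluster_regions(indices, max_gap=2, min_size=2):
    """Group enriched probes that are adjacent or next-to-adjacent (gap <= 2)."""
    clusters, current = [], []
    for i in indices:
        if current and i - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(i)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters

def expected_false_clusters(n_probes, n_sd=3.0, neighbours=4):
    """p * p * neighbours * n, with p the one-sided normal tail beyond n_sd."""
    p = 1.0 - NormalDist().cdf(n_sd)   # about 0.00135 for 3 SD
    return p * p * neighbours * n_probes

# Hypothetical data: 100 probes, three of them strongly enriched.
ratios = [0.0] * 100
ratios[40] = ratios[42] = ratios[80] = 5.0
regions = cluster_regions(enriched_indices(ratios))
```

In this toy input the isolated probe at index 80 is enriched but does not form a region, which is the behavior the two-probe rule is meant to enforce.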

Real-time PCR verification

Five to 10 unique primer pairs were designed across the enriched regions of interest using PRIMER3 (Rozen and Skaletski, 2000). Amplification of the 120–200-base products was performed in 15 µl volume with 0.5 ng amplified ChIP or raw chromatin sample, 0.4 µM of each primer, 1× SYBRGreen RT–PCR mastermix (EuroGentech, Seraing, Belgium). The reactions were performed in duplicate in each run, and each run was repeated once. The fluorescence of double-stranded DNA was recorded using ABI7900HT (Applied Biosystems, Foster City, CA, USA). The ΔΔCT value method was used to evaluate the relative abundance of each amplified product in the two samples, as described in the ABI 7700 sequence-detection system User Bulletin 2. Three promoters, present in equal amounts in the ChIP sample and the raw chromatin, as indicated by an average intensity ratio of 1 on the promoter array, were used as endogenous references. The threshold cycles (CT) of the corresponding amplicons were averaged, and used to normalize the CT of the test amplicons.
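As an illustration of the ΔΔCT calculation, the sketch below normalizes the CT of a test amplicon to the mean CT of the reference amplicons in each sample and converts the difference between samples into a fold enrichment. It assumes a perfect doubling of product per cycle; the function name and the CT values in the usage line are hypothetical.

```python
from statistics import mean

def fold_enrichment(ct_test_chip, ct_refs_chip, ct_test_raw, ct_refs_raw):
    """Relative abundance of a test amplicon in ChIP versus raw chromatin."""
    delta_chip = ct_test_chip - mean(ct_refs_chip)   # delta-CT in the ChIP sample
    delta_raw = ct_test_raw - mean(ct_refs_raw)      # delta-CT in raw chromatin
    delta_delta = delta_chip - delta_raw
    return 2.0 ** (-delta_delta)                     # assumes 100% PCR efficiency

# Hypothetical CT values: the test amplicon crosses threshold 3 cycles earlier in ChIP.
example_fold = fold_enrichment(22.0, [25.0, 25.0, 25.0], 25.0, [25.0, 25.0, 25.0])
```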

Expression arrays

Affymetrix ATH1 arrays were hybridized with labeled cRNA from three biological replicates of rosette leaves harvested 2 h after spraying with 1 mM SA in 0.01% Silwet, or with 0.01% Silwet (Redman et al., 2004). The data were analyzed using the Bioconductor packages AFFY (Gautier et al., 2004) and LIMMA (Smyth, 2004). Expression was background-corrected and quantile-normalized with robust multi-array analysis, and a mixed-effects linear model was applied to the data. The data are publicly accessible as series GSE3984 in the NCBI Gene Expression Omnibus. After false discovery-rate adjustment of P-values (Benjamini and Hochberg, 1995), the expression of 1265 of the 22 243 genes represented on the array (based on TIGR release 5 annotation) was found to be significantly induced by SA (P < 0.05). The probability of over-representation by chance of significantly induced genes among those close to enriched regions was estimated using the binomial distribution. Cel files of arrays hybridized with cRNA from leaves harvested 4 h after spraying with 0.3 mM SA and 0.02% Silwet, or with 0.02% Silwet were provided by Dan Kliebenstein (Kliebenstein et al., 2006) and analyzed in the same manner.
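A binomial over-representation test of this kind can be sketched as below, using the counts quoted in the Discussion (seven induced genes among 65, against a genome-wide background of 1265 of 22 243). The text does not say whether the tail is inclusive or how the calculation was set up, so this sketch assumes an inclusive upper tail, P(X ≥ k), and its result need not reproduce the reported P value. The function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))

background = 1265 / 22243          # genome-wide fraction of SA-induced genes
p_value = binomial_upper_tail(7, 65, background)
```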

Scanning of enriched regions for motifs

The 2-kb regions surrounding the clusters of high-ranking probes were searched for over-represented motifs using the program SIFT (Hudson and Quail, 2003). The frequency of any 1- to 10-mer was compared in the list of regions of interest and in the entire Arabidopsis genome split into 2-kb fragments. The probabilities of each motif occurring by chance were computed and all motifs with a P-value <10⁻¹⁰ are reported here.
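The counting step behind such a comparison can be illustrated with a small sketch that tallies a motif on both strands and reports the fraction of regions containing it. This is not the SIFT program and performs no statistical test; the function names and the example sequences are hypothetical. A palindromic motif such as TGACGTCA equals its own reverse complement and is therefore counted once per site.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_motif(region, motif):
    """Count occurrences (overlaps allowed) of motif on either strand."""
    targets = {motif, reverse_complement(motif)}   # one entry if palindromic
    return sum(region.startswith(t, i)
               for i in range(len(region)) for t in targets)

def fraction_with_motif(regions, motif):
    """Fraction of regions containing the motif at least once."""
    return sum(count_motif(r, motif) > 0 for r in regions) / len(regions)

# Hypothetical regions; in practice these would be the 2-kb enriched windows.
regions = ["AATGACGTCATT", "AAAAAAAAAAAA"]
share = fraction_with_motif(regions, "TGACGTCA")
```

The same two functions applied to the genome split into 2-kb fragments would give the background frequency against which the enriched regions are compared.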

Determination of the over-representation of functional categories

For each gene downstream of a potential TGA2-binding site, GOSlim terms were retrieved from the TAIR database. The probability of the observed difference in the frequencies of GOSlim terms in the sample group and in the entire genome was evaluated using the hypergeometric distribution, as implemented in GOTOOLBOX (Martin et al., 2004).
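For readers unfamiliar with the hypergeometric test, the following sketch computes the probability of seeing at least k genes carrying a given term in a sample drawn without replacement from the genome. It is a generic illustration, not the GOTOOLBOX implementation; the function name, argument names and the counts in the usage line are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, sample_size, annotated, population):
    """P(X >= k) when sample_size genes are drawn from population genes,
    of which `annotated` carry the term of interest."""
    total = comb(population, sample_size)
    upper = min(sample_size, annotated)
    hits = sum(comb(annotated, i) * comb(population - annotated, sample_size - i)
               for i in range(k, upper + 1))
    return hits / total

# Hypothetical counts: 4 of 10 genes carry the term; 3 genes sampled; 2 or more hits.
p_term = hypergeom_upper_tail(2, 3, 4, 10)
```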

Acknowledgements

We are grateful to Dr Dan Kliebenstein for providing the .cel files of his SA experiment. This work is supported by the National Science Foundation (MCB-0600882).

Supplementary Material

The following supplementary material is available for this article online:

Figure S1. Enrichment on the arrays of At3g50860 and At3g04260 promoters in the TGA2 ChIP sample compared with raw chromatin: (a) At3g50860; (b) At3g04260. Symbols as for Figure 2(a). The co-ordinates of the probes on their respective chromosome are indicated on the x-axis. Vertical arrows indicate the location of TGACGTCA motifs, if present.

Figure S2. Enrichment of At3g18530 and At1g76930 promoters in the TGA2 ChIP sample compared with raw chromatin on the arrays: (a) At3g18530; (b) At1g76930. Symbols as for Figure 2(a). The co-ordinates of the probes on their respective chromosome are indicated on the x-axis. In both cases, only one probe of the ATH_P1 array is 3 SD above the mean.

Table S1 Number of probes 3 SD above the mean and number of corresponding enriched regions

Table S2 Putative TGA2-binding sites identified on ATH_P1 and/or ATH_WG1 arrays and their neighboring genes

This material is available as part of the online article from http://www.blackwell-synergy.com

References

Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. Series B, 57, 289–300.

Bolton, E.T. and McCarthy, B.J. (1962) A general method for the isolation of RNA complementary to DNA. Proc. Natl Acad. Sci. USA, 48, 1390–1397.

Buck, M.J. and Lieb, J.D. (2004) ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics, 83, 349–360.

Cawley, S., Bekiranov, S., Ng, H.H. et al. (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell, 116, 499–509.

Chua, Y.L., Mott, E., Brown, A.P., MacLean, D. and Gray, J.C. (2004) Microarray analysis of chromatin-immunoprecipitated DNA identifies specific regions of tobacco genes associated with acetylated histones. Plant J. 37, 789–800.

Crowe, M.L., Serizet, C., Thareau, V. et al. (2003) CATMA: a complete Arabidopsis GST database. Nucleic Acids Res. 31, 156–158.

Dong, J., Chen, C. and Chen, Z. (2003) Expression profiles of the Arabidopsis WRKY gene superfamily during plant defense response. Plant Mol. Biol. 51, 21–37.

Fan, W. and Dong, X. (2002) In vivo interaction between NPR1 and transcription factor TGA2 leads to salicylic acid-mediated gene activation in Arabidopsis. Plant Cell, 14, 1377–1389.

Gautier, L., Cope, L., Bolstad, B.M. and Irizarry, R.A. (2004) affy – analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, 20, 307–315.

Hudson, M.E. and Quail, P.H. (2003) Identification of promoter motifs involved in the network of phytochrome A-regulated gene expression by combined analysis of genomic sequence and microarray data. Plant Physiol. 133, 1605–1616.

Iyer, V.R., Horak, C.E., Scafe, C.S., Botstein, D., Snyder, M. and Brown, P.O. (2001) Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature, 409, 533–538.

Izawa, T., Foster, R. and Chua, N.H. (1993) Plant bZIP protein DNA binding specificity. J. Mol. Biol. 230, 1131–1144.

Johnson, C., Boden, E., Desai, M., Pascuzzi, P. and Arias, J. (2001) In vivo target promoter-binding activities of a xenobiotic stress-activated TGA factor. Plant J. 28, 237–243.

Johnson, C., Boden, E. and Arias, J. (2003) Salicylic acid and NPR1 induce the recruitment of trans-activating TGA factors to a defense gene promoter in Arabidopsis. Plant Cell, 15, 1846–1858.

Jurka, J. (2000) Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420.

Katagiri, F., Lam, E. and Chua, N.H. (1989) Two tobacco DNA-binding proteins with homology to the nuclear factor CREB. Nature, 340, 727–730.

Kim, H., Snesrud, E.C., Haas, B., Cheung, F., Town, C.D. and Quackenbush, J. (2003) Gene expression analyses of Arabidopsis chromosome 2 using a genomic DNA amplicon microarray. Genome Res. 13, 327–340.

Kim, T.H., Barrera, L.O., Zheng, M., Qu, C., Singer, M.A., Richmond, T.A., Wu, Y., Green, R.D. and Ren, B. (2005) A high-resolution map of active promoters in the human genome. Nature, 436, 876–880.

Kliebenstein, D.J., West, M.A., van Leeuwen, H., Kim, K., Doerge, R.W., Michelmore, R.W. and St Clair, D.A. (2006) Genomic survey of gene expression diversity in Arabidopsis thaliana. Genetics, 172, 1179–1189.

Kurdistani, S.K., Tavazoie, S. and Grunstein, M. (2004) Mapping global histone acetylation patterns to gene expression. Cell, 117, 721–733.

Lam, E. and Lam, Y.K. (1995) Binding site requirements and differential representation of TGA factors in nuclear ASF-1 activity. Nucleic Acids Res. 23, 3778–3785.

Lam, E., Benfey, P.N., Gilmartin, P.M., Fang, R.X. and Chua, N.H. (1989) Site-specific mutations alter in vitro factor binding and change promoter expression pattern in transgenic plants. Proc. Natl Acad. Sci. USA, 86, 7890–7894.

Lee, T.I., Rinaldi, N.J., Robert, F. et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298, 799–804.

Lippman, Z., Gendrel, A.V., Black, M. et al. (2004) Role of transposable elements in heterochromatin and epigenetic control. Nature, 430, 471–476.

Martin, D., Brun, C., Remy, E., Mouren, P., Thieffry, D. and Jacq, B. (2004) GOTOOLBOX: functional analysis of gene datasets based on gene ontology. Genome Biol. 5, R101.


Nuwaysir, E.F., Huang, W., Albert, T.J. et al. (2002) Gene expression analysis using oligonucleotide arrays produced by maskless photolithography. Genome Res. 12, 1749–1755.

Orlando, V. (2000) Mapping chromosomal proteins in vivo by formaldehyde-crosslinked-chromatin immunoprecipitation. Trends Biochem. Sci. 25, 99–104.

de Pater, S., Pham, K., Memelink, J. and Kijne, J. (1996) Binding specificity and tissue-specific expression pattern of the Arabidopsis bZIP transcription factor TGA2. Mol. Gen. Genet. 250, 237–239.

Redman, J.C., Haas, B.J., Tanimoto, G. and Town, C.D. (2004) Development and evaluation of an Arabidopsis whole genome Affymetrix probe array. Plant J. 38, 545–561.

Ren, B., Robert, F., Wyrick, J.J. et al. (2000) Genome-wide location and function of DNA binding proteins. Science, 290, 2306–2309.

Rozen, S. and Skaletski, H.J. (2000) PRIMER3 on the WWW for general users and for biologist programmers. In Bioinformatics Methods and Protocols: Methods in Molecular Biology (Krawetz, S. and Misener, S., eds). Totowa, NJ: Humana Press, pp. 365–386.

Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd edn. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.

Schubeler, D., MacAlpine, D.M., Scalzo, D. et al. (2004) The histone modification pattern of active genes revealed through genome-wide chromatin analysis of a higher eukaryote. Genes Dev. 18, 1263–1271.

Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, 3.

Stolc, V., Samanta, M.P., Tongprasit, W. et al. (2005) Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays. Proc. Natl Acad. Sci. USA, 102, 4453–4458.

Wang, D., Coscoy, L., Zylberberg, M., Avila, P.C., Boushey, H.A., Ganem, D. and DeRisi, J.L. (2002) Microarray-based detection and genotyping of viral pathogens. Proc. Natl Acad. Sci. USA, 99, 15687–15692.

Wisman, E. and Ohlrogge, J. (2000) Arabidopsis microarray service facilities. Plant Physiol. 124, 1468–1471.

Workman, C., Jensen, L.J., Jarmer, H., Berka, R., Gautier, L., Nielser, H.B., Saxild, H.H., Nielsen, C., Brunak, S. and Knudsen, S. (2002) A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol. 3, 1–16.

Yamada, K., Lim, J., Dale, J.M. et al. (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science, 302, 842–846.

Zanetti, M.E., Chang, I.F., Gong, F., Galbraith, D.W. and Bailey-Serres, J. (2005) Immunopurification of polyribosomal complexes of Arabidopsis for global analysis of gene expression. Plant Physiol. 138, 624–635.

Zhang, Y., Tessaro, M.J., Lassner, M. and Li, X. (2003) Knockout analysis of Arabidopsis transcription factors TGA2, TGA5, and TGA6 reveals their redundant and essential roles in systemic acquired resistance. Plant Cell, 15, 2647–2653.

Zhou, J.M., Trifa, Y., Silva, H., Pontier, D., Lam, E., Shah, J. and Klessig, D.F. (2000) NPR1 differentially interacts with members of the TGA/OBF family of transcription factors that bind an element of the PR-1 gene required for induction by salicylic acid. Mol. Plant Microbe Interact. 13, 191–202.

Zhu, T. and Wang, X. (2000) Large-scale profiling of the Arabidopsis transcriptome. Plant Physiol. 124, 1472–1476.
