More Taxa or More Characters Revisited: Combining Data from Nuclear Protein-Encoding Genes for...

23
Syst. Biol. 49(2):202-224,2000 More Taxa or More Characters Revisited: Combining Data from Nuclear Protein-Encoding Genes for Phylogenetic Analyses of Noctuoidea (Insecta: Lepidoptera) ANDREW MITCHELL, 1 ' 3 CHARLES MITTER, 1 AND JEROME C. REGIER 2 ^Department of Entomology, University of Maryland, College Park, Maryland 20742, USA 2 Center for Agricultural Biotechnology, University of Maryland Biotechnology Institute, College Park, Maryland 20742, USA Abstract.—A central question concerning data collection strategy for molecular phylogenies has been, is it better to increase the number of characters or the number of taxa sampled to improve the robustness of a phylogeny estimate? A recent simulation study concluded that increasing the num- ber of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are cho- sen specifically to break up long branches. We explore this hypothesis by using empirical data from noctuoid moths, one of the largest superfamilies of insects. Separate studies of two nuclear genes, elongation factor-la (EF-la) and dopa decarboxylase (DDC), have yielded similar gene trees and high concordance with morphological groupings for 49 exemplar species. However, support levels were quite low for nodes deeper than the subfamily level. We tested the effects on phylogenetic sig- nal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data from the two genes in a single analysis. Surprisingly, the increased taxon sampling, although designed to break up long branches, generated greater disagreement between the two gene data sets and de- creased support levels for deeper nodes. We appear to have inadvertently introduced new long branches, and breaking these up may require a yet larger taxon sample. Sampling additional char- acters (combining data) greatly increased the phylogenetic signal. To contrast the potential effect of combining data from independent genes with collection of the same total number of characters from a single gene, we simulated the latter by bootstrap augmentation of the single-gene data sets. Support levels for combined data were at least as high as those for the bootstrap-augmented data set for DDC and were much higher than those for the augmented EF-la data set. This supports the view that in obtaining additional sequence data to solve a refractory systematic problem, it is pru- dent to take them from an independent gene. [Combining data; independent genes; Noctuidae; Noctuoidea; taxon sampling.] Quantifying empirical support for a phy- logeny is now the norm in systematics, and weak support is typically regarded as a problem in need of solving. The solution most often proposed, particularly for mole- cular studies, is to collect more data. How- ever, given financial and other constraints, one may have to choose between either col- lecting additional sequence data for the taxa already sampled or sampling addi- tional taxa. The benefits of collecting more characters are evident: By definition, con- sistent methods of phylogeny estimation will converge on the correct answer or true tree as the number of characters increases. Increasing taxon sampling can help phy- 3 Current address (and address for correspondence): Department of Biological Sciences, CW-405 Biological Sciences Building, University of Alberta, Edmonton, Canada T6G 2E9. E-mail: [email protected] logeny estimation by reducing long branch effects, but the benefits are less obvious be- cause the number of potential trees, and thus the size of the estimation problem, in- creases geometrically with the number of taxa (Felsenstein, 1978a). Taxon-sampling density has been the subject of much debate in the recent sys- tematics literature, e.g., Systematic Biology, March 1998. Many authors have noted that increased taxon sampling might generally increase the accuracy of estimates of phy- logeny (e.g., Hillis, 1996; but see Kim, 1996). Graybeal (1998) explicitly addressed the is- sue in a simulation study, concluding that if taxa are chosen specifically to break up long branches, increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters. However, there have been few empirical studies on these questions. 202 by guest on April 20, 2014 http://sysbio.oxfordjournals.org/ Downloaded from

Transcript of More Taxa or More Characters Revisited: Combining Data from Nuclear Protein-Encoding Genes for...

Syst. Biol. 49(2):202-224,2000

More Taxa or More Characters Revisited: Combining Data from NuclearProtein-Encoding Genes for Phylogenetic Analyses of Noctuoidea

(Insecta: Lepidoptera)

ANDREW MITCHELL,1'3 CHARLES MITTER,1 AND JEROME C. REGIER2

^Department of Entomology, University of Maryland, College Park, Maryland 20742, USA2Center for Agricultural Biotechnology, University of Maryland Biotechnology Institute,

College Park, Maryland 20742, USA

Abstract.—A central question concerning data collection strategy for molecular phylogenies hasbeen, is it better to increase the number of characters or the number of taxa sampled to improve therobustness of a phylogeny estimate? A recent simulation study concluded that increasing the num-ber of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are cho-sen specifically to break up long branches. We explore this hypothesis by using empirical data fromnoctuoid moths, one of the largest superfamilies of insects. Separate studies of two nuclear genes,elongation factor-la (EF-la) and dopa decarboxylase (DDC), have yielded similar gene trees andhigh concordance with morphological groupings for 49 exemplar species. However, support levelswere quite low for nodes deeper than the subfamily level. We tested the effects on phylogenetic sig-nal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data fromthe two genes in a single analysis. Surprisingly, the increased taxon sampling, although designed tobreak up long branches, generated greater disagreement between the two gene data sets and de-creased support levels for deeper nodes. We appear to have inadvertently introduced new longbranches, and breaking these up may require a yet larger taxon sample. Sampling additional char-acters (combining data) greatly increased the phylogenetic signal. To contrast the potential effect ofcombining data from independent genes with collection of the same total number of charactersfrom a single gene, we simulated the latter by bootstrap augmentation of the single-gene data sets.Support levels for combined data were at least as high as those for the bootstrap-augmented dataset for DDC and were much higher than those for the augmented EF-la data set. This supports theview that in obtaining additional sequence data to solve a refractory systematic problem, it is pru-dent to take them from an independent gene. [Combining data; independent genes; Noctuidae;Noctuoidea; taxon sampling.]

Quantifying empirical support for a phy-logeny is now the norm in systematics, andweak support is typically regarded as aproblem in need of solving. The solutionmost often proposed, particularly for mole-cular studies, is to collect more data. How-ever, given financial and other constraints,one may have to choose between either col-lecting additional sequence data for thetaxa already sampled or sampling addi-tional taxa. The benefits of collecting morecharacters are evident: By definition, con-sistent methods of phylogeny estimationwill converge on the correct answer or truetree as the number of characters increases.Increasing taxon sampling can help phy-

3Current address (and address for correspondence):Department of Biological Sciences, CW-405 BiologicalSciences Building, University of Alberta, Edmonton,Canada T6G 2E9. E-mail: [email protected]

logeny estimation by reducing long brancheffects, but the benefits are less obvious be-cause the number of potential trees, andthus the size of the estimation problem, in-creases geometrically with the number oftaxa (Felsenstein, 1978a).

Taxon-sampling density has been thesubject of much debate in the recent sys-tematics literature, e.g., Systematic Biology,March 1998. Many authors have noted thatincreased taxon sampling might generallyincrease the accuracy of estimates of phy-logeny (e.g., Hillis, 1996; but see Kim, 1996).Graybeal (1998) explicitly addressed the is-sue in a simulation study, concluding that iftaxa are chosen specifically to break uplong branches, increasing the number oftaxa sampled is preferable to increasing thenumber of nucleotide characters. However,there have been few empirical studies onthese questions.

202

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 203

We present a case study of noctuoidmoths (Insecta: Lepidoptera) in the contextof this debate. Given previously low levelsof support for the hypotheses of interest, inthis case the deeper nodes in the tree, weask whether it is better to increase the num-ber of characters or the number of taxa sam-pled to improve the overall robustness ofthe phylogeny? We also expand the ques-tion to contrast the possibility of collectingadditional characters from the same source,as in the simulations of Graybeal (1998), tothat of obtaining such characters from a sec-ond, independent gene. With the recent de-velopment of several nuclear protein-en-coding genes for use in systematics (Sladeet al., 1994; Cho et al., 1995; Gupta, 1995;Waters, 1995; Friedlander et al., 1996; Ortiand Meyer, 1996; Fang et al., 1997; Regierand Shultz, 1997; Galloway et al., 1998), andthe promise of more to come (e.g., Friedlan-der et al., 1992; Graybeal, 1994), there isnow greater potential for utilizing indepen-dent sources of nucleotide characters in asingle analysis.

We investigated these questions as partof an ongoing systematic study of the Noc-tuoidea, one of the largest superfamilies ofinsects (>45,000 species; Scoble, 1992). Sep-arate studies of two nuclear genes, elonga-tion factor-la (EF-la) and dopa decarboxy-lase (DDC), have shown each to carrymuch information about noctuoid relation-ships (Mitchell et al., 1997; Fang et al.,1999). The two genes yield very similartrees for overlapping sets of 49 exemplarspecies and show almost complete concor-dance with groupings that have beenstrongly supported by earlier morphologi-cal evidence. However, in each gene tree,support was weak for the deeper nodes,which represent relationships above thesubfamily level.

Seeking a more robust phylogeny esti-mate, in this study we investigate the effecton phylogenetic signal of (1) increasing thetaxon sampling by nearly 60%, to 77species, and (2) combining data from thetwo genes in a single analysis. To contrastthe effect of combining data from differentsources to collection of the same total num-ber of characters from a single gene, wesimulate the latter by bootstrap augmenta-

tion of the single-gene data sets. The resultsconfirm that adding data from a secondgene can yield greater benefit for phy-logeny reconstruction than obtaining morecharacters from a single gene. Somewhatsurprisingly, however, we find that our in-creased taxon sampling, even though de-signed to break up long branches, producedmarkedly greater incongruence betweengenes and, if anything, decreased support fordeeper nodes.

Phylogenetic Framework:Current Understanding of

Noctuoid Relationships

Phylogenetic relationships within Noc-tuoidea have been problematic (Kitching,1984), but recent work has identified sev-eral probable monophyletic groups, forwhich Poole (1995) and Kitching and Rawl-ins (1999) cite at least one synapomorphy.We will refer to these as "concordancegroups" (Mitchell et al., 1997). Althoughthese groupings are not beyond all doubt,they are surely close approximations; wetherefore used recovery of concordancegroups as one gauge of phylogenetic utilityfor EF-la, DDC, and their combination. Wesampled multiple representatives from 24such groups, 21 of which are indicated inTable 1 and in the figures. The remainingthree concordance groups are (Euteliinae +Stictopterinae), Spodoptera, and (S. frugi-perda + S. ornithogalli); the monophyly ofthe latter two groups was confirmed by amorphological study in progress (M. Pogue,pers. comm.).

Of the four traditional noctuoid families,Notodontidae s.l. is widely agreed to be sis-ter group to the others, i.e., Noctuidae, Arc-tiidae, and Lymantriidae (Miller, 1991;Kitching and Rawlins, 1999). Arctiidae andLymantriidae are very likely monophyleticgroups (Kitching and Rawlins, 1999). Incontrast, unambiguous synapomorphiesfor Noctuidae are lacking (see Speidel et al.,1996, vs. Kitching and Rawlins, 1999).Within Noctuidae, recent reviews supporttwo groups, trifines and quadrifines. Tri-fines appear to be monophyletic (Poole,1995; Speidel et al., 1996; Kitching andRawlins, 1999), but quadrifines probably

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

204 SYSTEMATIC BIOLOGY VOL. 49

Higher taxaa

Notodontidaed

Notodontinae

Phalerinae

Nystaleinae

Heterocampinae

Arctiidaed

Lithosiinae

Arctiinaed

Arctiini

Ctenuchini

Lymantriidaed

Lymantriini

Orgyiini

"Quadrifine" Noctuidae

Aganainae

Herminiinaed

Catocalinae

Calpinae

Hypeninae

Euteliinae

Stictopterinaed

Nolinaed

Nolini

Sarrothripini

"Trifine" Noctuidaed

Acontiinaed

Eustrotiinae

Plusiinaed

TABLE 1. Species of noctuids and outgroups sampled.

Exemplars'1

Furcula cinerea

Gluphisia septentrionis

Datana perspicua

Symmerista albifrons

Nerice bidentata

Hypoprepia miniata

Estigmene acrea

Hyphantria cunea

Cisseps fulvicollis

Lymantria dispar

Dasychira obliquata

Orgyia leucostigma

Asota caricae

Palthis angulalis

Idia aemula

Catocala ultronia

Caenurgina crassiuscula

Hypsoropha monilis

Hypena scabra

Paectes pygmaea

Odontodes aleuca

Lophoptera sp.

Meganola sp.

Baileya levitans

Tarachidia candefacta

Spragueia leo

Thioptera nigrofimbria

Lithacodia musta

Anagrapha falcifera

Chrysanympha formosa

Trichoplusia ni

Abbrev.

Fci*

Gseps

Dpee-s

Sale-g

Nbide'g

Hmie'g

Eace'S

Hcune

Cfu

Ldie'S

Dobe<g

Oleu

Asot

Pane

Iaes

Cule-s

Cere

Hyps

Pscs

Ppye.g

Oale

Loph

Mmie'g

Bleve

Tcae-s

Sleoe

Thne<g

Lmsg

Afae-g

Cfor

Tnie'g

GenBank accession no.c

EF-la

U85665

AF151603f

U85666

U85667

AF151604

U85669

U85670

U85671

AF151606f

U85672

U85673

AF151605f

AF151607'

U85678

AF151608f

U85677

AF151613f

AF151612f

AF151609f

U85674

AF151611f

AF151610f

U85675

U85676

U85681

U85680

U85679

AF151614f

U85686

AF151615f

U20140

DDC

AF151539f

AF151542

AF151540

AF151541

AF151543

AF151547

AF151549

AF151550f

AF151548f

AF151544

AF151545

AF151546'

AF151551f

AF151552

AF151553f

AF151561

AF151562

AF151560

AF151554

AF151555

AF151557f

AF151556f

AF151558

AF151559f

AF151565

AF151566'

AF151564

AF151563

AF151567

AF151568f

U71401

are not. Kitching and Rawlins, for example,excluded the quadrifine subfamily Nolinaefrom Noctuidae, regarding it as related toLymantriidae and Arctiidae. Three recentmolecular studies, using 28S ribosomalRNA and the mitochondrial gene ND1

(Weller et al, 1994), EF-la (Mitchell et al.,1997), and DDC (Fang et al., 2000), similarlysuggest that some quadrifines are related tolymantriids and arctiids, but in none of thestudies were these groupings strongly sup-ported. Regarding trifines, Poole (1995)

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 205

Higher taxaa

Heliothinaed

Stiriinaed

Oncocnemidinaed

CuculliinaePsaphidinaed

AmphipyrinaePantheinae

RaphiinaeAcronictinae

Condicinaed

Condicini

Leuconyctini

Agaristinaed

EriopinaeNoctuinae s.l.dh

"Caradrinini"Prodeniini

Dypterygiini

TABLE 1. Continued

Exemplars'5

Pyrrhia exprimens

Eutricopis nexilis

Schinia arcigera

Adisura bella

Heliocheilus albipunctella

Heliothis (Masalia) terracottoides

Heliothis virescens

Helicoverpa zea

Basilodes chrysopis

Plagiomimicus olvello

Oncocnemis obscurata

Catabena lineolata

Cucullia convexipennis

Psaphida resumens

Feralia major

Triocnemis saporis

Amphipyra pyramidoides

Panthea furcilia

Charadra deridens

Raphia abrupta

Acronicta sp. 1

Acronicta sp. 2

Polygrammate hebraeicum

Condica videns

Homophoberia sp.

Leuconycta diptheroides

Eudryas grata

Psychomorpha epimenis

Callopistria mollissima

Spodoptera frugiperda

Spodoptera exigua

Spodoptera ornithogalli

Elaphria grata

Galgula partita

Anorthodes tarda

Nedra ramulosa

Abbrev.

Pexe-g

Enee'S

Sare-g

Abee'g

Hal"*MtesHvisHzee-g

Bche-g

Polie<s

Oob<=

Clin

Ccone

Prese'S

Fmae<g

Tsap

Apyre'g

Pfur*

Cdere

Rabre

AsplgAsp2e

Pheeg

Cvie-g

HomoLdipEgr^g

Pepie-g

Cmol

Sfre-g

Sexs

SorElge'g

GpaAtar

Nrae-s

GenBank accession no.c

EF-lo

U20137

U20126

U20138

U20123

U20127

AF151631f

U20135

U20136

U20125

AF151620f

U85685

AF151621'

U85694

U85695

U85696

AF151619'

U85693

U85684

U85683

U85689

AF151618f

U85687

U85688

U85682

AF151616f

AF151617'

U85690

U85691

AF151622'

U20139

AF151624'

AF151623f

U85697

AF151626f

AF151625'

U85698

DDC

U71430

U71410

U71431

U71407

U71413

U71427

U71428

U71429

U71405

U71406

AF151585f

AF151586f

AF151584f

AF151581

AF151579

AF151580f

AF151578

AF151572'

AF151573f

AF151574'

AF151575

AF151576f

AF151577

AF151569

AF151570f

AF151571'

AF151582

AF151583

AF151587f

U71403

U71404

AF151588f

AF151589

AF151591'

AF151590f

AF151592

used a broad definition of the group, andcombined the true cutworms of Lafontaine(1993) into an expanded subfamily Noctu-inae s.L, containing most of the trifinespecies. Kitching and Rawlins (1999) useda more restricted definition of trifines, ex-

cluding Acronictinae and associated taxa aswell as Pantheinae from this group. Onegoal of the present study was to further testthese hypotheses of noctuid nonmono-phyly and of the boundaries of the trifineclade.

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

206 SYSTEMATIC BIOLOGY VOL. 49

Higher taxaa

HadeniniHadenina

EriopyginaApameinid

Xyleninid

Noctuini s.l.d

NoctuinaAgrotina

Aniclina

TABLE 1.

Exemplars'3

Lacinipolia renigera

Pseudaletia unipuncta

Orthodes crenulata

Apamea ampntatrix

Papaipema sp.

Anathix ralla

Lithophane hemina

Abagrotis alternata

Agrotis ipsilon

Anicla infecta

Continued

Abbrev.

Lree-g

Pung

Ocr*.g

AampPappArale-g

Lheme-g

Aalt

Aipe-g

Ainfrs

GenBank accession no.c

EF-la

U85700

AF151627f

U85699

AF151629f

AF151628f

U85702

U85701

AF151630f

U85704

U85703

DDC

AF151594

AF151595

AF151593

AF151597f

AF151596f

AF151598

AF151599

AF151601f

AF151600AF151602

"Nomenclature follows Kitching and Rawlins (1999) and Poole (1995); where these classifications were in conflict, we used themore finely split alternative.

bAll species were collected in the USA, except as follows: Asota caricae, Odontodes aleuca, and Lophoptera sp. from Thailand;Heliocheilus albipunctella and Heliothis (Masalia) terracottoides from Mali.

cThe entire alignment is available from the Society of Systematic Biologists web site (http://www.utexas.edu/admin/systbiol/).dConcordance groups, supported by morphological synapomorphies. The following are also concordance groups: Euteliinae +

Stictopterinae, Cuculliinae + Oncocnemidinae, Spodoptera, and S.frugiperda + S. ornithogalli.cSpecies included in reduced data set of 49 taxa, closely matching the taxon sampling of Mitchell et al. (1997).fGene sequences new to this study.BSpecies included in reduced data set of 49 taxa, closely matching the taxon sampling of Fang et al. (2000).hAlthough there is strong evidence for the monophyly of Noctuinae s.l., its limits are poorly defined; "Caradrinini" is placed

here tentatively (Poole, 1995).

MATERIALS AND METHODS

Species Sampled

Table 1 lists the species sampled andGenBank accession numbers of their EF-laand DDC partial sequences used in thisstudy, arranged according to recent propos-als on noctuoid classification. The new datacomprise 56 gene sequences. The entirealignment is also available from the Societyof Systematic Biologists web site (http://www.utexas.edu/admin/systbiol/).

Our earlier studies of Noctuoidea, usingEF-la (Mitchell et al., 1997) and DDC (Fanget al., 2000), sampled 49 exemplar specieseach, only 35 of which were shared betweendata sets. Because each data set was sensi-tive to taxon-sampling effects, we decidednot to attempt combining data for analysisuntil we had determined both gene se-quences for most species. This goal wasachieved for 60 species, after which se-quence data for both genes were obtainedfrom an additional 17 species, chosen tobreak up long branches present in treesfrom our earlier analyses. Seven of these

new species were from previously unsam-pled clades within Noctuoidea (Ctenuchini,Apameini, Eriopinae, and the Old Worldsubfamilies Aganainae and Stictopterinae),seven were from previously underrepre-sented clades (Lymantriidae, Plusiinae, On-cocnemidinae, Psaphidinae, Condicinae,and "Caradrinini") and the last three werespecies of Heliothinae, as reported in Choet al. (1995) and Fang et al. (1997). All told,our sample includes all four traditionalfamilies; 23 of the 31 subfamilies of Noctu-idae recognized by Kitching and Rawlins(1999), including the two noctuid subfami-lies which they raise to family rank; and 27of the 45 tribes recognized (both explicitlyand implicitly) within Noctuidae.

Laboratory Protocols

Specimens were either live-frozen at-80°C or collected directly into 100%ethanol at ambient temperature in the fieldand subsequently stored at temperatures ascold as -20°C for as long as 6 months be-fore being cooled to -80°C.

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 207

TABLE 2. Sequences of DDC primers used in this study. International Union of Biochemistry (1986) codes areused to indicate degeneracy. All primers included an M13 sequence at the 5' end to facilitate automated sequenc-ing. Primer names ending in F identify forward primers, which bind to the antisense strand of DNA; primernames ending in RC identify reverse complement primers.

Primer Sequences (5'-3') Position8

1 .7sF

1.9sF

3 . 2 s F

3.3sRC

4sRC

GCH TGY ATY GGN TTY WCN TGG AT

ATG HTN GAY TGG YTV GGY CAR ATG

TGG YTN CAY GTN GAY GCN GCN TAY GC

CCA YTT RTG NGG RTT RAA RTT RAA

GG DAT YTG CCA RTG HCK RTA RTC

471-493

528-551

975-1000

1065-1088

1203-1225

4\Iucleoride position relative to the DDC sequence of Manduca sexta (GenBank accession no. U03909).

Wings and abdomens were removedfrom specimens and vouchered. Genomicnucleic acids were extracted from the tho-rax (head + thorax for smaller specimens)by using a commercially available DNA/RNA isolation kit (Amersham Corp., Ar-lington Heights, IL). Polymerase chain reac-tion (PCR) primer sequences for EF-la werethose from Cho et al. (1995). DDC primersincorporated slightly more degeneracythan those of Fang et al. (1997) and arelisted in Table 2. An additional DDC primer(3.3sRC) was also made. The 3.3sRC primerbinds downstream from 3.2sF and allowsamplification of two overlapping PCR frag-ments (1.7sF/3.3sRC or 1.9sF/3.3sRC and3.2sF/4sRC). PCR amplification protocolsfor EF-la and DDC were unchanged fromMitchell et al. (1997) and Fang et al. (1997),respectively. Note that reverse transcription(RT)-PCR was used to avoid introns inDDC. To obtain purer templates for se-quencing, we always used a nested PCRapproach; that is, a first amplification usedexternal primers, and reamplifications uti-lized nested internal primers. Sequenceswere generated with an Applied Biosys-tems 373A Stretch DNA sequencer usingTaq polymerase and dye-primer. Templateswere sequenced in both directions.

Data Analysis

We used the XDAP program within thesoftware package STADEN (Staden, 1992)to check chromatograms for accuracy ofbase-calling and to assemble contiguousDNA fragments. Sequences were alignedmanually by using the Genetic Data Envi-ronment software package (GDE 2.2; Smith

et al., 1994). No gaps were needed for align-ment.

Nucleotide sequences for each gene werefirst analyzed separately for the 77-taxadata sets, and then sequences for the twogenes were concatenated for the combineddata analyses. To facilitate direct compar-isons between our current data set with 77taxa and those of our earlier studies with 49taxa each (Mitchell et al., 1997; Fang et al.,1999), we also analyzed reduced data setswith 49 taxa each. Taxa included in thesesmaller data sets are indicated in Table 1and were selected to match the taxon sam-pling of the earlier studies as closely aspossible. Exact matches were not possiblebecause DDC sequences could not be ob-tained for some of the species used in theearlier EF-la study, and vice versa.

To test the hypothesis that the increasedsupport levels seen in the combined dataset were a direct consequence of combiningdata from independent genes, as opposedto simply increasing the total number of in-formative characters, we made additionaldata sets as follows. Seqboot, part of thePhylip package (Felsenstein, 1995), wasused to generate bootstrap pseudodata setsfrom the original data for each gene, sepa-rately. For EF-la, the first 709 bp from abootstrap pseudodata set were added to the1,240 bp of the original data set to get 1,949bp. For DDC, 1,240 bp from three separatebootstrap pseudodata sets were added tothe 709 bp of the original data set to get1,949 bp. To account for the considerablerandom variation expected among boot-strap pseudodata sets (Hillis and Bull,1993), we repeated this procedure twice.Thus there were three bootstrap-aug-

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

208 SYSTEMATIC BIOLOGY VOL. 49

mented data sets for either gene, each equalin size to the combined data set.

Parsimony analysis.—Calculation of un-corrected pairwise divergences and allmaximum parsimony (MP) analyses wereperformed by using test versions ofPAUP*4.0 (D. Swofford, pers. comm.). MPanalyses consisted of heuristic searcheswith 1,000 random addition sequences andtree bisection-reconnection (TBR) branchswapping, unless stated otherwise. InitialMP analyses used equal weights for alltaxa. Subsequent analyses used six-parame-ter parsimony step matrices, calculated byusing a spreadsheet described by Cunning-ham (1997a) and supplied to us by the au-thor. In these analyses, transformationswere weighted by the negative of the nat-ural logarithm of their frequencies (our "In-weighting").

Bootstrap percentages (BPs) were basedon 1,000 bootstrap replicates with three ran-dom addition sequences per replicate, foreach gene separately and for combineddata, using equal weighting for all charac-ters. Each bootstrap analysis performed un-der In-weighting was restricted to 200 repli-cates, because the use of step matricesincreased computer run time approxi-mately fivefold.

Partitions.—Before combining the single-gene data sets in an effort to increase phy-logenetic resolution, we considered how,and indeed whether, a combined analysisshould be performed. Proponents of condi-tional combination argue that the degree ofincongruence among data partitions shoulddetermine whether independent data setscan be combined (e.g., Swofford, 1991; Bullet al, 1993; Huelsenbeck et al., 1996). Theincongruence length difference (ILD) test(Farris et al., 1994) appears to be the mostappropriate method currently available forthis purpose, but it may be too conservativea test (Sullivan, 1996; Cunningham, 1997b).Thus, adherence to conventional signifi-cance levels under this test could mean notcombining some independent data sets,even when doing so would yield a more ro-bust and accurate phylogeny. To investigatethis possibility, we carried out a combinedanalysis for the 77-taxa data set despite asignificant incongruence test result.

The ILD test was performed inPAUP*4.0b2 (D. Swofford, 1999). As sug-

gested by Cunningham (1997a), invariantcharacters were excluded before the testswere performed. Details of parametersused for each test are given under Results.All ILD tests used 200 replicates with ran-dom addition sequences for taxa and TBRbranch swapping. Tests performed on thereduced 49-taxa data sets used 10 randomaddition sequences, whereas tests per-formed on the 77-taxa data sets used threerandom addition sequences per replicate.To determine the contributions of individ-ual taxa to overall significance of ILD testresults, we excluded taxa for which place-ment varied markedly between genes orwithin a gene among methods and re-peated the tests.

Distance analyses.—Some data partitionsshowed significant heterogeneity of basefrequencies among taxa. Nonstationarity ofbase frequencies can cause inconsistency ofsome phylogenetic methods, but LogDetdistances (Lockart et al., 1994) are robust tononstationarity (Swofford et al., 1996). Aninitial minimum evolution (ME; Rzhetskyand Nei, 1992) analysis using LogDet dis-tances with invariable sites was performedfor comparison with ME trees derived un-der the preferred maximum likelihood(ML) model (see below). A ME bootstrapanalysis also, was performed, with the MLdistances estimated by using the samemodel and parameters as the ML analyses(below). Five hundred bootstrap replicateswere performed, each of which used fiverandom-addition sequences for taxa andTBR branch swapping. Complete-and-par-tial bootstrap analyses were carried out byusing the software "njbootli" (Zharkikhand Li, 1995), which uses a synonymousversus nonsynonymous rate model, basedon Li (1993). This technique corrects boot-strap values based on the number of taxa ina data set, facilitating comparison of sup-port levels among data sets with differenttaxon samples.

Comparison of support levels.—Supportlevels were compared in two ways. First,the overall frequency of strong support wasused as a measure of resolution strengthper se, regardless of which groups thesewere. Second, clade-by-clade comparisonsof support levels were made, in which wealso kept track of concordance groups. Thisprovides a combined measure of support

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 209

and accuracy. For comparisons between 49-taxa and 77-taxa data sets, we consideredonly those groups represented by multipletaxa in both data sets. We used sign tests, asdescribed by Zar (1984), to evaluate the sta-tistical significance of differences in BPs be-tween data sets; only the direction (sign) ofdifferences between members of a pair isused in this test, not the actual difference.Nodes that broke up concordance groupsand therefore were expected to have lowsupport levels were treated differently fromother groups; that is, the sign was reversedbefore performing the test. Differences ofzero are not considered in sign tests, so n isthe number of differences having a sign.

Maximum likelihood analyses.—ML analy-ses were performed on the 77-taxa com-bined data set by using PAUP*4.0b2 (Swof-ford, 1999). As suggested by Frati et al.(1997), 16 models (four substitution modelspermuted with four rate-distribution mod-els) were evaluated on the most-parsimo-nious (mp) trees derived under In-weight-ing. Likelihood ratio tests are used todetermine which models have the signifi-cantly higher likelihood scores. The bestmodel is then chosen from the set of modelswith the best score, preference being givento the model with the fewest free parame-ters. This model was selected for furtheranalyses, with all parameters estimatedonce from the data on the mp trees, andfixed subsequently. The ML heuristic searchstrategy entailed (1) TBR branch swappingto completion on user-input trees, includ-ing the mp trees obtained under equal andIn-weighting, and ME trees obtained underthe preferred ML model and the LogDetmodel; (2) 15 random addition sequencereplicates for taxa with nearest neighbor in-terchange (NNI) branch swapping to com-pletion; (3) 10 random addition sequencereplicates for taxa with TBR branch swap-ping, with only a single replicate swappingto completion because of time constraints.The Kishino-Hasegawa test (Kishino andHasegawa, 1989) was used to chooseamong ML trees.

RESULTS

Third-codon position base compositionwas significantly heterogeneous amongtaxa for both EF-la and DDC (x2-tests, P =

0.013 and 0.000, respectively). For EF-la,exclusion of Arctiidae and Lymantriidaefrom the data set resulted in homogeneityof base composition (P = 1). For DDC wehad to exclude all or part of Notodontidae,Arctiidae, Lymantriidae, and quadrifineNoctuidae before nonsignificance wasachieved. However, the LogDet-model MEtrees recovered the same concordancegroups as the In-weighted mp tree and dif-fered from the ML-model ME trees only inminor rearrangements among adjacentnodes.

ILD tests performed on the two com-bined-gene 49-taxa data sets yielded non-significant results under both equal weight-ing (P = 0.450 and 0.500) and In-weighting(P = 0.210 and 0.570).

For the 77-taxa data set results of ILDtests are summarized in Table 3. Underequal weighting, an initial ILD test sug-gested statistically significant incongruencebetween the two genes (P = 0.005, the mini-mum value possible for 200 test replicates).Incongruence became less significant on ex-clusion of certain relatively long-branchtaxa (P = 0.035). Under In-weighting, ILDtest P-values were usually higher, and amarginally significant test result was ob-tained by excluding eight taxa (P = 0.040).

Maximum Parsimony Analyses, 77 Taxa

Table 4 summarizes the more notable dif-ferences among the mp trees recovered byeach gene and by the combined data set un-der both equal weighting and six-parame-ter parsimony In-weighting. The number ofconcordance groups recovered by data par-titions under different weighting schemesis shown in Table 5. Summary statistics forvarious data partitions plotted onto one ofthe mp trees are shown in Table 6.

EF-la.—Uncorrected pairwise diver-gence values (total nucleotides) for EF-laranged from 1.0% (within Spodoptera) to10.7% (Aganainae + Herminiinae vs. Arcti-idae + Lymantriidae). EF-la divergencevalues were greater within the Arctiidae +Lymantriidae clade because of differencesin third-codon position base composition(Mitchell et al., 1997). Equally weighted MPanalysis of EF-la alone resulted in 25 mptrees of 2,736 steps each, with the retentionindex (RI) = 0.443. The strict consensus tree

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

210 SYSTEMATIC BIOLOGY VOL. 49

TABLE 3. Incongruence length difference tests on 77-taxa data set.

Taxa excluded11

None

Hmi

Blev

Hmi, Blev

Hmi, Blev, Gsep

Nolinae

Nolinae, Hmi, Gsep

Acontiinae

Plusiinae

Plusiinae, Hmi

Plusiinae, Hmi, GsepPlusiinae, Acontiinae, Hmi

Nolinae, Acontiinae

Nolinae, Plusiinae

Acontiinae, Plusiinae

Notodontidae

Acontiinae, Plusiinae, Gsep, Hmi, Blev

No. of taxa excluded

0

1

1

2

3

2

4

2

3

4

56

4

55

5

8

P-values

Equal weight

0.005

0.005

0.015

0.035

0.010

0.035

0.005

0.010

0.005

0.035

0.025

0.015

0.005

0.035

0.010

0.005

0.015

ln-weight

0.010

0.035

0.005

0.020

0.020

0.020

0.010

0.005

0.010

0.040

0.040

0.020

0.005

0.015

0.005

0.005

0.040

"Abbreviations of species names are given in Table 1.

had only 41 resolved nodes out of a possi-ble 75. Under In-weighting a single mp treewas found, with a length of 3,869 steps andRI = 0.458 (Fig. 1, left).

DDC.—Pairwise divergence values forDDC ranged from 2.8% (within Xylenini) to26.7% (within Nolinae). Equally weightedMP analysis of DDC alone resulted in fourmp trees of 4,661 steps each, with RI =0.456. The strict consensus tree had 66 re-solved nodes, many more than EF-la(equally weighted), the conflict being pri-marily among the clades of higher trifine

noctuids. The In-weighting scheme gavetwo mp trees with lengths of 7,255 stepsand RI = 0.474. The strict consensus of thesetrees had a 12-branch polytomy at the baseof the trifine clade. This polytomy was pri-marily the result of alternative placementsof Acontiinae, either at the base of thetrifine noctuid clade or as a highly derivedtrifine group. The similarity between thetwo trees is thus better illustrated by anAdams consensus tree (Fig. 1, right).

Combined data.—Equally weighted MPanalysis of the combined data set gave four

TABLE 4. Summary of most notable differences among trees inferred from equally weighted versus In-weighted parsimony analyses.

Relationships recovered in mp trees

Ingroup monophyletic?

Arctiidae monophyletic?

(Arctiidae + Lymantriidae) monophyletic?

Nolinae placed with other quadrifines?

Plusiinae placed within trifine Noctuidae?

Acontiinae placed within trifine Noctuidae?

Acontiinae basal within trifine Noctuidae?

Eq., equal weighting; In, six-parameter parsimony, logarithmic weighting; Y/N, yes for some trees and no for others; N/A, notapplicable.

EF-la only

Eq.

N

N

Y/N

Y/N

N

Y/N

N

In

Y

N

Y

Y

Y

Y

Y

DDC only

Eq.

Y/N

Y

N

Y

N

N

N/A

In

Y

Y

N

Y

N

Y

Y/N

Combined data

Eq.

Y

Y/N

Y/N

NN

Y

N

In

Y

Y

Y

Y

N

Y

Y

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 211

TABLE 5. Recovery of concordance groups by 77-taxa data sets under different methods.

Data sets MPeq. wt.a MPlnwt.3 MLa

Combined 21 21 21

EF-la 13 17

DDC 19 20

EF-la BA1>> 13 -

EF-la BA2*> 12 -

EF-la BA3b 15 -

DDCBAlb 20 -

DDCBA2" 19 -

DDCBA3" 18 -aMaximum number possible = 24; dashes indicate analysis not performed.bBootstrap-augmented data sets with 1,949 bp from single genes.

mp trees, the strict consensus of which had al., 1985) +1 + F, based on the likelihood ra-71 resolved nodes. Three mp trees were re- tio test' with 4 df (Aln L = 54.07, P < 0.001).covered under In-weighting, with lengths Table 7 shows the estimated parametersof 11,259 steps and RI = 0.461 (Fig. 2). of the GTR substitution model with rate

heterogeneity, for different partitions of thedata, based on the mp tree (Fig. 2). EF-la

Maximum Likelihood Analyses, h a d _ L 5 t i m e s a s m a n y inVariable sites as77Taxa DDC and showed somewhat greater rate

The best-fitting likelihood model for the heterogeneity, even when the proportion ofIn-weighted mp tree was the most general invariable sites in each gene was accountedand parameter-rich model, the general time for. Site-specific rate models (not tabulated)reversible model (GTR; Lanave et al, 1984; yielded the following rate estimates: theRodriguez et al., 1990), with invariant sites partition EF-la:DDC gave relative rates ofand gamma-distributed rates, that is, GTR ~ 1:3.3, and the partition first:second:third+ I + F. This model had a score of -In L = codon position gave relative rates of 4:1:8133,226.45 (where L refers to likelihood), and 2.7:1:22 for EF-la and DDC, respectively,which proved significantly better than the The best tree recovered for the combinedsecond-best model, HKY85 (Hasegawa et data set under the ML criterion (Fig. 3) had

TABLE 6. Summary statistics for data partitions mapped onto one of four mp trees for the combined data set,under equal weighting.

Combined

EF-la

All sites

ntl

nt2

nt3

DDC

All sites

ntl

nt2

nt3

No.characters

1,949

1,240

413

413

414

709

236

236

237

No. (%)variable

803 (41)

399 (32)

35(8)

18(4)

_ 346 (84)

404 (57)

110 (47)

65 (28)

229 (97)

No. (%)inform"

666 (34)

329(27)

15(4)

6(1)

303 (73)

337(48) •

79 (33)

32 (14)

226 (95)

No. stepson tree

7,485

2,792

145

37

2,610

4,693

609

237

3,847

CI

0.168

0.196

0.174

0.259

0.197

0.152

0.206

0.262

0.137

RI

0.443

0.428

0.450

0.500

0.426

0.452

0.544

0.570

0.428

CI = consistency index, excluding uninformative characters; RI = retention index.•Parsimony informative characters.

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

212 SYSTEMATIC BIOLOGY VOL. 49

INOTODONTIDAE:Plusiinae*:

Nolinae* :

EuteliinaeStictopterinae* •.

HypeninaeEustrotiinae (part)Calpinae

CatocalinaeHypeninaeAganainae

Herminiinae*ARCTIIDAE*

AganainaeCalpinae

LYMANTRIIDAE*Acontiinae*

Condicinae*

Spodoptera*Stiriinae*

Amphipyrinae s.s.Psaphidinae*

Cuculliinae s.s.Eustrotiinae

Oncocnemidinae*Agaristinae*

Plusiinae*

"Raphiinae"

AcronictinaePantheinaeEriopinae

. ElaphriaCondicinaeT

Oncocnemidinae*

Heliothinae*

GalgulaElaphria'podoptera*

Cuculliinae s.s.Agaristinae*

AnorthodesNoctuini s.L*

Hadeninae(part)Apameini*Xylenini*

HadeninaeNedra

FIGURE 1. Comparison of mp trees derived under In-weighting for EF-la and DDC gene sequences. For DDC,tree shown is an Adams consensus of 2 mp trees. Thick branches have >70% bootstrap support. The genera of"Caradrinini" are bold. Asterisks indicate concordance groups. Four additional concordance groups are Euteli-inae + Stictopterinae, Cuculliinae + Oncocnemidinae, Spodoptera (Sex + Sfr + Sor), and S. frugiperda + S. orni-thogalli (Sfr + Sor).

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 213

64-88 Jfii GsepFci NotodontinaeNbid HeterocampinaeDpe Phalerinae

Nystaleinae

Plusiinae* ("trifine" Noctuidae)

"Quadrifine"Noctuidae

Lymantriidae*

Arctiidae*

HypemnaeCalpinaeCatocalinae s.s.

Asot Aganainae

Herminiinae*LymantriiniLdi

Dob . . .Oleu | OrgynniHmi LithosiinaeCfuEac | Arctiinae*HcunHomoCvi I Condicinae*Ldip

Cder ' P a n t h e i n a e

i f™ I Eustrotiinae

Tea

OobClinCcon Cuculliinae s.s.

Pepi |Agaristinae*Rabr RaphiinaePheAsp1 | AcronictinaeAsp2P°Ji IStiriinaeBenApyr Amphipyrinae s.s.Fma

96-21 r-4^- Hvi' M Hze

NraPunOcrLre

Sp|APa m e i n i

Aral ! „ , . ..Lhemlxy|enini

Notodontidae*(OUTGROUP)

Trifine"Noctuidae*

Noctuinae s.l.*

FIGURE 2. One of three mp trees for the combined data set, under In-weighting. The single branch that col-lapses in the strict consensus tree is indicated by a thick line. Numbers above branches are bootstrap percentages(shown only if s50%), followed by branch lengths under ACCTRAN optimization. Asterisks indicate concor-dance groups. Four additional concordance groups are as detailed in Figure 1. Taxa included in the Noctuinae s.l.by Poole (1995) but not strongly allied with that group by our data are indicated by a dotted line.

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

214 SYSTEMATIC BIOLOGY

TABLE 7. Parameters for the general time reversible (GTR) substitutionmated by maximum likelihood on the mp tree derived under In-weighting.

Parameters EF-la

model with

DDC

VOL. 49

rate heterogeneity, esti-

Combined

F-distr. shape parameter (a)a

Proportion of invariable sites (4>)a

a estimated with 4> = 0Relative substitution rate parameters6

A-CA-GA-TC-GC-TG-T

0.820.640.20

2.16311.324.4021.825

20.011

0.940.450.31

1.9397.7412.1601.1407.3271

0.860.580.23

2.0449.0853.0271.302

10.441

aa and <(> estimated simultaneously.bR-matrix values.

a score of - In L = 33,181.26. Nine of the 12random addition sequence replicates usingTBR branch swapping, including the singlereplicate that was allowed to swap to com-pletion, found this tree. The latter replicateused approximately the same time as all 15random addition sequence replicates thatused NNI branch swapping, combined.None of the NNI-swapped replicates foundtrees as good as the TBR-swapped trees, but13 replicates found trees that were not sig-nificantly worse (Kishino-Hasegawa tests,P > 0.05).

The relationships recovered by the bestML tree (Fig. 3) were similar to those of themp tree under In-weighting (Fig. 2). Themost notable difference was in the place-ment of Condicinae, which was the basaltrifine lineage under MP but a much morederived group under ML. Although none ofthe intervening nodes were strongly sup-ported under either MP or ME, all the mptrees had significantly lower likelihoodsthan the ML tree (Kishino-Hasegawa tests,P ^ 0.040). The two methods recovered thesame number of concordance groups, al-though Psaphidinae was recovered by MLonly, and the expected relationships withinSpodoptera were recovered by MP only.

The following rearragements of taxacould be performed under the ML criterionwithout significantly altering tree scores,based on Kishino-Hasegawa tests: Plusi-inae could be moved to the base of thetrifine clade; Cucullia convexipennis (Ccon)

could be placed as sister group to Agaristi-nae, as in the mp trees; Raphia abrupta(Rabr) could be placed as sister group toPantheinae; and the four basal-most cladesof trifines in the ML tree could be united asa monophylextic group.

Taxon Sampling: Effects on Support Levels

EF-la analyses yielded 18 groups withmultiple representatives in both the 49-taxaand 77-taxa data sets (Table 8). Under MP,half of these groups showed <10% differ-ence in BP between data sets and were notconsidered further. Of the remaining nineclades, seven had greater BPs with 49 taxathan with 77 taxa (clades 1, 7, 9, 19, 27, 29,and 33); only two had greater BPs with 77taxa (clades 4 and 8). However, clade 8breaks up a concordance group (Arctiidae),which means that the lower BP obtainedwith the 49-taxa data set is more congruentwith existing phylogenetic evidence. Thus,for EF-la, eight of nine clades have "better"BPs with 49 taxa than with 77 taxa. Thistrend is statistically significant (sign test,0.05 < P < 0.02).

For DDC, 19 BP comparisons were avail-able (note that the DDC and EF-la 49-taxadata sets sampled different taxa). UnderMP, 12 clades showed <10% difference be-tween data sets and were not consideredfurther. Of the remaining seven clades,three had greater BPs with 49 taxa (clades 1,4, and 33) and four had greater BPs with 77

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 215

(t5) 0.13660

EuteliinaeI StictopterinaeHypeninaeCalpinae

IcatocalinaeAganainaeHerminiinae

LdiDob

Oleu

_ Eac

NOTODONTIDAE

Plusiinae ("TRIFINE" NOCTUIDAE)

Blev Nolinae

I Eustrotiinae

AcontiinaeCuculliinae

| Pantheinae

Agaristinae

lOncocnemidinaeRaphiinae

Hcun

"QUADRIFINE"NOCTUIDAE

LYMANTRIIDAE

Hmi

ARCTIIDAE

Acronictinae

StiriinaeAmphipyrinae

Psaphidinae

Condicinae

Heliothinae

AralLhem

"Caradrinini"

Eriopinae"Caradrinini"

Noctuini s.l.

Hadenini

"Caradrinini"Apameini

jXylenini

"TRIFINE"NOCTUIDAE

- Noctuinae s.l.

0.05 substitutions/site

FIGURE 3. Maximum likelihood tree obtained under GTR + I + T model, -In L = 33,181.26. Branches withs70% and 2:90% bootstrap support under ME analysis using the same ML model are drawn 2 X and 3 X thicker,respectively. The seven longest terminal branches and the seven longest internal branches are indicated by "t" or"i," respectively, and their rank order in parentheses above the branch; following the parentheses is the branchlength in number of substitutions per site. The dotted line indicates taxa included in the Noctuinae s.l. by Poole(1995) but not strongly allied with that group by our data.

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

216 SYSTEMATIC BIOLOGY VOL. 49

taxa (clades 2, 9, 19, and 29). The sign testfor BP comparisons was nonsignificant (P >0.5).

The complete-and-partial bootstrap, de-signed to correct BPs for the number of taxaused and implemented for neighbor-join-ing (Zharkikh and Li, 1995) was performednext. For EF-la, all five of the clades thatshowed >10% difference in BP betweendata sets (clades 6, 7, 19, 25, and 27) hadlarger BPs for the 49-taxa data set. Thistrend was not statistically significant be-cause of the small number of comparisonsavailable (n = 5; 0.5 < P < 0.2). For DDC,eight of nine comparisons (clades 1, 2, 15,17, 25, 29, 30, and 33) showed the sametrend (0.05 <P< 0.02).

Combining Data: Effect on Support Levels

Overall support, measured by the per-centage of nodes with parsimony BP ̂50%, was greater for the combined-data mptrees than for either gene alone: Under In-weighting, 48% and 55% of nodes in theEF-la and DDC trees, respectively, had BP5:50%, whereas 65% of the nodes in thecombined-data mp tree are supported atthat level.

The effect of combining data on levels ofsupport for particular groups, under threedifferent methods of analysis, is shown inTable 8 (ignoring the values in parenthe-ses). The five clades that break up concor-dance groups (i.e., clades 3, 8, 35, 36, and37) almost certainly represent incorrect re-lationships. It is therefore better that theseclades receive little support. Indeed, underequally weighted parsimony, these cladeshave substantially lower BPs in the com-bined analysis than the highest value foreither gene analyzed separately. Of the re-maining 32 clades, all but seven havehigher BPs in the combined analysis thanfor each gene alone. For five of the remain-ing seven clades (11,13, 20, 33, and 34), thecombined-data BP is much nearer thehigher than the lower of the two BPs fromeach gene alone. Of the remaining twoclades, one (clade 9) reflects strong conflictbetween the two genes over the placementof the arctiid Hypoprepia miniata, and theother (clade 21) reflects conflict, albeit weaker,over the placement of Raphia abrupta. Thus,30 of 37 clades have better BPs in the com-

bined analysis and only two clades havesubstantially better BP in the analysis of asingle gene. This difference is highly signif-icant (sign test, P < 0.001). If one excludesBP differences of ^10% from consideration,then 16 of 20 clades show the same trend(0.02 > P > 0.01). The same trends were seenunder In-weighted parsimony and distancemethods.

The BPs for each of the above 37 cladeswere also tabulated separately for the com-bined data set and each of the six bootstrap-augmented, single-gene data sets (data notshown). Results of these BP comparisonsare summarized in Table 9. For DDC, thedifferences in BPs between the bootstrap-augmented data sets and the combineddata set were nonsignificant. For EF-la, thedifferences were significant, with the com-bined data set having higher BPs.

The number of concordance groups re-covered by the bootstrap-augmented datasets was also noted (Table 5). None of thesix bootstrap-augmented data sets recov-ered as many concordance groups as thecombined data set did.

DISCUSSION

The source and effects of third-positionbase composition bias in EF-la were identi-fied by Mitchell et al. (1997) and are men-tioned here only for completeness. ForEF-la the source of heterogeneity was theArctiidae and Lymantriidae. These familiesshared a distinctive third-position basecomposition, but the similarity of LogDetME trees and ML ME trees, among otherthings, suggested that this bias had little ifany effect on phylogeny reconstruction. Asimilar conclusion was reached by Fang etal. (2000) for DDC third positions, althoughthe source of heterogeneity in the 77-taxadata set is not as easily definable, includingNotodontidae and at least some quadrifinenoctuids, arctiids, and lymantriids.

Combining disparate phylogenetic datasets in a single analysis can be problematicif there is substantial disagreement amongthem (e.g., Bull et al., 1993). However,defining "substantial disagreement" is dif-ficult. The ILD test (Farris et al., 1994) hasbeen proposed as an objective test of the de-gree of disagreement among data parti-tions, but recent studies suggest that the

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

TABL

E 8.

B

oots

trap

per

cent

ages

for

cla

des

reco

vere

d b

y di

ffer

ent

data

set

s, u

nder

equ

al a

nd

In-w

eigh

ting

sch

emes

for

par

sim

ony

anal

ysis

, an

d us

ing

the

com

plet

e-an

d-pa

rtia

l bo

otst

rap

prog

ram

Njb

ootli

(Z

hark

ikh

and

Li,

1995

) for

dis

tanc

e an

alys

is. O

nly

clad

es s

how

ing

subs

tant

iall

y di

ffer

ent

boot

stra

p va

lues

(i.e

., £1

0% d

iffer

ence

)am

ong

data

set

par

titi

ons

wer

e ta

bula

ted.

Cla

de n

o.

1 2 3d 4 5 6 7 8d 9 10 11 12 13 14 15 16 17 18 19 20

Tax

a in

clud

ed i

n cl

adea

Ingr

oup

(all

exce

pt N

otod

onti

dae)

Not

odon

tida

e, e

xcl.

Gse

p

Gse

p, B

lev

Arc

tiid

ae, L

yman

trii

dae,

som

e qu

adri

fine

noc

tuid

s*

Arc

tiid

ae, L

yman

trii

dae,

Her

min

iina

e, A

gana

inae

Arc

tiid

ae, L

yman

trii

dae

Lym

antr

iida

e

Lym

antr

iida

e, H

mi

Arc

tiid

aeH

erm

inii

nae,

Aga

nain

aeH

erm

inii

nae

Eut

elii

nae,

Sti

ctop

teri

nae

Cat

ocal

inae

, Cal

pina

eN

olin

aeE

ustr

otii

nae

som

e P

lusi

inae

(A

fa,

Cfo

r)S

tiri

inae

, Psa

phid

inae

, Apy

rP

saph

idin

ae, A

pyr

Psa

phid

inae

som

e P

saph

idin

ae (

Pres

, Tsa

p)

EF

-la

12 (

24)

<5 34 53 (

19)

<5(

11)

38 (

36)

54(7

1)

77(5

8)

• <

5(35

)

14 50 34 70 15 (

18)

6 67 16(9

)

13 (

21)

54(6

4)

25

MP,

equ

al w

eigh

ting

DD

O

42 (

64)

45 (

<5)

12 31

(49

)

35

(39

)

24 (

28)

75 (

69)

<5

(8)

91{7

7)

20 95 74

7(1

0)

16 56

(52

)

89 31

(30

)

68

(60

)

78 (

68)

89

Com

b.

50 53 <5 84 52 52 85 43 52 29 93 81 58 27 72 95 53 72 86 74

MP,

In-w

eigh

ting

EF

-la

13 9

22 68 <5 56 72 74 <5 36 59 33 75 19 <5 70 18 16 32 44

DD

C

69 75 9

57 37 31 88 <5 92 33 95 85 <5 47 80 87 63 80 36 90

Com

b.

64 77 15 92 43 71 97 9 81 69 92 91 44 65 72 98 75 89 33 97

Dis

tanc

e, N

jboo

tlib

EF

-la'

4(1

2)

- - 23

(-)

36 (

61)

64

(94

)

74 (

76)

-H•

-

72 - - -(-)

- 98 11 (

10)

-(-)

23 (

66)

40

DD

O

19 (

70)

44

(76

)

- -(1

8)

36 (

28)

-(8

1)

-(-)

66 (

71)

- 95 47 -(1

7)

- 14 (

63)

100 45

(64

)

44

(49

)

51

(-)

89

Com

b.

40 50 - 65 - 82 89 54 - 79 89 80 - - - 99 65 - 79 71

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

TAB

LE 8

. C

on

tin

ued

Cla

de n

o.

21 22 23 24 25 26 27 28 29 30 31 32 33 34 35d

36d

37^

Tax

a in

clud

ed i

n cl

adea

Rab

r, P

he

On

cocn

emid

inae

, A

gar

isti

nae

Co

nd

icin

ae

som

e C

on

dic

inae

(C

vi,

Ld

ip)

som

e H

elio

thin

ae (

Sar

, Ab

e, H

al, M

te, H

vi,

Hze

)

som

e H

elio

thin

ae (

Hv

i, H

ze)

No

ctu

inae

s.l., e

xcl

. al

l "C

arad

rin

ini"

bu

t A

tar

No

ctu

inae

s.L

, excl

. al

l "C

arad

rin

ini"

Had

en

inae, A

pam

ein

i, X

yle

nin

i, N

ra

Had

en

inae, A

pam

ein

i, X

yle

nin

i

Ap

amei

ni,

Xyle

nin

i

Ap

amei

ni

Xy

len

ini

Cm

ol,

Elg

Cm

ol,

Ho

mo

Sex

, S

or

Aam

p,

Ara

l

EF

-la

44 (

43)

8(<

5)

7

34 35 (

27)

29 60

(76

)

8 12 (

60)

14 (

20)

90 15 19 (

95)

<5 67 76 57

MP,

equ

al w

eigh

ting

DD

Cc

<5 43 42 91 84 (

82)

67(6

4)

94

(93

)

50 27 (

16)

21

(16

)

47 81 75(9

5)

47 <5 20 <5

Com

b.

<5 51 69 93 90 77 100 57 44 43 97 90 68 33 <5 28 <5

MP,

In-

wei

ghtin

g

EF

-la

33

7 8

76 51 67 54 <5 13 15 95 20 25 <5 84 74 48

DD

C

<5 36 35 95 84 94 86 62 22 12 18 97 75 42 5 32 <5

Com

b.

<5 45 58 97 91 96 99 52 13 22 96 98 72 42 11 22 <5

Dis

tanc

e, N

jboo

tlid

EF

-lac

28

(-)

-(-)

13 29 69 (

84)

72 40 (

73)

- -(5

2)

-(-)

81 40 -(9

0)

- 38 61 -

DD

O

- - 38 71 62 (

75)

48

(-)

80 (

88)

24 17(7

6)

19 (

53)

79 59 77 (

100)

30 - 94 -

Com

b.

- 30 - 86 96 76 97 24 34 - 95 96 90 17 - 58 -a S

peci

es a

bbre

viat

ions

are

fro

m T

able

1.

b Das

hes

deno

te n

o va

lue

give

n (t

he "

njbo

otli

" so

ftw

are

repo

rts

boot

stra

p va

lues

for

thos

e cl

ades

rec

over

ed i

n th

e ne

ighb

or-j

oini

ng t

ree

only

).c B

oots

trap

per

cent

ages

in

pare

nthe

ses

are

valu

es d

eriv

ed f

or c

ompa

rabl

e gr

oups

(if

pres

ent)

in

the

49-t

axa

data

set

s.dC

lade

s th

at b

reak

up

conc

orda

nce

grou

ps (

clad

es f

or w

hich

mon

ophy

ly h

as b

een

esta

blis

hed

by m

orph

olog

ical

evi

denc

e); i

t is

expe

cted

tha

t th

ese

grou

ps r

ecei

ve li

ttle

supp

ort.

e Qua

drif

ine

noct

uid

subf

amil

ies

incl

uded

are

Her

min

iina

e, A

gana

inae

, Cat

ocal

inae

, Cal

pina

e, a

nd H

ypen

inae

.

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 219

TABLE 9. Sign tests performed on bootstrap percentage (BP) comparisons between each of the six bootstrap-augmented data sets and the combined data set for the 37 clades listed in Table 8.

Data sets

EF-la 1EF-la 2EF-la 3DDC1DDC2DDC3

+-

34

33

32

15

14

13

— b

3

4

5

17

20

20

0

0

0

5

3

4

n

37

37

37

32

34

33

<0.001

<0.001

<0.001

>0.5

>0.5

<0.5, >0.2

"Number of clades for which the combined data set had the larger BP.^Number of clades for which the combined data set had the smaller BP.cNumber of clades showing no difference in BP."Two-tailed test.

ILD test is not entirely appropriate as a testfor combinability of separate data parti-tions. For example, Cunningham (1997b)suggested that "a significance threshold of0.05 may be too conservative for the ILDtest," after finding that ILD test P-valueshad to be much less than 0.01 (P = 0.001)before combined analysis of data partitionsled to reduced recovery of a known phy-logeny. By this criterion, many of our ILDtests performed under In-weighting are ef-fectively nonsignificant (0.05 < P < 0.01),justifying our analysis of the combined dataset. Nonetheless, much information maybe gleaned from comparison of P-valuesamong different tests, as discussed below.

Because there can be only one truespecies phylogeny, incongruence must indi-cate either that phylogeny reconstructionfor at least one of the data partitions was in-accurate, attributable to poor data qualityor inappropriate phylogenetic analysis, orthat the partitions have different evolution-ary histories, attributable to lineage sorting,for example. The latter explanation is moreplausible when internodes are relativelyshort, as is seen for the deeper branches ofthe ML tree. Thus lineage sorting cannot beruled out for this data set. In our analyses,however, much of the incongruence can beeliminated by excluding highly divergedtaxa, which involve relatively deep nodesin the tree. This observation seems to favorerror in the gene tree(s), possibly from longbranch attraction (Felsenstein, 1978b), asthe explanation for the incongruence. It istherefore surprising that ILD tests per-formed on the reduced 49-taxa data setsgave consistently higher P-values than

equivalent tests performed on the complete77-taxa data sets; that is, incongruence be-tween data partitions increased with in-creased taxon sampling. Long branch andtaxon sampling effects could reasonably beexpected to decrease with the addition oftaxa, making it easier to reconstruct the truephylogeny (e.g., Hillis, 1996). A possiblereason for this apparent contradiction isthat we could be exacerbating the longbranch problem when adding more taxa,because we are inadvertently selectinghighly divergent exemplar species from apoorly known higher-level phylogeny.

Robustness of Results to Variationin Phylogenetic Methods

Almost all nodes strongly supported byMP analysis were also strongly supportedby ME analysis, and vice versa, with the fol-lowing exceptions: Three groups with mod-erate support under MP (5:65%; Nolinae,Arctiidae + Lymantriidae, and Spodopterafrugiperda + S. ornithogalli) had much lesssupport under ME (^55%). In contrast,support for Psaphidinae, Condicinae, andStiriinae + Amphipyrinae + Psaphidinae in-creased from 33%, 58%, and 75% under MPto 83%, 90%, and 96%, respectively, underME. Relationships within Plusiinae andwithin Spodoptera were recovered "cor-rectly," that is, in agreement with morphol-ogy, by the mp tree (Fig. 2) but not by theML tree (Fig. 3). Perhaps not coincidentally,both groups are subtended by long internalbranches. The placement of plusiineswithin noctuids also differed: The plusiineswere sister group to the rest of the ingroup

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

220 SYSTEMATIC BIOLOGY VOL. 49

in the mp tree but were sister group to thequadrifine noctuids, arctiids, and lymantri-ids in the ML tree. However, under the MLcriterion, Plusiinae can be placed in eitherposition, or as sister group to all other tri-fine noctuids, without significantly changingthe likelihood score (Kishino-Hasegawatests, P > 0.5).

More Taxa or More Characters?

Taxon sampling effects.—Under the MP cri-terion, higher support values were seen inthe 49-taxa data set than in the 77-taxa dataset, in comparisons made for EF-la. ForDDC, slightly more than half the cladescompared had higher support with 77 taxa,but the trend was not statistically signifi-cant. Note that the extra taxa added in the77-taxa data set join below the root node ofthe same group in the smaller data set forthree clades (1, 4, and 27). This might bepredicted to result in lower BPs for the 77-taxa data set because the subtendingbranch is split. However, for EF-la clade 4actually increased in BP with increasedtaxon sampling. None of the clades affectedin this way in the DDC analysis showed achange >10%.

When a distance method with a correc-tion for the number of taxa in each data setwas used (complete-and-partial bootstrap;Zharkikh and Li, 1995), the 49-taxa data setsclearly had higher support values for bothgenes. All available comparisons for EF-lasupported this conclusion. For DDC, al-most all clades compared showed the sametrend, and the difference was statisticallysignificant. The nonsignificance of theEF-la result obtained by this method wasobviously the result of the small number ofcomparisons available (n = 5).

Thus, in cases where increasing taxonsampling changed the support levels for aclade, the effect was a negative one. This re-sult was surprising, in light of the large(57%) increase in taxon sampling, espe-cially because the taxa had been selected inan attempt to break up long branches(Swofford et al, 1996). Furthermore, our re-sults are contrary to the findings of Poe(1998) that when adding taxa decreased theaccuracy of a phylogeny estimate, the de-crease usually did not involve preexistingrelationships.

The reason for the decrease in supportlevels with addition of taxa appears to bethat the additional groups sampled weresampled too sparsely relative to the localsaturation level for these genes. The effect isthat of adding long branches to the treerather than breaking up long branches(Graybeal, 1998; Hillis, 1998). Thus, "lin-eages that previously were not spuriouslyattracted to each other could become 'long'in a relative sense by virtue of the shorten-ing of another branch on which the addedtaxa connect" (Poe, 1998).

An alternative explanation for the de-crease in support levels, that the original 49species samples were just fortuitous combi-nations of taxa, and that different combina-tions might yield lower support values,was discounted by randomly deleting 28species from subfamilies for which morethan a single exemplar was sampled in the77 species data set (this was repeated fivetimes). Bootstrap analyses of these new 49-taxa data sets produced support valuesvery similar to those of the original 49-taxadata sets.

We have thus provided empirical data insupport of simulation studies (e.g., Poe andSwofford, 1999) that suggest when trying tosolve a difficult phylogenetic problem,adding taxa is not a panacea because it cancreate additional problems under certaincircumstances.

Addition of characters.—The combineddata sets gave both higher overall levels ofsupport, measured as percentage of nodeswith strong support, and higher supportlevels for the specific groups recovered inthe trees, than did any of the single-genedata sets alone. That trend was apparentunder all phylogenetic methods used.Those results are no surprise given thatconsistent methods of phylogeny estima-tion, by definition, will converge on thetrue tree with increasing numbers of char-acters sampled. However, the absolutenumber of characters sampled is not theonly consideration. Our data suggest thatequal attention should be paid to potentialnonindependence among characters.

Combining data from independent genes.—One caveat to the observation that consis-tent methods of phylogeny estimation willconverge on the true tree as increasingnumbers of characters are sampled is that

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 221

none of the assumptions of the method canbe violated. However, violations of as-sumptions are not necessarily easily de-tected (Swofford et al., 1996). This fact alonemay be grounds for preferring that addi-tional characters be obtained from an inde-pendent source, all else being equal.

Other reasons also support the idea thatobtaining characters from independentgenes would hold greater benefit for phy-logeny inference than obtaining the sametotal number of characters from a singlegene. First, the endemic biases of any onegene (in base composition, evolutionaryrate, and so forth) might be diluted in thecombined data set. Cummings et al. (1995),for example, showed that data sets consist-ing of blocks of contiguous sites drawnfrom complete mitochondrial genomeswere less likely to recover the genome phy-logeny than were data sets comprising anequal number of single sites drawn ran-domly from the entire genome. They pro-posed that the effects of "location-dependentprocesses in sequence evolution" could bereduced, and thus the power of phyloge-netic analysis improved, by sampling "sitesfrom throughout the genome or from othergenomes in the organisms." Second, on thepositive side, each gene might carry a sig-nal for groupings on which the other genesare silent. These effects could result in de-creased support for spurious groupings, inaddition to increased support levels for cor-rect groupings.

Our comparisons of the combined-genedata set and the single-gene data sets of thesame size obtained by bootstrap augmenta-tion provide some support for the abovepostulates. For EF-la, support levels for thebootstrap-augmented data set were consis-tently lower than for the combined data set.For DDC, there was little apparent differ-ence in support levels between the boot-strap-augmented data set and the com-bined-gene data set, although the combineddata set recovered more concordance groups.One might argue that as long as one usedan especially informative gene, it wouldmake little difference whether one sampledmore of the same gene or an additionalgene. However, the difficulty is knowing inadvance that a given gene is especially in-formative for the group of interest. Besides,this misses the point that the resolution and

support levels obtained by using the com-bined-gene data set were no less than thoseobtained from the bootstrap-augmentedDDC data set, despite the fact that EF-laproved to offer less information per nu-cleotide. Thus, while recognizing that thedynamics of this single case may not begeneralizable, we argue that the prudentapproach to collecting additional sequencedata is to prefer that such data comes fromadditional genes.

Systematics of the Noctuoidea

The tentative hypothesis that Noctuidaeis paraphyletic with respect to Arctiidaeand Lymantriidae (Weller et al., 1994;Mitchell et al., 1997; Fang et al., 2000) isstrongly supported by our new data set.There is 96% bootstrap support under MP,and 90% under ME, for a clade comprisingArctiidae, Lymantriidae, and the quadrifinenoctuid subfamilies Aganainae, Hermirii-inae, Catocalinae, Calpinae, and Hypeni-nae, to the exclusion of other quadrifineand all trifine Noctuidae. It is thus likelythat Noctuidae is not a natural group. How-ever, in the interests of taxonomic stability,we decline to revise noctuoid classificationto reflect the paraphyly of Noctuidae untilthe deeper divergences within the familyare resolved with reasonable confidence.

Supporting a growing consensus amongnoctuid systematists (Beck, 1960,1992; Hol-loway, 1989; Lafontaine and Poole, 1991;Lafontaine, 1993; Poole, 1995; Speidel et al.,1996; Kitching and Rawlins, 1999), our dataalso provide the strongest evidence to date(99% bootstrap value under MP and 100%under ME) for the monophyly of the "truecutworms" (Noctuinae s.l.; Poole, 1995). Ascircumscribed in our trees, Noctuinae s.l.includes Noctuini s.l., Hadenini, Apameini,Xylenini, and some "Caradrinini," (Nedraand Anorthodes). "Caradrinini" is a hetero-geneous assemblage of genera, most ofwhose placements are still problematic; ac-cordingly, further work is needed to clarifythe limits of Noctuinae s.l.

Several other higher-level relationshipswithin trifine noctuids are supported byour data. Support for the placement of Am-phipyra with the Psaphidinae, and the Stiri-inae as sister group to this clade, is rela-tively strong. Moreover, the broader trifine

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

222 SYSTEMATIC BIOLOGY VOL. 49

relationships in Figure 2 and Figure 3 corre-spond well to recent morphological hy-potheses. For example, groups previouslythought to be relatively primitive, such asAcontiinae and Eustrotiinae (Poole, 1995),do indeed take basal positions, whereassubfamilies generally regarded as derived,such as Agaristinae, Cuculliinae s.s., Oncoc-nemidinae, and Amphipyrinae s.s., are al-lied with the highly derived Heliothinaeand Noctuinae s.l. The placement of othergroups, such as Condicinae and Plusiinae,remains difficult. Plusiinae are thought tobe allied with the trifine clade (Poole, 1995;Speidel et al, 1996; Kitching and Rawlins,1999), but our data set lacks the resolvingpower to distinguish among various basalplacements for this group, plusiines beingbasal within trifines or within quadrifines,or even being basal to all other noctuids.Thus the deeper nodes in Noctuidae remainweakly supported, and clearly, much moreevidence will be needed to fully sort out thehigher-level relationships of this large para-phyletic group.

Conclusions

By combining data from independent nu-clear genes, we obtained the first strong ev-idence that Noctuidae are paraphyleticwith respect to Arctiidae and Lymantriidae,and we have increased confidence in manyhigher-level relationships within the family.

Concordance among multiple indepen-dent data sets is probably the most power-ful evidence systematists can provide forphylogenetic relationships (Miyamoto andFitch, 1995) and this may be reason enoughto turn to an independent source whenmore data are needed to answer a difficultphylogenetic problem. We have providedsome evidence for the importance of char-acter-set independence. Our study also sug-gests that the gene-specific biases endemicto DNA sequence data will be diluted in acombined data set, thus reducing or eveneliminating support for erroneous relation-ships.

The increased support levels in the com-bined data set provided a strong a posteri-ori argument for combining data in a singleanalysis. While we agree in principle withthe desirability of a priori congruence test-ing, our results support proposals by other

authors that the ILD test is too conservativefor the purpose of deciding when to com-bine data sets if one applies the conven-tional significance level of P = 0.05, or evenP = 0.01. An additional dilemma is that withlarge data sets, the time needed to performa rigorous ILD test may be prohibitive.Quicker methods for testing the signifi-cance of incongruence would be welcomed.

Increased taxon sampling is often recom-mended as a fix for problematic molecularphylogenetic data sets, especially if taxaare chosen specifically to break up longbranches. However, we present an empiri-cal example of deleterious effects: Supportlevels for most clades decreased when weintroduced previously unsampled, diver-gent groups to the data set. We suggest thatthis might be a more general phenomenon:If "new" taxa are not sampled denselyenough, one can inadvertently increase ap-parent long branch effects, thereby reduc-ing support for clades on the tree, evenwhen trying to do the opposite! This situa-tion is most likely to occur in taxa for whichthe phylogenetic relationships are poorlyknown. At present we are attempting to de-termine whether a much larger increase inthe taxon sample will permit more confi-dent resolution of deep nodes on the noc-tuid tree.

ACKNOWLEDGMENTS

This paper was greatly strengthened by thethoughtful suggestions made by Dick Olmstead, ChrisSimon, Tim Collins, and Cliff Cunningham. We thankJeff Shultz for providing useful comments on a draft ofthe manuscript, Bob Poole and Eric Metzler for helpwith species identifications, Suwei Zhao and KongyiJiang for technical assistance, Cliff Cunningham forsupplying a spreadsheet for calculation of six-parame-ter parsimony step matrices, Dave Swofford for use ofa test version of PAUP*4.0, Andrey Zharkikh for helpwith his complete-and-partial bootstrap software, andBob Poole, Don Lafontaine, Mike Pogue, Tim Friedlan-der, Quentin Fang, and Soowon Cho for helpful dis-cussions. We are deeply indebted to James Adams, BobDenno, Ian Kitching, Ed Knudsen, Marcus Matthews,Ric Piegler, Bob Poole, Ron Robertson, and Andy War-ren for providing fresh specimens for this study. Fi-nancial support from the USDA-NRICGP, USDA-ARSSystematic Entomology Laboratory, and the Univer-sity of Maryland Graduate School is gratefully ac-knowledged.

REFERENCES

BECK, H. 1960. Die Larvalsystematik der Eulen (Noc-tuidae). Akademie Verlag, Berlin.

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 223

BECK, H. 1992. New view of the higher classificationof the Noctuidae (Lepidoptera). Nota Lepid.15:3-28.

BULL, J. J., J. P. HUELSENBECK, C. W. CUNNINGHAM, D. L.SWOFFORD, AND P. J. WADDELL. 1993. Partitioningand combining data in phylogenetic analysis. Syst.Biol. 42:384-397.

CHO, S., A. MITCHELL, J. C. REGIER, C. MITTER, R. W.POOLE, T. P. FRIEDLANDER, AND S. ZHAO. 1995. Ahighly conserved nuclear gene for low-level phylo-genetics: Elongation factor-larecovers morphology-based tree for heliothine moths. Mol. Biol. Evol.12:650-656.

CUMMINGS, M. P., S. P. OTTO, AND J. WAKELY. 1995.Sampling properties of DNA sequence data in phy-logenetic analysis. Mol. Biol. Evol. 12:814-822.

CUNNINGHAM, C. W. 1997a. Is congruence betweendata partitions a reliable predictor of phylogeneticaccuracy? Empirically testing an iterative procedurefor choosing among phylogenetic methods. Syst.Biol. 46:464-478.

CUNNINGHAM, C. W. 1997b. Can three incongruencetests predict when data should be combined? Mol.Biol. Evol. 14:733-740.

FANG, Q. Q., S. CHO, J. C. REGIER, C. MITTER, M.MATTHEWS, R. W. POOLE, T. P. FRIEDLANDER, AND S.ZHAO. 1997. A new nuclear gene for insect phylo-genetics: Dopa decarboxylase is informative of rela-tionships within the Heliothinae (Lepidoptera: Noc-tuidae). Syst. Biol. 46:269-283.

FANG, Q. Q., A. MITCHELL, J. C. REGIER, C. MITTER, T. P.FRIEDLANDER, AND R. W. POOLE. 2000. Phylogeneticutility of the nuclear gene dopa decarboxylase(DDC) in noctuoid moths (Insecta: Lepidoptera:Noctuoidea). Mol. Phylogenet. Evol. (in press).

FARRIS, J. S., M. KALLERSJO, A. G. KLUGE, AND C. BULT.1994. Testing the significance of incongruence.Cladistics 10:315-319.

FELSENSTEIN, J. 1978a. The number of evolutionarytrees. Syst. Zool. 27:27-33.

FELSENSTEIN, J. 1978b. Cases in which parsimony andcompatibility methods will be positively mislead-ing. Syst. Zool. 27:401-410.

FELSENSTEIN, J. 1995. PHYLIP (phylogeny inferencepackage), version 3.572. Department of Genetics,University of Washington, Seattle.

FRATI, F., C. SIMON, J. SULLIVAN, AND D. L. SWOFFORD.1997. Evolution of the mitochondrial cytochromeoxidase II gene in Collembola. J. Mol. Evol.44:145-158.

FRIEDLANDER, T. P., J. C. REGIER, AND C. MITTER. 1992.Nuclear gene sequences for higher-level phyloge-netic analysis: 14 promising candidates. Syst. Biol.41:483-489.

FRIEDLANDER, T. P., J. C. REGIER, C. MITTER, AND D. L.WAGNER. 1996. A nuclear gene for higher-levelphylogenetics: Phosophoenolpyruvate carboxyki-nase tracks Mesozoic-age divergences withinLepidoptera (Insecta). Mol. Biol. Evol. 13:594-604.

GALLOWAY, G. L., R. L. MALMBERG, AND R. A. PRICE.1998. Phylogenetic utility of the nuclear gene argi-nine decarboxylase: An example from the Brassi-caceae. Mol. Biol. Evol. 15:1312-1320.

GRAYBEAL, A. 1994. Evaluating the phylogenetic util-ity of genes: A search for genes informative about

deep divergences among vertebrates. Syst. Biol.43:174-193.

GRAYBEAL, A. 1998. Is it better to add taxa or charac-ters to a difficult phylogenetic problem? Syst. Biol.47:9-17.

GUPTA, R. S. 1995. Phylogenetic utility of the 90kDheat shock family of protein sequences and an ex-amination of the relationship among animals, plantsand fungi species. Mol. Biol. Evol. 12:1063-1073.

HASEGAWA, M., H. KISHINO, AND T. YANO. 1985. Dat-ing of the human-ape splitting by a molecular clockof mitochondrial DNA. J. Mol. Evol. 21:160-174.

HILLIS, D. M. 1996. Inferring complex phylogenies.Nature 383:130-131.

HILLIS, D. M. 1998. Taxonomic sampling, phyloge-netic accuracy, and investigator bias. Syst. Biol.47:3-8.

HILLIS, D. M., AND J. J. BULL. 1993. An empirical testof bootstrapping as a method for assessing confi-dence in phylogenetic analysis. Syst. Biol. 42:182-192.

HOLLOWAY, J. D. 1989. The moths of Borneo. Part 12.Noctuidae: Noctuinae, Heliothinae, Hadeninae,Acronictinae, Amphipyrinae, Agaristinae. South-dene, Kuala Lampur.

HUELSENBECK, J. P., J. J. BULL., AND C. W. CUNNINGHAM.1996. Combining data in phylogenetic analysis.Trends Ecol. Evol. 11:152-158.

INTERNATIONAL UNION OF BIOCHEMISTRY. 1986.Nomenclature for incompletely specified bases innucleic acid sequences. Mol. Biol. Evol. 3:99-108.

KIM, J. 1996. General inconsistency conditions formaximum parsimony: Effects of branch lengths andincreasing numbers of taxa. Syst. Biol. 45:363-374.

KISHINO, H., AND M. HASEGAWA. 1989. Evaluation ofthe maximum likelihood estimate of the evolution-ary tree topologies from DNA sequence data, andthe branching order in Hominoidea. J. Mol. Evol.29:170-179.

KlTCHlNG, I. J. 1984. An historical review of thehigher classification of the Noctuidae (Lepidoptera).Bull. Br. Mus. Nat. Hist. (Entomol.) 49:153-234.

KITCHING, I. J., AND J. E. RAWLINS. 1999. Noctuoidea.Pages 355-401 in Handbook of zoology. Lepi-doptera, volume 1: Systematics and evolution (N. P.Kristensen, ed.). W. de Gruyter, Berlin.

LAFONTAINE, J. D. 1993. Cutworm systematics: Con-fusions and solutions. Mem. Entomol. Soc. Can.165:189-196.

LAFONTAINE, J. D., AND R. W. POOLE. 1991. Noc-tuoidea, Noctuidae (Part), Plusiinae. Moths ofAmerica north of Mexico, Fascicle 25.1. The WedgeEntomological Research Foundation, Washington,DC.

LANAVE, C, J. PREPARATA, C. SACCONE, AND G. SERIO.1984. A new method for calculating evolutionarysubstitution rates. J. Mol. Evol. 20:86-93.

Li, W.-H. 1993. Unbiased estimation of the rates ofsynonymous and nonsynonymous substitution. J.Mol. Evol. 36:96-99.

LOCKART, P. J., M. A. STEEL, M. D. HENDY, AND D. PENNY.1994. Recovering evolutionary trees under a morerealistic model of sequence evolution. Mol. Biol.Evol. 11:605-612.

MILLER, J. S. 1991. Cladistics and classification of theNotodontidae (Lepidoptera: Noctuoidea) based on

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from

224 SYSTEMATIC BIOLOGY VOL. 49

larval and adult morphology. Bull. Am. Mus. Nat.Hist. 204:1-230.

MITCHELL, A., S. CHO, J. C. REGIER, C. MITTER, R. W.POOLE, AND M. MATTHEWS. 1997. Phylogenetic util-ity of elongation factor-la in Noctuoidea (Insecta:Lepidoptera): The limits of synonymous substitu-tion. Mol. Biol. Evol. 14:381-390.

MIYAMOTO, M. M., AND W. M. FITCH. 1995. Testingspecies phylogenies and phylogenetic methods withcongruence. Syst. Biol. 44:64-76.

ORTI, G., AND A. MEYER. 1996. Molecular evolution ofependymin and the phylogenetic resolution of earlydivergences among euteleost fishes. Mol. Biol. Evol.13:556-574.

POE, S. 1998. The effect of taxonomic sampling on ac-curacy of phylogeny estimation: Test case of aknown phylogeny. Mol. Biol. Evol. 15:1086-1090.

POE, S., AND D. L. SWOFFORD. 1999. Taxon samplingrevisited. Nature 398:299-300.

POOLE, R. W. 1995. Noctuoidea. Noctuidae (part).Cuculliinae, Stiriinae, Psaphidinae (part). Moths ofAmerica north of Mexico, Fascicle 26.1. The WedgeEntomological Research Foundation. Washington,DC.

REGIER, J. C, AND J. W. SHULTZ. 1997. Molecular phy-logeny of the major arthropod groups indicatespolyphyly of crustaceans and a new hypothesis forthe origin of hexapods. Mol. Biol. Evol. 14:902-913.

RODRIGUEZ, F., J. L. OLIVER, A. MARIN, AND J. R. MEDINA.1990. The general stochastic model of nucleotidesubstitution. J. Theor. Biol. 142:485-501.

RZHETSKY, A., AND M. NEI. 1992. A simple method forestimating and testing minimum evolution trees.Mol. Biol. Evol. 9:945-967.

SCOBLE, M. J. 1992. The Lepidoptera. Form, functionand diversity. Oxford Univ. Press, Oxford, U.K.

SLADE, R. W., C. MORITZ, AND A. HEIDEMAN. 1994.Multiple nuclear-gene phylogenies: Application topinnipeds and comparison with a mitochondrialDNAgene phylogeny. Mol. Biol. Evol. 11:341-356.

SMITH, S. W., R. OVERBEEK, C. R. WOESE, W. GILBERT, ANDP. M. GILLEVET. 1994. The genetic data environ-ment and expandable GUI for multiple sequenceanalysis. Comput. Appl. Biosci. 10:671-675.

SPEIDEL, W., H. FANGER, AND C. M. NAUMANN. 1996.The phylogeny of the Noctuidae (Lepidoptera).Syst. Entomol. 21:219-251.

STADEN,R. 1992. STADEN package. MRC Laboratoryof Molecular Biology, Cambridge, U.K.

SULLIVAN, J. 1996. Combining data with different dis-tributions of among-site variation. Syst. Biol.45:375-380.

SWOFFORD, D. L. 1991. When are phylogeny esti-mates from molecular and morphological data in-congruent? Pages 295-333 in Phylogenetic analysisof DNA sequences (M. M. Miyamoto and J. Cracraft,eds.). Oxford Univ. Press, New York.

SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, AND D. M.HILLIS. 1996. Phylogenetic inference. Pages 407-514 in Molecular systematics, 2nd edition (D. M.Hillis, C. Moritz, and B. K. Mable, eds.). Sinauer As-sociates, Sunderland, Massachusetts.

SWOFFORD, D. L. 1999. PAUP*. Phylogenetic analysisusing parsimony (*and other methods). Version 4.Sinauer Associates, Sunderland, Massachusetts.

WATERS, E. R. 1995. An evaluation of the usefulnessof the small heat shock genes for phylogeneticanalysis in plants. Ann. Mo. Bot. Gard. 82:278-295.

WELLER, S. J., D. P. PASHLEY, J. A. MARTIN, AND J. L. CON-STABLE. 1994. Phylogeny of noctuoid moths andthe utility of combining independent nuclear andmitochondrial genes. Syst. Biol. 43:194-211.

ZAR, J. H. 1984. Biostatistical analysis, 2nd edition.Prentice Hall, Englewood Cliffs, New Jersey.

ZHARKIKH, A., AND W.-H. LI. 1995. Estimation of con-fidence in phylogeny: The complete-and-partialbootstrap technique. Mol. Phylogenet. Evol. 4:44-63.

Received 10 November 1998; accepted 6 April 1999

Associate Editor: C. Simon

by guest on April 20, 2014

http://sysbio.oxfordjournals.org/D

ownloaded from