Syst. Biol. 49(2):202-224,2000
More Taxa or More Characters Revisited: Combining Data from Nuclear Protein-Encoding Genes for Phylogenetic Analyses of Noctuoidea (Insecta: Lepidoptera)
ANDREW MITCHELL,1,3 CHARLES MITTER,1 AND JEROME C. REGIER2
1Department of Entomology, University of Maryland, College Park, Maryland 20742, USA
2Center for Agricultural Biotechnology, University of Maryland Biotechnology Institute, College Park, Maryland 20742, USA
Abstract.—A central question concerning data collection strategy for molecular phylogenies has been, is it better to increase the number of characters or the number of taxa sampled to improve the robustness of a phylogeny estimate? A recent simulation study concluded that increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are chosen specifically to break up long branches. We explore this hypothesis by using empirical data from noctuoid moths, one of the largest superfamilies of insects. Separate studies of two nuclear genes, elongation factor-1α (EF-1α) and dopa decarboxylase (DDC), have yielded similar gene trees and high concordance with morphological groupings for 49 exemplar species. However, support levels were quite low for nodes deeper than the subfamily level. We tested the effects on phylogenetic signal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data from the two genes in a single analysis. Surprisingly, the increased taxon sampling, although designed to break up long branches, generated greater disagreement between the two gene data sets and decreased support levels for deeper nodes. We appear to have inadvertently introduced new long branches, and breaking these up may require a yet larger taxon sample. Sampling additional characters (combining data) greatly increased the phylogenetic signal. To contrast the potential effect of combining data from independent genes with collection of the same total number of characters from a single gene, we simulated the latter by bootstrap augmentation of the single-gene data sets. Support levels for combined data were at least as high as those for the bootstrap-augmented data set for DDC and were much higher than those for the augmented EF-1α data set. This supports the view that in obtaining additional sequence data to solve a refractory systematic problem, it is prudent to take them from an independent gene.
[Combining data; independent genes; Noctuidae; Noctuoidea; taxon sampling.]
3Current address (and address for correspondence): Department of Biological Sciences, CW-405 Biological Sciences Building, University of Alberta, Edmonton, Canada T6G 2E9. E-mail: aml6@ualberta.ca
Quantifying empirical support for a phylogeny is now the norm in systematics, and weak support is typically regarded as a problem in need of solving. The solution most often proposed, particularly for molecular studies, is to collect more data. However, given financial and other constraints, one may have to choose between either collecting additional sequence data for the taxa already sampled or sampling additional taxa. The benefits of collecting more characters are evident: By definition, consistent methods of phylogeny estimation will converge on the correct answer or true tree as the number of characters increases. Increasing taxon sampling can help phylogeny estimation by reducing long branch effects, but the benefits are less obvious because the number of potential trees, and thus the size of the estimation problem, increases geometrically with the number of taxa (Felsenstein, 1978a).
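As a rough illustration of how quickly the estimation problem grows, the number of distinct unrooted, fully bifurcating trees for n labelled taxa is given by the standard double factorial (2n - 5)!!. The sketch below evaluates it for the two taxon samples compared in this study; the function name is ours and is not taken from any software used here.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees on n_taxa labelled tips.

    Uses the standard count (2n - 5)!! = 3 * 5 * ... * (2n - 5).
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):  # odd numbers 3 .. 2n-5
        count *= odd
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 49, 77):
        print(n, count_unrooted_trees(n))
```

Going from 49 to 77 taxa multiplies the number of candidate trees by the product of the odd numbers from 95 to 149, which is the cost side of denser taxon sampling.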
Taxon-sampling density has been the subject of much debate in the recent systematics literature, e.g., Systematic Biology, March 1998. Many authors have noted that increased taxon sampling might generally increase the accuracy of estimates of phylogeny (e.g., Hillis, 1996; but see Kim, 1996). Graybeal (1998) explicitly addressed the issue in a simulation study, concluding that if taxa are chosen specifically to break up long branches, increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters. However, there have been few empirical studies on these questions.
Downloaded from http://sysbio.oxfordjournals.org/ by guest on April 20, 2014
2000 MITCHELL ET AL.—MORE TAXA OR MORE CHARACTERS? 203
We present a case study of noctuoid moths (Insecta: Lepidoptera) in the context of this debate. Given previously low levels of support for the hypotheses of interest, in this case the deeper nodes in the tree, we ask whether it is better to increase the number of characters or the number of taxa sampled to improve the overall robustness of the phylogeny? We also expand the question to contrast the possibility of collecting additional characters from the same source, as in the simulations of Graybeal (1998), to that of obtaining such characters from a second, independent gene. With the recent development of several nuclear protein-encoding genes for use in systematics (Slade et al., 1994; Cho et al., 1995; Gupta, 1995; Waters, 1995; Friedlander et al., 1996; Orti and Meyer, 1996; Fang et al., 1997; Regier and Shultz, 1997; Galloway et al., 1998), and the promise of more to come (e.g., Friedlander et al., 1992; Graybeal, 1994), there is now greater potential for utilizing independent sources of nucleotide characters in a single analysis.
We investigated these questions as part of an ongoing systematic study of the Noctuoidea, one of the largest superfamilies of insects (>45,000 species; Scoble, 1992). Separate studies of two nuclear genes, elongation factor-1α (EF-1α) and dopa decarboxylase (DDC), have shown each to carry much information about noctuoid relationships (Mitchell et al., 1997; Fang et al., 1999). The two genes yield very similar trees for overlapping sets of 49 exemplar species and show almost complete concordance with groupings that have been strongly supported by earlier morphological evidence. However, in each gene tree, support was weak for the deeper nodes, which represent relationships above the subfamily level.
Seeking a more robust phylogeny estimate, in this study we investigate the effect on phylogenetic signal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data from the two genes in a single analysis. To contrast the effect of combining data from different sources to collection of the same total number of characters from a single gene, we simulate the latter by bootstrap augmentation of the single-gene data sets. The results confirm that adding data from a second gene can yield greater benefit for phylogeny reconstruction than obtaining more characters from a single gene. Somewhat surprisingly, however, we find that our increased taxon sampling, even though designed to break up long branches, produced markedly greater incongruence between genes and, if anything, decreased support for deeper nodes.
Phylogenetic Framework: Current Understanding of Noctuoid Relationships
Phylogenetic relationships within Noctuoidea have been problematic (Kitching, 1984), but recent work has identified several probable monophyletic groups, for which Poole (1995) and Kitching and Rawlins (1999) cite at least one synapomorphy. We will refer to these as "concordance groups" (Mitchell et al., 1997). Although these groupings are not beyond all doubt, they are surely close approximations; we therefore used recovery of concordance groups as one gauge of phylogenetic utility for EF-1α, DDC, and their combination. We sampled multiple representatives from 24 such groups, 21 of which are indicated in Table 1 and in the figures. The remaining three concordance groups are (Euteliinae + Stictopterinae), Spodoptera, and (S. frugiperda + S. ornithogalli); the monophyly of the latter two groups was confirmed by a morphological study in progress (M. Pogue, pers. comm.).
Of the four traditional noctuoid families, Notodontidae s.l. is widely agreed to be sister group to the others, i.e., Noctuidae, Arctiidae, and Lymantriidae (Miller, 1991; Kitching and Rawlins, 1999). Arctiidae and Lymantriidae are very likely monophyletic groups (Kitching and Rawlins, 1999). In contrast, unambiguous synapomorphies for Noctuidae are lacking (see Speidel et al., 1996, vs. Kitching and Rawlins, 1999). Within Noctuidae, recent reviews support two groups, trifines and quadrifines. Trifines appear to be monophyletic (Poole, 1995; Speidel et al., 1996; Kitching and Rawlins, 1999), but quadrifines probably
204 SYSTEMATIC BIOLOGY VOL. 49
TABLE 1. Species of noctuids and outgroups sampled.
Higher taxa^a (listed in table order): Notodontidae^d; Notodontinae; Phalerinae; Nystaleinae; Heterocampinae; Arctiidae^d; Lithosiinae; Arctiinae^d; Arctiini; Ctenuchini; Lymantriidae^d; Lymantriini; Orgyiini; "Quadrifine" Noctuidae; Aganainae; Herminiinae^d; Catocalinae; Calpinae; Hypeninae; Euteliinae; Stictopterinae^d; Nolinae^d; Nolini; Sarrothripini; "Trifine" Noctuidae^d; Acontiinae^d; Eustrotiinae; Plusiinae^d
Exemplars^b                         Abbrev.    EF-1α accession no.^c   DDC accession no.^c
Furcula cinerea                     Fci*       U85665                  AF151539^f
Gluphisia septentrionis             Gseps      AF151603^f              AF151542
Datana perspicua                    Dpee-s     U85666                  AF151540
Symmerista albifrons                Sale-g     U85667                  AF151541
Nerice bidentata                    Nbide'g    AF151604                AF151543
Hypoprepia miniata                  Hmie'g     U85669                  AF151547
Estigmene acrea                     Eace'S     U85670                  AF151549
Hyphantria cunea                    Hcune      U85671                  AF151550^f
Cisseps fulvicollis                 Cfu        AF151606^f              AF151548^f
Lymantria dispar                    Ldie'S     U85672                  AF151544
Dasychira obliquata                 Dobe<g     U85673                  AF151545
Orgyia leucostigma                  Oleu       AF151605^f              AF151546^f
Asota caricae                       Asot       AF151607^f              AF151551^f
Palthis angulalis                   Pane       U85678                  AF151552
Idia aemula                         Iaes       AF151608^f              AF151553^f
Catocala ultronia                   Cule-s     U85677                  AF151561
Caenurgina crassiuscula             Cere       AF151613^f              AF151562
Hypsoropha monilis                  Hyps       AF151612^f              AF151560
Hypena scabra                       Pscs       AF151609^f              AF151554
Paectes pygmaea                     Ppye.g     U85674                  AF151555
Odontodes aleuca                    Oale       AF151611^f              AF151557^f
Lophoptera sp.                      Loph       AF151610^f              AF151556^f
Meganola sp.                        Mmie'g     U85675                  AF151558
Baileya levitans                    Bleve      U85676                  AF151559^f
Tarachidia candefacta               Tcae-s     U85681                  AF151565
Spragueia leo                       Sleoe      U85680                  AF151566^f
Thioptera nigrofimbria              Thne<g     U85679                  AF151564
Lithacodia musta                    Lmsg       AF151614^f              AF151563
Anagrapha falcifera                 Afae-g     U85686                  AF151567
Chrysanympha formosa                Cfor       AF151615^f              AF151568^f
Trichoplusia ni                     Tnie'g     U20140                  U71401
are not. Kitching and Rawlins, for example, excluded the quadrifine subfamily Nolinae from Noctuidae, regarding it as related to Lymantriidae and Arctiidae. Three recent molecular studies, using 28S ribosomal RNA and the mitochondrial gene ND1 (Weller et al., 1994), EF-1α (Mitchell et al., 1997), and DDC (Fang et al., 2000), similarly suggest that some quadrifines are related to lymantriids and arctiids, but in none of the studies were these groupings strongly supported. Regarding trifines, Poole (1995)
TABLE 1. Continued
Higher taxa^a (listed in table order): Heliothinae^d; Stiriinae^d; Oncocnemidinae^d; Cuculliinae; Psaphidinae^d; Amphipyrinae; Pantheinae; Raphiinae; Acronictinae; Condicinae^d; Condicini; Leuconyctini; Agaristinae^d; Eriopinae; Noctuinae s.l.^d,h; "Caradrinini"; Prodeniini; Dypterygiini
Exemplars^b                         Abbrev.    EF-1α accession no.^c   DDC accession no.^c
Pyrrhia exprimens                   Pexe-g     U20137                  U71430
Eutricopis nexilis                  Enee'S     U20126                  U71410
Schinia arcigera                    Sare-g     U20138                  U71431
Adisura bella                       Abee'g     U20123                  U71407
Heliocheilus albipunctella          Hal"*      U20127                  U71413
Heliothis (Masalia) terracottoides  Mtes       AF151631^f              U71427
Heliothis virescens                 Hvis       U20135                  U71428
Helicoverpa zea                     Hzee-g     U20136                  U71429
Basilodes chrysopis                 Bche-g     U20125                  U71405
Plagiomimicus olvello               Polie<s    AF151620^f              U71406
Oncocnemis obscurata                Oob<=      U85685                  AF151585^f
Catabena lineolata                  Clin       AF151621^f              AF151586^f
Cucullia convexipennis              Ccone      U85694                  AF151584^f
Psaphida resumens                   Prese'S    U85695                  AF151581
Feralia major                       Fmae<g     U85696                  AF151579
Triocnemis saporis                  Tsap       AF151619^f              AF151580^f
Amphipyra pyramidoides              Apyre'g    U85693                  AF151578
Panthea furcilia                    Pfur*      U85684                  AF151572^f
Charadra deridens                   Cdere      U85683                  AF151573^f
Raphia abrupta                      Rabre      U85689                  AF151574^f
Acronicta sp. 1                     Asplg      AF151618^f              AF151575
Acronicta sp. 2                     Asp2e      U85687                  AF151576^f
Polygrammate hebraeicum             Pheeg      U85688                  AF151577
Condica videns                      Cvie-g     U85682                  AF151569
Homophoberia sp.                    Homo       AF151616^f              AF151570^f
Leuconycta diptheroides             Ldip       AF151617^f              AF151571^f
Eudryas grata                       Egr^g      U85690                  AF151582
Psychomorpha epimenis               Pepie-g    U85691                  AF151583
Callopistria mollissima             Cmol       AF151622^f              AF151587^f
Spodoptera frugiperda               Sfre-g     U20139                  U71403
Spodoptera exigua                   Sexs       AF151624^f              U71404
Spodoptera ornithogalli             Sor        AF151623^f              AF151588^f
Elaphria grata                      Elge'g     U85697                  AF151589
Galgula partita                     Gpa        AF151626^f              AF151591^f
Anorthodes tarda                    Atar       AF151625^f              AF151590^f
Nedra ramulosa                      Nrae-s     U85698                  AF151592
used a broad definition of the group, and combined the true cutworms of Lafontaine (1993) into an expanded subfamily Noctuinae s.l., containing most of the trifine species. Kitching and Rawlins (1999) used a more restricted definition of trifines, excluding Acronictinae and associated taxa as well as Pantheinae from this group. One goal of the present study was to further test these hypotheses of noctuid nonmonophyly and of the boundaries of the trifine clade.
TABLE 1. Continued
Higher taxa^a (listed in table order): Hadenini; Hadenina; Eriopygina; Apameini^d; Xylenini^d; Noctuini s.l.^d; Noctuina; Agrotina; Aniclina
Exemplars^b                         Abbrev.    EF-1α accession no.^c   DDC accession no.^c
Lacinipolia renigera                Lree-g     U85700                  AF151594
Pseudaletia unipuncta               Pung       AF151627^f              AF151595
Orthodes crenulata                  Ocr*.g     U85699                  AF151593
Apamea amputatrix                   Aamp       AF151629^f              AF151597^f
Papaipema sp.                       Papp       AF151628^f              AF151596^f
Anathix ralla                       Arale-g    U85702                  AF151598
Lithophane hemina                   Lheme-g    U85701                  AF151599
Abagrotis alternata                 Aalt       AF151630^f              AF151601^f
Agrotis ipsilon                     Aipe-g     U85704                  AF151600
Anicla infecta                      Ainfrs     U85703                  AF151602
^aNomenclature follows Kitching and Rawlins (1999) and Poole (1995); where these classifications were in conflict, we used the more finely split alternative.
^bAll species were collected in the USA, except as follows: Asota caricae, Odontodes aleuca, and Lophoptera sp. from Thailand; Heliocheilus albipunctella and Heliothis (Masalia) terracottoides from Mali.
^cThe entire alignment is available from the Society of Systematic Biologists web site (http://www.utexas.edu/admin/systbiol/).
^dConcordance groups, supported by morphological synapomorphies. The following are also concordance groups: Euteliinae + Stictopterinae, Cuculliinae + Oncocnemidinae, Spodoptera, and S. frugiperda + S. ornithogalli.
^eSpecies included in reduced data set of 49 taxa, closely matching the taxon sampling of Mitchell et al. (1997).
^fGene sequences new to this study.
^gSpecies included in reduced data set of 49 taxa, closely matching the taxon sampling of Fang et al. (2000).
^hAlthough there is strong evidence for the monophyly of Noctuinae s.l., its limits are poorly defined; "Caradrinini" is placed here tentatively (Poole, 1995).
MATERIALS AND METHODS
Species Sampled
Table 1 lists the species sampled and GenBank accession numbers of their EF-1α and DDC partial sequences used in this study, arranged according to recent proposals on noctuoid classification. The new data comprise 56 gene sequences. The entire alignment is also available from the Society of Systematic Biologists web site (http://www.utexas.edu/admin/systbiol/).
Our earlier studies of Noctuoidea, using EF-1α (Mitchell et al., 1997) and DDC (Fang et al., 2000), sampled 49 exemplar species each, only 35 of which were shared between data sets. Because each data set was sensitive to taxon-sampling effects, we decided not to attempt combining data for analysis until we had determined both gene sequences for most species. This goal was achieved for 60 species, after which sequence data for both genes were obtained from an additional 17 species, chosen to break up long branches present in trees from our earlier analyses. Seven of these new species were from previously unsampled clades within Noctuoidea (Ctenuchini, Apameini, Eriopinae, and the Old World subfamilies Aganainae and Stictopterinae), seven were from previously underrepresented clades (Lymantriidae, Plusiinae, Oncocnemidinae, Psaphidinae, Condicinae, and "Caradrinini") and the last three were species of Heliothinae, as reported in Cho et al. (1995) and Fang et al. (1997). All told, our sample includes all four traditional families; 23 of the 31 subfamilies of Noctuidae recognized by Kitching and Rawlins (1999), including the two noctuid subfamilies which they raise to family rank; and 27 of the 45 tribes recognized (both explicitly and implicitly) within Noctuidae.
Laboratory Protocols
Specimens were either live-frozen at -80°C or collected directly into 100% ethanol at ambient temperature in the field and subsequently stored at temperatures as cold as -20°C for as long as 6 months before being cooled to -80°C.
TABLE 2. Sequences of DDC primers used in this study. International Union of Biochemistry (1986) codes are used to indicate degeneracy. All primers included an M13 sequence at the 5' end to facilitate automated sequencing. Primer names ending in F identify forward primers, which bind to the antisense strand of DNA; primer names ending in RC identify reverse complement primers.
Primer    Sequence (5'-3')                      Position^a
1.7sF     GCH TGY ATY GGN TTY WCN TGG AT        471-493
1.9sF     ATG HTN GAY TGG YTV GGY CAR ATG       528-551
3.2sF     TGG YTN CAY GTN GAY GCN GCN TAY GC    975-1000
3.3sRC    CCA YTT RTG NGG RTT RAA RTT RAA       1065-1088
4sRC      GG DAT YTG CCA RTG HCK RTA RTC        1203-1225
^aNucleotide position relative to the DDC sequence of Manduca sexta (GenBank accession no. U03909).
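To make the degeneracy of the primers in Table 2 concrete, the sketch below counts how many distinct oligonucleotides each degenerate sequence represents by multiplying the number of bases allowed at every position. The code sizes are the standard IUB/IUPAC nucleotide ambiguity codes; the dictionary and function names are ours, and only the gene-specific portions listed in the table are considered (the M13 tails are not given in the text).

```python
# Number of bases each IUB/IUPAC nucleotide code can stand for.
IUB_CODE_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

# Gene-specific primer sequences as listed in Table 2 (M13 tails omitted).
DDC_PRIMERS = {
    "1.7sF": "GCH TGY ATY GGN TTY WCN TGG AT",
    "1.9sF": "ATG HTN GAY TGG YTV GGY CAR ATG",
    "3.2sF": "TGG YTN CAY GTN GAY GCN GCN TAY GC",
    "3.3sRC": "CCA YTT RTG NGG RTT RAA RTT RAA",
    "4sRC": "GG DAT YTG CCA RTG HCK RTA RTC",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.replace(" ", "").upper():
        total *= IUB_CODE_SIZE[base]
    return total


if __name__ == "__main__":
    for name, seq in DDC_PRIMERS.items():
        length = len(seq.replace(" ", ""))
        print(f"{name}: {length} nt, {degeneracy(seq)}-fold degenerate")
```

The primer lengths computed this way (23, 24, 26, 24, and 23 nt) agree with the position ranges given in the table.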
Wings and abdomens were removed from specimens and vouchered. Genomic nucleic acids were extracted from the thorax (head + thorax for smaller specimens) by using a commercially available DNA/RNA isolation kit (Amersham Corp., Arlington Heights, IL). Polymerase chain reaction (PCR) primer sequences for EF-1α were those from Cho et al. (1995). DDC primers incorporated slightly more degeneracy than those of Fang et al. (1997) and are listed in Table 2. An additional DDC primer (3.3sRC) was also made. The 3.3sRC primer binds downstream from 3.2sF and allows amplification of two overlapping PCR fragments (1.7sF/3.3sRC or 1.9sF/3.3sRC and 3.2sF/4sRC). PCR amplification protocols for EF-1α and DDC were unchanged from Mitchell et al. (1997) and Fang et al. (1997), respectively. Note that reverse transcription (RT)-PCR was used to avoid introns in DDC. To obtain purer templates for sequencing, we always used a nested PCR approach; that is, a first amplification used external primers, and reamplifications utilized nested internal primers. Sequences were generated with an Applied Biosystems 373A Stretch DNA sequencer using Taq polymerase and dye-primer. Templates were sequenced in both directions.
Data Analysis
We used the XDAP program within the software package STADEN (Staden, 1992) to check chromatograms for accuracy of base-calling and to assemble contiguous DNA fragments. Sequences were aligned manually by using the Genetic Data Environment software package (GDE 2.2; Smith et al., 1994). No gaps were needed for alignment.
Nucleotide sequences for each gene were first analyzed separately for the 77-taxa data sets, and then sequences for the two genes were concatenated for the combined data analyses. To facilitate direct comparisons between our current data set with 77 taxa and those of our earlier studies with 49 taxa each (Mitchell et al., 1997; Fang et al., 1999), we also analyzed reduced data sets with 49 taxa each. Taxa included in these smaller data sets are indicated in Table 1 and were selected to match the taxon sampling of the earlier studies as closely as possible. Exact matches were not possible because DDC sequences could not be obtained for some of the species used in the earlier EF-1α study, and vice versa.
To test the hypothesis that the increased support levels seen in the combined data set were a direct consequence of combining data from independent genes, as opposed to simply increasing the total number of informative characters, we made additional data sets as follows. Seqboot, part of the Phylip package (Felsenstein, 1995), was used to generate bootstrap pseudodata sets from the original data for each gene, separately. For EF-1α, the first 709 bp from a bootstrap pseudodata set were added to the 1,240 bp of the original data set to get 1,949 bp. For DDC, 1,240 bp from three separate bootstrap pseudodata sets were added to the 709 bp of the original data set to get 1,949 bp. To account for the considerable random variation expected among bootstrap pseudodata sets (Hillis and Bull, 1993), we repeated this procedure twice. Thus there were three bootstrap-augmented data sets for either gene, each equal in size to the combined data set.
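The idea behind bootstrap augmentation can be sketched in a few lines. The example below is a simplification and not the Seqboot-based procedure itself: instead of cutting sites from ready-made pseudodata sets, it draws the required number of alignment columns with replacement directly and appends them to the original data. The type alias, function names, and toy alignment are all hypothetical.

```python
import random
from typing import Dict

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def alignment_length(alignment: Alignment) -> int:
    """Return the common sequence length, checking that all rows agree."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same length")
    return lengths.pop()


def bootstrap_augment(alignment: Alignment, target_length: int,
                      rng: random.Random) -> Alignment:
    """Pad an alignment to target_length with columns resampled from itself."""
    n_sites = alignment_length(alignment)
    n_extra = target_length - n_sites
    if n_extra < 0:
        raise ValueError("target_length is shorter than the alignment")
    # Draw column indices with replacement; the same indices are used for
    # every taxon so that each resampled site stays aligned across taxa.
    picks = [rng.randrange(n_sites) for _ in range(n_extra)]
    return {taxon: seq + "".join(seq[i] for i in picks)
            for taxon, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"taxonA": "ACGT", "taxonB": "AGGT", "taxonC": "TCGA"}
    print(bootstrap_augment(toy, 10, random.Random(1)))
```

With real data, a 1,240-bp alignment would be padded to a target length of 1,949 and a 709-bp alignment likewise, so that each augmented single-gene data set matches the combined data set in size while containing no new independent information.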
Parsimony analysis.—Calculation of uncorrected pairwise divergences and all maximum parsimony (MP) analyses were performed by using test versions of PAUP*4.0 (D. Swofford, pers. comm.). MP analyses consisted of heuristic searches with 1,000 random addition sequences and tree bisection-reconnection (TBR) branch swapping, unless stated otherwise. Initial MP analyses used equal weights for all taxa. Subsequent analyses used six-parameter parsimony step matrices, calculated by using a spreadsheet described by Cunningham (1997a) and supplied to us by the author. In these analyses, transformations were weighted by the negative of the natural logarithm of their frequencies (our "ln-weighting").
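The weighting rule itself is simple to illustrate. The sketch below converts counts of the six unordered substitution types into step costs equal to the negative natural logarithm of their relative frequencies. It is only a minimal illustration of that rule: how the frequencies were estimated in the Cunningham (1997a) spreadsheet is not described here, and the counts used in the example are invented.

```python
import math
from typing import Dict


def ln_weights(change_counts: Dict[str, float]) -> Dict[str, float]:
    """Step cost for each substitution type: -ln(relative frequency)."""
    total = sum(change_counts.values())
    if total <= 0 or any(count <= 0 for count in change_counts.values()):
        raise ValueError("all counts must be positive")
    return {pair: -math.log(count / total)
            for pair, count in change_counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for the six unordered substitution types.
    counts = {"AG": 400, "CT": 450, "AC": 50, "AT": 40, "CG": 20, "GT": 40}
    for pair, cost in sorted(ln_weights(counts).items()):
        print(f"{pair}: {cost:.3f}")
```

Frequent changes such as transitions receive low costs and rare changes receive high costs, which is the intended effect of the weighting.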
Bootstrap percentages (BPs) were based on 1,000 bootstrap replicates with three random addition sequences per replicate, for each gene separately and for combined data, using equal weighting for all characters. Each bootstrap analysis performed under ln-weighting was restricted to 200 replicates, because the use of step matrices increased computer run time approximately fivefold.
Partitions.—Before combining the single-gene data sets in an effort to increase phylogenetic resolution, we considered how, and indeed whether, a combined analysis should be performed. Proponents of conditional combination argue that the degree of incongruence among data partitions should determine whether independent data sets can be combined (e.g., Swofford, 1991; Bull et al., 1993; Huelsenbeck et al., 1996). The incongruence length difference (ILD) test (Farris et al., 1994) appears to be the most appropriate method currently available for this purpose, but it may be too conservative a test (Sullivan, 1996; Cunningham, 1997b). Thus, adherence to conventional significance levels under this test could mean not combining some independent data sets, even when doing so would yield a more robust and accurate phylogeny. To investigate this possibility, we carried out a combined analysis for the 77-taxa data set despite a significant incongruence test result.
The ILD test was performed in PAUP*4.0b2 (D. Swofford, 1999). As suggested by Cunningham (1997a), invariant characters were excluded before the tests were performed. Details of parameters used for each test are given under Results. All ILD tests used 200 replicates with random addition sequences for taxa and TBR branch swapping. Tests performed on the reduced 49-taxa data sets used 10 random addition sequences, whereas tests performed on the 77-taxa data sets used three random addition sequences per replicate. To determine the contributions of individual taxa to overall significance of ILD test results, we excluded taxa for which placement varied markedly between genes or within a gene among methods and repeated the tests.
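The arithmetic of the ILD test can be sketched as follows. The statistic is the length of the shortest combined-data tree minus the summed lengths of the shortest trees for each partition, and the P-value is the fraction of replicates in which random partitions of the same sizes are at least as incongruent. The code assumes that the observed partition is counted as one of the replicates, which would explain a minimum attainable P of 1/200 = 0.005 with 200 replicates; this convention and the function names are our assumptions, and the tree searches that produce the lengths are not shown.

```python
from typing import Sequence


def ild_statistic(combined_length: int,
                  partition_lengths: Sequence[int]) -> int:
    """Extra steps required when partitions must share a single tree."""
    return combined_length - sum(partition_lengths)


def ild_p_value(observed_sum: int, random_sums: Sequence[int]) -> float:
    """Fraction of replicates at least as incongruent as the observed one.

    The combined-tree length is the same for every partitioning, so a
    random partition with a summed length <= the observed sum has an
    ILD >= the observed ILD. The observed partition is counted as one
    replicate (an assumption of this sketch).
    """
    as_extreme = 1 + sum(1 for s in random_sums if s <= observed_sum)
    return as_extreme / (1 + len(random_sums))


if __name__ == "__main__":
    # Illustration with the ln-weighted tree lengths reported under Results
    # (EF-1alpha 3,869; DDC 7,255; combined 11,259). This assumes the same
    # step matrix was used in all three analyses, which is not stated.
    print(ild_statistic(11259, [3869, 7255]))
```

The printed value is a plain difference of reported tree lengths and is not a statistic given by the authors.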
Distance analyses.—Some data partitions showed significant heterogeneity of base frequencies among taxa. Nonstationarity of base frequencies can cause inconsistency of some phylogenetic methods, but LogDet distances (Lockhart et al., 1994) are robust to nonstationarity (Swofford et al., 1996). An initial minimum evolution (ME; Rzhetsky and Nei, 1992) analysis using LogDet distances with invariable sites was performed for comparison with ME trees derived under the preferred maximum likelihood (ML) model (see below). An ME bootstrap analysis was also performed, with the ML distances estimated by using the same model and parameters as the ML analyses (below). Five hundred bootstrap replicates were performed, each of which used five random-addition sequences for taxa and TBR branch swapping. Complete-and-partial bootstrap analyses were carried out by using the software "njbootli" (Zharkikh and Li, 1995), which uses a synonymous versus nonsynonymous rate model, based on Li (1993). This technique corrects bootstrap values based on the number of taxa in a data set, facilitating comparison of support levels among data sets with different taxon samples.
Comparison of support levels.—Support levels were compared in two ways. First, the overall frequency of strong support was used as a measure of resolution strength per se, regardless of which groups these were. Second, clade-by-clade comparisons of support levels were made, in which we also kept track of concordance groups. This provides a combined measure of support and accuracy. For comparisons between 49-taxa and 77-taxa data sets, we considered only those groups represented by multiple taxa in both data sets. We used sign tests, as described by Zar (1984), to evaluate the statistical significance of differences in BPs between data sets; only the direction (sign) of differences between members of a pair is used in this test, not the actual difference. Nodes that broke up concordance groups and therefore were expected to have low support levels were treated differently from other groups; that is, the sign was reversed before performing the test. Differences of zero are not considered in sign tests, so n is the number of differences having a sign.
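The sign test described above can be sketched with an exact binomial calculation. The function below takes paired bootstrap percentages for the same nodes in two data sets, optionally reverses the sign for nodes that break up concordance groups, drops zero differences, and returns n together with a two-tailed P-value. The two-tailed form, the function name, and the example values are our assumptions.

```python
from math import comb
from typing import Optional, Sequence, Tuple


def sign_test(bp_first: Sequence[float], bp_second: Sequence[float],
              reverse: Optional[Sequence[bool]] = None) -> Tuple[int, float]:
    """Exact two-tailed sign test on paired bootstrap percentages.

    reverse[i] is True for nodes that break up a concordance group, whose
    difference is sign-reversed before counting. Returns (n, P), where n
    is the number of nonzero differences.
    """
    if reverse is None:
        reverse = [False] * len(bp_first)
    n_pos = n_neg = 0
    for first, second, flip in zip(bp_first, bp_second, reverse):
        diff = second - first
        if flip:
            diff = -diff
        if diff > 0:
            n_pos += 1
        elif diff < 0:
            n_neg += 1
    n = n_pos + n_neg
    if n == 0:
        return 0, 1.0
    k = min(n_pos, n_neg)
    # Binomial(n, 0.5) probability of k or fewer signs of the rarer kind.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical BPs for eight nodes in two data sets.
    bp_a = [50, 60, 70, 80, 55, 65, 90, 40]
    bp_b = [60, 70, 80, 90, 65, 75, 85, 40]
    print(sign_test(bp_a, bp_b))
```

In the example, six differences are positive, one is negative, and one is zero, so n = 7 and the tied node does not contribute.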
Maximum likelihood analyses.—ML analyses were performed on the 77-taxa combined data set by using PAUP*4.0b2 (Swofford, 1999). As suggested by Frati et al. (1997), 16 models (four substitution models permuted with four rate-distribution models) were evaluated on the most-parsimonious (mp) trees derived under ln-weighting. Likelihood ratio tests were used to determine which models had significantly higher likelihood scores. The best model was then chosen from the set of models with the best score, preference being given to the model with the fewest free parameters. This model was selected for further analyses, with all parameters estimated once from the data on the mp trees, and fixed subsequently. The ML heuristic search strategy entailed (1) TBR branch swapping to completion on user-input trees, including the mp trees obtained under equal and ln-weighting, and ME trees obtained under the preferred ML model and the LogDet model; (2) 15 random addition sequence replicates for taxa with nearest neighbor interchange (NNI) branch swapping to completion; (3) 10 random addition sequence replicates for taxa with TBR branch swapping, with only a single replicate swapping to completion because of time constraints. The Kishino-Hasegawa test (Kishino and Hasegawa, 1989) was used to choose among ML trees.
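For nested models, a likelihood ratio test compares twice the difference in log-likelihood with a chi-square distribution whose degrees of freedom equal the number of additional free parameters. The sketch below assumes this standard form of the test and uses the closed-form chi-square tail that holds for even degrees of freedom, so it needs only the standard library; the function names are ours. The example call uses the log-likelihood difference (54.07) and degrees of freedom (4) reported under Results for the comparison of the two best models.

```python
import math
from typing import Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, valid for even df only.

    For df = 2m the tail is exp(-x/2) * sum_{k<m} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(delta_ln_l: float, df: int) -> Tuple[float, float]:
    """Return the test statistic 2 * delta_ln_l and its P-value."""
    statistic = 2.0 * delta_ln_l
    return statistic, chi2_sf_even_df(statistic, df)


if __name__ == "__main__":
    statistic, p_value = likelihood_ratio_test(54.07, 4)
    print(f"2*delta lnL = {statistic:.2f}, P = {p_value:.3g}")
```

A difference of this size with 4 df gives a P-value far below 0.001, in line with the reported significance.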
RESULTS
Third-codon position base composition was significantly heterogeneous among taxa for both EF-1α and DDC (χ²-tests, P = 0.013 and 0.000, respectively). For EF-1α, exclusion of Arctiidae and Lymantriidae from the data set resulted in homogeneity of base composition (P = 1). For DDC we had to exclude all or part of Notodontidae, Arctiidae, Lymantriidae, and quadrifine Noctuidae before nonsignificance was achieved. However, the LogDet-model ME trees recovered the same concordance groups as the ln-weighted mp tree and differed from the ML-model ME trees only in minor rearrangements among adjacent nodes.
ILD tests performed on the two combined-gene 49-taxa data sets yielded nonsignificant results under both equal weighting (P = 0.450 and 0.500) and ln-weighting (P = 0.210 and 0.570).
For the 77-taxa data set, results of ILD tests are summarized in Table 3. Under equal weighting, an initial ILD test suggested statistically significant incongruence between the two genes (P = 0.005, the minimum value possible for 200 test replicates). Incongruence became less significant on exclusion of certain relatively long-branch taxa (P = 0.035). Under ln-weighting, ILD test P-values were usually higher, and a marginally significant test result was obtained by excluding eight taxa (P = 0.040).
Maximum Parsimony Analyses, 77 Taxa
Table 4 summarizes the more notable differences among the mp trees recovered by each gene and by the combined data set under both equal weighting and six-parameter parsimony ln-weighting. The number of concordance groups recovered by data partitions under different weighting schemes is shown in Table 5. Summary statistics for various data partitions plotted onto one of the mp trees are shown in Table 6.
EF-1α.—Uncorrected pairwise divergence values (total nucleotides) for EF-1α ranged from 1.0% (within Spodoptera) to 10.7% (Aganainae + Herminiinae vs. Arctiidae + Lymantriidae). EF-1α divergence values were greater within the Arctiidae + Lymantriidae clade because of differences in third-codon position base composition (Mitchell et al., 1997). Equally weighted MP analysis of EF-1α alone resulted in 25 mp trees of 2,736 steps each, with the retention index (RI) = 0.443. The strict consensus tree
TABLE 3. Incongruence length difference tests on 77-taxa data set.
                                                                 P-values
Taxa excluded^a                           No. of taxa excluded   Equal weight   ln-weight
None                                      0                      0.005          0.010
Hmi                                       1                      0.005          0.035
Blev                                      1                      0.015          0.005
Hmi, Blev                                 2                      0.035          0.020
Hmi, Blev, Gsep                           3                      0.010          0.020
Nolinae                                   2                      0.035          0.020
Nolinae, Hmi, Gsep                        4                      0.005          0.010
Acontiinae                                2                      0.010          0.005
Plusiinae                                 3                      0.005          0.010
Plusiinae, Hmi                            4                      0.035          0.040
Plusiinae, Hmi, Gsep                      5                      0.025          0.040
Plusiinae, Acontiinae, Hmi                6                      0.015          0.020
Nolinae, Acontiinae                       4                      0.005          0.005
Nolinae, Plusiinae                        5                      0.035          0.015
Acontiinae, Plusiinae                     5                      0.010          0.005
Notodontidae                              5                      0.005          0.005
Acontiinae, Plusiinae, Gsep, Hmi, Blev    8                      0.015          0.040
^aAbbreviations of species names are given in Table 1.
had only 41 resolved nodes out of a possible 75. Under ln-weighting a single mp tree was found, with a length of 3,869 steps and RI = 0.458 (Fig. 1, left).
DDC.—Pairwise divergence values for DDC ranged from 2.8% (within Xylenini) to 26.7% (within Nolinae). Equally weighted MP analysis of DDC alone resulted in four mp trees of 4,661 steps each, with RI = 0.456. The strict consensus tree had 66 resolved nodes, many more than EF-1α (equally weighted), the conflict being primarily among the clades of higher trifine noctuids. The ln-weighting scheme gave two mp trees with lengths of 7,255 steps and RI = 0.474. The strict consensus of these trees had a 12-branch polytomy at the base of the trifine clade. This polytomy was primarily the result of alternative placements of Acontiinae, either at the base of the trifine noctuid clade or as a highly derived trifine group. The similarity between the two trees is thus better illustrated by an Adams consensus tree (Fig. 1, right).
Combined data.—Equally weighted MPanalysis of the combined data set gave four
TABLE 4. Summary of most notable differences among trees inferred from equally weighted versus ln-weighted parsimony analyses.
                                               EF-1α only    DDC only      Combined data
Relationships recovered in mp trees            Eq.    ln     Eq.    ln     Eq.    ln
Ingroup monophyletic?                          N      Y      Y/N    Y      Y      Y
Arctiidae monophyletic?                        N      N      Y      Y      Y/N    Y
(Arctiidae + Lymantriidae) monophyletic?       Y/N    Y      N      N      Y/N    Y
Nolinae placed with other quadrifines?         Y/N    Y      Y      Y      N      Y
Plusiinae placed within trifine Noctuidae?     N      Y      N      N      N      N
Acontiinae placed within trifine Noctuidae?    Y/N    Y      N      Y      Y      Y
Acontiinae basal within trifine Noctuidae?     N      Y      N/A    Y/N    N      Y
Eq., equal weighting; ln, six-parameter parsimony, logarithmic weighting; Y/N, yes for some trees and no for others; N/A, not applicable.
TABLE 5. Recovery of concordance groups by 77-taxa data sets under different methods.
Data sets      MP eq. wt.ᵃ   MP ln wt.ᵃ   MLᵃ
Combined       21            21           21
EF-1α          13            17
DDC            19            20
EF-1α BA1ᵇ     13            -
EF-1α BA2ᵇ     12            -
EF-1α BA3ᵇ     15            -
DDC BA1ᵇ       20            -
DDC BA2ᵇ       19            -
DDC BA3ᵇ       18            -
ᵃMaximum number possible = 24; dashes indicate analysis not performed.
ᵇBootstrap-augmented data sets with 1,949 bp from single genes.
mp trees, the strict consensus of which had 71 resolved nodes. Three mp trees were recovered under ln-weighting, with lengths of 11,259 steps and RI = 0.461 (Fig. 2).
Maximum Likelihood Analyses, 77 Taxa
The best-fitting likelihood model for the ln-weighted mp tree was the most general and parameter-rich model, the general time reversible model (GTR; Lanave et al., 1984; Rodriguez et al., 1990), with invariant sites and gamma-distributed rates, that is, GTR + I + Γ. This model had a score of -ln L = 33,226.45 (where L refers to likelihood), which proved significantly better than the second-best model, HKY85 (Hasegawa et al., 1985) + I + Γ, based on the likelihood ratio test with 4 df (Δln L = 54.07, P < 0.001). Table 7 shows the estimated parameters of the GTR substitution model with rate heterogeneity, for different partitions of the data, based on the mp tree (Fig. 2). EF-1α had ~1.5 times as many invariable sites as DDC and showed somewhat greater rate heterogeneity, even when the proportion of invariable sites in each gene was accounted for. Site-specific rate models (not tabulated) yielded the following rate estimates: the partition EF-1α:DDC gave relative rates of ~1:3.3, and the partition first:second:third codon position gave relative rates of 4:1:81 and 2.7:1:22 for EF-1α and DDC, respectively. The best tree recovered for the combined data set under the ML criterion (Fig. 3) had
TABLE 6. Summary statistics for data partitions mapped onto one of four mp trees for the combined data set, under equal weighting.
Partition      No. characters   No. (%) variable   No. (%) informativeᵃ   No. steps on tree   CI      RI
Combined       1,949            803 (41)           666 (34)               7,485               0.168   0.443
EF-1α
  All sites    1,240            399 (32)           329 (27)               2,792               0.196   0.428
  nt1          413              35 (8)             15 (4)                 145                 0.174   0.450
  nt2          413              18 (4)             6 (1)                  37                  0.259   0.500
  nt3          414              346 (84)           303 (73)               2,610               0.197   0.426
DDC
  All sites    709              404 (57)           337 (48)               4,693               0.152   0.452
  nt1          236              110 (47)           79 (33)                609                 0.206   0.544
  nt2          236              65 (28)            32 (14)                237                 0.262   0.570
  nt3          237              229 (97)           226 (95)               3,847               0.137   0.428
CI = consistency index, excluding uninformative characters; RI = retention index.
ᵃParsimony informative characters.
FIGURE 1. Comparison of mp trees derived under ln-weighting for EF-1α and DDC gene sequences. For DDC, the tree shown is an Adams consensus of 2 mp trees. Thick branches have >70% bootstrap support. The genera of "Caradrinini" are bold. Asterisks indicate concordance groups. Four additional concordance groups are Euteliinae + Stictopterinae, Cuculliinae + Oncocnemidinae, Spodoptera (Sex + Sfr + Sor), and S. frugiperda + S. ornithogalli (Sfr + Sor).
FIGURE 2. One of three mp trees for the combined data set, under ln-weighting. The single branch that collapses in the strict consensus tree is indicated by a thick line. Numbers above branches are bootstrap percentages (shown only if ≥50%), followed by branch lengths under ACCTRAN optimization. Asterisks indicate concordance groups. Four additional concordance groups are as detailed in Figure 1. Taxa included in the Noctuinae s.l. by Poole (1995) but not strongly allied with that group by our data are indicated by a dotted line.
TABLE 7. Parameters for the general time reversible (GTR) substitution model with rate heterogeneity, estimated by maximum likelihood on the mp tree derived under ln-weighting.
Parameters                                  EF-1α     DDC       Combined
Γ-distr. shape parameter (α)ᵃ               0.82      0.94      0.86
Proportion of invariable sites (φ)ᵃ         0.64      0.45      0.58
α estimated with φ = 0                      0.20      0.31      0.23
Relative substitution rate parametersᵇ
  A-C                                       2.163     1.939     2.044
  A-G                                       11.32     7.741     9.085
  A-T                                       4.402     2.160     3.027
  C-G                                       1.825     1.140     1.302
  C-T                                       20.01     7.327     10.44
  G-T                                       1         1         1
ᵃα and φ estimated simultaneously.
ᵇR-matrix values.
a score of -ln L = 33,181.26. Nine of the 12 random addition sequence replicates using TBR branch swapping, including the single replicate that was allowed to swap to completion, found this tree. The latter replicate used approximately the same time as all 15 random addition sequence replicates that used NNI branch swapping, combined. None of the NNI-swapped replicates found trees as good as the TBR-swapped trees, but 13 replicates found trees that were not significantly worse (Kishino-Hasegawa tests, P > 0.05).
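The likelihood ratio comparison of GTR + I + Γ against HKY85 + I + Γ reported for the ln-weighted mp tree can be checked with a short calculation. The sketch below is illustrative only: it assumes that the reported Δln L = 54.07 is the difference in log-likelihood between the two nested models, so that the test statistic is 2Δln L, and it uses the closed-form chi-square tail probability that holds for even degrees of freedom. The function names are hypothetical and are not taken from any phylogenetics package.

```python
from math import exp, factorial


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate for even, positive df.

    For df = 2m the tail has the closed form
    exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires an even, positive df")
    half = x / 2.0
    return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))


def likelihood_ratio_test(delta_ln_l: float, df: int) -> tuple[float, float]:
    """Return (test statistic, P-value) for two nested models.

    Assumption: delta_ln_l is the log-likelihood difference, so the
    statistic referred to the chi-square distribution is 2 * delta_ln_l.
    """
    statistic = 2.0 * delta_ln_l
    return statistic, chi2_sf_even_df(statistic, df)


# Values reported in the text: delta ln L = 54.07 with 4 df.
stat, p_value = likelihood_ratio_test(54.07, 4)
print(f"2 * delta ln L = {stat:.2f}, P = {p_value:.3g}")
```

Under these assumptions the P-value falls far below 0.001, in line with the reported result. With 4 df the conventional 0.05 critical value is about 9.49, so the observed statistic is more than ten times the threshold.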
The relationships recovered by the best ML tree (Fig. 3) were similar to those of the mp tree under ln-weighting (Fig. 2). The most notable difference was in the placement of Condicinae, which was the basal trifine lineage under MP but a much more derived group under ML. Although none of the intervening nodes were strongly supported under either MP or ME, all the mp trees had significantly lower likelihoods than the ML tree (Kishino-Hasegawa tests, P ≤ 0.040). The two methods recovered the same number of concordance groups, although Psaphidinae was recovered by ML only, and the expected relationships within Spodoptera were recovered by MP only.
The following rearrangements of taxa could be performed under the ML criterion without significantly altering tree scores, based on Kishino-Hasegawa tests: Plusiinae could be moved to the base of the trifine clade; Cucullia convexipennis (Ccon) could be placed as sister group to Agaristinae, as in the mp trees; Raphia abrupta (Rabr) could be placed as sister group to Pantheinae; and the four basal-most clades of trifines in the ML tree could be united as a monophyletic group.
Taxon Sampling: Effects on Support Levels
EF-1α analyses yielded 18 groups with multiple representatives in both the 49-taxa and 77-taxa data sets (Table 8). Under MP, half of these groups showed <10% difference in BP between data sets and were not considered further. Of the remaining nine clades, seven had greater BPs with 49 taxa than with 77 taxa (clades 1, 7, 9, 19, 27, 29, and 33); only two had greater BPs with 77 taxa (clades 4 and 8). However, clade 8 breaks up a concordance group (Arctiidae), which means that the lower BP obtained with the 49-taxa data set is more congruent with existing phylogenetic evidence. Thus, for EF-1α, eight of nine clades have "better" BPs with 49 taxa than with 77 taxa. This trend is statistically significant (sign test, 0.05 > P > 0.02).
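The sign test used for these BP comparisons reduces to a binomial tail probability. As a minimal sketch, assuming an exact two-tailed test with ties dropped beforehand (the text does not say whether an exact test or a tabulated approximation was used), the eight-versus-one split for EF-1α and the three-versus-four split reported for DDC in this section can be evaluated as follows. The function name is hypothetical.

```python
from math import comb


def sign_test_two_tailed(n_plus: int, n_minus: int) -> float:
    """Exact two-tailed sign test P-value under a fair-coin null.

    Ties are assumed to have been removed before counting.
    """
    n = n_plus + n_minus
    if n == 0:
        return 1.0
    k = min(n_plus, n_minus)
    # Probability of a split at least as lopsided in one direction.
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * one_tail)


# EF-1alpha under MP: 8 of 9 clades had "better" BPs with 49 taxa.
p_ef = sign_test_two_tailed(8, 1)
# DDC under MP: 3 clades favored 49 taxa and 4 favored 77 taxa.
p_ddc = sign_test_two_tailed(3, 4)
print(f"EF-1alpha: P = {p_ef:.4f}")
print(f"DDC:       P = {p_ddc:.4f}")
```

Under these assumptions the EF-1α value lies between 0.02 and 0.05 and the DDC value exceeds 0.5, which agrees with the ranges quoted in the text.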
For DDC, 19 BP comparisons were available (note that the DDC and EF-1α 49-taxa data sets sampled different taxa). Under MP, 12 clades showed <10% difference between data sets and were not considered further. Of the remaining seven clades, three had greater BPs with 49 taxa (clades 1, 4, and 33) and four had greater BPs with 77
FIGURE 3. Maximum likelihood tree obtained under GTR + I + Γ model, -ln L = 33,181.26. Branches with ≥70% and ≥90% bootstrap support under ME analysis using the same ML model are drawn 2× and 3× thicker, respectively. The seven longest terminal branches and the seven longest internal branches are indicated by "t" or "i," respectively, and their rank order in parentheses above the branch; following the parentheses is the branch length in number of substitutions per site. The dotted line indicates taxa included in the Noctuinae s.l. by Poole (1995) but not strongly allied with that group by our data. Scale bar: 0.05 substitutions/site.
taxa (clades 2, 9, 19, and 29). The sign test for BP comparisons was nonsignificant (P > 0.5).
The complete-and-partial bootstrap, designed to correct BPs for the number of taxa used and implemented for neighbor-joining (Zharkikh and Li, 1995), was performed next. For EF-1α, all five of the clades that showed >10% difference in BP between data sets (clades 6, 7, 19, 25, and 27) had larger BPs for the 49-taxa data set. This trend was not statistically significant because of the small number of comparisons available (n = 5; 0.5 > P > 0.2). For DDC, eight of nine comparisons (clades 1, 2, 15, 17, 25, 29, 30, and 33) showed the same trend (0.05 > P > 0.02).
Combining Data: Effect on Support Levels
Overall support, measured by the percentage of nodes with parsimony BP ≥ 50%, was greater for the combined-data mp trees than for either gene alone: Under ln-weighting, 48% and 55% of nodes in the EF-1α and DDC trees, respectively, had BP ≥ 50%, whereas 65% of the nodes in the combined-data mp tree are supported at that level.
The effect of combining data on levels of support for particular groups, under three different methods of analysis, is shown in Table 8 (ignoring the values in parentheses). The five clades that break up concordance groups (i.e., clades 3, 8, 35, 36, and 37) almost certainly represent incorrect relationships. It is therefore better that these clades receive little support. Indeed, under equally weighted parsimony, these clades have substantially lower BPs in the combined analysis than the highest value for either gene analyzed separately. Of the remaining 32 clades, all but seven have higher BPs in the combined analysis than for each gene alone. For five of the remaining seven clades (11, 13, 20, 33, and 34), the combined-data BP is much nearer the higher than the lower of the two BPs from each gene alone. Of the remaining two clades, one (clade 9) reflects strong conflict between the two genes over the placement of the arctiid Hypoprepia miniata, and the other (clade 21) reflects conflict, albeit weaker, over the placement of Raphia abrupta. Thus, 30 of 37 clades have better BPs in the combined analysis and only two clades have substantially better BP in the analysis of a single gene. This difference is highly significant (sign test, P < 0.001). If one excludes BP differences of ≤10% from consideration, then 16 of 20 clades show the same trend (0.02 > P > 0.01). The same trends were seen under ln-weighted parsimony and distance methods.
The BPs for each of the above 37 clades were also tabulated separately for the combined data set and each of the six bootstrap-augmented, single-gene data sets (data not shown). Results of these BP comparisons are summarized in Table 9. For DDC, the differences in BPs between the bootstrap-augmented data sets and the combined data set were nonsignificant. For EF-1α, the differences were significant, with the combined data set having higher BPs.
The number of concordance groups recovered by the bootstrap-augmented data sets was also noted (Table 5). None of the six bootstrap-augmented data sets recovered as many concordance groups as the combined data set did.
DISCUSSION
The source and effects of third-position base composition bias in EF-1α were identified by Mitchell et al. (1997) and are mentioned here only for completeness. For EF-1α the source of heterogeneity was the Arctiidae and Lymantriidae. These families shared a distinctive third-position base composition, but the similarity of LogDet ME trees and ML ME trees, among other things, suggested that this bias had little if any effect on phylogeny reconstruction. A similar conclusion was reached by Fang et al. (2000) for DDC third positions, although the source of heterogeneity in the 77-taxa data set is not as easily definable, including Notodontidae and at least some quadrifine noctuids, arctiids, and lymantriids.
Combining disparate phylogenetic data sets in a single analysis can be problematic if there is substantial disagreement among them (e.g., Bull et al., 1993). However, defining "substantial disagreement" is difficult. The ILD test (Farris et al., 1994) has been proposed as an objective test of the degree of disagreement among data partitions, but recent studies suggest that the
TABLE 8. Bootstrap percentages for clades recovered by different data sets, under equal and ln-weighting schemes for parsimony analysis, and using the complete-and-partial bootstrap program Njbootli (Zharkikh and Li, 1995) for distance analysis. Only clades showing substantially different bootstrap values (i.e., ≥10% difference) among data set partitions were tabulated.
                                                          MP, equal weighting                 MP, ln-weighting                    Distance, Njbootliᵇ
Clade Taxa included in cladeᵃ                             EF-1α       DDCᶜ        Comb.       EF-1α       DDC         Comb.       EF-1αᶜ      DDCᶜ        Comb.
1     Ingroup (all except Notodontidae)                   12 (24)     42 (64)     50          13          69          64          4 (12)      19 (70)     40
2     Notodontidae, excl. Gsep                            <5          45 (<5)     53          9           75          77          -           44 (76)     50
3ᵈ    Gsep, Blev                                          34          12          <5          22          9           15          -           -           -
4     Arctiidae, Lymantriidae, some quadrifine noctuidsᵉ  53 (19)     31 (49)     84          68          57          92          23 (-)      - (18)      65
5     Arctiidae, Lymantriidae, Herminiinae, Aganainae     <5 (11)     35 (39)     52          <5          37          43          [illegible] 36 (28)     -
6     Arctiidae, Lymantriidae                             38 (36)     24 (28)     52          56          31          71          36 (61)     - (81)      82
7     Lymantriidae                                        54 (71)     75 (69)     85          72          88          97          64 (94)     [illegible] 89
8ᵈ    Lymantriidae, Hmi                                   77 (58)     <5 (8)      43          74          <5          9           74 (76)     - (-)       54
9     Arctiidae                                           <5 (35)     91 (77)     52          <5          92          81          [illegible] 66 (71)     -
10    Herminiinae, Aganainae                              14          20          29          36          33          69          -           -           79
11    Herminiinae                                         50          95          93          59          95          92          72          95          89
12    Euteliinae, Stictopterinae                          34          74          81          33          85          91          -           47          80
13    Catocalinae, Calpinae                               70          7 (10)      58          75          <5          44          -           - (17)      -
14    Nolinae                                             15 (18)     16          27          19          47          65          - (-)       -           -
15    Eustrotiinae                                        6           56 (52)     72          <5          80          72          -           14 (63)     -
16    some Plusiinae (Afa, Cfor)                          67          89          95          70          87          98          98          100         99
17    Stiriinae, Psaphidinae, Apyr                        16 (9)      31 (30)     53          18          63          75          11 (10)     45 (64)     65
18    Psaphidinae, Apyr                                   13 (21)     68 (60)     72          16          80          89          - (-)       44 (49)     -
19    Psaphidinae                                         54 (64)     78 (68)     86          32          36          33          23 (66)     51 (-)      79
20    some Psaphidinae (Pres, Tsap)                       25          89          74          44          90          97          40          89          71
TABLE 8. Continued
                                                          MP, equal weighting                 MP, ln-weighting                    Distance, Njbootliᵇ
Clade Taxa included in cladeᵃ                             EF-1α       DDCᶜ        Comb.       EF-1α       DDC         Comb.       EF-1αᶜ      DDCᶜ        Comb.
21    Rabr, Phe                                           44 (43)     <5          <5          33          <5          <5          28 (-)      -           -
22    Oncocnemidinae, Agaristinae                         8 (<5)      43          51          7           36          45          - (-)       -           30
23    Condicinae                                          7           42          69          8           35          58          13          38          -
24    some Condicinae (Cvi, Ldip)                         34          91          93          76          95          97          29          71          86
25    some Heliothinae (Sar, Abe, Hal, Mte, Hvi, Hze)     35 (27)     84 (82)     90          51          84          91          69 (84)     62 (75)     96
26    some Heliothinae (Hvi, Hze)                         29          67 (64)     77          67          94          96          72          48 (-)      76
27    Noctuinae s.l., excl. all "Caradrinini" but Atar    60 (76)     94 (93)     100         54          86          99          40 (73)     80 (88)     97
28    Noctuinae s.l., excl. all "Caradrinini"             8           50          57          <5          62          52          -           24          24
29    Hadeninae, Apameini, Xylenini, Nra                  12 (60)     27 (16)     44          13          22          13          - (52)      17 (76)     34
30    Hadeninae, Apameini, Xylenini                       14 (20)     21 (16)     43          15          12          22          - (-)       19 (53)     -
31    Apameini, Xylenini                                  90          47          97          95          18          96          81          79          95
32    Apameini                                            15          81          90          20          97          98          40          59          96
33    Xylenini                                            19 (95)     75 (95)     68          25          75          72          - (90)      77 (100)    90
34    Cmol, Elg                                           <5          47          33          <5          42          42          -           30          17
35ᵈ   Cmol, Homo                                          67          <5          <5          84          5           11          38          -           -
36ᵈ   Sex, Sor                                            76          20          28          74          32          22          61          94          58
37ᵈ   Aamp, Aral                                          57          <5          <5          48          <5          <5          -           -           -
ᵃSpecies abbreviations are from Table 1.
ᵇDashes denote no value given (the "njbootli" software reports bootstrap values for those clades recovered in the neighbor-joining tree only).
ᶜBootstrap percentages in parentheses are values derived for comparable groups (if present) in the 49-taxa data sets.
ᵈClades that break up concordance groups (clades for which monophyly has been established by morphological evidence); it is expected that these groups receive little support.
ᵉQuadrifine noctuid subfamilies included are Herminiinae, Aganainae, Catocalinae, Calpinae, and Hypeninae.
TABLE 9. Sign tests performed on bootstrap percentage (BP) comparisons between each of the six bootstrap-augmented data sets and the combined data set for the 37 clades listed in Table 8.
Data sets    +ᵃ     -ᵇ     0ᶜ     n      Pᵈ
EF-1α 1      34     3      0      37     <0.001
EF-1α 2      33     4      0      37     <0.001
EF-1α 3      32     5      0      37     <0.001
DDC 1        15     17     5      32     >0.5
DDC 2        14     20     3      34     >0.5
DDC 3        13     20     4      33     <0.5, >0.2
ᵃNumber of clades for which the combined data set had the larger BP.
ᵇNumber of clades for which the combined data set had the smaller BP.
ᶜNumber of clades showing no difference in BP.
ᵈTwo-tailed test.
ILD test is not entirely appropriate as a test for combinability of separate data partitions. For example, Cunningham (1997b) suggested that "a significance threshold of 0.05 may be too conservative for the ILD test," after finding that ILD test P-values had to be much less than 0.01 (P = 0.001) before combined analysis of data partitions led to reduced recovery of a known phylogeny. By this criterion, many of our ILD tests performed under ln-weighting are effectively nonsignificant (0.05 > P > 0.01), justifying our analysis of the combined data set. Nonetheless, much information may be gleaned from comparison of P-values among different tests, as discussed below.
Because there can be only one true species phylogeny, incongruence must indicate either that phylogeny reconstruction for at least one of the data partitions was inaccurate, attributable to poor data quality or inappropriate phylogenetic analysis, or that the partitions have different evolutionary histories, attributable to lineage sorting, for example. The latter explanation is more plausible when internodes are relatively short, as is seen for the deeper branches of the ML tree. Thus lineage sorting cannot be ruled out for this data set. In our analyses, however, much of the incongruence can be eliminated by excluding highly diverged taxa, which involve relatively deep nodes in the tree. This observation seems to favor error in the gene tree(s), possibly from long branch attraction (Felsenstein, 1978b), as the explanation for the incongruence. It is therefore surprising that ILD tests performed on the reduced 49-taxa data sets gave consistently higher P-values than equivalent tests performed on the complete 77-taxa data sets; that is, incongruence between data partitions increased with increased taxon sampling. Long branch and taxon sampling effects could reasonably be expected to decrease with the addition of taxa, making it easier to reconstruct the true phylogeny (e.g., Hillis, 1996). A possible reason for this apparent contradiction is that we could be exacerbating the long branch problem when adding more taxa, because we are inadvertently selecting highly divergent exemplar species from a poorly known higher-level phylogeny.
Robustness of Results to Variation in Phylogenetic Methods
Almost all nodes strongly supported by MP analysis were also strongly supported by ME analysis, and vice versa, with the following exceptions: Three groups with moderate support under MP (≥65%; Nolinae, Arctiidae + Lymantriidae, and Spodoptera frugiperda + S. ornithogalli) had much less support under ME (≤55%). In contrast, support for Psaphidinae, Condicinae, and Stiriinae + Amphipyrinae + Psaphidinae increased from 33%, 58%, and 75% under MP to 83%, 90%, and 96%, respectively, under ME. Relationships within Plusiinae and within Spodoptera were recovered "correctly," that is, in agreement with morphology, by the mp tree (Fig. 2) but not by the ML tree (Fig. 3). Perhaps not coincidentally, both groups are subtended by long internal branches. The placement of plusiines within noctuids also differed: The plusiines were sister group to the rest of the ingroup
in the mp tree but were sister group to the quadrifine noctuids, arctiids, and lymantriids in the ML tree. However, under the ML criterion, Plusiinae can be placed in either position, or as sister group to all other trifine noctuids, without significantly changing the likelihood score (Kishino-Hasegawa tests, P > 0.5).
More Taxa or More Characters?
Taxon sampling effects.—Under the MP criterion, higher support values were seen in the 49-taxa data set than in the 77-taxa data set, in comparisons made for EF-1α. For DDC, slightly more than half the clades compared had higher support with 77 taxa, but the trend was not statistically significant. Note that the extra taxa added in the 77-taxa data set join below the root node of the same group in the smaller data set for three clades (1, 4, and 27). This might be predicted to result in lower BPs for the 77-taxa data set because the subtending branch is split. However, for EF-1α clade 4 actually increased in BP with increased taxon sampling. None of the clades affected in this way in the DDC analysis showed a change >10%.
When a distance method with a correction for the number of taxa in each data set was used (complete-and-partial bootstrap; Zharkikh and Li, 1995), the 49-taxa data sets clearly had higher support values for both genes. All available comparisons for EF-1α supported this conclusion. For DDC, almost all clades compared showed the same trend, and the difference was statistically significant. The nonsignificance of the EF-1α result obtained by this method was obviously the result of the small number of comparisons available (n = 5).
Thus, in cases where increasing taxon sampling changed the support levels for a clade, the effect was a negative one. This result was surprising, in light of the large (57%) increase in taxon sampling, especially because the taxa had been selected in an attempt to break up long branches (Swofford et al., 1996). Furthermore, our results are contrary to the findings of Poe (1998) that when adding taxa decreased the accuracy of a phylogeny estimate, the decrease usually did not involve preexisting relationships.
The reason for the decrease in support levels with addition of taxa appears to be that the additional groups sampled were sampled too sparsely relative to the local saturation level for these genes. The effect is that of adding long branches to the tree rather than breaking up long branches (Graybeal, 1998; Hillis, 1998). Thus, "lineages that previously were not spuriously attracted to each other could become 'long' in a relative sense by virtue of the shortening of another branch on which the added taxa connect" (Poe, 1998).
An alternative explanation for the decrease in support levels, that the original 49 species samples were just fortuitous combinations of taxa, and that different combinations might yield lower support values, was discounted by randomly deleting 28 species from subfamilies for which more than a single exemplar was sampled in the 77 species data set (this was repeated five times). Bootstrap analyses of these new 49-taxa data sets produced support values very similar to those of the original 49-taxa data sets.
We have thus provided empirical data in support of simulation studies (e.g., Poe and Swofford, 1999) that suggest when trying to solve a difficult phylogenetic problem, adding taxa is not a panacea because it can create additional problems under certain circumstances.
Addition of characters.—The combined data sets gave both higher overall levels of support, measured as percentage of nodes with strong support, and higher support levels for the specific groups recovered in the trees, than did any of the single-gene data sets alone. That trend was apparent under all phylogenetic methods used. Those results are no surprise given that consistent methods of phylogeny estimation, by definition, will converge on the true tree with increasing numbers of characters sampled. However, the absolute number of characters sampled is not the only consideration. Our data suggest that equal attention should be paid to potential nonindependence among characters.
Combining data from independent genes.—One caveat to the observation that consistent methods of phylogeny estimation will converge on the true tree as increasing numbers of characters are sampled is that
none of the assumptions of the method can be violated. However, violations of assumptions are not necessarily easily detected (Swofford et al., 1996). This fact alone may be grounds for preferring that additional characters be obtained from an independent source, all else being equal.
Other reasons also support the idea that obtaining characters from independent genes would hold greater benefit for phylogeny inference than obtaining the same total number of characters from a single gene. First, the endemic biases of any one gene (in base composition, evolutionary rate, and so forth) might be diluted in the combined data set. Cummings et al. (1995), for example, showed that data sets consisting of blocks of contiguous sites drawn from complete mitochondrial genomes were less likely to recover the genome phylogeny than were data sets comprising an equal number of single sites drawn randomly from the entire genome. They proposed that the effects of "location-dependent processes in sequence evolution" could be reduced, and thus the power of phylogenetic analysis improved, by sampling "sites from throughout the genome or from other genomes in the organisms." Second, on the positive side, each gene might carry a signal for groupings on which the other genes are silent. These effects could result in decreased support for spurious groupings, in addition to increased support levels for correct groupings.
Our comparisons of the combined-gene data set and the single-gene data sets of the same size obtained by bootstrap augmentation provide some support for the above postulates. For EF-1α, support levels for the bootstrap-augmented data set were consistently lower than for the combined data set. For DDC, there was little apparent difference in support levels between the bootstrap-augmented data set and the combined-gene data set, although the combined data set recovered more concordance groups. One might argue that as long as one used an especially informative gene, it would make little difference whether one sampled more of the same gene or an additional gene. However, the difficulty is knowing in advance that a given gene is especially informative for the group of interest. Besides, this misses the point that the resolution and support levels obtained by using the combined-gene data set were no less than those obtained from the bootstrap-augmented DDC data set, despite the fact that EF-1α proved to offer less information per nucleotide. Thus, while recognizing that the dynamics of this single case may not be generalizable, we argue that the prudent approach to collecting additional sequence data is to prefer that such data come from additional genes.
Systematics of the Noctuoidea
The tentative hypothesis that Noctuidae is paraphyletic with respect to Arctiidae and Lymantriidae (Weller et al., 1994; Mitchell et al., 1997; Fang et al., 2000) is strongly supported by our new data set. There is 96% bootstrap support under MP, and 90% under ME, for a clade comprising Arctiidae, Lymantriidae, and the quadrifine noctuid subfamilies Aganainae, Herminiinae, Catocalinae, Calpinae, and Hypeninae, to the exclusion of other quadrifine and all trifine Noctuidae. It is thus likely that Noctuidae is not a natural group. However, in the interests of taxonomic stability, we decline to revise noctuoid classification to reflect the paraphyly of Noctuidae until the deeper divergences within the family are resolved with reasonable confidence.
Supporting a growing consensus among noctuid systematists (Beck, 1960, 1992; Holloway, 1989; Lafontaine and Poole, 1991; Lafontaine, 1993; Poole, 1995; Speidel et al., 1996; Kitching and Rawlins, 1999), our data also provide the strongest evidence to date (99% bootstrap value under MP and 100% under ME) for the monophyly of the "true cutworms" (Noctuinae s.l.; Poole, 1995). As circumscribed in our trees, Noctuinae s.l. includes Noctuini s.l., Hadenini, Apameini, Xylenini, and some "Caradrinini" (Nedra and Anorthodes). "Caradrinini" is a heterogeneous assemblage of genera, most of whose placements are still problematic; accordingly, further work is needed to clarify the limits of Noctuinae s.l.
Several other higher-level relationships within trifine noctuids are supported by our data. Support for the placement of Amphipyra with the Psaphidinae, and the Stiriinae as sister group to this clade, is relatively strong. Moreover, the broader trifine
relationships in Figure 2 and Figure 3 correspond well to recent morphological hypotheses. For example, groups previously thought to be relatively primitive, such as Acontiinae and Eustrotiinae (Poole, 1995), do indeed take basal positions, whereas subfamilies generally regarded as derived, such as Agaristinae, Cuculliinae s.s., Oncocnemidinae, and Amphipyrinae s.s., are allied with the highly derived Heliothinae and Noctuinae s.l. The placement of other groups, such as Condicinae and Plusiinae, remains difficult. Plusiinae are thought to be allied with the trifine clade (Poole, 1995; Speidel et al., 1996; Kitching and Rawlins, 1999), but our data set lacks the resolving power to distinguish among various basal placements for this group, plusiines being basal within trifines or within quadrifines, or even being basal to all other noctuids. Thus the deeper nodes in Noctuidae remain weakly supported, and clearly, much more evidence will be needed to fully sort out the higher-level relationships of this large paraphyletic group.
Conclusions
By combining data from independent nuclear genes, we obtained the first strong evidence that Noctuidae are paraphyletic with respect to Arctiidae and Lymantriidae, and we have increased confidence in many higher-level relationships within the family.
Concordance among multiple independent data sets is probably the most powerful evidence systematists can provide for phylogenetic relationships (Miyamoto and Fitch, 1995), and this may be reason enough to turn to an independent source when more data are needed to answer a difficult phylogenetic problem. We have provided some evidence for the importance of character-set independence. Our study also suggests that the gene-specific biases endemic to DNA sequence data will be diluted in a combined data set, thus reducing or even eliminating support for erroneous relationships.
The increased support levels in the combined data set provided a strong a posteriori argument for combining data in a single analysis. While we agree in principle with the desirability of a priori congruence testing, our results support proposals by other authors that the ILD test is too conservative for the purpose of deciding when to combine data sets if one applies the conventional significance level of P = 0.05, or even P = 0.01. An additional dilemma is that with large data sets, the time needed to perform a rigorous ILD test may be prohibitive. Quicker methods for testing the significance of incongruence would be welcomed.
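The randomization logic behind an ILD-style test, and the reason its cost grows quickly with data set size, can be illustrated with a minimal sketch. This is not the implementation used in this study. The function names, the default of 99 replicates, and the `tree_length` callback are assumptions made for illustration. In a real analysis `tree_length` would return the length of the shortest tree found by a parsimony search on the supplied characters, so every replicate costs two full tree searches. The `toy_length` function below is a deliberately crude stand-in (a count of distinct site patterns), not a parsimony score; it is included only so that the sketch runs.

```python
import random
from typing import Callable, List, Sequence


def ild_p_value(part_a: Sequence[str], part_b: Sequence[str],
                tree_length: Callable[[Sequence[str]], int],
                replicates: int = 99, seed: int = 0) -> float:
    """Permutation P value for incongruence between two character partitions.

    Each element of part_a / part_b is one aligned character (a column of
    states across taxa). tree_length is a hypothetical callback that scores
    a set of characters; smaller means a shorter tree.
    """
    rng = random.Random(seed)
    # Summed length of the two original partitions, analysed separately.
    observed = tree_length(part_a) + tree_length(part_b)
    pooled: List[str] = list(part_a) + list(part_b)
    n_a = len(part_a)
    as_short = 0
    for _ in range(replicates):
        # Reassign characters at random to partitions of the original sizes.
        rng.shuffle(pooled)
        summed = tree_length(pooled[:n_a]) + tree_length(pooled[n_a:])
        # Conflicting partitions give an unusually short observed sum,
        # so few random partitions should be as short or shorter.
        if summed <= observed:
            as_short += 1
    return (as_short + 1) / (replicates + 1)


def toy_length(columns: Sequence[str]) -> int:
    """Illustrative stand-in only: number of distinct site patterns."""
    return len(set(columns))
```

Under these assumptions, a call such as `ild_p_value(gene1_columns, gene2_columns, toy_length)` returns a P value that would be compared with the chosen significance level; the variable names `gene1_columns` and `gene2_columns` are hypothetical.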
Increased taxon sampling is often recommended as a fix for problematic molecular phylogenetic data sets, especially if taxa are chosen specifically to break up long branches. However, we present an empirical example of deleterious effects: Support levels for most clades decreased when we introduced previously unsampled, divergent groups to the data set. We suggest that this might be a more general phenomenon: If "new" taxa are not sampled densely enough, one can inadvertently increase apparent long branch effects, thereby reducing support for clades on the tree, even when trying to do the opposite! This situation is most likely to occur in taxa for which the phylogenetic relationships are poorly known. At present we are attempting to determine whether a much larger increase in the taxon sample will permit more confident resolution of deep nodes on the noctuid tree.
ACKNOWLEDGMENTS
This paper was greatly strengthened by the thoughtful suggestions made by Dick Olmstead, Chris Simon, Tim Collins, and Cliff Cunningham. We thank Jeff Shultz for providing useful comments on a draft of the manuscript, Bob Poole and Eric Metzler for help with species identifications, Suwei Zhao and Kongyi Jiang for technical assistance, Cliff Cunningham for supplying a spreadsheet for calculation of six-parameter parsimony step matrices, Dave Swofford for use of a test version of PAUP*4.0, Andrey Zharkikh for help with his complete-and-partial bootstrap software, and Bob Poole, Don Lafontaine, Mike Pogue, Tim Friedlander, Quentin Fang, and Soowon Cho for helpful discussions. We are deeply indebted to James Adams, Bob Denno, Ian Kitching, Ed Knudsen, Marcus Matthews, Ric Piegler, Bob Poole, Ron Robertson, and Andy Warren for providing fresh specimens for this study. Financial support from the USDA-NRICGP, USDA-ARS Systematic Entomology Laboratory, and the University of Maryland Graduate School is gratefully acknowledged.
REFERENCES
BECK, H. 1960. Die Larvalsystematik der Eulen (Noctuidae). Akademie Verlag, Berlin.
BECK, H. 1992. New view of the higher classification of the Noctuidae (Lepidoptera). Nota Lepid. 15:3-28.
BULL, J. J., J. P. HUELSENBECK, C. W. CUNNINGHAM, D. L. SWOFFORD, AND P. J. WADDELL. 1993. Partitioning and combining data in phylogenetic analysis. Syst. Biol. 42:384-397.
CHO, S., A. MITCHELL, J. C. REGIER, C. MITTER, R. W. POOLE, T. P. FRIEDLANDER, AND S. ZHAO. 1995. A highly conserved nuclear gene for low-level phylogenetics: Elongation factor-1α recovers morphology-based tree for heliothine moths. Mol. Biol. Evol. 12:650-656.
CUMMINGS, M. P., S. P. OTTO, AND J. WAKELY. 1995. Sampling properties of DNA sequence data in phylogenetic analysis. Mol. Biol. Evol. 12:814-822.
CUNNINGHAM, C. W. 1997a. Is congruence between data partitions a reliable predictor of phylogenetic accuracy? Empirically testing an iterative procedure for choosing among phylogenetic methods. Syst. Biol. 46:464-478.
CUNNINGHAM, C. W. 1997b. Can three incongruence tests predict when data should be combined? Mol. Biol. Evol. 14:733-740.
FANG, Q. Q., S. CHO, J. C. REGIER, C. MITTER, M. MATTHEWS, R. W. POOLE, T. P. FRIEDLANDER, AND S. ZHAO. 1997. A new nuclear gene for insect phylogenetics: Dopa decarboxylase is informative of relationships within the Heliothinae (Lepidoptera: Noctuidae). Syst. Biol. 46:269-283.
FANG, Q. Q., A. MITCHELL, J. C. REGIER, C. MITTER, T. P. FRIEDLANDER, AND R. W. POOLE. 2000. Phylogenetic utility of the nuclear gene dopa decarboxylase (DDC) in noctuoid moths (Insecta: Lepidoptera: Noctuoidea). Mol. Phylogenet. Evol. (in press).
FARRIS, J. S., M. KALLERSJO, A. G. KLUGE, AND C. BULT. 1994. Testing the significance of incongruence. Cladistics 10:315-319.
FELSENSTEIN, J. 1978a. The number of evolutionary trees. Syst. Zool. 27:27-33.
FELSENSTEIN, J. 1978b. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401-410.
FELSENSTEIN, J. 1995. PHYLIP (phylogeny inference package), version 3.572. Department of Genetics, University of Washington, Seattle.
FRATI, F., C. SIMON, J. SULLIVAN, AND D. L. SWOFFORD. 1997. Evolution of the mitochondrial cytochrome oxidase II gene in Collembola. J. Mol. Evol. 44:145-158.
FRIEDLANDER, T. P., J. C. REGIER, AND C. MITTER. 1992. Nuclear gene sequences for higher-level phylogenetic analysis: 14 promising candidates. Syst. Biol. 41:483-489.
FRIEDLANDER, T. P., J. C. REGIER, C. MITTER, AND D. L. WAGNER. 1996. A nuclear gene for higher-level phylogenetics: Phosphoenolpyruvate carboxykinase tracks Mesozoic-age divergences within Lepidoptera (Insecta). Mol. Biol. Evol. 13:594-604.
GALLOWAY, G. L., R. L. MALMBERG, AND R. A. PRICE. 1998. Phylogenetic utility of the nuclear gene arginine decarboxylase: An example from the Brassicaceae. Mol. Biol. Evol. 15:1312-1320.
GRAYBEAL, A. 1994. Evaluating the phylogenetic utility of genes: A search for genes informative about deep divergences among vertebrates. Syst. Biol. 43:174-193.
GRAYBEAL, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47:9-17.
GUPTA, R. S. 1995. Phylogenetic utility of the 90kD heat shock family of protein sequences and an examination of the relationship among animals, plants and fungi species. Mol. Biol. Evol. 12:1063-1073.
HASEGAWA, M., H. KISHINO, AND T. YANO. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 21:160-174.
HILLIS, D. M. 1996. Inferring complex phylogenies. Nature 383:130-131.
HILLIS, D. M. 1998. Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst. Biol. 47:3-8.
HILLIS, D. M., AND J. J. BULL. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182-192.
HOLLOWAY, J. D. 1989. The moths of Borneo. Part 12. Noctuidae: Noctuinae, Heliothinae, Hadeninae, Acronictinae, Amphipyrinae, Agaristinae. Southdene, Kuala Lumpur.
HUELSENBECK, J. P., J. J. BULL, AND C. W. CUNNINGHAM. 1996. Combining data in phylogenetic analysis. Trends Ecol. Evol. 11:152-158.
INTERNATIONAL UNION OF BIOCHEMISTRY. 1986. Nomenclature for incompletely specified bases in nucleic acid sequences. Mol. Biol. Evol. 3:99-108.
KIM, J. 1996. General inconsistency conditions for maximum parsimony: Effects of branch lengths and increasing numbers of taxa. Syst. Biol. 45:363-374.
KISHINO, H., AND M. HASEGAWA. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170-179.
KITCHING, I. J. 1984. An historical review of the higher classification of the Noctuidae (Lepidoptera). Bull. Br. Mus. Nat. Hist. (Entomol.) 49:153-234.
KITCHING, I. J., AND J. E. RAWLINS. 1999. Noctuoidea. Pages 355-401 in Handbook of zoology. Lepidoptera, volume 1: Systematics and evolution (N. P. Kristensen, ed.). W. de Gruyter, Berlin.
LAFONTAINE, J. D. 1993. Cutworm systematics: Confusions and solutions. Mem. Entomol. Soc. Can. 165:189-196.
LAFONTAINE, J. D., AND R. W. POOLE. 1991. Noctuoidea, Noctuidae (Part), Plusiinae. Moths of America north of Mexico, Fascicle 25.1. The Wedge Entomological Research Foundation, Washington, DC.
LANAVE, C., J. PREPARATA, C. SACCONE, AND G. SERIO. 1984. A new method for calculating evolutionary substitution rates. J. Mol. Evol. 20:86-93.
LI, W.-H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.
LOCKART, P. J., M. A. STEEL, M. D. HENDY, AND D. PENNY. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605-612.
MILLER, J. S. 1991. Cladistics and classification of the Notodontidae (Lepidoptera: Noctuoidea) based on larval and adult morphology. Bull. Am. Mus. Nat. Hist. 204:1-230.
MITCHELL, A., S. CHO, J. C. REGIER, C. MITTER, R. W. POOLE, AND M. MATTHEWS. 1997. Phylogenetic utility of elongation factor-1α in Noctuoidea (Insecta: Lepidoptera): The limits of synonymous substitution. Mol. Biol. Evol. 14:381-390.
MIYAMOTO, M. M., AND W. M. FITCH. 1995. Testing species phylogenies and phylogenetic methods with congruence. Syst. Biol. 44:64-76.
ORTI, G., AND A. MEYER. 1996. Molecular evolution of ependymin and the phylogenetic resolution of early divergences among euteleost fishes. Mol. Biol. Evol. 13:556-574.
POE, S. 1998. The effect of taxonomic sampling on accuracy of phylogeny estimation: Test case of a known phylogeny. Mol. Biol. Evol. 15:1086-1090.
POE, S., AND D. L. SWOFFORD. 1999. Taxon sampling revisited. Nature 398:299-300.
POOLE, R. W. 1995. Noctuoidea. Noctuidae (part). Cuculliinae, Stiriinae, Psaphidinae (part). Moths of America north of Mexico, Fascicle 26.1. The Wedge Entomological Research Foundation, Washington, DC.
REGIER, J. C., AND J. W. SHULTZ. 1997. Molecular phylogeny of the major arthropod groups indicates polyphyly of crustaceans and a new hypothesis for the origin of hexapods. Mol. Biol. Evol. 14:902-913.
RODRIGUEZ, F., J. L. OLIVER, A. MARIN, AND J. R. MEDINA. 1990. The general stochastic model of nucleotide substitution. J. Theor. Biol. 142:485-501.
RZHETSKY, A., AND M. NEI. 1992. A simple method for estimating and testing minimum evolution trees. Mol. Biol. Evol. 9:945-967.
SCOBLE, M. J. 1992. The Lepidoptera. Form, function and diversity. Oxford Univ. Press, Oxford, U.K.
SLADE, R. W., C. MORITZ, AND A. HEIDEMAN. 1994. Multiple nuclear-gene phylogenies: Application to pinnipeds and comparison with a mitochondrial DNA gene phylogeny. Mol. Biol. Evol. 11:341-356.
SMITH, S. W., R. OVERBEEK, C. R. WOESE, W. GILBERT, AND P. M. GILLEVET. 1994. The genetic data environment and expandable GUI for multiple sequence analysis. Comput. Appl. Biosci. 10:671-675.
SPEIDEL, W., H. FANGER, AND C. M. NAUMANN. 1996. The phylogeny of the Noctuidae (Lepidoptera). Syst. Entomol. 21:219-251.
STADEN, R. 1992. STADEN package. MRC Laboratory of Molecular Biology, Cambridge, U.K.
SULLIVAN, J. 1996. Combining data with different distributions of among-site variation. Syst. Biol. 45:375-380.
SWOFFORD, D. L. 1991. When are phylogeny estimates from molecular and morphological data incongruent? Pages 295-333 in Phylogenetic analysis of DNA sequences (M. M. Miyamoto and J. Cracraft, eds.). Oxford Univ. Press, New York.
SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, AND D. M. HILLIS. 1996. Phylogenetic inference. Pages 407-514 in Molecular systematics, 2nd edition (D. M. Hillis, C. Moritz, and B. K. Mable, eds.). Sinauer Associates, Sunderland, Massachusetts.
SWOFFORD, D. L. 1999. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.
WATERS, E. R. 1995. An evaluation of the usefulness of the small heat shock genes for phylogenetic analysis in plants. Ann. Mo. Bot. Gard. 82:278-295.
WELLER, S. J., D. P. PASHLEY, J. A. MARTIN, AND J. L. CONSTABLE. 1994. Phylogeny of noctuoid moths and the utility of combining independent nuclear and mitochondrial genes. Syst. Biol. 43:194-211.
ZAR, J. H. 1984. Biostatistical analysis, 2nd edition. Prentice Hall, Englewood Cliffs, New Jersey.
ZHARKIKH, A., AND W.-H. LI. 1995. Estimation of confidence in phylogeny: The complete-and-partial bootstrap technique. Mol. Phylogenet. Evol. 4:44-63.
Received 10 November 1998; accepted 6 April 1999
Associate Editor: C. Simon