Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes

13
Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes Ken KUROKAWA 1,† , Takehiko ITOH 2,† , Tomomi KUWAHARA 3,† , Kenshiro OSHIMA 4 , Hidehiro TOH 4,5 , Atsushi TOYODA 6 , Hideto TAKAMI 7 , Hidetoshi MORITA 8 , Vineet K. SHARMA 6 , Tulika P. SRIVASTAVA 6 , Todd D. TAYLOR 6 , Hideki NOGUCHI 9 , Hiroshi MORI 1 , Yoshitoshi OGURA 10 , Dusko S. EHRLICH 11 , Kikuji ITOH 12 , Toshihisa TAKAGI 9 , Yoshiyuki SAKAKI 6 , Tetsuya HAYASHI 10, *, and Masahira HATTORI 4,6,9, * Laboratory of Comparative Genomics, Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan 1 ; Life Science Research Group, Research Center for Advanced Science and Technology, Mitsubishi Research Institute, 3-6 Otemachi 2-chome Chiyoda-ku, Tokyo 100-8141, Japan 2 ; Department of Molecular Bacteriology, Institute of Health Biosciences, The University of Tokushima Graduate School, 3-18-5 Kuramoto- cho, Tokushima 770-8503, Japan 3 ; Kitasato Institute for Life Science, Kitasato University, 1-15-1 Kitasato, Sagamihara, Kanagawa 228-8555, Japan 4 ; Center for Basic Research, Kitasato Institute, 5-9-1 Shirokane, Minato-ku, Tokyo 108- 8641, Japan 5 ; RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan 6 ; Extremophiles Research Program, Extremobiosphere Research Center, Japan Agency for Marine-Earth Science and Technology, 2-15 Natsushima, Yokosuka, Kanagawa 237-0061, Japan 7 ; School of Veterinary Medicine, Azabu University, 1-17-71 Fuchinobe, Sagamihara, Kanagawa 229-8501, Japan 8 ; Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan 9 ; Frontier Science Research Center, University of Miyazaki, 5200 Kiyotake, Miyazaki 889-1692, Japan 10 ; Institut National de la Recherche Agronomique, Domaine de Vilvet, 8352 Jouy en Josas, France 11 and Department of Veterinary Medicine, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan 12 (Received 22 June 2007; accepted on 30 July 2007; published online 3 October 2007) Abstract Numerous microbes inhabit the human intestine, many of which are uncharacterized or uncultivable. They form a complex microbial community that deeply affects human physiology. To identify the genomic features common to all human gut microbiomes as well as those variable among them, we performed a large-scale comparative metagenomic analysis of fecal samples from 13 healthy individuals of various ages, including unweaned infants. We found that, while the gut microbiota from unweaned infants were simple and showed a high inter-individual variation in taxonomic and gene composition, those from adults and weaned children were more complex but showed a high functional uniformity regardless of age or sex. In searching for the genes over-represented in gut microbiomes, we identified 237 gene families commonly enriched in adult-type and 136 families in infant-type microbiomes, with a small overlap. An analysis of their predicted functions revealed various strategies employed by each type of microbiota to adapt to its intestinal environment, suggesting that these gene sets encode the core functions of adult and infant-type gut microbiota. By analysing the orphan genes, 647 new gene families were identified to be exclusively present in human intestinal micro- biomes. In addition, we discovered a conjugative transposon family explosively amplified in human gut microbiomes, which strongly suggests that the intestine is a ‘hot spot’ for horizontal gene transfer between microbes. Key words: metagenomics; human gut microbiota; gene family; conjugative transposon 1. Introduction All surfaces of the human body are inhabited by complex microbial communities (microbiota). 1–4 In adults, the combined microbial populations exceed 100 trillion cells, about 10 times the total number of cells composing the Edited by Osamu Ohara * To whom correspondence should be addressed. Masahira Hattori. Tel. þ81 7136-4070. Fax. þ81 4-7136-4084. E-mail: [email protected]. jp; or Tetsuya Hayashi. Tel. þ81 985-85-0871. Fax. þ81 985-85-6475. These authors contributed equally to this work. # The Author 2007. Kazusa DNA Research Institute. The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected] DNA RESEARCH 14, 169–181, (2007) doi:10.1093/dnares/dsm018

Transcript of Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes

Comparative Metagenomics Revealed Commonly Enriched Gene

Sets in Human Gut Microbiomes

Ken KUROKAWA1,†, Takehiko ITOH2,†, Tomomi KUWAHARA3,†, Kenshiro OSHIMA4, Hidehiro TOH4,5, Atsushi TOYODA6,Hideto TAKAMI7, Hidetoshi MORITA8, Vineet K. SHARMA6, Tulika P. SRIVASTAVA6, Todd D. TAYLOR6, Hideki NOGUCHI9,Hiroshi MORI1, Yoshitoshi OGURA10, Dusko S. EHRLICH11, Kikuji ITOH12, Toshihisa TAKAGI9, Yoshiyuki SAKAKI6,Tetsuya HAYASHI10,*, and Masahira HATTORI4,6,9,*

Laboratory of Comparative Genomics, Graduate School of Information Science, Nara Institute of Science and Technology,8916-5Takayama-cho, Ikoma,Nara630-0192, Japan1;LifeScienceResearchGroup,ResearchCenter forAdvancedScienceandTechnology,Mitsubishi Research Institute, 3-6Otemachi 2-chomeChiyoda-ku, Tokyo 100-8141, Japan2;Department ofMolecular Bacteriology, Institute of Health Biosciences, The University of Tokushima Graduate School, 3-18-5 Kuramoto-cho, Tokushima 770-8503, Japan3; Kitasato Institute for Life Science, Kitasato University, 1-15-1 Kitasato, Sagamihara,Kanagawa 228-8555, Japan4; Center for Basic Research, Kitasato Institute, 5-9-1 Shirokane, Minato-ku, Tokyo 108-8641, Japan5;RIKENGenomicSciencesCenter, 1-7-22Suehiro-cho,Tsurumi-ku,Yokohama,Kanagawa230-0045, Japan6;Extremophiles Research Program, Extremobiosphere Research Center, Japan Agency for Marine-Earth Science andTechnology, 2-15 Natsushima, Yokosuka, Kanagawa 237-0061, Japan7; School of Veterinary Medicine, Azabu University,1-17-71 Fuchinobe, Sagamihara, Kanagawa 229-8501, Japan8; Department of Computational Biology, Graduate School ofFrontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan9; Frontier ScienceResearch Center, University of Miyazaki, 5200 Kiyotake, Miyazaki 889-1692, Japan10; Institut National de la RechercheAgronomique,DomainedeVilvet, 8352JouyenJosas,France11 and Department ofVeterinaryMedicine,GraduateSchool ofAgricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan12

(Received 22 June 2007; accepted on 30 July 2007; published online 3 October 2007)

Abstract

Numerous microbes inhabit the human intestine, many of which are uncharacterized or uncultivable. They form acomplex microbial community that deeply affects human physiology. To identify the genomic features common to allhuman gut microbiomes as well as those variable among them, we performed a large-scale comparative metagenomicanalysis of fecal samples from 13 healthy individuals of various ages, including unweaned infants. We found that, whilethe gut microbiota from unweaned infants were simple and showed a high inter-individual variation in taxonomic andgene composition, those from adults and weaned children were more complex but showed a high functional uniformityregardless of age or sex. In searching for the genes over-represented in gut microbiomes, we identified 237 gene familiescommonly enriched in adult-type and 136 families in infant-type microbiomes, with a small overlap. An analysis oftheir predicted functions revealed various strategies employed by each type of microbiota to adapt to its intestinalenvironment, suggesting that these gene sets encode the core functions of adult and infant-type gut microbiota. Byanalysing the orphan genes, 647 new gene families were identified to be exclusively present in human intestinal micro-biomes. In addition, we discovered a conjugative transposon family explosively amplified in human gut microbiomes,which strongly suggests that the intestine is a ‘hot spot’ for horizontal gene transfer between microbes.Key words: metagenomics; human gut microbiota; gene family; conjugative transposon

1. Introduction

All surfaces of the human body are inhabited by complexmicrobial communities (microbiota).1–4 In adults, thecombined microbial populations exceed 100 trillion cells,about 10 times the total number of cells composing the

Edited by Osamu Ohara* To whom correspondence should be addressed. Masahira Hattori. Tel.þ81 7136-4070. Fax. þ81 4-7136-4084. E-mail: [email protected]; or Tetsuya Hayashi. Tel. þ81 985-85-0871. Fax. þ81 985-85-6475.

† These authors contributed equally to this work.# The Author 2007. Kazusa DNA Research Institute.

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display theopen access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journaland Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequentlyreproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use,please contact [email protected]

DNA RESEARCH 14, 169–181, (2007) doi:10.1093/dnares/dsm018

human body. Most reside in the intestinal tract, and inparticular in the distal colon where cell densities are ashigh as 1012 g21 of human feces. The members of thehuman intestinal microbiota are classified into more than50 genera and hundreds of species representing ninebacterial and one archaeal division.1,5–7 The total numberof genes encoded by their collective genomes (microbiome)is estimated to exceed that of the human genome by atleast one order of magnitude.1 Our gut microbiota possessesmany metabolic capabilities which are lacking in the humanhost and, thus, can be viewed as indispensable for humanlife.3 It contributes to host nutrition by enhancing theefficacy of energy harvest from ingested food and by synthe-sizing essential vitamins.1,8,9 It also affects a broad range ofphysiological properties of the human host, controlling, forinstance, intestinal epithelial cell proliferation and differen-tiation, energy balance, pH, the development of the immunesystem, and protection against pathogens.1,10–12 Imbalanceof the intestinal microbiota can predispose individuals to avariety of disease states ranging from inflammatory boweldiseases to allergy and obesity.13–18

The composition, dynamics, and functions of the humanintestinal microbiota have been studied mostly usingculture-based approaches and analyses of 16S ribosomalRNA sequences.3,5,13,19 Notwithstanding these efforts, ourunderstanding of this microbial community is still verylimited, particularly with regard to the overall genecontent, because of its high complexity and our inability tocultivate most of the microbial species residing in the gut.For instance, although it has been established that the micro-biota of adults and unweaned infants differ in composition,6,7

it is largely unknown how such compositional differencesaffect the overall gene contents and functional properties.

To explore the genomic features of complex microbialcommunities including uncultivable microbes, a culture-independent metagenomic approach is practical.20 In a pio-neering study, it has been recently applied to fecal samplesfrom two adult Americans.21 However, the data obtainedfrom only two adults are insufficient for understandingthe structure and functional capabilities of gut microbialcommunities, as they may be largely affected by variousintrinsic and environmental factors, such as age, diet,and host genotype.7,16,22–25 Here, we extended the studyto 13 healthy Japanese individuals, including adults,weaned children, and unweaned infants. The analysis notonly indicated the presence of gene sets commonly enrichedin human gut microbiomes, but also revealed intriguingdifferences of microbiomes between adults and unweanedinfants and between American and Japanese individuals.

2. Materials and methods

2.1. SubjectsAll the subjects were healthy Japanese individuals. The

ages, genders, and familial relationships of all subjects aresummarized in Table 1. All subjects or their parents were

informed of the purpose and protocol of this study. Noneof the subjects were given dietary restrictions exceptfor antibiotics, probiotics, fermented foods (fermentedbeans, yogurt, etc.), and well-known functional foods forat least 4 weeks prior to sampling. None had a historyof gastrointestinal disorder at the time of sampling, andnone had unusual eating behaviors.

2.2. Isolation of bacterial DNA from fecal samplesThe collected fecal samples were immediately placed

under anaerobic conditions using the Anaero-Pack system(Mitsubishi Gas Chemical Co.), and transported at 48C tothe laboratory within 24 h. Three grams of wet fecalsample were suspended vigorously in phosphate-bufferedsaline (8 mM Na2HPO4, 137 mM NaCl, 2.7 mM KCl, and1.5 mM KH2PO4). The suspension was filtered through a100 mm mesh nylon filter (Falcon). The bacterial cells inthe filtrates were collected by centrifugation at 5000 � gfor 10 min at 48C, suspended in 10 ml of Tris–EDTA con-taining 10 mM Tris-HCl and 1 mM EDTA (pH 8), andthen used for DNA isolation. The lysis of bacterial cellswas carried out under almost the same conditions as thosedescribed in the literature.26 In brief, 1.5 mg of lysozymewas added to 10 ml of cell suspension. After incubation at378C for 1 h with gentle mixing, 2 mg of proteinase Kwere added and the mixture was incubated at 558C for5 min. Subsequently, 1.2 ml of 10% (wt/vol) sodiumdodecyl sulfate was added to the cell suspension, whichwas further incubated at 558C for 1 h with gentle mixing.High-molecular-weight DNA was isolated and purified byphenol/chloroform extraction, RNase A treatment,ethanol, and finally polyethyleneglycol precipitation.

To assess the efficiency of cell lysis, the total number ofmicrobial cells was determined before and after the lytictreatment. A chamber with a depth of 0.02 mm (Erma,Tokyo, Japan) was used for the bacterial cell countingunder a phase-contrast microscope (Leica; Leitz DMRmodel). The cell counting was repeated three times foreach sample. The average efficiency of the cell lysis wasapparently around 70% or more.

2.3. DNA sequencing, assembly, and gene predictionShotgun libraries were constructed from randomly

sheared bacterial DNA (2–3 kb) (HydroShear, Gene-Machines) and the pUC18 vector. Template DNA forthe sequencing was prepared by polymerase chainreaction (PCR) amplification of the insert DNA usinga TaKaRa ExTaq kit (Takara Bio) and GeneAmpPCR System 9700 (ABI) as described previously.27

Sequencing was carried out for both ends of the clonesusing the BigDye v3.1 chemistry on ABI3730 sequencers(ABI) or the ET chemistry on MegaBACE4500sequencers (GE Healthcare). The shotgun reads fromeach sample were individually assembled to generate

170 Comparative metagenomics of human gut microbiomes [Vol. 14,

Table 1. Summary of the samples, sequencing, assembly, and gene annotation

Samplestatus

Samplename Sex Age

Totalreads

Total readlengtha (bp)

contigs singlets

Total lengh ofassembled

sequence (Mb)

Totalgenesb

(A)

COG-assignedgenes (B)

IdentifiedCOGsc (C)

AverageCOG sized

(B)/(C)

In-houseNR DB hitgenese (D)

Orphangenesf

(A) 2 (D)

averagelength(bp) contigs

averagelength(bp) reads

Individual In-A Male 45 years 81687 52509363 2809.0 5410 902.0 16330 29.93 38778 18058 2355 7.67 30210 8568

Individual In-B Male 6 months 80617 62792581 3701.6 1721 1003.1 8481 14.88 20063 9393 1617 5.81 15127 4936

Individual In-D Male 35 years 84237 55137918 2159.1 7613 911.8 36312 49.55 67740 29998 2559 11.72 49079 18661

Individual In-E Male 3 months 80852 56781600 2301.1 4819 1008.7 16838 28.07 37652 18688 2107 8.87 28513 9139

Individual In-M Female 4 months 89340 57808421 3375.1 4794 655.7 15541 26.37 34330 18187 2857 6.37 27050 7280

Individual In-R Female 24 years 85787 55404826 1920.6 8935 811.3 36524 46.79 63356 28612 2655 10.78 46104 17252

Family I F1-S Male 30 years 78452 53568019 2153.6 7545 806.5 28038 38.86 54151 25173 2531 9.95 40771 13380F1-T Female 28 years 81348 55365235 1980.4 7389 791.5 37458 44.28 65156 31230 2921 10.69 47955 17201F1-U Female 7 months 82525 53864663 2779.4 4854 850.4 14430 25.76 35260 20365 2519 8.08 28711 6549

Family II F2-V Male 37 years 80772 55926002 2008.7 7919 809.2 38442 47.02 66461 33535 2873 11.67 49955 16506F2-W Female 36 years 79163 54885684 2289.8 6778 833.0 30550 40.97 57213 27680 2609 10.61 43625 13588F2-X Male 3 years 80858 56587120 2296.0 5032 832.1 34252 40.05 57446 26599 2669 9.97 42452 14994F2-Y Female 1.5 years 79754 56276047 2044.2 9159 849.9 32461 46.31 64942 30870 2664 11.59 50349 14593

Total 1065392 726907479 2299.8 81968 839.9 345657 478.84 662548 318388 3268 97.43 499901 162647

American Sub. 7 Female 28 years 15.94 22329 12223 2160 5.66 18443 3886Sub. 8 Male 37 years 20.49 27579 14962 2249 6.65 23518 4061

Soil 144.90 212128 84060 4423 19.01 125302 86826

Whale fall 1 29.78 46478 23222 3140 7.40 32847 136312 30.86 47274 21826 2922 7.47 32038 152363 28.87 42870 21076 2894 7.28 31410 11460

Sargasso 816.55 1406274 746941 5184 144.09 1038259 368015

aThe total read length with Phred scores of .15.bThe number of genes predicted from the non-redundant sequences of each sample by the MetaGene program.cThe number of COGs to which at least one gene was assigned in each microbime.dThe average numbers of predicted genes belonging to each COG [(B)/(C)].eThe number of predicted genes showing similarity to genes in the ‘in-house extended NR database’.fThe number of genes showing no significant similarity to genes in the ‘in-house extended NR database’ (e-value�1.0e 2 5 in the BLASTP analysis).

No.

4]K

.K

urokawa

etal.

171

non-redundant metasequences using the PCAP soft-ware28 with default parameters.

The MetaGene program,29 which is based on a HiddenMarkov Model (HMM) algorithm, was employed topredict potential protein-coding regions (open readingframes, ORFs �20 amino acids) from the metasequencesof each sample. Prior to gene finding, we masked low-quality sequences (Phred score ,15) using ‘Xs’.

2.4. Database constructionThe ‘in-house extended NR database’ included the data

set from the GenBank non-redundant amino-acid database(version 26, May 2007) plus a dataset obtained byMetaGene prediction from 44 unpublished microbialgenome sequences (Supplementary Table S1). Theseunpublished sequences were obtained from the public data-base and the websites of the Genome Sequencing Center ofWashington University, St Louis (http://genome.wustl.edu/sub_genome_group_index.cgi?GROUP¼3) and theHuman Metagenome Consortium Japan (HMGJ; http://www.metagenome.jp/). The ‘reference dataset for COG(Cluster of Orthologous Groups of proteins) assignment’(Supplementary Table S2) contained 343 microbialgenome sequences where COG assignment has been madefor all the genes by the NCBI. The ‘in-house reference data-base’ (referred to as ‘Ref-DB’) was constructed by selecting243 microbial genomes (Supplementary Table S3) from thereference dataset for COG assignment. To avoid the effect ofmultiply sequenced species, we selected one representativestrain from each species. To identify the genomic featuresspecific to human gut microbiomes, known gut microbeswere also excluded from Ref-DB.

To search for the genes related to the Tn1549-likeconjugative transposons (CTns), we also constructed amodified Ref-DB by adding the genes on Tn1549 fromEnterococcus faecalis strain BM4382 (AAF72340-68)30

and on CTn2, CTn4, and CTn5 of Clostridium difficilestrain 630.31 Since Ref-DB originally includes those onthe Tn1549-like CTns from E. faecalis strain V583 32

and Streptococcus pyogenes strain MGAS10750,33 themodified Ref-DB contains six known Tn1549-like CTns.

2.5. Clustering analysis of pairwise microbiomecomparison

BLASTP34 analyses of all protein sequences from onemicrobiome against all those from every other were usedto estimate the genomic similarities existing in all possiblemicrobiome comparisons.18,35 Bit-scores of the best hitsfrom every single sequence from one microbiome againstanother were summed up to yield a cumulative pairwisebit-score value. The cumulative bit-score values from allpossible pairwise comparisons were then used to constructa distance matrix.18 Since the cumulative bit-score valuesfrom microbiome A to microbiome B [denoted as S(A,B)]is not equal to that from microbiome B to microbiome A

[S(B,A)] in the non-reciprocal BLASTP analysis, we usedthe minimum of the two values, min[S(A,B), S(B,A)], asa measure of the similarity between microbiomes A andB. To normalize the differences derived from the sizedifferences among the microbiomes, we calculated themeasure of genome conservation distance, D2.36,37 Theobtained distance matrix was then used for clusteringby the multi-dimensional scaling (MDS) method.

2.6. Taxonomic assignmentTaxonomic assignment of protein-coding genes was

performed according to the best-hit pairs in theBLASTP analysis against the in-house extended NRdatabase where the taxonomic information for all genesis available. For the first screening, all BLASTP resultswere filtered by e-value (�1.0e 2 8) and hit length cover-age (�50% of a query sequence). Then, to tentativelyassign the taxonomic origins of the genes, we adopted a90% BLASTP identity threshold. The genes with best-hit pairs under the threshold were assigned as ‘No hits’.

2.7. COG assignment and evaluation of enrichmentCOG assignment of predicted gene products was made

on the basis of the BLASTP analysis against the referencedataset for COG assignment. After filtering the BLASTPresults by e-value (�1.0e 2 8) and aligned length coverage(�50% of a query sequence), the COG assignment of eachgene product was made according to the COG informationof its best-hit pair in the reference dataset for COG assign-ment. If the best-hit pair was not assigned to any COG, thegene product was considered to be ‘uncharacterized’. Onthe basis of COG assignments, the size of each COG (thenumber of gene products belonging to each COG) wasthen counted up for every microbiome. Since most of thegenes located at the contig ends and in singletons are pre-dicted as ‘partial genes’ by MetaGene, their hit countswere corrected by the length ratio of each ‘partial gene’to the reference to minimize multiple counts of fragmentedgenes. When a predicted gene product had a best-hit pairwith a reference protein that has been assigned to multipleCOGs, this hit was divided by the number of assignedCOGs and the value was dispensed evenly to each COG.The size of each COG was normalized by the totalnumber of genes predicted in each microbiome (‘normal-ized %’). The average size of each COG in previouslysequenced microbial genomes in Ref-DB was also calcu-lated and normalized by the average total gene number(‘DB %’). Finally, the magnitude of enrichment (enrich-ment value) of each COG was calculated for every micro-biome by dividing the ‘normalized %’ by the ‘DB%’.COGs with an average enrichment value of . 2.0 weredefined as enriched COGs in each microbiome because, inall the gut microbiomes examined, the enrichment valuesfor 126 COGs belonging to the essential gene sets definedin Escherichia coli and Bacillus subtilis mostly varied

172 Comparative metagenomics of human gut microbiomes [Vol. 14,

between 0.3 and 1.9, with a mean value of 0.92. A cluster-ing analysis of microbiomes based on the COG enrichmentvalues was performed using only the 868 COGs where atleast one member was observed in every microbiome toavoid the effects of unidentified COGs. The matrix wasclustered independently in the microbiomes and in theCOGs using the pairwise complete-linkage hierarchicalclustering of the uncentered correlation (cluster-1.3138).

2.8. Orphan genes in human gut microbiomesOrphan genes whose products showed no significant

similarity to known proteins were surveyed from the662,548 genes predicted in the 13 samples by theBLASTP analysis with the e-value threshold of 1.0e 2 5against the ‘in-house extended NR database’. We alsoobtained orphan genes from the metagenomic sequencesof other environmental microbiomes21,26,39 under thesame condition. We then performed an all-to-all BLASTPanalysis among the orphan gene products and selectedthe pairs having BLAST scores greater than 60. Thegene products selected were subsequently clustered usingthe TribeMCL program40 with default parameters. Ofthe clusters comprising only the gene products derivedfrom human gut microbiomes, those containing �30members were further subjected to motif search/extrac-tion with the HMMER program against the Pfam motifdatabase41 and to identification of the conserved amino-acid sequences by using the MEME program.42

2.9. Accession numbersAll the assembled sequence data have been deposited

in DDBJ/EMBL/GenBank under accession numbersBAAU01000001-BAAU01028900 (subject F1-S), BAAV01000001-BAAV01036326 (F1-T), BAAW01000001-BAAW01016539 (F1-U), BAAX01000001-BAAX01036455(F2-V), BAAY01000001-BAAY01030198 (F2-W), BAAZ01000001-BAAZ01031237 (F2-X), BABA01000001-BABA01035177 (F2-Y), BABB01000001-BABB01020226(In-A), BABC01000001-BABC01009958 (In-B), BABD01000001-BABD01037296 (In-D), BABE01000001-BABE01020532 (In-E), BABF01000001-BABF01016164 (In-M),and BABG01000001-BABG01034797 (In-R).

3. Results and discussion

3.1. Samples, sequencing, and gene predictionThe subjects analysed in this study included seven

adults (aged 24–45 years), two children (3 and 1.5years), and four unweaned infants (3–7 months). Sevenof the subjects belonged to two unrelated families consist-ing of three and four members, respectively (Table 1).

We isolated the microbial DNA from each fecal sample,constructed shotgun libraries (see ‘Materials andmethods’ for details), and produced a total of 1,057,481shotgun reads (about 80,000 reads for each) representing

sequences of about 727 Mb at a Phred score of .15(Table 1). Relatively large numbers of total shotgunreads (67.6% on average) were assembled into contigsfor each sample (Table 1), contrasted with the soilsample in which less than 1% of the total reads (nearly150, 000 reads) were assembled.39 Although 52–80% (inmost cases less than 60%) of the total reads wereassembled in the adults and children, 79–89% wereassembled in the unweaned infants. Therefore, thelengths of the non-redundant metasequences significantlydiffered between adult/child and infant microbiomes:38.9–49.6 Mb for all adults and children except for oneadult (29.9 Mb in In-A) and 14.9–28.1 Mb for allinfants (Table 1). The total length of the contigs and sin-gletons obtained from the 13 samples was 478.8 Mb.

From each non-redundant metasequence, we identified20,063–67,740 potential protein-coding genes (�20amino acids) by using the MetaGene program (Table 1).We might have overestimated the number of genes inour metasequences by false prediction and/or doublecounting of the fragmented ORFs that were derivedfrom the same gene. The MetaGene program, however,predicted 1,406,000 ORFs in the Sargasso Sea metage-nomic data,26 which is 8.7% more than that (1,284,108ORFs) identified from the same dataset by evidence-based gene finding.

3.2. Composition of human gut microbiotaTo compare the overall sequence similarities among

the microbiomes from fecal and other-environmentalsamples,21,26,39 we performed a reciprocal BLASTP analy-sis of the whole gene set for each microbiome, followed byMDS clustering against the D2 normalized distancematrix (see ‘Materials and methods’). The data indicatedthat all gut microbiomes from the adults and weanedchildren form a distinct group (Fig. 1). In contrast,those from the unweaned infants were highly divergentfrom each other and from the microbiomes of the adultsand children, as well as from those of other environments.

To determine the microbial composition at the genuslevel, we next conducted a BLASTP analysis of all thepredicted genes against the genes in our ‘in-houseextended NR database’ (see ‘Materials and methods’).With a threshold of 90% BLASTP identity, 17–43% ofthe predicted genes could be assigned to particulargenera (35–65 genera, 121 in total) in the adults and chil-dren (Fig. 2). A significantly higher proportion of genes(35–55%) was assignable (31–61 genera, 84 in total) inthe unweaned infants, but, overall, the data indicatedthat the majority of gut microbes are as yet uncharacter-ized. We detected a total of 142 genera from the 13samples in this analysis.

Despite the low proportion of assigned genes, theirtaxonomic distribution indicated a clear compositionalchange after weaning. In the adults and weaned children,

No. 4] K. Kurokawa et al. 173

the major constituents were always Bacteroides, followedby several genera belonging to the division Firmicutes,such as Eubacterium, Ruminococcus, and Clostridium,and the genus Bifidobacterium. In the infants, Bifido-bacterium and/or a few genera from the family Enterobac-teriaceae, such as Escherichia, Raoultella, and Klebsiella,were the major constituents. A significant level of inter-individual variation was observed also among the adults

and children, but there was a much higher variationamong the unweaned infants.

To further evaluate the microbial composition at thespecies level, we examined the intra-genus diversity ofBacteroides and Bifidobacterium, the most dominantgenera in the adults and children and in the infants,respectively. Taking advantage of the number of availablegenome sequences belonging to these two genera (11species from Bacteroides, five from Bifidobacterium), weperformed a mapping analysis of the shotgun reads tothese genomes using the BLASTN program34 with athreshold of �95% identity and �150 bp aligned length.The analysis demonstrated that, for each infant, morethan 80% of the shotgun reads were mapped toBifidobacterium derived from a single species or genome(Supplementary Table S4). In contrast, the shotgunreads from the adults and children were mapped to mul-tiple Bacteroides species except for In-A, suggesting acompositional complexity of this genus in human gutmicrobiota.

Together with the results of the shotgun sequenceassembly (Table 1), the data from these compositionalanalyses showed a clear structural difference betweenthe microbiota of the unweaned infants and those of theadults and weaned children. Infant microbiota were domi-nated by a few microbial species or strains, exhibitingrather simple structures, but showed a remarkable inter-individual variation. In contrast, most microbiota of theadults and children were much more complex in speciescomposition, but exhibited high levels of overall sequence

Figure 1. Clustering analysis of microbiomes based on cumulativebitscore comparisons. MDS was applied to the distance matrixcalculated from reciprocal pairwise BLASTP analysis among allpredicted gene products. The dots represent fecal samples fromadults and children (blue), unweaned infants (red), Americans(green), and samples from other natural environments (orange).Whale falls 1–3, Sargasso and soil indicate the metadata formicrobial communities from the deep sea,39 surface seawater,26 andthe surface soil of a farm,39 respectively.

Figure 2. Compositional view of human intestinal microbiomes. A compositional view of microbiomes based on the taxonomic assignment ofprotein-coding genes is shown. The stacked bars represent the compositions of each sample estimated from the results of BLASTP analysiswith a 90% threshold identity. ‘Others’ includes the genera whose proportions were less than 1% in any of the samples. The data for the fecalsamples from two American adults (‘Sub. 7’ and ‘Sub. 8’) are also shown.

174 Comparative metagenomics of human gut microbiomes [Vol. 14,

similarity between the samples. In In-A, the shotgun readsequences apparently derived from a few species ofBacteroides and Eubacterium were notably dominated(data not shown), accounting for the significantly shorternon-redundant sequence of In-A (see Subsection 3.1 andTable 1).

It should also be noted that the samples from Japaneseand American21 adults differed significantly in compo-sition, particularly in terms of Bacteroides and archaealspecies (Fig. 2). The gut microbiomes from twoAmerican samples contained very few sequences andgenes assigned to Bacteroides species and a significantnumber of sequences and genes assigned to an archaealspecies, whereas the gut microbiomes from the Japanesesamples contained a high ratio of sequences and genesassigned to Bacteroides species and almost no archaealsequences and genes. Further studies should establishthe reasons for these intriguing differences, which couldbe due to various factors including the genetic back-ground and dietary style of the hosts, but also to differ-ences in the experimental conditions between the twostudies.

3.3. Functional assignment of predicted genesFunctional assignment of predicted genes (662,548 in

total) was made on the basis of BLASTP analysisagainst the ‘reference dataset for COG assignment’ (see‘Materials and methods’). By this analysis, about 48%of the predicted genes were assigned to a total of 3,268COGs (Table 1). The number of COGs identified in theinfant microbiomes showed remarkable inter-individualvariation (1617–2857 COGs), in contrast to those of theadults and children (2355–2921 COGs). Also, thenumber of orthologous genes belonging to each COG inthe infants was on average about two-third of thatobserved in the adults and children. These results indicatethat the gene repertoires in the gut microbiomes are morevariable and functionally less redundant in infants than inadults and children.

To explore the functional characteristics of humanintestinal microbiota, we looked for significantly over- orunder-represented COGs in gut microbiomes when com-pared with Ref-DB (see ‘Materials and methods’). Asshown in Fig. 3, human gut microbiomes showed patternsdistinct from those of other environments such as sea andsoil. The over-representation of COGs classified into the‘Carbohydrate transport and metabolism’ category andthe under-representation of those for ‘Lipid transportand metabolism’ were observed in all the human gutmicrobiomes examined in this study. However, a cleardifference was observed between the adults/children andthe unweaned infants. The gut microbiomes from theadults and children exhibited a uniform pattern, andthe over-representation of COGs for ‘Defense mechan-isms’ and under-representation of ‘Cell motility’,

Figure 3. Summary of the COG assignment of predicted genes. (A)Comparison of the distribution patterns of COG-assigned genesbetween each type of microbiome and Ref-DB (for Ref-DB, see‘Materials and methods’). Fecal samples from Japanese adults andchildren (nine samples), Japanese infants (four) and Americanadults (two), and three samples from whale fall were each averaged.(B) Distribution of COG-assigned genes in the nine microbiomesfrom adults and children. (C) Distribution of COG-assigned genes inthe four microbiomes from unweaned infants; C: Energy productionand conversion; D: Cell cycle control, mitosis, and meiosis; E: Aminoacid transport and metabolism; F: Nucleotide transport andmetabolism; G: Carbohydrate transport and metabolism; H:Coenzyme transport and metabolism; I: Lipid transport andmetabolism; J: Translation; K: Transcription; L: Replication,recombination and repair; M: Cell wall/membrane biogenesis; N: Cellmotility; O: Post-translational modification, protein turnover,chaperones; P: Inorganic ion transport and metabolism; Q:Secondary metabolites biosynthesis, transport and catabolism;R: General function prediction only; S: Function unknown; T: Signaltransduction mechanisms; U: Intracellular trafficking and secretion;V: Defense mechanisms. The genes not assignable to any COGs arenot shown in this figure.

No. 4] K. Kurokawa et al. 175

‘Secondary metabolites biosynthesis, transport andcatabolism’ and ‘Post-translational modification, proteinturnover, chaperones’ were remarkable (Fig. 3B). In con-trast, the infant microbiomes showed variable patterns(Fig. 3C). The enrichment values of all the COGs ineach microbiome are shown in Supplementary Table S5.

Profiling analysis based on the COG enrichment valuescalculated for each microbiome further demonstratedthat, while the gut microbiomes of the adults and childrenshowed similar profiles, those of the infants had distinctand more variable profiles (Fig. 4; also see SupplementaryFig. S1). It may be noteworthy that neither this analysisnor the overall sequence similarity analysis shown inFig. 1 provided any conclusive evidence for the resem-blance of the genomic features of gut microbiomesamong family members and within the sexes.

3.4. A gene set commonly enriched in adult-typegut microbiomes

COGs that are commonly enriched in the microbiomesof all adults and children were searched, and 237 COGsmet the following criteria: (i) the average enrichmentvalue exceeds 2.0 and (ii) the enrichment value in all sub-jects exceeds 1.0 (Fig. 5 and Supplementary Table S6).When the samples from two adult Americans wereincluded in the analysis, 79% (188) of the 237 COGsstill met the above criteria, even though the American

data contained, unusually, only a few sequences derivedfrom Bacteroides species (Fig. 2). In contrast, only5–10% of these COGs exhibited an enrichment value of.2.0 in the microbiomes of other environments.Therefore, these COGs are specifically enriched in adult-type gut microbiomes, and thus may encode importantfunctions for the gut microbiota itself as well as for itshost.

Pyruvate-formate lyase (COG1882), which catalyzesthe non-oxidative conversion of pyruvate to formateand acetyl-coenzyme A, was enriched. Unexpectedly,however, genes for the formate hydrogenlyase systemthat decomposes formate to CO2 and H2 were ratherunder-represented. In this regard, of interest is the enrich-ment of formyltetrahydrofolate synthetase (COG2759),methenyl tetrahydrofolate cyclohydrolase (COG3404),and methionine synthase (COG1410), all of which are

Figure 4. Relationship between human intestinal microbiomes andother-environmental microbiomes based on their functional profiles.The result of a clustering analysis of microbiomes based on theenrichment values of each COG calculated for each microbiome isshown. The matrix was clustered independently in the samples andCOGs using the pairwise complete-linkage hierarchical clustering ofthe uncentered correlation (cluster-1.31). Also, see SupplementaryFig. S1.

Figure 5. Functional distribution of commonly enriched COGs. Thefunctional distribution of commonly enriched COGs in adult/childmicrobiomes (‘A-gutCEGs’), in infant microbiomes (‘I-gutCEGs’),or in both types of microbiomes is shown: see Fig. 3 for the COGcategorization.

176 Comparative metagenomics of human gut microbiomes [Vol. 14,

enzymes involved in the regulation of one carbon pool byfolate. Their enrichment may suggest that one carbonunit of formate can be utilized effectively by the gutmicrobiota in the folate-mediated cycle of one carbonpool. In contrast to the enrichment of enzymes for anaero-bic pyruvate metabolism, the pyruvate dehydrogenasecomplex (COG2069) was profoundly depleted. All com-ponents of the oxidative tricarboxylic acid (TCA) cycleand the membrane respiratory chain (with the exceptionof NADH:ubiquinone oxidoreductase) were also signifi-cantly under-represented in all subjects, but phosphoenol-pyruvate carboxykinase (COG1866) and pyruvatecarboxyltransferase (COG5016), which generate oxaloa-cetate, an entry substrate to the TCA cycle in the reduc-tive pathway, were enriched. Together with the strikingdepletion of most gene families whose products scavengeoxygen radicals, these findings reflect well the fact thatthe adult gut ecosystem is a kingdom of strict anaerobes.

The enrichment of carbohydrate metabolism genes wasalso striking: 24% (53 COGs) of the commonly enrichedCOGs had this function. At least 14 families of glycosylhydrolases for plant-derived dietary polysaccharides andhost tissue-derived proteoglycans or glycoconjugates wereenriched in the adults. In addition, many enzymes involvedin the metabolism of mono- or disaccharides releasedby these glycosyl hydrolases, such as L-fucose isomerase(COG2407), L-arabinose isomerase (COG2160), and galac-tokinase (COG0153), were also over-represented. Severalpeptidase families (COG1362, COG2195, COG3340, andCOG3579) were also enriched, but most genes for fatty-acid metabolism were selectively reduced in number.These findings support the notion that the colonic micro-biota utilizes otherwise indigestible polysaccharides andpeptides as major resources for energy production and bio-synthesis of cellular components.7,8 The enrichment ofphosphoenolpyruvate carboxykinase (COG1866), glyco-gen synthase (COG0297) and ADP-glucose pyrophosphor-ylase (COG0448) suggests that energy storage is also animportant activity of adult-type gut microbiota. Thisactivity may be required for the gut microbiota to copewith intermittent nutrient supply in the adult gut.The enrichment of antimicrobial peptide transporters(COG0577 and COG1132) and a multidrug efflux pump(COG0534) is also of interest. Host intestinal cellsproduce various cationic antimicrobial peptides (CAMPs),such as beta-defensins.43 Many microorganisms alsoproduce CAMPs to compete with other microbes sharingthe same niche. The enrichment of antimicrobial peptidetransporters and the multidrug efflux pump may playa primary role in the stable colonization of gut microbesin the adult intestine by conferring resistance to CAMPs.

The enrichment of several enzymes for DNA repair isalso noteworthy (Supplementary Table S6). Theseenzymes may be needed to repair microbial DNAdamage caused by genotoxic substances, such as nitrosa-mines and heterocyclic amines contained in ingested foods

and secondary bile acids and nitroso compounds syn-thesized in the intestine via gut microbiota-involving pro-cesses.44 It is conceivable that not only the host cells butalso intestinal microbes are constantly exposed to suchgenotoxic compounds.

Another distinguishing feature of the adult-type micro-biota is the striking depletion of genes for the biosynthesisof flagella and chemotaxis (Supplementary Table S5).This implies that motility and chemotaxis are notrequired for the intestinal microbes to persist in the gut,where the contents are constantly stirred by peristalsis.Rather, flagellated microbes may be easily eliminated bythe host immune system because flagella are highly immu-nogenic. Abnegation of motility may be another adap-tation mechanism of gut microbes to the intestinalenvironment.

3.5. A gene set commonly enriched in infant-typegut microbiomes

Despite the high inter-individual variation, 136 COGswere found to be commonly enriched in the infantmicrobiomes (Fig. 5 and Supplementary Table S6). Ofthese, 58 were also over-represented in the adult/childmicrobiomes.

Genes for anaerobic energy production were alsoenriched in infants, but genes for the pyruvate dehydro-genase complex and all components of the oxidativeTCA cycle were present in the infants at a frequencysimilar to that in Ref-DB. These findings may reflectthe compositional feature of the infant gut microbiota,which contains considerable numbers of facultative anae-robes (Fig. 2).

In infant microbiomes, about 35% (47) of the 136enriched COGs were for ‘Carbohydrate transport andmetabolism,’ including 12 families of glycosyl hydrolases,nine of which were enriched also in the adult gut micro-biomes. Unexpectedly, they included several enzymesthat degrade non-digestible polysaccharides of plantorigin, such as pullulanase and related glycosidases(COG1523), arabinogalactan endo-1,4-beta-galactosidase(COG3867), and endopolygalacturonase (COG5434).These enzymes may act to degrade oligosaccharides inbreast milk or host-derived proteoglycans like mucin tomaintain the functional homeostasis of gut epithelia.45

It is also possible that the gut microbiota is ready toutilize plant-derived polysaccharides to some extentbefore weaning.

The over-representation of various transport systemswas also characteristic to infants, with 22% (29) of the136 enriched COGs being transporters. In particular,the enrichment of phosphotransferase systems thatmediate active sugar transport was remarkable. This pro-karyote-specific transport system may play a central rolein the uptake of lactose and other easily digestiblesimple sugars rich in breast milk. The over-representation

No. 4] K. Kurokawa et al. 177

of other transporters may also be advantageous to themicrobes in the infant intestine because breast milk con-tains many other essential nutrients such as aminoacids, long-chain fatty acids, nucleotides, vitamins, andminerals in a readily available form. The difference indiet between adults and unweaned infants appears toaffect other functional properties as well. For instance,the genes for defense mechanisms and DNA repair thatwere over-represented in the adults were not so in theinfants.

3.6. A CTn family amplified in the intestineAlthough this remains largely unproven, the distal

colon has been regarded an ecologically suitable site forhorizontal gene transfer (HGT) between microorganismsdue to its high microbial cell density.2 We identifiedmany gene families related to transposases and bacterio-phages in the metagenomic data, but their over-represen-tation was noted only in certain individuals(Supplementary Table S5). An exception to this was aset of genes homologous to those on Tn1549-like CTns,which was notably enriched in most of the gut micro-biomes analysed here (Fig. 6). Tn1549 was originallyidentified in an E. faecalis strain,30 thereafter, its relativeshave been identified in another E. faecalis strain,Butyrivibrio, Clostridium, and Streptococcus.31,32,46,47 Ithas also been recorded that Tn1549 is transferablebetween C. symbiosum and Enterococcus spp.48 Thehomologues found in the metagenomic data accountedfor 0.8% of all the predicted genes (5,325 genes in total)and were also enriched in the two fecal samples fromAmerican individuals21 (Fig. 6 and SupplementaryTable S7). They were highly divergent in sequence fromthe corresponding genes on six known Tn1549-like CTns(Fig. 6C), but frequently appeared as gene clusters oncontigs. We identified 89 contigs that contained four ormore genes related to Tn1549-like CTns. These genesappeared there in the same or a similar gene organizationas seen in Tn1549-like CTns (Supplementary Table S7),suggesting that they were derived from divergentmembers of a Tn1549-like CTn family, which we referto as ‘CTnRINT’ (CTn rich in intestine). By analysingthese contigs, we found that CTnRINT memberscontain a variety of genes, such as those for ABC-typemultidrug transport systems, in the regions correspondingto that for the vancomycin-resistance genes on Tn1549(data not shown). As shown in Fig. 6B, other knownTn1549-like CTns also contain various accessory genesin this region. These findings strongly suggest that theCTnRINT family is largely involved in the process ofHGT in the human intestine. It seems reasonable thatconjugal elements, which mediate genetic exchanges andtransmittance through cell–cell contact, are key playersin HGT in the colon.

In addition to the CTnRINT family, we foundthat integrases/site-specific recombinases belonging toCOG4974 were remarkably expanded in the microbiomesof the adults and children. Most of them were apparentlyderived from several types of integrative mobile geneticelements such as Tn916-like CTns49, suggesting thatother types of integrative elements are also richlypresent in human gut microbiomes.

3.7. Orphan gene families in human gut microbiomesOf the 662,548 genes predicted in the 13 samples,

162,647 were orphan genes (25% of the total genes).Similarly, 503,115 orphan genes were obtained fromother-environmental microbiomes.21,26,39 An all-to-allBLASTP analysis of these 665,762 orphan gene productsfollowed by a clustering analysis (see ‘Materials andmethods’) yielded 160,543 clusters and 461,435 single-tons. Of the 160,543 clusters, 647 comprised five ormore gene products derived only from human fecalsamples (Supplementary Table S8). The largest two clus-ters, ID37 and ID39 containing 48 and 47 members,respectively, were present in the microbiomes of all theJapanese adults and children, and also in those of theAmerican adults. For eight clusters that comprise �30gene members, we performed motif search/extractionanalyses by using the HMMER program against thePfam motif database.41 Only two gene products incluster ID44 showed a significant similarity to a Pfammotif (PF07508; recombinase); other gene products incluster ID44 showed no significant similarity in thePfam database. We could identify the conserved amino-acid sequences (38–50 amino acids) for each cluster byusing the MEME program42 (Supplementary Fig. S2).These conserved sequences may represent new motifsspecific to human gut microbiomes.

3.8. Remarks and future perspectivesThe present study is the first large-scale comparative

metagenomic analysis of human gut microbiomes. Thedata provided several new lines of insight into thegenomic features of gut microbiota. First, our dataclearly demonstrated a difference in overall compositionand gene repertoire between adult- and infant-type gutmicrobiomes. The simple and less redundant features ofthe infant-type gut microbiota are probably linked to itshigh inter-individual variability (Figs. 1, 2, and 3C). Wesuggest that the infant-type can be viewed as unstable,yet dynamic and adaptable. Conversely, the functionaluniformity observed in the adult-type microbiota(Figs. 1, 3B, and 4) may be attributable to its morecomplex nature (Fig. 2 and Tables 1 and S2), which inturn suggests that the insurance hypothesis50 for thebenefit of biodiversity may be relevant to the gut.

Secondly, a comparison of the gene contents betweengut microbiota and previously sequenced microbes

178 Comparative metagenomics of human gut microbiomes [Vol. 14,

Figure 6. The Tn1549-like CTn family, ‘CTnRINT’, explosively amplified in human gut microbiomes. (A) Numbers of genes homologous to those on six known Tn1549-like CTns. Genes derivedfrom each fecal sample are shown in different colors. Plum: In-A; brown: F2-V; light yellow: F2-W; light cyan: In-D; dark red: F1-S; salmon: F1-T; royal blue: In-R; pale turquoise: F2-X; midnightblue: F2-Y; magenta: F1-U; yellow: In-B; cyan: In-M; and maroon: In-E. (B) Gene organizations of six known Tn1549-like CTns. Tn1549 from E. faecalis strain BM4382, an unnamed CTn fromE. faecalis V583, CTn4, CTn2 and CTn5 from C. difficile, and an unnamed CTn from S. pyogenes MGAS10750 are shown. Pentagons and circles represent ORFs. ORFs in the conserved regionsare depicted in green, and orthologues are connected with pink vertical lines. The three ORFs depicted in blue are tentatively positioned in this figure to more clearly show their orthologousrelationships, and their actual positions are indicated by dotted arrow lines. Orange pentagons represent the accessory genes of the six CTns. Those from Tn1549 and the CTn of E. faecalis V583include vancomycin-resistance genes.30,32 (C) Sequence diversity of the Tn1549-like CTn family (‘CTnRINT’). All the CTnRINT-related gene products were searched for their best-hithomologues among those on the six known CTnRINT family members. The six columns represent Tn1549, the CTn from E. faecalis V583, CTn4, CTn2 and CTn5 from C. difficile, andthe CTn from S. pyogenes MGAS10750, respectively (from left to right). The BLASTP identity (%) for a CTnRINT-related gene product was plotted in the CTn column where its best-hithomologue was identified.

No.

4]K

.K

urokawa

etal.

179

revealed 237 COGs commonly enriched in adult-typemicrobiomes and 136 COGs in infant-type microbiomes(Fig. 5 and Supplementary Table S6). The characteriz-ation of these genes revealed distinct nutrient acquisitionstrategies in each type of microbiota, possibly to accom-modate very different diets of their hosts. The analysisalso revealed several possible strategies through whichintestinal microbes adapt to the intestinal environmentand establish symbiotic relationships with their host.Thus, these genes, which we refer to as ‘Adult- orInfant-gut Commonly Enriched Genes’ (A-gutCEGs orI-gutCEGs), appear to encode some of the core functionsof each type of microbiota. It is noteworthy that the twogene sets contain as many as 104 gene families ofunknown functions (Fig. 5). The in vitro and in vivo func-tions of these uncharacterized genes as well as those of the647 ‘new’ gene families (Supplementary Table S8) wouldbe important topics of future studies.

Thirdly, a survey of the enriched genes revealed anabundance of mobile genetic elements in the human intes-tinal gene pool, emphasizing that the human gut micro-biota is a “hot spot” for HGT between microbes. Ofparticular importance is the abundance of conjugalelements including CTnRINT. Considering their hightransfer efficiency, the broad range of hosts, and the fre-quent carriage of drug-resistance genes, it would beprudent to reassess the heavy use of antibiotics inmodern medicine.

Finally, the metagenomic datasets presented here willbe of great use for understanding the roles of gut micro-biota in the etiology of human diseases and also for scien-tifically evaluating the efficacy of probiotics, prebioticsand other ‘functional foods’ that are widely used for mod-ulating the intestinal microbiota in an effort to improveour health7.

Acknowledgements: We thank Jun Ishikawa, PawanK. Dhar, and Takeaki Taniguchi for their help andsupport with the data analysis, Noriko Itoh, HiromiInaba, Asako Tamura, Keiko Furuya, KanakoMotomura, Yasue Yamashita, Chie Yoshino, and YuriHayakawa for their technical assistances, YumikoHayashi for her editorial assistance, and thoseresponsible for the microbial genome sequencing projectat the Washington University Genome SequencingCenter for providing open access to unpublished genomesequence data.

Funding

This work was supported by Grants-in-Aid forScientific Research on Priority Areas ‘ComprehensiveGenomics’ (M.H.) and ‘Applied Genomics’ (T.H. andT.K.) from the Ministry of Education, Science, andTechnology of Japan, and a Grant-in-Aid from theInstitute for Bioinformatics Research and Development,

the Japan Science and Technology Agency (BIRD-JST)(K.K.).

Supplementary Data: Supplementary data areavailable online at www.dnaresearch.oxfordjournals.org.

References

1. Backhed, F., Ley, R. E., Sonnenburg, J. L., Peterson, D. A.and Gordon, J. I. 2005, Host-bacterial mutualism in thehuman intestine, Science, 307, 1915–1920.

2. Ley, R. E., Peterson, D. A. and Gordon, J. I. 2006,Ecological and evolutionary forces shaping microbial diver-sity in the human intestine, Cell, 124, 837–848.

3. Savage, D. C. 2001, Microbial biota of the human intestine:a tribute to some pioneering scientists, Curr. Issues. Intest.Microbiol., 2, 1–15.

4. Tancrede, C. 1992, Role of human microflora in health anddisease, Eur J Clin Microbiol. Infect. Dis., 11, 1012–1015.

5. Eckburg, P. B., Bik, E. M., Bernstein, C. N., et al. 2005,Diversity of the human intestinal microbial flora, Science,308, 1635–1638.

6. Fanaro, S., Chierici, R., Guerrini, P. and Vigi, V. 2003,Intestinal microflora in early infancy: composition anddevelopment, Acta Paediatr. 91, (Suppl.) 48–55.

7. Gibson, G. R. and Roberfroid, M. B. 1995, Dietary modu-lation of the human colonic microbiota: introducing theconcept of prebiotics, J. Nutr., 125, 1401–1412.

8. Guarner, F. and Malagelada, J. R. 2003, Gut flora in healthand disease, Lancet, 361, 512–519.

9. Hooper, L. V. and Gordon, J. I. 2001, Commensal host-bac-terial relationships in the gut, Science, 292, 1115–1118.

10. Backhed, F., Ding, H., Wang, T., et al. 2004, The gut micro-biota as an environmental factor that regulates fat storage,Proc. Natl. Acad. Sci. USA, 101, 15718–15723.

11. Rakoff-Nahoum, S., Paglino, J., Eslami-Varzaneh, F.,Edberg, S. and Medzhitov, R. 2004, Recognition of commen-sal microflora by toll-like receptors is required for intestinalhomeostasis, Cell, 118, 229–241.

12. Samuel, B. S. and Gordon, J. I. 2006, A humanized gnoto-biotic mouse model of host-archaeal-bacterial mutualism.Proc. Natl. Acad. Sci. USA, 103, 10011–10016.

13. Ley, R. E., Turnbaugh, P. J., Klein, S. and Gordon, J. I.2006, Microbial ecology: human gut microbes associatedwith obesity, Nature, 444, 1022–1023.

14. Macdonald, T. T. and Monteleone, G. 2005, Immunity,inflammation, and allergy in the gut, Science, 307, 1920–1925.

15. Manichanh, C., Rigottier-Gois, L., Bonnaud, E., et al. 2006,Reduced diversity of faecal microbiota in Crohn’s diseaserevealed by a metagenomic approach, Gut, 55, 205–211.

16. McGarr, S. E., Ridlon, J. M. and Hylemon, P. B. 2005, Diet,anaerobic bacterial metabolism, and colon cancer: a reviewof the literature, J. Clin. Gastroenterol., 39, 98–109.

17. Seksik, P., Sokol, H., Lepage, P., et al. 2006, Review article:the role of bacteria in onset and perpetuation of inflamma-tory bowel disease, Aliment. Pharmacol. Ther., 24, Suppl3, 11–18.

18. Turnbaugh, P. J., Ley, R. E., Mahowald, M. A., Magrini, V.,Mardis, E. R. and Gordon, J. I. 2006, An obesity-associated

180 Comparative metagenomics of human gut microbiomes [Vol. 14,

gut microbiome with increased capacity for energy harvest,Nature, 444, 1027–1031.

19. Finegold, S. M., Sutter, V. L. and Mathisen, G. E. 1983,Normal indigenous flora, In: Hentges, D. J. ed. HumanIntestinal Microflora in Health and Disease. NY,Academic Press, 3–31.

20. Tringe, S. G. and Rubin, E. M. 2005, Metagenomics: DNAsequencing of environmental samples, Nat. Rev. Genet., 6,805–814.

21. Gill, S. R., Pop, M., Deboy, R. T., et al. 2006, Metagenomicanalysis of the human distal gut microbiome, Science, 312,1355–1359.

22. Benno, Y., Endo, K., Mizutani, T., Namba, Y., Komori, T.and Mitsuoka, T. 1989, Comparison of fecal microflora ofelderly persons in rural and urban areas of Japan, Appl.Environ. Microbiol., 55, 1100–1105.

23. Favier, C. F., Vaughan, E. E., De Vos, W. M. andAkkermans, A. D. 2002, Molecular monitoring of successionof bacterial communities in human neonates, Appl. Environ.Microbiol., 68, 219–226.

24. Hopkins, M. J., Sharp, R. and Macfarlane, G. T. 2001, Ageand disease related changes in intestinal bacterial popu-lations assessed by cell culture, 16S RNA abundance, andcommunity cellular fatty acid profiles, Gut, 48, 198–205.

25. Stewart, J. A., Chadwick, V. S. and Murray, A. 2005,Investigations into the influence of host genetics on the pre-dominant eubacteria in the faecal microflora of children,J. Med. Microbiol., 54, 1239–1242.

26. Venter, J. C., Remington, K., Heidelberg, J. F., et al. 2004,Environmental genome shotgun sequencing of the SargassoSea, Science, 304, 66–74.

27. Hattori, M., Tsukahara, F., Furuhata, Y., et al. 1997, Anovel method for making nested deletions and its appli-cation for sequencing of a 300 kb region of human APPlocus, Nuc. Acids Res., 25, 1802–1808.

28. Huang, X., Wang, J., Aluru, S., Yang, S. P. and Hillier, L.2003, PCAP: a whole-genome assembly program, GenomeRes., 13, 2164–2170.

29. Noguchi, H., Park, J. and Takagi, T. 2006, MetaGene: pro-karyotic gene finding from environmental genome shotgunsequences, Nucl. Acids Res., 34, 5623–5630.

30. Garnier, F., Taourit, S., Glaser, P., Courvalin, P. andGalimand, M. 2000, Characterization of transposonTn1549, conferring VanB-type resistance in Enterococcusspp, Microbiology, 146(Pt 6), 1481–1489.

31. Sebaihia, M., Wren, B. W., Mullany, P., et al. 2006, Themultidrug-resistant human pathogen Clostridium difficilehas a highly mobile, mosaic genome, Nat. Genet., 38,779–786.

32. Paulsen, I. T., Banerjei, L., Myers, G. S., et al. 2003, Role ofmobile DNA in the evolution of vancomycin-resistantEnterococcus faecalis, Science, 299, 2071–2074.

33. Beres, S. B., Richter, E. W., Nagiec, M. J., et al. 2006,Molecular genetic anatomy of inter- and intraserotype vari-ation in the human bacterial pathogen group AStreptococcus, Proc. Natl. Acad. Sci. USA, 103,7059–7064.

34. Altschul, S. F., Madden, T. L., Schaffer, A. A., et al. 1997,Gapped BLAST and PSI-BLAST: a new generation of

protein database search programs. Nucl. Acids Res., 25,3389–3402.

35. DeLong, E. F., Preston, C. M., Mincer, T., et al. 2006,Community genomics among stratified microbial assem-blages in the ocean’s interior, Science, 311, 496–503.

36. Kunin, V., Ahren, D., Goldovsky, L., Janssen, P. andOuzounis, C. A. 2005, Measuring genome conservationacross taxa: divided strains and united kingdoms, Nucl.Acids Res., 33, 616–621.

37. Korbel, J. O., Snel, B., Huynen, M. A. and Bork, P. 2002,SHOT: a web server for the construction of genome phyloge-nies, Trends Genet., 18, 158–162.

38. de Hoon, M. J., Imoto, S., Nolan, J. and Miyano, S. 2004,Open source clustering software, Bioinformatics, 20,1453–1454.

39. Tringe, S. G., Mering von, C., Kobayashi, A., et al. 2005,Comparative metagenomics of microbial communities,Science, 308, 554–557.

40. Enright, A. J., Van Dongen, S. and Ouzounis, C. A. 2002,An efficient algorithm for large-scale detection of proteinfamilies, Nucl. Acids Res., 30, 1575–1584.

41. Finn, R. D., Mistry, J., Schuster-Bockler, B., Griffiths-Jones, S., et al. 2006, Pfam: clans, web tools and services,Nucl. Acids Res., 34(Database Issue), D247–D251.

42. Bailey, T. L. and Elkan, C. 1994, Fitting a mixture model byexpectation maximization to discover motifs in biopolymers,Proc. Int. Conf. Intell. Syst. Mol. Biol., 2, 28–36.

43. Peschel, A. and Sahl, H. G. 2006, The co-evolution of hostcationic antimicrobial peptides and microbial resistance,Nat. Rev. Microbiol., 4, 529–536.

44. Wakabayashi, K., Nagao, M., Esumi, H. and Sugimura, T.1992, Food-derived mutagens and carcinogens. CancerRes., 52, 2092s–2098s.

45. Bry, L., Falk, P. G., Midtvedt, T. and Gordon, J. I. 1996, Amodel of host-microbial interactions in an open mammalianecosystem, Science, 273, 1380–1383.

46. Domingo, M. C., Huletsky, A., Bernal, A., et al. 2005,Characterization of a Tn5382-like transposon containingthe vanB2 gene cluster in a Clostridium strain isolatedfrom human faeces, J. Antimicrob. Chemother., 55,466–474.

47. Melville, C. M., Brunel, R., Flint, H. J. and Scott, K. P.2004, The Butyrivibrio fibrisolvens tet(W) gene is carriedon the novel conjugative transposon TnB1230, whichcontains duplicated nitroreductase coding sequences,J. Bacteriol., 186, 3656–3659.

48. Launay, A., Ballard, S. A., Johnson, P. D., Grayson, M. L.and Lambert, T. 2006, Transfer of vancomycin resistancetransposon Tn1549 from Clostridium symbiosum toEnterococcus spp. in the gut of gnotobiotic mice,Antimicrob. Agents. Chemother., 50, 1054–1062.

49. Franke, A. E. and Clewell, D. B. 1981, Evidence for achromosome-borne resistance transposon (Tn916) inStreptococcus faecalis that is capable of “conjugal” transferin the absence of a conjugative plasmid, J. Bacteriol., 145,494–502.

50. Yachi, S. and Loreau, M. 1999, Biodiversity and ecosystemproductivity in a fluctuating environment: the insurancehypothesis. Proc. Natl. Acad. Sci. USA, 96, 1463–1468.

No. 4] K. Kurokawa et al. 181