Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations


LETTER
doi:10.1038/nature10989

Brian J. O’Roak¹, Laura Vives¹, Santhosh Girirajan¹, Emre Karakoc¹, Niklas Krumm¹, Bradley P. Coe¹, Roie Levy¹, Arthur Ko¹, Choli Lee¹, Joshua D. Smith¹, Emily H. Turner¹, Ian B. Stanaway¹, Benjamin Vernot¹, Maika Malig¹, Carl Baker¹, Beau Reilly², Joshua M. Akey¹, Elhanan Borenstein¹,³,⁴, Mark J. Rieder¹, Deborah A. Nickerson¹, Raphael Bernier², Jay Shendure¹ & Evan E. Eichler¹,⁵

It is well established that autism spectrum disorders (ASD) have a strong genetic component; however, for at least 70% of cases, the underlying genetic cause is unknown¹. Under the hypothesis that de novo mutations underlie a substantial fraction of the risk for developing ASD in families with no previous history of ASD or related phenotypes (so-called sporadic or simplex families²,³), we sequenced all coding regions of the genome (the exome) for parent–child trios exhibiting sporadic ASD, including 189 new trios and 20 that were previously reported⁴. We also sequenced the exomes of 50 unaffected siblings corresponding to these new (n = 31) and previously reported trios (n = 19)⁴, for a total of 677 individual exomes from 209 families. Here we show that de novo point mutations are overwhelmingly paternal in origin (4:1 bias) and positively correlated with paternal age, consistent with the modest increased risk for children of older fathers to develop ASD⁵. Moreover, 39% (49 of 126) of the most severe or disruptive de novo mutations map to a highly interconnected β-catenin/chromatin remodelling protein network ranked significantly for autism candidate genes. In proband exomes, recurrent protein-altering mutations were observed in two genes: CHD8 and NTNG1. Mutation screening of six candidate genes in 1,703 ASD probands identified additional de novo, protein-altering mutations in GRIN2B, LAMC3 and SCN1A. Combined with copy number variant (CNV) data, these results indicate extreme locus heterogeneity but also provide a target for future discovery, diagnostics and therapeutics.

We selected 189 autism trios from the Simons Simplex Collection (SSC)⁶, which included males significantly impaired with autism and intellectual disability (n = 47), a female sample set (n = 56) of which 26 were cognitively impaired, and samples chosen at random from the remaining males in the collection (n = 86) (Supplementary Table 1 and Supplementary Fig. 1). In general, we excluded samples known to carry large de novo CNVs². Exome sequencing was performed as described previously⁴, but with an expanded target definition (see Methods). We achieved sufficient coverage for both parents and child to call genotypes for, on average, 29.5 megabases (Mb) of haploid exome coding sequence (Supplementary Table 1). In addition, we performed copy number analysis on 122 of these families, using a combination of the exome data, array comparative genomic hybridization (CGH), and genotyping arrays, thereby providing a more comprehensive view of rare variation.

In the 189 new probands, we validated 248 de novo events, 225 single nucleotide variants (SNVs), 17 small insertions/deletions (indels), and six CNVs (Supplementary Table 2). These included 181 non-synonymous changes, of which 120 were classified as severe based on sequence conservation and/or biochemical properties (Methods and Supplementary Table 3). The observed point mutation rate in coding sequence was ~1.3 events per trio or 2.17 × 10^-8 per base per generation, in close agreement with our previous observations⁴, yet in general, higher than previous studies, indicating increased sensitivity (Supplementary Table 2 and Supplementary Table 4)⁷. We also observed complex classes of de novo mutation including: five cases of multiple mutations in close proximity; two events consistent with paternal germline mosaicism (that is, where both siblings contained a de novo event observed in neither parent); and nine events showing a weak minor allele profile consistent with somatic mosaicism (Supplementary Table 3 and Supplementary Figs 2 and 3).
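As a quick arithmetic check on the rate quoted above, the sketch below recomputes it from the counts given in the text. It assumes that the point-mutation count includes both SNVs and small indels and that the diploid callable target is twice the 29.5 Mb haploid figure; these assumptions reproduce the quoted numbers but are not a statement of the authors' exact procedure.

```python
# Values quoted in the text above.
snvs = 225
indels = 17
trios = 189
haploid_callable_bases = 29.5e6  # average jointly covered haploid coding sequence

# Assumption: point mutations = SNVs + small indels (CNVs excluded).
point_mutations = snvs + indels
events_per_trio = point_mutations / trios

# Each child carries two haploid genomes, so the target is doubled.
rate_per_base = events_per_trio / (2 * haploid_callable_bases)

print(f"{events_per_trio:.2f} events per trio")
print(f"{rate_per_base:.2e} per base per generation")
```

Counting SNVs alone would give a lower rate, which is why the sketch includes the indels.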

Of the severe de novo events, 28% (33 of 120) are predicted to truncate the protein. The distribution of synonymous, missense and nonsense changes corresponds well with a random mutation model⁷ (Supplementary Fig. 4 and Supplementary Table 2). However, the difference in nonsense rates between de novo and rare singleton events (not present in 1,779 other exomes) is striking (4:1) and suggests strong selection against new nonsense events (Fisher’s exact test, P < 0.0001). In contrast with a recent report⁸, we find no significant difference in mutation rate between affected and unaffected individuals; however, we do observe a trend towards increased non-synonymous rates in probands, consistent with the findings of ref. 9 (Supplementary Tables 1 and 2).

Given the association of ASD with increased paternal age⁵ and our previous observations⁴, we used molecular cloning, read-pair information, and obligate carrier status to identify informative markers linked to 51 de novo events and observed a marked paternal bias (41:10; binomial P < 1.4 × 10^-5; Fig. 1a and Supplementary Tables 3 and 5). This provides strong direct evidence that the germline mutation rate in protein-coding regions is, on average, substantially higher in males. A similar finding was recently reported for de novo CNVs¹⁰. In addition, we observe that the number of de novo events is positively correlated with increasing paternal age (Spearman’s rank correlation = 0.19; P < 0.008; Fig. 1b). Together, these observations are consistent with the hypothesis that the modest increased risk for children of older fathers to develop ASD⁵ is the result of an increased mutation rate.
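The parent-of-origin result above can be illustrated with an exact binomial tail under the null hypothesis that a de novo event is equally likely to arise on either parental haplotype. This is a minimal sketch using only the 41:10 split quoted in the text; whether the reported P value is one-sided or two-sided is not stated here, so both are computed.

```python
from math import comb

def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

paternal, maternal = 41, 10
n = paternal + maternal

one_sided = binomial_upper_tail(paternal, n)
# With p = 0.5 the null distribution is symmetric, so doubling the tail is exact.
two_sided = min(1.0, 2 * one_sided)

print(f"one-sided P = {one_sided:.2e}, two-sided P = {two_sided:.2e}")
```

Either way the tail probability is on the order of 10^-5, so a 41:10 split is very unlikely under equal maternal and paternal mutation rates.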

Using sequence read-depth methods in 122 of the 189 families, we scanned ASD probands for either de novo CNVs or rare (<1% of controls), inherited CNVs. Individual events were validated by either array CGH or genotyping array (see Methods). We identified 76 events in 53 individuals, including six de novo (median size 467 kilobases (kb)) and 70 inherited (median size 155 kb) CNVs (Supplementary Table 6). These include disruptions of EHMT1 (Kleefstra’s syndrome, Online Mendelian Inheritance in Man (OMIM) accession 610253), CNTNAP4 (reported in children with developmental delay and autism¹¹) and the 16p11.2 duplication (OMIM 611913) associated with developmental delay, bipolar disorder and schizophrenia.

We performed a multivariate analysis on non-verbal IQ (NVIQ), verbal IQ (VIQ) and the load of ‘extreme’ de novo mutations, where extreme is defined as point mutations that truncate proteins, intersect Mendelian or ASD loci (n = 57), or de novo CNVs that intersect genes (n = 5) (Fig. 1c and Supplementary Discussion). NVIQ, but not VIQ, decreased significantly (P < 0.01) with increased number of events. Covariant analysis of the samples with CNV data showed that this finding was strengthened, but not exclusively driven, by the presence of either de novo or rare CNVs (Supplementary Fig. 5).

¹Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA. ²Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington 98195, USA. ³Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA. ⁴Santa Fe Institute, Santa Fe, New Mexico 87501, USA. ⁵Howard Hughes Medical Institute, Seattle, Washington 98195, USA.

00 MONTH 2012 | VOL 000 | NATURE | 1

©2012 Macmillan Publishers Limited. All rights reserved

Among the de novo events, we identified 62 top ASD risk contributing mutations based on the deleteriousness of the mutations, functional evidence, or previous studies (Table 1). Probands with these mutations spanned the range of IQ scores, with only a modest non-significant trend towards individuals co-morbid with intellectual disability (Supplementary Figs 1 and 6). We observed recurrent, protein-disruptive mutations in two genes: NTNG1 (netrin G1) and CHD8 (chromodomain helicase DNA binding protein 8). Given their locus-specific mutation rates, the probability of identifying two independent mutations in our sample set is low (uncorrected, NTNG1: P < 1.2 × 10^-6; CHD8: P < 6.9 × 10^-5) (Supplementary Fig. 7, Supplementary Table 8 and Methods). NTNG1 is a strong biological candidate given its role in laminar organization of dendrites and axonal guidance¹² and was also reported as being disrupted by a de novo translocation in a child with Rett’s syndrome, without MECP2 mutation¹³. Both de novo mutations identified here are missense (p.Tyr23Cys and p.Thr135Ile) at highly conserved positions predicted to disrupt protein function, although there is evidence of mosaicism for the former mutation (Supplementary Table 3).
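To give a feel for why two independent hits in one gene are improbable, the sketch below uses a simple Poisson model for the number of de novo events at a single locus. This is one common way to frame recurrence and is not necessarily the authors' calculation, which is described in their Methods. The per-gene rate is a HYPOTHETICAL placeholder; the locus-specific estimates used in the paper are in Supplementary Table 8 and are not reproduced here.

```python
from math import exp

def prob_at_least_two(expected: float) -> float:
    """P(X >= 2) for X ~ Poisson(expected)."""
    return 1.0 - exp(-expected) * (1.0 + expected)

# HYPOTHETICAL rate of protein-altering de novo mutations in one gene,
# per haploid copy per generation (illustrative value only).
gene_rate = 2.0e-6
probands = 209

# Each proband contributes two haploid copies of the gene.
expected = 2 * probands * gene_rate

print(f"expected events = {expected:.2e}")
print(f"P(>= 2 events)  = {prob_at_least_two(expected):.2e}")
```

Because the expected count is far below one, the probability of two or more events scales roughly with the square of the per-gene rate, which is why longer or more mutable genes yield less extreme P values.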

CHD8 has not previously been associated with ASD and codes for an ATP-dependent chromatin-remodelling factor that has a significant role in the regulation of both β-catenin and p53 signalling¹⁴,¹⁵. We also identified de novo missense variants in CHD3 as well as CHD7 (CHARGE syndrome, OMIM 214800), a known binding partner of CHD8 (ref. 16). ASD has been found in as many as two-thirds of children with CHARGE, indicating that CHD7 may contribute to an ASD syndromic subtype¹⁷.

We identified 30 protein-altering de novo events intersecting with Mendelian disease loci (Supplementary Table 3) as well as inherited hemizygous mutations of clinical significance (Supplementary Table 9). The de novo mutations included truncating events in syndromic intellectual disability genes (MBD5 (mental retardation, autosomal dominant 1, OMIM 156200), RPS6KA3 (Coffin–Lowry syndrome, OMIM 303600) and DYRK1A (the Down’s syndrome candidate gene, OMIM 600855)), and missense variants in loci associated with syndromic ASD, including CHD7, PTEN (macrocephaly/autism syndrome, OMIM 605309) and TSC2 (tuberous sclerosis complex, OMIM 613254). Notably, DYRK1A is a highly conserved gene mapping to the Down’s syndrome critical region (Supplementary Fig. 8). The proband here (13890) is severely cognitively impaired and microcephalic, consistent with previous studies of DYRK1A haploinsufficiency in both patients and mouse models¹⁸.

Twenty-one of the non-synonymous de novo mutations map to CNV regions recurrently identified in children with developmental delay and ASD (Supplementary Table 10), such as MBD5 (2q23.1 deletion syndrome), SYNRG (17q12 deletion syndrome) and POLRMT (19p13.3 deletion)¹⁹. There is also considerable overlap with genes disrupted by single de novo CNVs in children with ASD (for example, NLGN1 and ARID1B; Supplementary Table 11). Given the prior probability that these loci underlie genomic disorders, the disruptive de novo SNVs and small indels may be pinpointing the possible major effect locus for ASD-related features. For example, we identified a complex de novo mutation resulting in truncation of SETBP1 (SET binding protein 1), one of five genes in the critical region for del(18)(q12.2q21.1) syndrome (Fig. 1d), which is characterized by hypotonia, expressive language delay, short stature and behavioural problems²⁰. Recurrent de novo missense mutations at SETBP1 were recently reported to be causative for a distinct phenotype, Schinzel–Giedion syndrome, probably through a gain-of-function mechanism²¹, indicating diverse phenotypic outcomes at this locus depending on mutation mechanism.

[Figure 1 image not reproduced. Panel a: 41 paternal events, 10 maternal events. Panel b: paternal age (months) versus number of de novo coding mutations (0, 1, 2, 3+). Panel c: non-verbal IQ versus number of extreme de novo mutations (0, 1, 2+). Panel d: Chr18 browser view, 18q12.3, showing SETBP1, SLC14A2, SLC14A1, SIGLEC15 and EPG5 with case and control CNV tracks.]

Figure 1 | De novo mutation events in autism spectrum disorder. a, Haplotype phasing using informative markers shows a strong parent-of-origin bias with 41 of 51 de novo events occurring on the paternally inherited haplotype. Arrows represent sequence reads from paternal (blue) or maternal (red) haplotypes. b, c, Box and whisker plots for 189 SSC probands. b, The paternal estimated age at conception versus the number of observed de novo point mutations (0, n = 53; 1, n = 65; 2, n = 44; 3+, n = 27). c, Decreased non-verbal IQ is significantly associated with an increasing number of extreme mutation events (0, n = 138; 1, n = 41; 2+, n = 10), both with and without CNVs (Supplementary Discussion). d, Browser images showing CNVs identified in the del(18)(q12.2q21.1) syndrome region. The truncating point mutation in SETBP1 occurs within the critical region, identifying the likely causative locus. Each red (deletion) and green (duplication) line represents an identified CNV in cases (solid lines) versus controls (dashed lines), with arrowheads showing point mutation.

Several of the mutated genes encode proteins that directly interact, suggesting a common biological pathway. From our full list of genes carrying truncating or severe missense mutations (126 events from all 209 families), we generated a protein–protein interaction (PPI) network based on a database of physical interactions (Supplementary Table 12)²². We found 39% (49 of 126) of the genes mapped to a highly interconnected network wherein 92% of gene pairs in the connected component are linked by paths of three or fewer edges (Fig. 2a). We tested this degree of interconnectivity by simulation (n = 10,000 replicates; Methods and Supplementary Fig. 9) and found that our experimental network had significantly more edges (P < 0.0001) and a greater clustering coefficient (P < 0.0001) than expected by chance.
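The statistic "gene pairs linked by paths of three or fewer edges" can be computed with a breadth-first search over the interaction graph. The sketch below shows that computation on a toy chain; the node labels are placeholders and the edges are not real interaction data. The function names are illustrative and do not come from the paper.

```python
from collections import deque
from itertools import combinations

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: edge counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def fraction_within(adjacency, max_edges=3):
    """Fraction of connected node pairs joined by a path of at most max_edges edges."""
    all_dist = {n: shortest_path_lengths(adjacency, n) for n in adjacency}
    connected = [(a, b) for a, b in combinations(adjacency, 2) if b in all_dist[a]]
    close = sum(1 for a, b in connected if all_dist[a][b] <= max_edges)
    return close / len(connected) if connected else 0.0

# Toy chain A-B-C-D-E with placeholder labels (not real interaction data).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

print(fraction_within(adjacency, max_edges=3))  # 9 of the 10 pairs: 0.9
```

A significance test in the spirit of the simulation described above would recompute such statistics on many randomly drawn gene sets and compare the observed value with that null distribution; the sampling scheme used in the paper is given in its Methods.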

To investigate the relevance of this network to autism further, we applied degree-aware disease gene prioritization (DADA)²³, based on the same PPI database to rank all genes based on their relatedness to a set of 103 previously identified ASD genes¹⁷. We found that the genes with severe mutations ranked significantly higher than all other genes (Mann–Whitney U-test, P < 4.0 × 10^-4), suggesting enrichment of ASD candidates. Furthermore, the 49 members of the connected component overwhelmingly drove this difference (Mann–Whitney U-test, P < 1.6 × 10^-8), as the unconnected members were not significant on their own (Mann–Whitney U-test, P < 0.28), increasing our confidence that these connected gene products are probably related to ASD (Supplementary Fig. 10). Consistent with this finding, the rankings of unaffected sibling events are highly similar to the unconnected component, strengthening our confidence in the enrichment of the connected component of proband events for ASD-relevant genes.
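The rank comparison described above is a Mann–Whitney U-test. The following sketch computes the U statistic directly and a one-sided P value from the normal approximation without tie correction. The rank values are made-up toy numbers for illustration only, and with samples this small an exact test would normally be preferred.

```python
from math import erfc, sqrt

def mann_whitney_u(sample_a, sample_b):
    """Number of (a, b) pairs with a < b; tied pairs count as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def one_sided_p(u, n_a, n_b):
    """Normal-approximation P value for 'sample_a tends to be smaller'."""
    mean = n_a * n_b / 2.0
    sd = sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean) / sd
    return 0.5 * erfc(z / sqrt(2.0))

# TOY prioritization ranks (smaller = more related to the seed genes).
mutated_gene_ranks = [3, 8, 15, 21, 40]
other_gene_ranks = [55, 60, 72, 90, 120, 150, 200]

u = mann_whitney_u(mutated_gene_ranks, other_gene_ranks)
p = one_sided_p(u, len(mutated_gene_ranks), len(other_gene_ranks))
print(f"U = {u}, one-sided P = {p:.4f}")
```

In this toy input every mutated-gene rank is better than every background rank, so U reaches its maximum of 5 × 7 = 35.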

Members of this network have known functions in β-catenin and p53 signalling, chromatin remodelling, ubiquitination and neuronal development (Fig. 2a). A fundamental developmental regulator observed in the network is CTNNB1 (catenin (cadherin-associated protein), β1, 88 kDa), also known as β-catenin. Interestingly, a parallel analysis using ingenuity pathway analysis (IPA) shows an enrichment of upstream interacting genes of the β-catenin pathway (8 of 358, P = 0.0030; see Methods, Supplementary Table 13 and Supplementary Fig. 11). A role for Wnt/β-catenin signalling in ASD was previously proposed²⁴, largely on the basis of the association of common variants in EN2 and WNT2, and the high rate of children with macrocephaly. It is striking that both individuals with CHD8 mutations in this study have multiple de novo disruptive missense mutations in this pathway or closely related pathways (Fig. 2b, c and Supplementary Fig. 12) and both have macrocephaly.

In addition, the pathway analysis shows several other disrupted genes not identified in the PPI that are involved in common pathways, which in some cases are linked to β-catenin (Supplementary Discussion and Supplementary Fig. 11). TBR1, for example, is a transcription factor that has a critical role in the development of the cerebral cortex²⁵. TBR1 binds with CASK and regulates several candidate genes for ASD and intellectual disability including GRIN2B, AUTS2 and RELN, genes of recurrent ASD mutation, some of which are described here and in other studies⁴,⁹,¹¹,¹⁷.

Our exome analysis of de novo coding mutations in 209 autism trios identified only two recurrently altered genes, consistent with extreme locus heterogeneity underlying ASD. This extreme heterogeneity necessitates the analysis of very large cohorts for validation. We implemented a cost-effective approach based on molecular inversion probe (MIP) technology²⁶ for the targeted resequencing of six candidate genes in ~2,500 individuals, including 1,703 simplex ASD probands and 744 controls. Four of these candidates (FOXP1, GRIN2B, LAMC3 and SCN1A) were identified previously⁴, whereas two (FOXP2, OMIM 602081 and GRIN2A, OMIM 613971) are related genes implicated in other neurodevelopmental phenotypes. We identified all previously observed de novo events (that is, in the same individuals), as well as additional de novo events in GRIN2B (two protein-truncating events), SCN1A (a missense) and LAMC3 (a missense) (Supplementary Table 8). The observed number of de novo events was compared with expectations based on the mutation rates estimated for each gene (Methods and Supplementary Table 8), with GRIN2B showing the highest significance (uncorrected P value < 0.0002). Notably, the three de novo events observed in GRIN2B are all predicted to be protein truncating, whereas no events truncating GRIN2B were found in more than 3,000 controls (Methods).

Our analysis predicts extreme locus heterogeneity underlying the genetic aetiology of autism. Under a strict sporadic disorder–de novo mutation model, if 20–30% of our de novo point mutations are considered to be pathogenic, we can estimate between 384 and 821 loci (Methods and Supplementary Fig. 13). We reach a similar estimate if we consider recurrences from ref. 9. It is clear from phenotype and genotype data that there are many ‘autisms’ represented under the current umbrella of ASD and other genetic models are more likely in different contexts (for example, families with multiple affected individuals). There is marked convergence on genes previously implicated in intellectual disability and developmental delay. As has been noted for CNVs, this indicates that nosological divisions may not readily translate into differences at the molecular level. We believe that there is value in comparing mutation patterns in children with developmental delay (without features of autism) to those in children with ASD.

Table 1 | Top de novo ASD risk contributing mutations

Proband    NVIQ  Candidate gene  Amino acid change
12225.p1   89    ABCA2           p.Val1845Met
11653.p1   44    ADCY5           p.Arg603Cys
12130.p1   55    ADNP            Frameshift indel
11224.p1   112   AP3B2           p.Arg435His
13447.p1   51    ARID1B          Frameshift indel
13415.p1   48    BRSK2           3n indel
14292.p1   49    BRWD1           Frameshift indel
11872.p1   65    CACNA1D         p.Ala769Gly
11773.p1   50    CACNA1E         p.Gly1209Ser
13606.p1   60    CDC42BPB        p.Arg764TERM
12086.p1   108   CDH5            p.Arg545Trp
12630.p1   115   CHD3            p.Arg1818Trp
13733.p1   68    CHD7            p.Gly996Ser
13844.p1   34    CHD8            p.Gln959TERM
12752.p1   93    CHD8            Frameshift indel
13415.p1   48    CNOT4           p.Asp48Asn
12703.p1   58    CTNNB1          p.Thr551Met
11452.p1   80    CUL3            p.Glu246TERM
11571.p1   94    CUL5            p.Val355Ile
13890.p1   42    DYRK1A          Splice site
12741.p1   87    EHD2            p.Arg167Cys
11629.p1   67    FBXO10          p.Glu54Lys
13629.p1   63    GPS1            p.Arg492Gln
13757.p1   91    GRINL1A         3n indel
11184.p1   94    HDGFRP2         p.Glu83Lys
11610.p1   138   HDLBP           p.Ala639Ser
11872.p1   65    KATNAL2         Splice site
12346.p1   77    MBD5            Frameshift indel
11947.p1   33    MDM2            p.Glu433Lys/p.Trp160TERM
11148.p1   82    MLL3            p.Tyr4691TERM
12157.p1   91    NLGN1           p.His795Tyr
11193.p1   138   NOTCH3          p.Gly1134Arg
11172.p1   60    NR4A2           p.Tyr275His
11660.p1   60    NTNG1           p.Thr135Ile
12532.p1   110   NTNG1           p.Tyr23Cys
11093.p1   91    OPRL1           p.Arg157Cys
13793.p1   56    PCDHB4          p.Asp555His
11707.p1   23    PDCD1           Frameshift indel
12304.p1   83    PSEN1           p.Thr421Ile
11390.p1   77    PTEN            p.Thr167Asn
13629.p1   63    PTPRK           p.Arg784His
13333.p1   69    RGMA            p.Val379Ile
13222.p1   86    RPS6KA3         p.Ser369TERM
11257.p1   128   RUVBL1          p.Leu365Gln
11843.p1   113   SESN2           p.Ala46Thr
12933.p1   41    SETBP1          Frameshift indel
12565.p1   79    SETD2           Frameshift indel
12335.p1   47    TBL1XR1         p.Leu282Pro
11480.p1   41    TBR1            Frameshift indel
11569.p1   67    TNKS            p.Arg568Thr
12621.p1   120   TSC2            p.Arg1580Trp
11291.p1   83    TSPAN17         p.Ser75TERM
11006.p1   125   UBE3C           p.Ser845Phe
12161.p1   95    UBR3            Frameshift indel
12521.p1   78    USP15           Frameshift indel
11526.p1   92    ZBTB41          p.Tyr886His
13335.p1   25    ZNF420          p.Leu76Pro

CNV
Proband    NVIQ  Candidate gene  Type
11928.p1   66    CHRNA7          Duplication
13815.p1   56    CNTNAP4         Deletion
13726.p1   59    CTNND1          Deletion
12581.p1   34    EHMT1           Deletion
13335.p1   25    TBX6            Duplication

Top candidate mutations based on severity and/or supporting evidence from the literature.

Although there is no one major genetic lesion responsible for ASD, it is still largely unknown whether there are subsets of individuals with a common or strongly related molecular aetiology and how large these subsets are likely to be. Using gene expression, protein–protein interactions, and CNV pathway analysis, recent reports have highlighted the role of synapse formation and maintenance²⁷–²⁹. We find it intriguing that 49 proteins found to be mutated here have critical roles in fundamental developmental pathways, including β-catenin and p53 signalling, and that patients have been identified with multiple disruptive de novo mutations in interconnected pathways. The latter observations are consistent with an oligogenic model of autism where both de novo and extremely rare inherited SNV and CNV mutations contribute in conjunction to the overall genetic risk. Recent work has supported a role for these interconnected pathways in neuronal stem-cell fate-determination, differentiation and synaptic formation in humans and animal models²⁴,³⁰,³¹. Given that fundamental developmental processes have previously been found to underlie syndromic forms of autism, a wider role of these pathways in idiopathic ASD would not be entirely surprising and would help explain the extreme genetic heterogeneity observed in this study.

METHODS SUMMARY

Exome capture, alignments and base-calling. Genomic DNA was derived directly from whole blood. Exomes were considered to be completed when ~90% of the capture target exceeded 8-fold coverage and ~80% exceeded 20-fold coverage. Exomes for the 189 trios (and 31 unaffected siblings) were captured with NimbleGen EZ Exome V2.0. Reads were mapped as in ref. 4 to a custom reference genome assembly (GRC build37). Genotypes were generated with GATK unified genotyper and parallel SAMtools pipeline⁴. Exomes for the unaffected siblings matching the pilot trios were captured and analysed as in ref. 4. Predicted de novo events were called as in ref. 4 and confirmed by capillary sequencing in all family members (for 176 of the 189 trios, this also included one unaffected sibling). Mutations were considered severe if they were truncating, missense with Grantham score ≥50 and GERP score ≥3 or only Grantham score ≥85, or deleted a highly conserved amino acid.

Exome read-depth CNV analysis. Reads were mapped using mrsFAST and normalized reads per kilobase of exon per million mapped reads (RPKM) values calculated by exon. Population normalization was performed using a set of 366 non-ASD exomes. Calls were made if three or more exons passed a threshold value and cross-validated calls using two orthogonal platforms, custom array CGH and Illumina 1M array data². CNVs were filtered to identify de novo and rare inherited events by comparison with 2,090 controls and 1,651 parent profiles.

Network reconstruction and null model estimation. PPI networks were generated using physical interaction data from GeneMANIA²². Null models were estimated using gene-specific mutation rate estimates based on human–chimp divergence. To rank candidate genes we obtained the seed ASD list from ref. 17 and severe disruptive de novo events from all families (n = 209). Given the PPI network and seed gene product list, we used DADA²³ for ranking each gene.

Human subjects. All samples and phenotypic data were collected under the direction of the Simons Simplex Collection by its 12 research clinic sites (http://sfari.org/sfari-initiatives/simons-simplex-collection). Parents consented and children assented as required by each local institutional review board. Participants were de-identified before distribution. Research was approved by the University of Washington Human Subject Division under non-identifiable biological specimens/data.
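The severity rule stated in the exome-capture paragraph of the Methods Summary can be written as a small predicate. This is a minimal sketch of that rule as worded there; the `Variant` fields and the effect labels are assumptions made for illustration, not the authors' data model, and how "highly conserved" was determined is not specified in this summary.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    # Effect labels below are illustrative assumptions.
    effect: str  # e.g. "truncating", "missense", "inframe_deletion", "synonymous"
    grantham: Optional[float] = None  # Grantham score, missense only
    gerp: Optional[float] = None      # GERP conservation score
    deletes_conserved_residue: bool = False

def is_severe(v: Variant) -> bool:
    """Apply the stated rule: truncating; or missense with Grantham >= 50 and
    GERP >= 3, or Grantham >= 85 alone; or deletion of a conserved residue."""
    if v.effect == "truncating":
        return True
    if v.effect == "missense" and v.grantham is not None:
        if v.grantham >= 85:
            return True
        if v.grantham >= 50 and v.gerp is not None and v.gerp >= 3:
            return True
    return v.deletes_conserved_residue

examples = [
    Variant("truncating"),
    Variant("missense", grantham=60, gerp=4.2),
    Variant("missense", grantham=60, gerp=1.0),
    Variant("missense", grantham=101),
    Variant("inframe_deletion", deletes_conserved_residue=True),
]
print([is_severe(v) for v in examples])
```

Keeping the thresholds in one function makes it easy to see that a missense change with a Grantham score between 50 and 84 needs conservation support, whereas a score of 85 or more qualifies on its own.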

Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.

Received 8 September 2011; accepted 23 February 2012.

Published online 4 April 2012.

1. Schaaf, C. P. & Zoghbi, H. Y. Solving the autism puzzle a few pieces at a time. Neuron 70, 806–808 (2011).

2. Sanders, S. J. et al. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70, 863–885 (2011).

3. Levy, D. et al. Rare de novo and transmitted copy-number variation in autistic spectrum disorders. Neuron 70, 886–897 (2011).

4. O’Roak, B. J. et al. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nature Genet. 43, 585–589 (2011).

5. Hultman, C. M., Sandin, S., Levine, S. Z., Lichtenstein, P. & Reichenberg, A. Advancing paternal age and risk of autism: new evidence from a population-based study and a meta-analysis of epidemiological studies. Mol. Psychiatry 16, 1203–1212 (2010).

6. Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).

7. Lynch, M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl Acad. Sci. USA 107, 961–968 (2010).

8. Xu, B. et al. Exome sequencing supports a de novo mutational paradigm for schizophrenia. Nature Genet. 43, 864–868 (2011).

9. Sanders, S. J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature http://dx.doi.org/10.1038/nature10945 (this issue).

10. Hehir-Kwa, J. Y. et al. De novo copy number variants associated with intellectual disability have a paternal origin and age bias. J. Med. Genet. 48, 776–778 (2011).

11. O’Roak, B. J. & State, M. W. Autism genetics: strategies, challenges, and opportunities. Autism Res. 1, 4–17 (2008).

[Figure 2 graphic. Panel a: PPI network of genes with severe de novo mutations, nodes marked by mutation type (nonsense, splice, frameshift, deletion of amino acid, missense). Panel b: proband 13844.p1, with de novo truncating SNVs in CHD8 and CUBN and deletions affecting PITRM1, SPNS3 and RCAN1 (5 gene disruptions). Panel c: GeneMANIA network view including CHD8, CUBN, RCAN1 and CTNNB1.]

Figure 2 | Mutations identified in protein–protein interaction (PPI) networks. a, The 49-gene connected component of the PPI network formed from 126 genes with severe de novo mutations among the 209 probands. b, Proband 13844 inherits three rare gene-disruptive CNVs and carries two de novo truncating mutations. c, GeneMANIA (ref. 22) view of three of the affected genes from b (red labels), which encode proteins that are part of a β-catenin-linked network. This proband is macrocephalic, impaired cognitively, and has deficits in social behaviour and language development (Supplementary Discussion).

RESEARCH LETTER

4 | N A T U R E | V O L 0 0 0 | 0 0 M O N T H 2 0 1 2

Macmillan Publishers Limited. All rights reserved©2012

12. Nishimura-Akiyoshi, S., Niimi, K., Nakashiba, T. & Itohara, S. Axonal netrin-Gs transneuronally determine lamina-specific subdendritic segments. Proc. Natl Acad. Sci. USA 104, 14801–14806 (2007).

13. Borg, I. et al. Disruption of Netrin G1 by a balanced chromosome translocation in a girl with Rett syndrome. Eur. J. Hum. Genet. 13, 921–927 (2005).

14. Nishiyama, M. et al. CHD8 suppresses p53-mediated apoptosis through histone H1 recruitment during early embryogenesis. Nature Cell Biol. 11, 172–182 (2009).

15. Thompson, B. A., Tremblay, V., Lin, G. & Bochar, D. A. CHD8 is an ATP-dependent chromatin remodeling factor that regulates β-catenin target genes. Mol. Cell. Biol. 28, 3894–3904 (2008).

16. Batsukh, T. et al. CHD8 interacts with CHD7, a protein which is mutated in CHARGE syndrome. Hum. Mol. Genet. 19, 2858–2866 (2010).

17. Betancur, C. Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res. 1380, 42–77 (2011).

18. Moller, R. S. et al. Truncation of the Down syndrome candidate gene DYRK1A in two unrelated patients with microcephaly. Am. J. Hum. Genet. 82, 1165–1170 (2008).

19. Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nature Genet. 43, 838–846 (2011).

20. Buysse, K. et al. Delineation of a critical region on chromosome 18 for the del(18)(q12.2q21.1) syndrome. Am. J. Med. Genet. A. 146A, 1330–1334 (2008).

21. Hoischen, A. et al. De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nature Genet. 42, 483–485 (2010).

22. Warde-Farley, D. et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 38, W214–W220 (2010).

23. Erten, S., Bebek, G., Ewing, R. & Koyuturk, M. DADA: Degree-aware algorithms for network-based disease gene prioritization. BioData Mining 4, 19 (2011).

24. De Ferrari, G. V. & Moon, R. T. The ups and downs of Wnt signaling in prevalent neurological disorders. Oncogene 25, 7545–7553 (2006).

25. Bedogni, F. et al. Tbr1 regulates regional and laminar identity of postmitotic neurons in developing neocortex. Proc. Natl Acad. Sci. USA 107, 13129–13134 (2010).

26. Turner, E. H., Lee, C., Ng, S. B., Nickerson, D. A. & Shendure, J. Massively parallel exon capture and library-free resequencing across 16 genomes. Nature Methods 6, 315–316 (2009).

27. Voineagu, I. et al. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474, 380–384 (2011).

28. Sakai, Y. et al. Protein interactome reveals converging molecular pathways among autism disorders. Sci. Transl. Med. 3, 86ra49 (2011).

29. Gilman, S. R. et al. Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron 70, 898–907 (2011).

30. Ille, F. & Sommer, L. Wnt signaling: multiple functions in neural development. Cell. Mol. Life Sci. 62, 1100–1108 (2005).

31. Tedeschi, A. & Di Giovanni, S. The non-apoptotic role of p53 in neuronal biology: enlightening the dark side of the moon. EMBO Rep. 10, 576–583 (2009).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements We would like to thank and recognize the following ongoing studies that produced and provided exome variant calls for comparison: NHLBI Lung Cohort Sequencing Project (HL 1029230), NHLBI WHI Sequencing Project (HL102924), NIEHS SNPs (HHSN273200800010C), NHLBI/NHGRI SeattleSeq (HL094976), and the Northwest Genomics Center (HL102926). We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, E. Hanson, D. Grice, A. Klin, R. Kochel, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). We also acknowledge M. State and the Simons Simplex Collection Genetics Consortium for providing Illumina genotyping data, and T. Lehner and the Autism Sequencing Consortium for providing an opportunity for pre-publication data exchange among the participating groups. We appreciate obtaining access to phenotypic data on SFARI Base. This work was supported by the Simons Foundation Autism Research Initiative (SFARI 137578 and 191889; E.E.E., J.S. and R.B.) and NIH HD065285 (E.E.E. and J.S.). E.B. is an Alfred P. Sloan Research Fellow. E.E.E. is an Investigator of the Howard Hughes Medical Institute.

Author Contributions E.E.E., J.S. and B.J.O. designed the study and drafted the manuscript. E.E.E. and J.S. supervised the study. R.B., B.R. and B.J.O. analysed the clinical information. R.B., L.V., S.G., E.K., N.K. and B.P.C. contributed to the manuscript. S.G., N.K., B.P.C., A.K., C.B., M.M. and L.V. generated and analysed CNV data. B.J.O. and L.V. performed MIP resequencing and mutation validations. I.B.S., E.H.T., B.J.O. and J.S. developed MIP protocol and analysis. B.V. and J.M.A. generated loci-specific mutation rate estimates. R.L. and E.B. performed PPI network analysis and simulations. E.K. performed DADA analysis. C.L. performed Illumina sequencing. J.D.S., I.B.S., E.H.T. and C.L. analysed sequence data. B.P.C. performed IPA analysis. B.J.O., E.K. and N.K. developed the de novo analysis pipelines and analysed sequence data. D.A.N., M.J.R., J.D.S. and E.H.T. supervised exome sequencing and primary analysis.

Author Information Access to the raw sequence reads can be found at the NCBI database of Genotypes and Phenotypes (dbGaP) and National Database for Autism Research under accession numbers phs000482.v1.p1 and NDARCOL0001878, respectively. Reprints and permissions information is available at www.nature.com/reprints. The authors declare competing financial interests: details accompany the full-text HTML version of the paper at www.nature.com/nature. Readers are welcome to comment on the online version of this article at www.nature.com/nature. Correspondence and requests for materials should be addressed to E.E.E. ([email protected]) or J.S. ([email protected]).


METHODS

Exome capture, alignments and base-calling. Exomes for the 189 trios (and 31 unaffected siblings) were captured with NimbleGen EZ Exome V2.0. Final libraries were then sequenced on either an Illumina GAIIx (paired- or single-end 76-bp reads) or HiSeq2000 (paired- or single-end 50-bp reads). Reads were mapped to a custom GRCh37/hg19 build using BWA 0.5.6 (ref. 32). Read qualities were recalibrated using GATK Table Recalibration 1.0.2905 (ref. 33). Picard-tools 1.14 was used to flag duplicate reads (http://picard.sourceforge.net/). GATK IndelRealigner 1.0.2905 was used to realign reads around insertion/deletion (indel) sites. Genotypes were generated with GATK Unified Genotyper (ref. 33) with FILTER = "QUAL ≤ 50.0 || AB ≥ 0.75 || HRun > 3 || QD < 5.0" and in parallel with the SAMtools pipeline as described previously (ref. 4). Only positions with at least eightfold coverage were considered. All pilot sibling exomes were captured and analysed as described previously (ref. 4). Predicted de novo events were called and compared against a set of 946 other exomes to remove recurrent artefacts and likely undercalled sites. Indels were also called with the GATK Unified Genotyper and SAMtools and filtered to those with at least 25% of reads showing a variant at a minimum depth of 8×. Mutations were phased using molecular cloning of PCR fragments, read-pair information, linked informative SNPs, and obligate carrier status. To identify rare private variants (singleton), the full variant list was compared against a larger set of 1,779 other exomes. Predicted de novo indels were also filtered against this larger set.

Sanger validations. All reported de novo events (exome or MIP capture) were validated by designing primers with BatchPrimer3 followed by PCR amplification and Sanger sequencing. We performed PCR reactions using 10 ng of DNA from father, mother, unaffected sibling (when available), and proband and performed Sanger capillary sequencing of the PCR product using forward and reverse primers. In some cases, one direction could not be assessed due to the presence of repeat elements or indels in close proximity to the mutation event.

Mutation candidate gene analysis. We examined whether each non-synonymous or CNV de novo event may be contributing to the aetiology of ASD by evaluating the likely deleteriousness of the change (GERP, Grantham score) and intersecting with known syndromic and non-syndromic candidate genes, CNV morbidity maps, and information in OMIM and PubMed. Mutations were considered severe if they were truncating, missense with Grantham score ≥50 and GERP score ≥3 or only Grantham score ≥85, or deleted a highly conserved amino acid. For genes that had not previously been implicated in ASD, we gave priority to those with structural similarities to known candidates or strong evidence of neural function or development.

Exome read-depth CNV discovery. To find CNVs using exome read-depth data, we first mapped sequenced reads to the hg19 exome using the mrsFAST aligner (ref. 34). Next, we applied a novel method (N.K. et al., manuscript in preparation), which uses normalized RPKM values (ref. 35) of the ~194,000 captured exons/sequences, subsequent population normalization using 366 exomes from the Exome Sequencing Project and singular value decomposition to remove systematic bias present within exome capture reactions. Rare CNVs were detected using a threshold cutoff of the normalized RPKM values, and we required at least three exons above our threshold in order to make a call. We made a total of 1,077 deletion or duplication calls in 366 individuals (range 0–14, median = 3, mean = 2.94).

CNV detection using array CGH. A custom-targeted 2 × 400K Agilent chip with median probe spacing of 500 bp in the genomic hotspots flanked by segmental duplications or Alu repeats and probe spacing of 14 kb in the genomic backbone was designed. All experiments were performed according to the manufacturer’s instructions using NA12878 as the female reference and NA18507 as the male reference (Coriell). Data analysis was performed following feature extraction using DNA analytics with ADM-2 setting. All CNV calls were visually inspected in the UCSC Genome Browser. CNV calls from probands were then intersected with those from parents and also with 377 controls recruited through the NIMH Genetics Initiative (refs 36, 37) and ClinSeq cohort (ref. 38) analysed on the same microarray platform. The NIMH set of controls were ascertained by the NIMH Genetics Initiative (ref. 36) through an online self-report based on the Composite International Diagnostic Instrument Short-Form (CIDI-SF) (ref. 37). Those who did not meet DSM-IV criteria for major depression, denied a history of bipolar disorder or psychosis, and reported exclusively European origins were included (refs 39, 40). Samples from the ClinSeq cohort were selected from a population representing a spectrum of atherosclerotic heart disease (ref. 38). De novo and inherited potential pathogenic CNVs were selected only if they intersected with RefSeq coding sequence and allowing for a frequency of <1% in the controls and <50% segmental duplication content.

Illumina array CNV calling. CNV calling was performed in hg18 as described previously (ref. 41), using an HMM that incorporates both allele frequencies (BAF) and total intensity values (logR). In total, we generated CNV calls for 841 probands, 1,651 parents and 793 siblings including the samples reported recently (ref. 2). Of the 122 families selected for CNV comparisons in this study, calls were generated for 107 probands. Of these, both parents were profiled for 101 families and one parent was profiled for the remaining six families. In addition, at least one sibling was profiled for 99 of these families.

Independent of array CGH detection, to identify putatively pathogenic CNVs, we first compared our data to 2,090 control samples derived from the Wellcome Trust Case Control Consortium (WTCCC) National Blood Services Cohort (refs 19, 42) and filtered all CNVs present in 1% (20) of WTCCC2 controls or 1% (16) of parents by 50% reciprocal overlap with matching copy number status. In addition, similar to the filtering criteria used for array CGH detection, we selected only CNVs that contained less than 50% segmental duplication and intersected with RefSeq coding sequence. To select putative de novo CNVs, we further required the CNV not to be present in family-matched parents and siblings. Additionally, we filtered CNVs present in >0.1% (2) of the full 1,651 parent set. To select potential, rare inherited events, we required the CNV be detected in a matched parent or sibling. Finally, we filtered the genes inside each CNV under the same criteria (to account for smaller or larger CNPs) and removed CNVs with no remaining genes.

CNV cross validation. High-confidence, cross-validated de novo and inherited CNVs were selected by identifying events detected by at least two of three methodologies. To account for the variable breakpoint definitions in array CGH, SNP arrays, and exome copy number profiles, we aligned the CNVs by at least one overlapping gene ID and reported each CNV region by its maximal outer boundaries. This identified six de novo and 70 rare inherited events for further study (Supplementary Table 6).

Ingenuity pathway analysis. Ingenuity pathway analysis (IPA) was performed to identify potential functional enrichments within both our PPI (49 genes) and overall set of 126 genes. RefSeq reference gene list was used as a background list for all analysis.
To confirm our results pertaining to CTNNB1 upstream enrichment, we simulated 10,000 random populations of 209 individuals using Poisson priors for each gene based on their estimated mutation rates (see below), with a global correction factor resulting in selecting a mean of 126 genes per population. We then used this simulation data to calculate the probability of observing eight direct upstream interactors of CTNNB1 and determined that our data set is enriched for these genes with P = 0.0030.

Estimating locus-specific mutation rates. Human–chimpanzee alignments were downloaded from the UCSC Genome Browser (reference versions GRCh37 and panTro2, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsPanTro2/syntenicNet/). The more conservative syntenicNet alignments were used (details in http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsPanTro2/README.txt). Gene definitions were downloaded from the UCSC Table Browser, from the RefSeq Genes track, and the refFlat table. Exons were extended by 2 bp, and overlapping exons were merged using BEDTools. Non-exonic sequence was not considered. For each gene, we extracted: (1) d = the number of differences between chimpanzee and human; and (2) n = the number of bases aligned. We assumed a divergence time between human and chimpanzee of 12 million years (Myr) and an average generation time of 25 years. We then calculated gene-specific mutation rates per site per generation: r = (d/n)/(12 Myr/(25 years/generation)). We calculated the probability of observing X+ events using the Poisson distribution defined by the number of chromosomes screened and the size of the coding region, including actual splice bases.

Network simulation and null model estimation. To generate a null distribution of gene mutations, de novo mutation rates were estimated from human–chimp mutation rates. A pseudocount of 2.08333 × 10⁻⁶ (the smallest calculated in the gene set) was applied to any exon with a mutation rate of zero.
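As a hedged illustration of the gene-specific rate and Poisson calculation described under "Estimating locus-specific mutation rates", the following Python sketch computes r from d and n and then the probability of seeing at least x events. The function names are hypothetical, and treating the Poisson mean as the simple product of rate, coding bases and chromosomes screened is an assumption made here; the pseudocount step is not modelled.

```python
import math

DIVERGENCE_YEARS = 12e6   # assumed human-chimpanzee divergence time (from the text)
GENERATION_YEARS = 25     # assumed average generation time (from the text)


def per_generation_rate(d: int, n: int) -> float:
    """Per-site, per-generation rate: r = (d/n) / (12 Myr / 25 years per generation)."""
    generations = DIVERGENCE_YEARS / GENERATION_YEARS  # 480,000 generations
    return (d / n) / generations


def prob_at_least(x: int, rate: float, coding_bases: int, chromosomes: int) -> float:
    """P(X >= x) for a Poisson count of de novo events in one gene.

    Assumption: the Poisson mean is rate * coding_bases * chromosomes.
    """
    lam = rate * coding_bases * chromosomes
    # Sum the Poisson probabilities for 0 .. x-1 events, then take the complement.
    below = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(x))
    return 1.0 - below
```

For example, `prob_at_least(2, per_generation_rate(d, n), coding_bases, 2 * n_probands)` would give the chance of two or more events in a gene, given values of `d`, `n`, `coding_bases` and `n_probands` supplied by the reader.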
To create null gene sets, genes were drawn uniformly from this background distribution. Human protein–protein interaction data were collected from GeneMANIA (ref. 22) on 29 August 2011. Only direct physical interactions from the Homo sapiens database were considered. The list comprises approximately 1.5 million physical interactions, gathered from 150 studies. A protein interaction network was created from each experimental and null gene set by drawing edges between genes with physical interactions reported in the GeneMANIA database. Qualitatively similar results were achieved by including only interactions supported by multiple independent data sources. For each network, clustering coefficient, centralization, average shortest path length, density, and heterogeneity were determined using Cytoscape (ref. 43) and Network Analyzer (ref. 44). Duplicate and self-interactions were not considered in calculating network statistics.

Disease gene prioritization based on PPI networks. We applied degree-aware algorithms to rank a set of candidate genes with respect to a set of products of genes associated with ASD using human PPI networks. We used the integrated human PPI network data collected from GeneMANIA (ref. 22) on 29 August 2011. The PPI network contains 12,007 proteins with ~1.5 million direct physical interactions associated with a reliability score. We obtained the seed proteins for ASD from the list of ref. 17. For the candidate set we used 126 gene products from the severe


disruptive de novo events from the pilot autism project (ref. 4) and the current study. Given the GeneMANIA PPI network and Betancur seed gene product list, we used DADA (ref. 23) for ranking the candidate genes. We emphasize that this ranking is not implying causality but rather relatedness to genes previously and independently associated with ASD. For testing the significance of this ranking, we rank all the gene products except the seed set using the same algorithm. On the basis of the ranking result, we applied a Mann–Whitney U rank sum test (one-tailed) on the candidate set compared to all the other genes.

MIP protocol. Each of 1,703 autism probands from the SSC collection and 744 controls from the NIMH collection was subjected to MIP-based multiplex capture of the six genes: SCN1A, GRIN2B, GRIN2A, LAMC3, FOXP1 and FOXP2. For each library, 50 ng of DNA was used. Individually synthesized 70 mer MIPs (n = 355) were pooled and 5′ phosphorylated with T4 PNK (NEB). Hybridization with MIPs, gap filling and ligation were performed in one step for 45–48 h at 60 °C, followed by an exonuclease treatment of 30 min at 37 °C, similar to ref. 45, with modifications for reduced MIP number (B.J.O. et al., manuscript in preparation). Amplification of the library was performed by PCR using different barcoded primers for each library. Then barcoded libraries were pooled, purified using Agencourt AMPure XP and one lane of 101-bp paired-end reads was generated for each mega-pool (~384) on an Illumina HiSeq 2000 according to manufacturer’s instructions. Raw reads were mapped to the genome as in ref. 4. MIP targeting arms were then removed and variants called using SAMtools (ref. 4). A 25-fold coverage, with AB allele ratio <0.7, and quality 30 threshold was used for high-confidence variant calling. Private (possible de novo) variants were identified by filtering against 1,779 other exomes. The parents of children with disruptive rare variants were then captured. Variants not seen or with low coverage in the parents were validated by Sanger capillary-based fluorescent sequencing. No truncating variants of GRIN2B were observed in the MIP-sequenced controls or the Exome Variant Server ESP2500 release (NHLBI Exome Sequencing Project (ESP), Seattle, Washington, http://evs.gs.washington.edu/EVS/).

Estimating the number of autism loci. The gene-level specificity of exome sequencing enables the estimation of the number of recurrently mutated genes implicated in the genetic aetiology of sporadic ASD. This question can be reformulated as the ‘unseen species problem’ (see ref. 46 for review and ref. 2 for application to de novo CNVs discovered in autism), where genes with severe de novo events in probands are considered ‘observed species’, and binned by their frequency of appearance (that is, singletons, doubletons, etc.). We estimated the total number of genes implicated in autism (the total number of species) using several different estimators (implemented in the R package SPECIES, http://www.jstatsoft.org/), as well as the formula provided in ref. 2. This estimate depends on the number of singletons and twin pairs of genes observed in probands, as well as the fraction of de novo events believed to be pathogenic for autism, that is, single, disruptive events that can cause autism on their own. We assumed that both of our recurrent severe de novo events (affecting CHD8 and NTNG1) were pathogenic; these compose the entire set of twin pairs. The number of singletons is based on the estimated a priori fraction of the observed events that are pathogenic for autism. Across this sliding scale, the estimated number of loci is plotted in Supplementary Fig. 13. For example, using the estimator from ref. 47, if 20–50% of our de novo severe events are considered pathogenic, exome sequencing of a large number of additional samples would reveal between 182 and 992 pathogenic genes harbouring coding de novo point mutations (Supplementary Fig. 13); if all the observed severe de novo events in our experiment are included as pathogenic singletons, the number of implicated loci increases to more than 3,000.
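To make the "unseen species" framing concrete, here is a minimal Python sketch using the classic Chao1 lower-bound estimator, which depends only on the counts of singletons and doubletons. Chao1 is a simpler stand-in chosen for illustration; it is not the coverage-based estimator of ref. 47 nor the SPECIES package implementation, so it is not expected to reproduce the figures quoted above. The function names and the example singleton count are hypothetical.

```python
def chao1(f1: float, f2: float) -> float:
    """Chao1 estimate of total species from singleton (f1) and doubleton (f2) counts.

    Assumes every observed species is either a singleton or a doubleton,
    as in the setting described in the text.
    """
    s_obs = f1 + f2
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    # Bias-corrected form used when no doubletons are observed.
    return s_obs + f1 * (f1 - 1.0) / 2.0


def estimated_loci(n_singleton_genes: int, n_doubleton_genes: int,
                   pathogenic_fraction: float) -> float:
    """Scale the singleton count by the assumed pathogenic fraction, then apply Chao1."""
    f1 = pathogenic_fraction * n_singleton_genes
    return chao1(f1, n_doubleton_genes)


if __name__ == "__main__":
    # Hypothetical singleton count, for illustration only; the two doubletons
    # correspond to the recurrently hit genes CHD8 and NTNG1 named in the text.
    n_singletons = 124
    for fraction in (0.2, 0.5, 1.0):
        print(fraction, round(estimated_loci(n_singletons, 2, fraction)))
```

Because the estimate grows with the square of the singleton count, the assumed pathogenic fraction dominates the result, which is why the text reports a range over a sliding scale rather than a single number.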

32. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

33. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 43 (2011).

34. Hach, F. et al. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nature Methods 7, 576–577 (2010).

35. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5, 621–628 (2008).

36. Moldin, S. O. NIMH Human Genetics Initiative: 2003 update. Am. J. Psychiatry 160, 621–622 (2003).

37. Kessler, R. C. & Ustun, T. B. The World Mental Health (WMH) survey initiative version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). Int. J. Methods Psychiatr. Res. 13, 93–121 (2004).

38. Biesecker, L. G. et al. The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine. Genome Res. 19, 1665–1674 (2009).

39. Talati, A., Fyer, A. J. & Weissman, M. M. A comparison between screened NIMH and clinically interviewed control samples on neuroticism and extraversion. Mol. Psychiatry 13, 122–130 (2008).

40. Baum, A. E. et al. A genome-wide association study implicates diacylglycerol kinase eta (DGKH) and several other genes in the etiology of bipolar disorder. Mol. Psychiatry 13, 197–207 (2008).

41. Itsara, A. et al. Population analysis of large copy number variants and hotspots of human genetic disease. Am. J. Hum. Genet. 84, 148–161 (2009).

42. Craddock, N. et al. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464, 713–720 (2010).

43. Smoot, M. E., Ono, K., Ruscheinski, J., Wang, P. L. & Ideker, T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431–432 (2011).

44. Assenov, Y., Ramirez, F., Schelhorn, S. E., Lengauer, T. & Albrecht, M. Computing topological parameters of biological networks. Bioinformatics 24, 282–284 (2008).

45. Mamanova, L. et al. Target-enrichment strategies for next-generation sequencing. Nature Methods 7, 111–118 (2010).

46. Bunge, J. & Fitzpatrick, M. Estimating the number of species: a review. J. Am. Stat. Assoc. 88, 364–373 (1993).

47. Chao, A. & Lee, S. M. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87, 210–217 (1992).


W W W. N A T U R E . C O M / N A T U R E | 1

SUPPLEMENTARY INFORMATION doi:10.1038/nature10989

Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations Brian J. O’Roak, Laura Vives, Santhosh Girirajan, Emre Karakoc, Nik Krumm, Bradley P. Coe, Roie Levy, Arthur Ko, Choli Lee, Joshua D. Smith, Emily H. Turner, Ian B. Stanaway, Benjamin Vernot, Maika Malig, Carl Baker, Beau Reilly, Joshua M. Akey, Elhanan Borenstein, Mark J. Rieder, Deborah A. Nickerson, Raphael Bernier, Jay Shendure, and Evan E. Eichler

Supplementary Table of Contents

Supplementary Discussion

Supplementary Figure 1. Distribution of nonverbal intelligence quotient (NVIQ) of the SSC189 sample based on different mutation groupings.
Supplementary Figure 2. Browser views showing complex de novo mutation events.
Supplementary Figure 3. Confirmed event showing weak allele signature.
Supplementary Figure 4. Observed number of mutation events fits the expected Poisson distribution.
Supplementary Figure 5. Multivariate analysis to examine effect of number of “extreme” de novo coding mutations and the presence of a CNV.
Supplementary Figure 6. Ratio of samples with various mutation types binned by NVIQ.
Supplementary Figure 7. Distribution of locus specific mutation rates based on human-chimp comparisons.
Supplementary Figure 8. DYRK1A falls in a Down Syndrome critical region disrupted by CNVs.
Supplementary Figure 9. Histograms of Network statistics for 10,000 simulated null networks.
Supplementary Figure 10. Protein-protein interaction network based prioritization of 126 gene products with severe mutations.
Supplementary Figure 11. Top interaction network from IPA analysis of 126 genes with severe mutations.
Supplementary Figure 12. De novo mutations in 12752.p1.
Supplementary Figure 13. Estimating the number of genes contributing to sporadic autism pathogenicity via recurrent de novo mutation.

Supplementary Table 1. Summary of sequenced families, including sex, parental age, NVIQ, CNV pre-screening, trio bases screened, and point mutations.
Supplementary Table 2. Summary of Exome Sequencing Results from 209 ASD Families
Supplementary Table 3. All 242 de novo point mutations found in 189 trios.
Supplementary Table 4. Comparison of mutation rates between O’Roak et al. and Sanders et al.
Supplementary Table 5. De novo events identified in 50 unaffected siblings and 20 pilot probands.
Supplementary Table 6. 70 rare inherited and 6 de novo CNVs identified in 122 trios.
Supplementary Table 7. Expanded top de novo ASD risk contributing mutations*
Supplementary Table 8. Mutation rates and probability of recurrence for genes with >1 mutation.
Supplementary Table 9. Selected inherited hemizygous and compound heterozygous sites.
Supplementary Table 10. List of the 21 severe de novo mutations that map to regions of recurrent CNV associated with Developmental Delay and ASD.
Supplementary Table 11. Other mutations intersecting previous CNV loci and animal models for ASD.
Supplementary Table 12. List of the 126 genes/proteins with severe mutations used for the PPI, along with summary statistics.
Supplementary Table 13. Top IPA function for the PPI connected component.

Supplementary References

Supplementary Discussion

Sample Overlap

Three of the previously reported families (12325, 12680, and 12647) are the only samples known

to overlap with other studies (Sanders et al. 2012).

Rate of De Novo CNVs

We expected the de novo CNV rate for this cohort would be less than for other ASD cohorts as

77% (94/122) had previously been screened negative for large, disruptive de novo events.

Nonetheless, our observed rate of de novo CNVs (6/122, ~5%) is in line with other recent

estimates for ASD (refs 1, 2), owing possibly to the increased resolution of detecting gene disruptions

with exome sequencing.

Effect of Multiple Genetic Lesions on Intellectual Functioning

We performed a multivariate analysis to examine the effect of the number of “extreme” de novo coding

mutations (0, 1, or 2 or more) and the presence of either de novo or rare inherited copy number

variation (122/189 probands) on nonverbal IQ (NVIQ) and verbal IQ (VIQ) (Supplementary

Fig. 5). Extreme mutations (n = 62) were defined as de novo protein truncating, intersections

with known OMIM and ASD candidate genes, and CNVs predicted to be gene breaking and

pathogenic. In the sample of 122 individuals for whom CNV analysis had been completed, we

observed a significant decrease in NVIQ with increased numbers of events (F(2,116) = 5.45,

p < 0.01, partial η² = 0.09), but not in VIQ (F(2,116) = 1.13, p = ns, partial η² = 0.02). This result

in NVIQ was strengthened, but not exclusively driven, by the presence of CNVs (F(2,116) =

0.97, p = ns, partial η² = 0.02); there was no main effect of strictly having a CNV on cognitive

ability (F(2,116) = 0.71, p = ns, partial η² = 0.006). Post hoc analyses indicated individuals with

one and two or more events scored significantly lower in NVIQ than individuals with no events

(mean difference = 18.0 points, p < 0.05, Cohen’s d = 0.63; 38.5 points, p < 0.01, d = 1.69;

respectively). The significant difference in NVIQ between individuals with no de novo coding

mutations and those individuals with two or more mutations was also observed with the complete

sample of 189 individuals (F(2,186) = 6.129, p < 0.01, partial η² = 0.06) (Fig. 1c).
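
The effect sizes quoted above can be checked from the F statistics alone. The following is a minimal sketch, assuming the standard fixed-effects identity partial η² = F·df_effect / (F·df_effect + df_error); the function name and the labels are illustrative and do not come from the original analysis, whose software is not stated here.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# (label, F, numerator df, denominator df) as quoted in the text above.
reported = [
    ("NVIQ, 122 probands", 5.45, 2, 116),
    ("VIQ, 122 probands", 1.13, 2, 116),
    ("NVIQ, 189 probands", 6.129, 2, 186),
]

for label, f_stat, df_effect, df_error in reported:
    value = partial_eta_squared(f_stat, df_effect, df_error)
    print(f"{label}: partial eta^2 = {value:.2f}")
```

These three values round to the reported 0.09, 0.02 and 0.06. As an arithmetic observation only, the 0.006 reported for the CNV main effect is what this formula gives with one numerator degree of freedom (F = 0.71, df = 1, 116) rather than two.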

IPA Analysis

Within our 49 PPI network members, IPA detected the most significant functional enrichment in

Gene Expression (B-H p-value 9.45E-03 to 8.57E-02), Behavior (B-H p-value 9.45E-03 to 8.57E-02),

Organismal Development (B-H p-value 9.45E-03 to 7.74E-02), Embryonic Development (B-H

p-value 9.45E-03 to 8.01E-02), and Nervous System Development and Function (B-H p-value

9.45E-03 to 8.91E-02) (Supplementary Table 13).

We then performed an additional IPA analysis on the 126 genes identified in 209 samples.

The top interconnected network consists of 22 genes (15 of which are PPI members), of which

CTNNB1 is a central node (Supplementary Fig. 11). To further investigate the potential role of

CTNNB1 interactors in autism, we selected all direct upstream interacting genes from beta-

Catenin in IPA and noted that 8/358 (p = 0.0030) were present in our mutation list. Furthermore,

we note that CTNNB1 is directly linked to multiple highly interconnected genes in the PPI

network (MYBBP1A, PBRM1, RUVBL1, TBL1XR1, and CHD8), suggesting that additional

mutated genes involved in CTNNB1 function are represented in autism. This enrichment for

CTNNB1 interactors further supports the hypothesis that the WNT/beta-catenin pathway may

play a role in the etiology of autism (ref. 3).

Phenotyping Summaries for Selected Families

Family 13844. Proband is second of three children with an older sister (13844.s1) and younger

brother (13844.s2).

Patient ID: 13844.fa

Summary: Father is an adult non-Hispanic white male. Age at conception of proband is 40.

Normative range of social responsiveness, but elevated score for rigidity on broader autism

phenotype. Some signs of alcoholism (use, attempting to cut down, annoyed by criticism about

drinking, feeling bad about drinking, eye opening experience). No current or past medication use

endorsed. Some college education. Annual household income = $101–130K. Father has

head circumference of 58.5 cm (z = 1.57) and normative BMI. No comorbid diagnoses endorsed.

Patient ID: 13844.mo

Summary: Mother is an adult non-Hispanic white female. Age at conception of proband is 35.

Normative range of social responsiveness. No evidence of broader autism phenotype. Antibiotics

taken during second trimester of pregnancy with proband. Currently taking thyroid medication

and antidepressant (not taken during pregnancy). Endorsement of current tobacco use and past

marijuana use. Some college education. Annual household income = $101–130K. Mother has

head circumference of 54 cm (z = -.41) and normative BMI. No comorbid diagnoses endorsed.

Patient ID: 13844.s1

Summary: Sibling is a non-Hispanic white 10-year-old female. Normative adaptive scores and

social responsiveness from parent and teacher noted. Behavioral elevations for somatic problems

and complaints. Mother was prescribed an unspecified hormone treatment to aid with growth in

past (not currently taking). No other endorsement of medication use. Head circumference of 54

cm (z = 0.96) and normative BMI. No comorbid diagnoses endorsed. Cognitive decline

following Epstein-Barr virus reported by parents.

Patient ID: 13844.s2

Summary: Sibling is a non-Hispanic white 5-year-old male. Adaptive scores not available.

Normative social responsiveness from parent and teacher. No behavioral elevations across any

domain. No endorsement of medication use. Head circumference of 52 cm (z = 0.15) and BMI

suggestive of being underweight. No comorbid diagnoses endorsed.

Patient ID: 13844.p1

Event: de novo CHD8 truncating, de novo CUBN truncating, 2X inherited CNV

Summary: Patient is a 99-month-old non-Hispanic white male diagnosed with autism. Extremely

low VIQ (20), NVIQ (34), and adaptive (59) scores. Clinical range deficits in social

responsiveness (120). Possible loss of language skills during development and elevated social

withdrawal behaviors with no comorbid diagnoses. Large head (z = 2.62) and normal BMI. Food

allergies (gluten and casein). Gastrointestinal constipation diagnosis with bloating and abdominal

pain. Roseola diagnosed at 2.5 years and Epstein-Barr virus contracted at 8 years. Respiratory

problems diagnosed at 11 months and kidney problems diagnosed at 9 months. No diagnosis of

cardiac or metabolic syndromes noted. No report of congenital anomalies. Family history of

Down syndrome (maternal cousin). NICU admission shortly after birth with oxygen treatment.

Meconium aspiration at birth. Family history among several members for migraines. Currently

on GFCF diet. Took asthma medication in the past but not currently.

Family 12752. Proband is an only child.

Patient ID: 12752.fa

Summary: Patient is an adult non-Hispanic white male. Age at conception of proband is 38.

Normative range social responsiveness. Elevated score for aloofness and pragmatic social skills.

Diagnosis of diabetes. Current tobacco and alcohol use endorsed. Current and past use of

antihypertensive meds and medication for high cholesterol. Past use of sedatives and pain killers.

Some college education. Annual household income = $36–50K. Father has a head circumference

of 59.5 cm (z = 1.56). BMI information unavailable.

Patient ID: 12752.ma

Summary: Patient is an adult non-Hispanic white female. Age at conception of proband is 36.

Normative range of social responsiveness. No presence of broader autism phenotype. No

endorsement of medications currently or during pregnancy with proband. Current tobacco and

alcohol use endorsed. Some college education. Annual household income = $36–50K. Mother

has head circumference of 54 cm (z = -.41). BMI information unavailable. Mother has been

diagnosed with heart disease.

Patient ID: 12752.p1

Event: de novo CHD8 truncating, de novo ETFB truncating, de novo IQGAP2 truncating

Summary: Patient is a 55-month-old non-Hispanic white female diagnosed with autism.

Normative range VIQ (90) and NVIQ (93) with low adaptive behavior skills (59). Clinical range

deficits in social responsiveness (90). Clinical elevations in attention problems, internalizing

problems, and affective problems with no comorbid diagnoses. Large head (z = 2.40) and BMI

indications of being underweight. No loss or regression of language skills. Diagnosis of chronic

constipation, ongoing from 3.5 months with intermittent episodes of abnormal stool.

Coordination problems noted since 3.5 months. No cardiac or metabolic syndromes noted. No

report of congenital anomalies. Hyperbilirubinemia diagnosis with phototherapy shortly after

birth, no complications after treatment.

Patient ID: 11660.p1

Event: de novo NTNG1 missense, inherited CNV

Summary: Patient is a 60-month-old non-Hispanic white female diagnosed with autism. Low

range VIQ (63) and NVIQ (60) with low adaptive skills (65). Clinical range deficits in social

responsiveness (90). No language loss or regression noted. Clinical elevations in withdrawn

behaviors, attention difficulties, and affective problems with no comorbid diagnoses. Large head

(z = 2.5) with BMI indications of being underweight. Improvement in repetitive behaviors during

fever symptoms noted by parents. No cardiac, metabolic, or autoimmune syndromes noted.

Dysmorphology assessment indicating nondysmorphic features. No noted congenital anomalies.

Patient ID: 12532.p1

Event: de novo NTNG1 missense, de novo NAA40 missense

Summary: Patient is a 141-month-old non-Hispanic white male diagnosed with autism. Very

high VIQ (135) and normative range NVIQ (110) with low adaptive behavior composite scores

(71). Clinical range elevations in social responsiveness (74) with word loss regression occurring

early in development. Borderline and clinical range problems with attention, internalizing, and

affective problem behaviors with no comorbid diagnoses. Normal head circumference with BMI

suggesting underweight. Penicillin allergy noted beginning at 5 years of age. Diagnosis of

chronic otitis media at approximately age 6. No cardiac, metabolic, or autoimmune syndromes

noted. No report of congenital anomalies.

Patient ID: 13733.p1

Event: de novo CHD7 missense, inherited CNV

Summary: Patient is a 160-month-old non-Hispanic white female diagnosed with autism.

Normative VIQ scores (90) with very low NVIQ scores (68) and adaptive scores (69). No

regression or loss of language. Borderline range anxiety scores with no comorbid mental

health diagnoses. Normal head circumference and BMI. Vision problems with correction. No

hearing deficits. Diagnosis of Tourette’s Syndrome at six years. Myringotomy procedure for

recurrent problems with otitis media. Diagnosis of respiratory problems but no diagnosis of

cardiac or metabolic syndromes. No report of congenital anomalies.

Patient ID: 11390.p1

Event: de novo PTEN missense

Summary: Patient is a 99-month-old non-Hispanic white female diagnosed with autism. Very

low VIQ (57) and low NVIQ (77) with low average adaptive behavior skill scores (79). Clinical

range elevations in social responsiveness (90). Language regression and word loss noted in early

development as well as occurrence of nonfebrile seizures. Borderline and clinical range problems

with social withdrawal, attention, and affective problematic behaviors with no comorbid

diagnoses. Large head (z = 2.84) with normal range BMI scores. Chronic unusual stools noted

from 6 months of age. Chronic otitis media diagnoses at 2.5 years of age with noted

improvements in repetitive behaviors during periods of fever. No cardiac, metabolic, or

autoimmune syndromes noted. Dysmorphology assessment indicating nondysmorphic features.

No report of congenital anomalies. Sleep difficulties noted for falling asleep with night time

incontinence. Normal menstrual cycle and pubertal changes taking place. Mood stabilizer

medication used in the past but not current. Special education services in school 100% of time

since age 3 and continuing to current. Occupational therapy services 1 hour per week year round

beginning at age 3 continuing to current.

Patient ID: 12346.p1

Event: de novo MBD5 truncating, de novo MYBBP1A missense, de novo PBRM1 missense

Summary: Patient is a 833-month-old non-Hispanic white male diagnosed with autism.

Normative VIQ (106) with low NVIQ (77) and adaptive behavior skills (64). Clinical range

elevations in social responsiveness (90). No language loss or regression noted in early

development. Clinical elevations in withdrawn behaviors and no comorbid diagnoses. Normative

head circumference and BMI. Chronic constipation diagnosed by PCP at 3 years of age.

Coordination problems diagnosed by PCP at 16 months. Suspected cerebral palsy (unsure) noted

at 1 year of age by orthopedist. Grand mal seizure occurrences beginning at 1 month and

occurring approximately once per month in frequency. Chicken pox contraction at four years.

Chronic Otitis Media and intermittent strep throat occurrences. No report of congenital

anomalies.

Patient ID: 13890.p1

Event: de novo DYRK1A truncating

Summary: Patient is a 164-month-old non-Hispanic white female diagnosed with autism. Very

low VIQ (26) and NVIQ (42) and adaptive behavior skills (41). Clinical range elevations in

social responsiveness (82). No language loss or regression noted in development. No clinical

elevations in behavior ratings from parents or teacher and no comorbid diagnoses. Small head (z

= -1.64) and BMI suggestive of being overweight. Vision difficulties with correction. Pollen

allergies. Intermittent constipation (undiagnosed). Chronic otitis media diagnosed by PCP at age

5. Surgery to release tendons and ligaments in right foot. Dysmorphology assessment indicates

nonspecific dysmorphic features with microcephaly but no evidence of known syndrome.

Abnormal hair growth, ear structure, nose size, face size, philtrum, mouth, lips, fingers,

fingernails, and feet noted upon exam.

Patient ID: 12933.p1

Event: de novo SETBP1 truncating, de novo MYO7B missense, de novo OR10Z1 missense

Summary: Patient is a 120-month-old non-Hispanic white male diagnosed with autism. Very low

VIQ (44) and NVIQ (41) and low adaptive behavior skills (68). Clinical range elevations in

social responsiveness (85). No language loss or regression noted in development. Borderline and

clinical elevations in anxious/depressed, attention deficit, aggression, internalizing, affective

problems, oppositional problems, and externalizing behavior with no comorbid diagnoses.

Normative head circumference and BMI suggestive of being underweight. Vision difficulties

with correction. Food allergies diagnosed by PCP at 4 months. Intermittent problems with

vomiting diagnosed at 5 years old. Chronic acid reflux diagnosed at 7 years of age. Excessive

clumsiness and coordination problems suspected beginning at age 5. Surgery at 1.1 years for

undescended testicle. Adenoids removed at 7 years. Dysmorphology assessment indicates

nonspecific dysmorphic features without microcephaly. Abnormal ear structure, nose size, face

size, teeth, hands, fingers, thumbs, and fingernails noted upon exam.

Patient ID: 11834.p1

Event: inherited 16p12 duplication

Summary: Patient is a 126-month-old non-Hispanic white male diagnosed with autism. Very low

VIQ (43) and normative NVIQ (93) with very low adaptive behavior skills (57). Elevations in

social responsiveness (75) with no word loss or regression in early development. Borderline

elevations in anxiety problems with no comorbid diagnoses. Large head (z = 2.29) and normative

BMI scores. Diagnosed with Tourette/tics at age 7. Diagnosed with roseola at age 1. No report of

congenital anomalies.

[Figure: six histograms of NVIQ (x-axis, 20 to 140) against Frequency (y-axis, 0 to 25). Panels: NVIQ SSC189; NVIQ Samples w/ any non-syn; NVIQ Samples w/ severe non-syn; NVIQ Samples w/ truncating; NVIQ Samples w/ "top candidate mut"; NVIQ Samples w/ "extreme mut". Per-panel summary statistics not legible.]

Supplementary Figure 1. Distribution of nonverbal intelligence quotient (NVIQ) of the

SSC189 sample based on different mutation groupings.

Histograms in each panel show the distribution of samples based on those having one or more

event fitting each mutational category. Initial distribution was approximately bimodal. Summary

statistics for each distribution are listed below. Red line indicates NVIQ of 70, the general

threshold of intellectual disability.

Supplementary Figure 2. Browser views showing complex de novo mutation events.

a, Proband reads show deletion of a G base and a G/T substitution (top); neither event is present

in the father (middle) or mother (bottom) tracks. b, Proband reads show an exonic G/C

substitution and an intronic T/C substitution (top); neither event is present in the father (middle) or

mother (bottom) tracks.

[Figure: a, read alignment tracks; b, Sanger traces labelled Fa-For, Fa-Rev, Mo-For, Mo-Rev, P1-For, P1-Rev]

Supplementary Figure 3. Confirmed event showing weak allele signature.

a, Proband reads show a G/A substitution at 24% frequency (top); the event is not present in the father

(middle) or mother (bottom) tracks. b, Sanger traces in both forward (For) and reverse (Rev)

directions confirm the event and show reduced signal, suggesting somatic mosaicism.

Supplementary Figure 4. Observed number of mutation events fits the expected Poisson

distribution.

We calculated the expected counts of mutations per person from the Poisson distribution based

on the observed rate of 1.28 mutations per trio. The observed number of events per proband

closely matches this distribution.
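
The expected counts described in this caption can be sketched as follows. This is a minimal illustration, assuming the 189 SSC189 probands and the 1.28 events-per-trio rate quoted above; the function name and the choice to tabulate 0 to 5 events are illustrative and not taken from the original analysis.

```python
import math


def expected_poisson_counts(n_individuals, rate, max_events):
    """Expected number of individuals carrying k = 0..max_events events."""
    return [
        n_individuals * math.exp(-rate) * rate ** k / math.factorial(k)
        for k in range(max_events + 1)
    ]


N_PROBANDS = 189  # probands in the SSC189 set
RATE = 1.28       # observed de novo events per trio, from the caption

for k, expected in enumerate(expected_poisson_counts(N_PROBANDS, RATE, 5)):
    print(f"{k} events: {expected:5.1f} probands expected")
```

Comparing these expected values with the observed number of probands carrying each event count is what the figure displays.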

Supplementary Figure 5. Multivariate analysis to examine effect of number of “extreme”

de novo coding mutations and the presence of a CNV.

Extreme mutations (n = 62) were defined as de novo protein-truncating intersections with known

OMIM and ASD candidate genes and CNVs predicted to be gene breaking and pathogenic.

Boxplots show the samples with and without a CNV, either de novo or rare inherited

(Supplementary Discussion). We observed a significant decrease in NVIQ with increased

numbers of events (F(2,116) = 5.45, p < 0.01, partial η² = 0.09), but not in VIQ (F(2,116) = 1.13, p

= ns, partial η² = 0.02). This result in NVIQ was strengthened, but not exclusively driven, by the

presence of CNVs (F(2,116) = 0.97, p = ns, partial η² = 0.02); there was no main effect of

strictly having a CNV on cognitive ability (F(2,116) = 0.71, p = ns, partial η² = 0.006).

Supplementary Figure 6. Ratio of samples with various mutation types binned by NVIQ.

The proportion of individuals within various NVIQ bins with 1+ event across “disruptive”

classification schemes: any event, any nonsynonymous event, any severe nonsynonymous event,

and our “top candidate” list (Table 1). a, Grouped by IQ standard deviations. We observed a

nonsignificant trend by Fisher’s exact test (lower IQ, higher probability of 1+ de novo event)

for any event (p = 0.096), any nonsynonymous event (p = 0.064), and any severe

nonsynonymous event (p = 0.052), but conflicting results for top candidates (p = 0.19). b,

Grouped by high (>70) and low (<=70) NVIQ to increase power. The strongest p-value was 0.032

(severe nonsynonymous), again suggesting a nonsignificant trend. Error bars are 95% confidence

intervals.

Supplementary Figure 7. Distribution of locus-specific mutation rates based on

human-chimp comparisons.

Red triangles represent CHD8: 3.95E-09, NTNG1: 2.34E-09, LAMC3: 1.73E-08, SCN1A: 8.83E-09,

GRIN2B: 6.94E-09, FOXP2: 7.01E-09, FOXP1: 5.51E-09, and GRIN2A: 9.38E-09.

[Figure: chr21:37,650,000-37,900,000 (21q22.13) spanning DYRK1A, with CNV tracks for Cases and Controls]

Supplementary Figure 8. DYRK1A falls in a Down Syndrome critical region disrupted by

CNVs.

Each red (deletion) and green (duplication) line represents an identified CNV in cases (solid

lines) versus controls (dashed lines), with arrowheads showing point mutations.

Supplementary Figure 9. Histograms of Network statistics for 10,000 simulated null

networks.

Left: Number of edges in the network. Right: Clustering coefficient of the network. Red dotted

line indicates the corresponding value in the experimentally determined network.

[Figure: two histograms with y-axis Percent of Null Networks (0 to 4); x-axes: Number of Edges (0 to 250) and Clustering Coefficient (0 to 0.25)]

[Figure: x-axis Ranking of Gene Products (0 to 12,000); legend: Proband Unconnected, Proband Connected, Sibling Events]

Supplementary Figure 10. Protein-protein interaction network-based prioritization of 126

gene products with severe mutations.

X-axis represents the ranking of gene products from the GeneMANIA PPI database using

ASD-associated gene products from Betancur et al. as a seed. The red lines represent the gene

products (49) that are in the largest connected component of the PPI network generated using the

proband severe disruptive de novo events, gray lines represent the gene products not in the

connected component, and the blue lines represent the gene products with severe de novo

mutations in the unaffected siblings. The gene products of the connected component rank

significantly higher compared to all other gene products (Mann-Whitney U, one-sided p < 1.6E-08),

whereas the unconnected gene products do not (Mann-Whitney U, one-sided p < 0.28).

Similarly, the sibling events compared to all other gene products do not show significant

rankings (Mann-Whitney U, one-sided p < 0.26).

Supplementary Figure 11. Top interaction network from IPA analysis of 126 genes with

severe mutations.

Displayed is the highest scoring network from IPA analysis. Solid lines represent direct

interactions while dashed lines indicate indirect connections. Genes not found in the PPI

connected component are marked in yellow, while those in the PPI connected component are

marked in blue.

[Figure: network view for proband 12752.p1, titled "3 de novo Gene Disruptions"; labelled genes include CHD8, IQGAP2 and ETFB. Other node labels not legible.]

Supplementary Figure 12. De novo mutations in 12752.p1.

GeneMANIA (ref. 4) view of three genes with de novo truncating mutations (red labels), which encode

proteins that are part of a beta-catenin linked network. The proband is macrocephalic (z = 2.4) with

normal cognitive ability (VIQ = 90, NVIQ = 93) but has adaptive behavior deficits (Vineland

Standard Score = 59) with significant social impairments (Supplementary Discussion).

Supplementary Figure 13. Estimating the number of genes contributing to sporadic autism

pathogenicity via recurrent de novo mutation.

Left: Estimate based on 120 de novo mutations especially likely to be pathogenic (including both

observed doubletons) based on GERP and Grantham score. Right: Estimate based on all de novo

nonsynonymous events. Two different estimators for the “unseen species problem” were used.

[Figure: two panels plotting Estimated number of autism genes (y-axis, 0 to 5000) against Fraction of de novo SNPs considered pathogenic (x-axis, 0 to 1). Panel titles: "Number of genes implicated in autism, 120 de novo SNPs; 2 doublets observed" and "Number of genes implicated in autism, 181 de novo SNPs; 2 doublets observed". Annotated values: 992 genes and 182 genes. Legend (Estimator Used): Sanders (2010); Chao and Lee (1992).]

Supplementary Table 1. Summary of sequenced families, including sex, parental age,

NVIQ, CNV pre-screening, trio bases screened, and point mutations.

See attached Excel document

Supplementary Table 2. Summary of Exome Sequencing Results from 209 ASD Families

SNVs

| Sample | Type | Total | Average | Silent | Missense | Nonsense or Read-Through | Splice | Ti/Tv | % CpG Ti |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SSC189 Pro | all | 3,513,050 | 18,588 | 1,859,735 | 1,631,320 | 16,449 | 5,546 | 3.27 | 20.9 |
| SSC189 Pro | rare | 32,028 | 169 | 10,682 | 20,515 | 581 | 250 | 2.87 | 31.1 |
| SSC189 Pro | DN | 225 | 1.19 | 61 | 145 | 16 | 3 | 2.69 | 36.0 |
| SSC189 Sib (31)& | DN | 29 | 0.94 | 9 | 19 | 1 | 0 | 3.14 | 41.4 |
| Pilot Pro (20) | DN | 17 | 0.85 | 7 | 9 | 0 | 1 | 7.50 | 41.1 |
| Pilot Sib (19) | DN | 21 | 1.11 | 7 | 12 | 2 | 0 | 1.63 | 38.1 |

Indels

| Sample | Type | Total | Average | 3n Indels | Truncating Indels | Splice |
| --- | --- | --- | --- | --- | --- | --- |
| SSC189 Pro | all | 42,929 | 227 | 18,817 | 23,270 | 842 |
| SSC189 Pro | rare | 806 | 4.26 | 222 | 561 | 23 |
| SSC189 Pro | DN | 17 | 0.09 | 2 | 15 | 0 |
| SSC189 Sib (31) | DN | 0 | 0 | 0 | 0 | 0 |
| Pilot Pro (20) | DN | 1 | 0.05 | 0 | 1 | 0 |
| Pilot Sib (19) | DN | 0 | 0 | 0 | 0 | 0 |

Estimated Mutation Rates

| Sample | All Events | Screened Bases* | All Rate† | Sub Rate†‡ |
| --- | --- | --- | --- | --- |
| SSC189 Pro | 242 | 11,144,644,963 | 2.17 (1.90-2.46) | 1.94 (1.68-2.21) |
| SSC189 Sib (31) | 29 | 1,838,730,129 | 1.57 (1.05-2.26) | 1.52 (1.01-2.20) |
| Pilot Pro (20) | 18 | 937,498,416 | 1.92 (1.14-3.04) | 1.81 (1.05-2.90) |
| Pilot Sib (19) | 21 | 891,825,463 | 2.35 (1.45-3.59) | 2.24 (1.36-3.46) |

&Number of children. *Count of all bases screened in child (minimum 8X), i.e. concordant in each member of the trio. †Mutations/base/generation x10^-8 with 95% confidence intervals. ‡Substitutions only; excludes indels and possible somatic mosaics. Pro = proband, Sib = unaffected sibling, DN = de novo, SSC = Simons Simplex Collection.

Supplementary Table 3. All 242 de novo point mutations found in 189 trios.

See attached Excel document

Supplementary Table 4. Comparison of mutation rates between O’Roak et al. and Sanders

et al.

| Site/Type | # of Trios | Bases Screened | Total De novo | Substitutions | Indels | Mosaic events | Combined Rate and 95% CI (x10^-8) | Strict Substitution Rate and 95% CI (x10^-8) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UW-189 Probands | 189 | 11,144,644,963 | 242 | 216 | 17 | 9 | 2.17 (1.90-2.46) | 1.94 (1.68-2.21) |
| UW-31 Matched Probands | 31 | 1,830,218,983 | 42 | 39 | 1 | 2 | 2.29 (1.65-3.10) | 2.13 (1.51-2.91) |
| UW-31 Matched Siblings | 31 | 1,838,730,129 | 29 | 28 | 0 | 1 | 1.57 (1.05-2.26) | 1.52 (1.01-2.20) |
| UW-Pilot Probands | 20 | 937,498,416 | 18 | 17 | 1 | 0 | 1.92 (1.14-3.04) | 1.81 (1.05-2.90) |
| UW-Pilot Matched Probands | 19 | 890,665,738 | 18 | 17 | 1 | 0 | 2.02 (1.20-3.19) | 1.91 (1.11-3.05) |
| UW-Pilot Matched Sibs | 19 | 891,825,463 | 21 | 20 | 0 | 1 | 2.35 (1.45-3.59) | 2.24 (1.36-3.46) |
| UW-All Probands | 209 | 12,082,143,379 | 260 | 233 | 18 | 9 | 2.15 (1.90-2.43) | 1.93 (1.69-2.19) |
| Yale-Quartet Probands | 200 | 9,718,659,190 | 156 | 153 | 3 | 0 | 1.61 (1.34-1.87) | 1.57 (1.31-1.84) |
| Yale-Quartet Siblings | 200 | 9,718,659,190 | 124 | 124 | 0 | 0 | 1.28 (1.05-1.50) | 1.28 (1.05-1.50) |
| Yale-Trio Probands | 24 | 1,082,156,674 | 15 | 15 | 0 | 0 | 1.39 (0.67-2.10) | 1.39 (0.67-2.10) |
| Yale-All Probands | 225 | 10,800,815,864 | 171 | 168 | 3 | 0 | 1.58 (1.33-1.83) | 1.56 (1.31-1.80) |

UW: Variant calling threshold was a minimum of 8x in each member of a trio. Bases screened were calculated based on trio concordant positions at 8x (n) and converting to diploid bases (2n), adjusting for sex chromosomes. Probands and siblings were calculated separately as trio units. Observed mutation rate was calculated by dividing the total number of events by the total number of bases. The exact 95% Poisson confidence intervals were generated from the observed counts and then divided by the total number of bases to get the rate confidence intervals.

Yale: The variant calling threshold was a minimum of 20x unique reads in each member of the quartet and a minimum of 8x unique non-reference reads in the child. Bases screened were estimated by assessing the number of nucleotides in each family with a minimum of 20x unique reads in each member of the quartet and converting to diploid bases (2n). Observed mutation rates were calculated by dividing the total events per sample by the number of nucleotides analyzed per sample and averaging across individuals. The 95% confidence intervals were calculated from the variance of this measure.
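
The UW calculation described in the notes above (pooled events divided by pooled bases, with an exact Poisson interval on the count) can be sketched as follows. This is an illustration under stated assumptions: the interval is found by bisection on the Poisson tail probabilities using only the standard library, and the helper names (`poisson_cdf`, `solve`, `exact_poisson_ci`) are hypothetical. The authors' actual software is not specified in this section.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), with each term computed in log space."""
    if k < 0:
        return 0.0
    return sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k + 1)
    )


def solve(f, lo, hi, iters=100):
    """Bisection root of a monotone f on [lo, hi]; f(lo), f(hi) differ in sign."""
    sign_lo = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided confidence interval for a Poisson mean given count k."""
    half = alpha / 2
    lower = 0.0
    if k > 0:
        # Smallest mean for which observing k or more events has probability alpha/2.
        lower = solve(lambda lam: (1 - poisson_cdf(k - 1, lam)) - half, 1e-9, float(k))
    # Largest mean for which observing k or fewer events has probability alpha/2.
    search_top = k + 10 * math.sqrt(k + 1) + 10
    upper = solve(lambda lam: poisson_cdf(k, lam) - half, max(float(k), 1e-9), search_top)
    return lower, upper


# UW-189 Probands row: total de novo events and diploid bases screened.
events, bases = 242, 11_144_644_963
rate = events / bases
low, high = exact_poisson_ci(events)
print(f"rate = {rate * 1e8:.2f} x10^-8 per base per generation")
print(f"95% CI = ({low / bases * 1e8:.2f}, {high / bases * 1e8:.2f}) x10^-8")
```

For this row the point estimate rounds to the tabulated 2.17, and the interval should land close to the tabulated 1.90-2.46. The Yale figures are not reproducible this way, because that rate is averaged per sample with a variance-based interval.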

Supplementary Table 5. De novo events identified in 50 unaffected siblings and 20 pilot probands.

See attached Excel document

Supplementary Table 6. 70 rare inherited and 6 de novo CNVs identified in 122 trios.

Chr (hg18) Start End Family ID CNV Type Inheritance Genes

chr20 6706891 8271234 11013.p1 Gain Inherited Mother BMP2 (Gene Broken), HAO1, TMX4, PLCB1 (Gene Broken)

chr2 197997071 198254315 11023.p1 Gain Inherited Mother SF3B1 (Gene Broken), COQ10B, HSPD1, HSPE1, MOBKL3, RFTN2

chr2 208742960 208763173 11023.p1 Loss Inherited Mother C2orf80 (Gene Broken)

chr2 33593707 36437270 11064.p1 Gain Inherited Mother RASGRP3 (Gene Broken), FAM98A, MYADML, CRIM1 (Gene Broken)

chr13 49022622 49083205 11083.p1 Gain Inherited Father RCBTB1 (Gene Broken)

chr4 47538 117452 11093.p1 Gain Inherited Mother ZNF595 (Gene Broken), ZNF718 (Gene Larger Than CNV)

chr8 13295432 15139503 11141.p1 Gain Inherited Mother DLC1 (Gene Broken), C8orf48, SGCZ (Gene Broken), MIR383

chr19 48612452 48651954 11141.p1 Loss Inherited Mother TEX101 (Gene Broken)

chr16 79736150 79748195 11184.p1 Loss Inherited Father PKD1L2 (Gene Larger Than CNV)

chr4 108066404 109130669 11190.p1 Gain Inherited Father DKK2 (Gene Broken), PAPSS1, SGMS2, CYP2U1, HADH (Gene Broken)

chr4 5781934 5823956 11224.p1 Gain Inherited Father EVC (Gene Larger Than CNV)

chr7 33097353 33153804 11346.p1 Gain Inherited Mother RP9, BBS9 (Gene Broken)

chr5 157006525 157051157 11375.p1 Loss Inherited Father SOX30 (Gene Broken), C5orf52

chr7 11117201 12440046 11398.p1 Gain Inherited Father PHF14 (Gene Broken), THSD7A, TMEM106B, VWDE

chr12 43879949 43967314 11452.p1 Loss Inherited Mother PLEKHA9 (Gene Broken), ANO6 (Gene Broken)

chr6 88374109 88424279 11459.p1 Loss Inherited Father ORC3L (Gene Larger Than CNV)

chr1 173697478 174026214 11469.p1 Gain Inherited Father TNR (Gene Broken)

chr5 112931249 113726857 11469.p1 Gain Inherited Father YTHDC2 (Gene Broken), KCNN2 (Gene Broken)

chr6 26077937 26393662 11480.p1 Gain Inherited Father TRIM38 (Gene Broken), HIST1H1A, HIST1H3A, HIST1H4A, HIST1H4B, HIST1H3B, HIST1H2AB, HIST1H2BB, HIST1H3C, HIST1H1C, HFE, HIST1H4C, HIST1H1T, HIST1H2BC, HIST1H2AC, HIST1H1E, HIST1H2BD, HIST1H2BE, HIST1H4D, HIST1H3D, HIST1H2AD, HIST1H2BF, HIST1H4E, HIST1H2BG, HIST1H2AE, HIST1H3E, HIST1H1D, HIST1H4F, HIST1H4G, HIST1H3F, HIST1H2BH, HIST1H3G, HIST1H2BI, HIST1H4H (Gene Broken)

chr8 15992606 16066256 11556.p1 Loss Inherited Father MSR1 (Gene Broken)

chr3 147531173 147649939 11653.p1 Gain Inherited Father PLSCR2 (Gene Broken)

chr2 110167952 110510396 11660.p1 Loss Inherited Mother MIR4267, MALL, NPHP1, NCRNA00116

chr7 16805611 17800441 11696.p1 Gain Inherited Father AGR2 (Gene Broken), AGR3, AHR, SNX13 (Gene Broken)

chr3 37265222 37416340 11696.p1 Loss de novo NA GOLGA4, C3orf35 (Gene Broken)

chr1 205383785 205599637 11707.p1 Gain Inherited Father C4BPA (Gene Broken), CD55 (Gene Broken)

chr9 28181261 28338013 11707.p1 Loss Inherited Mother LINGO2 (Gene Larger Than CNV)

chr17 3946651 4368839 11707.p1 Gain Inherited Father ZZEF1 (Gene Broken), CYB5D2, ANKFY1, UBE2G1, SPNS3, SPNS2 (Gene Broken)

chr2 160248507 160313660 11711.p1 Gain Inherited Mother MARCH7 (Gene Broken)

chr10 67942427 68177427 11711.p1 Loss Inherited Father CTNNA3 (Gene Larger Than CNV)

chr4 135140571 135406410 11715.p1 Loss Inherited Mother PABPC4L

chr7 48259354 48398102 11722.p1 Loss Inherited Mother ABCA13 (Gene Larger Than CNV)

chr8 30067949 30170383 11722.p1 Gain Inherited Father LEPROTL1, MBOAT4, DCTN6

chr19 62680865 62710992 11753.p1 Gain Inherited Father ZNF419, ZNF773 (Gene Broken)

chr16 21675903 22357384 11834.p1 Gain Inherited Father OTOA (Gene Broken), RRN3P1, LOC100190986, UQCRC2, PDZD9, C16orf52, VWA3A, EEF2K, POLR3E, CDR2, RRN3P3, LOC641298 (Gene Broken)

chr7 153588034 154254473 11843.p1 Loss Inherited Mother DPP6 (Gene Larger Than CNV)

chr12 7882951 8015652 11843.p1 Loss Inherited Father SLC2A14 (Gene Broken), SLC2A3

chr2 86145907 86418710 11895.p1 Gain Inherited Father POLR1A (Gene Broken), PTCD3, SNORD94, IMMT, MRPL35, REEP1 (Gene Broken)

chr15 28712984 30303265 11928.p1 Gain de novo NA ARHGAP11B (Gene Broken), FAN1, MTMR10, TRPM1, MIR211, KLF13, LOC283711, OTUD7A, CHRNA7

chr6 162692702 162951722 11947.p1 Gain Inherited Father PARK2 (Gene Larger Than CNV)

chr22 39048803 39223310 11947.p1 Gain Inherited Father TNRC6B (Gene Broken), ADSL, SGSM3, MKL1 (Gene Broken)

chr11 14831730 14864039 11964.p1 Gain Inherited Father PDE3B (Gene Broken), CYP2R1 (Gene Broken)

chr16 82990535 83030397 11964.p1 Loss Inherited Mother ATP2C2 (Gene Larger Than CNV)

chr10 133426828 134361413 12118.p1 Gain Inherited Father PPP2R2D, BNIP3, JAKMIP3, DPYSL4, STK32C, LRRC27, PWWP2B, C10orf91, INPP5A (Gene Broken)

chr5 112941833 112974949 12130.p1 Gain Inherited Mother YTHDC2 (Gene Broken)

chr8 15992606 16066256 12130.p1 Loss Inherited Mother MSR1 (Gene Broken)

chr6 107599923 107887182 12212.p1 Loss Inherited Father PDSS2 (Gene Larger Than CNV)

chr11 31133684 31384778 12430.p1 Loss Inherited Mother DCDC1, DNAJC24 (Gene Broken)

chr12 110659017 110799706 12581.p1 Gain Inherited Mother ACAD10 (Gene Broken), ALDH2, C12orf47, MAPKAPK5 (Gene Broken)

chr9 139800210 140131086 12581.p1 Loss de novo NA EHMT1 (Gene Broken), MIR602, CACNA1B

chr1 183364423 183402407 12667.p1 Gain Inherited Mother C1orf25 (Gene Broken), C1orf26 (Gene Broken)

chr18 37874638 37962267 12667.p1 Gain Inherited Mother PIK3C3 (Gene Broken)

chr11 50035201 50669978 12810.p1 Gain Inherited Father LOC441601, LOC646813

chr22 30835976 31086819 12810.p1 Gain Inherited Father SLC5A1 (Gene Broken), C22orf42, RFPL2, SLC5A4, RFPL3 (Gene Broken), RFPL3S (Gene Broken)

chr3 28443079 28490738 13008.p1 Loss Inherited Father ZCWPW2 (Gene Larger Than CNV)

chr12 180014 645460 13008.p1 Gain Inherited Mother SLC6A12 (Gene Broken), SLC6A13, KDM5A, CCDC77, B4GALNT3, NINJ2

chr3 143305311 143566711 13335.p1 Gain Inherited Father TFDP2 (Gene Broken), GK5, XRN1 (Gene Broken)

chr17 613719 644926 13335.p1 Loss Inherited Father GLOD4 (Gene Broken), RNMTL1

chr16 29502984 30107398 13335.p1 Gain de novo NA SLC7A5P1, SPN, QPRT, C16orf54, MAZ, PRRT2, C16orf53, MVP, CDIPT, LOC440356, SEZ6L2, ASPHD1, KCTD13, TMEM219, TAOK2, HIRIP3, INO80E, DOC2A, C16orf92, FAM57B, ALDOA, PPP4C, TBX6, YPEL3, GDPD3, MAPK3, LOC100271831, CORO1A (Gene Broken)

chr17 69851956 70220158 13409.p1 Gain Inherited Father KIF19 (Gene Broken), BTBD17, GPR142, GPRC5C, CD300A, CD300LB, CD300C, CD300LD, C17orf77, CD300E, RAB37 (Gene Broken), CD300LF (Gene Broken)

chr6 57019780 57062509 13415.p1 Loss Inherited Father KIAA1586 (Gene Broken)

chr15 55430140 55596476 13415.p1 Gain Inherited Mother CGNL1 (Gene Broken)

chr10 67981151 68085796 13494.p1 Loss Inherited Mother CTNNA3 (Gene Larger Than CNV)

chr18 75004789 75209184 13494.p1 Loss Inherited Father ATP9B (Gene Larger Than CNV)

chr14 67088466 67343145 13530.p1 Gain Inherited Father PLEKHH1 (Gene Broken), PIGH, ARG2, VTI1B, RDH11, RDH12, ZFYVE26 (Gene Broken)

chr4 91076267 91171447 13533.p1 Loss Inherited Father MMRN1 (Gene Broken)

chr14 73586157 73611473 13533.p1 Loss Inherited Father C14orf45 (Gene Broken), ALDH6A1 (Gene Broken)

chr10 67724488 67879503 13557.p1 Loss Inherited Mother CTNNA3 (Gene Larger Than CNV)

chr11 56599848 59990367 13726.p1 Loss de novo NA LRRC55, APLNR, TNKS1BP1, SSRP1, P2RX3, PRG3, PRG2, SLC43A3, RTN4RL2, SLC43A1, TIMM10, SMTNL1, UBE2L6, SERPING1, MIR130A, YPEL4, CLP1, ZDHHC5, MED19, TMX2, C11orf31, BTBD18, CTNND1, OR9Q1, OR6Q1, OR9I1, OR9Q2, OR1S2, OR1S1, OR10Q1, OR10W1, OR5B17, OR5B3, OR5B2, OR5B12, OR5B21, LPXN, ZFP91, ZFP91-CNTF, CNTF, GLYAT, GLYATL2, LOC283194, GLYATL1, FAM111B, FAM111A, DTX4, MPEG1, OR5AN1, OR5A2, OR5A1, OR4D6, OR4D10, OR4D11, OR4D9, OSBP, MIR3162, PATL1, OR10V1, STX3, MRPL16, GIF, TCN1, PLAC1L, MS4A3, MS4A2, MS4A6A, MS4A4A, MS4A6E, MS4A7, MS4A14, MS4A5, MS4A1 (Gene Broken)

chr17 10287416 10297580 13733.p1 Loss Inherited Mother MYH4 (Gene Larger Than CNV)

chr8 102800655 103493638 13741.p1 Gain Inherited Father NCALD (Gene Broken), RRM2B, UBR5 (Gene Broken)

chr12 19361460 19461747 13793.p1 Gain Inherited Mother PLEKHA5 (Gene Broken)

chr15 55423402 55596476 13812.p1 Gain Inherited Father CGNL1 (Gene Broken)

chr19 62524033 62624661 13815.p1 Loss Inherited Mother ZNF543 (Gene Broken), ZNF304, TRAPPC2P1, ZNF547, ZNF548, ZNF17 (Gene Broken)

chr16 74808505 75067064 13815.p1 Loss de novo NA TERF2IP (Gene Broken), CNTNAP4 (Gene Broken)

chr10 3113861 3204971 13844.p1 Loss Inherited Mother PFKP (Gene Broken), PITRM1 (Gene Broken)

chr21 34648298 34909180 13844.p1 Gain Inherited Mother KCNE2, FAM165B, KCNE1, RCAN1 (Gene Broken)

Supplementary Table 7. Expanded top de novo ASD risk contributing mutations*

Proband NVIQ Candidate Gene AA change GERP Grantham Location (hg19) Chrom Position Ref Alt

12225.p1 89 ABCA2 p.VAL1845MET 3.4 21 9q34 9 139906388 C T

11653.p1 44 ADCY5 p.ARG603CYS 3.74 180 3q13.2-q21 3 123046605 G A

12130.p1 55 ADNP frameshift indel 20q13.13 20 49510027 * -TT

11224.p1 112 AP3B2 p.ARG435HIS 4.81 29 15q 15 83346497 C T

13447.p1 51 ARID1B frameshift indel 6q25.3 6 157527664 * -TGTT

13415.p1 48 BRSK2 3n indel 11p15.5 11 1466811 * -AGA

14292.p1 49 BRWD1 frameshift indel 21q22.2 21 40568453 * -T

11872.p1 65 CACNA1D p.ALA769GLY 4.91 60 3p14.3 3 53764493 C G

11773.p1 50 CACNA1E p.GLY1209SER 4.96 56 1q25-q31 1 181708295 G A

13606.p1 60 CDC42BPB p.ARG764TERM 3.19 14q32.32 14 103434646 G A

12086.p1 108 CDH5 p.ARG545TRP 4.9 101 16q22.1 16 66434715 C T

12630.p1 115 CHD3 p.ARG1818TRP 2.97 101 17p13 17 7812028 C T

13733.p1 68 CHD7 p.GLY996SER 5.48 56 8q12.2 8 61735090 G A

13844.p1 34 CHD8 p.GLN959TERM 4.98 14q11.2 14 21871178 G A

12752.p1 93 CHD8 frameshift indel 14q11.2 14 21861376 * -CT

13415.p1 48 CNOT4 p.ASP48ASN 5.52 23 7q22-qter 7 135122938 C T

12703.p1 58 CTNNB1 p.THR551MET 5.43 81 3p21 3 41275757 C T

11452.p1 80 CUL3 p.GLU246TERM 5.51 2q36.2 2 225376218 C A

11571.p1 94 CUL5 p.VAL355ILE NA 29 11q22.3 11 107944174 G A

13890.p1 42 DYRK1A splice site 5.63 21q22.13 21 38865466 G A

12741.p1 87 EHD2 p.ARG167CYS 2.55 180 19q13.3 19 48221860 C T

11629.p1 67 FBXO10 p.GLU54LYS 4.25 56 9p13.1 9 37541606 C T

13629.p1 63 GPS1 p.ARG492GLN 3.66 43 17q25.3 17 80014816 G A

13757.p1 91 GRINL1A 3n indel 15q22.1 15 58001002 * -AGA

11184.p1 94 HDGFRP2 p.GLU83LYS 4.77 56 19p13.3 19 4475539 G A

11610.p1 138 HDLBP p.ALA639SER 5.81 99 2q37.3 2 242186202 C A

11872.p1 65 KATNAL2 splice site 5.48 18q21.1 18 44603833 G C

12346.p1 77 MBD5 frameshift indel 2q23.2 2 149225965 * -TC

11947.p1 33 MDM2 p.GLU433LYS/p.TRP160TERM 3.59 56 12q13-q14 12 69233432 G A

11148.p1 82 MLL3 p.TYR4691TERM -6.76 7q36 7 151842339 G T

12157.p1 91 NLGN1 p.HIS795TYR 5.55 83 3q26.32 3 173999004 C T

11193.p1 138 NOTCH3 p.GLY1134ARG 3.63 125 19p13.2-p13.1 19 15290235 C G

11172.p1 60 NR4A2 p.TYR275HIS 4.53 83 2q22-q23 2 157185876 A G

11660.p1 60 NTNG1 p.THR135ILE 5.36 89 1p13.2-p13.1 1 107867061 C T

12532.p1 110 NTNG1 p.TYR23CYS 3.97 194 1p13.2-p13.1 1 107691283 A G

11093.p1 91 OPRL1 p.ARG157CYS 3.42 180 20q13.33 20 62729390 C T

13793.p1 56 PCDHB4 p.ASP555HIS 3.76 81 5q31 5 140503243 G C

11707.p1 23 PDCD1 frameshift indel 2q37.3 2 242795103 * -G & K

12304.p1 83 PSEN1 p.THR421ILE 5.61 89 14q24.3 14 73685855 C T

11390.p1 77 PTEN p.THR167ASN 5.39 65 10q23 10 89711882 C A

13629.p1 63 PTPRK p.ARG784HIS 5.52 29 6q22.2-q22.3 6 128326372 C T

13333.p1 69 RGMA p.VAL379ILE 3.41 29 15q26.1 15 93588446 C T

13222.p1 86 RPS6KA3 p.SER369TERM 4.32 Xp22.2-p22.1 X 20193403 G C

11257.p1 128 RUVBL1 p.LEU365GLN 4.62 113 3q21 3 127806574 A T

11843.p1 113 SESN2 p.ALA46THR 4.88 58 1p35.2 1 28595739 G A

12933.p1 41 SETBP1 frameshift indel 18q21.1 18 42532021 * +GG, -C

12565.p1 79 SETD2 frameshift indel 3p21.31 3 47098932 * -T

12335.p1 47 TBL1XR1 p.LEU282PRO 5.43 98 3q26.33 3 176765107 A G

11480.p1 41 TBR1 frameshift indel 2q24.2 2 162273322 * -C

11569.p1 67 TNKS p.ARG568THR 4.94 71 8p23.1 8 9567684 G C

12621.p1 120 TSC2 p.ARG1580TRP 2.06 101 16p13.3 16 2136269 C T

11291.p1 83 TSPAN17 p.SER75TERM 1.09 5q35.3 5 176078840 C A

11006.p1 125 UBE3C p.SER845PHE 5.39 155 7q36.3 7 157041114 C T

12161.p1 95 UBR3 frameshift indel 2q31.1 2 170732427 * -AAA, +C

12521.p1 78 USP15 frameshift indel 12q14 12 62775296 * -TGAG

11526.p1 92 ZBTB41 p.TYR886HIS 5.28 83 1q31.3 1 197128563 A G

13335.p1 25 ZNF420 p.LEU76PRO 2.54 98 19q13.12 19 37618120 T C

CNV

Proband Candidate Gene Type Location Chrom Start Stop

11928.p1 66 CHRNA7 DUPLICATION 15q13.3 15 30925692 32515973

13815.p1 56 CNTNAP4 DELETION 16q23.1 16 75690104 76523780

13726.p1 59 CTNND1 DELETION 11q12.1 11 56843272 60233791

12581.p1 34 EHMT1 DELETION 9 9 140680073 141023914

13335.p1 25 TBX6 DUPLICATION 16p11.2 16 29595483 30199897

*Top candidate de novo mutations based on severity and/or supporting evidence from the literature.

Supplementary Table 8. Mutation rates and probability of recurrence for genes with >1 mutation.

Gene GRIN2B LAMC3 SCN1A NTNG1 CHD8

chimp diffs 15 40 26 2 15

length mapped to chimp 4503 4815 6134 1782 7904

total length of sequence 4503 4840 6134 1782 7904

mut rate per site (diffs / mapped seq) 3.33E-03 8.31E-03 4.24E-03 1.12E-03 1.90E-03

Mutation rate/base/generation 6.94E-09 1.73E-08 8.83E-09 2.34E-09 3.95E-09

People_Screened 1703 1703 1703 189 189

Size of Coding and Splice 4499 4836 6130 1778 7900

# of bases screened 15323594 16471416 20878780 672084 2986200

Avg# DN events 1.06E-01 2.85E-01 1.84E-01 1.57E-03 1.18E-02

#of De Novo 3 2 2 2 2

P(X+) 1.85E-04 3.37E-02 1.50E-02 1.23E-06 6.92E-05

De Novo Events Proband Type Chrom Pos (hg19) Genotype

MIP

GRIN2B 12681 splice 12 13722953 Y

GRIN2B 12547 nonsense 12 13764762 Y

GRIN2B 11691 frameshift indel 12 14019043 +G/*

LAMC3 11666 missense 9 133914290 R

LAMC3 11704 missense 9 133952690 R

SCN1A 12499 missense 2 166848071 R

SCN1A 12340 missense 2 166848006 Y

Exome

NTNG1 11660 missense 1 107867061 Y

NTNG1 12532 missense 1 107691283 R

CHD8 12752 frameshift indel 14 21861376 -CT

CHD8 13844 nonsense 14 21871178 R
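The arithmetic linking the rows of Supplementary Table 8 can be sketched as follows. The sketch assumes that bases screened equal two haploid copies per person, that the expected number of de novo events is the per-base rate times the bases screened, and that P(X+) is the Poisson probability of seeing at least the observed number of events. The table does not name the distribution, so the Poisson reading is an assumption, although it reproduces the tabulated values closely. Function names are hypothetical.

```python
import math

def bases_screened(people, coding_and_splice_bp):
    """Diploid bases screened: two copies of the target per person."""
    return 2 * people * coding_and_splice_bp

def poisson_tail(k, mu, terms=200):
    """P(X >= k) for X ~ Poisson(mu), summing the upper tail directly."""
    log_mu = math.log(mu)
    return sum(math.exp(i * log_mu - mu - math.lgamma(i + 1))
               for i in range(k, k + terms))

# (people screened, coding+splice size, rate per base per generation, de novo events)
# Values copied from Supplementary Table 8.
genes = {
    "GRIN2B": (1703, 4499, 6.94e-9, 3),
    "LAMC3": (1703, 4836, 1.73e-8, 2),
    "SCN1A": (1703, 6130, 8.83e-9, 2),
    "NTNG1": (189, 1778, 2.34e-9, 2),
    "CHD8": (189, 7900, 3.95e-9, 2),
}

for gene, (people, size, rate, observed) in genes.items():
    n = bases_screened(people, size)
    expected = n * rate
    p = poisson_tail(observed, expected)
    print(f"{gene}: bases={n} expected={expected:.2e} P(X>={observed})={p:.2e}")
```

Small differences in the last digit are expected because the tabulated per-base rates are rounded.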

Supplementary Table 9. Selected inherited hemizygous and compound heterozygous sites.

Hemizygous rare singletons

Gene Proband Type Chrom Pos (hg19) Geno Type

AFF2 11056 missense X 147743569 G xmr

ATP7A 11388 missense X 77245127 A xmr

ATRX 11096 missense X 76938115 G xmr

ATRX 11504 missense X 76939319 G xmr

ATRX 11827 missense X 76938353 C xmr

AWAT1 12238 nonsense X 69455937 A x_truncating

CASK 11291 missense X 41469222 C xmr

CNGA2 13793 nonsense X 150912962 T x_truncating

DMD 12157 missense X 31284928 T xmr

FMR1 12390 splice X 147019617 A xmr

FRMPD4 12036 missense X 12712524 C xasd

FTSJ1 11425 missense X 48340830 C xmr

GPR119 11172 nonsense X 129518668 A x_truncating

IQSEC2 11526 missense X 53265000 A xmr

NHS 11989 missense X 17745600 T xmr

OCRL 11388 missense X 128674731 C xmr

PRKY 13822 splice Y 7224175 T y_truncating

PTCHD1 11083 missense X 23397843 T xmr

PTCHD1 11638 missense X 23411984 C xmr

RPGR 12157 splice X 38178241 T x_truncating

RPGR 11218 missense X 38145259 A x_truncating

ZNF185 12373 splice X 152101415 C x_truncating

NLGN4Y 12373 nonsense Y 16734300 T yasd_truncating

Compound Het rare singletons

Gene Proband Type Chrom Pos (hg19) Geno Type

CNTNAP4 11009 missense 16 76482674 M asd_rc

CNTNAP4 11009 missense 16 76482817 R asd_rc

VPS13B 11659 missense 8 100568761 R asd_rc

VPS13B 11659 missense 8 100729575 M asd_rc

MYH7B 11390 frameshift indel 20 33574698 -TCTG rc

MYH7B 11390 missense 20 33574761 Y rc

ITGAM 11863 missense 16 31308874 R rc

ITGAM 11863 splice 16 31340548 R rc

MATN2 11947 frameshift indel 8 99045849 -C rc

MATN2 11947 missense 8 99045883 S rc

QRFPR 12667 nonsense 4 122254158 Y rc

QRFPR 12667 missense 4 122301478 Y rc

SCN5A 13169 missense 3 38622556 Y rc

SCN5A 13169 nonsense 3 38648178 Y rc_truncating_ms

xmr = X-linked mental retardation loci, asd = ASD candidate loci, rc = possible recessive loci

Supplementary Table 10. List of the 21 severe de novo mutations that map to regions of recurrent CNV associated with Developmental Delay and ASD.

Columns: Gene; Type of mutation; Genomic disorder; Proband; Gains all cases; Gains CT; Gains ASD; Sig ASD gains*; Sig all gains#; Loss all cases; Loss CT; Loss ASD; Sig ASD loss*; Sig all loss#

MBD5 fs 2q23.1 del 12346 0 0 0 ~ ~ 11 1 2 5.48E-02 4.52E-02

HDLBP ms 2q37 del 11610 1 0 0 ~ 6.54E-01 24 3 1 4.58E-01 5.77E-03

PDCD1 ms 2q37 del 11707 3 0 0 ~ 2.80E-01 25 1 1 2.64E-01 2.38E-04

UIMC1 ms 5q35.2 del 12285 0 0 0 ~ ~ 9 0 0 ~ 2.20E-02

PSMG4 ms 6p25.3 del 11064 2 0 0 ~ 4.28E-01 8 0 0 ~ 3.36E-02

GPR146 ms 7p22.1 del/dup 11518 13 1 2 5.48E-02 2.21E-02 8 3 0 ~ 4.38E-01

TSNARE1 ms 8q24.3 del 13557 3 0 2 2.02E-02 2.80E-01 14 10 2 5.25E-01 ~

PTGES ms 9q34 del 13822 2 0 0 ~ 4.28E-01 10 0 2 2.02E-02 1.44E-02

PAEP ms 9q34 del 11109 4 2 0 ~ 6.56E-01 56 0 3 2.86E-03 4.68E-11

ABCA2 ms 9q34 del 12225 5 0 0 ~ 1.20E-01 67 4 4 1.76E-02 6.82E-09

PNPLA7 ms 9q34 del 12249 14 0 1 1.42E-01 2.60E-03 64 3 4 9.92E-03 3.39E-09

SLC6A13 ms 12p13.3 del/dup 11388 14 1 0 ~ 1.54E-02 6 2 1 3.69E-01 4.38E-01

TSC2 ms 16p13.3 del/dup 12621 16 0 3 2.86E-03 1.10E-03 48 4 1 5.35E-01 6.42E-06

KIAA0182 ms 16p13.3 del/dup 11711 5 0 0 ~ 1.20E-01 14 0 1 1.42E-01 2.63E-03

SYNRG ms 17q12 del/dup 11599 18 3 0 ~ 3.61E-02 15 2 1 3.69E-01 3.54E-02

DNAH17 ms 17q25 del 11587 3 2 0 ~ ~ 42 1 4 1.80E-03 2.80E-07

GPS1 ms 17q25 del 13629 1 1 0 ~ ~ 45 4 4 1.76E-02 1.82E-05

DUS1L ms 17q25 del 13274 1 1 0 ~ ~ 45 3 4 9.92E-03 4.10E-06

POLRMT ns 19p13.3 del/dup 13333 28 1 3 1.02E-02 7.00E-05 14 1 1 2.64E-01 1.54E-02

HDGFRP2 ms 19p13.3 del/dup 11184 31 1 3 1.02E-02 2.00E-05 7 0 1 1.42E-01 5.13E-02

SBF1 ms 22q13 del/dup 13793 9 0 0 ~ 2.20E-02 53 0 3 2.86E-03 1.68E-10

ms = missense, ns = nonsense, fs = frameshifting indel, All Signature Cases (n = 15,767), ASD (n = 1,379), Controls (n = 8,329) *Fisher exact p-values ASD cases versus controls, #all cases versus controls5.
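The Fisher exact p-values in this table can be approximated with a short hypergeometric calculation. The sketch below assumes a one-sided test for enrichment in cases. The footnote does not state the sidedness, but a one-sided tail gives values close to those tabulated: for example 2 ASD versus 1 control carrier gives roughly 5.48E-02, and 3 versus 0 gives roughly 2.86E-03. The function name is hypothetical.

```python
from math import comb

def fisher_one_sided(case_hits, n_cases, control_hits, n_controls):
    """One-sided Fisher exact p-value for enrichment of carriers among cases.

    With the margins fixed, the number of carriers among cases is
    hypergeometric; the p-value is the probability of at least case_hits.
    """
    total = n_cases + n_controls
    hits = case_hits + control_hits
    denom = comb(total, n_cases)
    upper = min(hits, n_cases)
    tail = sum(comb(hits, x) * comb(total - hits, n_cases - x)
               for x in range(case_hits, upper + 1))
    return tail / denom

N_ASD, N_ALL, N_CONTROLS = 1379, 15767, 8329  # cohort sizes from the table footnote

print(f"{fisher_one_sided(2, N_ASD, 1, N_CONTROLS):.2e}")  # MBD5 losses, ASD vs controls
print(f"{fisher_one_sided(9, N_ALL, 0, N_CONTROLS):.2e}")  # UIMC1 losses, all cases vs controls
```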

Supplementary Table 11. Other mutations intersecting previous CNV loci and animal models for ASD.

Gene Type Region Proband Notes

ADNP fs 20q13.13 12130 Animal Model

MYH10 ms 17p13 13742 Animal Model

CACNA1D ms 3p14.3 11872 Animal Model

CACNA1E ms 1q25-q31 11773 Animal Model

AP3B2 ms 15q 11224 CNV Region

ARID1B fs 6q25.3 13447 CNV Region

BRSK2 del_aa 11p15.5 13415 CNV Region

DYRK1A sp 21q22.13 13890 CNV Region

NLGN1 ms 3q26.32 12157 CNV Region

SETBP1 fs 18q21.1 12933 CNV Region

TNKS ms 8p23.1 11569 CNV Region

TSPAN17 ns 5q35.3 11291 CNV Region

ms = missense, ns = nonsense, fs = frameshifting indel, del_aa = deletion of conserved amino acid

Supplementary Table 12. List of the 126 genes/proteins with severe mutations used for the PPI, along with summary statistics.

Protein Degree Clustering Coefficient Connected component?

ADCY5 2 1 Yes

ADNP 15 0.949 Yes

ARID1B 2 1 Yes

BRSK2 2 1 Yes

BRWD1 1 0 Yes

CDC42BPB 15 0.94285714 Yes

CDH5 1 0 Yes

CHD3 13 1 Yes

CHD7 13 1 Yes

CHD8 18 0.68 Yes

CNOT1 16 0.858 Yes

CNOT3 12 0.894 Yes

CTNNB1 10 0.333 Yes

CUL3 16 0.292 Yes

DDX20 9 0.694 Yes

DEPDC7 15 0.943 Yes

DYRK1A 2 0 Yes

EIF4G1 6 0.867 Yes

FBXW9 1 0 Yes

H2AFV 7 1 Yes

HDGFRP2 4 0.833 Yes

HDLBP 16 0.842 Yes

HNRNPF 30 0.37 Yes

INCENP 2 0 Yes

IQGAP2 8 0.857 Yes

KATNAL2 4 1 Yes

KRT80 6 1 Yes

MAP4 6 1 Yes

MKI67 6 0.667 Yes

MYBBP1A 29 0.34 Yes

MYH10 6 0.467 Yes

NACA 6 1 Yes

NOTCH3 1 0 Yes

NR4A2 1 0 Yes

PBRM1 18 0.647 Yes

PDIA6 13 0.628 Yes

POLRMT 2 1 Yes

PSEN1 2 0 Yes

RPS6KA3 1 0 Yes

RUVBL1 24 0.489 Yes

SCN1A 5 0.7 Yes

SFPQ 29 0.374 Yes

SRBD1 1 0 Yes

SYNE1 2 1 Yes

TBL1XR1 16 0.858 Yes

TSR2 8 0.929 Yes

UBE3C 5 1 Yes

UBR3 2 1 Yes

YTHDC2 19 0.661 Yes

AMY2B 0 0 No

APAF1 0 0 No

ASAH2 0 0 No

ASAH2C 0 0 No

BMP1 0 0 No

CACNA1D 0 0 No

CACNA1E 0 0 No

COL25A1 0 0 No

CUBN 0 0 No

DDR2 0 0 No

DNAH17 0 0 No

DNAH5 0 0 No

DNAJB9 0 0 No

DUS1L 0 0 No

EFCAB8 0 0 No

EHD2 0 0 No

ETFB 0 0 No

FAM45A 0 0 No

FBXO10 0 0 No

FOXP1 0 0 No

GPR146 0 0 No

GRIN2B 0 0 No

KIAA0100 0 0 No

KIAA0182 0 0 No

KRBA1 0 0 No

L1TD1 0 0 No

LAMC3 0 0 No

MBD5 0 0 No

MCAM 0 0 No

MDM2 0 0 No

MEGF11 0 0 No

MLL3 0 0 No

MUC16 0 0 No

MYO7B 0 0 No

NAA40 0 0 No

NLGN1 0 0 No

NTNG1 0 0 No

OPRL1 0 0 No

OR10Z1 0 0 No

PCDHB4 0 0 No

PCNX 0 0 No

PDCD1 0 0 No

PHF19 0 0 No

PION 0 0 No

PITPNM3 0 0 No

PLEKHA8 0 0 No

PNPLA7 0 0 No

POLQ 0 0 No

PTEN 0 0 No

PTGR1 0 0 No

RBMS3 0 0 No

RGS22 0 0 No

RNF160 0 0 No

SESN2 0 0 No

SETBP1 0 0 No

SETD2 0 0 No

SGSM3 0 0 No

SLC30A5 0 0 No

SLC7A7 0 0 No

SP7 0 0 No

ST3GAL3 0 0 No

STIL 0 0 No

STK36 0 0 No

SYNRG 0 0 No

TBR1 0 0 No

TLK2 0 0 No

TNKS 0 0 No

TRIO 0 0 No

TRPM5 0 0 No

TSC2 0 0 No

TSNARE1 0 0 No

TSPAN17 0 0 No

USP15 0 0 No

VPS39 0 0 No

ZBTB41 0 0 No

ZNF420 0 0 No

ZNF644 0 0 No
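The Degree and Clustering Coefficient columns above are standard graph statistics. A minimal sketch of how they can be computed from an undirected edge list is given below, using the usual local clustering coefficient (the fraction of a node's neighbour pairs that are themselves connected). The edge list uses placeholder node names and does not represent real protein interactions, and the function name is hypothetical.

```python
from itertools import combinations

def degree_and_clustering(edges):
    """Return {node: (degree, local clustering coefficient)} for an undirected graph."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    stats = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            # Fewer than two neighbours: no neighbour pairs, coefficient set to 0.
            stats[node] = (k, 0.0)
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        stats[node] = (k, 2 * links / (k * (k - 1)))
    return stats

# Placeholder graph: a triangle A-B-C with a pendant node D attached to C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
for node, (deg, cc) in sorted(degree_and_clustering(toy_edges).items()):
    print(node, deg, round(cc, 3))
```

Under this definition a degree-1 node always has coefficient 0 and a degree-2 node has coefficient 0 or 1, which matches the pattern seen in the table.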

Supplementary Table 13. Top IPA function for the PPI connected component.

Category Function Function Annotation B-H p-value Molecules #

Gene Expression transcription transcription 9.45E-03 ARID1B, BRWD1, CHD3, CHD8, CTNNB1, DDX20, DYRK1A, MYBBP1A, NOTCH3, NR4A2, POLRMT, PSEN1, RUVBL1, SFPQ, TBL1XR1 15

Gene Expression transcription transcription of DNA endogenous promoter 3.06E-02 BRWD1, CHD3, CHD8, CTNNB1, DDX20, NR4A2, RUVBL1, TBL1XR1 8

Gene Expression transcription transcription of AP1/CRE element 3.42E-02 CTNNB1 1

Gene Expression transcription transcription of simian virus 40 3.42E-02 CHD3 1

Gene Expression transcription transcription of LEF1 binding site 4.32E-02 CTNNB1 1

Gene Expression transcription transcription of DNA 4.57E-02 ARID1B, CTNNB1, MYBBP1A, NOTCH3, NR4A2, TBL1XR1 6

Gene Expression transcription transcription of Ets1 binding site 4.90E-02 CTNNB1 1

Gene Expression transcription transcription of mitochondrial DNA 5.42E-02 POLRMT 1

Gene Expression transcription transcription of T-cell factor recognition sequence 6.82E-02 CTNNB1 1

Gene Expression transcription transcription of TCF binding site 6.82E-02 CTNNB1 1

Gene Expression transcription transcription of Ets element 7.06E-02 CTNNB1 1

Gene Expression transcription transcription of TATA box 8.01E-02 CTNNB1 1

Gene Expression activation activation of promoter fragment 3.06E-02 CTNNB1, RPS6KA3 2

Gene Expression activation activation of LEF1 binding site 3.06E-02 CTNNB1, PSEN1 2

Gene Expression activation activation of Tbe3 response element 4.10E-02 CTNNB1 1

Gene Expression activation activation of Tcf-4 response element 4.10E-02 CTNNB1 1

Gene Expression activation activation of T-cell factor responsive element 5.42E-02 CTNNB1 1

Gene Expression activation activation of Smad3-Smad4 binding element 8.25E-02 CTNNB1 1

Gene Expression activation activation of T-cell factor recognition sequence 8.25E-02 CTNNB1 1

Gene Expression transactivation transactivation of LEF1 binding site 4.10E-02 CTNNB1 1

Gene Expression transactivation transactivation of TCF binding site 4.10E-02 CTNNB1 1

Gene Expression transactivation transactivation of Tcf4 binding site 4.10E-02 CTNNB1 1

Gene Expression transactivation transactivation of RBP-J/CBF binding site 4.57E-02 NOTCH3 1

Gene Expression transactivation transactivation of androgen receptor binding site 4.57E-02 CTNNB1 1

Gene Expression transactivation transactivation of thyroid hormone response element 5.42E-02 TBL1XR1 1

Gene Expression transactivation transactivation 7.00E-02 CTNNB1, DDX20, NOTCH3, NR4A2, TBL1XR1 5

Gene Expression transactivation transactivation of DNA endogenous promoter 8.57E-02 NOTCH3 1

Gene Expression binding binding of progesterone response element 4.57E-02 SFPQ 1

Gene Expression binding binding of gene 5.42E-02 PSEN1 1

Gene Expression binding binding of TPA response element 6.21E-02 PSEN1 1

Gene Expression expression expression of Nfat binding site 4.57E-02 DYRK1A 1

Gene Expression expression expression of synthetic promoter 6.82E-02 CTNNB1, DYRK1A, NOTCH3, PSEN1 4

Gene Expression repression repression of p53 consensus binding site 4.57E-02 CUL3 1

Gene Expression translation translation of reporter mRNA 7.74E-02 EIF4G1 1

Behavior freezing behavior freezing behavior of mice 9.45E-03 DYRK1A, PSEN1 2

Behavior locomotion locomotion 3.06E-02 ADCY5, CHD7, NR4A2, PSEN1, SCN1A 5

Behavior behavior behavior 3.42E-02 ADCY5, DYRK1A, NR4A2, PSEN1, SCN1A, UBR3 6

Behavior behavior behavior of mice 7.11E-02 DYRK1A, PSEN1, SCN1A 3

Behavior walking walking 3.42E-02 CHD7, SCN1A 2

Behavior psychological process psychological process of mice 4.35E-02 ADCY5, DYRK1A, PSEN1, SCN1A 4

Behavior learning learning by mice 4.57E-02 ADCY5, PSEN1 2

Behavior cognition cognition 4.58E-02 ADCY5, CHD7, PSEN1 3

Behavior motor learning motor learning by mice 4.90E-02 ADCY5 1

Behavior long-term memory long-term memory of mice 6.82E-02 PSEN1 1

Behavior startle response startle response of mice 7.06E-02 DYRK1A 1

Behavior mating behavior mating behavior of mice 8.25E-02 SCN1A 1

Organismal Development development development of organism 9.45E-03 ADNP, CDH5, CHD7, CHD8, CTNNB1, CUL3, DYRK1A, MYH10, PSEN1, RUVBL1, UBR3 11

Organismal Development development development of animal 9.45E-03 CDH5, CHD7, CHD8, CTNNB1, CUL3, DYRK1A, MYH10, PSEN1, UBR3 9

Organismal Development development development of embryo 9.45E-03 CHD7, CHD8, CTNNB1, CUL3, MYH10, PSEN1, UBR3 7

Organismal Development development development of mice 3.06E-02 CDH5, CTNNB1, DYRK1A, MYH10, PSEN1 5

Organismal Development development development of capillary plexus 4.10E-02 CDH5 1

Organismal Development development development of blood vessel 4.55E-02 CDH5, CHD7, CTNNB1, MYH10, PSEN1 5

Organismal Development development development of head 6.21E-02 CTNNB1 1

Organismal Development development delay in development of mice 6.82E-02 DYRK1A 1

Organismal Development developmental process developmental process of animal 9.45E-03 ADNP, CDH5, CHD7, CHD8, CTNNB1, CUL3, DYRK1A, MYH10, PSEN1, UBR3 10

Organismal Development developmental process developmental process of mice 3.06E-02 ADNP, CDH5, CTNNB1, DYRK1A, MYH10, PSEN1 6

Organismal Development formation formation of body axis 3.06E-02 CHD8, CTNNB1 2

Organismal Development formation formation of dorsal-ventral axis 3.42E-02 CTNNB1 1

Organismal Development morphology morphology of aortic root 3.06E-02 MYH10 1

Organismal Development neovascularization neovascularization of corpus luteum 3.06E-02 CDH5 1

Organismal Development patterning patterning of umbilical vessels 3.06E-02 CTNNB1 1

Organismal Development patterning patterning of embryo 4.54E-02 CTNNB1, PSEN1 2

Organismal Development patterning patterning of vasculature 7.06E-02 CTNNB1 1

Organismal Development morphogenesis morphogenesis of limb 3.28E-02 CHD7, CTNNB1, PSEN1 3

Organismal Development morphogenesis morphogenesis of hindlimb 3.32E-02 CHD7, CTNNB1 2

Organismal Development morphogenesis morphogenesis of arm 4.57E-02 CTNNB1 1

Organismal Development duplication duplication of body axis 4.32E-02 CHD8 1

Organismal Development angiogenesis angiogenesis of mice 4.90E-02 CDH5, MYH10 2

Organismal Development segmentation segmentation of somites 4.90E-02 PSEN1 1

Organismal Development segmentation segmentation of embryo 5.77E-02 PSEN1 1

Organismal Development growth growth of mice 5.19E-02 ADNP, CTNNB1, DYRK1A 3

Organismal Development growth delay in growth of mice 7.74E-02 DYRK1A 1

Embryonic Development development development of embryo 9.45E-03 CHD7, CHD8, CTNNB1, CUL3, MYH10, PSEN1, UBR3 7

Embryonic Development development development of trophoblast 4.32E-02 PBRM1 1

Embryonic Development development development of second branchial arch 6.82E-02 PSEN1 1

Embryonic Development patterning patterning of embryonic tissue 3.06E-02 CTNNB1, PSEN1 2

Embryonic Development patterning patterning of vitelline vessel 3.06E-02 CTNNB1 1

Embryonic Development patterning patterning of embryo 4.54E-02 CTNNB1, PSEN1 2

Embryonic Development maintenance maintenance of apical ectodermal ridge 3.06E-02 CTNNB1 1

Embryonic Development regression onset of regression of apical ectodermal ridge 3.06E-02 CTNNB1 1

Embryonic Development size size of ventricular zone 3.06E-02 CTNNB1 1

Embryonic Development morphogenesis morphogenesis of limb 3.28E-02 CHD7, CTNNB1, PSEN1 3

Embryonic Development morphogenesis morphogenesis of hindlimb 3.32E-02 CHD7, CTNNB1 2

Embryonic Development morphogenesis morphogenesis of arm 4.57E-02 CTNNB1 1

Embryonic Development morphogenesis morphogenesis of metanephros 4.57E-02 CTNNB1 1

Embryonic Development morphogenesis morphogenesis of foregut 6.82E-02 CTNNB1 1 Embryonic Development adhesion adhesion of blastomeres 3.42E-02 CTNNB1 1

Embryonic Development formation formation of renal vesicle 3.42E-02 CTNNB1 1

Embryonic Development formation

formation of apical ectodermal ridge 4.32E-02 CTNNB1 1

Embryonic Development formation

formation of visceral endoderm 4.57E-02 MYH10 1

Embryonic Development formation formation of endoderm 5.77E-02 CTNNB1 1

Embryonic Development maturation

maturation of embryonic cell lines 3.42E-02 NR4A2 1

Embryonic Development thickness

thickness of ventricular zone 3.42E-02 PSEN1 1

Embryonic Development morphology

morphology of outflow tract 4.10E-02 MYH10 1

Embryonic Development quantity quantity of somites 4.32E-02 ADNP 1

Embryonic Development quantity quantity of mesenchymal cells 4.57E-02 PSEN1 1

Embryonic Development quantity quantity of embryonic cell lines 8.01E-02 ADNP 1

Embryonic Development apoptosis apoptosis of neural crest cells 4.90E-02 CTNNB1 1

Embryonic Development segmentation segmentation of somites 4.90E-02 PSEN1 1

Embryonic Development segmentation segmentation of embryo 5.77E-02 PSEN1 1

Embryonic Development specification specification of dorsal-ventral axis 6.62E-02 CTNNB1 1

Embryonic Development developmental process developmental process of embryonic cell lines 7.25E-02 DYRK1A, NR4A2 2

Nervous System Development and Function development development of nervous system 9.45E-03 CHD7, CTNNB1, DYRK1A, MYH10, NOTCH3, NR4A2, PSEN1, RPS6KA3 8

Nervous System Development and Function development development of central nervous system 1.65E-02 CHD7, CTNNB1, MYH10, NOTCH3, PSEN1, RPS6KA3 6

Nervous System Development and Function development development of forebrain 3.06E-02 CTNNB1, NOTCH3, PSEN1 3

Nervous System Development and Function development development of fourth cerebral ventricle 3.06E-02 MYH10 1


Nervous System Development and Function development development of third cerebral ventricle 3.06E-02 MYH10 1

Nervous System Development and Function development development of brain 3.06E-02 CTNNB1, MYH10, NOTCH3, PSEN1 4

Nervous System Development and Function development development of lateral cerebral ventricle 4.32E-02 MYH10 1

Nervous System Development and Function development development of dopaminergic neurons 5.42E-02 NR4A2 1

Nervous System Development and Function size size of brain 3.06E-02 CTNNB1, DYRK1A 2

Nervous System Development and Function size size of superior colliculus 3.06E-02 DYRK1A 1

Nervous System Development and Function neurological process neurological process of brain cells 3.06E-02 ADCY5, ADNP, PSEN1 3

Nervous System Development and Function neurological process neurological process of neurons 3.42E-02 ADCY5, ADNP, CTNNB1, PSEN1, SCN1A 5

Nervous System Development and Function neurological process neurological process of corticostriatal neurons 4.32E-02 ADCY5 1

Nervous System Development and Function neurological process neurological process of cerebral cortex cells 4.42E-02 ADNP, PSEN1 2

Nervous System Development and Function neurological process neurological process of mice 4.57E-02 ADCY5, DYRK1A, NR4A2, PSEN1 4

Nervous System Development and Function differentiation differentiation of neuronal progenitor cells 3.06E-02 NOTCH3, PSEN1 2

Nervous System Development and Function differentiation differentiation of neurons 4.32E-02 BRSK2, NOTCH3, NR4A2, PSEN1 4

Nervous System Development and Function differentiation differentiation of amacrine cells 4.90E-02 NR4A2 1


Nervous System Development and Function differentiation differentiation of neuroepithelial cells 7.74E-02 PSEN1 1

Nervous System Development and Function differentiation differentiation of dopaminergic neurons 8.25E-02 NR4A2 1

Nervous System Development and Function differentiation differentiation of neuroglia 8.25E-02 CTNNB1, NOTCH3 2

Nervous System Development and Function cell cycle progression re-entry into cell cycle progression of neuroepithelial cells 3.06E-02 CTNNB1 1

Nervous System Development and Function cytostasis cytostasis of neurites 3.06E-02 MYH10 1

Nervous System Development and Function recovery recovery of brain tissue 3.06E-02 ADNP 1

Nervous System Development and Function surface area surface area of cerebral cortex 3.06E-02 CTNNB1 1

Nervous System Development and Function cell-cell contact cell-cell contact of nervous tissue cell lines 3.42E-02 CDH5 1

Nervous System Development and Function shrinkage shrinkage of cerebral cortex 3.42E-02 PSEN1 1

Nervous System Development and Function maturation maturation of dopaminergic neurons 4.10E-02 NR4A2 1

Nervous System Development and Function morphology morphology of cerebral cortex 4.10E-02 CTNNB1 1

Nervous System Development and Function neurogenesis neurogenesis of brain 4.10E-02 PSEN1 1

Nervous System Development and Function neurogenesis neurogenesis 8.02E-02 NOTCH3, PSEN1, SYNE1 3

Nervous System Development and Function retraction retraction of neurites 4.20E-02 MYH10, PSEN1 2


Nervous System Development and Function retraction retraction of dendrites 6.21E-02 PSEN1 1

Nervous System Development and Function neurotransmission neurotransmission 4.32E-02 ADNP, CTNNB1, PSEN1, SCN1A 4

Nervous System Development and Function neuroprotection neuroprotection of cortical neurons 4.32E-02 ADNP 1

Nervous System Development and Function tubulation tubulation of nervous tissue cell lines 4.32E-02 CDH5 1

Nervous System Development and Function migration migration of neurons 4.55E-02 MYH10, NR4A2, PSEN1 3

Nervous System Development and Function quantity quantity of neurons 4.55E-02 DYRK1A, NR4A2, PSEN1 3

Nervous System Development and Function quantity quantity of Cajal-Retzius neurons 4.57E-02 PSEN1 1

Nervous System Development and Function quantity quantity of amacrine cells 5.77E-02 NR4A2 1

Nervous System Development and Function quantity quantity of neuroepithelial cells 6.21E-02 CTNNB1 1

Nervous System Development and Function quantity quantity of dopaminergic neurons 8.57E-02 NR4A2 1

Nervous System Development and Function learning learning by mice 4.57E-02 ADCY5, PSEN1 2

Nervous System Development and Function motor learning motor learning by mice 4.90E-02 ADCY5 1

Nervous System Development and Function morphogenesis morphogenesis of brain 5.77E-02 PSEN1 1

Nervous System Development and Function morphogenesis morphogenesis of neurites 8.65E-02 MYH10, SYNE1 2


Nervous System Development and Function survival survival of nervous tissue 5.77E-02 ADNP 1

Nervous System Development and Function synaptic transmission synaptic transmission of neurons 5.77E-02 ADNP, CTNNB1, PSEN1 3

Nervous System Development and Function synaptic transmission synaptic transmission of hippocampal neurons 6.82E-02 PSEN1 1

Nervous System Development and Function long-term memory long-term memory of mice 6.82E-02 PSEN1 1

Nervous System Development and Function long-term potentiation long-term potentiation of hippocampal neurons 6.82E-02 PSEN1 1

Nervous System Development and Function proliferation proliferation of cerebral cortex cells 7.06E-02 DYRK1A 1

Nervous System Development and Function proliferation proliferation of neural precursor cells 8.01E-02 DYRK1A 1

Nervous System Development and Function startle response startle response of mice 7.06E-02 DYRK1A 1

Nervous System Development and Function extension extension of neurites 8.01E-02 ADNP, MYH10 2

Nervous System Development and Function axonogenesis axonogenesis of axons 8.57E-02 PSEN1 1

Nervous System Development and Function long term depression long term depression of brain cells 8.57E-02 ADCY5 1

Nervous System Development and Function long term depression long term depression of neurons 8.91E-02 ADCY5 1

