Human Splicing Finder: an online bioinformatics tool to predict splicing signals

Nucleic Acids Research, 2009, 1–14
doi:10.1093/nar/gkp215

François-Olivier Desmet1, Dalil Hamroun1,2, Marine Lalande1, Gwenaëlle Collod-Béroud1, Mireille Claustres1,2,3 and Christophe Béroud1,2,3,*

1INSERM, U827, 2CHU Montpellier, Hôpital Arnaud de Villeneuve, Laboratoire de Génétique Moléculaire and 3Université Montpellier1, UFR Médecine, Montpellier, F-34000, France

Received December 11, 2008; Revised February 28, 2009; Accepted March 16, 2009

*To whom correspondence should be addressed. Tel: +33 4 67 41 53 60; Fax: +33 4 67 41 53 65; Email: christophe.beroud@inserm.fr

© 2009 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Nucleic Acids Research Advance Access published April 1, 2009

ABSTRACT

Thousands of mutations are identified yearly. Although many directly affect protein expression, an increasing proportion of mutations is now believed to influence mRNA splicing. They mostly affect existing splice sites, but synonymous, non-synonymous or nonsense mutations can also create or disrupt splice sites or auxiliary cis-splicing sequences. To facilitate the analysis of the different mutations, we designed Human Splicing Finder (HSF), a tool to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence. It contains all available matrices for auxiliary sequence prediction as well as new ones for binding sites of the 9G8 and Tra2-β Serine-Arginine proteins and the hnRNP A1 ribonucleoprotein. We also developed new Position Weight Matrices to assess the strength of 5′ and 3′ splice sites and branch points. We evaluated HSF efficiency using a set of 83 intronic and 35 exonic mutations known to result in splicing defects. We showed that the mutation effect was correctly predicted in almost all cases. HSF could thus represent a valuable resource for research, diagnostic and therapeutic (e.g. therapeutic exon skipping) purposes as well as for global studies, such as the GEN2PHEN European Project or the Human Variome Project.

INTRODUCTION

Since its discovery more than three decades ago (1), mRNA splicing has been the focus of many studies, both in fundamental and applied research. Splicing is part of the pre-mRNA maturation process that occurs in each eukaryotic cell between mRNA transcription from DNA and its translation into protein. During this event, parts of the pre-mRNA transcripts are removed in a ribonucleoprotein complex (spliceosome), which is constituted of five essential small nuclear RNAs and more than 150 polypeptides (2,3). Depending on tissue localization and/or stage of development, pre-mRNA transcripts may be differentially spliced, allowing several transcripts to be built and thus different proteins to be synthesized from the same gene. A prime example of this phenomenon is the Troponin T gene, for which 64 different mRNAs have been described (4). This process is called alternative splicing, and it is estimated that more than 70% of human protein-coding genes are alternatively spliced (5). Understanding how splicing is regulated is thus crucial, particularly in a medical context, since genomic variations which cause aberrant splicing may represent up to 50% of all mutations that lead to gene dysfunction (6). Mutations can indeed not only directly alter the sequence that will be translated into protein (for instance, base substitutions can change a codon for an amino acid into another one or into a premature termination codon, PTC), but can also affect splicing and, as a consequence, lead to the appearance of truncated proteins or to the lack of the correct gene product.

How are exons and introns recognized during the splicing process? Exon definition (7) is the identification of splice sites located at the 5′ and 3′ ends of exon–intron–exon junctions (5′ss and 3′ss, also known as donor and acceptor splice sites, respectively). At the 3′ end of introns, a branch point sequence and a polypyrimidine tract, which are situated upstream of the 3′ss, are also used as consensus elements. These consensus sequences have probably evolved from ancestral common sequences, as has been reported for the 5′ site with the AG/guaagu prototype sequence, whose eight contiguous nucleotides are complementary to nucleotides 4–11 of U1 RNA (8). The divergence of splice site sequences from the prototypes has been closely associated with the creation of alternative transcripts. Moreover, in higher eukaryotes, these highly degenerated motifs can also be found in most introns, framing pseudo-exons. Pseudo-exons are intronic sequences of typical exon size that outnumber real exons and are bounded by sequences that match the 5′ and 3′


splicing signal requirements of an exon but that are never considered as proper exons by the spliceosome. Furthermore, human transcripts contain many 'decoy' splice sites that are seldom used. So, while 5′ and 3′ splicing signals are mandatory for exon definition, they are not sufficient for correct splicing. In order to reliably distinguish authentic exons and splice sites from pseudo-exons and decoy splice sites, the splicing machinery must rely on auxiliary sequence features, such as intronic and exonic cis-elements. Among them, the Exonic Splicing Enhancers (ESEs) are the most studied. They are specific short nucleotide sequences that are targeted essentially by Serine/Arginine-rich (SR) proteins, which then promote exon definition (9). Conversely, the Exonic Splicing Silencers (ESSs) help the spliceosome to ignore pseudo-exons and decoy splice sites. They act as binding sites for proteins promoting exon exclusion (mainly hnRNP proteins) (10). Intronic Splicing Enhancers (ISEs) and Intronic Splicing Silencers (ISSs) are intronic cis-elements that play roles similar to those of ESEs and ESSs.

Several bioinformatics tools to study or predict splice signals have been developed and are today available online. Their approaches vary (11), from using blastn to align a query sequence to a database of alternative splicing events and splice signals (12), to an ab initio prediction approach (13). Despite the quality of these tools, and because of the complexity of the sequence signals harbored by any mRNA sequence, new tools are needed to simultaneously identify putative donor and acceptor splice sites, branch points and cis-acting elements (ESE, ESS, ISE and ISS). In addition, since many human disease-causing mutations affect splicing, new bioinformatics tools should also be able to predict the consequence of mutations on splice signals. Such a tool could be of great value not only for geneticists, to better understand splicing events and the effect of mutations on mRNA splicing, but also for clinical researchers, to design new therapeutic approaches based on splicing interference, such as the exon-skipping strategy used in Duchenne Muscular Dystrophy (DMD) (14) or gene and exon silencing through manipulation of mRNA splicing (15).

In this article, we present a new bioinformatics tool, the Human Splicing Finder (HSF) software, that is freely available online (http://www.umd.be/HSF/). It includes new algorithms derived from the Universal Mutation Database (UMD) (16,17) to allow the evaluation of the strength of 5′ss, 3′ss and branch points. In addition, in order to identify cis-acting elements, it includes already published algorithms, such as RESCUE-ESE (18) and ESE-Finder (19), as well as new algorithms designed to use available or newly created matrices. To allow the study of virtually any human sequence, HSF includes all genes and alternative transcripts as well as intronic sequences that were extracted from the Ensembl human genome database (http://www.ensembl.org) (20). To evaluate the predictive potential of the HSF web interface (version 2.4; http://www.umd.be/HSF/), we used a set of mutations for which the effect on splicing has been experimentally demonstrated.

MATERIAL AND METHODS

Software development and database design

HSF was developed using the 4D package (4D SA) for data management, algorithm design and web interface. The HSF database was designed to include the introns and exons of all human genes. It was constructed from an Ensembl dataset (20) containing more than 22 000 genes and 46 000 transcripts of Homo sapiens (release 44; http://april2007.archive.ensembl.org) using Biomart (20). Genes were created from the crude dataset using both Ensembl transcript coordinates and sequences from the UCSC genome browser database (21). At present, the HSF database only contains human genes, since matrices and tools were specifically designed for the human genome.

To study the potential effects of single nucleotide polymorphisms (SNPs) on splicing, HSF also harbors data extracted from the Ensembl Variation database (20). For this, a Perl script was developed using the Ensembl Perl API that allows HSF to directly query the Ensembl Variation database and retrieve SNPs located in human genes.

Splicing donor/acceptor sites

To predict potential 5′ss and 3′ss, we used matrices derived from Shapiro and Senapathy (22). A potential splice site is defined as an n-mer sequence. For each of the n positions, a weight is given to each nucleotide, based on its frequency and the relative importance of its position in the sequence motif (position weight matrices, PWM). The strength of a site is thus defined as the sum of each nucleotide's weight plus a constant (Equation 1) that is used for normalization. Only n-mer sequences with consensus values (CV) greater than or equal to a given threshold are considered as potential 5′ss or 3′ss.

Since the human 5′ consensus sequence is [C/A]AG/gt[a/g]agt, we defined the 5′ss as a 9-mer matrix. Similarly, the 3′ss was defined as a 14-mer matrix. The strength of a potential splice site is calculated as follows, where x = 9 for 5′ss and x = 14 for 3′ss:

\[
\text{Site strength} = \text{Base value} + \sum_{i=1}^{x} \text{nucleotide value}(i) \qquad (1)
\]
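As an illustration of Equation 1, the following minimal Python sketch scores every n-mer window of a sequence against a position weight matrix and keeps the windows that reach a threshold. The 3-position matrix, the base value and the function names are hypothetical placeholders chosen for readability; the actual HSF matrices (9 positions for 5′ss, 14 for 3′ss) and their weights are not given in this text.

```python
from typing import Dict, List, Tuple

# Hypothetical toy matrix: one dict of nucleotide weights per position.
# The real HSF 5'ss and 3'ss matrices are NOT reproduced here.
TOY_PWM: List[Dict[str, float]] = [
    {"A": 10.0, "C": 2.0, "G": 1.0, "T": 0.5},
    {"A": 0.5, "C": 0.5, "G": 20.0, "T": 0.5},
    {"A": 0.5, "C": 0.5, "G": 0.5, "T": 20.0},
]
TOY_BASE_VALUE = 5.0  # assumed normalization constant ("Base value")


def site_strength(site: str, pwm: List[Dict[str, float]], base_value: float) -> float:
    """Equation 1: base value plus the sum of the per-position nucleotide weights."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of matrix positions")
    return base_value + sum(pwm[i][nt] for i, nt in enumerate(site.upper()))


def scan_sequence(
    sequence: str,
    pwm: List[Dict[str, float]],
    base_value: float,
    threshold: float,
) -> List[Tuple[int, str, float]]:
    """Return (0-based start, window, strength) for windows scoring >= threshold."""
    n = len(pwm)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        strength = site_strength(window, pwm, base_value)
        if strength >= threshold:
            hits.append((start, window, strength))
    return hits


if __name__ == "__main__":
    print(scan_sequence("CCAGTCC", TOY_PWM, TOY_BASE_VALUE, threshold=50.0))
```

With the toy matrix, only the window `AGT` reaches the threshold. Sliding the window one nucleotide at a time is how a whole sequence can be analyzed for candidate sites.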

HSF also includes an algorithm adapted from the MaxEnt script (23) that allows the analysis of a whole sequence. In addition, for this matrix, users can define thresholds for splice site prediction.

Branch point sequences

Since the human branch point (BP) consensus sequence is YNYCRAY (24), we defined the BP sequence as a 7 × 4 position weight matrix (Figure 1). The threshold for BP sequences was fixed at 67. The strength of a BP sequence was thus calculated as follows (Equation 2):

\[
\text{BP site strength} = \text{Base value} + \sum_{i=1}^{7} \text{nucleotide value}(i) \qquad (2)
\]


Since many intronic sequences match the BP consensus sequence, we included the AG-Exclusion Zone algorithm described by Gooding et al. (25) to predict BP candidates. For a given intronic sequence and its intron–exon boundary, HSF searches all AG dinucleotides that are included in a 3′ss candidate sequence (threshold of 67) and therefore define the exclusion zones. As it has been shown that the BP allows the recognition of the first downstream 3′ss, HSF annotates the functional BP as the strongest candidate without a 3′ exclusion zone before the natural 3′ss.

Additionally, to take into account the steric obstruction caused by the spliceosome, we excluded BP sequences located less than 12 nt from the exon. Finally, as most BP sequences are located between −21 and −34 nt from the exon (26), only a window of 100 bp is processed. We arbitrarily excluded the possibility of a BP motif located very far away in order to save computation time.
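The selection rules above can be sketched in a few lines of Python. This is a simplified reading of the text, not the HSF implementation: it assumes that BP candidates have already been scored with the BP matrix and that the AG dinucleotides belonging to 3′ss candidates have already been located. The class and function names, and the convention of expressing positions as positive distances upstream of the exon (with the natural 3′ss AG at distance 0), are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class BranchPointCandidate:
    distance: int  # nt upstream of the exon (positive number; assumed convention)
    score: float   # BP site strength from Equation 2


def select_branch_point(
    candidates: Iterable[BranchPointCandidate],
    exclusion_ag_distances: Iterable[int],
    threshold: float = 67.0,  # BP threshold stated in the text
    min_distance: int = 12,   # steric obstruction limit stated in the text
    window: int = 100,        # size of the processed window stated in the text
) -> Optional[BranchPointCandidate]:
    """Return the strongest eligible candidate, or None if there is none.

    A candidate is rejected when an AG that belongs to a 3'ss candidate lies
    between it and the natural 3'ss (assumed to sit at distance 0).
    """
    ag_distances: List[int] = list(exclusion_ag_distances)
    eligible = []
    for cand in candidates:
        if cand.score < threshold:
            continue
        if not (min_distance <= cand.distance <= window):
            continue
        if any(0 < d < cand.distance for d in ag_distances):
            continue
        eligible.append(cand)
    return max(eligible, key=lambda c: c.score, default=None)


if __name__ == "__main__":
    cands = [
        BranchPointCandidate(distance=25, score=85.0),
        BranchPointCandidate(distance=40, score=95.0),  # blocked by the AG at 33
        BranchPointCandidate(distance=8, score=99.0),   # too close to the exon
        BranchPointCandidate(distance=30, score=60.0),  # below the threshold
    ]
    print(select_branch_point(cands, exclusion_ag_distances=[33]))
```

In this toy input, the highest-scoring candidates are discarded by the distance and exclusion-zone rules, so the candidate at 25 nt is returned.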

Matrices for splicing enhancers and silencers

To maximize the detection of auxiliary motifs, HSF integrated (i) matrices for SR proteins (SRp40, SC35, SF2/ASF, SF2/ASF IgM/BRCA1 and SRp55) from the ESE Finder tool (19,27) and (ii) sequence motifs shown to be differentially present in exons and introns, such as the RESCUE-ESE hexamers (18), the putative 8-mer ESE and ESS identified by Zhang and Chasin (28), the ESR sequences identified by Goren and co-workers (29) and the exon-identity elements (EIE) and intron-identity elements (IIE) defined by Zhang and co-workers (30). For the silencer sequences identified by Sironi and colleagues (31) and the ESS decamers (32), for which no web-based tool was available, we developed new algorithms to use the crude data.

New matrices were also created to predict hnRNP A1, Tra2-β and 9G8 protein binding motifs. These matrices were designed using published data collected from SELEX experiments and consensus sequences. Sequences were aligned with ClustalW (33) to generate a consensus motif. Note that these motifs were too short to be processed with MEME (34). The consensus sequences were then used to design PWM matrices (Figure 2).

Sequence datasets used to evaluate HSF efficiency

To evaluate the new algorithms dedicated to the prediction of 5′ss and 3′ss, we used the Ensembl database (20), which contains 245 286 human exons (release 44; http://april2007.archive.ensembl.org). For BP predictions, we used a set of 14 experimentally validated BPs (Table 3). These datasets were completed by 69 intronic mutations (35–56) as well as 15 exonic mutations known to alter 5′ss and 3′ss (57,58) and for which the impact on mRNA splicing has been characterized in vivo or in vitro. To evaluate the ability to correctly predict ESE and ESS, we used a set of 20 experimentally validated mutations that affect splicing by a direct effect on ESE and/or ESS (58–66). In addition, we used a set of 36 mutations previously reported to alter splicing (positive controls) and 220 SNPs (negative controls). The negative controls were extracted from the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP) and corresponded to SNPs with the highest minor allele frequency, which therefore had a minimal risk of affecting splicing. Conversely, the positive controls were chosen because experimental results showed that these mutations targeted auxiliary splicing sequence motifs. Nevertheless, in most cases, the data about the exact motif and/or the protein that recognizes this motif were not available. For each mutation, we evaluated only its effects in terms of disruption of ESE or creation of ESS signals (Supplementary Table 1).

RESULTS

Web interface and database

The HSF web interface was designed to maximize the perception of efficiency and ease of use by end users.

Figure 2. New position weight matrices of recognition motifs for proteins involved in splicing: (A) hnRNP A1, (B) Tra2-β and (C) 9G8.

Figure 1. Branch point matrix. The size of each nucleotide is proportional to its weight in the position weight matrix. Nucleotides above the base line have positive values, while nucleotides below have negative values.


Only default parameters are displayed on the submission form, while skilled users can easily access advanced parameters. Sequences stored in the database can be queried using either the gene symbol, the Ensembl gene ID, the Ensembl transcript ID, the RefSeq peptide ID or the consensus CDS. In addition, users can process their own sequences, either for simple sequence analysis or for mutant comparison. HSF can also be queried in different ways: full analysis of a sequence, comparison of a mutant and a wild-type sequence, or simultaneous analysis of several mutants, related or not to the same transcript. In this last case, all mutations should refer to sequences included in the HSF database. In order to easily study a group of mutations from different genes and transcripts, the mutations must be described using the international nomenclature system for cDNA mutations (67) (http://www.genomic.unimelb.edu.au/mdi/mutnomen). HSF will then check that each mutation is correctly described and automatically reconstruct the mutant allele from the wild-type sequence and the mutation name. Since only small rearrangements (i.e. substitutions, small exonic or intronic deletions and insertions, duplications and indels) provide useful information about splicing defects, large rearrangements cannot be processed by HSF.

Moreover, differently from previous resources, the user can specifically analyze BP sequences or splice site motifs using HSF-specific matrices and algorithms.
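To make the step of rebuilding a mutant allele from a wild-type sequence and a mutation name concrete, here is a deliberately narrow Python sketch. It handles only simple coding substitutions of the form `c.<position><ref>><alt>` on a cDNA string whose position 1 is the A of the ATG. The function name and the regular expression are assumptions for this example; intronic positions, deletions, insertions, duplications and indels, which HSF is described as supporting, are out of scope here.

```python
import re

# Matches simple coding substitutions only, e.g. "c.4G>A" (assumed subset).
_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_substitution(cdna: str, mutation: str) -> str:
    """Return the mutant cDNA after checking the mutation against the wild type."""
    match = _SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unsupported or malformed mutation name: {mutation!r}")
    position = int(match.group(1))
    ref, alt = match.group(2), match.group(3)
    if not 1 <= position <= len(cdna):
        raise ValueError(f"position {position} is outside the sequence")
    if cdna[position - 1].upper() != ref:
        raise ValueError(
            f"reference base mismatch at {position}: "
            f"expected {ref}, found {cdna[position - 1]}"
        )
    # Positions in the mutation name are 1-based; Python strings are 0-based.
    return cdna[:position - 1] + alt + cdna[position:]


if __name__ == "__main__":
    print(apply_substitution("ATGGCC", "c.4G>A"))
```

Checking the reference base before editing mirrors the validation step described above: a mutation name that does not agree with the wild-type sequence is rejected rather than silently applied.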

The main result page is divided into three areas: the reference sequence(s), various graphical displays and tables. Since mutations could have different effects related to the local context, a 'quick mutation' option allows the addition of a small rearrangement (missense, deletion, insertion, duplication, indel) to the sequence(s).

Splicing donor/acceptor sites

The new HSF algorithm to define consensus values (CV) of 5′ss or 3′ss was created to maximize the difference between wild-type (wt) active sites and mutant inactive sites. Thus, strong sites presented a CV higher than 80 and less strong sites a CV ranging between 70 and 80. Only a minor fraction of active sites showed a CV between 65 and 70 (Figure 3). The mean CV for 3′ss was 86.81 with a standard deviation of 6.33, while the mean CV for 5′ss was 87.53 with a standard deviation of 8.34. These values were calculated from more than 400 000 natural splice sites extracted from all alternative transcripts. If a mutation directly affects the CV, it is critical to consider not only the CV of the mutant splice site but also the delta between the wt and mutant CV. To validate this algorithm, we used a set of 69 intronic mutations that affect either the canonical AG/GT splice site motifs or less conserved nucleotides (Table 1). All mutations affecting the nucleotides in canonical positions (−2, −1, +1 or +2) strongly influenced the CV value, with an average

Figure 3. Distribution of CVs for (A) 3′ and (B) 5′ natural splice sites (5′ss and 3′ss). Data extracted from the Ensembl dataset (release 44; http://april2007.archive.ensembl.org) (20) using the HSF algorithm.


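Table 1. Intronic mutations in FBN1 (ENST00000316623), FBN2 (ENST00000262464), RB1 (ENST00000267163), TGFBR2 (ENST00000295754), MLH1 (ENST00000231790) and MSH2 (ENST00000233146) that lead to splicing defects

| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) | Cryptic site (CS) |
|---|---|---|---|---|---|---|
| Mutations causing exon skipping | | | | | | |
| FBN1 | c.247+1G>A | (37,46,49–51,53) | 82.26 | 55.42 | −32.62^a | |
| FBN1 | c.538+1G>A | (45) | 83.99 | 57.15 | −31.96^a | |
| FBN1 | c.1468+5G>A | (44) | 84.46 | 72.30 | −14.40^a | |
| FBN1 | c.3208+5G>T | (82) | 94.98 | 82.66 | −12.97^a | |
| FBN1 | c.3838+1G>A | (52) | 95.84 | 69.01 | −28.00^a | |
| FBN1 | c.3839-1G>T | (83) | 87.62 | 58.67 | −33.04^a | |
| FBN1 | c.3964+1G>A | (84,85) | 90.04 | 63.20 | −29.80^a | |
| FBN1 | c.3965-2A>T | (85) | 89.30 | 60.35 | −32.41^a | |
| FBN1 | c.4459+1G>A | (44) | 97.66 | 70.83 | −27.47^a | |
| FBN1 | c.4943-1G>C | (44) | 79.77 | 50.82 | −36.29^a | |
| FBN1 | c.5788+5G>A | (35,36,38,41,43,52,54,83) | 88.06 | 75.89 | −13.82^a | |
| FBN1 | c.6163+2del6 | (83) | 99.05 | 72.90 | −26.40^a | |
| FBN1 | c.6496+2insTG | (43) | 82.21 | 32.05 | −61.01^a | |
| FBN1 | c.6616+1G>C | (86) | 78.08 | 51.24 | −34.37^a | |
| FBN1 | c.6997+1G>A | (83) | 92.11 | 65.27 | −29.13^a | |
| FBN1 | c.7205-2A>G | (83) | 84.11 | 55.16 | −34.42^a | |
| FBN1 | c.7330+1G>A | (55) | 98.02 | 71.18 | −27.38^a | |
| FBN1 | c.7331-2A>G | (40) | 80.72 | 51.77 | −35.86^a | |
| FBN1 | c.8051+1G>A | (44) | 92.02 | 65.18 | −29.16^a | |
| FBN1 | c.8051+5G>A | (51) | 92.02 | 79.85 | −13.22^a | |
| FBN1 | c.8052-2A>G | (52) | 92.86 | 63.92 | −31.17^a | |
| FBN2 | c.3472+2T>G | (48) | 90.99 | 64.15 | −28.53^a | |
| FBN2 | c.4099+1G>C | (39) | 91.66 | 64.82 | −29.28^a | |
| FBN2 | c.4222+5G>A | (47) | 92.11 | 79.94 | −13.21^a | |
| FBN2 | c.4346-2A>T | (87) | 90.91 | 61.96 | −31.84^a | |
| RB1 | c.264+4delA | (57) | 91.34 | 84.93 | −7.01^a | |
| RB1 | c.380+3A>C | (57) | 95.10 | 78.82 | −17.12^a | |
| RB1 | c.607+1G>T | (57) | 99.05 | 72.21 | −27.09^a | |
| RB1 | c.939+4A>G | (57) | 83.75 | 75.41 | −9.96^a | |
| RB1 | c.1049+2delT | (57) | 76.95 | 57.00 | −25.90^a | |
| RB1 | c.1215+1G>A | (57) | 85.86 | 59.02 | −31.26^a | |
| RB1 | c.1389+1G>A | (57) | 82.69 | 55.86 | −32.45^a | |
| RB1 | c.1389+4A>G | (57) | 82.69 | 74.35 | −10.09^a | |
| RB1 | c.1389+5G>A | (57) | 82.69 | 70.53 | −14.71^a | |
| RB1 | c.1422-2A>T | (57) | 86.12 | 57.17 | −33.62^a | |
| RB1 | c.1422-1G>A | (57) | 86.12 | 57.17 | −33.62^a | |
| RB1 | c.1498+5G>A | (57) | 82.91 | 70.75 | −14.67^a | |
| RB1 | c.1960+1G>A | (57) | 94.02 | 67.19 | −28.54^a | |
| RB1 | c.1960+1delG | (57) | 94.02 | 49.62 | −47.22^a | |
| RB1 | c.2211+1G>T | (57) | 89.90 | 63.06 | −29.86^a | |
| RB1 | c.2212-2A>G | (57) | 89.09 | 60.15 | −32.48^a | |
| RB1 | c.2211+1G>C | (57) | 89.90 | 63.06 | −29.86^a | |
| RB1 | c.2520+1G>A | (57) | 92.22 | 65.39 | −29.10^a | |
| RB1 | c.2520+3del4 | (57) | 92.22 | 72.26 | −21.64^a | |
| RB1 | c.2663+1G>A | (57) | 88.37 | 61.54 | −30.36^a | |
| MLH1 | c.306+4A>G | (58) | 96.07 | 87.73 | −8.68^a | |
| MLH1 | c.454-2A>G | (59) | 93.59 | 64.64 | −28.80^a | |
| MLH1 | c.790+1G>A | (59) | 83.28 | 56.45 | −32.22^a | |
| MLH1 | c.790+5G>T | (58) | 83.28 | 70.97 | −14.79^a | |
| MLH1 | c.791-5T>G | (59) | 80.80 | 77.17 | −4.49 | |
| MLH1 | c.884+4A>G | (58) | 85.75 | 77.41 | −9.73^a | |
| MSH2 | c.366+1G>T | (59) | 86.73 | 59.89 | −30.95^a | |
| MSH2 | c.793-2A>C | (59) | 83.98 | 55.04 | −34.46^a | |
| MSH2 | c.942+3A>T | (59) | 99.24 | 83.86 | −15.50^a | |
| MSH2 | c.1276+2T>A | (59) | 84.70 | 57.86 | −31.69^a | |
| MSH2 | c.1386+1G>A | (59) | 89.02 | 62.19 | −30.13^a | |
| MSH2 | c.2634+5G>T | (58) | 84.41 | 72.09 | −14.59^a | |
| Mutations resulting in the usage of cryptic splice sites | | | | | | |
| FBN1 | c.2293+2T>C | (83) | 89.77 | 62.94 | −29.89 | CV 67.21; 5′ CS (51 nt upstream)^b |
| FBN1 | c.3463+1G>A | (88) | 91.34 | 64.50 | −29.38 | CV 88.47; 5′ CS (27 nt downstream)^b |
| FBN1 | c.4747+5G>T | (42) | 89.13 | 76.81 | −13.82 | CV 79.06; 5′ CS (48 nt upstream)^b |
| FBN1 | c.5788+1G>A | (52) | 88.06 | 61.22 | −30.48 | CV 82.64; 5′ CS (33 nt downstream)^b |
| RB1 | c.138-8T>G | (57) | 81.62 | 79.69 | −2.36 | |

(continued)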


reduction (ΔCV) of 31% and a standard deviation (SD) of 2.8%. Mutations affecting less conserved residues had a weaker effect, with a ΔCV of −7% for the residue in position +4 and −14% for nucleotides in position +3 or +5. These results, together with data from other disease-causing mutations (52,68,69), indicated that a CV reduction of at least 10% for a mutation in any position, or of 7% for a mutation in position +4, is likely to have a significant impact on splicing and should be further investigated.

Since a mutation can result not only in the disruption of a 5′ss or a 3′ss but also in the creation of a new splice site, HSF evaluates the 'creation of cryptic splice sites'. As shown in Table 1 for intronic mutations, HSF correctly predicted the creation of cryptic splice sites in the RB1 mutants c.607+1delG, c.138-8T>G and c.501-1G>A. Mutations in canonical sequences, such as c.95-2A>G, c.1397-2A>G and c.1397-1G>A in TGFBR2; c.2293+2T>C, c.3463+1G>A, c.4747+5G>T and c.5788+1G>A in FBN1; and c.1815-2A>G, c.2107-2A>G and c.2211+1G>C in RB1, led to a more complex splicing defect in which disruption of the wt splice site was coupled to the usage of an alternative pre-existing splice site. As mutations do not directly affect alternative splice sites, this phenomenon was not automatically investigated by HSF. Therefore, to identify the alternative splice sites, we chose in 'Select an analysis type' the option 'Number of nucleotides surrounding the exon' and entered the value '100'. In addition, we checked the advanced parameter 'Process sequence' and selected the 'Full sequence' option. To analyze only splice sites, we then selected in 'All or subset of matrices' the 'Splice site matrices' option. Using these parameters, all alternative sites were identified either as the closest and strongest alternative sites (five cases) or as the second-best sites (two cases). Overall, HSF correctly predicted the impact of mutations affecting 5′ss or 3′ss, even when complex mechanisms were involved.

In addition to splicing defects due to 5′ss or 3′ss disruption, it is well known that exonic mutations could result in the creation or activation of cryptic splice sites. As shown in Table 2, the nine mutations affecting the last base of an exon had a strong effect on the activity of the concerned 5′ss (ΔCV = −12 ± 0.7%) that resulted in exon skipping or activation of a cryptic splice site. The two mutations affecting the penultimate nucleotide of an exon had a limited effect on the activity of the 5′ss (ΔCV = −5.4 ± 0.3%). Indeed, these mutations were pathogenic only when a cryptic splice site was activated, and therefore predictions were uncertain. Finally, exonic mutations that were distant from both the 5′ss and the 3′ss could activate a cryptic splice site and result in splicing defects, as shown for mutations c.658C>G in RB1, c.1915C>T in MSH2 and c.5985T>G in DMD.
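The CV variation and the decision rule discussed above can be written down as a short Python sketch. The formula ΔCV = (mutant CV − wt CV) / wt CV × 100 is an assumption: it is not stated explicitly in the text, but it agrees with the values listed in Table 1 to within rounding. The function names and the way the +4 position is passed are hypothetical.

```python
from typing import Optional


def cv_variation(wt_cv: float, mutant_cv: float) -> float:
    """Relative change of the consensus value, in percent (negative = weaker site)."""
    return (mutant_cv - wt_cv) / wt_cv * 100.0


def likely_affects_splicing(
    wt_cv: float, mutant_cv: float, intron_position: Optional[int] = None
) -> bool:
    """Apply the cut-offs quoted in the text: a reduction of at least 10%,
    or of at least 7% when the mutation is in position +4."""
    cutoff = 7.0 if intron_position == 4 else 10.0
    return -cv_variation(wt_cv, mutant_cv) >= cutoff


if __name__ == "__main__":
    # FBN1 c.247+1G>A from Table 1: wt CV 82.26, mutant CV 55.42
    print(round(cv_variation(82.26, 55.42), 2))
    print(likely_affects_splicing(82.26, 55.42, intron_position=1))
    # RB1 c.264+4delA from Table 1: wt CV 91.34, mutant CV 84.93
    print(likely_affects_splicing(91.34, 84.93, intron_position=4))
```

For FBN1 c.247+1G>A the sketch gives about −32.6%, close to the tabulated −32.62. The small difference is expected if the table was computed from unrounded CVs.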

Branch point sequences

We analyzed 14 BP sequences previously reported to be abolished by mutations. As shown in Table 3, 13 out of 14 BPs were correctly predicted by HSF, with an average strength of 83.4 and a standard deviation of 8.6. The only discrepancy concerned the mutation localized in intron 3 of GH1, for which the BP was predicted to be at position −26 by HSF instead of position −21. Note that in both cases the BP was located within the c.468-37_468-16del, which is responsible for the cases of autosomal dominant isolated GH deficiency (IGHDII) in one single family, and therefore additional data are needed to identify the functional BP. Among the other BP sequences, 12 were reported as targets of point mutations leading to their inactivation. In six cases, the mutation involved the critical adenosine residue, leading to a remarkable ΔBP of −29.6%. For mutations involving residues surrounding the BP, the average ΔBP was −13.9% with an SD of 3%. Taking into account the weight matrix (Figure 1) and experimental data, the threshold for BP prediction was thus set at 67.

Table 1. Continued

| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) | Cryptic site (CS) |
|---|---|---|---|---|---|---|
| RB1 | c.138-8T>G (continued) | | | | | CV 55.35 (WT), 84.29 (mutant); 3′ CS (7 nt upstream)^c |
| RB1 | c.501-1G>A | (57) | 97.50 | 68.55 | −29.69 | CV 54.82 (WT), 83.77 (mutant); 3′ CS (1 nt downstream)^c |
| RB1 | c.607+1delG | (57) | 99.05 | 22.54 | −77.24 | CV 42.51 (WT), 88.47 (mutant); 5′ CS (1 nt upstream)^c |
| RB1 | c.1815-2A>G | (57) | 75.42 | 46.47 | −38.39 | CV 81.84; 3′ CS (19 nt downstream)^b |
| RB1 | c.2107-2A>G | (57) | 80.73 | 51.78 | −35.86 | CV 69.56; 3′ CS (35 nt downstream)^b |
| TGFBR2 | c.95-2A>G | (56) | 91.77 | 62.82 | −31.55 | CV 68.28; 3′ CS (18 nt downstream)^b |
| TGFBR2 | c.1397-2A>G | (89) | 92.32 | 63.38 | −31.35 | CV 84.32; 3′ CS (30 nt upstream)^b |
| TGFBR2 | c.1397-1G>A | (90) | 92.32 | 63.38 | −31.35 | CV 84.32; 3′ CS (30 nt upstream)^b |

CS: cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
^a The mutation induces exon skipping.
^b A cryptic splice site not created by the mutation and used in vivo was correctly predicted by HSF.
^c The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.


Auxiliary splicing sequences: enhancers and silencers

In order to simplify the interpretation of predictions obtained with the different algorithms using weight matrices, we used a normalized range scale from 0 to 100. As a consequence, previous matrices from ESE-Finder (19,27) were modified. Nevertheless, the user can define the thresholds using either the original ESE-Finder range or the new 0–100 range. In addition, when processing a single sequence and when CVs are available, HSF calculates the deviation as a percentage of the threshold. A reduced list can be obtained for each matrix by choosing the 'Only variant' option in 'Advanced parameters'. A color code is used for each quartile (from white to orange) to simplify the analysis. When comparing mutant sequences, HSF uses this color code to indicate the differences between the two sequences. When scalability is not possible, HSF only displays the presence of a motif.

To evaluate the sensitivity and usefulness of auxiliary splicing sequence predictions, we used a first set of genes for which 20 mutations have been reported to result in exon skipping following targeting of ESE or ESS (58–66). For each mutation, we selected the default option that allows HSF to predict modifications of ESE and/or ESS motifs using all available matrices (Table 4). For mutation c.362C>T in ACADM or c.4250T>A in DMD, for which the target auxiliary sequences have been experimentally characterized (SF2/ASF and hnRNP A1, respectively), HSF correctly predicted the effect of the mutation. For other sequences, different scenarios were predicted: (i) disruption of one or more ESE without creation of an ESS, as observed for mutations c.882C>T (MLH1), c.362C>T (ACADM), c.8165C>G and

Table 2. Exonic mutations in DMD (ENST00000357033), MLH1 (ENST00000231790), MSH2 (ENST00000233146) and RB1 (ENST00000267163) involved in splicing

Gene | Mutation | Position | References | WT CV | Mutant CV | CV variation (%) | Cryptic site
DMD | c.5985T>G | Deep exonic | (91) | 46.65 | 75.59 | | 3' CS (63 nt downstream) [a]
MLH1 | c.677G>A | Last base | (58) | 84.46 | 73.89 | -12.52 | [b]
MLH1 | c.882C>T | Exonic | (58) | 84.46 | 73.89 | -12.52 | [b]
MLH1 | c.1037A>G | Penultimate base | (58) | 93.04 | 88.19 | -5.22 | 5' CS (upstream [c])
MLH1 | c.1038G>T | Last base | (58) | 93.04 | 82.17 | -11.68 | 5' CS (upstream [c])
MLH1 | c.1667G>T | Last base | (92) | 85.85 | 74.99 | -12.66 | 5' CS (88 nt downstream) [a]
MLH1 | c.1731G>A | Last base | (58) | 93.27 | 82.69 | -11.34 |
MLH1 | c.1989G>T | Last base | (58) | 93.22 | 82.35 | -11.66 |
MSH2 | c.1660A>T | Penultimate base | (58) | 84.00 | 79.25 | -5.65 | 5' CS (82 nt upstream) [a]
MSH2 | c.1759G>C | Last base | (58) | 85.66 | 74.65 | -12.86 | [b]
MSH2 | c.1915C>T | Deep exonic | (59) | 62.19 | 89.02 | | 5' CS (92 nt upstream) [a]
RB1 | c.658C>G | Deep exonic | (57) | 58.66 | 85.49 | | 5' CS (61 nt upstream) [a]
RB1 | c.939G>T | Last base | (57) | 83.75 | 72.88 | -12.98 | [b]
RB1 | c.1960G>C | Last base | (57) | 94.02 | 83.01 | -11.71 | [b]
RB1 | c.1960G>A | Last base | (57) | 94.02 | 83.44 | -11.25 | [b]

CS, cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon. [a] The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF. [b] The mutation induces exon skipping. [c] The cryptic splice site used in vitro was not clearly reported and therefore was not available for comparison.
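The summary figures quoted in the Discussion for last-base and penultimate-base mutations can be checked against the CV variation column of Table 2, read as percentages (e.g. -12.52 for MLH1 c.677G>A). The Python sketch below computes a mean and a sample standard deviation for the two groups. Whether the authors used the sample or the population standard deviation is not stated, so that choice is an assumption here.

```python
from statistics import mean, stdev

# CV variation (%) of the rows labelled "Last base" in Table 2, in table order.
last_base = [-12.52, -11.68, -12.66, -11.34, -11.66, -12.86, -12.98, -11.71, -11.25]
# CV variation (%) of the rows labelled "Penultimate base" in Table 2.
penultimate_base = [-5.22, -5.65]


def summarize(values):
    """Return (mean, sample standard deviation) of a list of CV variations."""
    return mean(values), stdev(values)


for label, values in (("last base", last_base), ("penultimate base", penultimate_base)):
    m, sd = summarize(values)
    print(f"{label}: mean = {m:.2f}%, SD = {sd:.2f}")
```

Under this assumption the two groups come out close to -12 with a spread of about 0.7 and close to -5.4 with a spread of about 0.3.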

Table 3. Branch point sequences

Gene | Intron | References | Ref. BP | Ref. Seq. | HSF BP | HSF value
COL5A1 | 32 | (93) | -27 | ENST00000355306 | -27 | 87.81
DYSF | 31 | (94) | -33 | ENST00000258104 | -33 | 93.13
FBN2 | 30 | (95) | -24 | ENST00000262464 | -24 | 77.06
GH1 | 3 | (96) | -21 | ENST00000323322 | -26 | 73.36
ITGB4 | 31 | (97) | -17 | ENST00000200181 | -17 | 93.79
LCAT | 4 | (98) | -20 | ENST00000264005 | -20 | 95.07
LDLR | 9 | (99) | -25 | ENST00000252444 | -25 | 86.59
NPC1 | 6 | (100) | -28 | ENST00000269228 | -28 | 77.41
PMM2 | 2 | (101) | -25 | ENST00000268261 | -25 | 80.56
PMM2 | 7 | (101) | -23 | ENST00000268261 | -23 | 72.27
RB1 | 23 | (57) | -26 | ENST00000267163 | -26 | 75.89
TH | 11 | (102) | -22 | ENST00000324155 | -22 | 84.96
TSC2 | 38 | (103) | -18 | ENST00000219476 | -18 | 67.71
XPC | 3 | (76) | -24 | ENST00000285021 | -24 | 82.78

For each gene, the reference sequence from the Ensembl genome database (Ref. Seq.), the intron number (Intron) and the position of the BP identified by in vitro experiments (Ref. BP), as well as the BP position predicted by HSF (HSF BP) and the corresponding BP value (HSF value), are shown.


Table 4. Exonic mutations known to result in exon skipping through ESE inactivation or ESS activation

Gene | Mutation | Ref. | Motif | Ref. Seq. | HSF prediction
ACADM | c.362C>T | (65) | ESE (SF2/ASF) | ENST00000370841 | -9G8 [i] (357_362); -SF2/ASF [e] (358_364); +EIE [h] (359_364); -SRp40 [e] (359_365); -EIE [h] (360_365); +IIE [c] x4 (359_367)
BRCA1 | c.5080G>T | (64) | | ENST00000357654 | -EIE [h] (5075_5080); +SRp55 [e] (5076_5081); -9G8 [i] (5077_5082); -SF2/ASF [e] (5078_5085); -IIE [c] (5078_5083); +IIE [c] (5079_5084); -ESS [a] (5076_5083); +hnRNPA1 [d] (5080_5085)
BRCA2 | c.8165C>G | (62) | ESE | ENST00000380152 | -SRp40 [e] (8162_8168); -ESE [f] (8163_8168); +ESE [f] x2 (8164_8170); -SRp55 [e] (8163_8169); -SF2/ASF [e] (8165_8171); -EIE [h] x4 (8160_8168)
BRCA2 | c.5081G>T | (64) | | ENST00000380152 | +SC35 [e] (5075_5082); +SRp40 [e] (5080_5086); -ESE [f,h] x2 (5080_5086); -9G8 [i] (5081_5086); -ESS [a] (5078_5085)
DMD | c.4250T>A | (61) | +ESS (hnRNPA1) | ENST00000357033 | +9G8 [i] x2 (4246_4251) (4248_4253); -EIE [h] (4248_4253); +ESE [f] (4250_4255); -IIE [c] x3 (4246_4253); +hnRNPA1 [d] (4249_4254)
MLH1 | c.544A>G | (59) | | ENST00000231790 | +ESS [a] (537_545); 5'ss ΔCV = -6.30
MLH1 | c.793C>T | (58) | | ENST00000231790 | +ESS [a] (795_802)
MLH1 | c.794G>A | (58) | | ENST00000231790 | -SRp40 [e] (793_799); -SC35 [e] (794_801); +ESS [c] (794_799)
MLH1 | c.882C>T | (58) | | ENST00000231790 | +SC35 [e] (876_883); -SRp55 [e] (877_882)
MLH1 | c.988_990del | (58) | | ENST00000231790 | +SF2/ASF [e] (983_989); -SRp55 [e] (985_990); +9G8 [i] (985_990); -ESS [a] (985_992)
MSH2 | c.815C>T | (58) | | ENST00000233146 | -SRp55 [e] (813_818); +ESS [a] (813_820); +ESS [c] x5 (801_819)
MSH2 | c.274_276del | (58) | | ENST00000233146 | +SC35 [e] (272_279); +SRp40 [e] x2 (274_285); -IIE [c] x2 (274_280)
LAMA2 | c.2230C>T | (60) | | ENST00000354729 | -SF2/ASF [e] (2226_2232); +ESS [c] (2228_2235); +IIE [c] x2 (2229_2235); +ESS [a] (2230_2237)
NF1 | c.557A>T | (66) | ESE | ENST00000356175 | -SRp55 [e] (552_557); -ESE [f] (552_557); -EIE [h] x4 (552_560); -9G8 [i] (553_558); +ESS [a] x2 (550_557) (555_562)
NF1 | c.910C>T | (66) | ESE | ENST00000356175 | -9G8 [i] (905_910); -EIE [h] (905_910); +ESE [f] (908_913); -ESE [f] (910_915); -ESS [a] (906_913)
NF1 | c.943C>T | (66) | ESE | ENST00000356175 | -SC35 [e] (941_948); -SF2/ASF [e] (943_949); -PESE [g] (942_949); -9G8 [i] (938_943); +hnRNPA1 [d] (943_948); +IIE [c] (942_947)

(continued)


c.5081G>T (BRCA2), c.557A>T and c.910C>T (NF1); (ii) creation of one or more ESS without disruption of an ESE, as shown for mutations c.544A>G and c.793C>T (MLH1), c.4250T>A (DMD) and c.6792C>G (NF1); and (iii) an intermediate situation where both the disruption of one or more ESE and the creation of one or more ESS were predicted. This was observed for mutations c.5080G>T (BRCA1), c.794G>A and c.988_990del (MLH1), c.815C>T and c.274_276del (MSH2), c.2230C>T (LAMA2), c.943C>T, c.1007G>A and c.5719G>T (NF1). In order to evaluate the potential to differentiate 'true' ESE or ESS motifs from false positive signals, we selected a second set of 36 mutations (positive controls) and 220 SNPs (negative controls) (Supplementary Table 1). Predictions were classified in three categories: disruption of ESE motifs only (ESE), creation of ESS motifs only (ESS) or both (ESE + ESS). In addition, results were classified in two subsets: a first one (All), which included all predicted motifs, and a second one (Best), which was restricted to only one motif for each case by selecting the one recognized by the highest number of matrices.
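The construction of the 'Best' subset can be pictured as keeping, for each variant, the single motif reported by the largest number of matrices. The Python sketch below illustrates that selection. The data layout, the motif coordinates and the matrix names are hypothetical, and the tie-breaking behaviour (first motif encountered wins) is an assumption, since the text does not say how ties were handled.

```python
def pick_best_motif(motif_hits):
    """Return the (motif, matrices) pair supported by the most matrices.

    motif_hits maps a motif label (here, its coordinates) to the set of
    matrices that report it. Returns None when nothing was predicted.
    """
    if not motif_hits:
        return None
    # max() keeps the first item on ties, which is an arbitrary choice here.
    return max(motif_hits.items(), key=lambda item: len(item[1]))


# Hypothetical predictions for one variant: motif coordinates -> matrices reporting it.
example_hits = {
    "100_105": {"matrix_A"},
    "101_108": {"matrix_A", "matrix_B", "matrix_C"},
    "103_109": {"matrix_B", "matrix_C"},
}

best = pick_best_motif(example_hits)
print(best[0], sorted(best[1]))
```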

Comparison of the three categories (ESE, ESS and ESE + ESS) revealed a significant difference between positive and negative controls, both in the 'All' (χ² = 10.05; P = 0.00656) and the 'Best' subset (χ² = 11.75; P = 0.0028). We then evaluated the potential of each matrix to differentiate true from false positive signals. No statistical differences were found using the Sironi, PESS, IIE, hnRNPA1 and RESCUE-ESE matrices. A statistically significant difference was found for the 'All' subset (χ² = 3.99; P = 0.045) but not for the 'Best' subset (χ² = 2.47; P = 0.116) with the EIE matrix. Significant results in both subsets were obtained with ESE-Finder ('All' subset: χ² = 5.17, P = 0.023; 'Best' subset: χ² = 7.33, P = 0.0067), the 9G8 and Tra2-β matrices from HSF ('All' subset: χ² = 9.92, P = 0.00164; 'Best' subset: χ² = 9.86, P = 0.00169) and PESE ('All' subset: χ² = 19.52, P = 9.95 × 10^-6; 'Best' subset: χ² = 13.52, P = 2.36 × 10^-4). The positive (PPV) and negative (NPV) predictive values, as well as the sensitivity (Sv) and the specificity (Sp), of these last three matrices were then evaluated. PPV ranged from 0.22 (9G8 and Tra2-β) to 0.56 (PESE), NPV from 0.76 (PESE) to 0.95 (9G8 and Tra2-β), Sv from 0.27 (PESE) to 0.40 (9G8 and Tra2-β) and Sp from 0.88 (9G8 and Tra2-β) to 0.91 (PESE). The ESE-Finder matrix showed intermediate values in all cases.
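The reported P-values are consistent with a chi-square test with 2 degrees of freedom for the three-category comparison and 1 degree of freedom for each single-matrix comparison. These degrees of freedom are an inference from the design, not something stated in the text. The Python sketch below uses the closed-form tail probabilities for those two cases, and also shows how PPV, NPV, sensitivity and specificity follow from a 2 × 2 table of counts. The counts in the example are hypothetical, since the underlying counts are given only in the supplementary data.

```python
import math


def chi2_pvalue(stat: float, dof: int) -> float:
    """Upper-tail probability of a chi-square statistic for 1 or 2 degrees of freedom."""
    if stat < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if dof == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if dof == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> dict:
    """PPV, NPV, sensitivity (Sv) and specificity (Sp) from a 2 x 2 table."""
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Sv": tp / (tp + fn),
        "Sp": tn / (tn + fp),
    }


print(f"{chi2_pvalue(10.05, 2):.5f}")  # three categories, 'All' subset
print(f"{chi2_pvalue(13.52, 1):.2e}")  # PESE, 'Best' subset

# Hypothetical counts, for illustration only (not taken from the study).
print(predictive_values(tp=10, fp=8, tn=72, fn=10))
```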

DISCUSSION

During evolution from simple to higher eukaryotes, splicing signals evolved from well-defined motifs to degenerated sequences, with the addition of new auxiliary splicing

Table 4. Continued

Gene | Mutation | Ref. | Motif | Ref. Seq. | HSF prediction
NF1 | c.1007G>A | (66) | ESE | ENST00000356175 | +PESE [g] (1007_1014); -EIE [h] x2 (1003_1011); +9G8 [i] (1006_1011); +ESE [f] (1007_1014); -ESS [a] x2 (1003_1011); -IIE [c] x4 (1003_1011); +hnRNPA1 [d] (1006_1011)
NF1 | c.5719G>T | (66) | ESE | ENST00000356175 | -ESE [f] x5 (5715_5724); -EIE [h] x5 (5715_5724); -ESS [a] x2 (5714_5725); +PESS [g] x2 (5712_5720); +hnRNPA1 [d] (5719_5724)
NF1 | c.6792C>A | (66) | ESE | ENST00000356175 | +ESE [f] x5 (6792_6797); -EIE [h] x2 (6788_6793) (6790_6795); +Tra2-β [i] (6791_6795); -ESS [a] x2 (6787_6794) (6792_6799)
NF1 | c.6792C>G | (66) | ESE | ENST00000356175 | +ESE [f] (6792_6797); -EIE [h] x2 (6788_6793) (6790_6795); +Tra2-β [i] (6791_6795); -ESS [a] x2 (6787_6794) (6792_6799); +hnRNPA1 [d] (6790_6795)

+, a new site was created by the mutation; -, the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were: [a] silencer motifs from Sironi et al. (31); [b] PESS octamers (28); [c] IIEs (30); [d] hnRNP motifs from HSF; [e] ESE-Finder matrices (19); [f] RESCUE-ESE hexamers (63); [g] PESE octamers (28); [h] EIEs (30); [i] ESE motifs from HSF. When multiple adjacent sites were predicted, the number of sites is indicated: x5 means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.


sequences known as ESE and ESS. Although major SR proteins have been cloned and their target sites determined, much work remains to be done to understand how splice signals are recognized and splicing specificity achieved. As this complex world is progressively revealed, bioinformatics resources could play a major role in helping researchers and diagnostic laboratories to evaluate the consequence of mutations on splicing, especially because most genetic tests use DNA and not RNA samples. By giving easy access to predictions of 5'ss, 3'ss and BP sequences, as well as ESE and ESS, the HSF tool (http://www.umd.be/HSF) fulfills this need and may assist clinicians, geneticists and researchers (70–75). By combining motifs identified with different experimental and computational approaches, it provides a common interface that can be used for sequence analysis. The inclusion of all exons and introns extracted from the Ensembl human genome database (20) allows easy access to any sequence of human genes and thus direct comparison of virtually every mutation or SNP concerning splicing elements. Since SNPs are present at a very high frequency in the genome (1/300 bp), it could be useful to evaluate their impact in association with a mutation. We therefore included in HSF data from dbSNP using Ensembl BioMart. The user can select the 'Search for SNPs related to the analyzed sequence' option, which automatically retrieves SNPs from the database. When SNPs are localized in exons, their effect on ESE and ESS motifs could help the user to better evaluate the consequence of a given mutation.

To evaluate the efficiency of the various algorithms included in HSF and its contribution to the prediction of the consequences of mutations associated with a splicing defect, we used a set of 69 intronic mutations that disrupt the 5'ss or the 3'ss and result in exon skipping and/or activation of a cryptic splice site (Table 1) and a group of 15 mutations that were previously reported to result in splicing defects by creating or activating cryptic splice sites (Table 2). HSF was able to correctly predict the disruption of the natural splice sites. Moreover, we could confirm that: (i) mutations of the last nucleotide of an exon have a strong effect on the 5'ss (ΔCV = -12 ± 0.7%), resulting frequently in exon skipping or partial exonic deletion or intronic retention due to activation of a cryptic splice site; (ii) mutations of the penultimate exonic nucleotide have limited consequences on the 5'ss (ΔCV = -5.4 ± 0.3%), but they can activate a cryptic splice site, making predictions more difficult; (iii) exonic mutations distant from the 5' and 3'ss can activate a cryptic splice site, leading to partial exonic deletion. Overall, these findings underline the efficiency of the HSF algorithm to predict the effect of mutations on 5' and 3'ss. When using the HSF algorithm, the threshold for 5' and 3'ss is 65, with a pathogenic ΔCV of -10%, except for position +4 where it is -7%. However, in a few cases, when unusual splice sites are used, this algorithm could be less efficient.

BP sequences represent another essential splicing signal. When a mutation is localized in proximity of the 5' side of the 3'ss, its potential effect on a BP sequence should be examined, especially when a nucleotide located at less than 85 bp from the 3'ss is targeted. In order to evaluate the HSF algorithm dedicated to the identification of BP sequences, we used 14 BP sequences inactivated by intronic mutations (Table 3). HSF correctly predicted 13 out of 14 BPs, and these data allowed us to define the threshold for BP detection at 67 and the pathogenic ΔBP at -10%. Moreover, for intron 3 of XPC, HSF predicted a BP at position -24. However, according to Khan et al. (76), two BP sequences are present in this intron, one at position -24 and another at -4. HSF could not predict the BP at position -4 simply because the HSF algorithm excludes positions -12 to -1 for BP identification because of steric obstruction caused by the spliceosome.
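The two constraints described above for BP identification (a detection threshold of 67 and the exclusion of positions -12 to -1) can be sketched as a simple filter over scored candidates. In the Python sketch below, the candidate list, the function name and the rule of keeping the highest-scoring remaining candidate are assumptions made for illustration. Only the threshold, the exclusion zone and the XPC value of 82.78 at position -24 come from the text and Table 3.

```python
def predict_branch_point(candidates, threshold=67.0, excluded=range(-12, 0)):
    """Pick a branch point among (position, score) candidates.

    Positions are negative offsets from the 3'ss. Candidates inside the
    excluded zone (-12 to -1) or scoring below the threshold are dropped;
    the highest-scoring survivor is returned, or None if there is none.
    """
    allowed = [
        (pos, score)
        for pos, score in candidates
        if pos not in excluded and score >= threshold
    ]
    return max(allowed, key=lambda c: c[1]) if allowed else None


# XPC intron 3: the -24 score is from Table 3; the other scores are hypothetical.
xpc_candidates = [(-4, 90.0), (-24, 82.78), (-40, 60.0)]
print(predict_branch_point(xpc_candidates))
```

With these inputs the candidate at -4 is never considered, whatever its score, which mirrors why the second XPC branch point could not be predicted.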

It has been demonstrated that two different splicing recognition mechanisms, correlated with intron length, can be used in a cell: exon definition for long introns and intron definition for short introns (77). Although the influence of intron length seems to be less important in humans than in other species, it should nevertheless be kept in mind, since U12- and U2-type introns have different BP consensus sequences. In the present version of HSF (v2.4), we only focused on U2-type introns, which are by far the most abundant type in mammalian cells.

Concerning cis-acting elements, many studies have been performed to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets only share partial homology. To solve this problem, HSF included all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNPA1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and therefore the identification of the specific motif/protein pair involved in a given splicing defect is difficult. This is even more complicated when considering the impact of SR and ribonucleoprotein concentration in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the efficiency of HSF to correctly predict motifs disrupted by these mutations. We showed that when the motif/protein pairs had been previously experimentally characterized (hnRNPA1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems), and therefore the motif/protein couple is unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate HSF efficiency to discriminate true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE + ESS; χ² = 11.75; P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33; P = 0.0067), 9G8 and Tra2-β from HSF (χ² = 9.86; P = 0.0017) and PESE (χ² = 13.52; P = 2.36 × 10^-4). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls.
Indeed, only a few experimental validations of auxiliary sequences are available, and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2-β HSF matrices gave stronger results than ESE-Finder itself and can therefore be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNPA1 matrix, should also be considered, as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif predictions, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address the tissue or developmental specificity.

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5' and 3'ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies, such as the exon-skipping approach in Duchenne Muscular Dystrophy, are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world, especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects which either aim at developing a holistic view on Genotype-To-Phenotype data (GEN2PHEN European project, http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81), might benefit from using HSF. Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754, the GEN2PHEN project. The European Community Sixth Framework Program (FP6) under grant agreement number 036825, TREAT-NMD Network of Excellence. Funding for open access charge: Institut National de la Santé Et de la Recherche Médicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget,S.M., Moore,C. and Sharp,P.A. (1977) Spliced segments at the 5' terminus of adenovirus 2 late mRNA. Proc. Natl Acad. Sci. USA, 74, 3171–3175.
2. Nilsen,T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.
3. Zhou,Z., Licklider,L.J., Gygi,S.P. and Reed,R. (2002) Comprehensive proteomic analysis of the human spliceosome. Nature, 419, 182–185.
4. Breitbart,R.E., Nguyen,H.T., Medford,R.M., Destree,A.T., Mahdavi,V. and Nadal-Ginard,B. (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell, 41, 67–82.
5. Maniatis,T. and Tasic,B. (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.
6. Cartegni,L., Chew,S.L. and Krainer,A.R. (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat. Rev. Genet., 3, 285–298.
7. Robberson,B.L., Cote,G.J. and Berget,S.M. (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol. Cell. Biol., 10, 84–94.
8. Jacob,M. and Gallinaro,H. (1989) The 5' splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res., 17, 2159–2180.
9. Blencowe,B.J. (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem. Sci., 25, 106–110.
10. Zhu,J., Mayeda,A. and Krainer,A.R. (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol. Cell, 8, 1351–1361.
11. Zhang,X.H., Leslie,C.S. and Chasin,L.A. (2005) Computational searches for splicing signals. Methods, 37, 292–305.
12. Bhasi,A., Pandey,R.V., Utharasamy,S.P. and Senapathy,P. (2007) EuSplice: A unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics, 23, 1815–1823.
13. Churbanov,A., Rogozin,I.B., Deogun,J.S. and Ali,H. (2006) Method of predicting splice sites based on signal interactions. Biol. Direct, 1, 10.
14. Dunckley,M.G., Manoharan,M., Villiet,P., Eperon,I.C. and Dickson,G. (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum. Mol. Genet., 7, 1083–1090.
15. Wilton,S.D. and Fletcher,S. (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr. Gene Ther., 5, 467–483.
16. Beroud,C., Hamroun,D., Collod-Beroud,G., Boileau,C., Soussi,T. and Claustres,M. (2005) UMD (Universal Mutation Database): 2005 update. Hum. Mutat., 26, 184–191.
17. Beroud,C., Collod-Beroud,G., Boileau,C., Soussi,T. and Junien,C. (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum. Mutat., 15, 86–94.
18. Fairbrother,W.G., Yeo,G.W., Yeh,R., Goldstein,P., Mawson,M., Sharp,P.A. and Burge,C.B. (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res., 32, W187–W190.
19. Cartegni,L., Wang,J., Zhu,Z., Zhang,M.Q. and Krainer,A.R. (2003) ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res., 31, 3568–3571.
20. Flicek,P., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F., Cutts,T. et al. (2008) Ensembl 2008. Nucleic Acids Res., 36, D707–D714.
21. Karolchik,D., Kuhn,R.M., Baertsch,R., Barber,G.P., Clawson,H., Diekhans,M., Giardine,B., Harte,R.A., Hinrichs,A.S., Hsu,F. et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res., 36, D773–D779.


22. Shapiro,M.B. and Senapathy,P. (1987) RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res., 15, 7155–7174.
23. Yeo,G. and Burge,C.B. (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol., 11, 377–394.
24. Green,M.R. (1991) Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. Annu. Rev. Cell Biol., 7, 559–599.
25. Gooding,C., Clark,F., Wollerton,M.C., Grellscheid,S.N., Groom,H. and Smith,C.W. (2006) A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol., 7, R1.
26. Kol,G., Lev-Maor,G. and Ast,G. (2005) Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum. Mol. Genet., 14, 1559–1568.
27. Smith,P.J., Zhang,C., Wang,J., Chew,S.L., Zhang,M.Q. and Krainer,A.R. (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum. Mol. Genet., 15, 2490–2508.
28. Zhang,X.H. and Chasin,L.A. (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev., 18, 1241–1250.
29. Goren,A., Ram,O., Amit,M., Keren,H., Lev-Maor,G., Vig,I., Pupko,T. and Ast,G. (2006) Comparative analysis identifies exonic splicing regulatory sequences–The complex definition of enhancers and silencers. Mol. Cell, 22, 769–781.
30. Zhang,C., Li,W.H., Krainer,A.R. and Zhang,M.Q. (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc. Natl Acad. Sci. USA, 105, 5797–5802.
31. Sironi,M., Menozzi,G., Riva,L., Cagliani,R., Comi,G.P., Bresolin,N., Giorda,R. and Pozzoli,U. (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res., 32, 1783–1791.
32. Wang,Z., Rolish,M.E., Yeo,G., Tung,V., Mawson,M. and Burge,C.B. (2004) Systematic identification and analysis of exonic splicing silencers. Cell, 119, 831–845.
33. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
34. Bailey,T.L., Williams,N., Misleh,C. and Li,W.W. (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res., 34, W369–W373.
35. Yuan,B., Thomas,J.P., von Kodolitsch,Y. and Pyeritz,R.E. (1999) Comparison of heteroduplex analysis, direct sequencing, and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum. Mutat., 14, 440–446.
36. Youil,R., Toner,T.J., Bull,E., Bailey,A.L., Earl,C.D., Dietz,H.C. and Montgomery,R.A. (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum. Mutat., 16, 92–93.
37. Schrijver,I., Liu,W., Odom,R., Brenn,T., Oefner,P., Furthmayr,H. and Francke,U. (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am. J. Hum. Genet., 71, 223–237.
38. Rommel,K., Karck,M., Haverich,A., Schmidtke,J. and Arslan-Kirchner,M. (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum. Mutat., 20, 406–407.
39. Park,E.S., Putnam,E.A., Chitayat,D., Child,A. and Milewicz,D.M. (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am. J. Med. Genet., 78, 350–355.
40. Palz,M., Tiecke,F., Booms,P., Goldner,B., Rosenberg,T., Fuchs,J., Skovby,F., Schumacher,H., Kaufmann,U.C., von Kodolitsch,Y. et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3' region of FBN1 suggests a potential genotype-phenotype correlation. Am. J. Med. Genet., 91, 212–221.
41. Nijbroek,G., Sood,S., McIntosh,I., Francomano,C.A., Bull,E., Pereira,L., Ramirez,F., Pyeritz,R.E. and Dietz,H.C. (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am. J. Hum. Genet., 57, 8–21.
42. McGrory,J. and Cole,W.G. (1999) Alternative splicing of exon 37 of FBN1 deletes part of an 'eight-cysteine' domain resulting in the Marfan syndrome. Clin. Genet., 55, 118–121.
43. Loeys,B., Nuytinck,L., Delvaux,I., De Bie,S. and De Paepe,A. (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch. Intern. Med., 161, 2447–2454.
44. Liu,W.O., Oefner,P.J., Qian,C., Odom,R.S. and Francke,U. (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms, and sequence variants in Marfan syndrome and related connective tissue disorders. Genet. Test., 1, 237–242.
45. Hutchinson,S., Wordsworth,B.P. and Handford,P.A. (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum. Genet., 109, 416–420.
46. Halliday,D., Hutchinson,S., Kettle,S., Firth,H., Wordsworth,P. and Handford,P.A. (1999) Molecular analysis of eight mutations in FBN1. Hum. Genet., 105, 587–597.
47. Gupta,P.A., Wallis,D.D., Chin,T.O., Northrup,H., Tran-Fadulu,V.T., Towbin,J.A. and Milewicz,D.M. (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J. Med. Genet., 41, e56.
48. Gupta,P.A., Putnam,E.A., Carmical,S.G., Kaitila,I., Steinmann,B., Child,A., Danesino,C., Metcalfe,K., Berry,S.A., Chen,E. et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum. Mutat., 19, 39–48.
49. Guo,D., Tan,F.K., Cantu,A., Plon,S.E. and Milewicz,D.M. (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am. J. Med. Genet., 101, 130–134.
50. Dietz,H.C., McIntosh,I., Sakai,L.Y., Corson,G.M., Chalberg,S.C., Pyeritz,R.E. and Francomano,C.A. (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics, 17, 468–475.
51. Comeglio,P., Johnson,P., Arno,G., Brice,G., Evans,A., Aragon-Martin,J., da Silva,F.P., Kiotsekoglou,A. and Child,A. (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum. Mutat., 28, 928.
52. Collod-Beroud,G., Le Bourdelles,S., Ades,L., Ala-Kokko,L., Booms,P., Boxer,M., Child,A., Comeglio,P., De Paepe,A., Hyland,J.C. et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum. Mutat., 22, 199–208.
53. Chikumi,H., Yamamoto,T., Ohta,Y., Nanba,E., Nagata,K., Ninomiya,H., Narasaki,K., Katoh,T., Hisatome,I., Ono,K. et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J. Hum. Genet., 45, 115–118.
54. Biggin,A., Holman,K., Brett,M., Bennetts,B. and Ades,L. (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum. Mutat., 23, 99.
55. Attanasio,M., Lapini,I., Evangelisti,L., Lucarini,L., Giusti,B., Porciani,M., Fattori,R., Anichini,C., Abbate,R., Gensini,G. et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin. Genet., 74, 39–46.
56. Loeys,B.L., Chen,J., Neptune,E.R., Judge,D.P., Podowski,M., Holm,T., Meyers,J., Leitch,C.C., Katsanis,N., Sharifi,N. et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat. Genet., 37, 275–281.
57. Houdayer,C., Dehainault,C., Mattler,C., Michaux,D., Caux-Moncoutier,V., Pages-Berhouet,S., d'Enghien,C.D., Lauge,A., Castera,L., Gauthier-Villars,M. et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum. Mutat., 29, 975–982.

58. Tournier,I., Vezain,M., Martins,A., Charbonnier,F., Baert-Desurmont,S., Olschwang,S., Wang,Q., Buisine,M.P., Soret,J., Tazi,J. et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum. Mutat., 29, 1412–1424.

59. Auclair,J., Busine,M.P., Navarro,C., Ruano,E., Montmain,G., Desseigne,F., Saurin,J.C., Lasset,C., Bonadona,V., Giraud,S. et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum. Mutat., 27, 145–154.
60. Di Blasi,C., He,Y., Morandi,L., Cornelio,F., Guicheney,P. and Mora,M. (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain, 124, 698–704.
61. Disset,A., Bourgeois,C.F., Benmalek,N., Claustres,M., Stevenin,J. and Tuffery-Giraud,S. (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum. Mol. Genet., 15, 999–1013.
62. Fackenthal,J.D., Cartegni,L., Krainer,A.R. and Olopade,O.I. (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am. J. Hum. Genet., 71, 625–631.
63. Fairbrother,W.G., Yeh,R.F., Sharp,P.A. and Burge,C.B. (2002) Predictive identification of exonic splicing enhancers in human genes. Science, 297, 1007–1013.
64. Mazoyer,S., Puget,N., Perrin-Vidoz,L., Lynch,H.T., Serova-Sinilnikova,O.M. and Lenoir,G.M. (1998) A BRCA1 nonsense mutation causes exon skipping. Am. J. Hum. Genet., 62, 713–715.
65. Nielsen,K.B., Sorensen,S., Cartegni,L., Corydon,T.J., Doktor,T.K., Schroeder,L.D., Reinert,L.S., Elpeleg,O., Krainer,A.R., Gregersen,N. et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am. J. Hum. Genet., 80, 416–432.
66. Zatkova,A., Messiaen,L., Vandenbroucke,I., Wieser,R., Fonatsch,C., Krainer,A.R. and Wimmer,K. (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum. Mutat., 24, 491–501.
67. den Dunnen,J.T. and Antonarakis,S.E. (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mutat., 15, 7–12.
68. Frederic,M.Y., Monino,C., Marschall,C., Hamroun,D., Faivre,L., Jondeau,G., Klein,H.G., Neumann,L., Gautier,E., Binquet,C. et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2), and genotype-phenotype correlations. Hum. Mutat., 30, 181–190.
69. Frederic,M.Y., Hamroun,D., Faivre,L., Boileau,C., Jondeau,G., Claustres,M., Beroud,C. and Collod-Beroud,G. (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum. Mutat., 29, 33–38.
70. Frank,V., Ortiz Bruchle,N., Mager,S., Frints,S.G., Bohring,A., du Bois,G., Debatin,I., Seidel,H., Senderek,J., Besbas,N. et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum. Mutat., 28, 638–639.
71. Anczukow,O., Buisson,M., Salles,M.J., Triboulet,S., Longy,M., Lidereau,R., Sinilnikova,O.M. and Mazoyer,S. (2008) Unclassified variants identified in BRCA1 exon 11: Consequences on splicing. Genes Chromosomes Cancer, 47, 418–426.
72. Ng,W., Loh,A.X., Teixeira,A.S., Pereira,S.P. and Swallow,D.M. (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br. J. Cancer, 99, 978–985.
73. Baala,L., Romano,S., Khaddour,R., Saunier,S., Smith,U.M., Audollent,S., Ozilou,C., Faivre,L., Laurent,N., Foliguet,B. et al. (2007) The Meckel-Gruber syndrome gene, MKS3, is mutated in Joubert syndrome. Am. J. Hum. Genet., 80, 186–194.
74. Habara,Y., Doshita,M., Hirozawa,S., Yokono,Y., Yagi,M., Takeshima,Y. and Matsuo,M. (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J. Biochem., 143, 303–310.
75. Aartsma-Rus,A., van Vliet,L., Hirschi,M., Janson,A.A., Heemskerk,H., de Winter,C.L., de Kimpe,S., van Deutekom,J.C., 't Hoen,P.A. and van Ommen,G.J. (2008) Guidelines for Antisense Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Mol. Ther., 17, 548–553.
76. Khan,S.G., Metin,A., Gozukara,E., Inui,H., Shahlavi,T., Muniz-Medina,V., Baker,C.C., Ueda,T., Aiken,J.R., Schneider,T.D. et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum. Mol. Genet., 13, 343–352.
77. Sharp,P.A. and Burge,C.B. (1997) Classification of introns: U2-type or U12-type. Cell, 91, 875–879.
78. Chasin,L.A. (2007) Searching for splicing motifs. Adv. Exp. Med. Biol., 623, 85–106.
79. Nalla,V.K. and Rogan,P.K. (2005) Automated splicing mutation analysis by information theory. Hum. Mutat., 25, 334–342.

80 BeroudC Tuffery-GiraudS MatsuoM HamrounDHumbertclaudeV MonnierN MoizardMP VoelckelMACalemardLM BoisseauP et al (2007) Multiexon skipping lead-ing to an artificial DMD protein lacking amino acids from exons 45through 55 could rescue up to 63 of patients with Duchennemuscular dystrophy Hum Mutat 28 196ndash202

81. (2007) What is the human variome project? Nat. Genet., 39, 423.

82. Kainulainen K, Karttunen L, Puhakka L, Sakai L and Peltonen L (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat. Genet., 6, 64–69.

83. Liu W, Qian C, Comeau K, Brenn T, Furthmayr H and Francke U (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe marfan syndrome. Hum. Mol. Genet., 5, 1581–1587.

84. Booms P, Cisler J, Mathews KR, Godfrey M, Tiecke F, Kaufmann UC, Vetter U, Hagemeier C and Robinson PN (1999) Novel exon skipping mutation in the fibrillin-1 gene: two 'hot spots' for the neonatal Marfan syndrome. Clin. Genet., 55, 110–117.

85. Wang M, Price C, Han J, Cisler J, Imaizumi K, Van Thienen MN, DePaepe A and Godfrey M (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum. Mol. Genet., 4, 607–613.

86. Godfrey M, Vandemark N, Wang M, Velinov M, Wargowski D, Tsipouras P, Han J, Becker J, Robertson W, Droste S et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am. J. Hum. Genet., 53, 472–480.

87. Wang M, Clericuzio CL and Godfrey M (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am. J. Hum. Genet., 59, 1027–1034.

88. Karttunen L, Ukkonen T, Kainulainen K, Syvanen AC and Peltonen L (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum. Mutat., Suppl. 1, S34–S37.

89. Kosaki K, Takahashi D, Udaka T, Kosaki R, Matsumoto M, Ibe S, Isobe T, Tanaka Y and Takahashi T (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am. J. Med. Genet. A, 140, 104–108; author reply 109–110.

90. Loeys BL, Schwarze U, Holm T, Callewaert BL, Thomas GH, Pannu H, De Backer JF, Oswald GL, Symoens S, Manouvrier S et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N. Engl. J. Med., 355, 788–798.

91. Tran VK, Takeshima Y, Zhang Z, Habara Y, Haginoya K, Nishiyama A, Yagi M and Matsuo M (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes but not in the skeletal muscle of a DMD patient. Hum. Genet., 120, 737–742.

92. Sharp A, Pichert G, Lucassen A and Eccles D (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum. Mutat., 24, 272.

93. Burrows NP, Nicholls AC, Richards AJ, Luccarini C, Harrison JB, Yates JR and Pope FM (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am. J. Hum. Genet., 63, 390–398.

94. Sinnreich M, Therrien C and Karpati G (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology, 66, 1114–1116.

95. Maslen C, Babcock D, Raghunath M and Steinmann B (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am. J. Hum. Genet., 60, 1389–1398.

96. Vivenza D, Guazzarotti L, Godi M, Frasca D, di Natale B, Momigliano-Richiardi P, Bona G and Giordano M (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J. Clin. Endocrinol. Metab., 91, 980–986.

97. Chavanas S, Gache Y, Vailly J, Kanitakis J, Pulkkinen L, Uitto J, Ortonne J and Meneguzzi G (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum. Mol. Genet., 8, 2097–2105.

98. Kuivenhoven JA, Weibusch H, Pritchard PH, Funke H, Benne R, Assmann G and Kastelein JJ (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J. Clin. Invest., 98, 358–364.

99. Webb JC, Patel DD, Shoulders CC, Knight BL and Soutar AK (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum. Mol. Genet., 5, 1325–1331.

100. Di Leo E, Panico F, Tarugi P, Battisti C, Federico A and Calandra S (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum. Mutat., 24, 440.

101. Vuillaumier-Barrot S, Le Bizec C, De Lonlay P, Madinier-Chappat N, Barnier A, Dupre T, Durand G and Seta N (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol. Genet. Metab., 87, 337–340.

102. Janssen RJ, Wevers RA, Haussler M, Luyten JA, Steenbergen-Spanjers GC, Hoffmann GF, Nagatsu T and Van den Heuvel LP (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann. Hum. Genet., 64, 375–382.

103. Mayer K, Ballhausen W, Leistner W and Rott H (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim. Biophys. Acta, 1502, 495–507.

splicing signal requirements of an exon, but that are never considered as proper exons by the spliceosome. Furthermore, human transcripts contain many 'decoy' splice sites that are seldom used. So, while 5′ and 3′ splicing signals are mandatory for exon definition, they are not sufficient for correct splicing. In order to reliably distinguish authentic exons and splice sites from pseudo-exons and decoy splice sites, the splicing machinery must rely on auxiliary sequence features such as intronic and exonic cis-elements. Among them, the Exonic Splicing Enhancers (ESEs) are the most studied. They are specific short nucleotide sequences that are targeted essentially by Serine/Arginine-rich (SR) proteins, which then promote exon definition (9). Conversely, the Exonic Splicing Silencers (ESSs) help the spliceosome to ignore pseudo-exons and decoy splice sites. They act as binding sites for proteins promoting exon exclusion (mainly hnRNP proteins) (10). Intronic Splicing Enhancers (ISEs) and Intronic Splicing Silencers (ISSs) are intronic cis-elements that play roles similar to those of ESEs and ESSs.

Several bioinformatics tools to study or predict splice signals have been developed and are now available online. Their approaches vary (11), from using blastn to align a query sequence to a database of alternative splicing events and splice signals (12) to ab initio prediction (13). Despite the quality of these tools, and because of the complexity of the sequence signals harbored by any mRNA sequence, new tools are needed to simultaneously identify putative donor and acceptor splice sites, branch points and cis-acting elements (ESE, ESS, ISE and ISS). In addition, since many human disease-causing mutations affect splicing, new bioinformatics tools should also be able to predict the consequence of mutations on splice signals. Such a tool could be of great value not only for geneticists, to better understand splicing events and the effect of mutations on mRNA splicing, but also for clinical researchers, to design new therapeutic approaches based on splicing interference, such as the exon-skipping strategy used in Duchenne Muscular Dystrophy (DMD) (14) or gene and exon silencing through manipulation of mRNA splicing (15).

In this article, we present a new bioinformatics tool, the Human Splicing Finder (HSF) software, that is freely available online (http://www.umd.be/HSF). It includes new algorithms derived from the Universal Mutation Database (UMD) (16,17) to allow the evaluation of the strength of 5′ss, 3′ss and branch points. In addition, in order to identify cis-acting elements, it includes already published algorithms such as RESCUE-ESE (18) and ESE-Finder (19), as well as new algorithms designed to use available or newly created matrices. To allow the study of virtually any human sequence, HSF includes all genes and alternative transcripts, as well as intronic sequences, that were extracted from the Ensembl human genome database (http://www.ensembl.org) (20). To evaluate the predictive potential of the HSF web interface (version 2.4; http://www.umd.be/HSF), we used a set of mutations for which the effect on splicing has been experimentally demonstrated.

MATERIAL AND METHODS

Software development and database design

HSF was developed using the 4D package (4D SA) for data management, algorithm design and web interface. The HSF database was designed to include the introns and exons of all human genes. It was constructed from an Ensembl dataset (20) containing more than 22 000 genes and 46 000 transcripts of Homo sapiens (release 44; http://april2007.archive.ensembl.org) using Biomart (20). Genes were created from the crude dataset using both Ensembl transcript coordinates and sequences from the UCSC genome browser database (21). At present, the HSF database only contains human genes, since matrices and tools were specifically designed for the human genome.

To study the potential effects of single nucleotide polymorphisms (SNPs) on splicing, HSF also harbors data extracted from the Ensembl Variation database (20). For this, a Perl script was developed using the Ensembl Perl API that allows HSF to directly query the Ensembl Variation database and retrieve SNPs located in human genes.

Splicing donor/acceptor sites

To predict potential 5′ss and 3′ss, we used matrices derived from Shapiro and Senapathy (22). A potential splice site is defined as an n-mer sequence. For each of the n positions, a weight is given to each nucleotide based on its frequency and the relative importance of its position in the sequence motif (position weight matrices, PWM). The strength of a site is thus defined as the sum of each nucleotide's weight plus a constant that is used for normalization (Equation 1). Only n-mer sequences with consensus values (CV) higher than or equal to a given threshold are considered as potential 5′ss or 3′ss.

Since the human 5′ consensus sequence is [CA]AGgt[ag]agt, we defined the 5′ss as a 9-mer matrix. Similarly, the 3′ss was defined as a 14-mer matrix. Calculation of the strength of a potential splice site (for 5′ss, x = 9 and for 3′ss, x = 14):

$$\text{Site strength} = \text{Base value} + \sum_{i=1}^{x} \text{nucleotide value}(i) \qquad (1)$$

HSF also includes an algorithm adapted from the MaxEnt script (23) that allows the analysis of a whole sequence. In addition, for this matrix, users can define thresholds for splice site prediction.

Branch point sequences

Since the human branch point (BP) consensus sequence is YNYCRAY (24), we defined the BP sequence as a 7 × 4 position weight matrix (Figure 1). The threshold for BP sequences was fixed at 67. The strength of a BP sequence was thus calculated as follows (Equation 2):

$$\text{BP site strength} = \text{Base value} + \sum_{i=1}^{7} \text{nucleotide value}(i) \qquad (2)$$

Since many intronic sequences match the BP consensus sequence, we included the AG-Exclusion Zone algorithm described by Gooding et al. (25) to predict BP candidates. For a given intronic sequence and its intron-exon boundary, HSF searches for all AG dinucleotides that are included in a 3′ss candidate sequence (threshold of 67) and that therefore define the exclusion zones. As it has been shown that the BP allows the recognition of the first downstream 3′ss, HSF annotates the functional BP as the strongest candidate without a 3′-exclusion zone before the natural 3′ss.

Additionally, to take into account the steric obstruction caused by the spliceosome, we excluded BP sequences located less than 12 nt from the exon. Finally, as most BP sequences are located between 21 and 34 nt from the exon (26), only a window of 100 bp is processed. We arbitrarily excluded the possibility of a BP motif located very far away in order to save computation time.
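The filtering steps described above can be sketched in Python as follows. This is a deliberately simplified illustration, not the HSF implementation: candidates are found by a plain YNYCRAY consensus match rather than by the weight matrix, the distance is measured from the branch adenosine to the exon (an assumption, since the text does not say which base is used), and the exclusion-zone test flags any intervening AG instead of only those AGs that belong to a 3′ss candidate with a CV of at least 67. The function name and return format are hypothetical.

```python
import re

# Y = C/T, R = A/G, N = any base. The lookahead allows overlapping heptamers.
BP_CONSENSUS = re.compile(r"(?=([CT][ACGT][CT]C[AG]A[CT]))")


def branch_point_candidates(intron_3prime: str, window: int = 100,
                            min_distance: int = 12):
    """Return (distance, heptamer, ag_free) tuples for consensus BP matches.

    intron_3prime: intronic sequence ending at the last intronic base
                   (the G of the natural acceptor AG).
    distance:      position of the branch A counted from the exon
                   (1 = last intronic base).
    ag_free:       True if no AG lies between the heptamer and the natural AG
                   (simplified stand-in for the AG-exclusion zone test).
    """
    seq = intron_3prime.upper()[-window:]   # only the last `window` bases
    results = []
    for match in BP_CONSENSUS.finditer(seq):
        start = match.start()
        a_index = start + 5                 # branch A is the 6th base of YNYCRAY
        distance = len(seq) - a_index
        if distance < min_distance:         # too close to the exon
            continue
        downstream = seq[start + 7:len(seq) - 2]
        results.append((distance, match.group(1), "AG" not in downstream))
    return results
```

In a fuller version, each surviving heptamer would be scored with the BP matrix and the strongest AG-free candidate retained.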

Matrices for splicing enhancers and silencers

To maximize the detection of auxiliary motifs, HSF integrates (i) matrices for SR proteins (SRp40, SC35, SF2/ASF, SF2/ASF IgM/BRCA1 and SRp55) from the ESE Finder tool (19,27); (ii) sequence motifs shown to be differentially present in exons and introns, such as the RESCUE-ESE hexamers (18), the putative 8-mer ESE and ESS identified by Zhang and Chasin (28), the ESR sequences identified by Goren and co-workers (29) and the exon-identity elements (EIE) and intron-identity elements (IIE) defined by Zhang and co-workers (30). For the silencer sequences identified by Sironi and colleagues (31) and the ESS decamers (32), for which no web-based tool was available, we developed new algorithms to use the crude data.

New matrices were also created to predict hnRNP A1, Tra2-β and 9G8 protein binding motifs. These matrices were designed using published data collected from SELEX experiments and consensus sequences. Sequences were aligned with ClustalW (33) to generate a consensus motif. Note that these motifs were too short to be processed with MEME (34). The consensus sequences were then used to design PWM matrices (Figure 2).
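The article does not state how weights were derived from the aligned sequences. One common way to turn a gap-free alignment of binding sites into a weight matrix is a log-odds transformation of the per-position nucleotide frequencies, sketched below in Python. The pseudocount, the uniform background frequency and the function name are assumptions chosen for illustration and may differ from what was done for HSF.

```python
import math
from collections import Counter


def build_log_odds_pwm(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds matrix from equal-length, gap-free aligned sites.

    Returns one dict per alignment column, mapping each nucleotide to
    log2(observed frequency / background frequency).
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    total = len(aligned_sites) + 4 * pseudocount  # pseudocount added per base
    pwm = []
    for i in range(length):
        counts = Counter(site[i].upper() for site in aligned_sites)
        pwm.append({
            nt: math.log2(((counts[nt] + pseudocount) / total) / background)
            for nt in "ACGT"
        })
    return pwm
```

For example, `build_log_odds_pwm(["ACG", "ACG", "ATG"])` (made-up sites, not SELEX data) gives positive weights to the nucleotides that dominate each column and negative weights to those that are absent.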

Sequence datasets used to evaluate HSF efficiency

To evaluate the new algorithms dedicated to the prediction of 5′ss and 3′ss, we used the Ensembl database (20), which contains 245 286 human exons (release 44; http://april2007.archive.ensembl.org). For BP predictions, we used a set of 14 experimentally validated BPs (Table 3). These datasets were completed by 69 intronic mutations (35–56) as well as 15 exonic mutations known to alter 5′ss and 3′ss (57,58), for which the impact on mRNA splicing has been characterized in vivo or in vitro. To evaluate the ability to correctly predict ESE and ESS, we used a set of 20 experimentally validated mutations that affect splicing by a direct effect on ESE and/or ESS (58–66). In addition, we used a set of 36 mutations previously reported to alter splicing (positive controls) and 220 SNPs (negative controls). The negative controls were extracted from the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP) and corresponded to SNPs with the highest minor allele frequency, which therefore had a minimal risk of affecting splicing. Conversely, the positive controls were chosen because experimental results showed that these mutations targeted auxiliary splicing sequence motifs. Nevertheless, in most cases, the data about the exact motif and/or the protein that recognizes this motif were not available. For each mutation, we evaluated only its effects in terms of disruption of ESE or creation of ESS signals (Supplementary Table 1).

RESULTS

Web interface and database

The HSF web interface was designed to maximize the perception of efficiency and ease of use by end users.

Figure 2. New position weight matrices of recognition motifs for proteins involved in splicing: (A) hnRNP A1, (B) Tra2-β and (C) 9G8.

Figure 1. Branch point matrix. The size of each nucleotide is proportional to its weight in the position weight matrix. Nucleotides above the base line have positive values, while nucleotides below have negative values.

Only default parameters are displayed on the submission form, while skilled users can easily access advanced parameters. Sequences stored in the database can be queried using either the gene symbol, the Ensembl gene ID, the Ensembl transcript ID, the RefSeq peptide ID or the consensus CDS. In addition, users can process their own sequences, either for simple sequence analysis or for mutant comparison. HSF can also be queried in different ways: full analysis of a sequence, comparison of a mutant and a wild-type sequence, or simultaneous analysis of several mutants related or not to the same transcript. In this last case, all mutations should refer to sequences included in the HSF database. In order to easily study a group of mutations from different genes and transcripts, the mutations must be described using the international nomenclature system for cDNA mutations (67) (http://www.genomic.unimelb.edu.au/mdi/mutnomen). HSF will then check that each mutation is correctly described and automatically reconstruct the mutant allele from the wild-type sequence and the mutation name. Since only small rearrangements (i.e. substitutions, small exonic or intronic deletions and insertions, duplications and indels) provide useful information about splicing defects, large rearrangements cannot be processed by HSF.

Moreover, unlike previous resources, the user can specifically analyze BP sequences or splice site motifs using HSF-specific matrices and algorithms.
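To make the idea of reconstructing a mutant allele from a wild-type sequence and a mutation name concrete, here is a minimal Python sketch restricted to simple exonic substitutions written as `c.<position><ref>><alt>`. It is not HSF code: the function name is hypothetical, intronic positions (such as c.247+1G>A), deletions, insertions, duplications and indels are not handled, and the supplied sequence is assumed to start at the A of the ATG initiation codon.

```python
import re

# Matches simple coding-sequence substitutions such as "c.4G>A".
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_substitution(cdna: str, mutation: str) -> str:
    """Return the mutant cDNA for a simple exonic substitution.

    Position 1 is the A of the ATG initiation codon, so `cdna` must start there.
    The reference base named in the mutation is checked against the sequence.
    """
    match = SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unsupported or malformed mutation name: {mutation}")
    position = int(match.group(1))
    ref, alt = match.group(2), match.group(3)
    if not 1 <= position <= len(cdna):
        raise ValueError("position lies outside the supplied sequence")
    found = cdna[position - 1].upper()
    if found != ref:
        raise ValueError(
            f"reference base mismatch at c.{position}: expected {ref}, found {found}"
        )
    return cdna[:position - 1] + alt + cdna[position:]
```

Checking the reference base before substituting mirrors the validation step described above, in which a mutation name is verified against the stored wild-type sequence.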

The main result page was divided into three areas: the reference sequence(s), various graphical displays and tables. Since mutations could have different effects related to the local context, a 'quick mutation' option allows the addition of a small rearrangement (missense, deletion, insertion, duplication, indel) to the sequence(s).

Splicing donor/acceptor sites

Figure 3. Distribution of CVs for (A) 3′ and (B) 5′ natural splice sites (5′ss and 3′ss). Data extracted from the Ensembl dataset (release 44; http://april2007.archive.ensembl.org) (20) using the HSF algorithm.

The new HSF algorithm to define consensus values (CV) of 5′ss or 3′ss was created to maximize the difference between wild-type (wt) active sites and mutant inactive sites. Thus, strong sites presented a CV higher than 80, and less strong sites a CV ranging between 70 and 80. Only a minor fraction of active sites showed a CV between 65 and 70 (Figure 3). The mean CV for 3′ss was 86.81 with a standard deviation of 6.33, while the mean CV for 5′ss was 87.53 with a standard deviation of 8.34. These values were calculated from more than 400 000 natural splice sites extracted from all alternative transcripts. If a mutation directly affects the CV, it is critical to consider not only the CV of the mutant splice site but also the delta between the wt and mutant CV. To validate this algorithm, we used a set of 69 intronic mutations that affect either the canonical AG/GT splice site motifs or less conserved nucleotides (Table 1). All mutations affecting the nucleotides in canonical positions (-2, -1, +1 or +2) strongly influenced the CV value with an average

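Table 1. Intronic mutations in FBN1 (ENST00000316623), FBN2 (ENST00000262464), RB1 (ENST00000267163), TGFBR2 (ENST00000295754), MLH1 (ENST00000231790) and MSH2 (ENST00000233146) that lead to splicing defects

Mutations causing exon skipping

| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) | Footnote |
|---|---|---|---|---|---|---|
| FBN1 | c.247+1G>A | (37,46,49–51,53) | 82.26 | 55.42 | -32.62 | a |
| FBN1 | c.538+1G>A | (45) | 83.99 | 57.15 | -31.96 | a |
| FBN1 | c.1468+5G>A | (44) | 84.46 | 72.30 | -14.40 | a |
| FBN1 | c.3208+5G>T | (82) | 94.98 | 82.66 | -12.97 | a |
| FBN1 | c.3838+1G>A | (52) | 95.84 | 69.01 | -28.00 | a |
| FBN1 | c.3839-1G>T | (83) | 87.62 | 58.67 | -33.04 | a |
| FBN1 | c.3964+1G>A | (84,85) | 90.04 | 63.20 | -29.80 | a |
| FBN1 | c.3965-2A>T | (85) | 89.30 | 60.35 | -32.41 | a |
| FBN1 | c.4459+1G>A | (44) | 97.66 | 70.83 | -27.47 | a |
| FBN1 | c.4943-1G>C | (44) | 79.77 | 50.82 | -36.29 | a |
| FBN1 | c.5788+5G>A | (35,36,38,41,43,52,54,83) | 88.06 | 75.89 | -13.82 | a |
| FBN1 | c.6163+2del6 | (83) | 99.05 | 72.90 | -26.40 | a |
| FBN1 | c.6496+2insTG | (43) | 82.21 | 32.05 | -61.01 | a |
| FBN1 | c.6616+1G>C | (86) | 78.08 | 51.24 | -34.37 | a |
| FBN1 | c.6997+1G>A | (83) | 92.11 | 65.27 | -29.13 | a |
| FBN1 | c.7205-2A>G | (83) | 84.11 | 55.16 | -34.42 | a |
| FBN1 | c.7330+1G>A | (55) | 98.02 | 71.18 | -27.38 | a |
| FBN1 | c.7331-2A>G | (40) | 80.72 | 51.77 | -35.86 | a |
| FBN1 | c.8051+1G>A | (44) | 92.02 | 65.18 | -29.16 | a |
| FBN1 | c.8051+5G>A | (51) | 92.02 | 79.85 | -13.22 | a |
| FBN1 | c.8052-2A>G | (52) | 92.86 | 63.92 | -31.17 | a |
| FBN2 | c.3472+2T>G | (48) | 90.99 | 64.15 | -28.53 | a |
| FBN2 | c.4099+1G>C | (39) | 91.66 | 64.82 | -29.28 | a |
| FBN2 | c.4222+5G>A | (47) | 92.11 | 79.94 | -13.21 | a |
| FBN2 | c.4346-2A>T | (87) | 90.91 | 61.96 | -31.84 | a |
| RB1 | c.264+4delA | (57) | 91.34 | 84.93 | -7.01 | a |
| RB1 | c.380+3A>C | (57) | 95.10 | 78.82 | -17.12 | a |
| RB1 | c.607+1G>T | (57) | 99.05 | 72.21 | -27.09 | a |
| RB1 | c.939+4A>G | (57) | 83.75 | 75.41 | -9.96 | a |
| RB1 | c.1049+2delT | (57) | 76.95 | 57.00 | -25.90 | a |
| RB1 | c.1215+1G>A | (57) | 85.86 | 59.02 | -31.26 | a |
| RB1 | c.1389+1G>A | (57) | 82.69 | 55.86 | -32.45 | a |
| RB1 | c.1389+4A>G | (57) | 82.69 | 74.35 | -10.09 | a |
| RB1 | c.1389+5G>A | (57) | 82.69 | 70.53 | -14.71 | a |
| RB1 | c.1422-2A>T | (57) | 86.12 | 57.17 | -33.62 | a |
| RB1 | c.1422-1G>A | (57) | 86.12 | 57.17 | -33.62 | a |
| RB1 | c.1498+5G>A | (57) | 82.91 | 70.75 | -14.67 | a |
| RB1 | c.1960+1G>A | (57) | 94.02 | 67.19 | -28.54 | a |
| RB1 | c.1960+1delG | (57) | 94.02 | 49.62 | -47.22 | a |
| RB1 | c.2211+1G>T | (57) | 89.90 | 63.06 | -29.86 | a |
| RB1 | c.2212-2A>G | (57) | 89.09 | 60.15 | -32.48 | a |
| RB1 | c.2211+1G>C | (57) | 89.90 | 63.06 | -29.86 | a |
| RB1 | c.2520+1G>A | (57) | 92.22 | 65.39 | -29.10 | a |
| RB1 | c.2520+3del4 | (57) | 92.22 | 72.26 | -21.64 | a |
| RB1 | c.2663+1G>A | (57) | 88.37 | 61.54 | -30.36 | a |
| MLH1 | c.306+4A>G | (58) | 96.07 | 87.73 | -8.68 | a |
| MLH1 | c.454-2A>G | (59) | 93.59 | 64.64 | -28.80 | a |
| MLH1 | c.790+1G>A | (59) | 83.28 | 56.45 | -32.22 | a |
| MLH1 | c.790+5G>T | (58) | 83.28 | 70.97 | -14.79 | a |
| MLH1 | c.791-5T>G | (59) | 80.80 | 77.17 | -4.49 | |
| MLH1 | c.884+4A>G | (58) | 85.75 | 77.41 | -9.73 | a |
| MSH2 | c.366+1G>T | (59) | 86.73 | 59.89 | -30.95 | a |
| MSH2 | c.793-2A>C | (59) | 83.98 | 55.04 | -34.46 | a |
| MSH2 | c.942+3A>T | (59) | 99.24 | 83.86 | -15.50 | a |
| MSH2 | c.1276+2T>A | (59) | 84.70 | 57.86 | -31.69 | a |
| MSH2 | c.1386+1G>A | (59) | 89.02 | 62.19 | -30.13 | a |
| MSH2 | c.2634+5G>T | (58) | 84.41 | 72.09 | -14.59 | a |

Mutations resulting in the usage of cryptic splice sites

| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) | Cryptic site (CS) and footnote |
|---|---|---|---|---|---|---|
| FBN1 | c.2293+2T>C | (83) | 89.77 | 62.94 | -29.89 | CV 67.21; 5′ CS (51 nt upstream); b |
| FBN1 | c.3463+1G>A | (88) | 91.34 | 64.50 | -29.38 | CV 88.47; 5′ CS (27 nt downstream); b |
| FBN1 | c.4747+5G>T | (42) | 89.13 | 76.81 | -13.82 | CV 79.06; 5′ CS (48 nt upstream); b |
| FBN1 | c.5788+1G>A | (52) | 88.06 | 61.22 | -30.48 | CV 82.64; 5′ CS (33 nt downstream); b |
| RB1 | c.138-8T>G | (57) | 81.62 | 79.69 | -2.36 | |

(continued)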

reduction (ΔCV) of -31% and a standard deviation (SD) of 2.8%. Mutations affecting less conserved residues had a weaker effect, with a ΔCV of -7% for the residue in position +4 and -14% for nucleotides in position +3 or +5. These results, together with data from other disease-causing mutations (52,68,69), indicated that a CV reduction of at least 10% for a mutation in any position, or of 7% for a mutation in position +4, is likely to have a significant impact on splicing and should be further investigated.

Since a mutation can result not only in the disruption of a 5′ss or a 3′ss but also in the creation of a new splice site, HSF evaluates the 'creation of cryptic splice sites'. As shown in Table 1 for intronic mutations, HSF correctly predicted the creation of cryptic splice sites in the RB1 mutants c.607+1delG, c.138-8T>G and c.501-1G>A. Mutations in canonical sequences, such as c.95-2A>G, c.1397-2A>G and c.1397-1G>A in TGFBR2; c.2293+2T>C, c.3463+1G>A, c.4747+5G>T and c.5788+1G>A in FBN1; and c.1815-2A>G, c.2107-2A>G and c.2211+1G>C in RB1, led to a more complex splicing defect in which disruption of the wt splice site was coupled to the usage of an alternative pre-existing splice site. As mutations do not directly affect alternative splice sites, this phenomenon was not automatically investigated by HSF. Therefore, to identify the alternative splice sites, we chose in 'Select an analysis type' the option 'Number of nucleotides surrounding the exon' and entered the value '100'. In addition, we checked the advanced parameter 'Process sequence' and selected the 'Full sequence' option. To analyze only splice sites, we then selected in 'All or subset of matrices' the 'Splice site matrices' option. Using these parameters, all alternative sites were identified either as the closest and strongest alternative sites (five cases) or as the second-best sites (two cases). Overall, HSF correctly predicted the impact of mutations affecting 5′ss or 3′ss, even when complex mechanisms were involved.

In addition to splicing defects due to 5′ss or 3′ss disruption, it is well known that exonic mutations could result in the creation or activation of cryptic splice sites. As shown in Table 2, the nine mutations affecting the last base of an exon had a strong effect on the activity of the concerned 5′ss (ΔCV = -12 ± 0.7%) that resulted in exon skipping or activation of a cryptic splice site. The two mutations affecting the penultimate nucleotide of an exon had a limited effect on the activity of the 5′ss (ΔCV = -5.4 ± 0.3%). Indeed, these mutations were pathogenic only when a cryptic splice site was activated, and predictions were therefore uncertain. Finally, exonic mutations that were distant from both the 5′ss and the 3′ss could activate a cryptic splice site and result in splicing defects, as shown for mutations c.658C>G in RB1, c.1915C>T in MSH2 and c.5985T>G in DMD.
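The CV variation and the reduction thresholds discussed in this section can be expressed in a few lines of Python. The formula below (percentage change relative to the wild-type CV) is inferred from the values listed in Table 1, which it reproduces to within rounding; the function names and the boolean flag for position +4 are hypothetical and only serve the illustration.

```python
def cv_variation(wt_cv: float, mutant_cv: float) -> float:
    """Percentage change of the consensus value relative to the wild type."""
    return (mutant_cv - wt_cv) / wt_cv * 100.0


def needs_follow_up(wt_cv: float, mutant_cv: float,
                    position_plus4: bool = False) -> bool:
    """Apply the reduction thresholds quoted in the text.

    A reduction of at least 10% flags the mutation for any position;
    for a mutation at position +4 the threshold is relaxed to 7%.
    """
    threshold = -7.0 if position_plus4 else -10.0
    return cv_variation(wt_cv, mutant_cv) <= threshold
```

For instance, `cv_variation(82.26, 55.42)` corresponds to the FBN1 c.247+1G>A row of Table 1, and `needs_follow_up(91.34, 84.93, position_plus4=True)` to RB1 c.264+4delA.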

Branch point sequences

We analyzed 14 BP sequences previously reported to be abolished by mutations. As shown in Table 3, 13 out of 14 BPs were correctly predicted by HSF, with an average strength of 83.4 and a standard deviation of 8.6. The only discrepancy concerned the mutation localized in intron 3 of GH1, for which the BP was predicted to be at position -26 by HSF instead of position -21. Note that in both cases the BP was located within the c.468-37_468-16del, which is responsible for the cases of autosomal dominant isolated GH deficiency (IGHD II) in one single family; therefore, additional data are needed to identify the functional BP. Among the other BP sequences, 12 were reported as targets of point mutations leading to their inactivation. In six cases, the mutation involved the critical adenosine residue, leading to a remarkable ΔBP of -29.6%. For mutations involving residues surrounding the BP, the average ΔBP was -13.9% with a SD of 3%. Taking into account the weight matrix (Figure 1) and experimental data, the threshold for BP prediction was thus set at 67.

Table 1. Continued

| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) | Cryptic site (CS) and footnote |
|---|---|---|---|---|---|---|
| RB1 | c.138-8T>G (continued) | | | | | CV 55.35 (wt), 84.29 (mutant); 3′ CS (7 nt upstream); c |
| RB1 | c.501-1G>A | (57) | 97.50 | 68.55 | -29.69 | CV 54.82 (wt), 83.77 (mutant); 3′ CS (1 nt downstream); c |
| RB1 | c.607+1delG | (57) | 99.05 | 22.54 | -77.24 | CV 42.51 (wt), 88.47 (mutant); 5′ CS (1 nt upstream); c |
| RB1 | c.1815-2A>G | (57) | 75.42 | 46.47 | -38.39 | CV 81.84; 3′ CS (19 nt downstream); b |
| RB1 | c.2107-2A>G | (57) | 80.73 | 51.78 | -35.86 | CV 69.56; 3′ CS (35 nt downstream); b |
| TGFBR2 | c.95-2A>G | (56) | 91.77 | 62.82 | -31.55 | CV 68.28; 3′ CS (18 nt downstream); b |
| TGFBR2 | c.1397-2A>G | (89) | 92.32 | 63.38 | -31.35 | CV 84.32; 3′ CS (30 nt upstream); b |
| TGFBR2 | c.1397-1G>A | (90) | 92.32 | 63.38 | -31.35 | CV 84.32; 3′ CS (30 nt upstream); b |

CS, cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
a: The mutation induces exon skipping.
b: A cryptic splice site not created by the mutation and used in vivo was correctly predicted by HSF.
c: The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.

Auxiliary splicing sequences: enhancers and silencers

In order to simplify the interpretation of predictions obtained with the different algorithms using weight matrices, we used a normalized range scale from 0 to 100. As a consequence, previous matrices from ESE-Finder (19,27) were modified. Nevertheless, the user can define the thresholds using either the original ESE-Finder range or the new 0–100 range. In addition, when processing a single sequence and when CVs are available, HSF calculates the deviation as a percentage of the threshold. A reduced list can be obtained for each matrix by choosing the 'Only variant' option in 'Advanced parameters'. A color code is used for each quartile (from white to orange) to simplify the analysis. When comparing mutant sequences, HSF uses this color code to indicate the differences between the two sequences. When scalability is not possible, HSF only displays the presence of a motif.

To evaluate the sensitivity and usefulness of auxiliary splicing sequence predictions, we used a first set of genes for which 20 mutations have been reported to result in exon skipping following targeting of ESE or ESS (58–66). For each mutation, we selected the default option that allows HSF to predict modifications of ESE and/or ESS motifs using all available matrices (Table 4). For mutations c.362C>T in ACADM and c.4250T>A in DMD, for which the target auxiliary sequences have been experimentally characterized (SF2/ASF and hnRNP A1, respectively), HSF correctly predicted the effect of the mutation. For other sequences, different scenarios were predicted: (i) disruption of one or more ESE without creation of an ESS, as observed for mutations c.882C>T (MLH1), c.362C>T (ACADM), c.8165C>G and

Table 2. Exonic mutations in DMD (ENST00000357033), MLH1 (ENST00000231790), MSH2 (ENST00000233146) and RB1 (ENST00000267163) involved in splicing

| Gene | Mutation | Position | References | WT CV | Mutant CV | CV variation (%) | Cryptic site (CS) and footnote |
|---|---|---|---|---|---|---|---|
| DMD | c.5985T>G | Deep exonic | (91) | 46.65 | 75.59 | | 3′ CS (63 nt downstream); a |
| MLH1 | c.677G>A | Last base | (58) | 84.46 | 73.89 | -12.52 | b |
| MLH1 | c.882C>T | Exonic | (58) | 84.46 | 73.89 | -12.52 | b |
| MLH1 | c.1037A>G | Penultimate base | (58) | 93.04 | 88.19 | -5.22 | 5′ CS (upstream); c |
| MLH1 | c.1038G>T | Last base | (58) | 93.04 | 82.17 | -11.68 | 5′ CS (upstream); c |
| MLH1 | c.1667G>T | Last base | (92) | 85.85 | 74.99 | -12.66 | 5′ CS (88 nt downstream); a |
| MLH1 | c.1731G>A | Last base | (58) | 93.27 | 82.69 | -11.34 | |
| MLH1 | c.1989G>T | Last base | (58) | 93.22 | 82.35 | -11.66 | |
| MSH2 | c.1660A>T | Penultimate base | (58) | 84.00 | 79.25 | -5.65 | 5′ CS (82 nt upstream); a |
| MSH2 | c.1759G>C | Last base | (58) | 85.66 | 74.65 | -12.86 | b |
| MSH2 | c.1915C>T | Deep exonic | (59) | 62.19 | 89.02 | | 5′ CS (92 nt upstream); a |
| RB1 | c.658C>G | Deep exonic | (57) | 58.66 | 85.49 | | 5′ CS (61 nt upstream); a |
| RB1 | c.939G>T | Last base | (57) | 83.75 | 72.88 | -12.98 | b |
| RB1 | c.1960G>C | Last base | (57) | 94.02 | 83.01 | -11.71 | b |
| RB1 | c.1960G>A | Last base | (57) | 94.02 | 83.44 | -11.25 | b |

CS, cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
a: The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.
b: The mutation induces exon skipping.
c: The cryptic splice site used in vitro was not clearly reported and therefore was not available for comparison.

Table 3. Branch point sequences

Gene | Intron | References | Ref BP | Ref Seq | HSF BP | HSF value
COL5A1 | 32 | (93) | −27 | ENST00000355306 | −27 | 87.81
DYSF | 31 | (94) | −33 | ENST00000258104 | −33 | 93.13
FBN2 | 30 | (95) | −24 | ENST00000262464 | −24 | 77.06
GH1 | 3 | (96) | −21 | ENST00000323322 | −26 | 73.36
ITGB4 | 31 | (97) | −17 | ENST00000200181 | −17 | 93.79
LCAT | 4 | (98) | −20 | ENST00000264005 | −20 | 95.07
LDLR | 9 | (99) | −25 | ENST00000252444 | −25 | 86.59
NPC1 | 6 | (100) | −28 | ENST00000269228 | −28 | 77.41
PMM2 | 2 | (101) | −25 | ENST00000268261 | −25 | 80.56
PMM2 | 7 | (101) | −23 | ENST00000268261 | −23 | 72.27
RB1 | 23 | (57) | −26 | ENST00000267163 | −26 | 75.89
TH | 11 | (102) | −22 | ENST00000324155 | −22 | 84.96
TSC2 | 38 | (103) | −18 | ENST00000219476 | −18 | 67.71
XPC | 3 | (76) | −24 | ENST00000285021 | −24 | 82.78

For each gene, the reference sequence from the Ensembl genome database (Ref Seq), the intron number (Intron) and the position of the BP identified by in vitro experiments (Ref BP), as well as the BP position predicted by HSF (HSF BP) and the corresponding BP value (HSF value), are shown.
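The agreement between experimental and predicted branch points in Table 3 can be tallied directly from the table. The sketch below transcribes the 14 rows and counts concordant positions. Writing the positions as negative offsets from the 3′ss is an assumed convention, and the detection threshold of 67 is the value quoted in the Discussion.

```python
# Sketch: tally how many branch points (BP) in Table 3 HSF placed at the
# experimentally determined position.
# Assumption: positions are offsets upstream of the 3' splice site (negative).

TABLE3 = [
    # (gene, intron, reference BP, HSF BP, HSF value)
    ("COL5A1", 32, -27, -27, 87.81),
    ("DYSF", 31, -33, -33, 93.13),
    ("FBN2", 30, -24, -24, 77.06),
    ("GH1", 3, -21, -26, 73.36),
    ("ITGB4", 31, -17, -17, 93.79),
    ("LCAT", 4, -20, -20, 95.07),
    ("LDLR", 9, -25, -25, 86.59),
    ("NPC1", 6, -28, -28, 77.41),
    ("PMM2", 2, -25, -25, 80.56),
    ("PMM2", 7, -23, -23, 72.27),
    ("RB1", 23, -26, -26, 75.89),
    ("TH", 11, -22, -22, 84.96),
    ("TSC2", 38, -18, -18, 67.71),
    ("XPC", 3, -24, -24, 82.78),
]

BP_THRESHOLD = 67.0  # detection threshold quoted in the Discussion

concordant = [row for row in TABLE3 if row[2] == row[3]]
discordant = [row for row in TABLE3 if row[2] != row[3]]
lowest_value = min(row[4] for row in TABLE3)

print(f"Concordant predictions: {len(concordant)}/{len(TABLE3)}")
print("Discordant:", [(gene, intron) for gene, intron, *_ in discordant])
print(f"Lowest HSF value: {lowest_value} (threshold {BP_THRESHOLD})")
```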

Table 4. Exonic mutations known to result in exon skipping through ESE inactivation or ESS activation

Gene | Mutation | Ref. | Motif | Ref Seq | HSF prediction
ACADM | c.362C>T | (65) | −ESE (SF2/ASF) | ENST00000370841 | −9G8[i] (357_362); −SF2/ASF[e] (358_364); +EIE[h] (359_364); −SRp40[e] (359_365); −EIE[h] (360_365); +IIE[c] 4 (359_367)
BRCA1 | c.5080G>T | (64) | | ENST00000357654 | −EIE[h] (5075_5080); +SRp55[e] (5076_5081); −9G8[i] (5077_5082); −SF2/ASF[e] (5078_5085); −IIE[c] (5078_5083); +IIE[c] (5079_5084); −ESS[a] (5076_5083); +hnRNP A1[d] (5080_5085)
BRCA2 | c.8165C>G | (62) | −ESE | ENST00000380152 | −SRp40[e] (8162_8168); −ESE[f] (8163_8168); +ESE[f] 2 (8164_8170); −SRp55[e] (8163_8169); −SF2/ASF[e] (8165_8171); −EIE[h] 4 (8160_8168)
BRCA2 | c.5081G>T | (64) | | ENST00000380152 | +SC35[e] (5075_5082); +SRp40[e] (5080_5086); −ESE[f,h] 2 (5080_5086); −9G8[i] (5081_5086); −ESS[a] (5078_5085)
DMD | c.4250T>A | (61) | +ESS (hnRNP A1) | ENST00000357033 | +9G8[i] 2 (4246_4251) (4248_4253); −EIE[h] (4248_4253); +ESE[f] (4250_4255); −IIE[c] 3 (4246_4253); +hnRNP A1[d] (4249_4254)
MLH1 | c.544A>G | (59) | | ENST00000231790 | +ESS[a] (537_545); 5′ss ΔCV = −6.30
MLH1 | c.793C>T | (58) | | ENST00000231790 | +ESS[a] (795_802)
MLH1 | c.794G>A | (58) | | ENST00000231790 | −SRp40[e] (793_799); −SC35[e] (794_801); +ESS[c] (794_799)
MLH1 | c.882C>T | (58) | | ENST00000231790 | +SC35[e] (876_883); −SRp55[e] (877_882)
MLH1 | c.988_990del | (58) | | ENST00000231790 | +SF2/ASF[e] (983_989); −SRp55[e] (985_990); +9G8[i] (985_990); −ESS[a] (985_992)
MSH2 | c.815C>T | (58) | | ENST00000233146 | −SRp55[e] (813_818); +ESS[a] (813_820); +ESS[c] 5 (801_819)
MSH2 | c.274_276del | (58) | | ENST00000233146 | +SC35[e] (272_279); +SRp40[e] 2 (274_285); −IIE[c] 2 (274_280)
LAMA2 | c.2230C>T | (60) | | ENST00000354729 | −SF2/ASF[e] (2226_2232); +ESS[c] (2228_2235); +IIE[c] 2 (2229_2235); +ESS[a] (2230_2237)
NF1 | c.557A>T | (66) | −ESE | ENST00000356175 | −SRp55[e] (552_557); −ESE[f] (552_557); −EIE[h] 4 (552_560); −9G8[i] (553_558); +ESS[a] 2 (550_557) (555_562)
NF1 | c.910C>T | (66) | −ESE | ENST00000356175 | −9G8[i] (905_910); −EIE[h] (905_910); +ESE[f] (908_913); −ESE[f] (910_915); −ESS[a] (906_913)
NF1 | c.943C>T | (66) | −ESE | ENST00000356175 | −SC35[e] (941_948); −SF2/ASF[e] (943_949); −PESE[g] (942_949); −9G8[i] (938_943); +hnRNP A1[d] (943_948); +IIE[c] (942_947)

(continued)

c.5081G>T (BRCA2), c.557A>T and c.910C>T (NF1); (ii) creation of one or more ESS without disruption of an ESE, as shown for mutations c.544A>G and c.793C>T (MLH1), c.4250T>A (DMD) and c.6792C>G (NF1); and (iii) an intermediate situation where both the disruption of one or more ESE and the creation of one or more ESS were predicted. This was observed for mutations c.5080G>T (BRCA1), c.794G>A and c.988_990del (MLH1), c.815C>T and c.274_276del (MSH2), c.2230C>T (LAMA2), c.943C>T, c.1007G>A and c.5719G>T (NF1). In order to evaluate the potential of HSF to differentiate 'true' ESE or ESS motifs from false positive signals, we selected a second set of 36 mutations (positive controls) and 220 SNPs (negative controls) (Supplementary Table 1). Predictions were classified in three categories: disruption of ESE motifs only (ESE), creation of ESS motifs only (ESS) or both (ESE+ESS). In addition, results were classified in two subsets: a first one ('All'), which included all predicted motifs, and a second one ('Best'), which was restricted to only one motif for each case by selecting the one recognized by the highest number of matrices.

Comparison of the three categories (ESE, ESS and ESE+ESS) revealed a significant difference between positive and negative controls, both in the 'All' (χ² = 10.05, P = 0.00656) and the 'Best' subset (χ² = 11.75, P = 0.0028). We then evaluated the potential of each matrix to differentiate true from false positive signals. No statistical differences were found using the Sironi, PESS, IIE, hnRNP A1 and RESCUE-ESE matrices. With the EIE matrix, a statistically significant difference was found for the 'All' subset (χ² = 3.99, P = 0.045) but not for the 'Best' subset (χ² = 2.47, P = 0.116). Significant results in both subsets were obtained with ESE-Finder ('All' subset: χ² = 5.17, P = 0.023; 'Best' subset: χ² = 7.33, P = 0.0067), the 9G8 and Tra2β matrices from HSF ('All' subset: χ² = 9.92, P = 0.00164; 'Best' subset: χ² = 9.86, P = 0.00169) and PESE ('All' subset: χ² = 19.52, P = 9.95 × 10⁻⁶; 'Best' subset: χ² = 13.52, P = 2.36 × 10⁻⁴). The positive (PPV) and negative (NPV) predictive values, as well as the sensitivity (Sv) and the specificity (Sp), of these last three matrices were then evaluated. PPV ranged from 0.22 (9G8 and Tra2β) to 0.56 (PESE), NPV from 0.76 (PESE) to 0.95 (9G8 and Tra2β), Sv from 0.27 (PESE) to 0.40 (9G8 and Tra2β) and Sp from 0.88 (9G8 and Tra2β) to 0.91 (PESE). The ESE-Finder matrix showed intermediate values in all cases.
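The reported P-values can be checked against the χ² statistics with closed-form expressions, and the four performance measures follow from a 2 × 2 confusion table. The sketch below is illustrative only. It assumes 2 degrees of freedom for the three-category comparison (3 categories × 2 groups) and 1 degree of freedom for each single-matrix comparison, neither of which is stated explicitly in the text. The counts passed to `predictive_values` are hypothetical and are not taken from the study.

```python
import math


def chi2_pvalue(chi2: float, df: int) -> float:
    """Upper-tail P-value of a chi-square statistic, closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def predictive_values(tp: int, fp: int, fn: int, tn: int) -> dict:
    """PPV, NPV, sensitivity (Sv) and specificity (Sp) from a 2 x 2 table."""
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Sv": tp / (tp + fn),
        "Sp": tn / (tn + fp),
    }


# Three-category comparison (assumed df = 2).
print(f"All subset:  P = {chi2_pvalue(10.05, 2):.5f}")
print(f"Best subset: P = {chi2_pvalue(11.75, 2):.5f}")

# Single-matrix comparisons (assumed df = 1).
print(f"ESE-Finder, Best subset: P = {chi2_pvalue(7.33, 1):.5f}")
print(f"PESE, All subset:        P = {chi2_pvalue(19.52, 1):.2e}")

# Hypothetical counts, used only to show how the four measures are derived.
print(predictive_values(tp=8, fp=2, fn=12, tn=78))
```

With these assumed degrees of freedom, the computed P-values agree with the published ones to the precision given, which supports the reconstruction of the decimal points in the statistics above.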

DISCUSSION

During evolution from simple to higher eukaryotes, splicing signals evolved from well-defined motifs to degenerated sequences with the addition of new auxiliary splicing

Table 4. Continued

Gene | Mutation | Ref. | Motif | Ref Seq | HSF prediction
NF1 | c.1007G>A | (66) | −ESE | ENST00000356175 | +PESE[g] (1007_1014); −EIE[h] 2 (1003_1011); +9G8[i] (1006_1011); +ESE[f] (1007_1014); −ESS[a] 2 (1003_1011); −IIE[c] 4 (1003_1011); +hnRNP A1[d] (1006_1011)
NF1 | c.5719G>T | (66) | −ESE | ENST00000356175 | −ESE[f] 5 (5715_5724); −EIE[h] 5 (5715_5724); −ESS[a] 2 (5714_5725); +PESS[g] 2 (5712_5720); +hnRNP A1[d] (5719_5724)
NF1 | c.6792C>A | (66) | −ESE | ENST00000356175 | +ESE[f] 5 (6792_6797); −EIE[h] 2 (6788_6793) (6790_6795); +Tra2β[i] (6791_6795); −ESS[a] 2 (6787_6794) (6792_6799)
NF1 | c.6792C>G | (66) | −ESE | ENST00000356175 | +ESE[f] (6792_6797); −EIE[h] 2 (6788_6793) (6790_6795); +Tra2β[i] (6791_6795); −ESS[a] 2 (6787_6794) (6792_6799); +hnRNP A1[d] (6790_6795)

+, a new site was created by the mutation; −, the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were: [a] silencer motifs from Sironi et al. (31); [b] PESS octamers (28); [c] IIEs (30); [d] hnRNP motifs from HSF; [e] ESE-Finder matrices (19); [f] RESCUE-ESE hexamers (63); [g] PESE octamers (28); [h] EIEs (30); [i] ESE motifs from HSF. When multiple adjacent sites were predicted, the number of sites is indicated: 5 means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
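The three prediction categories used in the Results (ESE only, ESS only, ESE+ESS) can be derived mechanically from a list of created and abolished motifs such as those in Table 4. The sketch below is one possible reading, not the procedure used by HSF. It assumes that entries without a "+" are abolished motifs, and it groups the motif families into enhancers and silencers by name, with EIE counted as an enhancer family and IIE as a silencer family. Under that grouping it may not reproduce every assignment made in the text.

```python
# Sketch: assign a mutation to one of the three categories from its predicted
# motif changes. The grouping of motif families is an assumption of this example.

ENHANCER_MOTIFS = {"9G8", "Tra2-beta", "SF2/ASF", "SRp40", "SRp55", "SC35",
                   "ESE", "PESE", "EIE"}
SILENCER_MOTIFS = {"ESS", "PESS", "IIE", "hnRNP A1"}


def classify(predictions):
    """predictions: iterable of (sign, motif); '+' = created, '-' = abolished."""
    ese_lost = any(sign == "-" and motif in ENHANCER_MOTIFS
                   for sign, motif in predictions)
    ess_gained = any(sign == "+" and motif in SILENCER_MOTIFS
                     for sign, motif in predictions)
    if ese_lost and ess_gained:
        return "ESE+ESS"
    if ese_lost:
        return "ESE"
    if ess_gained:
        return "ESS"
    return "none"


# Motif changes transcribed from Table 4 (coordinates omitted).
EXAMPLES = {
    "MLH1 c.793C>T": [("+", "ESS")],
    "BRCA2 c.5081G>T": [("+", "SC35"), ("+", "SRp40"), ("-", "ESE"),
                        ("-", "9G8"), ("-", "ESS")],
    "NF1 c.943C>T": [("-", "SC35"), ("-", "SF2/ASF"), ("-", "PESE"),
                     ("-", "9G8"), ("+", "hnRNP A1"), ("+", "IIE")],
}

for name, predictions in EXAMPLES.items():
    print(f"{name}: {classify(predictions)}")
```

For these three mutations the result matches the grouping given in the Results: creation of an ESS only, disruption of ESE only, and the intermediate situation, respectively.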

sequences known as ESE and ESS. Although major SR proteins have been cloned and their target sites determined, much work remains to be done to understand how splice signals are recognized and splicing specificity achieved. As this complex world is progressively revealed, bioinformatics resources could play a major role in helping researchers and diagnostic laboratories to evaluate the consequence of mutations on splicing, especially because most genetic tests use DNA and not RNA samples. By giving easy access to predictions of 5′ss, 3′ss and BP sequences as well as ESE and ESS, the HSF tool (http://www.umd.be/HSF/) fulfills this need and may assist clinicians, geneticists and researchers (70–75). By combining motifs identified with different experimental and computational approaches, it provides a common interface that can be used for sequence analysis. The inclusion of all exons and introns extracted from the Ensembl human genome database (20) allows easy access to any sequence of human genes and thus direct comparison of virtually every mutation or SNP concerning splicing elements. Since SNPs are present at a very high frequency in the genome (1/300 bp), it could be useful to evaluate their impact in association with a mutation. We therefore included in HSF data from dbSNP using Ensembl BioMart. The user can select the 'Search for SNPs related to the analyzed sequence' option, which automatically retrieves SNPs from the database. When SNPs are localized in exons, their effect on ESE and ESS motifs could help the user to better evaluate the consequence of a given mutation.

To evaluate the efficiency of the various algorithms included in HSF and its contribution to the prediction of the consequences of mutations associated with a splicing defect, we used a set of 69 intronic mutations that disrupt the 5′ss or the 3′ss and result in exon skipping and/or activation of a cryptic splice site (Table 1), and a group of 15 mutations that were previously reported to result in splicing defects by creating or activating cryptic splice sites (Table 2). HSF was able to correctly predict the disruption of the natural splice sites. Moreover, we could confirm that (i) mutations of the last nucleotide of an exon have a strong effect on the 5′ss (ΔCV = −12 ± 0.7%), resulting frequently in exon skipping, or in partial exonic deletion or intronic retention due to activation of a cryptic splice site; (ii) mutations of the penultimate exonic nucleotide have limited consequences on the 5′ss (ΔCV = −5.4 ± 0.3%), but they can activate a cryptic splice site, making predictions more difficult; (iii) exonic mutations distant from the 5′ and 3′ss can activate a cryptic splice site, leading to partial exonic deletion. Overall, these findings underline the efficiency of the HSF algorithm to predict the effect of mutations on 5′ and 3′ss. When using the HSF algorithm, the threshold for 5′ and 3′ss is 65, with a pathogenic ΔCV of −10%, except for position +4 where it is −7%. However, in the few cases where unusual splice sites are used, this algorithm could be less efficient.

BP sequences represent another essential splicing signal. When a mutation is localized close to the 5′ side of the 3′ss, its potential effect on a BP sequence should be examined, especially when a nucleotide located less than 85 bp from the 3′ss is targeted. In order to evaluate the HSF algorithm dedicated to the identification of BP sequences, we used 14 BP sequences inactivated by intronic mutations (Table 3). HSF correctly predicted 13 out of 14 BPs, and these data allowed us to define the threshold for BP detection at 67 and the pathogenic ΔBP at −10%. Moreover, for intron 3 of XPC, HSF predicted a BP at position −24. However, according to Khan et al. (76), two BP sequences are present in this intron, one at position −24 and another at −4. HSF could not predict the BP at position −4 simply because the HSF algorithm excludes positions −12 to −1 for BP identification, because of steric obstruction caused by the spliceosome.
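The two rules described for branch point detection, a minimum score and an excluded zone next to the 3′ss, can be expressed as a simple filter. The sketch below is a minimal illustration and not the HSF implementation. The function name `is_reportable_bp` is hypothetical, and the score attached to the −4 candidate is an invented placeholder, since the text gives no value for it.

```python
# Sketch: filter candidate branch points (BP) by score and by distance to the 3'ss.
# Assumptions: positions are negative offsets from the 3' splice site; the score
# of the -4 candidate below is a made-up placeholder, not a value from the study.

BP_THRESHOLD = 67.0            # detection threshold quoted in the text
EXCLUDED_ZONE = range(-12, 0)  # positions -12 to -1 are not considered


def is_reportable_bp(position: int, score: float) -> bool:
    """True if a candidate passes the score threshold and lies outside the excluded zone."""
    return score >= BP_THRESHOLD and position not in EXCLUDED_ZONE


candidates = [
    ("XPC intron 3, position -24", -24, 82.78),             # value from Table 3
    ("XPC intron 3, position -4 (placeholder score)", -4, 90.0),
]

for label, position, score in candidates:
    status = "reported" if is_reportable_bp(position, score) else "not reported"
    print(f"{label}: {status}")
```

This makes the XPC case explicit: whatever its score, a candidate at −4 falls inside the excluded zone and is therefore never returned.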

It has been demonstrated that two different splicing recognition mechanisms, correlated with intron length, can be used in a cell: exon definition for long introns and intron definition for short introns (77). Although the influence of intron length seems to be less important in humans than in other species, it should nevertheless be kept in mind, since U12- and U2-type introns have different BP consensus sequences. In the present version of HSF (v2.4), we only focused on U2-type introns, which are by far the most abundant type in mammalian cells.

Concerning cis-acting elements, many studies have been performed to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets only share partial homology. To solve this problem, HSF included all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNP A1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and therefore the identification of the specific motif/protein pair involved in a given splicing defect is difficult. This is even more complicated when considering the impact of SR and ribonucleoprotein concentration in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the efficiency of HSF to correctly predict motifs disrupted by these mutations. We showed that when the motif/protein pairs had been previously experimentally characterized (hnRNP A1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems) and therefore the motif/protein couple is unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate HSF efficiency to discriminate true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE+ESS; χ² = 11.75, P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33, P = 0.0067), 9G8 and Tra2β from HSF (χ² = 9.86, P = 0.0017) and PESE (χ² = 13.52, P = 2.36 × 10⁻⁴). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls. Indeed, only few experimental validations of auxiliary sequences are available and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2β HSF matrices gave stronger results than ESE-Finder itself and can therefore be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNP A1 matrix, should also be considered as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif predictions, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address the tissue or developmental specificity.

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5′ and 3′ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and for applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies such as the exon-skipping approach in Duchenne Muscular Dystrophy are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world, especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects might benefit from using HSF, whether they aim at developing a holistic view of Genotype-To-Phenotype data (GEN2PHEN European project, http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81). Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754, the GEN2PHEN project. The European Community Sixth Framework Program (FP6) under grant agreement number 036825, TREAT-NMD Network of Excellence. Funding for open access charge: Institut National de la Santé et de la Recherche Médicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget,S.M., Moore,C. and Sharp,P.A. (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc. Natl Acad. Sci. USA, 74, 3171–3175.

2. Nilsen,T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

3. Zhou,Z., Licklider,L.J., Gygi,S.P. and Reed,R. (2002) Comprehensive proteomic analysis of the human spliceosome. Nature, 419, 182–185.

4. Breitbart,R.E., Nguyen,H.T., Medford,R.M., Destree,A.T., Mahdavi,V. and Nadal-Ginard,B. (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell, 41, 67–82.

5. Maniatis,T. and Tasic,B. (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.

6. Cartegni,L., Chew,S.L. and Krainer,A.R. (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat. Rev. Genet., 3, 285–298.

7. Robberson,B.L., Cote,G.J. and Berget,S.M. (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol. Cell. Biol., 10, 84–94.

8. Jacob,M. and Gallinaro,H. (1989) The 5′ splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res., 17, 2159–2180.

9. Blencowe,B.J. (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem. Sci., 25, 106–110.

10. Zhu,J., Mayeda,A. and Krainer,A.R. (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol. Cell, 8, 1351–1361.

11. Zhang,X.H., Leslie,C.S. and Chasin,L.A. (2005) Computational searches for splicing signals. Methods, 37, 292–305.

12. Bhasi,A., Pandey,R.V., Utharasamy,S.P. and Senapathy,P. (2007) EuSplice: a unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics, 23, 1815–1823.

13. Churbanov,A., Rogozin,I.B., Deogun,J.S. and Ali,H. (2006) Method of predicting splice sites based on signal interactions. Biol. Direct, 1, 10.

14. Dunckley,M.G., Manoharan,M., Villiet,P., Eperon,I.C. and Dickson,G. (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum. Mol. Genet., 7, 1083–1090.

15. Wilton,S.D. and Fletcher,S. (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr. Gene Ther., 5, 467–483.

16. Beroud,C., Hamroun,D., Collod-Beroud,G., Boileau,C., Soussi,T. and Claustres,M. (2005) UMD (Universal Mutation Database): 2005 update. Hum. Mutat., 26, 184–191.

17. Beroud,C., Collod-Beroud,G., Boileau,C., Soussi,T. and Junien,C. (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum. Mutat., 15, 86–94.

18. Fairbrother,W.G., Yeo,G.W., Yeh,R., Goldstein,P., Mawson,M., Sharp,P.A. and Burge,C.B. (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res., 32, W187–W190.

19. Cartegni,L., Wang,J., Zhu,Z., Zhang,M.Q. and Krainer,A.R. (2003) ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res., 31, 3568–3571.

20. Flicek,P., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F., Cutts,T. et al. (2008) Ensembl 2008. Nucleic Acids Res., 36, D707–D714.

21. Karolchik,D., Kuhn,R.M., Baertsch,R., Barber,G.P., Clawson,H., Diekhans,M., Giardine,B., Harte,R.A., Hinrichs,A.S., Hsu,F. et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res., 36, D773–D779.

22. Shapiro,M.B. and Senapathy,P. (1987) RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res., 15, 7155–7174.

23. Yeo,G. and Burge,C.B. (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol., 11, 377–394.

24. Green,M.R. (1991) Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. Annu. Rev. Cell Biol., 7, 559–599.

25. Gooding,C., Clark,F., Wollerton,M.C., Grellscheid,S.N., Groom,H. and Smith,C.W. (2006) A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol., 7, R1.

26. Kol,G., Lev-Maor,G. and Ast,G. (2005) Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum. Mol. Genet., 14, 1559–1568.

27. Smith,P.J., Zhang,C., Wang,J., Chew,S.L., Zhang,M.Q. and Krainer,A.R. (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum. Mol. Genet., 15, 2490–2508.

28. Zhang,X.H. and Chasin,L.A. (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev., 18, 1241–1250.

29. Goren,A., Ram,O., Amit,M., Keren,H., Lev-Maor,G., Vig,I., Pupko,T. and Ast,G. (2006) Comparative analysis identifies exonic splicing regulatory sequences–The complex definition of enhancers and silencers. Mol. Cell, 22, 769–781.

30. Zhang,C., Li,W.H., Krainer,A.R. and Zhang,M.Q. (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc. Natl Acad. Sci. USA, 105, 5797–5802.

31. Sironi,M., Menozzi,G., Riva,L., Cagliani,R., Comi,G.P., Bresolin,N., Giorda,R. and Pozzoli,U. (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res., 32, 1783–1791.

32. Wang,Z., Rolish,M.E., Yeo,G., Tung,V., Mawson,M. and Burge,C.B. (2004) Systematic identification and analysis of exonic splicing silencers. Cell, 119, 831–845.

33. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

34. Bailey,T.L., Williams,N., Misleh,C. and Li,W.W. (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res., 34, W369–W373.

35. Yuan,B., Thomas,J.P., von Kodolitsch,Y. and Pyeritz,R.E. (1999) Comparison of heteroduplex analysis, direct sequencing and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum. Mutat., 14, 440–446.

36. Youil,R., Toner,T.J., Bull,E., Bailey,A.L., Earl,C.D., Dietz,H.C. and Montgomery,R.A. (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum. Mutat., 16, 92–93.

37. Schrijver,I., Liu,W., Odom,R., Brenn,T., Oefner,P., Furthmayr,H. and Francke,U. (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am. J. Hum. Genet., 71, 223–237.

38. Rommel,K., Karck,M., Haverich,A., Schmidtke,J. and Arslan-Kirchner,M. (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum. Mutat., 20, 406–407.

39. Park,E.S., Putnam,E.A., Chitayat,D., Child,A. and Milewicz,D.M. (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am. J. Med. Genet., 78, 350–355.

40. Palz,M., Tiecke,F., Booms,P., Goldner,B., Rosenberg,T., Fuchs,J., Skovby,F., Schumacher,H., Kaufmann,U.C., von Kodolitsch,Y. et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am. J. Med. Genet., 91, 212–221.

41. Nijbroek,G., Sood,S., McIntosh,I., Francomano,C.A., Bull,E., Pereira,L., Ramirez,F., Pyeritz,R.E. and Dietz,H.C. (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am. J. Hum. Genet., 57, 8–21.

42. McGrory,J. and Cole,W.G. (1999) Alternative splicing of exon 37 of FBN1 deletes part of an 'eight-cysteine' domain resulting in the Marfan syndrome. Clin. Genet., 55, 118–121.

43. Loeys,B., Nuytinck,L., Delvaux,I., De Bie,S. and De Paepe,A. (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch. Intern. Med., 161, 2447–2454.

44. Liu,W.O., Oefner,P.J., Qian,C., Odom,R.S. and Francke,U. (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms and sequence variants in Marfan syndrome and related connective tissue disorders. Genet. Test., 1, 237–242.

45. Hutchinson,S., Wordsworth,B.P. and Handford,P.A. (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum. Genet., 109, 416–420.

46. Halliday,D., Hutchinson,S., Kettle,S., Firth,H., Wordsworth,P. and Handford,P.A. (1999) Molecular analysis of eight mutations in FBN1. Hum. Genet., 105, 587–597.

47. Gupta,P.A., Wallis,D.D., Chin,T.O., Northrup,H., Tran-Fadulu,V.T., Towbin,J.A. and Milewicz,D.M. (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J. Med. Genet., 41, e56.

48. Gupta,P.A., Putnam,E.A., Carmical,S.G., Kaitila,I., Steinmann,B., Child,A., Danesino,C., Metcalfe,K., Berry,S.A., Chen,E. et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum. Mutat., 19, 39–48.

49. Guo,D., Tan,F.K., Cantu,A., Plon,S.E. and Milewicz,D.M. (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am. J. Med. Genet., 101, 130–134.

50. Dietz,H.C., McIntosh,I., Sakai,L.Y., Corson,G.M., Chalberg,S.C., Pyeritz,R.E. and Francomano,C.A. (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics, 17, 468–475.

51. Comeglio,P., Johnson,P., Arno,G., Brice,G., Evans,A., Aragon-Martin,J., da Silva,F.P., Kiotsekoglou,A. and Child,A. (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum. Mutat., 28, 928.

52. Collod-Beroud,G., Le Bourdelles,S., Ades,L., Ala-Kokko,L., Booms,P., Boxer,M., Child,A., Comeglio,P., De Paepe,A., Hyland,J.C. et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum. Mutat., 22, 199–208.

53. Chikumi,H., Yamamoto,T., Ohta,Y., Nanba,E., Nagata,K., Ninomiya,H., Narasaki,K., Katoh,T., Hisatome,I., Ono,K. et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J. Hum. Genet., 45, 115–118.

54. Biggin,A., Holman,K., Brett,M., Bennetts,B. and Ades,L. (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum. Mutat., 23, 99.

55. Attanasio,M., Lapini,I., Evangelisti,L., Lucarini,L., Giusti,B., Porciani,M., Fattori,R., Anichini,C., Abbate,R., Gensini,G. et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin. Genet., 74, 39–46.

56. Loeys,B.L., Chen,J., Neptune,E.R., Judge,D.P., Podowski,M., Holm,T., Meyers,J., Leitch,C.C., Katsanis,N., Sharifi,N. et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat. Genet., 37, 275–281.

57. Houdayer,C., Dehainault,C., Mattler,C., Michaux,D., Caux-Moncoutier,V., Pages-Berhouet,S., d'Enghien,C.D., Lauge,A., Castera,L., Gauthier-Villars,M. et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum. Mutat., 29, 975–982.

58. Tournier,I., Vezain,M., Martins,A., Charbonnier,F., Baert-Desurmont,S., Olschwang,S., Wang,Q., Buisine,M.P., Soret,J., Tazi,J. et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum. Mutat., 29, 1412–1424.

59 AuclairJ BusineMP NavarroC RuanoE MontmainGDesseigneF SaurinJC LassetC BonadonaV GiraudS et al(2006) Systematic mRNA analysis for the effect of MLH1 andMSH2 missense and silent mutations on aberrant splicing HumMutat 27 145ndash154

60 Di BlasiC HeY MorandiL CornelioF GuicheneyP andMoraM (2001) Mild muscular dystrophy due to a nonsensemutation in the LAMA2 gene resulting in exon skipping Brain124 698ndash704

61 DissetA BourgeoisCF BenmalekN ClaustresM SteveninJand Tuffery-GiraudS (2006) An exon skipping-associated nonsensemutation in the dystrophin gene uncovers a complex interplaybetween multiple antagonistic splicing elements Hum Mol Genet15 999ndash1013

62 FackenthalJD CartegniL KrainerAR and OlopadeOI (2002)BRCA2 T2722R is a deleterious allele that causes exon skippingAm J Hum Genet 71 625ndash631

63 FairbrotherWG YehRF SharpPA and BurgeCB (2002)Predictive identification of exonic splicing enhancers in humangenes Science 297 1007ndash1013

64 MazoyerS PugetN Perrin-VidozL LynchHTSerova-SinilnikovaOM and LenoirGM (1998) A BRCA1nonsense mutation causes exon skipping Am J Hum Genet 62713ndash715

65 NielsenKB SorensenS CartegniL CorydonTJ DoktorTKSchroederLD ReinertLS ElpelegO KrainerARGregersenN et al (2007) Seemingly neutral polymorphicvariants may confer immunity to splicing-inactivating mutations asynonymous SNP in exon 5 of MCAD protects from deleteriousmutations in a flanking exonic splicing enhancer Am J HumGenet 80 416ndash432

66 ZatkovaA MessiaenL VandenbrouckeI WieserRFonatschC KrainerAR and WimmerK (2004) Disruption ofexonic splicing enhancer elements is the principal cause of exonskipping associated with seven nonsense or missense alleles of NF1Hum Mutat 24 491ndash501

67 den DunnenJT and AntonarakisSE (2000) Mutation nomencla-ture extensions and suggestions to describe complex mutations adiscussion Hum Mutat 15 7ndash12

68 FredericMY MoninoC MarschallC HamrounD FaivreLJondeauG KleinHG NeumannL GautierE BinquetC et al(2008) The FBN2 gene new mutations locus-specific database(Universal Mutation Database FBN2) and genotype-phenotypecorrelations Hum Mutat 30 181ndash190

69 FredericMY HamrounD FaivreL BoileauC JondeauGClaustresM BeroudC and Collod-BeroudG (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 geneUMD-TGFBR2 Hum Mutat 29 33ndash38

70 FrankV Ortiz BruchleN MagerS FrintsSG BohringA duBoisG DebatinI SeidelH SenderekJ BesbasN et al (2007)Aberrant splicing is a common mutational mechanism in MKS1 akey player in Meckel-Gruber syndrome Hum Mutat 28 638ndash639

71 AnczukowO BuissonM SallesMJ TribouletS LongyMLidereauR SinilnikovaOM and MazoyerS (2008) Unclassifiedvariants identified in BRCA1 exon 11 Consequences on splicingGenes Chromosomes Cancer 47 418ndash426

72 NgW LohAX TeixeiraAS PereiraSP and SwallowDM(2008) Genetic regulation of MUC1 alternative splicing in humantissues Br J Cancer 99 978ndash985

73 BaalaL RomanoS KhaddourR SaunierS SmithUMAudollentS OzilouC FaivreL LaurentN FoliguetB et al(2007) The Meckel-Gruber syndrome gene MKS3 is mutated inJoubert syndrome Am J Hum Genet 80 186ndash194

74 HabaraY DoshitaM HirozawaS YokonoY YagiMTakeshimaY and MatsuoM (2008) A strong exonic splicingenhancer in dystrophin exon 19 achieve proper splicing without anupstream polypyrimidine tract J Biochem 143 303ndash310

75 Aartsma-RusA van VlietL HirschiM JansonAAHeemskerkH de WinterCL de KimpeS van DeutekomJCt HoenPA and van OmmenGJ (2008) Guidelines for AntisenseOligonucleotide Design and Insight Into Splice-modulatingMechanisms Mol Ther 17 548ndash553

76 KhanSG MetinA GozukaraE InuiH ShahlaviT Muniz-MedinaV BakerCC UedaT AikenJR SchneiderTD et al(2004) Two essential splice lariat branchpoint sequences in oneintron in a xeroderma pigmentosum DNA repair gene mutationsresult in reduced XPC mRNA levels that correlate with cancer riskHum Mol Genet 13 343ndash352

77 SharpPA and BurgeCB (1997) Classification of introns U2-typeor U12-type Cell 91 875ndash879

78 ChasinLA (2007) Searching for splicing motifs Adv Exp MedBiol 623 85ndash106

79 NallaVK and RoganPK (2005) Automated splicing mutationanalysis by information theory Hum Mutat 25 334ndash342

80 BeroudC Tuffery-GiraudS MatsuoM HamrounDHumbertclaudeV MonnierN MoizardMP VoelckelMACalemardLM BoisseauP et al (2007) Multiexon skipping lead-ing to an artificial DMD protein lacking amino acids from exons 45through 55 could rescue up to 63 of patients with Duchennemuscular dystrophy Hum Mutat 28 196ndash202

81 (2007) What is the human variome project Nat Genet 39 42382 KainulainenK KarttunenL PuhakkaL SakaiL and

PeltonenL (1994) Mutations in the fibrillin gene responsible fordominant ectopia lentis and neonatal Marfan syndrome NatGenet 6 64ndash69

83. Liu W, Qian C, Comeau K, Brenn T, Furthmayr H and Francke U (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe marfan syndrome. Hum Mol Genet 5, 1581–1587.

84. Booms P, Cisler J, Mathews KR, Godfrey M, Tiecke F, Kaufmann UC, Vetter U, Hagemeier C and Robinson PN (1999) Novel exon skipping mutation in the fibrillin-1 gene: two 'hotspots' for the neonatal Marfan syndrome. Clin Genet 55, 110–117.

85. Wang M, Price C, Han J, Cisler J, Imaizumi K, Van Thienen MN, DePaepe A and Godfrey M (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum Mol Genet 4, 607–613.

86. Godfrey M, Vandemark N, Wang M, Velinov M, Wargowski D, Tsipouras P, Han J, Becker J, Robertson W, Droste S, et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am J Hum Genet 53, 472–480.

87. Wang M, Clericuzio CL and Godfrey M (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am J Hum Genet 59, 1027–1034.

88. Karttunen L, Ukkonen T, Kainulainen K, Syvanen AC and Peltonen L (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum Mutat Suppl 1, S34–S37.

89. Kosaki K, Takahashi D, Udaka T, Kosaki R, Matsumoto M, Ibe S, Isobe T, Tanaka Y and Takahashi T (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am J Med Genet A 140, 104–108; author reply 109–110.

90. Loeys BL, Schwarze U, Holm T, Callewaert BL, Thomas GH, Pannu H, De Backer JF, Oswald GL, Symoens S, Manouvrier S, et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N Engl J Med 355, 788–798.

91. Tran VK, Takeshima Y, Zhang Z, Habara Y, Haginoya K, Nishiyama A, Yagi M and Matsuo M (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes but not in the skeletal muscle of a DMD patient. Hum Genet 120, 737–742.

92. Sharp A, Pichert G, Lucassen A and Eccles D (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum Mutat 24, 272.

93. Burrows NP, Nicholls AC, Richards AJ, Luccarini C, Harrison JB, Yates JR and Pope FM (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am J Hum Genet 63, 390–398.

94. Sinnreich M, Therrien C and Karpati G (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology 66, 1114–1116.


95. Maslen C, Babcock D, Raghunath M and Steinmann B (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am J Hum Genet 60, 1389–1398.

96. Vivenza D, Guazzarotti L, Godi M, Frasca D, di Natale B, Momigliano-Richiardi P, Bona G and Giordano M (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J Clin Endocrinol Metab 91, 980–986.

97. Chavanas S, Gache Y, Vailly J, Kanitakis J, Pulkkinen L, Uitto J, Ortonne J and Meneguzzi G (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum Mol Genet 8, 2097–2105.

98. Kuivenhoven JA, Weibusch H, Pritchard PH, Funke H, Benne R, Assmann G and Kastelein JJ (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J Clin Invest 98, 358–364.

99. Webb JC, Patel DD, Shoulders CC, Knight BL and Soutar AK (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum Mol Genet 5, 1325–1331.

100. Di Leo E, Panico F, Tarugi P, Battisti C, Federico A and Calandra S (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum Mutat 24, 440.

101. Vuillaumier-Barrot S, Le Bizec C, De Lonlay P, Madinier-Chappat N, Barnier A, Dupre T, Durand G and Seta N (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol Genet Metab 87, 337–340.

102. Janssen RJ, Wevers RA, Haussler M, Luyten JA, Steenbergen-Spanjers GC, Hoffmann GF, Nagatsu T and Van den Heuvel LP (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann Hum Genet 64, 375–382.

103. Mayer K, Ballhausen W, Leistner W and Rott H (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim Biophys Acta 1502, 495–507.


Since many intronic sequences match the BP consensus sequence, we included the AG-Exclusion Zone algorithm described by Gooding et al. (25) to predict BP candidates. For a given intronic sequence and its intron-exon boundary, HSF searches all AG dinucleotides that are included in a 3' ss candidate sequence (threshold of 67) and therefore define the exclusion zones. As it has been shown that the BP allows the recognition of the first downstream 3' ss, HSF annotates the functional BP as the strongest candidate without a 3'-exclusion zone before the natural 3' ss.

Additionally, to take into account the steric obstruction caused by the spliceosome, we excluded BP sequences located at less than 12 nt from the exon. Finally, as most BP sequences are located between -21 and -34 nt from the exon (26), only a window of 100 bp is processed. We arbitrarily excluded the probability of having a BP motif located very far away in order to save computation time.
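The selection rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HSF implementation: the `BPCandidate` class, the `pick_branch_point` function and the representation of exclusion zones as a list of AG distances are hypothetical. Only the numeric cut-offs (score threshold of 67, minimum distance of 12 nt, 100-nt window) are taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

BP_THRESHOLD = 67.0  # minimum BP score (from the text)
MIN_DISTANCE = 12    # BPs closer than 12 nt to the exon are excluded (from the text)
WINDOW = 100         # only the last 100 nt of the intron are scanned (from the text)


@dataclass(frozen=True)
class BPCandidate:
    distance: int  # nt upstream of the exon, as a positive number (24 means position -24)
    score: float   # matrix score on the 0-100 scale


def pick_branch_point(candidates: Iterable[BPCandidate],
                      ag_distances: Iterable[int]) -> Optional[BPCandidate]:
    """Return the strongest admissible BP candidate, or None if there is none.

    ag_distances lists, as positive distances upstream of the natural 3' ss,
    the AG dinucleotides that belong to a 3' ss candidate sequence. The AG of
    the natural 3' ss itself is assumed NOT to be in this list. A candidate is
    rejected when such an AG lies between it and the natural 3' ss.
    """
    ags = list(ag_distances)
    admissible = [
        c for c in candidates
        if c.score >= BP_THRESHOLD
        and MIN_DISTANCE <= c.distance <= WINDOW
        and not any(0 < d < c.distance for d in ags)
    ]
    # Strongest candidate wins; ties are resolved arbitrarily by max().
    return max(admissible, key=lambda c: c.score, default=None)
```

With candidates at distances 24, 40, 4 and 30 and one qualifying AG at distance 35, the candidate at 40 is rejected by the exclusion zone, the one at 4 by the 12-nt rule, and any candidate scoring below 67 by the threshold.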

Matrices for splicing enhancers and silencers

To maximize the detection of auxiliary motifs, HSF integrated: (i) matrices for SR proteins (SRp40, SC35, SF2/ASF, SF2/ASF IgM/BRCA1 and SRp55) from the ESE-Finder tool (19,27); (ii) sequence motifs shown to be differentially present in exons and introns, such as the RESCUE-ESE hexamers (18), the putative 8-mer ESE and ESS identified by Zhang and Chasin (28), the ESR sequences identified by Goren and co-workers (29) and the exon-identity elements (EIE) and intron-identity elements (IIE) defined by Zhang and co-workers (30). For the silencer sequences identified by Sironi and colleagues (31) and the ESS decamers (32), for which no web-based tool was available, we developed new algorithms to use the crude data.

New matrices were also created to predict hnRNP A1, Tra2-β and 9G8 protein binding motifs. These matrices were designed using published data collected from SELEX experiments and consensus sequences. Sequences were aligned with ClustalW (33) to generate a consensus motif. Note that these motifs were too short to be processed with MEME (34). The consensus sequences were then used to design PWM matrices (Figure 2).
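As a generic illustration of how a position weight matrix can be derived from aligned binding sites and used to score a candidate motif, consider the sketch below. It does not reproduce the HSF matrices: the example sites are invented, and the log-odds weighting with a pseudocount, the uniform background and the min-max rescaling to a 0-100 range are all assumptions chosen for the example.

```python
import math

BASES = "ACGT"


def build_pwm(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length sites."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in aligned_sites]
        total = len(column) + pseudocount * len(BASES)
        # Smoothed frequency of each base, compared with a uniform background.
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_0_100(pwm, seq):
    """Rescale the raw PWM score of seq so that worst = 0 and best = 100."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM length")
    raw = sum(col[base] for col, base in zip(pwm, seq))
    lowest = sum(min(col.values()) for col in pwm)
    highest = sum(max(col.values()) for col in pwm)
    if highest == lowest:  # uninformative matrix, nothing to rescale
        return 0.0
    return 100.0 * (raw - lowest) / (highest - lowest)


# Toy alignment, invented for this example only.
toy_sites = ["TAGGGA", "TAGGGT", "TAGGGA", "CAGGGA"]
toy_pwm = build_pwm(toy_sites)
consensus_score = score_0_100(toy_pwm, "TAGGGA")
variant_score = score_0_100(toy_pwm, "TAGGGT")
```

The rescaling step mirrors the idea of reporting every matrix on a common scale, so that thresholds can be compared across matrices of different lengths.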

Sequence datasets used to evaluate HSF efficiency

To evaluate the new algorithms dedicated to the prediction of 5' ss and 3' ss, we used the Ensembl database (20), which contains 245 286 human exons (release 44; http://april2007.archive.ensembl.org). For BP predictions, we used a set of 14 experimentally validated BPs (Table 3). These datasets were completed by 69 intronic mutations (35–56) as well as 15 exonic mutations known to alter 5' and 3' ss (57,58), for which the impact on mRNA splicing has been characterized in vivo or in vitro. To evaluate the ability to correctly predict ESE and ESS, we used a set of 20 experimentally validated mutations that affect splicing by a direct effect on ESE and/or ESS (58–66). In addition, we used a set of 36 mutations previously reported to alter splicing (positive controls) and 220 SNPs (negative controls). The negative controls were extracted from the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP) and corresponded to SNPs with the highest minor allele frequency, which therefore had a minimal risk of affecting splicing. Conversely, the positive controls were chosen because experimental results showed that these mutations targeted auxiliary splicing sequence motifs. Nevertheless, in most cases, the data about the exact motif and/or the protein that recognizes this motif were not available. For each mutation, we evaluated only its effects in terms of disruption of ESE or creation of ESS signals (Supplementary Table 1).

RESULTS

Web interface and database

The HSF web interface was designed to maximize the perception of efficiency and ease of use by end users.

Figure 2. New position weight matrices of recognition motifs for proteins involved in splicing. (A) hnRNP A1, (B) Tra2-β and (C) 9G8.

Figure 1. Branch point matrix. The size of each nucleotide is proportional to its weight in the position weight matrix. Nucleotides above the base line have positive values, while nucleotides below have negative values.


Only default parameters are displayed on the submission form, while skilled users can easily access advanced parameters. Sequences stored in the database can be queried using either the gene symbol, the Ensembl gene ID, the Ensembl transcript ID, the RefSeq peptide ID or the consensus CDS. In addition, users can process their own sequences, either for simple sequence analysis or mutant comparison. HSF can also be queried in different ways: full analysis of a sequence, comparison of a mutant and a wild-type sequence, or simultaneous analysis of several mutants, related or not to the same transcript. In this last case, all mutations should refer to sequences included in the HSF database. In order to easily study a group of mutations from different genes and transcripts, each mutation must be described using the international nomenclature system for cDNA mutations (67) (http://www.genomic.unimelb.edu.au/mdi/mutnomen). HSF will then check that each mutation is correctly described and automatically reconstruct the mutant allele from the wild-type sequence and the mutation name. Since only small rearrangements (i.e. substitutions, small exonic or intronic deletions and insertions, duplications and indels) provide useful information about splicing defects, large rearrangements cannot be processed by HSF. Moreover, unlike previous resources, HSF lets the user specifically analyze BP sequences or splice site motifs using HSF-specific matrices and algorithms.
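To make the idea of rebuilding a mutant allele from a wild-type sequence and a mutation name concrete, here is a minimal sketch. It is not the HSF parser: the function name `apply_cdna_mutation` is hypothetical, the wild-type string is assumed to start at the A of the ATG (so that cDNA position 1 is index 0), and only exonic substitutions (`c.4G>A`) and simple deletions (`c.4_6del`) are handled. Intronic names such as `c.247+1G>A`, insertions, duplications and indels are deliberately left out and raise an error.

```python
import re

# Exonic substitution, e.g. "c.4G>A"
_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
# Exonic deletion of one base or a range, e.g. "c.5del" or "c.4_6del"
_DELETION = re.compile(r"^c\.(\d+)(?:_(\d+))?del$")


def apply_cdna_mutation(wild_type: str, name: str) -> str:
    """Return the mutant sequence, checking the name against the wild type."""
    match = _SUBSTITUTION.match(name)
    if match:
        pos, ref, alt = int(match.group(1)), match.group(2), match.group(3)
        if not 1 <= pos <= len(wild_type):
            raise ValueError(f"position {pos} is outside the sequence")
        if wild_type[pos - 1] != ref:
            # The name claims a reference base that the sequence does not have.
            raise ValueError(f"expected {ref} at c.{pos}, found {wild_type[pos - 1]}")
        return wild_type[:pos - 1] + alt + wild_type[pos:]

    match = _DELETION.match(name)
    if match:
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        if not 1 <= start <= end <= len(wild_type):
            raise ValueError("deletion range is outside the sequence")
        return wild_type[:start - 1] + wild_type[end:]

    raise ValueError(f"unsupported or malformed mutation name: {name}")
```

Checking the stated reference base against the wild-type sequence before applying the change is what allows a badly described mutation to be rejected rather than silently producing a wrong allele.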

The main result page is divided into three areas: the reference sequence(s), various graphical displays and tables. Since mutations could have different effects related to the local context, a 'quick mutation' option allows the addition of a small rearrangement (missense, deletion, insertion, duplication, indel) to the sequence(s).

Splicing donor/acceptor sites

The new HSF algorithm to define consensus values (CV) of 5' ss or 3' ss was created to maximize the difference between wild-type (wt) active sites and mutant inactive sites. Thus, strong sites presented a CV higher than 80 and less strong sites a CV ranging between 70 and 80. Only a minor fraction of active sites showed a CV between 65 and 70 (Figure 3). The mean CV for 3' ss was 86.81 with a standard deviation of 6.33, while the mean CV for 5' ss was 87.53 with a standard deviation of 8.34. These values were calculated from more than 400 000 natural splice sites extracted from all alternative transcripts. If a mutation directly affects the CV, it is critical to consider not only the CV of the mutant splice site but also the delta between the wt and mutant CV. To validate this algorithm, we used a set of 69 intronic mutations that affect either the canonical AG/GT splice site motifs or less conserved nucleotides (Table 1). All mutations affecting the nucleotides in canonical positions (-2, -1, +1 or +2) strongly influenced the CV value with an average

Figure 3. Distribution of CVs for (A) 3' and (B) 5' natural splice sites (5' ss and 3' ss). Data extracted from the Ensembl dataset (release 44; http://april2007.archive.ensembl.org) (20) using the HSF algorithm.


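Table 1. Intronic mutations in FBN1 (ENST00000316623), FBN2 (ENST00000262464), RB1 (ENST00000267163), TGFBR2 (ENST00000295754), MLH1 (ENST00000231790) and MSH2 (ENST00000233146) that lead to splicing defects

Mutations causing exon skipping

| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) |
|---|---|---|---|---|---|
| FBN1 | c.247+1G>A | (37,46,49–51,53) | 82.26 | 55.42 | -32.62^a |
| FBN1 | c.538+1G>A | (45) | 83.99 | 57.15 | -31.96^a |
| FBN1 | c.1468+5G>A | (44) | 84.46 | 72.30 | -14.40^a |
| FBN1 | c.3208+5G>T | (82) | 94.98 | 82.66 | -12.97^a |
| FBN1 | c.3838+1G>A | (52) | 95.84 | 69.01 | -28.00^a |
| FBN1 | c.3839-1G>T | (83) | 87.62 | 58.67 | -33.04^a |
| FBN1 | c.3964+1G>A | (84,85) | 90.04 | 63.20 | -29.80^a |
| FBN1 | c.3965-2A>T | (85) | 89.30 | 60.35 | -32.41^a |
| FBN1 | c.4459+1G>A | (44) | 97.66 | 70.83 | -27.47^a |
| FBN1 | c.4943-1G>C | (44) | 79.77 | 50.82 | -36.29^a |
| FBN1 | c.5788+5G>A | (35,36,38,41,43,52,54,83) | 88.06 | 75.89 | -13.82^a |
| FBN1 | c.6163+2del6 | (83) | 99.05 | 72.90 | -26.40^a |
| FBN1 | c.6496+2insTG | (43) | 82.21 | 32.05 | -61.01^a |
| FBN1 | c.6616+1G>C | (86) | 78.08 | 51.24 | -34.37^a |
| FBN1 | c.6997+1G>A | (83) | 92.11 | 65.27 | -29.13^a |
| FBN1 | c.7205-2A>G | (83) | 84.11 | 55.16 | -34.42^a |
| FBN1 | c.7330+1G>A | (55) | 98.02 | 71.18 | -27.38^a |
| FBN1 | c.7331-2A>G | (40) | 80.72 | 51.77 | -35.86^a |
| FBN1 | c.8051+1G>A | (44) | 92.02 | 65.18 | -29.16^a |
| FBN1 | c.8051+5G>A | (51) | 92.02 | 79.85 | -13.22^a |
| FBN1 | c.8052-2A>G | (52) | 92.86 | 63.92 | -31.17^a |
| FBN2 | c.3472+2T>G | (48) | 90.99 | 64.15 | -28.53^a |
| FBN2 | c.4099+1G>C | (39) | 91.66 | 64.82 | -29.28^a |
| FBN2 | c.4222+5G>A | (47) | 92.11 | 79.94 | -13.21^a |
| FBN2 | c.4346-2A>T | (87) | 90.91 | 61.96 | -31.84^a |
| RB1 | c.264+4delA | (57) | 91.34 | 84.93 | -7.01^a |
| RB1 | c.380+3A>C | (57) | 95.10 | 78.82 | -17.12^a |
| RB1 | c.607+1G>T | (57) | 99.05 | 72.21 | -27.09^a |
| RB1 | c.939+4A>G | (57) | 83.75 | 75.41 | -9.96^a |
| RB1 | c.1049+2delT | (57) | 76.95 | 57.00 | -25.90^a |
| RB1 | c.1215+1G>A | (57) | 85.86 | 59.02 | -31.26^a |
| RB1 | c.1389+1G>A | (57) | 82.69 | 55.86 | -32.45^a |
| RB1 | c.1389+4A>G | (57) | 82.69 | 74.35 | -10.09^a |
| RB1 | c.1389+5G>A | (57) | 82.69 | 70.53 | -14.71^a |
| RB1 | c.1422-2A>T | (57) | 86.12 | 57.17 | -33.62^a |
| RB1 | c.1422-1G>A | (57) | 86.12 | 57.17 | -33.62^a |
| RB1 | c.1498+5G>A | (57) | 82.91 | 70.75 | -14.67^a |
| RB1 | c.1960+1G>A | (57) | 94.02 | 67.19 | -28.54^a |
| RB1 | c.1960+1delG | (57) | 94.02 | 49.62 | -47.22^a |
| RB1 | c.2211+1G>T | (57) | 89.90 | 63.06 | -29.86^a |
| RB1 | c.2212-2A>G | (57) | 89.09 | 60.15 | -32.48^a |
| RB1 | c.2211+1G>C | (57) | 89.90 | 63.06 | -29.86^a |
| RB1 | c.2520+1G>A | (57) | 92.22 | 65.39 | -29.10^a |
| RB1 | c.2520+3del4 | (57) | 92.22 | 72.26 | -21.64^a |
| RB1 | c.2663+1G>A | (57) | 88.37 | 61.54 | -30.36^a |
| MLH1 | c.306+4A>G | (58) | 96.07 | 87.73 | -8.68^a |
| MLH1 | c.454-2A>G | (59) | 93.59 | 64.64 | -28.80^a |
| MLH1 | c.790+1G>A | (59) | 83.28 | 56.45 | -32.22^a |
| MLH1 | c.790+5G>T | (58) | 83.28 | 70.97 | -14.79^a |
| MLH1 | c.791-5T>G | (59) | 80.80 | 77.17 | -4.49 |
| MLH1 | c.884+4A>G | (58) | 85.75 | 77.41 | -9.73^a |
| MSH2 | c.366+1G>T | (59) | 86.73 | 59.89 | -30.95^a |
| MSH2 | c.793-2A>C | (59) | 83.98 | 55.04 | -34.46^a |
| MSH2 | c.942+3A>T | (59) | 99.24 | 83.86 | -15.50^a |
| MSH2 | c.1276+2T>A | (59) | 84.70 | 57.86 | -31.69^a |
| MSH2 | c.1386+1G>A | (59) | 89.02 | 62.19 | -30.13^a |
| MSH2 | c.2634+5G>T | (58) | 84.41 | 72.09 | -14.59^a |

Mutations resulting in the usage of cryptic splice sites

| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) | Cryptic site (CS): CV value(s), type and location |
|---|---|---|---|---|---|---|
| FBN1 | c.2293+2T>C | (83) | 89.77 | 62.94 | -29.89 | 67.21; 5' CS (51 nt upstream)^b |
| FBN1 | c.3463+1G>A | (88) | 91.34 | 64.50 | -29.38 | 88.47; 5' CS (27 nt downstream)^b |
| FBN1 | c.4747+5G>T | (42) | 89.13 | 76.81 | -13.82 | 79.06; 5' CS (48 nt upstream)^b |
| FBN1 | c.5788+1G>A | (52) | 88.06 | 61.22 | -30.48 | 82.64; 5' CS (33 nt downstream)^b |
| RB1 | c.138-8T>G | (57) | 81.62 | 79.69 | -2.36 | (see continuation of Table 1) |

(continued)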


reduction (ΔCV) of -31% and a standard deviation (SD) of 2.8%. Mutations affecting less conserved residues had a weaker effect, with a ΔCV of -7% for the residue in position +4 and -14% for nucleotides in position +3 or +5. These results, together with data from other disease-causing mutations (52,68,69), indicated that a CV reduction of at least 10% for a mutation in any position, or of 7% for a mutation in position +4, is likely to have a significant impact on splicing and should be further investigated.

Since a mutation can result not only in the disruption of a 5' ss or a 3' ss but also in the creation of a new splice site, HSF evaluates the 'creation of cryptic splice sites'. As shown in Table 1 for intronic mutations, HSF correctly predicted the creation of cryptic splice sites in the RB1 mutants c.607+1delG, c.138-8T>G and c.501-1G>A. Mutations in canonical sequences, such as c.95-2A>G, c.1397-2A>G and c.1397-1G>A in TGFBR2; c.2293+2T>C, c.3463+1G>A, c.4747+5G>T and c.5788+1G>A in FBN1; and c.1815-2A>G, c.2107-2A>G and c.2211+1G>C in RB1, led to a more complex splicing defect in which disruption of the wt splice site was coupled to the usage of an alternative pre-existing splice site. As mutations do not directly affect alternative splice sites, this phenomenon was not automatically investigated by HSF. Therefore, to identify the alternative splice sites, we chose in 'Select an analysis type' the option 'Number of nucleotides surrounding the exon' and entered the value '100'. In addition, we checked the advanced parameter 'Process sequence' and selected the 'Full sequence' option. To analyze only splice sites, we then selected in 'All or subset of matrices' the 'Splice site matrices' option. Using these parameters, all alternative sites were identified either as the closest and strongest alternative sites (five cases) or as the second-best sites (two cases). Overall, HSF correctly predicted the impact of mutations affecting 5' ss or 3' ss, even when complex mechanisms were involved.

In addition to splicing defects due to 5' ss or 3' ss disruption, it is well known that exonic mutations could result in the creation or activation of cryptic splice sites. As shown in Table 2, the nine mutations affecting the last base of an exon had a strong effect on the activity of the concerned 5' ss (ΔCV = -12 ± 0.7%) that resulted in exon skipping or activation of a cryptic splice site. The two mutations affecting the penultimate nucleotide of an exon had a limited effect on the activity of the 5' ss (ΔCV = -5.4 ± 0.3%). Indeed, these mutations were pathogenic only when a cryptic splice site was activated and therefore predictions were hazardous. Finally, exonic mutations that were distant both from the 5' and 3' ss could activate a cryptic splice site and result in splicing defects, as shown for mutations c.658C>G in RB1, c.1915C>T in MSH2 and c.5985T>G in DMD.
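The CV variation used throughout this section is the relative change between the wild-type and mutant consensus values. The short sketch below shows the arithmetic and the decision rule stated in the text (a reduction of at least 10% in any position, or of at least 7% in position +4). The function names are hypothetical and the rule is a reading of the prose, not code taken from HSF. The numeric inputs in the usage lines are rows of Table 1.

```python
from typing import Optional


def delta_cv(wt_cv: float, mutant_cv: float) -> float:
    """Relative CV variation in percent (negative when the site is weakened)."""
    return 100.0 * (mutant_cv - wt_cv) / wt_cv


def likely_affects_splicing(wt_cv: float, mutant_cv: float,
                            position: Optional[int] = None) -> bool:
    """Apply the cut-offs from the text: -7% at position +4, -10% elsewhere."""
    cutoff = -7.0 if position == 4 else -10.0
    return delta_cv(wt_cv, mutant_cv) <= cutoff


# Rows taken from Table 1 (WT CV, mutant CV):
fbn1_plus1 = delta_cv(82.26, 55.42)                               # FBN1 c.247+1G>A
rb1_plus4 = likely_affects_splicing(91.34, 84.93, position=4)     # RB1 c.264+4delA
mlh1_minus5 = likely_affects_splicing(80.80, 77.17, position=-5)  # MLH1 c.791-5T>G
```

Under these assumptions the +1 substitution yields a variation of about -32.6%, the +4 deletion is flagged because of the dedicated -7% cut-off, and the -5 substitution (about -4.5%) is not flagged.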

Branch point sequences

We analyzed 14 BP sequences previously reported to be abolished by mutations. As shown in Table 3, 13 out of 14 BPs were correctly predicted by HSF, with an average strength of 83.4 and a standard deviation of 8.6. The only discrepancy concerned the mutation localized in intron 3 of GH1, for which the BP was predicted to be at position -26 by HSF instead of position -21. Note that in both cases the BP was located within the c.468-37_468-16del, which is responsible for the cases of autosomal dominant isolated GH deficiency (IGHDII) in one single family, and therefore additional data are needed to identify the functional BP. Among the other BP sequences, 12 were reported as targets of point mutations leading to their inactivation. In six cases, the mutation involved the critical adenosine residue, leading to a remarkable ΔBP of -29.6%. For mutations involving residues surrounding the BP, the average ΔBP was -13.9% with a SD of 3%. Taking into account the weight matrix (Figure 1) and experimental data, the threshold for BP prediction was thus set at 67.

Table 1. Continued

| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) | Cryptic site (CS): CV value(s), type and location |
|---|---|---|---|---|---|---|
| RB1 | c.138-8T>G (continued) | | | | | 55.35, 84.29; 3' CS (7 nt upstream)^c |
| RB1 | c.501-1G>A | (57) | 97.50 | 68.55 | -29.69 | 54.82, 83.77; 3' CS (1 nt downstream)^c |
| RB1 | c.607+1delG | (57) | 99.05 | 22.54 | -77.24 | 42.51, 88.47; 5' CS (1 nt upstream)^c |
| RB1 | c.1815-2A>G | (57) | 75.42 | 46.47 | -38.39 | 81.84; 3' CS (19 nt downstream)^b |
| RB1 | c.2107-2A>G | (57) | 80.73 | 51.78 | -35.86 | 69.56; 3' CS (35 nt downstream)^b |
| TGFBR2 | c.95-2A>G | (56) | 91.77 | 62.82 | -31.55 | 68.28; 3' CS (18 nt downstream)^b |
| TGFBR2 | c.1397-2A>G | (89) | 92.32 | 63.38 | -31.35 | 84.32; 3' CS (30 nt upstream)^b |
| TGFBR2 | c.1397-1G>A | (90) | 92.32 | 63.38 | -31.35 | 84.32; 3' CS (30 nt upstream)^b |

CS, cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
a: The mutation induces exon skipping.
b: A cryptic splice site not created by the mutation and used in vivo was correctly predicted by HSF.
c: The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.


Auxiliary splicing sequences: enhancers and silencers

In order to simplify the interpretation of predictions obtained with the different algorithms using weight matrices, we used a normalized range scale from 0 to 100. As a consequence, previous matrices from ESE-Finder (19,27) were modified. Nevertheless, the user can define the thresholds using either the original ESE-Finder range or the new 0–100 range. In addition, when processing a single sequence and when CVs are available, HSF calculates the deviation as a percentage of the threshold. A reduced list can be obtained for each matrix by choosing the 'Only variant' option in 'Advanced parameters'. A color code is used for each quartile (from white to orange) to simplify the analysis. When comparing mutant sequences, HSF uses this color code to indicate the differences between the two sequences. When scalability is not possible, HSF only displays the presence of a motif.

To evaluate the sensitivity and usefulness of auxiliary splicing sequence predictions, we used a first set of genes for which 20 mutations have been reported to result in exon skipping following targeting of ESE or ESS (58–66). For each mutation, we selected the default option that allows HSF to predict modifications of ESE and/or ESS motifs using all available matrices (Table 4). For mutation c.362C>T in ACADM or c.4250T>A in DMD, for which the target auxiliary sequences have been experimentally characterized (SF2/ASF and hnRNPA1, respectively), HSF correctly predicted the effect of the mutation. For other sequences, different scenarios were predicted: (i) disruption of one or more ESE without creation of an ESS, as observed for mutations c.882C>T (MLH1), c.362C>T (ACADM), c.8165C>G and

Table 2. Exonic mutations in DMD (ENST00000357033), MLH1 (ENST00000231790), MSH2 (ENST00000233146) and RB1 (ENST00000267163) involved in splicing

| Gene | Mutation | Position | References | WT CV | Mutant CV | CV variation (%) | Cryptic site (CS) |
|---|---|---|---|---|---|---|---|
| DMD | c.5985T>G | Deep exonic | (91) | 46.65 | 75.59 | | 3' CS (63 nt downstream)^a |
| MLH1 | c.677G>A | Last base | (58) | 84.46 | 73.89 | -12.52^b | |
| MLH1 | c.882C>T | Exonic | (58) | 84.46 | 73.89 | -12.52^b | |
| MLH1 | c.1037A>G | Penultimate base | (58) | 93.04 | 88.19 | -5.22 | 5' CS (upstream^c) |
| MLH1 | c.1038G>T | Last base | (58) | 93.04 | 82.17 | -11.68 | 5' CS (upstream^c) |
| MLH1 | c.1667G>T | Last base | (92) | 85.85 | 74.99 | -12.66 | 5' CS (88 nt downstream)^a |
| MLH1 | c.1731G>A | Last base | (58) | 93.27 | 82.69 | -11.34 | |
| MLH1 | c.1989G>T | Last base | (58) | 93.22 | 82.35 | -11.66 | |
| MSH2 | c.1660A>T | Penultimate base | (58) | 84.00 | 79.25 | -5.65 | 5' CS (82 nt upstream)^a |
| MSH2 | c.1759G>C | Last base | (58) | 85.66 | 74.65 | -12.86^b | |
| MSH2 | c.1915C>T | Deep exonic | (59) | 62.19 | 89.02 | | 5' CS (92 nt upstream)^a |
| RB1 | c.658C>G | Deep exonic | (57) | 58.66 | 85.49 | | 5' CS (61 nt upstream)^a |
| RB1 | c.939G>T | Last base | (57) | 83.75 | 72.88 | -12.98^b | |
| RB1 | c.1960G>C | Last base | (57) | 94.02 | 83.01 | -11.71^b | |
| RB1 | c.1960G>A | Last base | (57) | 94.02 | 83.44 | -11.25^b | |

CS, cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
a: The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.
b: The mutation induces exon skipping.
c: The cryptic splice site used in vitro was not clearly reported and therefore was not available for comparison.

Table 3. Branch point sequences

| Gene | Intron | References | Ref BP | Ref Seq | HSF BP | HSF value |
|---|---|---|---|---|---|---|
| COL5A1 | 32 | (93) | -27 | ENST00000355306 | -27 | 87.81 |
| DYSF | 31 | (94) | -33 | ENST00000258104 | -33 | 93.13 |
| FBN2 | 30 | (95) | -24 | ENST00000262464 | -24 | 77.06 |
| GH1 | 3 | (96) | -21 | ENST00000323322 | -26 | 73.36 |
| ITGB4 | 31 | (97) | -17 | ENST00000200181 | -17 | 93.79 |
| LCAT | 4 | (98) | -20 | ENST00000264005 | -20 | 95.07 |
| LDLR | 9 | (99) | -25 | ENST00000252444 | -25 | 86.59 |
| NPC1 | 6 | (100) | -28 | ENST00000269228 | -28 | 77.41 |
| PMM2 | 2 | (101) | -25 | ENST00000268261 | -25 | 80.56 |
| PMM2 | 7 | (101) | -23 | ENST00000268261 | -23 | 72.27 |
| RB1 | 23 | (57) | -26 | ENST00000267163 | -26 | 75.89 |
| TH | 11 | (102) | -22 | ENST00000324155 | -22 | 84.96 |
| TSC2 | 38 | (103) | -18 | ENST00000219476 | -18 | 67.71 |
| XPC | 3 | (76) | -24 | ENST00000285021 | -24 | 82.78 |

For each gene, the reference sequence from the Ensembl genome database (Ref Seq), the intron number (Intron) and the position of the BP identified by in vitro experiments (Ref BP), as well as the BP position predicted by HSF (HSF BP) and the corresponding BP value (HSF value), are shown.


Table 4. Exonic mutations known to result in exon skipping through ESE inactivation or ESS activation

| Gene | Mutation | Ref. | Motif | Ref Seq | HSF prediction |
|---|---|---|---|---|---|
| ACADM | c.362C>T | (65) | -ESE (SF2/ASF) | ENST00000370841 | -9G8^i (357_362); -SF2/ASF^e (358_364); +EIE^h (359_364); -SRp40^e (359_365); -EIE^h (360_365); +IIE^c ×4 (359_367) |
| BRCA1 | c.5080G>T | (64) | | ENST00000357654 | -EIE^h (5075_5080); +SRp55^e (5076_5081); -9G8^i (5077_5082); -SF2/ASF^e (5078_5085); -IIE^c (5078_5083); +IIE^c (5079_5084); -ESS^a (5076_5083); +hnRNPA1^d (5080_5085) |
| BRCA2 | c.8165C>G | (62) | -ESE | ENST00000380152 | -SRp40^e (8162_8168); -ESE^f (8163_8168); +ESE^f ×2 (8164_8170); -SRp55^e (8163_8169); -SF2/ASF^e (8165_8171); -EIE^h ×4 (8160_8168) |
| BRCA2 | c.5081G>T | (64) | | ENST00000380152 | +SC35^e (5075_5082); +SRp40^e (5080_5086); -ESE^f,h ×2 (5080_5086); -9G8^i (5081_5086); -ESS^a (5078_5085) |
| DMD | c.4250T>A | (61) | +ESS (hnRNPA1) | ENST00000357033 | +9G8^i ×2 (4246_4251) (4248_4253); -EIE^h (4248_4253); +ESE^f (4250_4255); -IIE^c ×3 (4246_4253); +hnRNPA1^d (4249_4254) |
| MLH1 | c.544A>G | (59) | | ENST00000231790 | +ESS^a (537_545); 5' ss CV=630 |
| MLH1 | c.793C>T | (58) | | ENST00000231790 | +ESS^a (795_802) |
| MLH1 | c.794G>A | (58) | | ENST00000231790 | -SRp40^e (793_799); -SC35^e (794_801); +ESS^c (794_799) |
| MLH1 | c.882C>T | (58) | | ENST00000231790 | +SC35^e (876_883); -SRp55^e (877_882) |
| MLH1 | c.988_990del | (58) | | ENST00000231790 | +SF2/ASF^e (983_989); -SRp55^e (985_990); +9G8^i (985_990); -ESS^a (985_992) |
| MSH2 | c.815C>T | (58) | | ENST00000233146 | -SRp55^e (813_818); +ESS^a (813_820); +ESS^c ×5 (801_819) |
| MSH2 | c.274_276del | (58) | | ENST00000233146 | +SC35^e (272_279); +SRp40^e ×2 (274_285); -IIE^c ×2 (274_280) |
| LAMA2 | c.2230C>T | (60) | | ENST00000354729 | -SF2/ASF^e (2226_2232); +ESS^c (2228_2235); +IIE^c ×2 (2229_2235); +ESS^a (2230_2237) |
| NF1 | c.557A>T | (66) | -ESE | ENST00000356175 | -SRp55^e (552_557); -ESE^f (552_557); -EIE^h ×4 (552_560); -9G8^i (553_558); +ESS^a ×2 (550_557) (555_562) |
| NF1 | c.910C>T | (66) | -ESE | ENST00000356175 | -9G8^i (905_910); -EIE^h (905_910); +ESE^f (908_913); -ESE^f (910_915); -ESS^a (906_913) |
| NF1 | c.943C>T | (66) | -ESE | ENST00000356175 | -SC35^e (941_948); -SF2/ASF^e (943_949); -PESE^g (942_949); -9G8^i (938_943); +hnRNPA1^d (943_948); +IIE^c (942_947) |

(continued; the '-' and '×' markers, the superscript letters and the 'CV=630' value are explained in the notes that follow the continuation of this table. The '-' signs and '×' multipliers were lost in extraction and are restored here from those notes; the value 'CV=630' is reproduced as extracted because its decimal position cannot be recovered.)


c.5081G>T (BRCA2), c.557A>T and c.910C>T (NF1); (ii) creation of one or more ESS without disruption of an ESE, as shown for mutations c.544A>G and c.793C>T (MLH1), c.4250T>A (DMD) and c.6792C>G (NF1); and (iii) an intermediate situation where both the disruption of one or more ESE and the creation of one or more ESS were predicted. This was observed for mutations c.5080G>T (BRCA1), c.794G>A and c.988_990del (MLH1), c.815C>T and c.274_276del (MSH2), c.2230C>T (LAMA2), c.943C>T, c.1007G>A and c.5719G>T (NF1). In order to evaluate the potential to differentiate 'true' ESE or ESS motifs from false positive signals, we selected a second set of 36 mutations (positive controls) and 220 SNPs (negative controls) (Supplementary Table 1). Predictions were classified in three categories: disruption of ESE motifs only (ESE), creation of ESS motifs only (ESS) or both (ESE+ESS). In addition, results were classified in two subsets: a first one (All), which included all predicted motifs, and a second one (Best), which was restricted to only one motif for each case by selecting the one recognized by the highest number of matrices.

Comparison of the three categories (ESE, ESS and ESE+ESS) revealed a significant difference between positive and negative controls, both in the 'All' (χ² = 10.05, P = 0.00656) and the 'Best' subset (χ² = 11.75, P = 0.0028). We then evaluated the potential of each matrix to differentiate true from false positive signals. No statistical differences were found using the Sironi, PESS, IIE, hnRNPA1 and RESCUE-ESE matrices. A statistically significant difference was found for the 'All' subset (χ² = 3.99, P = 0.045) but not for the 'Best' subset (χ² = 2.47, P = 0.116) with the EIE matrix. Significant results in both subsets were obtained with ESE-Finder ('All' subset: χ² = 5.17, P = 0.023; 'Best' subset: χ² = 7.33, P = 0.0067), the 9G8 and Tra2β matrices from HSF ('All' subset: χ² = 9.92, P = 0.00164; 'Best' subset: χ² = 9.86, P = 0.00169) and PESE ('All' subset: χ² = 19.52, P = 9.95 × 10^-6; 'Best' subset: χ² = 13.52, P = 2.36 × 10^-4). The positive (PPV) and negative (NPV) predictive values, as well as the sensitivity (Sv) and the specificity (Sp), of these last three matrices were then evaluated. PPV ranged from 0.22 (9G8 and Tra2β) to 0.56 (PESE), NPV from 0.76 (PESE) to 0.95 (9G8 and Tra2β), Sv from 0.27 (PESE) to 0.40 (9G8 and Tra2β) and Sp from 0.88 (9G8 and Tra2β) to 0.91 (PESE). The ESE-Finder matrix showed intermediate values in all cases.
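The reported P-values can be cross-checked from the χ² statistics with the Python standard library alone. The degrees of freedom are not stated in the text. The sketch below assumes 2 degrees of freedom for the three-category comparison (ESE, ESS, ESE+ESS versus positive/negative controls) and 1 degree of freedom for each single-matrix comparison. The helper name `chi2_upper_tail` is hypothetical, and the closed forms used are the standard ones for these two cases.

```python
import math


def chi2_upper_tail(statistic: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic for df = 1 or df = 2."""
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    if df == 1:
        # Chi-square with 1 df is the square of a standard normal variable.
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        # Chi-square with 2 df is an exponential distribution with mean 2.
        return math.exp(-statistic / 2.0)
    raise ValueError("only df = 1 and df = 2 are handled in this sketch")


p_all = chi2_upper_tail(10.05, df=2)       # 'All' subset, three categories
p_best = chi2_upper_tail(11.75, df=2)      # 'Best' subset, three categories
p_pese_all = chi2_upper_tail(19.52, df=1)  # PESE matrix, 'All' subset
```

Under these assumptions the computed values agree with the reported ones to the precision given in the text (for instance about 0.0066 for χ² = 10.05 and about 0.0028 for χ² = 11.75), which supports the reading of the garbled statistics above.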

DISCUSSION

During evolution from simple to higher eukaryotes, splicing signals evolved from well-defined motifs to degenerated sequences with the addition of new auxiliary splicing

Table 4. Continued

| Gene | Mutation | Ref. | Motif | Ref Seq | HSF prediction |
|---|---|---|---|---|---|
| NF1 | c.1007G>A | (66) | -ESE | ENST00000356175 | +PESE^g (1007_1014); -EIE^h ×2 (1003_1011); +9G8^i (1006_1011); +ESE^f (1007_1014); -ESS^a ×2 (1003_1011); -IIE^c ×4 (1003_1011); +hnRNPA1^d (1006_1011) |
| NF1 | c.5719G>T | (66) | -ESE | ENST00000356175 | -ESE^f ×5 (5715_5724); -EIE^h ×5 (5715_5724); -ESS^a ×2 (5714_5725); +PESS^g ×2 (5712_5720); +hnRNPA1^d (5719_5724) |
| NF1 | c.6792C>A | (66) | -ESE | ENST00000356175 | +ESE^f ×5 (6792_6797); -EIE^h ×2 (6788_6793) (6790_6795); +Tra2-β^i (6791_6795); -ESS^a ×2 (6787_6794) (6792_6799) |
| NF1 | c.6792C>G | (66) | -ESE | ENST00000356175 | +ESE^f (6792_6797); -EIE^h ×2 (6788_6793) (6790_6795); +Tra2-β^i (6791_6795); -ESS^a ×2 (6787_6794) (6792_6799); +hnRNPA1^d (6790_6795) |

+, a new site was created by the mutation; -, the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were:
a: Silencer motifs from Sironi et al. (31)
b: PESS octamers (28)
c: IIEs (30)
d: hnRNP motifs from HSF
e: ESE Finder matrices (19)
f: RESCUE ESE hexamers (63)
g: PESE octamers (28)
h: EIEs (30)
i: ESE motifs from HSF
When multiple adjacent sites were predicted, the number of sites is indicated: '×5' means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon. The '-' signs and '×' multipliers were lost in extraction and are restored here from these notes.


sequences known as ESE and ESS. Although major SR proteins have been cloned and their target sites determined, much work remains to be done to understand how splice signals are recognized and splicing specificity achieved. As this complex world is progressively revealed, bioinformatics resources could play a major role in helping researchers and diagnostic laboratories to evaluate the consequence of mutations on splicing, especially because most genetic tests use DNA and not RNA samples. By giving easy access to predictions of 5′ss, 3′ss and BP sequences as well as ESE and ESS, the HSF tool (http://www.umd.be/HSF) fulfills this need and may assist clinicians, geneticists and researchers (70–75). By combining motifs identified with different experimental and computational approaches, it provides a common interface that can be used for sequence analysis. The inclusion of all exons and introns extracted from the Ensembl human genome database (20) allows easy access to any sequence of human genes and thus direct comparison of virtually every mutation or SNP concerning splicing elements. Since SNPs are present at a very high frequency in the genome (1/300 bp), it could be useful to evaluate their impact in association with a mutation. We therefore included in HSF data from dbSNP using Ensembl BioMart. The user can select the ‘Search for SNPs related to the analyzed sequence’ option, which automatically retrieves SNPs from the database. When SNPs are localized in exons, their effect on ESE and ESS motifs could help the user to better evaluate the consequence of a given mutation.

To evaluate the efficiency of the various algorithms included in HSF and its contribution to the prediction of the consequences of mutations associated with a splicing defect, we used a set of 69 intronic mutations that disrupt the 5′ss or the 3′ss and result in exon skipping and/or activation of a cryptic splice site (Table 1) and a group of 15 mutations that were previously reported to result in splicing defects by creating or activating cryptic splice sites (Table 2). HSF was able to correctly predict the disruption of the natural splice sites. Moreover, we could confirm that (i) mutations of the last nucleotide of an exon have a strong effect on the 5′ss (ΔCV = −12 ± 0.7%), resulting frequently in exon skipping or partial exonic deletion or intronic retention due to activation of a cryptic splice site; (ii) mutations of the penultimate exonic nucleotide have limited consequences on the 5′ss (ΔCV = −5.4 ± 0.3%), but they can activate a cryptic splice site, making predictions more difficult; (iii) exonic mutations distant from the 5′ and 3′ss can activate a cryptic splice site, leading to partial exonic deletion. Overall, these findings underline the efficiency of the HSF algorithm to predict the effect of mutations on 5′ and 3′ss. When using the HSF algorithm, the threshold for 5′ and 3′ss is 65, with a pathogenic ΔCV of −10%, except for position +4, where it is −7%. However, in the few cases where unusual splice sites are used, this algorithm could be less efficient.

BP sequences represent another essential splicing signal. When a mutation is located close to the 5′ side of the 3′ss, its potential effect on a BP sequence should be examined, especially when a nucleotide located less than 85 bp from the 3′ss is targeted. In order to evaluate the HSF algorithm dedicated to the identification of BP sequences, we used 14 BP sequences inactivated by intronic mutations (Table 3). HSF correctly predicted 13 out of 14 BPs, and these data allowed us to define the threshold for BP detection at 67 and the pathogenic variation of the BP score at −10%. Moreover, for intron 3 of XPC, HSF predicted a BP at position −24. However, according to Khan et al. (76), two BP sequences are present in this intron, one at position −24 and another at −4. HSF could not predict the BP at position −4 simply because the HSF algorithm excludes positions −12 to −1 for BP identification, because of steric obstruction caused by the spliceosome.
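The cut-offs quoted above (a CV threshold of 65 for 5′ and 3′ss, a BP threshold of 67, a pathogenic variation of −10%, relaxed to −7% at position +4, and the exclusion of positions −12 to −1 from the BP search) can be summarized in a few lines of Python. This is only an illustrative sketch: the function and constant names are hypothetical, the rule that a site below the threshold is not evaluated is an assumption, and nothing here reproduces the HSF implementation itself.

```python
from typing import Optional

# Cut-offs quoted in the text; all names below are hypothetical.
SITE_THRESHOLD = 65.0            # minimum CV for a candidate 5'ss or 3'ss
BP_THRESHOLD = 67.0              # minimum score for a candidate branch point
PATHOGENIC_DELTA = -10.0         # % variation regarded as likely pathogenic
PATHOGENIC_DELTA_PLUS4 = -7.0    # relaxed cut-off for intronic position +4
BP_EXCLUDED_POSITIONS = range(-12, 0)  # positions -12..-1 relative to the 3'ss


def splice_site_likely_disrupted(wt_cv: float, delta_pct: float,
                                 intron_pos: Optional[int] = None) -> bool:
    """Flag a natural splice site whose CV variation reaches the cut-off."""
    if wt_cv < SITE_THRESHOLD:
        # Assumption: a wild-type site below the threshold is not evaluated.
        return False
    cutoff = PATHOGENIC_DELTA_PLUS4 if intron_pos == 4 else PATHOGENIC_DELTA
    return delta_pct <= cutoff


def branch_point_likely_disrupted(wt_score: float, delta_pct: float) -> bool:
    """Same idea for branch points, with the BP-specific threshold."""
    return wt_score >= BP_THRESHOLD and delta_pct <= PATHOGENIC_DELTA


def bp_position_is_scanned(pos_from_3ss: int) -> bool:
    """Positions -12 to -1 upstream of the 3'ss are excluded from the search."""
    return pos_from_3ss not in BP_EXCLUDED_POSITIONS
```

With these definitions, a candidate BP at position −24 is scanned whereas one at position −4 is not, which mirrors the XPC intron 3 case discussed above.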

It has been demonstrated that two different splicing recognition mechanisms, correlated with intron length, can be used in a cell: exon definition for long introns and intron definition for short introns (77). Although the influence of intron length seems to be less important in humans than in other species, it should nevertheless be kept in mind, since U12- and U2-type introns have different BP consensus sequences. In the present version of HSF (v2.4), we only focused on U2-type introns, which are by far the most abundant type in mammalian cells.

Concerning cis-acting elements, many studies have been performed to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets only share partial homology. To solve this problem, HSF included all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNP A1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and therefore the identification of the specific motif/protein pair involved in a given splicing defect is difficult. This is even more complicated when considering the impact of SR and ribonucleoprotein concentration in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the efficiency of HSF to correctly predict motifs disrupted by these mutations. We showed that when the motif/protein pairs had been previously experimentally characterized (hnRNP A1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems) and therefore the motif/protein couple is unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate HSF efficiency to discriminate true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE + ESS; χ² = 11.75, P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33, P = 0.0067), 9G8 and Tra2-β from HSF (χ² = 9.86, P = 0.0017) and PESE (χ² = 13.52, P = 2.36 × 10⁻⁴). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls.
Indeed, only few experimental validations of auxiliary sequences are available, and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2-β HSF matrices gave stronger results than ESE-Finder itself and therefore can be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNP A1 matrix, should also be considered, as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif predictions, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address the tissue or developmental specificity.
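The χ² statistics reported above can be converted back into P-values as a quick consistency check. The sketch below assumes 2 degrees of freedom for the overall pattern (three categories: ESE, ESS, ESE + ESS) and 1 degree of freedom for each single matrix; these degrees of freedom are not stated in the text and are an assumption, and the function name is hypothetical. It relies only on closed forms of the χ² survival function available in the Python standard library.

```python
import math


def chi2_p_value(chi2: float, df: int) -> float:
    """Upper-tail P-value of a chi-square statistic for df = 1 or df = 2.

    df = 1: P = erfc(sqrt(chi2 / 2))
    df = 2: P = exp(-chi2 / 2)
    """
    if chi2 < 0:
        raise ValueError("chi2 must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


# (label, chi-square value, assumed degrees of freedom)
reported = [
    ("overall pattern", 11.75, 2),
    ("ESE-Finder", 7.33, 1),
    ("9G8 and Tra2-beta (HSF)", 9.86, 1),
    ("PESE", 13.52, 1),
]
for label, chi2, df in reported:
    print(f"{label}: chi2 = {chi2}, df = {df}, P = {chi2_p_value(chi2, df):.2e}")
```

Under these assumptions the recomputed P-values fall close to the published ones; small differences are expected because the χ² values are rounded to two decimals.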

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5′ and 3′ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies, such as the exon-skipping approach in Duchenne Muscular Dystrophy, are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world and especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects which either aim at developing a holistic view on Genotype-To-Phenotype data (GEN2PHEN European project; http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81), might benefit from using HSF. Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754 (the GEN2PHEN project); the European Community Sixth Framework Program (FP6) under grant agreement number 036825 (TREAT-NMD Network of Excellence). Funding for open access charge: Institut National de la Santé et de la Recherche Médicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget SM, Moore C and Sharp PA (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci USA 74, 3171–3175.

2. Nilsen TW (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays 25, 1147–1149.

3. Zhou Z, Licklider LJ, Gygi SP and Reed R (2002) Comprehensive proteomic analysis of the human spliceosome. Nature 419, 182–185.

4. Breitbart RE, Nguyen HT, Medford RM, Destree AT, Mahdavi V and Nadal-Ginard B (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell 41, 67–82.

5. Maniatis T and Tasic B (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418, 236–243.

6. Cartegni L, Chew SL and Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3, 285–298.

7. Robberson BL, Cote GJ and Berget SM (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol Cell Biol 10, 84–94.

8. Jacob M and Gallinaro H (1989) The 5′ splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res 17, 2159–2180.

9. Blencowe BJ (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci 25, 106–110.

10. Zhu J, Mayeda A and Krainer AR (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol Cell 8, 1351–1361.

11. Zhang XH, Leslie CS and Chasin LA (2005) Computational searches for splicing signals. Methods 37, 292–305.

12. Bhasi A, Pandey RV, Utharasamy SP and Senapathy P (2007) EuSplice: a unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics 23, 1815–1823.

13. Churbanov A, Rogozin IB, Deogun JS and Ali H (2006) Method of predicting splice sites based on signal interactions. Biol Direct 1, 10.

14. Dunckley MG, Manoharan M, Villiet P, Eperon IC and Dickson G (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum Mol Genet 7, 1083–1090.

15. Wilton SD and Fletcher S (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr Gene Ther 5, 467–483.

16. Beroud C, Hamroun D, Collod-Beroud G, Boileau C, Soussi T and Claustres M (2005) UMD (Universal Mutation Database): 2005 update. Hum Mutat 26, 184–191.

17. Beroud C, Collod-Beroud G, Boileau C, Soussi T and Junien C (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum Mutat 15, 86–94.

18. Fairbrother WG, Yeo GW, Yeh R, Goldstein P, Mawson M, Sharp PA and Burge CB (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res 32, W187–W190.

19. Cartegni L, Wang J, Zhu Z, Zhang MQ and Krainer AR (2003) ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res 31, 3568–3571.

20. Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T et al. (2008) Ensembl 2008. Nucleic Acids Res 36, D707–D714.

21. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 36, D773–D779.

22. Shapiro MB and Senapathy P (1987) RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res 15, 7155–7174.

23. Yeo G and Burge CB (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11, 377–394.

24. Green MR (1991) Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. Annu Rev Cell Biol 7, 559–599.

25. Gooding C, Clark F, Wollerton MC, Grellscheid SN, Groom H and Smith CW (2006) A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol 7, R1.

26. Kol G, Lev-Maor G and Ast G (2005) Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum Mol Genet 14, 1559–1568.

27. Smith PJ, Zhang C, Wang J, Chew SL, Zhang MQ and Krainer AR (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum Mol Genet 15, 2490–2508.

28. Zhang XH and Chasin LA (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev 18, 1241–1250.

29. Goren A, Ram O, Amit M, Keren H, Lev-Maor G, Vig I, Pupko T and Ast G (2006) Comparative analysis identifies exonic splicing regulatory sequences–The complex definition of enhancers and silencers. Mol Cell 22, 769–781.

30. Zhang C, Li WH, Krainer AR and Zhang MQ (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc Natl Acad Sci USA 105, 5797–5802.

31. Sironi M, Menozzi G, Riva L, Cagliani R, Comi GP, Bresolin N, Giorda R and Pozzoli U (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res 32, 1783–1791.

32. Wang Z, Rolish ME, Yeo G, Tung V, Mawson M and Burge CB (2004) Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845.

33. Thompson JD, Higgins DG and Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.

34. Bailey TL, Williams N, Misleh C and Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34, W369–W373.

35. Yuan B, Thomas JP, von Kodolitsch Y and Pyeritz RE (1999) Comparison of heteroduplex analysis, direct sequencing and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum Mutat 14, 440–446.

36. Youil R, Toner TJ, Bull E, Bailey AL, Earl CD, Dietz HC and Montgomery RA (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum Mutat 16, 92–93.

37. Schrijver I, Liu W, Odom R, Brenn T, Oefner P, Furthmayr H and Francke U (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am J Hum Genet 71, 223–237.

38. Rommel K, Karck M, Haverich A, Schmidtke J and Arslan-Kirchner M (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum Mutat 20, 406–407.

39. Park ES, Putnam EA, Chitayat D, Child A and Milewicz DM (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am J Med Genet 78, 350–355.

40. Palz M, Tiecke F, Booms P, Goldner B, Rosenberg T, Fuchs J, Skovby F, Schumacher H, Kaufmann UC, von Kodolitsch Y et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am J Med Genet 91, 212–221.

41. Nijbroek G, Sood S, McIntosh I, Francomano CA, Bull E, Pereira L, Ramirez F, Pyeritz RE and Dietz HC (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am J Hum Genet 57, 8–21.

42. McGrory J and Cole WG (1999) Alternative splicing of exon 37 of FBN1 deletes part of an ‘eight-cysteine’ domain resulting in the Marfan syndrome. Clin Genet 55, 118–121.

43. Loeys B, Nuytinck L, Delvaux I, De Bie S and De Paepe A (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch Intern Med 161, 2447–2454.

44. Liu WO, Oefner PJ, Qian C, Odom RS and Francke U (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms and sequence variants in Marfan syndrome and related connective tissue disorders. Genet Test 1, 237–242.

45. Hutchinson S, Wordsworth BP and Handford PA (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum Genet 109, 416–420.

46. Halliday D, Hutchinson S, Kettle S, Firth H, Wordsworth P and Handford PA (1999) Molecular analysis of eight mutations in FBN1. Hum Genet 105, 587–597.

47. Gupta PA, Wallis DD, Chin TO, Northrup H, Tran-Fadulu VT, Towbin JA and Milewicz DM (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J Med Genet 41, e56.

48. Gupta PA, Putnam EA, Carmical SG, Kaitila I, Steinmann B, Child A, Danesino C, Metcalfe K, Berry SA, Chen E et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum Mutat 19, 39–48.

49. Guo D, Tan FK, Cantu A, Plon SE and Milewicz DM (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am J Med Genet 101, 130–134.

50. Dietz HC, McIntosh I, Sakai LY, Corson GM, Chalberg SC, Pyeritz RE and Francomano CA (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics 17, 468–475.

51. Comeglio P, Johnson P, Arno G, Brice G, Evans A, Aragon-Martin J, da Silva FP, Kiotsekoglou A and Child A (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum Mutat 28, 928.

52. Collod-Beroud G, Le Bourdelles S, Ades L, Ala-Kokko L, Booms P, Boxer M, Child A, Comeglio P, De Paepe A, Hyland JC et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum Mutat 22, 199–208.

53. Chikumi H, Yamamoto T, Ohta Y, Nanba E, Nagata K, Ninomiya H, Narasaki K, Katoh T, Hisatome I, Ono K et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J Hum Genet 45, 115–118.

54. Biggin A, Holman K, Brett M, Bennetts B and Ades L (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum Mutat 23, 99.

55. Attanasio M, Lapini I, Evangelisti L, Lucarini L, Giusti B, Porciani M, Fattori R, Anichini C, Abbate R, Gensini G et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin Genet 74, 39–46.

56. Loeys BL, Chen J, Neptune ER, Judge DP, Podowski M, Holm T, Meyers J, Leitch CC, Katsanis N, Sharifi N et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat Genet 37, 275–281.

57. Houdayer C, Dehainault C, Mattler C, Michaux D, Caux-Moncoutier V, Pages-Berhouet S, d’Enghien CD, Lauge A, Castera L, Gauthier-Villars M et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum Mutat 29, 975–982.

58. Tournier I, Vezain M, Martins A, Charbonnier F, Baert-Desurmont S, Olschwang S, Wang Q, Buisine MP, Soret J, Tazi J et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum Mutat 29, 1412–1424.

59. Auclair J, Busine MP, Navarro C, Ruano E, Montmain G, Desseigne F, Saurin JC, Lasset C, Bonadona V, Giraud S et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum Mutat 27, 145–154.

60. Di Blasi C, He Y, Morandi L, Cornelio F, Guicheney P and Mora M (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain 124, 698–704.

61. Disset A, Bourgeois CF, Benmalek N, Claustres M, Stevenin J and Tuffery-Giraud S (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum Mol Genet 15, 999–1013.

62. Fackenthal JD, Cartegni L, Krainer AR and Olopade OI (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am J Hum Genet 71, 625–631.

63. Fairbrother WG, Yeh RF, Sharp PA and Burge CB (2002) Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013.

64. Mazoyer S, Puget N, Perrin-Vidoz L, Lynch HT, Serova-Sinilnikova OM and Lenoir GM (1998) A BRCA1 nonsense mutation causes exon skipping. Am J Hum Genet 62, 713–715.

65. Nielsen KB, Sorensen S, Cartegni L, Corydon TJ, Doktor TK, Schroeder LD, Reinert LS, Elpeleg O, Krainer AR, Gregersen N et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am J Hum Genet 80, 416–432.

66. Zatkova A, Messiaen L, Vandenbroucke I, Wieser R, Fonatsch C, Krainer AR and Wimmer K (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum Mutat 24, 491–501.

67. den Dunnen JT and Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15, 7–12.

68. Frederic MY, Monino C, Marschall C, Hamroun D, Faivre L, Jondeau G, Klein HG, Neumann L, Gautier E, Binquet C et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2) and genotype-phenotype correlations. Hum Mutat 30, 181–190.

69. Frederic MY, Hamroun D, Faivre L, Boileau C, Jondeau G, Claustres M, Beroud C and Collod-Beroud G (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum Mutat 29, 33–38.

70. Frank V, Ortiz Bruchle N, Mager S, Frints SG, Bohring A, du Bois G, Debatin I, Seidel H, Senderek J, Besbas N et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum Mutat 28, 638–639.

71. Anczukow O, Buisson M, Salles MJ, Triboulet S, Longy M, Lidereau R, Sinilnikova OM and Mazoyer S (2008) Unclassified variants identified in BRCA1 exon 11: consequences on splicing. Genes Chromosomes Cancer 47, 418–426.

72. Ng W, Loh AX, Teixeira AS, Pereira SP and Swallow DM (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br J Cancer 99, 978–985.

73. Baala L, Romano S, Khaddour R, Saunier S, Smith UM, Audollent S, Ozilou C, Faivre L, Laurent N, Foliguet B et al. (2007) The Meckel-Gruber syndrome gene, MKS3, is mutated in Joubert syndrome. Am J Hum Genet 80, 186–194.

74. Habara Y, Doshita M, Hirozawa S, Yokono Y, Yagi M, Takeshima Y and Matsuo M (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J Biochem 143, 303–310.

75. Aartsma-Rus A, van Vliet L, Hirschi M, Janson AA, Heemskerk H, de Winter CL, de Kimpe S, van Deutekom JC, ’t Hoen PA and van Ommen GJ (2008) Guidelines for Antisense Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Mol Ther 17, 548–553.

76. Khan SG, Metin A, Gozukara E, Inui H, Shahlavi T, Muniz-Medina V, Baker CC, Ueda T, Aiken JR, Schneider TD et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum Mol Genet 13, 343–352.

77. Sharp PA and Burge CB (1997) Classification of introns: U2-type or U12-type. Cell 91, 875–879.

78. Chasin LA (2007) Searching for splicing motifs. Adv Exp Med Biol 623, 85–106.

79. Nalla VK and Rogan PK (2005) Automated splicing mutation analysis by information theory. Hum Mutat 25, 334–342.

80. Beroud C, Tuffery-Giraud S, Matsuo M, Hamroun D, Humbertclaude V, Monnier N, Moizard MP, Voelckel MA, Calemard LM, Boisseau P et al. (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy. Hum Mutat 28, 196–202.

81. (2007) What is the human variome project? Nat Genet 39, 423.

82. Kainulainen K, Karttunen L, Puhakka L, Sakai L and Peltonen L (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat Genet 6, 64–69.

83. Liu W, Qian C, Comeau K, Brenn T, Furthmayr H and Francke U (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe marfan syndrome. Hum Mol Genet 5, 1581–1587.

84. Booms P, Cisler J, Mathews KR, Godfrey M, Tiecke F, Kaufmann UC, Vetter U, Hagemeier C and Robinson PN (1999) Novel exon skipping mutation in the fibrillin-1 gene: two ‘hot spots’ for the neonatal Marfan syndrome. Clin Genet 55, 110–117.

85. Wang M, Price C, Han J, Cisler J, Imaizumi K, Van Thienen MN, DePaepe A and Godfrey M (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum Mol Genet 4, 607–613.

86. Godfrey M, Vandemark N, Wang M, Velinov M, Wargowski D, Tsipouras P, Han J, Becker J, Robertson W, Droste S et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am J Hum Genet 53, 472–480.

87. Wang M, Clericuzio CL and Godfrey M (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am J Hum Genet 59, 1027–1034.

88. Karttunen L, Ukkonen T, Kainulainen K, Syvanen AC and Peltonen L (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum Mutat Suppl 1, S34–S37.

89. Kosaki K, Takahashi D, Udaka T, Kosaki R, Matsumoto M, Ibe S, Isobe T, Tanaka Y and Takahashi T (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am J Med Genet A 140, 104–108; author reply 109–110.

90. Loeys BL, Schwarze U, Holm T, Callewaert BL, Thomas GH, Pannu H, De Backer JF, Oswald GL, Symoens S, Manouvrier S et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N Engl J Med 355, 788–798.

91. Tran VK, Takeshima Y, Zhang Z, Habara Y, Haginoya K, Nishiyama A, Yagi M and Matsuo M (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes but not in the skeletal muscle of a DMD patient. Hum Genet 120, 737–742.

92. Sharp A, Pichert G, Lucassen A and Eccles D (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum Mutat 24, 272.

93. Burrows NP, Nicholls AC, Richards AJ, Luccarini C, Harrison JB, Yates JR and Pope FM (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am J Hum Genet 63, 390–398.

94. Sinnreich M, Therrien C and Karpati G (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology 66, 1114–1116.

95. Maslen C, Babcock D, Raghunath M and Steinmann B (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am J Hum Genet 60, 1389–1398.

96. Vivenza D, Guazzarotti L, Godi M, Frasca D, di Natale B, Momigliano-Richiardi P, Bona G and Giordano M (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J Clin Endocrinol Metab 91, 980–986.

97. Chavanas S, Gache Y, Vailly J, Kanitakis J, Pulkkinen L, Uitto J, Ortonne J and Meneguzzi G (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum Mol Genet 8, 2097–2105.

98. Kuivenhoven JA, Weibusch H, Pritchard PH, Funke H, Benne R, Assmann G and Kastelein JJ (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J Clin Invest 98, 358–364.

99. Webb JC, Patel DD, Shoulders CC, Knight BL and Soutar AK (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum Mol Genet 5, 1325–1331.

100. Di Leo E, Panico F, Tarugi P, Battisti C, Federico A and Calandra S (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum Mutat 24, 440.

101. Vuillaumier-Barrot S, Le Bizec C, De Lonlay P, Madinier-Chappat N, Barnier A, Dupre T, Durand G and Seta N (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol Genet Metab 87, 337–340.

102. Janssen RJ, Wevers RA, Haussler M, Luyten JA, Steenbergen-Spanjers GC, Hoffmann GF, Nagatsu T and Van den Heuvel LP (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann Hum Genet 64, 375–382.

103. Mayer K, Ballhausen W, Leistner W and Rott H (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim Biophys Acta 1502, 495–507.

Only default parameters are displayed on the submission form, while skilled users can easily access advanced parameters. Sequences stored in the database can be queried using either the gene symbol, the Ensembl gene ID, the Ensembl transcript ID, the RefSeq peptide ID or the consensus CDS. In addition, users can process their own sequences, either for simple sequence analysis or mutant comparison. HSF can also be queried in different ways: full analysis of a sequence, comparison of a mutant and a wild-type sequence, or simultaneous analysis of several mutants, related or not to the same transcript. In this last case, all mutations should refer to sequences included in the HSF database. In order to easily study a group of mutations from different genes and transcripts, each mutation must be described using the international nomenclature system for cDNA mutations (67) (http://www.genomic.unimelb.edu.au/mdi/mutnomen/). HSF will then check that each mutation is correctly described and automatically reconstruct the mutant allele from the wild-type sequence and the mutation name. Since only small rearrangements (i.e. substitutions, small exonic or intronic deletions and insertions, duplications and indels) provide useful information about splicing defects, large rearrangements cannot be processed by HSF.

Moreover, unlike previous resources, the user can specifically analyze BP sequences or splice site motifs using HSF-specific matrices and algorithms.
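The step in which a mutation name is checked and the mutant allele is rebuilt from the wild-type sequence can be illustrated with a deliberately small sketch. It handles only simple exonic substitutions written as `c.<position><ref>><alt>` on a coding sequence numbered from 1; intronic offsets (such as +1 or -2), deletions, insertions, duplications and indels are out of scope. The function name and the regular expression are hypothetical and are not taken from HSF.

```python
import re

# Simple exonic substitution in cDNA nomenclature, e.g. "c.4250T>A".
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_exonic_substitution(wild_type: str, mutation: str) -> str:
    """Check a substitution name against the wild-type cDNA and apply it."""
    match = SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unsupported or malformed mutation name: {mutation!r}")
    position = int(match.group(1))      # 1-based cDNA position
    ref, alt = match.group(2), match.group(3)
    if not 1 <= position <= len(wild_type):
        raise ValueError(f"position {position} is outside the sequence")
    found = wild_type[position - 1].upper()
    if found != ref:
        # The name is inconsistent with the reference sequence.
        raise ValueError(f"expected {ref} at c.{position}, found {found}")
    return wild_type[:position - 1] + alt + wild_type[position:]


# Toy wild-type sequence, for illustration only.
print(apply_exonic_substitution("ATGGCC", "c.4G>T"))
```

Rejecting a name whose reference base does not match the stored sequence is the kind of consistency check described above; a full implementation would also have to map intronic positions onto genomic coordinates.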

The main result page was divided in three areas: the reference sequence(s), various graphical displays and tables. Since mutations could have different effects related to the local context, a ‘quick mutation’ option allows the addition of a small rearrangement (missense, deletion, insertion, duplication, indel) to the sequence(s).

Splicing donor/acceptor sites

The new HSF algorithm to define consensus values (CV) of 5′ss or 3′ss was created to maximize the difference between wild-type (wt) active sites and mutant inactive sites. Thus, strong sites presented a CV higher than 80 and less strong sites a CV ranging between 70 and 80. Only a minor fraction of active sites showed a CV between 65 and 70 (Figure 3). The mean CV for 3′ss was 86.81 with a standard deviation of 6.33, while the mean CV for 5′ss was 87.53 with a standard deviation of 8.34. These values were calculated from more than 400 000 natural splice sites extracted from all alternative transcripts. If a mutation directly affects the CV, it is critical to consider not only the CV of the mutant splice site but also the delta between the wt and mutant CV. To validate this algorithm, we used a set of 69 intronic mutations that affect either the canonical AG/GT splice site motifs or less conserved nucleotides (Table 1). All mutations affecting the nucleotides in canonical positions (−2, −1, +1 or +2) strongly influenced the CV value, with an average

Figure 3. Distribution of CVs for (A) 3′ and (B) 5′ natural splice sites (5′ss and 3′ss). Data extracted from the Ensembl dataset (release 44; http://april2007.archive.ensembl.org) (20) using the HSF algorithm.

Table 1. Intronic mutations in FBN1 (ENST00000316623), FBN2 (ENST00000262464), RB1 (ENST00000267163), TGFBR2 (ENST00000295754), MLH1 (ENST00000231790) and MSH2 (ENST00000233146) that lead to splicing defects

Gene    Mutation         References                   WT CV   Mutant CV   CV variation (%)

Mutations causing exon skipping
FBN1    c.247+1G>A       (37,46,49–51,53)             82.26   55.42       −32.62 a
FBN1    c.538+1G>A       (45)                         83.99   57.15       −31.96 a
FBN1    c.1468+5G>A      (44)                         84.46   72.30       −14.40 a
FBN1    c.3208+5G>T      (82)                         94.98   82.66       −12.97 a
FBN1    c.3838+1G>A      (52)                         95.84   69.01       −28.00 a
FBN1    c.3839-1G>T      (83)                         87.62   58.67       −33.04 a
FBN1    c.3964+1G>A      (84,85)                      90.04   63.20       −29.80 a
FBN1    c.3965-2A>T      (85)                         89.30   60.35       −32.41 a
FBN1    c.4459+1G>A      (44)                         97.66   70.83       −27.47 a
FBN1    c.4943-1G>C      (44)                         79.77   50.82       −36.29 a
FBN1    c.5788+5G>A      (35,36,38,41,43,52,54,83)    88.06   75.89       −13.82 a
FBN1    c.6163+2del6     (83)                         99.05   72.90       −26.40 a
FBN1    c.6496+2insTG    (43)                         82.21   32.05       −61.01 a
FBN1    c.6616+1G>C      (86)                         78.08   51.24       −34.37 a
FBN1    c.6997+1G>A      (83)                         92.11   65.27       −29.13 a
FBN1    c.7205-2A>G      (83)                         84.11   55.16       −34.42 a
FBN1    c.7330+1G>A      (55)                         98.02   71.18       −27.38 a
FBN1    c.7331-2A>G      (40)                         80.72   51.77       −35.86 a
FBN1    c.8051+1G>A      (44)                         92.02   65.18       −29.16 a
FBN1    c.8051+5G>A      (51)                         92.02   79.85       −13.22 a
FBN1    c.8052-2A>G      (52)                         92.86   63.92       −31.17 a
FBN2    c.3472+2T>G      (48)                         90.99   64.15       −28.53 a
FBN2    c.4099+1G>C      (39)                         91.66   64.82       −29.28 a
FBN2    c.4222+5G>A      (47)                         92.11   79.94       −13.21 a
FBN2    c.4346-2A>T      (87)                         90.91   61.96       −31.84 a
RB1     c.264+4delA      (57)                         91.34   84.93       −7.01 a
RB1     c.380+3A>C       (57)                         95.10   78.82       −17.12 a
RB1     c.607+1G>T       (57)                         99.05   72.21       −27.09 a
RB1     c.939+4A>G       (57)                         83.75   75.41       −9.96 a
RB1     c.1049+2delT     (57)                         76.95   57.00       −25.90 a
RB1     c.1215+1G>A      (57)                         85.86   59.02       −31.26 a
RB1     c.1389+1G>A      (57)                         82.69   55.86       −32.45 a
RB1     c.1389+4A>G      (57)                         82.69   74.35       −10.09 a
RB1     c.1389+5G>A      (57)                         82.69   70.53       −14.71 a
RB1     c.1422-2A>T      (57)                         86.12   57.17       −33.62 a
RB1     c.1422-1G>A      (57)                         86.12   57.17       −33.62 a
RB1     c.1498+5G>A      (57)                         82.91   70.75       −14.67 a
RB1     c.1960+1G>A      (57)                         94.02   67.19       −28.54 a
RB1     c.1960+1delG     (57)                         94.02   49.62       −47.22 a
RB1     c.2211+1G>T      (57)                         89.90   63.06       −29.86 a
RB1     c.2212-2A>G      (57)                         89.09   60.15       −32.48 a
RB1     c.2211+1G>C      (57)                         89.90   63.06       −29.86 a
RB1     c.2520+1G>A      (57)                         92.22   65.39       −29.10 a
RB1     c.2520+3del4     (57)                         92.22   72.26       −21.64 a
RB1     c.2663+1G>A      (57)                         88.37   61.54       −30.36 a
MLH1    c.306+4A>G       (58)                         96.07   87.73       −8.68 a
MLH1    c.454-2A>G       (59)                         93.59   64.64       −28.80 a
MLH1    c.790+1G>A       (59)                         83.28   56.45       −32.22 a
MLH1    c.790+5G>T       (58)                         83.28   70.97       −14.79 a
MLH1    c.791-5T>G       (59)                         80.80   77.17       −4.49
MLH1    c.884+4A>G       (58)                         85.75   77.41       −9.73 a
MSH2    c.366+1G>T       (59)                         86.73   59.89       −30.95 a
MSH2    c.793-2A>C       (59)                         83.98   55.04       −34.46 a
MSH2    c.942+3A>T       (59)                         99.24   83.86       −15.50 a
MSH2    c.1276+2T>A      (59)                         84.70   57.86       −31.69 a
MSH2    c.1386+1G>A      (59)                         89.02   62.19       −30.13 a
MSH2    c.2634+5G>T      (58)                         84.41   72.09       −14.59 a

Mutations resulting in the usage of cryptic splice sites
FBN1    c.2293+2T>C      (83)                         89.77   62.94       −29.89
                                                              67.21       5′ CS (51 nt upstream) b
FBN1    c.3463+1G>A      (88)                         91.34   64.50       −29.38
                                                              88.47       5′ CS (27 nt downstream) b
FBN1    c.4747+5G>T      (42)                         89.13   76.81       −13.82
                                                              79.06       5′ CS (48 nt upstream) b
FBN1    c.5788+1G>A      (52)                         88.06   61.22       −30.48
                                                              82.64       5′ CS (33 nt downstream) b
RB1     c.138-8T>G       (57)                         81.62   79.69       −2.36

(continued)

reduction (ΔCV) of -31% and a standard deviation (SD) of 2.8%. Mutations affecting less conserved residues had a weaker effect, with a ΔCV of -7% for the residue in position +4 and -14% for nucleotides in position +3 or +5. These results, together with data from other disease-causing mutations (52,68,69), indicated that a CV reduction of at least 10% for a mutation in any position, or of 7% for a mutation in position +4, is likely to have a significant impact on splicing and should be further investigated.

Since a mutation can result not only in the disruption of a 5'ss or a 3'ss but also in the creation of a new splice site, HSF evaluates the ‘creation of cryptic splice sites’. As shown in Table 1 for intronic mutations, HSF correctly predicted the creation of cryptic splice sites in the RB1 mutants c.607+1delG, c.138-8T>G and c.501-1G>A. Mutations in canonical sequences, such as c.95-2A>G, c.1397-2A>G and c.1397-1G>A in TGFBR2; c.2293+2T>C, c.3463+1G>A, c.4747+5G>T and c.5788+1G>A in FBN1; and c.1815-2A>G, c.2107-2A>G and c.2211+1G>C in RB1, led to a more complex splicing defect in which disruption of the wt splice site was coupled to the usage of an alternative pre-existing splice site. As mutations do not directly affect alternative splice sites, this phenomenon was not automatically investigated by HSF. Therefore, to identify the alternative splice sites, we chose in ‘Select an analysis type’ the option ‘Number of nucleotides surrounding the exon’ and entered the value ‘100’. In addition, we checked the advanced parameter ‘Process sequence’ and selected the ‘Full sequence’ option. To analyze only splice sites, we then selected in ‘All or subset of matrices’ the ‘Splice site matrices’ option. Using these parameters, all alternative sites were identified either as the closest and strongest alternative sites (five cases) or as the second-best sites (two cases). Overall, HSF correctly predicted the impact of mutations affecting 5'ss or 3'ss, even when complex mechanisms were involved.

In addition to splicing defects due to 5'ss or 3'ss disruption, it is well known that exonic mutations can result in the creation or activation of cryptic splice sites. As shown in Table 2, the nine mutations affecting the last base of an exon had a strong effect on the activity of the concerned 5'ss (ΔCV = -12 ± 0.7%) that resulted in exon skipping or activation of a cryptic splice site. The two mutations affecting the penultimate nucleotide of an exon had a limited effect on the activity of the 5'ss (ΔCV = -5.4 ± 0.3%). Indeed, these mutations were pathogenic only when a cryptic splice site was activated, and predictions were therefore uncertain. Finally, exonic mutations that were distant from both the 5' and 3'ss could activate a cryptic splice site and result in splicing defects, as shown for mutations c.658C>G in RB1, c.1915C>T in MSH2 and c.5985T>G in DMD.
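The CV thresholds described above can be illustrated with a short sketch. It assumes that the CV variation is the relative change with respect to the wild-type CV, which is consistent with the values listed in Table 1 but is not stated as a formula in the text; the function names are hypothetical and are not part of HSF.

```python
def delta_cv(wt_cv: float, mutant_cv: float) -> float:
    """Relative CV change in percent of the wild-type CV (assumed definition)."""
    if wt_cv <= 0:
        raise ValueError("wild-type CV must be positive")
    return (mutant_cv - wt_cv) / wt_cv * 100.0


def likely_affects_splicing(wt_cv: float, mutant_cv: float, position: int) -> bool:
    """Apply the thresholds quoted in the text: a reduction of at least 10%
    at any position, or of at least 7% at position +4."""
    required_drop = 7.0 if position == 4 else 10.0
    return delta_cv(wt_cv, mutant_cv) <= -required_drop


# FBN1 c.247+1G>A (Table 1): WT CV 82.26, mutant CV 55.42
print(round(delta_cv(82.26, 55.42), 1))
print(likely_affects_splicing(82.26, 55.42, position=1))

# MLH1 c.791-5T>G (Table 1): WT CV 80.80, mutant CV 77.17
print(likely_affects_splicing(80.80, 77.17, position=-5))
```

With these definitions, the canonical +1 mutation is flagged, whereas the -5 mutation with a CV drop below 5% is not. The value computed for the first example differs from the tabulated -32.62 only in the second decimal, presumably because the published CVs are themselves rounded.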

Branch point sequences

We analyzed 14 BP sequences previously reported to be abolished by mutations. As shown in Table 3, 13 out of 14 BPs were correctly predicted by HSF, with an average strength of 83.4 and a standard deviation of 8.6. The only discrepancy concerned the mutation localized in intron 3 of GH1, for which the BP was predicted to be at position -26 by HSF instead of position -21. Note that in both cases the BP was located within the c.468-37_468-16del deletion, which is responsible for autosomal dominant isolated GH deficiency (IGHD II) in one single family; additional data are therefore needed to identify the functional BP. Among the other BP sequences, 12 were reported as targets of point mutations leading to their inactivation. In six cases, the mutation involved the critical adenosine residue, leading to a remarkable ΔBP of -29.6%. For mutations involving residues surrounding the BP, the average ΔBP was -13.9% with a SD of 3%. Taking into account the weight matrix (Figure 1) and experimental data, the threshold for BP prediction was thus set at 67.

Table 1. Continued

Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) | Cryptic site | Note
RB1 | c.138-8T>G (continued) | | | | | 3' CS, 7 nt upstream (CV 55.35 to 84.29) | c
RB1 | c.501-1G>A | (57) | 97.50 | 68.55 | -29.69 | 3' CS, 1 nt downstream (CV 54.82 to 83.77) | c
RB1 | c.607+1delG | (57) | 99.05 | 22.54 | -77.24 | 5' CS, 1 nt upstream (CV 42.51 to 88.47) | c
RB1 | c.1815-2A>G | (57) | 75.42 | 46.47 | -38.39 | 3' CS, 19 nt downstream (CV 81.84) | b
RB1 | c.2107-2A>G | (57) | 80.73 | 51.78 | -35.86 | 3' CS, 35 nt downstream (CV 69.56) | b
TGFBR2 | c.95-2A>G | (56) | 91.77 | 62.82 | -31.55 | 3' CS, 18 nt downstream (CV 68.28) | b
TGFBR2 | c.1397-2A>G | (89) | 92.32 | 63.38 | -31.35 | 3' CS, 30 nt upstream (CV 84.32) | b
TGFBR2 | c.1397-1G>A | (90) | 92.32 | 63.38 | -31.35 | 3' CS, 30 nt upstream (CV 84.32) | b

CS, cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
a, The mutation induces exon skipping.
b, A cryptic splice site not created by the mutation and used in vivo was correctly predicted by HSF.
c, The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.

Auxiliary splicing sequences: enhancers and silencers

In order to simplify the interpretation of predictions obtained with the different algorithms using weight matrices, we used a normalized scale ranging from 0 to 100. As a consequence, previous matrices from ESE-Finder (19,27) were modified. Nevertheless, the user can define the thresholds using either the original ESE-Finder range or the new 0–100 range. In addition, when processing a single sequence and when CVs are available, HSF calculates the deviation as a percentage of the threshold. A reduced list can be obtained for each matrix by choosing the ‘Only variant’ option in ‘Advanced parameters’. A color code is used for each quartile (from white to orange) to simplify the analysis. When comparing mutant sequences, HSF uses this color code to indicate the differences between the two sequences.
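As an illustration of the normalization described above, the following sketch rescales a raw matrix score to the 0–100 range and expresses a score as a deviation from a threshold. The linear min-max rescaling, the exact form of the deviation and all numerical values are assumptions made for this example; the text does not specify how HSF performs these steps.

```python
def rescale_to_100(raw: float, raw_min: float, raw_max: float) -> float:
    """Linearly map a raw matrix score onto a 0-100 scale (assumed method)."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must be greater than raw_min")
    return (raw - raw_min) / (raw_max - raw_min) * 100.0


def deviation_from_threshold(score: float, threshold: float) -> float:
    """Deviation of a score from its threshold, in percent of the threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return (score - threshold) / threshold * 100.0


# Hypothetical matrix whose raw scores range from 0 to 5
print(rescale_to_100(2.5, raw_min=0.0, raw_max=5.0))

# Hypothetical normalized score of 80 against a threshold of 64
print(deviation_from_threshold(80.0, 64.0))
```

A linear rescaling keeps the ranking of motifs within a matrix unchanged, which is why a threshold defined on the original range can be translated directly to the 0–100 range.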

When scalability is not possible, HSF only displays the presence of a motif.

To evaluate the sensitivity and usefulness of auxiliary splicing sequence predictions, we used a first set of genes for which 20 mutations have been reported to result in exon skipping following targeting of ESE or ESS (58–66). For each mutation, we selected the default option that allows HSF to predict modifications of ESE and/or ESS motifs using all available matrices (Table 4). For mutation c.362C>T in ACADM or c.4250T>A in DMD, for which the target auxiliary sequences have been experimentally characterized (SF2/ASF and hnRNPA1, respectively), HSF correctly predicted the effect of the mutation. For other sequences, different scenarios were predicted: (i) disruption of one or more ESE without creation of an ESS, as observed for mutations c.882C>T (MLH1), c.362C>T (ACADM), c.8165C>G and

Table 2. Exonic mutations in DMD (ENST00000357033), MLH1 (ENST00000231790), MSH2 (ENST00000233146) and RB1 (ENST00000267163) involved in splicing

Gene | Mutation | Position | References | WT CV | Mutant CV | CV variation (%) | Cryptic site | Note
DMD | c.5985T>G | Deep exonic | (91) | 46.65 | 75.59 | | 3' CS (63 nt downstream) | a
MLH1 | c.677G>A | Last base | (58) | 84.46 | 73.89 | -12.52 | | b
MLH1 | c.882C>T | Exonic | (58) | 84.46 | 73.89 | -12.52 | | b
MLH1 | c.1037A>G | Penultimate base | (58) | 93.04 | 88.19 | -5.22 | 5' CS (upstream) | c
MLH1 | c.1038G>T | Last base | (58) | 93.04 | 82.17 | -11.68 | 5' CS (upstream) | c
MLH1 | c.1667G>T | Last base | (92) | 85.85 | 74.99 | -12.66 | 5' CS (88 nt downstream) | a
MLH1 | c.1731G>A | Last base | (58) | 93.27 | 82.69 | -11.34 | |
MLH1 | c.1989G>T | Last base | (58) | 93.22 | 82.35 | -11.66 | |
MSH2 | c.1660A>T | Penultimate base | (58) | 84.00 | 79.25 | -5.65 | 5' CS (82 nt upstream) | a
MSH2 | c.1759G>C | Last base | (58) | 85.66 | 74.65 | -12.86 | | b
MSH2 | c.1915C>T | Deep exonic | (59) | 62.19 | 89.02 | | 5' CS (92 nt upstream) | a
RB1 | c.658C>G | Deep exonic | (57) | 58.66 | 85.49 | | 5' CS (61 nt upstream) | a
RB1 | c.939G>T | Last base | (57) | 83.75 | 72.88 | -12.98 | | b
RB1 | c.1960G>C | Last base | (57) | 94.02 | 83.01 | -11.71 | | b
RB1 | c.1960G>A | Last base | (57) | 94.02 | 83.44 | -11.25 | | b

CS, cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon. For deep exonic mutations, the WT and mutant CVs are those of the cryptic site.
a, The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.
b, The mutation induces exon skipping.
c, The cryptic splice site used in vitro was not clearly reported and therefore was not available for comparison.

Table 3. Branch point sequences

Gene | Intron | References | Ref BP | Ref Seq | HSF BP | HSF value
COL5A1 | 32 | (93) | -27 | ENST00000355306 | -27 | 87.81
DYSF | 31 | (94) | -33 | ENST00000258104 | -33 | 93.13
FBN2 | 30 | (95) | -24 | ENST00000262464 | -24 | 77.06
GH1 | 3 | (96) | -21 | ENST00000323322 | -26 | 73.36
ITGB4 | 31 | (97) | -17 | ENST00000200181 | -17 | 93.79
LCAT | 4 | (98) | -20 | ENST00000264005 | -20 | 95.07
LDLR | 9 | (99) | -25 | ENST00000252444 | -25 | 86.59
NPC1 | 6 | (100) | -28 | ENST00000269228 | -28 | 77.41
PMM2 | 2 | (101) | -25 | ENST00000268261 | -25 | 80.56
PMM2 | 7 | (101) | -23 | ENST00000268261 | -23 | 72.27
RB1 | 23 | (57) | -26 | ENST00000267163 | -26 | 75.89
TH | 11 | (102) | -22 | ENST00000324155 | -22 | 84.96
TSC2 | 38 | (103) | -18 | ENST00000219476 | -18 | 67.71
XPC | 3 | (76) | -24 | ENST00000285021 | -24 | 82.78

For each gene, the reference sequence from the Ensembl genome database (Ref Seq), the intron number (Intron) and the position of the BP identified by in vitro experiments (Ref BP), as well as the BP position predicted by HSF (HSF BP) and the corresponding BP value (HSF value), are shown.

Table 4. Exonic mutations known to result in exon skipping through ESE inactivation or ESS activation

Gene | Mutation | Ref. | Motif | Ref Seq | HSF prediction
ACADM | c.362C>T | (65) | -ESE (SF2/ASF) | ENST00000370841 | -9G8[i] (357_362); -SF2/ASF[e] (358_364); +EIE[h] (359_364); -SRp40[e] (359_365); -EIE[h] (360_365); +IIE[c] ×4 (359_367)
BRCA1 | c.5080G>T | (64) | | ENST00000357654 | -EIE[h] (5075_5080); +SRp55[e] (5076_5081); -9G8[i] (5077_5082); -SF2/ASF[e] (5078_5085); -IIE[c] (5078_5083); +IIE[c] (5079_5084); -ESS[a] (5076_5083); +hnRNPA1[d] (5080_5085)
BRCA2 | c.8165C>G | (62) | -ESE | ENST00000380152 | -SRp40[e] (8162_8168); -ESE[f] (8163_8168); +ESE[f] ×2 (8164_8170); -SRp55[e] (8163_8169); -SF2/ASF[e] (8165_8171); -EIE[h] ×4 (8160_8168)
BRCA2 | c.5081G>T | (64) | | ENST00000380152 | +SC35[e] (5075_5082); +SRp40[e] (5080_5086); -ESE[f,h] ×2 (5080_5086); -9G8[i] (5081_5086); -ESS[a] (5078_5085)
DMD | c.4250T>A | (61) | +ESS (hnRNPA1) | ENST00000357033 | +9G8[i] ×2 (4246_4251) (4248_4253); -EIE[h] (4248_4253); +ESE[f] (4250_4255); -IIE[c] ×3 (4246_4253); +hnRNPA1[d] (4249_4254)
MLH1 | c.544A>G | (59) | | ENST00000231790 | +ESS[a] (537_545); 5'ss ΔCV = -6.30
MLH1 | c.793C>T | (58) | | ENST00000231790 | +ESS[a] (795_802)
MLH1 | c.794G>A | (58) | | ENST00000231790 | -SRp40[e] (793_799); -SC35[e] (794_801); +ESS[c] (794_799)
MLH1 | c.882C>T | (58) | | ENST00000231790 | +SC35[e] (876_883); -SRp55[e] (877_882)
MLH1 | c.988_990del | (58) | | ENST00000231790 | +SF2/ASF[e] (983_989); -SRp55[e] (985_990); +9G8[i] (985_990); -ESS[a] (985_992)
MSH2 | c.815C>T | (58) | | ENST00000233146 | -SRp55[e] (813_818); +ESS[a] (813_820); +ESS[c] ×5 (801_819)
MSH2 | c.274_276del | (58) | | ENST00000233146 | +SC35[e] (272_279); +SRp40[e] ×2 (274_285); -IIE[c] ×2 (274_280)
LAMA2 | c.2230C>T | (60) | | ENST00000354729 | -SF2/ASF[e] (2226_2232); +ESS[c] (2228_2235); +IIE[c] ×2 (2229_2235); +ESS[a] (2230_2237)
NF1 | c.557A>T | (66) | -ESE | ENST00000356175 | -SRp55[e] (552_557); -ESE[f] (552_557); -EIE[h] ×4 (552_560); -9G8[i] (553_558); +ESS[a] ×2 (550_557) (555_562)
NF1 | c.910C>T | (66) | -ESE | ENST00000356175 | -9G8[i] (905_910); -EIE[h] (905_910); +ESE[f] (908_913); -ESE[f] (910_915); -ESS[a] (906_913)
NF1 | c.943C>T | (66) | -ESE | ENST00000356175 | -SC35[e] (941_948); -SF2/ASF[e] (943_949); -PESE[g] (942_949); -9G8[i] (938_943); +hnRNPA1[d] (943_948); +IIE[c] (942_947)

(continued)

c.5081G>T (BRCA2), c.557A>T and c.910C>T (NF1); (ii) creation of one or more ESS without disruption of an ESE, as shown for mutations c.544A>G and c.793C>T (MLH1), c.4250T>A (DMD) and c.6792C>G (NF1); and (iii) an intermediate situation where both the disruption of one or more ESE and the creation of one or more ESS were predicted. This was observed for mutations c.5080G>T (BRCA1), c.794G>A and c.988_990del (MLH1), c.815C>T and c.274_276del (MSH2), c.2230C>T (LAMA2), c.943C>T, c.1007G>A and c.5719G>T (NF1). In order to evaluate the potential to differentiate ‘true’ ESE or ESS motifs from false positive signals, we selected a second set of 36 mutations (positive controls) and 220 SNPs (negative controls) (Supplementary Table 1). Predictions were classified in three categories: disruption of ESE motifs only (ESE), creation of ESS motifs only (ESS) or both (ESE+ESS). In addition, results were classified in two subsets: a first one (All), which included all predicted motifs, and a second one (Best), which was restricted to only one motif for each case by selecting the one recognized by the highest number of matrices.
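The three prediction categories can be expressed as a small classification rule. The sketch below is only an illustration: the grouping of matrices into enhancer-type and silencer-type sets is an assumption based on the matrix descriptions in Table 4, the function name is hypothetical, and the example calls follow the Table 4 convention in which ‘+’ marks a created motif and ‘-’ an abolished one.

```python
from typing import List, Tuple

# Assumed grouping of motif families (not defined in this form in the text)
ENHANCERS = {"SF2/ASF", "SC35", "SRp40", "SRp55", "9G8", "Tra2-beta",
             "RESCUE-ESE", "PESE", "EIE"}
SILENCERS = {"ESS", "PESS", "IIE", "hnRNPA1"}


def classify_prediction(changes: List[Tuple[str, str]]) -> str:
    """Classify a list of (sign, motif) calls, where sign is '+' for a
    created motif and '-' for an abolished motif."""
    ese_lost = any(sign == "-" and motif in ENHANCERS for sign, motif in changes)
    ess_gained = any(sign == "+" and motif in SILENCERS for sign, motif in changes)
    if ese_lost and ess_gained:
        return "ESE+ESS"
    if ese_lost:
        return "ESE"
    if ess_gained:
        return "ESS"
    return "none"


# MLH1 c.882C>T: an SC35 site is created and an SRp55 site is abolished
print(classify_prediction([("+", "SC35"), ("-", "SRp55")]))
# MLH1 c.793C>T: a silencer motif is created
print(classify_prediction([("+", "ESS")]))
# MLH1 c.794G>A: two enhancer sites are abolished and a silencer is created
print(classify_prediction([("-", "SRp40"), ("-", "SC35"), ("+", "ESS")]))
```

The three examples fall into the ESE, ESS and ESE+ESS categories, respectively, in agreement with the scenarios (i), (ii) and (iii) given in the text for these mutations.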

Comparison of the three categories (ESE, ESS and ESE+ESS) revealed a significant difference between positive and negative controls, both in the ‘All’ (χ² = 10.05, P = 0.00656) and the ‘Best’ subset (χ² = 11.75, P = 0.0028). We then evaluated the potential of each matrix to differentiate true from false positive signals. No statistical differences were found using the Sironi, PESS, IIE, hnRNPA1 and RESCUE-ESE matrices. With the EIE matrix, a statistically significant difference was found for the ‘All’ subset (χ² = 3.99, P = 0.045) but not for the ‘Best’ subset (χ² = 2.47, P = 0.116). Significant results in both subsets were obtained with ESE-Finder (‘All’ subset: χ² = 5.17, P = 0.023; ‘Best’ subset: χ² = 7.33, P = 0.0067), the 9G8 and Tra2-β matrices from HSF (‘All’ subset: χ² = 9.92, P = 0.00164; ‘Best’ subset: χ² = 9.86, P = 0.00169) and PESE (‘All’ subset: χ² = 19.52, P = 9.95 × 10⁻⁶; ‘Best’ subset: χ² = 13.52, P = 2.36 × 10⁻⁴). The positive (PPV) and negative (NPV) predictive values, as well as the sensitivity (Sv) and the specificity (Sp), of these last three matrices were then evaluated. PPV ranged from 0.22 (9G8 and Tra2-β) to 0.56 (PESE), NPV from 0.76 (PESE) to 0.95 (9G8 and Tra2-β), Sv from 0.27 (PESE) to 0.40 (9G8 and Tra2-β) and Sp from 0.88 (9G8 and Tra2-β) to 0.91 (PESE). The ESE-Finder matrix showed intermediate values in all cases.
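The statistics reported above can be recomputed with the Python standard library. In this sketch, the degrees of freedom (two for the three-category comparison, one for each single-matrix comparison) are inferred from the reported P-values rather than stated in the text, and the confusion-matrix counts in the second example are hypothetical.

```python
import math


def chi2_pvalue(chi2: float, df: int) -> float:
    """Upper-tail P-value of a chi-square statistic (closed forms for df 1 and 2)."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """PPV, NPV, sensitivity (Sv) and specificity (Sp) from a 2 x 2 table."""
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Sv": tp / (tp + fn),
        "Sp": tn / (tn + fp),
    }


# Three-category comparison, 'All' subset (reported P = 0.00656)
print(chi2_pvalue(10.05, df=2))
# ESE-Finder, 'All' subset (reported P = 0.023)
print(chi2_pvalue(5.17, df=1))

# Hypothetical counts, for illustration only
print(diagnostic_metrics(tp=10, fp=20, fn=15, tn=200))
```

Under these assumptions, the recomputed P-values agree with the published ones to the reported precision.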

DISCUSSION

During evolution from simple to higher eukaryotes, splicing signals evolved from well-defined motifs to degenerate sequences, with the addition of new auxiliary splicing

Table 4. Continued

Gene | Mutation | Ref. | Motif | Ref Seq | HSF prediction
NF1 | c.1007G>A | (66) | -ESE | ENST00000356175 | +PESE[g] (1007_1014); -EIE[h] ×2 (1003_1011); +9G8[i] (1006_1011); +ESE[f] (1007_1014); -ESS[a] ×2 (1003_1011); -IIE[c] ×4 (1003_1011); +hnRNPA1[d] (1006_1011)
NF1 | c.5719G>T | (66) | -ESE | ENST00000356175 | -ESE[f] ×5 (5715_5724); -EIE[h] ×5 (5715_5724); -ESS[a] ×2 (5714_5725); +PESS[g] ×2 (5712_5720); +hnRNPA1[d] (5719_5724)
NF1 | c.6792C>A | (66) | -ESE | ENST00000356175 | +ESE[f] ×5 (6792_6797); -EIE[h] ×2 (6788_6793) (6790_6795); +Tra2-β[i] (6791_6795); -ESS[a] ×2 (6787_6794) (6792_6799)
NF1 | c.6792C>G | (66) | -ESE | ENST00000356175 | +ESE[f] (6792_6797); -EIE[h] ×2 (6788_6793) (6790_6795); +Tra2-β[i] (6791_6795); -ESS[a] ×2 (6787_6794) (6792_6799); +hnRNPA1[d] (6790_6795)

+, a new site was created by the mutation; -, the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were: a, silencer motifs from Sironi et al. (31); b, PESS octamers (28); c, IIEs (30); d, hnRNP motifs from HSF; e, ESE-Finder matrices (19); f, RESCUE-ESE hexamers (63); g, PESE octamers (28); h, EIEs (30); i, ESE motifs from HSF. When multiple adjacent sites were predicted, the number of sites is indicated: ‘×5’ means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.

sequences known as ESE and ESS. Although major SR proteins have been cloned and their target sites determined, much work remains to be done to understand how splice signals are recognized and splicing specificity achieved. As this complex world is progressively revealed, bioinformatics resources could play a major role in helping researchers and diagnostic laboratories to evaluate the consequence of mutations on splicing, especially because most genetic tests use DNA and not RNA samples. By giving easy access to predictions of 5'ss, 3'ss and BP sequences, as well as ESE and ESS, the HSF tool (http://www.umd.be/HSF) fulfills this need and may assist clinicians, geneticists and researchers (70–75). By combining motifs identified with different experimental and computational approaches, it provides a common interface that can be used for sequence analysis. The inclusion of all exons and introns extracted from the Ensembl human genome database (20) allows easy access to any sequence of human genes and thus direct comparison of virtually every mutation or SNP concerning splicing elements. Since SNPs are present at a very high frequency in the genome (1/300 bp), it could be useful to evaluate their impact in association with a mutation. We therefore included in HSF data from dbSNP using Ensembl BioMart. The user can select the ‘Search for SNPs related to the analyzed sequence’ option, which automatically retrieves SNPs from the database. When SNPs are localized in exons, their effect on ESE and ESS motifs could help the user to better evaluate the consequence of a given mutation.

To evaluate the efficiency of the various algorithms included in HSF and its contribution to the prediction of the consequences of mutations associated with a splicing defect, we used a set of 69 intronic mutations that disrupt the 5'ss or the 3'ss and result in exon skipping and/or activation of a cryptic splice site (Table 1), and a group of 15 mutations that were previously reported to result in splicing defects by creating or activating cryptic splice sites (Table 2). HSF was able to correctly predict the disruption of the natural splice sites. Moreover, we could confirm that: (i) mutations of the last nucleotide of an exon have a strong effect on the 5'ss (ΔCV = -12 ± 0.7%), resulting frequently in exon skipping, or in partial exonic deletion or intronic retention due to activation of a cryptic splice site; (ii) mutations of the penultimate exonic nucleotide have limited consequences on the 5'ss (ΔCV = -5.4 ± 0.3%), but they can activate a cryptic splice site, making predictions more difficult; (iii) exonic mutations distant from the 5' and 3'ss can activate a cryptic splice site, leading to partial exonic deletion. Overall, these findings underline the efficiency of the HSF algorithm to predict the effect of mutations on 5' and 3'ss. When using the HSF algorithm, the threshold for 5' and 3'ss is 65, with a pathogenic ΔCV of -10%, except for position +4 where it is -7%. However, in the few cases where unusual splice sites are used, this algorithm could be less efficient.

BP sequences represent another essential splicing signal. When a mutation is localized near the 5' side of the 3'ss, its potential effect on a BP sequence should be examined, especially when a nucleotide located at less than 85 bp from the 3'ss is targeted. In order to evaluate the HSF algorithm dedicated to the identification of BP sequences, we used 14 BP sequences inactivated by intronic mutations (Table 3). HSF correctly predicted 13 out of 14 BPs, and these data allowed us to define the threshold for BP detection at 67 and the pathogenic ΔBP at -10%. Moreover, for intron 3 of XPC, HSF predicted a BP at position -24. However, according to Khan et al. (76), two BP sequences are present in this intron, one at position -24 and another at -4. HSF could not predict the BP at position -4 simply because the HSF algorithm excludes positions -12 to -1 for BP identification, because of steric obstruction caused by the spliceosome.
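The two BP rules mentioned above (a detection threshold of 67 and the exclusion of positions -12 to -1) can be sketched as a simple candidate filter. Retaining the highest-scoring remaining candidate is an assumption made for this example and is not described as the HSF selection rule; the function name and the score assigned to position -4 are hypothetical.

```python
from typing import Dict, Optional

BP_THRESHOLD = 67.0                 # detection threshold quoted in the text
EXCLUDED_POSITIONS = range(-12, 0)  # positions -12 to -1 relative to the 3'ss


def select_branch_point(candidates: Dict[int, float]) -> Optional[int]:
    """Return the position of the best eligible BP candidate, or None.

    `candidates` maps a position relative to the 3'ss (negative integers)
    to a BP score on the 0-100 scale.
    """
    eligible = {
        position: score
        for position, score in candidates.items()
        if score >= BP_THRESHOLD and position not in EXCLUDED_POSITIONS
    }
    if not eligible:
        return None
    # Assumption: keep the highest-scoring candidate
    return max(eligible, key=eligible.get)


# XPC intron 3: the score at -24 is taken from Table 3; the score at -4 is hypothetical
print(select_branch_point({-24: 82.78, -4: 90.0}))
```

Whatever score the candidate at position -4 receives, it is discarded by the exclusion rule, which mirrors the explanation given for XPC intron 3.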

It has been demonstrated that two different splicing recognition mechanisms, correlated with intron length, can be used in a cell: exon definition for long introns and intron definition for short introns (77). Although the influence of intron length seems to be less important in humans than in other species, it should nevertheless be kept in mind, since U12 and U2-type introns have different BP consensus sequences. In the present version of HSF (v2.4), we only focused on U2-type introns, which are by far the most abundant type in mammalian cells.

Concerning cis-acting elements, many studies have been performed to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets only share partial homology. To solve this problem, HSF included all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNPA1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and therefore the identification of the specific motif/protein pair involved in a given splicing defect is difficult. This is even more complicated when considering the impact of SR and ribonucleoprotein concentration in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the efficiency of HSF to correctly predict motifs disrupted by these mutations. We showed that when the motif/protein pairs had been previously experimentally characterized (hnRNPA1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems) and therefore the motif/protein couple is unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate HSF efficiency to discriminate true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE+ESS; χ² = 11.75, P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33, P = 0.0067), 9G8 and Tra2-β from HSF (χ² = 9.86, P = 0.0017) and PESE (χ² = 13.52, P = 2.36 × 10⁻⁴). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls. Indeed, only few experimental validations of auxiliary sequences are

available, and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2-β HSF matrices gave stronger results than ESE-Finder itself and can therefore be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNPA1 matrix, should also be considered, as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif predictions, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address tissue or developmental specificity.

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5' and 3'ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and for applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies, such as the exon-skipping approach in Duchenne Muscular Dystrophy, are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world, especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects that aim either at developing a holistic view of Genotype-To-Phenotype data (GEN2PHEN European project, http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81), might benefit from using HSF. Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754, the GEN2PHEN project; the European Community Sixth Framework Program (FP6) under grant agreement number 036825, TREAT-NMD Network of Excellence. Funding for open access charge: Institut National de la Santé et de la Recherche Médicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget SM, Moore C and Sharp PA (1977) Spliced segments at the 5' terminus of adenovirus 2 late mRNA. Proc. Natl Acad. Sci. USA, 74, 3171–3175.

2. Nilsen TW (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

3. Zhou Z, Licklider LJ, Gygi SP and Reed R (2002) Comprehensive proteomic analysis of the human spliceosome. Nature, 419, 182–185.

4. Breitbart RE, Nguyen HT, Medford RM, Destree AT, Mahdavi V and Nadal-Ginard B (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell, 41, 67–82.

5. Maniatis T and Tasic B (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.

6. Cartegni L, Chew SL and Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat. Rev. Genet., 3, 285–298.

7. Robberson BL, Cote GJ and Berget SM (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol. Cell. Biol., 10, 84–94.

8. Jacob M and Gallinaro H (1989) The 5' splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res., 17, 2159–2180.

9. Blencowe BJ (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem. Sci., 25, 106–110.

10. Zhu J, Mayeda A and Krainer AR (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol. Cell, 8, 1351–1361.

11. Zhang XH, Leslie CS and Chasin LA (2005) Computational searches for splicing signals. Methods, 37, 292–305.

12. Bhasi A, Pandey RV, Utharasamy SP and Senapathy P (2007) EuSplice: a unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics, 23, 1815–1823.

13. Churbanov A, Rogozin IB, Deogun JS and Ali H (2006) Method of predicting splice sites based on signal interactions. Biol. Direct, 1, 10.

14. Dunckley MG, Manoharan M, Villiet P, Eperon IC and Dickson G (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum. Mol. Genet., 7, 1083–1090.

15. Wilton SD and Fletcher S (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr. Gene Ther., 5, 467–483.

16. Beroud C, Hamroun D, Collod-Beroud G, Boileau C, Soussi T and Claustres M (2005) UMD (Universal Mutation Database): 2005 update. Hum. Mutat., 26, 184–191.

17. Beroud C, Collod-Beroud G, Boileau C, Soussi T and Junien C (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum. Mutat., 15, 86–94.

18. Fairbrother WG, Yeo GW, Yeh R, Goldstein P, Mawson M, Sharp PA and Burge CB (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res., 32, W187–W190.

19. Cartegni L, Wang J, Zhu Z, Zhang MQ and Krainer AR (2003) ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res., 31, 3568–3571.

20. Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T et al. (2008) Ensembl 2008. Nucleic Acids Res., 36, D707–D714.

21. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res., 36, D773–D779.

22 ShapiroMB and SenapathyP (1987) RNA splice junctions ofdifferent classes of eukaryotes sequence statistics and functionalimplications in gene expression Nucleic Acids Res 15 7155ndash7174

23 YeoG and BurgeCB (2004) Maximum entropy modeling ofshort sequence motifs with applications to RNA splicing signalsJ Comput Biol 11 377ndash394

24 GreenMR (1991) Biochemical mechanisms of constitutive andregulated pre-mRNA splicing Annu Rev Cell Biol 7 559ndash599

25 GoodingC ClarkF WollertonMC GrellscheidSN GroomHand SmithCW (2006) A class of human exons with predicteddistant branch points revealed by analysis of AG dinucleotideexclusion zones Genome Biol 7 R1

26 KolG Lev-MaorG and AstG (2005) Human-mouse compara-tive analysis reveals that branch-site plasticity contributes to splicingregulation Hum Mol Genet 14 1559ndash1568

27 SmithPJ ZhangC WangJ ChewSL ZhangMQ andKrainerAR (2006) An increased specificity score matrix for theprediction of SF2ASF-specific exonic splicing enhancers HumMol Genet 15 2490ndash2508

28 ZhangXH and ChasinLA (2004) Computational definition ofsequence motifs governing constitutive exon splicing Genes Dev18 1241ndash1250

29 GorenA RamO AmitM KerenH Lev-MaorG VigIPupkoT and AstG (2006) Comparative analysis identifies exonicsplicing regulatory sequencesndashThe complex definition of enhancersand silencers Mol Cell 22 769ndash781

30 ZhangC LiWH KrainerAR and ZhangMQ (2008) RNAlandscape of evolution for optimal exon and intron discriminationProc Natl Acad Sci USA 105 5797ndash5802

31 SironiM MenozziG RivaL CaglianiR ComiGPBresolinN GiordaR and PozzoliU (2004) Silencer elements aspossible inhibitors of pseudoexon splicing Nucleic Acids Res 321783ndash1791

32 WangZ RolishME YeoG TungV MawsonM andBurgeCB (2004) Systematic identification and analysis of exonicsplicing silencers Cell 119 831ndash845

33. Thompson JD, Higgins DG and Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680.

34. Bailey TL, Williams N, Misleh C and Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34: W369–W373.

35. Yuan B, Thomas JP, von Kodolitsch Y and Pyeritz RE (1999) Comparison of heteroduplex analysis, direct sequencing and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum Mutat 14: 440–446.

36. Youil R, Toner TJ, Bull E, Bailey AL, Earl CD, Dietz HC and Montgomery RA (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum Mutat 16: 92–93.

37. Schrijver I, Liu W, Odom R, Brenn T, Oefner P, Furthmayr H and Francke U (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am J Hum Genet 71: 223–237.

38. Rommel K, Karck M, Haverich A, Schmidtke J and Arslan-Kirchner M (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum Mutat 20: 406–407.

39. Park ES, Putnam EA, Chitayat D, Child A and Milewicz DM (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am J Med Genet 78: 350–355.

40. Palz M, Tiecke F, Booms P, Goldner B, Rosenberg T, Fuchs J, Skovby F, Schumacher H, Kaufmann UC, von Kodolitsch Y et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am J Med Genet 91: 212–221.

41. Nijbroek G, Sood S, McIntosh I, Francomano CA, Bull E, Pereira L, Ramirez F, Pyeritz RE and Dietz HC (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am J Hum Genet 57: 8–21.

42. McGrory J and Cole WG (1999) Alternative splicing of exon 37 of FBN1 deletes part of an ‘eight-cysteine’ domain resulting in the Marfan syndrome. Clin Genet 55: 118–121.

43. Loeys B, Nuytinck L, Delvaux I, De Bie S and De Paepe A (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch Intern Med 161: 2447–2454.

44. Liu WO, Oefner PJ, Qian C, Odom RS and Francke U (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms and sequence variants in Marfan syndrome and related connective tissue disorders. Genet Test 1: 237–242.

45. Hutchinson S, Wordsworth BP and Handford PA (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum Genet 109: 416–420.

46. Halliday D, Hutchinson S, Kettle S, Firth H, Wordsworth P and Handford PA (1999) Molecular analysis of eight mutations in FBN1. Hum Genet 105: 587–597.

47. Gupta PA, Wallis DD, Chin TO, Northrup H, Tran-Fadulu VT, Towbin JA and Milewicz DM (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J Med Genet 41: e56.

48. Gupta PA, Putnam EA, Carmical SG, Kaitila I, Steinmann B, Child A, Danesino C, Metcalfe K, Berry SA, Chen E et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum Mutat 19: 39–48.

49. Guo D, Tan FK, Cantu A, Plon SE and Milewicz DM (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am J Med Genet 101: 130–134.

50. Dietz HC, McIntosh I, Sakai LY, Corson GM, Chalberg SC, Pyeritz RE and Francomano CA (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics 17: 468–475.

51. Comeglio P, Johnson P, Arno G, Brice G, Evans A, Aragon-Martin J, da Silva FP, Kiotsekoglou A and Child A (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum Mutat 28: 928.

52. Collod-Beroud G, Le Bourdelles S, Ades L, Ala-Kokko L, Booms P, Boxer M, Child A, Comeglio P, De Paepe A, Hyland JC et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum Mutat 22: 199–208.

53. Chikumi H, Yamamoto T, Ohta Y, Nanba E, Nagata K, Ninomiya H, Narasaki K, Katoh T, Hisatome I, Ono K et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J Hum Genet 45: 115–118.

54. Biggin A, Holman K, Brett M, Bennetts B and Ades L (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum Mutat 23: 99.

55. Attanasio M, Lapini I, Evangelisti L, Lucarini L, Giusti B, Porciani M, Fattori R, Anichini C, Abbate R, Gensini G et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin Genet 74: 39–46.

56. Loeys BL, Chen J, Neptune ER, Judge DP, Podowski M, Holm T, Meyers J, Leitch CC, Katsanis N, Sharifi N et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat Genet 37: 275–281.

57. Houdayer C, Dehainault C, Mattler C, Michaux D, Caux-Moncoutier V, Pages-Berhouet S, d’Enghien CD, Lauge A, Castera L, Gauthier-Villars M et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum Mutat 29: 975–982.

58. Tournier I, Vezain M, Martins A, Charbonnier F, Baert-Desurmont S, Olschwang S, Wang Q, Buisine MP, Soret J, Tazi J et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum Mutat 29: 1412–1424.

59. Auclair J, Busine MP, Navarro C, Ruano E, Montmain G, Desseigne F, Saurin JC, Lasset C, Bonadona V, Giraud S et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum Mutat 27: 145–154.

60. Di Blasi C, He Y, Morandi L, Cornelio F, Guicheney P and Mora M (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain 124: 698–704.

61. Disset A, Bourgeois CF, Benmalek N, Claustres M, Stevenin J and Tuffery-Giraud S (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum Mol Genet 15: 999–1013.

62. Fackenthal JD, Cartegni L, Krainer AR and Olopade OI (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am J Hum Genet 71: 625–631.

63. Fairbrother WG, Yeh RF, Sharp PA and Burge CB (2002) Predictive identification of exonic splicing enhancers in human genes. Science 297: 1007–1013.

64. Mazoyer S, Puget N, Perrin-Vidoz L, Lynch HT, Serova-Sinilnikova OM and Lenoir GM (1998) A BRCA1 nonsense mutation causes exon skipping. Am J Hum Genet 62: 713–715.

65. Nielsen KB, Sorensen S, Cartegni L, Corydon TJ, Doktor TK, Schroeder LD, Reinert LS, Elpeleg O, Krainer AR, Gregersen N et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am J Hum Genet 80: 416–432.

66. Zatkova A, Messiaen L, Vandenbroucke I, Wieser R, Fonatsch C, Krainer AR and Wimmer K (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum Mutat 24: 491–501.

67. den Dunnen JT and Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15: 7–12.

68. Frederic MY, Monino C, Marschall C, Hamroun D, Faivre L, Jondeau G, Klein HG, Neumann L, Gautier E, Binquet C et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2) and genotype-phenotype correlations. Hum Mutat 30: 181–190.

69. Frederic MY, Hamroun D, Faivre L, Boileau C, Jondeau G, Claustres M, Beroud C and Collod-Beroud G (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum Mutat 29: 33–38.

70. Frank V, Ortiz Bruchle N, Mager S, Frints SG, Bohring A, du Bois G, Debatin I, Seidel H, Senderek J, Besbas N et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum Mutat 28: 638–639.

71. Anczukow O, Buisson M, Salles MJ, Triboulet S, Longy M, Lidereau R, Sinilnikova OM and Mazoyer S (2008) Unclassified variants identified in BRCA1 exon 11: Consequences on splicing. Genes Chromosomes Cancer 47: 418–426.

72. Ng W, Loh AX, Teixeira AS, Pereira SP and Swallow DM (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br J Cancer 99: 978–985.

73. Baala L, Romano S, Khaddour R, Saunier S, Smith UM, Audollent S, Ozilou C, Faivre L, Laurent N, Foliguet B et al. (2007) The Meckel-Gruber syndrome gene, MKS3, is mutated in Joubert syndrome. Am J Hum Genet 80: 186–194.

74. Habara Y, Doshita M, Hirozawa S, Yokono Y, Yagi M, Takeshima Y and Matsuo M (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J Biochem 143: 303–310.

75. Aartsma-Rus A, van Vliet L, Hirschi M, Janson AA, Heemskerk H, de Winter CL, de Kimpe S, van Deutekom JC, ’t Hoen PA and van Ommen GJ (2008) Guidelines for Antisense Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Mol Ther 17: 548–553.

76. Khan SG, Metin A, Gozukara E, Inui H, Shahlavi T, Muniz-Medina V, Baker CC, Ueda T, Aiken JR, Schneider TD et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum Mol Genet 13: 343–352.

77. Sharp PA and Burge CB (1997) Classification of introns: U2-type or U12-type. Cell 91: 875–879.

78. Chasin LA (2007) Searching for splicing motifs. Adv Exp Med Biol 623: 85–106.

79. Nalla VK and Rogan PK (2005) Automated splicing mutation analysis by information theory. Hum Mutat 25: 334–342.

80. Beroud C, Tuffery-Giraud S, Matsuo M, Hamroun D, Humbertclaude V, Monnier N, Moizard MP, Voelckel MA, Calemard LM, Boisseau P et al. (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy. Hum Mutat 28: 196–202.

81. (2007) What is the human variome project? Nat Genet 39: 423.

82. Kainulainen K, Karttunen L, Puhakka L, Sakai L and Peltonen L (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat Genet 6: 64–69.

83. Liu W, Qian C, Comeau K, Brenn T, Furthmayr H and Francke U (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe marfan syndrome. Hum Mol Genet 5: 1581–1587.

84. Booms P, Cisler J, Mathews KR, Godfrey M, Tiecke F, Kaufmann UC, Vetter U, Hagemeier C and Robinson PN (1999) Novel exon skipping mutation in the fibrillin-1 gene: two ‘hot spots’ for the neonatal Marfan syndrome. Clin Genet 55: 110–117.

85. Wang M, Price C, Han J, Cisler J, Imaizumi K, Van Thienen MN, DePaepe A and Godfrey M (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum Mol Genet 4: 607–613.

86. Godfrey M, Vandemark N, Wang M, Velinov M, Wargowski D, Tsipouras P, Han J, Becker J, Robertson W, Droste S et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am J Hum Genet 53: 472–480.

87. Wang M, Clericuzio CL and Godfrey M (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am J Hum Genet 59: 1027–1034.

88. Karttunen L, Ukkonen T, Kainulainen K, Syvanen AC and Peltonen L (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum Mutat Suppl 1: S34–S37.

89. Kosaki K, Takahashi D, Udaka T, Kosaki R, Matsumoto M, Ibe S, Isobe T, Tanaka Y and Takahashi T (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am J Med Genet A 140: 104–108; author reply 109–110.

90. Loeys BL, Schwarze U, Holm T, Callewaert BL, Thomas GH, Pannu H, De Backer JF, Oswald GL, Symoens S, Manouvrier S et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N Engl J Med 355: 788–798.

91. Tran VK, Takeshima Y, Zhang Z, Habara Y, Haginoya K, Nishiyama A, Yagi M and Matsuo M (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes but not in the skeletal muscle of a DMD patient. Hum Genet 120: 737–742.

92. Sharp A, Pichert G, Lucassen A and Eccles D (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum Mutat 24: 272.

93. Burrows NP, Nicholls AC, Richards AJ, Luccarini C, Harrison JB, Yates JR and Pope FM (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am J Hum Genet 63: 390–398.

94. Sinnreich M, Therrien C and Karpati G (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology 66: 1114–1116.


95. Maslen C, Babcock D, Raghunath M and Steinmann B (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am J Hum Genet 60: 1389–1398.

96. Vivenza D, Guazzarotti L, Godi M, Frasca D, di Natale B, Momigliano-Richiardi P, Bona G and Giordano M (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J Clin Endocrinol Metab 91: 980–986.

97. Chavanas S, Gache Y, Vailly J, Kanitakis J, Pulkkinen L, Uitto J, Ortonne J and Meneguzzi G (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum Mol Genet 8: 2097–2105.

98. Kuivenhoven JA, Weibusch H, Pritchard PH, Funke H, Benne R, Assmann G and Kastelein JJ (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J Clin Invest 98: 358–364.

99. Webb JC, Patel DD, Shoulders CC, Knight BL and Soutar AK (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum Mol Genet 5: 1325–1331.

100. Di Leo E, Panico F, Tarugi P, Battisti C, Federico A and Calandra S (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum Mutat 24: 440.

101. Vuillaumier-Barrot S, Le Bizec C, De Lonlay P, Madinier-Chappat N, Barnier A, Dupre T, Durand G and Seta N (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol Genet Metab 87: 337–340.

102. Janssen RJ, Wevers RA, Haussler M, Luyten JA, Steenbergen-Spanjers GC, Hoffmann GF, Nagatsu T and Van den Heuvel LP (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann Hum Genet 64: 375–382.

103. Mayer K, Ballhausen W, Leistner W and Rott H (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim Biophys Acta 1502: 495–507.


Table 1. Intronic mutations in FBN1 (ENST00000316623), FBN2 (ENST00000262464), RB1 (ENST00000267163), TGFBR2 (ENST00000295754), MLH1 (ENST00000231790) and MSH2 (ENST00000233146) that lead to splicing defects

Gene    Mutation         References                   WT CV   Mutant CV   CV variation (%)

Mutations causing exon skipping
FBN1    c.247+1G>A       (37,46,49–51,53)             82.26   55.42       −32.62 [a]
FBN1    c.538+1G>A       (45)                         83.99   57.15       −31.96 [a]
FBN1    c.1468+5G>A      (44)                         84.46   72.30       −14.40 [a]
FBN1    c.3208+5G>T      (82)                         94.98   82.66       −12.97 [a]
FBN1    c.3838+1G>A      (52)                         95.84   69.01       −28.00 [a]
FBN1    c.3839-1G>T      (83)                         87.62   58.67       −33.04 [a]
FBN1    c.3964+1G>A      (84,85)                      90.04   63.20       −29.80 [a]
FBN1    c.3965-2A>T      (85)                         89.30   60.35       −32.41 [a]
FBN1    c.4459+1G>A      (44)                         97.66   70.83       −27.47 [a]
FBN1    c.4943-1G>C      (44)                         79.77   50.82       −36.29 [a]
FBN1    c.5788+5G>A      (35,36,38,41,43,52,54,83)    88.06   75.89       −13.82 [a]
FBN1    c.6163+2del6     (83)                         99.05   72.90       −26.40 [a]
FBN1    c.6496+2insTG    (43)                         82.21   32.05       −61.01 [a]
FBN1    c.6616+1G>C      (86)                         78.08   51.24       −34.37 [a]
FBN1    c.6997+1G>A      (83)                         92.11   65.27       −29.13 [a]
FBN1    c.7205-2A>G      (83)                         84.11   55.16       −34.42 [a]
FBN1    c.7330+1G>A      (55)                         98.02   71.18       −27.38 [a]
FBN1    c.7331-2A>G      (40)                         80.72   51.77       −35.86 [a]
FBN1    c.8051+1G>A      (44)                         92.02   65.18       −29.16 [a]
FBN1    c.8051+5G>A      (51)                         92.02   79.85       −13.22 [a]
FBN1    c.8052-2A>G      (52)                         92.86   63.92       −31.17 [a]
FBN2    c.3472+2T>G      (48)                         90.99   64.15       −28.53 [a]
FBN2    c.4099+1G>C      (39)                         91.66   64.82       −29.28 [a]
FBN2    c.4222+5G>A      (47)                         92.11   79.94       −13.21 [a]
FBN2    c.4346-2A>T      (87)                         90.91   61.96       −31.84 [a]
RB1     c.264+4delA      (57)                         91.34   84.93       −7.01 [a]
RB1     c.380+3A>C       (57)                         95.10   78.82       −17.12 [a]
RB1     c.607+1G>T       (57)                         99.05   72.21       −27.09 [a]
RB1     c.939+4A>G       (57)                         83.75   75.41       −9.96 [a]
RB1     c.1049+2delT     (57)                         76.95   57.00       −25.90 [a]
RB1     c.1215+1G>A      (57)                         85.86   59.02       −31.26 [a]
RB1     c.1389+1G>A      (57)                         82.69   55.86       −32.45 [a]
RB1     c.1389+4A>G      (57)                         82.69   74.35       −10.09 [a]
RB1     c.1389+5G>A      (57)                         82.69   70.53       −14.71 [a]
RB1     c.1422-2A>T      (57)                         86.12   57.17       −33.62 [a]
RB1     c.1422-1G>A      (57)                         86.12   57.17       −33.62 [a]
RB1     c.1498+5G>A      (57)                         82.91   70.75       −14.67 [a]
RB1     c.1960+1G>A      (57)                         94.02   67.19       −28.54 [a]
RB1     c.1960+1delG     (57)                         94.02   49.62       −47.22 [a]
RB1     c.2211+1G>T      (57)                         89.90   63.06       −29.86 [a]
RB1     c.2212-2A>G      (57)                         89.09   60.15       −32.48 [a]
RB1     c.2211+1G>C      (57)                         89.90   63.06       −29.86 [a]
RB1     c.2520+1G>A      (57)                         92.22   65.39       −29.10 [a]
RB1     c.2520+3del4     (57)                         92.22   72.26       −21.64 [a]
RB1     c.2663+1G>A      (57)                         88.37   61.54       −30.36 [a]
MLH1    c.306+4A>G       (58)                         96.07   87.73       −8.68 [a]
MLH1    c.454-2A>G       (59)                         93.59   64.64       −28.80 [a]
MLH1    c.790+1G>A       (59)                         83.28   56.45       −32.22 [a]
MLH1    c.790+5G>T       (58)                         83.28   70.97       −14.79 [a]
MLH1    c.791-5T>G       (59)                         80.80   77.17       −4.49
MLH1    c.884+4A>G       (58)                         85.75   77.41       −9.73 [a]
MSH2    c.366+1G>T       (59)                         86.73   59.89       −30.95 [a]
MSH2    c.793-2A>C       (59)                         83.98   55.04       −34.46 [a]
MSH2    c.942+3A>T       (59)                         99.24   83.86       −15.50 [a]
MSH2    c.1276+2T>A      (59)                         84.70   57.86       −31.69 [a]
MSH2    c.1386+1G>A      (59)                         89.02   62.19       −30.13 [a]
MSH2    c.2634+5G>T      (58)                         84.41   72.09       −14.59 [a]

Mutations resulting in the usage of cryptic splice sites
FBN1    c.2293+2T>C      (83)                         89.77   62.94       −29.89
        cryptic site: 67.21, 5′ CS (51 nt upstream) [b]
FBN1    c.3463+1G>A      (88)                         91.34   64.50       −29.38
        cryptic site: 88.47, 5′ CS (27 nt downstream) [b]
FBN1    c.4747+5G>T      (42)                         89.13   76.81       −13.82
        cryptic site: 79.06, 5′ CS (48 nt upstream) [b]
FBN1    c.5788+1G>A      (52)                         88.06   61.22       −30.48
        cryptic site: 82.64, 5′ CS (33 nt downstream) [b]
RB1     c.138-8T>G       (57)                         81.62   79.69       −2.36

(continued)


reduction (ΔCV) of −31% and a standard deviation (SD) of 2.8%. Mutations affecting less conserved residues had a weaker effect, with a ΔCV of −7% for the residue in position +4 and −14% for nucleotides in position +3 or +5. These results, together with data from other disease-causing mutations (52,68,69), indicated that a CV reduction of at least 10% for a mutation in any position, or of 7% for a mutation in position +4, is likely to have a significant impact on splicing and should be further investigated.

Since a mutation can result not only in the disruption of a 5′ss or a 3′ss but also in the creation of a new splice site, HSF evaluates the ‘creation of cryptic splice sites’. As shown in Table 1 for intronic mutations, HSF correctly predicted the creation of cryptic splice sites in the RB1 mutants c.607+1delG, c.138-8T>G and c.501-1G>A. Mutations in canonical sequences, such as c.95-2A>G, c.1397-2A>G and c.1397-1G>A in TGFBR2, c.2293+2T>C, c.3463+1G>A, c.4747+5G>T and c.5788+1G>A in FBN1, and c.1815-2A>G, c.2107-2A>G and c.2211+1G>C in RB1, led to a more complex splicing defect in which disruption of the wt splice site was coupled to the usage of an alternative pre-existing splice site. As mutations do not directly affect alternative splice sites, this phenomenon was not automatically investigated by HSF. Therefore, to identify the alternative splice sites, we chose in ‘Select an analysis type’ the option ‘Number of nucleotides surrounding the exon’ and entered the value ‘100’. In addition, we checked the advanced parameter ‘Process sequence’ and selected the ‘Full sequence’ option. To analyze only splice sites, we then selected in ‘All or subset of matrices’ the ‘Splice site matrices’ option. Using these parameters, all alternative sites were identified, either as the closest and strongest alternative sites (five cases) or as the second-best sites (two cases). Overall, HSF correctly predicted the impact of mutations affecting 5′ss or 3′ss, even when complex mechanisms were involved.

In addition to splicing defects due to 5′ss or 3′ss disruption, it is well known that exonic mutations could result in the creation or activation of cryptic splice sites. As shown in Table 2, the nine mutations affecting the last base of an exon had a strong effect on the activity of the concerned 5′ss (ΔCV = −12 ± 0.7%) that resulted in exon skipping or activation of a cryptic splice site. The two mutations affecting the penultimate nucleotide of an exon had a limited effect on the activity of the 5′ss (ΔCV = −5.4 ± 0.3%). Indeed, these mutations were pathogenic only when a cryptic splice site was activated, and predictions were therefore uncertain. Finally, exonic mutations that were distant both from the 5′ and 3′ss could activate a cryptic splice site and result in splicing defects, as shown for mutations c.658C>G in RB1, c.1915C>T in MSH2 and c.5985T>G in DMD.
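
The CV thresholds discussed above can be illustrated with a short sketch. It assumes that the CV variation is the relative change with respect to the wild-type CV, which agrees with the tabulated values to within rounding. The function names and the `position` argument are hypothetical and are not part of HSF.

```python
from typing import Optional


def cv_variation(wt_cv: float, mutant_cv: float) -> float:
    """Relative change of the consensus value (CV), in percent of the wild-type CV."""
    if wt_cv <= 0:
        raise ValueError("wild-type CV must be positive")
    return (mutant_cv - wt_cv) / wt_cv * 100.0


def likely_affects_splicing(wt_cv: float, mutant_cv: float,
                            position: Optional[int] = None) -> bool:
    """Apply the thresholds quoted in the text: a reduction of at least 10%,
    or of at least 7% when the mutation lies at intronic position +4.
    `position` is a hypothetical argument (4 means position +4)."""
    threshold = 7.0 if position == 4 else 10.0
    return cv_variation(wt_cv, mutant_cv) <= -threshold


# WT and mutant CVs taken from Table 1
print(round(cv_variation(82.26, 55.42), 2))               # FBN1 c.247+1G>A
print(likely_affects_splicing(91.34, 84.93, position=4))  # RB1 c.264+4delA
print(likely_affects_splicing(80.80, 77.17))              # MLH1 c.791-5T>G
```

With these inputs the canonical +1 mutation and the +4 deletion are flagged, whereas the −5 substitution falls below both thresholds.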

Branch point sequences

We analyzed 14 BP sequences previously reported to be abolished by mutations. As shown in Table 3, 13 out of 14 BPs were correctly predicted by HSF, with an average strength of 83.4 and a standard deviation of 8.6. The only discrepancy concerned the mutation localized in intron 3 of GH1, for which the BP was predicted to be at position −26 by HSF instead of position −21. Note that in both cases the BP was located within the c.468-37_468-16del deletion, which is responsible for the cases of autosomal dominant isolated GH deficiency (IGHDII) in one single family, and therefore additional data are needed to identify the functional BP. Among the other BP sequences, 12 were reported as targets of point mutations leading to their inactivation. In six cases, the mutation involved the critical adenosine residue, leading to a remarkable ΔBP of −29.6%. For mutations involving residues surrounding the BP, the average ΔBP was −13.9% with a SD of 3%. Taking into account the weight matrix (Figure 1) and experimental data, the threshold for BP prediction was thus set at 67.

Table 1. Continued

Gene    Mutation         References   WT CV   Mutant CV   CV variation (%)

RB1     c.138-8T>G (cont.)
        cryptic site: 55.35 (WT), 84.29 (mutant), 3′ CS (7 nt upstream) [c]
RB1     c.501-1G>A       (57)         97.50   68.55       −29.69
        cryptic site: 54.82 (WT), 83.77 (mutant), 3′ CS (1 nt downstream) [c]
RB1     c.607+1delG      (57)         99.05   22.54       −77.24
        cryptic site: 42.51 (WT), 88.47 (mutant), 5′ CS (1 nt upstream) [c]
RB1     c.1815-2A>G      (57)         75.42   46.47       −38.39
        cryptic site: 81.84, 3′ CS (19 nt downstream) [b]
RB1     c.2107-2A>G      (57)         80.73   51.78       −35.86
        cryptic site: 69.56, 3′ CS (35 nt downstream) [b]
TGFBR2  c.95-2A>G        (56)         91.77   62.82       −31.55
        cryptic site: 68.28, 3′ CS (18 nt downstream) [b]
TGFBR2  c.1397-2A>G      (89)         92.32   63.38       −31.35
        cryptic site: 84.32, 3′ CS (30 nt upstream) [b]
TGFBR2  c.1397-1G>A      (90)         92.32   63.38       −31.35
        cryptic site: 84.32, 3′ CS (30 nt upstream) [b]

CS: cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
[a] The mutation induces exon skipping.
[b] A cryptic splice site not created by the mutation and used in vivo was correctly predicted by HSF.
[c] The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.


Auxiliary splicing sequences: enhancers and silencers

In order to simplify the interpretation of predictions obtained with the different algorithms using weight matrices, we used a normalized range scale from 0 to 100. As a consequence, previous matrices from ESE-Finder (19,27) were modified. Nevertheless, the user can define the thresholds using either the original ESE-Finder range or the new 0–100 range. In addition, when processing a single sequence and when CVs are available, HSF calculates the deviation as a percentage of the threshold. A reduced list can be obtained for each matrix by choosing the ‘Only variant’ option in ‘Advanced parameters’. A color code is used for each quartile (from white to orange) to simplify the analysis. When comparing mutant sequences, HSF uses this color code to indicate the differences between the two sequences.
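
As an illustration of the 0–100 normalization described above, the following sketch rescales a raw matrix score linearly between the lowest and highest scores a matrix can produce. The text does not give the exact transformation applied to the ESE-Finder matrices, so the linear (min–max) form, the quartile boundaries at 25, 50 and 75, and both function names are assumptions.

```python
def rescale_to_percent(score: float, min_score: float, max_score: float) -> float:
    """Linearly map a raw matrix score onto a 0-100 scale (assumed min-max form)."""
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    return 100.0 * (score - min_score) / (max_score - min_score)


def quartile(value: float) -> int:
    """Return the quartile (1-4) of a 0-100 value, e.g. to choose a display color."""
    if not 0.0 <= value <= 100.0:
        raise ValueError("value must lie in [0, 100]")
    return min(int(value // 25) + 1, 4)


# Hypothetical matrix whose raw scores range from -5 to +5
normalized = rescale_to_percent(2.5, min_score=-5.0, max_score=5.0)
print(normalized, quartile(normalized))
```

A threshold expressed on the original scale can be converted with the same function, so that either range can be used to define it.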

When scalability is not possible, HSF only displays the presence of a motif.

To evaluate the sensitivity and usefulness of auxiliary splicing sequence predictions, we used a first set of genes for which 20 mutations have been reported to result in exon skipping following targeting of ESE or ESS (58–66). For each mutation, we selected the default option that allows HSF to predict modifications of ESE and/or ESS motifs using all available matrices (Table 4). For mutation c.362C>T in ACADM or c.4250T>A in DMD, for which the target auxiliary sequences have been experimentally characterized (SF2/ASF and hnRNP A1, respectively), HSF correctly predicted the effect of the mutation. For other sequences, different scenarios were predicted: (i) disruption of one or more ESE without creation of an ESS, as observed for mutations c.882C>T (MLH1), c.362C>T (ACADM), c.8165C>G and

Table 2. Exonic mutations in DMD (ENST00000357033), MLH1 (ENST00000231790), MSH2 (ENST00000233146) and RB1 (ENST00000267163) involved in splicing

Gene   Mutation       Position           References   WT CV   Mutant CV   CV variation (%) / cryptic site

DMD    c.5985T>G      Deep exonic        (91)         46.65   75.59       3′ CS (63 nt downstream) [a]
MLH1   c.677G>A       Last base          (58)         84.46   73.89       −12.52 [b]
MLH1   c.882C>T       Exonic             (58)         84.46   73.89       −12.52 [b]
MLH1   c.1037A>G      Penultimate base   (58)         93.04   88.19       −5.22; 5′ CS (upstream [c])
MLH1   c.1038G>T      Last base          (58)         93.04   82.17       −11.68; 5′ CS (upstream [c])
MLH1   c.1667G>T      Last base          (92)         85.85   74.99       −12.66; 5′ CS (88 nt downstream) [a]
MLH1   c.1731G>A      Last base          (58)         93.27   82.69       −11.34
MLH1   c.1989G>T      Last base          (58)         93.22   82.35       −11.66
MSH2   c.1660A>T      Penultimate base   (58)         84.00   79.25       −5.65; 5′ CS (82 nt upstream) [a]
MSH2   c.1759G>C      Last base          (58)         85.66   74.65       −12.86 [b]
MSH2   c.1915C>T      Deep exonic        (59)         62.19   89.02       5′ CS (92 nt upstream) [a]
RB1    c.658C>G       Deep exonic        (57)         58.66   85.49       5′ CS (61 nt upstream) [a]
RB1    c.939G>T       Last base          (57)         83.75   72.88       −12.98 [b]
RB1    c.1960G>C      Last base          (57)         94.02   83.01       −11.71 [b]
RB1    c.1960G>A      Last base          (57)         94.02   83.44       −11.25 [b]

CS: cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
[a] The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.
[b] The mutation induces exon skipping.
[c] The cryptic splice site used in vitro was not clearly reported and therefore was not available for comparison.

Table 3. Branch point sequences

Gene     Intron   References   Ref BP   Ref Seq           HSF BP   HSF value

COL5A1   32       (93)         −27      ENST00000355306   −27      87.81
DYSF     31       (94)         −33      ENST00000258104   −33      93.13
FBN2     30       (95)         −24      ENST00000262464   −24      77.06
GH1      3        (96)         −21      ENST00000323322   −26      73.36
ITGB4    31       (97)         −17      ENST00000200181   −17      93.79
LCAT     4        (98)         −20      ENST00000264005   −20      95.07
LDLR     9        (99)         −25      ENST00000252444   −25      86.59
NPC1     6        (100)        −28      ENST00000269228   −28      77.41
PMM2     2        (101)        −25      ENST00000268261   −25      80.56
PMM2     7        (101)        −23      ENST00000268261   −23      72.27
RB1      23       (57)         −26      ENST00000267163   −26      75.89
TH       11       (102)        −22      ENST00000324155   −22      84.96
TSC2     38       (103)        −18      ENST00000219476   −18      67.71
XPC      3        (76)         −24      ENST00000285021   −24      82.78

For each gene, the reference sequence from the Ensembl genome database (Ref Seq), the intron number (Intron) and the position of the BP identified by in vitro experiments (Ref BP), as well as the BP position predicted by HSF (HSF BP) and the corresponding BP value (HSF value), are shown.


Table 4. Exonic mutations known to result in exon skipping through ESE inactivation or ESS activation

Gene    Mutation        Ref.   Motif              Ref Seq
        HSF prediction

ACADM   c.362C>T        (65)   ESE (SF2/ASF)      ENST00000370841
        −9G8 [i] (357_362); −SF2/ASF [e] (358_364); +EIE [h] (359_364); −SRp40 [e] (359_365); −EIE [h] (360_365); +IIE [c] ×4 (359_367)
BRCA1   c.5080G>T       (64)                      ENST00000357654
        −EIE [h] (5075_5080); +SRp55 [e] (5076_5081); −9G8 [i] (5077_5082); −SF2/ASF [e] (5078_5085); −IIE [c] (5078_5083); +IIE [c] (5079_5084); −ESS [a] (5076_5083); +hnRNP A1 [d] (5080_5085)
BRCA2   c.8165C>G       (62)   ESE                ENST00000380152
        −SRp40 [e] (8162_8168); −ESE [f] (8163_8168); +ESE [f] ×2 (8164_8170); −SRp55 [e] (8163_8169); −SF2/ASF [e] (8165_8171); −EIE [h] ×4 (8160_8168)
BRCA2   c.5081G>T       (64)                      ENST00000380152
        +SC35 [e] (5075_5082); +SRp40 [e] (5080_5086); −ESE [f,h] ×2 (5080_5086); −9G8 [i] (5081_5086); −ESS [a] (5078_5085)
DMD     c.4250T>A       (61)   +ESS (hnRNP A1)    ENST00000357033
        +9G8 [i] ×2 (4246_4251) (4248_4253); −EIE [h] (4248_4253); +ESE [f] (4250_4255); −IIE [c] ×3 (4246_4253); +hnRNP A1 [d] (4249_4254)
MLH1    c.544A>G        (59)                      ENST00000231790
        +ESS [a] (537_545); 5′ss ΔCV = −6.30%
MLH1    c.793C>T        (58)                      ENST00000231790
        +ESS [a] (795_802)
MLH1    c.794G>A        (58)                      ENST00000231790
        −SRp40 [e] (793_799); −SC35 [e] (794_801); +ESS [c] (794_799)
MLH1    c.882C>T        (58)                      ENST00000231790
        +SC35 [e] (876_883); −SRp55 [e] (877_882)
MLH1    c.988_990del    (58)                      ENST00000231790
        +SF2/ASF [e] (983_989); −SRp55 [e] (985_990); +9G8 [i] (985_990); −ESS [a] (985_992)
MSH2    c.815C>T        (58)                      ENST00000233146
        −SRp55 [e] (813_818); +ESS [a] (813_820); +ESS [c] ×5 (801_819)
MSH2    c.274_276del    (58)                      ENST00000233146
        +SC35 [e] (272_279); +SRp40 [e] ×2 (274_285); −IIE [c] ×2 (274_280)
LAMA2   c.2230C>T       (60)                      ENST00000354729
        −SF2/ASF [e] (2226_2232); +ESS [c] (2228_2235); +IIE [c] ×2 (2229_2235); +ESS [a] (2230_2237)
NF1     c.557A>T        (66)   ESE                ENST00000356175
        −SRp55 [e] (552_557); −ESE [f] (552_557); −EIE [h] ×4 (552_560); −9G8 [i] (553_558); +ESS [a] ×2 (550_557) (555_562)
NF1     c.910C>T        (66)   ESE                ENST00000356175
        −9G8 [i] (905_910); −EIE [h] (905_910); +ESE [f] (908_913); −ESE [f] (910_915); −ESS [a] (906_913)
NF1     c.943C>T        (66)   ESE                ENST00000356175
        −SC35 [e] (941_948); −SF2/ASF [e] (943_949); −PESE [g] (942_949); −9G8 [i] (938_943); +hnRNP A1 [d] (943_948); +IIE [c] (942_947)

(continued)


c.5081G>T (BRCA2), c.557A>T and c.910C>T (NF1); (ii) creation of one or more ESS without disruption of an ESE, as shown for mutations c.544A>G and c.793C>T (MLH1), c.4250T>A (DMD) and c.6792C>G (NF1); and (iii) an intermediate situation where both the disruption of one or more ESE and the creation of one or more ESS were predicted. This was observed for mutations c.5080G>T (BRCA1), c.794G>A and c.988_990del (MLH1), c.815C>T and c.274_276del (MSH2), c.2230C>T (LAMA2), c.943C>T, c.1007G>A and c.5719G>T (NF1). In order to evaluate the potential to differentiate ‘true’ ESE or ESS motifs from false positive signals, we selected a second set of 36 mutations (positive controls) and 220 SNPs (negative controls) (Supplementary Table 1). Predictions were classified in three categories: disruption of ESE motifs only (ESE), creation of ESS motifs only (ESS) or both (ESE+ESS). In addition, results were classified in two subsets: a first one (All), which included all predicted motifs, and a second one (Best), which was restricted to only one motif for each case by selecting the one recognized by the highest number of matrices.

Comparison of the three categories (ESE, ESS and ESE+ESS) revealed a significant difference between positive and negative controls, both in the ‘All’ (χ² = 10.05, P = 0.00656) and the ‘Best’ subset (χ² = 11.75, P = 0.0028). We then evaluated the potential of each matrix to differentiate true from false positive signals. No statistical differences were found using the Sironi, PESS, IIE, hnRNP A1 and RESCUE-ESE matrices. A statistically significant difference was found for the ‘All’ subset (χ² = 3.99, P = 0.045), but not for the ‘Best’ subset (χ² = 2.47, P = 0.116), with the EIE matrix. Significant results in both subsets were obtained with ESE-Finder (‘All’ subset: χ² = 5.17, P = 0.023; ‘Best’ subset: χ² = 7.33, P = 0.0067), the 9G8 and Tra2-β matrices from HSF (‘All’ subset: χ² = 9.92, P = 0.00164; ‘Best’ subset: χ² = 9.86, P = 0.00169) and PESE (‘All’ subset: χ² = 19.52, P = 9.95 × 10⁻⁶; ‘Best’ subset: χ² = 13.52, P = 2.36 × 10⁻⁴). The positive (PPV) and negative (NPV) predictive values, as well as the sensitivity (Sv) and the specificity (Sp), of these last three matrices were then evaluated. PPV ranged from 0.22 (9G8 and Tra2-β) to 0.56 (PESE), NPV from 0.76 (PESE) to 0.95 (9G8 and Tra2-β), Sv from 0.27 (PESE) to 0.40 (9G8 and Tra2-β) and Sp from 0.88 (9G8 and Tra2-β) to 0.91 (PESE). The ESE-Finder matrix showed intermediate values in all cases.
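
The statistics quoted above can be checked with a minimal sketch. It assumes 2 degrees of freedom for the comparison of the three categories between the two control groups and 1 degree of freedom for each single-matrix comparison. These values are not stated in the text, but they are consistent with the reported P-values. The confusion-matrix counts in the example are hypothetical and are not taken from Supplementary Table 1.

```python
import math


def chi2_p_value(chi2: float, df: int) -> float:
    """Upper-tail P-value of a chi-square statistic, using the closed forms
    available for 1 and 2 degrees of freedom."""
    if chi2 < 0:
        raise ValueError("chi2 must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> dict:
    """PPV, NPV, sensitivity (Sv) and specificity (Sp) from confusion-matrix counts."""
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Sv": tp / (tp + fn),
        "Sp": tn / (tn + fp),
    }


# Three categories in two control groups: df = 2 (assumed)
p_all = chi2_p_value(10.05, df=2)
# One matrix, motif predicted or not, in two control groups: df = 1 (assumed)
p_hsf = chi2_p_value(9.92, df=1)
print(f"{p_all:.5f} {p_hsf:.5f}")

# Hypothetical counts for 36 positive and 220 negative controls
example = predictive_values(tp=12, fp=22, tn=198, fn=24)
print({name: round(value, 2) for name, value in example.items()})
```

Under these assumptions the recomputed P-values agree with the reported ones to the precision given in the text.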

DISCUSSION

During evolution from simple to higher eukaryotes, splicing signals evolved from well-defined motifs to degenerated sequences, with the addition of new auxiliary splicing

Table 4. Continued

Gene    Mutation        Ref.   Motif              Ref Seq
        HSF prediction

NF1     c.1007G>A       (66)   ESE                ENST00000356175
        +PESE [g] (1007_1014); −EIE [h] ×2 (1003_1011); +9G8 [i] (1006_1011); +ESE [f] (1007_1014); −ESS [a] ×2 (1003_1011); −IIE [c] ×4 (1003_1011); +hnRNP A1 [d] (1006_1011)
NF1     c.5719G>T       (66)   ESE                ENST00000356175
        −ESE [f] ×5 (5715_5724); −EIE [h] ×5 (5715_5724); −ESS [a] ×2 (5714_5725); +PESS [g] ×2 (5712_5720); +hnRNP A1 [d] (5719_5724)
NF1     c.6792C>A       (66)   ESE                ENST00000356175
        +ESE [f] ×5 (6792_6797); −EIE [h] ×2 (6788_6793) (6790_6795); +Tra2-β [i] (6791_6795); −ESS [a] ×2 (6787_6794) (6792_6799)
NF1     c.6792C>G       (66)   ESE                ENST00000356175
        +ESE [f] (6792_6797); −EIE [h] ×2 (6788_6793) (6790_6795); +Tra2-β [i] (6791_6795); −ESS [a] ×2 (6787_6794) (6792_6799); +hnRNP A1 [d] (6790_6795)

+, a new site was created by the mutation; −, the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were:
[a] Silencer motifs from Sironi et al. (31)
[b] PESS octamers (28)
[c] IIEs (30)
[d] hnRNP motifs from HSF
[e] ESE-Finder matrices (19)
[f] RESCUE-ESE hexamers (63)
[g] PESE octamers (28)
[h] EIEs (30)
[i] ESE motifs from HSF
When multiple adjacent sites were predicted, the number of sites is indicated: ×5 means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.


sequences known as ESE and ESS. Although the major SR proteins have been cloned and their target sites determined, much work remains to be done to understand how splice signals are recognized and how splicing specificity is achieved. As this complex world is progressively revealed, bioinformatics resources could play a major role in helping researchers and diagnostic laboratories to evaluate the consequences of mutations on splicing, especially because most genetic tests use DNA and not RNA samples. By giving easy access to predictions of 5′ss, 3′ss and BP sequences, as well as ESE and ESS, the HSF tool (http://www.umd.be/HSF/) fulfills this need and may assist clinicians, geneticists and researchers (70–75). By combining motifs identified with different experimental and computational approaches, it provides a common interface that can be used for sequence analysis. The inclusion of all exons and introns extracted from the Ensembl human genome database (20) allows easy access to any sequence of human genes and thus direct comparison of virtually every mutation or SNP concerning splicing elements. Since SNPs are present at a very high frequency in the genome (1/300 bp), it could be useful to evaluate their impact in association with a mutation. We therefore included in HSF data from dbSNP using Ensembl BioMart. The user can select the ‘Search for SNPs related to the analyzed sequence’ option, which automatically retrieves SNPs from the database. When SNPs are localized in exons, their effect on ESE and ESS motifs could help the user to better evaluate the consequence of a given mutation.

To evaluate the efficiency of the various algorithms included in HSF and its contribution to the prediction of the consequences of mutations associated with a splicing defect, we used a set of 69 intronic mutations that disrupt the 5′ss or the 3′ss and result in exon skipping and/or activation of a cryptic splice site (Table 1), and a group of 15 mutations that were previously reported to result in splicing defects by creating or activating cryptic splice sites (Table 2). HSF was able to correctly predict the disruption of the natural splice sites. Moreover, we could confirm that (i) mutations of the last nucleotide of an exon have a strong effect on the 5′ss (ΔCV = −12 ± 0.7%), resulting frequently in exon skipping, partial exonic deletion or intronic retention due to activation of a cryptic splice site; (ii) mutations of the penultimate exonic nucleotide have limited consequences on the 5′ss (ΔCV = −5.4 ± 0.3%), but they can activate a cryptic splice site, making predictions more difficult; (iii) exonic mutations distant from the 5′ and 3′ss can activate a cryptic splice site, leading to partial exonic deletion. Overall, these findings underline the efficiency of the HSF algorithm to predict the effect of mutations on 5′ and 3′ss. When using the HSF algorithm, the threshold for 5′ and 3′ss is 65, with a pathogenic ΔCV of −10%, except for position +4 where it is −7%. However, in a few cases, when unusual splice sites are used, this algorithm could be less efficient.

BP sequences represent another essential splicing signal. When a mutation is localized close to the 5′ side of the 3′ss, its potential effect on a BP sequence should be examined, especially when a nucleotide located at less than 85 bp from the 3′ss is targeted. In order to evaluate the HSF algorithm dedicated to the identification of BP sequences, we used 14 BP sequences inactivated by intronic mutations (Table 3). HSF correctly predicted 13 out of 14 BPs, and these data allowed us to define the threshold for BP detection at 67 and the pathogenic BP variation at −10%. Moreover, for intron 3 of XPC, HSF predicted a BP at position −24. However, according to Khan et al. (76), two BP sequences are present in this intron, one at position −24 and another at −4. HSF could not predict the BP at position −4 simply because the HSF algorithm excludes positions −12 to −1 for BP identification because of steric obstruction caused by the spliceosome.
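The decision rules quoted above (a detection threshold of 65 for splice sites and 67 for BPs, a pathogenic variation of −10%, or −7% at position +4, and the exclusion of positions −12 to −1 for BP identification) can be summarized in a short Python sketch. It is not the HSF implementation: the function names are hypothetical, and it assumes that the variation is expressed relative to the wild-type score, which the text does not state explicitly.

```python
# Minimal sketch of the thresholds described in the text (names are illustrative).
SPLICE_SITE_THRESHOLD = 65.0   # minimum consensus value (CV) for a 5'ss or 3'ss
BP_THRESHOLD = 67.0            # minimum score for a branch point (BP)
PATHOGENIC_VARIATION = -10.0   # percent, any position
PATHOGENIC_VARIATION_PLUS4 = -7.0  # percent, position +4 of the 5'ss


def variation_percent(wt_score, mut_score):
    """Variation of the mutant score relative to wild type (assumed definition)."""
    return 100.0 * (mut_score - wt_score) / wt_score


def splice_site_disrupted(wt_cv, mut_cv, position=None):
    """Flag a wild-type splice site whose CV drops past the pathogenic cut-off."""
    if wt_cv < SPLICE_SITE_THRESHOLD:
        return False  # the wild-type sequence is not scored as a splice site
    cutoff = PATHOGENIC_VARIATION_PLUS4 if position == 4 else PATHOGENIC_VARIATION
    return variation_percent(wt_cv, mut_cv) <= cutoff


def bp_disrupted(wt_score, mut_score):
    """Same rule for a branch point, using the BP detection threshold."""
    if wt_score < BP_THRESHOLD:
        return False
    return variation_percent(wt_score, mut_score) <= PATHOGENIC_VARIATION


def bp_position_allowed(position):
    """Positions -12 to -1 relative to the 3'ss are excluded from BP search."""
    return not (-12 <= position <= -1)


if __name__ == "__main__":
    print(splice_site_disrupted(90.0, 80.0))              # variation of about -11%
    print(splice_site_disrupted(90.0, 83.0, position=4))  # about -8% at +4
    print(bp_position_allowed(-4), bp_position_allowed(-24))
```

The last line mirrors the XPC intron 3 case: a BP at −24 lies in the searched region, whereas one at −4 falls inside the excluded window and cannot be reported.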

It has been demonstrated that two different splicing recognition mechanisms, correlated with intron length, can be used in a cell: exon definition for long introns and intron definition for short introns (77). Although the influence of intron length seems to be less important in humans than in other species, it should nevertheless be kept in mind, since U12- and U2-type introns have different BP consensus sequences. In the present version of HSF (v2.4), we only focused on U2-type introns, which are by far the most abundant type in mammalian cells.

Concerning cis-acting elements, many studies have been performed to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets only share partial homology. To solve this problem, HSF includes all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNPA1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and therefore the identification of the specific motif/protein pair involved in a given splicing defect is difficult. This is even more complicated when considering the impact of SR and ribonucleoprotein concentration in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the efficiency of HSF to correctly predict motifs disrupted by these mutations. We showed that when the motif/protein pairs had been previously experimentally characterized (hnRNPA1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems) and therefore the motif/protein pair is unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate HSF efficiency to discriminate true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE+ESS: χ² = 11.75, P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33, P = 0.0067), 9G8 and Tra2-β from HSF (χ² = 9.86, P = 0.0017) and PESE (χ² = 13.52, P = 2.36 × 10⁻⁴). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls. Indeed, only few experimental validations of auxiliary sequences are available, and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2-β HSF matrices gave stronger results than ESE-Finder itself and therefore can be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNPA1 matrix, should also be considered as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif predictions, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address the tissue or developmental specificity.
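The P-values quoted in this paragraph can be cross-checked from the χ² statistics with the Python standard library alone. The sketch below assumes two degrees of freedom for the three-category comparison (ESE, ESS, ESE+ESS in positive versus negative controls) and one degree of freedom for each single-matrix comparison; these degrees of freedom are our assumption, as they are not given in the text, and the function name is illustrative.

```python
import math

# Minimal sketch: upper-tail probability of a chi-square statistic,
# using the closed forms that exist for 1 and 2 degrees of freedom.

def chi2_p_value(statistic, df):
    """Return P(X >= statistic) for a chi-square variable with df = 1 or 2."""
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


if __name__ == "__main__":
    # (label, chi-square, assumed df, P-value reported in the text)
    reported = [
        ("overall pattern", 11.75, 2, 0.0028),
        ("ESE-Finder", 7.33, 1, 0.0067),
        ("9G8 and Tra2-beta", 9.86, 1, 0.0017),
        ("PESE", 13.52, 1, 2.36e-4),
    ]
    for label, chi2, df, p_text in reported:
        print(f"{label}: computed P = {chi2_p_value(chi2, df):.3g}, reported P = {p_text:.3g}")
```

For other degrees of freedom a general routine such as a regularized incomplete gamma function would be needed; it is omitted here to keep the example dependency-free.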

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5′ and 3′ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies, such as the exon-skipping approach in Duchenne Muscular Dystrophy, are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world and especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects which either aim at developing a holistic view of Genotype-To-Phenotype data (GEN2PHEN European project, http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81), might benefit from using HSF. Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754 (the GEN2PHEN project). The European Community Sixth Framework Program (FP6) under grant agreement number 036825, TREAT-NMD Network of Excellence. Funding for open access charge: Institut National de la Sante Et de la Recherche Medicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget SM, Moore C and Sharp PA (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci USA 74, 3171–3175.

2. Nilsen TW (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays 25, 1147–1149.

3. Zhou Z, Licklider LJ, Gygi SP and Reed R (2002) Comprehensive proteomic analysis of the human spliceosome. Nature 419, 182–185.

4. Breitbart RE, Nguyen HT, Medford RM, Destree AT, Mahdavi V and Nadal-Ginard B (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell 41, 67–82.

5. Maniatis T and Tasic B (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418, 236–243.

6. Cartegni L, Chew SL and Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3, 285–298.

7. Robberson BL, Cote GJ and Berget SM (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol Cell Biol 10, 84–94.

8. Jacob M and Gallinaro H (1989) The 5′ splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res 17, 2159–2180.

9. Blencowe BJ (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci 25, 106–110.

10. Zhu J, Mayeda A and Krainer AR (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol Cell 8, 1351–1361.

11. Zhang XH, Leslie CS and Chasin LA (2005) Computational searches for splicing signals. Methods 37, 292–305.

12. Bhasi A, Pandey RV, Utharasamy SP and Senapathy P (2007) EuSplice: A unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics 23, 1815–1823.

13. Churbanov A, Rogozin IB, Deogun JS and Ali H (2006) Method of predicting splice sites based on signal interactions. Biol Direct 1, 10.

14. Dunckley MG, Manoharan M, Villiet P, Eperon IC and Dickson G (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum Mol Genet 7, 1083–1090.

15. Wilton SD and Fletcher S (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr Gene Ther 5, 467–483.

16. Beroud C, Hamroun D, Collod-Beroud G, Boileau C, Soussi T and Claustres M (2005) UMD (Universal Mutation Database): 2005 update. Hum Mutat 26, 184–191.

17. Beroud C, Collod-Beroud G, Boileau C, Soussi T and Junien C (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum Mutat 15, 86–94.

18. Fairbrother WG, Yeo GW, Yeh R, Goldstein P, Mawson M, Sharp PA and Burge CB (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res 32, W187–W190.

19. Cartegni L, Wang J, Zhu Z, Zhang MQ and Krainer AR (2003) ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res 31, 3568–3571.

20. Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T et al. (2008) Ensembl 2008. Nucleic Acids Res 36, D707–D714.

21. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 36, D773–D779.


22. Shapiro MB and Senapathy P (1987) RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res 15, 7155–7174.

23. Yeo G and Burge CB (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11, 377–394.

24. Green MR (1991) Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. Annu Rev Cell Biol 7, 559–599.

25. Gooding C, Clark F, Wollerton MC, Grellscheid SN, Groom H and Smith CW (2006) A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol 7, R1.

26. Kol G, Lev-Maor G and Ast G (2005) Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum Mol Genet 14, 1559–1568.

27. Smith PJ, Zhang C, Wang J, Chew SL, Zhang MQ and Krainer AR (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum Mol Genet 15, 2490–2508.

28. Zhang XH and Chasin LA (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev 18, 1241–1250.

29. Goren A, Ram O, Amit M, Keren H, Lev-Maor G, Vig I, Pupko T and Ast G (2006) Comparative analysis identifies exonic splicing regulatory sequences–The complex definition of enhancers and silencers. Mol Cell 22, 769–781.

30. Zhang C, Li WH, Krainer AR and Zhang MQ (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc Natl Acad Sci USA 105, 5797–5802.

31. Sironi M, Menozzi G, Riva L, Cagliani R, Comi GP, Bresolin N, Giorda R and Pozzoli U (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res 32, 1783–1791.

32. Wang Z, Rolish ME, Yeo G, Tung V, Mawson M and Burge CB (2004) Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845.

33. Thompson JD, Higgins DG and Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.

34. Bailey TL, Williams N, Misleh C and Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34, W369–W373.

35. Yuan B, Thomas JP, von Kodolitsch Y and Pyeritz RE (1999) Comparison of heteroduplex analysis, direct sequencing and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum Mutat 14, 440–446.

36. Youil R, Toner TJ, Bull E, Bailey AL, Earl CD, Dietz HC and Montgomery RA (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum Mutat 16, 92–93.

37. Schrijver I, Liu W, Odom R, Brenn T, Oefner P, Furthmayr H and Francke U (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am J Hum Genet 71, 223–237.

38. Rommel K, Karck M, Haverich A, Schmidtke J and Arslan-Kirchner M (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum Mutat 20, 406–407.

39. Park ES, Putnam EA, Chitayat D, Child A and Milewicz DM (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am J Med Genet 78, 350–355.

40. Palz M, Tiecke F, Booms P, Goldner B, Rosenberg T, Fuchs J, Skovby F, Schumacher H, Kaufmann UC, von Kodolitsch Y et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am J Med Genet 91, 212–221.

41. Nijbroek G, Sood S, McIntosh I, Francomano CA, Bull E, Pereira L, Ramirez F, Pyeritz RE and Dietz HC (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am J Hum Genet 57, 8–21.

42. McGrory J and Cole WG (1999) Alternative splicing of exon 37 of FBN1 deletes part of an ‘eight-cysteine’ domain resulting in the Marfan syndrome. Clin Genet 55, 118–121.

43. Loeys B, Nuytinck L, Delvaux I, De Bie S and De Paepe A (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch Intern Med 161, 2447–2454.

44. Liu WO, Oefner PJ, Qian C, Odom RS and Francke U (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms and sequence variants in Marfan syndrome and related connective tissue disorders. Genet Test 1, 237–242.

45. Hutchinson S, Wordsworth BP and Handford PA (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum Genet 109, 416–420.

46. Halliday D, Hutchinson S, Kettle S, Firth H, Wordsworth P and Handford PA (1999) Molecular analysis of eight mutations in FBN1. Hum Genet 105, 587–597.

47. Gupta PA, Wallis DD, Chin TO, Northrup H, Tran-Fadulu VT, Towbin JA and Milewicz DM (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J Med Genet 41, e56.

48. Gupta PA, Putnam EA, Carmical SG, Kaitila I, Steinmann B, Child A, Danesino C, Metcalfe K, Berry SA, Chen E et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum Mutat 19, 39–48.

49. Guo D, Tan FK, Cantu A, Plon SE and Milewicz DM (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am J Med Genet 101, 130–134.

50. Dietz HC, McIntosh I, Sakai LY, Corson GM, Chalberg SC, Pyeritz RE and Francomano CA (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics 17, 468–475.

51. Comeglio P, Johnson P, Arno G, Brice G, Evans A, Aragon-Martin J, da Silva FP, Kiotsekoglou A and Child A (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum Mutat 28, 928.

52. Collod-Beroud G, Le Bourdelles S, Ades L, Ala-Kokko L, Booms P, Boxer M, Child A, Comeglio P, De Paepe A, Hyland JC et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum Mutat 22, 199–208.

53. Chikumi H, Yamamoto T, Ohta Y, Nanba E, Nagata K, Ninomiya H, Narasaki K, Katoh T, Hisatome I, Ono K et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J Hum Genet 45, 115–118.

54. Biggin A, Holman K, Brett M, Bennetts B and Ades L (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum Mutat 23, 99.

55. Attanasio M, Lapini I, Evangelisti L, Lucarini L, Giusti B, Porciani M, Fattori R, Anichini C, Abbate R, Gensini G et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin Genet 74, 39–46.

56. Loeys BL, Chen J, Neptune ER, Judge DP, Podowski M, Holm T, Meyers J, Leitch CC, Katsanis N, Sharifi N et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat Genet 37, 275–281.

57. Houdayer C, Dehainault C, Mattler C, Michaux D, Caux-Moncoutier V, Pages-Berhouet S, d’Enghien CD, Lauge A, Castera L, Gauthier-Villars M et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum Mutat 29, 975–982.

58. Tournier I, Vezain M, Martins A, Charbonnier F, Baert-Desurmont S, Olschwang S, Wang Q, Buisine MP, Soret J, Tazi J et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum Mutat 29, 1412–1424.

59. Auclair J, Buisine MP, Navarro C, Ruano E, Montmain G, Desseigne F, Saurin JC, Lasset C, Bonadona V, Giraud S et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum Mutat 27, 145–154.

60. Di Blasi C, He Y, Morandi L, Cornelio F, Guicheney P and Mora M (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain 124, 698–704.

61. Disset A, Bourgeois CF, Benmalek N, Claustres M, Stevenin J and Tuffery-Giraud S (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum Mol Genet 15, 999–1013.

62. Fackenthal JD, Cartegni L, Krainer AR and Olopade OI (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am J Hum Genet 71, 625–631.

63. Fairbrother WG, Yeh RF, Sharp PA and Burge CB (2002) Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013.

64. Mazoyer S, Puget N, Perrin-Vidoz L, Lynch HT, Serova-Sinilnikova OM and Lenoir GM (1998) A BRCA1 nonsense mutation causes exon skipping. Am J Hum Genet 62, 713–715.

65. Nielsen KB, Sorensen S, Cartegni L, Corydon TJ, Doktor TK, Schroeder LD, Reinert LS, Elpeleg O, Krainer AR, Gregersen N et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am J Hum Genet 80, 416–432.

66. Zatkova A, Messiaen L, Vandenbroucke I, Wieser R, Fonatsch C, Krainer AR and Wimmer K (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum Mutat 24, 491–501.

67. den Dunnen JT and Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15, 7–12.

68. Frederic MY, Monino C, Marschall C, Hamroun D, Faivre L, Jondeau G, Klein HG, Neumann L, Gautier E, Binquet C et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2) and genotype-phenotype correlations. Hum Mutat 30, 181–190.

69. Frederic MY, Hamroun D, Faivre L, Boileau C, Jondeau G, Claustres M, Beroud C and Collod-Beroud G (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum Mutat 29, 33–38.

70. Frank V, Ortiz Bruchle N, Mager S, Frints SG, Bohring A, du Bois G, Debatin I, Seidel H, Senderek J, Besbas N et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum Mutat 28, 638–639.

71. Anczukow O, Buisson M, Salles MJ, Triboulet S, Longy M, Lidereau R, Sinilnikova OM and Mazoyer S (2008) Unclassified variants identified in BRCA1 exon 11: Consequences on splicing. Genes Chromosomes Cancer 47, 418–426.

72. Ng W, Loh AX, Teixeira AS, Pereira SP and Swallow DM (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br J Cancer 99, 978–985.

73. Baala L, Romano S, Khaddour R, Saunier S, Smith UM, Audollent S, Ozilou C, Faivre L, Laurent N, Foliguet B et al. (2007) The Meckel-Gruber syndrome gene, MKS3, is mutated in Joubert syndrome. Am J Hum Genet 80, 186–194.

74. Habara Y, Doshita M, Hirozawa S, Yokono Y, Yagi M, Takeshima Y and Matsuo M (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J Biochem 143, 303–310.

75. Aartsma-Rus A, van Vliet L, Hirschi M, Janson AA, Heemskerk H, de Winter CL, de Kimpe S, van Deutekom JC, ’t Hoen PA and van Ommen GJ (2008) Guidelines for Antisense Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Mol Ther 17, 548–553.

76. Khan SG, Metin A, Gozukara E, Inui H, Shahlavi T, Muniz-Medina V, Baker CC, Ueda T, Aiken JR, Schneider TD et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum Mol Genet 13, 343–352.

77. Sharp PA and Burge CB (1997) Classification of introns: U2-type or U12-type. Cell 91, 875–879.

78. Chasin LA (2007) Searching for splicing motifs. Adv Exp Med Biol 623, 85–106.

79. Nalla VK and Rogan PK (2005) Automated splicing mutation analysis by information theory. Hum Mutat 25, 334–342.

80. Beroud C, Tuffery-Giraud S, Matsuo M, Hamroun D, Humbertclaude V, Monnier N, Moizard MP, Voelckel MA, Calemard LM, Boisseau P et al. (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy. Hum Mutat 28, 196–202.

81. (2007) What is the human variome project? Nat Genet 39, 423.

82. Kainulainen K, Karttunen L, Puhakka L, Sakai L and Peltonen L (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat Genet 6, 64–69.

83. Liu W, Qian C, Comeau K, Brenn T, Furthmayr H and Francke U (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe marfan syndrome. Hum Mol Genet 5, 1581–1587.

84. Booms P, Cisler J, Mathews KR, Godfrey M, Tiecke F, Kaufmann UC, Vetter U, Hagemeier C and Robinson PN (1999) Novel exon skipping mutation in the fibrillin-1 gene: two ‘hot spots’ for the neonatal Marfan syndrome. Clin Genet 55, 110–117.

85. Wang M, Price C, Han J, Cisler J, Imaizumi K, Van Thienen MN, DePaepe A and Godfrey M (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum Mol Genet 4, 607–613.

86. Godfrey M, Vandemark N, Wang M, Velinov M, Wargowski D, Tsipouras P, Han J, Becker J, Robertson W, Droste S et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am J Hum Genet 53, 472–480.

87. Wang M, Clericuzio CL and Godfrey M (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am J Hum Genet 59, 1027–1034.

88. Karttunen L, Ukkonen T, Kainulainen K, Syvanen AC and Peltonen L (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum Mutat Suppl 1, S34–S37.

89. Kosaki K, Takahashi D, Udaka T, Kosaki R, Matsumoto M, Ibe S, Isobe T, Tanaka Y and Takahashi T (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am J Med Genet A 140, 104–108; author reply 109–110.

90. Loeys BL, Schwarze U, Holm T, Callewaert BL, Thomas GH, Pannu H, De Backer JF, Oswald GL, Symoens S, Manouvrier S et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N Engl J Med 355, 788–798.

91. Tran VK, Takeshima Y, Zhang Z, Habara Y, Haginoya K, Nishiyama A, Yagi M and Matsuo M (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes, but not in the skeletal muscle of a DMD patient. Hum Genet 120, 737–742.

92. Sharp A, Pichert G, Lucassen A and Eccles D (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum Mutat 24, 272.

93. Burrows NP, Nicholls AC, Richards AJ, Luccarini C, Harrison JB, Yates JR and Pope FM (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am J Hum Genet 63, 390–398.

94. Sinnreich M, Therrien C and Karpati G (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology 66, 1114–1116.


95. Maslen C, Babcock D, Raghunath M and Steinmann B (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am J Hum Genet 60, 1389–1398.

96. Vivenza D, Guazzarotti L, Godi M, Frasca D, di Natale B, Momigliano-Richiardi P, Bona G and Giordano M (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J Clin Endocrinol Metab 91, 980–986.

97. Chavanas S, Gache Y, Vailly J, Kanitakis J, Pulkkinen L, Uitto J, Ortonne J and Meneguzzi G (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum Mol Genet 8, 2097–2105.

98. Kuivenhoven JA, Weibusch H, Pritchard PH, Funke H, Benne R, Assmann G and Kastelein JJ (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J Clin Invest 98, 358–364.

99. Webb JC, Patel DD, Shoulders CC, Knight BL and Soutar AK (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum Mol Genet 5, 1325–1331.

100. Di Leo E, Panico F, Tarugi P, Battisti C, Federico A and Calandra S (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum Mutat 24, 440.

101. Vuillaumier-Barrot S, Le Bizec C, De Lonlay P, Madinier-Chappat N, Barnier A, Dupre T, Durand G and Seta N (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol Genet Metab 87, 337–340.

102. Janssen RJ, Wevers RA, Haussler M, Luyten JA, Steenbergen-Spanjers GC, Hoffmann GF, Nagatsu T and Van den Heuvel LP (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann Hum Genet 64, 375–382.

103. Mayer K, Ballhausen W, Leistner W and Rott H (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim Biophys Acta 1502, 495–507.


reduction (ΔCV) of −31% and a standard deviation (SD) of 2.8. Mutations affecting less conserved residues had a weaker effect, with a ΔCV of −7% for the residue in position +4 and −14% for nucleotides in position +3 or +5. These results, together with data from other disease-causing mutations (52,68,69), indicated that a CV reduction of at least 10% for a mutation in any position, or of 7% for a mutation in position +4, is likely to have a significant impact on splicing and should be further investigated.

Since a mutation can result not only in the disruption of a 5′ss or a 3′ss but also in the creation of a new splice site, HSF evaluates the ‘creation of cryptic splice sites’. As shown in Table 1 for intronic mutations, HSF correctly predicted the creation of cryptic splice sites in the RB1 mutants c.607+1delG, c.138-8T>G and c.501-1G>A. Mutations in canonical sequences, such as c.95-2A>G, c.1397-2A>G and c.1397-1G>A in TGFBR2; c.2293+2T>C, c.3463+1G>A, c.4747+5G>T and c.5788+1G>A in FBN1; and c.1815-2A>G, c.2107-2A>G and c.2211+1G>C in RB1, led to a more complex splicing defect in which disruption of the wt splice site was coupled to the usage of an alternative pre-existing splice site. As mutations do not directly affect alternative splice sites, this phenomenon was not automatically investigated by HSF. Therefore, to identify the alternative splice sites, we chose in ‘Select an analysis type’ the option ‘Number of nucleotides surrounding the exon’ and entered the value ‘100’. In addition, we checked the advanced parameter ‘Process sequence’ and selected the ‘Full sequence’ option. To analyze only splice sites, we then selected in ‘All or subset of matrices’ the ‘Splice site matrices’ option. Using these parameters, all alternative sites were identified either as the closest and strongest alternative sites (five cases) or as the second-best sites (two cases). Overall, HSF correctly predicted the impact of mutations affecting 5′ss or 3′ss, even when complex mechanisms were involved.

In addition to splicing defects due to 5′ss or 3′ss disruption, it is well known that exonic mutations can result in the creation or activation of cryptic splice sites. As shown in Table 2, the nine mutations affecting the last base of an exon had a strong effect on the activity of the concerned 5′ss (ΔCV = −12 ± 0.7%) that resulted in exon skipping or activation of a cryptic splice site. The two mutations affecting the penultimate nucleotide of an exon had a limited effect on the activity of the 5′ss (ΔCV = −5.4 ± 0.3%). Indeed, these mutations were pathogenic only when a cryptic splice site was activated, and predictions were therefore uncertain. Finally, exonic mutations that were distant from both the 5′ and 3′ss could activate a cryptic splice site and result in splicing defects, as shown for mutations c.658C>G in RB1, c.1915C>T in MSH2 and c.5985T>G in DMD.
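The consensus-value figures quoted for exonic mutations can be recomputed from the wild-type and mutant values of Table 2. The following is a minimal sketch in Python, not HSF's own code. It assumes that the CV variation is the relative change (mutant − WT)/WT expressed as a percentage, and that the flattened table values carry two decimal places (for example 84.46 and 73.89). The function names and the encoding of the 10% and 7% cut-offs are illustrative only.

```python
from statistics import mean, stdev


def delta_cv(wt_cv: float, mutant_cv: float) -> float:
    """Relative change of the consensus value in percent (assumed definition)."""
    return (mutant_cv - wt_cv) / wt_cv * 100.0


def likely_affects_splicing(delta: float, position_plus4: bool = False) -> bool:
    """Rule of thumb from the text: a reduction of at least 10%, or 7% at position +4."""
    cutoff = -7.0 if position_plus4 else -10.0
    return delta <= cutoff


# (WT CV, mutant CV) for the nine last-base mutations listed in Table 2,
# read with two decimal places (an assumption about the flattened table).
LAST_BASE = [
    (84.46, 73.89), (93.04, 82.17), (85.85, 74.99),
    (93.27, 82.69), (93.22, 82.35), (85.66, 74.65),
    (83.75, 72.88), (94.02, 83.01), (94.02, 83.44),
]

deltas = [delta_cv(wt, mut) for wt, mut in LAST_BASE]
print(f"mean dCV = {mean(deltas):.2f}%, sample SD = {stdev(deltas):.2f}")
print("all beyond the 10% cut-off:", all(likely_affects_splicing(d) for d in deltas))
```

Under these assumptions the mean and sample standard deviation are in line with the summary value reported for last-base mutations, and every one of the nine mutations exceeds the 10% cut-off.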

Branch point sequences

We analyzed 14 BP sequences previously reported to be abolished by mutations. As shown in Table 3, 13 out of 14 BPs were correctly predicted by HSF, with an average strength of 83.4 and a standard deviation of 8.6. The only discrepancy concerned the mutation localized in intron 3 of GH1, for which the BP was predicted to be at position −26 by HSF instead of position −21. Note that in both cases the BP was located within the c.468-37_468-16del, which is responsible for autosomal dominant isolated GH deficiency (IGHD II) in one single family; additional data are therefore needed to identify the functional BP. Among the other BP sequences, 12 were reported as targets of point mutations leading to their inactivation. In six cases, the mutation involved the critical adenosine residue, leading to a remarkable ΔBP of −29.6%. For mutations involving residues surrounding the BP, the average ΔBP was −13.9% with a SD of 3. Taking into account the weight matrix (Figure 1) and experimental data, the threshold for BP prediction was thus set at 67.

Table 1. Continued

| Gene | Mutation | References | WT CV | Mutant CV | CV variation (%) | Cryptic site |
|---|---|---|---|---|---|---|
| | | | 55.35 | 84.29 | | 3′ CS (7 nt upstream)ᶜ |
| RB1 | c.501-1G>A | (57) | 97.50 | 68.55 | −29.69 | |
| | | | 54.82 | 83.77 | | 3′ CS (1 nt downstream)ᶜ |
| RB1 | c.607+1delG | (57) | 99.05 | 22.54 | −77.24 | |
| | | | 42.51 | 88.47 | | 5′ CS (1 nt upstream)ᶜ |
| RB1 | c.1815-2A>G | (57) | 75.42 | 46.47 | −38.39 | 3′ CS (19 nt downstream)ᵇ; CV 81.84 |
| RB1 | c.2107-2A>G | (57) | 80.73 | 51.78 | −35.86 | 3′ CS (35 nt downstream)ᵇ; CV 69.56 |
| TGFBR2 | c.95-2A>G | (56) | 91.77 | 62.82 | −31.55 | 3′ CS (18 nt downstream)ᵇ; CV 68.28 |
| TGFBR2 | c.1397-2A>G | (89) | 92.32 | 63.38 | −31.35 | 3′ CS (30 nt upstream)ᵇ; CV 84.32 |
| TGFBR2 | c.1397-1G>A | (90) | 92.32 | 63.38 | −31.35 | 3′ CS (30 nt upstream)ᵇ; CV 84.32 |

CS: cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
ᵃThe mutation induces exon skipping.
ᵇA cryptic splice site not created by the mutation and used in vivo was correctly predicted by HSF.
ᶜThe cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.


Auxiliary splicing sequences: enhancers and silencers

In order to simplify the interpretation of predictions obtained with the different algorithms using weight matrices, we used a normalized scale ranging from 0 to 100. As a consequence, previous matrices from ESE-Finder (19,27) were modified. Nevertheless, the user can define the thresholds using either the original ESE-Finder range or the new 0–100 range. In addition, when processing a single sequence and when CVs are available, HSF calculates the deviation as a percentage of the threshold. A reduced list can be obtained for each matrix by choosing the ‘Only variant’ option in ‘Advanced parameters’. A color code is used for each quartile (from white to orange) to simplify the analysis. When comparing mutant sequences, HSF uses this color code to indicate the differences between the two sequences. When scalability is not possible, HSF only displays the presence of a motif.

To evaluate the sensitivity and usefulness of auxiliary splicing sequence predictions, we used a first set of genes for which 20 mutations have been reported to result in exon skipping following targeting of ESE or ESS (58–66). For each mutation, we selected the default option that allows HSF to predict modifications of ESE and/or ESS motifs using all available matrices (Table 4). For mutation c.362C>T in ACADM or c.4250T>A in DMD, for which the target auxiliary sequences have been experimentally characterized (SF2/ASF and hnRNPA1, respectively), HSF correctly predicted the effect of the mutation. For other sequences, different scenarios were predicted: (i) disruption of one or more ESE without creation of an ESS, as observed for mutations c.882C>T (MLH1), c.362C>T (ACADM), c.8165C>G and

Table 2. Exonic mutations in DMD (ENST00000357033), MLH1 (ENST00000231790), MSH2 (ENST00000233146) and RB1 (ENST00000267163) involved in splicing

| Gene | Mutation | Position | References | WT CV | Mutant CV | CV variation (%) | Cryptic site |
|---|---|---|---|---|---|---|---|
| DMD | c.5985T>G | Deep exonic | (91) | 46.65 | 75.59 | | 3′ CS (63 nt downstream)ᵃ |
| MLH1 | c.677G>A | Last base | (58) | 84.46 | 73.89 | −12.52ᵇ | |
| MLH1 | c.882C>T | Exonic | (58) | 84.46 | 73.89 | −12.52ᵇ | |
| MLH1 | c.1037A>G | Penultimate base | (58) | 93.04 | 88.19 | −5.22 | 5′ CS (upstreamᶜ) |
| MLH1 | c.1038G>T | Last base | (58) | 93.04 | 82.17 | −11.68 | 5′ CS (upstreamᶜ) |
| MLH1 | c.1667G>T | Last base | (92) | 85.85 | 74.99 | −12.66 | 5′ CS (88 nt downstream)ᵃ |
| MLH1 | c.1731G>A | Last base | (58) | 93.27 | 82.69 | −11.34 | |
| MLH1 | c.1989G>T | Last base | (58) | 93.22 | 82.35 | −11.66 | |
| MSH2 | c.1660A>T | Penultimate base | (58) | 84.00 | 79.25 | −5.65 | 5′ CS (82 nt upstream)ᵃ |
| MSH2 | c.1759G>C | Last base | (58) | 85.66 | 74.65 | −12.86ᵇ | |
| MSH2 | c.1915C>T | Deep exonic | (59) | 62.19 | 89.02 | | 5′ CS (92 nt upstream)ᵃ |
| RB1 | c.658C>G | Deep exonic | (57) | 58.66 | 85.49 | | 5′ CS (61 nt upstream)ᵃ |
| RB1 | c.939G>T | Last base | (57) | 83.75 | 72.88 | −12.98ᵇ | |
| RB1 | c.1960G>C | Last base | (57) | 94.02 | 83.01 | −11.71ᵇ | |
| RB1 | c.1960G>A | Last base | (57) | 94.02 | 83.44 | −11.25ᵇ | |

CS: cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
ᵃThe cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.
ᵇThe mutation induces exon skipping.
ᶜThe cryptic splice site used in vitro was not clearly reported and therefore was not available for comparison.

Table 3. Branch point sequences

| Gene | Intron | References | Ref BP | Ref Seq | HSF BP | HSF value |
|---|---|---|---|---|---|---|
| COL5A1 | 32 | (93) | −27 | ENST00000355306 | −27 | 87.81 |
| DYSF | 31 | (94) | −33 | ENST00000258104 | −33 | 93.13 |
| FBN2 | 30 | (95) | −24 | ENST00000262464 | −24 | 77.06 |
| GH1 | 3 | (96) | −21 | ENST00000323322 | −26 | 73.36 |
| ITGB4 | 31 | (97) | −17 | ENST00000200181 | −17 | 93.79 |
| LCAT | 4 | (98) | −20 | ENST00000264005 | −20 | 95.07 |
| LDLR | 9 | (99) | −25 | ENST00000252444 | −25 | 86.59 |
| NPC1 | 6 | (100) | −28 | ENST00000269228 | −28 | 77.41 |
| PMM2 | 2 | (101) | −25 | ENST00000268261 | −25 | 80.56 |
| PMM2 | 7 | (101) | −23 | ENST00000268261 | −23 | 72.27 |
| RB1 | 23 | (57) | −26 | ENST00000267163 | −26 | 75.89 |
| TH | 11 | (102) | −22 | ENST00000324155 | −22 | 84.96 |
| TSC2 | 38 | (103) | −18 | ENST00000219476 | −18 | 67.71 |
| XPC | 3 | (76) | −24 | ENST00000285021 | −24 | 82.78 |

For each gene, the reference sequence from the Ensembl genome database (Ref Seq), the intron number (Intron) and the position of the BP identified by in vitro experiments (Ref BP), as well as the BP position predicted by HSF (HSF BP) and the corresponding BP value (HSF value), are shown.
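The agreement between reported and predicted branch points in Table 3 can be tallied mechanically. The sketch below is illustrative only: it transcribes the table under the assumptions that positions are counted upstream of the 3′ss (written here as negative integers) and that HSF values carry two decimal places. The variable and function names are hypothetical.

```python
# (gene, intron, reference BP position, HSF-predicted BP position, HSF value)
# Negative positions and two-decimal values are assumptions about the flattened table.
TABLE3 = [
    ("COL5A1", 32, -27, -27, 87.81),
    ("DYSF", 31, -33, -33, 93.13),
    ("FBN2", 30, -24, -24, 77.06),
    ("GH1", 3, -21, -26, 73.36),
    ("ITGB4", 31, -17, -17, 93.79),
    ("LCAT", 4, -20, -20, 95.07),
    ("LDLR", 9, -25, -25, 86.59),
    ("NPC1", 6, -28, -28, 77.41),
    ("PMM2", 2, -25, -25, 80.56),
    ("PMM2", 7, -23, -23, 72.27),
    ("RB1", 23, -26, -26, 75.89),
    ("TH", 11, -22, -22, 84.96),
    ("TSC2", 38, -18, -18, 67.71),
    ("XPC", 3, -24, -24, 82.78),
]

BP_THRESHOLD = 67.0  # detection threshold stated in the text


def concordance(rows):
    """Count rows where the predicted BP matches the reported one; list the mismatches."""
    n_match = sum(1 for r in rows if r[2] == r[3])
    mismatched = [r[0] for r in rows if r[2] != r[3]]
    return n_match, mismatched


n_ok, discordant = concordance(TABLE3)
all_above = all(r[4] >= BP_THRESHOLD for r in TABLE3)
print(f"{n_ok}/{len(TABLE3)} concordant; discordant: {discordant}; all >= threshold: {all_above}")
```

With this transcription, the only mismatch is GH1, and every listed HSF value lies at or above the threshold of 67.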


Table 4. Exonic mutations known to result in exon skipping through ESE inactivation or ESS activation

| Gene | Mutation | Ref | Motif | Ref Seq | HSF prediction |
|---|---|---|---|---|---|
| ACADM | c.362C>T | (65) | −ESE (SF2/ASF) | ENST00000370841 | −9G8ⁱ (357_362); −SF2/ASFᵉ (358_364); +EIEʰ (359_364); −SRp40ᵉ (359_365); −EIEʰ (360_365); +IIEᶜ 4 (359_367) |
| BRCA1 | c.5080G>T | (64) | | ENST00000357654 | −EIEʰ (5075_5080); +SRp55ᵉ (5076_5081); −9G8ⁱ (5077_5082); −SF2/ASFᵉ (5078_5085); −IIEᶜ (5078_5083); +IIEᶜ (5079_5084); −ESSᵃ (5076_5083); +hnRNPA1ᵈ (5080_5085) |
| BRCA2 | c.8165C>G | (62) | −ESE | ENST00000380152 | −SRp40ᵉ (8162_8168); −ESEᶠ (8163_8168); +ESEᶠ 2 (8164_8170); −SRp55ᵉ (8163_8169); −SF2/ASFᵉ (8165_8171); −EIEʰ 4 (8160_8168) |
| BRCA2 | c.5081G>T | (64) | | ENST00000380152 | +SC35ᵉ (5075_5082); +SRp40ᵉ (5080_5086); −ESEᶠʰ 2 (5080_5086); −9G8ⁱ (5081_5086); −ESSᵃ (5078_5085) |
| DMD | c.4250T>A | (61) | +ESS (hnRNPA1) | ENST00000357033 | +9G8ⁱ 2 (4246_4251) (4248_4253); −EIEʰ (4248_4253); +ESEᶠ (4250_4255); −IIEᶜ 3 (4246_4253); +hnRNPA1ᵈ (4249_4254) |
| MLH1 | c.544A>G | (59) | | ENST00000231790 | +ESSᵃ (537_545); 5′ss ΔCV = −6.30 |
| MLH1 | c.793C>T | (58) | | ENST00000231790 | +ESSᵃ (795_802) |
| MLH1 | c.794G>A | (58) | | ENST00000231790 | −SRp40ᵉ (793_799); −SC35ᵉ (794_801); +ESSᶜ (794_799) |
| MLH1 | c.882C>T | (58) | | ENST00000231790 | +SC35ᵉ (876_883); −SRp55ᵉ (877_882) |
| MLH1 | c.988_990del | (58) | | ENST00000231790 | +SF2/ASFᵉ (983_989); −SRp55ᵉ (985_990); +9G8ⁱ (985_990); −ESSᵃ (985_992) |
| MSH2 | c.815C>T | (58) | | ENST00000233146 | −SRp55ᵉ (813_818); +ESSᵃ (813_820); +ESSᶜ 5 (801_819) |
| MSH2 | c.274_276del | (58) | | ENST00000233146 | +SC35ᵉ (272_279); +SRp40ᵉ 2 (274_285); −IIEᶜ 2 (274_280) |
| LAMA2 | c.2230C>T | (60) | | ENST00000354729 | −SF2/ASFᵉ (2226_2232); +ESSᶜ (2228_2235); +IIEᶜ 2 (2229_2235); +ESSᵃ (2230_2237) |
| NF1 | c.557A>T | (66) | −ESE | ENST00000356175 | −SRp55ᵉ (552_557); −ESEᶠ (552_557); −EIEʰ 4 (552_560); −9G8ⁱ (553_558); +ESSᵃ 2 (550_557) (555_562) |
| NF1 | c.910C>T | (66) | −ESE | ENST00000356175 | −9G8ⁱ (905_910); −EIEʰ (905_910); +ESEᶠ (908_913); −ESEᶠ (910_915); −ESSᵃ (906_913) |
| NF1 | c.943C>T | (66) | −ESE | ENST00000356175 | −SC35ᵉ (941_948); −SF2/ASFᵉ (943_949); −PESEᵍ (942_949); −9G8ⁱ (938_943); +hnRNPA1ᵈ (943_948); +IIEᶜ (942_947) |

(continued)


c.5081G>T (BRCA2), c.557A>T and c.910C>T (NF1); (ii) creation of one or more ESS without disruption of an ESE, as shown for mutations c.544A>G and c.793C>T (MLH1), c.4250T>A (DMD) and c.6792C>G (NF1); and (iii) an intermediate situation in which both the disruption of one or more ESE and the creation of one or more ESS were predicted. This was observed for mutations c.5080G>T (BRCA1), c.794G>A and c.988_990del (MLH1), c.815C>T and c.274_276del (MSH2), c.2230C>T (LAMA2), c.943C>T, c.1007G>A and c.5719G>T (NF1). In order to evaluate the potential to differentiate ‘true’ ESE or ESS motifs from false positive signals, we selected a second set of 36 mutations (positive controls) and 220 SNPs (negative controls) (Supplementary Table 1). Predictions were classified in three categories: disruption of ESE motifs only (ESE), creation of ESS motifs only (ESS) or both (ESE+ESS). In addition, results were classified in two subsets: a first one (All), which included all predicted motifs, and a second one (Best), which was restricted to only one motif for each case by selecting the one recognized by the highest number of matrices.

Comparison of the three categories (ESE, ESS and ESE+ESS) revealed a significant difference between positive and negative controls both in the ‘All’ (χ² = 10.05, P = 0.00656) and the ‘Best’ subset (χ² = 11.75, P = 0.0028). We then evaluated the potential of each matrix to differentiate true from false positive signals. No statistical differences were found using the Sironi, PESS, IIE, hnRNPA1 and RESCUE-ESE matrices. A statistically significant difference was found for the ‘All’ subset (χ² = 3.99, P = 0.045) but not for the ‘Best’ subset (χ² = 2.47, P = 0.116) with the EIE matrix. Significant results in both subsets were obtained with ESE-Finder (‘All’ subset: χ² = 5.17, P = 0.023; ‘Best’ subset: χ² = 7.33, P = 0.0067), the 9G8 and Tra2-β matrices from HSF (‘All’ subset: χ² = 9.92, P = 0.00164; ‘Best’ subset: χ² = 9.86, P = 0.00169) and PESE (‘All’ subset: χ² = 19.52, P = 9.95 × 10⁻⁶; ‘Best’ subset: χ² = 13.52, P = 2.36 × 10⁻⁴). The positive (PPV) and negative (NPV) predictive values, as well as the sensitivity (Sv) and the specificity (Sp), of these last three matrices were then evaluated. PPV ranged from 0.22 (9G8 and Tra2-β) to 0.56 (PESE), NPV from 0.76 (PESE) to 0.95 (9G8 and Tra2-β), Sv from 0.27 (PESE) to 0.40 (9G8 and Tra2-β) and Sp from 0.88 (9G8 and Tra2-β) to 0.91 (PESE). The ESE-Finder matrix showed intermediate values in all cases.
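The relationship between the reported χ² statistics and P values, and the definitions of PPV, NPV, Sv and Sp, can be illustrated with a short Python sketch. It rests on assumptions not stated in the text: that the three-category comparison has 2 degrees of freedom and each single-matrix comparison has 1, and that the four metrics follow their usual confusion-matrix definitions. The counts passed to `diagnostic_metrics` are hypothetical and are not taken from Supplementary Table 1.

```python
import math


def chi2_p_value(chi2: float, df: int) -> float:
    """Upper-tail P value of a chi-square statistic, closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard predictive values, sensitivity (Sv) and specificity (Sp)."""
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Sv": tp / (tp + fn),
        "Sp": tn / (tn + fp),
    }


# Three categories x two control groups: df = 2 (assumed).
print(f"{chi2_p_value(10.05, 2):.5f}")
# One matrix, motif present or absent x two control groups: df = 1 (assumed).
print(f"{chi2_p_value(3.99, 1):.3f}")
# Hypothetical counts, for illustration only.
print(diagnostic_metrics(tp=8, fp=2, tn=18, fn=12))
```

Under these degree-of-freedom assumptions the recomputed P values agree closely with those reported, which supports reading the statistics as χ² values with two decimal places.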

DISCUSSION

During evolution from simple to higher eukaryotes, splicing signals evolved from well-defined motifs to degenerate sequences with the addition of new auxiliary splicing

Table 4. Continued

| Gene | Mutation | Ref | Motif | Ref Seq | HSF prediction |
|---|---|---|---|---|---|
| NF1 | c.1007G>A | (66) | −ESE | ENST00000356175 | +PESEᵍ (1007_1014); −EIEʰ 2 (1003_1011); +9G8ⁱ (1006_1011); +ESEᶠ (1007_1014); −ESSᵃ 2 (1003_1011); −IIEᶜ 4 (1003_1011); +hnRNPA1ᵈ (1006_1011) |
| NF1 | c.5719G>T | (66) | −ESE | ENST00000356175 | −ESEᶠ 5 (5715_5724); −EIEʰ 5 (5715_5724); −ESSᵃ 2 (5714_5725); +PESSᵍ 2 (5712_5720); +hnRNPA1ᵈ (5719_5724) |
| NF1 | c.6792C>A | (66) | −ESE | ENST00000356175 | +ESEᶠ 5 (6792_6797); −EIEʰ 2 (6788_6793) (6790_6795); +Tra2-βⁱ (6791_6795); −ESSᵃ 2 (6787_6794) (6792_6799) |
| NF1 | c.6792C>G | (66) | −ESE | ENST00000356175 | +ESEᶠ (6792_6797); −EIEʰ 2 (6788_6793) (6790_6795); +Tra2-βⁱ (6791_6795); −ESSᵃ 2 (6787_6794) (6792_6799); +hnRNPA1ᵈ (6790_6795) |

+, a new site was created by the mutation; −, the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were: ᵃSilencer motifs from Sironi et al. (31); ᵇPESS octamers (28); ᶜIIEs (30); ᵈhnRNP motifs from HSF; ᵉESE-Finder matrices (19); ᶠRESCUE-ESE hexamers (63); ᵍPESE octamers (28); ʰEIEs (30); ⁱESE motifs from HSF. When multiple adjacent sites were predicted, the number of sites is indicated: 5 means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.


sequences known as ESE and ESS. Although major SR proteins have been cloned and their target sites determined, much work remains to be done to understand how splice signals are recognized and splicing specificity achieved. As this complex world is progressively revealed, bioinformatics resources could play a major role in helping researchers and diagnostic laboratories to evaluate the consequence of mutations on splicing, especially because most genetic tests use DNA and not RNA samples. By giving easy access to predictions of 5′ss, 3′ss and BP sequences as well as ESE and ESS, the HSF tool (http://www.umd.be/HSF/) fulfills this need and may assist clinicians, geneticists and researchers (70–75). By combining motifs identified with different experimental and computational approaches, it provides a common interface that can be used for sequence analysis. The inclusion of all exons and introns extracted from the Ensembl human genome database (20) allows easy access to any sequence of human genes and thus direct comparison of virtually every mutation or SNP concerning splicing elements. Since SNPs are present at a very high frequency in the genome (1/300 bp), it could be useful to evaluate their impact in association with a mutation. We therefore included in HSF data from dbSNP using Ensembl BioMart. The user can select the ‘Search for SNPs related to the analyzed sequence’ option, which automatically retrieves SNPs from the database. When SNPs are localized in exons, their effect on ESE and ESS motifs could help the user to better evaluate the consequence of a given mutation.

To evaluate the efficiency of the various algorithms included in HSF and its contribution to the prediction of the consequences of mutations associated with a splicing defect, we used a set of 69 intronic mutations that disrupt the 5′ss or the 3′ss and result in exon skipping and/or activation of a cryptic splice site (Table 1), and a group of 15 mutations that were previously reported to result in splicing defects by creating or activating cryptic splice sites (Table 2). HSF was able to correctly predict the disruption of the natural splice sites. Moreover, we could confirm that (i) mutations of the last nucleotide of an exon have a strong effect on the 5′ss (ΔCV = −12 ± 0.7%), resulting frequently in exon skipping, or in partial exonic deletion or intronic retention due to activation of a cryptic splice site; (ii) mutations of the penultimate exonic nucleotide have limited consequences on the 5′ss (ΔCV = −5.4 ± 0.3%), but they can activate a cryptic splice site, making predictions more difficult; (iii) exonic mutations distant from the 5′ and 3′ss can activate a cryptic splice site, leading to partial exonic deletion. Overall, these findings underline the efficiency of the HSF algorithm in predicting the effect of mutations on 5′ and 3′ss. When using the HSF algorithm, the threshold for 5′ and 3′ss is 65, with a pathogenic ΔCV of −10%, except for position +4 where it is −7%. However, in a few cases, when unusual splice sites are used, this algorithm could be less efficient.

BP sequences represent another essential splicing signal. When a mutation is localized in the region 5′ of the 3′ss, its potential effect on a BP sequence should be examined, especially when a nucleotide located at less than 85 bp from the 3′ss is targeted. In order to evaluate the HSF algorithm dedicated to the identification of BP sequences, we used 14 BP sequences inactivated by intronic mutations (Table 3). HSF correctly predicted 13 out of 14 BPs, and these data allowed us to define the threshold for BP detection at 67 and the pathogenic ΔBP at −10%. Moreover, for intron 3 of XPC, HSF predicted a BP at position −24. However, according to Khan et al. (76), two BP sequences are present in this intron, one at position −24 and another at −4. HSF could not predict the BP at position −4 simply because the HSF algorithm excludes positions −12 to −1 for BP identification, because of steric obstruction caused by the spliceosome.
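The two filtering rules described for BP identification (a detection threshold of 67 and the exclusion of the positions closest to the 3′ss) can be sketched as follows. This is an illustration under stated assumptions, not the HSF implementation: positions are taken as negative offsets from the 3′ss, the excluded window is read as −12 to −1, and the candidate scores other than the XPC value from Table 3 are hypothetical.

```python
from typing import List, NamedTuple


class BPCandidate(NamedTuple):
    position: int  # offset relative to the 3'ss; negative means upstream
    score: float   # BP strength on the normalized 0-100 scale


BP_THRESHOLD = 67.0
EXCLUDED_POSITIONS = range(-12, 0)  # covers -12 to -1 inclusive


def select_branch_points(candidates: List[BPCandidate]) -> List[BPCandidate]:
    """Keep candidates outside the excluded window and at or above the threshold."""
    kept = [
        c for c in candidates
        if c.position not in EXCLUDED_POSITIONS and c.score >= BP_THRESHOLD
    ]
    # Strongest candidate first.
    return sorted(kept, key=lambda c: c.score, reverse=True)


# XPC intron 3: the -24 value is read from Table 3; the other scores are hypothetical.
candidates = [
    BPCandidate(-24, 82.78),
    BPCandidate(-4, 90.0),   # inside the excluded window, so never reported
    BPCandidate(-40, 55.0),  # below the detection threshold
]
print(select_branch_points(candidates))
```

The example shows why a BP at position −4 cannot be returned under these rules, whatever its score: it is removed by the positional filter before the threshold is considered.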

It has been demonstrated that two different splicing recognition mechanisms, correlated with intron length, can be used in a cell: exon definition for long introns and intron definition for short introns (77). Although the influence of intron length seems to be less important in humans than in other species, it should nevertheless be kept in mind, since U12- and U2-type introns have different BP consensus sequences. In the present version of HSF (v2.4), we only focused on U2-type introns, which are by far the most abundant type in mammalian cells.

Concerning cis-acting elements, many studies have been performed to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets only share partial homology. To solve this problem, HSF includes all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNPA1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and the identification of the specific motif/protein pair involved in a given splicing defect is therefore difficult. This is even more complicated when considering the impact of SR and ribonucleoprotein concentration in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the efficiency of HSF in correctly predicting motifs disrupted by these mutations. We showed that when the motif/protein pairs had been previously experimentally characterized (hnRNPA1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems), and the motif/protein couple is therefore unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate HSF efficiency in discriminating true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE+ESS; χ² = 11.75, P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33, P = 0.0067), 9G8 and Tra2-β from HSF (χ² = 9.86, P = 0.0017) and PESE (χ² = 13.52, P = 2.36 × 10⁻⁴). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls.
Indeed, only few experimental validations of auxiliary sequences are


available, and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2-β HSF matrices gave stronger results than ESE-Finder itself and can therefore be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNPA1 matrix, should also be considered, as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif prediction, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address tissue or developmental specificity.

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5′ and 3′ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and for applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies, such as the exon-skipping approach in Duchenne Muscular Dystrophy, are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world, especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects which aim either at developing a holistic view of Genotype-To-Phenotype data (GEN2PHEN European project, http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81), might benefit from using HSF. Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754 (the GEN2PHEN project). The European Community Sixth Framework Program (FP6) under grant agreement number 036825 (TREAT-NMD Network of Excellence). Funding for open access charge: Institut National de la Santé et de la Recherche Médicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget SM, Moore C and Sharp PA (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci USA 74, 3171–3175.

2. Nilsen TW (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays 25, 1147–1149.

3. Zhou Z, Licklider LJ, Gygi SP and Reed R (2002) Comprehensive proteomic analysis of the human spliceosome. Nature 419, 182–185.

4. Breitbart RE, Nguyen HT, Medford RM, Destree AT, Mahdavi V and Nadal-Ginard B (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell 41, 67–82.

5. Maniatis T and Tasic B (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418, 236–243.

6. Cartegni L, Chew SL and Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3, 285–298.

7. Robberson BL, Cote GJ and Berget SM (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol Cell Biol 10, 84–94.

8. Jacob M and Gallinaro H (1989) The 5′ splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res 17, 2159–2180.

9. Blencowe BJ (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci 25, 106–110.

10. Zhu J, Mayeda A and Krainer AR (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol Cell 8, 1351–1361.

11. Zhang XH, Leslie CS and Chasin LA (2005) Computational searches for splicing signals. Methods 37, 292–305.

12. Bhasi A, Pandey RV, Utharasamy SP and Senapathy P (2007) EuSplice: a unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics 23, 1815–1823.

13. Churbanov A, Rogozin IB, Deogun JS and Ali H (2006) Method of predicting splice sites based on signal interactions. Biol Direct 1, 10.

14. Dunckley MG, Manoharan M, Villiet P, Eperon IC and Dickson G (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum Mol Genet 7, 1083–1090.

15. Wilton SD and Fletcher S (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr Gene Ther 5, 467–483.

16. Beroud C, Hamroun D, Collod-Beroud G, Boileau C, Soussi T and Claustres M (2005) UMD (Universal Mutation Database): 2005 update. Hum Mutat 26, 184–191.

17. Beroud C, Collod-Beroud G, Boileau C, Soussi T and Junien C (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum Mutat 15, 86–94.

18. Fairbrother WG, Yeo GW, Yeh R, Goldstein P, Mawson M, Sharp PA and Burge CB (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res 32, W187–W190.

19. Cartegni L, Wang J, Zhu Z, Zhang MQ and Krainer AR (2003) ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res 31, 3568–3571.

20. Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T et al. (2008) Ensembl 2008. Nucleic Acids Res 36, D707–D714.

21. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 36, D773–D779.


22. Shapiro MB and Senapathy P (1987) RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res 15, 7155–7174.

23. Yeo G and Burge CB (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11, 377–394.

24. Green MR (1991) Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. Annu Rev Cell Biol 7, 559–599.

25. Gooding C, Clark F, Wollerton MC, Grellscheid SN, Groom H and Smith CW (2006) A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol 7, R1.

26. Kol G, Lev-Maor G and Ast G (2005) Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum Mol Genet 14, 1559–1568.

27. Smith PJ, Zhang C, Wang J, Chew SL, Zhang MQ and Krainer AR (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum Mol Genet 15, 2490–2508.

28. Zhang XH and Chasin LA (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev 18, 1241–1250.

29. Goren A, Ram O, Amit M, Keren H, Lev-Maor G, Vig I, Pupko T and Ast G (2006) Comparative analysis identifies exonic splicing regulatory sequences: the complex definition of enhancers and silencers. Mol Cell 22, 769–781.

30. Zhang C, Li WH, Krainer AR and Zhang MQ (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc Natl Acad Sci USA 105, 5797–5802.

31. Sironi M, Menozzi G, Riva L, Cagliani R, Comi GP, Bresolin N, Giorda R and Pozzoli U (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res 32, 1783–1791.

32. Wang Z, Rolish ME, Yeo G, Tung V, Mawson M and Burge CB (2004) Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845.

33. Thompson JD, Higgins DG and Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.

34. Bailey TL, Williams N, Misleh C and Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34, W369–W373.

35. Yuan B, Thomas JP, von Kodolitsch Y and Pyeritz RE (1999) Comparison of heteroduplex analysis, direct sequencing and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum Mutat 14, 440–446.

36. Youil R, Toner TJ, Bull E, Bailey AL, Earl CD, Dietz HC and Montgomery RA (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum Mutat 16, 92–93.

37. Schrijver I, Liu W, Odom R, Brenn T, Oefner P, Furthmayr H and Francke U (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am J Hum Genet 71, 223–237.

38. Rommel K, Karck M, Haverich A, Schmidtke J and Arslan-Kirchner M (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum Mutat 20, 406–407.

39. Park ES, Putnam EA, Chitayat D, Child A and Milewicz DM (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am J Med Genet 78, 350–355.

40. Palz M, Tiecke F, Booms P, Goldner B, Rosenberg T, Fuchs J, Skovby F, Schumacher H, Kaufmann UC, von Kodolitsch Y et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am J Med Genet 91, 212–221.

41. Nijbroek G, Sood S, McIntosh I, Francomano CA, Bull E, Pereira L, Ramirez F, Pyeritz RE and Dietz HC (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am J Hum Genet 57, 8–21.

42. McGrory J and Cole WG (1999) Alternative splicing of exon 37 of FBN1 deletes part of an ‘eight-cysteine’ domain resulting in the Marfan syndrome. Clin Genet 55, 118–121.

43. Loeys B, Nuytinck L, Delvaux I, De Bie S and De Paepe A (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch Intern Med 161, 2447–2454.

44. Liu WO, Oefner PJ, Qian C, Odom RS and Francke U (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms and sequence variants in Marfan syndrome and related connective tissue disorders. Genet Test 1, 237–242.

45. Hutchinson S, Wordsworth BP and Handford PA (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum Genet 109, 416–420.

46. Halliday D, Hutchinson S, Kettle S, Firth H, Wordsworth P and Handford PA (1999) Molecular analysis of eight mutations in FBN1. Hum Genet 105, 587–597.

47. Gupta PA, Wallis DD, Chin TO, Northrup H, Tran-Fadulu VT, Towbin JA and Milewicz DM (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J Med Genet 41, e56.

48. Gupta PA, Putnam EA, Carmical SG, Kaitila I, Steinmann B, Child A, Danesino C, Metcalfe K, Berry SA, Chen E et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum Mutat 19, 39–48.

49. Guo D, Tan FK, Cantu A, Plon SE and Milewicz DM (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am J Med Genet 101, 130–134.

50. Dietz HC, McIntosh I, Sakai LY, Corson GM, Chalberg SC, Pyeritz RE and Francomano CA (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics 17, 468–475.

51. Comeglio P, Johnson P, Arno G, Brice G, Evans A, Aragon-Martin J, da Silva FP, Kiotsekoglou A and Child A (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum Mutat 28, 928.

52. Collod-Beroud G, Le Bourdelles S, Ades L, Ala-Kokko L, Booms P, Boxer M, Child A, Comeglio P, De Paepe A, Hyland JC et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum Mutat 22, 199–208.

53. Chikumi H, Yamamoto T, Ohta Y, Nanba E, Nagata K, Ninomiya H, Narasaki K, Katoh T, Hisatome I, Ono K et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J Hum Genet 45, 115–118.

54 BigginA HolmanK BrettM BennettsB and AdesL (2004)Detection of thirty novel FBN1 mutations in patients with Marfansyndrome or a related fibrillinopathy Hum Mutat 23 99

55 AttanasioM LapiniI EvangelistiL LucariniL GiustiBPorcianiM FattoriR AnichiniC AbbateR GensiniG et al(2008) FBN1 mutation screening of patients with Marfan syndromeand related disorders detection of 46 novel FBN1 mutations ClinGenet 74 39ndash46

56 LoeysBL ChenJ NeptuneER JudgeDP PodowskiMHolmT MeyersJ LeitchCC KatsanisN SharifiN et al(2005) A syndrome of altered cardiovascular craniofacialneurocognitive and skeletal development caused by mutations inTGFBR1 or TGFBR2 Nat Genet 37 275ndash281

57 HoudayerC DehainaultC MattlerC MichauxD Caux-MoncoutierV Pages-BerhouetS d'EnghienCD LaugeA CasteraL Gauthier-VillarsM et al (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis Hum Mutat 29 975–982

58 TournierI VezainM MartinsA CharbonnierF Baert-DesurmontS OlschwangS WangQ BuisineMP SoretJ TaziJ et al (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects Hum Mutat 29 1412–1424

59 AuclairJ BusineMP NavarroC RuanoE MontmainGDesseigneF SaurinJC LassetC BonadonaV GiraudS et al(2006) Systematic mRNA analysis for the effect of MLH1 andMSH2 missense and silent mutations on aberrant splicing HumMutat 27 145ndash154

60 Di BlasiC HeY MorandiL CornelioF GuicheneyP andMoraM (2001) Mild muscular dystrophy due to a nonsensemutation in the LAMA2 gene resulting in exon skipping Brain124 698ndash704

61 DissetA BourgeoisCF BenmalekN ClaustresM SteveninJand Tuffery-GiraudS (2006) An exon skipping-associated nonsensemutation in the dystrophin gene uncovers a complex interplaybetween multiple antagonistic splicing elements Hum Mol Genet15 999ndash1013

62 FackenthalJD CartegniL KrainerAR and OlopadeOI (2002)BRCA2 T2722R is a deleterious allele that causes exon skippingAm J Hum Genet 71 625ndash631

63 FairbrotherWG YehRF SharpPA and BurgeCB (2002)Predictive identification of exonic splicing enhancers in humangenes Science 297 1007ndash1013

64 MazoyerS PugetN Perrin-VidozL LynchHTSerova-SinilnikovaOM and LenoirGM (1998) A BRCA1nonsense mutation causes exon skipping Am J Hum Genet 62713ndash715

65 NielsenKB SorensenS CartegniL CorydonTJ DoktorTKSchroederLD ReinertLS ElpelegO KrainerARGregersenN et al (2007) Seemingly neutral polymorphicvariants may confer immunity to splicing-inactivating mutations asynonymous SNP in exon 5 of MCAD protects from deleteriousmutations in a flanking exonic splicing enhancer Am J HumGenet 80 416ndash432

66 ZatkovaA MessiaenL VandenbrouckeI WieserRFonatschC KrainerAR and WimmerK (2004) Disruption ofexonic splicing enhancer elements is the principal cause of exonskipping associated with seven nonsense or missense alleles of NF1Hum Mutat 24 491ndash501

67 den DunnenJT and AntonarakisSE (2000) Mutation nomencla-ture extensions and suggestions to describe complex mutations adiscussion Hum Mutat 15 7ndash12

68 FredericMY MoninoC MarschallC HamrounD FaivreLJondeauG KleinHG NeumannL GautierE BinquetC et al(2008) The FBN2 gene new mutations locus-specific database(Universal Mutation Database FBN2) and genotype-phenotypecorrelations Hum Mutat 30 181ndash190

69 FredericMY HamrounD FaivreL BoileauC JondeauGClaustresM BeroudC and Collod-BeroudG (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 geneUMD-TGFBR2 Hum Mutat 29 33ndash38

70 FrankV Ortiz BruchleN MagerS FrintsSG BohringA duBoisG DebatinI SeidelH SenderekJ BesbasN et al (2007)Aberrant splicing is a common mutational mechanism in MKS1 akey player in Meckel-Gruber syndrome Hum Mutat 28 638ndash639

71 AnczukowO BuissonM SallesMJ TribouletS LongyMLidereauR SinilnikovaOM and MazoyerS (2008) Unclassifiedvariants identified in BRCA1 exon 11 Consequences on splicingGenes Chromosomes Cancer 47 418ndash426

72 NgW LohAX TeixeiraAS PereiraSP and SwallowDM(2008) Genetic regulation of MUC1 alternative splicing in humantissues Br J Cancer 99 978ndash985

73 BaalaL RomanoS KhaddourR SaunierS SmithUMAudollentS OzilouC FaivreL LaurentN FoliguetB et al(2007) The Meckel-Gruber syndrome gene MKS3 is mutated inJoubert syndrome Am J Hum Genet 80 186ndash194

74 HabaraY DoshitaM HirozawaS YokonoY YagiMTakeshimaY and MatsuoM (2008) A strong exonic splicingenhancer in dystrophin exon 19 achieve proper splicing without anupstream polypyrimidine tract J Biochem 143 303ndash310

75 Aartsma-RusA van VlietL HirschiM JansonAAHeemskerkH de WinterCL de KimpeS van DeutekomJCt HoenPA and van OmmenGJ (2008) Guidelines for AntisenseOligonucleotide Design and Insight Into Splice-modulatingMechanisms Mol Ther 17 548ndash553

76 KhanSG MetinA GozukaraE InuiH ShahlaviT Muniz-MedinaV BakerCC UedaT AikenJR SchneiderTD et al(2004) Two essential splice lariat branchpoint sequences in oneintron in a xeroderma pigmentosum DNA repair gene mutationsresult in reduced XPC mRNA levels that correlate with cancer riskHum Mol Genet 13 343ndash352

77 SharpPA and BurgeCB (1997) Classification of introns U2-typeor U12-type Cell 91 875ndash879

78 ChasinLA (2007) Searching for splicing motifs Adv Exp MedBiol 623 85ndash106

79 NallaVK and RoganPK (2005) Automated splicing mutationanalysis by information theory Hum Mutat 25 334ndash342

80 BeroudC Tuffery-GiraudS MatsuoM HamrounD HumbertclaudeV MonnierN MoizardMP VoelckelMA CalemardLM BoisseauP et al (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy Hum Mutat 28 196–202

81 (2007) What is the human variome project? Nat Genet 39 423

82 KainulainenK KarttunenL PuhakkaL SakaiL and PeltonenL (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome Nat Genet 6 64–69

83 LiuW QianC ComeauK BrennT FurthmayrH andFranckeU (1996) Mutant fibrillin-1 monomers lacking EGF-likedomains disrupt microfibril assembly and cause severe marfansyndrome Hum Mol Genet 5 1581ndash1587

84 BoomsP CislerJ MathewsKR GodfreyM TieckeF KaufmannUC VetterU HagemeierC and RobinsonPN (1999) Novel exon skipping mutation in the fibrillin-1 gene two 'hotspots' for the neonatal Marfan syndrome Clin Genet 55 110–117

85 WangM PriceC HanJ CislerJ ImaizumiKVan ThienenMN DePaepeA and GodfreyM (1995) Recurrentmis-splicing of fibrillin exon 32 in two patients with neonatalMarfan syndrome Hum Mol Genet 4 607ndash613

86 GodfreyM VandemarkN WangM VelinovM WargowskiDTsipourasP HanJ BeckerJ RobertsonW DrosteS et al(1993) Prenatal diagnosis and a donor splice site mutation infibrillin in a family with Marfan syndrome Am J Hum Genet 53472ndash480

87 WangM ClericuzioCL and GodfreyM (1996) Familialoccurrence of typical and severe lethal congenital contracturalarachnodactyly caused by missplicing of exon 34 of fibrillin-2Am J Hum Genet 59 1027ndash1034

88 KarttunenL UkkonenT KainulainenK SyvanenAC andPeltonenL (1998) Two novel fibrillin-1 mutations resulting in pre-mature termination codons but in different mutant transcript levelsand clinical phenotypes Hum Mutat Suppl 1 S34ndashS37

89 KosakiK TakahashiD UdakaT KosakiR MatsumotoMIbeS IsobeT TanakaY and TakahashiT (2006) Molecularpathology of Shprintzen-Goldberg syndrome Am J Med GenetA 140 104ndash108 author reply 109ndash110

90 LoeysBL SchwarzeU HolmT CallewaertBL ThomasGHPannuH De BackerJF OswaldGL SymoensSManouvrierS et al (2006) Aneurysm syndromes caused bymutations in the TGF-beta receptor N Engl J Med 355788ndash798

91 TranVK TakeshimaY ZhangZ HabaraY HaginoyaKNishiyamaA YagiM and MatsuoM (2007) A nonsensemutation-created intraexonic splice site is active in the lymphocytesbut not in the skeletal muscle of a DMD patient Hum Genet 120737ndash742

92 SharpA PichertG LucassenA and EcclesD (2004) RNAanalysis reveals splicing mutations and loss of expression defects inMLH1 and BRCA1 Hum Mutat 24 272

93 BurrowsNP NichollsAC RichardsAJ LuccariniCHarrisonJB YatesJR and PopeFM (1998) A point mutationin an intronic branch site results in aberrant splicing of COL5A1and in Ehlers-Danlos syndrome type II in two British families AmJ Hum Genet 63 390ndash398

94 SinnreichM TherrienC and KarpatiG (2006) Lariat branchpoint mutation in the dysferlin gene with mild limb-girdle musculardystrophy Neurology 66 1114ndash1116

95 MaslenC BabcockD RaghunathM and SteinmannB (1997)A rare branch-point mutation is associated with missplicingof fibrillin-2 in a large family with congenital contracturalarachnodactyly Am J Hum Genet 60 1389ndash1398

96 VivenzaD GuazzarottiL GodiM FrascaD di NataleBMomigliano-RichiardiP BonaG and GiordanoM (2006) Anovel deletion in the GH1 gene including the IVS3 branch siteresponsible for autosomal dominant isolated growth hormonedeficiency J Clin Endocrinol Metab 91 980ndash986

97 ChavanasS GacheY VaillyJ KanitakisJ PulkkinenLUittoJ OrtonneJ and MeneguzziG (1999) Splicing modulationof integrin beta4 pre-mRNA carrying a branch point mutationunderlies epidermolysis bullosa with pyloric atresia undergoingspontaneous amelioration with ageing Hum Mol Genet 82097ndash2105

98 KuivenhovenJA WeibuschH PritchardPH FunkeHBenneR AssmannG and KasteleinJJ (1996) An intronicmutation in a lariat branchpoint sequence is a direct cause of aninherited human disorder (fish-eye disease) J Clin Invest 98358ndash364

99 WebbJC PatelDD ShouldersCC KnightBL and SoutarAK (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism Hum Mol Genet 5 1325–1331

100 Di LeoE PanicoF TarugiP BattistiC FedericoA andCalandraS (2004) A point mutation in the lariat branch point ofintron 6 of NPC1 as the cause of abnormal pre-mRNA splicing inNiemann-Pick type C disease Hum Mutat 24 440

101 Vuillaumier-BarrotS Le BizecC De LonlayP Madinier-ChappatN BarnierA DupreT DurandG and SetaN (2006)PMM2 intronic branch-site mutations in CDG-Ia Mol GenetMetab 87 337ndash340

102 JanssenRJ WeversRA HausslerM LuytenJASteenbergen-SpanjersGC HoffmannGF NagatsuT and Vanden HeuvelLP (2000) A branch site mutation leading to aberrantsplicing of the human tyrosine hydroxylase gene in a child with asevere extrapyramidal movement disorder Ann Hum Genet 64375ndash382

103 MayerK BallhausenW LeistnerW and RottH (2000) Threenovel types of splicing aberrations in the tuberous sclerosis TSC2gene caused by mutations apart from splice consensus sequencesBiochim Biophys Acta 1502 495ndash507

Auxiliary splicing sequences: enhancers and silencers

In order to simplify the interpretation of predictions obtained with the different algorithms using weight matrices, we used a normalized range scale from 0 to 100. As a consequence, previous matrices from ESE-Finder (19,27) were modified. Nevertheless, the user can define the thresholds using either the original ESE-Finder range or the new 0–100 range. In addition, when processing a single sequence and when CVs are available, HSF calculates the deviation as a percentage of the threshold. A reduced list can be obtained for each matrix by choosing the 'Only variant' option in 'Advanced parameters'. A color code is used for each quartile (from white to orange) to simplify the analysis. When comparing mutant sequences, HSF uses this color code to indicate the differences between the two sequences. When scalability is not possible, HSF only displays the presence of a motif.

To evaluate the sensitivity and usefulness of auxiliary splicing sequence predictions, we used a first set of genes for which 20 mutations have been reported to result in exon skipping following targeting of ESE or ESS (58–66). For each mutation, we selected the default option that allows HSF to predict modifications of ESE and/or ESS motifs using all available matrices (Table 4). For mutation c.362C>T in ACADM or c.4250T>A in DMD, for which the target auxiliary sequences have been experimentally characterized (SF2/ASF and hnRNP A1, respectively), HSF correctly predicted the effect of the mutation. For other sequences, different scenarios were predicted: (i) disruption of one or more ESE without creation of an ESS, as observed for mutations c.882C>T (MLH1), c.362C>T (ACADM), c.8165C>G and

Table 2. Exonic mutations in DMD (ENST00000357033), MLH1 (ENST00000231790), MSH2 (ENST00000233146) and RB1 (ENST00000267163) involved in splicing

| Gene | Mutation | Position | References | WT CV | Mutant CV | CV variation (%) | Notes |
|------|----------|----------|------------|-------|-----------|------------------|-------|
| DMD | c.5985T>G | Deep exonic | (91) | 46.65 | 75.59 | | 3′ CS (63 nt downstream)^a |
| MLH1 | c.677G>A | Last base | (58) | 84.46 | 73.89 | −12.52 | ^b |
| MLH1 | c.882C>T | Exonic | (58) | 84.46 | 73.89 | −12.52 | ^b |
| MLH1 | c.1037A>G | Penultimate base | (58) | 93.04 | 88.19 | −5.22 | 5′ CS (upstream)^c |
| MLH1 | c.1038G>T | Last base | (58) | 93.04 | 82.17 | −11.68 | 5′ CS (upstream)^c |
| MLH1 | c.1667G>T | Last base | (92) | 85.85 | 74.99 | −12.66 | 5′ CS (88 nt downstream)^a |
| MLH1 | c.1731G>A | Last base | (58) | 93.27 | 82.69 | −11.34 | |
| MLH1 | c.1989G>T | Last base | (58) | 93.22 | 82.35 | −11.66 | |
| MSH2 | c.1660A>T | Penultimate base | (58) | 84.00 | 79.25 | −5.65 | 5′ CS (82 nt upstream)^a |
| MSH2 | c.1759G>C | Last base | (58) | 85.66 | 74.65 | −12.86 | ^b |
| MSH2 | c.1915C>T | Deep exonic | (59) | 62.19 | 89.02 | | 5′ CS (92 nt upstream)^a |
| RB1 | c.658C>G | Deep exonic | (57) | 58.66 | 85.49 | | 5′ CS (61 nt upstream)^a |
| RB1 | c.939G>T | Last base | (57) | 83.75 | 72.88 | −12.98 | ^b |
| RB1 | c.1960G>C | Last base | (57) | 94.02 | 83.01 | −11.71 | ^b |
| RB1 | c.1960G>A | Last base | (57) | 94.02 | 83.44 | −11.25 | ^b |

CS: cryptic site (i.e. a new splice site is created by the mutation and is used instead of the regular site). Nucleotide numbering follows the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.
^a The cryptic splice site created by the mutation and used in vivo was correctly predicted by HSF.
^b The mutation induces exon skipping.
^c The cryptic splice site used in vitro was not clearly reported and therefore was not available for comparison.
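
The 'CV variation (%)' column of Table 2 appears to be the relative change of the mutant consensus value with respect to the wild-type value. The Python sketch below recomputes it for a few rows, reading the CV values on the 0–100 scale (e.g. 93.04). The formula, the function names and the −10% cut-off are assumptions made for illustration (the cut-off reads the pathogenic variation quoted in the Discussion as a negative percentage); none of this is HSF code.

```python
# Minimal sketch: recompute the "CV variation (%)" column of Table 2.
# Assumption: variation = (mutant CV - WT CV) / WT CV * 100. The names
# below are illustrative and are not HSF functions.


def cv_variation(wt_cv: float, mutant_cv: float) -> float:
    """Relative change of the consensus value, in percent of the WT value."""
    return (mutant_cv - wt_cv) / wt_cv * 100.0


def is_strong_decrease(wt_cv: float, mutant_cv: float, cutoff: float = -10.0) -> bool:
    """True when the CV drops by at least |cutoff| percent (assumed cut-off)."""
    return cv_variation(wt_cv, mutant_cv) <= cutoff


# (gene, mutation, position in exon, WT CV, mutant CV), values from Table 2
ROWS = [
    ("MLH1", "c.1037A>G", "penultimate base", 93.04, 88.19),
    ("MLH1", "c.1038G>T", "last base", 93.04, 82.17),
    ("MSH2", "c.1660A>T", "penultimate base", 84.00, 79.25),
    ("RB1", "c.939G>T", "last base", 83.75, 72.88),
]

if __name__ == "__main__":
    for gene, mutation, position, wt, mut in ROWS:
        delta = cv_variation(wt, mut)
        label = "strong" if is_strong_decrease(wt, mut) else "limited"
        print(f"{gene} {mutation} ({position}): {delta:+.2f}% -> {label} decrease")
```

With these rows, the two last-base mutations fall below the assumed cut-off while the two penultimate-base mutations do not, which matches the pattern described for Table 2.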

Table 3. Branch point sequences

| Gene | Intron | References | Ref BP | Ref Seq | HSF BP | HSF value |
|------|--------|------------|--------|---------|--------|-----------|
| COL5A1 | 32 | (93) | −27 | ENST00000355306 | −27 | 87.81 |
| DYSF | 31 | (94) | −33 | ENST00000258104 | −33 | 93.13 |
| FBN2 | 30 | (95) | −24 | ENST00000262464 | −24 | 77.06 |
| GH1 | 3 | (96) | −21 | ENST00000323322 | −26 | 73.36 |
| ITGB4 | 31 | (97) | −17 | ENST00000200181 | −17 | 93.79 |
| LCAT | 4 | (98) | −20 | ENST00000264005 | −20 | 95.07 |
| LDLR | 9 | (99) | −25 | ENST00000252444 | −25 | 86.59 |
| NPC1 | 6 | (100) | −28 | ENST00000269228 | −28 | 77.41 |
| PMM2 | 2 | (101) | −25 | ENST00000268261 | −25 | 80.56 |
| PMM2 | 7 | (101) | −23 | ENST00000268261 | −23 | 72.27 |
| RB1 | 23 | (57) | −26 | ENST00000267163 | −26 | 75.89 |
| TH | 11 | (102) | −22 | ENST00000324155 | −22 | 84.96 |
| TSC2 | 38 | (103) | −18 | ENST00000219476 | −18 | 67.71 |
| XPC | 3 | (76) | −24 | ENST00000285021 | −24 | 82.78 |

For each gene, the reference sequence from the Ensembl genome database (Ref Seq), the intron number (Intron) and the position of the BP identified by in vitro experiments (Ref BP), as well as the BP position predicted by HSF (HSF BP) and the corresponding BP value (HSF value), are shown.
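
The concordance between reported and predicted branch points in Table 3 can be checked with a short Python sketch. It assumes that the BP positions are negative offsets from the 3′ss (the minus signs are taken as lost in extraction) and that the HSF values lie on the 0–100 scale; the `BranchPoint` type and the `concordant` helper are hypothetical names introduced for this example only.

```python
# Minimal sketch: compare reference and HSF-predicted branch points (Table 3).
# Assumption: positions are negative offsets relative to the 3'ss; only the
# equality of the two positions matters for the concordance count.
from typing import List, NamedTuple


class BranchPoint(NamedTuple):
    gene: str
    intron: int
    ref_bp: int        # position identified experimentally
    hsf_bp: int        # position predicted by HSF
    hsf_value: float   # HSF score on the 0-100 scale


TABLE3 = [
    BranchPoint("COL5A1", 32, -27, -27, 87.81),
    BranchPoint("DYSF", 31, -33, -33, 93.13),
    BranchPoint("FBN2", 30, -24, -24, 77.06),
    BranchPoint("GH1", 3, -21, -26, 73.36),
    BranchPoint("ITGB4", 31, -17, -17, 93.79),
    BranchPoint("LCAT", 4, -20, -20, 95.07),
    BranchPoint("LDLR", 9, -25, -25, 86.59),
    BranchPoint("NPC1", 6, -28, -28, 77.41),
    BranchPoint("PMM2", 2, -25, -25, 80.56),
    BranchPoint("PMM2", 7, -23, -23, 72.27),
    BranchPoint("RB1", 23, -26, -26, 75.89),
    BranchPoint("TH", 11, -22, -22, 84.96),
    BranchPoint("TSC2", 38, -18, -18, 67.71),
    BranchPoint("XPC", 3, -24, -24, 82.78),
]


def concordant(rows: List[BranchPoint]) -> List[BranchPoint]:
    """Rows where HSF predicts the branch point at the reported position."""
    return [row for row in rows if row.ref_bp == row.hsf_bp]


if __name__ == "__main__":
    hits = concordant(TABLE3)
    weakest = min(hits, key=lambda row: row.hsf_value)
    print(f"{len(hits)} of {len(TABLE3)} branch points match the reported position")
    print(f"lowest concordant value: {weakest.hsf_value} ({weakest.gene})")
```

The lowest score among concordant predictions is the kind of figure from which a detection threshold could be derived, although the exact procedure used by the authors is not described in this section.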

Table 4. Exonic mutations known to result in exon skipping through ESE inactivation or ESS activation

| Gene | Mutation | Ref. | Motif | Ref Seq | HSF prediction |
|------|----------|------|-------|---------|----------------|
| ACADM | c.362C>T | (65) | ESE (SF2/ASF) | ENST00000370841 | −9G8^i (357_362); −SF2/ASF^e (358_364); +EIE^h (359_364); −SRp40^e (359_365); −EIE^h (360_365); +IIE^c 4 (359_367) |
| BRCA1 | c.5080G>T | (64) | | ENST00000357654 | −EIE^h (5075_5080); +SRp55^e (5076_5081); −9G8^i (5077_5082); −SF2/ASF^e (5078_5085); −IIE^c (5078_5083); +IIE^c (5079_5084); −ESS^a (5076_5083); +hnRNPA1^d (5080_5085) |
| BRCA2 | c.8165C>G | (62) | ESE | ENST00000380152 | −SRp40^e (8162_8168); −ESE^f (8163_8168); +ESE^f 2 (8164_8170); −SRp55^e (8163_8169); −SF2/ASF^e (8165_8171); −EIE^h 4 (8160_8168) |
| BRCA2 | c.5081G>T | (64) | | ENST00000380152 | +SC35^e (5075_5082); +SRp40^e (5080_5086); −ESE^fh 2 (5080_5086); −9G8^i (5081_5086); −ESS^a (5078_5085) |
| DMD | c.4250T>A | (61) | +ESS (hnRNPA1) | ENST00000357033 | +9G8^i 2 (4246_4251) (4248_4253); −EIE^h (4248_4253); +ESE^f (4250_4255); −IIE^c 3 (4246_4253); +hnRNPA1^d (4249_4254) |
| MLH1 | c.544A>G | (59) | | ENST00000231790 | +ESS^a (537_545); 5′ss ΔCV = −6.30 |
| MLH1 | c.793C>T | (58) | | ENST00000231790 | +ESS^a (795_802) |
| MLH1 | c.794G>A | (58) | | ENST00000231790 | −SRp40^e (793_799); −SC35^e (794_801); +ESS^c (794_799) |
| MLH1 | c.882C>T | (58) | | ENST00000231790 | +SC35^e (876_883); −SRp55^e (877_882) |
| MLH1 | c.988_990del | (58) | | ENST00000231790 | +SF2/ASF^e (983_989); −SRp55^e (985_990); +9G8^i (985_990); −ESS^a (985_992) |
| MSH2 | c.815C>T | (58) | | ENST00000233146 | −SRp55^e (813_818); +ESS^a (813_820); +ESS^c 5 (801_819) |
| MSH2 | c.274_276del | (58) | | ENST00000233146 | +SC35^e (272_279); +SRp40^e 2 (274_285); −IIE^c 2 (274_280) |
| LAMA2 | c.2230C>T | (60) | | ENST00000354729 | −SF2/ASF^e (2226_2232); +ESS^c (2228_2235); +IIE^c 2 (2229_2235); +ESS^a (2230_2237) |
| NF1 | c.557A>T | (66) | ESE | ENST00000356175 | −SRp55^e (552_557); −ESE^f (552_557); −EIE^h 4 (552_560); −9G8^i (553_558); +ESS^a 2 (550_557) (555_562) |
| NF1 | c.910C>T | (66) | ESE | ENST00000356175 | −9G8^i (905_910); −EIE^h (905_910); +ESE^f (908_913); −ESE^f (910_915); −ESS^a (906_913) |
| NF1 | c.943C>T | (66) | ESE | ENST00000356175 | −SC35^e (941_948); −SF2/ASF^e (943_949); −PESE^g (942_949); −9G8^i (938_943); +hnRNPA1^d (943_948); +IIE^c (942_947) |

(continued)

c.5081G>T (BRCA2), c.557A>T and c.910C>T (NF1); (ii) creation of one or more ESS without disruption of an ESE, as shown for mutations c.544A>G and c.793C>T (MLH1), c.4250T>A (DMD) and c.6792C>G (NF1); and (iii) an intermediate situation where both the disruption of one or more ESE and the creation of one or more ESS were predicted. This was observed for mutations c.5080G>T (BRCA1), c.794G>A and c.988_990del (MLH1), c.815C>T and c.274_276del (MSH2), c.2230C>T (LAMA2), c.943C>T, c.1007G>A and c.5719G>T (NF1). In order to evaluate the potential to differentiate 'true' ESE or ESS motifs from false positive signals, we selected a second set of 36 mutations (positive controls) and 220 SNPs (negative controls) (Supplementary Table 1). Predictions were classified in three categories: disruption of ESE motifs only (ESE), creation of ESS motifs only (ESS) or both (ESE+ESS). In addition, results were classified in two subsets: a first one (All), which included all predicted motifs, and a second one (Best), which was restricted to only one motif for each case by selecting the one recognized by the highest number of matrices.

Comparison of the three categories (ESE, ESS and ESE+ESS) revealed a significant difference between positive and negative controls, both in the 'All' (χ² = 10.05, P = 0.00656) and the 'Best' subset (χ² = 11.75, P = 0.0028). We then evaluated the potential of each matrix to differentiate true from false positive signals. No statistical differences were found using the Sironi, PESS, IIE, hnRNP A1 and RESCUE-ESE matrices. A statistically significant difference was found for the 'All' subset (χ² = 3.99, P = 0.045) but not for the 'Best' subset (χ² = 2.47, P = 0.116) with the EIE matrix. Significant results in both subsets were obtained with ESE-Finder ('All' subset: χ² = 5.17, P = 0.023; 'Best' subset: χ² = 7.33, P = 0.0067), the 9G8 and Tra2-β matrices from HSF ('All' subset: χ² = 9.92, P = 0.00164; 'Best' subset: χ² = 9.86, P = 0.00169) and PESE ('All' subset: χ² = 19.52, P = 9.95 × 10⁻⁶; 'Best' subset: χ² = 13.52, P = 2.36 × 10⁻⁴). The positive (PPV) and negative (NPV) predictive values, as well as the sensitivity (Sv) and the specificity (Sp), of these last three matrices were then evaluated. PPV ranged from 0.22 (9G8 and Tra2-β) to 0.56 (PESE), NPV from 0.76 (PESE) to 0.95 (9G8 and Tra2-β), Sv from 0.27 (PESE) to 0.40 (9G8 and Tra2-β) and Sp from 0.88 (9G8 and Tra2-β) to 0.91 (PESE). The ESE-Finder matrix showed intermediate values in all cases.
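
The statistics quoted above can be reproduced approximately with the Python standard library. The sketch below assumes 2 degrees of freedom for the three-category comparison (ESE, ESS, ESE+ESS in two groups) and 1 degree of freedom for each single-matrix comparison; these values are not stated in the text but are consistent with the reported P values. The confusion-matrix counts in the example are hypothetical and serve only to show how PPV, NPV, Sv and Sp are defined.

```python
# Minimal sketch: chi-square P values and predictive values.
# Assumptions: df = 2 for the three-category comparison, df = 1 for a single
# matrix; the counts passed to predictive_values() below are hypothetical.
import math
from typing import Dict


def chi2_p_value(chi2: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic for df = 1 or 2."""
    if chi2 < 0:
        raise ValueError("chi2 must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def predictive_values(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """PPV, NPV, sensitivity (Sv) and specificity (Sp) from a 2x2 table."""
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Sv": tp / (tp + fn),
        "Sp": tn / (tn + fp),
    }


if __name__ == "__main__":
    print(f"All subset, three categories: P = {chi2_p_value(10.05, 2):.5f}")
    print(f"Best subset, three categories: P = {chi2_p_value(11.75, 2):.4f}")
    print(f"ESE-Finder, Best subset: P = {chi2_p_value(7.33, 1):.4f}")
    # Hypothetical counts, not taken from Supplementary Table 1.
    print(predictive_values(tp=8, fp=6, fn=12, tn=50))
```

Closed forms are used instead of a statistics package so that the example has no external dependency; for other degrees of freedom a library routine would be the more practical choice.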

DISCUSSION

During evolution from simple to higher eukaryotes, splicing signals evolved from well-defined motifs to degenerated sequences with the addition of new auxiliary splicing

Table 4. Continued

| Gene | Mutation | Ref. | Motif | Ref Seq | HSF prediction |
|------|----------|------|-------|---------|----------------|
| NF1 | c.1007G>A | (66) | ESE | ENST00000356175 | +PESE^g (1007_1014); −EIE^h 2 (1003_1011); +9G8^i (1006_1011); +ESE^f (1007_1014); −ESS^a 2 (1003_1011); −IIE^c 4 (1003_1011); +hnRNPA1^d (1006_1011) |
| NF1 | c.5719G>T | (66) | ESE | ENST00000356175 | −ESE^f 5 (5715_5724); −EIE^h 5 (5715_5724); −ESS^a 2 (5714_5725); +PESS^g 2 (5712_5720); +hnRNPA1^d (5719_5724) |
| NF1 | c.6792C>A | (66) | ESE | ENST00000356175 | +ESE^f 5 (6792_6797); −EIE^h 2 (6788_6793) (6790_6795); +Tra2-β^i (6791_6795); −ESS^a 2 (6787_6794) (6792_6799) |
| NF1 | c.6792C>G | (66) | ESE | ENST00000356175 | +ESE^f (6792_6797); −EIE^h 2 (6788_6793) (6790_6795); +Tra2-β^i (6791_6795); −ESS^a 2 (6787_6794) (6792_6799); +hnRNPA1^d (6790_6795) |

+, a new site was created by the mutation; −, the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were:
^a Silencer motifs from Sironi et al. (31)
^b PESS octamers (28)
^c IIEs (30)
^d hnRNP motifs from HSF
^e ESE-Finder matrices (19)
^f RESCUE-ESE hexamers (63)
^g PESE octamers (28)
^h EIEs (30)
^i ESE motifs from HSF
When multiple adjacent sites were predicted, the number of sites is indicated: 5 means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.

sequences known as ESE and ESS. Although major SR proteins have been cloned and their target sites determined, much work remains to be done to understand how splice signals are recognized and splicing specificity achieved. As this complex world is progressively revealed, bioinformatics resources could play a major role in helping researchers and diagnostic laboratories to evaluate the consequence of mutations on splicing, especially because most genetic tests use DNA and not RNA samples. By giving easy access to predictions of 5′ss, 3′ss and BP sequences as well as ESE and ESS, the HSF tool (http://www.umd.be/HSF/) fulfills this need and may assist clinicians, geneticists and researchers (70–75). By combining motifs identified with different experimental and computational approaches, it provides a common interface that can be used for sequence analysis. The inclusion of all exons and introns extracted from the Ensembl human genome database (20) allows easy access to any sequence of human genes and thus direct comparison of virtually every mutation or SNP concerning splicing elements. Since SNPs are present at a very high frequency in the genome (1/300 bp), it could be useful to evaluate their impact in association with a mutation. We therefore included in HSF data from dbSNP using Ensembl BioMart. The user can select the 'Search for SNPs related to the analyzed sequence' option that automatically retrieves SNPs from the database. When SNPs are localized in exons, their effect on ESE and ESS motifs could help the user to better evaluate the consequence of a given mutation.

To evaluate the efficiency of the various algorithms included in HSF and its contribution to the prediction of the consequences of mutations associated with a splicing defect, we used a set of 69 intronic mutations that disrupt the 5′ss or the 3′ss and result in exon skipping and/or activation of a cryptic splice site (Table 1) and a group of 15 mutations that were previously reported to result in splicing defects by creating or activating cryptic splice sites (Table 2). HSF was able to correctly predict the disruption of the natural splice sites. Moreover, we could confirm that (i) mutations of the last nucleotide of an exon have a strong effect on the 5′ss (ΔCV = −12% ± 0.7), resulting frequently in exon skipping, or in partial exonic deletion or intronic retention due to activation of a cryptic splice site; (ii) mutations of the penultimate exonic nucleotide have limited consequences on the 5′ss (ΔCV = −5.4% ± 0.3), but they can activate a cryptic splice site, making predictions more difficult; (iii) exonic mutations distant from the 5′ and 3′ss can activate a cryptic splice site, leading to partial exonic deletion. Overall, these findings underline the efficiency of the HSF algorithm to predict the effect of mutations on 5′ and 3′ss. When using the HSF algorithm, the threshold for 5′ and 3′ss is 65, with a pathogenic ΔCV of −10%, except for position +4 where it is −7%. However, in a few cases, when unusual splice sites are used, this algorithm could be less efficient.

BP sequences represent another essential splicing signal. When a mutation is located in proximity to the 5′ side of the 3′ss, its potential effect on a BP sequence should be examined, especially when a nucleotide located at less than 85 bp from the 3′ss is targeted. In order to evaluate the HSF algorithm dedicated to the identification of BP sequences, we used 14 BP sequences inactivated by intronic mutations (Table 3). HSF correctly predicted 13 out of 14 BPs, and these data allowed us to define the threshold for BP detection at 67 and the pathogenic BP variation at −10%. Moreover, for intron 3 of XPC, HSF predicted a BP at position −24. However, according to Khan et al. (76), two BP sequences are present in this intron, one at position −24 and another at −4. HSF could not predict the BP at position −4 simply because the HSF algorithm excludes positions −12 to −1 for BP identification, because of steric obstruction caused by the spliceosome.
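
The two selection rules mentioned above (a detection threshold of 67 and the exclusion of positions −12 to −1) can be illustrated with a short Python sketch. It is only a reading of the text, not the HSF implementation: the function name, the candidate format and the way the best candidate is chosen are assumptions, and the score given to the −4 candidate is hypothetical because the text does not report one.

```python
# Minimal sketch of the branch point (BP) filtering rules described above.
# Assumptions: candidates are (position relative to the 3'ss, score) pairs,
# and the highest-scoring eligible candidate is reported. Not HSF code.
from typing import Iterable, Optional, Tuple

BP_THRESHOLD = 67.0               # detection threshold quoted in the text
EXCLUDED_WINDOW = range(-12, 0)   # positions -12 to -1 are not considered

Candidate = Tuple[int, float]


def best_branch_point(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the best eligible candidate, or None if none passes the filters."""
    eligible = [
        (position, score)
        for position, score in candidates
        if position not in EXCLUDED_WINDOW and score >= BP_THRESHOLD
    ]
    return max(eligible, key=lambda candidate: candidate[1], default=None)


if __name__ == "__main__":
    # XPC intron 3: -24 with the value from Table 3; the score attached to
    # the -4 candidate is hypothetical.
    xpc_candidates = [(-24, 82.78), (-4, 90.0)]
    print(best_branch_point(xpc_candidates))
```

Under these assumptions the −4 site is never reported, whatever its score, which is the behaviour described for intron 3 of XPC.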

It has been demonstrated that two different splicing recognition mechanisms, correlated with intron length, can be used in a cell: exon definition for long introns and intron definition for short introns (77). Although the influence of intron length seems to be less important in humans than in other species, it should nevertheless be kept in mind, since U12- and U2-type introns have different BP consensus sequences. In the present version of HSF (v2.4), we only focused on U2-type introns, which are by far the most abundant type in mammalian cells.

Concerning cis-acting elements, many studies have been performed to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets only share partial homology. To address this problem, HSF includes all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNP A1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and therefore the identification of the specific motif/protein pair involved in a given splicing defect is difficult. This is even more complicated when considering the impact of SR and ribonucleoprotein concentration in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the efficiency of HSF to correctly predict motifs disrupted by these mutations. We showed that when the motif/protein pairs had been previously experimentally characterized (hnRNP A1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems) and therefore the motif/protein couple is unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate HSF efficiency to discriminate true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE+ESS: χ² = 11.75, P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33, P = 0.0067), 9G8 and Tra2-β from HSF (χ² = 9.86, P = 0.0017) and PESE (χ² = 13.52, P = 2.36 × 10⁻⁴). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls. Indeed, only few experimental validations of auxiliary sequences are available, and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2-β HSF matrices gave stronger results than ESE-Finder itself and therefore can be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNP A1 matrix, should also be considered as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif predictions, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address the tissue or developmental specificity.

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5′ and 3′ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies, such as the exon-skipping approach in Duchenne Muscular Dystrophy, are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world, especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects that aim either at developing a holistic view of Genotype-To-Phenotype data (GEN2PHEN European project, http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81), might benefit from using HSF. Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754 (the GEN2PHEN project); the European Community Sixth Framework Program (FP6) under grant agreement number 036825 (TREAT-NMD Network of Excellence). Funding for open access charge: Institut National de la Sante Et de la Recherche Medicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget,S.M., Moore,C. and Sharp,P.A. (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc. Natl Acad. Sci. USA, 74, 3171–3175.

2. Nilsen,T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

3. Zhou,Z., Licklider,L.J., Gygi,S.P. and Reed,R. (2002) Comprehensive proteomic analysis of the human spliceosome. Nature, 419, 182–185.

4. Breitbart,R.E., Nguyen,H.T., Medford,R.M., Destree,A.T., Mahdavi,V. and Nadal-Ginard,B. (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell, 41, 67–82.

5. Maniatis,T. and Tasic,B. (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.

6. Cartegni,L., Chew,S.L. and Krainer,A.R. (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat. Rev. Genet., 3, 285–298.

7. Robberson,B.L., Cote,G.J. and Berget,S.M. (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol. Cell. Biol., 10, 84–94.

8. Jacob,M. and Gallinaro,H. (1989) The 5′ splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res., 17, 2159–2180.

9. Blencowe,B.J. (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem. Sci., 25, 106–110.

10. Zhu,J., Mayeda,A. and Krainer,A.R. (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol. Cell, 8, 1351–1361.

11. Zhang,X.H., Leslie,C.S. and Chasin,L.A. (2005) Computational searches for splicing signals. Methods, 37, 292–305.

12. Bhasi,A., Pandey,R.V., Utharasamy,S.P. and Senapathy,P. (2007) EuSplice: a unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics, 23, 1815–1823.

13. Churbanov,A., Rogozin,I.B., Deogun,J.S. and Ali,H. (2006) Method of predicting splice sites based on signal interactions. Biol. Direct, 1, 10.

14. Dunckley,M.G., Manoharan,M., Villiet,P., Eperon,I.C. and Dickson,G. (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum. Mol. Genet., 7, 1083–1090.

15. Wilton,S.D. and Fletcher,S. (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr. Gene Ther., 5, 467–483.

16. Beroud,C., Hamroun,D., Collod-Beroud,G., Boileau,C., Soussi,T. and Claustres,M. (2005) UMD (Universal Mutation Database): 2005 update. Hum. Mutat., 26, 184–191.

17. Beroud,C., Collod-Beroud,G., Boileau,C., Soussi,T. and Junien,C. (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum. Mutat., 15, 86–94.

18. Fairbrother,W.G., Yeo,G.W., Yeh,R., Goldstein,P., Mawson,M., Sharp,P.A. and Burge,C.B. (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res., 32, W187–W190.

19. Cartegni,L., Wang,J., Zhu,Z., Zhang,M.Q. and Krainer,A.R. (2003) ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res., 31, 3568–3571.

20. Flicek,P., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F., Cutts,T. et al. (2008) Ensembl 2008. Nucleic Acids Res., 36, D707–D714.

21. Karolchik,D., Kuhn,R.M., Baertsch,R., Barber,G.P., Clawson,H., Diekhans,M., Giardine,B., Harte,R.A., Hinrichs,A.S., Hsu,F. et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res., 36, D773–D779.


22. Shapiro,M.B. and Senapathy,P. (1987) RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res., 15, 7155–7174.

23. Yeo,G. and Burge,C.B. (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol., 11, 377–394.

24. Green,M.R. (1991) Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. Annu. Rev. Cell Biol., 7, 559–599.

25. Gooding,C., Clark,F., Wollerton,M.C., Grellscheid,S.N., Groom,H. and Smith,C.W. (2006) A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol., 7, R1.

26. Kol,G., Lev-Maor,G. and Ast,G. (2005) Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum. Mol. Genet., 14, 1559–1568.

27. Smith,P.J., Zhang,C., Wang,J., Chew,S.L., Zhang,M.Q. and Krainer,A.R. (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum. Mol. Genet., 15, 2490–2508.

28. Zhang,X.H. and Chasin,L.A. (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev., 18, 1241–1250.

29. Goren,A., Ram,O., Amit,M., Keren,H., Lev-Maor,G., Vig,I., Pupko,T. and Ast,G. (2006) Comparative analysis identifies exonic splicing regulatory sequences–the complex definition of enhancers and silencers. Mol. Cell, 22, 769–781.

30. Zhang,C., Li,W.H., Krainer,A.R. and Zhang,M.Q. (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc. Natl Acad. Sci. USA, 105, 5797–5802.

31. Sironi,M., Menozzi,G., Riva,L., Cagliani,R., Comi,G.P., Bresolin,N., Giorda,R. and Pozzoli,U. (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res., 32, 1783–1791.

32. Wang,Z., Rolish,M.E., Yeo,G., Tung,V., Mawson,M. and Burge,C.B. (2004) Systematic identification and analysis of exonic splicing silencers. Cell, 119, 831–845.

33. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

34. Bailey,T.L., Williams,N., Misleh,C. and Li,W.W. (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res., 34, W369–W373.

35. Yuan,B., Thomas,J.P., von Kodolitsch,Y. and Pyeritz,R.E. (1999) Comparison of heteroduplex analysis, direct sequencing and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum. Mutat., 14, 440–446.

36. Youil,R., Toner,T.J., Bull,E., Bailey,A.L., Earl,C.D., Dietz,H.C. and Montgomery,R.A. (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum. Mutat., 16, 92–93.

37. Schrijver,I., Liu,W., Odom,R., Brenn,T., Oefner,P., Furthmayr,H. and Francke,U. (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am. J. Hum. Genet., 71, 223–237.

38. Rommel,K., Karck,M., Haverich,A., Schmidtke,J. and Arslan-Kirchner,M. (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum. Mutat., 20, 406–407.

39. Park,E.S., Putnam,E.A., Chitayat,D., Child,A. and Milewicz,D.M. (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am. J. Med. Genet., 78, 350–355.

40. Palz,M., Tiecke,F., Booms,P., Goldner,B., Rosenberg,T., Fuchs,J., Skovby,F., Schumacher,H., Kaufmann,U.C., von Kodolitsch,Y. et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am. J. Med. Genet., 91, 212–221.

41. Nijbroek,G., Sood,S., McIntosh,I., Francomano,C.A., Bull,E., Pereira,L., Ramirez,F., Pyeritz,R.E. and Dietz,H.C. (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am. J. Hum. Genet., 57, 8–21.

42. McGrory,J. and Cole,W.G. (1999) Alternative splicing of exon 37 of FBN1 deletes part of an ‘eight-cysteine’ domain resulting in the Marfan syndrome. Clin. Genet., 55, 118–121.

43. Loeys,B., Nuytinck,L., Delvaux,I., De Bie,S. and De Paepe,A. (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch. Intern. Med., 161, 2447–2454.

44. Liu,W.O., Oefner,P.J., Qian,C., Odom,R.S. and Francke,U. (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms and sequence variants in Marfan syndrome and related connective tissue disorders. Genet. Test., 1, 237–242.

45. Hutchinson,S., Wordsworth,B.P. and Handford,P.A. (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum. Genet., 109, 416–420.

46. Halliday,D., Hutchinson,S., Kettle,S., Firth,H., Wordsworth,P. and Handford,P.A. (1999) Molecular analysis of eight mutations in FBN1. Hum. Genet., 105, 587–597.

47. Gupta,P.A., Wallis,D.D., Chin,T.O., Northrup,H., Tran-Fadulu,V.T., Towbin,J.A. and Milewicz,D.M. (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J. Med. Genet., 41, e56.

48. Gupta,P.A., Putnam,E.A., Carmical,S.G., Kaitila,I., Steinmann,B., Child,A., Danesino,C., Metcalfe,K., Berry,S.A., Chen,E. et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum. Mutat., 19, 39–48.

49. Guo,D., Tan,F.K., Cantu,A., Plon,S.E. and Milewicz,D.M. (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am. J. Med. Genet., 101, 130–134.

50. Dietz,H.C., McIntosh,I., Sakai,L.Y., Corson,G.M., Chalberg,S.C., Pyeritz,R.E. and Francomano,C.A. (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics, 17, 468–475.

51. Comeglio,P., Johnson,P., Arno,G., Brice,G., Evans,A., Aragon-Martin,J., da Silva,F.P., Kiotsekoglou,A. and Child,A. (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum. Mutat., 28, 928.

52. Collod-Beroud,G., Le Bourdelles,S., Ades,L., Ala-Kokko,L., Booms,P., Boxer,M., Child,A., Comeglio,P., De Paepe,A., Hyland,J.C. et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum. Mutat., 22, 199–208.

53. Chikumi,H., Yamamoto,T., Ohta,Y., Nanba,E., Nagata,K., Ninomiya,H., Narasaki,K., Katoh,T., Hisatome,I., Ono,K. et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J. Hum. Genet., 45, 115–118.

54. Biggin,A., Holman,K., Brett,M., Bennetts,B. and Ades,L. (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum. Mutat., 23, 99.

55. Attanasio,M., Lapini,I., Evangelisti,L., Lucarini,L., Giusti,B., Porciani,M., Fattori,R., Anichini,C., Abbate,R., Gensini,G. et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin. Genet., 74, 39–46.

56. Loeys,B.L., Chen,J., Neptune,E.R., Judge,D.P., Podowski,M., Holm,T., Meyers,J., Leitch,C.C., Katsanis,N., Sharifi,N. et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat. Genet., 37, 275–281.

57. Houdayer,C., Dehainault,C., Mattler,C., Michaux,D., Caux-Moncoutier,V., Pages-Berhouet,S., d’Enghien,C.D., Lauge,A., Castera,L., Gauthier-Villars,M. et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum. Mutat., 29, 975–982.

58. Tournier,I., Vezain,M., Martins,A., Charbonnier,F., Baert-Desurmont,S., Olschwang,S., Wang,Q., Buisine,M.P., Soret,J., Tazi,J. et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum. Mutat., 29, 1412–1424.

59. Auclair,J., Busine,M.P., Navarro,C., Ruano,E., Montmain,G., Desseigne,F., Saurin,J.C., Lasset,C., Bonadona,V., Giraud,S. et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum. Mutat., 27, 145–154.

60. Di Blasi,C., He,Y., Morandi,L., Cornelio,F., Guicheney,P. and Mora,M. (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain, 124, 698–704.

61. Disset,A., Bourgeois,C.F., Benmalek,N., Claustres,M., Stevenin,J. and Tuffery-Giraud,S. (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum. Mol. Genet., 15, 999–1013.

62. Fackenthal,J.D., Cartegni,L., Krainer,A.R. and Olopade,O.I. (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am. J. Hum. Genet., 71, 625–631.

63. Fairbrother,W.G., Yeh,R.F., Sharp,P.A. and Burge,C.B. (2002) Predictive identification of exonic splicing enhancers in human genes. Science, 297, 1007–1013.

64. Mazoyer,S., Puget,N., Perrin-Vidoz,L., Lynch,H.T., Serova-Sinilnikova,O.M. and Lenoir,G.M. (1998) A BRCA1 nonsense mutation causes exon skipping. Am. J. Hum. Genet., 62, 713–715.

65. Nielsen,K.B., Sorensen,S., Cartegni,L., Corydon,T.J., Doktor,T.K., Schroeder,L.D., Reinert,L.S., Elpeleg,O., Krainer,A.R., Gregersen,N. et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am. J. Hum. Genet., 80, 416–432.

66. Zatkova,A., Messiaen,L., Vandenbroucke,I., Wieser,R., Fonatsch,C., Krainer,A.R. and Wimmer,K. (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum. Mutat., 24, 491–501.

67. den Dunnen,J.T. and Antonarakis,S.E. (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mutat., 15, 7–12.

68. Frederic,M.Y., Monino,C., Marschall,C., Hamroun,D., Faivre,L., Jondeau,G., Klein,H.G., Neumann,L., Gautier,E., Binquet,C. et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2) and genotype-phenotype correlations. Hum. Mutat., 30, 181–190.

69. Frederic,M.Y., Hamroun,D., Faivre,L., Boileau,C., Jondeau,G., Claustres,M., Beroud,C. and Collod-Beroud,G. (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum. Mutat., 29, 33–38.

70. Frank,V., Ortiz Bruchle,N., Mager,S., Frints,S.G., Bohring,A., du Bois,G., Debatin,I., Seidel,H., Senderek,J., Besbas,N. et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum. Mutat., 28, 638–639.

71. Anczukow,O., Buisson,M., Salles,M.J., Triboulet,S., Longy,M., Lidereau,R., Sinilnikova,O.M. and Mazoyer,S. (2008) Unclassified variants identified in BRCA1 exon 11: consequences on splicing. Genes Chromosomes Cancer, 47, 418–426.

72. Ng,W., Loh,A.X., Teixeira,A.S., Pereira,S.P. and Swallow,D.M. (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br. J. Cancer, 99, 978–985.

73. Baala,L., Romano,S., Khaddour,R., Saunier,S., Smith,U.M., Audollent,S., Ozilou,C., Faivre,L., Laurent,N., Foliguet,B. et al. (2007) The Meckel-Gruber syndrome gene, MKS3, is mutated in Joubert syndrome. Am. J. Hum. Genet., 80, 186–194.

74. Habara,Y., Doshita,M., Hirozawa,S., Yokono,Y., Yagi,M., Takeshima,Y. and Matsuo,M. (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J. Biochem., 143, 303–310.

75. Aartsma-Rus,A., van Vliet,L., Hirschi,M., Janson,A.A., Heemskerk,H., de Winter,C.L., de Kimpe,S., van Deutekom,J.C., ’t Hoen,P.A. and van Ommen,G.J. (2008) Guidelines for antisense oligonucleotide design and insight into splice-modulating mechanisms. Mol. Ther., 17, 548–553.

76. Khan,S.G., Metin,A., Gozukara,E., Inui,H., Shahlavi,T., Muniz-Medina,V., Baker,C.C., Ueda,T., Aiken,J.R., Schneider,T.D. et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum. Mol. Genet., 13, 343–352.

77. Sharp,P.A. and Burge,C.B. (1997) Classification of introns: U2-type or U12-type. Cell, 91, 875–879.

78. Chasin,L.A. (2007) Searching for splicing motifs. Adv. Exp. Med. Biol., 623, 85–106.

79. Nalla,V.K. and Rogan,P.K. (2005) Automated splicing mutation analysis by information theory. Hum. Mutat., 25, 334–342.

80. Beroud,C., Tuffery-Giraud,S., Matsuo,M., Hamroun,D., Humbertclaude,V., Monnier,N., Moizard,M.P., Voelckel,M.A., Calemard,L.M., Boisseau,P. et al. (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy. Hum. Mutat., 28, 196–202.

81. (2007) What is the human variome project? Nat. Genet., 39, 423.

82. Kainulainen,K., Karttunen,L., Puhakka,L., Sakai,L. and Peltonen,L. (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat. Genet., 6, 64–69.

83. Liu,W., Qian,C., Comeau,K., Brenn,T., Furthmayr,H. and Francke,U. (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe Marfan syndrome. Hum. Mol. Genet., 5, 1581–1587.

84. Booms,P., Cisler,J., Mathews,K.R., Godfrey,M., Tiecke,F., Kaufmann,U.C., Vetter,U., Hagemeier,C. and Robinson,P.N. (1999) Novel exon skipping mutation in the fibrillin-1 gene: two ‘hot spots’ for the neonatal Marfan syndrome. Clin. Genet., 55, 110–117.

85. Wang,M., Price,C., Han,J., Cisler,J., Imaizumi,K., Van Thienen,M.N., De Paepe,A. and Godfrey,M. (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum. Mol. Genet., 4, 607–613.

86. Godfrey,M., Vandemark,N., Wang,M., Velinov,M., Wargowski,D., Tsipouras,P., Han,J., Becker,J., Robertson,W., Droste,S. et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am. J. Hum. Genet., 53, 472–480.

87. Wang,M., Clericuzio,C.L. and Godfrey,M. (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am. J. Hum. Genet., 59, 1027–1034.

88. Karttunen,L., Ukkonen,T., Kainulainen,K., Syvanen,A.C. and Peltonen,L. (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum. Mutat., Suppl. 1, S34–S37.

89. Kosaki,K., Takahashi,D., Udaka,T., Kosaki,R., Matsumoto,M., Ibe,S., Isobe,T., Tanaka,Y. and Takahashi,T. (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am. J. Med. Genet. A, 140, 104–108; author reply 109–110.

90. Loeys,B.L., Schwarze,U., Holm,T., Callewaert,B.L., Thomas,G.H., Pannu,H., De Backer,J.F., Oswald,G.L., Symoens,S., Manouvrier,S. et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N. Engl. J. Med., 355, 788–798.

91. Tran,V.K., Takeshima,Y., Zhang,Z., Habara,Y., Haginoya,K., Nishiyama,A., Yagi,M. and Matsuo,M. (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes but not in the skeletal muscle of a DMD patient. Hum. Genet., 120, 737–742.

92. Sharp,A., Pichert,G., Lucassen,A. and Eccles,D. (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum. Mutat., 24, 272.

93. Burrows,N.P., Nicholls,A.C., Richards,A.J., Luccarini,C., Harrison,J.B., Yates,J.R. and Pope,F.M. (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am. J. Hum. Genet., 63, 390–398.

94. Sinnreich,M., Therrien,C. and Karpati,G. (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology, 66, 1114–1116.


95. Maslen,C., Babcock,D., Raghunath,M. and Steinmann,B. (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am. J. Hum. Genet., 60, 1389–1398.

96. Vivenza,D., Guazzarotti,L., Godi,M., Frasca,D., di Natale,B., Momigliano-Richiardi,P., Bona,G. and Giordano,M. (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J. Clin. Endocrinol. Metab., 91, 980–986.

97. Chavanas,S., Gache,Y., Vailly,J., Kanitakis,J., Pulkkinen,L., Uitto,J., Ortonne,J. and Meneguzzi,G. (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum. Mol. Genet., 8, 2097–2105.

98. Kuivenhoven,J.A., Weibusch,H., Pritchard,P.H., Funke,H., Benne,R., Assmann,G. and Kastelein,J.J. (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J. Clin. Invest., 98, 358–364.

99. Webb,J.C., Patel,D.D., Shoulders,C.C., Knight,B.L. and Soutar,A.K. (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum. Mol. Genet., 5, 1325–1331.

100. Di Leo,E., Panico,F., Tarugi,P., Battisti,C., Federico,A. and Calandra,S. (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum. Mutat., 24, 440.

101. Vuillaumier-Barrot,S., Le Bizec,C., De Lonlay,P., Madinier-Chappat,N., Barnier,A., Dupre,T., Durand,G. and Seta,N. (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol. Genet. Metab., 87, 337–340.

102. Janssen,R.J., Wevers,R.A., Haussler,M., Luyten,J.A., Steenbergen-Spanjers,G.C., Hoffmann,G.F., Nagatsu,T. and Van den Heuvel,L.P. (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann. Hum. Genet., 64, 375–382.

103. Mayer,K., Ballhausen,W., Leistner,W. and Rott,H. (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim. Biophys. Acta, 1502, 495–507.


Table 4. Exonic mutations known to result in exon skipping through ESE inactivation or ESS activation

Gene | Mutation | Ref. | Motif | Ref. Seq. | HSF prediction
ACADM | c.362C>T | (65) | −ESE (SF2/ASF) | ENST00000370841 | −9G8[i] (357_362); −SF2/ASF[e] (358_364); +EIE[h] (359_364); −SRp40[e] (359_365); −EIE[h] (360_365); +IIE[c] ×4 (359_367)
BRCA1 | c.5080G>T | (64) | | ENST00000357654 | −EIE[h] (5075_5080); +SRp55[e] (5076_5081); −9G8[i] (5077_5082); −SF2/ASF[e] (5078_5085); −IIE[c] (5078_5083); +IIE[c] (5079_5084); −ESS[a] (5076_5083); +hnRNPA1[d] (5080_5085)
BRCA2 | c.8165C>G | (62) | −ESE | ENST00000380152 | −SRp40[e] (8162_8168); −ESE[f] (8163_8168); +ESE[f] ×2 (8164_8170); −SRp55[e] (8163_8169); −SF2/ASF[e] (8165_8171); −EIE[h] ×4 (8160_8168)
BRCA2 | c.5081G>T | (64) | | ENST00000380152 | +SC35[e] (5075_5082); +SRp40[e] (5080_5086); −ESE[f,h] ×2 (5080_5086); −9G8[i] (5081_5086); −ESS[a] (5078_5085)
DMD | c.4250T>A | (61) | +ESS (hnRNPA1) | ENST00000357033 | +9G8[i] ×2 (4246_4251) (4248_4253); −EIE[h] (4248_4253); +ESE[f] (4250_4255); −IIE[c] ×3 (4246_4253); +hnRNPA1[d] (4249_4254)
MLH1 | c.544A>G | (59) | | ENST00000231790 | +ESS[a] (537_545); 5′ss ΔCV = −6.30
MLH1 | c.793C>T | (58) | | ENST00000231790 | +ESS[a] (795_802)
MLH1 | c.794G>A | (58) | | ENST00000231790 | −SRp40[e] (793_799); −SC35[e] (794_801); +ESS[c] (794_799)
MLH1 | c.882C>T | (58) | | ENST00000231790 | +SC35[e] (876_883); −SRp55[e] (877_882)
MLH1 | c.988_990del | (58) | | ENST00000231790 | +SF2/ASF[e] (983_989); −SRp55[e] (985_990); +9G8[i] (985_990); −ESS[a] (985_992)
MSH2 | c.815C>T | (58) | | ENST00000233146 | −SRp55[e] (813_818); +ESS[a] (813_820); +ESS[c] ×5 (801_819)
MSH2 | c.274_276del | (58) | | ENST00000233146 | +SC35[e] (272_279); +SRp40[e] ×2 (274_285); −IIE[c] ×2 (274_280)
LAMA2 | c.2230C>T | (60) | | ENST00000354729 | −SF2/ASF[e] (2226_2232); +ESS[c] (2228_2235); +IIE[c] ×2 (2229_2235); +ESS[a] (2230_2237)
NF1 | c.557A>T | (66) | −ESE | ENST00000356175 | −SRp55[e] (552_557); −ESE[f] (552_557); −EIE[h] ×4 (552_560); −9G8[i] (553_558); +ESS[a] ×2 (550_557) (555_562)
NF1 | c.910C>T | (66) | −ESE | ENST00000356175 | −9G8[i] (905_910); −EIE[h] (905_910); +ESE[f] (908_913); −ESE[f] (910_915); −ESS[a] (906_913)
NF1 | c.943C>T | (66) | −ESE | ENST00000356175 | −SC35[e] (941_948); −SF2/ASF[e] (943_949); −PESE[g] (942_949); −9G8[i] (938_943); +hnRNPA1[d] (943_948); +IIE[c] (942_947)

(continued)


c.5081G>T (BRCA2), c.557A>T and c.910C>T (NF1); (ii) creation of one or more ESS without disruption of an ESE, as shown for mutations c.544A>G and c.793C>T (MLH1), c.4250T>A (DMD) and c.6792C>G (NF1); and (iii) an intermediate situation where both the disruption of one or more ESE and the creation of one or more ESS were predicted. This was observed for mutations c.5080G>T (BRCA1), c.794G>A and c.988_990del (MLH1), c.815C>T and c.274_276del (MSH2), c.2230C>T (LAMA2), c.943C>T, c.1007G>A and c.5719G>T (NF1). In order to evaluate the potential to differentiate ‘true’ ESE or ESS motifs from false positive signals, we selected a second set of 36 mutations (positive controls) and 220 SNPs (negative controls) (Supplementary Table 1). Predictions were classified in three categories: disruption of ESE motifs only (ESE), creation of ESS motifs only (ESS) or both (ESE+ESS). In addition, results were classified in two subsets: a first one (All), which included all predicted motifs, and a second one (Best), which was restricted to only one motif for each case by selecting the one recognized by the highest number of matrices.

Comparison of the three categories (ESE, ESS and ESE+ESS) revealed a significant difference between positive and negative controls both in the ‘All’ (χ² = 10.05, P = 0.00656) and the ‘Best’ subset (χ² = 11.75, P = 0.0028). We then evaluated the potential of each matrix to differentiate true from false positive signals. No statistical differences were found using the Sironi, PESS, IIE, hnRNP A1 and RESCUE-ESE matrices. A statistically significant difference was found for the ‘All’ subset (χ² = 3.99, P = 0.045) but not for the ‘Best’ subset (χ² = 2.47, P = 0.116) with the EIE matrix. Significant results in both subsets were obtained with ESE-Finder (‘All’ subset: χ² = 5.17, P = 0.023; ‘Best’ subset: χ² = 7.33, P = 0.0067), the 9G8 and Tra2-β matrices from HSF (‘All’ subset: χ² = 9.92, P = 0.00164; ‘Best’ subset: χ² = 9.86, P = 0.00169) and PESE (‘All’ subset: χ² = 19.52, P = 9.95 × 10⁻⁶; ‘Best’ subset: χ² = 13.52, P = 2.36 × 10⁻⁴). The positive (PPV) and negative (NPV) predictive values, as well as the sensitivity (Sv) and the specificity (Sp), of these last three matrices were then evaluated. PPV ranged from 0.22 (9G8 and Tra2-β) to 0.56 (PESE), NPV from 0.76 (PESE) to 0.95 (9G8 and Tra2-β), Sv from 0.27 (PESE) to 0.40 (9G8 and Tra2-β) and Sp from 0.88 (9G8 and Tra2-β) to 0.91 (PESE). The ESE-Finder matrix showed intermediate values in all cases.
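The four performance measures quoted above follow from a two-by-two table of predictions against known status. The sketch below is only an illustration of those standard definitions: the group sizes (36 positive and 220 negative controls) come from the text, but the split into true and false calls is hypothetical and is not taken from Supplementary Table 1; the names `ConfusionCounts` and `predictive_values` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one matrix; the values used below are hypothetical."""
    tp: int  # splicing mutations with a predicted motif change
    fp: int  # neutral SNPs with a predicted motif change
    tn: int  # neutral SNPs without a predicted motif change
    fn: int  # splicing mutations without a predicted motif change


def predictive_values(c: ConfusionCounts) -> dict[str, float]:
    """Return PPV, NPV, sensitivity (Sv) and specificity (Sp)."""
    return {
        "PPV": c.tp / (c.tp + c.fp),  # fraction of positive calls that are true
        "NPV": c.tn / (c.tn + c.fn),  # fraction of negative calls that are true
        "Sv": c.tp / (c.tp + c.fn),   # fraction of mutations detected
        "Sp": c.tn / (c.tn + c.fp),   # fraction of SNPs correctly left uncalled
    }


# Hypothetical split of 36 positive and 220 negative controls.
example = ConfusionCounts(tp=12, fp=20, tn=200, fn=24)
metrics = predictive_values(example)
for name, value in metrics.items():
    print(f"{name} = {value:.2f}")
```

With strongly unbalanced groups such as these, a matrix can combine a high NPV with a modest PPV, which is the pattern reported for the three matrices above.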

DISCUSSION

During evolution from simple to higher eukaryotes, splicing signals evolved from well-defined motifs to degenerated sequences with the addition of new auxiliary splicing

Table 4. Continued

Gene | Mutation | Ref. | Motif | Ref. Seq. | HSF prediction
NF1 | c.1007G>A | (66) | −ESE | ENST00000356175 | +PESE[g] (1007_1014); −EIE[h] ×2 (1003_1011); +9G8[i] (1006_1011); +ESE[f] (1007_1014); −ESS[a] ×2 (1003_1011); −IIE[c] ×4 (1003_1011); +hnRNPA1[d] (1006_1011)
NF1 | c.5719G>T | (66) | −ESE | ENST00000356175 | −ESE[f] ×5 (5715_5724); −EIE[h] ×5 (5715_5724); −ESS[a] ×2 (5714_5725); +PESS[g] ×2 (5712_5720); +hnRNPA1[d] (5719_5724)
NF1 | c.6792C>A | (66) | −ESE | ENST00000356175 | +ESE[f] ×5 (6792_6797); −EIE[h] ×2 (6788_6793) (6790_6795); +Tra2β[i] (6791_6795); −ESS[a] ×2 (6787_6794) (6792_6799)
NF1 | c.6792C>G | (66) | −ESE | ENST00000356175 | +ESE[f] (6792_6797); −EIE[h] ×2 (6788_6793) (6790_6795); +Tra2β[i] (6791_6795); −ESS[a] ×2 (6787_6794) (6792_6799); +hnRNPA1[d] (6790_6795)

+, a new site was created by the mutation; −, the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were: [a] Silencer motifs from Sironi et al. (31); [b] PESS octamers (28); [c] IIEs (30); [d] hnRNP motifs from HSF; [e] ESE-Finder matrices (19); [f] RESCUE-ESE hexamers (63); [g] PESE octamers (28); [h] EIEs (30); [i] ESE motifs from HSF. When multiple adjacent sites were predicted, the number of sites is indicated: ×5 means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.


sequences known as ESE and ESS. Although major SR proteins have been cloned and their target sites determined, much work remains to be done to understand how splice signals are recognized and splicing specificity achieved. As this complex world is progressively revealed, bioinformatics resources could play a major role in helping researchers and diagnostic laboratories to evaluate the consequence of mutations on splicing, especially because most genetic tests use DNA and not RNA samples. By giving easy access to predictions of 5′ss, 3′ss and BP sequences as well as ESE and ESS, the HSF tool (http://www.umd.be/HSF/) fulfills this need and may assist clinicians, geneticists and researchers (70–75). By combining motifs identified with different experimental and computational approaches, it provides a common interface that can be used for sequence analysis. The inclusion of all exons and introns extracted from the Ensembl human genome database (20) allows easy access to any sequence of human genes and thus direct comparison of virtually every mutation or SNP concerning splicing elements. Since SNPs are present at a very high frequency in the genome (1/300 bp), it could be useful to evaluate their impact in association with a mutation. We therefore included in HSF data from dbSNP using Ensembl Biomart. The user can select the ‘Search for SNPs related to the analyzed sequence’ option, which automatically retrieves SNPs from the database. When SNPs are localized in exons, their effect on ESE and ESS motifs could help the user to better evaluate the consequence of a given mutation.

To evaluate the efficiency of the various algorithms included in HSF and its contribution to the prediction of the consequences of mutations associated with a splicing defect, we used a set of 69 intronic mutations that disrupt the 5′ss or the 3′ss and result in exon skipping and/or activation of a cryptic splice site (Table 1) and a group of 15 mutations that were previously reported to result in splicing defects by creating or activating cryptic splice sites (Table 2). HSF was able to correctly predict the disruption of the natural splice sites. Moreover, we could confirm that (i) mutations of the last nucleotide of an exon have a strong effect on the 5′ss (ΔCV = −12 ± 0.7), frequently resulting in exon skipping, partial exonic deletion or intronic retention due to activation of a cryptic splice site; (ii) mutations of the penultimate exonic nucleotide have limited consequences on the 5′ss (ΔCV = −5.4 ± 0.3), but they can activate a cryptic splice site, making predictions more difficult; (iii) exonic mutations distant from the 5′ and 3′ss can activate a cryptic splice site, leading to partial exonic deletion. Overall, these findings underline the efficiency of the HSF algorithm to predict the effect of mutations on 5′ and 3′ss. When using the HSF algorithm, the threshold for 5′ and 3′ss is 65, with a pathogenic ΔCV of −10%, except for position +4, where it is −7%. However, in the few cases when unusual splice sites are used, this algorithm could be less efficient.

BP sequences represent another essential splicing signal. When a mutation is localized in proximity of the 5′ of the 3′ss, its potential effect on a BP sequence should be examined, especially when a nucleotide located at less than 85 bp from the 3′ss is targeted. In order to evaluate the HSF algorithm dedicated to the identification of BP sequences, we used 14 BP sequences inactivated by intronic mutations (Table 3). HSF correctly predicted 13 out of 14 BPs, and these data allowed us to define the threshold for BP detection at 67 and the pathogenic BP variation at −10%. Moreover, for intron 3 of XPC, HSF predicted a BP at position −24. However, according to Khan et al. (76), two BP sequences are present in this intron, one at position −24 and another at −4. HSF could not predict the BP at position −4 simply because the HSF algorithm excludes positions −12 to −1 for BP identification because of steric obstruction caused by the spliceosome.
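The decision rules described in the two preceding paragraphs can be written down as a small sketch. It is not the HSF implementation: it reads the cut-offs as a minimum consensus value (CV) of 65 for splice sites and 67 for BPs, with a pathogenic variation of −10% (−7% at position +4), and it assumes that this variation is the percent change of the mutant CV relative to the wild-type CV. All function and constant names are hypothetical.

```python
SS_THRESHOLD = 65.0           # minimum CV for a 5' or 3' splice site
BP_THRESHOLD = 67.0           # minimum CV for a branch point
PATHOGENIC_DELTA = -10.0      # percent variation read as pathogenic
PATHOGENIC_DELTA_POS4 = -7.0  # cut-off used at position +4 of the 5'ss


def relative_variation(wt_cv: float, mut_cv: float) -> float:
    """Percent change of the mutant CV relative to wild type (assumed definition)."""
    return 100.0 * (mut_cv - wt_cv) / wt_cv


def site_is_broken(wt_cv: float, mut_cv: float, *,
                   threshold: float = SS_THRESHOLD,
                   cutoff: float = PATHOGENIC_DELTA) -> bool:
    """Call a wild-type site broken when its CV drops by at least the cut-off."""
    if wt_cv < threshold:
        return False  # not scored as a site in the wild-type sequence
    return relative_variation(wt_cv, mut_cv) <= cutoff


def bp_position_allowed(position: int) -> bool:
    """Positions -12 to -1 upstream of the 3'ss are excluded from the BP search."""
    return not (-12 <= position <= -1)


# Illustrative values only; they are not taken from Tables 1 to 3.
print(site_is_broken(85.0, 75.0))                                # drop of about 11.8%
print(site_is_broken(85.0, 79.0, cutoff=PATHOGENIC_DELTA_POS4))  # drop of about 7.1%
print(site_is_broken(80.0, 70.0, threshold=BP_THRESHOLD))        # BP, drop of 12.5%
print(bp_position_allowed(-24), bp_position_allowed(-4))
```

The last line mirrors the XPC intron 3 case: a candidate at −24 is inside the search window, whereas one at −4 is excluded before any scoring takes place.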

It has been demonstrated that two different splicing recognition mechanisms, correlated with intron length, can be used in a cell: exon definition for long introns and intron definition for short introns (77). Although the influence of intron length seems to be less important in humans than in other species, it should nevertheless be kept in mind, since U12- and U2-type introns have different BP consensus sequences. In the present version of HSF (v2.4), we only focused on U2-type introns, which are by far the most abundant type in mammalian cells.

Concerning cis-acting elements, many studies have been performed to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets share only partial homology. To solve this problem, HSF includes all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNP A1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and therefore the identification of the specific motif/protein pair involved in a given splicing defect is difficult. This is even more complicated when considering the impact of SR and ribonucleoprotein concentration in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the efficiency of HSF in correctly predicting the motifs disrupted by these mutations. We showed that when the motif/protein pairs had been previously experimentally characterized (hnRNP A1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems) and therefore the motif/protein pair is unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate HSF efficiency in discriminating true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE+ESS; χ² = 11.75, P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33, P = 0.0067), 9G8 and Tra2-β from HSF (χ² = 9.86, P = 0.0017) and PESE (χ² = 13.52, P = 2.36 × 10⁻⁴). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls. Indeed, only few experimental validations of auxiliary sequences are available, and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2-β HSF matrices gave stronger results than ESE-Finder itself and therefore can be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNP A1 matrix, should also be considered as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif predictions, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address tissue or developmental specificity.
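As a plausibility check on the statistics quoted above, a P-value can be recomputed from each χ² value with a few lines of Python. This is only a sketch: it assumes 2 degrees of freedom for the three-category comparison (2 groups × 3 categories) and 1 degree of freedom for each single-matrix comparison (2 × 2 table), which the text does not state, and the function name is hypothetical.

```python
import math


def chi2_p_value(chi2, df):
    """Upper-tail P-value of a chi-square statistic (sketch for df = 1 or 2 only)."""
    if df == 1:
        # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        # For two degrees of freedom, P(X > x) = exp(-x / 2).
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Overall pattern (ESE, ESS, ESE+ESS), assumed df = 2
print(f"{chi2_p_value(11.75, 2):.4f}")   # 0.0028
# PESE matrix, assumed df = 1
print(f"{chi2_p_value(13.52, 1):.2e}")   # 2.36e-04
```

Under these assumptions the recomputed values agree with the reported P-values, which supports reading the extracted figures as χ² = 11.75 and χ² = 13.52.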

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5′ and 3′ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and for applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies, such as the exon-skipping approach in Duchenne Muscular Dystrophy, are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world, especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects that aim either at developing a holistic view of Genotype-To-Phenotype data (GEN2PHEN European project, http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81), might benefit from using HSF. Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754, the GEN2PHEN project. The European Community Sixth Framework Program (FP6) under grant agreement number 036825, TREAT-NMD Network of Excellence. Funding for open access charge: Institut National de la Santé Et de la Recherche Médicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget SM, Moore C and Sharp PA (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci USA 74, 3171–3175.

2. Nilsen TW (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays 25, 1147–1149.

3. Zhou Z, Licklider LJ, Gygi SP and Reed R (2002) Comprehensive proteomic analysis of the human spliceosome. Nature 419, 182–185.

4. Breitbart RE, Nguyen HT, Medford RM, Destree AT, Mahdavi V and Nadal-Ginard B (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell 41, 67–82.

5. Maniatis T and Tasic B (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418, 236–243.

6. Cartegni L, Chew SL and Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3, 285–298.

7. Robberson BL, Cote GJ and Berget SM (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol Cell Biol 10, 84–94.

8. Jacob M and Gallinaro H (1989) The 5′ splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res 17, 2159–2180.

9. Blencowe BJ (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci 25, 106–110.

10. Zhu J, Mayeda A and Krainer AR (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol Cell 8, 1351–1361.

11. Zhang XH, Leslie CS and Chasin LA (2005) Computational searches for splicing signals. Methods 37, 292–305.

12. Bhasi A, Pandey RV, Utharasamy SP and Senapathy P (2007) EuSplice: A unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics 23, 1815–1823.

13. Churbanov A, Rogozin IB, Deogun JS and Ali H (2006) Method of predicting splice sites based on signal interactions. Biol Direct 1, 10.

14. Dunckley MG, Manoharan M, Villiet P, Eperon IC and Dickson G (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum Mol Genet 7, 1083–1090.

15. Wilton SD and Fletcher S (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr Gene Ther 5, 467–483.

16. Beroud C, Hamroun D, Collod-Beroud G, Boileau C, Soussi T and Claustres M (2005) UMD (Universal Mutation Database): 2005 update. Hum Mutat 26, 184–191.

17. Beroud C, Collod-Beroud G, Boileau C, Soussi T and Junien C (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum Mutat 15, 86–94.

18. Fairbrother WG, Yeo GW, Yeh R, Goldstein P, Mawson M, Sharp PA and Burge CB (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res 32, W187–W190.

19. Cartegni L, Wang J, Zhu Z, Zhang MQ and Krainer AR (2003) ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res 31, 3568–3571.

20. Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T et al. (2008) Ensembl 2008. Nucleic Acids Res 36, D707–D714.

21. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 36, D773–D779.


22. Shapiro MB and Senapathy P (1987) RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res 15, 7155–7174.

23. Yeo G and Burge CB (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11, 377–394.

24. Green MR (1991) Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. Annu Rev Cell Biol 7, 559–599.

25. Gooding C, Clark F, Wollerton MC, Grellscheid SN, Groom H and Smith CW (2006) A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol 7, R1.

26. Kol G, Lev-Maor G and Ast G (2005) Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum Mol Genet 14, 1559–1568.

27. Smith PJ, Zhang C, Wang J, Chew SL, Zhang MQ and Krainer AR (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum Mol Genet 15, 2490–2508.

28. Zhang XH and Chasin LA (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev 18, 1241–1250.

29. Goren A, Ram O, Amit M, Keren H, Lev-Maor G, Vig I, Pupko T and Ast G (2006) Comparative analysis identifies exonic splicing regulatory sequences–The complex definition of enhancers and silencers. Mol Cell 22, 769–781.

30. Zhang C, Li WH, Krainer AR and Zhang MQ (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc Natl Acad Sci USA 105, 5797–5802.

31. Sironi M, Menozzi G, Riva L, Cagliani R, Comi GP, Bresolin N, Giorda R and Pozzoli U (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res 32, 1783–1791.

32. Wang Z, Rolish ME, Yeo G, Tung V, Mawson M and Burge CB (2004) Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845.

33. Thompson JD, Higgins DG and Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.

34. Bailey TL, Williams N, Misleh C and Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34, W369–W373.

35. Yuan B, Thomas JP, von Kodolitsch Y and Pyeritz RE (1999) Comparison of heteroduplex analysis, direct sequencing, and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum Mutat 14, 440–446.

36. Youil R, Toner TJ, Bull E, Bailey AL, Earl CD, Dietz HC and Montgomery RA (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum Mutat 16, 92–93.

37. Schrijver I, Liu W, Odom R, Brenn T, Oefner P, Furthmayr H and Francke U (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am J Hum Genet 71, 223–237.

38. Rommel K, Karck M, Haverich A, Schmidtke J and Arslan-Kirchner M (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum Mutat 20, 406–407.

39. Park ES, Putnam EA, Chitayat D, Child A and Milewicz DM (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am J Med Genet 78, 350–355.

40. Palz M, Tiecke F, Booms P, Goldner B, Rosenberg T, Fuchs J, Skovby F, Schumacher H, Kaufmann UC, von Kodolitsch Y et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am J Med Genet 91, 212–221.

41. Nijbroek G, Sood S, McIntosh I, Francomano CA, Bull E, Pereira L, Ramirez F, Pyeritz RE and Dietz HC (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am J Hum Genet 57, 8–21.

42. McGrory J and Cole WG (1999) Alternative splicing of exon 37 of FBN1 deletes part of an ‘eight-cysteine’ domain resulting in the Marfan syndrome. Clin Genet 55, 118–121.

43. Loeys B, Nuytinck L, Delvaux I, De Bie S and De Paepe A (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch Intern Med 161, 2447–2454.

44. Liu WO, Oefner PJ, Qian C, Odom RS and Francke U (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms, and sequence variants in Marfan syndrome and related connective tissue disorders. Genet Test 1, 237–242.

45. Hutchinson S, Wordsworth BP and Handford PA (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum Genet 109, 416–420.

46. Halliday D, Hutchinson S, Kettle S, Firth H, Wordsworth P and Handford PA (1999) Molecular analysis of eight mutations in FBN1. Hum Genet 105, 587–597.

47. Gupta PA, Wallis DD, Chin TO, Northrup H, Tran-Fadulu VT, Towbin JA and Milewicz DM (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J Med Genet 41, e56.

48. Gupta PA, Putnam EA, Carmical SG, Kaitila I, Steinmann B, Child A, Danesino C, Metcalfe K, Berry SA, Chen E et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum Mutat 19, 39–48.

49. Guo D, Tan FK, Cantu A, Plon SE and Milewicz DM (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am J Med Genet 101, 130–134.

50. Dietz HC, McIntosh I, Sakai LY, Corson GM, Chalberg SC, Pyeritz RE and Francomano CA (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics 17, 468–475.

51. Comeglio P, Johnson P, Arno G, Brice G, Evans A, Aragon-Martin J, da Silva FP, Kiotsekoglou A and Child A (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum Mutat 28, 928.

52. Collod-Beroud G, Le Bourdelles S, Ades L, Ala-Kokko L, Booms P, Boxer M, Child A, Comeglio P, De Paepe A, Hyland JC et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum Mutat 22, 199–208.

53. Chikumi H, Yamamoto T, Ohta Y, Nanba E, Nagata K, Ninomiya H, Narasaki K, Katoh T, Hisatome I, Ono K et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J Hum Genet 45, 115–118.

54. Biggin A, Holman K, Brett M, Bennetts B and Ades L (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum Mutat 23, 99.

55. Attanasio M, Lapini I, Evangelisti L, Lucarini L, Giusti B, Porciani M, Fattori R, Anichini C, Abbate R, Gensini G et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin Genet 74, 39–46.

56. Loeys BL, Chen J, Neptune ER, Judge DP, Podowski M, Holm T, Meyers J, Leitch CC, Katsanis N, Sharifi N et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat Genet 37, 275–281.

57. Houdayer C, Dehainault C, Mattler C, Michaux D, Caux-Moncoutier V, Pages-Berhouet S, d’Enghien CD, Lauge A, Castera L, Gauthier-Villars M et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum Mutat 29, 975–982.

58. Tournier I, Vezain M, Martins A, Charbonnier F, Baert-Desurmont S, Olschwang S, Wang Q, Buisine MP, Soret J, Tazi J et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum Mutat 29, 1412–1424.

59. Auclair J, Busine MP, Navarro C, Ruano E, Montmain G, Desseigne F, Saurin JC, Lasset C, Bonadona V, Giraud S et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum Mutat 27, 145–154.

60. Di Blasi C, He Y, Morandi L, Cornelio F, Guicheney P and Mora M (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain 124, 698–704.

61. Disset A, Bourgeois CF, Benmalek N, Claustres M, Stevenin J and Tuffery-Giraud S (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum Mol Genet 15, 999–1013.

62. Fackenthal JD, Cartegni L, Krainer AR and Olopade OI (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am J Hum Genet 71, 625–631.

63. Fairbrother WG, Yeh RF, Sharp PA and Burge CB (2002) Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013.

64. Mazoyer S, Puget N, Perrin-Vidoz L, Lynch HT, Serova-Sinilnikova OM and Lenoir GM (1998) A BRCA1 nonsense mutation causes exon skipping. Am J Hum Genet 62, 713–715.

65. Nielsen KB, Sorensen S, Cartegni L, Corydon TJ, Doktor TK, Schroeder LD, Reinert LS, Elpeleg O, Krainer AR, Gregersen N et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am J Hum Genet 80, 416–432.

66. Zatkova A, Messiaen L, Vandenbroucke I, Wieser R, Fonatsch C, Krainer AR and Wimmer K (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum Mutat 24, 491–501.

67. den Dunnen JT and Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15, 7–12.

68. Frederic MY, Monino C, Marschall C, Hamroun D, Faivre L, Jondeau G, Klein HG, Neumann L, Gautier E, Binquet C et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2), and genotype-phenotype correlations. Hum Mutat 30, 181–190.

69. Frederic MY, Hamroun D, Faivre L, Boileau C, Jondeau G, Claustres M, Beroud C and Collod-Beroud G (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum Mutat 29, 33–38.

70. Frank V, Ortiz Bruchle N, Mager S, Frints SG, Bohring A, du Bois G, Debatin I, Seidel H, Senderek J, Besbas N et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum Mutat 28, 638–639.

71. Anczukow O, Buisson M, Salles MJ, Triboulet S, Longy M, Lidereau R, Sinilnikova OM and Mazoyer S (2008) Unclassified variants identified in BRCA1 exon 11: Consequences on splicing. Genes Chromosomes Cancer 47, 418–426.

72. Ng W, Loh AX, Teixeira AS, Pereira SP and Swallow DM (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br J Cancer 99, 978–985.

73. Baala L, Romano S, Khaddour R, Saunier S, Smith UM, Audollent S, Ozilou C, Faivre L, Laurent N, Foliguet B et al. (2007) The Meckel-Gruber syndrome gene, MKS3, is mutated in Joubert syndrome. Am J Hum Genet 80, 186–194.

74. Habara Y, Doshita M, Hirozawa S, Yokono Y, Yagi M, Takeshima Y and Matsuo M (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J Biochem 143, 303–310.

75. Aartsma-Rus A, van Vliet L, Hirschi M, Janson AA, Heemskerk H, de Winter CL, de Kimpe S, van Deutekom JC, ’t Hoen PA and van Ommen GJ (2008) Guidelines for Antisense Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Mol Ther 17, 548–553.

76. Khan SG, Metin A, Gozukara E, Inui H, Shahlavi T, Muniz-Medina V, Baker CC, Ueda T, Aiken JR, Schneider TD et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum Mol Genet 13, 343–352.

77. Sharp PA and Burge CB (1997) Classification of introns: U2-type or U12-type. Cell 91, 875–879.

78. Chasin LA (2007) Searching for splicing motifs. Adv Exp Med Biol 623, 85–106.

79. Nalla VK and Rogan PK (2005) Automated splicing mutation analysis by information theory. Hum Mutat 25, 334–342.

80. Beroud C, Tuffery-Giraud S, Matsuo M, Hamroun D, Humbertclaude V, Monnier N, Moizard MP, Voelckel MA, Calemard LM, Boisseau P et al. (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy. Hum Mutat 28, 196–202.

81. (2007) What is the human variome project? Nat Genet 39, 423.

82. Kainulainen K, Karttunen L, Puhakka L, Sakai L and Peltonen L (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat Genet 6, 64–69.

83. Liu W, Qian C, Comeau K, Brenn T, Furthmayr H and Francke U (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe marfan syndrome. Hum Mol Genet 5, 1581–1587.

84. Booms P, Cisler J, Mathews KR, Godfrey M, Tiecke F, Kaufmann UC, Vetter U, Hagemeier C and Robinson PN (1999) Novel exon skipping mutation in the fibrillin-1 gene: two ‘hot spots’ for the neonatal Marfan syndrome. Clin Genet 55, 110–117.

85. Wang M, Price C, Han J, Cisler J, Imaizumi K, Van Thienen MN, DePaepe A and Godfrey M (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum Mol Genet 4, 607–613.

86. Godfrey M, Vandemark N, Wang M, Velinov M, Wargowski D, Tsipouras P, Han J, Becker J, Robertson W, Droste S et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am J Hum Genet 53, 472–480.

87. Wang M, Clericuzio CL and Godfrey M (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am J Hum Genet 59, 1027–1034.

88. Karttunen L, Ukkonen T, Kainulainen K, Syvanen AC and Peltonen L (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum Mutat Suppl 1, S34–S37.

89. Kosaki K, Takahashi D, Udaka T, Kosaki R, Matsumoto M, Ibe S, Isobe T, Tanaka Y and Takahashi T (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am J Med Genet A 140, 104–108; author reply 109–110.

90. Loeys BL, Schwarze U, Holm T, Callewaert BL, Thomas GH, Pannu H, De Backer JF, Oswald GL, Symoens S, Manouvrier S et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N Engl J Med 355, 788–798.

91. Tran VK, Takeshima Y, Zhang Z, Habara Y, Haginoya K, Nishiyama A, Yagi M and Matsuo M (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes, but not in the skeletal muscle of a DMD patient. Hum Genet 120, 737–742.

92. Sharp A, Pichert G, Lucassen A and Eccles D (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum Mutat 24, 272.

93. Burrows NP, Nicholls AC, Richards AJ, Luccarini C, Harrison JB, Yates JR and Pope FM (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am J Hum Genet 63, 390–398.

94. Sinnreich M, Therrien C and Karpati G (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology 66, 1114–1116.


95. Maslen C, Babcock D, Raghunath M and Steinmann B (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am J Hum Genet 60, 1389–1398.

96. Vivenza D, Guazzarotti L, Godi M, Frasca D, di Natale B, Momigliano-Richiardi P, Bona G and Giordano M (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J Clin Endocrinol Metab 91, 980–986.

97. Chavanas S, Gache Y, Vailly J, Kanitakis J, Pulkkinen L, Uitto J, Ortonne J and Meneguzzi G (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum Mol Genet 8, 2097–2105.

98. Kuivenhoven JA, Weibusch H, Pritchard PH, Funke H, Benne R, Assmann G and Kastelein JJ (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J Clin Invest 98, 358–364.

99. Webb JC, Patel DD, Shoulders CC, Knight BL and Soutar AK (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum Mol Genet 5, 1325–1331.

100. Di Leo E, Panico F, Tarugi P, Battisti C, Federico A and Calandra S (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum Mutat 24, 440.

101. Vuillaumier-Barrot S, Le Bizec C, De Lonlay P, Madinier-Chappat N, Barnier A, Dupre T, Durand G and Seta N (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol Genet Metab 87, 337–340.

102. Janssen RJ, Wevers RA, Haussler M, Luyten JA, Steenbergen-Spanjers GC, Hoffmann GF, Nagatsu T and Van den Heuvel LP (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann Hum Genet 64, 375–382.

103. Mayer K, Ballhausen W, Leistner W and Rott H (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim Biophys Acta 1502, 495–507.


c.5081G>T (BRCA2), c.557A>T and c.910C>T (NF1); (ii) creation of one or more ESS without disruption of an ESE, as shown for mutations c.544A>G and c.793C>T (MLH1), c.4250T>A (DMD) and c.6792C>G (NF1); and (iii) an intermediate situation where both the disruption of one or more ESE and the creation of one or more ESS were predicted. This was observed for mutations c.5080G>T (BRCA1), c.794G>A and c.988_990del (MLH1), c.815C>T and c.274_276del (MSH2), c.2230C>T (LAMA2), c.943C>T, c.1007G>A and c.5719G>T (NF1). In order to evaluate the potential to differentiate ‘true’ ESE or ESS motifs from false positive signals, we selected a second set of 36 mutations (positive controls) and 220 SNPs (negative controls) (Supplementary Table 1). Predictions were classified in three categories: disruption of ESE motifs only (ESE), creation of ESS motifs only (ESS) or both (ESE+ESS). In addition, results were classified in two subsets: a first one (‘All’), which included all predicted motifs, and a second one (‘Best’), which was restricted to only one motif for each case by selecting the one recognized by the highest number of matrices.
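The classification into three categories and two subsets can be illustrated with a short Python sketch. It is not HSF code: the `MotifChange` record, the `categorize` and `best_subset` functions and the matrix names attached to each example motif are hypothetical, and ties in the ‘Best’ selection are resolved by keeping the first motif, a rule the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifChange:
    kind: str            # "ESE_disrupted" or "ESS_created"
    position: int        # cDNA position of the motif start (illustrative)
    matrices: frozenset  # names of the matrices reporting this change


def categorize(changes):
    """Return 'ESE', 'ESS', 'ESE+ESS', or None when nothing is predicted."""
    kinds = {c.kind for c in changes}
    ese = "ESE_disrupted" in kinds
    ess = "ESS_created" in kinds
    if ese and ess:
        return "ESE+ESS"
    if ese:
        return "ESE"
    if ess:
        return "ESS"
    return None


def best_subset(changes):
    """Keep only the change recognized by the highest number of matrices."""
    if not changes:
        return []
    # max() returns the first maximal element, so ties keep the earliest motif.
    return [max(changes, key=lambda c: len(c.matrices))]


# Hypothetical prediction for a single mutation
changes = [
    MotifChange("ESE_disrupted", 1007, frozenset({"ESE-Finder", "PESE", "RESCUE-ESE"})),
    MotifChange("ESS_created", 1006, frozenset({"hnRNPA1"})),
]
print(categorize(changes))               # 'All' subset  -> ESE+ESS
print(categorize(best_subset(changes)))  # 'Best' subset -> ESE
```

The example shows why the two subsets can disagree: a mutation counted as ESE+ESS in the ‘All’ subset may fall into the ESE category once only the best-supported motif is kept.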

Comparison of the three categories (ESE, ESS and ESE+ESS) revealed a significant difference between positive and negative controls both in the ‘All’ (χ² = 10.05, P = 0.00656) and the ‘Best’ subset (χ² = 11.75, P = 0.0028). We then evaluated the potential of each matrix to differentiate true from false positive signals. No statistical differences were found using the Sironi, PESS, IIE, hnRNPA1 and RESCUE-ESE matrices. A statistically significant difference was found for the ‘All’ subset (χ² = 3.99, P = 0.045) but not for the ‘Best’ subset (χ² = 2.47, P = 0.116) with the EIE matrix. Significant results in both subsets were obtained with ESE-Finder (‘All’ subset: χ² = 5.17, P = 0.023; ‘Best’ subset: χ² = 7.33, P = 0.0067), the 9G8 and Tra2-β matrices from HSF (‘All’ subset: χ² = 9.92, P = 0.00164; ‘Best’ subset: χ² = 9.86, P = 0.00169) and PESE (‘All’ subset: χ² = 19.52, P = 9.95 × 10⁻⁶; ‘Best’ subset: χ² = 13.52, P = 2.36 × 10⁻⁴). The positive (PPV) and negative (NPV) predictive values, as well as the sensitivity (Sv) and the specificity (Sp), of these last three matrices were then evaluated. PPV ranged from 0.22 (9G8 and Tra2-β) to 0.56 (PESE), NPV from 0.76 (PESE) to 0.95 (9G8 and Tra2-β), Sv from 0.27 (PESE) to 0.40 (9G8 and Tra2-β) and Sp from 0.88 (9G8 and Tra2-β) to 0.91 (PESE). The ESE-Finder matrix showed intermediate values in all cases.
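For reference, the four statistics can be computed from a 2 × 2 confusion table as in the following minimal sketch. The counts are invented for illustration and are not the counts behind the values reported above; the function name is hypothetical.

```python
def predictive_values(tp, fp, tn, fn):
    """Return PPV, NPV, sensitivity (Sv) and specificity (Sp).

    tp/fn: positive controls with/without a predicted motif change,
    fp/tn: negative controls with/without a predicted motif change.
    Zero denominators are not handled in this sketch.
    """
    return {
        "PPV": tp / (tp + fp),  # predicted positives that are true positives
        "NPV": tn / (tn + fn),  # predicted negatives that are true negatives
        "Sv": tp / (tp + fn),   # positive controls that are detected
        "Sp": tn / (tn + fp),   # negative controls that are correctly rejected
    }


# Hypothetical counts: 50 positive controls and 100 negative controls
values = predictive_values(tp=30, fp=10, tn=90, fn=20)
print(values["PPV"], values["Sv"], values["Sp"])  # 0.75 0.6 0.9
```

With strongly unbalanced control sets, as with 36 mutations against 220 SNPs, PPV and NPV depend on the ratio of positives to negatives, whereas Sv and Sp do not.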

DISCUSSION

During evolution from simple to higher eukaryotes, splicing signals evolved from well-defined motifs to degenerated sequences with the addition of new auxiliary splicing

Table 4. Continued

Gene | Mutation | Ref. | Motif | Ref. Seq. | HSF prediction
NF1 | c.1007G>A | (66) | ESE | ENST00000356175 | +PESE^g (1007_1014); −EIE^h ×2 (1003_1011); +9G8^i (1006_1011); +ESE^f (1007_1014); −ESS^a ×2 (1003_1011); −IIE^c ×4 (1003_1011); +hnRNPA1^d (1006_1011)
NF1 | c.5719G>T | (66) | ESE | ENST00000356175 | −ESE^f ×5 (5715_5724); −EIE^h ×5 (5715_5724); −ESS^a ×2 (5714_5725); +PESS^g ×2 (5712_5720); +hnRNPA1^d (5719_5724)
NF1 | c.6792C>A | (66) | ESE | ENST00000356175 | +ESE^f ×5 (6792_6797); −EIE^h ×2 (6788_6793) (6790_6795); +Tra2-β^i (6791_6795); −ESS^a ×2 (6787_6794) (6792_6799)
NF1 | c.6792C>G | (66) | ESE | ENST00000356175 | +ESE^f (6792_6797); −EIE^h ×2 (6788_6793) (6790_6795); +Tra2-β^i (6791_6795); −ESS^a ×2 (6787_6794) (6792_6799); +hnRNPA1^d (6790_6795)

+, a new site was created by the mutation; −, the motif was abolished by the mutation. Algorithms and matrices used to identify the motifs were: ^a Silencer motifs from Sironi et al. (31); ^b PESS octamers (28); ^c IIEs (30); ^d hnRNP motifs from HSF; ^e ESE Finder matrices (19); ^f RESCUE ESE hexamers (63); ^g PESE octamers (28); ^h EIEs (30); ^i ESE motifs from HSF. When multiple adjacent sites were predicted, the number of sites is indicated: ×5 means that five adjacent sites were modified by the mutation. Nucleotide numbering reflects the reference cDNA sequence, with +1 corresponding to the A of the ATG translation initiation codon.


sequences known as ESE and ESS. Although major SR proteins have been cloned and their target sites determined, much work remains to be done to understand how splice signals are recognized and splicing specificity achieved. As this complex world is progressively revealed, bioinformatics resources could play a major role in helping researchers and diagnostic laboratories to evaluate the consequence of mutations on splicing, especially because most genetic tests use DNA and not RNA samples. By giving easy access to predictions of 5′ss, 3′ss and BP sequences as well as ESE and ESS, the HSF tool (http://www.umd.be/HSF) fulfills this need and may assist clinicians, geneticists and researchers (70–75). By combining motifs identified with different experimental and computational approaches, it provides a common interface that can be used for sequence analysis. The inclusion of all exons and introns extracted from the Ensembl human genome database (20) allows easy access to any sequence of human genes and thus direct comparison of virtually every mutation or SNP concerning splicing elements. Since SNPs are present at a very high frequency in the genome (1/300 bp), it could be useful to evaluate their impact in association with a mutation. We therefore included in HSF data from dbSNP using Ensembl BioMart. The user can select the ‘Search for SNPs related to the analyzed sequence’ option, which automatically retrieves SNPs from the database. When SNPs are localized in exons, their effect on ESE and ESS motifs could help the user to better evaluate the consequence of a given mutation.

To evaluate the efficiency of the various algorithms included in HSF and its contribution to the prediction of the consequences of mutations associated with a splicing defect, we used a set of 69 intronic mutations that disrupt the 5′ss or the 3′ss and result in exon skipping and/or activation of a cryptic splice site (Table 1), and a group of 15 mutations that were previously reported to result in splicing defects by creating or activating cryptic splice sites (Table 2). HSF was able to correctly predict the disruption of the natural splice sites. Moreover, we could confirm that: (i) mutations of the last nucleotide of an exon have a strong effect on the 5′ss (ΔCV = −12 ± 0.7%), resulting frequently in exon skipping, or in partial exonic deletion or intronic retention due to activation of a cryptic splice site; (ii) mutations of the penultimate exonic nucleotide have limited consequences on the 5′ss (ΔCV = −5.4 ± 0.3%), but they can activate a cryptic splice site, making predictions more difficult; (iii) exonic mutations distant from the 5′ and 3′ss can activate a cryptic splice site, leading to partial exonic deletion. Overall, these findings underline the efficiency of the HSF algorithm in predicting the effect of mutations on 5′ and 3′ss. When using the HSF algorithm, the threshold for 5′ and 3′ss is 65, with a pathogenic ΔCV of −10%, except for position +4 where it is −7%. However, in the few cases where unusual splice sites are used, this algorithm could be less efficient.

BP sequences represent another essential splicing signal. When a mutation is localized upstream (5′) of the 3′ss, its potential effect on a BP sequence should be examined, especially when a nucleotide located at less than 85 bp from the 3′ss is targeted. In order to evaluate the HSF algorithm dedicated to the identification of BP sequences, we used 14 BP sequences inactivated by intronic mutations (Table 3). HSF correctly predicted 13 out of 14 BPs, and these data allowed us to define the threshold for BP detection at 67 and the pathogenic ΔBP at −10%. Moreover, for intron 3 of XPC, HSF predicted a BP at position −24. However, according to Khan et al. (76), two BP sequences are present in this intron, one at position −24 and another at −4. HSF could not predict the BP at position −4 simply because the HSF algorithm excludes positions −12 to −1 for BP identification, because of steric obstruction caused by the spliceosome.
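The decision rules quoted in the two preceding paragraphs can be summarized in a small Python sketch. It assumes that the variation (ΔCV or ΔBP) is the relative change of the score, in percent, between the wild-type and the mutant sequence, which the text does not define here. The function and constant names are hypothetical, and the sketch does not reproduce the HSF position weight matrices themselves.

```python
SS_THRESHOLD = 65.0          # minimal consensus value (CV) for a 5' or 3' splice site
SS_DELTA = -10.0             # pathogenic CV variation (%), general case
SS_DELTA_PLUS4 = -7.0        # pathogenic CV variation (%) at position +4
BP_THRESHOLD = 67.0          # minimal score for branch point detection
BP_DELTA = -10.0             # pathogenic branch point variation (%)
BP_EXCLUDED = range(-12, 0)  # intronic positions -12 to -1 are not scored


def variation(wild_type, mutant):
    """Relative change of a score in percent (assumed definition)."""
    return 100.0 * (mutant - wild_type) / wild_type


def splice_site_broken(wt_cv, mut_cv, position=None):
    """True if a wild-type site above threshold loses enough strength."""
    if wt_cv < SS_THRESHOLD:
        return False  # not considered a splice site in the first place
    cutoff = SS_DELTA_PLUS4 if position == 4 else SS_DELTA
    return variation(wt_cv, mut_cv) <= cutoff


def branch_point_broken(wt_score, mut_score, position):
    """True if a detectable branch point is predicted to be inactivated."""
    if position in BP_EXCLUDED:
        return False  # excluded zone: no prediction is made
    if wt_score < BP_THRESHOLD:
        return False
    return variation(wt_score, mut_score) <= BP_DELTA


print(splice_site_broken(90.0, 80.0))               # True  (about -11.1%)
print(splice_site_broken(90.0, 83.0))               # False (about -7.8%)
print(splice_site_broken(90.0, 83.0, position=4))   # True  (cutoff is -7% at +4)
print(branch_point_broken(80.0, 60.0, position=-4)) # False (excluded zone)
```

The last call mirrors the XPC case: a branch point at −4 lies inside the excluded zone, so no prediction is returned for it regardless of the scores.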

It has been demonstrated that two different splicing recognition mechanisms, correlated with intron length, can be used in a cell: exon definition for long introns and intron definition for short introns (77). Although the influence of intron length seems to be less important in humans than in other species, it should nevertheless be kept in mind, since U12- and U2-type introns have different BP consensus sequences. In the present version of HSF (v2.4), we only focused on U2-type introns, which are by far the most abundant type in mammalian cells.

Concerning cis-acting elements, much work has been done to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets only share partial homology. To solve this problem, HSF includes all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNP A1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and therefore the identification of the specific motif/protein pair involved in a given splicing defect is difficult. This is even more complicated when considering the impact of SR protein and ribonucleoprotein concentrations in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the ability of HSF to correctly predict the motifs disrupted by these mutations. We showed that when the motif/protein pairs had been previously experimentally characterized (hnRNP A1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems) and therefore the motif/protein pair is unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate the ability of HSF to discriminate true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE+ESS; χ² = 11.75; P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33; P = 0.0067), 9G8 and Tra2-β from HSF (χ² = 9.86; P = 0.0017) and PESE (χ² = 13.52; P = 2.36 × 10⁻⁴). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls.
Indeed, only a few experimental validations of auxiliary sequences are available, and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2-β HSF matrices gave stronger results than ESE-Finder itself and can therefore be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNP A1 matrix, should also be considered, as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif prediction, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address tissue or developmental specificity.
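As a consistency check on the statistics quoted above, the P-values can be recomputed from the χ² values, read here as 11.75, 7.33, 9.86 and 13.52. The sketch below is illustrative only: the degrees of freedom (2 for the three-category overall pattern, 1 for each single matrix) are an assumption, since the text does not state them, and `chi2_upper_tail` is a hypothetical helper limited to the two cases that have a closed form.

```python
import math


def chi2_upper_tail(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic for df = 1 or 2.

    Closed forms: df = 1 gives erfc(sqrt(x / 2)); df = 2 gives exp(-x / 2).
    Other degrees of freedom are deliberately not handled in this sketch.
    """
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")


# (chi-square, assumed degrees of freedom, P-value quoted in the text)
REPORTED = {
    "overall pattern (ESE, ESS, ESE+ESS)": (11.75, 2, 0.0028),
    "ESE-Finder": (7.33, 1, 0.0067),
    "9G8 and Tra2-beta (HSF)": (9.86, 1, 0.0017),
    "PESE": (13.52, 1, 2.36e-4),
}

for name, (stat, df, quoted) in REPORTED.items():
    recomputed = chi2_upper_tail(stat, df)
    print(f"{name}: chi2 = {stat}, df = {df}, "
          f"P = {recomputed:.2e} (quoted {quoted:.2e})")
```

If the assumed degrees of freedom are right, the recomputed values should agree with the quoted ones to within rounding.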

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5′ and 3′ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and for applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies, such as the exon-skipping approach in Duchenne Muscular Dystrophy, are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world, especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects that aim either at developing a holistic view of Genotype-To-Phenotype data (GEN2PHEN European project, http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81), might benefit from using HSF. Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754 (the GEN2PHEN project); the European Community Sixth Framework Program (FP6) under grant agreement number 036825 (TREAT-NMD Network of Excellence). Funding for open access charge: Institut National de la Santé Et de la Recherche Médicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget SM, Moore C and Sharp PA (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci USA, 74, 3171–3175.

2. Nilsen TW (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

3. Zhou Z, Licklider LJ, Gygi SP and Reed R (2002) Comprehensive proteomic analysis of the human spliceosome. Nature, 419, 182–185.

4. Breitbart RE, Nguyen HT, Medford RM, Destree AT, Mahdavi V and Nadal-Ginard B (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell, 41, 67–82.

5. Maniatis T and Tasic B (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.

6. Cartegni L, Chew SL and Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet, 3, 285–298.

7. Robberson BL, Cote GJ and Berget SM (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol Cell Biol, 10, 84–94.

8. Jacob M and Gallinaro H (1989) The 5′ splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res, 17, 2159–2180.

9. Blencowe BJ (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci, 25, 106–110.

10. Zhu J, Mayeda A and Krainer AR (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol Cell, 8, 1351–1361.

11. Zhang XH, Leslie CS and Chasin LA (2005) Computational searches for splicing signals. Methods, 37, 292–305.

12. Bhasi A, Pandey RV, Utharasamy SP and Senapathy P (2007) EuSplice: A unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics, 23, 1815–1823.

13. Churbanov A, Rogozin IB, Deogun JS and Ali H (2006) Method of predicting splice sites based on signal interactions. Biol Direct, 1, 10.

14. Dunckley MG, Manoharan M, Villiet P, Eperon IC and Dickson G (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum Mol Genet, 7, 1083–1090.

15. Wilton SD and Fletcher S (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr Gene Ther, 5, 467–483.

16. Beroud C, Hamroun D, Collod-Beroud G, Boileau C, Soussi T and Claustres M (2005) UMD (Universal Mutation Database): 2005 update. Hum Mutat, 26, 184–191.

17. Beroud C, Collod-Beroud G, Boileau C, Soussi T and Junien C (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum Mutat, 15, 86–94.

18. Fairbrother WG, Yeo GW, Yeh R, Goldstein P, Mawson M, Sharp PA and Burge CB (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res, 32, W187–W190.

19. Cartegni L, Wang J, Zhu Z, Zhang MQ and Krainer AR (2003) ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res, 31, 3568–3571.

20. Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T et al. (2008) Ensembl 2008. Nucleic Acids Res, 36, D707–D714.

21. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res, 36, D773–D779.


22. Shapiro MB and Senapathy P (1987) RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res, 15, 7155–7174.

23. Yeo G and Burge CB (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol, 11, 377–394.

24. Green MR (1991) Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. Annu Rev Cell Biol, 7, 559–599.

25. Gooding C, Clark F, Wollerton MC, Grellscheid SN, Groom H and Smith CW (2006) A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol, 7, R1.

26. Kol G, Lev-Maor G and Ast G (2005) Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum Mol Genet, 14, 1559–1568.

27. Smith PJ, Zhang C, Wang J, Chew SL, Zhang MQ and Krainer AR (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum Mol Genet, 15, 2490–2508.

28. Zhang XH and Chasin LA (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev, 18, 1241–1250.

29. Goren A, Ram O, Amit M, Keren H, Lev-Maor G, Vig I, Pupko T and Ast G (2006) Comparative analysis identifies exonic splicing regulatory sequences–The complex definition of enhancers and silencers. Mol Cell, 22, 769–781.

30. Zhang C, Li WH, Krainer AR and Zhang MQ (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc Natl Acad Sci USA, 105, 5797–5802.

31. Sironi M, Menozzi G, Riva L, Cagliani R, Comi GP, Bresolin N, Giorda R and Pozzoli U (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res, 32, 1783–1791.

32. Wang Z, Rolish ME, Yeo G, Tung V, Mawson M and Burge CB (2004) Systematic identification and analysis of exonic splicing silencers. Cell, 119, 831–845.

33. Thompson JD, Higgins DG and Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 22, 4673–4680.

34. Bailey TL, Williams N, Misleh C and Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res, 34, W369–W373.

35. Yuan B, Thomas JP, von Kodolitsch Y and Pyeritz RE (1999) Comparison of heteroduplex analysis, direct sequencing and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum Mutat, 14, 440–446.

36. Youil R, Toner TJ, Bull E, Bailey AL, Earl CD, Dietz HC and Montgomery RA (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum Mutat, 16, 92–93.

37. Schrijver I, Liu W, Odom R, Brenn T, Oefner P, Furthmayr H and Francke U (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am J Hum Genet, 71, 223–237.

38. Rommel K, Karck M, Haverich A, Schmidtke J and Arslan-Kirchner M (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum Mutat, 20, 406–407.

39. Park ES, Putnam EA, Chitayat D, Child A and Milewicz DM (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am J Med Genet, 78, 350–355.

40. Palz M, Tiecke F, Booms P, Goldner B, Rosenberg T, Fuchs J, Skovby F, Schumacher H, Kaufmann UC, von Kodolitsch Y et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am J Med Genet, 91, 212–221.

41. Nijbroek G, Sood S, McIntosh I, Francomano CA, Bull E, Pereira L, Ramirez F, Pyeritz RE and Dietz HC (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am J Hum Genet, 57, 8–21.

42. McGrory J and Cole WG (1999) Alternative splicing of exon 37 of FBN1 deletes part of an 'eight-cysteine' domain resulting in the Marfan syndrome. Clin Genet, 55, 118–121.

43. Loeys B, Nuytinck L, Delvaux I, De Bie S and De Paepe A (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch Intern Med, 161, 2447–2454.

44. Liu WO, Oefner PJ, Qian C, Odom RS and Francke U (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms and sequence variants in Marfan syndrome and related connective tissue disorders. Genet Test, 1, 237–242.

45. Hutchinson S, Wordsworth BP and Handford PA (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum Genet, 109, 416–420.

46. Halliday D, Hutchinson S, Kettle S, Firth H, Wordsworth P and Handford PA (1999) Molecular analysis of eight mutations in FBN1. Hum Genet, 105, 587–597.

47. Gupta PA, Wallis DD, Chin TO, Northrup H, Tran-Fadulu VT, Towbin JA and Milewicz DM (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J Med Genet, 41, e56.

48. Gupta PA, Putnam EA, Carmical SG, Kaitila I, Steinmann B, Child A, Danesino C, Metcalfe K, Berry SA, Chen E et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum Mutat, 19, 39–48.

49. Guo D, Tan FK, Cantu A, Plon SE and Milewicz DM (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am J Med Genet, 101, 130–134.

50. Dietz HC, McIntosh I, Sakai LY, Corson GM, Chalberg SC, Pyeritz RE and Francomano CA (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics, 17, 468–475.

51. Comeglio P, Johnson P, Arno G, Brice G, Evans A, Aragon-Martin J, da Silva FP, Kiotsekoglou A and Child A (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum Mutat, 28, 928.

52. Collod-Beroud G, Le Bourdelles S, Ades L, Ala-Kokko L, Booms P, Boxer M, Child A, Comeglio P, De Paepe A, Hyland JC et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum Mutat, 22, 199–208.

53. Chikumi H, Yamamoto T, Ohta Y, Nanba E, Nagata K, Ninomiya H, Narasaki K, Katoh T, Hisatome I, Ono K et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J Hum Genet, 45, 115–118.

54. Biggin A, Holman K, Brett M, Bennetts B and Ades L (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum Mutat, 23, 99.

55. Attanasio M, Lapini I, Evangelisti L, Lucarini L, Giusti B, Porciani M, Fattori R, Anichini C, Abbate R, Gensini G et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin Genet, 74, 39–46.

56. Loeys BL, Chen J, Neptune ER, Judge DP, Podowski M, Holm T, Meyers J, Leitch CC, Katsanis N, Sharifi N et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat Genet, 37, 275–281.

57. Houdayer C, Dehainault C, Mattler C, Michaux D, Caux-Moncoutier V, Pages-Berhouet S, d'Enghien CD, Lauge A, Castera L, Gauthier-Villars M et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum Mutat, 29, 975–982.

58. Tournier I, Vezain M, Martins A, Charbonnier F, Baert-Desurmont S, Olschwang S, Wang Q, Buisine MP, Soret J, Tazi J et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum Mutat, 29, 1412–1424.

59. Auclair J, Busine MP, Navarro C, Ruano E, Montmain G, Desseigne F, Saurin JC, Lasset C, Bonadona V, Giraud S et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum Mutat, 27, 145–154.

60. Di Blasi C, He Y, Morandi L, Cornelio F, Guicheney P and Mora M (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain, 124, 698–704.

61. Disset A, Bourgeois CF, Benmalek N, Claustres M, Stevenin J and Tuffery-Giraud S (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum Mol Genet, 15, 999–1013.

62. Fackenthal JD, Cartegni L, Krainer AR and Olopade OI (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am J Hum Genet, 71, 625–631.

63. Fairbrother WG, Yeh RF, Sharp PA and Burge CB (2002) Predictive identification of exonic splicing enhancers in human genes. Science, 297, 1007–1013.

64. Mazoyer S, Puget N, Perrin-Vidoz L, Lynch HT, Serova-Sinilnikova OM and Lenoir GM (1998) A BRCA1 nonsense mutation causes exon skipping. Am J Hum Genet, 62, 713–715.

65. Nielsen KB, Sorensen S, Cartegni L, Corydon TJ, Doktor TK, Schroeder LD, Reinert LS, Elpeleg O, Krainer AR, Gregersen N et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am J Hum Genet, 80, 416–432.

66. Zatkova A, Messiaen L, Vandenbroucke I, Wieser R, Fonatsch C, Krainer AR and Wimmer K (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum Mutat, 24, 491–501.

67. den Dunnen JT and Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat, 15, 7–12.

68. Frederic MY, Monino C, Marschall C, Hamroun D, Faivre L, Jondeau G, Klein HG, Neumann L, Gautier E, Binquet C et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2) and genotype-phenotype correlations. Hum Mutat, 30, 181–190.

69. Frederic MY, Hamroun D, Faivre L, Boileau C, Jondeau G, Claustres M, Beroud C and Collod-Beroud G (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum Mutat, 29, 33–38.

70. Frank V, Ortiz Bruchle N, Mager S, Frints SG, Bohring A, du Bois G, Debatin I, Seidel H, Senderek J, Besbas N et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum Mutat, 28, 638–639.

71. Anczukow O, Buisson M, Salles MJ, Triboulet S, Longy M, Lidereau R, Sinilnikova OM and Mazoyer S (2008) Unclassified variants identified in BRCA1 exon 11: Consequences on splicing. Genes Chromosomes Cancer, 47, 418–426.

72. Ng W, Loh AX, Teixeira AS, Pereira SP and Swallow DM (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br J Cancer, 99, 978–985.

73. Baala L, Romano S, Khaddour R, Saunier S, Smith UM, Audollent S, Ozilou C, Faivre L, Laurent N, Foliguet B et al. (2007) The Meckel-Gruber syndrome gene, MKS3, is mutated in Joubert syndrome. Am J Hum Genet, 80, 186–194.

74. Habara Y, Doshita M, Hirozawa S, Yokono Y, Yagi M, Takeshima Y and Matsuo M (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J Biochem, 143, 303–310.

75. Aartsma-Rus A, van Vliet L, Hirschi M, Janson AA, Heemskerk H, de Winter CL, de Kimpe S, van Deutekom JC, 't Hoen PA and van Ommen GJ (2008) Guidelines for Antisense Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Mol Ther, 17, 548–553.

76. Khan SG, Metin A, Gozukara E, Inui H, Shahlavi T, Muniz-Medina V, Baker CC, Ueda T, Aiken JR, Schneider TD et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum Mol Genet, 13, 343–352.

77. Sharp PA and Burge CB (1997) Classification of introns: U2-type or U12-type. Cell, 91, 875–879.

78. Chasin LA (2007) Searching for splicing motifs. Adv Exp Med Biol, 623, 85–106.

79. Nalla VK and Rogan PK (2005) Automated splicing mutation analysis by information theory. Hum Mutat, 25, 334–342.

80. Beroud C, Tuffery-Giraud S, Matsuo M, Hamroun D, Humbertclaude V, Monnier N, Moizard MP, Voelckel MA, Calemard LM, Boisseau P et al. (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy. Hum Mutat, 28, 196–202.

81. (2007) What is the human variome project? Nat Genet, 39, 423.

82. Kainulainen K, Karttunen L, Puhakka L, Sakai L and Peltonen L (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat Genet, 6, 64–69.

83. Liu W, Qian C, Comeau K, Brenn T, Furthmayr H and Francke U (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe Marfan syndrome. Hum Mol Genet, 5, 1581–1587.

84. Booms P, Cisler J, Mathews KR, Godfrey M, Tiecke F, Kaufmann UC, Vetter U, Hagemeier C and Robinson PN (1999) Novel exon skipping mutation in the fibrillin-1 gene: two 'hot spots' for the neonatal Marfan syndrome. Clin Genet, 55, 110–117.

85. Wang M, Price C, Han J, Cisler J, Imaizumi K, Van Thienen MN, DePaepe A and Godfrey M (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum Mol Genet, 4, 607–613.

86. Godfrey M, Vandemark N, Wang M, Velinov M, Wargowski D, Tsipouras P, Han J, Becker J, Robertson W, Droste S et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am J Hum Genet, 53, 472–480.

87. Wang M, Clericuzio CL and Godfrey M (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am J Hum Genet, 59, 1027–1034.

88. Karttunen L, Ukkonen T, Kainulainen K, Syvanen AC and Peltonen L (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum Mutat, Suppl 1, S34–S37.

89. Kosaki K, Takahashi D, Udaka T, Kosaki R, Matsumoto M, Ibe S, Isobe T, Tanaka Y and Takahashi T (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am J Med Genet A, 140, 104–108; author reply 109–110.

90. Loeys BL, Schwarze U, Holm T, Callewaert BL, Thomas GH, Pannu H, De Backer JF, Oswald GL, Symoens S, Manouvrier S et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N Engl J Med, 355, 788–798.

91. Tran VK, Takeshima Y, Zhang Z, Habara Y, Haginoya K, Nishiyama A, Yagi M and Matsuo M (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes, but not in the skeletal muscle of a DMD patient. Hum Genet, 120, 737–742.

92. Sharp A, Pichert G, Lucassen A and Eccles D (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum Mutat, 24, 272.

93. Burrows NP, Nicholls AC, Richards AJ, Luccarini C, Harrison JB, Yates JR and Pope FM (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am J Hum Genet, 63, 390–398.

94. Sinnreich M, Therrien C and Karpati G (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology, 66, 1114–1116.


95. Maslen C, Babcock D, Raghunath M and Steinmann B (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am J Hum Genet, 60, 1389–1398.

96. Vivenza D, Guazzarotti L, Godi M, Frasca D, di Natale B, Momigliano-Richiardi P, Bona G and Giordano M (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J Clin Endocrinol Metab, 91, 980–986.

97. Chavanas S, Gache Y, Vailly J, Kanitakis J, Pulkkinen L, Uitto J, Ortonne J and Meneguzzi G (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum Mol Genet, 8, 2097–2105.

98. Kuivenhoven JA, Weibusch H, Pritchard PH, Funke H, Benne R, Assmann G and Kastelein JJ (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J Clin Invest, 98, 358–364.

99. Webb JC, Patel DD, Shoulders CC, Knight BL and Soutar AK (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum Mol Genet, 5, 1325–1331.

100. Di Leo E, Panico F, Tarugi P, Battisti C, Federico A and Calandra S (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum Mutat, 24, 440.

101. Vuillaumier-Barrot S, Le Bizec C, De Lonlay P, Madinier-Chappat N, Barnier A, Dupre T, Durand G and Seta N (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol Genet Metab, 87, 337–340.

102. Janssen RJ, Wevers RA, Haussler M, Luyten JA, Steenbergen-Spanjers GC, Hoffmann GF, Nagatsu T and Van den Heuvel LP (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann Hum Genet, 64, 375–382.

103. Mayer K, Ballhausen W, Leistner W and Rott H (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim Biophys Acta, 1502, 495–507.


sequences known as ESE and ESS Although major SRproteins have been cloned and their target sites deter-mined much work remains to be done to understandhow splice signals are recognized and splicing specificityachieved As this complex world is progressively revealedbioinformatics resources could play a major role in help-ing researchers and diagnostic laboratories to evaluate theconsequence of mutations on splicing especially becausemost genetic tests use DNA and not RNA samplesBy giving an easy access to predictions of 50ss 30ss BPsequences as well as ESE and ESS the HSF tool (httpwwwumdbeHSF) fulfills this need and may assist clin-icians geneticists and researchers (70ndash75) By combiningmotifs identified with different experimental and computa-tional approaches it provides a common interface thatcan be used for sequence analysis The inclusion of allexons and introns extracted from the Ensembl humangenome database (20) allows an easy access to anysequence of human genes and thus direct comparison ofvirtually every mutation or SNP concerning splicing ele-ments Since SNPs are present at a very high frequency inthe genome (1300 bp) it could be useful to evaluate theirimpact in association with a mutation We thereforeincluded in HSF data from dbSNP using EnsemblBiomart The user can select the lsquoSearch for SNPs relatedto the analyzed sequencersquo option that automaticallyretrieves SNPs from the database When SNPs are loca-lized in exons their effect on ESE and ESS motifs couldhelp the user to better evaluate the consequence of a givenmutationTo evaluate the efficiency of the various algorithms

included in HSF and its contribution to the predictionof the consequences of mutations associated with a splic-ing defect we used a set of 69 intronic mutations thatdisrupt the 50ss or the 30ss and result in exon skippingandor activation of a cryptic splice site (Table 1) and agroup of 15 mutations that were previously reported toresult in splicing defects by creating or activating crypticsplice sites (Table 2) HSF was able to correctly predict thedisruption of the natural splice sites Moreover we couldconfirm that (i) mutations of the last nucleotide of an exonhave a strong effect on the 50ss (CV=12 07) result-ing frequently in exon skipping or partial exonic deletionor intronic retention due to activation of a cryptic splicesite (ii) mutations of the penultimate exonic nucleotidehave limited consequences on the 50ss (CV=54 03) but they can activate a cryptic splice site makingpredictions more difficult (iii) exonic mutations distantfrom the 50 and 30ss can activate a cryptic splice site lead-ing to partial exonic deletion Overall these findings under-line the efficiency of the HSF algorithm to predict theeffect of mutations on 50 and 30ss When using the HSFalgorithm the threshold for 50 and 30ss is 65 with a patho-genic CV of 10 except for position +4 where it is7 However in few cases when unusual splice sites areused this algorithm could be less efficientBP sequences represent another essential splicing signal

When a mutation is localized in proximity of the 50 of the30ss its potential effect on a BP sequence should be exam-ined especially when a nucleotide located at less than 85 bpfrom the 30ss is targeted In order to evaluate the HSF

algorithm dedicated to the identification of BP sequenceswe used 14 BP sequences inactivated by intronic mutations(Table 3) HSF correctly predicted 13 out of 14 BPs andthese data allowed us to define the threshold for BP detec-tion at 67 and the pathogenic BP at 10 Moreoverfor intron 3 of XPC HSF predicted a BP at position 24However according to Khan et al (76) two BP sequencesare present in this intron one at positions 24 andanother at ndash4 HSF could not predict the BP at position4 simply because the HSF algorithm excludes positions12 to 1 for BP identification because of steric obstruc-tion caused by the spliceosome

It has been demonstrated that two different splicing rec-ognition mechanisms correlated with intron length canbe used in a cell exon definition for long and exon defi-nition for short introns (77) Although the influence ofintron length seems to be less important in humans thanin other species it should nevertheless be kept in mindsince U12 and U2-type introns have different BP consen-sus sequences In the present version of HSF (v24) weonly focused on U2-type introns which are by far themost abundant type in mammalian cells

Concerning cis-acting elements, many studies have been performed to define ESE and ESS matrices based on bioinformatics or experimental approaches (78). However, due to technical and/or conceptual bias, the various sequence sets share only partial homology. To solve this problem, HSF included all available matrices in one place. In addition, we developed new matrices to predict ESE motifs for the 9G8 and Tra2-β SR proteins and ESS motifs for the hnRNP A1 ribonucleoprotein. ESE and ESS motifs frequently overlap, and therefore the identification of the specific motif/protein pair involved in a given splicing defect is difficult. This is even more complicated when considering the impact of SR and ribonucleoprotein concentration in different tissues or during development. We used a set of 20 exonic mutations known to influence splicing through ESE inactivation or ESS activation (Table 4) to evaluate the efficiency of HSF in correctly predicting the motifs disrupted by these mutations. We showed that when the motif/protein pairs had previously been experimentally characterized (hnRNP A1 or SF2/ASF), HSF was able to correctly predict the effects of the mutation on ESE and ESS. For most mutations, however, only the general mechanism was identified (i.e. the mutant sequence inhibits splicing in various in vitro reporter systems) and therefore the motif/protein couple is unknown. In these cases, HSF predicted the disruption of ESE motifs and/or the creation of ESS motifs (Table 4). In addition, to evaluate the efficiency of HSF in discriminating true from false positive signals, we used a second group of positive and negative controls (Supplementary Table 1). We showed that both sets could be discriminated on the basis of their overall pattern (ESE, ESS, ESE + ESS; χ² = 11.75, P = 0.0028). Three matrices also gave statistically significant results: ESE-Finder (χ² = 7.33, P = 0.0067), 9G8 and Tra2-β from HSF (χ² = 9.86, P = 0.0017) and PESE (χ² = 13.52, P = 2.36 × 10⁻⁴). Since these three matrices predict ESE motifs, these results could be associated with a bias towards the positive controls. Indeed, only a few experimental validations of auxiliary sequences are available, and they are frequently initiated by predictions of ESE motifs using ESE-Finder. PESE and the 9G8/Tra2-β HSF matrices gave stronger results than ESE-Finder itself and can therefore be considered efficient matrices for the identification of ESE motifs. However, predictions with other matrices, especially the hnRNP A1 matrix, should also be considered, as they could provide valuable information, as shown for the c.4250T>A mutation of DMD. We are still in the early days of ESE and ESS motif prediction, and further data are needed to select the best matrices and to define the rules for data interpretation, as most mutation sets used to validate prediction tools contain mainly mutations affecting splice sites (79). Major work is also needed to ultimately address tissue or developmental specificity.
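As a rough consistency check on the statistics quoted above, the χ² values can be converted to P-values directly. The sketch below is illustrative only. It reads the reported statistics as 11.75, 7.33, 9.86 and 13.52, and it assumes the degrees of freedom (2 for the three-category ESE/ESS/ESE + ESS pattern, 1 for each single-matrix comparison), which the text does not state. The function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square statistic (df = 1 or 2 only)."""
    if stat < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        # A chi-square variable with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # A chi-square variable with 2 df is exponential with mean 2.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 and df = 2")


# (label, chi-square, ASSUMED degrees of freedom, P-value reported in the text)
reported = [
    ("overall pattern", 11.75, 2, 0.0028),
    ("ESE-Finder", 7.33, 1, 0.0067),
    ("9G8 and Tra2-beta (HSF)", 9.86, 1, 0.0017),
    ("PESE", 13.52, 1, 2.36e-4),
]

for label, stat, df, p_reported in reported:
    p = chi2_sf(stat, df)
    print(f"{label}: chi2={stat}, df={df}, P={p:.2e} (reported {p_reported:.2e})")
```

Under these assumed degrees of freedom, the recomputed values fall close to the reported ones, which suggests the comparisons were of this form; it does not establish which test the authors actually ran.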

In conclusion, the HSF tool is dedicated to the prediction of splicing signals present in any human gene, using all available matrices to identify ESE and ESS and new matrices to evaluate 5′ and 3′ss and BPs. This tool is regularly updated to include new data from bioinformatics and experimental studies in order to improve predictions. Many users have already tested HSF and have stressed its value both for basic science (identification of splicing signals) and for applied research or diagnostics (prediction of the pathogenic consequences of a given mutation) (70–75). In addition, new genotype-based therapies, such as the exon-skipping approach in Duchenne Muscular Dystrophy, are currently being evaluated in clinical trials (international multi-center phase III clinical studies with PRO051 in patients with Duchenne Muscular Dystrophy; Prosensa company, http://prosensa.eu). HSF might represent a useful tool to identify key splicing sequences in different exons (75,80) and therefore to design antisense oligonucleotides to induce exon skipping. This approach is being actively evaluated throughout the world, especially by the TREAT-NMD European network (http://www.treat-nmd.eu/home.php).

Besides these gene-specific approaches, global projects that aim either at developing a holistic view of Genotype-To-Phenotype data (GEN2PHEN European project, http://www.gen2phen.org) or at improving health outcomes by facilitating the analysis of human genetic variation and its impact on human health, such as the Human Variome Project (81), might benefit from using HSF. Indeed, HSF could help to predict the theoretical impact on splicing of any sequence variation affecting a human gene.
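To illustrate the kind of comparison that underlies such a prediction, the sketch below scores a wild-type and a mutant sequence against a position weight matrix and reports the best motif score overlapping the variant in each. It is a generic toy example and not HSF's algorithm: the matrix `TOY_MATRIX`, its weights, the 0 to 100 rescaling and all function names are hypothetical.

```python
# Hypothetical 4-position weight matrix (consensus GATC); the weights are
# illustrative integers, not values taken from any published matrix.
TOY_MATRIX = [
    {"A": 1, "C": 1, "G": 7, "T": 1},
    {"A": 7, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 7},
    {"A": 1, "C": 7, "G": 1, "T": 1},
]


def motif_score(site, matrix):
    """Return a 0-100 score: weight sum rescaled between worst and best site."""
    if len(site) != len(matrix):
        raise ValueError("site and matrix must have the same length")
    total = sum(column[base] for base, column in zip(site, matrix))
    best = sum(max(column.values()) for column in matrix)
    worst = sum(min(column.values()) for column in matrix)
    return 100.0 * (total - worst) / (best - worst)


def best_overlapping_score(sequence, position, matrix):
    """Highest score among the windows that cover the 0-based position."""
    width = len(matrix)
    first = max(0, position - width + 1)
    last = min(position, len(sequence) - width)
    return max(motif_score(sequence[i:i + width], matrix)
               for i in range(first, last + 1))


def mutation_effect(sequence, position, alt, matrix):
    """Return (wild-type, mutant) best scores for a single-base substitution."""
    mutant = sequence[:position] + alt + sequence[position + 1:]
    return (best_overlapping_score(sequence, position, matrix),
            best_overlapping_score(mutant, position, matrix))


wild_type, mutant = mutation_effect("TTGATCAA", 5, "T", TOY_MATRIX)
print(f"wild-type: {wild_type:.1f}, mutant: {mutant:.1f}")
```

In this toy case the substitution lowers the best overlapping score, the pattern one would read as a weakened motif. Real matrices, score definitions and thresholds would have to come from the tool itself.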

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

European Community Seventh Framework Program (FP7/2007-2013) under grant agreement number 200754 (the GEN2PHEN project); the European Community Sixth Framework Program (FP6) under grant agreement number 036825 (TREAT-NMD Network of Excellence). Funding for open access charge: Institut National de la Santé et de la Recherche Médicale (INSERM).

Conflict of interest statement. None declared.

REFERENCES

1. Berget,S.M., Moore,C. and Sharp,P.A. (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc. Natl Acad. Sci. USA, 74, 3171–3175.

2. Nilsen,T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

3. Zhou,Z., Licklider,L.J., Gygi,S.P. and Reed,R. (2002) Comprehensive proteomic analysis of the human spliceosome. Nature, 419, 182–185.

4. Breitbart,R.E., Nguyen,H.T., Medford,R.M., Destree,A.T., Mahdavi,V. and Nadal-Ginard,B. (1985) Intricate combinatorial patterns of exon splicing generate multiple regulated troponin T isoforms from a single gene. Cell, 41, 67–82.

5. Maniatis,T. and Tasic,B. (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.

6. Cartegni,L., Chew,S.L. and Krainer,A.R. (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat. Rev. Genet., 3, 285–298.

7. Robberson,B.L., Cote,G.J. and Berget,S.M. (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol. Cell. Biol., 10, 84–94.

8. Jacob,M. and Gallinaro,H. (1989) The 5′ splice site: phylogenetic evolution and variable geometry of association with U1RNA. Nucleic Acids Res., 17, 2159–2180.

9. Blencowe,B.J. (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem. Sci., 25, 106–110.

10. Zhu,J., Mayeda,A. and Krainer,A.R. (2001) Exon identity established through differential antagonism between exonic splicing silencer-bound hnRNP A1 and enhancer-bound SR proteins. Mol. Cell, 8, 1351–1361.

11. Zhang,X.H., Leslie,C.S. and Chasin,L.A. (2005) Computational searches for splicing signals. Methods, 37, 292–305.

12. Bhasi,A., Pandey,R.V., Utharasamy,S.P. and Senapathy,P. (2007) EuSplice: A unified resource for the analysis of splice signals and alternative splicing in eukaryotic genes. Bioinformatics, 23, 1815–1823.

13. Churbanov,A., Rogozin,I.B., Deogun,J.S. and Ali,H. (2006) Method of predicting splice sites based on signal interactions. Biol. Direct, 1, 10.

14. Dunckley,M.G., Manoharan,M., Villiet,P., Eperon,I.C. and Dickson,G. (1998) Modification of splicing in the dystrophin gene in cultured Mdx muscle cells by antisense oligoribonucleotides. Hum. Mol. Genet., 7, 1083–1090.

15. Wilton,S.D. and Fletcher,S. (2005) RNA splicing manipulation: strategies to modify gene expression for a variety of therapeutic outcomes. Curr. Gene Ther., 5, 467–483.

16. Beroud,C., Hamroun,D., Collod-Beroud,G., Boileau,C., Soussi,T. and Claustres,M. (2005) UMD (Universal Mutation Database): 2005 update. Hum. Mutat., 26, 184–191.

17. Beroud,C., Collod-Beroud,G., Boileau,C., Soussi,T. and Junien,C. (2000) UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum. Mutat., 15, 86–94.

18. Fairbrother,W.G., Yeo,G.W., Yeh,R., Goldstein,P., Mawson,M., Sharp,P.A. and Burge,C.B. (2004) RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res., 32, W187–W190.

19. Cartegni,L., Wang,J., Zhu,Z., Zhang,M.Q. and Krainer,A.R. (2003) ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res., 31, 3568–3571.

20. Flicek,P., Aken,B.L., Beal,K., Ballester,B., Caccamo,M., Chen,Y., Clarke,L., Coates,G., Cunningham,F., Cutts,T. et al. (2008) Ensembl 2008. Nucleic Acids Res., 36, D707–D714.

21. Karolchik,D., Kuhn,R.M., Baertsch,R., Barber,G.P., Clawson,H., Diekhans,M., Giardine,B., Harte,R.A., Hinrichs,A.S., Hsu,F. et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res., 36, D773–D779.

22. Shapiro,M.B. and Senapathy,P. (1987) RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res., 15, 7155–7174.

23. Yeo,G. and Burge,C.B. (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol., 11, 377–394.

24. Green,M.R. (1991) Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. Annu. Rev. Cell Biol., 7, 559–599.

25. Gooding,C., Clark,F., Wollerton,M.C., Grellscheid,S.N., Groom,H. and Smith,C.W. (2006) A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol., 7, R1.

26. Kol,G., Lev-Maor,G. and Ast,G. (2005) Human-mouse comparative analysis reveals that branch-site plasticity contributes to splicing regulation. Hum. Mol. Genet., 14, 1559–1568.

27. Smith,P.J., Zhang,C., Wang,J., Chew,S.L., Zhang,M.Q. and Krainer,A.R. (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum. Mol. Genet., 15, 2490–2508.

28. Zhang,X.H. and Chasin,L.A. (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev., 18, 1241–1250.

29. Goren,A., Ram,O., Amit,M., Keren,H., Lev-Maor,G., Vig,I., Pupko,T. and Ast,G. (2006) Comparative analysis identifies exonic splicing regulatory sequences–The complex definition of enhancers and silencers. Mol. Cell, 22, 769–781.

30. Zhang,C., Li,W.H., Krainer,A.R. and Zhang,M.Q. (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc. Natl Acad. Sci. USA, 105, 5797–5802.

31. Sironi,M., Menozzi,G., Riva,L., Cagliani,R., Comi,G.P., Bresolin,N., Giorda,R. and Pozzoli,U. (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res., 32, 1783–1791.

32. Wang,Z., Rolish,M.E., Yeo,G., Tung,V., Mawson,M. and Burge,C.B. (2004) Systematic identification and analysis of exonic splicing silencers. Cell, 119, 831–845.

33. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

34. Bailey,T.L., Williams,N., Misleh,C. and Li,W.W. (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res., 34, W369–W373.

35. Yuan,B., Thomas,J.P., von Kodolitsch,Y. and Pyeritz,R.E. (1999) Comparison of heteroduplex analysis, direct sequencing, and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum. Mutat., 14, 440–446.

36. Youil,R., Toner,T.J., Bull,E., Bailey,A.L., Earl,C.D., Dietz,H.C. and Montgomery,R.A. (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum. Mutat., 16, 92–93.

37. Schrijver,I., Liu,W., Odom,R., Brenn,T., Oefner,P., Furthmayr,H. and Francke,U. (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am. J. Hum. Genet., 71, 223–237.

38. Rommel,K., Karck,M., Haverich,A., Schmidtke,J. and Arslan-Kirchner,M. (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum. Mutat., 20, 406–407.

39. Park,E.S., Putnam,E.A., Chitayat,D., Child,A. and Milewicz,D.M. (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am. J. Med. Genet., 78, 350–355.

40. Palz,M., Tiecke,F., Booms,P., Goldner,B., Rosenberg,T., Fuchs,J., Skovby,F., Schumacher,H., Kaufmann,U.C., von Kodolitsch,Y. et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am. J. Med. Genet., 91, 212–221.

41. Nijbroek,G., Sood,S., McIntosh,I., Francomano,C.A., Bull,E., Pereira,L., Ramirez,F., Pyeritz,R.E. and Dietz,H.C. (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am. J. Hum. Genet., 57, 8–21.

42. McGrory,J. and Cole,W.G. (1999) Alternative splicing of exon 37 of FBN1 deletes part of an 'eight-cysteine' domain resulting in the Marfan syndrome. Clin. Genet., 55, 118–121.

43. Loeys,B., Nuytinck,L., Delvaux,I., De Bie,S. and De Paepe,A. (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch. Intern. Med., 161, 2447–2454.

44. Liu,W.O., Oefner,P.J., Qian,C., Odom,R.S. and Francke,U. (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms, and sequence variants in Marfan syndrome and related connective tissue disorders. Genet. Test., 1, 237–242.

45. Hutchinson,S., Wordsworth,B.P. and Handford,P.A. (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum. Genet., 109, 416–420.

46. Halliday,D., Hutchinson,S., Kettle,S., Firth,H., Wordsworth,P. and Handford,P.A. (1999) Molecular analysis of eight mutations in FBN1. Hum. Genet., 105, 587–597.

47. Gupta,P.A., Wallis,D.D., Chin,T.O., Northrup,H., Tran-Fadulu,V.T., Towbin,J.A. and Milewicz,D.M. (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J. Med. Genet., 41, e56.

48. Gupta,P.A., Putnam,E.A., Carmical,S.G., Kaitila,I., Steinmann,B., Child,A., Danesino,C., Metcalfe,K., Berry,S.A., Chen,E. et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum. Mutat., 19, 39–48.

49. Guo,D., Tan,F.K., Cantu,A., Plon,S.E. and Milewicz,D.M. (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am. J. Med. Genet., 101, 130–134.

50. Dietz,H.C., McIntosh,I., Sakai,L.Y., Corson,G.M., Chalberg,S.C., Pyeritz,R.E. and Francomano,C.A. (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics, 17, 468–475.

51. Comeglio,P., Johnson,P., Arno,G., Brice,G., Evans,A., Aragon-Martin,J., da Silva,F.P., Kiotsekoglou,A. and Child,A. (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum. Mutat., 28, 928.

52. Collod-Beroud,G., Le Bourdelles,S., Ades,L., Ala-Kokko,L., Booms,P., Boxer,M., Child,A., Comeglio,P., De Paepe,A., Hyland,J.C. et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum. Mutat., 22, 199–208.

53. Chikumi,H., Yamamoto,T., Ohta,Y., Nanba,E., Nagata,K., Ninomiya,H., Narasaki,K., Katoh,T., Hisatome,I., Ono,K. et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J. Hum. Genet., 45, 115–118.

54. Biggin,A., Holman,K., Brett,M., Bennetts,B. and Ades,L. (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum. Mutat., 23, 99.

55. Attanasio,M., Lapini,I., Evangelisti,L., Lucarini,L., Giusti,B., Porciani,M., Fattori,R., Anichini,C., Abbate,R., Gensini,G. et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin. Genet., 74, 39–46.

56. Loeys,B.L., Chen,J., Neptune,E.R., Judge,D.P., Podowski,M., Holm,T., Meyers,J., Leitch,C.C., Katsanis,N., Sharifi,N. et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat. Genet., 37, 275–281.

57. Houdayer,C., Dehainault,C., Mattler,C., Michaux,D., Caux-Moncoutier,V., Pages-Berhouet,S., d'Enghien,C.D., Lauge,A., Castera,L., Gauthier-Villars,M. et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum. Mutat., 29, 975–982.

58. Tournier,I., Vezain,M., Martins,A., Charbonnier,F., Baert-Desurmont,S., Olschwang,S., Wang,Q., Buisine,M.P., Soret,J., Tazi,J. et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum. Mutat., 29, 1412–1424.

59. Auclair,J., Busine,M.P., Navarro,C., Ruano,E., Montmain,G., Desseigne,F., Saurin,J.C., Lasset,C., Bonadona,V., Giraud,S. et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum. Mutat., 27, 145–154.

60. Di Blasi,C., He,Y., Morandi,L., Cornelio,F., Guicheney,P. and Mora,M. (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain, 124, 698–704.

61. Disset,A., Bourgeois,C.F., Benmalek,N., Claustres,M., Stevenin,J. and Tuffery-Giraud,S. (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum. Mol. Genet., 15, 999–1013.

62. Fackenthal,J.D., Cartegni,L., Krainer,A.R. and Olopade,O.I. (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am. J. Hum. Genet., 71, 625–631.

63. Fairbrother,W.G., Yeh,R.F., Sharp,P.A. and Burge,C.B. (2002) Predictive identification of exonic splicing enhancers in human genes. Science, 297, 1007–1013.

64. Mazoyer,S., Puget,N., Perrin-Vidoz,L., Lynch,H.T., Serova-Sinilnikova,O.M. and Lenoir,G.M. (1998) A BRCA1 nonsense mutation causes exon skipping. Am. J. Hum. Genet., 62, 713–715.

65. Nielsen,K.B., Sorensen,S., Cartegni,L., Corydon,T.J., Doktor,T.K., Schroeder,L.D., Reinert,L.S., Elpeleg,O., Krainer,A.R., Gregersen,N. et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am. J. Hum. Genet., 80, 416–432.

66. Zatkova,A., Messiaen,L., Vandenbroucke,I., Wieser,R., Fonatsch,C., Krainer,A.R. and Wimmer,K. (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum. Mutat., 24, 491–501.

67. den Dunnen,J.T. and Antonarakis,S.E. (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mutat., 15, 7–12.

68. Frederic,M.Y., Monino,C., Marschall,C., Hamroun,D., Faivre,L., Jondeau,G., Klein,H.G., Neumann,L., Gautier,E., Binquet,C. et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2), and genotype-phenotype correlations. Hum. Mutat., 30, 181–190.

69. Frederic,M.Y., Hamroun,D., Faivre,L., Boileau,C., Jondeau,G., Claustres,M., Beroud,C. and Collod-Beroud,G. (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum. Mutat., 29, 33–38.

70. Frank,V., Ortiz Bruchle,N., Mager,S., Frints,S.G., Bohring,A., du Bois,G., Debatin,I., Seidel,H., Senderek,J., Besbas,N. et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum. Mutat., 28, 638–639.

71. Anczukow,O., Buisson,M., Salles,M.J., Triboulet,S., Longy,M., Lidereau,R., Sinilnikova,O.M. and Mazoyer,S. (2008) Unclassified variants identified in BRCA1 exon 11: Consequences on splicing. Genes Chromosomes Cancer, 47, 418–426.

72. Ng,W., Loh,A.X., Teixeira,A.S., Pereira,S.P. and Swallow,D.M. (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br. J. Cancer, 99, 978–985.

73. Baala,L., Romano,S., Khaddour,R., Saunier,S., Smith,U.M., Audollent,S., Ozilou,C., Faivre,L., Laurent,N., Foliguet,B. et al. (2007) The Meckel-Gruber syndrome gene, MKS3, is mutated in Joubert syndrome. Am. J. Hum. Genet., 80, 186–194.

74. Habara,Y., Doshita,M., Hirozawa,S., Yokono,Y., Yagi,M., Takeshima,Y. and Matsuo,M. (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J. Biochem., 143, 303–310.

75. Aartsma-Rus,A., van Vliet,L., Hirschi,M., Janson,A.A., Heemskerk,H., de Winter,C.L., de Kimpe,S., van Deutekom,J.C., 't Hoen,P.A. and van Ommen,G.J. (2008) Guidelines for Antisense Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Mol. Ther., 17, 548–553.

76. Khan,S.G., Metin,A., Gozukara,E., Inui,H., Shahlavi,T., Muniz-Medina,V., Baker,C.C., Ueda,T., Aiken,J.R., Schneider,T.D. et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum. Mol. Genet., 13, 343–352.

77. Sharp,P.A. and Burge,C.B. (1997) Classification of introns: U2-type or U12-type. Cell, 91, 875–879.

78. Chasin,L.A. (2007) Searching for splicing motifs. Adv. Exp. Med. Biol., 623, 85–106.

79. Nalla,V.K. and Rogan,P.K. (2005) Automated splicing mutation analysis by information theory. Hum. Mutat., 25, 334–342.

80. Beroud,C., Tuffery-Giraud,S., Matsuo,M., Hamroun,D., Humbertclaude,V., Monnier,N., Moizard,M.P., Voelckel,M.A., Calemard,L.M., Boisseau,P. et al. (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy. Hum. Mutat., 28, 196–202.

81. (2007) What is the human variome project? Nat. Genet., 39, 423.

82. Kainulainen,K., Karttunen,L., Puhakka,L., Sakai,L. and Peltonen,L. (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat. Genet., 6, 64–69.

83. Liu,W., Qian,C., Comeau,K., Brenn,T., Furthmayr,H. and Francke,U. (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe marfan syndrome. Hum. Mol. Genet., 5, 1581–1587.

84. Booms,P., Cisler,J., Mathews,K.R., Godfrey,M., Tiecke,F., Kaufmann,U.C., Vetter,U., Hagemeier,C. and Robinson,P.N. (1999) Novel exon skipping mutation in the fibrillin-1 gene: two 'hot spots' for the neonatal Marfan syndrome. Clin. Genet., 55, 110–117.

85. Wang,M., Price,C., Han,J., Cisler,J., Imaizumi,K., Van Thienen,M.N., DePaepe,A. and Godfrey,M. (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum. Mol. Genet., 4, 607–613.

86. Godfrey,M., Vandemark,N., Wang,M., Velinov,M., Wargowski,D., Tsipouras,P., Han,J., Becker,J., Robertson,W., Droste,S. et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am. J. Hum. Genet., 53, 472–480.

87. Wang,M., Clericuzio,C.L. and Godfrey,M. (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am. J. Hum. Genet., 59, 1027–1034.

88. Karttunen,L., Ukkonen,T., Kainulainen,K., Syvanen,A.C. and Peltonen,L. (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum. Mutat., Suppl. 1, S34–S37.

89. Kosaki,K., Takahashi,D., Udaka,T., Kosaki,R., Matsumoto,M., Ibe,S., Isobe,T., Tanaka,Y. and Takahashi,T. (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am. J. Med. Genet. A, 140, 104–108; author reply 109–110.

90. Loeys,B.L., Schwarze,U., Holm,T., Callewaert,B.L., Thomas,G.H., Pannu,H., De Backer,J.F., Oswald,G.L., Symoens,S., Manouvrier,S. et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N. Engl. J. Med., 355, 788–798.

91. Tran,V.K., Takeshima,Y., Zhang,Z., Habara,Y., Haginoya,K., Nishiyama,A., Yagi,M. and Matsuo,M. (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes but not in the skeletal muscle of a DMD patient. Hum. Genet., 120, 737–742.

92. Sharp,A., Pichert,G., Lucassen,A. and Eccles,D. (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum. Mutat., 24, 272.

93. Burrows,N.P., Nicholls,A.C., Richards,A.J., Luccarini,C., Harrison,J.B., Yates,J.R. and Pope,F.M. (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am. J. Hum. Genet., 63, 390–398.

94. Sinnreich,M., Therrien,C. and Karpati,G. (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology, 66, 1114–1116.

95. Maslen,C., Babcock,D., Raghunath,M. and Steinmann,B. (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am. J. Hum. Genet., 60, 1389–1398.

96. Vivenza,D., Guazzarotti,L., Godi,M., Frasca,D., di Natale,B., Momigliano-Richiardi,P., Bona,G. and Giordano,M. (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J. Clin. Endocrinol. Metab., 91, 980–986.

97. Chavanas,S., Gache,Y., Vailly,J., Kanitakis,J., Pulkkinen,L., Uitto,J., Ortonne,J. and Meneguzzi,G. (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum. Mol. Genet., 8, 2097–2105.

98. Kuivenhoven,J.A., Weibusch,H., Pritchard,P.H., Funke,H., Benne,R., Assmann,G. and Kastelein,J.J. (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J. Clin. Invest., 98, 358–364.

99. Webb,J.C., Patel,D.D., Shoulders,C.C., Knight,B.L. and Soutar,A.K. (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum. Mol. Genet., 5, 1325–1331.

100. Di Leo,E., Panico,F., Tarugi,P., Battisti,C., Federico,A. and Calandra,S. (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum. Mutat., 24, 440.

101. Vuillaumier-Barrot,S., Le Bizec,C., De Lonlay,P., Madinier-Chappat,N., Barnier,A., Dupre,T., Durand,G. and Seta,N. (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol. Genet. Metab., 87, 337–340.

102. Janssen,R.J., Wevers,R.A., Haussler,M., Luyten,J.A., Steenbergen-Spanjers,G.C., Hoffmann,G.F., Nagatsu,T. and Van den Heuvel,L.P. (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann. Hum. Genet., 64, 375–382.

103. Mayer,K., Ballhausen,W., Leistner,W. and Rott,H. (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim. Biophys. Acta, 1502, 495–507.

14 Nucleic Acids Research 2009

by guest on February 9 2016httpnaroxfordjournalsorg

Dow

nloaded from

available and they are frequently initiated by predictionsof ESE motifs using ESE-Finder PESE and the 9G8Tra2szlig HSF matrices gave stronger results than ESEFinder itself and therefore can be considered efficientmatrices for the identification of ESE motifs Howeverpredictions with other matrices especially the hnRNPA1matrix should also be considered as they could providevaluable information as shown for the c4250TgtA ofDMD We are still in the early days of ESE and ESSmotif predictions and further data are needed to selectthe best matrices and to define the rules for data interpre-tation as most mutation sets used to validate predictiontools contain mainly mutations affecting splice sites (79)Major work is also needed to ultimately address the tissueor developmental specificity

In conclusion the HSF tool is dedicated to the predic-tion of splicing signals present in any human gene using allavailable matrices to identify ESE and ESS and newmatrices to evaluate 50 and 30ss and BPs This tool is reg-ularly updated to include new data from bioinformaticsand experimental studies in order to improve predictionsMany users already have tested HSF and have stressed itsvalue both for basic science (identification of splicing sig-nals) and applied research or diagnostics (prediction of thepathogenic consequences of a given mutation) (70ndash75)In addition new genotype-based therapies such as theexon-skipping approach in Duchenne MuscularDystrophy are currently evaluated in clinical trials (inter-national multi-center phase III clinical studies withPRO051 in patients with Duchenne Muscular Dystrophyndash Prosensa company httpprosensaeu) HSF might rep-resent an useful tool to identify key splicing sequences indifferent exons (7580) and therefore to design antisenseoligonucleotides to induce exon skipping This approachis being actively evaluated throughout the world and espe-cially by the TREAT-NMD European network (httpwwwtreat-nmdeuhomephp)

Besides these gene-specific approaches global projectswhich either aim at developing a holistic view onGenotype-To-Phenotype data (GEN2PHEN Europeanprojects httpwwwgen2phenorg) or at improvinghealth outcomes by facilitating the analysis of humangenetic variation and its impact on human health suchas the Human Variome Project (81) might benefitfrom using HSF Indeed HSF could help to predict thetheoretical impact on splicing of any sequence variationaffecting a human gene

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

FUNDING

European Community Seventh Framework Program(FP72007-2013) under grant agreement number200754mdashthe GEN2PHEN project The EuropeanCommunity Sixth Framework Program (FP6) undergrant agreement number 036825 TREAT-NMDNetwork of Excellence Funding for open access

charge Institut National de la Sante Et de la RechercheMedicale (INSERM)

Conflict of interest statement None declared

REFERENCES

1 BergetSM MooreC and SharpPA (1977) Spliced segmentsat the 50 terminus of adenovirus 2 late mRNA Proc Natl Acad SciUSA 74 3171ndash3175

2 NilsenTW (2003) The spliceosome the most complexmacromolecular machine in the cell Bioessays 25 1147ndash1149

3 ZhouZ LickliderLJ GygiSP and ReedR (2002)Comprehensive proteomic analysis of the human spliceosomeNature 419 182ndash185

4 BreitbartRE NguyenHT MedfordRM DestreeATMahdaviV and Nadal-GinardB (1985) Intricate combinatorialpatterns of exon splicing generate multiple regulated troponin Tisoforms from a single gene Cell 41 67ndash82

5 ManiatisT and TasicB (2002) Alternative pre-mRNA splicing andproteome expansion in metazoans Nature 418 236ndash243

6 CartegniL ChewSL and KrainerAR (2002) Listening to silenceand understanding nonsense exonic mutations that affect splicingNat Rev Genet 3 285ndash298

7 RobbersonBL CoteGJ and BergetSM (1990) Exon definitionmay facilitate splice site selection in RNAs with multiple exonsMol Cell Biol 10 84ndash94

8 JacobM and GallinaroH (1989) The 50 splice site phylogeneticevolution and variable geometry of association with U1RNANucleic Acids Res 17 2159ndash2180

9 BlencoweBJ (2000) Exonic splicing enhancers mechanism ofaction diversity and role in human genetic diseases Trends BiochemSci 25 106ndash110

10 ZhuJ MayedaA and KrainerAR (2001) Exon identityestablished through differential antagonism between exonic splicingsilencer-bound hnRNP A1 and enhancer-bound SR proteinsMol Cell 8 1351ndash1361

11 ZhangXH LeslieCS and ChasinLA (2005) Computationalsearches for splicing signals Methods 37 292ndash305

12 BhasiA PandeyRV UtharasamySP and SenapathyP (2007)EuSplice A unified resource for the analysis of splice signalsand alternative splicing in eukaryotic genes Bioinformatics 231815ndash1823

13 ChurbanovA RogozinIB DeogunJS and AliH (2006)Method of predicting splice sites based on signal interactionsBiol Direct 1 10

14 DunckleyMG ManoharanM VillietP EperonIC andDicksonG (1998) Modification of splicing in the dystrophin genein cultured Mdx muscle cells by antisense oligoribonucleotidesHum Mol Genet 7 1083ndash1090

15 WiltonSD and FletcherS (2005) RNA splicing manipulationstrategies to modify gene expression for a variety of therapeuticoutcomes Curr Gene Ther 5 467ndash483

16 BeroudC HamrounD Collod-BeroudG BoileauC SoussiTand ClaustresM (2005) UMD (Universal Mutation Database)2005 update Hum Mutat 26 184ndash191

17 BeroudC Collod-BeroudG BoileauC SoussiT and JunienC(2000) UMD (Universal mutation database) a generic software tobuild and analyze locus-specific databases Hum Mutat 15 86ndash94

18 FairbrotherWG YeoGW YehR GoldsteinP MawsonMSharpPA and BurgeCB (2004) RESCUE-ESE identifies candi-date exonic splicing enhancers in vertebrate exons Nucleic AcidsRes 32 W187ndashW190

19 CartegniL WangJ ZhuZ ZhangMQ and KrainerAR (2003)ESEfinder A web resource to identify exonic splicing enhancersNucleic Acids Res 31 3568ndash3571

20 FlicekP AkenBL BealK BallesterB CaccamoM ChenYClarkeL CoatesG CunninghamF CuttsT et al (2008)Ensembl 2008 Nucleic Acids Res 36 D707ndashD714

21 KarolchikD KuhnRM BaertschR BarberGP ClawsonHDiekhansM GiardineB HarteRA HinrichsAS HsuF et al(2008) The UCSC Genome Browser Database 2008 update NucleicAcids Res 36 D773ndashD779

Nucleic Acids Research 2009 11

by guest on February 9 2016httpnaroxfordjournalsorg

Dow

nloaded from

22 ShapiroMB and SenapathyP (1987) RNA splice junctions ofdifferent classes of eukaryotes sequence statistics and functionalimplications in gene expression Nucleic Acids Res 15 7155ndash7174

23 YeoG and BurgeCB (2004) Maximum entropy modeling ofshort sequence motifs with applications to RNA splicing signalsJ Comput Biol 11 377ndash394

24 GreenMR (1991) Biochemical mechanisms of constitutive andregulated pre-mRNA splicing Annu Rev Cell Biol 7 559ndash599

25 GoodingC ClarkF WollertonMC GrellscheidSN GroomHand SmithCW (2006) A class of human exons with predicteddistant branch points revealed by analysis of AG dinucleotideexclusion zones Genome Biol 7 R1

26 KolG Lev-MaorG and AstG (2005) Human-mouse compara-tive analysis reveals that branch-site plasticity contributes to splicingregulation Hum Mol Genet 14 1559ndash1568

27 SmithPJ ZhangC WangJ ChewSL ZhangMQ andKrainerAR (2006) An increased specificity score matrix for theprediction of SF2ASF-specific exonic splicing enhancers HumMol Genet 15 2490ndash2508

28. Zhang,X.H. and Chasin,L.A. (2004) Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev., 18, 1241–1250.

29. Goren,A., Ram,O., Amit,M., Keren,H., Lev-Maor,G., Vig,I., Pupko,T. and Ast,G. (2006) Comparative analysis identifies exonic splicing regulatory sequences–The complex definition of enhancers and silencers. Mol. Cell, 22, 769–781.

30. Zhang,C., Li,W.H., Krainer,A.R. and Zhang,M.Q. (2008) RNA landscape of evolution for optimal exon and intron discrimination. Proc. Natl Acad. Sci. USA, 105, 5797–5802.

31. Sironi,M., Menozzi,G., Riva,L., Cagliani,R., Comi,G.P., Bresolin,N., Giorda,R. and Pozzoli,U. (2004) Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res., 32, 1783–1791.

32. Wang,Z., Rolish,M.E., Yeo,G., Tung,V., Mawson,M. and Burge,C.B. (2004) Systematic identification and analysis of exonic splicing silencers. Cell, 119, 831–845.

33. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

34. Bailey,T.L., Williams,N., Misleh,C. and Li,W.W. (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res., 34, W369–W373.

35. Yuan,B., Thomas,J.P., von Kodolitsch,Y. and Pyeritz,R.E. (1999) Comparison of heteroduplex analysis, direct sequencing, and enzyme mismatch cleavage for detecting mutations in a large gene, FBN1. Hum. Mutat., 14, 440–446.

36. Youil,R., Toner,T.J., Bull,E., Bailey,A.L., Earl,C.D., Dietz,H.C. and Montgomery,R.A. (2000) Enzymatic mutation detection (EMD) of novel mutations (R565X and R1523X) in the FBN1 gene of patients with Marfan syndrome using T4 endonuclease VII. Hum. Mutat., 16, 92–93.

37. Schrijver,I., Liu,W., Odom,R., Brenn,T., Oefner,P., Furthmayr,H. and Francke,U. (2002) Premature termination mutations in FBN1: distinct effects on differential allelic expression and on protein and clinical phenotypes. Am. J. Hum. Genet., 71, 223–237.

38. Rommel,K., Karck,M., Haverich,A., Schmidtke,J. and Arslan-Kirchner,M. (2002) Mutation screening of the fibrillin-1 (FBN1) gene in 76 unrelated patients with Marfan syndrome or Marfanoid features leads to the identification of 11 novel and three previously reported mutations. Hum. Mutat., 20, 406–407.

39. Park,E.S., Putnam,E.A., Chitayat,D., Child,A. and Milewicz,D.M. (1998) Clustering of FBN2 mutations in patients with congenital contractural arachnodactyly indicates an important role of the domains encoded by exons 24 through 34 during human development. Am. J. Med. Genet., 78, 350–355.

40. Palz,M., Tiecke,F., Booms,P., Goldner,B., Rosenberg,T., Fuchs,J., Skovby,F., Schumacher,H., Kaufmann,U.C., von Kodolitsch,Y. et al. (2000) Clustering of mutations associated with mild Marfan-like phenotypes in the 3′ region of FBN1 suggests a potential genotype-phenotype correlation. Am. J. Med. Genet., 91, 212–221.

41. Nijbroek,G., Sood,S., McIntosh,I., Francomano,C.A., Bull,E., Pereira,L., Ramirez,F., Pyeritz,R.E. and Dietz,H.C. (1995) Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons. Am. J. Hum. Genet., 57, 8–21.

42. McGrory,J. and Cole,W.G. (1999) Alternative splicing of exon 37 of FBN1 deletes part of an ‘eight-cysteine’ domain resulting in the Marfan syndrome. Clin. Genet., 55, 118–121.

43. Loeys,B., Nuytinck,L., Delvaux,I., De Bie,S. and De Paepe,A. (2001) Genotype and phenotype analysis of 171 patients referred for molecular study of the fibrillin-1 gene FBN1 because of suspected Marfan syndrome. Arch. Intern. Med., 161, 2447–2454.

44. Liu,W.O., Oefner,P.J., Qian,C., Odom,R.S. and Francke,U. (1997) Denaturing HPLC-identified novel FBN1 mutations, polymorphisms, and sequence variants in Marfan syndrome and related connective tissue disorders. Genet. Test., 1, 237–242.

45. Hutchinson,S., Wordsworth,B.P. and Handford,P.A. (2001) Marfan syndrome caused by a mutation in FBN1 that gives rise to cryptic splicing and a 33 nucleotide insertion in the coding sequence. Hum. Genet., 109, 416–420.

46. Halliday,D., Hutchinson,S., Kettle,S., Firth,H., Wordsworth,P. and Handford,P.A. (1999) Molecular analysis of eight mutations in FBN1. Hum. Genet., 105, 587–597.

47. Gupta,P.A., Wallis,D.D., Chin,T.O., Northrup,H., Tran-Fadulu,V.T., Towbin,J.A. and Milewicz,D.M. (2004) FBN2 mutation associated with manifestations of Marfan syndrome and congenital contractural arachnodactyly. J. Med. Genet., 41, e56.

48. Gupta,P.A., Putnam,E.A., Carmical,S.G., Kaitila,I., Steinmann,B., Child,A., Danesino,C., Metcalfe,K., Berry,S.A., Chen,E. et al. (2002) Ten novel FBN2 mutations in congenital contractural arachnodactyly: delineation of the molecular pathogenesis and clinical phenotype. Hum. Mutat., 19, 39–48.

49. Guo,D., Tan,F.K., Cantu,A., Plon,S.E. and Milewicz,D.M. (2001) FBN1 exon 2 splicing error in a patient with Marfan syndrome. Am. J. Med. Genet., 101, 130–134.

50. Dietz,H.C., McIntosh,I., Sakai,L.Y., Corson,G.M., Chalberg,S.C., Pyeritz,R.E. and Francomano,C.A. (1993) Four novel FBN1 mutations: significance for mutant transcript level and EGF-like domain calcium binding in the pathogenesis of Marfan syndrome. Genomics, 17, 468–475.

51. Comeglio,P., Johnson,P., Arno,G., Brice,G., Evans,A., Aragon-Martin,J., da Silva,F.P., Kiotsekoglou,A. and Child,A. (2007) The importance of mutation detection in Marfan syndrome and Marfan-related disorders: report of 193 FBN1 mutations. Hum. Mutat., 28, 928.

52. Collod-Beroud,G., Le Bourdelles,S., Ades,L., Ala-Kokko,L., Booms,P., Boxer,M., Child,A., Comeglio,P., De Paepe,A., Hyland,J.C. et al. (2003) Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum. Mutat., 22, 199–208.

53. Chikumi,H., Yamamoto,T., Ohta,Y., Nanba,E., Nagata,K., Ninomiya,H., Narasaki,K., Katoh,T., Hisatome,I., Ono,K. et al. (2000) Fibrillin gene (FBN1) mutations in Japanese patients with Marfan syndrome. J. Hum. Genet., 45, 115–118.

54. Biggin,A., Holman,K., Brett,M., Bennetts,B. and Ades,L. (2004) Detection of thirty novel FBN1 mutations in patients with Marfan syndrome or a related fibrillinopathy. Hum. Mutat., 23, 99.

55. Attanasio,M., Lapini,I., Evangelisti,L., Lucarini,L., Giusti,B., Porciani,M., Fattori,R., Anichini,C., Abbate,R., Gensini,G. et al. (2008) FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations. Clin. Genet., 74, 39–46.

56. Loeys,B.L., Chen,J., Neptune,E.R., Judge,D.P., Podowski,M., Holm,T., Meyers,J., Leitch,C.C., Katsanis,N., Sharifi,N. et al. (2005) A syndrome of altered cardiovascular, craniofacial, neurocognitive and skeletal development caused by mutations in TGFBR1 or TGFBR2. Nat. Genet., 37, 275–281.

57. Houdayer,C., Dehainault,C., Mattler,C., Michaux,D., Caux-Moncoutier,V., Pages-Berhouet,S., d’Enghien,C.D., Lauge,A., Castera,L., Gauthier-Villars,M. et al. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum. Mutat., 29, 975–982.

58. Tournier,I., Vezain,M., Martins,A., Charbonnier,F., Baert-Desurmont,S., Olschwang,S., Wang,Q., Buisine,M.P., Soret,J., Tazi,J. et al. (2008) A large fraction of unclassified variants of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum. Mutat., 29, 1412–1424.

59. Auclair,J., Busine,M.P., Navarro,C., Ruano,E., Montmain,G., Desseigne,F., Saurin,J.C., Lasset,C., Bonadona,V., Giraud,S. et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum. Mutat., 27, 145–154.

60. Di Blasi,C., He,Y., Morandi,L., Cornelio,F., Guicheney,P. and Mora,M. (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain, 124, 698–704.

61. Disset,A., Bourgeois,C.F., Benmalek,N., Claustres,M., Stevenin,J. and Tuffery-Giraud,S. (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum. Mol. Genet., 15, 999–1013.

62. Fackenthal,J.D., Cartegni,L., Krainer,A.R. and Olopade,O.I. (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am. J. Hum. Genet., 71, 625–631.

63. Fairbrother,W.G., Yeh,R.F., Sharp,P.A. and Burge,C.B. (2002) Predictive identification of exonic splicing enhancers in human genes. Science, 297, 1007–1013.

64. Mazoyer,S., Puget,N., Perrin-Vidoz,L., Lynch,H.T., Serova-Sinilnikova,O.M. and Lenoir,G.M. (1998) A BRCA1 nonsense mutation causes exon skipping. Am. J. Hum. Genet., 62, 713–715.

65. Nielsen,K.B., Sorensen,S., Cartegni,L., Corydon,T.J., Doktor,T.K., Schroeder,L.D., Reinert,L.S., Elpeleg,O., Krainer,A.R., Gregersen,N. et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am. J. Hum. Genet., 80, 416–432.

66. Zatkova,A., Messiaen,L., Vandenbroucke,I., Wieser,R., Fonatsch,C., Krainer,A.R. and Wimmer,K. (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum. Mutat., 24, 491–501.

67. den Dunnen,J.T. and Antonarakis,S.E. (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mutat., 15, 7–12.

68. Frederic,M.Y., Monino,C., Marschall,C., Hamroun,D., Faivre,L., Jondeau,G., Klein,H.G., Neumann,L., Gautier,E., Binquet,C. et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2), and genotype-phenotype correlations. Hum. Mutat., 30, 181–190.

69. Frederic,M.Y., Hamroun,D., Faivre,L., Boileau,C., Jondeau,G., Claustres,M., Beroud,C. and Collod-Beroud,G. (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum. Mutat., 29, 33–38.

70. Frank,V., Ortiz Bruchle,N., Mager,S., Frints,S.G., Bohring,A., du Bois,G., Debatin,I., Seidel,H., Senderek,J., Besbas,N. et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum. Mutat., 28, 638–639.

71. Anczukow,O., Buisson,M., Salles,M.J., Triboulet,S., Longy,M., Lidereau,R., Sinilnikova,O.M. and Mazoyer,S. (2008) Unclassified variants identified in BRCA1 exon 11: Consequences on splicing. Genes Chromosomes Cancer, 47, 418–426.

72. Ng,W., Loh,A.X., Teixeira,A.S., Pereira,S.P. and Swallow,D.M. (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br. J. Cancer, 99, 978–985.

73. Baala,L., Romano,S., Khaddour,R., Saunier,S., Smith,U.M., Audollent,S., Ozilou,C., Faivre,L., Laurent,N., Foliguet,B. et al. (2007) The Meckel-Gruber syndrome gene, MKS3, is mutated in Joubert syndrome. Am. J. Hum. Genet., 80, 186–194.

74. Habara,Y., Doshita,M., Hirozawa,S., Yokono,Y., Yagi,M., Takeshima,Y. and Matsuo,M. (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J. Biochem., 143, 303–310.

75. Aartsma-Rus,A., van Vliet,L., Hirschi,M., Janson,A.A., Heemskerk,H., de Winter,C.L., de Kimpe,S., van Deutekom,J.C., ’t Hoen,P.A. and van Ommen,G.J. (2008) Guidelines for Antisense Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Mol. Ther., 17, 548–553.

76. Khan,S.G., Metin,A., Gozukara,E., Inui,H., Shahlavi,T., Muniz-Medina,V., Baker,C.C., Ueda,T., Aiken,J.R., Schneider,T.D. et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum. Mol. Genet., 13, 343–352.

77. Sharp,P.A. and Burge,C.B. (1997) Classification of introns: U2-type or U12-type. Cell, 91, 875–879.

78. Chasin,L.A. (2007) Searching for splicing motifs. Adv. Exp. Med. Biol., 623, 85–106.

79. Nalla,V.K. and Rogan,P.K. (2005) Automated splicing mutation analysis by information theory. Hum. Mutat., 25, 334–342.

80. Beroud,C., Tuffery-Giraud,S., Matsuo,M., Hamroun,D., Humbertclaude,V., Monnier,N., Moizard,M.P., Voelckel,M.A., Calemard,L.M., Boisseau,P. et al. (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy. Hum. Mutat., 28, 196–202.

81. (2007) What is the human variome project? Nat. Genet., 39, 423.

82. Kainulainen,K., Karttunen,L., Puhakka,L., Sakai,L. and Peltonen,L. (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat. Genet., 6, 64–69.

83. Liu,W., Qian,C., Comeau,K., Brenn,T., Furthmayr,H. and Francke,U. (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe marfan syndrome. Hum. Mol. Genet., 5, 1581–1587.

84. Booms,P., Cisler,J., Mathews,K.R., Godfrey,M., Tiecke,F., Kaufmann,U.C., Vetter,U., Hagemeier,C. and Robinson,P.N. (1999) Novel exon skipping mutation in the fibrillin-1 gene: two ‘hot spots’ for the neonatal Marfan syndrome. Clin. Genet., 55, 110–117.

85. Wang,M., Price,C., Han,J., Cisler,J., Imaizumi,K., Van Thienen,M.N., DePaepe,A. and Godfrey,M. (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum. Mol. Genet., 4, 607–613.

86. Godfrey,M., Vandemark,N., Wang,M., Velinov,M., Wargowski,D., Tsipouras,P., Han,J., Becker,J., Robertson,W., Droste,S. et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am. J. Hum. Genet., 53, 472–480.

87. Wang,M., Clericuzio,C.L. and Godfrey,M. (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am. J. Hum. Genet., 59, 1027–1034.

88. Karttunen,L., Ukkonen,T., Kainulainen,K., Syvanen,A.C. and Peltonen,L. (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum. Mutat., Suppl. 1, S34–S37.

89. Kosaki,K., Takahashi,D., Udaka,T., Kosaki,R., Matsumoto,M., Ibe,S., Isobe,T., Tanaka,Y. and Takahashi,T. (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am. J. Med. Genet. A, 140, 104–108; author reply 109–110.

90. Loeys,B.L., Schwarze,U., Holm,T., Callewaert,B.L., Thomas,G.H., Pannu,H., De Backer,J.F., Oswald,G.L., Symoens,S., Manouvrier,S. et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N. Engl. J. Med., 355, 788–798.

91. Tran,V.K., Takeshima,Y., Zhang,Z., Habara,Y., Haginoya,K., Nishiyama,A., Yagi,M. and Matsuo,M. (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes but not in the skeletal muscle of a DMD patient. Hum. Genet., 120, 737–742.

92. Sharp,A., Pichert,G., Lucassen,A. and Eccles,D. (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum. Mutat., 24, 272.

93. Burrows,N.P., Nicholls,A.C., Richards,A.J., Luccarini,C., Harrison,J.B., Yates,J.R. and Pope,F.M. (1998) A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families. Am. J. Hum. Genet., 63, 390–398.

94. Sinnreich,M., Therrien,C. and Karpati,G. (2006) Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology, 66, 1114–1116.


95. Maslen,C., Babcock,D., Raghunath,M. and Steinmann,B. (1997) A rare branch-point mutation is associated with missplicing of fibrillin-2 in a large family with congenital contractural arachnodactyly. Am. J. Hum. Genet., 60, 1389–1398.

96. Vivenza,D., Guazzarotti,L., Godi,M., Frasca,D., di Natale,B., Momigliano-Richiardi,P., Bona,G. and Giordano,M. (2006) A novel deletion in the GH1 gene including the IVS3 branch site responsible for autosomal dominant isolated growth hormone deficiency. J. Clin. Endocrinol. Metab., 91, 980–986.

97. Chavanas,S., Gache,Y., Vailly,J., Kanitakis,J., Pulkkinen,L., Uitto,J., Ortonne,J. and Meneguzzi,G. (1999) Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing. Hum. Mol. Genet., 8, 2097–2105.

98. Kuivenhoven,J.A., Weibusch,H., Pritchard,P.H., Funke,H., Benne,R., Assmann,G. and Kastelein,J.J. (1996) An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease). J. Clin. Invest., 98, 358–364.

99. Webb,J.C., Patel,D.D., Shoulders,C.C., Knight,B.L. and Soutar,A.K. (1996) Genetic variation at a splicing branch point in intron 9 of the low density lipoprotein (LDL)-receptor gene: a rare mutation that disrupts mRNA splicing in a patient with familial hypercholesterolaemia and a common polymorphism. Hum. Mol. Genet., 5, 1325–1331.

100. Di Leo,E., Panico,F., Tarugi,P., Battisti,C., Federico,A. and Calandra,S. (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum. Mutat., 24, 440.

101. Vuillaumier-Barrot,S., Le Bizec,C., De Lonlay,P., Madinier-Chappat,N., Barnier,A., Dupre,T., Durand,G. and Seta,N. (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol. Genet. Metab., 87, 337–340.

102. Janssen,R.J., Wevers,R.A., Haussler,M., Luyten,J.A., Steenbergen-Spanjers,G.C., Hoffmann,G.F., Nagatsu,T. and Van den Heuvel,L.P. (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann. Hum. Genet., 64, 375–382.

103. Mayer,K., Ballhausen,W., Leistner,W. and Rott,H. (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim. Biophys. Acta, 1502, 495–507.


22 ShapiroMB and SenapathyP (1987) RNA splice junctions ofdifferent classes of eukaryotes sequence statistics and functionalimplications in gene expression Nucleic Acids Res 15 7155ndash7174

23 YeoG and BurgeCB (2004) Maximum entropy modeling ofshort sequence motifs with applications to RNA splicing signalsJ Comput Biol 11 377ndash394

24 GreenMR (1991) Biochemical mechanisms of constitutive andregulated pre-mRNA splicing Annu Rev Cell Biol 7 559ndash599

25 GoodingC ClarkF WollertonMC GrellscheidSN GroomHand SmithCW (2006) A class of human exons with predicteddistant branch points revealed by analysis of AG dinucleotideexclusion zones Genome Biol 7 R1

26 KolG Lev-MaorG and AstG (2005) Human-mouse compara-tive analysis reveals that branch-site plasticity contributes to splicingregulation Hum Mol Genet 14 1559ndash1568

27 SmithPJ ZhangC WangJ ChewSL ZhangMQ andKrainerAR (2006) An increased specificity score matrix for theprediction of SF2ASF-specific exonic splicing enhancers HumMol Genet 15 2490ndash2508

28 ZhangXH and ChasinLA (2004) Computational definition ofsequence motifs governing constitutive exon splicing Genes Dev18 1241ndash1250

29 GorenA RamO AmitM KerenH Lev-MaorG VigIPupkoT and AstG (2006) Comparative analysis identifies exonicsplicing regulatory sequencesndashThe complex definition of enhancersand silencers Mol Cell 22 769ndash781

30 ZhangC LiWH KrainerAR and ZhangMQ (2008) RNAlandscape of evolution for optimal exon and intron discriminationProc Natl Acad Sci USA 105 5797ndash5802

31 SironiM MenozziG RivaL CaglianiR ComiGPBresolinN GiordaR and PozzoliU (2004) Silencer elements aspossible inhibitors of pseudoexon splicing Nucleic Acids Res 321783ndash1791

32 WangZ RolishME YeoG TungV MawsonM andBurgeCB (2004) Systematic identification and analysis of exonicsplicing silencers Cell 119 831ndash845

33 ThompsonJD HigginsDG and GibsonTJ (1994) CLUSTALW improving the sensitivity of progressive multiple sequencealignment through sequence weighting position-specific gap penal-ties and weight matrix choice Nucleic Acids Res 22 4673ndash4680

34 BaileyTL WilliamsN MislehC and LiWW (2006) MEMEdiscovering and analyzing DNA and protein sequence motifsNucleic Acids Res 34 W369ndashW373

35 YuanB ThomasJP von KodolitschY and PyeritzRE (1999)Comparison of heteroduplex analysis direct sequencing andenzyme mismatch cleavage for detecting mutations in a large geneFBN1 Hum Mutat 14 440ndash446

36 YouilR TonerTJ BullE BaileyAL EarlCD DietzHCand MontgomeryRA (2000) Enzymatic mutation detection(EMD) of novel mutations (R565X and R1523X) in the FBN1 geneof patients with Marfan syndrome using T4 endonuclease VIIHum Mutat 16 92ndash93

37 SchrijverI LiuW OdomR BrennT OefnerP FurthmayrHand FranckeU (2002) Premature termination mutations in FBN1distinct effects on differential allelic expression and on protein andclinical phenotypes Am J Hum Genet 71 223ndash237

38 RommelK KarckM HaverichA SchmidtkeJ and Arslan-KirchnerM (2002) Mutation screening of the fibrillin-1 (FBN1)gene in 76 unrelated patients with Marfan syndrome or Marfanoidfeatures leads to the identification of 11 novel and three previouslyreported mutations Hum Mutat 20 406ndash407

39 ParkES PutnamEA ChitayatD ChildA and MilewiczDM(1998) Clustering of FBN2 mutations in patients with congenitalcontractural arachnodactyly indicates an important role ofthe domains encoded by exons 24 through 34 during humandevelopment Am J Med Genet 78 350ndash355

40 PalzM TieckeF BoomsP GoldnerB RosenbergT FuchsJSkovbyF SchumacherH KaufmannUC von KodolitschYet al (2000) Clustering of mutations associated with mildMarfan-like phenotypes in the 30 region of FBN1 suggests apotential genotype-phenotype correlation Am J Med Genet 91212ndash221

41 NijbroekG SoodS McIntoshI FrancomanoCA BullEPereiraL RamirezF PyeritzRE and DietzHC (1995)

Fifteen novel FBN1 mutations causing Marfan syndrome detectedby heteroduplex analysis of genomic amplicons Am J HumGenet 57 8ndash21

42 McGroryJ and ColeWG (1999) Alternative splicing of exon 37 ofFBN1 deletes part of an lsquoeight-cysteinersquo domain resulting in theMarfan syndrome Clin Genet 55 118ndash121

43 LoeysB NuytinckL DelvauxI De BieS and De PaepeA(2001) Genotype and phenotype analysis of 171 patients referred formolecular study of the fibrillin-1 gene FBN1 because of suspectedMarfan syndrome Arch Intern Med 161 2447ndash2454

44 LiuWO OefnerPJ QianC OdomRS and FranckeU (1997)Denaturing HPLC-identified novel FBN1 mutations polymorph-isms and sequence variants in Marfan syndrome and relatedconnective tissue disorders Genet Test 1 237ndash242

45 HutchinsonS WordsworthBP and HandfordPA (2001)Marfan syndrome caused by a mutation in FBN1 that gives rise tocryptic splicing and a 33 nucleotide insertion in the coding sequenceHum Genet 109 416ndash420

46 HallidayD HutchinsonS KettleS FirthH WordsworthP andHandfordPA (1999) Molecular analysis of eight mutations inFBN1 Hum Genet 105 587ndash597

47 GuptaPA WallisDD ChinTO NorthrupH Tran-FaduluVT TowbinJA and MilewiczDM (2004) FBN2mutation associated with manifestations of Marfan syndrome andcongenital contractural arachnodactyly J Med Genet 41 e56

48 GuptaPA PutnamEA CarmicalSG KaitilaI SteinmannBChildA DanesinoC MetcalfeK BerrySA ChenE et al(2002) Ten novel FBN2 mutations in congenital contractural ara-chnodactyly delineation of the molecular pathogenesis and clinicalphenotype Hum Mutat 19 39ndash48

49 GuoD TanFK CantuA PlonSE and MilewiczDM (2001)FBN1 exon 2 splicing error in a patient with Marfan syndromeAm J Med Genet 101 130ndash134

50 DietzHC McIntoshI SakaiLY CorsonGM ChalbergSCPyeritzRE and FrancomanoCA (1993) Four novel FBN1mutations significance for mutant transcript level and EGF-likedomain calcium binding in the pathogenesis of Marfan syndromeGenomics 17 468ndash475

51 ComeglioP JohnsonP ArnoG BriceG EvansAAragon-MartinJ da SilvaFP KiotsekoglouA and ChildA(2007) The importance of mutation detection in Marfan syndromeand Marfan-related disorders report of 193 FBN1 mutations HumMutat 28 928

52 Collod-BeroudG Le BourdellesS AdesL Ala-KokkoLBoomsP BoxerM ChildA ComeglioP De PaepeAHylandJC et al (2003) Update of the UMD-FBN1 mutationdatabase and creation of an FBN1 polymorphism database HumMutat 22 199ndash208

53 ChikumiH YamamotoT OhtaY NanbaE NagataKNinomiyaH NarasakiK KatohT HisatomeI OnoK et al(2000) Fibrillin gene (FBN1) mutations in Japanese patients withMarfan syndrome J Hum Genet 45 115ndash118

54 BigginA HolmanK BrettM BennettsB and AdesL (2004)Detection of thirty novel FBN1 mutations in patients with Marfansyndrome or a related fibrillinopathy Hum Mutat 23 99

55 AttanasioM LapiniI EvangelistiL LucariniL GiustiBPorcianiM FattoriR AnichiniC AbbateR GensiniG et al(2008) FBN1 mutation screening of patients with Marfan syndromeand related disorders detection of 46 novel FBN1 mutations ClinGenet 74 39ndash46

56 LoeysBL ChenJ NeptuneER JudgeDP PodowskiMHolmT MeyersJ LeitchCC KatsanisN SharifiN et al(2005) A syndrome of altered cardiovascular craniofacialneurocognitive and skeletal development caused by mutations inTGFBR1 or TGFBR2 Nat Genet 37 275ndash281

57 HoudayerC DehainaultC MattlerC MichauxDCaux-MoncoutierV Pages-BerhouetS drsquoEnghienCD LaugeACasteraL Gauthier-VillarsM et al (2008) Evaluation of in silicosplice tools for decision-making in molecular diagnosis HumMutat 29 975ndash982

58 TournierI VezainM MartinsA CharbonnierFBaert-DesurmontS OlschwangS WangQ BuisineMPSoretJ TaziJ et al (2008) A large fraction of unclassified variants

12 Nucleic Acids Research 2009

by guest on February 9 2016httpnaroxfordjournalsorg

Dow

nloaded from

of the mismatch repair genes MLH1 and MSH2 is associated withsplicing defects Hum Mutat 29 1412ndash1424

59 AuclairJ BusineMP NavarroC RuanoE MontmainGDesseigneF SaurinJC LassetC BonadonaV GiraudS et al(2006) Systematic mRNA analysis for the effect of MLH1 andMSH2 missense and silent mutations on aberrant splicing HumMutat 27 145ndash154

60 Di BlasiC HeY MorandiL CornelioF GuicheneyP andMoraM (2001) Mild muscular dystrophy due to a nonsensemutation in the LAMA2 gene resulting in exon skipping Brain124 698ndash704

61 DissetA BourgeoisCF BenmalekN ClaustresM SteveninJand Tuffery-GiraudS (2006) An exon skipping-associated nonsensemutation in the dystrophin gene uncovers a complex interplaybetween multiple antagonistic splicing elements Hum Mol Genet15 999ndash1013

62 FackenthalJD CartegniL KrainerAR and OlopadeOI (2002)BRCA2 T2722R is a deleterious allele that causes exon skippingAm J Hum Genet 71 625ndash631

63 FairbrotherWG YehRF SharpPA and BurgeCB (2002)Predictive identification of exonic splicing enhancers in humangenes Science 297 1007ndash1013

64 MazoyerS PugetN Perrin-VidozL LynchHTSerova-SinilnikovaOM and LenoirGM (1998) A BRCA1nonsense mutation causes exon skipping Am J Hum Genet 62713ndash715

65 NielsenKB SorensenS CartegniL CorydonTJ DoktorTKSchroederLD ReinertLS ElpelegO KrainerARGregersenN et al (2007) Seemingly neutral polymorphicvariants may confer immunity to splicing-inactivating mutations asynonymous SNP in exon 5 of MCAD protects from deleteriousmutations in a flanking exonic splicing enhancer Am J HumGenet 80 416ndash432

66 ZatkovaA MessiaenL VandenbrouckeI WieserRFonatschC KrainerAR and WimmerK (2004) Disruption ofexonic splicing enhancer elements is the principal cause of exonskipping associated with seven nonsense or missense alleles of NF1Hum Mutat 24 491ndash501

67 den DunnenJT and AntonarakisSE (2000) Mutation nomencla-ture extensions and suggestions to describe complex mutations adiscussion Hum Mutat 15 7ndash12

68 FredericMY MoninoC MarschallC HamrounD FaivreLJondeauG KleinHG NeumannL GautierE BinquetC et al(2008) The FBN2 gene new mutations locus-specific database(Universal Mutation Database FBN2) and genotype-phenotypecorrelations Hum Mutat 30 181ndash190

69 FredericMY HamrounD FaivreL BoileauC JondeauGClaustresM BeroudC and Collod-BeroudG (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 geneUMD-TGFBR2 Hum Mutat 29 33ndash38

70 FrankV Ortiz BruchleN MagerS FrintsSG BohringA duBoisG DebatinI SeidelH SenderekJ BesbasN et al (2007)Aberrant splicing is a common mutational mechanism in MKS1 akey player in Meckel-Gruber syndrome Hum Mutat 28 638ndash639

71 AnczukowO BuissonM SallesMJ TribouletS LongyMLidereauR SinilnikovaOM and MazoyerS (2008) Unclassifiedvariants identified in BRCA1 exon 11 Consequences on splicingGenes Chromosomes Cancer 47 418ndash426

72 NgW LohAX TeixeiraAS PereiraSP and SwallowDM(2008) Genetic regulation of MUC1 alternative splicing in humantissues Br J Cancer 99 978ndash985

73 BaalaL RomanoS KhaddourR SaunierS SmithUMAudollentS OzilouC FaivreL LaurentN FoliguetB et al(2007) The Meckel-Gruber syndrome gene MKS3 is mutated inJoubert syndrome Am J Hum Genet 80 186ndash194

74 HabaraY DoshitaM HirozawaS YokonoY YagiMTakeshimaY and MatsuoM (2008) A strong exonic splicingenhancer in dystrophin exon 19 achieve proper splicing without anupstream polypyrimidine tract J Biochem 143 303ndash310

75 Aartsma-RusA van VlietL HirschiM JansonAAHeemskerkH de WinterCL de KimpeS van DeutekomJCt HoenPA and van OmmenGJ (2008) Guidelines for AntisenseOligonucleotide Design and Insight Into Splice-modulatingMechanisms Mol Ther 17 548ndash553

76 KhanSG MetinA GozukaraE InuiH ShahlaviT Muniz-MedinaV BakerCC UedaT AikenJR SchneiderTD et al(2004) Two essential splice lariat branchpoint sequences in oneintron in a xeroderma pigmentosum DNA repair gene mutationsresult in reduced XPC mRNA levels that correlate with cancer riskHum Mol Genet 13 343ndash352

77 SharpPA and BurgeCB (1997) Classification of introns U2-typeor U12-type Cell 91 875ndash879

78 ChasinLA (2007) Searching for splicing motifs Adv Exp MedBiol 623 85ndash106

79 NallaVK and RoganPK (2005) Automated splicing mutationanalysis by information theory Hum Mutat 25 334ndash342

80 BeroudC Tuffery-GiraudS MatsuoM HamrounDHumbertclaudeV MonnierN MoizardMP VoelckelMACalemardLM BoisseauP et al (2007) Multiexon skipping lead-ing to an artificial DMD protein lacking amino acids from exons 45through 55 could rescue up to 63 of patients with Duchennemuscular dystrophy Hum Mutat 28 196ndash202

81 (2007) What is the human variome project Nat Genet 39 42382 KainulainenK KarttunenL PuhakkaL SakaiL and

PeltonenL (1994) Mutations in the fibrillin gene responsible fordominant ectopia lentis and neonatal Marfan syndrome NatGenet 6 64ndash69

83 LiuW QianC ComeauK BrennT FurthmayrH andFranckeU (1996) Mutant fibrillin-1 monomers lacking EGF-likedomains disrupt microfibril assembly and cause severe marfansyndrome Hum Mol Genet 5 1581ndash1587

84 BoomsP CislerJ MathewsKR GodfreyM TieckeFKaufmannUC VetterU HagemeierC and RobinsonPN(1999) Novel exon skipping mutation in the fibrillin-1 gene two lsquohotspotsrsquo for the neonatal Marfan syndrome Clin Genet 55 110ndash117

85 WangM PriceC HanJ CislerJ ImaizumiKVan ThienenMN DePaepeA and GodfreyM (1995) Recurrentmis-splicing of fibrillin exon 32 in two patients with neonatalMarfan syndrome Hum Mol Genet 4 607ndash613

86 GodfreyM VandemarkN WangM VelinovM WargowskiDTsipourasP HanJ BeckerJ RobertsonW DrosteS et al(1993) Prenatal diagnosis and a donor splice site mutation infibrillin in a family with Marfan syndrome Am J Hum Genet 53472ndash480

87 WangM ClericuzioCL and GodfreyM (1996) Familialoccurrence of typical and severe lethal congenital contracturalarachnodactyly caused by missplicing of exon 34 of fibrillin-2Am J Hum Genet 59 1027ndash1034

88 KarttunenL UkkonenT KainulainenK SyvanenAC andPeltonenL (1998) Two novel fibrillin-1 mutations resulting in pre-mature termination codons but in different mutant transcript levelsand clinical phenotypes Hum Mutat Suppl 1 S34ndashS37

89 KosakiK TakahashiD UdakaT KosakiR MatsumotoMIbeS IsobeT TanakaY and TakahashiT (2006) Molecularpathology of Shprintzen-Goldberg syndrome Am J Med GenetA 140 104ndash108 author reply 109ndash110

90 LoeysBL SchwarzeU HolmT CallewaertBL ThomasGHPannuH De BackerJF OswaldGL SymoensSManouvrierS et al (2006) Aneurysm syndromes caused bymutations in the TGF-beta receptor N Engl J Med 355788ndash798

91 TranVK TakeshimaY ZhangZ HabaraY HaginoyaKNishiyamaA YagiM and MatsuoM (2007) A nonsensemutation-created intraexonic splice site is active in the lymphocytesbut not in the skeletal muscle of a DMD patient Hum Genet 120737ndash742

92 SharpA PichertG LucassenA and EcclesD (2004) RNAanalysis reveals splicing mutations and loss of expression defects inMLH1 and BRCA1 Hum Mutat 24 272

93 BurrowsNP NichollsAC RichardsAJ LuccariniCHarrisonJB YatesJR and PopeFM (1998) A point mutationin an intronic branch site results in aberrant splicing of COL5A1and in Ehlers-Danlos syndrome type II in two British families AmJ Hum Genet 63 390ndash398

94 SinnreichM TherrienC and KarpatiG (2006) Lariat branchpoint mutation in the dysferlin gene with mild limb-girdle musculardystrophy Neurology 66 1114ndash1116

Nucleic Acids Research 2009 13

by guest on February 9 2016httpnaroxfordjournalsorg

Dow

nloaded from

95 MaslenC BabcockD RaghunathM and SteinmannB (1997)A rare branch-point mutation is associated with missplicingof fibrillin-2 in a large family with congenital contracturalarachnodactyly Am J Hum Genet 60 1389ndash1398

96 VivenzaD GuazzarottiL GodiM FrascaD di NataleBMomigliano-RichiardiP BonaG and GiordanoM (2006) Anovel deletion in the GH1 gene including the IVS3 branch siteresponsible for autosomal dominant isolated growth hormonedeficiency J Clin Endocrinol Metab 91 980ndash986

97 ChavanasS GacheY VaillyJ KanitakisJ PulkkinenLUittoJ OrtonneJ and MeneguzziG (1999) Splicing modulationof integrin beta4 pre-mRNA carrying a branch point mutationunderlies epidermolysis bullosa with pyloric atresia undergoingspontaneous amelioration with ageing Hum Mol Genet 82097ndash2105

98 KuivenhovenJA WeibuschH PritchardPH FunkeHBenneR AssmannG and KasteleinJJ (1996) An intronicmutation in a lariat branchpoint sequence is a direct cause of aninherited human disorder (fish-eye disease) J Clin Invest 98358ndash364

99 WebbJC PatelDD ShouldersCC KnightBL andSoutarAK (1996) Genetic variation at a splicing branch point in

intron 9 of the low density lipoprotein (LDL)-receptor gene a raremutation that disrupts mRNA splicing in a patient with familialhypercholesterolaemia and a common polymorphism Hum MolGenet 5 1325ndash1331

100. Di Leo E, Panico F, Tarugi P, Battisti C, Federico A and Calandra S (2004) A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum Mutat, 24, 440.

101. Vuillaumier-Barrot S, Le Bizec C, De Lonlay P, Madinier-Chappat N, Barnier A, Dupre T, Durand G and Seta N (2006) PMM2 intronic branch-site mutations in CDG-Ia. Mol Genet Metab, 87, 337–340.

102. Janssen RJ, Wevers RA, Haussler M, Luyten JA, Steenbergen-Spanjers GC, Hoffmann GF, Nagatsu T and Van den Heuvel LP (2000) A branch site mutation leading to aberrant splicing of the human tyrosine hydroxylase gene in a child with a severe extrapyramidal movement disorder. Ann Hum Genet, 64, 375–382.

103. Mayer K, Ballhausen W, Leistner W and Rott H (2000) Three novel types of splicing aberrations in the tuberous sclerosis TSC2 gene caused by mutations apart from splice consensus sequences. Biochim Biophys Acta, 1502, 495–507.

of the mismatch repair genes MLH1 and MSH2 is associated with splicing defects. Hum Mutat, 29, 1412–1424.

59. Auclair J, Busine MP, Navarro C, Ruano E, Montmain G, Desseigne F, Saurin JC, Lasset C, Bonadona V, Giraud S et al. (2006) Systematic mRNA analysis for the effect of MLH1 and MSH2 missense and silent mutations on aberrant splicing. Hum Mutat, 27, 145–154.

60. Di Blasi C, He Y, Morandi L, Cornelio F, Guicheney P and Mora M (2001) Mild muscular dystrophy due to a nonsense mutation in the LAMA2 gene resulting in exon skipping. Brain, 124, 698–704.

61. Disset A, Bourgeois CF, Benmalek N, Claustres M, Stevenin J and Tuffery-Giraud S (2006) An exon skipping-associated nonsense mutation in the dystrophin gene uncovers a complex interplay between multiple antagonistic splicing elements. Hum Mol Genet, 15, 999–1013.

62. Fackenthal JD, Cartegni L, Krainer AR and Olopade OI (2002) BRCA2 T2722R is a deleterious allele that causes exon skipping. Am J Hum Genet, 71, 625–631.

63. Fairbrother WG, Yeh RF, Sharp PA and Burge CB (2002) Predictive identification of exonic splicing enhancers in human genes. Science, 297, 1007–1013.

64. Mazoyer S, Puget N, Perrin-Vidoz L, Lynch HT, Serova-Sinilnikova OM and Lenoir GM (1998) A BRCA1 nonsense mutation causes exon skipping. Am J Hum Genet, 62, 713–715.

65. Nielsen KB, Sorensen S, Cartegni L, Corydon TJ, Doktor TK, Schroeder LD, Reinert LS, Elpeleg O, Krainer AR, Gregersen N et al. (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am J Hum Genet, 80, 416–432.

66. Zatkova A, Messiaen L, Vandenbroucke I, Wieser R, Fonatsch C, Krainer AR and Wimmer K (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum Mutat, 24, 491–501.

67. den Dunnen JT and Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat, 15, 7–12.

68. Frederic MY, Monino C, Marschall C, Hamroun D, Faivre L, Jondeau G, Klein HG, Neumann L, Gautier E, Binquet C et al. (2008) The FBN2 gene: new mutations, locus-specific database (Universal Mutation Database FBN2), and genotype-phenotype correlations. Hum Mutat, 30, 181–190.

69. Frederic MY, Hamroun D, Faivre L, Boileau C, Jondeau G, Claustres M, Beroud C and Collod-Beroud G (2008) A new locus-specific database (LSDB) for mutations in the TGFBR2 gene: UMD-TGFBR2. Hum Mutat, 29, 33–38.

70. Frank V, Ortiz Bruchle N, Mager S, Frints SG, Bohring A, du Bois G, Debatin I, Seidel H, Senderek J, Besbas N et al. (2007) Aberrant splicing is a common mutational mechanism in MKS1, a key player in Meckel-Gruber syndrome. Hum Mutat, 28, 638–639.

71. Anczukow O, Buisson M, Salles MJ, Triboulet S, Longy M, Lidereau R, Sinilnikova OM and Mazoyer S (2008) Unclassified variants identified in BRCA1 exon 11: consequences on splicing. Genes Chromosomes Cancer, 47, 418–426.

72. Ng W, Loh AX, Teixeira AS, Pereira SP and Swallow DM (2008) Genetic regulation of MUC1 alternative splicing in human tissues. Br J Cancer, 99, 978–985.

73. Baala L, Romano S, Khaddour R, Saunier S, Smith UM, Audollent S, Ozilou C, Faivre L, Laurent N, Foliguet B et al. (2007) The Meckel-Gruber syndrome gene MKS3 is mutated in Joubert syndrome. Am J Hum Genet, 80, 186–194.

74. Habara Y, Doshita M, Hirozawa S, Yokono Y, Yagi M, Takeshima Y and Matsuo M (2008) A strong exonic splicing enhancer in dystrophin exon 19 achieve proper splicing without an upstream polypyrimidine tract. J Biochem, 143, 303–310.

75. Aartsma-Rus A, van Vliet L, Hirschi M, Janson AA, Heemskerk H, de Winter CL, de Kimpe S, van Deutekom JC, 't Hoen PA and van Ommen GJ (2008) Guidelines for Antisense Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Mol Ther, 17, 548–553.

76. Khan SG, Metin A, Gozukara E, Inui H, Shahlavi T, Muniz-Medina V, Baker CC, Ueda T, Aiken JR, Schneider TD et al. (2004) Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk. Hum Mol Genet, 13, 343–352.

77. Sharp PA and Burge CB (1997) Classification of introns: U2-type or U12-type. Cell, 91, 875–879.

78. Chasin LA (2007) Searching for splicing motifs. Adv Exp Med Biol, 623, 85–106.

79. Nalla VK and Rogan PK (2005) Automated splicing mutation analysis by information theory. Hum Mutat, 25, 334–342.

80. Beroud C, Tuffery-Giraud S, Matsuo M, Hamroun D, Humbertclaude V, Monnier N, Moizard MP, Voelckel MA, Calemard LM, Boisseau P et al. (2007) Multiexon skipping leading to an artificial DMD protein lacking amino acids from exons 45 through 55 could rescue up to 63% of patients with Duchenne muscular dystrophy. Hum Mutat, 28, 196–202.

81. (2007) What is the human variome project? Nat Genet, 39, 423.

82. Kainulainen K, Karttunen L, Puhakka L, Sakai L and Peltonen L (1994) Mutations in the fibrillin gene responsible for dominant ectopia lentis and neonatal Marfan syndrome. Nat Genet, 6, 64–69.

83. Liu W, Qian C, Comeau K, Brenn T, Furthmayr H and Francke U (1996) Mutant fibrillin-1 monomers lacking EGF-like domains disrupt microfibril assembly and cause severe Marfan syndrome. Hum Mol Genet, 5, 1581–1587.

84. Booms P, Cisler J, Mathews KR, Godfrey M, Tiecke F, Kaufmann UC, Vetter U, Hagemeier C and Robinson PN (1999) Novel exon skipping mutation in the fibrillin-1 gene: two 'hotspots' for the neonatal Marfan syndrome. Clin Genet, 55, 110–117.

85. Wang M, Price C, Han J, Cisler J, Imaizumi K, Van Thienen MN, De Paepe A and Godfrey M (1995) Recurrent mis-splicing of fibrillin exon 32 in two patients with neonatal Marfan syndrome. Hum Mol Genet, 4, 607–613.

86. Godfrey M, Vandemark N, Wang M, Velinov M, Wargowski D, Tsipouras P, Han J, Becker J, Robertson W, Droste S et al. (1993) Prenatal diagnosis and a donor splice site mutation in fibrillin in a family with Marfan syndrome. Am J Hum Genet, 53, 472–480.

87. Wang M, Clericuzio CL and Godfrey M (1996) Familial occurrence of typical and severe lethal congenital contractural arachnodactyly caused by missplicing of exon 34 of fibrillin-2. Am J Hum Genet, 59, 1027–1034.

88. Karttunen L, Ukkonen T, Kainulainen K, Syvanen AC and Peltonen L (1998) Two novel fibrillin-1 mutations resulting in premature termination codons but in different mutant transcript levels and clinical phenotypes. Hum Mutat, Suppl 1, S34–S37.

89. Kosaki K, Takahashi D, Udaka T, Kosaki R, Matsumoto M, Ibe S, Isobe T, Tanaka Y and Takahashi T (2006) Molecular pathology of Shprintzen-Goldberg syndrome. Am J Med Genet A, 140, 104–108; author reply 109–110.

90. Loeys BL, Schwarze U, Holm T, Callewaert BL, Thomas GH, Pannu H, De Backer JF, Oswald GL, Symoens S, Manouvrier S et al. (2006) Aneurysm syndromes caused by mutations in the TGF-beta receptor. N Engl J Med, 355, 788–798.

91. Tran VK, Takeshima Y, Zhang Z, Habara Y, Haginoya K, Nishiyama A, Yagi M and Matsuo M (2007) A nonsense mutation-created intraexonic splice site is active in the lymphocytes but not in the skeletal muscle of a DMD patient. Hum Genet, 120, 737–742.

92. Sharp A, Pichert G, Lucassen A and Eccles D (2004) RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1. Hum Mutat, 24, 272.
