
The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data

Gautier Koscielny1,*, Gagarine Yaikhom2, Vivek Iyer3, Terrence F. Meehan1, Hugh Morgan2, Julian Atienza-Herrero2, Andrew Blake2, Chao-Kung Chen1, Richard Easty3, Armida Di Fenza2, Tanja Fiegel2, Mark Griffiths3, Alan Horne3, Natasha A. Karp3, Natalja Kurbatova1, Jeremy C. Mason1, Peter Matthews3, Darren J. Oakley3, Asfand Qazi3, Jack Regnart3, Ahmad Retha2, Luis A. Santos2, Duncan J. Sneddon2, Jonathan Warren1, Henrik Westerberg2, Robert J. Wilson3, David G. Melvin3, Damian Smedley3, Steve D. M. Brown2, Paul Flicek1, William C. Skarnes3, Ann-Marie Mallon2 and Helen Parkinson1

1European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, 2Medical Research Council Harwell (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD, UK and 3Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

Received August 15, 2013; Revised September 20, 2013; Accepted October 1, 2013

ABSTRACT

The International Mouse Phenotyping Consortium (IMPC) web portal (http://www.mousephenotype.org) provides the biomedical community with a unified point of access to mutant mice and a rich collection of related emerging and existing mouse phenotype data. IMPC mouse clinics worldwide follow rigorous, highly structured and standardized protocols for the experimentation, collection and dissemination of data. Dedicated ‘data wranglers’ work with each phenotyping center to collate data and perform quality control of data. An automated statistical analysis pipeline has been developed to identify knockout strains with a significant change in the phenotype parameters. Annotation with biomedical ontologies allows biologists and clinicians to easily find mouse strains with phenotypic traits relevant to their research. Data integration with other resources will provide insights into mammalian gene function and human disease. As phenotype data become available for every gene in the mouse, the IMPC web portal will become an invaluable tool for researchers studying the genetic contributions of genes to human diseases.

INTRODUCTION

The goal of the International Mouse Phenotyping Consortium (IMPC) is to generate and phenotypically characterize knockout mutant strains for every protein-coding gene in the mouse (1,2). The IMPC was established as a large-scale coordinated effort of mouse clinics worldwide to undertake broad-based primary phenotyping of mutant mouse strains that carry a null mutation in a protein-coding gene (3,4). This program builds on the collection of mutant embryonic stem (ES) cells available from the International Knockout Mouse Consortium (IKMC) (5) and pilot programs that have established a set of robust high-throughput phenotyping assays (6,7). In this article, we describe the functionality and data available from the IMPC web portal delivered by the Mouse Phenotyping Informatics Infrastructure (MPI2) consortium comprising EMBL-EBI, MRC Harwell and the Wellcome Trust Sanger Institute (8).

The IMPC portal is the central point of access to high-throughput phenotype data, IKMC ES cell resources and mutant mouse strains. The progress of mouse production and phenotyping is presented for each gene, with links to repositories that are distributing the mutant mouse strains, the availability of IKMC ES cell resources and the molecular structures of the mutant alleles. Both mutant alleles and phenotype data are seamlessly integrated with existing community resources and databases. For instance, the Mouse Genome Informatics (http://www.informatics.jax.org) database is used for defining mouse genetic alleles (9) and Ensembl (http://www.ensembl.org) for defining genomic contexts (10).

In addition to the information made accessible through the web portal, the IMPC computational framework allows public access to the raw data using standard software interfaces and web services. The software application source code for these components and statistical analysis tools are also provided for community use.

*To whom correspondence should be addressed. Tel: +44 1223 494 650; Fax: +44 1223 494 468; Email: koscieln@ebi.ac.uk

The authors wish it to be known that, in their opinion, the first three authors should be regarded as Joint First Authors.

Present address: Darren J. Oakley, Nature Publishing Group, The Macmillan Building, 4 Crinan Street, London N1 9XW, UK.

D802–D809 Nucleic Acids Research, 2014, Vol. 42, Database issue. Published online 4 November 2013. doi:10.1093/nar/gkt977

© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

THE IMPC WEB PORTAL

The services provided by the IMPC fulfill the needs of several user groups representing different strands of the biomedical community. These include external use cases 1–5 and project-specific use cases 6–7.

(1) The community of biomedical researchers accessing statistically significant phenotypic associations for a given gene, e.g. a rare disease researcher searching for specific phenotypes of interest.

(2) Researchers requiring mouse specimens or genetic material for which phenotype data are available, e.g. a researcher who wishes to conduct secondary experiments on a well-known gene to augment existing broad-based phenotype data.

(3) System biologists and statisticians seeking access to large-scale standardized gene-phenotype datasets to perform their own analysis.

(4) Informatics users accessing all or partial datasets for inclusion in their own resource set, e.g. project FaceBase (https://www.facebase.org) (11) that catalogs animal models related to craniofacial abnormalities.

(5) High-throughput phenotyping centers producing and exporting their data, e.g. KOMP2 or IMPC members.

(6) Data wranglers (dedicated experts in reviewing and QC of phenotypic data) carrying out quality control (QC) checks on the raw data, e.g. correction of data submission errors, detection of baseline drift due to instrumentation, harmonization and standardization of protocols across centers, among others.

(7) Funding bodies that are tracking the progress of mouse production and phenotyping efforts, and the state of production, collection and dissemination of the data.

Given the diverse user groups that the IMPC services are targeting, we have created a range of specialist software tools for data analysis and dissemination that cater to these specific requirements. Personas are created for user groups, and extensive usability testing of new interface features is performed. User feedback is gathered via deployment of a beta testing site, and testing is conducted with small groups of users to ensure the components meet their needs. Applications are developed over a unifying framework of distributed computational resources and databases and delivered via the web portal, which we discuss in detail later.

Data access

The web portal’s primary function is to display genotype–phenotype data for knockout lines for the biomedical community. The site has been optimized to allow free text queries that return structured data via facets, allowing the user to explore the data with a mixture of query terms. For example, a query for ‘Pfn1’ returns a summary page with a link to the gene and indicates the production status and availability of phenotype data of mouse strains carrying a null allele of the gene. Users are able to register for genes of interest and will be alerted via email when the gene changes status, indicating new data are available or a mouse is available. A query for a general term ‘glucose’ returns a summary page with a list of matches based on the results for glucose-related genes, phenotyping protocols measuring glucose function and glucose-related phenotypes. Similarly, anatomical queries such as ‘eye’ return results for genes, protocols and phenotypes, as well as relevant images indexed by their annotations.

Phenotype data are obtained from the IMPC web portal by several routes that are tailored to the user groups described earlier. Dedicated gene pages contain phenotype association tables that list statistically significant phenotypes that occur in mouse strains carrying null mutations of the given gene (Figure 1). One use case identified among different types of users was the need to allow immediate access to data before a strain has completed phenotyping. Therefore, data are uploaded as soon as they are quality controlled after export from the centers and are available on gene pages with the status ‘IMPC Phenotyping Status: Started’. Users can navigate to underlying data supporting these assertions by clicking on graph icons. The gene page also contains a list of ES cells and mice available for this gene, with links to the detailed molecular structure of the allele and links to repositories that distribute the materials (Figure 2).

Phenotype pages present a list of mutant mouse strains associated with a phenotype described using Mammalian Phenotype Ontology terms (12). The assigned terms reflect the phenotyping procedures, which assay anatomy, behavior, blood chemistry, etc. of the mutant line. For example, the ‘corneal opacity’ page available at http://www.mousephenotype.org/data/phenotypes/MP:0001314 displays a list of all mouse strains in the database associated with this MP term. A selected representation of the abnormal phenotype is provided when images are available.
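The faceted behaviour of free text queries described in this section can be illustrated with a minimal sketch. The index entries, the facet names and the function `faceted_search` are hypothetical and chosen only for illustration; they do not reproduce the portal's search implementation or its data.

```python
from collections import defaultdict

# Hypothetical in-memory index of (facet, label, searchable text).
# These entries are illustrative only and are not taken from the portal.
INDEX = [
    ("gene", "Pfn1", "pfn1 profilin 1"),
    ("gene", "Gck", "gck glucokinase glucose"),
    ("procedure", "Intraperitoneal glucose tolerance test (IPGTT)", "ipgtt glucose tolerance"),
    ("phenotype", "impaired glucose tolerance", "impaired glucose tolerance"),
    ("phenotype", "corneal opacity", "corneal opacity eye"),
]


def faceted_search(query, index=INDEX):
    """Return the labels matching a free-text query, grouped by facet."""
    term = query.strip().lower()
    if not term:
        return {}
    results = defaultdict(list)
    for facet, label, text in index:
        # Simple substring match; a real search engine would tokenize and rank.
        if term in text:
            results[facet].append(label)
    return dict(results)
```

Under these assumptions, a gene symbol such as ‘Pfn1’ matches only the gene facet, whereas a general term such as ‘glucose’ returns matches in the gene, procedure and phenotype facets at once.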

Public access to data and code repository

We have identified user groups requiring programmatic access to data. We provide a Simple Object Access Protocol (SOAP) web service to retrieve the diverse standard operating procedures (SOPs) from the IMPC pipeline and legacy protocols (http://www.mousephenotype.org/impress/soap/server?wsdl) and a RESTful interface to the mouse alleles, experimental results and genotype–phenotype associations from the statistical analysis. We also provide both RESTful and BioMart interfaces to access details of mouse production from the portal (http://www.mousephenotype.org/imits). The portal code is distributed under the Apache v2 software license, available on GitHub (https://github.com/mpi2) and supported by user documentation (https://github.com/mpi2/PhenotypeArchive/wiki). The project uses an agile development approach and delivers new software releases via the portal and supporting code releases every month.

DATA ACQUISITION AND QC

Phenotype data produced by the centers are first recorded in local laboratory information management systems managed by the individual phenotyping centers. However, for these data to become part of the IMPC dataset, the data must be captured by rigorously following a well-defined set of SOPs. Furthermore, the recorded measurements must conform to a standardized specification, which includes the unit of measurement, the number of measurements to be taken and other essential metadata. Within the IMPC consortium, phenotyping experiments are referred to as ‘procedures’ and the set of measurements produced by a procedure as ‘parameters’. The SOPs and the specifications for each of the procedures and parameters are stored in the IMPReSS database (see later).

Figure 1. A view of the gene and phenotype data for Cib2, a calcium and integrin-binding family protein. The phenotype heatmap shows significant phenotypes for auditory and brainstem and behavioral tests (P < 0.0001). Users can explore underlying data by clicking on phenotype names. The graph shows Cib2 homozygous knockout animals have impaired response to sound stimulus, indicating a significant hearing defect, as well as abnormal ear morphology. A stock image of an abnormal ear is provided for reference.

Phenotype data collection, validation and dissemination

When the data are ready for collection and collation, the phenotyping centers export their data as Extensible Markup Language (XML) documents (http://www.w3.org/TR/REC-xml). These documents conform to the standardized data exchange format defined by the IMPC consortium using the XML Schema Definition Language (XSD) specified by the W3C consortium (http://www.w3.org/TR/xmlschema11-1, http://www.w3.org/TR/xmlschema11-2). The IMPC Data Coordination Centre at MRC Harwell then downloads these documents. The provenance and chain-of-custody of the data are managed using a data tracker, not presented here. As shown in Figure 3, data processing happens in three main phases to ensure the highest level of data integrity and traceability. In the first phase, the data exported by the mouse clinics are validated against the required procedure and parameter specifications as defined in the SOPs (Data Coordination Centre component), and the supplied values are checked against the corresponding context-specific databases, e.g. a check for the existence of a mouse strain in the IMPC Mouse Tracking System (iMITS).

In the second phase, validated data are incorporated into the centralized dataset, and additional processing is carried out to prepare the data for effective visualization and statistical analysis. The data are then made available to the data wranglers for QC checks and also to researchers for preliminary data analysis. In the third and final phase, data that have passed QC are sent to the Central Data Archive at EMBL-EBI, where they are made available as curated phenotype data. The pipeline is designed to ensure data are publicly available as quickly as possible to the users of the portal.
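The first-phase validation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the element and attribute names (`specimen`, `measurement`, `strain`, `parameter`, `unit`), the parameter specifications and the set of known strains are hypothetical stand-ins for the IMPC exchange format, the IMPReSS specifications and the iMITS lookup. The sketch checks parsed values against a specification; it does not perform XSD validation, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSpec:
    unit: str
    required: bool


# Hypothetical specifications; the real ones are held in IMPReSS.
SPECS = {
    "body_weight": ParameterSpec(unit="g", required=True),
    "heart_weight": ParameterSpec(unit="mg", required=False),
}

# Hypothetical stand-in for a strain lookup in iMITS.
KNOWN_STRAINS = {"strain-001"}


def validate(xml_text, specs=SPECS, known_strains=KNOWN_STRAINS):
    """Return a list of validation errors for an exported XML document."""
    errors = []
    root = ET.fromstring(xml_text)
    for specimen in root.iter("specimen"):
        strain = specimen.get("strain")
        if strain not in known_strains:
            errors.append(f"unknown strain: {strain}")
        seen = set()
        for measurement in specimen.iter("measurement"):
            name = measurement.get("parameter")
            seen.add(name)
            spec = specs.get(name)
            if spec is None:
                errors.append(f"unknown parameter: {name}")
                continue
            unit = measurement.get("unit")
            if unit != spec.unit:
                errors.append(f"{name}: expected unit {spec.unit}, got {unit}")
            try:
                float(measurement.text)
            except (TypeError, ValueError):
                errors.append(f"{name}: non-numeric value {measurement.text!r}")
        # Every required parameter must be present for each specimen.
        for name, spec in specs.items():
            if spec.required and name not in seen:
                errors.append(f"missing required parameter: {name}")
    return errors
```

An empty list indicates that the document passed these checks and could move on to the second phase; any entries would be reported back to the submitting center.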

Quality control

The IMPC aims to provide the highest quality data to the biomedical community. QC checks are performed in addition to the checks performed at the mouse clinics. The QC process involves identifying anomalies in the submitted data. The aim is to remove data entry and communication errors before the measurements undergo extensive statistical analysis. Some of the QC issues identified are missing data for required parameters, missing wild-type measurements, duplicate measurements, measurements with wrong units, unexpected values (e.g. 0 or negative body weight), out-of-bounds values and outliers, among others. These are then communicated to the phenotyping centers, which either fix the issue by correcting the error or provide an explanation. All of the identified issues are captured and managed using custom QC tools. QC tools provide the users (mouse centers and data wranglers) with an integrated workbench for visualization, analysis, identification and resolution of QC issues. By providing an interactive web application that is designed specifically for the visualization of mouse phenotype data, we are able to streamline the workflow.

Figure 2. iMITS (https://www.mousephenotype.org/imits) stores and provides summary and detailed production information. Users can view high-level allele information on the IMPC portal gene pages. The iMITS tab of the IMPC portal shows detailed IMPC production information, e.g. for Sdha. Information to access iMITS is provided on the IMPC homepage.
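Several of the QC issues listed in the Quality control section (duplicate measurements, wrong units, non-positive body weight, missing required parameters and missing wild-type measurements) can be expressed as simple rule checks. The sketch below is illustrative only: the record layout, the parameter names and the function `qc_issues` are assumptions, and it does not represent the custom QC tools used by the data wranglers.

```python
from collections import Counter

# Assumed expected units per parameter; real specifications live in IMPReSS.
EXPECTED_UNITS = {"body_weight": "g"}


def qc_issues(records, required=("body_weight",), expected_units=EXPECTED_UNITS):
    """Flag simple anomalies in measurement records.

    Each record is a dict with the keys animal_id, genotype, parameter,
    value and unit (a hypothetical layout used only for this sketch).
    """
    issues = []

    # Duplicate measurements of the same parameter on the same animal.
    counts = Counter((r["animal_id"], r["parameter"]) for r in records)
    for (animal, parameter), n in sorted(counts.items()):
        if n > 1:
            issues.append(f"{animal}: duplicate measurement of {parameter}")

    for r in records:
        # Measurements reported in the wrong unit.
        unit = expected_units.get(r["parameter"])
        if unit is not None and r["unit"] != unit:
            issues.append(
                f"{r['animal_id']}: {r['parameter']} has unit {r['unit']}, expected {unit}"
            )
        # Unexpected values such as zero or negative body weight.
        if r["parameter"] == "body_weight" and r["value"] is not None and r["value"] <= 0:
            issues.append(f"{r['animal_id']}: non-positive body weight {r['value']}")

    # Missing data for required parameters.
    for animal in sorted({r["animal_id"] for r in records}):
        present = {
            r["parameter"]
            for r in records
            if r["animal_id"] == animal and r["value"] is not None
        }
        for parameter in required:
            if parameter not in present:
                issues.append(f"{animal}: missing required parameter {parameter}")

    # Missing wild-type (control) measurements.
    if not any(r["genotype"] == "wild-type" for r in records):
        issues.append("missing wild-type measurements")

    return issues
```

Out-of-bounds and outlier detection are deliberately left out of this sketch, since they depend on parameter-specific ranges that the text does not specify.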

Data availability

Phenotype data collection started in early 2013 and, to date, 2 079 607 data points at eight mouse clinics are available and are undergoing QC before export for archiving (Table 1). This excludes legacy data that are already archived and available for query. Data for 19 different phenotyping procedures are available in the IMPReSS database (Table 2). Mouse production of IKMC alleles has been tracked since 2008. To date, ES cell microinjections that have produced >3000 mouse lines are recorded.

SOPS AND PROTOCOLS

IMPC provides high-quality phenotype data by following rigorous data collection processes. This is achieved by reducing exposure to human error via semi-automated data collection and validation processes. The application of such procedures across multiple centers allows reliable detection of subtle phenotypes, e.g. in the broad-based phenotyping of C57BL/6J and C57BL/6N mouse strains (13). Part of this automation is made possible by the IMPC protocols, which consist of the SOPs and the procedure and parameter specifications. These protocols are maintained in a form that is both human readable and machine consumable. The IMPC protocols are available from the International Mouse Phenotyping Resource of Standardized Screens (IMPReSS), one of the services provided by the IMPC infrastructure. Machine-readable web services (https://www.mousephenotype.org/impress/soap/server?wsdl) are used in the validation processes discussed in the data acquisition and QC section, and a dedicated relational database stores SOPs.

Figure 3. A schematic overview of data flows into the web portal for IMPC data. Currently, eight mouse clinics are involved in IMPC and produce phenotype data. These are then collected, validated and processed to produce curated data available from the project portal. Legacy data from EuroPhenome and Sanger MGP were directly transferred to the Central Data Archive at EMBL-EBI for direct integration on the portal.

Table 1. Mouse phenotyping data points by submitting center, September 2013

Mouse clinics                              Number of data points
Baylor College of Medicine                                     0
Helmholtz Zentrum München                                 75 662
Institut Clinique de la Souris                           446 670
MRC Harwell                                              164 037
The Jackson Laboratory                                   142 221
The Toronto Centre for Phenogenomics                      20 365
University of California Davis                            59 190
Wellcome Trust Sanger Institute                        1 171 462
Total                                                  2 079 607

Table 2. Mouse phenotyping data points by SOP, September 2013

Procedures                                         Number of data points
Acoustic startle and pre-pulse inhibition (PPI)                   80 787
Auditory brain stem response                                      30 606
Body composition (DEXA lean/fat)                                  59 232
Body weight                                                      205 194
Challenge whole body plethysmography                              48 464
Clinical blood chemistry                                         152 217
Combined SHIRPA and dysmorphology                                158 199
Echo                                                               5 916
Electrocardiogram (ECG)                                           35 407
Eye morphology                                                   181 839
Grip strength                                                     71 298
Heart weight                                                      26 520
Hematology                                                        94 416
Indirect calorimetry                                             627 049
Insulin blood level                                                   38
Intraperitoneal glucose tolerance test (IPGTT)                    70 141
Open field                                                        34 416
Organs weight                                                     11 746
X-ray                                                            186 122
Total                                                          2 079 607
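As a small worked check, the per-center counts reported in Table 1 can be summed and compared with the stated total. The figures below are copied from Table 1; the helper names `total_matches` and `share` are introduced here only for illustration.

```python
# Data points per submitting center, as listed in Table 1 (September 2013).
CENTER_COUNTS = {
    "Baylor College of Medicine": 0,
    "Helmholtz Zentrum München": 75_662,
    "Institut Clinique de la Souris": 446_670,
    "MRC Harwell": 164_037,
    "The Jackson Laboratory": 142_221,
    "The Toronto Centre for Phenogenomics": 20_365,
    "University of California Davis": 59_190,
    "Wellcome Trust Sanger Institute": 1_171_462,
}
STATED_TOTAL = 2_079_607


def total_matches(counts, stated_total):
    """Check that the per-center counts add up to the stated total."""
    return sum(counts.values()) == stated_total


def share(counts, center):
    """Fraction of all data points contributed by one center."""
    return counts[center] / sum(counts.values())
```

The same check applies to the per-procedure counts in Table 2, which sum to the same total.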

The adoption of standardized phenotyping protocols across all of the participating mouse clinics requires that the same procedures be carried out under the same conditions specified by the protocol. These protocols have been agreed through active collaboration between the data wranglers (who also administer the contents of the IMPReSS database), the phenotyping centers and members of the scientific community.

The IMPReSS database maintains multiple phenotyping pipelines, where a ‘pipeline’ is simply an ordered sequence of phenotyping procedures to be carried out. This caters to specific circumstances where a center wishes to record and export supplementary data in addition to those that are required by the standard IMPC pipeline. It also allows incorporation of data collected using historic pipelines such as EUMODIC. The IMPReSS database uses the Mammalian Phenotype (MP) Ontology terms (12,14) to annotate procedures and parameters, e.g. the parameter ‘increased blood glucose concentration’ is annotated to ‘impaired glucose tolerance’ (MP:0005203). Through statistical analysis, these annotations translate numerical data points into text definitions of pheno-deviance (a statistically significant result indicating a phenotype different from that of a wild-type animal of the same background strain). A specific knockout line may have many different terms annotated to fully capture the phenotypes elicited by multiple SOPs, reflecting the complexity and variety of the SOPs applied.
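The relationship between pipelines, procedures, parameters and ontology annotations can be sketched as a simple data structure. The class names, the function `significant_terms` and the default significance threshold are assumptions made for illustration; they do not reproduce the IMPReSS schema. The MP identifier is used as printed in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Parameter:
    name: str
    mp_term: Optional[str] = None  # ontology term assigned when significant


@dataclass
class Procedure:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Pipeline:
    name: str
    # A pipeline is an ordered sequence of procedures, so list order matters.
    procedures: List[Procedure] = field(default_factory=list)


def significant_terms(pipeline, p_values, threshold=1e-4):
    """Return MP terms for parameters whose p-value is below the threshold.

    `p_values` maps parameter names to p-values from the statistical
    analysis; the threshold is an assumed value, not the IMPC setting.
    """
    terms = []
    for procedure in pipeline.procedures:
        for parameter in procedure.parameters:
            p = p_values.get(parameter.name)
            if p is not None and p < threshold and parameter.mp_term:
                terms.append(parameter.mp_term)
    return terms
```

A knockout line tested with several procedures would accumulate one term per significant annotated parameter, which is how a single line can carry many MP terms.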

DATA INTEGRATION AND ONTOLOGIES

The IMPC portal relies on publicly available data integrated in context for different categories of users. For example, the IKMC resource (4) provides information on ES cell availability, mouse repositories such as EMMA (15) provide access to mice, Ensembl (10) provides the genomic framework for each knockout, and Mouse Genome Informatics provides gene nomenclature and mouse ontology terms. Ontologies are widely used throughout the portal. Project-specific views, or slims, which provide a relevant subset of the Mammalian Phenotype Ontology and the Adult Mouse Anatomy Ontology (16), are used to annotate mutants, support online user queries and are built into the schema of our RESTful interface. Ontologies are stored locally in a dedicated part of the schema for ease of query and processing, and we expect to integrate terms mapping human anatomy and disease to data in the future.

STATISTICAL ANALYSIS

A major goal of the IMPC is to assign functions to protein-coding genes using high-throughput phenotyping assays and to extend the primary observations into specialized fields of research using additional secondary phenotyping screens. High-throughput phenotyping assays produce many different types of data that may be continuous, categorical or time-series numerical data, images or text descriptions of the parameters measured during the assay. Data generated from knockout mice are then subjected to statistical analysis, where the parameters measured during the assay are compared with the same parameters measured in parallel from control wild-type mice from an identical background strain. The experimental design also plays a fundamental role in the implementation of a robust and reproducible analysis of knockout phenotypic effects (17,18), which requires control selection to be given considerable attention.

To identify pheno-deviant lines, we have implemented, using the R statistical computing toolkit (http://www.r-project.org), a statistical analysis pipeline based on the comparison of each knockout line population with a wild-type control population from a well-defined genetic background (C57BL/6N). Continuous and time-series data are analyzed using a linear mixed model framework (17,19). Linear mixed models account for multiple sources of variability in a phenotype: some explanatory factors, such as sex, weight and knockout mutant genotype, are treated as fixed effects, while others, such as batch (measurements collected on a particular day), are treated as a source of random effects (for example, owing to laboratory conditions). We summarize time-series data (e.g. area under the curve or mean), and this summary is then used in the linear mixed model as a continuous variable. Categorical data are separable into mutually exclusive categories and describe qualitative attributes of the observed object. A Fisher exact test is performed on categorical data and provides a quantitative description of the differences between the knockout and wild-type populations. For each knockout line, we aim to analyze data for seven males and seven females. When a test is considered statistically significant, ontology terms from the Mammalian Phenotype Ontology (12) are automatically associated with the individual genotypes based on the association specified in IMPReSS for every parameter (14).
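Two of the steps described above, summarizing a time series by its area under the curve and applying a Fisher exact test to categorical counts, can be illustrated in a few lines. The IMPC pipeline itself is implemented in R; the Python sketch below is an independent illustration, the function names are hypothetical, and the two-sided test uses the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def area_under_curve(times, values):
    """Summarize a time series by its trapezoidal area under the curve."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time points and values")
    points = list(zip(times, values))
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    )


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    For example, rows could be knockout and wild-type animals and columns
    abnormal and normal observations.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
```

In this sketch, a glucose tolerance curve would first be reduced to a single area value per animal, which could then enter a mixed model as a continuous response, while a presence or absence call such as an eye abnormality would be tabulated and passed to the exact test.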

PROJECT TRACKING

The iMITS (http://www.mousephenotype.org/imits) is the central database for the planning and tracking of IMPC mouse production. The database contains the catalogs of all IKMC ES cell clones and IMPC mouse alleles, their detailed molecular structure and QC data that verify the mutant allele (5). Mutant cells and mice are made available to the scientific community on request via designated repositories. iMITS facilitates the distribution of these products by capturing information on the nominated distribution center(s) and providing appropriate order links.

IMPC mouse production centers cooperate to maximize production efficiency and avoid duplication of effort. Each IMPC production center registers the genes selected for production and phenotyping in the iMITS database. Conflicting intentions are flagged. Once an IKMC ES cell clone is microinjected, centers upload details of the microinjection experiments, onward breeding and progress of phenotype data collection and transfer. Actual and intended production is immediately displayed on gene pages in the IMPC portal, and the data are publicly available for browsing and downloading.

Summary iMITS ES allele and mouse production data are displayed on the IMPC portal, and detailed in-progress production information can be found by directly browsing the iMITS Web site (Figure 2). The iMITS infrastructure allows users to be notified by email on the status of the knockout mouse production by registering interest, as described earlier.
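The flagging of conflicting production intentions can be illustrated with a minimal sketch, assuming that a conflict simply means that more than one center has registered the same gene. The function `find_conflicts` and the center names are hypothetical, and the actual iMITS rules may be more detailed.

```python
from collections import defaultdict


def find_conflicts(intentions):
    """Return genes registered by more than one production center.

    `intentions` is an iterable of (center, gene_symbol) pairs; the result
    maps each conflicting gene to a sorted list of the centers involved.
    """
    centers_by_gene = defaultdict(set)
    for center, gene in intentions:
        centers_by_gene[gene].add(center)
    return {
        gene: sorted(centers)
        for gene, centers in centers_by_gene.items()
        if len(centers) > 1
    }
```

Repeated registrations of a gene by the same center are not treated as a conflict in this sketch, because the centers are collected in a set.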

LEGACY DATA

The IMPC portal consolidates data access to existing phenotyping data from the EuroPhenome and Sanger Mouse Genetics Project (MGP) pipelines. Where these data are available for a gene or phenotype of interest, their origin is clearly marked in the interface and links. To date, >115 million data points are available for legacy data. Genotype–phenotype associations from EuroPhenome and MGP are presented in Table 3 and classified by high-level mammalian phenotype terms. The inclusion of these data is key to the mission of the IMPC and the MPI2 consortium, which is to unify access to data and to provide a stable archive.

CONCLUSION

The IMPC Web Portal provides unique and unified access to mouse phenotyping data from multiple sources, including genomic, genotypic and phenotypic context from ontologies and the literature, and phenotypic images. Access is provided to data as soon as they are available and to existing legacy data. In future, we will support data access for new embryonic phenotyping pipelines, integrate public gene expression data and make the data more accessible to translational researchers by inclusion of queries for human orthologs, diseases and rare disease data. The statistical pipeline is likely to be refined as more phenotype data are produced, and data will regularly be examined to ensure high standards are maintained as new data are submitted. We invite users to register for data of interest via the interface or to sign up for usability or beta testing activities to improve the portal and provide input into future developments.

ACKNOWLEDGEMENTS

The authors thank the members of the IMPC consortium, the EMBL-EBI Industry Programme, staff of MRC Harwell, the Wellcome Trust Sanger Institute, Francis Rowland, Jennifer Cham, Sangya Pundir and the NIH KOMP2 program for their user feedback and participation in user experience sessions in which prototype interfaces were tested and improved. They are especially grateful to the users who have contacted them through their user support contact form and the biologists who have participated in our usability testing sessions to improve the portal. They also thank the IMPC Steering Committee and PSC for feedback on the interfaces. They thank Mary Todd Bergman and Spencer Phillips of the EBI for their assistance with figures for this article, and Hayley Protheroe and the WTSI Team 109 for providing mouse images.

FUNDING

The National Institutes of Health (NIH) [1 U54HG006370-01]; EMBL-EBI Core Funding (to H.P. and P.F.) and Wellcome Trust Core Funding (to W.C.S.). Funding for open access charge: NIH [1 U54HG006370-01].

Conflict of interest statement. None declared.

REFERENCES

1. Brown,S.D.M. and Moore,M.W. (2012) The International Mouse Phenotyping Consortium: past and future perspectives on mouse phenotyping. Mamm. Genome, 23, 632–640.

2. Brown,S.D.M. and Moore,M.W. (2012) Towards an encyclopaedia of mammalian gene function: the International Mouse Phenotyping Consortium. Dis. Models Mech., 5, 289–292.

3. Valenzuela,D.M., Murphy,A.J., Frendewey,D., Gale,N.W., Economides,A.N., Auerbach,W., Poueymirou,W.T., Adams,N.C., Rojas,J., Yasenchak,J. et al. (2003) High-throughput engineering of the mouse genome coupled with high-resolution expression analysis. Nat. Biotechnol., 21, 652–659.

4. Skarnes,W.C., Rosen,B., West,A.P., Koutsourakis,M., Bushell,W., Iyer,V., Mujica,A.O., Thomas,M., Harrow,J., Cox,T. et al. (2011) A conditional knockout resource for the genome-wide study of mouse gene function. Nature, 474, 337–342.

Table 3. Genotype–phenotype associations from legacy EuroPhenome and Sanger MGP available from the IMPC portal, September 2013

Mammalian phenotype high-level terms    Genotype–phenotype associations
Behavior/neurological phenotype         1268
Homeostasis/metabolism phenotype        1032
Growth/size phenotype                   724
Hematopoietic system phenotype          702
Skeleton phenotype                      450
Vision/eye phenotype                    441
Adipose tissue phenotype                135
Limbs/digits/tail phenotype             125
Craniofacial phenotype                  107
Cardiovascular system phenotype         57
Integument phenotype                    33
Nervous system phenotype                24
Pigmentation phenotype                  20
Immune system phenotype                 4
Reproductive system phenotype           4
Endocrine/exocrine gland phenotype      3
Digestive/alimentary phenotype          2
Total                                   5309

Associations are grouped by high-level mammalian phenotype ontology terms.

D808 Nucleic Acids Research, 2014, Vol. 42, Database issue

5. Ringwald,M., Iyer,V., Mason,J.C., Stone,K.R., Tadepally,H.D., Kadin,J.A., Bult,C.J., Eppig,J.T., Oakley,D.J., Briois,S. et al. (2011) The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium. Nucleic Acids Res., 39, D849–D855.

6. Morgan,H., Beck,T., Blake,A., Gates,H., Adams,N., Debouzy,G., Leblanc,S., Lengger,C., Maier,H., Melvin,D. et al. (2010) EuroPhenome: a repository for high-throughput mouse phenotyping data. Nucleic Acids Res., 38, D577–D585.

7. White,J.K., Gerdin,A.K., Karp,N.A., Ryder,E., Buljan,M., Bussell,J.N., Salisbury,J., Clare,S., Ingham,N.J., Podrini,C. et al. (2013) Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell, 154, 452–464.

8. Mallon,A.M., Iyer,V., Melvin,D., Morgan,H., Parkinson,H., Brown,S.D.M., Flicek,P. and Skarnes,W.C. (2012) Accessing data from the International Mouse Phenotyping Consortium: state of the art and future plans. Mamm. Genome, 23, 641–652.

9. Eppig,J.T., Blake,J.A., Bult,C.J., Kadin,J.A. and Richardson,J.E. (2012) The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse. Nucleic Acids Res., 40, D881–D886.

10. Flicek,P., Ahmed,I., Amode,M.R., Barrell,D., Beal,K., Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., Fairley,S. et al. (2013) Ensembl 2013. Nucleic Acids Res., 41, D48–D55.

11. Hochheiser,H., Aronow,B.J., Artinger,K., Beaty,T.H., Brinkley,J.F., Chai,Y., Clouthier,D., Cunningham,M.L., Dixon,M., Donahue,L.R. et al. (2011) The FaceBase Consortium: a comprehensive program to facilitate craniofacial research. Dev. Biol., 355, 175–182.

12. Smith,C.L., Goldsmith,C.A.W. and Eppig,J.T. (2005) The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol., 6, R7.

13. Simon,M.M., Greenaway,S., White,J.K., Fuchs,H., Gailus-Durner,V., Sorg,T., Wong,K., Bedu,E., Cartwright,E.J., Dacquin,R. et al. (2013) A comparative phenotypic and genomic analysis of C57BL/6J and C57BL/6N mouse strains. Genome Biol., 14, R82.

14. Beck,T., Morgan,H., Blake,A., Wells,S., Hancock,J.M. and Mallon,A.-M. (2009) Practical application of ontologies to annotate and analyse large scale raw mouse phenotype data. BMC Bioinf., 10(Suppl. 5), S2.

15. Wilkinson,P., Sengerova,J., Matteoni,R., Chen,C.K., Soulat,G., Ureta-Vidal,A., Fessele,S., Hagn,M., Massimi,M., Pickford,K. et al. (2010) EMMA - mouse mutant resources for the international scientific community. Nucleic Acids Res., 38, D570–D576.

16. Hayamizu,T.F., Mangan,M., Corradi,J.P., Kadin,J.A. and Ringwald,M. (2005) The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biol., 6, R29.

17. Karp,N.A., Baker,L.A., Gerdin,A.K.B., Adams,N.C., Ramírez-Solis,R. and White,J.K. (2010) Optimising experimental design for high-throughput phenotyping in mice: a case study. Mamm. Genome, 21, 467–476.

18. Karp,N.A., Melvin,D., Sanger Mouse Genetics Project and Mott,R.F. (2012) Robust and sensitive analysis of mouse knockout phenotypes. PLoS One, 7, e52410.

19. West,B., Welch,K.B. and Galecki,A.T. (2007) Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall/CRC, Boca Raton.


mutant mouse strains. The progress of mouse production and phenotyping is presented for each gene, with links to repositories that are distributing the mutant mouse strains, and the availability of IKMC ES cell resources and the molecular structures of the mutant alleles. Both mutant alleles and phenotype data are seamlessly integrated with existing community resources and databases. For instance, the Mouse Genome Informatics (http://www.informatics.jax.org) database is used for defining mouse genetic alleles (9) and Ensembl (http://www.ensembl.org) for defining genomic contexts (10).

In addition to the information made accessible through the web portal, the IMPC computational framework allows public access to the raw data using standard software interfaces and web services. The software application source code for these components and statistical analysis tools are also provided for community use.

THE IMPC WEB PORTAL

The services provided by the IMPC fulfill the needs of several user groups representing different strands of the biomedical community. These include external use cases 1–5 and project-specific use cases 6–7.

(1) The community of biomedical researchers accessing statistically significant phenotypic associations for a given gene, e.g. a rare disease researcher searching for specific phenotypes of interest.

(2) Researchers requiring mouse specimens or genetic material for which phenotype data are available, e.g. a researcher who wishes to conduct secondary experiments on a well-known gene to augment existing broad-based phenotype data.

(3) System biologists and statisticians seeking access to large-scale standardized gene–phenotype datasets to perform their own analysis.

(4) Informatics users accessing all or partial datasets for inclusion in their own resource set, e.g. project FaceBase (https://www.facebase.org) (11), which catalogs animal models related to craniofacial abnormalities.

(5) High-throughput phenotyping centers producing and exporting their data, e.g. KOMP2 or IMPC members.

(6) Data wranglers (dedicated experts in reviewing and quality control (QC) of phenotypic data) carrying out QC checks on the raw data, e.g. correction of data submission errors, detection of baseline drift due to instrumentation, harmonization and standardization of protocols across centers, among others.

(7) Funding bodies that are tracking the progress of mouse production and phenotyping efforts and the state of production, collection and dissemination of the data.

Given the diverse user groups that the IMPC services are targeting, we have created a range of specialist software tools for data analysis and dissemination that cater to these specific requirements. Personas are created for user groups, and extensive usability testing of new interface features is performed. User feedback is gathered via deployment of a beta testing site, and testing is conducted with small groups of users to ensure the components meet their needs. Applications are developed over a unifying framework of distributed computational resources and databases and delivered via the web portal, which we discuss in detail later.

Data access

The web portal’s primary function is to display genotype–phenotype data for knockout lines for the biomedical community. The site has been optimized to allow free text queries that return structured data via facets, allowing the user to explore the data with a mixture of query terms. For example, a query for ‘Pfn1’ returns a summary page with a link to the gene and indicates the production status and availability of phenotype data of mouse strains carrying a null allele of the gene. Users are able to register for genes of interest and will be alerted via email when the gene changes status, indicating new data are available or a mouse is available. A query for a general term, ‘glucose’, returns a summary page with a list of matches based on the results for glucose-related genes, phenotyping protocols measuring glucose function and glucose-related phenotypes. Similarly, anatomical queries such as ‘eye’ return results for genes, protocols and phenotypes, as well as relevant images indexed by their annotations.

Phenotype data are obtained from the IMPC web portal by several routes that are tailored to the user groups described earlier. Dedicated gene pages contain phenotype association tables that list statistically significant phenotypes that occur in mouse strains carrying null mutations of the given gene (Figure 1). One use case identified among different types of users was the need to allow immediate access to data before a strain has completed phenotyping. Therefore, data are uploaded as soon as they are quality controlled after export from the centers and are available on gene pages with the status ‘IMPC Phenotyping Status: Started’. Users can navigate to underlying data supporting these assertions by clicking on graph icons. The gene page also contains a list of ES cells and mice available for this gene, with links to the detailed molecular structure of the allele and links to repositories that distribute the materials (Figure 2).

Phenotype pages present a list of mutant mouse strains associated with a phenotype described using Mammalian Phenotype Ontology terms (12). The assigned terms reflect the phenotyping procedures, which assay anatomy, behavior, blood chemistry, etc. of the mutant line. For example, the ‘corneal opacity’ page, available at http://www.mousephenotype.org/data/phenotypes/MP:0001314, displays a list of all mouse strains in the database associated with this MP term. A selected representation of the abnormal phenotype is provided when images are available.

Public access to data and code repository

We have identified user groups requiring programmatic access to data. We provide a Simple Object Access Protocol (SOAP) web service to retrieve the diverse standard operating procedures (SOPs) from the IMPC pipeline and legacy protocols (http://www.mousephenotype.org/impress/soap/server?wsdl) and a RESTful interface to the mouse alleles, experimental results and genotype–phenotype associations from the statistical analysis. We also provide both RESTful and BioMart interfaces to access details of mouse production from the portal (http://www.mousephenotype.org/imits). The portal code is distributed under the Apache v2 software license and available on GitHub (https://github.com/mpi2) and supported by user documentation (https://github.com/mpi2/PhenotypeArchive/wiki). The project uses an agile development approach and delivers new software releases via the portal and supporting code releases every month.
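As an illustration of how a client might address a RESTful interface of this kind, the following minimal sketch builds a query URL for genotype–phenotype associations. The base URL (on example.org) and the parameter names `gene`, `mp_term` and `rows` are placeholders invented for this example; they are not the portal's documented API, which should be taken from the user documentation.

```python
from urllib.parse import urlencode

# Placeholder endpoint: NOT the real IMPC REST address.
BASE_URL = "https://example.org/rest/genotype-phenotype"


def build_query_url(gene_symbol, mp_term_id=None, rows=10):
    """Build a query URL for associations of one gene.

    All parameter names are hypothetical and chosen for readability.
    """
    params = {"gene": gene_symbol, "rows": rows}
    if mp_term_id is not None:
        # Optionally restrict results to one Mammalian Phenotype term.
        params["mp_term"] = mp_term_id
    # urlencode percent-escapes reserved characters such as ':'.
    return BASE_URL + "?" + urlencode(params)


print(build_query_url("Cib2", "MP:0001314"))
```

Keeping URL construction in one function means that only the base URL and parameter names need to change once the real interface is known.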

DATA ACQUISITION AND QC

Phenotype data produced by the centers are first recorded in local laboratory information management systems managed by the individual phenotyping centers. However, for these data to become part of the IMPC dataset, the data must be captured by rigorously following a well-defined set of SOPs. Furthermore, the recorded measurements must conform to a standardized specification, which includes the unit of measurement, the number of measurements to be taken and other essential metadata. Within the IMPC consortium, phenotyping experiments are referred to as ‘procedures’ and the set of measurements produced by a procedure as ‘parameters’. The SOPs and the specifications for each of the procedures and parameters are stored in the IMPReSS database (see later).

Figure 1. A view of the gene and phenotype data for Cib2, a calcium and integrin-binding family protein. The phenotype heatmap shows significant phenotypes for auditory and brainstem and behavioral tests (P < 0.0001). Users can explore underlying data by clicking on phenotype names. The graph shows Cib2 homozygous knockout animals have impaired response to sound stimulus, indicating a significant hearing defect, as well as abnormal ear morphology. A stock image of an abnormal ear is provided for reference.

Phenotype data collection, validation and dissemination

When the data are ready for collection and collation, the phenotyping centers export their data as Extensible Markup Language (XML) documents (http://www.w3.org/TR/REC-xml). These are documents that conform to the standardized data exchange format defined by the IMPC consortium using the XML Schema Definition Language (XSD) specified by the W3C consortium (http://www.w3.org/TR/xmlschema11-1, http://www.w3.org/TR/xmlschema11-2). The IMPC Data Coordination Centre at MRC Harwell then downloads these documents. The provenance and chain-of-custody of the data is managed using a data tracker, not presented here. As shown in Figure 3, data processing happens in three main phases to ensure the highest level of data integrity and traceability. In the first phase, the data exported by the mouse clinics are validated against the required procedure and parameter specifications as defined in the SOPs (Data Coordination Centre component), and the supplied values are checked against the corresponding context-specific databases, e.g. check for existence of a mouse strain in the IMPC Mouse Tracking System (iMITS).

In the second phase, validated data are incorporated into the centralized dataset, and additional processing is carried out to prepare the data for effective visualization and statistical analysis. The data are then made available to the data wranglers for QC checks and also to researchers for preliminary data analysis. In the third and final stage, data that have passed QC are sent to the Central Data Archive at EMBL-EBI, where they are made available as curated phenotype data. The pipeline is designed to ensure data are publicly available as quickly as possible to the users of the portal.
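The first-phase checks described above can be sketched as follows. This is a minimal illustration and not the Data Coordination Centre implementation: the XML element and attribute names, the `SPEC` table (standing in for IMPReSS specifications) and the `KNOWN_STRAINS` set (standing in for an iMITS lookup) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical parameter specifications; the real ones are held in IMPReSS.
SPEC = {
    "BODY_WEIGHT": {"unit": "g", "required": True},
    "GRIP_STRENGTH": {"unit": "g", "required": False},
}
# Hypothetical stand-in for a strain lookup in iMITS.
KNOWN_STRAINS = {"STRAIN_001"}


def validate_export(xml_text):
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    root = ET.fromstring(xml_text)
    for specimen in root.findall("specimen"):
        # Context check: the strain must exist in the tracking system.
        strain = specimen.get("strain")
        if strain not in KNOWN_STRAINS:
            errors.append(f"unknown strain: {strain}")
        seen = set()
        for m in specimen.findall("measurement"):
            pid = m.get("parameter")
            seen.add(pid)
            spec = SPEC.get(pid)
            if spec is None:
                errors.append(f"unknown parameter: {pid}")
            elif m.get("unit") != spec["unit"]:
                errors.append(
                    f"{pid}: expected unit {spec['unit']}, got {m.get('unit')}"
                )
        # Specification check: every required parameter must be present.
        for pid, spec in SPEC.items():
            if spec["required"] and pid not in seen:
                errors.append(f"missing required parameter: {pid}")
    return errors
```

Returning a list of errors rather than raising on the first one lets a center see and correct every problem in a submission at once.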

Quality control

The IMPC aims to provide the highest quality data to the biomedical community. QC checks are performed in addition to the checks performed at the mouse clinics. The QC process involves identifying anomalies in the submitted data. The aim is to remove data entry and communication errors before the measurements undergo extensive statistical analysis. Some of the QC issues identified are missing data for required parameters, missing wild-type measurements, duplicate measurements, measurements with wrong units, unexpected values (e.g. 0 or negative body weight), out-of-bounds values and outliers, among others. These are then communicated to the phenotyping centers, which either fix the issue by correcting the error or provide an explanation. All of the identified issues are captured and managed using custom QC tools. QC tools provide the users (mouse centers and data wranglers) with an integrated workbench for visualization, analysis, identification and resolution of QC issues. By providing an interactive web application that is designed specifically for the visualization of mouse phenotype data, we are able to streamline the workflow.

Figure 2. iMITS (https://www.mousephenotype.org/imits) stores and provides summary and detailed production information. Users can view high-level allele information on the IMPC portal gene pages. The iMITS tab of the IMPC portal shows detailed IMPC production information, e.g. for Sdha. Information to access iMITS is provided on the IMPC homepage.
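Several of the QC issues listed above lend themselves to simple automated checks. The sketch below covers duplicate measurements, wrong units, non-positive body weight and missing required parameters. The record layout and the parameter name `body_weight` are assumptions made for illustration; the custom IMPC QC tools are not described at this level of detail here.

```python
from collections import Counter


def qc_issues(records, required, expected_units):
    """Flag simple QC issues in a list of measurement records.

    records: list of dicts with keys 'specimen', 'parameter',
             'value' and 'unit' (a hypothetical layout).
    required: set of parameter names every specimen must report.
    expected_units: dict mapping parameter name to expected unit.
    Returns a list of (specimen, parameter, issue) tuples.
    """
    issues = []
    # Duplicates: the same specimen/parameter pair reported twice or more.
    counts = Counter((r["specimen"], r["parameter"]) for r in records)
    for (specimen, parameter), n in sorted(counts.items()):
        if n > 1:
            issues.append((specimen, parameter, "duplicate measurement"))
    for r in records:
        key = (r["specimen"], r["parameter"])
        if r["unit"] != expected_units.get(r["parameter"]):
            issues.append((*key, "wrong unit"))
        if r["parameter"] == "body_weight" and r["value"] <= 0:
            issues.append((*key, "non-positive body weight"))
    # Missing data for required parameters, checked per specimen.
    for specimen in sorted({r["specimen"] for r in records}):
        present = {r["parameter"] for r in records if r["specimen"] == specimen}
        for parameter in sorted(required - present):
            issues.append((specimen, parameter, "missing required parameter"))
    return issues
```

Each issue carries the specimen and parameter it refers to, so it can be reported back to the submitting center for correction or explanation.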

Data availability

Phenotype data collection started in early 2013, and to date 2 079 607 data points at eight mouse clinics are available and are undergoing QC before export for archiving (Table 1). This excludes legacy data that are already archived and available for query. Data for 19 different phenotyping procedures are available in the IMPReSS database (Table 2). Mouse production of IKMC alleles has been tracked since 2008. To date, ES cell microinjections that have produced >3000 mouse lines are recorded.

SOPS AND PROTOCOLS

IMPC provides high-quality phenotype data by following rigorous data collection processes. This is achieved by reducing exposure to human error via semi-automated data collection and validation processes. The application of such procedures across multiple centers allows reliable detection of subtle phenotypes, e.g. in the broad-based phenotyping of C57BL/6J and C57BL/6N mouse strains (13). Part of this automation is made possible due to the IMPC protocols, which consist of the SOPs and the procedure and parameter specifications. These protocols are maintained in a form that is both human readable and machine consumable. The IMPC protocols are available from the International Mouse Phenotyping Resource of Standardized Screens (IMPReSS), one of the services provided by the IMPC infrastructure. Machine-readable web services (https://www.mousephenotype.org/impress/soap/server?wsdl) are used in the validation processes discussed in the data acquisition and QC section, and a dedicated relational database stores SOPs.

Figure 3. A schematic overview of data flows into the web portal for IMPC data. Currently, eight mouse clinics are involved in IMPC and produce phenotype data. These are then collected, validated and processed to produce curated data available from the project portal. Legacy data from EuroPhenome and Sanger MGP were directly transferred to the Central Data Archive at EMBL-EBI for direct integration on the portal.

Table 1. Mouse phenotyping data points by submitting center, September 2013

Mouse clinics                           Number of data points
Baylor College of Medicine              0
Helmholtz Zentrum München               75 662
Institut Clinique de la Souris          446 670
MRC Harwell                             164 037
The Jackson Laboratory                  142 221
The Toronto Centre for Phenogenomics    20 365
University of California, Davis         59 190
Wellcome Trust Sanger Institute         1 171 462
Total                                   2 079 607

Table 2. Mouse phenotyping data points by SOP, September 2013

Procedures                                          Number of data points
Acoustic startle and pre-pulse inhibition (PPI)     80 787
Auditory brain stem response                        30 606
Body composition (DEXA lean/fat)                    59 232
Body weight                                         205 194
Challenge whole body plethysmography                48 464
Clinical blood chemistry                            152 217
Combined SHIRPA and dysmorphology                   158 199
Echo                                                5916
Electrocardiogram (ECG)                             35 407
Eye morphology                                      181 839
Grip strength                                       71 298
Heart weight                                        26 520
Hematology                                          94 416
Indirect calorimetry                                627 049
Insulin blood level                                 38
Intraperitoneal glucose tolerance test (IPGTT)      70 141
Open field                                          34 416
Organs weight                                       11 746
X-ray                                               186 122
Total                                               2 079 607

The adoption of standardized phenotyping protocols across all of the participating mouse clinics requires that the same procedures be carried out under the same conditions specified by the protocol. These protocols have been agreed through active collaboration between the data wranglers (who also administer the contents of the IMPReSS database), the phenotyping centers and members of the scientific community.

The IMPReSS database maintains multiple phenotyping pipelines, where a ‘pipeline’ is simply an ordered sequence of phenotyping procedures to be carried out. This caters to specific circumstances where a center wishes to record and export supplementary data in addition to those that are required by the standard IMPC pipeline. This allows incorporation of data collected using historic pipelines such as EUMODIC. The IMPReSS database uses the Mammalian Phenotype (MP) Ontology terms (12,14) to annotate procedures and parameters, e.g. the parameter ‘increased blood glucose concentration’ is annotated to ‘impaired glucose tolerance’ (MP:0005203). Through statistical analysis and the term annotated to the SOP, a numerical data point is converted into a text definition of pheno-deviance (a statistically significant result indicating a phenotype different from a wild-type animal of the same background strain). A specific knockout line may have many different terms annotated to fully capture the phenotypes elicited by multiple SOPs, reflecting the complexity and variety of the SOPs applied.
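The relationship between pipelines, procedures and ontology annotations can be pictured with a small data structure. The sketch below is illustrative only: the pipeline contents, the parameter key and the significance threshold `alpha` are assumptions, and only the MP term and its label are taken from the example in the text.

```python
# A pipeline is an ordered sequence of procedures (contents are hypothetical).
PIPELINE = ["body_weight", "ipgtt", "grip_strength"]

# Annotation table: (parameter, direction of change) -> (MP id, MP label).
# The single entry mirrors the example given in the text.
ANNOTATIONS = {
    ("blood_glucose_concentration", "increased"): (
        "MP:0005203",
        "impaired glucose tolerance",
    ),
}


def call_phenotype(parameter, direction, p_value, alpha):
    """Return the annotated MP term if the result is significant.

    alpha is a caller-supplied threshold; no value is implied here.
    Returns None when the result is not significant or not annotated.
    """
    if p_value >= alpha:
        return None
    return ANNOTATIONS.get((parameter, direction))
```

Because the annotation is stored against the parameter rather than computed from the data, the same statistical result always yields the same ontology term.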

DATA INTEGRATION AND ONTOLOGIES

The IMPC portal relies on publicly available data integrated in context for different categories of users. For example, the IKMC resource (4) provides information on ES cell availability; mouse repositories such as EMMA (15) provide access to mice; Ensembl (10) provides the genomic framework for each knockout; and Mouse Genome Informatics provides gene nomenclature and mouse ontology terms. Ontologies are widely used throughout the portal. Project-specific views, or slims, which provide a relevant subset of the Mammalian Phenotype Ontology and the Adult Mouse Anatomy Ontology (16), are used to annotate mutants and support online user queries, and are built into the schema of our RESTful interface. Ontologies are stored locally in a dedicated part of the schema for ease of query and processing, and we expect to integrate terms mapping human anatomy and disease to data in the future.

STATISTICAL ANALYSIS

A major goal of the IMPC is to assign functions to protein-coding genes using high-throughput phenotyping assays and to extend the primary observations into specialized fields of research using additional secondary phenotyping screens. High-throughput phenotyping assays produce many different types of data that may be continuous, categorical or time-series numerical data, images or text descriptions of the parameters measured during this assay. Data generated from knockout mice are then subjected to statistical analysis, where the parameters measured during the assay are compared with the same parameters measured in parallel from control wild-type mice from an identical background strain. The experimental design also plays a fundamental role in the implementation of a robust and reproducible analysis of knockout phenotypic effects (17,18), which requires control selection to be given considerable attention.

To identify pheno-deviant lines, we have implemented, using the R statistical computing toolkit (http://www.r-project.org), a statistical analysis pipeline based on the comparison of each knockout line population with a wild-type control population from a well-defined genetic background (C57BL/6N). Continuous and time-series data are analyzed using a linear mixed model framework (17,19). Linear mixed models model multiple sources of variability on a phenotype, where some explanatory factors such as sex, weight and knockout mutant genotype are assumed to take fixed values, while others such as batch (measurements collected on a particular day) will be a source of random effect (for example, owing to laboratory conditions). We summarize time-series data (e.g. area under the curve or mean), and this variable is then used in the linear mixed model as a continuous variable. Categorical data contain data separable in mutually exclusive categories and deal with qualitative attributes of the observed object. A Fisher exact test is performed on categorical data and provides a quantitative description of the differences between the knockout and wild-type populations. For each knockout line, we aim to analyze data for seven males and seven females. When a test is considered statistically significant, ontology terms from the Mammalian Phenotype Ontology (12) are automatically associated to the individual genotypes based on association specified in IMPReSS for every parameter (14).
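To make the categorical comparison concrete, the sketch below computes a two-sided Fisher exact test for a 2 × 2 table of knockout versus wild-type counts. It is written in plain Python for illustration only; the IMPC pipeline itself is implemented in R, and the table layout and the choice of a two-sided test are assumptions of this example.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Rows could be knockout and wild-type, columns abnormal and normal.
    The p-value sums the hypergeometric probabilities of all tables
    with the same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability of x counts in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Seven knockout animals all abnormal, seven wild-type animals all normal.
p_value = fisher_exact_2x2(7, 0, 0, 7)
print(p_value)
```

With seven animals per group, even complete separation between knockout and wild-type gives a p-value of 2/3432, which shows how group size bounds the significance a categorical test can reach.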

PROJECT TRACKING

The iMITS (http://www.mousephenotype.org/imits) is the central database for the planning and tracking of IMPC mouse production. The database contains the catalogs of all IKMC ES cell clones and IMPC mouse alleles, their detailed molecular structure and QC data that verify the mutant allele (5). Mutant cells and mice are made available to the scientific community on request via designated repositories. iMITS facilitates the distribution of these products by capturing information on the nominated distribution center(s) and providing appropriate order links. IMPC mouse production centers cooperate to maximize production efficiency and avoid duplication of effort. Each IMPC production center registers the genes selected for production and phenotyping in the iMITS database. Conflicting intentions are flagged. Once an IKMC ES cell clone is microinjected, centers upload details of the microinjection experiments, onward breeding and progress of phenotype data collection and transfer. Actual and intended production is immediately displayed on gene pages in the IMPC portal, and the data are publicly available for browsing and downloading.

Summary iMITS ES allele and mouse production data are displayed on the IMPC portal, and detailed in-progress production information can be found by directly browsing the iMITS Web site (Figure 2). The iMITS infrastructure allows users to be notified by email on the status of the knockout mouse production by registering interest as described earlier.
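The flagging of conflicting intentions can be sketched as a simple grouping step. The rule used here, that a conflict arises when more than one center registers the same gene, is an assumption made for illustration; the criteria applied by iMITS are not specified in the text, and the center and gene names are placeholders.

```python
from collections import defaultdict


def flag_conflicts(intentions):
    """Group registered intentions by gene and report overlaps.

    intentions: iterable of (center, gene) pairs.
    Returns {gene: sorted list of centers} for every gene that more
    than one center has registered (assumed definition of a conflict).
    """
    centers_by_gene = defaultdict(set)
    for center, gene in intentions:
        centers_by_gene[gene].add(center)
    return {
        gene: sorted(centers)
        for gene, centers in centers_by_gene.items()
        if len(centers) > 1
    }


registered = [
    ("Center A", "Cib2"),
    ("Center B", "Cib2"),
    ("Center A", "Pfn1"),
]
print(flag_conflicts(registered))
```

Using a set per gene means that a center registering the same gene twice is not reported as a conflict with itself.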

LEGACY DATA

The IMPC portal consolidates data access to existing phenotyping data from the EuroPhenome and Sanger Mouse Genetics Project (MGP) pipelines. Where these data are available for a gene or phenotype of interest, their origin is clearly marked in the interface and links. To date, >115 million data points are available for legacy data. Genotype–phenotype associations from EuroPhenome and MGP are presented in Table 3 and classified by high-level mammalian phenotype terms. The inclusion of these data is the key to the mission of the IMPC and the MPI2 consortium, which is to unify access to data and to provide a stable archive.

CONCLUSION

The IMPC Web Portal provides unique and unified access to mouse phenotyping data from multiple sources.

including genomic genotypic and phenotypic contextfrom ontologies and the literature and phenotypicimages Access is provided to data as soon as it is avail-able and for existing legacy data In future we willsupport data access for new embryonic phenotyping pipe-lines integrate public gene expression data and make thedata more accessible to translational researchers by inclu-sion of queries for human orthologs diseases and raredisease data The statistical pipeline is likely to berefined as more phenotype data are produced and datawill regularly be examined to ensure high standards aremaintained as new data are submitted We invite users toregister for data of interest via the interface or to sign upfor usability or beta testing activities to improve theportal and provide input into future developments

ACKNOWLEDGEMENTS

The authors thank the members of the IMPC consortiumthe EMBL-EBI Industry Programme Staff of MRCHarwell the Wellcome Trust Sanger Institute FrancisRowland Jennifer Cham Sangya Pundir and the NIHKOMP2 program for their user feedback and participa-tion in user experience sessions in which prototype inter-faces were tested and improved They are especiallygrateful to the users who have contacted them throughtheir user support contact form and the biologists whohave participated in our usability testing sessions toimprove the portal They also thank the IMPC SteeringCommittee and PSC for feedback on the interfaces Theythank Mary Todd Bergman and Spencer Phillips of theEBI for their assistance with figures for this article andHayley Protheroe and the WTSI Team 109 for providingmouse images

FUNDING

The National Institutes of Health (NIH) [1 U54HG006370-01] EMBL-EBI Core Funding (to HP andPF) and Wellcome Trust Core Funding (to WCS)Funding for open access charge NIH [1 U54HG006370-01]

Conflict of interest statement None declared

REFERENCES

1 BrownSDM and MooreMW (2012) The International MousePhenotyping Consortium past and future perspectives on mousephenotyping Mamm Genome 23 632ndash640

2 BrownSDM and MooreMW (2012) Towards anencyclopaedia of mammalian gene function the InternationalMouse Phenotyping Consortium Dis Models Mech 5 289ndash292

3 ValenzuelaDM MurphyAJ FrendeweyD GaleNWEconomidesAN AuerbachW PoueymirouWT AdamsNCRojasJ YasenchakJ et al (2003) High-throughput engineeringof the mouse genome coupled with high-resolution expressionanalysis Nat Biotechnol 21 652ndash659

4 SkarnesWC RosenB WestAP KoutsourakisM BushellWIyerV MujicaAO ThomasM HarrowJ CoxT et al (2011)A conditional knockout resource for the genome-wide study ofmouse gene function Nature 474 337ndash342

Table 3 Genotypendashphenotype associations from legacy EuroPhenome

and Sanger MGP available from the IMPC portal September 2013

Mammalian phenotype high-level terms Genotypendashphenotypeassociations

Behaviorneurological phenotype 1268Homeostasismetabolism phenotype 1032Growthsize phenotype 724Hematopoietic system phenotype 702Skeleton phenotype 450Visioneye phenotype 441Adipose tissue phenotype 135Limbsdigitstail phenotype 125Craniofacial phenotype 107Cardiovascular system phenotype 57Integument phenotype 33Nervous system phenotype 24Pigmentation phenotype 20Immune system phenotype 4Reproductive system phenotype 4Endocrineexocrine gland phenotype 3Digestivealimentary phenotype 2Total 5309

Associations are grouped by high-level mammalian phenotype ontologyterms

D808 Nucleic Acids Research 2014 Vol 42 Database issue

5 RingwaldM IyerV MasonJC StoneKR TadepallyHDKadinJA BultCJ EppigJT OakleyDJ BrioisS et al(2011) The IKMC web portal a central point of entry to dataand resources from the International Knockout MouseConsortium Nucleic Acids Res 39 D849ndashD855

6 MorganH BeckT BlakeA GatesH AdamsN DebouzyGLeblancS LenggerC MaierH MelvinD et al (2010)EuroPhenome a repository for high-throughput mousephenotyping data Nucleic Acids Res 38 D577ndashD585

7 WhiteJK GerdinAK KarpNA RyderE BuljanMBussellJN SalisburyJ ClareS InghamNJ PodriniC et al(2013) Genome-wide generation and systematic phenotyping ofknockout mice reveals new roles for many genes Cell 154452ndash464

8 MallonAM IyerV MelvinD MorganH ParkinsonHBrownSDM FlicekP and SkarnesWC (2012) Accessing datafrom the International Mouse Phenotyping Consortium state ofthe art and future plans Mamm Genome 23 641ndash652

9 EppigJT BlakeJA BultCJ KadinJA and RichardsonJE(2012) The Mouse Genome Database (MGD) comprehensiveresource for genetics and genomics of the laboratory mouseNucleic Acids Res 40 D881ndashD886

10 FlicekP AhmedI AmodeMR BarrellD BealK BrentSCarvalho-SilvaD ClaphamP CoatesG FairleyS et al (2013)Ensembl 2013 Nucleic Acids Res 41 D48ndashD55

11 HochheiserH AronowBJ ArtingerK BeatyTHBrinkleyJF ChaiY ClouthierD CunninghamMLDixonM DonahueLR et al (2011) The FaceBase Consortiuma comprehensive program to facilitate craniofacial researchDev Biol 355 175ndash182

12 SmithCL GoldsmithCAW and EppigJT (2005) TheMammalian Phenotype Ontology as a tool for annotating analyzingand comparing phenotypic information Genome Biol 6 R7

13 SimonMM GreenawayS WhiteJK FuchsHGailus-DurnerV SorgT WongK BeduE CartwrightEJDacquinR et al (2013) A comparative phenotypic and genomicanalysis of C57BL6J and C57BL6N mouse strains GenomeBiol 14 R82

14 BeckT MorganH BlakeA WellsS HancockJM andMallonA-M (2009) Practical application of ontologies toannotate and analyse large scale raw mouse phenotype dataBMC Bioinf 10(Suppl 5) S2

15 WilkinsonP SengerovaJ MatteoniR ChenCK SoulatGUreta-VidalA FesseleS HagnM MassimiM PickfordK et al(2010) EMMAmdashmouse mutant resources for the internationalscientific community Nucleic Acids Res 38 D570ndashD576

16 HayamizuTF ManganM CorradiJP KadinJA andRingwaldM (2005) The Adult Mouse Anatomical Dictionary atool for annotating and integrating data Genome Biol 6 R29

17 KarpNA BakerLA GerdinAKB AdamsNCRamırez-SolisR and WhiteJK (2010) Optimising experimentaldesign for high-throughput phenotyping in mice a case studyMamm Genome 21 467ndash476

18 KarpNA MelvinD Sanger Mouse Genetics Project andMottRF (2012) Robust and sensitive analysis of mouseknockout phenotypes PLoS One 7 e52410

19 WestB WelchKB and GaleckiAT (2007) Linear MixedModels A Practical Guide Using Statistical SoftwareChapman amp HallCRC Boca Raton

Nucleic Acids Research 2014 Vol 42 Database issue D809

pipeline and legacy protocols (http://www.mousephenotype.org/impress/soap/server?wsdl) and a RESTful interface to the mouse alleles, experimental results and genotype–phenotype associations from the statistical analysis. We also provide both RESTful and BioMart interfaces to access details of mouse production from the portal (http://www.mousephenotype.org/imits). The portal code is distributed under the Apache v2 software license and available on GitHub (https://github.com/mpi2) and supported by user documentation (https://github.com/mpi2/PhenotypeArchive/wiki). The project uses an agile development approach and delivers new software releases via the portal and supporting code releases every month.

DATA ACQUISITION AND QC

Phenotype data produced by the centers are first recorded in local laboratory information management systems managed by the individual phenotyping centers. However, for these data to become part of the IMPC dataset, the data must be captured by rigorously following a well-defined set of SOPs. Furthermore, the recorded measurements must conform to a standardized specification, which includes the unit of measurement, the number of measurements to be taken and other essential metadata. Within the IMPC consortium, phenotyping experiments are referred to as 'procedures' and the set of measurements produced by a procedure as 'parameters'. The SOPs and the specifications for each of the procedures and parameters are stored in the IMPReSS database (see later).

Figure 1. A view of the gene and phenotype data for Cib2, a calcium and integrin-binding family protein. The phenotype heatmap shows significant phenotypes for auditory and brainstem and behavioral tests (P < 0.0001). Users can explore underlying data by clicking on phenotype names. The graph shows Cib2 homozygous knockout animals have impaired response to sound stimulus, indicating a significant hearing defect, as well as abnormal ear morphology. A stock image of an abnormal ear is provided for reference.

Phenotype data collection, validation and dissemination

When the data are ready for collection and collation, the phenotyping centers export their data as Extensible Markup Language (XML) documents (http://www.w3.org/TR/REC-xml). These are documents that conform to the standardized data exchange format defined by the IMPC consortium using the XML Schema Definition Language (XSD) specified by the W3C consortium (http://www.w3.org/TR/xmlschema11-1, http://www.w3.org/TR/xmlschema11-2). The IMPC Data Coordination Centre at MRC Harwell then downloads these documents. The provenance and chain-of-custody of the data are managed using a data tracker, not presented here. As shown in Figure 3, data processing happens in three main phases to ensure the highest level of data integrity and traceability. In the first phase, the data exported by the mouse clinics are validated against the required procedure and parameter specifications as defined in the SOPs (Data Coordination Centre component), and the supplied values are checked against the corresponding context-specific databases, e.g. a check for the existence of a mouse strain in the IMPC Mouse Tracking System (iMITS).

In the second phase, validated data are incorporated into the centralized dataset, and additional processing is carried out to prepare the data for effective visualization and statistical analysis. The data are then made available to the data wranglers for QC checks and also to researchers for preliminary data analysis. In the third and final phase, data that have passed QC are sent to the Central Data Archive at EMBL-EBI, where they are made available as curated phenotype data. The pipeline is designed to ensure data are publicly available as quickly as possible to the users of the portal.
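The first-phase validation described in this section can be illustrated with a short Python sketch. It is not the IMPC implementation: the XML element and attribute names (`experiment`, `parameter`, `strain`, `name`, `unit`), the `PARAMETER_SPECS` table and the `KNOWN_STRAINS` set standing in for an iMITS lookup are all hypothetical, chosen only to show the two kinds of check (against the parameter specification and against a context-specific database).

```python
import xml.etree.ElementTree as ET

# Hypothetical parameter specifications; in the portal these are held in IMPReSS.
PARAMETER_SPECS = {
    "body_weight": {"unit": "g", "required": True},
    "heart_weight": {"unit": "mg", "required": False},
}

# Hypothetical stand-in for a strain lookup in iMITS.
KNOWN_STRAINS = {"C57BL/6N"}


def validate_submission(xml_text):
    """Return a list of validation errors for one submitted document (empty if valid)."""
    errors = []
    root = ET.fromstring(xml_text)
    for experiment in root.iter("experiment"):
        # Context-specific check: the strain must exist in the tracking system.
        strain = experiment.get("strain")
        if strain not in KNOWN_STRAINS:
            errors.append(f"unknown strain: {strain}")
        seen = set()
        for param in experiment.iter("parameter"):
            name = param.get("name")
            spec = PARAMETER_SPECS.get(name)
            if spec is None:
                errors.append(f"unknown parameter: {name}")
                continue
            seen.add(name)
            # Specification check: the unit must match the one in the specification.
            if param.get("unit") != spec["unit"]:
                errors.append(f"wrong unit for {name}: {param.get('unit')}")
        # Specification check: every required parameter must be present.
        for name, spec in PARAMETER_SPECS.items():
            if spec["required"] and name not in seen:
                errors.append(f"missing required parameter: {name}")
    return errors
```

A document that passes returns an empty list and could move on to the second phase; any other result would be reported back to the submitting center.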

Quality control

The IMPC aims to provide the highest quality data to the biomedical community. QC checks are performed in addition to the checks performed at the mouse clinics. The QC process involves identifying anomalies in the submitted data. The aim is to remove data entry and communication errors before the measurements undergo extensive statistical analysis. Some of the QC issues identified are missing data for required parameters, missing wild-type measurements, duplicate measurements, measurements with wrong units, unexpected values (e.g. 0 or negative body weight), out-of-bounds values and outliers, among others. These are then communicated to the phenotyping centers, which either fix the issue by correcting the error or provide an explanation. All of the identified issues are captured and managed using custom QC tools. QC tools provide the users (mouse centers and data wranglers) with an integrated workbench for visualization, analysis, identification and resolution of QC issues. By providing an interactive web application that is designed specifically for the visualization of mouse phenotype data, we are able to streamline the workflow.

Figure 2. iMITS (https://www.mousephenotype.org/imits) stores and provides summary and detailed production information. Users can view high-level allele information on the IMPC portal gene pages. The iMITS tab of the IMPC portal shows detailed IMPC production information, e.g. for Sdha. Information to access iMITS is provided on the IMPC homepage.
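Several of the QC issues listed in this section are simple rule checks, which the following Python sketch illustrates. It is an assumption-laden example rather than the IMPC QC tooling: the record layout (`animal_id`, `parameter`, `value`, `unit`, `genotype`), the issue labels and the function name `find_qc_issues` are hypothetical, and outlier detection is left out because the text does not say how outliers are defined.

```python
from collections import Counter


def find_qc_issues(measurements, required_parameters, expected_units, bounds):
    """Flag rule-based QC issues in a list of measurement dicts.

    Each measurement has the (hypothetical) keys: animal_id, parameter,
    value, unit and genotype. Returns a list of (issue, animal_id, parameter).
    """
    issues = []
    # Duplicate measurements: same animal and parameter reported more than once.
    counts = Counter((m["animal_id"], m["parameter"]) for m in measurements)
    for (animal, parameter), n in sorted(counts.items()):
        if n > 1:
            issues.append(("duplicate", animal, parameter))
    for m in measurements:
        key = (m["animal_id"], m["parameter"])
        # Measurements with wrong units.
        if m["unit"] != expected_units.get(m["parameter"]):
            issues.append(("wrong_unit", *key))
        # Unexpected values, e.g. zero or negative body weight.
        if m["parameter"] == "body_weight" and m["value"] <= 0:
            issues.append(("unexpected_value", *key))
        # Out-of-bounds values, where bounds are defined for the parameter.
        low, high = bounds.get(m["parameter"], (float("-inf"), float("inf")))
        if not low <= m["value"] <= high:
            issues.append(("out_of_bounds", *key))
    # Missing data for required parameters.
    reported = {m["parameter"] for m in measurements}
    for parameter in sorted(required_parameters - reported):
        issues.append(("missing_parameter", None, parameter))
    # Missing wild-type measurements.
    if not any(m["genotype"] == "wild-type" for m in measurements):
        issues.append(("missing_wild_type", None, None))
    return issues
```

Each returned tuple corresponds to one issue that would be sent back to the phenotyping center for correction or explanation.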

Data availability

Phenotype data collection started in early 2013 and, to date, 2 079 607 data points at eight mouse clinics are available and are undergoing QC before export for archiving (Table 1). This excludes legacy data that are already archived and available for query. Data for 19 different phenotyping procedures are available in the IMPReSS database (Table 2). Mouse production of IKMC alleles has been tracked since 2008. To date, ES cell microinjections that have produced >3000 mouse lines are recorded.

SOPS AND PROTOCOLS

IMPC provides high-quality phenotype data by following rigorous data collection processes. This is achieved by reducing exposure to human error via semi-automated data collection and validation processes. The application of such procedures across multiple centers allows reliable detection of subtle phenotypes, e.g. in the broad-based phenotyping of C57BL/6J and C57BL/6N mouse strains (13). Part of this automation is made possible by the IMPC protocols, which consist of the SOPs and the procedure and parameter specifications. These protocols are maintained in a form that is both human readable and machine consumable. The IMPC protocols are available from the International Mouse Phenotyping Resource of Standardized Screens (IMPReSS), one of the services provided by the IMPC infrastructure. Machine-readable web services (https://www.mousephenotype.org/impress/soap/server?wsdl) are used in the validation processes discussed in the data acquisition and QC section, and a dedicated relational database stores SOPs.

Figure 3. A schematic overview of data flows into the web portal for IMPC data. Currently, eight mouse clinics are involved in IMPC and produce phenotype data. These are then collected, validated and processed to produce curated data available from the project portal. Legacy data from EuroPhenome and Sanger MGP were directly transferred to the Central Data Archive at EMBL-EBI for direct integration on the portal.

Table 1. Mouse phenotyping data points by submitting center, September 2013

Mouse clinics                           Number of data points
Baylor College of Medicine              0
Helmholtz Zentrum München               75 662
Institut Clinique de la Souris          446 670
MRC Harwell                             164 037
The Jackson Laboratory                  142 221
The Toronto Centre for Phenogenomics    20 365
University of California, Davis         59 190
Wellcome Trust Sanger Institute         1 171 462
Total                                   2 079 607

Table 2. Mouse phenotyping data points by SOP, September 2013

Procedures                                          Number of data points
Acoustic startle and pre-pulse inhibition (PPI)     80 787
Auditory brain stem response                        30 606
Body composition (DEXA lean/fat)                    59 232
Body weight                                         205 194
Challenge whole body plethysmography                48 464
Clinical blood chemistry                            152 217
Combined SHIRPA and dysmorphology                   158 199
Echo                                                5916
Electrocardiogram (ECG)                             35 407
Eye morphology                                      181 839
Grip strength                                       71 298
Heart weight                                        26 520
Hematology                                          94 416
Indirect calorimetry                                627 049
Insulin blood level                                 38
Intraperitoneal glucose tolerance test (IPGTT)      70 141
Open field                                          34 416
Organs weight                                       11 746
X-ray                                               186 122
Total                                               2 079 607

The adoption of standardized phenotyping protocols across all of the participating mouse clinics requires that the same procedures be carried out under the same conditions specified by the protocol. These protocols have been agreed through active collaboration between the data wranglers (who also administer the contents of the IMPReSS database), the phenotyping centers and members of the scientific community.

The IMPReSS database maintains multiple phenotyping pipelines, where a 'pipeline' is simply an ordered sequence of phenotyping procedures to be carried out. This caters to specific circumstances where a center wishes to record and export supplementary data in addition to those that are required by the standard IMPC pipeline. It also allows incorporation of data collected using historic pipelines such as EUMODIC. The IMPReSS database uses Mammalian Phenotype (MP) Ontology terms (12,14) to annotate procedures and parameters, e.g. the parameter 'increased blood glucose concentration' is annotated to 'impaired glucose tolerance' (MP:0005203). Combined with the outcome of the statistical analysis, the term annotated to the SOP converts a numerical data point into a text definition of pheno-deviance (a statistically significant result indicating a phenotype different from a wild-type animal of the same background strain). A specific knockout line may have many different terms annotated to fully capture the phenotypes elicited by multiple SOPs, reflecting the complexity and variety of the SOPs applied.
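The way an annotated ontology term turns a statistical result into a pheno-deviance call can be sketched in Python. This is a simplified illustration under stated assumptions: the `MP_ANNOTATIONS` table, the `annotate` function and the significance threshold are hypothetical (the threshold borrows the P < 0.0001 cut-off quoted in the Figure 1 caption), and the single entry reuses the example term and identifier exactly as given in the text.

```python
# Hypothetical annotation table: (parameter, direction of change) -> MP term.
# The single entry reuses the example quoted in the text.
MP_ANNOTATIONS = {
    ("blood glucose concentration", "increased"): (
        "MP:0005203",
        "impaired glucose tolerance",
    ),
}


def annotate(parameter, effect_size, p_value, threshold=1e-4):
    """Return the (MP id, MP label) for a significant change, or None.

    The sign of effect_size (knockout minus wild-type) gives the direction of
    change; the threshold is an assumed significance cut-off.
    """
    if p_value >= threshold:
        return None  # not statistically significant, so no pheno-deviance call
    direction = "increased" if effect_size > 0 else "decreased"
    return MP_ANNOTATIONS.get((parameter, direction))
```

For example, `annotate("blood glucose concentration", 2.3, 1e-6)` returns the term for impaired glucose tolerance, whereas the same effect with `p_value=0.05` returns `None`.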

DATA INTEGRATION AND ONTOLOGIES

The IMPC portal relies on publicly available data integrated in context for different categories of users. For example, the IKMC resource (4) provides information on ES cell availability, mouse repositories such as EMMA (15) provide access to mice, Ensembl (10) provides the genomic framework for each knockout and Mouse Genome Informatics provides gene nomenclature and mouse ontology terms. Ontologies are widely used throughout the portal. Project-specific views, or slims, which provide a relevant subset of the Mouse Phenotype Ontology and the Adult Mouse Anatomy Ontology (16), are used to annotate mutants and support online user queries, and are built into the schema of our RESTful interface. Ontologies are stored locally in a dedicated part of the schema for ease of query and processing, and we expect to integrate terms mapping human anatomy and disease to data in the future.

STATISTICAL ANALYSIS

A major goal of the IMPC is to assign functions to protein-coding genes using high-throughput phenotyping assays and to extend the primary observations into specialized fields of research using additional secondary phenotyping screens. High-throughput phenotyping assays produce many different types of data that may be continuous, categorical or time-series numerical data, images or text descriptions of the parameters measured during the assay. Data generated from knockout mice are then subjected to statistical analysis, where the parameters measured during the assay are compared with the same parameters measured in parallel from control wild-type mice from an identical background strain. The experimental design also plays a fundamental role in the implementation of a robust and reproducible analysis of knockout phenotypic effects (17,18), which requires control selection to be given considerable attention. To identify pheno-deviant lines, we have implemented, using the R statistical computing toolkit (http://www.r-project.org), a statistical analysis pipeline based on the comparison of each knockout line population with a wild-type control population from a well-defined genetic background (C57BL/6N). Continuous and time-series data are analyzed using a linear mixed model framework (17,19). Linear mixed models account for multiple sources of variability on a phenotype, where some explanatory factors such as sex, weight and knockout mutant genotype are assumed to take fixed values, while others such as batch (measurements collected on a particular day) are a source of random effects (for example owing to laboratory conditions). We summarize time-series data (e.g. area under the curve or mean), and this summary variable is then entered into the linear mixed model as a continuous variable. Categorical data are separable into mutually exclusive categories and deal with qualitative attributes of the observed object. A Fisher's exact test is performed on categorical data and provides a quantitative description of the differences between the knockout and wild-type populations. For each knockout line, we aim to analyze data for seven males and seven females. When a test is considered statistically significant, ontology terms from the Mammalian Phenotype Ontology (12) are automatically associated to the individual genotypes based on the association specified in IMPReSS for every parameter (14).
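Two of the simpler steps in this section, summarizing a time series by its area under the curve and testing categorical data with Fisher's exact test, can be sketched in a few lines. The pipeline itself is written in R; the Python below is only an illustrative re-statement using the standard library, with hypothetical function names, a trapezoidal rule assumed for the area and the usual two-sided definition assumed for the test. The mixed-model fit is not shown.

```python
from math import comb


def area_under_curve(times, values):
    """Trapezoidal area under a time series, giving one summary value per animal."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be knockout and wild-type animals, columns the two categories.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, total)
```

With seven animals per group, a table in which all seven knockouts fall in one category and all seven wild-types in the other gives `fisher_exact_two_sided(7, 0, 0, 7)`, i.e. 2/3432.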

PROJECT TRACKING

The iMITS (http://www.mousephenotype.org/imits) is the central database for the planning and tracking of IMPC mouse production. The database contains the catalogs of all IKMC ES cell clones and IMPC mouse alleles, their detailed molecular structure and QC data that verify the mutant allele (5). Mutant cells and mice are made available to the scientific community on request via designated repositories. iMITS facilitates the distribution of these products by capturing information on the nominated distribution center(s) and providing appropriate order links.

IMPC mouse production centers cooperate to maximize production efficiency and avoid duplication of effort. Each IMPC production center registers the genes selected for production and phenotyping in the iMITS database. Conflicting intentions are flagged. Once an IKMC ES cell clone is microinjected, centers upload details of the microinjection experiments, onward breeding and progress of phenotype data collection and transfer. Actual and intended production is immediately displayed on gene pages in the IMPC portal, and the data are publicly available for browsing and downloading. Summary iMITS ES allele and mouse production data are displayed on the IMPC portal, and detailed in-progress production information can be found by directly browsing the iMITS Web site (Figure 2). The iMITS infrastructure allows users to be notified by email on the status of the knockout mouse production by registering interest, as described earlier.
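The flagging of conflicting production intentions can be pictured as a grouping step over the registered genes. The sketch below is a guess at the idea rather than a description of iMITS: the `(gene, center)` pair layout, the function name `flag_conflicts` and the rule that a conflict means more than one center registering the same gene are all assumptions.

```python
from collections import defaultdict


def flag_conflicts(registrations):
    """Return {gene: sorted list of centers} for genes registered by more than one center.

    registrations is an iterable of (gene_symbol, center) pairs.
    """
    centers_by_gene = defaultdict(set)
    for gene, center in registrations:
        centers_by_gene[gene].add(center)
    return {
        gene: sorted(centers)
        for gene, centers in centers_by_gene.items()
        if len(centers) > 1
    }
```

Repeated registrations by the same center are collapsed by the set, so only genes chosen by at least two distinct centers are reported.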

LEGACY DATA

The IMPC portal consolidates data access to existing phenotyping data from the EuroPhenome and Sanger Mouse Genetics Project (MGP) pipelines. Where these data are available for a gene or phenotype of interest, their origin is clearly marked in the interface and links. To date, >115 million data points are available for legacy data. Genotype–phenotype associations from EuroPhenome and MGP are presented in Table 3 and classified by high-level mammalian phenotype terms. The inclusion of these data is key to the mission of the IMPC and the MPI2 consortium, which is to unify access to data and to provide a stable archive.

CONCLUSION

The IMPC Web Portal provides unique and unified access to mouse phenotyping data from multiple sources, including genomic, genotypic and phenotypic context from ontologies and the literature, and phenotypic images. Access is provided to data as soon as they are available and for existing legacy data. In future, we will support data access for new embryonic phenotyping pipelines, integrate public gene expression data and make the data more accessible to translational researchers by inclusion of queries for human orthologs, diseases and rare disease data. The statistical pipeline is likely to be refined as more phenotype data are produced, and data will regularly be examined to ensure high standards are maintained as new data are submitted. We invite users to register for data of interest via the interface or to sign up for usability or beta testing activities to improve the portal and provide input into future developments.

ACKNOWLEDGEMENTS

The authors thank the members of the IMPC consortium, the EMBL-EBI Industry Programme, staff of MRC Harwell, the Wellcome Trust Sanger Institute, Francis Rowland, Jennifer Cham, Sangya Pundir and the NIH KOMP2 program for their user feedback and participation in user experience sessions in which prototype interfaces were tested and improved. They are especially grateful to the users who have contacted them through their user support contact form and the biologists who have participated in our usability testing sessions to improve the portal. They also thank the IMPC Steering Committee and PSC for feedback on the interfaces. They thank Mary Todd Bergman and Spencer Phillips of the EBI for their assistance with figures for this article, and Hayley Protheroe and the WTSI Team 109 for providing mouse images.

FUNDING

The National Institutes of Health (NIH) [1 U54 HG006370-01]; EMBL-EBI Core Funding (to H.P. and P.F.) and Wellcome Trust Core Funding (to W.C.S.). Funding for open access charge: NIH [1 U54 HG006370-01].

Conflict of interest statement. None declared.

Table 3. Genotype–phenotype associations from legacy EuroPhenome and Sanger MGP available from the IMPC portal, September 2013

Mammalian phenotype high-level terms        Genotype–phenotype associations
Behavior/neurological phenotype             1268
Homeostasis/metabolism phenotype            1032
Growth/size phenotype                       724
Hematopoietic system phenotype              702
Skeleton phenotype                          450
Vision/eye phenotype                        441
Adipose tissue phenotype                    135
Limbs/digits/tail phenotype                 125
Craniofacial phenotype                      107
Cardiovascular system phenotype             57
Integument phenotype                        33
Nervous system phenotype                    24
Pigmentation phenotype                      20
Immune system phenotype                     4
Reproductive system phenotype               4
Endocrine/exocrine gland phenotype          3
Digestive/alimentary phenotype              2
Total                                       5309

Associations are grouped by high-level mammalian phenotype ontology terms.

REFERENCES

1. Brown,S.D.M. and Moore,M.W. (2012) The International Mouse Phenotyping Consortium: past and future perspectives on mouse phenotyping. Mamm. Genome, 23, 632–640.

2. Brown,S.D.M. and Moore,M.W. (2012) Towards an encyclopaedia of mammalian gene function: the International Mouse Phenotyping Consortium. Dis. Models Mech., 5, 289–292.

3. Valenzuela,D.M., Murphy,A.J., Frendewey,D., Gale,N.W., Economides,A.N., Auerbach,W., Poueymirou,W.T., Adams,N.C., Rojas,J., Yasenchak,J. et al. (2003) High-throughput engineering of the mouse genome coupled with high-resolution expression analysis. Nat. Biotechnol., 21, 652–659.

4. Skarnes,W.C., Rosen,B., West,A.P., Koutsourakis,M., Bushell,W., Iyer,V., Mujica,A.O., Thomas,M., Harrow,J., Cox,T. et al. (2011) A conditional knockout resource for the genome-wide study of mouse gene function. Nature, 474, 337–342.

5. Ringwald,M., Iyer,V., Mason,J.C., Stone,K.R., Tadepally,H.D., Kadin,J.A., Bult,C.J., Eppig,J.T., Oakley,D.J., Briois,S. et al. (2011) The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium. Nucleic Acids Res., 39, D849–D855.

6. Morgan,H., Beck,T., Blake,A., Gates,H., Adams,N., Debouzy,G., Leblanc,S., Lengger,C., Maier,H., Melvin,D. et al. (2010) EuroPhenome: a repository for high-throughput mouse phenotyping data. Nucleic Acids Res., 38, D577–D585.

7. White,J.K., Gerdin,A.K., Karp,N.A., Ryder,E., Buljan,M., Bussell,J.N., Salisbury,J., Clare,S., Ingham,N.J., Podrini,C. et al. (2013) Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell, 154, 452–464.

8. Mallon,A.M., Iyer,V., Melvin,D., Morgan,H., Parkinson,H., Brown,S.D.M., Flicek,P. and Skarnes,W.C. (2012) Accessing data from the International Mouse Phenotyping Consortium: state of the art and future plans. Mamm. Genome, 23, 641–652.

9. Eppig,J.T., Blake,J.A., Bult,C.J., Kadin,J.A. and Richardson,J.E. (2012) The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse. Nucleic Acids Res., 40, D881–D886.

10. Flicek,P., Ahmed,I., Amode,M.R., Barrell,D., Beal,K., Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., Fairley,S. et al. (2013) Ensembl 2013. Nucleic Acids Res., 41, D48–D55.

11. Hochheiser,H., Aronow,B.J., Artinger,K., Beaty,T.H., Brinkley,J.F., Chai,Y., Clouthier,D., Cunningham,M.L., Dixon,M., Donahue,L.R. et al. (2011) The FaceBase Consortium: a comprehensive program to facilitate craniofacial research. Dev. Biol., 355, 175–182.

12. Smith,C.L., Goldsmith,C.A.W. and Eppig,J.T. (2005) The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol., 6, R7.

13. Simon,M.M., Greenaway,S., White,J.K., Fuchs,H., Gailus-Durner,V., Sorg,T., Wong,K., Bedu,E., Cartwright,E.J., Dacquin,R. et al. (2013) A comparative phenotypic and genomic analysis of C57BL/6J and C57BL/6N mouse strains. Genome Biol., 14, R82.

14. Beck,T., Morgan,H., Blake,A., Wells,S., Hancock,J.M. and Mallon,A.-M. (2009) Practical application of ontologies to annotate and analyse large scale raw mouse phenotype data. BMC Bioinf., 10(Suppl. 5), S2.

15. Wilkinson,P., Sengerova,J., Matteoni,R., Chen,C.K., Soulat,G., Ureta-Vidal,A., Fessele,S., Hagn,M., Massimi,M., Pickford,K. et al. (2010) EMMA – mouse mutant resources for the international scientific community. Nucleic Acids Res., 38, D570–D576.

16. Hayamizu,T.F., Mangan,M., Corradi,J.P., Kadin,J.A. and Ringwald,M. (2005) The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biol., 6, R29.

17. Karp,N.A., Baker,L.A., Gerdin,A.K.B., Adams,N.C., Ramírez-Solis,R. and White,J.K. (2010) Optimising experimental design for high-throughput phenotyping in mice: a case study. Mamm. Genome, 21, 467–476.

18. Karp,N.A., Melvin,D., Sanger Mouse Genetics Project and Mott,R.F. (2012) Robust and sensitive analysis of mouse knockout phenotypes. PLoS One, 7, e52410.

19. West,B., Welch,K.B. and Galecki,A.T. (2007) Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall/CRC, Boca Raton.

Nucleic Acids Research 2014 Vol 42 Database issue D809

a well-defined set of SOPs Furthermore the recordedmeasurements must conform to a standardized specifica-tion which includes the unit of measurement the numberof measurements to be taken and other essential metadataWithin the IMPC consortium phenotyping experi-ments are referred to as lsquoproceduresrsquo and the set ofmeasurements produced by a procedure as lsquoparametersrsquoThe SOPs and the specifications for each of the proceduresand parameters are stored in the IMPReSS database(see later)

Phenotype data collection validation and dissemination

When the data are ready for collection and collation thephenotyping centers export their data as ExtensibleMarkup Language (XML) documents (httpwwww3orgTRREC-xml) These are documents that conformto the standardized data exchange format defined by theIMPC consortium using the XML Schema DefinitionLanguage (XSD) specified by the W3C consortium(httpwwww3orgTRxmlschema11-1 httpwwww3orgTRxmlschema11-2) The IMPC Data CoordinationCentre at MRC Harwell then downloads these documentsThe provenance and chain-of-custody of the data ismanaged using a data tracker not presented here Asshown in Figure 3 data processing happens in threemain phases to ensure the highest level of data integrityand traceability In the first phase the data exported bythe mouse clinics are validated against the required pro-cedure and parameter specifications as defined in the SOPs(Data Coordination Centre component) and the suppliedvalues are checked against the corresponding context-specific databases eg check for existence of a mousestrain in the IMPC Mouse Tracking System (iMITS)

In the second phase validated data are incorporated tothe centralized dataset and additional processing iscarried out to prepare the data for effective visualizationand statistical analysis The data are then made availableto the data wranglers for QC checks and also to re-searchers for preliminary data analysis In the third andfinal stage data that have passed QC are sent to theCentral Data Archive at EMBL-EBI where they aremade available as curated phenotype data The pipelineis designed to ensure data are publicly available as quicklyas possible to the users of the portal

Quality control

The IMPC aims to provide the highest quality data to thebiomedical community QC checks are performed inaddition to the checks performed at the mouse clinicsThe QC process involves identifying anomalies in thesubmitted data The aim is to remove data entry and com-munication errors before the measurements undergoextensive statistical analysis Some of the QC issuesidentified are missing data for required parametersmissing wild-type measurements duplicate measurementsmeasurements with wrong units unexpected values (eg 0or negative body weight) out-of-bounds and outliersamong others These are then communicated to thephenotyping centers which either fix the issue by correct-ing the error or provide an explanation All of theidentified issues are captured and managed using customQC tools QC tools provide the users (mouse centers anddata wranglers) with an integrated workbench forvisualization analysis identification and resolution ofQC issues By providing an interactive web applicationthat is designed specifically for the visualization of

Figure 2 iMITS (httpswwwmousephenotypeorgimits) stores and provides summary and detailed production information Users can view high-level allele information on the IMPC portal gene pages The iMITS tab of the IMPC portal shows detailed IMPC production information eg forSdha Information to access iMITS is provided on the IMPC homepage

Nucleic Acids Research 2014 Vol 42 Database issue D805

mouse phenotype data we are able to streamline theworkflow

Data availability

Phenotype data collection started in early 2013 and todate 2 079 607 data points at eight mouse clinics are avail-able and are undergoing QC before export for archiving(Table 1) This excludes legacy data that are alreadyarchived and available for query Data for 19 differentphenotyping procedures are available in the IMPReSSdatabase (Table 2) Mouse production of IKMC alleleshas been tracked since 2008 To date ES cell microinjec-tions that have produced gt3000 mouse lines are recorded

SOPS AND PROTOCOLS

IMPC provides high-quality phenotype data by followingrigorous data collection processes This is achieved by

reducing exposure to human error via semi-automateddata collection and validation processes The applicationof such procedures across multiple centers allows reliabledetection of subtle phenotypes eg in the broad-basedphenotyping of C57BL6J and C67BL6N mouse strains(13) Part of this automation is made possible due to theIMPC protocols which consist of the SOPs and the pro-cedure and parameter specifications These protocols aremaintained in a form that is both human readable and

Figure 3 A schematic overview of data flows into the web portal for IMPC data Currently eight mouse clinics are involved in IMPC and producephenotype data These are then collected validated and processed to produce curated data available from the project portal Legacy data fromEuroPhenome and Sanger MGP were directly transferred to the Central Data Archive at EMBL-EBI for direct integration on the portal

Table 2 Mouse phenotyping data points by SOP September 2013

Procedures Number ofdata points

Acoustic startle and pre-pulse inhibition (PPI) 80 787Auditory brain stem response 30 606Body composition (DEXA leanfat) 59 232Body weight 205 194Challenge whole body plethysmography 48 464Clinical blood chemistry 152 217Combined SHIRPA and dysmorphology 158 199Echo 5916Electrocardiogram (ECG) 35 407Eye morphology 181 839Grip strength 71 298Heart weight 26 520Hematology 94 416Indirect calorimetry 627 049Insulin blood level 38Intraperitoneal glucose tolerance test (IPGTT) 70 141Open field 34 416Organs weight 11 746X-ray 186 122

Total 2 079 607

Table 1 Mouse phenotyping data points by submitting center

September 2013

Mouse clinics Number of data points

Baylor College of Medicine 0Helmholtz Zentrum Munchen 75 662Institut Clinique de la Souris 446 670MRC Harwell 164 037The Jackson Laboratory 142 221The Toronto Centre for Phenogenomics 20 365University of California Davis 59 190Wellcome Trust Sanger Institute 1 171 462

Total 2 079 607

D806 Nucleic Acids Research 2014 Vol 42 Database issue

machine consumable The IMPC protocols areavailable from the International Mouse PhenotypingResource of Standardized Screens (IMPReSS) oneof the services provided by the IMPC infrastructureMachine-readable web services (httpswwwmous-ephenotypeorgimpresssoapserverwsdl) are used in thevalidation processes discussed in the data acquisition andQC section and a dedicated relational database storesSOPs

The adoption of standardized phenotyping protocolsacross all of the participating mouse clinics requires thatthe same procedures be carried out under the sameconditions specified by the protocol These protocolshave been agreed through active collaboration betweenthe data wranglers (who also administer the contents ofthe IMPReSS database) the phenotyping centers andmembers of the scientific community

The IMPReSS database maintains multiple pheno-typing pipelines where a lsquopipelinersquo is simply an orderedsequence of phenotyping procedures to be carried outThis caters to specific circumstances where a centerwishes to record and export supplementary data inaddition to those that are required by the standardIMPC pipeline This allows incorporation of data col-lected using historic pipelines such as EUMODIC TheIMPReSS database uses the Mammalian Phenotype (MP)Ontology terms (1214) to annotate procedures andparameters eg the parameter lsquoincreased bloodglucose concentrationrsquo is annotated to lsquoimpaired glucosetolerancersquo (MP0005203) These ontology terms convert anumerical data point via statistical analysis and the termannotated to the SOP to provide text definitions of pheno-deviance (a statistically significant result indicating aphenotype different from a wild-type animal of the samebackground strain) A specific knockout line may havemany different terms annotated to fully capture the pheno-types elicited by multiple SOPs and reflecting thecomplexity and variety of the SOPs applied

DATA INTEGRATION AND ONTOLOGIES

The IMPC portal relies on publicly available dataintegrated in context for different categories of usersFor example the IKMC resource (4) provides informationon ES cells availability mouse repositories such asEMMA (15) provide access to mice Ensembl(10) provides the genomic framework for each knockoutand Mouse Genome Informatics provides gene nomencla-ture and mouse ontology terms Ontologies are widelyused throughout the portal Project-specific views orslims which provide a relevant subset of the ontology ofthe Mouse Phenotype Ontology and the AdultMouse Anatomy Ontology (16) are used to annotatemutants support online user queries and are built intothe schema of our RESTful interface Ontologies arestored locally in a dedicated part of the schema for easeof query and processing and we expect to integrate termsmapping human anatomy and disease to data in thefuture

STATISTICAL ANALYSIS

A major goal of the IMPC is to assign functions toprotein-coding genes using high-throughput phenotypingassays and to extend the primary observations intospecialized fields of research using additional secondaryphenotyping screens High-throughput phenotypingassays produce many different types of data that may becontinuous categorical or time-series numerical dataimages or text descriptions of the parameters measuredduring this assay Data generated from knockout miceare then subjected to statistical analysis where the param-eters measured during the assay are compared with thesame parameters measured in parallel from control wild-type mice from an identical background strain The ex-perimental design also plays a fundamental role in theimplementation of a robust and reproducible analysis ofknockout phenotypic effects (1718) which requirescontrol selection to be given considerable attentionTo identify pheno-deviant lines we have implemented

using the R statistical computing toolkit (httpwwwr-projectorg) a statistical analysis pipeline based on thecomparison of each knockout line population with awild-type control population from a well-defined geneticbackground (C57BL6N) Continuous and time-seriesdata are analyzed using a linear mixed model frame-work (1719) Linear mixed models multiple sources ofvariability on a phenotype where some explanatoryfactors such as sex weight and knockout mutantgenotype are assumed to take fixed values while otherssuch as batch (measurements collected on a particularday) will be source of random effect (for example owingto laboratory conditions) We summarize time-series data(eg area under the curve or mean) and this variable isthen used into the linear mixed model as a continuousvariable Categorical data contain data separable inmutually exclusive categories and deal with qualitative at-tributes of the observed object A Fisher exact testis performed on categorical data and provides a quantita-tive description of the differences between the knock-out and wild-type populations For each knockoutline we aim to analyze data for seven males and sevenfemales When a test is considered statistically significantontology terms from the Mammalian Phenotype Ontology(12) are automatically associated to the individualgenotypes based on association specified in IMPReSSfor every parameter (14)

PROJECT TRACKING

The iMITS (http://www.mousephenotype.org/imits) is the central database for the planning and tracking of IMPC mouse production. The database contains the catalogs of all IKMC ES cell clones and IMPC mouse alleles, their detailed molecular structure and QC data that verify the mutant allele (5). Mutant cells and mice are made available to the scientific community on request via designated repositories. iMITS facilitates the distribution of these products by capturing information on the nominated distribution center(s) and providing appropriate order links.

IMPC mouse production centers cooperate to maximize production efficiency and avoid duplication of effort. Each IMPC production center registers the genes selected for production and phenotyping in the iMITS database. Conflicting intentions are flagged. Once an IKMC ES cell clone is microinjected, centers upload details of the microinjection experiments, onward breeding and progress of phenotype data collection and transfer. Actual and intended production is immediately displayed on gene pages in the IMPC portal, and the data are publicly available for browsing and downloading.

Summary iMITS ES allele and mouse production data are displayed on the IMPC portal, and detailed in-progress production information can be found by directly browsing the iMITS Web site (Figure 2). The iMITS infrastructure allows users to be notified by email on the status of the knockout mouse production by registering interest, as described earlier.
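The flagging of conflicting intentions can be pictured as a simple grouping step. The sketch below is a hypothetical Python illustration, not iMITS code: the record layout (center, gene), the function name and the placeholder center and gene names are all assumptions.

```python
from collections import defaultdict


def flag_conflicts(intentions):
    """Return {gene: sorted list of centers} for genes registered by more than one center."""
    centers_by_gene = defaultdict(set)
    for center, gene in intentions:
        centers_by_gene[gene].add(center)
    return {
        gene: sorted(centers)
        for gene, centers in centers_by_gene.items()
        if len(centers) > 1
    }


# Placeholder registrations; the center and gene names are invented.
registered = [
    ("Center A", "GeneX"),
    ("Center B", "GeneY"),
    ("Center C", "GeneX"),
    ("Center A", "GeneZ"),
]
conflicts = flag_conflicts(registered)  # only GeneX has two interested centers
```

Using a set per gene means that a center registering the same gene twice is not reported as a conflict with itself.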

LEGACY DATA

The IMPC portal consolidates data access to existing phenotyping data from the EuroPhenome and Sanger Mouse Genetics Project (MGP) pipelines. Where these data are available for a gene or phenotype of interest, their origin is clearly marked in the interface and links. To date, >115 million data points are available for legacy data. Genotype–phenotype associations from EuroPhenome and MGP are presented in Table 3 and classified by high-level mammalian phenotype terms. The inclusion of these data is the key to the mission of the IMPC and the MPI2 consortium, which is to unify access to data and to provide a stable archive.

CONCLUSION

The IMPC Web Portal provides unique and unified access to mouse phenotyping data from multiple sources, including genomic, genotypic and phenotypic context from ontologies and the literature, and phenotypic images. Access is provided to data as soon as they are available and for existing legacy data. In future, we will support data access for new embryonic phenotyping pipelines, integrate public gene expression data and make the data more accessible to translational researchers by inclusion of queries for human orthologs, diseases and rare disease data. The statistical pipeline is likely to be refined as more phenotype data are produced, and data will regularly be examined to ensure high standards are maintained as new data are submitted. We invite users to register for data of interest via the interface or to sign up for usability or beta testing activities to improve the portal and provide input into future developments.

ACKNOWLEDGEMENTS

The authors thank the members of the IMPC consortium, the EMBL-EBI Industry Programme, staff of MRC Harwell, the Wellcome Trust Sanger Institute, Francis Rowland, Jennifer Cham, Sangya Pundir and the NIH KOMP2 program for their user feedback and participation in user experience sessions in which prototype interfaces were tested and improved. They are especially grateful to the users who have contacted them through their user support contact form and the biologists who have participated in our usability testing sessions to improve the portal. They also thank the IMPC Steering Committee and PSC for feedback on the interfaces. They thank Mary Todd Bergman and Spencer Phillips of the EBI for their assistance with figures for this article and Hayley Protheroe and the WTSI Team 109 for providing mouse images.

FUNDING

The National Institutes of Health (NIH) [1 U54HG006370-01]; EMBL-EBI Core Funding (to H.P. and P.F.) and Wellcome Trust Core Funding (to W.C.S.). Funding for open access charge: NIH [1 U54HG006370-01].

Conflict of interest statement. None declared.

REFERENCES

1. Brown, S.D.M. and Moore, M.W. (2012) The International Mouse Phenotyping Consortium: past and future perspectives on mouse phenotyping. Mamm. Genome, 23, 632–640.

2. Brown, S.D.M. and Moore, M.W. (2012) Towards an encyclopaedia of mammalian gene function: the International Mouse Phenotyping Consortium. Dis. Models Mech., 5, 289–292.

3. Valenzuela, D.M., Murphy, A.J., Frendewey, D., Gale, N.W., Economides, A.N., Auerbach, W., Poueymirou, W.T., Adams, N.C., Rojas, J., Yasenchak, J. et al. (2003) High-throughput engineering of the mouse genome coupled with high-resolution expression analysis. Nat. Biotechnol., 21, 652–659.

4. Skarnes, W.C., Rosen, B., West, A.P., Koutsourakis, M., Bushell, W., Iyer, V., Mujica, A.O., Thomas, M., Harrow, J., Cox, T. et al. (2011) A conditional knockout resource for the genome-wide study of mouse gene function. Nature, 474, 337–342.

Table 3. Genotype–phenotype associations from legacy EuroPhenome and Sanger MGP available from the IMPC portal, September 2013

Mammalian phenotype high-level terms    Genotype–phenotype associations
Behavior/neurological phenotype         1268
Homeostasis/metabolism phenotype        1032
Growth/size phenotype                   724
Hematopoietic system phenotype          702
Skeleton phenotype                      450
Vision/eye phenotype                    441
Adipose tissue phenotype                135
Limbs/digits/tail phenotype             125
Craniofacial phenotype                  107
Cardiovascular system phenotype         57
Integument phenotype                    33
Nervous system phenotype                24
Pigmentation phenotype                  20
Immune system phenotype                 4
Reproductive system phenotype           4
Endocrine/exocrine gland phenotype      3
Digestive/alimentary phenotype          2
Total                                   5309

Associations are grouped by high-level mammalian phenotype ontology terms.


5. Ringwald, M., Iyer, V., Mason, J.C., Stone, K.R., Tadepally, H.D., Kadin, J.A., Bult, C.J., Eppig, J.T., Oakley, D.J., Briois, S. et al. (2011) The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium. Nucleic Acids Res., 39, D849–D855.

6. Morgan, H., Beck, T., Blake, A., Gates, H., Adams, N., Debouzy, G., Leblanc, S., Lengger, C., Maier, H., Melvin, D. et al. (2010) EuroPhenome: a repository for high-throughput mouse phenotyping data. Nucleic Acids Res., 38, D577–D585.

7. White, J.K., Gerdin, A.K., Karp, N.A., Ryder, E., Buljan, M., Bussell, J.N., Salisbury, J., Clare, S., Ingham, N.J., Podrini, C. et al. (2013) Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell, 154, 452–464.

8. Mallon, A.M., Iyer, V., Melvin, D., Morgan, H., Parkinson, H., Brown, S.D.M., Flicek, P. and Skarnes, W.C. (2012) Accessing data from the International Mouse Phenotyping Consortium: state of the art and future plans. Mamm. Genome, 23, 641–652.

9. Eppig, J.T., Blake, J.A., Bult, C.J., Kadin, J.A. and Richardson, J.E. (2012) The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse. Nucleic Acids Res., 40, D881–D886.

10. Flicek, P., Ahmed, I., Amode, M.R., Barrell, D., Beal, K., Brent, S., Carvalho-Silva, D., Clapham, P., Coates, G., Fairley, S. et al. (2013) Ensembl 2013. Nucleic Acids Res., 41, D48–D55.

11. Hochheiser, H., Aronow, B.J., Artinger, K., Beaty, T.H., Brinkley, J.F., Chai, Y., Clouthier, D., Cunningham, M.L., Dixon, M., Donahue, L.R. et al. (2011) The FaceBase Consortium: a comprehensive program to facilitate craniofacial research. Dev. Biol., 355, 175–182.

12. Smith, C.L., Goldsmith, C.A.W. and Eppig, J.T. (2005) The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol., 6, R7.

13. Simon, M.M., Greenaway, S., White, J.K., Fuchs, H., Gailus-Durner, V., Sorg, T., Wong, K., Bedu, E., Cartwright, E.J., Dacquin, R. et al. (2013) A comparative phenotypic and genomic analysis of C57BL/6J and C57BL/6N mouse strains. Genome Biol., 14, R82.

14. Beck, T., Morgan, H., Blake, A., Wells, S., Hancock, J.M. and Mallon, A.-M. (2009) Practical application of ontologies to annotate and analyse large scale raw mouse phenotype data. BMC Bioinf., 10(Suppl. 5), S2.

15. Wilkinson, P., Sengerova, J., Matteoni, R., Chen, C.K., Soulat, G., Ureta-Vidal, A., Fessele, S., Hagn, M., Massimi, M., Pickford, K. et al. (2010) EMMA – mouse mutant resources for the international scientific community. Nucleic Acids Res., 38, D570–D576.

16. Hayamizu, T.F., Mangan, M., Corradi, J.P., Kadin, J.A. and Ringwald, M. (2005) The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biol., 6, R29.

17. Karp, N.A., Baker, L.A., Gerdin, A.K.B., Adams, N.C., Ramírez-Solis, R. and White, J.K. (2010) Optimising experimental design for high-throughput phenotyping in mice: a case study. Mamm. Genome, 21, 467–476.

18. Karp, N.A., Melvin, D., Sanger Mouse Genetics Project and Mott, R.F. (2012) Robust and sensitive analysis of mouse knockout phenotypes. PLoS One, 7, e52410.

19. West, B., Welch, K.B. and Galecki, A.T. (2007) Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall/CRC, Boca Raton.

Nucleic Acids Research 2014 Vol 42 Database issue D809

mouse phenotype data, we are able to streamline the workflow.

Data availability

Phenotype data collection started in early 2013, and to date, 2 079 607 data points at eight mouse clinics are available and are undergoing QC before export for archiving (Table 1). This excludes legacy data that are already archived and available for query. Data for 19 different phenotyping procedures are available in the IMPReSS database (Table 2). Mouse production of IKMC alleles has been tracked since 2008. To date, ES cell microinjections that have produced >3000 mouse lines are recorded.

SOPS AND PROTOCOLS

IMPC provides high-quality phenotype data by following rigorous data collection processes. This is achieved by reducing exposure to human error via semi-automated data collection and validation processes. The application of such procedures across multiple centers allows reliable detection of subtle phenotypes, e.g. in the broad-based phenotyping of C57BL/6J and C57BL/6N mouse strains (13). Part of this automation is made possible by the IMPC protocols, which consist of the SOPs and the procedure and parameter specifications. These protocols are maintained in a form that is both human readable and

Figure 3. A schematic overview of data flows into the web portal for IMPC data. Currently, eight mouse clinics are involved in IMPC and produce phenotype data. These are then collected, validated and processed to produce curated data available from the project portal. Legacy data from EuroPhenome and Sanger MGP were directly transferred to the Central Data Archive at EMBL-EBI for direct integration on the portal.

Table 2. Mouse phenotyping data points by SOP, September 2013

Procedures                                        Number of data points
Acoustic startle and pre-pulse inhibition (PPI)   80 787
Auditory brain stem response                      30 606
Body composition (DEXA lean/fat)                  59 232
Body weight                                       205 194
Challenge whole body plethysmography              48 464
Clinical blood chemistry                          152 217
Combined SHIRPA and dysmorphology                 158 199
Echo                                              5916
Electrocardiogram (ECG)                           35 407
Eye morphology                                    181 839
Grip strength                                     71 298
Heart weight                                      26 520
Hematology                                        94 416
Indirect calorimetry                              627 049
Insulin blood level                               38
Intraperitoneal glucose tolerance test (IPGTT)    70 141
Open field                                        34 416
Organs weight                                     11 746
X-ray                                             186 122
Total                                             2 079 607

Table 1. Mouse phenotyping data points by submitting center, September 2013

Mouse clinics                           Number of data points
Baylor College of Medicine              0
Helmholtz Zentrum München               75 662
Institut Clinique de la Souris          446 670
MRC Harwell                             164 037
The Jackson Laboratory                  142 221
The Toronto Centre for Phenogenomics    20 365
University of California, Davis         59 190
Wellcome Trust Sanger Institute         1 171 462
Total                                   2 079 607


machine consumable. The IMPC protocols are available from the International Mouse Phenotyping Resource of Standardized Screens (IMPReSS), one of the services provided by the IMPC infrastructure. Machine-readable web services (https://www.mousephenotype.org/impress/soap/server?wsdl) are used in the validation processes discussed in the data acquisition and QC section, and a dedicated relational database stores SOPs.
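To make the idea of validating submissions against machine-readable parameter specifications concrete, a minimal Python sketch follows. It does not call the IMPReSS web services and does not reproduce their schema; the specification fields ("type", "min", "max", "options"), the function name and the example values are all hypothetical.

```python
def validate_measurement(spec, value):
    """Check one submitted value against a parameter specification.

    Returns a list of human-readable problems; an empty list means the value passes.
    """
    errors = []
    kind = spec["type"]
    if kind == "continuous":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"expected a number, got {value!r}")
        else:
            if "min" in spec and value < spec["min"]:
                errors.append(f"value {value} is below the minimum {spec['min']}")
            if "max" in spec and value > spec["max"]:
                errors.append(f"value {value} is above the maximum {spec['max']}")
    elif kind == "categorical":
        if value not in spec["options"]:
            errors.append(f"{value!r} is not one of the allowed options")
    else:
        errors.append(f"unknown parameter type: {kind}")
    return errors


# Hypothetical specifications for a continuous and a categorical parameter.
weight_spec = {"type": "continuous", "min": 0.0, "max": 100.0}
coat_spec = {"type": "categorical", "options": ["normal", "abnormal"]}

ok = validate_measurement(weight_spec, 25.3)       # passes, no problems reported
too_low = validate_measurement(weight_spec, -1.0)  # one range problem
bad_option = validate_measurement(coat_spec, "blue")
```

Returning a list of problems rather than raising on the first one lets a data wrangler see every issue with a submission in a single pass.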

The adoption of standardized phenotyping protocols across all of the participating mouse clinics requires that the same procedures be carried out under the same conditions specified by the protocol. These protocols have been agreed through active collaboration between the data wranglers (who also administer the contents of the IMPReSS database), the phenotyping centers and members of the scientific community.

The IMPReSS database maintains multiple phenotyping pipelines, where a ‘pipeline’ is simply an ordered sequence of phenotyping procedures to be carried out. This caters to specific circumstances where a center wishes to record and export supplementary data in addition to those that are required by the standard IMPC pipeline. This allows incorporation of data collected using historic pipelines, such as EUMODIC. The IMPReSS database uses the Mammalian Phenotype (MP) Ontology terms (12,14) to annotate procedures and parameters, e.g. the parameter ‘increased blood glucose concentration’ is annotated to ‘impaired glucose tolerance’ (MP:0005203). These ontology terms convert a numerical data point, via statistical analysis and the term annotated to the SOP, into a text definition of pheno-deviance (a statistically significant result indicating a phenotype different from a wild-type animal of the same background strain). A specific knockout line may have many different terms annotated to fully capture the phenotypes elicited by multiple SOPs, reflecting the complexity and variety of the SOPs applied.
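The conversion of a significant numerical result into an ontology term can be sketched as a lookup keyed by parameter and direction of change. The Python sketch below is illustrative only: the parameter key, the record fields, the function name and the significance threshold of 0.05 are assumptions, and only the MP term itself is taken from the example above.

```python
# Hypothetical lookup: (parameter key, direction of change) -> (MP identifier, term name).
ANNOTATIONS = {
    ("blood_glucose_concentration", "increased"): (
        "MP:0005203",
        "impaired glucose tolerance",
    ),
}


def annotate(results, annotations, alpha=0.05):
    """Return (genotype, MP id, MP term) for each significant result with a mapped term.

    alpha is an assumed significance threshold, not a value stated in the text.
    """
    calls = []
    for result in results:
        if result["p_value"] >= alpha or result["effect"] == 0:
            continue  # not pheno-deviant, or no direction to report
        direction = "increased" if result["effect"] > 0 else "decreased"
        term = annotations.get((result["parameter"], direction))
        if term is not None:
            calls.append((result["genotype"], term[0], term[1]))
    return calls


# Invented results for two placeholder knockout lines.
results = [
    {"genotype": "GeneX-/-", "parameter": "blood_glucose_concentration",
     "effect": 2.4, "p_value": 0.0004},
    {"genotype": "GeneY-/-", "parameter": "blood_glucose_concentration",
     "effect": 0.3, "p_value": 0.41},
]
calls = annotate(results, ANNOTATIONS)
```

Because a line is tested across many parameters, the same genotype can collect several terms, which matches the observation that a knockout line may carry many annotations.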

DATA INTEGRATION AND ONTOLOGIES

The IMPC portal relies on publicly available data integrated in context for different categories of users. For example, the IKMC resource (4) provides information on ES cell availability; mouse repositories, such as EMMA (15), provide access to mice; Ensembl (10) provides the genomic framework for each knockout; and Mouse Genome Informatics provides gene nomenclature and mouse ontology terms. Ontologies are widely used throughout the portal. Project-specific views, or ‘slims’, which provide a relevant subset of the Mouse Phenotype Ontology and the Adult Mouse Anatomy Ontology (16), are used to annotate mutants, support online user queries and are built into the schema of our RESTful interface. Ontologies are stored locally in a dedicated part of the schema for ease of query and processing, and we expect to integrate terms mapping human anatomy and disease to data in the future.
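One way to picture how a slim supports queries is to map any specific term to the slim terms found among its ancestors. The Python sketch below uses a toy hierarchy with invented identifiers; it is not the portal's schema, and the function name and data layout are assumptions.

```python
def slim_terms(term, parents, slim):
    """Return the slim terms reachable from `term` by following parent links.

    `parents` maps a term to its direct parents; `slim` is the chosen subset.
    """
    seen, stack, hits = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # ontologies are graphs, so a term can be reached twice
        seen.add(current)
        if current in slim:
            hits.add(current)
        stack.extend(parents.get(current, ()))
    return hits


# Toy hierarchy; every identifier here is invented for illustration.
PARENTS = {
    "TOY:0004": ["TOY:0002"],              # specific term
    "TOY:0005": ["TOY:0002", "TOY:0003"],  # term with two parents
    "TOY:0002": ["TOY:0001"],              # high-level term
    "TOY:0003": ["TOY:0001"],              # another high-level term
}
SLIM = {"TOY:0002", "TOY:0003"}

single = slim_terms("TOY:0004", PARENTS, SLIM)
double = slim_terms("TOY:0005", PARENTS, SLIM)
```

A term with several parents can fall under more than one slim term, so counts grouped by high-level term need not add up to the number of distinct associations.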


machine consumable The IMPC protocols areavailable from the International Mouse PhenotypingResource of Standardized Screens (IMPReSS) oneof the services provided by the IMPC infrastructureMachine-readable web services (httpswwwmous-ephenotypeorgimpresssoapserverwsdl) are used in thevalidation processes discussed in the data acquisition andQC section and a dedicated relational database storesSOPs

The adoption of standardized phenotyping protocolsacross all of the participating mouse clinics requires thatthe same procedures be carried out under the sameconditions specified by the protocol These protocolshave been agreed through active collaboration betweenthe data wranglers (who also administer the contents ofthe IMPReSS database) the phenotyping centers andmembers of the scientific community

The IMPReSS database maintains multiple pheno-typing pipelines where a lsquopipelinersquo is simply an orderedsequence of phenotyping procedures to be carried outThis caters to specific circumstances where a centerwishes to record and export supplementary data inaddition to those that are required by the standardIMPC pipeline This allows incorporation of data col-lected using historic pipelines such as EUMODIC TheIMPReSS database uses the Mammalian Phenotype (MP)Ontology terms (1214) to annotate procedures andparameters eg the parameter lsquoincreased bloodglucose concentrationrsquo is annotated to lsquoimpaired glucosetolerancersquo (MP0005203) These ontology terms convert anumerical data point via statistical analysis and the termannotated to the SOP to provide text definitions of pheno-deviance (a statistically significant result indicating aphenotype different from a wild-type animal of the samebackground strain) A specific knockout line may havemany different terms annotated to fully capture the pheno-types elicited by multiple SOPs and reflecting thecomplexity and variety of the SOPs applied

DATA INTEGRATION AND ONTOLOGIES

The IMPC portal relies on publicly available dataintegrated in context for different categories of usersFor example the IKMC resource (4) provides informationon ES cells availability mouse repositories such asEMMA (15) provide access to mice Ensembl(10) provides the genomic framework for each knockoutand Mouse Genome Informatics provides gene nomencla-ture and mouse ontology terms Ontologies are widelyused throughout the portal Project-specific views orslims which provide a relevant subset of the ontology ofthe Mouse Phenotype Ontology and the AdultMouse Anatomy Ontology (16) are used to annotatemutants support online user queries and are built intothe schema of our RESTful interface Ontologies arestored locally in a dedicated part of the schema for easeof query and processing and we expect to integrate termsmapping human anatomy and disease to data in thefuture

STATISTICAL ANALYSIS

A major goal of the IMPC is to assign functions toprotein-coding genes using high-throughput phenotypingassays and to extend the primary observations intospecialized fields of research using additional secondaryphenotyping screens High-throughput phenotypingassays produce many different types of data that may becontinuous categorical or time-series numerical dataimages or text descriptions of the parameters measuredduring this assay Data generated from knockout miceare then subjected to statistical analysis where the param-eters measured during the assay are compared with thesame parameters measured in parallel from control wild-type mice from an identical background strain The ex-perimental design also plays a fundamental role in theimplementation of a robust and reproducible analysis ofknockout phenotypic effects (1718) which requirescontrol selection to be given considerable attentionTo identify pheno-deviant lines we have implemented

using the R statistical computing toolkit (httpwwwr-projectorg) a statistical analysis pipeline based on thecomparison of each knockout line population with awild-type control population from a well-defined geneticbackground (C57BL6N) Continuous and time-seriesdata are analyzed using a linear mixed model frame-work (1719) Linear mixed models multiple sources ofvariability on a phenotype where some explanatoryfactors such as sex weight and knockout mutantgenotype are assumed to take fixed values while otherssuch as batch (measurements collected on a particularday) will be source of random effect (for example owingto laboratory conditions) We summarize time-series data(eg area under the curve or mean) and this variable isthen used into the linear mixed model as a continuousvariable Categorical data contain data separable inmutually exclusive categories and deal with qualitative at-tributes of the observed object A Fisher exact testis performed on categorical data and provides a quantita-tive description of the differences between the knock-out and wild-type populations For each knockoutline we aim to analyze data for seven males and sevenfemales When a test is considered statistically significantontology terms from the Mammalian Phenotype Ontology(12) are automatically associated to the individualgenotypes based on association specified in IMPReSSfor every parameter (14)

PROJECT TRACKING

The iMITS (httpwwwmousephenotypeorgimits) is thecentral database for the planning and tracking of IMPCmouse production The database contains the catalogs ofall IKMC ES cell clones and IMPC mouse alleles theirdetailed molecular structure and QC data that verify themutant allele (5) Mutant cells and mice are made avail-able to the scientific community on request via designatedrepositories iMITS facilitates the distribution of theseproducts by capturing information on the nominated dis-tribution center(s) and providing appropriate order linksIMPC mouse production centers cooperate to maximize

Nucleic Acids Research 2014 Vol 42 Database issue D807

production efficiency and avoid duplication of effortEach IMPC production center registers the genesselected for production and phenotyping in the iMITSdatabase Conflicting intentions are flagged Once anIKMC ES cell clone is microinjected centers uploaddetails of the microinjection experiments onwardbreeding and progress of phenotype data collection andtransfer Actual and intended production is immediatelydisplayed on gene pages in the IMPC portal and the dataare publicly available for browsing and downloadingSummary iMITS ES allele and mouse production data

are displayed on the IMPC portal and detailedin-progress production information can be found bydirectly browsing the iMITS Web site (Figure 2) TheiMITS infrastructure allows users to be notified byemail on the status of the knockout mouse productionby registering interest as described earlier

LEGACY DATA

The IMPC portal consolidates data access to existingphenotyping data from the EuroPhenome and SangerMouse Genetics Project (MGP) pipelines Where thesedata are available for a gene or phenotype of interesttheir origin is clearly marked in the interface and linksTo date gt115 million data points are available forlegacy data Genotypendashphenotype associations fromEuroPhenome and MGP are presented in Table 3 andclassified by high-level mammalian phenotype terms Theinclusion of these data is the key to the mission of theIMPC and the MPI2 consortium which is to unifyaccess to data and to provide a stable archive

CONCLUSION

The IMPC Web Portal provides unique and unified access to mouse phenotyping data from multiple sources, including genomic, genotypic and phenotypic context from ontologies and the literature, and phenotypic images. Access is provided to data as soon as it is available and for existing legacy data. In future, we will support data access for new embryonic phenotyping pipelines, integrate public gene expression data and make the data more accessible to translational researchers by inclusion of queries for human orthologs, diseases and rare disease data. The statistical pipeline is likely to be refined as more phenotype data are produced, and data will regularly be examined to ensure high standards are maintained as new data are submitted. We invite users to register for data of interest via the interface or to sign up for usability or beta testing activities to improve the portal and provide input into future developments.

ACKNOWLEDGEMENTS

The authors thank the members of the IMPC consortium, the EMBL-EBI Industry Programme, Staff of MRC Harwell, the Wellcome Trust Sanger Institute, Francis Rowland, Jennifer Cham, Sangya Pundir and the NIH KOMP2 program for their user feedback and participation in user experience sessions, in which prototype interfaces were tested and improved. They are especially grateful to the users who have contacted them through their user support contact form and the biologists who have participated in our usability testing sessions to improve the portal. They also thank the IMPC Steering Committee and PSC for feedback on the interfaces. They thank Mary Todd Bergman and Spencer Phillips of the EBI for their assistance with figures for this article and Hayley Protheroe and the WTSI Team 109 for providing mouse images.

FUNDING

The National Institutes of Health (NIH) [1 U54HG006370-01]; EMBL-EBI Core Funding (to H.P. and P.F.) and Wellcome Trust Core Funding (to W.C.S.). Funding for open access charge: NIH [1 U54HG006370-01].

Conflict of interest statement. None declared.

REFERENCES

1. Brown,S.D.M. and Moore,M.W. (2012) The International Mouse Phenotyping Consortium: past and future perspectives on mouse phenotyping. Mamm. Genome, 23, 632–640.

2. Brown,S.D.M. and Moore,M.W. (2012) Towards an encyclopaedia of mammalian gene function: the International Mouse Phenotyping Consortium. Dis. Models Mech., 5, 289–292.

3. Valenzuela,D.M., Murphy,A.J., Frendewey,D., Gale,N.W., Economides,A.N., Auerbach,W., Poueymirou,W.T., Adams,N.C., Rojas,J., Yasenchak,J. et al. (2003) High-throughput engineering of the mouse genome coupled with high-resolution expression analysis. Nat. Biotechnol., 21, 652–659.

4. Skarnes,W.C., Rosen,B., West,A.P., Koutsourakis,M., Bushell,W., Iyer,V., Mujica,A.O., Thomas,M., Harrow,J., Cox,T. et al. (2011) A conditional knockout resource for the genome-wide study of mouse gene function. Nature, 474, 337–342.

Table 3. Genotype–phenotype associations from legacy EuroPhenome and Sanger MGP available from the IMPC portal, September 2013

Mammalian phenotype high-level terms    Genotype–phenotype associations
Behavior/neurological phenotype         1268
Homeostasis/metabolism phenotype        1032
Growth/size phenotype                   724
Hematopoietic system phenotype          702
Skeleton phenotype                      450
Vision/eye phenotype                    441
Adipose tissue phenotype                135
Limbs/digits/tail phenotype             125
Craniofacial phenotype                  107
Cardiovascular system phenotype         57
Integument phenotype                    33
Nervous system phenotype                24
Pigmentation phenotype                  20
Immune system phenotype                 4
Reproductive system phenotype           4
Endocrine/exocrine gland phenotype      3
Digestive/alimentary phenotype          2
Total                                   5309

Associations are grouped by high-level mammalian phenotype ontology terms.

5. Ringwald,M., Iyer,V., Mason,J.C., Stone,K.R., Tadepally,H.D., Kadin,J.A., Bult,C.J., Eppig,J.T., Oakley,D.J., Briois,S. et al. (2011) The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium. Nucleic Acids Res., 39, D849–D855.

6. Morgan,H., Beck,T., Blake,A., Gates,H., Adams,N., Debouzy,G., Leblanc,S., Lengger,C., Maier,H., Melvin,D. et al. (2010) EuroPhenome: a repository for high-throughput mouse phenotyping data. Nucleic Acids Res., 38, D577–D585.

7. White,J.K., Gerdin,A.K., Karp,N.A., Ryder,E., Buljan,M., Bussell,J.N., Salisbury,J., Clare,S., Ingham,N.J., Podrini,C. et al. (2013) Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell, 154, 452–464.

8. Mallon,A.M., Iyer,V., Melvin,D., Morgan,H., Parkinson,H., Brown,S.D.M., Flicek,P. and Skarnes,W.C. (2012) Accessing data from the International Mouse Phenotyping Consortium: state of the art and future plans. Mamm. Genome, 23, 641–652.

9. Eppig,J.T., Blake,J.A., Bult,C.J., Kadin,J.A. and Richardson,J.E. (2012) The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse. Nucleic Acids Res., 40, D881–D886.

10. Flicek,P., Ahmed,I., Amode,M.R., Barrell,D., Beal,K., Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., Fairley,S. et al. (2013) Ensembl 2013. Nucleic Acids Res., 41, D48–D55.

11. Hochheiser,H., Aronow,B.J., Artinger,K., Beaty,T.H., Brinkley,J.F., Chai,Y., Clouthier,D., Cunningham,M.L., Dixon,M., Donahue,L.R. et al. (2011) The FaceBase Consortium: a comprehensive program to facilitate craniofacial research. Dev. Biol., 355, 175–182.

12. Smith,C.L., Goldsmith,C.A.W. and Eppig,J.T. (2005) The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol., 6, R7.

13. Simon,M.M., Greenaway,S., White,J.K., Fuchs,H., Gailus-Durner,V., Sorg,T., Wong,K., Bedu,E., Cartwright,E.J., Dacquin,R. et al. (2013) A comparative phenotypic and genomic analysis of C57BL/6J and C57BL/6N mouse strains. Genome Biol., 14, R82.

14. Beck,T., Morgan,H., Blake,A., Wells,S., Hancock,J.M. and Mallon,A.-M. (2009) Practical application of ontologies to annotate and analyse large scale raw mouse phenotype data. BMC Bioinf., 10(Suppl. 5), S2.

15. Wilkinson,P., Sengerova,J., Matteoni,R., Chen,C.K., Soulat,G., Ureta-Vidal,A., Fessele,S., Hagn,M., Massimi,M., Pickford,K. et al. (2010) EMMA--mouse mutant resources for the international scientific community. Nucleic Acids Res., 38, D570–D576.

16. Hayamizu,T.F., Mangan,M., Corradi,J.P., Kadin,J.A. and Ringwald,M. (2005) The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biol., 6, R29.

17. Karp,N.A., Baker,L.A., Gerdin,A.K.B., Adams,N.C., Ramírez-Solis,R. and White,J.K. (2010) Optimising experimental design for high-throughput phenotyping in mice: a case study. Mamm. Genome, 21, 467–476.

18. Karp,N.A., Melvin,D., Sanger Mouse Genetics Project and Mott,R.F. (2012) Robust and sensitive analysis of mouse knockout phenotypes. PLoS One, 7, e52410.

19. West,B., Welch,K.B. and Galecki,A.T. (2007) Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall/CRC, Boca Raton.

Nucleic Acids Research, 2014, Vol. 42, Database issue D809
