Defining Pancreatic Endocrine Precursors and Their Descendants

Peter White Ph.D.1, Catherine Lee May Ph.D.1,2, Rodrigo N. Lamounier M.D.1, John E. Brestelli M.S.1 and Klaus H. Kaestner Ph.D.1*

1 Department of Genetics and Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA

2 Department of Pathology, The Children’s Hospital of Philadelphia Abramson Research Center, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA

Running Title: Expression profiling of the endocrine pancreas

*Correspondence: [email protected]

Received for publication 26 September 2007 and accepted in revised form 30 November 2007.

Additional information for this article can be found in an online appendix at http://diabetes.diabetesjournals.org.

Diabetes Publish Ahead of Print, published online December 10, 2007

Copyright American Diabetes Association, Inc., 2007



ABSTRACT

Objective: The global incidence of diabetes continues to increase. Cell replacement therapy and islet transplantation offer hope, especially for severely affected patients. Efforts to differentiate insulin producing β-cells from progenitor or stem cells require knowledge of the transcriptional programs that regulate the development of the endocrine pancreas.

Research Design and Methods: Differentiation toward the endocrine lineage is dependent on the transcription factor Neurogenin 3 (Neurog3; Ngn3). We utilize a Neurog3-EGFP knock-in mouse model to isolate endocrine progenitor cells from embryonic pancreata (E13.5 through E17.5). Using advanced genomic approaches we generate a comprehensive gene expression profile of these progenitors and their immediate descendants.

Results: A total of 1,029 genes were identified as being temporally regulated in the endocrine lineage during fetal development, 237 of which are transcriptional regulators. Through pathway analysis we have modeled regulatory networks involving these proteins which highlight the complex transcriptional hierarchy governing endocrine differentiation.

Conclusion: We have been able to accurately capture the gene expression profile of the pancreatic endocrine progenitors and their descendants. The list of temporally regulated genes identified in fetal endocrine precursors and their immediate descendants provides a novel and important resource for developmental biologists and diabetes researchers alike.


Significant efforts to treat and potentially cure diabetes have been focused on generating renewable sources of insulin-producing β-cells from their progenitors to be used in transplantation (1-3). This effort is motivated by the increased incidence of both type I and II diabetes and the limited effectiveness of pharmaceutical treatments for the disease. The Edmonton cadaveric islet transplantation protocol (4) opened the door to improved treatment of the disease, but faces two significant challenges: the need for improved immuno-regulatory measures and the necessity to increase the supply of islets or β-cells by a factor of at least 1,000 to be able to treat even the most severely affected patients. While we have gained significant insights into the transcriptional programs and signaling mechanisms that control the differentiation of endocrine precursors to mature hormone-expressing endocrine cells of the islet (reviewed in 5-9), efforts to direct differentiation of various cells into β-cells have met with only limited success (2).

Differentiation of the pancreatic endocrine lineage is dependent on Neurogenin 3 (Neurog3; Ngn3), a member of the family of basic helix-loop-helix (bHLH) transcription factors. Neurog3-null pancreata lack all five mature endocrine cell types (10; 11) and Neurog3 is also required for enteroendocrine cell development in the stomach and intestine (12; 13). Notably, regulatory genes marking endocrine precursors, including Pax4, Pax6, Isl1 and Neurod1 (14; 15), are not expressed in the absence of Neurog3. Ectopic expression of Neurog3 can initiate differentiation of endocrine cells in mouse (16), man (17), and pig (18). Furthermore, lineage tracing during mouse embryogenesis has revealed that Neurog3 positive cells differentiate exclusively into islet cells, suggesting that expression of this gene could be used as a marker for isolating these progenitors for further study (19).

During pancreatic development in the mouse, expression of Neurog3 is initiated during pancreatic budding, as early as E9.5 in the dorsal primordium (20; 21). Relatively low levels of Neurog3 expression are maintained until E13.5 when a dramatic up-regulation is observed, marking the beginning of the second transition (16). Expression is believed to peak at E15.5 and then rapidly decreases to undetectable levels in juvenile and adult islets (10). Different transcriptional regulators participate in the activation of Neurog3, including Onecut1 (Hnf6) as a direct activator of its transcription and members of the FoxA family (7; 22; 23). Expression of Neurog3 is restricted to a relatively small population of cells destined for the endocrine lineages (19). After differentiation into hormone-expressing cells has occurred, expression of Neurog3 ceases. This is most likely due to negative feedback, as Neurog3 has been shown to repress its own expression (24). Homeodomain factors such as Pax4 (25) and Nkx2-2 (26) as well as the bHLH factor NeuroD1 (27) are targets for Neurog3 activation. However, the majority of downstream genes dependent on Neurog3 and the mechanisms through which this factor regulates differentiation of endocrine precursors into islet cells are unknown.

In the present study we set out to identify potential novel targets of Neurog3 and to determine the transcriptome of the endocrine lineage during fetal development. Taking advantage of the fact that EGFP protein persists in cells for several days after the promoter driving its expression has been turned off, we were able to sort Neurog3-EGFP endocrine precursors and their immediate descendants from fetal pancreas. By determining the gene expression profile of these developing cell populations, we derived gene signatures that can be used to guide


future efforts to enhance differentiation of endocrine precursors.

RESEARCH DESIGN AND METHODS

Animals. Heterozygous male mice containing an EGFP-marked null allele of Neurog3 (Neurog3+/EGFP) (12) were mated with CD1 females and pregnant females sacrificed at either 12.5, 13.5, 14.5, 15.5, 16.5, or 17.5 days of gestation. Embryos were rapidly dissected and the pancreata removed and placed into PBS. Neurog3+/EGFP embryos (at the expected 50% Mendelian ratio) were easily identified by their green fluorescence and transferred into a separate buffer ready for dissociation into single cells. A total of 70 pregnant females, producing 785 embryos, were required to produce three biological replicates for each time point. Adult islets were prepared from 8 CD1 female mice as described previously (28). Islet preparations from 2 animals were pooled for RNA isolation using the RNeasy kit (Qiagen), giving a total of four biological replicate samples for expression analysis.

Preparation of single cell suspensions and flow cytometry. EGFP positive pancreata were separated from EGFP negative pancreata and pooled in 500 to 1000 µl of prewarmed Trypsin at 37°C in a standard scintillation vial. The number of pancreata needed to form a biological replicate depended upon developmental stage and the number of Neurog3+/EGFP embryos harvested. On average 25 embryos from 6 females were required for each E13.5 and E14.5 replicate, 19 embryos from 5 females were required for each E15.5 and E16.5 replicate, and 12 embryos from 3 females were required for each E17.5 replicate. The pool of pancreata was minced in the trypsin solution by repeatedly chopping the samples using very fine surgical scissors (Fine Science Tools, CA) for 1-2 minutes. A small stir-bar was added to the vial and the sample was incubated at 37°C on a stir plate at low speed for 7 to 15 minutes, until no clumps of cells were visible. The dissociation was terminated with the addition of an equal volume of RPMI 1640 containing 10% FBS, and the suspension was transferred to a 5 ml polystyrene round bottom Falcon tube through a 35 µm nylon mesh cell strainer cap (BD Biosciences, NJ) and placed immediately on ice ready for sorting.

Neurog3+/EGFP positive and negative cells were sorted by the University of Pennsylvania Flow Cytometry and Cell Sorting Resource Laboratory, using the FACSVantage SE (BD Biosciences, NJ). Cells were gated for viability and non-aggregates to achieve a high purity sort. Both positive and negative cells were separated and collected directly into sterile 1.5 ml microcentrifuge tubes containing 500 µl of the denaturing buffer RLT from the RNeasy mini kit (Qiagen). No more than 70,000 Neurog3+/EGFP positive cells were collected to prevent over-dilution of the denaturing buffer used for the RNA extraction. Once the sort was complete, samples were snap frozen in liquid nitrogen and stored at -80°C for further processing. Probe preparation and Microarray hybridization. Total RNA was prepared from the sorted cells or adult islets using the RNeasy Mini Kit (Qiagen, Inc., CA) and eluted in water. The quality of each RNA sample was assessed using either the RNA 6000 Pico or Nano LabChip Kit with a 2100 Bioanalyzer (Agilent Technologies, Inc., CA). Only samples of high RNA quality with a 28S/18S RNA ratio >2 and with RNA Integrity Number (RIN) greater than 7 were used.
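The RNA acceptance criteria stated above (28S/18S ratio >2 and RIN greater than 7) can be written as a simple filter. The following is a minimal sketch only; the sample names and measurements are hypothetical and are not data from this study.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    name: str             # hypothetical sample label
    ratio_28s_18s: float  # 28S/18S rRNA ratio from the Bioanalyzer trace
    rin: float            # RNA Integrity Number


def passes_qc(sample, min_ratio=2.0, min_rin=7.0):
    """Accept a sample only if both thresholds are strictly exceeded."""
    return sample.ratio_28s_18s > min_ratio and sample.rin > min_rin


# Hypothetical measurements, for illustration only.
samples = [
    RnaSample("E14.5_rep1", 2.3, 8.9),
    RnaSample("E14.5_rep2", 1.8, 8.1),  # fails on the 28S/18S ratio
    RnaSample("E17.5_rep1", 2.1, 6.5),  # fails on RIN
]
accepted = [s.name for s in samples if passes_qc(s)]
print(accepted)
```

Both inequalities are strict here because the text specifies values greater than the thresholds.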

Approximately 20ng of total RNA from each sample was used for labeling and hybridization. Samples were labeled using the Ovation™ Aminoallyl RNA Amplification and Labeling System (NuGEN Technologies, Inc., CA), which uses the rapid and sensitive Ribo-SPIA™ RNA amplification process. After amplification, the reference sample was created by pooling amplified cDNA in equal


aliquots of each of the samples used in the hybridization. In order to analyze changes in gene expression occurring over time a “reference” experimental design was used for the microarray analysis. Cyanine dyes (GE Healthcare Bio-Sciences Corp., NJ) were chemically attached (coupled) to 2 µg of amplified cDNA in 50 mM sodium bicarbonate, pH 8.5, in the dark at room temperature for 1 hour. All test samples were coupled to Cy5 (red) while the reference sample was coupled to Cy3 (green). Each test sample was combined with an equal amount of the reference sample and purified using the MinElute Reaction Cleanup Kit (Qiagen, Inc., CA). The eluted sample of fluorescently labeled cDNA probe was mixed with hybridization buffer (2.5 µg Cot1 DNA, 2.5 µg oligo-dT, 25% formamide, 5X SSC, and 0.1% SDS), denatured at 95°C for 5 min and applied to the Mouse PancChip 6 spotted cDNA array (29; 30), covered with glass coverslips and incubated in a Corning Hybridization chamber (Corning Incorporated Life Sciences, MA) overnight at 42°C. The Mouse PancChip 6 contains approximately 13,000 mouse cDNAs chosen for their expression in various stages of pancreatic development, many of which are not found on commercially available arrays. Detailed information on this array and full protocols are available at http://www.cbil.upenn.edu/EPConDB. After hybridization, the coverslips were removed in 2X SSC, 0.1% SDS and the arrays were washed for 5 min in 0.2X SSC, 0.1% SDS at 42°C and 5 min in 0.2X SSC at room temperature. The arrays were immediately scanned with the Agilent DNA Microarray Scanner, Model G2565B (Agilent Technologies, Inc., CA) at a resolution of 5 µm.

Data processing. The median Cy5 (red) and Cy3 (green) intensities of each element on the array were determined by processing the array images with GenePix Pro version 5.1

(Molecular Devices Corporation, CA). All subsequent steps were performed using scripts we developed for this analysis in the R open source language environment for statistical computing (http://www.r-project.org/). Positive control elements were removed from the data set and the expression ratio for each element on the array was calculated in terms of M [log2(Red/Green)] and A [(log2(Red) + log2(Green))/2], without local background signal subtraction. The data were normalized by the print tip loess method using the BioConductor package “marray” (31). QC diagnostic plots were prepared for each array, and those failing to exhibit high quality hybridizations were excluded from further analysis. This resulted in the final data set containing three biological replicates for each embryonic time point and four biological replicates for the adult islet time point, giving a total of 19 samples in the time course.

Quantitative real-time reverse transcription PCR (qRT-PCR). Gene expression profiles were confirmed using qRT-PCR. cDNA was prepared from each replicate at the five developmental time points, the adult islet samples, and from E14.5 and E16.5 EGFP- cells that were collected along with the EGFP+ cells. cDNA was synthesized from approximately 20ng of total RNA using the WT-Ovation™ RNA Amplification System (NuGEN Technologies, Inc., CA). PCR reaction mixes were assembled using the Brilliant® SYBR Green qRT-PCR Master Mix according to manufacturer’s instructions (Stratagene, CA). Reactions were performed using the SYBR Green (with Dissociation Curve) program on the Mx3000™ Multiplex Quantitative PCR System (Stratagene, CA). Cycling parameters were 95°C for 10 minutes and then 40 cycles of 95°C (30 s), 60°C (1 minute), and 72°C (30 s) followed by a melting curve analysis. All reactions were performed with 3 biological replicates and 3 technical replicates with reference dye normalization. Actb, Hprt, Gapdh, Tbp and


Ubc were tested for their suitability as housekeeping genes using the geNorm analysis package (32). Expression of Actb was shown to be unstable across the samples being analyzed and as such was not included in the qRT-PCR normalization set. The median cycle threshold (CT) value was used for analysis, and all CT values were normalized to expression of four housekeeping genes: Tbp, Hprt, Ubc and Gapdh. Primer sequences are available upon request.

Statistical analysis. All statistical analysis of the array data was performed using the normalized M values for all non-control elements on the array. This M value represents the ratio of the test sample over the reference sample. Identification of temporally regulated genes was performed using the R based application, EDGE (Extraction of Differential Gene Expression, version 1.1.291) (33), which uses the newly developed Optimal Discovery Procedure (ODP) statistical theory (34). For direct comparisons between any two conditions EDGE was implemented with static settings to identify differential expression. Clustering and principal component analysis were performed using GeneSpring GX (Agilent Technologies, Inc., CA).
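To make the M and A quantities used throughout the statistical analysis concrete, the sketch below computes them for a few array elements. It is illustrative only: the spot names and intensities are hypothetical, and the print tip loess normalization applied in the study is not reproduced here.

```python
import math


def m_value(red, green):
    """Log2 ratio of the test (red, Cy5) over the reference (green, Cy3) signal."""
    return math.log2(red / green)


def a_value(red, green):
    """Average log2 intensity of the two channels."""
    return (math.log2(red) + math.log2(green)) / 2


# Hypothetical median intensities (red, green) for three array elements.
spots = {
    "spot_a": (4096.0, 1024.0),  # four-fold higher in the test sample
    "spot_b": (512.0, 2048.0),   # four-fold lower in the test sample
    "spot_c": (1024.0, 1024.0),  # unchanged
}
ma = {name: (m_value(r, g), a_value(r, g)) for name, (r, g) in spots.items()}
for name, (m, a) in ma.items():
    print(f"{name}: M = {m:+.1f}, A = {a:.1f}")
```

A positive M indicates higher signal in the test sample than in the pooled reference, and a negative M indicates lower signal.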

Time course analysis of microarray data is considerably more difficult than identification of differentially expressed genes in a two-state comparison, and there are limited tools available to achieve this. Of the available statistical tools, four were compared for their ability to identify temporally regulated genes: ANOVA (GeneSpring GX), SAM (35), PaGE (36) and EDGE (33). In our hands, EDGE outperformed the other three methods at identifying temporally regulated genes in this time course (data not shown). EDGE was executed with 1000 iterations, using a natural cubic spline, with a dimensional basis for the spline of 3. Analysis of the P value vs. q-value plot and the q-value vs. number of significant tests plot indicated that a P value cutoff of 0.05, corresponding to a q-value of 0.23, returned the maximum number of significant genes with the fewest false positives (Supplementary Data Figure 1 [available at http://diabetes.diabetesjournals.org]). A total of 1,170 distinct genes were returned with a P value ≤ 0.05. This list was filtered to remove any elements that failed to show a fold change (FC) of at least 1.2 between any two time points, giving a final list of 1,029 genes.
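The two-step selection described above (a P value cutoff followed by a fold change filter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R scripts: it assumes that each gene is summarized by one mean log2 ratio (M) per time point, and that the filter keeps a gene when its largest fold change between any two time points is at least 1.2. Gene names, P values and profiles are hypothetical.

```python
def max_fold_change(mean_m_by_time):
    """Largest fold change between any two time points, from log2 ratios."""
    return 2 ** (max(mean_m_by_time) - min(mean_m_by_time))


def temporally_regulated(genes, p_cutoff=0.05, fc_cutoff=1.2):
    """Keep genes passing both the P value cutoff and the fold change filter."""
    return [
        name
        for name, (p_value, profile) in genes.items()
        if p_value <= p_cutoff and max_fold_change(profile) >= fc_cutoff
    ]


# Hypothetical (P value, mean M per time point) entries.
genes = {
    "gene_a": (0.001, [1.0, 0.5, 0.0, -0.5, -1.0, -1.5]),      # large change, significant
    "gene_b": (0.030, [0.10, 0.05, 0.00, 0.05, 0.10, 0.00]),   # significant, but FC about 1.07
    "gene_c": (0.200, [2.0, 1.0, 0.0, -1.0, -2.0, -2.0]),      # large change, not significant
}
selected = temporally_regulated(genes)
print(selected)
```

Because M values are on a log2 scale, the fold change between two time points is 2 raised to the difference of their M values, so only the maximum and minimum of each profile need to be compared.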

The complete list of targets, organized into broad functional groups, can be found in Supplementary Data Table 1. To aid the reader in exploring this results table, each identifier links to the corresponding Entrez Gene description page. In addition, expression profiles were generated for the significant genes and are available in Supplementary Data Expression Profiles. The natural cubic spline used by EDGE to fit the data is shown in blue.

Gene annotation enrichment analysis. The resulting gene list was annotated with functional categories using the National Institute of Allergy and Infectious Diseases DAVID bioinformatics resource, using all of those elements on the array annotated with an Entrez GeneID (37). Enrichment of these functional categories was determined using the Fisher Exact test, by comparing the categories found in our list with the background list of all genes found on the PancChip. The complete results of this analysis can be found in Supplementary Data Table 4.

Ingenuity Pathway Analysis. Biologically relevant networks were drawn from the list of 1,029 temporally regulated genes identified by EDGE analysis. These data were generated through the use of Ingenuity Pathways Analysis (IPA), a web-delivered application (www.Ingenuity.com) for discovering, visualizing and exploring biologically and therapeutically relevant networks.
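For readers unfamiliar with enrichment testing, the sketch below shows a plain one-sided Fisher Exact test computed from the hypergeometric distribution. It is an assumption-laden illustration: the counts are invented, and the exact variant of the test implemented in DAVID may differ from this textbook form.

```python
from math import comb


def fisher_enrichment_p(list_hits, list_size, background_hits, background_size):
    """One-sided P value: the chance of drawing at least `list_hits` annotated
    genes when `list_size` genes are sampled at random from a background of
    `background_size` genes, `background_hits` of which carry the annotation."""
    max_hits = min(list_size, background_hits)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Invented counts: 5 of 20 background genes are annotated, 3 of 4 list genes are.
p = fisher_enrichment_p(list_hits=3, list_size=4, background_hits=5, background_size=20)
print(f"enrichment P value = {p:.4f}")
```

Using the array content as the background, as described in the text, matters because a category that is common on the array is expected to be common in any list drawn from it.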


The application and a detailed explanation of significance scoring were described previously (38). Of this list, 950 genes could be mapped to elements in the IPA database. To focus the analysis on those genes with most significant changes in expressions, pathway analysis was performed using a subset of 334 genes (Focus Genes) in this list that showed a fold change greater than 1.5 between any two time points. RESULTS Isolation of endocrine precursors. In order to profile the gene expression changes in fetal endocrine pancreas precursors and their immediate descendants, we employed Neurog3-EGFP knock-in mice (12). Mice homozygous for this allele develop diabetes and die shortly after birth, whereas heterozygous mice (Neurog3+/EGFP) show no apparent differences from wild-type littermates as assessed by overall development, growth characteristics and glucose homeostasis. Use of a “knock-in” allele is advantageous compared to a Neurog3 promoter-driven transgene, which might be missing essential cis-elements or which might be influenced by integration site effects. Heterozygous male mice were mated with CD1 female mice and embryos were harvested daily from E13.5 to E17.5, with a total of 377 Neurog3+/EGFP embryos being required to complete the study. Figure 1 highlights the development and differentiation of the pancreas throughout the secondary transition, with EGFP levels peaking at E14.5. This figure also illustrates the fact that EGFP is persistent in the fetal pancreas due to the long half life of the protein. Thus, we were able to isolate the direct descendants of Neurog3+ cells at later stages of pancreas development, i.e. cells in which Neurog3 transcription and new protein synthesis had been extinguished.

For optimal preparation of high quality RNA required for expression analysis, 30,000 to 70,000 cells were sorted from pools of 12-25 pancreata to produce each biological replicate. At least three biological replicates were prepared for each time point. The procedure was attempted at E12.5 using 25 Neurog3+/EGFP embryos, but only 1000 Neurog3+ cells were obtained, representing less than 0.03% of the total cell number. This low level of expression is consistent with expression studies using in situ hybridization (10; 21).

The greatest percentage of cells expressing Neurog3-EGFP occurred on E14.5, with 5.2% of the sorted population being positive for EGFP (Figure 2A). Subsequently, there was a steady decrease in the percentage of positive cells, which likely reflects the differentiation and expansion of the population of non-Neurog3 expressing cells at a higher rate than those expressing the gene (Figure 2B). The persistence of 3,000 to 4,000 EGFP positive cells up to E17.5 is the result of the long half-life of EGFP; we observed Neurog3 mRNA levels to be considerably reduced in the latter half of the time-course and protein levels are not detectable at these late stages (10).

Identification of temporally-regulated genes. Total RNA samples isolated from the sorted Neurog3-EGFP cells and passing stringent QC measures were used for expression profiling. Furthermore, unlike previous expression array-based studies in which only one or two arrays were used per time-point, we used three or four biological replicates for each time point, giving this study increased statistical power. Time course analysis of microarray data is considerably more difficult than identification of differentially expressed genes in a two-state comparison, but the EDGE program proved to be powerful in identifying temporally controlled gene expression (see Methods). Using EDGE, 1,029 temporally regulated genes were identified (complete list in Supplementary Data Expression Profiles). For the purposes


of highlighting this data, genes were grouped based upon peak expression at each of the six developmental stages investigated and ranked according to their EDGE P-value. The top five most significantly temporally-regulated genes with expression levels peaking at each developmental time point are shown in Figure 3.
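The grouping and ranking step described above can be sketched as follows: each gene is assigned to the stage at which its expression peaks, genes within a stage are ranked by P value, and the top entries are reported. The gene names, P values and profiles are hypothetical, and the assumption that each gene is summarized by one mean value per stage is ours.

```python
from collections import defaultdict

STAGES = ["E13.5", "E14.5", "E15.5", "E16.5", "E17.5", "Adult"]


def top_genes_by_peak(genes, n=5):
    """Group genes by the stage of peak expression and keep the n genes
    with the smallest P values in each group."""
    groups = defaultdict(list)
    for name, (p_value, profile) in genes.items():
        peak_index = max(range(len(profile)), key=profile.__getitem__)
        groups[STAGES[peak_index]].append((p_value, name))
    return {
        stage: [name for _, name in sorted(groups[stage])[:n]]
        for stage in STAGES
        if stage in groups
    }


# Hypothetical (P value, mean expression per stage) entries.
genes = {
    "gene_a": (0.0004, [2.0, 1.5, 1.0, 0.5, 0.0, -0.5]),  # peaks at E13.5
    "gene_b": (0.0001, [1.8, 1.2, 0.9, 0.3, 0.1, 0.0]),   # peaks at E13.5
    "gene_c": (0.0100, [0.0, 0.2, 0.4, 0.8, 1.0, 2.5]),   # peaks in adult islets
}
top = top_genes_by_peak(genes)
print(top)
```

Stages with no peaking genes are simply omitted from the result.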

Expression profile data were confirmed via qRT-PCR (Figure 4). For the 18 genes tested, the results of the real-time PCR analysis closely mirrored the results of the array analysis. Little or no transcript was detected in EGFP- cells for the majority of the genes tested. The presence of Hmga2, Id1, and Id2 mRNA in these non-endocrine precursor cells may indicate an additional role for these genes in development of the exocrine pancreas. As expected, Neurog3 was not expressed in either the EGFP- cells or adult islets, indicating the high level of purity of the sorted cells. Ghrl and Pou3f4 also had no detectable mRNA in the mature islet, with expression profiles closely matching that which had been described previously (16; 39). Clustering Analysis. Hierarchical clustering was performed using the 1,029 genes identified to be temporally regulated (Figure 5A). Biological replicates for each time point were observed to cluster together, and a clear progression of similarity between time points is apparent (E13.5 most similar to E14.5, which together were similar to E15.5 and so forth). Clustering of the genes revealed that many genes were expressed in similar patterns during the time course. The cluster containing Neurog3 identified 21 genes whose expression profile was strikingly similar to this master regulator (Figure 5B). Finally, hierarchical clustering was performed on the subset of 64 genes with transcription factor activity (Figure 5C). Three patterns were evident amongst these genes: those with increasing expression over time (Sox15, Pax4, Myc), those with peak expression in the middle of the time course (Onecut1, Mafb,

Isl1), and then those with decreasing expression over time (Gata4, Neurog3, Foxa2). Neurog3 was observed to cluster most closely with Nkx2-2, which is known to function downstream of Neurog3 in the specification of the β-cell, and Mycl1, a gene with links to lung and colorectal cancer, but no prior literature with regard to its role in the pancreas.
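One way to see how genes end up in the same cluster as Neurog3 is to rank them by the similarity of their expression profiles to the Neurog3 profile. The sketch below uses Pearson correlation as the similarity measure; this is an assumption, since the metric used in GeneSpring GX is not stated in the text, and all profile values, including the one labeled Neurog3, are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def most_similar(target, profiles, n=3):
    """Names of the n profiles most correlated with the target profile."""
    reference = profiles[target]
    scored = [
        (pearson(reference, profile), name)
        for name, profile in profiles.items()
        if name != target
    ]
    return [name for _, name in sorted(scored, reverse=True)[:n]]


# Invented profiles across six stages.
profiles = {
    "Neurog3": [3.0, 2.5, 2.0, 1.0, 0.5, 0.0],  # decreasing over time
    "gene_x": [6.0, 5.0, 4.0, 2.0, 1.0, 0.0],   # same shape, larger amplitude
    "gene_y": [0.0, 0.5, 1.0, 2.0, 2.5, 3.0],   # increasing over time
    "gene_z": [1.0, 3.0, 1.0, 3.0, 1.0, 3.0],   # oscillating
}
ranking = most_similar("Neurog3", profiles)
print(ranking)
```

Correlation compares the shape of two profiles rather than their absolute levels, which is why a gene with a larger amplitude but the same trend ranks first here.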

Principal Component Analysis (PCA) was performed to identify predominant gene expression patterns. Three principal components were identified that accounted for 95% of the significant variability in the data (Figure 5D). Over half of the temporally-regulated genes were observed to cluster with Neurog3, with a pattern of decreasing expression over time. Of the genes whose expression profile correlated most significantly with Neurog3, 50% were found to be expressed in the nucleus (P < 0.007), and GO functions of development (p < 0.011) and transcription (p < 0.048) were significantly overrepresented. This is consistent with a model in which subdivision and differentiation of the endocrine lineage is dependent on a cascade of multiple transcription factors. Differential gene expression in endocrine precursors and their descendants. In addition to the temporal analysis performed with EDGE, a direct comparison analysis was performed to highlight differences between the endocrine precursors and their immediate descendants. Based upon the clustering analysis described above (Figure 5), the endocrine precursors sorted from E13.5 and E14.5 pancreata had very similar expression profiles. Likewise, the descendants of these precursor cells, which were marked by EGFP and sorted from later time points (E16.5 and E17.5), were also observed to cluster closely together, and clearly had an expression profile that was distinct from the precursors. Therefore, to gain statistical power, we performed a direct comparison between the


six E13.5 and E14.5 biological replicates, representing endocrine precursors, and the six E16.5 and E17.5 biological replicates, representing their descendants. It should be noted that although we observed Neurog3 mRNA levels to be substantially reduced in the descendant population, levels were not completely extinguished (Figure 4A). As such, the cells sorted at these latter time points may represent a mixed population of descendants and cells continuing to express Neurog3. A total of 648 genes were found to be differentially expressed with an absolute fold change ≥ 1.3 and a P-value < 0.05 (Supplementary Data Table 2). Of these genes, 182 (28%) have higher expression in endocrine precursors (E13.5/E14.5), and 466 (72%) higher expression in the descendants (E16.5/E17.5). Over 100 transcriptional regulators were identified in this analysis, 33 of which were expressed at highest levels in endocrine precursors.
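The direct comparison above can be illustrated with a short sketch that assigns a gene to one of the two groups using the stated thresholds (absolute fold change of at least 1.3 and P value below 0.05). The P value is taken as given, since the EDGE statistics are not reproduced here, and the replicate values are hypothetical log2 ratios (M values).

```python
from statistics import mean


def classify(precursor_m, descendant_m, p_value, fc_cutoff=1.3, p_cutoff=0.05):
    """Label a gene by the group with higher expression, or 'unchanged'
    when either threshold is not met."""
    log2_diff = mean(descendant_m) - mean(precursor_m)
    fold_change = 2 ** abs(log2_diff)
    if p_value >= p_cutoff or fold_change < fc_cutoff:
        return "unchanged"
    return "higher in descendants" if log2_diff > 0 else "higher in precursors"


# Hypothetical genes: six precursor and six descendant replicates each.
calls = {
    "gene_a": classify([1.0] * 6, [-1.0] * 6, p_value=0.001),  # 4-fold drop
    "gene_b": classify([0.0] * 6, [1.0] * 6, p_value=0.010),   # 2-fold rise
    "gene_c": classify([0.0] * 6, [0.2] * 6, p_value=0.010),   # about 1.15-fold
    "gene_d": classify([0.0] * 6, [1.0] * 6, p_value=0.200),   # not significant
}
print(calls)

# The reported split of the 648 genes, as rounded percentages.
shares = (round(100 * 182 / 648), round(100 * 466 / 648))
print(shares)
```

Pooling the two early and the two late time points gives six replicates per group, which is the source of the added statistical power mentioned in the text.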

An additional 338 differentially expressed genes were discovered using this approach which had not been identified in the time course analysis described earlier. These include Insm1, Nars2, Foxp1, Hhex and Cd14. Among the genes that were most highly expressed in the E16.5/E17.5 Neurog3-descendants were markers of the β-cell (Ins1, Ins2, Glut2) and ε-cell (Ghrl), indicative of differentiation towards the mature endocrine cell types. Overrepresented functions and canonical pathways were determined via pathway analysis (Supplementary Data Figure 2). Pathways involved in cellular movement, integrin signaling in relation to cytoskeletal rearrangements, protein ubiquitination and insulin receptor signaling were all observed to be significantly enriched in the progenitor cells when compared to the descendants. Strikingly, carbohydrate metabolism was absent from the progenitors but was observed to be highly enriched in the descendants, along with lipid metabolism and the Pparα/Rxrα activation pathway. These

observations reflect the differentiation of endocrine precursors into more mature endocrine cells.

Finally, we compared the expression profile of endocrine precursors (E13.5/E14.5) directly to that of adult islets. Over 2000 genes were found to be significantly differentially expressed. Of these genes, 103 were found to be highly expressed in the precursor cells, with a fold change >2 when compared with the adult islets (Supplementary Data Table 3). Twenty-five of these genes were classified as transcriptional regulators, many of which have an unknown role in the developing endocrine pancreas, e.g. Nars2, Mycl1, Sox11, Foxb1 and Zfp306.

Gene annotation enrichment analysis. Temporally-regulated genes were mapped to protein families using the Ingenuity Pathways Knowledge Base and to enriched Gene Ontology (GO) Biological Process and Molecular Function categories (Supplementary Data Figure 3). Protein metabolism and biosynthesis, cellular organization and differentiation, and development were highly enriched biological processes in endocrine precursors and their descendants. Gene annotation enrichment analysis using the list of temporally regulated genes revealed that the functions of nucleic acid binding and transcriptional regulation were significantly enriched. Detailed analysis and annotation with GO functions determined that this list contained 69 transcription factors, 71 other transcriptional regulators (such as cofactors and elements of the basal transcriptional machinery) and 97 other genes potentially involved in transcriptional regulation. Together, these 237 genes represented 23% of the entire list of 1,029 temporally-regulated genes. Table 1 lists these 237 transcription factors, transcriptional regulators and potential transcriptional regulators found to be temporally regulated during development of the endocrine precursors.


Several of these transcription factors have been shown by genetic means to function in the development of the endocrine pancreas, including Neurod1, Mafa, Mafb, Nkx2-2, Foxa1, and Foxa2 (Figure 6). Moreover, markers of mature islet cells, such as Ins1, Ppy, Ghrl and Iapp, were observed to have increasing expression levels over time. Interestingly, Gcg, a marker of the mature α-cell, was not significantly differentially expressed in these cells, but as expected high levels were observed in adult islets (Figure 4E). It was recently demonstrated that the role of Neurog3 in the development of the α-cell lineage occurs much earlier than the time frame studied in the present work, with the induction of Neurog3 in Pdx1+ progenitors at E8.7 resulting in an almost exclusive induction of glucagon-positive cells by E14.5 (40).

Identification of regulatory networks. Gene annotation enrichment analysis provided information regarding categorical changes with regard to the biological function and metabolic activity of the temporally regulated genes in this time course. However, our specific interest in this study was to understand how individual genes were integrated into specific regulatory and signaling networks. This type of analysis has not been reported in microarray studies of the developing endocrine pancreas, and revealed several new findings.

Several major networks were identified, but by far the most significant was a complex regulatory network controlling endocrine system development and function, centered around Neurog3 (Figure 7). Many of the transcription factors previously reported to play a role in endocrine cell development, such as Pdx1, Foxa2, Neurod1, Isl1, Nkx2-2, Mafa and Pax4, are part of this network derived from our expression data. In addition, several genes with limited prior evidence for a role in this developmental program were observed, such as Sim1, Insm1, Id2, and Nr3c1 (glucocorticoid receptor). Furthermore, genes with no previously reported role in this developmental process were identified, such as Id1, H19, Yy1 and Mycl1.

Pathway analysis was also used to examine the relationship between our gene set and existing canonical signaling pathways. Four pathways in particular showed highly significant enrichment (see Supplementary Data Figures 4-7): 22 genes were associated with Igf1 Signaling (p-value = 1.24E-09), 7 genes with the Endoplasmic Reticulum Stress Pathway (p-value = 2.65E-05), 21 genes with Wnt/β-Catenin Signaling (p-value = 1.54E-04) and 17 genes with Insulin Receptor Signaling (p-value = 4.51E-04).

DISCUSSION

Expression profiling of endocrine precursors and their descendants. We identified over 1,000 genes that are temporally regulated during development of the endocrine pancreas. Strikingly, a large number of the genes identified in our analysis have no previously reported role in the developing pancreas, and many are novel genes that have not previously been characterized. The most significantly temporally regulated gene was Glypican 3 (Gpc3), with a marked decrease in expression over time. Glypicans are cell-surface heparan sulfate proteoglycans that are bound to the exoplasmic surface and are thought to play a role in the control of cell division and growth regulation, possibly via modulation of Wnt signaling (41). Gpc3 is known to be highly expressed during embryogenesis and was recently demonstrated to be a marker of hepatic progenitor/oval cells, with an expression profile remarkably similar to that observed in our analysis of pancreas progenitors (42).

In total, 246 transcription factors, transcriptional regulators and potential transcriptional regulators were found to be temporally controlled during development of the endocrine precursors, several of which are well-established and important regulators of the endocrine pancreas, for instance Neurod1, Mafa, Mafb, Nkx2-2, Foxa1, and Foxa2. As expected, markers of mature islet cells, such as Ins1, Ppy, Ghrl and Iapp, were observed to have increasing expression levels over time, highlighting the comprehensiveness of our analysis.

Previous attempts to describe transcriptional regulation of the developing endocrine pancreas have not produced a transcriptional profile of endocrine precursors and their descendants as detailed as that presented in the current study. Gu et al. produced transcriptional profiles from four stages of endocrine pancreas development: E7.5 unspecified endoderm, E10.5 Pdx1-positive cells, E13.5 Neurog3-positive cells and adult islets (43). Expression profiling of Neurog3-positive precursor cells at E13.5 identified 71 genes that were temporally enriched in this population, of which only seven were transcription factors: Pou3f2 (Brn2), Isl1, Mycl1, Mafb, Myt1, Neurod1 and Pax4. With the exception of Myt1, all of these transcription factors were amongst the 246 transcription factors, transcriptional regulators and potential transcriptional regulators identified by our analysis. We observed expression of Myt1 to remain constant throughout the fetal time course, with a 2-fold reduction of expression in the adult islet (Supplementary Data Table 3). Several factors may explain the marked difference between the present study and the previous report. Gu et al. marked E13.5 endocrine precursors using a Neurog3-EGFP transgene, which may be missing essential elements of the Neurog3 promoter and is therefore less likely than our knock-in approach to reflect the true expression pattern of Neurog3. Furthermore, only one or two biological replicates were analyzed at each time point, resulting in significant statistical limitations. Finally, their comparisons were performed using different GeneChips for each time point, the most comprehensive of which lacked approximately 4,000 of the genes found on the PancChip array (30).

An alternative approach to the use of transgenic mice to mark Neurog3+ cells has been to transduce immortalized cell lines with recombinant viral vectors expressing Neurog3 (44; 45). Microarray analysis demonstrated increased expression of 51 genes in an in vitro study where pancreatic duct cell lines were infected with a Neurog3-expressing adenovirus (44). Fewer than 20% of the genes claimed as Neurog3 targets from adenoviral over-expression were also identified in the present study, and these were restricted to genes whose interaction with Neurog3 has been reported elsewhere (see Supplementary Data Figure 8 for details). This striking difference between the two studies most likely reflects a significant disadvantage of studies in which Neurog3 gene expression is artificially induced in a cell line far removed from the fetal pancreas. Furthermore, there is limited evidence to show that pancreatic duct cells can function in vivo as endocrine progenitors or be induced to differentiate into mature islet cells (46).

More recently, microarray analysis was performed using a Neurog3-deficient mouse model in an attempt to identify Neurog3-dependent genes expressed in whole pancreas (47). The use of whole pancreas tissue, as opposed to sorted Neurog3+ cells, severely limited the power of this approach, as at most 5% of the cells in the total pancreas express Neurog3 (Figure 2). Therefore, using total pancreas to investigate changes resulting from the deletion of Neurog3 will result in at least a 20-fold reduction in the power to detect alterations in gene expression. Consequently, this study identified only 52 differentially expressed genes, many of which were genes expressed at high levels in mature endocrine cells, such as glucagon and insulin, and which were already known to be lacking in Neurog3-/- pancreas (10).

Regulatory networks in the developing endocrine pancreas. Our pathway analysis identified several networks of genes regulating endocrine precursor specification, development and expansion. The centrality of Neurog3 in this process is highlighted in Figure 7. The signaling factors involved in specifying the developmental decision between endocrine and exocrine tissue during organogenesis of the pancreas are of considerable interest. Whereas insulin- and glucagon-producing cells do not derive from a hormone-coexpressing precursor cell, a common origin of endocrine cells does exist at the non-hormone-expressing precursor level, and both α- and β-cells develop from Pdx1- and Neurog3-positive cells (19; 48). Exocrine pancreatic cells arise from Ptf1a-expressing precursors, although it was recently demonstrated that mature pancreatic cells develop through a very early common progenitor expressing Pdx1, Ptf1a, cMyc and Cpa1 (49). In those cells not fated to become part of the islets, the transcriptional repressor Hes1, a main effector of Notch signaling, strongly inhibits Neurog3 gene promoter activity through a mechanism known as lateral inhibition, which occurs via Notch receptor signaling (20; 50). Indeed, in mice containing a homozygous mutation of the Hes1 gene, misexpression of Ptf1a throughout the gut epithelium results in ectopic pancreas formation (Fukada and Chiba, 2006).
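The dilution argument made earlier in this section for whole-pancreas profiling can be written out as a short calculation. The sketch below is a simplification that treats the loss of power as pure signal dilution under a linear mixture of expressing and non-expressing cells; the function names and the signal value of 100.0 are hypothetical, and only the 5% figure comes from the text.

```python
def dilution_factor(fraction_expressing: float) -> float:
    """Fold reduction in signal for a transcript restricted to a cell subset."""
    if not 0.0 < fraction_expressing <= 1.0:
        raise ValueError("fraction_expressing must be in (0, 1]")
    return 1.0 / fraction_expressing


def whole_tissue_signal(sorted_signal: float,
                        fraction_expressing: float,
                        background_signal: float = 0.0) -> float:
    """Signal expected from unsorted tissue.

    Assumes (for illustration only) that the measured signal is a
    cell-number-weighted average of expressing and non-expressing cells.
    """
    if not 0.0 <= fraction_expressing <= 1.0:
        raise ValueError("fraction_expressing must be in [0, 1]")
    return (fraction_expressing * sorted_signal
            + (1.0 - fraction_expressing) * background_signal)


if __name__ == "__main__":
    fraction = 0.05  # "at most 5%" of pancreatic cells express Neurog3 (from the text)
    print("fold dilution:", dilution_factor(fraction))
    # 100.0 is an arbitrary signal for a transcript found only in sorted cells.
    print("whole-tissue signal:", whole_tissue_signal(100.0, fraction))
```

Because 5% is an upper bound on the Neurog3+ fraction, the resulting dilution is a lower bound, which is why the text speaks of at least a 20-fold reduction.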

Temporal regulation of Id1 and Id2 was highly significant and followed a pattern of down-regulation similar to that of Neurog3 as development progressed. The Id1 and Id2 proteins contain a helix-loop-helix (HLH) domain but not a basic domain; they lack DNA binding activity and can therefore inhibit the DNA binding and transcriptional activation ability of bHLH transcription factors such as Neurog3 and Neurod1. The spatial and temporal control of the proneural bHLH factors (Neurog1, 2 and 3) and inhibitory HLH factors (Id1 and Hes1) was recently shown to coordinate the timing of differentiation of neurons and glia (51). However, the role of inhibitory HLH factors during development of the pancreas has received little attention. Unlike expression of Id2, expression of Id1 was observed to be highest in the adult islet, and there is evidence to suggest that this gene may play a role in promoting β-cell function (52). Recent in vitro studies demonstrated that Bmp4 promotes the heterodimerization of Id2 with Neurod1, with a resultant decrease in Pax6 expression (53). We observed significant expression of Bmp4 during development of endocrine precursors, with a peak in expression at E16.5. Our data support a model in which high expression of Bmp4 at E16.5 triggers the switch that inhibits continued transcriptional activation by bHLH factors (Neurog3, Neurod1) via regulation of inhibitory HLH factors (Id1, Id2). The result of this switch would be to block further differentiation of endocrine precursors and instead promote their expansion into mature islet cells.

In addition to the several factors with well-documented roles in the development of endocrine precursors, our analysis confirmed several of the more recently discovered and less well-documented potential targets of Neurog3. Insulinoma-associated 1 (Insm1 or Ia1) was found to be significantly down-regulated in the descendants of endocrine precursors (E16.5/E17.5). This transcriptional repressor was recently reported to have an essential role in β- and α-cell differentiation, acting downstream of Neurog3 and in parallel with Neurod1 (45). Moreover, as is shown in Figure 7, Neurog3 is believed to form a heterodimer with Tcf3 (E47) on the E-box3 of the Insm1 promoter, and to recruit the Creb-binding protein (Crebbp) (54). Expression of Sim1, which is involved in the differentiation of neuroendocrine cells of the hypothalamic-pituitary axis (55), was recently shown to be under the control of Neurog3 in a murine embryonic stem cell line in which Neurog3 was overexpressed (56). Our data provide evidence to suggest that this in vitro observation may be biologically relevant during development of the endocrine pancreas in vivo. Although profiling of Sim1 expression by qRT-PCR (Figure 4M) supports the observation that this gene is expressed in the developing endocrine pancreas, and at much greater levels than in Neurog3-negative cells, we observed a static expression profile across the time course. This may suggest, as does Figure 7, that Sim1 is not a direct target of Neurog3.

Use of pathway analysis to examine the relationship between our gene set and existing canonical signaling pathways strongly suggested that Igf1 and insulin receptor signaling pathways play a significant role in this process. Strikingly, Grb10 plays a critical role in both of these pathways, and along with Igfbp2 was observed to be one of the genes clustering most closely with Neurog3 (Figure 5B). Grb10 encodes a growth factor receptor-binding protein that interacts with insulin receptors and insulin-like growth-factor receptors, and is imprinted in a highly isoform- and tissue-specific manner. Grb10 was shown to form a complex with Nedd4, which was also highly significantly down-regulated over the time course (57; 58). The role of this protein-protein complex in the Igf1 signaling pathway is to regulate ubiquitination and stability of the Igf1 receptor (Igf1r). In response to Igf1, Grb10 associates with Igf1r and brings Nedd4 into the vicinity of Igf1r, leading to its ubiquitination, which in turn would result in the internalization and degradation of the receptor. Therefore, the high levels of Grb10 and Nedd4 expression we observed at E14.5 could represent a cellular mechanism used to prevent continuous and enhanced activation in response to Igf1, expression of which was seen to peak at E15.5 (58). Again, this represents a novel observation for the endocrine pancreas, suggesting a mechanism to temporally limit the rate of proliferation of endocrine precursors.

CONCLUSION

Through the use of the Neurog3-EGFP knock-in model, combined with careful cell sorting, daily sampling throughout the secondary transition and beyond, and the use of powerful statistical and analytical tools, we have been able to accurately capture the gene expression profile of the pancreatic endocrine progenitors and their descendants. Furthermore, through the use of the mouse PancChip array, we have been able to capture information on numerous genes known to be expressed in the pancreas but not found on commercially available arrays, many of which are novel and will lead to further investigation and discovery. The list of temporally regulated genes identified in fetal endocrine precursors and their immediate descendants provides a novel and important resource for developmental biologists and diabetes researchers alike. Furthermore, from our attempt to model the regulatory networks that control development of the endocrine pancreas, it is apparent that the complex interactions between these genes, and their requirements for carefully controlled spatial and temporal expression, go far beyond what has traditionally been presented in simplified models of transcriptional hierarchy.

The identification of so many transcription factors, which are typically expressed at relatively low levels and as such are often not detected in array analysis, lends significant credence to the quality of this dataset and the statistical tools employed to analyze it. A series of whole-genome association studies, in which over 380,000 single-nucleotide polymorphisms (SNPs) were analyzed in over 1,400 patients with type 2 diabetes, recently identified six novel loci strongly associated with the disease (59-61). The authors hypothesize that these six genes have a primary role in the β-cell. Our data provide strong evidence in support of this notion for four of the genes associated with these loci, as we found them to be differentially expressed in our time course: Slc30a8, Tcf7l2, Cdk5rap1 and Igf2bp2 (Figure 6). Moreover, Igf2bp2 was one of the genes whose expression profile clustered most closely to that of Neurog3 (Figure 5B). Very little is known about the function of these genes, but our observations of their expression profile in murine endocrine precursors, combined with the genetic findings in man, provide strong evidence for their playing a key role not only in development of the endocrine pancreas, but also in long-term islet function and the development of diabetes.

ACKNOWLEDGMENTS

We thank Dr. Joshua R. Friedman for critical reading of the manuscript; Dr. Marco Z. Vatamaniuk for preparation of adult islets; Ryan D. Wychowanec and Dr. Jonni S. Moore of the University of Pennsylvania Flow Cytometry & Cell Sorting Facility; and Drs. Elisabetta Manduchi, Joan Mazzarelli, Jonathan Schugg and Chris Stoeckert for their advice on computational analysis. This study was supported by the United States National Institute of Diabetes & Digestive & Kidney Diseases (NIDDK) of the National Institutes of Health (NIH) “Functional Genomics of the Beta Cell” grant (UO1-DK56947) and the University of Pennsylvania Institute of Diabetes, Obesity and Metabolism (IDOM) DERC grant (P30DK19525).

DATA ACCESS

Microarray data for this study have been deposited at ArrayExpress with the accession number ?-????-??. The data and MIAME-compliant annotation can also be queried through the RAD interface (www.cbil.upenn.edu/RAD). The final annotated gene lists, and numerous additional analysis tools, are available through the Betacell Biology Consortium Website (www.betacell.org) via the Endocrine Pancreas Consortium Database (EPConDB: www.cbil.upenn.edu/EPConDB).


REFERENCES

1. D'Amour KA, Bang AG, Eliazer S, Kelly OG, Agulnick AD, Smart NG, Moorman MA, Kroon E, Carpenter MK, Baetge EE: Production of pancreatic hormone-expressing endocrine cells from human embryonic stem cells. Nat Biotechnol 24:1392-1401, 2006
2. Bonner-Weir S, Weir GC: New sources of pancreatic beta-cells. Nat Biotechnol 23:857-861, 2005
3. Madsen OD: Stem cells and diabetes treatment. APMIS: Acta Pathol Microbiol Immunol Scand 113:858-875, 2005
4. Shapiro AM, Lakey JR, Ryan EA, Korbutt GS, Toth E, Warnock GL, Kneteman NM, Rajotte RV: Islet transplantation in seven patients with type 1 diabetes mellitus using a glucocorticoid-free immunosuppressive regimen. New Engl J Med 343:230-238, 2000
5. Edlund H: Pancreatic organogenesis--developmental mechanisms and implications for therapy. Nat Rev Genet 3:524-532, 2002
6. Habener JF, Kemp DM, Thomas MK: Minireview: transcriptional regulation in pancreatic development. Endocrinology 146:1025-1034, 2005
7. Lantz KA, Kaestner KH: Winged-helix transcription factors and pancreatic development. Clin Sci 108:195-204, 2005
8. Wilson ME, Scheel D, German MS: Gene expression cascades in pancreatic development. Mech Dev 120:65-80, 2003
9. Jensen J: Gene regulatory factors in pancreatic development. Dev Dyn 229:176-200, 2004
10. Gradwohl G, Dierich A, LeMeur M, Guillemot F: neurogenin3 is required for the development of the four endocrine cell lineages of the pancreas. Proc Natl Acad Sci USA 97:1607-1611, 2000
11. Heller RS, Jenny M, Collombat P, Mansouri A, Tomasetto C, Madsen OD, Mellitzer G, Gradwohl G, Serup P: Genetic determinants of pancreatic epsilon-cell development. Dev Biol 286:217-224, 2005
12. Lee CS, Perreault N, Brestelli JE, Kaestner KH: Neurogenin 3 is essential for the proper specification of gastric enteroendocrine cells and the maintenance of gastric epithelial cell identity. Genes Dev 16:1488-1497, 2002
13. Jenny M, Uhl C, Roche C, Duluc I, Guillermin V, Guillemot F, Jensen J, Kedinger M, Gradwohl G: Neurogenin3 is differentially required for endocrine cell fate specification in the intestinal and gastric epithelium. EMBO J 21:6338-6347, 2002
14. Naya FJ, Huang HP, Qiu Y, Mutoh H, DeMayo FJ, Leiter AB, Tsai MJ: Diabetes, defective pancreatic morphogenesis, and abnormal enteroendocrine differentiation in BETA2/neuroD-deficient mice. Genes Dev 11:2323-2334, 1997
15. Sosa-Pineda B, Chowdhury K, Torres M, Oliver G, Gruss P: The Pax4 gene is essential for differentiation of insulin-producing beta cells in the mammalian pancreas. Nature 386:399-402, 1997
16. Schwitzgebel VM, Scheel DW, Conners JR, Kalamaras J, Lee JE, Anderson DJ, Sussel L, Johnson JD, German MS: Expression of neurogenin3 reveals an islet cell precursor population in the pancreas. Development 127:3533-3542, 2000
17. Heremans Y, Van De Casteele M, in't Veld P, Gradwohl G, Serup P, Madsen O, Pipeleers D, Heimberg H: Recapitulation of embryonic neuroendocrine differentiation in adult human pancreatic duct cells expressing neurogenin 3. J Cell Biol 159:303-312, 2002


18. Harb G, Heremans Y, Heimberg H, Korbutt GS: Ectopic expression of neurogenin 3 in neonatal pig pancreatic precursor cells induces (trans)differentiation to functional alpha cells. Diabetologia 49:1855-1863, 2006
19. Gu G, Brown JR, Melton DA: Direct lineage tracing reveals the ontogeny of pancreatic cell fates during mouse embryogenesis. Mech Dev 120:35-43, 2003
20. Apelqvist A, Li H, Sommer L, Beatus P, Anderson DJ, Honjo T, Hrabe de Angelis M, Lendahl U, Edlund H: Notch signalling controls pancreatic cell differentiation. Nature 400:877-881, 1999
21. Jensen J, Heller RS, Funder-Nielsen T, Pedersen EE, Lindsell C, Weinmaster G, Madsen OD, Serup P: Independent development of pancreatic alpha- and beta-cells from neurogenin3-expressing precursors: a role for the notch pathway in repression of premature differentiation. Diabetes 49:163-176, 2000
22. Jacquemin P, Durviaux SM, Jensen J, Godfraind C, Gradwohl G, Guillemot F, Madsen OD, Carmeliet P, Dewerchin M, Collen D, et al.: Transcription factor hepatocyte nuclear factor 6 regulates pancreatic endocrine cell differentiation and controls expression of the proendocrine gene ngn3. Mol Cell Biol 20:4445-4454, 2000
23. Lee JC, Smith SB, Watada H, Lin J, Scheel D, Wang J, Mirmira RG, German MS: Regulation of the pancreatic pro-endocrine gene neurogenin3. Diabetes 50:928-936, 2001
24. Smith SB, Watada H, German MS: Neurogenin3 activates the islet differentiation program while repressing its own expression. Mol Endocrinol 18:142-149, 2004
25. Smith SB, Gasa R, Watada H, Wang J, Griffen SC, German MS: Neurogenin3 and hepatic nuclear factor 1 cooperate in activating pancreatic expression of Pax4. J Biol Chem 278:38254-38259, 2003
26. Watada H, Scheel DW, Leung J, German MS: Distinct gene expression programs function in progenitor and mature islet cells. J Biol Chem 278:17130-17140, 2003
27. Huang HP, Liu M, El-Hodiri HM, Chu K, Jamrich M, Tsai MJ: Regulation of the pancreatic islet-specific gene BETA2 (neuroD) by neurogenin 3. Mol Cell Biol 20:3292-3307, 2000
28. Vatamaniuk MZ, Gupta RK, Lantz KA, Doliba NM, Matschinsky FM, Kaestner KH: Foxa1-deficient mice exhibit impaired insulin secretion due to uncoupled oxidative phosphorylation. Diabetes 55:2730-2736, 2006
29. Kaestner KH, Lee CS, Scearce LM, Brestelli JE, Arsenlis A, Le PP, Lantz KA, Crabtree J, Pizarro A, Mazzarelli J, et al.: Transcriptional program of the endocrine pancreas in mice and humans. Diabetes 52:1604-1610, 2003
30. Scearce LM, Brestelli JE, McWeeney SK, Lee CS, Mazzarelli J, Pinney DF, Pizarro A, Stoeckert CJ, Jr., Clifton SW, Permutt MA, et al.: Functional genomics of the endocrine pancreas: the pancreas clone set and PancChip, new resources for diabetes research. Diabetes 51:1997-2004, 2002
31. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al.: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80, 2004
32. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F: Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol 3:RESEARCH0034.0031-0034.0011, 2002
33. Leek JT, Monsen E, Dabney AR, Storey JD: EDGE: extraction and analysis of differential gene expression. Bioinformatics 22:507-508, 2006


34. Storey JD, Dai JY, Leek JT: The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics, 2006
35. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98:5116-5121, 2001
36. Grant GR, Liu J, Stoeckert CJ, Jr.: A practical false discovery rate approach to identifying patterns of differential expression in microarray data. Bioinformatics 21:2684-2690, 2005
37. Dennis G, Jr., Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4:P3, 2003
38. White P, Brestelli JE, Kaestner KH, Greenbaum LE: Identification of transcriptional networks during liver regeneration. J Biol Chem 280:3715-3722, 2005
39. Prado CL, Pugh-Bernard AE, Elghazi L, Sosa-Pineda B, Sussel L: Ghrelin cells replace insulin-producing beta cells in two mouse models of pancreas development. Proc Natl Acad Sci USA 101:2924-2929, 2004
40. Johansson KA, Dursun U, Jordan N, Gu G, Beermann F, Gradwohl G, Grapin-Botton A: Temporal control of neurogenin3 activity in pancreas progenitors reveals competence windows for the generation of different endocrine cell types. Dev Cell 12:457-465, 2007
41. Song HH, Shi W, Xiang YY, Filmus J: The loss of glypican-3 induces alterations in Wnt signaling. J Biol Chem 280:2116-2125, 2005
42. Grozdanov PN, Yovchev MI, Dabeva MD: The oncofetal protein glypican-3 is a novel marker of hepatic progenitor/oval cells. Lab Invest 86:1272-1284, 2006
43. Gu G, Wells JM, Dombkowski D, Preffer F, Aronow B, Melton DA: Global expression analysis of gene regulatory pathways during endocrine pancreatic development. Development 131:165-179, 2004
44. Gasa R, Mrejen C, Leachman N, Otten M, Barnes M, Wang J, Chakrabarti S, Mirmira R, German M: Proendocrine genes coordinate the pancreatic islet differentiation program in vitro. Proc Natl Acad Sci USA 101:13245-13250, 2004
45. Mellitzer G, Bonne S, Luco RF, Van De Casteele M, Lenne-Samuel N, Collombat P, Mansouri A, Lee J, Lan M, Pipeleers D, et al.: IA1 is NGN3-dependent and essential for differentiation of the endocrine pancreas. EMBO J 25:1344-1352, 2006
46. Dor Y, Brown J, Martinez OI, Melton DA: Adult pancreatic beta-cells are formed by self-duplication rather than stem-cell differentiation. Nature 429:41-46, 2004
47. Petri A, Ahnfelt-Ronne J, Frederiksen KS, Edwards DG, Madsen D, Serup P, Fleckner J, Heller RS: The effect of neurogenin3 deficiency on pancreatic gene expression in embryonic mice. J Mol Endocrinol 37:301-316, 2006
48. Herrera PL: Adult insulin- and glucagon-producing cells differentiate from two independent cell lineages. Development 127:2317-2322, 2000
49. Zhou Q, Law AC, Rajagopal J, Anderson WJ, Gray PA, Melton DA: A multipotent progenitor domain guides pancreatic organogenesis. Dev Cell 13:103-114, 2007
50. Artavanis-Tsakonas S, Rand MD, Lake RJ: Notch signaling: cell fate control and signal integration in development. Science 284:770-776, 1999
51. Sugimori M, Nagao M, Bertrand N, Parras CM, Guillemot F, Nakafuku M: Combinatorial actions of patterning and HLH transcription factors in the spatiotemporal control of neurogenesis and gliogenesis in the developing spinal cord. Development 134:1617-1629, 2007
52. Wice BM, Bernal-Mizrachi E, Permutt MA: Glucose and other insulin secretagogues induce, rather than inhibit, expression of Id-1 and Id-3 in pancreatic islet beta cells. Diabetologia 44:453-463, 2001


53. Hua H, Zhang YQ, Dabernat S, Kritzik M, Dietz D, Sterling L, Sarvetnick N: BMP4 regulates pancreatic progenitor cell expansion through Id2. J Biol Chem 281:13574-13580, 2006
54. Breslin MB, Wang HW, Pierce A, Aucoin R, Lan MS: Neurogenin 3 recruits CBP co-activator to facilitate histone H3/H4 acetylation in the target gene INSM1. FEBS Lett 581:949-954, 2007
55. Michaud JL, Rosenquist T, May NR, Fan CM: Development of neuroendocrine lineages requires the bHLH-PAS transcription factor SIM1. Genes Dev 12:3264-3275, 1998
56. Treff NR, Vincent RK, Budde ML, Browning VL, Magliocca JF, Kapur V, Odorico JS: Differentiation of embryonic stem cells conditionally expressing neurogenin 3. Stem Cells 24:2529-2537, 2006
57. Murdaca J, Treins C, Monthouel-Kartmann MN, Pontier-Bres R, Kumar S, Van Obberghen E, Giorgetti-Peraldi S: Grb10 prevents Nedd4-mediated vascular endothelial growth factor receptor-2 degradation. J Biol Chem 279:26754-26761, 2004
58. Vecchione A, Marchese A, Henry P, Rotin D, Morrione A: The Grb10/Nedd4 complex regulates ligand-induced ubiquitination and stability of the insulin-like growth factor I receptor. Mol Cell Biol 23:3363-3372, 2003
59. Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H, Roix JJ, Kathiresan S, Hirschhorn JN, Daly MJ, et al.: Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316:1331-1336, 2007
60. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, et al.: A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316:1341-1345, 2007
61. Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, et al.: Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316:1336-1341, 2007


TABLE 1. Temporally regulated transcriptional genes in the development of the endocrine pancreas. Of the 1,029 temporally regulated genes, 246 transcription factors, transcriptional regulators and potential transcriptional regulators were found to be differentially expressed during development of the endocrine precursors.

Developmental Stage

Transcriptional Regulators (transcription factors, transcriptional regulators, potential regulators of transcription)

E13.5

Zfp445, Asb4, Nkx2-2, Etv5, Hnf4g, Sox11, Gata4, Yy1, Cutl1, C230097I24Rik, Zfp207, E2f5, Irx3, Nr2f6, Id2, Tcf12, Rbm14, Prdm4, Cnot7, Zfp143, Inppl1, Sfpq, Gmcl1, Basp1, Nono, Pspc1, Hmga2, Csde1, Rbl1, Mcm4, Abcb4, Stau1, Hnrpm, Paip1, Hmgn1, Dek, Kin, Nudt21, Mrpl2, Rbm12, Rps3, Pcbp2, Hmgn2, Hnrpa1, Rpl39, H2afy2, H2afz, Igf2bp2, Zcchc3, Prdm15, Nap1l4, Zcchc11, Ncoa6ip, Xab1, Mtpn, Bud31

E14.5

Klf5, Neurog3, Tcf7l2, Mycn, Gabpb1, Mta2, Runx2, Tcfec, Ahr, Hmgb2, Psmc5, Jarid2, Mxi1, Eaf1, Smarce1, Litaf, Ezh2, Sfrs6, Baz1a, Morf4l1, Dpf2, Hmgb3, Neo1, Ing4, Rab1, Polr2f, Khsrp, Tsnax, Npm1, Arid1a, Rpl37, Rpa2, Rps20, Rps24, Lsm2, Snrpd1, Tarbp2, Rpl9, Zfp521, Cbx5, Slbp, 2610209M04Rik, Rps18, 2610101N10Rik, Cstf2, Hnrpa2b1, Rbmxrt, Rps11, Trove2, Rps7, Aplp2, Rps6, Rps14, Sfrs2, Rps9, Rps23, Cbx1, Snrpe, Rps4x, Rpl28, Prpf40a, Parp1, Uba52, Eid1

E15.5

Isl1, Foxa2, Foxa1, Btf3, Mrg1, Pgr, Fubp1, Aebp2, Pqbp1, Neurod1, Hdac1, Eya2, Zfp710, 1810035L17Rik, Rpo1-1, Ciz1, Mrpl1, Msh3, Hnrpr, Rbm7, Rbm8a, H3f3a, Sf3a3, Srp14, Rps29, U2af2, Rap1b, Top2b

E16.5

Mycl1, Tbx19, Zhx2, Sox7, Zfp263, Preb, Tshz1, Satb1, Bcor, Stat1, Mafb, Vps72, Onecut1, Sec14l2, Mlxipl, Grlf1, Rfx1, Ncor1, Med12, Brca1, Rbpsuh, Hbp1, Mybl2, Hes2, Zfp652, Zfp9, A430033K04Rik, Ddx5, Zcchc9, Hist2h2aa1, Nufip2, Zc3h7a, Tiparp, Sfrs7, Zbtb20, Isg20, Ptma

E17.5

Fosl2, Cebpb, Gata3, Egr1, Smarca4, Zfp579, 1300003B13Rik, Strbp, Thoc1, H2afv

Islets

Rere, Foxd3, Pitx1, Sox12, Ankib1, Pax4, Nfya, Trps1, Sox15, Hey1, Nfat5, Mitf, Mafa, Creb3l2, Fos, Onecut2, Myc, Hod, Nr3c1, Rora, Nr4a3, Nr4a1, Nr1d2, Klf9, Id1, Pcaf, Baz1b, Mll3, Zfp715, Sf1, Med28, Ctnnd1, Pde8b, Zfp36l2, Ankrd57, Arid5a, Bicc1, Hist1h3f, Prpf19, Rod1, Prkra, Mbnl1, Atxn2, Rbpms, Nucb2, Snrpb2, Cpeb1, Ddx50, G3bp2, Hexim1, Ncoa4


FIGURE LEGENDS Figure 1. Development of the endocrine pancreas. Heterozygous male mice containing an EGFP-marked null allele of Neurog3 (Neurog3+/EGFP) were mated with CD1 female mice. Pregnant females were sacrificed at either 13.5, 14.5, 15.5, 16.5, or 17.5 days of gestation. The photographs on the left represent bright field images of WT pancreata (left) and Neurog3+/EGFP (right). The panel on the right shows pancreata from Neurog3+/EGFP embryos identified by their green fluorescence. The Neurog3+/EGFP cells are located near the center of the organ. Figure 2. The proportion of endocrine precursors in the pancreas is highest at E14.5 and descendants of Neurog3 expressing cells can be marked by the persistence of EGFP. Neurog3+/EGFP pancreata were disassociated into a single cell suspension after trypsinization (see Experimental Procedures). The number of biological replicates (individual sorts) used is shown on the x-axis under the developmental stage, and the number below this represents the total number of embryos that were employed at that stage. Neurog3+/EGFP positive and negative cells were FACS-sorted and used for RNA extraction. (A). The percentage of EGFP positive cells was determined in reference to the total number of cells sorted. EGFP expression began at E13.5 and peaked the following day. EGFP expression declined rapidly thereafter, but positive cells could still easily be sorted until E17.5. (B). The total number of EGFP positive cells per pancreas was determined. This figure clearly demonstrates the ability of EGFP to mark descendants of Neurog3+ cells, with the long half-life of EGFP resulting in an accumulation of positive cells. At E18.5, four days following the peak in Neurog3 expression, a decline in the number of marked cells was apparent. Figure 3. Expression profiles of the five most significantly temporally regulated genes at each of the six developmental time points studied. 
A total of 1,029 genes were found to be temporally regulated during development of endocrine precursors. Genes were observed to peak in expression at each of the six time points investigated. The top five genes at each time point, ranked by their EDGE p-value, are depicted. Plots are headed with official gene symbol: EDGE rank (p-value). The individual data points for each sample are represented by circles and, from left to right, correspond to E13.5, E14.5, E15.5, E16.5, E17.5 (black) and adult islets (gray). The natural cubic spline used by EDGE to fit the data is shown by a dashed line. Expression in adult islets is shown for reference. The intent of this figure is to highlight some of the most significant expression profiles; the entire set of temporally expressed genes can be found in the Supplementary Data Expression Profiles file.

Figure 4. Patterns of temporal regulation are consistently confirmed using quantitative real-time PCR (qRT-PCR). cDNA was prepared from each replicate used for the analysis (gray bars) and from E14.5 and E16.5 EGFP-negative cells (white bars) that were collected along with the EGFP-positive cells. For the 18 genes tested, the results of this qRT-PCR analysis closely mirrored the results of the array analysis. For the majority of genes tested, little or no transcript was present in EGFP-negative cells.

Figure 5. Hierarchical clustering analysis identifies three principal components within the time-course and highlights several novel genes that were observed to cluster closely with Neurog3. Hierarchical clustering was performed using the subset of 1,029 genes identified to be temporally regulated using EDGE. Clustering was performed using the average linkage clustering algorithm with the Pearson correlation as the measure of similarity (A). Biological replicates for each time point were observed to cluster together. Clustering of the genes revealed that many genes were expressed in similar patterns during the time course. The cluster containing Neurog3 identified 21 genes whose expression profile was strikingly similar to that of Neurog3 (B). Hierarchical clustering using the subset of 64 genes with transcription factor activity identified several transcription factors whose expression profile closely resembled that of Neurog3 (C). Three principal components were identified that accounted for 95% of the significant variability in the data. Accordingly, K-means clustering was performed using the Pearson correlation to cluster the genes into three groups with a high degree of similarity within each cluster and a low degree of similarity between clusters (D). Over half of the genes identified were observed to cluster with Neurog3, with a pattern of decreasing expression over time.

Figure 6. Temporal expression profiles of transcription factors during development of the endocrine pancreas. The individual data points for each sample are represented by circles and, from left to right, correspond to E13.5, E14.5, E15.5, E16.5, E17.5 (black) and adult islets (gray). Plots are headed with official gene symbol: EDGE rank (p-value). The natural cubic spline used by EDGE to fit the data and determine significance is shown by a dashed line. Expression in adult islets is shown for reference. (A). Transcription factors known and speculated to play an important role during development of the endocrine pancreas. (B). Expression profiles of immature and mature hormones of the endocrine islet, demonstrating temporal regulation of Ins1, Ghrl and Ppy, but not Gcg. (C).
Of the six diabetes-associated loci recently identified in a series of whole-genome association studies (59-61), four of the genes were observed to be differentially expressed in our time-course: Slc30a8, Tcf7l2, Cdk5rap1 and Igf2bp2.

Figure 7. Pathway analysis identifies a complex regulatory network controlling endocrine system development and function in the pancreas. The network is displayed graphically as nodes (genes/gene products) and edges (the biological relationships between the nodes). Nodes were colored to highlight those genes whose differential expression was significant in either the temporal regulation analysis or the static analysis. Color intensity indicates significance, from the most significant (dark red) to the least significant (light pink) genes. Nodes are displayed using various shapes that represent the functional class of the gene product. Edges are displayed with various labels that describe the nature of the relationship between the nodes. The network shown was produced by expanding the most significant network (34 "focus" genes and a score of 60) with additional information from PubMed. To simplify the image, many of the interactions not demonstrated in pancreatic tissue were removed.
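
The Figure 5 legend describes average linkage clustering with the Pearson correlation as the similarity measure. The following is a minimal pure-Python sketch of that general technique, not the software or settings used in the study. The function names (`pearson`, `average_linkage`), the gene labels and the expression values are hypothetical and chosen only for illustration; the distance is assumed to be 1 minus the Pearson correlation, a common convention that the legend does not specify.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until
    n_clusters remain.

    Distance between two profiles is 1 - Pearson r (an assumption);
    distance between two clusters is the mean of all pairwise profile
    distances between their members (average linkage).
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def cluster_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # combinations() yields index pairs with i < j, so deleting j is safe
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical expression values at five time points (illustration only)
toy_profiles = {
    "gene_a": [9.0, 10.0, 8.0, 6.0, 4.0],  # peaks early, then declines
    "gene_b": [4.5, 5.0, 4.1, 3.0, 2.2],   # same shape at a lower level
    "gene_c": [1.0, 2.0, 4.0, 7.0, 9.0],   # rises over time
    "gene_d": [2.0, 2.5, 5.0, 8.0, 12.0],  # rises over time
}
groups = average_linkage(toy_profiles, n_clusters=2)
```

Because the Pearson correlation compares the shape of a profile rather than its absolute level, `gene_a` and `gene_b` group together in this toy example despite differing roughly two-fold in magnitude, which is why correlation-based measures are commonly used to group genes with similar temporal patterns.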


FIGURE 1


FIGURE 2


FIGURE 3


FIGURE 4


FIGURE 5


FIGURE 6


FIGURE 7
