Medaka genomics: a bridge between mutant phenotype and gene function


Review

Kiyoshi Naruse a,*, Hiroshi Hori b, Nobuyoshi Shimizu c, Yuji Kohara d, Hiroyuki Takeda a,*,1

a Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
b Division of Biological Science, Graduate School of Science, Nagoya University, Nagoya 464-8602, Japan
c Department of Molecular Biology, Keio University School of Medicine, Shinjuku-ku, Tokyo 160-8582, Japan
d Genome Biology Laboratory, Center for Genetic Resource Information, National Institute of Genetics, Mishima 411-8540, Japan

Received 6 January 2004; received in revised form 5 March 2004; accepted 14 May 2004

Abstract

Recent advances in medaka genetics have proven that the medakafish is an excellent model system for developmental and evolutionary biology studies and that it can complement similar studies in zebrafish. Large-scale mutagenesis projects are now being conducted by several groups in Japan and are delivering a vastly expanded pool of medaka mutant stocks. This growing availability of genomic resources will greatly accelerate progress in moving from mutant phenotypes to the elucidation of gene function. This phenotype-driven approach can be expected to lead to the identification and characterization of novel genes and pathways in vertebrate genomes. This review discusses the current state of medaka genomic resources, the state of medaka gene mapping and medaka genome sequencing projects.

© 2004 Elsevier Ireland Ltd. All rights reserved.

Keywords: Medaka; Genetic map; Genome sequencing; SNP mapping

1. Introduction

The medaka, Oryzias latipes (order Beloniformes), is a small, egg-laying freshwater teleost fish found in brooks and rice paddies in eastern Asia, primarily in Japan (Shima and Mitani, 2004). As a model for vertebrate genomics, the medaka has two principal advantages over the more commonly used zebrafish system. The first is that the medaka genome is smaller (800–1000 Mb) (Uwa and Iwata, 1981; Lamatsch et al., 2000), being about half the size of the zebrafish genome and one-third that of the human genome. The second is that highly polymorphic inbred medaka strains are available that can be used for both mutagenesis screening and genetic mapping (Wittbrodt et al., 2002).

Bony fish have undergone significant genome-wide gene duplication during their evolution, and both zebrafish and medaka have been shown to have seven Hox clusters (Amores et al., 1998; Wittbrodt et al., 1998; Naruse et al., 2000; Taylor et al., 2003) whereas only four have been found in mouse. Generally, duplicated genes are subject to non-functionalization, neo-functionalization and sub-functionalization (Force et al., 1999), resulting in greater genetic diversity within fish species. Considering the long evolutionary distance between them (diverging 110–160 million years ago; Hedges and Kumar, 2002; Wittbrodt et al., 2002), one would expect medaka and zebrafish to have different repertoires of gene sets that would result in a different spectrum of mutant phenotypes (Loosli et al., 2000; Ishikawa, 2000).

Ongoing large-scale ENU mutagenesis of the medaka genome is providing a rapid and massive expansion of available medaka mutant resources (Furutani-Seiki et al., 2004). The ultimate goal of large-scale medaka mutagenesis projects is the identification of novel genes and pathways, thereby obtaining new insights into gene function in vertebrates. This can be achieved through the rapid progression from mutant phenotypes to an understanding of specific gene functions using medaka genomics. Medaka genomics is also providing new insights into vertebrate genome evolution by comparative analyses with the substantial genomic information that now exists for other vertebrates such as human, mouse, Fugu and zebrafish. To accelerate this progress, a medaka whole-genome shotgun sequencing project began in late 2002 at the National Institute of Genetics (NIG) in Mishima, Japan. This review discusses the current state of genetic mapping analyses, the progress in genome sequencing and the isolation of ESTs in addition to the other available genomic resources for the medaka.

* Corresponding author. Tel.: +81-3-5841-4443.
1 Tel.: +81-3-5841-4431.
E-mail addresses: [email protected] (K. Naruse), htakeda@biol.s.u-tokyo.ac.jp (H. Takeda).
0925-4773/$ - see front matter © 2004 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.mod.2004.04.014
Mechanisms of Development 121 (2004) 619–628
www.elsevier.com/locate/modo

2. The origins of inbred medaka strains and genetic differences between wild medaka populations

In the medaka, at least 15 inbred strains have so far been described (see Table 1) (Hyodo-Taguchi, 1996; Shimada and Shima, 1988; Loosli et al., 2001), and all except Cab, AA2 and Kaga were established by Hyodo-Taguchi at the National Institute of Radiological Sciences (NIRS) (Hyodo-Taguchi, 1996). Inbred lines were derived from three different wild populations of medaka: southern Japanese (HO4C, HO5, HB32C, HB32D, HB12A, HB11A, HB11C, Hd-rR, Hd-rr, Cab and AA2), northern Japanese (HNI-I, HNI-II and Kaga) and east-Korean (HSOK) populations. The origins of some of these strains are well known; for example, the Hd-rR (a target of the genome sequencing project, see below) and Hd-rr lines were derived from a closed colony established by the late Toki-o Yamamoto at Nagoya University for the purposes of experiments on sex reversal by oestrogen and androgen (Yamamoto, 1953; Yamamoto, 1958). The Cab strain established by Wittbrodt’s group (Winkler et al., 2000) was originally obtained from a commercial strain (of southern Japanese origin) available from Carolina Biological Supply (http://www.carolina.com/). The AA2 strain, which has three recessive pigmentation phenotypes, was established by Shimada and Shima (1988). The HNI and Kaga strains, however, originated from northern medaka populations; the HNI-I and HNI-II strains originated from a wild population in Niigata City, Niigata prefecture, whereas the Kaga strain was established from Kaga City, Ishikawa prefecture. From a genetic standpoint, the three wild medaka populations have a relatively similar level of genetic divergence from each other, with the southern and northern populations grouped together as a sister group (Sakaizumi, 1984; Takehana et al., 2003). Although inbred strains from either the southern or northern populations are commonly used, the Korean HSOK strain will be very useful because of its genetic differences from these two main populations (Sakaizumi and Joen, 1987).

It is known that the northern and southern Japanese medaka populations differ from each other in many morphological, behavioural and genetic characteristics. In spite of these differences, intercrosses breed normally, producing hybrid offspring. Sequence comparisons of orthologous loci reveal single nucleotide polymorphisms (SNPs) between the two populations at a frequency of more than 1% within exons and 3% in introns, in addition to many insertions and deletions (Ohtsuka et al., 1999; Naruse et al., 2000). Considering that a 1–2% difference exists between the human and great ape genomes (Fujiyama et al., 2002), the SNP frequency between the two medaka populations seems quite high. Hence, the two populations are extremely useful for genetic mapping of genes and mutations, which can be readily induced in medaka inbred lines. Information on inbred lines and spontaneous mutants is available at http://biol1.bio.nagoya-u.ac.jp:8000/.

Table 1
Medaka inbred strains and their features

Strain  Genetic background     Origin                                             Special features                        Reference
HO4C    Southern population    Commercial strain in Japan                         Orange-red color (b/b)                  Hyodo-Taguchi, 1996
HO5     Southern population    Commercial strain in Japan                         Orange-red color (b/b)                  Hyodo-Taguchi, 1996
HB32C   Southern population    Wild population in Chiba                           Wild type                               Hyodo-Taguchi, 1996
HB32D   Southern population    Wild population in Chiba                           Wild type                               Hyodo-Taguchi, 1996
HB12A   Southern population    Wild population in Chiba                           Wild type                               Hyodo-Taguchi, 1996
HB11A   Southern population    Wild population in Chiba                           Wild type                               Hyodo-Taguchi, 1996
HB11C   Southern population    Wild population in Chiba                           Wild type                               Hyodo-Taguchi, 1996
Hd-rR   Southern population    Stock at Nagoya Univ.                              Orange-red in male and white in female  Hyodo-Taguchi, 1996
Hd-rr   Southern population    Stock at Nagoya Univ.                              White in both male and female           Hyodo-Taguchi, 1996
Cab     Southern population    Commercial strain from Carolina Biological Supply  Variegated body color                   Loosli et al., 2001
AA2     Southern population    Mutant stocks at Nagoya Univ.                      b/b, gu/gu and lf/lf genotype           Shimada and Shima, 1988
HNI-I   Northern population    Wild population in Niigata                         Wild type                               Hyodo-Taguchi, 1996
HNI-II  Northern population    Wild population in Niigata                         Wild type                               Hyodo-Taguchi, 1996
Kaga    Northern population    Wild population in Kaga                            Wild type                               Loosli et al., 2001
HSOK    East-Korea population  Wild population in Sokcho, Korea                   Wild type                               Hyodo-Taguchi, 1996

3. Genetic mapping

High density genetic linkage maps are essential tools for the identification of genes responsible for mutant phenotypes and also for comparative and evolutionary genomics. The markers that are commonly used for linkage map analysis include phenotypic traits, expressed sequence tags (ESTs), randomly amplified polymorphic DNA markers (RAPDs), amplified fragment length polymorphism markers (AFLPs) and microsatellite markers. A medaka linkage map was first described by Aida (1921), who demonstrated that the male-determining factor (Y locus) was linked to the gene controlling carotenoid deposition in xanthophores (R locus). More recently, a detailed, genome-wide linkage map of the medaka was constructed using 633 markers (488 AFLPs, 28 RAPDs, 34 IRSs, 75 ESTs, four STSs and four phenotypic traits) (Naruse et al., 2000). This map utilized the high degree of polymorphism between two inbred strains, HNI and AA2, derived from the northern and southern Japanese populations, respectively, and was constructed with a reference typing DNA panel from 39 cell lines derived from back-cross progeny of male meioses. The number of linkage groups (LGs) was 24, which is equivalent to the haploid chromosome number of the medaka.

Most of the markers used for mapping were ‘DNA fingerprinting’ polymorphisms and were used without other genomic information, thus allowing a genome-wide genetic map to be established rapidly. However, many of these markers were found to be strain-specific and therefore difficult to apply to other typing panels. For position-based cloning, mutants have their own genetic background and thus ‘portable single-locus type’ markers are needed that represent a unique region of the medaka genome. To develop such markers, ESTs from medaka cDNA libraries were isolated from various sources and the resulting sequence data were used to design PCR primer sets (Naruse et al., 2004; Kimura et al., 2004). Due to the considerable genetic divergence between the southern and northern medaka populations, 80% of the amplified fragments that appeared as a single band from the two populations showed fragment length polymorphisms either directly or following digestion with eight commonly used restriction enzymes. The current medaka genetic map has accumulated 1762 single-locus type DNA markers including 1722 genes and ESTs, though the resolution of the map is around 2 cM with the 39-meiosis panel. Theoretically, using this number of markers, one can map a mutated locus at a resolution of about 500 kb, assuming the genome size of medaka is 800 Mbp. For position-based cloning of novel mutants, a DNA marker within 1 cM of a mutant locus is usually obtained if over 100 meioses are analyzed.
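The resolution figures in this paragraph follow from simple arithmetic, sketched below. The function names are illustrative and not from the source; the calculation assumes evenly spaced markers and treats one recombinant among n meioses as the smallest distance a panel can register. Both are simplifying assumptions.

```python
def mean_marker_spacing_kb(genome_size_mb: float, n_markers: int) -> float:
    """Average physical distance between adjacent markers in kb (assumes even spacing)."""
    return genome_size_mb * 1000.0 / n_markers


def panel_resolution_cm(n_meioses: int) -> float:
    """Smallest non-zero genetic distance a panel can register:
    a single recombinant among n informative meioses, in cM."""
    return 100.0 / n_meioses


# Values quoted in the text: 800 Mbp genome, 1762 single-locus markers.
spacing = mean_marker_spacing_kb(800, 1762)
print(f"mean marker spacing: {spacing:.0f} kb")
print(f"39-meiosis panel:    {panel_resolution_cm(39):.2f} cM")
print(f"100-meiosis panel:   {panel_resolution_cm(100):.2f} cM")
```

Under these assumptions the mean spacing comes out somewhat below the rounded 500 kb quoted in the text, and the 39-meiosis panel resolves distances of the same order as the quoted 2 cM.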

Although the recombination frequency around the mutant locus is one of the most important factors that affect the success of positional cloning, the relationship between physical length and genetic map length (in cM) has not yet been precisely determined for medaka. This is because there are only a few reported cases of successful positional cloning of mutated medaka genes. One such example is the sex-determining region of the medaka Y chromosome (Matsuda et al., 2002) that harbours the transcription factor DMY, which contains a highly conserved DM domain and plays a critical role in testis development (Matsuda et al., 2002; Nanda et al., 2002). Matsuda et al. (2002) reported that the map distance between markers 135D12.F and 51H7.F is 0.98 cM, which corresponds to about 500 kb (510 kb/cM). Another reported example is the medaka B locus, which was positionally cloned by Fukamachi et al. (2001) and encodes a novel transporter protein, AIM1, that affects melanin formation. According to the mapping data, two STS markers, C27F and C27R, located at either end of a 36.3 kbp cosmid insert, are separated by 0.55 cM (66 kb/cM) in female meiosis. This region therefore shows an extremely high recombination frequency, as the average is estimated at 470 kb/cM, based on cumulative map lengths in female meiosis. These findings indicate that if the mutant locus of interest is mapped to a region in which many markers are clustered, or to a region with a low recombination frequency, position-based cloning may have a poor outcome.
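The physical-to-genetic ratios quoted above can be recomputed directly. The sketch below uses only the figures given in the text; the helper name is illustrative, and the genome-wide map length it derives assumes an 800 Mbp genome.

```python
def kb_per_cm(physical_kb: float, genetic_cm: float) -> float:
    """Physical distance spanned by one centimorgan in a given interval."""
    return physical_kb / genetic_cm


y_region = kb_per_cm(500, 0.98)    # sex-determining region, ~500 kb over 0.98 cM
b_locus = kb_per_cm(36.3, 0.55)    # cosmid insert at the B locus
genome_average = 470.0             # kb/cM, female meiosis, as quoted in the text

print(f"Y region: {y_region:.0f} kb/cM")
print(f"B locus:  {b_locus:.0f} kb/cM")
print(f"B locus recombines about {genome_average / b_locus:.1f}x more often than average")

# Female map length implied by the average ratio, assuming an 800 Mbp genome.
implied_map_cm = 800_000 / genome_average
print(f"implied female map length: {implied_map_cm:.0f} cM")
```

A small kb/cM value means many crossovers per unit of DNA, which is favourable for positional cloning because fewer meioses are needed to separate the mutation from flanking markers.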

Table 2 summarizes statistical information for the current map of medaka LGs. If the medaka genome is 800 Mbp, the estimated physical length of each LG would vary from 59 to 19 Mbp; this estimation is based on the distribution of anonymous DNA markers in each LG. The cumulative map distances in each LG range from 104 to 26 cM in male meiosis. The largest LG, based on the estimated physical length, is LG1 (59 Mbp) whereas the longest LG based on recombination mapping is LG4 (104 cM). As described above, the physical lengths of the LGs may not simply reflect map distances. The distribution of mapped markers is not uniform; 16 out of 24 LGs have large clusters of markers and over 30% of the markers within each LG mapped to the same positions in a male meiotic panel. These results suggest that recombination events in specific regions within these LGs are restricted during male meiosis. We are now examining whether this phenomenon can also be observed in female meiosis (see http://medaka.dsp.jst.go.jp/MGI/LG22/). The distribution of both anonymous DNA markers and ESTs reveals differences in gene density for each LG; the gene density of LG2 is 3.2 times lower and that of LG22 is 1.7 times higher than the average. The medaka genetic map provides reliable anchor points for positional cloning, and a portion of these mapping data is available at http://mbase.bioweb.ne.jp/~dclust/medaka_top.html.

Table 2
Distribution of mapped markers in each medaka linkage group

Linkage  Mapped ESTs  Anonymous DNA  Ratio b  Estimated physical  Longest segment
group    and genes    markers a               length (Mb)         (cM)
1        33           41             0.80     59                  53.3
2        21           40             0.53     58                  43.6
3        32           38             0.84     55                  45.6
4        28           37             0.76     53                  104.4
5        51           29             1.76     42                  70.6
6        26           25             1.04     36                  53.5
7        49           24             2.04     35                  45.6
8        44           24             1.83     35                  80.5
9        34           23             1.48     33                  70.9
10       38           23             1.65     33                  65.6
11       42           22             1.91     32                  59.1
12       37           21             1.76     30                  80.8
13       34           21             1.62     30                  31.8
14       34           21             1.62     30                  57.9
15       29           21             1.38     30                  59.6
16       44           20             2.20     29                  75
17       39           18             2.17     26                  64.1
18       19           18             1.06     26                  61.4
19       34           17             2.00     25                  52.4
20       28           16             1.75     23                  79.5
21       38           14             2.71     20                  76.1
22       41           14             2.93     20                  26.6
23       21           14             1.50     20                  43.6
24       23           13             1.77     19                  52
Total    819          554            1.48     800                 1401.5

a Number of AFLP markers, RAPD markers and other STS markers.
b Ratio of the number of mapped EST and gene markers to the number of anonymous markers.
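The estimated physical lengths in Table 2 appear consistent with assigning each LG a share of the 800 Mbp genome in proportion to its anonymous marker count. The sketch below checks that reading on a few rows copied from the table. The proportional rule is an inference from the numbers, not a procedure stated by the authors, and the function names are illustrative.

```python
GENOME_MB = 800          # genome size assumed in the text
TOTAL_ANONYMOUS = 554    # total anonymous DNA markers (Table 2)

# (linkage group, mapped ESTs and genes, anonymous DNA markers), from Table 2
ROWS = [(1, 33, 41), (2, 21, 40), (22, 41, 14), (24, 23, 13)]


def estimated_length_mb(n_anonymous: int) -> float:
    """Share of the genome proportional to the LG's anonymous marker count."""
    return GENOME_MB * n_anonymous / TOTAL_ANONYMOUS


def est_to_anonymous_ratio(n_est: int, n_anonymous: int) -> float:
    """Table 2 'Ratio' column: a rough proxy for relative gene density."""
    return n_est / n_anonymous


for lg, n_est, n_anon in ROWS:
    print(f"LG{lg}: {estimated_length_mb(n_anon):.0f} Mb, "
          f"EST/anonymous ratio {est_to_anonymous_ratio(n_est, n_anon):.2f}")
```

Because anonymous markers are assumed to sample the genome roughly at random while ESTs sample genes, the ratio of the two is what the text uses to compare gene density between LGs.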

4. Evolutionary analyses

As described above, most single-locus type markers mapped in medaka are attributable to either genes or ESTs (Naruse et al., 2004) that have significant homology to genes of other species. By comparing the map positions of these markers, we analysed the degree of synteny conservation between different species. Fig. 1A and B show Oxford grids for medaka versus human and medaka versus zebrafish, respectively, and indicate that between medaka and human the distribution of orthologous gene pairs seems scattered but is obviously not random. One can easily detect clusters of orthologous gene pairs in the medaka/human matrix, suggesting that the medaka and human genomes share many conserved syntenic segments even after more than four hundred million years of divergence from a common ancestor (Kumar and Hedges, 1998). A greater number of orthologous gene pair clusters are found in the medaka/zebrafish matrix (Fig. 1B) and, if the criterion of conserved synteny is set to at least five orthologous pairs located on the same LG, the conserved syntenic segments in LG1/LG1, LG3/LG7, LG7/LG23 etc. become apparent.
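An Oxford grid of the kind described above is a count matrix: each cell holds the number of orthologous gene pairs that map to a given LG in one species and a given LG (or chromosome) in the other. A minimal sketch follows. The pair list is invented for illustration and is not data from the paper; only the threshold of five pairs comes from the text.

```python
from collections import Counter


def oxford_grid(pairs):
    """Count orthologous gene pairs per (LG in species A, LG in species B) cell."""
    return Counter(pairs)


def conserved_cells(grid, min_pairs=5):
    """Cells meeting the conserved-synteny criterion (at least min_pairs orthologous pairs)."""
    return sorted(cell for cell, n in grid.items() if n >= min_pairs)


# Hypothetical (medaka LG, zebrafish LG) assignments, one tuple per orthologous gene pair.
pairs = [(1, 1)] * 6 + [(3, 7)] * 5 + [(7, 23)] * 5 + [(2, 9)] * 2 + [(7, 4)]

grid = oxford_grid(pairs)
conserved = conserved_cells(grid)
print(conserved)
```

Cells with only one or two pairs are left out by the threshold, which is how the criterion separates conserved segments from scattered single matches.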

For detailed analysis of conserved synteny, mapped genes are sorted by human chromosome number, followed by the assignment of a colour to each human chromosome (i.e. 23 colours). This procedure may exclude inversion events within each chromosome that occurred following the divergence of the medaka, zebrafish and human lineages, and it reveals an interesting feature of chromosome evolution in vertebrates. For example, medaka LG11 and LG16 and zebrafish LG19 and LG16, which harbour the HoxA cluster, show domain structures similar to human orthologous gene pairs (Fig. 2). They each contain blocks that correspond to human chromosome 1 (hsa1), hsa3, hsa6, hsa7 and hsa8. This suggests that these chromosomes arose from duplication of a single ancestral chromosome (proto-chromosome) and have maintained a paired relationship. In principle, this can be extended to all LGs of medaka and zebrafish, although there are three notable exceptions in terms of the paired relationship. Phylogenetically, zebrafish and medaka have a relatively high level of divergence from each other in terms of the ray-finned fish lineages (Nelson, 1994; Kumar and Hedges, 1998; Miya et al., 2003). Indeed, medaka and zebrafish are thought to have diverged directly from a common ancestor of nearly all euteleosts. The paired-chromosome relationship must therefore be found in most euteleosts. Our results, together with the current accumulated evidence, strongly suggest that whole-genome duplication occurred in a common ancestor of almost all euteleosts (Postlethwait et al., 2000; Naruse et al., 2004).

Fig. 1. Oxford grid display of medaka-human (A) and medaka-zebrafish (B). Numbers in the cells depict the number of orthologous gene pairs in each matrix. 818 orthologous genes were analysed in the medaka-human matrix and 255 in the medaka-zebrafish matrix (for details, see Naruse et al., 2004).

5. Rapid mapping of medaka mutations using the EST-marker set

The first step in the positional cloning of mutated genes is their approximate assignment to a genomic region. To facilitate this step, a bulked segregant analysis (BSA) with selected EST markers (referred to as the M-marker set) has been employed (Kimura et al., 2004). The latest version of the M-marker set (M-marker 2003) can be applied to any combination of HNI/Kaga and Hd-rR/AA2/Cab strains (for details, see Kimura et al., 2004 and the website http://medaka.lab.nig.ac.jp). A similar system was established using a mapping cross of Kaga and Cab strains (Martinez-Morales et al., 2004). Both systems are equally effective but the M-marker set seems more universal in that it can be applied to most common strains including Hd-rR, d-rR, Cab, AA2, HNI and Kaga. High-resolution mapping then follows, using a larger number of mapped markers and embryos, so that mutation sites can finally be narrowed down to a region that is covered by a small number of BACs.
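The logic of the bulked segregant step can be sketched as follows. DNA from mutant embryos and from their wild-type siblings is pooled separately, and each marker is scored in both pools; a marker linked to the mutation shows almost exclusively the allele of the mutagenized strain in the mutant pool, whereas an unlinked marker shows both alleles in both pools. The marker names, allele fractions and thresholds below are all hypothetical and are not taken from Kimura et al. (2004).

```python
def linked_markers(mutant_pool, sibling_pool, min_fraction=0.8, min_difference=0.3):
    """Return markers whose mutagenized-strain allele is enriched in the mutant pool.

    Each pool maps a marker name to the fraction (0..1) of signal that comes from
    the allele of the mutagenized strain. Thresholds are illustrative only.
    """
    hits = []
    for marker, mutant_fraction in mutant_pool.items():
        sibling_fraction = sibling_pool[marker]
        enriched = mutant_fraction >= min_fraction
        differs = (mutant_fraction - sibling_fraction) >= min_difference
        if enriched and differs:
            hits.append(marker)
    return hits


# Hypothetical pooled genotyping results for three markers.
mutant_pool = {"M01": 0.52, "M02": 0.97, "M03": 0.48}
sibling_pool = {"M01": 0.49, "M02": 0.35, "M03": 0.51}

print(linked_markers(mutant_pool, sibling_pool))
```

In practice the read-out is usually a band pattern on a gel rather than a numeric fraction, so the thresholds here stand in for a visual judgement.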

Fig. 2. An example of a paired-chromosome relationship in medaka and zebrafish LGs. These four LGs have similar colour patterns, showing blocks of hsa1, hsa3, hsa6, hsa7 and hsa8. Fifteen orthologous gene pairs were mapped to medaka LG11 and zebrafish LG19 and six orthologous gene pairs were mapped to medaka LG16 and zebrafish LG16, suggesting that medaka LG11/zebrafish LG19 and medaka LG16/zebrafish LG16 are orthologous chromosomes, respectively. However, two genes, RXRB and TWIST1, mapped to medaka LG16 are located on zebrafish LG19. This suggests a lineage-specific loss of the duplicated copy. Alternatively, another copy of the duplicated gene remains to be found in one or both of the two species. As a whole, these patterns suggest a common origin of these four LGs (for details, see Naruse et al., 2004).

6. BAC libraries

In addition to a detailed linkage map, high-quality bacterial artificial chromosome (BAC) libraries are a prerequisite for successful positional cloning of mutated genes. The BAC system has been most commonly used for large-insert DNAs. High-quality BAC libraries for medaka should increase the rate of positional cloning of mutant genes and are essential tools for subsequent genomic analyses. At the moment, three gridded BAC libraries have been established from different medaka strains. A library derived from the Hd-rR southern inbred medaka strain (HdrR library) was constructed by Matsuda et al. (2001) and is available upon request (H. Hori, Nagoya University). The average insert size in this library is 210 kb, with a predicted 24-fold coverage of the medaka genome. A genomic BAC library of the HNI northern inbred strain was previously constructed by Kondo et al. (2002). The average insert size of this library was 160 kb and it was expected to cover 20 genome equivalents (refer to Wittbrodt et al., 2002). A second BAC library of a southern medaka strain, Cab, has also been established (Wittbrodt et al., 2002 and http://www.rzpd.de, RZPD library number 756). The average insert size of this library was 150 kb and it is available from RZPD. Use of these BAC libraries enabled cloning of the B locus and sex-determining genes (Fukamachi et al., 2001; Matsuda et al., 2001). Furthermore, Kondo et al. (2002) have determined the complete nucleotide sequences of DMRT genes in an HNI BAC to study the evolution of vertebrate DMRT families.

The HdrR and Cab BAC libraries were used to construct a BAC-based physical map, and the first generation of this map has been generated. Zadeh Khorasani et al. (2004) hybridized 35-mer oligonucleotides to 60,000 BAC clones, which correspond to 14-fold coverage of the medaka genome, and aligned them into 902 map segments containing 2721 markers. The BAC physical map will greatly facilitate the position-based cloning of novel mutants and BAC-based genome sequencing.
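Fold coverage of a clone library is the total cloned DNA divided by the genome size, and the chance that a given locus is present in at least one clone can be estimated with the standard Clarke-Carbon formula. The sketch below applies both to the 60,000-clone, 14-fold figure quoted above. The 800 Mbp genome size is an assumption carried over from the mapping section, the function names are illustrative, and the mean insert size is back-calculated rather than reported.

```python
def fold_coverage(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Genome equivalents represented by a library."""
    return n_clones * mean_insert_kb / (genome_mb * 1000.0)


def implied_insert_kb(n_clones: int, coverage: float, genome_mb: float) -> float:
    """Mean insert size implied by a stated clone count and fold coverage."""
    return coverage * genome_mb * 1000.0 / n_clones


def probability_locus_covered(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome)^N."""
    fraction = mean_insert_kb / (genome_mb * 1000.0)
    return 1.0 - (1.0 - fraction) ** n_clones


insert_kb = implied_insert_kb(60_000, 14, 800)
print(f"implied mean insert: {insert_kb:.1f} kb")
print(f"fold coverage:       {fold_coverage(60_000, insert_kb, 800):.1f}")
print(f"P(locus covered):    {probability_locus_covered(60_000, insert_kb, 800):.6f}")
```

The Clarke-Carbon estimate assumes clones are drawn uniformly at random from the genome; cloning bias in real libraries lowers the true probability for some regions.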

7. Large-scale isolation of ESTs

The EST approach is a powerful technique for large-scale cloning of cDNAs as well as large-scale characterization of cDNA sequences in functional genomic studies. Several groups have isolated ESTs from the medaka embryo (Kimura et al., 2004) and adult, and from specific tissues such as the liver and ovary. As a result, by March 2004, about 150,000 entries of medaka ESTs were found in public databases. As described above, the mapping of isolated ESTs is in progress and their number is increasing rapidly (http://mbase.bioweb.ne.jp/~dclust/medaka_top.html). Expression analyses of isolated ESTs are also under way (http://www.embl-heidelberg.de/mepd/ and http://medaka.lab.nig.ac.jp/).

To take full advantage of the large and rapidly growing body of medaka EST information, new technologies will most certainly be required. The most powerful and versatile tool currently available is a high-density array of oligonucleotides or cDNAs, which can measure the levels of gene expression for thousands of genes simultaneously. Oligonucleotide microarrays with 8,091 genes isolated from medaka embryos (Medaka Microarray 8 K) have been made and tested for their usefulness in expression analyses of developing embryos (Kimura et al., 2004). The use of microarrays could also play an important role in candidate gene identification, as comparison of wild-type and mutant embryos will permit the identification of affected transcripts that correspond to a candidate gene or candidate pathway for a particular mutant phenotype.

8. Perspectives: ongoing projects

8.1. Single nucleotide polymorphism (SNP) mapping: towards a high-density map

SNPs are stable genetic variations that are spread throughout the genome. The number of SNPs is huge and they can be found in any region of the genome, although a non-uniform distribution has been reported in the mouse genome (Wade et al., 2002). Furthermore, with recent advances in genomic technologies, SNPs can be mapped by high-throughput automated methods instead of conventional gel electrophoresis. The use of SNPs would thus allow for the generation of an even higher density map, which would facilitate the fine-scale mapping of mutants in addition to the assignment of contigs created by the genome sequencing project (see below) to a corresponding genomic region. Given the high SNP rate (1–3%) found between northern and southern inbred strains of medaka (in contrast to 0.2% among mouse inbred strains; Waterston et al., 2002), medaka could well be an ideal vertebrate for rapid construction of a SNP map (Fig. 3B). We (KN, YK and HT) have therefore commenced a SNP mapping project in the medaka.

In order to undertake SNP mapping, a reference typing

DNA panel from 94 back-cross progeny between the HNI

and Hd-rR medaka strains was generated (Fig. 3A). As a

pilot analysis, nine genes were selected from EST collec-

tions of Hd-rR and HNI strains, and were mapped either by

use of SNPs or by conventional gel-electrophoresis

methods. A high-throughput MALDI-TOF system (Jurinke

et al., 2002), was used for SNP mapping and, as shown in

Fig. 3C, the typing data that were obtained proved to be

identical to those found by the conventional method.

Furthermore, this SNP mapping system has great advantages

as it is relatively easy, faster than other techniques (a few

minutes for each SNP analysis in 96 well formats) and is

also more reliable. At present, large numbers of SNPs are

being collected by simple comparisons of sequences from

HNI strains (low-redundant coverage sequence data) with

K. Naruse et al. / Mechanisms of Development 121 (2004) 619–628624

Fig. 3. SNP genotyping for constructing a high-resolution genetic map. (A) Construction of a new typing panel with 94 back-cross progeny using female meioses. A female of HNI and a male of Hd-rR were

mated to obtain F1 offspring. An F1 female was backcrossed to a male of Hd-rR and 94 back-cross progeny were obtained. (B) Example of SNPs in the coding regions between HNI and Hd-rR strains. Several

substitutions were observed even in the coding regions. (C) Comparison of genotyping data obtained by MALDI-TOF and PCR-RFLP (conventional) systems. Genotyping for all individuals was identical

between the two systems, except for a few missing results from the PCR-RFLP method.


those of the Hd-rR strain (high-coverage data from the medaka genome sequencing project, see below). The goal of this project is to generate a set of 3000 SNP markers; together with a further 1700 EST markers, nearly 5000 invariant genetic markers will be assigned throughout the medaka genome. Theoretically, map resolution can reach the range of one marker per 180–200 kb, the average size of a BAC insert, when hundreds of meioses are used for positional cloning.
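The arithmetic behind these figures can be sketched in a few lines of Python. The genome size of roughly 800 Mb is an assumption, taken from the coverage figures quoted later in this section (726 Mb corresponding to about 0.9-fold coverage). The function names are illustrative, and markers are treated as evenly spaced.

```python
# Assumed genome size (Mb), implied by 726 Mb ~ 0.9-fold coverage.
GENOME_SIZE_MB = 800


def mean_marker_spacing_kb(genome_size_mb: float, n_markers: int) -> float:
    """Average distance between markers, assuming even spacing."""
    return genome_size_mb * 1000.0 / n_markers


def min_recombination_percent(n_meioses: int) -> float:
    """Smallest non-zero recombination fraction that can be observed:
    a single recombinant among n_meioses informative meioses."""
    return 100.0 / n_meioses


# Planned marker set from the text: 3000 SNP markers plus 1700 EST markers.
n_markers = 3000 + 1700
spacing = mean_marker_spacing_kb(GENOME_SIZE_MB, n_markers)
print(f"{n_markers} markers: about one marker per {spacing:.0f} kb")

# The typing panel in the text contains 94 back-cross progeny.
step = min_recombination_percent(94)
print(f"94 meioses: smallest observable recombination step is about {step:.2f}%")
```

With these assumptions the planned marker set gives a spacing of about 170 kb, close to the size of a BAC insert. A single 94-meiosis panel resolves recombination only in steps of about 1%, which is why larger numbers of meioses are needed to separate neighbouring markers during positional cloning.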

8.2. Genome sequencing

Over the past few years, more than 30 organisms, including human and mouse, have had their genomes completely sequenced, and other model organisms are currently being sequenced or are on a waiting list for future genome sequencing projects. The complete draft sequence of Fugu (Aparicio et al., 2002) has had a high impact on medaka genomics, as medaka and Fugu are evolutionarily close to each other (Miya et al., 2003), about 60 Myr apart, in contrast to zebrafish (110–160 Myr divergence from medaka; Wittbrodt et al., 2002; Naruse et al., 2004). In 2000, the zebrafish genome sequencing project began at the Sanger Center (http://www.sanger.ac.uk/Info/Press/001121.shtml) and sequencing data continue to be updated frequently on the database web site (http://pre.ensembl.org/Danio_rerio/). Rapid completion of the genomic sequence of medaka will certainly be a crucial step in moving from mutant phenotypes to the characterization of novel gene functions, as well as in the analysis of genomic evolution during fish diversification.

There are two main approaches for sequencing large, complex genomes such as that of medaka: shotgun sequencing of the entire genome (whole-genome shotgun, WGS) and shotgun sequencing of BAC clones or contigs arranged by fingerprinting or hybridization (hierarchical shotgun). The WGS approach has the advantage of both simplicity and rapid early coverage of the whole genome. Indeed, WGS sequencing is also useful for identifying genes; almost all genes are identified by at least one database hit at a twofold level of redundancy (Bouck et al., 1998). However, even at higher redundancies (e.g. six-fold), gaps and misassembled fragments remain that require further directed sequencing to be resolved. Furthermore, the WGS approach may encounter difficulties when applied to genomes that contain highly repetitive sequences, such as the human genome. The hierarchical approach, on the other hand, overcomes such difficulties by sequencing assembled contigs or BAC clones, and thus decreases the number of repeats within each set of sequencing data. The hierarchical approach will therefore be required for finishing the complete sequencing of complex genomes.
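A rough feel for why gaps persist at higher redundancy comes from the idealized Lander–Waterman (Poisson) model of random shotgun sequencing, in which the expected fraction of bases covered by at least one read at c-fold redundancy is 1 − e^(−c). The Python sketch below applies that textbook model. It is not the analysis of Bouck et al. (1998), and it ignores repeats, cloning bias and assembly errors, so real projects leave more gaps than it predicts. The 800 Mb genome size is an assumed round figure, and the function names are illustrative.

```python
import math


def expected_fraction_covered(redundancy: float) -> float:
    """Fraction of bases expected to be hit by at least one read under
    the idealized Poisson model: 1 - exp(-redundancy)."""
    if redundancy < 0:
        raise ValueError("redundancy must be non-negative")
    return 1.0 - math.exp(-redundancy)


def expected_uncovered_mb(genome_size_mb: float, redundancy: float) -> float:
    """Expected amount of sequence (Mb) not covered by any read."""
    return genome_size_mb * math.exp(-redundancy)


GENOME_SIZE_MB = 800  # assumed genome size, for illustration only

for c in (2, 6):
    covered = expected_fraction_covered(c)
    missing = expected_uncovered_mb(GENOME_SIZE_MB, c)
    print(f"{c}-fold: {covered:.1%} covered, about {missing:.1f} Mb uncovered")
```

Even in this best case, twofold redundancy leaves more than a tenth of the bases unsampled, and six-fold redundancy still leaves on the order of megabases uncovered in a genome of this size, scattered across many small gaps.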

Although the ultimate goal is to obtain the finished

sequence of the medaka genome, a draft sequence is needed

as rapidly as possible to accelerate the progress from

identifying mutant phenotypes to characterizing novel genes

and gene functions. Accordingly, a strategy was adopted

that is mainly based upon WGS, with the integration of

detailed map information and pair-mate sequence data from

large inserts in BAC and/or fosmid vectors. With the support of the Grants-in-Aid for Scientific Research in Priority Area ‘Genome Science’ from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT), this medaka genome project started in late 2002. A southern inbred strain, Hd-rR, was chosen for

sequencing, as most medaka mutants are of southern

Japanese origin. Sequencing is being carried out at the

Academia Sequencing Centre of the NIG in Mishima, Japan.

By the end of 2003, the sequence had 4-fold genome coverage and reached 8.9-fold in May 2004 (http://dolphin.lab.nig.ac.jp/medaka/). Mapping information for ESTs and SNPs, together with pair-mate sequences of large inserts, will be integrated into the WGS sequence data to generate an initial draft genome sequence. Additionally, in 2002, the National

Bioresource Project of MEXT supported medaka genome

sequencing (led by Y. Wakamatsu at Nagoya University;

http://shigen.lab.nig.ac.jp/medaka/genome/indexen.html)

and WGS data of approximately 1,000,000 reads (726 Mb,

approx. 0.9-fold genome coverage) of the Hd-rR genome

Table 3
Medaka research information and genomic resource websites

| Site name | Content | URL |
|---|---|---|
| Medakafish homepage | Medaka resource portal site | http://biol1.bio.nagoya-u.ac.jp:8000/ |
| MGI | Medaka genome initiative homepage | http://medaka.dsp.jst.go.jp/MGI/ |
| ERATO (DMG) | Kondoh Differentiation Signaling Project (Developmental mutant group homepage) | http://medaka.dsp.jst.go.jp/DMG/ |
| M base | EST and linkage database | http://mbase.bioweb.ne.jp/~dclust/ml_base.html |
| TIGR Medaka Gene Index | Integrated research data from international EST sequencing and gene research projects | http://www.tigr.org/tdb/tgi/olgi/ |
| NBRP medakafish genome project | Genome sequence data | http://shigen.lab.nig.ac.jp/medaka/genome/indexen.html |
| MEPD | Medaka expression pattern data | http://www.embl-heidelberg.de/mepd/ |
| Medaka EST database | EST sequence search and expression pattern data | http://medaka.lab.nig.ac.jp/ |
| Mapping mutant with M-marker | Medaka mutant mapping system | http://medaka.lab.nig.ac.jp/ |
| RZPD | German Resource Center for Genome Research | http://www.rzpd.de/cgi-bin/products/rzpd_products.pl.cgi |
| Medaka genome project | Medaka genome project | http://dolphin.lab.nig.ac.jp/medaka/ |


produced by the Academia Sequencing Centre and the RIKEN Institute are already available (July, 2002; http://shigen.lab.nig.ac.jp/medaka/genome/top.jsp). Initial assembled medaka sequencing data of 6.7-fold genome coverage will be available in the public domain via the NIG and should be accessible by the time of publication of this review. As a final step in the medaka genome project, the isolation of ESTs will also be accelerated at the NIG to assist in finding and annotating genes in a draft genome sequence of the medaka. Together with the above projects, an initiative has been launched within the medaka community, the Medaka Genome Initiative (Wittbrodt et al., 2002; http://medaka.dsp.jst.go.jp/MGI/), toward the complete sequencing of the medaka genome.
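The coverage figures quoted for the National Bioresource Project data set follow from simple arithmetic, sketched below in Python. Fold coverage is taken here as total sequenced bases divided by genome size. The 800 Mb genome size used in the last step is an assumed round figure, and the function names are illustrative.

```python
def mean_read_length_bp(total_sequenced_mb: float, n_reads: int) -> float:
    """Average read length implied by the total yield and read count."""
    return total_sequenced_mb * 1_000_000 / n_reads


def implied_genome_size_mb(total_sequenced_mb: float, fold_coverage: float) -> float:
    """Genome size implied by a stated yield and fold coverage."""
    return total_sequenced_mb / fold_coverage


def fold_coverage(total_sequenced_mb: float, genome_size_mb: float) -> float:
    """Redundancy: sequenced bases divided by genome size."""
    return total_sequenced_mb / genome_size_mb


# Figures quoted in the text: about 1,000,000 reads, 726 Mb, ~0.9-fold.
print(f"mean read length: {mean_read_length_bp(726, 1_000_000):.0f} bp")
print(f"implied genome size: {implied_genome_size_mb(726, 0.9):.0f} Mb")

# Assumed round genome size of 800 Mb, for illustration only.
print(f"coverage at 800 Mb: {fold_coverage(726, 800):.2f}-fold")
```

The quoted numbers are mutually consistent with a genome of roughly 800 Mb and an average read of a little over 700 bases.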

9. Conclusions

The generation of hundreds of medaka mutants and the

unique phenotypes observed in some make this organism an

attractive model system for vertebrate genetics and

genomics that would nicely complement the zebrafish

(Wittbrodt et al., 2002; Naruse et al., 2004). As described

above, medaka genomic tools have been sufficiently

developed to permit rapid identification of mutated genes

(see Table 3). This involves the efficient determination of

approximate LG map positions by BSA with the M-marker

set, followed by fine mapping using thousands of mapped

markers (ESTs and SNPs), and finally, rapid identification

of candidate genes using existing information from other

vertebrate genomes. Genomic tools either are already in use

or will be available shortly, and there is no doubt that

the draft genome sequence of the medaka will greatly

accelerate these studies. Furthermore, whole-genome comparisons between zebrafish, Fugu, medaka and mammalian

draft genome sequences will provide novel insights into the

diversification of fish species during evolution and shed

light on vertebrate genome evolution.

Acknowledgements

The work presented here was supported in part by Grants-in-Aid for Scientific Research Priority Area ‘Genome Science’ (K.N., Y.K., H.H., H.T.), ‘Study of Medaka as a Model for Organization and Evolution of the Nuclear Genome’ (K.N.) and ‘Organized Research Combination System’ (H.T.) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

References

Aida, T., 1921. On the inheritance of colour in a freshwater fish,

Aplocheilus latipes Temminck and Schlegel, with special reference to

sex-linked inheritance. Genetics 6, 554–573.

Amores, A., Force, A., Yan, Y.L., Joly, L., Amemiya, C., Fritz, A., et al.,

1998. Zebrafish hox clusters and vertebrate genome evolution. Science

282, 1711–1714.

Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P.,

et al., 2002. Whole-genome shotgun assembly and analysis of the

genome of Fugu rubripes. Science 297, 1301–1310.

Bouck, J., Miller, W., Gorrell, J.H., Muzny, D., Gibbs, R.A., 1998. Analysis

of the quality and utility of random shotgun sequencing at low

redundancies. Genome Res. 8, 1074–1084.

Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L., Postlethwait, J.,

1999. Preservation of duplicate genes by complementary, degenerative

mutations. Genetics 151, 1531–1545.

Fujiyama, A., Watanabe, H., Toyoda, A., Taylor, T.D., Itoh, T., Tsai, S.F.,

et al., 2002. Construction and analysis of a human-chimpanzee

comparative clone map. Science 295, 131–134.

Fukamachi, S., Shimada, A., Shima, A., 2001. Mutations in the gene

encoding B, a novel transporter protein, reduce melanin content in

medaka. Nat. Genetics 28, 381–385.

Furutani-Seiki, M., Sasado, T., Morinaga, C., Suwa, H., Niwa, K., Yoda,

H., et al., 2004. A systematic genome-wide screen for mutations

affecting organogenesis in Medaka, Oryzias latipes. Mech. Dev. 121,

647–658.

Hedges, S.B., Kumar, S., 2002. Vertebrate genomes compared. Science 297,

1283–1285.

Hyodo-Taguchi, Y., 1996. Inbred strains of the medaka, Oryzias

latipes. The Fish Biol. J. Medaka 8, 11–14.

Ishikawa, Y., 2000. Medakafish as a model system for vertebrate

developmental genetics. Bioessays 22, 487–495.

Jurinke, C., van den Boom, D., Cantor, C.R., Koster, H., 2002. Automated

Genotyping using DNA MassARRAY Technology. Methods in Mol.

Biol. 187, 179–192.

Kimura, T., Jindo, T., Narita, T., Naruse, K., Kobayashi, D., Shin-I, T.,

et al., 2004. Large-scale isolation of ESTs from medaka embryos and its

application to medaka developmental genetics. Mech. Dev. 121,

915–932.

Kondo, M., Froschauer, A., Kitano, A., Nanda, I., Hornung, U., Volff, J.,

et al., 2002. Molecular cloning and characterization of DMRT genes

from the medaka Oryzias latipes and the platyfish Xiphophorus

maculatus. Gene 295, 213–222.

Kumar, S., Hedges, S.B., 1998. A molecular timescale for vertebrate

evolution. Nature 392, 917–920.

Lamatsch, D.K., Steinlein, C., Schmid, M., Schartl, M., 2000. Noninvasive

determination of genome size and ploidy level in fishes by flow

cytometry: detection of triploid Poecilia formosa. Cytometry 36,

91–95.

Loosli, F., Koster, R.W., Carl, M., Kuhnlein, R., Henrich, T., Mucke, M.,

et al., 2000. A genetic screen for mutations affecting embryonic

development in medaka fish (Oryzias latipes). Mech. Dev. 97,

133–139.

Loosli, F., Winkler, S., Burgtorf, C., Wurmbach, E., Ansorge, W., Henrich,

T., et al., 2001. Medaka eyeless is the key factor linking retinal

determination and eye growth. Development 128, 4035–4044.

Martinez-Morales, J., Naruse, K., Mitani, H., Shima, A., Wittbrodt, J.,

2004. Rapid chromosomal assignment of Medaka mutant by bulked

segregation analysis. Gene 329, 159–165.

Matsuda, M., Kawato, N., Asakawa, S., Shimizu, N., Nagahama, Y.,

Hamaguchi, S., et al., 2001. Construction of a BAC library derived from

the inbred Hd-rR strain of the teleost fish, Oryzias latipes. Genes Genet.

Syst. 76, 61–63.

Matsuda, M., Nagahama, Y., Shinomiya, A., Sato, T., Matsuda, C.,

Kobayashi, T., et al., 2002. DMY is a Y-specific DM-domain gene

required for male development in the medaka fish. Nature 417,

559–563.

Miya, M., Takeshima, H., Endo, H., Ishiguro, N.B., Inoue, J.G., Mukai, T.,

et al., 2003. Major Patterns of higher teleostean phylogenies: a new

perspective based on 100 complete mitochondrial DNA sequences.

Mol. Phylogenet. Evol. 26, 121–138.


Nanda, I., Kondo, M., Hornung, U., Asakawa, S., Winkler, C., Shimizu, A.,

et al., 2002. A duplicated copy of DMRT1 in the sex-determining region

of the Y chromosome of the medaka, Oryzias latipes. Proc. Natl Acad.

Sci. 99, 11778–11783.

Naruse, K., Fukamachi, S., Mitani, H., Kondo, M., Matsuoka, T., Kondo,

S., et al., 2000. A detailed linkage map of medaka, Oryzias latipes:

comparative genomics and genome evolution. Genetics 154,

1773–1784.

Naruse, K., Tanaka, M., Mita, K., Shima, A., Postlethwait, J., Mitani, H.,

2004. Medaka gene map: the trace of ancestral vertebrate proto-

chromosomes revealed by comparative gene mapping. Genome Res.

14, 820–828.

Nelson, J.S., 1994. Fishes of the World, 3rd ed, Wiley, New York, NY.

Ohtsuka, M., Makino, S., Yoda, K., Wada, H., Naruse, K., Mitani, H., et al.,

1999. Construction of a linkage map of the medaka (Oryzias latipes)

and mapping of the Da mutant locus defective in dorsoventral

patterning. Genome Res. 9, 1277–1287.

Postlethwait, J.H., Woods, I.G., Ngo-Hazelett, P., Yan, Y.L., Kelly, P.D.,

Chu, F., et al., 2000. Zebrafish comparative genomics and the origins of

vertebrate chromosomes. Genome Res. 10, 1890–1902.

Sakaizumi, M., 1984. Rigid isolation between the northern population and

the southern population of the medaka, Oryzias latipes. Zool. Sci. 1,

795–800.

Sakaizumi, M., Joen, S.R., 1987. Two divergent groups in the wild

population of medaka Oryzias latipes (Pisces: Oryziatidae) in Korea.

Korean J. Limnol. 20, 13–20.

Shima, A., Mitani, H., 2004. Medaka as a research organism: past, present

and future. Mech. Dev. 121, 599–604.

Shimada, A., Shima, A., 1998. Combination of genomic DNA fingerprint-

ing into the medaka specific-locus test system for studying environ-

mental germ-line mutagenesis. Mutation Res. 399, 149–165.

Takehana, Y., Nagai, N., Matsuda, M., Tsuchiya, K., Sakaizumi, M., 2003.

Geographic variation and diversity of the cytochrome b gene in

Japanese wild populations of medaka, Oryzias latipes. Zool. Sci. 20,

1279–1291.

Taylor, J.S., Braasch, I., Frickey, T., Meyer, A., Van de Peer, Y., 2003.

Genome duplication, a trait shared by 22000 species of ray-finned fish.

Genome Res. 13, 382–390.

Uwa, H., Iwata, A., 1981. Karyotype and cellular DNA content of Oryzias

javanicus (Oryziatidae, Pisces). Chromosome Info. Service 31, 24–26.

Wade, C.M., Kulbokas, E.J. III, Kirby, A.W., Zody, M.C., Mullikin, J.C.,

Lander, E.S., et al., 2002. The mosaic structure of variation in the

laboratory mouse genome. Nature 420, 574–578.

Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F.,

Agarwal, P., et al., 2002. Initial sequencing and comparative analysis of

the mouse genome. Nature 420, 520–562.

Winkler, S., Loosli, F., Henrich, T., Wakamatsu, Y., Wittbrodt, J., 2000.

The conditional medaka mutation eyeless uncouples patterning and

morphogenesis of the eye. Development 127, 1911–1919.

Wittbrodt, J., Meyer, A., Schartl, M., 1998. More genes in fish? BioEssays

20, 511–515.

Wittbrodt, J., Shima, A., Schartl, M., 2002. Medaka-a model organism from

the Far East. Nature Rev. Genet. 3, 53–64.

Yamamoto, T., 1953. Artificially induced sex-reversal in genotypic males

of medaka (Oryzias latipes). J. Exp. Zool. 123, 571–594.

Yamamoto, T., 1958. Artificial induction of functional sex-reversal in

genotypic females of the Medaka (Oryzias latipes). J. Exp. Zool. 137,

227–264.

Zadeh Khorasani, M., Hennig, S., Imre, G., Asakawa, S., Palczewski, S.,

Berger, A., et al., 2004. A first generation physical map of the medaka

genome in BACs essential for positional cloning and clone-by-clone

based genomic sequencing. Mech. Dev. 121, 903–913.
