Computers & Education 55 (2010) 165–177


Mining concept maps from news stories for measuring civic scientific literacy in media

Yuen-Hsien Tseng a,*, Chun-Yen Chang b, Shu-Nu Chang Rundgren c, Carl-Johan Rundgren d

a Information Technology Center, National Taiwan Normal University, Taiwan
b Science Education Center & Graduate Institute of Science Education & Department of Earth Sciences, National Taiwan Normal University, Taiwan
c Swedish National Graduate School in Science and Technology Education Research, Department of Social and Welfare Studies, Linköping University, Norrköping, Sweden
d Department of Thematic Studies, Linköping University, Norrköping, Sweden

Article info

Article history: Received 9 November 2009; Received in revised form 6 January 2010; Accepted 10 January 2010

Keywords: Media in education; Evaluation methodologies; Lifelong learning

0360-1315/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.compedu.2010.01.002

* Corresponding author. Tel.: +886 2 77345535; fax: +886 2 23222009. E-mail addresses: [email protected] (Y.-H. Tseng), [email protected] (C.-Y. Chang), [email protected] (S.-N. Chang Rundgren), [email protected] (C.-J. Rundgren).

Abstract

Motivated by a long-term goal in education for measuring Taiwanese civic scientific literacy in media (SLiM), this work reports the detailed techniques to efficiently mine a concept map from 2 years of Chinese news articles (901,446 in total) for SLiM instrument development. From the Chinese news stories, key terms (important words or phrases), known or new to existing lexicons, were first extracted by a simple, yet effective, rule-based algorithm. They were subjected to an association analysis based on their co-occurrence in sentences to reveal their term-to-term relationships. A given list of 3657 index terms from science textbooks was then matched against the term association network. The resulting term network (including 95 scientific terms) was visualized in a concept map to scaffold the instrument developers. When developing an item, a linked term pair not only suggests the topic for the item, due to the clear context in which the terms mutually reinforce each other, but also the content itself, because of the rich background provided by the recurrent snippets in which they co-occur. In this way, the resulting instrument (comprised of 50 items) reflects the scientific knowledge revealed in the daily news stories, meeting the goal of measuring civic scientific literacy in media. In addition, the concept map mined from the texts served as a convenient tool for item classification, developer collaboration, and expert review and discussion.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Promoting scientific literacy is one of the ultimate goals of science education. In fact, the level of scientific knowledge of the general population is of great concern to governmental decision makers. In the United States, the National Science Foundation (NSF), the National Academy of Sciences, and various academic scholars have pointed out that a scientifically literate population is needed for democratic processes to properly take place in a society that is more and more technologically demanding (Brossard & Shanahan, 2006). In Taiwan, scientific literacy, including knowledge, skill, and attitude dimensions, has been a long-term concern from teachers' perspectives (Chang, Chang, & Yang, 2009). All these concerns pose questions that science educators need to consider from time to time: how to create a science curriculum that is meaningful for all individuals? Or, more specifically, how to measure civic scientific literacy to provide feedback for curriculum design and future strategic planning?

To answer the above questions, we took an intuitive view from the observations that follow. People in modern society come into contact with knowledge about science and technology through various media. Initially, as students, most people acquire their knowledge about science from school textbooks. Later in life, after graduation, media becomes the major source of an individual's knowledge about science. Furthermore, as De Boer (2000) states, a scientifically literate citizen ought to have the ability to critically read and discuss the scientific reports presented in media. This shows the importance of taking media into account when thinking about achieving the goal of scientific literacy. Accordingly, when evaluating the public's scientific literacy, the concepts presented both in science textbooks and in media must be emphasized. These are our sources and starting points for developing an appropriate instrument to evaluate civic scientific literacy in media.



The idea of using media as a source of ideas for developing science curricula is, however, not new. Long before the era of computers, Hopkins (1925) analyzed 2770 articles and concluded that biology content was more prevalent in newspapers and magazines than content from the other science subjects. The same tendency was found by Martin (1945), who also suggested that these results might be taken into account when reorganizing the curriculum. In the sixties, Koelsche (1965) made an attempt to extract from media the terms that citizens should know in order to qualify as scientifically literate.

In the computer era, media is more prevalent than before in terms of volume, format, and penetration. Extracting scientific concepts from the media manually, as was done in the past, would be a very time-consuming process. Thanks to the text mining techniques and knowledge representation frameworks developed in recent decades, mining concepts and their relations in media, for suggesting instrument or curriculum development, can now be carried out on a timely and periodic basis.

This work reports the detailed techniques that we used in developing an instrument aimed at measuring Taiwanese civic scientific literacy in media (SLiM). Given a collection of Chinese news stories in digital format, key terms (important words or phrases), known or new to existing lexicons, were first extracted by a simple, yet effective, rule-based algorithm. They were subjected to an association analysis based on their co-occurrence in sentences to reveal their term-to-term relationships. A given list of index terms from science textbooks was then matched against the term association network. The resulting term network, containing only the scientific terms, was visualized in a concept map so as to scaffold the instrument developers. Each instrument item was then developed based on a group of terms linked to one another.

When developing an item, the linked term pair not only suggests the topic of the item, because of the clear context mutually reinforced by the two terms, but also the content itself, since the recurrent snippets in which they co-occur provide rich background for item development. The items were then subjected to an expert review and pilot test process to yield a final instrument. In this way, the resulting instrument reflects the scientific knowledge revealed in the daily news stories, which meets the goal of measuring civic scientific literacy in media.

The idea of the SLiM instrument is inspired by the previous studies reviewed in the next section. The way of developing the instrument, however, is motivated by the concept map framework and recent text mining technology, both of which are also discussed in the next section. Section 3 introduces our general approach to concept map mining from texts, which involves two important steps: key term extraction and term association analysis. The details of these two techniques are described in Sections 4 and 5; examples and evaluations of the techniques are presented at the end of those two sections. Section 6 presents the development of the SLiM instrument, where the benefits of concept map driven item development and the validation of the instrument are reported. Finally, Section 7 concludes this paper.

2. Related work

This section reviews work on science literacy instrument development and on automatic concept map mining to provide background information and comparative material for this work. Since we have not found any literature dealing with these two subjects together, they are presented individually in the following sub-sections.

2.1. Science literacy instrument development

In addition to the studies of Hopkins (1925) and Koelsche (1965) mentioned above, Brossard and Shanahan (2006) proposed a methodology to build a science literacy instrument based on science media. Their work is the most relevant to our current study and is summarized below.

Brossard and Shanahan first chose the Oxford Dictionary of Science as a source to establish the scientific terms to be included. The dictionary consisted of about 9000 terms covering biology, chemistry, physics, the earth sciences, and astronomy. A systematic random sampling was used to select a term from every page of the dictionary. By excluding strictly mathematical terms (and concepts) and terms that might be used in contexts other than scientific, a list of 896 terms was obtained.

They then used these terms as keywords to search the "major newspapers" category of the Lexis-Nexis database to assess how often each term appeared in the total corpus of major newspapers. The top 5% most frequently cited terms were selected.

For each of the selected terms, they developed a "fill-in-the-blank" item, with the scientific term left as a blank and its definition or description presented as its context. They believed that this item format eliminates the risk of correct answers due to chance while avoiding the shortcomings linked to the use of open-ended questions. The final instrument contained 31 items in random order.

One hundred twenty undergraduate students from communication classes at a university were asked to complete the instrument in their pilot test. The results were compared with those based on the National Science Foundation's scale, which is the most widely cited public scientific literacy measurement instrument based on a more conventional approach to scientific literacy.

As a comparison, our approach differs from theirs in three important ways: (1) The source of scientific terms is high school textbooks rather than a scientific dictionary, providing a closer correlation to the learning experiences of the general public. (2) The media is not only used to learn the term distribution (leaving solely the dictionary definition or description as the item content), but also used as the hints for the item itself, probing better whether the reader understands the meaning and use of the terms in a daily-life context. (3) Our items were developed based on a concept map, where concept relationships are an important dimension of science literacy, thus covering wider aspects than those obtained from single-term-driven test items.

2.2. Concept map mining

In recent decades, the use of network graphs to represent knowledge of a focus topic has gained increasing attention from various fields. This knowledge representation framework denotes terms, concepts, or basic knowledge elements by nodes in the graph, and facts, propositions, or associations by links that connect two or more nodes to reflect their relations. Examples of such frameworks include concept maps (Novak, 1990, 1995), knowledge maps (Herl, O'Neil, Chung, & Schacter, 1999; O'Donnell, Dansereau, & Hall, 2002), and topic


maps (Dicheva & Dichev, 2006; Pepper, 2000). Although these maps share many similarities in representing knowledge, they differ from each other in their detailed network specifications and application objectives.

In the educational field, concept maps as proposed by Novak and Gowin (1984) have received the most attention. They have been applied to a wide variety of pedagogical tasks including teaching (Horton, 1993; Stewart, Vankirk, & Rowell, 1979), learning (Kinchin, 2000; Okebukola, 1992), evaluation (Chang, 2007; Markham, Mintzes, & Jones, 1994; Stoddart, Abrams, Gasper, & Canaday, 2000; Tsai, Lin, & Yuan, 2001), and curriculum design (Soyibo, 1995). Experiments from the corresponding applications often showed positive impacts.

A typical use of concept maps is either asking learners to map the concepts and their relations following a focus question, or presenting learners with expert-built concept maps to scaffold learning. The former, also called concept mapping, encourages learners to build and express their own knowledge about what has been learned, allowing them to reorganize and reinforce their learning while giving the instructor feedback on their levels of understanding. The latter scaffolds the learners with a referential knowledge structure to inspire subsequent learning. This scenario is often adopted for difficult topics, complex and ill-structured issues, or subjects dealing with a large volume of learning materials. With the help of computers, these two ways of applying concept maps can be combined to further facilitate learning.

For example, Chang, Sung, and Chen (2001) described a system that, given a predefined expert concept map, assisted students in concept mapping with a "construct-by-self" mode (i.e., concept mapping from scratch without any suggestion, with only final matching against expert concept maps allowed for evaluation and feedback) and a "construct-on-scaffold" mode (partial expert maps, or hints for nodes and links, were provided during the mapping process). Their experiments showed not only that the "construct-on-scaffold" mode has a better effect on learning, but also that hints and feedback are functions desired by most respondents.

However, expert skeleton maps, or concept maps that serve referential or scaffolding purposes, are not yet available for the ever-increasing learning materials of today. Although the Cmap tool (Cañas et al., 2004), a freely accessible Web-based concept mapping system, had already incorporated the above combined features and solicited experts to help build expert concept maps for various subjects, the need for scaffolding concept maps is still there, especially when it comes to learning knowledge from a large volume of natural language texts (e.g., Chen, Kinshuk, Wei, & Chen, 2008).

Automatic concept map mining from texts to meet the above need has been a research goal in recent studies. Oliveira, Pereira, and Cardoso (2001) presented a TextStorm system that can extract predicates showing relations between two words from a natural language sentence. The system used WordNet to find all parts of speech to which a word may belong and then parsed the result using an augmented grammar. The extracted predicates were then sent to a Clouds system, which is a concept map constructor that uses machine-learning-inspired algorithms to complete the map through dialogue with the user. Evaluation of TextStorm on 21 small text files of size varying between 319 and 12,557 bytes showed that the correct rate for the extracted predicates reached an average of 52%.

Clariana and Koul (2004) proposed a term association method based on an ALA-Reader software utility to convert students' text summaries into a concept map-like representation. The ALA-Reader uses a list of manually selected terms (maximum 30 terms) to look for the co-occurrences of these terms in each sentence of the summary. A term-to-term co-occurrence matrix was built for each sentence and then aggregated across all the sentences into a single one. The resultant matrix from each student's summary was then fed to a graph analysis and mapping tool for comparison and visualization. To know whether the proposed representations actually capture the vital content propositions in the written text, they compared the map matching scores to the scores of eleven pairs of human raters. The ALA-Reader approach obtained scores that were moderately related to the combined text score (Pearson r = 0.69) and ranked 5th overall.

In Valerio and Leake (2006, 2008), the authors presented a series of concept map extraction methods for bootstrapping concept mapping and for associating documents with the most appropriate concept maps. Their major steps included: (1) sentence parsing to tag each word with its part of speech and to produce a parse tree revealing the dependencies between the constituents; (2) word normalization by Porter's stemming algorithm (Porter, 1980) and WordNet's synonym relations to obtain equivalence classes of words; (3) concept extraction based on noun phrase identification; and (4) linking phrase extraction by extracting all pairs of concepts that have an indirect dependency link through a verb phrase. They did not directly evaluate the quality of the extracted concept maps. In Valerio and Leake (2006), only the effectiveness of the concept extraction was evaluated; their results suggested that using noun phrases improved performance compared to using individual words and that stemming increased overall performance. In Valerio and Leake (2008), the proposed new method that extracted concept maps from documents helped associate the documents with existing concept maps better than a baseline algorithm that constructs its document vector representation solely based on keyword frequency.

Chen, Kinshuk, Wei, and Chen (2008) extracted concept maps by mining a set of academic articles. Author keywords were used as concept candidates. They were then subjected to a normalization process including keyword cleansing (removing unnecessary dashes, hyphens, and spaces), acronym mapping, and stemming. Principal Component Analysis was then applied to select representative research keywords. Next, relation strength was calculated based on the degree of correlation between two selected keywords, taking into account the distance between the two keywords and the number of times they co-occurred in the same sentence. The results were visualized by the WebDot graphical software. The mined concept maps were intended to be used by novice researchers to explore the knowledge domain represented by the mined academic articles. Two experts were invited to evaluate the quality of the top 30 most important concepts and the relationships (with strengths) among them. From their questionnaire, the experts agreed that the relationships among concepts and the overall fitness matched up to 80% of their professional knowledge and that the ranking of concepts was 70% compliant with their knowledge.

However, as pointed out by Villalon and Calvo (2008), the above studies expose two issues: (1) overly simplistic and varying definitions of the concept maps extracted, and (2) the lack of an evaluation framework that can be used to measure the quality of the generated maps across different techniques. They then attempted to define the task of concept map mining from texts and proposed an evaluation framework for performance comparison based on the difference between human-created and machine-generated concept maps, with reference to the inter-human difference.

Instead of mining concept maps from texts, Tseng, Sue, Su, Weng, and Tsai (2007) proposed an approach to construct the map from learners' historical testing records. Given an item-concept correspondence table indicating which concepts are tested in which items, and a testing record showing every learner's score for each item, they used fuzzy association rule mining to extract predicates of the form "ItemID1:performance1 implies ItemID2:performance2" to yield a dependence graph among the items. For example, the predicate "I6:Low implies I9:Low" indicates that item 6 is a prerequisite of item 9 (or I9 depends on I6), because the rule says that most of the learners who got a low score on item 6 also got a low score on item 9.


By substituting the items with their corresponding concepts in the dependency graph, a concept dependency map was constructed. This concept map can be used to facilitate misconception diagnosis, customized course design, and adaptive learning. No experiment was conducted to evaluate the quality of the resulting map, but a series of intriguing examples clarified the method and the level of result that can be achieved.

In summary, the methods proposed by Oliveira et al. (2001) and Valerio and Leake (2006, 2008) applied natural language technology that is able to extract concept maps conforming to those proposed by Novak. The extracted maps can be used directly to guide or aid learners during concept mapping. In contrast, the algorithms presented by Chen et al. (2008) and Clariana and Koul (2004) applied statistical methods and resulted in concept maps similar to the "standard" ones. The mined concept maps can be used as scaffolds to allow efficient exploration of the document content. Finally, Tseng et al. (2007) described a distinct kind of concept map mined from historical testing records. Although not based on text content, their method can be used after our concept map driven instrument has been tested. This would yield a complementary dependency map that reflects the scientific concept dependencies perceived by the general public.

3. Mining concept maps from texts

As shown in the previous review, mining concept maps from texts could adopt either purely statistical approaches or approaches based on natural language processing techniques. The latter may lead to less ambiguous relations; however, it is more language-dependent and requires more manually maintained resources than the former. To allow processing English and Chinese texts (and possibly others, like Swedish) in the same way, we propose a special statistical approach not seen in previous concept map mining.

Fig. 1 shows the flow chart of our approach. We assume a collection of text documents, each having a coherent topic, is provided. Although our target documents are news stories, the proposed method is general enough to deal with research articles, technical reports, textbook chapters, or any learning material with focus topics. In addition, the texts can be in English or Chinese.

Having these documents, the first row in the figure mainly deals with the extraction of key terms from each of them, where the extracted key terms are recurrent words or phrases meant to denote the topical content of the document. The second row deals with the association for each key term pair, where the association calculated from each document is accumulated over all the documents for reliable relation extraction.

The technical details are described in the following sections. The rationales behind these techniques are discussed, and some examples are given to better present the proposed ideas. Evaluations of the major techniques are provided to show their effectiveness.

4. Key term extraction

To extract key terms from a text document, the basic language elements need to be identified first, which corresponds to the tokenization step in Fig. 1. In English, the basic element is a word, separated by non-alphabetic tokens such as white spaces and punctuation marks. There are exceptions, especially for abbreviations, digits, and some named entities. In Chinese, the basic element is a character, encoded in a two-byte code. There are tens of thousands of Chinese characters, about 2000-3000 of which are in common use. Since Chinese is a logographic language, each Chinese character has its own meanings and can be written in sequence without any spaces in between. Also, Chinese text may contain some English terminology, in which each letter may be either presented by the normal English alphabet or encoded in the Chinese two-byte code. All this knowledge should be crafted into a set of rules for proper tokenization.
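As a rough illustration of such tokenization rules, the following sketch (the regular expression and function name are our own, and only two token classes are handled) splits mixed Chinese/English text into English word tokens and single Chinese-character tokens; a production tokenizer would also need rules for abbreviations, digits inside names, and full-width encodings of English letters.

import re

# Illustrative only: English letter/digit runs become word tokens, each CJK
# character becomes its own token, and everything else acts as a separator.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")

def tokenize(text):
    return TOKEN_RE.findall(text)

# tokenize("DNA雙螺旋結構 was proposed in 1953.")
# -> ['DNA', '雙', '螺', '旋', '結', '構', 'was', 'proposed', 'in', '1953']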

Next, English words are subjected to morphological analysis in order to obtain their normalized representations for subsequent processing (e.g., correct occurrence counting). This often means lowercasing (or uppercasing) and stemming the words. To eliminate the variations a word may have, we use Porter's stemming algorithm (Porter, 1980), which contains a set of English morphological rules and code to apply the rules. However, the algorithm is so aggressive in removing a word's suffix that stemmed words become hard to read (e.g., "useful" becomes "us"). We therefore modify the algorithm to stem only simple plurals and general suffixes (such as the regular past tense). In Chinese, morphological variants are extremely rare, so the texts can go directly to the next step.
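A minimal sketch of such a conservative stemmer is shown below; the exact suffix rules used in the paper are not given, so the ones here are assumptions chosen only to illustrate stripping simple plurals and regular past-tense endings while keeping the stem readable.

def light_stem(word):
    # Assumed rules: lowercase, then strip only simple plural and regular
    # past-tense endings, unlike the much more aggressive Porter stemmer.
    w = word.lower()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"        # studies -> study
    if len(w) > 4 and w.endswith("ied"):
        return w[:-3] + "y"        # studied -> study
    if len(w) > 3 and w.endswith("es") and w[-3] in "sxz":
        return w[:-2]              # boxes -> box
    if len(w) > 3 and w.endswith("ed") and w[-3] != "e":
        return w[:-2]              # linked -> link
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]              # maps -> map
    return w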

Chinese words vary in length from a single character to nine or even more characters, with an average between two and three characters. A Chinese word and its constituent characters or sub-words have their own meanings, so it is sometimes ambiguous whether a Chinese term is a word or a multi-word phrase. In English, single words alone are often too general in meaning or too ambiguous to represent a concept; multi-word phrases can be more specific and desirable. Both cases suggest that multi-word phrases should be extracted as key terms, in addition to identifying important single words.

[Fig. 1 (flow chart) comprises the stages: text documents, tokenization (tokenization rules), lowercasing and stemming (morphological rules), term extraction (stop word list), term filtering, term association analysis, association accumulation, term selection, and mapping and visualization.]

Fig. 1. The flow chart of the proposed concept map mining from texts.


However, key terms have no lexical boundaries in texts, making them a challenge to identify. Other studies have used statistical approaches based on corpus-wide statistics (Chien, 1997), while ours, derived from Tseng (2002), applies a simple and fast rule-based extraction algorithm suitable for individual documents. The algorithm works with the help of a stop word list alone; other resources, such as corpora, lexicons, or dictionaries, are not required. As such, it is readily applicable to any knowledge domain without much parameter tuning or resource acquisition.

The judgement of a "key" term is more or less subjective. To facilitate automatic extraction, we define key terms in an objective way as recurrent strings in a document. More specifically, the algorithm assumes that a document concentrating on a topic is likely to mention a set of strings a number of times. A longest repeated string is thus often a correct key term, because its repetition provides sufficient evidence for the decision on its left and right boundaries. Similarly, a repeated string that subsumes the others may also be a legal key term. For example, "science curriculum" subsumes "science curriculum reform" literally, and also statistically if the former occurs more often than the latter (which is often the case).

Based on this assumption, Fig. 2 shows the algorithm. Step 1 converts the input text string into an ordered list, with each element being an English word or a Chinese character, based on the above tokenization and morphological analysis processes. Occurrence frequency is accumulated for every list element produced in the algorithm. In Step 2, elements in the list are merged, dropped, or accepted iteratively by the corresponding merging, dropping, and accepting rules until not enough elements remain to be processed. Two adjacent elements in the list are merged back into a longer input substring if both of their frequencies exceed a predefined threshold (Step 2.3.2, the merging rule). The first element of the pair is accepted as a key term candidate if it is a high-frequency term and did not merge with its preceding or following element (Step 2.3.3.1, accepting rule 1). The first element of the adjacent pair is dropped if its frequency does not exceed the predefined threshold (dropping rule 1). For those first elements that have been merged, Step 2.5 says that the first element is reclaimed and accepted as a key term candidate if the frequency of the merged pair does not exceed the predefined threshold (accepting rule 2); otherwise it is dropped (dropping rule 2).

In Step 3, the resultant candidates are subjected to a filtering process. A precision-oriented rule may remove candidates containing any stop words. A recall-oriented rule may only recursively delete the stop words from the head and tail of a candidate. Besides these, a hybrid strategy can be used (both rules are sketched below). For English, the stop words from van Rijsbergen (1979) are used. For Chinese, a list of 2680 stop words was accumulated by adding each non-semantic-bearing term produced by the algorithm as it was encountered. Since the stop list mainly contains function words, that is, prepositions, pronouns, auxiliary verbs, conjunctions, grammatical articles, or particles, it is easily maintainable: these belong to the group of closed-class words, which means that it is very uncommon for new function words to be created in texts. In other words, they are predictable and can likely be collected exhaustively. In contrast, semantic-bearing terms belong to the group of open-class words, meaning that there may be an infinite number of them.
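The two filtering rules can be expressed compactly as follows (a sketch under the assumption that each candidate is a space-separated string of words and that a stop word set is available; the function names are ours):

def precision_filter(candidates, stop_words):
    # Precision-oriented rule: drop any candidate containing a stop word.
    return [c for c in candidates
            if not any(w in stop_words for w in c.split())]

def recall_filter(candidates, stop_words):
    # Recall-oriented rule: only trim stop words from the head and tail.
    kept = []
    for c in candidates:
        words = c.split()
        while words and words[0] in stop_words:
            words.pop(0)
        while words and words[-1] in stop_words:
            words.pop()
        if words:
            kept.append(" ".join(words))
    return kept

# A hybrid strategy could, for instance, trim head and tail stop words first and
# then drop the remaining candidates whose interior still contains a stop word.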

To better understand the algorithm, Fig. 3 shows a running example, in which each capital letter denotes an English word or a Chinese character. Fig. 4 demonstrates the algorithm with a real document (the first paragraph of Novak & Cañas (2006)). As can be seen, the repeated patterns contain many noisy terms. However, after removing those containing stop words, the remaining terms are good descriptors that denote the topic of the corresponding text.

The errors of the above algorithm for Chinese mainly come from some single-character auxiliary words that often occur together with other words. Thus, before applying the above algorithm, the input text was segmented into a sequence of short words (including un-segmented characters) using a longest-match strategy based on a lexicon of 123,226 terms downloaded from the Internet (Peterson, 1998).
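A minimal sketch of such a greedy longest-match (maximum matching) segmenter, assuming the lexicon is available as a set of words and that its longest entry has nine characters:

def longest_match_segment(text, lexicon, max_len=9):
    # At each position take the longest lexicon word that matches; characters
    # not covered by the lexicon remain single-character tokens.
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in lexicon:
                tokens.append(piece)
                i += length
                break
    return tokens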

To assess the effectiveness of the algorithm, a collection of 100 Taiwan news articles was processed article by article. An average of 33 key terms (terms that occur at least twice) per article were extracted, of which an average of 11 terms (or 33%) were not covered by the lexicon of 123,226 terms. The 954 distinct new terms contained 79 illegal words, an error rate of 8.3%. Compared to the 2197 distinct extracted key terms in total, the error rate is only 3.6%. The longest new terms contain nine characters, while the shortest ones contain two characters.

1. Convert the input into a LIST with each word (or character) as a list element.
2. Do Loop
   2.1 Set MergeList to empty.
   2.2 Put a separator to the end of LIST as a sentinel and set the occurring frequency of the separator to 0.
   2.3 For I from 1 to NumOf(LIST) - 1, do
       2.3.1 If LIST[I] is the separator, go to Label 2.3.
       2.3.2 If Freq(LIST[I]) > threshold and Freq(LIST[I+1]) > threshold, then
             2.3.2.1 Merge LIST[I] and LIST[I+1] into Z.
             2.3.2.2 Put Z to the end of MergeList.
       2.3.3 else
             2.3.3.1 If Freq(LIST[I]) > threshold and LIST[I] did not merge with LIST[I-1], then save LIST[I] in FinalList.
             2.3.3.2 If the last element of MergeList is not the separator, then put the separator to the end of MergeList.
       End of For loop
   2.4 Set LIST to MergeList.
   2.5 For each element Z in MergeList created in Step 2.3.2.1, restore the first part LIST[I] from Z and save LIST[I] in FinalList if Freq(Z) <= threshold.
   Until NumOf(LIST) < 2.
3. Filter the candidates in the FinalList based on some criteria.

Fig. 2. The keyword extraction algorithm.
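To make the procedure concrete, below is a minimal Python sketch of the Fig. 2 algorithm; the span-based representation of list elements, the helper names, and the joining convention are our own choices rather than details given in the paper, and Step 3 (stop-word filtering) is omitted. Running it on the input of Fig. 3 (threshold 1, empty joiner) reproduces the final candidates A, CD, and BACD.

from collections import Counter

def extract_key_terms(tokens, threshold=1, joiner=""):
    # tokens: normalized English words (use joiner=" ") or Chinese characters
    # (joiner="") of one document; strings recurring more than `threshold`
    # times become key term candidates.
    SEP = None                                     # sentinel element

    def text(span):                                # the input substring an element covers
        start, length = span
        return joiner.join(tokens[start:start + length])

    lst = [(i, 1) for i in range(len(tokens))]     # Step 1: one element per token
    freq = Counter(text(e) for e in lst)
    final = {}

    while len(lst) >= 2:                           # Step 2
        lst.append(SEP)                            # Step 2.2: sentinel
        merge_list, new_freq = [], Counter()
        merged_with_prev = False
        for i in range(len(lst) - 1):
            cur, nxt = lst[i], lst[i + 1]
            was_merged, merged_with_prev = merged_with_prev, False
            if cur is SEP:
                continue
            if nxt is not SEP and freq[text(cur)] > threshold and freq[text(nxt)] > threshold:
                z = (cur[0], cur[1] + 1)           # Step 2.3.2: merge into a longer substring
                merge_list.append(z)
                new_freq[text(z)] += 1
                merged_with_prev = True
            else:
                if freq[text(cur)] > threshold and not was_merged:
                    final[text(cur)] = freq[text(cur)]      # accepting rule 1
                if merge_list and merge_list[-1] is not SEP:
                    merge_list.append(SEP)         # break the merge chain
        for z in merge_list:                       # Step 2.5: reclaim first parts of rare merges
            if z is not SEP and new_freq[text(z)] <= threshold:
                first = (z[0], z[1] - 1)
                final[text(first)] = freq[text(first)]      # accepting rule 2
        lst, freq = merge_list, new_freq           # Step 2.4
    return final

# extract_key_terms(list("BACDEAFCDGBACD")) -> {'A': 3, 'CD': 3, 'BACD': 2}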

Example: Given an input string: BACDEAFCDGBACD. Let threshold = 1, separator = x.
Step 1: Create a list of single tokens:
  LIST = (B:2, A:3, C:3, D:3, E:1, A:3, F:1, C:3, D:3, G:1, B:2, A:3, C:3, D:3, x)
Step 2:
  After 1st iteration: MergeList = (BA:2, AC:2, CD:3, x, CD:3, x, BA:2, AC:2, CD:3, x); FinalList = (A:3)
  After 2nd iteration: MergeList = (BAC:2, ACD:2, x, BAC:2, ACD:2, x); FinalList = (A:3, CD:3)
  After 3rd iteration: MergeList = (BACD:2, x, BACD:2, x); FinalList = (A:3, CD:3)
  After 4th iteration: MergeList = (x); FinalList = (A:3, CD:3, BACD:2)

Fig. 3. A running example of the algorithm, where the number following a colon denotes the occurring frequency of the associated string.

The Theory Underlying Concept Maps and How to Construct Them. Concept maps are graphical tools for organizing and representing knowledge. They include concepts, usually enclosed in circles or boxes of some type, and relationships between concepts indicated by a connecting line linking two concepts. Words on the line, referred to as linking words or linking phrases, specify the relationship between the two concepts. We define concept as a perceived regularity in events or objects, or records of events or objects, designated by a label. The label for most concepts is a word, although sometimes we use symbols such as + or %, and sometimes more than one word is used. Propositions are statements about some object or event in the universe, either naturally occurring or constructed. Propositions contain two or more concepts connected using linking words or phrases to form a meaningful statement. Sometimes these are called semantic units, or units of meaning. Figure 1 shows an example of a concept map that describes the structure of concept maps and illustrates the above characteristics.

The repeated patterns extracted (Step 2): 1. or : 11, 2. concept : 11, 3. the : 8, 4. a : 6, 5. word : 5, 6. and : 5, 7. of : 5, 8. concept map : 4, 9. map : 4, 10. link : 4, 11. two : 3, 12. in : 3, 13. event : 3, 14. sometime : 3, 15. object : 3, 16. are : 3, 17. to : 3, 18. as : 3, 19. relationship between : 2, 20. unit : 2, 21. event or object : 2, 22. statement : 2, 23. for : 2, 24. proposition : 2, 25. label : 2, 26. some : 2, 27. we : 2, 28. is : 2, 29. by a : 2, 30. two concept : 2, 31. phrase : 2, 32. connect : 2, 33. link word or : 2, 34. between : 2, 35. concept map and : 2, 36. more : 2, 37. line : 2

The filtered key terms (Step 3): 1. concept : 11, 2. word : 5, 3. concept map : 4, 4. map : 4, 5. link : 4, 6. event : 3, 7. object : 3, 8. unit : 2, 9. event or object : 2, 10. statement : 2, 11. proposition : 2, 12. label : 2, 13. phrase : 2, 14. connect : 2, 15. line : 2

Fig. 4. A key term extraction example using the first paragraph of Novak & Cañas (2006). The first list shows the result from Step 2 of the algorithm; the second list shows the result from Step 3.


In our breakdown analysis, the above (linear-time) longest-match pre-processing contributes an improvement of 10.1% in the accuracy rate, showing that the algorithm in Fig. 2 alone is effective enough for non-critical applications.

As to the efficiency of the algorithm, it takes O(kn) time to extract at most n/(2k) key terms, each having a length of up to k English words or Chinese characters, where n is the length of the input string. In practice, the time required is far less than kn steps, as can be seen from Fig. 3. To the best of our knowledge, we have not seen any similar technique that requires the same level of resources to achieve similar effectiveness and efficiency.

5. Term association analysis

Traditional methods for term association analysis between any two terms are based on the number of times the two terms co-occur in the same documents. Various indices or coefficients have been used to calculate the association, as shown in Fig. 5 (He, 1999; Salton, 1989). However, the time complexity is high: if there are m terms from n documents, the time complexity is in the order of O(nm²), where n steps are required to calculate the co-occurrence of each of the m² term pairs.

Inclusion Index:      I(Tj, Tk) = nj,k / min(nj, nk)
Proximity Index:      P(Tj, Tk) = (nj,k / (nj × nk)) × n
Equivalence Index:    E(Tj, Tk) = (nj,k / nj) × (nj,k / nk)
Dice Coefficient:     Dice(Tj, Tk) = 2 × nj,k / (nj + nk)
Jaccard Coefficient:  Jaccard(Tj, Tk) = nj,k / (nj + nk - nj,k)
Cosine Similarity:    Cos(Tj, Tk) = nj,k / sqrt(nj × nk)

Fig. 5. Various association measures for two terms, where nj,k denotes the number of documents in which both terms Tj and Tk occur, and nj (nk) denotes the number of documents in which Tj (Tk) occurs regardless of Tk (Tj).
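For reference, the coefficients in Fig. 5 translate directly into code; n_jk, n_j, n_k, and n follow the definitions in the caption (document-level co-occurrence counts).

import math

def inclusion(n_jk, n_j, n_k):    return n_jk / min(n_j, n_k)
def proximity(n_jk, n_j, n_k, n): return n_jk / (n_j * n_k) * n
def equivalence(n_jk, n_j, n_k):  return (n_jk / n_j) * (n_jk / n_k)
def dice(n_jk, n_j, n_k):         return 2 * n_jk / (n_j + n_k)
def jaccard(n_jk, n_j, n_k):      return n_jk / (n_j + n_k - n_jk)
def cosine(n_jk, n_j, n_k):       return n_jk / math.sqrt(n_j * n_k)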


To deal with the time complexity problem, we limit the terms to be associated to those that co-occur in the same logical segment of a smaller text size, such as a sentence. Association weights are computed in this way for each document and then accumulated over all documents. This reduces the cost to roughly O(nsk²), where k is the average number of selected key terms per document and s is the average number of sentences in a document. The larger the m, the bigger the difference between O(nsk²) and O(nm²), because k can be kept below a constant, and so can s by breaking large documents into smaller ones.

Specifically, key terms identified from each document are first sorted in decreasing order of their term frequencies, or by another criterion such as TF×IDF (term frequency multiplied by the inverse document frequency) if the statistics of the entire collection are known in advance. Then the first k terms (k = 30 in our implementation) are selected for association analysis. A modified Dice coefficient was chosen to measure the association weights as:


wgt(Tj(i), Tk(i)) = [2 × sjk(i) / (sj(i) + sk(i))] × ln(1.72 + s(i))

where s(i) denotes the total number of sentences in document Di and sjk(i) denotes the number of sentences in document Di in which both terms Tj and Tk occur. The first factor in the above formula is simply the Dice coefficient. The second factor, ln(1.72 + s(i)), where ln is the natural logarithm, compensates for the weights of terms in longer documents so that weights in documents of different lengths have a similar range of values; this is because longer documents tend to yield weaker Dice coefficients than shorter ones. Association weights larger than a threshold (1.0 in our implementation) are then accumulated over all the documents in the following manner:

sim(Tj, Tk) = [log(wk × n / nk) / log(n)] × Σ(i = 1..n) wgt(Tj(i), Tk(i))

where the first factor is a form of IDF, in which nk is the document frequency of term Tk and wk is the width of Tk (i.e., its number of constituent words), used to emphasize longer terms. Compared to the indices in Fig. 5, which are all symmetric, the IDF factor makes the above similarity asymmetric, which better reflects most real-world phenomena (e.g., in Earth science, the Moon's most associated term may be Earth, but not vice versa).
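A minimal sketch of how the per-document weights and the accumulated asymmetric similarity might be computed is given below; it assumes each document is represented as a list of sentences, each sentence being the set of key terms it contains, and that document frequencies and term widths are supplied separately. All names are illustrative.

import math
from collections import defaultdict

def wgt(term_j, term_k, sentences):
    # Modified Dice weight of (term_j, term_k) within one document.
    s_i  = len(sentences)
    s_j  = sum(1 for s in sentences if term_j in s)
    s_k  = sum(1 for s in sentences if term_k in s)
    s_jk = sum(1 for s in sentences if term_j in s and term_k in s)
    if s_j + s_k == 0:
        return 0.0
    return 2.0 * s_jk / (s_j + s_k) * math.log(1.72 + s_i)

def accumulate_sim(docs, doc_freq, width, n_docs, threshold=1.0):
    # Per-document weights above `threshold` are summed over all documents and
    # scaled by an IDF-like factor of the associated term Tk, which makes the
    # resulting similarity asymmetric.
    acc = defaultdict(float)                         # (Tj, Tk) -> summed wgt
    for sentences in docs:
        terms = set().union(*sentences) if sentences else set()
        for tj in terms:
            for tk in terms:
                if tj == tk:
                    continue
                w = wgt(tj, tk, sentences)
                if w > threshold:
                    acc[(tj, tk)] += w
    sim = {}
    for (tj, tk), total in acc.items():
        idf = math.log(width[tk] * n_docs / doc_freq[tk]) / math.log(n_docs)
        sim[(tj, tk)] = idf * total
    return sim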

Accumulation of the above association weights document by document can be carried out efficiently as the inverted index file of a retrieval system is constructed for the entire collection. Specifically, association weights and associated terms from each document are accumulated for the extracted key terms just as index terms accumulate their document frequencies and postings (i.e., the document ID and TF for each document in which the key term occurs). In this way, a global term relation structure can be obtained based on fast indexing algorithms from information retrieval (Frakes & Baeza-Yates, 1992; Salton, 1989).

To show the efficiency, our algorithm, implemented in C++, takes only 150 s to identify key terms, extract term relationships, and build the document index for a collection of 25,230 Chinese news articles (40 MB of pure text in total) on a notebook computer with an Intel T2500 2 GHz CPU and 2 GB RAM. The resulting global term relation structure contains 44,518 key terms with a total of 191,674 relations.

To assess the general quality of the extracted associations, 30 terms were selected from the key terms of the 25,230 Chinese news articles. Five assessors (all with majors in library science) were invited for relevance judgment. For each key term, its top N (N = 50) associated terms, sorted in decreasing order of similarity, were examined. Assessors were asked to judge whether the relationship between a key term and each of its associated terms was relevant enough, based on their indexing training in library science. For example, the terms 'Earth' and 'eclipse' were judged relevant to 'Moon', while 'shadow' was not. If they were not sure (mostly due to a lack of knowledge about the terms), they were advised to retrieve the documents that might explain the associations. The results show that, on average, 69% of the associated terms were judged relevant to their key terms based on the asymmetric similarity (with the IDF factor shown in the above formula). In contrast, this percentage is only 62% for the symmetric similarity (without the IDF factor). As can be seen, the asymmetric similarity has superior effectiveness. In another similar experiment with a much larger collection (154,720 documents), the percentage of relatedness increased to 78% for the asymmetric similarity. It turns out that the more documents analyzed, the better the effectiveness of the term association analysis, a phenomenon commonly observed in robust data mining algorithms.

As an example, Fig. 6 shows a Chinese news article whose title reads "Islet cell transplantation - a major breakthrough for diabetes treatment". The 2-D graph representation was produced with the example code under the Java installation directory. In the figure, the terms "islet cell" and "diabetes" are shown in italic (red) and boldface (blue) in the text, respectively.¹ The sentences in which they co-occur are underlined, and hence their association weight is 2 × 4/(7 + 6) × ln(1.72 + 8) = 0.6154 × 3.2810 = 2.02 (here the number of sentences in which a term occurs is approximated by its TF). The weight is greater than the threshold of 1, so the relation is extracted from this document.

¹ For interpretation of color in Fig. 6, the reader is referred to the web version of this article.

Fig. 6. An example showing a Chinese news article, its key terms, and the term relations, where the title reads "Islet cell transplantation - a major breakthrough for diabetes treatment". Note: for aesthetic reasons, some lower-weight links between the key terms were deleted before visualization.


The above mining process in fact results in three data structures ready for building the concept maps: the key term list itself, the global term relation structure, and the inverted index file for retrieving the snippets related to any two co-occurring terms. Since these data structures hold the full information for each individual document, they can be used for concept mapping a single document, a set of documents, or the whole collection, to scaffold those who want to understand the document content in an efficient and effective way. The next section applies the idea of concept map mining over the whole document collection to help develop the desired SLiM instrument.

6. SLiM instrument construction

As mentioned, the basic idea for developing the instrument to investigate the public's SLiM is to use the scientific terms matched between textbooks and news media. The scientific terms from textbooks were based upon the complete indexes of four Taiwanese junior high textbooks (2001 version) about nature, life, and technology from four different publishers. A total of 3657 index terms were listed from the textbooks. These indexes are provided by the National Institute for Compilation and Translation in Taiwan.

The media terms were based on the complete news reports (covering all subjects) of a Taiwanese newspaper, the United Daily News, during the years 2001 to 2002 (i.e., corresponding to the time of the printing of the textbooks). In total, 901,446 documents (1 GB of pure text in total) from 2000/01/01 to 2001/12/31 were subjected to the processes of key term identification, term relation extraction, and document indexing. This resulted in 1,082,937 key terms (including scientific and non-scientific terms). After filtering out those without associated terms, 323,918 key terms remained, which were normally topic-relevant; those removed were often rare terms or outliers. Finally, the key terms were matched against the 3657 textbook terms, resulting in a list of 876 terms.

To obtain a manageable set of terms for SLiM instrument development, the most frequent 100 terms (each occurring in at least 2950 news articles) from the 876 terms were examined, together with their related terms, by two experts. After excluding non-scientific terms (e.g., e-mail, computer, Web, and so on), 39 key terms were selected. Together with their related terms, a set of 95 terms resulted from this selection process.
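The selection steps just described can be summarized in code-like form (a sketch only: the expert judgment that removes non-scientific terms cannot be automated and is represented here by a given exclusion set, and all names are illustrative):

def select_slim_terms(doc_freq, relations, textbook_terms, excluded, top_n=100):
    # 1. Keep only mined key terms that also appear in the textbook indexes.
    matched = [t for t in doc_freq if t in textbook_terms]
    # 2. Rank by document frequency and keep the most frequent terms.
    frequent = sorted(matched, key=lambda t: doc_freq[t], reverse=True)[:top_n]
    # 3. Remove the terms the experts judged non-scientific.
    selected = [t for t in frequent if t not in excluded]
    # 4. Expand with their related terms to form the final term set.
    final = set(selected)
    for t in selected:
        final.update(relations.get(t, []))
    return final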

These terms and their relations were mapped by a visualization tool (Nooy, Mrvar, & Batagelj, 2005). Fig. 7 shows the map, where each term was translated into English by the Google translation service, with a few terms manually corrected. To reflect the way the terms are used in the media, the node size corresponds to the term's document frequency, while the node color approximately reflects the subject to which the term belongs. In this way, the map depicts the major scientific concepts and their relations discovered in the public media.

A set of salient cliques representing different subjects can be readily identified, such as the astronomy topic and the energy issue in the upper left area. The large clique along the diagonal can be partitioned into several subjects by the term nodes where multiple subjects intersect. It is interesting to note that the terms bridging different subjects can reveal important concepts that cross disciplines. For example, 'temperature' is a basic physics concept and also a governing term in climatology under earth science.

Furthermore, the terms in the same clique provide clearer contexts for each other, disambiguating their meanings better than when they are listed individually. For example, the Chinese term 'ZuZhi' can mean (and can be translated into) either 'organization' or 'tissue'. Based on its neighbours in the map (e.g., 'cell' and 'organ'), we decided that 'tissue' is the more appropriate meaning (and translation).

Based on this concept map, the SLiM instrument was developed by a group of professors, post-doctoral researchers, and high school teachers, all concerned with science education. During the course of instrument development, the map turned out to be a convenient tool for item classification, developer collaboration, and expert review and discussion.

For each group of neighbouring terms in the map, an item was developed. Each item consists of a multiple-choice question with four alternatives, the last alternative being "I don't know". The test persons were asked to click "I don't know" rather than guess if they had no idea which alternative was the correct answer. The rationale for this was to avoid correct responses by pure chance.

Fig. 7. The concept map of scientific terms mined from the news media.


As the terms were identified from their occurrence in media, the test items were developed with the intention to provide context and daily-life-related questions, thus probing whether the reader could understand the meaning and use of a term in a daily-life context. The most commonly co-occurring terms were used to provide the multiple-choice alternatives. However, if some linked terms were too difficult to develop into a context-based item, conceptual items were developed instead. Examples of contextual and conceptual items are shown in Fig. 8.

The contextual example was based on the linked terms in the upper right area of the map: 'structure' and 'fatty acid'. When developing this item, the two terms were used to form a query to retrieve relevant snippets. These two terms seem unrelated at first sight (they are among the few that seem most unrelated in Fig. 7), but they do connect to an interesting nutrition fact worth knowing. As can be seen from the search result in Fig. 9, there are discussions in media about how consumers should choose different food products according to their content of saturated and unsaturated fatty acids. For example, the first article, entitled "Sunflower oil - another alternative to olive oil", reveals in its retrieved snippet that "... squeezed oils have their own nutrition structures and fatty acids (see table). Traditional sunflower oil has unsaturated ...". The second article discusses how the fatty acid in fish oil can change the structure of the cell membrane to prevent diseases. The fourth article suggests choosing oils based on their fatty acids. This made it easy for us to create a contextually based test item connected to these discussions.

In the conceptual example, the keyword 'molecule' was combined with the co-occurring terms 'atom' and 'cell'. These terms refer to basic concepts in science, which may be important in daily life as background knowledge that helps to structure information, rather than having any direct application. The question about their scale relationship could therefore be viewed as testing basic conceptual knowledge rather than the ability to apply this knowledge in a daily-life-related context.

In total, 50 items were generated from the 95 terms and their relations. According to the subject distribution of terms shown in the Appendix, the subject of biology was represented by the highest number of terms (45.26%), earth science was second (37.90%), and the subject areas of physics (11.58%) and chemistry (5.26%) had the lowest representation. Following the term distribution of each subject field, the number of items relating to each subject area (biology 22, earth science 19, physics 6, and chemistry 3) mirrored the occurrence of scientific terms from the respective subject area in the media.

Ex1. Contextual item
Why do we recommend consumers to use vegetable oils, like olive oil, instead of oil from animals, such as butter?
Because it is natural
Because it contains unsaturated fatty acids
Because it contains saturated fatty acids
I don't know

Ex2. Conceptual item
Check the correct order from smaller to bigger:
Molecules-Cells-Atoms
Molecules-Atoms-Cells
Atoms-Molecules-Cells
I don't know

Fig. 8. Two examples showing the contextual and conceptual items.

Fig. 9. The top five documents and their snippets retrieved by the query (‘‘structure and fatty acid”) to suggest the item construction.


Specifically, it was found that biological terms were very often related to health (e.g., diabetes) and the human body (e.g., heart). In earth science, the terms were often related to local phenomena, such as typhoons and earthquakes, which occur in Taiwan from time to time. New scientific discoveries were also often presented, e.g., relating to biotechnology and genetic engineering in biology. Environmental issues such as global warming, and celestial phenomena like comets and asteroids, were regularly presented in earth science. In physics and chemistry, the terms were related to technical applications, such as diesel engines, batteries, and thermal conductors.

The test items were sent out for validation by eight experts in different areas of science. All of them hold a doctoral degree in a specific science field and are currently professors teaching undergraduate and graduate courses. Furthermore, these experts have experience related to secondary school education, such as writing and/or reviewing secondary science textbooks. The items, together with the mined concept map, were presented to the experts for content validation, and the items were then revised according to the experts' comments. After that, we invited three main groups of 7th graders, 10th graders, and undergraduates to represent citizens who had received an elementary, junior high, and senior high education, respectively, in order to examine the reliability of the SLiM test. A total of 1034 participants answered the SLiM questionnaires; among them, 954 (92.3%) were valid samples. The reliability of SLiM based on the valid samples ranges from 0.60 (Guttman's Lambda 1, the most conservative estimate) to 0.85 (Kuder-Richardson formula 20, KR-20), which is in the acceptable range (Fraenkel & Wallen, 1993) for an exploratory measure such as the SLiM used in the current study. The difficulty of the 50 SLiM items ranges from 0.19 to 0.91, and their discrimination powers range from 0.1 to 0.59.
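For readers less familiar with these classical test statistics, the following hedged sketch computes KR-20 reliability, item difficulty, and an upper-lower discrimination index from a binary (0/1) response matrix. It is not the authors' analysis script, and the 27% group split for discrimination is a common convention assumed here.

import numpy as np

def kr20(responses: np.ndarray) -> float:
    """Kuder-Richardson formula 20 for a (persons x items) matrix of 0/1 scores."""
    k = responses.shape[1]
    p = responses.mean(axis=0)                      # per-item proportion correct
    total_var = responses.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def item_stats(responses: np.ndarray, tail: float = 0.27):
    """Per-item difficulty and upper-lower (27%) discrimination index."""
    order = np.argsort(responses.sum(axis=1))
    n = max(1, int(tail * responses.shape[0]))
    low, high = responses[order[:n]], responses[order[-n:]]
    return responses.mean(axis=0), high.mean(axis=0) - low.mean(axis=0)

# demo = (np.random.default_rng(0).random((954, 50)) > 0.5).astype(int)  # stand-in data, not the SLiM responses
# print(kr20(demo)); print(item_stats(demo))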

After the above expert review and pilot testing, the instrument is intended to be used in a future nationwide civic SLiM measurement.

7. Conclusions

Promoting science literacy is one of the ultimate goals of science education, and developing an instrument to measure civic scientific literacy in media (SLiM) for feedback can help achieve this goal. This paper focuses on reporting a novel approach to developing the instrument. Through this approach, we could easily visualize the science textbook terms and their relations presented in news media and accordingly develop an instrument reflecting the scientific knowledge revealed in daily news stories, thus meeting the need for measuring civic scientific literacy in media.

Our SLiM instrument development is based on the framework of concept maps. As previous studies have shown, concept maps are widely used not only in instruction and learning but also in curriculum development and assessment. Novak envisioned that "perhaps in time even national achievement exams will utilize concept mapping as a powerful evaluation tool" (Novak, 1995; Novak & Cañas, 2006). Although we did not apply concept mapping to this kind of task (nationwide civic SLiM measurement) in a direct way, we applied the concept map framework (the concepts, the relations, the maps, and the contexts) to develop the instrument. This not only echoes Novak's vision but also demonstrates another viable way of conducting assessment based on concept maps.

The ability to mine concept maps from text documents or learning materials has several advantages for concept map applications. In addition to facilitating online self-learning, it provided an easy means for item development, organization, and validation in our case. For example, based on the contexts associated with the mined concepts, item development is efficient (the contexts suggest content), the items developed are timely (reflecting the current applications and state of the art reported in media), and their topic distribution manifests the daily use of scientific terms in media. Subsequently, once all the current contexts have been exhausted in devising items, new documents can be added and mined to provide additional suggestions. As such, the concept map driven SLiM instrument can be developed periodically to reflect civic science literacy on a timely basis for longitudinal comparison.

This work details the technology used to mine concept maps from texts. The presented algorithms are statistical approaches. Compared to approaches based on traditional language parsing technology, they use very limited resources that require manual maintenance and are thus readily applicable to many knowledge domains. Furthermore, these algorithms are language independent (feasible for both English and Chinese, as demonstrated) and thus easily implemented in a wide variety of regions of the world. They have been applied to the small examples shown in previous sections for validation and to large-scale cases for rigorous evaluation, where sufficient effectiveness and efficiency have been observed.

The concept map mined from the news collection proves to be useful. Each concept (a single term or a group of linked terms) can be constructed into a SLiM item, either contextually or conceptually. From this perspective, its usability reaches 100% if evaluated by a quantitative measure.


In conclusion, this paper contributes by proposing a practical concept map mining methodology, presenting an innovative concept map driven item development scenario, and demonstrating their feasibility and usability. A future direction would be to apply these ideas to more tasks, such as guided concept mapping for adaptive online learning.

Acknowledgement

This work was supported in part by the National Science Council of Taiwan under Grant number NSC 98-2631-S-003-002.

Appendix A

The classification and distribution of the developed SLiM items.

Subject          No. of terms    % of total terms    No. of items developed    Item no.
Biology               2                2.105%                   1               1
                      2                2.105%                   1               2
                      2                2.105%                   1               3
                      4                4.211%                   2               4, 10
                     33               34.737%                  17               5–9, 11–22
Earth science        10               10.526%                   6               23–28
                      6                6.316%                   3               29–31
                     20               21.053%                  10               32–41
Physics               6                6.316%                   3               42–44
                      3                3.158%                   2               45–46
                      2                2.105%                   1               47
Chemistry             2                2.105%                   1               48
                      3                3.158%                   2               49–50
TOTAL                95              100%                      50

References

Brossard, D., & Shanahan, J. (2006). Do they know what they read? Building a scientific literacy measurement instrument based on science media coverage. Science Communication, 28(1), 47–63.

Cañas, A. J., Hill, G., Carff, R., Suri, N., Lott, J., Eskridge, T., et al. (2004). CmapTools: A knowledge modeling and sharing environment. Paper presented at the first international conference on concept mapping, Pamplona, Spain.

Chang, S.-N. (2007). Externalizing students' mental models through concept maps. Journal of Biological Education, 41(3), 107–112.

Chang, C. Y., Chang, Y. H., & Yang, F. Y. (2009). Exploring secondary science teachers' perceptions on the goals of earth science education in Taiwan. International Journal of Science Education, 31(17), 2315–2334.

Chang, K. E., Sung, Y. T., & Chen, S. F. (2001). Learning through computer-based concept mapping with scaffolding aid. Journal of Computer Assisted Learning, 17(1), 21–33.

Chen, N. S., Kinshuk, Wei, C. W., & Chen, H. J. (2008). Mining e-learning domain concept map from academic articles. Computers & Education, 50(3), 1009–1021.

Chien, L.-F. (1997). PAT-tree-based keyword extraction for Chinese information retrieval. ACM SIGIR Forum, 31, 50–58.

Clariana, R. B., & Koul, R. (2004). A computer-based approach for translating text into concept map-like representations. Paper presented at the first international conference on concept mapping, Pamplona, Spain.

De Boer, G. E. (2000). Scientific literacy: Another look at its historical and contemporary meanings and its relationship to science education reform. Journal of Research in Science Teaching, 37(6), 582–601.

Dicheva, D., & Dichev, C. (2006). TM4L: Creating and browsing educational topic maps. British Journal of Educational Technology, 37(3), 391–404.

Fraenkel, J. R., & Wallen, N. E. (1993). How to design and evaluate research in education (2nd ed.). New York: McGraw-Hill.

Frakes, B., & Baeza-Yates, R. (1992). Information retrieval: Data structures & algorithms. Prentice-Hall.

He, Q. (1999). Knowledge discovery through co-word analysis. Library Trends, 48(1), 133–159.

Herl, H. E., O'Neil, H. F., Chung, G. K. W. K., & Schacter, J. (1999). Reliability and validity of a computer-based knowledge mapping system to measure content understanding. Computers in Human Behavior, 15(3–4), 315–333.

Hopkins, L. T. (1925). A study of magazine and newspaper science articles with relation to courses in sciences for high schools. School Science and Mathematics, 25, 793–800.

Horton, P. B. (1993). An investigation of the effectiveness of concept mapping as an instructional tool. Science Education, 77(1), 95–111.

Kinchin, I. M. (2000). Concept-mapping activities to help students understand photosynthesis – and teachers understand students. School Science Review, 82(299), 11–14.

Koelsche, C. (1965). Scientific literacy as related to the media of mass communication. School Science and Mathematics, 65, 719–725.

Markham, K. M., Mintzes, J. J., & Jones, M. G. (1994). The concept map as a research and evaluation tool: Further evidence of validity. Journal of Research in Science Teaching, 31(1), 91–101.

Martin, W. E. (1945). A chronological survey of published research studies relating to biological materials in newspapers and magazines. School Science and Mathematics, 45, 543–550.

Nooy, W. D., Mrvar, A., & Batagelj, V. (2005). Exploratory social network analysis with Pajek. Cambridge University Press.

Novak, J. D. (1990). Concept maps and vee diagrams: Two metacognitive tools to facilitate meaningful learning. Instructional Science, 19(1), 29–52.

Novak, J. D. (1995). Concept mapping: A strategy for organizing knowledge. Mahwah, NJ: Lawrence Erlbaum Associates.

Novak, J. D., & Cañas, A. J. (2006). The theory underlying concept maps and how to construct them. Pensacola: Florida Institute for Human and Machine Cognition.

Novak, J. D., & Gowin, D. B. (1984). Learning how to learn. Cambridge: Cambridge University Press.

O'Donnell, A. M., Dansereau, D. F., & Hall, R. H. (2002). Knowledge maps as scaffolds for cognitive processing. Educational Psychology Review, 14(1), 71–86.

Okebukola, P. A. (1992). Concept mapping with a cooperative learning flavor. American Biology Teacher, 54(4), 218–221.

Oliveira, A., Pereira, F. C., & Cardoso, A. (2001). Automatic reading and learning from text. Paper presented at the international symposium on artificial intelligence, Kolhapur, India.

Pepper, S. (2000). The TAO of topic maps. Paper presented at XML Europe 2000, Paris, France.

Peterson, E. (1998). Chinese segmenter and annotation tool. <http://www.mandarintools.com/segmenter.html> Retrieved 25.10.2009.

Porter, M. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.

Salton, G. (1989). Automatic text processing: The transformation, analysis, and retrieval of information by computer. MA: Addison-Wesley.

Soyibo, K. (1995). Using concept maps to analyze text book presentations of respiration. American Biology Teacher, 57(6), 344–351.

Stewart, J., Vankirk, J., & Rowell, R. (1979). Concept maps: A tool for use in biology teaching. American Biology Teacher, 41(3), 171–175.

Stoddart, T., Abrams, R., Gasper, E., & Canaday, D. (2000). Concept maps as assessment in science inquiry learning – a report of methodology. International Journal of Science Education, 22, 1221–1246.

Tsai, C. C., Lin, S. S. J., & Yuan, S. M. (2001). Students' use of web-based concept map testing and strategies for learning. Journal of Computer Assisted Learning, 17(1), 72–84.

Tseng, Y.-H. (2002). Automatic thesaurus generation for Chinese documents. Journal of the American Society for Information Science and Technology, 53(13), 1130–1138.

Tseng, S. S., Sue, P. C., Su, J. M., Weng, J. F., & Tsai, W. N. (2007). A new approach for constructing the concept map. Computers & Education, 49(3), 691–707.

Valerio, A., & Leake, D. (2006). Jump-starting concept map construction with knowledge extracted from documents. Paper presented at the second international conference on concept mapping, San José, Costa Rica.

Valerio, A., & Leake, D. B. (2008). Associating documents to concept maps in context. Paper presented at the third international conference on concept mapping, Finland.

van Rijsbergen, K. (1979). Information retrieval. <http://www.dcs.gla.ac.uk/Keith/Chapter.2/Table_2.1.html> Retrieved 25.10.2009.

Villalon, J. J., & Calvo, R. A. (2008). Concept map mining: A definition and a framework for its evaluation. Paper presented at the IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, Los Alamitos, CA, USA.