A Personalized and Collaborative eLearning Materials Recommendation Scenario using Ontology-based Data Matching Strategies

Ioana Ciuciu and Yan Tang

Semantics Technology and Applications Research Laboratory, Department of Computer Science, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium
{iciuciu, yan.tang}@vub.ac.be

Abstract. We propose a virtual teacher for the evaluation of students’ competencies. It aims to improve learning by making personalized suggestions on the learning materials. It is based on three main components: 1) a semantically enriched content management system (CMS), playing the role of knowledge base, 2) a 3D anatomy browser and 3) an ontology-based matching strategy called Controlled Fully Automated Ontology Based Matching Strategy (C-FOAM), providing the evaluation methodology. Together with the collaborative knowledge base, which allows knowledge to be represented in natural language and to be further reused, the evaluation methodology becomes the main contribution of the paper. The approach is demonstrated on a learning scenario illustrated around 3D anatomical structures.

Keywords: Ontology, Ontology-based Data Matching, Evaluation Methodology, Knowledge Management, E-learning.

1 Introduction

The present study focuses on semantically rich medical data annotations for eLearning. The goal is to create an intelligent system that evaluates students using the Controlled Fully Automated Ontology Based Matching Strategy (C-FOAM), developed in the EC FP6 Prolix project, together with an accompanying evaluation methodology. The intelligent system makes personalized suggestions on the learning materials according to the similarity scores found by C-FOAM when comparing the students’ answers with the knowledge base.

This paper focuses on a collaborative knowledge base in natural language and on the evaluation methodology. The study begins with a background on ontology engineering in Section 2. It continues with the three components of the virtual teacher: the CMS, introduced in Section 3; the visual framework, introduced in Section 4; and the matching strategy, introduced in Section 5. The learning scenario and the interpretation of the results are presented in Section 6. Section 7 discusses related work. Section 8 concludes the paper and discusses future research ideas emerging from it.

2 Background

The knowledge in this study (e.g., learning materials) is grounded in natural language, following the paradigm of Developing Ontology Grounded Methodology and Applications (DOGMA) [1, 2]. In DOGMA, the ontology is two-layered, in order to make the reuse of facts easier. It is separated into 1) a lexon base layer and 2) a commitment layer.

A lexon is a quintuple $\langle \gamma, t_1, r_1, r_2, t_2 \rangle$, where $t_1, t_2 \in T$, $r_1, r_2 \in R$ and $\gamma \in \Gamma$. $T$ is a finite set of terms; $t_1$ and $t_2$ represent two concepts in a natural language (e.g., English). $R$ is a finite set of roles; $r_1$ and $r_2$ ($r_1$ corresponds to “role” and $r_2$ corresponds to “co-role”) refer to the relationships that the concepts share with respect to one another. $\gamma$ is a context identifier and refers to a context, which serves to disambiguate the terms $t_1$, $t_2$ into the intended concepts, and in which they become meaningful. For example, $\langle \text{human anatomy}, \text{lower limb}, \text{has part}, \text{is part of}, \text{muscle} \rangle$ states the fact that “in the context of human anatomy, the lower limb has as part the muscle and muscle is a part of the lower limb”. This example is depicted in Fig. 1.

Fig. 1. A lexon example
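A lexon can be held in a small immutable record. The following is a minimal sketch, assuming a Python representation; the class name `Lexon`, its field names and the exact wording of the role labels are illustrative choices and not part of DOGMA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexon:
    """One lexon: context, head term, role, co-role, tail term."""
    context: str
    head: str
    role: str
    co_role: str
    tail: str

    def read_forward(self) -> str:
        # Verbalize the fact from head to tail using the role.
        return f"{self.head} {self.role} {self.tail}"

    def read_backward(self) -> str:
        # Verbalize the same fact from tail to head using the co-role.
        return f"{self.tail} {self.co_role} {self.head}"


# The lower limb / muscle example; the role wording is an assumption.
example = Lexon("human anatomy", "lower limb", "has part", "is part of", "muscle")
print(example.read_forward())
print(example.read_backward())
```

Because the record is frozen, lexons are hashable and can be stored in sets, which is convenient when lexon sets are compared later.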

A commitment contains a constraint on a set of lexons. For instance, we can apply the following constraint to the above lexon: “each muscle belongs to at most one lower limb”. Commitments are specified in a language such as OWL [3] or SDRule [4].
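To make the idea of a constraint concrete, the sketch below checks an “at most one” rule over instance-level facts in plain Python. It is not OWL or SDRule; the function name `violates_at_most_one` and the sample facts are hypothetical and only illustrate what such a commitment rules out.

```python
from collections import defaultdict


def violates_at_most_one(facts, role):
    """Return subjects linked through `role` to more than one distinct object.

    `facts` is an iterable of (subject, role, object) instance-level triples.
    """
    targets = defaultdict(set)
    for subject, fact_role, obj in facts:
        if fact_role == role:
            targets[subject].add(obj)
    return sorted(s for s, objs in targets.items() if len(objs) > 1)


# Hypothetical instance data; the last fact deliberately breaks the constraint.
facts = [
    ("popliteus", "is part of", "left lower limb"),
    ("plantaris", "is part of", "left lower limb"),
    ("plantaris", "is part of", "right lower limb"),
]
violations = violates_at_most_one(facts, "is part of")
print(violations)
```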

3 Collaborative Semantic Data Annotation

The collaborative CMS [5, 6] provides communities with an ontology-based platform for knowledge sharing and storage. The data stored in the CMS (images, videos, publications, etc.) is assigned meaning through annotations. Annotating the data is a social, collaborative process, and the tool we propose is designed to support it. It collects different parts of knowledge, expressed in natural language, from domain experts in the form of on-line templates. The knowledge captured by the templates can be made machine-readable by transforming it into the corresponding RDF(S) classes and properties.

In this study, the CMS serves as the knowledge base for the visualization framework and for the ontology-based matching strategies, providing a way of representing, communicating and sharing knowledge for learning purposes. C-FOAM processes knowledge in the lexon format. Therefore, we derive lexons from the annotations and input them into C-FOAM in order to find the scores.

Fig. 2. Learning materials stored in the CMS: an image (left); a video (right).

Table 1 and Table 2 provide a partial conceptualization, in terms of lexons, of the two anatomical structures represented in Fig. 2. The lexons are inferred from the annotations of the learning materials (scientific publications, books, etc.) associated with these anatomical structures in the CMS.

Table 1. Lexons derived from the learning materials for the Extensor Hallucis Longus Tendon.

| Head term | Role | Co-role | Tail term |
|---|---|---|---|
| Extensor Hallucis Longus Tendon | part of | part | Tendons Of Lower Leg |
| Extensor Hallucis Longus Tendon | affected by | affects | Rupture |
| Extensor Hallucis Longus Tendon | reconstructed with | reconstructs | Gracilis Tendon Autograft |
| Extensor Hallucis Longus Tendon | repaired by | repairs | Surgery |
| Extensor Hallucis Longus Tendon | replaced by | replaces | Accessory Tendon |

Table 2. Lexons derived from the learning materials for the Acetabular Labrum Cartilage.

| Head term | Role | Co-role | Tail term |
|---|---|---|---|
| Acetabular Labrum | part of | part | Cartilage of Hip |
| Acetabular Labrum | stabilizes | stabilized by | Hip Joint |
| Hip Arthroscopy | repairs | repaired by | Acetabular Labrum |
| Hip Arthroscopy | is a | is | Surgery |
| Arthroscopy | diagnoses | diagnosed by | Acetabular Labral Tear |
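As an illustration of how such lexons can be queried, the sketch below stores the Table 2 lexons as plain tuples and lists every fact in which a given term takes part, reading the role when the term is the head and the co-role when it is the tail. The tuple layout and the function name `facts_about` are assumptions made for this example.

```python
# Lexons of Table 2 as (head term, role, co-role, tail term) tuples.
TABLE_2 = [
    ("Acetabular Labrum", "part of", "part", "Cartilage of Hip"),
    ("Acetabular Labrum", "stabilizes", "stabilized by", "Hip Joint"),
    ("Hip Arthroscopy", "repairs", "repaired by", "Acetabular Labrum"),
    ("Hip Arthroscopy", "is a", "is", "Surgery"),
    ("Arthroscopy", "diagnoses", "diagnosed by", "Acetabular Labral Tear"),
]


def facts_about(term, lexons):
    """List readable facts in which `term` appears as head or tail."""
    facts = []
    for head, role, co_role, tail in lexons:
        if head == term:
            facts.append(f"{head} {role} {tail}")
        elif tail == term:
            facts.append(f"{tail} {co_role} {head}")
    return facts


labrum_facts = facts_about("Acetabular Labrum", TABLE_2)
for fact in labrum_facts:
    print(fact)
```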

The user can visualize the anatomical structures and at the same time retrieve semantic information thanks to the knowledge interaction framework, presented in the next section.

4 3D Anatomy Browser

We set up an interaction framework for the examination of the knowledge describing the musculoskeletal system of the human lower limb. The user can browse the anatomical structures of interest using the anatomy browser [7] and at the same time query the CMS to retrieve information on the selected structures.

Every time new structures are added to the anatomy browser resources, related information (text, images, associated publications, etc.) is updated on the collaborative CMS and annotated. Then a link is created between the application and the online data, enhancing the application with semantic information (see Fig. 3).

Fig. 3. Knowledge interaction: anatomy browser (left) and the corresponding annotation via the collaborative CMS (right).

In the scenario presented in this study, the anatomy browser is used only for visualization of the anatomical structures during the tests. All the semantic information related to those structures is hidden from the user. The following section introduces the ontology-based matching strategy algorithms used for competency evaluation.

5 C-FOAM Ontology-based Data Matching Strategy

The ontology-based data matching framework (ODMF, [8]) contains matching algorithms for: 1) matching strings, such as the ones from SecondString [9], in particular UnsmoothedJS [10, 11, 12], JaroWinklerTFIDF [10, 11, 12] and TFIDF (term frequency-inverse document frequency [13]); 2) matching lexical information, such as using WordNet [14]; and 3) matching concepts in an ontology graph. There are several ontology-based data matching strategies in ODMF. Each strategy contains at least one graph algorithm.
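To give a feel for the string matching layer, the sketch below implements the plain Jaro-Winkler similarity in Python. It is a textbook version written for illustration, not the SecondString implementation and not the JaroWinklerTFIDF variant; the function names and the default `prefix_scale` of 0.1 are assumptions.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    # Count characters that agree within the sliding match window.
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


print(round(jaro_winkler("patela", "patella"), 4))
```

With this plain variant, a one-letter typo such as "patela" stays very close to "patella" (about 0.97), which is why a threshold on the string score can absorb spelling errors.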

5.1 C-FOAM

In this study, we applied Controlled Fully Automated Ontology Based Matching Strategy (C-FOAM), developed within ODMF. C-FOAM contains two modules: 1) the Interpreter and 2) the Comparator. The interpreter module makes use of the lexical dictionary, WordNet, the domain ontology and string matching algorithms to interpret end users’ input. Given a term that denotes either (a) a concept in the domain ontology, or (b) an instance in the ontology, the interpreter will return the correct concept(s) defined in the ontology or lexical dictionary, and an annotation set.

There are two penalty values in the interpreter module. The first one is the threshold applied to the internal output of the string matching step; the terms that pass this filter become the input of the lexical searching components. The second penalty value filters the output of the lexical searching components. For instance, when a user enters the string “hearty” or “warmhearted”, C-FOAM finds a defined concept “heart” in the domain ontology and its annotation using JaroWinklerTFIDF (the string matching algorithm) and WordNet (the lexical dictionary).
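The two-stage filtering can be sketched as follows. This is a simplified illustration, not the C-FOAM interpreter: `difflib.SequenceMatcher` from the Python standard library stands in for JaroWinklerTFIDF, a small hand-written dictionary stands in for WordNet, and the threshold values, the names `interpret`, `ONTOLOGY_LABELS` and `LEXICON`, and the rule of multiplying the two scores are all assumptions.

```python
from difflib import SequenceMatcher

# Toy stand-ins for the domain ontology and the lexical dictionary.
ONTOLOGY_LABELS = {"patella", "popliteus"}
LEXICON = {"kneecap": {"patella": 0.75}}  # term -> {concept: relatedness}


def interpret(user_input, string_threshold=0.8, lexical_threshold=0.5):
    """Map free text to ontology concepts with a score per concept."""
    text = user_input.strip().lower()
    vocabulary = ONTOLOGY_LABELS | set(LEXICON)

    # Stage 1: string matching, filtered by the first threshold.
    kept = {}
    for term in vocabulary:
        score = SequenceMatcher(None, text, term).ratio()
        if score >= string_threshold:
            kept[term] = score

    # Stage 2: lexical lookup, filtered by the second threshold.
    results = {}
    for term, string_score in kept.items():
        related = dict(LEXICON.get(term, {}))
        if term in ONTOLOGY_LABELS:
            related[term] = 1.0  # a label trivially denotes its own concept
        for concept, weight in related.items():
            score = string_score * weight
            if score >= lexical_threshold:
                results[concept] = max(score, results.get(concept, 0.0))
    return results


print(interpret("patela"))
print(interpret("kneecap"))
```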

The comparator can also use any combination of the different graph algorithms to produce a composite score.
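How the individual scores are combined is not detailed here. One simple option, shown purely as an assumption, is a weighted average of the per-algorithm scores; the function name `composite_score` and the second algorithm name in the example are hypothetical.

```python
def composite_score(scores, weights=None):
    """Combine per-algorithm scores in [0, 1] into one weighted average."""
    if not scores:
        raise ValueError("at least one algorithm score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}  # equal weights by default
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


algorithm_scores = {"LeMaSt": 0.5, "other_graph_algorithm": 1.0}
print(composite_score(algorithm_scores))
print(composite_score(algorithm_scores, {"LeMaSt": 3.0, "other_graph_algorithm": 1.0}))
```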

5.2 C-FOAM applied to an eLearning Scenario

Let Ω be the musculoskeletal ontology and $C$ a labeled concept, e.g., “Patella”. We denote by $L_C$ a lexon set defined in Ω; $L_C$ describes $C$. Similarly, we denote by $L_M$ a lexon set of the learning materials. Each $L_{M_i}$ corresponds to a learning material, e.g., “Imaging of the dysplasia”. Let $S(L_C, L_{M_i})$ be a function that calculates the overlapping rate of $L_C$ and $L_{M_i}$:

$$S(L_C, L_{M_i}) = 1 - \frac{|L_C \setminus L_{M_i}|}{|L_C|}$$

It means that $0 \le S(L_C, L_{M_i}) \le 1$ for every $L_C$ and $L_{M_i}$.

Let $s$ be a natural language string provided by a student. Let $C_s$ be the concept label that is linked to the interpretation of $s$. We denote by $Syn_s$ a synonym set of $s$.

Below are three possible situations:

Situation 1: $s = C$. The similarity score is $S(L_C, L_{M_i})$.

Situation 2: $s \neq C$ and $s$ can be interpreted in Ω. In this situation, $s$ needs to be mapped into a set of concepts $C_s$. The similarity score is $S(L_{C_s}, L_{M_i})$, where $L_{C_s}$ is a lexon set defined for $C_s$.

Situation 3: $s \neq C$ and $s$ cannot be interpreted in Ω. In this situation, we use a lexical dictionary, such as WordNet, to define $Syn_s$. The similarity score is calculated by finding a synonym or hypernym of $s$ which is equivalent to $C$. The similarity score is $w \cdot S(L_C, L_{M_i})$ iff such a term exists in $Syn_s$, and 0 iff it does not, where $w$ is a weight that satisfies the following conditions: $w$ lies between 0.5 and 1 if the term is a synonym of $s$; $w$ lies between 0.2 and 0.5 if the term is a hypernym or hyponym of $s$.

Note that in this paper the overlapping rate is calculated by the LeMaSt algorithm (simple version), which is a graph similarity algorithm. Other graph algorithms can also be used for the calculation [8].
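The three situations can be illustrated with a short sketch. It rests on several assumptions that should be read as such: the overlapping rate is taken to be the share of a concept's lexons that also occur in the lexon set of a learning material (a plain set computation, not LeMaSt); in Situation 2 the lexons of all mapped concepts are pooled; the weights 0.75 and 0.35 are arbitrary picks from the synonym and hypernym/hyponym ranges; and all lexons, mappings and function names are toy data invented for the example.

```python
def overlap_rate(concept_lexons, material_lexons):
    """Share of the concept's lexons that also occur in the material."""
    if not concept_lexons:
        return 0.0
    return len(concept_lexons & material_lexons) / len(concept_lexons)


def similarity(answer, ontology, material, mappings, synonyms, related,
               synonym_weight=0.75, related_weight=0.35):
    """Score a student's answer against one learning material."""
    label = answer.strip().lower()
    # Situation 1: the answer is itself a concept label.
    if label in ontology:
        return overlap_rate(ontology[label], material)
    # Situation 2: the answer maps to a set of concepts; pool their lexons.
    if label in mappings:
        pooled = set().union(*(ontology[c] for c in mappings[label]))
        return overlap_rate(pooled, material)
    # Situation 3: fall back to a lexical dictionary, with a weight.
    if synonyms.get(label) in ontology:
        return synonym_weight * overlap_rate(ontology[synonyms[label]], material)
    if related.get(label) in ontology:
        return related_weight * overlap_rate(ontology[related[label]], material)
    return 0.0


# Toy lexons as (head, role, tail) triples; all of them are hypothetical.
P1 = ("patella", "part of", "knee joint")
P2 = ("patella", "affected by", "dysplasia")
Q1 = ("popliteus", "part of", "posterior compartment of the lower leg")
ontology = {"patella": {P1, P2}, "popliteus": {Q1}}
material = {P2}
mappings = {"knee structures": {"patella", "popliteus"}}
synonyms = {"kneecap": "patella"}
related = {"bone": "patella"}

for answer in ("Patella", "knee structures", "kneecap", "bone", "femur"):
    print(answer, similarity(answer, ontology, material, mappings, synonyms, related))
```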

6 The eLearning Scenario

6.1 Scenario

We have tested our approach on a learning scenario, as follows:

Step 1) the computer shows a highlighted zone in the anatomy browser;
Step 2) the student gives text input: the anatomical structure that the student considers to be highlighted;
Step 3) the matching engine finds the similarity scores (between student input and knowledge base);
Step 4) the evaluator calculates the matching score;
Step 5) repeat Step 1, 2 and 3 as many times as wanted;
Step 6) the evaluator calculates the final score;
Step 7) the evaluator shows the correct answers for the answers with final score ≠ 100%;
Step 8) the computer finds and suggests the learning materials to the student, based on the annotations in the CMS.
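The evaluation part of this scenario can be sketched as a small loop. The sketch assumes that the final score is the mean of the per-question matching scores, which is not stated in the scenario, and it uses a trivial exact-match function in place of the matching engine; `run_test` and `exact_match` are hypothetical names.

```python
def run_test(questions, match):
    """Score a list of (expected_label, student_answer) pairs.

    `match` is any callable returning a score in [0, 1].
    Returns the per-question scores, the final score (assumed to be the
    mean) and the expected labels of all answers that are not fully correct.
    """
    scores = [match(answer, expected) for expected, answer in questions]
    final_score = sum(scores) / len(scores) if scores else 0.0
    corrections = [expected
                   for (expected, _answer), score in zip(questions, scores)
                   if score != 1.0]
    return scores, final_score, corrections


def exact_match(answer, expected):
    # Placeholder for the matching engine: 1.0 only on an exact label match.
    return 1.0 if answer.strip().lower() == expected.lower() else 0.0


questions = [("Patella", "patella"), ("Popliteus", "plantaris")]
scores, final_score, corrections = run_test(questions, exact_match)
print(scores, final_score, corrections)
```

Any scoring function with the same signature, for example one built on string, lexical and graph matching, could be passed in place of `exact_match`.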

Five anatomical structures (see Fig. 4) have been considered for the test, with the corresponding learning material in the CMS: “Patella”, “Gluteus Medius”, “Acetabular Labrum”, “Extensor Hallucis Longus Tendon” and “Popliteus”. The learning materials for the five anatomical structures consist of scientific publications and books, images and videos (see Fig. 2). The annotations corresponding to the learning materials are transformed into lexons (299 lexons in total in the lexon base) in order to be processed by the C-FOAM algorithms. The final goal is to find the final score and to make suggestions for the learning material.

Fig. 4. The test data (displayed by the 3DAH Viewer).

6.2 Results and Interpretation

We simulated a test composed of five questions with correct and wrong answers to analyze the behavior of the system. In order to test the different matching algorithms, the answers range from correct (“patella”), correct with a typo (“patela”), correct synonym (“kneecap”) and partially correct (“labrum”) to partially wrong (“plantaris”, “disease”). The typo case is solved using a string matching algorithm. For the last two questions, the answer was supposed to be wrong, that is, a 0% match. However, the scores found by C-FOAM are different from zero (we obtained 0.30 for “plantaris” and 0.11 for “disease”). This is because the matching algorithms found in the ontology graph that, even though the two concepts are different, they are related. In fact, “plantaris” and “popliteus” are both muscles of the “Posterior Compartment of the Lower Leg”. As for “disease”, there are many annotated learning materials related to the diseases of the “popliteus”, so even though the concepts do not match, the score is different from zero. The test data and the scores are given in Fig. 5.

Fig. 5. The results of the eLearning scenario and remarks
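The non-zero score for a wrong but related answer can be reproduced in miniature. The sketch below measures how many neighbours two concepts share in a tiny lexon graph, using a Jaccard ratio; this measure is an assumption chosen for illustration and is not the LeMaSt algorithm. Only the fact that both muscles belong to the Posterior Compartment of the Lower Leg comes from the text; the other lexons are hypothetical.

```python
def neighbours(term, lexons):
    """Terms directly linked to `term` by any lexon."""
    linked = set()
    for head, _role, tail in lexons:
        if head == term:
            linked.add(tail)
        elif tail == term:
            linked.add(head)
    return linked


def relatedness(a, b, lexons):
    """Jaccard ratio of the two neighbour sets (0.0 if both are empty)."""
    na, nb = neighbours(a, lexons), neighbours(b, lexons)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


LEXONS = [
    ("Popliteus", "part of", "Posterior Compartment of the Lower Leg"),
    ("Plantaris", "part of", "Posterior Compartment of the Lower Leg"),
    ("Popliteus", "affected by", "Tendinopathy"),  # hypothetical lexon
    ("Patella", "part of", "Knee Joint"),          # hypothetical lexon
]

print(relatedness("Popliteus", "Plantaris", LEXONS))
print(relatedness("Popliteus", "Patella", LEXONS))
```

The two muscles share one of two neighbours and therefore receive a positive score, while concepts with no common neighbour score zero.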

For “patella”, four situations have been tested: 1) correct answer: score 1; 2) spelling error: score 0.98; 3) correct synonym: score 0.75; 4) correct synonym with a spelling error: score 0.69. These results have been calculated by an advanced C-FOAM version, which combines JaroWinkler and WordNet. Both of them use LeMaSt, which is a graph algorithm. For “patela”, LeMaSt performed a string matching to find the correct term “patella” and then a lexon matching in order to find the match in the ontology graph. For “kneecap”, LeMaSt performed a lexical matching using WordNet to find the correct term and then a lexon matching to search for it in the ontology.

Based on these results, the system can recommend learning materials that can provide missing competencies or improve existing skills. For instance, in the case of “knee” or “kneecap”, the system can infer that the student understands the concept but does not remember the correct terminology for that particular structure.
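A recommendation step of this kind can be sketched as a simple ranking. The rule used here (suggest nothing for a fully correct answer, otherwise list the materials with a positive similarity score, best first) is an assumption made for illustration; the function name `suggest_materials`, the scores and two of the titles are hypothetical.

```python
def suggest_materials(answer_score, material_scores, limit=3):
    """Return up to `limit` material titles, best match first.

    Nothing is suggested when the answer is fully correct.
    """
    if answer_score >= 1.0:
        return []
    # Sort by descending score, then by title to keep ties deterministic.
    ranked = sorted(material_scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, score in ranked if score > 0.0][:limit]


material_scores = {
    "Imaging of the dysplasia": 0.6,
    "Knee anatomy atlas (hypothetical title)": 0.9,
    "Hip arthroscopy video (hypothetical title)": 0.0,
}
suggestions = suggest_materials(0.75, material_scores)
print(suggestions)
```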

7 Related Work

Several recent works address the personalized delivery of learning materials for eLearning. They mainly focus on capturing the user context in order to recommend the right content, in the right form, to the right learner. A few examples follow.

Baloian [15] proposes a recommender system which suggests multimedia learning material based on the learner’s background, preferences and available software/hardware. This approach suffers from information overload. Schmidt [16] proposes an approach for capturing the context of the learner based on the semantic modeling of the learner’s environment. Yu [17] proposes a method for context-aware learning content provisioning for ubiquitous learning.

Our approach is slightly different in that it focuses on capturing and evaluating the user knowledge. The evaluation and the delivery of the learning materials are based on the ontology-based data matching methodology.

A general evaluation methodology for ontology-based data matching does not exist. Evaluation methods are trivial and application specific. Related work on the types of evaluation methods is described below:

Program evaluation is the systematic collection of information about the activities, characteristics and outcomes of programs to make judgments about the program, improve program effectiveness, and inform decisions about future programming [18].

Utilization-focused evaluation [19, 20] is a comprehensive approach to doing evaluations that are useful, practical, ethical and accurate.

Purpose oriented evaluation methodologies [21] contain three kinds of evaluation methodologies – formative evaluation, pertaining evaluation and summative evaluation. Formative evaluation focuses on the process. Pertaining evaluation focuses on judgment of the value before the implementation. Summative evaluation focuses on the outcome.

8 Conclusion

The paper presents an ontology-based approach to improve eLearning by finding the similarity score between the student’s knowledge and the learning materials available in the knowledge base. The approach is based on collaborative semantic data annotations, on visual interaction and on ontology-based data matching strategies.

The learning score is used to evaluate the students’ competencies and also to make suggestions on the learning materials for further improvement of the students’ skills.

Different strategies exist to find the similarity score and a methodology has been developed to evaluate these strategies.

The knowledge (i.e., the learning materials) is modeled using the DOGMA ontology, which has the advantage of being grounded in natural language.

At present, the suggestions on the learning materials are made only by the system, based on the similarity scores given by the different matching algorithms. Work in progress emerging from this study is to submit the results to human experts for verification and improvement. Different contexts and factors will have to be considered in the recommendation, to capture and understand the learner’s individual characteristics and learning behavior: for example, how often the student took the test, how much time the student needs to acquire new knowledge, the learning context, etc.

A future research direction is to apply the ontology-based data matching strategies on annotated anatomical data for the purpose of medical diagnosis.

Acknowledgments. This work is supported by the EU FP6 Marie Curie project 3D Anatomical Human (MRTN-CT-2006-035763) and by the EU FP7 TAS3 project. The work has also been partly supported by the EU ITEA-2 Project 2008005 "Do-it-Yourself Smart Experiences", funded by IWT 459. The authors would like to thank Jerome Schmid from the University of Geneva and Jose Antonio Iglesias Guitian from CRS4, Visual Computer Group, Sardinia, Italy for the permission to use their anatomy browser for this study.

References

1. Meersman, R.: Ontologies and databases: More than a Fleeting Resemblance. In: Proceedings of the International Symposium on Methodologies for Intelligent Systems, vol. 1609, Springer (1999)

2. Meersman, R.: Semantics Ontology Tools in Information System Design. In: The Proceedings of OES/SEO 2001 Rome Workshop, Luiss Publications (2001)

3. Web Ontology Language (OWL), http://www.w3.org/TR/owl-ref

4. Tang, Y., Meersman, R.: SDRule Markup Language: Towards Modeling and Interchanging Ontological Commitments for Semantic Decision Making, Handbook of Research on Emerging Rule-based Languages and Technologies: Open Solutions and Approaches, IGI Publishing, ISBN: 1-60566-402-2, USA (2009)

5. Collaborative 3DAH, https://starpc25.vub.ac.be

6. Ciuciu, I., Kang, H., Meersman, R., Schmid, J., Magnenat-Thalmann, N., Guitian, J.A.I., Gobbetti, E.: Collaborative Content Management: an Ongoing Case Study for Imaging Applications, Proceedings of the 11th European Conference on Knowledge Management, Famalicao, Portugal (2010)

7. http://3dah.miralab.ch/index.php?option=com_remository&Itemid=78&func=fileinfo&id=394

8. Tang, Y., Meersman, R., Ciuciu, I.G., Leenarts, E., Pudney, K.: Towards Evaluating Ontology Based Data Matching Strategies, In Proceedings of the Fourth IEEE International Conference on Research Challenges in Information Science, 137--145, Nice, France (2010)

9. Cohen, W.W., Ravikumar, P.: Secondstring: An Open Source Java Toolkit of Approximate String-matching Techniques. Project web page: http://secondstring.sourceforge.net (2003)

10. Jaro, M.A.: Advances in Record-linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida. The Journal of the American Statistical Association, vol. 84, 414--420 (1989)

11. Jaro, M.A.: Probabilistic Linkage of Large Public Health Data Files (disc: P687-689). Statistics in Medicine, vol. 14, 491--498 (1995)

12. Winkler, W.E.: The State of Record Linkage and Current Research Problems, Statistics of Income Division, Internal Revenue Service Publication R99/04, Available from http://www.census.gov/srd/www/byname.html (1999)

13. Jones, S.K.: A Statistical Interpretation of Term Specificity and its Application in Retrieval, Journal of Documentation, vol. 28(1), 11--21 (1972)

14. Fellbaum, C.: WordNet: An Electronic Lexical Database, Massachusetts Institute of Technology, ISBN 0-262-06197 (1999)

15. Baloian, N., Galdames, P., Collazos, C.A., Guerrero, L.A.: A Model for a Collaborative Recommender System for Multimedia Learning Material, In Proceedings of The 10th International Workshop on Groupware (CRIWG’04), 281--288 (2004)

16. Schmidt, A., Winterhalter, C.: User Context Aware Delivery of E-Learning Material: Approach and Architecture, Journal of Universal Computer Science, vol. 10(1), 28--36 (2004)

17. Yu, Z., Nakamura, Y., Zhang, D., Kajita, S., Mase, K.: Content Provisioning for Ubiquitous Learning, IEEE Pervasive Computing, vol. 7, No. 4, October-December 2008, 62--70 (2008)

18. Patton, M. Q.: Qualitative Research and Evaluation Methods, 3rd edition, Sage Publications, Inc, London, UK, ISBN 0-7619-1971-6 (2002)

19. Stufflebeam, D. L., Madaus, G. F., Kellaghan, T.: Utilization-Focused Evaluation, in book Evaluation in Education and Human Services, vol. 49, Second edition, Springer, Netherlands, 425--438 (2006)

20. Blundell, R., Costa Dias, M.: Evaluation methods for non-experimental data, Fiscal Studies, vol. 21(4), 427--468 (2000)

21. Bhola, H. S.: Evaluating “Literacy for development” projects, programs and campaigns: Evaluation planning, design and implementation, and utilization of evaluation results. Hamburg, Germany: UNESCO Institute for Education; DSE [German Foundation for International Development], 306 pages (1990)