LTfLL D7.3 Validation 3

Appendix B: Validation Reporting Templates

Table of Contents

Appendix B.1: Validation Reporting Template – WP4.1 (BIT-MEDIA) .......... 2
Appendix B.2: Validation Reporting Template – WP4.2 (UNIMAN) ............. 16
Appendix B.3: Validation Reporting Template – WP5.1 (PUB-NCIT) ........... 24
Appendix B.4: Validation Reporting Template – WP5.1 (UNIMAN) ............. 46
Appendix B.5: Validation Reporting Template – WP5.2 (UPMF) ............... 52
Appendix B.6: Validation Reporting Template – WP6.1 (IPP-BAS; English) ... 61
Appendix B.7: Verification of WP6.1 (IPP-BAS, Bulgarian) ................. 74
Appendix B.8: Validation Reporting Template – WP6.2 (UU & PUB-NCIT) ...... 76

This appendix provides sections 3 – 8 of the validation reporting templates (results, recommendations and conclusions). The full versions are available to the European Commission and EC reviewers on the LTfLL project Moodle.


Page 2 of 93

Appendix B.1: Validation Reporting Template – WP4.1 (BIT-MEDIA)
Pilots in English and German at BIT-MEDIA

Section 3: Results – validation/verification of Validation Topics listed in the validation scenario

Summary – results

Ref VT1. Validation topic (feature and claim): Quality of the questionnaire – the type of questions is appropriate for using the positioning service. Category: Usability (learner). Qualifications to validation: it is important to prepare a questionnaire that leads the learners to provide long answers for the positioning service.

Ref VT2. Validation topic (feature and claim): Feedback during the positioning session – the ‘live feedback’ of the positioning service is useful to the learner during the positioning session. Category: Pedagogical (learner).

Ref VT3. Validation topic (feature and claim): Concept coverage – the positioning service provides tutors with useful results about the concepts the students have covered. Category: Pedagogical and Relevance (tutor). Qualifications to validation: qualitative results show the system is not accurate enough.

Ref VT4. Validation topic (feature and claim): Optimised syllabus – the education provider believes that an optimised syllabus is provided for each individual learner. Category: Pedagogical (education provider).

Ref VT5. Validation topic (feature and claim): Knowledge coverage – the feedback to the tutor helps the tutor to advise the learner on the next steps in his learning. Category: Pedagogical and Relevance (tutor). Qualifications to validation: the qualitative data suggested that while the feedback is useful to the tutor, there is a timing issue in the tutor advising the learner.

Ref VT6. Validation topic (feature and claim): The next steps the system offers are useful to the learner. Category: Pedagogical and Relevance (tutor). Qualifications to validation: relatively low score in the quantitative data.

Ref VT7. Validation topic (feature and claim): Required resources – the education provider needs fewer resources (tutor time) for the positioning system. Category: Efficiency (education provider). NB: fewer resources once the system has been set up.

Details

Ref: VT1
Validation topic (feature and claim): Quality of the questionnaire. The format of the questions is appropriate for using the LTfLL positioning service.
Category: Usability
Language of pilot: German and English
Native language of students: German
Stakeholder(s): learner, language technology specialist
Indicator: Learner questionnaire, statement: "This question is easy to understand and clearly defined" (asked for each question). Results from language technology specialist, statement: "The amount and type of text provided by the learners are useful for the system" (checked for each question).
Methodology: feedback form
Type & no. of participants: 15

Summative results with respect to validation indicator

The summative results concerned the validity of the individual questions specific to the BIT implementation with regard to understanding of the question and relevance of the answer. The learners and tutors indicated that nearly all questions were easy to understand and useful (only one question was rated as not useful during the validation sessions).

Formative results with respect to validation indicator
(Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan)

It is important to prepare a questionnaire that leads the learners to provide long answers for the positioning service. Some of the questions did not achieve that goal. The learners were reluctant to provide long answers because the algorithms behind the language technologies were not clear to them. To overcome this concern, the learners should receive more information about the system during the introduction session. There are two suggestions for the next validation and dissemination sessions:

1) Give a short explanation of the language technologies.
2) Enhance the demonstration of the service by explaining the feedback for demo answers.

This additional information is useful for the learners, because they normally have no prior experience with language technologies.

Ref: VT2

Validation topic (feature and claim): Feedback during the positioning session. The ‘live feedback’ helps the learner to improve the quality of his answers during the positioning session.
Category: Pedagogical
Language of pilot: German and English
Native language of students: German
Stakeholder(s): learner
Indicator: Interview of the learner, statement: "The results of the ‘live feedback’ are useful to improve the quality of my answers during the positioning session."
Methodology: feedback form
Type & no. of participants: 15

Summative results with respect to validation indicator

Learners Q24: The results of the ‘live feedback’ are useful to improve the quality of my answers during the positioning session. Mean=4.00, SD=0.96, Agree/Strongly Agree = 69% (n=13)
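Summary statistics like these can be reproduced from raw 5-point Likert responses with a few lines of code. The following is a minimal sketch; the response values are invented for illustration and are not the pilot's actual data, and whether the template reports the population or the sample standard deviation is an assumption here:

```python
# Illustrative 5-point Likert responses (1 = Strongly Disagree ... 5 = Strongly Agree).
# These values are made up for demonstration; they are not the pilot's raw data.
responses = [5, 4, 4, 3, 5, 4, 2, 4, 5, 4, 3, 4, 5]

n = len(responses)
mean = sum(responses) / n

# Population standard deviation (dividing by n); an assumption, since the
# template does not state which variant it reports.
sd = (sum((r - mean) ** 2 for r in responses) / n) ** 0.5

# Share of respondents answering Agree (4) or Strongly Agree (5).
agree_pct = 100 * sum(1 for r in responses if r >= 4) / n

print(f"Mean={mean:.2f}, SD={sd:.2f}, Agree/Strongly Agree = {agree_pct:.0f}% (n={n})")
```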

Formative results with respect to validation indicator
(Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan)


The ‘live feedback’ is a very useful feature of the positioning service. The main statements regarding this functionality are:

1) The possibility to use the ‘live feedback’ and improve the answers highlights the goal of the service (a positioning system, not a testing system). (learners' view)
2) Improving the answers based on the results of the ‘live feedback’ motivates the learner to work with the system and to find information related to the question. (tutors' view)
3) If the learner is not able to improve an answer to get a better grading from the system, he is motivated to find additional learning materials on that topic. (tutors' view)

The ‘live feedback’ should not be provided when the learner has written only a short answer.

Ref: VT3

Validation topic (feature and claim): Concept coverage. The positioning service provides tutors with useful results about the concepts the students have covered.
Category: Pedagogical and Relevance
Language of pilot: German and English
Native language of students: German
Stakeholder(s): tutor
Indicator: Interview of the tutor after the use of the positioning service, statement: "The positioning service provided me with useful results about the concepts the students have covered."
Methodology: feedback form
Type & no. of participants: 3

Summative results with respect to validation indicator

Tutors Q15: The positioning service provided me with useful results about the concepts the students have covered. Mean=4.00, SD=0.00, Agree/Strongly Agree = 100% (n=2)
Tutors Q4: Overall, I believe that the WP 4.1 Positioning Service provides adequate support for learning. Mean=4.00, SD=0.00, Agree/Strongly Agree = 100% (n=2)

Formative results with respect to validation indicator
(Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan)

As an additional activity beyond the planned validation process, the tutors used the ‘live feedback’ feature to test the results for a set of different answers to each question. Based on these tests the tutors reported the following:

1) In the current version the results are useful, but the algorithms of the system should be improved.
2) In some cases the system's grading for short answers is better than for long answers.
3) The algorithms behind the feedback are hard for the user to understand.

Ref: VT4

Validation topic (feature and claim): Optimised syllabus. The education provider believes that an optimised syllabus is provided for each individual learner.
Category: Pedagogical
Language of pilot: German and English
Native language of students: German
Stakeholder(s): education provider
Indicator: Questionnaire to the tutor after the use of the positioning service, statement: "The education provider believes that an optimised syllabus is provided for each individual learner." Interview of the tutor after the use of the positioning service, formative question: "Think of some learners who did not receive an optimised syllabus. What were the reasons for this?"
Methodology: feedback form
Type & no. of participants: 15

Summative results with respect to validation indicator

Tutors Q11: The education provider believes that an optimised syllabus is provided for each individual learner. Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=2)
As an additional indicator beyond the direct feedback on the positioning service, all the learners passed the ECDL exam preparation test from bitmedia successfully.

Formative results with respect to validation indicator
(Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan)

The education provider (the manager of the education project at bitmedia where the validation round was conducted) agreed, based on the feedback of the tutors and the preparation test results, that the positioning service is useful for generating an optimised syllabus for the learner.

In particular, the positioning system is useful for revealing the weak spots in the learner's knowledge. Using the positioning service in combination with the experience of the tutor, we recognised the following benefits:

1) The learner is motivated by the positioning system to provide answers of the best quality his knowledge allows. This avoids the risk of underrating the learner's knowledge, which is a risk for the education provider because it would lead to establishing too long a syllabus.
2) The use of normal text for the positioning system is a pedagogical benefit compared to the traditional multiple-choice testing system we are using.
3) The motivational effect of the ‘live feedback’ is very helpful in the overall learning environment.
4) Having the tutors assisted by the positioning service enables the tutors and learners to work in a more flexible form.

Ref: VT5

Validation topic (feature and claim): Knowledge coverage. The knowledge-poor and knowledge-rich feedback to the tutor helps the tutor to advise the learner on the next steps in his learning.
Category: Pedagogical and Relevance
Language of pilot: German and English
Native language of students: German
Stakeholder(s): tutor
Indicator: Interview of the tutor, statement: "The feedback report helps me to advise the learner on the next steps in his learning."
Methodology: feedback form
Type & no. of participants: 3

Summative results with respect to validation indicator

Tutors Q10: The feedback report helps me to advise the learner on the next steps in learning. Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=2)

Formative results with respect to validation indicator
(Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan)

During the interview the tutors explained that the results of the positioning service are useful as hints for the next learning steps of the individual learners. Because the tutors were not able to establish the learning plan immediately after the learner had finished answering the questionnaire, the learners proceeded with the next learning steps themselves, using the provided concepts and phrases to find training materials.

This self-guided learning process was initiated by the learners; the tutors did not direct them to use the positioning system in this way.

Ref: VT6

Validation topic (feature and claim): The next steps the system offers are useful to the learner.
Category: Pedagogical and Relevance
Language of pilot: German and English
Native language of students: German
Stakeholder(s): learner
Indicator: Interview of the learner at the end of the training, statement: "The next steps the system offered were useful to me."
Methodology: feedback form
Type & no. of participants: 15

Summative results with respect to validation indicator

Learners Q25: The next steps the system offered were useful to me. Mean=3.33, SD=0.75, Agree/Strongly Agree = 33% (n=12)

Formative results with respect to validation indicator
(Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan)

Because the learners knew from experience that the tutors are able to establish a plan for the next learning steps with or without the system, they mainly rated this validation topic as ‘neutral’. The benefit of the positioning service is that the feedback is available immediately and independently of the tutor's time resources. The majority of the learners nevertheless expect the assistance of the tutor, as this additional feedback is important to them.

Ref: VT7

Validation topic (feature and claim): Required resources. The education provider and the tutors need fewer resources (tutor time) for the positioning task (compared to the existing positioning sequence).
Category: Business efficiency
Language of pilot: German and English
Native language of students: German
Stakeholder(s): education providers
Indicator: Calculation by the education provider: the amount of time for corpus building and service training (time per course) and the amount of time a tutor needs to establish a learning path for each student (time per course). These values are compared to the time currently required for positioning each student with the multiple-choice assessments in use. Result: how many students are required per course to achieve a benefit for the education provider.
Methodology: resource calculation form provided by the education provider
Type & no. of participants: 15

Summative results with respect to validation indicator

Tutors Q5: It takes me less time to complete my teaching tasks using the ‘WP4.1 Positioning Service’ than without the system. Mean=4.5, SD=0.50, Agree/Strongly Agree = 100% (n=2)
Tutors Q2: Overall, the ‘WP4.1 Positioning Service’ helps me to complete my teaching tasks successfully. Mean=4.5, SD=0.50, Agree/Strongly Agree = 100% (n=2)
The calculation of time resource savings showed that, in the learning environment of this pilot, the use of the positioning service will save time resources if the system is used for about 200 or more learners (over 1,500 learners were involved in this learning environment). In other learning environments, where the positioning system would help to improve distance learning, the required number of learners will be lower (e.g. if savings in travelling costs and in the learners' own time are included in the calculation).
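The break-even reasoning above can be sketched as a simple calculation: a one-off setup effort is recovered through a per-learner saving. All figures in the sketch below are illustrative assumptions, not numbers reported by the pilot:

```python
# Illustrative break-even calculation for the positioning service.
# Assumed one-off setup effort (corpus building, questionnaire, graded answers)
# and assumed tutor time saved per learner compared to multiple-choice
# positioning. Both figures are invented for demonstration.
setup_hours = 100.0
saving_hours_per_learner = 0.5

# The service pays off once cumulative per-learner savings exceed the setup effort.
break_even_learners = setup_hours / saving_hours_per_learner

print(f"Break-even at {break_even_learners:.0f} learners")
```

With these assumed figures the service breaks even at 200 learners, which matches the order of magnitude reported for this pilot.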

Formative results with respect to validation indicator
(Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan)

The tutors explained that the positioning service enables the learners themselves to continue finding learning materials and to proceed immediately with their learning steps after completing the positioning questionnaire. This avoids unused ‘time frames’ for the learners. The feedback from the positioning service saves the tutors time in establishing the syllabus, for the following reasons:

1) The results of the positioning service can be used directly (without preparation time) in the dialogue with the learner to establish the learning path.
2) The feedback of the positioning system provides a grading for each answer. This grading is used as a short indicator (like a KPI, a key performance indicator) to focus the dialogue with the learner on the important learning topics.
3) The phrases and concepts included in the feedback of the positioning system highlight the missing concepts in the learner's knowledge. Based on this information the tutor is able to find adequate learning materials (books, e-learning, labs, etc.).

Overall we expect that the positioning service will reduce the time the tutors need for the positioning task by about 50%. Establishing the positioning system for a learning environment requires additional preparation tasks (creating the questionnaire, creating graded answers, providing learning materials, etc.).

Section 4: Results – inductive stakeholder validation activities

Ref: S3-1

VALIDATION EVENT
Methodology:
Stakeholder(s):
Type & no. of participants:
Language of pilot:
Native language of participants:

Additional formative results (not associated with validation topics)
(Including results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan)

The learners would like to have an overview report of all results from the positioning sessions, in addition to the final learning suggestions given by the tutor. Such a report would help the learners to find additional learning resources themselves (e.g. with other LTfLL services or with traditional internet resources).

To improve the quality of the questions, the service could use images to explain the questions (there could be a button to expand the images as needed by the learner).

Section 5: Summary of results – verification activities

ALPHA TESTING – outstanding issues

Outstanding challenges / opportunities from alpha testing (major points only):

1. Further improve the interface to make it more user-friendly.
2. Feedback accuracy has to be evaluated and further optimized.


How do these challenges/opportunities inform the next round of design and development?

Changes you will make in Round 3:
1. Quantitative feedback optimization improvements driven by an automated evaluation framework.
2. Qualitative feedback improvement by expanding the phrase-extraction corpora and making additional adjustments.
3. Deeper integration of the knowledge-rich approach by connecting with work package 6.1.

How do these challenges/opportunities inform the roadmap for the end of the project?

Desirable changes that cannot be scheduled during the project, or changes started in Round 3 that may not be complete by the end of the project (e.g. LSA improvements): feedback optimization (LSA, phrase extraction, lexicalization, etc.) for more than two domains and languages.

Verification of language technologies (reported in full in D4.3)

Purpose of activity (or research question): Is the student/tutor comfortable with the user interface and workflow?
Conclusion(s): Yes, with some reservations, owing to some less intuitive features and a relatively high setup cost and learning curve for the setup process.
Next round of design and development: Reduce complexity and further automate steps where possible.
Roadmap for the end of the project: Consider thorough widgetization and/or full integration into an established LMS to ensure a familiar user experience.

Purpose of activity (or research question): Are the knowledge-poor phrases sufficient to provide additional and valuable information to the learner/tutor?
Conclusion(s): Some valuable information, but still needs improvement. Negatively distinct phrases are hard to identify for certain types of questions.
Next round of design and development: Improve automated phrase-extraction mechanisms and rethink the presentation of negative/missing phrases for the knowledge-poor approach.

Purpose of activity (or research question): Is the quantitative feedback valid and in line with human judgements?
Conclusion(s): Mostly valid results; some confusion over feedback effects when adding suggested phrases.
Next round of design and development: Further improve the scoring mechanism and clarify the distinction between quantitative and qualitative/knowledge-rich feedback.


Section 6: Summary of results from validation associated with dissemination activities

(Date, event and location, type and length of presentation, type and number of attendees, validation method, main results)

No activities reported.

Section 7: Recommendations from second validation round

(Validation topic, recommendations for next steps, section of scenario / software unit affected, by when)

- User interface: during the navigation between the questions, the learners should be asked whether the current answer should be saved (to avoid losing answers).
- Live feedback, tutors' hints: if the learner provides too small an amount of text, the live feedback should not be provided.
- Feedback of the learners: the learners pointed out that the answers should be saved automatically, without the need to use the confirm button. (Some learners did not save all of their answers because they did not expect to need the confirm button in the software.) Some improvements to the user interface (navigation, saving of answers) will increase satisfaction with the service. The requested functionality will be added in version 1.1 of the service.
- It is not transparent to the learners how the Positioning Service calculates the live feedback. An improved description of the language technologies for end users should be provided; this additional background information is important for the learners to be able to fully trust the system.
- If the learners provide too little text in an answer, the Positioning Service calculates incorrect values for the live feedback. Additional checking of the provided answers should be added before the live feedback is generated or the learner is able to save the answer.

Section 8: Conclusions from second validation round

Pedagogic effectiveness

We conclude that the WP 4.1 Positioning Service is pedagogically effective in the following respects:
- The live feedback of the Positioning Service provides relevant information about the current position of the learner.
- The system motivates the learners to demonstrate their knowledge.
- As a result of the live feedback, the learners' interest in topics important for their learning path was enhanced.
- The service is useful for improving the overall performance of an education scenario in the lifelong learning process.

The following aspects of the service were of limited/no pedagogic value:
- The learners preferred to have human support in addition to the functionality of the Positioning Service.
- Not all of the live feedback results meet the expectations of the tutors and learners, owing to the limited amount of text provided by the learners.

Relevance

The project managers (bitmedia, management for this learning environment) conclude that the pedagogical problem tackled and the WP 4.1 Positioning Service are relevant in the following respects:
- The service provides relevant feedback for establishing the next learning steps for the learner.
- The overall time of human resources (tutors) will be reduced if the group comprises between about 100 and 200 learners; in current projects a few hundred learners are educated in the education environments of the bit group.

The following comments were made against the Positioning Service being relevant:
- Owing to individual preferences, not all learners enjoy a computer-based positioning system (fewer than 5% of the current learners are not satisfied with computer-based services).

Satisfaction

We conclude that satisfaction with the WP 4.1 Positioning Service appears to be good in the following areas/aspects of the service:
- The use and handling of the service were commented on very positively by the learners.
- The live feedback functionality was used to improve the quality of the answers and motivated the learners to acquire additional knowledge on specific topics.

Satisfaction was found to be less good in the following areas:
- Not all of the answers were saved as the learners expected.
- It is not comprehensible how the system calculates the live feedback; therefore the learners did not fully trust the system.

Usability

We conclude that the usability of the service appears to be good in the following areas/aspects of the service:
- The service is easy to understand and easy to use. Only for learners with absolutely no experience of using computers is the service not usable.

The following aspects of usability require improvement:
- The questions were not easy to read (font size and colour should be changed).

Efficiency

We conclude that the service appears to be efficient in the following respects:
- The service reduces the amount of time required for the positioning tasks for each learner.
- Because of the motivational effect, the learners sped up their learning tasks for the topics to which the live feedback guided them.

The following aspects of the efficiency of the service require improvement:
- The whole process of introducing and explaining the service takes time (about 2 or 3 hours). Therefore the service has to be used for the whole education programme and not only for one week of it.

Transferability

The following information is useful with regard to possible transfer of the service to other domains, languages, and pedagogic and organisational settings.

Strengths/opportunities:
- The learners pointed out that the positioning service could also be useful in a self-learning environment, because the questions in combination with the live feedback motivate them to find information and learning materials on the specific topics the system suggests.
- Based on the tutors' estimation, the Positioning Service is useful for various learning settings of the bit group (training for developers, e-learning, distance learning and educational settings for companies).

Weaknesses/threats (including competing systems):
- The learners are currently not familiar with the functionality of language technologies and need a detailed introduction to the concepts and goals of the service (and need to recognise how the service differs from testing and assessment systems).
- Only for learners with absolutely no experience of using computers is the service not usable.


Appendix B.2: Validation Reporting Template – WP4.2 (UNIMAN)
Pilot in English at UNIMAN

Section 3: Results - validation/verification of Validation Topics listed in the validation scenario

Summary – results

Ref 1. Validation topic (feature and claim): The service allows tutors to easily identify, using conceptograms, individual learners whose topic coverage has progressed more or less than their peers. Category: USP. Qualifications to validation: poor tutor Likert results.

Ref 2. Validation topic (feature and claim): The service allows tutors, using conceptograms, to easily identify individual learners who are not developing in line with intended learning outcomes. Category: Pedagogic effectiveness. Qualifications to validation: tutors can distinguish differences between learners, but it is a long-winded process.

Ref 3. Validation topic (feature and claim): The service allows students to compare their conceptograms with those of other learners. Category: Pedagogic effectiveness.

Ref 4. Validation topic (feature and claim): The service allows students to compare their own conceptograms with a conceptogram that is representative of the learning outcomes. Category: Pedagogic effectiveness.

Page 17 of 93

Details

Ref: 1
Validation topic (feature and claim): The service allows tutors to easily identify, using conceptograms, those learners whose topic coverage has progressed more or less than their peers.
Category: USP. Tutors using the service are able to identify those learners who are not making good progress in relation to their peers and the targets of the curriculum, and who are therefore in need of intervention, as well as those learners who are outliers in extending their knowledge beyond that of their peers.
Stakeholder(s): Tutors
Indicator: A significant number of tutors report that the service enables them to accurately identify learners who are progressing less than their peers, and also those learners who have excelled beyond their peers.
Methodology: Stakeholder interview, qualitative evaluation
Type & no. of participants: 2 tutors

Results with respect to validation indicator
(including comparison with the previous situation, where appropriate)

"The CONSPECT service conceptograms allow me to accurately identify those learners who are progressing less than their peers." Mean 2.5, SD=1.5, 50% Agree (n=2)
"The conceptograms allow me to accurately identify those learners who have excelled beyond their peers." Mean 2.5, SD=1.5, 50% Agree (n=2)

Formative results
(Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan)

A tutor felt this aspect of the service had the potential to be extremely useful but had reservations about the cognitive load of the present visualisation. A perceived weakness is that it does not provide statistical data about the analysis it conducts, which could aid the identification of outliers.

No further relevant comments

Page 18 of 93

Ref: 2
Validation topic (feature and claim): The service allows tutors, using conceptograms, to easily identify individual learners who are not developing in line with intended learning outcomes.
Category: Pedagogic effectiveness. Tutors are able to effectively monitor the progress of learners in relation to the intended learning outcomes.
Stakeholder(s): Tutors
Indicator: A significant number of tutors report that they are able to identify individual learners who are not developing in line with intended learning outcomes, compared to when the service is not used.
Methodology: Stakeholder interview, qualitative evaluation
Type & no. of participants: 2 tutors

Results with respect to validation indicator including comparison with previous situation, where appropriate

The conceptograms help me to effectively identify learners who are not developing in line with intended learning outcomes: Mean=4.0, SD=0.00, 100% Agree / Strongly Agree (n=2)

Formative results Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

CONSPECT was considered objective as an assessment support tool that can be used to identify outliers.

Ref: 3

Validation topic (feature and claim): The service allows students to compare their conceptograms with those of other learners Category: Pedagogic Effectiveness Learners can compare their topic coverage, using the conceptogram visualisations, with that of other students.

Stakeholder(s): Students Indicator: Learners report that they are able to compare their conceptograms with those of their peers Methodology: Stakeholder focus group, qualitative evaluation Type & no. of participants: 6 x Students

Results with respect to validation indicator including comparison with previous situation, where appropriate

It is useful for me to compare my conceptograms with those of other learners: Mean 4.67, SD=0.47, Agree / Strongly Agree: 100% (n=6)


Formative results Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Students found this either useful or very useful, and were also keen to recommend the service to others. “It’s quite a clever way to compare yourself to what other people have written down, that you’ve covered everything you need to cover, which is a problem in PBL at the moment.” “I think, visually, it’s quite good, to have it as a mind map, it’s quite good, and the fact that it’s linked in with other words as well... I quite like the idea of comparing it with other blogs to see if you’ve covered the same things as other people.” “I find it useful as well, especially the comparing part... You can see what’s been missed out on, I think, yeah, I think it’s more useful to see what you’ve missed out on than to see overlap – it’s always good to know if you’ve covered extra.”

Ref: 4

Validation topic (feature and claim): The service allows students to compare their own conceptograms with a conceptogram that is representative of the learning outcomes. Category: Pedagogic effectiveness

Stakeholder(s): Students Indicator: Students report that the service allows them to compare their conceptograms with a conceptogram indicative of the domain knowledge required to meet the intended learning outcomes for that subject area. Methodology: Stakeholder focus group, qualitative evaluation Type & no. of participants: 6 x Students

Results with respect to validation indicator including comparison with previous situation, where appropriate

The comparison between my conceptograms and a reference conceptogram that represents the required domain knowledge is helpful for monitoring my learning progress: Mean 4.67, SD=0.47, Agree / Strongly Agree: 100% (n=6)

Formative results Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Students responded positively to the service’s ability to compare their learning with a concept map representative of the learning outcomes. They identified that this helped find areas where they could improve their coverage of a topic area. “I find it useful as well, especially the comparing [with learning outcomes] part... You can see what’s been missed out on, I think, yeah, I think it’s more useful to see what you’ve missed out on than to see overlap – it’s always good to know if you’ve covered extra.”


“I suppose it would be easier to concentrate on the key points, and identify them earlier on. Instead of having lots of different points but not knowing whether to be looking at this, this, this or this... It would help you focus on things that are very important like, it would flag the really important points up and you could make notes on them”

Section 5: Summary of results – verification activities

ALPHA TESTING Outstanding issues

Outstanding challenges / opportunities from alpha testing (major points only):

1. Create a help page to explain how CONSPECT works.
2. Provide a list of concepts, not just the graph.

How do these challenges/opportunities inform the next round of design and development?

Changes you will make in Round 3: 1. We will make the changes listed in the previous paragraph.

How do these challenges/opportunities inform the roadmap for the end of the project?

Desirable changes that cannot be scheduled during the project, or changes started in Round 3 that may not be complete by the end of the project (e.g. LSA improvements). 1. We created a development plan of Priority 1, 2, and 3 improvements and plan to have all of the priority 1 and 2 changes completed by the end of the project.


Verification of language technologies (reported in full in D4.3)

Purpose of activity* (or research question)

Conclusion(s) drawn from activity (or answer to research question)

How does this work inform the next round of design and development?

How does this work inform the roadmap for the end of the project?

Card sort – to see whether humans cluster concepts in the same way as does CONSPECT

Three different types of analysis were carried out for the card sorting data. The first analysis showed that humans performed about 10% better than CONSPECT. The second analysis showed that the machine clustering was slightly better than the humans. The third analysis showed a Spearman’s rho of .9 (excellent) between CONSPECT’s and the humans’ ratings of highly descriptive concepts.
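Spearman's rho, as used in the third card-sort analysis, is the Pearson correlation computed on the two sets of ranks rather than the raw ratings. A minimal sketch follows; the two rating lists are illustrative placeholders, not the actual card-sort data.

```python
# Sketch: Spearman's rank correlation, as used to compare CONSPECT's and the
# human raters' ratings of descriptive concepts. The rating lists below are
# illustrative placeholders, not the actual card-sort data.
def rankdata(values):
    """Rank values from 1 (smallest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied 1-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the ranks (handles tied ratings)."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

human = [5, 4, 4, 3, 2, 1]       # illustrative human ratings
conspect = [5, 4, 3, 3, 2, 1]    # illustrative CONSPECT ratings
print(round(spearman_rho(human, conspect), 2))
```

Rank-based correlation is appropriate here because the two raters need only agree on the ordering of concepts by descriptiveness, not on the absolute rating values.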

We’re on the right track for the technical engine underlying CONSPECT.

N/A

Text annotation – to see how humans judge CONSPECT’s choices of descriptive and non-descriptive terms

Three different types of analysis were carried out for the text annotation data. The first analysis showed that many of the humans ranked more of the terms as not very descriptive than we expected. The second analysis showed that the humans did not agree well with each other: a kappa of only .4, where .7 is considered acceptable. The third analysis showed that in 89% of the cases the humans judged the CONSPECT-selected distracter to be less similar to the text than the CONSPECT-selected descriptor – a very good result.
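Cohen's kappa, the agreement statistic quoted in the second analysis, corrects the observed agreement between two annotators for the agreement expected by chance. A minimal sketch follows, assuming two annotators assigning nominal labels; the label lists are illustrative, not the actual annotation data from the verification study.

```python
# Sketch: Cohen's kappa for inter-annotator agreement on descriptive vs.
# non-descriptive term labels. The two label lists are illustrative, not the
# actual text-annotation data.
def cohens_kappa(labels_a, labels_b):
    """kappa = (p_o - p_e) / (1 - p_e) for two annotators' nominal labels."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement: proportion of items both annotators label the same.
    p_o = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

a = ["desc", "desc", "non", "desc", "non", "non", "desc", "non"]
b = ["desc", "non", "non", "desc", "desc", "non", "desc", "non"]
print(round(cohens_kappa(a, b), 2))
```

Because chance agreement is subtracted out, a raw agreement of 75% between the two illustrative annotators yields a kappa of only 0.5, which is why the .4 observed in the study is read as weak agreement even though the annotators matched on many items.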

We are thinking of limiting the number of descriptive terms to only those with the highest similarity.

N/A


Section 6: Summary of results from validation associated with dissemination activities

Date: 23/03/10
Event and location: Manchester Medical Education Conference, Manchester, UK
Type and length of presentation: Workshop (75 mins)
Type of attendees: Hospital doctors, GPs and academics from NW England involved in training undergraduate doctors
Number of attendees: 8
Validation method: Focus group
Main results:
1. Tutors felt the tool would add valuable support to their formative assessment activities.
2. Tutors were keen to use the tool, so long as the visualization and usability are improved.

Section 7: Recommendations from second validation round

VR1 (VT1): The service allows tutors to easily identify, using conceptograms, those learners whose topic coverage has progressed more or less than their peers.
Recommendation: The outputs of the service do not provide an efficient means by which tutors can identify outliers. The provision of statistical / text data to support the visualised outputs will provide a less ambiguous indication of students’ progress.
Section of scenario / software unit affected: 8
By when: 01 August 2010

VR2 (VT2): The service allows tutors, using conceptograms, to easily identify individual learners who are not developing in line with intended learning outcomes.
Recommendation: The ability of the service to compare a learner’s conceptogram with a conceptogram representative of the intended learning outcomes was acknowledged by the tutors, but there is a need for alternative / supplementary textual and statistical data. Links back to source data from CONSPECT’s outputs will help tutors to interpret the learners’ development.
Section of scenario / software unit affected: 8
By when: 01 August 2010


Section 8: Conclusions from second validation round

Pedagogic effectiveness

CONSPECT is pedagogically effective in the following respects:
1. Generating conceptograms that usually identify the key aspects covered in a piece of text material
2. Affording users the ability to compare their own conceptograms against a conceptogram representative of the learning outcomes
3. Affording users the ability to compare their conceptual coverage against that of other users
The conceptograms need to provide links to both the source text and suitable resources.

Satisfaction

Student satisfaction with CONSPECT was found to be good:
1. The students were able to recognise the concepts, and could see how the relationships had been derived.
2. They greatly valued the ability to compare conceptograms with each other.
Tutor satisfaction with CONSPECT appears to be low:
1. The visualisations aren’t meaningful without further supporting information relating to the source materials.

Usability

The usability of CONSPECT appears to be good:
1. Students were able to log in, add feeds and access their conceptograms successfully.
2. Tutors were able to log in and navigate around the tool’s features.
The following aspects of usability require improvement:
1. More consideration needs to be given to the visualization – “A bit of tidying up and making menus etc. more accessible could help”.

Efficiency

CONSPECT appears to be efficient in the following respects:
1. An interviewed teacher considered CONSPECT’s ability to summarise texts an efficient approach to assessment, and was keen to conduct a further trial of the service for this purpose.
The following aspects of the efficiency of CONSPECT require improvement:
1. Both interviewed teachers viewed CONSPECT’s outputs as requiring a high degree of decoding. More thought needs to be given to how the results and comparisons are presented. Summary statistical data, to complement the visual outputs, has the potential to further enhance CONSPECT’s efficiency.


Appendix B.3: Validation Reporting Template – WP5.1 (PUB-NCIT)

Pilot in English at PUB-NCIT

Section 3: Results – validation/verification of Validation Topics listed in the validation scenario

Summary – results

Ref Validation topic (feature and claim)

Category (effectiveness, usability etc)

Validated unconditionally

Validated with qualifications*

Not validated

*Qualifications to validation

VT1 Students obtain feedback immediately after they finish a chat discussion (FC5.1-v1.02)

Efficiency

VT2 Students obtain feedback just-in-time for their participation in a discussion forum (FC5.1-v1.02)

Efficiency This could not be tested

VT3 The feedback and reports delivered by the system are considered useful and relevant (FC5.1-v1.13)

Pedagogic Effectiveness

Positive: All the tutors considered the feedback useful and 80% considered it relevant. All the learners found the feedback useful for their learning. Negative: Only 63% of the students found the feedback relevant (the rest were neutral). The relevance for students should be increased by removing the errors discovered in the feedback and providing explanations for each type of feedback and indicator. However, the answers of the students regarding the comparison to human feedback are not conclusive. This indicator should also be improved in v2.

VT4 Using PolyCAFe-mediated collaboration has the potential to improve the learning outcomes of the students (FC5.1-v1.01)

Pedagogic Effectiveness

VT5 The visualization offers a better understanding of the chat conversation (FC5.1-v1.03)

Pedagogic Effectiveness

VT6 The time needed to provide final feedback and grading is reduced (FC5.1-v1.05, FC5.1-v1.06)

Efficiency

VT7 Increases the quality of the feedback resulting from analyzing collaborative chat sessions and discussion forums (FC5.1-v1.07)

Pedagogic Effectiveness

VT8 Easier to maintain consistency of feedback (FC5.1-v1.04)

Pedagogic Effectiveness

VT9 The feedback and grading offered by the system are consistent with the results of the tutors / experts (FC5.1-v1.14)

Pedagogic Effectiveness

Note: The tutors agreed with most of the feedback, indicators and grading returned by the system. However, more formal measurements are needed for the next validation round.

VT10 The feedback and grading offered by the system are trusted by the tutors (FC5.1-v1.15)

Satisfaction

VT11 The feedback and grading offered by the system are trusted by the learners (FC5.1-v1.15)

Satisfaction Several students have stated that they do not trust or do not agree with part of the feedback presented in the conversation feedback widget and in the participant feedback widget.

Details

Ref: VT1

Validation topic (feature and claim): Students obtain feedback within an appropriate time after they finish a chat discussion Category: Efficiency Language of pilot: English Native language of participants: Romanian

Stakeholder(s): Students Indicator: Most students report that feedback is delivered in time; average processing time per chat conversation is below 5 minutes Methodology: Questionnaire and Focus group, Measurements Type & no. of participants: 9x undergraduate students Year 4

Summative results with respect to validation indicator

Workpackage-specific statements:
Learners-14. I find that the feedback delivered by the system is quick and the analysis is not taking long: Mean=4.33, SD=0.47, Agree/Strongly Agree = 100% (n=9)

All the students agreed that the feedback is delivered in time and that the processing of the chat conversations does not take too long.

Average processing time per chat conversation is below 5 minutes. After the initial processing, all the other services return their results in less than 5 seconds.


Formative results with respect to validation indicator

The students reported that the 5 minutes needed to process a chat conversation is a very short waiting period, and that they would be willing to wait longer (more than 15 minutes, and up to a couple of hours) if the feedback were improved as a result. The same was reported by the tutors.

Ref: VT2

Validation topic (feature and claim): Students obtain feedback within an appropriate time for their participation in a discussion forum Category: Efficiency Language of pilot: English Native language of participants: Romanian

Stakeholder(s): Students Indicator: Most students report that feedback is delivered in time Average processing time per discussion thread is below 30 minutes; After the analysis is done, the answers to students queries should be delivered in less than a minute Methodology: Questionnaire and Focus group, Measurements Type & no. of participants: 9x undergraduate students Year 4

Summative results with respect to validation indicator

Only one discussion forum from UNIMAN was processed by the system. More tests need to be done in version 2. The processing time for the processed forum was under 1 hour (more than 20 discussion threads were present in the forum).

No student data were collected regarding the time needed to process a discussion forum.

Formative results with respect to validation indicator

The students from PUB considered that they would not have any problems waiting for a couple of hours in order to receive valid and relevant feedback after participating in a discussion forum. However, after the initial processing, the response time of the services should be very quick (a couple of seconds at most); this condition is met, as all the other services return their results in less than 5 seconds.

Ref: VT3

Validation topic (feature and claim): The feedback and reports delivered by the system are considered useful by the students and tutors, and relevant by the teachers Category: Pedagogic Effectiveness Language of pilot: English Native language of participants: Romanian

Stakeholder(s): Students, Tutors, Teacher Indicator: Most students report that the feedback and reports are useful for improving their future activity; the tutors report that the feedback and reports are useful for analysing the activity of the students; the teacher considers that the reports have a high degree of relevance Methodology: Questionnaire and Focus group for students and tutors; Questionnaire and Interview for teacher Type & no. of participants: 9x undergraduate students Year 4, 5x tutors, 1x teacher

Summative results with respect to validation indicator

Workpackage-specific statements:
Learners-15. I find that the feedback delivered by the system is useful to improve my future learning activity on the subject of the chat conversation: Mean=3.78, SD=0.79, Agree/Strongly Agree = 78% (n=9)
Learners-16. I find that the feedback delivered by the system is helpful to monitor and control my learning on the subject of the conversation: Mean=3.67, SD=0.82, Agree/Strongly Agree = 67% (n=9)
Learners-17. The information presented in the conversation feedback provides a useful overview of the conversation: Mean=3.89, SD=0.57, Agree/Strongly Agree = 78% (n=9)
Learners-19. The information presented in the participant feedback provides a useful overview of my role in the conversation: Mean=3.67, SD=0.67, Agree/Strongly Agree = 78% (n=9)
Learners-22. The information presented in the utterance feedback provides a useful filtering and classification of the utterances in the conversation: Mean=4.11, SD=0.87, Agree/Strongly Agree = 89% (n=9)
Learners-24. I have found that the classification of the utterances according to the speech act and argumentation is useful: Mean=4.00, SD=0.94, Agree/Strongly Agree = 78% (n=9)
Learners-26. The search widget provides a useful mechanism for identifying the most important utterances: Mean=3.78, SD=0.92, Agree/Strongly Agree = 67% (n=9)
Learners-27. The search widget provides a useful mechanism for identifying the most important participants: Mean=3.67, SD=0.67, Agree/Strongly Agree = 56% (n=9)
Learners-31. The conversation visualization allows me to find out when I should have participated more in the conversation: Mean=4.33, SD=0.47, Agree/Strongly Agree = 100% (n=9)
Learners-32. The conversation visualization allows me to find out what concepts I have not covered in a given part of the conversation: Mean=4.11, SD=0.74, Agree/Strongly Agree = 78% (n=9)
Tutors-17. I find that the automatic feedback delivered by the system is useful to improve the quality of the final feedback and comments delivered by me to the students: Mean=4.80, SD=0.40, Agree/Strongly Agree = 100% (n=5)
Tutors-21. The information presented in the conversation feedback provides a useful overview of the conversation: Mean=4.20, SD=0.75, Agree/Strongly Agree = 80% (n=5)
Tutors-23. The information presented in the participant feedback provides a useful overview of the roles of each participant in the conversation: Mean=4.60, SD=0.49, Agree/Strongly Agree = 100% (n=5)
Tutors-25. The information presented in the utterance feedback provides a useful filtering and classification of the utterances in the conversation: Mean=4.60, SD=0.49, Agree/Strongly Agree = 100% (n=5)
Tutors-27. I have found that the classification of the utterances according to the speech act and argumentation is useful: Mean=4.20, SD=0.40, Agree/Strongly Agree = 100% (n=5)
Tutors-29. The search widget provides a useful mechanism for identifying the most important utterances: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=5)
Tutors-30. The search widget provides a useful mechanism for identifying the most important participants: Mean=4.80, SD=0.40, Agree/Strongly Agree = 100% (n=5)
Tutors-34. The conversation visualization allows me to find out when the students should have participated more in the conversation: Mean=4.50, SD=0.50, Agree/Strongly Agree = 100% (n=5)
Tutors-35. The conversation visualization allows me to find out what concepts the students have not covered in a given part of the conversation: Mean=4.67, SD=0.67, Agree/Strongly Agree = 100% (n=5)

General statements:
Learners-1. Overall, the support provided by PolyCAFe (Chat Analysis and Feedback Service) was relevant to my learning activities: Mean=3.63, SD=0.48, Agree/Strongly Agree = 63% (n=9)
Learners-2. Overall, PolyCAFe (Chat Analysis and Feedback Service) helped me to successfully complete learning tasks: Mean=3.80, SD=0.40, Agree/Strongly Agree = 80% (n=9)
Learners-4. Overall, I believe that PolyCAFe (Chat Analysis and Feedback Service) provides adequate support for my learning: Mean=4.33, SD=0.47, Agree/Strongly Agree = 100% (n=9)
Learners-5. Overall, I believe that the support for my learning that PolyCAFe (Chat Analysis and Feedback Service) provides is close enough to the current support provided by humans: Mean=3.11, SD=1.10, Agree/Strongly Agree = 33% (n=9)
Experts-1. Overall, the support provided by PolyCAFe (Chat Analysis and Feedback Service) is relevant to the learning activities of learners: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=1)
Experts-2. Overall, PolyCAFe (Chat Analysis and Feedback Service) helps learners to complete learning tasks successfully: Mean=4.00, SD=0.00, Agree/Strongly Agree = 100% (n=1)
Experts-6. Overall, I am satisfied that PolyCAFe (Chat Analysis and Feedback Service) helps learners in their learning: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=1)
Experts-9. PolyCAFe (Chat Analysis and Feedback Service) helps me to provide learners with relevant feedback: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=1)
Experts-10. The information provided by PolyCAFe (Chat Analysis and Feedback Service) supports my decisions on improving learning: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=1)
Tutors-1. Overall, the support provided by PolyCAFe (Chat Analysis and Feedback Service) is relevant to my teaching activities: Mean=4.20, SD=0.75, Agree/Strongly Agree = 80% (n=5)
Tutors-2. Overall, PolyCAFe (Chat Analysis and Feedback Service) helps me to complete my teaching tasks successfully: Mean=4.60, SD=0.49, Agree/Strongly Agree = 100% (n=5)

All the tutors considered the feedback useful and 80% considered it relevant (the rest were neutral).

All the learners found the feedback useful for their learning, but only 63% found it relevant for their learning activities (the rest were neutral).

The feedback and indicators presented in all 5 widgets were considered useful by all the tutors and by the large majority of the students. The lowest score for usefulness was for the search conversation widget, where only 56% of the students considered the search participant option useful, the rest being neutral on this statement (question 27).

On the whole, it is important to note that the students expressed very few negative opinions about the usefulness of the widgets. Furthermore, all the tutors considered the feedback to be useful.

The teacher considered that the service is useful to both tutors and learners (see questions 1, 2, 6, 9 and 10).

However, the answers of the students regarding the comparison to human feedback are not conclusive.

Formative results with respect to validation indicator

Both students and tutors agreed that the feedback delivered by the 5 widgets is useful for their learning and assessment activities, respectively. The learners agreed on the following ranking of the widgets by usefulness: conversation visualization, conversation feedback, utterance feedback, participant feedback, search conversation. For the tutors the order was: search conversation, conversation visualization, utterance feedback, participant feedback and conversation feedback.

The students considered that the feedback presented in the participant feedback widget should be changed, and that presenting the results on a five-level scale is not very useful. The tutors, however, considered this scale very useful for differentiating between students.

Ref: VT4

Validation topic (feature and claim): Using PolyCAFe-mediated collaboration has the potential to improve the learning outcomes of the students Category: Pedagogic Effectiveness Language of pilot: English Native language of participants: Romanian

Stakeholder(s): Students Indicator: Most students consider that the use of the system has improved their learning outcomes by providing them with information about what they have done wrong Methodology: Questionnaire, Focus group Type & no. of participants: 9x undergraduate students Year 4

Summative results with respect to validation indicator

Workpackage-specific statements:
Learners-15. I find that the feedback delivered by the system is useful to improve my future learning activity on the subject of the chat conversation: Mean=3.78, SD=0.79, Agree/Strongly Agree = 78% (n=9)
Learners-16. I find that the feedback delivered by the system is helpful to monitor and control my learning on the subject of the conversation: Mean=3.67, SD=0.82, Agree/Strongly Agree = 67% (n=9)

General statements:
Learners-4. Overall, I believe that PolyCAFe (Chat Analysis and Feedback Service) provides adequate support for my learning: Mean=4.33, SD=0.47, Agree/Strongly Agree = 100% (n=9)

All the students agreed that PolyCAFe provides adequate support for their learning.

Most of the students also considered that the feedback is useful for improving their future learning activity (question 15: agreement 78%, disagreement 11%, the rest neutral).

Most of the students agreed that the feedback is helpful for monitoring and controlling their learning on the subject of the conversation (question 16: agreement 67%, disagreement 11%, the rest neutral).

Formative results with respect to validation indicator

The students agreed that the conversation feedback is useful for understanding which topics should have been covered better, so that they can study them after receiving the analysis. The students also agreed that it is preferable to use the feedback offered by PolyCAFe in addition, as the current feedback offered by the tutors is very simplistic. Moreover, they consider that using PolyCAFe would definitely increase the quality of the feedback offered by the tutors, thus improving their learning outcomes.

We have decided to comparatively test the learning outcomes of students in the next validation round.

Ref: VT5

Validation topic (feature and claim): The visualization offers a better understanding* of the chat conversation Category: Pedagogic Effectiveness Language of pilot: English Native language of participants: Romanian
* Better understanding or insight means being able to extract more details about the chat in less time than using a usual text or chat-log style alternative, and being able to understand what went well or badly in the conversation with regard to collaboration and content (in order to know what to improve).

Stakeholder(s): Students, Tutors Indicator: Most students report that the visualization offers them a better insight into the discussion; most tutors report that the visualization offers them a better insight into the discussion Methodology: Questionnaire, Focus group Type & no. of participants: 9x undergraduate students Year 4, 5x tutors

Summative results with respect to validation indicator

Workpackage-specific statements:
Learners-29. The conversation visualization provides a simpler understanding of the conversation (in order to understand the structure and collaboration in the conversation) compared to a simple text presentation of the chat: Mean=4.44, SD=0.68, Agree/Strongly Agree = 89% (n=9)
Learners-30. The conversation visualization allows me to find out when I should have participated more in the conversation: Mean=4.44, SD=0.68, Agree/Strongly Agree = 89% (n=9)
Learners-31. The conversation visualization allows me to find out when I should have participated more in the conversation: Mean=4.33, SD=0.47, Agree/Strongly Agree = 100% (n=9)
Learners-32. The conversation visualization allows me to find out what concepts I have not covered in a given part of the conversation: Mean=4.11, SD=0.74, Agree/Strongly Agree = 78% (n=9)
Tutors-32. The conversation visualization provides a simpler understanding of the conversation (in order to understand the structure and collaboration in the conversation) compared to a simple text presentation of the chat: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=5)
Tutors-33. The conversation visualization provides a better understanding of the conversation (in order to understand the structure and collaboration in the conversation) compared to a simple text presentation of the chat: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=5)
Tutors-34. The conversation visualization allows me to find out when the students should have participated more in the conversation: Mean=4.50, SD=0.50, Agree/Strongly Agree = 100% (n=5)
Tutors-35. The conversation visualization allows me to find out what concepts the students have not covered in a given part of the conversation: Mean=4.67, SD=0.67, Agree/Strongly Agree = 100% (n=5)

General statements: None

The students agree that the conversation visualization widgets offer a simpler and better visualization of the chat conversation, which proves useful for understanding when they should have participated more in the discussion, as well as which concepts they covered best during the conversation (questions 29-32: mean scores between 4.11 and 4.44, agreement between 78% and 100%, disagreement 0%).

All the tutors strongly agree that the conversation visualization widgets offer a simpler and better visualization of the chat conversation. Furthermore, all of them consider the feedback provided by this widget useful (questions 32-35: mean scores between 4.50 and 5.00, agreement 100%, disagreement 0%).

Formative results with respect to validation indicator


The conversation visualization widget was considered one of the most useful widgets by both students and tutors, and its results are also considered relevant for their learning or teaching activities. Suggestions offered by students: link the conversation visualization with the other widgets, so that utterances and threads selected in other widgets are also highlighted in the visualization and vice versa (inter-widget communication). Improvements suggested by tutors: redefine what counts as a discussion thread, as threads currently appear to be too long, and consider alternative graphics, such as the evolution of the participants during the conversation.

Ref: VT6

Validation topic (feature and claim): The time needed for tutors to provide (1) final feedback and/or (2) grading is reduced Category: Efficiency Language of pilot: English Native language of participants: Romanian

Stakeholder(s): Tutors Indicator: Most tutors report that the time needed to analyse a chat conversation in order to provide feedback and/or grading to the students has been reduced Improvement of the productivity of the tutors that use POLYCAFE by at least 30% compared to the ones that do not use it Methodology: Questionnaire, Focus group, Measurements Type & no. of participants: 5x tutors

Summative results with respect to validation indicator
Workpackage-specific statements:
Tutors-15. I consider that the time required to deliver the final feedback to the students for a chat conversation is improved by using PolyCAFe.: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=5)
Tutors-16. I consider that the time required to deliver the final grading to the students for a chat conversation is improved by using PolyCAFe.: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=5)
General statements:
Tutors-5. It takes me less time to complete my teaching tasks using PolyCAFe (Chat Analysis and Feedback Service) than without the system.: Mean=5.00, SD=0.00, Agree/Strongly Agree = 100% (n=5)
Tutors-6. Overall, using the system requires less mental effort to complete my teaching task than without the system.: Mean=4.60, SD=0.49, Agree/Strongly Agree = 100% (n=5)

All tutors strongly agree that it takes less time to provide both feedback and grading to the students (Questions 5, 15 and 16). All tutors also agree that it requires less mental effort to deliver this feedback (Question 6).
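The Mean, SD and agreement percentages reported in the questionnaire tables throughout this appendix can be reproduced from raw 5-point Likert responses (1 = strongly disagree, 5 = strongly agree). The following minimal sketch is illustrative only, not project code; note that the reported SDs match the population formula (dividing by n rather than n-1), so that is what is used here.

```python
from math import sqrt

def likert_summary(responses):
    """Return (mean, population SD, % Agree/Strongly Agree) for 1-5 Likert scores."""
    n = len(responses)
    mean = sum(responses) / n
    # Population standard deviation: divide by n, matching the SDs in the tables.
    sd = sqrt(sum((r - mean) ** 2 for r in responses) / n)
    # "Agree/Strongly Agree" counts scores of 4 or 5.
    agree_pct = 100 * sum(1 for r in responses if r >= 4) / n
    return round(mean, 2), round(sd, 2), round(agree_pct)

# Example consistent with Tutors-6 (Mean=4.60, SD=0.49, 100% agreement, n=5);
# the individual scores here are hypothetical, not the actual raw data.
print(likert_summary([5, 5, 5, 4, 4]))  # (4.6, 0.49, 100)
```

For instance, a statement rated 5 by every tutor yields Mean=5.00, SD=0.00, 100% agreement, as in Tutors-15 and Tutors-16.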

Preliminary measurements (analysis of chat conversations with and without PolyCAFe, plus timing measurements) show a reduction of 25-33% in the time needed to provide final feedback and grading to the students.
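The 25-33% figure is a simple relative reduction in analysis time. As an illustrative sketch (the per-chat timings below are hypothetical, since the measured values are not reported here):

```python
def time_reduction(minutes_without, minutes_with):
    """Percentage reduction in per-chat analysis time when using PolyCAFe."""
    return 100 * (minutes_without - minutes_with) / minutes_without

# Hypothetical timings consistent with the reported 25-33% range:
print(round(time_reduction(60, 45)))  # 25
print(round(time_reduction(60, 40)))  # 33
```

On this definition, the 30% productivity target in the indicator falls inside the measured 25-33% range only for some tutors.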

Formative results with respect to validation indicator

The tutors considered the reduction of time very important, although for most of them this was the first time they had seen and used the widgets for providing feedback to the students. Only two of the tutors were familiar with the widgets before the validation took place. They considered that, after adapting to the application and after some minor adjustments to the interface (suggested by the tutors, but not detailed here), further reductions in the time needed to analyse a chat conversation are very likely to be observed.

Tutors saved time because several aspects would have been very difficult and time-consuming to analyse without the widgets: for example, the assessment of the degree of collaboration in the conversation, the coverage of the concepts in each part of the conversation, and the concepts that were not correctly covered.

Ref: VT7

Validation topic (feature and claim): Increases the quality of the feedback resulting from analyzing collaborative chat sessions and discussion forums
Category: Pedagogic Effectiveness
Language of pilot: English
Native language of participants: Romanian

Stakeholder(s): Tutors
Indicator: Most tutors consider that the feedback offered after using the system has been better / more complete than without using it
Methodology: Questionnaire, Focus group
Type & no. of participants: 5x tutors

Summative results with respect to validation indicator
Workpackage-specific statements:
Tutors-17. I find that the automatic feedback delivered by the system is useful to improve the quality of the final feedback and comments delivered by me to the students.: Mean=4.80, SD=0.40, Agree/Strongly Agree = 100% (n=5)
General statements: None

All the tutors agree that the quality of their feedback has improved by using PolyCAFe (80% of them strongly agree – question 17).

Formative results with respect to validation indicator

The quality of the feedback is improved because several aspects would have been very difficult to analyze without the widgets, which would lower the quality of the feedback provided by the tutors: for example, the assessment of the degree of collaboration in the conversation, the coverage of the concepts in each part of the conversation, and the concepts that were not correctly covered. A tutor said that several very important aspects of the conversation are almost impossible to determine correctly without using PolyCAFe or other analysis tools.


Ref: VT8

Validation topic (feature and claim): Easier to maintain consistency of feedback (between different tutors, for different learners)
Category: Pedagogic Effectiveness
Language of pilot: English
Native language of participants: Romanian

Stakeholder(s): Tutors
Indicator: The tutors agree that their feedback is more consistent when using PolyCAFe
Methodology: Questionnaire, Focus group, Comparison of feedback with and without the system
Type & no. of participants: 5x tutors

Summative results with respect to validation indicator
Workpackage-specific statements:
Tutors-18. Using PolyCAFe improves the consistency of the scope and quality of the feedback delivered to the students by different tutors.: Mean=4.80, SD=0.40, Agree/Strongly Agree = 100% (n=5)
General statements: None

All the tutors agree that using PolyCAFe improves the consistency of the feedback they provide to the students, without any loss in the quality or scope of the feedback (80% of them strongly agree – question 18). Between two and four tutors provided feedback for chat conversations with the system (2 chats per tutor) and without it (4 chats per tutor), and then all the tutors compared the results. All the tutors agreed that the feedback offered after using the system has a more homogeneous structure and is therefore more consistent.

Formative results with respect to validation indicator

The tutors suggested that the feedback reported by PolyCAFe helps them structure the final feedback delivered to the students, even if this happens without their planning it. They liked very much the division of the feedback between the whole conversation, the participants and the utterances. In addition, a further separation between feedback on content and feedback on involvement and collaboration for each utterance would be useful.

Ref: VT9

Validation topic (feature and claim): The feedback and grading offered by the system are consistent with the results of the tutors / experts
Category: Pedagogic Effectiveness
Language of pilot: English
Native language of participants: Romanian

Stakeholder(s): Tutors
Indicator: Most tutors consider that the automatic feedback and grading offered by the system is consistent / similar with their own feedback. The number of mistakes identified by the tutors in the automatic feedback should be as low as possible, and the number of serious mistakes (that might mislead the student) should be zero.


Methodology: Questionnaire, Focus group
Type & no. of participants: 5x tutors

Summative results with respect to validation indicator
Workpackage-specific statements:
Tutors-19. Overall, the feedback offered by the system is consistent with your opinion.: Mean=4.25, SD=0.43, Agree/Strongly Agree = 100% (n=5)
Tutors-20. Overall, the gradings and other indicators offered by the system are consistent with your point of view.: Mean=4.80, SD=0.40, Agree/Strongly Agree = 100% (n=5)
General statements:
Tutors-10. I feel confident using PolyCAFe (Chat Analysis and Feedback Service).: Mean=4.20, SD=0.40, Agree/Strongly Agree = 100% (n=5)
Tutors-11. Overall, I am satisfied with PolyCAFe (Chat Analysis and Feedback Service).: Mean=4.00, SD=0.00, Agree/Strongly Agree = 100% (n=5)

All the tutors felt confident and satisfied using PolyCAFe (Questions 10, 11).

Furthermore, all of them agreed that the overall results are consistent with their own opinion (Question 19).

None of the tutors found seriously misleading information in the results offered by each of the 5 feedback and support widgets, although for some widgets (e.g. participant feedback) some of the tutors were not very confident in the results, remaining only neutral.

Formative results with respect to validation indicator

On the whole, the tutors reported that the feedback and results are consistent with their opinion, although several small issues were observed (especially related to the participant and utterance feedback widgets). It was agreed to run a more formal validation, using measurements, comparing the opinions the tutors form without the widgets against the results returned by the services.

Ref: VT10

Validation topic (feature and claim): The feedback and grading offered by the system are trusted by the tutors
Category: Satisfaction
Language of pilot: English
Native language of participants: Romanian

Stakeholder(s): Tutors
Indicator: Most tutors consider that the automatic feedback and grading offered by the system can be trusted. Degree of trust in the automatic feedback and grading offered by the system.
Methodology: Questionnaire, Focus group
Type & no. of participants: 5x tutors


Summative results with respect to validation indicator
Workpackage-specific statements:
Tutors-22. I have not found any errors or misleads in the information presented in the conversation feedback widget.: Mean=4.20, SD=0.40, Agree/Strongly Agree = 100% (n=5)
Tutors-24. I have not found any errors or misleads in the information presented in the participant feedback widget concerning the participation of the students in the conversation.: Mean=3.80, SD=0.75, Agree/Strongly Agree = 60% (n=5)
Tutors-26. I have found that the most important utterances in the conversation are correct and correspond to the ones identified by me.: Mean=4.00, SD=0.00, Agree/Strongly Agree = 100% (n=5)
Tutors-28. I have found that the classification of the utterances according to the speech act and argumentation is correct.: Mean=4.00, SD=0.63, Agree/Strongly Agree = 80% (n=5)
Tutors-31. I agree with the results returned by the search widget on the importance of utterances and participants.: Mean=4.40, SD=0.49, Agree/Strongly Agree = 100% (n=5)
General statements: None

See VT9, the feedback and grading were trusted by the tutors.

Formative results with respect to validation indicator

The results are trusted by the tutors on the whole. However, special cases of misleading results were identified that should be fixed in the next version. It was agreed to run a more formal validation, using measurements, comparing the opinions the tutors form without the widgets against the results returned by the services.

Ref: VT11

Validation topic (feature and claim): The feedback and grading offered by the system are trusted by the learners
Category: Satisfaction
Language of pilot: English
Native language of participants: Romanian

Stakeholder(s): Learners
Indicator: Most learners consider that the automatic feedback and grading offered by the system can be trusted. Degree of trust in the automatic feedback and grading offered by the system.
Methodology: Questionnaire, Focus group
Type & no. of participants: 9x undergraduate students Year 4

Summative results with respect to validation indicator Workpackage-specific statements:


Learners-18. I have not found any errors or misleads in the information presented in the conversation feedback widget.: Mean=2.00, SD=0.68, Agree/Strongly Agree = 0% (n=9)
Learners-20. I have not found any errors or misleads in the information presented in the participant feedback widget concerning my participation in the conversation.: Mean=3.00, SD=0.94, Agree/Strongly Agree = 22% (n=9)
Learners-21. I have not found any errors or misleads in the information presented in the participant feedback widget concerning the others’ participation in the conversation.: Mean=2.67, SD=0.47, Agree/Strongly Agree = 0% (n=9)
Learners-23. I have found that the most important utterances in the conversation are correct and correspond to the ones identified by me.: Mean=3.89, SD=0.87, Agree/Strongly Agree = 56% (n=9)
Learners-25. I have found that the classification of the utterances according to the speech act and argumentation is correct.: Mean=3.56, SD=1.07, Agree/Strongly Agree = 56% (n=9)
Learners-28. I agree with the results returned by the search widget on the importance of utterances and participants.: Mean=2.89, SD=0.87, Agree/Strongly Agree = 33% (n=9)
General statements: None

On the whole, learners trust the results of the services. However, specific results were not trusted or were rated neutral: the conversation feedback, participant feedback and search conversation widgets are the least trusted by learners, in this order (Questions 18, 20, 21 and 28).

Formative results with respect to validation indicator

The learners observed the same misleading results as the tutors and, because they are directly affected by the system, insisted that these made them distrust some of the feedback and indicators computed by the system. Furthermore, they said that they want to know how the system works in order to trust the results more than in the present setting. To solve this issue, they suggested a help system that explains (in simple words) how each piece of feedback is computed and what the main criteria used for delivering it are. They insisted that this way they would know better what to improve in the future.

Section 4: Results – inductive stakeholder validation activities

Ref: S5-1

VALIDATION EVENT
Methodology: Think-aloud during the use of the software; focus group following use of the software.

Type & no. of participants: 9x undergraduate students Year 4, 5x tutors
Language of pilot: English
Native language of participants: Romanian


Stakeholder(s): Learners, tutors

Additional formative results (not associated with validation topics) Including results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Both students (especially) and tutors had difficulty understanding some of the scores. A short guide to interpreting the reports was requested, as was more on-screen help.

Suggestion: use of tool-tips to guide the user.

Ref: S5-2

VALIDATION EVENT
Methodology: Think-aloud during the use of the software; focus group following use of the software.
Stakeholder(s): Learners, tutors

Type & no. of participants: 9x undergraduate students Year 4, 5x tutors
Language of pilot: English
Native language of participants: Romanian

Additional formative results (not associated with validation topics) Including results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Inter-widget communication would be useful especially between the conversation visualization, utterance feedback and search conversation widgets (e.g. selecting an utterance in the utterance feedback widget to be reflected in the graphical visualization by focusing on the same utterance). This was the opinion of both learners and tutors and the LTfLL team agrees with it.

Ref: S5-3

VALIDATION EVENT
Methodology: Interview
Stakeholder(s): Head of the Computer Science Department

Type & no. of participants: 1x Head of the Computer Science Department
Language of pilot: English
Native language of participants: Romanian

Additional formative results (not associated with validation topics) Including results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Scenario-related: It is important to support chatting because this situation is very similar to real life (workplace situations): students have to think fast and use all their knowledge without having much time to reflect. The discourse is not artificial and is based on the students’ own knowledge.


Advantage: It is difficult for teachers to analyze real-time conversations, so tools are needed to help them provide a fair assessment to the students.

Comments:
1. The tools should be validated by teachers and tutors in order to grade the feedback delivered by the application. It is important to have a good evaluation of the system's feedback by teachers and tutors before using it at large scale.
2. A period of accommodation to the instruments is essential.
3. It is very important to have such an instrument because it may stimulate students to participate in the conversation and collaborate.
4. The evaluation / assessment should be well explained in order for the students to understand the feedback.
5. Threads: it is important to recommend resources, after a conversation, about the concepts that were not covered.
6. Would definitely recommend the services not only to colleagues in the engineering departments, but also for social sciences, marketing, economics, etc.

Section 5: Summary of results – verification activities

ALPHA TESTING Outstanding Issues

Outstanding challenges / opportunities from alpha testing (major points only):

1. A part of the functionality from the two-page scenario needs to be implemented in the next round of development.
2. Verification must be a key part of the next validation round.
3. The information displayed in several widgets is not self-explanatory to users.
4. Several usability issues have been identified.
5. There should be a mechanism for signalling feedback errors and saving this information in the system (or rating the feedback).

How do these challenges/opportunities inform the next round of design and development?

Changes you will make in Round 3:
1. Implement the rest of the features.
2. Build a corpus manually analyzed by tutors for most of the indicators and feedback computed by the system.
3. Improve usability: provide tool-tips and visual markers to make the feedback easier to understand.
4. Provide separate grades for the content and collaboration of each utterance.


5. Provide a rating mechanism for the feedback.

How do these challenges/opportunities inform the roadmap for the end of the project?

Desirable changes that cannot be scheduled during the project, or changes started in Round 3 that may not be complete by the end of the project (e.g. LSA improvements). 1. The verification should continue after the end of the project (taking into account other languages and domains).

Verification of language technologies (reported in full in D5.3)

Purpose of activity (or research question): The relevance / accuracy of feedback. The quality of feedback is a concern for students, tutors, teaching managers and the LTfLL development team.

How does this work inform the next round of design and development, and the roadmap for the end of the project?

1) A corpus of manually annotated feedback and indicators, very similar to those provided by the system, has started to be built in order to assess the accuracy of the feedback and to tune the indicators if needed. 2) A mechanism for signaling errors / poor feedback must be implemented.

Section 6: Summary of results from validation associated with dissemination activities

Date: 10/04/10
Event and location: Conference, CSEDU 2010
Type and length of presentation: International conference (40 minutes)
Type of attendees: Professors and researchers in education and computer science; TEL developers
Number of attendees: 10-20
Validation method: Focus group, discussions, comments
Main results:
- Suggestion that the tutors scored higher because PolyCAFe is helping them reduce the time required to provide feedback, therefore they could be biased by this factor
- The time and resources required to set up PolyCAFe in a new context are very important

Date: 15/03/10
Event and location: IBM Academic Days for Universities in Romania, Cluj-Napoca, Romania
Type and length of presentation: National forum (30 mins)
Type of attendees: Professors and researchers in education and computer science; TEL developers; company and product managers
Number of attendees: 70
Validation method: Discussions, comments
Main results:
- Explored the difficulties of adopting PolyCAFe in a company environment (where the documents needed to train LSA are not very didactic or scientific, but focused on competencies and processes)
- LSA might not be appropriate for documents focused on competencies or processes

Date: 30/06/09
Event and location: Workshop, IRIT Toulouse
Type and length of presentation: Workshop (20 mins)
Type of attendees: Professors and researchers in computer science
Number of attendees: 25
Validation method: Discussions, comments
Main results:
- Suggestion to use LSA instead of a domain ontology, or to combine the two approaches (maybe even compare them)

Section 7: Recommendations from second validation round

Ref: VT3
Validation topic (claim): The feedback and reports delivered by the system are considered useful by the students and tutors, and relevant by the teachers
Recommendations for next steps: Learners were not satisfied by the information presented in the conversation and participant feedback widgets. Improve these indicators and this feedback.
Section of scenario / software unit affected: Conversation & participant feedback widgets
By when: September 2010

Ref: VT3
Validation topic (claim): The feedback and reports delivered by the system are considered useful by the students and tutors, and relevant by the teachers
Recommendations for next steps: Students considered the participant feedback information not very useful because the learners were given scores according to their level in the chat group. However, the tutors agreed with this approach. Alternative solutions should be considered.
Section of scenario / software unit affected: Participant feedback widget
By when: September 2010


Ref: VT10
Validation topic (claim): The feedback and grading offered by the system are trusted by the tutors and learners
Recommendations for next steps: Correct the erroneous feedback in the next version and provide explanations of how the feedback is computed.
Section of scenario / software unit affected: All widgets
By when: September 2010

Ref: S5-1
Validation topic: Usability
Recommendations for next steps: Both students (especially) and tutors had difficulty understanding some of the scores. A short guide to interpreting the reports was requested, as was more on-screen help.
Section of scenario / software unit affected: All widgets
By when: September 2010

Ref: S5-2
Validation topic: Usability
Recommendations for next steps: Improve the presentation and formatting of the widgets.
Section of scenario / software unit affected: All widgets
By when: September 2010

Section 8: Conclusions from second validation round

Pedagogic effectiveness

We conclude that PolyCAFe is pedagogically effective in the following respects:
- It provides quick feedback to learners who participated in a chat conversation related to a course topic.
- It suggests participants who have had a good coverage of specific concepts in the discussion.
- The visualization widget offers a simpler and better understanding of chat conversations.
- The quality of the final feedback delivered by the tutors to learners increases by using the service.
- The consistency (between different tutors) of the final feedback delivered by the tutors to learners increases by using the service.
- The division of the feedback and the information presented in the 5 widgets is considered useful and relevant by students, tutors and teachers.

The following aspects were not considered pedagogically effective by some of the stakeholders:
- Students considered the participant feedback information not very relevant because the learners were given scores according to their level in the chat group. However, the tutors agreed with this approach. Alternative solutions should be considered.

Relevance

PolyCAFe is considered relevant by all the stakeholders, especially because it is important to support chatting: this situation is very similar to real life (workplace situations), and students have to think fast and use all their knowledge without having much time to reflect. However, the system should be validated and verified by several teachers and tutors in order to assess the quality of the feedback and the utility that it provides.

The service is relevant for other areas as well, especially for the social sciences.

Satisfaction

We conclude that satisfaction with PolyCAFe appears to be good in the following areas/aspects of the service:
- For tutors, all the results were satisfactory.
- Learners were very satisfied with the conversation visualization and the utterance feedback, and less satisfied with the search conversation widget.

Satisfaction was found to be less good in the following areas:
- Learners were not satisfied by the information presented in the conversation and participant feedback widgets. They required more information about how the system computes the feedback and more details about how to use each indicator and piece of feedback provided by the system.

Usability

We conclude that the usability of PolyCAFe appears to be good in the following areas/aspects of the service:
- The users found the system easy to use, easy to navigate and easy to recover from errors.
- Less mental effort is needed by both students and tutors to complete their tasks with PolyCAFe than without it.

The following aspects of usability require improvement:
- Implementation of inter-widget communication to provide users with enhanced functionality and user experience (e.g. selecting an utterance in the utterance feedback widget should be reflected in the graphical visualization by focusing on the same utterance).
- Improve the presentation and formatting of the widgets.
- Provide tool-tips and a help mechanism.

Efficiency

We conclude that PolyCAFe appears to be efficient in the following respects:
- The time needed for the analysis of chats and forums is very good (the service can do some more analysis without becoming inefficient).
- The time taken by the tutors to provide final feedback to the students has improved by 25-33% using PolyCAFe.

Transferability

The following information is useful with regard to possible transfer of the service to other domains, languages, and pedagogic and organisational settings.

Strengths / opportunities:
- There is no alternative product for this task at the current moment.
- For most languages, the NLP pipe and the linguistic and domain ontologies are freely available.

Weaknesses / threats (including competing systems):
- Finding the right components for the LSA pipe for some languages.
- Some languages do not have freely accessible linguistic and domain ontologies.


Appendix B.4: Validation Reporting Template – WP5.1 (UNIMAN)

'Showcase' Pilot in English at UNIMAN

Section 3: Results – validation/verification of Validation Topics listed in the validation scenario

Summary – results

Only a sub-set of the WP5.1 validation topics could be validated at UNIMAN, as the service had to be demonstrated. In particular, VT2 ("Students obtain feedback just-in-time for their participation in a discussion forum") could not be tested.

Ref | Validation topic (feature and claim) | Category (effectiveness, usability etc) | Validated unconditionally | Validated with qualifications* | Not validated (*Qualifications to validation)

VT3 | The feedback and reports delivered by the system are considered useful and relevant | Pedagogic Effectiveness
VT4 | Using PolyCAFe mediated collaboration has the potential to improve the learning outcomes of the students | Pedagogic Effectiveness
VT5 | The visualization offers a better understanding of the chat conversation | Pedagogic Effectiveness

Details

Ref: VT3
Validation topic (feature and claim): The feedback and reports delivered by the system are considered useful by the students and tutors, and relevant by the teachers
Category: Pedagogic Effectiveness
Language of pilot: English
Native language of participants: English
Stakeholder(s): Students (3), Tutor (1)
Indicator: Most students report that the feedback and reports are useful for improving their future activity. The tutors report that the feedback and reports are useful for analysing the activity of the students. The teacher considers that the reports have a high degree of relevance.
Methodology: Questionnaire and focus group for students and tutors; questionnaire and interview for the teacher
Type & no. of participants: 3x undergraduate students Year 3, 1x Tutor

Summative results with respect to validation indicator
Workpackage-specific statements: None
General statements:
Learners-1. Overall, the support provided by PolyCAFe (Chat Analysis and Feedback Service) was relevant to my learning activities.: Mean=4.67, SD=0.47, Agree/Strongly Agree = 100% (n=3)
Learners-2. Overall, PolyCAFe (Chat Analysis and Feedback Service) helped me to successfully complete learning tasks.: Mean=4.33, SD=0.47, Agree/Strongly Agree = 100% (n=3)
Learners-4. Overall, I believe that PolyCAFe (Chat Analysis and Feedback Service) provides adequate support for my learning.: Mean=4.67, SD=0.47, Agree/Strongly Agree = 100% (n=3)
Learners-5. Overall, I believe that the support for my learning PolyCAFe (Chat Analysis and Feedback Service) provides is close enough to the current support provided by humans.: Mean=4.00, SD=0.82, Agree/Strongly Agree = 67% (n=3)
Tutor-1. Overall, the support provided by PolyCAFe (Chat Analysis and Feedback Service) is relevant to my teaching activities.: Score: 4, Agree = 100% (n=1)
Tutor-4. Overall, I believe that PolyCAFe (Chat Analysis and Feedback Service) provides adequate support for teaching.: Score: 4, Agree = 100% (n=1)

Formative results with respect to validation indicator

Student: “There was lots of useful info very quickly, you could see what you’d done well, what you’d done poorly.”
Tutor: “You can see who's participated, levels of participation are interesting to see who's contributed and who hasn't. Also, you can see the key topics of interest, what the focus is. In terms of teaching, it can inform your long term planning - you can see the areas of interest and develop the next learning outcomes from that.”


Ref: VT4

Validation topic (feature and claim): Using PolyCAFe mediated collaboration has the potential to improve the learning outcomes of the students
Category: Pedagogic Effectiveness
Language of pilot: English
Native language of participants: English

Stakeholder(s): Students
Indicator: Most students consider that the use of the system has improved their learning outcomes by providing them information about what they have done wrong
Methodology: Questionnaire, Focus group
Type & no. of participants: 3x undergraduate students Year 3

Summative results with respect to validation indicator
Workpackage-specific statements: None
General statements:
Learners-4. Overall, I believe that PolyCAFe (Chat Analysis and Feedback Service) provides adequate support for my learning.: Mean=4.67, SD=0.47, Agree/Strongly Agree = 100% (n=3)

Formative results with respect to validation indicator

[b] “…Would make you work more as a team, come to a general conclusion as a group.”
[c] “I think it’ll make it more of a discussion instead of just posting something up, which is what it is at the moment really. Gives you the opportunity to bounce off each other.”
[d] “It’s more of a challenge, which is good. Over time you’d be able to hopefully see improvements in the way you’re performing in discussions.”

Ref: VT5

Validation topic (feature and claim): The visualization offers a better understanding* of the chat conversation
* Better understanding or insight means being able to extract more details about the chat in less time than with a usual text or chat-log style alternative, and being able to understand what went well or wrong in the conversation with regard to collaboration and content (in order to know what to improve).
Category: Pedagogic Effectiveness
Language of pilot: English
Native language of participants: English

Stakeholder(s): Students
Indicator: Most students report that the visualization offers them a better insight into the discussion
Methodology: Questionnaire, Focus group
Type & no. of participants: 3x undergraduate students Year 3


Summative results with respect to validation indicator
Workpackage-specific statements: None
General statements:
Learners-29. The conversation visualization provides a simpler understanding of the conversation (in order to understand the structure and collaboration in the conversation) compared to a simple text presentation of the chat.: Mean=4.33, SD=0.47, Agree/Strongly Agree = 100% (n=3)
Learners-30. The conversation visualization provides a better understanding of the conversation (in order to understand the structure and collaboration in the conversation) compared to a simple text presentation of the chat.: Mean=4.33, SD=0.47, Agree/Strongly Agree = 100% (n=3)
Learners-31. The conversation visualization allows me to find out when I should have participated more in the conversation.: Mean=3.67, SD=1.25, Agree/Strongly Agree = 67% (n=3)
Learners-32. The conversation visualization allows me to find out what concepts I have not covered in a given part of the conversation.: Mean=4.00, SD=0.82, Agree/Strongly Agree = 67% (n=3)

Formative results with respect to validation indicator “[d] it lets you go back and reflect on EXACTLY what you said, if your argument was torn to pieces, you could see why”

Section 4: Results – inductive stakeholder validation activities

Ref: S5-1

VALIDATION EVENT Methodology: Stakeholder(s):

Type & no. of participants: 3rd Year Medical Students x 3 Tutor x1 Language of pilot: English Native language of participants: English

Additional formative results (not associated with validation topics) Including results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Suggestion to use different colours to represent different participants in a discussion in the visualisation tool.

Section 5: Summary of results – verification activities

ALPHA TESTING Outstanding issues

Outstanding challenges / opportunities from alpha testing (major points only):

PolyCafe would not run on the standard desktop at UNIMAN. PolyCafe also would not run on personal laptops via the wireless network at UNIMAN. There appears to be a security issue, as PolyCafe runs successfully on UNIMAN-owned laptops with registered MAC addresses.

How do these challenges/opportunities inform the next round of design and development?

Investigations are under way to determine how participants can access PolyCafe in the next round.

How do these challenges/opportunities inform the roadmap for the end of the project?

Documentation for new users needs to make it clear what the technical requirements are for using the system.

Verification of language technologies (reported in full in D5.3)

Not undertaken at UNIMAN.

Section 6: Summary of results from validation associated with dissemination activities

Date | Event and location | Type and length of presentation | Type of attendees | Number of attendees | Validation method | Main results

Section 7: Recommendations from second validation round

Ref | Validation topic (claim) OR high-level question | Recommendations for next steps | Section of scenario / software unit affected | By when (date)

1 | All topics | Need for training of the tool to support medicine: development team to produce a version that is trained to work with a medical background corpus. | | 1st October 2010

2 | All topics | Develop a usage scenario for the Medical PBL curriculum | | 1st October 2010

Section 8: Conclusions from second validation round

Pedagogic effectiveness

Students saw real value in the tool as an indicator of their participation. Students identified the results of a conversation analysis as presenting them with a challenge for personal improvement. The tutor saw limited value over and above what they already have in the reporting from WebCT.

Relevance

Satisfaction

Students felt that the measures of performance used in the analysis were more objective, and the results were provided more quickly than a tutor or facilitator intervention.

Usability

The tutor felt that the tool required refinement to make it more usable.

Efficiency

The tutor identified that the tool may increase workload rather than decrease it in its present form. There needs to be more concise summary reporting at the group level.

Transferability

Students and the tutor identified the potential of the tool to support medicine and could see how it related to their practice.


Appendix B.5: Validation Reporting Template – WP5.2 (UPMF) Pilot in French at UPMF

Section 3: Results - validation/verification of Validation Topics listed in the validation scenario

Summary – results

Ref | Validation topic (feature and claim) | Category (effectiveness, usability etc.) | Validated unconditionally | Validated with qualifications* | Not validated | *Qualifications to validation
VT1 | Optimize learning outcomes by explorative and self-regulated reading | pedagogic effectiveness | | | x | There is no evidence to support this claim.
VT2 | Students can get feedback on their productions as often as they want. | pedagogic effectiveness | | x | | Feedback too slow
VT3 | The teacher’s activity is more directed on higher levels of students’ activity (guidance, collaborative learning management, production assessment on style). | Efficiency | x | | |
VT4 | Users find the software easy to use | Usability | | x | | Usability of Pensum is satisfactory with minor changes required
VT5 | Users’ cognitive load is reduced. | Usability | x | | |
VT6 | Students can get correct feedback | Satisfaction | | x | | The feedback is not satisfactory but could be improved


Details

Ref: VT1

Validation topic (feature and claim): Optimize learning outcomes by explorative and self-regulated reading Category: pedagogic effectiveness Language of pilot: French Native language of students: Predominantly French

Stakeholder(s): students Indicator:

Positive answers to a Likert questionnaire on students’ judgments about the system (e.g., does the system enable you to regulate your understanding?).

Number of requests for feedback from Pensum. Methodology: questionnaire + activity records in the database from the students’ use of Pensum. Type & no. of participants: 11 students in educational sciences and linguistics (years 4 & 5)

Summative results with respect to validation indicator Values of answers to the Likert questionnaire: Pensum received medium ratings on opinion questions related to learning (question 1: M = 2.45; SD = .93), help (question 2: M = 2.18; SD = .60) and scaffolding (question 4: M = 2.64; SD = 1.03). Mean number of feedback requests: Students requested feedback about four times on average, but with fairly high variation (M = 4.27; SD = 2.97), which indicates important interpersonal differences in the use of Pensum.

Ref: VT2

Validation topic (feature and claim): Students can get feedback on their productions as often as they want. Category: pedagogic effectiveness Language of pilot: French Native language of students: Predominantly French

Stakeholder(s): students Indicator: Number of requests for feedback from Pensum; students’ point of view about feedback response time (phone interview) Methodology: phone interview + activity records in the database from the students’ use of Pensum Type & no. of participants: 11 students in educational sciences and linguistics (years 4 & 5)

Summative results with respect to validation indicator


Mean number of feedback requests: The students were free to work with Pensum and request feedback over a period of ten days. They logged on only once or twice, and requested feedback about four times on average. However, fairly high variation was found (M = 4.27; SD = 2.97), which indicates important interpersonal differences in the use of Pensum.

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Point of view from a phone interview: Several questions during the phone interview allowed students to formulate their opinion about the feedback. Students felt that Pensum processes the feedback too slowly. Five students out of eleven mentioned this problem when asked about possible enhancements to this service.

Ref: VT3

Validation topic (feature and claim): The teacher’s activity is more directed on higher levels of students’ activity (guidance, collaborative learning management, production assessment on style). Category: Efficiency Language of pilot: French Native language of students: Predominantly French

Stakeholder(s): teacher Indicator: Positive answers to a Likert questionnaire on the teacher’s judgments about the system (e.g., do you think that revising a student synthesis is easier for you, with fewer comments to make about the synthesis’ coherence, completeness, etc.?).

Methodology: questionnaire Type & no. of participants: 1 teacher in computer science/TEL

Summative results with respect to validation indicator

Values of answers to the Likert questionnaire: The interviewed teacher thinks that Pensum can help teachers focus on higher levels of students’ activity (question 8 on the fluency of learning with Pensum, question 9 on the fluency of navigating in Pensum, and question 10 on the fluency of going back in Pensum). In particular, he answered 4 or 5 (out of 5) to questions concerning addressing problems encountered by students (question 8), the feedback provided (question 9), and which decisions to take to enhance students’ learning (question 10).

Ref: VT4

Validation topic (feature and claim): Users find the software easy to use Category: usability Language of pilot: French Native language of students: Predominantly French

Stakeholder(s): students Indicator: The majority of users agree that the system is easy to use (questions 6 to 8); students’ point of view about the ergonomics Methodology: questionnaire + phone interview


Type & no. of participants: 11 students in educational sciences and linguistics (years 4 & 5)

Summative results with respect to validation indicator

Values of answers to the Likert questionnaire: The results regarding usability are contradictory. On the one hand, students agree that Pensum is easy to use (question 6: M = 3.81; SD = .75; question 8: M = 3.80; SD = 1.03), and all but one used it directly to write their synthesis and to get feedback.

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Point of view from a phone interview: Three students judged that Pensum’s windows were too small to comfortably manage their activity. Care must be taken in developing the widgetised version to make sure that Pensum's windows are large enough.

Ref: VT5

Validation topic (feature and claim): Users’ cognitive load is reduced. Category: usability Language of pilot: French Native language of students: Predominantly French

Stakeholder(s): students Indicator: The cognitive load measured by the NASA-TLX questionnaire is low.

Methodology: NASA-TLX questionnaire Type & no. of participants: 11 students in educational sciences and linguistics (years 4 & 5)

Summative results with respect to validation indicator

Values of answers to the NASA-TLX questionnaire: On the other hand, the NASA-TLX-derived questions showed that the greatest effort was spent on working out how to benefit from Pensum and how to use it (4 students out of 11), together with frustration (4 students out of 11), compared with the other scales, such as mental demand (2 students), temporal demand (1 student), and performance (0 students).

Ref: VT6

Validation topic (feature and claim): Students can get correct feedback Category: satisfaction Language of pilot: French Native language of students: Predominantly French

Stakeholder(s): students, teacher Indicator: Positive answers to a questionnaire about opinions on the validity of feedback; students’ point of view about the quality of feedback Methodology: questionnaire + phone interview


Type & no. of participants: 11 students in educational sciences and linguistics (years 4 & 5) + teacher/tutor in TEL.

Summative results with respect to validation indicator Values of answers to the questionnaire: In a question on feedback validity and pertinence, only 10% of students were fully satisfied with Pensum’s feedback, while 30% of them found it neither pertinent nor valid. However, the form of feedback presentation is not, in itself, to be reconsidered, since 64% of the students found it easy to understand (question 3b).

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Point of view from a phone interview: When asked about likely enhancements to Pensum, five students out of eleven mentioned feedback validity concerns.

Section 4: Results – inductive stakeholder validation activities

Ref: S4-1

VALIDATION EVENT Methodology: Summative questionnaires and phone interview Stakeholder(s): Students

Type & no. of participants: 11 students in educational sciences and linguistics (years 4 & 5) Language of pilot: French Native language of participants: French

Additional formative results (not associated with validation topics) Including results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan To help students write their syntheses, Pensum offers a notepad organised in four parts according to the kind of notes (prior knowledge, main ideas of the source texts to include in the synthesis, the problematic of the source texts, and difficulties encountered). This functionality was much appreciated: the students found it interesting and also proposed some enhancements, some of which are now under implementation. Most students (82%) actually paid attention to Pensum’s feedback, but only 36% said they revised their synthesis accordingly. Students gave the following reasons for not taking Pensum’s feedback into account: too many problems identified; lack of validity; and the form and lexical requirements of the system being too strong (a synthesis does not have to stick to the source texts).

Ref: S4.2

VALIDATION EVENT Methodology: Summative questionnaires and phone interview

Type & no. of participants: 1 teacher, 11 students in educational sciences and linguistics (years 4 & 5)


Stakeholder(s): Teacher & students Language of pilot: French Native language of participants: French

Additional formative results (not associated with validation topics) The interviewed teacher suggested using Pensum as a starter before a debate; this point supports the integration process. One could begin with Pensum to set out ideas on a given subject, then move to a chat in order to discuss them (see WP 5.1). This scenario is better than a single debate, in which usually only a few students take the floor. Pensum is also of interest with regard to course revision and learning. Five students expressed through the formative questionnaire their willingness to use Pensum to revise their courses. Among them, two students suggested that Pensum could be useful for learning courses from notes taken within the service. One student would use Pensum for retrieving information more quickly, another for understanding course texts and, finally, two others for assistance in the task of writing syntheses. Overall, all these different uses suggest that Pensum could be used to foster students’ self-regulated learning. Moreover, one student suggested adding to Pensum feedback displaying the keywords of several given course texts.

Section 5: Summary of results – verification activities

ALPHA-TESTING Outstanding Issues

Outstanding challenges / opportunities from alpha testing (major points only):

1. Feedback: lack of relevance, delay in display.
2. Saving issues for student syntheses, due to the database format.
3. Missing return characters, spelling mistakes, etc. in the source texts.
4. Ergonomics problems: hyperlinks in the first menu to be replaced by buttons; an alternative presentation format for feedback is lacking; lack of a “hold” message during feedback processing.

How do these challenges/opportunities inform the next round of design and development?

Changes to be made in Round 3:
1. The computational model of the feedback and the waiting time for its display will be improved for version 2.
2. The previous suggestions will be taken into account.
3. Bugs related to synthesis storage will be corrected (in both the database and the code); data in the database will be modified to correct spelling mistakes and return characters.

How do these challenges/opportunities inform the roadmap for the end of the project?

Desirable changes that cannot be scheduled during the project, or changes started in Round 3 that may not be complete by the end of the project (e.g. LSA improvements).


1. A better integration with other LTfLL services and with already-existing Elgg widgets.

Verification of language technologies (reported in full in D5.3)

Purpose of activity (or research question) | Conclusion(s) drawn from activity (or answer to research question) | How does this work inform the next round of design and development? | How does this work inform the roadmap for the end of the project?
The relevance/accuracy of feedback | The quality of feedback is a concern for many students | 1) by focusing the software development on enhancing the current functionalities; 2) by running an additional experiment to match students’ answers on some syntheses and source texts against the results of the current model and of new models (e.g., asking teachers to tag or underline sentences to indicate the important ones and their topic); these additional data could help improve the computational cognitive models; 3) by modifying the algorithms to improve the processing time and the display of feedback. | This would lead to a more reliable tool.

Section 6: Summary of results from validation associated with dissemination activities

Date | Event and location | Type and length of presentation | Type of attendees | Number of attendees | Validation method | Main results
September/October 2009 | Tests of a cognitive model based on LSA, about the detection of keywords | Tests in classroom | 2nd year master students | 30 | Quantitative analysis | The keywords extracted with LSA seem relevant, but they were quite different from those provided by participants


Section 7: Recommendations from second validation round

Ref | Validation topic (claim) if applicable | Recommendations for next steps | Section of scenario / software unit affected | By when (date)
VT1 | Optimize learning outcomes by explorative and self-regulated reading | Feedback format: allow students to add their own course documents | | For v2: Oct–Nov 2010
VT1 | Optimize learning outcomes by explorative and self-regulated reading | Use Pensum in threaded scenarios with WP 5.1, 4.1 and 6 services. | | After the project
VT3 | The teacher’s activity is more directed on higher levels of students’ activity (guidance, collaborative learning management, production assessment on style). | Alternate scenarios: more rewarding ones (beforehand written input), with face-to-face interaction, debates. | | After the project
VT6 | Students can get correct feedback | Feedback accuracy: validity, pre-assessment of important sentences | | For v2: Oct–Nov 2010

Section 8: Conclusions from second validation round

Pedagogic effectiveness

We conclude that Pensum is pedagogically effective in the following respects: it provides just-in-time feedback, as many times as students want; its notepad enables students to manage their task effectively; and the interviewed teacher found the current functionalities of Pensum pedagogically useful. The following aspects of Pensum were of limited or no pedagogic value: the feedback Pensum delivers is not currently sufficiently valid and pertinent, so we have to enhance the underlying computational models and the quality and validity of the associated feedback.


The interviewed teacher found that this version of Pensum lacks channels of communication between teacher and students.

Satisfaction

We conclude that students’ satisfaction with Pensum appears to be mixed. The kinds of feedback provided are satisfactory. However, satisfaction was lower in the following area: the feedback was judged insufficiently valid and pertinent; it could be improved by implementing and testing more valid pieces of feedback.

Usability

The following aspects of usability require improvement: the delivery of feedback might be enhanced so that students can grasp it more easily. As expected, given Pensum’s novelty, students dedicated most of their cognitive effort to working out how to benefit from Pensum (4 students out of 11) and how to use it (4 students out of 11). Once the tool is mastered, the cognitive effort spent on these aspects should decrease.

Efficiency

We conclude the following about Pensum’s efficiency: during its very first uses, Pensum demands a rather high cognitive load while students come to understand its functionalities. This load is expected to decrease once the use of Pensum is mastered.


Appendix B.6: Validation Reporting Template – WP6.1 (IPP-BAS; English) Pilot in English at IPP-BAS

Section 3: Results - validation/verification of Validation Topics listed in the validation scenario

Summary – results

Ref | Validation topic (feature and claim) | Category (effectiveness, usability etc.) | *Qualifications to validation
VT1 | The teacher can derive the main structure of a course, together with the relevant support content, which raises the pedagogical value. | Pedagogic effectiveness |
VT2 | The teacher can adapt easily the tools with respect to his needs in the given learning environment. | Pedagogic effectiveness | The general opinion was that the system could be more flexible and provide the user with the option to choose the way in which the information is organised and visualised.
VT3 | The ontology and lexicons provide language and content transferability. | Pedagogic effectiveness |
VT4 | When writing the curriculum/preparing a course, the manager/teacher consults the ontology in FLSS for finding the hierarchy of main concepts and their definitions. | Satisfaction |
VT5 | The manipulation (addition, deletion, changing, combining of data) over the repository of learning materials is fast and easy. | Usability | Automatic annotation too slow. The speed of these actions depends on the Internet connection.
VT6 | The teachers can choose a visualization(s) which best match(es) their purpose. | Efficiency | More visualisation modes are to be created to meet the expectations of the users to have an option to choose the visualization that is suited best for their particular needs, especially with respect to the granularity of the information.
VT7 | The teachers find quickly the relevant material via the semantic search. | Efficiency |
VT8 | The teacher can also annotate images which are part of the learning materials. | Pedagogic effectiveness |

Details

Ref: VT1

Validation topic (feature and claim): The teacher can derive the main structure of a course, together with the relevant support content, which raises the pedagogical value. Category: Pedagogic effectiveness Language of pilot: English Language of participants: Bulgarian

Stakeholder(s): Teachers Indicator: Teachers agree that the ontology and semantic search serve as ‘controllers’ of getting the main orientation information in the preparation phase Methodology: think alouds, questionnaire Type & no. of participants: 5 teachers

Summative results with respect to validation indicator

Specific: The FLSS helps me to derive the main structure of the course. M = 4.00, SD = NaN, Agree/Strongly Agree = 100% (n=5)
Generic: Overall, I believe that the FLSS provides adequate support for learning. M = 4.20, SD = 0.40, Agree/Strongly Agree = 100% (n=5)


Generic: Overall, the support provided by FLSS is relevant to my teaching activities. M = 4.20, SD = 0.40, Agree/Strongly Agree = 100% (n=5)
Generic: Overall, the FLSS helps me to complete my teaching tasks successfully. M = 4.20, SD = 0.49, Agree/Strongly Agree = 80% (n=5)

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Some of the respondents suggested that the ontology might be enriched with more concepts or more (types of) relations.

Ref: VT2

Validation topic (feature and claim): The teacher can adapt easily the tools with respect to his needs in the given learning environment. Category: Pedagogic effectiveness Language of pilot: English Language of participants: Bulgarian

Stakeholder(s): Teachers Indicator: Teachers agree that they can use various combinations of services in order to achieve their goals Methodology: think alouds, questionnaire Type & no. of participants: 5 teachers

Summative results with respect to validation indicator

Specific: The FLSS allows me to easily adapt available tools according to my needs. M = 3.20, SD = 0.40, Agree/Strongly Agree = 20% (n=5)
Specific: I can easily combine ontology browsing with semantic search to find appropriate learning materials within the repository of FLSS. M = 4.40, SD = 0.80, Agree/Strongly Agree = 80% (n=5)
Generic: It is easy to learn to use the FLSS. M = 4.20, SD = 0.75, Agree/Strongly Agree = 80% (n=5)
Generic: I feel confident using the FLSS. M = 4.40, SD = 0.49, Agree/Strongly Agree = 100% (n=5)

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Respondents commonly interpreted the statement “The FLSS allows me to easily adapt available tools according to my needs” with a focus on the visualization possibilities of the system, and they found the interface somewhat static.

Ref: VT3

Validation topic (feature and claim): The ontology and lexicons provide language and content transferability. Category: Pedagogic effectiveness Language of pilot: English Language of participants: Bulgarian

Stakeholder(s): Teachers Indicator: When searching in Bulgarian, the teachers admit that they can also find relevant materials in English Methodology: think alouds, questionnaire Type & no. of participants: 5 teachers

Summative results with respect to validation indicator

Specific: FLSS helps me to find appropriate learning materials. M = 4.00, SD = 0.63, Agree/Strongly Agree = 80% (n=5). (All teachers agreed that it is easy to use their native language, in this case Bulgarian, to retrieve documents in another language (English), e.g. “маркиращи езици” instead of “markup languages”.)
Generic: The information provided by FLSS supports my decisions on improving learning. M = 4.40, SD = 0.49, Agree/Strongly Agree = 100% (n=5)
Generic: Overall, the support provided by FLSS is relevant to my teaching activities. M = 4.20, SD = 0.40, Agree/Strongly Agree = 100% (n=5)

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

One of the teachers commented that in the domain of computer science, and in particular for the topic “Introduction to HTML”, most of the terms representing the basic concepts are borrowed from English or are loan translations. He suggested that cases where there is an asymmetry between the meanings of the terms in the two languages should be signalled explicitly.

Ref: VT4

Validation topic (feature and claim): When writing the curriculum/preparing a course, the manager/teacher consults the ontology in FLSS for finding the hierarchy of main concepts and their definitions. Category: Satisfaction Language of pilot: English Language of participants: Bulgarian

Stakeholder(s): Educational managers, teachers Indicator: Managers and teachers admit that the ontology provides structured content of a good quality Methodology: interview Type & no. of participants: 1 manager/teacher, 4 teachers

Summative results with respect to validation indicator

Specific: Consulting the ontology in FLSS to find the hierarchy of main concepts and their definitions is useful for me when designing courses. M = 4.20, SD = 0.75, Agree/Strongly Agree = 80% (n=5)
Generic: The support provided by the FLSS is complementary to my expertise. M = 4.20, SD = 0.49, Agree/Strongly Agree = 100% (n=5)

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

The teachers consulted the ontology not only at the beginning of the course preparation, but also when devising the list of recommended learning objects. One of them suggested that it would be nice to have a list of relevant documents available in the repository attached to every concept in the ontology tree. One of the teachers suggested implementing what he called “ontology-driven edit” – annotation with concepts from the ontology that come in the form of suggestions from the system while he types in the annotation box. The teaching manager said that the continuity in the educational process is reflected in the reusability of the learning materials, which is a plus.

Ref: VT5

Validation topic (feature and claim): The manipulation (addition, deletion, changing, combining of data) over the repository of learning materials is fast and easy Category: Usability Language of pilot: English Language of participants: Bulgarian

Stakeholder(s): Teachers Indicator: Teachers provide a number of manipulated documents and confirm that their work was facilitated by the services Methodology: think alouds, interview Type & no. of participants: 5 teachers

Summative results with respect to validation indicator

Specific: Enriching the repository of learning materials is easy. M = 4.00, SD = 0.63, Agree/Strongly Agree = 80% (n=5)
Specific: I can add as much information as needed in addition to the automatic annotation. M = 4.20, SD = 0.40, Agree/Strongly Agree = 100% (n=5)
Generic: It is easy to learn to use the FLSS. M = 4.20, SD = 0.75, Agree/Strongly Agree = 80% (n=5)
Generic: It is easy to navigate through the system. M = 4.00, SD = 0.89, Agree/Strongly Agree = 60% (n=5)

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Although the other types of manipulation of learning objects are easily done, the automatic annotation is still not fast enough, and that might slow down the work of the tutor if he has to prepare the document himself. This, however, was accepted as something to be expected and was not emphasized in the teachers’ comments.

Ref: VT6

Validation topic (feature and claim): The teachers can choose a visualization(s) which best match(es) their purpose. Category: Efficiency Language of pilot: English Language of participants: Bulgarian

Stakeholder(s): Teachers Indicator: Teachers use different visualizations in the same or in different tasks Methodology: think alouds Type & no. of participants: 5 teachers


Summative results with respect to validation indicator

Specific: I can choose a visualization that best matches my purposes. M = 3.40, SD = 1.02, Agree/Strongly Agree = 40% (n=5)
Specific: The visualization of the ontology is satisfactory. M = 3.40, SD = 0.80, Agree/Strongly Agree = 60% (n=5)
Specific: The visualization of the document annotation is clear and easy to use. M = 3.80, SD = 0.40, Agree/Strongly Agree = 80% (n=5)
Specific: I find the combined display of the chosen text and its annotation useful. M = 4.40, SD = 0.49, Agree/Strongly Agree = 100% (n=5)

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Here are some of the proposed changes: to have a more explicit visualisation of the relations between the learning objects, for example, to group similar objects together and show them not only as a list of titles; and to have different views in which a concept’s occurrences are highlighted at different text levels: sentence, paragraph, whole text or group of documents. The teaching manager commented that the interface is not “overloaded” with visual information, but that enriching it with more views would motivate the tutors to use it more intensively.

Ref: VT7

Validation topic (feature and claim): The teachers find quickly the relevant material via the semantic search. Category: Efficiency Language of pilot: English Language of participants: Bulgarian

Stakeholder(s): Teachers Indicator: Teachers find the relevant material with few searches in comparison with the text search Methodology: think alouds Type & no. of participants: 5 teachers

Summative results with respect to validation indicator

Specific: The semantic search allows me to quickly find relevant material. M = 4.40, SD = 0.49, Agree/Strongly Agree = 100% (n=5)
Generic: The support provided by the FLSS is tailored to the level of knowledge and skills of learners. M = 4.00, SD = 0.63, Agree/Strongly Agree = 80% (n=5)
Generic: It takes me less time to complete my teaching tasks using the FLSS than without the system. M = 4.20, SD = 0.40, Agree/Strongly Agree = 80% (n=5)

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

There were some recommendations about the visualization of the material: to show a measure of relevance and a snippet containing the query concept for every document in the retrieval list.


Ref: VT8

Validation topic (feature and claim): The teacher can annotate images which are part of the learning materials. Category: Pedagogic effectiveness Language of pilot: English Language of participants: Bulgarian

Stakeholder(s): Teachers Indicator: Teachers admit that they can add as much information as needed additionally to the automatic annotation Methodology: think alouds, questionnaire Type & no. of participants: 5 teachers

Summative results with respect to validation indicator

Specific: I find the facility of semantically annotating images useful, since images informatively complement the plain text. M = 4.60, SD = 0.49, Agree/Strongly Agree = 100% (n=5)
Generic: The FLSS helps me to provide learners with relevant feedback. M = 4.60, SD = 0.49, Agree/Strongly Agree = 100% (n=5)

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

Section 4: Results – inductive stakeholder validation activities

Ref: AF-v1

VALIDATION EVENT Methodology: interview Stakeholder(s): Teachers

Type & no. of participants: 5 teachers Language of pilot: English Native language of participants: Bulgarian

Additional formative results (not associated with validation topics)

The teachers proposed a strategy to keep track of the knowledge development that is supported by the recommended documents on each learning step (lesson). For this purpose they suggested devising a "dynamic ontology" that will "grow from scratch" or will be based on the background knowledge required for the course.

Ref: AF-i1

VALIDATION EVENT Methodology: interview Stakeholder(s): Teachers

Type & no. of participants: 5 teachers Language of pilot: English Native language of participants: Bulgarian


Additional formative results (not associated with validation topics)

An index of the concepts representative of the given course needs to be added. Links could then be established to their locations and to information about their contexts of usage.

Ref: AF-i2

VALIDATION EVENT Methodology: interview Stakeholder(s): Teachers

Type & no. of participants: 5 teachers Language of pilot: English Native language of participants: Bulgarian

Additional formative results (not associated with validation topics)

More statistical information should be represented in a visual way: concept frequency and relations (types and number). The teaching manager pointed out that details are not that important to him; he would like to get the information about a course in a more generalized mode, i.e. to have statistics about the coverage of the predefined expert knowledge and to be able to assess its quality in terms of novelty, completeness, effectiveness and so on.

Ref: AF-i3

VALIDATION EVENT Methodology: interview Stakeholder(s): Teaching manager

Type & no. of participants: 1 teaching manager Language of pilot: English Native language of participants: Bulgarian

Additional formative results (not associated with validation topics)

The system has to provide means to compare different courses designed for the same subject by different tutors or in different periods of time.

Section 5: Summary of results – verification activities

ALPHA-TESTING (ENGLISH VERSION)

Outstanding Issues

Outstanding challenges / opportunities from alpha testing (major points only):

1. the manual manipulation of the annotation can be made easier and more expressive for the users
2. the speed of the automatic annotation has to be improved


How do these challenges/opportunities inform the next round of design and development?

Changes you will make in Round 3:
1. the comment and concept annotations will be visualized differently, and with different operations
2. the automatic annotation pipe will be made faster

How do these challenges/opportunities inform the roadmap for the end of the project?

Desirable changes that cannot be scheduled during the project, or changes started in Round 3 that may not be complete by the end of the project (e.g. LSA improvements):
1. the manual manipulation of the data will be synchronized with the information from the ontology, peers' comments, etc., which will save time and be more comfortable for the users
2. various types of visualizations of the same results will be made available, so that users can choose with respect to their task and target audience

Verification of language technologies (reported in full in D6.3)

Research question: Are the markers of relevance on the retrieved learning material explicit enough for the learning purposes?
Conclusion: the semantic search already provides a set of highly relevant materials, ranked with respect to the query, but more and varied markers are necessary, since the tutor might want to focus on specific details. Such details are not made explicit at the moment.
Next round of design and development: addition of statistics, such as the number of concept occurrences in the result files (overall and per file), and precision vs. recall information.
Roadmap for the end of the project: addition of more sophisticated content-oriented metrics over the results.

Research question: Is the annotation coverage sufficient for supporting the learning goals?
Conclusion: domain annotation is often sparse, and consequently so is the search; both might be strengthened by common-sense annotation.
Next round of design and development: adding some probes of common-sense annotation (i.e. concept annotation over the general lexica).
Roadmap for the end of the project: addition of fully designed common-sense annotations on top of the domain-specific ones.


Section 6: Summary of results from validation associated with dissemination activities

Date: 10/03/10
Event and location: Seminar of the Communication and Informatics group at the Institute of Mathematics, BAS
Type and length of presentation: invited talk on the LTfLL project with focus on Task 6.1
Type of attendees: Tutors in IT, researchers in IT
Number of attendees: 10
Validation method: Focus group
Main results:
1. feedback on the T6.1 strategy; the most appreciated assets were:
- the re-usability of the learning material
- the fluctuation of steps to be followed, which saves time and helps to stay focused
2. a number of minor usability problems were noted

Date: 15/03/10
Event and location: Seminar of the Communication and Informatics group at the Institute of Mathematics, BAS
Type and length of presentation: hands-on activities, 2 hours
Type of attendees: Tutors in IT, researchers in IT, managers
Number of attendees: 6
Validation method: Focus group
Main results: feedback on software design.
Strengths:
- the interface is not overloaded, thus it is comprehensible
- semantic search ensures quick and relevant results
- the manual annotation, such as the Image Annotator, gives tutors freedom in their approaches and solutions
Concerns:
- the system administration costs, if adopted
- the time needed to get acquainted with the services
- overcoming the stereotype of not using such systems


Section 7: Recommendations from second validation round

VT1. Claim: The teacher can derive the main structure of a course, together with the relevant support content, which raises the pedagogical value.
Recommendations for next steps: more additional information in the retrieved documents list; search over peers' comments.
Section of scenario / software unit affected: Steps 1 and 2 from the validation script; search service.
By when: 15 October 2010.

VT5. Claim: The manipulation (addition, deletion, changing, combining of data) of the repository of learning materials is fast and easy.
Recommendations for next steps: faster document annotation; integration of image annotation functionality.
Section of scenario / software unit affected: Steps 4 and 5 from the validation script; manual annotation service.
By when: 15 October 2010.

VT6. Claim: The teachers can choose the visualization(s) which best match their purpose.
Recommendations for next steps: development of more visualisation options.
Section of scenario / software unit affected: Step 3 from the validation script; visualization service.
By when: Roadmap.

AF-v1. Topic: Dynamic ontology.
Recommendations for next steps: matching the domain ontology to the set of concepts and their properties represented in the learning material.
Section of scenario / software unit affected: Steps 1 and 2 from the validation script; ontology mapping service.
By when: Roadmap.

AF-i1. Topic: Concept index.
Recommendations for next steps: provide a table with the concepts and their occurrences in the course.
Section of scenario / software unit affected: Step 2 from the validation script; visualization service.
By when: 15 October 2010.

AF-i2. Topic: Statistical information.
Recommendations for next steps: include statistical measures over the semantically retrieved materials, and visualize them.
Section of scenario / software unit affected: Step 2 from the validation script; implementation of statistical metrics and visualization service.
By when: Roadmap.

AF-i3. Topic: Teaching history.
Recommendations for next steps: provide means to track changes in teaching.
Section of scenario / software unit affected: Steps 1 and 2 from the validation script.
By when: Roadmap.

Section 8: Conclusions from second validation round

Pedagogic effectiveness

We conclude that the FLSS is pedagogically effective in the following respects:
- It provides means to find relevant materials on a given topic.
- The structured knowledge, represented in the domain ontology, supports the tutor in his course design work.
- The teachers could adapt the services to their specific purposes.
- The teachers could also use the image information, in addition to the textual information, within the learning materials.

The following aspects of the FLSS were of limited/no pedagogic value:
- The visualisation of the information could be improved by providing more options, and thus made more flexible with respect to the stakeholders' needs.
- As one of the teachers pointed out, the quality of the learning materials itself can be a point of dispute.
- More statistical information is required, delivered in an intuitive and easily comprehensible form; an option to switch between different views would also make manipulation of the learning materials even faster and more precise.

Satisfaction

We conclude that satisfaction with the FLSS appears to be good in the following areas/aspects of the service: the teachers managed to structure a curriculum and to find sufficiently relevant materials for the course "Introduction to HTML" proposed by the scenario.

Satisfaction was found to be less good in the following areas: although the teachers viewed the visualization as friendly, they were less positive about the lack of additional visualisation options over the results. They would definitely have liked the service better if it provided various display modes supporting the accomplishment of different tasks.

Usability

We conclude that the usability of the FLSS appears to be good in the following areas/aspects of the service: its interface is simple, and it doesn't take much time to explore it in order to be able to use the FLSS to its full potential.

The following aspects of usability require improvement:
- the manipulation of the learning materials might be improved, e.g. the automatic annotation should be faster in real-time usage
- the manual annotation of concepts should be expanded to more than one token in order to handle real-life expressions

Efficiency

We conclude that the FLSS appears to be efficient in the following respects: the teachers were able to find the relevant structure and related materials quickly via the ontology browsing and semantic search facilities.

The following aspects of the efficiency of the FLSS require improvement: providing more information about the type of relation between the concepts in a given learning object, their context of use (in argumentative chunks, definitions, examples, etc.), and their rank among the other concepts representing the content of the document would decrease the time spent reading the document itself.

Transferability

The following information is useful with regard to possible transfer of the service to other domains, languages, and pedagogic and organisational settings.

Strengths/opportunities: the core idea of the FLSS is that ontologies and the lexicons for different languages make document retrieval multilingual. If the tutor cannot think of the term for the corresponding concept right away, he can use the term in a different language.

Weaknesses/threats (including competing systems): with regard to transferability to other domains, one of the teachers commented that the system might be used in other domains and for other tasks (such as second language acquisition), but the appropriate supporting resources should be adapted (relevant lexicons, relevant ontology, relevant materials).


Appendix B.7: Verification of WP6.1 (IPP-BAS, Bulgarian)

Section 5: Summary of results – verification activities

ALPHA-TESTING (BULGARIAN VERSION)

Outstanding Issues

Outstanding challenges / opportunities from alpha testing (major points only):

Since our efforts were focused on the integration of the technology in general, and the tests were performed in English, the Bulgarian-based system still lacks some of the improvements implemented for English, in spite of the existing potential in this respect. More precisely:
1. There is no integrated automatic annotation pipe for Bulgarian, although the separate components do exist and are state-of-the-art. For that reason, point 2 is still a challenge.
2. The BG learning materials lack co-reference chains, which makes the present semantic search results relevant but sparser.
3. The mapping between the ontology and the BG lexicon has some gaps to be handled, because more effort has so far been invested in the mapping between the ontology and the English lexicon.

How do these challenges/opportunities inform the next round of design and development?

Changes you will make in Round 3:
1. Providing an integrated automatic annotation pipe for Bulgarian.
2. Fixing the discrepancies in the mapping between the ontology and the BG lexicon.

How do these challenges/opportunities inform the roadmap for the end of the project?

Desirable changes that cannot be scheduled during the project, or changes started in Round 3 that may not be complete by the end of the project (e.g. LSA improvements):
1. Providing a co-reference chain tool for Bulgarian, integrated on top of the elaborated automatic pipe.

Verification of language technologies (reported in full in D6.3)

Research question: How good is the coverage of the retrieved learning material in BG?
Conclusion: the concept annotation itself ensures the high quality of the returned material, but valuable context information, which might be relevant for the learning goals, is lost due to the lack of co-reference annotation.
Next round of design and development: the annotation strategies developed for English will be applied to Bulgarian (extension of the lexicon; extension of the linguistic annotation towards a more context-oriented one).
Roadmap for the end of the project: a fully integrated co-reference module for Bulgarian.

Research question: How efficient is the tuning of the existing NLP tools for Bulgarian with respect to the learning tasks?
Conclusion: all the required NLP tools exist as separate modules, but they are not integrated in a pipe, which slows down the pre-processing stage.
Next round of design and development: the English NLP pipe will be used as a state-of-the-art model for connecting the BG tools, since it proved to be efficient for our task.
Roadmap for the end of the project: the annotated data will become part of the semantic search, and will improve the information quality and precision of the retrieved BG data.


Appendix B.8: Validation Reporting Template – WP6.2 (UU & PUB-NCIT)

Pilots in English at UU and PUB-NCIT

Section 3: Results – validation/verification of Validation Topics listed in the validation scenario

Summary – results

Ref: VT1. The system provides the learners with relevant learning material related to their learning task.
Category: Pedagogic effectiveness. Validated unconditionally.

Ref: VT2. Showing a topic in a domain ontology fragment improves the awareness of the learner with respect to the ways in which the topic is related to other topics.
Category: Pedagogic effectiveness. Validated unconditionally.

Ref: VT3. Learners can more reliably judge if the retrieved learning material is to be trusted, because it is placed in the context of a social network.
Category: Pedagogic effectiveness. Validated unconditionally.

Ref: VT4. The recommended peers (including tutors) help the learner in his learning process.
Category: Pedagogic effectiveness. Validated with qualifications: in absolute terms, it should be fully validated, but the scores are relatively low; SA and A scores are 43% for advanced learners, and advanced learners score lower than beginners.

Ref: VT5. Learners find that the information the system provides in addition to the learning materials is relevant to the task being undertaken.
Category: Pedagogic effectiveness. Validated unconditionally.

Ref: VT6. Learners find the results (e.g. list of results / feedback) easy to understand.
Category: Usability. Validated unconditionally.

Details

Ref: VT1

Validation topic (feature and claim): The system provides the learners with relevant learning material related to their learning task Category: Pedagogic effectiveness Language of pilot: English Native language of students: Dutch or Romanian

Stakeholder(s): learners
Indicator:
- learning materials retrieved via semantic search & ontology browsing are relevant
- learning materials retrieved on the basis of tags and Social Networks are relevant
(compare relevance assessments for the three search methods to each other (summative) and to Google (formative))
Methodology: Ratings + Questionnaire + Interview
Type & no. of participants: 7 Computer Science students and 12 non-CS students (Romanian and Dutch learners)

Summative results with respect to validation indicator

The relevance of the learning materials retrieved via the Semantic Search service was evaluated using a five-level Likert scale (1 = strongly disagree, 5 = strongly agree). Both the beginners and the advanced learners agreed that the documents were relevant (beginners: mean = 3.6, SD = 0.7, Agree/Strongly Agree = 66.7% (n=12); advanced: mean = 4.0, SD = 1.0, Agree/Strongly Agree = 85.7% (n=7)). The resources retrieved with the Social Search were appreciated slightly more, but the differences are not significant (beginners: mean = 3.8, SD = 0.7, Agree/Strongly Agree = 83.3% (n=12); advanced: mean = 4.3, SD = 0.8, Agree/Strongly Agree = 85.7% (n=7)). Overall, the advanced learners judged the quality of the documents significantly higher than the beginners (t(36) = -1.705, p < 0.10).

Page 78 of 93

Formative results with respect to validation indicator

The learners considered the learning materials retrieved via the Social Search to be more relevant than the results from the Semantic Search. They mentioned two main reasons for this: (1) the list is shorter and contains more specific results because it comes only from the network of the tutor, and (2) the results are more trustworthy because they are bookmarked or uploaded by the tutor or by someone from within the tutor's network.

Compared to Google, the learners indicated that they would prefer to get the learning materials retrieved via the Social Search service in a separate list in addition to the Google results. They would use the results from the Semantic Search only for practical tasks, whereas the Social Search results are useful regardless of the task.

As for the Semantic Search, one of the groups suggested removing documents that do not have the search term (e.g. HTML) among their top 5 tags. Other suggestions that may help to decide on relevance include visualization of the relevance (e.g. with colours) and making a distinction between types of documents (e.g. tutorials / code / explanation). One of the groups remarked that they wanted to be recommended documents that contain more in-depth information (e.g. articles) in addition to the tutorials / code from Delicious (more practical documents). They suggested showing a separate list with Google Scholar results. A last suggestion was to use ratings to indicate the quality of documents.

Ref: VT2

Validation topic (feature and claim): Showing a topic in a domain ontology fragment improves the awareness of the learner with respect to the ways in which the topic is related to other topics. Category: Pedagogic effectiveness Language of pilot: English Native language of students: Dutch or Romanian

Stakeholder(s): learners Indicator: A significant number of learners agree that they get a better view of how a concept is related to other concepts within a domain Methodology: mixed methods: questionnaire + interview Type & no. of participants: 7 Computer Science students and 12 non-CS students (Romanian and Dutch learners)

Summative results with respect to validation indicator

Two questions related to this validation topic were included in the questionnaire. In the first question, we asked whether the ontology fragment improved the learners' understanding of the domain of the task: "The related concepts in the ontology fragment improved my understanding of the topic". The majority of the learners, both advanced and beginners, indicated that their understanding improved (beginners: mean = 3.75, SD = 0.45, Agree/Strongly Agree = 75% (n=12); advanced: mean = 4.00, SD = 0.58, Agree/Strongly Agree = 85.7% (n=7)). The remaining 21.1% were neutral, for several reasons: one advanced learner knew the domain already, while beginners indicated that the ontology only gives a general impression that needs to be refined by a more in-depth search, and that the ontology can also be a disadvantage because you keep clicking and can get distracted by the amount of information you have to process. The second question focused on the use of the ontology and ontology browsing within their own studies: "Concept browsing would be useful for my studies". On this statement, both the beginners and the advanced learners were very positive (beginners: mean = 4.00, SD = 0.74, Agree/Strongly Agree = 91.2% (n=12); advanced: mean = 4.57, SD = 0.53, Agree/Strongly Agree = 100% (n=7)). One beginner disagreed with this statement, because she wants to find the relations herself and prefers to learn socially by discussing topics with other people: "I think I'd prefer to have a discussion with a teacher or fellow student, rather than browsing myself, or I would try browsing through relevant articles to form the concept myself."

Formative results with respect to validation indicator

In the interviews, the beginners were very positive about the ontology fragment. They would use the fragment to put a term into context: they had no idea how to develop a website and learned a lot from the ontology fragment in this respect. They remarked that this is especially useful when you are a beginner in a domain; they expected it to be less useful for advanced learners. This was confirmed by the advanced learners, who remarked that they already knew how the term was related to other terms, which made the ontology fragment less useful. However, they all saw the potential of ontologies for their own learning process and would use them if they needed to explore a new domain.

Ref: VT3

Validation topic (feature and claim): Learners can more reliably judge if the retrieved learning material is to be trusted, because it is placed in the context of a social network Category: Pedagogic effectiveness Language of pilot: English Native language of students: Dutch or Romanian

Stakeholder(s): learners
Indicator:
- The learners indicate that the social network information helps them to judge the reliability of documents (summative)
- Resources from social search are considered more trustworthy than Google results (formative)
Methodology: mixed methods: questionnaire (incl. open questions) + interview
Type & no. of participants: 7 Computer Science students and 12 non-CS students (Romanian and Dutch learners)

Summative results with respect to validation indicator

The learners indicated on a five-level Likert scale whether they agreed with the statement "The social network information helped me to judge the reliability of documents". There was a difference between the advanced learners and the beginners (beginners: mean = 3.42, SD = 0.67, Agree/Strongly Agree = 50% (n=12); advanced: mean = 4.14, SD = 0.38, Agree/Strongly Agree = 100% (n=7)). The remaining beginners were either neutral (41.7%) or disagreed (8.3%). The difference between the beginners and the advanced learners is significant (t(17) = 2.62, p < 0.05). The main reason for the beginners to be sceptical about the reliability was that they couldn't judge who the persons were on the basis of their profiles. They remarked that they would like to see more information about the peers and about how they are related to the tutor.


Formative results with respect to validation indicator

All groups mentioned that the most important reason to trust documents is the link between documents and peers: this makes it possible to judge the quality of the documents on the basis of peers (and the other way around). In other words, the social network information helped the learners to judge the learning materials retrieved. A second reason to trust the documents is the fact that the system is open, that is, they could check the pages of the people to decide whether they are reliable and then use this to judge the documents. A more pragmatic reason mentioned in one of the groups is that they didn't see any mistakes in the results, which made them trustworthy for them.

The beginners had different strategies to assess the reliability of learning materials. A first strategy is to use a sequential combination of different types of information: (1) check the tags, (2) read the learning materials, and (3) investigate who uploaded/bookmarked them. Other learners didn't use the social network information at all and considered all documents to be reliable simply because they had been uploaded or bookmarked within the network. A third strategy was to check who the users are first and thereafter check the tags and open the documents. This reveals that there is not one way to assess the reliability, but that our service provides learners with different options from which they can choose.

The beginners indicated that they would like some information in addition to the network information and the most important tags: (1) how old the information is, (2) how important the tags are (e.g. percentages), (3) a possibility to see all tags for a document, (4) more scientific content (e.g. papers). They also asked us to improve the visualisation, since they didn't like that they could only see the tags for a document and no title or fragment (as in Google).

According to the beginners, the main advantage of the Social Search service compared to Google is that it provides information on how other people judge the quality of a document. When a tutor (or someone from his network) has indicated that a document is of good quality, it is more likely to be a good document than when you find a document with Google. Another positive aspect of the Social Search service is the list of results, which is shorter. This makes it easier and faster to identify appropriate documents. However, this short list of results can be a disadvantage as well: the results depend on your network, and as soon as your network doesn't have information on a term, you need to use another method, whereas Google browses the complete web to find documents.

Ref: VT4

Validation topic (feature and claim): The recommended peers (including tutors) help the learner in his learning process Category: Pedagogic effectiveness Language of pilot: English Native language of students: Dutch or Romanian

Stakeholder(s): learners
Indicator:
- A significant number of learners agree that the peers recommended are relevant to the context of the task
- A significant number of learners agree that the feedback provided by the system made it easy to judge whether the people are relevant
Methodology: questionnaire (open and closed questions) + interview
Type & no. of participants: 7 Computer Science students and 12 non-CS students (Romanian and Dutch learners)

Summative results with respect to validation indicator

All learners agreed that the system suggested relevant people who provided valuable information for the task they had to complete: one person (5.3%) found all peers relevant, 52.6% found most people relevant, and 42.1% found some of them relevant. There is no significant difference between the beginners and the advanced learners. We also asked the learners whether they could easily assess the relevance of people on the basis of the feedback provided by the system. There seems to be room for improvement in this respect, since 36.8% of all learners were neutral and 5.3% disagreed (beginners: mean = 3.67, SD = 0.49, Agree/Strongly Agree = 66.7% (n=12); advanced: mean = 3.43, SD = 0.98, Agree/Strongly Agree = 42.9% (n=7)). In the open question related to this multiple-choice question, one learner remarked that you have to click some things first before you can see whether a person is relevant or not. In the formative questions, we asked the students for more feedback on this aspect and for suggestions for improvement.

Formative results with respect to validation indicator

The learners judged the reliability of the peers on the basis of their profiles. They preferred the SlideShare profiles over the ones from Delicious. Another source of information was the uploads and bookmarks of the people. One person remarked that she decided which persons are trustworthy on the basis of their uploads and bookmarks instead of the other way around (i.e. determining the reliability of documents on the basis of the person). The learners had some suggestions for improvement as well. They asked us to show how a peer is related to the learner, proposed to provide more information on the background of people (e.g. studies, level of expertise (bachelor / master / PhD)), suggesting LinkedIn for this, and also suggested that a better visualization might improve the system in this respect.

Ref: VT5

Validation topic (feature and claim): Learners find that the information the system provides in addition to the learning materials is relevant to the task being undertaken. Category: Relevance Language of pilot: English Native language of students: Dutch or Romanian

Stakeholder(s): learners
Indicator: relevance assessment of:
- documents on the basis of tags (semantic search)
- definitions (semantic search)
- peers on the basis of tags (social search)
- documents on the basis of peers (social search)
Methodology: mixed methods: questionnaire + interview
Type & no. of participants: 7 Computer Science students and 12 non-CS students (Romanian and Dutch learners)

Summative results with respect to validation indicator

In the semantic search, the learners could decide on the relevance of a document on the basis of the document titles and five tags that are shown as mouse-over. The majority of learners indicated that this information was useful and helped them to decide whether a document is relevant: mean = 4.00, SD = 0.95, Agree/Strongly Agree = 73.7% (n=19). One of the learners who didn't consider the tags to be useful didn't see the tags at all, whereas the other said that “sometimes they are very useful, but not always. There are documents where the related concept occurs only a few times, without any further explanation of the concept.” As for the definitions, the majority of the learners agreed that they are relevant to the task being undertaken: mean = 4.37, SD = 0.68, Agree/Strongly Agree = 89.5% (n=19). Both for the tags and the definitions, there is no significant difference between the opinions of the beginners and advanced learners. As for the social search, the learners were asked to indicate the relevance of the feedback for the peers and for the documents separately. In VT4, we already described the opinions of the learners in this respect.

Formative results with respect to validation indicator

The tags: The five socially most relevant tags that were shown to the learners were often used to assess the relevance of learning materials in the Semantic Search Service. The learners either clicked only documents in which the tag HTML was used, or based their decision on all tags. The title proved to be a relevant indicator as well for some of the users. There were also learners who simply clicked all documents to decide which ones are useful. We asked the learners whether they would prefer a text fragment containing the term, as Google provides, instead of the tags. They indicated that tags are either more useful or equally useful. One group (of two persons) remarked that the fragments can be misleading as well, since they are often on pages that are only slightly related to a topic. The advantage of the tags is that the fact that a word has been used as a tag gives valuable information about the content of the document.

Definitions: The learners appreciated the definitions, because they give a quick idea of the meaning of a term. Usually, when they encounter a new term, they search for a definition first, and the service assists them by providing this definition automatically.

Peers and documents: See formative results of VT4.
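The manual tag-based relevance assessment described above can also be approximated automatically. The sketch below scores documents by the overlap between their tags and the concepts related to the query; the function and field names are illustrative, not the service's actual API:

```python
def jaccard(a, b):
    """Jaccard similarity between two tag sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_by_tags(query_concepts, documents):
    """Order documents by the overlap between their tags and the
    concepts related to the query (most relevant first); documents
    with no overlap at all are filtered out."""
    scored = [(jaccard(query_concepts, doc["tags"]), doc) for doc in documents]
    return [doc for score, doc in sorted(scored, key=lambda p: -p[0]) if score > 0]

docs = [
    {"title": "HTML tutorial", "tags": {"html", "css", "tutorial", "web", "markup"}},
    {"title": "Python intro", "tags": {"python", "programming"}},
    {"title": "Dynamic sites", "tags": {"html", "php", "dynamic", "web", "server"}},
]
top = rank_by_tags({"html", "web", "css"}, docs)
# "HTML tutorial" shares three tags with the query, "Dynamic sites" two,
# and "Python intro" none, so the last one is filtered out.
```

This also mirrors the learners' observation that a word's mere use as a tag is informative: only tag overlap counts, not full-text occurrence.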

Ref: VT6

Validation topic (feature and claim): Learners find the results easy to understand Category: Usability Language of pilot: English Native language of students: Dutch or Romanian

Stakeholder(s): learners Indicator: Learners agree that the following are easy to understand and interpret: search results, ontology fragment returned when query is given, definitions, social network. Methodology: mixed methods: observation + questionnaire + interview Type & no. of participants: 7 Computer Science students and 12 non-CS students (Romanian and Dutch learners)

Summative results with respect to validation indicator including comparison with previous situation, where appropriate

The usability of the Social and the Semantic Search service was validated separately. The learners agreed that it was easy to learn how to use the Semantic Search Service (mean = 4.37, SD = 0.50, Agree/Strongly Agree = 100% (n=19)) and that it was easy to navigate through the system (mean = 4.32, SD = 0.58, Agree/Strongly Agree = 100% (n=19)). For the Social Search Service, the usability scores regarding navigation through the system were lower, but for this component, too, the usefulness was highly appreciated. Learning to use the service: mean = 4.43, SD = 0.53, Agree/Strongly Agree = 100% (n=19); navigating through the system: mean = 4.14, SD = 1.07, Agree/Strongly Agree = 85.7% (n=19).

Formative results with respect to validation indicator Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

The learners remarked that both services are very easy to use and understand. However, some suggestions for improvement of the visualisation and the feedback offered were proposed. For the Semantic Search service, the learners suggested to:

(1) provide an explanation of the difference between blue and green concepts
(2) show the tags in the results list by default, and not only as a mouse-over
(3) show the complete titles instead of only part of the titles (for long titles)
(4) make a distinction between different types of documents (tutorials / code / explanation) in the list of results
(5) indicate in the ontology how strongly concepts are related, e.g. by using broad and narrow arrows
(6) restrict definitions to one sentence

For the Social Search service, the learners suggested to:

(1) open documents and profiles in another window or tab automatically
(2) show the titles of the documents in addition to the tags
(3) show the most important tags for a user as mouse-over, e.g. by showing their APML profiles


Section 4: Results – inductive stakeholder validation activities

Ref: AD2

Integrating Social and Semantic Search Methodology: interview Stakeholder(s): learners

Type & no. of participants: 7 Computer Science and 12 non-Computer Science students Language of pilot: English Native language of participants: Dutch and Romanian

Additional formative results (not associated with validation topics) Including results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

In the interviews, several learners proposed to integrate the Semantic and Social Search services. They wanted to keep aspects of both: the ontology fragment, ontology browsing and definitions from the Semantic Search, and the recommended people and documents from the Social Search. The documents from the Semantic Search can be replaced by the Social Search documents, and the APML profile (which visualises the interests of a learner) can be left out. When no documents are found within the network, they suggested showing the results of the Semantic Search.
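The integration the learners proposed amounts to a simple fallback rule: prefer documents recommended through the learner's network, and fall back to the plain semantic results when the network yields nothing. A minimal sketch under that reading; the two search functions are stand-ins for the actual services, not their real interfaces:

```python
def combined_search(query, social_search, semantic_search):
    """Return documents from the learner's social network when available;
    otherwise fall back to the semantic search results."""
    social_hits = social_search(query)
    return social_hits if social_hits else semantic_search(query)

# Stand-in services for illustration only:
semantic = lambda q: [f"semantic result for {q}"]
social = lambda q: ["peer-bookmarked tutorial"] if q == "html" else []

combined_search("html", social, semantic)  # network has results
combined_search("xslt", social, semantic)  # empty network, semantic fallback
```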

Ref: TM1 – TM8

Flexible ontologies Methodology: interview Stakeholder(s): teaching manager

Type & no. of participants: 1 teaching manager at PUB-NCIT Language of pilot: English Native language of participants: Romanian

Additional formative results (not associated with validation topics) Including results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan

TM1: The TM liked the degree of flexibility of the semantic search service, i.e. the fact that it doesn't just use the ontology as is, but can enrich it automatically. This is important because it offers much more information and also contains up-to-date information. TM2: The TM was positive about the potential of the idea, and especially appreciated the possibility to use the service for many kinds of problems, for example the peer-to-peer search, where the idea of trust could be used. The service also offers possibilities for dealing with security: considering the network of trust of a user in a security environment could relax some security constraints when talking to people the user knows. TM3: The TM expressed that a longer validation and use of the service would be necessary for it to be trusted by teachers. TM4: The TM felt that the Learning Objects retrieved via the semantic search were not ranked correctly, and he considered that an improved ranking system would be beneficial. TM5: The TM considered it important to refine the relations between the users in the social network. According to him, it is important to monitor the evolution of relations and to rate their importance in a dynamic way. He also considered that a relation shouldn't be just 1/0 – exists or doesn't exist. The user should be able to offer negative feedback on relations to other existing users. TM6: There is no help functionality now; this should be added to enable new learners to find out how to use the system. TM7: The learner and tutor should be allowed to interact with the system to add and/or adapt information:

(1) they should be able to give feedback on the system
(2) they should be able to add new resources directly to the system
(3) there should be a system for moderation of content
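TM5's suggestion, i.e. graded relations that evolve with positive or negative user feedback instead of a binary link, combined with the feedback channel TM7 asks for, could be modelled minimally as follows. This is an illustrative sketch only; the class name, weight range and update rule are our own assumptions, not the project's actual data model:

```python
class SocialGraph:
    """Relations carry a weight in [-1, 1] rather than a binary
    exists/doesn't-exist flag; feedback moves the weight dynamically."""

    def __init__(self):
        self.weights = {}  # (user_a, user_b) -> weight in [-1, 1]

    def rate(self, a, b, feedback, step=0.2):
        """Apply feedback in [-1, 1]; the weight drifts toward it
        and is clipped to stay inside [-1, 1]."""
        key = (a, b)
        w = self.weights.get(key, 0.0)
        w += step * (feedback - w)
        self.weights[key] = max(-1.0, min(1.0, w))
        return self.weights[key]

g = SocialGraph()
g.rate("ana", "bob", +1)  # positive feedback strengthens the relation
g.rate("ana", "cas", -1)  # negative feedback, as TM5 requested, is possible too
```

Repeated feedback moves a weight asymptotically toward +1 or -1, which also gives the dynamic rating of relation importance that TM5 mentions.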

TM8: The TM feels that the social search environment is less suitable in a bachelor setting, due to the type of learning required of bachelor students. He therefore considered the system much more useful and suitable for a master's or a PhD programme, where people are asked to interact more with researchers from outside their institution and to collaborate and do research starting from a given article or keyword.

Section 5: Summary of results – verification activities

ALPHA-TESTING Outstanding Issues

Outstanding challenges / opportunities from alpha testing (major points only):

The direction of relations is not shown in the ontology (e.g. show that HTML ISA markup language instead of only showing that there is an ISA relation between the two concepts)

Searching does not work for all the concepts that have been automatically added (i.e. green concepts)

Search triggers only a restricted fragment of the ontology; it should be easy to navigate further than two steps away

Save the history of the user to make it easier to remember their activities (e.g. queries entered, resources read)

Display more meta-data with each resource (e.g. popularity, how recent the resource is, SN network info)

Find users in own network and outside your network which best match your profile

How do these challenges/opportunities inform the next round of design and development?

Changes you will make in Round 3:
(1) Indicate the direction of relations
(2) Solve search for green concepts
(3) Enable a dynamic search in which the fragment can be extended more easily

How do these challenges/opportunities inform the roadmap for the end of the project?

Desirable changes that cannot be scheduled during the project, or changes started in Round 3 that may not be complete by the end of the project (e.g. LSA improvements):
(1) Save history of the user
(2) Display more meta-data with each resource (e.g. popularity, how recent the resource is, SN network info)
(3) Find users in own network and outside your network which best match your profile

Verification of language technologies (reported in full in D6.3)

Purpose of activity (or research question): Relation extraction

Conclusion(s) drawn from activity (or answer to research question): Only a restricted number of relation types are included in the current ontology. More relations would enhance the knowledge discovery process.

How does this work inform the next round of design and development? Investigate existing solutions and adapt them to our needs.

How does this work inform the roadmap for the end of the project? To be done within the project.

Section 6: Summary of results from validation associated with dissemination activities

Date: 27/02/10
Event and location: AIMAS Winter Olympics, University Politehnica of Bucharest (educational conference in the AI field)
Type and length of presentation: Live demonstration of services (90 mins)
Type of attendees: students, AI professionals
Number of attendees: 30
Validation method: Open discussions
Main results:
1. suggestions on improving the interface
2. suggestions on other data that should be considered when searching (such as Google Scholar articles)


Section 7: Recommendations from second validation round

Round 3

Ref / Validation topic (claim) / Recommendations for next steps / Section of scenario or software unit affected / By when (date)

VT1: The system provides the learners relevant learning material related to their learning task.
(1) Develop combined Social and Semantic Search service: recommend documents from the network in addition to Semantic Search results. [SemOntology/SemLOs/SocLOs; for pilot]
(2) Provide scientific documents in addition to materials from social network sites (e.g. Cite-u-like). [SemLOs/SocLOs; for pilot]

VT2: Showing a topic in a domain ontology fragment improves the awareness of the learner with respect to the ways in which the topic is related to other topics.

VT3: Learners can more reliably judge if the retrieved learning material is to be trusted, because it is placed in the context of a social network.
(1) Show how people are related. [SocPeople; for pilot]
(2) Widen the scope of the social network search when no resources are identified. [SocPeople; end of project]

VT4: The recommended peers (including tutors) help the learner in his learning process.
(1) Use APML profile to indicate expertise of peers (as tag cloud). [SocPeople/SocAPML; for pilot]

VT5: Learners find that the information the system provides in addition to the learning materials is relevant to the task being undertaken.
(1) Reduce length of definitions. [SemDefinition; for pilot]
(2) Semantic: indicate where documents come from (e.g. Delicious, SlideShare, YouTube). [SemLOs/SocLOs; for pilot]

VT6: Learners find the results easy to understand.
(1) Social: show document titles as well. [SocLOs; for pilot]
(2) Improve visualisation of the graph when more than one concept is entered. [SemOntology/SemBrowse; for pilot]
(3) Include explanation of interface / help function (e.g. to explain the distinction between green and blue concepts). [All; for pilot]

AD1: Integrating Social and Semantic Search.
(1) Integrate the Semantic and Social Search services. [SemOntology/SemLOs/SocLOs/SocPeople; for pilot]

Other (not related to VTs):
(1) Check the use of the services with > 25 users
(2) Check that widget communication works with > 25 users
(3) Twitter crawler
(4) Facebook crawler
(5) Personalised difficulty estimator for concepts/tags
(6) Relation extractor
(7) Develop a service for discovering persons having accounts on multiple platforms, using string similarity metrics between user accounts and comparing clusters of their tags
(8) Include explanation of interface / help function (e.g. to explain the distinction between green, reddish and blue concepts)
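Item (7), matching the same person's accounts across platforms, can be illustrated by combining a normalised string similarity over account names with tag-set overlap. This is a sketch only: the metrics, weights and threshold are illustrative choices, not the project's actual implementation:

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """String similarity between two account names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def tag_similarity(tags_a, tags_b):
    """Jaccard overlap between the tag clusters of two accounts."""
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def same_person(acc_a, acc_b, w_name=0.5, w_tags=0.5, threshold=0.6):
    """Combine both signals; account pairs scoring above the threshold
    are assumed to belong to the same person."""
    score = (w_name * name_similarity(acc_a["name"], acc_b["name"])
             + w_tags * tag_similarity(acc_a["tags"], acc_b["tags"]))
    return score >= threshold

delicious = {"name": "j.smith", "tags": {"html", "css", "webdesign", "php"}}
slideshare = {"name": "jsmith", "tags": {"html", "css", "presentations"}}
same_person(delicious, slideshare)
```

Using both signals guards against false matches: similar usernames with disjoint tag clusters, or shared tags under unrelated names, each score lower than a genuine cross-platform pair.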

For roadmap

Ref / Validation topic (claim) / Recommendations for next steps / Section of scenario or software unit affected / By when (date)

VT1: The system provides the learners relevant learning material related to their learning task.
(1) Make a distinction between different types of documents (e.g. tutorials / code / articles / etc.). [SemLOs/SocLOs; for roadmap]

VT2: Showing a topic in a domain ontology fragment improves the awareness of the learner with respect to the ways in which the topic is related to other topics.
(1) Make the ontology fragment dynamic. [SemOntology/SemBrowse; for roadmap]

VT3: Learners can more reliably judge if the retrieved learning material is to be trusted, because it is placed in the context of a social network.
(1) Include a rating possibility. [SocLOs; for roadmap]

VT4: The recommended peers (including tutors) help the learner in his learning process.
(1) Include possibility to contact peers. [SocPeople; for roadmap]
(2) Give more information on recommended people (e.g. studies, educational level). [SocPeople; for roadmap]
(3) Let learners rate the quality of other peers. [SocPeople; for roadmap]

VT5: Learners find that the information the system provides in addition to the learning materials is relevant to the task being undertaken.
(1) Include ratings for documents as an additional indicator. [SocLOs; for roadmap]
(2) Visualise the relevance (e.g. with colours). [SemLOs/SocLOs; for roadmap]
(3) Make a distinction between types of documents (e.g. tutorials / code / explanation). [SemLOs/SocLOs; for roadmap]

VT6: Learners find the results easy to understand.
(1) Indicate the strength of relationships between concepts in the ontology fragment. [SemOntology; for roadmap]


Section 8: Conclusions from second validation round

Pedagogic effectiveness
"Doing the right thing… the software makes a difference to the effectiveness of learning or facilitating learning."

We conclude that the iFLSS is pedagogically effective in the following respects:

- 94.7% of the learners indicated that ontology browsing and the ontology fragment would be useful in their studies. The main contribution that was reported is that it helps the learner to discover to which concepts a term is related and indicates the type of relation.
- 89.5% of the learners indicated that the definitions provide valuable information that is relevant for the task. They said they would use them to improve the effectiveness of learning, since they offer a quick, short description of terms.
- 73.7% of the learners indicated that the documents (i.e. Delicious bookmarks) retrieved with the knowledge discovery component were useful for the task at hand.
- 94.7% of the learners indicated that the social learning component was useful for the task at hand.
- 94.7% of the learners indicated that the social learning component would be useful for their studies. They said that it would make their learning process more effective in two ways: 1. the service makes it easier to find relevant documents by restricting the learning materials on the basis of the social network of the learner; 2. the service offers links to related people, who can assist the learner.

The following aspects of the iFLSS were of limited/no pedagogic value:

(1) In our scenario, the learners performed a practical task, i.e. orientation for building a (dynamic) website. The documents retrieved with the knowledge discovery component from Delicious are mainly practical and therefore useful for this specific task. Some learners were sceptical about the relevance of these documents when more in-depth information is needed, and indicated that these documents are mainly useful for practical tasks.
(2) 68.4% of the learners indicated that the APML profiles can be useful in their learning process. However, the interviews revealed that the learners think the profiles should be improved to make them really useful; a suggestion was to link them to the queries of learners, i.e. the learners want a dynamic profile instead of a static one. Another possibility is to show the profiles of peers as mouse-over to see what their expertise is.
(3) The teaching manager indicated that the ranking of the Learning Objects obtained via semantic search could be improved.

According to the teaching manager, the pedagogical effectiveness could be further improved if the learner could interact with the system (e.g. adding/adapting content, offering feedback).

Relevance
"…pedagogical problem tackled is clearly urgent and it has priority to be implemented in their institutions…"

We conclude that the pedagogical problem tackled and the iFLSS are relevant in the following respects:

- The knowledge discovery component as a whole is considered to be useful for their studies by the majority of learners (94.7%). The main contribution is the fact that it provides learners quick insights into a domain and lets them easily discover how concepts are related to each other.
- Concept browsing alone is considered to be useful for their studies by the majority of learners as well (94.7%). This was confirmed in the interviews and the open questions, where learners described this as very relevant for their learning, since it helps them to identify the most important concepts in a domain and the relations between concepts.
- The social learning component is very relevant for the learning process as well. The learners indicated that they are getting more and more involved in social networks (e.g. Hyves, Facebook). They would like to use social networks in their learning process as well.

The following comments were made against the relevance of the Social Search Service:

- The success strongly depends on the willingness of tutors to use social networks and to let their learners connect to them. There is a snowball effect: it starts with some users (tutors and learners), and others join when they see the added value.

Satisfaction "Satisfaction is about positive attitudes of the stakeholders towards the service. e.g. users are satisfied with the use of the service, it is pleasant to use, they would recommend it to fellows".

We conclude that satisfaction with the iFLSS appears to be good, in the following areas/aspects of the service:

- The majority of the learners (94.7%) would recommend the knowledge discovery component to other learners.
- The majority of learners (73.7%) would recommend the social learning service to other learners, and the majority of learners (68.4%) were satisfied with this service.
- The beginners and advanced learners were asked to indicate which element of the knowledge discovery component they liked most (ontology fragment, documents, or definitions). They appreciated different aspects of the semantic search service: 1. the majority of the beginners liked the ontology fragment most (66.7%); 2. the majority of the advanced learners liked the documents most (57.1%).

Satisfaction was found to be less good in the following areas:

- According to 73.7% of the learners, the documents from Delicious are suitable for the task. However, some learners wondered whether they would be of suitable quality for other tasks as well, like writing a paper or learning for an exam. They proposed to include Google Scholar documents to provide learners with more scientific content.
- The visualisation of the social learning service should be improved, especially with respect to the representation of the documents.


- It was easy to keep clicking in the social learning component, but going back resulted in losing the results you had, so you have to fill in the terms again. It should be possible to go one step back (like the back button in your browser).
- The explanation of the APML profile should be improved.
- The teaching manager pointed out that the relevance of the system could be improved by enabling learners to refine the relations between users in their social network, instead of having only information on whether or not people are related.

Usability "…how easy it is to use the software, and how quickly users can learn to use it".

We conclude that the usability of the iFLSS appears to be good, in the following areas/aspects of the service:

- All learners agreed that it is easy to learn how to use the knowledge discovery component.
- The majority of the learners (89.5%) found it easy to navigate through the knowledge discovery component.
- The majority of the learners (84.2%) felt confident using the knowledge discovery component.
- The majority of learners (89.5%) indicated that it is easy to learn how to use the social learning component.
- According to 73.7% of the learners, it was easy to navigate through the social learning component.
- The majority of learners (78.9%) felt confident using the social learning component.

The following aspects of usability require improvement:

- Neighbouring concepts in the ontology are sometimes not visible; this will be addressed in the next version.
- In the current implementation, it is not possible to go one step back in the ontology.
- The ontology fragment is not clear when more than one concept is entered.
- The concepts of the ontology fragment sometimes overlap.
- Documents and profiles should always be opened in a new tab.
- The teaching manager indicated that a help functionality is missing at the moment.

Efficiency "Doing the thing right. Efficiency is about resources (e.g. time, cognitive load) the stakeholders need to accomplish specified goals. e.g. time-saving for the user, low cognitive load for user".

We conclude that the iFLSS appears to be efficient, in the following respects:

- According to 63.2% of the learners, the knowledge discovery service would be time-saving. Only two learners (10.6%) think that using the service is not time-saving, while the remaining 26.3% neither agree nor disagree. In their explanation, they describe that they need to try the system more to decide whether it would save them time or not.
- 57.8% of the learners indicated that the mental effort needed to accomplish the task is lower when the knowledge discovery service is used compared to using other means (e.g. Google). However, a relatively large proportion of the learners indicate that they do not know whether or not the service helps in this respect (26.3%), while the remaining 15.8% do not think the system reduces the mental effort.


- 68.4% of the learners indicated that, using the social learning service, it takes less time to complete learning tasks.
- 57.9% of the learners said that, using the social learning component, less mental effort is required to complete learning tasks than without the system.

The following aspects of the efficiency of the Semantic Search Service require improvement:

- Some learners would prefer a dynamic ontology fragment. In the current implementation, you can only browse two steps away, and it would be better if the fragment changed dynamically during browsing.
- Some learners remarked in the interviews that improving the user interface of the social learning component would enhance the efficiency.

Transferability "In the LTfLL context, this means transfer to other pedagogic and organisational settings, beyond the pilot context. e.g. likelihood of adoption in your institution for different courses/modules, and why".

The following information is useful with regard to possible transfer of the iFLSS to other domains, languages, and pedagogic and organisational settings.

Strengths/opportunities:

The visualisation and enrichment process are completely domain independent, which makes it possible to include other ontologies instead of the computing ontology that has been used in this pilot.

The ontology is currently restricted to the computing domain. It can be transferred to any other domain as long as a domain ontology exists. Otherwise, a domain ontology can be bootstrapped using NLP techniques or tags.

From a technical point of view, transferability of the iFLSS is straightforward: the widgets can be easily integrated in existing systems.

From a practical point of view, the only requirement for using the social learning service is to have a user base (esp. tutors) that is accustomed to social media applications and willing to use external web sites for bookmarking/storing learning content. When the system proves to be effective, a snowball effect will result in more and more users.

Weaknesses / threats (including competing systems):

- The documents in the social bookmarking networks do not contain language metadata, which makes it difficult to determine which language they are in. The tags help in this respect.
- There is relatively little agreement about the relevance of the documents: 73.7% found them relevant, while 10.5% found them irrelevant and 15.8% were neutral. In the interviews, some learners remarked that it depends on the task whether the documents are relevant or not, and that they are mainly useful for practical assignments. For other tasks (e.g. writing a paper), they would like to get other types of learning materials.