Design and Implementation of Affective E-Learning Strategy Based on Facial Emotion Recognition

S.C. Satapathy et al. (Eds.): Proceedings of InConINDIA 2012, AISC 132, pp. 613–622. springerlink.com © Springer-Verlag Berlin Heidelberg 2012

Design and Implementation of Affective E-Learning Strategy Based on Facial Emotion Recognition

Arindam Ray 1 and Amlan Chakrabarti 2

1 Awadh Centre of Education, Guru Gobind Singh Indraprastha University, New Delhi, India
[email protected]

2 A K Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
[email protected]

Abstract. E-Learning is emerging as a heavily learner-centric technology, emphasizing pervasive and personalized learning. Affective learning outcomes, in a nutshell, involve attitudes, motivation, and values. In the same vein, we can define affective E-Learning as a strategy that implies recognition of the learner's emotion and selection of pedagogy in the best possible way. For the best delivery, the learner's affective state needs to be identified, and the key to this is emotion recognition. Our work focuses on emotion detection using biophysical signals and further explores the evolution of emotion during the learning process, generating feedback that can be used to improve learning experiences. Our research is focused on an operative content delivery mechanism that uses physiological facial signals to detect the learner's emotion, but without detecting the face itself. In this paper we propose a key technique to detect the learner's facial expression, based on neural network classification, together with selection of an appropriate learning style; it shows reasonable results in comparison with other existing systems. The results show that the recognizer system is effective.

1 Introduction

A fundamental tenet of this design is that one method does not fit all learners; different pedagogy has to be chosen for different learners. In an E-Learning portal the method of teaching-learning is unidirectional, which means simultaneous two-way communication cannot happen, whereas in a face-to-face interactive session it does. The teacher's experience plays an important role there, and hence an E-Learning portal needs a platform for emotion sharing between the learner and the teacher. A learner's emotion reflects first on the face, and hence facial emotion recognition [1] is preferred for capturing the affective state of the learner. The proposed model can recognize learners' emotions to identify the affective state. In this paper we propose a technique to detect the learner's facial expression using a Support Vector Machine (SVM), together with selection of the course based on a neural network. As per psychological theory, human emotions can be classified into six typical emotions [2], viz. "happiness",


"sadness", "surprise", "fear", "disgust" and "anger". For the appropriate learning pedagogy, we need to identify the learner's psychology in the best way. A number of parameters are involved in psychological emotion, but our work utilizes facial gestures, which can be correlated with emotions using a neuro-fuzzy classification approach. After identifying the psychological emotion, our system automatically detects the learning pedagogy; the required course is selected automatically as per the algorithm. Our proposal thus consists of twofold operations: one is identification of the learner's emotion, and the other is selection of the learning style.

Lien et al. [3] have shown the process of automated facial expression recognition for the upper face. Their detection technique used the Facial Action Coding System (FACS) [4], an anatomically based coding system that enables discrimination between closely related expressions. FACS divides the face into upper and lower face actions and further subdivides motion into action units (AUs). Their approach recognized upper face expressions in the forehead and brow regions only; the lower face also plays an important role in emotion detection, so this approach achieves only partial emotion extraction. Moreover, it needs high-gradient images, but with a web cam, and assuming the learner has only cheap and minimal computational resources, we cannot get such quality images. C.H. Messom, A. Sarrafzadeh, M.J. Johnson and F. Chao argue in their paper [5] that many software systems would significantly improve performance if they could adapt to the emotional state of the user; for example, if intelligent tutoring systems, information/help kiosks, ATMs and automatic ticketing machines could recognize when users were confused, frustrated or angry, they could guide the user back to remedial help systems, thereby improving their services. Current software systems are not able to estimate the affective state of users and so cannot offer these additional capabilities. They proposed a model for an affective state estimator consisting of two neural network classifiers and a fuzzy logic facial analysis system; it has been successfully prototyped in an intelligent tutoring system that adapts to the affective state of the user. Though facial expressions are the most important means of detecting emotions, other bio-signals such as heartbeat, skin resistance and voice tone can also be used to detect human emotions. Our proposed model detects the movement of some points/spots on the face, from which it can detect the learner's emotion. Moreover, our approach goes one step further: it can select the appropriate course for the learner using a neural network.

Our research claims better facial emotion detection of a learner on an E-Learning platform, and it requires only a minimal amount of resource on the client side to detect the affective state of the learner. Our objective is to detect the affective state of the learner through facial emotion; hence we chose the spot detection technique, which has not been done earlier. The achievements of our research can be summarized as follows:

• It works on the client side, which minimizes the server-side overhead.
• It does not need to detect the full face; spot movement detection alone is enough for emotion detection.
• Emotion can easily be detected across the same group of learners because of their similar facial pattern.
• There is a model for automatic lesson detection.


Our proposed work accomplishes a fusion of facial emotion and learning pedagogy, ensuring an affective E-Learning strategy, which is a novel contribution in this research domain.

In this paper, the abstraction of the affective computing model in E-Learning that identifies the affective state of the learner is explained in Section 1.1. The framework for capturing the learner's facial expression and detecting emotion is presented in Section 2. The key technologies and methodologies for facial emotion recognition are covered in Section 3. The implementation and results of facial emotion recognition using SVM are shown in Section 4, and concluding remarks are given in Section 5.

1.1 E-Learning Model Based on Affective Computing

Mase K. proposed an emotion recognition system that used the major directions of specific facial muscles [6]. We propose a model for affective computing based on the fusion of emotional behaviors (speech and facial expressions), which works as an important feedback signal for understanding the psychological state of the learner. These feedback signals can be handled effectively and can help in tuning the teaching strategy to serve personalized learning. We have taken the traditional E-Learning model and added an affective computing module to it. The proposed model of an E-Learning system based on affective computing is shown in Figure 1.

Fig. 1. E-Learning Model Based on Affective Computing

Learner Interface module: Affective computing input (speech emotion and facial expression recognition input) is added to the human-machine interface of the traditional E-Learning system, which primarily collects learners' speech and facial emotion feedback information and thus realizes emotion compensation. In this paper we focus on facial expression only.

Recognition module: The emotion recognition module is composed of input, pre-processing, feature extraction, feature selection and emotion recognition sub-modules.

Evaluation module: It takes the input from the recognition module and generates the corresponding evaluation parameters.


2 Facial Emotion Recognition By Capturing Learners’ Facial Expression

2.1 Technical Framework

The recognition process is framed around an intelligent affective state recognizer. This affective structure is composed of image scanning, pre-processing, classification, feature extraction and interpretation [7]. The following steps are executed for the task:

Step I: An image of the learner is captured before the course delivery and stored in the database.
Step II: An image of the learner is captured again after the course delivery, to capture changes in facial emotion.
Step III: The image acquired in Step II is pre-processed by a neural network so that the positions of the face, especially the forehead, eyebrow, lower eye and cheek areas, are detected.
Step IV: This phase identifies the user against the database of known user images, using a Support Vector Machine.
Step V: This stage consists of a facial feature extraction system based on a neural network and an affective state estimator based on fuzzy logic.
Step VI: This final phase selects the course delivery module as per the learner's learning style.
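As a minimal sketch of how Steps I–VI could be wired together on the client side (all function names are hypothetical stand-ins, assuming OpenCV for webcam capture; the paper does not publish an implementation):

```python
# Hypothetical sketch of the six-step client-side pipeline (Steps I-VI).
# Function names are illustrative stand-ins, not the authors' code.
import cv2  # OpenCV, assumed available for webcam capture


def capture_image(camera_index=0):
    """Grab a single frame from the webcam (used in Steps I and II)."""
    cam = cv2.VideoCapture(camera_index)
    ok, frame = cam.read()
    cam.release()
    if not ok:
        raise RuntimeError("webcam capture failed")
    return frame


def run_session(deliver_course, preprocess, identify_user,
                extract_features, estimate_affect, select_module):
    before = capture_image()               # Step I: snapshot before delivery
    deliver_course()
    after = capture_image()                # Step II: snapshot after delivery
    regions = preprocess(after)            # Step III: locate forehead, brows, ...
    user = identify_user(before)           # Step IV: SVM match against known users
    features = extract_features(regions)   # Step V: neural-network features
    state = estimate_affect(features)      # Step V: fuzzy-logic affect estimator
    return select_module(user, state)      # Step VI: pick the course module
```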

Since the extraction of facial features can be done with markers, we do not use face detection and tracking algorithms. All of the processing is completed on the client side, which means that the online system sends only the final affective state of the user to the server side, requiring only a small overhead.

3 Key Technology Adopted for Emotion Detection

3.1 Training Data

Our training dataset consists of 129 college students and staff ranging in age from 19 to 35 years; 45% were female, 55% were male, and all were Indian. Videos were recorded with a QHM495LM-3207 web camera located directly in front of the learner, whose basic configuration is: lens capacity 14 megapixels; output size 640x480; capture size 640x480; anti-flicker 50 Hz; zoom 1x. Subjects were directed by an experimenter to perform a series of all 129 facial expressions. Image sequences from neutral to target display were 640 by 480 pixel arrays with 8-bit precision. The only selection criterion was that a sequence be labeled as one of the six basic emotions (disgust, sadness, happiness, fear, anger, surprise). The sequences came from 5 different subjects, with 1 to 6 emotions per subject, and we classified the emotions into ten groups for further course selection.


3.2 Methodology Adopted

Facial expressions give important clues about emotions; therefore, the features used are typically based on the local spatial position or displacement of specific points and regions of the face. For a complete review of recent emotion recognition systems based on facial expression, the reader is referred to [8]. A motion capture system (webcam) was used to capture the expressive facial motion after delivery of the course. Note that the facial features are extracted at the precision of the webcam (14 megapixels).

Fig. 2. The five areas of the face considered in this study: (a) before course delivery, (b) after course delivery

In the system based on visual information, the spatial data collected from the markers in each frame of the video were reduced to a 6-dimensional feature vector per sequence, which is used as input to the classifier. The facial expression system, shown in Figure 2, is described below. After capturing the motion data, it is normalized as follows:

(1) all markers are translated so that a nose marker acts as the local coordinate center of each frame;

(2) one frame with a neutral, closed-mouth head pose is picked as the reference frame;

(3) three approximately rigid markers (manually chosen and illustrated as red points in Figure 2) define a local coordinate origin for each frame; and

(4) each frame is rotated to align it with the reference frame.

Each data frame is divided into five blocks: the forehead, eyebrow, lower eye, right cheek and left cheek areas.

For each block, the 3D coordinates of the markers in that block are concatenated to form a data vector. Note that the markers near the lips are not considered, because the articulation of speech might be recognized as a smile, which would confuse the emotion recognition system [9]. It is well observed that different emotions appear as separate clusters, so important clues can be extracted from the spatial positions in this 6-dimensional feature space. Psychological research has identified six facial expressions that correspond to distinct universal emotions [10]; it is interesting to note that four of the six are negative emotions. We generalize the cues for facial expressions as given in Table 1 below [11].
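A minimal sketch of the normalization in steps (1)–(4), assuming markers arrive as per-frame coordinate arrays; the Kabsch-style rotation estimate via SVD is our choice of alignment method, since the paper does not specify one:

```python
import numpy as np


def normalize_frames(frames, nose_idx, rigid_idx, ref_idx):
    """Normalize marker frames per steps (1)-(4).

    frames    : (n_frames, n_markers, 3) array of marker coordinates
    nose_idx  : index of the nose marker (local coordinate center)
    rigid_idx : indices of the three approximately rigid markers
    ref_idx   : index of the neutral, closed-mouth reference frame
    """
    # (1) translate every frame so the nose marker is the origin
    centered = frames - frames[:, nose_idx:nose_idx + 1, :]
    # (2) pick the reference frame's rigid markers
    ref = centered[ref_idx, rigid_idx, :]
    aligned = np.empty_like(centered)
    for i, frame in enumerate(centered):
        # (3)-(4) estimate the rotation mapping this frame's rigid
        # markers onto the reference (Kabsch algorithm via SVD)
        src = frame[rigid_idx, :]
        u, _, vt = np.linalg.svd(src.T @ ref)
        rot = u @ vt
        if np.linalg.det(rot) < 0:   # avoid reflections
            u[:, -1] *= -1
            rot = u @ vt
        aligned[i] = frame @ rot
    return aligned
```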


Table 1. Facial expressions and their motion cues

Expression   Motion Cues
Happiness    raising and lowering of mouth corners
Sadness      lowering of mouth corners; inner portion of brows raised
Surprise     brows arch; eyes open wide to expose more white; jaw drops slightly
Fear         brows raised; eyes open; mouth opens slightly
Disgust      upper lip raised; nose bridge wrinkled; cheeks raised
Anger        brows lowered; lips pressed firmly; eyes bulging

4 Implementation and Results

We have used the support vector machine (SVM) [12], a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. The facial spots in the five individual blocks are identified before and after the course delivery. Our objective is to classify the data: each training data point belongs to one of two classes, and the goal is to decide which class a new data point falls into. The SVM views each data point as a p-dimensional vector (a list of p numbers) and finds a (p − 1)-dimensional hyperplane separating the points; our training data uses 81 dimensions, so p = 81. A reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes, so we choose the hyperplane such that the distance from it to the nearest data point on each side is maximized. This results in the maximum-margin hyperplane, the perceptron of optimal stability in ANN terms.
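As a small sketch of this maximum-margin classification, using scikit-learn's linear SVM on a few illustrative 2D points (the points and the library are our stand-ins; the paper's experiments use 81-dimensional vectors and the DTREG tool):

```python
# Sketch of max-margin classification with a linear SVM. The six 2D
# points below are purely illustrative, not from the paper's dataset.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [2.5, 1.5],    # class -1
              [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])   # class +1
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]   # hyperplane: w . x + b = 0
print("w =", w, "b =", b)
print(clf.predict([[5.5, 5.0]]))         # classify a new point
```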

Fig. 3. Facial points before and after course delivery

We extracted a set of values from the facial spots of one training sample. We identified 81 spots across the face according to the muscle movements, divided into the 5 blocks. Before delivery of the course a snapshot was taken (Figure 2(a)) and its spots were marked as the black spot set; after delivery another snapshot was taken (Figure 2(b)) and its spots were marked as the blue spot set.



Both sets are mapped for pattern detection using the SVM, as shown in Figure 3. We then assign the prediction classes: the black points are placed in one class (+1) and the blue points in the other class (−1). We take training data consisting of a set of n points.

Positive:   w · x + b = +1
Negative:   w · x + b = −1
Hyperplane: w · x + b = 0

We find the unknowns w and b by expanding the equations:

w1x1 + w2x2 + b = +1 (1)

w1x1 + w2x2 + b = -1 (2)

w1x1 + w2x2 + b = 0 (3)

We generated the training data for 129 facial emotions before and after course delivery. Each dataset contains 81 values; the first ten rows of one training dataset are given in Table 2. The same set of tables was generated for all 129 training snapshots.

Table 2. Training data (First 10 rows)

SL   Black (+1)        Blue (−1)
     X      Y          X      Y
1    15     71         17     65
2    20     61         23     58
3    25     51         30     57
4    34     42         39     44
5    47     38         54     41
6    61     37         68     39
7    75     38         85     41
8    91     46         100    47
9    100    52         111    54
10   108    61         47     58
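As a worked check of equations (1)–(3), consider the first sample pair in Table 2; this two-point case is our own illustration, exploiting the fact that with exactly one point per class the maximum-margin hyperplane is the perpendicular bisector of the segment joining them:

```python
# Worked check of equations (1)-(3) on the first sample pair of
# Table 2: black point (15, 71) in class +1, blue point (17, 65) in
# class -1.
import numpy as np

x_pos = np.array([15.0, 71.0])
x_neg = np.array([17.0, 65.0])

d = x_pos - x_neg
w = 2.0 * d / np.dot(d, d)              # w = 2(x+ - x-) / ||x+ - x-||^2
b = -np.dot(w, (x_pos + x_neg) / 2.0)   # hyperplane passes through midpoint

print(w, b)                  # w = [-0.1, 0.3], b = -18.8
print(np.dot(w, x_pos) + b)  # +1.0, satisfying equation (1)
print(np.dot(w, x_neg) + b)  # -1.0, satisfying equation (2)
```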

Using DTREG SVM modeling (www.dtreg.com), we generated the report shown in Table 3.

4.1 Analysis of the Report

The result from one training dataset is presented in Table 3.

Table 3. Result of the training data

Bin    Cutoff     Mean       Mean       Cum %        Cum %     Cum     % of         % of
Index  Target     Predicted  Actual     Population   Target    Gain    Population   Target   Lift
--------------------------------------------------------------------------------------------------
1      1.9971709  1.9972103  2.0000000  11.11        14.81     1.33    11.11        14.81    1.33
2      1.9971441  1.9971503  2.0000000  22.22        29.63     1.33    11.11        14.81    1.33
3      1.9971144  1.9971161  2.0000000  33.33        44.44     1.33    11.11        14.81    1.33
4      1.9971038  1.9971082  2.0000000  44.44        59.26     1.33    11.11        14.81    1.33
5      1.0030278  1.4999917  1.5000000  55.56        70.37     1.27    11.11        11.11    1.00
6      1.0030010  1.0030055  1.0000000  66.67        77.78     1.17    11.11        7.41     0.67
7      1.0029607  1.0029694  1.0000000  77.78        85.19     1.10    11.11        7.41     0.67
8      1.0028067  1.0028810  1.0000000  88.89        92.59     1.04    11.11        7.41     0.67
9      1.0023318  1.0025693  1.0000000  100.00       100.00    1.00    11.11        7.41     0.67
10     1.0023318  0.0000000  0.0000000  100.00       100.00    1.00    0.00         0.00     0.00

Average gain = 1.190 , Mean value of target variable = 1.5


4.2 Explanation

We need the average gain for further identification of the course delivery pattern; the rest of the report is not required in our study. As per the software, a gain of 1.00 or less implies that no pattern is detected, and hence we take the average over all the training samples. The average gains of the first 10 training samples are shown in Table 4.

Table 4. Average gain of the first 10 training samples

Sample No   Average Gain
1           1.1900
2           1.9100
3           1.0000
4           1.2900
5           1.1100
6           1.1200
7           1.1800
8           1.1700
9           1.1403
10          1.4404
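A small sketch of this averaging-and-threshold check, using the ten gains from Table 4 (the actual decision, stated below, averages over all 129 samples):

```python
# Average the per-sample gains reported by the SVM tool and compare
# against the 1.00 threshold below which no pattern is detected.
gains = [1.1900, 1.9100, 1.0000, 1.2900, 1.1100,
         1.1200, 1.1800, 1.1700, 1.1403, 1.4404]

avg = sum(gains) / len(gains)
print(f"average gain = {avg:.4f}")
print("pattern recognized" if avg > 1.00 else "no pattern detected")
```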

The average over all 129 training samples is 1.3693, which is more than 1.00; this implies that the pattern has been recognized. The next step is to form groups for lesson identification, and as per psychological theory the groups are formed as given in Table 5.

Table 5. Emotion - Group – Decision Taken

Emotion Detected                 Grouping          Learning Style Detection
Happiness, surprise              Positive Group    Lesson Understood
Sadness, fear, disgust, anger    Negative Group    Lesson Not Understood
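A minimal sketch of the Table 5 grouping as it might feed course selection (function and set names are ours, not the authors'):

```python
# Map a detected emotion to its group and to the lesson decision
# that drives course selection (Table 5).
POSITIVE = {"happiness", "surprise"}
NEGATIVE = {"sadness", "fear", "disgust", "anger"}


def lesson_decision(emotion: str) -> str:
    if emotion.lower() in POSITIVE:
        return "Lesson Understood"
    if emotion.lower() in NEGATIVE:
        return "Lesson Not Understood"
    raise ValueError(f"unknown emotion: {emotion}")


print(lesson_decision("surprise"))  # Lesson Understood
```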

4.3 Modeling of Learning Styles with Neural Networks

E-Learning environments can take advantage of different forms of learning by recognizing the pedagogy suited to each individual student using the system and adapting the content of courses to match this style. The method is based on artificial neural networks (ANNs) [13]: computational models for classification inspired by the neural structure of the brain, which have proven to produce very accurate classifiers. In the proposed approach, neural networks are used to recognize learners' learning styles based upon the actions they have performed in an E-Learning system. As per the detected affective state, the system suggests the lesson to the individual learner. Following the flow diagram, the system can select the learning pedagogy of the learner; neuro-fuzzy logic is required to implement this methodology. The flow diagram is shown in Figure 4.


[Figure 4 shows the flow: start; apply Lesson 1, Method 1; scan the learner's facial expression; detect the emotion from the pre-scanned image using neuro-fuzzy logic. If the lesson is understood (Y), apply Lesson 2, Method 1 and repeat the process; if not (N), re-apply Lesson 1 with Method 2 and scan again.]

Fig. 4. Flow of Learning Method identification
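A compact sketch of the loop in Figure 4, with the neuro-fuzzy emotion detector abstracted behind a callable (names are our hypothetical stand-ins; the paper does not publish the control code):

```python
# Hypothetical sketch of the Figure 4 loop. `detect_emotion` stands in
# for the neuro-fuzzy recognizer; `deliver` presents a lesson with a
# given teaching method.
POSITIVE = {"happiness", "surprise"}  # "Lesson Understood" group (Table 5)


def run_learning_flow(lessons, methods_per_lesson, deliver, detect_emotion):
    for lesson in lessons:
        for method in range(1, methods_per_lesson + 1):
            deliver(lesson, method)       # apply Lesson i, Method m
            emotion = detect_emotion()    # scan facial expression
            if emotion in POSITIVE:       # Lesson Understood (Y)
                break                     # move on to the next lesson
            # Lesson Not Understood (N): retry with the next method
```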

5 Conclusion

In this paper, we have shown the effectiveness of facial emotion recognition for identifying the affective state of a learner. This research also analyzed the strengths of facial expression classifiers in an E-Learning environment using SVM. The results presented here show that it is feasible to recognize human affective states with high accuracy using visual modalities. Therefore, the next generation of human-computer interfaces may be able to perceive human feedback and respond appropriately and opportunely to users' affective states, improving the performance and engagement of current interfaces.

In future work, other image-capture obstructions, such as lateral views, faces with spectacles and sweaty faces, are to be considered. Our goal is to maximize competence in human-computer interaction for building an affective E-Learning system.

References

1. Cacioppo, J.T., Tassinary, L.G.: Inferring psychological significance from physiological signals. American Psychologist, 16–28 (1990)

2. Elfenbein, H.A., Ambady, N.: Universals and cultural differences in understanding emotions. Curr. Dir. Psychol. Sci. 12(5), 159–164 (2003a)


3. Lien, J.J., Kanade, T., Cohn, J.F., Li, C.-C.: Automated Facial Expression Recognition Based on FACS Action Units. In: Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 390–395 (1998)

4. Ekman, P., Friesen, W.V.: The Facial Action Coding System. Consulting Psychologists Press, Inc., San Francisco (1978)

5. Messom, C.H., Sarrafzadeh, A., Johnson, M.J., Chao, F.: Affective State Estimation From Facial Images Using Neural Networks And Fuzzy Logic, http://www.massey.ac.nz/~chmessom/Manuscript%20NGITS_NN.pdf

6. Mase, K.: Recognition of facial expression from optical flow. IEICE Transc., E. 74(10), 3474–3483 (1991)

7. Zhu, E., Liu, Q., Xu, X., Lei, T.: Research on Affective State Recognition in E-Learning System by Using Neural Network. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS 2007, Part III, LNCS, vol. 4489, pp. 575–578. Springer, Heidelberg (2007)

8. Pantic, M., Rothkrantz, L.J.M.: Toward an affect-sensitive multimodal human-computer interaction. Proceedings of the IEEE 91(9), 1370–1390 (2003)

9. Guo, G., Li, S., Chan, K.: Face recognition by support vector machines. Image and Vision Computing 19(9-10), 631–638 (2001)

10. Yacoob, Y., Davis, L.: Recognizing Human Facial Expressions from Long Image Sequences Using Optical Flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(6), 636–642 (1996)

11. Ekman, P., Friesen, W.V.: Universals and cultural differences in the judgments of facial expressions of emotion. Journal of Personality and Social Psychology 53(4), 712–717 (1987)

12. Burges, C.: A tutorial on support vector machines for pattern recognition. Data Mining and Know. Disc. 2(2), 1–47 (1998)

13. Keefe, L.W.: Learning style: an overview. NASSP’s Student learning styles: Diagnosing and prescribing programs, 1–17 (1979)