How about using the PCA to analyze changes in learning styles ?


Federico Scaccia 1, Carlo Giovannella 1,2

1 ISIM Garage - Dept. of Science and Technology of Education and 2 Scuola IaD Tor Vergata University of Rome

via della ricerca scientifica 1, 00133 Rome, Italy [email protected], [email protected]

Abstract. The large volume of traces that learners can produce during today's collaborative educational processes calls for increasingly sophisticated analytical tools. Such tools should help teachers and tutors identify emergent behaviors and individual styles, and help students become more aware of their own potential, also in relation to a given operational context. In this paper we describe: a) a module designed and developed to perform Principal Component Analysis (PCA) as an internal facility of the on-line learning place LIFE; b) its use to identify, as an example, groups of students characterized by similar learning styles. We have also investigated the possibility of using PCA to follow the time evolution of such styles over one or two years. The results reveal interesting potential and open new lines of investigation.

Keywords: PCA, docimology, learning styles

1 Introduction to research problems on learning styles

For some years we have been witnessing a progressive diffusion of more collaborative and open learning processes and, at the same time, growing attention to all meaningful dimensions of learning "experiences", in an attempt to satisfy as far as possible the propensities and expectations of individuals or, in other words, their personal styles.

As a result, we now face two main problems:
• a long-standing one, related to the definition of satisfactory models of personal styles and of the experience, considered essential to offer personalized learning paths [1-12];
• a newer one, which is becoming increasingly evident as the complexity and openness of learning processes grow: the need for new and more powerful techniques of monitoring, analysis and visualization [13].

After more than fifty years of research, the elaboration of models of individual styles applicable to learning processes is still, at least partially, an open issue. Despite all the efforts devoted to identifying a convincing model of individual learning style, many doubts remain about the practical usefulness of such models. The situation is well summarized in a comprehensive overview [14], in which 71 models of learning styles were analyzed and grouped into 5 families ordered by their degree of stability: from the most stable, the constitutionally based ones (physiology and genetics), to those reflecting the cognitive structure (including patterns and abilities), to those considered components of stable personalities, then to flexibly stable learning preferences and, finally, to learning approaches and strategies. A deeper analysis of the 16 most popular models led the authors to criticize the concept of learning styles, whose utility they do not believe has been convincingly demonstrated.

As discussed in previous papers [15,16], this may derive in part from the lack of a reasonable attempt to unify the different backgrounds of the various models into a more general framework that takes into account the whole complexity of the educational process and of the experience and, at the same time, clearly distinguishes among the dimensions describing the process, those describing the context and, finally, those describing the individual styles. It is not surprising, therefore, that the experimental studies performed to date leave considerable uncertainty about the orthogonality of the dimensions of very popular learning style models such as the Felder-Silverman one (FSLSM) [17].

Accordingly, we need:
a) to rethink the theoretical foundations of learning styles [15,16,20,26];
b) to explore new analytical approaches that can overcome problems such as the non-orthogonality of the spaces of representation [18,19].

To address the first of these needs we have introduced a new framework, the "experience style" framework, that defines a 3D representational space of the experience and decouples the dimensions of the process, of the context and of the individual [20]. The "experience style" framework, however, requires a considerable effort (currently in progress) to identify methodologies and strategies to track relevant traces and to work out meaningful indicators for describing personal and contextualized learning experiences. It is a research line that still needs a few years to be fully developed.

In the meantime, one can develop a deeper understanding of the limitations of previous models in preparation for the use of the new framework.

Recently we have shown that the long-term variability of individual learning style (LS) values (observations were conducted over three years) raises reasonable doubts about the appropriateness of using the FSLSM to derive indicators able to guide the design of strategies aimed at realizing customized educational experiences. On the other hand, we have also shown that changes in the mean LS values characterizing groups of homogeneous or identical individuals could probably be used to measure style changes induced by specific training environments or curricula (see also the next paragraph).

The magnitude of the detected variations, however, is quite low and indicates a limited sensitivity of the measurement tool (the FSLS questionnaire) [21].

To investigate further how meaningful such variations are and, at the same time, to explore the possibility of overcoming the non-orthogonality of the FSLSM space of representation, we decided to implement a procedure and a tool to perform a Principal Component Analysis (PCA) of the recorded data.

In the following we first recall the operating principles of PCA, then briefly describe the analysis module that has been implemented and, finally, present its application to a set of data collected over three years to trace variations in individual LS values during attendance of the bachelor's degree in Media Science and Technology of the University of Rome Tor Vergata.

2 The methodological approach and PCA implementation

PCA is a method of factor analysis proposed more than a century ago and nowadays widely used whenever one deals with quantitative variables and a multidimensional space of representation (multivariate distribution) in which the orthogonality of the axes forming the basis is uncertain [22] (here, the 4 dimensions of the FSLSM). PCA performs a linear transformation of the initial space into a new Cartesian space of representation whose dimensions, linear combinations of the original ones, are orthogonal. The ultimate goals are the orthogonalization of the initial space of representation and the concentration of most of the significant information (at least 70% of the whole) into a limited number of dimensions, called Principal Components (PCs), so as to obtain an optimal two-dimensional representation of the original information with a limited loss of information. To achieve this goal [14] one can start by standardizing the initial dataset $X_i$, so that all

$$Z_i = \frac{X_i - \mu_i}{\sigma_i}$$

have zero mean and unit variance, where $\mu_i$ and $\sigma_i$ are the mean and the standard deviation of $X_i$. One then computes the covariance matrix, diagonalizes it to obtain its eigenvalues, and defines the corresponding set of eigenvectors, which form the new orthogonal space of representation in which the data, now linear combinations of the observables, are positioned. Equivalently, an identical result can be achieved by diagonalizing the correlation matrix, whose elements are defined as follows:

$$R_{i,j} = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)}}$$

This second method is less time consuming and is the one we adopted in our study. Once the datasets have been represented in terms of the PCs, it is also important to develop an appropriate interpretation strategy, which in our case made use of the following supports:
a) a display of the contributions provided by each dimension of the old space of representation to the new one;
b) the k-means clustering algorithm [23], to perform an unsupervised search for potential clusters.

In the following we show how the strategy outlined above can be applied to an experimental set of data (individual FS learning styles measured by means of the FS questionnaire) to investigate the possible existence of LS clusters, or the formation of LS clusters induced by attendance of the degree course in Media Science and Technology of the University of Rome Tor Vergata.
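The procedure described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of the LIFE module: it assumes NumPy is available, the function name `pca_from_correlation` is hypothetical, and the six rows of scores are invented values used solely to make the example runnable.

```python
import numpy as np

def pca_from_correlation(data):
    """PCA by diagonalizing the correlation matrix.

    data: array of shape (n_subjects, n_dimensions).
    Returns eigenvalues (descending), eigenvectors (as columns) and scores.
    """
    x = np.asarray(data, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)              # sample standard deviation
    z = (x - mean) / std                     # standardized variables Z_i
    corr = np.corrcoef(x, rowvar=False)      # correlation matrix R
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh, because R is symmetric
    order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                     # coordinates in the PC space
    return eigvals, eigvecs, scores

# Hypothetical scores of 6 subjects on the 4 FSLSM dimensions
# (illustrative values only, not data from the study).
ls = np.array([
    [-5,  3, -7,  1],
    [ 1, -3, -5, -1],
    [-3,  5, -9,  3],
    [ 7, -1, -1, -5],
    [-1,  1, -3,  1],
    [ 3, -5, -7, -3],
])
eigvals, eigvecs, scores = pca_from_correlation(ls)
```

The columns of `eigvecs` hold the weights of each original dimension in the corresponding PC, which is the information used for support a) above; in the new space the components are uncorrelated and their variances equal the eigenvalues.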

Fig. 1. Screenshot of the PCA module integrated in the survey area of LIFE

Following the methodological strategy described above, we have implemented and integrated in LIFE [24] a PCA module (see Fig. 1) that allows users to upload data files in CSV format and to analyze them. The module can apply PCA to a subset of any dimension of the initial observational space and can compare two different sets of data in a single Principal Component Space of Representation (PCSR) to investigate cluster formation and/or evolution. It also displays the eigenvalue and eigenvector matrices and the scree plot, and lets the user choose the axes of representation (by default, those corresponding to the two highest eigenvalues).
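As a rough illustration of the first step performed by such a module, the sketch below reads CSV text and keeps only a chosen subset of dimensions. The file layout, the column names (`act_ref`, `sen_int`, `vis_ver`, `seq_glo`) and the helper `load_columns` are assumptions made for this example; the actual format accepted by LIFE is not specified here.

```python
import csv
import io

def load_columns(csv_text, columns):
    """Read CSV text and return one list of floats per row,
    restricted to the requested columns (in the given order)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[float(row[name]) for name in columns] for row in reader]

# Hypothetical file content; the column names are assumptions.
sample = """subject,act_ref,sen_int,vis_ver,seq_glo
s1,-5,3,-7,1
s2,1,-3,-5,-1
s3,-3,5,-9,3
"""

# Select a 2-dimensional subset of the 4-dimensional observational space.
subset = load_columns(sample, ["act_ref", "vis_ver"])
```

The resulting list of rows can then be passed to any PCA routine that accepts a subjects-by-dimensions table.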

3 Datasets, analysis and discussion

The groups of students involved in the study in each academic year are listed in Table 1. All subjects were asked to fill in the ILS questionnaire [21] at the beginning of the course in Physics, held in the second half of the first year of the Bachelor's degree, and/or at the beginning of the course in "Multimodal Interfaces and Systems" (ISM), held in the second half of the third and final year of the same degree.

Table 1. Groups of students involved in the overall study.

| Course & Academic Year | N. of regular students | N. of students repeating the course |
|---|---|---|
| Physics - a.y. 08-09 | 22 (6 female) | 7 (1 female) |
| Physics - a.y. 09-10 | 47 (6 female) | 9 (2 female) |
| Physics - a.y. 10-11 | 27 (11 female) | 22 (1 female) |
| Physics - a.y. 11-12 | 24 (8 female) | 7 (2 female) |
| ISM - a.y. 08-09 | 17 (6 female) | 4 (no female) |
| ISM - a.y. 09-10 | 31 (6 female) | - |
| ISM - a.y. 10-11 | 16 (4 female) | - |
| ISM - a.y. 11-12 | 16 (5 female) | - |

Overall, the tests involved 120 (26% female) + 45 (repeating, 13% female) first-year Bachelor students and 90 (23% female) third-year Bachelor students. The age of the participants ranged from 19 to 23 years for students attending the first year and from 21 to 26 years for those attending the third year of the Bachelor.

Table 2. Groups of students involved in the present study.

| Time interval between LS measurements on the same subjects attending different courses | N. of subjects |
|---|---|
| 2 years: ISM 10-11 - Phys. 08-09 & ISM 11-12 - Phys. 09-10 | 16 (4 female) |
| 1 year: ISM 09-10 - Phys. 08-09 & ISM 10-11 - Phys. 09-10 | 12 (2 female) |

For the present study we decided to investigate, on the same group of individuals, the changes in LS induced by one or two years of attendance of the Bachelor's course. We therefore considered only the subsets of students listed in Table 2:

(a) all students who attended the course in Physics in the second year and the ISM course in the third year of the Bachelor (one-year check);

(b) all students who attended both the Physics and ISM courses on the regular schedule (two-year check).

A description of the study conducted on all the students involved can be found in ref. [25].

3.1 Descriptive statistics

Table 3 reports the differences between the mean LS values describing the propensities of groups of the same individuals measured one or two years apart, along with the range of individual LS variability (in brackets). The observed changes, at most 17% of the full scale, indicate a tendency toward a reinforcement of the sensing and global styles, while the tendency toward stronger visual and active styles is mixed (it is more evident in the two-year check). It is worth stressing that the variations in the mean LS values are much smaller than the range of individual LS variability. This observation, as stated in the introduction, legitimates doubts about the appropriateness of using the FSLSM to derive indicators for designing strategies that support customized educational experiences at the individual level when short time periods (e.g. a semester) are considered. The overall tendencies, however, are not unreasonable and in our case can be explained by the nature of the Bachelor, which offers many workshops and courses on visual media. Although the detected variations are small, which indicates a limited sensitivity of the measurement tool (the FSLS questionnaire), the reasonable consistency of the observations leaves open the possibility of using the FSLS to detect changes induced by specific training environments/curricula.

Table 3. Differences between the mean LS values of groups of the same individuals one or two years apart, along with the range of individual LS variability (in brackets).

| Differences between LS mean values of student subgroups composed of the same subjects | Act(-) Ref(+) | Sen(-) Int(+) | Vis(-) Ver(+) | Seq(-) Glo(+) |
|---|---|---|---|---|
| 2 years: ISM 10-11 - Phys. 08-09 & ISM 11-12 - Phys. 09-10 (individual variability range) | -1.50 (-8, 8) | -0.25 (-6, 12) | -1.00 (-10, 6) | 0.94 (-6, 6) |
| 1 year: ISM 09-10 - Phys. 08-09 & ISM 10-11 - Phys. 09-10 (individual variability range) | 0.00 (-4, 4) | -1.83 (-8, 2) | 0.17 (-4, 4) | 0.67 (-10, 8) |
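The quantities in Table 3 can be obtained, for one LS dimension, with a computation along the following lines. The sketch assumes that each entry is the mean of the individual differences (later measurement minus first measurement) and that the bracketed range is the minimum and maximum individual difference; the function name and the five pairs of scores are hypothetical.

```python
from statistics import mean

def paired_change(first, later):
    """Return the mean of (later - first) over the same subjects and the
    (min, max) range of the individual differences."""
    diffs = [b - a for a, b in zip(first, later)]
    return mean(diffs), (min(diffs), max(diffs))

# Hypothetical active/reflective scores of the same five students,
# listed in the same order in both measurements.
phys = [-5, 1, -3, 7, -1]
ism = [-7, 3, -3, 1, -5]

mean_diff, diff_range = paired_change(phys, ism)
```

With these invented values the mean difference is -2 while individual changes span from -6 to 2, which mirrors the situation discussed above: a small shift of the group mean can coexist with much larger individual variations.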

The correlation matrices computed for all subsets of data confirm the existence of a strong correlation between the sensing/intuitive and sequential/global dimensions and a reasonable anti-correlation between the sequential/global and visual/verbal dimensions, as also reported in previous studies. Over time we also observed an increasing correlation between the active/reflective and visual/verbal dimensions and a strong decorrelation between the visual/verbal and sensing/intuitive dimensions. Such correlations indicate the non-orthogonality of the FSLSM and are a good reason to carry out a more detailed investigation of the subsets of data considered here by means of PCA. Another good reason to use PCA is to bring out the dynamics hidden behind the variations of the mean LS values.

3.2 Inferential statistics

A peculiarity of this study with respect to usual PCA studies is that we want to compare subsets of data taken one or two years apart. Applying the space transformation to each subset separately would lead to different spaces of representation that are not easily comparable. For this reason we decided to use PCA to define the space transformation on the "initial" subset of data (i.e. the data collected during the Physics course of the Bachelor) and to use the same Principal Components to visualize the data collected one or two years later.
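A minimal sketch of this idea follows. It assumes NumPy and, as a choice not stated in the text, standardizes the later data with the mean and standard deviation of the initial subset so that both sets share the same origin and scale; the names `fit_pcs` and `project` and all numerical values are hypothetical.

```python
import numpy as np

def fit_pcs(initial):
    """Derive standardization parameters and PCs from the initial subset only."""
    x = np.asarray(initial, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(x, rowvar=False))
    order = np.argsort(eigvals)[::-1]        # decreasing eigenvalue
    return mean, std, eigvecs[:, order]

def project(data, mean, std, eigvecs, n_components=2):
    """Map data into the PC space defined by fit_pcs."""
    z = (np.asarray(data, dtype=float) - mean) / std
    return z @ eigvecs[:, :n_components]

# Hypothetical paired measurements (same students, same row order).
physics = np.array([[-5, 3, -7, 1], [1, -3, -5, -1], [-3, 5, -9, 3],
                    [7, -1, -1, -5], [-1, 1, -3, 1], [3, -5, -7, -3]])
ism = np.array([[-7, 1, -9, 3], [-1, -5, -7, 1], [-3, 3, -9, 5],
                [3, -3, -5, -1], [-3, 1, -5, 3], [1, -5, -9, -1]])

mean, std, pcs = fit_pcs(physics)
before = project(physics, mean, std, pcs)   # initial points
after = project(ism, mean, std, pcs)        # later points, same axes
displacement = after - before               # one vector per student
```

Because both sets are mapped with the same linear transformation, each row of `displacement` is the movement of one student in the common PC space.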

To verify the meaningfulness of the transformation obtained by diagonalizing the correlation matrices of the two subsets of data (see Table 2) collected during the Physics course, we show the respective scree plots in Figure 2: in both cases the summed weight of the first two eigenvalues is higher than 70%, and in the first case it is even higher than 85%.
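The check behind a scree plot reduces to simple arithmetic on the eigenvalues, as in the sketch below. The eigenvalues used are invented for illustration (they sum to 4, as they must for a correlation matrix of 4 dimensions) and are not those of the study; the function name is hypothetical.

```python
def explained_variance(eigenvalues):
    """Return the weight of each eigenvalue (descending order) and the
    cumulative weights."""
    total = sum(eigenvalues)
    ratios = [ev / total for ev in sorted(eigenvalues, reverse=True)]
    cumulative = []
    running = 0.0
    for r in ratios:
        running += r
        cumulative.append(running)
    return ratios, cumulative

# Hypothetical eigenvalues of a 4 x 4 correlation matrix.
ratios, cumulative = explained_variance([2.0, 1.0, 0.75, 0.25])

# The 70% criterion applied to the first two components.
first_two_sufficient = cumulative[1] >= 0.70
```

In this invented case the first two components carry 75% of the total variance, so a two-dimensional representation would satisfy the 70% criterion adopted here.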

Fig. 2. Scree plots of the eigenvalues derived from the diagonalization of the correlation matrices of the data collected during the course in Physics, to be compared with the data collected one year (top) and two years (bottom) later during the ISM course

Figure 3 compares the subsets of data recorded at the beginning of the course in Physics (blue dots) with those collected one year (top, group 1) and two years (bottom, group 2) later at the beginning of the ISM course (red dots). We recall that the points represented in the plots are no longer the data collected through the LS questionnaire, but linear combinations of the observables. To help the reader understand the contribution that each learning style provides to the two Principal Components, Y1 and Y2, of the orthogonal space of representation, we have placed on Fig. 3 the initials of each learning style, according to the weight they have in the linear combinations that define the new axes. In the first case (top figures), students attending the second year of the Bachelor, we observe the formation of a cluster characterized by active and sequential styles. Within a year this cluster moves almost rigidly toward positive Y1, as a consequence of an increase in the global and visual styles, counterbalanced by a decrease in the sequential character of the cluster.

Fig. 3. Representation of students' styles in the Principal Component space. Blue dots are derived from data collected at the beginning of the course in Physics, red dots from data collected before the ISM course. Comparisons are between data collected one year apart (top) and two years apart (bottom). Arrows indicate the displacements of the points.
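The unsupervised cluster search mentioned in Section 2 can be illustrated with a plain implementation of the k-means (Lloyd) iteration on two-dimensional points such as those of Fig. 3. This is a sketch under stated assumptions, not the procedure used in LIFE: the initial centroids are simply the first k points (a deterministic choice made for reproducibility), and the points are invented.

```python
def kmeans(points, k, iterations=100):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k points
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels

# Hypothetical coordinates (Y1, Y2) of six students in the PC space.
points = [(0, 0), (5, 5), (0, 1), (1, 0), (5, 6), (6, 5)]
centroids, labels = kmeans(points, k=2)
```

With these points the algorithm separates the three students near the origin from the three near (5, 5). On real data the result depends on the choice of k and of the initial centroids, so several runs with different settings are advisable.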

In the same period we also observe the formation of a smaller cluster acting as an attraction point for a few elements characterized by a dominant sequential style. In the second case (bottom figures) we observe once again the formation of a small cluster of attraction dominated, this time, by the sensing style of its members. Moreover, we again observe the development of a large clustering region along the positive portion of axis Y1, but it splits in two: the upper cluster shows dominant active and visual features but loses the global character in favor of the sensing style; the lower cluster is dominated by global and visual styles, but is no longer characterized by the active one.

Considering the process as a whole, we can say that at the beginning of the Bachelor course students seem to be characterized by heterogeneously distributed propensities, so that no clusters can be identified. Clusters start to develop already during the first year (see blue ellipse, top plot of Figure 3), and they continue to develop throughout the second and third years of the degree program. The educational process induces in most of the students more active, global and visual behaviors/styles, although not in a homogeneous manner. There is always, however, a small group of students that develops more verbal, sequential and/or sensing styles. Finally, there is also a minority of students who are characterized by completely independent styles and are not attracted by the clustering areas. The analysis could be pushed further (for example as regards the cluster analysis), but this is beyond the scope of this article.

Our study, in fact, was primarily intended to show how the introduction of inferential statistics allows us to obtain more precise information on the influence that educational processes may have on the transformation of learning styles and behaviors. Such information normally remains hidden within the data produced by descriptive statistics which, by definition, compensate opposite behaviors by averaging them. The approaches of inferential statistics are thus very important whenever the dimensions considered relevant for a given model are not orthogonal (as in the case of the FSLSM) and, more generally, whenever the complexity of the system and the number of relevant dimensions coming into play are high (as in the case of all models of the "experience"). All this indicates that inferential statistics will become increasingly relevant in the future.

References

1. Brusilovsky P.: User Modeling and User-Adapted Interaction, Adaptive Hypermedia, 11, pp. 87--110 (2001).

2. Brusilovsky P., Peylo C.: Adaptive and Intelligent Web-based Educational Systems, International Journal of Artificial Intelligence in Education, 13, pp. 156--169 (2003).

3. Papanikolau K.A., Grigoriadou M.: Accommodating learning style characteristics in adaptive educational hypermedia systems. In: AH2004 part I, Eindhoven (2004).

4. Branco Neto W., Gauthier F., Modesto Nassar S.: An adaptive e-learning model for the Semantic Web. In: International Workshop on Applications of Semantic Web technologies for E-Learning, Banff, Canada, pp. 63--64 (2005).

5. Brusilovsky P., Kobsa A., Nejdl W.(eds.): Adaptive Web 2007, LNCS, Springer, vol. 4321 (2007).

6. Graf S., Kinshuk: An Approach for Detecting Learning Styles in Learning Management Systems. In: ICALT 2006, IEEE press, pp. 161--163 (2006).

7. Graf S.: Adaptivity in learning management systems focusing on learning styles, Ph.D. Thesis, Faculty of Informatics, Vienna University of Technology (2007).

8. Graf S., Lin T., Kinshuk: The relationship between learning styles and cognitive traits - Getting additional information for improving student modelling, Computers in Human Behavior, 24, pp. 122--137 (2008).

9. Thalmann S.: Adaptation Criteria for Preparing Learning Material for Adaptive Usage: Structured Content Analysis of Existing Systems. In USAB 2008, LNCS, Springer, 5298, pp. 411--418 (2008).

10. Germanakos P., Tsianos N., Lekkas Z., Mourlas C., Belk M., Samaras G.: Towards an Adaptive and Personalized Web Interaction using Human Factors. In: Advances in Semantic Media Adaptation and Personalization, Vol. 2. M. Angelides, P Mylonas, M. Wallace (Ed.), Taylor & Francis Group, LLC (2010).

11. Granic A., Nakic J.: Enhancing the Learning Experience: Preliminary Framework for User Individual Differences. In USAB 2010, LNCS, Springer, 6389, pp. 384--399 (2010).

12. Popescu E., Badina C., Moraret L.: Accommodating Learning Styles in an Adaptive Educational System, Informatica, 34, pp. 451--462 (2010).

13. Giovannella C., Carcone S., Camusi A.: What and how to monitor complex educative experiences. Toward the definition of a general framework, IxD&A, 11&12, pp. 7--23 (2011).

14. Coffield C., Mosely D., Hall E., Ecclestone K.: Learning styles and Pedagogy in Post-16 Learning. LSRC, Univ. of Newcastle upon Tyne, London (2004).

15. Giovannella C., Spadavecchia C., Camusi A.: Educational Experiences and Experiences Styles, IxD&A, 9&10, pp. 104--116 (2011).

16. Giovannella C., Camusi A., Spadavecchia C.: From Learning Styles to Experience styles. In: ICALT2010, pp. 732--733. IEEE Press, New York (2010).

17. Felder R.M., Silverman L.K.: Learning and teaching styles in engineering education, Engineering Education, 78(7), pp. 674--681 (1988).

18. Viola S.R., Graf S., Kinshuk, Leo T.: Analysis of Felder-Silverman Index of Learning Styles by a Data-driven Statistical Approach, International Journal of Interactive Technology and Smart Education, 4, pp. 7--18 (2007).

19. Felder R.M., Spurlin J.: Applications, Reliability and Validity of the Index of Learning Styles, Int. J. Engng, 21, pp. 103--112 (2005).

20. Giovannella C., Moggio F.: Toward a general model of the learning experience. In: ICALT 2011, IEEE publisher, pp. 644--645 (2011).

21. http://www.engr.ncsu.edu/learningstyles/ilsweb.html

22. Bolasco S.: Analisi Multidimensionale dei dati. Carocci, Roma (1999).

23. MacQueen J. B.: Some Methods for classification and Analysis of Multivariate Observations. In: Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1, pp. 281--297 (1967).

24. Learning in an Interactive Framework to Experience, http://life.mifav.uniroma2.it

25. Giovannella C.: What can we learn from long-time lasting measurements of Felder-Silverman's learning styles? In: ICALT 2012, IEEE publisher (2012).

26. Popescu E.: A Unified Learning Style Model for Technology-Enhanced Learning: What, Why and How?, International Journal of Distance Education Technologies, IGI Global, 8, pp. 65--81 (2010).