Proceedings of the 20th National Symposium on Mathematical Sciences
AIP Conf. Proc. 1522, 596–603 (2013); doi: 10.1063/1.4801179
Published by the American Institute of Physics. © 2013 AIP Publishing LLC

The Relationship between Complexity (Taxonomy) and Difficulty

Tan Yih Tyng and Abdul Rahman Othman

School Of Distance Education, Universiti Sains Malaysia, 11800 USM Penang, Malaysia

Abstract. Difficulty and complexity are important factors present in every test question, and both can affect the reliability of a test. Hence, difficulty and complexity must be considered by educators when preparing test questions. In this paper the relationship between difficulty and complexity is studied. Complexity is defined as the level in Bloom’s Taxonomy. Difficulty is represented by the proportion of students scoring within specific score intervals. A chi-square test of independence between difficulty and complexity was conducted on the results of a continuous assessment of a third year undergraduate course, Probability Theory. The independence test showed that difficulty and complexity are related; however, the relationship is weak.

Keywords: difficulty, complexity. PACS: 01.40.-d

INTRODUCTION

Item difficulty and complexity are important factors when preparing exam questions. They need to be considered by educators and researchers in the field of education. Difficulty and complexity can also impact the reliability of the test.

The simplest and most general definition of the difficulty of a question is the ratio of the number of students who answered the question correctly to the number of students taking the test [1]. This ratio is also known as the index of difficulty, and is denoted as the P-value for a particular question:

P = c / n    (1)

where c is the number of students who answered the question correctly and n is the total number of students who took the test. This P-value is not the same as the p-value in hypothesis testing; note that the ‘p’ in the p-value of hypothesis testing is lowercase.

A higher P-value indicates an easy question or item, while a lower P-value indicates a difficult one. Hotiu [2], Phipps [3] and Zaman [4] redefined equation (1) as the percentage of students who answer the question correctly, which is

P = (c / n) × 100    (2)
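Equations (1) and (2) amount to a simple proportion; a minimal sketch in Python, using hypothetical counts (84 correct answers out of 120 students), is:

```python
def difficulty_index(correct, total, as_percent=False):
    """Difficulty index P = c/n of equation (1); equation (2) scales it to a percentage."""
    p = correct / total
    return 100 * p if as_percent else p

# Hypothetical item: 84 of 120 students answered correctly.
print(difficulty_index(84, 120))        # 0.7 -> a fairly easy item
print(difficulty_index(84, 120, True))  # 70.0
```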

According to Sousa [5], difficulty refers to the amount of effort required to obtain the solution at the same level of complexity. In general, complexity can be classified into the six levels of Bloom’s Taxonomy [6]. Complexity relates to the syllabus content to be tested and can occur at varying levels. At each level, the number of tasks or solution steps varies; usually, low-level complexity involves fewer steps than higher-level complexity. This can be seen in the system used by the Florida Department of Education (FLDOE) [7], in which the levels of Bloom’s Taxonomy have been recategorized into three: levels 1 and 2 as low, levels 3 and 4 as moderate, and levels 5 and 6 as high.

From Sousa’s definition [5], item difficulty can be increased without changing the level of complexity. He also stated that educators often do not realize the difference between difficulty and complexity; in fact, many believe item difficulty and complexity are the same. In their opinion, the increased number of steps in an item of high complexity turns the item into a difficult one. They relate complexity to students’ ability, when it is actually difficulty that is related to the ability of students.

The definitions of difficulty mentioned above are for multiple choice questions and are not compatible with mathematics questions requiring structured answers. Normally in mathematics, the number of solution steps reflects the complexity of the question: the larger the number of solution steps, the higher the complexity of the question (the higher the level in Bloom’s Taxonomy). At the same time, a question with a high level of complexity gives the impression that the question is difficult.

Given that the difficulty of multiple choice questions is defined by the percentage of students who answered correctly, we propose that the difficulty of structured questions for the probability course be defined in an analogous manner. Difficulty in this study is based on student performance, namely the proportion of students scoring marks in particular ranges. Complexity is based on the levels of Bloom’s Taxonomy. The relationship between difficulty and complexity defined in this way is studied in this paper.

METHODOLOGY

Initially, the questions of the continuous assessment for the third year undergraduate course, Probability Theory, were set based on the levels of Bloom’s Taxonomy. Since only three levels of Bloom’s Taxonomy were identified, we reclassified them as low, moderate and high and treated them as the levels of complexity.

Difficulty is defined as the percentage of students scoring within certain ranges, so the number of ranges must be chosen. We set the number of ranges at three, five and twelve levels, and studied whether this increase in the number of ranges for difficulty has an impact on its relationship with complexity, and vice versa.

For three levels of difficulty, with a maximum score of 100, the range of scores and their labels are defined as:

(1) 80 to 100, Distinction; (2) 50 to 79, Pass and (3) 0 to 49, Fail.

These scoring ranges are commonly used in schools. Note that the sizes of the ranges are not the same.

Therefore, we proposed the division of scores into ranges of equal size as follows: (1) 80-100, A; (2) 60-79, B; (3) 40-59, C; (4) 20-39, D and (5) 0-19, F.

Labels A, B, C, D and F were chosen for convenience.

Universiti Sains Malaysia uses twelve levels of score ranges. They are:

(1) 80-100, A; (2) 70-79,A-; (3) 60-69, B+; (4) 56-59, B; (5) 50-55, B-; (6) 46-49, C+; (7) 40-45, C; (8) 36-39, C-; (9) 30-35, D+; (10) 25-29, D; (11) 20-24, D- and (12) 0-19, F.


Difficulty defined at twelve levels has never been studied in terms of its relationship with any factor related to student performance or to the characteristics of questions. This study determines whether difficulty defined in these three different ways has an impact on complexity or otherwise.

A total of 140 students sat for the continuous assessment. The assessment was an open book test, meaning the students were allowed to refer to any books and notes during the test. There were eight questions in the test: two questions at low complexity, two at moderate complexity, and four at high complexity. Therefore, the total number of data points was 140 × 8 = 1120.

Three tests of independence were run: on a 3×3 contingency table (three levels of complexity with three levels of difficulty), a 3×5 contingency table (three levels of complexity with five levels of difficulty) and a 3×12 contingency table (three levels of complexity with twelve levels of difficulty). The test statistic, V, is given by:

V = Σ_{i=1}^{r} Σ_{j=1}^{c} (O_ij − E_ij)² / E_ij    (3)

where O_ij is the observed value, E_ij is the expected value, and V is distributed chi-square with (r−1)(c−1) degrees of freedom, r being the number of levels of complexity and c the number of levels of difficulty. When V is large enough that its corresponding p-value is smaller than 0.05, the factors tested are declared related.
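As a check, the V statistic of equation (3) can be computed directly from the observed frequencies; the sketch below uses the 3×3 table of Table 1 (frequencies taken from the Results section) in plain Python rather than a statistics package:

```python
# Chi-square test of independence, equation (3), on the 3x3 table of Table 1:
# rows = complexity (low/moderate/high), columns = difficulty (fail/pass/distinction).
observed = [
    [215, 43, 22],    # low
    [197, 52, 31],    # moderate
    [351, 126, 83],   # high
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E_ij = (row total x column total) / grand total
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# V = sum over all cells of (O_ij - E_ij)^2 / E_ij
V = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(3) for j in range(3))

df = (3 - 1) * (3 - 1)   # (r-1)(c-1) = 4 degrees of freedom
print(round(V, 3))       # 18.616, the value reported in Table 4
```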

In data mining, the variable selection procedure in predictive modeling of a categorical dependent variable Y as a function of many categorical independent variables X1, X2, …, Xn, where n is a large number, involves the construction of n contingency tables Y × X1, Y × X2, …, Y × Xn [8]. This yields n values of V, and those that are large and significant indicate which Xi variables crossed with Y can be chosen to build the model, giving the impression that the larger the V value, the stronger the relationship between Y and Xi. In statistical methodology, however, the magnitude of V has no connotation whatsoever for the strength of the relationship between two categorical variables; rather, a large V leads to the rejection of the null hypothesis that the two categorical variables are independent. A rejected test of independence implies only that the two categorical variables are related. Hence a more meaningful measure of association, or one might say a correlation-like measure, is needed if we are to explore this implied relationship further.

Conover [9] presented several measures proposed by Kendall [10] which are based on ranks. If we look at these paired ranks and summarize them in a contingency table, we have two categorical variables that are ordinal in nature; our variables, complexity and difficulty, are ordinal. Over the years, Kendall’s measures have been modified, and the following measures of association related to Kendall [10] are packaged by IBM SPSS Statistics in its crosstabs menu: Kendall’s tau-b, Kendall’s tau-c, Gamma and Somers’ D. The Gamma index was the original index proposed by Kendall [10] as γ. All of these indices were computed so that we can determine the relationship by consensus.

The Kendall’s tau-b index is formulated as follows:

τ_b = (N_s − N_d) / √[(N_s + N_d + T_x)(N_s + N_d + T_y)]    (4)

where N_s is the number of concordant pairs; N_d is the number of discordant pairs; T_x is the number of pairs tied on the independent variable, X; and T_y is the number of pairs tied on the dependent variable, Y. An example of how this index is calculated is given in the Appendix. Kendall’s tau-c index is formulated as follows:

τ_c = (N_s − N_d) / [½ N² (m − 1)/m]    (5)

where N is the number of observations and m is the minimum of (r, c). The Gamma index is formulated as follows:


Gamma = (N_s − N_d) / (N_s + N_d)    (6)

The Somers’ D index can be calculated by:

D = (N_s − N_d) / (N_s + N_d + T_y)    (7)

These indices take values between −1 and +1. Zero indicates that the variables in the contingency table have no relationship; +1 indicates a strong relationship between the two variables in the same direction, while −1 indicates a strong relationship in the opposite direction.
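These pair-based indices are straightforward to compute from a contingency table. The sketch below counts concordant, discordant and tied pairs for the 3×3 table of Table 1 and evaluates equations (4)–(6); the tie counts follow the Appendix’s convention (T_x counted down columns, T_y along rows):

```python
import math

# Table 1: rows = complexity (ordered low -> high),
# columns = difficulty (ordered fail -> distinction).
table = [
    [215, 43, 22],
    [197, 52, 31],
    [351, 126, 83],
]

def pair_counts(t):
    """Concordant (Ns), discordant (Nd) and tied (Tx, Ty) pair counts."""
    r, c = len(t), len(t[0])
    # concordant: second member of the pair lies below and to the right
    ns = sum(t[i][j] * t[k][l]
             for i in range(r) for j in range(c)
             for k in range(i + 1, r) for l in range(j + 1, c))
    # discordant: second member lies below and to the left
    nd = sum(t[i][j] * t[k][l]
             for i in range(r) for j in range(c)
             for k in range(i + 1, r) for l in range(j))
    # ties: pairs in the same column (Tx) or the same row (Ty), Appendix convention
    tx = sum(t[i][j] * t[k][j]
             for j in range(c) for i in range(r) for k in range(i + 1, r))
    ty = sum(t[i][j] * t[i][l]
             for i in range(r) for j in range(c) for l in range(j + 1, c))
    return ns, nd, tx, ty

ns, nd, tx, ty = pair_counts(table)
n = sum(map(sum, table))
m = min(len(table), len(table[0]))

tau_b = (ns - nd) / math.sqrt((ns + nd + tx) * (ns + nd + ty))   # equation (4)
tau_c = (ns - nd) / (0.5 * n * n * (m - 1) / m)                  # equation (5)
gamma = (ns - nd) / (ns + nd)                                    # equation (6)

print(round(tau_b, 3), round(tau_c, 3), round(gamma, 3))  # 0.118 0.097 0.219
```

The printed values reproduce the 3×3 row of Table 4.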

RESULT AND DISCUSSION

Tables 1, 2 and 3 give the frequency of each cell when complexity is crossed with difficulty. For all three tables, the calculated values of V are significant at α = 0.01, which means that there is a relationship between complexity and difficulty. Table 4 gives the value of V for the three contingency tables.

TABLE (1). Complexity crossed with difficulty (3 levels)

Complexity   Fail   Pass   Distinction   Total
Low          215    43     22            280
Moderate     197    52     31            280
High         351    126    83            560
Total        763    221    136           1120

TABLE (2). Complexity crossed with difficulty (5 levels)

Complexity   F     D    C     B    A     Total
Low          208   7    42    1    22    280
Moderate     197   0    2     51   30    280
High         311   40   83    43   83    560
Total        716   47   127   95   135   1120

TABLE (3). Complexity crossed with difficulty (12 levels)

Complexity   F     D-   D    D+   C-   C    C+   B-   B    B+   A-   A     Total
Low          208   2    3    2    0    0    0    42   0    1    0    22    280
Moderate     197   0    1    0    0    0    0    1    0    48   3    30    280
High         311   21   7    11   1    38   3    40   2    17   26   83    560
Total        716   23   11   13   1    38   3    83   2    66   29   135   1120

In addition to the V statistics, the measures of association are also displayed in Table 4. Although all the indices are statistically significant, this means only that they are significantly different from 0. Relative to the full range of 0 to 1, the relationship between complexity and difficulty is weak.

TABLE (4). Relationship between Complexity and Difficulty

Complexity × Difficulty   V          Kendall’s tau-b   Kendall’s tau-c   Gamma    Somers’ D
3×3                       18.616a    0.118b            0.097b            0.219b   0.117b
3×5                       136.658a   0.144b            0.127b            0.248b   0.143b
3×12                      231.112a   0.141b            0.126b            0.242b   0.141b

a There is a relationship between complexity and difficulty at α = 0.01. b Index not equal to 0 at α = 0.01.


FIGURE 1. Three levels of complexity with five levels of difficulty

Figure 1 shows the interaction of complexity with the five levels of difficulty (see Table 2). In Figure 1, the solid lines K_1, K_2 and K_3 represent the low, moderate and high levels of complexity, respectively. Presumably, a K_1 question, being of low complexity, is an easy question, and the percentage of students scoring grade A on such a question is expected to exceed the corresponding percentages for K_2 and K_3 questions. We also expect the percentage of students scoring grade F on a K_1 question to be the lowest compared with those scoring the same grade on K_2 and K_3 questions. A K_2 question is of moderate complexity, so we expect the combined percentage of students scoring grades B and C to be higher than for K_1 and K_3 questions. A K_3 question is at the highest complexity level; hence the percentage of students scoring grade F is expected to be the highest, and the percentage scoring grade A the lowest, compared with questions at the other levels of complexity.

The graph clearly shows that 74.29% of the students who answered a K_1 question got grade F. This is also the highest percentage among questions at the three complexity levels, which means that students found K_1 questions to be the most difficult. The percentage of students getting grade F on a K_2 question is 70.36%, contrary to the expectation that a high percentage of these students should appear in grades B and C. Of the students who attempted a K_3 question, 55.54% obtained grade F. Student performance on the most complex questions is thus better than on questions of lower complexity. The reverse was observed for the percentage of students obtaining grade A.
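The grade-F percentages quoted above follow directly from the grade-F column and row totals of Table 2; a quick check:

```python
# Percentage of students scoring grade F at each complexity level,
# from the grade-F counts and row totals of Table 2.
f_counts = {
    "K_1 (low)":      (208, 280),
    "K_2 (moderate)": (197, 280),
    "K_3 (high)":     (311, 560),
}
for level, (f, total) in f_counts.items():
    print(f"{level}: {100 * f / total:.2f}% grade F")
```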


CONCLUSIONS

From the analyses run on the contingency tables, the 3×5 table, matching complexity with five categories of difficulty, displayed the strongest relationship between complexity and difficulty, although in general this relationship is quite weak (measures of association < 0.25). We found that low complexity questions are not necessarily easy questions, and vice versa.

We can conclude that complexity and difficulty are related, but the relationship is not strong, and the USM grading scheme with twelve levels does not have any distinct advantage over schemes with fewer levels in determining this relationship. At the same significance level, the five categories of difficulty (grading) A, B, C, D and F are sufficient.

ACKNOWLEDGEMENTS

We would like to thank Universiti Sains Malaysia for funding this work through the university’s Short Term Research Grant 304/PJJAUH/6310001.

REFERENCES

1. H. K. Suen, Principles of Test Theories, Hillsdale: Lawrence Erlbaum, 1990.
2. A. Hotiu, "The Relationship Between Item Difficulty and Discrimination Indices in Multiple-choice Tests in a Physical Science Course", Master’s thesis, Florida Atlantic University, 2006.
3. S. D. Phipps, D. Pharm and L. Marcia, American Journal of Pharmaceutical Education 73, 146 (2009).
4. A. Zaman, A. Niwaz, F. A. Faize, M. A. Dahar and Alamgir, European Journal of Social Sciences 17, 61–66 (2010).
5. D. A. Sousa, How the Brain Learns, Thousand Oaks: Corwin, 2006.
6. B. S. Bloom, Taxonomy of Educational Objectives: Book 1, Cognitive Domain, London: Longman, 1979.
7. Florida Department of Education, Cognitive Classification of FCAT Test Items, Tallahassee: Florida Department of Education, 2008.
8. K. S. Sarma, Predictive Modeling with SAS® Enterprise Miner™: Practical Solutions for Business Applications, Cary: SAS Institute, 2007.
9. W. J. Conover, Practical Nonparametric Statistics, New York: Wiley, 1999.
10. M. G. Kendall, Biometrika 30, 81–93 (1938).


APPENDIX

An Example of the Calculation of Kendall’s tau-b Measure of Association.

Table 1 is used to illustrate the calculation of the index (measure of association) Kendall’s tau-b. This index was chosen as the example because all of the terms N_s, N_d, T_x and T_y are calculated; the other indices use only a portion of these terms.

The calculation of the concordant pairs, N_s, starts from the top left of the table and proceeds to the right and down. Each cell frequency is multiplied by the sum of the frequencies below and to the right of it:

215 × (52 + 31 + 126 + 83) = 62780
43 × (31 + 83) = 4902
197 × (126 + 83) = 41173
52 × (83) = 4316

N_s = 62780 + 4902 + 41173 + 4316 = 113171

To calculate the discordant pairs, N_d, we start from the top right of the table and proceed to the left and down. Each cell frequency is multiplied by the sum of the frequencies below and to the left of it:

22 × (197 + 52 + 351 + 126) = 15972
43 × (197 + 351) = 23564
31 × (351 + 126) = 14787
52 × (351) = 18252

N_d = 15972 + 23564 + 14787 + 18252 = 72575

T_y is the number of tied pairs associated with the variable Y. The calculation begins at the top left of the table, moving down the first column and then on to the top of the second column, similarly moving down; each frequency is multiplied by the sum of the frequencies to its right in the same row:

215 × (43 + 22) = 13975
197 × (52 + 31) = 16351
351 × (126 + 83) = 73359
43 × (22) = 946
52 × (31) = 1612
126 × (83) = 10458

T_y = 13975 + 16351 + 73359 + 946 + 1612 + 10458 = 116701


T_x is the number of tied pairs associated with the variable X. The calculation begins at the top left of the table, moving across the first row and then on to the first entry of the second row, similarly moving across; each frequency is multiplied by the sum of the frequencies below it in the same column:

215 × (197 + 351) = 117820
43 × (52 + 126) = 7654
22 × (31 + 83) = 2508
197 × (351) = 69147
52 × (126) = 6552
31 × (83) = 2573

T_x = 117820 + 7654 + 2508 + 69147 + 6552 + 2573 = 206254

Substituting these values into equation (4):

τ_b = (N_s − N_d) / √[(N_s + N_d + T_x)(N_s + N_d + T_y)]
    = (113171 − 72575) / √[(113171 + 72575 + 206254) × (113171 + 72575 + 116701)]
    = 40596 / √[(392000) × (302447)]
    = 0.118
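The substitution can be verified numerically; a minimal check of the figures above:

```python
import math

# Pair counts derived from Table 1 in the steps above.
Ns, Nd, Tx, Ty = 113171, 72575, 206254, 116701

# Equation (4): Kendall's tau-b.
tau_b = (Ns - Nd) / math.sqrt((Ns + Nd + Tx) * (Ns + Nd + Ty))
print(round(tau_b, 3))  # 0.118, matching Table 4
```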
