Dynamic discreduction using Rough Sets




Applied Soft Computing 11 (2011) 3887–3897


P. Dey a,∗, S. Dey a, S. Datta b, J. Sil c

a School of Materials Science and Engineering, Bengal Engineering and Science University, Shibpur, Howrah 711 103, India
b Birla Institute of Technology, Deoghar, Jasidih, Deoghar 814 142, India
c Department of Computer Science and Technology, Bengal Engineering and Science University, Shibpur, Howrah 711 103, India

Article history: Received 28 January 2010; Received in revised form 1 September 2010; Accepted 3 January 2011; Available online 12 January 2011

Abstract

Discretization of continuous attributes is a necessary pre-requisite in deriving association rules and discovery of knowledge from databases. The derived rules are simpler and intuitively more meaningful if only a small number of attributes are used, and each attribute is discretized into a few intervals. The present research paper explores the interrelation between discretization and reduction of attributes. A method has been developed that uses Rough Set Theory and notions of Statistics to merge the two tasks into a single seamless process named dynamic discreduction. The method is tested on benchmark data sets and the results are compared with those obtained by existing state-of-the-art techniques. A real-life data set on TRIP steel is also analysed using the proposed method.

Keywords: Rough Set; Discretization; Classification; Data mining; TRIP steel

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

With the explosion of digital data over the past few decades, it has become necessary to search for methods that attach semantics to this information. Deriving easily understandable rules and discovering meaningful knowledge in the data is a non-trivial task that faces bottlenecks in industry as well as in research.

There exist different methods for the analysis of data tables or information systems, using techniques ranging from Statistics to formal logic. Most of them are designed to work on data where attributes can have only a few possible values [1]. But the majority of practical data mining problems are characterised by continuous attribute values. It has been common practice to resolve the problem in two separate modules. The first step involves transforming the continuous data into a few discrete intervals, which is termed discretization. The discretized data is then analysed using the available techniques, loosely called machine learning algorithms.

There has been a considerable volume of research on discretization. A couple of review papers [2,3] summarize and categorize the major studies, whereas [4] lists 33 different discretization methods. But no single method has been reported to perform better than the others in all respects, nor is any one of them known to apply indiscriminately to any data whatsoever [5].

∗ Corresponding author. E-mail address: [email protected] (P. Dey).


Rough Set (RS) is known to be a powerful tool for deriving rules from data using a minimum number of attributes. The method involves finding a reduct containing a minimal subset of attributes which is just sufficient to classify all objects in the data. However, RS methods are typically devised to deal with discrete attribute values, which calls for methods that discretize the data either beforehand, or after the reduct has been found. RS has been successfully used to discretize continuous data through different innovative methods [4–6], often in combination with ant colony [7] or particle swarm optimization [8]. Different statistical techniques and measures have also been used to devise or improve a discretization procedure [9–11]. But the problem with such a modular methodology is that the attribute reduction and discretization processes are usually assumed to be independent of one another. This involves high computation cost or low prediction accuracy. If the data is first discretized, cost is escalated by discretizing the redundant attributes, particularly if the data is large. On the other hand, giving prime emphasis to the elimination of attributes may lead to loss of important information and over-fitting of the data. The rules thus derived will have fewer objects in their support and suffer from low prediction accuracy. Presence of noise in the data often increases the likelihood of this.

Some very recent works have tried to fuse the two tasks of discretization and feature selection [12,13]. Researchers have also proceeded along more generalized paths like symbolic value partitioning [14] or a tolerance Rough Set method based on a similarity measure [15] that performs the two tasks simultaneously. There have been studies [16] where attribute values are grouped differently for each rule. Evidently there is a need for further research in this area.


Table 1. An illustrative example: 10 samples from the Iris data set.

Objects   a1 Sepal length   a2 Sepal width   a3 Petal length   a4 Petal width   d Iris (Class)
u1        4.8               3.0              1.4               0.1              Setosa (C1)
u2        5.7               4.4              1.5               0.4              Setosa (C1)
u3        5.1               3.5              1.4               0.3              Setosa (C1)
u4        4.9               3.6              1.4               0.1              Setosa (C1)
u5        4.9               2.4              3.3               1.0              Versicolor (C2)
u6        5.9               3.2              4.8               1.8              Versicolor (C2)
u7        5.5               2.5              4.0               1.3              Versicolor (C2)
u8        6.7               2.5              5.8               1.8              Virginica (C3)
u9        6.2               2.8              4.8               1.8              Virginica (C3)
u10       6.2               3.4              5.4               2.3              Virginica (C3)

Conditional attributes A = {a1, a2, a3, a4}; decision attribute d = class of Iris.





In this paper the possible relations between discretization and reduction of attributes are explored. A method is proposed that uses RS and notions from Statistics to discretize, and at the same time distinguish, the important attributes in a data set based on random samples. Using samples instead of the whole data to discretize the variables reduces the computation time, specially for large data sets. The resulting rules become simpler, and the classification accuracy increases, particularly in the case of noisy data sets. The method is tested on some benchmark data, and comparison is made with the results obtained using state-of-the-art techniques. A real-life data set from the Materials Engineering field is also analysed using the proposed method, and the rules derived are assessed in view of knowledge discovery.

The remaining part of the paper is organized into the following five sections. Section 2 briefs the basic notions of Rough Set Theory and exemplifies the interrelation between discretization and reduction of attributes. The concept of discreduction is also introduced. In Section 3 the dependence of a discreduct on the choice of attributes and sample size is explored. Section 4 describes the method of finding a dynamic discreduct, whereas results of applying the method to different data sets are presented in Section 5. Finally, Section 6 concludes the article.

2. Rough Set Theory

Rough Set Theory (RST), introduced by Pawlak [17], is an extension of the classical Set Theory proposed by Georg Cantor [18]. It brings forth the concept of a boundary region which accommodates elements lying midway between belonging and not belonging to a set. The theory has shown immense usefulness in distilling out requisite information through a minimal set of attributes that classifies all possible objects in a data set.

An information system I essentially consists of a set (universe) of objects U. Each object has definite values for a set of conditional attributes A, depending on which the value of a decision attribute d is ascertained. Formally it is the ordered pair

I = (U, A ∪ {d})

In the following subsections the basic notions of RST are illustrated with the help of an example (Table 1) consisting of 10 samples from Fisher's Iris data set [19].

2.1. Classification

We can choose a non-empty subset of attributes S ⊆ A and proceed to examine the capability of S in classifying the objects in U. For example, if only the Sepal Length attribute is considered, i.e. S = {a1}, the objects u4 and u5 are indiscernible, and so are u9 and


u10. Whereas the latter pair of objects, by virtue of being in the same class, pose no problem in classification, the former pair belong to different classes of Iris but are indistinguishable from one another by Sepal Length alone. {a1} can thus definitely classify 3 of the 4 Iris Setosa flowers in the sample (u1, u2, u3), as also 2 of the 3 Versicolor, and 3 of the 3 Virginica flowers. This puts the classification accuracy of S at 8/10. The concepts may be formalized as follows.

Let Ci represent the set of objects in the ith class for i = 1, 2, . . ., l (l being the number of decision classes in I). Now U/d = {C1, C2, . . ., Cl} is the partition of U induced by the decision attribute d. The S-lower approximation of Ci is defined as the set of objects definitely classified as Ci using the information in S. It is denoted by

S̲(Ci) = {u | [u]_S ⊆ Ci},   i = 1, 2, . . ., l

where [u]_S is the set of objects indiscernible from u, given the information S (i.e. the objects for which the values of all the attributes in S are identical to those of u). The corresponding S-upper approximation is the set of objects possibly classified as Ci using S

S̄(Ci) = {u | [u]_S ∩ Ci ≠ ∅},   i = 1, 2, . . ., l

and the difference of the two sets gives the S-boundary region of Ci

BN_S(Ci) = S̄(Ci) − S̲(Ci),   i = 1, 2, . . ., l

The positive region of U/d with respect to S is the union of the lower approximations of all classes in U/d. It represents the set of objects that can be definitely classified into one of the decision classes C1, C2, . . ., Cl using the information in S. It is denoted by

POS_S(d) = ⋃_{Ci ∈ U/d} S̲(Ci)

whereas the classification accuracy of S with respect to d is the fraction of definitely classifiable objects, expressed as

γ(S, d) = |POS_S(d)| / |U|    (1)

Clearly, the positive region of I w.r.t. Sepal Length is {u1, u2, u3, u6, u7, u8, u9, u10}, denoting the set of objects about which a definite conclusion may be drawn (regarding the class of Iris) using only their Sepal Lengths. From Eq. (1), the classification accuracy of {a1} comes to 0.8. An information system is said to be consistent if the entire set of attributes A can classify all the objects in U, that is, if γ(A, d) = 1. The information system I in Table 1 is a consistent system.
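Both numbers are easy to verify mechanically. The following minimal Python sketch (our illustration, not the authors' code) computes equivalence classes, the positive region and γ for the Table 1 objects:

```python
from collections import defaultdict
from fractions import Fraction

# The 10 objects of Table 1: name -> (a1, a2, a3, a4, class)
TABLE1 = {
    "u1": (4.8, 3.0, 1.4, 0.1, "Setosa"),     "u2": (5.7, 4.4, 1.5, 0.4, "Setosa"),
    "u3": (5.1, 3.5, 1.4, 0.3, "Setosa"),     "u4": (4.9, 3.6, 1.4, 0.1, "Setosa"),
    "u5": (4.9, 2.4, 3.3, 1.0, "Versicolor"), "u6": (5.9, 3.2, 4.8, 1.8, "Versicolor"),
    "u7": (5.5, 2.5, 4.0, 1.3, "Versicolor"), "u8": (6.7, 2.5, 5.8, 1.8, "Virginica"),
    "u9": (6.2, 2.8, 4.8, 1.8, "Virginica"),  "u10": (6.2, 3.4, 5.4, 2.3, "Virginica"),
}

def equivalence_classes(S):
    """[u]_S: group the objects that are indiscernible on the attribute indices in S."""
    groups = defaultdict(set)
    for u, row in TABLE1.items():
        groups[tuple(row[k] for k in S)].add(u)
    return list(groups.values())

def positive_region(S):
    """POS_S(d): union of the S-lower approximations of all decision classes."""
    pos = set()
    for eq in equivalence_classes(S):
        if len({TABLE1[u][4] for u in eq}) == 1:   # [u]_S lies inside a single class
            pos |= eq
    return pos

def gamma(S):
    """Classification accuracy of Eq. (1)."""
    return Fraction(len(positive_region(S)), len(TABLE1))

print(sorted(positive_region([0])))   # u4 and u5 are excluded
print(gamma([0]))                     # 4/5, i.e. 0.8 for S = {a1}
print(gamma([0, 1, 2, 3]))            # 1: the system is consistent
```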


The significance of an attribute a ∈ S is defined by the relative decrease in classification accuracy on removing a from S. It is denoted by

σ_(S,d)(a) = (γ(S, d) − γ(S′, d)) / γ(S, d)    (2)

where S′ = S − {a}. As an example, S = {a1, a2} can classify all the 10 objects, while S′ = {a1} classifies only 8. Thus σ_(S,d)(a2) = 0.2.

2.2. Redundancy

If two attributes are sufficient to classify all the objects in a data set, the other attributes are considered redundant, and eventually eliminated to form a reduct. Formally, a reduct R is a minimal subset of attributes which has a classification accuracy of 1, minimal in the sense that every proper subset of R has a classification accuracy less than 1. Formally, R ⊆ A is a reduct if and only if both Eqs. (3) and (4) are satisfied

γ(R, d) = 1    (3)

and

γ(S, d) < 1   ∀ S ⊊ R    (4)

There may be more than one reduct in any information system, and a reduct with the lowest cardinality is called a minimal reduct. Thus, R1 = {a1, a2} is a minimal reduct for Table 1.

A set of attributes satisfying Eqs. (3) and (4) is called an ‘exact’educt. A known problem in exact reducts is that they are highlyusceptible to noise, and often perform badly in classifying exter-al data [20]. Dynamic reducts, on the other hand, constitute thosettributes that appear ‘most frequently’ in the reducts of a series ofamples taken from the data. Dynamic reduct and the general rolef samples has been dealt in greater detail in Section 2.6.

2.3. Discretization and association rules

Data, in most practical cases, is presented as real-valued (continuous) attributes, which per se generate rules with very few objects in their support. Such rules are inefficient in classifying external data. Take, for example, the rule

if Sepal Length is 6.2, then the Iris is Virginica

with a set of two objects {u9, u10} in its support. However, the rule could be further generalized to the form

if Sepal Length ≥ 6, then Virginica    (R1)

which has a three-object support {u8, u9, u10} in its favour. The value 6, in that case, is called a cut on the attribute Sepal Length. It is denoted by the ordered pair (a1, 6).

Fig. 1. (a–c) Discretization of the Iris sample (Table 1) with different sets of attributes: (a) Sepal Length; (b) Sepal Length and Sepal Width; (c) Sepal Length and Petal Length.

At least three more cuts (shown by the dotted lines in Fig. 1(a)) are required to classify the remaining objects. This results in four more rules

if 5.8 ≤ Sepal Length < 6.0, then Versicolor    (R2)

if 5.6 ≤ Sepal Length < 5.8, then Setosa    (R3)

if 5.3 ≤ Sepal Length < 5.6, then Versicolor    (R4)

if Sepal Length < 5.3, then Setosa    (R5)

The first three of these, (R2), (R3) and (R4), are supported by the singletons {u6}, {u2} and {u7}, respectively. (R5), however, has a triplet {u1, u3, u4} in its support, as well as one misclassified object (u5) that diminishes its accuracy to 3/4.

To formalize the concepts: discretization is the division of the range of an attribute's values into a few intervals, and the values at the interval boundaries are called cuts. A set of cuts spanning one or more attributes is said to be consistent if all the objects in the information system can be properly classified using the intervals formed by the cuts. For example, the set of cuts {(a1, 5.3), (a1, 5.6), (a1, 5.8), (a1, 6.0)} is inconsistent, since u5 ∈ C2 is indiscernible from u1, u3, u4 ∈ C1.
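Consistency can be checked directly from this definition: discretize every object by the cuts and verify that no two objects with identical discretized values belong to different classes. A minimal sketch along these lines (an illustration only, using the Table 1 objects):

```python
from bisect import bisect_right
from collections import defaultdict

# Objects of Table 1 as (a1, a2, a3, a4, class)
SAMPLE = [
    (4.8, 3.0, 1.4, 0.1, "Setosa"),     (5.7, 4.4, 1.5, 0.4, "Setosa"),
    (5.1, 3.5, 1.4, 0.3, "Setosa"),     (4.9, 3.6, 1.4, 0.1, "Setosa"),
    (4.9, 2.4, 3.3, 1.0, "Versicolor"), (5.9, 3.2, 4.8, 1.8, "Versicolor"),
    (5.5, 2.5, 4.0, 1.3, "Versicolor"), (6.7, 2.5, 5.8, 1.8, "Virginica"),
    (6.2, 2.8, 4.8, 1.8, "Virginica"),  (6.2, 3.4, 5.4, 2.3, "Virginica"),
]

def is_consistent(cuts):
    """cuts: dict {attribute index: sorted list of cut values}."""
    seen = defaultdict(set)
    for row in SAMPLE:
        # interval index of each attribute value with respect to its cuts
        key = tuple(bisect_right(cuts.get(k, []), row[k]) for k in range(4))
        seen[key].add(row[4])
    return all(len(classes) == 1 for classes in seen.values())

print(is_consistent({0: [5.3, 5.6, 5.8, 6.0]}))    # False: u5 clashes with u1, u3, u4
print(is_consistent({0: [5.8, 6.0], 1: [2.75]}))   # True: the consistent set of Section 2.4
print(is_consistent({0: [6.0], 2: [2.0]}))         # True: the two-cut set of Fig. 1(c)
```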

For any rule "if φ then ψ," the accuracy and coverage [21] may be expressed as

Accuracy = |‖φ‖_I ∩ ‖ψ‖_I| / |‖φ‖_I|    (5)

and

Coverage = |‖φ‖_I ∩ ‖ψ‖_I| / |‖ψ‖_I|    (6)

where ‖φ‖_I and ‖ψ‖_I are the sets of objects in I matching the antecedent (φ) and consequent (ψ) of the rule, and |·| is the cardinality of a set. The set of objects ‖φ‖_I ∩ ‖ψ‖_I that match both the antecedent and the consequent is the support of the rule.
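As a small illustration of Eqs. (5) and (6) (ours, not the paper's code), the sketch below evaluates rules (R1) and (R5) on the Table 1 objects:

```python
from fractions import Fraction

# Table 1 objects: name -> (sepal length, sepal width, petal length, petal width, class)
SAMPLE = {
    "u1": (4.8, 3.0, 1.4, 0.1, "Setosa"),     "u2": (5.7, 4.4, 1.5, 0.4, "Setosa"),
    "u3": (5.1, 3.5, 1.4, 0.3, "Setosa"),     "u4": (4.9, 3.6, 1.4, 0.1, "Setosa"),
    "u5": (4.9, 2.4, 3.3, 1.0, "Versicolor"), "u6": (5.9, 3.2, 4.8, 1.8, "Versicolor"),
    "u7": (5.5, 2.5, 4.0, 1.3, "Versicolor"), "u8": (6.7, 2.5, 5.8, 1.8, "Virginica"),
    "u9": (6.2, 2.8, 4.8, 1.8, "Virginica"),  "u10": (6.2, 3.4, 5.4, 2.3, "Virginica"),
}

def evaluate(antecedent, consequent):
    """Accuracy, coverage and support of the rule 'if antecedent then consequent'."""
    phi = {u for u, row in SAMPLE.items() if antecedent(row)}   # ||phi||_I
    psi = {u for u, row in SAMPLE.items() if consequent(row)}   # ||psi||_I
    support = phi & psi
    return Fraction(len(support), len(phi)), Fraction(len(support), len(psi)), support

# (R1): if Sepal Length >= 6 then Virginica  -> accuracy 1, support {u8, u9, u10}
print(evaluate(lambda r: r[0] >= 6, lambda r: r[4] == "Virginica"))

# (R5): if Sepal Length < 5.3 then Setosa    -> accuracy 3/4, support {u1, u3, u4}
print(evaluate(lambda r: r[0] < 5.3, lambda r: r[4] == "Setosa"))
```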

2.4. Linguistics and semantics

An improvement in the rule set (R) (i.e. (R1)–(R5)) is evident if a second attribute is taken into account. Including Sepal Width along with Sepal Length reduces the number of necessary cuts to three (denoted by the 3 dotted lines in Fig. 1(b)) and all the 10 objects are correctly classified by them. Thus {(a1, 5.8), (a1, 6), (a2, 2.75)} is a consistent set of cuts.

The intervals formed by the cuts are often assigned linguistic labels like short, moderate and long, or wide and narrow, to represent the rules in a more familiar form. As such, the process of discretization not only increases the efficacy of classification, the


resulting rules are also simplified, reduced in number, and rendered meaningful from the intuitive aspect. The four rules thus formed are

short, wide Sepals are Iris Setosa    (S1)

short, narrow Sepals are Iris Versicolor    (S2)

moderately long Sepals are Iris Versicolor    (S3)

long Sepals are Iris Virginica    (S4)

where the intervals for Sepal Width are denoted as follows:

a Sepal is narrow if Sepal Width < 2.75 cm, and wide otherwise;

and as regards the Sepal Length, the intervals are:

a Sepal is short if Sepal Length < 5.8 cm, long if Sepal Length ≥ 6 cm, and moderately long otherwise.

The rule set (S) (i.e. (S1)–(S4)) is evidently better than (R), both in terms of classification accuracy and rule semantics. Only the moderately long interval is ironically narrow. To take a last example, we consider another pair of attributes, Sepal Length and Petal Length. Together they excellently classify all the 10 objects with just two cuts – one on each attribute: {(a1, 6), (a3, 2)}. The set of rules then becomes extremely simple

short Sepals and short Petals are Iris Setosa    (P1)

short Sepals and long Petals are Iris Versicolor    (P2)

long Sepals and long Petals are Iris Virginica    (P3)

where the thresholds distinguishing short and long are 6 cm for the Sepals and 2 cm for the Petals, corresponding to the two dotted lines in Fig. 1(c).

Evidently the set of rules (P1)–(P3) (or (P)) appears much simpler and is intuitively easier to understand than the rule set (S). On pondering over (P), one may even raise a 'childish' question: why are there no Iris flowers with long sepals and short petals? To a botanist it might lead to some specialized knowledge discovery, but we may rather content ourselves with the simple answer that nature has a certain sense of proportion, which she abhors to disrupt.

2.5. Unification

To sum up, the clarity and classification accuracy of a set of rules are governed by two major factors:

(i) choice of attributes (resulting in the reduct), and
(ii) selection of cuts (forming the discretization).

In order to obtain an optimum balance between them, the interrelation between the two needs to be taken into account.

A reduct is the result of eliminating redundant attributes, while redundant cuts are eliminated in discretization. The heuristic for finding reducts (or discretizations) essentially consists of selecting a minimal set of attributes (or cuts) [20]. Whereas a few finely discretized attributes may satisfy Eqs. (3) and (4), the discretization could be coarse if more attributes are considered. In other words, the problems of discretization and finding a reduct are

• self-similar across scales, and
• complementary and inter-dependent.

Considered in unison, the interdependence of the two factors (i) and (ii) very much determines the clarity of the knowledge extracted from the data (expressed by the comprehensibility of the rules)


and to some extent influences the classification accuracy. In order to arrive at an optimally discretized set of attributes, discretization and finding reducts need to be merged into a single seamless process. With this unification in view, the following terms may be introduced.

Definition 0. Discreduction is essentially a process that successfully blends the two related processes of discretization and finding a reduct for an information system.

Definition 1. A discreduct is a set of ordered pairs that, at the same time, discretizes and determines a reduct of an information system. It may be expressed as

D = {(a_k, t^k_i) | a_k ∈ A and t^k_i ∈ V_{a_k}},   for k ∈ {1, 2, . . ., n_a}, i = 1, 2, . . ., m_k

where n_a is the no. of attributes in A, m_k is the no. of cuts on a_k, and V_{a_k} = [l_{a_k}, r_{a_k}) ⊂ ℝ is the range of values of a_k (ℝ being the set of real numbers).

Without loss of generality, it may be assumed that the m_k cuts on any attribute a_k are arranged in ascending order of their values, i.e.

i < j ⇔ t^k_i < t^k_j,   i, j ∈ {1, 2, . . ., m_k}

Definition 2. The reduct of a discreduct is the set of attributes which have at least one cut in the discreduct. In other words it is the domain of the discreduct

R = dom D = {a | (a, t) ∈ D for some t}.

Definition 3. The discreduced information system I^D may be defined as the ordered pair consisting of the same set of objects with a reduced set of attributes R ⊆ A and discrete attribute values

I^D = (U, R ∪ {d})

where the attributes in the reduct are transformed as

a^D_k(u) = i ⇔ a_k(u) ∈ [t^k_i, t^k_{i+1}),   i = 0, 1, 2, . . ., m_k

with t^k_0 = l_{a_k} and t^k_{m_k+1} = r_{a_k}. The transformed attribute values may be suitably mapped to a set of linguistic labels as discussed in Section 2.4. The set of discreduced attributes in the reduct may also be denoted by R^D.
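The transform of Definition 3 simply maps a continuous value to the index of the interval it falls into, which a binary search over the sorted cuts gives directly. A minimal sketch (an illustration, using the cut values of Section 2.4):

```python
from bisect import bisect_right

def discreduce(value, cuts):
    """a_k^D(u) = i  <=>  a_k(u) lies in [t_i^k, t_{i+1}^k); cuts must be sorted ascending."""
    return bisect_right(cuts, value)

# the consistent cuts {(a1, 5.8), (a1, 6), (a2, 2.75)} of Section 2.4
print(discreduce(5.1, [5.8, 6.0]))   # 0 -> 'short'
print(discreduce(5.9, [5.8, 6.0]))   # 1 -> 'moderately long'
print(discreduce(6.2, [5.8, 6.0]))   # 2 -> 'long'
print(discreduce(2.4, [2.75]))       # 0 -> 'narrow'
```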

Definition 4. The classification accuracy of a discreduct is the ratio of the number of objects classified by the discreduct to the number of objects classified by all the attributes prior to discreduction:

γ(D) = |POS_{R^D}(d)| / |POS_A(d)|

2.6. Role of samples

A consistent set of cuts suffers from two major drawbacks: one quantitative and the other qualitative. First, the amount of computation drastically increases with the size of the data. Secondly, the consistent set of cuts that exactly classifies every object in a given data set (particularly with the minimal set of attributes) suffers from some kind of over-fitting, which decreases its accuracy in predicting unknown external data. The first makes the process time-consuming and costly—specially in case of large data sets with many distinct values of each attribute—while the second renders the final classification or decision rules clumsy, incomprehensible, inaccurate or even incorrect—particularly when the data is imprecise or noisy.


Table 2. Data set properties.

Name of the data set   # of attrib.'s |A|   # of objects |U|   # of classes |U/d|   # of minimal reducts n   |·| of minimal reducts |R|   Avg. # of cuts nc
Iris                   4                    150                3                    4                        3                            10.0
Glass                  9                    214                6                    10                       2                            45.5
Breast                 9                    683                2                    8                        4                            16.7

To resolve the problem, a series of data samples are taken and reducts are found for these samples. The attributes that are present in 'most' of the sample reducts are collected to form a dynamic reduct, which has been shown [20] to improve classification accuracy. But there are some open problems, such as quantification of 'most,' or determination of the size and number of samples, so as to ensure that the major trends in the data are sufficiently reflected in the rules, and at the same time noise is satisfactorily filtered out. These are quantities that should be carefully calibrated in order to get optimum results. Unfortunately, there is no unanimity in the standard literature for determining such thresholds.

In this paper we devise a method that uses samples taken from the data to find a dynamic discreduct, which determines

(i) the requisite attributes,
(ii) the number of cuts on each attribute, and
(iii) the value (or position) of the cuts.

3. Behaviour of dynamic discreducts

Two major arguments are made in this section:

• Cuts from more significant attributes are more effective in a discreduct; this means that if we go on choosing the most effective cut in each step (as the MD-heuristic does), more cuts will finally be selected from attributes with a higher significance.
• The number of cuts needed to consistently classify a sample of objects is proportional to the square root of the sample size.

The first claim has no direct bearing on the methodology for finding discreducts. It is a study which warrants that cuts in a discreduct are not chosen randomly from any attribute; rather, each cut chosen from an attribute (using the heuristic method) stresses and signifies the presence of that attribute in the reduct. In other words it asserts that a discreduct is also a reduct (see Definition 1). The second claim is a lemma that is directly applied in the dynamic discreduct algorithm.
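The MD-heuristic referred to here greedily picks, at each step, the candidate cut that discerns the largest number of object pairs belonging to different decision classes, until the chosen cuts classify the sample consistently [20]. The following sketch is a simplified rendering of that idea, not the implementation used in the paper; candidate cuts are taken as midpoints between consecutive attribute values, and ties are broken by order rather than by a toss:

```python
from bisect import bisect_right
from itertools import combinations

def md_style_cuts(objects):
    """objects: list of (attribute-value tuple, decision). Returns a list of (attribute, cut) pairs."""
    n_attr = len(objects[0][0])
    # candidate cuts: midpoints between consecutive distinct values of every attribute
    candidates = []
    for k in range(n_attr):
        vals = sorted({row[k] for row, _ in objects})
        candidates += [(k, (a + b) / 2) for a, b in zip(vals, vals[1:])]

    def conflicting_pairs(cuts):
        """Pairs of objects from different classes not yet separated by the chosen cuts."""
        per_attr = {}
        for k, _ in cuts:
            per_attr[k] = sorted(c for a, c in cuts if a == k)
        pairs = []
        for (r1, d1), (r2, d2) in combinations(objects, 2):
            if d1 != d2 and all(bisect_right(c, r1[k]) == bisect_right(c, r2[k])
                                for k, c in per_attr.items()):
                pairs.append((r1, r2))
        return pairs

    chosen, remaining = [], conflicting_pairs([])
    while remaining:
        unused = [c for c in candidates if c not in chosen]
        if not unused:
            break                                   # the data itself is inconsistent
        # greedy step: the cut that discerns the most remaining conflicting pairs
        best = max(unused, key=lambda c: len(remaining) - len(conflicting_pairs(chosen + [c])))
        if len(conflicting_pairs(chosen + [best])) == len(remaining):
            break                                   # no cut helps any further
        chosen.append(best)
        remaining = conflicting_pairs(chosen)
    return chosen

# tiny example: three objects, one per class
print(md_style_cuts([((4.8, 3.0), "Setosa"), ((5.9, 3.2), "Versicolor"), ((6.7, 2.5), "Virginica")]))
```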

Three commonly used data sets, namely Iris, Glass, and Breast [i.e. Wisconsin Breast Cancer (Original)], from the UCI Machine Learning Repository [19] have been used to study these behaviours.


The properties of the data sets are given in the first four columns of Table 2. Records with missing values have been removed from the last data set.

3.1. Significance vs. no. of cuts

For the first claim, all the minimal reducts (R1, R2, . . ., Rn) are computed for a data set, and each reduct is then discretized with the MD-heuristic algorithm [20] to get n discreducts (D1, D2, . . ., Dn). The number and cardinality of the minimal reducts for the three data sets are given in columns 5 and 6 of Table 2, while the last column gives the average number of cuts, nc, in the discreducts. Since a tie in choosing the most effective cut (the one that discerns the maximum number of object pairs from different decision classes) was resolved by a toss, 10 runs are taken for each reduct. Considering all the discretized minimal reducts, the average number of cuts contributed by an attribute, ν(a), is plotted against the significance of that attribute, σ(a), in Fig. 2:

ν(a) = (1/n) Σ_i (cuts contributed by a in D_i)

σ(a) = (1/n) Σ_i σ_(R_i,d)(a)

A weighted average ν′(a) of the number of cuts contributed by an attribute is also calculated. The logic of placing the weight is that if a set of objects can be classified by fewer cuts, each cut should be deemed more effective. Thus the weight carried by a cut varies inversely with the number of cuts in that discreduct, n̄c/|D_i| giving the relative efficiency of every cut in D_i:

ν′(a) = (n̄c/n) Σ_i (cuts contributed by a in D_i / total no. of cuts in D_i)

where n̄c = (1/n) Σ_i |D_i| is the average number of cuts in the n minimal reducts.

The linear fits in Fig. 2 show that the significance of an attribute is roughly proportional to the number of cuts contributed by that attribute. The correlation coefficient (r) is about 90% for all the data sets, with the weighted average giving a slightly better value.

Fig. 2. (a–c) Significance vs. no. of cuts on different data sets: (a) Iris; (b) Glass; (c) Breast.



3.2. No. of cuts vs. sample size

A sample of size nz is taken from the set of objects U using random numbers generated from the code. nz is varied as

nz = z · |U|,   z = 0.1, 0.2, . . ., 0.9

where, for each of the nine relative sample sizes (z), 100 samples are taken. The consistent set of cuts Ti is determined for the ith sample Ui using the MD-heuristic algorithm [20], and the average number of cuts (nc = Σ_{i=1}^{100} |Ti| / 100) needed to consistently classify Ui is plotted against z. The results in Fig. 3 show that the z–nc plots for the three data sets closely fit the power relation

nc = c · z^p    (7)

with an r² value of 99.6%. Here, c is a constant for any data set denoting the number of cuts needed to consistently classify the entire U (i.e. when z = 1), and the parameter p has a value very near 0.5 for the Glass and Breast data sets. The relation of nc with z thus corresponds to the well known expression for the standard deviation of a sample, σn ∝ √n, for fairly large sample sizes n ≫ 1.

In the case of Iris, the value of p is slightly higher (about 0.6). The deviation is caused by the two points in Fig. 3(a) corresponding to the smallest sample sizes (z = 0.1 and 0.2), which clearly fall out of the parabolic fit. The misfit of these two points is explicable because the number of cuts has fallen to an extremely low value (less than 3), in which condition relation (7) does not hold. We thus fit an alternative form of the power relation

nc = b + c · z^p    (8)

leaving out the two smallest sample sizes (z = 0.1 and 0.2). The closeness of fit (r²) then increases to more than 99.9%. The improvement is also graphically evident: the gray dotted curve representing Eq. (8) passes through all the seven points, whereas the (blue) firm line as per Eq. (7) seems a bit stiff in fitting the points. The value of p also touches the expected value 0.5. The small intercept on the z-axis (about 4.7) may be interpreted as the size of a sample for which just one cut (rounding zero off to the next highest integer) will be sufficient to discern all objects in the sample.

To sum up: firstly, an attribute in an information system contributes more cuts to a discreduct if its removal from a reduct greatly reduces the classification accuracy (i.e. it has a high significance as per Eq. (2)); and secondly, the number of cuts required to consistently classify a random sample of objects varies as the square root of the sample size for fairly sized discreducts (with more than 2 cuts).
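The exponent p of Eq. (7) can be estimated, for instance, by an ordinary least-squares fit on the log–log scale. The sketch below uses NumPy and invented (z, nc) points rather than the measured values behind Fig. 3:

```python
import numpy as np

# hypothetical averages of n_c at each relative sample size z
z  = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
nc = np.array([5.2, 7.4, 9.1, 10.4, 11.7, 12.8, 13.8, 14.7, 15.6])

# n_c = c * z**p  =>  log n_c = log c + p * log z
p, log_c = np.polyfit(np.log(z), np.log(nc), 1)
print(f"p = {p:.2f}, c = {np.exp(log_c):.1f}")   # p comes out near 0.5 for these points
```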

Fig. 3. (a–c) No. of cuts vs. sample size of different data sets: (a) Iris; (b) Glass; (c) Breast.

4. Methodology for finding dynamic discreduct

The algorithm for finding a dynamic discreduct D for a data set is given below. The number and size of the samples, ns and nz, should be tuned so that the major trends in the data are sufficiently represented, and at the same time any noise gets eliminated as far as possible. The optimum values of these two parameters are discussed at the end of Section 4.1.

ALGORITHM: Dynamic Discreduct (I, ns, nz)
 1  for i = 1 to ns do
 2      create Ui, a random sample of nz objects
 3      determine Ti, a minimal consistent set of cuts for Ui
 4      for k = 1 to na do
 5          Tik ← {t ∈ Ti | t is a cut on attribute ak}
 6      end for
 7  end for
 8  D ← ∅, R ← ∅
 9  for k = 1 to na do
10      compute mk = (1/ns) Σ_{i=1}^{ns} |Tik|; round off mk
11      if mk > 0 then
12          Fk ← the mk most frequent cuts in Tik, i = 1, 2, . . ., ns
13          D ← D ∪ ({ak} × Fk)
14          R ← R ∪ {ak}
15      end if
16  end for
17  return D, R

where I = (U, A ∪ {d}) is the information system, ns is the number of samples, nz is the size of each sample, and na = |A| is the number of attributes. The set Tik contains the cuts contributed by ak in discretizing the objects of the ith sample Ui. Over the ns successive samples, the cuts that are chosen with the highest frequency are selected in Fk. The cardinality of Fk is determined by averaging the cardinalities of Tik (over i = 1, 2, . . ., ns). The principle is one of proportional representation: those attributes which can classify more objects get a higher share of cuts.
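A compact Python rendering of the algorithm is sketched below as an illustration only. The per-sample cut finder used here simply places a cut at every class-boundary midpoint of each attribute, as a stand-in for the MD-heuristic actually used in the paper, and the function names are ours:

```python
import random
from collections import Counter

def boundary_cuts(sample):
    """Stand-in for the MD-heuristic: one cut at every class boundary of each attribute."""
    cuts = []
    n_attr = len(sample[0][0])
    for k in range(n_attr):
        ordered = sorted(sample, key=lambda o: o[0][k])
        for (r1, d1), (r2, d2) in zip(ordered, ordered[1:]):
            if d1 != d2 and r1[k] != r2[k]:
                cuts.append((k, (r1[k] + r2[k]) / 2))
    return cuts

def dynamic_discreduct(objects, ns, nz, find_cuts=boundary_cuts, seed=0):
    """objects: list of (attribute-value tuple, decision). Returns (D, R)."""
    rng = random.Random(seed)
    n_attr = len(objects[0][0])
    per_attr = [[] for _ in range(n_attr)]                   # T_ik for every sample i
    for _ in range(ns):
        sample = rng.sample(objects, nz)                     # U_i
        t_i = find_cuts(sample)                              # T_i
        for k in range(n_attr):
            per_attr[k].append([c for a, c in t_i if a == k])
    D, R = set(), set()
    for k in range(n_attr):
        m_k = round(sum(len(t) for t in per_attr[k]) / ns)   # average |T_ik|, rounded off
        if m_k > 0:
            freq = Counter(c for t in per_attr[k] for c in t)
            F_k = [cut for cut, _ in freq.most_common(m_k)]  # the m_k most frequent cuts
            D |= {(k, cut) for cut in F_k}
            R.add(k)
    return D, R
```

With nz set to roughly a quarter of |U| and ns around 12–15, as recommended in Section 4.1, the returned D is the dynamic discreduct and R = dom D is its reduct.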

4.1. Classification accuracy of dynamic discreduct

In this subsection, we explore the optimum size of a sample that could best predict the class of some unknown data in a particular domain. For this the data set is split into 5 equal subsets (S1, S2, . . ., S5) using random numbers generated from the code. One of them is chosen as the 'test set,' and the remaining are merged into a 'training set.' A dynamic discreduct is found for the training set using the algorithm in Section 4. Rules are derived from the discreduced training set, and these are used to predict the class of every object in the test set. If no rule is applicable to an object in the test set, it is predicted to fall in the widest class (i.e. the one with the maximum number of objects).



Table 3. Classification accuracies achieved by different methods.

Data      S-ID3    C4.5     MDLP     1R       RS-D     LEM2    DD
Iris      96.67    94.67    93.93    95.9     95.33    95.3    96.06
Glass     62.79    65.89    63.14    56.4     66.41    –       66.68
Breast    –        –        93.63    –        –        –       95.34
Average   79.73    80.28    83.57    76.15    80.87    95.3    –
DD avg.   81.37    81.37    86.03    81.37    81.37    96.06   –

Fig. 4. (a–c) Variation of classification accuracy with number of cuts for different data sets: (a) Iris; (b) Glass; (c) Breast.

The percentage of objects correctly classified is the classification accuracy for that test set:

γ_i = (no. of objects in Si correctly classified by the rules from U − Si) / (no. of objects in Si)

The classification accuracies are averaged over the five test sets, each time merging the remaining four subsets into the training set (the fivefold cross validation scheme [1]):

γ = (1/5) Σ_{i=1}^{5} γ_i    (9)
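A sketch of the fivefold scheme just described is given below; the rule-derivation and rule-application steps are abstracted behind a build_classifier callable, since the paper does not spell out its rule engine:

```python
import random

def five_fold_accuracy(objects, build_classifier, seed=0):
    """objects: list of (attribute tuple, decision).
    build_classifier(train) must return a function classify(x) -> decision or None."""
    rng = random.Random(seed)
    shuffled = objects[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]         # five roughly equal subsets S_1..S_5
    gammas = []
    for i in range(5):
        test = folds[i]
        train = [o for j, f in enumerate(folds) if j != i for o in f]
        classify = build_classifier(train)
        # the widest (majority) class of the training set is the fallback prediction
        decisions = [d for _, d in train]
        widest = max(set(decisions), key=decisions.count)
        hits = sum((classify(x) or widest) == d for x, d in test)
        gammas.append(hits / len(test))                # gamma_i of each fold
    return sum(gammas) / 5                             # Eq. (9)
```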

The classification accuracy (γ) for the three data sets is plotted against the number of cuts (nc) for different values of relative sample size (z) in Fig. 4. The results suggest that 20–30% of the data should be the optimum sample size. The optimum number of cuts can then be found from Eq. (7) according as

n_c^(optm) = c · √z^(optm)

which is nearly half the number of cuts required to consistently classify all the objects in U. But in practice c is quite difficult to determine; rather, the number of cuts needed to consistently classify a sample of size nz is much more readily available. So it is best to set the value of z to 0.25, and the average cardinality of Ti, doubled, would serve as a good estimate of the optimal number of cuts needed to predict the whole data. If it falls below 3, 1 should be added to it, and it is safe to round the value off to the next (higher) integer. The number of samples ns can be determined from the thumb rule that each object in the data should be represented 3 or 4 times. Thus, for z = 0.25, ns would be 12–15.

5. Results

5.1. Classification of UCI data sets

Using the algorithm described in the previous section, the best values of classification accuracy achieved by dynamic discreduction (DD) for the three data sets have been compared with other methods in Table 3. The results denoted as S-ID3 and C4.5 were taken from [20]. The MDLP algorithm was originally proposed by Fayyad and Irani [22], but the original paper bears no numerical results, so other sources [1,23] were resorted to. The results of 1R were taken from the original paper by Holte [24]. The results denoted as RS-D are obtained using classical Rough Set discretization methods [1,20], while LEM2 is a powerful Rough Set algorithm [25].

The results of each reference in Table 3 have been summarized by an average of the existing results, and a comparison has been made with the average of the present (DD avg.) results for the corresponding


data sets. It is clear that the method of dynamic discreduction proposed in the present paper outperforms all the prevailing methods.

5.1.1. Run time vs. sample size

Experiments were conducted with different sample sizes (z) varying from 10% to 80% of the training data. The number of samples (ns) was set so as to ensure that each datum in the training set is represented thrice on average in the samples, i.e. z · ns ≈ 3. The time taken for discreduction (i.e. selection of cuts) is presented in Table 4. The results suggest that for the larger data sets (excepting Iris) taking samples substantially reduces the computation time.

The run time (t) was also plotted against the relative sample size (z) in Fig. 5, and a power fit was tried on the data points for each of the three data sets. Now, the time required for selecting a cut varies as kz · nz, where the number of objects in a sample nz ∝ z, and the number of distinct values in a sample varies as kz ∝ √z. Thus the time taken to discretize a sample should vary as

t ∝ z^1.5    (10)

The exponents of z in Fig. 5 strongly suggest the expected value 1.5, with r² greater than 99%. The figure also suggests that a sample size of less than 40% would be computationally economic, taking the suggested number of samples.

Fig. 5. Run time vs. sample size of different data sets.


Table 4. Run-time for the full training set and for different sample sizes (z) and no. of samples (ns ≈ 3/z).

(z, ns)   (0.1, 30)   (0.2, 15)   (0.25, 12)   (0.3, 10)   (0.4, 8)   (0.5, 6)   (0.6, 5)   (0.8, 4)   (1, 1)
Iris      0.422       0.242       0.216        0.253       0.188      0.227      0.255      0.354      0.134
Glass     0.888       1.428       1.835        2.244       3.335      4.303      5.597      8.616      3.603
Breast    0.697       0.888       1.060        1.281       1.966      2.372      3.013      4.713      1.950


Table 5. Numerical range of the attributes in the TRIP steel data.

Attributes (units)                        Symbol   Min.     Max.
A. Conditional
(i) Composition
1. Carbon (wt%)                           C        0.12     0.29
2. Manganese (wt%)                        Mn       1.00     2.39
3. Silicon (wt%)                          Si       0.48     2.00
(ii) Processing
4. Cold deformation (%)                   d        56.25    77.14
5. Intercritical annealing temp. (°C)     Ta       750      860
6. Intercritical annealing time (s)       ta       51       1200
7. Bainitic transformation temp. (°C)     Tb       350      500
8. Bainitic transformation time (s)       tb       30       1200
B. Decision
Ultimate tensile strength (MPa)           UTS      580.72   887.44



5.2. Mining a real life data set

In this subsection the advantage of using the dynamic discreduct algorithm over the conventional process of finding the reduct and then discretizing the attributes in the reduct is examined. Both methods are applied to a data set relating the composition and process parameters to the mechanical properties of TRIP steel, and the obtained rules are compared as to how well they reveal the underlying TRIP phenomena.

Transformation induced plasticity (TRIP) steels, first reported by Zackay et al. [26], exhibit a superior combination of strength and ductility. This characteristic has made TRIP-aided steel a potential material for various automobile parts requiring high strength with adequate formability. As the phenomenon of TRIP is quite complicated, there is still sufficient scope for exploration.

Several attempts have been made to model the TRIP phenomenon from its physical understanding [27,28], but no suitable model exists till date to predict the mechanical properties of TRIP steel directly from the composition and processing parameters. This is mainly due to the lack of precise knowledge about the complex and non-linear role of the independent variables on the properties of the steel. Efforts have been made to develop data-driven models using tools like artificial neural networks and genetic algorithms to predict TRIP steel properties [29–31]. But these models have the inherent complexity and opacity of a black box.

In a recent work [32] the properties of TRIP steel have been investigated from the RS approach to deduce rules from the data using the minimal reduct discretization method described in Section 5.2.1. In Section 5.2.2 the dynamic discreduction algorithm is used to derive another set of rules from the same TRIP steel data. The two methods may be briefed as follows.

1. Determine a minimal reduct, then discretize every attribute therein, and derive a set of rules for the system.

2. Find a dynamic discreduct, and arrive at a set of rules.

And finally, the two sets of rules thus obtained are compared vis-à-vis the existing knowledge on TRIP steel.

The TRIP steel data set contains 90 objects, most of which were collected from the published literature of several workers [30,33–37]. The multiple sources are also an inherent source of noise in the data, which comes through different degrees of experimental error. This renders the application of the dynamic discreduction algorithm more relevant here. The range of values of the conditional (i.e. compositional and processing) attributes, as well as the decision attribute (UTS), are listed in Table 5. To start with, UTS is discretized into three equal-frequency classes, roughly allotting the same number of objects to each class. Steels below 730 MPa fall in the class of Low-strength


steels, those from 730 to 770 MPa in the Medium-strength class, and steels with UTS above 770 MPa fall in the High-strength class (see the last three columns of Table 6). This discretization of UTS has been used in both Sections 5.2.1 and 5.2.2.

5.2.1. Minimal reduct discretization

The results in this subsection were reported in a recent work by the present authors [32]. We re-present them here to draw a comparison with the results of dynamic discreduction presented in the next subsection.

The first task is to find a minimal reduct for the TRIP steel data set. Since the number of attributes is quite small, an extensive search was undertaken to see whether a subset of attributes can classify all the objects of the data consistently, starting from the 8 one-attribute subsets, next searching the 28 two-attribute subsets, and so on. At cardinality 4, just one reduct is found; this is of course the only minimal reduct. The 4 attributes in the reduct are then discretized with the MD-heuristic algorithm, yielding intervals to which certain names (or labels) are given as in Table 6. Rules are derived from the discretized minimal set of attributes. The number of rules with one to four terms in the antecedent came to nearly 200. From these, only a handful are selected that actually represent the general patterns in the data. This was done on the basis of two qualifying metrics (Eqs. (5) and (6)). The threshold values of accuracy and coverage for selecting the rules were set to 80% and 15%, respectively, on an ad-hoc basis, so as to limit the set of rules to a handful. The final set of rules is presented in Rule Set 1. The pair of values in square brackets indicates these two metrics, respectively, for each rule.

Rule Set 1. Rules obtained by minimal reduct discretization.

1. if Si = MH ∧ Ta = L ∧ Tb = ML then UTS = L [100, 21.4]
2. if Si = QH ∧ Tb = MH then UTS = M [90, 23.1]
3. if Si = QH ∧ tb = L then UTS = M [86, 15.4]
4. if Si = QH ∧ tb = ML then UTS = M [100, 17.9]
5. if Si = QH ∧ tb = MH then UTS = M [100, 17.9]
6. if Si = VL ∧ tb = MH then UTS = H [83, 21.7]
7. if Si = L ∧ Ta = H ∧ tb = MH then UTS = H [100, 21.7]
8. if Si = ML ∧ Ta = L ∧ Tb = ML then UTS = H [100, 21.7]

The absence of intercritical annealing time (ta) and cold deformation (d) in the minimal reduct seems to be reasonable, as these variables are known to make an insignificant contribution to the final microstructure and properties. On the other hand, it may be noted


Table 6. Intervals on discretizing the four attributes in the minimal reduct of the TRIP steel data.

Si (wt%)                 Ta (°C)               Tb (°C)               tb (s)                 UTS (MPa)
Min     Max     Label    Min    Max    Label   Min    Max    Label   Min    Max     Label   Min    Max    Label
0.48    0.73    VL       750    795    L       350    375    L       30     45      VL      580    730    L
0.73    0.985   QL       795    810    M       375    415    ML      45     150     L       730    770    M
0.985   1.09    L        810    860    H       415    440    M       150    230     ML      770    890    H
1.09    1.19    ML                             440    457    MH      230    280     M
1.19    1.22    MH                             457    500    H       280    450     MH
1.22    1.40    H                                                    450    950     H
1.40    1.46    QH                                                   950    1200    VH
1.46    2.00    VH

VL: very low; QL: quite low; L: low; ML: moderately low; M: medium; MH: moderately high; H: high; QH: quite high; VH: very high.


that the only compositional parameter Si (and the processing parameter bainitic transformation time, tb) is present in (almost) every rule. Interestingly, both Si and tb also have more cuts than the other attributes, which in some way verifies the first claim made at the beginning of Section 3.

The rules clearly show that a lesser amount of Si, and a moderately high bainitic transformation time (tb), is favoured for higher strength of the steel. This can be justified from the fact that TRIP steel with a lower amount of Si may contain carbides in the microstructure leading to high strength, whereas a somewhat higher transformation time favours a good amount of bainite, resulting in an increase in the strength level.

But the absence of C and Mn in the reduct cannot be justified, as it is known that these two elements play the most important role in the stability of retained austenite, and consequently in the occurrence of TRIP. Since the data was compiled from various sources (reporting experiments carried out in different situations), there is ample scope for noise in the data. This may have caused the over-fitting, and the under-representation of essential attributes in the reduct.

Rule Set 2. Rules obtained by dynamic discreduction.

1. If C = L ∧ Mn = L ∧ Si = M ∧ Ta = L ∧ Tb = L then UTS = L [86, 46]
2. If C = H ∧ Mn = L ∧ Si = H ∧ Ta = L then UTS = M [92, 33]
3. If Mn = H ∧ Tb = L ∧ tb = M then UTS = H [100, 35]
4. If Mn = H ∧ Ta = H ∧ tb = M then UTS = H [89, 39]
5. If C = H ∧ Si = H ∧ tb = M then UTS = H [87, 30]

5.2.2. Dynamic discreduction

Thirty samples were taken from the data set, each containing 30 objects. For each sample a consistent set of cuts was determined using the MD-heuristic algorithm, with all the 8 conditional attributes being allowed to contribute cuts. The 10 most frequently occurring cuts were chosen using a proportional representation of attributes as described in the algorithm in Section 4. The

Table 7. Intervals in the dynamic discreduct of the TRIP steel data.

C (wt%)                  Mn (wt%)               Si (wt%)
Min     Max     Label    Min    Max    Label    Min    Max    Label
0.12    0.145   L        1.0    1.48   L        0.48   0.73   L
0.145   0.29    H        1.48   2.15   M        0.73   1.4    M
                         2.15   2.39   H        1.4    2.0    H

Ta (°C)                  Tb (°C)                tb (s)
Min     Max     Label    Min    Max    Label    Min    Max    Label
750     810     L        350    415    L        30     230    L
810     860     H        415    457    M        230    450    M
                         457    500    H        450    1200   H

L: low; M: medium; H: high.


resulting discreduct with 10 cuts (compared to 19 from the consistent discretization in the previous section) spans six attributes (instead of four previously). The positions of the cuts and the labels assigned to the respective intervals are shown in Table 7. Five rules were obtained from the data set which cleared the 80% accuracy and 30% coverage levels. They are presented in Rule Set 2.

Two processing attributes (d and ta) failed to contribute any cut, and were regarded as redundant – similar to the previous method. On the other hand, the introduction of C and Mn is very significant from the metallurgical point of view, since, in the given range of values, they are known to be quite important parameters in deciding the strength of any steel. C and Mn are the most potent austenite stabilizers and also play a significant role in the hardenability of the retained austenite. Thus the TRIP phenomenon in steel is chiefly controlled by C and Mn. From this point of view the inclusion of these two attributes is a commendable achievement of the proposed algorithm.

The introduction of these two compositional attributes triggers another interesting series of events. The additional attributes help to reduce the cuts to a meagre 10 (compared to 19 in the previous set of rules). This confines the composition and processing attributes to two or three discrete classes, which can be described by simple qualifiers like 'Low', 'Medium' or 'High', dispensing with the use of finer intervals like 'Quite High' or 'Moderately Low.' This in turn keeps the coverage of the rules at a higher level, indicating that the most general patterns are represented in the rules. Thus the rules appear very comfortable from the aspect of understanding the TRIP phenomena, and of subsequent application of the inherited knowledge in further development of TRIP steel. This is evidently a marked improvement achieved by dynamic discreduction, as compared to the previous method.

5.2.3. Cuts in the dynamic discreducts

Finally we present two characteristics of the cuts in the 30 samples from which the dynamic discreduct was constructed.



Fig. 6. (a and b) Sample cuts in the TRIP steel data.

The number of cuts that were required to consistently classify a sample is plotted in a bar-chart in Fig. 6(a), while the shares of each of the eight attributes in the total number of cuts in the samples are plotted in another bar-chart shown in Fig. 6(b).

Fig. 6(a) is interesting in that it almost represents a normal distribution, except for a sharp rise at the value 11. The possible explanation for this is that a few objects contained special information that was not included in other objects. This could represent results in a region that was not covered by other experiments; otherwise it would denote noise in the data, i.e. experimental errors. However, the average value comes to 8.4, which rounds off to 9. Keeping a safe limit, we take 10 cuts as the cardinality of the dynamic discreduct.

The share of cuts in the sample discreducts shown in Fig. 6(b) clearly demarcates two attributes (d and ta) as redundant, each getting less than 5% of the cuts in the samples. Two other attributes, C and Ta, get around 10% of the sample cuts, while four others (viz. Mn, Si, Tb, and tb) receive 15–20% of the cuts each. In constructing the dynamic discreduct, C and Ta are thus given one cut each, while each of Mn, Si, Tb, and tb is allotted two cuts (see Table 7).

6. Conclusion

In the present paper the relation and interdependence between two vital tasks carried out through Rough Set Theory, namely reduction and discretization of attributes, have been investigated. The self-similarity and complementarity of the two processes have been utilized to devise a method that finds an optimally discretized set of attributes. The cuts that are most frequently required to classify all objects in a series of samples taken from the data are collected to form a dynamic discreduct. Those attributes where one or more cuts are placed form the reduct. The processes of discretization and finding reducts are thus merged into a single seamless process, which has been named dynamic discreduction.

The efficiency of the algorithm is dependent on two parameters, viz. the number of cuts and the size of the samples. To obtain optimum values of these two parameters, their effect on classification accuracy has been studied.

The method has been applied to some benchmark data sets, and the results clearly outperform all existing methods. A real life data set on TRIP steel has also been analysed, where the rules derived from the dynamic discreduct are found to be simpler, more general, and more appropriate from the metallurgical aspect than the rules derived from discretized minimal reducts.


Acknowledgements

The present research was conducted as part of a Fast Track Scheme for Young Scientists supported by the Department of Science and Technology, Government of India, vide Grant no. SR/FTP/ETA-02/2007. The financial support is duly acknowledged.

References

[1] H.S. Nguyen, S.H. Nguyen, Discretization Methods in Data Mining, vol. 1, Springer Physica-Verlag, 1998, pp. 451–482.
[2] H. Liu, F. Hussain, C.L. Tan, M. Dash, Discretization: an enabling technique, Data Mining and Knowledge Discovery 6 (2002) 393–423.
[3] R. Jin, Y. Breitbart, C. Muoh, Data discretization unification, in: Seventh IEEE International Conference on Data Mining, 2007, pp. 183–192, doi:10.1109/ICDM.2007.35.
[4] Y. Yang, Discretization for Naive-Bayes Learning, PhD thesis, School of Computer Science and Software Engineering, Monash University, 2003.
[5] P. Blajdo, Z.S. Hippe, T. Mroczek, J.W. Grzymala-Busse, M. Knap, L. Piatek, An extended comparison of six approaches to discretization – a Rough Set approach, Fundamenta Informaticae 94 (2009) 121–131.
[6] J. Zhao, Y. Zhou, New heuristic method for data discretization based on Rough Set Theory, The Journal of China Universities of Posts and Telecommunications 16 (2009) 113–120.
[7] Y. He, D. Chen, W. Zhao, Integrated method of compromise-based ant colony algorithm and Rough Set Theory and its application in toxicity mechanism classification, Chemometrics and Intelligent Laboratory Systems 92 (2008) 22–32.
[8] L. Xu, F. Zhang, X. Jin, Discretization algorithm for continuous attributes based on niche discrete particle swarm optimization, Journal of Data Acquisition and Processing 23 (2008) 584–588 (in Chinese: Shuju Caiji Yu Chuli).
[9] M. Boulle, Khiops: a statistical discretization method of continuous attributes, Machine Learning 55 (2004) 53–69.
[10] G. Li, H. Sun, H. Li, X. Jiang, Discretization of continuous attributes based on statistical information, Journal of Computational Information Systems 4 (2008) 1069–1076.
[11] T. Qureshi, D.A. Zighed, Using resampling techniques for better quality discretization, in: 6th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2009, Leipzig, 2009, pp. 1515–1520.
[12] J. Senthilkumar, D. Manjula, R. Krishnamoorthy, NANO: a new supervised algorithm for feature selection with discretization, in: IEEE International Advance Computing Conference, IACC 2009, 2009, pp. 1515–1520.
[13] L. Tinghui, S. Liang, J. Qingshan, W. Beizhan, Reduction and dynamic discretization of multi-attribute based on Rough Set, in: World Congress on Software Engineering, WCSE 2009, Xiamen, 2009.
[14] F. Min, Q. Liu, C. Fang, Rough Sets approach to symbolic value partition, International Journal of Approximate Reasoning 49 (2008) 689–700.
[15] Y.-Y. Guan, H.-K. Wang, Y. Wang, F. Yang, Attribute reduction and optimal decision rules acquisition for continuous valued information systems, Information Sciences 179 (2009) 2974–2984.
[16] J. Mata, J.-L. Alvarez, J.-C. Riquelme, Discovering numeric association rules via evolutionary algorithm, in: 6th Conference on Knowledge Discovery and Data Mining, 2002, pp. 40–51.
[17] Z. Pawlak, Rough Sets, International Journal of Computer & Information Sciences 11 (1982) 341–356.
[18] G. Cantor, Contributions to the Founding of the Theory of Transfinite Numbers, Dover Publications, 1915.


[19] A. Frank, A. Asuncion, UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences, 2010. URL: http://archive.ics.uci.edu/ml.
[20] J. Komorowski, Z. Pawlak, L. Polkowski, A. Skowron, Rough Sets: A Tutorial, 2002. URL: alfa.mimuw.edu.pl/prace/1999/D5/Tutor06 09.ps.
[21] I. Düntsch, G. Gediga, Rough Set Data Analysis: A Road to Non-invasive Knowledge Discovery, Methodos, 2000.
[22] U.M. Fayyad, K.B. Irani, On the handling of continuous-valued attributes in decision tree generation, Machine Learning 8 (1992) 87–102.
[23] A. An, N. Cercone, Discretization of Continuous Attributes for Learning Classification Rules, LNAI 1574, Springer-Verlag, Berlin/Heidelberg, 1999, pp. 509–514.
[24] R.C. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning 11 (1993) 63–91.
[25] J.W. Grzymala-Busse, LERS – a system for learning from examples based on Rough Sets, in: Intelligent Decision Support – A Handbook of Applications and Advances in the Rough Set Theory, Kluwer Academic Publishers, 1992, pp. 3–18.
[26] V.F. Zackay, E.R. Parker, D. Fahr, R. Bush, The enhancement of ductility in high strength steels, Transactions of the American Society of Metals 60 (1967) 252–259.
[27] J. Bouquerel, K. Verbeken, B.C. De Cooman, Microstructure-based model for the static mechanical behaviour of multiphase steels, Acta Metallurgica Materiala 54 (2006) 1443–1456.
[28] H.N. Han, C.G. Lee, C.-S. Oh, T.-H. Lee, S.-J. Kim, A model for deformation behavior and mechanically induced martensitic transformation of metastable austenitic steel, Acta Metallurgica Materiala 52 (2004) 5203–5214.
[29] S.M.K. Hosseini, A. Zarei-Hanzaki, M.J.Y. Panah, S. Yue, ANN model for prediction of the effects of composition and process parameters on tensile strength and percent elongation of Si–Mn TRIP steels, Materials Science and Engineering A 374 (2004) 122–128.

[30] M. Mukherjee, S.B. Singh, O.N. Mohanty, Neural network analysis of strain induced transformation behaviour of retained austenite in TRIP-aided steels, Materials Science and Engineering A 434 (2006) 237–245.
[31] S. Datta, F. Pettersson, S. Ganguly, H. Saxén, N. Chakraborti, Identification of factors governing mechanical properties of TRIP-aided steel using genetic algorithms and neural networks, Materials and Manufacturing Processes 23 (2008) 130–137.
[32] S. Dey, P. Dey, S. Datta, J. Sil, Rough Set approach to predict the strength and ductility of TRIP steel, Materials and Manufacturing Processes 24 (2009) 150–154.
[33] H.C. Chen, H. Era, M. Shimizu, Effect of phosphorus on the formation of retained austenite and mechanical properties in Si low-carbon steel sheet, Metallurgical Transactions A 20 (1989) 437–445.
[34] Y. Sakuma, O. Matsumura, O. Akisue, Influence of C content and annealing temperature on microstructure and mechanical properties of 400 °C transformed steel containing retained austenite, ISIJ International 31 (1991) 1348–1353.
[35] M.D. Meyer, D. Vanderschueren, B.D. Cooman, The influence of the substitution of Si by Al on the properties of cold rolled C–Mn–Si TRIP steels, ISIJ International 39 (1999) 813–822.
[36] S. Papaefthymiou, W. Bleck, S. Kruijver, J. Sietsma, L. Zhao, S. van der Zwaag, Influence of intercritical deformation on microstructure of TRIP steels containing Al, Materials Science and Technology 20 (2004) 201–206.
[37] N.R. Bandyopadhyay, S. Datta, Effect of manganese partitioning on transformation induced plasticity characteristics in microalloyed dual phase steels, ISIJ International 44 (2004) 927–934.