Uncertainty Measurement Based on In-sim-dominance Relation

Proceedings of the Second International Conference on Artificial Intelligence and Pattern Recognition, Shenzhen, China, 2015. ISBN: 978-1-941968-09-3 ©2015 SDIWC


Liulin Zhou (a), Guoyin Wang (a,b,*), Taihua Xu (b)

(a) Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, PR China.

(b) School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, PR China.

*Corresponding author.

Email addresses: [email protected] (Liulin Zhou), [email protected] (Guoyin Wang), [email protected] (Taihua Xu).

ABSTRACT:

The in-sim-dominance relation was proposed to deal with hybrid information systems, in which objects are described by a finite set of qualitative and quantitative attributes. Accuracy and roughness are the two main tools for uncertainty measurement in Pawlak rough set theory. However, there are few studies on uncertainty measurement based on the in-sim-dominance relation. In this paper, the traditional accuracy and roughness measurements are extended to hybrid information systems, and approximation accuracy and approximation roughness based on the in-sim-dominance relation are also defined. In particular, a concept called hybrid entropy is introduced to measure the uncertainty of a hybrid information system. Entropy-based roughness and entropy-based approximation roughness of hybrid information systems are then proposed. Experiments are conducted on standard UCI data sets to test the proposed methods, and the results demonstrate that the entropy-based approximation roughness is effective and suitable for measuring the uncertainty of hybrid information systems.

Keywords: rough set, in-sim-dominance relation, uncertainty measurement, hybrid information system

1. INTRODUCTION

As a useful mathematical tool for dealing with uncertain and ambiguous information, rough set theory (RST) [1-2], proposed by Pawlak, has been studied by many scholars and applied successfully in many research areas, such as data mining [3], pattern recognition [4], decision-making analysis [5], artificial intelligence [6-7], knowledge discovery [8], machine learning [9], and intelligent data analysis [10]. The main idea of RST is to build a knowledge base from all known knowledge about a given data space and then to classify the knowledge base by an indiscernibility relation; in effect, classifying the knowledge base amounts to classifying the given data space. In this way, uncertain knowledge can be described approximately by the known knowledge in the knowledge base. Compared with other data processing methods, RST is more objective because it does not need prior knowledge.

It is well known that the indiscernibility relation on the universe plays a crucial role in Pawlak RST. For many practical problems, however, the binary relations on the universe are not equivalence relations, which limits the application of Pawlak RST. Therefore, many scholars have worked on extending Pawlak RST: the indiscernibility relation is generalized to obtain RST based on generalized indiscernibility relations [11-20] for different information systems. In practice, there exist hybrid information systems, in which objects are described by several attributes whose values are of various types, such as nominal, integer, numerical, and interval values. In order to construct a comprehensive preference model, it is sometimes reasonable to consider both criteria and regular attributes. An and Tong [21] proposed the discernibility-similarity-dominance matrix and its functions to induce decision rules based on the in-sim-dominance relation.

Recently, many scholars have proposed different uncertainty measurements for different rough set models. Pawlak [2] proposed four numerical uncertainty measurements to evaluate the uncertainty of a rough set: accuracy and roughness for information tables, and approximation accuracy and approximation roughness for decision tables. Dai et al. proposed an uncertainty measurement based on the similarity degree for interval-valued information systems [22] and an approximation accuracy for incomplete information systems [23]. Beaubouef et al. [24] addressed the measurement of uncertainty in rough sets and rough relational databases by introducing a measurement based on information entropy. Yao et al. [25-27] studied attribute importance in rough sets by means of information entropy. Based on the intuitionistic knowledge-content nature of information gain, Liang [28] introduced the concepts of combination entropy and combination granulation in RST. However, there are few studies on uncertainty measurements for hybrid information systems. In this paper, we address uncertainty measurement for hybrid information systems based on the in-sim-dominance relation. We investigate the properties of the in-sim-dominance relation and propose approximation accuracy and approximation roughness based on it. Moreover, the concepts of entropy-based roughness and entropy-based approximation roughness are presented. Experimental results show that the proposed uncertainty measurements are effective for evaluating the uncertainty of hybrid information systems based on the in-sim-dominance relation.

The rest of this paper is organized as follows. Some preliminary notions of RST are briefly reviewed in Section 2. In Section 3, the in-sim-dominance relation and its rough approximations are introduced, several knowledge uncertainty measurements of hybrid information systems based on the in-sim-dominance relation are defined, and some of their important properties are discussed. Numerical experiments that evaluate the effectiveness of the proposed uncertainty measurements are reported in Section 4. Section 5 concludes the paper.

2. PRELIMINARY

In this section, we will review some basic

concepts in RST, including information system,

indiscernibility relation, rough approximations

and uncertainty measures.

2.1 Indiscernibility Relation and Rough Approximations

An information system is a quadruple $IS = \{U, C, V, f\}$, where $U$ is a non-empty finite set of objects called the universe, $C$ is a non-empty finite set of attributes, and $V$ is the union of the attribute domains, $V = \bigcup_{a \in C} V_a$, where $V_a$ denotes the value domain of attribute $a$. Each attribute $a \in C$ determines an information function $f_a : U \to V_a$; that is, $f(x, a) \in V_a$ denotes the value of attribute $a$ for object $x$. A decision system is defined as $DS = \langle U, C \cup \{d\}, V, f \rangle$, where $C$ is the set of condition attributes and $d$ is a decision attribute.

An attribute subset $P \subseteq C$ determines an indiscernibility relation, denoted by $IND(P)$:

$$IND(P) = \{(x, y) \in U \times U \mid \forall a \in P,\ f(x, a) = f(y, a)\}.$$

The relation $IND(P)$ induces a partition of $U$, denoted by $U/IND(P)$ or $U/P$; the notation $[x]_P$ denotes the indiscernibility class of $P$ containing $x$.

For any given information system $IS = \{U, C, V, f\}$, $P \subseteq C$, and $X \subseteq U$, one can define the lower and upper approximations of $X$:

$$P_*(X) = \{x \in U \mid [x]_P \subseteq X\} \tag{2.1}$$

$$P^*(X) = \{x \in U \mid [x]_P \cap X \neq \emptyset\} \tag{2.2}$$

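The lower and upper approximations (2.1) and (2.2) can be sketched in a few lines of Python. This is only an illustration: the toy table, its attribute names, and the helper names `indiscernibility_classes` and `lower_upper` are assumptions introduced here and do not come from the paper.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Map each object x to its class [x]_P (objects equal to x on every attribute in attrs)."""
    groups = defaultdict(set)
    for x, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(x)
    return {x: groups[tuple(row[a] for a in attrs)] for x, row in table.items()}


def lower_upper(table, attrs, target):
    """Return (P_*(X), P^*(X)) following (2.1) and (2.2)."""
    cls = indiscernibility_classes(table, attrs)
    lower = {x for x in table if cls[x] <= target}  # [x]_P is a subset of X
    upper = {x for x in table if cls[x] & target}   # [x]_P intersects X
    return lower, upper


# Hypothetical toy information table (not taken from the paper).
toy = {
    "x1": {"color": "red", "size": "big"},
    "x2": {"color": "red", "size": "big"},
    "x3": {"color": "blue", "size": "big"},
    "x4": {"color": "blue", "size": "small"},
}
X = {"x1", "x3"}
lower, upper = lower_upper(toy, ["color", "size"], X)
print(sorted(lower), sorted(upper))
```

Here x1 and x2 are indiscernible, so x1 cannot be placed in the lower approximation of X even though it belongs to X, while both x1 and x2 fall into the upper approximation.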

2.2 Uncertainty Measurements in RST

The uncertainty of a rough set is modeled from its approximation regions. Pawlak [2] proposed two numerical measurements for evaluating the uncertainty of an information system or a decision system in rough set theory: accuracy and roughness. Accuracy is defined as the ratio of the cardinalities of the lower and upper approximations of $X$, and roughness is obtained by subtracting the accuracy from one. Let $IS = \{U, C, V, f\}$ be an information system. For a subset $X \subseteq U$ and an attribute subset $P \subseteq C$, the accuracy and roughness of $X$ with respect to $P$ are defined as:

$$\alpha_P(X) = \frac{|P_*(X)|}{|P^*(X)|}, \qquad \beta_P(X) = 1 - \alpha_P(X) \tag{2.3}$$
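As a small illustration of (2.3), the sketch below computes accuracy and roughness for a subset of a toy table. The data and the function names `classes` and `accuracy_and_roughness` are hypothetical, and the sketch assumes that the upper approximation is non-empty (that is, X is non-empty).

```python
def classes(table, attrs):
    """Indiscernibility class of every object with respect to attrs."""
    def key(x):
        return tuple(table[x][a] for a in attrs)
    return {x: {y for y in table if key(y) == key(x)} for x in table}


def accuracy_and_roughness(table, attrs, target):
    """Return (alpha_P(X), beta_P(X)) as in (2.3); assumes a non-empty upper approximation."""
    cls = classes(table, attrs)
    lower_size = sum(1 for x in table if cls[x] <= target)
    upper_size = sum(1 for x in table if cls[x] & target)
    alpha = lower_size / upper_size
    return alpha, 1 - alpha


# Hypothetical toy information table (not taken from the paper).
toy = {
    "x1": {"color": "red", "size": "big"},
    "x2": {"color": "red", "size": "big"},
    "x3": {"color": "blue", "size": "big"},
    "x4": {"color": "blue", "size": "small"},
}
alpha, beta = accuracy_and_roughness(toy, ["color", "size"], {"x1", "x3"})
print(alpha, beta)
```

For this table the lower approximation has one object and the upper approximation has three, so the accuracy is 1/3 and the roughness is 2/3.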

However, accuracy and roughness do not take the decision attribute into account, so they are not suitable for decision systems. Therefore, approximation accuracy and approximation roughness were proposed by Pawlak [2] for decision systems.

Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision system, let $U/d = \{D_1, D_2, \ldots, D_k\}$ be the indiscernibility classes induced by the decision attribute $d$ on $U$, and let $P \subseteq C$ be a condition attribute subset. The approximation accuracy and approximation roughness of $U/d$ by $P$ are defined as:

$$\alpha_P(U/d) = \frac{\sum_{D_i \in U/d} |P_*(D_i)|}{\sum_{D_i \in U/d} |P^*(D_i)|} \tag{2.4}$$

$$\beta_P(U/d) = 1 - \alpha_P(U/d) \tag{2.5}$$

3. UNCERTAINTY MEASUREMENT BASED ON IN-SIM-DOMINANCE RELATION

In this section, the in-sim-dominance relation and its rough approximations are introduced, and then several uncertainty measurements based on the in-sim-dominance relation are defined.

3.1 In-sim-dominance Relation and Rough Approximations

Because many real-world problems involve both qualitative and quantitative attributes, the information system can be described as follows, according to Greco et al. [20].

Let $IS = \{U, C, V, f\}$ be an information table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$; $C^{=}$ is a subset of nominal attributes, $C^{\succcurlyeq}$ is a subset of ordinal attributes, and $C^{\sim}$ is a subset of quantitative attributes, with $C^{=} \cap C^{\succcurlyeq} = \emptyset$, $C^{=} \cap C^{\sim} = \emptyset$, and $C^{\sim} \cap C^{\succcurlyeq} = \emptyset$. Furthermore, for any $P \subseteq C$, the following subsets of $P$ are denoted by $P^{=}$, $P^{\succcurlyeq}$, and $P^{\sim}$, respectively:

1) the subset of nominal attributes, i.e., $P^{=} = P \cap C^{=}$,

2) the subset of ordinal attributes, i.e., $P^{\succcurlyeq} = P \cap C^{\succcurlyeq}$,

3) the subset of quantitative attributes, i.e., $P^{\sim} = P \cap C^{\sim}$.

The key idea of the rough set philosophy is the approximation of one piece of knowledge by another. Under the in-sim-dominance relation, the condition attributes include nominal, ordinal, and quantitative attributes, and the decision classes are preference-ordered. The approximated knowledge is therefore a collection of upward and downward unions of decision classes, and the "granules of knowledge" are sets of objects defined using indiscernibility, similarity, and outranking relations together.

Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision system, and assume that the decision attribute $d$ partitions $U$ into a finite number of decision classes. The sets to be approximated are then the upward unions and downward unions of decision classes, respectively [19]:

$$Cl_t^{\succcurlyeq} = \bigcup_{s \geq t} Cl_s, \qquad Cl_t^{\preccurlyeq} = \bigcup_{s \leq t} Cl_s, \qquad t = 1, 2, \ldots, n.$$

The statement $x \in Cl_t^{\succcurlyeq}$ means "$x$ belongs at least to class $Cl_t$", and $x \in Cl_t^{\preccurlyeq}$ means "$x$ belongs at most to class $Cl_t$". We can then establish the indiscernibility relation on nominal attributes, the outranking relation on ordinal attributes, and the similarity or outranking relation on quantitative attributes. Binary relations established on different attributes can be considered jointly (moreover, other binary relations can be established according to the needs of the problem).

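Definition 3.1 [21] Let $IS = \{U, C, V, f\}$ be an information table with $C^{=} \subseteq C$, $C^{\succcurlyeq} \subseteq C$, $C^{\sim} \subseteq C$, $P \subseteq C$, $P^{=} = P \cap C^{=}$, $P^{\succcurlyeq} = P \cap C^{\succcurlyeq}$, and $P^{\sim} = P \cap C^{\sim}$. The in-sim-dominance relations of $P$ on $U$ are defined as follows:

$$R_P^{l\succcurlyeq} = \{(x, y) \in U \times U : y I_P x \wedge y D_P^{\succcurlyeq} x \wedge y S_P x\} \tag{3.1}$$

$$R_P^{r\succcurlyeq} = \{(x, y) \in U \times U : y I_P x \wedge y D_P^{\succcurlyeq} x \wedge x S_P y\} \tag{3.2}$$

$$R_P^{l\preccurlyeq} = \{(x, y) \in U \times U : y I_P x \wedge y D_P^{\preccurlyeq} x \wedge y S_P x\} \tag{3.3}$$

$$R_P^{r\preccurlyeq} = \{(x, y) \in U \times U : y I_P x \wedge y D_P^{\preccurlyeq} x \wedge x S_P y\} \tag{3.4}$$

where $I_P$ is the indiscernibility relation, $D_P^{\succcurlyeq}$ is the outranking relation, $D_P^{\preccurlyeq}$ is the outranked relation, and $S_P$ is the similarity relation.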

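Definition 3.2 [21] The global classes of an object $x$ with respect to $P$ are defined as:

$$R_P^{l\succcurlyeq}(x) = \{y \in U : y R_P^{l\succcurlyeq} x\} \tag{3.5}$$

$$R_P^{r\succcurlyeq}(x) = \{y \in U : y R_P^{r\succcurlyeq} x\} \tag{3.6}$$

$$R_P^{l\preccurlyeq}(x) = \{y \in U : y R_P^{l\preccurlyeq} x\} \tag{3.7}$$

$$R_P^{r\preccurlyeq}(x) = \{y \in U : y R_P^{r\preccurlyeq} x\} \tag{3.8}$$

Theorem 3.1 Let $IS = \{U, C, V, f\}$ be an information table with $C^{=} \subseteq C$, $C^{\succcurlyeq} \subseteq C$, $C^{\sim} \subseteq C$. For the in-sim-dominance relations and for all $x_i \in U$, if $P \subseteq Q \subseteq C$, then:

$$R_P^{l\succcurlyeq}(x_i) \supseteq R_Q^{l\succcurlyeq}(x_i), \qquad R_P^{r\succcurlyeq}(x_i) \supseteq R_Q^{r\succcurlyeq}(x_i);$$

$$R_P^{l\preccurlyeq}(x_i) \supseteq R_Q^{l\preccurlyeq}(x_i), \qquad R_P^{r\preccurlyeq}(x_i) \supseteq R_Q^{r\preccurlyeq}(x_i).$$

Proof: Since $P \subseteq Q$ and $x_i \in U$, we have $R_P^{l\succcurlyeq}(x_i) = I_{P_1}(x_i) \cap D_{P_2}^{\succcurlyeq}(x_i) \cap S_{P_3}^{l}(x_i)$ and $R_Q^{l\succcurlyeq}(x_i) = I_{Q_1}(x_i) \cap D_{Q_2}^{\succcurlyeq}(x_i) \cap S_{Q_3}^{l}(x_i)$, where $P_i$ and $Q_i$ are subsets of $P$ and $Q$, respectively. It is easy to obtain that $I_{P_1}(x_i) \supseteq I_{Q_1}(x_i)$, $D_{P_2}^{\succcurlyeq}(x_i) \supseteq D_{Q_2}^{\succcurlyeq}(x_i)$, and $S_{P_3}^{l}(x_i) \supseteq S_{Q_3}^{l}(x_i)$. Thus $R_P^{l\succcurlyeq}(x_i) \supseteq R_Q^{l\succcurlyeq}(x_i)$. The other proofs are similar.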

Example 1. An example of the in-sim-dominance binary relation.

Table 1 [20] illustrates representative decisions of a decision maker (DM) concerning 8 warehouses described by means of 3 condition attributes ($a$, geographical region; $b$, area; $c$, capacity of the sales staff) and a decision attribute $d$, which specifies the assignment made by the DM of the warehouses into 2 sets, those making profit and those making loss.

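Table 1. A decision table.

| Warehouse | a | b | c | d |
|---|---|---|---|---|
| x1 | A | 500 | Medium | Loss |
| x2 | A | 400 | Good | Profit |
| x3 | A | 450 | Medium | Profit |
| x4 | B | 400 | Good | Loss |
| x5 | B | 475 | Good | Profit |
| x6 | B | 425 | Medium | Profit |
| x7 | B | 350 | Medium | Profit |
| x8 | B | 350 | Medium | Loss |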

Table 1 can be viewed as an example of a hybrid information system: the values of attribute $a$ are nominal, the values of attribute $b$ are quantitative, and the values of attribute $c$ and of the decision attribute $d$ are ordinal.

We divide the decision table into the upward union $Cl_{Profit}^{\succcurlyeq} = \{x_2, x_3, x_5, x_6, x_7\}$ and the downward union $Cl_{Loss}^{\preccurlyeq} = \{x_1, x_4, x_8\}$. With respect to attribute $a$ we establish the indiscernibility relation; with respect to attribute $b$ we establish the similarity relation defined as [20]:

$$S_b = \{(x_i, x_j) \in U \times U : |f(x_i, b) - f(x_j, b)| \leq 0.1\, f(x_i, b)\} \tag{3.9}$$

and with respect to attribute $c$ we establish the outranking relation, in which the attribute value "Good" is better than "Medium". Let $DS = \langle U, C \cup \{d\}, V, f \rangle$, where $C = \{a, b, c\}$ is the set of condition attributes, $d$ is the decision attribute, and $U = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\}$. The in-sim-dominance binary relation of $C$ on $U$ is then shown in Table 2.

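The construction of Example 1 can be sketched in Python as follows. The data are those of Table 1. The function names and the integer form of the 10% similarity test are choices made for this sketch. One assumption is worth stating loudly: "y S x" is read here as the pair (x, y) belonging to $S_b$ in (3.9), so that the threshold is taken relative to the reference object x in the left classes and relative to y in the right classes. Under this reading the printed rows can be compared with Table 2.

```python
# Table 1 of the paper: (a nominal, b quantitative, c ordinal, d decision).
TABLE = {
    "x1": ("A", 500, "Medium", "Loss"),
    "x2": ("A", 400, "Good", "Profit"),
    "x3": ("A", 450, "Medium", "Profit"),
    "x4": ("B", 400, "Good", "Loss"),
    "x5": ("B", 475, "Good", "Profit"),
    "x6": ("B", 425, "Medium", "Profit"),
    "x7": ("B", 350, "Medium", "Profit"),
    "x8": ("B", 350, "Medium", "Loss"),
}
RANK = {"Medium": 0, "Good": 1}  # "Good" is better than "Medium"


def similar(ref, other):
    """(ref, other) in S_b per (3.9); integer form of |b_ref - b_other| <= 0.1 * b_ref."""
    return 10 * abs(TABLE[ref][1] - TABLE[other][1]) <= TABLE[ref][1]


def global_class(x, side, direction):
    """Global class of x for side in {"l", "r"} and direction in {"up", "down"}."""
    members = set()
    for y in TABLE:
        same_a = TABLE[y][0] == TABLE[x][0]              # indiscernibility on a
        cx, cy = RANK[TABLE[x][2]], RANK[TABLE[y][2]]
        ordered = cy >= cx if direction == "up" else cy <= cx  # outranking / outranked on c
        # Assumption: "y S x" uses x as the reference object, "x S y" uses y.
        sim = similar(x, y) if side == "l" else similar(y, x)
        if same_a and ordered and sim:
            members.add(y)
    return members


# Columns in the order of Table 2: l-up, r-up, l-down, r-down.
for x in TABLE:
    row = [sorted(global_class(x, s, d)) for d in ("up", "down") for s in ("l", "r")]
    print(x, row)
```

Integer arithmetic is used for the similarity test so that boundary cases such as a difference of exactly 50 against a reference value of 500 are not affected by floating-point rounding.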

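Table 2. The in-sim-dominance relation of C on U.

| Warehouse | $R_C^{l\succcurlyeq}(x_i)$ | $R_C^{r\succcurlyeq}(x_i)$ | $R_C^{l\preccurlyeq}(x_i)$ | $R_C^{r\preccurlyeq}(x_i)$ |
|---|---|---|---|---|
| x1 | $\{x_1, x_3\}$ | $\{x_1\}$ | $\{x_1, x_3\}$ | $\{x_1\}$ |
| x2 | $\{x_2\}$ | $\{x_2\}$ | $\{x_2\}$ | $\{x_2\}$ |
| x3 | $\{x_3\}$ | $\{x_1, x_3\}$ | $\{x_3\}$ | $\{x_1, x_3\}$ |
| x4 | $\{x_4\}$ | $\{x_4\}$ | $\{x_4, x_6\}$ | $\{x_4, x_6\}$ |
| x5 | $\{x_5\}$ | $\{x_5\}$ | $\{x_5\}$ | $\{x_5\}$ |
| x6 | $\{x_4, x_6\}$ | $\{x_4, x_6\}$ | $\{x_6\}$ | $\{x_6\}$ |
| x7 | $\{x_7, x_8\}$ | $\{x_7, x_8\}$ | $\{x_7, x_8\}$ | $\{x_7, x_8\}$ |
| x8 | $\{x_7, x_8\}$ | $\{x_7, x_8\}$ | $\{x_7, x_8\}$ | $\{x_7, x_8\}$ |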

Definition 3.3 [21] Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table. With respect to $P \subseteq C$, the set of all objects belonging to $Cl_t^{\succcurlyeq}$ without any left ambiguity constitutes the $P^l$-lower approximation of $Cl_t^{\succcurlyeq}$, denoted by $P_*^{l}(Cl_t^{\succcurlyeq})$, and the set of all objects that could belong to $Cl_t^{\succcurlyeq}$ constitutes the $P^l$-upper approximation of $Cl_t^{\succcurlyeq}$, denoted by $P^{l*}(Cl_t^{\succcurlyeq})$, for $t = 1, 2, \ldots, n$:

$$P_*^{l}(Cl_t^{\succcurlyeq}) = \{x \in U : R_P^{l\succcurlyeq}(x) \subseteq Cl_t^{\succcurlyeq}\} \tag{3.10}$$

$$P^{l*}(Cl_t^{\succcurlyeq}) = \{x \in U : R_P^{r\preccurlyeq}(x) \cap Cl_t^{\succcurlyeq} \neq \emptyset\} \tag{3.11}$$

Definition 3.4 [21] Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table. With respect to $P \subseteq C$, the set of all objects belonging to $Cl_t^{\succcurlyeq}$ without any right ambiguity constitutes the $P^r$-lower approximation of $Cl_t^{\succcurlyeq}$, denoted by $P_*^{r}(Cl_t^{\succcurlyeq})$, and the set of all objects that could belong to $Cl_t^{\succcurlyeq}$ constitutes the $P^r$-upper approximation of $Cl_t^{\succcurlyeq}$, denoted by $P^{r*}(Cl_t^{\succcurlyeq})$, for $t = 1, 2, \ldots, n$:

$$P_*^{r}(Cl_t^{\succcurlyeq}) = \{x \in U : R_P^{r\succcurlyeq}(x) \subseteq Cl_t^{\succcurlyeq}\} \tag{3.12}$$

$$P^{r*}(Cl_t^{\succcurlyeq}) = \{x \in U : R_P^{l\preccurlyeq}(x) \cap Cl_t^{\succcurlyeq} \neq \emptyset\} \tag{3.14}$$

We take the intersection of the $P^l$-lower and $P^r$-lower approximations of $Cl_t^{\succcurlyeq}$ and the union of the $P^l$-upper and $P^r$-upper approximations of $Cl_t^{\succcurlyeq}$ as the lower and upper approximations of $Cl_t^{\succcurlyeq}$, as follows [21]:

$$P_*(Cl_t^{\succcurlyeq}) = P_*^{l}(Cl_t^{\succcurlyeq}) \cap P_*^{r}(Cl_t^{\succcurlyeq}) \tag{3.15}$$

$$P^*(Cl_t^{\succcurlyeq}) = P^{l*}(Cl_t^{\succcurlyeq}) \cup P^{r*}(Cl_t^{\succcurlyeq}) \tag{3.16}$$

Similarly, we can define the lower and upper approximations of $Cl_t^{\preccurlyeq}$ as follows:

$$P_*^{l}(Cl_t^{\preccurlyeq}) = \{x \in U : R_P^{l\preccurlyeq}(x) \subseteq Cl_t^{\preccurlyeq}\} \tag{3.17}$$

$$P^{l*}(Cl_t^{\preccurlyeq}) = \{x \in U : R_P^{r\succcurlyeq}(x) \cap Cl_t^{\preccurlyeq} \neq \emptyset\} \tag{3.18}$$

$$P_*^{r}(Cl_t^{\preccurlyeq}) = \{x \in U : R_P^{r\preccurlyeq}(x) \subseteq Cl_t^{\preccurlyeq}\} \tag{3.19}$$

$$P^{r*}(Cl_t^{\preccurlyeq}) = \{x \in U : R_P^{l\succcurlyeq}(x) \cap Cl_t^{\preccurlyeq} \neq \emptyset\} \tag{3.20}$$

$$P_*(Cl_t^{\preccurlyeq}) = P_*^{l}(Cl_t^{\preccurlyeq}) \cap P_*^{r}(Cl_t^{\preccurlyeq}) \tag{3.21}$$

$$P^*(Cl_t^{\preccurlyeq}) = P^{l*}(Cl_t^{\preccurlyeq}) \cup P^{r*}(Cl_t^{\preccurlyeq}) \tag{3.22}$$

$P_*(Cl_t^{\succcurlyeq})$ and $P_*(Cl_t^{\preccurlyeq})$ consist of those objects which are precise, whereas $P^*(Cl_t^{\succcurlyeq})$ and $P^*(Cl_t^{\preccurlyeq})$ consist of those objects which are precise, left ambiguous, or right ambiguous.

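Theorem 3.2 [21] (Monotonic) For any $t \in T$ and $P \subseteq Q \subseteq C$:

$$P_*^{l}(Cl_t^{\succcurlyeq}) \subseteq Q_*^{l}(Cl_t^{\succcurlyeq}), \qquad P_*^{r}(Cl_t^{\succcurlyeq}) \subseteq Q_*^{r}(Cl_t^{\succcurlyeq});$$

$$P_*^{l}(Cl_t^{\preccurlyeq}) \subseteq Q_*^{l}(Cl_t^{\preccurlyeq}), \qquad P_*^{r}(Cl_t^{\preccurlyeq}) \subseteq Q_*^{r}(Cl_t^{\preccurlyeq});$$

$$P^{l*}(Cl_t^{\succcurlyeq}) \supseteq Q^{l*}(Cl_t^{\succcurlyeq}), \qquad P^{r*}(Cl_t^{\succcurlyeq}) \supseteq Q^{r*}(Cl_t^{\succcurlyeq});$$

$$P^{l*}(Cl_t^{\preccurlyeq}) \supseteq Q^{l*}(Cl_t^{\preccurlyeq}), \qquad P^{r*}(Cl_t^{\preccurlyeq}) \supseteq Q^{r*}(Cl_t^{\preccurlyeq}).$$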

Example 2. Lower and upper approximations under the in-sim-dominance relation for Table 1.

Let the condition attribute subset be $P = C$, the upward union be $Cl_{Profit}^{\succcurlyeq}$, and the downward union be $Cl_{Loss}^{\preccurlyeq}$. Then the lower and upper approximations of $Cl_{Profit}^{\succcurlyeq}$ and $Cl_{Loss}^{\preccurlyeq}$ with respect to $P$ are:

$$P_*(Cl_{Profit}^{\succcurlyeq}) = P_*^{l}(Cl_{Profit}^{\succcurlyeq}) \cap P_*^{r}(Cl_{Profit}^{\succcurlyeq}) = \{x_2, x_5\}$$

$$P^*(Cl_{Profit}^{\succcurlyeq}) = P^{l*}(Cl_{Profit}^{\succcurlyeq}) \cup P^{r*}(Cl_{Profit}^{\succcurlyeq}) = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\}$$

$$P_*(Cl_{Loss}^{\preccurlyeq}) = P_*^{l}(Cl_{Loss}^{\preccurlyeq}) \cap P_*^{r}(Cl_{Loss}^{\preccurlyeq}) = \emptyset$$

$$P^*(Cl_{Loss}^{\preccurlyeq}) = P^{l*}(Cl_{Loss}^{\preccurlyeq}) \cup P^{r*}(Cl_{Loss}^{\preccurlyeq}) = \{x_1, x_3, x_4, x_6, x_7, x_8\}$$
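The approximations of Example 2 can be checked with a short, self-contained Python sketch that follows (3.10) to (3.22). The data come from Table 1. The function names are hypothetical, and the sketch assumes that "y S x" takes x as the reference object in the 10% similarity test of (3.9), which is the reading consistent with Table 2.

```python
# Table 1 of the paper: (a nominal, b quantitative, c ordinal, d decision).
TABLE = {
    "x1": ("A", 500, "Medium", "Loss"),
    "x2": ("A", 400, "Good", "Profit"),
    "x3": ("A", 450, "Medium", "Profit"),
    "x4": ("B", 400, "Good", "Loss"),
    "x5": ("B", 475, "Good", "Profit"),
    "x6": ("B", 425, "Medium", "Profit"),
    "x7": ("B", 350, "Medium", "Profit"),
    "x8": ("B", 350, "Medium", "Loss"),
}
RANK = {"Medium": 0, "Good": 1}  # "Good" is better than "Medium"


def global_class(x, side, direction):
    """Global class of x for side in {"l", "r"} and direction in {"up", "down"}."""
    ax, bx, cx, _ = TABLE[x]
    members = set()
    for y, (ay, by, cy, _) in TABLE.items():
        ordered = RANK[cy] >= RANK[cx] if direction == "up" else RANK[cy] <= RANK[cx]
        reference = bx if side == "l" else by  # assumption: see the lead-in
        if ay == ax and ordered and 10 * abs(bx - by) <= reference:
            members.add(y)
    return members


def approximations(direction, target):
    """Lower and upper approximation of an upward ("up") or downward ("down") union."""
    other = "down" if direction == "up" else "up"
    universe = set(TABLE)
    lower_l = {x for x in universe if global_class(x, "l", direction) <= target}  # (3.10)/(3.17)
    lower_r = {x for x in universe if global_class(x, "r", direction) <= target}  # (3.12)/(3.19)
    upper_l = {x for x in universe if global_class(x, "r", other) & target}       # (3.11)/(3.18)
    upper_r = {x for x in universe if global_class(x, "l", other) & target}       # (3.14)/(3.20)
    return lower_l & lower_r, upper_l | upper_r                                    # (3.15)/(3.16)


PROFIT = {x for x, row in TABLE.items() if row[3] == "Profit"}
LOSS = {x for x, row in TABLE.items() if row[3] == "Loss"}

lower_profit, upper_profit = approximations("up", PROFIT)
lower_loss, upper_loss = approximations("down", LOSS)
print(sorted(lower_profit), sorted(upper_profit))
print(sorted(lower_loss), sorted(upper_loss))
```

Note that the upper approximations use the classes of the opposite direction and the opposite side, exactly as in (3.11), (3.14), (3.18), and (3.20).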

3.2. Uncertainty Measurements Based on In-sim-dominance Relation

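Definition 3.5 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. For an attribute subset $P \subseteq C$, the accuracy of $Cl_t^{\succcurlyeq}$ and $Cl_t^{\preccurlyeq}$ with respect to $P$ is defined as:

$$\alpha_P(Cl_t^{\succcurlyeq}) = \frac{|P_*(Cl_t^{\succcurlyeq})|}{|P^*(Cl_t^{\succcurlyeq})|}, \qquad \alpha_P(Cl_t^{\preccurlyeq}) = \frac{|P_*(Cl_t^{\preccurlyeq})|}{|P^*(Cl_t^{\preccurlyeq})|} \tag{3.23}$$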
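As a worked illustration of (3.23), the approximations obtained in Example 2 (with $P = C$) can be substituted directly. The corresponding roughness values are obtained, as in Section 2.2, by subtracting the accuracy from one.

```latex
\begin{align*}
\alpha_C(Cl_{Profit}^{\succcurlyeq})
  &= \frac{|\{x_2, x_5\}|}{|\{x_1, \ldots, x_8\}|} = \frac{2}{8} = 0.25,
  & 1 - \alpha_C(Cl_{Profit}^{\succcurlyeq}) &= 0.75, \\
\alpha_C(Cl_{Loss}^{\preccurlyeq})
  &= \frac{|\emptyset|}{|\{x_1, x_3, x_4, x_6, x_7, x_8\}|} = \frac{0}{6} = 0,
  & 1 - \alpha_C(Cl_{Loss}^{\preccurlyeq}) &= 1.
\end{align*}
```

The downward union of the loss class has an empty lower approximation, so its accuracy is zero: no warehouse can be assigned to it without ambiguity.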

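Definition 3.6 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. For an attribute subset $P \subseteq C$, the roughness of $Cl_t^{\succcurlyeq}$ and $Cl_t^{\preccurlyeq}$ with respect to $P$ is defined as:

$$\rho_P(Cl_t^{\succcurlyeq}) = 1 - \alpha_P(Cl_t^{\succcurlyeq}) \tag{3.24}$$

$$\rho_P(Cl_t^{\preccurlyeq}) = 1 - \alpha_P(Cl_t^{\preccurlyeq}) \tag{3.25}$$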

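Theorem 3.3 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. If $P \subseteq Q \subseteq C$, then $\rho_P(Cl_t^{\succcurlyeq}) \geq \rho_Q(Cl_t^{\succcurlyeq})$ and $\rho_P(Cl_t^{\preccurlyeq}) \geq \rho_Q(Cl_t^{\preccurlyeq})$.

Proof. Since $P \subseteq Q \subseteq C$, from the definitions $P_*(Cl_t^{\succcurlyeq}) = P_*^{l}(Cl_t^{\succcurlyeq}) \cap P_*^{r}(Cl_t^{\succcurlyeq})$ and $P^*(Cl_t^{\succcurlyeq}) = P^{l*}(Cl_t^{\succcurlyeq}) \cup P^{r*}(Cl_t^{\succcurlyeq})$ and from Theorem 3.2, it is easy to obtain that

$$P_*^{l}(Cl_t^{\succcurlyeq}) \cap P_*^{r}(Cl_t^{\succcurlyeq}) \subseteq Q_*^{l}(Cl_t^{\succcurlyeq}) \cap Q_*^{r}(Cl_t^{\succcurlyeq})$$

and

$$P^{l*}(Cl_t^{\succcurlyeq}) \cup P^{r*}(Cl_t^{\succcurlyeq}) \supseteq Q^{l*}(Cl_t^{\succcurlyeq}) \cup Q^{r*}(Cl_t^{\succcurlyeq}),$$

so $P_*(Cl_t^{\succcurlyeq}) \subseteq Q_*(Cl_t^{\succcurlyeq})$ and $P^*(Cl_t^{\succcurlyeq}) \supseteq Q^*(Cl_t^{\succcurlyeq})$. Then

$$\frac{|P_*(Cl_t^{\succcurlyeq})|}{|P^*(Cl_t^{\succcurlyeq})|} \leq \frac{|Q_*(Cl_t^{\succcurlyeq})|}{|Q^*(Cl_t^{\succcurlyeq})|},$$

thus $\alpha_P(Cl_t^{\succcurlyeq}) \leq \alpha_Q(Cl_t^{\succcurlyeq})$. Therefore $\rho_P(Cl_t^{\succcurlyeq}) \geq \rho_Q(Cl_t^{\succcurlyeq})$. The proof of $\rho_P(Cl_t^{\preccurlyeq}) \geq \rho_Q(Cl_t^{\preccurlyeq})$ is similar.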

Definition 3.7 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. Let $U/Cl_t^{\succcurlyeq} = \{Cl_t, Cl_{t+1}, \ldots, Cl_m\}$ and $U/Cl_t^{\preccurlyeq} = \{Cl_1, Cl_2, \ldots, Cl_t\}$ be the indiscernibility classes constituted by the decision attribute $d$ on the upward union $Cl_t^{\succcurlyeq}$ and the downward union $Cl_t^{\preccurlyeq}$ of decision classes, and let $P \subseteq C$ be a condition attribute subset. The approximation accuracy of $U/Cl_t^{\succcurlyeq}$ and $U/Cl_t^{\preccurlyeq}$ with respect to $P$ under the in-sim-dominance relation is defined as:

$$\alpha_P(U/Cl_t^{\succcurlyeq}) = \frac{\sum_{d_i \in U/Cl_t^{\succcurlyeq}} |P_*(d_i)|}{\sum_{d_i \in U/Cl_t^{\succcurlyeq}} |P^*(d_i)|} \tag{3.26}$$

$$\alpha_P(U/Cl_t^{\preccurlyeq}) = \frac{\sum_{d_i \in U/Cl_t^{\preccurlyeq}} |P_*(d_i)|}{\sum_{d_i \in U/Cl_t^{\preccurlyeq}} |P^*(d_i)|} \tag{3.27}$$

The approximation roughness under the in-sim-dominance relation is then defined from the approximation accuracy, as in Definition 3.6:

$$\rho_P(U/Cl_t^{\succcurlyeq}) = 1 - \alpha_P(U/Cl_t^{\succcurlyeq}) \tag{3.28}$$

$$\rho_P(U/Cl_t^{\preccurlyeq}) = 1 - \alpha_P(U/Cl_t^{\preccurlyeq}) \tag{3.29}$$

Theorem 3.4 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. If $Q \subseteq P \subseteq C$, then $\rho_P(U/Cl_t^{\succcurlyeq}) \leq \rho_Q(U/Cl_t^{\succcurlyeq})$ and $\rho_P(U/Cl_t^{\preccurlyeq}) \leq \rho_Q(U/Cl_t^{\preccurlyeq})$.

Proof. Since $Q \subseteq P$, by Theorem 3.1 we know that, for all $x_i \in U$,

$$R_P^{l\succcurlyeq}(x_i) \cap R_P^{r\succcurlyeq}(x_i) = R_P^{\succcurlyeq}(x_i) \subseteq R_Q^{\succcurlyeq}(x_i) = R_Q^{l\succcurlyeq}(x_i) \cap R_Q^{r\succcurlyeq}(x_i).$$

Consequently, for any $X \in U/Cl_t^{\succcurlyeq}$ and any $x_i \in U$, $R_Q^{\succcurlyeq}(x_i) \subseteq X$ implies $R_P^{\succcurlyeq}(x_i) \subseteq X$. Hence $|P_*(X)| \geq |Q_*(X)|$, so for all $d_i \in U/Cl_t^{\succcurlyeq}$, $|P_*(d_i)| \geq |Q_*(d_i)|$.

On the other hand, for all $x_i \in U$, $R_P^{\succcurlyeq}(x_i) \cap X \neq \emptyset$ implies $R_Q^{\succcurlyeq}(x_i) \cap X \neq \emptyset$, since $R_P^{\succcurlyeq}(x_i) \subseteq R_Q^{\succcurlyeq}(x_i)$. Hence $|P^*(X)| \leq |Q^*(X)|$, so for all $d_i \in U/Cl_t^{\succcurlyeq}$, $|P^*(d_i)| \leq |Q^*(d_i)|$. It follows that $\alpha_P(U/Cl_t^{\succcurlyeq}) \geq \alpha_Q(U/Cl_t^{\succcurlyeq})$, and consequently $\rho_P(U/Cl_t^{\succcurlyeq}) \leq \rho_Q(U/Cl_t^{\succcurlyeq})$. The proof of $\rho_P(U/Cl_t^{\preccurlyeq}) \leq \rho_Q(U/Cl_t^{\preccurlyeq})$ is similar.

3.3. Entropy-based Uncertainty Measurements Based on In-sim-dominance Relation

Shannon introduced entropy as a measure of the information of a data set in information theory [29]. Entropy can also be used as an uncertainty measurement in rough set theory, as several scholars have studied earlier [24-28].

In this part, we define a new entropy, called hybrid entropy, based on the in-sim-dominance relation. Two more useful measurements, the entropy-based approximation roughness of the upward union and of the downward union, are then proposed based on hybrid entropy.

Definition 3.8 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. For any attribute subset $P \subseteq C$, the hybrid entropy with respect to $P$ is defined as:

$$H(P) = -\frac{1}{2} \sum_{i=1}^{|U|} \frac{|R_P^{l\succcurlyeq}(x_i) \cap R_P^{r\succcurlyeq}(x_i)|}{|U|^2} \log \frac{1}{|R_P^{l\succcurlyeq}(x_i) \cap R_P^{r\succcurlyeq}(x_i)|} - \frac{1}{2} \sum_{i=1}^{|U|} \frac{|R_P^{l\preccurlyeq}(x_i) \cap R_P^{r\preccurlyeq}(x_i)|}{|U|^2} \log \frac{1}{|R_P^{l\preccurlyeq}(x_i) \cap R_P^{r\preccurlyeq}(x_i)|} \tag{3.30}$$

The hybrid entropy achieves the maximum value

$$\frac{|Cl_t^{\succcurlyeq}| \log |Cl_t^{\succcurlyeq}| + |Cl_{t-1}^{\preccurlyeq}| \log |Cl_{t-1}^{\preccurlyeq}|}{2|U|}$$

when, for all $x_i \in U$, $R_P^{l\succcurlyeq}(x_i) \cap R_P^{r\succcurlyeq}(x_i) = Cl_t^{\succcurlyeq}$ and $R_P^{l\preccurlyeq}(x_i) \cap R_P^{r\preccurlyeq}(x_i) = Cl_{t-1}^{\preccurlyeq}$, and it achieves the minimum value 0 when, for all $x_i \in U$, $R_P^{l\succcurlyeq}(x_i) \cap R_P^{r\succcurlyeq}(x_i) = \{x_i\}$ and $R_P^{l\preccurlyeq}(x_i) \cap R_P^{r\preccurlyeq}(x_i) = \{x_i\}$. Hence, we have

$$0 \leq H(P) \leq \frac{|Cl_t^{\succcurlyeq}| \log |Cl_t^{\succcurlyeq}| + |Cl_{t-1}^{\preccurlyeq}| \log |Cl_{t-1}^{\preccurlyeq}|}{2|U|}.$$
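The hybrid entropy of (3.30) can be sketched for the warehouse data of Table 1 as follows. Several points are assumptions of this sketch rather than statements of the paper: the logarithm base (2 is used here as a default, since the definition does not state one), the function names, the encoding of attribute subsets as strings such as "abc", and the reading of "y S x" in which x is the reference object of the 10% similarity test in (3.9).

```python
import math

# Table 1 of the paper: (a nominal, b quantitative, c ordinal, d decision).
TABLE = {
    "x1": ("A", 500, "Medium", "Loss"),
    "x2": ("A", 400, "Good", "Profit"),
    "x3": ("A", 450, "Medium", "Profit"),
    "x4": ("B", 400, "Good", "Loss"),
    "x5": ("B", 475, "Good", "Profit"),
    "x6": ("B", 425, "Medium", "Profit"),
    "x7": ("B", 350, "Medium", "Profit"),
    "x8": ("B", 350, "Medium", "Loss"),
}
RANK = {"Medium": 0, "Good": 1}  # "Good" is better than "Medium"


def global_class(x, side, direction, attrs):
    """Global class of x restricted to the condition attributes named in attrs."""
    ax, bx, cx, _ = TABLE[x]
    members = set()
    for y, (ay, by, cy, _) in TABLE.items():
        ok = True
        if "a" in attrs:  # indiscernibility on the nominal attribute
            ok = ok and ay == ax
        if "b" in attrs:  # similarity (3.9); the reference object depends on the side
            reference = bx if side == "l" else by
            ok = ok and 10 * abs(bx - by) <= reference
        if "c" in attrs:  # outranking ("up") or outranked ("down") on the ordinal attribute
            up, down = RANK[cy] >= RANK[cx], RANK[cy] <= RANK[cx]
            ok = ok and (up if direction == "up" else down)
        if ok:
            members.add(y)
    return members


def hybrid_entropy(attrs, base=2.0):
    """H(P) as in (3.30); the base of the logarithm is an assumption of this sketch."""
    n = len(TABLE)
    total = 0.0
    for direction in ("up", "down"):
        for x in TABLE:
            left = global_class(x, "l", direction, attrs)
            right = global_class(x, "r", direction, attrs)
            size = len(left & right)  # at least 1, because x belongs to its own classes
            # -(size / n^2) * log(1 / size) equals (size / n^2) * log(size)
            total += 0.5 * size / n**2 * math.log(size, base)
    return total


print(hybrid_entropy("abc"), hybrid_entropy("ab"))
```

With base 2 this gives 0.09375 for the full attribute set and 0.125 when the ordinal attribute c is dropped, so the entropy does not increase when attributes are added, in line with the monotonicity stated in Theorem 3.5.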

Theorem 3.5 (Monotonic) Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. If $Q \subseteq P \subseteq C$, then $H(P) \leq H(Q)$.

Proof. Since $Q \subseteq P \subseteq C$, according to Theorem 3.1 we have $R_P^{l\succcurlyeq}(x_i) \cap R_P^{r\succcurlyeq}(x_i) \subseteq R_Q^{l\succcurlyeq}(x_i) \cap R_Q^{r\succcurlyeq}(x_i)$ and $R_P^{l\preccurlyeq}(x_i) \cap R_P^{r\preccurlyeq}(x_i) \subseteq R_Q^{l\preccurlyeq}(x_i) \cap R_Q^{r\preccurlyeq}(x_i)$. Then

$$\log \frac{1}{|R_P^{l\succcurlyeq}(x_i) \cap R_P^{r\succcurlyeq}(x_i)|} \geq \log \frac{1}{|R_Q^{l\succcurlyeq}(x_i) \cap R_Q^{r\succcurlyeq}(x_i)|}$$

and

$$\log \frac{1}{|R_P^{l\preccurlyeq}(x_i) \cap R_P^{r\preccurlyeq}(x_i)|} \geq \log \frac{1}{|R_Q^{l\preccurlyeq}(x_i) \cap R_Q^{r\preccurlyeq}(x_i)|},$$

therefore $H(P) \leq H(Q)$.

Theorem 3.6 (Equivalence) Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. For $Q, P \subseteq C$, if for all $x_i \in U$, $R_P^{l\succcurlyeq}(x_i) \cap R_P^{r\succcurlyeq}(x_i) = R_Q^{l\succcurlyeq}(x_i) \cap R_Q^{r\succcurlyeq}(x_i)$ and $R_P^{l\preccurlyeq}(x_i) \cap R_P^{r\preccurlyeq}(x_i) = R_Q^{l\preccurlyeq}(x_i) \cap R_Q^{r\preccurlyeq}(x_i)$, then $H(P) = H(Q)$.

Proof. This follows directly from Definitions 3.2 and 3.8.

Definition 3.9 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. For an attribute subset $P \subseteq C$, the entropy-based roughness of $Cl_t^{\succcurlyeq}$ and $Cl_t^{\preccurlyeq}$ with respect to $P$ under the in-sim-dominance relation is defined as follows:

$$H\rho_P(Cl_t^{\succcurlyeq}) = \rho_P(Cl_t^{\succcurlyeq})\, H(P) \tag{3.31}$$

$$H\rho_P(Cl_t^{\preccurlyeq}) = \rho_P(Cl_t^{\preccurlyeq})\, H(P) \tag{3.32}$$

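To make (3.31) and (3.32) concrete, the following worked computation uses the warehouse data with $P = C$. It is only a sketch under one assumption: the logarithm is taken to base 2, which Definition 3.8 does not specify. The sizes $n_i^{\succcurlyeq}$ and $n_i^{\preccurlyeq}$ of the intersected classes are read off Table 2, and the approximations are those of Example 2.

```latex
\begin{align*}
(n_1^{\succcurlyeq}, \ldots, n_8^{\succcurlyeq}) &= (1, 1, 1, 1, 1, 2, 2, 2), \qquad
(n_1^{\preccurlyeq}, \ldots, n_8^{\preccurlyeq}) = (1, 1, 1, 2, 1, 1, 2, 2), \\
H(C) &= \frac{1}{2 \cdot 8^2} \sum_{i=1}^{8} n_i^{\succcurlyeq} \log_2 n_i^{\succcurlyeq}
      + \frac{1}{2 \cdot 8^2} \sum_{i=1}^{8} n_i^{\preccurlyeq} \log_2 n_i^{\preccurlyeq}
      = \frac{6 + 6}{128} = 0.09375, \\
\rho_C(Cl_{Profit}^{\succcurlyeq}) &= 1 - \frac{2}{8} = 0.75, \qquad
\rho_C(Cl_{Loss}^{\preccurlyeq}) = 1 - \frac{0}{6} = 1, \\
H\rho_C(Cl_{Profit}^{\succcurlyeq}) &= 0.75 \times 0.09375 \approx 0.0703, \qquad
H\rho_C(Cl_{Loss}^{\preccurlyeq}) = 1 \times 0.09375 = 0.09375.
\end{align*}
```

A different logarithm base would rescale $H(C)$ and both entropy-based values by the same constant factor, so comparisons between attribute subsets are unaffected by this choice.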

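Theorem 3.7 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. If $Q \subseteq P \subseteq C$, then $H\rho_P(Cl_t^{\succcurlyeq}) \leq H\rho_Q(Cl_t^{\succcurlyeq})$ and $H\rho_P(Cl_t^{\preccurlyeq}) \leq H\rho_Q(Cl_t^{\preccurlyeq})$.

Proof. Since $Q \subseteq P \subseteq C$, we have $\rho_P(Cl_t^{\succcurlyeq}) \leq \rho_Q(Cl_t^{\succcurlyeq})$ by Theorem 3.3 and $H(P) \leq H(Q)$ by Theorem 3.5. Since both factors are non-negative, we get $H\rho_P(Cl_t^{\succcurlyeq}) \leq H\rho_Q(Cl_t^{\succcurlyeq})$. The proof of $H\rho_P(Cl_t^{\preccurlyeq}) \leq H\rho_Q(Cl_t^{\preccurlyeq})$ is similar.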

Definition 3.10 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. Let $U/Cl_t^{\succcurlyeq} = \{Cl_t, Cl_{t+1}, \ldots, Cl_m\}$ and $U/Cl_t^{\preccurlyeq} = \{Cl_1, Cl_2, \ldots, Cl_t\}$ be the indiscernibility classes constituted by the decision attribute $d$ on the upward union $Cl_t^{\succcurlyeq}$ and the downward union $Cl_t^{\preccurlyeq}$ of decision classes, and let $P \subseteq C$ be a condition attribute subset. The entropy-based approximation roughness of $U/Cl_t^{\succcurlyeq}$ and $U/Cl_t^{\preccurlyeq}$ with respect to $P$ under the in-sim-dominance relation is defined as follows:

$$H\rho_P(U/Cl_t^{\succcurlyeq}) = \rho_P(U/Cl_t^{\succcurlyeq})\, H(P) \tag{3.33}$$

$$H\rho_P(U/Cl_t^{\preccurlyeq}) = \rho_P(U/Cl_t^{\preccurlyeq})\, H(P) \tag{3.34}$$

Theorem 3.8 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succcurlyeq} \cup C^{\sim}$. If $Q \subseteq P \subseteq C$, then $H\rho_P(U/Cl_t^{\succcurlyeq}) \leq H\rho_Q(U/Cl_t^{\succcurlyeq})$ and $H\rho_P(U/Cl_t^{\preccurlyeq}) \leq H\rho_Q(U/Cl_t^{\preccurlyeq})$.

Proof. Since $Q \subseteq P \subseteq C$, we have $\rho_P(U/Cl_t^{\succcurlyeq}) \leq \rho_Q(U/Cl_t^{\succcurlyeq})$ by Theorem 3.4 and $H(P) \leq H(Q)$ by Theorem 3.5. Hence, we get $H\rho_P(U/Cl_t^{\succcurlyeq}) \leq H\rho_Q(U/Cl_t^{\succcurlyeq})$. The proof of $H\rho_P(U/Cl_t^{\preccurlyeq}) \leq H\rho_Q(U/Cl_t^{\preccurlyeq})$ is similar.

Example3. The comparison of approximation

roughness and entropy-based approximation

roughness.

Fig.1. Approximation roughness vs. entropy-based

approximation roughness of upward-union.

Fig.2. Approximation roughness vs. entropy-based

approximation roughness of downward-union.

As in Example 1, we establish the in-sim-dominance relation and then calculate the approximation roughness and the entropy-based approximation roughness in order to compare them. The results are shown in Fig.1-2. Both $\rho_P(U/Cl_{Profit}^{\succeq})$ and $\rho_P(U/Cl_{Loss}^{\preceq})$ remain unchanged as the number of attributes increases from 2 to 3. By contrast, the entropy-based approximation roughness can discern them

clearly. This means that the entropy-based approximation roughness measures $H\rho_P(U/Cl_t^{\succeq})$ and $H\rho_P(U/Cl_{t-1}^{\preceq})$ are more powerful for evaluating the uncertainty in some cases.

4. EXPERIMENTS

In order to verify the effectiveness of the uncertainty measures proposed above, we conduct four experiments on two real-life data sets, the Heart disease data set and the Credit approval data set, from the UCI repository of machine learning databases. For each data set, we analyze the properties of its attributes, establish the corresponding binary relations, and then build the in-sim-dominance relation.
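The procedure above can be sketched on a toy table. This is only an illustration under stated assumptions, not the code used in the experiments: the data, the attribute names, the threshold `EPS`, the simplified reading of the in-sim-dominance class (equality on nominal attributes, "at least as good" on criteria, absolute difference within `EPS` on similarity attributes) and the roughness formula 1 - |lower| / |upper| are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Toy decision table (hypothetical data): one nominal attribute, one
# criterion, one similarity attribute, and an ordinal decision d.
TABLE: List[Dict[str, object]] = [
    {"color": "r", "score": 1, "temp": 10.0, "d": 1},
    {"color": "r", "score": 2, "temp": 10.5, "d": 2},
    {"color": "r", "score": 3, "temp": 30.0, "d": 1},
    {"color": "b", "score": 2, "temp": 10.0, "d": 2},
    {"color": "b", "score": 3, "temp": 10.2, "d": 2},
]
KIND = {"color": "nominal", "score": "criterion", "temp": "similar"}
EPS = 1.0  # assumed similarity threshold


def relates(y, x, attrs) -> bool:
    """True if y is in the assumed in-sim-dominance class of x on attrs."""
    for a in attrs:
        if KIND[a] == "nominal" and y[a] != x[a]:
            return False
        if KIND[a] == "criterion" and y[a] < x[a]:
            return False
        if KIND[a] == "similar" and abs(y[a] - x[a]) > EPS:
            return False
    return True


def dominating_class(i: int, attrs) -> Set[int]:
    """Indices of all objects related to object i on attrs."""
    return {j for j, y in enumerate(TABLE) if relates(y, TABLE[i], attrs)}


def upward_roughness(attrs, t: int) -> float:
    """1 - |lower| / |upper| for the upward union {x : d(x) >= t}."""
    target = {i for i, x in enumerate(TABLE) if x["d"] >= t}
    classes = [dominating_class(i, attrs) for i in range(len(TABLE))]
    lower = {i for i, c in enumerate(classes) if c <= target}
    upper = set().union(*(classes[i] for i in target))
    if not upper:  # empty union: nothing to approximate
        return 0.0
    return 1.0 - len(lower) / len(upper)


# Grow the attribute subset one attribute at a time.
roughness_by_size = [
    upward_roughness(["score", "color", "temp"][:k], t=2) for k in (1, 2, 3)
]
print(roughness_by_size)  # [1.0, 0.5, 0.0]
```

On this toy table the roughness does not increase as attributes are added, which matches the monotonicity the theorems state for nested attribute subsets.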

The Heart disease data set has 6 real attributes, 1 ordered attribute, 3 binary attributes, 3 nominal attributes and 1 decision attribute. It contains 270 objects, each of which either has heart disease or not. The results are shown in Fig.3-4.

Fig.3. The result of Heart disease dataset's upward-union.

The Credit approval data set has 6 categorical attributes, 3 binary attributes, 6 qualitative attributes and 1 decision attribute. It contains 690 objects that belong to the class - or the class +. Because two attributes yield the same classification and the data set has missing values, preprocessing leaves 14 condition attributes and 653 objects. The results are shown in Fig.5-6.
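The preprocessing described for the Credit approval data can be sketched as follows. The paper does not give its preprocessing code, so this is an assumed reading: objects with any missing value are dropped, and of two attributes that induce the same partition of the objects only the first is kept. The `"?"` missing-value marker, the helper names and the toy rows are hypothetical.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[str]]


def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Keep only objects with no missing value (None or '?')."""
    return [r for r in rows if all(v is not None and v != "?" for v in r)]


def partition(rows: List[Row], col: int) -> frozenset:
    """Partition of object indices induced by one attribute column."""
    blocks = {}
    for i, r in enumerate(rows):
        blocks.setdefault(r[col], set()).add(i)
    return frozenset(frozenset(b) for b in blocks.values())


def drop_redundant_columns(rows: List[Row]) -> List[int]:
    """Indices of columns to keep: the first column of each distinct partition."""
    seen, keep = set(), []
    for col in range(len(rows[0])):
        p = partition(rows, col)
        if p not in seen:
            seen.add(p)
            keep.append(col)
    return keep


# Toy condition-attribute rows; column 2 classifies objects exactly like column 1.
raw = [
    ("a", "u", "g", "1.5"),
    ("b", "y", "p", "?"),
    ("a", "y", "p", "2.0"),
    ("b", "u", "g", "0.5"),
]
complete = drop_incomplete(raw)
kept = drop_redundant_columns(complete)
print(len(complete), kept)  # 3 [0, 1, 3]
```

Dropping rows before comparing partitions is a design choice in this sketch; doing it in the other order could change which columns look redundant.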

Fig.4. The result of Heart disease dataset's downward-union.

Fig.5. The result of Credit approval dataset's upward-union.

Fig.6. The result of Credit approval dataset's downward-union.

Fig.3-6 show that the values of both the entropy-based approximation roughness and the approximation roughness decrease as the number of attributes grows. That is, the uncertainty decreases as attributes are added; in other words, supplying more available knowledge reduces the uncertainty. The experiments demonstrate the validity of the two uncertainty measurements based on the in-sim-dominance relation. Fig.3-6 also show that the value of the approximation roughness does not change when the number of attributes increases from 1 to 2 in the Heart disease and Credit approval data sets. By contrast, the entropy-based approximation roughness can evaluate the uncertainty of a hybrid information system more accurately.

5. CONCLUSIONS

In this paper, we studied uncertainty measurement based on the in-sim-dominance relation. First, the roughness and approximation roughness measurements were extended to deal with hybrid information systems. Then, based on the hybrid entropy, we proposed the entropy-based roughness and the entropy-based approximation roughness of the upward and downward unions to measure the uncertainty of a hybrid information system. The experimental results demonstrate that both the approximation roughness and the entropy-based approximation roughness are useful and effective for evaluating the uncertainty of a hybrid information system. However, the results also show that the entropy-based approximation roughness evaluates the uncertainty more clearly than the approximation roughness. Moreover, other binary relations, such as the tolerance relation, can be combined with the in-sim-dominance relation to meet the needs of special problems.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (No. 61272060) and the Key Natural Science Foundation of Chongqing (No. CSTC2013jjB40003).

REFERENCES

[1] Z. Pawlak. Rough sets [J]. International Journal of Computer and Information Sciences, 1982, 11(5): 341-356.

[2] Z. Pawlak. Rough sets: theoretical aspects of reasoning about data [M]. System Theory, Knowledge Engineering and Problem Solving, vol. 9, 1991.

[3] F. Hu, G.Y. Wang. Quick reduction algorithms based on attribute order [J]. Chinese Journal of Computers, 8:029, 2007.

[4] S. Hirano, S. Tsumoto. Segmentation of medical images based on approximations in rough set theory. In Rough Sets and Current Trends in Computing, pages 554-563. Springer, 2002.

[5] Z. Pawlak. Rough set approach to knowledge-based decision support. European Journal of Operational Research, 99(1):48-57, 1997.

[6] Y.Z. Liu, H.Y. Xuan, G.X. Lin. Application research on tax forecasting in China based on rough set theory [J]. Systems Engineering-Theory & Practice, 10:017, 2004.

[7] R.R. Tan. Rule-based life cycle impact assessment using modified rough set induction methodology. Environmental Modeling & Software, 20(5):509-513, 2005.

[8] K. Thangavel, A. Pethalakshmi. Dimensionality reduction based on rough set theory: A review. Applied Soft Computing, 9(1):1-12, 2009.

[9] J.H. Zhang, Y.Y. Wang. A rough margin based support vector machine. Information Sciences, 178(9):2204-2214, 2008.

[10] T. Herawan, M.M. Deris, J.H. Abawajy. A rough set approach for selecting clustering attribute. Knowledge-Based Systems, 23(3):220-231, 2010.

[11] M. Kryszkiewicz. Rough set approach to incomplete information systems [J]. Information Sciences, 1998, 112(1): 39-49.

[12] M. Kryszkiewicz. Rules in incomplete information systems. Information Sciences, 113(3):271-292, 1999.

[13] J. Stefanowski, A. Tsoukias. On the extension of rough sets under incomplete information. In New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, pages 73-81. Springer, 1999.

[14] J. Stefanowski, A. Tsoukias. Incomplete information tables and rough classification. Computational Intelligence, 17(3):545-566, 2001.

[15] J. Stefanowski, A. Tsoukias. Valued tolerance and decision rules. In Rough Sets and Current Trends in Computing, pages 212-219. Springer, 2001.

[16] G.Y. Wang. Extension of rough set under incomplete information systems. In Fuzzy Systems, 2002. FUZZ-IEEE'02. Proceedings of the 2002 IEEE International Conference on, volume 2, pages 1098-1103. IEEE, 2002.

[17] J.W. Grzymala-Busse. Characteristic relations for incomplete data: A generalization of the indiscernibility relation. In Rough Sets and Current Trends in Computing, pages 244-253. Springer, 2004.

[18] J.W. Grzymala-Busse. Rough set strategies to data with missing attribute values. In Foundations and Novel Approaches in Data Mining, pages 197-212. Springer, 2006.

[19] S. Greco, B. Matarazzo, R. Slowinski. Rough sets theory for multicriteria decision analysis [J]. European Journal of Operational Research, 2001, 129(1): 1-47.

[20] S. Greco, B. Matarazzo, R. Slowinski. Rough sets methodology for sorting problems in presence of multiple attributes and criteria [J]. European Journal of Operational Research, 2002, 138(2): 247-259.

[21] L.P. An, L.Y. Tong. Rough approximations based on intersection of indiscernibility, similarity and outranking relations [J]. Knowledge-Based Systems, 2010, 23(6): 555-562.

[22] J.H. Dai, W.T. Wang, J.S. Mi. Uncertainty measurement for interval-valued information systems [J]. Information Sciences, 2013, 251: 63-78.

[23] J.H. Dai, Q. Xu. Approximations and uncertainty measures in incomplete information systems [J]. Information Sciences, 2012, 198: 62-80.

[24] T. Beaubouef, F.E. Petry, G. Arora. Information-theoretic measures of uncertainty for rough relational database. Information Sciences, 1998, 109(1-4): 185-195.

[25] Y.Y. Yao, S.K.M. Wong, C.J. Butz. On information-theoretic measures of attribute importance [M]// Methodologies for Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 1999: 133-137.

[26] Y.Y. Yao, L.Q. Zhao. A measurement theory view on the granularity of partitions [J]. Information Sciences, 2012, 213: 1-13.

[27] Y.Y. Yao, X.F. Deng. Quantitative rough sets based on subsethood measures [J]. Information Sciences, 2014, 267: 306-322.

[28] Y.H. Qian, J.Y. Liang. Combination entropy and combination granulation in rough set theory [J]. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2008, 16(02): 179-193.

[29] C. Shannon. A mathematical theory of communication [J]. Bell System Technical Journal, 1948, 27: 379-423.
