Uncertainty Measurement Based on In-sim-dominance Relation
Liulin Zhou (a), Guoyin Wang (a,b,*), Taihua Xu (b)
(a) Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, PR China.
(b) School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, PR China.
*Corresponding author.
Email addresses: [email protected] (Liulin Zhou), [email protected] (Guoyin Wang ),
[email protected](Taihua Xu).
ABSTRACT:
The in-sim-dominance relation was proposed to deal with hybrid information systems, in which objects are described by a finite set of qualitative and quantitative attributes. Accuracy and roughness are the two main tools for uncertainty measurement in Pawlak rough set theory. However, there are few studies on uncertainty measurement based on the in-sim-dominance relation. In this paper, the traditional accuracy and roughness measurements are extended to hybrid information systems, and approximation accuracy and approximation roughness based on the in-sim-dominance relation are defined. In particular, a concept called hybrid entropy is introduced to measure the uncertainty of a hybrid information system. Entropy-based roughness and entropy-based approximation roughness of a hybrid information system are then proposed. Experiments on standard UCI data sets show that the entropy-based approximation roughness is effective and suitable for measuring the uncertainty of a hybrid information system.
Keywords: rough set, in-sim-dominance relation, uncertainty measurement, hybrid information system
1. INTRODUCTION
As a useful mathematical tool for dealing with uncertain and ambiguous information, rough set theory (RST) [1-2], proposed by Pawlak, has been studied by many scholars and applied successfully in many research areas, such as data mining [3], pattern recognition [4], decision-making analysis [5], artificial intelligence [6-7], knowledge discovery [8], machine learning [9], and intelligent data analysis [10]. The main idea of RST is to build a knowledge base from all known knowledge of a given data space and then to classify the knowledge base by an indiscernibility relation; classifying the knowledge base can in fact be viewed as classifying the given data space. In this way, uncertain knowledge can be described approximately by the known knowledge in the knowledge base. Compared with other data processing methods, RST is more objective because it does not need prior knowledge.
As is well known, the indiscernibility relation on the universe plays a crucial role in Pawlak RST. For many practical problems, however, the binary relations on the universe are not equivalence relations, which limits the application of Pawlak RST. Therefore, many scholars have worked on extending Pawlak RST: the indiscernibility relation is generalized to obtain RST based on generalized indiscernibility relations [11-20] for different information systems.
In practice, there exist hybrid information systems, in which objects are described by several attributes whose values are of various types, such as nominal, integer, numerical, or interval values. To construct a comprehensive preference model, it is sometimes reasonable to consider both criteria and regular attributes. An and Tong [21] proposed the discernibility-similarity-dominance matrix and its functions to induce decision rules based on the in-sim-dominance relation.
Proceedings of the Second International Conference on Artificial Intelligence and Pattern Recognition, Shenzhen, China, 2015
ISBN: 978-1-941968-09-3 ©2015 SDIWC
Recently, many scholars have proposed different uncertainty measurements for different RST models. Pawlak [2] proposed four numerical uncertainty measurements to evaluate the uncertainty of a rough set: accuracy and roughness for an information table, and approximation accuracy and approximation roughness for a decision table. Dai et al. proposed an uncertainty measurement based on the similarity degree for interval-valued information systems [22] and an approximation accuracy for incomplete information systems [23]. Beaubouef et al. [24] addressed the measurement of uncertainty in rough sets and rough relational databases by introducing a measurement based on information entropy. Yao et al. [25-27] studied attribute importance in rough sets by means of information entropy. Qian and Liang [28] introduced the concepts of combination entropy and combination granulation into RST, based on the intuitionistic knowledge content nature of information gain. However, there are few studies on uncertainty measurement for hybrid information systems. In this paper, we address uncertainty measurement for hybrid information systems based on the in-sim-dominance relation. We investigate the properties of the in-sim-dominance relation and propose approximation accuracy and approximation roughness based on it. Moreover, entropy-based roughness and entropy-based approximation roughness measurements are presented. Experimental results show that the proposed uncertainty measurements are effective for evaluating the uncertainty of a hybrid information system based on the in-sim-dominance relation.
The rest of this paper is organized as follows. Some preliminary notions of RST are briefly reviewed in Section 2. In Section 3, the in-sim-dominance relation and its rough approximations are introduced, several knowledge uncertainty measurements of hybrid information systems based on the in-sim-dominance relation are defined, and some of their important properties are discussed. Numerical experiments evaluating the effectiveness of the proposed uncertainty measurements are reported in Section 4. Section 5 concludes the paper.
2. PRELIMINARY
In this section, we will review some basic
concepts in RST, including information system,
indiscernibility relation, rough approximations
and uncertainty measures.
2.1 Indiscernibility Relation and Rough Approximations
An information system is a quadruple $IS = \{U, C, V, f\}$, where U is a non-empty finite set of objects called the universe, C is a non-empty finite set of attributes, and V is the union of attribute domains, $V = \bigcup_{a \in C} V_a$, where $V_a$ denotes the value domain of attribute a. For any $a \in C$, f determines an information function $f_a : U \to V_a$, that is, $f(x, a) \in V_a$, where $f(x, a)$ denotes the value of attribute a for object x. A decision system is defined as $DS = \langle U, C \cup \{d\}, V, f \rangle$, where C is the set of condition attributes and d is a decision attribute.
An attribute subset $P \subseteq C$ determines an indiscernibility relation, denoted by $IND(P)$:
$IND(P) = \{(x, y) \in U \times U \mid \forall a \in P,\ f(x, a) = f(y, a)\}$
The relation $IND(P)$ induces a partition of U, denoted by $U/IND(P)$ or $U/P$; the notation $[x]_P$ denotes the indiscernibility class of P containing x.
For any given information system $IS = \{U, C, V, f\}$, $P \subseteq C$ and $X \subseteq U$, one can define the lower and upper approximations of X:
$P_*(X) = \{x \in U \mid [x]_P \subseteq X\}$  (2.1)
$P^*(X) = \{x \in U \mid [x]_P \cap X \neq \emptyset\}$  (2.2)
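The lower and upper approximations in (2.1) and (2.2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the toy table (attributes `colour` and `size`) are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Return the classes of U/IND(P): objects grouped by their values on attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def lower_upper(table, attrs, target):
    """Return (P_*(X), P^*(X)) for the object set `target`, following (2.1)-(2.2)."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attrs):
        if cls <= target:   # [x]_P is contained in X
            lower |= cls
        if cls & target:    # [x]_P has a non-empty intersection with X
            upper |= cls
    return lower, upper


# Hypothetical toy data, not taken from the paper.
table = {
    "x1": {"colour": "red", "size": "small"},
    "x2": {"colour": "red", "size": "small"},
    "x3": {"colour": "blue", "size": "small"},
    "x4": {"colour": "blue", "size": "large"},
}
X = {"x1", "x3"}
lower, upper = lower_upper(table, ["colour", "size"], X)
print(sorted(lower), sorted(upper))
```

Here x1 and x2 are indiscernible, so x1 cannot be placed in X with certainty; it appears only in the upper approximation.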
2.2 Uncertainty Measurements in RST
The uncertainty of a rough set is modeled by its approximation regions. Pawlak [2] proposed two numerical measurements for evaluating the uncertainty of an information system or a decision system in rough set theory: accuracy and roughness. Accuracy is defined as the ratio of the cardinalities of the lower and upper approximations of X, and roughness is obtained by subtracting the accuracy from one. Let $IS = \{U, C, V, f\}$ be an information system. For a subset $X \subseteq U$ and an attribute subset $P \subseteq C$, the accuracy and roughness of X with respect to P are defined as:
$\alpha_P(X) = \frac{|P_*(X)|}{|P^*(X)|}, \quad \beta_P(X) = 1 - \alpha_P(X)$  (2.3)
However, accuracy and roughness do not take the decision attribute into account, so they are not suitable for decision systems. Pawlak [2] therefore proposed approximation accuracy and approximation roughness for decision systems.
Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision system, let $U/d = \{D_1, D_2, \dots, D_m\}$ be the indiscernibility classes induced on U by the decision attribute d, and let $P \subseteq C$ be a condition attribute subset. The approximation accuracy and approximation roughness of $U/d$ by P are defined as:
$\alpha_P(U/d) = \frac{\sum_{D_i \in U/d} |P_*(D_i)|}{\sum_{D_i \in U/d} |P^*(D_i)|}$  (2.4)
$\beta_P(U/d) = 1 - \alpha_P(U/d)$  (2.5)
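As a rough illustration of (2.3)-(2.5), the following Python sketch computes the accuracy of a single set and the approximation accuracy of a decision partition. The helper names and the toy decision table (attributes `colour` and `d`) are assumptions made for this example only, and the target sets are assumed to be non-empty so that the denominators are positive.

```python
from collections import defaultdict


def classes_of(table, attrs):
    """Group objects by their values on attrs (the partition U/IND(attrs))."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())


def approximations(table, attrs, target):
    """Lower and upper approximation of `target` with respect to attrs."""
    lower, upper = set(), set()
    for cls in classes_of(table, attrs):
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return lower, upper


def accuracy(table, attrs, target):
    """alpha_P(X) from (2.3); roughness is 1 - accuracy."""
    lower, upper = approximations(table, attrs, target)
    return len(lower) / len(upper)


def approximation_accuracy(table, attrs, decision):
    """alpha_P(U/d) from (2.4): summed lower sizes over summed upper sizes."""
    num = den = 0
    for d_class in classes_of(table, [decision]):
        lower, upper = approximations(table, attrs, d_class)
        num += len(lower)
        den += len(upper)
    return num / den


# Hypothetical toy decision table, not taken from the paper.
table = {
    "x1": {"colour": "red", "d": "yes"},
    "x2": {"colour": "red", "d": "no"},
    "x3": {"colour": "blue", "d": "yes"},
    "x4": {"colour": "green", "d": "no"},
}
acc = accuracy(table, ["colour"], {"x1", "x3"})
app_acc = approximation_accuracy(table, ["colour"], "d")
print(acc, 1 - acc, app_acc, 1 - app_acc)
```

In this toy table the two red objects carry different decisions, so each decision class has one certain member and three possible members, which gives an accuracy of 1/3 in both cases.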
3. UNCERTAINTY MEASUREMENT BASED ON IN-SIM-DOMINANCE RELATION
In this section, the in-sim-dominance relation and its rough approximations are introduced, and then several uncertainty measurements based on the in-sim-dominance relation are defined.
3.1 In-sim-dominance Relation and Rough Approximations
Because many real-world problems involve both qualitative and quantitative attributes, following Greco et al. [20], the information system can be described as follows.
Let $IS = \{U, C, V, f\}$ be an information table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$, $C^{=}$ is the subset of nominal attributes, $C^{\succeq}$ is the subset of ordinal attributes, $C^{\sim}$ is the subset of quantitative attributes, and $C^{=} \cap C^{\succeq} = \emptyset$, $C^{=} \cap C^{\sim} = \emptyset$, $C^{\sim} \cap C^{\succeq} = \emptyset$. Furthermore, for any $P \subseteq C$, the subsets of P are denoted by $P^{=}$, $P^{\succeq}$ and $P^{\sim}$, respectively:
1) the subset of nominal attributes, i.e., $P^{=} = P \cap C^{=}$,
2) the subset of ordinal attributes, i.e., $P^{\succeq} = P \cap C^{\succeq}$,
3) the subset of quantitative attributes, i.e., $P^{\sim} = P \cap C^{\sim}$.
Furthermore, the key idea of the rough set philosophy is the approximation of one body of knowledge by another. Under the in-sim-dominance relation, the condition attributes include nominal, ordinal and quantitative attributes, and the decision classes are preference-ordered. The approximated knowledge is therefore a collection of upward and downward unions of decision classes, and the "granules of knowledge" are sets of objects defined using indiscernibility, similarity and outranking relations together.
Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision system, and assume that the decision attribute d partitions U into a finite number of decision classes $Cl_1, Cl_2, \dots, Cl_n$. The sets to be approximated are the upward union and downward union of decision classes, respectively [19]:
$Cl_t^{\succeq} = \bigcup_{s \ge t} Cl_s, \quad Cl_t^{\preceq} = \bigcup_{s \le t} Cl_s, \quad t = 1, 2, \dots, n.$
The statement $x \in Cl_t^{\succeq}$ means "x belongs at least to class $Cl_t$", and $x \in Cl_t^{\preceq}$ means "x belongs at most to class $Cl_t$".
We can then establish the indiscernibility relation on nominal attributes, the outranking relation on ordinal attributes, and the similarity or outranking relation on quantitative attributes. Binary relations established on different attributes can be considered jointly (moreover, other binary relations can be established as the problem requires).
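Definition 3.1 [21] Let $IS = \{U, C, V, f\}$ be an information table, $C^{=} \subseteq C$, $C^{\succeq} \subseteq C$, $C^{\sim} \subseteq C$, $P \subseteq C$, $P^{=} = P \cap C^{=}$, $P^{\succeq} = P \cap C^{\succeq}$, $P^{\sim} = P \cap C^{\sim}$. The in-sim-dominance relations of P on U are defined as follows:
$R_P^{l\succeq} = \{(x, y) \in U \times U : y I_P x \wedge y D_P^{\succeq} x \wedge y S_P x\}$  (3.1)
$R_P^{r\succeq} = \{(x, y) \in U \times U : y I_P x \wedge y D_P^{\succeq} x \wedge x S_P y\}$  (3.2)
$R_P^{l\preceq} = \{(x, y) \in U \times U : y I_P x \wedge y D_P^{\preceq} x \wedge y S_P x\}$  (3.3)
$R_P^{r\preceq} = \{(x, y) \in U \times U : y I_P x \wedge y D_P^{\preceq} x \wedge x S_P y\}$  (3.4)
where $I_P$ is the indiscernibility relation, $D_P^{\succeq}$ is the outranking relation, $D_P^{\preceq}$ is the outranked relation and $S_P$ is the similarity relation.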
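Definition 3.2 [21] The global classes of an object x with respect to P are defined as:
$R_P^{l\succeq}(x) = \{y \in U : y R_P^{l\succeq} x\}$  (3.5)
$R_P^{r\succeq}(x) = \{y \in U : y R_P^{r\succeq} x\}$  (3.6)
$R_P^{l\preceq}(x) = \{y \in U : y R_P^{l\preceq} x\}$  (3.7)
$R_P^{r\preceq}(x) = \{y \in U : y R_P^{r\preceq} x\}$  (3.8)
Theorem 3.1 Let $IS = \{U, C, V, f\}$ be an information table, $C^{=} \subseteq C$, $C^{\succeq} \subseteq C$, $C^{\sim} \subseteq C$. For the in-sim-dominance relations and $\forall x_i \in U$, if $Q \subseteq P \subseteq C$, then we have:
$R_P^{l\succeq}(x_i) \subseteq R_Q^{l\succeq}(x_i), \quad R_P^{r\succeq}(x_i) \subseteq R_Q^{r\succeq}(x_i);$
$R_P^{l\preceq}(x_i) \subseteq R_Q^{l\preceq}(x_i), \quad R_P^{r\preceq}(x_i) \subseteq R_Q^{r\preceq}(x_i).$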
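Proof: Since $Q \subseteq P$ and $x_i \in U$, we have $R_P^{l\succeq}(x_i) = I_{P_1}(x_i) \cap D_{P_2}^{\succeq}(x_i) \cap S_{P_3}^{l}(x_i)$ and $R_Q^{l\succeq}(x_i) = I_{Q_1}(x_i) \cap D_{Q_2}^{\succeq}(x_i) \cap S_{Q_3}^{l}(x_i)$, where $P_i$ and $Q_i$ are subsets of P and Q, respectively. It is easy to obtain that $I_{P_1}(x_i) \subseteq I_{Q_1}(x_i)$, $D_{P_2}^{\succeq}(x_i) \subseteq D_{Q_2}^{\succeq}(x_i)$ and $S_{P_3}^{l}(x_i) \subseteq S_{Q_3}^{l}(x_i)$. Thus $R_P^{l\succeq}(x_i) \subseteq R_Q^{l\succeq}(x_i)$. The other proofs are similar.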
Example 1. An example of the in-sim-dominance binary relation.
Table 1 [20] shows representative decisions of a decision maker (DM) concerning 8 warehouses described by 3 condition attributes (a, geographical region; b, area; c, capacity of the sales staff) and a decision attribute d, which specifies the DM's assignment of each warehouse to one of 2 classes: making a profit or making a loss.
Table 1. A decision table.
Warehouse   a   b     c        d
x1          A   500   Medium   Loss
x2          A   400   Good     Profit
x3          A   450   Medium   Profit
x4          B   400   Good     Loss
x5          B   475   Good     Profit
x6          B   425   Medium   Profit
x7          B   350   Medium   Profit
x8          B   350   Medium   Loss
Table 1 can be viewed as an example of a hybrid information system: the values of attribute a are nominal, the values of attribute b are quantitative, and the values of attribute c and of the decision attribute d are ordinal.
We divide the decision table into the upward union $Cl_{Profit}^{\succeq} = \{x_2, x_3, x_5, x_6, x_7\}$ and the downward union $Cl_{Loss}^{\preceq} = \{x_1, x_4, x_8\}$. On attribute a we establish the indiscernibility relation; on attribute b we establish the similarity relation, defined as [20]:
$S_b = \{(x_i, x_j) \in U \times U : |f(x_i, b) - f(x_j, b)| \le 0.1\, f(x_j, b)\}$  (3.9)
and on attribute c we establish the outranking relation, under which the value "Good" is better than "Medium". Let $DS = \langle U, C \cup \{d\}, V, f \rangle$, where $C = \{a, b, c\}$ is the set of condition attributes, d is the decision attribute, and $U = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\}$. The in-sim-dominance binary relation of C on U is then shown in Table 2.
Table 2. The in-sim-dominance relation of C on U.
Warehouse   R_C^{l≽}(x_i)   R_C^{r≽}(x_i)   R_C^{l≼}(x_i)   R_C^{r≼}(x_i)
x1          {x1, x3}        {x1}            {x1, x3}        {x1}
x2          {x2}            {x2}            {x2}            {x2}
x3          {x3}            {x1, x3}        {x3}            {x1, x3}
x4          {x4}            {x4}            {x4, x6}        {x4, x6}
x5          {x5}            {x5}            {x5}            {x5}
x6          {x4, x6}        {x4, x6}        {x6}            {x6}
x7          {x7, x8}        {x7, x8}        {x7, x8}        {x7, x8}
x8          {x7, x8}        {x7, x8}        {x7, x8}        {x7, x8}
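The global classes of Table 2 can be recomputed from Table 1 with a short Python sketch. It rests on a reading of the definitions that should be treated as an assumption: the "left" classes use "y is similar to x" and the "right" classes use "x is similar to y", with similarity on b taken from (3.9). The names `TABLE`, `RANK`, `similar` and `global_class` are hypothetical.

```python
# Table 1: object -> (a: nominal, b: quantitative, c: ordinal)
TABLE = {
    "x1": ("A", 500, "Medium"), "x2": ("A", 400, "Good"),
    "x3": ("A", 450, "Medium"), "x4": ("B", 400, "Good"),
    "x5": ("B", 475, "Good"),   "x6": ("B", 425, "Medium"),
    "x7": ("B", 350, "Medium"), "x8": ("B", 350, "Medium"),
}
RANK = {"Medium": 0, "Good": 1}  # "Good" is better than "Medium"


def similar(y, x):
    """y S x as in (3.9): |b(y) - b(x)| <= 0.1 * b(x), in exact integer arithmetic."""
    by, bx = TABLE[y][1], TABLE[x][1]
    return 10 * abs(by - bx) <= bx


def global_class(x, upward, left):
    """Global class of x: upward selects the outranking side, left the 'y S x' side."""
    members = set()
    for y in TABLE:
        indiscernible = TABLE[y][0] == TABLE[x][0]
        ry, rx = RANK[TABLE[y][2]], RANK[TABLE[x][2]]
        dominates = ry >= rx if upward else ry <= rx
        sim = similar(y, x) if left else similar(x, y)
        if indiscernible and dominates and sim:
            members.add(y)
    return members


# Columns in the order of Table 2: left/upward, right/upward, left/downward, right/downward.
for x in TABLE:
    row = [sorted(global_class(x, up, left))
           for up in (True, False) for left in (True, False)]
    print(x, row)
```

The similarity test is written with integers because the pair (x3, x1) sits exactly on the 10% threshold (a difference of 50 against 0.1 × 500), where floating-point rounding could otherwise change the result.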
Definition 3.3 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table. With respect to $P \subseteq C$, the set of all objects belonging to $Cl_t^{\succeq}$ without any left ambiguity constitutes the $P^l$-lower approximation of $Cl_t^{\succeq}$, denoted by $P_*^{l}(Cl_t^{\succeq})$, and the set of all objects that could belong to $Cl_t^{\succeq}$ constitutes the $P^l$-upper approximation of $Cl_t^{\succeq}$, denoted by $P^{l*}(Cl_t^{\succeq})$, for t = 1, 2, ..., n [21]:
$P_*^{l}(Cl_t^{\succeq}) = \{x \in U : R_P^{l\succeq}(x) \subseteq Cl_t^{\succeq}\}$  (3.10)
$P^{l*}(Cl_t^{\succeq}) = \{x \in U : R_P^{l\preceq}(x) \cap Cl_t^{\succeq} \neq \emptyset\}$  (3.11)
Definition 3.4 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table. With respect to $P \subseteq C$, the set of all objects belonging to $Cl_t^{\succeq}$ without any right ambiguity constitutes the $P^r$-lower approximation of $Cl_t^{\succeq}$, denoted by $P_*^{r}(Cl_t^{\succeq})$, and the set of all objects that could belong to $Cl_t^{\succeq}$ constitutes the $P^r$-upper approximation of $Cl_t^{\succeq}$, denoted by $P^{r*}(Cl_t^{\succeq})$, for t = 1, 2, ..., n [21]:
$P_*^{r}(Cl_t^{\succeq}) = \{x \in U : R_P^{r\succeq}(x) \subseteq Cl_t^{\succeq}\}$  (3.12)
$P^{r*}(Cl_t^{\succeq}) = \{x \in U : R_P^{r\preceq}(x) \cap Cl_t^{\succeq} \neq \emptyset\}$  (3.14)
We take the intersection of the $P^l$-lower and $P^r$-lower approximations of $Cl_t^{\succeq}$ as its lower approximation, and the union of the $P^l$-upper and $P^r$-upper approximations of $Cl_t^{\succeq}$ as its upper approximation [21]:
$P_*(Cl_t^{\succeq}) = P_*^{l}(Cl_t^{\succeq}) \cap P_*^{r}(Cl_t^{\succeq})$  (3.15)
$P^*(Cl_t^{\succeq}) = P^{l*}(Cl_t^{\succeq}) \cup P^{r*}(Cl_t^{\succeq})$  (3.16)
Similarly, we can define the lower and upper approximations of $Cl_t^{\preceq}$ as follows:
$P_*^{l}(Cl_t^{\preceq}) = \{x \in U : R_P^{l\preceq}(x) \subseteq Cl_t^{\preceq}\}$  (3.17)
$P^{l*}(Cl_t^{\preceq}) = \{x \in U : R_P^{l\succeq}(x) \cap Cl_t^{\preceq} \neq \emptyset\}$  (3.18)
$P_*^{r}(Cl_t^{\preceq}) = \{x \in U : R_P^{r\preceq}(x) \subseteq Cl_t^{\preceq}\}$  (3.19)
$P^{r*}(Cl_t^{\preceq}) = \{x \in U : R_P^{r\succeq}(x) \cap Cl_t^{\preceq} \neq \emptyset\}$  (3.20)
$P_*(Cl_t^{\preceq}) = P_*^{l}(Cl_t^{\preceq}) \cap P_*^{r}(Cl_t^{\preceq})$  (3.21)
$P^*(Cl_t^{\preceq}) = P^{l*}(Cl_t^{\preceq}) \cup P^{r*}(Cl_t^{\preceq})$  (3.22)
$P_*(Cl_t^{\succeq})$ and $P_*(Cl_t^{\preceq})$ consist of the objects that are precise; $P^*(Cl_t^{\succeq})$ and $P^*(Cl_t^{\preceq})$ consist of the objects that are precise, left ambiguous or right ambiguous.
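Theorem 3.2 [21] (Monotonic) For any $t \in \{1, 2, \dots, n\}$ and $Q \subseteq P \subseteq C$:
$Q_*^{l}(Cl_t^{\succeq}) \subseteq P_*^{l}(Cl_t^{\succeq}), \quad Q_*^{r}(Cl_t^{\succeq}) \subseteq P_*^{r}(Cl_t^{\succeq});$
$Q_*^{l}(Cl_t^{\preceq}) \subseteq P_*^{l}(Cl_t^{\preceq}), \quad Q_*^{r}(Cl_t^{\preceq}) \subseteq P_*^{r}(Cl_t^{\preceq});$
$P^{l*}(Cl_t^{\succeq}) \subseteq Q^{l*}(Cl_t^{\succeq}), \quad P^{r*}(Cl_t^{\succeq}) \subseteq Q^{r*}(Cl_t^{\succeq});$
$P^{l*}(Cl_t^{\preceq}) \subseteq Q^{l*}(Cl_t^{\preceq}), \quad P^{r*}(Cl_t^{\preceq}) \subseteq Q^{r*}(Cl_t^{\preceq}).$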
Example 2. Lower and upper approximations under the in-sim-dominance relation for Table 1.
Let the condition attribute subset be $P = C$, the upward union be $Cl_{Profit}^{\succeq}$ and the downward union be $Cl_{Loss}^{\preceq}$. The lower and upper approximations of $Cl_{Profit}^{\succeq}$ and $Cl_{Loss}^{\preceq}$ with respect to P are:
$P_*(Cl_{Profit}^{\succeq}) = P_*^{l}(Cl_{Profit}^{\succeq}) \cap P_*^{r}(Cl_{Profit}^{\succeq}) = \{x_2, x_5\}$
$P^*(Cl_{Profit}^{\succeq}) = P^{l*}(Cl_{Profit}^{\succeq}) \cup P^{r*}(Cl_{Profit}^{\succeq}) = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\}$
$P_*(Cl_{Loss}^{\preceq}) = P_*^{l}(Cl_{Loss}^{\preceq}) \cap P_*^{r}(Cl_{Loss}^{\preceq}) = \emptyset$
$P^*(Cl_{Loss}^{\preceq}) = P^{l*}(Cl_{Loss}^{\preceq}) \cup P^{r*}(Cl_{Loss}^{\preceq}) = \{x_1, x_3, x_4, x_6, x_7, x_8\}$
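The approximations of Example 2 can be checked mechanically. The sketch below hard-codes the global classes of Table 2 and applies the lower and upper approximation definitions; the dictionary layout and the function names `lower` and `upper` are hypothetical, and the pairing of left classes with left approximations (and right with right) is an assumed reading of Definitions 3.3 and 3.4.

```python
U = [f"x{i}" for i in range(1, 9)]


def ids(*nums):
    """Build a set of object names such as {'x1', 'x3'}."""
    return {f"x{n}" for n in nums}


def column(*classes):
    """Map x1..x8 to the given global classes, in order."""
    return dict(zip(U, classes))


# Global classes copied from Table 2 (left/right, upward/downward).
R = {
    "l_up": column(ids(1, 3), ids(2), ids(3), ids(4), ids(5), ids(4, 6), ids(7, 8), ids(7, 8)),
    "r_up": column(ids(1), ids(2), ids(1, 3), ids(4), ids(5), ids(4, 6), ids(7, 8), ids(7, 8)),
    "l_down": column(ids(1, 3), ids(2), ids(3), ids(4, 6), ids(5), ids(6), ids(7, 8), ids(7, 8)),
    "r_down": column(ids(1), ids(2), ids(1, 3), ids(4, 6), ids(5), ids(6), ids(7, 8), ids(7, 8)),
}


def lower(target, direction):
    """Objects whose left AND right global classes lie inside the union."""
    return {x for x in U
            if R["l_" + direction][x] <= target and R["r_" + direction][x] <= target}


def upper(target, direction):
    """Objects whose left OR right class of the opposite direction meets the union."""
    opposite = "down" if direction == "up" else "up"
    return {x for x in U
            if R["l_" + opposite][x] & target or R["r_" + opposite][x] & target}


PROFIT = ids(2, 3, 5, 6, 7)   # upward union
LOSS = ids(1, 4, 8)           # downward union
print(sorted(lower(PROFIT, "up")), sorted(upper(PROFIT, "up")))
print(sorted(lower(LOSS, "down")), sorted(upper(LOSS, "down")))
```

Only x2 and x5 are certain members of the profit union, and no warehouse is a certain member of the loss union, which is why the lower approximation of the downward union is empty.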
3.2. Uncertainty Measurements Based on In-sim-dominance Relation
Definition 3.5 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. For an attribute subset $P \subseteq C$, the accuracy of $Cl_t^{\succeq}$ and $Cl_t^{\preceq}$ with respect to P is defined as:
$\alpha_P(Cl_t^{\succeq}) = \frac{|P_*(Cl_t^{\succeq})|}{|P^*(Cl_t^{\succeq})|}, \quad \alpha_P(Cl_t^{\preceq}) = \frac{|P_*(Cl_t^{\preceq})|}{|P^*(Cl_t^{\preceq})|}$  (3.23)
Definition 3.6 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. For an attribute subset $P \subseteq C$, the roughness of $Cl_t^{\succeq}$ and $Cl_t^{\preceq}$ with respect to P is defined as:
$\rho_P(Cl_t^{\succeq}) = 1 - \alpha_P(Cl_t^{\succeq})$  (3.24)
$\rho_P(Cl_t^{\preceq}) = 1 - \alpha_P(Cl_t^{\preceq})$  (3.25)
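As a worked illustration, the accuracy and roughness can be evaluated on the sets obtained in Example 2 with P = C. The roughness is written here as $\rho$, and the numbers follow only from the cardinalities listed in that example.

```latex
\alpha_C(Cl_{Profit}^{\succeq})
  = \frac{|\{x_2, x_5\}|}{|\{x_1, \dots, x_8\}|} = \frac{2}{8} = 0.25,
\qquad
\rho_C(Cl_{Profit}^{\succeq}) = 1 - 0.25 = 0.75

\alpha_C(Cl_{Loss}^{\preceq})
  = \frac{|\emptyset|}{|\{x_1, x_3, x_4, x_6, x_7, x_8\}|} = \frac{0}{6} = 0,
\qquad
\rho_C(Cl_{Loss}^{\preceq}) = 1 - 0 = 1
```

The downward union reaches the maximum roughness of 1 because no warehouse can be assigned to the loss class with certainty.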
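Theorem 3.3 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. If $Q \subseteq P \subseteq C$, then $\rho_Q(Cl_t^{\succeq}) \ge \rho_P(Cl_t^{\succeq})$ and $\rho_Q(Cl_t^{\preceq}) \ge \rho_P(Cl_t^{\preceq})$.
Proof. Since $Q \subseteq P \subseteq C$, from the definitions $P_*(Cl_t^{\succeq}) = P_*^{l}(Cl_t^{\succeq}) \cap P_*^{r}(Cl_t^{\succeq})$ and $P^*(Cl_t^{\succeq}) = P^{l*}(Cl_t^{\succeq}) \cup P^{r*}(Cl_t^{\succeq})$, and according to Theorem 3.2, it is easy to obtain that
$(Q_*^{l}(Cl_t^{\succeq}) \cap Q_*^{r}(Cl_t^{\succeq})) \subseteq (P_*^{l}(Cl_t^{\succeq}) \cap P_*^{r}(Cl_t^{\succeq}))$
and
$(P^{l*}(Cl_t^{\succeq}) \cup P^{r*}(Cl_t^{\succeq})) \subseteq (Q^{l*}(Cl_t^{\succeq}) \cup Q^{r*}(Cl_t^{\succeq}))$,
so $Q_*(Cl_t^{\succeq}) \subseteq P_*(Cl_t^{\succeq})$ and $P^*(Cl_t^{\succeq}) \subseteq Q^*(Cl_t^{\succeq})$. Then
$\frac{|Q_*(Cl_t^{\succeq})|}{|Q^*(Cl_t^{\succeq})|} \le \frac{|P_*(Cl_t^{\succeq})|}{|P^*(Cl_t^{\succeq})|}$,
thus $\alpha_Q(Cl_t^{\succeq}) \le \alpha_P(Cl_t^{\succeq})$. Therefore $\rho_Q(Cl_t^{\succeq}) \ge \rho_P(Cl_t^{\succeq})$. The proof of $\rho_Q(Cl_t^{\preceq}) \ge \rho_P(Cl_t^{\preceq})$ is similar.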
Definition 3.7 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$, and let $Y(Cl_t^{\succeq}) = \{Cl_t, Cl_{t+1}, \dots, Cl_n\}$ and $Y(Cl_t^{\preceq}) = \{Cl_1, Cl_2, \dots, Cl_t\}$ be the indiscernibility classes induced by the decision attribute d on the upward union $Cl_t^{\succeq}$ and the downward union $Cl_t^{\preceq}$ of decision classes, with condition attribute subset $P \subseteq C$. The approximation accuracy of $Y(Cl_t^{\succeq})$ and $Y(Cl_t^{\preceq})$ with respect to P under the in-sim-dominance relation is defined as:
$\alpha_P(Y(Cl_t^{\succeq})) = \frac{\sum_{Y_i \in Y(Cl_t^{\succeq})} |P_*(Y_i)|}{\sum_{Y_i \in Y(Cl_t^{\succeq})} |P^*(Y_i)|}$  (3.26)
$\alpha_P(Y(Cl_t^{\preceq})) = \frac{\sum_{Y_i \in Y(Cl_t^{\preceq})} |P_*(Y_i)|}{\sum_{Y_i \in Y(Cl_t^{\preceq})} |P^*(Y_i)|}$  (3.27)
As in Definition 3.6, the approximation roughness under the in-sim-dominance relation is then defined from the approximation accuracy:
$\rho_P(Y(Cl_t^{\succeq})) = 1 - \alpha_P(Y(Cl_t^{\succeq}))$  (3.28)
$\rho_P(Y(Cl_t^{\preceq})) = 1 - \alpha_P(Y(Cl_t^{\preceq}))$  (3.29)
Theorem 3.4 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. If $Q \subseteq P \subseteq C$, then $\rho_P(Y(Cl_t^{\succeq})) \le \rho_Q(Y(Cl_t^{\succeq}))$ and $\rho_P(Y(Cl_t^{\preceq})) \le \rho_Q(Y(Cl_t^{\preceq}))$.
Proof. Since $Q \subseteq P$, according to Theorem 3.1 we know that $\forall x_i \in U$, $R_P^{l\succeq}(x_i) \cap R_P^{r\succeq}(x_i) = R_P^{\succeq}(x_i) \subseteq R_Q^{\succeq}(x_i) = R_Q^{l\succeq}(x_i) \cap R_Q^{r\succeq}(x_i)$. Consequently, for any $X \subseteq Cl_t^{\succeq}$, $R_Q^{\succeq}(x_i) \subseteq X$ implies $R_P^{\succeq}(x_i) \subseteq X$. Hence $|P_*(X)| \ge |Q_*(X)|$, so $\forall Y_i \in Y(Cl_t^{\succeq})$, $|P_*(Y_i)| \ge |Q_*(Y_i)|$.
On the other hand, since $\forall x_i \in U$, $R_P^{\succeq}(x_i) \subseteq R_Q^{\succeq}(x_i)$, $R_P^{\succeq}(x_i) \cap X \neq \emptyset$ implies $R_Q^{\succeq}(x_i) \cap X \neq \emptyset$. Hence, $\forall X \subseteq Cl_t^{\succeq}$, it follows that $|P^*(X)| \le |Q^*(X)|$, so $\forall Y_i \in Y(Cl_t^{\succeq})$, $|P^*(Y_i)| \le |Q^*(Y_i)|$. Consequently, we have $\rho_P(Y(Cl_t^{\succeq})) \le \rho_Q(Y(Cl_t^{\succeq}))$. The proof of $\rho_P(Y(Cl_t^{\preceq})) \le \rho_Q(Y(Cl_t^{\preceq}))$ is similar.
3.3. Entropy-based Uncertainty Measurements Based on In-sim-dominance Relation
In information theory, Shannon introduced entropy as a measurement of the information content of a data set [29]. Entropy can also be used as an uncertainty measurement in rough set theory, as several earlier studies have shown [24-28].
In this part, we define a new entropy, called hybrid entropy, based on the in-sim-dominance relation. Two further measurements built on hybrid entropy are then proposed: the entropy-based approximation roughness of the upward union and of the downward union.
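Definition 3.8 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. For any attribute subset $P \subseteq C$, the hybrid entropy with respect to P is defined as:
$H(P) = -\frac{1}{2} \sum_{i=1}^{|U|} \frac{|R_P^{l\succeq}(x_i) \cap R_P^{r\succeq}(x_i)|}{|U|^2} \log \frac{1}{|R_P^{l\succeq}(x_i) \cap R_P^{r\succeq}(x_i)|} - \frac{1}{2} \sum_{i=1}^{|U|} \frac{|R_P^{l\preceq}(x_i) \cap R_P^{r\preceq}(x_i)|}{|U|^2} \log \frac{1}{|R_P^{l\preceq}(x_i) \cap R_P^{r\preceq}(x_i)|}$  (3.30)
The hybrid entropy achieves the maximum value $\frac{|Cl_t^{\succeq}| \log |Cl_t^{\succeq}| + |Cl_{t-1}^{\preceq}| \log |Cl_{t-1}^{\preceq}|}{2|U|}$ when, $\forall x_i \in U$, $R_P^{l\succeq}(x_i) \cap R_P^{r\succeq}(x_i) = Cl_t^{\succeq}$ and $R_P^{l\preceq}(x_i) \cap R_P^{r\preceq}(x_i) = Cl_{t-1}^{\preceq}$, and it achieves the minimum value 0 when, $\forall x_i \in U$, $R_P^{l\succeq}(x_i) \cap R_P^{r\succeq}(x_i) = \{x_i\}$ and $R_P^{l\preceq}(x_i) \cap R_P^{r\preceq}(x_i) = \{x_i\}$. Hence, we have
$0 \le H(P) \le \frac{|Cl_t^{\succeq}| \log |Cl_t^{\succeq}| + |Cl_{t-1}^{\preceq}| \log |Cl_{t-1}^{\preceq}|}{2|U|}$.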
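The hybrid entropy can be evaluated numerically from the global classes of Table 2. The following Python sketch is a minimal illustration under two assumptions: the denominator in each term is $|U|^2$, and the logarithm is taken to base 2 (the text does not fix a base). The function name `hybrid_entropy` and the data layout are hypothetical.

```python
import math

U = [f"x{i}" for i in range(1, 9)]


def ids(*nums):
    """Build a set of object names such as {'x1', 'x3'}."""
    return {f"x{n}" for n in nums}


def column(*classes):
    """Map x1..x8 to the given global classes, in order."""
    return dict(zip(U, classes))


# Global classes copied from Table 2 (P = C).
L_UP = column(ids(1, 3), ids(2), ids(3), ids(4), ids(5), ids(4, 6), ids(7, 8), ids(7, 8))
R_UP = column(ids(1), ids(2), ids(1, 3), ids(4), ids(5), ids(4, 6), ids(7, 8), ids(7, 8))
L_DOWN = column(ids(1, 3), ids(2), ids(3), ids(4, 6), ids(5), ids(6), ids(7, 8), ids(7, 8))
R_DOWN = column(ids(1), ids(2), ids(1, 3), ids(4, 6), ids(5), ids(6), ids(7, 8), ids(7, 8))


def hybrid_entropy(l_up, r_up, l_down, r_down, base=2):
    """Hybrid entropy: one term per object for each of the two intersections.

    Assumption: log base 2 by default; each object belongs to its own
    global classes, so every intersection has at least one element.
    """
    n = len(l_up)
    total = 0.0
    for x in l_up:
        for left, right in ((l_up, r_up), (l_down, r_down)):
            size = len(left[x] & right[x])
            total -= 0.5 * size / n**2 * math.log(1 / size, base)
    return total


H_C = hybrid_entropy(L_UP, R_UP, L_DOWN, R_DOWN)
print(H_C)
```

Objects whose left and right classes intersect in a singleton contribute nothing, so only the six two-element intersections (x6, x7, x8 upward and x4, x7, x8 downward) add to the entropy.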
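Theorem 3.5 (Monotonic) Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. If $Q \subseteq P \subseteq C$, then $H(P) \le H(Q)$.
Proof. Since $Q \subseteq P \subseteq C$, according to Theorem 3.1 we have $R_P^{l\succeq}(x_i) \cap R_P^{r\succeq}(x_i) \subseteq R_Q^{l\succeq}(x_i) \cap R_Q^{r\succeq}(x_i)$ and $R_P^{l\preceq}(x_i) \cap R_P^{r\preceq}(x_i) \subseteq R_Q^{l\preceq}(x_i) \cap R_Q^{r\preceq}(x_i)$. Then
$\log \frac{1}{|R_P^{l\succeq}(x_i) \cap R_P^{r\succeq}(x_i)|} \ge \log \frac{1}{|R_Q^{l\succeq}(x_i) \cap R_Q^{r\succeq}(x_i)|}$ and $\log \frac{1}{|R_P^{l\preceq}(x_i) \cap R_P^{r\preceq}(x_i)|} \ge \log \frac{1}{|R_Q^{l\preceq}(x_i) \cap R_Q^{r\preceq}(x_i)|}$.
Each term of (3.30) is therefore no larger for P than for Q, because both the cardinality and the logarithm of the cardinality are no larger for P. Therefore $H(P) \le H(Q)$.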
Theorem 3.6 (Equivalence) Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. For $P, Q \subseteq C$, if $\forall x_i \in U$, $R_P^{l\succeq}(x_i) \cap R_P^{r\succeq}(x_i) = R_Q^{l\succeq}(x_i) \cap R_Q^{r\succeq}(x_i)$ and $R_P^{l\preceq}(x_i) \cap R_P^{r\preceq}(x_i) = R_Q^{l\preceq}(x_i) \cap R_Q^{r\preceq}(x_i)$, then $H(P) = H(Q)$.
Proof. It follows directly from Definitions 3.2 and 3.8.
Definition 3.9 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. For an attribute subset $P \subseteq C$, the entropy-based roughness of $Cl_t^{\succeq}$ and $Cl_t^{\preceq}$ with respect to P under the in-sim-dominance relation is defined as follows:
$HR_P(Cl_t^{\succeq}) = \rho_P(Cl_t^{\succeq}) \, H(P)$  (3.31)
$HR_P(Cl_t^{\preceq}) = \rho_P(Cl_t^{\preceq}) \, H(P)$  (3.32)
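A worked instance of the entropy-based roughness on Table 1 with P = C may help. It assumes a base-2 logarithm (the base is not fixed in the text) and uses the cardinalities from Table 2 and Example 2: three objects (x6, x7, x8) have a two-element upward intersection, three objects (x4, x7, x8) have a two-element downward intersection, and all other intersections are singletons.

```latex
H(C) = \frac{1}{2} \cdot \frac{3 \cdot 2 \log_2 2}{8^2}
     + \frac{1}{2} \cdot \frac{3 \cdot 2 \log_2 2}{8^2}
     = \frac{6}{64} = 0.09375

HR_C(Cl_{Profit}^{\succeq}) = \rho_C(Cl_{Profit}^{\succeq}) \, H(C)
     = 0.75 \times 0.09375 \approx 0.0703

HR_C(Cl_{Loss}^{\preceq}) = \rho_C(Cl_{Loss}^{\preceq}) \, H(C)
     = 1 \times 0.09375 \approx 0.0938
```

The roughness values 0.75 and 1 come from the ratios 2/8 and 0/6 of the lower to the upper approximation sizes in Example 2.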
Theorem 3.7 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. If $Q \subseteq P \subseteq C$, then $HR_P(Cl_t^{\succeq}) \le HR_Q(Cl_t^{\succeq})$ and $HR_P(Cl_t^{\preceq}) \le HR_Q(Cl_t^{\preceq})$.
Proof. Since $Q \subseteq P \subseteq C$, we have $\rho_P(Cl_t^{\succeq}) \le \rho_Q(Cl_t^{\succeq})$ by Theorem 3.3 and $H(P) \le H(Q)$ by Theorem 3.5. Hence, we get $HR_P(Cl_t^{\succeq}) \le HR_Q(Cl_t^{\succeq})$. The proof of $HR_P(Cl_t^{\preceq}) \le HR_Q(Cl_t^{\preceq})$ is similar.
Definition 3.10 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$, and let $Y(Cl_t^{\succeq}) = \{Cl_t, Cl_{t+1}, \dots, Cl_n\}$ and $Y(Cl_t^{\preceq}) = \{Cl_1, Cl_2, \dots, Cl_t\}$ be the indiscernibility classes induced by the decision attribute d on the upward union $Cl_t^{\succeq}$ and the downward union $Cl_t^{\preceq}$ of decision classes, with condition attribute subset $P \subseteq C$. The entropy-based approximation roughness of $Y(Cl_t^{\succeq})$ and $Y(Cl_t^{\preceq})$ with respect to P under the in-sim-dominance relation is defined as follows:
$HR_P(Y(Cl_t^{\succeq})) = \rho_P(Y(Cl_t^{\succeq})) \, H(P)$  (3.33)
$HR_P(Y(Cl_t^{\preceq})) = \rho_P(Y(Cl_t^{\preceq})) \, H(P)$  (3.34)
Theorem 3.8 Let $DS = \langle U, C \cup \{d\}, V, f \rangle$ be a decision table, where $C = C^{=} \cup C^{\succeq} \cup C^{\sim}$. If $Q \subseteq P \subseteq C$, then we have $HR_P(Y(Cl_t^{\succeq})) \le HR_Q(Y(Cl_t^{\succeq}))$ and $HR_P(Y(Cl_t^{\preceq})) \le HR_Q(Y(Cl_t^{\preceq}))$.
Proof. Since $Q \subseteq P \subseteq C$, we have $\rho_P(Y(Cl_t^{\succeq})) \le \rho_Q(Y(Cl_t^{\succeq}))$ by Theorem 3.4 and $H(P) \le H(Q)$ by Theorem 3.5. Hence, we get $HR_P(Y(Cl_t^{\succeq})) \le HR_Q(Y(Cl_t^{\succeq}))$. The proof of $HR_P(Y(Cl_t^{\preceq})) \le HR_Q(Y(Cl_t^{\preceq}))$ is similar.
Example 3. Comparison of approximation roughness and entropy-based approximation roughness.
Using the in-sim-dominance relation established in Example 1, we calculate the approximation roughness and the entropy-based approximation roughness and compare them. The results are shown in Fig. 1-2. Both $\rho_P(Y(Cl_{Profit}^{\succeq}))$ and $\rho_P(Y(Cl_{Loss}^{\preceq}))$ remain unchanged as the number of attributes increases from 2 to 3. By contrast, the entropy-based approximation roughness distinguishes the two cases clearly. This means that the entropy-based approximation roughness $HR_P(Y(Cl_t^{\succeq}))$ and $HR_P(Y(Cl_{t-1}^{\preceq}))$ are more powerful for evaluating uncertainty in some cases.
Fig.1. Approximation roughness vs. entropy-based approximation roughness of upward-union.
Fig.2. Approximation roughness vs. entropy-based approximation roughness of downward-union.
4. EXPERIMENTS
To verify the effectiveness of the uncertainty measures proposed above, we conduct four experiments on two real-life data sets from the UCI repository of machine learning databases: the Heart disease data set and the Credit approval data set. In each experiment, we analyze the properties of the attributes of the data set, establish the corresponding binary relation for each attribute, and then build the in-sim-dominance relation.
The Heart disease data set has 6 real attributes, 1 ordered attribute, 3 binary attributes, 3 nominal attributes and 1 decision attribute. It contains 270 objects, each labeled as having heart disease or not. The results are shown in Fig. 3-4.
The Credit approval data set has 6 categorical attributes, 3 binary attributes, 6 qualitative attributes and 1 decision attribute. It contains 690 objects that belong to the class - or the class +. Because two attributes give the same classification and some values are missing, preprocessing leaves 14 condition attributes and 653 objects. The results are shown in Fig. 5-6.
Fig.3. The result of the Heart disease data set's upward-union.
Fig.4. The result of the Heart disease data set's downward-union.
Fig.5. The result of the Credit approval data set's upward-union.
Fig.6. The result of the Credit approval data set's downward-union.
Fig. 3-6 show that the values of both the entropy-based approximation roughness and the approximation roughness decrease as the number of attributes grows. The uncertainty thus decreases as attributes are added; in other words, supplying more available knowledge reduces the uncertainty. The experiments demonstrate the validity of the two uncertainty measurements based on the in-sim-dominance relation. Fig. 3-6 also show that the value of the approximation roughness does not change when the number of attributes increases from 1 to 2 in the Heart disease and Credit approval data sets. By contrast, the entropy-based approximation roughness evaluates the uncertainty of a hybrid information system more accurately.
5. CONCLUSIONS
In this paper, we studied uncertainty measurement based on the in-sim-dominance relation. The roughness and approximation roughness measurements were first extended to hybrid information systems. Based on the hybrid entropy, we then proposed the entropy-based roughness and the entropy-based approximation roughness of the upward union and downward union to measure the uncertainty of a hybrid information system. The experimental results demonstrate that the approximation roughness and the entropy-based approximation roughness measurements are useful and effective for evaluating the uncertainty of a hybrid information system. The results also show that the entropy-based approximation roughness evaluates the uncertainty more clearly than the approximation roughness. Moreover, other binary relations, such as the tolerance relation, can be combined with the in-sim-dominance relation to meet the needs of special problems.
Acknowledgment
This work is supported by National Natural
Science Foundation of China (No. 61272060),
and Key Natural Science Foundation of
Chongqing (No. CSTC2013jjB40003).
REFERENCES
[1] Z. Pawlak. Rough sets. International Journal of Computer and Information Sciences, 11(5):341-356, 1982.
[2] Z. Pawlak. Rough Sets: Theoretical Aspects of Reasoning about Data. System Theory, Knowledge Engineering and Problem Solving, vol. 9, 1991.
[3] F. Hu, G.Y. Wang. Quick reduction algorithms based on attribute order. Chinese Journal of Computers, 8:029, 2007.
[4] S. Hirano, S. Tsumoto. Segmentation of medical images based on approximations in rough set theory. In Rough Sets and Current Trends in Computing, pages 554-563. Springer, 2002.
[5] Z. Pawlak. Rough set approach to knowledge-based decision support. European Journal of Operational Research, 99(1):48-57, 1997.
[6] Y.Z. Liu, H.Y. Xuan, G.X. Lin. Application research on tax forecasting in China based on rough set theory. Systems Engineering-Theory & Practice, 10:017, 2004.
[7] R.R. Tan. Rule-based life cycle impact assessment using modified rough set induction methodology. Environmental Modeling & Software, 20(5):509-513, 2005.
[8] K. Thangavel, A. Pethalakshmi. Dimensionality reduction based on rough set theory: A review. Applied Soft Computing, 9(1):1-12, 2009.
[9] J.H. Zhang, Y.Y. Wang. A rough margin based support vector machine. Information Sciences, 178(9):2204-2214, 2008.
[10] T. Herawan, M.M. Deris, J.H. Abawajy. A rough set approach for selecting clustering attribute. Knowledge-Based Systems, 23(3):220-231, 2010.
[11] M. Kryszkiewicz. Rough set approach to incomplete information systems. Information Sciences, 112(1):39-49, 1998.
[12] M. Kryszkiewicz. Rules in incomplete information systems. Information Sciences, 113(3):271-292, 1999.
[13] J. Stefanowski, A. Tsoukias. On the extension of rough sets under incomplete information. In New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, pages 73-81. Springer, 1999.
[14] J. Stefanowski, A. Tsoukias. Incomplete information tables and rough classification. Computational Intelligence, 17(3):545-566, 2001.
[15] J. Stefanowski, A. Tsoukias. Valued tolerance and decision rules. In Rough Sets and Current Trends in Computing, pages 212-219. Springer, 2001.
[16] G.Y. Wang. Extension of rough set under incomplete information systems. In Proceedings of the 2002 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE'02), volume 2, pages 1098-1103. IEEE, 2002.
[17] J.W. Grzymala-Busse. Characteristic relations for incomplete data: A generalization of the indiscernibility relation. In Rough Sets and Current Trends in Computing, pages 244-253. Springer, 2004.
[18] J.W. Grzymala-Busse. Rough set strategies to data with missing attribute values. In Foundations and Novel Approaches in Data Mining, pages 197-212. Springer, 2006.
[19] S. Greco, B. Matarazzo, R. Slowinski. Rough sets theory for multicriteria decision analysis. European Journal of Operational Research, 129(1):1-47, 2001.
[20] S. Greco, B. Matarazzo, R. Slowinski. Rough sets methodology for sorting problems in presence of multiple attributes and criteria. European Journal of Operational Research, 138(2):247-259, 2002.
[21] L.P. An, L.Y. Tong. Rough approximations based on intersection of indiscernibility, similarity and outranking relations. Knowledge-Based Systems, 23(6):555-562, 2010.
[22] J.H. Dai, W.T. Wang, J.S. Mi. Uncertainty measurement for interval-valued information systems. Information Sciences, 251:63-78, 2013.
[23] J.H. Dai, Q. Xu. Approximations and uncertainty measures in incomplete information systems. Information Sciences, 198:62-80, 2012.
[24] T. Beaubouef, F.E. Petry, G. Arora. Information-theoretic measures of uncertainty for rough relational database. Information Sciences, 109(1-4):185-195, 1998.
[25] Y.Y. Yao, S.K.M. Wong, C.J. Butz. On information-theoretic measures of attribute importance. In Methodologies for Knowledge Discovery and Data Mining, pages 133-137. Springer Berlin Heidelberg, 1999.
[26] Y.Y. Yao, L.Q. Zhao. A measurement theory view on the granularity of partitions. Information Sciences, 213:1-13, 2012.
[27] Y.Y. Yao, X.F. Deng. Quantitative rough sets based on subsethood measures. Information Sciences, 267:306-322, 2014.
[28] Y.H. Qian, J.Y. Liang. Combination entropy and combination granulation in rough set theory. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 16(02):179-193, 2008.
[29] C. Shannon. The mathematical theory of communication. Bell Syst. Tech., 27:379-423, 1948.