
IJCA, Vol. 16, No. 4, Dec. 2009

ISCA Copyright© 2009


An Innovative Contribution to Flexible Query Through the Fusion of Conceptual Clustering, Fuzzy Logic, and Formal Concept Analysis

Amel Grissa Touzi* and Minyar Sassi* LSTS – ENIT, le Belvédère, 1002 Tunis, TUNISIA

Habib Ounelli†

URPAH – FST, Campus Universitaire, 1060 Tunis, TUNISIA

Abstract In this paper, we propose a contribution to flexible database querying. It is based on an extension of ordered lattice theory that combines fuzzy logic with Formal Concept Analysis. Our approach consists of two steps: the first organizes the data, which allows us to deduce their semantics; the second queries the organized data to find the relevant data sources for a given query, defining a concept view from the data dependencies of the query. The contributions of this approach are a) taking into account the interdependence of the query search criteria, b) an optimal search for the relevant data sources for a given query, and c) the ranking of the results. Key Words: Flexible querying, data mining, FCA, fuzzy logic, clustering.

1 Introduction

The diversification of database (DB) applications has exposed the limits of Relational DataBase Management Systems (DBMS), particularly in querying [2]. Traditional querying of a Relational DB (RDB) is described as "Boolean querying" in the sense that the user formulates a query, in SQL for example, that either returns a result or returns nothing at all. This kind of querying poses problems for certain applications: (1) the user must know all the details of the database schema and data, and (2) the user cannot express preferences or use imprecise linguistic terms (such as "moderate" or "average") to better characterize the sought-after data, which is often a legitimate request.

The aim of flexible database querying is to extend this binary behavior by introducing preferences into the query criteria. Thus, an element returned by a query is more or less relevant according to the preferences it satisfies. Several works in the literature have introduced flexible database querying [4]. Most of them use the fuzzy set formalism to model linguistic terms such as "moderate" and "average" and to evaluate predicates containing such terms [2].

In [21], a flexible and cooperative approach to database querying within the framework of fuzzy set theory was proposed. Compared with similar approaches, it makes two promising contributions. The first is that it takes into account the semantic dependencies between the query search criteria to determine whether or not the query is realizable. The second is its cooperative aspect in flexible querying. To provide these functionalities, the authors proposed constructing a mono-attribute Type Abstraction Hierarchy (TAH) and a Multi-attribute Type Abstraction Hierarchy (MTAH) [8, 9]. The problems lie in: 1) the generation of the TAHs and the MTAH from the relaxable attributes, 2) the storage and indexing of such structures, and 3) the update of the MTAH.

* BP37.

To address these problems, we propose the following contributions: 1) the automatic generation of TAHs and the MTAH from relaxable attributes, 2) the search for relevant data sources for a given query, 3) the detection of query unrealizability, and 4) the ranking of the results.

To achieve these, we propose a new flexible querying approach that combines two data analysis methods: Formal Concept Analysis (FCA), based on an extension of ordered lattice theory, and fuzzy clustering. It allows the database to be queried while returning relevant answers.

The use of these methods is justified as follows. First, fuzzy clustering has been a very successful data analysis technique in many domains [27]; it allows data to belong to several groups (or clusters) simultaneously, with different membership degrees. Second, FCA is a knowledge representation method that exploits the structure of formal concepts [29].

The rest of the paper is organized as follows. Section 2 reviews flexible querying and some data analysis methods. Section 3 presents the problems addressed and the contributions of this paper. Section 4 describes our flexible database querying approach. Section 5 presents an example of query relaxation. Section 6 compares our approach with other approaches. Section 7 studies the complexity of our approach. Finally, Section 8 concludes the paper and outlines future work.

2 Background

In this section, we first present the basic concepts and the principal flexible database querying approaches. We then present the use of two data analysis methods: fuzzy clustering and FCA.


[Figure 1 panels: (a) decomposition of the database content according to the Price attribute only (Price ∈ [50, 60]); (b) decomposition according to two attributes, Price and Surface (Price ∈ [50, 60], Surface ∈ [10, 30]).]

2.1 Flexible Querying

2.1.1 Basic Concepts. In the database field, several works introduce the notion of imprecision, both at the level of the stored data and at the querying level [15]. To complement such works, fundamental research has been undertaken on the following problems:

- flexible query formulation and evaluation,
- description and processing of vague or fuzzy data,
- definition and use of fuzzy dependencies, and
- fuzzy data mining.

Our work concerns introducing flexibility into query formulation.

Traditional database querying uses a query to find elements satisfying a Boolean condition. In certain applications, the user may find it difficult to characterize precisely and clearly the information he or she is seeking. The user may also wish to express preferences over the search criteria, with various degrees of importance among them. This is why the concept of flexible query was proposed in database systems.

Definition 1: A flexible query is a query that contains vague descriptions and/or vague terms.

Consider, for example, a person who wishes to find, in a database of advertisements, an apartment close to the town center at an affordable price. To express such preferences, this person can formulate a flexible query containing the terms "near" and "affordable". He or she can also state that the price criterion is more important than the distance criterion.
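To make Definition 1 concrete, here is a minimal sketch of how such a query could be evaluated with fuzzy sets (the fourth family of approaches listed in Section 2.1.2). Everything specific in it is an illustrative assumption, not taken from this paper: the linear membership shapes and breakpoints for "near" and "affordable", the importance weights, the sample advertisements, and the weighted-minimum aggregation min_i max(1 - w_i, mu_i), a standard way of attaching importance to fuzzy criteria.

```python
def decreasing(full_until, zero_from):
    """Membership function equal to 1 up to `full_until`, 0 from `zero_from`, linear in between."""
    def mu(x):
        if x <= full_until:
            return 1.0
        if x >= zero_from:
            return 0.0
        return (zero_from - x) / (zero_from - full_until)
    return mu

# Hypothetical vocabulary: distance to the town center in km, monthly price in arbitrary units.
near = decreasing(1.0, 3.0)
affordable = decreasing(500.0, 800.0)

def weighted_min(degrees, weights):
    """Weighted conjunction min_i max(1 - w_i, mu_i): a weight of 1 makes a criterion mandatory."""
    return min(max(1.0 - w, mu) for mu, w in zip(degrees, weights))

def evaluate(ads, w_price=1.0, w_distance=0.6):
    """Rank advertisements (name, distance, price) by their degree of satisfaction."""
    scored = []
    for name, distance, price in ads:
        degree = weighted_min([affordable(price), near(distance)], [w_price, w_distance])
        if degree > 0.0:  # drop advertisements that do not satisfy the query at all
            scored.append((name, round(degree, 3)))
    return sorted(scored, key=lambda item: item[1], reverse=True)

ads = [("A1", 0.5, 650.0), ("A2", 2.5, 520.0), ("A3", 4.0, 450.0), ("A4", 1.5, 900.0)]
print(evaluate(ads))  # [('A1', 0.5), ('A2', 0.4), ('A3', 0.4)]
```

Because the price criterion has weight 1, A4 (too expensive) is excluded outright, whereas A3, which is not near at all, still scores 0.4 because distance matters less.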

2.1.2 Principal Flexible Querying Approaches. In this section, we present the essential ideas of the main query relaxation approaches closest to ours.

Four principal approaches have been proposed to express and evaluate flexible queries [3, 23]:

1) use of secondary criteria,
2) use of distances and similarities,
3) expression of preferences with linguistic terms, and
4) modeling of imprecision with fuzzy subset theory.

These approaches differ primarily in the procedure used to find the set of values similar to those given by the user. A second difference is the formalism used to model the imperfection of the real world. Relaxation in the CoBase system [9] relies on decomposing the data domain by automatically grouping elements that share a set of common characteristics. An example illustrating this approach is given in Figure 1. The grouping is performed by automatically constructing a Type Abstraction Hierarchy (TAH) for a single attribute, and a Multi-attribute Type Abstraction Hierarchy (MTAH) when the decomposition involves several attributes. A TAH represents the database domains at several levels of abstraction; the root of the hierarchy is the most abstract representation. CoBase [9] uses this data organization to perform query relaxation, which consists in extending the search space with the other values of the domain reached through a succession of generalizations and specializations in the hierarchy.
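The generalize-then-specialize idea behind TAH-based relaxation can be sketched in a few lines. This is a minimal illustration, not CoBase's implementation: the `TAHNode` class, the `relax` helper and the Price intervals (loosely echoing the Price ∈ [50, 60] example of Figure 1) are hypothetical. The query interval is first mapped to the most specific node that covers it, and relaxation then climbs one or more levels towards the root, returning a wider interval.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TAHNode:
    """One node of a single-attribute TAH: an interval of the attribute's domain."""
    low: float
    high: float
    children: list[TAHNode] = field(default_factory=list)
    # Excluded from repr/eq so that printing a node does not recurse through its parent.
    parent: Optional[TAHNode] = field(default=None, repr=False, compare=False)

    def add(self, low: float, high: float) -> TAHNode:
        child = TAHNode(low, high, parent=self)
        self.children.append(child)
        return child

    def covers(self, low: float, high: float) -> bool:
        return self.low <= low and high <= self.high

def most_specific(node: TAHNode, low: float, high: float) -> TAHNode:
    """Specialize: descend while some child still covers the query interval."""
    for child in node.children:
        if child.covers(low, high):
            return most_specific(child, low, high)
    return node

def relax(root: TAHNode, low: float, high: float, levels: int = 1) -> tuple[float, float]:
    """Generalize: climb `levels` steps above the most specific covering node."""
    node = most_specific(root, low, high)
    for _ in range(levels):
        if node.parent is None:  # already at the most abstract level
            break
        node = node.parent
    return node.low, node.high

# Hypothetical TAH for Price.
price = TAHNode(0, 100)
price.add(0, 50)
upper = price.add(50, 100)
upper.add(50, 60)
upper.add(60, 100)

print(relax(price, 50, 60))            # (50, 100)
print(relax(price, 50, 60, levels=2))  # (0, 100)
```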

Figure 1: Example of relaxation in the CoBase system

In [21], a relaxation approach within the fuzzy set framework was proposed. Compared with the preceding approaches, it makes two promising contributions. The first is that it takes into account the semantic dependencies between the query search criteria to determine whether the query is realizable. The second is its cooperative aspect in flexible querying. To extract the dependencies, this approach builds TAHs and an MTAH from the relaxable attributes. The problem here lies in the storage and indexing of such structures and in their incremental update.

2.2 Data Analysis Methods

Cluster Analysis and Formal Concept Analysis are well-known graphical methods in data analysis [18]. Cluster analysis loses some information on the way from the original data to the graphical output, whereas Formal Concept Analysis can represent the original data in the plane without any loss of information.

2.2.1 Cluster Analysis. The objective of cluster analysis [27] is to classify objects according to the similarities among them and to organize data into groups. Clustering techniques are unsupervised methods: they do not use prior class labels. The main potential of clustering lies in detecting the underlying structure in data, for model reduction and optimization. Since groups (or clusters) can formally be seen as subsets of the data set, clustering methods can be classified according to whether these subsets are hard (crisp) or fuzzy.

The goal of hard clustering is to assign each data point to exactly one cluster. In real applications, however, there is very often no sharp boundary between clusters, so fuzzy clustering is often better suited to the data. It assigns each point a degree of membership in each cluster, so that the membership of a point is shared among several clusters.

Several studies have addressed the automatic determination of the number of clusters [13] and the evaluation of the quality of the obtained partitions. They rely on the definition of an objective function [1] that measures the quality of a partition. A prominent fuzzy clustering algorithm for determining the optimal number of clusters is presented in [26].
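As an illustration of such a quality measure, the sketch below uses Bezdek's partition coefficient, a classic fuzzy cluster validity index. It is a stand-in chosen for simplicity and is not necessarily the criterion used in [1] or [26]. The coefficient is the mean of the squared memberships: it equals 1 for a crisp partition and 1/c for a uniform one, so a larger value indicates a less ambiguous partition. The helper `best_number_of_clusters` and the toy matrices are hypothetical.

```python
def partition_coefficient(u):
    """Bezdek's partition coefficient: (1/N) * sum_i sum_k mu_ik ** 2, between 1/c and 1."""
    return sum(mu * mu for row in u for mu in row) / len(u)

def best_number_of_clusters(partitions):
    """Hypothetical helper: pick the c whose partition matrix has the largest coefficient."""
    return max(partitions, key=lambda c: partition_coefficient(partitions[c]))

crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # each row: memberships of one object
uniform = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
print(partition_coefficient(crisp), partition_coefficient(uniform))  # 1.0 0.5
```

In practice, the candidate partitions would come from running the same fuzzy clustering algorithm with several values of c on the same attribute.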

To make it easier for readers to understand the ideas behind the clustering technique, we use a unified notation in this section, based on the following definitions:

- $X \in \mathbb{R}^{N \times M}$ denotes a set of data items representing a set of $N$ objects $x_i$ in $\mathbb{R}^M$,
- $C_k(A_i)$ is the $k$-th cluster of attribute $A_i$, and
- $C(A_i)$ is the optimal number of clusters found for the attribute $A_i$.

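Fuzzy clustering techniques allow objects to belong to several clusters simultaneously, with different degrees of membership. The data set $X$ is thus partitioned into $C$ fuzzy subsets [1], and the result is represented by the partition matrix $U = [\mu_{ik}]$, where $\mu_{ik} \in [0, 1]$ is the degree of membership of object $x_i$ in the $k$-th cluster. In probabilistic partitions such as those produced by FCM, the memberships of each object sum to 1.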
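To show where a partition matrix $U = [\mu_{ik}]$ comes from, here is a minimal fuzzy c-means (FCM) sketch for a single numeric attribute, in line with the per-attribute clustering used later in this paper. It is plain Python for clarity; the fuzzifier m = 2, the random initialization, the stopping tolerance and the name `fcm_1d` are assumptions, and it is not the EFCM algorithm of [26]. It alternates the two standard FCM updates: centers as membership-weighted means, then memberships from the relative distances to the centers.

```python
import random

def fcm_1d(values, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means on one numeric attribute; returns (centers, U) with U[i][k] = mu_ik."""
    rng = random.Random(seed)
    n = len(values)
    # Random initial partition matrix whose rows sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = [0.0] * c
    for _ in range(max_iter):
        # Centers: weighted means of the data with weights mu_ik ** m.
        for k in range(c):
            weights = [u[i][k] ** m for i in range(n)]
            centers[k] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        # Memberships: mu_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        new_u = []
        for x in values:
            d = [abs(x - v) for v in centers]
            if min(d) == 0.0:
                # The point coincides with a center: give it full membership there.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                              for k in range(c)])
        shift = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u

db_grades = [15, 14, 16, 7, 11, 12, 17, 9, 5, 13]  # DB column of Table 2, S1..S10
centers, u = fcm_1d(db_grades, c=3)
print([round(v, 2) for v in centers])
```

Because the initialization is random, the memberships will generally not coincide with the values published in this paper, but each row of U sums to 1 and each object gets its highest membership in the cluster whose center is nearest.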

Although fuzzy clustering is a very effective technique, its results are difficult to interpret. For large samples, the large number of membership values with respect to the constructed clusters makes it almost impossible to compare the fuzzy properties of the objects effectively, and the mutual relationships between specific clusters can be masked. Moreover, the relationships between the data structure and the fuzzy clustering results are difficult to understand.

2.2.2 Ordered Lattice Theory. Formal Concept Analysis (FCA) is an effective technique for data analysis and knowledge discovery. Its core is the concept lattice, which derives from mathematical order theory and lattice theory [10]. Research in various areas shows that concept lattice structures are an effective platform for data mining, machine learning, information retrieval, software engineering, etc. Lattices are a popular mathematical structure for modeling conceptual hierarchies, and a concept lattice is a method for deriving conceptual structures from data. It describes the pair of sets that make up a concept: its intent and its extent. This is formalized by a formal context $K = (G, M, I)$, where $G$ and $M$ are sets and $I$ is a binary relation between $G$ and $M$ (i.e., $I \subseteq G \times M$). The elements of $G$ are called objects or transactions, and the elements of $M$ are called attributes or items.

The formal concepts of $K$ are the pairs $(A, B)$ with $A \subseteq G$ and $B \subseteq M$ such that $(A, B)$ is maximal with respect to the property $A \times B \subseteq I$. The set $A$ is called the extent and $B$ the intent of the concept $(A, B)$ [29].
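The derivation operators and the maximality condition can be checked on the student example of Table 1. This sketch enumerates all formal concepts by closing every subset of objects, which is exponential and only meant for tiny contexts. The incidence relation below is our reading of Table 1, recovered from the concepts shown in Figure 2 (for instance, S1 is interested in DB, NT and AT); the names `CONTEXT`, `intent`, `extent` and `concepts` are illustrative.

```python
from itertools import combinations

# Our reading of Table 1: the modules each student is interested in.
CONTEXT = {
    "S1": {"DB", "NT", "AT"},
    "S2": {"DB", "PL", "LI"},
    "S3": {"DB", "NT", "LI"},
    "S4": {"DB", "NT"},
    "S5": {"DB", "LI"},
    "S6": {"DB", "PL"},
}
ATTRIBUTES = {"DB", "PL", "NT", "LI", "AT"}

def intent(objects):
    """A': the attributes shared by every object of A (all attributes if A is empty)."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return frozenset(result)

def extent(attributes):
    """B': the objects that have every attribute of B."""
    return frozenset(g for g, attrs in CONTEXT.items() if attributes <= attrs)

def concepts():
    """All pairs (A'', A') obtained by closing every subset A of objects."""
    found = set()
    objects = sorted(CONTEXT)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(subset)
            found.add((extent(b), b))
    return found

print(len(concepts()))  # 8
```

It finds the eight concepts of Figure 2, from ({S1, ..., S6}, {DB}) at the top to (∅, {DB, PL, NT, LI, AT}) at the bottom.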

The set $C$ of all concepts of a context $K$, ordered by $(A_1, B_1) \le (A_2, B_2) :\Leftrightarrow A_1 \subseteq A_2$, is always a complete lattice, called the concept lattice of the context $K$ and denoted $\Im(C)$ in the rest of the paper.

Example: Table 1 shows a formal context in which the object set G comprises all the students, namely S1, S2, S3, S4, S5 and S6, and the attribute set M lists five modules of interest: Databases (DB), Programming Languages (PL), Networks (NT), Literature (LI) and Other Topics (AT). The relationship between an object and an attribute is represented by a cross.

Table 1: A formal context about the modules of interest to students

       DB   PL   NT   LI   AT
S1     ×         ×         ×
S2     ×    ×         ×
S3     ×         ×    ×
S4     ×         ×
S5     ×              ×
S6     ×    ×

Figure 2 shows the concept lattice of the context in Table 1 as a line diagram.

Figure 2: The concept lattice of the context in Table 1

In this line diagram, each node represents a formal concept. A concept $C_1$ is a subconcept of a concept $C_2$ if and only if there is a path of descending edges from the node representing $C_2$ to the node representing $C_1$. The name of an object $g$ is always attached to the node representing the smallest concept with $g$ in its extent; dually, the name of an attribute $m$ is always attached to the node representing the largest concept with $m$ in its intent. The context relation can be read from the diagram, because an object $g$ has an attribute $m$ if and only if the concept labelled by $g$ is a subconcept of the one labelled by $m$. The extent of a concept consists of all objects whose labels are attached to its subconcepts and, dually, its intent consists of all attributes attached to its superconcepts.

At the top of the lattice, we find the modules of interest to most of the students.

[Figure 2 nodes, from top to bottom: ({S1,S2,S3,S4,S5,S6},{DB}); ({S2,S3,S5},{LI}), ({S1,S3,S4},{NT}), ({S2,S6},{PL}); ({S2},{PL,LI}), ({S3},{NT,LI}), ({S1},{AT}); (∅,{DB,PL,NT,LI,AT})]

A data context is usually represented by binary data. In practice, however, attribute values are often not binary; in that case, a many-valued data context can be transformed into a binary (single-valued) context by conceptual scaling [30].

Such data is formalized as a many-valued context $(G, M, W, I)$, where $G$ is a set of objects, $M$ is a set of attributes, $W$ is a set of attribute values, and $I \subseteq G \times M \times W$ is a relation such that $(g, m, w_1) \in I$ and $(g, m, w_2) \in I$ imply $w_1 = w_2$. Attributes $m_i$ are understood as partial functions from $G$ into $W$. A formal context without attribute values is often called a single-valued context. Many-valued contexts can be mapped into formal contexts using conceptual scales [30]. A conceptual scale for a set $Y \subseteq M$ is a single-valued context $S_Y := (G_Y, M_Y, I_Y)$ with $G_Y \subseteq \times_{m \in Y} W_m$. The idea is to replace the attribute values in $W_m$, which are often too specific, by the more general attributes provided in $M_Y$. The concept lattice of the derived context can be visualized in a nested line diagram. In nested line diagrams, the nodes of the concept lattice of the first scale are enlarged so that the concept lattice of the second scale can be drawn inside them. The second lattice is then used to further differentiate each of the extents of the concepts of the first lattice.
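A crisp conceptual scale can be illustrated on the DB grades of Table 2. The three intervals and their names are hypothetical, chosen only for the example: every grade falls into exactly one interval, so each object receives exactly one scale attribute in the derived single-valued context. This rigid assignment is what the fuzzy conceptual scales of Section 4 replace with graded memberships.

```python
# DB column of Table 2 (grades out of 20).
DB_GRADES = {"S1": 15, "S2": 14, "S3": 16, "S4": 7, "S5": 11,
             "S6": 12, "S7": 17, "S8": 9, "S9": 5, "S10": 13}

# Hypothetical interval scale: each scale attribute covers a half-open grade interval [lo, hi).
INTERVAL_SCALE = {"low": (0, 10), "medium": (10, 14), "high": (14, 21)}

def apply_scale(values, scale):
    """Derived single-valued context: g has scale attribute a iff lo <= value(g) < hi."""
    return {g: {a for a, (lo, hi) in scale.items() if lo <= v < hi}
            for g, v in values.items()}

derived = apply_scale(DB_GRADES, INTERVAL_SCALE)
print(derived["S2"], derived["S8"])  # {'high'} {'low'}
```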

Although lattice-based information representation has the advantage of providing an efficient visual interface compared with a textual display, the complexity of a lattice may grow rapidly with the size of the database. Another problem is the process of creating conceptual scales, which relies on knowledge of the domain from which the data is taken. Conceptual scales are often created by hand, and their concept lattices, since they are represented by formal contexts, are also laid out by hand. Conceptual scales may also be used to impose an external ordering on the clusters, both for a many-valued context and for a single-valued context.

3 Problems and Contributions

Although fuzzy clustering is a very effective technique, its results are difficult to interpret: the mutual relationships between specific clusters can be masked and hard to understand. Similarly, despite its mathematical foundations, the lattice approach is limited by its complexity (the number of concepts).

In this paper, we propose to combine the FCA data analysis method, which is based on lattice theory, with fuzzy cluster analysis for:

- reducing the number of concepts by using fuzzy clustering, and
- generating a hierarchy that allows the interpretation and extraction of dependencies.

However, there are many situations in which uncertain information also occurs, and traditional ordered lattice theory can hardly represent such vague information. To tackle this problem, we propose to combine fuzzy logic with ordered lattice theory, so that uncertain information is directly represented by a real membership value in the range [0, 1]. As such, linguistic variables are no longer needed. Compared with the fuzzy ordered lattice generated from an L-fuzzy context [16], the fuzzy concept lattice generated with the proposed ordered lattice extension is simpler in terms of the number of formal concepts, and it also provides a formal mechanism for computing concept similarities. The proposed ordered lattice extension therefore provides a suitable data structure for flexible querying.

This ordered lattice extension is applied to Information Retrieval (IR) to overcome problems of the approaches proposed in the literature, namely the search for relevant data sources for a given query and the ranking of the results.

4 The Proposed Approach

In this section, we propose a query relaxation approach within the fuzzy set framework [31]. To this end, we consider a relational database containing relaxable attributes, i.e., attributes that users can use in a comparison predicate containing a linguistic term. In this paper, we restrict ourselves to numerical relaxable attributes.

Figure 3 shows the proposed approach. It consists of two steps: the first organizes the data, and the second queries the organized data to find the relevant data sources for a given query.

The first step generates the TAHs and the MTAH of the relaxable attributes. The second step first determines the concept corresponding to the user query in order to check whether the query is realizable; it then constructs the part of the lattice related to this concept and, finally, ranks the relevant answers to the query.

4.1 TAH Generation

As mentioned in Section 2.2.1, fuzzy clustering methods allow data to belong to several clusters simultaneously, with different degrees of membership. The data set $X$ is thus partitioned into $C$ fuzzy partitions (clusters).

Definition 2: A fuzzy formal context is a triple $K = (G, M, I = \varphi(G \times M))$, where $G$ is a set of objects, $M$ is a set of clusters, and $I$ is a fuzzy set on the domain $G \times M$. Each relation $(g, m) \in I$ has a membership value $\mu(g, m) \in [0, 1]$.

However, the data is formalized as a many-valued context $(G, M, W, I)$, where $G$ is a set of objects, $M$ is a set of attributes, $W$ is a set of attribute values, and $I \subseteq G \times M \times W$.

Definition 3: A fuzzy conceptual scale for a set $Y \subseteq M$ is a (single-valued) fuzzy formal context $S_Y := (G_Y, M_Y, I_Y = \varphi(G_Y \times M_Y))$ with $G_Y \subseteq \times_{m \in Y} W_m$. The idea is to allow the objects of $G$ to belong to several clusters simultaneously: we replace the attribute values in $W_m$ by different degrees of membership. Each relation $(g, m) \in I_Y$ has a membership value $\mu(g, m) \in [0, 1]$, and the membership values of each object in a fuzzy conceptual scale sum to 1.

Example: Consider the relational database table shown in Table 2, which lists the grades of the students in the different modules.


[Figure 3 diagram labels, query part: $Q = (\text{Query}, \text{MetaData})$; unrealizable query; detection of the failure causes; justification of the unrealizability of the query.]

Figure 3: Proposed approach

Table 2: A relational database table

       DB   PL   NE   LI   AT
S1     15   14   12   14   10
S2     14   15    9    8   10
S3     16   13   12   12    7
S4      7   10   14   12    8
S5     11    5   18   15   14
S6     12   11   10   10   10
S7     17    6   14   15   14
S8      9   10   12   11   10
S9      5    6   10    6   10
S10    13    7   12   14   13

The fuzzy conceptual scales for the relaxable attributes DB and PL, generated from fuzzy clustering results (using a fuzzy clustering algorithm such as FCM [1] or EFCM [26]), are given in Table 3.

Table 3: Fuzzy conceptual scales for DB and PL attributes

               DB                PL
       C1     C2     C3     C4     C5
S1     0.007  0.940  0.053  0.011  0.989
S2     0.031  0.581  0.388  0.044  0.956
S3     0.001  0.995  0.004  0.002  0.998
S4     0.981  0.004  0.014  0.517  0.483
S5     0.033  0.031  0.936  0.952  0.048
S6     0.001  0.001  0.998  0.228  0.772
S7     0.013  0.931  0.056  0.986  0.014
S8     0.509  0.075  0.416  0.517  0.483
S9     0.944  0.016  0.041  0.986  0.014
S10    0.025  0.142  0.833  0.789  0.211

Definition 4: Given a fuzzy conceptual scale $S_Y := (G_Y, M_Y, I_Y = \varphi(G_Y \times M_Y))$, we define

$$\alpha\text{-Cut}(C(A_i)) = C(A_i)^{-1},$$

where $C(A_i)$ is the optimal number of clusters of scale $A_i$.

In our example, $\alpha\text{-Cut}(DB) \approx 0.3$ ($= 1/3$) and $\alpha\text{-Cut}(PL) = 0.5$ ($= 1/2$). Table 4 presents the fuzzy conceptual scales for the DB and PL attributes after applying the $\alpha$-Cut.

Table 4: Fuzzy conceptual scales for DB and PL attributes with α-Cut

               DB                PL
       C1     C2     C3     C4     C5
S1     -      0.940  -      -      0.989
S2     -      0.581  0.388  -      0.956
S3     -      0.995  -      -      0.998
S4     0.981  -      -      0.517  0.483
S5     -      -      0.936  0.952  -
S6     -      -      0.998  -      0.772
S7     -      0.931  -      0.986  -
S8     0.509  -      0.416  0.517  0.483
S9     0.944  -      -      0.986  -
S10    -      -      0.833  0.789  -
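Definitions 3 and 4 can be checked against Tables 3 and 4 with a short script. It assumes that the membership values of each object within one scale (C1 to C3 for DB, C4 and C5 for PL) sum to 1 up to rounding, and it keeps μ(g, m) only when it reaches 1/C(A_i) for the scale that m belongs to. The names `TABLE3`, `SCALES`, `rows_sum_to_one` and `alpha_cut` are illustrative.

```python
# Table 3: memberships of each student in the DB clusters (C1-C3) and the PL clusters (C4, C5).
TABLE3 = {
    "S1": {"C1": 0.007, "C2": 0.940, "C3": 0.053, "C4": 0.011, "C5": 0.989},
    "S2": {"C1": 0.031, "C2": 0.581, "C3": 0.388, "C4": 0.044, "C5": 0.956},
    "S3": {"C1": 0.001, "C2": 0.995, "C3": 0.004, "C4": 0.002, "C5": 0.998},
    "S4": {"C1": 0.981, "C2": 0.004, "C3": 0.014, "C4": 0.517, "C5": 0.483},
    "S5": {"C1": 0.033, "C2": 0.031, "C3": 0.936, "C4": 0.952, "C5": 0.048},
    "S6": {"C1": 0.001, "C2": 0.001, "C3": 0.998, "C4": 0.228, "C5": 0.772},
    "S7": {"C1": 0.013, "C2": 0.931, "C3": 0.056, "C4": 0.986, "C5": 0.014},
    "S8": {"C1": 0.509, "C2": 0.075, "C3": 0.416, "C4": 0.517, "C5": 0.483},
    "S9": {"C1": 0.944, "C2": 0.016, "C3": 0.041, "C4": 0.986, "C5": 0.014},
    "S10": {"C1": 0.025, "C2": 0.142, "C3": 0.833, "C4": 0.789, "C5": 0.211},
}
SCALES = {"DB": ("C1", "C2", "C3"), "PL": ("C4", "C5")}

def rows_sum_to_one(table, scales, tol=0.01):
    """Check that, per object and per scale, the memberships sum to 1 (up to rounding)."""
    return all(abs(sum(row[c] for c in clusters) - 1.0) <= tol
               for row in table.values() for clusters in scales.values())

def alpha_cut(table, scales):
    """Keep mu(g, m) only if it reaches alpha = 1 / C(A_i), C(A_i) = number of clusters of m's scale."""
    cut = {}
    for g, row in table.items():
        kept = {}
        for clusters in scales.values():
            alpha = 1.0 / len(clusters)
            kept.update({c: row[c] for c in clusters if row[c] >= alpha})
        cut[g] = kept
    return cut

table4 = alpha_cut(TABLE3, SCALES)
print(rows_sum_to_one(TABLE3, SCALES))  # True
print(table4["S2"])  # {'C2': 0.581, 'C3': 0.388, 'C5': 0.956}
```

With the strict threshold 1/2 for PL, this sketch drops the 0.483 memberships of S4 and S8 in C5, which Table 4 retains; every other entry of Table 4 is reproduced.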

Traditional ordered lattices can hardly represent fuzzy properties of uncertain data. To tackle this problem, we use a new technique that incorporates fuzzy logic into ordered lattice theory, so that uncertain information is directly represented by a real membership value in the range [0, 1].

[Figure 3 diagram labels, continued. Data Organization: fuzzy clustering; fuzzy partitions; TAHs (one fuzzy lattice per attribute); MTAH (a fuzzy nested lattice). Data Querying: initial user query; query concept construction; checking of realizability; generation of relevant answers; ranking of the results; ordered final result.]


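Definition 5: Given a fuzzy formal context $K = (G, M, I = \varphi(G \times M))$ and an $\alpha$-Cut, we define

$$X^* = \{\, m \in M \mid \forall g \in X : \mu(g, m) \ge \alpha\text{-Cut} \,\} \quad \text{for } X \subseteq G,$$
$$Y^* = \{\, g \in G \mid \forall m \in Y : \mu(g, m) \ge \alpha\text{-Cut} \,\} \quad \text{for } Y \subseteq M.$$

A fuzzy formal concept (or fuzzy concept) of a fuzzy formal context $K$ with an $\alpha$-Cut is a pair $X_f = (\varphi(X), Y)$, where $X \subseteq G$, $Y \subseteq M$, $X^* = Y$ and $Y^* = X$. Each object $g \in \varphi(X)$ has a membership $\mu_g$ defined as

$$\mu_g = \min_{m \in Y} \mu(g, m),$$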

where $\mu(g, m)$ is the membership value between object $g$ and attribute $m$, as defined in $I$. Note that if $Y = \{\}$, then $\mu_g = 1$ for every $g$.

In general, the attributes of a formal concept can be regarded as the description of the concept. Thus, the relationship between an object and the concept should be the intersection of the relationships between the object and the attributes of the concept. Since each relationship between an object and an attribute is represented by a membership value in the fuzzy formal context, the intersection of these membership values should, according to fuzzy set theory, be their minimum.
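The fuzzy derivation operators of Definition 5 and the min-based object membership can be sketched on the DB part of Table 4. This is an illustrative implementation under the assumption that a missing entry in Table 4 means the membership is below the α-cut; the names `DB_SCALE`, `objects_to_attrs`, `attrs_to_objects` and `fuzzy_concept` are hypothetical.

```python
# DB part of Table 4: memberships that survived the alpha-cut (absent = below the cut).
DB_SCALE = {
    "S1": {"C2": 0.940}, "S2": {"C2": 0.581, "C3": 0.388}, "S3": {"C2": 0.995},
    "S4": {"C1": 0.981}, "S5": {"C3": 0.936}, "S6": {"C3": 0.998},
    "S7": {"C2": 0.931}, "S8": {"C1": 0.509, "C3": 0.416}, "S9": {"C1": 0.944},
    "S10": {"C3": 0.833},
}
CLUSTERS = ("C1", "C2", "C3")
ALPHA = 1 / 3  # alpha-Cut(DB) = 1 / C(DB) with C(DB) = 3

def mu(g, m):
    return DB_SCALE[g].get(m, 0.0)

def objects_to_attrs(objects):
    """X*: clusters in which every object of X reaches the alpha-cut."""
    return frozenset(m for m in CLUSTERS if all(mu(g, m) >= ALPHA for g in objects))

def attrs_to_objects(attrs):
    """Y*: objects that reach the alpha-cut in every cluster of Y."""
    return frozenset(g for g in DB_SCALE if all(mu(g, m) >= ALPHA for m in attrs))

def fuzzy_concept(attrs):
    """Close Y into a concept (Y*, Y**) and give each object mu_g = min over the intent."""
    extent = attrs_to_objects(attrs)
    intent = objects_to_attrs(extent)
    members = {g: min((mu(g, m) for m in intent), default=1.0) for g in extent}
    return members, intent

print(fuzzy_concept({"C1", "C3"})[0])  # {'S8': 0.416}
```

For Y = {C1, C3} it returns the concept with extent {S8} and μ_S8 = min(0.509, 0.416) = 0.416; for Y = {C2} it returns the extent {S1, S2, S3, S7} with memberships 0.940, 0.581, 0.995 and 0.931.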

Definition 6: Let $(A_1, B_1)$ and $(A_2, B_2)$ be two fuzzy concepts of a fuzzy formal context $K = (G, M, I = \varphi(G \times M))$. $(\varphi(A_1), B_1)$ is a subconcept of $(\varphi(A_2), B_2)$, denoted $(\varphi(A_1), B_1) \le (\varphi(A_2), B_2)$, if and only if $\varphi(A_1) \subseteq \varphi(A_2)$ ($\Leftrightarrow B_2 \subseteq B_1$). Equivalently, $(A_2, B_2)$ is a superconcept of $(A_1, B_1)$.

Definition 7: A fuzzy concept lattice of a fuzzy formal context $K$ with an $\alpha$-Cut is the set $C$ of all fuzzy concepts of $K$ with the partial order $\le$ under the $\alpha$-Cut value, denoted $\Im(C)$.

Definition 8: The similarity $S$ of a fuzzy formal concept $C_1 = (\varphi(A_1), B_1)$ and its subconcept $C_2 = (\varphi(A_2), B_2)$ is defined as

$$S(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1 \cup C_2|}.$$
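Definition 8 can be computed directly once the cardinality of a fuzzy set is fixed. The sketch below assumes, as is common, the sigma-count (sum of memberships) as cardinality, with min for intersection and max for union, applied to the fuzzy extents of the two concepts. With the memberships of Table 4, this gives about 0.17 between the DB concept {C1} and its subconcept {C1, C3}, and about 0.20 between the PL concept {C4} and its subconcept {C4, C5}, which appears consistent with the edge labels of Figure 4. The function name `fuzzy_jaccard` is illustrative.

```python
def fuzzy_jaccard(extent1, extent2):
    """|C1 ∩ C2| / |C1 ∪ C2| with min as intersection, max as union, sigma-count as cardinality."""
    objects = set(extent1) | set(extent2)
    inter = sum(min(extent1.get(g, 0.0), extent2.get(g, 0.0)) for g in objects)
    union = sum(max(extent1.get(g, 0.0), extent2.get(g, 0.0)) for g in objects)
    return inter / union if union > 0 else 1.0

# Fuzzy extents built from Table 4 (mu_g = min over the intent, Definition 5).
db_c1 = {"S4": 0.981, "S8": 0.509, "S9": 0.944}  # concept with intent {C1}
db_c1_c3 = {"S8": 0.416}                         # subconcept with intent {C1, C3}
pl_c4 = {"S4": 0.517, "S5": 0.952, "S7": 0.986, "S8": 0.517, "S9": 0.986, "S10": 0.789}
pl_c4_c5 = {"S4": 0.483, "S8": 0.483}            # subconcept with intent {C4, C5}

print(round(fuzzy_jaccard(db_c1, db_c1_c3), 3))  # 0.171
print(round(fuzzy_jaccard(pl_c4, pl_c4_c5), 3))  # 0.203
```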

The corresponding fuzzy concept lattices, denoted TAHs, are given by the line diagrams in Figure 4.

Figure 4a: DB TAH

Figure 4b: PL TAH.

Figure 4: DB TAH and PL TAH


This very simple sorting procedure gives, for each fuzzy many-valued attribute, the distribution of the objects in the fuzzy line diagram of the chosen fuzzy scale. The well-known histograms of a single variable arise as special cases of fuzzy line diagrams.

4.2 MTAH Generation

We are usually interested in the interaction between two or more fuzzy many-valued attributes. This interaction can be visualized using so-called fuzzy nested line diagrams, an extension of fuzzy concept lattices. They are used for:

- visualizing larger fuzzy concept lattices,
- emphasizing sub-structures and regularities, and
- combining fuzzy conceptual scales on-line.

Figure 5 shows the fuzzy nested line diagram constructed from Figure 4.

In this fuzzy nested line diagram, we want to see, for each concept of the DB diagram represented in Figure 5, how the students are distributed over the fuzzy PL scale. We blow up each circle of the DB TAH of Figure 4 and insert the PL TAH of Figure 4 into it. Hence, Figure 5 represents all pairs $(c, d)$ of concepts $c$ from the first TAH and concepts $d$ from the second TAH. This structure is called the direct product of the two given fuzzy lattices.
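The direct product behind the nested diagram can be sketched as follows. For simplicity, this sketch works on crisp supports (the students with a non-zero membership in each attribute concept of the DB and PL TAHs, read from the line diagrams) and ignores membership degrees, since the text does not say how degrees are combined inside a nested node. Each pair $(c, d)$ keeps the students belonging to both concepts; the six resulting sets coincide with the object sets of the corresponding intersection nodes in the diagrams.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Support = FrozenSet[str]

# Supports (students with non-zero membership) of the attribute concepts of each TAH.
DB_TAH: Dict[str, Support] = {
    "C1": frozenset({"S4", "S8", "S9"}),
    "C2": frozenset({"S1", "S2", "S3", "S7"}),
    "C3": frozenset({"S2", "S5", "S6", "S8", "S10"}),
}
PL_TAH: Dict[str, Support] = {
    "C4": frozenset({"S4", "S5", "S7", "S8", "S9", "S10"}),
    "C5": frozenset({"S1", "S2", "S3", "S4", "S6", "S8"}),
}


def direct_product(outer: Dict[str, Support],
                   inner: Dict[str, Support]) -> Dict[Tuple[str, str], Support]:
    """Pair every outer concept c with every inner concept d and keep the common objects."""
    return {(c, d): outer[c] & inner[d] for c, d in product(outer, inner)}


nested = direct_product(DB_TAH, PL_TAH)
for (c, d), students in sorted(nested.items()):
    print(c, d, sorted(students))
```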

From the fuzzy nested lattice, we can draw an ordinary fuzzy lattice of the same fuzzy context, noted MTAH.

5 Example of Relieving Query

Once these structures are stored, the remaining concern is to be able to query the database and obtain relevant answers. In this section, we present our flexible querying approach, which consists of searching for the relevant answers to the user's queries and ordering (scheduling) them.

5.1 Construction of the Query Concept

When searching for information in a database, the user is very often overwhelmed by the mass of returned answers. To address this problem, we propose a formalism for searching the data sources relevant to a given query. To better explain the steps of the proposed approach, we consider a relational database describing the notes of students in some modules.

Figure 5: Fuzzy nested lattice

The relations of this database are as follows, where the primary key of each relation is underlined:

Student (Id_Student, Name, Surname, Address)
Module (Id_Module, Label_Module, Coef_Module)
Module_Notes (Id_Student, Id_Module, Note)

Let us consider the following query: list the students who have a note of 14 in DB and a note of 13 in PL.

In this query, the user wants the preferences to be considered in descending order: DB, then PL. In other words, the returned data must be ordered and presented to the user according to these preferences. Without this flexibility, the user must refine the search keys until satisfied, since he does not have precise knowledge of the data he consults.

Among the criteria of query Q (labelled A1 to A4 in its SQL formulation), only A2 and A4 bear on relievable attributes.

First, we retrieve from the database the tuples satisfying the non-relievable criteria (A1 and A3), using the following query:

Select Student.Id_Student, Name, Surname
From Student, Module, Module_Notes
Where Label_Module like 'DB'  -- (A1)
  and Student.Id_Student = Module_Notes.Id_Student
  and Module.Id_Module = Module_Notes.Id_Module
Union
Select Student.Id_Student, Name, Surname
From Student, Module, Module_Notes
Where Label_Module like 'PL'  -- (A3)
  and Student.Id_Student = Module_Notes.Id_Student
  and Module.Id_Module = Module_Notes.Id_Module
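To check that this query is well formed, the sketch below runs it with Python's built-in sqlite3 module, in a form that qualifies Id_Student (it exists in both Student and Module_Notes), uses the Label_Module column of the schema and single-quoted string literals. The rows are hypothetical and serve only as an illustration. Because UNION removes duplicates, a student with notes in both DB and PL is returned once, and a student with no DB or PL note is not returned at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (Id_Student INTEGER PRIMARY KEY, Name TEXT, Surname TEXT, Address TEXT);
    CREATE TABLE Module (Id_Module INTEGER PRIMARY KEY, Label_Module TEXT, Coef_Module REAL);
    CREATE TABLE Module_Notes (
        Id_Student INTEGER REFERENCES Student (Id_Student),
        Id_Module INTEGER REFERENCES Module (Id_Module),
        Note REAL,
        PRIMARY KEY (Id_Student, Id_Module)
    );
    -- Hypothetical sample rows, for illustration only.
    INSERT INTO Student VALUES (1, 'Name1', 'Surname1', 'Address1'),
                               (2, 'Name2', 'Surname2', 'Address2'),
                               (3, 'Name3', 'Surname3', 'Address3');
    INSERT INTO Module VALUES (1, 'DB', 2.0), (2, 'PL', 1.5), (3, 'OS', 1.0);
    INSERT INTO Module_Notes VALUES (1, 1, 14), (1, 2, 13), (2, 1, 9), (3, 3, 16);
    """
)

# Non-relievable criteria only: A1 (module DB) or A3 (module PL).
NON_RELIEVABLE_QUERY = """
SELECT Student.Id_Student, Name, Surname
FROM Student, Module, Module_Notes
WHERE Label_Module LIKE 'DB'
  AND Student.Id_Student = Module_Notes.Id_Student
  AND Module.Id_Module = Module_Notes.Id_Module
UNION
SELECT Student.Id_Student, Name, Surname
FROM Student, Module, Module_Notes
WHERE Label_Module LIKE 'PL'
  AND Student.Id_Student = Module_Notes.Id_Student
  AND Module.Id_Module = Module_Notes.Id_Module
"""

rows = sorted(conn.execute(NON_RELIEVABLE_QUERY).fetchall())
print(rows)  # [(1, 'Name1', 'Surname1'), (2, 'Name2', 'Surname2')]
```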

These tuples are clustered separately for each relievable attribute, DB and PL. For the example above, we obtain the TAHs of Figure 4. These TAHs are nested in the lattice represented in Figure 5. Finally, the MTAH is built to extract the dependencies between database attributes.

Once the MTAH is built, the search for approximate answers can start. For this, we define a query concept $Q = (A_Q, B_Q)$, where $A_Q$ is a name denoting the required extension and $B_Q$ is the set of clusters describing the data targeted by the query. The set $B_Q$ of clusters is determined by the following procedure:

Procedure Construction of the query concept
Input: the vector $V(A) = \{v_j : j = 1, \ldots, C_A\}$ of the cluster centers of the relievable attribute $A$, and the value of $Q$ associated with this attribute.
Output: the query concept $Q = (A_Q, B_Q)$.
Begin
Step 1: For each value of the criterion of $Q$ associated with the relievable attribute $A$, calculate its membership degrees in the clusters of $A$.
Step 2: Apply the α-cut to generate the fuzzy context.
Step 3: Form the set $B_Q$ of clusters whose membership is higher than the α-cut value.
End Procedure
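Step 1 does not specify how the degrees are computed from the cluster centers. A common choice, assumed in the minimal sketch below, is the fuzzy c-means membership formula of Bezdek [1] with fuzzifier $m = 2$: $u_j(x) = 1 / \sum_k (|x - v_j| / |x - v_k|)^{2/(m-1)}$. The DB cluster centers used here are hypothetical (the paper does not give them), so the resulting degrees are only illustrative.

```python
from typing import Dict


def fcm_memberships(value: float, centers: Dict[str, float], m: float = 2.0) -> Dict[str, float]:
    """Fuzzy c-means membership of one value in each cluster (Step 1)."""
    distances = {name: abs(value - center) for name, center in centers.items()}
    # A value lying exactly on a center belongs fully to that cluster.
    on_center = [name for name, d in distances.items() if d == 0.0]
    if on_center:
        return {name: 1.0 if name == on_center[0] else 0.0 for name in centers}
    exponent = 2.0 / (m - 1.0)
    return {
        j: 1.0 / sum((distances[j] / distances[k]) ** exponent for k in centers)
        for j in centers
    }


# Hypothetical centers of the DB clusters (not given in the paper).
DB_CENTERS = {"C1": 7.0, "C2": 12.5, "C3": 16.0}

degrees = fcm_memberships(14.0, DB_CENTERS)  # the DB value of the query criterion
print({name: round(u, 3) for name, u in degrees.items()})
```

The degrees of one value always sum to 1 over the clusters of the attribute, which is consistent with the query degrees reported for DB and for PL.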

5.2 Flexible Query Modeling

To improve the data representation, we can define the bounds of the interval covered by each generated cluster. To this end, we must define the minimum and maximum values of each interval. In the case of the DB attribute, the first cluster contains three values; its minimum (respectively maximum) is the smallest (respectively largest) DB value in this cluster.
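The bounds follow directly from the values assigned to each cluster. The sketch below uses the cluster members listed with Figures 6 and 7; because the clusters are fuzzy, a boundary value such as 9 or 14 may be assigned to two neighbouring clusters, which is why consecutive intervals can share an end point.

```python
from typing import Dict, List, Tuple

# Values grouped per cluster, as listed with Figures 6 and 7.
DB_CLUSTERS: Dict[str, List[int]] = {
    "C1": [5, 7, 9],
    "C2": [9, 11, 12, 13, 14],
    "C3": [14, 15, 16, 17],
}
PL_CLUSTERS: Dict[str, List[int]] = {
    "C4": [5, 6, 7, 10],
    "C5": [11, 13, 14, 15],
}


def interval_bounds(clusters: Dict[str, List[int]]) -> Dict[str, Tuple[int, int]]:
    """[Inf, Sup] of each cluster: its smallest and largest member value."""
    return {name: (min(values), max(values)) for name, values in clusters.items()}


print(interval_bounds(DB_CLUSTERS))  # {'C1': (5, 9), 'C2': (9, 14), 'C3': (14, 17)}
print(interval_bounds(PL_CLUSTERS))  # {'C4': (5, 10), 'C5': (11, 15)}
```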

Using these intervals, the expert can assign linguistic labels to each relievable attribute. For example, users can refer to the relievable attribute DB with one of the three following linguistic terms: "weak", "average" and "well". They can also use an exact value.

We define a query concept $Q = (A_Q, B_Q)$, where $A_Q = \{Query\}$, i.e., a name for the query extension, and $B_Q$ is the set of metadata (clusters) describing the sources sought by the query.

The labels associated with the relievable attributes DB and PL are those of Figures 6 and 7.

These metadata are produced by the fuzzy clustering operation, which determines the membership degrees of the data in the various clusters.

Table 5 gives the membership degrees associated with the query. These degrees are derived from the membership matrix obtained by a fuzzy clustering algorithm.

Then, we apply the α-cut to each attribute to reduce the number of concepts, which yields the reduced query context presented in Table 6.

In our example, query Q seeks the data sources having the metadata $B_Q = \{C2, C3, C5\}$.
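Steps 2 and 3 of the procedure amount to thresholding the degrees of Table 5. The paper does not state the α value; this sketch assumes α = 0.2, and any α from 0.105 up to (but excluding) 0.388 would give the same result. Keeping the degrees strictly above α, as in Step 3, reproduces Table 6 and the set $B_Q$ given above.

```python
from typing import Dict, FrozenSet

# Query membership degrees of Table 5.
QUERY_DEGREES: Dict[str, float] = {
    "C1": 0.030, "C2": 0.582, "C3": 0.388,  # DB clusters
    "C4": 0.105, "C5": 0.895,               # PL clusters
}


def alpha_cut(degrees: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Keep only the clusters whose membership is higher than alpha (Table 6)."""
    return {cluster: u for cluster, u in degrees.items() if u > alpha}


def query_intent(degrees: Dict[str, float], alpha: float) -> FrozenSet[str]:
    """B_Q: the clusters describing the data sought by the query."""
    return frozenset(alpha_cut(degrees, alpha))


ALPHA = 0.2  # assumed value; the paper does not give it
print(alpha_cut(QUERY_DEGREES, ALPHA))             # {'C2': 0.582, 'C3': 0.388, 'C5': 0.895}
print(sorted(query_intent(QUERY_DEGREES, ALPHA)))  # ['C2', 'C3', 'C5']
```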

Once defined, the concept Q is inserted into the lattice using Godin's incremental construction algorithm [11, 12].
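The idea behind an incremental update can be illustrated on a crisp context. The sketch below is a deliberately simplified version, not the algorithm of [11, 12]: it relies on the fact that the intents of a concept lattice are closed under intersection, so adding an object with description X yields the old intents plus their intersections with X (X itself comes from X ∩ M, since the full attribute set M is always an intent). Extents are recomputed as the objects whose description contains the intent. Membership degrees are ignored, and the object descriptions are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Intent = FrozenSet[str]
Context = Dict[str, Intent]  # object -> set of attributes (crisp description)


def all_intents(context: Context, attributes: Intent) -> Set[Intent]:
    """Intents of a crisp context: M plus all intersections of object descriptions."""
    intents: Set[Intent] = {attributes}
    for description in context.values():
        intents |= {description & intent for intent in intents}
    return intents


def add_object(context: Context, intents: Set[Intent], name: str,
               description: Intent) -> Tuple[Context, Set[Intent]]:
    """Incremental step: the new intents are the old ones plus their meets with X."""
    new_context = {**context, name: description}
    new_intents = intents | {description & intent for intent in intents}
    return new_context, new_intents


def extent(context: Context, intent: Intent) -> FrozenSet[str]:
    """Objects whose description contains every attribute of the intent."""
    return frozenset(obj for obj, desc in context.items() if intent <= desc)


M = frozenset({"C1", "C2", "C3", "C4", "C5"})
CONTEXT: Context = {  # hypothetical crisp descriptions
    "S1": frozenset({"C2", "C5"}),
    "S6": frozenset({"C3", "C5"}),
    "S7": frozenset({"C2", "C4"}),
}
X = frozenset({"C2", "C3", "C5"})  # intent B_Q of the query concept

intents = all_intents(CONTEXT, M)
new_context, new_intents = add_object(CONTEXT, intents, "Query", X)
for intent in sorted(new_intents, key=lambda i: (len(i), sorted(i))):
    print(sorted(intent), sorted(extent(new_context, intent)))
```

The incremental result equals the set of intents recomputed from scratch on the enlarged context, which is the property an incremental algorithm must preserve.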

The initial query Q, with its criteria A1 to A4, reads:

Q:
Select Student.Id_Student, Name, Surname
From Student, Module, Module_Notes
Where Label_Module like 'DB'  -- (A1)
  and Note = 14               -- (A2)
  and Student.Id_Student = Module_Notes.Id_Student
  and Module.Id_Module = Module_Notes.Id_Module
Union
Select Student.Id_Student, Name, Surname
From Student, Module, Module_Notes
Where Label_Module like 'PL'  -- (A3)
  and Note = 13               -- (A4)
  and Student.Id_Student = Module_Notes.Id_Student
  and Module.Id_Module = Module_Notes.Id_Module

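Figure 6: Linguistic terms associated with the DB attribute
DB notes: 15, 14, 16, 7, 11, 12, 17, 9, 5, 13
C1: Weak, values 5, 7, 9, interval [Inf = 5, Sup = 9]
C2: Average, values 9, 11, 12, 13, 14, interval [Inf = 9, Sup = 14]
C3: Well, values 14, 15, 16, 17, interval [Inf = 14, Sup = 17]

Figure 7: Linguistic terms associated with the PL attribute
PL notes: 14, 15, 13, 10, 5, 11, 6, 7
C4: Weak, values 5, 6, 7, 10, interval [Inf = 5, Sup = 10]
C5: Well, values 11, 13, 14, 15, interval [Inf = 11, Sup = 15]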

Table 5: Query membership degrees
Attribute   DB                        PL
Cluster     C1      C2      C3        C4      C5
Degree      0.030   0.582   0.388     0.105   0.895

Once the query concept is built, the user can use linguistic terms in the criteria qualifying the required data and express preferences among these criteria.

To illustrate this, let us consider a user who consults the students database and wishes to find a student having an "average" note in DB and a "well" note in PL.

The user wants the criteria submitted for evaluation to be expressed as follows:

Q:
Select Student.Id_Student, Name, Surname
From Student, Module, Module_Notes
Where Label_Module like 'DB'
  and Note 'average'  -- linguistic term of Figure 6
  and Student.Id_Student = Module_Notes.Id_Student
  and Module.Id_Module = Module_Notes.Id_Module
Union
Select Student.Id_Student, Name, Surname
From Student, Module, Module_Notes
Where Label_Module like 'PL'
  and Note 'well'  -- linguistic term of Figure 7
  and Student.Id_Student = Module_Notes.Id_Student
  and Module.Id_Module = Module_Notes.Id_Module

Table 6: Query membership degrees
Attribute   DB                        PL
Cluster     C1      C2      C3        C4      C5
Degree      -       0.582   0.388     -       0.895

Proposition: A data source S is relevant for a given query $Q = (A_Q, B_Q)$ if and only if S is characterized by at least one of the metadata in $B_Q$. The relevance degree of S is the number of metadata that S shares with $B_Q$.

This definition of relevance underlies the search process detailed in the rest of this section and illustrated by an example. It differs from the neighbourhood notion used in [5, 6], which can return data sharing no metadata with the query; this does not correspond to our needs.

Given a query $Q = (A_Q, B_Q)$, all the relevant data sources lie in the extension of Q and of its subsumers in the concept lattice, since the intension of each of these concepts is included in $B_Q$ (the intension of the query concept). Figure 8 shows the query concept lattice.

5.3 Checking of the Query Realizability

We can now check whether the query is realizable. Indeed, if the query criteria contradict the dependencies extracted from the database, the query is said to be unrealizable. To check whether a query Q is realizable, we define a query concept $Q = (A_Q, B_Q)$, where $A_Q = \{Query\}$, i.e., a name for the query extension, and $B_Q$ is the set of metadata (clusters) describing the sources sought by the query.

Proposition: A query Q is unrealizable if and only if there is no data source in $A_Q$ that shares the metadata of the set $B_Q$.

Example: Let us return to the students database and suppose that, after consulting the clusters of the relievable attributes of the query, the user's criteria are expressed by the labels "DB Average" and "PL Well". The query concept is then modeled as $Q = (Query, \{DB\ Average, PL\ Well\})$.
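The relevance proposition of Section 5.2 and the realizability test can be sketched together over a table of source descriptions. In this sketch, a student is described by the clusters in which its membership is non-zero (supports read from the line diagrams of Figure 4), the relevance degree is the number of clusters shared with $B_Q$, and, following our reading of the proposition above, the query is taken to be realizable when at least one source carries all the metadata of $B_Q$. The function names are illustrative.

```python
from typing import Dict, FrozenSet

# Clusters in which each student has a non-zero membership (supports from Figure 4).
SOURCES: Dict[str, FrozenSet[str]] = {
    "S1": frozenset({"C2", "C5"}),
    "S2": frozenset({"C2", "C3", "C5"}),
    "S3": frozenset({"C2", "C5"}),
    "S4": frozenset({"C1", "C4", "C5"}),
    "S5": frozenset({"C3", "C4"}),
    "S6": frozenset({"C3", "C5"}),
    "S7": frozenset({"C2", "C4"}),
    "S8": frozenset({"C1", "C3", "C4", "C5"}),
    "S9": frozenset({"C1", "C4"}),
    "S10": frozenset({"C3", "C4"}),
}


def relevance_degree(clusters: FrozenSet[str], query_intent: FrozenSet[str]) -> int:
    """Number of metadata (clusters) a source shares with B_Q."""
    return len(clusters & query_intent)


def relevant_sources(sources: Dict[str, FrozenSet[str]],
                     query_intent: FrozenSet[str]) -> Dict[str, int]:
    """Sources sharing at least one metadata item with B_Q, with their relevance degree."""
    degrees = {s: relevance_degree(c, query_intent) for s, c in sources.items()}
    return {s: d for s, d in degrees.items() if d > 0}


def is_realizable(sources: Dict[str, FrozenSet[str]], query_intent: FrozenSet[str]) -> bool:
    """Assumed reading: realizable iff some source carries every metadata item of B_Q."""
    return any(query_intent <= clusters for clusters in sources.values())


B_Q = frozenset({"C2", "C3", "C5"})
print(relevant_sources(SOURCES, B_Q))
print(is_realizable(SOURCES, B_Q))                      # True (S2)
print(is_realizable(SOURCES, frozenset({"C1", "C2"})))  # False
```

With these supports, S2 shares three metadata with $B_Q$, S1, S3, S6 and S8 share two, S4, S5, S7 and S10 share one, and S9 is not relevant, which matches the groups of Table 7.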

5.4 Scheduling of the Resulting Answers

Relevant data sources can be sorted according to the distance separating the concepts in the lattice. This step orders the tuples obtained according to their degree of satisfaction of the initial query. To this end, we denote by $R(Q, C)$ the set of data sources relevant to the query, where $C$ is the set of formal concepts of the formal context $\mathcal{K}$.

The data appearing in the extension of Q in the query concept lattice are inserted into $R(Q, C)$.

Let $SUBS(Q, C, 1)$ be the set of direct subsumers of Q in the lattice. The data sources that appear in the extensions of the concepts of $SUBS(Q, C, 1)$ and are not already in $R(Q, C)$ are added to the result at this stage. The next step determines $SUBS(Q, C, 2)$, the set of direct subsumers of the concepts of $SUBS(Q, C, 1)$ (the subsumers at distance 2 from the query).

As for $SUBS(Q, C, 1)$, the new data sources in the extensions of the concepts of $SUBS(Q, C, 2)$ are added to $R(Q, C)$. The same operation is repeated until a set $SUBS(Q, C, n)$ is empty (the stopping condition of the algorithm). At each step, data sources appearing in a concept with an empty intension are ignored. The rank of a source in $R(Q, C)$ can be recorded according to the distance from the source (or from the first concept in which the source appears) to the query in $\Im(C \oplus Q)$.

In Figure 8, the numbers attached to the concepts give the iterations of the algorithm described above. In the first iteration, the query concept $Q = (Q_A, \{C2, C3, C5\})$ is considered. In this example, its extension is $Q_A = \{S2(0.95)\}$.

Figure 8: Query concept lattice


The second iteration adds to the result $R(Q, C)$ the students S1, S3, S6 and S8, with respective degrees 0.99, 1.00, 0.77 and 0.48, because the concepts ({S1(0.99), S2(0.95), S3(1.00), Query}, {C2, C5}) and ({S2(0.95), S6(0.77), S8(0.48), Query}, {C3, C5}), which form $SUBS(Q, C, 1)$, subsume the query concept $Q = (Q_A, \{C2, C3, C5\})$. The third iteration adds students S7, S4, S5 and S10 to $R(Q, C)$, with respective degrees 0.93, 0.48, 0.93 and 0.83, because $SUBS(Q, C, 2)$ = {({S1(0.94), S2(0.58), S3(0.99), S7(0.93), Query}, {C2}), ({S2(0.39), S5(0.93), S6(1.00), S8(0.41), S10(0.83), Query}, {C3}), ({S1(0.99), S2(0.95), S3(1.00), S4(0.48), S6(0.77), S8(0.48), Query}, {C5})}.

In the fourth iteration, $SUBS(Q, C, 3) = \{(\{S1, S2, S3, S4, S5, S6, S7, S8, S9, S10\}, \emptyset)\}$. The only concept forming $SUBS(Q, C, 3)$ has an empty intension, so no data is added to $R(Q, C)$ at this step, and the algorithm stops since $SUBS(Q, C, 4) = \emptyset$. Thus $R(Q, C)$ consists of the following data:

1) ({S2(0.95)}, {C2, C3, C5})
2) ({S1(0.99), S3(1.00)}, {C2, C5})
3) ({S6(0.77), S8(0.48)}, {C3, C5})
4) ({S4(0.48)}, {C5})
5) ({S7(0.93)}, {C2})
6) ({S5(0.93), S10(0.83)}, {C3})

The search algorithm for the relevant data sources is given below.
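The level-by-level search illustrated above can be sketched as a breadth-first walk over direct subsumers. This minimal sketch stores the query concept lattice of Figure 8 as a mapping from intents to fuzzy extents (the Query object and the bottom concept are omitted, since the bottom concept is never a subsumer of Q), takes as direct subsumers the concepts whose intents are maximal proper subsets of the current intent, skips concepts with an empty intension, and records for each student the level and degree of the first concept in which it appears. The data structures and names are illustrative.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Intent = FrozenSet[str]
Lattice = Dict[Intent, Dict[str, float]]  # intent -> fuzzy extent (student -> degree)


def direct_subsumers(intent: Intent, lattice: Lattice) -> List[Intent]:
    """Concepts whose intent is a maximal proper subset of `intent` (upper covers)."""
    smaller = [other for other in lattice if other < intent]
    return [j for j in smaller if not any(j < k for k in smaller)]


def rank_sources(query_intent: Intent, lattice: Lattice) -> Dict[str, Tuple[int, float]]:
    """Level-by-level search; each source keeps (level, degree) of its first concept."""
    result: Dict[str, Tuple[int, float]] = {}
    level, frontier = 0, {query_intent}
    while frontier:  # stop when SUBS(Q, C, level) is empty
        for intent in sorted(frontier, key=sorted):
            if not intent:  # concepts with an empty intension are ignored
                continue
            for source, degree in lattice[intent].items():
                result.setdefault(source, (level, degree))
        next_frontier: Set[Intent] = set()
        for intent in frontier:
            next_frontier.update(direct_subsumers(intent, lattice))
        frontier, level = next_frontier, level + 1
    return result


# Query concept lattice of Figure 8 (Query object and bottom concept omitted).
LATTICE: Lattice = {
    frozenset({"C2", "C3", "C5"}): {"S2": 0.95},
    frozenset({"C2", "C5"}): {"S1": 0.99, "S2": 0.95, "S3": 1.00},
    frozenset({"C3", "C5"}): {"S2": 0.95, "S6": 0.77, "S8": 0.48},
    frozenset({"C2"}): {"S1": 0.94, "S2": 0.58, "S3": 0.99, "S7": 0.93},
    frozenset({"C3"}): {"S2": 0.39, "S5": 0.93, "S6": 1.00, "S8": 0.41, "S10": 0.83},
    frozenset({"C5"}): {"S1": 0.99, "S2": 0.95, "S3": 1.00, "S4": 0.48, "S6": 0.77, "S8": 0.48},
    frozenset(): {f"S{i}": 0.0 for i in range(1, 11)},
}

ranking = rank_sources(frozenset({"C2", "C3", "C5"}), LATTICE)
for source, (level, degree) in sorted(ranking.items(), key=lambda item: item[1][0]):
    print(level, source, degree)
```

The output groups S2 at level 0, S1, S3, S6 and S8 at level 1, and S4, S5, S7 and S10 at level 2, with the degrees given in the iterations above; S9 only appears in the concept with an empty intension and is therefore never returned.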

5.5 Satisfaction Degree

The generated data are ordered according to a satisfaction degree measured with respect to the initial query. To determine this degree, we introduce the following notation:

- $P$ is the number of direct subsumer concepts (at distance $P$) from the query concept $Q = (Query, B_Q)$ in $T_f^{\oplus}$.
- $\{Arc_p : p = 1, \ldots, Nb\}$ is the set of arcs linking the query concept $Q = (Query, B_Q)$ with the concept $C_p \in T_f^{\oplus}$, $p = 1, \ldots, P$.
- $Weight(C_p, C_{p-1})$ is the similarity between the concepts $C_p$ and $C_{p-1}$.
- The satisfaction degree, noted $DS$, of a query concept $Q = (Query, B_Q)$ with the concept $C \in T_f^{\oplus}$ is given by the following equation:

$$DS = 1 - \sum_{p=1}^{Nb} \Big( \mathrm{Max}\big(Weight(C_p, C_{p-1})\big) - Weight(C_p, C_{p-1}) \Big)$$

Algorithm: Search for the relevant data sources
Begin
1. Build the concept $Q = (A_Q, B_Q)$.
2. Construct the query concept lattice $\Im(Q)$.
3. Seek in $\Im(Q)$ the new concept obtained for $Q$ after its insertion.
4. level := 0
5. $SUBS(Q, C, level) := \{Q\}$ (initialization of the search for the subsumers of Q in the lattice, by level, i.e., by distance from Q)
6. $R(Q, C) := \emptyset$ (initialization of the result)
7. While $SUBS(Q, C, level) \neq \emptyset$ do
   7.1. $R(Q, C, level) := \emptyset$
   7.2. For each concept $(A, B) \in SUBS(Q, C, level)$ do
        a. If $B \neq \emptyset$ then $R(Q, C, level) := R(Q, C, level) \cup A$
        b. End If
   7.3. End For
   7.4. $R(Q, C) := R(Q, C) \cup R(Q, C, level)$
   7.5. Build $SUBS(Q, C, level + 1)$, the set of direct subsumers of the concepts of $SUBS(Q, C, level)$
   7.6. level := level + 1
8. End While
9. Output $R(Q, C)$, the set of relevant data sources for Q, with the concept similarities as satisfaction degrees.
End

The weights $Weight(C_p, C_{p-1})$ are given by the following equation:

$$Weight(C_p, C_{p-1}) = \frac{|\varphi(C_p) \cap \varphi(C_{p-1})|}{|\varphi(C_p) \cup \varphi(C_{p-1})|}$$

In our example, we can calculate the satisfaction degrees of the various generated answers. These degrees are given in Table 7.

Table 7: Satisfaction degree of the generated answers
Students    Clusters      Satisfaction degree
S2          C2, C3, C5    100%
S1, S3      C2, C5        32%
S6, S8      C3, C5        43%
S4          C5            31%
S7          C2            33%
S5, S10     C3            6%

As shown in Table 7, the result of the query is given at several levels, according to a satisfaction degree measured with respect to the initial query.

6 Comparison With Other Approaches

In this section, we present the essential idea of the main existing flexible querying approaches closest to ours. They differ mainly in the way they find the values closest to those requested by the user and in the formalism used to model the uncertainty and imperfection of the real world.

The literature on flexible querying and cooperative systems is abundant. We can distinguish three main categories. The first category, denoted C1, includes "ad hoc" approaches specific to particular systems. Their objective is to introduce flexibility through the use of linguistic terms and the specification of the users' preferences among the various search keys of the desired data. A state of the art on C1 is given in [3] and [4]. Among the C1 approaches, we can cite the systems ARES [17], MULTOS [24], SEAVE [19] and FLEX [20].

The second category, denoted C2, uses fuzzy set theory and fuzzy logic to model not only the imperfection and uncertainty of the real world but also the evaluation of so-called vague or fuzzy queries. The main point common to these approaches is the modification of the query language, generally SQL [2, 4, 22]. This modification introduces vague linguistic terms, such as "accessible price" or "large budget", and approximate comparison operators, such as the "Near-to" and "Similar-to" operators of the CoBase system [9]. Each vague linguistic term is modelled by a fuzzy set whose membership function is most often trapezoidal.

To avoid modifying the DBMS, these systems add a layer that transforms a fuzzy query into a conventional one, known as the “wrapping query”. This query is submitted to the target DBMS for evaluation, and its result is then filtered according to the user's preferences before being presented to the user. This transformation and filtering process relies on established properties of fuzzy sets and fuzzy logic.
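The transformation and filtering principle can be sketched as follows, in a simplified form that is not the exact derivation procedure of the cited systems: the fuzzy condition is replaced by a Boolean condition on the support of its trapezoidal membership function (the notes with a non-zero degree), the DBMS evaluates this Boolean query, and the degrees are then computed outside the DBMS to filter and order the answers. The trapezoid chosen for "average", the threshold and the sample notes are hypothetical; Python's sqlite3 module plays the role of the target DBMS.

```python
import sqlite3


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


AVERAGE = (8.0, 10.0, 13.0, 15.0)  # hypothetical trapezoid for "average" (notes out of 20)
THRESHOLD = 0.3                    # hypothetical minimal degree accepted by the user

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Notes (Id_Student TEXT, Note REAL)")
conn.executemany(
    "INSERT INTO Notes VALUES (?, ?)",
    [("S1", 7.0), ("S2", 9.0), ("S3", 12.0), ("S4", 14.0), ("S5", 16.0), ("S6", 8.5)],
)

# 1) Boolean wrapping query: keep only the support (a, d) of the fuzzy term.
a, _, _, d = AVERAGE
rows = conn.execute(
    "SELECT Id_Student, Note FROM Notes WHERE Note > ? AND Note < ? ORDER BY Id_Student",
    (a, d),
).fetchall()

# 2) Outside the DBMS: compute the degrees, filter on the threshold, order the answers.
scored = [(sid, trapezoid(note, *AVERAGE)) for sid, note in rows]
answers = sorted((p for p in scored if p[1] >= THRESHOLD), key=lambda p: p[1], reverse=True)
print(answers)  # [('S3', 1.0), ('S2', 0.5), ('S4', 0.5)]
```

S6 passes the Boolean query (its note lies in the support) but is filtered out afterwards because its degree, 0.25, is below the threshold.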

The third category, denoted C3, comprises approaches that fall within the scope of artificial intelligence techniques and aim at extracting tacit knowledge from explicit data. Several systems, such as DBLEARN [14], DB-Discover [25] and GDBR [7], belong to this category. The generated knowledge takes the form of rules or of concept hierarchies. The objective of these systems is neither to relieve database queries nor to seek approximate answers in the usual sense of flexible querying, although the tacit knowledge they produce can be regarded as approximate answers. The results obtained by these approaches, in particular in clustering, are very useful for this work.

The contributions of C2 approaches, such as CoBase, are significant, in particular the notions of TAH and MTAH, which model generalization and specialization through concept hierarchies. However, we consider that these systems remain demanding for end users. For example, in CoBase, the operators require precise knowledge of the database contents, and the realizability of a query is detected only after its execution. CoBase can also generate false answers [9]. Users must also know the organization of the database, since they must specify which attributes may be relieved and the relieving level of each attribute.

In [21], no modification of SQL is necessary, which is an asset for the practical application of that approach. The user is not asked to make choices during relieving, which can be hazardous, as is the case in several systems such as FLEX, Vagueness and CoBase, to cite only those. In that approach, the relievable attributes are fixed by the database administrator. This is all the more significant since the approach is addressed to end users who do not have precise and detailed knowledge of the organization of the data they consult. It is easier for an expert to specify that a price attribute of a database table is relievable and can be used with the terms "weak" or "accessible" than to use the CoBase operator "Within" (100, 120, 150, 300). However, that approach has limits in the structures it uses, namely:

a) the incremental maintenance of the knowledge base of the relievable attributes (BCAR);
b) the clustering of the relievable attributes without fixing the number of clusters a priori; and
c) the storage of the clusters and the indexing of the MTAH.

In the proposed approach, the clusters generated for each relievable attribute are no longer stored in the DBMS catalogue, so the maintenance of this meta-base poses no problem. Indeed, to draw the lattices, it is enough to load an XML file from which all the information needed to draw them can be recovered.

An XML parser retrieves this information and rebuilds the lattice using the construction methods of these structures. The file stores:

- the title of the lattice;
- the identifiers of the concepts and their positions, together with the label styles of the objects and attributes of each concept;
- the set of data and attributes of each concept;
- the set of arcs and the concepts they connect.
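The stored information can be written and read back with any XML library. The minimal sketch below uses Python's standard xml.etree.ElementTree module; the element and attribute names (lattice, concept, object, attribute, arc) are hypothetical, since the paper does not give the file schema, and label styles are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# concept id -> (fuzzy extent, intent, (x, y) position in the drawing)
Concepts = Dict[str, Tuple[Dict[str, float], List[str], Tuple[int, int]]]
Arcs = List[Tuple[str, str, float]]  # (source id, target id, weight)


def lattice_to_xml(title: str, concepts: Concepts, arcs: Arcs) -> str:
    """Serialize a lattice; element and attribute names are hypothetical."""
    root = ET.Element("lattice", title=title)
    for cid, (extent, intent, (x, y)) in concepts.items():
        node = ET.SubElement(root, "concept", id=cid, x=str(x), y=str(y))
        for obj, degree in extent.items():
            ET.SubElement(node, "object", name=obj, degree=str(degree))
        for attribute in intent:
            ET.SubElement(node, "attribute", name=attribute)
    for source, target, weight in arcs:
        ET.SubElement(root, "arc", source=source, target=target, weight=str(weight))
    return ET.tostring(root, encoding="unicode")


def xml_to_lattice(text: str) -> Tuple[str, Concepts, Arcs]:
    """Parse the file back into the structures needed to redraw the lattice."""
    root = ET.fromstring(text)
    concepts: Concepts = {}
    for node in root.findall("concept"):
        extent = {o.get("name"): float(o.get("degree")) for o in node.findall("object")}
        intent = [a.get("name") for a in node.findall("attribute")]
        position = (int(node.get("x")), int(node.get("y")))
        concepts[node.get("id")] = (extent, intent, position)
    arcs = [(a.get("source"), a.get("target"), float(a.get("weight")))
            for a in root.findall("arc")]
    return root.get("title"), concepts, arcs


CONCEPTS: Concepts = {
    "K1": ({"S2": 0.95}, ["C2", "C3", "C5"], (0, 0)),
    "K2": ({"S1": 0.99, "S2": 0.95, "S3": 1.0}, ["C2", "C5"], (-1, 1)),
}
ARCS: Arcs = [("K1", "K2", 0.32)]

xml_text = lattice_to_xml("Query concept lattice", CONCEPTS, ARCS)
print(xml_text)
```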

This parser also solves the problem of storing the clusters and indexing the MTAH.

The clustering problem does not arise in this approach, since the proposed clustering methods both optimize the number of clusters and evaluate the quality of the resulting clusters.

7 Study of Complexity

This section presents a study of the spatial and temporal complexity of the proposed approach.

- Space complexity: we store only XML files, and the clusters of the relievable attributes are no longer stored in the BCAR, which is an asset for the practical application of this approach.

- Temporal complexity: it includes the following costs: a) the construction of the clusters of the relievable attributes, and b) the construction of the lattice and the scheduling of the approximate answers.

For the construction of the clusters of the relievable attributes, we calculated the theoretical complexity of the proposed clustering approaches. It is $O(N c^2)$, where $N$ is the number of data and $c$ is the maximum number of clusters.

For the construction of the lattice, the temporal complexity depends on the construction method adopted. Table 8 presents the complexity of some lattice construction algorithms [28]. The complexity of generating and scheduling the approximate answers is about $O(n \times level)$, where $n$ is the number of concepts of the lattice and $level$ is the number of levels in the lattice.

Table 8: Study of the temporal complexity of lattice construction algorithms
Algorithm             Temporal complexity
Bordat                O(n·|N|·(|N| + |M|)), where n is the number of concepts
Nourine and Raynaud   O(n·|N|·(|N| + |M|)), where n is the number of concepts
Ganter                O(max(|N|, |M|)·|N|·|M|)
Godin                 Quadratic in the number of elements of the concept lattice

8 Conclusions

In this paper, we have proposed an approach for flexible database querying based on an extension of ordered lattice theory.

The proposed approach consists of the following steps. The first step organizes the data in order to extract dependencies. For this, we have proposed to combine two data analysis techniques: fuzzy cluster analysis and Formal Concept Analysis based on an extension of ordered lattice theory. The first technique allows the objects of a data set to belong to several clusters simultaneously, with different membership degrees. Although this technique is very effective, it masks the mutual relationships between specific clusters of interest. We have therefore used a second data analysis method, based on an extension of ordered lattice theory, to generate these relationships. The result is an incremental Multi-attribute Type Abstraction Hierarchy (MTAH).

The second step consists of querying the database. For this, we have proposed to use the MTAH generated in the first step to seek the data sources relevant to a query.

We have also proposed a new formalism, based on concept similarity, for ordering the resulting data sources. This work can be extended to discover implication rules in databases, which future investigations should address.

References

[1] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.

[2] P. Bosc, M. Galibourg, and G. Hamon, “Fuzzy Querying with SQL: Extensions and Implementation Aspects,” Fuzzy Sets and Systems, 28:333-349, 1988.

[3] P. Bosc, L. Liétard, and O. Pivert, “Bases de Données et Flexibilité: Les Requêtes Graduelles,” Techniques et Sciences Informatiques, 17(3):355-378, 1998.

[4] P. Bosc and O. Pivert, “Some Approaches for Relational Databases Flexible Querying,” Journal of Intelligent Information Systems, 1(3-4):323-354, 1992.

[5] C. Carpineto and G. Romano, “Order-Theoretical Ranking,” Journal of the American Society for Information Science, 51(7):587-601, 2000.

[6] C. Carpineto and G. Romano, Concept Data Analysis: Theory and Applications, John Wiley & Sons, 2004.

[7] C. L. Carter, H. J. Hamilton, W. B. Hase, and C. Rivera, “GDBR: An Optimal Relation Generalization Algorithm for Knowledge Discovery from Databases,” Department of Computer Science, University of Regina, 1998.

[8] W. W. Chu, K. Chiang, C. Hsu, and H. Yau, “An Error-Based Conceptual Clustering Method for Providing Approximate Query Answers,” Communications of ACM, 39(12):216-230, 1996.

[9] W. W. Chu, H. Yang, K. Chiang, M. Minock, G. Chow, and C. Larson, “CoBase: A Scalable and Extensible Cooperative Information System,” Journal of Intelligent Information Systems, Kluwer Academic Publishers, Boston, Mass., 6(2-3):223-259, 1996.


[10] B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer-Verlag, 1999.

[11] R. Godin, R. Missaoui, and H. Alaoui, “Incremental Concept Formation Algorithms Based on Galois (Concept) Lattices,” Computational Intelligence, 11(2):246-267, 1995.

[12] R. Godin, G. Mineau, R. Missaoui, and H. Mili, “Méthodes de Classification Conceptuelle Basée Sur Les Treillis de Galois et Application,” Revue d’intelligence Artificielle, 9(2):105-137, 1995.

[13] M. Halkidi and M. Vazirgiannis, “Clustering Validity Assessment: Finding the Optimal Partitioning of a Data Set,” Proceedings of the IEEE International Conference on Data Mining (ICDM '01), San Jose, California, USA, pp. 187-194, 2001.

[14] H. J. Hamilton and D. R. Fudger, “Estimating DBLEARN’s Potential for Knowledge Discovery in Databases,” Computational Intelligence, 11(2):280-296, 1995.

[15] H. M. Jamil and F. Sadri, Recognizing Credible Experts in Inaccurate Databases, Springer Berlin/Heidelberg, 869/1994:46-55, 2006.

[16] V. N. Huynh and Y. Nakamori, “Fuzzy Concept Formation Based on Context Model,” Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies, N. Baba et al., Editors, IOS Press, Amsterdam, pp 687-691, 2001.

[17] T. Ichikawa and M. Hirakawa, “ARES: A Relational Database with the Capability of Performing Flexible Interpretation of Queries,” IEEE Transactions on Software Engineering, 12(5):624-634, 1986.

[18] S. Laszlo and N. Amedeo, “Les Treillis de Galois Pour l’organisation et la Gestion des Connaissances,” Proceedings of 11èmes Rencontres de la Société Francophone de Classification - SFC ’04, Bordeaux, France, pp. 298-301, 2004.

[19] A. Motro, “SEAVE: A Mechanism for Verifying User Presuppositions in Query Systems,” ACM Transactions on Information Systems, 4(4):312-330, 1986.

[20] A. Motro, “FLEX: A Tolerant and Cooperative User Interface to Databases,” IEEE Transactions on Knowledge and Data Engineering, 2(2):231-246, 1990.

[21] H. Ounelli and R. Belhadj, “Interrogation Flexible et Coopérative d'une BD par Abstraction Conceptuelle Hiérarchique,” Proceedings of INFORSID, pp 41-56, 2004.

[22] E. K. Park and S. Yoon, “An Approach to Intensional Query Answering at Multiple Abstraction Levels Using Data Mining Approaches,” 32nd Hawaii International Conference on System Sciences, 1999.

[23] O. Pivert, Contribution à l’interrogation Flexible de Bases de Données: Expression et Évaluation de Requêtes Floues, PhD Thesis, Université de Rennes 1, December 1991.

[24] F. Rabitti and P. Savino, “Retrieval of Multimedia Documents by Imprecise Query Specification”, Lecture Notes in Computer Science, 416:203-218, 1990.

[25] C. B. Rivera and C. L. Carter, “A Tutorial Guide to DB-Discover,” Version 2.0, Technical Report CS-95-05, University of Regina, pp. 280-296, 1995.

[26] M. Sassi, A. Grissa Touzi, and H. Ounelli, “Using Gaussians Functions to Determine Representative Clustering Prototypes,” 17th IEEE International Conference on Database and Expert Systems Applications, Poland, pp. 435-439, 2006.

[27] U. Kroszynski and J. Zhou, “Fuzzy Clustering Principles, Methods and Examples,” Technical Report, Technical University of Denmark, Department of Control and Engineering Design (IKS), 13 pages, December 1998.

[28] P. Valtchev and R. Missaoui, “Building Concept (Galois) Lattices from Parts: Generalizing the Incremental Approach,” Proceedings of the 9th International Conference on Conceptual Structures 2001, H. Delugach, G. Stumme (eds.), LNCS, Springer Verlag, 2120:290-303, 2001.

[29] R. Wille, “Lattices in Data Analysis: How to Draw them with a Computer,” I. Rival (ed.): Algorithms and Order, Kluwer, Dordrecht-Boston, pp. 33-58, 1989.

[30] K. E. Wolff, “Information Channels and Conceptual Scaling,” International Conference on Conceptual Structures (ICCS'2000), Darmstadt, 2000.

[31] L. Zadeh, “Fuzzy Sets”, Information and Control, 8(3):338-353, 1965.

A. Grissa Touzi received the Diploma of Engineering in Computer Science and the Ph.D. in Computer Science from the Faculty of Sciences of Tunis (Tunisia) in 1989 and 1994, respectively. Dr. Amel Grissa Touzi is an Assistant Professor in the Department of Technologies of Information and Communications at the National School of Engineering of Tunis (Tunisia). Her research interests include many aspects of deductive databases, fuzzy databases, and flexible querying.

M. Sassi received the Diploma of Engineering in Computer Science and the Ph.D. in Automatic and Signal Processing from the National School of Engineering of Tunis (Tunisia) in 2003 and 2007, respectively. Dr. Minyar Sassi is an Assistant Professor at the Superior Institute of Data Processing and Techniques of Communication (Tunisia). Her research interests include query optimization, clustering, and flexible querying.

H. Ounelli received a Ph.D. in Computer Science (1987) from the University of Paris-Sud Orsay (Paris, France). Since 1988, Dr. Habib Ounelli has been a Full Professor in the Computer Sciences Department of the Faculty of Sciences of Tunis (Tunisia). His research interests include fuzzy databases, flexible querying, and deductive databases.