Empty versus overabundant answers to flexible relational queries

18
Fuzzy Sets and Systems 159 (2008) 1450 – 1467 www.elsevier.com/locate/fss Empty versus overabundant answers to flexible relational queries Patrick Bosc , Allel Hadjali, Olivier Pivert IRISA/ENSSAT, University of Rennes 1, 6, rue de Kerampont, BP 80518, 22305 Lannion Cedex, France Available online 26 January 2008 Abstract When retrieving and searching desired data over large databases accessible in the web, users might be confronted with two common problems: overabundant answers and empty answers. In the former, the user is provided with an avalanche of responses that satisfy his/her query, while in the latter, no data are returned to the query asked. In this paper, we attempt to address those problems in the context of flexible queries. The basic idea behind the solutions proposed consists in modulating the fuzzy conditions involved in the user query by applying appropriate transformations. According to the problem at hand, this operation leads to a relaxation or an intensification of the user query. Two particular transformations that rely on a convenient parameterized proximity relation are introduced. The predicates of the modified queries are obtained by means of fuzzy arithmetic. The main properties of our proposal are investigated and a comparison with other approaches is outlined. © 2008 Elsevier B.V. All rights reserved. Keywords: Flexible queries; Query relaxation; Query intensification; Proximity relation; Fuzzy intervals 1. Introduction Nowadays, there is more and more interest in using the World Wide Web, especially, for searching and retrieving information over large databases that are available “on-line”. Exploiting Web-based information sources is non-trivial because the user has no direct access to the data (one cannot for instance browse the whole target database and hence it is difficult to be provided with information about the data distribution). Users in general accomplish their search using Boolean queries and an item from the database simply either matches or it does not. In such context, users may be confronted with the following two problems: no data or a very large amount of data are returned. It is worthy to note that these kinds of answers are sometimes informative but we assume here that users are interested in the values of answers, rather than in the cardinality of the set of answers. In the first case, the problem is called the empty answer problem (EAP), that is, the problem of not being able to provide the user with any data fitting his/her query. Users are frustrated by such a kind of answers since they do not meet their needs and expectations. Several approaches have been proposed to deal with this issue. Some of them are based on a relaxation paradigm that expands the scope of the query [2,14,21]. This allows the database system to return answers, related to the original query, what is more convenient than returning nothing. The second problem corresponds to the situation where the user query results in overabundant answers, i.e., there is a huge number of data that satisfy the query. In this case, users are overwhelmed by the hugeness of answers. It is pragmatically impossible Corresponding author. E-mail addresses: [email protected] (P. Bosc), [email protected] (A. Hadjali), [email protected] (O. Pivert). 0165-0114/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.fss.2008.01.007

Transcript of Empty versus overabundant answers to flexible relational queries

Fuzzy Sets and Systems 159 (2008) 1450–1467www.elsevier.com/locate/fss

Empty versus overabundant answers to flexible relational queriesPatrick Bosc∗, Allel Hadjali, Olivier Pivert

IRISA/ENSSAT, University of Rennes 1, 6, rue de Kerampont, BP 80518, 22305 Lannion Cedex, France

Available online 26 January 2008

Abstract

When retrieving and searching desired data over large databases accessible in the web, users might be confronted with twocommon problems: overabundant answers and empty answers. In the former, the user is provided with an avalanche of responsesthat satisfy his/her query, while in the latter, no data are returned to the query asked. In this paper, we attempt to address thoseproblems in the context of flexible queries. The basic idea behind the solutions proposed consists in modulating the fuzzy conditionsinvolved in the user query by applying appropriate transformations. According to the problem at hand, this operation leads to arelaxation or an intensification of the user query. Two particular transformations that rely on a convenient parameterized proximityrelation are introduced. The predicates of the modified queries are obtained by means of fuzzy arithmetic. The main properties ofour proposal are investigated and a comparison with other approaches is outlined.© 2008 Elsevier B.V. All rights reserved.

Keywords: Flexible queries; Query relaxation; Query intensification; Proximity relation; Fuzzy intervals

1. Introduction

Nowadays, there is more and more interest in using the World Wide Web, especially, for searching and retrievinginformation over large databases that are available “on-line”. Exploiting Web-based information sources is non-trivialbecause the user has no direct access to the data (one cannot for instance browse the whole target database and hence itis difficult to be provided with information about the data distribution). Users in general accomplish their search usingBoolean queries and an item from the database simply either matches or it does not. In such context, users may beconfronted with the following two problems: no data or a very large amount of data are returned. It is worthy to notethat these kinds of answers are sometimes informative but we assume here that users are interested in the values ofanswers, rather than in the cardinality of the set of answers.

In the first case, the problem is called the empty answer problem (EAP), that is, the problem of not being able toprovide the user with any data fitting his/her query. Users are frustrated by such a kind of answers since they do notmeet their needs and expectations. Several approaches have been proposed to deal with this issue. Some of them arebased on a relaxation paradigm that expands the scope of the query [2,14,21]. This allows the database system toreturn answers, related to the original query, what is more convenient than returning nothing. The second problemcorresponds to the situation where the user query results in overabundant answers, i.e., there is a huge number of datathat satisfy the query. In this case, users are overwhelmed by the hugeness of answers. It is pragmatically impossible

∗ Corresponding author.E-mail addresses: [email protected] (P. Bosc), [email protected] (A. Hadjali), [email protected] (O. Pivert).

0165-0114/$ - see front matter © 2008 Elsevier B.V. All rights reserved.doi:10.1016/j.fss.2008.01.007

P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467 1451

to sift through them. A way to reduce the number of answers consists in reconstructing the query by adding anothercondition. However, it may be difficult for the user to find a condition that effectively restricts the retrieved data.

To the best of our knowledge, only little attention, however, has been paid to the overabundant answers problem(OAP) in the literature. Ozawa and Yamada have addressed this issue in [22,23]. In the first work, they suggest a methodbased on generating macroexpressions of the queried database. These expressions allow for providing the user withinformation about the data distribution. Then, the system identifies the appropriate attribute (the attribute values ofthe data retrieved are dispersed) on which a new condition can be added to reconstruct the query. In the latter work,Ozawa and Yamada propose a cooperative approach that provides the user with linguistic answers using knowledgediscovery techniques. Such answers are described with intentional expressions and summarize the retrieved data. Fromthis information, the user can easily understand what kinds of data were retrieved and can then make a new query thatshrinks the data set according to his/her interests. Let us also mention the work done by Godfrey [17] in which hediscusses the sources of the two above described problems. He claims that finding a balancing specificity in a queryplays a central role in avoiding such problems.

In the context of flexible (or fuzzy) queries, similar problems could still arise. In this context, the EAP is defined inthe same way as in the Boolean case. Namely, there is no available data in the database that somewhat satisfies the userquery. Let us now introduce the fuzzy counterpart of the OAP. 1 It can be stated as follows: there are too many data inthe database that fully satisfy the user query. This means that satisfaction degrees of all retrieved data are equal to 1.Facing this problem, users’ desires are mainly to reduce this very large set of answers and keep a manageable subsetthat can be easily examined and exploited.

Let Q be a flexible query that contains one or several gradual predicates represented by means of fuzzy sets. There arefew works that have addressed the EAP, i.e. when Q fails to produce any answer. These works mainly aim at relaxingthe fuzzy requirements involved in the failing query. Query relaxation can be achieved by applying an appropriatetransformation to gradual predicates of a failing query. Such transformation aims at modifying a given predicate intoan enlarged one by widening its support. Andreasen and Pivert [1] have proposed a linguistic modifier-based approach.Recently, Bosc et al. [3,6] have shown how a tolerance relation modeled by an appropriate parameterized proximityrelation can provide a basis for a transformation that is of interest for the purpose of query weakening. Let us alsomention the experimental platform [8], called PRETI, in information processing which includes a flexible queryingmodule that is endowed with an empirical method to avoiding empty answers to user’s request expressing a search fora house to let. In [24], the authors consider queries addressed to data summaries and propose a method based on aspecified distance to repair failed queries.

Now to cope with the issue of overabundant answers related to Q, the idea is to carry on like above by transformingthe fuzzy constraints contained in Q in order to obtain more restrictive variants. This transformation aims at intensifyingthe query Q to make it more demanding. Shrinking the core of the fuzzy set associated to a given predicate to modifyis the basic required property of this transformation. This property allows for reducing the width of the core and theneffectively decreasing the number of answers to Q with degree 1.

In this paper, we discuss a particular transformation to intensify the meaning of a gradual predicate P. This trans-formation relies on the notion of a parameterized proximity relation. Applied to P, it aims at eroding the fuzzy setrepresenting P by the fuzzy parameter underlying the semantics of the considered proximity relation. The resultingpredicate is semantically not too far from the original one but it is more precise and more restrictive. As will be seen,the desirable property of reducing the core is satisfied. The intensification approach we propose to deal with the OAPis investigated both in single-predicate and multi-predicate queries. Basic features of EAP and OAP are pointed out.Due to their formulations and to the ways they are handled, one can view each problem as the dual of the other. Weshow that this duality relation strongly appears in the case of single-predicate queries, while for conjunctive queriesthis relation misses somewhat of its sense and becomes weaker.

The paper is structured as follows. Section 2 gives some basic definitions and introduces a fuzzy modeling of aproximity relation. Two operations that are the key tools in our approaches for query relaxation and query intensifi-cation are then described. In Section 3, we present in details the problem of empty answers and that of overabundantanswers on the one hand, and discuss how they can be solved in the case of single-predicate queries on the otherhand. Relaxation and intensification strategies to deal with those problems in case of flexible conjunctive queries areinvestigated in Section 5. Last, we briefly recall the main features of our proposal and conclude.

1 The OAP in the context of flexible queries is first introduced in [5].

1452 P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467

2. Basic notions

We give here some basic notions and definitions that are used later.

2.1. Basic definitions

As mentioned in the Introduction, we focus on flexible (or fuzzy) queries that contain gradual predicates representedby means of fuzzy sets. That kind of queries allows returning a set of discriminated elements according to their globalsatisfaction (instead of a flat set of selected elements). A typical example of a fuzzy query is: “retrieve the employeeswhich are young and well-paid”. Let now Q be a flexible query.

Definition 1 (Atomic query). Q is an atomic query (or a single-predicate query) if it contains only one predicate, i.e.,Q = P where P is a gradual predicate.

Definition 2 (Complex query). Q is a complex query (or a multi-predicate query) if it contains several predicates, i.e.,Q = P1op . . . opPk where Pi is a gradual predicate.

The symbol “op” stands for a connector that can express a conjunction (usually interpreted as a “min” in a fuzzyframework), a disjunction (usually interpreted as a “max”), etc. In our study, we only consider conjunctive queries thatare of the form Q = P1∧ · · · ∧Pk (the symbol “∧” represents the “and” connector), for instance, Q = young∧well-paid.In practice, this is the kind of queries that are often used and they may raise the problem of empty answers (since eachpredicate must be somewhat fulfilled).

Definition 3 (Subquery). Given a query Q consisting of the predicates P1∧ · · · ∧Pk , a query Q′ is called a subqueryof Q iff Q′ = Ps1∧ · · · ∧Psr and {s1, . . . , sr} ⊆ {1, . . . , k}. We say that Q′ is a proper subquery of Q if {s1, . . . , sr} ⊂{1, . . . , k}.

2.2. Absolute proximity relation

The purpose of this section is twofold. First, the notion of a parameterized absolute proximity relation is introduced.Then, we present two operations on fuzzy sets that are of interest in the problems addressed later.

Definition 4. A proximity relation (or a tolerance relation) is a fuzzy relation E on a domain X, such that for x, y ∈ X,

�E(x, x) = 1 (reflexivity),

�E(x, y) = �E(y, x) (symmetry).

The quantity �E(x, y) can be viewed as a grade of approximate equality of x with y. On a universe X which is a subsetof the real line, an absolute proximity relation can be conveniently modeled by a fuzzy relation E of the form

�E(x, y) = �Z(x − y),

which only depends on the value of the difference x − y.

The parameter Z, called a tolerance indicator, is a fuzzy interval (i.e., a fuzzy set on the real line) centered in 0, suchthat:

(i) �Z(r) = �Z(−r). This property ensures the symmetry of the proximity relation E (i.e., �E(x, y) = �E(y, x)).(ii) �Z(0) = 1 which expresses that x is approximately equal to itself to a degree 1.

(iii) The support S(Z) = {r, �Z(r) > 0} is bounded and is of the form [−�, �] where � is a positive real number.(iv) In terms of trapezoidal membership function (t.m.f.), Z can be expressed by the quadruplet (−z, z, �, �) with

� = z + � and [−z, z] denotes the core C(Z) (i.e., {r, �Z(r) = 1}) of Z.

Let us emphasize that by this kind of proximity relation we evaluate to what extent the amount x − y is close to 0. Thecloser x is to y, the closer x − y and 0 are. Classical (or crisp) equality is recovered for Z = 0 defined as �0(x − y) = 1

P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467 1453

FZ

1

a + δ b +δ

A

a b

F

B

A-z B+z

Fig. 1. Dilation operation.

if x = y and �0(x − y) = 0 otherwise. Other interesting properties of the parameterized relation E are given in [10].Furthermore we shall write E[Z] to denote the proximity relation E parameterized by Z.

2.2.1. Dilation and erosion operationsLet us consider a fuzzy set F on the numeric universe U and an absolute proximity E(Z), where Z is a tolerance

indicator. The set F can be associated with a nested pair of fuzzy sets when using E(Z) as a tolerance relation. Indeed,

(i) we can build a fuzzy set FZ close to F, such that F ⊆ FZ . This is the dilation operation,(ii) we can build a fuzzy set FZ close to F, such that FZ ⊆ F . This is the erosion operation.

Let us now show how the above two fuzzy sets can be built.

2.2.1.1. Dilation operation. Dilating the fuzzy set F by Z will provide a fuzzy set FZ defined by

�FZ (r) = sups

min(�E[Z](s, r), �F (s)) = sups

min(�Z(r − s), �F (s)) since Z = −Z

= �F⊕Z(r) observing that s + (r − s) = r.

Hence,

FZ = F ⊕ Z,

where ⊕ is the addition operation extended to fuzzy sets [12]. FZ gathers the elements of F and the elements outsideF which are somewhat close to an element in F.

Lemma 1. Given a fuzzy set F and a proximity relation E[Z], for FZ = F ⊕ Z we have F ⊆ FZ .

In other terms, the fuzzy set FZ is less restrictive than F, but still semantically close to F. Thus, FZ can beviewed as a relaxed variant of F. In terms of t.m.f., if F = (A, B, a, b) 2 and Z = (−z, z, �, �) then FZ =(A − z, B + z, a + �, b + �), see Fig. 1.

Let us emphasize that some practical applications require only one constituent part of a fuzzy set F (either the core orthe support) to be affected when applying a dilation operation. We denote by core dilation (resp. support dilation) thedilating transformation that affects only the core (resp. support) of F. The following proposition shows how to obtainsuch results.

2 Recall that if F = (A, B, a, b) then, the support S(F ) = [A − a, B + b] and the core C(F ) = [A, B]. The scalar a (resp. b) represents theleft-hand (resp. right-hand) spread of F.

1454 P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467

Proposition 1. Let F be a fuzzy set and E[Z] a proximity relation, we have

• Core dilation is obtained using the family of tolerance indicators of the form Z = (−z, z, 0, 0). 3

• Support dilation is obtained using the family of tolerance indicators of the form Z = (0, 0, �, �).

Making use of this proposition, if F = (A, B, a, b) the core dilation (resp. support dilation) leads to FZ =(A − z, B + z, a, b) (resp. FZ = (A, B, a + �, b + �)).

2.2.1.2. Erosion operation. Let Z ⊕ X = F be an equation where X is the unknown variable. Solving this equationhas extensively been discussed in [11]. It has been demonstrated that the greatest solution of this equation is given byX = F�(−Z) = F�Z since Z = −Z and where � is the extended Minkowski subtraction defined by [11,12]

�F�Z(r) = infs

(�Z(r − s)IT(�F (s))) = infs

(�E[Z](s, r)IT�F (s)),

where T is a t-norm, andIT is the R-implication induced by T and defined byIT(u, v) = sup{� ∈ [0, 1]/T(u, �)�v},for u, v ∈ [0, 1]. We make use of the same t-norm T = min as in the dilation operation which implies that IT is theso-called Gödel implication.

Let (E[Z])r = {s, �E[Z](s, r) > 0} be the set of elements that are close to r in the sense of E[Z]. Then, the aboveexpression can be interpreted as the degree of inclusion of (E[Z])r in F. This means that r belongs to F�Z if all theelements s that are close to r are F. Hence, the inclusion F�Z ⊆ F holds. This operation is very useful in naturallanguage to intensify the meaning of vague terms. Now, eroding the fuzzy set F by Z results in the fuzzy set FZ

defined by

FZ = F�Z.

Lemma 2. Given a fuzzy set F and a proximity relation E[Z], for FZ = F�Z we have FZ ⊆ F .

The fuzzy set FZ is more precise than the original fuzzy set F but it still remains not too far from F semanticallyspeaking. If F = (A, B, a, b) and Z = (−z, z, �, �) then F�Z = (A + z, B − z, a − �, b − �) provided thata�� and b��. Fig. 2 illustrates this operation. In the crisp case, F�Z = [A, B]�[−z, z] = [A + z, B − z] (whileF ⊕ Z = [A − z, B + z]).

Lemma 3. The following semantic entailment holds as well:

FZ ⊆ F ⊆ FZ.

This means that F can be associated with a pair (FZ, FZ) of fuzzy sets where the former (resp. the latter) can beconsidered as an intensified (resp. a weakened) variant of F.

Again and in order to satisfy the needs of some practical applications, only one constituent part of a fuzzy set F(either the core or the support) must be affected when applying an erosion operation. We denote by core erosion (resp.support erosion) the eroding transformation that affects only the core (resp. support) of F. The following propositionallows for obtaining such results.

Proposition 2. Let F be a fuzzy set and E[Z] a proximity relation. We have

• Core erosion is obtained using the family of tolerance indicators of the form Z = (−z, z, 0, 0). 4

• Support erosion is obtained using the family of tolerance indictors of the form Z = (0, 0, �, �).

By this proposition, if F = (A, B, a, b) the core erosion (resp. support erosion) leads to FZ = (A + z, B − z, a, b)

(resp. FZ = (A, B, a − �, b − �)).

3 In t.m.f., if the core changes, the support will also change. In this case, the parts of t.m.f. that are not affected by the dilation are the left-handand the right-hand spreads.

4 The same remark as in Proposition 1 applies for this case, i.e., only the left-hand and the right-hand spreads are unchanged.

P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467 1455

FZ1

b

A+z

a-δ b-δ

F

B-z

Aa

B

Fig. 2. Erosion operation.

3. Empty versus overabundant answers

Let Q be a flexible query and let �Q be the set of answers to Q when addressed to a regular relational database.The set �Q contains the items of the database that satisfy somewhat the fuzzy requirements involved in Q, i.e., eachitem has a strict positive satisfaction degree. Let now �∗

Q denotes the set of answers that fully satisfy the fuzzy con-ditions of Q, i.e., returned items that have the satisfaction degree equal to 1. Obviously, we have�∗

Q ⊆ �Q.

3.1. Empty answers

Here we recall the problem of empty answers in a context of flexible database querying and restate an approach todeal with that problem which relies on the parameterized proximity relation E[Z] [3,6].

Definition 5. We say that Q results in empty answers if �Q = ∅.

This means that no data in the database somewhat satisfies the fuzzy conditions involved in Q. In the literature, thisproblem is known as the EAP. It is the most popular problem approached in the field of cooperative answering. Let usnote that query relaxation is one of the basic approaches that were proposed to deal with this problem. For instance,in the Boolean querying, query relaxation consists in expanding the user query by replacing some query conditions bymore general ones [20] or by just eliminating some conditions [16].

In the context of flexible queries, query relaxation consists in modifying the fuzzy constraints contained in the queryin order to obtain less restrictive variants. Such a modification can be achieved by applying a basic transformationto all or some of the predicates of the query. Some desirable properties are required for any transformation T ↑ whenapplied to a predicate P (T ↑(P ) representing the modified predicate):

RC1: T ↑ does not decrease the membership degree for any element of the domain, i.e., ∀u ∈ domain(A), �T ↑(P )(u)��P (u) where A denotes the attribute concerned by P;

RC2: T ↑ must extend the support S(P ) of the predicate P, i.e. S(P ) = {u/�P (u) > 0} ⊂ S(T ↑(P )) ={u/�T ↑(P )(u) > 0};

RC3: T ↑ preserves the specificity of the predicate P (in order to not significantly alter its semantics) i.e. C(P ) ={u/�P (u) = 1} = C(T ↑(P )) = {u/�T ↑(P )(u) = 1}.

In [3,6], we have shown how the absolute proximity relation E[Z] (discussed in Section 2.2) can provide the basis foran appropriate transformation able to generate enlarged fuzzy predicates, then achieving query relaxation. Other toolsexist to modify a fuzzy set representing a gradual predicate. For instance, fuzzy modifiers are operators transforming afuzzy set into another one [19]. A particular modifier which is of interest for the purpose of query relaxation has been

1456 P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467

1

0

Q QiP

Fig. 3. Example of membership functions associated with Q, Qi and FP .

studied in [1]. The reader can refer to [3] for a comparative study of the modifier-based approach and our proposal. Inthe following, we present the principle of our relaxation approach in the case of atomic queries.

3.1.1. Relaxing atomic queriesLet P be a fuzzy predicate and E[Z] be an absolute proximity relation parameterized by a tolerance indicator Z of

the form (0, 0, �, �) (i.e., such that only the support of P should be changed). Dilating P allows transforming it into anenlargedfuzzy predicate P ′ defined as follows:

P ′ = T ↑(P ) = P Z = P ⊕ Z.

Clearly, the modified predicate P ′contains the elements of P and the elements outside P which are somewhat close toan element in P. Hence, the transformation T ↑ is not only a technical tool, but it is endowed with a clear semanticsinduced by that of the underlying fuzzy relation E[Z]. Moreover, we can easily check that the desirable propertiesRC1 to RC3 are satisfied by T ↑. Namely, we have: (i) ∀u, �P ′(u)��P (u); (ii) S(P ) ⊂ S(P ′); (iii) C(P ) = C(P ′).Let us note that a transformation of the same nature has already been used in the tolerant fuzzy matching setting [13].In this context, when a pattern does not match any data of a given database, it is replaced by an enlarged one thanksto a similar proximity relation. As will be seen, this paper goes beyond the work addressed by Dubois and Prade [13],in particular: (i) we investigate the basic properties of the transformation; (ii) we show how the transformation canbe iteratively applied but in a controlled way. Besides, our focus is to make use of this transformation in a compoundquery rather than a query made of a single predicate.

Principle of the approach: Let Q = P be an atomic query. If the set of answers to Q is empty, then Q is transformedinto Q1 = T ↑(P ) = P Z . This relaxation mechanism can be repeated n times until the answer to the modified queryQn = T ↑(n)(P ) = P n·Z = P ⊕ n · Z is not empty. In practice, as pointed out in [3], the only difficulty whenapplying this technique concerns its semantic limits (i.e., what is the maximum number of relaxation steps such thatthe final modified query Qn is not too far, semantically speaking, from the original one). Indeed, no intrinsic criterionis attached to this transformation which would enable to stop the iterative process when the answer still remainsempty.

Controlling the relaxation: To enable some control over the relaxation process, one solution consists in asking theuser to specify, along with her/his query, a fuzzy set FP of more or less non-authorized values in the related domain.See Fig. 3. Now, the new query to be considered writes Q′

i = Qi ∩ (FP )c where (FP )c denotes the complement ofFP . Then, the satisfaction degree of an element u becomes min(�Qi

(u), 1 − �FP(u)). Thus the incremental relaxation

process will now stop when the answer to Qi is not empty (�Qi = ∅) or when the core of the complement of the fuzzy

set associated with Qi is included in the core of FP (min(�Qi(u), 1 −�FP

(u)) = 0). The strength of this condition is toallow for filtering items from the database that are totally rejected and retrieving data as soon as they somewhat satisfythe query Q′

i .Let us note that one can attempt to use the following criterion “the support of Qi overlaps the core of FP ” instead

of the second part of the above stopping condition. This is not relevant and does not respect the relaxation principleadvocated, that is, to continue applying the transformation as long as there is a hope of retrieving data which more orless satisfy Q′

i . This case is illustrated in Fig. 3 where in the left side the support of Qi overlaps the core of FP and nosatisfactory element can be returned, but in the right side we can still carry on with the relaxation process until eitherthe set of answer is not empty or overlapping the right part of the core of FP .

P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467 1457

1

35250 Age

Fig. 4. Fuzzy predicate “young”.

Algorithm of the relaxation: This incremental relaxation technique can be sketched by Algorithm 1 (where �Qi

stands for the set of answers to Qi and Qci for the complement of the fuzzy set associated with Qi).

Algorithm 1. Incremental relaxation of an atomic querylet Q := P ;let � be an absolute tolerance value; /* Z = (0, 0, �, �) */i := 0; Qi := Q;Q′

i := Qi ∩ (FP )c;compute �Q′

i;

while �Q′i= ∅ and (C(Qc

i ) ⊂ C(FP )) dobegin

i := i + 1;Qi := T ↑(i)(P ) := P ⊕ i · Z;Q′

i := Qi ∩ (FP )c;compute �Q′

i;

endif �Q′

i = ∅ then

return �Q′i;

endif.

Particular case: Let us emphasize that for some kind of fuzzy predicates to be relaxed, the property of symmetry ofthe tolerance indicator Z is not required. Consider, for instance, the predicate P = (0, 25, 0, 10) expressing the concept“young”, as depicted in Fig. 4. Weakening P comes down to increasing the width of its support S(P ). This has to bedone in the right side of the t.m.f. of P. Then, the appropriate family of the tolerance indicators will be of the formZ = (0, 0, 0, �) that leads to T ↑(P ) = (0, 25, 0, 10 + �).

3.2. Overabundant answers

In the following, we introduce the problem of overabundant answers in the context of flexible database querying andwe show how it can be addressed by means of the parameterized absolute proximity relation E[Z].

Definition 6. We say that Q results in overabundant answers if the cardinality of �∗Q is too large.

This means that the database system returns an avalanche of responses that fully satisfy the user’s requirements. Thisis what we will call the OAP. It is worthy to note that Definition 6 is specific to flexible queries and does not make sensein the Boolean setting since flexible queries express preferences and the notion of satisfaction is a matter of degree. Inthe case of too many items that partially satisfy the query (i.e., whose degrees lie in ]0, 1[), the solution is simple andit consists in considering just an �-cut of the retrieved data with an appropriate high level. This is why our definitiononly concerns retrieved data with degree 1.

Generally, users are overwhelmed by the hugeness of retrieved responses since it is pragmatically impossible to siftthrough them. Then, users would like to reduce this large set of answers and to keep a manageable subset. This problemoften stems from the specificity of the user query that is too general. In other terms, fuzzy requirements involved inthe query are not restrictive enough. To counter this problem, one can refine the query to make it more specific, so as

1458 P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467

to return a reasonable set of items. This refinement consists in intensifying the fuzzy constraints of the query in orderto reduce the set �∗

Q. To achieve this task, a fundamentally required property of the intensification mechanism is tosignificantly shrink the cores of the fuzzy sets associated with the conditions of the query.

As in the case of query relaxation, query intensification can then be performed by applying a basic transformationT ↓ on all or some predicates of the query. This transformation can be applied iteratively if necessary. Three basicproperties are required for any transformation T ↓ when applied to a predicate P(T ↓(P ) representing the intensifiedpredicate):

IC1: T ↓ does not increase the membership degree for any element of the domain, i.e., ∀u ∈ domain (A), �T ↓(P )(u)��P (u) where A denotes the attribute concerned by P;

IC2: T ↓ must shrink the core C(P ) of the fuzzy predicate P, i.e., C(T ↓(P )) ⊂ C(P );IC3: T ↓ preserves the left-hand (resp. right-hand) spread of the fuzzy predicate P, i.e., if P = (A, B, a, b), T ↓(P ) =

(A′, B ′, a′, b′) with A < A′, B > B ′, a = a′ and b = b′, and A′ − A < a and B − B ′ < b.

The second property allows for reducing the width of the core and then effectively decreasing the number of answerswith degree 1 (i.e., the set �∗

Q). The last property guarantees that the data excluded from the core of P remain in itssupport.

In the following, we show how the notion of parameterized absolute proximity relation E[Z] can provide the basisfor an intensification transformation T ↓. We first investigate the case of an atomic query.

3.2.1. Intensifying atomic queriesLet P be a fuzzy predicate and E[Z] be a proximity relation parameterized by a tolerance indicator Z of the form

(−z, z, 0, 0). Making use of the erosion operation, the predicate P can be transformed into a restricted fuzzy predicateP ′ defined as follows:

P ′ = T ↓(P ) = PZ = P�Z.

This transformation aims at reinforcing the meaning of the vague concept expressed by P. As previously mentioned,the resulting predicate P ′ contains elements r such that all elements that are close to r are in P. This explains thatour transformation is not simply a technical operator acting on the membership degrees but it is equipped with a clearsemantics as well. Now, if P = (A, B, a, b) then T ↓(P ) = (A + z, B − z, a, b), see Fig. 2. As can be checked, theproperties IC1 to IC3 hold. Namely, we have (i) ∀u, �T ↓(P )(u)��P (u); (ii) C(T ↓(P )) ⊂ C(P ); (iii) the left-hand(resp. right-hand) spread is preserved.

Principle of the approach: Let Q = P be a flexible query containing a single fuzzy predicate P. Assume that �∗Q

is too large. In order to reduce the cardinality of �∗Q, we transform Q into Q1 = T ↓(P ) = P�Z. This intensification

mechanism can be applied iteratively until the database returns a manageable set of answers to the modified queryQn = T ↓(n)(P ) = P�n · Z. An implicit measure of nearness such that Qk is nearer to Q than Ql if k < l is theninduced by this intensification strategy to atomic queries.

Let us take a look at the subset of �∗Q resulting from the intensification process. The items of that subset (which are

the answers returned to the revised query Qn) can be viewed as the best answers to the original query Q since theyconstitute the typical values (i.e., prototypes) of the concept expressed by the fuzzy set associatedto Q.

Controlling the intensification: To our opinion, semantic limits of an intensification process are not as crucialas in the case of query relaxation. Indeed, the intensification process of interest only aims at reducing the largeset of answers; not at finding alternative answers. It is worthy, however, to emphasize that the query refinementmust stop when the upper bound and the lower bound of the core (of the modified query Qi) are equal to (A +B)/2. Indeed, the t.m.f. associated to Qi is (A + i · z, B − i · z, a, b). Now, since A + i · z�B − i · z holdswe have i�(B − A)/2z. This means that the maximal query refinement is obtained when the core is reduced to asingleton.

Let us note that the risk to obtain an empty set of answers during the process is excluded when a > z and b > z

(since the data that have been eliminated from �∗Qi−1

related to Qi−1 still belong to the support of Qi . Hence, thosedata are responses to Qi .). Now if a too specific query arises and returns �∗

Q = ∅, one can back up and try anothervariation (for instance, adjust the tolerance parameter Z).

P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467 1459

Algorithm of the intensification: Algorithm 2 formalizes this intensification approach

Algorithm 2. Atomic query intensification.let Q := P ;let Z = (−z, z, 0, 0) be a tolerance indicator;i := 0; Qi := Q;compute �∗

Qi;

while (|�∗Qi

| is too large and i � (B − A)/2 · z) dobegin

i := i + 1;Qi := T ↓(i)(P ) := P�i · Z;compute �∗

Qi;

endreturn �∗

Qi;

Particular cases: Let us note that for some kinds of atomic queries to be intensified, the property of symmetry ofthe tolerance indicator Z is not required. Consider, for instance, the query Q = P where P = (0, 25, 0, 10) whichexpresses the concept “young” in a given context, see Fig. 4. Intensifying P comes down to reduce the width of thecore C(P ) and thus to come closer to the typical values of P. As can be seen, the left part of C(P ) contains the typicalvalues of the concept “young”. Then, the intensification transformation must only affect the right part of C(P ) andpreserve entirely its left part. The appropriate form of Z allowing this transformation is (0, z, 0, 0) which leads toT ↓(P ) = P�Z = (0, 25 − z, 0, 10).

Consider now a query Q of the form Q = ¬P where ¬ stands for the negation (�¬P (u) = 1−�P (u), ∀u) and assumethat it results in overabundant answers. To solve this problem, one applies the intensification mechanism proposed toQ, i.e. shrinking the core of ¬P. It is easy to check that this transformation is equivalent to extending the support of P.This means that applying T ↓ to Q comes down to applying T ↑ to P. So, we have T ↓(¬P) = T ↑(P ). One can easilyverify that T ↑ (¬P) = T ↓(P ) holds as well.

3.2.2. Comparison with other modifier-based approachesAs mentioned in Section 3.1, modifying a linguistic term P represented by a fuzzy set can be also achieved by means

of fuzzy modifiers. Such operators are used either to weaken or to intensify a linguistic term P. Here, we only focuson modifiers leading to an intensifying transformation. Among this type of modifiers two are very popular: poweringmodifiers and shifting modifiers [19]. Let m be a fuzzy modifier (m(P ) denotes the modified predicate):

(i) Powering modifiers: They operate on the membership degrees of the fuzzy set. A particular family of such modifierswhich is of great interest to our problem is that leading to a decrease in the degrees of membership (called therestrictive modifiers). In a formal way, we have

∀u, �m(P )(u) = (�P (u))n with n > 1.

For instance, �very(P )(u) = (�P (u)2). It is easy to see that the entailment �P (u) = 1 ⇒ (�P (u))n = 1 alwayshold. This means that the core of P is never affected by the modification (the property IC2 is not satisfied). Hence,such modifiers are inappropriate for our problem.

(ii) Shifting modifiers: They shift the membership function n units to the left or to the right in the universe of discourse.Formally, we have

∀u, �m(P )(u) = �P (u − n) with n in R.

If P is partially increasing and partially decreasing, m(P ) can never be a subset of P, i.e., the two terms denotedifferent (possibly overlapping) categories. Then, the property IC1 does not hold. Note that if P is non-increasing(resp. non-decreasing) and n < 0 (resp. n > 0), the modified predicate m(P ) satisfies both properties IC1 andIC2. In this particular case, one could then use this approach to solve the problem at hand. But a major shortcomingof this approach lies in the lack of semantics, whereas semantics is our starting point.

In [7], another interesting family of linguistic modifiers m (such as “really”) has been proposed to reinforce themeaning of P. Such modifiers are defined in the following way:

�m(P )(u) = max(0, min(1, (u − A + a)/ · a, (u − B − b)/ · b)),

1460 P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467

Table 1Basic properties

Criteria Behavior

(i) Symmetrical intensification/relaxation by nature(ii) Attribute domain-dependent and

predicate membership function-independent(iii) Still effective in the crisp case

with ∈]1, (B −A+ a + b)/(b + a)[ and P = (A, B, a, b). Let us take a closer look at the modified predicate m(P ).It is easy to check that S(m(P )) = S(P ) (support preserving) and C(m(P )) ⊂ C(P ) (core reducing). This meansthat both properties IC1 and IC2 are satisfied by the transformation induced by this modifier. Hence, one could usethis approach to deal with the problem at hand as well. Nevertheless, the approach suffers from the similar flaw asabove.

Let us also note that one can use a t-norm T to intensify a fuzzy term P. Unfortunately, this approach is not appropriatefor solving our problem since ∀T, T(1, 1) = 1 (which means that the modification never affects the core of P ).

3.3. Basic features of the two approaches

In this section, we investigate the main features of the two approaches proposed for dealing with the EAP and OAPwhen considering atomic queries. This comes down to studying the properties of the transformations T ↑ (used forquery relaxation) and T ↓ (used for query intensification). To do this, we point out three criteria that seem to be of amajor importance from a user point of view:

(i) Transformation nature: It consists in checking whether the transformation effect in the right and left parts of thet.m.f. of the predicate is similar or not.

(ii) Impact of the domain and the predicate: It consists in verifying whether the attribute domain and the shape (orrelative position) of the predicate membership function can have some impact on the transformation effect.

(iii) Applicability in the crisp case: It consists in verifying if the transformation is still valid for predicates expressedas crisp (or traditional) intervals.

We give hereafter the details of the behavior of the intensification transformation T ↓ with respect to the above threecriteria (the transformation T ↑ satisfies the same properties than T ↓ as we will see):

Criterion (i): By considering the t.m.f. of T ↓(P ), it is easy to see that the effect of the intensification over thecore of P in the right and the left parts is the same and amounts to z. This means that the resulting intensificationis of a symmetrical nature. In the similar way, we can check that T ↑ leads also to a symmetrical relaxation.Criterion (ii): As illustrated in Fig. 5 where P1 and P2 are two predicates related to the same attribute, the relativeposition of the membership function (in the domain of the attribute) has no impact on the intensifying effect. 5

However, the attribute domain is identified as a major factor affecting the intensification because the parameter zis an absolute value which is added and subtracted (for instance, z will be different for the attribute “age” and theattribute “salary”).The same property holds for the transformation of relaxation T ↑.Criterion (iii): It can be easily checked that our transformation T ↓ is still valid for crisp predicates. For example,if P = (22, 30, 0, 0) representing a crisp predicate, then T ↓(P ) writes (22 + z, 30 − z, 0, 0).Also, T ↑ still remains applicable when relaxing conventional queries. For instance, we have T ↑(P )=(22, 30, �, �).

In Table 1, we summarize the behavior of our approaches with respect to the above criteria.

5 It may happen that a user makes use of different tolerance parameters (z) for the same attribute. For instance, one can use a parameter z1 forthe predicate “extremely expensive” that is different from the parameter z2 used for the predicate “quite cheap”. The same remark applies to therelaxation parameters (�).

P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467 1461

z

P1

T↓(P1) T↓(P2)

P2

zz z

Fig. 5. Impact of the slopes and the relative position of the membership functions.

Table 2Duality between EAP and OAP

Definition Handling

EAP On the basis of the emptiness of answer set (with degree > 0) Based on relaxationOAP On the basis of the hugeness of answers (with degree 1) Based on intensification

3.4. Duality

Let Q be a flexible atomic query and let us state the EAP and OAP in the following manner:

EAP : arises when the set of answers to Q with degree > 0 is empty;

OAP : arises when the set of answers to Q with degree 1 is huge.

Now regarding those definitions, one could view OAP as the dual of EAP (and vice versa). See Table 2. Moreover,when considering the way of handling each problem this duality strongly appears. Indeed, EAP is approached on thebasis of the relaxation principle, while OAP is solved on the basis of the intensification principle.

As shown in Sections 3.1 and 3.2, this duality between EAP and OAP is fully meaningful in the case of atomicqueries. In contrast, for conjunctive queries this duality gets fragile and misses somewhat its sense as it will be shownin the next section.

4. Case of conjunctive fuzzy queries

A conjunctive fuzzy query Q is of the form P1∧ · · · ∧Pk , where the symbol “∧” stands for the connector “and” andit is interpreted by the “min” operator 6 and Pi is a fuzzy predicate.

It is worthy to notice that Q can be modified by means of a transformation (in order to perform a relaxation orintensification) in two distinct ways:

(i) a global query modification which consists in applying uniformly a transformation Tj to each predicate Pj . Givena set of transformations {T1, . . . , Tk} and a conjunctive query Q = P1 ∧ · · · ∧ Pk , the set of modified queries

6 Of course, we can use any other t-norm for interpreting this connector.

1462 P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467

related to Q resulting from applying {T1, . . . , Tk} is

{T (i)1 (P1) ∧ · · · ∧ T

(i)k (Pk)},

where i > 0 and T(i)j means that the transformation Tj is applied i times.

(ii) a local query modification which affects only some predicates (or subqueries). Given a set of transformations{T1, . . . , Tk} and a conjunctive query Q = P1 ∧ · · · ∧ Pk , the set of modified queries related to Q resulting fromapplying {T1, . . . , Tk} is

{T (i1)1 (P1) ∧ · · · ∧ T

(ik)k (Pk)},

where ih �0, T(ih)j means that the transformation Tj is applied ih times to Pj and T 0

j (Pj ) = Pj .

4.1. Relaxing strategy

Let us first recall that in the Boolean case, we distinguish two main approaches that can be used to relax a conjunctiveBoolean query when it fails to produce any answer:

(i) Relaxing by generalization: It consists in transforming query conditions into more general ones[20];

(ii) Relaxing by removal: It aims at deleting some parts of the query in order to obtain a subquery which is lessconstraining [16].

The second approach leads to a subquery which can be considered as an extreme generalization of the query at hand.As will be seen, our relaxation method consists in transforming a failing query into a more general query. This querygeneralization is obtained by replacing some query predicates by more general ones. As stressed in [3], local querymodification leads to modified queries that are closer, semantically speaking, to the original failing query than theones obtained on the basis of the global modification. This is why in our relaxation strategy we make use of the localmodification approach. Let us first introduce some definitions and propositions before sketching the principle of thestrategy.

Proposition 3. Given a query Q = P1 ∧ · · · ∧ Pk , if a subquery Q′ of Q fails, then the query Q itself mustfail.

Definition 7. Let Q = P1 ∧ · · · ∧ Pk be a failing query, Q′ a failing subquery of Q is minimal iff no subquery of itfails.

In general, a failing query can have one or several Minimal Failing Subqueries (MFSs). Identifying MFSs has beenconsidered by several authors [9,15,16,18] as a means of providing cooperative answers to failing Boolean databasequeries. In particular, an efficient algorithm is proposed in [16] to find an MFS of a query of k conjuncts. This algorithmthat proceeds depth-first and top-down, is polynomial and runs in O(k) time. It has been also shown that finding allMFSs of a query is intractable and is NP-complete. However, finding l MFSs, for any fixed l(l < k), can be done inacceptable time.

Let Q = P1 ∧ · · · ∧ Pk be a failing query, T ↑(Q) a relaxed query of Q and SQ[j ] a subquery of Q obtained bydeleting the predicate Pj from Q. Let also mf s(Q) = {Pl1 ∧ · · · ∧ Plm1 , . . . , Pl1 ∧ · · · ∧ Plmh

} be the set of MFSs of Q

with {l1, . . . , lmr } ⊂ {1, . . . , k} for 1�r �h. To characterize the set of MFSs of T ↑(Q) with respect to that of Q, weintroduce the following propositions [4].

Proposition 4. If T ↑(Q) = SQ[j ] ∧ T↑j (Pj ) with j ∈ {l1, . . . , lmr } and 1�r �h, then the MFSs of T ↑(Q) must be

searched in mf s(Q) by substituting T↑j (Pj ) for Pj in each element of mf s(Q).

P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467 1463

T1↑(2)(P1)∧ P2

T1↑(1)(P1)∧ P2 P1∧T2

↑(1)(P2)

P1∧T2↑(2)(P2)

T1↑(1)(P1)∧T2

↑(2)P2T1↑(2)(P1)∧T2

↑(1)P2

T1↑(1)(P1)∧T2

↑(1)P2

P1 ∧ P2

T1↑(3)(P1)∧P2 P1∧T2

↑(3)(P2)

Fig. 6. Lattice of relaxed queries (limited to three levels).

Proposition 5. If T ↑(Q) = SQ[j ] ∧ T↑j (Pj ) with j /∈ {l1, . . . , lmr } for 1�r �h, then mf s(Q) is also the set of MFSs

of T ↑(Q).

Example. Assume that Q = P1 ∧ P2 ∧ P3 ∧ P4 and mf s(Q) = {P1 ∧ P3, P1 ∧ P4}. Then,

• If T ↑(Q) = T↑1 (P1)∧SQ[1] = T

↑1 (P1)∧P2∧P3∧P4, the MFSs of T ↑(Q) are searched in {T (P1)∧P3, T (P1)∧P4}.

• If T ↑(Q) = SQ[2] ∧ T↑2 (P2) = P1 ∧ T

↑2 (P2) ∧ P3 ∧ P4, the MFSs of T (Q) are the MFSs of Q.

As pointed out in [3], local query modification of a query Q = P1 ∧ · · · ∧Pk (where all conditions involved in Q areof the same importance for the user) leads to an ordering (≺) between the revised queries related to Q. That orderingcan be defined on the basis of the number of transformations applied. If Q′ and Q′′ are two relaxed queries of Q, wesay that

Q′ ≺ Q′′ ifk∑

i=1

count(T ↑i inQ′) <

k∑

i=1

count(T ↑i inQ′′).

The set of modified queries related to Q (i.e., {T (i1)1 (P1) ∧ · · · ∧ T

(ik)k (Pk)}) can be then organized in a structure of

lattice. For instance, the lattice associated with the weakening of the query “P1 ∧ P2” is given in Fig. 6.In practice and in order to find a relaxed query related to Q = P1 ∧ · · · ∧ Pk that returns non-empty answers, we

have to deal with the following three main issues when using a local query modification:

(i) Define an intelligent way to exploit the lattice of weakened queries.(ii) Guarantee the property of Equal Relaxation Effect for all fuzzy predicates.

(iii) Study the user behavior with respect to the relaxation process, i.e., to what extent does the user have to intervenein this process?

Due to space limitation, we will not investigate the two latter issues. They have been discussed in [6]. However, asit will be seen in Section 4.2, we will discuss the second issue with respect to query intensification (i.e., property ofEqual Intensification Effect). In what follows, we only show how the scanning of the lattice can be done in an efficientway by exploiting the MFSs of the failed original query.

Scanning the lattice: In general, the search techniques in a lattice are time consuming and result in algorithms whosetime complexity is exponential due to the very large search space. To make more efficient our search over the lattice,we exploit the MFSs of the failed original query. Our search technique consists in two steps:

• Step 1: enumerating l MFSs of Q. To do this, we make use the algorithm proposed in [16] which is designed forcomputing l MFSs in acceptable time (when l is not too large). Such an algorithm can be easily adapted in the case offlexible queries with no main changes as pointed out in [4]. By Propositions 4 and 5, we run only once this algorithmsince the MFSs of any relaxed query T ↑(Q) can be deduced from those of Q.

1464 P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467

• Step 2: an intelligent search technique. Information about MFSs allows for providing an intelligent search techniqueover the lattice by avoiding evaluating some nodes. Indeed, the node in the lattice that preserves at least one MFS ofits father-node (that is the node from which it is derived) does not have to be evaluated (since we are certain that itfails). This technique is sketched in Algorithm 3 where:

Algorithm 3. MFSs-based search for a relaxed query with non-empty answers.Let Q = P1 ∧ P2 ∧ · · · ∧ Pk ;mf s(Q) = {Pl1 ∧ Pl2 ∧ · · · ∧ Plm1 , . . . , Pl1 ∧ Pl2 ∧ · · · ∧ Plmh

};found := false; i := 1;while (i � · k) and (not found) do

beginLevel(i) := {Q1

i , . . . , Qni

i };for req in Level(i) do

beginmodif :=true;for a_mfs in mfs(father(req)) do

modif :=modif ∧(a_mfs /∈ req);if modif then

if evaluate(req) thenbegin Level(i) = ∅; found :=true end;

if Level(i) = ∅ then compute mfs(req);end;

i := i + 1;end;

If found thenreturn req; /* req is a relaxed query of Q that returns non-empty answers*/

• Level(i) stands for the set of relaxed queries in the level i of the lattice;• father(Q) is the set of nodes from which Q can be derived, i.e., Q is an immediate relaxed variant of any query

contained in father(Q). For instance, in Fig. 6, if Q = T (P1) ∧ T (P2), then father(Q) = {T (P1) ∧ P2, P1 ∧T (P2)};

• evaluate(Q) is a function that evaluates Q against the database. It returns true if Q produces some answers, falseotherwise;

• 7 represents the maximal number of relaxation steps for each predicate Pj (j = 1, k). Hence, the maximal

relaxation of a query Q = P1 ∧· · ·∧Pk is the modified query given by T ↑(max)(Q) = T↑()1 (P1)∧· · ·∧T

↑()k (Pk).

This implies that the lattice is bounded and maximally contains · k levels.

As can be seen, to find the relaxed query with non-empty answers, several follow-up queries must be evaluatedagainst the database. One way to substantially reduce the cost of this evaluation is: (i) to first evaluate the lower boundof relaxed queries, i.e., T ↑(max)(Q) and store the resulting items; (ii) then, each follow-up query will be evaluated onthe basis of the result of T ↑(max)(Q).

4.2. Intensifying strategy

Let Q = P1 ∧ · · · ∧ Pk be a conjunctive fuzzy query. Assume that the set of answers �∗Q to Q is too large. Now

the problem of interest is to reduce the set �∗Q in order to obtain a manageable subset that can be easily examined and

exploited. To do so, one can envisage two options:

(i) Reconstructing a new query by adding a constraint to Q. However, it may be difficult to the user to find a conditionthat effectively reduces the set �∗

Q.(ii) Reinforcing the fuzzy requirements involved in Q by applying an appropriate transformation.

The advantage of the second way is the fact that it needs no access to the target database and only acts on the predicatesinvolved in Q. To describe our intensification strategy, let first consider the following proposition.

7 can be fixed for instance by the user.

P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467 1465

Proposition 6. If Q = P1 ∧ · · · ∧ Pk results in overabundant answers, then each subquery of Q results also inoverabundant answers.

Lemma 4. If Q = P1 ∧ · · · ∧ Pk results in overabundant answers, then each predicate Pi of Q results also inoverabundant answers.

This means that if the OAP arises in a conjunctive query Q, it then arises in each proper subquery of Q. Then, thenotion of MFSs (discussed in Section 4.1 to solve the EAP) has no sense and appears useless for solving the OAP.Furthermore, no counterpart of this notion may exist to be used in looking for a solution to this problem.

It is worthy to notice that solving the OAP consists in reducing the cardinality of the set �∗Q, not in finding alternative

answers. Then, the global query modification which is simple and easy to implement is suitable to achieve this goal inan efficient way. Given a set of intensification transformations {T ↓

1 , . . . , T↓k }, the set of intensified queries related to Q

writes

{T ↓(i)1 (P1) ∧ · · · ∧ T

↓(i)k (Pk)}.

In addition, two arguments are in favor of this kind of modification. First, it operates uniformly on all the predicates ofthe query. Second, it leads to a fast query intensification (i.e., allows for obtaining a reduced subset of �∗

Q in few steps)by acting on all predicates. Unlike the EAP, local query modification is not relevant in the OAP.

Let us note that the ordering (≺) introduced in Section 4.1 may be defined also for intensified queries. Let Q′ andQ′′ be two intensified queries of Q, we say that

Q′ ≺ Q′′ ifk∑

i=1

count(T ↓i inQ′) <

k∑

i=1

count(T ↓i inQ′′).

This ordering allows for introducing a semantic distance between queries. For that semantic distance to make sense,it is desirable that the set of transformation {T ↓

1 , . . . , T↓k } fulfills the property of Equal Intensification Effect (EIE)

on all predicates. Several ways can be used for defining this property. A possible definition is to consider the ratio ofthe lengths of the cores associated to the original and the modified fuzzy predicates. This ratio must be of the samemagnitude for all the predicates involved in Q, when the transformations {T ↓

1 , . . . , T↓k } are applied. One main reason

to express the EIE property in terms of the core is due to the fact that only that part of the predicates is crucial forsolving the problem at hand. Let �(Pi, T

↓i (Pi)) denotes this ratio when T

↓i is applied to Pi , we have

�(Pi, T↓i (Pi)) = L(C(T

↓i (Pi)))/L(C(Pi)),

where L(C(Pi)) (resp. L(C(T↓i (Pi)))) stands for the length of C(Pi) (resp. C(T

↓i (Pi))). A simple calculus enables

to obtain (with Pi = (Ai, Bi, ai, bi) and Zi = (−zi, zi, 0, 0)):

�(Pi, T↓i (Pi)) = 1 − 2zi/(Bi − Ai).

Now, given k predicates P1, . . . , Pk , the EIE property for a set of transformation {T ↓1 , . . . , T

↓k } can be expressed as

follows:

�(P1, T↓1 (P1)) = · · · = �(Pk, T

↓k (Pk)).

Thus, to start the intensification process of a conjunctive query Q = P1 ∧ · · · ∧ Pk , we have to initialize the tolerancevalue zi associated to each predicate Pi . Using the EIE property described above, this initialization reduces to set onlyone value zl (1� l�k) and the other values are automatically deduced. Algorithm 4 sketches our query intensificationtechnique.

Remark. Let Q′ be an atomic subquery (i.e., it contains one predicate Pi , i = 1, k) of Q = P1 ∧ · · · ∧ Pk . Sincereducing the set of answers �∗

Q′ related to Q′ leads also to restrict the set �∗Q, one can intend to solve the OAP by

1466 P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467

iteratively intensifying only the subquery Q′ (rather than all the query Q) until it returns a reasonable subset of �∗Q′ .

In practice, the question that can be raised by this approach is which predicate to choose for intensification (makinga good choice might accelerate the reduction of �∗

Q). Obviously, this question does not arise when applying a globalquery modification.

Algorithm 4. Conjunctive query intensificationLet Q = P1 ∧ · · · ∧ Pk ; /* with Pi = (Ai, Bi , ai , bi ) ∗ /

/* Initialization step */Choose a predicate Pl(1� l �k);(1): Set the tolerance indicator Zl associated to Pl ; /* Zl = (−zl , zl , 0, 0) */j := 1;While (j �k and j = l) do

Compute the tolerance indicator Zj associated to Pj ; /* using the EIE property *//* Intensification process */i := 0; Qi := Q; impossible :=false;compute �∗

Qi;

while { (not impossible) and (|�∗Qi

| is too large) } dobegin

i := i + 1;for j = 1 to k do

if (i � (Bj − Aj )/2 · zj ) then compute T↓(i)j (Pj ) := Pj�i · Zj

else impossible :=true;endif;

if (not impossible) thenbegin

Qi := T↓(i)

1 (P1) ∧ · · · ∧ T↓(i)k (Pk)

compute �∗Qi

end;end;

if (impossible) then adjust the tolerance indicator Zl , goto (1);return �∗

Qi;

4.3. Discussion

Our aim in this section is to investigate whether the EAP and OAP in the case of conjunctive queries continue tobehave as two dual problems. Let us first remark that

(i) In solving the OAP, the modified queries are obtained on the basis of global query modification. While local querymodification seems more relevant to solve the EAP.

(ii) MFSs (or in general the subqueries) of a conjunctive query Q are the key tools in solving EAP. In contrast, MFSsare useless when addressing OAP. Furthermore, we claim that no counterparts of such notions could be used tosolve this latter problem.

From the point of view of handling and due to the above two features, it appears that the duality that holds betweenEAP and OAP misses somewhat its sense and goes weaker when considering conjunctive queries.

5. Conclusion

Introducing fuzziness in queries addressed to regular databases has improved the expressive power of the Booleanquerying and contributed to finding more answers. Nevertheless some problems still happen in the framework of fuzzyquerying. In this paper, two common problems that could arise when users attempt to exploit Web large databases havebeen addressed. We have shown how they can be automatically approached. The key tool of the proposed approachesis a tolerance relation expressed by a convenient parameterized proximity relation. Such relation can provide the basisto achieving query intensification/relaxation of the user query. The main advantage of our proposal is the fact that itoperates only on the conditions involved in the user query without adding new conditions or performing any summarizingoperation nor using the data distribution of the queried database. Such approaches can be useful to construct intelligentinformation retrieval systems that provide the user with cooperative answers.

P. Bosc et al. / Fuzzy Sets and Systems 159 (2008) 1450–1467 1467

Acknowledgments

The authors are grateful to the referees for their valuable comments and suggestions, which have greatly improvedthe paper.

References

[1] T. Andreasen, O. Pivert, On the weakening of fuzzy relational queries, in: First 8th Internat. Symp. on Methods for Intelligent Systems, Charlotte,USA, 1994, pp. 144–151.

[2] F. Benamara, P. Saint Dizier, Advanced relaxation for cooperative question answering, in: M. Maybury (Ed.), New Directions in QuestionAnswering, AAAI/MIT Press, Cambiridge, MA, 2004(Chapter 21).

[3] P. Bosc, A. HadjAli, O. Pivert, Towards a tolerance-based technique for cooperative answering of fuzzy queries against regular databases,Seventh Internat. Conf. CoopIS, Lecture Notes in Computer Science, Vol. 3760, Springer, Berlin, 2005, pp. 256–273.

[4] P. Bosc, A. HadjAli, O. Pivert, Relaxation paradigm in a flexible querying context, Seventh Internat. Conf. on Flexible Query AnsweringSystems (FQAS’06), Milano, Lecture Notes in Computer Science, Vol. 4027, Springer, Berlin, 2006, pp. 39–50.

[5] P. Bosc, A. HadjAli, O. Pivert, About overabundant answers to flexible queries, in: 11th Internat. Conf. on Information and Processing andManagement of Uncertainty in Knowledge-based Systems (IPMU’06), Paris, 2006, pp. 2221–2228.

[6] P. Bosc, A. HadjAli, O. Pivert, Weakening of fuzzy relational queries: an absolute proximity relation-based approach, J. Mathware Soft Comput.14 (1) (2007) 35–55.

[7] B. Bouchon-Meunier, Stability of linguistic modifiers compatible with a fuzzy logic, in: Uncertainty in intelligent systems, Lecture Notes inComputer Science, Vol. 313, 1988, pp. 63–70.

[8] M. de Calmès, D. Dubois, E. Hullermeier, H. Prade, F. Sedes, Flexibility and fuzzy case-based evaluation in querying: an illustration in anexperimental setting, Internat J. Uncertainty Fuzziness Knowledge-based Systems 11 (1) (2003) 43–66.

[9] F. Corella, S.J. Kaplan, G. Wiederhold, L. Yesil, Cooperative responses to Boolean queries, in: First Internat. Conf. on Data Engineering, 1984,pp. 77–85.

[10] D. Dubois, A. HadjAli, H. Prade, Fuzzy qualitative reasoning with words, in: P.P. Wang (Ed.), Computing with Words Series of Books onIntelligent Systems, Wiley, New York, 2001, pp. 347–366.

[11] D. Dubois, H. Prade, Inverse operations for fuzzy numbers, in: Proc. of IFAC Symp. on Fuzzy Information Knowledge Representation andDecision Analysis, Marseille, 1983, pp. 391–395.

[12] D. Dubois, H. Prade, Possibility Theory, Plenum Press, New York, 1988.[13] D. Dubois, H. Prade, Tolerant fuzzy pattern matching: an introduction, in: P. Bosc, J. Kacprzyk (Eds.), Fuzziness in Database Management

Systems, Physica-Verlag, Wurzburg, 1995, pp. 42–58.[14] T. Gaasterland, Cooperative answering through controlled query relaxation, IEEE Expert 12 (5) (1997) 48–59.[15] T. Gaasterland, P. Godfrey, J. Minker, An overview of cooperative answering, J. Intelligent Inform. Systems 1 (2) (1992) 123–157.[16] P. Godfrey, Minimization in cooperative response to failing database queries, Internat. J. Cooperative Inform. Systems 6 (2) (1997) 95–149.[17] P. Godfrey, Relaxation in Web search: a new paradigm for search by Boolean queries, Personal Communication, 〈http://citeseer.ist.psu.edu/

godfrey98relaxation.html〉, March, 1998.[18] S.J. Kaplan, Cooperative responses from a portable natural language query system, Artificial Intelligence 19 (1982) 165–187.[19] E.E. Kerre, M. de Cock, Linguistic modifiers: an overview, in: G. Chen, M. Ying, K.-Y. Cai (Eds.), Fuzzy Logic and Soft Computing, 1999,

pp. 69–85.[20] A. Motro, SEAVE: A mechanism for verifying user presuppositions in query systems, ACM Trans. Off. Inform. Systems 4 (4) (1986)

312–330.[21] I. Muslea, Machine learning for online query relaxation, in: 10th Internat. Conf. of Knowledge and Discovery and Data mining, KDD’2004,

Washington, 2004, pp. 246–255.[22] J. Ozawa, K. Yamada, Cooperative answering with macro expression of a database, in: Fifth Internat. Conf. IPMU, Paris, 4–8 July, 1994,

pp. 17–22.[23] J. Ozawa, K. Yamada, Discovery of global knowledge in database for cooperative answering, in: Fifth IEEE Internat. Conf. on Fuzzy Systems,

1995, pp. 849–852.[24] W.A. Voglozin, G. Raschia, L. Ughetto, N. Mouaddib, Querying the SaintEtiq summaries: dealing with null answers, in: Proc. of 14th IEEE

Internat. Conf. on Fuzzy Systems, Reno, USA, 2005, pp. 585–590.