Post on 07-Mar-2023
The Concurrency Control Problem in Multidatabases:
Characteristics and Solutions*
Sharad Mehrotrat
Rajeev Rastogit
Yuri Breitbart$
Henry F. Korth$
Avi Silberschatzt
Abstract
A Mdtidatabase System (MDBS) is a collection of localdatabase management systems, each of which may followa different concurrency control protocol. Thk heterogene-ity makes the task of ensuring global serializability in anMDBS environment difficult. In this paper, we reduce theproblem of ensuring global serializability to the problem ofensuring serializability in a centralized database system. Weidentify characteristics of the concurrency control problemin an MDBS environment, and additional requirements onconcurrency control schemes for ensuring global serializabil-ity. We then develop a range of concurrency control schemesthat ensure global serializability in an MDBS environment,and at the same time meet the requirements. Finally, westudy the tradeoffs between the complexities of the variousschemes and the degree of concurrency provided by each ofthem.
1 Introduction
The problem of transaction management in a multi-
database system (MDBS) has received considerable at-
tention from the database community in recent years
[BS88, Pu88, BST90, ED90, GRS91, MRKS91, SKS91].
The basic problem is to design an effective and efficient
transaction management scheme that allows users to ac-
cess and update data items managed by pre-existing
*This material is based in part upon work supported by NSF
grants IRI-8805215 and IRI-9003341 aud by grants from the IBM
corporation and the NEC corporation. This material is baaed
in part upon work supported by the Center for Manufacturingand Robotics of the University of Kentucky and by NSF Grant
IRI-8904932.t Dep=tment of Computer Sciences, University of T=ae,
Austin, TX 78712-1188
: Department of Computer Sciences, University of Kentucky,
Lexington, KY 40506
~Matsushita Infomtion Technology Laboratory, 182 Nassau
Street, third floor, Princeton, NJ 08542-7072
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the
title of the publication and its date appear, and notice is givan
that copying is by permission of the Association for Computing
Machinery. To copy otherwisa, or to republish, requiras a fee
and/or specific permission.
1992 ACM SIGMOD - 6/921CA, USA
@ 1992 ACM 0-89791 -522-4 /92/0005 /0288 . ..$1 .50
and autonomous local database management systems
(DBMS,) located at different sites. Transactions in an
MDBS are of two types:
o Local transactions. Those transactions that
execute at a single site.
c Global transactions. Those transactions that
may execute at several sites.
The execution of global transactions is co-ordinated by
the global transaction manager (GTM) – a software
package built on top of the existing DBMSS whose
function is to ensure that the concurrent execution of
local and global transactions is serializable. Ensuring
global serializability in an MDBS is complicated by the
fact that each of the participating local DBMSS is a
pre-existing database system whose software cannot be
modified. As a result,
●
●
●
Each local DBMS may follow a different concurrency
control protocol.
Local DBMSS may not communicate any informa-
tion (e.g., conflict graph) relating to concurrency
control to the GTM.
The GTM is unaware of indirect conflicts betweenglobal transactions due to the execution of local
transactions at the local DBMSS. This is due to the
fact that pre-existing local applications make callsto the local DBMS interfaces, and thus the GTM,which is built on top of the local DBMSS, is not
involved in the execution of the local transactions.
Various schemes to ensure global serializability in
an MDBS environment have been previously proposed
(e.g., [BS88, Pu88, ED90, GRS91]). These proposedschemes have been ad-hoc in nature, and no analysis of
their performance, the degree of concurrency provided
by them, or their complexity has been made. In this
paper, we provide a unifying framework for the study
and development of concurrency control schemes in an
MDBS environment. We utilize a notion similar toseriahzation events [ED90] (referred to as O-elements in
288
[Pu88]) in order to reduce the problem of ensuring global
serializability in an MDBS to the problem of ensuring
serializability in a centralized DBMS. We then develop a
range of concurrency control schemes for ensuring global
serializability in an MDBS environment. Finally, we
compare the degree of concurrency provided by each of
the various schemes and analyze their complexities.
2 MDBS Concurrency Control
In this section, we show how the problem of ensuring
global serializability in an MDBS can be reduced to
the problem of ensuring serializability in a centralized
DBMS. Since centralized concurrency control is a well
studied problem and a number of schemes for ensuring
serializability in centralized DBMSS have been proposed
in the literature, the development of concurrency control
schemes for MDBSS is thus simplified. We begin by first
discussing the MDBS model.
2.1 The MDBS Model
An MDBS is a collection of pre-existing DBMSS located
atsites sl, sz, . . .. sm. A transaction T in an MDBS en-
vironment is a totally ordered set of read (denoted by
ri), write (denoted by Wi), begin (denoted by bi) andcommit (denoted by ci) operations (a global transac-
tion may have multiple begin and commit operations,
one for each site at which it executes). A global sched-
ule S is the set of all operations belonging to local and
global transactions with a partial order -+ on them.
The local schedule at a site sk, denoted by sk, is the set
of all operations (belonging to local and global trans-
actions) that execute at .% with a total order <Sk on
them. The schedule sk is a resirictionl of the global
schedule S.
We assume that the GTM is centrally located,
and controls the execution of global transactions. It
communicates with the various local DBMSS by means
of serwer processes (one per transaction per site) that
execute at each site on top of the local DBMSS. We
assume that the interface between the servers and the
local DBMSS provides for operations to be submitted bythe servers to the local DBMSS, and the local DBMSS
to acknowledge the completion of operations to theservers. The local DBMSS do not distinguish between
local transactions and global subtransactions executing
at its site. In addition, each of the local DBMSS ensures
that local schedules are serializable.
1A set PI with a partial order <PI on its elements is a
vedrictiow of a set P2 with a partizl order -tPz on its elementsifP1 ~ P2, and forallel, ez c PI, el <PI ez if =d only if
el <P2 e2.2 ~ ~M~ paper, we ~fit ~~selva to confiict serializa~llity
(CSR) [Pap86], which we shall refer to, in the remainder of the
paper, as serializability.
2.2 Serialization Functions
In order to develop our idea, we need to first introduce
the notion of serialization functions, which is similar to
the notion of serialization events [ED90]. Let n be the
set of all global subtransactions in &. A serialization
function fO1 i%, Ser, iS a fUIICtiOII that maps every
transaction in rk to one of its operations such that for
any pair of transactions Ti, Tj e n, if Ti is serialized
before ~ in sk, then ser(Td) +, ser(~).
For example, if the timestanap ordering (TO) concur-
rency control protocol is used at site sk, and the lo-
Cal DBMS d Sh Sk Z@ns tbM?ShYIpS h hnSaChnS
when they begin execution, then the function that maps
every transaction Ti c rk to T~’s begin operation is a se-
rialization fUnCtiOn for Sk.
For a site sk, there may be multiple serialization
functions. For example, if the local DBMS ats& follows
the two-phase locking (2PL) protocol, then a possible
SC!IidhhII funCtiOn for sk maps every transaction
Ti E Q to the operation that results in Ti obtaining its
last lock. Alternatively, the function that maps everytransaction Ti G ~ to the operation that results in Ti
releasing its first lock is also a serialization function for
sk3.
Unfortunately, serialization functions may not exist
for sites following certain protocols (e.g., serializationgraph testing (SGT)). For such sites, serialization
functions can be introduced using external means by
forcing conflicts between transactions [GRS91]. For
example, every transaction in rk can be forced to write
a particular data item at site sk, say, ticket. Thus, if
some transaction Ti E rk is serialized before anothertransaction Tj E ~ in sk, then Ti must have written
ticket before Tj wrote it. Thus, the function that maps
every transaction Ti E rk to its write operation on ticket
is a serialization fUnCtiOn fOr sk. We denote by serk, any
one of the possible serialization functions for site sk.
2.3 Global Serializability
Serialization functions can be used to ensure global se-
rializability in an MDBS environment. In the follow-
ing theorem, we state a sufficient condition for ensuring
global serializability in an MDBS environment.
Theorem 1: Consider an MDBS where each lo-
cal schedule is serializable. Global serializability is as-
sured if there exists a total order on the global trans-
actions such that at each site sk, for all pairs of global
transactions Gi, Gj executing at site sk, if serk (Gi) +,
.W’k (Gj ), then Gi is before Gj in the total order. u
3Actu~y, my fuction that maps every transaction Ti ~ ‘%
to one of its operations that executes between the time Ti obtains
its last lock and the time it releases its first lock is a serialization
function for Sk.
289
For+every global transaction Gi, we define transac-
tion Gi to be a restriction of Gi consisting of all the
operations in {se?l (G~) : G~ executes at site sk }. For
global schedule S, we define schedule ser(S) to be the
set of operations belonging to all transactions ~i, with
a partial order on them. Further, ser(S) is a restriction
of S. In the global schedule S, two operations conflict
if both access the same data item and one of them is a
write operation. However, the notion of conflict between
operations in ser(S) is defined differently. Operations
Serk (Gi) and serl (Gj) conflict in ser(S) if and only if
k = 1. From Theorem 1, it follows that S is serializable
if ser(S) is serializable.
Theorem 2: Consider an MDBS where each localschedule is serializable. A global schedule S is serializ-
able if ser(S) is serializable. ❑
We have thus reduced the problem of ensuring seri-alizability in an MDBS environment to the problem of
ensuring that ser(S) is serializable. Since global trans-
actions execute under the control of the GTM, the GTM
can control the execution of the operations in ser(S) in
order to ensure that ser=(S) is serializable. Thus, for en-
suring global serializability in an MDBS environment,
we can restrict ourselves to the development of schemes
for ensuring that ser(S) is serializable.
To do so, we split the GTM into two components,
GTM1 and GTM2 (see Figure 1). GTM1 utilizes the in-
formation on serialization functions for various sites in
order to determine for every global transaction Gi, op-
erations se?’k(Gi), and submits them to GTM2 for pro-
cessing. The remaining global transaction operations
(that are not serk(Gj)) are directly submitted to the lo-
cal DBMSS (through the servers). Further, GTM1 does
not submit an operation belonging to a global transac-
tion G~ (except the first operation) to either the localDBMSS or GTM2 unless an acknowledgement for the
completion of the execution of the previous operation
belonging to Gi (at the local DBMSS) has been received.
GTM2 is responsible for ensuring that the operations
submitted to it by GTM1 execute at the local DBMSS insuch a manner that ser(S) is serializable. Our concern,
for the remainder of the paper, shall be the development
of concurrency control schemes for GTM2 that ensureser(S) is serializable.
3 Characterktics of the Concurrency
Control Problem
A number of schemes for ensuring serializability in
centralized DBMSS exist in the literature (e.g., 2PL,
TO, SGT). Any one of them can be employed by
GTM2 in order to ensure that ser(S) is serializable.
However, certain characteristics of MDBS environments
make some of the existing schemes unsuitable for
G~, Gj - Global Transactif
~GTM
GTM1
~j = se~~(Gi)
Oj # StTk(Gi)
GTM2
Oj = se?’k(Gi)
‘+’ ~
Servers and Local DBMSS
Figure 1: The GTM Components
ensuring the serializability of ser(S). Below, we list
some of the factors that play an important role in the
design of concurrency control protocols for ensuring the
serializability of ser(S).
1.
2.
In most MDBS environments, we expect the number
of sites to be small in comparison to the number
of active global transactions in the system (that
is, global transactions that have begun execution,
but have not yet completed execution). Thus,
since any pair of operations Serk (Gi) and se?’k(Gj)
conflict in ser(S), ser(S) may contain a large
number of conflicting operations. As a result, if,
for example, the 2PL protocol were used to ensure
the serializability of ser(S), then there would be
frequent deadlocks; if instead, the TO or optimistic
protocols were used, a large number of transacti~n
aborts would result. An abort of transaction Giin ser(S) corresponds to the abortion of the globaltransaction Gi, which may be expensive, and thus
highly undesirable in an MDBS environment. Thus,
protocols to ensure serializability of ser(S) mustavoid transaction aborts, that is, they must be
conservative (e. g., conservative 2PL, conservative
TO [BHG87]). This is quite feasible in an MDBSenvironment.
Concurrency control protocols that provide a low de-
gree of concurrency may be unsuitable for ensuring
that ser(S) is serializable since such protocols may
cause a number of operations in ser(S) to be delayed
unnecessarily. Such delays may adversely affect the
performance of the system since unnecessarily delay-
290
I GTM2
dc(.?e~k(Gi))GTM1
Figure 2: Basic Structure of GTM2
3.
ing an operation in ser(S) may correspond to delay-
ing the execution of an entire global subtransaction.
For example, for a site % that uses the TO scheme,
ser~ may map each transaction to its begin opera-
tion. As a result, causing an operation of ser(S) to
wait could cause the execution of an entire global
subtransaction to be delayed.
A common problem with concurrency control pro-
tocols that provide a high degree of concurrenc-y is
that they incur substantial overhead for scheduling a
single operation (e.g., SGT). However, it may be jus-
tifiable to use such concurrency control schemes with
high overhead in order to ensure the serializability
of ser(S), since the overhead involved in schedul-
ing an operation of ser(S) is amortized, not over
one operation, but over all the operations belonging
to the corresponding global subtransaction. Thus,
the gain in terms of increased throughput, faster re-
sponse times, and the number of global subtransac-
tions that maybe permitted to execute concurrently
by a concurrency control scheme that permits a high
degree of concurrency may outweigh the overhead
associated with the concurrency control scheme.
The above factors imply that concurrency control
schemes for ensuring the serializability of ser(S) must
be conservative, and must provide high degrees of con-
currency (even though they may involve a high over-
head). Conservative schemes for ensuring global seri-alizability in MDBS environments have been proposed
in [BS88, ED90], while non-conservative schemes have
been proposed in [Pu88, GRS91].
4 Structure and Complexity of
Conservative Schemes
In this section, we describe the basic structure of conser-
vative concurrency control schemes employed by GTMz,
and the methodology we adopt for analyzing their com-plexity. As mentioned earlier, GTM1 submits the
Serk (Gi) operations belonging to each global transac-
tion Gi (or alternatively, each transaction ~i) to GTMz.
GTM1 inserts these operations in~o a queue, QUEUE.In addition, for every transaction Gil GTM1 inserts into
QUEUE, the operations initi and fini (whose utility is
discussed below). We now briefly describe the opera-
tions in QUEUE for an arbitrary transaction ~i and
site sk.
●
●
●
●
imiti : This operation is inserted into QUEUE ~y
GTM1 before any other operation belonging to Gi
is inserted into QUEUE.
86?I’k(Gi) : This operation is inserted into QUEUE
by GTM1 in order to request the execution of
operation se?’k(Gi).
a&(~e~h(Gi)) : This operation is inserted into
QUEUE by the servers when the local DBMSS
complete executing OperatiOn Se?’k(G~).
f ini : This operation is inserted into QUEUE by
GTM1 after ack(se9’k(Gi)), for all serk(G~) E ~ihave been received by GTM1.
Note that the i~iti and fini operations do not belong
to transaction Gi.
In Figure 2, we present the basic structure of GTMz.
CC is any conservative concurrency control scheme for
ensuring serializability of ser(S). CC selects operations
from the front of QUEUE, in order to process them.
Associated with CC are certain data structures (DS)
that are manipulated every time an operation selected
from QUEUE is processed by it. In addition, the
following actions are performed by CC when it processes
an
●
●
●
●
.
oper~tion oj in QUEUE. -
i’lliti:Operati~n initi contains informati~n relating
to transaction Gi (e.g., the operations in Gi, the set
of sites Gi executes at). This information is utilized
by CC to determine conflicting operations and is
added to DS.
Se?’k (Gi) : Operation Se?l (Gi) is submitted to thelocal DBMSS for execution (through the servers).
~&?(~eT’k(Gi)) : Operation ack(ser~(Gi)) is for-warded to GTM1.
f hi : Information relating to ~i is deleted from DS.
We denote by cact(oj), the actions performed by CCwhen it processes an operation oj in QUEUE. An
essential feature of conservative schemes is that they
ensure serializability without resorting to transactionaborts. As a result, it may not always be possible for CC
to process an operation when it is selected from QUEUE
since processing every operation when it is selected from
QUEUE may result in non-serializable schedules.
291
procedure Basicfichemeo:
Initialize data structures;
while (true)
begin
Select operation Oj from the front of QUEUE;if cond(oj) then begin
act(oj);
while (there exists an operation
of G WAIT such that Cond(ol) is true)
begin
act(oi);
WAIT := WAIT -{01}
end
end
else WAIT := WAIT U {Oj};
end
Figure 3: Basic Structure of Conservative Schemes
Thus, associated with every operation o; in QUEUE,
is a condition, cond(o<), that & defined ‘over ‘DS and. . .that must hold if oj is to be processed by CC. If
cond(oj ) does not hold when operation oj is selected
from QUEUE by CC, then Oj is added to a set of waiting
operations, WAIT, to be processed at a later time
when cond(oj) becomes true. Thus, every conservative
scheme for ensuring the serializability of ser(S) has the
same basic structure as shown in Figure 3. However,
different conservative schemes differ in the values for
~d(oj ) and cond(oj ) for the various operations, and
the data structures associated with the scheme. Thus,
a conservative concurrency control scheme can bespecified by specifying cond(oj ), ~d(oj ) for the various
operations, and the data structures maintained by the
scheme.
We now illustrate, by an example, how our abstrac-
tion for the structure of conservative schemes can be
used to specify a simple scheme similar to the conserva-tive TO scheme [BHG87], which we refer to as Scheme O.
The data structures maintained by Scheme O consist of
queues (initially empty), one associated with every site
sk. For an operation oj in QUEUE, cond(oj) and act(oj)
are defined as follows.
● cono!(imitj): true.
e ad[i~iti):Every operation Serk (Gil is inserted at
the end of the queue for site Sb. ‘ ‘
cond(ser~ (Gi)): Operation ser~ (Gi) is the first
operation in the queue for site sh.
act(se~~ (Gi)): Operation se?’k(Gi) is submitted tothe local DBMSS (through the servers) for execution,
●
●
●
co?d((d?(SfWk (Gi))): true.
act(ack(8e?’k (~i))): Operation serk (Gi) is de-
queued from the front of the queue for sk, and
ack(ser~ (Gi)) is sent to GTM1.
C07kd(~i71i): true.
No actions are performed by Scheme O when ~ ~ini
operation is processed. Since transactions Gi are
serialized in the order in which the i~itioperations
are processed, trivially, Scheme O ensures that ser(s)
is serializable.
We now analyze the complexity of CC which has thebasic structure as shown in Figure 3. The complexity
of CC is the average n-umber of steps it takes CC to
schedule a transaction Gi. For the purpose of analyzing
the complexity of the various schemes, we assume the
following.
● Every transaction Gi has, on an average, dav
operations (that is, the average number of sites at
which a global transaction executes is da”).
● At no point during the execution of CC does the
difference between the number of initiand finioperations processed by CC exceed n.
Since every transaction ~i is assumed to contain da.
operation~ the average number of steps taken by CC to
schedule Gi is the sum OE
. The number of steps required by CC to process initi,
o dav x (the number of steps required by CC to process
sevt(Gi)),
● dav x (the number of steps required by CC to process
~Ck?(Se?’k(Gi))), and
e The number of steps required by CC to process ~ini.
Note that every time an operation oj is processed by
CC (that is, ad(oj) is executed), operations 01 E WAIT
for which cond(ol) holds are processed, too. Thus, thenumber of steps required by CC to process an operation
oj is the sum of
. The number of steps in cond(oj ),
● The number of steps in ut (Oj),and
. The number of steps required to determine the
operations 01 c WAIT for which cond(oz) holds dueto the execution of ~d(oj).
Scheme O can be shown to have complexity O(d.v).
Since checking if an operation Se9’k(Gi) is the firstoperation in the queue for site sk takes 0(1) time,the number of steps in cond(oj), for all Oj, is O(l).
292
Further, since enqueueing and dequeueing an operation
takes O(1) time, and da. operations are enqueued when
an i~itioperation is processed, the number of steps
in act(initi ) is O(ddV), and the number of steps in
act(oj), Oj # hiti, is O(l). Since co~d(i~iti)andcond(ach(ser~ (Gi)) are both true, the only operations
in WAIT are ser~ (Gi), for some site sk and transaction
~i. Further, since execution of ad(~j) can cause(XJnd(serk (Gi)) to hold only if Oj = ack(ser~ (Gl)),
for some transaction Gl, and s~r~(Gi) follows serk (G/)
in the queue for sk, the number of steps required to
determine the operations or E WAIT for which cond(ol)
holds due to the execution of act(oj), for all operations
Oj, is O(1). Thus, the complexity of Scheme O is O(dov ).
In Scheme O, when initiis p~ocessed, the processing
of every operation Serk (Gi) C Gi is restricted to follow
the processing of operations ahead of serk (Gi) in the
queue for sk. No restrictions on the processing of
operations are added when Serk (Gi) or Uck(serk (Gi))
is processed. We refer to such schemes in which
restrictions on the processing of serk (Gi) operations are
added to DS only when an initioperation is processed,as begin transaction schemes or BT-schemes. The
schemes proposed [BS88, ED90] are BT-schemes. On
the other hand, a scheme in which restrictions on the
processing of ser~ (Gi) operations are added every time
an initi or a Serk (Gi) operation is processed is referredto as an operation scheme or O-scheme. O-schemes,
in general, result in a higher degree of concurrency
than BT-schemes (a concurrency control scheme, say
CC1, is said to provide a higher degree of concurrency
than another concurrency control scheme CC2 if, for
any given order of insertion of operations into QUEUE
by GTM1, CC2 does not cause a fewer number of
operations to be added to WAIT than CC1). In this
paper, we present an O-scheme that permits the set
of all serializable schedules. Even though BT-schemes
cannot provide a high degree of concurrency, certain
BT-schemes (e.g., Scheme O) are attractive, since they
have low complexities compared to O-schemes.
In the following sections, we present two BT-schemes,
and an O-scheme. The schemes ensure that ser(S)
is serializable. We specify the concurrency control
schemes by specifying the data structures maintained
by the scheme, and cond(oj ), act(oj) for the various
operations. We also state the complexity of each of
the schemes, and compare the degree of concurrency
provided by the various schemes. A detailed analysis
of the complexity of the schemes and proofs of their
correctness can be found in [MRB+91].
5 The Transaction-site Graph Scheme
Even though Scheme O has a low complexity, O(dav),
it has a serious drawback in that it permits a low
degree of concurrency. Below we present a scheme,
which we refer to as Scheme 1, and that provides a
higher degree of concurrency than Scheme O. It utilizes
a data structure similar to the site graph introduced
in ‘[BS88], which we refer to as the trans~ction-siie
graph (TSG). A TSG is an undirected bi-partite graph
consisting of a set of nodes V corresponding to sites (site
nodes) and transactions in ser(S) (transaction nodes),and a set of edges E. Site and transaction nodes are
labeled by the corresponding sites and transactions,
respectively. Edges in the TSG may be present only
between transaction nodes a~d site nodes. An edge
between a transaction node Gi and ~a site node sk is
present only if operation Serk (Gi) c Gi, and is denoted
by either (sk~~i) or (~i, sk). The s~t of edges {(~i, sk) :
Serk (Gi) E Gi } are referred to as Gi’s edges.
Associated with every site sk, are two queues : an
insert queue and a delete queue. Initially, all queues
are empty, and for the TSG, both V = @and E = QJ.
Processing of certain operations in the insert queues isconstrained by “marking’) them. For an operation Oj in
QUEUE, cond(oj) and d(oj) are defined as follows:
●
●
●
●
●
●
●
●
cond(initi): true.
act(initi): ~i and its edges are inserted into the
TSG. Also, for every operation Serk (Gi) ● ~i,
Serk (Gi) is inserted at the end of the insert queue
for sit% sk. If the TSG contains a cycle involving
edge (Gi, Sk), then operation Serk (Gi) in the insert
queue for site sk is marked.
cond(ser~ (Gi)): For every transaction @j such
that serk (Gj ) c @j, if ~Ct(Se7’k (Gj)) has executed,
then Uct(Uck(serk (Gj))) has also completed execu-
tion. In addition, if serk (Gi) is marked, then it is
the first element in the insert queue for site sk.
act(~t?~k (Gi)): Operation Serk (Gi) is submitted tothe local DBMSS (through the servers) for execution.
cond(ack(serk (Gi))): true.
act(~ck(~e?’k (Gi))): operation $erk (Gi) is deleted
from the insert queue for site sk (note that serk(Gi)
may not be at the front of the insert queue for site
sk ), and it is added to the end of the delete queue for
site sk. operation ad?(Serk (Gi)) is sent to GTM1.
cond(fini): For every operation S.5?’k (Gi) E Gi,
Serk (Gi) is the first element in the delete queue for
Sk Sk.
ad( f ini): ~i and its edges are deleted from the
TSG. For every operation serk (Gi) C ~i, serk (Gi)
is deleted from the delete queue for site sk.
293
Scheme 1 permits the TSG to contain cycles, but
prevents cycles in the serialization graph of ser(S) bymarking operations whose processing may potentially
lead to cycles in the serialization graph. Further, pro-
cessing of a marked operation is delayed until all the
operations ahead of it in the insert queue have been
processed (note that processing of unmarked operations
is not constrained in any way).
Theorem 3: Scheme 1 ensures that ser(S) is seri-
alizable. •I
The number of steps required to detect cycles in theTSG dominates the complexity of Scheme 1. Cycles
in the TSG can be detected using depth-first scorch
[AHU74]. Note that the TSG has at most m + n nodes
and at most ndav edges.
Theorem 4 The complexity of Scheme 1 is
O(?n+ n+ ?&v).❑
6 The Transaction-site
Graph-with-dependencies Scheme
Scheme 1, presented in the previous section, does notexploit the knowledge of the order in which operations
are processed (the TSG is checked only for cycles). As a
result, Scheme 1 places unnecessary restrictions on the
processing of operations. The transaction-site graph-
with-dependencies scheme, referred to in the sequel as
Scheme 2, is presented below and exploits the knowledge
of the order in which operations are processed. Inorder to permit schedules not permitted by Scheme 1,
Scheme 2 utilizes a structure similar to the TSG, The
structure contains, in addition to transaction and site
nodes, dependencies (denoted by ~) between edges
incident on a common site node, and is referred to as
Transaction Site Graph with Dependencies (TSGD). A
TSGD is a 3-tuple (V, 1??,D), where V is the set of
transaction and site nodes, E is the set of edges and
D is the set of dependencies. Dependencies specify the
relative order in which operations are processed and areused to restrict the processing of operations. If (~i, Sk)
and (sk, ~j) are edges in the TSGD, then a dependency
of the form (~~, s~)4(s~, ~j) denotes that ser~ (G~) is
processed before se?l (Gj).
Acyclicity of the TSGD plays an important role inensuring that ser(S) is serializable. Below, we formally
state the conditions under which a TSGD is said to
be acyciic. Consider a TSGD containing edges (V1, V2),
(?)2, ?JJ, . . . . (v~, Vi), k >2. This set of edges form acycle ifvi # vj, for alli, j= 1,2, . . ..k. i# j, and either
one of the following is true.
● For alli, i = 2,3, ..., k, dependency (vi_l, vi) *
procedure Eliminate.Cycles( (V, E, D), ~i):
1. NJark all edges “unused”. For all transactions
Gj E V, s-par(~j) := null, t-par(~~) := null.
Also, V := ~~, A := 0.
2. If for all pairs of distinct edges (v, u), (u, w), either
e w # &i and (u, w) is marked “used”, or
e there is a dependency (v, u)+(u, w) in D U A,
or
● head(s-par(v)) = u.
then go to step (4).
3. Choose a pair of distinct edges (v, u), (u, w) such
that
● w = @i or (u, w) is not marked “used’), and
● there is no dependency (v, u)-+(u, w) in D U A,
and
● head(s-par(v)) # u.
Mark (u, w) “used”. If w = 6i, then add to A
the dependency (v, u) ~ (u, ~i). If w # ~i, then
s-par(w) := uos-par(w), t-par(w) := vet-par(w),
and v := w. Go to step (2).
4. If v # @i, then temp := head(t-par(v)), t-par(v)
:= tail(t-par(v)), s-par(v) := tail(s-par(v)), v :=
temp, go to step (2).
5. return(A).
Figure 4: The procedure Eliminate-Cycles
(vi> v(i mod ~)+1) @‘.
● Foralli, i= 2,3,.. ., k, dependency (v(i mod k)+l, vi)+
,(Vi, Vi-~) @’ .D.,.
We say the TSGD is acyclic if it does not containany cycles. We now specify, for every operation Oj
in QUEUE, cond(oj ) and act(oj) that preserve theacyclicity of the TSGD. Initially, for the TSGD, V = 0,
E= O, D=O.
●
o
cond(initi): true.
act(hitj): ~i and its edges are inserted into the
TSGD. For every operation serk (Gi) c d~, for all
transactions ~j c V such that Serk (Gj) C ~j and
act(serk (Gj )) has executed, dependencies (~j, sk)~
(sk, 6,) are added to D. The set of dependencies,D, is further modified as follows.
294
D:= D U Elirninate.Cycles( (V, E, D), ~j)
The procedure Eliminate-Cycles (specified in Fig-
ure 4) returns a set of dependencies A such that
(V, E, D U A) does not contain any cycles involving
&
cond(ser~ (G~)): For all transactions Gj c V,
if dependency (~j, sk) - (sk, ~~) c D, then
act(ack(ser~ (Gj ))) has completed execution.
act(se~~ (Gi)): For every transaction @j E V such
that ser~ (Gj ) E @j and act(ser~ (Gj)) has ~ot yet
been executed, dependencies (~i, s~)~(s~, Gj) are
added to D. Operation se?%(G~) is submitted to the
local DBMSS (through the servers) for execution.
corzd(ack(ser~ (G~))): true.
act(ack(se~~ (Gi))): Operation ack(ser~ (Gi)) is
sent to GTM1.
cond(~in~): For every operation ser~ (Gi) c Gi,
there does not exist a ~j E V such that (@j, Sk )+
(Sk, @ ~ ~.
mt( f ini): ~i, along with its edges and dependen-
cies is deleted from the TSGD.
We now discuss the procedure Eliminate.Cyc~s that
takes as arguments the TSGD and a transaction Gi c V.
Eliminate-Cycles exploits the knowledge of the order of
in which operations are processed and returns a set of
dependencies A with the following properties.
● Every dependent y in A is of the form (Gj, sk) *
(~ij sk), for some transaction @j C V and some site
sk e V.
● In (V, E, D U A) there are no cycles involving Gi.
~ Eliminate-Cycles attempts to detect cycles involving
G~ in the TSGD, and then eliminates them by addingdependencies to A. It traverses edges in the TSGD
“marking” them as it goes along so that an edge isAnot
traversed multiple times. If an edge incident on Gi is
traversed, then El~minate-C ycles concludes that there is
a cycle involving Gi and adds appropriate dependenciesto A in order to eliminate the cycle.
In Eliminate-Cycles, v is the current transaction nodebeing visited (site nodes are not visited). Unlike depth-
first search[AHU74], in Eliminate.Cycles, a transaction
node may be visited multiple times. For a transaction
node ~j, t.Par(~j) stores the list of transaction nodes
to which backtracking from @j must take place, and
s~r(~j), the list of site nodes from which Gj is vis-
ited, every time it is visited. Functions head, tad ando are as defined for lists4.
Eliminate-Cycles returns a set of dependenci~ A such
that (V, E, DUA) contains no cycles involving Gi. Since
Eliminate-Cycles is invoked every time an initi opera-
tion is processed, it can be shown by a simple induction
argument on the number of initi operations processed,
that the TSGD is always acyclic, and thus ser(S) is se-
rializable.
Theorem 5: Scheme 2 ensures that ser(S) is seri-
alizable. Cl
The number of steps in Eliminate-Cycles dominates
the complexity of Scheme 2. It can be shown that Elim-
inate-C ycles terminates in O(n2dau) steps.
Theorem 6: The complexity of Scheme 2 is
O(n%a”).❑
Scheme 2 provides a higher degree of concurrency
than Scheme O. However, Scheme 2 does not provide
a higher degree of concurrency than Scheme 1 since
certain dependencies in the set of dependencies A
returned by Eliminate-Cycles may be unnecessary for
the purpose of ensuring that (V, E, D U A) contains no
cycles involving di. Thus, there may exist a set of
dependencies, Al, such that Al C ~ and (V, E, DUAI)
does not contain a cycle involving Gi.
We formally define the notion of minimality as
follows. A set of dependencies A i~ minimal with respect
to the TSGD and a transaction Gi c V iff
● (V, E, D U A) does not contain any cycles involving
~i, and
● for all d c A, (V, E, D U A — d) contains a cycle
involving Gi.
The set of dependencies A that are returned by
Eliminate-Cyc@ may not be minimal with respect to(V, E, D) and Gi, and thus unnecessary restrictions may
be imposed on the processing of operations, hurting the
degree of concurrency. In order to impose minimal re-strictions on the processing of operations and to provide
maximal concurrency without jeopardizing the serializ-
ability of ser(S), A must be minimal with respect to
(V, E, D) and ~i. However, the problem of computingsuch a A is NP-hard [GJ79], and is a consequence of the
following NP-completeness result.
Theorem 7: The following problem-is NP-complete.
Given a TSGD and a transaction node Gi G V. Is A = (?
4For a list 1 = [11, 12, . . . . /p] and element 10, head(l) returns 11,
tad(l)returm [12, . . ..lp]. =dloolmt~[io,ll,lz,... ,lp].
295
not minimal with respect to the TSGD and transaction~i? ❑
7 An O-Scheme that Permits all
Serializable Schedules
The problem with the BT-schemes presented in the pre-
vious sections is that they either provide a low degree
of concurrency or have high complexity. This is due to
the req~irement that all the restrictions on the process-
ing of Gi’s operations in order to ensure serializability
of ser(S) be added to DS when initi is processed (since
no restrictions are added when Serk (Gi) operations are
processed). Thus, BT-schemes cannot provide a very
high degree of concurrency since they a priori restrict
the processing of operations to permit only a subset of
serializable schedules. Furthermore, BT-schemes that
attempt to provide even a moderately high degree of
concurrency are intractable, as shown in the previous
section.
In this section, we present an O-scheme that permits
the set of all serializable schedules, which we refer to as
Sch~me 3. Scheme 3 adds restrictions on the processing
of Gj’s operations to DS every time an init; or serk (Gi)
operation is processed. AS a result, when an initi
or a serk (Gi) operation is processed, Scheme 3 only
adds minimum restrictions to DS such that processing
the next Serk (Gi) operation cannot cause se?’(s) to be
non-serializable (additional restrictions are added whenthe next 8e?’k(G;) operation is processed). Since, at
any point, minimum restrictions are imposed on the
processing of operations in order to ensure serializability
of ser(S), Scheme 3 permits the set of all possible
serializable schedules. Further, the computation of the
minimum restrictions to be added to DS every time anoperation is processed is not too difficult, and Scheme 3
can be shown to have a complexity 0(n2daV ).
In scheme 3, ~ssociated with every transactio~ @iis a set ser~e f (Gi ) of transactions such that if Gj G
ser-be f (@i), then @j is serialized before @i in ser(S).
Also, at any point p during the execution of Scheme 3,
for every site .$k,
● Iastk is the transaction @i that is the last among
transactions in {Gj : ser~ (Gj) c ~j } to haveexecuted act(ser~ (Gi)) before point p.
● Setk is the set of transactions
{@j : (serfi(Gj) G ~j)A(act(initj) has executed beforep) A ( UCt(S6W’k (Gj)) has not executed before p)},
I~itially, for 211 sk, /aStk = ndl, Setk = 0, and fOr all
Gi7 serJef (G~) = 0. For an operation oj in QUEUE,cond(oj ) and ~d(oj) are defined as follows.
● cond(initi): true.
●
e
●
b
●
●
●
.*act(initi): For every operation ser~(Gi) c Gi, Gi
is added to Setk. The set serJef(~i) is updatedas follows to include all the transactions serialized
before @i.
cond(8e~k(G;)): w.r-bef (di)n(sdk –{6;}) = 0. If
lastk = ~j, then act(ack(serk (Gj))) has executed.
act(~e~k (Gi)): ~; is deleted from Setk and lastk
is set tO G;. Since for all transactions @j e setk,
Serk (Gj ) has not been processed when Serk (Gi) is
processed, .$e?’k(Gi) executes before Serk (Gj) exe-
cutes, and ~i is thus serialized before ~j in ser(S).
Thus, for certain transactions Cj, serl)e f (~j ) is up-
dated as follo~s to include all the transactions seri-
alized before Gj.
Let SetlA= (ser_bef(@;) U {6~}), Set2 = {~1 :
serJe f (Gl) fl Sdk # 0}. For all transactions Gj
such that either Gj E setk, or Gj C set2,
SerJef(Gj) := SerJef(Gj) U Setl.
Operation ser~(Gi) is submitted to the local DBMSS(through the servers) for execution.
cond(ack(8e7’k (Gi))): true.
act(fZCk(.Se~k (G~))): Operation ack(se?’k (Gi)) is
sent to GTM1.
cond(fini): ser-bef(~i) = 0.
act( f ini): For all transactions @j such that ~i c
ser-be f (dj), @i is deleted from ser_be~(~j ). For all
Serk (Gi) E ~; such that !astk = ~;, la.$tk is set to
null.
Schem~ 3 ensures tha~ for all transactions ~il ~i @
se~Je~(Gi), and thus, Gi is never serialized before it-
self.
Theorem 8: Scheme 3 ensures that ser(S) is seri-alizable. ❑
The complexity of Scheme 3 is dominated by th~ num-
ber of steps required in order to update seriie f (Gj ), for
certain transactions Gj, when act(serk (Gi)) executes.
Theorem 9: The complexity of Scheme 3 is
0(n2daO). ❑
296
In [MRB+91], we have shown that at any point
during the execution of Scheme 3, i~for some set Setk,
Setk # 0, then for some transaction G~ C Seth, SeI’h (G{)
is processed by Scheme 3. Further Scheme 3 permits
the set of all serializable schedules, that is, if se?’k(Gi)
operations are inserted into QUEUE by GTM1 in such a
manner that processing every se?’h(Gi) operation when
it is selected from QUEUE results in a serializable
schedule, then Scheme 3 does not add any serh (Gi)
operation to WAIT. Thus, Scheme 3 permits a higher
degree of concurrency than Schemes O, 1 and 2.
8 Conclusion
There has been no systematic study of the concur-
rency control problem in MDBS environments. Existing
schemes for ensuring global serializability in MDBSS are
ad-hoc, and no analysis of their performance, the degree
of concurrency provided by them, or their complexity
has been made. In this paper, we have reduced the
problem of developing schemes for ensuring global seri-
alizability in an MDBS environment to that of develop-
ing conservative schemes for ensuring serializability y in a
centralized DBMS. Since concurrence y control in central-
ized DBMSS is a well studied problem, the development
of concurrence y control schemes for MDBSS is simplified.
We have proposed a model for conservative concur-
rency control schemes, and have developed a number of
conservative schemes for ensuring serializability, includ-
ing one that permits the set of all serializable schedules.
We have analyzed the complexities of each of the devel-
oped schemes and compared the degree of concurrency
provided by the various schemes. Since conservativeschemes delay the execution of operations belonging to
transactions instead of aborting transactions later, in
our analysis of the complexity of conservative schemes
we have taken into account the coat of attempting to
reschedule an operation that was previously made to
wait, Further work still remains to be done on making
the developed schemes fault-tolerant,
Acknowledgements
We would like to thank Nandit Soparkar for discussionsthat helped in the development of the paper.
References
[AHU74] A. V. Aho, J. E. Hopcroft, and J. D. Unm-
an. The Design and Analysis of Computer
Algorithms. Addison-Wesley, Reading, MA,
1974.
[BHG87] P. A. Bernstein, V. Hadzilacos, and
N. Goodman. Concurrency Control and
Recovery in Database Systems. Addison-
Wesley, Reading, MA, 1987.
[BS88] Y, Breitbart and A. Silberschatz. Mul-
tidatabase update issues. In Proceedings
of A CM-SIGMOD 1988 International Con-
ference on Management of Daia, Chicago,
pages 135–141, 1988.
[BST90] Y. Breitbart, A. Silberschatz, and G. R,
Thompson. Reliable transaction manage-
ment in a multidatabase system. In Pro-
ceedings of A CM- SIGMOD 1990 Int erna-
tional Conference on Management of Data,
Atlantic City, New Jersey, pages 215-224,1990.
[ED90] A.K. Elmagarmid and W. Du. A paradigm
for concurrency control in heterogeneous
distributed database systems. In Proceed-
ings of the Sixth International Conference
on Data Engineering, 1990.
[GJ79] M. R. Garey and D. S. Johnson. Computers
and Intractability. W. H. Freeman and
Company, New York, 1979.
[GRS91] D. Georgakopolous, M. Rusinkiewicz, andA. Sheth. On serializability of multi-
database transactions through forced local
conflicts. In Proceedings of the Seventh In-
ternational Conference on Data Engineer-
ing, Kobe, Japan, 1991.
[MRB+91] S. Mehrotra, R. Rastogi, Y. Breitbart, H. F.
Korth, and A. Silberschatz. The concur-rency control problem in multidatabases:
Characteristics and solutions. Technical Re-port TR-91-37, Department of Computer
Science, University of Texas at Austin, 1991.
[MRKS91] S. Mehrotra, R. Rastogi, H. F. Korth, and
A. Silberschatz. Non-serializable executions
in heterogeneous distributed database sys-
tems. In Proceedings of the First Inter-
national Conference on Parallel and Dis-
tributed Information Systems, Miami Beach,
Florida, 1991.
[Pap86] C. Papadimitriou. The Theory of Database
Concurrency Control. Computer Science
Press, Rockville, Maryland, 1986.
[PU88] C. Pu. Superdatabases for composition of
heterogeneous databases. In Proceedings
of the Fourth International Conference on
Data Engineering, Los Angeles, 1988.
[SKS91] N.R. Soparkar, H.F. Korth, and A. Silber-
schatz. Failure-resilient transaction man-
agement in multidatabases. IEEE Com-
puter, December 1991.
297