Archie: A Speculagve Replicated Transacgonal ... - Hyflow project
-
Upload
khangminh22 -
Category
Documents
-
view
0 -
download
0
Transcript of Archie: A Speculagve Replicated Transacgonal ... - Hyflow project
Archie: A Specula.ve Replicated Transac.onal System
Sachin Hirve, Roberto Palmieri, Binoy Ravindran Systems So>ware Research Group
Virginia Tech
2
What do we want for our transactional application?
Strong consistency
High availability
Great scalability in small/med size
High performance
Fault tolerance Low latency
Programmability
4
How does it work when combined?
Ordering phase Client Request (tx1)
Client Reply
R2 R1
L
R3 R4
msg (tx1) execu.on of (tx1)
tx1
tx1 tx2
tx2
5
…and when the load increases?
Ordering phase Tx1
Ordering phase Tx2
Ordering phase Tx3
Ordering phase Tx4
Ordering phase Tx5
Ordering phase Tx6
Tx1
Tx2
Tx3
Tx4
Tx5
Tx6
6
Goal: Boost performance in the common case
Challenge:
Transac.on response .me = total order latency
7
An.cipate the work
Ordering phase Tx1
Tx1
Ordering phase Tx2
Tx2
Ordering phase Tx3
Tx3
• Exploita.on of an early no.fica.on triggered during the ordering phase (called op#mis#c-‐delivery)
8
Reliable Op.mis.c Delivery
• Avoid any mismatch between the op.mis.c delivery order and total order
• Maximize the .me between the op.mis.c delivery and the establishment of the total order
Ordering phase Tx1
Ordering phase
Tx2
Ordering phase
Tx3
Tx1
Tx2
Tx3
Tx1 Tx2 Tx3
Op.mis.c Order
Tx1 Tx2 Tx3
Total Order
Stable leader
9
MIMOX
• Mul. Paxos + Reliable Op.mis.c Delivery – Exploit the leader’s batching .me for maximizing the overlapping .me
– Exploit the leader’s knowledge for making the op.mis.c delivery reliable
• Tag messages with leader’s expected order
11
MIMOX: Batch broadcast
R2
R1 L
Client Requests
1 1 1
1
1 2 2
2
3 3
3
Batching window
Op$mis$c Delivery
Op$mis$c Delivery
1 2 3
1 2 3
12
MIMOX: Performance
90
95
100
105
110
115
120
125
130
3 5 7 9 11 13 15 17 19
1000x
msg
s per
sec
Nodes
10 bytes20 bytes50 bytes
0
2
4
6
8
10
12
14
3 5 7 9 11 13 15 17 19m
illis
eco
nds
(mse
c)
Nodes
10 bytes20 bytes50 bytes
13
Transac.on Processing
• Exploit parallelism • Commit in order • Minimal overhead for conflict detec.on and resolu.on
Ordering phase Tx1
Ordering phase
Tx2
Ordering phase
Tx3
Tx1
Tx2
Tx3
Tx1 Tx2 Tx3
Op.mis.c Order
Tx1 Tx2 Tx3
Total Order
14
Key points of ParSpec
• Reduce problem’s complexity by ac.va.ng MaxSpec transac.ons at-‐a-‐.me – meta-‐data’s size is fixed – set of possible conflic.ng transac.ons is limited
• Parallel execu.on but in-‐order specula.ve commit (x-‐commit)
15
ParSpec: details T1 T2 T3 T4
T2 (X): Begin T3 (X): Begin
Read(X)
0 0 0 0
X: Read bit-‐array
0 0 0 0
0 1 1 0
0 0 0 0
Abort bit-‐array
0 0 0 0
0 0 0 0
Write(X) 0 0 0 0
Wait
Read(X) 0 0 1 0 0 0 0 0
X-‐commit Abort; Re-‐execute
0 1 1 0
Op.mis.c Order:
0 0 1 0 0 0 0 0
MaxSpec = 4 T5 T6 T7 T8
0 0 1 0 OR bitwise
16
Is this enough for achieving the goal?
• Unfortunately not…
Ordering phase Tx1
Δ
Valida.on ReadSet
Commit WriteSet
≈Δ
17
Solu.on for Write Set
• Specula.ve versions are associated with an tenta.ve commit .mestamp, which is the expected commit .mestamp in case of no mismatch between op.mis.c and total order
Ordering phase Tx1
Valida.on Read Set
Commit Write Set
TS++
18
Solu.on for Read Set
• If the leader is not either crashed or suspected (it is stable), then there is no chance of mismatch between op.mis.c and total order
• ParSpec commits according to op.mis.c order and make them visible during the total order
Ordering phase Tx1
Valida.on ReadSet
?
20
Evalua.on
• Testbed – PRObE cluster (19 nodes) – AMD Opteron 6272, 64-‐core, 2.1 GHz CPU – 128 GB RAM and 40 Gbps Ethernet
• Benchmarks – Bank, TPC-‐C and Vaca.on
• Compe.tors – PaxosSTM: DUR-‐based approach; it suffers from remote aborts – SM-‐DER: Post final delivery single-‐thread execu.on – HiperTM: SM-‐DER based single thread execu.on – Archie-‐FD: Archie but post final delivery parallel processing
21
Evalua.on: TPC-‐C
Throughput 19 warehouses
0
5000
10000
15000
20000
25000
3 5 7 9 11 13 15 17 19
Tran
sact
ions
per
sec
Replicas
SMRab
ARCHIE-FDARCHIE
0
5000
10000
15000
20000
25000
3 5 7 9 11 13 15 17 19
Tran
sact
ions
per
sec
Replicas
SMRab
ARCHIE-FDARCHIE
0
5000
10000
15000
20000
25000
3 5 7 9 11 13 15 17 19
Tran
sact
ions
per
sec
Replicas
SMRHiperTM
PaxosSTM
ARCHIE-FDARCHIE
0
5000
10000
15000
20000
25000
3 5 7 9 11 13 15 17 19
Tran
sact
ions
per
sec
Replicas
SMRHiperTM
PaxosSTM
ARCHIE-FDARCHIE
0
5
10
15
20
25
3 5 7 9 11 13 15 17 19
1000
x Tr
ansa
ctio
ns p
er s
ec
Replicas
SM-DERHiperTM
PaxosSTM
ARCHIE-FDARCHIE
20
30
40
50
60
70
80
90
100
1 19 100
% X
-co
mm
itte
d t
ran
sa
ctio
ns
Number of warehouses
3 nodes11 nodes19 nodesThroughput
100 warehouses
22
Evalua.on: Distributed Vaca.on
Conten.on level
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
5500
100 250 500 750 1000
Tra
nsa
ctio
ns
per
sec
Number of relations
SM-DER
HiperTM
PaxosSTM
ARCHIE-FD
ARCHIE