Interactively Visualizing Procedurally Encoded Scalar Fields
Decentralized Scalar Timestamp - USENIX
-
Upload
khangminh22 -
Category
Documents
-
view
1 -
download
0
Transcript of Decentralized Scalar Timestamp - USENIX
Unifying Timestamp with Transaction Ordering for MVCC with Decentralized Scalar Timestamp
Xingda Wei, Rong Chen, Haibo Chen, Zhaoguo Wang, Zhenhan Gong, Binyu Zang Institute of Parallel and Distributed Systems@Shanghai Jiao Tong University
1
Multi-versioning is important for TXs[1]
Data are stored in multiple-copies (aka, versions)
n e.g., updates not overwrite (as in single-versioning)
2
Single-version database
Multi-version (MV) database
vs.
A=1 A=2
A=1 Av0=1Av1=2
A+=1TX1
[1] Transactions
Data are stored in multiple-copies (aka, versions)
Higher concurrency for readers writers
n Reader reads from the (consistent) past
n Write concurrently updates the latest
Multi-versioning is important for TXs[1]
3
Av0=1Av1=2
A+=1TX1
MV database
Reader:
Writer:
Print(A)TX2
[1] Transactions
How to assign version & choose version to read?
Multi-versioning is important for TXs[1]
Data are stored in multiple-copies (aka, versions)
Higher concurrency for readers writers
n Reader reads from the (consistent) past
n Write concurrently updates the latest
4
Av0=1Av1=2
A+=1TX1
Print(A)TX2
MV database
Reader:
Writer:
[1] Transactions
How to assign version & choose version to read?
Deciding version should be consistent & efficient
n Consistent = Match TX’s order
6
Print(A)TX1 A+=1TX2 A+=1TX3...Time Av0=0
AV0 < v1 < ... < v2 n i.e., versions sorted by TXs
Av2=1Av1=1
How to assign version & choose version to read?
Deciding version should be consistent & efficient
n Consistent = Match TX’s order
n Efficient: minimal impact to the TX
7
Yet, ordering timestamps is challenging in networked systems
n Recall the goals: Consistency + Efficiency
A common approach: use timestamp
8
Yet, ordering timestamps is challenging in networked systems
n Recall the goals: Consistency + Efficiency
#1. Physical clock
ü Nearly no cost
✘ Not consistent: unsynchronized clocks
A common approach: use timestamp
9
Server0 Server1
Network
Yet, ordering timestamps is challenging in networked systems
n Recall the goals: Consistency + Efficiency
#2. Global logical clock (GTS)
ü Easy to ensure consistency
A common approach: use timestamp
10
Timestamp Server
Network
TX1
TX2
TXn
...clock
clock
A common approach: use timestamp
Yet, ordering timestamps is challenging in networked systems
n Recall the goals: Consistency + Efficiency
#2. Global logical clock (GTS)
ü Easy to ensure consistency
✘ Non-trivial costs
11
clock
clock
Extra network requests per TX
A common approach: use timestamp
Yet, ordering timestamps is challenging in networked systems
n Recall the goals: Consistency + Efficiency
#2. Global logical clock (GTS)
ü Easy to ensure consistency
✘ Non-trivial costs
✘ Scalability bottleneck
12
Timestamp Server
Handles all the requests
Yet, ordering timestamps is challenging in networked systems
n Recall the goals: Consistency + Efficiency
#3. Global vectorized clock (VTS)
n Per-worker clock
A common approach: use timestamp
13
Timestamp Server
Network
TX1
TX2
TXn
...clock
clock
...
Yet, ordering timestamps is challenging in networked systems
n Recall the goals: Consistency + Efficiency
#3. Global vectorized clock (VTS)
ü Reduce requests to the global server
A common approach: use timestamp
14
Timestamp Server
Network
TX1
TX2
TXn
...clock
clock
...
Yet, ordering timestamps is challenging in networked systems
n Recall the goals: Consistency + Efficiency
#3. Global vectorized clock (VTS)
ü Reduce requests to the global server
✘ Still needs the global server
A common approach: use timestamp
15
Timestamp Server
...
Yet, ordering timestamps is challenging in networked systems
n Recall the goals: Consistency + Efficiency
#3. Global vectorized clock (VTS)
ü Reduce requests to the global server
✘ Still needs the global server
✘ Non-scalable meta-data
A common approach: use timestamp
16
...
One clock per-worker
Observation: concurrency control (CC)
18
Recall: consistency = timestamps match TXs’ order
n While CC orders TXs (e.g., 2PL OCC)
Physical clocks are approximately good
n Hybrid clock: using CC in read-write TX to order physical clocks
This work: Decentralized Scalar Timestamp
Scalable scalar timestamp mechanism to enable MVCC
n Provide snapshot reads for read-only TX (i.e., bypasses original CC)
Consistent & efficient scalar hybrid physical-logical clock
n Reader uses the physical clock to read snapshots (bypass CC)
n DST is consistent by piggybacking w/ read-write TX’s CC
19
This work: Decentralized Scalar Timestamp
Scalable scalar timestamp mechanism to enable MVCC
n Provide snapshot reads for read-only TX (i.e., bypasses original CC)
20
Read-write(TX1) (TX2) Read-onlyw/o DST
Network
Network
TX start
TX end
Read A
A
Data server
Concurrency control(CC)e.g., 2PL, OCC, etc
Read data(A)
This work: Decentralized Scalar Timestamp
Scalable scalar timestamp mechanism to enable MVCC
n Provide snapshot reads for read-only TX (i.e., bypasses original CC)
21
Read-write(TX1) (TX2) Read-onlyw/ DST
Network
Network
TX start
TX end
Read AvAv
Data server
Read w/o CC
v =
This work: Decentralized Scalar Timestamp
Use physical clock as an tentative timestamp
n Minimal cost & Scalable: scalar + no global timestamp
22
(TX2) Read-onlyw/ DST
v =
This work: Decentralized Scalar Timestamp
Use the (un-sync) physical is not sufficient for read-write TXs
n DST is consistent by piggybacking w/ read-write TX’s CC
23
Read-write(TX1) Data server
Network
TX start
TX end
v =Write Av CC
Piggyback on CC:Update v to be consistent
This work: Decentralized Scalar Timestamp
Use physical clock as an tentative timestamp
n Also fresh: snapshot has bounded staleness (e.g., ms-scale)
24
(TX2) Read-onlyw/ DST
v =
Decentralized (Scalar) Timestamp is not so new
Based on many prior work on decentralized timestamps
n TicToc@SIGMOD’18, Cicada@SIGMOD’17, etc
Differently:
n DST is a general timestamp method for different CC
n DST further targets MVCC in a distributed setting
25
Outline of the remaining content
How to piggyback on CC for read-write TX ? DST rules
How snapshot reads bypass CC? DST reads
How good is DST’s snapshot read? Bounded staleness
Evaluation results
27
DST rules for read-write TX
Inspired from conflicting relationships of TXs
Two TXs conflict if and only if:
30
TX access Read Read
Read - Read-WriteWrite Write-Read
Timestamp should adjust follow these rules, too !
Write-Write
Write-Write
If TX2 has a write-write conflict with TX1
nTimestamptx2(Tx2) should > Timestamptx1 (Tx1)
32
TimeA v0
Tx1
Write(A)
TX1
Write(A)
TX2
Write-Write rule#2:TX'= MAX(TX,A.ver)+ 1
Write-Write rule#1:A.ver = TX1
TX2'
TX2'>TX1
Achieved via storing the (last-writer) timestamp with data
Example: Write-Write in 2PL
33Time
Write(A)TX1RPC
Lock(A)do other things …
Write & Unlock(A)
RPC
Server
2PLA.lock();
2PLA.write(…);A.unlock();
Can be simply piggybacked to 2PL in a lightweight way
A
Write & Unlock(A)
Example: Write-Write in 2PL
34Time
RPC
Lock(A)
RPC
Server
2PL+DSTA.write(…);A.ver = TS;A.unlock();
Can be simply piggybacked to 2PL in a lightweight way
Start
DSTTS=clock();
DST
TS=MAX(TS,A.ver)+1
A
Write(A)TX1
2PL+DSTA.lock();ret A.ver;
DST rules for read-write TX
Inspired from conflicting relationships of TXs
Two TXs conflict if and only if:
35
TX access Read Read
Read - Read-Write
Write Write-Read Write-Write
Similar to Write-Write
Similar to Write-Write
Timestamp should adjust follow these rules, too !
DST reads
Key benefit: Bypass CC for the reader
n e.g., avoid read locks in 2PL
Can we use MVCC w/ DST the same as global clock?
38
DST reads
Key benefit: Bypass CC for the reader
n e.g., avoid read locks in 2PL
Can we use MVCC w/ DST the same as global clock?
39Time
Read(A)TX1
Server
Start
DSTTS=clock();
RPC
Read A@TS
MVCC readRead A w/ version less up to TS;
DST reads
Key benefit: Bypass CC for the reader
n e.g., avoid read locks in 2PL
Can we use MVCC w/ DST the same as global clock?
n No, because timestamp of reader does not use CC !
n i.e., not apply DST rules because rules are piggyback to the CC !
40
DST reads
Key benefit: Bypass CC for the reader
n e.g., avoid read locks in 2PL
DST reads (a lightweight rule for snapshot) to fix this:
41Time
Read(A)TX1
Server
RPC
Read A@TX Start
DSTTS=clock();
MVCC read + DSTA.ver = MAX(A.ver,TS);Read A w/ version less up to TS;
DST works well various CCs
n 2PL MySQL cluster X DST
n OCC DrTM+R[Eurosys’16] X DST
n ROCOCO ROCOCO[OSDI’14] X DST
Also on various network settings
n Local clusters (TCP/IP), AWS cloud, RDMA
Evaluation results of DST
45
DST works well various CCs
n 2PL MySQL cluster X DST
n OCC DrTM+R[Eurosys’16] X DST
Also on various network settings
n Local clusters (TCP/IP), AWS cloud, RDMA
Evaluation results of DST
46
SmallBank Thpt(MTxs/sec)
Number of clients
2PL X DST
Workloads: TPC-C,Smallbank
nDST ~= RC in performance
47
TPC-C Thpt(KTxs/sec)
Number of clients
Beneficial to write-mostly workloadsn Reduce lock between read/write
Cluster Name CPU Network Num
Val 2 X 12 cores 10GbE 16
OCC X DST
Workloads: TPC-E (read-mostly)
nDST also ~= RC in performance
48
Cluster Name CPU Network Num
AWS r4.2xlarge (8 vcpu) 10GbE 32
VALR 2 X 12 cores 100Gbp IB (RDMA) 16
Latency (ms)
Throughput (K reqs/sec)
Latency (ms)
Throughput (K reqs/sec) w/RDMA
OCC X DST
Workloads: TPC-E (read-mostly)
nNear zero-cost timestamp overhead
49
Cluster Name CPU Network Num
AWS r4.2xlarge (8 vcpu) 10GbE 32
VALR 2 X 12 cores 100Gbp IB (RDMA) 16
Latency (ms)
Throughput (K reqs/sec)
Latency (ms)
Throughput (K reqs/sec) w/RDMA
Conclusion
DST provides serializable & fresh MVCC in a networked setting
n Also generalized to different CCs
Efficient & Scalable: Near zero-cost timestamp overhead
Thanks & QA!