Idit Keidar, Principles of Reliable Distributed Systems, Technion EE1
Principles of Reliable Distributed Systems
Lecture 11: Atomic Shared Memory Objects & Shared Memory Emulations
Idit Keidar
Material
• Attiya and Welch, Distributed Computing, Ch. 9 & 10
• Nancy Lynch, Distributed Algorithms, Ch. 13 & 17
• Linearizability slides adapted from Maurice Herlihy
Shared Memory Model
• All communication through shared memory!
  – No message-passing.
• Shared memory registers/objects.
• Accessed by processes with ids 1,2,…
• Note: we have two types of entities: objects and processes
Motivation
• Multiprocessors with shared memory
• Multi-threaded programs
• Distributed shared memory (DSM)
• Abstraction for message-passing systems; we will see how to:
  – Emulate shared memory in message-passing systems
  – Use shared memory for consensus and state machine replication
Linearizability (Atomicity): Semantics for Concurrent Objects
FIFO Queue: Enqueue Method
[Figure: a process calls q.enq( ) to add an item at the tail of the queue]
FIFO Queue: Dequeue Method
[Figure: a process calls q.deq() and gets back the item at the head of the queue]
Sequential Objects
• Each object has a state
  – Usually given by a set of fields
  – Queue example: sequence of items
• Each object has a set of methods
  – Only way to manipulate state
  – Queue example: enq and deq methods
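Here is a sequential object in a few lines of Python: the FIFO queue's state is a sequence of items, and enq and deq are the only way to touch it. This is a minimal sketch. The class name SequentialQueue and the choice to signal an empty deq with IndexError are assumptions made for this example.

```python
from collections import deque


class SequentialQueue:
    """Sequential FIFO queue: private state plus the methods that manipulate it."""

    def __init__(self) -> None:
        # State: the sequence of enqueued items, oldest first.
        self._items = deque()

    def enq(self, x) -> None:
        # Add x at the tail; the method returns nothing (void).
        self._items.append(x)

    def deq(self):
        # Remove and return the head item; signal "empty" if there is none.
        if not self._items:
            raise IndexError("empty")
        return self._items.popleft()


q = SequentialQueue()
q.enq("x")
q.enq("y")
print(q.deq(), q.deq())  # x y
```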
Methods Take Time
[Figure: timeline of one method call q.enq(...): invocation at 12:00, response (void) at 12:01]
Split Method Calls into Two Events
• Invocation
  – Method name & args
  – q.enq(x)
• Response
  – Result or exception
  – q.enq(x) returns void
  – q.deq() returns x
  – q.deq() throws empty
A Single Process (Thread)
• Sequence of events
• First event is an invocation
• Alternates matching invocations and responses
• This is called a well-formed interaction
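Well-formedness can be checked mechanically. The following is a minimal sketch. It assumes a hypothetical Event record with the fields kind ("inv" or "resp"), process, method, and value. A final invocation with no response is accepted, because a method may still be pending.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Event:
    kind: str          # "inv" (invocation) or "resp" (response)
    process: int       # id of the process that issued the event
    method: str        # e.g. "q.enq" or "q.deq"
    value: Any = None  # argument of an invocation, or result of a response


def is_well_formed(events: List[Event]) -> bool:
    """True if one process's events start with an invocation and then
    alternate invocations with their matching responses."""
    pending: Optional[Event] = None  # invocation still awaiting its response
    for e in events:
        if e.process != events[0].process:
            return False             # events of a single process only
        if e.kind == "inv":
            if pending is not None:  # a new call before the previous one returned
                return False
            pending = e
        elif e.kind == "resp":
            if pending is None or pending.method != e.method:
                return False         # response with no matching invocation
            pending = None
        else:
            return False
    return True                      # a final pending invocation is allowed


h = [Event("inv", 1, "q.enq", "x"), Event("resp", 1, "q.enq"),
     Event("inv", 1, "q.deq")]
print(is_well_formed(h))                                 # True
print(is_well_formed([Event("resp", 1, "q.deq", "x")]))  # False
```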
Concurrent Methods Take Overlapping Time
[Figure: timelines of several method calls whose intervals overlap in time]
Concurrent Objects
• What does it mean for a concurrent object to be correct?
• What is a concurrent FIFO queue?
  – FIFO means strict temporal order
  – Concurrent means ambiguous temporal order
• Help!
Sequential Specifications
• Precondition, say for q.deq(…)
  – Queue is non-empty
• Postcondition:
  – Returns & removes first item in queue
• You got a problem with that?
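A sequential specification can be written as a pure function that maps an object state to a new state and a result. Below is a minimal sketch for the queue. It assumes states are Python tuples with the oldest item first. The precondition is an assertion, and the postcondition is what the function returns. The names enq_spec and deq_spec are hypothetical.

```python
from typing import Any, Tuple

State = Tuple[Any, ...]  # queue contents, oldest item first


def enq_spec(state: State, x: Any) -> Tuple[State, None]:
    """q.enq(x): the new state has x appended at the tail; the result is void."""
    return state + (x,), None


def deq_spec(state: State) -> Tuple[State, Any]:
    """q.deq():
    Precondition:  the queue is non-empty.
    Postcondition: returns the first item and removes it from the queue.
    """
    assert len(state) > 0, "precondition violated: queue is empty"
    return state[1:], state[0]


s, _ = enq_spec((), "x")
s, _ = enq_spec(s, "y")
s, item = deq_spec(s)
print(item, s)  # x ('y',)
```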
Concurrent Specifications
• Naïve approach
  – Object has n methods
  – Must specify O(n²) possible interactions
  – Maybe more
  – e.g., "If the queue is empty and then enq begins and deq begins after enq(x) begins but before enq(x) ends and then enq returns before deq then…"
• Linearizability: same as it ever was
Linearizability
• Each method should:
  – “Take effect”
    • Effect defined by the sequential specification
  – Instantaneously
    • Take 0 time
  – Between its invocation and response events
    • Real-time order
    • A pending method (invocation and no response) can either take effect after its invocation or not at all
Linearization
• A linearization of a concurrent execution is:
1. A sequential execution
  • Each invocation is immediately followed by its response
  • Satisfies the object’s sequential specification
2. Looks like the concurrent execution
  • Responses to all completed invocations are the same as in the concurrent execution
  • Responses to pending invocations in the concurrent execution may be added
3. Preserves real-time order
  • Each invocation-response pair occurs between the corresponding invocation and response in the concurrent execution
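The three conditions can be turned into a brute-force check. The sketch below makes several assumptions: the object is a read/write register with initial value 0, and every operation in the history has completed (there are no pending invocations). It tries every sequential order and accepts one that both preserves real-time order and satisfies the register's sequential specification. The Op record and its integer timestamps are hypothetical. The search is exponential, so it only suits tiny histories.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Op:
    inv: int     # time of the invocation event
    resp: int    # time of the response event
    kind: str    # "write" or "read"
    value: int   # value written, or value returned by the read


def legal(seq: Sequence[Op], initial: int = 0) -> bool:
    """Register sequential specification: every read returns the last value written."""
    current = initial
    for op in seq:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def preserves_real_time(seq: Sequence[Op]) -> bool:
    """Reject orders that put an op before another op that finished before it began."""
    return not any(later.resp < earlier.inv
                   for i, earlier in enumerate(seq)
                   for later in seq[i + 1:])


def find_linearization(history: List[Op], initial: int = 0) -> Optional[List[Op]]:
    """Try every sequential order; return the first one meeting conditions 1-3."""
    for seq in permutations(history):
        if preserves_real_time(seq) and legal(seq, initial):
            return list(seq)
    return None


# read(0) overlaps write(1), so it may be linearized before the write.
h = [Op(0, 2, "write", 1), Op(1, 3, "read", 0), Op(4, 5, "read", 1)]
lin = find_linearization(h)
print([(op.kind, op.value) for op in lin])  # [('read', 0), ('write', 1), ('read', 1)]
# A read that starts after write(1) completes cannot return the old value.
print(find_linearization([Op(0, 1, "write", 1), Op(2, 3, "read", 0)]))  # None
```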
Linearizability and Atomicity
• A concurrent execution that has a linearization is linearizable
• An object that has only linearizable executions is atomic
Why Linearizability?
• “Religion”, not science
• Scientific justification:
  – Facilitates reasoning
  – Nice mathematical properties
• Common-sense justification
  – Preserves real-time order
  – Matches my intuition (sorry about yours)
Example
[Figure: timeline of concurrent operations q.enq(x), q.enq(y), q.deq(x), q.deq(y)]
Example
[Figure: timeline of concurrent operations q.enq(x), q.enq(y), q.deq(y)]
Example
[Figure: timeline of concurrent operations q.enq(x), q.deq(x)]
Example
[Figure: timeline of concurrent operations q.enq(x), q.enq(y), q.deq(y), q.deq(x)]
Read/Write Variable Example
[Figure: timeline of register operations write(0), read(1), write(1), read(0); annotation: write(1) happened after write(0)]
Read/Write Variable Example
[Figure: timeline of register operations write(0), write(1), write(2), and two read(1) operations; annotation: write(1) already happened]
Read/Write Variable Example
[Figure: timeline of register operations write(0), write(1), write(2), read(1), read(2)]
Concurrency
• How much concurrency does linearizability allow?
• When must a method invocation block?
• Focus on total methods
  – Defined in every state
  – Why?
Concurrency
• Question: when does linearizability require a method invocation to block?
• Answer: never!
• Linearizability is non-blocking
Non-Blocking Theorem
If method invocation A q.invoc() is pending in linearizable history H, then there exists a response A q:resp() such that H + A q:resp() is legal.
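For total methods the idea behind the theorem is short. Take a linearization of H and place the pending operation after everything in it. Since the operation has no response yet, no real-time constraint forbids this placement. The sequential specification then dictates its result. Below is a minimal sketch for a read/write register. It assumes the linearization is given as a list of (kind, value) pairs and that the initial value is 0. The name complete_pending is hypothetical.

```python
from typing import List, Tuple, Union


def complete_pending(linearization: List[Tuple[str, int]],
                     pending_kind: str, initial: int = 0) -> Union[int, str]:
    """Pick a response for a pending register invocation that keeps the history legal.

    The pending operation is linearized after every operation in `linearization`.
    Read and write are total (defined in every state), so a legal response
    always exists and is determined by the sequential specification.
    """
    current = initial
    for kind, value in linearization:
        if kind == "write":
            current = value          # replay the sequential specification
    if pending_kind == "read":
        return current               # a read returns the latest written value
    if pending_kind == "write":
        return "ack"                 # a write can always complete
    raise ValueError(f"unknown operation {pending_kind!r}")


print(complete_pending([("write", 1), ("read", 1), ("write", 2)], "read"))  # 2
```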
Note on Non-Blocking
• A given implementation of linearizability may be blocking
• The property itself does not mandate it
  – For every pending invocation, there is always a possible return value that does not violate linearizability
  – The implementation may not always know it…
Atomic Objects
• An object is atomic if all of its concurrent executions are linearizable
• What if we want an atomic operation on multiple objects?
Serializability
• A transaction is a finite sequence of method calls
• A history is serializable if transactions appear to execute serially
  – It is strictly serializable if the order is also compatible with real-time
• Used in databases and, more recently, in transactional memory
Serializability is Blocking
[Figure: two concurrent transactions with operations x.read(0), y.read(0), x.write(1), and y.write(1); annotation: deadlock]
Comparison
• Serializability appropriate for
  – Fault-tolerance
  – Multi-step transactions
• Linearizability appropriate for
  – Single objects
  – Multiprocessor synchronization
Critical Sections
• Easy way to implement linearizability
  – Take sequential object
  – Make each method a critical section
• Like synchronized methods in Java™
• Problems?
  – Blocking
  – No concurrency
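The critical-section approach looks like this in Python, as a minimal sketch. It assumes threading.Lock plays the role of Java's synchronized. Each method holds the lock for its whole body, so it appears to take effect at some instant while the lock is held. The cost is the one listed above: callers block, and no two methods run concurrently. The class name LockedQueue is hypothetical.

```python
import threading
from collections import deque


class LockedQueue:
    """FIFO queue whose methods are all critical sections guarded by one lock."""

    def __init__(self) -> None:
        self._items = deque()
        self._lock = threading.Lock()

    def enq(self, x) -> None:
        with self._lock:              # blocks while another method holds the lock
            self._items.append(x)

    def deq(self):
        with self._lock:
            if not self._items:
                raise IndexError("empty")
            return self._items.popleft()


q = LockedQueue()
workers = [threading.Thread(target=q.enq, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(q.deq() for _ in range(4)))  # [0, 1, 2, 3]
```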
Linearizability Summary
• Linearizability
  – Operation takes effect instantaneously between invocation and response
• Uses sequential specification
  – No O(n²) interactions
• Non-Blocking
  – Never required to pause method call
• Granularity matters
Atomic Register Emulation in a Message-Passing System
[Attiya, Bar-Noy, Dolev]
Distributed Shared Memory (DSM)
• Can we provide the illusion of atomic shared-memory registers in a message-passing system?
• In an asynchronous system?
• Where processes can fail?
Liveness Requirement
• Wait-freedom: every operation by a correct process p eventually completes
  – In a finite number of p’s steps
• Regardless of steps taken by other processes
  – In particular, the other processes may fail or take any number of steps between p’s steps
  – But p must be given a chance to take as many steps as it needs (fairness)
Register
• Holds a value
• Can be read
• Can be written
• Interface:
  – int read(); /* returns a value */
  – void write(int v); /* returns ack */
Take I: Failure-Free Case
• Each process keeps a local copy of the register
• Let’s try state machine replication
  – Step 1: Implement atomic broadcast
  – How?
• Recall: atomic broadcast service interface:
  – broadcast(m)
  – deliver(m)
Emulation with Atomic Broadcast (Failure-Free)
• Upon client request (read/write)
  – Broadcast (abcast) the request
• Upon deliver write request
  – Write to local copy of register
  – If from local client, return ack to client
• Upon deliver read request
  – If from local client, return local register value to client
• Homework questions:
  – Show that the emulated register is atomic
  – Is broadcasting reads required for atomicity?
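The failure-free emulation can be simulated in a few lines. This sketch assumes atomic broadcast is already available and models it as a single list of messages that every replica delivers in the same order. The names Replica, on_deliver, and run are hypothetical, and the list order stands in for the agreed total order.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# A broadcast request: (request id, sending process, "read" or "write", argument)
Msg = Tuple[int, int, str, Optional[int]]


@dataclass
class Replica:
    pid: int
    value: int = 0                                          # local copy of the register
    replies: Dict[int, Any] = field(default_factory=dict)   # replies to local clients

    def on_deliver(self, req_id: int, sender: int, kind: str, arg: Optional[int]) -> None:
        if kind == "write":
            self.value = arg                    # every replica applies every write
            if sender == self.pid:
                self.replies[req_id] = "ack"    # only the issuing replica acks its client
        elif kind == "read" and sender == self.pid:
            self.replies[req_id] = self.value   # read the local copy at delivery time


def run(replicas: List[Replica], abcast_order: List[Msg]) -> None:
    """Deliver every message to every replica in one agreed order (atomic broadcast)."""
    for msg in abcast_order:
        for r in replicas:
            r.on_deliver(*msg)


replicas = [Replica(p) for p in range(3)]
run(replicas, [(1, 0, "write", 5), (2, 2, "read", None),
               (3, 1, "write", 7), (4, 0, "read", None)])
print(replicas[2].replies[2], replicas[0].replies[4])  # 5 7
```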
What If Processes Can Crash?
• Does the same solution work?
ABD: Fault-Tolerant Emulation [Attiya, Bar-Noy, Dolev]
• Assumes up to f < n/2 processes can fail
• Main ideas:
  – Store value at a majority of processes before write completes
  – Read from a majority
  – A read majority intersects the write majority, hence the read sees the latest value
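The intersection argument is simple counting. Two sets of at least ⌊n/2⌋+1 processes have at least 2(⌊n/2⌋+1) > n members in total, so they must share a process. In addition, with f < n/2 failures, n - f > n/2 processes stay up, so a majority always answers. The following brute-force check of the first claim for small n is only an illustration, and the function names are hypothetical.

```python
from itertools import combinations


def smallest_majorities(n):
    """All sets of floor(n/2) + 1 processes out of n; larger majorities contain one of these."""
    return [set(c) for c in combinations(range(n), n // 2 + 1)]


def all_pairs_intersect(quorums):
    return all(a & b for a in quorums for b in quorums)


# Any read majority meets any write majority ...
print(all(all_pairs_intersect(smallest_majorities(n)) for n in range(1, 9)))  # True
# ... whereas "half" quorums need not: with n = 4, {0, 1} and {2, 3} are disjoint.
halves = [set(c) for c in combinations(range(4), 2)]
print(all_pairs_intersect(halves))  # False
```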
Take II: 1-Reader 1-Writer (SRSW)
• Single-reader – there is only one process that can read from the register
• Single-writer – there is only one process that can write to the register
• The reader and writer are just 2 processes
  – The other n-2 processes are there to help
Trivial Solution?
• Writer simply sends message to reader
  – When does it return ack?
  – What about failures?
• We want a wait-free solution:
  – If the reader (writer) fails, the writer (reader) should be able to continue writing (reading)
SRSW Algorithm: Variables
• At each process:
  – x, a copy of the register
  – t, initially 0, unique tag associated with latest write
SRSW Algorithm: Emulating Write
• To perform write(x,v)
  – choose tag > t
  – set x ← v; t ← tag
  – send (“write”, v, t) to all
• Upon receive (“write”, v, tag)
  – if (tag > t) then set x ← v; t ← tag fi
  – send (“ack”, v, tag) to writer
• When writer receives (“ack”, v, t) from majority (counting an ack from itself too)
  – return ack to client
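Here is a synchronous, single-threaded simulation of the write emulation, as a minimal sketch. Server, Writer, and the crashed flag are assumptions. Every message is delivered instantly, and a crashed server simply never replies. A real asynchronous implementation would wait for acks rather than count them in one pass. The writer's own copy is modeled as servers[0], so its ack counts toward the majority.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Server:
    x: int = 0               # local copy of the register
    t: int = 0               # tag of the latest write this server has seen
    crashed: bool = False    # a crashed server never replies

    def on_write(self, v: int, tag: int) -> Optional[Tuple[str, int, int]]:
        """Handle ("write", v, tag): adopt newer values, and ack in any case."""
        if self.crashed:
            return None
        if tag > self.t:
            self.x, self.t = v, tag
        return ("ack", v, tag)


class Writer:
    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers   # servers[0] plays the writer's own copy
        self.t = 0

    def write(self, v: int) -> str:
        self.t += 1                                   # choose tag > t
        acks = [s.on_write(v, self.t) for s in self.servers]
        acks = [a for a in acks if a is not None]
        if len(acks) <= len(self.servers) // 2:       # would block forever if f >= n/2
            raise RuntimeError("no majority of acks")
        return "ack"


servers = [Server() for _ in range(5)]
servers[3].crashed = servers[4].crashed = True    # f = 2 < 5/2
print(Writer(servers).write(42))                  # ack
print([s.x for s in servers])                     # [42, 42, 42, 0, 0]
```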
SRSW Algorithm: Emulating Read
• To perform read(x)
  – send (“read”) to all
• Upon receive (“read”)
  – send (“read-ack”, x, t) to reader
• When reader receives (“read-ack”, v, tag) from majority (including local values of x and t)
  – choose value v associated with largest tag
  – store these values in x,t
  – return x
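The read emulation can be sketched in the same synchronous style. Server, Reader, and crashed are hypothetical names. As a simplification, the reader's own copy is kept as reader.x and reader.t. It is always included among the candidates, while the majority is counted over the servers only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Server:
    x: int = 0               # local copy of the register
    t: int = 0               # tag of the latest write this server has seen
    crashed: bool = False    # a crashed server never replies

    def on_read(self) -> Optional[Tuple[str, int, int]]:
        """Handle ("read"): reply with ("read-ack", x, t), whatever the tag."""
        return None if self.crashed else ("read-ack", self.x, self.t)


class Reader:
    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers
        self.x, self.t = 0, 0    # the reader's local copy

    def read(self) -> int:
        replies = [s.on_read() for s in self.servers]
        replies = [r for r in replies if r is not None]
        if len(replies) <= len(self.servers) // 2:
            raise RuntimeError("no majority of read-acks")
        # Include the local copy, then keep the value with the largest tag.
        candidates = replies + [("read-ack", self.x, self.t)]
        _, v, tag = max(candidates, key=lambda r: r[2])
        self.x, self.t = v, tag  # store these values in x, t
        return self.x


# A write of 42 with tag 1 has reached only the first three of five servers.
servers = [Server(42, 1), Server(42, 1), Server(42, 1), Server(), Server()]
servers[0].crashed = servers[1].crashed = True
reader = Reader(servers)
print(reader.read())   # 42: the majority {2, 3, 4} still includes a written copy
```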
Does This Work?
• The only possible overlap is between a read and a write
  – Why?
• When a read does not overlap any write:
  – It reads at least one copy that was written by the latest write (why?)
  – This copy has the highest tag (why?)
• What is the linearization order when there is overlap between read and write?
• What if 2 reads overlap the same write?
Example
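[Figure: write(1) overlaps two consecutive reads by the single reader. The first read finds a copy that was written and returns 1 (write(1) already happened). The second read, read(?), does not find a written copy among the replies, but the reader's local copy was written by the first read.]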
Wait-Freedom
• The only waiting is for a majority of responses
• There is a correct majority
• All correct processes respond to all requests
  – Respond even if the tag is smaller
Take III: n-Reader 1-Writer (MRSW)
• n-reader – all the processes can read
• Does the previous solution work?
• What if 2 reads by different processes overlap the same write?
Example
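[Figure: write(1) overlaps reads by two different processes. The first read finds a copy that was written and returns 1 (write(1) already happened). The later read by another process, read(?), does not find a written copy and returns 0.]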
MRSW Algorithm: Extending the Read
• When reader receives (“read-ack”, v, tag) from majority
  – choose value v associated with largest tag
  – store these values in x,t
  – send (“propagate”, x, t) to all (except writer)
• Upon receive (“propagate”, v, tag) from process i
  – if (tag > t) then set x ← v; t ← tag fi
  – send (“prop-ack”, x, t) to process i
• When reader receives (“prop-ack”, v, tag) from majority (including itself)
  – return x
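Below is a minimal synchronous sketch of the two-phase multi-reader read. It makes several simplifying assumptions. Every replica, including the reader's own, is an entry in servers. The write-back goes to all servers rather than "all except the writer". The prop-ack carries no payload. The names Server, from_majority, and mrsw_read are hypothetical. The usage replays the example in which a write has reached only one copy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Server:
    x: int = 0               # local copy of the register
    t: int = 0               # tag of the latest value this server holds
    crashed: bool = False    # a crashed server never replies

    def on_read(self) -> Optional[Tuple[int, int]]:
        """Handle ("read"): reply with the local (x, t)."""
        return None if self.crashed else (self.x, self.t)

    def on_propagate(self, v: int, tag: int) -> Optional[str]:
        """Handle ("propagate", v, tag): adopt newer values, then send prop-ack."""
        if self.crashed:
            return None
        if tag > self.t:
            self.x, self.t = v, tag
        return "prop-ack"


def from_majority(replies: list, n: int) -> list:
    """Keep the replies that arrived; fail if fewer than a majority did."""
    live = [r for r in replies if r is not None]
    if len(live) <= n // 2:
        raise RuntimeError("no majority of replies")
    return live


def mrsw_read(servers: List[Server]) -> int:
    n = len(servers)
    # Phase 1 (read): collect (x, t) from a majority and keep the largest tag.
    v, tag = max(from_majority([s.on_read() for s in servers], n),
                 key=lambda pair: pair[1])
    # Phase 2 (write-back): a majority stores (v, tag) before the read returns,
    # so every later read, by any reader, sees a tag at least this large.
    from_majority([s.on_propagate(v, tag) for s in servers], n)
    return v


servers = [Server(1, 1)] + [Server() for _ in range(4)]  # write(1) reached one copy
print(mrsw_read(servers))   # 1: reader A sees the tag-1 copy
servers[0].crashed = servers[1].crashed = True
print(mrsw_read(servers))   # 1: reader B still returns 1 thanks to A's write-back
```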
The Complete Read
[Figure: the complete read. Phase 1 (Read): reader S1 sends (“read”) to S1, S2, ..., Sn and collects (“read-ack”, v, t). Phase 2 (Write-Back, multi-reader only): S1 sends (“propagate”, v, t) to S1, S2, ..., Sn and collects (“prop-ack”); then read() returns.]
Take IV: n-Reader n-Writer (MRMW)
• n-writer – all the processes can write to the register
• Does the previous solution work?
Playing Tag
• What if two writers use the same tag for writing different values?
• Need to ensure unique tags
  – That’s easy: break ties, e.g., by process id
• What if a later write uses a smaller tag than an earlier one?
  – Must be prevented (why?)
MRMW Algorithm: Extending the Write
• To perform write(x,v)
  – send (“query”) to all
• Upon receive (“query”) from i
  – send (“query-ack”, t) to i
• When writer receives (“query-ack”, tag) from majority (counting its own tag)
  – choose unique tag > all received tags
  – continue as in 1-writer algorithm
• What if another writer chooses a higher tag before write completes?
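Here is a minimal synchronous sketch of the extended write, with hypothetical names Server, from_majority, and mrmw_write. Crashed servers never reply. Tags are assumed to be (counter, writer id) pairs, which Python compares lexicographically, so ties between writers are broken by process id. Each completed write is stored at a majority, and that majority intersects the query majority. The chosen tag therefore exceeds the tag of every completed write.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Tag = Tuple[int, int]  # (counter, writer id), compared lexicographically


@dataclass
class Server:
    x: int = 0
    t: Tag = (0, 0)
    crashed: bool = False

    def on_query(self) -> Optional[Tag]:
        """Handle ("query"): reply with ("query-ack", t)."""
        return None if self.crashed else self.t

    def on_write(self, v: int, tag: Tag) -> Optional[str]:
        """Handle ("write", v, tag) exactly as in the single-writer algorithm."""
        if self.crashed:
            return None
        if tag > self.t:
            self.x, self.t = v, tag
        return "ack"


def from_majority(replies: list, n: int) -> list:
    live = [r for r in replies if r is not None]
    if len(live) <= n // 2:
        raise RuntimeError("no majority of replies")
    return live


def mrmw_write(servers: List[Server], pid: int, v: int) -> str:
    n = len(servers)
    # Phase 1 (query): learn the largest tag stored at a majority.
    counter, _ = max(from_majority([s.on_query() for s in servers], n))
    # Choose a unique tag larger than every tag received; pid breaks ties.
    tag = (counter + 1, pid)
    # Phase 2 (write): continue as in the 1-writer algorithm.
    from_majority([s.on_write(v, tag) for s in servers], n)
    return "ack"


servers = [Server() for _ in range(3)]
mrmw_write(servers, pid=1, v=10)
mrmw_write(servers, pid=2, v=20)
print(servers[0].x, servers[0].t)   # 20 (2, 2)
print((1, 1) < (1, 2))              # True: equal counters, pid breaks the tie
```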
The Complete Write
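[Figure: the complete write. Phase 1 (Read, multi-writer only): writer S1 sends (“query”) to S1, S2, ..., Sn and collects (“query-ack”, t). Phase 2 (Write): S1 sends (“write”, v, t) to S1, S2, ..., Sn and collects (“ack”); then write(v) returns ack.]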
How Long Does it Take?
• The write emulation
  – Single-writer: 2 rounds (steps)
  – Multi-writer: 4 rounds (steps)
• The read emulation
  – Single-reader: 2 rounds (steps)
  – Multi-reader: 4 rounds (steps)
What if A Majority Can Fail?
• You guessed it!
• Homework question
Can We Emulate Every Atomic Object the Same Way?
• We only emulated a read/write object
• Think of a general object type, with some data members and some methods
• Can we support it the same way?
R/W Registers vs. Consensus
• ABD works even if the system is completely asynchronous
• In Paxos, there is no progress when there are multiple leaders
• Here, there is always progress
  – Multiple writers can write concurrently
  – One will prevail (which?)